Sunday, December 13, 2009

How not to write a plpgsql functions

I found very strange PL/pgSQL code. PL/pgSQL has some specifics, and is good, if programmer has good knowledge about it. The basic rule - don't repeat string or array concat too often inside loop in function. So bad code:
CREATE OR REPLACE FUNCTION generate_uuid_v4() RETURNS uuid AS $$
DECLARE
    value VARCHAR(36);
BEGIN
    value =          lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || '-';
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || '-';
    value = value || lpad((to_hex((ceil(random() * 255)::int & 15) | 64)), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || '-';
    value = value || lpad((to_hex((ceil(random() * 255)::int & 63) | 128)), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || '-';
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
    RETURN value::uuid;
END;
$$ LANGUAGE 'plpgsql';

postgres=# select generate_uuid_v4() from generate_series(1,10000);
           generate_uuid_v4           
--------------------------------------
 9833de6f-4ba6-49f1-8d5f-d98d10d79346
 340c6008-b8fe-4267-bcc5-96bcddb5f675
...
 6f54c524-bd02-4efc-84b3-b56d1c93f3ec
 96ee245e-2990-4842-9d36-45a66dab8adb
Time: 407,082 ms
This is good sample of wrong using PL/pgSQL language. PLPerl, maybe PLPython will do much better work. When we cannot use these languages, we have to use some tricks. With some modifications original code - we can have 2x faster code.
CREATE OR REPLACE FUNCTION _generate_uuid_v4() RETURNS uuid AS $$
BEGIN
    RETURN  (lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || '-'
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || '-'
   || lpad((to_hex((ceil(random() * 255)::int & 15) | 64)), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || '-'
   || lpad((to_hex((ceil(random() * 255)::int & 63) | 128)), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || '-'
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0'))::uuid;
END;
$$ LANGUAGE 'plpgsql'

postgres=# select _generate_uuid_v4() from generate_series(1,10000);
          _generate_uuid_v4           
--------------------------------------
 5134fe69-5d2a-4419-bf84-d72dc290f72f
 919e8663-b384-4eae-89d9-1026d6be4127
 f23f8f4f-6914-47e8-983f-155ace0d8860
...
 39951d8f-fbd1-48bf-9c5d-157eefbd5af6
 73318784-2cfa-4bcc-ad42-307fc4c173fc
Time: 203,516 ms
Original needs 400ms for ten thousand values. Edited code needs only 200ms for same number of values. This code should be inlined - when we use SQL language:
CREATE OR REPLACE FUNCTION _generate_uuid_v4() RETURNS uuid AS $$
SELECT (lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || '-'
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || '-'
   || lpad((to_hex((ceil(random() * 255)::int & 15) | 64)), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || '-'
   || lpad((to_hex((ceil(random() * 255)::int & 63) | 128)), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || '-'
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0')
   || lpad(to_hex(ceil(random() * 255)::int), 2, '0'))::uuid;
$$ LANGUAGE SQL;

postgres=# select _generate_uuid_v4() from generate_series(1,10000);
          _generate_uuid_v4           
--------------------------------------
 e725eb84-67ad-485e-bec2-d73485db4e69
 49f605d1-226e-4d9e-8457-30a925c98a0c
 ed759055-23e7-4322-a98a-552f65a297ae
...
 414e62c3-1b84-4bc9-8fd7-ba8bd0a72403
 03e27594-42c0-4ec9-a493-636b6d130aad
Time: 148,184 ms
With SQL function we safe other 50ms. PLpgSQL is good language - but it isn't C language or Fortran. It needs some different manners.

2 Comments:

At December 14, 2009 at 7:57 AM , Blogger Steve said...

If you can use the pgcrypto contrib module, you can get a bit more performance out of it. This is the function I use for generating random uuids:

create or replace function generate_uuid_v4() returns uuid as $$
select encode(gen_random_bytes(16), 'hex')::uuid
$$ language sql;

On my test system, this takes 88ms to generate 10,000 uuids vs 227ms for the SQL function in your post.

 
At December 14, 2009 at 8:03 AM , Blogger Pavel Stěhule said...

or you can use uuid-ossp contrib :).

 

Post a Comment

Subscribe to Post Comments [Atom]

<< Home