Sunday, August 29, 2010

performance issue of SQL functions

I though so sql functions is always faster than plpgsql function. But it's not true. Only when sql function is inlined, then function is faster, else plpgsql can be faster. Now I know, so sql functions where parameter is twice and more used are not inlined - because it means double parameter evaluation. you can see:
CREATE OR REPLACE FUNCTION is_empty(text) 
RETURNS bool AS $$ 
SELECT $1 IS NULL OR $1 = '' 
$$ LANGUAGE sql;
This function returns true when parameter is empty string or is null. The first parameter is used twice and then the body of this function isn't inlined.
postgres=# EXPLAIN VERBOSE SELECT count(is_empty(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) FROM generate_series(1,100000);
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12.50..12.77 rows=1 width=0)
   Output: count(is_empty(CASE WHEN (random() < 0.5::double precision) THEN NULL::text ELSE 'x'::text END))
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0)
         Output: generate_series
         Function Call: generate_series(1, 100000)
(5 rows)
The execution time is 458ms. When I modify the body for one time paremeter using:
CREATE OR REPLACE FUNCTION is_empty(text) 
RETURNS bool AS $$
SELECT COALESCE($1,'') = '' 
$$ LANGUAGE sql;
Then function is inlined:
postgres=# EXPLAIN VERBOSE SELECT count(is_empty(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) FROM generate_series(1,100000);
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12.50..12.52 rows=1 width=0)
   Output: count((COALESCE(CASE WHEN (random() < 0.5::double precision) THEN NULL::text ELSE 'x'::text END, ''::text) = ''::text))
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0)
         Output: generate_series
         Function Call: generate_series(1, 100000)
(5 rows)
and execution time is only 68ms (5 x faster than not inlined function). Attention: this note is valid for some more complex parameters - like volatile functions. The life isn't simple - for basic parameters (like columns or simple expression), the most simple SQL function (in my article function with OR operator and twice used parameter) can be better - wirking better with planner.

Tuesday, August 17, 2010

divide a table into a partitions

Hello I had to verify a speed of two methods of dividing data to partion. First method is based on dynamic SQL, second is based on cursor scan and prepared statements. I was surprised - both variants has a same speed.
CREATE OR REPLACE FUNCTION public.foo()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
BEGIN
  FOR i IN 0..9
  LOOP
    RAISE NOTICE '***** % *****', i;
    EXECUTE 'TRUNCATE TABLE ' || ('x' || i)::regclass;
    EXECUTE 'INSERT INTO ' || ('x' || i)::regclass ||
                ' SELECT * FROM x WHERE a >= $1 AND a < $1 + 100000'
       USING i * 100000;
  END LOOP;
  RETURN;
END;
$function$
This variant is shorter, but needs a active index on source table.
CREATE OR REPLACE FUNCTION public.foo1()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
DECLARE
  c CURSOR FOR SELECT * FROM x;
  r RECORD;
BEGIN
  TRUNCATE TABLE x0;
  TRUNCATE TABLE x1;
  TRUNCATE TABLE x2;
  TRUNCATE TABLE x3;
  TRUNCATE TABLE x4;
  TRUNCATE TABLE x5;
  TRUNCATE TABLE x6;
  TRUNCATE TABLE x7;
  TRUNCATE TABLE x8;
  TRUNCATE TABLE x9;

  FOR r IN c
  LOOP
    CASE
        WHEN r.a BETWEEN 0 AND 99999 THEN
            INSERT INTO x0 VALUES(r.a);
        WHEN r.a BETWEEN 100000 AND 199999 THEN
            INSERT INTO x1 VALUES(r.a);
        WHEN r.a BETWEEN 200000 AND 299999 THEN
            INSERT INTO x2 VALUES(r.a);
        WHEN r.a BETWEEN 300000 AND 399999 THEN
            INSERT INTO x3 VALUES(r.a);
        WHEN r.a BETWEEN 400000 AND 499999 THEN
            INSERT INTO x4 VALUES(r.a);
        WHEN r.a BETWEEN 500000 AND 599999 THEN
            INSERT INTO x5 VALUES(r.a);
        WHEN r.a BETWEEN 600000 AND 699999 THEN
            INSERT INTO x6 VALUES(r.a);
        WHEN r.a BETWEEN 700000 AND 799999 THEN
            INSERT INTO x7 VALUES(r.a);
        WHEN r.a BETWEEN 800000 AND 899999 THEN
            INSERT INTO x8 VALUES(r.a);
        WHEN r.a BETWEEN 900000 AND 1000000 THEN
            INSERT INTO x9 VALUES(r.a);
    END CASE;
  END LOOP;
END;
$function$

This variant is longer, but doesn't need a active index on source table.