Pages

Monday, August 25, 2008

Using cursors for generating cross tables

I am working on procedure support in PostgreSQL more than one year. Missing procedures is one disadvantage of PostgreSQL. Because procedures are not called from SELECT statement, then it couldn't respect some rules like functions. Procedures are mainly used for explicit transaction controlling and for generating mutable results (like dynamic record set or stacked record set). Dynamic record set is interesting feature when we nothing know about result set's columns. It's typical for cross tables. Because PostgreSQL doesn't support procedures, we cannot return dynamic (mutable number of columns) queries directly, but we can return dynamic cursor. This method of generating cross tables is inspired by Roland Bauman's http://rpbouman.blogspot.com blog.
Cross table is based on query's pattern:
SELECT shop,
SUM(CASE gender WHEN 'f' THEN salary ELSE 0 END) AS f,
SUM(CASE gender WHEN 'm' THEN salary ELSE 0 END) AS m,
SUM(salary) AS total
FROM employees INNER JOIN shops USING (shop_id)
GROUP BY shop

for data:
CREATE TABLE employees (
id serial PRIMARY KEY,
shop_id int,
gender char(1),
name varchar(32),
salary int
);

CREATE TABLE shops (
id serial PRIMARY KEY,
shop varchar(32)
);

INSERT INTO shops(shop) VALUES('Zurich'),('New York'),('London');

INSERT INTO employees (shop_id, gender, name, salary)
VALUES
(1, 'm', 'Jon Simpson', 4500),
(1, 'f', 'Barbara Breitenmoser',4700),
(2, 'f', 'KirstenRuegg',5600),
(3, 'm', 'Ralp Teller',5100),
(3, 'm', 'Peter Jonson',4700);
I write function, that generate necessary SELECT statement and open dynamic cursor.
CREATE OR REPLACE FUNCTION do_cross_cursor(dimx_name varchar,
    dimx_source varchar, dimy_name varchar,
    dimy_source varchar, expr varchar)
RETURNS refcursor AS $$
DECLARE
col_list text[] := '{}';
query text;
r RECORD;
result refcursor := 'result';
BEGIN
FOR r IN EXECUTE 'SELECT DISTINCT '
  || dimx_name || '::text AS val ' || dimx_source
LOOP
col_list := array_append(col_list, 'SUM(CASE ' || dimx_name
 || ' WHEN ' || quote_literal(r.val) || ' THEN ' || expr
 || ' ELSE 0 END) AS ' || quote_ident(r.val) || '');
END LOOP;
query := 'SELECT ' || dimy_name || ', '
 || array_to_string(col_list, ',')
 || ', SUM(' || expr || ') AS Total '
 || dimy_source || ' GROUP BY ' || dimy_name;
OPEN result NO SCROLL FOR EXECUTE query;
RETURN result;
END;
$$ LANGUAGE plpgsql STRICT;
Because cursors should be used only in transaction, I have to use explicit transaction:
BEGIN;
SELECT do_cross_cursor('gender', 'FROM employees','shop',
     'FROM employees e JOIN shops s ON s.id = e.shop_id',
     'salary');
FETCH ALL FROM result;
END;

BEGIN;
SELECT do_cross_cursor('shop', 'FROM shops','gender',
      'FROM employees e JOIN shops s ON s.id = e.shop_id',
      'salary');
FETCH ALL FROM result;
END;
There is result:
postgres=# BEGIN;
BEGIN
postgres=#   SELECT do_cross_cursor('gender', 'FROM employees',
  'shop',
  'FROM employees e JOIN shops s ON s.id = e.shop_id', 'salary');
do_cross_cursor
-----------------
result
(1 row)

postgres=#   FETCH ALL FROM result;
 shop   |  m   |  f   | total
----------+------+------+-------
New York |    0 | 5600 |  5600
Zurich   | 4500 | 4700 |  9200
London   | 9800 |    0 |  9800
(3 rows)

postgres=# END;BEGIN;
COMMIT
BEGIN
postgres=#   SELECT do_cross_cursor('shop', 'FROM shops','gender',
 'FROM employees e JOIN shops s ON s.id = e.shop_id', 'salary');
do_cross_cursor
-----------------
result
(1 row)

postgres=#   FETCH ALL FROM result;
gender | New York | Zurich | London | total
--------+----------+--------+--------+-------
m      |        0 |   4500 |   9800 | 14300
f      |     5600 |   4700 |      0 | 10300
(2 rows)

postgres=# END;
COMMIT
A alternative for this solution is using a tablefuc extension:
postgres=# CREATE VIEW report AS SELECT shop, gender, salary 
                                    FROM shops, employees 
                                   WHERE shop_id = shops.id;
CREATE VIEW
postgres=# SELECT shop, gender, sum(salary) FROM report GROUP BY 1,2 
 UNION ALL SELECT shop, 'X',    sum(salary) FROM report GROUP BY 1 
 UNION ALL SELECT NULL, gender, sum(salary) FROM report GROUP BY 2 
 UNION ALL SELECT NULL, 'X',    sum(salary) FROM report ORDER BY 1;
   shop   | gender |  sum  
----------+--------+-------
 London   | m      |  9800
 London   | X      |  9800
 New York | f      |  5600
 New York | X      |  5600
 Zurich   | m      |  4500
 Zurich   | X      |  9200
 Zurich   | f      |  4700
          | m      | 14300
          | f      | 10300
          | X      | 24600
(10 rows)

postgres=# SELECT * FROM crosstab(
         'SELECT shop, gender, sum(salary) FROM report GROUP BY 1,2 
UNION ALL SELECT shop, ''X'',  sum(salary) FROM report GROUP BY 1 
UNION ALL SELECT NULL, gender, sum(salary) FROM report GROUP BY 2 
UNION ALL SELECT NULL, ''X'',  sum(salary) FROM report ORDER BY 1',
         'VALUES(''f''),(''m''),(''X'')') AS ct(shop varchar, 
                                                f bigint, 
                                                m bigint, 
                                                "total" bigint);
   shop   |   f   |   m   | total 
----------+-------+-------+-------
 London   |       |  9800 |  9800
 New York |  5600 |       |  5600
 Zurich   |  4700 |  4500 |  9200
          | 10300 | 14300 | 24600
(4 rows)

15 comments:

  1. At least on Firefox, the SQL text in the page is truncated. May it be possible to fix the CSS or to wrap the text so it is readable ?

    Thank you for your valuable work on postgreSQL

    ReplyDelete
  2. Hello,

    I reformated article. Wrong style :(

    ReplyDelete
  3. Your article was very helpful to me, thanks a lot :-)
    w

    ReplyDelete
  4. Hi!

    Very good article! It's just what I've looking for.

    I have one question: probably I'll use this technique in an application I'm developing and want to know about the impact in terms of performance with a large ammount of data. Is it fine?

    Thank you!

    ReplyDelete
  5. To Gustavo

    Hello

    what I know, this method is the best on SQL level. But on very large tables this query should run long time. PostgreSQL missing OLAP or ROLAP support. Try it and you will see. This solution is very simple, but if it will be too slow you can use some general OLAP sw - Mondrian, Pentaho

    regards
    Pavel

    ReplyDelete
  6. How about implementing a new keyword in Postgres, like EXECUTE PROCEDURE 'function_name()' which returns "SETOF RECORD", and which is equivalent to SELECT * FROM 'function_name()', but doesn't give you an error?

    regards,
    davor

    ReplyDelete
  7. This is in my ToDo (when I meet some sponsor). It is standardized: CREATE PROCEDURE and CALL statement.

    ReplyDelete
  8. Brilliant article... it is simpler to do crosstab ;;)

    Best regards,
    Irman

    ReplyDelete
  9. I am a molecular biologist and have minimal Postgres knowledge, so my question is very naive. I pasted your code into pgAdmin sql window, and everything works fine, your function is being created. But when I execute your procedure, PG says "3 rows discarded" and shows nothing. I am sure that this is because of my ignorance - can you tell me how to do it right? My goal is to create simple dynamic pivot queries like with Access.

    ReplyDelete
  10. To Adriano: you can test it in console? What are you doing exactly? This is pgAdmin message or PostgreSQL message?

    ReplyDelete
  11. Dear Pawel
    thank you for your help. When I paste the function into pgAdmin without wrapping it into the transaction, it works fine. Within the transaction block, it rejects the output. I will try to read more about transactions in order to understand this behavior.
    I have a big question though. Could you write a very short (3-4 lines) documentation of the parameters of your function? What is the admissible parameter syntax, etc? A million thanks in advance!!!

    ReplyDelete
  12. well, I tried the following:

    SELECT do_cross_cursor( 'varid',
    'FROM loggingdb_ips_boolean As log',
    'to_char(ipstimestamp, ''mon DD HH24h'')',
    'FROM loggingdb_ips_boolean As log',
    'COUNT(*)::integer');
    FETCH ALL FROM result;

    But it aborts with the following message:
    ERROR: aggregate function calls cannot be nested
    SQL state: 42803
    Context: PL/pgSQL function "do_cross_cursor" line 19 at OPEN

    Does this mean that one cannot use aggregate functions within this cursor?

    ReplyDelete
  13. To Adriano: You can use aggregate function with cursors - but you cannot nested aggregates - if you can understand code - it is based on dynamic SQL - then some query is dynamically created - and it use SUM as example. When you use COUNT(*), then it creates query with ..SUM( .. COUNT(*)) that is not valid. If you can use COUNT, then you have to modify source code. /parameters - it is column name and source, column name and source.

    ReplyDelete
  14. Hi Pavel Stěhule
    its a beautiful code to create cross relation pivot table. i need to change all diagonal values to 'N/A'. i have been trying to make some changes but could not succeeded. can you plz tell me how can i do this.

    ReplyDelete
  15. hi there . your blog has helped me alot thank you very much . now i need some altretaion in this besause my requirmnts are slightly different . my scenario is i have three columns say
    A(text) B(text) Value(text)
    AA1 AA1 0
    AA2 AA1 100
    AA3 AA1 100
    AA1 AA2 100
    AA2 AA2 0
    AA3 AA2 100
    AA1 AA3 100
    AA2 AA3 100
    AA3 AA3 0

    now i want a cross table that looks like
    AA1 AA2 AA3
    AA1 0 100 100
    AA2 100 0 100
    AA3 100 100 0

    and then it should replace o with not applicable (N/A) so i am looking a cross table like

    AA1 AA2 AA3
    AA1 N/A 100 100
    AA2 100 N/A 100
    AA3 100 100 N/A

    i have achieved the table with zeros but i am unable to place N/A value where row and column name is same i,e fro AA1 to AA1 or AA2 to AA2 . thans i will wait for your reply

    ReplyDelete