Monday, August 25, 2008

Using cursors for generating cross tables

I am working on procedure support in PostgreSQL more than one year. Missing procedures is one disadvantage of PostgreSQL. Because procedures are not called from SELECT statement, then it couldn't respect some rules like functions. Procedures are mainly used for explicit transaction controlling and for generating mutable results (like dynamic record set or stacked record set). Dynamic record set is interesting feature when we nothing know about result set's columns. It's typical for cross tables. Because PostgreSQL doesn't support procedures, we cannot return dynamic (mutable number of columns) queries directly, but we can return dynamic cursor. This method of generating cross tables is inspired by Roland Bauman's http://rpbouman.blogspot.com blog.
Cross table is based on query's pattern:
SELECT shop,
SUM(CASE gender WHEN 'f' THEN salary ELSE 0 END) AS f,
SUM(CASE gender WHEN 'm' THEN salary ELSE 0 END) AS m,
SUM(salary) AS total
FROM employees INNER JOIN shops USING (shop_id)
GROUP BY shop

for data:
CREATE TABLE employees (
id serial PRIMARY KEY,
shop_id int,
gender char(1),
name varchar(32),
salary int
);

CREATE TABLE shops (
id serial PRIMARY KEY,
shop varchar(32)
);

INSERT INTO shops(shop) VALUES('Zurich'),('New York'),('London');

INSERT INTO employees (shop_id, gender, name, salary)
VALUES
(1, 'm', 'Jon Simpson', 4500),
(1, 'f', 'Barbara Breitenmoser',4700),
(2, 'f', 'KirstenRuegg',5600),
(3, 'm', 'Ralp Teller',5100),
(3, 'm', 'Peter Jonson',4700);
I write function, that generate necessary SELECT statement and open dynamic cursor.
CREATE OR REPLACE FUNCTION do_cross_cursor(dimx_name varchar,
    dimx_source varchar, dimy_name varchar,
    dimy_source varchar, expr varchar)
RETURNS refcursor AS $$
DECLARE
col_list text[] := '{}';
query text;
r RECORD;
result refcursor := 'result';
BEGIN
FOR r IN EXECUTE 'SELECT DISTINCT '
  || dimx_name || '::text AS val ' || dimx_source
LOOP
col_list := array_append(col_list, 'SUM(CASE ' || dimx_name
 || ' WHEN ' || quote_literal(r.val) || ' THEN ' || expr
 || ' ELSE 0 END) AS ' || quote_ident(r.val) || '');
END LOOP;
query := 'SELECT ' || dimy_name || ', '
 || array_to_string(col_list, ',')
 || ', SUM(' || expr || ') AS Total '
 || dimy_source || ' GROUP BY ' || dimy_name;
OPEN result NO SCROLL FOR EXECUTE query;
RETURN result;
END;
$$ LANGUAGE plpgsql STRICT;
Because cursors should be used only in transaction, I have to use explicit transaction:
BEGIN;
SELECT do_cross_cursor('gender', 'FROM employees','shop',
     'FROM employees e JOIN shops s ON s.id = e.shop_id',
     'salary');
FETCH ALL FROM result;
END;

BEGIN;
SELECT do_cross_cursor('shop', 'FROM shops','gender',
      'FROM employees e JOIN shops s ON s.id = e.shop_id',
      'salary');
FETCH ALL FROM result;
END;
There is result:
postgres=# BEGIN;
BEGIN
postgres=#   SELECT do_cross_cursor('gender', 'FROM employees',
  'shop',
  'FROM employees e JOIN shops s ON s.id = e.shop_id', 'salary');
do_cross_cursor
-----------------
result
(1 row)

postgres=#   FETCH ALL FROM result;
 shop   |  m   |  f   | total
----------+------+------+-------
New York |    0 | 5600 |  5600
Zurich   | 4500 | 4700 |  9200
London   | 9800 |    0 |  9800
(3 rows)

postgres=# END;BEGIN;
COMMIT
BEGIN
postgres=#   SELECT do_cross_cursor('shop', 'FROM shops','gender',
 'FROM employees e JOIN shops s ON s.id = e.shop_id', 'salary');
do_cross_cursor
-----------------
result
(1 row)

postgres=#   FETCH ALL FROM result;
gender | New York | Zurich | London | total
--------+----------+--------+--------+-------
m      |        0 |   4500 |   9800 | 14300
f      |     5600 |   4700 |      0 | 10300
(2 rows)

postgres=# END;
COMMIT
A alternative for this solution is using a tablefuc extension:
postgres=# CREATE VIEW report AS SELECT shop, gender, salary 
                                    FROM shops, employees 
                                   WHERE shop_id = shops.id;
CREATE VIEW
postgres=# SELECT shop, gender, sum(salary) FROM report GROUP BY 1,2 
 UNION ALL SELECT shop, 'X',    sum(salary) FROM report GROUP BY 1 
 UNION ALL SELECT NULL, gender, sum(salary) FROM report GROUP BY 2 
 UNION ALL SELECT NULL, 'X',    sum(salary) FROM report ORDER BY 1;
   shop   | gender |  sum  
----------+--------+-------
 London   | m      |  9800
 London   | X      |  9800
 New York | f      |  5600
 New York | X      |  5600
 Zurich   | m      |  4500
 Zurich   | X      |  9200
 Zurich   | f      |  4700
          | m      | 14300
          | f      | 10300
          | X      | 24600
(10 rows)

postgres=# SELECT * FROM crosstab(
         'SELECT shop, gender, sum(salary) FROM report GROUP BY 1,2 
UNION ALL SELECT shop, ''X'',  sum(salary) FROM report GROUP BY 1 
UNION ALL SELECT NULL, gender, sum(salary) FROM report GROUP BY 2 
UNION ALL SELECT NULL, ''X'',  sum(salary) FROM report ORDER BY 1',
         'VALUES(''f''),(''m''),(''X'')') AS ct(shop varchar, 
                                                f bigint, 
                                                m bigint, 
                                                "total" bigint);
   shop   |   f   |   m   | total 
----------+-------+-------+-------
 London   |       |  9800 |  9800
 New York |  5600 |       |  5600
 Zurich   |  4700 |  4500 |  9200
          | 10300 | 14300 | 24600
(4 rows)

15 Comments:

At August 26, 2008 at 2:49 AM , Blogger Luca Veronese said...

At least on Firefox, the SQL text in the page is truncated. May it be possible to fix the CSS or to wrap the text so it is readable ?

Thank you for your valuable work on postgreSQL

 
At August 26, 2008 at 4:07 AM , Blogger Pavel Stěhule said...

Hello,

I reformated article. Wrong style :(

 
At February 18, 2009 at 12:15 PM , Anonymous Anonymous said...

Your article was very helpful to me, thanks a lot :-)
w

 
At July 7, 2009 at 4:11 PM , Blogger Gustavo Straube said...

Hi!

Very good article! It's just what I've looking for.

I have one question: probably I'll use this technique in an application I'm developing and want to know about the impact in terms of performance with a large ammount of data. Is it fine?

Thank you!

 
At July 7, 2009 at 10:38 PM , Blogger Pavel Stěhule said...

To Gustavo

Hello

what I know, this method is the best on SQL level. But on very large tables this query should run long time. PostgreSQL missing OLAP or ROLAP support. Try it and you will see. This solution is very simple, but if it will be too slow you can use some general OLAP sw - Mondrian, Pentaho

regards
Pavel

 
At February 12, 2010 at 4:45 AM , Anonymous Anonymous said...

How about implementing a new keyword in Postgres, like EXECUTE PROCEDURE 'function_name()' which returns "SETOF RECORD", and which is equivalent to SELECT * FROM 'function_name()', but doesn't give you an error?

regards,
davor

 
At February 12, 2010 at 4:58 AM , Blogger Pavel Stěhule said...

This is in my ToDo (when I meet some sponsor). It is standardized: CREATE PROCEDURE and CALL statement.

 
At July 19, 2010 at 6:29 PM , Anonymous irman said...

Brilliant article... it is simpler to do crosstab ;;)

Best regards,
Irman

 
At June 11, 2012 at 9:23 PM , Blogger Prof. Adriano Aguzzi, MD PhD DVM h.c. FRCP FRCPath said...

I am a molecular biologist and have minimal Postgres knowledge, so my question is very naive. I pasted your code into pgAdmin sql window, and everything works fine, your function is being created. But when I execute your procedure, PG says "3 rows discarded" and shows nothing. I am sure that this is because of my ignorance - can you tell me how to do it right? My goal is to create simple dynamic pivot queries like with Access.

 
At June 11, 2012 at 11:33 PM , Blogger Pavel Stěhule said...

To Adriano: you can test it in console? What are you doing exactly? This is pgAdmin message or PostgreSQL message?

 
At June 18, 2012 at 10:07 PM , Blogger Prof. Adriano Aguzzi, MD PhD DVM h.c. FRCP FRCPath said...

Dear Pawel
thank you for your help. When I paste the function into pgAdmin without wrapping it into the transaction, it works fine. Within the transaction block, it rejects the output. I will try to read more about transactions in order to understand this behavior.
I have a big question though. Could you write a very short (3-4 lines) documentation of the parameters of your function? What is the admissible parameter syntax, etc? A million thanks in advance!!!

 
At June 18, 2012 at 10:23 PM , Blogger Prof. Adriano Aguzzi, MD PhD DVM h.c. FRCP FRCPath said...

well, I tried the following:

SELECT do_cross_cursor( 'varid',
'FROM loggingdb_ips_boolean As log',
'to_char(ipstimestamp, ''mon DD HH24h'')',
'FROM loggingdb_ips_boolean As log',
'COUNT(*)::integer');
FETCH ALL FROM result;

But it aborts with the following message:
ERROR: aggregate function calls cannot be nested
SQL state: 42803
Context: PL/pgSQL function "do_cross_cursor" line 19 at OPEN

Does this mean that one cannot use aggregate functions within this cursor?

 
At June 20, 2012 at 1:32 AM , Blogger Pavel Stěhule said...

To Adriano: You can use aggregate function with cursors - but you cannot nested aggregates - if you can understand code - it is based on dynamic SQL - then some query is dynamically created - and it use SUM as example. When you use COUNT(*), then it creates query with ..SUM( .. COUNT(*)) that is not valid. If you can use COUNT, then you have to modify source code. /parameters - it is column name and source, column name and source.

 
At August 1, 2012 at 1:01 AM , Anonymous Peter said...

Hi Pavel Stěhule
its a beautiful code to create cross relation pivot table. i need to change all diagonal values to 'N/A'. i have been trying to make some changes but could not succeeded. can you plz tell me how can i do this.

 
At August 1, 2012 at 1:31 AM , Blogger Unknown said...

hi there . your blog has helped me alot thank you very much . now i need some altretaion in this besause my requirmnts are slightly different . my scenario is i have three columns say
A(text) B(text) Value(text)
AA1 AA1 0
AA2 AA1 100
AA3 AA1 100
AA1 AA2 100
AA2 AA2 0
AA3 AA2 100
AA1 AA3 100
AA2 AA3 100
AA3 AA3 0

now i want a cross table that looks like
AA1 AA2 AA3
AA1 0 100 100
AA2 100 0 100
AA3 100 100 0

and then it should replace o with not applicable (N/A) so i am looking a cross table like

AA1 AA2 AA3
AA1 N/A 100 100
AA2 100 N/A 100
AA3 100 100 N/A

i have achieved the table with zeros but i am unable to place N/A value where row and column name is same i,e fro AA1 to AA1 or AA2 to AA2 . thans i will wait for your reply

 

Post a Comment

Subscribe to Post Comments [Atom]

<< Home