Pages

Wednesday, October 13, 2010

use a plpgsql well

Hello

I found a following code on net. It's good example of bad code (written by PHP coder):
01.CREATE OR REPLACE FUNCTION delete_data(IN data integer[]) RETURNS integer AS
02.--DECLARATION OF FUNCTION
03.$$
04.DECLARE
05.--DECLARATION OF LOCAL VARIABLES
06.sql varchar;
07.i integer;
08.BEGIN
09.--START OF THE PROCEDURE
10.i := 0;
11.loop
12.--LOOP THROUGH AOUR ARRAY OF DATA
13.if (data[i][1] IS NULL) then
14.return 1;
15.--IF WE LOOPED THROUG ALL OF ARRAY, IT'S DONE
16.exit;
17.end if;
18.sql := 'DELETE FROM data_table WHERE data_table.id='||data[i];
19.--THE SQL ITSELF, COMBINED WITH VARIABLE
20.execute(sql);
21.--THE EXECUTE FUNCTION BUILT IN PL/pgSQL, EXECUTES VARCHAR SQLs
22.i := i + 1;
23.end loop;
24.--END OF THE PROCEDURE
25. 
26.END;
27.--END OF FUNCTION'S LOGIC
28. 
29.$$
30.LANGUAGE 'plpgsql';
31.--DECLARATION OF LANGUAGE USED
what is wrong:
  • using a dynamic SQL - really, there cannot be a dynamic SQL
  • this code is cryptographic - It needs big fantasy to see iteration over array in this code
  • inconsistency - keywords, useless braces in IF statement (PL/pgSQL isn't PHP)
  • useless return value - not handled case, when input array is empty
why the coder didn't write just?:
01.CREATE OR REPLACE FUNCTION delete_data(IN data integer[]) 
02.RETURNS void AS $$
03.BEGIN
04.  FOR i IN array_lower(data,1)..array_upper(data,1)
05.  LOOP
06.    DELETE FROM data_table WHERE data_table.id = data[i];
07.  END LOOP;
08.END;
09.$$ LANGUAGE plpgsql; 
or better (if you like procedures and dislike SQL in your app. code) (inpiration by Thermic):
  CREATE OR REPLACE FUNCTION delete_data(data integer[])
  RETURNS void AS $$
    DELETE FROM data_table WHERE data_table.id = ANY($1)
  $$ LANGUAGE sql;
note: you can use a plpgsql or sql. Both environments has a few advantages and disadvantages. For using in web environments I little bit prefer SQL language (when it's called once per session, when size of input array can be significantly different).

6 comments:

  1. Not only is that an excellent example of bad code, it's actually it's poor practice to use PL/pgSQL at all in this case, the right way is to just use plain SQL:

    DELETE FROM data_table WHERE data_table.id = ANY ( ? );

    That's it. Bind an array to ? in your app, execute and you're done.


    Here's an example running through it:

    thormick=> CREATE TEMPORARY TABLE data_table ( id int );
    CREATE TABLE
    thormick=> \copy data_table from stdin
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    1
    2
    3
    4
    \.
    thormick=> SELECT * FROM data_table;
    SELECT * FROM data_table;
    id
    ----
    1
    2
    3
    4
    (4 rows)

    thormick=> DELETE FROM data_table WHERE data_table.id = ANY ( ARRAY[1,3]::INT[] );
    DELETE 2
    thormick=> SELECT * FROM data_table;
    SELECT * FROM data_table;
    id
    ----
    2
    4
    (1 row)

    ReplyDelete
  2. "# using a dynamic SQL - really, there cannot be a dynamic SQL"

    So, what do you do if you have a dynamic search? Sometimes it will be on the format:

    SELECT x.*
    FROM table x
    WHERE x.id = ?

    and sometimes it's on the format

    SELECT x.*
    FROM table x
    INNER JOIN table y ON ..
    INNER JOIN table z ON ..
    WHERE x.id = ?
    AND z.value = ?

    This is a simple example, so you could get away with running two different SQL-statements, but what do you do if you have for instance 20 different input fields for the search?

    ReplyDelete
  3. to christian

    a) My personal opinion on dynamic search is negative - this is badly designed application.

    b) if you need, you can use a trick (probably with dynamic sql) - so empty value means - don't use this value for searching

    EXECUTE '...WHERE column1 = $1 or $1 is null
    AND column2 = $2 or $2 is null
    ... '
    USING var1, var2

    This is safe to SQL injection and readable.

    ReplyDelete
  4. To Pavel Stěhule:

    I wasn't saying you should string concatenate user input into your SQL statements. As (I think) you can see from my example, I would be using prepared statements. It would be just as safe from SQL-injection.

    Your example would mean you are relying on the optimizer and putting a higher strain on the database.

    In an optimal world, I wouldn't use dynamic SQL either, but I can't really see how you can avoid it without ending up with much slower SQL-statements.

    If you look at my example again:

    SELECT x.*
    FROM table x
    WHERE x.id = ?

    and sometimes it's on the format

    SELECT x.*
    FROM table x
    INNER JOIN table y ON ..
    INNER JOIN table z ON ..
    WHERE x.id = ?
    AND z.value = ?

    Let us say that table y has a bazillion rows... and you only rarily searched for z.value, don't you think it would be a waste going through both table y and z every time you did that search?

    Another option is to write SQL statements for the border cases, but that doesn't work when you have searches with let's say 10 fields. You would have to write something like 10! (10*9*...*2*1= 3628800) different SQL statements, or join in all the tables all the time and hope the optimizer is perfect. Which it isn't. Or, atleast, it isn't in Postgres 8.3 which we are running.

    ReplyDelete
  5. Obviously someone who hasn't heard Quassnoi's click (http://explainextended.com/2009/06/21/click/)!

    It ain't easy making the leap from procedurally- to set-based thinking. I speak from experience! :)

    ReplyDelete