Friday, August 30, 2019

Why you need plpgsql_check (if you write procedures in PLpgSQL)

PLpgSQL is simple (but relatively powerful) specialized language. It is specialized for usage inside PostgreSQL as glue of SQL statements. It is great language for this purpose (and can be bad if it is used differently - for high cycles numeric calculation for example).

Originally a implementation of PLpgSQL language was really simple - it was very simple language with possibility to execute embedded SQL. But there was not possibility to check syntax of this SQL in validation time.

Note: PLpgSQL is a interpret of AST (Abstract Syntax Tree) nodes. There is a validation stage (when code is parsed into AST), and evaluation stage (runtime), when AST (parsed again when code is first executed) is interpreted.

Old SPI (Stored Procedure Interface) had not any API for validation of SQL without execution. Almost all checks in this time was at runtime. It was not too practical - so some checks (check of SQL syntax) are at validation stage now. Still PLpgSQL validator doesn't check a validity of SQL identifiers (tables, columns, functions, ... names). Now, there are two reason why the validator doesn't do it: a) we have not any other tools how to solve cyclic dependencies, b) Postgres's local temp tables - PostgreSQL temporary tables are created at run-time and doesn't exists at validation time - so PLpgSQL validator should not to check validity of SQL identifiers - they should not to exists in this time.

SQL identifiers validity checks


But often we should to check all identifiers inside code (SQL identifiers too). This is time for plpgsql_check:

CREATE TABLE bigtable(id int, v int);
INSERT INTO bigtable
   SELECT random() * 1000000, random()*10000
   FROM generate_series(1,1000000);
CREATE INDEX ON bigtable(id);
VACUUM ANALYZE bigtable;

CREATE OR REPLACE FUNCTION example01(_id numeric)
RETURNS numeric AS $$
DECLARE
  r record; 
  s numeric DEFAULT 0; 
BEGIN
  FOR r IN SELECT * FROM bigtable WHERE id = _id
  LOOP
    s := s + r.k;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

This code has lot of issues, but no one is a problem for buildin plpgsql validator. So this function can be created and executed. Sure, there will be run-time error:

I executed this function 2x and I got two different errors:
postgres=# select example01(1000);
ERROR:  control reached end of function without RETURN
CONTEXT:  PL/pgSQL function example01(numeric)
postgres=# select example01(1001);
ERROR:  record "r" has no field "k"
CONTEXT:  SQL statement "SELECT s + r.k"
PL/pgSQL function example01(numeric) line 8 at assignment

There are possible two branches of execution (with/without execution of body of cycle). There is wrong reference to SQL identifiers r.k, and a statement RETURN is missing.

It isn't too hard to see these issues in this simple code. But it is harder to find these bugs inside function with more hundreds of lines. plpgsql_check can helps:

postgres=# SELECT * FROM plpgsql_check_function('example01', fatal_errors=>false);
+------------------------------------------------------------+
|                   plpgsql_check_function                   |
+------------------------------------------------------------+
| error:42703:8:assignment:record "r" has no field "k"       |
| Context: SQL statement "SELECT s + r.k"                    |
| error:2F005:control reached end of function without RETURN |
| warning extra:00000:3:DECLARE:never read variable "r"      |
| warning extra:00000:4:DECLARE:never read variable "s"      |
+------------------------------------------------------------+
(5 rows)

After fix the plpgsql_check is a happy:

CREATE OR REPLACE FUNCTION public.example01(_id numeric)
 RETURNS numeric
 LANGUAGE plpgsql
AS $function$
DECLARE
  r record;
  s numeric DEFAULT 0;
BEGIN
  FOR r IN SELECT * FROM bigtable WHERE id = _id
  LOOP
    s := s + r.v;
  END LOOP;
  RETURN s;
END;
$function$

postgres=# SELECT * FROM plpgsql_check_function('example01', fatal_errors=>false);
+------------------------+
| plpgsql_check_function |
+------------------------+
+------------------------+
(0 rows)

postgres=# select example01(1001);
+-----------+
| example01 |
+-----------+
|      3164 |
+-----------+
(1 row)

But performance is not too good - it should to read only few (or zero) rows from indexed columns, and execution time is 275 ms. This is time for integrated profiler:

plpgsql_check profiler


plpgsql_check profiler can works in two modes. a) with persistent profiles in shared memory (requires loading at start of Postgres). b) with session only profiles. For my purpose session only profiles are ok, so I don't need special configuration. Just I need to start profiler:
-- ensure active profiler
LOAD 'plpgsql'; LOAD 'plpgsql_check'; SET plpgsql_check.profiler TO on;

-- run example01 more times
SELECT example01(1001);
SELECT example01(1001);
SELECT example01(1001);

Now I can see a profile of execute01 function

postgres=# SELECT * FROM plpgsql_profiler_function_tb('example01');
+--------+-------------+-------------+------------+------------+----------+-----------+----------------+--------------------------------------------------+
| lineno | stmt_lineno | cmds_on_row | exec_stmts | total_time | avg_time | max_time  | processed_rows |                      source                      |
+--------+-------------+-------------+------------+------------+----------+-----------+----------------+--------------------------------------------------+
|      1 |           ∅ |           ∅ |          ∅ |          ∅ |        ∅ | ∅         | ∅              |                                                  |
|      2 |           ∅ |           ∅ |          ∅ |          ∅ |        ∅ | ∅         | ∅              | DECLARE                                          |
|      3 |           ∅ |           ∅ |          ∅ |          ∅ |        ∅ | ∅         | ∅              |   r record;                                      |
|      4 |           ∅ |           ∅ |          ∅ |          ∅ |        ∅ | ∅         | ∅              |   s numeric DEFAULT 0;                           |
|      5 |           5 |           1 |          3 |       0.07 |    0.024 | {0.029}   | {0}            | BEGIN                                            |
|      6 |           6 |           1 |          3 |    851.895 |  283.965 | {299.195} | {0}            |   FOR r IN SELECT * FROM bigtable WHERE id = _id |
|      7 |           ∅ |           ∅ |          ∅ |          ∅ |        ∅ | ∅         | ∅              |   LOOP                                           |
|      8 |           8 |           1 |          6 |      0.063 |    0.011 | {0.023}   | {0}            |     s := s + r.v;                                |
|      9 |           ∅ |           ∅ |          ∅ |          ∅ |        ∅ | ∅         | ∅              |   END LOOP;                                      |
|     10 |          10 |           1 |          3 |          0 |        0 | {0}       | {0}            |   RETURN s;                                      |
|     11 |           ∅ |           ∅ |          ∅ |          ∅ |        ∅ | ∅         | ∅              | END;                                             |
+--------+-------------+-------------+------------+------------+----------+-----------+----------------+--------------------------------------------------+
(11 rows)

You can see - almost all time is in query execution. Fast recheck of query doesn't show any special:
postgres=# EXPLAIN ANALYZE SELECT * FROM bigtable WHERE id = 1001;
+---------------------------------------------------------------------------------------------------------------------------+
|                                                        QUERY PLAN                                                         |
+---------------------------------------------------------------------------------------------------------------------------+
| Index Scan using bigtable_id_idx on bigtable  (cost=0.42..12.46 rows=2 width=8) (actual time=0.224..0.233 rows=2 loops=1) |
|   Index Cond: (id = 1001)                                                                                                 |
| Planning Time: 0.198 ms                                                                                                   |
| Execution Time: 0.289 ms                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------+
(4 rows)
Why the direct execution of query needs 0.3ms and execution inside function 283ms? plpgsql_check can helps again.

Performance tips


We should to use a option performance_warnings
postgres=# SELECT * FROM plpgsql_check_function('example01', performance_warnings => true);
+-------------------------------------------------------------------------------------------------------------------------------+
|                                                    plpgsql_check_function                                                     |
+-------------------------------------------------------------------------------------------------------------------------------+
| performance:42804:5:statement block:target type is different type than source type                                            |
| Detail: cast "integer" value to "numeric" type                                                                                |
| Hint: Hidden casting can be a performance issue.                                                                              |
| Context: during statement block local variable "s" initialization on line 4                                                   |
| performance:42804:6:FOR over SELECT rows:implicit cast of attribute caused by different PLpgSQL variable type in WHERE clause |
| Query: SELECT * FROM bigtable WHERE id = _id                                                                                  |
| --                                       ^                                                                                    |
| Detail: An index of some attribute cannot be used, when variable, used in predicate, has not right type like a attribute      |
| Hint: Check a variable type - int versus numeric                                                                              |
| performance:00000:routine is marked as VOLATILE, should be STABLE                                                             |
| Hint: When you fix this issue, please, recheck other functions that uses this function.                                       |
+-------------------------------------------------------------------------------------------------------------------------------+
(11 rows)

There is a bad type of variable - numeric. Postgres cannot to use index when column is a integer, and searched value is a numeric:
postgres=# EXPLAIN ANALYZE SELECT * FROM bigtable WHERE id = 1001::numeric;
+---------------------------------------------------------------------------------------------------------------------------+
|                                                        QUERY PLAN                                                         |
+---------------------------------------------------------------------------------------------------------------------------+
| Gather  (cost=1000.00..12175.00 rows=5000 width=8) (actual time=21.000..115.142 rows=2 loops=1)                           |
|   Workers Planned: 2                                                                                                      |
|   Workers Launched: 2                                                                                                     |
|   ->  Parallel Seq Scan on bigtable  (cost=0.00..10675.00 rows=2083 width=8) (actual time=56.270..105.250 rows=1 loops=3) |
|         Filter: ((id)::numeric = '1001'::numeric)                                                                         |
|         Rows Removed by Filter: 333333                                                                                    |
| Planning Time: 0.180 ms                                                                                                   |
| Execution Time: 115.190 ms                                                                                                |
+---------------------------------------------------------------------------------------------------------------------------+
(8 rows)

From times you can see, so parallel execution is not available inside plpgsql too.

So after fix:
DROP FUNCTION example01 (numeric) ;
CREATE OR REPLACE FUNCTION public.example01(_id int)
 RETURNS int
 LANGUAGE plpgsql
AS $function$
DECLARE
  r record; 
  s int DEFAULT 0; 
BEGIN
  FOR r IN SELECT * FROM bigtable WHERE id = _id
  LOOP
    s := s + r.v;
  END LOOP;
  RETURN s;
END;
$function$

the execution time is about 2ms. But plpgsql_check is not happy still:
postgres=# SELECT * FROM plpgsql_check_function('example01', performance_warnings => true);
+-----------------------------------------------------------------------------------------+
|                                 plpgsql_check_function                                  |
+-----------------------------------------------------------------------------------------+
| performance:00000:routine is marked as VOLATILE, should be STABLE                       |
| Hint: When you fix this issue, please, recheck other functions that uses this function. |
+-----------------------------------------------------------------------------------------+
(2 rows)

When the function does only read operations from database, then should be marked as STABLE. So final code should to look like:
CREATE OR REPLACE FUNCTION public.example01(_id int)
 RETURNS int
 LANGUAGE plpgsql
AS $function$
DECLARE
  r record; 
  s int DEFAULT 0; 
BEGIN
  FOR r IN SELECT * FROM bigtable WHERE id = _id
  LOOP
    s := s + r.v;
  END LOOP;
  RETURN s;
END;
$function$ STABLE
Now execution is little bit faster (about 0.5 ms)

Next example is very untypical case for PLpgSQL (this language is not designed for numeric calculations), but it is good example of some developer's mistakes.

CREATE OR REPLACE FUNCTION public.example02(integer)
 RETURNS bigint
 LANGUAGE plpgsql
AS $function$
DECLARE result bigint DEFAULT 0;
BEGIN
  FOR i IN 1..$1
  LOOP
    result := result + 1.0;
  END LOOP;
  RETURN result;
END;
$function$

Again, there is more than one issue. But we can check a performance - 1M iterations needs about 922ms. It is relatively slow - but this language is designed for different purpose - it is glue of SQL. plpgsql_check shows some warnings:

postgres=# SELECT * FROM plpgsql_check_function('example02', performance_warnings => true);
+-----------------------------------------------------------------------------------------+
|                                 plpgsql_check_function                                  |
+-----------------------------------------------------------------------------------------+
| performance:42804:3:statement block:target type is different type than source type      |
| Detail: cast "integer" value to "bigint" type                                           |
| Hint: Hidden casting can be a performance issue.                                        |
| Context: during statement block local variable "result" initialization on line 2        |
| performance:42804:6:assignment:target type is different type than source type           |
| Detail: cast "numeric" value to "bigint" type                                           |
| Hint: Hidden casting can be a performance issue.                                        |
| Context: at assignment to variable "result" declared on line 2                          |
| warning extra:00000:4:DECLARE:never read variable "i"                                   |
| performance:00000:routine is marked as VOLATILE, should be IMMUTABLE                    |
| Hint: When you fix this issue, please, recheck other functions that uses this function. |
+-----------------------------------------------------------------------------------------+
(11 rows)

Time: 2,080 ms

The most important issues are: 1. lazy casting from numeric inside cycle's body, 2. the function is marked as VOLATILE by default, although it should be marked IMMUTABLE (because doesn't do any operation with tables). The overhead of lazy cast from numeric type is about 470ms. The overhead of default VOLATILE is 200ms. The function
CREATE OR REPLACE FUNCTION public.example02(integer)
 RETURNS bigint
 LANGUAGE plpgsql
AS $function$
DECLARE result bigint DEFAULT 0;
BEGIN
  FOR i IN 1..$1
  LOOP
    result := result + 1;
  END LOOP;
  RETURN result;
END;
$function$ IMMUTABLE

needs about 250ms for 1M iterations (good reduction from original 922ms). Probably it should be faster - I have developer build of PostgreSQL with active assertions.

SQL injection detection


plpgsql_check can help with most dangerous security issue of SQL world - SQL injection. Here is again buggy code:

CREATE OR REPLACE FUNCTION count_rows(tablename text)
RETURNS text AS $$
DECLARE result int;
BEGIN
  EXECUTE 'SELECT count(*) FROM ' || tablename INTO result;
  RETURN result;
END 
$$ LANGUAGE plpgsql STABLE;

CREATE FUNCTION

postgres=# SELECT count_rows('bigtable');
+------------+
| count_rows |
+------------+
| 1000000    |
+------------+
(1 row)

This code is terrible bad, but plpgsql_check can to show the problem:
postgres=# SELECT * FROM plpgsql_check_function('count_rows', security_warnings => true);
+-----------------------------------------------------------------------------+
|                           plpgsql_check_function                            |
+-----------------------------------------------------------------------------+
| security:00000:3:EXECUTE:text type variable is not sanitized                |
| Query: SELECT 'SELECT count(*) FROM ' || tablename                          |
| --                                       ^                                  |
| Detail: The EXECUTE expression is SQL injection vulnerable.                 |
| Hint: Use quote_ident, quote_literal or format function to secure variable. |
+-----------------------------------------------------------------------------+
(5 rows)

Fixing is not hard:

CREATE OR REPLACE FUNCTION count_rows(tablename text)
RETURNS text AS $$DECLARE result int;
BEGIN
  EXECUTE 'SELECT count(*) FROM ' || quote_ident(tablename) INTO result;
  RETURN result;
END
$$ LANGUAGE plpgsql STABLE;
CREATE FUNCTION
Time: 51,511 ms
postgres=# SELECT * FROM plpgsql_check_function('count_rows', security_warnings => true);
+------------------------+
| plpgsql_check_function |
+------------------------+
+------------------------+
(0 rows)

CREATE OR REPLACE FUNCTION count_rows(tablename text)
RETURNS text AS $$DECLARE result int;
BEGIN
  EXECUTE format('SELECT count(*) FROM %I', tablename) INTO result;
  RETURN result;
END
$$ LANGUAGE plpgsql STABLE;
CREATE FUNCTION
Time: 55,931 ms
postgres=# SELECT * FROM plpgsql_check_function('count_rows', security_warnings => true);
+------------------------+
| plpgsql_check_function |
+------------------------+
+------------------------+
(0 rows)

The SQL injection detection is only basic, but almost all bugs of this type are very simple.



1 Comments:

At September 2, 2019 at 12:28 AM , Blogger Misha said...

Cool! I like injections test! So we need now dedicated out of the box “code coverage” case I think. Just run test and get report. There would be tricky stuff about SQL plans (how to cover different tree path throw all possible plans) but we have to start from something.

 

Post a Comment

Subscribe to Post Comments [Atom]

<< Home