Friday, August 29, 2014

lot of fixes in plpgsql_check

if you use it, please update

Adam Bartoszewicz sent instructions and compiled ddl for plpgsql_check for windows

Adam wrote (this is a how to build a PostgreSQL extension on MS Windows):

My general steps to compile plpgsql_check plugin on Windows 7:
  1. Download and install PostgreSQL 9.3.4 for Win32 from http://www.enterprisedb.com
  2. Download and install Microsoft Visual C++ 2010 Express
  3. Lern tutorial http://blog.2ndquadrant.com/compiling-postgresql-extensions-visual-studio-windows
  4. The plpgsql_check depends on plpgsql and we need to add plpgsql.lib to the library list. Unfortunately PostgreSQL 9.4.3 does not contain this library.
  5. Create a plpgsql.lib from plpgsql.dll as described in http://adrianhenke.wordpress.com/2008/12/05/create-lib-file-from-dll
  6. Change plpgsql_check.c file, add PGDLLEXPORT line before evry extension function, as described in http://blog.2ndquadrant.com/compiling-postgresql-extensions-visual-studio-windows
    ...PGDLLEXPORT
    Datum plpgsql_check_function_tb(PG_FUNCTION_ARGS);
    PGDLLEXPORT
    Datum plpgsql_check_function(PG_FUNCTION_ARGS);
    ...
    PGDLLEXPORT
    Datum
    plpgsql_check_function(PG_FUNCTION_ARGS)
    {
    Oid            funcoid = PG_GETARG_OID(0);
    ...
    PGDLLEXPORT
    Datum
    plpgsql_check_function_tb(PG_FUNCTION_ARGS)
    {
    Oid            funcoid = PG_GETARG_OID(0);
    ...
    
  7. Build plpgsql_check.dll
  8. Install plugin
    1. copy plpgsql_check.dll to PostgreSQL\9.3\lib
    2. copy plpgsql_check.control and plpgsql_check--0.8.sql to PostgreSQL\9.3\share\extension

Monday, August 18, 2014

plpgsql_check 0.9

I released new version of plpgsql_check.

What is new?

lot of new checks:

  • assignment stmt
  • performance warnings
  • return stmts

create or replace function fx()
returns t2 as $$
begin
return (10,20,30)::t1;
end;
$$ language plpgsql;
select * from plpgsql_check_function('fx()', performance_warnings := true);
plpgsql_check_function
---------------------------------------------------------------------------------
error:42804:3:RETURN:returned record type does not match expected record type
Detail: Returned type integer does not match expected type numeric in column 2.
(2 rows)

create or replace function f1()
returns setof int as $$
begin
return next 1::numeric; -- tolerant, doesn't use tupmap
end;
$$ language plpgsql;
select * from plpgsql_check_function('f1()', performance_warnings := true);
plpgsql_check_function
-------------------------------------------------------------------------------------------
performance:42804:3:RETURN NEXT:target variable has different type then expression result
Detail: assign "numeric" value to "integer" variable
Hint: Hidden casting can be a performance issue.
(3 rows)

create or replace function f1()
returns int as $$
begin
return current_date;
end;
$$ language plpgsql;
select * from plpgsql_check_function('f1()', performance_warnings := true);
plpgsql_check_function
----------------------------------------------------------------------------------
warning:42804:3:RETURN:target variable has different type then expression result
Detail: assign "date" value to "integer" variable
Hint: There are no possible explicit coercion between those types, possibly bug!
(3 rows)

Last month was a discussion about speed of plpgsql. There was example of synthetic tests - for these tests plpgsql is strongly penalized. There is zero optimization for intensive mathematics calculations :
DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..10000000 LOOP
if 0=0 then
n = SQRT (f);
end if;
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;

There is two issues: a) useless test 0=0 (Oracle throws it, Postgres has no any optimization for code cleaning), b) hidden (IO) cast from double to integer. Original execution runs 5sec on Oracle and 25sec on Postgres. (32 sec on my notebook).

After manual fix:
postgres=# DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..10000000 LOOP
--if 0=0 then
n = SQRT (f)::real;
--end if;
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;
NOTICE:  Result => 3162
DO
Time: 6771.010 ms
It is not bad - little bit slower than Oracle on wrong benchmark (plpgslq is not targeted for numeric calculations, plpgsql is best glue for SQL statements) without any internal optimization.

postgres=# load 'plpgsql';
LOAD
postgres=# create extension plpgsql_check ;
CREATE EXTENSION

        CREATE OR REPLACE FUNCTION public.foo()
         RETURNS void
         LANGUAGE plpgsql
1       AS $function$
2       DECLARE f integer; n real;
3       BEGIN
4       FOR f IN 1..10000000 LOOP
5       --if 0=0 then
6       n = SQRT (f);
7       --end if;
8       END LOOP;
9       RAISE NOTICE 'Result => %',n;
10      END $function$


postgres=# select * from plpgsql_check_function('foo()', fatal_errors := false, performance_warnings := true);
                                  plpgsql_check_function                                  
------------------------------------------------------------------------------------------
 performance:42804:6:assignment:target variable has different type then expression result
 Detail: assign "integer" value to "real" variable
 Hint: Hidden casting can be a performance issue.
 warning:00000:0:RETURN:unused declared variable
 Detail: variable f declared on line 2
(5 rows)

You can see two issues:
  • unused variable f - integer FOR stmt has own automatic variable  - so DECLARE f integer is useless
  • hidden IO cast on line six, that is main performance issue

Friday, June 27, 2014

plpgsql_check is available for PostgreSQL 9.2

Short note: I did backport of plpgsql_check to PostgreSQL 9.2.

What is plpgsql_check? It is plpgsql validator without necessity to run checked function and all paths inside.
postgres=# select oid::regprocedure, plpgsql_check_function(oid::regprocedure) from pg_proc where prolang = 12599 and prorettype <> 2279;
      oid      │                         plpgsql_check_function                          
───────────────┼─────────────────────────────────────────────────────────────────────────
 bubu(integer) │ error:42702:3:RETURN:column reference "a" is ambiguous
 bubu(integer) │ Query: SELECT (select * from t where t.a = a)
 bubu(integer) │ --                                         ^
 bubu(integer) │ Detail: It could refer to either a PL/pgSQL variable or a table column.
(4 rows)

Time: 4.857 ms

Friday, June 6, 2014

New PostgreSQL 9.4 article

I upgraded a mediawiki 1.23 on our Czech PostgreSQL site. New important feature is a integration of Google Translator. Now a articles from this site can be read in seven languages. New one is my fresh article about PostgreSQL 9.4 new features

Sunday, May 11, 2014

A speed of PL languages for atypical usage

Last week I play with Lua - It is really beautiful, powerful and strange language. I would to some image about this language so I wrote a primitive benchmark. Note: Although Lua is clean winner, I'll use a my favourite PL/pgSQL still - for business logic implementation is really best - and (I have to accent it) any real load limited by IO speed and available IOPS significantly decrease a differences of evaluation speed.

A typical usage of PL languages should be a glue of SQL statements. But sometimes can be useful use these languages for PostgreSQL library enhancing.
I test a simple variadic function - function "least" that I can to compare with native C implementation (buildin). I was little bit surprised by speed of Lua - it is really fast and only one order slower than C implementation - PL/pgSQL is not bad - it is slower than PL/Lua - but only two times (it is relative very fast SQL glue).

-- native implementation
postgres=# select count(*) filter (where a = least(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 55.776 ms
Table foo has about 100K rows.
create table foo(a int, b int, c int, d int, e int);
insert into foo select random()*100, random()*100, random()*100, random()*100, random()*100 from generate_series(1,100000);

postgres=# select count(*) from foo;
 count  
────────
 100000
(1 row)

Time: 21.305 ms
I started with PL/pgSQL
CREATE OR REPLACE FUNCTION public.myleast1(VARIADIC integer[])
 RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  foreach a in array $1
  loop
    if result is null then 
      result := a; 
    elseif a < result then
      result := a;
    end if;
  end loop;
  return result;
end;
$function$
postgres=# select count(*) filter (where a = myleast1(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 996.684 ms
with small optimization (possible due result is not varlena type) it is about 3% faster
CREATE OR REPLACE FUNCTION public.myleast1a(VARIADIC integer[])
 RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  foreach a in array $1
  loop
    if a < result then 
      result := a; 
    else
      result := coalesce(result, a);
    end if;
  end loop;
  return result;
end;
$function$
postgres=# select count(*) filter (where a = myleast1a(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 968.769 ms
Second possible optimization is reduction IF expressions (more than 18% speed-up):
CREATE OR REPLACE FUNCTION public.myleast1b(VARIADIC integer[])
 RETURNS integer
 LANGUAGE plpgsql
 IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  foreach a in array $1
  loop
    if result is null or a < result then 
      result := a; 
    end if;
  end loop;
  return result;
end;
$function$
postgres=# select count(*) filter (where a = myleast1b(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 801.634 ms
or totally removing IF statements (30% speedup, but code is little bit obscure):
CREATE OR REPLACE FUNCTION public.myleast1c(VARIADIC integer[])
 RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  foreach a in array $1
  loop 
    result := case when a < result then a else coalesce(result, a) end; 
  end loop;
  return result;
end;
$function$
postgres=# select count(*) filter (where a = myleast1c(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 691.597 ms
Wrapping SQL in PL/pgSQL doesn't help
CREATE OR REPLACE FUNCTION public.myleast2(VARIADIC integer[])
 RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  return (select min(v) from unnest($1) g(v));
end;
$function$

postgres=# select count(*) filter (where a = myleast2(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 1886.462 ms
Single line SQL functions is not faster than PL/pgSQL - the body of SQL function is not trivial, and Postgres cannot to inline function body effectively
CREATE OR REPLACE FUNCTION public.myleast3(VARIADIC integer[])
 RETURNS integer LANGUAGE sql IMMUTABLE STRICT
AS $function$select min(v) from unnest($1) g(v)$function$

postgres=# select count(*) filter (where a = myleast3(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 1238.185 ms
A winner of this test is implementation in PL/Lua - the code is readable and pretty fast.
CREATE OR REPLACE FUNCTION public.myleast4(VARIADIC a integer[])
 RETURNS integer LANGUAGE pllua IMMUTABLE STRICT
AS $function$
local result;
for k,v in pairs(a) do 
  if result == nil then 
    result = v
  elseif v < result then 
    result = v
  end; 
end
return result;
$function$

postgres=# select count(*) filter (where a = myleast4(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 469.174 ms
By contrast I was surprised a slower speed of PL/Perl (and write code was little bit more difficult). Sometimes I used a perl for similar small functions and looks so Lua is better than Perl for these purposes.
CREATE OR REPLACE FUNCTION public.myleast5(VARIADIC integer[])
 RETURNS integer LANGUAGE plperl IMMUTABLE STRICT
AS $function$
my $result = undef;
for my $value (@{$_[0]} ) {
  if (! defined $result || $value < $result) {
    $result = $value;
  }
}
return $result;
$function$

postgres=# select count(*) filter (where a = myleast5(a,b,c,d,e)) from foo;
 count 
───────
   535
(1 row)

Time: 1591.802 ms
I rechecked PL/Pythonu - it is fast too:
CREATE OR REPLACE FUNCTION public.myleast6(VARIADIC a integer[])
RETURNS integer LANGUAGE plpythonu
AS $function$
r = None
for x in a:
   if r is None or x < r:
      r = x
return r
$function$
postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 621.150 ms
I did test of PL/v8 too (tested on different computer than previous tests), and it is slightly (few percent) faster than Python:
CREATE OR REPLACE FUNCTION public.myleast7(VARIADIC a integer[])
 RETURNS integer LANGUAGE plv8 AS $function$
var r = null;
for(var i = 0; i < a.length; i++) { 
  if (r === null || r > a[i]) r = a[i]; 
}
return r;
$function$

-- JavaScript on second computer
postgres=# select count(*) filter (where a = myleast7(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 371.943 ms

-- Python on second computer
postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 388.633 ms
Next day I found so JavaScript and Python has special construct for finding minimum - and I tested it too:
CREATE OR REPLACE FUNCTION public.myleast7a(VARIADIC a integer[])
 RETURNS integer
 LANGUAGE plv8
AS $function$
return Math.min.apply(Math, a);
$function$

-- JavaScript on second computer
postgres=# select count(*) filter (where a = myleast7a(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 331.515 ms

-- Native implementation on second computer
postgres=# select count(*) filter (where a = least(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 35.841 ms

CREATE OR REPLACE FUNCTION public.myleast6a(VARIADIC a integer[])
 RETURNS integer
 LANGUAGE plpythonu
AS $function$
return min(a)
$function$

-- it doesn't help too much (Python, second computer)
postgres=# select count(*) filter (where a = myleast6a(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 371.775 ms
postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 379.546 ms

-- but still Lua is winner (Lua, second computer)
postgres=# select count(*) filter (where a = myleast4(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 271.235 ms
This optimization is important for PL/V8 but not for Python.

PL/Lua is not well known development environment - although it looks so for similar use cases is second candidate after C language. Still Perl is best for other special use cases due unlimited support of available extensions on CPAN.

Second note: this synthetic benchmark is not pretty fair - in really typical use case a real bottleneck is IO operations and the speed of similar functions should not be significant - this class of databases like Postgres, MSSQL, Oracle is hardly optimized on minimize IO operations. Numeric calculations are secondary target. Probably any IO waits clean differences between these implementations. Third note: The maximal difference on 100K rows is less than 2sec - there are lot of situation where this difference is insignificant. If I run this tests with different number of columns, then I results can be different. This tests shows three kind of costs: transformation from PostgreSQL array to target language array, scripting environment initialization and numeric expressions comparations. A importance of these factors can be different.

Attention: Every usage is specific - today I had to solve a issue of my customer, where code in PL/pgSQL needs about 4 sec, and after rewriting to SQL (where inlining was successful) the query calculation was about 200ms (so SQL is 20x faster). But usually SQL functions are fast, only when the inlining is available.

Friday, December 27, 2013

new PostgreSQL extension - plpgsql_check

Three years ago I worked on complex project, that was written in PL/pgSQL almost. I wrote a regress tests, but these tests was not able to find all errors due missing coverage some paths. So I wrote simple extension, that ensured a plan preparation for all embedded SQL on start. This extension helps me to find 99% bugs. I cleaned this extension and I published it as plpgsql_lint. As a plpgsql developer I am sure, so this functionality is missing in core, so I started in core implementation. Two years later I am still on start. I didn't find how to merge a requested functionality with current code cleanly (so committers will be happy) - probably we have to do some significant plpgsql refactoring first. And now is clean, so this code will not be in prepared release PostgreSQL 9.4. Although a code is not enough for committers, it can be useful for common plpgsql developers.

I started new github project - plpgsql_check. It is plpgsql_lint (next generation). plpgsql_lint checked functions on start. plpgsql_check is coming with plpgsql_check_function. You can check any plpgsql function without execution. It can be joined with event trigger and you can ensure a extended check after function creating.

Any comments, patches, ideas are welcome - you can try send me a email or you can join to related postgresql-extensions-hacking google group.

examples:
postgres=# LOAD 'plpgsql';
LOAD
postgres=# CREATE EXTENSION plpgsql_check;
LOAD
postgres=# CREATE TABLE t1(a int, b int);
CREATE TABLE

postgres=#
CREATE OR REPLACE FUNCTION public.f1()
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE r record;
BEGIN
  FOR r IN SELECT * FROM t1
  LOOP
    RAISE NOTICE '%', r.c; -- there is bug - table t1 missing "c" column
  END LOOP;
END;
$function$;

CREATE FUNCTION

postgres=# select f1(); -- execution doesn't find a bug due empty table t1
  f1 
 ────

 (1 row)

postgres=# \x
Expanded display is on.
postgres=# select * from plpgsql_check_function_tb('f1()');
─[ RECORD 1 ]───────────────────────────
functionid │ f1
lineno     │ 6
statement  │ RAISE
sqlstate   │ 42703
message    │ record "r" has no field "c"
detail     │ [null]
hint       │ [null]
level      │ error
position   │ 0
query      │ [null]

postgres=# \sf+ f1
    CREATE OR REPLACE FUNCTION public.f1()
     RETURNS void
     LANGUAGE plpgsql
1       AS $function$
2       DECLARE r record;
3       BEGIN
4         FOR r IN SELECT * FROM t1
5         LOOP
6           RAISE NOTICE '%', r.c; -- there is bug - table t1 missing "c" column
7         END LOOP;
8       END;
9       $function$

select * from plpgsql_check_function('f1()', fatal_errors := false);
                         plpgsql_check_function                         
------------------------------------------------------------------------
 error:42703:4:SQL statement:column "c" of relation "t1" does not exist
 Query: update t1 set c = 30
 --                   ^
 error:42P01:7:RAISE:missing FROM-clause entry for table "r"
 Query: SELECT r.c
 --            ^
 error:42601:7:RAISE:too few parameters specified for RAISE
(7 rows)

postgres=# select * from plpgsql_check_function('fx()', format:='xml');
                 plpgsql_check_function                     
────────────────────────────────────────────────────────────────
 <function oid="16400">                                        ↵
   <issue>                                                     ↵
     <level>error</level>                                      ↵
     <sqlstate>42P01</Sqlstate>                                ↵
     <message>relation "foo111" does not exist</Message>       ↵
     <stmt lineno="3">RETURN</Stmt>                            ↵
     <query position="23">SELECT (select a from foo111)</Query>↵
   </Issue>                                                    ↵
  </Function>
 (1 row)

Friday, December 20, 2013

some basic PostgreSQL orchestration from Bash

I had to designed some scripts for migration, upgrade and rebuild PostgreSQL. Some fragments can be useful for others:

Prolog:
#!/bin/bash

# Installation prefix
prefix=/usr/local/pgsql

PGCTL="$prefix/bin/pg_ctl"
PGUSER=postgres

if [ ! -n "$PGDATA" ]; then
  PGDATA="$prefix/data"
fi

Ensure a stop of PostgreSQL, continue when server is down:
echo "shutdown server"
su - $PGUSER -c "$PGCTL status -D $PGDATA" 
if [ $? != 1 ]; then
  su - $PGUSER -c "$PGCTL stop -D  $PGDATA -m fast"
  if [ $? != 1 -a $? != 0 ]; then
    su - $PGUSER  -c "$PGCTL status -D  $PGDATA"
    while [ $? != 1 -a $? != 3 ]; do
      echo "PostgreSQL still running.."
      sleep 1
      su - $PGUSER -c "$PGCTL status -D  $PGDATA"
    done
  fi
fi

Start server and continue when server is ready to accept connection:
echo "start server"
/etc/init.d/pgsql start
$prefix/bin/pg_isready
while [ $? != 0 ]; do
  sleep 1;
  $prefix/bin/pg_isready
done