A typical usage of PL languages should be a glue of SQL statements. But sometimes can be useful use these languages for PostgreSQL library enhancing.
I test a simple variadic function - function "least" that I can to compare with native C implementation (buildin). I was little bit surprised by speed of Lua - it is really fast and only one order slower than C implementation - PL/pgSQL is not bad - it is slower than PL/Lua - but only two times (it is relative very fast SQL glue).
-- native implementation postgres=# select count(*) filter (where a = least(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 55.776 msTable foo has about 100K rows.
create table foo(a int, b int, c int, d int, e int); insert into foo select random()*100, random()*100, random()*100, random()*100, random()*100 from generate_series(1,100000); postgres=# select count(*) from foo; count ──────── 100000 (1 row) Time: 21.305 msI started with PL/pgSQL
CREATE OR REPLACE FUNCTION public.myleast1(VARIADIC integer[]) RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT AS $function$ declare result int; a int; begin foreach a in array $1 loop if result is null then result := a; elseif a < result then result := a; end if; end loop; return result; end; $function$ postgres=# select count(*) filter (where a = myleast1(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 996.684 mswith small optimization (possible due result is not varlena type) it is about 3% faster
CREATE OR REPLACE FUNCTION public.myleast1a(VARIADIC integer[]) RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT AS $function$ declare result int; a int; begin foreach a in array $1 loop if a < result then result := a; else result := coalesce(result, a); end if; end loop; return result; end; $function$ postgres=# select count(*) filter (where a = myleast1a(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 968.769 msSecond possible optimization is reduction IF expressions (more than 18% speed-up):
CREATE OR REPLACE FUNCTION public.myleast1b(VARIADIC integer[]) RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT AS $function$ declare result int; a int; begin foreach a in array $1 loop if result is null or a < result then result := a; end if; end loop; return result; end; $function$ postgres=# select count(*) filter (where a = myleast1b(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 801.634 msor totally removing IF statements (30% speedup, but code is little bit obscure):
CREATE OR REPLACE FUNCTION public.myleast1c(VARIADIC integer[]) RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT AS $function$ declare result int; a int; begin foreach a in array $1 loop result := case when a < result then a else coalesce(result, a) end; end loop; return result; end; $function$ postgres=# select count(*) filter (where a = myleast1c(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 691.597 msWrapping SQL in PL/pgSQL doesn't help
CREATE OR REPLACE FUNCTION public.myleast2(VARIADIC integer[]) RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT AS $function$ declare result int; a int; begin return (select min(v) from unnest($1) g(v)); end; $function$ postgres=# select count(*) filter (where a = myleast2(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 1886.462 msSingle line SQL functions is not faster than PL/pgSQL - the body of SQL function is not trivial, and Postgres cannot to inline function body effectively
CREATE OR REPLACE FUNCTION public.myleast3(VARIADIC integer[]) RETURNS integer LANGUAGE sql IMMUTABLE STRICT AS $function$select min(v) from unnest($1) g(v)$function$ postgres=# select count(*) filter (where a = myleast3(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 1238.185 msA winner of this test is implementation in PL/Lua - the code is readable and pretty fast.
CREATE OR REPLACE FUNCTION public.myleast4(VARIADIC a integer[]) RETURNS integer LANGUAGE pllua IMMUTABLE STRICT AS $function$ local result; for k,v in pairs(a) do if result == nil then result = v elseif v < result then result = v end; end return result; $function$ postgres=# select count(*) filter (where a = myleast4(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 469.174 msBy contrast I was surprised a slower speed of PL/Perl (and write code was little bit more difficult). Sometimes I used a perl for similar small functions and looks so Lua is better than Perl for these purposes.
CREATE OR REPLACE FUNCTION public.myleast5(VARIADIC integer[]) RETURNS integer LANGUAGE plperl IMMUTABLE STRICT AS $function$ my $result = undef; for my $value (@{$_[0]} ) { if (! defined $result || $value < $result) { $result = $value; } } return $result; $function$ postgres=# select count(*) filter (where a = myleast5(a,b,c,d,e)) from foo; count ─────── 535 (1 row) Time: 1591.802 msI rechecked PL/Pythonu - it is fast too:
CREATE OR REPLACE FUNCTION public.myleast6(VARIADIC a integer[]) RETURNS integer LANGUAGE plpythonu AS $function$ r = None for x in a: if r is None or x < r: r = x return r $function$ postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo; count ─────── 20634 (1 row) Time: 621.150 msI did test of PL/v8 too (tested on different computer than previous tests), and it is slightly (few percent) faster than Python:
CREATE OR REPLACE FUNCTION public.myleast7(VARIADIC a integer[]) RETURNS integer LANGUAGE plv8 AS $function$ var r = null; for(var i = 0; i < a.length; i++) { if (r === null || r > a[i]) r = a[i]; } return r; $function$ -- JavaScript on second computer postgres=# select count(*) filter (where a = myleast7(a,b,c,d,e)) from foo; count ------- 20634 (1 row) Time: 371.943 ms -- Python on second computer postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo; count ------- 20634 (1 row) Time: 388.633 msNext day I found so JavaScript and Python has special construct for finding minimum - and I tested it too:
CREATE OR REPLACE FUNCTION public.myleast7a(VARIADIC a integer[]) RETURNS integer LANGUAGE plv8 AS $function$ return Math.min.apply(Math, a); $function$ -- JavaScript on second computer postgres=# select count(*) filter (where a = myleast7a(a,b,c,d,e)) from foo; count ------- 20634 (1 row) Time: 331.515 ms -- Native implementation on second computer postgres=# select count(*) filter (where a = least(a,b,c,d,e)) from foo; count ------- 20634 (1 row) Time: 35.841 ms CREATE OR REPLACE FUNCTION public.myleast6a(VARIADIC a integer[]) RETURNS integer LANGUAGE plpythonu AS $function$ return min(a) $function$ -- it doesn't help too much (Python, second computer) postgres=# select count(*) filter (where a = myleast6a(a,b,c,d,e)) from foo; count ------- 20634 (1 row) Time: 371.775 ms postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo; count ------- 20634 (1 row) Time: 379.546 ms -- but still Lua is winner (Lua, second computer) postgres=# select count(*) filter (where a = myleast4(a,b,c,d,e)) from foo; count ------- 20634 (1 row) Time: 271.235 msThis optimization is important for PL/V8 but not for Python.
PL/Lua is not well known development environment - although it looks so for similar use cases is second candidate after C language. Still Perl is best for other special use cases due unlimited support of available extensions on CPAN.
Second note: this synthetic benchmark is not pretty fair - in really typical use case a real bottleneck is IO operations and the speed of similar functions should not be significant - this class of databases like Postgres, MSSQL, Oracle is hardly optimized on minimize IO operations. Numeric calculations are secondary target. Probably any IO waits clean differences between these implementations. Third note: The maximal difference on 100K rows is less than 2sec - there are lot of situation where this difference is insignificant. If I run this tests with different number of columns, then I results can be different. This tests shows three kind of costs: transformation from PostgreSQL array to target language array, scripting environment initialization and numeric expressions comparations. A importance of these factors can be different.
Attention: Every usage is specific - today I had to solve a issue of my customer, where code in PL/pgSQL needs about 4 sec, and after rewriting to SQL (where inlining was successful) the query calculation was about 200ms (so SQL is 20x faster). But usually SQL functions are fast, only when the inlining is available.
Although an interesting exercise, I think it really represents a false result. Stored procedures/Functions are best used when multiple operations must be executed to get the correct result.
ReplyDeleteI would also like to see some tests against plV8 and plpython.
@Joshua - I wrote, so this benchmark is synthetic - so it is only one part of the true, but similar functions are used - sometimes for emulation other databases or higher postgresql releases. But some results are realistic - Postgres comparative functions are more expensive then native, and Perl comparative functions are slower too, due possible casting from strings.
ReplyDeleteI bet PL/V8 will be the fastest.
ReplyDelete@Neil - I cannot to compile PL/V8 on my Scientific Linux and gcc 4.4.7. I'll check it office this week
ReplyDelete@Neil, No, Lua is faster - PL/V8 is lightly faster than Python, but Lua is about 30% faster than PL/V8 - tested on Intel Lenovo T520
ReplyDeleteWhen did
ReplyDeleteSELECT .. FILTER (WHERE ... )
get postgres support ?
@Anonymous: It is new feature from preparing 9.4 (planned release autumn 2014)
ReplyDelete