Sunday, May 11, 2014

A speed of PL languages for atypical usage

Last week I play with Lua - It is really beautiful, powerful and strange language. I would to some image about this language so I wrote a primitive benchmark. Note: Although Lua is clean winner, I'll use a my favourite PL/pgSQL still - for business logic implementation is really best - and (I have to accent it) any real load limited by IO speed and available IOPS significantly decrease a differences of evaluation speed.

A typical usage of PL languages should be a glue of SQL statements. But sometimes can be useful use these languages for PostgreSQL library enhancing.
I test a simple variadic function - function "least" that I can to compare with native C implementation (buildin). I was little bit surprised by speed of Lua - it is really fast and only one order slower than C implementation - PL/pgSQL is not bad - it is slower than PL/Lua - but only two times (it is relative very fast SQL glue).

-- native implementation
postgres=# select count(*) filter (where a = least(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 55.776 ms
Table foo has about 100K rows.
create table foo(a int, b int, c int, d int, e int);
insert into foo select random()*100, random()*100, random()*100, random()*100, random()*100 from generate_series(1,100000);

postgres=# select count(*) from foo;
 count  
────────
 100000
(1 row)

Time: 21.305 ms
I started with PL/pgSQL
CREATE OR REPLACE FUNCTION public.myleast1(VARIADIC integer[])
 RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  foreach a in array $1
  loop
    if result is null then 
      result := a; 
    elseif a < result then
      result := a;
    end if;
  end loop;
  return result;
end;
$function$
postgres=# select count(*) filter (where a = myleast1(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 996.684 ms
with small optimization (possible due result is not varlena type) it is about 3% faster
CREATE OR REPLACE FUNCTION public.myleast1a(VARIADIC integer[])
 RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  foreach a in array $1
  loop
    if a < result then 
      result := a; 
    else
      result := coalesce(result, a);
    end if;
  end loop;
  return result;
end;
$function$
postgres=# select count(*) filter (where a = myleast1a(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 968.769 ms
Second possible optimization is reduction IF expressions (more than 18% speed-up):
CREATE OR REPLACE FUNCTION public.myleast1b(VARIADIC integer[])
 RETURNS integer
 LANGUAGE plpgsql
 IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  foreach a in array $1
  loop
    if result is null or a < result then 
      result := a; 
    end if;
  end loop;
  return result;
end;
$function$
postgres=# select count(*) filter (where a = myleast1b(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 801.634 ms
or totally removing IF statements (30% speedup, but code is little bit obscure):
CREATE OR REPLACE FUNCTION public.myleast1c(VARIADIC integer[])
 RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  foreach a in array $1
  loop 
    result := case when a < result then a else coalesce(result, a) end; 
  end loop;
  return result;
end;
$function$
postgres=# select count(*) filter (where a = myleast1c(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 691.597 ms
Wrapping SQL in PL/pgSQL doesn't help
CREATE OR REPLACE FUNCTION public.myleast2(VARIADIC integer[])
 RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT
AS $function$
declare result int;
a int;
begin
  return (select min(v) from unnest($1) g(v));
end;
$function$

postgres=# select count(*) filter (where a = myleast2(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 1886.462 ms
Single line SQL functions is not faster than PL/pgSQL - the body of SQL function is not trivial, and Postgres cannot to inline function body effectively
CREATE OR REPLACE FUNCTION public.myleast3(VARIADIC integer[])
 RETURNS integer LANGUAGE sql IMMUTABLE STRICT
AS $function$select min(v) from unnest($1) g(v)$function$

postgres=# select count(*) filter (where a = myleast3(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 1238.185 ms
A winner of this test is implementation in PL/Lua - the code is readable and pretty fast.
CREATE OR REPLACE FUNCTION public.myleast4(VARIADIC a integer[])
 RETURNS integer LANGUAGE pllua IMMUTABLE STRICT
AS $function$
local result;
for k,v in pairs(a) do 
  if result == nil then 
    result = v
  elseif v < result then 
    result = v
  end; 
end
return result;
$function$

postgres=# select count(*) filter (where a = myleast4(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 469.174 ms
By contrast I was surprised a slower speed of PL/Perl (and write code was little bit more difficult). Sometimes I used a perl for similar small functions and looks so Lua is better than Perl for these purposes.
CREATE OR REPLACE FUNCTION public.myleast5(VARIADIC integer[])
 RETURNS integer LANGUAGE plperl IMMUTABLE STRICT
AS $function$
my $result = undef;
for my $value (@{$_[0]} ) {
  if (! defined $result || $value < $result) {
    $result = $value;
  }
}
return $result;
$function$

postgres=# select count(*) filter (where a = myleast5(a,b,c,d,e)) from foo;
 count 
───────
   535
(1 row)

Time: 1591.802 ms
I rechecked PL/Pythonu - it is fast too:
CREATE OR REPLACE FUNCTION public.myleast6(VARIADIC a integer[])
RETURNS integer LANGUAGE plpythonu
AS $function$
r = None
for x in a:
   if r is None or x < r:
      r = x
return r
$function$
postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo;
 count 
───────
 20634
(1 row)

Time: 621.150 ms
I did test of PL/v8 too (tested on different computer than previous tests), and it is slightly (few percent) faster than Python:
CREATE OR REPLACE FUNCTION public.myleast7(VARIADIC a integer[])
 RETURNS integer LANGUAGE plv8 AS $function$
var r = null;
for(var i = 0; i < a.length; i++) { 
  if (r === null || r > a[i]) r = a[i]; 
}
return r;
$function$

-- JavaScript on second computer
postgres=# select count(*) filter (where a = myleast7(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 371.943 ms

-- Python on second computer
postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 388.633 ms
Next day I found so JavaScript and Python has special construct for finding minimum - and I tested it too:
CREATE OR REPLACE FUNCTION public.myleast7a(VARIADIC a integer[])
 RETURNS integer
 LANGUAGE plv8
AS $function$
return Math.min.apply(Math, a);
$function$

-- JavaScript on second computer
postgres=# select count(*) filter (where a = myleast7a(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 331.515 ms

-- Native implementation on second computer
postgres=# select count(*) filter (where a = least(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 35.841 ms

CREATE OR REPLACE FUNCTION public.myleast6a(VARIADIC a integer[])
 RETURNS integer
 LANGUAGE plpythonu
AS $function$
return min(a)
$function$

-- it doesn't help too much (Python, second computer)
postgres=# select count(*) filter (where a = myleast6a(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 371.775 ms
postgres=# select count(*) filter (where a = myleast6(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 379.546 ms

-- but still Lua is winner (Lua, second computer)
postgres=# select count(*) filter (where a = myleast4(a,b,c,d,e)) from foo;
 count 
-------
 20634
(1 row)

Time: 271.235 ms
This optimization is important for PL/V8 but not for Python.

PL/Lua is not well known development environment - although it looks so for similar use cases is second candidate after C language. Still Perl is best for other special use cases due unlimited support of available extensions on CPAN.

Second note: this synthetic benchmark is not pretty fair - in really typical use case a real bottleneck is IO operations and the speed of similar functions should not be significant - this class of databases like Postgres, MSSQL, Oracle is hardly optimized on minimize IO operations. Numeric calculations are secondary target. Probably any IO waits clean differences between these implementations. Third note: The maximal difference on 100K rows is less than 2sec - there are lot of situation where this difference is insignificant. If I run this tests with different number of columns, then I results can be different. This tests shows three kind of costs: transformation from PostgreSQL array to target language array, scripting environment initialization and numeric expressions comparations. A importance of these factors can be different.

Attention: Every usage is specific - today I had to solve a issue of my customer, where code in PL/pgSQL needs about 4 sec, and after rewriting to SQL (where inlining was successful) the query calculation was about 200ms (so SQL is 20x faster). But usually SQL functions are fast, only when the inlining is available.