Saturday, December 1, 2007

Using $_SHARED as table cache in plperl

Oracle has new function result cache. It's nice idea. There are some use cases, mostly in www applications. There are nothing similar in PostgreSQL. I found simple similar solution in plperl (with important disadvantage to Oracle solution). It works, but without any pooling mechanisms its not really effective.

I created some tables:
CREATE TABLE Books(
id serial PRIMARY KEY,
name VARCHAR(20));

CREATE TABLE Sale(
book_id integer REFERENCES Books(id),
inserted timestamp DEFAULT(CURRENT_TIMESTAMP)
);

INSERT INTO Books VALUES(1,'Dracula');
INSERT INTO Books VALUES(2,'Nosferatu');
INSERT INTO Books VALUES(3,'Bacula');

INSERT INTO Sale VALUES(1, '2007-10-11');
INSERT INTO Sale VALUES(2, '2007-10-12');
INSERT INTO Sale VALUES(2, '2007-10-13');
INSERT INTO Sale VALUES(3, '2007-10-10');
and I created function Top10Books that returns top ten books.
-- Top10
CREATE OR REPLACE FUNCTION Top10Books(IN date, OUT ordr integer, OUT name varchar(20))
RETURNS SETOF RECORD
AS $$
BEGIN
ordr := 0;
FOR name IN SELECT b.name
FROM Books b
JOIN
Sale s
ON b.id = s.book_id
WHERE s.inserted BETWEEN date_trunc('month', $1)
AND date_trunc('month', $1)
+ interval '1month' - interval '1day'
GROUP BY b.name
ORDER BY count(*) DESC
LIMIT 10
LOOP
ordr := ordr + 1;
RETURN NEXT;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
One my customer put similar function on every page on some site (with really destructive impact to load on db server. Correct solution is to use some php cache or some similar tool. With plperl I am able cache result in Postgres too:
CREATE OR REPLACE FUNCTION Top10BooksCached(IN date, IN bool,
OUT ordr integer, OUT name varchar(20))
RETURNS SETOF RECORD
AS $$
return $_SHARED{tableof_top10book}
if (defined ($_SHARED{tableof_top10book}) and not (defined($_[1]) and $_[1] eq "t"));
if (not defined($_SHARED{plan_for_top10books}))
{
$_SHARED{plan_for_top10books} = spi_prepare(
'SELECT b.name
FROM Books b
JOIN
Sale s
ON b.id = s.book_id
WHERE s.inserted BETWEEN date_trunc(\'month\', $1)
AND date_trunc(\'month\', $1)
+ interval \'1month\' - interval \'1day\'
GROUP BY b.name
ORDER BY count(*) DESC
LIMIT 10' , 'DATE');
}
my $row;
my $i = 0;
my $heap;
my $sth = spi_query_prepared($_SHARED{plan_for_top10books}, $_[0]);
while (defined ($row = spi_fetchrow($sth))) {
push @$heap, {ordr => ++$i, name => $row->{name}}
}
$_SHARED{tableof_top10book} = $heap ;
return $_SHARED{tableof_top10book};
$$ LANGUAGE plperlu;
or with cache expiration
CREATE OR REPLACE FUNCTION Top10BooksCached(IN date, IN integer,
OUT ordr integer, OUT name varchar(20))
RETURNS SETOF RECORD
AS $$
return $_SHARED{tableof_top10book}
if (defined ($_SHARED{tableof_top10book})
and defined($_SHARED{actualised_top10book})
and ($_SHARED{actualised_top10book} + $_[1] > time));
if (not defined($_SHARED{plan_for_top10books}))
{
$_SHARED{plan_for_top10books} = spi_prepare(
'SELECT b.name
FROM Books b
JOIN
Sale s
ON b.id = s.book_id
WHERE s.inserted BETWEEN date_trunc(\'month\', $1)
AND date_trunc(\'month\', $1)
+ interval \'1month\' - interval \'1day\'
GROUP BY b.name
ORDER BY count(*) DESC
LIMIT 10' , 'DATE');
}
my $row;
my $i = 0;
my $heap;
my $sth = spi_query_prepared($_SHARED{plan_for_top10books}, $_[0]);
while (defined ($row = spi_fetchrow($sth))) {
push @$heap, {ordr => ++$i, name => $row->{name}}
}
$_SHARED{tableof_top10book} = $heap ;
$_SHARED{actualised_top10book} = time;
return $_SHARED{tableof_top10book};
$$ LANGUAGE plperlu;
An usage is simple:
-- first call is slow
postgres=# select * from Top10BooksCached(current_date, 300);
ordr | name
------+-----------
1 | Nosferatu
2 | Bacula
3 | Dracula
(3 rows)

Time: 128,965 ms
-- second call is fast
postgres=# select * from Top10BooksCached(current_date, 300);
ordr | name
------+-----------
1 | Nosferatu
2 | Bacula
3 | Dracula
(3 rows)

Time: 11,911 ms
Use it carefully! It's not good for big tables, and what more, this function can returns out-of-date values.

Labels: ,