Pavel Stehule's blog: December 2007

Oracle has new function result cache. It's nice idea. There are some use cases, mostly in www applications. There are nothing similar in PostgreSQL. I found simple similar solution in plperl (with important disadvantage to Oracle solution). It works, but without any pooling mechanisms its not really effective.

I created some tables:

CREATE TABLE Books(
id serial PRIMARY KEY,
name VARCHAR(20));

CREATE TABLE Sale(
book_id integer REFERENCES Books(id),
inserted timestamp DEFAULT(CURRENT_TIMESTAMP)
);

INSERT INTO Books VALUES(1,'Dracula');
INSERT INTO Books VALUES(2,'Nosferatu');
INSERT INTO Books VALUES(3,'Bacula');

INSERT INTO Sale VALUES(1, '2007-10-11');
INSERT INTO Sale VALUES(2, '2007-10-12');
INSERT INTO Sale VALUES(2, '2007-10-13');
INSERT INTO Sale VALUES(3, '2007-10-10');

and I created function Top10Books that returns top ten books.

-- Top10
CREATE OR REPLACE FUNCTION Top10Books(IN date, OUT ordr integer, OUT name varchar(20))
RETURNS SETOF RECORD
AS $$
BEGIN
ordr := 0;
FOR name IN SELECT b.name
              FROM Books b
                   JOIN
                   Sale s
                   ON b.id = s.book_id
             WHERE s.inserted BETWEEN date_trunc('month', $1)
                                  AND date_trunc('month', $1)
           + interval '1month' - interval '1day'
             GROUP BY b.name
             ORDER BY count(*) DESC
             LIMIT 10
LOOP
ordr := ordr + 1;
RETURN NEXT;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;

One my customer put similar function on every page on some site (with really destructive impact to load on db server. Correct solution is to use some php cache or some similar tool. With plperl I am able cache result in Postgres too:

CREATE OR REPLACE FUNCTION Top10BooksCached(IN date, IN bool,
       OUT ordr integer, OUT name varchar(20))
RETURNS SETOF RECORD
AS $$
return $_SHARED{tableof_top10book}
    if (defined ($_SHARED{tableof_top10book}) and not (defined($_[1]) and $_[1] eq "t"));
if (not defined($_SHARED{plan_for_top10books}))
{
   $_SHARED{plan_for_top10books} = spi_prepare(
           'SELECT b.name                                                                                                                                                                                                
              FROM Books b                                                                                                                                                                                               
                   JOIN                                                                                                                                                                                                  
                   Sale s                                                                                                                                                                                                
                   ON b.id = s.book_id                                                                                                                                                                                   
             WHERE s.inserted BETWEEN date_trunc(\'month\', $1)                                                                                                                                                          
                                  AND date_trunc(\'month\', $1)
    + interval \'1month\' - interval \'1day\'                                                                                                                
             GROUP BY b.name                                                                                                                                                                                             
             ORDER BY count(*) DESC                                                                                                                                                                                      
             LIMIT 10' , 'DATE');
}
my $row;
my $i = 0;
my $heap;
my $sth = spi_query_prepared($_SHARED{plan_for_top10books}, $_[0]);
while (defined ($row = spi_fetchrow($sth))) {
    push @$heap, {ordr => ++$i, name => $row->{name}}
}
$_SHARED{tableof_top10book} =  $heap ;
return $_SHARED{tableof_top10book};
$$ LANGUAGE plperlu;

or with cache expiration

CREATE OR REPLACE FUNCTION Top10BooksCached(IN date, IN integer,
     OUT ordr integer, OUT name varchar(20))
RETURNS SETOF RECORD
AS $$
return $_SHARED{tableof_top10book}
    if (defined ($_SHARED{tableof_top10book})
            and defined($_SHARED{actualised_top10book})
            and ($_SHARED{actualised_top10book} + $_[1] > time));
if (not defined($_SHARED{plan_for_top10books}))
{
   $_SHARED{plan_for_top10books} = spi_prepare(
           'SELECT b.name                                                                                                                                                                                                
              FROM Books b                                                                                                                                                                                               
                   JOIN                                                                                                                                                                                                  
                   Sale s                                                                                                                                                                                                
                   ON b.id = s.book_id                                                                                                                                                                                   
             WHERE s.inserted BETWEEN date_trunc(\'month\', $1)                                                                                                                                                          
                                  AND date_trunc(\'month\', $1)
    + interval \'1month\' - interval \'1day\'                                                                                                                
             GROUP BY b.name                                                                                                                                                                                             
             ORDER BY count(*) DESC                                                                                                                                                                                      
             LIMIT 10' , 'DATE');
}
my $row;
my $i = 0;
my $heap;
my $sth = spi_query_prepared($_SHARED{plan_for_top10books}, $_[0]);
while (defined ($row = spi_fetchrow($sth))) {
    push @$heap, {ordr => ++$i, name => $row->{name}}
}
$_SHARED{tableof_top10book} =  $heap ;
$_SHARED{actualised_top10book} = time;
return $_SHARED{tableof_top10book};
$$ LANGUAGE plperlu;

An usage is simple:

-- first call is slow
postgres=# select * from Top10BooksCached(current_date, 300);
ordr |   name
------+-----------
1 | Nosferatu
2 | Bacula
3 | Dracula
(3 rows)

Time: 128,965 ms
-- second call is fast
postgres=# select * from Top10BooksCached(current_date, 300);
ordr |   name
------+-----------
1 | Nosferatu
2 | Bacula
3 | Dracula
(3 rows)

Time: 11,911 ms

Use it carefully! It's not good for big tables, and what more, this function can returns out-of-date values.

Labels: plperl, postgresql

Pavel Stehule's blog

Saturday, December 1, 2007

Using $_SHARED as table cache in plperl

About Me

Links

Previous Posts

Archives