Wednesday, December 29, 2010

Bitmapset for PL/pgSQL

PostgreSQL has a nice functions for operations over bitmapset. A bitmapset is set of positive integers. It should be smaller than array, and, what is important for me. it support very fast test if some value is in set or not. PostgreSQL internally use this functions, but there are not possibility to use it from SQL (PL/pgSQL). I wrote a wrapper for this functionality and moved it to PST collection - http://pgfoundry.org/frs/download.php/2909/pstcoll-10-12-29.tgz. so there is some preview:
pavel=# select pg_column_size(pst.bitmapset '{1,2,3,4}');
 pg_column_size 
----------------
              8
(1 row)

pavel=# select pg_column_size(array[1,2,3,4]);
 pg_column_size 
----------------
             40
(1 row)

pavel=# select pst.add_members('{}', 1, 2, 4,8);
 add_members 
-------------
 {1,2,4,8}
(1 row)

pavel=# select pst.is_member(pst.add_members('{}', 1, 2, 4,8), 8);
 is_member 
-----------
 t
(1 row)

pavel=# select pst.bitmapset_union('{1,2,3}','{6,2,9}');
 bitmapset_union 
-----------------
 {1,2,3,6,9}
(1 row)
Without bitmapsets we have to use a arrays. But bitmaps are better for storing some flags - it's more adequate tool.
pavel=# select count(*) from omega;
  count  
---------
 1010000
(1 row)

pavel=# select bitmapset_collect(a) from omega;
    bitmapset_collect     
--------------------------
 {0,1,2,3,4,5,6,7,8,9,10}
(1 row)

Time: 558.667 ms

pavel=# select array_agg(distinct a) from omega;
        array_agg         
--------------------------
 {0,1,2,3,4,5,6,7,8,9,10}
(1 row)

Time: 3859.567 ms
Using a bitmapset is about 7x faster.
pavel=# select del_members('{2,3,4,5,1}',2,3);
 del_members 
-------------
 {1,4,5}
(1 row)

pavel=# select bitmapset_is_subset('{1,2,3}','{1,3}');
 bitmapset_is_subset 
---------------------
 f
(1 row)

pavel=# select bitmapset_overlap('{1,2,3}','{1,3}');
 bitmapset_overlap 
-------------------
 t
(1 row)

pavel=# select bitmapset_difference('{1,2,3}','{1,3}');
 bitmapset_difference 
----------------------
 {2}
(1 row)
Probably this simple wrapper can be enhanced - some GiST or GIN index can be nice. The bitmapset is limited only on integers. There is not simple way to use it together with enums for example.

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home