Wednesday, July 15, 2015

aliasing psql

When you run psql without arguments, psql tries to connect to a database named after the current user. I dislike this behavior. Surely I have no database "pavel". There is a trivial solution - add this code to your .bashrc:
function psql { 
  if [[ $# -eq 0 ]]; then
    env psql postgres
  else
    env psql "$@"
  fi
}
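To see the dispatch logic without a running PostgreSQL server, here is a minimal sketch where `env psql` is replaced by `echo` (the `psql_wrapper` name and the echo stand-ins are mine, for illustration only):

```shell
# Same dispatch as the .bashrc function, with "env psql" stubbed by echo
# so it can be tried without a server (the stub is only for illustration).
psql_wrapper() {
  if [[ $# -eq 0 ]]; then
    echo "env psql postgres"   # no arguments: default to the postgres db
  else
    echo "env psql $*"         # otherwise pass everything through
  fi
}

psql_wrapper            # -> env psql postgres
psql_wrapper mydb -At   # -> env psql mydb -At
```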

Tuesday, July 14, 2015

simple parallel run statement in every database in PostgreSQL cluster

When I need to execute some statement in every database of a PostgreSQL cluster, I use a script:
for db in `psql -At -c "select datname from pg_database where datname not in ('template0','template1')"`; do
  psql -At -c "select current_database(), pg_size_pretty(pg_table_size('pg_attribute')) where pg_table_size('pg_attribute') > 100 * 1024 * 1024" $db; 
done
Today I needed to run a VACUUM statement on selected databases, so I had to find a way to run this slow statement. I was surprised how simple this task is thanks to the powerful xargs command. A nice bonus - I can run these slow queries in parallel, because xargs can run the entered command in multiple workers (the -P option):
# find databases with a bloated pg_attribute table, and enforce VACUUM
for db in `psql -At -c "select datname from pg_database where datname not in ('template0','template1')"`; do
  psql -At -c "select current_database() where pg_table_size('pg_attribute') > 100 * 1024 * 1024" $db; 
done | xargs -P 3 -I % psql % -c "vacuum full verbose analyze pg_attribute"


The previous example can be simplified a little bit:
psql postgres -At -c "select datname from pg_database where datname not in ('template0','template1')" |
xargs -P 3 -I % psql % -c "select current_database() where pg_table_size('pg_attribute') > 100 * 1024 * 1024" |
xargs -P 3 -I % psql % -c "vacuum full analyze pg_attribute"
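The same fan-out pattern can be tried safely by replacing psql with echo: this sketch is only a dry run that prints the commands xargs would execute, with up to three workers (the database names are made up):

```shell
# Dry run of the xargs fan-out: print the psql commands instead of
# executing them, running up to 3 workers in parallel (-P 3).
printf '%s\n' db1 db2 db3 db4 |
xargs -P 3 -I % echo psql % -c "vacuum full analyze pg_attribute"
```

Because the workers run concurrently, the output lines can appear in any order.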

Saturday, July 11, 2015

New Orafce and Plpgsql_check extensions released

I released new versions of these packages: Orafce and Plpgsql_check. They are mostly bugfix releases with PostgreSQL 9.5 support.

Sunday, May 31, 2015

How to write robust database application?

  1. A fault tolerant design is cheaper than a fault tolerant application, and a fault tolerant application is significantly cheaper (and more robust) than a fault tolerant database. The database has no knowledge of what is critical and what is not, what can be lost at low cost, and what must not be lost.
  2. A database is a good tool for data processing and a wrong tool for communication - the design and implementation of a communication server is simpler and more robust than a communication solution based on master-master replication. Design servers, design protocols, design services - don't lose your time with master-master replication and conflict resolution.
  3. Several smaller servers and databases are more robust, cheaper and more practical than one super server with one super database. Try to minimize dependencies between components - when one component fails, all others should keep working without significant problems for some time (or at least critical services should continue). Strong dependencies increase the risk of a total blackout. Think about failures. Reduce strong dependencies wherever it is possible and practical. Data duplication is better than a total blackout.
  4. Be sure you understand well what kind of application yours is - for lots of applications a 24x7 solution uselessly increases complexity, cost, maintenance and time. The simplest solution is usually the best. But beware of too simple solutions!
  5. Think about deployment - strong dependencies between components increase the risk of total failure when you deploy your solution. Try to make every component deployable without strong dependencies on other components - you should be able to roll back any individual component.

How to write fast database applications?

I would like to write short notes about writing effective database applications.
  1. Whatever can be done easily on the database side should be done there! Don't move useless data from server to client. When you can simply use one statement instead of thousands of statements, use it. Thousands of queries in total mean a slow application.
  2. Slow queries are your main problem. There are a few typical reasons why queries are slow - some can be solved simply, some not. An aggregation over a billion rows cannot be fast. You have to know your slow queries well. Without this knowledge, you cannot do any well targeted optimization. If you have to run a slow query, don't repeat it if possible. Try to remove slow queries wherever you can - cache, indexes, redesign, partitioning, ... Try to understand why your query is slow - a missing index, a bad estimation, big data, too many rows, locks, ...
  3. Don't use a relational database as a cache - almost any other software is better.
  4. Use a cache when you expect higher load - it is not necessary for small data and small load (an intranet for fewer than hundreds of users, data under 10GB); everywhere else a cache is necessary - for slow queries and frequent queries. Memory is good - use it intelligently.
  5. Ensure good monitoring of your database and your server - you have to know where your hardware bottleneck is - IO, CPU, writing to the transaction log, checkpoints.
  6. Don't use relational databases everywhere - transactional relational databases are perfect for typical OLTP purposes - elsewhere some simple application over files can be better. In the last decade lots of specialized databases have appeared - they can be faster, cheaper and more robust for special workloads.
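Point 1 above can be illustrated on the client side: instead of looping and sending one query per id, build a single statement with an id list. A hypothetical shell sketch (the `items` table, column names and id values are made up; the echoed text stands in for queries that would be sent to psql):

```shell
ids="1 2 3 4 5"

# Anti-pattern: one round trip per id - thousands of ids mean
# thousands of queries in total.
for id in $ids; do
  echo "select * from items where id = $id;"
done

# Better: one statement fetching all rows in a single round trip.
in_list=$(echo $ids | tr ' ' ',')
echo "select * from items where id in ($in_list);"
# -> select * from items where id in (1,2,3,4,5);
```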

Friday, May 29, 2015

New features in PostgreSQL 9.5

I wrote an article about 9.5 in the Czech language. There is Google Translate - so I hope it can be interesting for somebody here too.

Monday, May 18, 2015

faster plpgsql in PostgreSQL 9.5 (second part)

Tom Lane introduced a new state for TOAST types - the expanded state. The side effect of this change, together with some changes in plpgsql internals, has a big impact on the speed of array operations in plpgsql.
I have two plpgsql blocks with cycles. The first enforces array field updates, the second array appends.
DO $$ 
DECLARE a int[] = array_fill(10, ARRAY[10000]);
BEGIN
  FOR i IN 1 .. 10000 LOOP
    a[i] := 0;
  END LOOP;
END $$;

DO $$ 
DECLARE a int[] = '{}';
BEGIN
  FOR i IN 1 .. 10000 LOOP
    a := a || 10;
  END LOOP;
END $$;
You can try this code on PostgreSQL 9.4 - you get times of about 450 and 220 ms. The same code needs only 6 and 5 ms on PostgreSQL 9.5! It is more than one order of magnitude speedup.