Sunday, May 31, 2015

How to write a robust database application?

  1. A fault tolerant design is cheaper than a fault tolerant application, and a fault tolerant application is significantly cheaper than a fault tolerant database - and more robust. The database has no knowledge of what is critical and what is not, what can be lost at low cost, and what must not be lost.
  2. A database is a good tool for data processing and a wrong tool for communication - the design and implementation of a communication server is simpler and more robust than a communication solution based on master-master replication. Design servers, design protocols, design services - don't lose your time with master-master replication and collision resolution.
  3. Several smaller servers and databases are more robust, cheaper, and more practical than one super server with a super database. Try to minimize dependencies between components - when one component fails, all others should keep working without significant problems for some time (or at least critical services should continue). A strong dependency increases the risk of a total blackout. Think about failures. Reduce strong dependencies wherever it is possible and practical. Data duplication is better than a total blackout.
  4. Be sure you understand well what kind of application you have - for a lot of applications, a 24x7 solution uselessly increases complexity, cost, maintenance cost, and time. The simplest solution is usually the best. But beware of solutions that are too simple!
  5. Think about deployment - strong dependencies between components increase the risk of a total failure when you deploy your solution. Try to make every component deployable without a strong dependency on other components - you should be able to roll back any individual component.

How to write fast database applications?

I would like to write down a few short notes about writing effective database applications.
  1. Whatever can be done simply on the database side should be done there! Don't move useless data from server to client. When you can simply use one statement instead of thousands of statements, use it - thousands of queries in total mean a slow application (a sketch follows this list).
  2. Slow queries are usually your main problem. There are a few typical reasons why queries are slow - some can be solved simply, some not. An aggregation over a billion rows cannot be fast. You have to know your slow queries well - without this knowledge, you cannot do any well targeted optimization. If you have to run a slow query, don't repeat it when you can avoid it. Try to remove slow queries wherever possible - cache, indexes, redesign, partitioning, .. Try to understand why your query is slow - missing index, bad estimation, big data, too many rows, locks, ... (see the EXPLAIN sketch after this list).
  3. Don't use a relational database as a cache - any other software is better for that.
  4. Use a cache when you expect a higher load - it is not necessary for small data and a small load (an intranet for fewer than hundreds of users, data under 10GB), but everywhere else a cache is necessary - for slow queries and frequent queries. Memory is good - use it intelligently.
  5. Ensure good monitoring of your database and your server - you have to know where your hardware bottleneck is - IO, CPU, writing to the transaction log, checkpoints (see the pg_stat_statements sketch after this list).
  6. Don't use relational databases everywhere - transactional relational databases are perfect for typical OLTP purposes; elsewhere, a simple application over files can be better. In the last decade a lot of specialized databases have appeared - they can be faster, cheaper, and more robust on special workloads.
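A minimal sketch of point 1 - the table orders and its columns are hypothetical, used only for illustration. One set-based statement replaces thousands of client-server round trips:

-- slow: the client iterates and sends one statement per row
-- UPDATE orders SET status = 'archived' WHERE id = $1;   -- executed N times

-- fast: one statement, the database iterates internally
UPDATE orders
   SET status = 'archived'
 WHERE created < now() - interval '1 year';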
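
For point 2, EXPLAIN ANALYZE is the basic diagnostic tool (the same hypothetical orders table as above). A sequential scan on a big table with a selective filter often means a missing index:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- when the plan shows a Seq Scan here, an index can help
CREATE INDEX ON orders (customer_id);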
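
For point 5, one practical option is the pg_stat_statements extension - a sketch, assuming the extension is available and preloaded via shared_preload_libraries (column names as in PostgreSQL 9.x):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- the ten statements with the highest cumulative execution time
SELECT query, calls, total_time
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 10;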

Friday, May 29, 2015

New features in PostgreSQL 9.5

I wrote an article about 9.5 in Czech. Google Translate is available there - so I hope it can be interesting for somebody here too.

Monday, May 18, 2015

faster plpgsql in PostgreSQL 9.5 (second part)

Tom Lane introduced a new state for TOAST types - the expanded state. A side effect of this change, together with some changes in plpgsql internals, has a big impact on the speed of array operations in plpgsql.
I have two plpgsql blocks with loops. The first performs updates of array fields, the second appends to an array.
DO $$ 
DECLARE a int[] = array_fill(10, ARRAY[10000]);
BEGIN
FOR i IN 1 .. 10000 LOOP
  a[i] := 0;
END LOOP;
END;
$$;

DO $$ 
DECLARE a int[] = '{}';
BEGIN
FOR i IN 1 .. 10000 LOOP
  a := a || 10;
END LOOP;
END;
$$;
You can try this code on PostgreSQL 9.4 - you will get times around 450 ms and 220 ms. The same code needs only 6 ms and 5 ms on PostgreSQL 9.5! That is more than an order of magnitude speedup.
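
If you want to reproduce the measurement yourself, psql's built-in timing is enough (the exact milliseconds will of course differ on your hardware):

\timing on
-- now run both DO blocks above; psql prints the execution time of each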

Sunday, May 10, 2015

faster plpgsql in PostgreSQL 9.5 (first part)

PL/pgSQL is an important part of PostgreSQL. The implementation of plpgsql is terribly simple, which has two impacts (one positive, one negative).
  • We are able to implement new features very simply - the implementation of some new functionality like ASSERT or RAISE WITH CONTEXT needs only a few hours.
  • Some usual features from other languages are missing, or their implementation is suboptimal. But there is significant progress in the upcoming PostgreSQL 9.5.
One less known issue of PL/pgSQL is casting. PL/pgSQL uses IO casting - every time the type or typmod of variables is not identical, the plpgsql runtime transforms values from the source type to text and back to the target type. This IO casting is very simple to implement, but it can be a well hidden performance killer. plpgsql_check can raise warnings about it. You can see the impact of this IO cast in the following test:
-- tested on PostgreSQL 9.4 (synthetic test, worst case) 
-- IO cast
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i; end loop; end $$;
DO
Time: 6708.727 ms
-- enforced binary casts
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i::numeric; end loop; end $$;
DO
Time: 4738.093 ms
Tom Lane wrote and committed a patch that enforces binary casting everywhere it is possible. Where it is impossible, IO casting is used as before.
-- tested on PostgreSQL 9.5
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i; end loop; end $$;
DO
Time: 3992.551 ms
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i::numeric; end loop; end $$;
DO
Time: 3693.739 ms

There is still some overhead, but it is significantly smaller - 30% versus 7.5%. Another positive: PostgreSQL 9.5 is about 30% faster in this test (it is a synthetic test, so reality will differ, but the expectation that 9.5 will be faster than 9.4 is valid).
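
As mentioned above, plpgsql_check can warn about these hidden IO casts statically. A small sketch - the function fx is hypothetical, and plpgsql_check must be installed as an extension:

CREATE EXTENSION IF NOT EXISTS plpgsql_check;

CREATE OR REPLACE FUNCTION fx()
RETURNS void AS $$
DECLARE s numeric = 0;
BEGIN
  FOR i IN 1..100 LOOP
    s := i;  -- integer assigned to a numeric variable
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- with performance warnings enabled, the implicit cast should be reported
SELECT * FROM plpgsql_check_function('fx()', performance_warnings := true);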