Saturday, March 14, 2015

long term monitoring communication between PostgreSQL client and PostgreSQL server

We have some issue with our application and pgbouncer. We detect some new errors with very low frequency. One possibility how to detect a reason of these issues is monitoring the communication between our application and Postgres. I found a great tool pgShark. But I had to solve two issues.
  1. I have to reduce logged content - lot of messages are unimportant for my purpose or generate lot of content.  pgs-debug hasn't any option for it, so I had to modify source code. You can comment unwanted method. I disabled: Bind, BindComplete, CopyData, DataRow, Describe, Parse, ParseComplete, RowDescription, Sync. After this change the compressed log was few GB per day.
  2. I had a output (log) with attached timestamp. I can do it simply in bash:
    | while read line; do echo `date +"%T.%3N"` $line; done | 
I wrote a line:
unbuffer ./pgs-debug --host 127.0.0.1 -i lo --port 6432 | while read line; do echo `date +"%d %T.%3N"` $line; done | gzip > /mnt/large/pgsharklog.gz

It does what I need:
12:55:13.407 P=1425556513.403313, s=288765600856048 type=SSLRequest, F -> B
12:55:13.408 SSL REQUEST
12:55:13.409
12:55:13.411 P=1425556513.403392, s=288765600856048 type=SSLAnswer, B -> F
12:55:13.412 SSL BACKEND ANSWER: N
12:55:13.414
12:55:13.415 P=1425556513.403486, s=288765600856048 type=StartupMessage, F -> B
12:55:13.416 STARTUP MESSAGE version: 3
12:55:13.418 database=db_lc3hfmn22q8vdt6mhopr2wj4zskyaous
12:55:13.419 application_name=starjoin
12:55:13.420 user=beard
12:55:13.421
12:55:13.423 P=1425556513.403526, s=288765600856048 type=AuthenticationMD5Password, B -> F
12:55:13.424 AUTHENTIFICATION REQUEST code=5 (MD5 salt='fe45f1a1')
12:55:13.425
12:55:13.426 P=1425556513.403577, s=288765600856048 type=PasswordMessage, F -> B
12:55:13.428 PASSWORD MESSAGE password=md5a0cd0711e0e191467bca6e94c03fb50f
12:55:13.429
12:55:13.430 P=1425556513.403614, s=288765600856048 type=AuthenticationOk, B -> F
12:55:13.431 AUTHENTIFICATION REQUEST code=0 (SUCCESS)
12:55:13.433
12:55:13.434 P=1425556513.403614, s=288765600856048 type=ParameterStatus, B -> F
12:55:13.435 PARAMETER STATUS name='integer_datetimes', value='on'
12:55:13.436
12:55:13.437 P=1425556513.403614, s=288765600856048 type=ParameterStatus, B -> F
12:55:13.439 PARAMETER STATUS name='IntervalStyle', value='postgres'
12:55:13.440