Thursday, June 6, 2024

More complex log in json format processing

 I needed to summarize number of bugs per week per user

JSON log entry looks like:
{"timestamp":"2024-06-07 00:10:56.525 CEST","user":"backend","dbname":"prd","pid":3557301,"remote_host":"199.16.203.8",
"remote_port":31315,"session_id":"666229de.3647b5","line_num":3,"ps":"INSERT","session_start":"2024-06-06 23:27:58 CEST","vxid":"67/48252790",
"txid":2033150293,"error_severity":"ERROR","state_code":"23505","message":"duplicate key value violates unique constraint \"payment_from_account_trid_key\"",
"detail":"Key (trid)=(440607272834519) already exists.","statement":"insert into ... values ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10)\nRETURNING *","application_name":"PostgreSQL JDBC Driver",
"backend_type":"client backend","query_id":8536052635720301574}
cat postgresql-*.json | \
jq -r 'select(.error_severity=="ERROR") | select (.application_name | (startswith("pgAdmin") or startswith("IntelliJ"))) ' | \
jq -rs 'group_by (.user, .message) | map({user: .[0].user, message: .[0].message, count: length}) | .[] | [.count, .user, .message] | @tsv'
I had to learn so grouping inside jq is allowed only on json array (and result is an array again), so I needed to merge input records to array by using -s, --slurp option. For final processing, I need to translate result array just to set of records by syntax .[]. Transformation to tsv (tab separated data) are used for better readability than csv. result in tsv format can looks like:
1	lukas	relation "item_image" does not exist
1	lukas	relation "item_image_share" does not exist
3	petr	UNION types "char" and text cannot be matched
2	petr	canceling statement due to user request
6	petr	current transaction is aborted, commands ignored until end of transaction block
1	petr	duplicate key value violates unique constraint "idx_op"
1	prd_jenkins_flyway	relation "item_shipping" does not exist
jq language is powerful and strange. It is strange to use | pipe symbol inside an expression - like filter on application_name looks:
.application_name | startswith("xxx") or startswith("yyyy")
Unfortunately, I didn't find a documentation named "gentle introduction to jq for people who knows SQL and C", so using jq language looks scary , but it is working pretty well.