Sunday, May 31, 2015

How to write a robust database application?

  1. A fault tolerant design is cheaper than a fault tolerant application, and a fault tolerant application is significantly cheaper (and more robust) than a fault tolerant database. The database has no knowledge of what is critical and what is not, what can be lost at little cost, and what must not be lost.
  2. A database is a good tool for data processing and the wrong tool for communication. A dedicated communication server with a designed protocol is simpler and more robust than a communication solution based on master-master replication. Design servers, design protocols, design services - don't lose your time with master-master replication and conflict resolution.
  3. Several smaller servers and databases are more robust, cheaper and more practical than one super server with a super database. Try to minimize dependencies between components - when one component fails, all others should keep working without significant problems for some time (or at least the critical services should continue). Strong dependencies increase the risk of a total blackout. Think about failures. Reduce strong dependencies whenever it is possible and practical. Data duplication is better than a total blackout.
  4. Be sure you understand well what kind of application you have - for many applications a 24x7 solution uselessly increases complexity, cost, maintenance and time. The simplest solution is usually the best. But beware of too simple solutions!
  5. Think about deployment - strong dependencies between components increase the risk of total failure when you deploy your solution. Try to make every component deployable without a strong dependency on the others - you should be able to roll back any individual component.

How to write fast database applications?

I would like to write some short notes about writing effective database applications.
  1. What can be done simply on the database side should be done there! Don't move useless data from server to client. When you can simply use one statement instead of thousands of statements, use it - thousands of queries in total mean a slow application (see the first sketch after this list).
  2. Slow queries are mainly your problem. There are a few typical reasons why queries are slow - some can be solved simply, some not. An aggregation over a billion rows cannot be fast. You have to know your slow queries well; without this knowledge you cannot do any well targeted optimization. If you have to run a slow query, don't repeat it when possible. Always try to remove slow queries - with a cache, indexes, a redesign, partitioning, ... Try to understand why your query is slow - a missing index, a bad estimate, big data, too many rows, locks, ... (see the second sketch after this list).
  3. Don't use a relational database as a cache - almost any other software is better.
  4. Use a cache when you expect higher load. It is not necessary for small data and small load (an intranet for fewer than hundreds of users, data under 10GB); everywhere else a cache is necessary for slow queries and frequent queries. Memory is good - use it intelligently.
  5. Ensure good monitoring of your database and your server - you have to know where your hardware bottleneck is: IO, CPU, writing to the transaction log, checkpoints.
  6. Don't use relational databases everywhere - transactional relational databases are perfect for typical OLTP purposes; elsewhere some simple application over files can be better. In the last decade a lot of specialized databases have appeared - they can be faster, cheaper and more robust on special workloads.
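
To illustrate the first point, a minimal sketch contrasting a client-side loop with one server-side statement (the orders table and its columns are hypothetical):

-- slow: the client sends thousands of statements, one network round trip each
--   for each id in the application: UPDATE orders SET status = 'archived' WHERE id = :id;

-- fast: one statement, the whole operation stays on the server
UPDATE orders
   SET status = 'archived'
 WHERE created < now() - interval '1 year';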
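
For the second and fifth points, the usual starting tools are EXPLAIN ANALYZE for one query and the pg_stat_statements extension for finding the most expensive queries overall. A minimal sketch (the first query uses the same hypothetical orders table; total_time is the column name in the 9.x versions of pg_stat_statements):

-- why is this one query slow? check the plan, row estimates and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*) FROM orders WHERE created > now() - interval '1 day';

-- which queries hurt the most? requires CREATE EXTENSION pg_stat_statements
SELECT query, calls, total_time, rows
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 10;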

Friday, May 29, 2015

New features in PostgreSQL 9.5

I wrote an article about 9.5 in Czech. Google Translate is available, so I hope it can be interesting for somebody here too.

Monday, May 18, 2015

faster plpgsql in PostgreSQL 9.5 (second part)

Tom Lane introduced a new state for TOAST types - the expanded state. The side effect of this change, together with some changes in plpgsql internals, has a big impact on the speed of array operations in plpgsql.
I have two plpgsql blocks with loops. The first enforces array field updates, the second array appends.
DO $$ 
DECLARE a int[] = array_fill(10, ARRAY[10000]);
BEGIN
FOR i IN 1 .. 10000 LOOP
  a[i] := 0;
END LOOP;
END;
$$;

DO $$ 
DECLARE a int[] = '{}';
BEGIN
FOR i IN 1 .. 10000 LOOP
  a := a || 10;
END LOOP;
END;
$$;
You can try this code on PostgreSQL 9.4 - you get times of about 450 ms and 220 ms. The same code needs only 6 ms and 5 ms on PostgreSQL 9.5! That is more than one order of magnitude speedup.
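
The numbers above can be reproduced with psql's \timing, or measured directly inside the block with clock_timestamp() - a minimal sketch for the append case:

DO $$
DECLARE
  a int[] = '{}';
  t0 timestamptz := clock_timestamp();
BEGIN
  FOR i IN 1 .. 10000 LOOP
    a := a || 10;
  END LOOP;
  -- prints the elapsed time as an interval
  RAISE NOTICE 'append loop: %', clock_timestamp() - t0;
END;
$$;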

Sunday, May 10, 2015

faster plpgsql in PostgreSQL 9.5 (first part)

PL/pgSQL is an important part of PostgreSQL. The implementation of plpgsql is terribly simple, which has two impacts (one positive, one negative):
  • We are able to implement new features very simply - the implementation of some new functionality like ASSERT or RAISE WITH CONTEXT needs only a few hours.
  • Some usual features from other languages are missing, or their implementation is suboptimal. But there is significant progress in the upcoming PostgreSQL 9.5.
One less known issue of PLpgSQL is casting. PLpgSQL uses IO casting - every time the type or typmod of variables is not identical, the plpgsql runtime transforms values from the source type to text and back to the target type. This IO casting is very simple to implement, but it can be a well hidden performance killer. plpgsql_check can raise warnings about it. You can see the impact of this IO cast in the following test:
-- tested on PostgreSQL 9.4 (synthetic test, worst case) 
-- IO cast
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i; end loop; end $$;
DO
Time: 6708.727 ms
-- enforced binary casts
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i::numeric; end loop; end $$;
DO
Time: 4738.093 ms
Tom Lane wrote and committed a patch that enforces binary casting everywhere it is possible. Where it is impossible, IO casting is used as before.
-- tested on PostgreSQL 9.5
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i; end loop; end $$;
DO
Time: 3992.551 ms
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i::numeric; end loop; end $$;
DO
Time: 3693.739 ms
    
There is still some overhead, but it is significantly less: 30% versus 7.5%. Another positive: PostgreSQL 9.5 is about 30% faster in this test (it is a synthetic test, so real numbers will differ, but the expectation that 9.5 will be faster than 9.4 is valid).
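
As mentioned above, plpgsql_check can warn about these hidden IO casts. A minimal sketch, assuming the plpgsql_check extension is installed (test_cast is only an example function; the exact warning text may differ):

CREATE OR REPLACE FUNCTION test_cast()
RETURNS void AS $$
DECLARE s numeric = 0;
BEGIN
  FOR i IN 1 .. 100 LOOP
    s := i;  -- an integer is assigned to a numeric variable, a cast is needed
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- performance_warnings => true enables warnings about casts and other
-- performance related issues
SELECT * FROM plpgsql_check_function('test_cast()', performance_warnings => true);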

Saturday, March 14, 2015

long term monitoring of communication between PostgreSQL client and server

We had some issues with our application and pgbouncer: we detected some new errors with very low frequency. One way to find the reason for these issues is to monitor the communication between our application and Postgres. I found a great tool, pgShark, but I had to solve two issues.
1. I had to reduce the logged content - a lot of messages are unimportant for my purpose or generate a lot of content. pgs-debug has no option for this, so I had to modify the source code: you can comment out the unwanted message types. I disabled: Bind, BindComplete, CopyData, DataRow, Describe, Parse, ParseComplete, RowDescription, Sync. After this change the compressed log was a few GB per day.
2. I wanted the output (log) with an attached timestamp. I can do it simply in bash:
| while read line; do echo `date +"%T.%3N"` $line; done |
So I wrote this line:
unbuffer ./pgs-debug --host 127.0.0.1 -i lo --port 6432 | while read line; do echo `date +"%d %T.%3N"` $line; done | gzip > /mnt/large/pgsharklog.gz

It does what I need:
12:55:13.407 P=1425556513.403313, s=288765600856048 type=SSLRequest, F -> B
12:55:13.408 SSL REQUEST
12:55:13.409
12:55:13.411 P=1425556513.403392, s=288765600856048 type=SSLAnswer, B -> F
12:55:13.412 SSL BACKEND ANSWER: N
12:55:13.414
12:55:13.415 P=1425556513.403486, s=288765600856048 type=StartupMessage, F -> B
12:55:13.416 STARTUP MESSAGE version: 3
12:55:13.418 database=db_lc3hfmn22q8vdt6mhopr2wj4zskyaous
12:55:13.419 application_name=starjoin
12:55:13.420 user=beard
12:55:13.421
12:55:13.423 P=1425556513.403526, s=288765600856048 type=AuthenticationMD5Password, B -> F
12:55:13.424 AUTHENTIFICATION REQUEST code=5 (MD5 salt='fe45f1a1')
12:55:13.425
12:55:13.426 P=1425556513.403577, s=288765600856048 type=PasswordMessage, F -> B
12:55:13.428 PASSWORD MESSAGE password=md5a0cd0711e0e191467bca6e94c03fb50f
12:55:13.429
12:55:13.430 P=1425556513.403614, s=288765600856048 type=AuthenticationOk, B -> F
12:55:13.431 AUTHENTIFICATION REQUEST code=0 (SUCCESS)
12:55:13.433
12:55:13.434 P=1425556513.403614, s=288765600856048 type=ParameterStatus, B -> F
12:55:13.435 PARAMETER STATUS name='integer_datetimes', value='on'
12:55:13.436
12:55:13.437 P=1425556513.403614, s=288765600856048 type=ParameterStatus, B -> F
12:55:13.439 PARAMETER STATUS name='IntervalStyle', value='postgres'
12:55:13.440
    

Saturday, February 7, 2015

plpgsql_check is available for Microsoft Windows

I compiled plpgsql_check with Visual Studio 2010 Express. The dll files for PostgreSQL 9.2, 9.3 and 9.4 on both x86 and x64 platforms are available in a zip archive: http://pgsql.cz/files/plpgsql_check-1.0.1-1-mswin.zip.

Installation

1. Download, unzip and choose the related dll file.
2. Rename it to plpgsql_check.dll and copy it to PostgreSQL's lib directory (Program Files/PostgreSQL/9.2/lib).
3. Copy plpgsql_check-1.0.sql and plpgsql_check.control to PostgreSQL's share/extension directory (PostgreSQL/9.2/share/extension).

You may need to install the Microsoft Visual C++ 2010 SP1 Redistributable Package: http://www.microsoft.com/en-us/download/details.aspx?id=8328.
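
When the files are in place, the extension can be created and verified from psql as usual (my_function is only a placeholder):

CREATE EXTENSION plpgsql_check;

-- a quick check that the library loads and the extension works
SELECT * FROM plpgsql_check_function('my_function()');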

Please test it.