Hello
I had to solve one simple task - to find the longest common substring only in SQL:
I have a two functions - one returns all substrings of entered string:
CREATE OR REPLACE FUNCTION substrings(text)
RETURNS text[] AS $$
SELECT array(SELECT DISTINCT substring($1 from i for j)
FROM generate_series(1, length($1)) g(i),
generate_series(1, length($1)) h(j)
)
$$ LANGUAGE sql;
postgres=# SELECT substrings('pavel');
substrings
-----------------------------------------------------
{e,vel,avel,p,pav,a,pavel,pa,av,ve,ave,l,pave,el,v}
(1 row)
Second function returns the longest common substrings:
CREATE OR REPLACE FUNCTION max_common_substring(text, text)
RETURNS text[] AS $$
SELECT ARRAY(SELECT x
FROM (SELECT $1, x, length(x), rank() OVER (ORDER BY length(x) DESC)
FROM unnest(substrings($2)) g(x)
WHERE strpos($1,x) > 0) xx
WHERE rank = 1)
$$ LANGUAGE sql;
postgres=# SELECT max_common_substring('stěhula','stěhule');
max_common_substring
----------------------
{stěhul}
(1 row)
Using the prefix_range extension:
ReplyDeleteprefix=# select 'stěhula'::prefix_range | 'stěhule' as union;
union
-------------
stěhul[a-e]
(1 row)
Then in my use case taking apart the common prefix and the range limits was not expected, I could add it pretty easily though.
Very useful post. Thank you.
ReplyDelete