Pavel Stehule's blog: the longest common substring of two strings

Friday, June 4, 2010

the longest common substring of two strings

Hello I had to solve one simple task - to find the longest common substring only in SQL: I have a two functions - one returns all substrings of entered string:

CREATE OR REPLACE FUNCTION substrings(text) 
RETURNS text[] AS $$ 
SELECT array(SELECT DISTINCT substring($1 from i for j) 
                FROM generate_series(1, length($1)) g(i), 
                     generate_series(1, length($1)) h(j)
            )
$$ LANGUAGE sql;

postgres=# SELECT substrings('pavel');
                     substrings                      
-----------------------------------------------------
 {e,vel,avel,p,pav,a,pavel,pa,av,ve,ave,l,pave,el,v}
(1 row)

Second function returns the longest common substrings:

CREATE OR REPLACE FUNCTION max_common_substring(text, text) 
RETURNS text[] AS $$ 
  SELECT ARRAY(SELECT x 
                  FROM (SELECT $1, x, length(x), rank() OVER (ORDER BY length(x) DESC) 
                           FROM unnest(substrings($2)) g(x) 
                          WHERE strpos($1,x) > 0) xx 
                 WHERE rank = 1) 
$$ LANGUAGE sql;

postgres=# SELECT max_common_substring('stěhula','stěhule');
 max_common_substring 
----------------------
 {stěhul}
(1 row)

2 Comments:

At June 4, 2010 at 3:29 AM , dim said...: Using the prefix_range extension:

prefix=# select 'stěhula'::prefix_range | 'stěhule' as union;
union
-------------
stěhul[a-e]
(1 row)

Then in my use case taking apart the common prefix and the range limits was not expected, I could add it pretty easily though.
At February 17, 2015 at 9:24 PM , Unknown said...: Very useful post. Thank you.

Pavel Stehule's blog

Friday, June 4, 2010

the longest common substring of two strings

2 Comments:

About Me

Previous Posts