Pages

Friday, June 4, 2010

the longest common substring of two strings

Hello I had to solve one simple task - to find the longest common substring only in SQL: I have a two functions - one returns all substrings of entered string:
CREATE OR REPLACE FUNCTION substrings(text) 
RETURNS text[] AS $$ 
SELECT array(SELECT DISTINCT substring($1 from i for j) 
                FROM generate_series(1, length($1)) g(i), 
                     generate_series(1, length($1)) h(j)
            )
$$ LANGUAGE sql;

postgres=# SELECT substrings('pavel');
                     substrings                      
-----------------------------------------------------
 {e,vel,avel,p,pav,a,pavel,pa,av,ve,ave,l,pave,el,v}
(1 row)
Second function returns the longest common substrings:
CREATE OR REPLACE FUNCTION max_common_substring(text, text) 
RETURNS text[] AS $$ 
  SELECT ARRAY(SELECT x 
                  FROM (SELECT $1, x, length(x), rank() OVER (ORDER BY length(x) DESC) 
                           FROM unnest(substrings($2)) g(x) 
                          WHERE strpos($1,x) > 0) xx 
                 WHERE rank = 1) 
$$ LANGUAGE sql;

postgres=# SELECT max_common_substring('stěhula','stěhule');
 max_common_substring 
----------------------
 {stěhul}
(1 row)

2 comments:

  1. Using the prefix_range extension:

    prefix=# select 'stěhula'::prefix_range | 'stěhule' as union;
    union
    -------------
    stěhul[a-e]
    (1 row)

    Then in my use case taking apart the common prefix and the range limits was not expected, I could add it pretty easily though.

    ReplyDelete