Thursday, October 21, 2010

diff string for PostgreSQL

Hello, I am working on migrating a large dataset from an archive to a new format. Part of this work is searching for small differences between the original and the transformed data. For this purpose I implemented a small library for PostgreSQL. It contains two functions: diff_string and lc_substring. These functions should support multibyte encodings. I hope this will be useful to someone.
postgres=# select lc_substring('Hello World','ello');
 lc_substring 
──────────────
 ello
(1 row)

postgres=# select diff_string('Hello World','ello');
         diff_string         
─────────────────────────────
 <del>H</>ello<del> World</>
(1 row)
This library is available on pgFoundry: http://pgfoundry.org/frs/?group_id=1000457

2 comments:

  1. It isn't clear from the example what lc_substring does. Does it search for and lowercase a substring, if found? If the second argument has to be a lowercase string, what is the point? What happens if the string differs in case sensitivity, e.g., lc_substring('Hello World', 'world')?

  2. lc_substring returns the longest common substring. It is case sensitive. The function diff_string is based on calling this function recursively.

    so lc_substring('Hello World', 'world') -> orld
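
The recursive idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's actual code: the two function names are borrowed from the post, the longest common substring is found with a simple dynamic-programming table, and the `<ins>…</>` marker for text present only in the second string is a guess (the post shows only `<del>…</>`).

```python
def lc_substring(a: str, b: str) -> str:
    """Return the longest common substring of a and b (case sensitive)."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match inside a
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def diff_string(a: str, b: str) -> str:
    """Mark up how a differs from b by recursing around the longest common substring."""
    common = lc_substring(a, b)
    if not common:
        # Nothing shared: all of a was removed, all of b was added.
        out = ""
        if a:
            out += f"<del>{a}</>"
        if b:
            out += f"<ins>{b}</>"  # assumed marker, not shown in the post
        return out
    i = a.index(common)
    j = b.index(common)
    # Diff the parts left and right of the shared piece independently.
    left = diff_string(a[:i], b[:j])
    right = diff_string(a[i + len(common):], b[j + len(common):])
    return left + common + right


print(lc_substring("Hello World", "world"))  # orld
print(diff_string("Hello World", "ello"))    # <del>H</>ello<del> World</>
```

Python strings are sequences of code points, so this sketch handles multibyte text without extra work; a C implementation inside PostgreSQL would have to step through characters according to the database encoding.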
