N-grams from text in PostgreSQL

Question

I want to create n-grams from a text column in PostgreSQL. Currently, I split the data (sentences) in a text column on whitespace into an array:

select regexp_split_to_array(sentenceData, E'\\s+') from tableName

Once I have this array, how do I go about:

Creating a loop to find the n-grams and writing each one to a row in another table

Using unnest, I can get all the elements of all the arrays on separate rows, and maybe I can then think of a way to get n-grams from a single column, but I would lose the sentence boundaries, which I want to preserve.

Sample SQL code for PostgreSQL to reproduce the above scenario:

create table tableName(sentenceData  text);

INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');

INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');

INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');

select regexp_split_to_array(sentenceData,E'\\s+')   from tableName;

select unnest(regexp_split_to_array(sentenceData,E'\\s+')) from tableName;

sql postgresql text-mining

harshsinghal Jun 15 '10 at 12:59

Alex Brasetvik · Accepted Answer · 2010-06-15T15:42:02+0000

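Check out pg_trgm: a PostgreSQL contrib module that provides functions and operators for determining the similarity of text based on trigram matching, as well as index operator classes that support fast searching for similar strings.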
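
Note that pg_trgm works with character trigrams (its show_trgm() function lists them for a given string), not with word-level n-grams. For the word-level case described in the question, a minimal sketch in plain SQL could look like the following. It is not part of pg_trgm; it assumes the tableName/sentenceData sample from the question, and the names params, sentences, positions, sentence_id, pos and ngram are made up for illustration. Since the sample table has no key, a row number is generated as a sentence id, so n-grams never cross sentence boundaries.

```sql
-- Sketch only: word-level n-grams per sentence, assuming the sample table
-- tableName(sentenceData text) from the question.
WITH params AS (
    SELECT 2 AS n                        -- n-gram size (2 = bigrams); hypothetical knob
),
sentences AS (
    -- The sample table has no key, so generate an id per sentence (assumption).
    SELECT row_number() OVER () AS sentence_id,
           regexp_split_to_array(sentenceData, E'\\s+') AS words
    FROM tableName
),
positions AS (
    -- One row per word position, still tied to its own sentence.
    SELECT sentence_id,
           words,
           generate_subscripts(words, 1) AS pos
    FROM sentences
)
SELECT p.sentence_id,
       p.pos,
       -- Slice n consecutive words starting at pos and join them with spaces.
       array_to_string(p.words[p.pos : p.pos + params.n - 1], ' ') AS ngram
FROM positions AS p
CROSS JOIN params
-- Skip start positions that would run past the end of the sentence.
WHERE p.pos + params.n - 1 <= array_upper(p.words, 1)
ORDER BY p.sentence_id, p.pos;
```

For the sentence 'This is a long sentence' and n = 2, this yields "This is", "is a", "a long" and "long sentence". To store the result in another table, the final SELECT can feed an INSERT INTO ... SELECT on a table of your choice; no explicit loop is needed.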
