I am looking to create n-grams from a text column in PostgreSQL. Currently, I split the data (sentences) in a text column on whitespace into an array:
select regexp_split_to_array(sentenceData, E'\\s+') from tableName;
Once I have this array, how can I do the following?
- Create a loop to find the n-grams and write each one as a row to another table
Using unnest, I can get all the elements of all the arrays as separate rows, and maybe I could come up with a way to build n-grams from that single column, but I would lose the sentence boundaries, which I want to preserve.
Sample PostgreSQL code to reproduce the above:
create table tableName(sentenceData text);
INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');
INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');
INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');
select regexp_split_to_array(sentenceData,E'\\s+') from tableName;
select unnest(regexp_split_to_array(sentenceData,E'\\s+')) from tableName;
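
For reference, a loop may not be strictly necessary. The following is a minimal set-based sketch, not a definitive solution, assuming PostgreSQL 9.3 or later (for LATERAL) and bigrams (n = 2). The target table ngrams and its columns are hypothetical names introduced only for illustration. Each sentence's word array is sliced on its own row, so no n-gram can cross a sentence boundary.

```sql
-- Hypothetical target table (name and columns are assumptions, not from the question).
CREATE TABLE ngrams (sentenceData text, ngram text);

-- Bigrams (n = 2): for each start position i, take the slice words[i : i + 1].
-- For a general n, use "array_length(words, 1) - (n - 1)" and "words[i : i + n - 1]".
INSERT INTO ngrams (sentenceData, ngram)
SELECT t.sentenceData,
       array_to_string(t.words[i : i + 1], ' ') AS ngram
FROM (
    -- Split each sentence once; the array stays attached to its own row.
    SELECT sentenceData,
           regexp_split_to_array(sentenceData, E'\\s+') AS words
    FROM tableName
) AS t
-- One row per valid start position; sentences shorter than n produce no rows.
CROSS JOIN LATERAL generate_series(1, array_length(t.words, 1) - 1) AS i;
```

With the sample data, the first sentence would contribute 'This is', 'is a', 'a long', and 'long sentence'. Punctuation stays attached to the words because the split is on whitespace only.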