N-grams from text in PostgreSQL

I am looking to create n-grams from a text column in PostgreSQL. Currently, I split the data (sentences) in a text column on whitespace into an array:

select regexp_split_to_array(sentenceData, E'\\s+') from tableName;

Once I have this array, how do I go about:

  • Creating a loop to find the n-grams and writing each one to a row in another table

Using unnest, I can get all the elements of all the arrays on separate rows, and maybe I can then think of a way to get n-grams from a single column, but I would lose the sentence boundaries, which I want to preserve.

Sample PostgreSQL code to reproduce the scenario above:

create table tableName(sentenceData  text);

INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');

INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');

INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');

select regexp_split_to_array(sentenceData,E'\\s+')   from tableName;

select unnest(regexp_split_to_array(sentenceData,E'\\s+')) from tableName;

Take a look at the pg_trgm module.
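
Note that pg_trgm works on character trigrams inside words, not on word-level n-grams. For word n-grams that stay inside each sentence, one possible approach is to keep the word array per row and slice it, instead of unnesting everything into a single column. The following is only a minimal sketch under some assumptions: the target table ngrams, its columns sentence_id, start_pos and ngram, the params CTE, and n = 3 are all made up for illustration, and it needs PostgreSQL 9.4 or later for cardinality().

```sql
-- Hypothetical target table; the name and columns are assumptions.
CREATE TABLE ngrams (
    sentence_id bigint,  -- surrogate id of the source sentence
    start_pos   int,     -- 1-based position of the n-gram's first word
    ngram       text
);

WITH params AS (
    SELECT 3 AS n        -- n-gram size (assumed; change as needed)
),
sentences AS (
    -- tableName has no key, so row_number() supplies a surrogate id.
    -- Without an ORDER BY the numbering is arbitrary but still unique.
    SELECT row_number() OVER () AS sentence_id,
           regexp_split_to_array(trim(sentenceData), E'\\s+') AS words
    FROM tableName
)
INSERT INTO ngrams (sentence_id, start_pos, ngram)
SELECT s.sentence_id,
       i,
       -- Slice n consecutive words and join them back with spaces.
       array_to_string(s.words[i : i + p.n - 1], ' ')
FROM sentences AS s
CROSS JOIN params AS p
-- One start position per n-gram; sentences shorter than n yield no rows.
CROSS JOIN LATERAL generate_series(1, cardinality(s.words) - p.n + 1) AS i;
```

Because each slice is taken from a single row's array, no n-gram can span two sentences. Punctuation stays attached to the words here, since the split is on whitespace only.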


Source: https://habr.com/ru/post/1750116/

