Get inverted index from SQLite FTS table

After implementing full-text search in my application using SQLite and FTS tables, I am looking for an efficient way to extract the FULL inverted index from my FTS table. Essentially, I need a result table that maps every term -> docid -> number of occurrences.

Following the SQLite FTS documentation, after creating the tables ...

    -- Create an FTS4 table
    CREATE VIRTUAL TABLE ft USING fts4(x, y);

    -- Create an fts4aux table to access the full-text index for table "ft"
    CREATE VIRTUAL TABLE ft_terms USING fts4aux(ft);

... and inserting content ...

    INSERT INTO ft(x, y) VALUES('Apple banana', 'Cherry');
    INSERT INTO ft(x, y) VALUES('Banana Date Date', 'cherry');
    INSERT INTO ft(x, y) VALUES('Cherry Elderberry', 'Elderberry');

... instead of getting only the terms and their occurrence counts summed over all documents, which is what the fts4aux table provides ...

    SELECT term, col, documents, occurrences FROM ft_terms;

    -- term        | col | documents | occurrences
    -- apple       |  *  |  1  |  1
    -- apple       |  0  |  1  |  1
    -- banana      |  *  |  2  |  2
    -- banana      |  0  |  2  |  2
    -- cherry      |  *  |  3  |  3
    -- cherry      |  0  |  1  |  1
    -- cherry      |  1  |  2  |  2
    -- date        |  *  |  1  |  2
    -- date        |  0  |  1  |  2
    -- elderberry  |  *  |  1  |  2
    -- elderberry  |  0  |  1  |  1
    -- elderberry  |  1  |  1  |  1

My result should look like this:

    -- term        | col | docid | occurrences
    -- -----------------------------------------
    -- apple       |  0  |  1  |  1
    -- banana      |  0  |  1  |  1
    -- banana      |  0  |  2  |  1
    -- cherry      |  0  |  3  |  1
    -- cherry      |  1  |  1  |  1
    -- cherry      |  1  |  2  |  1
    -- date        |  0  |  2  |  2
    -- elderberry  |  0  |  3  |  1
    -- elderberry  |  1  |  3  |  1

I'm still not sure whether simply running a MATCH query for every term against the whole document collection is efficient enough. Maybe there is a more direct way?
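For reference, here is a minimal sketch of the "one MATCH query per term" approach I have in mind, written in Python with the standard sqlite3 module. It assumes the SQLite build has FTS4 enabled and that the table uses the default "simple" tokenizer, so that a term read from ft_terms matches itself when fed back into MATCH. The function name build_inverted_index is my own, not part of SQLite. It counts occurrences by parsing the output of offsets(), which returns four integers per match (column number, query term number, byte offset, byte size).

```python
import sqlite3
from collections import Counter


def build_inverted_index(conn, table="ft", aux="ft_terms"):
    """Return a Counter mapping (term, col, docid) -> occurrences.

    Hypothetical helper: runs one MATCH query per distinct term listed in
    the fts4aux table and counts the matches reported by offsets().
    """
    index = Counter()
    # Materialize the term list first so the cursors do not interleave.
    terms = [row[0] for row in conn.execute(f"SELECT DISTINCT term FROM {aux}")]
    for term in terms:
        # Quote the term so it is treated as a single-token phrase.
        query = '"' + term + '"'
        rows = conn.execute(
            f"SELECT docid, offsets({table}) FROM {table} WHERE {table} MATCH ?",
            (query,),
        )
        for docid, offsets in rows:
            numbers = [int(n) for n in offsets.split()]
            # Every match contributes 4 integers; the first is the column.
            for i in range(0, len(numbers), 4):
                index[(term, numbers[i], docid)] += 1
    return index


# Same schema and rows as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE VIRTUAL TABLE ft USING fts4(x, y);
    CREATE VIRTUAL TABLE ft_terms USING fts4aux(ft);
    INSERT INTO ft(x, y) VALUES('Apple banana', 'Cherry');
    INSERT INTO ft(x, y) VALUES('Banana Date Date', 'cherry');
    INSERT INTO ft(x, y) VALUES('Cherry Elderberry', 'Elderberry');
""")

index = build_inverted_index(conn)
for (term, col, docid), occurrences in sorted(index.items()):
    print(term, col, docid, occurrences)
```

This issues as many queries as there are distinct terms, which is exactly the cost I am worried about for a large collection.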


Source: https://habr.com/ru/post/1400859/

