After implementing full-text search in my application using SQLite and FTS tables, I am looking for an efficient way to extract the FULL inverted index from my FTS table. Essentially, I need a result table that maps every term to the documents it appears in, together with a count (term -> docid -> number of occurrences).
Following the SQLite FTS documentation, after creating the tables ...
-- Create an FTS4 table
CREATE VIRTUAL TABLE ft USING fts4(x, y);

-- Create an fts4aux table to access the full-text index for table "ft"
CREATE VIRTUAL TABLE ft_terms USING fts4aux(ft);
... and inserting some content ...
INSERT INTO ft(x, y) VALUES('Apple banana', 'Cherry');
INSERT INTO ft(x, y) VALUES('Banana Date Date', 'cherry');
INSERT INTO ft(x, y) VALUES('Cherry Elderberry', 'Elderberry');
... I get only the terms and their document and occurrence counts aggregated over all documents, which is what the fts4aux table provides ...
SELECT term, col, documents, occurrences FROM ft_terms;
-- apple       |  *  |  1  |  1
-- apple       |  0  |  1  |  1
-- banana      |  *  |  2  |  2
-- banana      |  0  |  2  |  2
-- cherry      |  *  |  3  |  3
-- cherry      |  0  |  1  |  1
-- cherry      |  1  |  2  |  2
-- date        |  *  |  1  |  2
-- date        |  0  |  1  |  2
-- elderberry  |  *  |  1  |  2
-- elderberry  |  0  |  1  |  1
-- elderberry  |  1  |  1  |  1
My result should look like this:
term        | col | docid | occurrences
------------+-----+-------+------------
apple       |  0  |   1   |  1
banana      |  0  |   1   |  1
banana      |  0  |   2   |  1
cherry      |  0  |   3   |  1
cherry      |  1  |   1   |  1
cherry      |  1  |   2   |  1
date        |  0  |   2   |  2
elderberry  |  0  |   3   |  1
elderberry  |  1  |   3   |  1
I am not sure that simply running a MATCH query for every term in the document collection is efficient enough. Maybe there is a more direct way?
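For reference, the brute-force approach I have in mind can be sketched roughly as follows. This is only a minimal sketch, assuming Python's built-in sqlite3 module is linked against an SQLite build with FTS4 enabled. The helper name inverted_index is hypothetical; it reads the distinct terms from the fts4aux table, runs one MATCH query per term, and counts hits per column and docid from the output of the FTS auxiliary function offsets(), which reports four integers per hit (column number, query-term index, byte offset, byte length).

```python
import sqlite3
from collections import Counter


def inverted_index(conn, table="ft", aux="ft_terms"):
    """Return sorted (term, col, docid, occurrences) rows for an FTS4 table.

    Hypothetical helper: `table` and `aux` are the FTS4 and fts4aux table
    names from the question and are interpolated into the SQL as-is.
    """
    # The col = '*' rows of fts4aux list every distinct term exactly once.
    terms = [row[0] for row in conn.execute(
        f"SELECT term FROM {aux} WHERE col = '*' ORDER BY term")]

    query = (f"SELECT docid, offsets({table}) "
             f"FROM {table} WHERE {table} MATCH ?")
    rows = []
    for term in terms:
        counts = Counter()
        # Quote the term so it is treated as a plain token, not query syntax.
        for docid, offs in conn.execute(query, (f'"{term}"',)):
            # offsets() is a space-separated list with 4 integers per hit;
            # every 4th value, starting at index 0, is the column number.
            for col in offs.split()[0::4]:
                counts[(int(col), docid)] += 1
        rows.extend((term, col, docid, n)
                    for (col, docid), n in sorted(counts.items()))
    return rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE VIRTUAL TABLE ft USING fts4(x, y);
    CREATE VIRTUAL TABLE ft_terms USING fts4aux(ft);
    INSERT INTO ft(x, y) VALUES('Apple banana', 'Cherry');
    INSERT INTO ft(x, y) VALUES('Banana Date Date', 'cherry');
    INSERT INTO ft(x, y) VALUES('Cherry Elderberry', 'Elderberry');
""")

for row in inverted_index(conn):
    print(row)
```

This issues one MATCH query per distinct term, so the cost grows with the vocabulary size, which is exactly the part I suspect will not scale well.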