SQLite FTS4 with preferred language

I have a SQLite table that was created with the FTS4 module. Each record is stored at least twice, once per language, and the copies share the same identifier (an int column, not indexed). Here is what I want to do: search for a term in my preferred language, then combine the result with a search for the same term in a different language. For the second search, however, I want to ignore all records (identified by their identifier) that were already found by the first search. So basically I want to do this:

    WITH term_search1 AS (
        SELECT * FROM myFts
        WHERE myFts MATCH 'term' AND languageId = 1
    )
    SELECT * FROM term_search1
    UNION
    SELECT * FROM myFts
    WHERE myFts MATCH 'term' AND languageId = 2
      AND id NOT IN (SELECT id FROM term_search1)

The problem is that the term_search1 query will be executed twice. Is there a way to materialize its results? Any solution that limits this to two queries (instead of three) would be great.

I also tried using recursive queries, for example:

    WITH RECURSIVE term_search1 AS (
        SELECT * FROM myFts
        WHERE myFts MATCH 'term' AND languageId = 1
        UNION ALL
        SELECT m.* FROM myFts m
        LEFT OUTER JOIN term_search1 t ON (m.id = t.id)
        WHERE myFts MATCH 'term' AND m.languageId = 2 AND t.id IS NULL
    )
    SELECT * FROM term_search1

That didn't work either. It apparently just ran the search for languageId = 2 twice (maybe this is a bug?).

Thank you in advance:)

2 answers

You can use a temporary table to reduce the number of queries against myFts to two:

    CREATE TEMP TABLE results (id INTEGER PRIMARY KEY);

    INSERT INTO results
        SELECT id FROM myFts
        WHERE myFts MATCH 'term' AND languageId = 1;

    INSERT INTO results
        SELECT id FROM myFts
        WHERE myFts MATCH 'term' AND languageId = 2
          AND id NOT IN (SELECT id FROM results);

    SELECT * FROM myFts WHERE id IN (SELECT id FROM results);

    DROP TABLE results;

If you can change the schema, store only text data in the FTS table. That way you avoid incorrect results when the search term is a number or string that also matches a value in a column such as languageId. Create a separate meta table for the non-textual data (e.g. id and languageId) and filter the rows by joining it on the rowid of myFts. You then need to query the FTS table only once: use a temporary table to store the FTS results, then use the meta table to order them.
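A minimal sketch of that split schema, driven from Python's built-in sqlite3 module so it can be run as-is (this assumes the underlying SQLite build has FTS4 enabled). The names docsFts, docsMeta, hits and search are hypothetical and not from the question; the sample rows are the same ones used in the other answer's test. The MATCH runs only once, and the language preference is resolved against the meta table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: only text goes into the FTS table,
    -- id and languageId live in an ordinary meta table.
    CREATE VIRTUAL TABLE docsFts USING fts4(content);
    CREATE TABLE docsMeta (
        docid      INTEGER PRIMARY KEY,  -- same value as docsFts.docid
        id         INTEGER NOT NULL,
        languageId INTEGER NOT NULL
    );
""")

sample = [
    (10, 1, "term 10 lang 1"),
    (10, 2, "term 10 lang 2"),
    (11, 1, "term 11 lang 1"),
    (12, 2, "term 12 lang 2"),
    (13, 1, "not_erm 13 lang 1"),
    (13, 2, "term 13 lang 2"),
]
for docid, (rec_id, lang, text) in enumerate(sample, start=1):
    # Set docid explicitly so both tables share the same key.
    conn.execute("INSERT INTO docsFts(docid, content) VALUES (?, ?)", (docid, text))
    conn.execute(
        "INSERT INTO docsMeta(docid, id, languageId) VALUES (?, ?, ?)",
        (docid, rec_id, lang),
    )


def search(conn, term, preferred, fallback):
    """Return (id, languageId, content) rows, preferring one language per id."""
    # The only full-text search: store the matching docids in a temp table.
    conn.execute("CREATE TEMP TABLE hits (docid INTEGER PRIMARY KEY)")
    conn.execute(
        "INSERT INTO hits SELECT rowid FROM docsFts WHERE docsFts MATCH ?", (term,)
    )
    # Resolve the language preference on the meta table; the join back to
    # docsFts is a plain rowid lookup, not another MATCH.
    rows = conn.execute(
        """
        SELECT m.id, m.languageId, f.content
        FROM hits h
        JOIN docsMeta m ON m.docid = h.docid
        JOIN docsFts  f ON f.rowid = h.docid
        WHERE m.languageId = :pref
           OR (m.languageId = :fb
               AND m.id NOT IN (SELECT m2.id
                                FROM hits h2
                                JOIN docsMeta m2 ON m2.docid = h2.docid
                                WHERE m2.languageId = :pref))
        ORDER BY m.id
        """,
        {"pref": preferred, "fb": fallback},
    ).fetchall()
    conn.execute("DROP TABLE hits")
    return rows


results = search(conn, "term", preferred=1, fallback=2)
for row in results:
    print(row)
```

With the sample data, ids 10 and 11 come back in language 1, while ids 12 and 13 fall back to language 2 because their language 1 rows are either missing or do not match. In a real schema the meta table would probably also want an index on (id, languageId); that is left out here for brevity.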


This is the best I can think of:

    SELECT *
    FROM myFts t1
    JOIN (SELECT COUNT(*) AS cnt, id
          FROM myFts t2
          WHERE t2.languageId in (1, 2)
          AND t2.myFts MATCH 'term'
          GROUP BY t2.id) t3
    ON t1.id = t3.id
    WHERE t1.myFts MATCH 'term'
    AND t1.languageId in (1, 2)
    AND (t1.languageId = 1 or t3.cnt = 1)

I'm not sure whether the second MATCH clause is required. The idea is to first compute the acceptable rows, and then choose the best one.

Edit: I have no idea why this doesn't work with your table. This is what I did to test it (SQLite version 3.8.10.2):

    CREATE VIRTUAL TABLE myFts USING fts4(
        id integer,
        languageId integer,
        content TEXT
    );
    insert into myFts(id, languageId, content) values (10, 1, 'term 10 lang 1');
    insert into myFts(id, languageId, content) values (10, 2, 'term 10 lang 2');
    insert into myFts(id, languageId, content) values (11, 1, 'term 11 lang 1');
    insert into myFts(id, languageId, content) values (12, 2, 'term 12 lang 2');
    insert into myFts(id, languageId, content) values (13, 1, 'not_erm 13 lang 1');
    insert into myFts(id, languageId, content) values (13, 2, 'term 13 lang 2');

Running the query gives:

    sqlite> SELECT *
       ...> FROM myFts t1
       ...> JOIN (SELECT COUNT(*) AS cnt, id
       ...>       FROM myFts t2
       ...>       WHERE t2.languageId in (1, 2)
       ...>       AND t2.myFts MATCH 'term'
       ...>       GROUP BY t2.id) t3
       ...> ON t1.id = t3.id
       ...> WHERE t1.myFts MATCH 'term'
       ...> AND t1.languageId in (1, 2)
       ...> AND (t1.languageId = 1 or t3.cnt = 1);
    10|1|term 10 lang 1|2|10
    11|1|term 11 lang 1|1|11
    12|2|term 12 lang 2|1|12
    13|2|term 13 lang 2|1|13
    sqlite>

Source: https://habr.com/ru/post/983912/

