How to use full-text search in a sqlite3 database in Django?

I am working on a Django application with a sqlite3 database whose content is fixed. By fixed, I mean that the contents of the db will not change over time. The model looks something like this:

    class QScript(models.Model):
        ch_no = models.IntegerField()
        v_no = models.IntegerField()
        v = models.TextField()

The table contains about 6,500 entries. Given a text that may contain several words, some of them possibly misspelled, I need to determine its ch_no and v_no. For example, if the db has a v field with the text "This is an example verse", then the input "This is an egsample verse" should give me the ch_no and v_no of that row. I am considering full-text search as a way to do this.

My questions are:

  • Can full-text search do this? From what I have read, my guess is that it can, since the sqlite3 page says: full-text search is "what Google, Yahoo, and Bing do with documents hosted on the World Wide Web." Following a reference on SO, I read this article along with many others, but did not find anything that closely matches my requirements.

  • How do I use FTS in Django models? I read this one, but it didn't help; it seems too outdated. I also read here, which says: "... requires direct manipulation of the database to add the full-text index." Searching turns up mostly MySQL-related information, but I need to do this in sqlite3. So how do I do this direct manipulation in sqlite3?


Edit:

Am I right to stick with sqlite3? Or should I use something else (e.g. haystack + elasticsearch, as Alex Morozov said)? My db won't grow any further, and I have learned that for small dbs sqlite is almost always the better choice (my situation matches the fourth item in sqlite's "when to use" checklist).

2 answers

I think that while sqlite is an amazing piece of software, its full-text search capabilities are quite limited. Instead, you could index your database using the Haystack Django app with a backend such as Elasticsearch. Having this setup (while still keeping your data in the sqlite database) seems to me the most robust and flexible way to do FTS.

Elasticsearch has a fuzzy search based on Levenshtein distance (in a nutshell, it will handle your "egsample" queries). So all you need to do is build the right kind of query:

    from django import forms
    from haystack import indexes
    from haystack.forms import SearchForm
    from haystack.generic_views import SearchView

    # Assumes QScript lives in this app's models.py
    from .models import QScript


    class QScriptIndex(indexes.SearchIndex, indexes.Indexable):
        # The document field is named "text" (the Haystack convention) so that
        # it matches the text__fuzzy lookup below; model_attr pulls in QScript.v.
        text = indexes.CharField(document=True, model_attr='v')

        def get_model(self):
            return QScript


    class QScriptSearchForm(SearchForm):
        text_fuzzy = forms.CharField(required=False)

        def search(self):
            sqs = super().search()
            if not self.is_valid():
                return self.no_query_found()
            text_fuzzy = self.cleaned_data.get('text_fuzzy')
            if text_fuzzy:
                # Fuzzy (edit-distance) match against the indexed verse text
                sqs = sqs.filter(text__fuzzy=text_fuzzy)
            return sqs


    class QScriptSearchView(SearchView):
        form_class = QScriptSearchForm

Update: since PostgreSQL has a Levenshtein distance function (in its fuzzystrmatch extension), you could also use it, either as a Haystack backend or as a standalone search engine. If you choose the second option, you will need to write a custom query expression, which is relatively simple if you use a recent version of Django.
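
To make the idea of Levenshtein-based matching concrete, here is a minimal pure-Python sketch. It is not how Elasticsearch or PostgreSQL implement it (Elasticsearch, as far as I know, applies a bounded edit distance per term rather than to the whole string); the function names `levenshtein` and `closest_verse` and the sample rows are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def closest_verse(query, verses):
    """Return the (ch_no, v_no, text) row whose text is nearest to query.

    verses is any iterable of (ch_no, v_no, text) tuples (hypothetical data).
    """
    return min(verses, key=lambda row: levenshtein(query.lower(), row[2].lower()))


# Hypothetical sample rows standing in for the QScript table
sample = [
    (1, 1, "This is an example verse"),
    (1, 2, "Something else entirely"),
]
ch_no, v_no, _ = closest_verse("This is an egsample verse", sample)
print(ch_no, v_no)
```

A brute-force scan like this compares the query against every row, which may be tolerable for a few thousand short rows but does not scale the way an indexed search engine does.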


The SQLite FTS engine is based on tokens - keywords that the search engine is trying to match.

Various tokenizers are available, but they are relatively simple. The "simple" tokenizer just splits the text into words and lowercases them: for example, in the string "The quick brown fox jumps over the lazy dog", the word "jumps" will match, but not "jump". The "porter" tokenizer is a little more advanced and strips word endings, so that "jumps" and "jumping" will match, but a typo like "jmups" will not.
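
The tokenizer behaviour described above can be checked with Python's built-in `sqlite3` module. This is only a sketch: it assumes the SQLite library bundled with your Python was compiled with FTS4 support (most are, but not all), and the helper name `matches` and the table name `docs` are made up for the example.

```python
import sqlite3


def matches(tokenizer: str, query: str) -> bool:
    """Index one sentence with the given FTS4 tokenizer and test a query."""
    conn = sqlite3.connect(":memory:")
    # tokenizer is inserted directly into the SQL, so pass only trusted
    # values such as "simple" or "porter"
    conn.execute(
        f"CREATE VIRTUAL TABLE docs USING fts4(body, tokenize={tokenizer})"
    )
    conn.execute(
        "INSERT INTO docs (body) VALUES (?)",
        ("The quick brown fox jumps over the lazy dog",),
    )
    count = conn.execute(
        "SELECT count(*) FROM docs WHERE body MATCH ?", (query,)
    ).fetchone()[0]
    conn.close()
    return count > 0


for tokenizer, query in [
    ("simple", "jumps"),
    ("simple", "jump"),
    ("porter", "jumping"),
    ("porter", "jmups"),
]:
    print(tokenizer, query, matches(tokenizer, query))
```

If FTS4 is available, this should report a match for "jumps" with the simple tokenizer and for "jumping" with porter, and no match for "jump" (simple) or "jmups" (porter).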

In short, the SQLite FTS extension is fairly simple and not designed to compete with, say, Google.

As for Django integration, I don't think there is any. You will probably need to use Django's raw SQL query interface to create and query the FTS table.
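
As a rough sketch of what creating and querying such an FTS table could look like, the example below uses Python's built-in `sqlite3` module so that it runs on its own. The table names `qscript` and `qscript_fts`, the sample rows, and the helper `find_verse` are assumptions for illustration; in a real project the model's table would be named after the app (something like `<app_label>_qscript`), and the same SQL statements could be sent through `django.db.connection.cursor()`, using `%s` placeholders instead of `?`. FTS4 support in the underlying SQLite build is also assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the table Django would create for the QScript model
conn.execute(
    "CREATE TABLE qscript ("
    "id INTEGER PRIMARY KEY, ch_no INTEGER, v_no INTEGER, v TEXT)"
)
conn.executemany(
    "INSERT INTO qscript (ch_no, v_no, v) VALUES (?, ?, ?)",
    [
        (1, 1, "This is an example verse"),
        (1, 2, "Another verse entirely"),
    ],
)

# Build the full-text index once; the content is fixed, so no triggers
# are needed to keep it in sync. docid ties each FTS row to qscript.id.
conn.execute("CREATE VIRTUAL TABLE qscript_fts USING fts4(v, tokenize=porter)")
conn.execute("INSERT INTO qscript_fts (docid, v) SELECT id, v FROM qscript")


def find_verse(conn, text):
    """Return (ch_no, v_no) pairs for rows whose text contains every word in text.

    The text is passed to MATCH as-is, so characters that are special in the
    FTS query syntax (quotes, OR, NEAR, ...) would need escaping in real use.
    """
    sql = (
        "SELECT ch_no, v_no FROM qscript "
        "WHERE id IN (SELECT docid FROM qscript_fts WHERE v MATCH ?) "
        "ORDER BY ch_no, v_no"
    )
    return conn.execute(sql, (text,)).fetchall()


print(find_verse(conn, "example verse"))
print(find_verse(conn, "egsample verse"))
```

The second query illustrates the limitation discussed above: a misspelled word is just a different token, so plain FTS finds nothing for it.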


Source: https://habr.com/ru/post/1447113/