The best-known pluggable app for this is Django-Haystack, which lets you connect to several search engine backends.
Haystack lets you query these search engines through an API similar to Django's own QuerySet syntax, rather than using each engine's own API and dialect directly.
If you use scraping tools, then whichever one you pick (BeautifulSoup or Scrapy), you will be on your own: you write the Python code that parses whatever you want to extract and then populates your Django models.
These can even be standalone Python scripts, made available in a commands.py module.
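To make the "parse, then populate your models" step concrete, here is a minimal sketch. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup or Scrapy so that it runs without extra packages. The markup (elements with `class="title"` and `class="author"`), the `BookParser` class and the `parse_books` helper are all hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class BookParser(HTMLParser):
    """Collect title/author records from a hypothetical listing page."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("title", "author"):
            self._field = css_class
            if css_class == "title":
                # Assumption: each title element starts a new record.
                self.records.append({"title": "", "author": ""})

    def handle_data(self, data):
        if self._field and self.records:
            self.records[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        # Any closing tag ends the field being read (enough for flat markup).
        self._field = None


def parse_books(html):
    """Return a list of dicts ready to be turned into model instances."""
    parser = BookParser()
    parser.feed(html)
    parser.close()
    return parser.records
```

Inside a Django project, each returned dict could then be saved with something like `Book.objects.update_or_create(title=..., defaults={...})`, where `Book` is an assumed model of your own.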
If you have many files to search, you will probably need an index that is rebuilt frequently and lets you run searches quickly without going through the Django ORM.
With a Solr index, for example, you can create additional fields on the fly, such as virtual fields derived from real model fields: splitting the author's first and last name, adding an upper-cased file header field, and so on.
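The idea of virtual fields can be sketched independently of any search engine: before a record is sent to the index, you compute extra fields from the real ones. The function below is only an illustration. The field names (`author`, `header`, `author_first_name`, and so on) are assumptions, and the name split is deliberately naive.

```python
def build_index_document(record):
    """Return the real fields plus derived ("virtual") fields for indexing."""
    author = record["author"].strip()
    # Naive split: the last word is the last name, the rest is the first name.
    first_name, _, last_name = author.rpartition(" ")
    return {
        "author": author,
        "header": record["header"],
        # Derived fields that exist only in the index, not on the model.
        "author_first_name": first_name,
        "author_last_name": last_name,
        "header_upper": record["header"].upper(),
    }
```

A document built this way can be handed to whichever indexing backend you use. The model itself stays unchanged.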
Of course, if you do not need fast indexing, keyword boosting or semantic analysis, you can still do a classic text search across several fields of a Django model.