Suggestions for creating a search engine using Django

I'm new to web crawling. I want to create a search engine whose crawler stores Rapidshare links, along with the URL of the page where each Rapidshare link was found...

In other words, I'm going to create a site similar to filestube.com.

After some searching, I found that Scrapy can work with Django. I tried to find information about combining Nutch with Django but found nothing.

I hope you can give me some suggestions for building such a site, especially the crawler.


The best-known pluggable app for this is Django-Haystack, which lets you plug in several search backends, such as Solr and Whoosh.

Haystack gives you an API similar to Django's own QuerySet syntax for querying these search engines, instead of using each of them directly (each has its own API and query dialect).
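In Haystack itself a query looks like SearchQuerySet().filter(content='ubuntu'), and each filter() call returns a new query set you can keep chaining, as with the ORM. The sketch below is only a toy, pure-Python imitation of that chaining style so it runs without a Django project or a search server; ToySearchQuerySet, the sample documents and the field names are hypothetical and are not part of Haystack.

```python
from typing import Dict, Iterator, List, Tuple


class ToySearchQuerySet:
    """Hypothetical stand-in for a QuerySet-style search API (NOT Haystack's code)."""

    def __init__(self, docs: List[Dict[str, str]], filters: Tuple[Tuple[str, str], ...] = ()):
        self._docs = docs
        self._filters = filters

    def filter(self, **lookups: str) -> "ToySearchQuerySet":
        # Like QuerySet.filter(), return a new object instead of mutating this one.
        return ToySearchQuerySet(self._docs, self._filters + tuple(lookups.items()))

    def _matches(self, doc: Dict[str, str]) -> bool:
        for name, value in self._filters:
            if name == "content":
                # "content" searches the whole document text, word by word.
                if value.lower() not in doc["content"].lower().split():
                    return False
            elif doc.get(name) != value:
                return False
        return True

    def __iter__(self) -> Iterator[Dict[str, str]]:
        # Filters are only applied when results are requested (lazy evaluation).
        return (doc for doc in self._docs if self._matches(doc))

    def count(self) -> int:
        return sum(1 for _ in self)


# Hypothetical indexed documents.
docs = [
    {"content": "Ubuntu 10.04 ISO image", "host": "rapidshare.com", "url": "http://rapidshare.com/files/1/a.iso"},
    {"content": "Ubuntu wallpapers pack", "host": "example.org", "url": "http://example.org/w.zip"},
    {"content": "Debian netinstall", "host": "rapidshare.com", "url": "http://rapidshare.com/files/2/b.iso"},
]
results = ToySearchQuerySet(docs).filter(content="ubuntu").filter(host="rapidshare.com")
print([d["url"] for d in results])  # ['http://rapidshare.com/files/1/a.iso']
```

Returning a new object from filter() keeps earlier query sets reusable, which is the property that makes chained queries convenient.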

As for scraping tools, whichever one you use, BeautifulSoup or Scrapy, you are on your own: you write the Python code that parses what you need from each page and then populates your Django models.
That code can even live in standalone Python scripts, for example custom Django management commands.
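The parsing step is the part you write yourself. As a minimal sketch, assuming you only need the href of every <a> tag whose host is rapidshare.com, here is a version that uses Python's standard-library html.parser instead of BeautifulSoup or Scrapy, so it runs without extra dependencies. RapidshareLinkParser and extract_rapidshare_links are hypothetical names; in a real project the returned pairs would be used to create your Django model instances.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin, urlparse


class RapidshareLinkParser(HTMLParser):
    """Collects absolute href values of <a> tags that point at rapidshare.com."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.page_url, href)  # resolve relative links
        host = urlparse(absolute).hostname or ""  # hostname is lower-cased
        if host == "rapidshare.com" or host.endswith(".rapidshare.com"):
            self.links.append(absolute)


def extract_rapidshare_links(html: str, page_url: str) -> List[Tuple[str, str]]:
    """Return (rapidshare_link, page_url) pairs, ready to be saved as model rows."""
    parser = RapidshareLinkParser(page_url)
    parser.feed(html)
    parser.close()
    return [(link, page_url) for link in parser.links]


sample = """
<p>Mirror 1: <a href="http://rapidshare.com/files/123/ubuntu.iso">ubuntu.iso</a></p>
<p>Mirror 2: <a href="http://www.rapidshare.com/files/456/ubuntu.iso">ubuntu.iso</a></p>
<p><a href="/about">About this blog</a></p>
"""
for link, found_on in extract_rapidshare_links(sample, "http://blog.example.com/post/1"):
    print(link, "found on", found_on)
```

Storing the page URL next to each link is what lets the site show where a file was found, as the question asks.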

If you have many files to search, you will probably want a search index that is rebuilt regularly and lets you run fast searches without going through the Django ORM.
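The core idea behind such an index is an inverted index: a mapping from each word to the documents that contain it, built ahead of time so that a query needs only a few dictionary lookups instead of a scan over every row. The pure-Python sketch below is a toy illustration of that idea, far simpler than what Solr actually does; the function names and sample data are hypothetical. "Rebuilding" the index here just means calling build_index again on fresh rows.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    # Split on non-word characters and lower-case, so "Ubuntu-10.04.iso" -> ubuntu, 10, 04, iso.
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each token to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents that contain every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))  # copy so the index is not mutated
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical (primary key, file title) rows pulled from your models at rebuild time.
files = {1: "ubuntu-10.04-desktop-i386.iso", 2: "Ubuntu Server Guide.pdf", 3: "debian-6.0-netinst.iso"}
index = build_index(files)
print(sorted(search(index, "ubuntu iso")))  # [1]
```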
With a Solr index, for example, you can create extra fields on the fly: virtual fields derived from real model fields (splitting the author's full name into first name and surname, adding an upper-cased copy of the file title, and so on).
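In Haystack, derived fields like these are typically filled in by a prepare_<fieldname>() method on the SearchIndex class. The plain-Python sketch below only shows the idea of deriving index-only fields from a record's real fields before indexing; FileRecord, prepare_document and the field names are hypothetical, and the name split is deliberately naive.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileRecord:
    # Hypothetical model-like record holding the "real" stored fields.
    title: str
    author: str
    url: str


def prepare_document(record: FileRecord) -> Dict[str, str]:
    """Build the document sent to the search index, adding derived ("virtual") fields."""
    # Naive split: everything after the first space is treated as the surname.
    first_name, _, last_name = record.author.strip().partition(" ")
    return {
        "title": record.title,
        "url": record.url,
        "author": record.author,
        # Virtual fields: they exist only in the index, not in the database table.
        "author_first_name": first_name,
        "author_last_name": last_name,
        "title_upper": record.title.upper(),
    }


doc = prepare_document(FileRecord("Ubuntu Server Guide", "Jane Doe", "http://rapidshare.com/files/2/guide.pdf"))
print(doc["author_last_name"], doc["title_upper"])  # Doe UBUNTU SERVER GUIDE
```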

Of course, if you don't need fast indexing, keyword boosting or semantic analysis, you can still do a classic full-text search over several fields of a Django model.
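With the Django ORM this kind of search is usually written by OR-ing icontains lookups with Q objects, for example Link.objects.filter(Q(title__icontains=word) | Q(description__icontains=word)) for each query word, where Link and its fields are hypothetical. To keep the example runnable without a Django project, the sketch below reproduces that matching rule in plain Python: every query word must appear, case-insensitively, in at least one of the listed fields.

```python
from typing import Dict, Iterable, List, Sequence


def multi_field_search(rows: Iterable[Dict[str, str]], query: str, fields: Sequence[str]) -> List[Dict[str, str]]:
    """Keep rows where every query word appears (case-insensitively) in at least one field.

    Mirrors the ORM pattern: for each word, OR together field__icontains=word,
    then AND the per-word conditions.
    """
    words = query.lower().split()
    matches = []
    for row in rows:
        texts = [str(row.get(f, "")).lower() for f in fields]
        if all(any(word in text for text in texts) for word in words):
            matches.append(row)
    return matches


# Hypothetical rows standing in for model instances.
links = [
    {"title": "Ubuntu 10.04 desktop", "description": "Official ISO, 700 MB", "url": "http://rapidshare.com/files/1/u.iso"},
    {"title": "Server guide", "description": "Ubuntu server manual (PDF)", "url": "http://rapidshare.com/files/2/g.pdf"},
    {"title": "Debian netinst", "description": "Small ISO", "url": "http://rapidshare.com/files/3/d.iso"},
]
hits = multi_field_search(links, "ubuntu iso", fields=("title", "description"))
print([h["url"] for h in hits])  # ['http://rapidshare.com/files/1/u.iso']
```

Every query runs a substring scan over all rows, which is fine for a small table but is exactly the cost a dedicated search index avoids.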


Have you looked at DjangoItem? It is an experimental Scrapy feature, but it is known to work.


Source: https://habr.com/ru/post/1334666/

