I am trying to create a Django application that functions as a store. Items are scraped from around the Internet and continuously update the Django project's database (say, every few days). I am using the Scrapy framework to perform the scraping, and although there is an experimental DjangoItem (here), my plan right now is to create XML files of the scraped items and use them for loaddata into the Django project as XML fixtures (docs here). This seems fine, because if either of the two processes breaks, there is an intermediate file between them. Modularizing the application as a whole doesn't seem like a bad idea either.
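Roughly what I have for the export step, as a Scrapy item pipeline (a sketch; the pipeline is registered in the project's settings.py, and items.xml is a placeholder path):

```python
# Sketch of the export pipeline; items.xml is a placeholder path.
from scrapy.exporters import XmlItemExporter

class XmlExportPipeline(object):
    def open_spider(self, spider):
        self.file = open('items.xml', 'wb')
        self.exporter = XmlItemExporter(self.file)
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()
```

One catch I am aware of: XmlItemExporter emits a generic document of nested item elements, while loaddata expects Django's own serialization schema, so the output has to be massaged into that format (example further down).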
Some problems:
- These files may be too large to read into memory for Django's loaddata.
- I may be spending too much time on this when there may be a better or simpler solution, such as exporting directly to the database, which in this case is MySQL (see the first sketch after this list).
- No one seems to have written about this process online, which is strange given that Scrapy is, in my opinion, an excellent framework to plug into a Django application.
- There is no definitive guide to manually creating fixtures in the Django docs - it seems to be geared more towards dumping and reloading fixtures from the application itself (the second sketch after this list shows the fixture format I mean).
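On the direct-to-database option from the list above: as far as I can tell, the usual pattern is to bootstrap Django inside the Scrapy process and write through the ORM from a pipeline, skipping the intermediate file entirely. A rough sketch, assuming a recent Django and a hypothetical myapp.models.Item with a unique url field (mysite.settings is a placeholder):

```python
import os
import django

# Point at the Django project's settings before importing any models;
# "mysite.settings" is a placeholder for the real settings module.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')
django.setup()

from myapp.models import Item  # hypothetical Django model

class DjangoWriterPipeline(object):
    def process_item(self, item, spider):
        # update_or_create keeps repeated crawls idempotent,
        # keyed on a hypothetical unique "url" field.
        Item.objects.update_or_create(
            url=item['url'],
            defaults={'name': item['name'], 'price': item['price']},
        )
        return item
```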
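And on manual fixture creation: as far as I can tell, an XML fixture that loaddata accepts has to follow Django's serialization schema, something like the following (the model label store.item, the pk, and the fields are all hypothetical):

```xml
<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
  <!-- model label, pk, and fields are placeholders -->
  <object model="store.item" pk="1">
    <field name="name" type="CharField">Example widget</field>
    <field name="price" type="DecimalField">9.99</field>
  </object>
</django-objects>
```

which would then be loaded with python manage.py loaddata items.xml.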
The existence of the experimental DjangoItem suggests that Scrapy + Django is a popular enough combination for a good solution to exist.
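For reference, my understanding of DjangoItem is that it wraps a Django model as a Scrapy item, so a scraped item can be saved straight through the ORM, roughly like this (the import path has moved between Scrapy versions, and Product is a hypothetical model):

```python
from scrapy.contrib.djangoitem import DjangoItem  # path varies by Scrapy version
from myapp.models import Product  # hypothetical Django model

class ProductItem(DjangoItem):
    django_model = Product

# populated like any Scrapy item, then persisted via the ORM:
# item = ProductItem(name='Widget'); item.save()
```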
I would greatly appreciate any solutions, advice, or wisdom on this matter.