How to integrate a JavaScript rendering module into Scrapy?

I am working on a web-page scraping program, but I am having a problem using Scrapy with JavaScript-generated content. I know that Scrapy is not built for this type of scraping, but I have tried to use scrapyjs and Splash to accomplish what I need.

However, I cannot get either of these two modules to work correctly with Scrapy. My question is: can someone show a minimal example of using scrapyjs or Splash to render JavaScript pages?

Edit: My platform is Ubuntu and I am working in Python. For scrapyjs, I just put the source code in the top-level directory of my Scrapy project, and I have yet to find any real guide on how to use Splash. The reason I am asking about Splash is that it seems to be the more powerful module for rendering JavaScript and is often mentioned in the same conversations as scrapyjs.

1 answer

I believe all you need to do is implement process_links in your Spider:

from urllib.parse import quote

def proxy_url(url):
    # Route the request through a local Splash instance. quote() escapes
    # the target URL so its own '?', '&' and '=' characters do not
    # corrupt Splash's query string.
    return "http://localhost:8050/render.html?url=%s&timeout=15&wait=1" % quote(url, safe="")

def process_links(self, links):
    for link in links:
        link.url = proxy_url(link.url)
    return links
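As a usage sketch: in a CrawlSpider, process_links is attached per rule (Rule(LinkExtractor(), process_links='process_links', ...)), so every extracted link is rewritten to go through Splash before Scrapy downloads it. The snippet below builds the render.html URL with urlencode instead of string formatting, then checks the round trip with parse_qs. The localhost:8050 endpoint and the wait/timeout values are the ones used in the answer above, not requirements; example.com is a placeholder.

```python
from urllib.parse import urlencode, urlparse, parse_qs

SPLASH = "http://localhost:8050/render.html"  # assumed local Splash endpoint

def splash_url(url, timeout=15, wait=1):
    # urlencode percent-escapes the target URL, so '?', '&' and '='
    # inside it cannot corrupt Splash's own query string.
    return "%s?%s" % (SPLASH, urlencode({"url": url, "timeout": timeout, "wait": wait}))

# Round-trip check: Splash would see the original URL intact.
proxied = splash_url("http://example.com/page?id=1&lang=en")
params = parse_qs(urlparse(proxied).query)
print(params["url"][0])  # the original URL, recovered unchanged
```

In a CrawlSpider this helper would play the role of proxy_url above, with the rule declared as Rule(LinkExtractor(), callback='parse_page', process_links='process_links', follow=True).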

Source: https://habr.com/ru/post/1525317/
