Scrapy log handler

I need help with the following two questions. First, how do I install a handler for different log levels, the way Python's logging module allows? I currently have

    STATS_ENABLED = True
    STATS_DUMP = True
    LOG_FILE = 'crawl.log'

But the DEBUG messages generated by Scrapy are also added to the log file. It gets very long, and ideally I would like DEBUG messages to go to standard error while INFO messages are dumped to my LOG_FILE.

Secondly, the docs say "The logging service must be explicitly started through the scrapy.log.start() function". My question is: where can I run this scrapy.log.start()? Is it inside my spider?

+4
4 answers

Hmm

Just an update: I can get the log handler to write to a file using

    from twisted.python import log
    import logging

    logging.basicConfig(level=logging.INFO, filemode='w', filename='log.txt')
    observer = log.PythonLoggingObserver()
    observer.start()

However, I cannot get the log to display the spiders' names the way Twisted's standard error output does. I posted this as a separate question.
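For the level split the question asks for, one way to build on this same observer approach is to register two handlers on the root logger before starting the observer. This is only a sketch; the handler setup and the file name are illustrative, not part of the original answer.

    import logging
    import sys

    from twisted.python import log

    root = logging.getLogger()
    root.setLevel(logging.DEBUG)

    # DEBUG and above go to standard error
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(logging.DEBUG)
    root.addHandler(stderr_handler)

    # INFO and above go to the log file
    file_handler = logging.FileHandler('crawl.log', mode='w')
    file_handler.setLevel(logging.INFO)
    root.addHandler(file_handler)

    # Bridge Twisted's log (which the old scrapy.log uses) into the logging module
    observer = log.PythonLoggingObserver()
    observer.start()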

+3

Secondly, the docs say "The logging service must be explicitly started through the scrapy.log.start() function". My question is: where to run this scrapy.log.start()? Is it inside my spider?

If you start a spider with scrapy crawl my_spider, the log starts automatically when STATS_ENABLED = True.

If you start the crawler process manually, you can do scrapy.log.start() before starting the crawler process.

    from scrapy import log
    from scrapy.conf import settings
    from scrapy.crawler import CrawlerProcess

    settings.overrides.update({})  # your settings

    crawlerProcess = CrawlerProcess(settings)
    crawlerProcess.install()
    crawlerProcess.configure()
    crawlerProcess.crawl(spider)  # your spider here

    log.start()  # depends on LOG_ENABLED
    print "Starting crawler."
    crawlerProcess.start()
    print "Crawler stopped."

A bit of background on your first question:

Since you have to start the scrapy log manually, this lets you use your own logger.

I think you can copy the module scrapy/scrapy/log.py from the Scrapy sources, modify it, import it instead of scrapy.log and run start() - Scrapy will then use your log. The start() function has a line that calls log.startLoggingWithObserver(sflo.emit, setStdout=logstdout).

Make your own observer ( http://docs.python.org/howto/logging-cookbook.html#logging-to-multiple-destinations ) and use it there.
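As a rough sketch of what such an observer could look like (the function below is my own illustration, not Scrapy code): it receives Twisted event dicts and forwards them to the stdlib logging module, which can then fan out to multiple destinations as in the cookbook recipe.

    import logging

    def my_observer(event):
        # Twisted log events are dicts: 'message' is a tuple of parts,
        # 'isError' marks failures, and 'logLevel', when present, carries the level.
        text = ' '.join(str(part) for part in event.get('message', ()))
        if event.get('isError'):
            level = logging.ERROR
        else:
            level = event.get('logLevel', logging.INFO)
        logging.getLogger('scrapy').log(level, text)

    # In the modified copy of log.py, start() would then call something like:
    # log.startLoggingWithObserver(my_observer, setStdout=logstdout)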

+4

I would like DEBUG messages to go to standard error, and INFO messages to be dumped to my LOG_FILE.

You can set LOG_LEVEL = 'INFO' in settings.py, but it will completely disable DEBUG messages.
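For completeness, a minimal settings.py along these lines (the file name is just the one from the question):

    # settings.py
    LOG_LEVEL = 'INFO'       # drops DEBUG messages entirely
    LOG_FILE = 'crawl.log'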

+3
    scrapy <command> [args] -L INFO -s LOG_FILE=log1.log
Output will be redirected to the log file.

0
