When saving a scraped item and a file, Scrapy inserts blank lines into the CSV output file

Question

I have a Scrapy (version 1.0.3) spider in which I extract some data from a web page and also download a file, like this (simplified):

    def extract_data(self, response):
        title = response.xpath('//html/head/title/text()').extract()[0].strip()
        my_item = MyItem()
        my_item['title'] = title
        file_url = response.xpath('...get url of file...')
        file_urls = [file_url]  # here there can be more urls, so I'm storing like a list
        fi = FileItem()
        fi['file_urls'] = file_urls
        yield my_item
        yield fi

In pipelines.py I just override FilesPipeline to change the file name:

    from scrapy.pipelines.files import FilesPipeline

    class CustomFilesPipeline(FilesPipeline):
        def file_path(self, request, response=None, info=None):
            filename = format_filename(request.url)
            return filename
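The format_filename helper is not shown in the question. Purely as an illustration, a hypothetical version that takes the last path segment of the URL as the file name could look like the sketch below. The behavior (decoding percent-escapes, falling back to "unnamed") is an assumption, not the asker's actual code, and the import path is the Python 3 one.

```python
import posixpath
from urllib.parse import unquote, urlparse


def format_filename(url):
    """Hypothetical helper: derive a file name from the URL path."""
    # Drop scheme, host and query string; keep only the path component.
    path = urlparse(url).path
    # Decode percent-escapes such as %20, then take the last path segment.
    name = posixpath.basename(unquote(path))
    # Fall back to a fixed name when the URL ends with a slash.
    return name or "unnamed"
```

FilesPipeline treats the returned value as a path relative to the configured storage folder, so a real helper should also make sure two different URLs cannot map to the same name.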

In items.py I have:

    import scrapy

    class MyItem(scrapy.Item):
        title = scrapy.Field()

    class FileItem(scrapy.Item):
        file_urls = scrapy.Field()
        files = scrapy.Field()

In settings.py I have:

    ITEM_PIPELINES = {
        'myscraping.pipelines.CustomFilesPipeline': 100
    }

Now in the CSV output file I get something like this:

    title1
    title2
    ,
    ,
    title3
    etc.

It seems that the empty lines (containing only a comma) correspond to the downloaded files, and I would like to know how to prevent such lines from appearing in the CSV output file. (The files themselves are saved in a folder.)
In the Scrapy settings I found FEED_STORE_EMPTY (false by default, i.e. empty feeds should not be exported), but I assume this does not apply to files.
I have a feeling this has something to do with pipelines, but I cannot figure out how to do it.
Any help would be appreciated.
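One plausible explanation for the comma-only rows, assuming the CSV exporter takes its columns from the first exported item and writes every later item against those columns: a FileItem has none of those fields, so all of its cells come out empty. The sketch below mimics that behavior with Python's standard csv module rather than Scrapy itself. The author column and the example URL are hypothetical, added only so that an empty row shows up as a single comma.

```python
import csv
import io


def export_rows(items, fieldnames):
    """Write each item as one CSV row, leaving missing columns empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",             # missing columns become empty cells
        extrasaction="ignore",  # keys outside the column list are skipped
        lineterminator="\n",
    )
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


# Two data items with a file-only item in between, as yielded by the spider.
items = [
    {"title": "title1", "author": "a1"},
    {"file_urls": ["http://example.com/a.pdf"]},
    {"title": "title2", "author": "a2"},
]

# Columns are fixed from the data item, so the file-only item has no matching keys.
output = export_rows(items, ["title", "author"])
print(output)
```

The middle row contains only the separator, which matches the lines seen in the question's output.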

python scrapy scrapy-spider

zdenulo Oct 14 '15 at 13:10

1 answer

vianney · Accepted Answer · 2015-10-15T09:32:47+0000

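Here is the answer: store the file URLs on the scraped item itself instead of yielding a separate FileItem, so that only one item per page reaches the CSV exporter. For this to work, MyItem must also declare the file_urls and files fields.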

    def extract_data(self, response):
        title = response.xpath('//html/head/title/text()').extract()[0].strip()
        my_item = MyItem()
        my_item['title'] = title
        file_url = response.xpath('...get url of file...')
        my_item['file_urls'] = [file_url]
        yield my_item
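With the file fields living on the same item, the CSV feed may now gain file_urls and files columns. One possible way to keep only the scraped data in the output is the FEED_EXPORT_FIELDS setting. This is not part of the original answer and assumes your Scrapy version provides that setting. A minimal settings.py sketch, reusing the pipeline path from the question:

```python
# settings.py (sketch): keep the custom files pipeline enabled and
# restrict the exported CSV columns to the scraped data fields.
ITEM_PIPELINES = {
    'myscraping.pipelines.CustomFilesPipeline': 100,
}

# Assumption: FEED_EXPORT_FIELDS is available in the installed Scrapy version.
# Only the listed fields are exported; 'file_urls' and 'files' are left out.
FEED_EXPORT_FIELDS = ['title']
```

The files are still downloaded by the pipeline, since the setting only affects which fields the feed exporter writes.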
