Scrapy: retrieving a key or value from a list

I am new to Python and am having trouble wrapping my head around getting a specific value or key out of a list.

When I print a scraped item, some of its fields come back as a list, like this.

first list:

'image_urls': [u'http://www.websites.com/1.jpg',
                u'http://www.websites.com/2.jpg',
                u'http://www.websites.com/3.jpg'],

I got around this by making the XPath more targeted and selecting elements by index, e.g. [2], but my real problem is with the results returned for the crawled images:

second list:

'images': [{'checksum': '2efhz768djdzs76dz',
            'path': 'full/2efhz768djdzs76dz.jpg',
            'url': 'http://www.websites.com/1.jpg'},
           {'checksum': 'zadz764dhqj34dsjs',
            'path': 'full/zadz764dhqj34dsjs.jpg',
            'url': 'http://www.websites.com/2.jpg'}],

I use sqlite3 to store my other cleaned data, reading each field with item.get:

item.get('image_urls', '')

How do I combine a list of values into a single string, or pick one value by its position? (first list)

And how do I get the values for checksum, path and url using item.get? (second list)



To pick a single URL from the first list, index into it (here x is the item):

x['image_urls'][0]

>>> images
[{'path': 'full/2efhz768djdzs76dz.jpg', 'url': 'http://www.websites.com/1.jpg', 'checksum': '2efhz768djdzs76dz'}, {'path': 'full/zadz764dhqj34dsjs.jpg', 'url': 'http://www.websites.com/2.jpg', 'checksum': 'zadz764dhqj34dsjs'}]
>>> list(map(lambda x : x['url'] + '/' + x['path'], images))
['http://www.websites.com/1.jpg/full/2efhz768djdzs76dz.jpg', 'http://www.websites.com/2.jpg/full/zadz764dhqj34dsjs.jpg']
>>> list(map(lambda x : x['checksum'], images))
['2efhz768djdzs76dz', 'zadz764dhqj34dsjs']
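The same idea can be written with item.get, as the question asks. This is only a minimal sketch: it assumes the item behaves like a plain dict, the sample values are copied from the question, and the variable names (joined_urls, second_url and so on) are made up for illustration.

```python
# Sample item shaped like the one in the question (values copied from it).
item = {
    'image_urls': ['http://www.websites.com/1.jpg',
                   'http://www.websites.com/2.jpg',
                   'http://www.websites.com/3.jpg'],
    'images': [{'checksum': '2efhz768djdzs76dz',
                'path': 'full/2efhz768djdzs76dz.jpg',
                'url': 'http://www.websites.com/1.jpg'},
               {'checksum': 'zadz764dhqj34dsjs',
                'path': 'full/zadz764dhqj34dsjs.jpg',
                'url': 'http://www.websites.com/2.jpg'}],
}

# First list: join every URL into one string, or pick one by position.
urls = item.get('image_urls', [])
joined_urls = ', '.join(urls)
second_url = urls[1] if len(urls) > 1 else ''

# Second list: each entry is itself a dict, so call .get on the entry.
images = item.get('images', [])
checksums = [image.get('checksum', '') for image in images]
paths = [image.get('path', '') for image in images]
first_image_url = images[0].get('url', '') if images else ''
```

Using an empty list as the default for item.get keeps the loops and the join safe when a field is missing, which an empty-string default would not do for the list of dicts.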


Another approach is to match each entry in image_urls against the images list and collect the fields you need:

# Collect the fields of every downloaded image whose URL was requested.
websites_urls = []
checksums = []
paths = []
whole_item = []
for image_url in item.get('image_urls', []):
    for image in item.get('images', []):
        if image_url == image['url']:
            websites_urls.append(image['url'])
            checksums.append(image['checksum'])
            paths.append(image['path'])
            whole_item.append(image)
            break  # stop at the first match for this URL
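Since the question mentions storing the data with sqlite3, here is one possible way to write the image fields out, one row per image. It is a sketch under assumptions: the table name images, its three TEXT columns and the in-memory database are hypothetical choices, not something stated in the thread.

```python
import sqlite3

# Sample item with the 'images' field from the question.
item = {
    'images': [{'checksum': '2efhz768djdzs76dz',
                'path': 'full/2efhz768djdzs76dz.jpg',
                'url': 'http://www.websites.com/1.jpg'},
               {'checksum': 'zadz764dhqj34dsjs',
                'path': 'full/zadz764dhqj34dsjs.jpg',
                'url': 'http://www.websites.com/2.jpg'}],
}

# Hypothetical schema: an in-memory database with one table for images.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE images (url TEXT, path TEXT, checksum TEXT)')

# Build one (url, path, checksum) tuple per image dict.
rows = [(image.get('url', ''), image.get('path', ''), image.get('checksum', ''))
        for image in item.get('images', [])]

# Parameterized insert, so the values are never spliced into the SQL text.
conn.executemany(
    'INSERT INTO images (url, path, checksum) VALUES (?, ?, ?)', rows)
conn.commit()

stored = conn.execute(
    'SELECT url, path, checksum FROM images ORDER BY url').fetchall()
conn.close()
```

Storing one row per image avoids having to pack several values into a single string column and unpack them later.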

Source: https://habr.com/ru/post/1650436/

