Django relationships with Scrapy: how are items stored?

I just need to understand how to tell, inside the spider, whether an item has already been saved. I scrape products from a site, and then I scrape the comments on each product. So I first have to save the product, and only after that the comments. But when I write the code following the tutorial, I get this error:

save() prohibited to prevent data loss due to unsaved related object ''.

And this is my code:

    def parseProductComments(self, response):
        name = response.css('h1.product-name::text').extract_first()
        price = response.css('span[id=offering-price] > span::text').extract_first()
        node = response.xpath("//script[contains(text(),'var utagData = ')]/text()")
        data = node.re('= (\{.+\})')[0]
        #data = xpath.re(" = (\{.+\})")
        data = json.loads(data)
        barcode = data['product_barcode']
        objectImages = []
        for imageThumDiv in response.css('div[id=productThumbnailsCarousel]'):
            images = imageThumDiv.xpath('img/@data-src').extract()
            for image in images:
                imageQuality = image.replace('/80/', '/500/')
                objectImages.append(imageQuality)
        company = Company.objects.get(pk=3)
        comments = []
        item = ProductItem(name=name, price=price, barcode=barcode, file_urls=objectImages,
                           product_url=response.url, product_company=company, comments=comments)
        yield item
        print item["pk"]
        for commentUl in response.css('ul.chevron-list-container'):
            url = commentUl.css('span.link-more-results::attr(href)').extract_first()
            if url is not None:
                for commentLi in commentUl.css('li.review-item'):
                    comment = commentLi.css('p::text').extract_first()
                    commentItem = CommentItem(comment=comment, product=item.instance)
                    yield commentItem
            else:
                yield scrapy.Request(response.urljoin(url), callback=self.parseCommentsPages,
                                     meta={'item': item.instance})

And this is my pipeline:

    def comment_to_model(item):
        model_class = getattr(item, 'Comment')
        if not model_class:
            raise TypeError("Item is not a `DjangoItem` or is misconfigured")


    def get_comment_or_create(model):
        model_class = type(model)
        created = False
        # Normally, we would use `get_or_create`. However, `get_or_create` would
        # match all properties of an object (ie create a new object
        # anytime it changed) rather than update an existing object.
        #
        # Instead, we do the two steps separately
        try:
            # We have no unique identifier at the moment; use the name for now.
            obj = model_class.objects.get(product=model.product, comment=model.comment)
        except model_class.DoesNotExist:
            created = True
            obj = model  # DjangoItem created a model for us.
            obj.save()
        return (obj, created)


    def get_or_create(model):
        model_class = type(model)
        created = False
        # Normally, we would use `get_or_create`. However, `get_or_create` would
        # match all properties of an object (ie create a new object
        # anytime it changed) rather than update an existing object.
        #
        # Instead, we do the two steps separately
        try:
            # We have no unique identifier at the moment; use the name for now.
            obj = model_class.objects.get(product_company=model.product_company, barcode=model.barcode)
        except model_class.DoesNotExist:
            created = True
            obj = model  # DjangoItem created a model for us.
            obj.save()
        return (obj, created)


    def update_model(destination, source, commit=True):
        pk = destination.pk
        source_dict = model_to_dict(source)
        for (key, value) in source_dict.items():
            setattr(destination, key, value)
        setattr(destination, 'pk', pk)
        if commit:
            destination.save()
        return destination


    class ProductItemPipeline(object):
        def process_item(self, item, spider):
            if isinstance(item, ProductItem):
                item['cover_photo'] = item['files'][0]['path']
                item_model = item.instance
                model, created = get_or_create(item_model)
                #update_model(model, item_model)
                if created:
                    for image in item['files']:
                        imageItem = ProductImageItem(image=image['path'], product=item.instance)
                        imageItem.save()
                # for comment in item['comments']:
                #     commentItem = CommentItem(comment=comment, product= item.instance)
                #     commentItem.save()
                return item
            if isinstance(item, CommentItem):
                comment_to_model = item.instance
                model, created = get_comment_or_create(comment_to_model)
                if created:
                    print model
                else:
                    print created
                return item
1 answer

Get or Create

Most of your code seems to be working around what your comment calls a weakness of get_or_create:

    # Normally, we would use `get_or_create`. However, `get_or_create` would
    # match all properties of an object (ie create a new object
    # anytime it changed) rather than update an existing object.

Fortunately, this apparent shortcoming is easy to overcome thanks to the defaults parameter of get_or_create. From the Django documentation:

Any keyword arguments passed to get_or_create(), except an optional one called defaults, will be used in a get() call. If an object is found, get_or_create() returns a tuple of that object and False. If multiple objects are found, get_or_create() raises MultipleObjectsReturned. If an object is not found, get_or_create() will create and save a new object, returning a tuple of the new object and True.
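To see why defaults removes the need for the hand-written lookup helpers, here is a minimal pure-Python sketch that mimics the documented get_or_create() contract with an in-memory list instead of a database. FakeManager and MultipleObjectsReturned are hypothetical stand-ins written for illustration, not Django code; with a real model the call would go through the model's manager (for example SomeModel.objects.get_or_create(...), where SomeModel is a placeholder).

```python
class MultipleObjectsReturned(Exception):
    pass


class FakeManager:
    """Hypothetical in-memory stand-in mimicking get_or_create() semantics."""

    def __init__(self):
        self.rows = []  # each row is a plain dict of field values

    def get_or_create(self, defaults=None, **lookup):
        # Only the lookup kwargs take part in matching; `defaults` never does.
        matches = [r for r in self.rows
                   if all(r.get(k) == v for k, v in lookup.items())]
        if len(matches) > 1:
            raise MultipleObjectsReturned(lookup)
        if matches:
            return matches[0], False
        # Not found: build the new row from the lookup values plus defaults.
        row = dict(lookup)
        row.update(defaults or {})
        self.rows.append(row)
        return row, True


products = FakeManager()
p1, created1 = products.get_or_create(
    product_company=3, barcode="123", defaults={"price": "9.99"})
# Same lookup, different price: the existing row is returned unchanged.
p2, created2 = products.get_or_create(
    product_company=3, barcode="123", defaults={"price": "7.49"})
print(created1, created2, p2["price"])  # True False 9.99
```

The second call finds the row through product_company and barcode alone, so the new price in defaults is ignored. That "match on a few fields, fill in the rest only on creation" behaviour is what the two-step helpers in the question reimplement by hand.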

Update or Create

Still not convinced that get_or_create is the right tool for the job? Neither am I. There is something even better: update_or_create!

A convenience method for updating an object with the given kwargs, creating a new one if necessary. defaults is a dictionary of (field, value) pairs used to update the object.
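The same kind of in-memory sketch shows how update_or_create() differs: the lookup kwargs still decide which row matches, but defaults is now applied to an existing row as well. Again, FakeManager is a hypothetical stand-in for illustration, not Django's implementation.

```python
class MultipleObjectsReturned(Exception):
    pass


class FakeManager:
    """Hypothetical in-memory stand-in mimicking update_or_create() semantics."""

    def __init__(self):
        self.rows = []  # each row is a plain dict of field values

    def update_or_create(self, defaults=None, **lookup):
        defaults = defaults or {}
        matches = [r for r in self.rows
                   if all(r.get(k) == v for k, v in lookup.items())]
        if len(matches) > 1:
            raise MultipleObjectsReturned(lookup)
        if matches:
            row = matches[0]
            row.update(defaults)  # existing row: overwrite fields from defaults
            return row, False
        # Not found: create the row from the lookup values plus defaults.
        row = dict(lookup)
        row.update(defaults)
        self.rows.append(row)
        return row, True


products = FakeManager()
row1, created1 = products.update_or_create(
    product_company=3, barcode="123", defaults={"price": "9.99"})
row2, created2 = products.update_or_create(
    product_company=3, barcode="123", defaults={"price": "7.49"})
print(created1, created2, row2["price"])  # True False 7.49
```

Compared with the previous sketch, only the "found" branch changes: the matched row is updated in place instead of being returned untouched.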

But I will not dwell on update_or_create, because the lines in your code that try to update the model are commented out, and you have not said clearly that you want existing rows to be updated.

New pipeline

By using the standard API methods, the module containing your pipeline boils down to just the ProductItemPipeline class, which can be rewritten like this:

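    class ProductItemPipeline(object):
        def process_item(self, item, spider):
            if isinstance(item, ProductItem):
                item['cover_photo'] = item['files'][0]['path']
                # Match on (product_company, barcode) only; defaults is used
                # only when a new row is created. The default fields are
                # assumed to match the fields set in the spider.
                model, created = ProductItem.django_model.objects.get_or_create(
                    product_company=item['product_company'],
                    barcode=item['barcode'],
                    defaults={'name': item['name'],
                              'price': item['price'],
                              'product_url': item['product_url'],
                              'cover_photo': item['cover_photo']})
                if created:
                    for image in item['files']:
                        # Link images to the saved `model`, not the unsaved item.instance
                        imageItem = ProductImageItem(image=image['path'], product=model)
                        imageItem.save()
                return item
            if isinstance(item, CommentItem):
                # item['product'] must already be a saved product row
                model, created = CommentItem.django_model.objects.get_or_create(
                    product=item['product'], comment=item['comment'])
                if created:
                    print model
                else:
                    print created
                return item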

Error in the original code

I believe this is where the error occurred:

  obj = model_class.objects.get(product=model.product, comment=model.comment) 

The new pipeline no longer uses this code, so the error should go away. If you still have problems, post the full traceback.
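The error message itself comes from Django refusing to save an object whose foreign key points at an object that has no primary key yet. A DjangoItem's item.instance is a freshly built, unsaved model instance; in the original helper it only gets saved when the product did not exist yet, so one plausible trigger is a comment built with product=item.instance for a product that was already in the database. The pure-Python sketch below imitates that guard; FakeModel is a hypothetical stand-in for illustration, not Django's model class.

```python
import itertools


class FakeModel:
    """Hypothetical stand-in that imitates Django's unsaved-related-object check."""

    _ids = itertools.count(1)  # shared primary-key counter

    def __init__(self, **fields):
        self.pk = None  # None until save() assigns a primary key
        self.__dict__.update(fields)

    def save(self):
        # Refuse to save while any related FakeModel has no primary key yet.
        for name, value in vars(self).items():
            if isinstance(value, FakeModel) and value.pk is None:
                raise ValueError(
                    "save() prohibited to prevent data loss due to "
                    "unsaved related object '%s'." % name)
        if self.pk is None:
            self.pk = next(FakeModel._ids)
        return self


product = FakeModel(barcode="123")
comment = FakeModel(comment="Nice", product=product)
try:
    comment.save()  # fails: the product has not been saved yet
except ValueError as exc:
    error = str(exc)

product.save()   # save the parent first...
comment.save()   # ...then the child that references it succeeds
print(error)
```

Saving the product first and linking comments to that saved object (rather than to a fresh item.instance) is what avoids the error.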


Source: https://habr.com/ru/post/1262234/

