Do bulk inserts / updates in MongoDB with PyMongo

How do I do a bulk update / insert in MongoDB using pymongo / pandas? The error I get is "batch op errors occurred", which I understand is because I set the "_id" myself. The code works fine on the first run, but fails on the second. I want to use pandas in the workflow. The data contains datetime objects.

The syntax is completely different for update with upsert=True. An efficient solution using update would be useful, where "_id" or "qid" could be set. But there are Python datetime objects in the data!

    InSQL = 'SELECT * from database2.table2'
    sqlOut = pd.read_sql(InSQL, cxn)
    # build a deterministic _id per row: date prefix plus a column value
    sqlOut['_id'] = "20170101" + ":" + sqlOut['Var']
    dfOut = sqlOut.to_json(orient='records', date_format='iso')
    try:
        db["test"].insert_many(json.loads(dfOut))
    except Exception as e:
        print(e)

I offered a 50 pt bounty, which expired without a response. Hm ...

+5
2 answers

A batch error can be caused by duplicate _id values, so delete the documents with the same _id that are already in Mongo before inserting.
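A minimal sketch of the delete-then-insert approach described above. The helper name `replace_batch` is illustrative (not from the question), and `coll` is assumed to be a pymongo Collection:

```python
def replace_batch(coll, docs):
    """Delete colliding documents, then insert fresh copies.

    Hypothetical helper; `coll` is assumed to be a pymongo Collection.
    """
    # collect the _id values we are about to insert
    ids = [doc["_id"] for doc in docs]
    # remove any documents already holding those ids ...
    coll.delete_many({"_id": {"$in": ids}})
    # ... so insert_many no longer trips the unique index on _id
    coll.insert_many(docs)
```

Note that delete-then-insert is not atomic: the old documents are briefly gone before the new ones land, which matters if other clients read the collection concurrently.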

Or use update_many https://api.mongodb.com/python/current/api/pymongo/collection.html?highlight=update#pymongo.collection.Collection.update_many

https://docs.mongodb.com/manual/reference/method/db.collection.updateMany/
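For reference, a sketch of what update_many does: it applies one update document to every match of the filter. The wrapper name `set_on_all` is illustrative, and `coll` is assumed to be a pymongo Collection:

```python
def set_on_all(coll, filter_doc, fields):
    # update_many applies the same $set to every document matching
    # filter_doc; with upsert=True a single new document is created
    # when nothing matches. `set_on_all` is an illustrative name.
    return coll.update_many(filter_doc, {"$set": fields}, upsert=True)
```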

0

You get the error because, on the second and subsequent insert_many calls, you are trying to insert documents whose fields conflict with those of existing documents. You correctly determined that this is related to setting _id explicitly, which then collides with the _id values already in the collection.

MongoDB automatically creates a unique index on _id , which forbids duplicate values.

On calls after the first (which inserted the documents in their initial version), you need to update or replace your documents. There is a concept of an "upsert" for exactly this: it inserts documents that do not yet exist in the collection and updates the ones that do.

Your options:

  • Most efficient: pymongo.collection.Collection.bulk_write

    import pymongo

    operations = [
        pymongo.operations.ReplaceOne(
            filter={"_id": doc["_id"]},
            replacement=doc,
            upsert=True,
        )
        for doc in json.loads(dfOut)
    ]
    result = db["test"].bulk_write(operations)
    # handle results

Note that performance also depends on whether the filtered field is indexed in the collection, which is always the case for _id . (Also see pymongo.operations.ReplaceOne .)
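To flesh out the "# handle results" comment above, a sketch of reading the counters off the returned result. The attribute names (matched_count, modified_count, upserted_count) are pymongo's real BulkWriteResult API; the `summarize` helper itself is illustrative:

```python
def summarize(result):
    # result is a pymongo BulkWriteResult; the counters below tell
    # you how many documents the batch matched, changed, and created.
    return {
        "matched": result.matched_count,
        "modified": result.modified_count,
        "upserted": result.upserted_count,
    }
```

On the first run everything is upserted; on later runs the same batch shows up as matched/modified instead, which is how you can confirm the upsert is doing its job.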

Note: pymongo.collection.Collection.update_many is unsuitable for your needs, since you are not trying to set the same values on every document matching a filter.

0

Source: https://habr.com/ru/post/1264588/
