How to optimize adding new nodes in `django-mptt`?

I am creating a script that will synchronize two databases. There is data in the database that should be stored as a tree, so I use django-mptt for the new database. When I synchronize the database, I select new data from the old database and must save it in the new one.

Is there a better way to add new nodes to the tree? At the moment the code looks like this:

    ...
    # Add new data to DB
    for new_record in new_records:
        # Find the appropriate parent using data in 'new_record'
        parent = get_parent(new_record)

        # Create the object to be added using data in 'new_record'
        new_node = MyMPTTModel(...)
        new_node.insert_at(parent, save=True)
        # Similar to:
        # new_node.insert_at(parent, save=False)
        # new_node.save()

But it is very slow. I think this is because each call to insert_at(..., save=True) has to write the new node to the database and also update the left and right keys of records that are already there.

Is there a way to make django-mptt only build up the changes each time I call insert_at, and then apply them all together when save is called? Or do you know any other way to reduce the execution time?

Thanks in advance.

+4
2 answers

First, do not use insert_at. It is not the cause of the slowness, but it is unnecessary and looks ugly. Just set node.parent:

    for new_record in new_records:
        new_node = MyMPTTModel(..., parent=get_parent(new_record))
        new_node.save()

Now for the performance issue. If you use the latest mptt (git master, not 0.5.4), there is a context manager called delay_mptt_updates that stops mptt from doing many of these updates until you have added all the nodes:

    from django.db import transaction

    with transaction.atomic():
        with MyMPTTModel.objects.delay_mptt_updates():
            for new_record in new_records:
                new_node = MyMPTTModel(..., parent=get_parent(new_record))
                new_node.save()

Alternatively, if you are touching almost the entire tree, you can speed things up even more by using disable_mptt_updates and rebuilding the whole tree at the end:

    with transaction.atomic():
        with MyMPTTModel.objects.disable_mptt_updates():
            for new_record in new_records:
                new_node = MyMPTTModel(..., parent=get_parent(new_record))
                new_node.save()
        MyMPTTModel.objects.rebuild()
+12

django-mptt maintains the tree structure for you. With each insert_at it therefore has to update every node to the right of the inserted one, which is why you are seeing performance problems.
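To make that cost concrete, here is a minimal pure-Python sketch of a nested-set insert. It is not django-mptt's code: the function name `insert_as_last_child` and the dict-based "table" are made up for illustration, the field names `lft` and `rght` simply mirror django-mptt's defaults, and only a single tree is modelled. The real library does the shifting with UPDATE queries rather than a Python loop.

```python
def insert_as_last_child(nodes, parent_id, new_id):
    """Insert a leaf under parent_id in a nested-set table held as a dict.

    nodes maps id -> {"lft": int, "rght": int} (hypothetical in-memory table).
    Returns how many existing rows had to be rewritten to make room.
    """
    # The new leaf takes over the parent's current right value.
    insert_point = nodes[parent_id]["rght"]
    touched = 0
    for row in nodes.values():
        changed = False
        # Every boundary at or after the insert point moves right by 2.
        if row["lft"] >= insert_point:
            row["lft"] += 2
            changed = True
        if row["rght"] >= insert_point:
            row["rght"] += 2
            changed = True
        touched += changed
    nodes[new_id] = {"lft": insert_point, "rght": insert_point + 1}
    return touched


# A root (1) with two children (2 and 3).
tree = {
    1: {"lft": 1, "rght": 6},
    2: {"lft": 2, "rght": 3},
    3: {"lft": 4, "rght": 5},
}
touched = insert_as_last_child(tree, parent_id=2, new_id=4)
```

In this tiny tree, adding one leaf under node 2 rewrites all three existing rows. Repeating that for every new record is what makes a loop of single inserts slow on a large tree.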

One option is to build the tree structure manually, without django-mptt.

You would take the new records and work out from them how the existing nodes in the tree have to change. Since you are only inserting data, only the left and right attributes of existing nodes change, not the level, which makes this a little easier. Once you know which nodes change, you can update them using a single update transaction.

Then you can start inserting the new data. Again, the fastest way is to calculate the left, right and level values for each new record and then do one bulk_create (Django >= 1.4). This results in only two database operations, which should be much faster in terms of database transactions.

However, this approach needs some clever way of working out how the old nodes in the tree must change. The easiest way is to load the entire tree into a Python structure and work out the changes there. That will not be possible if the tree is too large to fit in memory.
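As a rough sketch of the in-memory idea, the function below recomputes left, right and level values for a whole tree from a plain parent mapping. Everything here is an assumption made for illustration: the name `compute_nested_set`, the `parent_of` input format, a single root, root level 0, and sibling order taken from the order of the input. It does not touch the database or django-mptt.

```python
from collections import defaultdict


def compute_nested_set(parent_of):
    """Return {node_id: (lft, rght, level)} for a single-rooted tree.

    parent_of maps node_id -> parent_id, with None for the root
    (hypothetical input format; siblings keep their input order).
    """
    children = defaultdict(list)
    root = None
    for node_id, parent_id in parent_of.items():
        if parent_id is None:
            root = node_id
        else:
            children[parent_id].append(node_id)

    result = {}
    lft_of = {}
    counter = 1
    # Iterative depth-first walk, so deep trees do not hit the recursion limit.
    stack = [(root, 0, False)]
    while stack:
        node_id, level, visited = stack.pop()
        if visited:
            # Second visit: all descendants are numbered, so assign rght.
            result[node_id] = (lft_of[node_id], counter, level)
            counter += 1
            continue
        # First visit: assign lft, then schedule the children.
        lft_of[node_id] = counter
        counter += 1
        stack.append((node_id, level, True))
        for child in reversed(children[node_id]):
            stack.append((child, level + 1, False))
    return result


values = compute_nested_set({1: None, 2: 1, 3: 1, 4: 2})
```

Comparing these values with what is currently stored tells you which existing rows need an update and which rows are new.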

I'm not sure whether there is a more efficient way to do this. Maybe someone else on Stack Overflow has some interesting ideas...

EDIT

Sorry for the confusion over "update". I meant one transaction. In such cases I usually write a raw SQL query of the form update tbname set ... where id=1; update tbname set ... where id=2; so that several SQL statements are sent in a single query. In my experience, the expensive part for the database is not executing a statement but the transaction itself, because of network latency, database locking and so on. Using one transaction therefore lets the database work as fast as possible. I am not sure how to do this in Django with querysets; I usually make a raw SQL query.
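A small sketch of the "many updates, one transaction" idea, using the standard-library sqlite3 module as a stand-in for the real database. The table name `node` and its columns are invented for the example; with Django you would presumably wrap the equivalent statements in `transaction.atomic()` instead.

```python
import sqlite3

# In-memory database standing in for the real one (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, lft INTEGER, rght INTEGER)"
)
conn.executemany(
    "INSERT INTO node (id, lft, rght) VALUES (?, ?, ?)",
    [(1, 1, 6), (2, 2, 3), (3, 4, 5)],
)
conn.commit()

# New values worked out beforehand, as (lft, rght, id) tuples.
new_values = [(1, 8, 1), (2, 5, 2), (6, 7, 3)]

# Using the connection as a context manager runs all the updates in one
# transaction: it commits on success and rolls back if an exception is raised.
with conn:
    conn.executemany(
        "UPDATE node SET lft = ?, rght = ? WHERE id = ?", new_values
    )

rows = conn.execute("SELECT id, lft, rght FROM node ORDER BY id").fetchall()
```

All three rows are changed with a single commit rather than one commit per row.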

0

Source: https://habr.com/ru/post/1437039/

