The session keeps track of every CodeSample object that you retrieve. So, after iterating over 2M objects, the session holds a reference to all of them. The session needs these references so that it can write the correct changes to the database on flush. Therefore, I believe the memory usage you are seeing is what you should expect.
To keep only N objects in memory at a time, you can do something like the code below (inspired by this answer; disclaimer: I have not tested it).
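```python
offset = 0
N = 10000  # number of rows to load per chunk
got_rows = True
while got_rows:
    got_rows = False
    # Note: LIMIT/OFFSET paging is only reliable with a stable ordering,
    # so the query may need an order_by() on a unique column.
    for sample in session.query(CodeSample).limit(N).offset(offset):
        got_rows = True
        for proj in projects:
            if sample.filename.startswith(proj.abs_source):
                sample.filename = "some other path"
    offset += N
    # Write this chunk's changes before loading the next chunk.
    session.flush()
```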
But the above is a bit clunky; maybe some SQLAlchemy gurus know a better way to do this.
By the way, you should not need session.add(); the session already tracks changes to the objects it has loaded. Why are you using yield_per? (EDIT: I assume it is there to fetch rows from the database in chunks, is that correct? In any case, the session will keep track of all of them.)
EDIT:
Hmm, it looks like I misunderstood something. From the docs:
weak_identity_map: When set to the default value of True, a weak-referencing map is used; instances which are not externally referenced will be garbage collected immediately. For dereferenced instances which have pending changes, the attribute management system will create a temporary strong reference to the object, which lasts until the changes are flushed to the database, at which point it is dereferenced again. Alternatively, when using the value False, the identity map uses a regular Python dictionary to store instances. The session will maintain all instances present until they are removed using expunge(), clear(), or purge().
and
prune(): Remove unreferenced instances cached in the identity map.
Note that this method is only meaningful if "weak_identity_map" is set to False. The default weak identity map is self-pruning.
Removes any object in this Session's identity map that is not referenced in user code, modified, new, or scheduled for deletion. Returns the number of objects pruned.
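To make the "weak-referencing map" idea from the quoted docs more concrete, here is a minimal sketch using only the standard library. It is an analogy, not SQLAlchemy's implementation: `Sample` is a hypothetical stand-in for a mapped class such as CodeSample, and a `weakref.WeakValueDictionary` plays the role of the identity map. It only shows that entries disappear once nothing else references the object; the temporary strong reference for objects with pending changes, described in the docs above, is not modelled.

```python
import gc
import weakref


class Sample:
    """Hypothetical stand-in for a mapped object such as CodeSample."""

    def __init__(self, filename):
        self.filename = filename


# A weak-value dictionary plays the role of the weak identity map.
identity_map = weakref.WeakValueDictionary()

# Keep strong references to the first two objects only.
kept = []
for key in range(5):
    obj = Sample("file_%d.py" % key)
    identity_map[key] = obj
    if key < 2:
        kept.append(obj)

del obj       # drop the loop variable's reference to the last object
gc.collect()  # usually unnecessary on CPython, but makes the point explicit

# Only the keys of objects that are still referenced elsewhere remain.
remaining = sorted(identity_map.keys())
print(remaining)
```

If this analogy carries over, then with the default weak identity map the clean (unmodified or already flushed) objects from earlier chunks can be garbage collected as soon as your own code stops holding on to them.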