I am working on a batch process that pulls ~800,000 records out of a slow legacy database (1.4–2 ms per record fetch) into MySQL, which can run a little faster. To optimize this, I load all of the MySQL records into memory, which uses about 200 MB. Then I start pulling from the legacy database and updating the records.
Initially, when it finished updating the records, I would call SaveContext, which pushed my memory usage from ~500–800 MB up to 1.5 GB. Pretty soon I was getting out-of-memory exceptions (this runs on a virtual machine with 2 GB of RAM), and even if I could give it more RAM, 1.5–2 GB is still a bit excessive, and more RAM would just be a band-aid on the problem. To fix this, I started calling SaveContext every 10,000 records, which helped somewhat, and since I am using delegates to fetch the data from the legacy database and update MySQL, the performance hit was not too terrible: during the roughly 5-second wait for each save, another ~3,000 fetched records queue up in memory. However, memory usage is still growing.
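For concreteness, here is a minimal sketch of the periodic-flush approach described above. `MyEntities`, `LegacyReader`, and `ApplyUpdate` are hypothetical stand-ins for my actual types, and SaveContext stands in for whatever ends up calling `ObjectContext.SaveChanges()`:

```csharp
// Sketch of flushing every 10,000 records. MyEntities, LegacyReader, and
// ApplyUpdate are hypothetical placeholders for the real types involved.
using (var context = new MyEntities())
{
    int pending = 0;

    foreach (var legacyRow in LegacyReader.ReadAll())
    {
        ApplyUpdate(context, legacyRow); // copy the legacy values onto the tracked entity

        if (++pending >= 10000)
        {
            context.SaveChanges();       // flush a batch; memory still grows, because
            pending = 0;                 // the context keeps tracking the saved entities
        }
    }

    context.SaveChanges();               // flush the final partial batch
}
```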
Here are my constraints:
- The data comes out of the legacy database in no particular order, so I can't chunk the updates and periodically dispose of the ObjectContext.
- If I don't pull all of the MySQL data up front and instead look each record up during the update pass, it is incredibly slow. So instead I fetch it all in advance, load it into a dictionary indexed by primary key, and delete entries from the dictionary as the records get updated (see the sketch after this list).
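Roughly, the dictionary approach from the second bullet looks like this (entity and property names are made up for illustration):

```csharp
// Hypothetical sketch: load every MySQL record once, index by primary key,
// and drop each entry from the dictionary after it has been updated.
var byKey = context.MyRecords.ToDictionary(r => r.Id);

foreach (var legacyRow in LegacyReader.ReadAll())
{
    MyRecord entity;
    if (byKey.TryGetValue(legacyRow.Id, out entity))
    {
        entity.Value = legacyRow.Value; // copy fields from the legacy row
        byKey.Remove(legacyRow.Id);     // done with this key; shrink the dictionary
    }
}
```

Note that removing an entry from the dictionary only drops my own reference to the entity; the ObjectContext still tracks it, which is presumably why memory keeps climbing.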
One possible solution I have been considering is to somehow free the memory used by entities that I know I will never touch again, since they have already been updated (something like clearing the cache, but for a single entry), but I don't know whether that is even possible with Entity Framework.
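If per-entity eviction is what's needed, ObjectContext does expose a `Detach` method that removes a single entity from change tracking. A minimal sketch, assuming the entity in question has already been saved (Detach discards any pending changes, and it does not detach related entities):

```csharp
// Evict a single, already-saved entity from the context so the GC can
// reclaim it. Only this one object is removed from tracking.
context.SaveChanges();    // make sure the entity's changes are persisted first
context.Detach(entity);   // stop tracking; the context drops its reference
```

On the newer DbContext-based API, the equivalent would be setting `context.Entry(entity).State = EntityState.Detached;`, but I don't know whether either approach scales to hundreds of thousands of detach calls.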
Does anyone have any thoughts?