Excel VBA: data-processing macro gets slower and slower the longer it runs

I analyze large volumes of historical financial data with the QuantlibXl library in Excel 2010, 32-bit. A typical worksheet contains long columns of empirical data, up to 1 million rows. My macros usually have to walk through each row from top to bottom and do some Quantlib-specific financial analytics, such as repricing a security, which requires creating Quantlib objects on each row. The analytics sit in the cells as formulas.

So, at first I simply selected the cells with the formulas in the top row and filled them down by dragging the lower-right corner to the bottom of the sheet. Even at this stage, processing time increased exponentially with the number of rows involved.

So I decided I needed to write a macro that processes a smaller batch of rows at a time. The macro essentially fills the formulas from the top row down only 100 rows at a time. This, together with a number of other optimizations (explained below), significantly improved the speed, but processing time still grew exponentially.
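
For concreteness, here is a minimal sketch of that chunked fill-down. The sheet name ("Data"), the formula range (D2:G2), and the key column (A) are placeholders I am assuming, not anything from the question; only the technique matters.

    ' Fill the formula row down 100 rows at a time and calculate only that block.
    Sub FillDownInChunks()
        Const CHUNK As Long = 100
        Dim ws As Worksheet, lastRow As Long, r As Long, stopRow As Long

        Set ws = ThisWorkbook.Worksheets("Data")          ' assumed sheet name
        lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

        Application.ScreenUpdating = False
        Application.Calculation = xlCalculationManual

        For r = 3 To lastRow Step CHUNK
            stopRow = Application.WorksheetFunction.Min(r + CHUNK - 1, lastRow)
            ' Copy the formulas from row 2 into the current block...
            ws.Range("D2:G2").Copy ws.Range("D" & r & ":G" & stopRow)
            ' ...and recalculate just that block.
            ws.Range("D" & r & ":G" & stopRow).Calculate
            Application.StatusBar = "Rows filled: " & stopRow
        Next r

        Application.StatusBar = False
        Application.Calculation = xlCalculationAutomatic
        Application.ScreenUpdating = True
    End Sub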

The problem is that no matter how much I optimize my macros, they get slower and slower the longer they run. I track the number of processed rows in the status bar: for example, if 2,000 rows per minute are processed (with fairly heavy calculations) right after the macro starts, its speed drops steadily over the run, down to perhaps only 100 rows per minute after 60,000 rows. At that rate it will never reach the end of the sheet. So in practice, at some point it becomes optimal to simply stop the macro and restart it from where it left off. I have also split the files and run them on different computers at the same time, which is a real pain from a management point of view.

I have already implemented tons of optimizations:

  • Screen updating and automatic calculation are disabled.
  • Only the row currently being processed is calculated.
  • Garbage collection: Quantlib objects are deleted immediately after they are no longer needed. I thought they were eating up all the free memory and causing the slowdown.
  • Most recently, I write the finished results (cells) out to a text file and delete the rows that are no longer needed.

Again, the macro was very fast at the start and would have finished within a few hours if it had not slowed down again at around row 70,000. I had actually hoped to see the speed increase over the run as rows are deleted and the sheet shrinks, but that just does not happen. So I keep stopping the process every 60,000 rows or so and restarting it, which is tedious.
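
The loop roughly looks like the following sketch. The sheet name, the result column (H), and the output path are placeholders I am assuming; each data row is assumed to already contain its formulas, so row 2 is always the next unprocessed row.

    Sub ProcessAndPurgeRows()
        Dim ws As Worksheet, f As Integer, processed As Long

        Set ws = ThisWorkbook.Worksheets("Data")          ' assumed sheet name
        f = FreeFile
        Open "C:\temp\results.txt" For Append As #f       ' assumed output path

        Application.ScreenUpdating = False
        Application.Calculation = xlCalculationManual

        ' Row 1 is the header; row 2 always holds the next unprocessed row.
        Do While Len(ws.Cells(2, 1).Value) > 0
            ws.Rows(2).Calculate                          ' recalc only this row
            Print #f, ws.Cells(2, 1).Value & vbTab & ws.Cells(2, 8).Value
            ws.Rows(2).Delete                             ' drop the finished row
            processed = processed + 1
            If processed Mod 100 = 0 Then
                Application.StatusBar = processed & " rows processed"
            End If
        Loop

        Close #f
        Application.StatusBar = False
        Application.Calculation = xlCalculationAutomatic
        Application.ScreenUpdating = True
    End Sub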

I would like to understand why Excel does not process large amounts of data at a linear rate here, why it needs restarting, and how to avoid that. If anyone has faced similar problems and found a way around them, I would be glad to hear about it.

EDIT: Every time I stop the process and restart it to regain speed, I have noticed that I also need to restart Excel itself, otherwise it resumes just as slowly as before. My current hypothesis is that at some point data is not being cleaned up correctly. If so, knowing that would get me further. The Quantlib library has a method, ohRepositoryObjectCount(), for checking how many objects are still in memory. I call ohRepositoryDeleteAllObjects() after each calculation, and according to ohRepositoryObjectCount() the objects really are deleted, but maybe there is still some leak that goes undetected.
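
A sketch of the kind of check I mean, assuming the two functions above can be invoked from VBA via Application.Run (which works for functions registered by an XLL add-in); the subroutine name is just a placeholder:

    Sub CheckRepositoryLeak()
        Dim before As Variant, after As Variant

        before = Application.Run("ohRepositoryObjectCount")
        Application.Run "ohRepositoryDeleteAllObjects"
        after = Application.Run("ohRepositoryObjectCount")

        ' If "after" stays near zero while Excel's memory use keeps growing,
        ' the leak is probably not in the object repository itself.
        Debug.Print "Objects before cleanup: " & before & ", after: " & after
    End Sub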

EDIT2: Now I am sure there is a memory leak, because after a long batch the Task Manager shows 3 or 4 Excel processes consuming about 1.5 GB of memory between them. When I quit, Excel crashes (with a message along the lines of "Excel has stopped working") and the processes stay alive, so I have to kill them manually.

1 answer

If my assumption is correct, your rows are a listing of all your securities, the rows are unrelated to each other, and you do not do calculations across them. If that is right, do the following:

  1. On a separate sheet, lay out all the data columns (both input and output) needed to represent a single row.
  2. Copy and paste the values of one row of data from your "source" sheet.
  3. Remove all your calculations from the source sheet and place them here.
  4. Copy and paste the resulting values back into the source sheet.

Put steps 2 through 4 in a macro and loop over your data.
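
A minimal sketch of that loop; the sheet names ("Source", "Calc") and the column ranges are placeholders I am assuming, so adjust them to the actual layout:

    Sub LoopOverSecurities()
        Dim src As Worksheet, calc As Worksheet
        Dim lastRow As Long, r As Long

        Set src = ThisWorkbook.Worksheets("Source")       ' assumed sheet names
        Set calc = ThisWorkbook.Worksheets("Calc")
        lastRow = src.Cells(src.Rows.Count, "A").End(xlUp).Row

        Application.ScreenUpdating = False
        Application.Calculation = xlCalculationManual

        For r = 2 To lastRow
            ' Step 2: paste the inputs of row r (columns A:F assumed) as values.
            calc.Range("A2:F2").Value = src.Range("A" & r & ":F" & r).Value
            ' Step 3: the calc sheet holds all the formulas; recalculate it alone.
            calc.Calculate
            ' Step 4: paste the outputs (columns G:J assumed) back as values.
            src.Range("G" & r & ":J" & r).Value = calc.Range("G2:J2").Value

            If r Mod 100 = 0 Then Application.StatusBar = r & " rows processed"
        Next r

        Application.StatusBar = False
        Application.Calculation = xlCalculationAutomatic
        Application.ScreenUpdating = True
    End Sub

This keeps the heavy formulas confined to a single calculation row, so the work per security stays constant no matter how long the source sheet is.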

Here is an addition to my answer, following the comment. If I were doing this:

  • My "source" data would be in a database. I am sure there are relationships among the securities that I would want to study.
  • I would transpose the row elements into a column on my calc sheet for easier reading (see the sketch after this list).
  • I would spread the calculations across multiple columns and sections for readability.
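
A tiny sketch of that transposition, again with placeholder sheet names and ranges: one source row (A2:F2) is written as a column (B2:B7) on the calc sheet.

    Sub CopyRowAsColumn()
        Dim src As Worksheet, calc As Worksheet
        Set src = ThisWorkbook.Worksheets("Source")       ' assumed sheet names
        Set calc = ThisWorkbook.Worksheets("Calc")

        ' One row of inputs becomes one easy-to-read column on the calc sheet.
        calc.Range("B2:B7").Value = _
            Application.WorksheetFunction.Transpose(src.Range("A2:F2").Value)
    End Sub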

Source: https://habr.com/ru/post/1442410/

