TPL Tasks + dynamic == OutOfMemoryException?

I am working on a streaming Twitter client. After 1-2 days of continuous running, memory usage climbs above 1.4 GB (it's a 32-bit process), and soon after it hits that point I get an OutOfMemoryException on code that is essentially this (this code will hit the error in under 30 seconds on my machine):

while (true)
{
    Task.Factory.StartNew(() =>
    {
        dynamic dyn2 = new ExpandoObject();
        // get a ton of text, make the string random enough
        // to not be interned, for the most part
        dyn2.text = Get500kOfText() + Get500kOfText() +
                    DateTime.Now.ToString() + DateTime.Now.Millisecond.ToString();
    });
}

I have profiled it, and it is definitely due to classes way down in the DLR (from memory - I do not have my detailed info in front of me): xxRuntimeBinderxx and xxAggregatexx.

This answer from Eric Lippert (Microsoft) seems to indicate that I am creating expression-parsing objects behind the scenes that never get GC'd, even though nothing in my code keeps a reference to them.

If this is the case, is there any way in the code above to prevent or reduce it?

My fallback is to eliminate the dynamic usage, but I would prefer not to.

thanks

Update:

12/14/12:

ANSWER:

For this specific example, in order to let go of its tasks, it needed a yield (Thread.Sleep(0)), which then allowed the GC to collect the freed tasks. I am guessing the message/event loop was not being allowed to process in this particular case.
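Roughly, that change looks like this in the stripped-down repro (a minimal sketch - the yield is the only difference from the original code):

while (true)
{
    Task.Factory.StartNew(() =>
    {
        dynamic dyn2 = new ExpandoObject();
        dyn2.text = Get500kOfText() + Get500kOfText() +
                    DateTime.Now.ToString() + DateTime.Now.Millisecond.ToString();
    });

    Thread.Sleep(0); // yield so queued tasks can run and completed ones can be collected
}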

In the actual code I was using (TPL Dataflow), I was not calling Complete() on the blocks because they were meant to be a never-ending stream of data - the task would take Twitter messages for as long as Twitter kept sending them. In this model there was never a reason to tell any of the blocks that they were complete, because they would never BE complete as long as the application was running.

Unfortunately, it does not look like Dataflow blocks were ever designed to be very long running or to handle untold numbers of items, because they actually keep a reference to everything that has been sent into them. If I am wrong, let me know.

So the workaround is to periodically (based on your memory usage - mine was every 100k Twitter messages) release the blocks and set them up again.

Under this scheme, memory consumption never goes above 80 MB, and after recycling the blocks and forcing a GC for good measure, the gen2 heap goes back down to 6 MB and everything is fine again.
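A rough sketch of the recycling scheme, for illustration only - ActionBlock<string>, CreateBlock, ProcessTweet and the 100k counter are stand-ins for my real pipeline:

using System.Threading.Tasks.Dataflow;

class TweetPipeline
{
    const int RecycleEvery = 100000;            // tune to your memory usage
    ActionBlock<string> block = CreateBlock();
    int messagesSinceRecycle = 0;

    public void OnTweet(string json)
    {
        block.Post(json);

        if (++messagesSinceRecycle >= RecycleEvery)
        {
            messagesSinceRecycle = 0;
            ActionBlock<string> old = block;
            block = CreateBlock();              // new messages go to the fresh block
            old.Complete();                     // drain the old one, then drop the reference
            old.Completion.Wait();
        }
    }

    static ActionBlock<string> CreateBlock()
    {
        // ProcessTweet is a placeholder for whatever work the block does
        return new ActionBlock<string>(json => ProcessTweet(json));
    }

    static void ProcessTweet(string json) { /* ... */ }
}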

10/17/12:

  • "It does nothing useful." This example is just to let you quickly create a problem. It fell off several hundred lines of code that have nothing to do with the problem.
  • "An endless loop that creates a task and, in turn, creates objects": Remember - it just quickly demonstrates the problem - the actual code sits there, waiting for more streaming data. In addition, looking at the code, all objects are created inside the Action <> lambda in the task. Why is this not cleared (after all) after it leaves the sphere of action? The problem also is not to do it too fast - it takes more than a day for the actual code to arrive at an exception from memory - it just makes it fast enough to try to do something.
  • "Are freedoms guaranteed?" An object is an object, right? I understand that the scheduler just uses the threads in the pool and the lambda that it executes will be thrown after execution independently.
+4
2 answers

This has more to do with the producer running far ahead of the consumer than with the DLR. The loop creates tasks as fast as possible, and the tasks do not get started "immediately". It is easy to see how far it can fall behind:

int count = 0;

new Timer(_ => Console.WriteLine(count), 0, 0, 500);

while (true)
{
    Interlocked.Increment(ref count);
    Task.Factory.StartNew(() =>
    {
        dynamic dyn2 = new ExpandoObject();
        dyn2.text = Get500kOfText() + Get500kOfText() +
                    DateTime.Now.ToString() + DateTime.Now.Millisecond.ToString();
        Interlocked.Decrement(ref count);
    });
}

Output:

324080
751802
1074713
1620403
1997559
2431238

That is a backlog of roughly 2.4 million scheduled-but-unfinished tasks after just 3 seconds. Removing the Task.Factory.StartNew (i.e. running the work single-threaded) yields stable memory.
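For comparison, the single-threaded version of the same loop - the body is identical, only the Task.Factory.StartNew wrapper is gone, so no backlog can build up:

while (true)
{
    // same work, executed inline on the producer thread
    dynamic dyn2 = new ExpandoObject();
    dyn2.text = Get500kOfText() + Get500kOfText() +
                DateTime.Now.ToString() + DateTime.Now.Millisecond.ToString();
}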

The repro you gave seems a bit contrived. If too many concurrent tasks really is your problem, you could try creating a custom task scheduler that limits concurrent scheduling.
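A simpler alternative to writing a full TaskScheduler is to cap the number of outstanding tasks with a SemaphoreSlim so the producer blocks once the backlog hits a limit; this is only a sketch, and the cap of 1000 is an arbitrary number:

var pending = new SemaphoreSlim(1000);  // at most 1000 tasks queued or running at once

while (true)
{
    pending.Wait();                     // producer blocks here once the cap is reached
    Task.Factory.StartNew(() =>
    {
        try
        {
            dynamic dyn2 = new ExpandoObject();
            dyn2.text = Get500kOfText() + Get500kOfText() +
                        DateTime.Now.ToString() + DateTime.Now.Millisecond.ToString();
        }
        finally
        {
            pending.Release();
        }
    });
}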

+3

The issue here is not that the tasks you create are not being cleaned up. Asti has demonstrated that your code creates tasks faster than they can be processed, so even though the memory of completed tasks is reclaimed, you still run out eventually.

You said:

placing a strategic sleep in this example will still produce an out-of-memory exception - it just takes longer

You have not provided the code for this, or for any other example that limits the number of concurrent tasks. My guess is that you are limiting creation to some degree, but that the rate of creation is still faster than the rate of consumption. Here is my own throttled example:

int numConcurrentActions = 100000;
BlockingCollection<Task> tasks = new BlockingCollection<Task>();

Action someAction = () =>
{
    dynamic dyn = new System.Dynamic.ExpandoObject();
    dyn.text = Get500kOfText() + Get500kOfText() +
               DateTime.Now.ToString() + DateTime.Now.Millisecond.ToString();
};

// add a fixed number of tasks
for (int i = 0; i < numConcurrentActions; i++)
{
    tasks.Add(new Task(someAction));
}

// take a task out, set a continuation to add a new one when it finishes,
// and then start the task.
foreach (Task t in tasks.GetConsumingEnumerable())
{
    t.ContinueWith(_ =>
    {
        tasks.Add(new Task(someAction));
    });
    t.Start();
}

This code ensures that no more than 100,000 tasks will be running at any given time. When I run it, memory is stable (when averaged over a few seconds). It limits the tasks by creating a fixed number up front and then setting a continuation that schedules a new task whenever an existing one finishes.

So this tells us that, since your real data is based on a feed from some external source, you are getting data from that feed just slightly faster than you can process it. You have a few options. You could queue items as they come in, ensure that only a limited number are running at any one time, and throw out requests once you have exceeded your capacity (or find some other way to filter the input so that you do not process all of it), or you could get better hardware (or optimize your processing method) so that you can process requests faster than they come in.
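Since your real code uses TPL Dataflow, the "queue a limited number and throw out the rest" option can be expressed with a bounded block. A sketch under those assumptions - ProcessTweet and tweetJson are placeholders for your handler and incoming message, and the numbers are arbitrary:

using System.Threading.Tasks.Dataflow;

var options = new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 10000,        // buffer at most 10,000 tweets
    MaxDegreeOfParallelism = 4      // process a fixed number at a time
};

var worker = new ActionBlock<string>(json => ProcessTweet(json), options);

// in the feed callback:
if (!worker.Post(tweetJson))
{
    // block is full - drop the item (or log/sample it) rather than queuing it without bound
}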

While I would normally say that people tend to try to optimize code when it already runs "fast enough", that is clearly not the case here. You have a fairly hard benchmark to hit; you need to process items faster than they come in. Currently you are not meeting that benchmark (but since it runs for a while before failing, you should not be that far off).

+1


