Perl Memory Conservation Tips

What are some good techniques for saving memory in a Perl script? I am interested in saving as much memory as possible on systems that depend on Perl programs. I know Perl has a reputation for heavy memory usage, but I would like to know whether there are ways to improve it.

So what can you do to make a Perl script use less memory? I am interested in any suggestions, whether they are tips for writing code or tips for compiling Perl differently.

Edit for Bounty: I have a Perl program that serves as the server for a network application. Each client that connects to it currently gets its own child process. I have also tried threads instead of forks, but I was unable to determine whether using threads instead of forks is actually more memory efficient.

I would like to try threads again instead of forks, since in theory this should save memory. I have a few questions in that regard:

  • Do threads created in Perl copy the Perl module libraries into memory for each thread?
  • Is threads (use threads) the most efficient way (or the only way) to create threads in Perl?
  • In threads, I can specify a stack_size parameter. What should I consider when choosing this value, and how does it affect memory usage?

With threads in Perl on Linux, what is the most reliable method for determining the actual memory usage per thread?

+44
memory perl
Mar 16 '12
6 answers

What problem are you having, and what does “big” mean to you? I have friends who need to load 200 GB files into memory, so their idea of good advice is much different from that of someone on a budget squeezing into a minimal VM slice with 250 MB of RAM (really? my phone has more than that).

In general, Perl holds on to any memory you use, even when it is no longer using it. Also realize that optimizing in one direction, such as memory, can adversely affect another, such as speed.

This is not an exhaustive list (and there is more on this in Programming Perl):

• Use Perl's memory profiling tools to help you find problem areas. See the questions on profiling heap memory usage in Perl programs and on finding the amount of physical memory used by a hash in Perl.

• Use lexical variables with the smallest possible scope so that Perl can reuse that memory when you no longer need it.

• Avoid creating big temporary structures. For example, reading a file with foreach pulls in all of the input at once. If you only need it one line at a time, use while:

  foreach ( <FILE> ) { ... }   # list context, all at once
  while  ( <FILE> ) { ... }    # scalar context, line by line

• You may not even need the file in memory at all: memory-map the file instead of slurping it.
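
A minimal sketch of memory-mapping with the CPAN module File::Map; the file name and the pattern here are made-up examples:

  use File::Map 'map_file';

  map_file my $map, 'big.log', '<';   # maps the file; nothing is slurped

  # Scanning the map pages data in on demand; only the matched
  # line is copied out into a Perl scalar.
  while ( $map =~ /^(ERROR: .*)$/mg ) {
      print "$1\n";
  }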

• If you need to build large data structures, consider something like DBM::Deep or another storage engine to keep most of the data on disk and out of RAM until you need it.
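
As a hedged sketch of the DBM::Deep idea (the file name and keys are illustrative): the structure lives in a disk file, and only the pieces you touch come into RAM:

  use DBM::Deep;

  my $db = DBM::Deep->new( 'users.db' );   # backed by a file, not RAM

  $db->{alice}{visits} = 42;               # written through to disk
  $db->{alice}{tags}   = [ 'admin' ];      # nested structures work too

  print $db->{alice}{visits}, "\n";        # read back on demand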

• Don't let people use your program. Whenever I have done that, I have reduced the memory footprint by about 100%. It also cuts down on support requests.

• Pass large chunks of text and large aggregates by reference so you don't make a copy and store the same information twice. If you have to copy it because you want to change something, you may be stuck. This goes both ways, as subroutine arguments and as subroutine return values:

  call_some_sub( \$big_text, \@long_array );

  sub call_some_sub {
      my( $text_ref, $array_ref ) = @_;
      ...
      return \%hash;
  }

• Track down memory leaks in modules. I had big problems with an application until I realized that a module was not releasing memory. I found a fix in the module's RT queue, applied it, and solved the problem.

• If you need to handle a big chunk of data once but don't want the persistent memory footprint, offload the work to a child process. The child process only has the memory footprint while it is working; once you get the answer back, the child exits and releases its memory. Similarly, work-distribution systems such as Gearman can spread the work among machines.
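
A sketch of offloading one memory-hungry step to a child process; crunch() is a hypothetical stand-in for the real work:

  my $pid = open( my $from_child, '-|' );  # fork, with a pipe back to the parent
  die "fork failed: $!" unless defined $pid;

  if ( $pid == 0 ) {              # child: do the big work, print the answer
      my $answer = crunch();      # all the big allocations happen here
      print $answer;
      exit 0;                     # child exits, the OS reclaims its memory
  }

  my $answer = <$from_child>;     # parent: read the result
  close $from_child;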

• Turn recursive solutions into iterative ones. Perl does not have tail-recursion optimization, so every new call adds to the call stack. You can handle the tail-call problem yourself with tricks such as goto or a module, but that is a lot of work to hang on to a technique you probably don't need.
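
For illustration, a recursion can usually be replaced by a loop over an explicit to-do list, so the Perl call stack stays flat; process() and children() here are hypothetical:

  my @todo = ($root);
  while ( @todo ) {
      my $node = pop @todo;
      process($node);                 # do the work for this node
      push @todo, children($node);    # queue sub-problems instead of recursing
  }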

• Did he use six gigabytes or only five? Well, to tell you the truth, in all this excitement I kind of lost track myself. But being as this is Perl, the most powerful language in the world, and would blow your memory clean off, you've got to ask yourself one question: “Do I feel lucky?” Well, do ya, punk?

There is a lot more, but it is too early in the morning to figure out what that is. I cover some of this in Mastering Perl and Effective Perl Programming.

+75
Mar 16 '12 at 9:28

My two cents.

  • Do threads created in Perl copy the Perl module libraries into memory for each thread?

    • They do not; it is all one process, and the module code is not duplicated. What is not shared is the program stack: each thread must have its own.
  • Is threads (use threads) the most efficient way (or the only way) to create threads in Perl?

    • IMO, any method ultimately calls the pthread APIs that do the actual work.
  • In threads, I can specify the stack_size parameter. What should I consider when choosing this value, and how does it affect memory usage?

    • Because threads run in the same process space, stacks cannot be shared. The stack size tells pthreads how far apart the thread stacks should be placed. Every time a function is called, its local variables are allocated on the stack, so the stack size limits recursion depth. Pick as small a value as you can while your application still works.

With threads in Perl on Linux, what is the most reliable method for determining the actual memory usage per thread?

  • Stack storage is fixed after your thread is spawned; heap and static storage are shared, and any thread can use them, so the notion of per-thread memory usage doesn't really apply. It is per process.

Comparing fork and threads:

  • fork duplicates the process and inherits the file handles. Advantages: simpler application logic and more fault tolerance; a spawned process can become faulty and leak resources, but it will not bring down the parent. It is a good solution if you do not fork a lot and the forked processes eventually exit and are cleaned up by the system. Disadvantages: more overhead per fork, system limits on the number of processes you can fork, and your processes cannot share variables.

  • threads run in the same process with additional program stacks. Advantages: lower memory footprint, spawning a thread is faster and lighter than a fork, and you can share variables. Disadvantages: more complex application logic, serialization of resources, and so on; you need very reliable code and must watch for resource leaks, which can bring down the entire application.

IMO, depending on what you do, fork can use far less memory over the lifetime of the application if whatever you spawn just does its work independently and exits, instead of risking memory leaks in threads.
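
To make the two knobs above concrete, here is a minimal hedged sketch; the 64 KB stack size is an arbitrary example, so verify that your real workload fits whatever value you pick:

  use threads ( 'stack_size' => 64 * 1024 );   # small per-thread stack
  use threads::shared;

  my $served :shared = 0;        # one copy, visible to every thread

  my $t = threads->create( sub {
      lock($served);             # serialize access to the shared counter
      $served++;
  } );
  $t->join;
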
+4
Mar 25 '12 at 18:08

If you are really desperate, you can try mounting some memory as a file system (tmpfs/ramdisk) and reading, writing, and deleting files on it. I assume the tmpfs implementation is smart enough to release the memory when you delete a file.

You can also mmap (see File::Map, Sys::Mmap) a huge file on tmpfs, an idea I got from Cache::FastMmap.

Never tried, but it should work :)

+2
Mar 16 '12

Both threads and forks use copy-on-write (CoW) memory pages. With threads you can declare shared variables, but by default your variables are copied for each thread. In both cases you can expect higher memory consumption.

I don't know which application you are dealing with, but you may want to consider writing it with an event-driven model instead of parent/child processes. I would recommend taking a look at AnyEvent; it is quite simple, and since the application becomes single-threaded (one process), you will save some memory (and it is even faster in some cases). People have even written web servers on top of AnyEvent with very good performance, and you would hardly notice that it is single-threaded. Take a look at Twiggy, for example.
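
As a hedged illustration of the event-driven model (the port and the echo behavior are made-up examples), a single-process server with AnyEvent can look like this:

  use AnyEvent;
  use AnyEvent::Socket;
  use AnyEvent::Handle;

  tcp_server undef, 9000, sub {
      my ($fh) = @_;
      my $handle; $handle = AnyEvent::Handle->new(
          fh       => $fh,
          on_error => sub { $handle->destroy },
      );
      $handle->on_read( sub {
          my ($h) = @_;
          $h->push_write( $h->{rbuf} );   # echo back whatever arrived
          $h->{rbuf} = '';                # consume the read buffer
      } );
  };

  AnyEvent->condvar->recv;   # enter the event loop; no fork or thread per client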

+1
Jan 07 '13 at 2:42

In addition to brian d foy's suggestions, I found that the following also helped a LOT.

  • If possible, do not use external modules; you do not know how much memory they use. I found that replacing the LWP and HTTP::Request::Common modules with calls to curl or lynx cut memory usage in half.
  • We cut it in half again by trimming our own modules down to only the required routines and loading them with "require" rather than pulling in a complete library of unneeded subroutines (see the sketch below).
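
A sketch of the require idea, so a module's memory cost is paid only on the code path that actually needs it; the helper here is illustrative:

  sub fetch_url {
      my ($url) = @_;
      require LWP::UserAgent;    # loaded at run time, on first use only
      return LWP::UserAgent->new->get($url)->decoded_content;
  }
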
  • Brian mentions using lexical variables with the smallest possible scope. If you fork, using "undef" also helps by immediately freeing memory for Perl to reuse. So you declare a scalar, array, hash, or even a sub, and when you are done with any of them, use:

    my @divs = localtime(time);
    $VAR{minute} = $divs[1];

    undef @divs;
    undef @array;
    undef $scalar;
    undef %hash;
    undef &sub;

  • And don't use unneeded variables just to make your code shorter. It is better to hard-code whatever you can to reduce namespace usage.

Then there are many other tricks you can try, depending on your application's functionality. Ours was run by cron every minute. We found we could stagger half of the processes with sleep(30), so half would run and finish within the first 30 seconds, freeing CPU and memory, and the other half would run after a 30-second delay. Resource usage dropped by half again. All told, we reduced the RAM used from more than 2 GB to 200 MB, a 90% saving.

We were able to get a pretty good idea of memory usage with

 top -M 

since our script ran on a relatively stable server hosting a single site, so watching the "free mem" figure gave us a pretty good indication of memory usage.

Also "ps" grepping for your script, and if forking, sorting by memory or CPU usage was good help.

 ps -e -o pid,pcpu,pmem,stime,etime,command --sort=+cpu | grep scriptname | grep -v grep 
+1
04 Feb '15 at 17:02

Try using more caching. Since the logic for implementing a caching routine is always the same, you can automate it with the CPAN module Memoize. Use Devel::Size to check the actual memory footprint.
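
A short sketch of both modules named above; slow_lookup() and the hash are illustrative, and keep in mind that the memo cache itself costs memory:

  use Memoize;
  use Devel::Size qw(total_size);

  sub slow_lookup {               # stand-in for an expensive pure function
      my ($key) = @_;
      return length($key) ** 2;
  }
  memoize('slow_lookup');         # repeat calls are now served from a cache

  my %big = map { $_ => 'x' x 100 } 1 .. 10_000;
  print total_size(\%big), " bytes\n";   # actual bytes, not element counts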

-7
Mar 16 '12 at 9:28


