Python garbage collection for Linux

Question

I am a little puzzled by how Python allocates and garbage-collects memory, and by how this depends on the platform. For example, compare the following two code fragments:

Fragment A:

    >>> id('x' * 10000000) == id('x' * 10000000)
    True

Fragment B:

    >>> x = "x"*10000000
    >>> y = "x"*10000000
    >>> id(x) == id(y)
    False

Fragment A returns True because Python places both strings at the same memory location in the first test. In Fragment B it places them at different locations, so their ids differ.

But apparently the performance of the system or the platform affects this, because when I try it on a larger scale:

    for i in xrange(1, 1000000000):
        if id('x' * i) != id('x' * i):
            print i
            break

A friend on a Mac tried this and it ran to the end. When I ran it on a number of Linux virtual machines, it invariably hit the break, but at a different value of i on each machine. Is this because of how Python schedules garbage collection? Is it because my Linux virtual machines have less processing power than the Mac, or does the Linux Python implementation collect garbage differently?

+4

python linux

pbanka Oct 30 '12 at 19:40

3 answers

The garbage collector simply uses whatever location is convenient. There are many different garbage collection strategies, and the outcome is also affected by interpreter options, the platform, memory usage, the phase of the moon, and so on. Trying to guess where the interpreter will allocate particular objects is just a waste of time.

+6

Antimony Oct 30 '12 at 19:45

This is because Python caches small integers and strings:

Large strings stored in variables: not cached

    In [32]: x = "x"*10000000
    In [33]: y = "x"*10000000
    In [34]: x is y
    Out[34]: False

Large strings not stored in variables: appear to be cached

    In [35]: id('x' * 10000000) == id('x' * 10000000)
    Out[35]: True

Small strings: cached

    In [36]: x="abcd"
    In [37]: y="abcd"
    In [38]: x is y
    Out[38]: True

Small integers: cached

    In [39]: x=3
    In [40]: y=3
    In [41]: x is y
    Out[41]: True

Large integers stored in variables: not cached

    In [49]: x=12345678
    In [50]: y=12345678
    In [51]: x is y
    Out[51]: False

Large integers not stored in variables: appear to be cached

    In [52]: id(12345678)==id(12345678)
    Out[52]: True
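The matching ids in the unnamed cases can also be explained by object lifetime rather than caching: the first temporary is freed before the second one is created, so the second may reuse the same address. The following is a minimal Python 3 sketch of that reading. The helper names build, temporaries_share_id and live_objects_share_id are made up for this example, and the first result is deliberately left unstated because it depends on the allocator.

```python
N = 10000000  # same size as in the examples above

def build(n):
    # Multiplication happens at run time, so the two results cannot be
    # one shared compile-time constant.
    return 'x' * n

def temporaries_share_id(n):
    # Each temporary string is released as soon as id() returns, so the
    # second string may be placed at the same address. This is allowed
    # but not guaranteed; the result depends on the platform allocator.
    return id(build(n)) == id(build(n))

def live_objects_share_id(n):
    a = build(n)
    b = build(n)
    # Both strings are alive at the same time here, so their ids
    # cannot be equal.
    return id(a) == id(b)

print(temporaries_share_id(N))   # platform dependent
print(live_objects_share_id(N))  # False
```

If the first call prints True, that only shows the address was reused, not that the string itself was kept around.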

+5

Ashwini chaudhary Oct 30 '12 at 19:50

ebo · Accepted Answer · 2012-10-30T19:50:16+0000

CPython uses two memory management strategies:

Reference counting
Mark-and-Sweep Garbage Collection.

Allocation is generally done through the platform's malloc/free functions and inherits the performance characteristics of the underlying runtime. Whether memory is reused is decided by the operating system. (Some objects are pooled by the Python VM.)
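As an illustration of pooling, CPython keeps one shared object for each small integer (currently the range -5 to 256, an implementation detail that other interpreters need not follow). This is a small sketch for CPython on Python 3; the helper name same_object is hypothetical.

```python
def same_object(text):
    # int(text) is evaluated at run time, so the compiler cannot merge
    # the two results into one constant. Both results are alive while
    # `is` compares them.
    return int(text) is int(text)

print(same_object("100"))       # True on CPython: small ints are pooled
print(same_object("12345678"))  # False on CPython: two separate objects
```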

In your example, however, the "real" GC algorithm never runs (it is only used to collect reference cycles). Your long string is freed as soon as the last reference to it is removed.
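The difference between the two strategies can be observed with the standard gc and weakref modules. This is a minimal sketch for CPython on Python 3; the Node class and both function names are invented for the illustration. The cycle collector is switched off, so anything that disappears on its own was released by reference counting alone.

```python
import gc
import weakref

class Node(object):
    """Hypothetical helper; instances can be weakly referenced."""
    def __init__(self):
        self.other = None

def freed_without_gc():
    was_enabled = gc.isenabled()
    gc.disable()  # no automatic cycle collection during the experiment
    try:
        obj = Node()
        probe = weakref.ref(obj)  # does not keep obj alive
        del obj                   # last reference removed
        return probe() is None    # True if the object is already gone
    finally:
        if was_enabled:
            gc.enable()

def cycle_needs_gc():
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # reference cycle: a <-> b
        probe = weakref.ref(a)
        del a, b                  # counts never reach zero
        alive_before = probe() is not None
        gc.collect()              # run the cycle collector by hand
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()

print(freed_without_gc())  # True on CPython
print(cycle_needs_gc())    # (True, False) on CPython
```

The strings in the question contain no references to other objects, so they behave like the first case.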
