Python garbage collection for Linux

I am a little puzzled by how Python handles memory allocation and garbage collection, and how this depends on the platform. For example, compare the following two code fragments:

Fragment A:

    >>> id('x' * 10000000) == id('x' * 10000000)
    True

Fragment B:

    >>> x = "x"*10000000
    >>> y = "x"*10000000
    >>> id(x) == id(y)
    False

Fragment A returns True because Python allocates both strings at the same memory location, whereas in Fragment B the two strings are allocated at different locations, so their ids differ.

But apparently the system or the platform affects this, because when I try it on a larger scale:

    for i in range(1, 1000000000):
        # Stop at the first size where the two temporary strings get different ids.
        if id('x' * i) != id('x' * i):
            print(i)
            break

A friend tried this on a Mac and it ran to the end without stopping. When I ran it on several Linux virtual machines, it invariably broke out of the loop, but at a different value of i on each VM. Is this because of how Python schedules garbage collection? Is it because my Linux VMs are slower than the Mac, or does Python on Linux implement garbage collection differently?

3 answers

CPython uses two memory management strategies:

  • Reference counting
  • Mark-and-sweep garbage collection (for reference cycles)
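
A minimal sketch of both mechanisms, assuming CPython. The Node class is hypothetical and exists only to build a reference cycle; the exact values returned by sys.getrefcount are an implementation detail, so only the change in the count is shown.

```python
import gc
import sys
import weakref


class Node:
    """Hypothetical object used only to build a reference cycle."""

    def __init__(self):
        self.other = None


# Reference counting: binding another name to the object raises its count.
obj = Node()
before = sys.getrefcount(obj)
alias = obj
after = sys.getrefcount(obj)
print("refcount grew by", after - before)

# Cycle collection: a and b refer to each other, so their counts never
# reach zero on their own; only the cycle collector can free them.
gc.disable()  # stop automatic collections so the effect is visible
a, b = Node(), Node()
a.other, b.other = b, a
probe = weakref.ref(a)
del a, b
survived_del = probe() is not None  # still alive: the cycle keeps it
gc.collect()                        # explicit run of the cycle collector
freed_by_gc = probe() is None       # now collected
gc.enable()
print(survived_del, freed_by_gc)
```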

Allocation is generally done through the platform's malloc/free and inherits the performance characteristics of the underlying implementation. Whether memory gets reused is determined by the C library and the operating system, not by Python. (Some small objects are pooled by the Python VM itself.)

In your example, however, the "real" GC algorithm never runs (it is only used to collect reference cycles). Your long string is freed as soon as its last reference is removed.
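
To check that no cycle collection is involved, the sketch below (assuming CPython; other implementations such as PyPy do not free objects immediately) switches the gc module off and confirms that an object is still freed the moment its last reference disappears. Strings cannot be weakly referenced, so the long string is wrapped in a hypothetical Holder class whose only job is to support weakref.

```python
import gc
import weakref


class Holder:
    """Hypothetical wrapper: str objects cannot be weak-referenced directly."""

    def __init__(self, size):
        self.payload = "x" * size  # the "long string" from the question


gc.disable()                 # the cycle collector is switched off
h = Holder(10_000_000)
ref = weakref.ref(h)
del h                        # last reference removed: count drops to zero
freed_immediately = ref() is None
gc.enable()
print(freed_immediately)
```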


The allocator simply uses whatever location is convenient. There are many different allocation strategies, and all of them are also affected by configuration options, the platform, memory usage, the phase of the moon, etc. Trying to guess where the interpreter will place particular objects is just a waste of time.
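
The one thing Python does guarantee is that two objects alive at the same time have different ids; once an object is freed, its id may be reused by a later object. A small sketch of that distinction (both function names are hypothetical helpers):

```python
def same_id_for_temporaries(n):
    # The first string is freed as soon as id() returns, so the allocator
    # MAY give the second string the same address. Nothing guarantees it.
    return id("x" * n) == id("x" * n)


def same_id_while_alive(n):
    # Both strings are referenced at the same time, so for n > 1 they are
    # two distinct live objects and their ids must differ.
    x = "x" * n
    y = "x" * n
    return id(x) == id(y)


print(same_id_for_temporaries(10_000_000))  # platform dependent
print(same_id_while_alive(10_000_000))      # False
```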


This is because Python caches small integers and strings:

Large strings stored in variables: not cached

    In [32]: x = "x"*10000000
    In [33]: y = "x"*10000000
    In [34]: x is y
    Out[34]: False

Large strings not stored in variables: appear to be cached

    In [35]: id('x' * 10000000) == id('x' * 10000000)
    Out[35]: True

Small strings: cached

    In [36]: x = "abcd"
    In [37]: y = "abcd"
    In [38]: x is y
    Out[38]: True
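
Short string literals like "abcd" are usually interned by CPython, which is why x is y is True above; strings built at runtime are not interned automatically, but sys.intern can force them to share one object. A minimal sketch, assuming CPython:

```python
import sys

# Built at runtime, so each join produces a new, separate object.
s1 = "".join(["ab", "cd"])
s2 = "".join(["ab", "cd"])
print(s1 == s2, s1 is s2)  # True False

# sys.intern returns one canonical object for equal strings.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)            # True
```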

Small integers: cached

    In [39]: x = 3
    In [40]: y = 3
    In [41]: x is y
    Out[41]: True

Large integers:

Stored in variables: not cached

    In [49]: x = 12345678
    In [50]: y = 12345678
    In [51]: x is y
    Out[51]: False

Not stored in variables: appear to be cached

    In [52]: id(12345678) == id(12345678)
    Out[52]: True
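
CPython preallocates the integers from -5 to 256, so every occurrence of such a value is the same object, while larger integers are created on demand. The sketch below, assuming CPython, builds the values at runtime with int() so the result does not depend on how the compiler stores literal constants; the helper name from_text is hypothetical.

```python
def from_text(text):
    # int() on a string creates the value at runtime, so the compiler
    # cannot share one constant object between the two calls.
    return int(text)


small_a, small_b = from_text("256"), from_text("256")
large_a, large_b = from_text("12345678"), from_text("12345678")

print(small_a is small_b)  # True: 256 is in the small-int cache
print(large_a is large_b)  # False: each call builds a new int object
```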

Source: https://habr.com/ru/post/1443082/

