When is array.array more efficient than lists?

Question

When is array.array more efficient than lists?

I read the Fluent Python book "when I met a sentence in which the author claims that

If you need to store 10 million floating point values, an array is more efficient since the array does not actually have full objects, but only packed bytes representing their machine values are the same as an array in C.

I can’t understand what the author is trying to convey. What does he say about "packed bytes"? what does “packed byte storage” mean ?, How are python lists stored? why doesn't he store it that way, if that's what makes it effective?

+6

python arrays list

Harish kayarohanam Dec 23 '16 at 20:25

source share

2 answers

jez · Answer 1 · 2016-12-23T20:34:57+0000

Say you are dealing with 8 byte floating point numbers. "Packed bytes" in this context means that there is a allocated fragment of allocated memory in which the first 8 bytes represent the first float, and then immediately the next 8 bytes represent the next float, etc. No loss. This is the most economical way to store data (at least without compression). It may also be the most time-efficient for certain operations (e.g. massive arithmetic operations).

Python list does not save things this way. First, one list item may be a float, but the next may be a different type of object. Alternatively, you can remove, insert, or replace items in a list. Some of these operations are related to lengthening or shortening the list dynamically. All of them are very inefficient in time and memory if the elements are stored as packed bytes. The Python list class is designed to be as versatile as possible, making trade-offs between the effectiveness of various types of operations.

Probably the most important difference is that the Python list in its basic C implementation is a container full of pointers to objects, not a container full of contents of the original object. One consequence of this is that multiple references to the same Python object may appear in list . Another is that changing a specific item can be performed very efficiently. For example, suppose the first item in your list, a[0] , is an integer, but you want to replace it with a string that takes up more memory, for example. a[0] = "There a horse in aisle five." A packed array should (a) make an extra room by shifting the rest of the contents of the array in memory and (b) separately update some index of sizes and types of elements. But a Python list would only have to overwrite one pointer value with another.

In a CPython implementation, the pointers themselves can (more or less) be packed into memory. This means that adding a new item to the list will still be ineffective (as to how it would be implemented in the implementation of the Python list , say, the structure of the list of links under the hood).

In general, there is no absolute “effective” or “ineffective” - all this is a question of what resource you are working with, what (limitations) content types are in the container and how you intend to transform the container or its contents.

Mad physicist · Answer 2 · 2016-12-23T20:51:07+0000

Under the hood, Python mainly works with pointers. The Python list is actually an array of pointers (at least in the reference implementation). This is part of where inefficiency comes from. The other part is that Python usually uses duck printing, so you cannot determine if the operation will work on the list item until you try it.

When you access a list item to do something with it, for example. call the __add__ method, you must: A) dereference the pointer to the actual object, B) find out if it has the __add__ attribute, check if it is callable, and then make the call.

In an array, you know the type of data stored on the front, so you know all its attributes and all operations. In addition, all actual numbers are packed together in one piece of memory. This does not apply to lists. Since numbers are just an object type in Python, lists of numbers are really lists of common object pointers.

To summarize, arrays store raw numbers instead of pointers to large objects wrapping numbers, making them smaller. Arrays know the type of their content up, so applying an operation to the entire array requires only one check, not a check for each element, as in lists.

When is array.array more efficient than lists?

More articles: