Python str view

I have a huge str ~ 1 GB in length:

    >>> len(L)
    1073741824

I need to take many fragments of the string, each running from a specific index to the end of the string. In C, I would do:

    char *L = ...;
    char *p1 = L + start1;
    char *p2 = L + start2;
    ...

But in Python, slicing a string creates a new str instance, which uses more memory:

    >>> id(L)
    140613333131280
    >>> p1 = L[10:]
    >>> id(p1)
    140612259385360
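A minimal sketch, assuming CPython, of how to confirm that the slice is an independent copy rather than a view: it is a different object, and sys.getsizeof counts its own character data, so the slice costs roughly as many bytes as it has characters. The 1,000,000-character string is an illustrative stand-in for the 1 GB one.

```python
import sys

# Illustrative stand-in for the 1 GB string in the question.
L = "x" * 1_000_000

p1 = L[10:]  # slicing a str copies the characters into a new object

print(p1 is L)                        # False: a distinct object
print(len(p1))                        # 999990
# getsizeof includes the copied character data, so the slice is
# at least as large (in bytes) as the number of ASCII characters it holds.
print(sys.getsizeof(p1) >= len(p1))   # True
```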

To save memory, how can I create a str-like object that is really just a pointer into the original L?

Edit: Python 2 has buffer and Python 3 has memoryview, but memoryview does not offer the same interface as str or bytes; for example, it does not support ordering comparisons:

    >>> L = b"0" * 1000
    >>> a = memoryview(L)
    >>> b = memoryview(L)
    >>> a < b
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: memoryview() < memoryview()
    >>> type(b'')
    <class 'bytes'>
    >>> b'' < b''
    False
    >>> b'0' < b'1'
    True
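One possible workaround for the missing ordering is a small wrapper that holds a memoryview suffix and compares chunk by chunk. This is a sketch under stated assumptions: the class name SuffixView and the CHUNK size are hypothetical, the data is held as bytes (Python 3), and each comparison copies at most CHUNK bytes from each side at a time instead of the whole suffix.

```python
from functools import total_ordering

CHUNK = 4096  # hypothetical: bytes copied from each side per comparison step


@total_ordering
class SuffixView:
    """Hypothetical zero-copy view of data[start:] supporting ==, <, <=, >, >=."""

    def __init__(self, data, start):
        self._mv = memoryview(data)[start:]  # shares data's buffer, no copy

    def __len__(self):
        return len(self._mv)

    def __getitem__(self, index):
        return self._mv[index]  # an int for an index, a memoryview for a slice

    def tobytes(self):
        return self._mv.tobytes()  # explicit full copy, only when really needed

    def _compare(self, other):
        """Return -1, 0 or 1, ordering the suffixes the way bytes are ordered."""
        a, b = self._mv, other._mv
        n = min(len(a), len(b))
        for offset in range(0, n, CHUNK):
            end = min(offset + CHUNK, n)
            # Equal-length chunks: the first differing pair decides the order.
            ca, cb = a[offset:end].tobytes(), b[offset:end].tobytes()
            if ca != cb:
                return -1 if ca < cb else 1
        # The common prefix is equal, so the shorter suffix sorts first.
        return (len(a) > len(b)) - (len(a) < len(b))

    def __eq__(self, other):
        if not isinstance(other, SuffixView):
            return NotImplemented
        return self._compare(other) == 0

    def __lt__(self, other):
        if not isinstance(other, SuffixView):
            return NotImplemented
        return self._compare(other) < 0


data = b"banana"
s1, s3 = SuffixView(data, 1), SuffixView(data, 3)  # b"anana" and b"ana"
print(s3 < s1)                  # True, like b"ana" < b"anana"
print(len(s1), s1.tobytes())    # 5 b'anana'
```

A larger CHUNK means fewer Python-level iterations but a bigger temporary copy per step; the memory cost stays bounded either way.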
2 answers

There is a memoryview type (shown here in Python 2, where str is a byte string):

    >>> v = memoryview('potato')
    >>> v[2]
    't'
    >>> v[-1]
    'o'
    >>> v[1:4]
    <memory at 0x7ff0876fb808>
    >>> v[1:4].tobytes()
    'ota'
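In Python 3, memoryview does not accept str, so a sketch of the same idea assumes the data is held as bytes. Indexing then yields integers, and slicing yields another view over the same buffer. The bytearray part is only there to show that no copy is made: a change to the buffer is visible through the view. If the text is ASCII, byte offsets equal character offsets, so the suffix views correspond directly to the C pointers in the question.

```python
data = b"potato"
v = memoryview(data)

print(v[2])               # 116, the byte value of 't'
print(v[-1])              # 111, the byte value of 'o'
print(v[1:4].tobytes())   # b'ota' (tobytes() makes a copy)

# Suffix "pointers" like p1 = L + start1 in C: slicing copies no data.
p1 = v[2:]
print(bytes(p1))          # b'tato' (bytes() copies, for display only)

# Sharing check: with a mutable buffer, changes show through the view.
buf = bytearray(b"potato")
view = memoryview(buf)[1:4]
buf[1] = ord("a")         # buf is now b"patato"
print(view.tobytes())     # b'ata'
```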

If you need to work with a str, use iterators to access the data without duplicating its contents in memory.

Your main tools will be itertools.tee and itertools.islice:

    >>> from itertools import tee, islice
    >>> L = "Random String of data"
    >>> p1, p2 = tee(L)
    >>> p1 = islice(p1, 10, None)
    >>> p2 = islice(p2, 15, None)
    >>> ''.join(p1)  # this creates a copy
    'ing of data'
    >>> ''.join(p2)  # this creates a copy
    'f data'

This effectively gives you a pointer, but unlike in C/C++ it is only a forward pointer (an iterator).
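A small sketch of what "forward pointer" means in practice (the names are illustrative): an islice iterator only moves forward, and once consumed it is empty and cannot be rewound. Note also that reaching the start index means stepping over every earlier character, so creating the pointer costs O(start) rather than O(1) as in C.

```python
from itertools import islice

L = "Random String of data"
p1 = islice(L, 10, None)  # steps over the first 10 characters, then yields

print(next(p1))           # 'i' (advances p1 by one character)
print("".join(p1))        # 'ng of data' (consumes the rest)
print("".join(p1))        # '' (exhausted; there is no way to move back)
```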

Note: of course, you need to take care when using forward iterators, namely:

  • Save the pointer before advancing it. itertools.tee is useful here, as in p1, p_saved = tee(p1).
  • You can read a single character with next(p1) or the rest as a string with ''.join(p1), but since Python strings are immutable, every time you need a string representation you get a copy.
  • Since you can read individual characters, your algorithms should work on the iterators rather than build strings. For example, to compare two iterators, instead of ''.join(p1) == ''.join(p2) you would write all(a == b for a, b in izip_longest(p1, p2)) (zip_longest in Python 3). Plain izip stops at the end of the shorter iterator, so it would report "abc" and "abcd" as equal.
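The points above can be combined into a short Python 3 sketch (the helper iter_equal and the _MISSING sentinel are hypothetical names): tee saves a copy of the pointer before advancing, and equality is checked item by item with zip_longest, so iterators of different lengths compare unequal. Keep in mind that tee buffers every item one copy has consumed and the other has not, so letting the two copies drift far apart costs memory.

```python
from itertools import islice, tee, zip_longest

_MISSING = object()  # hypothetical sentinel that never equals a real character


def iter_equal(it1, it2):
    """Hypothetical helper: compare two iterators item by item, building no strings."""
    return all(a == b for a, b in zip_longest(it1, it2, fillvalue=_MISSING))


L = "abcXabc"

# Save the pointer before advancing: tee returns two independent iterators.
p1, p1_saved = tee(islice(L, 4, None))   # both point at index 4 ('abc')
print(next(p1))                          # 'a'; p1_saved is not advanced

print(iter_equal(p1_saved, islice(L, 0, 3)))   # True: 'abc' == 'abc'
print(iter_equal(p1, islice(L, 5, None)))      # True: 'bc' == 'bc'
# Different lengths: plain zip would stop early and wrongly report True here.
print(iter_equal(islice(L, 0, None), islice(L, 4, None)))  # False
```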

Source: https://habr.com/ru/post/1207200/

