Subclass str and create a new method with the same effect as +=

I'm trying to subclass str - not for anything important, just to learn more about Python's built-in types. I subclassed str this way (using __new__ because str is immutable):

    class MyString(str):
        def __new__(cls, value=''):
            return str.__new__(cls, value)

        def __radd__(self, value):  # what method should I use??
            return MyString(self + value)  # what goes here??

        def write(self, data):
            self.__radd__(data)

It initializes correctly, as far as I can tell. But I can't get it to modify itself in place using the += operator. I've tried overriding __add__, __radd__, __iadd__ and many other combinations. Using return, I've managed to get it to return a new MyString instance with the data correctly appended, but not to modify the existing one in place. Success would look like this:

    b = MyString('g')
    b.write('h')  # b should now be 'gh'

Any thoughts?
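Since str is immutable, no method (write() included) can change the characters of an existing MyString; b += 'h' can only create a new object and rebind the name b to it. What a subclass can do is make sure that new object is still a MyString. A minimal sketch, assuming Python 3:

```python
class MyString(str):
    """A str subclass whose + and += keep returning MyString.

    str is immutable, so no method can change the characters of an
    existing MyString object; += can only build a new object and
    rebind the name on the left-hand side to it.
    """

    def __new__(cls, value=''):
        return str.__new__(cls, value)

    def __add__(self, other):
        # Let str do the actual concatenation (it returns a plain str).
        result = str.__add__(self, other)
        if result is NotImplemented:
            return NotImplemented
        # Wrap the result so the subclass type survives.
        return MyString(result)

    # There is no separate __iadd__: "b += x" falls back to __add__
    # and then rebinds b to the returned (new) object.


b = MyString('g')
old_id = id(b)
b += 'h'
print(b, type(b).__name__)  # gh MyString
print(id(b) == old_id)      # False: b now refers to a different object
```

The id() check shows that b is a new object after +=. A write() method has no name to rebind, so it cannot achieve the same effect, which is why a mutable wrapper class, as in the update below, is the usual way out.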

UPDATE

To give a reason why someone might want to do this: following a suggestion, I created the following mutable class that uses a plain string internally:

    class StringInside(object):
        def __init__(self, data=''):
            self.data = data

        def write(self, data):
            self.data += data

        def read(self):
            return self.data

and checked using timeit:

    >>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
    0.004415035247802734
    >>> timeit.timeit("arr.write('1234567890')", setup="from hard import StringInside; arr = StringInside()", number=10000)
    0.0331270694732666

The difference grows rapidly as number goes up: at 1 million iterations, StringInside took longer than I was willing to wait, while the plain str version returned in ~100 ms.

UPDATE 2

For posterity, I decided to write a Cython class wrapping a C++ string to see whether performance could be improved over a version loosely based on Mike Muller's updated answer below, and I succeeded. I understand that using Cython is "cheating", but I'm providing this just for fun.

python version:

    class Mike(object):
        def __init__(self, data=''):
            self._data = []
            self._data.extend(data)

        def write(self, data):
            self._data.extend(data)

        def read(self, stop=None):
            return ''.join(self._data[0:stop])

        def pop(self, stop=None):
            if not stop:
                stop = len(self._data)
            try:
                return ''.join(self._data[0:stop])
            finally:
                self._data = self._data[stop:]

        def __getitem__(self, key):
            return ''.join(self._data[key])

cython version:

    from libcpp.string cimport string

    cdef class CyString:
        cdef string buff
        cdef public int length

        def __cinit__(self, string data=''):
            self.length = len(data)
            self.buff = data

        def write(self, string new_data):
            self.length += len(new_data)
            self.buff += new_data

        def read(self, int length=0):
            if not length:
                length = self.length
            return self.buff.substr(0, length)

        def pop(self, int length=0):
            if not length:
                length = self.length
            ans = self.buff.substr(0, length)
            self.buff.erase(0, length)
            # keep the cached length in sync with the buffer after erasing
            self.length = self.buff.size()
            return ans

performance:

write

    >>> timeit.timeit("arr.write('1234567890')", setup="from pyversion import Mike; arr = Mike()", number=1000000)
    0.5992741584777832
    >>> timeit.timeit("arr.write('1234567890')", setup="from cyversion import CyBuff; arr = CyBuff()", number=1000000)
    0.17381906509399414

reading

    >>> timeit.timeit("arr.write('1234567890'); arr.read(5)", setup="from pyversion import Mike; arr = Mike()", number=1000000)
    1.1499049663543701
    >>> timeit.timeit("arr.write('1234567890'); arr.read(5)", setup="from cyversion import CyBuff; arr = CyBuff()", number=1000000)
    0.2894480228424072

popping

    >>> # note I'm using 10e3 iterations - the python version wouldn't return otherwise
    >>> timeit.timeit("arr.write('1234567890'); arr.pop(5)", setup="from pyversion import Mike; arr = Mike()", number=10000)
    0.7390561103820801
    >>> timeit.timeit("arr.write('1234567890'); arr.pop(5)", setup="from cyversion import CyBuff; arr = CyBuff()", number=10000)
    0.01501607894897461
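To make the semantics of the Python version concrete, here is a small usage run; the Mike class is copied from above with comments added. Because write() uses extend(), the buffer holds one list item per character, so read(), pop() and indexing all count characters. Note also that pop() builds a new list from everything that remains on each call. In the pop benchmark the buffer grows by five characters per iteration (write 10, pop 5), so that copy gets longer every time, which is a plausible reason the Python version needed the reduced iteration count.

```python
class Mike(object):
    def __init__(self, data=''):
        self._data = []
        self._data.extend(data)  # extend() stores one list item per character

    def write(self, data):
        self._data.extend(data)

    def read(self, stop=None):
        return ''.join(self._data[0:stop])

    def pop(self, stop=None):
        if not stop:
            stop = len(self._data)
        try:
            return ''.join(self._data[0:stop])
        finally:
            # Builds a brand-new list from the remaining items on every call.
            self._data = self._data[stop:]

    def __getitem__(self, key):
        return ''.join(self._data[key])


m = Mike('abc')
m.write('defg')
print(m.read())   # abcdefg
print(m.read(3))  # abc
print(m[1:4])     # bcd
print(m.pop(2))   # ab
print(m.read())   # cdefg
```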

Solution

This is the answer to the updated question.

You can use a list to store the data and only join it into a string when reading it:

    class StringInside(object):
        def __init__(self, data=''):
            self._data = []
            self._data.append(data)

        def write(self, data):
            self._data.append(data)

        def read(self):
            return ''.join(self._data)
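The class above defers all copying to read(). A short usage run, with the class copied verbatim and comments added: calls to write() are just list appends, and read() joins the pieces in a single pass whose cost is linear in the total length. One caveat: read() rebuilds the string on every call, so a workload that reads after every write pays the full join each time.

```python
class StringInside(object):
    def __init__(self, data=''):
        self._data = []          # pieces written so far
        self._data.append(data)

    def write(self, data):
        self._data.append(data)  # amortized O(1): no string copying here

    def read(self):
        return ''.join(self._data)  # one pass over all pieces


arr = StringInside()
for _ in range(3):
    arr.write('1234567890')
print(len(arr.read()))  # 30
```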

Performance

The performance of this class:

    %%timeit arr = StringInside()
    arr.write('1234567890')
    1000000 loops, best of 3: 352 ns per loop

This is much closer to a plain str:

    %%timeit str_arr = ''
    str_arr+='1234567890'
    1000000 loops, best of 3: 222 ns per loop

Compare with your version:

    %%timeit arr = StringInsidePlusEqual()
    arr.write('1234567890')
    100000 loops, best of 3: 87 µs per loop

Cause

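Building a string with my_string += another_string has long been a performance anti-pattern, because each concatenation may copy the entire string. CPython has an optimization for this case: when the result is stored straight back into a plain variable name and no other reference to the old string exists, it can grow the string in place. Here the pattern is a little hidden inside the class: self.data += data stores the result into an attribute, so CPython cannot apply the optimization and every write() copies the whole accumulated string.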

Not all implementations have this optimization, for various reasons. For instance, PyPy, which is generally much faster than CPython, is much slower for this use case:

PyPy 2.6.0 (Python 2.7.9)

    >>>> import timeit
    >>>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
    0.08312582969665527

CPython 2.7.11

    >>> import timeit
    >>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
    0.002151966094970703
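To reproduce the CPython effect yourself, a rough benchmark sketch like the following can be used. The helper names (AttrConcat, build_with_local and so on) are made up for illustration and the iteration counts are arbitrary. No timings are claimed here since they depend on the machine and interpreter version, but on CPython the attribute-based builder should slow down much faster than the other two as n grows; on PyPy the ranking may differ.

```python
import timeit

PIECE = '1234567890'


class AttrConcat(object):
    """Accumulates with self.data += piece, like the question's StringInside."""

    def __init__(self):
        self.data = ''

    def write(self, piece):
        self.data += piece  # result is stored to an attribute


def build_with_local(n):
    s = ''
    for _ in range(n):
        s += PIECE  # result is stored back to a plain local name
    return s


def build_with_attribute(n):
    buf = AttrConcat()
    for _ in range(n):
        buf.write(PIECE)
    return buf.data


def build_with_list(n):
    parts = []
    for _ in range(n):
        parts.append(PIECE)
    return ''.join(parts)  # single join at the end


if __name__ == '__main__':
    for func in (build_with_local, build_with_attribute, build_with_list):
        seconds = timeit.timeit(lambda: func(10000), number=3)
        print(f'{func.__name__:22s} {seconds:.4f} s')
```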

Slicing version

This version supports slicing:

    class StringInside(object):
        def __init__(self, data=''):
            self._data = []
            self._data.extend(data)

        def write(self, data):
            self._data.extend(data)

        def read(self, start=None, stop=None):
            return ''.join(self._data[start:stop])

        def __getitem__(self, key):
            return ''.join(self._data[key])

You can slice it the usual way:

    >>> arr = StringInside('abcdefg')
    >>> arr[2]
    'c'
    >>> arr[1:3]
    'bc'

read() also supports optional start and stop indices:

    >>> arr.read()
    'abcdefg'
    >>> arr.read(1, 3)
    'bc'
    >>> arr.read(1)
    'bcdefg'

Source: https://habr.com/ru/post/1240772/

