I'm trying to subclass str - not for anything important, just to learn more about Python's built-in types. I subclassed str this way (using __new__ because str is immutable):
    class MyString(str):
        def __new__(cls, value=''):
            return str.__new__(cls, value)

        def __radd__(self, value):
            ...
It initializes correctly, as far as I can tell, but I can't get it to modify itself in place using the += operator. I tried overriding __add__, __radd__, __iadd__ and a variety of other configurations. Using a return statement, I've managed to get it to return a new instance of the correctly appended MyString, but not to modify itself in place. Success would look like this:
    b = MyString('g')
    b.write('h')  # b should now be 'gh'
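The behaviour described above can be reproduced with a short sketch. It assumes an `__iadd__` override (the post does not show the exact overrides that were tried) and a second name, `alias`, introduced only for illustration. It shows that `+=` rebinds the name to a new object and leaves the original object untouched, which is why a `write` method on a `str` subclass has nothing it can change on `self`.

```python
class MyString(str):
    def __new__(cls, value=''):
        return str.__new__(cls, value)

    def __iadd__(self, other):
        # Assumed override for illustration: str is immutable, so the
        # best this can do is build and return a *new* MyString.
        return MyString(str(self) + other)


b = MyString('g')
alias = b          # second reference to the same object
b += 'h'           # calls __iadd__, then rebinds the name b

print(b)           # gh
print(alias)       # g  (the original object is unchanged)
print(b is alias)  # False
```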
Any thoughts?
UPDATE
To add a reason why someone might want to do this, I followed the suggestion of creating the following mutable class that uses a plain string internally:
    class StringInside(object):
        def __init__(self, data=''):
            self.data = data

        def write(self, data):
            self.data += data

        def read(self):
            return self.data
and checked using timeit:
    >>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
    0.004415035247802734
    >>> timeit.timeit("arr.write('1234567890')", setup="from hard import StringInside; arr = StringInside()", number=10000)
    0.0331270694732666
The difference grows rapidly as number goes up: at 1 million iterations, StringInside took longer than I was willing to wait, while the pure str version returned in ~100 ms.
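For anyone who wants to reproduce the comparison without a separate `hard` module, here is a self-contained sketch. It assumes Python 3.5+ (for the `globals` argument of `timeit.timeit`); `compare` is a hypothetical helper name, and the iteration counts are kept small so the wrapper version finishes quickly. Absolute timings will differ by machine and interpreter version.

```python
import timeit


class StringInside(object):
    """Mutable wrapper around a plain str, as defined in the post."""

    def __init__(self, data=''):
        self.data = data

    def write(self, data):
        self.data += data

    def read(self):
        return self.data


def compare(number):
    """Return (plain_str_seconds, wrapper_seconds) for `number` appends."""
    plain = timeit.timeit("arr += '1234567890'", setup="arr = ''",
                          number=number)
    wrapped = timeit.timeit("arr.write('1234567890')",
                            setup="arr = StringInside()",
                            globals={"StringInside": StringInside},
                            number=number)
    return plain, wrapped


if __name__ == "__main__":
    # Print both timings at a few sizes to see how the gap changes.
    for n in (1000, 5000, 20000):
        plain, wrapped = compare(n)
        print(n, plain, wrapped)
```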
UPDATE 2
For posterity, I decided to write a Cython class wrapping a C++ string to see whether performance could be improved compared to a class loosely based on Mike Muller's updated version below, and I managed to succeed. I realize that Cython is "cheating", but I provide this just for fun.
Python version:
    class Mike(object):
        def __init__(self, data=''):
            self._data = []
            self._data.extend(data)

        def write(self, data):
            self._data.extend(data)

        def read(self, stop=None):
            return ''.join(self._data[0:stop])

        def pop(self, stop=None):
            if not stop:
                stop = len(self._data)
            try:
                return ''.join(self._data[0:stop])
            finally:
                self._data = self._data[stop:]

        def __getitem__(self, key):
            return ''.join(self._data[key])
Cython version:
    from libcpp.string cimport string

    cdef class CyBuff:
        cdef string buff
        cdef public int length

        def __cinit__(self, string data=''):
            self.length = len(data)
            self.buff = data

        def write(self, string new_data):
            self.length += len(new_data)
            self.buff += new_data

        def read(self, int length=0):
            if not length:
                length = self.length
            return self.buff.substr(0, length)

        def pop(self, int length=0):
            if not length:
                length = self.length
            ans = self.buff.substr(0, length)
            self.buff.erase(0, length)
            return ans
performance:
writing
    >>> timeit.timeit("arr.write('1234567890')", setup="from pyversion import Mike; arr = Mike()", number=1000000)
    0.5992741584777832
    >>> timeit.timeit("arr.write('1234567890')", setup="from cyversion import CyBuff; arr = CyBuff()", number=1000000)
    0.17381906509399414
reading
    >>> timeit.timeit("arr.write('1234567890'); arr.read(5)", setup="from pyversion import Mike; arr = Mike()", number=1000000)
    1.1499049663543701
    >>> timeit.timeit("arr.write('1234567890'); arr.read(5)", setup="from cyversion import CyBuff; arr = CyBuff()", number=1000000)
    0.2894480228424072
popping
    >>> # note I'm using 10e3 iterations - the python version wouldn't return otherwise
    >>> timeit.timeit("arr.write('1234567890'); arr.pop(5)", setup="from pyversion import Mike; arr = Mike()", number=10000)
    0.7390561103820801
    >>> timeit.timeit("arr.write('1234567890'); arr.pop(5)", setup="from cyversion import CyBuff; arr = CyBuff()", number=10000)
    0.01501607894897461
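One thing that makes `pop` expensive in the pure-Python `Mike` class is that `self._data = self._data[stop:]` copies the whole remaining list on every call. As a rough sketch of an alternative that stays in pure Python, the characters could be kept in a `collections.deque`, which removes items from the left without copying the rest. `DequeBuffer` is a hypothetical name, this variant was not part of the timings above, and it leaves out `__getitem__`.

```python
from collections import deque
from itertools import islice


class DequeBuffer(object):
    """Hypothetical variant of Mike that stores characters in a deque."""

    def __init__(self, data=''):
        self._data = deque(data)

    def write(self, data):
        # Append each character of data on the right.
        self._data.extend(data)

    def read(self, stop=None):
        # islice avoids copying the whole deque just to read a prefix.
        return ''.join(islice(self._data, stop))

    def pop(self, stop=None):
        # Remove and return up to `stop` leading characters (all if falsy).
        if not stop:
            stop = len(self._data)
        count = min(stop, len(self._data))
        return ''.join(self._data.popleft() for _ in range(count))
```

Usage mirrors the `Mike` class: `buf = DequeBuffer('g'); buf.write('h'); buf.pop(1)` returns `'g'` and leaves `'h'` in the buffer.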