I have a large string, on the order of 5*10^6 characters.
I need to process it in blocks of 16 characters. I wrote a custom function to split the string, assuming it would perform better than the splice approach.
The functions are as follows:
def spliceSplitter(s):
    sum = 0
    while len(s) > 0:
        block = s[:16]
        sum += len(block)
        s = s[16:]
    return sum
And the custom function:
def normalSplitter(s):
    sum = 0
    l = len(s)
    data = ""
    for i in xrange(l):
        if i % 16 == 0:
            sum += len(data)
            data = ""
        data += s[i]
    return sum + len(data)
I profiled both with cProfile; the results were as follows (time in seconds):
String Length | Splice Splitter | Normal Splitter
---------------------------------------------------------
5000000 | 289.0 | 1.274
500000 | 0.592 | 0.134
50000 | 0.25 | 0.28
5000 | 0.001 | 0.003
I create the string as follows:
s = ''.join([str(random.randint(1,9)) for x in xrange(5000000)])
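The profiling itself can be reproduced with a minimal harness along these lines (a sketch, not the exact script; it assumes s and the splitter functions are defined at module level, since cProfile.run() executes its command string in the __main__ namespace):

import cProfile

# s, spliceSplitter and normalSplitter must live at module level,
# because cProfile.run() executes the command string in __main__.
cProfile.run('spliceSplitter(s)')
cProfile.run('normalSplitter(s)')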
My question is:
- Is there a pythonic way to get the same or better performance than my custom splitter? Perhaps by splitting the whole string beforehand, storing the blocks in a list, and then iterating over them (see the sketch below).
- Why is the Splice Splitter so slow (especially on the largest string)?
Note: sum is only a placeholder here; in my real code each block is passed to process(data).
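A minimal sketch of the "split everything up front" idea from the first question (my own illustration, assuming Python 2 to match the xrange usage elsewhere; precomputedBlocks and regexBlocks are hypothetical names):

import re

def precomputedBlocks(s, size=16):
    # One pass over the string, producing a list of blocks to iterate over later.
    return [s[i:i + size] for i in xrange(0, len(s), size)]

def regexBlocks(s):
    # Equivalent one-liner with the standard re module; '.{1,16}' also keeps
    # a shorter final block when len(s) is not a multiple of 16.
    return re.findall('.{1,16}', s, re.DOTALL)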
EDIT: I added a yield/generator version and fixed the Splice Splitter; the new timings (in seconds) are:
String Length | Splice Splitter | Normal Splitter | Yield/Generator
-------------------------------------------------------------------------------
5000000 | 0.148 | 1.274 | 0.223
500000 | 0.016 | 0.134 | 0.29
50000 | 0.003 | 0.28 | 0.005
5000 | ~0.000 | 0.003 | ~0.000
The updated code:
def pythonicSplitter(s):
    gen = (s[i:i+16] for i in xrange(0, len(s), 16))
    sum = 0
    for data in gen:
        sum += len(data)
    return sum
def spliceSplitter(s):
    sum = 0
    for x in xrange(0, len(s), 16):
        block = s[x:x+16]
        sum += len(block)
    return sum
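The "Yield/Generator" column corresponds to generator-style iteration; an explicit yield-based equivalent of pythonicSplitter() would look roughly like this (a sketch; blockGenerator and yieldSplitter are illustrative names):

def blockGenerator(s, size=16):
    # Explicit generator function: yields one block at a time, using the same
    # slicing strategy as the generator expression in pythonicSplitter().
    for i in xrange(0, len(s), size):
        yield s[i:i + size]

def yieldSplitter(s):
    sum = 0
    for data in blockGenerator(s):
        sum += len(data)
    return sum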
Explanation:
- The original Splice Splitter was slow because it rebuilt the string on every iteration: s = s[16:] copies the whole remaining tail, so the total work grows as ~O(N^2). Indexing into s with s[x:x+16] copies only 16 characters per block, making the whole pass O(N*16), i.e. linear. The Yield/Generator version (pythonicSplitter()) does the same slicing but adds a small overhead for the generator object, which is why it is slightly slower than the fixed Splice Splitter.
- The Normal Splitter is slow because it assembles each 16-character block one character at a time. Python strings are immutable, so every data += s[i] creates a new string, whereas a slice copies the entire block in one step.
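A rough back-of-the-envelope check of the copying cost (an illustrative calculation, not measured data):

# Illustration only: estimate how many characters each approach copies.
# The original spliceSplitter copies the remaining tail on every s = s[16:],
# so the total is the arithmetic series N + (N-16) + (N-32) + ...
N = 5 * 10**6
copied_by_old_splitter = sum(xrange(N, 0, -16))    # roughly 7.8 * 10**11 characters
copied_by_fixed_splitter = N                       # each character is copied exactly once
print copied_by_old_splitter, copied_by_fixed_splitter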