Efficiency: String Slice Vs Custom Function

I have a large string, on the order of 5*10^6 characters long.

I need to process it in blocks of 16 characters. I wrote a custom function to split the string, assuming it would perform better than the slicing approach.

The functions are as follows:

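def spliceSplitter(s):
    total = 0  # renamed from sum to avoid shadowing the built-in
    while len(s) > 0:
        block = s[:16]
        # Stand-in for the real processing: take the block's length.
        total += len(block)
        # Drop the processed block; this copies the rest of the string.
        s = s[16:]
    return total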

And the custom function:

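def normalSplitter(s):
    total = 0  # renamed from sum to avoid shadowing the built-in
    l = len(s)
    data = ""
    for i in xrange(l):
        if i % 16 == 0:
            # Stand-in for the real processing: take the block's length.
            total += len(data)
            data = ""
        # Build the current block one character at a time.
        data += s[i]
    # Account for the final block, which never reaches the check above.
    return total + len(data)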

I profiled both of them with cProfile, and the results were as follows (time in seconds):

String Length     |  Splice Splitter   |  Normal Splitter
---------------------------------------------------------
5000000           |  289.0             |  1.274 
500000            |  0.592             |  0.134
50000             |  0.25              |  0.28
5000              |  0.001             |  0.003 

I create the string as follows:

import random

s = ''.join(str(random.randint(1, 9)) for x in xrange(5000000))
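For anyone who wants to reproduce measurements like these, here is a minimal sketch that uses timeit from the standard library instead of cProfile. It assumes Python 3 (range rather than xrange); the snake_case function names, make_string() and time_call() are made up for this illustration, and the test string is kept small so the slow variant still finishes quickly.

```python
import random
import timeit


def splice_splitter(s):
    # Same idea as spliceSplitter above: re-slice the remainder on every pass.
    total = 0
    while s:
        total += len(s[:16])
        s = s[16:]
    return total


def normal_splitter(s):
    # Same idea as normalSplitter above: build each block char by char.
    total = 0
    data = ""
    for i in range(len(s)):
        if i % 16 == 0:
            total += len(data)
            data = ""
        data += s[i]
    return total + len(data)


def make_string(n, seed=0):
    # Hypothetical helper: a reproducible string of n random digits 1..9.
    rng = random.Random(seed)
    return "".join(str(rng.randint(1, 9)) for _ in range(n))


def time_call(func, s, repeat=3):
    # Hypothetical helper: best of `repeat` single runs, in seconds.
    return min(timeit.repeat(lambda: func(s), number=1, repeat=repeat))


if __name__ == "__main__":
    s = make_string(50000)
    for func in (splice_splitter, normal_splitter):
        print(func.__name__, time_call(func, s))
```

Absolute numbers will differ from the tables here, since they depend on the machine, the Python version and the measuring tool.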

My questions are:

  • Is there a Pythonic way to get the same or better performance than the custom Normal Splitter? Perhaps by splitting the whole string up front, storing the blocks in a list, and then iterating over them.
  • Why is the Splice Splitter so much slower on the large strings?
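The "split everything up front into a list" idea from the first question could look roughly like the sketch below. It assumes Python 3; split_blocks() is a name invented for this example, and the sample string is arbitrary.

```python
def split_blocks(s, n=16):
    # One slice per block; the last block may be shorter than n.
    return [s[i:i + n] for i in range(0, len(s), n)]


blocks = split_blocks("abcdefghijklmnopqrstuvwxyz0123456789")
# Stand-in for the real processing: add up the block lengths.
total = sum(len(block) for block in blocks)
```

The list holds every block at once, so this trades extra memory for the convenience of iterating over the blocks more than once.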

Note: the real per-block work is a call to process(data); calculating the length is only a stand-in for it.

Update: timings with the revised Splice Splitter and a Yield/Generator version:

String Length     |  Splice Splitter   |  Normal Splitter  |  Yield/Generator
-------------------------------------------------------------------------------
5000000           |  0.148             |  1.274            |  0.223
500000            |  0.016             |  0.134            |  0.29
50000             |  0.003             |  0.28             |  0.005
5000              |  ~0.000            |  0.003            |  ~0.000

The updated code:

def pythonicSplitter(s):
    gen = (s[i:i+16] for i in xrange(0, len(s), 16))
    sum = 0
    for data in gen:
        sum += len(data)
    return sum

def spliceSplitter(s):
    sum = 0
    for x in xrange(0, len(s), 16):
        block = s[x:x+16]
        # Assuming the process to be done with data block is calculating its length.
        sum += len(block)
    return sum

Explanation:

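  • The original Splice Splitter was slow because s = s[16:] copies the entire remainder of the string on every pass, which makes the whole loop roughly O(N^2).
  • Leaving s unchanged and slicing with s[x:x+16] copies only 16 characters per block, so the total work is linear in N. The Yield/Generator version (pythonicSplitter()) slices the same way; in the timings above it comes out somewhat slower than the revised Splice Splitter at the largest size, presumably because of generator overhead.
  • The Normal Splitter is slower because it builds every block one character at a time, 16 concatenations per block. Python strings are immutable, so each concatenation may create a new string; taking a slice is the better choice.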
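To make the difference between the two slicing styles concrete, the sketch below counts how many characters each one copies. It is a simplified model that assumes every slice copies exactly the characters it returns; the function names are invented for this illustration, and the code assumes Python 3.

```python
def chars_copied_shrinking(n, k=16):
    # Models `block = s[:k]; s = s[k:]`: each pass copies the block
    # and then the whole remainder of the string.
    copied = 0
    remaining = n
    while remaining > 0:
        block = min(k, remaining)
        copied += block               # s[:k]
        copied += remaining - block   # s[k:]
        remaining -= block
    return copied


def chars_copied_indexed(n, k=16):
    # Models `block = s[x:x+k]`: each pass copies only the block.
    return sum(min(k, n - x) for x in range(0, n, k))
```

For a 160-character string the shrinking style copies 880 characters in this model, while the indexed style copies 160. The first figure grows with the square of the length and the second grows linearly.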

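As already noted: s = s[16:] copies the rest of s on every iteration, which is expensive for a long string. block = s[:16] copies only 16 characters, which is cheap. The data = "" and data += s[i] handling in normalSplitter() is slower than slicing because each block is assembled character by character, 16 operations per block.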

You can also wrap the slicing in a generator, with the block size as a parameter for generality:

def newSplitter(s, n=16):
    # Yield successive blocks of n characters; the last one may be shorter.
    for i in xrange(0, len(s), n):
        yield s[i:i+n]
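A generator like this might be used as in the sketch below. It assumes Python 3 (range rather than xrange), slices the argument s, and uses a hypothetical process() function as a stand-in for the real per-block work.

```python
def new_splitter(s, n=16):
    # Yield successive blocks of n characters; the last one may be shorter.
    for i in range(0, len(s), n):
        yield s[i:i + n]


def process(block):
    # Hypothetical stand-in for the real per-block work.
    return len(block)


# Blocks are produced one at a time, so no list of blocks is ever built.
total = sum(process(block) for block in new_splitter("x" * 100))
```

Because the blocks are produced lazily, only one 16-character block exists at a time, at the cost of being able to iterate over them only once.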

Source: https://habr.com/ru/post/1544768/

