Speed Up Numpy Array Iteration

I am doing image processing with Numpy, in particular computing a standard-deviation stretch. The code reads in X columns, finds the standard deviation, and performs a percent linear stretch. It then moves on to the next "group" of columns and repeats the same operations. The input image is a 1 GB, 32-bit floating-point raster, which takes a very long time (hours) to process. Below is the code.

I understand that I have 3 nested loops, which is presumably where the bottleneck occurs. If I process the image in "boxes", that is, loading an array of say [500, 500] and iterating through the image, processing time is quite short. Unfortunately, a camera fault requires me to iterate in very long strips (52,000 x 4) (y, x) to avoid banding.

Any suggestions for speeding this up would be appreciated:

    def box(dataset, outdataset, sampleSize, n):
        quiet = 0
        sample = sampleSize
        #iterate over all of the bands
        for j in xrange(1, dataset.RasterCount + 1):  #1 based counter
            band = dataset.GetRasterBand(j)
            NDV = band.GetNoDataValue()
            print "Processing band: " + str(j)

            #define the interval at which blocks are created
            intervalY = int(band.YSize/1)
            intervalX = int(band.XSize/2000)  #to be changed to sampleSize when working

            #iterate through the rows
            scanBlockCounter = 0
            for i in xrange(0, band.YSize, intervalY):
                #If the next i is going to fail due to the edge of the image/array
                if i + (intervalY*2) < band.YSize:
                    numberRows = intervalY
                else:
                    numberRows = band.YSize - i

                for h in xrange(0, band.XSize, intervalX):
                    if h + (intervalX*2) < band.XSize:
                        numberColumns = intervalX
                    else:
                        numberColumns = band.XSize - h

                    scanBlock = band.ReadAsArray(h, i, numberColumns, numberRows).astype(numpy.float)

                    standardDeviation = numpy.std(scanBlock)
                    mean = numpy.mean(scanBlock)

                    newMin = mean - (standardDeviation * n)
                    newMax = mean + (standardDeviation * n)

                    outputBlock = ((scanBlock - newMin)/(newMax - newMin))*255
                    outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock, h, i)  #array, xOffset, yOffset

                    scanBlockCounter = scanBlockCounter + 1
                    #print str(scanBlockCounter) + ": " + str(scanBlock.shape) + str(h) + ", " + str(intervalX)

                    if numberColumns == band.XSize - h:
                        break

                #update progress line (by row, so divide i by YSize)
                if not quiet:
                    gdal.TermProgress_nocb(float(i+1) / band.YSize)
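As a reference point, the per-block statistics and stretch inside the loop can be expressed as one small function with no explicit pixel arithmetic; a minimal sketch (the function name and the clipping step are my own additions, not from the original code):

```python
import numpy as np

def stretch_block(block, n):
    """n-standard-deviation linear stretch of a 2-D block, scaled to 0-255."""
    mean = block.mean()
    std = block.std()
    new_min = mean - std * n
    new_max = mean + std * n
    # Same formula as the loop body; clip anything beyond +/- n std devs.
    stretched = (block - new_min) / (new_max - new_min) * 255
    return np.clip(stretched, 0, 255)
```

Each call already operates on a whole block at once, so the per-pixel cost is just a few array passes, not a Python loop.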

Here is an update: without using the profile module (I did not want to wrap small sections of code in functions), I used a combination of print statements and timing output to get a rough idea of which lines take the most time. Luckily (and I do understand how lucky I was), one line was dragging everything down.

  outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock,h,i)#array, xOffset, yOffset 

It seems that GDAL is quite inefficient at opening the output file and writing out the array. With that in mind, I decided to append the modified "outputBlock" arrays to a Python list, and then write out chunks. Here is the segment that I changed:

(outputBlock has just been computed ...)

    #Add the array to a list (tuple)
    outputArrayList.append(outputBlock)

    #Check the interval counter and if it is "time", write out the array
    if len(outputArrayList) >= (intervalX * writeSize) or finisher == 1:

        #Convert the tuple to a numpy array. Here we horizontally stack the tuple of arrays.
        stacked = numpy.hstack(outputArrayList)

        #Write out the array
        outRaster = outdataset.GetRasterBand(j).WriteArray(stacked, xOffset, i)  #array, xOffset, yOffset
        xOffset = xOffset + (intervalX*(intervalX * writeSize))

        #Cleanup to conserve memory
        outputArrayList = list()
        stacked = None
        finisher = 0

"finisher" is just a flag that handles the edges. It took a little while to figure out how to build an array from the list. In this case, numpy.array created a 3-dimensional array (does anyone want to explain why?), and a 2-D array is required for WriteArray. Total processing time now varies from under 2 minutes to 5 minutes. Any idea why there might be such a range in run times?
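On the 3-D surprise: numpy.array, given a list of equal-shaped 2-D arrays, stacks them along a brand-new leading axis, while numpy.hstack joins them along the existing column axis. A toy illustration (block shapes invented for the example):

```python
import numpy as np

# Two 3x4 blocks standing in for successive outputBlock arrays.
blocks = [np.zeros((3, 4)), np.ones((3, 4))]

stacked_3d = np.array(blocks)   # new leading axis -> shape (2, 3, 4)
stacked_2d = np.hstack(blocks)  # joined column-wise -> shape (3, 8)

print(stacked_3d.shape, stacked_2d.shape)
```

That is why hstack yields the 2-D array that WriteArray expects.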

Many thanks to everyone who posted! The next step is to really get into Numpy and learn about vectorization for additional optimization.

3 answers

Answered as requested.

If you are IO-bound, you should chunk your reads/writes. Try dumping ~500 MB of data into an ndarray, processing it all, writing it out, and then grabbing the next ~500 MB. Remember to reuse the ndarray.
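A minimal sketch of that pattern, with plain arrays standing in for the raster IO (the names are illustrative, not GDAL API):

```python
import numpy as np

def process_in_chunks(src, dst, chunk_rows, func):
    """Read, process, and write chunk_rows rows at a time."""
    for start in range(0, src.shape[0], chunk_rows):
        stop = min(start + chunk_rows, src.shape[0])
        chunk = src[start:stop]        # one large read
        dst[start:stop] = func(chunk)  # process, then one large write
```

With a 1 GB 32-bit image, ~500 MB chunks mean only a couple of read/write round-trips per band instead of thousands of tiny ones.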


One way to speed up operations on numpy data is to use vectorize. Essentially, vectorize takes a function f and creates a new function g that maps f over an array a. g is then called like so: g(a).

    >>> sqrt_vec = numpy.vectorize(lambda x: x ** 0.5)
    >>> sqrt_vec(numpy.arange(10))
    array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
            2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])

Without the data you are working with, I cannot say for sure whether this will help, but perhaps you can rewrite the above as a set of functions that can be vectorized. Perhaps in this case you could vectorize over an array of indices into ReadAsArray(h, i, numberColumns, numberRows). Here is an example of the potential benefit:

    >>> print setup1
    import numpy
    sqrt_vec = numpy.vectorize(lambda x: x ** 0.5)
    >>> print setup2
    import numpy
    def sqrt_vec(a):
        r = numpy.zeros(len(a))
        for i in xrange(len(a)):
            r[i] = a[i] ** 0.5
        return r
    >>> timeit.timeit(stmt='a = sqrt_vec(numpy.arange(1000000))', setup=setup1, number=1)
    0.30318188667297363
    >>> timeit.timeit(stmt='a = sqrt_vec(numpy.arange(1000000))', setup=setup2, number=1)
    4.5400981903076172

A 15x speedup! Also note that numpy slicing handles the edges of an ndarray elegantly:

    >>> a = numpy.arange(25).reshape((5, 5))
    >>> a[3:7, 3:7]
    array([[18, 19],
           [23, 24]])

So if you could get your ReadAsArray data into an ndarray, you would not have to do any edge-checking shenanigans.
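To illustrate, a blocked loop that leans on that slice clipping needs no edge branches at all (toy array dimensions here, not the 52,000 x 4 strips):

```python
import numpy as np

a = np.arange(30).reshape((5, 6)).astype(float)
block = 4  # block size that does not divide the image evenly

blocks = []
for i in range(0, a.shape[0], block):
    for j in range(0, a.shape[1], block):
        # Slicing clips at the array edge, so partial blocks come out
        # with the right shape automatically.
        blocks.append(a[i:i + block, j:j + block])

print([b.shape for b in blocks])  # [(4, 4), (4, 2), (1, 4), (1, 2)]
```

The `numberRows`/`numberColumns` branching in the original loop disappears entirely.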


Regarding your question about reshaping: reshaping does not fundamentally change the data at all. It just changes the "strides" by which numpy indexes the data. When you call the reshape method, the value returned is a new view of the data; the data is not copied or changed at all, and neither is the old view with the old stride information.

    >>> a = numpy.arange(25)
    >>> b = a.reshape((5, 5))
    >>> a
    array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
           17, 18, 19, 20, 21, 22, 23, 24])
    >>> b
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14],
           [15, 16, 17, 18, 19],
           [20, 21, 22, 23, 24]])
    >>> a[5]
    5
    >>> b[1][0]
    5
    >>> a[5] = 4792
    >>> b[1][0]
    4792
    >>> a.strides
    (8,)
    >>> b.strides
    (40, 8)
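One caveat I should add about vectorize: it is a convenience, not a compiled fast path — under the hood it still runs a Python-level loop. When numpy already ships a native ufunc for the operation (numpy.sqrt here), calling it directly is faster still:

```python
import numpy as np
import timeit

a = np.arange(1000000, dtype=np.float64)
sqrt_vec = np.vectorize(lambda x: x ** 0.5)  # Python-level loop inside

t_vectorize = timeit.timeit(lambda: sqrt_vec(a), number=1)
t_ufunc = timeit.timeit(lambda: np.sqrt(a), number=1)
print(t_ufunc < t_vectorize)  # the native ufunc wins comfortably
```

So vectorize is best seen as a readability tool for functions that have no ufunc equivalent.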

Without trying to fully understand exactly what you are doing, I notice that you are not using any numpy slicing or array broadcasting, both of which can speed up your code or, at the very least, make it more readable. Apologies if these are not germane to your problem.
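For broadcasting, a two-line example — subtracting per-column means from a block without any loop (toy data; per-strip stretch statistics could be handled the same way):

```python
import numpy as np

block = np.arange(12, dtype=float).reshape((3, 4))

# block.mean(axis=0) has shape (4,); broadcasting stretches it across
# all three rows, so every column is centered in one expression.
centered = block - block.mean(axis=0)
print(centered.mean(axis=0))  # -> [0. 0. 0. 0.]
```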


Source: https://habr.com/ru/post/892515/

