How to optimize mathematical operations on a matrix in python

I am trying to reduce the time of a function that performs a series of calculations with two matrices. Finding this, I heard about numpy, but I really don't know how to apply it to my problem. In addition, I think one of the things is that my function is slow, has many point operators (I heard about this in this page ).

Mathematics corresponds to factorization for the quadratic assignment problem:

QAP Factorization

My code is:

delta = 0 for k in xrange(self._tam): if k != r and k != s: delta += self._data.stream_matrix[r][k] \ * (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) + \ self._data.stream_matrix[s][k] \ * (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]) + \ self._data.stream_matrix[k][r] \ * (self._data.distance_matrix[sol[k]][sol[s]] - self._data.distance_matrix[sol[k]][sol[r]]) + \ self._data.stream_matrix[k][s] \ * (self._data.distance_matrix[sol[k]][sol[r]] - self._data.distance_matrix[sol[k]][sol[s]]) return delta 

Performing this task in size 20 (20x20 matrix) takes about 20 segments, the bottleneck in this function

 ncalls tottime percall cumtime percall filename:lineno(function) 303878 15.712 0.000 15.712 0.000 Heuristic.py:66(deltaC) 

I tried applying map to a for loop, but since the body of the loop is not a function call, this is not possible.

How can I shorten the time?

EDIT1

To respond to eickenberg's comment:

sol is a permutation, for example [1,2,3,4]. the function is called when I create neighboring solutions, so the neighbor [1,2,3,4] is [2,1,3,4]. I change only two positions in the original permutation, and then call deltaC , which calculates the factorization of the solution with position r, s swaped (In the above example, r, s = 0,1). This permutation is performed to avoid computing the entire cost of the decision of the neighbors. I believe that I can store the sol[k,r,s] values ​​in a local variable to avoid looking for its value at each iteration. I do not know, this is what you asked for in your comment.

EDIT2

Minimum working example:

 import random distance_matrix = [[0, 12, 6, 4], [12, 0, 6, 8], [6, 6, 0, 7], [4, 8, 7, 0]] stream_matrix = [[0, 3, 8, 3], [3, 0, 2, 4], [8, 2, 0, 5], [3, 4, 5, 0]] def deltaC(r, s, S=None): ''' Difference between C with values i and j swapped ''' S = [0,1,2,3] if S is not None: sol = S else: sol = S delta = 0 sol_r, sol_s = sol[r], sol[s] for k in xrange(4): if k != r and k != s: delta += (stream_matrix[r][k] \ * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \ stream_matrix[s][k] \ * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \ stream_matrix[k][r] \ * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \ stream_matrix[k][s] \ * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])) return delta for _ in xrange(303878): d = deltaC(random.randint(0,3), random.randint(0,3)) print d 

Now I think the best option is to use NumPy. I tried using Matrix () but did not improve performance.

The best solution found

Well, finally, I was able to reduce the time by combining the @TooTone solution a bit and keeping the indices in the set to avoid if. Time was reduced from 18 seconds to 8 seconds. Here is the code:

 def deltaC(self, r, s, sol=None): delta = 0 sol = self.S if sol is None else self.S sol_r, sol_s = sol[r], sol[s] stream_matrix = self._data.stream_matrix distance_matrix = self._data.distance_matrix indexes = set(xrange(self._tam)) - set([r, s]) for k in indexes: sol_k = sol[k] delta += \ (stream_matrix[r][k] - stream_matrix[s][k]) \ * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \ + \ (stream_matrix[k][r] - stream_matrix[k][s]) \ * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r]) return delta 

To reduce the time even further, I think the best way is to write a module.

+4
python numpy matrix heuristics
Apr 12 '14 at 4:55
source share
1 answer

In the simple example that you indicated with for k in xrange(4): body of the loop is executed only twice (if r==s ) or three times (if r!=s ) and the initial numpy implementation is lower, the slower the large coefficient. Numpy is optimized for performing calculations over long vectors, and if the vectors are short, the overhead can outweigh the benefits. (And pay attention to this formula, matrices are sliced ​​in different dimensions and not indexed in contact, which can only complicate the task for vectorization).

 import numpy as np distance_matrix_np = np.array(distance_matrix) stream_matrix_np = np.array(stream_matrix) n = 4 def deltaC_np(r, s, sol): delta = 0 sol_r, sol_s = sol[r], sol[s] K = np.array([i for i in xrange(n) if i!=r and i!=s]) return np.sum( (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \ * (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \ (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \ * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r])) 

In this implementation of numpy, and not in a for loop on elements in K , operations apply to all elements in K within numpy. Also note that your mathematical expression may be simplified. Each term in brackets on the left is a negative term in brackets on the right. enter image description here

This also applies to your source code. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) is -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]) , so you did unnecessary calculations and your source code can be optimized without using numpy.

Turns out the numpy function's bottleneck is understanding an innocent list

 K = np.array([i for i in xrange(n) if i!=r and i!=s]) 

As soon as it is replaced with a vectorizing code

 if r==s: K=np.arange(n-1) K[r:] += 1 else: K=np.arange(n-2) if r<s: K[r:] += 1 K[s-1:] += 1 else: K[s:] += 1 K[r-1:] += 1 

the numpy function is much faster.

The runtime graph is shown below (at the bottom bottom of this answer is the original graph before optimizing the numpy function). You can see that it makes sense to use your optimized source code or numpy code depending on how big the matrix is.

enter image description here

The full code is provided below for reference, partly in case someone else can continue it. (The deltaC2 function is your source code, optimized to take into account how to simplify the mathematical expression.)

 def deltaC(r, s, sol): delta = 0 sol_r, sol_s = sol[r], sol[s] for k in xrange(n): if k != r and k != s: delta += \ stream_matrix[r][k] \ * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \ stream_matrix[s][k] \ * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \ stream_matrix[k][r] \ * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \ stream_matrix[k][s] \ * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s]) return delta import numpy as np def deltaC_np(r, s, sol): delta = 0 sol_r, sol_s = sol[r], sol[s] if r==s: K=np.arange(n-1) K[r:] += 1 else: K=np.arange(n-2) if r<s: K[r:] += 1 K[s-1:] += 1 else: K[s:] += 1 K[r-1:] += 1 #K = np.array([i for i in xrange(n) if i!=r and i!=s]) #TOO SLOW return np.sum( (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \ * (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \ (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \ * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r])) def deltaC2(r, s, sol): delta = 0 sol_r, sol_s = sol[r], sol[s] for k in xrange(n): if k != r and k != s: sol_k = sol[k] delta += \ (stream_matrix[r][k] - stream_matrix[s][k]) \ * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \ + \ (stream_matrix[k][r] - stream_matrix[k][s]) \ * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r]) return delta import time N=200 elapsed1s = [] elapsed2s = [] elapsed3s = [] ns = range(10,410,10) for n in ns: distance_matrix_np=np.random.uniform(0,n**2,size=(n,n)) stream_matrix_np=np.random.uniform(0,n**2,size=(n,n)) distance_matrix=distance_matrix_np.tolist() stream_matrix=stream_matrix_np.tolist() sol = range(n-1,-1,-1) sol_np = np.array(range(n-1,-1,-1)) Is = np.random.randint(0,n-1,4) Js = np.random.randint(0,n-1,4) total1 = 0 start = time.clock() for reps in xrange(N): for i in Is: for j in Js: total1 += deltaC(i,j, sol) elapsed1 = (time.clock() - start) start = time.clock() total2 = 0 start = time.clock() for reps in xrange(N): for i in Is: for j in Js: total2 += deltaC_np(i,j, sol_np) elapsed2 = (time.clock() - start) total3 = 0 start = time.clock() for reps in xrange(N): for i in Is: for j in Js: total3 += deltaC2(i,j, sol_np) elapsed3 = (time.clock() - start) print n, elapsed1, elapsed2, elapsed3, total1, total2, total3 elapsed1s.append(elapsed1) elapsed2s.append(elapsed2) elapsed3s.append(elapsed3) #Check errors of one method against another #err = 0 #for i in range(min(n,50)): # for j in range(min(n,50)): # err += np.abs(deltaC(i,j,sol)-deltaC_np(i,j,sol_np)) #print err import matplotlib.pyplot as plt plt.plot(ns, elapsed1s, label='Original',lw=2) plt.plot(ns, elapsed3s, label='Optimized',lw=2) plt.plot(ns, elapsed2s, label='numpy',lw=2) plt.legend(loc='upper left', prop={'size':16}) plt.xlabel('matrix size') plt.ylabel('time') plt.show() 

And here is the original chart before optimizing list comprehension in deltaC_np

enter image description here

+6
Apr 12 '14 at 23:13
source share



All Articles