Requirements:
- I need to assemble an arbitrarily large array from the data.
- I can guess the size (approximately 100-200 rows), but there is no guarantee the guess will be right every time.
- Once it has grown to its final size, I need to do numerical calculations on it, so I would prefer to end up with a 2-D numpy array.
- Speed is critical. For example, for one of the 300 files, the update() method is called 45 million times (taking 150 s or so) and the finalize() method is called 500k times (taking 106 s in total) ... about 250 s overall.
Here is my code:
import numpy as np

class A:
    def __init__(self):
        self.data = []

    def update(self, row):
        self.data.append(row)     # amortized O(1) list append

    def finalize(self):
        dx = np.array(self.data)  # one copy into a contiguous 2-D array
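For reference, list.append is amortized O(1), so nearly all of the cost sits in the single np.array copy at finalize; a minimal sketch of what that produces (the row values here are illustrative):

import numpy as np

a = A()
for _ in range(3):
    a.update([1, 2, 3, 4, 5])  # each row is a fixed-width list
dx = np.array(a.data)          # shape (3, 5): the desired 2-D array
print(dx.shape)                # (3, 5)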
Other things I tried include the following code ... but this is waaaaay slower.
import numpy as np

class A:
    def __init__(self):
        self.data = np.array([])

    def update(self, row):
        # np.append returns a new array, so it must be reassigned;
        # it also copies all existing data on every call
        self.data = np.append(self.data, row)

    def finalize(self):
        dx = np.reshape(self.data, (self.data.shape[0] // 5, 5))
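For context, the usual alternative to calling np.append per row is a preallocated buffer that grows geometrically, so each update is a cheap slice assignment. A sketch under that assumption (the class name, initial capacity, and doubling factor are illustrative choices, not something from my code above):

import numpy as np

class PreallocatedA:
    # Sketch: preallocate rows and double capacity when full.
    def __init__(self, ncols=5, capacity=150):  # capacity ~ the 100-200 guess
        self.data = np.empty((capacity, ncols))
        self.size = 0

    def update(self, row):
        if self.size == self.data.shape[0]:
            # grow geometrically instead of copying on every append
            grown = np.empty((2 * self.data.shape[0], self.data.shape[1]))
            grown[:self.size] = self.data[:self.size]
            self.data = grown
        self.data[self.size] = row
        self.size += 1

    def finalize(self):
        return self.data[:self.size]  # view of the filled rows, no extra copy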
Here is a sketch of how it is called:
for i in range(500000):
    ax = A()
    for j in range(200):
        ax.update([1, 2, 3, 4, 5])
    ax.finalize()
    # some processing on ax
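To see where the time goes, a scaled-down timing sketch (the loop count is reduced to 1/100 of the real run, and the timer setup is my addition, not from the numbers above):

import time

t0 = time.perf_counter()
for i in range(5000):                 # scaled down: 1/100 of the real run
    ax = A()
    for j in range(200):
        ax.update([1, 2, 3, 4, 5])
    ax.finalize()
print(f"{time.perf_counter() - t0:.2f} s for 5k objects / 1M update calls")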
performance python numpy
fodon Aug 20 '11 at 18:45