Python - a fast way to multiply and reduce matrices when working with numpy.memmap on the CPU with little RAM

Hi, I have a problem with fast matrix multiplication, addition, function application and summation with axis reduction while working on numpy.memmaps on the CPU with very little RAM (I think). Only with numexpr can I avoid allocating an intermediate array for the dot product.

For example:

    import numpy as np
    import numexpr as ne

    a = np.require(np.memmap('a.npy', mode='w+', order='C', dtype=np.float64,
                             shape=(10, 1)), requirements=['O'])
    b = np.memmap('b.npy', mode='w+', order='C', dtype=np.float64, shape=(1, 5))
    c = np.memmap('c.npy', mode='w+', order='C', dtype=np.float64, shape=(1, 5))
    out = np.memmap('out.npy', mode='w+', order='C', dtype=np.float64, shape=(10,))

    # func -> some ufunc, e.g. sin()

    # in numexpr it is simply:
    ne.evaluate('sum(func(b*a+c), axis=1)', out=out)

    # in numpy with einsum, an additional array is needed to hold
    # the product outside the dot:
    d = np.require(np.memmap('d.npy', mode='w+', order='C', dtype=np.float64,
                             shape=(10, 5)), requirements=['O'])
    np.einsum('ij,ki->kj', b, a, out=d)   # outer product b*a -> (10, 5)
    d += c
    func(d, out=d)
    np.einsum('ij->i', d, out=out)        # sum over axis 1 -> (10,)
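A quick sanity check that the einsum decomposition really computes the same thing as the one-line broadcast expression. This is a sketch, not part of the original code: it uses plain in-memory ndarrays instead of memmaps, and `np.sin` stands in for the placeholder `func`:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((10, 1))
b = rng.standard_normal((1, 5))
c = rng.standard_normal((1, 5))

# reference: broadcast, apply the function, reduce over axis 1
ref = np.sin(b * a + c).sum(axis=1)      # shape (10,)

# einsum path with an explicit intermediate, as in the question
d = np.empty((10, 5))
np.einsum('ij,ki->kj', b, a, out=d)      # outer product b*a -> (10, 5)
d += c
np.sin(d, out=d)
out = np.einsum('ij->i', d)              # row sums -> shape (10,)

assert np.allclose(ref, out)
```

The intermediate `d` is exactly the extra allocation that numexpr avoids by fusing the whole expression into one pass.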

Is it possible to do this faster than numexpr on a CPU with so little RAM? What about Cython plus Fortran LAPACK or BLAS? Any tips or tricks are welcome! Thanks for any help!
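Independent of the backend, one generic way to keep peak RAM bounded is to walk the long axis in chunks so the temporary never grows with the number of rows. A minimal sketch (the helper name `reduced_func` and the chunk size are my own, not from the original code):

```python
import numpy as np

def reduced_func(a, b, c, out, chunk=1024):
    """Compute sin(b*a + c).sum(axis=1) row-block by row-block.

    Only a (chunk, b.shape[1]) temporary is ever allocated, so peak
    RAM stays bounded no matter how many rows `a` (or the memmap
    behind it) has.
    """
    k = a.shape[0]
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        tmp = b * a[start:stop]          # (block, 5) temporary
        tmp += c
        np.sin(tmp, out=tmp)             # in-place, no extra copy
        out[start:stop] = tmp.sum(axis=1)
    return out

a = np.arange(10.0).reshape(10, 1)
b = np.linspace(0.0, 1.0, 5).reshape(1, 5)
c = np.zeros((1, 5))
out = np.empty(10)
reduced_func(a, b, c, out, chunk=4)
assert np.allclose(out, np.sin(b * a + c).sum(axis=1))
```

The same pattern works when `a` and `out` are memmaps: each chunk is read, processed and written back, so the OS can page the file in and out without the process ever holding the full arrays.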

FURTHER INFORMATION: By the way, I work on a laptop with an Intel Core 2 Duo T9300 processor, 2.7 GB of RAM (only that much of the 4 GB is visible, due to some BIOS problem), a 250 GB SSD and an old integrated Intel GPU. With so little RAM, most of which is eaten by Firefox with some add-ons, there is not much left for computation, so I try to avoid using it xD.

And I would say I am at a beginner level (step 1/1000) in programming: at the moment I do not know how code actually runs on the hardware, I only assume it does (which may cause some errors in my thinking xD).

EDIT: I wrote some Cython code to sum sine waves, using both numexpr and a Cython prange loop.

Wave data (om, eps, spectrum, amplitude) is stored in the OM numpy.memmap and time data (t, z) in the TI numpy.memmap. OM has shape (4, 1, 2500) and TI has shape (2, 1, 5e5, 1) - I just need them in those shapes.

    # cython: boundscheck=False, wraparound=False
    import numpy as np
    from cython.parallel import prange
    from libc.math cimport sin
    from numexpr import evaluate
    from numexpr import set_num_threads as numexpr_threads

    cdef inline void sine_wave_numexpr(OM, TI, int num_of_threads):
        cdef long m, n = 10
        cdef Py_ssize_t s = TI.shape[2] // n
        cdef str ex_sine_wave = r'sum(A*sin(om*ti+eps),axis=1)'
        cdef dict dct = {'A': OM[3], 'om': OM[0], 'eps': OM[2]}
        for m in range(n):
            sl = slice(s * m, s * (m + 1))
            dct['ti'] = TI[0, 0, sl]
            evaluate(ex_sine_wave, global_dict=dct, out=TI[1, 0, sl, 0])

    cdef inline void sine_wave_cython(double[:, :, ::1] OM,
                                      double[:, :, :, ::1] TI,
                                      int num_of_threads):
        cdef int i, j
        cdef Py_ssize_t n, m
        cdef double t, A, om, eps
        n = OM.shape[2]
        m = TI.shape[2]
        for i in prange(m, nogil=True, num_threads=num_of_threads):
            t = TI[0, 0, i, 0]
            for j in prange(n, num_threads=num_of_threads):
                A = OM[3, 0, j]
                om = OM[0, 0, j]
                eps = OM[2, 0, j]
                TI[1, 0, i, 0] += A * sin(om * t + eps)

    cpdef inline void wave_elevation(double dom, OM, TI, int num_of_threads,
                                     str method='cython'):
        cdef int ni
        cdef double i, j
        cdef Py_ssize_t shape = OM.shape[2]
        numexpr_threads(num_of_threads)
        OM[2, 0] = 2. * np.random.standard_normal(shape)
        evaluate('sqrt(dom*2*S)', out=OM[3],
                 local_dict={'dom': dom, 'S': OM[1]})
        if method == 'cython':
            sine_wave_cython(OM, TI, num_of_threads)
        elif method == 'numexpr':
            sine_wave_numexpr(OM, TI, num_of_threads)
        TI.shape = TI.shape[:3]
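For reference, here is what both kernels compute, as a plain-NumPy sketch. The shapes (8 wave components, 100 time samples) are illustrative stand-ins for the real (4, 1, 2500) and (2, 1, 5e5, 1) memmaps, and the layout mirrors the post's OM/TI convention:

```python
import numpy as np

# small stand-ins for the OM/TI memmaps in the post
n_waves, n_times = 8, 100
OM = np.zeros((4, 1, n_waves))
OM[0, 0] = np.linspace(0.1, 2.0, n_waves)         # om: angular frequencies
OM[2, 0] = np.zeros(n_waves)                      # eps: phases
OM[3, 0] = np.ones(n_waves)                       # A: amplitudes
TI = np.zeros((2, 1, n_times, 1))
TI[0, 0, :, 0] = np.linspace(0.0, 10.0, n_times)  # t: time samples

# elevation(t_i) = sum_j A_j * sin(om_j * t_i + eps_j), vectorised:
t = TI[0, 0, :, 0]                                # (n_times,)
TI[1, 0, :, 0] = (OM[3, 0]
                  * np.sin(np.outer(t, OM[0, 0]) + OM[2, 0])).sum(axis=1)

# double loop, mirroring the structure of the Cython kernel
ref = np.zeros(n_times)
for i in range(n_times):
    for j in range(n_waves):
        ref[i] += OM[3, 0, j] * np.sin(OM[0, 0, j] * t[i] + OM[2, 0, j])

assert np.allclose(TI[1, 0, :, 0], ref)
```

The vectorised form allocates an (n_times, n_waves) temporary, which is exactly why the post slices the time axis into 10 chunks in the numexpr variant.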

I am just starting out with Cython, so the code is probably not well optimized. At the moment the prange version runs in about the same time as the numexpr version (RAM usage is about 100 MB for the whole program with this part enabled, CPU at 50%, low SSD load; the calculation takes 1-2 minutes). I tried using memoryviews, but that created some local copies and ate up RAM over time. I would need to be at advanced step 3/3000 to understand how to work with memory here.


Source: https://habr.com/ru/post/971571/

