This code is from the dot-product method of my vector class. The method computes the inner products for a target array of 1000 vectors.
When the vector length N is odd (262145), the calculation takes 4.37 seconds. When N is 262144 (a multiple of 8), it takes 1.93 seconds.
    time1 = System.nanoTime();
    int count = 0;
    for (int j = 0; j < 1000; j++) {
        b = vektors[j]; // selects the next vector (b) to multiply as an inner product.
                        // each vector holds an array of float elements.
        if (((N / 2) * 2) != N) { // N is odd: plain loop
            for (int i = 0; i < N; i++) {
                t1 += elements[i] * b.elements[i];
            }
        } else if (((N / 8) * 8) == N) { // N is a multiple of 8: loop unrolled by 8
            float[] vek = new float[8];
            for (int i = 0; i < N; i += 8) {
                vek[0] = elements[i]     * b.elements[i];
                vek[1] = elements[i + 1] * b.elements[i + 1];
                vek[2] = elements[i + 2] * b.elements[i + 2];
                vek[3] = elements[i + 3] * b.elements[i + 3];
                vek[4] = elements[i + 4] * b.elements[i + 4];
                vek[5] = elements[i + 5] * b.elements[i + 5];
                vek[6] = elements[i + 6] * b.elements[i + 6];
                vek[7] = elements[i + 7] * b.elements[i + 7];
                t1 += vek[0] + vek[1] + vek[2] + vek[3] + vek[4] + vek[5] + vek[6] + vek[7];
                // t1 is the running total over all dot products.
            }
        }
    }
    time2 = System.nanoTime();
    time3 = (time2 - time1) / 1000000000.0; // seconds
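For comparison, unrolling is often written with several independent accumulators rather than a temporary array: the partial sums have no dependency on each other, so the CPU can overlap the floating-point additions instead of serializing them through one accumulator. This is a minimal self-contained sketch of that variant (the class name `DotProduct` and method `dot` are my own, not from the original code); the tail loop also handles lengths that are not a multiple of the unroll factor:

```java
public class DotProduct {
    // Dot product unrolled by 4 with four independent accumulators.
    // Independent partial sums break the single-accumulator add
    // dependency chain, letting the additions overlap in the pipeline.
    static float dot(float[] a, float[] b) {
        int n = a.length;
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int i = 0;
        for (; i + 3 < n; i += 4) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        // Tail loop: covers odd lengths and any leftover elements,
        // so no separate odd/even branch is needed.
        for (; i < n; i++) {
            s0 += a[i] * b[i];
        }
        return s0 + s1 + s2 + s3;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {1f, 1f, 1f, 1f, 1f};
        System.out.println(dot(a, b)); // prints 15.0
    }
}
```

Note that reassociating the sum this way can change the floating-point result slightly compared to a strictly sequential accumulation.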
Question: Is the reduction from 4.37 s to 1.93 s (about 2x faster) the result of a JIT decision to use SIMD instructions, or just the effect of my manual loop unrolling?
If the JIT cannot perform SIMD optimization automatically, does that also mean it does not automatically unroll the loop in this example?
For 1M iterations (vectors) and a vector size of 64, the speedup reaches 3.5x. Is that a cache advantage?
Thanks.