I wrote a direct routine in julia, and portability problems forced me to rewrite it in C. After writing it, I was surprised by the acceleration, I expected even an order of magnitude, but not two!
I was wondering if this kind of acceleration is normal, just rewriting in C, being Julia, so focused on speed and HPC.
These are codes, I simplified them to make them concise while maintaining the acceleration C (all masses are 1, Force is just the distance between two bodies).
iterate over each index (the star array propierties is a fixed size, but I use only the first 400 for testing) and calculate the contribution of the remaining indices, and then use the Euler integrator to calculate the new position (new speed + = F / m times dt, new position + = speed dt).
C compiled with gccand without special flags time ./a.outgives 0.98s:
#include <stdio.h>
#include <stdlib.h>
#define MAX_STAR_N (int)5e5
double *x,*y,*z,*vx,*vy,*vz;
void evolve_bruteforce(double dt){
int i,j;
for(i=0;i<400;i++){
double cacheforce[3] = {0,0,0};
double thisforce[3];
for(j=0;j<400;j++){
if(i!=j){
thisforce[0] = (x[j] - x[i]);
thisforce[1] = (y[j] - y[i]);
thisforce[2] = (z[j] - z[i]);
cacheforce[0] += thisforce[0];
cacheforce[1] += thisforce[1];
cacheforce[2] += thisforce[2];
}
}
vx[i] += cacheforce[0]*dt;
vy[i] += cacheforce[1]*dt;
vz[i] += cacheforce[2]*dt;
}
for(i=0;i<400;i++){
x[i] += vx[i]*dt;
y[i] += vy[i]*dt;
z[i] += vz[i]*dt;
}
}
int main (int argc, char *argv[]){
x = malloc(sizeof(double)*MAX_STAR_N);
y = malloc(sizeof(double)*MAX_STAR_N);
z = malloc(sizeof(double)*MAX_STAR_N);
vx = malloc(sizeof(double)*MAX_STAR_N);
vy = malloc(sizeof(double)*MAX_STAR_N);
vz = malloc(sizeof(double)*MAX_STAR_N);
int i;
for(i=0;i<1000;i++)
{
evolve_bruteforce(0.001);
}
}
Julia's code executed with help julia -O --check-bounds=nogives 102 seconds:
function evolve_bruteforce(dt,x,y,z,vx,vy,vz)
for i in 1:400
cacheforce = [0.0,0.0,0.0]
thisforce = Vector{Float64}(3)
for j in 1:400
if i != j
thisforce[1] = (x[j] - x[i])
thisforce[2] = (y[j] - y[i])
thisforce[3] = (z[j] - z[i])
cacheforce[1] += thisforce[1]
cacheforce[2] += thisforce[2]
cacheforce[3] += thisforce[3]
vx[i] += cacheforce[1]*dt
vy[i] += cacheforce[2]*dt
vz[i] += cacheforce[3]*dt
end
for i in 1:400
x[i] += vx[i]*dt
y[i] += vy[i]*dt
z[i] += vz[i]*dt
end
end
end
end
function main()
x = zeros(500000)
y = zeros(500000)
z = zeros(500000)
vx = zeros(500000)
vy = zeros(500000)
vz = zeros(500000)
@time for i in 1:1000
evolve_bruteforce(0.001,x,y,z,vx,vy,vz)
end
end
main()
I do not know how I can facilitate the answer, if I can change the message in any way, let me know.