How to optimize this Langton ant sim?

Question

How to optimize this Langton ant sim?

I am writing Langton ant sim (for proper RLR) and trying to optimize it for speed. Here is the corresponding code:

#define AREA_X 65536 #define AREA_Y 65536 #define TURN_LEFT 3 #define TURN_RIGHT 1 int main() { uint_fast8_t* state; uint_fast64_t ant=((AREA_Y/2)*AREA_X) + (AREA_X/2); uint_fast8_t ant_orientation=0; uint_fast8_t two_pow_five=32; uint32_t two_pow_thirty_two=0;/*not fast, relying on exact width for overflow*/ uint_fast8_t change_orientation[4]={0, TURN_RIGHT, TURN_LEFT, TURN_RIGHT}; int_fast64_t* move_ant={AREA_X, 1, -AREA_X, -1}; ... initialise empty state while(1) { while(two_pow_five--)/*removing this by doing 32 steps per inner loop, ~16% longer*/ { while(--two_pow_thirty_two) { /*one iteration*/ /* 54 seconds for init + 2^32 steps ant_orientation = ( ant_orientation + (117>>((++state[ant])*2 )) )&3; state[ant] = (36 >> (state[ant] *2) ) & 3; ant+=move_ant[ant_orientation]; */ /* 47 seconds for init + 2^32 steps ant_orientation = ( ant_orientation + ((state[ant])==1?3:1) )&3; state[ant] += (state[ant]==2)?-2:1; ant+=move_ant[ant_orientation]; */ /* 46 seconds for init + 2^32 steps ant_orientation = ( ant_orientation + ((state[ant])==1?3:1) )&3; if(state[ant]==2) { --state[ant]; --state[ant]; } else ++state[ant]; ant+=move_ant[ant_orientation]; */ /* 44 seconds for init + 2^32 steps ant_orientation = ( ant_orientation + ((++state[ant])==2?3:1) )&3; if(state[ant]==3)state[ant]=0; ant+=move_ant[ant_orientation]; */ // 37 seconds for init + 2^32 steps // handle every situation with nested switches and constants switch(ant_orientation) { case 0: switch(state[ant]) { case 0: ant_orientation=1; state[ant]=1; ++ant; break; case 1: ant_orientation=3; state[ant]=2; --ant; break; case 2: ant_orientation=1; state[ant]=0; ++ant; break; } break; case 1: switch(state[ant]) { ... } break; case 2: switch(state[ant]) { ... } break; case 3: switch(state[ant]) { ... } break; } } } two_pow_five=32; ... dump to file every 2^37 steps } return 0; }

I have two questions:

I tried to optimize as much as possible with trial and trial testing, are there any tricks that I have not used? Please try talking in c, not in the assembly, although I will probably try to build at some point.
Is there a better way to simulate a problem to increase speed?

Additional Information: Portability does not matter. I am on 64-bit Linux using gcc, i5-2500k and 16 GB of RAM. In the state array in which it is used, 4GiB is used, the program can use 12GiB RAM. SizeOf (uint_fast8_t) = 1. Estimates of the ratings are intentionally missing, corruption is easily detected manually from dumps.

edit: Possibly counterintuitively, overlapping switch statements instead of eliminating them has given the best performance so far.

edit: I re-simulated the problem and came up with something faster than one step to iterate. Previously, each state item used two bits and described a single cell in the Langton ant grid. The new method uses all 8 bits and describes a 2x2 grid section. At each iteration, a variable number of steps is performed by searching for previously calculated values of the number of steps, a new orientation and a new state for the current state + orientation. Assuming that everything is equally likely, it averages 2 steps per iteration. As a bonus, it uses 1/4 of the memory to simulate the same area:

 while(--iteration) { // roughly 31 seconds per 2^32 steps table_offset=(state[ant]*24)+(ant_orientation*3); it+=twoxtwo_table[table_offset+0]; state[ant]=twoxtwo_table[table_offset+2]; ant+=move_ant2x2[(ant_orientation=twoxtwo_table[table_offset+1])]; }

We did not try to optimize it, the next is to try to eliminate the displacement equation and search with nested switches and constants, as before (but with 648 internal cases instead of 12).

+3

optimization c

John smith Jul 11 '13 at 17:19

source share

1 answer

huseyin tugrul buyukisik · Answer 1 · 2013-07-11T21:38:16+0000

Or you can use a single unsigned constant as an artificial register instead of branching:

 value: 1 3 1 1 bits: 01 11 01 01 ---->101 decimal value for an unsigned byte index 3 2 1 0 ---> get first 2 bits to get "1" (no shift) --> get second 2 bits to get "1" (shifting for 2 times) --> get third 2 bits to get "3" (shifting for 4 times) --> get last 2 bits to get "1" (shifting for 6 times) Then "AND" the result with binary(11) or decimal(3) to get your value. (101>>( (++state[ant])*2 ) ) & 3 would give you the turnright or turnleft Example: ++state[ant]= 0: ( 101>>( (0)*2 ) )&3 --> 101 & 3 = 1 ++state[ant]= 1: ( 101>>( (1)*2 ) )&3 --> 101>>2 & 3 = 1 ++state[ant]= 2: ( 101>>( (2)*2 ) )&3 --> 101>>4 & 3 = 3 -->turn left ++state[ant]= 3: ( 101>>( (3)*2 ) )&3 --> 101>>6 & 3 = 1 Maximum six-shifting + one-multiplication + one-"and" may be better. Dont forget constant can be auto-promoted so you may add some suffixes or something else.

Since you are using "unsigned int" for module% 4, you can use "and".

  state[ant]=state[ant]&3; instead of state[ant]=state[ant]%4;

For unskilled compilers, this should increase speed.

The hardest part: modulo-3

  C = A % B is equivalent to C = A – B * (A / B) We need state[ant]%3 Result = state[ant] - 3 * (state[ant]/3) state[ant]/3 is always <=1 for your valid direction states. Only when state[ant] is 3 then state[ant]/3 is 1, other values give 0. When multiplied by 3, that part is 0 or 3 (only 3 when state[ant] is 3 otherwise 0) Result = state[ant] - (0 or 3) Lets look at all possibilities: state[ant]=0: 0 - 0 ---> 0 ----> 00100100 shifted by 0 times &3 --> 00000000 state[ant]=1: 1 - 0 ---> 1 ----> 00100100 shifted by 2 times &3 --> 00000001 state[ant]=2: 2 - 0 ---> 2 ----> 00100100 shifted by 4 times &3 --> 00000010 state[ant]=3: 3 - 3 ---> 0 ----> 00100100 shifted by 6 times &3 --> 00000000 00100100 is 36 in decimal. (36 >> (state[ant] *2) ) & 3 will give you state[ant]%3 for your valid states (0,1,2,3) Example: state[ant]=0: 36 >> 0 --> 36 ----> 36& 3 ----> 0 satisfies 0%3 state[ant]=1: 36 >> 2 --> 9 -----> 9 & 3 ----> 1 satisfies 1%3 state[ant]=2: 36 >> 4 --> 2 -----> 2 & 3 ----> 2 satisfies 2%3 state[ant]=3: 36 >> 6 --> 0 -----> 0 & 3 ----> 0 satisfies 3%3

How to optimize this Langton ant sim?

More articles: