I am writing Langton ant sim (for proper RLR) and trying to optimize it for speed. Here is the corresponding code:
#define AREA_X 65536 #define AREA_Y 65536 #define TURN_LEFT 3 #define TURN_RIGHT 1 int main() { uint_fast8_t* state; uint_fast64_t ant=((AREA_Y/2)*AREA_X) + (AREA_X/2); uint_fast8_t ant_orientation=0; uint_fast8_t two_pow_five=32; uint32_t two_pow_thirty_two=0; uint_fast8_t change_orientation[4]={0, TURN_RIGHT, TURN_LEFT, TURN_RIGHT}; int_fast64_t* move_ant={AREA_X, 1, -AREA_X, -1}; ... initialise empty state while(1) { while(two_pow_five--) { while(--two_pow_thirty_two) {
I have two questions:
I tried to optimize as much as possible with trial and trial testing, are there any tricks that I have not used? Please try talking in c, not in the assembly, although I will probably try to build at some point.
Is there a better way to simulate a problem to increase speed?
Additional Information: Portability does not matter. I am on 64-bit Linux using gcc, i5-2500k and 16 GB of RAM. In the state array in which it is used, 4GiB is used, the program can use 12GiB RAM. SizeOf (uint_fast8_t) = 1. Estimates of the ratings are intentionally missing, corruption is easily detected manually from dumps.
edit: Possibly counterintuitively, overlapping switch statements instead of eliminating them has given the best performance so far.
edit: I re-simulated the problem and came up with something faster than one step to iterate. Previously, each state item used two bits and described a single cell in the Langton ant grid. The new method uses all 8 bits and describes a 2x2 grid section. At each iteration, a variable number of steps is performed by searching for previously calculated values โโof the number of steps, a new orientation and a new state for the current state + orientation. Assuming that everything is equally likely, it averages 2 steps per iteration. As a bonus, it uses 1/4 of the memory to simulate the same area:
while(--iteration) { // roughly 31 seconds per 2^32 steps table_offset=(state[ant]*24)+(ant_orientation*3); it+=twoxtwo_table[table_offset+0]; state[ant]=twoxtwo_table[table_offset+2]; ant+=move_ant2x2[(ant_orientation=twoxtwo_table[table_offset+1])]; }
We did not try to optimize it, the next is to try to eliminate the displacement equation and search with nested switches and constants, as before (but with 648 internal cases instead of 12).
source share