Just out of curiosity, I implemented Vector3 utilities in three ways: as an array (with a typedef), as a class, and as a struct.
This is an array implementation:
    typedef float newVector3[3];

    namespace vec3 {
        void add(const newVector3& first, const newVector3& second, newVector3& out_newVector3);
        void subtract(const newVector3& first, const newVector3& second, newVector3& out_newVector3);
        void dot(const newVector3& first, const newVector3& second, float& out_result);
        void cross(const newVector3& first, const newVector3& second, newVector3& out_newVector3);
    }
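Only the declarations of the array version are shown. A minimal sketch of how the bodies might look follows; these definitions are an assumption, not the original code. The out-parameter style is forced by the representation: a C++ function cannot return a raw array by value, so the caller has to supply the result storage.

```cpp
// Hypothetical bodies for the vec3 declarations above (not the original code).
typedef float newVector3[3];

namespace vec3 {

// out = first + second, component by component.
void add(const newVector3& first, const newVector3& second, newVector3& out_newVector3) {
    for (int k = 0; k < 3; ++k) {
        out_newVector3[k] = first[k] + second[k];
    }
}

// out = first - second, component by component.
void subtract(const newVector3& first, const newVector3& second, newVector3& out_newVector3) {
    for (int k = 0; k < 3; ++k) {
        out_newVector3[k] = first[k] - second[k];
    }
}

// out_result = first . second
void dot(const newVector3& first, const newVector3& second, float& out_result) {
    out_result = first[0] * second[0] + first[1] * second[1] + first[2] * second[2];
}

// out = first x second. The components go into locals first, so the result
// is still correct when out_newVector3 is the same array as first or second.
void cross(const newVector3& first, const newVector3& second, newVector3& out_newVector3) {
    const float x = first[1] * second[2] - first[2] * second[1];
    const float y = first[2] * second[0] - first[0] * second[2];
    const float z = first[0] * second[1] - first[1] * second[0];
    out_newVector3[0] = x;
    out_newVector3[1] = y;
    out_newVector3[2] = z;
}

}  // namespace vec3
```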
And the implementation of the class:
    class Vector3 {
    private:
        float x;
        float y;
        float z;
    public:
        // ...
Of course, it also contains the other member functions that usually appear in a Vector3 class.
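The class listing stops at `public:`. Going only by how the test code uses it, a `Vector3(i, i, i)` constructor and a static `Vector3::cross(a, b)` that returns a new vector, a minimal sketch could look like this. The accessors `getX`/`getY`/`getZ` are hypothetical additions, needed only because the members are private.

```cpp
// Hypothetical minimal Vector3, inferred from the benchmark's usage only.
class Vector3 {
private:
    float x;
    float y;
    float z;

public:
    Vector3(float new_x, float new_y, float new_z)
        : x(new_x), y(new_y), z(new_z) {}

    // Hypothetical read accessors, since the members are private.
    float getX() const { return x; }
    float getY() const { return y; }
    float getZ() const { return z; }

    // Returns a x b as a new Vector3 (by value). A static member function
    // may read the private members of its Vector3 arguments.
    static Vector3 cross(const Vector3& a, const Vector3& b) {
        return Vector3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
    }
};
```

Returning by value is what allows the benchmark to write `newVector3Class = Vector3::cross(...)`, which the raw-array version cannot do.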
And finally, the struct implementation:
    struct s_vector3 {
        float x;
        float y;
        float z;

        // constructors
        s_vector3(float new_x, float new_y, float new_z) {
            x = new_x;
            y = new_y;
            z = new_z;
        }

        s_vector3(const s_vector3& other) {
            if (&other != this) {
                this->x = other.x;
                this->y = other.y;
                this->z = other.z;
            }
        }
        // ...
Again, I omitted some of the other common Vector3 functions. I then had each of the three implementations create 9,000,000 new objects and compute 9,000,000 cross products. Before each run I write a large chunk of data to flush the cache, so that no run benefits from data cached by the previous one.
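The struct listing likewise omits `s_vector3::cross`, which the test code calls. Here is a minimal sketch, assuming it mirrors the class version and returns a new `s_vector3` by value. Unlike the struct above, this sketch leaves the copy constructor compiler-generated, so treat it as an illustration rather than the exact code that was timed.

```cpp
// Hypothetical sketch of s_vector3 with a static cross(); not the original code.
struct s_vector3 {
    float x;
    float y;
    float z;

    s_vector3(float new_x, float new_y, float new_z)
        : x(new_x), y(new_y), z(new_z) {}

    // Copy construction and assignment are left compiler-generated here.

    // Returns a x b by value, matching calls of the form
    // v = s_vector3::cross(v, v) in the test code.
    static s_vector3 cross(const s_vector3& a, const s_vector3& b) {
        return s_vector3(a.y * b.z - a.z * b.y,
                         a.z * b.x - a.x * b.z,
                         a.x * b.y - a.y * b.x);
    }
};
```

Note that the benchmark crosses each vector with itself, so for finite inputs every result is the zero vector.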
Here is the test code:
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>

    const int K_OPERATION_TIME = 9000000;
    const std::size_t bigger_than_cachesize = 20 * 1024 * 1024;

    void cleanCache() {
        // flush the cache: 20 Mi longs, i.e. 20 MiB * sizeof(long) bytes
        long *p = new long[bigger_than_cachesize];
        for (std::size_t i = 0; i < bigger_than_cachesize; i++) {
            p[i] = std::rand();
        }
        delete[] p; // free the buffer instead of leaking it on every call
    }

    int main() {
        cleanCache();

        // first, the Vector3 struct
        std::clock_t start;
        double duration;
        start = std::clock();
        for (int i = 0; i < K_OPERATION_TIME; ++i) {
            s_vector3 newVector3Struct = s_vector3(i, i, i);
            newVector3Struct = s_vector3::cross(newVector3Struct, newVector3Struct);
        }
        duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
        std::printf("The struct implementation of Vector3 takes %f seconds.\n", duration);

        cleanCache();

        // second, the Vector3 array implementation
        start = std::clock();
        for (int i = 0; i < K_OPERATION_TIME; ++i) {
            const float f = static_cast<float>(i); // avoids a narrowing int -> float brace-init
            newVector3 newVector3Array = {f, f, f};
            newVector3 opResult;
            vec3::cross(newVector3Array, newVector3Array, opResult);
        }
        duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
        std::printf("The array implementation of Vector3 takes %f seconds.\n", duration);

        cleanCache();

        // third, the Vector3 class implementation
        start = std::clock();
        for (int i = 0; i < K_OPERATION_TIME; ++i) {
            Vector3 newVector3Class = Vector3(i, i, i);
            newVector3Class = Vector3::cross(newVector3Class, newVector3Class);
        }
        duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
        std::printf("The class implementation of Vector3 takes %f seconds.\n", duration);

        return 0;
    }
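A common precaution in micro-benchmarks like this is to make each loop's results observable, because an optimizing compiler may remove computations whose results are never used; in all three loops above, the computed vector is never read afterwards. The sketch below uses hypothetical helper names `flush_cache` and `time_loop`. It times with `std::chrono::steady_clock`, hands a checksum of the per-iteration results back to the caller, and frees its cache-flushing buffer. It is one way to structure the test under these assumptions, not a guaranteed-fair benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: writes `bytes` bytes so that earlier data is likely
// evicted from the CPU caches. The std::vector releases the memory on return.
inline void flush_cache(std::size_t bytes) {
    std::vector<unsigned char> buffer(bytes);
    // Stores through a volatile pointer are observable behaviour, so the
    // optimizer is not allowed to drop them.
    volatile unsigned char* p = buffer.data();
    for (std::size_t k = 0; k < bytes; ++k) {
        p[k] = static_cast<unsigned char>(k);
    }
}

// Hypothetical helper: calls body(i) for i in [0, n), adds every returned
// float into a running sum, and hands that sum back through `checksum`.
// Because the caller receives the sum, the per-iteration results are no
// longer unused. Returns the elapsed wall-clock time in seconds.
template <typename Body>
double time_loop(int n, Body body, float& checksum) {
    const auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += body(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, the struct loop could become something like `time_loop(K_OPERATION_TIME, [](int i) { s_vector3 v(i, i, i); return s_vector3::cross(v, v).x; }, checksum)`, with `checksum` printed next to the measured time.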
The result surprised me: the struct and class implementations both complete the task in about 0.23 seconds, while the array implementation takes only 0.08 seconds!
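Dividing the reported totals by the 9,000,000 iterations puts these numbers in per-iteration terms (each iteration constructs one vector and computes one cross product):

```latex
\[
\frac{0.23\,\text{s}}{9 \times 10^{6}} \approx 25.6\,\text{ns per iteration},
\qquad
\frac{0.08\,\text{s}}{9 \times 10^{6}} \approx 8.9\,\text{ns per iteration},
\qquad
\frac{0.23}{0.08} \approx 2.9
\]
```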
If arrays really have a performance advantage this large, they should be used in many cases, even though the syntax is uglier.
So I want to make sure: is this difference real, and is it what I should expect? Thanks!