I am implementing a dynamic language that will compile in C #, and it implements its own reflection API (.NET is too slow, and DLR is limited only to newer and more inventive implementations).
To do this, I implemented a simple interface .GetField (line f) and .SetField (line f, val object). Until recently, an implementation simply toggles all possible values of a field string and takes the appropriate action. In addition, this dynamic language has the ability to define anonymous objects. For those anonymous objects, firstly, I implemented a simple hashing algorithm.
I'm currently looking for ways to optimize the dynamic parts of the language, and I came across the fact that the hash algorithm for anonymous objects would be redundant. This is due to the fact that objects are usually small. I would say that objects contain 2 or 3 fields, as a rule. Very rarely, they contain more than 15 fields. The actual line hash and the search will take longer than if I checked the equality between them. (This is not verified, just theoretically).
The first thing I did was, at compile time, create a red-black tree for each anonymous object declaration and overlay it on an array so that the object could look for it in a very optimized way.
I am still divided though if this is the best way to do this. I could find the perfect hash function. Even more radically, I’m thinking about giving up the need for strings and actually working with a two-length structure.
These two longs will be encoded to support 10 characters (A-za-z0-9_) each, which is basically a good prediction of the size of the fields. For fields larger than this, a special function (slower) that receives a string will also be provided.
The result will be that the strings will be included (not links) and their comparisons will be as cheap as a long comparison.
In any case, it’s a little difficult to find good information about this optimization, since it is usually considered at the vm level, and not at the static implementation of the language compilation.
Does anyone have any thoughts or tips on the best data structure for handling dynamic calls?
Edit: At the moment, I'm really going with a long string representation and finding a linear binary tree.