The most suitable data structure for accessing dynamic language fields

I am implementing a dynamic language that will compile in C #, and it implements its own reflection API (.NET is too slow, and DLR is limited only to newer and more inventive implementations).

To do this, I implemented a simple interface .GetField (line f) and .SetField (line f, val object). Until recently, an implementation simply toggles all possible values ​​of a field string and takes the appropriate action. In addition, this dynamic language has the ability to define anonymous objects. For those anonymous objects, firstly, I implemented a simple hashing algorithm.

I'm currently looking for ways to optimize the dynamic parts of the language, and I came across the fact that the hash algorithm for anonymous objects would be redundant. This is due to the fact that objects are usually small. I would say that objects contain 2 or 3 fields, as a rule. Very rarely, they contain more than 15 fields. The actual line hash and the search will take longer than if I checked the equality between them. (This is not verified, just theoretically).

The first thing I did was, at compile time, create a red-black tree for each anonymous object declaration and overlay it on an array so that the object could look for it in a very optimized way.

I am still divided though if this is the best way to do this. I could find the perfect hash function. Even more radically, I’m thinking about giving up the need for strings and actually working with a two-length structure.

These two longs will be encoded to support 10 characters (A-za-z0-9_) each, which is basically a good prediction of the size of the fields. For fields larger than this, a special function (slower) that receives a string will also be provided.

The result will be that the strings will be included (not links) and their comparisons will be as cheap as a long comparison.

In any case, it’s a little difficult to find good information about this optimization, since it is usually considered at the vm level, and not at the static implementation of the language compilation.

Does anyone have any thoughts or tips on the best data structure for handling dynamic calls?

Edit: At the moment, I'm really going with a long string representation and finding a linear binary tree.

+4
source share
3 answers

I don’t know if this is useful, but I’ll post it in case

If this is compilation in C #, do you know the complete list of fields at compile time? So, as an idea, if your code is reading

// dynamic myObject.foo = "some value"; myObject.bar = 32; 

then during parsing, your symbol table can build an int for each field name;

 // parsing code symbols[0] == "foo" symbols[1] == "bar" 

then generate code using arrays or lists;

 // generated c# runtimeObject[0] = "some value"; // assign myobject.foo runtimeObject[1] = 32; // assign myobject.bar 

and create reflection as a separate array;

 runtimeObject.FieldNames[0] == "foo"; // Dictionary<int, string> runtimeObject.FieldIds["foo"] === 0; // Dictionary<string, int> 

As I said, it was thrown out in the hope that it would be useful. I do not know if this will happen!

+1
source

Since you are likely to use the same field and method names repeatedly, something like string interning will work well to quickly generate keys for your hash table. It would also ensure constant matching of string values.

+1
source

For such a small data set (expected upper bounds of 15) I think that almost any hashing will be more expensive than a tree or even a list search, but it really depends on your hashing algorithm.

If you want to use a dictionary / hash, you need to make sure that the objects you use for the key quickly return a hash code (perhaps one constant hash code that was created once). If you can prevent collisions inside the object (sounds pretty doable), you will get the speed and scalability (well, for any realistic size of the object / class) of the hash table.

Something that comes to mind is Ruby characters and messaging. I believe that Ruby characters act as a constant only for reference to memory. Thus, the comparison is constant, they are very lightweight, and you can use characters as variables (I am a little foggy on this and I don't have a Ruby interpreter on this machine). The Ruby call method really turns into message passing. Something like: obj.func(arg) turns into obj.send(:func, arg) (": func" is a character). I would suggest that this character makes searching for a message handler (as I call it) inside an object quite effective, since its hash code most likely does not need to be calculated, like most objects.

Perhaps something like this could be done in .NET.

0
source

Source: https://habr.com/ru/post/1336741/


All Articles