What is the best collection used to uniquely identify nodes?

Question

What is the best collection used to uniquely identify nodes?

I am currently using Dictionary<int,node> to store about 10,000 nodes. The key is used as an identification number for subsequent searches, and "node" is a class containing some data. Other classes in the program use an identification number as a pointer to a node. (This may seem ineffective. However, explaining my argument for using a dictionary for this is beyond the scope of my question.)

However, 20% of the nodes are duplicated. What I want to do is when I add a node check to make sure that this is all set. if so, use this identification number. If not create a new one.

This is my current solution to the problem:

 public class nodeDictionary { Dictionary<int, node> dict = new Dictionary<int, node>( ); public int addNewNode( latLng ll ) { node n = new node( ll ); if ( dict.ContainsValue( n ) ) { foreach ( KeyValuePair<int, node> kv in dict ) { if ( kv.Value == n ) { return kv.Key; } } } else { if ( dict.Count != 0 ) { dict.Add( dict.Last( ).Key + 1, n ); return dict.Last( ).Key + 1; } else { dict.Add( 0, n ); return 0; } } throw new Exception( ); }//end add new node }

The problem is that when you try to add a new node to a list of 100,000 nodes, it takes 78 milliseconds to add a node. This is unacceptable because I could add an extra 1000 nodes at any given time.

So, is there a better way to do this? I'm not looking for someone to write code for me, I'm just looking for guidance.

+4

collections c # .net data-structures

Chad carisch Nov 01 '11 at 17:55

source share

6 answers

What is the purpose of this code?

 if ( dict.ContainsValue( n ) ) { foreach ( KeyValuePair kv in dict ) { if ( kv.Value == n ) { return kv.Key; } } }

ContainsValue looking for a value (instead of a key) and very inefficient (O (n)). The same goes for foreach . Not to mention that you do both when only one is required (you can completely remove ContainsValue by slightly changing the if )!

You should probably use an extra dictionary, which is the “reverse” source (that is, the values in the old dictionary are the keys in the new one and vice versa) to “cover” your search patterns (similar to how databases can support multiple indexes par table to cover multiple path tables may be requested).

+1

Branko dimitrijevic Nov 01 '11 at 18:19

source share

I would add a second dictionary for the reverse direction. those. Dictionary<Node,int>

Then you either

Contained with reference equality and do nothing.
Create an IEqualityComparer<Node> and put it in the dictionary
Replace Equals and GetHashCode with Node

In both cases, a good implementation for hashcode is essential to ensure good performance.

+1

CodesInChaos Nov 01 '11 at 18:19

source share

Your decision is not only slow, but also wrong. The order of the elements in the Dictionary is undefined, so dict.Last() does not guarantee the return of the last element added. (Although it may often look like this.)

Using an identifier to identify an object in your application also seems wrong. You should consider using object references directly.

But if you want to use your current design and assuming that you are comparing nodes based on their latLng , you can create two dictionaries: the one that you already have, and the second, Dictionary<latLng, int> , which you can use to effectively cover whether a specific node already exists. And if so, it gives you its identifier.

+1

svick Nov 01 '11 at 18:20

source share

You can try using HashSet<T>

0

Inbetween Nov 01 '11 at 18:00

source share

You might want to consider restructuring to just use a list (where the "key" is just an index in the list) instead of a dictionary. Several advantages:

Finding an element with an integer key is now O (1) (and very fast O (1), given that it just plays the array inside).
When you insert a new item, you search O (n) to see if it exists in the list. If this is not the case, you have already gone through this list and can record whether there was a record with a null record. If you have, this index is the new key. If not, the new key is the current list. You list a collection once instead of several times, and the listing itself is much faster than listing the Dictionary.

0

Dan bryant Nov 01 '11 at 18:14

source share

sehe · Accepted Answer · 2011-11-01T18:01:56+0000

Looks like you want

make sure LatLng overrides Equals / GetHashCode (it is advisable to implement the IEquatable<LatLng> interface)
enter all elements directly into the HashSet<LatLng>

To implement GetHashCode, see here: Why is it important to override GetHashCode when the Equals method is overridden?

If you need to generate "artificial" unique identifiers in some way, I suggest you use the dictionary again, but "in reverse order":

 // uses the same hash function for speedy lookup/insertion IDictionary<LatLng, int> idMap = new Dictionary<LatLng, int>(); foreach (LatLng latLng in LatLngCoords) { if (!idMap.ContainsKey(latLng)) idMap.Add(latLng, idMap.Count+1); // to start with 1 }

You can replace idMap with HashSet<> ; the implementation (and performance characteristics) is essentially the same as the associative container.

Here's the search function to get from LatLng to Id:

 int IdLookup(LatLng latLng) { int id; if (idMap.TryGetValue(latLng, id)) return id; throw new InvalidArgumentException("Coordinate not in idMap"); }

You can just add it:

 int IdFor(LatLng latLng) { int id; if (idMap.TryGetValue(latLng, id)) return id; id = idMap.Count+1; idMap.Add(latLng, id); return id; }

What is the best collection used to uniquely identify nodes?

More articles: