C # Union vs Contains lists of continuous data

Question

C # Union vs Contains lists of continuous data

I am having trouble finding answers to a question that I have about some code that depends on what I'm working on, and I can’t find any documentation on how Union works on it in the main mechanics in C #. So the problem is this.

I have a data set that works similarly to this example:

object[] someMainTypeArray = new object [n]; List<object> objList2 = new List<object>(); foreach ( object obj in someMainTypeArray ) { List<object> objList1 = new List<object>() { "1","2","3" }; //each obj has a property that will generate a list of data //objList1 is the result of the data specific to obj //some of this data could be duplicates //Which is better, this: foreach ( object test in objList1 ) { if ( !objList2.Contains( test ) ) { objList2.Add( test ); } } //or this: objList2 = objList2.Union( objList1 ).ToList(); //Also, assume this has to happen anywhere from 0 to 60 times per second }

How much more efficient is it to allow the Union all the work? Or is it better to compare each item using Contains?

If not for both, then what is the best way to populate unique lists using minimal processing time?

Efficiency is important for this. In addition, this is not homework or something related to work, just related to training.

Lists are continuous at run time so that they are eventually cleared and re-updated. Changes to the lists are used to make decisions based on whether the final result lists similar to this example are used to compile the final list, and if this list is empty, this is a failure condition, and if this list is not empty, its success condition.

Here is a snippet of code for one of the generated lists:

  Player.ClearMoves(); List<Pair<BoardLocation, BoardLocation>> attacking = new List<Pair<BoardLocation, BoardLocation>>(); foreach ( ChessPiece p in Board[this.Player.Opponent] ) { if ( p.TheoryMove( this.Location ) ) { foreach ( Pair<BoardLocation , BoardLocation> l in Utility.GetLocations( p.Location , this.Location ) ) { if ( !attacking.Contains( l ) ) { attacking.Add( l ); } } } } if ( attacking.Count < 1 ) { return false; }

+5

contains c # unions

G1xb17 Nov 24 '15 at 10:57

source share

2 answers

If you look at the source of links for LINQ extensions (find UnionIterator ), you will see that Union works internally using Set<T> to keep track of which items were listed. Unfortunately, Set<T> is an inner class of the library, so you cannot use it directly. But there is a similar set called HashSet<T> that you can use.

Probably one of the main inefficiencies of your implementation is to create a new list for objList2 in each iteration of your outer loop. This will cause iterations and memory allocations every time. Since you are creating a list through nested loops, I would recommend doing one of the following:

Add everything to the List<T> and use .Distinct when you're done to filter out duplicates. This approach also uses the inner class Set<T> , but unlike a chain of several calls on Union it will use only one Set to create a unique list.
Use HashSet<T> to create a collection that always has a unique list of items.

+3

Erik Nov 24 '15 at 23:11

source share

poke · Accepted Answer · 2015-11-24T23:05:07+0000

You can find the implementation of Enumerable.Union in the reference source .

Here's how it works:

 public static IEnumerable<TSource> Union<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second) { if (first == null) throw Error.ArgumentNull("first"); if (second == null) throw Error.ArgumentNull("second"); return UnionIterator<TSource>(first, second, null); } static IEnumerable<TSource> UnionIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer) { Set<TSource> set = new Set<TSource>(comparer); foreach (TSource element in first) if (set.Add(element)) yield return element; foreach (TSource element in second) if (set.Add(element)) yield return element; }

As you can see, Union will iterate over both enumerations and derive objects from these sources. Like all Linq methods, it will not create a list, but work as a generator function. The list will only be created when .ToList() called.

To avoid duplication, he will use Set and try to add an element before yielding it. If the addition to the set was successful, then the element was not there yet, so it can be obtained.

Note that sets are very effective at finding if an item exists in it. They provide the search for elements in amortized constant time mode. Thus, it is definitely more efficient than your objList2.Contains , which will need to be objList2.Contains through the list again and again to determine if every item exists in it.

Also note that Union built to maintain the order of input enums. If you don’t need it, you can skip this completely and just use Set first. This is especially good if you plan to add new items to the same target set all the time, since it reuses the structure:

 HashSet<object> set = new HashSet<object>(); foreach (…) { List<object> objList1 = … // expand the set with the items from `objList1` set.UnionWith(objList1); }

It would be even better if you avoided creating objList1 in the first place and simply added your elements to the set directly - if this is possible for your use case.

C # Union vs Contains lists of continuous data

More articles: