How to report progress in a long-running .Distinct() call in C#

I have an array of custom objects of type AnalysisResult. The array can contain hundreds of thousands of objects, and sometimes I need only the Distinct() elements of it. So I wrote a custom item comparer class called AnalysisResultDistinctItemComparer and make my call like this:

    public static AnalysisResult[] GetDistinct(AnalysisResult[] results)
    {
        return results.Distinct(new AnalysisResultDistinctItemComparer()).ToArray();
    }

My problem is that this call can take a long time (on the order of minutes) when the array is especially large (more than 200,000 objects).

I am currently calling this method in a BackgroundWorker and showing a spinning GIF to let the user know that the method is running and that the application is not frozen. That is all well and good, but it gives the user no indication of the current progress.

I really need to be able to tell the user the current progress of this action; but I could not come up with a good approach. I played with something like this:

    public static AnalysisResult[] GetDistinct(AnalysisResult[] results)
    {
        var query = results.Distinct(new AnalysisResultDistinctItemComparer());
        List<AnalysisResult> retVal = new List<AnalysisResult>();
        foreach (AnalysisResult ar in query)
        {
            // Show progress here
            retVal.Add(ar);
        }
        return retVal.ToArray();
    }

But the problem is that I have no way of knowing what my actual progress is. Thoughts? Suggestions?

+6
3 answers

Do not call ToArray() at the end of your method; use yield return instead. So do this:

    public static IEnumerable<AnalysisResult> Distinct(AnalysisResult[] results)
    {
        var query = results.Distinct(new AnalysisResultDistinctItemComparer());
        foreach (AnalysisResult ar in query)
        {
            // Use yield return here, so that the iteration remains lazy.
            yield return ar;
        }
    }

Basically, yield return uses some compiler magic to keep the iteration lazy, so the caller does not have to wait until the complete new collection has been built. Instead, each item is returned to the consumer as soon as it is computed, and the consumer can then run its update logic for each item if necessary. You can use the same technique in your GetDistinct method.
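One way to turn this laziness into an actual progress figure is to count how many source elements have been consumed, since LINQ's Distinct pulls items from its source one at a time as its results are enumerated. The following is only a minimal sketch under that assumption, not code from the original answer: ReportConsumed, GetDistinct's reportPercent parameter, and the ProgressSketch class are hypothetical names, and the method is generic so that any element type and comparer can stand in for AnalysisResult.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProgressSketch
{
    // Hypothetical helper: passes each element through unchanged and
    // reports how many source elements have been consumed so far.
    public static IEnumerable<T> ReportConsumed<T>(
        this IEnumerable<T> source, Action<int> onConsumed)
    {
        int consumed = 0;
        foreach (T item in source)
        {
            consumed++;
            onConsumed(consumed);
            yield return item;
        }
    }

    // Returns the distinct items. Percentages are based on the number of
    // source elements examined, which is known up front for an array.
    public static T[] GetDistinct<T>(
        T[] items, IEqualityComparer<T> comparer, Action<int> reportPercent)
    {
        int lastPercent = -1;
        return items
            .ReportConsumed(consumed =>
            {
                int percent = (int)(consumed * 100L / items.Length);
                if (percent != lastPercent) // report only when the value changes
                {
                    lastPercent = percent;
                    reportPercent(percent);
                }
            })
            .Distinct(comparer)
            .ToArray();
    }
}
```

When this runs inside a BackgroundWorker whose WorkerReportsProgress property is set to true, the worker's ReportProgress method could be passed as the reportPercent callback.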

Jon Skeet has an implementation that looks like this (LINQ's Distinct() on a particular property):

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }

Note that it uses a HashSet, which is built to reject duplicates. HashSet.Add returns true only when the key was not already in the set, so an element is yielded only the first time its key is seen.
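Because a HashSet-based loop visits the source in order, it can also be written as an explicit loop that knows its own position and can therefore report a percentage. This is a hedged sketch rather than anything from Jon Skeet's implementation: DistinctWithProgress, the reportPercent callback, and the reportEvery throttle are all hypothetical, and the comparer parameter is where something like AnalysisResultDistinctItemComparer would go.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDistinctSketch
{
    // Hypothetical variant: de-duplicates a list with a HashSet and calls
    // reportPercent after every `reportEvery` items examined, and once
    // more when the last item has been examined.
    public static List<T> DistinctWithProgress<T>(
        IList<T> source,
        IEqualityComparer<T> comparer,
        Action<int> reportPercent,
        int reportEvery = 1000)
    {
        var seen = new HashSet<T>(comparer);
        var result = new List<T>();
        for (int i = 0; i < source.Count; i++)
        {
            // Add returns true only if the item was not already in the set.
            if (seen.Add(source[i]))
            {
                result.Add(source[i]);
            }

            int examined = i + 1;
            if (examined % reportEvery == 0 || examined == source.Count)
            {
                reportPercent((int)(examined * 100L / source.Count));
            }
        }
        return result;
    }
}
```

Throttling the callback keeps the reporting overhead small compared with the de-duplication work itself when the array holds hundreds of thousands of objects.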

With all that said, remember that this is really an algorithms-and-data-structures question. It would be much easier to do something like this:

    Dictionary<Key, Value> distinctItems = new Dictionary<Key, Value>();
    foreach (var item in nonDistinctSetOfItems)
    {
        // Keep only the first item seen for each key.
        if (!distinctItems.ContainsKey(item.KeyProperty))
        {
            distinctItems.Add(item.KeyProperty, item);
        }
    }
    ... = distinctItems.Values // This would contain only the distinct items.

That is, a symbol table / Dictionary was made for exactly this problem: associating entries with unique keys. If you store your data this way, it greatly simplifies the problem. Do not overlook the simple solution!

+4

Given the design of this Distinct method, you iterate over the entire collection every time you call Distinct. Have you considered writing a custom collection that updates an index each time an object is added to it?
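A custom collection along these lines might look like the following sketch. It is one possible reading of the suggestion, not the answerer's code: IndexedResults, All, and DistinctItems are hypothetical names, and the index is assumed to be a HashSet built with the same kind of comparer that would otherwise be passed to Distinct.

```csharp
using System.Collections.Generic;

// Hypothetical collection: keeps every added item plus a running list of
// the distinct ones, so no separate Distinct pass is needed later.
public class IndexedResults<T>
{
    private readonly List<T> _all = new List<T>();
    private readonly List<T> _distinct = new List<T>();
    private readonly HashSet<T> _seen;

    public IndexedResults(IEqualityComparer<T> comparer)
    {
        _seen = new HashSet<T>(comparer);
    }

    public void Add(T item)
    {
        _all.Add(item);
        // The set is checked once per insertion; the first item seen
        // for each equivalence class is the one that is kept.
        if (_seen.Add(item))
        {
            _distinct.Add(item);
        }
    }

    public IReadOnlyList<T> All { get { return _all; } }
    public IReadOnlyList<T> DistinctItems { get { return _distinct; } }
}
```

The cost of de-duplication is then spread across the individual Add calls instead of being paid in one long call at the end.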

+1

Alternatively, you can use ThreadPool and WaitHandle to run your Distinct and DisplayProgress logic on separate threads.

    using System;
    using System.Threading;

    public class Sample
    {
        public void Run()
        {
            var state = new State();
            ThreadPool.QueueUserWorkItem(DoWork, state);
            ThreadPool.QueueUserWorkItem(ShowProgress, state);
            // Block until the worker signals that it has finished.
            WaitHandle.WaitAll(new WaitHandle[] { state.AutoResetEvent });
            Console.WriteLine("Completed");
        }

        public void DoWork(object state)
        {
            var s = (State) state;
            // Do your work here.
            for (int i = 0; i < 10; i++)
            {
                s.Status++;
                Thread.Sleep(1000);
            }
            // Mark the work as completed so that ShowProgress stops polling,
            // and signal the waiting thread.
            s.Completed();
        }

        public void ShowProgress(object state)
        {
            var s = (State) state;
            while (!s.IsCompleted())
            {
                // Read the status once so the printed and stored values agree.
                int current = s.Status;
                if (s.PrintedStatus != current)
                {
                    Console.WriteLine(current);
                    s.PrintedStatus = current;
                }
            }
        }

        public class State
        {
            public State() { AutoResetEvent = new AutoResetEvent(false); }
            public AutoResetEvent AutoResetEvent { get; private set; }
            public int Status { get; set; }
            public int PrintedStatus { get; set; }
            // volatile, because the flag is written and read on different threads.
            private volatile bool _completed;
            public bool IsCompleted() { return _completed; }
            public void Completed() { _completed = true; AutoResetEvent.Set(); }
        }
    }
0

Source: https://habr.com/ru/post/951807/

