How to efficiently compare lists?

I am currently working on an ASP.NET web application. In some API calls I need to compare ListA against ListB, a list of lists, to determine whether ListA has the same elements as any list in ListB. In other words: is ListA contained in ListB?

Both collections are queried with LINQ from an EF Code First database. ListB contains either exactly one matching list or none, never more than one. In the worst case ListB has millions of elements, so the comparison must scale.

Instead of nested foreach loops, I'm looking for a clean LINQ query that lets the database do the work (before I resort to a multi-column index).

To illustrate the structure:

    // In reality the lists are queried from EF.
    var ListA = new List<Element>();
    var ListB = new List<List<Element>>();
    List<Element> solution = null;

    foreach (List<Element> e1 in ListB)
    {
        bool flag = false;
        foreach (Element e2 in ListA)
        {
            if (e1.Any(e => e.id == e2.id))
                flag = true;
            else
            {
                flag = false;
                break;
            }
        }
        if (flag)
        {
            solution = e1;
            break;
        }
    }

Update: structure

Since the database is accessed through EF, I will provide the corresponding object structure. I'm not sure I am allowed to publish the real code, so this example is still generic.

    // ListB
    class Result
    {
        ...
        public int Id;
        public virtual ICollection<Curve> curves;
        ...
    }

    class Curve
    {
        ...
        public int Id;
        public virtual Result result;
        public int resultId;
        public virtual ICollection<Point> points;
        ...
    }

    public class Point
    {
        ...
        public int Id;
        ...
    }

The controller (for the API call) needs to serve the right Curve object. To determine the correct object, a filter (ListA), which is itself a Curve object, is provided. The filter (ListA) now has to be compared with the list of curves in the Result (ListB). The only way to compare curves is to compare the points they contain (hence the actual list comparison). Curves have about 1 to 50 points. A Result may have about 500,000,000 curves.

Comparing by object identity is possible here, because all objects (even the filter) are re-queried from the database.

I am looking for a way to implement this mechanism, not for a way around it (e.g. using a multi-column index, which would mean changing the table).

(for illustration):

    class controller
    {
        ...
        public Response serveRequest(Curve filter)
        {
            foreach (Curve c in db.Result.curves)
            {
                if (compare(filter.points, c.points))
                    return c;
            }
        }
    }
4 answers

Use Except:

    public static bool ContainsAllItems<T>(IList<T> listA, IList<T> listB)
    {
        // True when no element of listB is missing from listA.
        return !listB.Except(listA).Any();
    }

The method above tells you whether listA contains all the elements of listB. Except builds a hash set internally, so it runs in roughly O(n + m) instead of the O(n * m) of the nested-loop approach.
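Note that Except uses the default equality comparer, so for entity classes that do not override Equals and GetHashCode it compares references. A minimal sketch of the same Except idea applied to keys instead, assuming each element exposes some key such as an Id (the names `KeyedComparison` and `ContainsAllBy` are hypothetical, not from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedComparison
{
    // Hypothetical helper: true when every key found in listB
    // also occurs in listA. Keys are compared with the default
    // comparer for TKey (value equality for int, string, ...).
    public static bool ContainsAllBy<T, TKey>(
        IEnumerable<T> listA,
        IEnumerable<T> listB,
        Func<T, TKey> key)
    {
        // Except is a set difference: keys of listB not present in listA.
        return !listB.Select(key).Except(listA.Select(key)).Any();
    }
}
```

It could be called as `KeyedComparison.ContainsAllBy(listA, listB, e => e.Id)`. Because Except works on sets, duplicate keys and ordering are ignored.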


Try the following:

    bool isIn = ListB.Any(x =>
        x.Count == ListA.Count &&
        ListA.All(y => x.Contains(y)));

Or, if you want the matching list itself:

    var solution = ListB.FirstOrDefault(x =>
        x.Count == ListA.Count &&
        ListA.All(y => x.Contains(y)));

I have something for you:

    var db = new MyContext();
    var a = db.LoadList(); // or whatever
    var b = new List<IQueryable<Entities>>(db.LoadListOfLists() /* or whatever */);
    // Count() is a method on IQueryable; && short-circuits, unlike &.
    bool found = b.Any(x => x.Count() == a.Count && x.All(y => a.Any(z => z.Id == y.Id)));

Since performance is a concern, I would suggest converting listA to a lookup or dictionary before comparing. For example:

    var listALookup = listA.ToLookup(item => item.Id);
    var result = listB.FirstOrDefault(childList =>
        childList.Count == listA.Count &&
        childList.All(childListItem => listALookup.Contains(childListItem.Id)));

Lookup.Contains is O(1) on average, while List.Contains is O(n).
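When only membership matters, a HashSet of the keys gives the same average O(1) check. A minimal, self-contained sketch of that variant follows; the names `LookupComparison` and `FindMatch` are hypothetical, and it assumes keys are unique within each list (as primary-key Ids would be), since otherwise equal counts do not guarantee equal contents:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupComparison
{
    // Hypothetical helper: returns the first child list whose keys
    // match those of listA, or null when there is none.
    public static List<T> FindMatch<T, TKey>(
        List<T> listA,
        IEnumerable<List<T>> listB,
        Func<T, TKey> key)
    {
        // Build the key set once: O(n) up front, O(1) average per lookup.
        var keysA = new HashSet<TKey>(listA.Select(key));

        return listB.FirstOrDefault(child =>
            child.Count == listA.Count &&                  // cheap rejection first
            child.All(item => keysA.Contains(key(item)))); // hash lookup per item
    }
}
```

Checking the count first rejects most candidates without touching their elements.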

Better still, do this comparison at the database level, so that unnecessary data is not loaded.
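As a rough sketch of what a database-level version might look like, the query below is written against IQueryable<Curve>, so it can be run here against in-memory data via AsQueryable(). The Curve and Point classes are simplified from the question, and `CurveQueries.FindMatchingCurve` is a hypothetical name. Whether a given EF version translates this expression into a single efficient SQL statement is an assumption that should be checked against the generated SQL:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified versions of the question's entities.
public class Point
{
    public int Id { get; set; }
}

public class Curve
{
    public int Id { get; set; }
    public List<Point> points { get; set; } = new List<Point>();
}

public static class CurveQueries
{
    // Hypothetical helper: finds the curve whose points have exactly
    // the same Ids as the filter's points, or null if there is none.
    public static Curve FindMatchingCurve(IQueryable<Curve> curves, Curve filter)
    {
        // Only the Ids are passed into the query, not the entities.
        var filterIds = filter.points.Select(p => p.Id).ToList();
        int count = filterIds.Count;

        return curves.FirstOrDefault(c =>
            c.points.Count() == count &&
            c.points.All(p => filterIds.Contains(p.Id)));
    }
}
```

With EF, the first argument would be the context's curve set instead of an in-memory list; the rest of the query stays the same.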


Source: https://habr.com/ru/post/1015210/
