The most efficient way to get multiple objects by primary key?

What is the most efficient way to select multiple objects using a primary key?

    public IEnumerable<Models.Image> GetImagesById(IEnumerable<int> ids)
    {
        //return ids.Select(id => Images.Find(id));       //is this cool?
        return Images.Where( im => ids.Contains(im.Id));  //is this better, worse or the same?
        //is there a (better) third way?
    }

I understand that I could run some performance tests to compare, but I wonder whether there is in fact a better way than both, and I am looking for some enlightenment on what the difference between these two queries is, if any, once they have been "translated".

+50
c# linq entity-framework
Nov 12 2018-11-11T00:
5 answers

UPDATE: With the addition of InExpression in EF6, the performance of processing Enumerable.Contains improved significantly. The analysis in this answer is great, but largely outdated since 2013.

Using Contains in Entity Framework is actually very slow. It is true that it translates into an IN clause in SQL and that the SQL query itself executes fast. But the problem and the performance bottleneck lie in the translation from your LINQ query into SQL. The expression tree that is created is expanded into a long chain of OR concatenations because there is no native expression that represents an IN. When the SQL is created, this expression of many ORs is recognized and collapsed back into the SQL IN clause.

This does not mean that using Contains is worse than issuing one query per element in your ids collection (your first option). It is probably still better, at least for collections that are not too large. But for large collections it is really bad. I remember that I tested a Contains query with about 12,000 elements some time ago: it worked, but it took around a minute even though the SQL query executed in less than a second.

It might be worth testing the performance of a combination of multiple roundtrips to the database with a smaller number of elements in the Contains expression for each roundtrip.

This approach, as well as the limitations of using Contains with Entity Framework, is shown and explained here:

Why does the Contains() operator degrade Entity Framework's performance so dramatically?
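
As a rough sketch of the batching idea mentioned above (not measured here): the helper below splits the ids into fixed-size chunks so that each Contains query carries fewer elements. The names IdBatching and InBatches, and the batch size of 1,000 in the usage comment, are hypothetical choices for illustration and not part of Entity Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdBatching
{
    // Splits ids into consecutive batches of at most batchSize elements.
    // This is an iterator method, so the argument check runs when
    // enumeration starts, not when the method is called.
    public static IEnumerable<List<int>> InBatches(IEnumerable<int> ids, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<int>(batchSize);
        foreach (var id in ids)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<int>(batchSize);
            }
        }

        // Emit the last, partially filled batch, if any.
        if (batch.Count > 0)
            yield return batch;
    }
}

// Hypothetical usage against an EF context (assumed, not run here):
//
// var result = new List<MyEntity>();
// foreach (var batch in IdBatching.InBatches(ids, 1000))
//     result.AddRange(context.Set<MyEntity>()
//         .Where(e => batch.Contains(e.ID))
//         .ToList());
```

Whether this beats a single large Contains depends on the EF version and on the roundtrip cost to the database, so the batch size is something to measure rather than assume.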

It's possible that a raw SQL command will perform best in this situation, which means that you call dbContext.Database.SqlQuery<Image>(sqlString) or dbContext.Images.SqlQuery(sqlString), where sqlString is the SQL shown in @Rune's answer.

Edit

Here are a few measurements:

I did this on a table with 550,000 records and 11 columns (IDs start from 1 without gaps) and picked 20,000 IDs at random:

    using (var context = new MyDbContext())
    {
        Random rand = new Random();
        var ids = new List<int>();
        for (int i = 0; i < 20000; i++)
            ids.Add(rand.Next(550000));

        Stopwatch watch = new Stopwatch();
        watch.Start();

        // here are the code snippets from below

        watch.Stop();
        var msec = watch.ElapsedMilliseconds;
    }

Test 1

    var result = context.Set<MyEntity>()
        .Where(e => ids.Contains(e.ID))
        .ToList();

Result → msec = 85.5 s

Test 2

    var result = context.Set<MyEntity>().AsNoTracking()
        .Where(e => ids.Contains(e.ID))
        .ToList();

Result → msec = 84.5 s

This tiny effect of AsNoTracking is very unusual. It indicates that the bottleneck is not object materialization (and not SQL, as shown below).

For both tests, SQL Profiler shows that the SQL query arrives at the database very late. (I didn't measure exactly, but it was later than 70 seconds.) Obviously, translating this LINQ query into SQL is very expensive.

Test 3

    var values = new StringBuilder();
    values.AppendFormat("{0}", ids[0]);
    for (int i = 1; i < ids.Count; i++)
        values.AppendFormat(", {0}", ids[i]);

    var sql = string.Format(
        "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
        values);
    var result = context.Set<MyEntity>().SqlQuery(sql).ToList();

Result → msec = 5.1 s

Test 4

    // same as Test 3 but this time including AsNoTracking
    var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();

Result → msec = 3.8 s

This time, the effect of disabling tracking is more noticeable.

Test 5

    // same as Test 3 but this time using Database.SqlQuery
    var result = context.Database.SqlQuery<MyEntity>(sql).ToList();

Result → msec = 3.7 s

My understanding is that context.Database.SqlQuery<MyEntity>(sql) is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking(), so no difference between Test 4 and Test 5 is expected.

(The sizes of the result sets were not always the same because of possible duplicates among the randomly chosen IDs, but they were always between 19,600 and 19,640 elements.)

Edit 2

Test 6

Even 20,000 roundtrips to the database are faster than using Contains:

    var result = new List<MyEntity>();
    foreach (var id in ids)
        result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));

Result → msec = 73.6 s

Note that I used SingleOrDefault instead of Find. Using the same code with Find is very slow (I cancelled the test after several minutes) because Find calls DetectChanges internally. Disabling automatic change detection (context.Configuration.AutoDetectChangesEnabled = false) leads to roughly the same performance as SingleOrDefault. Using AsNoTracking reduces the time by one or two seconds.

The tests were run with the database client (a console application) and the database server on the same machine. The last result might get significantly worse with a "remote" database because of the many roundtrips.

+122
Nov 13 '11 at 0:25

The second option is definitely better than the first. The first option will result in ids.Length queries to the database, and the second option can use the 'IN' statement in the SQL query. This will basically turn your LINQ query into something like the following SQL:

 SELECT * FROM ImagesTable WHERE id IN (value1,value2,...) 

where value1, value2, etc. are the values of your ids variable. Be aware, however, that I think there may be an upper limit on the number of values that can be serialized into a query this way. I will see if I can find the documentation ...
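
To make the shape of that SQL concrete, here is a minimal sketch that builds the statement text from a list of integer ids. InClauseBuilder and Build are hypothetical names. The table and column names are assumed to come from trusted code rather than user input, because they are concatenated into the SQL as-is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InClauseBuilder
{
    // Builds "SELECT * FROM <table> WHERE <column> IN (v1, v2, ...)".
    // Duplicate ids are dropped because they do not change the result.
    // Returns null for an empty list, because "IN ()" is not valid SQL.
    // Assumption: table and column are trusted identifiers, not user input.
    public static string Build(string table, string column, IEnumerable<int> ids)
    {
        var distinct = ids.Distinct().ToList();
        if (distinct.Count == 0)
            return null;

        return string.Format(
            "SELECT * FROM {0} WHERE {1} IN ({2})",
            table, column, string.Join(", ", distinct));
    }
}

// Example:
// InClauseBuilder.Build("ImagesTable", "id", new[] { 1, 2, 2 })
// returns "SELECT * FROM ImagesTable WHERE id IN (1, 2)"
```

Because the values are integers, there is nothing to escape here. For string keys, parameters would be the safer route.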

+4
Nov 12 '11 at 20:50

I am using Entity Framework 6.1 and found, by trying it in my code, that it is better to use:

 return db.PERSONA.Find(id);

rather than:

 return db.PERSONA.FirstOrDefault(x => x.ID == id); 

The question on the performance of Find() vs. FirstOrDefault has some thoughts on this.

+1
Feb 06 '15 at 9:13

Converting the list to an array using the ToArray() method improves performance. You can do it as follows:

    ids.Select(id => Images.Find(id));
    return Images.ToArray().Where(im => ids.Contains(im.Id));
+1
Jun 02 '18 at 11:18

Well, I recently had a similar problem, and the best way I found was to insert the list of IDs into a temp table and then do a join.

    private List<Foo> GetFoos(IEnumerable<long> ids)
    {
        var sb = new StringBuilder();
        sb.Append("DECLARE @Temp TABLE (Id bigint PRIMARY KEY)\n");

        foreach (var id in ids)
        {
            sb.Append("INSERT INTO @Temp VALUES ('");
            sb.Append(id);
            sb.Append("')\n");
        }

        sb.Append("SELECT f.* FROM [dbo].[Foo] f inner join @Temp t on f.Id = t.Id");

        return this.context.Database.SqlQuery<Foo>(sb.ToString()).ToList();
    }

This is not very pretty, but for large lists it is very effective.

0
Apr 21 '17 at 11:55


