Using LINQ vs SQL to filter a collection

I have a fairly general question about using LINQ vs SQL to filter a collection. Suppose you run a fairly sophisticated filter against a database table. It runs, say, 10,000 times, and the filter can be different every time. Efficiency-wise, is it better to load the entire table into memory and apply the filters with LINQ, or should you let the database handle the filtering with SQL (since that is what it was built for)? Any thoughts?

EDIT: I should have been clearer. Suppose we are talking about a table with 1,000 records and 20 columns (containing int / string / date data). Currently my application runs one query every half hour to pull all the data into a collection (stored in the application cache) and filters that cached collection in memory. I am wondering whether this is worse than making a lot of round trips to the database server (this is Oracle, fwiw).
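
For concreteness, a minimal sketch of the two approaches being compared: refreshing a cached copy of the table every half hour and filtering it with LINQ to Objects, versus pushing each filter to the database. The entity, cache key, and data-access helpers (Product, GetAllFromOracle, QueryOracle) are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Caching;

public class Product
{
    public int Id { get; set; }
    public string Category { get; set; }
    public DateTime Created { get; set; }
}

public static class ProductFilters
{
    // Approach 1: refresh the whole table into the application cache every
    // 30 minutes and filter the cached list with LINQ to Objects.
    public static IEnumerable<Product> FilterInMemory(string category, DateTime since)
    {
        var cache = MemoryCache.Default;
        var all = cache.Get("AllProducts") as List<Product>;
        if (all == null)
        {
            all = GetAllFromOracle();                     // one big query
            cache.Set("AllProducts", all,
                      DateTimeOffset.Now.AddMinutes(30)); // half-hour refresh
        }
        return all.Where(p => p.Category == category && p.Created >= since);
    }

    // Approach 2: let the database do the filtering on every call
    // (one round trip per filter, but only matching rows come back).
    public static IEnumerable<Product> FilterInDatabase(string category, DateTime since)
    {
        // e.g. "SELECT ... FROM products WHERE category = :cat AND created >= :since"
        return QueryOracle(category, since);
    }

    // Data access omitted; these stand in for whatever Oracle access layer is used.
    private static List<Product> GetAllFromOracle() { throw new NotImplementedException(); }
    private static List<Product> QueryOracle(string category, DateTime since) { throw new NotImplementedException(); }
}
```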

+4
6 answers

After the update:

It runs, say, 10,000 times

Suppose we are talking about a table with 1,000 records

It seems reasonable to assume that 1k records fit easily into memory.

Running the 10k filters in memory (LINQ) will then be much cheaper.
Pushing every filter to SQL would mean working through on the order of 10,000 × 1,000 = 10 million rows in total, plus a round trip per filter: lots of I/O.

+2

EDIT

It all depends on the amount of data you have. If the data is large, use SQL; if it is small, LINQ is fine. It also depends on how often the data is pulled from the SQL server: if that happens very often, it is better to load it into memory once and filter it with LINQ; otherwise, SQL is better.

First answer

It is better to do the filtering on the SQL side rather than loading everything into memory and applying a LINQ filter.

One reason SQL is preferable to LINQ here:

if you go with LINQ, all 10,000 records are loaded into memory, and memory usage grows further as new users come in;

if you go with SQL, the number of records returned is reduced, so less memory is used and network traffic is lower as well.

+2

It depends on how big your table is and what types of data it stores.

Personally, I would go with returning all the data if you plan to apply all of your filters during the same request.

If it is an on-demand filter triggered via AJAX, you can reload the data from the database each time (which also ensures your data is up to date).

0

I would say it is much better to let SQL execute the complex filter and the rest of the processing. Why, you may ask?

The main reason is that SQL Server holds index information and uses those indexes for fast access to the data. If you pull everything out and filter with LINQ, you no longer have that index information, so you lose time on every lookup. You also lose time compiling the LINQ query each time.

You can run a simple test to see the difference for yourself. Which test? Create a simple table with a hundred random string rows and put an index on that string column. Then search the string column once using LINQ over the in-memory data, and once using SQL.
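
A rough version of that test might look like the following; the table name TestRows, its Name column, and the connection string are assumptions for illustration, and the SQL path assumes Name is indexed.

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Linq;

class IndexVsLinqTest
{
    class Row { public int Id; public string Name; }

    static void Main()
    {
        const string connStr = "...";        // connection string omitted
        const string target = "some value";

        // SQL path: the database can use the index on Name.
        var sw = Stopwatch.StartNew();
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(
            "SELECT Id, Name FROM TestRows WHERE Name = @name", conn))
        {
            cmd.Parameters.AddWithValue("@name", target);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read()) { /* consume matching rows */ }
        }
        Console.WriteLine("SQL with index: " + sw.Elapsed);

        // LINQ-to-Objects path: load everything once, then scan it in memory,
        // where no index information is available.
        var allRows = LoadAllRows(connStr);
        sw.Restart();
        var hits = allRows.Where(r => r.Name == target).ToList();
        Console.WriteLine("LINQ in memory: " + sw.Elapsed + " (" + hits.Count + " hits)");
    }

    static List<Row> LoadAllRows(string connStr)
    {
        var rows = new List<Row>();
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand("SELECT Id, Name FROM TestRows", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    rows.Add(new Row { Id = reader.GetInt32(0), Name = reader.GetString(1) });
        }
        return rows;
    }
}
```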

Update

My first thought was that SQL keeps the index and so gives very fast access when the search runs inside SQL itself.

Then I remembered that LINQ (for example LINQ to SQL) can also translate the filter into SQL, fetch only the matching data, and then let you perform your action on it.

Now I think the real answer depends on what you actually do with the data. Running the filter in SQL is faster, but how much you gain depends on how you set your LINQ up.

If you load everything into memory and then use LINQ to Objects, you lose the speed of the SQL index, you use a lot of memory, and you spend a lot of work moving the data from SQL into memory.

If you fetch the data with LINQ and no further search is needed, you still pay for moving all of that data across and for the memory it occupies.

0

This is likely to spark some debate about the role of the database! I had this exact problem a while back: some relatively complicated filtering (for example, "is in country X, where the price is Y and it has keyword Z"), and it was terribly slow. On top of that, I was not allowed to change the database structure, as it was a third-party database.

I changed the logic so that the database just returned the raw results (which I cached every hour) and did the filtering in memory. When I did this, I saw a clear increase in performance.

0

It depends on the amount of data you are filtering.

You say the filter runs 10K times and can be different each time. In that case, if there is only a little data in the database, you can load it into a server-side variable.

If you have hundreds of thousands of records in the database, you should not do that; instead, create indexes in the database and stored procedures to fetch the data quickly.

You can implement a cache facade in between, which stores the data on the server side on the first request and refreshes it according to your requirements. (You could have the cache populate the variable only when the data stays under a limit on the number of records.) A sketch of this idea is shown below.
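
A minimal sketch of such a cache facade, assuming hypothetical delegates for the data access and an arbitrary row limit and refresh interval:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Serves filters from a server-side cache while the table stays small,
// and pushes the filter to the database once it grows past the limit.
public class CachedTableFacade<T>
{
    private readonly Func<IReadOnlyList<T>> _loadAll;              // e.g. "load the whole table"
    private readonly Func<Func<T, bool>, IEnumerable<T>> _queryDb; // e.g. "run this filter in the database"
    private readonly int _maxCachedRows;
    private readonly TimeSpan _refreshInterval;
    private IReadOnlyList<T> _cache;
    private DateTime _loadedAt;
    private bool _tooLargeToCache;

    public CachedTableFacade(Func<IReadOnlyList<T>> loadAll,
                             Func<Func<T, bool>, IEnumerable<T>> queryDb,
                             int maxCachedRows, TimeSpan refreshInterval)
    {
        _loadAll = loadAll;
        _queryDb = queryDb;
        _maxCachedRows = maxCachedRows;
        _refreshInterval = refreshInterval;
    }

    public IEnumerable<T> Filter(Func<T, bool> predicate)
    {
        // Tables over the limit are never cached: the filter always goes to the database.
        if (_tooLargeToCache)
            return _queryDb(predicate);

        if (_cache == null || DateTime.UtcNow - _loadedAt > _refreshInterval)
        {
            var all = _loadAll();
            if (all.Count > _maxCachedRows)
            {
                _tooLargeToCache = true;
                return _queryDb(predicate);
            }
            _cache = all;
            _loadedAt = DateTime.UtcNow;
        }
        return _cache.Where(predicate);
    }
}
```

Here the in-memory path is only used while the row count stays under the configured limit, which matches the idea of populating the variable only when the data is small enough.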

You can measure the time it takes to get data from the database by running some test queries and observing. At the same time, measure the response time when the data is held in memory, compare the two, and make your decision accordingly.

There may be many other tricks, but the bottom line is:

You must observe and decide.

-1

Source: https://habr.com/ru/post/1403088/

