Should I use complex SQL queries or process results in an application?

I am dealing with an application full of huge SQL queries. They are so complex that by the time I finish understanding one, I have already forgotten how it began.

I was wondering whether it would be good practice to pull more data from the database and do the final processing in my code, say in Python. Am I nuts? Would it be bad for performance?

Note that the result sets are huge too; I'm talking about a manufacturing ERP developed by other people.

+6
5 answers

I would keep the business logic in the application as much as possible. Complex business logic in queries is difficult to maintain (by the time I finish understanding one, I have already forgotten how it started). Complex logic in stored procedures can be fine, but with a typical Python application you would want your business logic in Python.

That said, the database handles data better than application code does. So if your logic involves huge amounts of data, you may get better performance by keeping that logic in the database. But that applies to complex reports, accounting operations and the like, which work over large volumes of data. For those you can use stored procedures, or systems specialized for such operations (a data warehouse for reporting, for example).

Normal OLTP operations do not touch a lot of data. The database itself can be huge, but the data needed for a typical transaction is (usually) a very small part of it. Querying a large database can cause performance problems, but you can optimize in several ways (indexes, full-text search, redundancy/denormalization, summary tables ... depending on your actual problem).

There are exceptions to every rule, but as a general guideline: keep your business logic in the application code, use stored procedures for complex data-heavy logic, and use a separate data warehouse or set of procedures for reporting.
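As an illustration of that split, here is a minimal Python sketch (the table and column names are made up, and sqlite3 just stands in for whatever database you use): the query stays narrow and does the filtering, while the business rule lives in application code.

    # Minimal sketch: filtering stays in SQL, the business rule stays in Python.
    # Table/column names are invented for illustration only.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "acme", 1200.0, "open"), (2, "acme", 80.0, "open"), (3, "beta", 500.0, "closed")],
    )

    def discount_for(total):
        """Business rule kept in application code: 5% off orders over 1000."""
        return total * 0.05 if total > 1000 else 0.0

    # Only the rows the transaction actually needs are pulled from the database.
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE customer = ? AND status = 'open'", ("acme",)
    ).fetchall()

    for order_id, total in rows:
        print(order_id, total, discount_for(total))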

+3

Let the database work out how best to retrieve the information you need; otherwise you will end up duplicating RDBMS functionality (joins, filtering, aggregation) in your own code, and that will be more complicated than the SQL queries you have now.

On top of that, you will waste time transferring all the unnecessary rows from the database to your application just so you can filter and process them in code.

All of this matters even more because you say the result sets are huge.
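To make the cost concrete, here is a small hypothetical comparison (Python with sqlite3, made-up schema): letting the database filter and aggregate ships a single row back, while doing the same work in code ships every row first.

    # Hypothetical comparison of the two approaches; the schema is invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("north", 10.0), ("south", 20.0), ("north", 30.0)])

    # Let the database filter and aggregate: one row comes back over the wire.
    total_db, = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", ("north",)
    ).fetchone()

    # The alternative: transfer every row and redo the RDBMS's work in code.
    total_app = sum(
        amount
        for region, amount in conn.execute("SELECT region, amount FROM sales")
        if region == "north"
    )

    assert total_db == total_app  # same answer, very different amount of data moved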

+5

My experience is that you should let the database do the processing. It will be much faster than pulling all the data out of the DB first and then filtering and post-processing the results in code.

The tough part is documenting your queries so that someone else (or you yourself) can understand what is going on when reading them later. Most databases allow comments in SQL, either between /* comment */ delimiters or as line comments introduced with --.

A documented query might look like this:

    select name, dateofbirth
    from (
        -- The next subquery will retrieve ....
        select ....
    ) SR /* SR SubResults */
    ....
+1

@Nivas is generally correct.

There are, though, a couple of fairly common patterns that lead to queries like these:

  • Division of labour - DBAs are expected to return all the data the business needs, but the database is the only tool they have to work with. Developers could work with the DBAs to do this better, but departmental boundaries often make that practically impossible, so the heavy data work ends up being written in SQL.

  • Lack of smaller steps - could the massive query be broken into smaller steps using work tables? Yes, but I have known environments where creating a new table was enough of an obstacle that the heavyweight query simply got written instead.

So, in general, retrieving the data is the database's job. But if a SQL query gets too long it becomes hard for the RDBMS to optimize, and it probably means the query is mixing data retrieval, business logic, and even presentation in a single statement.

I would suggest that a more robust approach usually splits the "get my data" parts into stored procedures or other controlled queries that populate intermediate tables. The business logic can then be written in a scripting language that sits above and orchestrates the stored procedures, and presentation is handled elsewhere. Essentially, tools like Cognos try to do this anyway.
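A rough sketch of what that layering can look like, assuming a SQL Server connection through pyodbc; the procedure and table names (etl.load_open_orders, work.open_orders) are placeholders for illustration, not an existing schema.

    # Sketch of the layering: a stored procedure stages the data, Python applies
    # the business logic. All object names are hypothetical.
    import pyodbc

    def monthly_backlog(conn_str: str, plant: str) -> float:
        conn = pyodbc.connect(conn_str)
        try:
            cur = conn.cursor()
            # 1. "Get my data": a controlled stored procedure fills an intermediate table.
            cur.execute("EXEC etl.load_open_orders @plant = ?", plant)
            conn.commit()
            # 2. Business logic lives in the scripting layer, reading the staged rows.
            cur.execute("SELECT order_id, qty, unit_price FROM work.open_orders")
            backlog = 0.0
            for order_id, qty, unit_price in cur.fetchall():
                backlog += qty * unit_price  # apply valuation/business rules here
            return backlog
        finally:
            conn.close()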

But since you are looking at a manufacturing ERP, those constraints (and probably solutions along the lines above) already exist - are you talking to the right people?

+1

One of my applications (A) uses tempdb to split complex queries into batches. Another application (B) uses complex queries without this.

Application A is more efficient for large databases.

To take a look at your own situation, you can run a DMV script like this (SQL Server):

    -- Get top total worker time queries for entire instance (Query 38) (Top Worker Time Queries)
    SELECT TOP(50) DB_NAME(t.[dbid]) AS [Database Name],
           t.[text] AS [Query Text],
           qs.total_worker_time AS [Total Worker Time],
           qs.min_worker_time AS [Min Worker Time],
           qs.total_worker_time/qs.execution_count AS [Avg Worker Time],
           qs.max_worker_time AS [Max Worker Time],
           qs.execution_count AS [Execution Count],
           qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time],
           qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],
           qs.total_physical_reads/qs.execution_count AS [Avg Physical Reads],
           qp.query_plan AS [Query Plan],
           qs.creation_time AS [Creation Time]
    FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
    CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
    ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);

Then you can open the query plan for the heaviest queries and look for something like:

StatementOptmEarlyAbortReason="TimeOut" or StatementOptmEarlyAbortReason="MemoryLimitExceeded"

These flags tell you that the complex query should be broken up into batches using tempdb.

PS. This applies to queries that are already reasonably tuned, i.e. without index/table scans, missing indexes, and so on.
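For instance, a hypothetical sketch (Python with pyodbc against SQL Server) of splitting one heavyweight query into a temp-table batch as described above; all object names (#recent_orders, dbo.orders, dbo.order_lines) are made up for illustration.

    # Sketch: materialise a small driving set in tempdb, then join against it,
    # instead of running one giant query the optimizer may time out on.
    import pyodbc

    def load_report(conn_str: str):
        conn = pyodbc.connect(conn_str, autocommit=True)
        cur = conn.cursor()
        # Step 1: stage the narrow driving set in a temp table.
        cur.execute("""
            SELECT order_id
            INTO #recent_orders
            FROM dbo.orders
            WHERE order_date >= DATEADD(day, -30, GETDATE());
        """)
        # Step 2: join the big tables only against that small temp table.
        cur.execute("""
            SELECT l.order_id, SUM(l.qty * l.unit_price) AS order_value
            FROM dbo.order_lines AS l
            JOIN #recent_orders AS r ON r.order_id = l.order_id
            GROUP BY l.order_id;
        """)
        return cur.fetchall()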

0

Source: https://habr.com/ru/post/896460/

