What are the benefits of a query that uses derived table(s) over a query that does not?

Question


I know how derived tables are used, but I still don't see the real benefits of using them.

For example, in the following article http://techahead.wordpress.com/2007/10/01/sql-derived-tables/, the author tries to show the benefits of a query using a derived table over a query without one. The example is a report showing the total number of orders each customer placed in 1996, where the result set must include all customers, including those who placed no orders that year and those who have never placed any orders at all (the example uses the Northwind database).

But when I compare the two queries, I don't see any advantage in the query using the derived table (if nothing else, the derived table does not simplify our code, at least not in this example).

Normal query:

    SELECT C.CustomerID, C.CompanyName, COUNT(O.OrderID) AS TotalOrders
    FROM Customers C
    LEFT OUTER JOIN Orders O
        ON C.CustomerID = O.CustomerID AND YEAR(O.OrderDate) = 1996
    GROUP BY C.CustomerID, C.CompanyName

Query using a derived table:

    SELECT C.CustomerID, C.CompanyName, COUNT(dOrders.OrderID) AS TotalOrders
    FROM Customers C
    LEFT OUTER JOIN (
        SELECT * FROM Orders WHERE YEAR(Orders.OrderDate) = 1996
    ) AS dOrders
        ON C.CustomerID = dOrders.CustomerID
    GROUP BY C.CustomerID, C.CompanyName

Perhaps this just wasn't a good example, so could you show me an example where the benefits of a derived table are more obvious?

thanks

REPLY TO GBN:

In this case, you cannot capture both products and order aggregates if there is no connection between the Customers and the Products.

Could you clarify what exactly you mean? Will the following query return the same result as your query:

    SELECT C.CustomerID, C.CompanyName,
           COUNT(O.OrderID) AS TotalOrders,
           COUNT(DISTINCT P.ProductID) AS DifferentProducts
    FROM Customers C
    LEFT OUTER JOIN Orders O
        ON C.CustomerID = O.CustomerID AND YEAR(O.OrderDate) = 1996
    LEFT OUTER JOIN Products P
        ON O.somethingID = P.somethingID
    GROUP BY C.CustomerID, C.CompanyName

REPLY TO CADE ROUX:

In addition, if expressions are used to derive columns from derived columns with a lot of intermediate calculations, a set of nested derived tables or stacked CTEs is the only way to do this:

    SELECT x, y, z1, z2
    FROM (
        SELECT *
               ,x + y AS z1
               ,x - y AS z2
        FROM (
            SELECT x, x * 2 AS y
            FROM A
        ) AS A
    ) AS A

Will the following query return the same result as the query above?

 SELECT x, x * 2 AS y, x + x*2 AS z1, x - x*2 AS z2 FROM A

+4

sql sql-server tsql

AspOnMyNet May 04 '10 at 18:42


5 answers

I usually use a derived table (or a CTE, which is sometimes a better alternative to derived queries in SQL 2005/2008) to simplify reading and building queries, or when SQL does not allow me to perform a specific operation.

For example, one of the things you cannot do without a derived table or CTE is filter on an aggregate or window function in a WHERE clause. This will not work:

    SELECT name, city, joindate
    FROM members
    INNER JOIN cities ON cities.cityid = members.cityid
    WHERE ROW_NUMBER() OVER (PARTITION BY members.cityid ORDER BY joindate) = 1

But this will work:

    SELECT name, city, joindate
    FROM (
        SELECT name, cityid, joindate,
               ROW_NUMBER() OVER (PARTITION BY cityid ORDER BY joindate) AS rownum
        FROM members
    ) derived
    INNER JOIN cities ON cities.cityid = derived.cityid
    WHERE rownum = 1
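The same filter can also be written with a CTE, as mentioned at the top of this answer. Here is a minimal T-SQL sketch, assuming SQL Server 2008 or later. The table variables @members and @cities and their rows are hypothetical stand-ins for the members and cities tables, not data from the original post:

```sql
-- Hypothetical sample data standing in for the members and cities tables.
DECLARE @cities TABLE (cityid INT PRIMARY KEY, city VARCHAR(50));
DECLARE @members TABLE (name VARCHAR(50), cityid INT, joindate DATETIME);

INSERT INTO @cities VALUES (1, 'Oslo'), (2, 'Lima');
INSERT INTO @members VALUES
    ('Ann', 1, '20090301'),
    ('Bob', 1, '20080715'),
    ('Cy',  2, '20100120');

-- The CTE computes the row number; the outer query is then free to filter on it.
WITH ranked AS (
    SELECT name, cityid, joindate,
           ROW_NUMBER() OVER (PARTITION BY cityid ORDER BY joindate) AS rownum
    FROM @members
)
SELECT r.name, c.city, r.joindate
FROM ranked AS r
INNER JOIN @cities AS c ON c.cityid = r.cityid
WHERE r.rownum = 1;   -- keeps the earliest member of each city (Bob and Cy here)
```

Whether the CTE or the derived table reads better is mostly a matter of taste; as far as I know the optimizer treats the two forms the same way.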

Additional caveats, especially for large-scale analytics

If you are working with relatively small data sets (not gigabytes), you can probably stop reading here. If you work with gigabytes or terabytes of data and use derived tables, read on ...

For very large-scale data operations, it is sometimes preferable to create a temporary table instead of using a derived query. This can happen when SQL Server's statistics suggest that your derived query will return many more rows than it actually does, which happens more often than you might think. Queries in which the main query self-joins with a non-recursive CTE are also problematic.

It is also possible that derived tables will produce unexpected query plans. For example, even if you put a strict WHERE clause in your derived table to make that query very selective, SQL Server may reorder your query plan so that your WHERE clause is evaluated later in the plan. See the Microsoft Connect response for a discussion of this issue and a workaround.

Thus, for very intensive queries (especially data warehouse queries on 100GB+ tables), I always like to prototype a temporary-table solution to find out whether it performs better than the derived table or CTE. This seems counterintuitive, since you do more I/O than an ideal single-query solution, but with temporary tables you get complete control over the query plan used and the order in which each subquery runs. This can sometimes improve performance by 10x or more.

I also prefer temporary tables in cases where I would otherwise have to use query hints to get SQL Server to do what I want. If the optimizer is already getting it wrong, temporary tables are often a clearer way to make it behave the way you want.

I am not suggesting this is the common case. Most of the time the temporary-table solution will be at least a little worse, and sometimes query hints are the only remedy. But don't assume that a CTE or derived-query solution will be your fastest option. Test, test, test!
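To make the temporary-table approach concrete, here is a minimal T-SQL sketch. All table names, columns, and rows are hypothetical, and at this size there is nothing to gain; the point is only the shape of the two steps:

```sql
-- Hypothetical sample data; in practice these would be large permanent tables.
DECLARE @customers TABLE (customerid INT PRIMARY KEY, name VARCHAR(50));
DECLARE @orders TABLE (orderid INT PRIMARY KEY, customerid INT, orderdate DATETIME);

INSERT INTO @customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO @orders VALUES (10, 1, '19960305'), (11, 1, '19960420'), (12, 2, '19970101');

-- Step 1: materialize what would otherwise be the derived table.
-- This statement is optimized and executed on its own.
SELECT customerid, COUNT(*) AS orders_1996
INTO #orders_1996
FROM @orders
WHERE orderdate >= '19960101' AND orderdate < '19970101'
GROUP BY customerid;

-- Step 2: join against the materialized rows, whose actual row count is
-- now known instead of estimated.
SELECT c.customerid, c.name, COALESCE(t.orders_1996, 0) AS orders_1996
FROM @customers AS c
LEFT JOIN #orders_1996 AS t ON t.customerid = c.customerid;

DROP TABLE #orders_1996;
```

Compare the timings and plans of this version against the single-query version on realistic data volumes before committing to either.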

+5

Justin Grant May 04 '10 at 19:19


Derived tables often replace correlated subqueries and are usually significantly faster.

They can also be used to significantly limit the number of records searched in a large table, and therefore can also improve query speed.

As with all potential performance techniques, you need to test whether they actually improve performance. A derived table will almost always greatly outperform a correlated subquery, but there is a chance that it will not.
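As an illustration of the replacement described above, here is a minimal T-SQL sketch with hypothetical tables and rows (none of this comes from the original post). Both queries return the same rows; which one is faster on your data is something to measure, not assume:

```sql
-- Hypothetical sample data.
DECLARE @customers TABLE (customerid INT PRIMARY KEY, name VARCHAR(50));
DECLARE @orders TABLE (orderid INT PRIMARY KEY, customerid INT, amount MONEY);

INSERT INTO @customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO @orders VALUES (10, 1, 25), (11, 1, 40), (12, 2, 15);

-- Correlated subquery: logically evaluated once per customer row.
SELECT c.customerid, c.name,
       (SELECT SUM(o.amount)
        FROM @orders AS o
        WHERE o.customerid = c.customerid) AS total
FROM @customers AS c;

-- Derived table: aggregate all orders once, then join the result.
SELECT c.customerid, c.name, t.total
FROM @customers AS c
LEFT JOIN (
    SELECT customerid, SUM(amount) AS total
    FROM @orders
    GROUP BY customerid
) AS t ON t.customerid = c.customerid;
```

In both forms a customer with no orders (Cy here) gets a NULL total.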

In addition, there are times when you need to join to data containing an aggregate calculation, which is almost impossible to do without a derived table or a CTE (which in many cases is essentially another way of writing a derived table).

Derived tables are one of my most useful ways of working out complex reporting data. You can build it in pieces using table variables or temporary tables, but if you don't want the final code to stay in procedural steps, people often convert the pieces to derived tables once they have worked out what they want using temporary tables.

Aggregating data from a join is another place where you need derived tables.

+3

Hlgem May 04, '10 at 19:10


Using your terminology and example, derived tables are just more complex, with no advantages. However, some things require a derived table. These may be the more complex cases, such as CTEs (as shown above). But simple joins can demonstrate the need for derived tables too: all you have to do is craft a query that requires an aggregate. Here we use a variant of the quota query to demonstrate this.

Select each customer's most expensive transactions.

    SELECT transactions.*
    FROM transactions
    JOIN (
        SELECT user_id, MAX(spent) AS spent
        FROM transactions
        GROUP BY user_id
    ) AS derived_table
        ON derived_table.user_id = transactions.user_id
       AND derived_table.spent = transactions.spent

+1

Evan Carroll May 04 '10 at 19:08


In this case, the derived table allows YEAR(O.OrderDate) = 1996 to go in a WHERE clause.

In the outer WHERE clause the filter is useless, because it would turn the LEFT JOIN into an INNER JOIN.

Personally, I prefer the derived table (or CTE) construct because it puts the filter in the right place.
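The effect of filter placement can be seen in a minimal T-SQL sketch. The table variables and rows below are hypothetical, loosely modelled on the Customers and Orders tables from the question:

```sql
-- Hypothetical sample data: Ann ordered in 1996, Bob in 1997, Cy never.
DECLARE @customers TABLE (customerid INT PRIMARY KEY, name VARCHAR(50));
DECLARE @orders TABLE (orderid INT PRIMARY KEY, customerid INT, orderdate DATETIME);

INSERT INTO @customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO @orders VALUES (10, 1, '19960305'), (11, 2, '19970101');

-- Filter in the outer WHERE: rows whose orderdate is NULL or outside 1996
-- are discarded after the join, so Bob and Cy disappear (INNER JOIN behaviour).
SELECT c.name, COUNT(o.orderid) AS total_orders
FROM @customers AS c
LEFT OUTER JOIN @orders AS o ON o.customerid = c.customerid
WHERE YEAR(o.orderdate) = 1996
GROUP BY c.name;

-- Filter inside the derived table: orders are filtered before the join,
-- so every customer is kept and Bob and Cy show a count of 0.
SELECT c.name, COUNT(d.orderid) AS total_orders
FROM @customers AS c
LEFT OUTER JOIN (
    SELECT orderid, customerid
    FROM @orders
    WHERE YEAR(orderdate) = 1996
) AS d ON d.customerid = c.customerid
GROUP BY c.name;
```

Putting the condition in the ON clause, as in the question's first query, gives the same result as the derived table; the derived table just makes the intent explicit.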

Another example:

    SELECT C.CustomerID, C.CompanyName,
           COUNT(D.OrderID) AS TotalOrders,
           COUNT(DISTINCT D.ProductID) AS DifferentProducts
    FROM Customers C
    LEFT OUTER JOIN (
        SELECT O.CustomerID, O.OrderID, P.ProductID
        FROM Orders O
        JOIN Products P ON O.somethingID = P.somethingID
        WHERE YEAR(O.OrderDate) = 1996
    ) D ON C.CustomerID = D.CustomerID
    GROUP BY C.CustomerID, C.CompanyName

In this case, you cannot get both the product and order aggregates if there is no relation between Customers and Products. Of course, this is contrived, but I hope I have got the concept across.

Edit:

Sometimes I need to explicitly JOIN T1 and T2 before the JOIN onto MyTable. The derived T1/T2 join can give a different result from two LEFT JOINs written without a derived table. This happens quite often.

    SELECT
        --stuff--
    FROM myTable M1
    LEFT OUTER JOIN (
        SELECT T1.ColA, T2.ColB
        FROM T1
        JOIN T2 ON T1.somethingID = T2.somethingID
        WHERE
            --filter--
    ) D ON M1.ColA = D.ColA AND M1.ColB = D.ColB
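Here is a minimal T-SQL sketch of how the two forms can differ. The table variables and rows are hypothetical, and the second query is only one plausible way of writing the join without a derived table:

```sql
-- Hypothetical sample data: the T1 row matches M on ColA but has no partner in T2.
DECLARE @M  TABLE (ColA INT, ColB INT);
DECLARE @T1 TABLE (somethingID INT, ColA INT);
DECLARE @T2 TABLE (somethingID INT, ColB INT);

INSERT INTO @M  VALUES (1, 1);
INSERT INTO @T1 VALUES (100, 1);
-- @T2 is deliberately left empty.

-- Derived table: T1 and T2 are inner-joined first, so the unmatched T1 row
-- never reaches the outer join. Both D columns come back NULL.
SELECT M1.ColA, M1.ColB, D.ColA AS d_ColA, D.ColB AS d_ColB
FROM @M AS M1
LEFT OUTER JOIN (
    SELECT T1.ColA, T2.ColB
    FROM @T1 AS T1
    JOIN @T2 AS T2 ON T1.somethingID = T2.somethingID
) AS D ON M1.ColA = D.ColA AND M1.ColB = D.ColB;

-- Two LEFT JOINs: the T1 row survives without a T2 partner,
-- so t1_ColA is 1 while t2_ColB is NULL.
SELECT M1.ColA, M1.ColB, T1.ColA AS t1_ColA, T2.ColB AS t2_ColB
FROM @M AS M1
LEFT OUTER JOIN @T1 AS T1 ON M1.ColA = T1.ColA
LEFT OUTER JOIN @T2 AS T2 ON T1.somethingID = T2.somethingID
                         AND M1.ColB = T2.ColB;
```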

+1

gbn May 04, '10 at 19:10


Cade Roux · Accepted Answer · 2010-05-05T04:17:39+0000

In your examples, a derived table is not strictly necessary. There are many cases where you may need to join to an aggregate or similar, and a derived table is really the only way to handle that:

    SELECT *
    FROM A
    LEFT JOIN (
        SELECT x, SUM(y) AS sum_y
        FROM B
        GROUP BY x
    ) AS B ON B.x = A.x

In addition, if expressions are used to derive columns from derived columns with a lot of intermediate calculations, a set of nested derived tables or stacked CTEs is the only way to do this:

    SELECT x, y, z1, z2
    FROM (
        SELECT *
               ,x + y AS z1
               ,x - y AS z2
        FROM (
            SELECT x, x * 2 AS y
            FROM A
        ) AS A
    ) AS A

Regarding maintainability, using stacked CTEs or derived tables (they are basically equivalent) can make the code more readable and maintainable, and it also makes reuse and refactoring easier. The optimizer can usually flatten them out easily.

I usually use stacked CTEs instead of nesting, for slightly better readability (the same two examples):

    WITH B_sums AS (
        SELECT x, SUM(y) AS sum_y
        FROM B
        GROUP BY x
    )
    SELECT *
    FROM A
    LEFT JOIN B_sums ON B_sums.x = A.x

    WITH A1 AS (
        SELECT x, x * 2 AS y
        FROM A
    )
    ,A2 AS (
        SELECT *
               ,x + y AS z1
               ,x - y AS z2
        FROM A1
    )
    SELECT x, y, z1, z2
    FROM A2

Regarding your question:

 SELECT x, x * 2 AS y, x + x*2 AS z1, x - x*2 AS z2 FROM A

In this code, x * 2 is repeated 3 times. If this business rule needs to change, it has to change in 3 places, which is a recipe for introducing defects. The problem compounds any time you have intermediate calculations that need to be consistent and defined in only one place.

This would be less of a problem if SQL Server's scalar user-defined functions could be inlined (or if they performed acceptably): you could simply build UDFs to stack your results, and the optimizer would eliminate the redundant calls. Unfortunately, SQL Server's scalar UDF implementation cannot handle this well for large sets of rows.
