How many joins are feasible in practice?

This question may be more appropriate for Programmers.StackExchange. If so, please migrate it.

I am currently reflecting on the complexity of typical data models. Everyone knows that data models should be normalized; on the other hand, a normalized data model requires quite a few joins to reassemble the data afterwards. And joins are potentially expensive operations, depending on the size of the tables involved. So the question I am trying to figure out is: how is this trade-off usually struck? That is, in practice, how many joins do you find acceptable in typical queries when designing a data model? This is especially interesting when a single query contains several joins.

As an example, suppose we have users who own houses, which contain rooms, which contain drawers, which contain items. It is trivial to normalize this with tables for users, houses, rooms, drawers, and items in the sense described above, but it would later require a join over five tables to fetch all items belonging to a particular user. That strikes me as a lot of complexity.
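For concreteness, here is roughly what such a query would look like. This is a minimal sketch; the table and column names (id, user_id, house_id, room_id, drawer_id) are assumptions, not taken from any real schema:

    -- Fetch all items belonging to one user across the five normalized tables.
    -- All table and column names here are hypothetical.
    SELECT i.*
    FROM users u
    JOIN houses  h ON h.user_id   = u.id
    JOIN rooms   r ON r.house_id  = h.id
    JOIN drawers d ON d.room_id   = r.id
    JOIN items   i ON i.drawer_id = d.id
    WHERE u.id = 42;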

Most likely the size of the tables matters too. Joining five tables that hold little data is presumably not as bad as joining three tables with millions of rows each. Or is that wrong?

+6
4 answers

There are reasons to keep a database normalized, and I have seen queries joining more than 20 tables and subqueries that worked perfectly well for a long time. I find normalization a huge win, because it lets me introduce new features into an existing, working application without touching the parts that already work.

Databases offer various features that make your life easier:

  • You can create views for your most frequently used queries (although this is not the only use case for views; views and CTEs are sketched after this list);
  • some RDBMS provide Common Table Expressions (CTEs), which let you use named subqueries as well as write recursive queries;
  • some RDBMS provide procedural extension languages (e.g. PL/SQL or PL/pgSQL) that let you develop your own functions to hide the complexity of your schema, so that your data is managed through API calls only.
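To illustrate the first two points, here are minimal sketches reusing the hypothetical users/houses/rooms/drawers/items schema from the question; all names are assumptions:

    -- A view that hides the five-way join behind a stable name:
    CREATE VIEW user_items AS
    SELECT u.id AS user_id, i.*
    FROM users u
    JOIN houses  h ON h.user_id   = u.id
    JOIN rooms   r ON r.house_id  = h.id
    JOIN drawers d ON d.room_id   = r.id
    JOIN items   i ON i.drawer_id = d.id;

    -- Application code no longer needs to know the underlying schema:
    SELECT * FROM user_items WHERE user_id = 42;

    -- A CTE that names an intermediate result within a single query:
    WITH user_drawers AS (
        SELECT d.id AS drawer_id, h.user_id
        FROM houses h
        JOIN rooms   r ON r.house_id = h.id
        JOIN drawers d ON d.room_id  = r.id
    )
    SELECT i.*
    FROM items i
    JOIN user_drawers ud ON ud.drawer_id = i.drawer_id
    WHERE ud.user_id = 42;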

There was also a related question about how an SQL statement containing multiple joins is executed; perhaps it is worth a look.

Developing an application on top of a normalized database is simpler because, with the right approach, you can isolate your schema behind views and functions and make the application code immune to schema changes. If you go for a denormalized design, it may happen that design changes affect a large amount of your code, since denormalized schemas tend to be optimized for speed at the expense of changeability.

+5

Database normalization is an art form in itself.
If you structure your joins correctly, you will only grab the columns you actually need.
It should be much faster to run a query over millions of records spread across multiple tables, joining just the fields you need, than over, say, one or two tables holding all the records. In the latter case you pull back all the data and sift through it yourself, which would be a coding nightmare.
MySQL is very good at fetching only the data you ask for.
Just because a query is long does not mean it is slow.
I have seen queries over 20 lines long that were very fast.

Have faith in the query you write, and if you don't, write a test script and see for yourself.
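One hedged sketch of such a test, assuming MySQL (the EXPLAIN keyword also exists in most other RDBMS, with varying output); the table and column names are hypothetical:

    -- Inspect the planner's join order and index choices before trusting the query,
    -- selecting only the columns you need rather than SELECT *.
    EXPLAIN
    SELECT u.name, i.label
    FROM users u
    JOIN houses  h ON h.user_id   = u.id
    JOIN rooms   r ON r.house_id  = h.id
    JOIN drawers d ON d.room_id   = r.id
    JOIN items   i ON i.drawer_id = d.id
    WHERE u.id = 42;
    -- On MySQL 8.0.18+ (or PostgreSQL), prefix the same query with
    -- EXPLAIN ANALYZE to also execute it and get per-step timings.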

+5

A fully normalized data model carries a high cost in performance, but it is more resilient to change. A data model that is tuned for a single query will perform much better, but you pay the price when the requirements change.

So maybe the question is: will the way your data model is used (the queries) change a lot? If not, don't fully normalize; just tune the model for those specific queries (ask your database administrator). Otherwise, normalize, and judge by the query execution plan whether you are using too many joins; I cannot give you a specific number.

+3

For the theory behind your question, see:

http://en.wikipedia.org/wiki/Database_normalization

If performance becomes a problem, it can usually be solved with denormalization. But doing this ahead of time, before you have the expected workload at hand, should be avoided. Denormalize only when it is really necessary, and base the decision on measurements.
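As a hedged sketch of what such a measurement-driven denormalization could look like, reusing the hypothetical schema from the question: if the five-way join for "all items of a user" proves too slow, one could duplicate the owning user's id directly onto items:

    -- Denormalization: store the owning user's id redundantly on items,
    -- so the hot query needs no joins. All names are hypothetical.
    ALTER TABLE items ADD COLUMN user_id integer;

    -- Backfill from the normalized tables via a correlated subquery:
    UPDATE items
    SET user_id = (
        SELECT h.user_id
        FROM drawers d
        JOIN rooms  r ON r.id = d.room_id
        JOIN houses h ON h.id = r.house_id
        WHERE d.id = items.drawer_id
    );

    -- The frequent query becomes trivial, at the cost of keeping
    -- user_id in sync (in application code or triggers) when items move:
    SELECT * FROM items WHERE user_id = 42;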

+1

Source: https://habr.com/ru/post/919305/

