What is the best way to manage a large number of tables in MS SQL Server?

This question is related to another:
Will multiple filegroups speed up my database?

The software we are developing is an analytical tool that uses MS SQL Server 2005 to store relational data. The initial analysis can be slow (since we process millions or billions of rows of data), but there are performance requirements for quickly recalling previous analyses, so we "save" the results of each analysis.

Our current approach is to save each analysis's results into a series of "run-specific" tables, and the analysis is complex enough that we may end up with as many as 100 tables per analysis. Typically these tables use a couple hundred MB per analysis (which is small compared to our hundreds of GB, or sometimes multiple TB, of source data); overall, disk space is not a problem for us. Each set of tables is specific to one analysis, and in many cases this gives us enormous performance improvements over going back to the source data.

The approach starts to break down once we accumulate enough saved analysis results - before we added more robust archiving/cleanup capabilities, our test database climbed to several million tables. But even in production we do not expect to need more than 100,000 tables. Microsoft places a pretty enormous theoretical limit on sysobjects (~2 billion), but once our database grows beyond 100,000 or so, simple statements like CREATE TABLE and DROP TABLE can slow down dramatically.
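
As an aside, a simple way to check how large the catalog has become is just to count the user tables:

```sql
-- Count the user tables currently registered in this database's system catalog
-- (sys.tables is the SQL Server 2005 catalog view).
SELECT COUNT(*) AS UserTableCount
FROM sys.tables;
```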

There is room to debate our approach, but I think that may be hard to do without more context, so instead I want to ask a more general question: if we are forced to create this many tables, what is the best way to manage them? Multiple filegroups? Multiple schemas/owners? Multiple databases?

One more note: I'm not enthusiastic about the idea of "just throwing hardware at the problem" (i.e., adding RAM, CPU power, or disk speed). But we won't rule it out, especially if (for example) someone can tell us what effect adding RAM or using multiple filegroups would have on managing a large system catalog.

+4
4 answers

We ended up splitting our database into multiple databases. The main database contains a "databases" table that refers to one or more "run" databases, each of which holds a distinct set of analysis results. The main run table then records the database ID, and the code that retrieves a saved result adds the corresponding database prefix to all of its queries.
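
A minimal sketch of that layout (table and column names here are illustrative, not our actual schema):

```sql
-- Master database: one row per "run" database, plus a run table pointing at it.
CREATE TABLE dbo.RunDatabases (
    DatabaseId   int IDENTITY(1,1) PRIMARY KEY,
    DatabaseName sysname NOT NULL              -- e.g. 'AnalysisRuns_07'
);

CREATE TABLE dbo.Runs (
    RunId      int IDENTITY(1,1) PRIMARY KEY,
    DatabaseId int NOT NULL REFERENCES dbo.RunDatabases (DatabaseId),
    CreatedAt  datetime NOT NULL DEFAULT GETDATE()
);

-- Retrieval code looks up the run's database and prefixes it onto every query.
DECLARE @RunId int, @db sysname, @sql nvarchar(max);
SET @RunId = 42;

SELECT @db = d.DatabaseName
FROM dbo.Runs r
JOIN dbo.RunDatabases d ON d.DatabaseId = r.DatabaseId
WHERE r.RunId = @RunId;

SET @sql = N'SELECT * FROM ' + QUOTENAME(@db)
         + N'.dbo.Results WHERE RunId = @RunId';
EXEC sp_executesql @sql, N'@RunId int', @RunId = @RunId;
```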

This approach keeps each database's system catalog at a more reasonable size, it provides better separation between the master/permanent tables and the dynamic/run tables, and it makes backups and archiving more manageable. It also allows us to spread our data across multiple physical disks, although using multiple filegroups would do that too. Overall, it is working well for us given our current requirements, and based on the expected growth we think it will scale well for us too.

We have also noticed that SQL 2008 tends to handle large system catalogs better than SQL 2000 and SQL 2005 did. (We had not yet upgraded to 2008 when I posted this question.)

0

Without seeing the entire system first, my first recommendation would be to save the historical runs in combined tables with a RunID as part of the key; a dimensional model may also be relevant here. This table can be partitioned for better performance, which also lets you spread the table across other filegroups.
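
For example, a combined results table partitioned on RunID might look roughly like this (the boundary values, column names, and filegroup names are placeholders; the filegroups would have to exist and the ranges would need tuning for your data):

```sql
-- Partition runs into ranges of RunId, spread across several filegroups.
CREATE PARTITION FUNCTION pfRunId (int)
    AS RANGE LEFT FOR VALUES (1000, 2000, 3000);

CREATE PARTITION SCHEME psRunId
    AS PARTITION pfRunId TO (FG1, FG2, FG3, FG4);

-- One combined table keyed by (RunId, RowId) instead of one table set per run.
CREATE TABLE dbo.AnalysisResults (
    RunId    int    NOT NULL,
    RowId    bigint NOT NULL,
    Measure1 float      NULL,
    Measure2 float      NULL,
    CONSTRAINT PK_AnalysisResults PRIMARY KEY (RunId, RowId)
) ON psRunId (RunId);
```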

Another option is to put each run in its own database and then detach them, only attaching them again as needed (and in read-only form).
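
Something along these lines (the database name and file paths are hypothetical):

```sql
-- Detach a completed run's database so it no longer burdens the server.
EXEC sp_detach_db @dbname = N'AnalysisRun_42';

-- Later, attach it only when the results are needed, and mark it read-only.
CREATE DATABASE AnalysisRun_42
    ON (FILENAME = N'D:\Runs\AnalysisRun_42.mdf'),
       (FILENAME = N'D:\Runs\AnalysisRun_42_log.ldf')
    FOR ATTACH;

ALTER DATABASE AnalysisRun_42 SET READ_ONLY;
```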

CREATE TABLE and DROP TABLE are probably performing poorly because the master or model databases are not optimized for this kind of behavior.

I also recommend talking to Microsoft about your choice of database design.

+2

Are the tables all different structures? If they have the same structure, you might get away with a single partitioned table.

If they are different structures, but just subsets of the same set of dimension columns, you could still store them in partitions in the same table, with NULLs in the non-applicable columns.
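
A minimal sketch of that superset-of-columns idea, with made-up column names:

```sql
-- One table holds every run; columns a given run does not use simply stay NULL.
CREATE TABLE dbo.RunResultsSuperset (
    RunId      int    NOT NULL,
    RowId      bigint NOT NULL,
    CustomerId int        NULL,   -- used by some analyses
    RegionId   int        NULL,   -- used by others
    ProductId  int        NULL,   -- left NULL where not applicable
    Amount     float      NULL,
    CONSTRAINT PK_RunResultsSuperset PRIMARY KEY (RunId, RowId)
);
```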

If they are analytical (derived calculations, perhaps?), you could dump the results of a calculation run to flat files and reuse your calculations by loading them back from those flat files.
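
If that route fits, the export/reload cycle might look roughly like this (the server name, file paths, and table names are assumptions):

```sql
-- Export one run's results to a native-format flat file (run from the command line):
--   bcp "SELECT * FROM MyDb.dbo.AnalysisResults WHERE RunId = 42" queryout D:\Runs\run42.dat -n -T -S MYSERVER

-- Reload the saved results later when they are needed again:
BULK INSERT dbo.AnalysisResults
FROM N'D:\Runs\run42.dat'
WITH (DATAFILETYPE = 'native');
```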

+1

This sounds like a very interesting problem/application you are working on. I would love to work on something similar. :)

You have a very large problem surface, and that makes it hard to know where to start. There are several solution parameters that are not evident in your post. For example, how long do you plan to keep the run analysis tables? There are many more questions that would need to be asked.

You are going to need a combination of serious data warehousing and data/table partitioning. Depending on how much data you want to keep and archive, you may need to start denormalizing and flattening the tables.

This would be a very good case where engaging Microsoft directly could be mutually beneficial: Microsoft gets a good case study to show other customers, and you get help straight from the vendor.

0
