Which is better: many similar databases, one database with many similar tables, or one database with a single set of tables?

I need to work with multiple data samples. For example, N samples provide similar data from different sources, such as the order history of different stores. The structure of all samples is therefore the same. I see several options for storing the data:

  • Use N databases with identical schema, one for each sample

  • Use one database, but N sets of tables. For example, User_1, ..., User_N; Product_1, ..., Product_N; Order_1, ..., Order_N; etc.

  • Use one database with one set of User, Product, Order tables, but add an auxiliary column to each table that holds the sample index. Obviously, this column must be indexed.
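The third option can be sketched roughly as follows. This is only an illustration using Python's built-in `sqlite3` module rather than any particular production database; the table name `orders`, the column `sample_id`, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        sample_id INTEGER NOT NULL,  -- hypothetical column: which store/source the row came from
        order_id  INTEGER NOT NULL,
        user_id   INTEGER NOT NULL,
        total     REAL    NOT NULL,
        PRIMARY KEY (sample_id, order_id)  -- order ids only need to be unique within a sample
    );
    -- Secondary index so per-sample lookups by user do not scan the whole table.
    CREATE INDEX idx_orders_sample_user ON orders (sample_id, user_id);
""")

rows = [
    (1, 100, 7, 25.0),
    (1, 101, 8, 40.0),
    (2, 100, 7, 15.5),  # same order_id as above, but from a different sample
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)


def orders_for_sample(conn, sample_id):
    """Return (order_id, total) pairs for one sample; the sample is an ordinary bound value."""
    cur = conn.execute(
        "SELECT order_id, total FROM orders WHERE sample_id = ? ORDER BY order_id",
        (sample_id,),
    )
    return cur.fetchall()


sample_1 = orders_for_sample(conn, 1)
sample_2 = orders_for_sample(conn, 2)
print(sample_1)
print(sample_2)
```

Because the sample is just a value in a `WHERE` clause, the same query text serves every sample.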

The last option seems the most convenient, because all queries stay simple. In the second case, I would need to pass the table name to the query (stored procedure) as a parameter (is this possible?).
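On the question of passing a table name as a parameter: in most SQL engines, bound parameters can stand in for values but not for identifiers, so the table name has to be spliced into the query text (in a SQL Server stored procedure this would typically mean dynamic SQL). A minimal sketch with Python's `sqlite3`, assuming hypothetical tables `Order_1` and `Order_2` and a hand-written whitelist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Order_1 (order_id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE Order_2 (order_id INTEGER PRIMARY KEY, total REAL);
    INSERT INTO Order_1 VALUES (1, 10.0), (2, 20.0);
    INSERT INTO Order_2 VALUES (1, 5.0);
""")

# Hypothetical whitelist: only these names may ever be spliced into SQL text.
ALLOWED_TABLES = {"Order_1", "Order_2"}


def total_for_table(conn, table_name):
    """Sum the totals of one per-sample table, after validating its name."""
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table_name!r}")
    # The identifier is interpolated into the statement; it cannot be bound with '?'.
    return conn.execute(f'SELECT SUM(total) FROM "{table_name}"').fetchone()[0]


# Trying to bind the table name like a value is rejected by the SQL parser.
try:
    conn.execute("SELECT SUM(total) FROM ?", ("Order_1",))
    bind_worked = True
except sqlite3.OperationalError:
    bind_worked = False

total_1 = total_for_table(conn, "Order_1")
total_2 = total_for_table(conn, "Order_2")
print(bind_worked, total_1, total_2)
```

The extra validation step is part of the cost of the N-tables layout: every query has to be built as a string, and every table name has to be checked before it is used.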

So what would you advise? Performance is very important.

+4
3 answers

Step 1. Get a book on data warehousing, since that is what you are doing.

Step 2. Divide your data into facts (measurable things such as $, weight, etc.) and dimensions (non-measurable attributes such as product name, order number, user names, etc.).

Step 3. Create a fact table (for example, order items) surrounded by the dimensions of that fact: the order item's product, the order item's order, the order item's order number, the order item's date, and so on. This will be one fact table and several dimension tables in a single database. Each "origin" or "source" is simply one more dimension of the basic fact.

Step 4. Use very simple SELECT SUM() ... GROUP BY queries to summarize and analyze your data.
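Steps 2 to 4 might look roughly like this. It is a sketch under assumptions, not a full warehouse design: it uses Python's `sqlite3`, and the table and column names (`dim_source`, `dim_product`, `fact_order_item`, and so on) as well as the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: descriptive attributes. The data source is just another dimension.
    CREATE TABLE dim_source  (source_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Fact table: one row per order item, holding measures plus keys into the dimensions.
    CREATE TABLE fact_order_item (
        source_id    INTEGER NOT NULL REFERENCES dim_source(source_id),
        product_id   INTEGER NOT NULL REFERENCES dim_product(product_id),
        order_number INTEGER NOT NULL,
        order_date   TEXT    NOT NULL,
        quantity     INTEGER NOT NULL,
        amount       REAL    NOT NULL
    );

    INSERT INTO dim_source  VALUES (1, 'Store A'), (2, 'Store B');
    INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
    INSERT INTO fact_order_item VALUES
        (1, 10, 1001, '2024-01-05', 2, 20.0),
        (1, 20, 1001, '2024-01-05', 1, 15.0),
        (2, 10, 2001, '2024-01-06', 3, 30.0);
""")

# Step 4: summarize a measure, grouped by a dimension attribute.
revenue_by_source = conn.execute("""
    SELECT s.name, SUM(f.amount)
    FROM fact_order_item AS f
    JOIN dim_source AS s ON s.source_id = f.source_id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()

print(revenue_by_source)
```

Grouping by a different dimension (product, date) only changes the join and the `GROUP BY` column; the fact table stays the same.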

This is the highest-performance, most scalable way to do this. Buy Ralph Kimball's Data Warehouse Toolkit books for more details.

Do not create N databases with the same structure. Build one for TEST and one for PRODUCTION, but don't create N.

Do not create N tables with the same structure. That is what keys are for.

+5

Here is one example. Each row of the fact table in the example holds one item of an order. The OrderID field can be used to find all items of a specific order.

[Figure: sales_model_03]
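A minimal sketch of that lookup, assuming a simplified fact table. Only `OrderID` comes from the answer; the table name `FactOrderItem`, the other columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per order item; several rows share the same OrderID.
    CREATE TABLE FactOrderItem (
        OrderID   INTEGER NOT NULL,
        ProductID INTEGER NOT NULL,
        Quantity  INTEGER NOT NULL,
        Amount    REAL    NOT NULL
    );
    CREATE INDEX idx_item_order ON FactOrderItem (OrderID);

    INSERT INTO FactOrderItem VALUES
        (500, 1, 2, 19.98),
        (500, 2, 1, 5.00),
        (501, 1, 1, 9.99);
""")


def items_of_order(conn, order_id):
    """Return all (ProductID, Quantity, Amount) rows belonging to one order."""
    return conn.execute(
        "SELECT ProductID, Quantity, Amount FROM FactOrderItem "
        "WHERE OrderID = ? ORDER BY ProductID",
        (order_id,),
    ).fetchall()


order_500 = items_of_order(conn, 500)
print(order_500)
```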

+2

Well, if you partition the data across databases, you will have smaller tables, which is usually more efficient. If you ever need to query another database, Microsoft SQL Server makes that possible. If you need to reach a database on another server, that is possible too.
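To give a feel for what cross-database access looks like, here is a rough analogue in Python's `sqlite3`. SQLite is swapped in only because it is easy to run; it attaches a second database file with `ATTACH DATABASE`, whereas in SQL Server one would instead qualify the table with its database name. The file names and the `orders` table are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two separate database files with an identical schema, one per store.
tmpdir = tempfile.mkdtemp()
path_a = os.path.join(tmpdir, "store_a.db")
path_b = os.path.join(tmpdir, "store_b.db")

for path, rows in ((path_a, [(1, 10.0), (2, 20.0)]), (path_b, [(1, 5.0)])):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    db.commit()
    db.close()

# Open the first database and attach the second under the alias store_b.
conn = sqlite3.connect(path_a)
conn.execute("ATTACH DATABASE ? AS store_b", (path_b,))

# A query spanning both databases needs one branch per database.
combined_total = conn.execute("""
    SELECT SUM(total) FROM (
        SELECT total FROM main.orders
        UNION ALL
        SELECT total FROM store_b.orders
    )
""").fetchone()[0]
conn.close()

print(combined_total)
```

Note that the combined query grows by one `UNION ALL` branch per database, which is the trade-off to weigh against the smaller tables.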

It depends on how strongly the data in the different samples is correlated.

+1

Source: https://habr.com/ru/post/1299133/

