Database distribution

What are the options for distributing data selectively?

I will explain my question with an example. Consider a central database that stores all the data. This database is located in a specific geographical location.

Application A requires a subset of the information present in the central database. In addition, application A may be located in a different (and possibly distant) geographic location from the central database.

So, I thought about creating a new database in the same location as application A, which would contain a subset of the central database's information.

What technologies / products allow me to deploy such a configuration?

thanks

+6
4 answers

Look into database replication. SQL Server can definitely do this, and others (Oracle, MySQL, ...) should have it as well.

The idea is that a copy (a subset) is maintained elsewhere, and updates are exchanged in batches. How conflicts are resolved depends on your application.
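
As a rough illustration of that idea, here is a minimal sketch of a periodic "pull the changed subset in a batch" job, assuming both databases are MySQL reached via mysql-connector-python; the orders table, its region and updated_at columns, and the host names and credentials are hypothetical placeholders.

```python
# Minimal batch-sync sketch: copy rows for one region that changed since the
# last run from the central database into the local one. Names are placeholders.
import mysql.connector

def sync_orders_subset(region, last_sync):
    central = mysql.connector.connect(host="central.example.com", user="sync",
                                      password="secret", database="erp")
    local = mysql.connector.connect(host="localhost", user="sync",
                                    password="secret", database="erp_local")
    src = central.cursor()
    src.execute("SELECT id, customer, total, updated_at FROM orders "
                "WHERE region = %s AND updated_at > %s", (region, last_sync))
    rows = src.fetchall()

    dst = local.cursor()
    for row in rows:
        # Upsert: the local copy simply mirrors the central row.
        dst.execute("REPLACE INTO orders (id, customer, total, updated_at) "
                    "VALUES (%s, %s, %s, %s)", row)
    local.commit()
    central.close()
    local.close()
    # Return the new high-water mark for the next batch.
    return max((r[3] for r in rows), default=last_sync)
```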

+4

Most mainstream database software, such as MySQL and SQL Server, can do the job, but this is not a good model. As the application grows (traffic and users), you will not only put load on the central database server (which may be serving other applications), but you will also consume network bandwidth transferring data between the remote database and the application server.

The better model is to keep your data close to the application server and use the remote database for backup and recovery purposes. You can use an FC/IP SAN (or any other storage network architecture) as your storage model, depending on your application's needs.

+2

One big question that you did not address is whether application A needs read-only access to the data or read-write access.

The immediate concept that comes to mind when reading your requirements is sharding. In MySQL this can be accomplished with partitioning. That said, before jumping into partitions, make sure you read up on their pros and cons. There are cases where partitioning can actually slow things down if your indexes are not well chosen or your partitioning scheme is not well designed.
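
To make the sharding idea concrete, here is a minimal sketch of routing by a shard key (here, a geographic region); the region-to-host mapping is entirely hypothetical.

```python
# Minimal shard-routing sketch: each region's data lives on its own database host.
SHARDS = {
    "emea": "db-emea.example.com",
    "apac": "db-apac.example.com",
    "amer": "db-amer.example.com",
}

def shard_for(region):
    """Return the database host that owns the data for the given region."""
    try:
        return SHARDS[region]
    except KeyError:
        raise ValueError("no shard configured for region %r" % region)
```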

If your needs are read-only, this should be a fairly simple solution. You can use MySQL in a master-slave setup and point application A at the slave. If you need both reads and writes, it becomes much more difficult.

Depending on your write requirements, you can split your reads to the slave and your writes to the master, but this significantly complicates the code (you have to deal with multiple connections to multiple databases). The advantage of this type of layout is that you do not need a complex database infrastructure.
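
Here is a minimal sketch of that read/write split, assuming MySQL on both sides and mysql-connector-python; the host names and credentials are placeholders, and real code would also need connection pooling and error handling.

```python
# Minimal read/write splitting sketch: SELECTs go to the local slave,
# everything else goes to the remote master.
import mysql.connector

class RoutedDB:
    def __init__(self):
        self.master = mysql.connector.connect(host="master.example.com",
                                              user="app", password="secret",
                                              database="erp")
        self.slave = mysql.connector.connect(host="localhost",
                                             user="app", password="secret",
                                             database="erp")

    def query(self, sql, params=()):
        # Reads can be served by the slave that sits next to application A.
        cur = self.slave.cursor()
        cur.execute(sql, params)
        return cur.fetchall()

    def execute(self, sql, params=()):
        # Writes must go to the master.
        cur = self.master.cursor()
        cur.execute(sql, params)
        self.master.commit()
        return cur.rowcount
```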

On the other hand, you can keep your code as is and use master-master replication in MySQL. Although it is not officially supported by Oracle, many people have had success with it. A quick Google search will find you a huge list of blogs, howtos, and so on. Just keep in mind that your code must be written to support this (for example, you cannot use auto-increment fields for primary keys, etc.).
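
One way to sidestep the auto-increment problem is to generate primary keys in the application; here is a minimal sketch, with a hypothetical orders table shown in the usage comment.

```python
# Minimal sketch: application-generated UUID keys cannot collide between the
# two masters, unlike AUTO_INCREMENT values handed out independently by each node.
import uuid

def new_order_id():
    return uuid.uuid4().hex

# Hypothetical usage:
# cur.execute("INSERT INTO orders (id, customer) VALUES (%s, %s)",
#             (new_order_id(), "ACME"))
```

Another common approach is to give the two masters different auto_increment_increment / auto_increment_offset settings so that the ids they generate interleave instead of colliding.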

If you have money to spend, you can look at some of the more commercial offerings. Oracle and SQL Server both support this.

You can also use block-level replication such as DRBD (and MySQL on DRBD) to handle replication between your nodes, but the problem you will always run into is what happens when the connection between the two nodes fails.

The biggest problem you will encounter is handling conflicting updates on two separate database nodes. If your data is specific to each geographic location, this may not be an issue for you.
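
If you do have to merge conflicting updates, the simplest (and crudest) policy is last-write-wins; a minimal sketch follows, with hypothetical row dictionaries that carry an updated_at timestamp.

```python
# Minimal last-write-wins sketch: keep whichever version of the row was
# modified most recently. Real systems often need per-application rules instead.
def resolve(local_row, remote_row):
    if remote_row["updated_at"] > local_row["updated_at"]:
        return remote_row
    return local_row
```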

In short, this is not an easy (or inexpensive) problem to solve.

+1

Any time you are talking about database replication, it is important to consider the possibility of conflicts at the design stage.

With that in mind, SAP Sybase Replication Server lets you do this with either Sybase databases or third-party databases.

In the Sybase world this is often called a corporate rollup environment. There can be several geographically separated databases, each of which has primary control over its own subset of the data. At headquarters there is a server that consolidates all the various subsets into one repository. You can replicate whole tables, or replicate based on the values in individual rows/columns.

This leaves the databases in a loosely consistent state. Transaction rates, geographic separation, and the latency inherent in the network will affect how quickly updates flow from one database to another. If the network connection is temporarily lost, Sybase Replication Server queues the transactions and sends them as soon as the link comes back, but the reliability and stability of the replication system will depend on the stability of the network connection.
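
The store-and-forward behaviour described above can be pictured with a minimal sketch: updates are queued locally while the link is down and flushed once it comes back. The send_to_remote transport function is hypothetical.

```python
# Minimal store-and-forward sketch: buffer updates while the remote site is
# unreachable and retry once the link is restored.
import collections
import time

pending = collections.deque()

def replicate(update, send_to_remote):
    pending.append(update)
    while pending:
        try:
            send_to_remote(pending[0])
            pending.popleft()
        except ConnectionError:
            # Link is down: keep everything queued and try again later.
            time.sleep(5)
            break
```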

Again, as others have said, this is not cheap, but it is relatively straightforward to implement and maintain.

Disclaimer: I worked at Sybase and am still part of the SAP family of companies.

0

Source: https://habr.com/ru/post/919661/

