Combining sym files for partitioned tables

I have two directories, each of which contains a table partitioned by date. Each directory has its own sym file, as expected. The tables are exactly the same.

I want to combine them into one directory, but I am having trouble doing so. First I tried creating soft links (because of the large amount of data) to the partitions from the other directory. This did not work, because the linked tables then used the wrong sym file.

Does anyone have an idea how best to do this? Do I need to rebuild a new sym file for both directories?

thanks

2 answers

I'm not sure I understand your situation exactly, but I can think of several possibilities.

  • The two databases are exactly identical: if you run a checksum over both directories, the hashes match.

In this case, why do you need two copies? You can run multiple q processes from the same copy of the database. This is actually preferable, because all the processes share the OS disk cache. Just delete one of the copies and point all the q processes at the same directory.
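If you are not sure whether you are in this first case, one way to check is to hash both directory trees and compare the results. Below is a minimal Python sketch; `tree_digest` and `same_database` are hypothetical helper names, not part of kdb+. It reads every byte of both databases, so expect it to be slow on large data, and because `open` follows symlinks, a linked partition is hashed by the contents it points to.

```python
import hashlib
import os


def _file_sha256(path: str) -> bytes:
    """Raw 32-byte SHA-256 of one file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()


def tree_digest(root: str) -> str:
    """One SHA-256 hex digest covering every file name and file under root.

    Files are visited in sorted relative-path order, so the result does not
    depend on directory listing order. Each entry contributes its relative
    path, a NUL separator, and the fixed-length hash of its contents.
    """
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel_paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    h = hashlib.sha256()
    for rel in sorted(rel_paths):
        h.update(rel.encode("utf-8") + b"\0")
        h.update(_file_sha256(os.path.join(root, rel)))
    return h.hexdigest()


def same_database(dir_a: str, dir_b: str) -> bool:
    """True if both trees have identical relative file names and contents."""
    return tree_digest(dir_a) == tree_digest(dir_b)
```

If `same_database` returns `True`, the two copies are byte-for-byte interchangeable and one of them can simply be deleted.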

  • The two databases contain data loaded from the same source, but are otherwise not identical: running the same query against each database gives the same result, but the file checksums do not match.

This can happen if the databases were built independently from the same source data. Unless you actually copied the files, you cannot assume that the databases are identical. The obvious example: the same set of files was loaded into each database, but in a different order for each. In this case you cannot share one sym file! The data will look fine at first glance, but all your sym values will be wrong. If for some reason you want to combine the two databases, you will need to read the data out of one database and load it into the other. This is the only reliable way to be 100% sure that you will not corrupt your data.

  • You have two different databases, each of which contains the same table (identical in the checksum sense; for example, you copied the table files from one directory to the other).

This won't work unless, by some miracle, the two sym files match, which won't happen if the rest of the databases differ. The reason is that enumerated sym values are global: each value stored in a sym column is an index into the single sym file, which is built from every symbol in the database. If you need the table in both databases, you will need to re-enumerate its sym columns against the sym file of each database you copy it into.
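The second and third cases both come down to how enumeration works: an enumerated sym column is stored on disk as integer indices into the database's sym file, not as the symbols themselves. The Python sketch below models this with plain lists (a simplification that does not read or write real kdb+ files; `enumerate_syms` and `resolve` are hypothetical names) and replays the load-order example: the same two files loaded in opposite orders produce different sym lists, so one database's indices decode to the wrong symbols under the other's sym list.

```python
from typing import List, Sequence


def enumerate_syms(values: Sequence[str], sym: List[str]) -> List[int]:
    """Enumerate values against the sym list, appending unseen symbols.

    Mutates `sym` in place (like extending a sym file) and returns the
    integer indices that would be stored on disk instead of the symbols.
    """
    index = {s: i for i, s in enumerate(sym)}
    out = []
    for v in values:
        if v not in index:
            index[v] = len(sym)
            sym.append(v)
        out.append(index[v])
    return out


def resolve(indices: Sequence[int], sym: Sequence[str]) -> List[str]:
    """Turn stored indices back into symbols using the given sym list."""
    return [sym[i] for i in indices]


# Database A loads file 1, then file 2.
sym_a: List[str] = []
enumerate_syms(["IBM", "MSFT"], sym_a)
day_a = enumerate_syms(["AAPL", "IBM"], sym_a)  # stored as [2, 0]

# Database B loads the same files in the opposite order.
sym_b: List[str] = []
enumerate_syms(["AAPL", "IBM"], sym_b)
enumerate_syms(["IBM", "MSFT"], sym_b)

print(resolve(day_a, sym_a))  # → ['AAPL', 'IBM'] (correct)
print(resolve(day_a, sym_b))  # → ['MSFT', 'AAPL'] (wrong sym file, wrong data)
```

Nothing fails here: the indices are all in range, which is exactly why the data "looks fine at first glance".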


Read each date partition from one directory, de-enumerate (take the value of) all of the enumerated sym columns, and write the partition to the other directory, re-enumerating those columns against that directory's sym file.
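Those steps can be sketched in Python on the same list model of enumeration (hypothetical names throughout; partitions are modelled as dicts of column lists, not real kdb+ files). In q itself, the re-enumeration step is typically done with `.Q.en` against the target directory. For each row, the sketch looks up the symbol in the source sym list, then finds or appends it in the destination sym list.

```python
from typing import Dict, Iterable, List, Sequence


def re_enumerate(indices: Sequence[int], src_sym: Sequence[str],
                 dst_sym: List[str]) -> List[int]:
    """Translate indices enumerated against src_sym into indices into dst_sym.

    Each index is first resolved to its symbol via the source sym list,
    then looked up in the destination sym list; unseen symbols are
    appended, so indices already used by the target stay valid.
    """
    lookup = {s: i for i, s in enumerate(dst_sym)}
    out = []
    for i in indices:
        s = src_sym[i]
        if s not in lookup:
            lookup[s] = len(dst_sym)
            dst_sym.append(s)
        out.append(lookup[s])
    return out


def migrate_partition(columns: Dict[str, list], sym_columns: Iterable[str],
                      src_sym: Sequence[str], dst_sym: List[str]) -> Dict[str, list]:
    """Copy one date partition, re-enumerating only the enumerated sym columns."""
    sym_cols = set(sym_columns)
    return {
        name: re_enumerate(data, src_sym, dst_sym) if name in sym_cols else list(data)
        for name, data in columns.items()
    }


# Hypothetical example: two date partitions from the source directory.
src_sym = ["AAPL", "IBM"]
dst_sym = ["IBM", "MSFT"]  # existing sym list of the target directory
partitions = {
    "2020.01.01": {"sym": [0, 1, 0], "price": [1.0, 2.0, 3.0]},
    "2020.01.02": {"sym": [1, 1], "price": [4.0, 5.0]},
}
migrated = {date: migrate_partition(cols, ["sym"], src_sym, dst_sym)
            for date, cols in partitions.items()}
print(migrated["2020.01.01"]["sym"], dst_sym)  # → [2, 0, 2] ['IBM', 'MSFT', 'AAPL']
```

The destination sym list is only ever appended to, never reordered, because partitions already in the target directory depend on its existing indices; once every partition has been written, the extended list has to be saved back as the target's sym file.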


Source: https://habr.com/ru/post/947994/

