We have a general requirement to batch-change data, such as a user ID column (for example, changing user ID 001 to 002 and user ID 003 to 004), across several tables. However, the userid field in table2 is not part of the primary key (we cannot select the affected rows for updating other than by doing SELECT * over the whole table), while in table1 it is the primary key (that case we can handle). So there is no single way of selecting all the data to be changed that works for every table.
So how can we fulfill this requirement?
So far I have only come up with two methods:
(1) SELECT * from the table (with a suitable page/fetch size), then update the affected rows. // Is this correct?
(2) Use the COPY command to export to a CSV file, modify it, then import it again. // Would the performance be too slow?
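For approach (1), here is a minimal sketch with the DataStax Python driver, assuming a keyspace named my_keyspace, a local contact point, and a small id_map holding the old-to-new user IDs (those names are my assumptions, not from the question). Note that in table2 the userid column can be updated in place because phonenumber is the key, whereas in table1 each affected row has to be re-inserted under the new userid and the old row deleted, since a primary key column cannot be UPDATEd in Cassandra.

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Old userid -> new userid; the concrete values are just the example
# from the question.
id_map = {'001': '002', '003': '004'}

cluster = Cluster(['127.0.0.1'])           # assumed contact point
session = cluster.connect('my_keyspace')   # assumed keyspace name

# table2: phonenumber is the PK and userid is a regular column, so the row
# can be updated in place once the full scan finds it.
scan_t2 = SimpleStatement("SELECT phonenumber, userid FROM table2",
                          fetch_size=1000)          # paged full scan
update_t2 = session.prepare(
    "UPDATE table2 SET userid = ? WHERE phonenumber = ?")
for row in session.execute(scan_t2):
    if row.userid in id_map:
        session.execute(update_t2, (id_map[row.userid], row.phonenumber))

# table1: userid is the PK, so it cannot be UPDATEd; copy each affected row
# to the new key and delete the old one.
scan_t1 = SimpleStatement("SELECT userid, name, sex FROM table1",
                          fetch_size=1000)
insert_t1 = session.prepare(
    "INSERT INTO table1 (userid, name, sex) VALUES (?, ?, ?)")
delete_t1 = session.prepare("DELETE FROM table1 WHERE userid = ?")
for row in session.execute(scan_t1):
    if row.userid in id_map:
        session.execute(insert_t1, (id_map[row.userid], row.name, row.sex))
        session.execute(delete_t1, (row.userid,))

cluster.shutdown()
```

The fetch_size keeps the full scan from pulling the whole table into memory at once, and the same prepared-statement pattern would apply to any other table that stores userid.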
Is it possible to use these methods in production (millions of records)? Or is there some other standard approach for this kind of requirement, such as sstableloader or Pig?
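For concreteness, approach (2) could look roughly like the sketch below, which shells out to cqlsh COPY and rewrites the CSV in Python (the keyspace, file names and id_map are again my assumptions). This round trip only works as an in-place fix for table2; for table1 the re-import would create rows under the new userid keys, so the old rows would still have to be deleted separately. cqlsh COPY is generally recommended only for modest data sizes, which is why sstableloader or a Pig/Spark job is often suggested for very large tables.

```python
import csv
import subprocess

id_map = {'001': '002', '003': '004'}   # old userid -> new userid (example)

# Dump table2 to CSV (cqlsh -e runs a single statement; file names are
# placeholders).
subprocess.run(
    ["cqlsh", "-e",
     "COPY my_keyspace.table2 (phonenumber, userid) TO 'table2_old.csv'"],
    check=True)

# Rewrite the userid column offline.
with open("table2_old.csv", newline="") as src, \
     open("table2_new.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for phonenumber, userid in csv.reader(src):
        writer.writerow([phonenumber, id_map.get(userid, userid)])

# Load the modified rows back; for table2 this overwrites the old rows
# because the primary key (phonenumber) is unchanged.
subprocess.run(
    ["cqlsh", "-e",
     "COPY my_keyspace.table2 (phonenumber, userid) FROM 'table2_new.csv'"],
    check=True)
```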
Changing one column across an entire existing table seems like a fairly common requirement, so perhaps a standard solution already exists.
Regardless of which method we choose, there is a second problem: new data keeps being written while the old data is being migrated. How do we deal with that incremental data, i.e. how do we complete the migration without losing the writes that happened during the migration window?
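One hypothetical way to handle this last point (an assumption on my part, not something described in the question) is to have the application's write path record every key it touches in a small changelog table while the bulk migration is running, and then re-check only those keys against the id mapping after the bulk pass:

```python
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])           # assumed contact point
session = cluster.connect('my_keyspace')   # assumed keyspace name

# Hypothetical side table, created before the migration starts:
#   CREATE TABLE migration_changelog (phonenumber text PRIMARY KEY);
insert_t2 = session.prepare(
    "INSERT INTO table2 (phonenumber, userid) VALUES (?, ?)")
log_key = session.prepare(
    "INSERT INTO migration_changelog (phonenumber) VALUES (?)")

def write_phone(phonenumber, userid):
    """Application write path while the migration window is open."""
    session.execute(insert_t2, (phonenumber, userid))
    session.execute(log_key, (phonenumber,))   # remember the touched key

# After the bulk migration has finished: sweep only the logged keys and fix
# any rows that were written with an old user ID during the window.
id_map = {'001': '002', '003': '004'}
get_row = session.prepare("SELECT userid FROM table2 WHERE phonenumber = ?")
fix_row = session.prepare("UPDATE table2 SET userid = ? WHERE phonenumber = ?")
for row in session.execute("SELECT phonenumber FROM migration_changelog"):
    current = session.execute(get_row, (row.phonenumber,)).one()
    if current is not None and current.userid in id_map:
        session.execute(fix_row, (id_map[current.userid], row.phonenumber))

cluster.shutdown()
```

An alternative would be to switch the application to writing the new user IDs (or dual-writing old and new) as soon as the migration starts, so the final sweep only has to cover data written before the cutover.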
Looking forward to your reply.
table1: userid (PK), name, sex
table2: phonenumber (PK), userid