Divide the hexadecimal index into n parts

I have a table whose primary keys are hexadecimal strings, for example 12a4..., c3af..., and I want to process the rows in parallel:

    process_them(1,4) on machine 1
    process_them(2,4) on machine 2
    process_them(3,4) on machine 3
    process_them(4,4) on machine 4

Together, these calls should select every row in the table, with no overlap between the machines. The best idea I can come up with is to split the keys into 16 parts by their first hex digit, for example:

    select * from table where id like '1%'
    ...
    select * from table where id like 'e%'
    select * from table where id like 'f%'
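
For illustration, the same leading-digit idea extends to other power-of-two splits by taking the digit's value modulo the machine count; a sketch, assuming MySQL's conv() and the id column above:

    -- machine i of n uses remainder i-1; works for n = 2, 4, 8, 16
    select * from table where mod( conv( left(id, 1), 16, 10 ), 4 ) = 0;  -- machine 1 of 4
    select * from table where mod( conv( left(id, 1), 16, 10 ), 4 ) = 1;  -- machine 2 of 4
    -- a two-character prefix gives up to 256 buckets, e.g. 32 machines:
    select * from table where mod( conv( left(id, 2), 16, 10 ), 32 ) = 0; -- machine 1 of 32

Unlike the like '1%' form, these expressions cannot use an index on id, so each query scans the whole table.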

Is there a better approach that lets me split all the rows into other fractions, such as 1/2, 1/4, 1/8, 1/16, 1/32, and so on?

Note: I do this to process user data at night and send notifications. I do not modify anything in the database itself. We need to process thousands of users at a time, so very fine-grained partitioning is not an option; it would not be efficient.

+6
2 answers

A good option:

You can use an MD5 hash to spread the rows across partitions in a reasonably even way, deterministically (no row is ever skipped), and without any DDL changes.

    -- let n = the number of desired partitions
    -- let s = a salt, experimentally chosen to give the best distribution
    --         for your key allocation pattern
    -- (conv() works with 64-bit precision, so only the last 16 hex digits
    --  of the MD5 are converted, to avoid overflow)
    SELECT * FROM table WHERE mod( cast( conv( right( md5( concat( s, primary_key ) ), 16 ), 16, 10 ) as unsigned ), n ) = 0;
    SELECT * FROM table WHERE mod( cast( conv( right( md5( concat( s, primary_key ) ), 16 ), 16, 10 ) as unsigned ), n ) = 1;
    ...
    SELECT * FROM table WHERE mod( cast( conv( right( md5( concat( s, primary_key ) ), 16 ), 16, 10 ) as unsigned ), n ) = (n - 1);
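
To choose the salt experimentally, it helps to check how evenly a candidate salt spreads the rows; a sketch, assuming MySQL, a concrete salt value, n = 8, and the same expression as above:

    -- row count per partition for a candidate salt; the flatter, the better
    SELECT mod( cast( conv( right( md5( concat( 'my_salt', primary_key ) ), 16 ), 16, 10 ) as unsigned ), 8 ) AS part,
           count(*) AS rows_in_part
      FROM table
     GROUP BY part
     ORDER BY part;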

I have seen this approach implemented in production environments several times with good results.

The SQL here is not verified; I make no guarantee about the syntax.

+2

The easiest way is to add a status column to your table with at least two states:

    0 = pending
    1 = not pending
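
A minimal sketch of the corresponding DDL, assuming MySQL and the queue table used in the workflow below (table and column names are illustrative):

    -- status flag: 0 = pending, 1 = not pending; indexed so the
    -- WHERE status = 0 lookups below stay cheap
    ALTER TABLE queue ADD COLUMN status TINYINT NOT NULL DEFAULT 0;
    CREATE INDEX idx_queue_status ON queue (status);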

Each processing thread can then "reserve" a small batch of rows for processing. The general workflow:

    BEGIN TRANSACTION;
    SELECT * FROM queue WHERE status = 0 LIMIT 5 FOR UPDATE;  -- load 5 pending items
    -- if there are no pending items: stop here
    -- save this list of jobs in your application layer
    UPDATE queue SET status = 1 WHERE id IN (@id_list);  -- list of ids from the previous step
    COMMIT;
    -- process your jobs here
    -- loop

Depending on the actual processing time of your jobs, this approach may carry too much overhead for your needs. Increase the LIMIT in the first step to load more jobs at a time and reduce the relative overhead, at the cost of a possibly less balanced distribution of work across processes.
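
One refinement, if the database supports it (MySQL 8.0+ or PostgreSQL 9.5+): SKIP LOCKED lets several workers reserve batches concurrently without blocking on rows another worker has already locked. A sketch of the reservation step:

    BEGIN;
    -- each worker receives a different batch of pending rows
    SELECT id FROM queue WHERE status = 0 LIMIT 5 FOR UPDATE SKIP LOCKED;
    UPDATE queue SET status = 1 WHERE id IN (@id_list);
    COMMIT;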

0

Source: https://habr.com/ru/post/950781/

