Two solutions are presented here. Both are MySQL-only and can be used from any programming language as the consumer. PHP would be far too slow to do this work itself, but it could be the consumer.
Faster solution. I can fetch 1000 random rows from a table of 19 million rows in about 2 tenths of a second with more advanced programming techniques.
Slower solution. It takes about 15 seconds without those techniques.
By the way, both use the data generation seen HERE, which I wrote. So that is my little schema. I use it and continue with TWO more of the self-inserts shown there until I have 19M rows. I am not going to show it again; to get those 19M rows, go look at it and do two more of those inserts.
Slower version first
First, the slower method.
select id,thing from ratings order by rand() limit 1000;
This returns 1000 rows in 15 seconds.
For those new to mysql, don't even read the following.
Faster solution
This is a little harder to describe. The gist is that you pre-compute your random numbers and generate the ending of an in clause: random numbers separated by commas and wrapped in a pair of parentheses.
It will look like (1,2,3,4), but it will have 1000 numbers in it.
And you store them and use them once. Like a one-time pad in cryptography. OK, not a great analogy, but you get the point, I hope.
Think of it as the ending of an in clause, stored in a TEXT column (like a blob).
Why in the world would anyone want to do this? Because RNGs (random number generators) are prohibitively slow. But a few machines generating them can crank out thousands fairly quickly. By the way (and you will see this in the structure of my so-called appendices), I capture how long it takes to generate one row: about 1 second with MySQL. But C#, PHP, Java, anything can put it together. The point is not how you put it together, but that you have it when you want it.
The long and short of this strategy: fetch a row that has not yet been used as a random list, mark it as used, and issue a call such as
select id,thing from ratings where id in (a,b,c,d,e, ... )
When the in clause has 1000 numbers in it, the results are available in less than half a second. This effectively employs the MySQL CBO (cost-based optimizer), which treats it like a join on a PK index.
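Since any language can be the consumer, here is a minimal Python sketch of that fetch, mark and query flow. It is only an illustration under assumptions: Python's built-in sqlite3 stands in for MySQL so the example runs on its own, the tables are tiny demo versions of ratings and randomsToUse, and the function names fetch_random_chunk and build_demo_db are hypothetical.

```python
import sqlite3


def fetch_random_chunk(conn: sqlite3.Connection) -> list:
    """Take the oldest unused id list, mark it used, and return the matching rows."""
    row = conn.execute(
        "select id, txtInString from randomsToUse where used = 0 order by id limit 1"
    ).fetchone()
    if row is None:
        return []  # no precomputed list left; the table needs to be refilled
    list_id, in_clause = row
    conn.execute(
        "update randomsToUse set used = 1, dtUsed = datetime('now') where id = ?",
        (list_id,),
    )
    conn.commit()
    # in_clause was produced by our own generator (digits, commas, parentheses),
    # so it is concatenated into the statement the same way Appendix C does it
    return conn.execute(
        "select id, thing from ratings where id in " + in_clause
    ).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the ratings and randomsToUse tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table ratings (id integer primary key, thing text)")
    conn.executemany(
        "insert into ratings (id, thing) values (?, ?)",
        [(i, f"thing {i}") for i in range(1, 101)],
    )
    conn.execute(
        "create table randomsToUse (id integer primary key autoincrement, "
        "used int not null, dtUsed text, txtInString text not null)"
    )
    conn.execute(
        "insert into randomsToUse (used, txtInString) values (0, '(3, 42, 77)')"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(fetch_random_chunk(demo))  # rows for ids 3, 42 and 77
    print(fetch_random_chunk(demo))  # empty list: the only stored list was consumed
```

Against a real MySQL server the same three statements would go through whatever connector you use; only the connection setup and the now() call would differ.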
I leave this in summary form because it is a bit complicated in practice, but it potentially includes the following pieces:
- a table holding the precomputed random numbers (Appendix A)
- a MySQL create event strategy (Appendix B)
- a stored procedure that employs a prepared statement (Appendix C)
- a MySQL-only stored procedure that demonstrates building the random in clause, for kicks (Appendix D)
Appendix A
A table containing precomputed random numbers
create table randomsToUse
(   -- create a table of 1000 random numbers to use
    -- format will be like a long "(a,b,c,d,e, ...)" string

    -- pre-computed random numbers, fetched upon needed for use

    id int auto_increment primary key,
    used int not null,                -- 0 = not used yet, 1= used
    dtStartCreate datetime not null,  -- next two lines to eyeball time spent generating this row
    dtEndCreate datetime not null,
    dtUsed datetime null,             -- when was it used
    txtInString text not null         -- here is your in clause ending like (a,b,c,d,e, ... )

    -- this may only have about 5000 rows and garbage cleaned
    -- so maybe choose one or two more indexes, such as composites
);
Appendix B
In the interest of not turning this into a book, see my answer HERE for a mechanism for running a recurring MySQL event. It would drive the maintenance of the table in Appendix A, using the techniques in Appendix D and whatever other ideas you want to dream up: re-using rows, archiving, deleting, whatever.
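As a rough idea of what such a recurring job might do, here is a Python sketch that tops the table up whenever the number of unused lists falls below a threshold. This is an assumption about one possible maintenance policy, not the event from the linked answer; sqlite3 stands in for MySQL so the example is self-contained, and top_up_random_lists and min_unused are hypothetical names.

```python
import random
import sqlite3


def top_up_random_lists(conn, min_unused, how_many, max_num):
    """Insert unused id lists until min_unused are waiting; return how many were added."""
    (unused,) = conn.execute(
        "select count(*) from randomsToUse where used = 0"
    ).fetchone()
    added = 0
    while unused + added < min_unused:
        # one stored list: how_many random ids between 1 and max_num inclusive
        ids = [random.randint(1, max_num) for _ in range(how_many)]
        in_string = "(" + ", ".join(map(str, ids)) + ")"
        conn.execute(
            "insert into randomsToUse (used, txtInString) values (0, ?)",
            (in_string,),
        )
        added += 1
    conn.commit()
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table randomsToUse (id integer primary key autoincrement, "
        "used int not null, txtInString text not null)"
    )
    # first run fills the empty table, second run finds nothing to do
    print(top_up_random_lists(conn, min_unused=5, how_many=1000, max_num=18_000_000))
    print(top_up_random_lists(conn, min_unused=5, how_many=1000, max_num=18_000_000))
```

Run on a schedule (a MySQL event, cron, or anything else), a job like this keeps a small stock of lists ready so the consumer never waits on the RNG.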
Appendix C
Stored procedure to just get 1000 random rows.
DROP PROCEDURE if exists showARandomChunk;
DELIMITER $$
CREATE PROCEDURE showARandomChunk
(
)
BEGIN
    DECLARE i int;
    DECLARE txtInClause text;

    -- select now() into dtBegin;

    select id,txtInString into i,txtInClause from randomsToUse where used=0 order by id limit 1;
    -- select txtInClause as sOut; -- used for debugging

    -- mark the list as used so the next call gets a fresh one
    -- (this is the "marking it as used" step described in the text)
    update randomsToUse set used=1, dtUsed=now() where id=i;

    -- if I run this following statement, it is 19.9 seconds on my Dell laptop
    -- with 19M rows
    -- select * from ratings order by rand() limit 1000; -- 19 seconds

    -- however, if I run the following "Prepared Statement", it takes 2 tenths of a second
    -- for 1000 rows

    set @s1=concat("select * from ratings where id in ",txtInClause);

    PREPARE stmt1 FROM @s1;
    EXECUTE stmt1; -- execute the puppy and give me 1000 rows
    DEALLOCATE PREPARE stmt1;
END
$$
DELIMITER ;
Appendix D
This can be intertwined with the Appendix B concept, however you want to do it. But it leaves you with something to see how MySQL could do it all by itself on the RNG side of things. By the way, for parameters 1 and 2 of 1000 and 19M respectively, it takes 800 ms on my machine.
This routine could be written in any language, as mentioned at the beginning.
drop procedure if exists createARandomInString;
DELIMITER $$
create procedure createARandomInString
(   nHowMany int, -- how many numbers do you want
    nMaxNum int   -- max of any one number
)
BEGIN
    DECLARE dtBegin datetime;
    DECLARE dtEnd datetime;
    DECLARE i int;
    DECLARE txtInClause text;

    select now() into dtBegin;

    set i=1;
    set txtInClause="(";
    WHILE i<nHowMany DO
        set txtInClause=concat(txtInClause,floor(rand()*nMaxNum)+1,", "); -- extra space good due to viewing in text editor
        set i=i+1;
    END WHILE;
    set txtInClause=concat(txtInClause,floor(rand()*nMaxNum)+1,")");
    -- select txtInClause as myOutput; -- used for debugging
    select now() into dtEnd;

    -- insert a row, that has not been used yet
    insert randomsToUse(used,dtStartCreate,dtEndCreate,dtUsed,txtInString)
        values (0,dtBegin,dtEnd,null,txtInClause);
END
$$
DELIMITER ;
How to call the stored procedure above:
call createARandomInString(1000,18000000);
This generates and saves 1 row of 1000 numbers, wrapped as described above. Big numbers, from 1 to 18M.
As a quick illustration, if you were to modify the stored procedure, uncomment the line near the bottom marked "used for debugging", have it be the last line that runs in the stored procedure, and run this:
call createARandomInString(4,18000000);
... to generate 4 random numbers up to 18M, the results may look like
+-------------------------------------+
| myOutput                            |
+-------------------------------------+
| (2857561,5076608,16810360,14821977) |
+-------------------------------------+
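Since the routine can live in any language, here is a rough Python equivalent of createARandomInString, offered as a sketch rather than a drop-in replacement. The function name create_random_in_string is hypothetical, and storing the result in randomsToUse is left out so the example stays self-contained.

```python
import random


def create_random_in_string(how_many: int, max_num: int) -> str:
    """Return a string like "(5, 17, 3)" holding how_many random ids in 1..max_num."""
    # random.randint is inclusive on both ends, like floor(rand()*nMaxNum)+1
    numbers = [random.randint(1, max_num) for _ in range(how_many)]
    return "(" + ", ".join(str(n) for n in numbers) + ")"


if __name__ == "__main__":
    # four random ids up to 18M, formatted as an in clause ending
    print(create_random_in_string(4, 18_000_000))
```

Like the stored procedure, this draws with replacement, so a list can contain the same id twice and the final query may then return slightly fewer rows than requested. If that matters, random.sample(range(1, max_num + 1), how_many) would give distinct ids.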
Appendix E
Reality check. These are somewhat advanced techniques, and I can't tutor anyone on them. But I wanted to share them anyway. But I can't teach it. Over and out.