Requirement:
We have two tables with the same structure on two different servers. Table 1 on the first server has a unique key over columns A, B, C, and we insert its rows into Table 2, which has a unique key over columns B, C, D.
Table 1 has approximately 5 million rows; because of the unique key constraints, only about 3 million of them will actually be inserted into Table 2.
We need to read every row from Table 1 and insert it into Table 2 when no matching record exists there; when a record already exists, we increment its counter and update the column "cron_modified_date" in Table 2.
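In MySQL this insert-or-update-counter requirement maps onto a single INSERT ... ON DUPLICATE KEY UPDATE statement. A minimal sketch with placeholder values (the duplicate path fires when the B, C, D unique key matches):

    -- Placeholder values for illustration only
    INSERT INTO TABLE2 (Name, Company, ProductName, Status, cron_modified_date)
    VALUES ('SomeName', 'SomeCompany', 'SomeProduct', 1, '2018-02-10')
    ON DUPLICATE KEY UPDATE
        count = count + 1,
        cron_modified_date = VALUES(cron_modified_date);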
The environment is PHP 5.5 and MySQL 5.7, and the database server has 6 GB of RAM.
When the script below runs, processing becomes very slow after about 2 million records and RAM is not released; once all RAM is consumed, the script stops making progress entirely.
As you can see, I reset the variables and close the database connection, but the RAM on the database server is still not freed. After some reading I thought PHP's garbage collector might need to be invoked manually to release resources, but that does not free the RAM either.
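For reference, manual collection in PHP looks like the sketch below; memory_get_usage() is a quick way to verify whether anything is actually reclaimed. The variable name $getAllRows is taken from the script that follows:

    // Manual cycle collection in PHP; log memory before and after
    // to see whether the collector actually reclaims anything.
    gc_enable();                                // the cycle collector must be on
    printf("before: %d bytes\n", memory_get_usage(true));
    unset($getAllRows);                         // drop the reference to the big result array
    $cycles = gc_collect_cycles();              // returns the number of cycles collected
    printf("after: %d bytes (cycles: %d)\n", memory_get_usage(true), $cycles);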
What am I doing wrong here, and how do I process millions of records with PHP and MySQL?
Is there any other way to free RAM while the script is running, so that the script can run to completion?
$queryCount = "SELECT count(*) as totalRecords FROM TABLE1 where created_date > = '2018-02-10'"; $rowsCount = $GLOBALS['db']->execRaw( $queryCount)->fetchAll(); $recordsPerIteration = 50000 ; $totalCount = $rowsCount[0]['totalRecords']; $start = 0; gc_disable() ; if ( $totalCount > 0 ) { while ( $totalCount > 0 ) { $query = "SELECT * FROM TABLE1 WHERE where created_date > = '2018-02-10' ORDER BY suggestion_id DESC LIMIT ".$start.",".$recordsPerIteration; print "sql is $query" ; $getAllRows = $GLOBALS['db']->execRaw( $query )->fetchAll(); $GLOBALS['db']->queryString = null; $GLOBALS['db']->close() ; foreach ($getAllRows as $getRow) { $insertRow = " INSERT INTO TABLE2 ( Name, Company, ProductName, Status, cron_modified_date) VALUE ( ".$GLOBALS['db_ab']->quote($getRow['Name']).", ".$GLOBALS['db_ab']->quote($getRow['Company']).", ".$GLOBALS['db_ab']->quote($getRow['ProductName']).", ".$getRow['Status'].", ".$GLOBALS['db_ab']->quote($getRow['created_date'])." ) ON DUPLICATE KEY UPDATE count = (count + 1) , cron_modified_date = '".$getRow['created_date']."'" ; $GLOBALS['db_ab']->execRaw( $insertRow ) ; $GLOBALS['db_ab']->queryString = null; $getRow = null; $insertRow = null; $GLOBALS['db_ab']->close() ; } gc_enable() ; $totalCount = $totalCount- $recordsPerIteration; $start += $recordsPerIteration ; $getAllRows = null; gc_collect_cycles() ; } }
Solution
After the suggestions provided by @ABelikov and some trial and error, the code below finally works fine and frees RAM after each 50K-row insert.
Below are the main findings.
- Release the DB connection variables after each operation that moves a large amount of data, then reconnect, so that the database buffers are flushed (a PDO-based sketch follows this list).
- Group the INSERT statements into one multi-row INSERT per batch and execute it once; do not issue one INSERT per row inside a loop.
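A minimal sketch of the first finding, assuming plain PDO; the wrapper used in this post ($GLOBALS['db'], execRaw(), close()) is not shown, so PDO stands in here and the DSN and credentials are placeholders:

    // Hypothetical reconnect helper: drop the old handle so its buffers can
    // be released, then open a fresh connection for the next batch.
    function reconnectDb(&$conn, $dsn, $user, $pass) {
        $conn = null; // free the previous connection
        $conn = new PDO($dsn, $user, $pass,
            array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
    }

    // After each 50K batch:
    reconnectDb($GLOBALS['db'], 'mysql:host=localhost;dbname=mydb', 'user', 'pass');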
Thank you guys for the valuable advice and help.
    /* Fetch the record count for batched inserts */
    $queryCount = "SELECT count(*) as totalRecords FROM TABLE1 WHERE created_date >= '2018-02-10'";
    $rowsCount = $GLOBALS['db']->execRaw($queryCount)->fetchAll();
    $recordsPerIteration = 50000;
    $totalCount = $rowsCount[0]['totalRecords'];
    $start = 0;

    if ($totalCount > 0) {
        while ($totalCount > 0) {
            $query = "SELECT * FROM TABLE1 WHERE created_date >= '2018-02-10'
                      ORDER BY suggestion_id DESC
                      LIMIT " . $start . "," . $recordsPerIteration;
            print "sql is $query";

            $getAllRows = $GLOBALS['db']->execRaw($query)->fetchAll();
            $GLOBALS['db']->queryString = null;
            $GLOBALS['db']->close();

            // Build one multi-row INSERT for the whole 50K batch
            $insertRow = "INSERT INTO TABLE2 (Name, Company, ProductName, Status, cron_modified_date) VALUES ";
            foreach ($getAllRows as $getRow) {
                $insertRow .= "(" . $GLOBALS['db_ab']->quote($getRow['Name']) . ","
                    . $GLOBALS['db_ab']->quote($getRow['Company']) . ","
                    . $GLOBALS['db_ab']->quote($getRow['ProductName']) . ","
                    . $getRow['Status'] . ","
                    . $GLOBALS['db_ab']->quote($getRow['created_date']) . "),";
            }
            $insertRow = rtrim($insertRow, ','); // remove the last ','
            // $getRow still holds the last row of the batch at this point
            $insertRow .= " ON DUPLICATE KEY UPDATE count = (count + 1), cron_modified_date = '" . $getRow['created_date'] . "'";

            $GLOBALS['db_ab']->execRaw($insertRow);

            // Drop both connections to free RAM; they are re-established before
            // the next iteration (the reconnect step is not shown here)
            $GLOBALS['db_ab'] = null;
            $GLOBALS['db'] = null;

            $insertRow = null;
            $totalCount = $totalCount - $recordsPerIteration;
            $start += $recordsPerIteration;
            $getAllRows = null;

            print "\nRecords left to process: " . $totalCount . "\n";
        }
    }
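One caveat the final script leaves out: a single INSERT carrying 50K rows can exceed MySQL's max_allowed_packet. Below is a sketch of splitting the batch with array_chunk; the 1,000-row chunk size is an arbitrary assumption. Using VALUES(cron_modified_date) in the update clause also gives each duplicate its own row's date instead of the last row's:

    // Split the 50K batch into smaller multi-row INSERTs so each statement
    // stays under max_allowed_packet. The chunk size of 1000 is an assumption.
    foreach (array_chunk($getAllRows, 1000) as $chunk) {
        $values = array();
        foreach ($chunk as $row) {
            $values[] = "(" . $GLOBALS['db_ab']->quote($row['Name']) . ","
                . $GLOBALS['db_ab']->quote($row['Company']) . ","
                . $GLOBALS['db_ab']->quote($row['ProductName']) . ","
                . $row['Status'] . ","
                . $GLOBALS['db_ab']->quote($row['created_date']) . ")";
        }
        $sql = "INSERT INTO TABLE2 (Name, Company, ProductName, Status, cron_modified_date)"
            . " VALUES " . implode(",", $values)
            . " ON DUPLICATE KEY UPDATE count = count + 1,"
            . " cron_modified_date = VALUES(cron_modified_date)";
        $GLOBALS['db_ab']->execRaw($sql);
    }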