I am trying to delete a large number of rows (> 10 million, this is about 1/3 of all the records in the table) from the InnoDB MySQL table with the index primary / clustered. The field idis a primary / clustered index and is continuous without spaces. At least it should be, I'm not deleting records in the middle. But it's possible that some insert requests fail, and innodb highlights some identifiers that have become unused (I'm not sure if this is true). I only delete old entries that are no longer needed. The table contains varchar columns, so the rows do not have a fixed size.
First my attempt:
DELETE FROM `table` WHERE id<=10000000
It does not work with large io operations. It seems that mysql killed this query and discarded all changes. Estimated time to complete the request was 6 hours and approximately the same for rollback. My big mistake is that the transaction log size was standard 5 mb, keep this in mind. It should be increased.
The second attempt to delete in pieces 10,000 records, for example:
DELETE FROM `table` WHERE id<=10000;
COMMIT;
DELETE FROM `table` WHERE id<=20000;
COMMIT;
Etc. The query execution time from the beginning was about 10 seconds (on a laptop). The execution time increases gradually, and this is about 300 seconds per request after 6 hours of execution.
The third attempt is to make queries that, on average, run less than 1 second. php code:
protected function deleteById($table, $id) {
$MinId = $this->getMinFromTable($table, 'id');
$PackDeleteCount= $this->PackDeleteCount;
$timerTotal = new Timer();
$delCountTotal = 0;
$delCountReport = 0;
$delInfo = array();
$PackMinTime = round($this->PackDeleteTime - $this->PackDeleteTime*$this->PackDeleteDiv, 3);
$PackMaxTime = round($this->PackDeleteTime + $this->PackDeleteTime*$this->PackDeleteDiv, 3);
$this->LogString(sprintf('Del `%s`, PackMinTime: %s; PackMaxTime: %s', $table, $PackMinTime, $PackMaxTime));
for (; $MinId < $id;) {
$MinId += $PackDeleteCount;
$delCountReport += $PackDeleteCount;
if ($MinId > $id) {
$MinId = $id;
}
$timer = new Timer();
$sql = sprintf('DELETE FROM `%s` WHERE id<=%s', $table, $MinId);
$this->s->Query($sql, __FILE__, __LINE__);
$delCount = $this->s->AffectedRows();
$this->s->CommitT();
$RoundTime = round($timer->end(), 3);
$delInfo[] = array(
'time' => $RoundTime,
'rows' => $PackDeleteCount,
);
$delCountTotal += $delCount;
if ($delCountReport >= $this->PackDeleteReport) {
$delCountReport = 0;
$delSqlCount = count($delInfo);
$EvTime = 0;
$PackTime = 0;
$EvCount = 0;
$PackCount = 0;
foreach ($delInfo as $v) {
$PackTime += $v['time'];
$PackCount += $v['rows'];
}
$EvTime = round($PackTime/$delSqlCount, 2);
$PackTime = round($PackTime, 2);
$EvCount = round($PackCount/$delSqlCount);
$TotalTime = $this->readableTime(intval($timerTotal->end()));
$this->LogString(sprintf('Del `%s`, Sql query count: %d; Time: %s; Count: %d; Evarage Time %s; Evarage count per delete: %d; Del total: %s; Del Total Time: %s; id <= %s', $table, $delSqlCount, $PackTime, $PackCount, $EvTime, $EvCount, $delCountTotal, $TotalTime, $MinId));
$delInfo = array();
}
$PackDeleteCountOld = $PackDeleteCount;
if ($RoundTime < $PackMinTime) {
$PackDeleteCount = intval($PackDeleteCount + $PackDeleteCount*(1 - $RoundTime/$this->PackDeleteTime));
} elseif ($RoundTime > $PackMaxTime) {
$PackDeleteCount = intval($PackDeleteCount - $PackDeleteCount*(1 - $this->PackDeleteTime/$RoundTime));
}
}
$this->LogString(sprintf('Finished del `%s`: time: %s', $table, round($timerTotal->end(), 2)));
}
It has some dependencies, but they are self-evident and can be easily changed by standard ones. I will explain only the input variables that are used here:
$table - target table, where rows needs to be deleted
$id - all records up to this id should be deleted
$MinId - Minimal id in the target table
$this->PackDeleteCount - Initial count of records, to start from. Then it recalculates row count to be deleted each new query.
$this->PackDeleteTime - desirable query execution time in average. I used 0.5
$this->PackDeleteDiv - acceptable deviation from $this->PackDeleteTime. In percentage. I used 0.3
$this->PackDeleteReport - Each N records should print statistic information about deleting
.
, , . , , , . , .
: , ?
, .