Well, if I'm reading this correctly, you have:
file1:

    abc 12 34
    abc 56 78
    abc 90 12

file2:

    abc 90 87   <-- common column 2
    abc 12 67   <-- common column 2
    abc 23 1    <-- unique column 2
and the output should be (column 3 summed wherever column 2 matches, e.g. 34 + 67 = 101):

    abc 12 101
    abc 90 99
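If that's the case, then something like this (assuming the files are in CSV format; for space-separated files like the samples above, pass the delimiter explicitly, e.g. `fgetcsv($f1, 0, ' ')`):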
    $f1   = fopen('file1.txt', 'rb');
    $f2   = fopen('file2.txt', 'rb');
    $fout = fopen('output.txt', 'wb');

    $cache1 = array(); // file1 lines still waiting for a match, keyed by column 2
    $cache2 = array(); // file2 lines still waiting for a match, keyed by column 2

    // keep going until BOTH files are used up, so late matches aren't missed
    while (!feof($f1) || !feof($f2)) {
        $line1 = feof($f1) ? false : fgetcsv($f1);
        if ($line1 !== false && isset($line1[2])) { // skip EOF and blank lines
            $key = $line1[1];
            if (isset($cache2[$key])) {
                // saw the col2 value in file2 earlier, so do the math for the output file:
                fputcsv($fout, array($line1[0], $key, $line1[2] + $cache2[$key][2]));
                unset($cache2[$key]); // remove line from cache
            } else {
                $cache1[$key] = $line1; // cache the line until its col2 value shows up in file2
            }
        }

        $line2 = feof($f2) ? false : fgetcsv($f2);
        if ($line2 !== false && isset($line2[2])) {
            $key = $line2[1];
            if (isset($cache1[$key])) {
                fputcsv($fout, array($line2[0], $key, $cache1[$key][2] + $line2[2]));
                unset($cache1[$key]); // remove line from cache
            } else {
                $cache2[$key] = $line2; // cache the line until its col2 value shows up in file1
            }
        }
    }

    fclose($f1);
    fclose($f2);
    fclose($fout);
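To sanity-check the matching logic without any files, here is the same alternating two-cache loop as a function over in-memory rows. `match_rows` is a hypothetical helper name, and the rows are just the sample data above hard-coded as arrays; it's a sketch for checking the idea, not a drop-in replacement.

```php
<?php
// Hypothetical helper: same two-cache matching as the file loop,
// but over in-memory rows so the logic can be checked without files.
// Each row is array(col1, col2, col3); the two inputs are read alternately.
function match_rows(array $rows1, array $rows2)
{
    $cache1 = array(); // unmatched rows from $rows1, keyed by column 2
    $cache2 = array(); // unmatched rows from $rows2, keyed by column 2
    $out = array();
    $n = max(count($rows1), count($rows2));
    for ($i = 0; $i < $n; $i++) {
        if (isset($rows1[$i])) {
            $row = $rows1[$i];
            if (isset($cache2[$row[1]])) {
                // partner already seen in the second input: emit the summed row
                $out[] = array($row[0], $row[1], $row[2] + $cache2[$row[1]][2]);
                unset($cache2[$row[1]]);
            } else {
                $cache1[$row[1]] = $row;
            }
        }
        if (isset($rows2[$i])) {
            $row = $rows2[$i];
            if (isset($cache1[$row[1]])) {
                // partner already seen in the first input: emit the summed row
                $out[] = array($row[0], $row[1], $cache1[$row[1]][2] + $row[2]);
                unset($cache1[$row[1]]);
            } else {
                $cache2[$row[1]] = $row;
            }
        }
    }
    return $out;
}

$file1 = array(array('abc', 12, 34), array('abc', 56, 78), array('abc', 90, 12));
$file2 = array(array('abc', 90, 87), array('abc', 12, 67), array('abc', 23, 1));
print_r(match_rows($file1, $file2));
```

With the sample data this should give the two matched rows, key 12 with 101 and key 90 with 99; the unmatched keys 56 and 23 are simply left in the caches.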
This is off the top of my head and untested, so it probably won't work as-is. YMMV, etc.
It would make things easier if you pre-sort both input files using column 2 as the sort key. That keeps the cache small, because you'd know when you had already passed any possible matching value and could dump the previously cached data.
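Taken all the way, pre-sorting lets you drop the cache entirely: walk both inputs in step and always advance the side with the smaller key, since that key can no longer match anything. This is a standard merge join, swapped in here rather than taken from the code above. A minimal sketch over in-memory rows, assuming both inputs are already sorted numerically by column 2 and each column-2 value appears at most once per file; `merge_sorted` is a hypothetical name.

```php
<?php
// Sketch of the pre-sorted variant (a merge join). Assumes both inputs are
// sorted ascending by column 2 (numerically) and each column-2 value appears
// at most once per input. No cache is needed: the smaller key can't match later.
function merge_sorted(array $rows1, array $rows2)
{
    $out = array();
    $i = 0;
    $j = 0;
    while ($i < count($rows1) && $j < count($rows2)) {
        $k1 = $rows1[$i][1];
        $k2 = $rows2[$j][1];
        if ($k1 < $k2) {
            $i++; // this key from the first input has no partner; skip it
        } elseif ($k1 > $k2) {
            $j++; // this key from the second input has no partner; skip it
        } else {
            // keys match: emit col1 from the first input and the summed col3
            $out[] = array($rows1[$i][0], $k1, $rows1[$i][2] + $rows2[$j][2]);
            $i++;
            $j++;
        }
    }
    return $out;
}

// The sample data, pre-sorted by column 2.
$file1 = array(array('abc', 12, 34), array('abc', 56, 78), array('abc', 90, 12));
$file2 = array(array('abc', 12, 67), array('abc', 23, 1), array('abc', 90, 87));
print_r(merge_sorted($file1, $file2));
```

For real files, the two array indexes would become two `fgetcsv()` streams, reading the next line only from whichever file had the smaller key.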