I have a diff that essentially boils down to either additional unique lines or to lines that have moved around in the file, and so have different line numbers. To work out which lines are genuinely new additions, I run this little perl snippet to split the "resolved" lines from the "unresolved" ones:
perl -n -e' /^\-([^\-].*?)\([^,\(]+,\d+,\d+\).*$/ && do { print STDOUT "$1\n"; next; }; /^\+([^\+].*?)\([^,\(]+,\d+,\d+\).*$/ && do { print STDERR "$1\n"; next; }; ' "$delta" 1>resolved 2>unresolved
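For concreteness, this is roughly how I expect the split to behave on a couple of made-up input lines (the trailing "(name,line,count)" tag is an assumption about my diff format; it is what the regexes appear to anchor on):

delta=sample.diff
printf '%s\n' '-a line that merely moved(foo.c,10,3)' '+a genuinely new line(foo.c,20,5)' > "$delta"
perl -n -e' /^\-([^\-].*?)\([^,\(]+,\d+,\d+\).*$/ && do { print STDOUT "$1\n"; next; }; /^\+([^\+].*?)\([^,\(]+,\d+,\d+\).*$/ && do { print STDERR "$1\n"; next; }; ' "$delta" 1>resolved 2>unresolved
# resolved   now contains: a line that merely moved
# unresolved now contains: a genuinely new line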
That is actually pretty quick, and it does the job: it splits a 6000+ line diff into two files of a little over 3000 lines each, stripping out any references to line numbers and the unified-diff markup. Next comes the grep command, which appears to run at 100% CPU for almost 9 minutes (real time):
grep -v -f resolved unresolved
That is essentially removing all of the resolved lines from the unresolved file. After the 9 minutes the output happens, coincidentally, to be 9 lines: the unique additions, i.e. the unresolved lines.
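To illustrate what I mean by that, here is a tiny sanity check of the step; the file contents below are invented:

printf '%s\n' 'a line that merely moved' 'another moved line' > resolved
printf '%s\n' 'a line that merely moved' 'a genuinely new line' > unresolved
grep -v -f resolved unresolved
# prints only "a genuinely new line": each line of "resolved" is used as a
# pattern, and -v suppresses every line of "unresolved" that matches one of them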
Firstly, grep has always been quick in my experience, so why is it so exceptionally slow and CPU-hungry in this case?
Secondly, is there a more efficient way to remove from one file every line that is contained in another file?
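For instance, would forcing fixed-string, whole-line matching, or sorting both files and taking their set difference with comm, be the idiomatic way to do this? Something like the following untested sketches, assuming exact whole-line comparison is all I need:

# match the "resolved" lines as fixed strings (-F) against whole lines (-x)
grep -F -x -v -f resolved unresolved

# or compute the set difference with comm, which requires both inputs sorted;
# -13 suppresses lines unique to "resolved" and lines common to both,
# leaving only the lines unique to "unresolved"
comm -13 <(sort resolved) <(sort unresolved)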