grep: "Argument list too long" with multiple patterns

I am searching a 90 GB file for several patterns in a specific field (positions 6-17 on each line). I want to get every line whose field contains any number from a particular list. The current syntax I'm using is:

grep '^.\{6\}0000000012345\|^.\{6\}0000000012543' somelargeFile.txt > outputFile.txt

For a small number of patterns, this works. For a large number of patterns, I get the "Argument list too long" error.

One alternative I tried is to search for each pattern separately (with a for loop over the patterns), but that requires a separate pass through the large data file (57102722 lines) for every pattern, which is inefficient.

From what I understand about the "Argument list too long" error, it is related to bash commands in general and is not specific to grep. Is there any parameter that can be used to work around this error? Or, alternatively, any ideas on how to do this with awk, sed, or another tool?

Thanks!

2 answers

You can avoid the problem by placing the patterns in a file and passing that file to grep with the -f command-line option.

The most convenient way is to put each alternative on its own line in the file:

patterns.txt

^.\{6\}0000000012345
^.\{6\}0000000012543

Then call:

grep -f patterns.txt somelargeFile.txt > outputFile.txt
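
With many IDs, the pattern file does not have to be written by hand. The following is a minimal sketch, assuming the IDs are available one per line in a file called ids.txt (a hypothetical name) and that sample.txt stands in for the real data file. The 6-character offset is taken from the pattern in the question.

```shell
cd "$(mktemp -d)"   # scratch directory so the demo leaves nothing behind

# ids.txt (hypothetical name): one 13-digit ID per line
printf '%s\n' 0000000012345 0000000012543 > ids.txt

# Prefix every ID with ^.\{6\} so it can only match at offset 6
sed 's/^/^.\\{6\\}/' ids.txt > patterns.txt

# Small stand-in for somelargeFile.txt
printf '%s\n' \
  'abcdef0000000012345 first' \
  'abcdef0000000099999 second' \
  'uvwxyz0000000012543 third' > sample.txt

# grep reads the patterns from the file, so the command line stays short
grep -f patterns.txt sample.txt > outputFile.txt
```

Because the patterns are read from a file rather than passed as an argument, the length of the command line no longer depends on the number of IDs. With a very large pattern list, grep may still need noticeable time and memory to process the patterns.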

Try using the alternation operator.

grep '^.\{6\}0000000012\(345\|543\)'
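
For the awk route that the question asks about, a lookup on the fixed-width field avoids regular expressions entirely and reads the data file only once. This is a minimal sketch, not a tuned solution. It assumes the IDs are in a hypothetical ids.txt, one per line, and that the field is the 13 characters starting at offset 6, which is what the grep pattern in the question matches. The file sample.txt stands in for the real data file.

```shell
cd "$(mktemp -d)"   # scratch directory so the demo leaves nothing behind

# ids.txt (hypothetical name): one 13-digit ID per line
printf '%s\n' 0000000012345 0000000012543 > ids.txt

# Small stand-in for somelargeFile.txt
printf '%s\n' \
  'abcdef0000000012345 first' \
  'abcdef0000000099999 second' \
  'uvwxyz0000000012543 third' > sample.txt

# While reading the first file (NR == FNR), store each ID as an array key.
# For the second file, print lines whose characters 7-19 are a stored key.
awk 'NR == FNR { ids[$0]; next } substr($0, 7, 13) in ids' \
  ids.txt sample.txt > outputFile.txt
```

If the field really occupies different columns, only the two numbers passed to substr need to change.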

Source: https://habr.com/ru/post/1598404/