How to use awk and grep on a 300 GB .txt file?

I have a large .txt file, 300 GB to be precise, and I would like to extract every distinct value from the first column that matches my pattern into another .txt file.

awk '{print $1}' file_name | grep -o '/ns/.*' | awk '!seen[$0]++' > test1.txt

This is what I tried; it seems to work at first, but after a while I get the following error:

awk: program limit exceeded: maximum number of fields size=32767
    FILENAME="file_name" FNR=117897124 NR=117897124

Any suggestions?

+4
5 answers

The error message tells you:

line 117897124 has too many fields (more than 32767).

You'd better check that line:

sed -n '117897124{p;q}' file_name
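
For example (a quick check, assuming the fields are space-separated), you can count how many spaces are on that line with tr and wc instead of awk, which would hit the same limit:

sed -n '117897124{p;q}' file_name | tr -cd ' ' | wc -c

tr -cd ' ' deletes every character except spaces, so wc -c prints the number of field separators on that line.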

Use cut to retrieve the 1st column:

cut -d ' ' -f 1 < file_name | ...

Adjust the delimiter to match your file: ' ' for spaces, $'\t' for tabs.
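
A minimal sketch of the full pipeline with cut in place of the first awk (assuming space-separated columns and the same /ns/ pattern from the question):

cut -d ' ' -f 1 < file_name | grep -o '/ns/.*' | awk '!seen[$0]++' > test1.txt

cut is not subject to awk's field-count limit, and the final awk only ever sees the already-extracted first column.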

+2

Since grep -o only keeps the part of the line that matches anyway, you can replace both the first awk and the grep with a single sed that prints just the wanted prefix of each line:

sed -n 's/\(^pattern...\).*/\1/p' some_file | awk '!seen[$0]++' > test1.txt

keeping the awk for the deduplication (sed cannot easily remove duplicates across a whole file on its own).
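
As an illustration only (assuming the wanted part starts at the beginning of the line and the first column ends at the first space; adjust the anchor if /ns/ can appear mid-column), the sed for the question's pattern might look like this:

sed -n 's/^\(\/ns\/[^ ]*\).*/\1/p' file_name | awk '!seen[$0]++' > test1.txt

The awk stage is unchanged from the question and still does the deduplication.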

+2

It looks like your awk hits its limit at line 117,897,124, so have a look at that line first.

As a workaround, you could write a small script that splits the file with split, say into chunks of 100,000,000 lines each, and processes the chunks separately (see the sketch at the end of this answer).


Also check the limits in place for awk on your system; even if everything that can be raised is set to unlimited, the maximum number of fields is a limit of the awk implementation itself, so ...
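
A rough sketch of the split approach (the chunk size and the chunk_ prefix are arbitrary; cut is used instead of awk so the oversized line cannot trigger the field limit, and deduplication has to happen across all chunks at the end):

split -l 100000000 file_name chunk_
for f in chunk_*; do
    cut -d ' ' -f 1 "$f" | grep -o '/ns/.*'
done | sort -u > test1.txt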

+2

If you have enough free disk space (Vim will create a temporary .swp file), you could open the file in Vim and do the filtering with Vim's regex syntax; http://thewebminer.com/regex-to-vim can help translate a regex into Vim's flavour.

0

You never need grep when you are using awk. The limit you are hitting comes from field splitting, and since you only care about column 1 you can tell awk not to split into fields at all (set FS to the record separator) and strip everything after the first whitespace yourself:

awk 'BEGIN{FS=RS} {sub(/[[:space:]].*/,"")} /\/ns\// && !seen[$0]++' file_name

or, if you do not mind the output order changing (and want to keep awk's memory use down), let sort do the deduplication:

awk 'BEGIN{FS=RS} {sub(/[[:space:]].*/,"")} /\/ns\//' file_name | sort -u

The second version keeps almost nothing in awk's memory, but the output ends up sorted rather than in the original order.
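
For example, on a hypothetical three-line input (made up here purely to show the behaviour):

printf '%s\n' '/ns/foo x1' '/ns/foo x2' 'other y' > sample.txt
awk 'BEGIN{FS=RS} {sub(/[[:space:]].*/,"")} /\/ns\// && !seen[$0]++' sample.txt

This prints a single line, /ns/foo: the second occurrence is dropped by seen[], and the line without /ns/ never matches.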

0