How to use awk and grep on a 300 GB .txt file?

I have a large .txt file, 300 GB to be precise, and I would like to extract every distinct value from the first column that matches my pattern into another .txt file.

awk '{print $1}' file_name | grep -o '/ns/.*' | awk '!seen[$0]++' > test1.txt

This is what I tried; it seems to work at first, but after a while I get the following error:

awk: program limit exceeded: maximum number of fields size=32767
    FILENAME="file_name" FNR=117897124 NR=117897124

Any suggestions?

+4
5 answers

The error message tells you:

line 117897124 has too many fields (more than 32767).

You'd better check that line:

sed -n '117897124{p;q}' file_name
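
For example (a quick check, assuming the fields are space-separated), you can count how many spaces are on that line with tr and wc instead of awk, which would hit the same limit:

sed -n '117897124{p;q}' file_name | tr -cd ' ' | wc -c

tr -cd ' ' deletes every character except spaces, so wc -c prints the number of field separators on that line.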

Use cut to retrieve the 1st column:

cut -d ' ' -f 1 < file_name | ...

Adjust the delimiter to match your file: ' ' for spaces, $'\t' for tabs.
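
A minimal sketch of the full pipeline with cut in place of the first awk (assuming space-separated columns and the same /ns/ pattern from the question):

cut -d ' ' -f 1 < file_name | grep -o '/ns/.*' | awk '!seen[$0]++' > test1.txt

cut is not subject to awk's field-count limit, and the final awk only ever sees the already-extracted first column.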

+2

Since grep -o only keeps the part of the line that matches anyway, you can replace both the first awk and the grep with a single sed that prints just the wanted prefix of each line:

sed -n 's/\(^pattern...\).*/\1/p' some_file | awk '!seen[$0]++' > test1.txt

keeping the awk for the deduplication (sed cannot easily remove duplicates across a whole file on its own).
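
As an illustration only (assuming the wanted part starts at the beginning of the line and the first column ends at the first space; adjust the anchor if /ns/ can appear mid-column), the sed for the question's pattern might look like this:

sed -n 's/^\(\/ns\/[^ ]*\).*/\1/p' file_name | awk '!seen[$0]++' > test1.txt

The awk stage is unchanged from the question and still does the deduplication.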

+2

It looks like your awk hits its limit at line 117,897,124, so have a look at that line first.

As a workaround, you could write a small script that splits the file with split, say into chunks of 100,000,000 lines each, and processes the chunks separately (see the sketch at the end of this answer).


Also check the limits in place for awk on your system; even if everything that can be raised is set to unlimited, the maximum number of fields is a limit of the awk implementation itself, so ...
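
A rough sketch of the split approach (the chunk size and the chunk_ prefix are arbitrary; cut is used instead of awk so the oversized line cannot trigger the field limit, and deduplication has to happen across all chunks at the end):

split -l 100000000 file_name chunk_
for f in chunk_*; do
    cut -d ' ' -f 1 "$f" | grep -o '/ns/.*'
done | sort -u > test1.txt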

+2

If you have enough free disk space (Vim will create a temporary .swp file), you could open the file in Vim and do the filtering with Vim's regex syntax; http://thewebminer.com/regex-to-vim can help translate a regex into Vim's flavour.

0

You never need grep when you are using awk. The limit you are hitting comes from field splitting, and since you only care about column 1 you can tell awk not to split into fields at all (set FS to the record separator) and strip everything after the first whitespace yourself:

awk 'BEGIN{FS=RS} {sub(/[[:space:]].*/,"")} /\/ns\// && !seen[$0]++' file_name

or, if you do not mind the output order changing (and want to keep awk's memory use down), let sort do the deduplication:

awk 'BEGIN{FS=RS} {sub(/[[:space:]].*/,"")} /\/ns\//' file_name | sort -u

The second version keeps almost nothing in awk's memory, but the output ends up sorted rather than in the original order.
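
For example, on a hypothetical three-line input (made up here purely to show the behaviour):

printf '%s\n' '/ns/foo x1' '/ns/foo x2' 'other y' > sample.txt
awk 'BEGIN{FS=RS} {sub(/[[:space:]].*/,"")} /\/ns\// && !seen[$0]++' sample.txt

This prints a single line, /ns/foo: the second occurrence is dropped by seen[], and the line without /ns/ never matches.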

0