How to crop a file by the number of characters in a specific column

I have a file with 4 columns separated by the character ;.

In some rows the 3rd or 4th column is huge, with over 10,000 characters.

How can I delete every row in which one of those columns exceeds 10000 characters, no matter which of the two it is?

I tried this:

awk '{i += (length() + 1); if (i <= 10000) print $ALL}' 

But it counts characters across the whole file, not in a specific column, and I need the length of the column, whether that is the 3rd, the 4th or, possibly, both.

TIA

4 answers

You can use this awk:

awk -F ';' 'length($3)<10000 && length($4)<10000' file

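This prints only the lines where both the 3rd and the 4th column are shorter than 10000 characters. If either one has a length >= 10000, the line is skipped.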
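To check the boundary behaviour, the same filter can be sketched in Python. The helper name `keep_row` and its parameters are made up for illustration, and the sketch assumes plain `;`-separated lines without quoting. Note that the awk one-liner compares with `<`, so a column of exactly 10000 characters is dropped as well; comparing with `<=` would keep it.

```python
def keep_row(line, limit=10000, columns=(3, 4), strict=True):
    """Return True if the row should be kept.

    columns are 1-based, like awk's $3 and $4.
    strict=True mirrors length($n) < limit; strict=False uses <= instead.
    """
    fields = line.rstrip("\n").split(";")
    for col in columns:
        # a missing column counts as empty, as it does in awk
        value = fields[col - 1] if col <= len(fields) else ""
        if strict and len(value) >= limit:
            return False
        if not strict and len(value) > limit:
            return False
    return True


exact = "a;b;" + "x" * 10000 + ";d"   # 3rd column is exactly 10000 characters
print(keep_row(exact))                # False: 10000 is not < 10000
print(keep_row(exact, strict=False))  # True: 10000 <= 10000
```

Which comparison is right depends on whether "exceeds 10000" should still allow exactly 10000 characters.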


If it does not matter which column is too long, any of these will do:

$ cat file
a;b;c
d;efg;h
i;j;klm
opqr;s;t
uv;wx;yz

$ egrep -v '[^;]{3}' file
a;b;c
uv;wx;yz

$ awk '!/[^;]{3}/' file
a;b;c
uv;wx;yz

$ sed -r '/[^;]{3}/d' file
a;b;c
uv;wx;yz

Just change "3" to 10001 (one more than the largest length you want to keep) for the real data.
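The same "any column" test can also be written without a regex. A minimal Python sketch, assuming simple `;`-separated lines with no quoting; `has_long_field` is a made-up helper name, and the limit of 2 matches the small demo above:

```python
def has_long_field(line, limit):
    """Return True if any ';'-separated field is longer than limit characters."""
    return any(len(field) > limit for field in line.rstrip("\n").split(";"))


rows = ["a;b;c", "d;efg;h", "i;j;klm", "opqr;s;t", "uv;wx;yz"]

# keep rows where no field exceeds 2 characters, like grep -v '[^;]{3}'
kept = [row for row in rows if not has_long_field(row, 2)]
print(kept)  # ['a;b;c', 'uv;wx;yz']
```

For the real file the limit would be 10000, which corresponds to the `{10001}` repetition count in the regex versions.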


Via sed (GNU sed, because the \| alternation is a GNU extension to basic regular expressions),

sed '/^[^;]*;[^;]*;\([^;]\{10001\}[^;]*;[^;]*\|[^;]*;[^;]\{10001\}[^;]*\)$/d' file

Via python

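import csv
import sys

with open('/path/to/input/file', newline='') as infile:
    reader = csv.reader(infile, delimiter=";")
    # write the kept rows back out in the same ;-separated format
    writer = csv.writer(sys.stdout, delimiter=";", lineterminator="\n")
    for row in reader:
        # keep the row only if neither the 3rd nor the 4th column exceeds 10000 characters
        if len(row[2]) <= 10000 and len(row[3]) <= 10000:
            writer.writerow(row)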

This should work:

sed -n '/[^;]\{10001\}/!p' input

or this:

sed '/[^;]\{10001\}/d' input

Source: https://habr.com/ru/post/1568643/

