How to crop a file by the number of characters in a specific column

Question


I have a file with 4 columns separated by the character ;.

Some rows have huge values in the 3rd or 4th column, over 10,000 characters long.

How can I delete every row in which the 3rd or 4th column is longer than 10000 characters?

I tried this:

awk '{i += (length() + 1); if (i <= 10000) print $ALL}'

But that counts characters across the whole file rather than in a specific column. I need to test the length of the column itself, whether the long value is in the 3rd column, the 4th, or both.

TIA

+4

python bash awk sed

Andy k Dec 22 '14 at 16:34


4 answers

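For example, to delete the lines in which any field is 3 or more characters long, any of these will work: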

$ cat file
a;b;c
d;efg;h
i;j;klm
opqr;s;t
uv;wx;yz

$ egrep -v '[^;]{3}' file
a;b;c
uv;wx;yz

$ awk '!/[^;]{3}/' file
a;b;c
uv;wx;yz

$ sed -r '/[^;]{3}/d' file
a;b;c
uv;wx;yz

Change "3" to 10001 (or whatever your real limit is) for the actual file.
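Since the question is also tagged python, the same idea can be sketched there: a run of limit + 1 consecutive non-; characters means some field is longer than the limit. This is only an illustration; the function name drop_long_field_lines and the sample list are made up for the example, and like the commands above it checks every column, not just the 3rd and 4th.

```python
import re


def drop_long_field_lines(lines, limit):
    """Yield the lines in which no ';'-separated field is longer than `limit`."""
    # limit + 1 consecutive non-';' characters can only occur inside a single
    # field, so a match means that field exceeds `limit`.
    # '\n' is excluded so a trailing newline is never counted as field content.
    too_long = re.compile(r"[^;\n]{%d}" % (limit + 1))
    for line in lines:
        if not too_long.search(line):
            yield line


# Same sample data as the 'cat file' listing above; limit=2 mirrors the {3} regex.
sample = ["a;b;c", "d;efg;h", "i;j;klm", "opqr;s;t", "uv;wx;yz"]
kept = list(drop_long_field_lines(sample, limit=2))
print("\n".join(kept))
```

For the real file you would pass an open file object and limit=10000 instead of the sample list.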

+5

Ed Morton Dec 22 '14 at 20:33

Via sed,

sed '/^[^;]*;[^;]*;\([^;]\{10001\}[^;]*;[^;]*\|[^;]*;[^;]\{10001\}[^;]*\)$/d' file

Via python

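import csv
import sys

with open('/path/to/input/file', newline='') as infile:
    reader = csv.reader(infile, delimiter=";")
    # write rows back out in the same ;-separated form instead of printing Python lists
    writer = csv.writer(sys.stdout, delimiter=";", lineterminator="\n")
    for row in reader:
        # keep the row only if columns 3 and 4 both have at most 10000 characters
        if len(row[2]) <= 10000 and len(row[3]) <= 10000:
            writer.writerow(row)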

+4

Avinash raj Dec 22 '14 at 16:39


This should work:

sed -n '/[^;]\{10001\}/!p' input

or this:

sed '/[^;]\{10001\}/d' input

+4

perreal Dec 22 '14 at 16:44


anubhava · Accepted Answer · 2014-12-22T16:35:49+0000

You can use this awk:

awk -F ';' 'length($3)<10000 && length($4)<10000' file

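This prints only the rows in which both the 3rd and 4th columns are shorter than 10000 characters. If either column has length >= 10000, the row is dropped. To also keep rows with exactly 10000 characters (the question says "exceeds 10000"), use <= instead of <.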
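To make the boundary behaviour easy to check, here is a rough Python sketch of the same column-specific test using a plain split on ; (no quote handling, much like awk -F ';'). The names keep_row, columns and limit are hypothetical, and a small limit of 3 stands in for 10000 so the effect is visible.

```python
def keep_row(line, columns=(3, 4), limit=10000, sep=";"):
    """Return True if every listed column (1-based) has at most `limit` characters."""
    fields = line.rstrip("\n").split(sep)
    for column in columns:
        # A column missing from a short row is treated as empty (an assumption
        # made here so short rows do not raise IndexError).
        value = fields[column - 1] if column <= len(fields) else ""
        if len(value) > limit:
            return False
    return True


rows = [
    "a;b;ccc;d",      # column 3 is exactly at the limit: kept
    "a;b;cccc;d",     # column 3 exceeds the limit: dropped
    "a;b;c;dddd",     # column 4 exceeds the limit: dropped
    "aaaaaaa;b;c;d",  # column 1 is long but is not checked: kept
]
kept_rows = [row for row in rows if keep_row(row, limit=3)]
print("\n".join(kept_rows))
```

Here a value of exactly limit characters is kept, which matches "exceeds" in the question; the awk one-liner above with < would drop it.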

