How to quickly delete the lines in a file that contain items from a list in another file in Bash?

I have a file named words.txt containing a list of words. I also have a file named file.txt containing one sentence per line. I need to quickly remove any line in file.txt that contains one of the lines from words.txt, but only if the match is found somewhere between { and }.

E.g. file.txt:

Once upon a time there was a cat.
{The cat} lived in the forest.
The {cat really liked to} eat mice.

E.g. words.txt:

cat
mice

Output Example:

Once upon a time there was a cat.

The other two lines are deleted because "cat" appears on both of them, and in each case the word is between { and }.

The following script successfully completes this task:

while read -r line
do
    sed -i "/{.*$line.*}/d" file.txt
done < words.txt

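However, the script is very slow. words.txt can contain thousands of items, so the while loop takes a long time. I tried the sed -f option, which appears to read commands from a file, but I could not find any documentation that explains how to use it.

How can I speed up the script?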

+4
6

awk:

awk 'NR==FNR{a["{[^{}]*"$0"[^{}]*}"]++;next}{for(i in a)if($0~i)next;b[j++]=$0}END{printf "">FILENAME;for(i=0;i in b;++i)print b[i]>FILENAME}' words.txt file.txt

This rewrites file.txt in place. With the example data, the file then contains:

Once upon a time there was a cat.

The same script in a more readable form:

awk '
    NR == FNR {
        a["{[^{}]*" $0 "[^{}]*}"]++
        next
    }
    {
        for (i in a)
            if ($0 ~ i)
                next
        b[j++] = $0
    }
    END {
        printf "" > FILENAME
        for (i = 0; i in b; ++i)
            print b[i] > FILENAME
    }
' words.txt file.txt

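That said, it is probably better not to overwrite the file and to let awk print to stdout instead. That version looks like this: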

awk '
    NR == FNR {
        a["{[^{}]*" $0 "[^{}]*}"]++
        next
    }
    {
        for (i in a)
            if ($0 ~ i)
                next
    }
    1
' words.txt file.txt
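
If the result still has to end up back in file.txt, the stdout version can be wrapped so that the original is replaced only after awk has succeeded. This is a minimal sketch, not part of the answer above: filter_in_place is a hypothetical helper name, and the pattern uses [{] and [}] instead of bare braces, an assumption made here so that the braces are taken literally by any awk.

```shell
# Sketch: run the stdout-style awk filter, then swap the result into place.
# filter_in_place is a hypothetical helper name.
filter_in_place() {
    words=$1
    file=$2
    # Create the temporary file next to the target so mv is a plain rename.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk '
        # First file: store one brace-delimited regex per word.
        NR == FNR { a["[{][^{}]*" $0 "[^{}]*[}]"] = 1; next }
        # Second file: skip the line as soon as any regex matches.
        { for (re in a) if ($0 ~ re) next }
        1
    ' "$words" "$file" > "$tmp"
    then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as filter_in_place words.txt file.txt. Note that mktemp creates the file with restrictive permissions, and the replaced file.txt keeps those.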
+4

grep can match the 2 files against each other:

grep -vf words.txt file.txt
+2

I think you can simply use grep for this:

grep -f words.txt -v file.txt
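  • The -f option makes grep read its patterns from words.txt.
  • The -v option inverts the match, that is, it keeps only the lines that match none of the patterns.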

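This does not handle the {} constraint, but that is easy to work around, for example by adding the braces to the pattern file (or to a temporary file created at run time).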
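
To see why the braces matter, the sample data from the question can be run through grep both ways. The sketch below is only an illustration: the scratch directory and the patterns.tmp file name are assumptions, and the brace wrapping uses a simple sed substitution.

```shell
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'Once upon a time there was a cat.' \
              '{The cat} lived in the forest.' \
              'The {cat really liked to} eat mice.' > "$dir/file.txt"
printf '%s\n' 'cat' 'mice' > "$dir/words.txt"

# Plain patterns: a word anywhere on the line removes that line.
# grep exits non-zero when nothing is left, hence the "|| true".
plain=$(grep -vf "$dir/words.txt" "$dir/file.txt" || true)

# Brace-wrapped patterns, written to a temporary file at run time.
sed 's/.*/{.*&.*}/' "$dir/words.txt" > "$dir/patterns.tmp"
wrapped=$(grep -vf "$dir/patterns.tmp" "$dir/file.txt" || true)

printf 'plain:   %s\n' "${plain:-<no lines left>}"
printf 'wrapped: %s\n' "$wrapped"
rm -r "$dir"
```

With the plain patterns every sample line is dropped, because "cat" also appears outside the braces on the first line. With the wrapped patterns only the first line survives, which is the output the question asks for.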

+2

Alternatively, with sed and grep:

sed -e 's/.*/{.*&.*}/' words.txt | grep -vf- file.txt > out ; mv out file.txt

Here words.txt is converted into a "pattern file" for grep.
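
The sed -f option mentioned in the question can be used along the same lines: generate one delete command per word and let sed apply all of them in a single pass over file.txt. None of the answers here do this, so treat it as a sketch under assumptions: delete_with_sed_script is a hypothetical helper name, and the words are assumed to contain no / and no regex metacharacters.

```shell
# Sketch: build a sed script with one "/{.*word.*}/d" command per word,
# then run it once over the file. Output goes to stdout.
# delete_with_sed_script is a hypothetical helper name.
delete_with_sed_script() {
    words=$1
    file=$2
    script=$(mktemp) || return 1
    # "|" is the s-command delimiter so that "/" can appear in the replacement.
    sed 's|.*|/{.*&.*}/d|' "$words" > "$script"
    sed -f "$script" "$file"
    status=$?
    rm -f "$script"
    return "$status"
}
```

Called as delete_with_sed_script words.txt file.txt > out, it reads file.txt once instead of rewriting it once per word.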

+2

This can be done in two steps:

  • Wrap every line of words.txt in {.* and .*}:

    awk '{ print "{.*" $0 ".*}" }' words.txt > wrapped.txt
    
  • Use grep with an inverted match against the wrapped patterns:

    grep -v -f wrapped.txt file.txt
    

Even if words.txt is large, the awk step (a single pass over words.txt) takes very little time.

Alternatively, as a single pipeline with no intermediate file:

awk '{ print "{.*" $0 ".*}" }' words.txt | grep -v -f - file.txt

The - after -f tells grep to read the patterns from stdin.


If words.txt is not too big, you can do the whole thing in awk:

awk 'NR==FNR{a[$0]++;next}{p=1;for(i in a){if ($0 ~ "{.*" i ".*}") { p=0; break}}}p' words.txt file.txt

Expanded:

awk 'NR==FNR { a[$0]++; next }
     { 
         p=1
         for (i in a) {
             if ($0 ~ "{.*" i ".*}") { p=0; break }
         }
     }p' words.txt file.txt

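First, the words are read from words.txt into an array. Then each line of file.txt is processed. The flag p is set to true, and the line is tested against every word. If a word matches between braces, p is set to false. If p is still true after the loop, the line is printed.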

+1

bash (4.x):

#!/usr/bin/env bash
# ^-- MUST run under bash 4 or later (readarray, [[ =~ ]]), NOT /bin/sh

readarray -t words <words.txt          # read words into array
IFS='|'                                # use | as delimiter when expanding $*
words_re="[{].*(${words[*]}).*[}]"     # form a regex matching all words
while read -r; do                      # for each line in file...
  if ! [[ $REPLY =~ $words_re ]]; then # ...check whether it matches...
    printf '%s\n' "$REPLY"             # ...and print it if not.
  fi
done <file.txt

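Admittedly, pure bash will be slower than awk, but it still reads each file only once (O(n+m), whereas the sed -i loop is O(n*m)), so it is much faster than the original loop.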
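
To make the combined pattern concrete: for the sample words.txt, the regex built above is [{].*(cat|mice).*[}]. The sketch below builds the same string with paste instead of readarray (a substitution made here so that it also runs in a plain POSIX shell), and build_words_re is a hypothetical helper name.

```shell
# Sketch: join the lines of a word list with "|" and wrap the alternation
# in the brace pattern. build_words_re is a hypothetical helper name.
build_words_re() {
    # paste -s joins all lines of the file into one, separated by "|".
    printf '[{].*(%s).*[}]' "$(paste -s -d '|' "$1")"
}
```

The resulting string is an extended regular expression, so it can also be passed to grep, for example grep -Ev "$(build_words_re words.txt)" file.txt, which likewise reads each file only once.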

+1

Source: https://habr.com/ru/post/1543557/

