How can I print 2 lines if the second line matches the first line?

Let's say I have a file with several million lines organized like this:

@1:N:0:ABC
XYZ

@1:N:0:ABC
ABC

I am trying to write a grep / sed / awk one-liner that returns both lines if the string at the end of the first line (ABC in the example above) also occurs in the second line.

When I try to use grep -A1 -P and pipe the matches into another match, like '(?<=:)[A-Z]{3}', I get stuck. I can't work out how to get any further from here.

+4
3 answers

With awk

$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
@1:N:0:ABC
ABC
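  • -F: uses : as the field delimiter, which makes it easy to get the last column
  • s=$NF; p=$0 saves the last column value and the entire line for printing later
  • NF==1 is true if the line does not contain :
  • $0 ~ s is true if the line matches the last column value saved from the previous line
    • if that value can contain regex metacharacters, use index($0,s) instead to search for it literally
  • note that this assumes the input alternates as in the question: a line containing : followed by a line that does not contain :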
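To illustrate the difference between $0 ~ s and index($0,s), here is a minimal sketch. The sample data is made up for the illustration (a last field containing a dot), and the variable names sample, regex_out and literal_out are not from the answer.

```shell
# Hypothetical sample: the first header ends in "A.C", which as a regex
# also matches "ABC" because "." matches any character.
sample=$(mktemp)
printf '%s\n' '@1:N:0:A.C' 'ABC' '' '@1:N:0:ABC' 'ABC' > "$sample"

# Regex match: s is used as a pattern, so both pairs are reported.
regex_out=$(awk -F: 'NF==1 && $0 ~ s {print p ORS $0} {s=$NF; p=$0}' "$sample")

# Literal match: index() searches for s as a plain substring,
# so only the pair that really contains "ABC" is reported.
literal_out=$(awk -F: 'NF==1 && index($0, s) {print p ORS $0} {s=$NF; p=$0}' "$sample")

printf 'regex:\n%s\n\nliteral:\n%s\n' "$regex_out" "$literal_out"
rm -f "$sample"
```

If the data only ever contains plain letters, as in the question, both variants should behave the same.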


With GNU sed (might work with other versions too, though the syntax may differ)

$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
@1:N:0:ABC
ABC
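  • /:/ if the line contains :
  • N append the next line to the pattern space
  • /.*:(.*)\n.*\1/ capture the string after the last : and check whether it is present in the next line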

Again, this assumes input like that shown in the question. It won't work for cases like:

@1:N:0:ABC
@1:N:0:XYZ
XYZ
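A quick way to check this is to run the sed command on both shapes of input. This is only a sketch assuming GNU sed; the temporary files and the variable names good_out and tricky_out are made up for the illustration.

```shell
# Input shaped like the question: every ":" line is followed by a plain line.
good=$(mktemp)
printf '%s\n' '@1:N:0:ABC' 'XYZ' '' '@1:N:0:ABC' 'ABC' > "$good"

# Input with two ":" lines in a row.
tricky=$(mktemp)
printf '%s\n' '@1:N:0:ABC' '@1:N:0:XYZ' 'XYZ' > "$tricky"

good_out=$(sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' "$good")

# Here N joins line 1 with line 2, the check fails, and line 3 then starts
# a new cycle on its own, so the XYZ pair is never compared.
tricky_out=$(sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' "$tricky")

printf 'question-style input: [%s]\n' "$good_out"
printf 'consecutive headers:  [%s]\n' "$tricky_out"
rm -f "$good" "$tricky"
```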
+6

Assuming your Input_file looks like the sample shown, you could try the following.

awk -v FS="[: \n]" -v RS="" '$(NF-1)==$NF'  Input_file

EDIT: Adding a variant suggested by Sundeep.

awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' Input_file
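The two commands differ in what counts as a match: the first requires the last two fields to be equal, while the second only requires the header's last field to occur somewhere in the following line. Here is a rough sketch of that difference, using made-up data (the TTABCTT record is not from the question) and hypothetical variable names.

```shell
# Three blank-line-separated records; the third contains ABC inside a longer string.
records=$(mktemp)
printf '%s\n' '@1:N:0:ABC' 'XYZ' '' '@1:N:0:ABC' 'ABC' '' '@1:N:0:ABC' 'TTABCTT' > "$records"

# RS set to the empty string enables paragraph mode: each block separated
# by blank lines is one record, and newline also separates fields.
exact_out=$(awk -v FS='[:\n]' -v RS= '$(NF-1)==$NF' "$records")

# index() returns a non-zero position when $(NF-1) is a substring of $NF.
substr_out=$(awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' "$records")

printf 'exact:\n%s\n\nsubstring:\n%s\n' "$exact_out" "$substr_out"
rm -f "$records"
```

Which one is appropriate depends on whether the second line may contain extra characters around the string.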
+3

This may work for you (GNU sed):

sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file

Use the grep-like option -n to print lines explicitly. Read two lines into the pattern space and print both if they meet the requirements. Always delete the first line and repeat.
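Because D removes only the first line and restarts the cycle with what is left, every line gets compared with the one after it. A small sketch of that behaviour, assuming GNU sed and using an illustrative input with two header lines in a row (the file and variable names are made up):

```shell
# Two ":" lines in a row, followed by a line that matches the second one.
input=$(mktemp)
printf '%s\n' '@1:N:0:ABC' '@1:N:0:XYZ' 'XYZ' > "$input"

# Cycle 1: lines 1+2 are joined, no match, D drops line 1.
# Cycle 2: lines 2+3 are joined, XYZ is found again after the newline, p prints both.
pair_out=$(sed -n 'N;/.*:\(.*\)\n.*\1/p;D' "$input")

printf '%s\n' "$pair_out"
rm -f "$input"
```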

+3

Source: https://habr.com/ru/post/1695962/

