Remove duplicate text between markers

Question

I have a data file for fortune that contains many repeated fortunes. I would like to remove them.

Fortunes are delimited by %, so an example fortune file might look like this:

%
This is sample fortune 1
%
This is 
sample fortune 2
%
This fortune 
is repeated
%
This is sample fortune 3
%
This fortune 
is repeated
%
This fortune
is unique
%

As you can see, a fortune can span several lines, so the usual line-based solutions are useless here.

What can I do to find and remove duplicate fortunes? I was thinking of just finding a way to make awk ignore lines starting with %, but some fortunes share individual lines without being identical overall (for example, the last two in my example), so this is not enough.

awk , .

+4

bash duplicates

SnoringFrog Nov 03 '15 at 20:37

2 answers

With Awk, set the record separator to "%\n":

awk 'BEGIN{RS="%\n"} { if (! ($0 in fortunes)) { fortunes[$0]++; print $0 "%"} }' data
%
This is sample fortune 1
%
This is 
sample fortune 2
%
This fortune 
is repeated
%
This is sample fortune 3
%
This fortune
is unique
%
$

+4

Jonathan Leffler Nov 03 '15 at 20:46

hek2mgl · Accepted Answer · 2015-11-03T20:46:15+0000

awk:

awk 'seen[$0]{next}{seen[$0]=1}1' RS='%' ORS='%' fortune

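RS='%' sets the input record separator, so awk splits the file into records at each %.

seen[$0] is true if the current record has already been seen. $0 holds the whole record, that is, one complete fortune including its newlines. If it has been seen, next skips the record without printing it.

{seen[$0]=1} marks the record as seen. The trailing 1 is an always-true pattern, so awk runs the default action and prints the record. Duplicates never get that far, because of next.

ORS='%' sets the output record separator, so each printed record is followed by %.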
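
For comparison, here is a more verbose sketch of the same "remember what you have seen" idea that reads the file line by line instead of changing RS. It is not taken from either answer. The function name dedupe_fortunes is hypothetical, and it assumes that every delimiter is a line containing only % and that two fortunes count as duplicates only if they match exactly, trailing whitespace included.

```shell
# dedupe_fortunes FILE  (hypothetical helper, not part of fortune itself)
# Prints FILE with every repeated fortune removed, keeping the first copy.
dedupe_fortunes() {
  awk '
    # A line that is exactly "%" closes the fortune collected so far.
    $0 == "%" {
      if (NR == 1) {
        print "%"                  # leading delimiter: pass it through
      } else if (!(buf in seen)) {
        seen[buf] = 1              # first time this fortune appears
        printf "%s%%\n", buf       # emit it followed by its delimiter
      }
      buf = ""
      next
    }
    # Any other line belongs to the current fortune.
    { buf = buf $0 "\n" }
    # Text after the last delimiter (if any) is kept when it is new.
    END { if (buf != "" && !(buf in seen)) printf "%s", buf }
  ' "$1"
}
```

It could be called as dedupe_fortunes data > data.dedup. Because the whole multi-line fortune is used as the array key, two fortunes that merely share a line are not treated as duplicates.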
