Removing duplicate rows based on partial string comparison

I have a text file containing thousands of lines of text, as shown below.

 123 hello world
 124 foo bar
 125 hello world

I would like to detect duplicates by comparing only part of each line — the text after the leading number. For the above, it should output:

 123 hello world
 124 foo bar

Is there a vim command that can do this?

Update: I am on a Windows machine, so I cannot use uniq.

4 answers

This is the bash command:

 sort -k2 input | uniq -s4 
  • sort -k2 starts the sort key at the second field, so lines are ordered by the text after the number
  • uniq -s4 skips the first 4 characters when comparing adjacent lines
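As a quick sanity check, the pipeline can be run on the sample data from the question (a sketch; the file name input matches the command above):

```shell
# Recreate the sample file from the question.
printf '123 hello world\n124 foo bar\n125 hello world\n' > input

# Sort by the second field onward, then drop adjacent lines whose
# text after the first 4 characters (number + space) repeats.
sort -k2 input | uniq -s4
# → 124 foo bar
#   123 hello world
```

Note that -s4 only works here because every line begins with a 3-digit number followed by a space; if the numbers varied in width, uniq -f 1 (skip one whole field) would be safer.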

In vim, you can invoke the external command above:

 :%!sort -k2 % | uniq -s4 
  • the % after sort expands to the current file name; :%! replaces the whole buffer with the command's output.

Alternatively, you can sort within vim itself with this command:

 :sort /^\d*\s/ 
  • vim skips the text matched by the pattern (the leading number and whitespace) and sorts by the rest of the line

After sorting, use this command to remove duplicate rows:

 :%s/\v(^\d*\s(.*)$\n)(^\d*\s\2$\n)+/\1/ 
  • To avoid too many backslashes, I use \v in the pattern to enable very magic mode.
  • In a multi-line pattern, $ matches the position just before the newline character ( \n ); it is not strictly necessary here.
  • You can adapt the regular expression to your own needs.

Using awk:

 $ awk '!a[$2$3]++' file
 123 hello world
 124 foo bar

The first occurrence of a key finds a count of 0 in the array, so the negation is true and the line is printed; the post-increment then sets the count to 1. Every later occurrence finds a non-zero count, the negation is false, and the line is suppressed.
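The one-liner above keys on fields 2 and 3 only, so lines with more than three fields could collide. A variant that keys on everything after the first field may be safer (a sketch; the key variable and the sub() call are my addition, not part of the original answer):

```shell
# Recreate the sample file from the question.
printf '123 hello world\n124 foo bar\n125 hello world\n' > file

# Strip the leading number from a copy of each line and use the
# remainder as the dedup key; print only first occurrences.
awk '{key = $0; sub(/^[0-9]+ /, "", key)} !a[key]++' file
# → 123 hello world
#   124 foo bar
```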


In Vim, I was able to sort and delete duplicates with the following command:

 :sort u 

I'm not sure about vim, but you can do something with the uniq command. It has a --skip-fields argument, which can be used to ignore the first field of each line when comparing.

 $ cat test.txt
 123 hello world
 124 foo bar
 125 hello world
 $ cat test.txt | sort -k 2 | uniq --skip-fields=1 | sort
 123 hello world
 124 foo bar
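The session above can be reproduced as a script; -f 1 is the portable short form of --skip-fields=1 (a sketch using the sample data from the question):

```shell
# Recreate test.txt from the question.
printf '123 hello world\n124 foo bar\n125 hello world\n' > test.txt

# Sort by the second field, drop duplicates ignoring the first
# field, then restore order by the leading number.
sort -k 2 test.txt | uniq -f 1 | sort
# → 123 hello world
#   124 foo bar
```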

Source: https://habr.com/ru/post/1444385/

