How to find duplicate lines in a text file (ignoring case) and print them?

I have a text file with 1,200 lines. Some of them are duplicates.

How can I find duplicate lines in a file (ignoring case) and then print the line text on screen, so that I can go and find them? I do not want to delete them or anything else, just find which lines are duplicated.

+4
3 answers

This is pretty easy with a set:

 with open('file') as f:
     seen = set()
     for line in f:
         line_lower = line.lower()
         if line_lower in seen:
             print(line, end='')  # line already ends with '\n'
         else:
             seen.add(line_lower)
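The question also mentions wanting to go and find the duplicates afterwards. A small variation of the same set-based scan (a sketch; the function name and tuple layout here are my own, not from the answer) records line numbers as it goes:

```python
def find_duplicates(lines):
    """Yield (line_number, first_seen_at, text) for case-insensitive duplicates."""
    seen = {}  # lowercased text -> line number of first occurrence
    for num, line in enumerate(lines, start=1):
        key = line.rstrip('\n').lower()
        if key in seen:
            yield (num, seen[key], line.rstrip('\n'))
        else:
            seen[key] = num

# Usage with a real file:
# with open('file') as f:
#     for num, first, text in find_duplicates(f):
#         print(f'line {num} duplicates line {first}: {text}')
```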
+13

Since there are only 1,200 lines, you can also use collections.Counter():

 >>> from collections import Counter
 >>> with open('data1.txt') as f:
 ...     c = Counter(line.strip().lower() for line in f if line.strip())  # case-insensitive
 ...     for line in c:
 ...         if c[line] > 1:
 ...             print(line)
 ...

if data1.txt looks something like this:

 ABC
 abc
 aBc
 CAB
 caB
 bca
 BcA
 acb

output:

 cab
 abc
 bca
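Counter also makes it easy to see how many times each line occurs, which can help decide which duplicates to chase first. A sketch using most_common() on the sample data (an in-memory list stands in for reading data1.txt):

```python
from collections import Counter

lines = ['ABC', 'abc', 'aBc', 'CAB', 'caB', 'bca', 'BcA', 'acb']
c = Counter(line.lower() for line in lines)

# most_common() orders entries by count, highest first
for text, count in c.most_common():
    if count > 1:
        print(text, count)
```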
+4

Search for case-insensitive duplicates

This will not give you line numbers, but it will give you a list of duplicate lines that you can then explore further. For instance:

 tr 'A-Z' 'a-z' < /tmp/foo | sort | uniq -d

Sample Data File

 # /tmp/foo
 one
 One
 oNe
 two
 three

The above pipeline will correctly display:

one
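For comparison, the same sort-then-uniq idea can be sketched in Python (uniq_d is a made-up helper name, and it takes an iterable of lines rather than a file path):

```python
from itertools import groupby

def uniq_d(lines):
    """Mimic lowercasing + `sort | uniq -d`: report each duplicated line once."""
    lowered = sorted(line.strip().lower() for line in lines)
    # groupby on sorted input clusters equal lines together, like uniq
    return [text for text, group in groupby(lowered) if len(list(group)) > 1]

# With the sample /tmp/foo contents:
# uniq_d(['one', 'One', 'oNe', 'two', 'three'])  ->  ['one']
```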

Search for line numbers

You can then use grep to find the corresponding line numbers:

 grep --ignore-case --line-number one /tmp/foo 
0

Source: https://habr.com/ru/post/1440333/
