How to find duplicate lines in a text file (ignoring case) and print them?

I have a text file with 1,200 lines. Some of them are duplicates.

How can I find duplicate lines in a file (ignoring case) and then print the line text on screen, so that I can go and find them? I do not want to delete them or anything else, just find which lines are duplicated.

+4
3 answers

This is pretty easy with a set:

 with open('file') as f:
     seen = set()
     for line in f:
         line_lower = line.lower()
         if line_lower in seen:
             print(line, end='')  # line already ends with '\n'
         else:
             seen.add(line_lower)
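The question also mentions wanting to go and find the duplicates afterwards. A small variation of the same set-based scan (a sketch; the function name and tuple layout here are my own, not from the answer) records line numbers as it goes:

```python
def find_duplicates(lines):
    """Yield (line_number, first_seen_at, text) for case-insensitive duplicates."""
    seen = {}  # lowercased text -> line number of first occurrence
    for num, line in enumerate(lines, start=1):
        key = line.rstrip('\n').lower()
        if key in seen:
            yield (num, seen[key], line.rstrip('\n'))
        else:
            seen[key] = num

# Usage with a real file:
# with open('file') as f:
#     for num, first, text in find_duplicates(f):
#         print(f'line {num} duplicates line {first}: {text}')
```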
+13

Since there are only 1,200 lines, you can also use collections.Counter():

 >>> from collections import Counter
 >>> with open('data1.txt') as f:
 ...     c = Counter(line.strip().lower() for line in f if line.strip())  # case-insensitive
 ...     for line in c:
 ...         if c[line] > 1:
 ...             print(line)
 ...

if data1.txt looks something like this:

 ABC
 abc
 aBc
 CAB
 caB
 bca
 BcA
 acb

output:

 cab
 abc
 bca
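Counter also makes it easy to see how many times each line occurs, which can help decide which duplicates to chase first. A sketch using most_common() on the sample data (an in-memory list stands in for reading data1.txt):

```python
from collections import Counter

lines = ['ABC', 'abc', 'aBc', 'CAB', 'caB', 'bca', 'BcA', 'acb']
c = Counter(line.lower() for line in lines)

# most_common() orders entries by count, highest first
for text, count in c.most_common():
    if count > 1:
        print(text, count)
```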
+4

Search for case-insensitive duplicates

This will not give you line numbers, but it will give you a list of duplicate lines that you can then explore further. For instance:

 tr 'A-Z' 'a-z' < /tmp/foo | sort | uniq -d

Sample Data File

 # /tmp/foo
 one
 One
 oNe
 two
 three

The above pipeline will correctly display:

one
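For comparison, the same sort-then-uniq idea can be sketched in Python (uniq_d is a made-up helper name, and it takes an iterable of lines rather than a file path):

```python
from itertools import groupby

def uniq_d(lines):
    """Mimic lowercasing + `sort | uniq -d`: report each duplicated line once."""
    lowered = sorted(line.strip().lower() for line in lines)
    # groupby on sorted input clusters equal lines together, like uniq
    return [text for text, group in groupby(lowered) if len(list(group)) > 1]

# With the sample /tmp/foo contents:
# uniq_d(['one', 'One', 'oNe', 'two', 'three'])  ->  ['one']
```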

Search for line numbers

You can then use grep to find the corresponding line numbers:

 grep --ignore-case --line-number one /tmp/foo 
0

Source: https://habr.com/ru/post/1440333/
