How to get unique lines when the lines have different encodings

I have a file, 1.txt:

    $ cat 1.txt
    page1
    age1

But:

    $ head -n1 1.txt | file -i -
    /dev/stdin: text/plain; charset=us-ascii
    $ head -n2 1.txt | tail -n1 | file -i -
    /dev/stdin: text/plain; charset=utf-8
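file -i guesses a charset from the bytes it is given. us-ascii means every byte of the first line is below 0x80. utf-8 for the second line means it contains at least one multi-byte UTF-8 sequence. Such a sequence may render invisibly or as a look-alike letter, for example a byte-order mark or a Cyrillic letter, though the question does not say which. A minimal way to see those bytes, assuming a POSIX sed, is to run the l command of sed in the C locale. The function name show_hidden is hypothetical.

```shell
# show_hidden (hypothetical helper): print each input line unambiguously.
# In the C locale every byte above 0x7F is non-printable, so the POSIX
# "l" command of sed writes it as a three-digit octal escape such as \357,
# and it marks the real end of each line with "$".
show_hidden() {
    LC_ALL=C sed -n l "$@"
}
```

Running show_hidden 1.txt would reveal, for instance, \357\273\277 (a UTF-8 BOM) or \321\200 (Cyrillic "р") next to the visible text if either is present.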

The lines have different encodings. Because of this, I cannot deduplicate them with the method I know:

    $ cat 1.txt | sort | uniq -c | sort -rn
    1 age1
    1 page1
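sort and uniq compare lines byte by byte. Two lines that look the same on screen but differ in hidden bytes are therefore counted separately. One possible workaround, assuming the difference is only an invisible character, is to delete such characters before counting. The set removed below (BOM U+FEFF, zero-width space U+200B, a trailing carriage return) is a hypothetical choice, and the function name count_normalized is also hypothetical. Adjust the set to whatever bytes actually differ in 1.txt.

```shell
# count_normalized (hypothetical helper): the same
# "sort | uniq -c | sort -rn" pipeline, but run on lines with a few
# invisible characters deleted first. It prints the cleaned lines.
count_normalized() {
    local bom zwsp cr
    bom=$(printf '\357\273\277')   # UTF-8 encoding of U+FEFF (BOM)
    zwsp=$(printf '\342\200\213')  # UTF-8 encoding of U+200B (zero-width space)
    cr=$(printf '\r')              # carriage return left over from CRLF endings
    # LC_ALL=C makes sed and sort work on raw bytes, whatever the encoding.
    LC_ALL=C sed -e "s/$bom//g" -e "s/$zwsp//g" -e "s/$cr\$//" "$@" |
        LC_ALL=C sort | uniq -c | sort -rn
}
```

Usage: count_normalized 1.txt. If the hidden character is not in the removed set, the output will look just like the original pipeline's output.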

So how can I get only unique lines in my situation?

P.S. I prefer Linux / bash / awk command-line solutions, but a solution in another programming language would also be welcome.

Update: the usual awk idiom does not work either:

    awk '!a[$0]++' Input_file

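The awk idiom keys its array on $0, which is the exact bytes of the line, so it fails for the same reason as sort | uniq. A minimal variation, under the same assumption that the lines differ only by invisible characters, keeps the original line for output but keys the array on a normalized copy. The function name dedup_ignoring_invisible and the set of removed characters are hypothetical.

```shell
# dedup_ignoring_invisible (hypothetical helper): print each line the first
# time its normalized form is seen. Normalizing here means deleting the
# BOM (U+FEFF), zero-width spaces (U+200B) and a trailing CR.
dedup_ignoring_invisible() {
    local bom zwsp
    bom=$(printf '\357\273\277')   # UTF-8 bytes of U+FEFF
    zwsp=$(printf '\342\200\213')  # UTF-8 bytes of U+200B
    # LC_ALL=C makes awk treat the input as raw bytes, so any encoding passes through.
    LC_ALL=C awk -v bom="$bom" -v zwsp="$zwsp" '
        {
            key = $0
            gsub(bom, "", key)      # drop every BOM from the comparison key
            gsub(zwsp, "", key)     # drop every zero-width space
            sub(/\r$/, "", key)     # ignore a trailing CR from CRLF endings
            if (!seen[key]++) print # print the untouched line, first occurrence only
        }' "$@"
}
```

Like awk '!a[$0]++', dedup_ignoring_invisible 1.txt keeps the first spelling of each line in its original order. Only the comparison key is cleaned, not the printed text.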


Source: https://habr.com/ru/post/1275398/

