How to get uniq strings with different encodings

Question


I have a file, 1.txt:

    $ cat 1.txt
    page1
    age1

But:

    $ head -n1 1.txt | file -i -
    /dev/stdin: text/plain; charset=us-ascii
    $ head -n2 1.txt | tail -n1 | file -i -
    /dev/stdin: text/plain; charset=utf-8
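Because file -i reports one line as us-ascii and the other as utf-8, the second line must contain at least one byte above 0x7F, even if the terminal shows nothing unusual. A minimal diagnostic sketch follows; list_non_ascii and dump_line are hypothetical helper names, not existing commands. Both run in the C locale so that every byte is examined individually.

```shell
#!/bin/sh
# list_non_ascii FILE  (hypothetical helper)
#   Print "LINE_NUMBER:CONTENT" for every line of FILE that contains at least
#   one byte that is neither printable ASCII nor whitespace.
#   LC_ALL=C makes grep treat each byte as a separate character, so every byte
#   of a multi-byte UTF-8 sequence (e.g. a BOM or zero-width space) matches.
list_non_ascii() {
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# dump_line FILE N  (hypothetical helper)
#   Show the raw bytes of line N of FILE; od -c prints non-printable bytes as
#   three-digit octal values (a UTF-8 BOM appears as 357 273 277).
dump_line() {
  sed -n "${2}p" "$1" | LC_ALL=C od -c
}

# Example usage on the file from the question:
#   list_non_ascii 1.txt
#   dump_line 1.txt 2
```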

The two lines have different encodings. Because of this, I cannot get the unique lines with the method I know:

    $ cat 1.txt | sort | uniq -c | sort -rn
          1 age1
          1 page1
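sort and uniq compare lines byte by byte, so a line carrying extra UTF-8 bytes never counts as a duplicate of its plain-ASCII twin. One possible workaround, sketched below under the assumption that the extra bytes belong to invisible characters such as a UTF-8 byte order mark (EF BB BF) or a zero-width space, is to delete every byte outside printable ASCII before counting. The helper name count_ascii_uniq is hypothetical. If the difference is instead a look-alike letter (for example a Cyrillic р in place of a Latin p), this approach deletes the letter rather than matching it, so inspect the bytes first.

```shell
#!/bin/sh
# count_ascii_uniq FILE  (hypothetical helper)
#   Same report as `sort | uniq -c | sort -rn`, but computed after deleting
#   every byte other than tab (\11), newline (\12) and printable ASCII
#   (\40-\176). Carriage returns are deleted too, so CRLF and LF lines merge.
#   ASSUMPTION: the extra bytes are invisible characters such as a UTF-8 BOM
#   or a zero-width space; look-alike letters would be deleted, not matched.
count_ascii_uniq() {
  LC_ALL=C tr -cd '\11\12\40-\176' < "$1" | sort | uniq -c | sort -rn
}

# Example usage:
#   count_ascii_uniq 1.txt
```

Note that the counts refer to the cleaned lines, not to the original bytes in the file.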

How can I get only the unique lines in this situation?

P.S. I prefer Linux / bash / awk command-line solutions, but a solution in another programming language would also be welcome.

Update: awk '!a[$0]++' Input_file does not work either.
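awk '!a[$0]++' keys its array on the whole line, so it sees exactly the same bytes that uniq sees. A variant that still keeps the first occurrence of each line, but keys the array on a cleaned copy of the line, is sketched below. dedup_ascii_key is a hypothetical name, and the sketch assumes that only bytes outside tab and printable ASCII are noise. Setting LC_ALL=C makes awk work on bytes rather than on multi-byte characters.

```shell
#!/bin/sh
# dedup_ascii_key FILE  (hypothetical helper)
#   Print each line of FILE the first time its cleaned key is seen, like
#   awk '!a[$0]++', but the key is a copy of the line with every byte outside
#   tab and printable ASCII removed. The printed line itself is unchanged.
#   LC_ALL=C makes awk operate on bytes rather than multi-byte characters.
#   ASSUMPTION: duplicates differ only by invisible characters.
dedup_ascii_key() {
  LC_ALL=C awk '{ key = $0; gsub(/[^\t -~]/, "", key) } !seen[key]++' "$1"
}

# Example usage:
#   dedup_ascii_key 1.txt
```

The lines are printed as they appear in the file, extra bytes included; pipe the result through tr -cd '\11\12\40-\176' if you also want the cleaned text.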


linux bash awk character-encoding

Viktor Khilin Feb 16 '18 at 11:38


No one has answered this question yet.

See similar questions:

How to replace Unicode characters with ASCII

or similar:

How to find all files containing specific text in Linux?
Get the source directory of the Bash script from the script itself
How to find out if a regular file exists in Bash?
How to combine string variables in bash
How to check if a string contains a substring in Bash
How to get consistent byte representation of strings in C# without manually specifying an encoding?
How to check if a program exists from a Bash script?
How to split line on delimiter in Bash?
How to change echo output color on Linux
How do I request Yes / No / Cancel input in a Linux shell script?
