I am writing a bash script to process some files automatically, and one sub-task should use iconv to re-encode the source files if they are not in the encoding I want. For this, I use:
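enc=$(file -b --mime-encoding "$file")
# ISO-8859-1 and plain ASCII need no conversion
if [ "$enc" = "iso-8859-1" ] || [ "$enc" = "us-ascii" ]
then
    unset enc
fi
# if/else instead of "A && B || C", which would also run cat if iconv fails
if [ -n "${enc}" ]
then
    iconv -f "$enc" -t iso-8859-1 "$file"
else
    cat "$file"
fi |
awk '{
    # code to process file further
}' > "$newfile"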
The problem is that I have a UTF-8 file, but file falsely recognizes it as ASCII. The first non-ASCII character is character #314206, on line #1028. file seems to inspect only a sample of its input: for example, if I convert the file from fixed-width to character-delimited format, the first non-ASCII character moves up to char #80872, and file then recognizes the encoding correctly. So I assume the sample size lies somewhere between these two values.
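To narrow the sample size down between those two offsets, one could bisect with synthetic probe files. This is a minimal sketch, assuming bash, head -c and file(1) are available; make_probe, probe_is_utf8 and find_sample_size are hypothetical helper names, and each probe is simply N ASCII bytes followed by a UTF-8 "é", so the threshold it finds may not match a real file exactly.

```shell
#!/usr/bin/env bash
# Sketch: bracket file(1)'s sample size by bisection over probe files.
# Hypothetical helpers: make_probe, probe_is_utf8, find_sample_size.

# make_probe N OUT: write N ASCII bytes ('a'), then UTF-8 "é" (0xC3 0xA9)
# and a newline, so the first non-ASCII byte follows exactly N ASCII bytes.
make_probe() {
    local n=$1 out=$2
    { head -c "$n" /dev/zero | tr '\000' 'a'; printf '\303\251\n'; } > "$out"
}

# probe_is_utf8 N: succeed if file(1) reports utf-8 for a probe with offset N.
probe_is_utf8() {
    local tmp rc
    tmp=$(mktemp)
    make_probe "$1" "$tmp"
    [ "$(file -b --mime-encoding "$tmp")" = "utf-8" ]
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# find_sample_size LO HI: LO must be detected as utf-8 and HI must not be;
# prints the largest offset that is still detected.
find_sample_size() {
    local lo=$1 hi=$2 mid
    while (( hi - lo > 1 )); do
        mid=$(( (lo + hi) / 2 ))
        if probe_is_utf8 "$mid"; then lo=$mid; else hi=$mid; fi
    done
    echo "$lo"
}
```

For example, find_sample_size 80000 320000 would print the last offset at which file still reports utf-8, provided file gives utf-8 at the lower bound and us-ascii at the upper one, as observed with the real file.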
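(TL;DR) Is there a way to make file examine the whole file, or is there another way to detect the encoding reliably in bash?
I tried file -P, but found nothing helpful in man file or by googling.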
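One possible workaround is to skip the sampling problem by scanning every byte with standard tools. This is a minimal sketch, not file's own method: it assumes tr, wc and an iconv that rejects malformed UTF-8 (glibc and GNU libiconv do). An all-ASCII file is reported as us-ascii, a file that is valid UTF-8 as utf-8, and anything else falls back to file's answer; count_high_bytes and detect_encoding are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: whole-file encoding check instead of file(1)'s limited sample.
# Assumption: iconv exits non-zero on invalid UTF-8 input.

# count_high_bytes FILE: number of bytes outside 7-bit ASCII (octal 200-377).
count_high_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c
}

# detect_encoding FILE: print us-ascii, utf-8, or file(1)'s verdict otherwise.
detect_encoding() {
    local f=$1 n
    n=$(count_high_bytes "$f")
    if (( n == 0 )); then
        echo "us-ascii"
    elif iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
        echo "utf-8"
    else
        file -b --mime-encoding "$f"
    fi
}
```

In the script above, enc=$(detect_encoding "$file") could then replace the file -b --mime-encoding call. A Latin-1 file can occasionally also be valid UTF-8, so this is still a heuristic, just one that reads the whole file.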