UTF-8 Batch Validation Tool?

Does anyone know of an application, service, or method that I could use to check a large number of XML files for valid UTF-8?

I have a ton of XML files that are supposedly UTF-8, but some of them contain stray characters that stop them from displaying correctly in the content viewer.

I know that I can check files one at a time using the methods from this answer: How to check whether a file is valid UTF-8?

... but what about thousands of XML files at once?

+3
2 answers

To check whether a file is valid UTF-8, run it through iconv -f utf8 and look at the exit status.



On *nix, a short script like this will do it:

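#!/bin/sh
# Print the name of every XML file in the current directory
# that iconv cannot decode as UTF-8.
for f in *.xml; do
    if ! iconv -f utf8 "$f" >/dev/null 2>&1; then
        printf '%s\n' "$f"
    fi
done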
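
The loop above only looks at the current directory. For thousands of files spread over subdirectories, a minimal sketch of a recursive variant could look like the following. The function name list_invalid_utf8 is hypothetical, and passing -t UTF-8 is my own addition so that the result does not depend on the character set of the current locale.

```shell
#!/bin/sh
# list_invalid_utf8 DIR (hypothetical name): print every *.xml file
# under DIR that iconv cannot decode as UTF-8.
list_invalid_utf8() {
    find "$1" -type f -name '*.xml' -exec sh -c '
        for f in "$@"; do
            # iconv exits non-zero when it meets an invalid byte sequence
            iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}
```

Calling list_invalid_utf8 . prints one path per invalid file. Because find passes the file names directly as arguments, names containing spaces are handled correctly.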


+5

Building on jamessan's iconv approach, here is a script that can be driven by the Unix find command:

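#!/bin/sh
# Check every file named on the command line and report
# the ones that iconv cannot decode as UTF-8.
for i in "$@"
do
    if ! iconv -f utf8 "$i" >/dev/null 2>&1
    then
        echo "failed: $i"
    #else
    #    echo "ok: $i"
    fi
done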

Save the script as check_UTF8.sh, then run:

$ find -E . -type f -iregex ".*\.(js|css|php|tpl|html)$" | xargs /path/to/check_UTF8.sh

This finds every file matching the regex (.js/.css/.php/.tpl/.html) and passes it to the check_UTF8.sh script; any file that is not valid UTF-8 is reported by the script.
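
One caveat with piping find into xargs is that file names containing spaces or quotes get split. A sketch of a variant that avoids this, assuming a find and xargs that support -iname, -print0 and -0 (GNU and BSD both do, though none of these is in POSIX), might look like this. The wrapper name check_tree is hypothetical, the -name style patterns replace the -E regex so the command also runs with GNU find, and -t UTF-8 is added so the check does not depend on the locale.

```shell
#!/bin/sh
# check_tree DIR (hypothetical name): check .js/.css/.php/.tpl/.html
# files under DIR and return non-zero if any is not valid UTF-8.
check_tree() {
    find "$1" -type f \( -iname '*.js' -o -iname '*.css' -o -iname '*.php' \
        -o -iname '*.tpl' -o -iname '*.html' \) -print0 |
    xargs -0 sh -c '
        status=0
        for i in "$@"; do
            if ! iconv -f UTF-8 -t UTF-8 "$i" >/dev/null 2>&1; then
                echo "failed: $i"
                status=1
            fi
        done
        # A non-zero exit here makes xargs, and so the pipeline, fail too
        exit "$status"
    ' sh
}
```

Running check_tree . prints the same "failed: ..." lines as the script above, and the non-zero exit status makes it usable in a build or commit hook.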

0

Source: https://habr.com/ru/post/1724419/

