Convert encoding via iconv linux

Question

I usually convert encodings with iconv, but today I got stuck on something new to me.
I made a test case to make my question clear:

The goal is to convert the numeric character references in the file below to their UTF-8 text: الحلقة الثالثة

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title> this text is from arabic language   </title>
</head>
<body>
<p><span> &#1575;&#1604;&#1581;&#1604;&#1602;&#1577; &#1575;&#1604;&#1579;&#1575;&#1604;&#1579;&#1577;</span></p>
</body>
</html>

I tried ASCII, LATIN1, and windows-1252 as the source encoding, but no luck. How can I tell which encoding this is so that I can convert it? How were both Google Translate and the Stack Overflow editor able to detect it and convert it?

Another example: the site http://kanjidict.stc.cx/recode.php converted it correctly once I checked "Assume HTML" (default: handle as plain text).

What am I missing, and what did these three websites do to convert it correctly?

+3

html command-line-interface encoding iconv arabic

tawfekov 10 . '11 12:12

4

You can do this in Python 3.

Decimal references only:

>>> import re
>>> s = r'&#65;&#223;&#254;'
>>> r = re.compile(r'&#(\d+);')
>>> r.sub(lambda m:chr(int(m.group(1))), s)
'Aßþ'

Decimal and hexadecimal references:

>>> import re
>>> s = r'&#x41;&#223;&#xFE;'
>>> r = re.compile(r'&#(x?)(\w+);')
>>> r.sub(lambda m:chr(int(m.group(2), 10 if not m.group(1) else 16)), s)
'Aßþ'
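
For what it's worth, the standard library can also do this without a hand-written regex: `html.unescape` (Python 3.4+) decodes decimal, hexadecimal, and named references. A minimal sketch, using the references from the question as sample input:

```python
import html

# Sample input: the numeric character references from the question.
encoded = (
    "&#1575;&#1604;&#1581;&#1604;&#1602;&#1577; "
    "&#1575;&#1604;&#1579;&#1575;&#1604;&#1579;&#1577;"
)

# Each &#NNNN; is replaced by the character with that code point.
decoded = html.unescape(encoded)
print(decoded)

# Decimal, hexadecimal, and named references are all handled.
mixed = html.unescape("&#x41;&#223;&thorn;")
print(mixed)
```

Note that `html.unescape` also decodes `&lt;`, `&gt;`, and `&amp;`, so running it over a whole HTML file may change the markup; it is probably safest on extracted text.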

+2

kev 05 . '11 15:20

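This is not a character encoding. The text is written as HTML entities, so it needs to be decoded rather than converted.

In PHP: http://www.php.net/manual/en/function.htmlspecialchars-decode.php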

+1

simon 10 . '11 12:25

recode html..utf8

, PLS , , .

+1

Diego 19 . '15 22:49

tawfekov · Accepted Answer · 2011-01-11T11:48:04+0000

In the end I solved it with ascii2uni.

Install: sudo apt-get install ascii2uni

Convert to Unicode:

ascii2uni -a D source.html > target.html
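
If ascii2uni is not available, a rough Python equivalent might look like the sketch below. It assumes the file is pure ASCII with decimal references only, as in the question; `decode_decimal_refs` and `convert_file` are hypothetical helper names, not part of any library.

```python
import re
from pathlib import Path

# Matches decimal numeric character references such as &#1575;
DECIMAL_REF = re.compile(r"&#(\d+);")


def decode_decimal_refs(text: str) -> str:
    """Replace each decimal reference with the character it names."""
    return DECIMAL_REF.sub(lambda m: chr(int(m.group(1))), text)


def convert_file(source: str, target: str) -> None:
    # Assumption: the input is plain ASCII, because the references are
    # spelled out in ASCII characters. That is also why choosing a
    # different source charset for iconv leaves the file unchanged.
    text = Path(source).read_text(encoding="ascii")
    Path(target).write_text(decode_decimal_refs(text), encoding="utf-8")
```

Calling `convert_file("source.html", "target.html")` would then mirror the command above. The `<meta>` tag in the output still declares windows-1252, so it would need to be changed to utf-8 by hand.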
