Python IRC bot and encoding issue

I currently have a simple IRC bot written in Python.

Since I ported it to Python 3.0, which distinguishes between bytes and Unicode strings, I have been running into encoding issues, in particular when other people do not send UTF-8.

I could just tell everyone to send UTF-8 (which they should do anyway), but a better solution would be to have Python fall back to some other encoding by default.

So far, the code is as follows:

data = str(irc.recv(4096),"UTF-8", "replace")

This, at least, does not raise an exception. However, I want to go further: I want my bot to fall back to a different encoding by default, or somehow detect the "nasty characters".

Also, I need to find out which cryptic encoding mIRC actually uses, because the other clients work fine and send UTF-8 as they should.

How do I do this?
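To make the behaviour of the "replace" error handler concrete, here is a minimal sketch. The helper name `decode_lossy` is hypothetical and not part of the bot; it only shows that undecodable bytes turn into U+FFFD, which can then be used as a rough signal that the sender was probably not using UTF-8.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, inserted by the "replace" error handler


def decode_lossy(raw):
    """Decode as UTF-8 and report whether any byte had to be replaced.

    Hypothetical helper for illustration. A sender who really typed
    U+FFFD would also trigger the flag, so treat it as a hint only.
    """
    text = str(raw, "UTF-8", "replace")
    return text, REPLACEMENT in text


# 0xE9 is "é" in CP1252 but is not valid UTF-8 on its own.
text, had_bad_bytes = decode_lossy(b"caf\xe9")
print(repr(text), had_bad_bytes)
```

When the flag is set, the raw bytes are still available, so the bot could retry them with another encoding instead of keeping the replaced text.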

4 answers

Well, after some research, it turns out that chardet has problems with Python 3. The solution turned out to be simpler than I thought. I decided to fall back on CP1252 if UTF-8 doesn't cut it:

data = irc.recv(4096)
try:
    data = str(data, "UTF-8")
except UnicodeDecodeError:
    # Not valid UTF-8: fall back to CP1252
    data = str(data, "CP1252")

It seems to work. It does not detect the encoding, though, so if someone shows up with an encoding that is neither UTF-8 nor CP1252, I will have a problem again.
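One caveat with this fallback: Python's CP1252 codec leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so the second decode can still raise. A minimal sketch of a variant that never raises follows; the function name `decode_line` and the final ISO-8859-1 step are assumptions added for illustration, not part of the code above.

```python
def decode_line(raw):
    """Try UTF-8, then CP1252, and finally ISO-8859-1 (hypothetical helper).

    ISO-8859-1 maps every byte value to a character, so the last step
    cannot fail; the result may be wrong, but no exception escapes.
    """
    for enc in ("utf-8", "cp1252"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("iso-8859-1")


print(decode_line(b"caf\xc3\xa9"))    # valid UTF-8
print(decode_line(b"caf\xe9"))        # CP1252 fallback
print(repr(decode_line(b"\x81")))     # undefined in CP1252, last resort
```

This still only guesses among three encodings; it does not detect anything else.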

This is really just a temporary fix.


chardet should help: it is a Python library for detecting unknown encodings.
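A minimal sketch of how chardet might be used here, assuming the third-party `chardet` package is installed. `chardet.detect` takes bytes and returns a dictionary with an `encoding` entry (which may be `None`) and a `confidence` entry. The wrapper `guess_encoding` and its `default` parameter are hypothetical.

```python
try:
    import chardet  # third-party package, may not be installed
except ImportError:
    chardet = None


def guess_encoding(raw, default="utf-8"):
    """Return chardet's best guess for raw, or default if it has none."""
    if chardet is None:
        return default
    result = chardet.detect(raw)
    return result["encoding"] or default


raw = b"hello world"
enc = guess_encoding(raw)
print(enc, raw.decode(enc, "replace"))
```

Short inputs give the detector very little to work with, so the guess is best treated as a hint rather than a fact.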


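As RichieHindle mentioned, chardet is probably the way to go. However, if you just want to cover about 90% of the text you will see, you can use something like this: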

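def decode(raw):
    # Try UTF-8 first, then CP1252; ISO-8859-1 goes last because it
    # accepts every byte and therefore never raises.
    try:
        text = raw.decode('utf-8')
    except UnicodeDecodeError:
        try:
            text = raw.decode('cp1252')
        except UnicodeDecodeError:
            text = raw.decode('iso-8859-1')
    return text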


def encode(text):
    # UTF-8 can encode any valid str, so the fallbacks are rarely reached.
    try:
        data = text.encode('utf-8')
    except UnicodeEncodeError:
        try:
            data = text.encode('iso-8859-1')
        except UnicodeEncodeError:
            data = text.encode('cp1252')
    return data
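
The order of the fallbacks in a chain like this matters. The short check below is an illustration, not part of the answer's code: ISO-8859-1 decodes every possible byte, so any encoding tried after it is never reached, and the two codecs disagree on bytes in the 0x80-0x9F range.

```python
raw = b"\x80 5"

as_latin1 = raw.decode("iso-8859-1")  # 0x80 becomes an invisible control character
as_cp1252 = raw.decode("cp1252")      # 0x80 becomes the euro sign

# Every byte value from 0 to 255 decodes under ISO-8859-1 without error.
all_bytes = bytes(range(256))
decoded_all = all_bytes.decode("iso-8859-1")

print(repr(as_latin1), repr(as_cp1252), len(decoded_all))
```

For text typed on Windows clients, trying CP1252 before ISO-8859-1 is therefore usually the more useful order.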

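Using chardet alone gives poor results when messages are short (as they are on IRC).

For simplicity, I'd first try a few likely encodings (which ones depend on the community, see http://en.wikipedia.org/wiki/Internet_Relay_Chat#Character_encoding), and if they fail, I'd go to chardet.

For example: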

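import chardet  # third-party package


def decode_irc(raw, preferred_encs=("UTF-8", "CP1252", "ISO-8859-1")):
    # Try the likely encodings first, in order of preference.
    # Note: ISO-8859-1 accepts every byte, so with the default tuple
    # the chardet step below is only reached if you remove it.
    for enc in preferred_encs:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            pass
    # None of them worked: let chardet guess (it may return None).
    enc = chardet.detect(raw)['encoding'] or "UTF-8"
    try:
        return raw.decode(enc)
    except (UnicodeDecodeError, LookupError):
        # Last resort: drop undecodable bytes rather than raise.
        return raw.decode("UTF-8", 'ignore')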

Source: https://habr.com/ru/post/1709586/

