Stream / string / bytearray conversions in Python 3

Python 3 cleans up Python's handling of Unicode strings. I assume that, as part of this effort, the codecs in Python 3 have become more restrictive, judging by the Python 3 documentation compared to the Python 2 documentation.

For example, codecs that conceptually convert a byte stream to another form of byte stream have been removed:

  • base64_codec
  • bz2_codec
  • hex_codec

And the codecs that conceptually convert Unicode to another form of Unicode have also been removed (in Python 2 this actually went between Unicode and byte stream, but conceptually it is really Unicode to Unicode, I reckon):

  • rot_13

My main question is: what is the “right way” in Python 3 to do what these removed codecs used to do? They are not codecs in the strict sense, but "transformations." Their interface and implementation would still be very similar to codecs.

I am not interested in rot_13 itself, but I would like to know the “best way” to implement a conversion between line-ending styles (Unix line endings and Windows line endings). This should really be a Unicode-to-Unicode transformation, done before encoding into a byte stream, especially when UTF-16 is used, as discussed in this other SO question.

+3
2 answers

Here is what I found about string/bytearray transformations in Python 3.

Python 3.2

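These codecs were restored in Python 3.2.

The catch:

They are transforms rather than text encodings, so they cannot be used through the encode()/decode() methods of str and bytes in Python 3.x (unlike in Python 2.x).

From 3.2 on, they have to be reached through the codecs module API instead.

See the Python 3 docs for codecs, under Binary Transforms.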
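
As a rough sketch of how the Binary Transforms can be used (assuming Python 3.4 or later, where codecs.encode() and codecs.decode() are documented; the payload is made-up sample data), the module-level functions take the place of the bytes/str methods:

```python
import codecs

payload = b"hello world"  # made-up sample data

# bytes-to-bytes transforms go through codecs.encode()/codecs.decode(),
# not through bytes.decode() or str.encode().
hex_form = codecs.encode(payload, "hex_codec")
b64_form = codecs.encode(payload, "base64_codec")
bz2_form = codecs.encode(payload, "bz2_codec")

print(hex_form)  # b'68656c6c6f20776f726c64'

# Each transform is reversed with codecs.decode() and the same codec name.
print(codecs.decode(b64_form, "base64_codec") == payload)
print(codecs.decode(bz2_form, "bz2_codec") == payload)
```

The dedicated modules (base64, bz2, binascii) remain the more common route; this only shows that the codec-style interface still exists.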

Barry Warsaw:

In Python 2 you could do str-to-str transformations such as a Caesar cipher (i.e. rot13) through the codec machinery, like this:

>>> 'foo'.encode('rot-13')
'sbb'

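This does not work in Python 3: although certain str-to-str codecs such as rot-13 still exist, the str.encode() interface requires the codec to return a bytes object. To use str-to-str codecs in both Python 2 and Python 3, you have to drop down to a lower-level API, getting the codec and calling it directly: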

>>> from codecs import getencoder
>>> encoder = getencoder('rot-13')
>>> rot13string = encoder(mystring)[0]

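You have to take element zero of the return value because of the codecs API. It is a bit ugly, but it works in both versions of Python.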
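
To make the [0] concrete, here is a small sketch (the input string is only an example) showing the tuple that the encoder function returns:

```python
from codecs import getencoder

encoder = getencoder("rot_13")

# Codec encoder functions return a tuple:
# (transformed data, length of input consumed).
result = encoder("foo")
print(result)        # ('sbb', 3)

rot13string = result[0]
print(rot13string)   # sbb
```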

+6

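Line endings? First of all, if you open the file in text mode with open(), \n is converted to the platform's line ending for you when writing, and the newline argument lets you choose a specific one.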

http://docs.python.org/3.1/library/functions.html#open
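
A minimal sketch of letting open() do the conversion, assuming you want Windows line endings in a UTF-16 file (the file name and sample text are hypothetical). The translation happens on the text, before the encoder runs:

```python
import os
import tempfile

text = "first line\nsecond line\n"  # made-up sample text

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # newline="\r\n" asks the text layer to write every "\n" as "\r\n"
    # before the UTF-16 encoder sees the characters.
    with open(path, "w", encoding="utf-16", newline="\r\n") as f:
        f.write(text)

    # Read the raw bytes back so no newline translation is applied.
    with open(path, "rb") as f:
        raw = f.read()

decoded = raw.decode("utf-16")
print(repr(decoded))  # 'first line\r\nsecond line\r\n'
```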

Otherwise, use yourstring = yourstring.replace('\n', '\r\n') to convert Linux-style line endings to Windows ones, and yourstring = yourstring.replace('\r\n', '\n') to convert Windows line endings to Linux-style ones.
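
The replace() calls above can be wrapped in two small helpers. This is only a sketch: to_windows and to_unix are hypothetical names, and the extra normalisation step is my addition so that text which already contains \r\n does not end up with \r\r\n:

```python
def to_windows(text):
    # Normalise first so existing "\r\n" pairs are not turned into "\r\r\n".
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


def to_unix(text):
    return text.replace("\r\n", "\n")


mixed = "one\ntwo\r\nthree"  # made-up sample with mixed line endings
print(repr(to_windows(mixed)))  # 'one\r\ntwo\r\nthree'
print(repr(to_unix(mixed)))     # 'one\ntwo\nthree'
```

Both helpers work on str, so they run before any encoding step and are unaffected by the choice of UTF-8 or UTF-16.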

To convert between Unicode encodings, use bytes.decode() or bytearray.decode() followed by str.encode(). For example, to convert from UTF-8 to UTF-16:

newstring = yourbytes.decode('utf-8')
yourbytes = newstring.encode('utf-16')
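
A self-contained variant of the same decode-then-encode round trip, as a sketch with made-up input. It uses utf-16-le instead of utf-16 only to get a fixed byte order without a BOM, and it also shows why line-ending fixes belong on the str rather than on the encoded bytes:

```python
original_bytes = "caf\u00e9\n".encode("utf-8")  # made-up sample input

# bytes -> str (decode), then str -> bytes (encode) in the target encoding.
newstring = original_bytes.decode("utf-8")
utf16_bytes = newstring.encode("utf-16-le")  # explicit byte order, no BOM

print(len(original_bytes), len(utf16_bytes))

# In UTF-16 a newline occupies two bytes, so a byte-level replace of
# b"\n" would corrupt the data.
print("\n".encode("utf-16-le"))  # b'\n\x00'
```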

For Unicode-to-Unicode transformations, you can use str.translate() together with str.maketrans(), documented here:

http://docs.python.org/3.1/library/stdtypes.html#str.translate
http://docs.python.org/3.1/library/stdtypes.html#str.maketrans

For example, a rot_13 table can be built like this:

import string

# Build the table by rotating each ASCII letter 13 places within its own case.
def rot_char(x):
    base = ord('A') if x.isupper() else ord('a')
    return chr((ord(x) - base + 13) % 26 + base)

rot_13 = str.maketrans({x: rot_char(x) for x in string.ascii_letters})

# Using hard-coded values:

rot_13 = str.maketrans('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', 'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm')

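S.translate(rot_13) then converts a string S to rot_13, and since rot_13 is its own inverse, applying the same table again converts it back.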
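
A short usage sketch with the hard-coded table (the sample string is made up), showing that one table serves for both directions:

```python
rot_13 = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
    "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm",
)

S = "Hello, World!"  # made-up sample string

# Characters missing from the table (punctuation, spaces) pass through unchanged.
encoded = S.translate(rot_13)
print(encoded)                    # Uryyb, Jbeyq!

# Applying the same table again restores the original text.
print(encoded.translate(rot_13))  # Hello, World!
```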

+2

Source: https://habr.com/ru/post/1713808/

