What 8-bit character set uses 0x9d?

In which 8-bit ASCII-like character set for English is 0x9d meaningful? I clean up old data files and occasionally find 0x9d in otherwise ASCII text. (No, this is not UTF-8.)

It is not valid in Windows-1252. Python's "latin-1" codec translates it to Unicode U+009D, which is "Operating System Command". That makes little sense. When displayed as Unicode, you get a box with [009d]. (In Python, you can decode anything as Latin-1 without raising errors, but that does not mean the result makes sense.)
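The behaviour described above can be checked with a short sketch. It assumes only Python's standard codecs; the variable names are illustrative and not part of the original question.

```python
import unicodedata

raw = b"\x9d"

# Windows-1252 leaves 0x9D unassigned, so Python's strict decoder rejects it.
try:
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# Latin-1 maps every byte to the code point with the same value,
# so decoding never fails, but here the result is a C1 control character.
latin1_char = raw.decode("latin-1")
category = unicodedata.category(latin1_char)

print(cp1252_ok, hex(ord(latin1_char)), category)
```

The category `Cc` marks a control character, which is why the decoded text shows a placeholder box rather than a visible symbol.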

Examples, with Python-style escapes, from a dirty database I am cleaning up that combines text from many sources:

Guitar Pro, JamPlay, RedBana\\\ Audition,\x9d Doppleganger\x99s The Lounge\x9d or Heatwave Interactive\x99s Platinum Life Country,\\"

for example \\"I\\\'ve seen the bull run in Pamplona, Spain\x9d.\\" Everything

Netwise Depot is  a \\"One Stop Web Shop\\"\x9d that provides sustainable \\"green\\"\x9d living

are looking for a \\"Do It for Me\\"\x9d solution

From the context, I would suspect ™ or ®. But which 8-bit code is this?

+4
4 answers

Here's a completely wild hypothesis:

Some previous (really broken) system working on this data tried to write each character as UTF-8, but actually wrote only the last byte of each sequence (maybe there was a strange one-byte buffer somewhere). Alternatively, the data was UTF-8 in the past, but someone viewing it in a different encoding did a search and replace to remove the 0xE2 0x80 bytes because they clearly did not belong, without realizing that the remaining special character was not what they wanted either.

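For ASCII characters this would go unnoticed, because each one is a single byte in UTF-8.

The character "’" (U+2019) is 0xE2 0x80 0x99 in UTF-8. In \x99s, an apostrophe followed by s is what you would expect, so only the 0x99 was written.

The character "”" (U+201D) is 0xE2 0x80 0x9D in UTF-8. Only the 0x9D was written, and it appears where a closing quotation mark would go.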

, , , , . UTF-8 "" , , - , .
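Both variants of this hypothesis can be simulated in a few lines. This is a sketch under the answer's assumptions, not a reconstruction of the actual broken system; `keep_last_utf8_byte` and the sample string are hypothetical names and data made up for illustration.

```python
def keep_last_utf8_byte(text: str) -> bytes:
    """Simulate a writer that keeps only the final byte of each UTF-8 sequence."""
    return bytes(ch.encode("utf-8")[-1] for ch in text)


# Hypothetical sample with a curly apostrophe and curly double quotes.
sample = "Doppleganger\u2019s \u201cThe Lounge\u201d"

# Variant 1: only the last byte of every character was written.
damaged = keep_last_utf8_byte(sample)

# Variant 2: someone stripped the 0xE2 0x80 prefix bytes from valid UTF-8.
stripped = sample.encode("utf-8").replace(b"\xe2\x80", b"")

print(damaged)
print(stripped)
```

For these characters both variants leave the same stray bytes: 0x99 where the apostrophe was and 0x9D where the closing quote was. If the hypothesis holds for a given file, mapping 0x99 back to U+2019 and 0x9D back to U+201D would be one possible repair, though that is an assumption to verify against the data.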

+4

Windows-1256, , \x99 \x9d . , , , . , , .

- chardet.

+1

Perhaps it is DOS (CP850).

0x9D is "Ø" there.
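One way to compare the code pages suggested in the answers is to decode the single byte under each candidate. The sketch below uses only Python's built-in codecs; the candidate list and the helper name `decode_byte` are illustrative choices, not something from the thread.

```python
def decode_byte(byte: int, codec_names):
    """Decode one byte under each codec; None means the byte is unassigned there."""
    results = {}
    for name in codec_names:
        try:
            results[name] = bytes([byte]).decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Candidate code pages drawn from the thread, plus CP437 as another common DOS page.
candidates = ["cp850", "cp437", "cp1252", "cp1256"]
table = decode_byte(0x9D, candidates)

for name, char in table.items():
    shown = "unassigned" if char is None else f"U+{ord(char):04X}"
    print(f"{name}: {shown}")
```

Printing code points rather than glyphs avoids confusion when a candidate maps the byte to an invisible or formatting character.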

0

, , , , 8- ASCII, 0x9D , .

This may be the result of years of data munging. There are other questions about Python encoding conversions that fail on 0x9D, so this is not unique to this data. Somewhere there is something that emits 0x9D once in a while, usually after quotes. Maybe some old word processor. Thank you all.

0

Source: https://habr.com/ru/post/1683936/

