In which encoding is 0xDB a currency symbol?

I received some files and, unfortunately, I have no way of finding out how they were generated. I need to parse these files.

The files are entirely ASCII, except for one character: 0xDB (219 in decimal).
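
A quick way to confirm that claim is to count every byte above 0x7F. This is only a minimal sketch: the helper name `non_ascii_bytes` is made up for illustration, and the sample bytes are copied from the hexdump further down in the question rather than read from a real file.

```python
from collections import Counter

def non_ascii_bytes(data: bytes) -> Counter:
    """Count every byte value above 0x7F in the raw data."""
    return Counter(b for b in data if b > 0x7F)

# Sample bytes taken from the hexdump in the question. For a real file,
# read it in binary mode instead, e.g. open(path, "rb").read().
sample = b"quant \xdb2.60\n AIM"

for value, count in non_ascii_bytes(sample).items():
    print(f"0x{value:02X} ({value} decimal): {count} occurrence(s)")
```

If 0xDB is the only value reported for every file, the "ASCII plus one odd byte" description holds.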

Looking at the files, this character is obviously a currency symbol. I know this because:

  • These files must contain a currency symbol wherever an amount is displayed.
  • There are no other currency symbols (neither $, nor the euro sign, nor anything else) anywhere in the files.
  • 0xDB always appears next to an amount.

I think 0xDB is meant to represent the euro symbol in these files; it very likely appears exactly where a euro symbol should appear.

The file command says this:

ISO-8859 English text, with CRLF, LF line terminators 

hexdump gives the following:

 00000030 71 75 61 6e 74 20 db 32 2e 36 30 0a 20 41 49 4d |quant .2.60. AIM|
                            ^^                                   ^

Apart from that, the files are formatted normally and parse fine. In fact, I can extract all the information; the only problem is this strange 0xDB character.

Does anyone know what is going on? How did the currency symbol (supposedly the euro symbol) somehow become 0xDB?

This is neither ISO-8859-1 (aka ISO Latin 1) nor ISO-8859-15, because in both cases code point 219 corresponds to "Û" (just as Unicode code point 219 is "LATIN CAPITAL LETTER U WITH CIRCUMFLEX").

It is not extended ASCII either.
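
The ISO-8859 part of that reasoning can be checked with Python's built-in codecs. This is just a sketch using the standard codec names `iso8859_1` and `iso8859_15`:

```python
import unicodedata

raw = bytes([0xDB])  # the single non-ASCII byte found in the files

for codec in ("iso8859_1", "iso8859_15"):
    char = raw.decode(codec)
    print(codec, f"U+{ord(char):04X}", unicodedata.name(char))
```

Both codecs report U+00DB LATIN CAPITAL LETTER U WITH CIRCUMFLEX, so neither of them turns 0xDB into a currency symbol.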

+4
4 answers

It could be Mac OS Roman

+7

This is MacRoman. In fact, it has to be: it is the only encoding in which the Euro sign maps to 0xDB.

Here is the full mapping table for MacRoman.
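
For anyone who wants to test this kind of guess themselves, here is a rough sketch that tries every codec bundled with Python and keeps those that decode the byte to a character in the Unicode category Sc (currency symbol). The function name `codecs_mapping_to_currency` is made up for illustration. The search only covers Python's own codec list, so it cannot prove anything about encodings Python does not ship.

```python
import unicodedata
from encodings.aliases import aliases

def codecs_mapping_to_currency(byte_value: int) -> dict[str, str]:
    """Return {codec: char} for codecs that decode the byte to a currency symbol."""
    hits = {}
    for codec in sorted(set(aliases.values())):
        try:
            text = bytes([byte_value]).decode(codec)
        except Exception:
            # Multi-byte, non-text and platform-specific codecs cannot
            # decode a lone byte like this one, so they are skipped.
            continue
        if len(text) == 1 and unicodedata.category(text) == "Sc":
            hits[codec] = text
    return hits

for codec, char in codecs_mapping_to_currency(0xDB).items():
    print(codec, f"U+{ord(char):04X}", unicodedata.name(char))
```

The `mac_roman` codec shows up in the result with U+20AC EURO SIGN. Any other codecs listed depend on the Python version and deserve a look before settling on one.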

+4

Using the macroman script, you can look it up:

 $ macroman 0xDB
 MacRoman DB ⇒ U+20AC ‹€› \N{ EURO SIGN }

You can also go the other way:

 $ macroman U+00E9
 MacRoman 8E ⇐ U+00E9 ‹é› \N{ LATIN SMALL LETTER E WITH ACUTE }
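
If the macroman script is not at hand, both lookups can be approximated with Python's built-in `mac_roman` codec. This is a sketch, not a replacement for the script, and the two helper names are made up for illustration:

```python
def macroman_to_unicode(byte_value: int) -> str:
    """Decode one MacRoman byte to its Unicode character."""
    return bytes([byte_value]).decode("mac_roman")

def unicode_to_macroman(char: str) -> int:
    """Encode one Unicode character to its MacRoman byte value."""
    return char.encode("mac_roman")[0]

euro = macroman_to_unicode(0xDB)
print(f"MacRoman DB -> U+{ord(euro):04X}")  # → MacRoman DB -> U+20AC

e_acute = unicode_to_macroman("\u00e9")
print(f"MacRoman {e_acute:02X} <- U+00E9")  # → MacRoman 8E <- U+00E9
```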

And we know that U+20AC EURO SIGN really is a currency symbol from the output of the uniprops script:

 $ uniprops -a U+20AC
 U+20AC <€> \N{ EURO SIGN }:
     \pS \p{Sc}
     All Any Assigned InCurrencySymbols Common Zyyy Currency_Symbol Sc
     Currency_Symbols S Gr_Base Grapheme_Base Graph GrBase Print Symbol
     X_POSIX_Graph X_POSIX_Print
     Age=2.1 Bidi_Class=ET Bidi_Class=European_Terminator BC=ET
     Block=Currency_Symbols Canonical_Combining_Class=0
     Canonical_Combining_Class=Not_Reordered CCC=NR
     Canonical_Combining_Class=NR Script=Common Decomposition_Type=None
     DT=None East_Asian_Width=A East_Asian_Width=Ambiguous EA=A
     Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX
     Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA
     Joining_Group=No_Joining_Group JG=NoJoiningGroup
     Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=PR
     Line_Break=Prefix_Numeric LB=PR Numeric_Type=None NT=None
     Numeric_Value=NaN NV=NaN Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0
     Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0
     Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1
     Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy
     Sentence_Break=Other SB=XX Sentence_Break=XX Word_Break=Other WB=XX
     Word_Break=XX _X_Begin
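
A small subset of these properties can also be read with Python's standard `unicodedata` module. This sketch only covers the few properties that module exposes:

```python
import unicodedata

euro = "\u20ac"

print(unicodedata.name(euro))              # → EURO SIGN
print(unicodedata.category(euro))          # → Sc
print(unicodedata.bidirectional(euro))     # → ET
print(unicodedata.east_asian_width(euro))  # → A
```

The general category Sc is the same "Currency_Symbol" property that uniprops reports.
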
+2

0xDB represents the Euro sign in the Mac OS Roman character encoding.
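
Given that, parsing the files could look roughly like the sketch below. It decodes the raw bytes as `mac_roman` and pulls out the amounts. The amount pattern (euro sign, digits, a dot, two decimals) and the helper name `extract_amounts` are assumptions based on the single "2.60" visible in the question's hexdump, so adjust them to the real data.

```python
import re

# Assumed amount format: euro sign, digits, a dot, two decimals.
# This is a guess from the one sample in the hexdump, not a known spec.
AMOUNT = re.compile(r"€(\d+\.\d{2})")

def extract_amounts(raw: bytes) -> list[str]:
    """Decode MacRoman bytes and return every amount that follows a euro sign."""
    text = raw.decode("mac_roman")
    return AMOUNT.findall(text)

# Sample bytes copied from the hexdump in the question.
sample = b"quant \xdb2.60\n AIM"
print(extract_amounts(sample))  # → ['2.60']
```

For whole files, `open(path, encoding="mac_roman")` in text mode should do the same decoding. Python's default universal newline handling also converts the mixed CRLF and LF terminators reported by the file command to "\n".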

+1

Source: https://habr.com/ru/post/1337678/

