When I create a PDF form (for example, using Acrobat) that contains text fields in AcroForm format (PDF dictionaries, XFA) and I submit the data to a server, how can I specify or determine the encoding that will be used?
For instance, when I submit the Chinese characters 测试 (test), I get the following request headers and body on the server side:
accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
content-type: application/x-www-form-urlencoded
content-length: 23
acrobat-version: 10.1.4
user-agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC; .NET4.0C; AskTbCLA/5.15.1.22229)
accept-encoding: gzip, deflate
connection: Keep-Alive

Song=%b2%e2%ca%d4&Test=
The request carries no character-set information: the only hint is the content type application/x-www-form-urlencoded. The two characters arrive as four bytes: B2 E2 CA D4. After some research, I found that B2E2 is the GBK value of the first character and CAD4 is the GBK value of the second, but nothing in the request headers tells me that GBK was used.
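To make the observation concrete, here is a minimal Python sketch that decodes the submitted body under an assumed character set. The helper name `decode_form_body` is hypothetical, and the choice of GBK is an assumption taken from the research above, not something the request itself declares.

```python
from urllib.parse import unquote_to_bytes


def decode_form_body(body: str, encoding: str) -> dict:
    """Decode an application/x-www-form-urlencoded body.

    `encoding` is an assumption supplied by the caller; the request
    shown above does not carry any charset information.
    """
    fields = {}
    for pair in body.split("&"):
        name, _, value = pair.partition("=")
        # '+' means a space in form encoding; %XX escapes become raw bytes.
        raw_name = unquote_to_bytes(name.replace("+", " "))
        raw_value = unquote_to_bytes(value.replace("+", " "))
        # Strict decoding: raises UnicodeDecodeError if the guess is wrong.
        fields[raw_name.decode(encoding)] = raw_value.decode(encoding)
    return fields


body = "Song=%b2%e2%ca%d4&Test="
print(decode_form_body(body, "gbk"))

try:
    decode_form_body(body, "utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

Decoding as GBK recovers the two submitted characters, while decoding the same bytes as UTF-8 fails. This matches the byte values above, but it only confirms the guess after the fact; it does not tell the server in advance which encoding to expect.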
Is it always GBK? I was hoping to control the encoding of the submitted data by setting a specific key in a PDF dictionary, but that does not seem to be possible. For example, I would like to make sure that the PDF always sends Unicode characters instead of GBK.
Note that I have already experimented with changing the default font (and encoding) of the text field. I also searched ISO 32000-1 for field encodings, but all I found was a way to define non-Latin characters for checkboxes and some information about the encoding of FDF files. Neither answered my question.