HTTP POST request character encoding detection

I am building a web service with a node that accepts a POST to create a new resource. The resource expects one of two content types: an XML format that I will define, or form-encoded variables.

The idea is that consuming applications can POST XML directly and benefit from better validation, etc., but there is also an HTML interface that will POST form-encoded data. Obviously, the XML format has a charset declaration, but I cannot see how to detect the form's charset just from looking at the POST.

A typical form post from Firefox is as follows:

 POST /path HTTP/1.1
 Host: www.myhostname.com
 User-Agent: Mozilla/5.0 [...etc...]
 Accept: text/html,application/xhtml+xml, [...etc...]
 Accept-Language: en-gb,en;q=0.5
 Accept-Encoding: gzip,deflate
 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
 Keep-Alive: 300
 Connection: keep-alive
 Content-Type: application/x-www-form-urlencoded
 Content-Length: 41

 field1=value1&field2=value2&field3=value3

This does not seem to contain any useful character set information.

From what I can see, the application/x-www-form-urlencoded type is defined entirely in HTML, which just lays out the %-encoding rules but says nothing about what charset the data should be in.

Basically, is there any way to tell the charset if I don't know the charset of the HTML page the form came from? Otherwise, I will have to guess the charset from the characters present, and from what I can tell that is always a bit hit-and-miss.
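To make the guessing fallback concrete, here is a minimal sketch of one common heuristic: percent-decode the value to raw bytes, try a strict UTF-8 decode, and fall back to ISO-8859-1 if that fails. The function name guess_decode is hypothetical, and the heuristic is an assumption rather than anything the form encoding itself defines. It can misidentify short ISO-8859-1 strings that happen to be valid UTF-8.

```python
from urllib.parse import unquote_to_bytes


def guess_decode(raw_value):
    """Guess the charset of one raw form value (hypothetical helper).

    Returns a (text, charset_name) tuple.
    """
    # In form encoding "+" stands for a space; then undo the %XX escapes
    # to recover the octets the browser actually sent.
    data = unquote_to_bytes(raw_value.replace("+", " "))
    try:
        # Strict UTF-8 rejects most byte sequences produced by other charsets.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails.
        return data.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    print(guess_decode("caf%C3%A9"))  # bytes C3 A9 are valid UTF-8
    print(guess_decode("caf%E9"))     # a lone E9 byte is not valid UTF-8
```

Because ISO-8859-1 accepts any byte sequence, the fallback branch always yields some string, which is exactly why the result stays a guess.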

+46
Apr 02 '09 at 9:08
3 answers

The default encoding of an HTTP POST is ISO-8859-1.

Otherwise, you need to look at the Content-Type header, which will then look like

 Content-Type: application/x-www-form-urlencoded ; charset=UTF-8 
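On the server side, that check could look roughly like the sketch below. The function name charset_from_content_type is hypothetical, and the ISO-8859-1 fallback simply mirrors the default stated in this answer. Browsers often omit the charset parameter, as in the request shown in the question.

```python
def charset_from_content_type(header_value, default="ISO-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    Falls back to `default` when the header or the parameter is missing.
    (Hypothetical helper for illustration only.)
    """
    if not header_value:
        return default
    # The first segment is the media type; the rest are name=value parameters.
    for part in header_value.split(";")[1:]:
        name, sep, value = part.partition("=")
        if sep and name.strip().lower() == "charset":
            value = value.strip().strip('"')
            if value:
                return value
    return default


if __name__ == "__main__":
    print(charset_from_content_type(
        "application/x-www-form-urlencoded ; charset=UTF-8"))
    print(charset_from_content_type("application/x-www-form-urlencoded"))
```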

Perhaps you can declare your form with

 <form enctype="application/x-www-form-urlencoded;charset=UTF-8"> 

or

 <form accept-charset="UTF-8"> 

to force the encoding.

Some links:

http://www.htmlhelp.com/reference/html40/forms/form.html

http://www.w3schools.com/tags/tag_form.asp

+57
Apr 02 '09 at 9:16

The charset used in the POST will match the charset of the HTML page hosting the form. So if the page containing your form is served as UTF-8, UTF-8 is the encoding used for the posted content. The URL encoding is applied after the values have been converted to a sequence of octets in that character encoding.
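The order of those two steps (characters to octets, then octets to %XX escapes) can be illustrated with Python's standard urllib.parse module. This is only a sketch: the field name field1 comes from the question, and the sample value is made up.

```python
from urllib.parse import parse_qs, urlencode

value = "caf\u00e9"  # "cafe" with an acute accent on the final e

# Step 1 converts the text to octets in the page's charset;
# step 2 percent-encodes those octets.
as_utf8 = urlencode({"field1": value}, encoding="utf-8")
as_latin1 = urlencode({"field1": value}, encoding="iso-8859-1")

# The server must reverse both steps with the same charset.
right = parse_qs(as_utf8, encoding="utf-8")["field1"][0]
wrong = parse_qs(as_utf8, encoding="iso-8859-1")["field1"][0]

if __name__ == "__main__":
    print(as_utf8)    # field1=caf%C3%A9
    print(as_latin1)  # field1=caf%E9
    print(right == value, wrong == value)
```

The same character produces different bytes on the wire depending on the page charset, and decoding with the wrong charset silently yields different text rather than an error.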

+11
Apr 02 '09 at 9:16

Try setting the encoding on your Content-Type:

 httpCon.setRequestProperty( "Content-Type", "multipart/form-data; charset=UTF-8; boundary=" + boundary ); 
+1
Feb 28 2018-12-12T00: