I am creating a web service that has a node which accepts POSTs to create a new resource. The resource expects one of two types of content: an XML format that I will define, or form variables.
The idea is that consuming applications can POST XML directly and benefit from better validation etc., but there is also an HTML interface that will POST form-encoded content. Obviously, the XML format has a charset declaration, but I can't see how to detect the form's charset just from looking at the POST.
A typical form post from Firefox looks like this:
```
POST /path HTTP/1.1
Host: www.myhostname.com
User-Agent: Mozilla/5.0 [...etc...]
Accept: text/html,application/xhtml+xml, [...etc...]
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 41

field1=value1&field2=value2&field3=value3
```
As far as I can tell, this contains nothing that indicates the character set.
From what I can see, the application/x-www-form-urlencoded type is defined entirely in HTML, which just lays out the %-encoding rules but says nothing about which encoding the data should be in.
Basically, is there any way to tell the charset of the posted data if I don't know which charset the HTML form was originally served in? Otherwise I'll have to try to guess the charset based on which characters are present, and as far as I can tell that is always a bit unreliable.
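To illustrate why the body alone can't reveal the charset, here is a minimal Python sketch (the value `café` is just an illustrative example): the same form value percent-encodes to different bytes depending on which charset the sender used, and nothing in the resulting string records which one that was.

```python
from urllib.parse import urlencode

# The same form data, percent-encoded under two different charsets.
fields = {"field1": "café"}

utf8_body = urlencode(fields, encoding="utf-8")
latin1_body = urlencode(fields, encoding="iso-8859-1")

print(utf8_body)    # field1=caf%C3%A9
print(latin1_body)  # field1=caf%E9
```

Note that `caf%C3%A9` is also a perfectly valid ISO-8859-1 sequence once percent-decoded (it would read as `cafÃ©`), so the bytes by themselves don't settle the question.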
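If guessing turns out to be the only option, one common heuristic is to try a strict UTF-8 decode first and fall back to a single-byte charset, since ISO-8859-1 can decode any byte sequence. This is a sketch under the assumption that the request body has already been read as an ASCII string; `decode_form_body` is a hypothetical helper name, not part of any framework.

```python
from urllib.parse import parse_qsl


def decode_form_body(body: str) -> tuple[dict[str, str], str]:
    """Hypothetical helper: decode an urlencoded body, guessing its charset.

    Tries strict UTF-8 first; if any name or value is not valid UTF-8
    after percent-decoding, falls back to ISO-8859-1. Returns the decoded
    fields and the charset that was used. Later duplicate keys overwrite
    earlier ones in the returned dict.
    """
    for charset in ("utf-8", "iso-8859-1"):
        try:
            # errors="strict" makes an invalid byte sequence raise
            # UnicodeDecodeError instead of silently inserting U+FFFD.
            pairs = parse_qsl(
                body, keep_blank_values=True, encoding=charset, errors="strict"
            )
            return dict(pairs), charset
        except UnicodeDecodeError:
            continue
    # ISO-8859-1 maps every byte to a character, so this is never reached.
    raise RuntimeError("no candidate charset could decode the body")


print(decode_form_body("field1=caf%C3%A9"))  # ({'field1': 'café'}, 'utf-8')
print(decode_form_body("field1=caf%E9"))     # ({'field1': 'café'}, 'iso-8859-1')
```

The weakness is exactly the one described above: a Latin-1 submission that happens to form valid UTF-8 sequences will be misread as UTF-8, and the fallback cannot distinguish between different single-byte charsets.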
Ciaran McNulty, Apr 02 '09 at 9:08