Unicode issue in cgi.PATH_INFO URLs in ColdFusion

My ColdFusion site (MX7 on IIS 6) has a search feature that appends the search term to the URL, e.g. http://www.example.com/search.cfm/searchterm.

The problem is that the site is multilingual, so the search term may be in another language, e.g. القاهرة, which gives a search URL such as http://www.example.com/search.cfm/القاهرة

The trouble starts when I try to retrieve the search term from the URL. I use cgi.PATH_INFO to get the path to the search page plus the search term (e.g. /search.cfm/searchterm) and extract the search term from that. However, when Unicode characters are used in the search, they are converted to question marks, e.g. /search.cfm/??????.

These are literal question marks. It is not that the browser cannot render the Unicode characters, or that they are being mangled on output.

I can't find any information on whether ColdFusion supports Unicode in the URL, or on how to work around this and get hold of the complete URL somehow. Does anyone have any ideas?

Greetings

Tom

Edit: Further research has led me to believe the problem may lie with IIS rather than ColdFusion, but my original question still stands.

Further edit

The result of GetPageContext().GetRequest().GetRequestUrl().ToString() is http://www.example.com/search.cfm/searchterm/????? , so the problem seems to run pretty deep.

+4
3 answers

Yes, this isn't really ColdFusion's fault. It's a common problem.

It is mostly the fault of the original CGI specification, which says that PATH_INFO must be %-decoded, thereby losing the original sequence of %xx bytes that would have let you work out which characters were actually meant.

And it is partly IIS's fault, because it always tries to read the submitted %xx bytes in the path part as UTF-8-encoded Unicode (unless the path is not a valid UTF-8 byte sequence, in which case it plumps for the Windows default code page, but gives you no way to find out that this has happened). Having done so, it puts the result into the environment variables as a Unicode string (since envvars are Unicode under Windows).
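The decode-with-silent-fallback behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not IIS's actual code: the function name `decode_path_bytes` is made up for illustration, and `cp1252` is assumed as the default code page of a western Windows install.

```python
from urllib.parse import unquote_to_bytes


def decode_path_bytes(raw: bytes, fallback_codepage: str = "cp1252") -> str:
    """Decode %-unescaped path bytes: try UTF-8 first, then a code page.

    The caller gets no indication of which branch was taken, which is
    the part that makes the behaviour hard to work with.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the (assumed) default code page.
        return raw.decode(fallback_codepage, errors="replace")


# A path whose %xx bytes are valid UTF-8 (the Arabic search term).
utf8_path = unquote_to_bytes(
    "/search.cfm/%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"
)
# A path whose %xx bytes are NOT valid UTF-8 (a lone 0xE9 byte).
legacy_path = unquote_to_bytes("/search.cfm/caf%E9")

print(decode_path_bytes(utf8_path))    # decoded as UTF-8
print(decode_path_bytes(legacy_path))  # decoded via the fallback code page
```

Both calls return an ordinary string, so code further down the line cannot tell which interpretation was applied.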

However, most byte-based tools that use the C stdio (and I assume this applies to ColdFusion, as it does to Perl, Python 2, PHP and so on) then try to read the environment variables as bytes, and the MS C runtime re-encodes the Unicode contents using the Windows default code page. So any characters that don't fit in the default code page are lost for good. That includes your Arabic characters when running on a western Windows install.
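The loss can be reproduced outside ColdFusion. The Python sketch below models the two steps described above: the path is %-decoded as UTF-8, then re-encoded into a default code page, with unencodable characters replaced. The function name is hypothetical, and `cp1252` (western) and `cp1256` (Arabic) are assumed code pages chosen for the comparison.

```python
from urllib.parse import unquote


def path_info_seen_by_byte_tool(raw_path: str, codepage: str = "cp1252") -> str:
    """Model PATH_INFO as a byte-based tool would read it."""
    # Step 1: the server %-decodes the path as UTF-8 into a Unicode string.
    unicode_path = unquote(raw_path, encoding="utf-8")
    # Step 2: the C runtime re-encodes that string into the default code
    # page; anything the code page cannot represent becomes "?".
    as_bytes = unicode_path.encode(codepage, errors="replace")
    return as_bytes.decode(codepage)


raw = "/search.cfm/%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"

print(path_info_seen_by_byte_tool(raw, "cp1252"))  # /search.cfm/???????
print(path_info_seen_by_byte_tool(raw, "cp1256"))  # Arabic survives here
```

With the western code page every Arabic character turns into a literal question mark, matching the symptom in the question. With an Arabic code page the same term survives, which is why the result depends on how the server's Windows install is configured.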

A clever script that has direct access to the Win32 GetEnvironmentVariableW API can call it to retrieve the native-Unicode environment variable, which it can then encode to UTF-8 or whatever else it wants, assuming the input was also UTF-8 (which is what you generally want today). However, I don't think ColdFusion gives you this access, and in any case it only works from IIS6 onwards; IIS5.x throws away any characters outside the default code page before they even reach the environment variables.

Otherwise, your best bet is URL rewriting. If a layer above CF can convert search.cfm/القاهرة to search.cfm?q=القاهرة, then you don't face the same problem, because the QUERY_STRING variable, unlike PATH_INFO, is not specified to be %-decoded, so the %xx bytes stay in place where a tool at CF's level can see them.
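To illustrate why the query string is the safer carrier: its %xx bytes arrive untouched, so the application can decode them with whatever charset it chooses. Below is a minimal Python sketch of that decoding step. The parameter name `q` comes from the example above; the helper name `search_term_from_query` is made up for illustration.

```python
from urllib.parse import parse_qs


def search_term_from_query(raw_query_string: str, name: str = "q") -> str:
    """Extract one parameter from a raw QUERY_STRING, decoding as UTF-8."""
    # The %xx escapes are still pure ASCII at this point, so nothing has
    # been lost to a code-page conversion yet.
    params = parse_qs(raw_query_string, encoding="utf-8", errors="strict")
    return params[name][0]


raw_query_string = "q=%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"
print(search_term_from_query(raw_query_string))
```

Because the raw query string contains only ASCII characters, it passes through any code-page conversion unchanged, and the decoding to Unicode happens in application code where the charset is known.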

+3

Here is what you could do:

 <cfset url.searchTerm = URLEncodedFormat("القاهرة", "utf-8")>
 <cfset myVar = URLDecode(url.searchTerm, "utf-8")>
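For comparison, here is the same encode/decode round trip sketched in Python, which also shows what the UTF-8 percent-encoded form of the term looks like. This is not ColdFusion code; `quote` and `unquote` from the standard library stand in for URLEncodedFormat and URLDecode.

```python
from urllib.parse import quote, unquote

term = "القاهرة"

# Percent-encode the UTF-8 bytes of the term (safe="" escapes everything
# that is not an unreserved character).
encoded = quote(term, safe="", encoding="utf-8")

# Decode the escapes back, again interpreting the bytes as UTF-8.
decoded = unquote(encoded, encoding="utf-8")

print(encoded)  # %D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9
print(decoded == term)
```

Each Arabic letter becomes two %xx escapes, since each one takes two bytes in UTF-8.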

That said, I would recommend working with something like this instead:

yourtemplate.cfm?SEARCHTERM=%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9

And then rewrite the URL in IIS (if this is not already done by a framework or the rest of the application) to match your pattern: http://learn.iis.net/page.aspx/461/creating-rewrite-rules-for-the-url-rewrite-module/
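The mapping such a rewrite rule needs to perform can be sketched as follows. This is a Python model of the transformation only, not IIS rewrite-module configuration. The pattern, the `q` parameter name and the function name are assumptions for illustration. The key point is that the rule works on the raw, still percent-encoded path, so the %xx bytes are carried over into the query string unchanged.

```python
import re

# Matches /search.cfm/<term> where <term> is a single path segment.
PATH_STYLE = re.compile(r"^/search\.cfm/(?P<term>[^/?]+)$")


def rewrite_to_query(raw_path: str) -> str:
    """Turn /search.cfm/<term> into /search.cfm?q=<term>; leave others alone."""
    match = PATH_STYLE.match(raw_path)
    if match is None:
        return raw_path
    # The term is copied verbatim, escapes and all; no decoding happens here.
    return "/search.cfm?q=" + match.group("term")


print(rewrite_to_query("/search.cfm/%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"))
print(rewrite_to_query("/index.cfm"))
```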

+2

You can set the character encoding for the URL and FORM scopes using the setEncoding() function:

http://www.adobe.com/livedocs/coldfusion/7/htmldocs/wwhelp/wwhimpl/common/html/wwhelp.htm?context=ColdFusion_Documentation&file=00000623.htm

You need to do this before you access any of the variables in that scope.

But the default encoding for these scopes is already UTF-8, so this may not help. It is also unlikely to affect the CGI scope.

Is the IIS server logging the correct characters in its request log?

0

Source: https://habr.com/ru/post/1308754/

