Reading Java Undecoded URLs from Servlet

Question

Suppose I have a string like '=&?/;#+%' that should be part of my URL, say like this:

example.com/servletPath/someOtherPath/myString/something.html?a=b&c=d#asdf

where myString is the string above. I encoded the critical part so that the URL looks like

 example.com/servletPath/someOtherPath/%3D%26%3F%2F%3B%23%2B%25/something.html?a=b&c=d#asdf

So far so good.

When I am in the servlet and read any of request.getRequestURI(), request.getRequestURL() or request.getPathInfo(), the return value is already decoded, so I get a string like

 someOtherPath/=&?/;#+%/something.html?a=b&c=d#asdf

and I cannot distinguish between real special characters and encoded ones.

I solved my specific problem by banning the characters above altogether, but I still wonder whether there is a way to get the undecoded URL in a servlet class.

Edit: When I hit this issue last night, I was too tired to notice what was actually happening, which is even weirder! I have a servlet mapped to, say, /servletPath/*, after which I can put whatever I want and get my servlet's response depending on the rest of the path, except when there is a %2F in the path. In that case the request never reaches the servlet and I get a 404! If I put '/' instead of %2F, it works fine. I am running Tomcat 6.0.14 on Java 1.6.0-04 on Linux.

+4

java url servlets decode encode

Slartibartfast Jun 08 '09 at 17:50

5 answers

If the decoded URL contains %2F, this means that the encoded URL contains %252F.

Since %2F is /, why not just split on "\/" and not worry about URL encoding?
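To illustrate the point about %252F, here is a minimal sketch that applies one round of percent-decoding to both forms. The class and method names (DoubleEncodingDemo, decodeOnce) are made up for this illustration; only java.net.URLDecoder comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of any servlet API.
final class DoubleEncodingDemo {
    // Percent-decodes the input exactly once, using UTF-8.
    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeOnce("%2F"));   // one decode yields "/"
        System.out.println(decodeOnce("%252F")); // one decode yields "%2F"
    }
}
```

So a literal "%2F" can only survive a single decode if the client sent "%252F".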

+2

Powerlord Jun 08 '09 at 17:58

According to the Javadoc, getRequestURI should not decode the string. getServletPath, on the other hand, returns a decoded string. I tested this locally with Jetty and it behaves as described in the documentation.

So there might be something else in your situation, because the behavior you described does not match Sun documentation.

+1

Francois Gravel Jun 09 '09 at 11:19

It looks like you are trying to do something RESTful (consider using Jersey). Can you just strip the leading and trailing parts of the URL to get the data you are looking for?

url.substring(startLength, url.length() - endLength);

0

stevedbrown Jun 08 '09 at 20:51

Update: this answer originally stated, mistakenly, that '/' and '%2F' in a path should always be treated the same. They are actually different, because a path is a list of /-separated segments.

You should not need to distinguish between an encoded and an unencoded character in the path part of the URL. No character inside the path has a special meaning in a URL. E.g. "%2F" must be interpreted the same as "/", and a browser accessing such a URL is free to replace one with the other as it sees fit. Distinguishing between them breaks the URL encoding standard.

In the complete URL, you must distinguish between escaped and unescaped characters for various reasons, including:

To know where the path part ends, because a ? encoded in the path should not be seen as the end of it.
Inside the query string, because part of a parameter value could contain '&' or '=', ...
Inside the path, where "/" separates two segments while "%2F" may be contained within a segment.

Java handles the first two cases nicely:

getPathInfo(), which returns only the path part, decoded
getParameter(String), to access the parts of the query string

It does not handle the third case so well. If you want to distinguish between "/" as the separator of two path segments and "/" inside a path segment (%2F), then you cannot consistently represent the path as one decoded string. You can either represent it as a single encoded string (for example, "foo/bar%2Fbaz"), or as a list of decoded segments (for example, "foo", "bar/baz"). But since the getPathInfo() API promises to return exactly that (one decoded string), it has no choice but to treat '/' and '%2F' as the same.

For regular web applications this is just fine. If you are in the rare case where you really need to make the distinction, you can parse the URL yourself, getting the raw version with getRequestURI(). If that method returns a decoded URL, as you claim, then you have hit a bug in the servlet implementation.
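The "list of decoded segments" representation could be sketched roughly as follows, assuming the raw string comes from something like request.getRequestURI() and is still percent-encoded. The class and method names (RawPathSegments, decodeSegments, decodeSegment) are invented for this example. It ignores path parameters such as ;jsessionid and silently drops empty segments, so treat it as a starting point only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the servlet API.
final class RawPathSegments {
    // Splits a still-encoded path on literal '/' first, then decodes each
    // segment on its own, so an encoded %2F stays inside its segment.
    static List<String> decodeSegments(String rawPath) {
        List<String> segments = new ArrayList<String>();
        for (String raw : rawPath.split("/")) {
            if (raw.isEmpty()) {
                continue; // skips the empty piece before a leading '/' (and any "//")
            }
            segments.add(decodeSegment(raw));
        }
        return segments;
    }

    static String decodeSegment(String raw) {
        try {
            // URLDecoder does form decoding and would turn '+' into a space.
            // In a path '+' is a literal, so it is protected before decoding.
            return URLDecoder.decode(raw.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String raw = "/servletPath/someOtherPath/%3D%26%3F%2F%3B%23%2B%25/something.html";
        // prints [servletPath, someOtherPath, =&?/;#+%, something.html]
        System.out.println(decodeSegments(raw));
    }
}
```

Splitting before decoding is the important part: once the whole path has been decoded into one string, the two kinds of '/' can no longer be told apart.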

-1

Wouter Coekaerts Jun 09 '09 at 11:36

Jona Christopher Sahnwaldt · Accepted Answer · 2009-06-30T16:08:36+0000

There is a fundamental difference between "%2F" and "/", for both the browser and the server.

The HttpServletRequest specification says (without any logic, AFAICT):

getContextPath: not decoded
getPathInfo: decoded
getPathTranslated: not decoded
getQueryString: not decoded
getRequestURI: not decoded
getServletPath: decoded

The result of getPathInfo() should be decoded, but the result of getRequestURI() must not be decoded. If it is decoded, your servlet container violates the specification (as Wouter Coekaerts and Francois Gravel correctly pointed out). What version of Tomcat are you using?

To make matters worse, current versions of Tomcat reject paths containing the encodings of certain special characters (such as %2F) for security reasons.
