java.net.URI and percent signs in query parameter values

System.out.println( new URI("http", "example.com", "/servlet", "a=x%20y", null)); 

The result is http://example.com/servlet?a=x%2520y , where the value of the query parameter differs from the one supplied. Strange, but this is what the Javadoc says:

"The percent character ('%') is always quoted by these constructors."

We can pass the decoded string a=x y instead, and then get a reasonable (?) result: a=x%20y .
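The behavior described above can be reproduced with a short program. This is only a minimal sketch; the class name PercentQuoting and the helper build are made up for illustration and are not part of any library.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PercentQuoting {
    // Hypothetical helper: builds the example URL with the five-argument
    // constructor (scheme, authority, path, query, fragment).
    static String build(String query) {
        try {
            return new URI("http", "example.com", "/servlet", query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The '%' of an already encoded value is quoted again.
        System.out.println(build("a=x%20y")); // http://example.com/servlet?a=x%2520y
        // A literal space is quoted exactly once.
        System.out.println(build("a=x y"));   // http://example.com/servlet?a=x%20y
    }
}
```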

But what if the value of a query parameter contains an "&" character? This happens, for example, when the value is itself a URL with query parameters. Look at this (wrong) query string: a=b&c . The ampersand must be escaped here ( a=b%26c ), otherwise the string is read as a query parameter a=b followed by some garbage ( c ). If I pass the escaped form to the URI constructor, it encodes it again and returns an invalid URL: ...?a=b%2526c

This problem seems to make java.net.URI useless. Am I missing something?
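The ampersand dilemma can be checked the same way. The sketch below is illustrative only (AmpersandQuoting and rawQuery are hypothetical names): neither input yields the raw query a=b%26c from the multi-argument constructor.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AmpersandQuoting {
    // Hypothetical helper: returns the raw (still encoded) query produced by
    // the five-argument constructor for a given query argument.
    static String rawQuery(String query) {
        try {
            return new URI("http", "example.com", "/servlet", query, null).getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // '&' is legal in a query, so it is left alone and still acts as a separator.
        System.out.println(rawQuery("a=b&c"));   // a=b&c
        // Escaping it beforehand does not help: the '%' is quoted again.
        System.out.println(rawQuery("a=b%26c")); // a=b%2526c
    }
}
```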

Summary of responses

java.net.URI knows that a URI has a query part, but it does not understand the internal structure of that part, which may differ from scheme to scheme. In particular, java.net.URI does not understand the structure of an HTTP query string. That would not be a problem if java.net.URI treated the query as an opaque string and left it unchanged. Instead, it applies a generic percent-encoding algorithm that breaks HTTP URLs.

Therefore, I cannot use the URI class to reliably assemble a URL from its parts, even though it has constructors for exactly that. I would also like to mention that as of Java 7 the implementation of relativization is very limited: it only works if one URL is a prefix of the other. These two features (and a more compact interface for them) were the reason I was interested in java.net.URI, but neither worked for me.
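The relativization limit is easy to see in a small test. This is a sketch with made-up URLs and a hypothetical helper named rel: when the base path is not a prefix of the target path, relativize simply returns the target unchanged.

```java
import java.net.URI;

class RelativizeLimit {
    // Hypothetical helper: relativizes target against base and returns the result as text.
    static String rel(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        // The base path is a prefix of the target path, so a relative URI comes back.
        System.out.println(rel("http://a/b/", "http://a/b/c"));  // c
        // Sibling paths: no "../"-style result, the target is returned as-is.
        System.out.println(rel("http://a/b/c", "http://a/b/d")); // http://a/b/d
    }
}
```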

In the end, I used java.net.URL for parsing and wrote my own code to assemble URLs from parts and to relativize two URLs. I also checked the URIBuilder class from Apache HttpClient: although it understands the internals of an HTTP query string, as of 4.3 it has the same encoding problem as java.net.URI when given the query part as a whole.
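The hand-written assembly code is not shown in the post. One possible shape, purely as a sketch, is to encode every name and value separately and hand the finished string to the single-argument factory, which does not re-quote existing escapes. The class QueryAssembler and its methods are hypothetical. Note that URLEncoder produces form encoding, so a space becomes "+" rather than "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

class QueryAssembler {
    // Encodes a single name or value (form encoding, UTF-8).
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Joins alternating names and values into name=value pairs separated by '&'.
    static String encodeQuery(String... namesAndValues) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(namesAndValues[i])).append('=').append(encode(namesAndValues[i + 1]));
        }
        return sb.toString();
    }

    // Appends the encoded query to a base URL that has no query of its own.
    static URI assemble(String base, String... namesAndValues) {
        return URI.create(base + "?" + encodeQuery(namesAndValues));
    }

    public static void main(String[] args) {
        System.out.println(assemble("http://example.com/servlet", "a", "b&c"));
        // http://example.com/servlet?a=b%26c
    }
}
```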

4 answers

The query string

 a=b&c 

is not wrong as a URI. The RFC on generic URI syntax (RFC 2396) says:

The query component is a string of information to be interpreted by the resource.

  query = *uric 

Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.

The & character is therefore perfectly valid in a query string ( uric covers reserved characters, mark characters, and alphanumerics). The RFC also states:

Many URI include components consisting of or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.

Since & is valid but reserved, it is up to the user to decide whether a given & should be encoded or not.

What you call a query parameter is not a feature of URIs in general, so the URI class has no reason to (and should not) support it.
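Put differently, interpreting parameters is left to the application. A minimal sketch of that interpretation, with hypothetical names QueryParams and parse, splits the raw query on & first and decodes only afterwards, so an escaped %26 stays inside its value. URLDecoder is used here on the assumption that the query is form-encoded, which means it also turns "+" into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParams {
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Splits the raw query into parameters, then decodes each name and value.
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String raw = URI.create(url).getRawQuery();
        if (raw == null) {
            return params;
        }
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("http://example.com/servlet?a=b%26c&d=e")); // {a=b&c, d=e}
    }
}
```

Splitting the result of getQuery() instead would not work here, because getQuery() decodes %26 to & before the split.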


The only workaround I found is to use the single-argument constructors and methods. Note that you must use URI#getRawQuery() to avoid decoding %26 . For instance:

 URI uri = new URI("http://a/?b=c%26d&e");
 // uri.getRawQuery() equals "b=c%26d&e"
 uri = new URI(new URI(uri.getScheme(), uri.getAuthority(), uri.getPath(), null, null) + "?f=g%26h&i");
 // uri.getRawQuery() equals "f=g%26h&i"
 uri = uri.resolve("?j=k%26l&m");
 // uri.getRawQuery() equals "j=k%26l&m"
 // uri.toString() equals "http://a/?j=k%26l&m"

The only working solution I know of is reflection (see https://blog.stackhunter.com/2014/03/31/encode-special-characters-java-net-uri/ ):

 URI uri = new URI("http", null, "example.com", -1, "/accounts", null, null);
 Field field = URI.class.getDeclaredField("query");
 field.setAccessible(true);
 field.set(uri, encodedQueryString);
 //clear cached string representation
 field = URI.class.getDeclaredField("string");
 field.setAccessible(true);
 field.set(uri, null);

Use the URLEncoder.encode() method. In your case, for example:

 URLEncoder.encode("a=x%20y", "ISO-8859-1"); 
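It may help to check what this call actually returns. In the sketch below (EncodeWholeQuery and encode are hypothetical names), encoding the whole query string also escapes the "=" and the "%", so URLEncoder seems better suited to individual names and values than to a complete query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodeWholeQuery {
    // Hypothetical helper wrapping the call from the answer.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always available
        }
    }

    public static void main(String[] args) {
        // The whole query string: '=' becomes %3D and '%' becomes %25.
        System.out.println(encode("a=x%20y")); // a%3Dx%2520y
        // A single decoded value: the space becomes '+'.
        System.out.println(encode("x y"));     // x+y
    }
}
```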

Source: https://habr.com/ru/post/957868/

