HttpClient issue with URLs that include curly braces

I am using HttpClient in my Android application. At some point I need to fetch data from remote locations. Below is a snippet showing how I use HttpClient to get the response.

    String url_s = "https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg"; // my URL string
    DefaultHttpClient httpClient = new DefaultHttpClient();
    response = httpClient.execute(new HttpGet(url_s));

In most cases it works absolutely fine, but not when there are curly braces in my URL, which is a plain String. The stack trace gives me the index of the curly brace and reports it as an invalid character. So I tried to create a URI from the URL instead.

    URL url = new URL(url_s);
    URI uri = url.toURI();
    response = httpClient.execute(new HttpGet(uri));

After that, I did not get any result from the remote location at all. I worked around the problem by replacing the curly braces:

  • "{" with "%7B"
  • "}" with "%7D"
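
For reference, that workaround amounts to something like the following minimal sketch. The class name BraceEscaper and the helper escapeBraces are made up for illustration; it touches only the two brace characters and leaves every other character as it is, which is exactly why it feels hardcoded.

```java
public class BraceEscaper {

    // Hypothetical helper: percent-encodes only "{" and "}".
    // Any other character that is illegal in a URL is left untouched.
    static String escapeBraces(String url) {
        return url.replace("{", "%7B").replace("}", "%7D");
    }

    public static void main(String[] args) {
        String url_s = "https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg";
        System.out.println(escapeBraces(url_s));
    }
}
```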

But I am not completely satisfied with my solution. Is there a better one? Something neat and not hardcoded like mine?

2 answers

The strict answer is that you should never have curly braces in your URL.

A full description of what makes a valid URL can be found in RFC 1738.

The part relevant to this answer is as follows:

Unsafe:

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.

To get around the problem you are facing, you must encode your URL.

The "host cannot be null" error you ran into occurs when the entire URL is encoded, including the https://mydomain.com/ part, so the client can no longer tell where the host is. You only want to encode the last part of the URL, called the path.

The solution is to use the Uri.Builder class to build your URI from its separate parts, which encodes the path in the process.

You will find a detailed description in the Android SDK reference documentation for Uri.Builder.

Some trivial examples using your values:

    Uri.Builder b = Uri.parse("https://mydomain.com").buildUpon();
    b.path("/abc/{5D/{B0blhahblah-blah}I1.jpg");
    Uri u = b.build();

Or you can chain the calls:

    Uri u = Uri.parse("https://mydomain.com")
            .buildUpon()
            .path("/abc/{5D/{B0blhahblah-blah}I1.jpg")
            .build();
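
The difference between encoding the whole URL and encoding only the path can also be checked off-device with plain java.net classes. The following is only a sketch under assumptions: it uses java.net.URI and URLEncoder as stand-ins rather than the Android Uri.Builder, and the class name PathOnlyEncoding and both helper names are invented for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class PathOnlyEncoding {

    // Hypothetical helper: encodes the whole string, including "://" and every "/".
    // The result no longer has a recognisable scheme or host.
    static String encodeWholeUrl(String url) {
        try {
            return URLEncoder.encode(url, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Hypothetical helper: builds the URI from its parts. The multi-argument
    // java.net.URI constructor percent-encodes characters that are illegal in a path.
    static URI encodePathOnly(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/abc/{5D/{B0blhahblah-blah}I1.jpg";

        // Whole-URL encoding: the parsed result has no host at all.
        String whole = encodeWholeUrl("https://mydomain.com" + path);
        System.out.println(whole);
        System.out.println(URI.create(whole).getHost());

        // Path-only encoding: scheme and host survive, braces are escaped.
        URI uri = encodePathOnly("https", "mydomain.com", path);
        System.out.println(uri);
        System.out.println(uri.getHost());
    }
}
```

With the host kept separate from the path, the resulting URI can be passed to new HttpGet(uri) in the same way as in the question.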

Except that RFC 1738 has been obsolete for more than a decade and has been superseded by RFC 3986, and nowhere in

https://tools.ietf.org/html/rfc3986

does it indicate that curly braces are unsafe (in fact, the RFC does not contain the curly brace characters at all). In addition, I have tried URIs containing curly braces in browsers, and they work fine.

Also note that the OP is using a URI class, which should certainly follow RFC 3986, if not RFC 3987.
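
Whatever the RFCs say, the behaviour in the question is easy to reproduce: java.net.URI (whose Javadoc describes it as following RFC 2396 as amended by RFC 2732) rejects a raw "{" and reports its position, which matches the index seen in the stack trace. A minimal sketch, with the class name BraceCheck and the helper firstIllegalIndex made up for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BraceCheck {

    // Hypothetical helper: returns the index that java.net.URI reports as illegal,
    // or -1 if the string parses cleanly.
    static int firstIllegalIndex(String url) {
        try {
            new URI(url);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        String url_s = "https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg";
        System.out.println("first '{' is at index " + url_s.indexOf('{'));
        System.out.println("URI parser complains at index " + firstIllegalIndex(url_s));
    }
}
```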

However, oddly enough, IRIs are defined in:

https://tools.ietf.org/html/rfc3987

Note that:

Systems accepting IRIs MAY also deal with the printable characters in US-ASCII that are not allowed in URIs, namely "<", ">", '"', space, "{", "}", "|", "\", "^", and "`", in step 2 above. If these characters are found but are not converted, then the conversion SHOULD fail. Please note that the number sign ("#"), the percent sign ("%"), and the square bracket characters ("[", "]") are not part of the above list and MUST NOT be converted.

In other words, the RFCs themselves seem to have some problems.


Source: https://habr.com/ru/post/893031/

