Why should I use url.openStream instead of url.getContent?

I would like to get the contents of a URL, like Python's:

html_content = urllib.urlopen("http://www.test.com/test.html").read() 

In the examples ( java2s.com ), you often see the following code:

 URL url = new URL("http://www.test.com/test.html");
 String foo = (String) url.getContent();

The getContent description is as follows:

 Gets the contents of this URL. This method is a shorthand for: openConnection().getContent() Returns: the contents of this URL. 

In my opinion, this should work just fine. But this code does not work; it throws an exception:

 Exception in thread "main" java.lang.ClassCastException: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast to java.lang.String 

Evidently, it returns an InputStream.

So I ask myself: what is the purpose of a function that does not do what it says? Why is there no hint of this quirk in the documentation? And why did I see this in a few examples?

Or am I wrong?

The proposed solution ( https://stackoverflow.com/a/123908/ ) is to use url.openStream() and then read the stream.
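For reference, a minimal sketch of that openStream approach. The class name UrlText and the helper read are made up for illustration, the charset is an assumption the caller has to supply (UTF-8 here), and InputStream.readAllBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDK.
final class UrlText {

    /** Reads everything url.openStream() delivers and decodes it with the given charset. */
    static String read(URL url, Charset charset) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), charset); // readAllBytes: Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java UrlText <url>");
            return;
        }
        // Assumes the page is UTF-8; a real client would inspect the Content-Type header.
        System.out.println(read(new URL(args[0]), StandardCharsets.UTF_8));
    }
}
```

This is roughly the Java counterpart of the Python one-liner above, with the difference that the bytes have to be decoded explicitly.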

+6
3 answers

As you said, the documentation says that URL.getContent() is a shortcut for openConnection().getContent(), so we need to look at the documentation for URLConnection.getContent().

There we see that it returns an Object whose type is determined by the content-type header field of the response. The content type selects the ContentHandler to be used, and that ContentHandler converts the data, based on its MIME type, into an instance of the corresponding Java class.

In other words, the type of object you receive depends on the content. For example, it would be pointless to return a String if the MIME type were image/png.

This is why the java2s.com sample code you refer to checks the class of the returned Object:

 try {
     URL u = new URL("http://www.java2s.com");
     Object o = u.getContent();
     System.out.println("I got a " + o.getClass().getName());
 } catch (Exception ex) {
     System.err.println(ex);
 }

So you can write String foo = (String) url.getContent(); only if you know that your ContentHandler will return a String.
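A small sketch of checking the runtime type instead of casting blindly. The class ContentToString and the method toText are hypothetical names, and the UTF-8 charset is an assumption; the method accepts whatever getContent() returned and only handles the two cases discussed here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDK.
final class ContentToString {

    /** Turns the Object returned by getContent() into text, if it is a String or a stream. */
    static String toText(Object content, Charset charset) throws IOException {
        if (content instanceof String) {
            return (String) content;            // a handler that already produced a String
        }
        if (content instanceof InputStream) {
            try (InputStream in = (InputStream) content) {
                return new String(in.readAllBytes(), charset); // readAllBytes: Java 9+
            }
        }
        // Images and other content types end up here; there is no sensible text form.
        throw new IllegalArgumentException("Unsupported content: "
                + (content == null ? "null" : content.getClass().getName()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ContentToString <url>");
            return;
        }
        Object content = new URL(args[0]).getContent();
        System.out.println("I got a " + content.getClass().getName());
        System.out.println(toText(content, StandardCharsets.UTF_8));
    }
}
```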

There are default content handlers defined in the package sun.net.www.content , but as you can see, they return streams to you.

You can create your own ContentHandler that returns a String , but it will probably be easier to just read the Stream as you suggest.
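As an illustration only, a custom handler could look roughly like the sketch below. StringContentHandler is a made-up name, and decoding as UTF-8 is an assumption; a real handler would take the charset from the connection's content type.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ContentHandler;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical handler: returns the whole response body as a String.
final class StringContentHandler extends ContentHandler {

    @Override
    public Object getContent(URLConnection connection) throws IOException {
        try (InputStream in = connection.getInputStream()) {
            // Assumes UTF-8; readAllBytes requires Java 9+.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

For url.getContent() to use such a handler, it has to be registered through URLConnection.setContentHandlerFactory(...), which may be called at most once per JVM. That global side effect is one more reason why reading the stream directly is usually the simpler route.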

+10

You misunderstand what "content" means. You expected it to return a String containing the HTML, but it returns an HttpInputStream. Why? The URL you requested happens to be an HTML page, but another valid URL could be http://www.google.com/logo.png . That URL does not contain String content; it is an image.

+3

You can use Guava's Resources.toString(URL, Charset) to read the contents of a URL into a String more easily.

+2

Source: https://habr.com/ru/post/911243/

