Allowing relative paths in Jsoup for plain <img> tags
The following is an example of the text to be parsed.
<P>The symbol <IMG id="pic1" height=15 src="images/itemx/image001.gif" width=18>indicates......</P>
I need to clean this up. However, applying the following code removes the src attribute, since it does not start with a valid protocol. Is there any way to configure Jsoup to keep the attribute? I want to avoid using an absolute URL if possible.
Jsoup.clean(content, Whitelist.basicWithImages());

The jsoup cleaner will allow relative links if you specify the base URI when cleaning. This is so that the link's protocol can be checked against the allowed protocols. Note that in your example you are calling the clean method without a base URI, so the link cannot be resolved and must therefore be removed.
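To illustrate why the base URI matters, here is a minimal sketch of the idea using only java.net.URI. It is not jsoup's actual implementation: the class RelativeLinkCheck, the method resolveIfAllowed, and the ALLOWED protocol set are hypothetical names chosen for this example. A relative link has no scheme of its own, so it can only pass a protocol check after being resolved against a base.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of jsoup: shows why a base URI is needed
// before the protocol of a relative link can be checked.
class RelativeLinkCheck {
    // Protocols accepted for this illustration (an assumption, not jsoup's configuration).
    static final Set<String> ALLOWED = Set.of("http", "https");

    // Resolves link against baseUri (which may be empty) and returns the absolute
    // URL if its scheme is allowed, or an empty Optional if the link must be dropped.
    static Optional<String> resolveIfAllowed(String baseUri, String link) {
        URI resolved;
        try {
            URI target = URI.create(link);
            resolved = baseUri.isEmpty() ? target : URI.create(baseUri).resolve(target);
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // not parseable as a URI at all
        }
        String scheme = resolved.getScheme();
        if (scheme == null || !ALLOWED.contains(scheme.toLowerCase(Locale.ROOT))) {
            return Optional.empty(); // no scheme (unresolved relative link) or disallowed scheme
        }
        return Optional.of(resolved.toString());
    }

    public static void main(String[] args) {
        String src = "images/itemx/image001.gif";
        // Without a base URI the link has no scheme, so it is rejected.
        System.out.println(resolveIfAllowed("", src));
        // With a base URI the link resolves to an http URL and is accepted.
        System.out.println(resolveIfAllowed("http://example.com/", src));
    }
}
```

In this sketch the accepted link comes back in absolute form, which mirrors the behaviour the answer describes for the cleaner at the time of writing.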
For example, pass a base URI to the cleaner:
String clean = Jsoup.clean(html, "http://example.com/", Whitelist.basicWithImages());
Note that in the current version, any relative links will be converted to absolute links after cleaning. I have just committed a change (available in the next release) that will optionally preserve relative links.
The syntax will be:
String clean = Jsoup.clean(html, "http://example.com/", Whitelist.basicWithImages().preserveRelativeLinks(true));

Unfortunately, the accepted answer does not work for me, because I have to support multiple domains (several development environments and several production sites). So we really need relative URLs (regardless of the dangers that carries). Here is what I did:
import java.lang.reflect.Field;
import org.springframework.util.ReflectionUtils;
import com.google.common.collect.Maps; // assuming Maps is Guava's

// Allow relative URLs. Jsoup doesn't support that, so we use reflection
// to replace the list of allowed protocols with an empty map, which means
// all protocols are allowed.
Field field = ReflectionUtils.findField(WHITELIST.getClass(), "protocols");
ReflectionUtils.makeAccessible(field);
ReflectionUtils.setField(field, WHITELIST, Maps.newHashMap());

(ReflectionUtils is a Spring class that simply wraps the checked exceptions thrown by the reflection API.)
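For readers without Spring on the classpath, the same reflection technique can be sketched with the standard java.lang.reflect API alone. ProtocolPolicy below is a hypothetical stand-in for the whitelist class, with a private protocols map; it is not jsoup's class, and whether the real field has that name and type in your jsoup version is something you would need to verify.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a sanitizer policy with a private protocol table.
class ProtocolPolicy {
    private Map<String, Set<String>> protocols = new HashMap<>();

    ProtocolPolicy() {
        protocols.put("img.src", Set.of("http", "https"));
    }

    // If no protocols are registered for the attribute, nothing is enforced.
    boolean isAllowed(String attribute, String value) {
        Set<String> allowed = protocols.get(attribute);
        if (allowed == null) {
            return true;
        }
        String lower = value.toLowerCase(Locale.ROOT);
        for (String protocol : allowed) {
            if (lower.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }
}

class ReflectionOverride {
    // Replaces the private "protocols" field with an empty map via reflection.
    static void clearProtocols(ProtocolPolicy policy) {
        try {
            Field field = ProtocolPolicy.class.getDeclaredField("protocols");
            field.setAccessible(true);
            field.set(policy, new HashMap<String, Set<String>>());
        } catch (ReflectiveOperationException e) {
            // Wrap the checked exception, as Spring's ReflectionUtils is described as doing.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ProtocolPolicy policy = new ProtocolPolicy();
        String src = "images/itemx/image001.gif";
        System.out.println(policy.isAllowed("img.src", src)); // relative link before the override
        clearProtocols(policy);
        System.out.println(policy.isAllowed("img.src", src)); // relative link after the override
    }
}
```

Note that once the protocol table is emptied in this sketch, every value passes the check, not only relative paths. That is the danger acknowledged above.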