How do I load an XML file from a URL and unescape special characters such as &lt;, &gt;, &amp;, etc.?

I am using this code to download an XML file:

    String url = "https://www.sec.gov/Archives/edgar/data/16160/000001616016000061/calm-20160528.xml";

    // The file name is the last path segment of the URL.
    String fileName = url.substring(url.lastIndexOf("/") + 1);

    String completeFileLocationWithName = "/home/user/Downloads/XBRLCODE/" + fileName;

    URL surl = new URL(url);
    URLConnection con = surl.openConnection();
    con.setConnectTimeout(0); // 0 means no timeout
    con.setReadTimeout(0);
    InputStream in = con.getInputStream();
    Files.copy(in, Paths.get(completeFileLocationWithName));

I have also tried String escapedInput = StringEscapeUtils.escapeXml(appNameInput);

INPUT: the URL.

OUTPUT: after downloading, the XML should not contain escaped characters such as &lt;, &gt;, &amp;, etc. I would like it to contain <, >, & instead.

Please share what you know about this.

3 answers

Use StringEscapeUtils from the commons-lang.jar library.

Here is the working code:

import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringEscapeUtils;

public class Test {

    public static void main(String[] args) {
        String url = "https://www.sec.gov/Archives/edgar/data/16160/000001616016000061/calm-20160528.xml";

        URL surl;
        try {
            surl = new URL(url);
            URLConnection con = surl.openConnection();
            con.setConnectTimeout(0);
            con.setReadTimeout(0);
            InputStream in = con.getInputStream();
            StringWriter writer = new StringWriter();
            IOUtils.copy(in, writer, "UTF-8");
            System.out.println(StringEscapeUtils.unescapeHtml(writer.toString()));
        } catch (MalformedURLException ex) {
            Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IOException ex) {
            Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
        }

    }
}

The output no longer contains escaped characters. Here is a sample from the console:

<td valign="bottom" style="width:02.96%;border-top:1pt none #D9D9D9 ;border-left:1pt none #D9D9D9 ;border-bottom:1pt none #D9D9D9 ;border-right:1pt none #D9D9D9 ;background-color: #auto;height:1.00pt;padding:0pt;">
                    <p style="margin:0pt;font-family:Times New Roman;height:1.00pt;overflow:hidden;font-size:0pt;">
                        &nbsp;</p>
                </td>
                <td valign="bottom" style="width:02.40%;border-top:1pt none #D9D9D9 ;border-left:1pt none #D9D9D9 ;border-bottom:1pt none #D9D9D9 ;border-right:1pt none #D9D9D9 ;background-color: #auto;height:1.00pt;padding:0pt;">
                    <p style="margin:0pt;font-family:Times New Roman;height:1.00pt;overflow:hidden;font-size:0pt;">
                        &nbsp;</p>
                </td>
                <td valign="bottom" style="width:11.82%;border-top:1pt none #D9D9D9 ;border-left:1pt none #D9D9D9 ;border-bottom:1pt none #D9D9D9 ;border-right:1pt none #D9D9D9 ;background-color: #auto;height:1.00pt;padding:0pt;">
                    <p style="margin:0pt;font-family:Times New Roman;height:1.00pt;overflow:hidden;font-size:0pt;">
                        &nbsp;</p>
                </td>

Keep in mind that you need these two imports (from commons-io and commons-lang):

import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringEscapeUtils;

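As far as I can tell, this XML file contains HTML (with CSS and so on).

Inside a node, that HTML has to be escaped so that the XML stays well-formed (<, >, & and so on are special characters in XML).

So the content of an XML node (such as us-gaap:FiscalPeriod) needs to be unescaped after it has been read, for example with StringEscapeUtils.unescapeHtml.

After that, what you have is the HTML.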
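
One way to unescape only the content of a single node, rather than the whole file, is to let an XML parser do it: a DOM parser resolves &lt;, &gt;, &amp; and the other entities while reading, so the node's text comes back already unescaped. The following is a minimal sketch using only the JDK; the class name, the method name, and the sample XML are made up for illustration, and the real input would be the stream from the URL connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class NodeTextExample {

    // Returns the text of the first element with the given tag name,
    // or null if there is no such element. The parser resolves the
    // XML entities, so the returned text is already unescaped.
    static String firstNodeText(String xml, String tagName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document with escaped HTML inside a node.
        String xml = "<root><note>&lt;p&gt;Fish &amp;amp; chips&lt;/p&gt;</note></root>";
        System.out.println(firstNodeText(xml, "note")); // prints <p>Fish &amp; chips</p>
    }
}
```

The rest of the document stays well-formed XML this way, because only the extracted text is unescaped.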


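Read the downloaded file, unescape it, and write the result to a new file:

    // "xxxxx" is the downloaded file, "yyyy" is the unescaped copy.
    try (InputStream iStream = new FileInputStream(new File("xxxxx"));
         OutputStream oStream = new FileOutputStream("yyyy")) {
        StringWriter writer = new StringWriter();
        IOUtils.copy(iStream, writer, "UTF-8");
        String theString = writer.toString();
        // Unescaping the whole file may leave output that is no longer well-formed XML.
        IOUtils.write(StringEscapeUtils.unescapeXml(theString), oStream, "UTF-8");
    }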
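
If adding commons-lang is not an option, the five predefined XML entities can also be unescaped with plain String.replace. This is only a rough sketch: the class and method names are made up, and it does not handle numeric character references such as &#160;.

```java
public class XmlUnescape {

    // Replaces the five predefined XML entities with their literal characters.
    static String unescapeXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                // "&amp;" goes last, so "&amp;lt;" becomes "&lt;" and not "<".
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescapeXml("&lt;td valign=&quot;bottom&quot;&gt;"));
        // prints <td valign="bottom">
    }
}
```

Handling &amp; last matters: doing it first would turn doubly escaped text such as &amp;lt; into < in a single pass.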

Source: https://habr.com/ru/post/1648475/

