How can I escape text for an XML document in Perl?

Does anyone know of a Perl module for escaping text in an XML document?

I am generating XML that will contain text entered by the user. I want to escape that text correctly so that the resulting XML is well formed.

+3
9 answers

I personally prefer XML::LibXML, the Perl binding for libxml. One of its advantages is that it uses one of the fastest XML processing libraries. Here is an example of creating a text node:

    use XML::LibXML;

    my $doc = XML::LibXML::Document->new('1.0', $some_encoding);
    my $element = $doc->createElement($name);
    $element->appendText($text);
    my $xml_fragment = $element->toString();
    my $xml_document = $doc->toString();

And never, never, never create XML manually. It will be bad for your health when people find out what you have done.
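
To illustrate that the library does the escaping for you, here is a minimal sketch that puts awkward user input into both a text node and an attribute, then parses the result back to check nothing was lost. The element name `comment`, the attribute name `author`, and the sample strings are made up for illustration; only the XML::LibXML calls come from the module itself.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical user input containing characters that are special in XML.
my $user_text   = 'x < y && "quoted" > z';
my $user_author = 'Tom & "Jerry"';

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('comment');   # illustrative element name
$doc->setDocumentElement($root);

# The library escapes attribute values and text content when serializing.
$root->setAttribute(author => $user_author);
$root->appendText($user_text);

my $xml = $doc->toString();

# Round trip: parsing the serialized document should return the original strings.
my $parsed = XML::LibXML->load_xml(string => $xml);
my $same   = $parsed->documentElement->textContent eq $user_text
          && $parsed->documentElement->getAttribute('author') eq $user_author;

print $same ? "round trip ok\n" : "round trip failed\n";
```

The round trip is the useful part: if the serialized document parses and yields the same strings, the escaping was done correctly, whatever exact entities the serializer chose.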

+9

I'm not sure why you would need to escape text that is already in an XML file. If your file contains:

 <foo>x < y</foo> 

then the file is not an XML file, despite the abundance of angle brackets. An XML file must contain valid data, meaning something like this:

 <foo>x &lt; y</foo> 

or

 <foo><![CDATA[x < y]]></foo> 

Therefore, either:

  • You are not asking about escaping data that is in an XML file. Rather, you want to know how to put character data into an XML file so that the resulting file is valid XML; or

  • You have some data in an XML file that needs to be escaped for some other reason.

Care to elaborate?

+8

XML::Simple's escape_value can also be used, but using XML::Simple in new code is not recommended. See post 17436965.

Manual escaping can be done with regular expressions (copied from escape_value):

    $data =~ s/&/&amp;/sg;
    $data =~ s/</&lt;/sg;
    $data =~ s/>/&gt;/sg;
    $data =~ s/"/&quot;/sg;
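
If you go the manual route, it may help to wrap the substitutions in a small function so the order stays fixed. This is only a sketch: `escape_xml` is a made-up name, and the extra apostrophe rule is my addition so the result is also safe inside single-quoted attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the five XML special characters.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or it would re-escape the entities below
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;   # addition: not in the snippet copied from escape_value
    return $text;
}

print escape_xml(q{x < y & "z"}), "\n";   # prints x &lt; y &amp; &quot;z&quot;
```

The ampersand has to be handled before everything else; otherwise the `&` in `&lt;` would itself be turned into `&amp;lt;`.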
+8

Use XML::Code.

From CPAN

XML::Code escape()

Normally, any node content is escaped during rendering (that is, special characters such as '&' are replaced with the corresponding entities). Call escape() with a zero argument to prevent this:

    my $p = XML::Code->new('p');
    $p->set_text("&#8212;");
    $p->escape(0);
    print $p->code();  # prints <p>&#8212;</p>
    $p->escape(1);
    print $p->code();  # prints <p>&amp;#8212;</p>
+6

XML::Entities:

    use XML::Entities;
    my $a_encoded = XML::Entities::numify('all', $a);

Edit: XML::Entities only numifies HTML entities. Use HTML::Entities encode_entities($a) instead.

+3

Use XML::Generator:

    require XML::Generator;

    my $xml = XML::Generator->new(':pretty', escape => 'always,apos');

    print $xml->h1("& <> non-html plain text <> &");

This prints all content inside the tags escaped (no markup conflicts).

+3

After looking at XML::Code, as recommended by Krish, I found that this can be done using the XML::Code text() function. For example:

    use XML::Code;
    my $text = new XML::Code('=');
    $text->set_text(q{> & < " ' "});
    print $text->code();  # prints &gt; &amp; &lt; " ' "

Passing '=' creates a text node that is printed without any tags. Note: this only works for text data. It will not correctly escape attributes.

+1

Although you are better off using a module like XML::LibXML or XML::Code, you can also wrap text data in a CDATA section. You must take care not to put ]]> inside it (this sequence is also forbidden outside CDATA sections!):

    $text =~ s/\]\]>/]]>]]&gt;<![CDATA[/g;
    $text = "<![CDATA[$text]]>";
    $xml = "<foo>$text</foo>";

As a bonus, your code will look more Perlishly confusing! :-)
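
For what it's worth, here is the same trick as a reusable function, as a rough sketch. `cdata_wrap` is a made-up name. It closes the CDATA section at every ]]>, writes that sequence as ordinary escaped text, and then reopens the section.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap text in a CDATA section, handling every "]]>".
sub cdata_wrap {
    my ($text) = @_;
    # End the section, emit "]]>" as escaped character data, start a new section.
    $text =~ s/\]\]>/]]>]]&gt;<![CDATA[/g;
    return "<![CDATA[$text]]>";
}

print cdata_wrap('a ]]> b'), "\n";   # prints <![CDATA[a ]]>]]&gt;<![CDATA[ b]]>
```

A parser reads that output as three adjacent pieces of character data ("a ", "]]>", " b"), so the element content comes back as the original string.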

0

For programs that must handle every special case, be sure to use an established library for this task. In theory, however, only 5 characters need to be escaped in XML.

So for one-off one-liners, where you do not want to pull in an additional library, the following Perl expression should suffice:

 perl -pe 's/\&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&apos;/g' 
0

Source: https://habr.com/ru/post/1302966/

