XSLT: Removing Trailing Punctuation

I am writing an XSLT stylesheet to convert MARC XML records to FGDC XML metadata. Many MARC fields end with extraneous punctuation (periods, colons, commas, etc.) that I would like to remove. However, I do not want to delete every punctuation mark. My idea is to write a template with an if test that checks whether the field ends with a specified character and, if so, deletes it, but I'm not sure 1) whether this is a good approach and 2) how to express it.

Edit: my XSLT:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
        xmlns:marc="http://www.loc.gov/MARC21/slim">
      <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

      <xsl:template match="/">
        <xsl:for-each select="marc:collection/marc:record">
          <xsl:result-document method="xml" href="banana_{marc:controlfield[@tag=001]}.xml">
            <metadata>
              <xsl:apply-templates select="self::marc:record"/>
            </metadata>
          </xsl:result-document>
        </xsl:for-each>
      </xsl:template>

      <xsl:template match="marc:record">
        <pubinfo>
          <pubplace><xsl:value-of select="marc:datafield[@tag=260]/marc:subfield[@code='a']"/></pubplace>
          <publish><xsl:value-of select="marc:datafield[@tag=260]/marc:subfield[@code='b']"/></publish>
        </pubinfo>
      </xsl:template>
    </xsl:stylesheet>

And here is my xml document (or at least its representative part):

    <?xml version="1.0" encoding="UTF-8"?>
    <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
      <marc:record>
        <marc:leader>01502cfm a2200313 a 4500</marc:leader>
        <marc:controlfield tag="001">7943586</marc:controlfield>
        <marc:datafield tag="260" ind1=" " ind2=" ">
          <marc:subfield code="a">[Sl :</marc:subfield>
          <marc:subfield code="b">sn ,</marc:subfield>
          <marc:subfield code="c">18--]</marc:subfield>
        </marc:datafield>
      </marc:record>
      <marc:record>
        <marc:leader>01290cem a2200313 a 4500</marc:leader>
        <marc:controlfield tag="001">8108664</marc:controlfield>
        <marc:datafield tag="260" ind1=" " ind2=" ">
          <marc:subfield code="a">Torino :</marc:subfield>
          <marc:subfield code="b">Editore Gio. Batt. Maggi ,</marc:subfield>
          <marc:subfield code="c">1863.</marc:subfield>
        </marc:datafield>
      </marc:record>
    </marc:collection>

2 answers

ends-with() accepts a simple string, not a regular expression. This is why you are having problems with:

 ends-with(marc:datafield[@tag=260]/marc:subfield[@code='b'],'.|:|,') 

If you want to use regex, use matches() :

 marc:datafield[@tag=260]/marc:subfield[@code='b']/matches(.,'^.*[\.:,]$') 

And remove using replace() :

 replace('Ends with punctuation.', '^(.*)[\.:,]$', '$1') => Ends with punctuation 

It is also simpler to apply replace() to every node without testing it with matches() first: when the pattern does not match, replace() returns the string unchanged, which is the behavior you want anyway.
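As a minimal sketch of that "just replace" approach, the stylesheet below runs replace() over a few sample strings (two taken from the question's data, one made up for illustration) without any prior matches() test. The leading \s* is an addition not present in the expressions above; it also drops the space that MARC puts before the punctuation mark.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Run against any input document; the sample strings are hard-coded -->
  <xsl:template match="/">
    <xsl:for-each select="('Torino :', 'sn ,', 'No trailing mark')">
      <!-- Strings that do not end in . : or , pass through unchanged -->
      <xsl:value-of select="replace(., '\s*[\.:,]$', '')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This should print Torino, sn and No trailing mark on separate lines: the third string has nothing to strip, so replace() hands it back as-is.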


There is a more general solution that does not need to know in advance which punctuation marks can appear at the end:

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>

      <!-- Identity template: copy everything as-is -->
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <!-- Text nodes ending in a punctuation character: drop that character -->
      <xsl:template match="text()[matches(., '^.*\p{P}$')]">
        <xsl:sequence select="replace(., '(^.*)\p{P}$', '$1')"/>
      </xsl:template>
    </xsl:stylesheet>

When this transformation is applied to the following XML document:

    <x>
      <t>Some text .</t>
      <t>Some text2 ;</t>
      <t>Some text3 (</t>
      <t>Some text4 !</t>
      <t>Some text5 "</t>
    </x>

the wanted, correct result is produced:

    <x>
      <t>Some text </t>
      <t>Some text2 </t>
      <t>Some text3 </t>
      <t>Some text4 </t>
      <t>Some text5 </t>
    </x>

Explanation

Proper use of the character class (category) escape \p{P}.

\p{...} is a category escape: it matches any single character that has the named Unicode property. P is the Unicode general category that covers all punctuation.
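Since the question asks not to delete every punctuation mark, here is a hedged sketch of how \p{P} could be narrowed with the character class subtraction that the XPath 2.0 (XML Schema) regex syntax provides. Which marks to keep is an assumption made for illustration: this version keeps a closing bracket or parenthesis, as in the 18--] subfield. The sample strings are hard-coded, and 'price $' is made up to show that symbols such as $ fall into the S (symbol) categories rather than P.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Run against any input document; the sample strings are hard-coded -->
  <xsl:template match="/">
    <xsl:for-each select="('Torino :', '1863.', '18--]', 'price $')">
      <!-- [\p{P}-[\]\)]] = any punctuation character except ] and ) -->
      <xsl:value-of select="replace(., '\s*[\p{P}-[\]\)]]$', '')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With these inputs the first two strings should lose their trailing mark (Torino, 1863), while 18--] and price $ should come back unchanged: ] is subtracted from the class, and $ is a currency symbol (category Sc), not punctuation.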

Update

The OP has since provided a specific source XML document and her transformation code.

Here is her code, modified to use the above solution:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
        xmlns:marc="http://www.loc.gov/MARC21/slim">
      <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

      <xsl:template match="/">
        <xsl:for-each select="marc:collection/marc:record">
          <xsl:result-document method="xml" href="banana_{marc:controlfield[@tag=001]}.xml">
            <metadata>
              <xsl:apply-templates select="self::marc:record"/>
            </metadata>
          </xsl:result-document>
        </xsl:for-each>
      </xsl:template>

      <xsl:template match="marc:record">
        <pubinfo>
          <xsl:variable name="vSub1" select="marc:datafield[@tag=260]/marc:subfield[@code='a']"/>
          <xsl:variable name="vSub2" select="marc:datafield[@tag=260]/marc:subfield[@code='b']"/>
          <!-- Strip a trailing "space + punctuation mark" from each subfield -->
          <pubplace><xsl:value-of select="replace($vSub1, '(^.*)\s\p{P}$', '$1')"/></pubplace>
          <publish><xsl:value-of select="replace($vSub2, '(^.*)\s\p{P}$', '$1')"/></publish>
        </pubinfo>
      </xsl:template>
    </xsl:stylesheet>
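
If more fields need the same cleanup, the repeated replace() call could be factored into a stylesheet function. The following is only a sketch under stated assumptions: the function name local:strip-trailing-punct and the namespace urn:example:local are hypothetical, the xsl:result-document splitting from the question is left out for brevity, and the pattern \s*\p{P}$ is a variation that also handles values with no space before the mark (such as 1863.).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs marc local">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical helper: removes one trailing punctuation mark
       together with any whitespace directly before it -->
  <xsl:function name="local:strip-trailing-punct" as="xs:string">
    <xsl:param name="text" as="xs:string?"/>
    <!-- replace() treats an empty sequence as a zero-length string -->
    <xsl:sequence select="replace($text, '\s*\p{P}$', '')"/>
  </xsl:function>

  <xsl:template match="/">
    <metadata>
      <xsl:apply-templates select="marc:collection/marc:record"/>
    </metadata>
  </xsl:template>

  <xsl:template match="marc:record">
    <pubinfo>
      <pubplace>
        <xsl:value-of select="local:strip-trailing-punct(
            marc:datafield[@tag=260]/marc:subfield[@code='a'])"/>
      </pubplace>
      <publish>
        <xsl:value-of select="local:strip-trailing-punct(
            marc:datafield[@tag=260]/marc:subfield[@code='b'])"/>
      </publish>
    </pubinfo>
  </xsl:template>
</xsl:stylesheet>
```

Like the replace() calls above, this assumes each record has at most one matching subfield; a record with several 260 fields would pass more than one item to the function and raise a type error.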

Source: https://habr.com/ru/post/1445843/