XPath: select an attribute of the current node?

I am using Python with lxml to process XML. After I query/filter to get to the nodes I want, I run into a problem: how do I get an attribute value with XPath? Here is an example of my input.

    >print(etree.tostring(node, pretty_print=True ))
    <rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>

The value I want is the one in resource="...". Currently I just use the lxml API to get it. Can it be done in pure XPath? Thanks.

EDIT: I forgot to say that these are not root nodes, so I cannot use // here. I have 2000-3000 other such nodes in the XML file. My first attempts were ".@attrib" and "self::*@", but neither seems to work.

EDIT2: I will try my best to explain (this is the first time I have hit an XML problem that involves XPath, and English is not my strongest subject ...). Here is my input snippet: http://pastebin.com/kZmVdbQQ (the full file is available from http://www.comp-sys-bio.org/yeastnet/, version 4).

In my code, I first select the speciesType nodes that have a chebi resource link (<rdf:li rdf:resource="urn:miriam:obo.chebi:...."/>), and then I try to get the value of the rdf:resource attribute of rdf:li. I'm sure it would be easy to get an attribute of a child node if I started from a parent node such as speciesType, but I am wondering how to do it when I start from rdf:li itself. As I understand it, "//" in XPath searches the whole document, not only inside the current node.

Below is my code:

    import lxml.etree as etree

    tree = etree.parse("yeast_4.02.xml")
    root = tree.getroot()
    ns = {"sbml": "http://www.sbml.org/sbml/level2/version4",
          "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
          "body": "http://www.w3.org/1999/xhtml",
          "re": "http://exslt.org/regular-expressions"
          }

    # good enough for now
    maybemeta = root.xpath("//sbml:speciesType[descendant::rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]]", namespaces=ns)

    def extract_name_and_chebi(node):
        name = node.attrib['name']
        # get all rdf:li nodes with a chebi resource
        chebies = node.xpath("./sbml:annotation//rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]", namespaces=ns)
        assert len(chebies) == 1
        # my current solution to get the rdf:resource value from the rdf:li node
        rdfNS = "{" + ns.get('rdf') + "}"
        chebi = chebies[0].attrib[rdfNS + 'resource']
        # do protein later
        return (name, chebi)

    metaWithChebi = map(extract_name_and_chebi, maybemeta)
    fo = open("metabolites.txt", "w")
    for name, chebi in metaWithChebi:
        fo.write("{0}\t{1}\n".format(name, chebi))
+4
3 answers

Prefix the attribute name with @ in the XPath expression:

    >>> from lxml import etree
    >>> # bytes literal: lxml may reject a str that carries an encoding declaration
    >>> xml = b"""\
    ... <?xml version="1.0" encoding="utf-8"?>
    ... <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    ... <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
    ... </rdf:RDF>
    ... """
    >>> tree = etree.fromstring(xml)
    >>> ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
    >>> tree.xpath('//rdf:li/@rdf:resource', namespaces=ns)
    ['urn:miriam:obo.chebi:CHEBI%3A37671']

EDIT

Here's a revised version of the script from the question:

    import lxml.etree as etree

    ns = {
        'sbml': 'http://www.sbml.org/sbml/level2/version4',
        'rdf':'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
        'body':'http://www.w3.org/1999/xhtml',
        're': 'http://exslt.org/regular-expressions',
        }

    def extract_name_and_chebi(node):
        chebies = node.xpath("""
            .//rdf:li[
            starts-with(@rdf:resource, 'urn:miriam:obo.chebi')
            ]/@rdf:resource
            """, namespaces=ns)
        return node.attrib['name'], chebies[0]

    with open('yeast_4.02.xml') as xml:
        tree = etree.parse(xml)

    maybemeta = tree.xpath("""
        //sbml:speciesType[descendant::rdf:li[
        starts-with(@rdf:resource, 'urn:miriam:obo.chebi')]]
        """, namespaces = ns)

    with open('metabolites.txt', 'w') as output:
        for node in maybemeta:
            output.write('%s\t%s\n' % extract_name_and_chebi(node))
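The leading dot in `.//rdf:li` is what keeps the search inside the node passed to `extract_name_and_chebi`. A minimal sketch of the difference, assuming a small made-up document (the `root`/`item` elements and the `urn:example:*` values are hypothetical and not taken from the yeast model):

```python
from lxml import etree

# Hypothetical two-entry document, made up for illustration only.
XML = b"""\
<root xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <item name="a"><rdf:li rdf:resource="urn:example:1"/></item>
  <item name="b"><rdf:li rdf:resource="urn:example:2"/></item>
</root>
"""
ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}

root = etree.fromstring(XML)
second = root.xpath('item[@name="b"]')[0]

# Absolute path: evaluated from the document root, whatever the context node is.
from_root = second.xpath('//rdf:li/@rdf:resource', namespaces=ns)

# Relative path: evaluated only among the descendants of the context node.
from_node = second.xpath('.//rdf:li/@rdf:resource', namespaces=ns)

print(from_root)  # resources of both items
print(from_node)  # only the resource under item "b"
```

So starting from an inner node does not rule out `//`-style searches; it only means the path should begin with a dot.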
+3

To select the attribute named rdf:resource of the current node, use this XPath expression:

 @rdf:resource 

For this to work, you must bind the "rdf" prefix to the corresponding namespace.
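With lxml, the binding is done through the `namespaces` argument of `xpath()`. A minimal sketch, assuming a made-up wrapper document around the `rdf:li` element from the question (the `wrapper` element is hypothetical):

```python
from lxml import etree

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
ns = {'rdf': RDF_NS}  # prefix -> namespace URI binding used by xpath()

# Hypothetical wrapper document; only the rdf:li line comes from the question.
doc = etree.fromstring(b"""\
<wrapper xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Bag>
    <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
  </rdf:Bag>
</wrapper>
""")

# Get hold of a non-root rdf:li element, as in the question.
li = doc.find('.//{%s}li' % RDF_NS)

# The context node is `li`, so the attribute is addressed directly.
values = li.xpath('@rdf:resource', namespaces=ns)          # list with one string
value = li.xpath('string(@rdf:resource)', namespaces=ns)   # plain string

print(values)
print(value)
```

Wrapping the expression in `string()` is convenient when exactly one value is expected, since it returns a string instead of a one-element list.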

If you don't know how to register the rdf namespace, you can still select the attribute with this XPath expression:

 @*[name()='rdf:resource'] 
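One caveat worth checking: as far as I know, `name()` compares against the prefix actually used in the document, so this expression only matches when the document itself writes `rdf:`. A sketch of a prefix-independent variant using `local-name()` and `namespace-uri()`, assuming a made-up element that binds the same namespace to the prefix `r`:

```python
from lxml import etree

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

# Same namespace as in the question, but this made-up element uses the prefix "r".
li = etree.fromstring(
    b'<r:li xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    b'r:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>'
)

# Matches on the literal prefixed name, so it misses "r:resource".
by_qname = li.xpath("@*[name()='rdf:resource']")

# Matches on local name plus namespace URI, independent of the prefix.
by_parts = li.xpath(
    "@*[local-name()='resource' and namespace-uri()='%s']" % RDF_NS
)

print(by_qname)
print(by_parts)
```

Neither call needs a `namespaces` argument, which is the point of this workaround.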
+1

OK, I understand now. The XPath expression I need here is "./@rdf:resource", not ".@rdf:resource". But why? I thought that "./" indicates a child of the current node.
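A possible explanation, based on the XPath 1.0 abbreviations: "." is short for self::node() and "@" is short for attribute::, so "./@rdf:resource" reads as "the current node, then its rdf:resource attribute". The "/" only separates two steps; a step addresses a child only when it names no axis at all, as in "./foo". The sketch below checks this with lxml (the variable names are made up for illustration):

```python
from lxml import etree

ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
li = etree.fromstring(
    b'<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    b'rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>'
)

# Three spellings of the same path: abbreviated, with "./", and fully written out.
equivalent = [
    '@rdf:resource',
    './@rdf:resource',
    'self::node()/attribute::rdf:resource',
]
results = [li.xpath(expr, namespaces=ns) for expr in equivalent]

# ".@rdf:resource" puts two steps side by side without "/", which is not valid XPath.
try:
    li.xpath('.@rdf:resource', namespaces=ns)
    invalid_rejected = False
except etree.XPathError:
    invalid_rejected = True

print(results)
print(invalid_rejected)
```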

0

Source: https://habr.com/ru/post/1385030/

