I am using Python with lxml to process XML. I can query and filter my way to the nodes I want, but then I run into a problem: how do I get an attribute value with XPath? Here is an example of my input.
>>> print(etree.tostring(node, pretty_print=True))
<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
The value I want is the one in resource=.... Currently I read it through lxml's attrib dictionary. Can it be done in pure XPath instead? Thanks.
EDIT: I forgot to say that these are not root nodes, so I cannot use // here. There are 2000-3000 others like it in the XML file. My first attempt was to play with ".@attrib" and "self::*@", but neither seems to work.
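For reference, here is a minimal sketch of what an attribute step evaluated relative to the current node looks like in lxml. The node is a hypothetical stand-in built from the sample above, not one taken from the real file, and the variable names are only for illustration.

```python
import lxml.etree as etree

ns = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

# Hypothetical stand-in for a single rdf:li node already selected from the file
node = etree.fromstring(
    '<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>'
)

# "@rdf:resource" is an attribute step relative to the context node;
# lxml returns the matches as a list of strings
values = node.xpath("@rdf:resource", namespaces=ns)

# Wrapping the step in string() yields one plain string
# (an empty string if the attribute is missing)
value = node.xpath("string(@rdf:resource)", namespaces=ns)

print(values)
print(value)
```

The prefix in the expression has to be mapped through the namespaces argument, because the attribute lives in the RDF namespace.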
EDIT2: I will try my best to explain (this is the first time I have run into an XML problem involving XPath, and English is not my strongest subject...). Here is a snippet of my input: http://pastebin.com/kZmVdbQQ (the full file is available from http://www.comp-sys-bio.org/yeastnet/, version 4).
In my code, I am trying to get each speciesType node that has a ChEBI resource link (<rdf:li rdf:resource="urn:miriam:obo.chebi:...."/>), and then I try to get the value of the rdf:resource attribute of that rdf:li. I'm sure it would be easy to get an attribute of a child node if I started from a parent node such as speciesType, but I am wondering how to do it when I start from the rdf:li itself. As I understand it, "//" in XPath searches the whole document, not only under the current node.
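To illustrate the scoping point, here is a small sketch comparing a path that starts with "//" against one that starts with ".//" when both are evaluated on the same element. The document is a hypothetical miniature made up for the example, not the yeast file.

```python
import lxml.etree as etree

ns = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

# Hypothetical miniature document with two rdf:li nodes under different parents
doc = etree.fromstring(
    '<root xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<a><rdf:li rdf:resource="urn:first"/></a>'
    '<b><rdf:li rdf:resource="urn:second"/></b>'
    '</root>'
)
a = doc[0]  # the <a> element is the context node for both queries

# A leading "//" is absolute: it searches from the document root
from_root = a.xpath("//rdf:li/@rdf:resource", namespaces=ns)

# A leading ".//" is relative: it searches only below the context node
from_here = a.xpath(".//rdf:li/@rdf:resource", namespaces=ns)

print(from_root)
print(from_here)
```

In this sketch the absolute path finds both attributes even though it is called on the first child, while the relative path finds only the one below it.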
Below is my code:
import lxml.etree as etree

tree = etree.parse("yeast_4.02.xml")
root = tree.getroot()
ns = {"sbml": "http://www.sbml.org/sbml/level2/version4",
      "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "body": "http://www.w3.org/1999/xhtml",
      "re": "http://exslt.org/regular-expressions"
      }

#good enough for now
maybemeta = root.xpath("//sbml:speciesType[descendant::rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]]", namespaces = ns)

def extract_name_and_chebi(node):
    name = node.attrib['name']
    chebies = node.xpath("./sbml:annotation//rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]", namespaces=ns) #get all rdf:li node with chebi resource
    assert len(chebies) == 1
    #my current solution to get rdf:resource value from rdf:li node
    rdfNS = "{" + ns.get('rdf') + "}"
    chebi = chebies[0].attrib[rdfNS + 'resource']
    #do protein later
    return (name, chebi)

metaWithChebi = map(extract_name_and_chebi, maybemeta)
fo = open("metabolites.txt", "w")
for name, chebi in metaWithChebi:
    fo.write("{0}\t{1}\n".format(name, chebi))