How to access rdf list members using rdflib (or plain sparql)

Question

What is the best way to access RDF list members? I use rdflib (Python), but an answer in plain SPARQL is fine too (such a query can be run through rdfextras, an rdflib helper library).

I am trying to access the authors of a specific journal article in RDF exported by Zotero (some fields have been removed for brevity):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:z="http://www.zotero.org/namespaces/export#"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:bib="http://purl.org/net/biblio#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
             xmlns:link="http://purl.org/rss/1.0/modules/link/">
        <bib:Article rdf:about="http://www.ncbi.nlm.nih.gov/pubmed/18273724">
            <z:itemType>journalArticle</z:itemType>
            <dcterms:isPartOf rdf:resource="urn:issn:0954-6634"/>
            <bib:authors>
                <rdf:Seq>
                    <rdf:li>
                        <foaf:Person>
                            <foaf:surname>Lee</foaf:surname>
                            <foaf:givenname>Hyoun Seung</foaf:givenname>
                        </foaf:Person>
                    </rdf:li>
                    <rdf:li>
                        <foaf:Person>
                            <foaf:surname>Lee</foaf:surname>
                            <foaf:givenname>Jong Hee</foaf:givenname>
                        </foaf:Person>
                    </rdf:li>
                    <rdf:li>
                        <foaf:Person>
                            <foaf:surname>Ahn</foaf:surname>
                            <foaf:givenname>Gun Young</foaf:givenname>
                        </foaf:Person>
                    </rdf:li>
                    <rdf:li>
                        <foaf:Person>
                            <foaf:surname>Lee</foaf:surname>
                            <foaf:givenname>Dong Hun</foaf:givenname>
                        </foaf:Person>
                    </rdf:li>
                    <rdf:li>
                        <foaf:Person>
                            <foaf:surname>Shin</foaf:surname>
                            <foaf:givenname>Jung Won</foaf:givenname>
                        </foaf:Person>
                    </rdf:li>
                    <rdf:li>
                        <foaf:Person>
                            <foaf:surname>Kim</foaf:surname>
                            <foaf:givenname>Dong Hyun</foaf:givenname>
                        </foaf:Person>
                    </rdf:li>
                    <rdf:li>
                        <foaf:Person>
                            <foaf:surname>Chung</foaf:surname>
                            <foaf:givenname>Jin Ho</foaf:givenname>
                        </foaf:Person>
                    </rdf:li>
                </rdf:Seq>
            </bib:authors>
            <dc:title>Fractional photothermolysis for the treatment of acne scars: a report of 27 Korean patients</dc:title>
            <dcterms:abstract>OBJECTIVES: Atrophic post-acne scarring remains a therapeutically challe *CUT*, erythema and edema.
    CONCLUSIONS: The 1550-nm erbium-doped FP is associated with significant patient-reported improvement in the appearance of acne scars, with minimal downtime.</dcterms:abstract>
            <bib:pages>45-49</bib:pages>
            <dc:date>2008</dc:date>
            <z:shortTitle>Fractional photothermolysis for the treatment of acne scars</z:shortTitle>
            <dc:identifier>
                <dcterms:URI>
                    <rdf:value>http://www.ncbi.nlm.nih.gov/pubmed/18273724</rdf:value>
                </dcterms:URI>
            </dc:identifier>
            <dcterms:dateSubmitted>2010-12-06 11:36:52</dcterms:dateSubmitted>
            <z:libraryCatalog>NCBI PubMed</z:libraryCatalog>
            <dc:description>PMID: 18273724</dc:description>
        </bib:Article>
        <bib:Journal rdf:about="urn:issn:0954-6634">
            <dc:title>The Journal of Dermatological Treatment</dc:title>
            <prism:volume>19</prism:volume>
            <prism:number>1</prism:number>
            <dcterms:alternative>J Dermatolog Treat</dcterms:alternative>
            <dc:identifier>DOI 10.1080/09546630701691244</dc:identifier>
            <dc:identifier>ISSN 0954-6634</dc:identifier>
        </bib:Journal>
    </rdf:RDF>

python rdf sparql rdflib

tjb Jan 15 '11 at 10:33

2 answers

In recent versions of RDFLib, containers can be accessed in a more orderly manner. The elements of a sequence can now be read programmatically, in order, using the Seq class:

    from rdflib import Graph, Namespace, URIRef
    from rdflib.graph import Seq
    from rdflib.namespace import FOAF

    BIB = Namespace("http://purl.org/net/biblio#")

    # Load data
    g = Graph()
    g.parse(file=open("./zotero.rdf", "r"), format="application/rdf+xml")

    # Get the first resource linked to article via bib:authors
    article = URIRef("http://www.ncbi.nlm.nih.gov/pubmed/18273724")
    authors = next(g.objects(article, BIB.authors))

    i = 1
    for author in Seq(g, authors):
        givenname = next(g.triples((author, FOAF.givenname, None)))[2]
        surname = next(g.triples((author, FOAF.surname, None)))[2]
        print("%i: %s %s" % (i, str(givenname), str(surname)))
        i += 1

Robin Keskisarkka Oct 24 '17 at 11:33

Manuel salvadores · Accepted Answer · 2011-01-16T11:46:46+0000

RDF containers are a pain in general and quite annoying to deal with. I am posting two solutions, one without SPARQL and another with SPARQL. Personally, I prefer the second one, the one that uses SPARQL.

Example 1: without SPARQL

To get all the authors for this article, as in your case, you could do something like the code posted below.

I added comments to explain myself. The most important bit is the use of g.triples(triple_pattern): with this graph function you can filter an rdflib graph and search for the triple patterns you need.

When an rdf:Seq is parsed, its members are attached with predicates of the form:

http://www.w3.org/1999/02/22-rdf-syntax-ns#_1

http://www.w3.org/1999/02/22-rdf-syntax-ns#_2

http://www.w3.org/1999/02/22-rdf-syntax-ns#_3

rdflib returns them in no particular order, so you need to sort them in order to traverse them in the correct sequence.

    import rdflib

    RDF = rdflib.namespace.RDF

    # Parse the file
    g = rdflib.Graph()
    g.parse("zot.rdf")

    # So that we are sure we get something back
    print("Number of triples: %d" % len(g))

    # Couple of handy namespaces to use later
    BIB = rdflib.Namespace("http://purl.org/net/biblio#")
    FOAF = rdflib.Namespace("http://xmlns.com/foaf/0.1/")

    # Author counter used in the printed output
    i = 0

    # Article for which we want the list of authors
    article = rdflib.term.URIRef("http://www.ncbi.nlm.nih.gov/pubmed/18273724")

    # The first loop is equivalent to "get all authors for article x"
    for triple in g.triples((article, BIB["authors"], None)):

        # This expression removes the rdf:type triple, because we only want the
        # triples whose predicate has the form
        # http://www.w3.org/1999/02/22-rdf-syntax-ns#_SEQ_NUMBER
        # where SEQ_NUMBER is the index of the element in the rdf:Seq
        list_triples = filter(lambda y: RDF['type'] != y[1],
                              g.triples((triple[2], None, None)))

        # We sort the authors by the predicate of the triple, since order in
        # sequences does matter ;-)
        # "http://www.w3.org/1999/02/22-rdf-syntax-ns#_435"[44:] returns "435",
        # and since we want numeric order we do int(x[1][44:])
        # (BTW x[1] is the predicate)
        authors_sorted = sorted(list_triples, key=lambda x: int(x[1][44:]))

        # We iterate over the author bnodes and get surname and givenname
        for author_bnode in authors_sorted:
            for x in g.triples((author_bnode[2], FOAF['surname'], None)):
                author_surname = x[2]
            for y in g.triples((author_bnode[2], FOAF['givenname'], None)):
                author_name = y[2]
            print("author(%s): %s %s" % (i, author_name, author_surname))
            i += 1

This example shows how to do this without using SPARQL.
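The `[44:]` slice in the code above depends on the exact length of the RDF namespace URI. As a minimal sketch (not part of the original answer), the same index extraction can be written with a regular expression that also skips non-membership predicates such as rdf:type. The helper names `seq_index` and `ordered_members` are hypothetical, and plain strings stand in for rdflib terms so that the snippet runs without any library:

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Container membership predicates have the form rdf:_1, rdf:_2, ...
_MEMBER = re.compile(re.escape(RDF_NS) + r"_([1-9][0-9]*)$")


def seq_index(predicate):
    """Return the position encoded in an rdf:_n predicate, or None."""
    match = _MEMBER.match(str(predicate))
    return int(match.group(1)) if match else None


def ordered_members(pairs):
    """Sort the (predicate, object) pairs of one rdf:Seq node by position.

    Pairs whose predicate is not rdf:_n (for example rdf:type) are dropped.
    """
    members = [(seq_index(p), o) for p, o in pairs]
    members = [(idx, o) for idx, o in members if idx is not None]
    members.sort(key=lambda pair: pair[0])  # numeric, so _10 follows _2
    return [o for _, o in members]


# Illustrative input only: made-up objects in shuffled order.
pairs = [
    (RDF_NS + "_10", "author J"),
    (RDF_NS + "type", RDF_NS + "Seq"),
    (RDF_NS + "_2", "author B"),
    (RDF_NS + "_1", "author A"),
]
print(ordered_members(pairs))  # ['author A', 'author B', 'author J']
```

With rdflib, the pairs would presumably come from something like `g.predicate_objects(seq_node)`; because `seq_index` calls `str()` on the predicate, it should accept URIRef values as well as plain strings.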

Example 2: with SPARQL

Here is exactly the same example, this time using SPARQL.

    # Continues the script from Example 1 (reuses g, RDF, FOAF and BIB).
    # These two lines register the rdfextras SPARQL processor. On rdflib 4
    # and later, SPARQL support is built in and they should be dropped.
    rdflib.plugin.register('sparql', rdflib.query.Processor,
                           'rdfextras.sparql.processor', 'Processor')
    rdflib.plugin.register('sparql', rdflib.query.Result,
                           'rdfextras.sparql.query', 'SPARQLQueryResult')

    query = """
    SELECT ?seq_index ?name ?surname WHERE {
        <http://www.ncbi.nlm.nih.gov/pubmed/18273724> bib:authors ?seq .
        ?seq ?seq_index ?seq_bnode .
        ?seq_bnode foaf:givenname ?name .
        ?seq_bnode foaf:surname ?surname .
    }
    """

    # Sort the rows by the numeric suffix of the rdf:_n predicate
    for row in sorted(g.query(query, initNs=dict(rdf=RDF, foaf=FOAF, bib=BIB)),
                      key=lambda x: int(x[0][44:])):
        print("Author(%s) %s %s" % (row[0][44:], row[1], row[2]))

As you can see, we still have to sort, because the library does not do it for us. In the query, the variable seq_index holds the predicate that carries the sequence position, and the lambda function sorts on that position.
