SPARQL query runs forever

I am struggling with executing a SPARQL query in Jena, and I get results that I don't understand...

I am trying to query the ESCO ontology (https://ec.europa.eu/esco/download). I use TDB to load the ontology and create the model (sorry if the terms I use are not exact, I'm not very experienced).

My goal is to find the URI of the position (occupation) in the ontology whose label matches a term I extracted earlier, for example: extracted term "acuponcteur" → label in the ontology "Acuponcteur"@fr → URI <http://ec.europa.eu/esco/occupation/14918>
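As a rough illustration of the lookup described above (not code from the question), here is a minimal in-memory sketch: labels are indexed by their lower-cased form and the extracted term is lower-cased before the lookup, similar in spirit to the `lcase()` FILTER used in the queries. The class `LabelLookup` and its methods are hypothetical, language tags are ignored (all labels are assumed to be French), and in the question itself the matching is done in SPARQL over the ESCO data stored in TDB.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class LabelLookup {
    // Lower-cased French label -> URIs of the resources carrying that label.
    private final Map<String, Set<String>> urisByLabel = new HashMap<>();

    // Registers a label (e.g. a skos:prefLabel or skos:altLabel value) for a URI.
    public void addLabel(String label, String uri) {
        urisByLabel
                .computeIfAbsent(label.toLowerCase(Locale.FRENCH), k -> new LinkedHashSet<>())
                .add(uri);
    }

    // Returns the URIs whose label equals the extracted term, ignoring case;
    // returns an empty set when nothing matches.
    public Set<String> find(String extractedTerm) {
        Set<String> uris = urisByLabel.get(extractedTerm.toLowerCase(Locale.FRENCH));
        return uris == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(uris);
    }

    public static void main(String[] args) {
        LabelLookup lookup = new LabelLookup();
        lookup.addLabel("Acuponcteur", "http://ec.europa.eu/esco/occupation/14918");
        System.out.println(lookup.find("acuponcteur"));
        System.out.println(lookup.find("assistante scolaire"));
    }
}
```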

What I call "strange behavior" concerns the results I get (or don't get) when issuing queries:

When I execute the following query:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX esco: <http://ec.europa.eu/esco/model#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT ?position
    WHERE {
      ?s rdf:type esco:Occupation.
      { ?position skos:prefLabel ?label. }
      UNION
      { ?position skos:altLabel ?label. }
      FILTER (lcase(?label) = "acuponcteur"@fr)
    }
    LIMIT 10

I get these results after 1 minute:

    -----------------------------------------------
    | position                                    |
    ===============================================
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    | <http://ec.europa.eu/esco/occupation/14918> |
    -----------------------------------------------

However, when I add the DISTINCT keyword like this:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX esco: <http://ec.europa.eu/esco/model#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT DISTINCT ?position
    WHERE {
      ?s rdf:type esco:Occupation.
      { ?position skos:prefLabel ?label. }
      UNION
      { ?position skos:altLabel ?label. }
      FILTER (lcase(?label) = "acuponcteur"@fr)
    }
    LIMIT 10

the query seems to run forever (I stopped it after waiting 20 minutes...).

I get the same behavior when I run the first query (without DISTINCT) with a different label to match, one that I'm sure is not in the ontology. I expect an empty result, but the query apparently keeps running and I have to kill it after a while (once again, I waited up to 20 minutes):

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX esco: <http://ec.europa.eu/esco/model#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT ?position
    WHERE {
      ?s rdf:type esco:Occupation.
      { ?position skos:prefLabel ?label. }
      UNION
      { ?position skos:altLabel ?label. }
      FILTER (lcase(?label) = "assistante scolaire"@fr)
    }
    LIMIT 10

Maybe the problem is in the code I'm running? Here it is:

    // Imports assume Apache Jena 3.x; Jena 2.x used the com.hp.hpl.jena packages instead.
    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.tdb.TDBFactory;
    import org.apache.jena.util.FileManager;

    public static void main(String[] args) {
        // Make a TDB-backed dataset
        String directory = "data/testtdb";
        Dataset dataset = TDBFactory.createDataset(directory);
        // transaction (protects a TDB dataset against data corruption,
        // unexpected process termination and system crashes)
        dataset.begin(ReadWrite.WRITE);
        // assume we want the default model, or we could get a named model here
        Model model = dataset.getDefaultModel();
        try {
            // read the input file - only needs to be done once
            String source = "data/esco.rdf";
            FileManager.get().readModel(model, source, "RDF/XML-ABBREV");
            // run a query
            String queryString =
                "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> " +
                "PREFIX esco: <http://ec.europa.eu/esco/model#> " +
                "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
                "SELECT ?position " +
                "WHERE { " +
                "  ?s rdf:type esco:Occupation. " +
                "  { ?position skos:prefLabel ?label. } " +
                "  UNION " +
                "  { ?position skos:altLabel ?label. } " +
                "  FILTER (lcase(?label) = \"acuponcteur\"@fr) " +
                "} " +
                "LIMIT 1";
            Query query = QueryFactory.create(queryString);
            // execute the query
            QueryExecution qexec = QueryExecutionFactory.create(query, model);
            try {
                ResultSet results = qexec.execSelect();
                // taken from the Apache Jena tutorial
                ResultSetFormatter.out(System.out, results, query);
            } finally {
                qexec.close();
            }
        } finally {
            model.close();
            // NOTE: commit() is never called, so end() aborts this WRITE transaction
            // and the triples read above are not kept in data/testtdb.
            dataset.end();
        }
    }

What am I doing wrong here? Any idea?

Thanks!

1 answer

As a first point, which may or may not matter much, you can use a property path to simplify

    { ?position skos:prefLabel ?label. } UNION { ?position skos:altLabel ?label. }

as

    ?position skos:prefLabel|skos:altLabel ?label
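To see why the two forms are interchangeable, here is a toy Java sketch over hypothetical {subject, predicate, object} string triples (plain strings, not Jena's API): the UNION evaluates each triple pattern separately and concatenates the rows, while the alternative path accepts either predicate in a single pass. Compared as multisets (order ignored, duplicates kept), both produce the same rows, which is how a SPARQL 1.1 alternative path between two plain predicates behaves relative to the UNION.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AlternativePathDemo {
    static final String PREF = "skos:prefLabel";
    static final String ALT = "skos:altLabel";

    // Matches the single pattern "?position <predicate> ?label"; each row is "position label".
    static List<String> match(List<String[]> triples, String predicate) {
        List<String> rows = new ArrayList<>();
        for (String[] t : triples) {
            if (t[1].equals(predicate)) {
                rows.add(t[0] + " " + t[2]);
            }
        }
        return rows;
    }

    // { ?position skos:prefLabel ?label } UNION { ?position skos:altLabel ?label }
    static List<String> union(List<String[]> triples) {
        List<String> rows = new ArrayList<>(match(triples, PREF));
        rows.addAll(match(triples, ALT));
        return rows;
    }

    // ?position skos:prefLabel|skos:altLabel ?label : one pass, either predicate accepted
    static List<String> alternativePath(List<String[]> triples) {
        List<String> rows = new ArrayList<>();
        for (String[] t : triples) {
            if (t[1].equals(PREF) || t[1].equals(ALT)) {
                rows.add(t[0] + " " + t[2]);
            }
        }
        return rows;
    }

    // Compares two solution sequences ignoring order but keeping duplicates.
    static boolean sameRows(List<String> a, List<String> b) {
        List<String> x = new ArrayList<>(a);
        List<String> y = new ArrayList<>(b);
        Collections.sort(x);
        Collections.sort(y);
        return x.equals(y);
    }

    public static void main(String[] args) {
        // Hypothetical data; the altLabel value is invented for the example.
        List<String[]> triples = Arrays.asList(
                new String[] {"occupation/14918", PREF, "Acuponcteur"},
                new String[] {"occupation/14918", ALT, "Acupuncteur"},
                new String[] {"occupation/14918", "rdf:type", "esco:Occupation"});
        System.out.println(sameRows(union(triples), alternativePath(triples)));
    }
}
```

The property path is mainly a readability win; it does not by itself change how many rows the query produces.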

With that change, the query becomes:

    SELECT ?position
    WHERE {
      ?s rdf:type esco:Occupation .                  # (1)
      ?position skos:prefLabel|skos:altLabel ?label  # (2)
      FILTER (lcase(?label) = "acuponcteur"@fr)
    }

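What does this query mean? There are some number n of ?position/?label pairs that match (2), and some number m of values of ?s that match (1). The number of results you get from the query is therefore m × n, but you never use the value of ?s. It looks like you used DISTINCT to get rid of duplicate values, but you haven't looked at why you get duplicate values in the first place. This is also why the query can appear to run forever: with LIMIT 10 and no DISTINCT, evaluation can stop after the first 10 rows, but with DISTINCT (there is only one distinct ?position) or with a label that matches nothing, there are fewer than 10 results, so every combination of ?s with the candidate labels has to be examined before the query can finish. You should simply delete the useless line (1) and get the query: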
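The row counts can be made concrete with a toy model. The sketch below assumes a naive nested-loop evaluation for illustration only (it is not how Jena actually plans the query): every ?s from (1) is combined with every candidate ?position/?label pair from (2), the FILTER is checked on each combined row, and evaluation stops as soon as the LIMIT is satisfied. The number of occupations (1000), the label data, and the class and method names are all made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CrossProductDemo {
    // (1) and (2) share no variable, so each ?s is paired with each label pair.

    // Rows returned without LIMIT: m times the number of pairs passing the FILTER.
    static long resultRows(int m, List<String[]> labelPairs, String wanted) {
        long n = labelPairs.stream()
                .filter(pair -> pair[1].toLowerCase(Locale.FRENCH).equals(wanted))
                .count();
        return m * n;
    }

    // Combined rows examined before LIMIT `limit` is reached (or the input runs out).
    static long rowsExamined(int m, List<String[]> labelPairs, String wanted,
                             int limit, boolean distinct) {
        Set<String> seen = new HashSet<>();
        long examined = 0;
        int emitted = 0;
        for (int s = 0; s < m; s++) {
            for (String[] pair : labelPairs) {
                examined++;
                boolean passesFilter = pair[1].toLowerCase(Locale.FRENCH).equals(wanted);
                // With DISTINCT, a ?position already returned does not count towards LIMIT.
                if (passesFilter && (!distinct || seen.add(pair[0]))) {
                    emitted++;
                    if (emitted == limit) {
                        return examined; // LIMIT reached: stop early
                    }
                }
            }
        }
        return examined; // fewer than `limit` results: everything was scanned
    }

    public static void main(String[] args) {
        int m = 1000; // hypothetical number of esco:Occupation resources
        List<String[]> pairs = Arrays.asList(
                new String[] {"occupation/1", "Boulanger"},
                new String[] {"occupation/14918", "Acuponcteur"},
                new String[] {"occupation/2", "Plombier"});

        System.out.println(resultRows(m, pairs, "acuponcteur"));                      // 1000
        System.out.println(rowsExamined(m, pairs, "acuponcteur", 10, false));         // 29
        System.out.println(rowsExamined(m, pairs, "acuponcteur", 10, true));          // 3000
        System.out.println(rowsExamined(m, pairs, "assistante scolaire", 10, false)); // 3000
    }
}
```

In this toy model, dropping the unused pattern (1) amounts to setting m = 1, so only the label pairs themselves are left to check.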

    SELECT DISTINCT ?position
    WHERE {
      ?position skos:prefLabel|skos:altLabel ?label
      FILTER (lcase(?label) = "acuponcteur"@fr)
    }

I would not be surprised if, at that point, you no longer need DISTINCT.
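Whether DISTINCT is still needed after removing (1) depends on the data: the alternative path yields one row per matching triple, so a duplicate ?position remains only if the same resource has both a prefLabel and an altLabel that lower-case to the search string. A toy sketch with hypothetical triples (plain strings, not Jena objects, and language tags ignored):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

public class DistinctCheck {
    // Toy evaluation of:
    //   SELECT ?position WHERE { ?position skos:prefLabel|skos:altLabel ?label
    //                            FILTER (lcase(?label) = "...") }
    // over {subject, predicate, object} triples: one row per matching triple.
    static List<String> positions(List<String[]> triples, String wanted) {
        List<String> rows = new ArrayList<>();
        for (String[] t : triples) {
            boolean isLabel = t[1].equals("skos:prefLabel") || t[1].equals("skos:altLabel");
            if (isLabel && t[2].toLowerCase(Locale.FRENCH).equals(wanted)) {
                rows.add(t[0]);
            }
        }
        return rows;
    }

    // SELECT DISTINCT: drops repeated values, keeping first-seen order.
    static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Hypothetical data: the upper-case altLabel is invented to force a duplicate.
        List<String[]> triples = Arrays.asList(
                new String[] {"occupation/14918", "skos:prefLabel", "Acuponcteur"},
                new String[] {"occupation/14918", "skos:altLabel", "ACUPONCTEUR"},
                new String[] {"occupation/2", "skos:prefLabel", "Plombier"});
        List<String> rows = positions(triples, "acuponcteur");
        System.out.println(rows.size() + " rows, " + distinct(rows).size() + " distinct");
    }
}
```

Without the invented altLabel triple, the same data gives a single row, and DISTINCT changes nothing.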


Source: https://habr.com/ru/post/1200308/

