Alternative to the OPTIONAL keyword in SPARQL queries?

I have a SPARQL query that requests specific properties of URIs of a given type. Since I'm not sure whether these properties exist, I am using the OPTIONAL keyword:

  PREFIX mbo: <http://creativeartefact.org/ontology/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT * WHERE {
    ?uri a mbo:LiveMusicEvent.
    OPTIONAL {?uri rdfs:label ?label}.
    OPTIONAL {?uri mbo:organisedBy ?organiser}.
    OPTIONAL {?uri mbo:takesPlaceAt ?venue}.
    OPTIONAL {?uri mbo:begin ?begin}.
    OPTIONAL {?uri mbo:end ?end}.
  }

When I run this query against my SPARQL endpoint (a Virtuoso server), I get an error:

Virtuoso Error 42000 Estimated execution time -721420288 (s) exceeds 400 (s).

When I reduce the number of OPTIONAL clauses, the picture changes: with one clause removed, the estimated execution time is 4106 seconds; with two clauses removed, the query executes (and returns results instantly).

I don't see why the estimated execution time explodes like this with the additional OPTIONAL clauses, but maybe I have just constructed the query the wrong way?

1 answer

OPTIONAL patterns are usually expensive for a SPARQL engine to evaluate (compared to "normal" join patterns). In this case, the error indicates that the Virtuoso query planner estimates that the query is too complex to complete in time (note that this is only an estimate, so the exact value may be wrong).

You have several alternatives. However, most of them involve executing several queries. The usual approach is the retrieve-and-iterate pattern: first you run a query that retrieves all instances of mbo:LiveMusicEvent:

  SELECT ?uri WHERE { ?uri a mbo:LiveMusicEvent } 

and then you iterate over the result and retrieve the optional properties of each instance:

  SELECT * WHERE {
    VALUES ?uri { <http://example.org/instance1> }
    OPTIONAL {?uri rdfs:label ?label}.
    OPTIONAL {?uri mbo:organisedBy ?organiser}.
    OPTIONAL {?uri mbo:takesPlaceAt ?venue}.
    OPTIONAL {?uri mbo:begin ?begin}.
    OPTIONAL {?uri mbo:end ?end}.
  }

As you can see, I use the VALUES clause to insert an instance ID from the results of the first query into this second query. In this example, I assume that you iterate over the instances one by one and therefore run one query per instance, but as an additional optimization you can insert several instances into the VALUES clause at a time (obviously not all of them at once, since that would make the query as complex as the original).
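The batching idea above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (`chunked`, `build_details_query`, `build_batched_queries`) and the batch size are hypothetical, the URIs are assumed to be plain IRIs without characters that need escaping, and the code only builds the query strings; sending them to the endpoint is left out. It uses the single-variable form `VALUES ?uri { ... }`.

```python
from typing import Iterator, List, Sequence

# Prefixes taken from the question.
PREFIXES = (
    "PREFIX mbo: <http://creativeartefact.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)

# (property, variable name) pairs taken from the question's OPTIONAL clauses.
OPTIONAL_PROPERTIES = [
    ("rdfs:label", "label"),
    ("mbo:organisedBy", "organiser"),
    ("mbo:takesPlaceAt", "venue"),
    ("mbo:begin", "begin"),
    ("mbo:end", "end"),
]


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_details_query(uris: Sequence[str]) -> str:
    """Build one query that fetches the optional properties for `uris`."""
    values = " ".join(f"<{uri}>" for uri in uris)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?uri {prop} ?{var} }}"
        for prop, var in OPTIONAL_PROPERTIES
    )
    return (
        f"{PREFIXES}"
        "SELECT * WHERE {\n"
        f"  VALUES ?uri {{ {values} }}\n"
        f"{optionals}\n"
        "}"
    )


def build_batched_queries(uris: Sequence[str], batch_size: int = 50) -> List[str]:
    """Split the instance URIs into batches and build one query per batch."""
    return [build_details_query(batch) for batch in chunked(uris, batch_size)]
```

A suitable batch size depends on the endpoint and the data, so the default of 50 is a guess; it makes sense to start small and increase it while the planner's estimate stays under the limit.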

By the way, VALUES is a SPARQL 1.1 feature, and I'm not sure if Virtuoso supports it. If not, you can achieve the same effect either with the FILTER clause or simply "manually" by replacing all occurrences of the ?uri variable with the instance ID for each iteration.
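The FILTER fallback mentioned above could look roughly like the following Python sketch. The function name `build_filter_query` is hypothetical, and the code again only builds the query string. It deliberately uses a chain of `=` comparisons joined with `||` rather than the `IN` operator, since `IN` is also a SPARQL 1.1 feature. Note that a FILTER does not bind `?uri` by itself, so the type pattern is kept in the query to bind it.

```python
from typing import Sequence


def build_filter_query(uris: Sequence[str]) -> str:
    """Build a query that restricts ?uri with FILTER instead of VALUES."""
    if not uris:
        raise ValueError("at least one URI is required")
    # ?uri = <a> || ?uri = <b> || ...
    condition = " || ".join(f"?uri = <{uri}>" for uri in uris)
    return (
        "PREFIX mbo: <http://creativeartefact.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT * WHERE {\n"
        "  ?uri a mbo:LiveMusicEvent .\n"  # binds ?uri before it is filtered
        f"  FILTER ({condition})\n"
        "  OPTIONAL { ?uri rdfs:label ?label }\n"
        "  OPTIONAL { ?uri mbo:organisedBy ?organiser }\n"
        "  OPTIONAL { ?uri mbo:takesPlaceAt ?venue }\n"
        "  OPTIONAL { ?uri mbo:begin ?begin }\n"
        "  OPTIONAL { ?uri mbo:end ?end }\n"
        "}"
    )
```

Whether the planner treats this form more favourably than the original query is not guaranteed; it would need to be tried against the actual endpoint.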

Another way to handle this is to first execute a CONSTRUCT query that retrieves the relevant subset of data from the larger source, and then run your more complex query with the OPTIONAL clauses against that subset. For instance:

  CONSTRUCT WHERE { ?uri a mbo:LiveMusicEvent; ?p ?o . } 

will retrieve all data about instances of LiveMusicEvent as an RDF graph. Load this graph into a local RDF model (for example, a Sesame Model or an in-memory Repository, if you work in Java), and query it further from there.


Source: https://habr.com/ru/post/1201580/

