Exclude results from DBpedia SPARQL query based on URI prefix

How can I exclude a group of concepts when using the DBpedia SPARQL endpoint? I use the following basic query to get a list of concepts:

 SELECT DISTINCT ?concept WHERE { ?x a ?concept } LIMIT 100


This gives me a list of 100 concepts. I want to exclude all concepts that belong to the YAGO class/group (that is, whose IRIs start with http://dbpedia.org/class/yago/). I can filter out individual concepts like this:

 SELECT DISTINCT ?concept WHERE { ?x a ?concept FILTER (?concept != <http://dbpedia.org/class/yago/1950sScienceFictionFilms>) } LIMIT 100


But I can't figure out how to exclude all YAGO subclasses from my results. I tried using * as a wildcard like this, but it didn't work:

 FILTER (?concept != <http://dbpedia.org/class/yago/*>) 

Update:

This query with a regex seems to do the trick, but it is really, really slow and ugly. I would very much welcome a better alternative.

 SELECT DISTINCT ?type WHERE { [] a ?type FILTER( regex(str(?type), "^(?!http://dbpedia.org/class/yago/).+")) } ORDER BY ASC(?type) LIMIT 10 
1 answer

It may seem a bit awkward, but your idea of casting to a string and doing string-based checks is probably on the right track. You can do it a little more efficiently using the SPARQL 1.1 function strstarts:

 SELECT DISTINCT ?concept WHERE { ?x a ?concept FILTER ( !strstarts(str(?concept), "http://dbpedia.org/class/yago/") ) } LIMIT 100
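If you issue the query from code, the prefix can be a parameter. Here is a minimal Python sketch, assuming the public DBpedia endpoint at https://dbpedia.org/sparql; the helper names exclude_prefix_query and request_url are made up for illustration. It only builds the query text and the request URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia endpoint; swap in your own if needed.
ENDPOINT = "https://dbpedia.org/sparql"


def exclude_prefix_query(prefix: str, limit: int = 100) -> str:
    """Build a query for distinct classes whose IRI does not start with prefix."""
    # The prefix ends up inside a SPARQL string literal, so escape
    # backslashes and double quotes.
    literal = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?concept WHERE { "
        "?x a ?concept "
        f'FILTER ( !strstarts(str(?concept), "{literal}") ) '
        "} "
        f"LIMIT {int(limit)}"
    )


def request_url(query: str) -> str:
    """URL for a SPARQL protocol GET request (query goes in the `query` parameter)."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = exclude_prefix_query("http://dbpedia.org/class/yago/")
url = request_url(query)
print(query)
print(url)
```

The resulting URL can be fetched with any HTTP client; asking for a particular result format (for example via an Accept header) is left out here.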


Another alternative would be to find the top-level YAGO class and exclude the concepts that are rdfs:subClassOf that top-level class. This would probably be the best solution in the long run (since it does not require casting to strings and is based on the graph structure). Unfortunately, there does not appear to be a single top-level YAGO class comparable to owl:Thing. I downloaded the YAGO type hierarchy from the DBpedia download page and ran this query, which asks for classes that have no superclasses, against it:

 prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 select distinct ?root where {
   [] rdfs:subClassOf ?root
   filter not exists { ?root rdfs:subClassOf ?superRoot }
 }

and I got these nine results:

 ----------------------------------------------------------------
 | root                                                         |
 ================================================================
 | <http://dbpedia.org/class/yago/YagoLegalActorGeo>            |
 | <http://dbpedia.org/class/yago/WaterNymph109550125>          |
 | <http://dbpedia.org/class/yago/PhysicalEntity100001930>      |
 | <http://dbpedia.org/class/yago/Abstraction100002137>         |
 | <http://dbpedia.org/class/yago/YagoIdentifier>               |
 | <http://dbpedia.org/class/yago/YagoLiteral>                  |
 | <http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity> |
 | <http://dbpedia.org/class/yago/Thing104424418>               |
 | <http://dbpedia.org/class/yago/Dryad109551040>               |
 ----------------------------------------------------------------
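The logic of the root-finding query can be mirrored in a few lines of Python: a root is anything that appears as a superclass but never as a subclass. This is only a sketch on made-up data; the ex: names and the root_classes helper are hypothetical and are not taken from the YAGO hierarchy.

```python
# Hypothetical (subclass, superclass) pairs standing in for the
# rdfs:subClassOf triples of a type hierarchy.
SUBCLASS_OF = [
    ("ex:SciFiFilm", "ex:Film"),
    ("ex:Film", "ex:Entity"),
    ("ex:City", "ex:Place"),
    ("ex:Place", "ex:Entity"),
    ("ex:Integer", "ex:Literal"),
]


def root_classes(pairs):
    """Return the superclasses that never appear as a subclass themselves.

    Same idea as: [] rdfs:subClassOf ?root
                  filter not exists { ?root rdfs:subClassOf ?superRoot }
    """
    subs = {sub for sub, _ in pairs}
    supers = {sup for _, sup in pairs}
    return supers - subs


roots = root_classes(SUBCLASS_OF)
print(sorted(roots))
```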

Given that the YAGO concepts are not as structured as some of the others, it seems that a string-based approach might be better in this case. However, if you want to, you can write a non-string-based query like this one, which asks for 100 concepts, excluding those that have one of these nine results as a superclass:

 prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 prefix yago: <http://dbpedia.org/class/yago/>
 select distinct ?concept where {
   [] a ?concept .
   filter not exists {
     ?concept rdfs:subClassOf* ?super .
     values ?super {
       yago:YagoLegalActorGeo yago:WaterNymph109550125 yago:PhysicalEntity100001930
       yago:Abstraction100002137 yago:YagoIdentifier yago:YagoLiteral
       yago:YagoPermanentlyLocatedEntity yago:Thing104424418 yago:Dryad109551040
     }
   }
 }
 limit 100
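What the filter in this query checks can be sketched in Python as a walk up the superclass links. Because rdfs:subClassOf* matches paths of length zero or more, a concept that is itself one of the excluded roots is also filtered out. The data and the has_excluded_ancestor helper below are hypothetical and only illustrate the idea.

```python
# Hypothetical direct-superclass map and excluded roots.
PARENTS = {
    "ex:SciFiFilm": ["ex:Film"],
    "ex:Film": ["ex:YagoRoot"],
    "ex:City": ["ex:Place"],
}
EXCLUDED = {"ex:YagoRoot"}


def has_excluded_ancestor(concept, parents, excluded):
    """True if the concept or any transitive superclass is in `excluded`."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in excluded:
            return True
        if current in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


concepts = ["ex:SciFiFilm", "ex:City", "ex:YagoRoot", "ex:Place"]
kept = [c for c in concepts if not has_excluded_ancestor(c, PARENTS, EXCLUDED)]
print(kept)
```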


I'm not sure which will end up being faster. The first requires a conversion to a string, and strstarts, if implemented naively, has to consume http://dbpedia.org/class/ in each concept before anything can mismatch. The second requires nine comparisons which, if IRIs are interned, are simply object identity checks. It's an interesting question for further investigation.
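To make the point about a naive strstarts concrete, here is a small Python sketch that counts how many characters a left-to-right prefix check looks at before it can decide. It says nothing about how any real SPARQL engine is implemented; chars_examined is a hypothetical helper, and the sample IRIs are only for illustration.

```python
PREFIX = "http://dbpedia.org/class/yago/"


def chars_examined(iri, prefix):
    """Count characters of `iri` compared by a naive left-to-right prefix check."""
    examined = 0
    for a, b in zip(iri, prefix):
        examined += 1
        if a != b:
            break  # first mismatch: the check can stop here
    return examined


samples = [
    "http://www.w3.org/2002/07/owl#Thing",
    "http://dbpedia.org/ontology/Film",
    "http://dbpedia.org/class/yago/1950sScienceFictionFilms",
]
for iri in samples:
    print(chars_examined(iri, PREFIX), iri)
```

The longer an IRI agrees with the prefix, the more characters are compared before the check can reject or accept it.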


Source: https://habr.com/ru/post/954752/

