Fuzzy entity search in Wikidata with SPARQL

I am trying to do a fuzzy (i.e., partial or case-insensitive) search for an entity label in Wikidata with SPARQL (via an online endpoint). Unfortunately, the queries fail with "QueryTimeoutException: the request has expired." I assume this is because the filter has to be applied to too many labels to finish within Wikidata's 1-minute timeout.

Here is the specific query:

def findByFuzzyLabel(self, item_label):
    qstring = '''
        SELECT ?item WHERE {
            ?item rdfs:label ?label .
            FILTER( lcase(str(?label)) = "%s")
        }
        LIMIT 20
        ''' % (item_label)
    results = self.query(qstring)

Is there a way to do a partial-string and/or case-insensitive search on Wikidata entity labels, or do I need to do this offline after loading the raw data?

I am looking for partial matches, such as "Lindbergh" for "Charles Lindbergh", and in some cases for case insensitivity. Any suggestions on how to do this, whether through SPARQL or offline in Python, are appreciated.

+1
2 answers

Be more specific. Triplestores work with things, not strings. For example, the following query works fine:

SELECT ?item WHERE {
    ?item wdt:P735 wd:Q2958359 .
    ?item rdfs:label ?label .
    FILTER (CONTAINS(LCASE(STR(?label)), "lindbergh"))
}
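
If you build this query from Python, as in the question, something along these lines could work. This is only a sketch: the function name build_label_query and its parameters are made up for illustration, the LIMIT is carried over from the question's code, and the escaping only covers backslashes and double quotes.

```python
def build_label_query(given_name_qid, fragment, limit=20):
    """Return a SPARQL query that looks for `fragment`, ignoring case,
    in the labels of items whose given name (wdt:P735) is `given_name_qid`."""
    # Hypothetical sanity check: only accept ids that look like "Q123".
    if not (given_name_qid.startswith("Q") and given_name_qid[1:].isdigit()):
        raise ValueError("expected a Wikidata item id such as 'Q2958359'")
    # Lowercase the fragment to match LCASE(...) on the label side, and
    # escape backslashes and double quotes so it stays inside the literal.
    needle = fragment.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE {\n"
        "    ?item wdt:P735 wd:%s .\n"
        "    ?item rdfs:label ?label .\n"
        '    FILTER (CONTAINS(LCASE(STR(?label)), "%s"))\n'
        "}\n"
        "LIMIT %d"
    ) % (given_name_qid, needle, limit)


query = build_label_query("Q2958359", "Lindbergh")
```

The resulting string can then be passed to whatever method sends queries to the endpoint (self.query in the question).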

If it is not possible to be specific enough, you will need full-text search capabilities.

  • In fact, Blazegraph supports full-text search via the magic predicate bds:search, but this feature is not enabled on Wikidata.
  • Blazegraph also supports external full-text search via the magic predicate fts:search, but that works with Apache Solr, whereas Wikidata's own search is built on ElasticSearch.

For Wikidata, you can also run SQL queries on Quarry. An example Quarry query:

USE wikidatawiki_p; 
DESCRIBE wb_terms;

SELECT CONCAT("Q", term_entity_id) AS wikidata_id, term_language, term_text, term_search_key
FROM wb_terms
WHERE term_type = 'label'
  AND term_search_key IN (LOWER('Lindbergh'), LOWER('Charles Lindbergh'));

The time limit for Quarry queries is 30 minutes.
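
If you run this kind of lookup from Python against a database copy rather than through the Quarry web page, the IN list can be generated instead of typed by hand. A rough sketch, assuming a DB-API driver that uses %s placeholders; the function name build_terms_query is hypothetical, and the table and columns are the ones from the query above.

```python
def build_terms_query(labels):
    """Return (sql, params) for a case-insensitive label lookup in wb_terms."""
    if not labels:
        raise ValueError("need at least one label")
    # One placeholder per label; the driver fills in and quotes the values.
    placeholders = ", ".join(["%s"] * len(labels))
    sql = (
        "SELECT CONCAT('Q', term_entity_id) AS wikidata_id, term_language, term_text\n"
        "FROM wb_terms\n"
        "WHERE term_type = 'label' AND term_search_key IN (" + placeholders + ")"
    )
    # Lowercase on the Python side, mirroring LOWER(...) in the SQL above.
    params = [label.lower() for label in labels]
    return sql, params


sql, params = build_terms_query(["Lindbergh", "Charles Lindbergh"])
```

The pair would then go to something like cursor.execute(sql, params), so the labels never have to be pasted into the SQL text.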

+3

For a partial match, you can use "contains".

Example:

 SELECT ?item WHERE {
            ?item rdfs:label ?label .
            FILTER( contains(lcase(?label), 'arles lin' ))
 }
 LIMIT 20

Update: contains is an XPath function that is also available in SPARQL. See: https://www.w3.org/2009/sparql/wiki/Feature:FunctionLibrary#XQuery_1.0_and_XPath_2.0_Functions_and_Operators
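
For the offline route mentioned in the question, the same contains/lcase matching can be sketched in plain Python. This assumes the labels have already been loaded into a dict of id to label; the ids below ("X1", "X2", "X3") are placeholders, not real Wikidata ids, and find_by_fuzzy_label is just an illustrative name.

```python
def find_by_fuzzy_label(labels, fragment, limit=20):
    """Return ids whose label contains `fragment`, ignoring case.

    `labels` maps an item id to its label, e.g. loaded from a dump.
    """
    needle = fragment.lower()
    hits = []
    for item_id, label in labels.items():
        # Same idea as FILTER(contains(lcase(?label), needle)) in SPARQL.
        if needle in label.lower():
            hits.append(item_id)
            if len(hits) == limit:
                break
    return hits


# Placeholder data for illustration only.
sample = {
    "X1": "Charles Lindbergh",
    "X2": "Anne Morrow Lindbergh",
    "X3": "Amelia Earhart",
}
matches = find_by_fuzzy_label(sample, "LINDBERGH")
```

This is a linear scan, so it is only practical once the label set fits in memory; for anything larger a real full-text index is the better tool.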


Update 2 (a more specific query):

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?item ?label WHERE {
    ?item rdfs:label ?label .
    ?item rdf:type dbo:Person .   # Works with or without this too; also try skos:Concept
    FILTER( contains(lcase(?label), 'arles lin') && LANGMATCHES(LANG(?label), "en") )
}
LIMIT 20
+2

Source: https://habr.com/ru/post/1658847/

