SPARQL Named Graphs and Federated Endpoints

I recently came across the working draft for SPARQL 1.1 Federation Extensions and wondered whether the same thing is already possible using named graphs (this is not meant to detract from the usefulness of that draft).

My understanding of named graphs is a bit vague; the only thing I took away from reading the specifications was the rules around merging, and not merging, with other graphs at query time. Since that does not fully settle it for me, my question is this:

Given the following query:

SELECT ?something FROM NAMED <http://www.vw.co.uk/models/used> FROM NAMED <http://www.autotrader.co.uk/cars/used> WHERE { ... } 

Is it reasonable to assume that the query processor / endpoint can, or should, do the following for each of these graphs:

  1. Verify whether the named graph exists locally.

  2. If it does not, perform the following operation (for the query above, I will use the second named graph):

       GET /sparql/?query=EncodedQuery HTTP/1.1
       Host: www.autotrader.co.uk
       User-Agent: my-sparql-client/0.1

     where EncodedQuery contains only the second named graph in its FROM NAMED clause, and its WHERE clause is adjusted accordingly with respect to GRAPH clauses (for example, if there is a GRAPH <http://www.vw.co.uk/models/used> {...} block).

  3. Only if it cannot perform the above, do one of the following:

       GET /cars/used HTTP/1.1
       Host: www.autotrader.co.uk

     or

       LOAD <http://www.autotrader.co.uk/cars/used>

  4. Return the relevant results.
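
The fallback order proposed in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the set of locally held graphs, and the /sparql/ endpoint path on the remote host are hypothetical, and the sketch only builds a plan without sending anything over the network.

```python
from urllib.parse import urlsplit, urlencode

# Hypothetical: graphs this endpoint already holds in its own store.
LOCAL_GRAPHS = {"http://www.vw.co.uk/models/used"}


def build_remote_query_url(graph_iri, where_body, endpoint_path="/sparql/"):
    """Build the GET URL that asks the graph's own host to answer.

    Assumes (hypothetically) that the host exposes a SPARQL endpoint at
    `endpoint_path`. The rewritten query names only this one graph.
    """
    parts = urlsplit(graph_iri)
    query = f"SELECT ?something FROM NAMED <{graph_iri}> WHERE {{ {where_body} }}"
    return f"{parts.scheme}://{parts.netloc}{endpoint_path}?{urlencode({'query': query})}"


def plan_graph_resolution(graph_iri, where_body):
    """Return the ordered list of attempts for one FROM NAMED graph."""
    if graph_iri in LOCAL_GRAPHS:
        # The graph is already in the local store: nothing to fetch.
        return [("local", graph_iri)]
    return [
        # First choice: delegate the sub-query to the remote endpoint.
        ("remote-sparql", build_remote_query_url(graph_iri, where_body)),
        # Fallbacks: dereference the graph IRI, or pull it in with LOAD.
        ("http-get", graph_iri),
        ("load", f"LOAD <{graph_iri}>"),
    ]


plan = plan_graph_resolution(
    "http://www.autotrader.co.uk/cars/used",
    "GRAPH <http://www.autotrader.co.uk/cars/used> { ?s ?p ?something }",
)
for kind, target in plan:
    print(kind, target)
```

A real processor would try each entry in turn and stop at the first one that succeeds; how failures are detected is left open here.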

Obviously, there may be some additional considerations around OFFSET and LIMIT.

I also remember reading, a long time ago in a galaxy far away, that the default graph of any SPARQL endpoint should be a named graph according to the following convention:

For http://www.vw.co.uk/sparql/ there should be a named graph http://www.vw.co.uk, which represents the default graph, and so by the above logic it should already be possible to federate SPARQL endpoints using named graphs.

The reason I ask is that I want to start promoting federation across all the domains in the above example without waiting for the standard, while making sure that I do not do something that is out of step or incompatible with something else in the future.
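
As a small illustration of that convention, the mapping from endpoint URL to graph name could look like the sketch below. It assumes the graph name is simply the scheme and host of the endpoint; the function name is hypothetical and the convention itself is only the one recalled above, not something taken from a specification.

```python
from urllib.parse import urlsplit


def default_graph_name(endpoint_url):
    """Map an endpoint URL to the graph name suggested by the convention.

    Assumption: the name is just scheme + host, with the path dropped.
    """
    parts = urlsplit(endpoint_url)
    return f"{parts.scheme}://{parts.netloc}"


print(default_graph_name("http://www.vw.co.uk/sparql/"))
```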

1 answer

Named graphs and the URLs used in federated queries (using SERVICE or FROM) are two different things. The latter point to SPARQL endpoints, whereas named graphs live inside a triple store and mainly serve to separate different data sets. That, in turn, can be useful both for improving performance and for knowledge representation, for example to record the source of a set of statements.

For example, you might have two data sources that both say "?movie has-rating ?x", and you might want to know which source states which rating. In that case you can use two named graphs associated with these two sources (for example, http://www.example.com/rotten-tomatoes and http://www.example.com/imdb). If you store both data sets in the same triple store, you will probably want to use named graphs; remote endpoints are a separate matter. In addition, the URL of a named graph can be used with vocabularies such as VoID to describe a data set as a whole (e.g. the data set name, where and when the triples were imported, who the maintainer is, the user licence). This is another reason for partitioning your triple store into named graphs.
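
To make the "which source says which rating" idea concrete, here is a minimal in-memory sketch that treats each statement as a quad whose first element is the named graph. The movie identifier, the `ex:` prefix and the rating values are made up for illustration; a real triple store would answer the equivalent SPARQL query with a `GRAPH ?g { ... }` pattern.

```python
# Each statement is a quad: (graph, subject, predicate, object).
# The movie identifier and the rating values are invented for this example.
QUADS = [
    ("http://www.example.com/rotten-tomatoes", "ex:movie1", "ex:has-rating", "8.1"),
    ("http://www.example.com/imdb", "ex:movie1", "ex:has-rating", "7.4"),
]


def ratings_by_source(quads, movie):
    """Rough equivalent of:
    SELECT ?g ?x WHERE { GRAPH ?g { <movie> ex:has-rating ?x } }
    """
    return {g: o for g, s, p, o in quads if s == movie and p == "ex:has-rating"}


print(ratings_by_source(QUADS, "ex:movie1"))
```

Because the graph name travels with every statement, the source of each rating stays recoverable even though both data sets sit in the same store.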

That said, your mechanism for linking named graphs to endpoint URLs could be implemented as an option, but I don't think it would be advisable to make it mandatory, since managing remote endpoints and named graph URLs independently can be more useful.

Moreover, the real problem with federated queries is offering endpoint-transparent queries: making the query engine smart enough to analyze the query, work out how to split it, run the partial queries against the right endpoints, and combine the results afterwards in an efficient way. A lot of research is being done on this issue; one of the most significant results (as far as I know) is FedX, which has been used to implement several query distribution optimizations.
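
A toy sketch of that split-and-combine idea is shown below. It is not FedX's algorithm, only a naive illustration: the catalogue mapping predicates to endpoints, the `ex:` predicates and the sample solutions are all hypothetical, and the join is a plain nested loop rather than anything optimized.

```python
# Hypothetical catalogue: which endpoint is able to answer which predicate.
CATALOGUE = {
    "ex:model": "http://www.vw.co.uk/sparql/",
    "ex:price": "http://www.autotrader.co.uk/sparql/",
}


def split_by_endpoint(patterns):
    """Group triple patterns by the endpoint that can answer them."""
    groups = {}
    for s, p, o in patterns:
        groups.setdefault(CATALOGUE[p], []).append((s, p, o))
    return groups


def join(left, right, var):
    """Naive nested-loop join of two lists of solution mappings on one variable."""
    return [{**l, **r} for l in left for r in right if l[var] == r[var]]


patterns = [("?car", "ex:model", "?model"), ("?car", "ex:price", "?price")]
groups = split_by_endpoint(patterns)

# Pretend these are the partial results returned by the two endpoints.
from_vw = [{"?car": "ex:car1", "?model": "Golf"}]
from_autotrader = [
    {"?car": "ex:car1", "?price": "5000"},
    {"?car": "ex:car2", "?price": "7000"},
]

combined = join(from_vw, from_autotrader, "?car")
print(groups)
print(combined)
```

The hard parts that research systems address, such as choosing sources without a hand-written catalogue and avoiding large intermediate results, are exactly what this sketch leaves out.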

One last thing: I vaguely remember the convention you mention about $url and $url/sparql. There are several approaches around (e.g. the LOD cloud). However, in most modern triple stores (for example, Virtuoso), queries that do not specify a named graph (that is, do not use GRAPH) are not restricted to a separate default graph; they actually query the union of all named graphs in the store, which is usually much more useful (when you don't know where something is stated, or want to integrate data across graphs).
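
The union behaviour described above can be mimicked with a small in-memory sketch. The data and the `match` helper are hypothetical; the point is only that leaving the graph unspecified searches every named graph, while naming a graph restricts the search to it.

```python
# Quads are (graph, subject, predicate, object); the data is invented.
QUADS = [
    ("http://www.example.com/rotten-tomatoes", "ex:movie1", "ex:has-rating", "8.1"),
    ("http://www.example.com/imdb", "ex:movie2", "ex:has-rating", "7.4"),
]


def match(quads, predicate, graph=None):
    """Return sorted (subject, object) pairs for a predicate.

    graph=None models a query without GRAPH on a store whose default graph
    is the union of all named graphs; otherwise only that graph is searched.
    """
    return sorted(
        (s, o)
        for g, s, p, o in quads
        if p == predicate and (graph is None or g == graph)
    )


everything = match(QUADS, "ex:has-rating")
imdb_only = match(QUADS, "ex:has-rating", graph="http://www.example.com/imdb")
print(everything)
print(imdb_only)
```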


Source: https://habr.com/ru/post/1340319/

