Import RDF data into SQL?

I am comfortable using SQL, but I am having an impossible time understanding SPARQL. For a start, I don't even understand how to look at the structure of the data (in MySQL I would just run describe <table name>) so that I can query the appropriate fields.

Is there a way to import the entire RDF dataset into corresponding tables in a MySQL database?

Failing that, is there a way to SELECT * from all the tables (or whatever the equivalent descriptor is) so that I can get all the output as CSV and take it from there?

The RDF dataset I'm trying to work with has a SPARQL endpoint and even a "How to SPARQL" guide, but I find it hard to follow.

For instance:

 PREFIX meannot: <http://rdf.myexperiment.org/ontologies/annotations/>
 PREFIX sioc: <http://rdfs.org/sioc/ns#>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX mebase: <http://rdf.myexperiment.org/ontologies/base/>
 SELECT DISTINCT ?annotator_name
 WHERE {
   ?comment mebase:annotates <http://www.myexperiment.org/workflows/52> .
   ?comment rdf:type meannot:Comment .
   ?comment mebase:has-annotator ?annotator
   ?annotator sioc:name ?annotator_name
 }

makes little sense to me. Why is there a period at the end of some WHERE statements and not others? And what does ?comment mebase:has-annotator ?annotator mean in plain English? Select the annotator's name where the annotator's name is the annotator's name? Huh?

I would be grateful for any resources you could point me to.

2 answers

Although SPARQL looks like SQL in its syntax, it works quite differently, and that is the problem you and many others run into when trying to learn it.

Pattern matching

SPARQL works by matching triple patterns, not by selecting from tables as SQL does. Each set of three items in your example is a triple pattern. For example:

 ?comment rdf:type meannot:Comment . 

This tells the SPARQL processor to find anything that has an rdf:type of meannot:Comment, i.e. things that have the type comment. In this pattern, ?comment is a variable that acts as a wildcard; think of it as a field in SQL that you can select.

If we add a further triple pattern that uses the same variable, we are asking the SPARQL processor to find all the things that match all of the triple patterns, so:

 ?comment mebase:annotates <http://www.myexperiment.org/workflows/52> .
 ?comment rdf:type meannot:Comment .

This finds things that are comments on a specific item.
In SQL terms, this is like writing SELECT commentID FROM COMMENTS WHERE itemID=1234, if that helps you picture it.

Once we start adding extra variables, you can think of it as performing joins against other tables:

 ?comment mebase:annotates <http://www.myexperiment.org/workflows/52> .
 ?comment rdf:type meannot:Comment .
 ?comment mebase:has-annotator ?annotator .

This finds the comments on a specific item together with the users who made them.
It is roughly equivalent to SELECT commentID, userID FROM COMMENTS C INNER JOIN USERS U ON C.userID=U.userID WHERE itemID=1234 in SQL.
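To make the "patterns behave like joins" idea concrete, here is a minimal Python sketch of how a processor could evaluate a set of triple patterns. It is only an illustration of the idea, not how any real SPARQL engine is implemented; the triples, the shortened identifiers such as workflow52 and user7, and the function names are all made up for the example.

```python
# Toy data: each triple is (subject, predicate, object). All values are invented.
TRIPLES = [
    ("comment1", "rdf:type", "meannot:Comment"),
    ("comment1", "mebase:annotates", "workflow52"),
    ("comment1", "mebase:has-annotator", "user7"),
    ("user7", "sioc:name", "Alice"),
    ("comment2", "rdf:type", "meannot:Comment"),
    ("comment2", "mebase:annotates", "workflow99"),
    ("comment2", "mebase:has-annotator", "user8"),
    ("user8", "sioc:name", "Bob"),
]


def match_pattern(pattern, triple, binding):
    """Match one triple pattern against one triple.

    Returns an extended copy of `binding`, or None if they do not match.
    """
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against its existing binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must be identical to the value in the triple.
            return None
    return result


def query(patterns, triples):
    """Return every binding that satisfies all patterns (naive nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


rows = query(
    [
        ("?comment", "mebase:annotates", "workflow52"),
        ("?comment", "rdf:type", "meannot:Comment"),
        ("?comment", "mebase:has-annotator", "?annotator"),
        ("?annotator", "sioc:name", "?annotator_name"),
    ],
    TRIPLES,
)
annotator_names = sorted({row["?annotator_name"] for row in rows})
print(annotator_names)
```

Because ?comment appears in the first three patterns and ?annotator in the last two, each shared variable acts like a join key: only comment1 survives the first pattern, and only the name attached to its annotator is returned.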

Syntax Notes

As for the syntax, a . marks the end of a triple pattern.
The fact that it is omitted in your example is actually a mistake on the part of the people who published the guide. I happen to work at one of the universities involved in the project, so I have dropped a note to my colleague asking them to fix it.

In examples you may also see ; used at the end of a triple pattern. This is shorthand for repeating the subject, for example:

 ?comment mebase:annotates <http://www.myexperiment.org/workflows/52> ;
          rdf:type meannot:Comment .

So you do not need to type ?comment again for the subsequent pattern.

Similarly, a , is used to repeat both the subject and the predicate:

 ?comment rdf:type meannot:Comment , ex:Annotation . 

This means that ?comment and rdf:type are both repeated; in plain English, it matches things that have both the type comment and the type annotation.
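If it helps to see the shorthand spelled out, this small Python sketch expands it back into one full triple pattern per line. The function name and its input shape are invented purely for illustration; it is not part of any SPARQL library.

```python
def expand_shorthand(subject, predicate_objects):
    """Expand ';' and ',' shorthand into full triple patterns.

    `predicate_objects` is a list of (predicate, [objects]) pairs.
    Each pair corresponds to one ';' group, and each additional
    object within a pair corresponds to one ','.
    """
    return [
        f"{subject} {predicate} {obj} ."
        for predicate, objects in predicate_objects
        for obj in objects
    ]


patterns = expand_shorthand(
    "?comment",
    [
        ("mebase:annotates", ["<http://www.myexperiment.org/workflows/52>"]),
        ("rdf:type", ["meannot:Comment", "ex:Annotation"]),
    ],
)
for line in patterns:
    print(line)
```

The three printed lines are the long-hand patterns that the ; and , forms stand for: the subject is repeated on every line, and rdf:type is repeated for each of its two objects.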

Data Structure Discovery

RDF is not stored in tables, since it is a schemaless data model. The closest thing to tables is graphs, which are just a way to logically group multiple triples together.

Take a look at this question on exploratory SPARQL queries for some suggested queries.
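The nearest thing to describe <table name> is to ask which predicates are actually used with each type of thing. On a real endpoint you would do that with queries such as SELECT DISTINCT ?p WHERE { ?s ?p ?o } or SELECT DISTINCT ?type WHERE { ?s rdf:type ?type }. The Python sketch below does the same grouping over a handful of invented triples, just to show what sort of "schema" you can recover; the data, the ex:User class and the function name are all assumptions made for the example.

```python
from collections import defaultdict

# Invented sample triples: (subject, predicate, object).
TRIPLES = [
    ("comment1", "rdf:type", "meannot:Comment"),
    ("comment1", "mebase:annotates", "workflow52"),
    ("comment1", "mebase:has-annotator", "user7"),
    ("user7", "rdf:type", "ex:User"),
    ("user7", "sioc:name", "Alice"),
]


def predicates_by_class(triples):
    """Group the predicates in use by the rdf:type of their subject."""
    # First pass: record the declared type(s) of every subject.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)

    # Second pass: file every other predicate under its subject's type(s).
    schema = defaultdict(set)
    for s, p, o in triples:
        if p != "rdf:type":
            for cls in types.get(s, {"(untyped)"}):
                schema[cls].add(p)
    return {cls: sorted(preds) for cls, preds in sorted(schema.items())}


schema = predicates_by_class(TRIPLES)
for cls, preds in schema.items():
    print(cls, preds)
```

Each type plays the role of a table name and its predicates the role of column names, with the caveat that nothing forces every instance of a type to carry every predicate.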

If you just want to select everything, you can use SELECT * WHERE { ?s ?p ?o }. Beware that many endpoints impose a limit on the number of results per query, so even if the endpoint has millions of triples behind it, you may only get a few thousand back. You can page through the results using LIMIT and OFFSET, for example:

 SELECT * WHERE { ?s ?p ?o } LIMIT 1000 OFFSET 0
 SELECT * WHERE { ?s ?p ?o } LIMIT 1000 OFFSET 1000
 SELECT * WHERE { ?s ?p ?o } LIMIT 1000 OFFSET 2000
 # And so forth until you find no further results
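The paging loop above can be automated and the rows written out as CSV. This is a rough Python sketch under some assumptions: run_query is a hypothetical callable that you would supply to send a SPARQL string to the endpoint and return a list of (s, p, o) rows, and fake_endpoint merely stands in for it so that the example runs without a network. I have also added ORDER BY, which is not in the queries above, because without it the order of results between pages is not guaranteed to be stable; you may need to drop it if the endpoint finds sorting too expensive.

```python
import csv
import io
import re


def fetch_all(run_query, page_size=1000):
    """Yield every row by paging with LIMIT and OFFSET until a page is empty."""
    offset = 0
    while True:
        sparql = (
            "SELECT * WHERE { ?s ?p ?o } "
            f"ORDER BY ?s ?p ?o LIMIT {page_size} OFFSET {offset}"
        )
        rows = run_query(sparql)
        if not rows:
            break
        yield from rows
        offset += page_size


# Stand-in for a real endpoint: 2500 invented triples served in slices.
FAKE_DATA = [(f"s{i}", "p", f"o{i}") for i in range(2500)]


def fake_endpoint(sparql):
    limit = int(re.search(r"LIMIT (\d+)", sparql).group(1))
    offset = int(re.search(r"OFFSET (\d+)", sparql).group(1))
    return FAKE_DATA[offset:offset + limit]


rows = list(fetch_all(fake_endpoint))

# Write the rows as CSV (to an in-memory buffer here; use open(...) for a file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["s", "p", "o"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(len(rows), "rows fetched")
```

Swapping fake_endpoint for a function that really talks to your endpoint is the only change needed; the paging and CSV parts stay the same.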

If you just want all the data to trawl through, check the site to see whether they offer an RDF dump, which will usually be a zip archive containing a bunch of RDF files. This will let you explore the data locally.

Putting RDF in SQL Tables

There are systems that let you store RDF in SQL-based databases, but take it from someone who has worked with a wide variety of triple stores: this is nowhere near comparable to using a native triple store.
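For a feel of what this looks like, here is a minimal sketch of the simplest possible relational layout: a single three-column table of triples. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, column names and data are all invented for the example; real RDF-to-SQL systems may lay things out quite differently.

```python
import sqlite3

# Invented sample triples: (subject, predicate, object).
TRIPLES = [
    ("comment1", "rdf:type", "meannot:Comment"),
    ("comment1", "mebase:annotates", "workflow52"),
    ("comment1", "mebase:has-annotator", "user7"),
    ("user7", "sioc:name", "Alice"),
    ("comment2", "rdf:type", "meannot:Comment"),
    ("comment2", "mebase:annotates", "workflow99"),
    ("comment2", "mebase:has-annotator", "user8"),
    ("user8", "sioc:name", "Bob"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL)"
)
conn.executemany("INSERT INTO triples (s, p, o) VALUES (?, ?, ?)", TRIPLES)

# One alias of the triples table per triple pattern in the SPARQL query:
# a = annotates, t = type, h = has-annotator, n = name.
SQL = """
SELECT DISTINCT n.o AS annotator_name
FROM triples AS a
JOIN triples AS t ON t.s = a.s
JOIN triples AS h ON h.s = a.s
JOIN triples AS n ON n.s = h.o
WHERE a.p = 'mebase:annotates' AND a.o = 'workflow52'
  AND t.p = 'rdf:type' AND t.o = 'meannot:Comment'
  AND h.p = 'mebase:has-annotator'
  AND n.p = 'sioc:name'
"""
names = [row[0] for row in conn.execute(SQL)]
print(names)
conn.close()
```

Note that every triple pattern turns into another self-join on the same table, which hints at why this approach tends to get awkward as queries grow.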

You might be interested in R2RML, a new W3C standard (currently an early working draft) that defines a standard way of mapping relational data to RDF. Some of its documentation may help you better understand the relationship between RDF/SPARQL and SQL.

Tutorials

For a more complete tutorial, I would suggest SPARQL by Example, which is by one of the authors of the SPARQL specification and is highly recommended.


You can use RDF2X to convert large RDF dumps to MySQL, PostgreSQL, or another relational database. A simpler alternative for smaller datasets is rdf2rdb.


Source: https://habr.com/ru/post/891770/

