Querying large RDF datasets without loading them into memory

I want to download two or more datasets to my computer and run a SPARQL endpoint for each of them. I tried Fuseki, which is part of the Jena project, but it loads the entire dataset into memory. That is not very desirable if I intend to query large datasets such as DBpedia, given that I also plan to do other things at the same time (running several SPARQL endpoints and a federated query system over them).

Just to give you the bigger picture: I intend to link several datasets using SILK and query them through the FedX federated query system. If you would recommend any changes to the systems I am using, or can offer feedback, that would be great. It would also be useful if you could suggest a dataset that would fit this project.


Jena Fuseki can use TDB as a storage engine, and TDB stores data on disk. The TDB documentation on caching on 32-bit and 64-bit Java systems discusses how file contents are memory-mapped. I do not think TDB/Fuseki loads the entire dataset into memory; that simply would not be possible for large datasets, and TDB can handle fairly large ones. I think you should look into using tdbloader to build a TDB store, and then point Fuseki at it.
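As a sketch of that workflow (assuming the Jena command-line tools are on your PATH and the data is in a file named dbpedia.nt — both illustrative assumptions, not details from this question):

    # Bulk-load the dump into an on-disk TDB store under ./tdb-dbpedia.
    # tdbloader builds the indexes on disk, so the dataset does not
    # have to fit into RAM.
    tdbloader --loc=./tdb-dbpedia dbpedia.nt

    # Sanity check: query the store directly with tdbquery.
    tdbquery --loc=./tdb-dbpedia 'SELECT * WHERE { ?s ?p ?o } LIMIT 5'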

There is an example of setting up a TDB store in this answer. There, the query is run using tdbquery, but according to the Fuseki documentation on starting the server, all you need to start Fuseki with the same TDB store is the --loc=DIR option (see the example below the option description):

  • --loc=DIR
    Use an existing TDB database. Creates an empty one if it does not exist.
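For example, continuing with the hypothetical ./tdb-dbpedia store built above:

    # Serve the existing TDB store as a SPARQL endpoint on the
    # default port 3030 (an empty store is created if the directory
    # does not exist).
    fuseki-server --loc=./tdb-dbpedia /dbpedia

    # From another shell: query the endpoint over the SPARQL protocol.
    curl http://localhost:3030/dbpedia/query \
         --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 5'

To serve several datasets at once, start one Fuseki instance per store, each on its own port (e.g. fuseki-server --port=3031 --loc=./tdb-other /other); those separate endpoints are then what a federation engine such as FedX would query across.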
