How many triples are in these RDF files? I have tested rdflib, and it will not scale much beyond a few tens of thousands of triples, if you are lucky. It certainly does not perform well for files with millions of triples.
The best parser is rapper, from the Redland libraries. My first tip is not to use RDF/XML and to switch to N-Triples, which is a lighter format than RDF/XML. You can convert from RDF/XML to N-Triples with rapper:
rapper -i rdfxml -o ntriples YOUR_FILE.rdf > YOUR_FILE.ntriples
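Since N-Triples puts one triple on each line, the converted file can be read as a stream without loading it into memory, which also answers the "how many triples" question cheaply. Here is a minimal sketch in plain Python, assuming a well-formed N-Triples file; count_triples is a hypothetical helper name, not part of Redland or rdflib:

```python
def count_triples(path):
    """Count statements in an N-Triples file, reading one line at a time."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            # Blank lines and comment lines (starting with '#') hold no triple.
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

For example, count_triples("YOUR_FILE.ntriples") on the file produced by the rapper command above. It does not validate the syntax; it only counts lines that look like statements.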
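If you like Python, you can use the Redland Python bindings:
import RDF

# Parse the N-Triples file into an in-memory Redland model.
parser = RDF.Parser(name="ntriples")
model = RDF.Model()
stream = parser.parse_into_model(model, "file://file_path", "http://your_base_uri.org")

# Iterate over every statement in the model and print its three parts.
for triple in model:
    print("%s %s %s" % (triple.subject, triple.predicate, triple.object))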
I have parsed rather large files (a couple of gigabytes) with the Redland libraries without any problems.
Finally, if you are working with large datasets, you may need to load your data into a scalable triple store; the one I usually use is 4store. 4store internally uses Redland to parse RDF files. In the long run, I think moving to a scalable triple store is what you will have to do. With it, you can use SPARQL to query your data and SPARQL/Update to insert and delete triples.
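To give an idea of what querying such a store looks like, here is a rough sketch that sends a SPARQL query over HTTP using only the Python standard library. The endpoint URL, the Accept header, and the helper names build_sparql_request and run_query are assumptions for illustration; check how your own store exposes its SPARQL endpoint.

```python
import urllib.parse
import urllib.request

# Assumption: a SPARQL endpoint on localhost; adjust host, port, and path.
ENDPOINT = "http://localhost:8000/sparql/"

# A small query that fetches a handful of triples.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build a POST request carrying the query as form-encoded data."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            # Assumption: the store can return results as SPARQL JSON.
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the raw response body as text."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling run_query(SAMPLE_QUERY) needs a running endpoint; build_sparql_request alone can be inspected offline to see what would be sent.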