I am trying to build a graph representation of UniProt data with Spark (GraphX), using the OWL / RDF format. I am parsing the data with Apache Jena, but I cannot wrap my head around the structure of the RDF file. To illustrate, here is an example of the type of file I'm trying to process:
http://pastebin.com/iSeGs0RZ
For my needs I have to store and manipulate, for example, the seeAlso token and the predicate for "http://purl.uniprot.org/string/9606.ENSP00000418960". When I load the model in Java / Scala, print(model) displays most of the information, but I cannot find a way to extract everything from the file.
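This is what I use to read in the model:

    // Package names assume Jena 3 or later; Jena 2 used com.hp.hpl.jena instead.
    import org.apache.jena.rdf.model.ModelFactory
    import org.apache.jena.util.FileManager

    object runner {
      val inputFileName = "dataset/test2.xml"

      def main(args: Array[String]): Unit = {
        val model = ModelFactory.createDefaultModel()
        val in = FileManager.get().open(inputFileName)
        if (in == null) {
          throw new IllegalArgumentException(
            "File: " + inputFileName + " not found")
        }
        // read(InputStream, String) treats the string as a base URI, not a
        // syntax name, so the syntax is passed as the third argument here.
        model.read(in, null, "RDF/XML")

        // Iterate over the objects (the third element of each triple) in the model.
        val items = model.listObjects()
        var count = 0
        while (items.hasNext) {
          count += 1
          val node = items.next()
          println(node)
          println("\n\n")
        }
        println(count)
      }
    }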
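For comparison, here is a minimal sketch of walking whole statements (subject, predicate, object) rather than objects alone, which may be closer to what a GraphX vertex / edge mapping needs. It assumes Jena 3 or later package names. The `TripleDump` object, the `triples` helper and the inline `sampleRdf` document are hypothetical names and data made up for illustration; the sample is not taken from the pastebin file.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import org.apache.jena.rdf.model.{Model, ModelFactory}
import scala.collection.mutable.ArrayBuffer

object TripleDump {
  // Hypothetical tiny RDF/XML sample (assumption, not the real UniProt file).
  val sampleRdf: String =
    """<?xml version="1.0"?>
      |<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      |         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      |  <rdf:Description rdf:about="http://example.org/protein/P1">
      |    <rdfs:seeAlso rdf:resource="http://purl.uniprot.org/string/9606.ENSP00000418960"/>
      |    <rdfs:label>example protein</rdfs:label>
      |  </rdf:Description>
      |</rdf:RDF>""".stripMargin

  // Collect every statement in the model as (subject, predicate, object) strings.
  def triples(model: Model): Seq[(String, String, String)] = {
    val out = ArrayBuffer.empty[(String, String, String)]
    val it = model.listStatements()
    while (it.hasNext) {
      val stmt = it.nextStatement()
      val obj = stmt.getObject
      // Literals are reduced to their lexical form; resources keep their URI
      // (or an internal id for blank nodes).
      val objText = if (obj.isLiteral) obj.asLiteral().getLexicalForm else obj.toString
      out += ((stmt.getSubject.toString, stmt.getPredicate.getURI, objText))
    }
    out.toSeq
  }

  def main(args: Array[String]): Unit = {
    val model = ModelFactory.createDefaultModel()
    val in = new ByteArrayInputStream(sampleRdf.getBytes(StandardCharsets.UTF_8))
    // Third argument is the syntax name; the base URI is left as null.
    model.read(in, null, "RDF/XML")
    triples(model).foreach { case (s, p, o) => println(s"$s  $p  $o") }
  }
}
```

Under this sketch, each subject and each resource-valued object could become a GraphX vertex and each predicate an edge label, while literal-valued objects might be kept as vertex attributes; that mapping is a design assumption, not something the file format dictates.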