Reading an ontology into GraphX from an RDF model

I am trying to build a graph representation of UniProt data with Spark (GraphX) from the OWL/RDF format. I am parsing the data with Apache Jena, but I cannot wrap my head around the structure of the RDF file. To illustrate, here is an example of the kind of file I am trying to process: http://pastebin.com/iSeGs0RZ

For my needs I have to store/manipulate, for example,

Do I need to save the seeAlso token and the predicate "http://purl.uniprot.org/string/9606.ENSP00000418960"? When I load the model in Java/Scala, print(model) displays most of the information, but I cannot find a way to extract everything from the file.
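To make concrete what would have to be kept for a line like that, here is a minimal plain-Scala sketch (no Spark or Jena required) that treats each RDF statement as a (subject, predicate, object) triple and turns it into vertex ids plus labeled edges. The names `Triple`, `LabeledEdge`, and `TripleGraphSketch`, and the subject URI `http://example.org/protein/P1`, are hypothetical and only serve the illustration; the only parts taken from real vocabularies are the `rdfs:seeAlso` IRI and the object URL quoted above.

```scala
// Hypothetical helper types: one RDF statement, and one graph edge labeled with the predicate.
final case class Triple(subject: String, predicate: String, obj: String)
final case class LabeledEdge(srcId: Long, dstId: Long, label: String)

object TripleGraphSketch {
  // Give every distinct subject/object a Long id, numbered in first-seen order.
  def vertexIds(triples: Seq[Triple]): Map[String, Long] =
    triples
      .flatMap(t => Seq(t.subject, t.obj))
      .distinct
      .zipWithIndex
      .map { case (uri, i) => uri -> i.toLong }
      .toMap

  // Turn each triple into an edge from subject to object, labeled with the predicate.
  def edges(triples: Seq[Triple], ids: Map[String, Long]): Seq[LabeledEdge] =
    triples.map(t => LabeledEdge(ids(t.subject), ids(t.obj), t.predicate))

  def main(args: Array[String]): Unit = {
    val triples = Seq(
      Triple(
        "http://example.org/protein/P1", // hypothetical subject URI
        "http://www.w3.org/2000/01/rdf-schema#seeAlso",
        "http://purl.uniprot.org/string/9606.ENSP00000418960"))
    val ids = vertexIds(triples)
    ids.foreach(println)
    edges(triples, ids).foreach(println)
  }
}
```

In GraphX the same two collections would presumably become an `RDD[(VertexId, String)]` for the vertices and an `RDD[Edge[String]]` for the edges, since `VertexId` is a `Long` and an `Edge` carries a source id, a destination id, and an attribute.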

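This is what I use to read in the model:

// Assumption: Jena 3+ package names; Jena 2.x uses com.hp.hpl.jena instead.
import org.apache.jena.rdf.model.ModelFactory
import org.apache.jena.util.FileManager

object runner {
  val inputFileName = "dataset/test2.xml"

  def main(args: Array[String]): Unit = {
    val model = ModelFactory.createDefaultModel()

    // use the FileManager to find the input file
    val in = FileManager.get().open(inputFileName)
    if (in == null) {
      throw new IllegalArgumentException(
        "File: " + inputFileName + " not found")
    }
    // read(InputStream, base, lang): the second argument is the base URI, so pass null there
    model.read(in, null, "RDF/XML")
    // listObjects() iterates only over the object of each statement
    val items = model.listObjects()
    var count = 0
    while (items.hasNext) {
      count += 1
      val node = items.next()
      println(node)
      println("\n\n")
    }
    println(count)
  }
}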
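The loop above walks `listObjects()`, which returns only the object of each statement, so subjects and predicates such as seeAlso never show up in the output. A possible alternative, sketched here under the assumption that Jena 3+ (`org.apache.jena` packages) is on the classpath, is to iterate `listStatements()` and read subject, predicate, and object from each statement. The object name `StatementDump`, the helper `triples`, and the inline RDF/XML snippet with its `http://example.org/protein/P1` subject are hypothetical stand-ins for the real file.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

import org.apache.jena.rdf.model.{Model, ModelFactory}
import org.apache.jena.vocabulary.RDFS

import scala.collection.mutable.ListBuffer

object StatementDump {
  // Collect every statement in the model as (subject, predicate, object) strings.
  def triples(model: Model): List[(String, String, String)] = {
    val out = ListBuffer.empty[(String, String, String)]
    val it = model.listStatements()
    while (it.hasNext) {
      val stmt = it.nextStatement()
      out += ((stmt.getSubject.toString, stmt.getPredicate.toString, stmt.getObject.toString))
    }
    out.toList
  }

  def main(args: Array[String]): Unit = {
    // Tiny hypothetical document standing in for the UniProt file.
    val xml =
      """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        |         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
        |  <rdf:Description rdf:about="http://example.org/protein/P1">
        |    <rdfs:seeAlso rdf:resource="http://purl.uniprot.org/string/9606.ENSP00000418960"/>
        |  </rdf:Description>
        |</rdf:RDF>""".stripMargin

    val model = ModelFactory.createDefaultModel()
    // Arguments are (stream, base URI, language); no base URI is needed here.
    model.read(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), null, "RDF/XML")

    // Keep only the rdfs:seeAlso statements and print them.
    triples(model)
      .filter { case (_, predicate, _) => predicate == RDFS.seeAlso.getURI }
      .foreach(println)
  }
}
```

Returning plain string tuples keeps the Jena types out of the rest of the pipeline, which may make it easier to hand the result to Spark later; whether that fits the real data (blank nodes, typed literals) would need checking against the actual file.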

Source: https://habr.com/ru/post/1621277/

