Ontologies, OWL, SPARQL: Modeling that “something is missing” and performance considerations

We want to model that “something does not exist”, as opposed to information simply being missing. For example, the explicit statement that “the patient has not received chemotherapy” or that “the patient has no shortness of breath” is different from missing information about whether the patient has shortness of breath.

We have considered several approaches:

  • Using a negated class: "No_Dyspnea". But this seems semantically problematic: what would the type of this class be? It certainly cannot be a subclass of Dyspnea.
  • Using "negative" object properties, for example "denies" or "does_not_have", with the root class Dyspnea as the object of the triple.
  • Using blank nodes to state that a person belongs to the class of things that do not have shortness of breath. For example:

    dat:PatientW2 a [ rdf:type owl:Class ;
                      owl:complementOf [ rdf:type owl:Restriction ;
                                         owl:onProperty roo:has_finding ;
                                         owl:someValuesFrom nci:Dyspnea ] ] .

We feel that the third option is the most “ontologically correct” way of expressing this. However, when we experimented with it, we ran into serious performance problems even in simple scenarios.

We use Sesame with the OWLIM-Lite store. We imported the NCI Thesaurus (280 MB, about 80,000 concepts) and another very small ontology into the repository, and added two individuals typed with this complement/restriction class.

The following query ran seemingly forever; I killed it after 15 minutes:

  SELECT * WHERE {
    ?s a [ rdf:type owl:Class ;
           owl:complementOf [ rdf:type owl:Restriction ;
                              owl:onProperty roo:has_finding ;
                              owl:someValuesFrom nci:Dyspnea ] ] .
  } LIMIT 100

Does anyone know why? My guess is that this approach creates many blank nodes, so the query engine has to go through the entire NCI Thesaurus and compare every blank node against this pattern?

If I put these triples in a separate named graph and query only that graph, the query returns a result instantly.

To sum up, two main questions:

  • Is the third approach really the best way to model that something is absent?
  • Does this approach affect query performance?

EDIT 1

We discussed the proposed options. This really helped us clarify what we are actually trying to achieve:

  • We want to be able to state that "the patient has dyspnea" or "the patient does not have dyspnea" at a specific point in time.

  • In the future there may (and will) be more information about this patient, for example that he is now short of breath.

  • We want to be able to write SPARQL queries that ask for "all patients with dyspnea" and "all patients who do not have dyspnea".

  • We want the SPARQL to be as simple and intuitive as possible. For example, using a single has_finding property instead of having to know about two properties (one being has_exclusion), or having to know about some complex blank-node construct.

We played with the options:

  • Negative property assertions. This sounded like the best solution, since it states that one individual is not connected to another individual by the property. The problem is that we have to create a Dyspnea individual to serve as the owl:targetIndividual. We also could not find an easy way to query for a negative assertion without walking the whole owl:sourceIndividual / owl:targetIndividual chain. This makes the SPARQL fairly verbose and burdens the query writer with knowing about this construct.
  • Blank node with complement. This says something we do not want to say: "Patient1 can never have a dyspnea finding", whereas we want to state that "Patient1 does not have dyspnea now (or on date X)". Therefore we should not use this approach.

  • Exclusion/inclusion types (options 1 and 2). After studying Jeen's proposal in more detail, we believe that using generic :Exclusion and :Inclusion classes together with a single has_finding property, and giving the dyspnea individual an inclusion/exclusion type, is the easiest to understand and query, and provides sufficient reasoning capability. Example:

    :Patient1 a :Patient .
    :Dyspnea1 a :Dyspnea .
    :Dyspnea1 a :Exclusion .
    :Patient1 ex:has_finding :Dyspnea1 .
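A minimal sketch of why negative property assertions make queries longer, written in plain Python (standard library only, with triples as tuples rather than a real triple store). It assumes the OWL 2 RDF mapping, in which a negative object property assertion becomes a blank node typed owl:NegativePropertyAssertion with owl:sourceIndividual, owl:assertionProperty and owl:targetIndividual; the individuals :patient1, :patient2 and :dyspnea1 are hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples; "_:" marks a blank node.
TRIPLES = {
    # Positive finding: patient1 has the (hypothetical) dyspnea individual.
    (":patient1", ":has_finding", ":dyspnea1"),
    # NegativeObjectPropertyAssertion(:has_finding :patient2 :dyspnea1),
    # written out in its OWL 2 RDF form.
    ("_:npa1", "rdf:type", "owl:NegativePropertyAssertion"),
    ("_:npa1", "owl:sourceIndividual", ":patient2"),
    ("_:npa1", "owl:assertionProperty", ":has_finding"),
    ("_:npa1", "owl:targetIndividual", ":dyspnea1"),
}


def subjects(p, o):
    """All s such that (s, p, o) is in TRIPLES."""
    return {s for (s, p2, o2) in TRIPLES if p2 == p and o2 == o}


def objects(s, p):
    """All o such that (s, p, o) is in TRIPLES."""
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}


def patients_with(finding):
    # One pattern: ?x :has_finding <finding>
    return subjects(":has_finding", finding)


def patients_without(finding):
    # Four joined patterns: the query writer must know the reified shape.
    result = set()
    for npa in subjects("rdf:type", "owl:NegativePropertyAssertion"):
        if (":has_finding" in objects(npa, "owl:assertionProperty")
                and finding in objects(npa, "owl:targetIndividual")):
            result |= objects(npa, "owl:sourceIndividual")
    return result


print(sorted(patients_with(":dyspnea1")))     # [':patient1']
print(sorted(patients_without(":dyspnea1")))  # [':patient2']
```

Asking for patients with dyspnea is a single triple pattern, while asking for patients without it needs four joined patterns over the reified assertion, which is the verbosity described above.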

Thus, the person writing the SPARQL query only needs to know:

  • There is one has_finding property that correctly reflects the intent, since "no dyspnea" is technically also a finding.
  • Querying with has_finding alone does not tell you whether the patient actually has the symptom. The query must also include a triple checking whether the dyspnea individual is a :Exclusion (or :Inclusion, depending on the purpose of the query).
  • Although this puts an additional burden on the query writer, it is less verbose than negative property assertions and easier to understand.

We would be very grateful for feedback on these conclusions!
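A minimal Python sketch of the two queries under the :Exclusion / :Inclusion approach, again using plain tuples instead of a triple store. The :Patient1 triples come from the example above; :Patient2 (an inclusion) and :Patient3 (no dyspnea finding at all) are hypothetical additions.

```python
# Plain (subject, predicate, object) tuples instead of a triple store.
TRIPLES = {
    # From the example: Patient1's dyspnea finding is typed as an exclusion.
    (":Patient1", "rdf:type", ":Patient"),
    (":Dyspnea1", "rdf:type", ":Dyspnea"),
    (":Dyspnea1", "rdf:type", ":Exclusion"),
    (":Patient1", "ex:has_finding", ":Dyspnea1"),
    # Hypothetical: Patient2's dyspnea finding is an inclusion.
    (":Patient2", "rdf:type", ":Patient"),
    (":Dyspnea2", "rdf:type", ":Dyspnea"),
    (":Dyspnea2", "rdf:type", ":Inclusion"),
    (":Patient2", "ex:has_finding", ":Dyspnea2"),
    # Hypothetical: nothing is known about Patient3's dyspnea.
    (":Patient3", "rdf:type", ":Patient"),
}


def has_type(node, cls):
    return (node, "rdf:type", cls) in TRIPLES


def patients(symptom, status):
    """Hand-coded equivalent of:
    SELECT ?p WHERE { ?p a :Patient ; ex:has_finding ?f . ?f a <symptom> , <status> . }
    """
    return {
        p for (p, pred, f) in TRIPLES
        if pred == "ex:has_finding"
        and has_type(p, ":Patient")
        and has_type(f, symptom)
        and has_type(f, status)
    }


print(sorted(patients(":Dyspnea", ":Inclusion")))  # [':Patient2']
print(sorted(patients(":Dyspnea", ":Exclusion")))  # [':Patient1']
```

:Patient3 appears in neither result, so "no information" stays distinct from "explicitly excluded", which is the distinction the question started from.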

2 answers

Regarding the modeling question, I would like to offer a fourth alternative, which is essentially a combination of your options 1 and 2: introduce a separate class (hierarchy) for these “excluded/absent” symptoms, diseases or treatments, and represent the specific exclusions as instances:

  :Exclusion a owl:Class .
  :ExcludedSymptom rdfs:subClassOf :Exclusion .
  :ExcludedTreatment rdfs:subClassOf :Exclusion .
  :excludedDyspnea a :ExcludedSymptom .
  :excludedChemo a :ExcludedTreatment .
  :Patient a owl:Class ;
      owl:equivalentClass [ a owl:Restriction ;
                            owl:onProperty :excluded ;
                            owl:allValuesFrom :Exclusion ] .
  # john is a patient without dyspnea
  :john a :Patient ;
      :excluded :excludedDyspnea .

Optionally, you can link the exclusion instances semantically to the treatment/symptom/disease:

  :excludedDyspnea :ofSymptom :Dyspnea . 

In my opinion, this is just as “ontologically correct” as your other options (such judgments are fairly subjective, to be honest), and it may be much easier to maintain, query and actually reason over.
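To make the "easier to query and reason over" point concrete, here is a hedged Python sketch (plain tuples, no triple store) over the triples above. It applies only the RDFS subclass rule by hand, which is enough to infer that :excludedDyspnea is an :Exclusion; a real OWL reasoner would do considerably more.

```python
# The Turtle example above as plain tuples (no triple store, no OWL reasoner).
TRIPLES = {
    (":ExcludedSymptom", "rdfs:subClassOf", ":Exclusion"),
    (":ExcludedTreatment", "rdfs:subClassOf", ":Exclusion"),
    (":excludedDyspnea", "rdf:type", ":ExcludedSymptom"),
    (":excludedChemo", "rdf:type", ":ExcludedTreatment"),
    (":excludedDyspnea", ":ofSymptom", ":Dyspnea"),
    (":john", "rdf:type", ":Patient"),
    (":john", ":excluded", ":excludedDyspnea"),
}


def superclasses(cls):
    """cls plus everything reachable from it via rdfs:subClassOf."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for (s, p, o) in TRIPLES:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def types(node):
    """Asserted rdf:type values plus those inferred through rdfs:subClassOf."""
    asserted = {o for (s, p, o) in TRIPLES if s == node and p == "rdf:type"}
    return set().union(*(superclasses(c) for c in asserted))


def patients_excluding(symptom):
    """Individuals with an :excluded value that is an :Exclusion of <symptom>."""
    return {
        s for (s, p, e) in TRIPLES
        if p == ":excluded"
        and ":Exclusion" in types(e)
        and (e, ":ofSymptom", symptom) in TRIPLES
    }


print(sorted(types(":excludedDyspnea")))       # [':ExcludedSymptom', ':Exclusion']
print(sorted(patients_excluding(":Dyspnea")))  # [':john']
```

Because the exclusion is an ordinary individual, the query is a plain join; no complementOf has to be evaluated.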

As for your second question: although I can’t speak to the behavior of the particular reasoner you are using, in general any construct involving complementOf is computationally very expensive. Perhaps more importantly, it probably does not capture what you intend.

OWL makes the open world assumption, which (broadly speaking) means that we cannot conclude that a fact is false simply because it is currently unknown. For the complementOf construct, this means a reasoner will never place an individual in the complement class merely because no dyspnea finding is recorded for it: for any individual X, even if we do not currently know that X has been diagnosed with dyspnea, that fact may become known in the future, and then X would not be in the complement class.
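A deliberately simplified Python sketch of the difference between closed-world and open-world answers. It treats findings as a lookup table rather than performing OWL entailment; the patients john, mary and bob are hypothetical, and it assumes no patient has both a positive and a negative assertion.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# Hypothetical data: explicitly asserted findings and explicitly asserted
# absences (however the absence is modelled: exclusion type, :conclusion false, ...).
POSITIVE = {("john", "Dyspnea")}
NEGATIVE = {("mary", "Dyspnea")}


def closed_world(patient, condition):
    """Negation as failure: anything not asserted is taken to be false."""
    return Answer.YES if (patient, condition) in POSITIVE else Answer.NO


def open_world(patient, condition):
    """Only an explicit negative statement justifies answering NO."""
    if (patient, condition) in POSITIVE:
        return Answer.YES
    if (patient, condition) in NEGATIVE:
        return Answer.NO
    return Answer.UNKNOWN


for who in ("john", "mary", "bob"):
    print(who, closed_world(who, "Dyspnea").value, open_world(who, "Dyspnea").value)
# john yes yes
# mary no no
# bob no unknown
```

Under the open world assumption bob gets "unknown" rather than "no": the absence of a dyspnea finding is not evidence that he lacks dyspnea, which is why an explicit negative statement is needed at all.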

EDIT

In response to your edit: the suggestion of using a single :hasFinding property looks good to me, although I might change it a bit:

  :patient1 a :Patient ;
      :hasFinding :dyspneaFinding1 .
  :dyspneaFinding1 a :Finding ;
      :of :Dyspnea ;
      :conclusion false .

Now you have separated the “finding” as a concept a little more cleanly from the symptom/treatment that was found. In addition, you model explicitly whether the finding is positive or negative, rather than implying it through the presence or absence of an “excluded” property or an “Exclusion” type.

(Aside: since we link an individual to a class here through a non-typing relation ( ... :of :Dyspnea ), we must rely on OWL 2 punning to make this valid OWL DL.)

To query for patients with a finding (positive or negative) about dyspnea:

  SELECT ?x WHERE { ?x a :Patient ; :hasFinding [ :of :Dyspnea ] . }

And to query for patients with a confirmed absence of dyspnea:

  SELECT ?x WHERE { ?x a :Patient ; :hasFinding [ :of :Dyspnea ; :conclusion false ] . }
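A minimal Python sketch that evaluates the two SELECT queries above against a hand-built set of triples. :patient1 comes from the example above; :patient2 and its blank-node finding _:f2 are hypothetical, and the matching is a hand-written stand-in for a SPARQL engine, not a real one.

```python
# Plain tuples; Python booleans stand in for xsd:boolean literals.
TRIPLES = {
    (":patient1", "rdf:type", ":Patient"),
    (":patient1", ":hasFinding", ":dyspneaFinding1"),
    (":dyspneaFinding1", "rdf:type", ":Finding"),
    (":dyspneaFinding1", ":of", ":Dyspnea"),
    (":dyspneaFinding1", ":conclusion", False),
    # Hypothetical: patient2 has a positive dyspnea finding (blank node _:f2).
    (":patient2", "rdf:type", ":Patient"),
    (":patient2", ":hasFinding", "_:f2"),
    ("_:f2", "rdf:type", ":Finding"),
    ("_:f2", ":of", ":Dyspnea"),
    ("_:f2", ":conclusion", True),
}


def patients_with_finding(condition, conclusion=None):
    """?x a :Patient ; :hasFinding [ :of <condition> ; :conclusion <conclusion> ]
    With conclusion=None the :conclusion pattern is left out entirely."""
    result = set()
    for (x, p, f) in TRIPLES:
        if p != ":hasFinding" or (x, "rdf:type", ":Patient") not in TRIPLES:
            continue
        if (f, ":of", condition) not in TRIPLES:
            continue
        if conclusion is None or (f, ":conclusion", conclusion) in TRIPLES:
            result.add(x)
    return result


print(sorted(patients_with_finding(":Dyspnea")))         # [':patient1', ':patient2']
print(sorted(patients_with_finding(":Dyspnea", False)))  # [':patient1']
```

Leaving out the :conclusion pattern returns both patients, as the first query does; binding it to false returns only the confirmed negative.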

If your diseases are represented as individuals, you can use negative object property assertions to say literally, for example,

¬hasFinding(john, dyspnea)

NegativeObjectPropertyAssertion(hasFinding john dyspnea)

Of course, if you have a lot of things that do not hold, this can get somewhat verbose. It is probably the most semantically correct option, though. It also means that your query can match data in the ontology directly, which may give faster results. (Of course, you still have the harder problem of inferring when a negative object property assertion holds.)

This does not work if diseases are represented as classes. If they are, you can use class expressions similar to what you suggest. For instance,

john : ∀hasFinding.¬Dyspnea

This is similar to your third option, but I wonder whether it might perform better. It also seems a slightly more direct way of saying what you are trying to say (namely, that if someone has a disease, it is not one of these diseases).
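A hedged Python sketch of what the class expression ∀hasFinding.¬Dyspnea implies for john, using sets of pairs instead of a reasoner. It hand-codes just two consequences: every finding of john is inferred not to be a Dyspnea, and asserting a Dyspnea finding for john makes the data inconsistent. The names finding1, finding2 and Cough are hypothetical.

```python
# Hypothetical data. ALL_NOT maps an individual to a class C for which it is
# asserted to be an instance of the restriction: hasFinding only (not C).
ALL_NOT = {"john": "Dyspnea"}
HAS_FINDING = {("john", "finding1")}
TYPES = {("finding1", "Cough")}


def inferred_negative_types(all_not, has_finding):
    """Every hasFinding value of a restricted individual is inferred to be not-C."""
    return {(f, all_not[ind]) for (ind, f) in has_finding if ind in all_not}


def violations(all_not, has_finding, types):
    """(individual, finding) pairs whose finding is asserted to be of the
    forbidden class; a reasoner would report such data as inconsistent."""
    return {
        (ind, f) for (ind, f) in has_finding
        if ind in all_not and (f, all_not[ind]) in types
    }


print(inferred_negative_types(ALL_NOT, HAS_FINDING))  # {('finding1', 'Dyspnea')}
print(violations(ALL_NOT, HAS_FINDING, TYPES))        # set()

# Later, a dyspnea finding for john is recorded:
later_findings = HAS_FINDING | {("john", "finding2")}
later_types = TYPES | {("finding2", "Dyspnea")}
print(violations(ALL_NOT, later_findings, later_types))  # {('john', 'finding2')}
```

The second consequence is why such a statement reads as "john can never have dyspnea" rather than "john does not have dyspnea now": a later positive finding contradicts it instead of updating it.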

I agree with Jeen's answer; there is a lot of subjectivity here, and much of being “right” is really just a matter of finding something that is reasonable, works well enough for you, and doesn't seem completely unnatural.


Source: https://habr.com/ru/post/989247/

