SPARQL: how to find similar rows?

Question


I use Jena to query data stored in an ontology. Some objects are identified by a string, but the exact string is not always available: I process scanned documents, so the text may contain OCR errors. I would therefore like to find the most similar rows. Is there a way to use SPARQL for this purpose? Can I somehow calculate the Levenshtein distance in SPARQL?

If this is not possible, I can still calculate the Levenshtein distance in Java. However, to be efficient, the algorithm would still need SPARQL to filter out irrelevant rows first.

+4

java levenshtein distance similarity jena sparql

Pedro Mar 29 '12 at 1:46


3 answers

In case someone is interested, here is how I implemented it:

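    // Package names below are for Apache Jena 3+ and Commons Lang 3;
    // older releases use com.hp.hpl.jena.* and org.apache.commons.lang.
    import org.apache.commons.lang3.StringUtils;
    import org.apache.jena.sparql.expr.NodeValue;
    import org.apache.jena.sparql.function.FunctionBase2;

    // Two-argument SPARQL extension function returning the edit distance.
    public class LevenshteinFilter extends FunctionBase2 {
        @Override
        public NodeValue exec(NodeValue value1, NodeValue value2) {
            int i = StringUtils.getLevenshteinDistance(value1.asString(), value2.asString());
            return NodeValue.makeInteger(i);
        }
    }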

using:

    // Register the function under a URI so that queries can call it.
    String functionUri = "http://www.example.org/LevenshteinFunction";
    FunctionRegistry.get().put(functionUri, LevenshteinFilter.class);

    String s = "...";
    // "Something" and "hasString" are placeholders for the real class and property.
    String sparql = "SELECT ?x WHERE { ?x a Something . "
            + "?x hasString ?str . "
            + "FILTER(<" + functionUri + ">(?str, \"" + s + "\") < 5) }";

    QueryExecution qexec = QueryExecutionFactory.create(sparql, model);
    ResultSet rs = qexec.execSelect();
    while (rs.hasNext()) { ... }
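One caveat with the query above: `s` is spliced directly into the query text, so a search string containing a double quote or a backslash would produce an invalid query. A minimal sketch of an escaping helper follows, in plain Java with no Jena dependency. The class and method names (`SparqlLiteralSketch`, `escape`, `quote`) are hypothetical and not part of any library; Jena's `ParameterizedSparqlString` may be a better fit if you want the library to handle this.

```java
// Hypothetical helper, not part of Jena: escapes a Java string so that it
// can be embedded as a double-quoted SPARQL string literal.
final class SparqlLiteralSketch {
    private SparqlLiteralSketch() {}

    // Escape backslash, double quote and common control characters.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap the escaped text in double quotes, ready for concatenation.
    static String quote(String raw) {
        return "\"" + escape(raw) + "\"";
    }
}
```

With this, the filter line could be built as `"FILTER(<" + functionUri + ">(?str, " + SparqlLiteralSketch.quote(s) + ") < 5) }"`.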

+4

Pedro Apr 14 '12 at 1:21


For Sesame, there is fr/sparna/rdf/sesame/toolkit/functions/LevenshteinDistanceFunction, but I cannot find the source.

0

Vladimir Alexiev Mar 17 '17 at 13:00


Gregory Williams · Accepted Answer · 2012-03-29T01:52:18+0000

SPARQL cannot do this directly, but you can implement the Levenshtein distance function in Java and use it in a SPARQL FILTER clause. The ARQ documentation on extensions explains how to write and use extension functions.
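If you would rather not pull in a library for the distance itself, here is a minimal sketch of the standard two-row dynamic-programming Levenshtein distance in plain Java. The class name `LevenshteinSketch` is hypothetical; the method could be called from inside an ARQ extension function in place of a library call.

```java
// Hypothetical standalone helper: classic Levenshtein edit distance
// (insertions, deletions, substitutions, each with cost 1).
final class LevenshteinSketch {
    private LevenshteinSketch() {}

    static int distance(String a, String b) {
        // prev holds distances for the previous row of the DP table,
        // curr is filled in for the current prefix of a.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Swap rows so that prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `LevenshteinSketch.distance("invoice", "inv0ice")` is 1, which is the kind of single-character OCR error the question describes.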
