MarkLogic 7: semantic search

Question


I am trying to learn the RDF triple store feature and the semantic search capabilities of MarkLogic 7, and then query the data using SPARQL. I was able to perform some basic operations, for example:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";
sem:rdf-insert(
  sem:triple(
    sem:iri("http://example.org/ns/people#m"),
    sem:iri("http://example.com/ns/person#firstName"),
    "Sam"),
  (), (), "my collection")

which creates a triple, and then query it using the following SPARQL:

PREFIX ab: <http://example.org/ns/people#>
PREFIX ac: <http://example.com/ns/person#>
SELECT ?Name
WHERE { ab:m ac:firstName ?Name . }

which returns the result Sam.

Edit: In my use case, I have a delimited file (structured data) with 1 billion records that I ingested into MarkLogic using MLCP. Each record is stored as a document, for example:

<root>
  <ID>1000-000-000--000</ID>
  <ACCOUNT_NUM>9999</ACCOUNT_NUM>
  <NAME>Vronik</NAME>
  <ADD1>D7-701</ADD1>
  <ADD2>B-Valentine</ADD2>
  <ADD3>Street 4</ADD3>
  <ADD4>Fifth Avenue</ADD4>
  <CITY>New York</CITY>
  <STATE>NY</STATE>
  <HOMPHONE>0002600000</HOMPHONE>
  <BASEPHONE>12345</BASEPHONE>
  <CELLPHONE>54321</CELLPHONE>
  <EMAIL_ADDR>abc@gmail.com</EMAIL_ADDR>
  <CURRENT_BALANCE>10000</CURRENT_BALANCE>
  <OWNERSHIP>JOINT</OWNERSHIP>
</root>

Now I want to use the RDF / semantic features on the dataset above. However, I cannot figure out whether I should convert the above document to RDF, as shown below (shown for <NAME> only, and assuming this form is correct):

<sem:triple>
  <sem:subject>unique/uri/Person</sem:subject>
  <sem:predicate>unique/uri/Name</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string" xml:lang="en">Vronik</sem:object>
</sem:triple>

and then ingest these documents into MarkLogic and search them using SPARQL, or whether I should just ingest my documents as they are, separately ingest triples obtained from external sources, somehow (how?) associate them with my documents, and then query using SPARQL. Or is there another way I should do this?

+6

semantic-web rdf triplestore marklogic

Shrey shivam Nov 19 '13 at 14:03


2 answers

It is up to you. If you want to use XML for some facts and triples for others, you can convert selected facts from XML to triples and combine them in the same documents. For the XML that you presented, that is how I would get started. When you insert or update each document in the original XML format, pass it through XQuery that adds the new triples. I would store these new triples in the same document as the original XML.

You can do this using CPF: http://docs.marklogic.com/guide/cpf - or using a tool like http://marklogic.imtqy.com/recordloader/ and its XccModuleContentFactory class.
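As a minimal sketch of that transform step, the XQuery below copies an account document and appends an embedded triple for its NAME element before inserting it. The function name local:add-name-triple, the document URI, the predicate IRI, and the choice of the document URI as the triple's subject are all illustrative assumptions, and the sketch assumes the triple index is enabled on the database so that embedded sem:triple elements are indexed.

```xquery
xquery version "1.0-ml";

declare namespace sem = "http://marklogic.com/semantics";

(: Hypothetical helper: returns a copy of an account document with one
   embedded triple (document URI, name predicate, NAME value) appended.
   The predicate IRI is an assumption made for illustration only. :)
declare function local:add-name-triple(
  $uri  as xs:string,
  $root as element(root)
) as element(root)
{
  element root {
    $root/@*,
    $root/node(),
    <sem:triples>
      <sem:triple>
        <sem:subject>{ $uri }</sem:subject>
        <sem:predicate>http://example.com/ns/person#name</sem:predicate>
        <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">{ fn:normalize-space($root/NAME) }</sem:object>
      </sem:triple>
    </sem:triples>
  }
};

(: Hypothetical URI and a cut-down version of the sample record. :)
let $uri := "/accounts/1000-000-000--000.xml"
let $doc :=
  <root>
    <ID>1000-000-000--000</ID>
    <NAME>Vronik</NAME>
  </root>
return xdmp:document-insert($uri, local:add-name-triple($uri, $doc))
```

The same function could be called from a CPF action or a load-time transform module; which of those fits best depends on how the documents arrive.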

But if you want to get away from the original XML format completely, you can do that too. In that case you translate your XML into triples and ingest those triples instead of the original XML. Or you can have pure XML documents and pure triple documents side by side in one database.

+4

mblakele Nov 19 '13 at 15:07


SBuxton · Accepted Answer · 2013-11-19T17:53:06+0000

As Michael says, there are many ways you could go with this. This is because MarkLogic 7 is so flexible - you can express information as triples or as XML (or as JSON or ...), and you can mix and match data models and query languages.

The first thing to find out is: what are you trying to achieve? If you just want to get your feet wet with a combination of MarkLogic XML and triples, here's what I suggest:

Ingest your XML documents as above. If you have something text-heavy, such as an account description or free-text annotations, all the better.
Using XQuery or XSLT, add a triple to each document that represents a city. For example, for the sample document that you posted, add:
    <this document URI> <unique/uri/Location> "New York"
Import triples from the web that map city names to states and postal codes (for example, from a geodata source).
Now, with a mixture of SPARQL and XQuery, you can search for, e.g., the current balance of each account in a zip code (even though your documents do not contain zip codes).
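The last step above might look roughly like the following sketch, which uses SPARQL to find the documents located in a given zip code and then XQuery to read each balance. Every IRI in the query (the ex: prefix, ex:location, ex:cityName, ex:postalCode) and the zip code value are hypothetical; the real predicates depend on the triples you add and on the vocabulary of the imported data. It also assumes the city literal in the documents matches the city literal in the imported triples exactly.

```xquery
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: Assumptions (all IRIs are hypothetical):
   - each account document carries the embedded triple
       <document URI> ex:location "New York"
   - the imported triples describe places with
       ?place ex:cityName ?city ; ex:postalCode ?zip :)
let $rows := sem:sparql('
  PREFIX ex: <http://example.org/ns/geo#>
  SELECT ?doc
  WHERE {
    ?place ex:postalCode "10001" ;
           ex:cityName   ?city .
    ?doc   ex:location   ?city .
  }')
for $row in $rows
(: ?doc is the document URI, because it was used as the triple subject :)
let $uri := fn:string(map:get($row, "doc"))
return
  <account uri="{ $uri }">{
    fn:string(fn:doc($uri)/root/CURRENT_BALANCE)
  }</account>
```

Using the document URI as the subject is what lets the SPARQL result lead straight back to the XML, so no separate lookup table between triples and documents is needed.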

The documentation gives a good description of loading triples from external sources using mlcp.

See http://docs.marklogic.com/guide/semantics/setup

and, for more detail about loading triples, see http://docs.marklogic.com/guide/semantics/loading
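For a small file, and as an alternative to mlcp rather than what the guide above describes, triples can also be loaded from XQuery with sem:rdf-load. The sketch below assumes a Turtle file at a hypothetical path that is readable by the MarkLogic server.

```xquery
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: Hypothetical path; "turtle" names the serialization of the input file. :)
sem:rdf-load("/tmp/cities.ttl", "turtle")
```

For anything near the size of the account data, mlcp is likely the better fit, since it is built for bulk loading.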

Note that you can now run XQuery or SPARQL (or SQL) queries directly from the Query Console at http://your-host:8000/qconsole/
