Full-text search on multiple attributes in Datomic

I have a model that looks something like this (in JSON):

{"gender": "female", "name": [{"family": "Smith", "given": ["Samantha"], "middle": ["Lee"]}]}

There are about 6 million records with this structure. I need to provide full-text search across all components of a person's name, combined with OR. For instance, if the user enters "blacksmith", I need to check the given, middle and family names.

In Datomic, I defined the following schema:

  {:db/ident       :model/name
   :db/valueType   :db.type/ref
   :db/isComponent true
   :db/cardinality :db.cardinality/many}
  {:db/ident       :model.name/family
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/one
   :db/fulltext    true}
  {:db/ident       :model.name/given
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/many
   :db/fulltext    true}
  {:db/ident       :model.name/middle
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/many
   :db/fulltext    true}

Note that I have enabled full-text indexing on these attributes. When I query a single attribute, say family, performance is great (about 100 ms):

 (def query-all
   '[:find [(rand 100 ?model) ...]
     :in $ ?search
     :where
     [(fulltext $ :model.name/family ?search) [[?name _ _ _]]]
     [?model :model/name ?name]])

But when I add the other attributes using an or clause, performance drops sharply (to about 20 seconds):

 (def query-all
   '[:find [(rand 100 ?model) ...]
     :in $ ?search
     :where
     (or [(fulltext $ :model.name/family ?search) [[?name _ _ _]]]
         [(fulltext $ :model.name/given ?search) [[?name _ _ _]]]
         [(fulltext $ :model.name/middle ?search) [[?name _ _ _]]])
     [?model :model/name ?name]])

My question is: how could I improve this?

Going further, it would be great to search not only the name components but also the address components. Ideally the query would look like this (it also runs rather slowly):

 (def query-all
   '[:find [(rand 100 ?model) ...]
     :in $ ?search
     :where
     (or (and [(fulltext $ :model.name/given ?search) [[?e _ _ _]]]
              [?model :model/name ?e])
         (and [(fulltext $ :model.name/middle ?search) [[?e _ _ _]]]
              [?model :model/name ?e])
         (and [(fulltext $ :model.name/prefix ?search) [[?e _ _ _]]]
              [?model :model/name ?e])
         (and [(fulltext $ :model.name/suffix ?search) [[?e _ _ _]]]
              [?model :model/name ?e])
         (and [(fulltext $ :model.name/family ?search) [[?e _ _ _]]]
              [?model :model/name ?e])
         (and [(fulltext $ :model.address/city ?search) [[?e _ _ _]]]
              [?model :model/address ?e])
         (and [(fulltext $ :model.address/state ?search) [[?e _ _ _]]]
              [?model :model/address ?e]))])

How can I implement this?

+5
2 answers

We were in the same situation and ended up solving it like this:

We created an additional attribute that holds the combined text of all the other string attributes, and enabled full-text indexing on that attribute.
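A minimal sketch of that idea, under some assumptions: the attribute name :model/search-text, the helper search-text and the query name query-combined are made up for illustration, and the person map is assumed to be the question's JSON with keywordized keys. The combined string is stored on the model entity itself, so the query needs only one fulltext clause and no join through :model/name.

```clojure
(require '[clojure.string :as str])

;; Hypothetical combined attribute (the name is an assumption, not from the question).
(def search-text-schema
  {:db/ident       :model/search-text
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/one
   :db/fulltext    true})

(defn search-text
  "Joins every name part of a person map into one space-separated string.
  Missing or blank parts are skipped."
  [person]
  (->> (:name person)
       (mapcat (fn [{:keys [family given middle]}]
                 (concat [family] given middle)))
       (remove str/blank?)
       (str/join " ")))

;; Single fulltext clause against the combined attribute.
(def query-combined
  '[:find [(rand 100 ?model) ...]
    :in $ ?search
    :where
    [(fulltext $ :model/search-text ?search) [[?model _ _ _]]]])
```

The value of (search-text person) would be transacted as :model/search-text together with the rest of the entity. The trade-off is that it has to be recomputed whenever any of the underlying name (or address) attributes changes.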

+1

I think you should not use or, but instead run several separate queries against the db, one per attribute, and combine their results. Datomic's rules functionality is useful, but the queries it drives tend to explode in terms of realized intermediate results.

Remember that a db value is immutable and will give consistent results across multiple queries run against it. This may not always hold for full-text searches, because the Lucene indexing is done after transactions, but for most applications this probably doesn't matter.
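A rough sketch of this approach, not a definitive implementation. The helper names fulltext-query, search-names and sample are made up for illustration. The query function is passed in as run-q (expected to be datomic.api/q) so the combining logic does not depend on a live connection. Each attribute gets the same single-attribute query that was fast in the question, and the results are merged with a set union.

```clojure
(require '[clojure.set :as set])

;; Name attributes from the schema in the question.
(def name-attrs
  [:model.name/family :model.name/given :model.name/middle])

(defn fulltext-query
  "Builds the question's single-attribute query for the given attribute."
  [attr]
  [:find '[?model ...]
   :in '$ '?search
   :where
   [(list 'fulltext '$ attr '?search) '[[?name _ _ _]]]
   '[?model :model/name ?name]])

(defn search-names
  "Runs one query per attribute against the same db value and
  returns the union of the matching model entity ids."
  [run-q db search]
  (->> name-attrs
       (map #(set (run-q (fulltext-query %) db search)))
       (apply set/union)))

(defn sample
  "Picks up to n distinct items from coll in random order."
  [n coll]
  (take n (shuffle (vec coll))))

;; Usage, assuming datomic.api is aliased as d and conn is an open connection:
;;   (let [db (d/db conn)]            ; take the db value once and reuse it
;;     (sample 100 (search-names d/q db "blacksmith")))
```

Address attributes could be handled the same way with a second query builder that joins through :model/address instead of :model/name.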

+1

Source: https://habr.com/ru/post/1275701/

