I have a model that looks something like this (in JSON):
{"gender": "female", "name": [{"family": "Smith", "given": ["Samantha"], "middle": ["Lee"]}]}
There are about 6 million records with this structure. I need to provide full-text search across all components of a person's name, combined with OR. For instance, if the user enters "blacksmith", I need to check all the given, middle and family names.
In Datomic, I defined the following schema:
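To make the intended OR semantics concrete, here is a minimal in-memory sketch in plain Clojure. It is not Datomic code: `sample`, `name-words` and `matches-name?` are hypothetical names used only for illustration, and the case-insensitive whole-word comparison is an assumption that only roughly approximates what a full-text index does.

```clojure
(require '[clojure.string :as str])

;; Sample record mirroring the JSON model above (hypothetical data).
(def sample
  {:gender "female"
   :name   [{:family "Smith" :given ["Samantha"] :middle ["Lee"]}]})

(defn name-words
  "Collects every family, given and middle name of a record, lower-cased."
  [record]
  (for [n (:name record)
        w (concat [(:family n)] (:given n) (:middle n))
        :when w]
    (str/lower-case w)))

(defn matches-name?
  "True if the search term equals any name component (the OR semantics)."
  [record search]
  (boolean (some #{(str/lower-case search)} (name-words record))))

(matches-name? sample "smith") ; checks family, given and middle names
```

The real query has to produce the same "any component matches" behaviour, but through the full-text index rather than by scanning records.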
[{:db/ident       :model/name
  :db/valueType   :db.type/ref
  :db/isComponent true
  :db/cardinality :db.cardinality/many}
 {:db/ident       :model.name/family
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/fulltext    true}
 {:db/ident       :model.name/given
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/many
  :db/fulltext    true}
 {:db/ident       :model.name/middle
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/many
  :db/fulltext    true}]
Note that I have enabled the full-text index for these attributes. When I query a single attribute, say family, the performance is great (about 100 ms):
(def query-all
  '[:find [(rand 100 ?model) ...]
    :in $ ?search
    :where
    [(fulltext $ :model.name/family ?search) [[?name _ _ _]]]
    [?model :model/name ?name]])
But when I add the other attributes using an or clause, performance drops sharply (20 seconds):
(def query-all
  '[:find [(rand 100 ?model) ...]
    :in $ ?search
    :where
    (or [(fulltext $ :model.name/family ?search) [[?name _ _ _]]]
        [(fulltext $ :model.name/given ?search) [[?name _ _ _]]]
        [(fulltext $ :model.name/middle ?search) [[?name _ _ _]]])
    [?model :model/name ?name]])
My question is: how could I improve this?
Going further, it would be great to search not only the name components but also the address components. Ideally, the query would look like this (it also runs rather slowly):
(def query-all
  '[:find [(rand 100 ?model) ...]
    :in $ ?search
    :where
    (or (and [(fulltext $ :model.name/given ?search) [[?e _ _ _]]]
             [?model :model/name ?e])
        (and [(fulltext $ :model.name/middle ?search) [[?e _ _ _]]]
             [?model :model/name ?e])
        (and [(fulltext $ :model.name/prefix ?search) [[?e _ _ _]]]
             [?model :model/name ?e])
        (and [(fulltext $ :model.name/suffix ?search) [[?e _ _ _]]]
             [?model :model/name ?e])
        (and [(fulltext $ :model.name/family ?search) [[?e _ _ _]]]
             [?model :model/name ?e])
        (and [(fulltext $ :model.address/city ?search) [[?e _ _ _]]]
             [?model :model/address ?e])
        (and [(fulltext $ :model.address/state ?search) [[?e _ _ _]]]
             [?model :model/address ?e]))])
How can I implement this?