Hierarchical taxonomy in faceted search with RavenDB / Lucene?

I am considering RavenDb for implementing the "advanced faceted search" scenario.
I have to deal with a complex hierarchical taxonomy and with facets shared across different branches of the tree, while supporting full-text search and all the other basic features.

Is there any resource that documents how to do this using the RavenDb API?

An insanely complex paper on the subject: Beyond Basic Faceted Search
Solr approach: HierarchicalFaceting

3 answers

Finally got it working:

```csharp
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
using Raven.Abstractions.Data;
using Raven.Client;
using Raven.Client.Document;
using Raven.Client.Indexes;
using Raven.Client.Linq;

namespace Prototype.Search.Tests
{
    [TestFixture]
    public class HierarchicalFaceting
    {
        //
        // Document definition
        //
        public class Doc
        {
            public Doc()
            {
                Categories = new List<string>();
            }

            public int Id { get; set; }

            // One "depth/path" token per level, e.g. "0/NonFic", "1/NonFic/Sci"
            public List<string> Categories { get; set; }
        }

        //
        // Data sample
        //
        public IEnumerable<Doc> GetDocs()
        {
            yield return new Doc { Id = 1, Categories = new List<string> { "0/NonFic", "1/NonFic/Law" } };
            yield return new Doc { Id = 2, Categories = new List<string> { "0/NonFic", "1/NonFic/Sci" } };
            yield return new Doc { Id = 3, Categories = new List<string> { "0/NonFic", "1/NonFic/Hist", "1/NonFic/Sci", "2/NonFic/Sci/Phys" } };
        }

        //
        // The index: one index entry per category token of each document
        //
        public class DocByCategory : AbstractIndexCreationTask<Doc, DocByCategory.ReduceResult>
        {
            public class ReduceResult
            {
                public string Category { get; set; }
            }

            public DocByCategory()
            {
                Map = docs => from d in docs
                              from c in d.Categories
                              select new { Category = c };
            }
        }

        //
        // FacetSetup: facet counts are computed over the "Category" field
        //
        public FacetSetup GetDocFacetSetup()
        {
            return new FacetSetup
            {
                Id = "facets/Doc",
                Facets = new List<Facet> { new Facet { Name = "Category" } }
            };
        }

        private static IDocumentStore CreateStore()
        {
            return new DocumentStore { Url = "http://localhost:8080" }.Initialize();
        }

        [SetUp]
        public void SetupDb()
        {
            using (var store = CreateStore())
            {
                IndexCreation.CreateIndexes(typeof(HierarchicalFaceting).Assembly, store);
                using (var session = store.OpenSession())
                {
                    session.Store(GetDocFacetSetup());
                    session.SaveChanges();
                }
            }
        }

        [Test]
        [Ignore]
        public void DeleteAll()
        {
            using (var store = CreateStore())
            {
                // The default index name is the class name ("DocByCategory")
                store.DatabaseCommands.DeleteIndex(new DocByCategory().IndexName);
                store.DatabaseCommands.DeleteByIndex("Raven/DocumentsByEntityName", new IndexQuery());
            }
        }

        [Test]
        [Ignore]
        public void StoreDocs()
        {
            using (var store = CreateStore())
            using (var session = store.OpenSession())
            {
                foreach (var doc in GetDocs())
                {
                    session.Store(doc);
                }
                session.SaveChanges();
            }
        }

        [Test]
        public void QueryDocsByCategory()
        {
            using (var store = CreateStore())
            using (var session = store.OpenSession())
            {
                // Documents tagged with exactly this level-1 category
                var q = session.Query<DocByCategory.ReduceResult, DocByCategory>()
                    .Where(d => d.Category == "1/NonFic/Sci")
                    .As<Doc>();
                var results = q.ToList();
                var facetResults = q.ToFacets("facets/Doc").ToList();
            }
        }

        [Test]
        public void GetFacets()
        {
            using (var store = CreateStore())
            using (var session = store.OpenSession())
            {
                // Documents in any level-1 category under NonFic (drill-down)
                var q = session.Query<DocByCategory.ReduceResult, DocByCategory>()
                    .Where(d => d.Category.StartsWith("1/NonFic"))
                    .As<Doc>();
                var results = q.ToList();
                var facetResults = q.ToFacets("facets/Doc").ToList();
            }
        }
    }
}
```
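The categories in this answer follow a "depth/path" convention (`0/NonFic`, `1/NonFic/Sci`, ...), so drilling down one level amounts to filtering on tokens whose depth is one greater and whose path starts with the selected node. Below is a minimal sketch of that string arithmetic, independent of RavenDB. The `CategoryPath` class and its methods are hypothetical helpers, not part of the RavenDB client API. The sketch appends a trailing `/` to the prefix so that a hypothetical sibling such as `NonFiction` is not matched by the bare prefix `1/NonFic`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers for "depth/path" category tokens
// (e.g. "0/NonFic", "1/NonFic/Sci"). Not part of the RavenDB client API.
public static class CategoryPath
{
    // Builds the prefix that matches the direct children of a token:
    // "0/NonFic" -> "1/NonFic/"
    public static string ChildPrefix(string token)
    {
        int slash = token.IndexOf('/');
        if (slash <= 0)
            throw new ArgumentException("Expected a 'depth/path' token.", nameof(token));

        int depth = int.Parse(token.Substring(0, slash));
        string path = token.Substring(slash + 1);

        // The trailing "/" keeps "1/NonFiction/..." from matching "1/NonFic"
        return (depth + 1) + "/" + path + "/";
    }

    // In-memory equivalent of filtering the index with StartsWith(prefix)
    public static IEnumerable<string> Children(IEnumerable<string> tokens, string parent)
    {
        string prefix = ChildPrefix(parent);
        return tokens.Where(t => t.StartsWith(prefix, StringComparison.Ordinal));
    }
}
```

In a RavenDB query, computing such a prefix into a local variable first and then passing that variable to `StartsWith` should give the same drill-down as `GetFacets` without matching siblings that merely share a name prefix.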

I would handle the tree-search part with pure Lucene for speed. The two approaches are the parent-child relationship method and the path-enumeration / 'Dewey Decimal' method.

Parent-child is how we all learned to implement linked lists back in algorithms class. It is easy to update, but queries require visiting each node (for example, you cannot go directly from a parent to its great-grandchild). Given that you need to visit all of a node's ancestors anyway to collect all of its attributes (since the point is to share attributes down the tree), having to visit every ancestor may be a moot point.
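A minimal in-memory sketch of the parent-child representation, assuming a hypothetical `CategoryNode` type (it does not come from the question or from RavenDB). Each node stores only a reference to its direct parent, so gathering a node's ancestors means following one link at a time; in an index, each of those hops would be a separate lookup.

```csharp
using System.Collections.Generic;

// Hypothetical parent-child ("adjacency list") node: like a linked list,
// each node only knows its direct parent.
public sealed class CategoryNode
{
    public string Name { get; }
    public CategoryNode Parent { get; }

    public CategoryNode(string name, CategoryNode parent = null)
    {
        Name = name;
        Parent = parent;
    }

    // Collects this node and all of its ancestors, root first.
    // Every ancestor costs one hop; there is no shortcut from a node
    // straight to its grandparent or great-grandparent.
    public List<string> PathFromRoot()
    {
        var names = new List<string>();
        for (var n = this; n != null; n = n.Parent)
            names.Add(n.Name);
        names.Reverse();
        return names;
    }
}
```

Updating is cheap (re-parenting a subtree changes a single reference), which is the trade-off the answer describes.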

The question "How to store tree data in a Lucene / Solr / Elasticsearch or NoSQL db index?" covers the path-enumeration / Dewey Decimal method.
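For comparison, here is a minimal sketch of path enumeration, assuming the "depth/path" token format that the first answer's sample data uses (`0/NonFic`, `1/NonFic/Sci`, ...). Each document stores one token per level of its category, so a whole subtree can be selected with a single prefix match instead of walking parent links, and the depth prefix lets facets be counted one level at a time. `PathEnumeration.Tokens` is a hypothetical helper, not a RavenDB or Lucene API.

```csharp
using System.Collections.Generic;

// Hypothetical helper for the path-enumeration ("Dewey Decimal") method.
public static class PathEnumeration
{
    // ["NonFic", "Sci", "Phys"] -> "0/NonFic", "1/NonFic/Sci", "2/NonFic/Sci/Phys"
    public static List<string> Tokens(IReadOnlyList<string> path)
    {
        var tokens = new List<string>();
        var current = "";
        for (int depth = 0; depth < path.Count; depth++)
        {
            // Extend the path by one segment per level
            current = depth == 0 ? path[0] : current + "/" + path[depth];
            tokens.Add(depth + "/" + current);
        }
        return tokens;
    }
}
```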

Either approach can handle an arbitrarily complex hierarchy, as long as it is a true hierarchy (i.e. a directed acyclic graph, or DAG).
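When a category sits under more than one branch (the "shared facets" case from the question), each root-to-node path can be enumerated and then indexed separately. Below is a minimal sketch, assuming a hypothetical parent map in which each category lists zero or more parents; the category `HistOfSci` is invented for illustration. The recursion terminates only because the graph is acyclic, which is why the DAG condition matters.

```csharp
using System.Collections.Generic;

// Hypothetical DAG of categories: each category lists its parents (none for a root).
public static class DagPaths
{
    // Returns every root-to-node path, e.g. "NonFic/Sci/HistOfSci".
    // A cycle in the parent map would make this recurse forever.
    public static List<string> AllPaths(IReadOnlyDictionary<string, string[]> parents, string node)
    {
        var result = new List<string>();
        string[] ps;
        if (!parents.TryGetValue(node, out ps) || ps.Length == 0)
        {
            // A root: the path is just the node itself
            result.Add(node);
            return result;
        }
        foreach (var p in ps)
            foreach (var prefix in AllPaths(parents, p))
                result.Add(prefix + "/" + node);
        return result;
    }
}
```

Each returned path can then be split into levels and turned into path-enumeration tokens, so a document filed under the shared category shows up in both branches' facet counts.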


I have already fixed it.

I created the index as follows:

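```csharp
using System.Linq;
using Raven.Abstractions.Indexing;   // FieldStorage, FieldIndexing
using Raven.Client.Indexes;          // AbstractIndexCreationTask

public class ProductByCategory : AbstractIndexCreationTask<Product, ProductByCategory.ReduceResult>
{
    public class ReduceResult
    {
        public string Category { get; set; }
        public string Title { get; set; }
    }

    public ProductByCategory()
    {
        // One index entry per category token, carrying the product title
        Map = products => from p in products
                          from c in p.Categories
                          select new { Category = c, Title = p.Title };

        // Store the title and analyze it so it can be full-text searched
        Stores.Add(x => x.Title, FieldStorage.Yes);
        Indexes.Add(x => x.Title, FieldIndexing.Analyzed);
    }
}
```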

And I query it like this:

```csharp
var q = session.Query<ProductByCategory.ReduceResult, ProductByCategory>()
    .Search(x => x.Title, "Sony")                            // full-text match on the analyzed Title
    .Where(r => r.Category.StartsWith("1/beeld en geluid"))  // restrict to one category branch
    .As<Product>();
var facetResults = q.ToFacets("facets/ProductCategory");
```

Source: https://habr.com/ru/post/913746/

