NEST Elasticsearch Reindex Test Samples

My goal is to reindex an index with 10 million shards in order to change its field mappings and enable significant terms analysis.

I am having trouble using the NEST library to perform the reindex, and the documentation is (very) limited. If possible, I need examples of how to use the following:

http://nest.azurewebsites.net/nest/search/scroll.html

http://nest.azurewebsites.net/nest/core/bulk.html

3 answers

NEST provides a nice Reindex method that you can use, although it is undocumented. I have used it in a rough-and-ready way with this ad-hoc WinForms code.

    private ElasticClient client;
    private double count;

    private void reindex_Completed()
    {
        MessageBox.Show("Done!");
    }

    private void reindex_Next(IReindexResponse<object> obj)
    {
        count += obj.BulkResponse.Items.Count();
        var progress = 100 * count / (double)obj.SearchResponse.Total;
        progressBar1.Value = (int)progress;
    }

    private void reindex_Error(Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }

    private void button1_Click(object sender, EventArgs e)
    {
        count = 0;
        var reindex = client.Reindex<object>(r => r
            .FromIndex(fromIndex.Text)
            .NewIndexName(toIndex.Text)
            .Scroll("10s"));
        var o = new ReindexObserver<object>(
            onError: reindex_Error,
            onNext: reindex_Next,
            completed: reindex_Completed);
        reindex.Subscribe(o);
    }

I also just found a blog post that shows how to do this: http://thomasardal.com/elasticsearch-migrations-with-c-and-nest/
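One thing to watch in the progress handler above: if the reported total is 0, the division yields NaN or infinity, and the cast to int is then not something a progress bar can use. A minimal sketch of a guard follows. `ReindexProgress` and `ToPercent` are hypothetical names for illustration, not part of NEST.

```csharp
using System;

public static class ReindexProgress
{
    // Hypothetical helper (not part of NEST): turns copied/total counts
    // into an integer percentage clamped to the 0..100 range.
    public static int ToPercent(long copied, long total)
    {
        if (total <= 0)
        {
            return 0; // nothing to copy, so avoid dividing by zero
        }
        var percent = 100.0 * copied / total;
        return (int)Math.Max(0, Math.Min(100, percent));
    }

    public static void Main()
    {
        Console.WriteLine(ToPercent(250, 1000));  // 25
        Console.WriteLine(ToPercent(0, 0));       // 0
        Console.WriteLine(ToPercent(1200, 1000)); // 100
    }
}
```

In the handler you would then assign something like `progressBar1.Value = ReindexProgress.ToPercent((long)count, obj.SearchResponse.Total);`, assuming the total is exposed as an integer count.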


Unfortunately, the NEST implementation is not quite what I expected. In my opinion, it is a bit over-engineered for what is possibly the most common use case.

Many people just want to update their mappings with zero downtime ...

In my case, I already took care of creating the index with all its settings and mappings, but NEST insists that it must create a new index when reindexing. This is among many other things. Too many other things.

I found it much easier to simply implement this directly, since NEST already has the Search, Scroll and Bulk methods (this is adapted from the NEST implementation):

    // Assuming you have already created and setup the index yourself
    public void Reindex(ElasticClient client, string aliasName, string currentIndexName, string nextIndexName)
    {
        Console.WriteLine("Reindexing documents to new index...");
        var searchResult = client.Search<object>(s => s
            .Index(currentIndexName)
            .AllTypes()
            .From(0)
            .Size(100)
            .Query(q => q.MatchAll())
            .SearchType(SearchType.Scan)
            .Scroll("2m"));

        if (searchResult.Total <= 0)
        {
            Console.WriteLine("Existing index has no documents, nothing to reindex.");
        }
        else
        {
            var page = 0;
            IBulkResponse bulkResponse = null;
            do
            {
                var result = searchResult;
                searchResult = client.Scroll<object>(s => s.Scroll("2m").ScrollId(result.ScrollId));
                if (searchResult.Documents != null && searchResult.Documents.Any())
                {
                    searchResult.ThrowOnError("reindex scroll " + page);
                    bulkResponse = client.Bulk(b =>
                    {
                        foreach (var hit in searchResult.Hits)
                        {
                            b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
                        }
                        return b;
                    }).ThrowOnError("reindex page " + page);
                    Console.WriteLine("Reindexing progress: " + (page + 1) * 100);
                }
                ++page;
            }
            while (searchResult.IsValid && bulkResponse != null && bulkResponse.IsValid
                   && searchResult.Documents != null && searchResult.Documents.Any());
            Console.WriteLine("Reindexing complete!");
        }

        Console.WriteLine("Updating alias to point to new index...");
        client.Alias(a => a
            .Add(aa => aa.Alias(aliasName).Index(nextIndexName))
            .Remove(aa => aa.Alias(aliasName).Index(currentIndexName)));

        // TODO: Don't forget to delete the old index if you want
    }

And here is the ThrowOnError extension method, in case you want it:

    public static T ThrowOnError<T>(this T response, string actionDescription = null) where T : IResponse
    {
        if (!response.IsValid)
        {
            throw new CustomExceptionOfYourChoice(actionDescription == null
                ? string.Empty
                : "Failed to " + actionDescription + ": " + response.ServerError.Error);
        }
        return response;
    }
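To see the shape of the scroll-then-bulk loop without a running cluster, here is a minimal, NEST-free sketch. It is not the NEST API: `CopyAll`, `fetchNextPage` and `writeBatch` are hypothetical names, with the two delegates standing in for `client.Scroll` and `client.Bulk` respectively, and an in-memory list standing in for the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReindexLoopSketch
{
    // Copies documents page by page until the source returns an empty page.
    // fetchNextPage stands in for client.Scroll, writeBatch for client.Bulk.
    public static int CopyAll<T>(
        Func<IReadOnlyList<T>> fetchNextPage,
        Action<IReadOnlyList<T>> writeBatch)
    {
        var copied = 0;
        while (true)
        {
            var page = fetchNextPage();
            if (page == null || page.Count == 0)
            {
                break; // an empty page means the scroll is exhausted
            }
            writeBatch(page);
            copied += page.Count;
        }
        return copied;
    }

    public static void Main()
    {
        // In-memory stand-in for an index: 250 documents served 100 at a time.
        var source = Enumerable.Range(1, 250).Select(i => "doc-" + i).ToList();
        var target = new List<string>();
        var offset = 0;

        var total = CopyAll<string>(
            () =>
            {
                var page = source.Skip(offset).Take(100).ToList();
                offset += page.Count;
                return page;
            },
            batch => target.AddRange(batch));

        Console.WriteLine("Copied " + total + " documents in total.");
    }
}
```

Counting the documents actually written, rather than multiplying the page number by the page size, keeps the progress figure accurate when the last page is short.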

I second Ben Wilde's answer above. It is better to have full control over index creation and the reindexing process.

What is missing from Ben's code is support for parent/child relationships. Here is my code to fix this:

Replace the following lines:

    foreach (var hit in searchResult.Hits)
    {
        b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
    }

With:

    foreach (var hit in searchResult.Hits)
    {
        var jo = hit.Source as JObject;
        JToken jt;
        if (jo != null && jo.TryGetValue("parentId", out jt))
        {
            // Document is child-document => add parent reference
            string parentId = (string)jt;
            b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id).Parent(parentId));
        }
        else
        {
            b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
        }
    }

Source: https://habr.com/ru/post/975475/