Solr does not overwrite: duplicated uniqueKey entries

I have a problem with Solr 5.3.1. My schema is pretty simple. I have one uniqueKey, "id", of type string: indexed, stored and required, not multiValued.

I first add documents with "content_type: document_unfinished" and then overwrite the same document, using the same id but a different content_type. The document then appears twice in the index. Again, the only uniqueKey is "id", as a string. The id originally comes from a MySQL primary index.

It also looks like this has happened to others:

http://lucene.472066.n3.nabble.com/uniqueKey-not-enforced-td4015086.html

http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-td4129651.html

In my case, not all documents in the index are duplicated, just some. I had assumed that a document is overwritten on commit when the same uniqueKey already exists in the index, but that does not seem to work as I expected. I do not want to update just some fields of the document; I want to replace it completely, with all its children.

Some statistics: about 350 thousand documents in the index. Mostly with childDocuments. Documents are distinguished by the "content_type" field. I used SolrJ to import them this way:

    HttpSolrServer server = new HttpSolrServer(url);
    server.add(docs);  // docs is a Collection<SolrInputDocument>
    server.commit();

I always add the whole document with all its children again. It's nothing fancy. Still, I get duplicate documents for the same uniqueKey. Nothing else writes to the index on the side. I only run Solr with the built-in Jetty. I do not open the Lucene index from Java manually.

What I did next was delete and re-add the documents. This seems to work for a while, but then under certain conditions an error message is issued:

Parent query yields document which is not matched by parents filter

The document where this occurs seems completely random; only one pattern emerges: it is always a child document where it happens. I don't run anything special; I basically downloaded the Solr package from the website and run it with bin/solr start
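
For the delete-and-re-add step, one possible sketch is to build a delete query that matches both a standalone document and a whole block for the same id. It assumes the uniqueKey is `id` and that the schema has the `_root_` field, which Solr fills with the parent id for every member of a block. The class and method names are hypothetical helpers, not part of SolrJ.

```java
/** Hypothetical helper (not part of SolrJ): builds a delete query for one logical document. */
final class BlockDeleteQuery {

    /** Wraps a value in double quotes, escaping backslashes and quotes for the standard query parser. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Matches a standalone document with this id as well as every member
     * (parent and children) of a block whose parent has this id.
     * Assumption: uniqueKey is "id" and the schema defines "_root_".
     */
    static String forDocument(String id) {
        String quoted = quote(id);
        return "id:" + quoted + " OR _root_:" + quoted;
    }

    public static void main(String[] args) {
        System.out.println(forDocument("1")); // prints: id:"1" OR _root_:"1"
    }
}
```

The resulting string could be passed to `solrClient.deleteByQuery(...)` before `solrClient.add(...)`. Deleting by `id` alone may remove only the parent and leave its children orphaned, which is a plausible (unconfirmed) trigger for the parent filter error.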

Any ideas?

EDIT 1

I think I found the problem; it seems to be a bug? To reproduce it:

I downloaded Solr 5.3.1 onto a Debian VM in VirtualBox and started it with bin/solr start. I added a new core with the basic config set. I changed nothing in the basic config set; I just copied it and added the core.

This results in two documents with the same identifier in the index:

    // First add: a standalone document with id "1"
    SolrClient solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
    SolrInputDocument inputDocument = new SolrInputDocument();
    inputDocument.setField("id", "1");
    inputDocument.setField("content_type_s", "doc_unfinished");
    solrClient.add(inputDocument);
    solrClient.commit();
    solrClient.close();

    // Second add: same id "1", now with a child document attached
    solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
    inputDocument = new SolrInputDocument();
    inputDocument.setField("id", "1");
    inputDocument.setField("content_type_s", "doc");
    SolrInputDocument childDocument = new SolrInputDocument();
    childDocument.setField("id", "1-1");
    childDocument.setField("content_type_s", "subdoc");
    inputDocument.addChildDocument(childDocument);
    solrClient.add(inputDocument);
    solrClient.commit();
    solrClient.close();

Search with

http://192.168.56.102:8983/solr/test1/select?q=*%3A*&wt=json&indent=true

returns the following result:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 0,
        "params": {
          "q": "*:*",
          "indent": "true",
          "wt": "json",
          "_": "1450078098465"
        }
      },
      "response": {
        "numFound": 3,
        "start": 0,
        "docs": [
          {
            "id": "1",
            "content_type_s": "doc_unfinished",
            "_version_": 1520517084715417600
          },
          {
            "id": "1-1",
            "content_type_s": "subdoc"
          },
          {
            "id": "1",
            "content_type_s": "doc",
            "_version_": 1520517084838101000
          }
        ]
      }
    }

What am I doing wrong?

1 answer

Thanks for your feedback! I am posting this as an answer because it is too long for a comment. I got the following answer from the mailing list:

Mikhail Khludnev:

Hello, Sebastian,

Mixing standalone documents and blocks does not work. There are many open issues.

On March 9, 2016 at 3:02 pm, Sebastian Riemer wrote:

Hello,

to actually describe my problem in short, instead of just referencing the test application: using SolrJ, I do the following:

1) Create a new document as a parent and commit

    SolrInputDocument parentDoc = new SolrInputDocument();
    parentDoc.addField("id", "parent_1");
    parentDoc.addField("name_s", "Sarah Connor");
    parentDoc.addField("blockJoinId", "1");
    solrClient.add(parentDoc);
    solrClient.commit();

2) Create a new document with the same unique id as in 1), with a child document attached

    SolrInputDocument parentDocUpdateing = new SolrInputDocument();
    parentDocUpdateing.addField("id", "parent_1");
    parentDocUpdateing.addField("name_s", "Sarah Connor");
    parentDocUpdateing.addField("blockJoinId", "1");
    SolrInputDocument childDoc = new SolrInputDocument();
    childDoc.addField("id", "child_1");
    childDoc.addField("name_s", "John Connor");
    childDoc.addField("blockJoinId", "1");
    parentDocUpdateing.addChildDocument(childDoc);
    solrClient.add(parentDocUpdateing);
    solrClient.commit();

3) Results in 2 documents with id = "parent_1" in the solr index

Is this normal behavior? I thought that an existing document should be updated instead of generating a new document with the same identifier.

For a full working test application, see the original post.

Regards, Sebastian

I think this is a known issue, and there are several tickets related to it, but I'm glad there is a way to handle it (adding child documents from the beginning): https://issues.apache.org/jira/browse/SOLR-6096 , https://issues.apache.org/jira/browse/SOLR-5211 , https://issues.apache.org/jira/browse/SOLR-7606
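
To illustrate why mixing standalone documents and blocks produces duplicates, here is a toy in-memory model. It is not Solr code. It rests on the assumption that, in Solr 5.x, a standalone add replaces existing documents by the uniqueKey term (`id`), while a block add replaces them by the `_root_` term, which a standalone document does not carry. The class name, method names and the placeholder child id "1-0" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the assumed update semantics; NOT Solr's implementation. */
final class BlockIndexModel {

    /** One indexed document; root is null for a standalone document. */
    static final class Doc {
        final String id;
        final String type;
        final String root;

        Doc(String id, String type, String root) {
            this.id = id;
            this.type = type;
            this.root = root;
        }
    }

    final List<Doc> docs = new ArrayList<>();

    /** Standalone add: existing documents are removed by matching "id". */
    void addStandalone(String id, String type) {
        docs.removeIf(d -> d.id.equals(id));
        docs.add(new Doc(id, type, null));
    }

    /** Block add: existing documents are removed by matching "_root_" only. */
    void addBlock(String parentId, String type, String... childIds) {
        docs.removeIf(d -> parentId.equals(d.root));
        for (String childId : childIds) {
            docs.add(new Doc(childId, "subdoc", parentId));
        }
        docs.add(new Doc(parentId, type, parentId)); // the parent closes the block
    }

    long countById(String id) {
        return docs.stream().filter(d -> d.id.equals(id)).count();
    }

    public static void main(String[] args) {
        // Mixed: standalone first, then a block with the same id.
        BlockIndexModel mixed = new BlockIndexModel();
        mixed.addStandalone("1", "doc_unfinished");
        mixed.addBlock("1", "doc", "1-1");
        System.out.println("mixed: " + mixed.countById("1")); // prints: mixed: 2

        // Blocks only: the document has a child from the very first add.
        BlockIndexModel blocksOnly = new BlockIndexModel();
        blocksOnly.addBlock("1", "doc_unfinished", "1-0");
        blocksOnly.addBlock("1", "doc", "1-1");
        System.out.println("blocks only: " + blocksOnly.countById("1")); // prints: blocks only: 1
    }
}
```

In the mixed case the model ends up with three documents, two of them with id "1", which mirrors the numFound of 3 in the question. When every add is a block, the second add replaces the first one cleanly.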


Source: https://habr.com/ru/post/1238024/