I have a problem with Solr 5.3.1. My schema is pretty simple: the only uniqueKey is "id", a string field that is indexed, stored, required, and not multiValued.
I first add documents with "content_type: document_unfinished" and later overwrite the same document, using the same id but a different content_type. The document then ends up in the index twice. Again, the only uniqueKey is "id", a string. The id comes from a MySQL primary key.
It also looks like this has happened to others before:
http://lucene.472066.n3.nabble.com/uniqueKey-not-enforced-td4015086.html
http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-td4129651.html
In my case, not all documents in the index are duplicated, just some. I initially assumed that a document would be overwritten on commit when the same uniqueKey already exists in the index, but that does not seem to work as I expected. I do not want to update just a few fields of the document; I want to replace it completely, including all of its children.
Some statistics: about 350,000 documents in the index, mostly with childDocuments. Documents are distinguished by the "content_type" field. I used SolrJ to import them this way:
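To see which ids are affected, a facet query on the id field can list every value that occurs more than once. The following is only a minimal sketch: the class name `DuplicateIdQuery` and the core URL are hypothetical, and it assumes the uniqueKey field is called "id" as described above. It only builds the request URL from standard Solr facet parameters (`facet.field`, `facet.mincount`, `facet.limit`); it does not talk to Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a select URL that facets on "id" and keeps
// only the values that occur at least twice in the index.
class DuplicateIdQuery {

    static String buildUrl(String coreUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");            // match every document
        params.put("rows", "0");           // only the facet counts are needed
        params.put("facet", "true");
        params.put("facet.field", "id");   // assumption: uniqueKey is "id"
        params.put("facet.mincount", "2"); // ids present two or more times
        params.put("facet.limit", "-1");   // do not cap the number of values
        params.put("wt", "json");

        StringBuilder url = new StringBuilder(coreUrl).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be supported by every JVM
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr/test1"));
    }
}
```

Opening the printed URL in a browser should then show the duplicated ids under facet_counts, which makes it easier to compare affected and unaffected documents.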
HttpSolrServer server = new HttpSolrServer(url);
server.add(docs); // docs is a Collection<SolrInputDocument>
server.commit();
I always add the whole document with all of its children again. It's nothing exotic, yet I get duplicate documents for the same uniqueKey. Nothing else writes to the index. I only run Solr with the built-in Jetty, and I do not open the Lucene index manually from Java.
I then resorted to delete + re-insert. This seems to work for a while, but under certain conditions the following error message appears:
Parent query yields document which is not matched by parents filter
The document for which this occurs seems completely random; the only pattern is that it always happens on a child document. I don't run anything special: I basically downloaded the Solr package from the website and run it with bin/solr start.
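The exact delete step is not shown above. As a rough sketch of what a delete before re-adding could look like, the helper below builds a query that matches a document by its id and also anything indexed in the same parent/child block. It assumes the schema contains the `_root_` field that Solr uses to tie child documents to their parent; the class name `BlockDeleteQuery` is hypothetical. The resulting string could be passed to SolrJ's `deleteByQuery` before the `add` call.

```java
// Hypothetical helper: builds a delete query for one logical document,
// covering both a childless copy (matched by id) and a parent/child block
// (matched by _root_, assuming that field exists in the schema).
class BlockDeleteQuery {

    // Wraps the value in double quotes, escaping backslashes and quotes
    // so the id is treated as a single literal term.
    static String quote(String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    static String forId(String id) {
        String quoted = quote(id);
        return "id:" + quoted + " OR _root_:" + quoted;
    }

    public static void main(String[] args) {
        System.out.println(forId("1"));
    }
}
```

Whether this removes every stale copy depends on how the earlier versions were indexed, so it is worth checking the result with a plain `*:*` query afterwards.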
Any ideas?
EDIT 1
I think I found the problem, and it looks like a bug. To reproduce it:
I installed Solr 5.3.1 on Debian in VirtualBox and started it with bin/solr start. I added a new core using the basic config set. Nothing was changed in the config set; I just copied it and added the core.
The following code results in two documents with the same id in the index:
// First pass: index a childless document with id "1"
SolrClient solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
SolrInputDocument inputDocument = new SolrInputDocument();
inputDocument.setField("id", "1");
inputDocument.setField("content_type_s", "doc_unfinished");
solrClient.add(inputDocument);
solrClient.commit();
solrClient.close();

// Second pass: index id "1" again, this time with a child document
solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
inputDocument = new SolrInputDocument();
inputDocument.setField("id", "1");
inputDocument.setField("content_type_s", "doc");
SolrInputDocument childDocument = new SolrInputDocument();
childDocument.setField("id", "1-1");
childDocument.setField("content_type_s", "subdoc");
inputDocument.addChildDocument(childDocument);
solrClient.add(inputDocument);
solrClient.commit();
solrClient.close();
Search with
http://192.168.56.102:8983/solr/test1/select?q=*%3A*&wt=json&indent=true
returns the following response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*",
      "indent": "true",
      "wt": "json",
      "_": "1450078098465"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "content_type_s": "doc_unfinished",
        "_version_": 1520517084715417600
      },
      {
        "id": "1-1",
        "content_type_s": "subdoc"
      },
      {
        "id": "1",
        "content_type_s": "doc",
        "_version_": 1520517084838101000
      }
    ]
  }
}
What am I doing wrong?