As part of my DSpace instance, I have a Solr repository containing 12 million usage statistics records. Some records were migrated through several Solr upgrades and do not match the current schema. 5 million of these entries have no value for the unique id field specified in my schema.
The DSpace system provides a mechanism to move old usage statistics records into a separate Solr shard using the following code.
LOGIC DSPACE SHARD:
    for (File tempCsv : filesToUpload) {
        // Upload the data in the csv files to our new solr core
        ContentStreamUpdateRequest contentStreamUpdateRequest = new ContentStreamUpdateRequest("/update/csv");
        contentStreamUpdateRequest.setParam("stream.contentType", "text/plain;charset=utf-8");
        contentStreamUpdateRequest.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        contentStreamUpdateRequest.addFile(tempCsv, "text/plain;charset=utf-8");
        statisticsYearServer.request(contentStreamUpdateRequest);
    }
    statisticsYearServer.commit(true, true);
When I ran this process, I received an error message for each entry that does not have a unique id field, and the process deleted 5 million entries.
I then tried replacing these 5 million entries to force a unique id field onto each one. Here is the code I ran for this update. The myQuery query selects several thousand records at a time.
MY RECORD REPAIR PROCESS:
    ArrayList<SolrInputDocument> idocs = new ArrayList<SolrInputDocument>();
    SolrQuery sq = new SolrQuery();
    sq.setQuery(myQuery);
    sq.setRows(MAX);
    sq.setSort("time", ORDER.asc);
    QueryResponse resp = server.query(sq);
    SolrDocumentList list = resp.getResults();
    if (list.size() > 0) {
        for (int i = 0; i < list.size(); i++) {
            SolrDocument doc = list.get(i);
            SolrInputDocument idoc = ClientUtils.toSolrInputDocument(doc);
            idocs.add(idoc);
        }
    }
    server.add(idocs);
    server.commit(true, true);
    server.deleteByQuery(myQuery);
    server.commit(true, true);
After running this process, all entries in the repository have a unique identifier. The entries I touched also have a _version_ field.
When I rerun the shard process shown above, it fails with an error about the value of the _version_ field and terminates. If I set the _version_ field explicitly, I get the same error.
Here is the error message I encounter when invoking the shard process:
Exception: version conflict for e8b7ba64-8c1e-4963-8bcb-f36b33216d69 expected=1484794833191043072 actual=-1
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: version conflict for e8b7ba64-8c1e-4963-8bcb-f36b33216d69 expected=1484794833191043072 actual=-1
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
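For reference, here is a minimal sketch of how Solr's documented optimistic concurrency rules for _version_ could produce a message like this one. It is a plain Java model, not SolrJ code. It assumes that the shard export carries the _version_ value into the new core, where no document with that id exists yet (which would match actual=-1). The class name VersionCheckSketch, its methods, and the field name "uid" are hypothetical and only illustrate the rules.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionCheckSketch {

    // Models Solr's documented optimistic concurrency rules for _version_.
    // storedVersion is null when no document with that id exists in the target core.
    public static boolean passesVersionCheck(long requestedVersion, Long storedVersion) {
        if (requestedVersion == 0L) {
            return true;                   // 0 (or field absent): no version check
        }
        if (requestedVersion < 0L) {
            return storedVersion == null;  // negative: the document must not exist
        }
        if (requestedVersion == 1L) {
            return storedVersion != null;  // 1: the document must exist, any version
        }
        // Greater than 1: the document must exist with exactly this version.
        return storedVersion != null && storedVersion.longValue() == requestedVersion;
    }

    // Reads _version_ from a document modelled as a map; absent means 0 (no check).
    public static long requestedVersion(Map<String, Object> doc) {
        Object v = doc.get("_version_");
        return v == null ? 0L : ((Number) v).longValue();
    }

    // Returns a copy of the document without the _version_ field.
    public static Map<String, Object> withoutVersion(Map<String, Object> doc) {
        Map<String, Object> copy = new HashMap<>(doc);
        copy.remove("_version_");
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("uid", "e8b7ba64-8c1e-4963-8bcb-f36b33216d69"); // assumed field name
        doc.put("_version_", 1484794833191043072L);

        // Target core has no such document yet, so storedVersion is null.
        System.out.println(passesVersionCheck(requestedVersion(doc), null));                 // false
        System.out.println(passesVersionCheck(requestedVersion(withoutVersion(doc)), null)); // true
    }
}
```

If this assumption holds, the analogous step in SolrJ would presumably be to drop _version_ from each SolrInputDocument (for example with removeField("_version_")) before adding it. I have not verified this against the DSpace shard code.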
My goal is to restore my records so that I can start the shard process provided by DSpace. Can you recommend any additional actions that I should take to restore these records?