I need help choosing between a single index in one Solr instance and multiple cores in one Solr instance, with each core serving its own index. I understand that one Solr index is usually used to index one type of document. What is the best practice when you have different types of documents? For example, if you want to index invoice transaction information, you can create a schema with fields for an invoice transaction document as follows:
- Invoicedate
- DueDate
- invoiceSummary
- billingContact
- invoiceLineItems
- notes
Suppose you also want to index product details. Would you create a new document type with a schema as follows:
- Productcode
- productDescription
- sellingPrice
- purchase price
- Onhand
- avgCost
- notes
and create a new core in Solr to index product documents? Or would you combine the invoice transaction and the product into one schema as follows:
- Invoicedate
- DueDate
- invoiceSummary
- billingContact
- invoiceLineItems
- Productcode
- productDescription
- sellingPrice
- purchase price
- Onhand
- avgCost
- notes
and have just one core indexing the combined schema above, instead of having an "Invoice" core and a "Product" core indexing two different document types?
I think it makes sense to have a single flat index, as suggested in the Solr wiki, when the fields are similar. In the example above, however, the data is not even remotely related, since invoices and products are separate objects. I have seen people suggest adding an extra field to distinguish between object types, such as a table name field, and filtering queries on that field, which I think works. I'm not sure how well this scales for a use case like the following:
"Search invoices for the keyword "John"; the search fields are "billingContact", "invoiceSummary", and "notes". Boost the "billingContact" field at query time. Also search products for "John"; the search fields are "productDescription", "Vendor", and "notes". Boost "Vendor" at query time. Return only 100 invoices and 100 products."
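To make the extra-field approach concrete, here is a rough sketch of how I imagine it would work. The field name `doc_type` and the sample values are my own placeholders, not part of either schema above; the code only tags documents and builds the request parameters, it does not talk to Solr.

```python
from urllib.parse import urlencode

# Hypothetical discriminator field; not part of either schema above.
TYPE_FIELD = "doc_type"


def tag_document(doc, doc_type):
    """Return a copy of doc with the discriminator field added."""
    tagged = dict(doc)
    tagged[TYPE_FIELD] = doc_type
    return tagged


def type_filtered_params(keyword, doc_type, rows=10):
    """Build select parameters that restrict a keyword search to one type."""
    return {
        "q": keyword,
        "fq": f"{TYPE_FIELD}:{doc_type}",  # filter query: only this document type
        "rows": rows,
    }


# Made-up sample documents for illustration only.
invoice = tag_document({"billingContact": "John Smith", "notes": "net 30"}, "invoice")
product = tag_document({"Productcode": "P-100", "notes": "fragile"}, "product")

print(invoice)
print(urlencode(type_filtered_params("John", "invoice")))
```

Every document in the combined index would carry this field, and every query would need the matching `fq` filter.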
The application I'm working on requires searching invoices and products from a single form. There are no separate parts of the application that search for different things.
My concerns about putting everything in one index are:
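As a sketch of what that single form would have to send, the requests might be built as below. I am assuming the edismax query parser, a local Solr URL, placeholder core names, an arbitrary boost of 2, and a hypothetical `doc_type` discriminator field for the single-index layout; none of these come from an existing setup.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # assumed local Solr instance


def build_search(core, keyword, fields, boosts, doc_type=None, rows=100):
    """Return a select URL for an edismax keyword search over `fields`.

    boosts maps a field name to its query-time boost; other fields get none.
    doc_type, when given, filters on the hypothetical doc_type field
    (single-index layout); leave it None for one core per document type.
    """
    qf = " ".join(f"{f}^{boosts[f]}" if f in boosts else f for f in fields)
    params = {"q": keyword, "defType": "edismax", "qf": qf, "rows": rows}
    if doc_type is not None:
        params["fq"] = f"doc_type:{doc_type}"
    return f"{BASE_URL}/{core}/select?{urlencode(params)}"


INVOICE_FIELDS = ["billingContact", "invoiceSummary", "notes"]
PRODUCT_FIELDS = ["productDescription", "Vendor", "notes"]

# Layout A: one core per document type (core names are placeholders).
separate = [
    build_search("invoices", "John", INVOICE_FIELDS, {"billingContact": 2}),
    build_search("products", "John", PRODUCT_FIELDS, {"Vendor": 2}),
]

# Layout B: one combined core, filtered by the discriminator field.
combined = [
    build_search("combined", "John", INVOICE_FIELDS, {"billingContact": 2}, doc_type="invoice"),
    build_search("combined", "John", PRODUCT_FIELDS, {"Vendor": 2}, doc_type="product"),
]

for url in separate + combined:
    print(url)
```

In this sketch both layouts end up issuing two requests per search, because each document type needs its own field list, its own boost, and its own 100-row limit; the only difference is the core name versus the `fq` filter.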
1) Large index size, for example: 50 million invoices + 50 million products in a single index
2) Re-indexing an index of this size.
3) Index tuning: wouldn't it be easier to tune each individual index to return the search results expected for its document type, rather than trying to do that in one index?
4) We also plan to index billing contact information in the future, which will add more fields to the index and make the problems in points 1) and 2) worse.