Can Apache Solr handle terabyte-scale data?

I have been an Apache Solr user for about a year. So far I have used Solr for simple search tools, but now I want to use it with 5 TB of data. I assume the 5 TB of raw data will grow to about 7 TB once Solr indexes it, given the filters I use. After that, I will add roughly 50 MB of data per hour to the same index.

1- Is there any problem with using a single Solr server for 5 TB of data? (no shards)

  • a- Can the Solr server answer queries in an acceptable time?

  • b- What is the expected time for committing 50 MB of data into a 7 TB index? (A measurement sketch follows this list.)

  • c- Is there an upper limit on index size?

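For 1b in particular, once real data exists I would measure the commit time directly rather than guess. A minimal SolrJ sketch, where the Solr URL, the core name `bigindex`, and the `id`/`text` fields are placeholders for my setup, not anything prescribed by Solr:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.util.ArrayList;
import java.util.List;

public class CommitTimer {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/bigindex").build()) {
            // Build a batch that stands in for the hourly ~50 MB of documents.
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("text", "placeholder payload " + i);
                batch.add(doc);
            }
            solr.add(batch);

            // Time the hard commit, which is what question 1b asks about.
            long start = System.nanoTime();
            solr.commit();
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("commit took " + ms + " ms");
        }
    }
}
```

Run against indexes of growing size, this would show how commit time scales as the index approaches 7 TB.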
2- What would you suggest?

  • a- How many shards should I use?

  • b- Should I use Solr cores?

  • c- What commit frequency do you suggest? (Is 1 hour OK? See the sketch after this list.)

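For 2c, as I understand it, an alternative to issuing an explicit commit every hour is SolrJ's commitWithin parameter, which asks Solr to make the documents searchable within a given window. A sketch under the assumption that one-hour freshness is acceptable (core name and fields are placeholders again):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/bigindex").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "hourly-doc-1");
            doc.addField("text", "hourly payload");
            // commitWithin: Solr schedules the commit itself, no later than
            // one hour (given in milliseconds) after this add arrives.
            solr.add(doc, 60 * 60 * 1000);
        }
    }
}
```

This lets Solr batch flushes on its own schedule instead of being forced into a fixed commit cadence by the client.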
3- Are there any benchmark results for this kind of large data?


There is no 5 TB data set available yet; I just want to estimate what the result would be.

Note: you can assume that hardware resources are not a problem.

1 answer

If your sizes refer to plain text rather than binary files (whose extracted text is usually much smaller), then I don't think you can hope to do this on a single machine.

This sounds a lot like Loggly, and they use SolrCloud to handle that amount of data.

OK, if all of the documents are rich documents, then the total text size to index will be much smaller (for me it is about 7% of the starting size). Even with that reduced amount, though, I think you still have too much data for a single instance.
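For illustration, indexing against a SolrCloud collection differs from the single-server case mainly in the client: SolrJ's CloudSolrClient reads the cluster state from ZooKeeper and routes each document to the shard that owns it. A minimal sketch, where the ZooKeeper addresses and the collection name `bigcollection` are made up:

```java
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.util.List;
import java.util.Optional;

public class CloudIndexing {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient solr = new CloudSolrClient.Builder(
                List.of("zk1:2181", "zk2:2181", "zk3:2181"),
                Optional.empty()).build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("text", "some document text");
            // The client hashes the document id and sends the document
            // straight to the leader of the shard that owns it.
            solr.add("bigcollection", doc);
            solr.commit("bigcollection");
        }
    }
}
```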

