Easy Solr Deployment with Two Servers for Redundancy

Question


I am deploying the Apache Solr web application on two Tomcat 6 servers to provide redundancy and improve availability. Scalability is not a concern at the moment.

I have a load balancer that can dynamically route traffic to one server or the other or both.

I know that Solr supports a master / slave configuration, but this requires manual recovery if the slave receives updates while the master is down (which it would in my use case).

I am considering a simpler approach that relies on the ability to reload a core:
- only one of the two servers receives traffic at any time (the "active" instance), but both are running,
- both instances share the same index data, and
- before traffic is redirected because of a failure, the newly active instance is told to reload its index core(s).
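The reload step in this approach could be scripted against Solr's CoreAdmin handler, which accepts `action=RELOAD` and a `core` name. The following is only a minimal sketch: the base URL `http://solr-b:8080/solr` and the core name `core0` are placeholder assumptions, not values from the question, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_reload_url(base_url, core):
    """Return the CoreAdmin URL that asks Solr to reload one core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return base_url.rstrip("/") + "/admin/cores?" + query


def reload_core(base_url, core, timeout=30):
    """Ask an instance to reload a core; returns True on HTTP 200.

    urlopen raises an exception on connection errors and non-2xx replies,
    so the caller should treat any exception as "do not fail over yet".
    """
    with urlopen(build_reload_url(base_url, core), timeout=timeout) as resp:
        return resp.status == 200


if __name__ == "__main__":
    # Hypothetical standby host and core name; adjust to the real deployment.
    print(build_reload_url("http://solr-b:8080/solr", "core0"))
```

A failover script would call `reload_core` on the standby instance and only then tell the load balancer to switch traffic.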

Limited failover testing with indexes and records has been successful. What consequences or problems am I missing?

Your thoughts and opinions are welcome.

+6

search-engine solr

kingolego Dec 02 '11 at 10:06


2 answers

I know almost nothing about Solr, so I don’t know the answers to some of the questions that need to be considered with this setup, but I can provide some things to consider. You will need to think about what failures you want to protect against and why, and make your decision based on this. In the end, there is no perfect system.

Both instances use the same files. If for some reason the files are damaged or inaccessible (hardware error, software error), the second instance will fail just like the first.

Are the files stored and accessed in such a way that they are always valid when the inactive instance reads them? Will the inactive instance try to read files while the active instance is writing them? What happens if it does? If the active instance is interrupted while writing index files (power failure, network outage, full disk), what happens when the inactive instance tries to load them? The same questions apply in reverse if the "inactive" instance ever writes to the files. That is not particularly unlikely if Solr was not designed with this use in mind; it might, for example, update some kind of idle statistics.

In addition, reloading the indexes sounds as if it could be a rather time-consuming operation, and the service will not be available while it runs.

If the active instance needs to complete an orderly shutdown before the inactive instance loads the indexes (possibly because of the file validity problems mentioned above), this can also take a long time, and the service will be unavailable in the meantime. If the active instance cannot complete the orderly shutdown, you will have a bad time.

0

Samuel Edwin Ward Mar 03 '12 at 21:01


abdollar · Accepted Answer · 2012-03-04T06:23:40+0000

The simple approach to redundancy that you are considering seems reasonable, but you cannot rely on it for disaster recovery unless your NAS / SAN can replicate the data / index to another physical location.

Here are some tips:

- Make disaster recovery backups and test them, because the index may become corrupted and there are no checksums in SOLR / Lucene. The index may be destroyed, or some records may be deleted and merged away without your knowledge, and backups can be used to restore those records / documents later if you need to investigate.
- Before redirecting traffic to the second instance, I would run several queries to warm the caches, and also verify that the current index works before the instance goes online.
- Isolate updates to a single location, process, and thread to ensure transactional integrity in the event of an abrupt failure. Consistency can be difficult to manage because SOLR does not use vector clocks to synchronize updates, as some databases do. I personally would keep a copy of all updates outside SOLR in some other store, in case a small window of updates needs to be recovered.
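The warm-up and verification tip above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive procedure: the base URL, the list of warm-up queries, and the `min_docs` threshold are hypothetical, and the code assumes the instance exposes the standard `/select` handler with JSON output (`wt=json`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def select_url(base_url, query, rows=10):
    """Build a /select URL that asks Solr for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return base_url.rstrip("/") + "/select?" + params


def warm_and_verify(base_url, queries, min_docs=1, fetch=fetch_json):
    """Run warm-up queries, then check that the index holds enough documents.

    Returns False as soon as any query reports a non-zero status, or if the
    total document count is below min_docs (an assumed sanity threshold).
    """
    for q in queries:
        body = fetch(select_url(base_url, q))
        if body.get("responseHeader", {}).get("status") != 0:
            return False
    total = fetch(select_url(base_url, "*:*", rows=0))
    return total["response"]["numFound"] >= min_docs
```

The `fetch` parameter exists only so the check can be exercised without a live server; in a failover script the default would be used, and traffic would be switched only if `warm_and_verify` returns True.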

All in all, my experience with SOLR has been excellent, as long as you are not using cutting-edge features and plugins. I have one instance that currently holds 40 million documents and has been up for a year without any problems. This does not mean that you will not have problems, but it should give you an idea of how stable it can be.
