My Python web application holds several connections to the same MongoDB server, but to 3 different databases. The application is run by 4 Gunicorn workers.
I use a replica set.
When the primary goes down, the current request fails and a refresh is scheduled in MongoReplicaSetClient (PyMongo 2.8, but I believe it is the same in 3.2). The next request can succeed if, by that time, a new primary has been elected and the MonitorThread has noticed it and refreshed the client's connection.
But the refresh only affects that one client. The other clients connected to the same MongoDB server are untouched; each of them goes through the same story on its own. This means that if every worker holds connections to 3 databases on the same MongoDB server, and I keep repeating an HTTP request that touches all 3 databases, then after a primary failover it can take arbitrarily long until every connected client has been refreshed. If HTTP requests were distributed round-robin across the 4 workers, 12 requests would be enough to refresh every Mongo client; but in practice requests are not round-robined.
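For concreteness, here is roughly what the setup looks like. This is a sketch only: the seed list, replica set name, and database names are placeholders, not my real configuration. Each Gunicorn worker runs this at import time, so a primary failover must eventually be noticed by 4 × 3 = 12 clients.

```python
from pymongo import MongoReplicaSetClient

# Placeholder seed list and replica set name -- not my real config.
SEEDS = "mongo1:27017,mongo2:27017,mongo3:27017"

# One client per database; each Gunicorn worker gets its own three.
users_client = MongoReplicaSetClient(SEEDS, replicaSet="rs0")
orders_client = MongoReplicaSetClient(SEEDS, replicaSet="rs0")
stats_client = MongoReplicaSetClient(SEEDS, replicaSet="rs0")

users_db = users_client["users"]
orders_db = orders_client["orders"]
stats_db = stats_client["stats"]
```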
Looking into the PyMongo code, in MongoReplicaSetClient._send_message_with_response I see that when the primary is down, self.disconnect is called, which in turn calls self.__schedule_refresh. That method has a sync argument that makes it "block until the refresh completes."
My idea is to catch the AutoReconnect exception and call __schedule_refresh(sync=True) on all clients connected to the failed primary, blocking until the new replica set configuration has been discovered. That way, HTTP requests would not be served (resulting in 500s) until the database is back to normal.
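Concretely, I imagine something like this (just a sketch; ALL_CLIENTS is a hypothetical registry of this worker's clients, and the name-mangled attribute is the only way to reach the private __schedule_refresh from outside the class):

```python
from pymongo.errors import AutoReconnect

# Hypothetical registry, populated wherever the app creates its clients.
ALL_CLIENTS = [users_client, orders_client, stats_client]

def run_with_refresh(operation):
    """Run a DB operation; on primary failure, force every client of this
    worker to rediscover the replica set before re-raising."""
    try:
        return operation()
    except AutoReconnect:
        for client in ALL_CLIENTS:
            # __schedule_refresh is private, so from outside the class it
            # is only reachable through Python's name mangling.
            # sync=True blocks until the refresh has completed.
            client._MongoReplicaSetClient__schedule_refresh(sync=True)
        raise  # let this request 500; the next one should find the new primary
```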
But __schedule_refresh is a private method. Besides, I don't know whether it would execute promptly on all clients: it looks like the MonitorThread does its work at intervals.
Or maybe I could use MongoReplicaSetClient.refresh instead.
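With refresh the handler would be simpler (again a sketch, reusing the hypothetical ALL_CLIENTS from above and assuming refresh() rediscovers the set synchronously in the calling thread, the way it reads in the 2.8 source):

```python
from pymongo.errors import AutoReconnect

try:
    result = orders_db.orders.find_one({"status": "pending"})
except AutoReconnect:
    for client in ALL_CLIENTS:
        # refresh() re-reads the replica set configuration in the calling
        # thread, so by the time it returns each client should know the
        # new primary (assuming one has already been elected).
        client.refresh()
    raise
```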
What do you think of this idea? Are there any disadvantages?
Could you help me with the implementation?