We recently had a network outage during which application threads got stuck while checking out connections from c3p0. Our settings are as follows:
c3p0 version: 0.9.1.2
- c3p0.acquireRetryDelay = 10000;
- c3p0.acquireRetryAttempts = 0;
- c3p0.breakAfterAcquireFailure = false;
- c3p0.numHelperThreads = 8;
- c3p0.idleConnectionTestPeriod = 3;
- c3p0.preferredTestQuery = "select 1 from dual";
- c3p0.checkoutTimeout = 3000;
- c3p0.user = "XYZ"; // redacted for posting
- c3p0.password = "XYZ"; // redacted for posting
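For reference, the same settings can be expressed as a c3p0.properties file on the classpath, which uses these c3p0.* keys. The Java sketch below builds them with java.util.Properties and prints them in that format. The unit comments follow c3p0's documented units (milliseconds for acquireRetryDelay and checkoutTimeout, seconds for idleConnectionTestPeriod); the class name C3p0Settings is hypothetical, and the test query assumes Oracle's `select 1 from dual`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper: holds the pool settings from this post as c3p0.* keys.
class C3p0Settings {
    static Properties build() {
        Properties p = new Properties();
        p.setProperty("c3p0.acquireRetryDelay", "10000");        // milliseconds between acquire attempts
        p.setProperty("c3p0.acquireRetryAttempts", "0");         // 0 = keep retrying indefinitely
        p.setProperty("c3p0.breakAfterAcquireFailure", "false");
        p.setProperty("c3p0.numHelperThreads", "8");
        p.setProperty("c3p0.idleConnectionTestPeriod", "3");     // seconds
        p.setProperty("c3p0.preferredTestQuery", "select 1 from dual"); // assumes Oracle
        p.setProperty("c3p0.checkoutTimeout", "3000");           // milliseconds
        p.setProperty("c3p0.user", "XYZ");                       // redacted
        p.setProperty("c3p0.password", "XYZ");                   // redacted
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Print the settings in .properties syntax, ready to save as c3p0.properties.
        StringWriter out = new StringWriter();
        build().store(out, "c3p0.properties sketch");
        System.out.print(out);
    }
}
```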
Under normal conditions everything works fine, and c3p0 serves us well. However, during a recent network event (a network partition in which the application hosts could not talk to the database), application threads hung indefinitely while trying to get connections from c3p0.
Stack trace from the logs:
Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
	at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:106)
	at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:65)
	at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:527)
	at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:128)
	at amazon.identity.connection.WrappedDataSource.getConnectionWithOptionalCredentials(WrappedDataSource.java:42)
	at amazon.identity.connection.LoggingDataSource.getConnectionWithOptionalCredentials(LoggingDataSource.java:55)
	at amazon.identity.connection.WrappedDataSource.getConnection(WrappedDataSource.java:30)
	at amazon.identity.connection.WrappedDataSource.getConnectionWithOptionalCredentials(WrappedDataSource.java:42)
	at amazon.identity.connection.ConnectionProfilingDataSource.profileGetConnectionWithOptionalCredentials(ConnectionProfilingDataSource.java:118)
	at amazon.identity.connection.ConnectionProfilingDataSource.getConnectionWithOptionalCredentials(ConnectionProfilingDataSource.java:99)
	at amazon.identity.connection.WrappedDataSource.getConnection(WrappedDataSource.java:30)
	at amazon.identity.connection.CallCountTrackingDataSource.getConnectionWithOptionalCredentials(CallCountTrackingDataSource.java:82)
	at amazon.identity.connection.WrappedDataSource.getConnection(WrappedDataSource.java:30)
	at com.amazon.jdbc.FailoverDataSource.doGetConnection(FailoverDataSource.java:133)
	at com.amazon.jdbc.FailoverDataSource.getConnection(FailoverDataSource.java:109)
	at com.amazon.identity.accessmanager.WrappedConnection$1.call(WrappedConnection.java:84)
	at com.amazon.identity.accessmanager.WrappedConnection$1.call(WrappedConnection.java:82)
	at com.amazon.identity.accessmanager.WrappedConnection.getConnection(WrappedConnection.java:110)
	... 40 more
Caused by: com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResourcePool@185e5c6b -- timeout at awaitAvailable()
	at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1317)
	at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:557)
	at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
	at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
	at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
	....... (total of 317 such instances of prelimCheckoutResource)
Some relevant excerpts from the c3p0 documentation:
When a c3p0 DataSource attempts and fails to acquire a Connection, it will retry up to acquireRetryAttempts times, with a delay of acquireRetryDelay between each attempt. If all attempts fail, any clients waiting for Connections from the DataSource will see an Exception indicating that a Connection could not be acquired. Note that clients do not see any Exception until a full round of attempts fails, which may be some time after the initial Connection attempt. If acquireRetryAttempts is set to 0, c3p0 will attempt to acquire new Connections indefinitely, and calls to getConnection() may block indefinitely waiting for a successful acquisition.
checkoutTimeout limits how long a client will wait for a Connection, if all Connections are checked out and one cannot be supplied immediately.
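To make the retry arithmetic in the first excerpt concrete, here is a minimal Java sketch; the class and method names are hypothetical and not part of c3p0. It counts only the documented delays between attempts, so it is a rough lower bound that ignores how long each failed attempt itself takes. With acquireRetryAttempts = 0 there is no bound at all.

```java
import java.util.OptionalLong;

// Hypothetical model of the documented retry behavior, not c3p0 code.
class AcquireRoundEstimate {
    /**
     * Rough lower bound on how long clients may wait before a failed round of
     * acquisition attempts is reported, counting only the delays between attempts.
     * Returns empty when acquireRetryAttempts is 0, i.e. c3p0 retries forever.
     */
    static OptionalLong minWaitBeforeFailureMillis(int acquireRetryAttempts, long acquireRetryDelayMillis) {
        if (acquireRetryAttempts < 0) {
            throw new IllegalArgumentException("this model only handles attempts >= 0");
        }
        if (acquireRetryAttempts == 0) {
            return OptionalLong.empty(); // unbounded: clients may block indefinitely
        }
        // N attempts have N - 1 gaps of acquireRetryDelay between them.
        return OptionalLong.of((acquireRetryAttempts - 1) * acquireRetryDelayMillis);
    }

    public static void main(String[] args) {
        // Settings from this post: 0 attempts, 10000 ms delay.
        System.out.println(minWaitBeforeFailureMillis(0, 10_000));
        // For comparison: 30 attempts spaced 1000 ms apart.
        System.out.println(minWaitBeforeFailureMillis(30, 1_000));
    }
}
```

With this post's settings the result is empty, meaning no upper limit on the wait, while 30 attempts spaced 1000 ms apart already imply at least 29000 ms of delay before clients see an exception.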
So here is my theory of why this happened:
The network partition lasted several minutes. I assume that by then, the idle connection tests had failed for every active connection in the pool, so all of them were discarded. This means c3p0 would now be busy acquiring new connections. Any application thread trying to get a connection from the pool would have to wait indefinitely until a new connection was acquired (see the excerpt from the c3p0 documentation). Moreover, checkoutTimeout would not help in this case, since it only applies a timeout when all connections are checked out (which was not the case here).
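The two behaviors being compared here, an unbounded wait versus a wait capped by a timeout, can be illustrated with a toy pool built on java.util.concurrent. This is a hypothetical model, not c3p0's implementation: it does not reproduce c3p0's internal rules for when checkoutTimeout applies, only the documented meaning of the value (0 waits indefinitely, a positive value gives up after that many milliseconds).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy pool, NOT c3p0: idle "connections" are just strings in a queue. */
class ToyPool {
    private final BlockingQueue<String> idle = new LinkedBlockingQueue<>();

    void checkIn(String conn) {
        idle.add(conn);
    }

    /**
     * checkoutTimeoutMillis == 0 waits indefinitely; a positive value bounds
     * the wait and then fails with a TimeoutException.
     */
    String checkout(long checkoutTimeoutMillis) throws InterruptedException, TimeoutException {
        if (checkoutTimeoutMillis == 0) {
            return idle.take(); // blocks until some connection is checked in
        }
        String conn = idle.poll(checkoutTimeoutMillis, TimeUnit.MILLISECONDS);
        if (conn == null) {
            throw new TimeoutException("no connection within " + checkoutTimeoutMillis + " ms");
        }
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // Empty pool, as if every connection had been discarded after failing its test.
        ToyPool pool = new ToyPool();
        try {
            pool.checkout(3000);
        } catch (TimeoutException e) {
            System.out.println("bounded wait gave up: " + e.getMessage());
        }
        pool.checkIn("conn-1");
        System.out.println("got " + pool.checkout(3000));
    }
}
```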
My questions are:
- Do I understand the system correctly?
- If so, is there a setting (checkoutTimeout or some other parameter) that would time out such connection requests from the application instead of letting them hang forever?
- Is there a better way to configure c3p0 to avoid this problem in the future? I could wrap the c3p0 checkout in a thread-based timeout, but I would rather avoid that if a better c3p0 configuration or a c3p0 patch can solve it.
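The thread-based workaround mentioned in the last question could look roughly like the Java sketch below, which uses an ExecutorService and Future.get with a timeout. The TimedCall class and callWithTimeout method are hypothetical; in the application the task would presumably wrap the existing getConnection() call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper that bounds how long a caller waits for a blocking call.
class TimedCall {
    // Daemon threads so a call that never returns cannot keep the JVM alive.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    /** Runs task on a helper thread and waits at most timeoutMillis for its result. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = EXECUTOR.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the helper thread; this only frees it if the blocked call responds to interrupts.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // In the application, the task might be something like dataSource::getConnection.
        System.out.println(callWithTimeout(() -> "fast result", 1000));
        try {
            callWithTimeout(() -> { Thread.sleep(10_000); return "too slow"; }, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

One drawback of this design: if the wrapped getConnection() eventually succeeds after the caller has already given up, the returned connection is never closed and leaks from the pool, so a real wrapper would need to close such late results.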
thanks