I have a memory leak in two applications on a Tomcat 6.0.35 server that appeared “out of nowhere”. One application is Solr and the other is our own software. I'm hoping someone has seen this before, because it has been happening for the past few weeks and I keep having to restart Tomcat in production.
It first appeared on our original server, even though none of the code dealing with threads or DB connections had been touched. Since the old server this application ran on had to be retired anyway, I moved the site to a new server with a “cleaner” environment, hoping that would clear out any stale leftovers. But the problem persists.
Just before Tomcat goes down, the catalina.out log fills up with errors such as:
2012-04-25 21:46:00,300 [main] ERROR org.apache.catalina.loader.WebappClassLoader - The web application [/AppName] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
2012-04-25 21:46:00,339 [main] ERROR org.apache.catalina.loader.WebappClassLoader - The web application [/AppName] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2] but has failed to stop it. This is very likely to create a memory leak.
2012-04-25 21:46:00,470 [main] ERROR org.apache.catalina.loader.WebappClassLoader - The web application [/AppName] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation.
During this migration we also upgraded from Solr 1.4 to Solr 3.6 in an attempt to fix the problem. When the errors above start filling up the log, the Solr error below is repeated 10-15 times, after which Tomcat stops responding and I have to shut it down and start it again to get it to respond.
2012-04-25 21:46:00,527 [main] ERROR org.apache.catalina.loader.WebappClassLoader - The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@1f1e90ac]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
My research has turned up a lot of suggestions about changing the code that manages threads so that it properly closes DB pool connections and so on, but that code has not changed in almost 12 months. Also, the Solr application is failing too, and that is third-party code, so I think this is something environmental (a JAR conflict, a version mismatch, a fat-fingered configuration?).
My latest change was updating the MySQL connector for Java (Connector/J) to the latest version, since earlier versions had some memory leak bugs around pooling, but the server still went down after only a few hours.
One thing that I just noticed is that I see thousands of sessions in the Tomcat web manager, but it could be a red herring.
If anyone has seen this before, any help would be greatly appreciated.
[edit]
I think I found the source of the problem. It was not a memory leak after all. I took over the application from another development team that uses c3p0 for database pooling through Hibernate. c3p0 has a bug/feature: if you do not release DB connections, c3p0 can go into a waiting state once all connections (up to maxPoolSize, which defaults to 15) are in use. It will then wait indefinitely for a connection to become available. Hence my stall.
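To illustrate the stall, here is a minimal sketch in plain Java. It does not use c3p0 at all: `TinyPool` is a hypothetical stand-in in which each `Semaphore` permit models one pooled connection, and all class and method names are made up for the illustration. A caller that never releases uses up the permits, while a caller that releases in `finally` never exhausts the pool. The sketch uses a short checkout timeout so it terminates; with an unbounded wait, the leaky case would simply hang.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionSketch {

    // Hypothetical stand-in for a connection pool: one permit per "connection".
    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int maxPoolSize) {
            this.permits = new Semaphore(maxPoolSize);
        }

        // Returns true if a "connection" was checked out within the timeout.
        boolean checkout(long timeoutMillis) throws InterruptedException {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        }

        void release() {
            permits.release();
        }
    }

    // Leaky callers: each one checks out a connection and never gives it back.
    static int runLeaky(int maxPoolSize, int calls) throws InterruptedException {
        TinyPool pool = new TinyPool(maxPoolSize);
        int succeeded = 0;
        for (int i = 0; i < calls; i++) {
            if (pool.checkout(20)) {
                succeeded++; // the connection is never released
            }
        }
        return succeeded;
    }

    // Careful callers: each one releases its connection in a finally block.
    static int runCareful(int maxPoolSize, int calls) throws InterruptedException {
        TinyPool pool = new TinyPool(maxPoolSize);
        int succeeded = 0;
        for (int i = 0; i < calls; i++) {
            if (!pool.checkout(20)) {
                continue;
            }
            try {
                succeeded++; // real work with the connection would go here
            } finally {
                pool.release();
            }
        }
        return succeeded;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("leaky:   " + runLeaky(15, 20) + " of 20 checkouts succeeded");
        System.out.println("careful: " + runCareful(15, 20) + " of 20 checkouts succeeded");
    }
}
```

With a pool of 15 and 20 sequential calls, the leaky variant gets only 15 checkouts and the remaining 5 time out, whereas the careful variant gets all 20. In real JDBC code the equivalent of the `finally` block is closing the `Connection` (or the Hibernate `Session`) so that it goes back to the pool.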
I raised maxPoolSize first from 25 to 100, and my application ran for several days without hanging, and then from 100 to 1000, and it has been stable ever since (more than 2 weeks).
This is not a complete solution, since I still need to find out why it runs out of pooled connections, so I also set c3p0's unreturnedConnectionTimeout to 4 hours, which imposes a 4-hour limit on every checked-out connection, regardless of whether it is active or not. If it is an active connection, c3p0 will close it and open a new one.
It's not pretty, and c3p0 does not recommend it, but it gives me some breathing space to track down the source of the problem.
Note: when using c3p0 with Hibernate, the settings are kept in the persistence.xml file, but not all settings can be placed there. Some settings (e.g. unreturnedConnectionTimeout) have to go in c3p0.properties.
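As a rough sketch, a c3p0.properties file for the 4-hour limit described above might look like the following. c3p0 expresses this timeout in seconds, so 4 hours is 14400. The other two entries were not part of the setup described in this post; they are standard c3p0 options that may help when hunting for the code that fails to return connections, and the values shown are placeholders.

```properties
# c3p0.properties, placed at the root of the application classpath

# Reclaim any connection that has been checked out for more than 4 hours (value in seconds).
c3p0.unreturnedConnectionTimeout=14400

# Assumption, not from the original setup: record the stack trace of each checkout
# and print it when an unreturned connection times out.
c3p0.debugUnreturnedConnectionStackTraces=true

# Assumption, not from the original setup: fail a checkout after 30 s (value in ms)
# instead of waiting forever; the default of 0 waits indefinitely.
c3p0.checkoutTimeout=30000
```

Capturing stack traces on every checkout adds overhead, so that option is probably best left on only while debugging.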