Kelly's Space

Coding 'n stuff

Diagnosing a Resource Leak

We had a customer report an issue with their NeoKicks-converted application, where NeoData was apparently exhausting pooled SQL connections. That seemed odd, given that their load was relatively low. To help us debug, we had them create a crash dump of IIS using the Debug Diagnostic Tool from Microsoft.

Once I got the crash dump, I fired up Windbg, opened the dump, and loaded SOS by issuing:

.loadby sos mscorwks

Since this is a .NET solution and the error was related to running out of pooled connections, I started by listing the objects on the managed heap using !dumpheap. The statistics at the end of its output show how many instances of each type exist.

Glancing through the output, I found the following:

[Image: !dumpheap statistics with the WorkerSession count highlighted]

The highlighted number above is the number of instances of that type. This one stands out because the WorkerSession object represents a current, running transaction and holds things like open file handles. If there are 525 active transactions, each holding a few NeoData files, then it is certainly possible for the SQL connection pool to be exhausted. Since this server wasn’t under much load (it had just been running for a while), 525 seems far too high; I would expect a count in the 1-10 range. Luckily, we can dig a little deeper in Windbg to see what is holding references to these objects.

The next step is to get more information about each one of these objects.  You can use:

!dumpheap -type Alchemy.NeoKicks.WorkerSession

to list every instance of that type along with its address.

[Image: !dumpheap -type output with one object address highlighted]

I’ve highlighted the address for one of the objects in the listing from !dumpheap.  I just picked one at random.  I want to find out what has references to this object, to see what might be preventing this object from being garbage collected.  To do this, I use the following:

!gcroot 000000016fa53470

[Image: !gcroot output for the selected object]

I looked at a few other objects (again, using !gcroot) and found that they were all ultimately being held by a System.Threading._TimerCallback object.

This was getting interesting!  Why in the world would a TimerCallback object have a reference to my WorkerSession object?  I decided it was time to go looking through our NeoKicks code to see where a TimerCallback was used.  It turns out, a WorkerSession object can contain a Timer object that is used when a CICS START is issued so that the transaction will execute on a thread pool thread (and optionally on a time interval).  These objects should have gotten cleaned up by the GC, since these tasks run immediately and return.
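One likely way such a reference arises is through the callback delegate itself. The NeoKicks source isn’t shown here, so the sketch below is only an assumption about the general pattern: DemoSession, Run, and MakeCallback are hypothetical names. When an instance method is used as a TimerCallback, the delegate’s Target property points back at the owning object, so anything that keeps the delegate alive keeps the session alive too.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for WorkerSession; only the delegate wiring matters here.
public sealed class DemoSession
{
    // Instance method used as the timer callback.
    public void Run(object state) { }

    // Builds the delegate that a Timer would be given.
    public TimerCallback MakeCallback()
    {
        return Run;
    }
}

public static class DelegateTargetDemo
{
    // Returns true when the callback delegate holds a reference to the session.
    public static bool CallbackReferencesSession()
    {
        var session = new DemoSession();
        TimerCallback callback = session.MakeCallback();
        return ReferenceEquals(callback.Target, session);
    }
}
```

That reference chain matches what !gcroot reported: the timer infrastructure holds the callback, and the callback holds the session.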

I decided to look through the MSDN documentation for Timer objects and found the following:

“When a timer is no longer needed, use the Dispose method to free the resources held by the timer.”

Now we’re getting somewhere. Sure enough, although we implement IDisposable for the WorkerSession object, we were not calling Dispose on the Timer object. I added the Dispose() call and re-ran a test that reproduced the problem, and the issue went away.
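For illustration, here is a minimal sketch of what that kind of fix can look like. It is not the actual NeoKicks code: SessionWithTimer, ScheduleStart, and OnStart are hypothetical names, and the real WorkerSession does much more. The point is only that the owning object’s Dispose() also disposes the System.Threading.Timer it created.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a session type that owns a one-shot timer.
public sealed class SessionWithTimer : IDisposable
{
    private Timer _startTimer;
    private bool _disposed;

    public bool IsDisposed { get { return _disposed; } }

    // Schedules OnStart to run once on a thread pool thread after dueTimeMs.
    public void ScheduleStart(int dueTimeMs)
    {
        if (_disposed) throw new ObjectDisposedException("SessionWithTimer");
        // Using an instance method as the callback makes the timer reference this object.
        _startTimer = new Timer(OnStart, null, dueTimeMs, Timeout.Infinite);
    }

    private void OnStart(object state)
    {
        // The transaction's work would run here.
    }

    // Releases the timer so it no longer holds the callback (and this object).
    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        if (_startTimer != null)
        {
            _startTimer.Dispose();
            _startTimer = null;
        }
    }
}
```

Callers would then wrap the session in a using block, or call Dispose() explicitly once the transaction has finished.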

The moral of this story – if you are using System.Threading.Timer objects in your code, you must call Dispose() on the object when you’re done.  Otherwise, you will end up with a memory leak!
