Monday, April 6, 2009

Retry Oriented Thread Synchronization

Somewhat recently at work I've been troubleshooting some concurrency edge cases involving deadlocks caused by thread synchronization. In my experience, this is often the result of creating separate lock objects for separate collections or something of the sort. In most of the code only one collection/object is accessed at a time, and the separate synchronization objects allow for less contention and more throughput. Excluding simple/obvious mistakes, the problem typically arises when different code paths acquire the locks in opposite order, and it becomes more difficult to trace when events/handlers are involved. Obviously one goal when locking is to hold the synchronization object for as little time as possible, but sometimes it's simply necessary to hold a lock while entering an event or callback.
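To make the opposite-order problem concrete, here is a minimal sketch (not code from our application; every name in it is made up for illustration). Two threads each take one lock and then ask for the other thread's lock. So that the demo finishes instead of hanging, the second acquisition uses Monitor.TryEnter with a timeout; a plain nested lock statement in the same spot would block both threads forever.

```csharp
using System;
using System.Threading;

static class LockOrderDemo
{
    // Takes 'first', waits until the other thread holds its own first lock,
    // then tries 'second' with a timeout instead of blocking forever.
    static bool TakeBoth(object first, object second,
                         CountdownEvent bothHoldFirst, CountdownEvent bothTried)
    {
        lock (first)
        {
            bothHoldFirst.Signal();
            bothHoldFirst.Wait();

            bool gotSecond = Monitor.TryEnter(second, TimeSpan.FromMilliseconds(200));
            if (gotSecond) Monitor.Exit(second);

            // Keep holding 'first' until both threads have finished trying,
            // which is the situation a real deadlock would be stuck in.
            bothTried.Signal();
            bothTried.Wait();
            return gotSecond;
        }
    }

    // Returns, for each thread, whether it managed to get its second lock.
    public static bool[] Run()
    {
        object syncA = new object(), syncB = new object();
        bool[] gotSecond = new bool[2];
        using (var bothHoldFirst = new CountdownEvent(2))
        using (var bothTried = new CountdownEvent(2))
        {
            // Same two locks, opposite order.
            Thread t1 = new Thread(() => gotSecond[0] = TakeBoth(syncA, syncB, bothHoldFirst, bothTried));
            Thread t2 = new Thread(() => gotSecond[1] = TakeBoth(syncB, syncA, bothHoldFirst, bothTried));
            t1.Start(); t2.Start();
            t1.Join(); t2.Join();
        }
        return gotSecond;
    }

    static void Main()
    {
        bool[] r = Run();
        Console.WriteLine("A-then-B got B: {0}; B-then-A got A: {1}", r[0], r[1]);
    }
}
```

Neither thread can get its second lock while the other still holds it, so both attempts time out. In real code the two acquisitions are usually far apart (one of them often inside an event handler), which is what makes the cycle hard to spot.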

One solution, which worked fine for us, was to get rid of one of the synchronization objects and use the same one for both collections. Ultimately that was all our fix required, since we couldn't change the other code paths or unlock one of the synchronization roots, but it did inspire me to start thinking about a better solution. I've been toying with an idea somewhat similar to the pattern used by TransactionScope, taking advantage of the using statement to keep the syntax simple (since the lock statement is so lightweight, it would be ideal to keep the new solution quick and easy). I certainly haven't found the ideal solution just yet, and it's quite likely that there really is no ideal solution, but I've posted the new code and test application on CodePlex, hoping others in the community will see the potential advantages and help me improve on it, either with code or ideas. Note that this doesn't cover Mutexes, Semaphores, or any other wait-handle-style synchronization. It is currently specific to Monitor and may only improve circumstances that use traditional lock statements or Monitor directly.

The general concept behind usage in the current implementation is as follows:

    using (LoopLock l = new LoopLock(ltp.Syncs))
    {
        // optional event, but here for testing
        l.LockAttemptFailed += new LoopLockEventHandler(delegate(LoopLock sender, LoopLockEventArgs e)
        {
            if (e.Attempts > 200)
            {
                // sample of aborting if it takes too long to get a successful lock
                e.AbortLock();
            }
        });

        l.AcquireLock();

        // simulate work while all locks are held
        Thread.Sleep(rnd.Next(5490) + 20);
    }

The LoopLock constructor takes a params array of all the synchronization objects, in the order they should be locked. AcquireLock is a separate method only so that an event handler can be attached before locking begins; the event is described in a moment. AcquireLock attempts to obtain a lock on each synchronization object, one at a time. If it cannot obtain one of the locks within 100ms (the default, though there is a constructor overload that takes the timeout period), it unlocks each of the objects it has already locked, in reverse order, and fires the LockAttemptFailed event. The event provides the number of tries so far, along with the option of aborting the process altogether (which throws a LoopLockAbortedException). If the handler does not abort, the attempt starts over. Once all locks are acquired the code proceeds and, when it leaves the scope of the using statement, all locks are released in the opposite order they were locked.
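For readers who don't want to pull the project from CodePlex, the acquire/back-off/retry loop described above can be sketched roughly like this. This is not the actual LoopLock source: RetryLockSketch, Acquire, and the shouldAbort callback are names I made up for the illustration, a plain callback stands in for the LockAttemptFailed event, and TimeoutException stands in for LoopLockAbortedException.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for LoopLock; a sketch of the idea, not the CodePlex code.
sealed class RetryLockSketch : IDisposable
{
    private readonly object[] syncs;
    private readonly TimeSpan timeout;
    // Locks currently held, most recently acquired on top.
    private readonly Stack<object> held = new Stack<object>();

    public RetryLockSketch(TimeSpan timeout, params object[] syncs)
    {
        this.timeout = timeout;
        this.syncs = syncs;
    }

    // Number of failed passes so far.
    public int Attempts { get; private set; }

    // shouldAbort (may be null) is called after each failed pass with the
    // attempt count; returning true gives up instead of retrying.
    public void Acquire(Func<int, bool> shouldAbort)
    {
        while (!TryLockAll())
        {
            Attempts++;
            if (shouldAbort != null && shouldAbort(Attempts))
                throw new TimeoutException("Gave up after " + Attempts + " attempts.");
        }
    }

    // One pass: lock each object in order, backing out completely on a timeout.
    private bool TryLockAll()
    {
        foreach (object sync in syncs)
        {
            if (!Monitor.TryEnter(sync, timeout))
            {
                ReleaseAll();
                return false;
            }
            held.Push(sync);
        }
        return true;
    }

    // Releases whatever is held, in reverse order of acquisition.
    private void ReleaseAll()
    {
        while (held.Count > 0)
            Monitor.Exit(held.Pop());
    }

    // Must run on the thread that called Acquire, as with any Monitor lock.
    public void Dispose()
    {
        ReleaseAll();
    }
}

static class RetryLockDemo
{
    static void Main()
    {
        object a = new object(), b = new object();
        using (var l = new RetryLockSketch(TimeSpan.FromMilliseconds(100), a, b))
        {
            l.Acquire(attempts => attempts > 200);
            Console.WriteLine("holding both: {0}", Monitor.IsEntered(a) && Monitor.IsEntered(b));
        }
        Console.WriteLine("still holding a: {0}", Monitor.IsEntered(a));
    }
}
```

The important part is that a failed pass releases everything before trying again, so a thread that wants the same locks in the opposite order gets a window to finish. Note that this only reduces the chance of two threads blocking each other indefinitely; two retrying threads can still keep colliding for a while, which is why the abort option exists.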

One thing I've already considered while working with the simple test application is adding support for prioritization based on retry counts. That could be useful, but since the counts are thread specific and this is supposed to be a lightweight class, it may require the use of WeakReferences. I haven't gone through with it yet, since I'm still looking for other possibilities and it could wind up being a waste of effort.

One problem mentioned above that this still doesn't solve is the event handler situation, where I lock a synchronization object important to me and then fire an event that arbitrary code can attach to. Since I don't have control over both code-bases, I can't see a means to provide any "let me get out of your way" logic, because we can't release the lock once synchronization-dependent code has already begun executing. I was thinking that a delegate or delegate wrapper that carries a reference to the synchronization object might work on some occasions, but without some under-.NET's-hood voodoo it would still sacrifice syntax clarity/diversity, which I'm trying to avoid. It seems there will have to be a tradeoff somewhere in order to improve this model, and using a single callback with a synchronization object reference instead of supporting a multicast delegate may be that answer, but for now I'm going to think on it more.

I would really love any insight from those in the wilderness. I've certainly not been exposed to every method of using threads and synchronization and, while I'm pretty familiar with Monitor and other synchronization classes, there could still be something obvious I'm not privy to.

-TheXenocide