Dr. Dobb's Journal - January 2008 - (Page 67) d02sutt_p5ma 11/9/07 11:41 AM Page 67 by Herb Sutter Effective Concurrency Use Lock Hierarchies to Avoid Deadlock Keep it on the level: Use lock hierarchies to avoid deadlock in the code you control IN THE FIRST TWO Effective Concurrency columns— ”The Pillars of Concurrency” (DDJ, August 2007) and “How Much Scalability Do You Have or Need?” (DDJ, September 2007)—we saw the three pillars of concurrency and what kinds of concurrency they express: • Pillar 1: Isolation via asynchronous agents. This is all about doing work asynchronously to gain isolation and responsiveness, using techniques like messaging. Some Pillar 1 techniques, such as pipelining, also enable limited scalability, but this category principally targets isolation and responsiveness. • Pillar 2: Scalability via concurrent collections. This is all about reenabling the free lunch of being able to ship applications that get faster on machines with more cores, using more cores to get the answer faster by exploiting parallelism in algorithms and data. • Pillar 3: Consistency via safely shared resources. While doing all of the above, we need to avoid races and deadlocks when using shared objects in memory or any other shared resources. My last few columns have focused on Pillar 3, and I’ve referred to lock levels or lock hierarchies as a way to control deadlock. A number of readers have asked me to explain what those are and how they work, so I’ll write one more column about Pillar 3 (for now). time are acquired in a consistent order. But how can we ensure this in a way that will be both usable and correct? For example, we could try to figure out which groups of mutexes might ever be held at the same time, and then try to define pairwise ordering rules that cover each possible combination. But that approach by itself is prone to accidentally missing unexpected combinations of locks; and even if we did it perfectly, the result would still be at best “DAG spaghetti”—a directed acyclic graph (DAG) that nobody could comprehend as a whole. And every time we want to add a new mutex to the system, we would have to find a way fit it into the DAG without creating any cycles. We can do better by directly exploiting the knowledge we already have about the structure of the program to regularize the mess and make it understandable. A Solution: Lock Hierarchies and Layering The idea of a lock hierarchy is to assign a numeric level to every mutex in the system, and then consistently follow two simple rules: • Rule 1: While holding a lock on a mutex at level N, you may only acquire new locks on mutexes at lower levels <N. • Rule 2: Multiple locks at the same level must be acquired at the same time, which means we need a “lock-multiple” operation such as lock( mut1, mut2, mut3, … ). This operation internally has the smarts to make sure it always takes the requested locks in some consistent global order. [1] Note that any consistent order will do; for example, one typical strategy is to acquire mutexes at the same level in increasing address order. If the entire program follows these rules, then there can be no deadlock among the mutex acquire operations, because no two pieces of code can ever try to acquire two mutexes a and b in opposite orders: Either a and b are at different levels and so the one at the higher level must be taken first; or else they are at the same level and they must be requested at the same time, and the system will automatically acquire them in the same order. The two simple rules have provided a convenient and understandable way to conveniently express a total order on all locking performed in the system. But where do we find the levels? The answer is: You probably already have them. Mutexes protect data, and the data is already in layers. Lock levels should directly leverage and mirror the layering already in place in the modular structure of your application. Figure 1 illustrates a typical example of layering (or “hierarchical decomposition” and “into a directed acyclic graph,” if you prefer five-dollar words), a time-tested technique to control the dependencies in your software. The idea is to group your code into modules and the modules into layers, where code at a given layer can only call code at the same or lower layers, and should avoid calling upward into higher layers. If that sounds a lot like the Two Rules of lock hierarchies, that’s no coincidence. After all, both the layering and the mutexes are driven by the same goal: to protect and control access to the encapsulated data that is owned by each piece of code, and to keep it free from corruption by maintaining its invariants correctly. As in Figure 1, the levels you assign to mutexes will normally closely follow the levels in your program’s layered structure. A direct consequence of Rule 1 is that locks held on mutexes at lower levels have a shorter duration than locks held at higher levels; this is just what we expect of calls into code at lower layers of a layered software system. January 2008 l www.ddj.com l Dr. Dobb’s Journal The Problem: Anatomy of a Deadlock Your program contains a potential deadlock if: • One part of your program tries to acquire exclusive use of two shared resources (such as mutexes) a and b at the same time by acquiring first a and then b. • Some other part of your program tries to do the same by acquiring b and then a in the reverse order. • The two pieces of code could ever execute concurrently. For convenience, from now on I’m going to talk about just locks, but the issues and techniques apply to any shared resource that needs to be used exclusively by one piece of code at a time. The following code shows the simplest example of a potential deadlock: // Thread 1: a then b // Thread 2: b then a a.lock(); b.lock(); b.lock(); a.lock(); … … // unlock a and b // unlock a and b // in either order // in either order The only way to eliminate such a potential deadlock is to make sure that all mutexes ever held at the same 67 http://www.ddj.com
For optimal viewing of this digital publication, please enable JavaScript and then refresh the page. If you would like to try to load the digital publication without using Flash Player detection, please click here.