MSDN Magazine - October 2008 - (Page 153)

Stephen Toub, Igor Ostrovsky, and Huseyin Yildiz
.NET Matters
False Sharing

Unless you’ve been living under a rock, you’ve likely heard about the “manycore shift.” Processor manufacturers such as Intel and AMD are increasing the power of their hardware by scaling out the number of cores on a processor rather than by attempting to continue to provide exponential increases in clock speed. This shift demands that software developers start to write all of their applications with concurrency in mind in order to benefit from these significant increases in computing power.

To address this issue, a multitude of concurrency libraries and languages are beginning to emerge, including Parallel Extensions to the Microsoft .NET Framework, the Parallel Pattern Library (PPL), the Concurrency & Coordination Runtime (CCR), Intel’s Threading Building Blocks (TBB), and others. These libraries all aim to decrease the amount of boilerplate code necessary to write efficient parallel applications by providing constructs such as Parallel.For and AsParallel.

Unfortunately, while these constructs represent a monumental step forward in expressing parallelism, they don’t obviate the need for developers to be aware of what their code is doing, how it’s structured, and how hardware can have a significant impact on the performance of the application. While the software industry is making strides in developing new programming models for concurrency, there does not seem to be a programming model on the horizon that would magically eliminate all concurrency-related issues. At least in the near term, understanding how memory and caches work will be important in order to write efficient parallel programs.
Figure 1 Memory Access Patterns Are Important

    using System;
    using System.Diagnostics;

    class Program
    {
        public static void Main()
        {
            const int SIZE = 10000;
            int[,] matrix = new int[SIZE, SIZE];
            while (true)
            {
                // Faster: the inner loop walks along a row
                Stopwatch sw = Stopwatch.StartNew();
                for (int row = 0; row < SIZE; row++)
                {
                    for (int column = 0; column < SIZE; column++)
                    {
                        matrix[row, column] = (row * SIZE) + column;
                    }
                }
                Console.WriteLine(sw.Elapsed);

                // Slower: the inner loop walks down a column
                sw = Stopwatch.StartNew();
                for (int column = 0; column < SIZE; column++)
                {
                    for (int row = 0; row < SIZE; row++)
                    {
                        matrix[row, column] = (row * SIZE) + column;
                    }
                }
                Console.WriteLine(sw.Elapsed);

                Console.WriteLine("=================");
                Console.ReadLine();
            }
        }
    }

It All Comes Down to Hardware

The concept of knowing what’s happening at the lower levels of an application is, of course, not new. To achieve optimal performance, developers need to have a good understanding of how things like memory accesses affect the performance of an application.

When we talk about reading from and writing to memory, we typically gloss over the fact that, in modern hardware, it’s rare to read from and write to the machine’s memory banks directly. Memory access is slow: orders of magnitude slower than mathematical calculations, though orders of magnitude faster than accessing hard disks and network resources. To account for this slow memory access, most processors today use memory caches to improve application performance.

Caches come in multiple levels, with most consumer machines today having at least two levels, referred to as L1 and L2, and some having more than that. L1 is the fastest, but it’s also the most expensive, so machines will typically have a small amount of it. (The laptop on which we’re writing this column has 128KB of L1 cache.) L2 is a bit slower, but is less expensive, so machines will have more of it. (The same laptop has 2MB of L2 cache.)
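To see why the two loops in Figure 1 touch memory so differently, it can help to work out how far apart consecutive writes land. The following is a minimal sketch, not part of the original column: the class name RowMajorLayout and the helper ByteOffset are hypothetical, and the sketch assumes the row-major layout that the CLR uses for rectangular arrays such as int[,], with 4-byte int elements. Note that the value Figure 1 stores, (row * SIZE) + column, is the same linear index used here.

```csharp
using System;

static class RowMajorLayout
{
    // Byte offset of matrix[row, column] from the first element.
    // Assumption: row-major storage with `columns` elements per row.
    public static long ByteOffset(int row, int column, int columns, int elementSize)
    {
        return ((long)row * columns + column) * elementSize;
    }

    public static void Main()
    {
        const int SIZE = 10000;              // same dimension as Figure 1
        const int ElementSize = sizeof(int); // 4 bytes per element

        // "Faster" loop: the inner loop advances the column by one.
        long columnStep = ByteOffset(0, 1, SIZE, ElementSize)
                        - ByteOffset(0, 0, SIZE, ElementSize);

        // "Slower" loop: the inner loop advances the row by one.
        long rowStep = ByteOffset(1, 0, SIZE, ElementSize)
                     - ByteOffset(0, 0, SIZE, ElementSize);

        Console.WriteLine("Faster loop, step between writes: {0} bytes", columnStep);
        Console.WriteLine("Slower loop, step between writes: {0} bytes", rowStep);
    }
}
```

Under these assumptions the faster loop writes to adjacent 4-byte slots, while the slower loop jumps a full row (SIZE elements) between writes, so successive writes are far apart in memory.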
When data is read from memory, the requested data as well as data around it (referred to as a cache line) is loaded from memory into the caches, then the program is served from the caches. This loading of a whole cache line rather than individual bytes can dramatically improve application performance. On our laptop the cache line size for both L1 and L2 is 64 bytes. Because applications frequently read bytes sequentially in memory (common when accessing arrays and the like), loading a whole cache line at once lets them avoid hitting main memory on every request: the data about to be read has likely already been loaded into the cache. However, this does mean that a developer

Send your questions and comments to netqa@microsoft.com.
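The benefit of loading whole cache lines can be made concrete with a little arithmetic. The sketch below is illustrative only: the class CacheLineCount and the method LinesTouched are hypothetical names, and it assumes the data starts on a cache-line boundary, which real arrays need not do. It uses the 64-byte line size quoted for the authors' laptop and counts how many distinct lines a run of accesses touches for a given stride.

```csharp
using System;

static class CacheLineCount
{
    // Number of distinct cache lines touched by `accesses` reads that start
    // at byte offset 0 and advance by `strideBytes` each time.
    // Assumptions: offset 0 is line-aligned and strideBytes > 0.
    public static long LinesTouched(long accesses, long strideBytes, long lineSize)
    {
        if (accesses <= 0) return 0;

        // A stride of at least one line puts every access on a new line.
        if (strideBytes >= lineSize) return accesses;

        // A smaller stride can never skip a line, so every line from the
        // first to the last access is touched.
        long lastOffset = (accesses - 1) * strideBytes;
        return lastOffset / lineSize + 1;
    }

    public static void Main()
    {
        const long LineSize = 64;    // line size quoted in the column
        const long Accesses = 1024;  // arbitrary sample size for illustration

        // Sequential 4-byte reads versus reads that are 40,000 bytes apart.
        long sequential = LinesTouched(Accesses, sizeof(int), LineSize);
        long strided = LinesTouched(Accesses, 40000, LineSize);

        Console.WriteLine("Sequential: {0} lines for {1} reads", sequential, Accesses);
        Console.WriteLine("Strided:    {0} lines for {1} reads", strided, Accesses);
    }
}
```

With 4-byte elements, 16 of them fit in one 64-byte line, so a sequential scan can be served from a freshly loaded line many times before the next line is needed, whereas a widely strided scan needs a different line for every read.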