Dr. Dobb's Journal - April 2008 - (Page 23) d04scarp_p4ds 2/13/08 11:47 AM Page 23 Perforce of text at the same time. Consequently, they can share a single program control structure. This is a great advantage. We can fuse multiple DFAs in single, long basic blocks, which are crucial for performance on SPEs—they mitigate branch overhead and give the compiler enough elbow room during instruction scheduling. This results in reduced stalls, hiding of load/store latencies, better dual pipeline exploitation, and lower clock Cycles Per Instruction (CPI) values. The Cell is so different from traditional processors that you can’t get the best out of it unless your code explicitly takes advantage of its hardware Fast Software Configuration Management Our Fast Solution The smaller the dictionary, the smaller the corresponding State Transition Table (STT). If the STT is small enough, we can keep it entirely in the local store rather than in main memory. This speeds up the algorithm because main memory accesses are slow. Our fast implementation is based on the simple idea of keeping the STT in the local store. As we anticipated, a single automaton poorly utilizes the SPE hardware. Its implementation uses only four registers out of the 127 available, and no SIMD instructions. Moreover, an SPE has two pipelines and could issue two instructions at the same time if they do not conflict, but this implementation does not exploit this feature. As a result, the CPI is high (2.6) and throughput is low (1.35 Gbps). But if we fuse together 16 automata, things improve a lot, because with SIMD instructions our code can process data from four automata at the same time. This implementation better exploits the registers (40 out of 127) and the two pipelines (43.8 percent of the issues are dual), reaching a CPI of 0.67. This is a big improvement, but the code still stalls 7.4 percent of the cycles. To get rid of these stalls, we unroll the code manually, creating longer basic blocks. This gives the compiler more freedom to reschedule instruc- 200,000 developers rely on Perforce to manage their code. Here’s what they’re saying “Perforce is absolutely the only tool I would trust to manage our source code.” tions and fill the stalls. We found experimentally that the best conditions happen when we unroll the main loop three times. The resulting code uses almost all the registers (124 out of 127) without spilling, it has a high dual-issue rate, and no stalls. This leads to a remarkable throughput of 5.11 Gbps. Loading text from memory is not in the way: We adopt a double buffering scheme to load the next chunk while the current one is processed. This completely hides the load latency, because transfers happen asynchronously and take less time than processing. In fact, loading a 16-KB text chunk takes 5.94 microseconds, whereas processing it takes 25.64 microseconds. All this happens on a single SPE. To unleash all of the Cell’s power, the remaining seven SPEs can be put to work, too. We can do this in two ways: • When they operate in parallel, all the SPEs have identical STTs, but they Nick Triantos NVIDIA “Perforce has been a breath of fresh air; a product which does what it's supposed to do and does it well.” Scott Buchholz Sybase “Perforce is the cornerstone of Monster’s software production lines.” Mark Conway Monster Worldwide “The bottom line is that Perforce has proven itself to be a reliable platform for global development at Symantec.” Russell Jackson Symantec See for yourself Free evaluation at www.perforce.com Figure 2: Processing power and memory subsystem. April 2008 l www.ddj.com l Dr. Dobb’s Journal 23 http://www.perforce.com http://www.ddj.com
Table of Contents Feed for the Digital Edition of Dr. Dobb's Journal - April 2008 Dr. Dobb's Journal - April 2008 Contents Hmmmm Alia Vox Developer Diaries Dr. Dobb's Excellence in Programming Award Conversations Fast String Search on Multicore Processors The Byzantine Generals Problem Optimizing Math-Intensive Applications with Fixed-Point Arithmetic Random Numbers in a Range Using Generic Programming The Agile Edge Effective Concurrency Swaine's Flames Dr. Dobb's Journal - April 2008 Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page Cover1) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page Cover2) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page 1) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page 2) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page 3) Dr. Dobb's Journal - April 2008 - Contents (Page 4) Dr. Dobb's Journal - April 2008 - Contents (Page 5) Dr. Dobb's Journal - April 2008 - Hmmmm (Page 6) Dr. Dobb's Journal - April 2008 - Hmmmm (Page 7) Dr. Dobb's Journal - April 2008 - Hmmmm (Page 8) Dr. Dobb's Journal - April 2008 - Hmmmm (Page 9) Dr. Dobb's Journal - April 2008 - Alia Vox (Page 10) Dr. Dobb's Journal - April 2008 - Alia Vox (Page 11) Dr. Dobb's Journal - April 2008 - Developer Diaries (Page 12) Dr. Dobb's Journal - April 2008 - Developer Diaries (Page 13) Dr. Dobb's Journal - April 2008 - Developer Diaries (Page 14) Dr. Dobb's Journal - April 2008 - Developer Diaries (Page 15) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Excellence in Programming Award (Page 16) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Excellence in Programming Award (Page 17) Dr. Dobb's Journal - April 2008 - Conversations (Page 18) Dr. Dobb's Journal - April 2008 - Conversations (Page 19) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 20) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 21) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 22) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 23) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 24) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 25) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 26) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 27) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 28) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 29) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 30) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 31) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 32) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 33) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 34) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 35) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 36) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 37) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 38) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 39) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 40) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 41) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 42) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 43) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 44) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 45) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 46) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 47) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 48) Dr. Dobb's Journal - April 2008 - The Agile Edge (Page 49) Dr. Dobb's Journal - April 2008 - The Agile Edge (Page 50) Dr. Dobb's Journal - April 2008 - The Agile Edge (Page 51) Dr. Dobb's Journal - April 2008 - Effective Concurrency (Page 52) Dr. Dobb's Journal - April 2008 - Effective Concurrency (Page 53) Dr. Dobb's Journal - April 2008 - Effective Concurrency (Page 54) Dr. Dobb's Journal - April 2008 - Effective Concurrency (Page 55) Dr. Dobb's Journal - April 2008 - Swaine's Flames (Page 56) Dr. Dobb's Journal - April 2008 - Swaine's Flames (Page Cover3) Dr. Dobb's Journal - April 2008 - Swaine's Flames (Page Cover4)
For optimal viewing of this digital publication, please enable JavaScript and then refresh the page. If you would like to try to load the digital publication without using Flash Player detection, please click here.