Dr. Dobb's Journal - April 2008 - (Page 22) d04scarp_p4ds 2/13/08 11:46 AM Page 22 Core Technology STRING SEARCH ON MULTICORE PROCESSORS continued from page 20 Because they have a simple branch prediction, jumps are expensive and your code should avoid them, possibly replacing them with speculative execution. Additionally, SPEs cannot access the RAM directly and they have no cache. Instead, they have a small 256-KB memory called “local store” (LS), and transfer data to/from main memory with asynchronous DMA transfers. By cleverly scheduling DMA transfers in your code (with, for instance, double buffering), you can hide part or all of the transfer latency. See our article “Programming the Cell” (www.ddj.com/dept/64bit/197801624). These hardware features revolutionize how you write code. Programmers have been told since Programming 101 to develop code in a top-down fashion. Now on the Cell, we are forced to look down to the bottom and make both ends meet. Hardware constraints strongly influence the algorithm. Searching Strings: One Problem, Too Many Solutions Whether you are scanning files for viruses, performing full-text searches on a large collection of articles, or detecting intrusions on network traffic, you’re doing string searching. Given a text and a dictionary (a set of patterns), your task is to find all occurrences of any of the patterns in the text. String searching is a beaten track, with many proposed algorithms. We employ AhoCorasick, which is famous for appearing in the original “fgrep” UNIX utility. Despite its age (it was presented in 1975), this algorithm is simple and appears more suitable to the Cell than others (Knuth-Morris-Pratt, BoyerMoore, Commentz-Walter, Wu-Manber, and the like). In fact, Aho-Corasick can be expressed well as a branchless data flow that best exploits the SIMD capabilities of the SPEs. On the other hand, its evolutions have more complex control flows, which suffer from frequent and large branch misprediction penalties when run on an SPE. The variant of Aho-Corasick we employ is based on a Deterministic Finite Automaton (DFA), and is fast for two reasons: • The DFA is based on a keyword tree (a trie). A trie has a path of labeled edges per each pattern in the dictionary. To process an input character, you just choose the outgoing edge, which is labeled with that character, and jump to the following node along that edge. • In a DFA, no special rollback actions are required in case of mismatches. A nonmatching input character triggers a simple state transition, like any other character. For more information, see “The Aho-Corasick Algorithm” (en.wikipedia .org/wiki/Aho-Corasick_algorithm). Figure 1: Input text. Why Aho-Corasick and the Cell Make a Good Match The DFA-based Aho-Corasick suits the Cell very well. In fact, processors with SIMD instructions (like the SPE) offer lots of datalevel parallelism that you can exploit by running multiple DFAs together. Moreover, the many (128) registers available allow extensive loop unrolling and code replication. A single instance of a DFA is not sufficient to satisfy this hunger for parallelism. The most natural approach to parallelize the algorithm (thus multiplying throughput) is to adopt multiple instances of the automaton. All the instances act as the same automaton (that is, they share the same state transition table), but they will be provided with distinct chunks of the input text; see Figure 1. The chunks are partially overlapped; otherwise, occurrences of a pattern that appear across a boundary would not be matched. Because a DFA does exactly one state transition per consumed input character, multiple DFAs are naturally synchronized— they start/finish processing a given quantity 22 Dr. Dobb’s Journal l www.ddj.com l April 2008 http://www.ddj.com/dept/64bit/197801624 http://en.wikipedia.org/wiki/Aho-Corasick_algorithm http://en.wikipedia.org/wiki/Aho-Corasick_algorithm http://www.scitools.com http://www.scitools.com http://www.ddj.com
Table of Contents Feed for the Digital Edition of Dr. Dobb's Journal - April 2008 Dr. Dobb's Journal - April 2008 Contents Hmmmm Alia Vox Developer Diaries Dr. Dobb's Excellence in Programming Award Conversations Fast String Search on Multicore Processors The Byzantine Generals Problem Optimizing Math-Intensive Applications with Fixed-Point Arithmetic Random Numbers in a Range Using Generic Programming The Agile Edge Effective Concurrency Swaine's Flames Dr. Dobb's Journal - April 2008 Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page Cover1) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page Cover2) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page 1) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page 2) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Journal - April 2008 (Page 3) Dr. Dobb's Journal - April 2008 - Contents (Page 4) Dr. Dobb's Journal - April 2008 - Contents (Page 5) Dr. Dobb's Journal - April 2008 - Hmmmm (Page 6) Dr. Dobb's Journal - April 2008 - Hmmmm (Page 7) Dr. Dobb's Journal - April 2008 - Hmmmm (Page 8) Dr. Dobb's Journal - April 2008 - Hmmmm (Page 9) Dr. Dobb's Journal - April 2008 - Alia Vox (Page 10) Dr. Dobb's Journal - April 2008 - Alia Vox (Page 11) Dr. Dobb's Journal - April 2008 - Developer Diaries (Page 12) Dr. Dobb's Journal - April 2008 - Developer Diaries (Page 13) Dr. Dobb's Journal - April 2008 - Developer Diaries (Page 14) Dr. Dobb's Journal - April 2008 - Developer Diaries (Page 15) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Excellence in Programming Award (Page 16) Dr. Dobb's Journal - April 2008 - Dr. Dobb's Excellence in Programming Award (Page 17) Dr. Dobb's Journal - April 2008 - Conversations (Page 18) Dr. Dobb's Journal - April 2008 - Conversations (Page 19) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 20) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 21) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 22) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 23) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 24) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 25) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 26) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 27) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 28) Dr. Dobb's Journal - April 2008 - Fast String Search on Multicore Processors (Page 29) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 30) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 31) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 32) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 33) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 34) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 35) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 36) Dr. Dobb's Journal - April 2008 - The Byzantine Generals Problem (Page 37) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 38) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 39) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 40) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 41) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 42) Dr. Dobb's Journal - April 2008 - Optimizing Math-Intensive Applications with Fixed-Point Arithmetic (Page 43) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 44) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 45) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 46) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 47) Dr. Dobb's Journal - April 2008 - Random Numbers in a Range Using Generic Programming (Page 48) Dr. Dobb's Journal - April 2008 - The Agile Edge (Page 49) Dr. Dobb's Journal - April 2008 - The Agile Edge (Page 50) Dr. Dobb's Journal - April 2008 - The Agile Edge (Page 51) Dr. Dobb's Journal - April 2008 - Effective Concurrency (Page 52) Dr. Dobb's Journal - April 2008 - Effective Concurrency (Page 53) Dr. Dobb's Journal - April 2008 - Effective Concurrency (Page 54) Dr. Dobb's Journal - April 2008 - Effective Concurrency (Page 55) Dr. Dobb's Journal - April 2008 - Swaine's Flames (Page 56) Dr. Dobb's Journal - April 2008 - Swaine's Flames (Page Cover3) Dr. Dobb's Journal - April 2008 - Swaine's Flames (Page Cover4)
For optimal viewing of this digital publication, please enable JavaScript and then refresh the page. If you would like to try to load the digital publication without using Flash Player detection, please click here.