Dr. Dobb's Journal - June 2008 - (Page 39) d06cass_p4ds 4/11/08 11:00 AM Page 39 Next, see if the code fits into the hardware. Take a look around the Web to find a variety of floating-point and integer FFT algorithms. Floating-point versions are larger and slower, but deliver more precise results. One example, a hybrid approach, can compute a 2Kpoint FFT at a rate of 400 Msamples/s (www.andraka.com/V4_FP_ fft.htm). This is 4.5 times faster than a 2.2GHz dual-core Opteron (www.fftw.org/ speed/opteron-2.2GHz-64bit). Three such FFT engines can fit in a Xilinx SX55. A CORDIC algorithm can be used to do the phase shift, because implementing such an algorithm does not consume many of the FPGA’s logic elements—less than 1000 lookup tables. Moreover, the computations can be pipelined to improve throughput. While you may be impressed with speeds 4.5 times faster, you might still think that you can buy a 4-way Opteron-based system for the same price as an FPGA-based coprocessor with a dual-core Opteron to achieve the same performance gains. But because the functional units will be pipelined in an FPGA-based coprocessor, the phase shift and the inverse FFT come with the cost of some latency until the pipeline is filled—a few hundred nanoseconds. As a result, you achieve a speedup for the phase shift and the inverse FFT for free. The overall throughput improvement can then be estimated as follows: 4.5 + 4.5 + [(25/35)*4.5] = 12.2x (the overall speedup for the pieces we put in hardware) case 256: ROUNDS = 14; break; */ KeyAddition(data,round_key[0]); /* at most ROUNDS-1 ordinary rounds */ for(r = 1; (r <= ROUNDS) ; r++) { Substitution(data); ShiftRow(data); MixColumn(data); KeyAddition(data,round_key[r]); } /* do the last, special, round: */ Substitution(data); ShiftRow(data); KeyAddition(data, round_key[ROUNDS]); } done; The AES algorithm can be broken down into four subroutines: Substitution, ShiftRow, MixColumn, and KeyAddition. In the aforementioned sample, one block of 16 bytes as a 4×4-byte matrix is run through the entire algorithm. In a pipelined version, when one block of data is encrypted with the first subroutine, it moves to the second subroutine and a new block of data is initiated in the first routine—coarse-grain parallelism at the function level. In addition, each subroutine may run in parallel (fine-grain parallelism). As an example, let’s look at the ShiftRow routine. The ShiftRow routine modifies each row of the state matrix. The top row is not changed; the next row is rotated left one position, the following row two positions, and the bottom row three positions. A processor must modify each row, then go to the next row. In an FPGA, all rows can be changed in one clock. Today, there are few commercially available tools that automatically do software parallelization, and those that exist are still in their infancy. The next generation of tools will have to address issues such as coarsegrain versus fine-grain parallelism, recursion, and data dependencies and flow, and various performance issues such as load balancing between the host system CPU and the accelerator (bus hogging, memory access bandwidth, data transfer bottlenecks, and the like). DDJ The calculation [(25/35)*4.5] is an estimate of the performance of the phase shift and is based on the phase shift using similar math to the FFT, and the 25/35 is the proportion of the time taken on the CPU for the FFT. Using Amdahl’s Law, P equals the percent that can be parallelized and S is the speedup of that portion. Using this formula you find that: 1/(1–P)+(P/S) = 1/[(1–0.95) + (0.95/12.2)] = 7.8x application speedup The Advanced Encryption Standard (AES) algorithm is another CPU-intensive piece of code. The main pseudocode is: case 128: ROUNDS = 10; break; case 192: ROUNDS = 12; break; June 2008 l www.ddj.com l Dr. Dobb’s Journal 39 http://www.andraka.com/V4_FP_fft.htm http://www.andraka.com/V4_FP_fft.htm http://www.fftw.org/speed/opteron-2.2GHz-64bit http://www.fftw.org/speed/opteron-2.2GHz-64bit http://www.servoy.com/sexy http://www.servoy.com/sexy http://www.ddj.com
Table of Contents Feed for the Digital Edition of Dr. Dobb's Journal - June 2008 Dr. Dobb's Journal - June 2008 Contents Friday Night Fish Fry Alia Vox Developer Diaries There Must Be Contest Conversations Building a Test Harness for RTOS QT and Windows CE Software to Hardware Parallelization Performance Portable C++ Effective Concurrency The Agile Edge Swaine's Flames Dr. Dobb's Journal - June 2008 Dr. Dobb's Journal - June 2008 - Dr. Dobb's Journal - June 2008 (Page Cover1) Dr. Dobb's Journal - June 2008 - Dr. Dobb's Journal - June 2008 (Page Cover2) Dr. Dobb's Journal - June 2008 - Dr. Dobb's Journal - June 2008 (Page 1) Dr. Dobb's Journal - June 2008 - Dr. Dobb's Journal - June 2008 (Page 2) Dr. Dobb's Journal - June 2008 - Dr. Dobb's Journal - June 2008 (Page 3) Dr. Dobb's Journal - June 2008 - Contents (Page 4) Dr. Dobb's Journal - June 2008 - Contents (Page 5) Dr. Dobb's Journal - June 2008 - Friday Night Fish Fry (Page 6) Dr. Dobb's Journal - June 2008 - Friday Night Fish Fry (Page 7) Dr. Dobb's Journal - June 2008 - Friday Night Fish Fry (Page 8) Dr. Dobb's Journal - June 2008 - Friday Night Fish Fry (Page 9) Dr. Dobb's Journal - June 2008 - Alia Vox (Page 10) Dr. Dobb's Journal - June 2008 - Alia Vox (Page 11) Dr. Dobb's Journal - June 2008 - Alia Vox (Page 12) Dr. Dobb's Journal - June 2008 - Alia Vox (Page 13) Dr. Dobb's Journal - June 2008 - Developer Diaries (Page 14) Dr. Dobb's Journal - June 2008 - Developer Diaries (Page 15) Dr. Dobb's Journal - June 2008 - There Must Be Contest (Page 16) Dr. Dobb's Journal - June 2008 - There Must Be Contest (Page 17) Dr. Dobb's Journal - June 2008 - There Must Be Contest (Page 18) Dr. Dobb's Journal - June 2008 - There Must Be Contest (Page 19) Dr. Dobb's Journal - June 2008 - Conversations (Page 20) Dr. Dobb's Journal - June 2008 - Conversations (Page 21) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page 22) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page 23) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page 24) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page IBM-1) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page IMB-2) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page 25) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page 26) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page 27) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page 28) Dr. Dobb's Journal - June 2008 - Building a Test Harness for RTOS (Page 29) Dr. Dobb's Journal - June 2008 - QT and Windows CE (Page 30) Dr. Dobb's Journal - June 2008 - QT and Windows CE (Page 31) Dr. Dobb's Journal - June 2008 - QT and Windows CE (Page 32) Dr. Dobb's Journal - June 2008 - QT and Windows CE (Page 33) Dr. Dobb's Journal - June 2008 - QT and Windows CE (Page 34) Dr. Dobb's Journal - June 2008 - QT and Windows CE (Page 35) Dr. Dobb's Journal - June 2008 - Software to Hardware Parallelization (Page 36) Dr. Dobb's Journal - June 2008 - Software to Hardware Parallelization (Page 37) Dr. Dobb's Journal - June 2008 - Software to Hardware Parallelization (Page 38) Dr. Dobb's Journal - June 2008 - Software to Hardware Parallelization (Page 39) Dr. Dobb's Journal - June 2008 - Performance Portable C++ (Page 40) Dr. Dobb's Journal - June 2008 - Performance Portable C++ (Page 41) Dr. Dobb's Journal - June 2008 - Performance Portable C++ (Page 42) Dr. Dobb's Journal - June 2008 - Performance Portable C++ (Page 43) Dr. Dobb's Journal - June 2008 - Performance Portable C++ (Page 44) Dr. Dobb's Journal - June 2008 - Performance Portable C++ (Page 45) Dr. Dobb's Journal - June 2008 - Performance Portable C++ (Page 46) Dr. Dobb's Journal - June 2008 - Performance Portable C++ (Page 47) Dr. Dobb's Journal - June 2008 - Effective Concurrency (Page 48) Dr. Dobb's Journal - June 2008 - Effective Concurrency (Page 49) Dr. Dobb's Journal - June 2008 - Effective Concurrency (Page 50) Dr. Dobb's Journal - June 2008 - Effective Concurrency (Page 51) Dr. Dobb's Journal - June 2008 - The Agile Edge (Page 52) Dr. Dobb's Journal - June 2008 - The Agile Edge (Page 53) Dr. Dobb's Journal - June 2008 - The Agile Edge (Page 54) Dr. Dobb's Journal - June 2008 - The Agile Edge (Page 55) Dr. Dobb's Journal - June 2008 - Swaine's Flames (Page 56) Dr. Dobb's Journal - June 2008 - Swaine's Flames (Page Cover3) Dr. Dobb's Journal - June 2008 - Swaine's Flames (Page Cover4)
For optimal viewing of this digital publication, please enable JavaScript and then refresh the page. If you would like to try to load the digital publication without using Flash Player detection, please click here.