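Xcell China 28 - (Page 13), Spring 2008 issue
Technical Column

[Figure 1 - Driver assistance features. Labels shown: Awareness (Blindspot Detection, Sign Recognition, Park/Back-Up Assist, Lane Change Assist, Night Vision); Warning (Lane Departure Warning, Front Collision Warning, Back-Up Warning Aid, Pedestrian Detection, Drowsy Driver); Temporary Control; Convenience Oriented (Adaptive Cruise Control, Automated Parking); Performance Enhancement (Panic Brake Assist, Pre-Crash Sensing, Side Impact Detection, Pedestrian Protection); Safety Oriented (Automatic Braking, Lane Keeping)]

... can effectively execute the equivalent of 11 basic instructions within a single cycle, as shown in Figure 2.

[Figure 2 - SIMD example: SAD operation on four 8-bit samples. src1 and src2 are 32-bit unsigned words, each holding four unsigned 8-bit values; the four absolute differences are summed into dst, a 32-bit unsigned word. The SAD performs 11 RISC operations.]

For example, the Nexperia PNX1500 media processor has a 32-bit TriMedia VLIW CPU that can execute two four-way SAD instructions on 8-bit pixels per clock cycle, each with a two-cycle latency. Its very long instruction word holds up to five basic RISC/SIMD instructions per clock cycle, of which only two can be SAD instructions (called "8meii" in the TriMedia data book).

Computing the MAE of a 4 x 4 block therefore takes five clock cycles, as shown in Table 1: two cycles to pipeline the four-way SAD instructions (cycle 1 for sad1/sad2, cycle 2 for sad3/sad4) and three cycles to accumulate the partial results (cycles 3, 4, and 5). When only one block is processed at a time, a 300 MHz Nexperia PNX1500 processor thus reaches at most 60 MMAE/s.

Cycle | Slot 1 | Slot 2 | Slot 3 | Slot 4 | Slot 5
Cycle 1 | sad1=8meii(A1,B1) | nop | sad2=8meii(A2,B2) | nop | nop
Cycle 2 | sad3=8meii(A3,B3) | nop | sad4=8meii(A4,B4) | nop | nop
Cycle 3 | sad12=sad1+sad2 | nop | nop | nop | nop
Cycle 4 | sad34=sad3+sad4 | nop | nop | nop | nop
Cycle 5 | tot=sad12+sad34 | nop | nop | nop | nop

Table 1 - Pseudo-assembly code for the MAE computation on the Nexperia/TriMedia VLIW DSP-CPU, using four-way 8-bit subword parallelism

Processing more than one 4 x 4 block at a time raises the peak performance slightly. For example, the MAE of two 4 x 4 blocks can be computed in parallel in seven cycles, giving 85.71 MMAE/s, while three blocks take nine cycles, giving 100 MMAE/s. The maximum number of blocks that can be processed in parallel is limited by the number of SIMD SAD operations allowed in any long instruction word, by the number of general-purpose registers in the VLIW CPU, and by the scheduling algorithm of the optimizing compiler. Overall performance saturates as the number of blocks keeps growing, so we consider no more than three MAEs processed in parallel.

The Texas Instruments (TI) TMS320DM6437 digital media processor issues one long instruction of eight basic RISC operations per cycle, spread over two data paths with four slots each per cycle. Its VLIW CPU can execute up to two SAD instructions per cycle (called "subabs4" in the TI DM6437 data sheet), each with a one-cycle latency. To accumulate the partial results, however, it must execute a SIMD MAC operation (called "dotpsu4") with the constant 0x01010101, which has a three-cycle latency. A 600 MHz TI DM6437 DSP-CPU can therefore compute one MAE in seven cycles (as shown in Table 2), so its peak performance on 4 x 4 pixel blocks is 85.71 MMAE/s. Processing two blocks in parallel takes nine cycles, for 133.33 MMAE/s, and three blocks take 11 cycles, for 163.64 MMAE/s, which is still below our 250 MSAD/s requirement.

Cycle | Slot 1 (L1) | Slot 2 (S1) | Slot 3 (M1) | Slot 4 (D1) | Slot 5 (L2) | Slot 6 (S2) | Slot 7 (M2) | Slot 8 (D2)
Cycle 1 | d1=subabs4(A1,B1) | nop | nop | nop | d2=subabs4(A2,B2) | nop | nop | nop
Cycle 2 | d3=subabs4(A3,B3) | nop | sad1=dotpsu4(d1, 0x01010101) | nop | d4=subabs4(A4,B4) | nop | sad2=dotpsu4(d2, 0x01010101) | nop
Cycle 3 | nop | nop | sad3=dotpsu4(d3, 0x01010101) | nop | nop | nop | sad4=dotpsu4(d4, 0x01010101) | nop
Cycle 4 | nop | nop | nop | nop | nop | nop | nop | nop
Cycle 5 | nop | nop | nop | nop | nop | nop | nop | nop
Cycle 6 | sad13=sad1+sad3 | nop | nop | nop | sad24=sad2+sad4 | nop | nop | nop
Cycle 7 | tot = sad13+sad24 | nop | nop | nop | nop | nop | nop | nop

Table 2 - Pseudo-assembly code for the MAE computation on the TI VLIW DSP-CPU, using four-way 8-bit subword parallelism

Cycle | Slot 1 (L1) | Slot 2 (S1) | Slot 3 (M1) | Slot 4 (D1) | Slot 5 (L2) | Slot 6 (S2) | Slot 7 (M2) | Slot 8 (D2)
Cycle 1 | d1=sub2(A1,B1) | d3=sub2(A3,B3) | nop | nop | d2=sub2(A2,B2) | d4=sub2(A4,B4) | nop | nop
Cycle 2 | ad1=abs2(d1) | nop | nop | nop | ad2=abs2(d2) | nop | nop | nop
Cycle 3 | ad3=abs2(d3) | nop | nop | nop | ad4=abs2(d4) | nop | nop | nop
Cycle 4 | z13=add2(ad1, ad3) | nop | nop | nop | z24=add2(ad2, ad4) | nop | nop | nop
Cycle 5 | nop | nop | s13=dotp2(z13, 0x00010001) | nop | nop | nop | s24=dotp2(z24, 0x00010001) | nop
Cycle 6 | nop | nop | nop | nop | nop | nop | nop | nop
Cycle 7 | nop | nop | nop | nop | nop | nop | nop | nop
Cycle 8 | tot = s13 + s24 | nop | nop | nop | nop | nop | nop | nop

Table 3 - Pseudo-assembly code for the MAE computation on the TI VLIW DSP-CPU, using two-way 16-bit subword parallelism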
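The dataflow of Table 1 and the peak-throughput figures quoted in the text can be checked with a small software model. The sketch below is only an illustration in Python, not vendor code: the function names (`sad4x8`, `pack_row`, `block_sad_4x4`, `peak_mmae_per_s`) are hypothetical, the byte order used for packing is an assumption, and the model emulates the arithmetic of a four-way 8-bit SAD without modelling instruction latencies or slot constraints.

```python
def sad4x8(src1: int, src2: int) -> int:
    """Emulate a four-way 8-bit SAD on two packed 32-bit unsigned words.

    Done one operation at a time this costs 4 subtractions, 4 absolute
    values and 3 additions, i.e. the 11 basic operations of Figure 2.
    """
    total = 0
    for lane in range(4):
        a = (src1 >> (8 * lane)) & 0xFF
        b = (src2 >> (8 * lane)) & 0xFF
        total += abs(a - b)
    return total


def pack_row(pixels) -> int:
    """Pack four 8-bit pixels into one 32-bit word (first pixel in the low byte; an assumption)."""
    word = 0
    for lane, pixel in enumerate(pixels):
        word |= (pixel & 0xFF) << (8 * lane)
    return word


def block_sad_4x4(block_a, block_b) -> int:
    """SAD of two 4x4 blocks: four row SADs, then the two-level add tree of Table 1."""
    s1, s2, s3, s4 = (
        sad4x8(pack_row(row_a), pack_row(row_b))
        for row_a, row_b in zip(block_a, block_b)
    )
    sad12 = s1 + s2          # cycle 3 in Table 1
    sad34 = s3 + s4          # cycle 4 in Table 1
    return sad12 + sad34     # cycle 5 in Table 1


def peak_mmae_per_s(clock_mhz: float, blocks: int, cycles: int) -> float:
    """Peak throughput in millions of block MAEs per second."""
    return clock_mhz * blocks / cycles


if __name__ == "__main__":
    a = [[10, 20, 30, 40]] * 4
    b = [[12, 18, 30, 45]] * 4
    print(block_sad_4x4(a, b))                    # 36: each row differs by 2+2+0+5 = 9
    print(round(peak_mmae_per_s(300, 1, 5), 2))   # 60.0   (PNX1500, one block)
    print(round(peak_mmae_per_s(600, 3, 11), 2))  # 163.64 (DM6437, three blocks)
```

Dividing the clock rate times the number of parallel blocks by the cycle count reproduces each figure in the text, which makes it easy to see why adding blocks yields diminishing returns: every extra block adds cycles of its own, so the ratio grows more slowly than the block count.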