EE Times Under The Hood - October 8, 2007 - (Page 56)

under the hood: SEMICONDUCTORS

the four cores together. The dual-stress nitride liners, combined with embedded SiGe source/drain regions, increase both n- and p-channel mobility, resulting in higher current drive. As before, AMD’s implementation of its 65-nm technology on an SOI substrate can increase latch-up resistance and reduce short-channel effects relative to an analogous bulk silicon implementation.

Transistor performance

Interestingly, the transistor performance of Intel’s Woodcrest and AMD’s Barcelona appears to match fairly closely, with the Barcelona’s gate leakage about half that of the Woodcrest. This is not so surprising, as Intel uses a 25 percent thinner gate dielectric. AMD’s device shows consistently lower gate dielectric leakage than Intel’s, especially on the pFETs.

The current drive for both devices is comparable, with the Barcelona coming out marginally higher for the pFET but lower for the nFET devices measured. However, the off-state leakage current (Ioff) for the nFETs was two to five times lower in the Woodcrest, suggesting the need for a bit more optimization of AMD’s transistor.

Since AMD and Intel have always considered the total package, system-level performance for a particular application generally has the final word. That information, however, is not yet available for the Barcelona.

Some of the changes AMD has made are intended to:

1. Increase bandwidth in operation execution (for example, in decode and instruction fetch), thereby increasing loads per cycle from the cache. This should improve AMD’s video-encoding performance.

2. Improve performance by adding an indirect branch predictor, which reduces mispredicted branches and increases processor efficiency. This improvement in the Barcelona architecture follows Intel’s implementation in the Prescott processor.

3. Offload certain frequent operations to dedicated hardware, using a sideband stack optimizer.
This approach, similar in function to Intel’s dedicated stack manager, removes some of the load from the processor’s decoders and helps reduce pipeline clogging.

4. Add the capability to reorder load instructions and enable memory access optimization; this serves to increase instruction load speed and is, again, similar to the capability implemented by Intel in its Core 2 processor architecture.

5. Reduce the frequency of switching between read and write memory-control operations by using a “write-bursting” operation (with standard DDR2 memory, one or the other can be done, but not simultaneously; switching from one to the other introduces delays). In Intel’s case, the fully buffered dual in-line memory module (FB-DIMM) architecture allows these operations to be performed simultaneously while also increasing reliability.

6. Improve entire chip performance by adding a new DRAM prefetcher (prefetchers have already been used extensively in different areas and components of the microprocessor). This prefetcher is located within the memory controller, where none had existed before. It monitors memory requests to identify trends and pull in data that appears likely to be used in the future. The prefetched data is stored in a separate buffer, which, incidentally, is identical to the write-bursting buffer used in the memory controller that improves performance and efficiency.

Power efficiency

Each core contains its own PLL, clock distribution system and power grid, with independent power/performance management capability (the core voltage and individual core frequencies operate independently of the Northbridge). This enables the cores to enter power-efficient states while the processor interface operates at full speed to service DDR2/3 memory and HyperTransport traffic.
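Returning briefly to the DRAM prefetcher in item 6 of the list above: the article does not say how AMD’s prefetcher predicts trends, so the following is only a minimal sketch of one common technique, stride detection, offered to make the idea concrete. The class name, the 64-byte line size and the four-entry buffer are all assumptions made for illustration, not details of the Barcelona design.

```python
class StridePrefetcher:
    """Toy stride detector. Hypothetical; not AMD's actual design."""

    def __init__(self, line_size=64, buffer_lines=4):
        self.line_size = line_size        # bytes per line (assumed value)
        self.buffer_lines = buffer_lines  # prefetch buffer capacity (assumed value)
        self.last_line = None             # line address of the previous request
        self.last_stride = None           # distance between the two previous requests
        self.buffer = []                  # prefetched line addresses, oldest first

    def access(self, addr):
        """Record a memory request; return True if it hit the prefetch buffer."""
        line = addr - addr % self.line_size
        hit = line in self.buffer
        if hit:
            self.buffer.remove(line)
        if self.last_line is not None:
            stride = line - self.last_line
            # The same nonzero stride seen twice in a row is treated as a trend.
            if stride != 0 and stride == self.last_stride:
                self._prefetch(line + stride)
            self.last_stride = stride
        self.last_line = line
        return hit

    def _prefetch(self, line):
        """Place a predicted line in the buffer, evicting the oldest if full."""
        if line not in self.buffer:
            self.buffer.append(line)
            if len(self.buffer) > self.buffer_lines:
                self.buffer.pop(0)
```

In this sketch, a sequential walk through memory starts hitting the buffer once the same stride has been seen twice, while scattered requests never trigger a prefetch. Real memory-controller prefetchers track many streams at once and weigh the bandwidth cost of wrong guesses, which this toy ignores.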
AMD has incorporated temperature controls for each of the four cores by implementing eight remote temperature sensors distributed across the core, and an additional six remote sensors in the Northbridge block. The controller tracks temperatures against predetermined limits and selects power-saving mode options to reduce die temperature.

The cache is implemented with a standard 6T memory cell. AMD allows custom tuning of the write pulse time after device fabrication by enabling programming with electrical fuses. This helps to provide scalability across a wide range of cache sizes.

So what does all this mean?

At the transistor level, performance is fairly well matched, with this exception: Intel and AMD appear to have optimized their devices differently, resulting in Intel having lower off-state leakage current (Ioff) and AMD having lower gate dielectric leakage. How this relates to overall system performance will be seen in time.

The race continues

When shipments start, the advanced technology expected to be employed in the Penryn architecture will be difficult or impossible to match until AMD’s 45-nm technology is introduced in turn.

Intel is not only racing the clock with AMD for the microprocessor performance crown, but also with Matsushita Electric Industrial Co. Ltd. for technology leadership. In this less-visible race, Matsushita may beat Intel to 45-nm commercialization, albeit without a high-k gate offering. This is a contest that AMD has chosen not to participate in, but rather to pursue the same objective in its own fashion, and on its own timetable. ■
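The thermal control described under Power efficiency, in which sensor readings are tracked against predetermined limits and a power-saving mode is selected, can be sketched roughly as follows. The article gives neither the limits nor the mode names, so every threshold and label here is a hypothetical placeholder, and basing the decision on the hottest sensor is likewise an assumption.

```python
# Hypothetical limits (degrees C) and mode names; the article specifies neither.
# Entries are ordered from most to least severe.
LIMITS = [
    (95.0, "halt"),
    (85.0, "low_power"),
    (75.0, "reduced_clock"),
]


def select_mode(sensor_temps):
    """Pick a power-saving mode from the hottest sensor reading (assumed policy)."""
    hottest = max(sensor_temps)
    for limit, mode in LIMITS:
        if hottest >= limit:
            return mode
    return "full_speed"
```

Called with the eight readings from one core, `select_mode` returns `"full_speed"` while every sensor stays below the lowest limit and steps to a more aggressive mode as the hottest reading crosses each threshold.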