Intel Core i7 Review: Nehalem Gets Real

 

Nehalem is here.

Anticipation for Intel’s latest CPU architecture rivals the excitement that surrounded the original Core 2 Duo. It’s not just that Nehalem is a new CPU architecture: Intel’s new CPU line also brings with it a new system bus, new chipsets, and a new socket format.

Today, we’re mainly focusing on the Core i7 CPU and its performance compared to Intel’s Core 2 quad-core CPUs. There’s a ton of data to sift through just on CPU performance. We’ll have ample opportunity to dive into the platform, and its tweaks, in future articles.

Intel will be launching three new Core i7 products in the next couple of weeks, at 2.66GHz, 2.93GHz, and 3.20GHz, at prices ranging from $285 to $999 (qty. 1,000). That’s right: You’ll be able to pick up a Core i7 CPU for around $300 fairly soon. Of course, that’s not the whole story: You’ll need a new motherboard and, very likely, new memory, since the integrated memory controller supports only DDR3.

In the past several weeks, we’ve been locked in the basement lab, running a seemingly endless series of benchmarks on six different CPUs. Now it’s time to talk results. While we’ll be presenting our usual stream of charts and numbers, we’ll try to put them in context, including discussions of how and when it might be best to upgrade.

Let’s get started with a peek under the hood.

Core i7 Genesis
The Core i7 is Intel’s first new CPU architecture since the original Core 2 shipped back in July, 2006. It’s hard to believe that the first Core 2 processors shipped over two years ago.

Since then, Intel has shipped incremental updates to the product line. Quad-core Core 2 CPUs arrived in November 2006, in the form of the QX6700. AMD was quick to point out that Intel’s quad-core solutions weren’t “true” quad-core processors, but consisted of two Core 2 Duo dies in a single package. Despite that purist objection, Intel’s quad-core solutions proved highly successful in the market.

The original Core 2 line was built on a 65nm manufacturing process. In late 2007, Intel began shipping 45nm CPUs, code-named Penryn. Intel’s 45nm processors offered a few incremental feature updates, but were basically continuations of the Core 2 line.

In the past year, details about Nehalem began dribbling out, culminating with full disclosure of the Core i7 architecture at the August 2008 Intel Developer Forum. If you want more details about Nehalem’s architecture, the article covering that disclosure is well worth a read. However, we’ll touch on a few highlights now.

Cache and Memory
The initial Core i7 CPUs will offer a three-tiered cache structure. Each individual core contains two caches: a 64K L1 cache (split into a 32K instruction cache and a 32K data cache), plus a 256K unified L2 cache. An 8MB L3 cache is shared among the four cores. That 256K L2 cache is interesting, because it’s built with an 8-T (eight transistors per cell) SRAM structure. This facilitates running at lower voltages, but also takes up more die space. That’s one reason the core-specific L2 cache is smaller than you might otherwise expect.
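To make the three-tier layout concrete, here is a minimal sketch in Python that tallies the on-die cache from the figures given above. The constant and function names are ours, chosen for illustration only.

```python
# Cache sizes as described in the article, in kilobytes.
L1_INSTRUCTION_KB = 32   # per core
L1_DATA_KB = 32          # per core
L2_UNIFIED_KB = 256      # per core
L3_SHARED_KB = 8 * 1024  # shared among all cores
CORES = 4


def per_core_cache_kb():
    """Private cache (L1 instruction + L1 data + L2) owned by one core."""
    return L1_INSTRUCTION_KB + L1_DATA_KB + L2_UNIFIED_KB


def total_cache_kb(cores=CORES):
    """All private caches plus the single shared L3."""
    return cores * per_core_cache_kb() + L3_SHARED_KB


if __name__ == "__main__":
    print("Private cache per core (KB):", per_core_cache_kb())
    print("Total on-die cache (KB):", total_cache_kb())
```

The private caches add up to a small fraction of the total. Most of the cache capacity sits in the shared L3.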

Like AMD’s current CPU line, Nehalem uses an integrated, on-die memory controller. Intel has finally moved the memory controller out of the north bridge. The current memory controller supports only DDR3 memory. It handles three channels of DDR3 per socket, with up to three DIMMs per channel. Earlier MCH-style memory controllers supported only two channels of DRAM.

The use of triple-channel memory mitigates the relatively low, officially supported DDR3 clock rate of 1066MHz (effective). Intel representatives we spoke with were quick to point out that three channels of DDR3-1066 equate to 30GB/sec of memory bandwidth.
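The arithmetic behind the triple-channel argument can be sketched in a few lines of Python. This is a back-of-the-envelope calculation of theoretical peak bandwidth, not a measurement. It assumes the standard 64-bit (8-byte) DDR3 channel width, and the function name is ours for illustration.

```python
def ddr_peak_bandwidth_gb_s(megatransfers_per_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/sec (1 GB = 10**9 bytes).

    bus_bytes=8 assumes a 64-bit DDR3 channel, which is an assumption
    of this sketch rather than a figure from the article.
    """
    return megatransfers_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    for channels in (1, 2, 3):
        peak = ddr_peak_bandwidth_gb_s(1066, channels)
        print(f"{channels} channel(s) of DDR3-1066: {peak:.1f} GB/sec")
```

By this arithmetic, three channels of DDR3-1066 peak at roughly 25.6GB/sec, against about 17.1GB/sec for two channels. That is somewhat below the 30GB/sec figure quoted above, so treat that number as a round approximation.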

The integrated memory controller also clocks higher than one built into a north bridge chip, although not necessarily at the full processor clock speed. This higher clock, combined with no longer having to communicate over a north bridge link, substantially improves memory latency.

To complement the integrated memory controller, Intel developed a new point-to-point system interconnect, similar in concept to AMD’s HyperTransport. Known as QuickPath Interconnect, or QPI for short, the new interconnect can move data at peak rates of 25.6GB/sec (at a base rate of 6.4 gigatransfers per second). Note that not all Nehalem processors will support the full theoretical bandwidth. The Core i7 940 and 920 CPUs support a 4.8 gigatransfer-per-second base rate, with a maximum throughput of 19.2GB/sec per link. That’s still plenty of bandwidth, particularly since memory traffic now goes straight to the on-die controller rather than across the system interconnect.
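The QPI throughput figures follow directly from the transfer rate. The sketch below reproduces them in Python, assuming each transfer carries 2 bytes of data in each direction and that the quoted peak counts both directions together. Those two parameters are our assumptions about how the headline numbers are derived, and the function name is hypothetical.

```python
def qpi_peak_bandwidth_gb_s(gigatransfers_per_s, bytes_per_transfer=2, directions=2):
    """Theoretical peak QPI link bandwidth in GB/sec.

    bytes_per_transfer=2 and directions=2 are assumptions of this sketch:
    16 bits of data per transfer, counted in both directions at once.
    """
    return gigatransfers_per_s * bytes_per_transfer * directions


if __name__ == "__main__":
    for rate in (6.4, 4.8):
        total = qpi_peak_bandwidth_gb_s(rate)
        one_way = qpi_peak_bandwidth_gb_s(rate, directions=1)
        print(f"{rate} GT/sec: {total:.1f} GB/sec total, {one_way:.1f} GB/sec each way")
```

Under these assumptions, the headline numbers are bidirectional totals. Traffic flowing in a single direction tops out at half the quoted figure.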

Improvements to the Base Core Architecture
Core i7 boasts a substantial set of enhancements over the original Core 2 architecture, some of which are more subtle than others.

Let’s run down some of the more significant enhancements, in no particular order.

* The Return of Hyper-Threading: Core i7 now implements Hyper-Threading, Intel’s version of simultaneous multithreading (SMT). Each processor core can handle two simultaneous execution threads. Intel added processor resources, including deeper buffers, to enable robust SMT support. Load buffers have been increased from 32 (Core 2) to 48 (Core i7), while the number of store buffers went from 20 to 32.
* New SSE4.2 instructions: Intel enhanced SSE once again, adding instructions aimed mainly at string and text processing (such as XML parsing), along with CRC32 and population-count operations.
* Fast, unaligned cache access: Before Nehalem, data needed to be aligned on cache line boundaries for maximum performance. That’s no longer true with Nehalem. The change will benefit newer applications written with Nehalem in mind more than older ones, because compilers and application authors have often taken great care to align data along cache line boundaries already.
* Advanced Power Management: The Core i7 actually contains another processor core, much tinier than the main cores. This is the power management unit, a dedicated microcontroller on the Nehalem die that’s not accessible from the outside world. Its sole purpose is to manage the power envelope of Nehalem. Sensors built into the main cores monitor thermals, power and current, optimizing power delivery as needed. Nehalem is also engineered to minimize idle power. For example, Core i7 implements a per-core C6 sleep state.
* Turbo Mode: One interesting aspect of Core i7’s power management is Turbo Mode (not to be confused with Turbo Cache). Turbo Mode is a sort of automatic overclocking feature, in which individual cores can be driven to higher clock frequencies as needed. Turbo Mode is treated as another power state by the power management unit, and operates transparently to the OS and the user.
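To illustrate the alignment point from the list above, here is a minimal Python sketch of the check that alignment-conscious code has traditionally cared about: whether a memory access straddles a cache line boundary. It assumes a 64-byte cache line, which is not stated in this article, and the helper names are ours.

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines


def crosses_cache_line(address, size, line_bytes=CACHE_LINE_BYTES):
    """Return True if an access of `size` bytes at `address` spans two lines."""
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    return first_line != last_line


def align_up(address, alignment=CACHE_LINE_BYTES):
    """Round `address` up to the next multiple of `alignment` (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    # A 16-byte load starting 4 bytes before a line boundary spans two lines.
    print(crosses_cache_line(60, 16))
    # The same load at an aligned address stays within one line.
    print(crosses_cache_line(align_up(60), 16))
```

Code that pads or aligns its data this way already avoids the split-line penalty on older CPUs, which is why the faster unaligned path matters more for software that skips such care.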

 

 

 

 
