Bulldozer is the codename AMD has given to one of the next-generation CPU cores after the K10 microarchitecture for the company's M-SPACE design methodology, with the core specifically aimed at 10 watt to 100 watt TDP computing products. Bulldozer is a completely new design developed from the ground up. AMD claims dramatic performance-per-watt improvements in HPC applications with Bulldozer cores. Products implementing the Bulldozer core are planned for release in 2011.
According to AMD, Bulldozer-based CPUs will be based on advanced 32nm SOI process technology and utilize a new approach to multithreaded computer performance that, according to press notes, "balances dedicated and shared compute resources to provide a highly compact, high core count design that is easily replicated on a chip for performance scaling." In other words, by eliminating some of the redundancies that naturally creep into multicore designs, AMD hopes to take better advantage of its hardware capabilities, while utilizing less power.
The Bulldozer cores will support most of the instruction sets currently implemented in Intel processors (including SSE4.1, SSE4.2, AES, CLMUL), future Instruction sets announced by Intel (AVX), as well as future instruction sets proposed by AMD (XOP and FMA4).
As of November 2009, Bulldozer-based implementations built on 32nm SOI with HKMG are scheduled to arrive in 2011 for both servers and desktops, as the 16-core Opteron processor codenamed Interlagos and as the 4- or 8-core desktop processor codenamed Zambezi.
Bulldozer is the next-generation micro-architecture and processor design developed from the ground up by AMD. Bulldozer will be the first major redesign of AMD’s processor architecture since 2003, when the firm launched its Athlon 64/Opteron (K8) processors. Bulldozer will feature two 128-bit FMA-capable FPUs which can be combined into one 256-bit FPU. This design is accompanied with two integer cores each with 4 pipelines (the fetch/decode stage is shared). Bulldozer will also introduce shared L2 cache in the new architecture. AMD calls this design a "Bulldozer module". A 16-core processor design would feature eight of these modules, but the operating system will see each module as two physical cores.
The module is similar to an SMT core, but enhanced with a dedicated integer core and scheduler for each thread. Because the shared floating point core is significantly enhanced, performance could get beyond that of two equivalent Bobcat cores while one of the running threads is integer-only.
Bulldozer Design Breakdown
* Two tightly coupled, "conventional" x86 out-of-order processing engines which AMD internally named module
(Single-Module ==> Dual-Core, Dual-Module ==> Quad-Core, Quad-Module ==> Octa-Core etc...)
* Between 8MB to 16MB of L3 cache shared among all Modules on the same silicon die
* DDR3-1866 and Higher Memory Level Parallelism
* Dual channel DDR3 integrated memory controler (support for PC3-12800 (DDR3-1600))
* Cluster Multi-threading (CMT) Technology
* Bulldozer module consists of the following:
o 128kB L2 cache inside each module (shared between module cores)
o 4kB L1 data cache per core and 2-way 16kB L1 instruction cache per module L1 cache, Fruehe for THW
o Two dedicated integer cores
- each consist of 2 ALU and 2 AGU which are capable for total of 4 independent arithmetic or memory operations per clock per core
- duplicating integer schedulers and execution pipelines offers dedicated hardware to each of two threads which significantly increase performance in multithreaded integer applications
- second integer core increases Bulldozer module die by around 12%, which at chip level adds about 5% of total die space
o Two symmetrical 128-bit FMAC (fused multiply-add (FMA) capability) Floating Point Pipelines per module that can be unified into one large 256-bit wide unit if one of integer cores dispatch AVX instruction and two symmetrical x87/MMX/3DNow! capable FPPs for backward compatibility with SSE2 non-optimized software
* 32nm SOI process with implemented first generation GF's High-K Metal Gate (HKMG)
* Support for AMD's only SSE5 128-bit instructions
- incl. three smaller supplemental extensions CVT16, XOP and FMA4 instruction set, which are now part of SSE5 specification (since May 2009 revision)
* Support for Intel's Advanced Vector Extensions (AVX) (Supports 256-Bit FP Operations via AVX)SSE4.1, SSE4.2, AES, CLMUL), future Instruction sets announced by Intel (AVX), as well as future instruction sets proposed by AMD (XOP and FMA4
* Hyper Transport Technology rev.3.1 (3.20 GHz, 6.4 GT/s, 51.6 GB/s, 16-bit uplink/16-bit downlink) [first implemented into HY-D1 revision "Magny-Cours" on the socket G34 Opteron platform in March 2010 and "Lisbon" on the socket C32 Opteron platform in June 2010]
* Socket AM3+ (AM3r2)
- 938pin(?), DDR3 support
- will retain only backwards compatiblity with previous Socket AM3/AM2 processors ("new AM3+ socket for consumer versions of Bulldozer CPUs. AM2 and AM3 processors will work in the AM3+ socket, but Bulldozer chips will not work in non-AM3+ motherboards")
* Min-Max Power Usage - 10-100 watts
* Bulldozer Module sharing levels Bulldozer module