Hopper (microarchitecture)
Hopper is the codename for Nvidia's datacenter GPU microarchitecture, released in parallel with Ada Lovelace, its consumer-segment counterpart. It is named after the American computer scientist and United States Navy rear admiral Grace Hopper. Hopper was once rumored to be Nvidia's first GPU generation to use multi-chip modules (MCMs), but the H100 announcement revealed a large monolithic die.[1][2][3][4][5][6] Nvidia officially announced the Hopper microarchitecture and the H100 GPU at GTC 2022 on March 22, 2022.[7]
Launched | September 20, 2022 |
---|---|
Designed by | Nvidia |
Manufactured by | TSMC |
Fabrication process | TSMC N4 |
Specifications | |
L1 cache | 256 KB (per SM) |
L2 cache | 60 MB |
Memory support | HBM3 |
PCIe support | PCIe 5.0 |
Supported Graphics APIs | |
DirectX | DirectX 12 Ultimate (12.2) |
Direct3D | Direct3D 12 |
OpenCL | OpenCL 3.0 |
CUDA | Compute Capability 9.0 |
Supported Compute APIs | |
CUDA | CUDA Toolkit 11.6 |
DirectCompute | Yes |
Media Engine | |
Encoder(s) supported | NVENC |
History | |
Predecessor | Ampere |
Variant | Ada Lovelace (consumer) |
Successor | Blackwell |
Details
Architectural improvements of the Hopper architecture include the following:[8]
- CUDA Compute Capability 9.0[9]
- TSMC N4 FinFET process
- Fourth-generation Tensor Cores with FP8, FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration.
- New Nvidia Transformer Engine with FP8 and FP16
- New DPX instructions
- High Bandwidth Memory 3 (HBM3) on H100 80GB
- Double FP32 cores per Streaming Multiprocessor (SM)
- NVLink 4.0
- PCI Express 5.0 with SR-IOV support (SR-IOV available only on H100)
- Second-generation Multi-Instance GPU (MIG) virtualization and partitioning, supporting up to seven instances on H100
- PureVideo hardware video decoding
- Eight NVDEC video decoders on H100
- New hardware-based JPEG decoding with seven NVJPG hardware decoders, supporting YUV420, YUV422, YUV444, YUV400 and RGBA input formats; not to be confused with nvJPEG, Nvidia's GPU-accelerated JPEG encode/decode library
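Nvidia's FP8 support, used by the Transformer Engine above, comes in two encodings, E4M3 and E5M2. As an illustrative sketch (pure Python, not Nvidia code), an E4M3 value — 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, and no infinities — can be decoded like this:

```python
import math

def decode_fp8_e4m3(byte: int) -> float:
    """Decode an 8-bit E4M3 value: 1 sign, 4 exponent (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:
        # E4M3 reserves only S.1111.111 for NaN; there are no infinities,
        # which frees the top exponent value for large finite numbers.
        return math.nan
    if exp == 0:
        return sign * (mant / 8) * 2.0 ** -6        # subnormal
    return sign * (1 + mant / 8) * 2.0 ** (exp - 7)  # normal

print(decode_fp8_e4m3(0x38))  # 1.0
print(decode_fp8_e4m3(0x3C))  # 1.5
print(decode_fp8_e4m3(0x7E))  # 448.0, the largest finite E4M3 value
```

The absence of infinities is why E4M3 reaches 448 rather than stopping at 240; E5M2 (bias 15, 2 mantissa bits) instead keeps IEEE-style infinities and trades precision for range.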
Chips
- GH100
Comparison of Compute Capability: GP100 vs GV100 vs GA100 vs GH100[10][11]
GPU features | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 | NVIDIA H100 |
---|---|---|---|---|
GPU codename | GP100 | GV100 | GA100 | GH100 |
GPU architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper |
Transistors | 15.3 billion | 21.1 billion | 54.2 billion | 80 billion |
Process | TSMC 16 nm | TSMC 12 nm | TSMC 7 nm | TSMC 4 nm |
Die size | 610 mm² | 828 mm² | 815 mm² | 814 mm² |
Compute capability | 6.0 | 7.0 | 8.0 | 9.0 |
Threads / warp | 32 | 32 | 32 | 32 |
Max warps / SM | 64 | 64 | 64 | 64 |
Max threads / SM | 2048 | 2048 | 2048 | 2048 |
Max thread blocks / SM | 32 | 32 | 32 | 32 |
Max thread blocks / thread block cluster | N/A | N/A | N/A | 16 |
Max 32-bit registers / SM | 65536 | 65536 | 65536 | 65536 |
Max registers / block | 65536 | 65536 | 65536 | 65536 |
Max registers / thread | 255 | 255 | 255 | 255 |
Max thread block size | 1024 | 1024 | 1024 | 1024 |
FP32 cores / SM | 64 | 64 | 64 | 128 |
Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024 | 512 |
Shared Memory Size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB | Configurable up to 228 KB |
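The register-to-core ratio row follows directly from the fixed 65,536-entry register file: GH100 doubles the FP32 cores per SM while keeping the same register file, halving the ratio. The same arithmetic also bounds per-thread register use at full occupancy (a small sketch using the figures from the table):

```python
regs_per_sm = 65536        # 32-bit registers per SM, identical from GP100 to GH100
max_threads_per_sm = 2048  # also identical across all four generations

# Doubling FP32 cores per SM on GH100 halves the register-to-core ratio.
print(regs_per_sm // 64)   # 1024 (GP100 / GV100 / GA100, 64 FP32 cores per SM)
print(regs_per_sm // 128)  # 512  (GH100, 128 FP32 cores per SM)

# At the 2048-thread occupancy limit, each thread averages at most:
print(regs_per_sm // max_threads_per_sm)  # 32 registers
```

Kernels using more than 32 registers per thread therefore cannot reach the 2048-thread-per-SM maximum on any of these chips, a trade-off the per-thread limit of 255 registers makes explicit.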
Comparison of precision support[12][13]
The first nine value columns list CUDA core precisions; the last nine list Tensor core precisions.
GPU | FP8 | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16 | FP8 | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NVIDIA Tesla P4 | No | No | Yes | Yes | No | No | Yes | No | No | No | No | No | No | No | No | No | No | No |
NVIDIA P100 | No | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
NVIDIA Volta | No | Yes | Yes | Yes | No | No | Yes | No | No | No | Yes | No | No | No | No | No | No | No |
NVIDIA Turing | No | Yes | Yes | Yes | No | No | Yes | No | No | No | Yes | No | No | Yes | Yes | Yes | No | No |
NVIDIA A100 | No | Yes | Yes | Yes | No | No | Yes | No | Yes | No | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
NVIDIA H100 | No | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | Yes | No | Yes | No | No | Yes | Yes | Yes |
Legend:
- FPnn: floating point with nn bits
- INTn: integer with n bits
- INT1: binary
- TF32: TensorFloat32
- BF16: bfloat16
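TF32 keeps FP32's full 8-bit exponent but only 10 explicit mantissa bits, so its range matches FP32 while its precision matches FP16. As a rough sketch (truncation of the low 13 mantissa bits, rather than the hardware's actual rounding behavior), the reduced precision can be illustrated as:

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32 by clearing the low 13 mantissa bits of a float32,
    leaving 10 mantissa bits and the full 8-bit FP32 exponent."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFE000))[0]

print(to_tf32(1.0))            # 1.0, exactly representable
print(to_tf32(1.0 + 2**-10))   # 1.0009765625, last representable step above 1.0
print(to_tf32(1.0 + 2**-11))   # 1.0, the extra bit falls below TF32 precision
```

BF16 goes one step further in the same direction: 8 exponent bits but only 7 mantissa bits, fitting FP32's range into 16 bits.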
Comparison of decode performance (concurrent 1080p30 streams)
GPU | H.264 decode | H.265 (HEVC) decode | VP9 decode |
---|---|---|---|
V100 | 16 | 22 | 22 |
A100 | 75 | 157 | 108 |
H100 | 170 | 340 | 260 |
GPU | JPEG 4:4:4 decode (1080p, images/sec)[11] | JPEG 4:2:0 decode (1080p, images/sec) |
---|---|---|
A100 | 1490 | 2950 |
H100 | 3310 | 6350 |
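As a quick sanity check on the two tables above, H100's decode throughput works out to roughly 2.2–2.4× A100's in every category (a small sketch, with the figures copied from the tables):

```python
# Figures from the decode-performance tables: concurrent 1080p30 streams
# for the video codecs, images/sec for JPEG.
a100 = {"H.264": 75, "HEVC": 157, "VP9": 108, "JPEG 4:4:4": 1490, "JPEG 4:2:0": 2950}
h100 = {"H.264": 170, "HEVC": 340, "VP9": 260, "JPEG 4:4:4": 3310, "JPEG 4:2:0": 6350}

for codec in a100:
    print(f"{codec}: {h100[codec] / a100[codec]:.1f}x")  # ratios from 2.2x to 2.4x
```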
Products using Hopper
- Nvidia Data Center GPUs
- Nvidia H100 80GB (GH100)
References
- kopite7kimi (June 10, 2019). "After Ampere, the next codename of GeForce is Hopper, in memory of Grace Hopper". @kopite7kimi. Retrieved December 1, 2019.
- "Hardware- und Nachrichten-Links des 11./12. November 2019". www.3dcenter.org (in German). Retrieved December 1, 2019.
- Hagedoorn, Hilbert. "NVIDIA Next Gen-GPU Hopper could be offered in chiplet design". Guru3D.com. Retrieved December 1, 2019.
- Pirzada, Usman (November 16, 2019). "NVIDIA Next Generation Hopper GPU Leaked - Based On MCM Design, Launching After Ampere". Wccftech. Retrieved December 1, 2019.
- "NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder". AnandTech. March 22, 2022.
- "NVIDIA Hopper Architecture In-Depth". Nvidia. March 22, 2022.
- "NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing".
- "NVIDIA Hopper GPU Architecture".
- "CUDA C++ Programming Guide".
- "NVIDIA A100 Tensor Core GPU Architecture" (PDF). www.nvidia.com. Retrieved September 18, 2020.
- "NVIDIA H100 Tensor Core GPU Architecture Whitepaper". NVIDIA.
- "NVIDIA Tensor Cores: Versatility for HPC & AI". NVIDIA.
- "Abstract". docs.nvidia.com.