Hopper (microarchitecture)

Hopper is the codename for Nvidia's datacenter GPU microarchitecture, released in parallel with Ada Lovelace (its consumer-segment counterpart). It is named after the American computer scientist and United States Navy Rear Admiral Grace Hopper. Hopper was once rumored to be Nvidia's first generation of GPUs to use multi-chip modules (MCMs), although the H100 announcement showed a massive monolithic die.[1][2][3][4][5][6] Nvidia officially announced the Hopper GPU microarchitecture and the H100 GPU at GTC 2022 on March 22, 2022.[7]

Hopper
Launched: September 20, 2022
Designed by: Nvidia
Manufactured by: TSMC
Fabrication process: TSMC N4
Specifications
L1 cache: 256 KB (per SM)
L2 cache: 60 MB
Memory support: HBM3
PCIe support: PCIe 5.0
Supported Graphics APIs
DirectX: DirectX 12 Ultimate (12.2)
Direct3D: Direct3D 12
OpenCL: OpenCL 3.0
CUDA: Compute Capability 9.0
Supported Compute APIs
CUDA: CUDA Toolkit 11.6
DirectCompute: Yes
Media Engine
Encoder(s) supported: NVENC
History
Predecessor: Ampere
Variant: Ada Lovelace (consumer)
Successor: Blackwell
Grace Hopper, eponym of the architecture

Details

Architectural improvements introduced with Hopper include the following:[8]

  • CUDA Compute Capability 9.0[9]
  • TSMC N4 FinFET process
  • Fourth-generation Tensor Cores with FP8, FP16, bfloat16, TensorFloat-32 (TF32), and FP64 support, as well as sparsity acceleration
  • New Nvidia Transformer Engine with FP8 and FP16
  • New DPX instructions to accelerate dynamic-programming algorithms (see the sketch after this list)
  • High Bandwidth Memory 3 (HBM3) on H100 80GB
  • Double FP32 cores per Streaming Multiprocessor (SM)
  • NVLink 4.0
  • PCI Express 5.0 with SR-IOV support (SR-IOV is limited to the H100)
  • Second-generation Multi-Instance GPU (MIG) virtualization and GPU partitioning on the H100, supporting up to seven instances
  • PureVideo feature set hardware video decoding
  • 8 NVDEC units on the H100
  • New hardware-based JPEG decoding with 7 single-core NVJPG hardware decoders, supporting YUV420, YUV422, YUV444, YUV400, and RGBA; not to be confused with Nvidia's nvJPEG, the GPU-accelerated library for JPEG encoding and decoding
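
The DPX instructions target the inner step of dynamic-programming algorithms (sequence alignment, shortest paths), which fuses an addition with a min or max. The following is a minimal sketch, not taken from the article: it assumes the CUDA 12 intrinsic __viaddmax_s32(a, b, c), which computes max(a + b, c) and maps to a hardware DPX instruction when built for compute capability 9.0 (-arch=sm_90), falling back to an ordinary instruction sequence on earlier GPUs.

```cuda
// Sketch only: fused add+max via a DPX-style intrinsic (CUDA 12+).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fused_add_max(const int* a, const int* b, const int* c,
                              int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Typical dynamic-programming update: out = max(a + b, c).
        // On Hopper (sm_90) this lowers to a single DPX instruction.
        out[i] = __viaddmax_s32(a[i], b[i], c[i]);
    }
}

int main() {
    const int n = 1 << 20;
    int *a, *b, *c, *out;
    cudaMallocManaged(&a, n * sizeof(int));
    cudaMallocManaged(&b, n * sizeof(int));
    cudaMallocManaged(&c, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = -i / 2; c[i] = 100; }

    fused_add_max<<<(n + 255) / 256, 256>>>(a, b, c, out, n);
    cudaDeviceSynchronize();
    printf("out[42] = %d\n", out[42]);  // max(42 + (-21), 100) = 100

    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(out);
    return 0;
}
```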

Chips

  • GH100

Comparison of Compute Capability: GP100 vs GV100 vs GA100 vs GH100[10][11]

GPU features | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 | NVIDIA H100
GPU codename | GP100 | GV100 | GA100 | GH100
GPU architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper
Transistors | 15.3 billion | 21.1 billion | 54.2 billion | 80 billion
Process | 16 nm | 12 nm | TSMC 7 nm | TSMC 4 nm
Die size | 610 mm² | 828 mm² | 815 mm² | 814 mm²
Compute capability | 6.0 | 7.0 | 8.0 | 9.0
Threads / warp | 32 | 32 | 32 | 32
Max warps / SM | 64 | 64 | 64 | 64
Max threads / SM | 2048 | 2048 | 2048 | 2048
Max thread blocks / SM | 32 | 32 | 32 | 32
Max thread blocks / thread block cluster | N/A | N/A | N/A | 16
Max 32-bit registers / SM | 65536 | 65536 | 65536 | 65536
Max registers / block | 65536 | 65536 | 65536 | 65536
Max registers / thread | 255 | 255 | 255 | 255
Max thread block size | 1024 | 1024 | 1024 | 1024
FP32 cores / SM | 64 | 64 | 64 | 128
Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024 | 512
Shared memory size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB | Configurable up to 228 KB
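
The "Max thread blocks / thread block cluster" row above refers to thread block clusters, a new grouping level introduced with compute capability 9.0: up to 16 thread blocks can be scheduled together and synchronize with one another. Below is a minimal sketch, not taken from the article, assuming the CUDA 12 __cluster_dims__ kernel attribute and the cooperative-groups cluster_group API; it must be compiled for sm_90.

```cuda
// Sketch only: a kernel launched with thread block clusters (CUDA 12, sm_90).
#include <cstdio>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Compile-time cluster size: 2 x 1 x 1 thread blocks per cluster.
__global__ void __cluster_dims__(2, 1, 1) cluster_kernel(int* out) {
    cg::cluster_group cluster = cg::this_cluster();

    // Rank of this block within its cluster (0 or 1 here).
    unsigned int rank = cluster.block_rank();

    if (threadIdx.x == 0) {
        out[blockIdx.x] = static_cast<int>(rank);
    }

    // Hardware-supported barrier across all blocks in the cluster.
    cluster.sync();
}

int main() {
    int* out;
    cudaMallocManaged(&out, 8 * sizeof(int));

    // 8 blocks -> 4 clusters of 2 blocks each.
    cluster_kernel<<<8, 32>>>(out);
    cudaDeviceSynchronize();

    for (int i = 0; i < 8; ++i) printf("block %d cluster rank %d\n", i, out[i]);
    cudaFree(out);
    return 0;
}
```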

Comparison of Precision Support Matrix[12][13]

Supported CUDA core precisions:

GPU | FP8 | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16
NVIDIA Tesla P4 | No | No | Yes | Yes | No | No | Yes | No | No
NVIDIA P100 | No | Yes | Yes | Yes | No | No | No | No | No
NVIDIA Volta | No | Yes | Yes | Yes | No | No | Yes | No | No
NVIDIA Turing | No | Yes | Yes | Yes | No | No | Yes | No | No
NVIDIA A100 | No | Yes | Yes | Yes | No | No | Yes | No | Yes
NVIDIA H100 | No | Yes | Yes | Yes | No | No | Yes | No | Yes

Supported Tensor core precisions:

GPU | FP8 | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16
NVIDIA Tesla P4 | No | No | No | No | No | No | No | No | No
NVIDIA P100 | No | No | No | No | No | No | No | No | No
NVIDIA Volta | No | Yes | No | No | No | No | No | No | No
NVIDIA Turing | No | Yes | No | No | Yes | Yes | Yes | No | No
NVIDIA A100 | No | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes
NVIDIA H100 | Yes | Yes | No | Yes | No | No | Yes | Yes | Yes

Legend:

  • FPnn: floating point with nn bits
  • INTn: integer with n bits
  • INT1: binary
  • TF32: TensorFloat32
  • BF16: bfloat16
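
H100 is the first entry in the matrix above whose Tensor Cores accept FP8. CUDA 11.8 and later expose the two FP8 storage formats, E4M3 and E5M2, through the <cuda_fp8.h> header as __nv_fp8_e4m3 and __nv_fp8_e5m2. The sketch below is illustrative only, not from the article: it round-trips a value through both formats on the host to show the quantization that FP8 workflows (such as the Transformer Engine, which switches between FP8 and FP16) have to manage.

```cuda
// Sketch only: FP8 storage types from <cuda_fp8.h> (CUDA 11.8+).
// E4M3 trades range for precision; E5M2 does the reverse.
#include <cstdio>
#include <cuda_fp8.h>

int main() {
    float x = 3.14159f;

    __nv_fp8_e4m3 a(x);   // 4 exponent bits, 3 mantissa bits
    __nv_fp8_e5m2 b(x);   // 5 exponent bits, 2 mantissa bits

    printf("original        : %f\n", x);
    printf("e4m3 round-trip : %f\n", static_cast<float>(a));
    printf("e5m2 round-trip : %f\n", static_cast<float>(b));
    return 0;
}
```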

Comparison of Decode Performance

Concurrent streams | H.264 decode (1080p30) | H.265 (HEVC) decode (1080p30) | VP9 decode (1080p30)
V100 | 16 | 22 | 22
A100 | 75 | 157 | 108
H100 | 170 | 340 | 260

Images/sec[11] | JPEG 4:4:4 decode (1080p) | JPEG 4:2:0 decode (1080p)
A100 | 1490 | 2950
H100 | 3310 | 6350

Products using Hopper

  • Nvidia H100 (based on the GH100 chip)
