# Hardware Support for NVIDIA NIM on Google Kubernetes Engine (GKE)

The following are the supported optimized profiles for specific hardware configurations of NVIDIA NIM on Google Kubernetes Engine (GKE).

| NIM | Version | Min. GPUs required by NIM | GPU | Compute name in the GCP supported compute page | # GPUs on instance | Precision | Profile |
|---|---|---|---|---|---|---|---|
| meta/llama3.1-405b-instruct | 1.1.2 | 8 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| meta/llama3.1-8b-instruct | 1.1.2 | 1 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 2 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 1 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-1g | 1 | BF16 (trt-llm) | Throughput |
| | | 2 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-2g | 2 | BF16 (trt-llm) | Latency |
| | | 2 | L4 | L4-\<region\>-g2-standard-24 | 2 | FP16 (vllm) | Non-optimized |
| meta/llama3.1-70b-instruct | 1.1.2 | 4 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 8 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 4 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-4g | 4 | BF16 (trt-llm) | Throughput |
| | | 8 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-8g | 8 | BF16 (trt-llm) | Latency |
| | | 8 | L4 | L4-\<region\>-g2-standard-96 | 8 | FP16 (vllm) | Non-optimized |
| meta/llama3-70b-instruct | 1.0.3 | 4 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 8 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 4 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-4g | 4 | FP16 | Throughput |
| | | 8 | L4 | L4-\<region\>-g2-standard-96 | 8 | FP16 (vllm) | Non-optimized |
| meta/llama3-8b-instruct | 1.0.3 | 1 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP16 | Throughput |
| | | 2 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP16 | Latency |
| | | 1 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-1g | 1 | FP16 | Throughput |
| | | 2 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-2g | 2 | FP16 | Latency |
| | | 2 | L4 | L4-\<region\>-g2-standard-24 | 2 | FP16 (vllm) | Non-optimized |
| mistralai/mistral-7b-instruct-v0.3 | 1.0.3 | 1 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 2 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 1 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-1g | 1 | FP16 (trt-llm) | Throughput |
| | | 2 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-2g | 2 | FP16 (trt-llm) | Latency |
| | | 4 | L4 | L4-\<region\>-g2-standard-48 | 4 | FP16 (vllm) | Non-optimized |
| mistralai/mixtral-8x7b-instruct-v0.1 | 1.0.0 | 2 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 4 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 2 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-2g | 2 | FP16 (trt-llm) | Throughput |
| | | 4 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-4g | 4 | FP16 (trt-llm) | Latency |
| nvidia/nv-rerankqa-mistral-4b-v3 | 1.0.2 | 1 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP16 | |
| | | 1 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-1g | 1 | FP16 | |
| | | 2 | L4 | L4-\<region\>-g2-standard-24 | 2 | FP16 | |
| nvidia/nv-embedqa-e5-v5 | 1.0.1 | 1 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP16 | |
| | | 1 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-1g | 1 | FP16 | |
| | | 2 | L4 | L4-\<region\>-g2-standard-24 | 2 | FP16 | |
| nvidia/nv-embedqa-mistral-7b-v2 | 1.0.1 | 1 | H100 (80GB) | H100(80GB)-\<region\>-a3-highgpu-8g | 8 | FP8 | |
| | | 1 | A100 (80GB) | A100(80GB)-\<region\>-a2-ultragpu-1g | 1 | FP16 | |
| | | 2 | L4 | L4-\<region\>-g2-standard-24 | 2 | FP16 | |
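As a sketch of how one of these configurations might be provisioned, the following creates a GKE node pool matching the `a3-highgpu-8g` rows above. The cluster name, pool name, and region are hypothetical placeholders (not taken from the table); check the current `gcloud` documentation for exact flags and accelerator type names before use.

```shell
# Sketch: provision a GKE node pool for the H100(80GB)-<region>-a3-highgpu-8g rows.
# "nim-cluster", "nim-h100-pool", and "us-central1" are assumed placeholders.
gcloud container node-pools create nim-h100-pool \
  --cluster=nim-cluster \
  --region=us-central1 \
  --machine-type=a3-highgpu-8g \
  --accelerator=type=nvidia-h100-80gb,count=8 \
  --num-nodes=1
```

A NIM pod can then be scheduled onto this pool by requesting `nvidia.com/gpu` resources equal to the "Min. GPUs required by NIM" column for the chosen profile.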