Hardware Support for NVIDIA NIM on Google Kubernetes Engine (GKE)
The following table lists the supported optimized profiles for NVIDIA NIM on Google Kubernetes Engine (GKE), by hardware configuration.
| NIM | Version | Minimum GPUs required by NIM | GPU | Supported GCP compute: name on the configuration page | GPUs on instance | Precision | Profile |
|---|---|---|---|---|---|---|---|
| meta/llama3.1-405b-instruct | 1.1.2 | 8 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| meta/llama3.1-8b-instruct | 1.1.2 | 1 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 2 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 1 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-1g | 1 | BF16 (trt-llm) | Throughput |
| | | 2 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-2g | 2 | BF16 (trt-llm) | Latency |
| | | 2 | L4 | L4-<region>-g2-standard-24 | 2 | FP16 (vllm) | Non-optimized |
| meta/llama3.1-70b-instruct | 1.1.2 | 4 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 8 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 4 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-4g | 4 | BF16 (trt-llm) | Throughput |
| | | 8 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-8g | 8 | BF16 (trt-llm) | Latency |
| | | 8 | L4 | L4-<region>-g2-standard-96 | 8 | FP16 (vllm) | Non-optimized |
| meta/llama3-70b-instruct | 1.0.3 | 4 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 8 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 4 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-4g | 4 | FP16 | Throughput |
| | | 8 | L4 | L4-<region>-g2-standard-96 | 8 | FP16 (vllm) | Non-optimized |
| meta/llama3-8b-instruct | 1.0.3 | 1 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP16 | Throughput |
| | | 2 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP16 | Latency |
| | | 1 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-1g | 1 | FP16 | Throughput |
| | | 2 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-2g | 2 | FP16 | Latency |
| | | 2 | L4 | L4-<region>-g2-standard-24 | 2 | FP16 (vllm) | Non-optimized |
| mistralai/mistral-7b-instruct-v0.3 | 1.0.3 | 1 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 2 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 1 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-1g | 1 | FP16 (trt-llm) | Throughput |
| | | 2 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-2g | 2 | FP16 (trt-llm) | Latency |
| | | 4 | L4 | L4-<region>-g2-standard-48 | 4 | FP16 (vllm) | Non-optimized |
| mistralai/mixtral-8x7b-instruct-v0.1 | 1.0.0 | 2 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| | | 4 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| | | 2 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-2g | 2 | FP16 (trt-llm) | Throughput |
| | | 4 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-4g | 4 | FP16 (trt-llm) | Latency |
| nvidia/nv-rerankqa-mistral-4b-v3 | 1.0.2 | 1 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP16 | |
| | | 1 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-1g | 1 | FP16 | |
| | | 2 | L4 | L4-<region>-g2-standard-24 | 2 | FP16 | |
| nvidia/nv-embedqa-e5-v5 | 1.0.1 | 1 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP16 | |
| | | 1 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-1g | 1 | FP16 | |
| | | 2 | L4 | L4-<region>-g2-standard-24 | 2 | FP16 | |
| nvidia/nv-embedqa-mistral-7b-v2 | 1.0.1 | 1 | H100 (80GB) | H100(80GB)-<region>-a3-highgpu-8g | 8 | FP8 | |
| | | 1 | A100 (80GB) | A100(80GB)-<region>-a2-ultragpu-1g | 1 | FP16 | |
| | | 2 | L4 | L4-<region>-g2-standard-24 | 2 | FP16 | |
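When scripting a deployment, it can help to keep the relevant rows of the table as data and filter them by GPU and profile. The following is a minimal sketch, not part of NIM or GKE tooling: the `SupportedConfig` type, the `LLAMA31_8B` list, and the `find_configs` helper are hypothetical names introduced for illustration, and the rows are copied from the meta/llama3.1-8b-instruct (1.1.2) entries above, with profile labels written as Throughput, Latency, and Non-optimized.

```python
from typing import List, NamedTuple, Optional


class SupportedConfig(NamedTuple):
    """One table row (hypothetical container, for illustration only)."""
    min_gpus: int       # minimum GPUs required by the NIM
    gpu: str            # GPU model as written in the table
    machine_type: str   # machine type suffix of the GCP compute name
    precision: str
    profile: str


# Rows for meta/llama3.1-8b-instruct 1.1.2, copied from the table.
LLAMA31_8B: List[SupportedConfig] = [
    SupportedConfig(1, "H100 (80GB)", "a3-highgpu-8g", "FP8 (trt-llm)", "Throughput"),
    SupportedConfig(2, "H100 (80GB)", "a3-highgpu-8g", "FP8 (trt-llm)", "Latency"),
    SupportedConfig(1, "A100 (80GB)", "a2-ultragpu-1g", "BF16 (trt-llm)", "Throughput"),
    SupportedConfig(2, "A100 (80GB)", "a2-ultragpu-2g", "BF16 (trt-llm)", "Latency"),
    SupportedConfig(2, "L4", "g2-standard-24", "FP16 (vllm)", "Non-optimized"),
]


def find_configs(
    rows: List[SupportedConfig], gpu: str, profile: Optional[str] = None
) -> List[SupportedConfig]:
    """Return the rows matching a GPU model and, optionally, a profile label."""
    return [
        row
        for row in rows
        if row.gpu == gpu and (profile is None or row.profile == profile)
    ]


if __name__ == "__main__":
    for cfg in find_configs(LLAMA31_8B, "A100 (80GB)", "Latency"):
        print(cfg.machine_type, cfg.min_gpus, cfg.precision)
```

An empty result means the table lists no such combination; for example, the L4 rows for this model carry only the non-optimized profile.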