Air Gap Deployment of NVIDIA NIM for LLMs
NVIDIA NIM for large language models (LLMs) supports serving models in air-gapped systems (also known as air wall, air gapping, or disconnected networks). Before you use this documentation, review all of the prerequisites and instructions in the Getting Started guide, and refer to Serving Models from Local Assets.
Air Gap Deployment (offline cache route)
If NIM detects a previously loaded profile in the cache, it serves that profile from the cache. After you download profiles to the cache with download-to-cache, you can transfer the cache to an air-gapped system and run NIM there without any internet connection and without a connection to the NGC registry.
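For reference, the cache can be populated on an internet-connected machine before it is copied over. The following is a minimal sketch, assuming the NGC_API_KEY, LOCAL_NIM_CACHE, and IMG_NAME variables from the Getting Started guide are already set; it uses the plain download-to-cache invocation, which downloads the optimal profile for the detected GPUs (a specific profile ID can be passed instead, as shown later in this section).
# Sketch: populate the local cache on an internet-connected machine.
# Assumes NGC_API_KEY, LOCAL_NIM_CACHE, and IMG_NAME are already exported
# as described in Getting Started.
docker run -it --rm --name=download-to-cache \
--runtime=nvidia \
--gpus all \
-e NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
$IMG_NAME \
download-to-cache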
To see this in action, do not provide an NGC_API_KEY, as shown in the following examples.
# Create an example air-gapped directory where the downloaded NIM will be deployed
export AIR_GAP_NIM_CACHE=~/.cache/air-gap-nim-cache
mkdir -p "$AIR_GAP_NIM_CACHE"
# Transport the downloaded NIM to an air-gapped directory
cp -r "$LOCAL_NIM_CACHE"/* "$AIR_GAP_NIM_CACHE"
# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct
# The repository name from the previous ngc registry image list command
Repository=nim/meta/llama-3.1-8b-instruct
# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:latest"
# Assuming the command run prior was `download-to-cache`, downloading the optimal profile
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
# Assuming the command run prior was `download-to-cache --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b`
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-e NIM_MODEL_PROFILE=09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b \
-v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
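Once the server reports that it is ready, you can verify the air-gapped deployment from the host. The following is a minimal sketch that assumes the port mapping from the commands above and a served model name of meta/llama-3.1-8b-instruct; confirm the exact name with GET /v1/models before sending requests.
# Check readiness and list the served models
curl -s http://localhost:8000/v1/health/ready
curl -s http://localhost:8000/v1/models
# Send a test chat completion (model name is an assumption; confirm via /v1/models)
curl -s http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 32
}'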
Air Gap Deployment (local model directory route)
Another option for the air gap route is to deploy a model repository created with the create-model-store command inside the NIM container, which creates a repository for a single model, as shown in the following example.
create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /path/to/model-repository
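The command above runs inside the NIM container. One way to invoke it on an internet-connected machine, sketched below, is to pass it as the container command; this assumes the NGC_API_KEY, LOCAL_NIM_CACHE, and IMG_NAME variables from the Getting Started guide are set, and mounts an example host directory at /model-repository to receive the created store.
# Sketch: run create-model-store as the container command on a connected machine.
# Assumes NGC_API_KEY, LOCAL_NIM_CACHE, and IMG_NAME are already exported;
# /path/to/model-repository is an example host path, and /model-repository
# is the chosen mount point inside the container.
docker run -it --rm --name=create-model-store \
--runtime=nvidia \
--gpus all \
-e NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-v /path/to/model-repository:/model-repository \
-u $(id -u) \
$IMG_NAME \
create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /model-repository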
# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct
# The repository name from the previous ngc registry image list command
Repository=nim/meta/llama-3.1-8b-instruct
# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:latest"
# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
export MODEL_REPO=/path/to/model-repository
export NIM_SERVED_MODEL_NAME=my-model
# Note: For vLLM backend, specify the following environment variables
# to set the required parallel sizes. The default values are 1.
export NIM_TENSOR_PARALLEL_SIZE=<required_value>
export NIM_PIPELINE_PARALLEL_SIZE=<required_value>
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-e NIM_MODEL_NAME=/model-repo \
-e NIM_SERVED_MODEL_NAME \
-e NIM_TENSOR_PARALLEL_SIZE \
-e NIM_PIPELINE_PARALLEL_SIZE \
-v $MODEL_REPO:/model-repo \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
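After the container starts, the model is exposed under the name set by NIM_SERVED_MODEL_NAME (my-model in this example), and requests reference that name. The following is a minimal check, assuming the port mapping from the command above.
# Query the model under its served name (my-model, set via NIM_SERVED_MODEL_NAME)
curl -s http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-model",
"messages": [{"role": "user", "content": "Write a haiku about air-gapped systems."}],
"max_tokens": 64
}'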