Utilities#

NIM includes a set of utility scripts to assist with NIM operations.

Utilities can be launched by adding the name of the desired utility to the docker run command. For example, you can run the list-model-profiles utility with the following command:

docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles

You can get more information about each utility with the -h flag:

docker run --rm --runtime=nvidia --gpus=all $IMG_NAME download-to-cache -h

List Available Model Profiles#

Prints the system information detected by NIM and the list of all profiles for the chosen NIM to the console. Profiles are categorized according to whether they are compatible with the current system, based on the detected system information.

list-model-profiles

Example#

docker run -it --rm --gpus all $IMG_NAME list-model-profiles
SYSTEM INFO
- Free GPUs:
  -  [20b2:10de] (0) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  -  [20b2:10de] (1) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  -  [20b2:10de] (2) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  -  [20b2:10de] (3) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  -  [20b2:10de] (4) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  -  [20b2:10de] (5) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  -  [20b2:10de] (6) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  -  [20b2:10de] (7) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
MODEL PROFILES
- Compatible with system and runnable:
  - d86754a6413430bf502ece62fdcc8137d4ed24d6062e93c23c1090f0623d535f (tensorrt_llm-a100-bf16-tp8-latency)
  - 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb (tensorrt_llm-a100-bf16-tp4-throughput)
  - 7283d5adcddeeab03996f61a33c51552d9bcff16c38e4a52f1204210caeb393c (vllm-fp16-tp8)
  - cdcbc486dd076bc287cca6262c59fe90057d76ae18a407882075f65a99f5f038 (vllm-fp16-tp4)
- Incompatible with system:
  - 5296eed82c6309b64b13da03fbb843d99c3276effd6a0c51e28ad5bb29f56017 (tensorrt_llm-h100-fp8-tp8-latency)
  - 4e0aeeefd4dfeae46ad40f16238bbde8858850ce0cf56c26449f447a02a9ac8f (tensorrt_llm-h100-fp8-tp4-throughput)
  - ...
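For scripting, the runnable profile hashes can be pulled out of this output; a minimal sketch, assuming output in the format shown above (the function name is hypothetical, not part of NIM):

```shell
# Hypothetical helper: print the 64-character hex profile hashes listed
# under "Compatible with system and runnable:" in saved
# list-model-profiles output.
extract_runnable_profiles() {
  # select the runnable block, ending at the "- Incompatible" line,
  # then pull out the hex hashes
  sed -n '/Compatible with system and runnable:/,/^- Incompatible/p' "$1" \
    | grep -oE '[0-9a-f]{64}'
}
```

The hashes printed this way can then be passed to download-to-cache with -p.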

Download Model Profiles to the NIM Cache#

Downloads the selected or default model profiles to the NIM cache. Can be used to pre-cache profiles before deployment. Requires NGC_API_KEY in the environment.

download-to-cache

  --profiles [PROFILES ...], -p [PROFILES ...]
Profile hashes to download. If none are provided, the optimal profile is downloaded. Multiple profiles can be specified separated by spaces.


  --all
Set to download all profiles to cache

Example#

docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  $IMG_NAME download-to-cache -p 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb
INFO 08-12 18:44:07.810 pre_download.py:80] Fetching contents for profile 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb
INFO 08-12 18:44:07.810 pre_download.py:86] {
  "feat_lora": "false",
  "gpu": "A100",
  "gpu_device": "20b2:10de",
  "llm_engine": "tensorrt_llm",
  "pp": "1",
  "precision": "bf16",
  "profile": "throughput",
  "tp": "4"
}
...
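To pre-cache every available profile rather than a specific one, the --all flag described above can be used in the same way (assuming NGC_API_KEY and LOCAL_NIM_CACHE are set as in the preceding example):

```shell
docker run -it --rm --gpus all -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  "$IMG_NAME" download-to-cache --all
```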

Create a Model Store#

Extracts files from a cached model profile and creates a properly formatted directory. If the profile is not already cached, it is first downloaded to the model cache. Downloading a profile requires NGC_API_KEY in the environment.

create-model-store

  --profile <PROFILE>, -p <PROFILE>
Profile hash to create a model directory from. Downloaded if not already present.


  --model-store <MODEL_STORE>, -m <MODEL_STORE>
Directory path to which the model profile specified by --profile will be extracted and copied.

Example#

docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache $IMG_NAME create-model-store -p 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb -m /tmp
INFO 08-12 19:49:47.629 pre_download.py:128] Fetching contents for profile 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb
INFO 08-12 19:49:47.629 pre_download.py:135] Copying contents for profile 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb to /tmp
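In automation, a quick sanity check that the target directory was actually populated can catch failures early; a minimal sketch (check_model_store is a hypothetical helper, not a NIM utility):

```shell
# Hypothetical check: succeed only if the model store directory exists
# and contains at least one entry.
check_model_store() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}
```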

Check the NIM Cache#

Checks whether the NIM cache directory is present and writable.

VLLM_NVEXT_LOG_LEVEL=debug nim-llm-check-cache-env

Example#

docker run -it --rm --gpus all -e VLLM_NVEXT_LOG_LEVEL=debug -v /bad_path:/opt/nim/.cache $IMG_NAME nim-llm-check-cache-env
WARNING 08-12 19:54:06.347 caches.py:30] /opt/nim/.cache is read-only, application may fail if model is not already present in cache
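An equivalent check can be run on the host before mounting, so a bad path is caught before the container even starts; a hypothetical sketch (not a NIM utility):

```shell
# Hypothetical host-side pre-check: warn if the directory that will be
# mounted at /opt/nim/.cache is missing or not writable.
check_cache_dir() {
  if [ -d "$1" ] && [ -w "$1" ]; then
    echo "cache OK: $1"
  else
    echo "WARNING: $1 is missing or not writable" >&2
    return 1
  fi
}
```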

Set Cache Environment Variables#

Prints the commands for setting the cache environment variables to the console.

nim-llm-set-cache-env

Example#

docker run -it --rm --gpus all -v $LOCAL_NIM_CACHE:/opt/nim/.cache $IMG_NAME nim-llm-set-cache-env
export NUMBA_CACHE_DIR=/tmp/numba
export NGC_HOME=/opt/nim/.cache/ngc
export HF_HOME=/opt/nim/.cache/huggingface
export VLLM_CONFIG_ROOT=/tmp/vllm/config
export VLLM_CACHE_ROOT=/tmp/vllm/cache
export TRITON_CACHE_DIR=/tmp/.triton
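Because the utility prints plain export statements, its output can be applied directly in the current shell. A minimal sketch (apply_cache_env is a hypothetical helper; in practice you would feed it the container's output, e.g. the result of docker run ... nim-llm-set-cache-env):

```shell
# Hypothetical helper: eval `export ...` lines read from stdin so the
# variables take effect in the current shell.
apply_cache_env() {
  eval "$(cat)"
}
```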