API Reference

OpenAPI Schema

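The OpenAPI specification describes the endpoints of NVIDIA NIM for VLMs:

  • /v1/models - List available models

  • /v1/health/ready - Readiness health check

  • /v1/health/live - Service liveness check

  • /v1/chat/completions - OpenAI-compatible chat endpoint

  • /inference/chat_completion - Llama Stack-compatible chat endpoint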

API Examples

Use the examples in this section to get started with the API.

List Models

Use the following command to list the available models.

cURL Request

curl -X 'GET' 'http://0.0.0.0:8000/v1/models'

Response

{
  "object": "list",
  "data": [
    {
      "id": "meta/llama-3.2-11b-vision-instruct",
      "object": "model",
      "created": 1724796510,
      "owned_by": "system",
      "root": "meta/llama-3.2-11b-vision-instruct",
      "parent": null,
      "max_model_len": 131072,
      "permission": [
        {
          "id": "modelperm-c2e069f426cc43088eb408f388578289",
          "object": "model_permission",
          "created": 1724796510,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
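
As a rough sketch of how a client might consume this response, the Python below pulls the model IDs out of the `data` array. The helper names `model_ids` and `list_models` are hypothetical, and the base URL simply mirrors the host and port used in the cURL command; adjust it for your deployment.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://0.0.0.0:8000"  # assumption: same host and port as the cURL example


def model_ids(models_response):
    """Return the "id" of every entry in the "data" array of a /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_models(base_url=BASE_URL, timeout=10.0):
    """GET /v1/models and return the model IDs (requires a running server)."""
    with urlopen(f"{base_url}/v1/models", timeout=timeout) as response:
        return model_ids(json.load(response))
```

With the sample response above, `model_ids` would yield a single-element list containing `meta/llama-3.2-11b-vision-instruct`, which is the value to pass as `model` in the chat requests.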

Check Health

Use the following command to check the server health.

cURL Request

curl -X 'GET' 'http://0.0.0.0:8000/v1/health/ready'

Response

{
  "object": "health.response",
  "message": "Service is ready."
}
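
A startup script often needs to wait until this endpoint reports readiness before sending requests. The following is a minimal sketch of such a polling loop, not part of the NIM API itself: `is_ready` and `wait_until_ready` are hypothetical helpers, and the sketch assumes that an HTTP 200 status means the service is ready and that any connection error or non-2xx status means it is not.

```python
import time
from urllib.error import URLError
from urllib.request import urlopen


def is_ready(base_url="http://0.0.0.0:8000", timeout=5.0):
    """Return True if GET /v1/health/ready answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urlopen(f"{base_url}/v1/health/ready", timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as not ready yet.
        return False


def wait_until_ready(probe=is_ready, attempts=30, delay=2.0):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the loop testable without a running server; in normal use, `wait_until_ready()` with the defaults polls the local endpoint for about a minute.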

Check Service Liveness

Use the following command to check service liveness.

cURL Request

curl -X 'GET' 'http://0.0.0.0:8000/v1/health/live'

Response

{
  "object": "health.response",
  "message": "Service is live."
}

OpenAI Chat Completion

Use the following command to query the OpenAI-compatible chat completion endpoint.

cURL Request

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta/llama-3.2-11b-vision-instruct",
        "messages": [
            {
                "role":"user",
                "content": [
                    {
                        "type": "text",
                        "text": "What is in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url":
                            {
                                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                            }
                    }
                ]
            }
        ],
        "max_tokens": 256
    }'

Response

{
  "id": "chat-8c5f5115fa464ab593963d5764498350",
  "object": "chat.completion",
  "created": 1729020253,
  "model": "meta/llama-3.2-11b-vision-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a boardwalk in a field of tall grass. ..."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "total_tokens": 138,
    "completion_tokens": 121
  },
  "prompt_logprobs": null
}
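
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the cURL example: it builds the request body, posts it, and reads the assistant text from the first choice. All function names are hypothetical helpers, and the URL assumes the host and port shown above.

```python
import json
from urllib.request import Request, urlopen

CHAT_URL = "http://0.0.0.0:8000/v1/chat/completions"  # assumption: host and port from the cURL example


def build_chat_request(model, prompt, image_url, max_tokens=256):
    """Build the request body from the cURL example: one user turn with text plus an image URL."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def post_json(url, payload, timeout=60.0):
    """POST a JSON body and decode the JSON reply (hypothetical helper, no retries)."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def first_reply(completion):
    """Return the assistant text of the first choice in a chat.completion object."""
    return completion["choices"][0]["message"]["content"]


def usage_adds_up(completion):
    """Check that prompt and completion tokens sum to the reported total, as in the sample."""
    usage = completion["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
```

Against a running server, `first_reply(post_json(CHAT_URL, build_chat_request(model, prompt, image_url)))` would return the generated description. In the sample response, the usage counters are consistent: 17 prompt tokens plus 121 completion tokens give the reported total of 138.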

Llama Stack Chat Completion

Use the following command to query the Llama Stack-compatible chat completion endpoint.

cURL Request

curl -X 'POST' \
'http://0.0.0.0:8000/inference/chat_completion' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta/llama-3.2-11b-vision-instruct",
        "messages": [
            {
                "role":"user",
                "content": [
                    {
                        "image":
                            {
                                "uri": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                            }
                    },
                    "What is in this image?"
                ]
            }
        ]
    }'

Response

{
  "completion_message": {
    "role": "assistant",
    "content": "This image shows a boardwalk in a field of tall grass. ...",
    "stop_reason": "end_of_turn"
  },
  "logprobs": null
}
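
The Llama Stack request body differs from the OpenAI one: the image is passed as an `image` object with a `uri` field, the prompt is a bare string in the same `content` list, and the reply arrives under `completion_message`. The sketch below captures only what the example above shows; the helper names are hypothetical.

```python
def build_llama_stack_request(model, prompt, image_uri):
    """Build the body from the cURL example: an image entry followed by the text prompt."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"image": {"uri": image_uri}},
                    prompt,
                ],
            }
        ],
    }


def llama_stack_reply(response):
    """Return (content, stop_reason) from the "completion_message" of a response."""
    message = response["completion_message"]
    return message["content"], message["stop_reason"]
```

The resulting dictionary can be serialized with `json.dumps` and posted to `/inference/chat_completion` with the same headers as in the cURL command.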

Reference