Reference#

OpenAI API#

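You can download the complete API specification.

Warning

Every model has a maximum token length. The Models section lists the maximum token length for each supported model. See the truncate field in the reference for how sequences longer than the maximum token length are handled.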
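To make the effect of truncation concrete, the following is a conceptual sketch only, not the service's implementation. It assumes that "END" (the value used in the ranking example on this page) drops tokens from the end of an over-long sequence. The function name `apply_truncate` is hypothetical, and rejecting any other setting is an assumption made for illustration.

```python
def apply_truncate(token_ids, max_length, truncate="END"):
    """Illustrate how an over-long token sequence might be handled.

    Hypothetical helper: the real service tokenizes and truncates internally.
    """
    if len(token_ids) <= max_length:
        # Short enough: nothing to do.
        return list(token_ids)
    if truncate == "END":
        # Assumption: "END" keeps the first max_length tokens and drops the rest.
        return list(token_ids[:max_length])
    # Assumption: without truncation, an over-long input is rejected.
    raise ValueError(
        f"sequence has {len(token_ids)} tokens, maximum is {max_length}"
    )
```

With `max_length=3`, the sequence `[1, 2, 3, 4, 5]` would become `[1, 2, 3]` under this model.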

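Dynamic Batching#

Dynamic batching is a feature that allows the underlying Triton process in the NIM container to group one or more requests into a single batch. This can improve throughput under certain conditions, for example when serving many requests with small payloads. The feature is enabled by default and can be tuned by setting the NIM_TRITON_DYNAMIC_BATCHING_MAX_QUEUE_DELAY_MICROSECONDS environment variable. The default value is 100 microseconds.

For more information on dynamic batching, see the Triton User Guide.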
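The role of the queue delay can be illustrated with a deliberately simplified model. The sketch below is not Triton's scheduler: it assumes a batch is closed once the next request arrives more than `max_queue_delay_us` after the first request in the batch, or once the batch is full. The function name and the `max_batch_size` parameter are hypothetical.

```python
def group_into_batches(arrival_us, max_queue_delay_us=100, max_batch_size=8):
    """Group request arrival times (in microseconds) into batches.

    Simplified model: a request joins the open batch only if it arrives
    within max_queue_delay_us of the batch's first request and the batch
    still has room.
    """
    batches = []
    current = []
    for t in sorted(arrival_us):
        waited_too_long = current and t - current[0] > max_queue_delay_us
        batch_full = len(current) >= max_batch_size
        if waited_too_long or batch_full:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches
```

In this model, requests arriving at 0, 40, and 90 microseconds share one batch under the default 100 microsecond delay, while a delay of 0 places each of them in its own batch. A larger delay trades a little latency for larger batches.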

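API Examples#

Use the examples in this section to help you get started with the API.

The complete API specification can be found in the OpenAI specification.

List Models#

cURL Request

Use the following command to list the available models.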

curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/models" \
-H 'Accept: application/json'

Response

{
  "object": "list",
  "data": [
    {
      "id": "nvidia/nv-rerankqa-mistral-4b-v3"
    }
  ]
}
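The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client. It reads HOSTNAME and SERVICE_PORT from the environment as the cURL example does; the fallback values `localhost` and `8000` are assumptions, as are the helper names.

```python
import json
import os
import urllib.request


def base_url():
    # HOSTNAME and SERVICE_PORT mirror the variables in the cURL example;
    # the fallbacks are assumptions for a local deployment.
    host = os.environ.get("HOSTNAME", "localhost")
    port = os.environ.get("SERVICE_PORT", "8000")
    return f"http://{host}:{port}"


def extract_model_ids(body):
    """Return the model IDs from a parsed /v1/models response."""
    return [entry["id"] for entry in body.get("data", [])]


def list_models(timeout=10.0):
    """Query /v1/models and return the list of model IDs."""
    request = urllib.request.Request(
        f"{base_url()}/v1/models",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

Calling `list_models()` against a running service returns the values of the `id` fields shown in the response above.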

Generate Rankings#

cURL Request

curl -X "POST" \
  "http://${HOSTNAME}:${SERVICE_PORT}/v1/ranking" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "nvidia/nv-rerankqa-mistral-4b-v3",
  "query": {"text": "which way should i go?"},
  "passages": [
    {"text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"},
    {"text": "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,"},
    {"text": "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back."},
    {"text": "i shall be telling this with a sigh somewhere ages and ages hense: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference."}
  ],
  "truncate": "END"
}'

Response

{
  "rankings": [
    {
      "index": 0,
      "logit": 0.7646484375
    },
    {
      "index": 3,
      "logit": -1.1044921875
    },
    {
      "index": 2,
      "logit": -2.71875
    },
    {
      "index": 1,
      "logit": -5.09765625
    }
  ]
}
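Each entry in `rankings` refers to a passage by its position in the request, so a client typically maps the indices back to the passage text. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the payload shape is taken from the cURL request above.

```python
import json
import urllib.request


def build_ranking_payload(model, query, passages, truncate="END"):
    """Build a request body in the shape used by the cURL example."""
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": text} for text in passages],
        "truncate": truncate,
    }


def order_passages(passages, response):
    """Pair each passage with its logit, in the order the service returned."""
    return [
        (passages[entry["index"]], entry["logit"])
        for entry in response["rankings"]
    ]


def rank(url, payload, timeout=30.0):
    """POST the payload to the ranking endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Applied to the response above, `order_passages` yields the first passage (index 0) first, followed by the passages at indices 3, 2, and 1, each paired with its logit.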

Health Check#

cURL Request

Use the following commands to query the health endpoints.

curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/ready" \
-H 'Accept: application/json'
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/live" \
-H 'Accept: application/json'

Response

{
  "ready": true
}
{
  "live": true
}
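A common use of these endpoints is to wait for the service to become ready before sending ranking requests. The sketch below is one way to do that with the standard library. The retry count, delay, and helper names are assumptions, and a connection failure or malformed body is treated as "not healthy yet".

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body, key):
    """Return True only if the parsed response has `key` set to true."""
    return body.get(key) is True


def check_endpoint(url, key, timeout=5.0):
    """Query a health endpoint; any network or parse error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(json.load(response), key)
    except (urllib.error.URLError, OSError, ValueError):
        return False


def wait_until(check, attempts=30, delay_seconds=2.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

For example, `wait_until(lambda: check_endpoint(url, "ready"))`, with `url` pointing at `/v1/health/ready`, returns `True` as soon as the service reports `"ready": true`, and `False` if it never does within the allotted attempts.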
