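Using Reranking#

This section provides reranking examples and best practices, and describes the security issues you need to consider.

Examples#

Shell (cURL)#

Ranking#

Request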

curl -X 'POST' \
  'https://:8000/v1/ranking' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "nvidia/nv-rerankqa-mistral-4b-v3",
  "query": {"text": "which way should i go?"},
  "passages": [
    {"text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"},
    {"text": "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,"},
    {"text": "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back."},
    {"text": "i shall be telling this with a sigh somewhere ages and ages hense: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference."}
  ],
  "truncate": "END"
}'

Response

{
  "rankings": [
    { "index": 0, "logit": -1.2421875 },
    { "index": 3, "logit": -3.029296875 },
    { "index": 2, "logit": -5.41015625 },
    { "index": 1, "logit": -8.2421875 }
  ]
}

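Best Practices#

A request to the Text Reranking NIM consists of a query, a list of passages, and an optional truncate parameter (NONE or END; the default is NONE). The NIM then reranks the passages by relevance to the query. Although many data stores return a score with each passage, the Text Reranking NIM does not use those scores. It uses only the text of the query and the candidate passages, and ranks the passages according to the model's understanding of their content.

If truncate is NONE, the container returns an error for any input whose tokenized representation exceeds the token limit of the underlying model. If truncate is END, all tokens beyond the token limit are ignored (see below).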
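The request shape described above can be sketched in Python. This is a minimal illustration, not an official client: the helper names `build_rerank_payload` and `rerank` are hypothetical, and `base_url` is an assumption that you must replace with the address of your own deployment. Only the `/v1/ranking` path, the field names, and the two truncate values come from the example request in this section.

```python
import json
import urllib.request

# The two truncate values named in this section; NONE is the documented default.
VALID_TRUNCATE = ("NONE", "END")


def build_rerank_payload(model, query, passages, truncate="NONE"):
    """Build the JSON body for a ranking request (hypothetical helper)."""
    if truncate not in VALID_TRUNCATE:
        raise ValueError(f"truncate must be one of {VALID_TRUNCATE}, got {truncate!r}")
    return {
        "model": model,
        "query": {"text": query},
        # Only the passage text is sent; scores from a data store are not used.
        "passages": [{"text": text} for text in passages],
        "truncate": truncate,
    }


def rerank(base_url, payload, timeout=30.0):
    """POST the payload to <base_url>/v1/ranking and return the parsed JSON.

    base_url is an assumption: supply the scheme, host, and port of your
    own deployment.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/ranking",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Validating truncate on the client side surfaces a typo before the request is sent, rather than relying on the service to reject it.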

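Token Limits and Truncation#

The Text Reranking NIM API accepts more than 9,000 characters of text for the query and passages, but this is far above the current model limit. The token limit is a function of the underlying model. For NV-Rerank-QA-Mistral-4B, the total token limit is 503, including the query. Therefore, if your query is 200 tokens and your passage is 400 tokens, the rightmost 97 tokens are truncated (200 + 400 - 503 = 97).

Note that this means that if your query is 503 tokens and truncate is END, the entire passage is truncated, which makes the reranking service useless.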
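The truncation arithmetic above can be written out as a small sketch. The function names are hypothetical, and the code assumes you already have token counts from the model's own tokenizer (character counts are not a substitute). It also assumes the limit applies to the plain sum of query and passage tokens; whether any special tokens count against the limit is not stated here.

```python
def truncated_tokens(query_tokens, passage_tokens, token_limit=503):
    """Tokens dropped from the right of a passage when truncate is END.

    token_limit defaults to the 503 quoted for NV-Rerank-QA-Mistral-4B.
    """
    overflow = query_tokens + passage_tokens - token_limit
    # Never report a negative count, and never more than the passage holds.
    return min(max(overflow, 0), passage_tokens)


def passage_budget(query_tokens, token_limit=503):
    """Tokens left for each passage once the query has been accounted for."""
    return max(token_limit - query_tokens, 0)


print(truncated_tokens(200, 400))  # 97, the example above
print(truncated_tokens(503, 400))  # 400, the whole passage is lost
print(passage_budget(200))         # 303
```

Checking `passage_budget` before sending a request is one way to notice that a long query leaves little or no room for passage text.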

Maximum Number of Passages#

You can pass at most 512 passages in a single reranking call.
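If you have more candidates than a single call accepts, one option is to split them into batches and remap the returned indexes. The sketch below is an illustration only: `chunk_passages` and `merge_rankings` are hypothetical helpers, and merging by logit rests on the assumption that logits from separate calls with the same query are directly comparable, which this page does not confirm.

```python
def chunk_passages(passages, max_per_call=512):
    """Yield (offset, chunk) pairs with at most max_per_call passages each."""
    if max_per_call < 1:
        raise ValueError("max_per_call must be at least 1")
    for offset in range(0, len(passages), max_per_call):
        yield offset, passages[offset:offset + max_per_call]


def merge_rankings(chunk_results):
    """Combine per-chunk rankings into one list sorted by logit, descending.

    chunk_results is an iterable of (offset, rankings) pairs, where rankings
    is the "rankings" list returned for that chunk. Each index is shifted by
    the chunk offset so that it refers to the original, unsplit list.
    Assumption: logits from different calls are comparable.
    """
    merged = [
        {"index": offset + item["index"], "logit": item["logit"]}
        for offset, rankings in chunk_results
        for item in rankings
    ]
    return sorted(merged, key=lambda item: item["logit"], reverse=True)
```

In practice it is usually cheaper to narrow the candidate set in the retrieval step so that one call is enough.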

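Understanding the Results#

The result of a reranking request is a list of objects, each with an index key and a logit key. The objects are sorted by logit in descending order. The logit is the raw, unnormalized prediction that the model produces for each query/passage pair.

The index refers to the position of the passage in the request list, counting from 0. For example, if the request list contains the passages ["bears", "house", "grass"] and the indexes in the response are 1, 2, 0, the response indicates that the ranked order of the passages is ["house", "grass", "bears"].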
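The index mapping described above takes one line of Python. The helper name `order_passages` is hypothetical; the "index" key matches the response format shown in this section, and the logit values in the example are made up for illustration.

```python
def order_passages(passages, rankings):
    """Return the request passages in the order given by the rankings list."""
    return [passages[item["index"]] for item in rankings]


passages = ["bears", "house", "grass"]
rankings = [
    {"index": 1, "logit": -0.5},  # illustrative logits, not real model output
    {"index": 2, "logit": -2.0},
    {"index": 0, "logit": -6.0},
]
print(order_passages(passages, rankings))  # ['house', 'grass', 'bears']
```

Because the response carries only indexes and logits, keep the original passage list available on the client until the response has been processed.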

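Security and Authentication#

As a developer, you are responsible for securing access to any application that uses the NeMo ecosystem, including the authentication layer between users and your application, and the communication between services within your application.

Rate Limiting#

The Text Reranking NIM does not impose rate limits. If you want to limit access to your application, you are responsible for implementing a policy.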
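As one illustration of such a policy, the sketch below implements a simple sliding-window limiter that an application could consult before forwarding a request. It is not part of the NIM: the class name `SlidingWindowLimiter` is hypothetical, and the code is single-process and not thread-safe. Production deployments often enforce limits at a proxy or gateway instead.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period_seconds."""

    def __init__(self, max_calls, period_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self.clock = clock    # injectable so the limiter can be tested
        self.calls = deque()  # timestamps of recently allowed calls

    def allow(self):
        """Return True and record the call if it fits in the current window."""
        now = self.clock()
        # Discard timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

A caller would check `limiter.allow()` before each reranking request and reject or delay the request when it returns False.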

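Ports#

The Text Retriever NIM uses several ports, but only the API port, 8000, needs to be reachable from outside the cluster. The service ports are set at startup of the Text Embedding NIM and the Text Reranking NIM.

Other Security Reminders#

As a developer, you must secure your own API endpoints. We recommend using a proxy together with HTTPS/TLS 1.2.

Incident Response#

Secrets#

If you deploy Text Retriever NIM components with Helm charts, follow the instructions in the Creating Secrets section.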