Classification Extension

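This document describes Triton's classification extension. The classification extension allows Triton to return an output as a classification index and (optional) label instead of returning the output as raw tensor data. Because this extension is supported, Triton reports "classification" in the extensions field of its server metadata.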

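An inference request can use the "classification" parameter to request that classifications be returned for one or more outputs. For such an output, the returned tensor does not have the shape and type produced by the model. Instead it has type BYTES and shape [ batch-size, <count> ], where each element returns the classification index and label as a single string. The <count> dimension of the returned tensor equals the count given as the value of the "classification" parameter.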

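When the classification parameter is used, Triton determines the top-n classifications as the n highest-valued elements of the output tensor, compared according to the output tensor's data type. For example, if the output tensor is [ 1, 5, 10, 4 ], the highest-valued element is 10 (index 2), followed by 5 (index 1), then 4 (index 3), then 1 (index 0). So, for example, the top-2 classifications, expressed as indices, are [ 2, 1 ].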
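
The selection rule above can be sketched in a few lines of Python. This is an illustration only, not Triton's implementation; the function name `top_n_indices` is hypothetical, and the tie-breaking behavior shown (lower index first) is an assumption, since this document does not say how equal values are ordered.

```python
def top_n_indices(values, n):
    """Return the indices of the n largest values, highest value first."""
    # sorted() is stable, so equal values keep their original index order.
    # That tie-breaking rule is an assumption made for this sketch.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:n]


print(top_n_indices([1, 5, 10, 4], 2))  # [2, 1]
```

Asking for all four classifications from the same tensor yields the full ordering described in the text: index 2, then 1, then 3, then 0.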

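The format of the returned string is "<value>:<index>[:<label>]", where <index> is the index of the class in the model output tensor, <value> is the value associated with that index in the model output, and the <label> associated with that index is optional. For example, continuing the example from above, the returned tensor will be [ "10:2", "5:1" ]. If the model has labels associated with those indices, the returned tensor will be [ "10:2:apple", "5:1:pickle" ].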
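
To make the string layout concrete, here is a minimal sketch that builds the entries for the example above. The helper `format_classifications` is hypothetical, and the labels for indices 0 and 3 are invented placeholders. Numeric values are rendered with Python's `str()`; how Triton itself formats numbers (for example, the number of decimal places) is not stated in this document, so real server output may differ in that respect.

```python
def format_classifications(values, indices, labels=None):
    """Build "<value>:<index>[:<label>]" strings for the given indices."""
    entries = []
    for i in indices:
        entry = f"{values[i]}:{i}"
        if labels is not None:
            # The label is appended only when the model provides one.
            entry += f":{labels[i]}"
        entries.append(entry)
    return entries


values = [1, 5, 10, 4]
# Labels for indices 1 and 2 come from the example; the others are placeholders.
labels = ["label_0", "pickle", "apple", "label_3"]

print(format_classifications(values, [2, 1]))          # ['10:2', '5:1']
print(format_classifications(values, [2, 1], labels))  # ['10:2:apple', '5:1:pickle']
```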

HTTP/REST

In all JSON schemas shown in this document, $number, $string, $boolean, $object and $array refer to the fundamental JSON types. #optional indicates an optional JSON field.

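The classification extension requires that Triton recognize the "classification" parameter when it is applied to a requested inference output, as follows:

"classification" : $number indicating the number of classes that should be returned for the output.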

The following example shows how the classification parameter is used in an inference request.

POST /v2/models/mymodel/infer HTTP/1.1
Host: localhost:8000
Content-Type: application/json
Content-Length: <xx>
{
  "id" : "42",
  "inputs" : [
    {
      "name" : "input0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ],
  "outputs" : [
    {
      "name" : "output0",
      "parameters" : { "classification" : 2 }
    }
  ]
}
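
As a rough client-side sketch, the same request body can be assembled with Python's standard library. The helper names `build_classification_request` and `post_infer` are hypothetical, and the URL simply mirrors the host and model name from the example above. `post_infer` is defined but not called here, since it needs a running server.

```python
import json
import urllib.request


def build_classification_request(input_name, shape, datatype, data,
                                 output_name, count, request_id=None):
    """Assemble an inference request body asking for `count` classifications."""
    body = {
        "inputs": [
            {"name": input_name, "shape": shape, "datatype": datatype, "data": data}
        ],
        "outputs": [
            # The "classification" parameter is attached to the requested output.
            {"name": output_name, "parameters": {"classification": count}}
        ],
    }
    if request_id is not None:
        body["id"] = request_id
    return body


def post_infer(url, body):
    """POST the body as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_classification_request(
    "input0", [2, 2], "UINT32", [1, 2, 3, 4], "output0", 2, request_id="42"
)
print(json.dumps(body, indent=2))
# With a server available, assuming the host and model from the example:
# post_infer("http://localhost:8000/v2/models/mymodel/infer", body)
```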

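For the above request, Triton will return the "output0" output tensor as a STRING tensor with shape [ 2 ]. Assuming the model produces the output0 tensor [ 1.1, 3.3, 0.5, 2.4 ] from the above inputs, the response will be the following.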

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: <yy>
{
  "id" : "42",
  "outputs" : [
    {
      "name" : "output0",
      "shape" : [ 2 ],
      "datatype"  : "STRING",
      "data" : [ "3.3:1", "2.4:3" ]
    }
  ]
}

If the model has labels associated with each classification index, Triton will return those as well, as shown below.

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: <yy>
{
  "id" : "42",
  "outputs" : [
    {
      "name" : "output0",
      "shape" : [ 2 ],
      "datatype"  : "STRING",
      "data" : [ "3.3:1:index_1_label", "2.4:3:index_3_label" ]
    }
  ]
}
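
On the client side, each returned element has to be split back into its parts. The following is a minimal sketch under a few assumptions: the response arrives as JSON with the strings in the "data" field (as in the examples above), values are parsed as floats, and the helper names `parse_classification` and `read_classifications` are hypothetical. Splitting at most twice keeps any ":" characters inside a label intact.

```python
import json


def parse_classification(entry):
    """Split "<value>:<index>[:<label>]" into (value, index, label or None)."""
    # maxsplit=2 so that a label containing ":" is not broken apart.
    value, index, *label = entry.split(":", 2)
    return float(value), int(index), (label[0] if label else None)


def read_classifications(response_text, output_name):
    """Return the parsed classifications for one named output."""
    response = json.loads(response_text)
    for output in response["outputs"]:
        if output["name"] == output_name:
            return [parse_classification(entry) for entry in output["data"]]
    raise KeyError(output_name)


response_text = """
{
  "id": "42",
  "outputs": [
    {
      "name": "output0",
      "shape": [2],
      "datatype": "STRING",
      "data": ["3.3:1:index_1_label", "2.4:3:index_3_label"]
    }
  ]
}
"""

print(read_classifications(response_text, "output0"))
# [(3.3, 1, 'index_1_label'), (2.4, 3, 'index_3_label')]
```

When the model has no labels, the third element of each tuple is None, so callers can treat both response forms uniformly.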

GRPC

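The classification extension requires that Triton recognize the "classification" parameter when it is applied to a requested inference output, as follows:

"classification" : int64_param indicating the number of classes that should be returned for the output.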

The following example shows how the classification parameter is used in an inference request.

ModelInferRequest {
  model_name : "mymodel"
  model_version : -1
  inputs [
    {
      name : "input0"
      shape : [ 2, 2 ]
      datatype : "UINT32"
      contents { int_contents : [ 1, 2, 3, 4 ] }
    }
  ]
  outputs [
    {
      name : "output0"
      parameters [
        {
          key : "classification"
          value : { int64_param : 2 }
        }
      ]
    }
  ]
}

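For the above request, Triton will return the "output0" output tensor as a STRING tensor with shape [ 2 ]. Assuming the model produces the output0 tensor [ 1.1, 3.3, 0.5, 2.4 ] from the above inputs, the response will be the following.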

ModelInferResponse {
  model_name : "mymodel"
  outputs [
    {
      name : "output0"
      shape : [ 2 ]
      datatype  : "STRING"
      contents { bytes_contents : [ "3.3:1", "2.4:3" ] }
    }
  ]
}