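Model Repository Extension#

This document describes Triton's model repository extension. The extension lets a client query and control the one or more model repositories that Triton is serving. Because Triton supports this extension, it reports "model_repository" in the extensions field of its server metadata. The extension has one optional component, described below, that allows the unload API to specify the "unload_dependents" parameter. Versions of Triton that support this optional component also report "model_repository(unload_dependents)" in the extensions field of the server metadata.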

HTTP/REST#

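In all JSON schemas shown in this document, $number, $string, $boolean, $object and $array refer to the fundamental JSON types. #optional marks an optional JSON field.

The model repository extension requires the Index, Load and Unload APIs. Triton exposes the endpoints at the following URLs.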

POST v2/repository/index

POST v2/repository/models/${MODEL_NAME}/load

POST v2/repository/models/${MODEL_NAME}/unload

Index#

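The Index API returns information about every model available in a model repository, even if that model is not currently loaded into Triton. It provides a way to determine which models could be loaded with the Load API. A model repository index request is made with an HTTP POST to the index endpoint. In the corresponding response, the HTTP body contains the JSON response.

The index request object, identified as $repository_index_request, is required in the HTTP body of the POST request.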

$repository_index_request =
{
  "ready" : $boolean #optional,
}

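"ready": optional, defaults to false. If true, only models that are ready for inferencing are returned.

A successful index request is indicated by a 200 HTTP status code. The response object, identified as $repository_index_response, is returned in the HTTP body of every successful request.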

$repository_index_response =
[
  {
    "name" : $string,
    "version" : $string #optional,
    "state" : $string,
    "reason" : $string
  },
  …
]

"name": the name of the model.
"version": the version of the model.
"state": the state of the model.
"reason": the reason, if any, that the model is in its current state.
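
As a rough illustration of the request and response shapes above, the following Python sketch builds an index request and summarizes a decoded response. The helper names (`build_index_request`, `summarize_index`), the base URL `http://localhost:8000`, and the sample response with its state strings are assumptions made for this example; they are not defined by the extension.

```python
import json
import urllib.request


def build_index_request(base_url, ready=False):
    """Return a urllib Request for POST v2/repository/index (not yet sent)."""
    body = json.dumps({"ready": ready}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v2/repository/index",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_index(response_text):
    """Turn a $repository_index_response body into (name, version, state) tuples."""
    entries = json.loads(response_text)
    # "version" is optional in the schema, so fall back to None when absent.
    return [(e["name"], e.get("version"), e["state"]) for e in entries]


# Illustrative response only; the state strings are placeholders.
sample = (
    '[{"name": "mymodel", "version": "1", "state": "READY", "reason": ""},'
    ' {"name": "other", "state": "UNAVAILABLE", "reason": "unloaded"}]'
)
request = build_index_request("http://localhost:8000", ready=True)
summary = summarize_index(sample)
```

To send the request against a running server, one could pass `request` to `urllib.request.urlopen` and feed the decoded body to `summarize_index`.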

A failed index request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the $repository_index_error_response object.

$repository_index_error_response =
{
  "error": $string
}

"error": the descriptive message for the error.

Load#

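The Load API requests that a model be loaded into Triton, or reloaded if it is already loaded. A load request is made with an HTTP POST to a load endpoint. The HTTP body may be empty or may contain the load request object, identified as $repository_load_request. A successful load request is indicated by a 200 HTTP status.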

$repository_load_request =
{
  "parameters" : $parameters #optional
}

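"parameters": an object containing zero or more parameters for this request, expressed as key/value pairs. See Parameters for more information.

The Load API accepts the following parameters:

"config": a string parameter containing a JSON representation of the model configuration, which must be parseable into the ModelConfig message from model_config.proto. This configuration is used to load the model instead of the one in the model directory. If config is provided, the (re-)load is triggered as though the model metadata had been updated, and the same (re-)load behavior applies.
"file:<version>/<file-name>": the serialized model file, base64 encoded. This convention is used to specify an override model directory from which the model is loaded. For example, to specify a model directory that contains an ONNX model as version 2, the user specifies the parameter as "file:2/model.onnx" : "<base64-encoded-file-content>". Note that the "config" parameter must also be provided, to serve as the model configuration of the override model directory.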

A failed load request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the $repository_load_error_response object.

$repository_load_error_response =
{
  "error": $string
}

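"error": the descriptive message for the error.

Examples#

For the following request, Triton will load the model "mymodel" using the provided model configuration and model file.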

POST /v2/repository/models/mymodel/load HTTP/1.1
Host: localhost:8000
{
  "parameters": {
    "config": "{
      "name": "mymodel",
      "backend": "onnxruntime",
      "inputs": [{
          "name": "INPUT0",
          "datatype": "FP32",
          "shape": [ 1 ]
        }
      ],
      "outputs": [{
          "name": "OUTPUT0",
          "datatype": "FP32",
          "shape": [ 1 ]
        }
      ]
    }",

    "file:1/model.onnx" : "<base64-encoded-file-content>"
  }
}
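
The request above is laid out for readability. On the wire, the "config" value has to be a single JSON string (with its inner quotes escaped) and the file content has to be base64 text. A minimal Python sketch of producing such a body follows; `build_load_body`, the abbreviated configuration, and the file bytes are placeholders invented for this illustration.

```python
import base64
import json


def build_load_body(config, files):
    """Build a $repository_load_request body.

    config: dict holding the model configuration (serialized to a JSON string).
    files:  dict mapping "<version>/<file-name>" to raw file bytes.
    """
    # The configuration travels as a string parameter, so it is serialized
    # once here and escaped again when the whole body is serialized below.
    parameters = {"config": json.dumps(config)}
    for path, content in files.items():
        # HTTP carries file content as base64 text under a "file:" key.
        parameters[f"file:{path}"] = base64.b64encode(content).decode("ascii")
    return json.dumps({"parameters": parameters})


# Placeholder configuration and file bytes, for illustration only.
body = build_load_body(
    {"name": "mymodel", "backend": "onnxruntime"},
    {"1/model.onnx": b"\x08\x07fake-onnx-bytes"},
)
decoded = json.loads(body)
```

Serializing the configuration separately from the outer object keeps the escaping correct without any manual string handling.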

Unload#

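The Unload API requests that a model be unloaded from Triton. An unload request is made with an HTTP POST to an unload endpoint. The HTTP body may be empty or may contain the unload request object, identified as $repository_unload_request. A successful unload request is indicated by a 200 HTTP status.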

$repository_unload_request =
{
  "parameters" : $parameters #optional
}

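"parameters": an object containing zero or more parameters for this request, expressed as key/value pairs. See Parameters for more information.

The Unload API accepts the following parameters:

"unload_dependents": a boolean parameter indicating that, in addition to unloading the requested model, any dependent model that was loaded along with the requested model should also be unloaded. For example, a request to unload the models that compose an ensemble will also unload the ensemble model.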
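
The unload request can be sketched in the same way. In the Python below, `unload_url` and `build_unload_body` are hypothetical helper names and `http://localhost:8000` is an assumed server address; the sketch only assembles the URL and body and does not contact a server.

```python
import json


def unload_url(base_url, model_name):
    """URL of the unload endpoint for one model."""
    return f"{base_url.rstrip('/')}/v2/repository/models/{model_name}/unload"


def build_unload_body(unload_dependents=False):
    """Build a $repository_unload_request body; empty when no parameter is set."""
    if not unload_dependents:
        return ""  # the HTTP body may be empty
    return json.dumps({"parameters": {"unload_dependents": True}})


url = unload_url("http://localhost:8000", "mymodel")
body = build_unload_body(unload_dependents=True)
```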

A failed unload request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the $repository_unload_error_response object.

$repository_unload_error_response =
{
  "error": $string
}

"error": the descriptive message for the error.

GRPC#

The model repository extension requires the following API:

service GRPCInferenceService
{
  …

  // Get the index of model repository contents.
  rpc RepositoryIndex(RepositoryIndexRequest)
          returns (RepositoryIndexResponse) {}

  // Load or reload a model from a repository.
  rpc RepositoryModelLoad(RepositoryModelLoadRequest)
          returns (RepositoryModelLoadResponse) {}

  // Unload a model.
  rpc RepositoryModelUnload(RepositoryModelUnloadRequest)
          returns (RepositoryModelUnloadResponse) {}
}

message ModelRepositoryParameter
{
  // The parameter value can be a string, an int64, a boolean
  // or a message specific to a predefined parameter.
  oneof parameter_choice
  {
    // A boolean parameter value.
    bool bool_param = 1;

    // An int64 parameter value.
    int64 int64_param = 2;

    // A string parameter value.
    string string_param = 3;

    // A bytes parameter value.
    bytes bytes_param = 4;
  }
}
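
Because parameter_choice is a oneof, each parameter carries exactly one of the four fields. The Python sketch below shows one way a client might choose the field from the type of a value. `parameter_field` is a hypothetical helper written for this illustration; it is not part of Triton or of any generated client code.

```python
def parameter_field(value):
    """Pick the ModelRepositoryParameter oneof field that fits a Python value."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool_param"
    if isinstance(value, int):
        return "int64_param"
    if isinstance(value, str):
        return "string_param"
    if isinstance(value, (bytes, bytearray)):
        return "bytes_param"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


# Field chosen for each parameter described in this document.
fields = {
    "unload_dependents": parameter_field(True),
    "config": parameter_field('{"name": "mymodel"}'),
    "file:1/model.onnx": parameter_field(b"raw-bytes"),
}
```

With message classes generated from the protobuf definition, the chosen field name would then be set on a ModelRepositoryParameter instance before it is placed in the request's parameters map.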

Index#

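The RepositoryIndex API returns information about every model available in a model repository, even if that model is not currently loaded into Triton. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure. The request and response messages for RepositoryIndex are: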

message RepositoryIndexRequest
{
  // The name of the repository. If empty the index is returned
  // for all repositories.
  string repository_name = 1;

  // If true return only models currently ready for inferencing.
  bool ready = 2;
}

message RepositoryIndexResponse
{
  // Index entry for a model.
  message ModelIndex {
    // The name of the model.
    string name = 1;

    // The version of the model.
    string version = 2;

    // The state of the model.
    string state = 3;

    // The reason, if any, that the model is in the given state.
    string reason = 4;
  }

  // An index entry for each model.
  repeated ModelIndex models = 1;
}

Load#

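The RepositoryModelLoad API requests that a model be loaded into Triton, or reloaded if it is already loaded. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure. The request and response messages for RepositoryModelLoad are: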

message RepositoryModelLoadRequest
{
  // The name of the repository to load from. If empty the model
  // is loaded from any repository.
  string repository_name = 1;

  // The name of the model to load, or reload.
  string model_name = 2;

  // Optional parameters.
  map<string, ModelRepositoryParameter> parameters = 3;
}

message RepositoryModelLoadResponse
{
}

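The RepositoryModelLoad API accepts the following parameters:

"config": a string parameter containing a JSON representation of the model configuration, which must be parseable into the ModelConfig message from model_config.proto. This configuration is used to load the model instead of the one in the model directory. If config is provided, the (re-)load is triggered as though the model metadata had been updated, and the same (re-)load behavior applies.
"file:<version>/<file-name>": a bytes parameter containing the model file content. This convention is used to specify an override model directory from which the model is loaded. For example, to specify a model directory that contains an ONNX model as version 2, the user specifies the parameter as "file:2/model.onnx" : "<file-content>". Note that the "config" parameter must also be provided, to serve as the model configuration of the override model directory.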
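
The two protocols differ in how file content is carried: HTTP/REST uses base64 text inside the JSON body, whereas GRPC uses a bytes parameter holding the content directly. The Python sketch below contrasts the two using plain dictionaries. `load_parameters`, the abbreviated configuration, and the file content are placeholders assumed for the example.

```python
import base64
import json


def load_parameters(config, files, transport):
    """Return the Load parameters for "http" or "grpc" as a plain dict."""
    if transport not in ("http", "grpc"):
        raise ValueError("transport must be 'http' or 'grpc'")
    params = {"config": json.dumps(config)}  # a string parameter in both cases
    for path, content in files.items():
        if transport == "http":
            # HTTP/REST: base64 text inside the JSON body.
            params[f"file:{path}"] = base64.b64encode(content).decode("ascii")
        else:
            # GRPC: raw bytes, destined for the bytes_param field.
            params[f"file:{path}"] = bytes(content)
    return params


config = {"name": "mymodel", "backend": "onnxruntime"}  # placeholder config
files = {"2/model.onnx": b"not-a-real-model"}  # placeholder content
http_params = load_parameters(config, files, "http")
grpc_params = load_parameters(config, files, "grpc")
```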

Unload#

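The RepositoryModelUnload API requests that a model be unloaded from Triton. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure. The request and response messages for RepositoryModelUnload are: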

message RepositoryModelUnloadRequest
{
  // The name of the repository from which the model was originally
  // loaded. If empty the repository is not considered.
  string repository_name = 1;

  // The name of the model to unload.
  string model_name = 2;

  // Optional parameters.
  map<string, ModelRepositoryParameter> parameters = 3;
}

message RepositoryModelUnloadResponse
{
}

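The RepositoryModelUnload API accepts the following parameters:

"unload_dependents": a boolean parameter indicating that, in addition to unloading the requested model, any dependent model that was loaded along with the requested model should also be unloaded. For example, a request to unload the models that compose an ensemble will also unload the ensemble model.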