BLS Example

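This section walks through an end-to-end example of BLS in the Python backend. The model repository should contain the pytorch and add_sub models. Each of these models computes the sum and the difference of INPUT0 and INPUT1 and places the results in OUTPUT0 and OUTPUT1, respectively. The example has two parts: the first shows how to execute synchronous BLS requests, and the second shows how to execute asynchronous BLS requests.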
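As a rough illustration of what both models compute, the following sketch uses plain Python lists in place of tensors. The function name `add_sub` and the sample values are illustrative assumptions; the actual models operate on Triton tensors.

```python
def add_sub(input0, input1):
    """Return (OUTPUT0, OUTPUT1): the element-wise sum and difference."""
    if len(input0) != len(input1):
        raise ValueError("INPUT0 and INPUT1 must have the same length")
    output0 = [a + b for a, b in zip(input0, input1)]  # OUTPUT0 = INPUT0 + INPUT1
    output1 = [a - b for a, b in zip(input0, input1)]  # OUTPUT1 = INPUT0 - INPUT1
    return output0, output1


# Illustrative values, not taken from the example client.
output0, output1 = add_sub([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
print("OUTPUT0:", output0)
print("OUTPUT1:", output1)
```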

Synchronous BLS Requests

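The synchronous BLS model has the same goal as the pytorch and add_sub models, except that it does not compute the sum and difference itself. Instead, it passes the input tensors to either the pytorch or the add_sub model and returns that model's response as the final response. An additional input, MODEL_NAME, determines which model is used to compute the final outputs.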
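The forwarding logic described above might look roughly like the sketch below. It is not the shipped sync_model.py; it assumes the `pb_utils` calls documented for the Python backend (`InferenceRequest`, `exec`, `has_error`, `InferenceResponse`). The guarded import is an addition made here only so that the file can be loaded outside Triton, where the module does not exist.

```python
try:
    # Provided by the Triton Python backend at runtime; absent elsewhere.
    import triton_python_backend_utils as pb_utils
except ImportError:
    pb_utils = None


class TritonPythonModel:
    """Sketch of a synchronous BLS model that forwards to another model."""

    def execute(self, requests):
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            in_1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")

            # MODEL_NAME is a string tensor; decode its first element if it
            # arrives as bytes.
            raw_name = pb_utils.get_input_tensor_by_name(
                request, "MODEL_NAME"
            ).as_numpy()[0]
            if isinstance(raw_name, bytes):
                target_model = raw_name.decode("utf-8")
            else:
                target_model = str(raw_name)

            infer_request = pb_utils.InferenceRequest(
                model_name=target_model,
                requested_output_names=["OUTPUT0", "OUTPUT1"],
                inputs=[in_0, in_1],
            )

            # Blocking call: returns once the target model has responded.
            infer_response = infer_request.exec()
            if infer_response.has_error():
                raise pb_utils.TritonModelException(
                    infer_response.error().message()
                )

            # The target model already names its outputs OUTPUT0 and OUTPUT1,
            # so they can be passed through unchanged.
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=infer_response.output_tensors()
                )
            )
        return responses
```

Raising `TritonModelException` when the nested request fails is what surfaces an error to the client when MODEL_NAME names a model that is not loaded.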

Create the model repository

mkdir -p models/add_sub/1
mkdir -p models/bls_sync/1
mkdir -p models/pytorch/1

# Copy the Python models
cp examples/add_sub/model.py models/add_sub/1/
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
cp examples/bls/sync_model.py models/bls_sync/1/model.py
cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
cp examples/pytorch/model.py models/pytorch/1/
cp examples/pytorch/config.pbtxt models/pytorch/

Start tritonserver

tritonserver --model-repository `pwd`/models

Send inference requests to the server

python3 examples/bls/sync_client.py

You should see output similar to the following:

=========='add_sub' model result==========
INPUT0 ([0.34984654 0.6808792  0.6509772  0.6211422 ]) + INPUT1 ([0.37917137 0.9080451  0.60789365 0.33425143]) = OUTPUT0 ([0.7290179 1.5889243 1.2588708 0.9553937])
INPUT0 ([0.34984654 0.6808792  0.6509772  0.6211422 ]) - INPUT1 ([0.37917137 0.9080451  0.60789365 0.33425143]) = OUTPUT1 ([-0.02932483 -0.22716594  0.04308355  0.28689077])


=========='pytorch' model result==========
INPUT0 ([0.34984654 0.6808792  0.6509772  0.6211422 ]) + INPUT1 ([0.37917137 0.9080451  0.60789365 0.33425143]) = OUTPUT0 ([0.7290179 1.5889243 1.2588708 0.9553937])
INPUT0 ([0.34984654 0.6808792  0.6509772  0.6211422 ]) - INPUT1 ([0.37917137 0.9080451  0.60789365 0.33425143]) = OUTPUT1 ([-0.02932483 -0.22716594  0.04308355  0.28689077])


=========='undefined' model result==========
Failed to process the request(s) for model instance 'bls_0', message: TritonModelException: Failed for execute the inference request. Model 'undefined_model' is not ready.

At:
  /tmp/python_backend/models/bls/1/model.py(110): execute

The sync_model.py model file is heavily commented, with an explanation of each function call.

Explanation of the Client Output

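sync_client.py sends three inference requests to the "bls_sync" model, each with a different value for the "MODEL_NAME" input. As explained earlier, "MODEL_NAME" determines which model "bls_sync" uses to compute the final outputs. The first request uses the "add_sub" model and the second uses the "pytorch" model. The third request uses an incorrect model name to demonstrate error handling during inference request execution.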
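To make the three requests concrete, the sketch below builds their payloads in the KServe v2 JSON format that Triton accepts over HTTP at `POST /v2/models/bls_sync/infer`. It only constructs the bodies and does not contact a server. The helper name `build_infer_body`, the FP32 datatype, the tensor shapes, and the sample values are assumptions not stated in this section, so check them against sync_config.pbtxt. The shipped client may build its requests differently.

```python
import json


def build_infer_body(model_name, input0, input1):
    """Build a v2 inference request body for the bls_sync model (hypothetical helper)."""
    return {
        "inputs": [
            # Datatype and shapes are assumed, not taken from the config file.
            {"name": "INPUT0", "shape": [len(input0)], "datatype": "FP32",
             "data": list(input0)},
            {"name": "INPUT1", "shape": [len(input1)], "datatype": "FP32",
             "data": list(input1)},
            # String tensors use the BYTES datatype in the v2 protocol.
            {"name": "MODEL_NAME", "shape": [1], "datatype": "BYTES",
             "data": [model_name]},
        ],
        "outputs": [{"name": "OUTPUT0"}, {"name": "OUTPUT1"}],
    }


input0 = [0.1, 0.2, 0.3, 0.4]  # illustrative values
input1 = [0.5, 0.6, 0.7, 0.8]

# Two valid targets, then a name that is not in the model repository.
bodies = [
    build_infer_body(name, input0, input1)
    for name in ("add_sub", "pytorch", "undefined_model")
]
for body in bodies:
    print(json.dumps(body))
```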

Asynchronous BLS Requests

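This section explains how to send multiple BLS requests without waiting for each response. Executing BLS requests asynchronously does not block your model's execution and can yield speedups under certain conditions, for instance when the downstream requests can run concurrently.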

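The bls_async model issues two asynchronous BLS requests, one to the pytorch model and one to the add_sub model, and then waits until both inference requests have completed. It extracts OUTPUT0 from the pytorch response and OUTPUT1 from the add_sub response, and uses these two tensors to construct the final inference response object.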
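A minimal sketch of this fan-out pattern follows. It is not the shipped async_model.py; it assumes the `async_exec` method on `pb_utils.InferenceRequest` together with `asyncio.gather`. As before, the guarded import is added here only so that the file loads outside Triton.

```python
import asyncio

try:
    # Provided by the Triton Python backend at runtime; absent elsewhere.
    import triton_python_backend_utils as pb_utils
except ImportError:
    pb_utils = None


class TritonPythonModel:
    """Sketch of an asynchronous BLS model that queries two models at once."""

    async def execute(self, requests):
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            in_1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")

            # Start both nested requests without waiting for either one.
            pending = []
            for model_name in ("pytorch", "add_sub"):
                infer_request = pb_utils.InferenceRequest(
                    model_name=model_name,
                    requested_output_names=["OUTPUT0", "OUTPUT1"],
                    inputs=[in_0, in_1],
                )
                pending.append(infer_request.async_exec())

            # Wait for both; results come back in the order they were issued.
            pytorch_response, add_sub_response = await asyncio.gather(*pending)

            for infer_response in (pytorch_response, add_sub_response):
                if infer_response.has_error():
                    raise pb_utils.TritonModelException(
                        infer_response.error().message()
                    )

            # OUTPUT0 comes from pytorch, OUTPUT1 from add_sub.
            output0 = pb_utils.get_output_tensor_by_name(pytorch_response, "OUTPUT0")
            output1 = pb_utils.get_output_tensor_by_name(add_sub_response, "OUTPUT1")
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output0, output1])
            )
        return responses
```

Because `asyncio.gather` preserves the order of its arguments, the first result can be treated as the pytorch response and the second as the add_sub response.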

Create the model repository

mkdir -p models/add_sub/1
mkdir -p models/bls_async/1
mkdir -p models/pytorch/1

# Copy the Python models
cp examples/add_sub/model.py models/add_sub/1/
cp examples/add_sub/config.pbtxt models/add_sub/
cp examples/bls/async_model.py models/bls_async/1/model.py
cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
cp examples/pytorch/model.py models/pytorch/1/
cp examples/pytorch/config.pbtxt models/pytorch/

Start tritonserver

tritonserver --model-repository `pwd`/models

Send inference requests to the server

python3 examples/bls/async_client.py

You should see output similar to the following:

INPUT0 ([0.72394824 0.45873794 0.4307444  0.07681174]) + INPUT1 ([0.34224355 0.8271524  0.5831284  0.904624  ]) = OUTPUT0 ([1.0661918 1.2858903 1.0138729 0.9814357])
INPUT0 ([0.72394824 0.45873794 0.4307444  0.07681174]) - INPUT1 ([0.34224355 0.8271524  0.5831284  0.904624  ]) = OUTPUT1 ([ 0.3817047  -0.36841443 -0.15238398 -0.82781225])