NVIDIA Holoscan SDK v2.9.0

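Inference

Overview

A Holoscan application that needs to run inference uses an inference operator. The built-in Inference operator (InferenceOp) can be used for this, and several related use cases are documented in the Inference Operator section below. Each use case is created from a parameter set that must be defined in the configuration file of the Holoscan application. If the built-in InferenceOp does not cover a specific use case, you can create your own custom inference operator, as described in the Creating an Inference Operator section.

The core inference functionality in the Holoscan SDK is provided by the Inference Module, a framework that facilitates the design and execution of inference and processing applications through its APIs. The built-in InferenceOp uses the Inference Module and supports the same parameters. All parameters required by the Holoscan Inference Module are passed through a parameter set in the configuration file of the application.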

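Parameters and Related Features

The required parameters and related features of the Holoscan Inference Module are listed below.

Data buffer parameters: The inference settings provide parameters that set the data buffer location at several stages of inference. As shown in the figure below, three parameters can be set: input_on_cuda, output_on_cuda, and transmit_on_cuda.
- input_on_cuda refers to the location of the data going into inference.
  - If the value is true, the input data is on the device.
  - If the value is false, the input data is on the host.
  - Default value: true
- output_on_cuda refers to the location of the inferred data.
  - If the value is true, the inferred data is on the device.
  - If the value is false, the inferred data is on the host.
  - Default value: true
- transmit_on_cuda refers to the data transmission.
  - If the value is true, data is transmitted from the inference extension on the device.
  - If the value is false, data is transmitted from the inference extension on the host.
  - Default value: true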

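Inference parameters

The backend parameter is set to trt (for TensorRT), onnxrt (for ONNX Runtime), or torch (for libtorch). If the inference application has multiple models, all of them use the same backend. To use different backends for different models, specify the backend_map parameter instead.
- TensorRT
  - CUDA-based inference is supported on both x86_64 and aarch64.
  - End-to-end CUDA-based data buffer parameters are supported. For end-to-end CUDA-based data movement, input_on_cuda, output_on_cuda, and transmit_on_cuda are all set to true.
  - input_on_cuda, output_on_cuda, and transmit_on_cuda can each be true or false.
  - The TensorRT backend expects input models in tensorrt engine file format or onnx format.
    - If the model is in tensorrt engine file format, the parameter is_engine_path must be set to true.
    - If the model is in onnx format, the Holoscan Inference Module automatically converts it into a tensorrt engine file.
- Torch
  - CUDA-based and CPU-based inference are supported on both x86_64 and aarch64.
  - End-to-end CUDA-based data buffer parameters are supported. For end-to-end CUDA-based data movement, input_on_cuda, output_on_cuda, and transmit_on_cuda are all set to true.
  - input_on_cuda, output_on_cuda, and transmit_on_cuda can each be true or false.
  - Libtorch and TorchVision are included in the Holoscan NGC container, where they were initially built as part of the PyTorch NGC container. To use the Holoscan SDK torch backend outside of these containers, we recommend downloading the libtorch and torchvision binaries from Holoscan's third-party repository.
  - The Torch backend expects input models in torchscript format.
    - It is recommended to generate the torchscript model with the same torch version that the Holoscan SDK uses on the respective architecture.
    - It is also recommended to generate the torchscript model on the same architecture on which it will be executed. For example, a torchscript model must be generated on x86_64 to be executed in an application running on x86_64.
- ONNX Runtime
  - CUDA-based and CPU-based inference are supported on both x86_64 and aarch64.
  - End-to-end CUDA-based data buffer parameters are supported. For end-to-end CUDA-based data movement, input_on_cuda, output_on_cuda, and transmit_on_cuda are all set to true.
  - input_on_cuda, output_on_cuda, and transmit_on_cuda can each be true or false.

If CPU-based inference is desired, set the infer_on_cpu parameter to true.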

The table below lists the supported values of the data buffer parameters and of infer_on_cpu for the trt, torch, and onnxrt backends.

| Backend | `input_on_cuda` | `output_on_cuda` | `transmit_on_cuda` | `infer_on_cpu` |
| --- | --- | --- | --- | --- |
| `trt` | `true` or `false` | `true` or `false` | `true` or `false` | `false` |
| `torch` | `true` or `false` | `true` or `false` | `true` or `false` | `true` or `false` |
| `onnxrt` | `true` or `false` | `true` or `false` | `true` or `false` | `true` or `false` |
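
The backend constraints listed for trt, torch, and onnxrt can be written as a small pre-flight check. The sketch below is illustrative only: `is_supported_combination` is a hypothetical helper, not part of the Holoscan API. It encodes only the documented support matrix, in which all three backends accept either buffer location and only torch and onnxrt accept infer_on_cpu set to true.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Holoscan API): reports whether a backend and
// infer_on_cpu setting match the documented support matrix.
bool is_supported_combination(const std::string& backend, bool infer_on_cpu) {
  // An empty string most likely means the backend entry is missing.
  assert(!backend.empty());
  const bool known_backend =
      backend == "trt" || backend == "torch" || backend == "onnxrt";
  if (!known_backend) return false;
  // Only torch and onnxrt are documented as supporting CPU inference.
  return !(infer_on_cpu && backend == "trt");
}
```

A check like this could run before the parameter set is handed to the operator, so that an unsupported combination such as trt with infer_on_cpu set to true is reported early.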

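model_path_map: Users can design single or multi-AI inference pipelines by populating model_path_map in the configuration file.
- A single entry results in single inference; multiple entries enable multi-AI inference.
- Each entry in model_path_map has a unique keyword as the key (used as an identifier by the Holoscan Inference Module) and the path to the model as the value.
- All model entries must refer to models in onnx, tensorrt engine file, or torchscript format.
pre_processor_map: The input tensors of each model are specified in pre_processor_map in the configuration file.
- The Holoscan Inference Module supports the same input for multiple models or a unique input per model.
- Each entry in pre_processor_map has a unique keyword representing the model (the same as used in model_path_map) and a vector of tensor names as the value.
- The Holoscan Inference Module supports multiple input tensors per model.
inference_map: The output tensors of each model after inference are specified in inference_map in the configuration file.
- Each entry in inference_map has a unique keyword representing the model (the same as used in model_path_map and pre_processor_map) and a vector of output tensor names as the value.
- The Holoscan Inference Module supports multiple output tensors per model.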
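
Because the three maps share the same model keys, a mismatch between them is an easy mistake to make. The following is a minimal sketch of a consistency check. It assumes that every model in model_path_map needs at least one tensor name in both pre_processor_map and inference_map. The type aliases and the functions `has_tensors` and `maps_are_consistent` are hypothetical names used for illustration; they are not Holoscan APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Illustrative stand-ins for the parsed configuration maps.
using PathMap = std::map<std::string, std::string>;
using TensorMap = std::map<std::string, std::vector<std::string>>;

// True when the model has an entry with at least one tensor name.
bool has_tensors(const TensorMap& tensor_map, const std::string& model) {
  const auto it = tensor_map.find(model);
  return it != tensor_map.end() && !it->second.empty();
}

// Assumption: each model key in model_path_map must also appear in
// pre_processor_map and inference_map.
bool maps_are_consistent(const PathMap& model_path_map,
                         const TensorMap& pre_processor_map,
                         const TensorMap& inference_map) {
  assert(!model_path_map.empty());  // at least one model is expected
  for (const auto& entry : model_path_map) {
    const std::string& model = entry.first;
    if (!has_tensors(pre_processor_map, model)) return false;
    if (!has_tensors(inference_map, model)) return false;
  }
  return true;
}
```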
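parallel_inference: Executes inferences in parallel or sequentially.
- If multiple models are provided, they can be executed in parallel.
- The parameter parallel_inference can be true or false. The default value is true.
- Inferences are launched in parallel without checking the available GPU resources. You must ensure that enough memory and compute are available to run all the inferences in parallel.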
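enable_fp16: Generates the TensorRT engine files with the FP16 option.
- If backend is set to onnxrt or trt and the input models are in onnx format, the engine file can be generated with the fp16 option to accelerate inference.
- Generating the engine file for the first time takes a few minutes.
- It can be true or false. The default value is false.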
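enable_cuda_graphs: Enables the use of CUDA Graphs for backends that support them.
- Enabled by default for the TensorRT backend.
- Using CUDA Graphs reduces CPU launch costs and enables optimizations that might not be possible with the piecewise work submission mechanism of streams.
- Models that include loops or conditions are not supported with CUDA Graphs. For these models, the use of CUDA Graphs must be disabled.
- It can be true or false. The default value is true.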
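is_engine_path: If the input models are specified in trt engine format in model_path_map, this flag must be set to true. The default value is false.
in_tensor_names: Input tensor names to be used by pre_processor_map. This parameter is optional. If it is absent from the parameter map, the values are derived from pre_processor_map.
out_tensor_names: Output tensor names to be used by inference_map. This parameter is optional. If it is absent from the parameter map, the values are derived from inference_map.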
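device_map: Multi-GPU inference is enabled if device_map is populated in the parameter set.
- Each entry in device_map has a unique keyword representing the model (the same as used in model_path_map and pre_processor_map) and a GPU identifier as the value. The specified model is inferred on the GPU with this ID.
- The GPUs specified in device_map must have P2P (peer-to-peer) access and must be connected to the same PCIE configuration. If P2P access is not possible among the GPUs, the host (CPU memory) is used to transfer the data.
- Multi-GPU inference is supported for all backends.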
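temporal_map: Temporal inference is enabled if temporal_map is populated in the parameter set.
- Each entry in temporal_map has a unique keyword representing the model (the same as used in model_path_map and pre_processor_map) and a frame delay as the value. The frame delay is the interval, in frames, between successive inferences of that model. A model with a value of 1 is inferred on every frame. A model with a value of 10 is inferred on every 10th frame coming into the operator, that is, the 1st frame, the 11th frame, the 21st frame, and so on. For all frames that are not inferred, the operator transmits the last inferred result. For example, a model with a value of 10 is inferred on the 11th frame, and for the 12th through 20th frames the result from the 11th frame is transmitted.
- If temporal_map is absent from the parameter set, all models are inferred on all frames.
- Not all models need to be present in temporal_map. Models that are missing are inferred on every frame.
- Temporal-map-based inference is supported for all backends.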
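
The frame-delay rule for temporal_map can be checked with a few lines of arithmetic. The sketch below is a standalone illustration, not Holoscan code: `runs_inference` and `transmitted_result_source` are hypothetical names, and frames are assumed to be counted from 1, as in the description of temporal_map.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: true when a model with the given frame delay is
// inferred on the 1-based frame index (delay 10 -> frames 1, 11, 21, ...).
bool runs_inference(int frame_index, int frame_delay) {
  assert(frame_index >= 1 && frame_delay >= 1);
  return (frame_index - 1) % frame_delay == 0;
}

// For frames 1..frame_count, returns the frame whose inference result is
// transmitted: the current frame if inferred, else the last inferred one.
std::vector<int> transmitted_result_source(int frame_count, int frame_delay) {
  std::vector<int> source;
  int last_inferred = 0;
  for (int frame = 1; frame <= frame_count; ++frame) {
    if (runs_inference(frame, frame_delay)) last_inferred = frame;
    source.push_back(last_inferred);
  }
  return source;
}
```

With a frame delay of 10, frames 12 through 20 all report frame 11 as the source of the transmitted result, which matches the behavior described for temporal_map.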
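activation_map: Dynamic inference can be enabled with this parameter. It is populated in the parameter set and can be updated at runtime.
- Each entry in activation_map has a unique keyword representing the model (the same as used in model_path_map and pre_processor_map) and an activation state as the value. The activation state indicates whether the model is used for inference on a given frame. Any model with a value of 1 is active and is used for inference; any model with a value of 0 does not run. The activation map must be initialized in the parameter set for all models that need to be activated or deactivated dynamically.
- When the activation state of a particular model in activation_map is 0, the inference operator does not launch inference for that model and emits the model's last inferred result.
- If activation_map is absent from the parameter set, all models are inferred on all frames.
- Not all models need to be present in activation_map. Models that are missing are active on every frame.
- Activation-map-based dynamic inference is supported for all backends.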
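
The activation rules for activation_map can be sketched as follows. This is an illustration under stated assumptions, not the operator's implementation: `ActivationMap`, `is_active`, and `select_output` are hypothetical names, activation states are modeled as plain integers, and a string stands in for an inference result.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Illustrative stand-in for the parsed activation_map (model -> 0 or 1).
using ActivationMap = std::map<std::string, int>;

// A model is active when its state is 1, or when it is not listed at all.
bool is_active(const ActivationMap& activation_map, const std::string& model) {
  const auto it = activation_map.find(model);
  if (it == activation_map.end()) return true;
  assert(it->second == 0 || it->second == 1);
  return it->second == 1;
}

// Returns the result to emit for one frame. Inference is launched only for
// active models; inactive models re-emit the last inferred result.
std::string select_output(const ActivationMap& activation_map,
                          const std::string& model,
                          const std::function<std::string()>& run_inference,
                          std::string& last_result) {
  if (is_active(activation_map, model)) last_result = run_inference();
  return last_result;
}
```

Passing the inference step as a callable makes it visible that no inference is launched for a deactivated model; only the cached result is emitted.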
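backend_map: Multiple backends can be used in the same application with this parameter.
- Each entry in backend_map has a unique keyword representing the model (the same as used in model_path_map) and the backend as the value.
- A sample backend_map is shown below. In the example, model_1 uses the tensorRT backend, while model_2 and model_3 use the torch backend for inference.
```
backend_map:
    "model_1_unique_identifier": "trt"
    "model_2_unique_identifier": "torch"
    "model_3_unique_identifier": "torch"
```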
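trt_opt_profile: This parameter is optional and takes effect with the TensorRT backend. It applies to models with dynamic input shapes.
- The parameter is specified as a vector of 3 integers: the minimum batch size of the input, the optimum batch size, and the maximum batch size.
- Users can specify a batch profile for dynamic input. This profile is then used during engine creation. The cache must be cleared for an updated optimization profile to take effect.
- Default value: {1,1,1}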
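
A trt_opt_profile value can be sanity-checked before it is written to the configuration. The sketch below assumes the usual TensorRT ordering for optimization profiles, minimum ≤ optimum ≤ maximum, and a minimum batch size of at least 1. `is_valid_opt_profile` and `make_opt_profile` are hypothetical helpers for illustration, not Holoscan or TensorRT functions.

```cpp
#include <array>
#include <cassert>

// Hypothetical check: profile holds {minimum, optimum, maximum} batch size.
// Assumes the ordering minimum <= optimum <= maximum is required.
bool is_valid_opt_profile(const std::array<int, 3>& profile) {
  const int min_batch = profile[0];
  const int opt_batch = profile[1];
  const int max_batch = profile[2];
  return min_batch >= 1 && min_batch <= opt_batch && opt_batch <= max_batch;
}

// Builds a profile and asserts that it passes the check above.
std::array<int, 3> make_opt_profile(int min_batch, int opt_batch, int max_batch) {
  const std::array<int, 3> profile{min_batch, opt_batch, max_batch};
  assert(is_valid_opt_profile(profile));
  return profile;
}
```

The documented default {1,1,1} passes this check, while a profile such as {4,2,8} is rejected because the optimum is smaller than the minimum.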

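Other features: The table below lists additional features and their supported values in the current release.

| Feature | Supported values |
| --- | --- |
| Data type | `float32`, `int32`, `int8` |
| Inference backend | `trt`, `torch`, `onnxrt` |
| Inputs per model | Multiple |
| Outputs per model | Multiple |
| GPU(s) supported | Multi-GPU on the same PCIE network |
| Tensor data dimension | Maximum of 8 for the `onnxrt` and `trt` backends; 3 (CHW) or 4 (NCHW) for `torch` |
| Model type | All `onnx`, all `torchscript`, or all `trt engine` models, or a combination of `torch` and `trt engine` |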

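Multi-receiver and single-transmitter support
- The Holoscan Inference Module provides an API to extract data from multiple receivers.
- The Holoscan Inference Module provides an API to transmit multiple tensors via a single transmitter.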

Parameter Specification

All required inference parameters of the inference application must be specified. Below is a sample parameter set for an application that uses three models for inference. You must populate all required fields with appropriate values.

```
inference:
    backend: "trt"
    model_path_map:
        "model_1_unique_identifier": "path_to_model_1"
        "model_2_unique_identifier": "path_to_model_2"
        "model_3_unique_identifier": "path_to_model_3"
    pre_processor_map:
        "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
        "model_2_unique_identifier": ["input_tensor_1_model_2_unique_identifier"]
        "model_3_unique_identifier": ["input_tensor_1_model_3_unique_identifier"]
    inference_map:
        "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
        "model_2_unique_identifier": ["output_tensor_1_model_2_unique_identifier"]
        "model_3_unique_identifier": ["output_tensor_1_model_3_unique_identifier"]
    parallel_inference: true
    infer_on_cpu: false
    enable_fp16: false
    input_on_cuda: true
    output_on_cuda: true
    transmit_on_cuda: true
    is_engine_path: false
```

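Inference Operator

In the Holoscan SDK, the built-in Inference operator (InferenceOp) is designed using the Holoscan Inference Module APIs. The Inference operator ingests the inference parameter set (from the configuration file) and the data receivers (from previously connected operators in the application), executes the inference, and transmits the inferred results to the next connected operators in the application.

InferenceOp is a generic operator that serves multiple use cases through its parameter set. Parameter sets for some key use cases are listed below.

Note

Some parameters have default values set in InferenceOp. For any parameter not mentioned in the sample parameter sets below, InferenceOp uses its default value. These parameters are used to enable several use cases.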

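Single model inference using the TensorRT backend.

```
backend: "trt"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
```

The value of backend can be changed to any other supported backend, along with the other parameters related to that backend. You must ensure that the parameter set provides the correct model type and model path, together with supported values for all parameters of the chosen backend.

In this example, path_to_model_1 must be an onnx file, which is converted to a tensorRT engine file on first execution. During subsequent executions, the Holoscan Inference Module automatically finds the tensorRT engine file (provided path_to_model_1 has not changed). Alternatively, if you have a pre-built tensorRT engine file, path_to_model_1 must be the path to the engine file and the parameter is_engine_path must be set to true in the parameter set.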

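Single model inference using the TensorRT backend with multiple outputs.

```
backend: "trt"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier",
                                  "output_tensor_2_model_1_unique_identifier",
                                  "output_tensor_3_model_1_unique_identifier"]
```

As shown in the example above, the Holoscan Inference Module automatically maps the model outputs to the named tensors in the parameter set. You must list the named tensors in the same order in which the model generates its outputs. The same logic applies to multiple inputs.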
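
The positional mapping between model outputs and named tensors can be pictured with the following sketch. It is an illustration only: `name_outputs_by_position` is a hypothetical function, not a Holoscan API, and it assumes that the i-th name listed in inference_map is attached to the i-th output the model produces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical illustration: associates each configured tensor name with the
// index of the model output at the same position.
std::map<std::string, std::size_t> name_outputs_by_position(
    const std::vector<std::string>& tensor_names,
    std::size_t model_output_count) {
  // The configuration is expected to name every model output exactly once.
  assert(tensor_names.size() == model_output_count);
  std::map<std::string, std::size_t> output_index_by_name;
  for (std::size_t i = 0; i < tensor_names.size(); ++i) {
    output_index_by_name[tensor_names[i]] = i;
  }
  return output_index_by_name;
}
```

Under this assumption, swapping two names in the configuration swaps which model output each name refers to, which is why the order in inference_map matters.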

Single model inference using fp16 precision.

```
backend: "trt"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier",
                                  "output_tensor_2_model_1_unique_identifier",
                                  "output_tensor_3_model_1_unique_identifier"]
enable_fp16: true
```

If no tensorRT engine file is available for fp16 precision, the Holoscan Inference Module generates it automatically on first execution. The file is cached for future executions.

Single model inference on the CPU.

```
backend: "onnxrt"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
infer_on_cpu: true
```

Note that for CPU-based inference, the backend can only be onnxrt or torch.

Single model inference with input and output data on the host.

```
backend: "trt"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
input_on_cuda: false
output_on_cuda: false
```

Data in the core inference engine is passed through the host and received on the host. Inference can still run on the GPU. The parameters input_on_cuda and output_on_cuda define the location of the data before and after inference, respectively.

Single model inference with data transmission via the host.

```
backend: "trt"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
transmit_on_cuda: false
```

Data from the inference operator to the next connected operator in the application is transmitted via the host.

Multi-model inference with a single backend.

```
backend: "trt"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
    "model_2_unique_identifier": "path_to_model_2"
    "model_3_unique_identifier": "path_to_model_3"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["input_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["input_tensor_1_model_3_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["output_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["output_tensor_1_model_3_unique_identifier"]
```

By default, the inferences of multiple models are launched in parallel. The backend specified through the backend parameter is used for all models in the application.

Multi-model inference with sequential inference.

```
backend: "trt"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
    "model_2_unique_identifier": "path_to_model_2"
    "model_3_unique_identifier": "path_to_model_3"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["input_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["input_tensor_1_model_3_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["output_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["output_tensor_1_model_3_unique_identifier"]
parallel_inference: false
```

By default, parallel_inference is set to true. To launch the model inferences sequentially, parallel_inference must be set to false.

Multi-model inference with multiple backends.

```
backend_map:
    "model_1_unique_identifier": "trt"
    "model_2_unique_identifier": "torch"
    "model_3_unique_identifier": "torch"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
    "model_2_unique_identifier": "path_to_model_2"
    "model_3_unique_identifier": "path_to_model_3"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["input_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["input_tensor_1_model_3_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["output_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["output_tensor_1_model_3_unique_identifier"]
```

In the sample parameter set above, the first model is inferred with the tensorRT backend, while models 2 and 3 are inferred with the torch backend.

Note

The combination of backends in backend_map must support all other parameters that will be used during inference. For example, a combination of onnxrt and tensorRT with CPU-based inference is not supported.

Multi-model inference with a single backend on multiple GPUs.

```
backend: "trt"
device_map:
    "model_1_unique_identifier": "1"
    "model_2_unique_identifier": "0"
    "model_3_unique_identifier": "1"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
    "model_2_unique_identifier": "path_to_model_2"
    "model_3_unique_identifier": "path_to_model_3"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["input_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["input_tensor_1_model_3_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["output_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["output_tensor_1_model_3_unique_identifier"]
```

In the sample above, model 1 and model 3 are inferred on the GPU with ID 1, and model 2 is inferred on the GPU with ID 0. The GPUs must have P2P (peer-to-peer) access among them. If it is not already enabled, the Holoscan Inference Module enables it by default. If P2P access is not possible between the GPUs, the data transfer goes through the host.

Multi-model inference with multiple backends on multiple GPUs.

```
backend_map:
    "model_1_unique_identifier": "trt"
    "model_2_unique_identifier": "torch"
    "model_3_unique_identifier": "torch"
device_map:
    "model_1_unique_identifier": "1"
    "model_2_unique_identifier": "0"
    "model_3_unique_identifier": "1"
model_path_map:
    "model_1_unique_identifier": "path_to_model_1"
    "model_2_unique_identifier": "path_to_model_2"
    "model_3_unique_identifier": "path_to_model_3"
pre_processor_map:
    "model_1_unique_identifier": ["input_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["input_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["input_tensor_1_model_3_unique_identifier"]
inference_map:
    "model_1_unique_identifier": ["output_tensor_1_model_1_unique_identifier"]
    "model_2_unique_identifier": ["output_tensor_1_model_2_unique_identifier"]
    "model_3_unique_identifier": ["output_tensor_1_model_3_unique_identifier"]
```

In the sample above, three models are used during inference. Model 1 uses the trt backend and runs on the GPU with ID 1, model 2 uses the torch backend and runs on the GPU with ID 0, and model 3 uses the torch backend and runs on the GPU with ID 1.

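Creating an Inference Operator

The Inference operator is the core inference unit in an inference application. The built-in Inference operator (InferenceOp) can be used for inference, or you can create your own custom inference operator as explained in this section. In the Holoscan SDK, an inference operator can be designed using the Holoscan Inference Module APIs.

Arguments in the code sections below are referred to as ….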

Parameter validity check: The inference parameters provided through the configuration (from step 1) are verified for correctness.

```
auto status = HoloInfer::inference_validity_check(...);
```

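Inference specification creation: For a single AI, only one entry is passed into the required entries of the parameter set. The API calls below do not change. Single AI or multi AI is enabled based on the number of entries in the parameter specifications from the configuration (in step 1).

```
// Declaration of inference specifications
std::shared_ptr<HoloInfer::InferenceSpecs> inference_specs_;

// Creation of inference specification structure
inference_specs_ = std::make_shared<HoloInfer::InferenceSpecs>(...);
```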

Inference context creation.

```
// Pointer to inference context.
std::unique_ptr<HoloInfer::InferContext> holoscan_infer_context_;
// Create holoscan inference context
holoscan_infer_context_ = std::make_unique<HoloInfer::InferContext>();
```

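Parameter setup with the inference context: All required parameters of the Holoscan Inference Module are transferred in this step, and the relevant memory allocations are initiated in the inference specification.

```
// Set and transfer inference specification to inference context
auto status = holoscan_infer_context_->set_inference_params(inference_specs_);
```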

Data extraction and allocation: The following API from the HoloInfer utility is used to extract and allocate data for the specified tensor.

```
// Extract relevant data from input, and update inference specifications
gxf_result_t stat = HoloInfer::get_data_per_model(...);
```

Inference execution

```
// Execute inference and populate output buffer in inference specifications
auto status = holoscan_infer_context_->execute_inference(inference_specs_->data_per_model_,
                                                         inference_specs_->output_per_model_);
```

Transmit inferred data

```
// Transmit output buffers
auto status = HoloInfer::transmit_data_per_model(...);
```

The figure below demonstrates the Inference operator in the Holoscan SDK. All blocks colored blue are API calls from the Holoscan Inference Module.
