Deploying UNet Models to DeepStream - NVIDIA Docs

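To deploy a TAO-trained UNet model to DeepStream, use TAO Deploy to generate a device-specific, optimized TensorRT engine, which DeepStream can then ingest.

Machine-specific optimizations are performed during engine creation, so you should generate a separate engine for each environment and hardware configuration. In addition, if the TensorRT or CUDA libraries of the inference environment are updated (including minor version updates), or if a new model is generated, you need to generate a new engine. Running an engine that was generated with a different version of TensorRT and CUDA is not supported: it causes unknown behavior that affects inference speed, accuracy, and stability, or it may fail to run altogether.

For more details on how to export a TAO model, refer to the Exporting the Model documentation for UNet.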

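TensorRT Open Source Software (OSS)

UNet models require a TensorRT OSS build because several prerequisite TensorRT plugins are only available in the TensorRT open source repository.

If your deployment platform is an x86 PC with an NVIDIA GPU, follow the instructions in TensorRT OSS on x86. If your deployment platform is NVIDIA Jetson, follow the instructions in TensorRT OSS on Jetson (ARM64).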

TensorRT OSS on x86

Building TensorRT OSS on x86

Install CMake (>=3.13).

Note

TensorRT OSS requires cmake >= v3.13, so install cmake 3.13 if your cmake version is lower than 3.13.

sudo apt remove --purge --auto-remove cmake
wget https://github.com/Kitware/CMake/releases/download/v3.13.5/cmake-3.13.5.tar.gz
tar xvf cmake-3.13.5.tar.gz
cd cmake-3.13.5/
./configure
make -j$(nproc)
sudo make install
sudo ln -s /usr/local/bin/cmake /usr/bin/cmake
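
Before removing and rebuilding cmake, it can be worth checking whether the installed version already satisfies the >= 3.13 requirement. The following is a minimal sketch, not part of the original instructions: the function names are hypothetical, and it assumes GNU sort with the -V (version sort) option is available.

```shell
# Succeeds (exit status 0) when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the installed cmake version (for example 3.10.2), or nothing if cmake is missing.
installed_cmake_version() {
    command -v cmake >/dev/null 2>&1 || return 0
    cmake --version | head -n 1 | awk '{print $3}'
}

# Hypothetical helper: succeeds when the cmake upgrade steps are needed.
cmake_needs_upgrade() {
    current="$(installed_cmake_version)"
    if [ -n "$current" ] && version_ge "$current" 3.13; then
        return 1   # cmake is new enough, no upgrade needed
    fi
    return 0       # cmake is missing or older than 3.13
}
```

For example, `if cmake_needs_upgrade; then echo "cmake upgrade required"; fi` reports whether the steps above apply. A version sort is used because a plain string comparison would rank 3.9 above 3.13.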

Get the GPU architecture. The GPU_ARCHS value can be retrieved with the deviceQuery CUDA sample:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

If /usr/local/cuda/samples does not exist on your system, you can download deviceQuery.cpp from this GitHub repository, then compile and run deviceQuery:

nvcc deviceQuery.cpp -o deviceQuery
./deviceQuery

This command outputs something like the following, which indicates that GPU_ARCHS is 75, based on the CUDA Capability major/minor version:

Detected 2 CUDA Capable device(s)

Device 0: "Tesla T4"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    7.5
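
The GPU_ARCHS value is the capability number with the dot removed (7.5 becomes 75). As a rough sketch, the value could be extracted from the deviceQuery output with awk; the function name below is hypothetical, and the parsing assumes the output line format shown above.

```shell
# Reads deviceQuery output on stdin and prints one GPU_ARCHS value per detected device
# by removing the dot from the "CUDA Capability Major/Minor version number" value.
gpu_archs_from_devicequery() {
    awk -F: '/CUDA Capability Major\/Minor version number/ {
        gsub(/[ \t.]/, "", $2)   # strip whitespace and the dot from the value
        print $2
    }'
}
```

With this in place, `./deviceQuery | gpu_archs_from_devicequery` would print one value per GPU, which helps on machines with several different devices.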

Build TensorRT OSS

git clone -b 21.08 https://github.com/nvidia/TensorRT
cd TensorRT/
git submodule update --init --recursive
export TRT_SOURCE=`pwd`
cd $TRT_SOURCE
mkdir -p build && cd build

Note

Make sure that the GPU_ARCHS value from step 2 (getting the GPU architecture) is listed in the TensorRT OSS CMakeLists.txt. If it is not, add -DGPU_ARCHS=<VER> as shown below, where <VER> is the GPU_ARCHS value from step 2.

/usr/local/bin/cmake .. -DGPU_ARCHS=xy -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
make nvinfer_plugin -j$(nproc)
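
The x86 and Jetson cmake invocations in this guide differ only in the GPU_ARCHS value and the TensorRT library directory. The sketch below shows one way to assemble the command line from those two inputs. The function names are hypothetical, and the command is only printed, not executed.

```shell
# Prints the TensorRT library directory used in this guide for a machine
# architecture string (as reported by "uname -m").
trt_lib_dir_for() {
    case "$1" in
        x86_64)  echo "/usr/lib/x86_64-linux-gnu/" ;;
        aarch64) echo "/usr/lib/aarch64-linux-gnu/" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Prints (does not run) the cmake command line for a GPU_ARCHS value and an architecture.
# Call it from inside the TensorRT build directory so that $(pwd)/out is the output folder.
print_trt_cmake_command() {
    gpu_archs="$1"
    lib_dir="$(trt_lib_dir_for "$2")" || return 1
    echo "/usr/local/bin/cmake .. -DGPU_ARCHS=${gpu_archs} -DTRT_LIB_DIR=${lib_dir} -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=$(pwd)/out"
}
```

For example, `print_trt_cmake_command 75 "$(uname -m)"` prints the command for a T4 on the current machine, so it can be reviewed before running.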

After the build completes successfully, libnvinfer_plugin.so* is generated under `pwd`/out/.

Replace the original libnvinfer_plugin.so*:

# Back up the original libnvinfer_plugin.so.8.x.y
sudo mv /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.x.y ${HOME}/libnvinfer_plugin.so.8.x.y.bak
sudo cp `pwd`/out/libnvinfer_plugin.so.8.m.n /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.x.y
sudo ldconfig
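
Because this step overwrites a system library, it can help to check that both files exist before touching anything. The following is a sketch of the backup-and-replace step as a reusable function. The function name and its three parameters are hypothetical, and writing to the system library directory still requires root privileges.

```shell
# Hypothetical helper: back up the installed plugin library, then copy the new build over it.
#   $1 = newly built library (for example <build dir>/out/libnvinfer_plugin.so.8.m.n)
#   $2 = installed library to replace (for example .../libnvinfer_plugin.so.8.x.y)
#   $3 = directory that receives the backup copy
replace_plugin_library() {
    new_lib="$1"; installed_lib="$2"; backup_dir="$3"
    [ -f "$new_lib" ] || { echo "missing build output: $new_lib" >&2; return 1; }
    [ -f "$installed_lib" ] || { echo "missing installed library: $installed_lib" >&2; return 1; }
    backup="$backup_dir/$(basename "$installed_lib").bak"
    cp -p "$installed_lib" "$backup" || return 1   # keep the original so it can be restored
    cp "$new_lib" "$installed_lib" || return 1     # overwrite with the OSS build
    echo "backup saved to $backup"
}
```

Unlike the mv-based commands above, this variant copies the original instead of moving it, so the installed library is never absent if the second copy fails. Run sudo ldconfig afterwards, as in the original steps.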

TensorRT OSS on Jetson (ARM64)

Install CMake (>=3.13)

Note

TensorRT OSS requires cmake >= v3.13, while the default cmake on Jetson/Ubuntu 18.04 is cmake 3.10.2.

Upgrade cmake using the following commands:

sudo apt remove --purge --auto-remove cmake
wget https://github.com/Kitware/CMake/releases/download/v3.13.5/cmake-3.13.5.tar.gz
tar xvf cmake-3.13.5.tar.gz
cd cmake-3.13.5/
./configure
make -j$(nproc)
sudo make install
sudo ln -s /usr/local/bin/cmake /usr/bin/cmake

Get the GPU architecture for your platform. The following table gives the GPU_ARCHS value for different Jetson platforms.

Jetson Platform         GPU_ARCHS
Nano/TX1                53
TX2                     62
AGX Xavier/Xavier NX    72
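
In a build script, the table above can be expressed as a simple lookup. This is only a sketch: the function name and the platform strings it accepts are hypothetical labels chosen for illustration, not identifiers reported by the device.

```shell
# Prints the GPU_ARCHS value for a Jetson platform name, following the table above.
jetson_gpu_archs() {
    case "$1" in
        Nano|TX1)                 echo 53 ;;
        TX2)                      echo 62 ;;
        "AGX Xavier"|"Xavier NX") echo 72 ;;
        *) echo "unknown Jetson platform: $1" >&2; return 1 ;;
    esac
}
```

For example, `jetson_gpu_archs "Xavier NX"` prints 72, which is the value to pass to -DGPU_ARCHS.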

Build TensorRT OSS

git clone -b 21.03 https://github.com/nvidia/TensorRT
cd TensorRT/
git submodule update --init --recursive
export TRT_SOURCE=`pwd`
cd $TRT_SOURCE
mkdir -p build && cd build

Note

The -DGPU_ARCHS=72 below is for Xavier or NX. For other Jetson platforms, change 72 to the GPU_ARCHS value from step 2.

/usr/local/bin/cmake .. -DGPU_ARCHS=72 -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
make nvinfer_plugin -j$(nproc)

After the build completes successfully, libnvinfer_plugin.so* is generated under `pwd`/out/.

Replace libnvinfer_plugin.so* with the newly generated one:

# Back up the original libnvinfer_plugin.so.8.x.y
sudo mv /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.x.y ${HOME}/libnvinfer_plugin.so.8.x.y.bak
sudo cp `pwd`/out/libnvinfer_plugin.so.8.m.n /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.x.y
sudo ldconfig

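Label File

The label file is a text file containing the names of the classes that the UNet model is trained to segment. The order in which the classes are listed must match the order in which the model predicts the output. This order is derived from the target_class_id_mapping.json file that is saved in the results directory after training. Here is an example of the target_class_id_mapping.json file:

{"0": ["foreground"], "1": ["background"]}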

Here is an example of the corresponding unet_labels.txt file. The order in unet_labels.txt should match the order of the target_class_id_mapping.json keys:

foreground
background
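
To keep the label order in sync with the class IDs, the label file can be derived from the mapping file instead of being written by hand. The following is a text-level sketch, not a general JSON parser. It assumes the flat {"<id>": ["<name>", ...]} layout shown above, uses only the first name in each list, and assumes class names contain no double quotes. The function name is hypothetical.

```shell
# Reads a target_class_id_mapping.json document on stdin and prints one class name
# per line, ordered by numeric class ID.
labels_from_mapping() {
    tr -d '\n' \
        | grep -oE '"[0-9]+"[[:space:]]*:[[:space:]]*\[[[:space:]]*"[^"]*"' \
        | sed -E 's/^"([0-9]+)"[^[]*\[[[:space:]]*"([^"]*)"$/\1 \2/' \
        | sort -n -k1,1 \
        | cut -d' ' -f2-
}
```

A typical call would be `labels_from_mapping < target_class_id_mapping.json > unet_labels.txt`. Sorting numerically by ID ensures that an ID such as 10 is listed after 2, whatever order the keys appear in.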

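Integrating the Model with DeepStream

Segmentation models are typically used as a primary inference engine. They can also be used as a secondary inference engine. Download ds-tlt from the deepstream_tao_apps repository.

Follow these steps to use the TensorRT engine file with ds-tlt:

1. Generate the TensorRT engine using TAO Deploy.
2. After the engine file is generated successfully, do the following to set up ds-tlt with DS 6.1.

Install ds-tlt by following the DS TAO installation instructions.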

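DeepStream Configuration File

To run this model with the sample ds-tao-segmentation app, you must modify the existing pgie_unet_tlt_config.txt file to point to this model. See the configuration file below for all options. To learn more about the parameters, refer to the DeepStream Development Guide.

Starting with TAO 5.0.0, .etlt is deprecated. To integrate an .etlt file directly into a DeepStream app, you need the following parameters in the configuration file:

tlt-encoded-model=<TAO exported .etlt>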
tlt-model-key=<Model export key>
int8-calib-file=<Calibration cache file>

[property]
gpu-id=0
net-scale-factor=0.007843
# 0-RGB, 1-BGR, 2-Gray
# For grayscale, this should be set to 2
model-color-format=1
offsets=127.5;127.5;127.5
labelfile-path=</Path/to/unet_labels.txt>
## Replace the following path with the path to your model file
# You can provide the model as an ONNX file, or convert it to a TensorRT engine offline using TAO Deploy
# and provide the engine in the config file.
onnx-file=/path/to/onnx file
# If it is an etlt file:
# tlt-encoded-model=../../models/citysemsegformer_vdeployable_v1.0/citysemsegformer.etlt
# This is needed if an etlt file is used:
# tlt-model-key=tlt_encode
# If you provide the model as an ONNX file, you need to provide the calibration cache here
# (the label file is already set by labelfile-path above)
int8-calib-file=/path/to/calibration cache text file
# Argument to be used if you are using a TensorRT engine
# model-engine-file=<Path/to/tensorrt engine generated by tao deploy>
# c = number of channels, h = height of the model input, w = width of the model input
infer-dims=c;h;w
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1

## 0=Detector, 1=Classifier, 2=Semantic Segmentation (sigmoid activation), 3=Instance Segmentation, 100=skip nvinfer postprocessing
# Set this to 2 if sigmoid activation was used for semantic segmentation
network-type=100

# Set this to 1 when network-type is 100
output-tensor-meta=1
# If you used softmax for the segmentation model, TAO replaces it with argmax for optimization,
# so you need to provide argmax_1/output
output-blob-names=argmax_1/output
segmentation-threshold=0.0
## Specify the output tensor order: 0 (default) for CHW, 1 for HWC
segmentation-output-order=1

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
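
The net-scale-factor and offsets values in the configuration work together. As documented for the Gst-nvinfer plugin, each input pixel x is normalized as y = net-scale-factor * (x - offset). Since 0.007843 is approximately 1/127.5, an offset of 127.5 maps the 0 to 255 pixel range to roughly -1 to 1. The sketch below simply evaluates that formula; the function name is hypothetical.

```shell
# Applies the normalization y = net-scale-factor * (x - offset) to one pixel value.
#   $1 = pixel value, $2 = net-scale-factor, $3 = offset
normalize_pixel() {
    LC_ALL=C awk -v x="$1" -v s="$2" -v o="$3" 'BEGIN { printf "%.3f\n", s * (x - o) }'
}
```

For example, `normalize_pixel 0 0.007843 127.5` prints -1.000 and `normalize_pixel 255 0.007843 127.5` prints 1.000, which can serve as a quick check that the values match the preprocessing used during training.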

Here is an example of a modified config file for a resnet18 3-channel model trained on the ISBI dataset:

[property]
gpu-id=0
net-scale-factor=0.007843
# Since the model input channel is 3, and pre-processing of UNET TAO requires BGR format, set the color format to BGR.
# 0-RGB, 1-BGR, 2-Gray
# For grayscale, this should be set to 2
model-color-format=1
offsets=127.5;127.5;127.5
labelfile-path=/home/nvidia/deepstream_tlt_apps/configs/unet_tlt/unet_labels.txt
## Replace the following path with the path to your model file
# You can provide the model as an ONNX file, or convert it to a TensorRT engine offline using TAO Deploy
# and provide the engine in the config file. If you provide an .etlt model instead, do not forget to provide the model key (tlt-model-key).
onnx-file=/path/to/unet_resnet18.onnx
# Argument to be used if you are using a TensorRT engine
# model-engine-file=/home/nvidia/deepstream_tlt_apps/models/unet/unet_resnet18_isbi.engine
infer-dims=3;320;320
batch-size=1

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1

## 0=Detector, 1=Classifier, 2=Semantic Segmentation (sigmoid activation), 3=Instance Segmentation, 100=skip nvinfer postprocessing
network-type=100

# Set this to 1 when network-type is 100
output-tensor-meta=1

# If you used softmax for the segmentation model, TAO replaces it with argmax for optimization,
# so you need to provide argmax_1/output
output-blob-names=argmax_1/output
segmentation-threshold=0.0
## Specify the output tensor order: 0 (default) for CHW, 1 for HWC
segmentation-output-order=1

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

The following is a sample ds-tlt command for running inference on a single image:

ds-tao-segmentation -c pgie_config_file -i image_isbi_rgb.jpg

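Note

DeepStream does not support the .png image format, so inference images must be converted to .jpg. If model_input_channels is set to 3, make sure that grayscale images are converted to three-channel images.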
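
The sample command processes one image per invocation. To run it over a folder of .jpg images, the call could be wrapped in a loop, as in the sketch below. The function name and the DS_APP variable are hypothetical conveniences, and only the -c and -i options shown above are used.

```shell
# Name of the inference binary; override DS_APP to point at a different build or a stub.
DS_APP="${DS_APP:-ds-tao-segmentation}"

# Runs the app once per .jpg file found directly inside a directory.
#   $1 = nvinfer config file, $2 = directory containing .jpg images
run_segmentation_on_dir() {
    config="$1"; image_dir="$2"
    for image in "$image_dir"/*.jpg; do
        [ -e "$image" ] || continue                      # directory has no .jpg files
        "$DS_APP" -c "$config" -i "$image" || return 1   # stop at the first failure
    done
}
```

For example, `run_segmentation_on_dir pgie_config_file ./images` would process every .jpg file in ./images in alphabetical order. Files in other formats, such as .png, are skipped by the glob.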
