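Gst-NvDsUcx

Gst-NvDsUcx is a GStreamer plugin that provides a set of elements for sending and receiving pipeline data over RDMA. This lets a GStreamer pipeline be distributed across multiple hosts so that it can use distributed GPU resources. The plugin is built on the Unified Communication X (UCX) library, which it uses to send and receive GStreamer data over RDMA-capable networks. UCX is an open-source library that accelerates data transfer over high-performance networks and can use GPUDirect RDMA to achieve minimal network latency and maximum throughput for distributed GPU traffic. For more details on UCX, see https://openucx.org.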

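Description

Gst-NvDsUcx provides separate sink elements (which receive data from a pipeline) and source elements (which forward data into a pipeline), connected to each other over an RDMA network. In addition, each sink or source element can act as either a server or a client, and the server element must be started before the client. The plugin therefore provides four elements: nvdsucxserversink, nvdsucxclientsink, nvdsucxserversrc, and nvdsucxclientsrc.

Because the Gst-NvDsUcx plugin has to present itself as both a sink and a source of a DeepStream pipeline, you need to pair the elements according to which part of the pipeline must start first: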

nvdsucxserversink <-> nvdsucxclientsrc (sink side starts first)
nvdsucxclientsink <-> nvdsucxserversrc (source side starts first)
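
The pairing rule can be captured in a small lookup. The following is a minimal sketch; `peer_for` is a hypothetical helper name used only for illustration and is not part of the plugin.

```shell
#!/bin/sh
# Map a Gst-NvDsUcx sink element to the source element it pairs with.
# peer_for is a hypothetical helper, not something the plugin provides.
peer_for() {
    case "$1" in
        nvdsucxserversink) echo nvdsucxclientsrc ;;   # sink side starts first
        nvdsucxclientsink) echo nvdsucxserversrc ;;   # source side starts first
        *) echo "unknown sink element: $1" >&2; return 1 ;;
    esac
}

peer_for nvdsucxserversink
```

In both pairs, the element with "server" in its name belongs to the half of the pipeline that is launched first.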

Requirements

The Gst-NvDsUcx plugin has the following requirements, in addition to the DeepStream 6.3 SDK requirements:

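NVIDIA ConnectX6-DX NIC or later.
- For more information on installing and configuring the NIC, see: https://docs.nvda.net.cn/networking/display/ConnectX6VPI/Introduction
Mellanox Open Fabrics Enterprise Distribution (MLNX_OFED) 5.5 or later; see https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/
- For installation instructions, see https://docs.nvda.net.cn/networking/display/MLNXOFEDv551032/Installing+MLNX_OFED
- If you install Mellanox OFED in a container:
  - Make sure the kernel drivers are installed on the host OS by passing the --all flag to the mlnxofedinstall script.
  - Inside the container, install only the user-space libraries by passing the --user-space-only flag to the mlnxofedinstall script.
UCX 1.13 or later. UCX must be compiled with CUDA support; alternatively, use a CUDA-enabled UCX package directly from the git repository, see openucx/ucx
- For installation instructions, follow the Release build instructions at openucx/ucx. Note that the UCX library should be compiled with CUDA, as follows: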
  $ ./contrib/configure-release --prefix=/install/path --enable-examples --with-java=no --with-cuda=/path/to/cuda --enable-mt
Docker container support
- If you want to use the plugin inside a container, add the following flags to the docker run command:
  - --privileged --network host
  - --cap-add CAP_SYS_PTRACE --shm-size="8g"
  - --device=/dev/infiniband/uverbs0
  - --device=/dev/infiniband/rdma_cm
  - --ipc=host
  - -e CUDA_CACHE_DISABLE=0
  - -v /dev/infiniband:/dev/infiniband
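For additional metadata handling, Gst-NvDsUcx depends on the serialization libraries provided by the Gst-NvDsMetaUtils plugin. To configure and install the serialization libraries, see the Gst-NvDsMetaUtils documentation.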
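
As an illustration of the Docker flags listed above, the sketch below assembles them into a single docker run command and prints it for review. The image name is a placeholder, and `-it --rm` are additions made here for interactive use; neither comes from the requirements list. Any flags your setup needs for GPU access are also not covered by the list and are left out.

```shell
#!/bin/sh
# Assemble a docker run command from the flags in the requirements list.
# IMAGE is a placeholder; substitute the DeepStream image you actually use.
IMAGE="${IMAGE:-your-deepstream-image:tag}"

# docker_run_cmd is a hypothetical helper that only prints the command.
docker_run_cmd() {
    echo docker run -it --rm \
        --privileged --network host \
        --cap-add CAP_SYS_PTRACE --shm-size=8g \
        --device=/dev/infiniband/uverbs0 \
        --device=/dev/infiniband/rdma_cm \
        --ipc=host \
        -e CUDA_CACHE_DISABLE=0 \
        -v /dev/infiniband:/dev/infiniband \
        "$IMAGE"
}

# Print the command so it can be checked before it is run.
docker_run_cmd
```

Printing the command first makes it easy to confirm that the InfiniBand device nodes named in the flags exist on the host before starting the container.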

Note

This plugin is supported only on x86_64 platforms.

Inputs and Outputs

Inputs (for nvdsucxserversink or nvdsucxclientsink)

Any one of the following:
- NV12/RGBA NVMM Gst Buffer + (NvDsBatchMeta + serialized NvDsUserMeta/Gst Meta, optional)
- NVMM or raw audio buffer + (NvDsBatchMeta, optional)
- Raw text Gst buffer
Control parameters
- addr
- port
- buf-type
- gpu-id
- raw-buf-size
- nvbuf-memory-type
- num-nvbuf
- nvbuf-batch-size
- num-conns
Outputs (from nvdsucxserversrc or nvdsucxclientsrc)

Any one of the following:
- NV12/RGBA NVMM Gst Buffer + (NvDsBatchMeta + serialized video NvDsUserMeta/Gst Meta, optional)
- NVMM or raw audio buffer + (NvDsBatchMeta + serialized audio NvDsUserMeta/Gst Meta, optional)
- Raw text Gst buffer

Gst Properties

The Gst-nvdsucx plugin has the following properties, depending on the element type used:

Gst-nvdsucx gst properties

Property	Element type	Description	Type and range	Example
addr	Server	IP address that clients connect to	String. Default: 127.0.0.1	addr = 192.168.100.1
addr	Client	IP address of the server	String. Default: 127.0.0.1	addr = 192.168.100.1
port	Server	Port on which to listen for client connections	Integer, 0 - 65535. Default: 7174	port = 4000
port	Client	Port number of the server	Integer, 0 - 65535. Default: 7174	port = 4000
buf-type	All	Type of data handled by UCX: 0 - video; 1 - audio; 2 - raw-audio; 4 - text	Integer. Default: 0	buf-type = 0
gpu-id	Source	ID of the GPU to use	Integer, 0 - 4294967295. Default: 0	gpu-id=0
raw-buf-size	All	Size of the raw buffer to allocate	Integer, 0 - 8192. Default: 8192	raw-buf-size=1024
nvbuf-memory-type	Source	NvBufSurface memory type to allocate for output buffers: 0 - default memory; 1 - cuda-pinned (pinned/host CUDA memory); 2 - cuda-device (device CUDA memory); 3 - cuda-unified (unified CUDA memory)	Integer. Default: 3	nvbuf-memory-type = 2
num-nvbuf	Source	Number of Nv buffers to allocate	Integer, 0 - 10. Default: 4	num-nvbuf = 8
nvbuf-batch-size	All	Maximum batch size of the Nv buffers	Integer, 1 - 2147483647. Default: 1	nvbuf-batch-size = 4
num-conns	ServerSink	Number of client connections to expect [1]	Integer, 1 - 4. Default: 1	num-conns = 2
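
The integer ranges in the property table can be checked before a pipeline is launched. The sketch below is illustrative only: `in_range` and `check_ucx_props` are hypothetical helpers, the ranges for num-nvbuf and num-conns are copied from the table, and the port bound is assumed to be 65535, the largest valid port number.

```shell
#!/bin/sh
# in_range VALUE MIN MAX: succeed when VALUE is a non-negative integer
# that lies within [MIN, MAX].
in_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# check_ucx_props PORT NUM_NVBUF NUM_CONNS (hypothetical helper)
check_ucx_props() {
    in_range "$1" 0 65535 || { echo "port out of range: $1" >&2; return 1; }
    in_range "$2" 0 10    || { echo "num-nvbuf out of range: $2" >&2; return 1; }
    in_range "$3" 1 4     || { echo "num-conns out of range: $3" >&2; return 1; }
}

check_ucx_props 4000 8 2 && echo "properties ok"
```

A check like this catches a mistyped value before the element rejects it when the pipeline starts.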

Footnotes

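Examples

DeepStream SDK 6.1+ includes three examples that show how to use the Gst-NvDsUcx plugin to split a GStreamer pipeline so that its parts run in separate processes or on separate servers. Each example has a server program and a client program, which run different parts of the pipeline. Always start the server program before the client program.

Example 1

This example shows how to send and receive video data in a GStreamer pipeline using the serversink and clientsrc elements of the Gst-NvDsUcx plugin. The pipeline uses the uridecodebin and nvvideoconvert plugins to pass video frames, in the format set by the caps filter, to the serversink element. The serversink forwards this video data over RDMA to the clientsrc element on another node or process, and the clientsrc element then forwards the data to a video converter. Finally, the data is encoded and stored in a file.

On DS node 1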

gst-launch-1.0 uridecodebin uri="file:///sample_1080p.mp4" async-handling=1 name=src1 src1. ! \
queue ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080' ! \
nvdsucxserversink addr=192.168.100.1 port=4000 buf-type=nvdsucx-buf-video

On DS node 2

gst-launch-1.0 nvdsucxclientsrc addr=192.168.100.1 port=4000 nvbuf-memory-type=2 num-nvbuf=4 buf-type=nvdsucx-buf-video ! \
'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080,framerate=30/1' ! \
queue ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! qtmux name=mux_0 ! \
filesink sync=1 async=0 qos=0 location=~/out_1080p.mp4

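Example 2

This example shows how to distribute a DS pipeline with the Gst-NvDsUcx plugin and use the serialization/deserialization components to send serialized data over an RDMA network. The DeepStream pipeline here consists of a streammux plugin that takes its input from a decoded filesrc. The streammux passes frames to the nvinfer plugin, which identifies certain objects in the frames and adds metadata to them. The serialization plugin (nvdsmetainsert, part of the Gst-NvDsMetaUtils library) creates a binary object corresponding to the metadata and adds it to the frame. The clientsink and serversrc elements are used here to demonstrate the flexibility of the Gst-NvDsUcx setup. The clientsink sends the extra metadata together with the video frames to the serversrc over RDMA.

The serversrc then forwards the data to the deserialization plugin (nvdsmetaextract), which extracts it so that the metadata is attached to the frame correctly. The nvdsosd plugin interprets the metadata (bounding boxes), and the result is encoded and stored in a file.

On DS node 1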

gst-launch-1.0 filesrc location=~/sample_1080p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 ! \
nvvideoconvert ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-6.1/samples/configs/deepstream-app/config_infer_primary.txt ! \
nvdsmetainsert serialize-lib = "/opt/nvidia/deepstream/deepstream-6.1/lib/libnvds_video_metadata_serialization.so" ! \
nvdsucxclientsink addr=192.168.100.1 port=4000 buf-type=nvdsucx-buf-video

On DS node 2

gst-launch-1.0 nvdsucxserversrc addr=192.168.100.1 port=4000 nvbuf-memory-type=2 num-nvbuf=8 buf-type=nvdsucx-buf-video nvbuf-batch-size=1 ! \
'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080,framerate=30/1' ! nvvideoconvert ! \
nvdsmetaextract deserialize-lib = "/opt/nvidia/deepstream/deepstream-6.1/lib/libnvds_video_metadata_serialization.so" ! \
nvdsosd ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=~/out_1080p.mp4

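Example 3

This example shows how to use Gst-NvDsUcx and the audio metadata serialization plugin (part of Gst-NvDsMetaUtils) to distribute audio data in a DS pipeline across processes or nodes. The streammux plugin interprets the audio data from the audio plugins and forwards it to the Gst-NvDsUcx plugin. As with the video metadata serialization plugin in Example 2, the audio metadata serialization plugin creates a binary object, which the serversink element forwards to the clientsrc element. The audio metadata is then extracted and added to the buffer for downstream plugins to interpret.

The streammux and streamdemux plugins support audio only in their new version, so the USE_NEW_NVSTREAMMUX environment variable must be set before running the example.

On DS node 1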
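
The commands in this example set `USE_NEW_NVSTREAMMUX=yes` inline on each gst-launch-1.0 invocation. When several pipelines are started from one script, the variable can instead be exported once. The following is a minimal sketch; `ensure_new_streammux` is a hypothetical helper name.

```shell
#!/bin/sh
# Export USE_NEW_NVSTREAMMUX=yes unless it already has that value.
# ensure_new_streammux is a hypothetical helper used for illustration.
ensure_new_streammux() {
    if [ "${USE_NEW_NVSTREAMMUX:-}" != "yes" ]; then
        export USE_NEW_NVSTREAMMUX=yes
    fi
}

ensure_new_streammux
echo "USE_NEW_NVSTREAMMUX=$USE_NEW_NVSTREAMMUX"
```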

USE_NEW_NVSTREAMMUX=yes gst-launch-1.0 uridecodebin uri="file:///sample_1080p_h264.mp4" ! audioconvert ! \
audioresample ! 'audio/x-raw,format=F32LE,rate=48000,channels=1,layout=interleaved' ! audiobuffersplit ! \
a_streammux.sink_0 nvstreammux name=a_streammux batch-size=1 sync-inputs=1 max-latency=250000000 ! \
nvdsmetainsert serialize-lib="libnvds_audio_metadata_serialization.so" ! \
nvdsucxserversink addr=192.168.100.2 port=4000 sync=1 async=0 buf-type=nvdsucx-buf-nv-audio

On DS node 2

USE_NEW_NVSTREAMMUX=yes gst-launch-1.0 nvdsucxclientsrc addr=192.168.100.2 port=4000 nvbuf-memory-type=2 num-nvbuf=4 buf-type=nvdsucx-buf-nv-audio ! \
'audio/x-raw(memory:NVMM),format=F32LE,rate=48000,channels=1,layout=interleaved' ! \
nvdsmetaextract deserialize-lib = "libnvds_audio_metadata_serialization.so" ! nvstreamdemux name=asd asd.src_0 ! \
audioconvert ! "audio/x-raw,format=S16LE" ! wavenc ! filesink sync=0 async=1 qos=0 location=out.wav
