Video Pipeline Reading Labelled Videos from a Directory

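In this example, we will go through the creation of a pipeline that uses the readers.video operator to read videos along with their labels. The pipeline returns a pair of outputs: a batch of sequences and the corresponding labels.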

For more information on the readers.video parameters, please look at the documentation.

Setting Up

First, let's start with the imports:

[1]:

import os
import numpy as np

from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

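We need some video containers to process. We can use the Sintel trailer, which is an mp4 container holding H.264 video, distributed under the Creative Commons license. We split it into 5-second clips and divided the clips into labelled groups. This can be done easily with the standalone ffmpeg tool.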
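The exact command used to prepare the Sintel clips is not given here, so the following is only a sketch assuming ffmpeg's `segment` muxer. The helper name `ffmpeg_split_command` and the output pattern are hypothetical. The function only builds the argument list, which can then be passed to `subprocess.run` when ffmpeg is installed.

```python
from typing import List


def ffmpeg_split_command(src: str, out_pattern: str, seconds: int = 5) -> List[str]:
    """Build an ffmpeg argv that cuts `src` into clips of roughly `seconds` each.

    Hypothetical helper; assumes ffmpeg's segment muxer is available.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-map", "0",                # keep all streams from the input
        "-c", "copy",               # no re-encoding, so cuts fall on keyframes
        "-f", "segment",            # write the output with the segment muxer
        "-segment_time", str(seconds),
        "-reset_timestamps", "1",   # each clip starts at timestamp 0
        out_pattern,                # e.g. "clip_%03d.mp4"
    ]


# Usage (requires ffmpeg on PATH; file names are placeholders):
# import subprocess
# subprocess.run(ffmpeg_split_command("sintel_trailer.mp4", "clip_%03d.mp4"), check=True)
```

Because the streams are copied rather than re-encoded, clip boundaries snap to keyframes, so clips are only approximately 5 seconds long. The resulting clips would then be moved into one subdirectory per label.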

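Then, we can set the parameters that will be used in the pipeline. The sequence_length parameter defines how many frames we want in each sequence sample.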

We can replace video_directory with any other directory containing labelled subdirectories and video container files recognized by FFmpeg.
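To make the expected layout concrete, here is a standard-library sketch that lists video files under a root directory and assigns one integer label per subdirectory. It assumes labels are 0-based indices into the sorted subdirectory names; this illustrates the convention and is not DALI's own code, so check the readers.video documentation for the exact numbering. `list_labelled_videos` and the extension set are hypothetical.

```python
import os
from typing import List, Tuple

# Hypothetical set of container extensions; FFmpeg recognizes many more.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov"}


def list_labelled_videos(root: str) -> List[Tuple[str, int]]:
    """Return (path, label) pairs, one label per subdirectory of `root`.

    Assumption: labels are 0-based indices into the sorted subdirectory names.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for label, name in enumerate(classes):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            # Skip anything that does not look like a video container
            if os.path.splitext(fname)[1].lower() in VIDEO_EXTENSIONS:
                samples.append((os.path.join(class_dir, fname), label))
    return samples
```

For example, a root containing `cats/a.mp4` and `dogs/b.mp4` would yield `cats/a.mp4` with label 0 and `dogs/b.mp4` with label 1 under this assumption.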

[2]:

batch_size = 2
sequence_length = 8
initial_prefetch_size = 11
video_directory = os.path.join(
    os.environ["DALI_EXTRA_PATH"], "db", "video", "sintel", "labelled_videos"
)
shuffle = True
n_iter = 6

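Note: The DALI_EXTRA_PATH environment variable should point to the location where the data from the DALI extra repository has been downloaded. Please make sure that the proper release tag is checked out.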
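If DALI_EXTRA_PATH is not set, `os.environ["DALI_EXTRA_PATH"]` fails with a bare `KeyError`. A small guard such as the hypothetical `sintel_video_dir` below gives a clearer message; it takes the environment mapping as an optional parameter so it is easy to test.

```python
import os
from typing import Mapping, Optional


def sintel_video_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Build the labelled-video path under DALI_EXTRA_PATH (hypothetical helper)."""
    env = os.environ if environ is None else environ
    extra = env.get("DALI_EXTRA_PATH")
    if not extra:
        # Fail early with an actionable message instead of a bare KeyError
        raise RuntimeError(
            "DALI_EXTRA_PATH is not set; point it at a checkout of the DALI extra repository"
        )
    return os.path.join(extra, "db", "video", "sintel", "labelled_videos")
```

With the variable set, `video_directory = sintel_video_dir()` produces the same path as the cell above.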

Running the Pipeline

We can then define a minimal Pipeline that directly outputs the readers.video outputs:

[3]:

@pipeline_def
def video_pipe(file_root):
    video, labels = fn.readers.video(
        device="gpu",
        file_root=file_root,
        sequence_length=sequence_length,
        random_shuffle=shuffle,  # use the shuffle flag defined above
        initial_fill=initial_prefetch_size,
    )
    return video, labels
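The DALI reader documentation describes `initial_fill` as the size of the buffer used for shuffling when `random_shuffle` is enabled. The following is a simplified plain-Python model of buffer-based shuffling, not DALI's implementation: the buffer is first filled with `initial_fill` items, then each output is drawn at random from the buffer and its slot is refilled from the input stream. `buffered_shuffle` is a hypothetical name.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], initial_fill: int, seed: int = 0) -> Iterator[T]:
    """Yield `items` in pseudo-random order using a fixed-size buffer (simplified model)."""
    if initial_fill < 1:
        raise ValueError("initial_fill must be >= 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < initial_fill:
            buffer.append(item)      # still filling the buffer
            continue
        idx = rng.randrange(len(buffer))
        yield buffer[idx]            # emit a random buffered item...
        buffer[idx] = item           # ...and put the new item in its slot
    rng.shuffle(buffer)              # drain the remaining items at the end
    yield from buffer
```

With `initial_fill=1` the model returns items in their original order, while larger buffers mix items from further apart. This trade-off is why a bigger buffer shuffles better but keeps more decoded sequences in memory.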

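Note: An important thing here is tuning initial_fill, which corresponds to the initial size of the Loader's prefetch buffer. Since this buffer is filled with initial_fill sequences, the total number of frames it holds can be very large. Set it accordingly to avoid running out of memory (OOM) during training.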
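A back-of-the-envelope estimate shows why. It assumes decoded frames are uint8 RGB at the 720×1280 resolution seen in the output below, and that the buffer holds exactly `initial_fill` sequences of `sequence_length` frames; this is a simplification, since the decoder's actual buffering may differ. `prefetch_buffer_bytes` is a hypothetical helper.

```python
def prefetch_buffer_bytes(initial_fill: int, sequence_length: int,
                          height: int, width: int, channels: int = 3,
                          bytes_per_value: int = 1) -> int:
    """Approximate size of a buffer holding `initial_fill` decoded sequences."""
    frame_bytes = height * width * channels * bytes_per_value
    return initial_fill * sequence_length * frame_bytes


size = prefetch_buffer_bytes(initial_fill=11, sequence_length=8, height=720, width=1280)
print(size, "bytes ≈", size / 2**20, "MiB")  # 243302400 bytes ≈ 232.03125 MiB
```

With the same assumptions, `initial_fill=1024` would need 22,649,241,600 bytes (about 21.1 GiB), which would not fit on many GPUs.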

Let's try to build and run a video_pipe instance on device 0. At each iteration it outputs batch_size sequences of sequence_length frames, plus batch_size labels.

[4]:

pipe = video_pipe(
    batch_size=batch_size,
    num_threads=2,
    device_id=0,
    file_root=video_directory,
    seed=12345,
)
pipe.build()
for i in range(n_iter):
    sequences_out, labels = pipe.run()
    sequences_out = sequences_out.as_cpu().as_array()
    labels = labels.as_cpu().as_array()
    print(sequences_out.shape)
    print(labels.shape)

(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
(2, 8, 720, 1280, 3)
(2, 1)
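The printed shapes follow directly from the parameters. The video output has the layout (batch, frames, height, width, channels), and the labels have one value per sample. The hypothetical helper `expected_shapes` below can serve as a sanity check. It assumes 3-channel (RGB) frames and takes the frame size as input, because the pipeline does not fix it.

```python
from typing import Tuple


def expected_shapes(batch_size: int, sequence_length: int,
                    height: int, width: int, channels: int = 3
                    ) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (video_shape, label_shape) for one batch, assuming FHWC frames."""
    video_shape = (batch_size, sequence_length, height, width, channels)
    label_shape = (batch_size, 1)  # one integer label per sequence
    return video_shape, label_shape


# Matches the shapes printed above for batch_size=2, sequence_length=8:
print(expected_shapes(2, 8, 720, 1280))  # ((2, 8, 720, 1280, 3), (2, 1))
```

Inside the loop, a line such as `assert sequences_out.shape == video_shape` could replace the prints once the frame size is known.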

Visualizing the Results

The previous iterations seem to have yielded batches of the expected shape. But let's visualize the results to be sure:

[5]:

sequences_out, labels = pipe.run()
sequences_out = sequences_out.as_cpu().as_array()
labels = labels.as_cpu().as_array()

We will use matplotlib to display the frames we obtained in the last batch.

[6]:

%matplotlib inline
from matplotlib import pyplot as plt
import matplotlib.gridspec as gridspec

[7]:

def show_sequence(sequence, label):
    columns = 4
    # Ceiling division: enough rows to hold every frame in the sequence
    rows = (len(sequence) + columns - 1) // columns
    fig = plt.figure(figsize=(32, (16 // columns) * rows))
    gs = gridspec.GridSpec(rows, columns)
    # Set the figure title once, outside the per-frame loop
    fig.suptitle("label " + str(label[0]), fontsize=30)
    for j in range(len(sequence)):
        plt.subplot(gs[j])
        plt.axis("off")
        plt.imshow(sequence[j])
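The number of grid rows should be the ceiling of the frame count divided by the number of columns, so that every frame gets a cell. The plain-Python sketch below compares that with the `(sequence_length + 1) // columns` expression. For 8 frames both give 2 rows. They diverge at 9 frames, where `(9 + 1) // 4 = 2` rows leave room for only 8 frames. `grid_rows` is a hypothetical helper.

```python
def grid_rows(num_frames: int, columns: int = 4) -> int:
    """Smallest number of rows that fits `num_frames` cells (ceiling division)."""
    return (num_frames + columns - 1) // columns


for n in (7, 8, 9, 12):
    floor_based = (n + 1) // 4
    print(n, grid_rows(n), floor_based)
# 7 2 2
# 8 2 2
# 9 3 2
# 12 3 3
```

When the grid has more cells than frames (7 frames in a 2×4 grid, for example), the plotting loop should iterate over the frames rather than over all cells to avoid indexing past the end of the sequence.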

Now let's generate 5 batches of sequence and label pairs, displaying the second sample of each batch:

[8]:

ITER = 5
for i in range(ITER):
    sequences_out, labels = pipe.run()
    sequences_out = sequences_out.as_cpu().as_array()
    labels = labels.as_cpu().as_array()
    show_sequence(sequences_out[1], labels[1])

[Output: five figures, one per iteration, each showing the frames of one sequence in a grid titled with its label.]