Data Loading: TensorFlow TFRecord

Overview

This example shows how to use data stored in the TensorFlow TFRecord format with DALI.

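Creating an Index

To read data stored in the TFRecord format, use the readers.tfrecord operator. In addition to the arguments common to all readers, such as random_shuffle, this operator takes the path, index_path, and features arguments.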

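path is a list of paths to the TFRecord files.
index_path is a list of paths to the index files. DALI uses index files mainly to shard the dataset correctly across multiple workers. The index for a TFRecord file can be generated from that file with the tfrecord2idx utility that ships with DALI. An index file needs to be created only once per TFRecord file.
features is a dictionary of (name, feature) pairs, where each feature (of type dali.tfrecord.Feature) describes the contents of the TFRecord. DALI features closely follow the TensorFlow types tf.FixedLenFeature and tf.VarLenFeature.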
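To make the role of the index concrete, the following is a minimal sketch in plain Python, not DALI's implementation. It assumes that an index holds one (byte offset, size) entry per record, which is how the tfrecord2idx output is understood here, and that each TFRecord entry is laid out as an 8-byte little-endian length, a 4-byte length CRC, the payload, and a 4-byte payload CRC. The helpers fake_record, build_index, and shard_bounds are hypothetical names used only for illustration, and the contiguous split in shard_bounds is one plausible sharding rule rather than a confirmed description of what the reader does.

```python
import io
import struct


def fake_record(payload):
    # Hypothetical helper: frames a payload like a TFRecord entry.
    # The CRC fields are zero-filled placeholders, not real checksums.
    return struct.pack("<Q", len(payload)) + b"\0" * 4 + payload + b"\0" * 4


def build_index(stream):
    """Collect (offset, size) for every record in a TFRecord-style byte stream."""
    entries = []
    while True:
        start = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            break  # end of stream
        (length,) = struct.unpack("<Q", header)
        # Skip the length CRC, the payload, and the payload CRC.
        stream.seek(4 + length + 4, io.SEEK_CUR)
        entries.append((start, stream.tell() - start))
    return entries


def shard_bounds(num_records, shard_id, num_shards):
    """Assumed contiguous split: record range [start, end) for one shard."""
    start = num_records * shard_id // num_shards
    end = num_records * (shard_id + 1) // num_shards
    return start, end


payloads = (b"abc", b"hello", b"x", b"yz")
stream = io.BytesIO(b"".join(fake_record(p) for p in payloads))
index = build_index(stream)

for shard_id in range(2):
    first, last = shard_bounds(len(index), shard_id, 2)
    # With the index, a worker can seek straight to its first record.
    print(f"shard {shard_id}: records {first}..{last - 1}, seek to byte {index[first][0]}")
```

Because each entry records where a record starts, a worker can jump directly to its own slice of the file instead of scanning from the beginning, which is why the index only has to be built once per file.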

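The DALI_EXTRA_PATH environment variable should point to the location where the data from the DALI extra repository was downloaded.

Important: Make sure that you check out the release tag that corresponds to your installed version of DALI.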

[1]:

from subprocess import call
import os

# DALI_EXTRA_PATH must point to a local copy of the DALI extra repository.
test_data_root = os.environ["DALI_EXTRA_PATH"]
tfrecord = os.path.join(test_data_root, "db", "tfrecord", "train")
batch_size = 16
tfrecord_idx = "idx_files/train.idx"
tfrecord2idx_script = "tfrecord2idx"

# Create the directory for the index file if it does not exist yet.
os.makedirs("idx_files", exist_ok=True)

# Generate the index only once; later runs reuse the existing file.
if not os.path.isfile(tfrecord_idx):
    call([tfrecord2idx_script, tfrecord, tfrecord_idx])

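Defining and Running the Pipeline

Define a simple pipeline that takes images stored in the TFRecord format, decodes them, and prepares them for ingestion into a deep learning framework.

Image processing consists of cropping, normalization, and HWC -> CHW conversion.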
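The three processing steps can be sketched in NumPy for a single image. This is an illustration of the arithmetic, not the DALI operator: the function name crop_normalize_to_chw is hypothetical, and the center-crop placement is an assumption made for this sketch.

```python
import numpy as np


def crop_normalize_to_chw(img_hwc, crop=(224, 224),
                          mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0)):
    """Center-crop an HWC image, normalize per channel, and return CHW."""
    crop_h, crop_w = crop
    height, width, _ = img_hwc.shape
    if height < crop_h or width < crop_w:
        raise ValueError("image is smaller than the crop window")
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    window = img_hwc[top:top + crop_h, left:left + crop_w].astype(np.float32)
    # Per-channel normalization broadcasts over the last (channel) axis.
    normalized = (window - np.asarray(mean, dtype=np.float32)) / np.asarray(
        std, dtype=np.float32
    )
    # HWC -> CHW: move the channel axis to the front.
    return np.transpose(normalized, (2, 0, 1))


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 341, 3), dtype=np.uint8)
out = crop_normalize_to_chw(image)
print(out.shape, out.dtype)
```

With a mean of 0 and a standard deviation of 1, normalization only converts the pixel values to floating point, so they stay in the 0 to 255 range.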

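The images in the TFRecord file used in this example have not been upscaled to a common size. As a result, cropping fails whenever an image is smaller than the crop window. To avoid this, apply the Resize operation before cropping. This step ensures that the shorter side of every image passed to the crop is 256 pixels.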
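The arithmetic behind this step can be checked with a few lines of plain Python. The sketch assumes the aspect ratio is preserved and that sizes are rounded to the nearest pixel; DALI's exact rounding may differ. The helper names resize_shorter_side and crop_fits are hypothetical.

```python
def resize_shorter_side(height, width, target=256):
    """Scale (height, width) so that the shorter side equals target."""
    scale = target / min(height, width)
    return round(height * scale), round(width * scale)


def crop_fits(height, width, crop=(224, 224)):
    """True if a crop window of the given size fits inside the image."""
    return height >= crop[0] and width >= crop[1]


original = (100, 200)  # smaller than the 224 x 224 crop window
resized = resize_shorter_side(*original)

print(original, crop_fits(*original))
print(resized, crop_fits(*resized))
```

After the resize, both sides are at least 256 pixels, so a 224 x 224 window always fits regardless of the original image size.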

[2]:

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.fn as fn
import nvidia.dali.types as types
import nvidia.dali.tfrecord as tfrec
import numpy as np

pipe = Pipeline(batch_size=batch_size, num_threads=4, device_id=0)
with pipe:
    inputs = fn.readers.tfrecord(
        path=tfrecord,
        index_path=tfrecord_idx,
        features={
            "image/encoded": tfrec.FixedLenFeature((), tfrec.string, ""),
            "image/class/label": tfrec.FixedLenFeature([1], tfrec.int64, -1),
            "image/class/text": tfrec.FixedLenFeature([], tfrec.string, ""),
            "image/object/bbox/xmin": tfrec.VarLenFeature(tfrec.float32, 0.0),
            "image/object/bbox/ymin": tfrec.VarLenFeature(tfrec.float32, 0.0),
            "image/object/bbox/xmax": tfrec.VarLenFeature(tfrec.float32, 0.0),
            "image/object/bbox/ymax": tfrec.VarLenFeature(tfrec.float32, 0.0),
        },
    )
    jpegs = inputs["image/encoded"]
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    resized = fn.resize(images, device="gpu", resize_shorter=256.0)
    output = fn.crop_mirror_normalize(
        resized,
        dtype=types.FLOAT,
        crop=(224, 224),
        mean=[0.0, 0.0, 0.0],
        std=[1.0, 1.0, 1.0],
    )
    pipe.set_outputs(output, inputs["image/class/text"])

Build and run the pipeline.

[3]:

pipe.build()
pipe_out = pipe.run()

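To visualize the results, use the matplotlib library. It expects images in HWC format, whereas the pipeline output is in CHW format.

Note: CHW is the preferred format for most deep learning frameworks.
For visualization, transpose the images back to the HWC layout.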
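As a quick check of the axis order, the NumPy sketch below transposes a small CHW array to HWC and back. The array contents are arbitrary and stand in for an image.

```python
import numpy as np

# A tiny "image" with 3 channels, height 2, and width 4 in CHW layout.
chw = np.arange(3 * 2 * 4, dtype=np.float32).reshape(3, 2, 4)

# CHW -> HWC: new axis order is (height, width, channel).
hwc = np.transpose(chw, (1, 2, 0))

# HWC -> CHW: the inverse permutation restores the original layout.
back = np.transpose(hwc, (2, 0, 1))

print(chw.shape, hwc.shape, back.shape)
```

The permutation (1, 2, 0) is the one to use before plt.imshow, and (2, 0, 1) is its inverse.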

[4]:

import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt

%matplotlib inline


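def show_images(image_batch, labels):
    columns = 4
    # Round up so that every image in the batch gets a grid cell.
    rows = (batch_size + columns - 1) // columns
    fig = plt.figure(figsize=(32, (32 // columns) * rows))
    gs = gridspec.GridSpec(rows, columns)
    for j in range(batch_size):
        plt.subplot(gs[j])
        plt.axis("off")
        # Each label is an array of character codes; join them into a string.
        label_codes = labels.at(j)
        plt.title("".join([chr(item) for item in label_codes]))
        img_chw = image_batch.at(j)
        # matplotlib expects HWC layout and float values in the range [0, 1].
        img_hwc = np.transpose(img_chw, (1, 2, 0)) / 255.0
        plt.imshow(img_hwc)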

[5]:

images, labels = pipe_out
show_images(images.as_cpu(), labels)

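For more flexibility, VarLenFeature supports the partial_shape parameter. If it is provided, the data is reshaped to match its value, and the first dimension is inferred from the size of the data.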
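The reshaping rule can be illustrated with NumPy. This sketch does not call DALI: apply_partial_shape is a hypothetical helper that mimics the described behavior by letting the first dimension be inferred from the number of values.

```python
import numpy as np


def apply_partial_shape(flat_values, partial_shape):
    """Reshape flat data to (inferred, *partial_shape)."""
    flat = np.asarray(flat_values)
    inner = int(np.prod(partial_shape))
    if flat.size % inner != 0:
        raise ValueError("data size is not a multiple of the partial shape")
    # -1 asks NumPy to infer the first dimension from the data size.
    return flat.reshape((-1, *partial_shape))


values = np.arange(12, dtype=np.float32)  # 12 values read from one feature
reshaped = apply_partial_shape(values, (2, 3))
print(reshaped.shape)
```

Here 12 values with a partial shape of (2, 3) yield a first dimension of 2. A feature whose length is not a multiple of 6 cannot be reshaped this way, which the sketch reports as an error.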