Normalize Operator#

This example shows how to use the Normalize operator.

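Introduction#

Normalization is the process of shifting and scaling data values to match a desired distribution. It calculates the mean \(\mu\) and the standard deviation \(\sigma\) and modifies the data as follows: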

\[Y_i = \frac{X_i - \mu}{\sigma}\]

Normalize has more advanced features, which are explained later in this document.
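To make the formula above concrete, here is a minimal NumPy sketch of the same arithmetic. It is not DALI code; the function name `normalize` and the sample data are illustrative assumptions, and the standard deviation is the population one (`ddof=0`).

```python
import numpy as np


def normalize(x):
    """Shift and scale x so it has zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=np.float64)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0)
    return (x - mu) / sigma


# Illustrative data with mean 5 and standard deviation 2
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
result = normalize(data)
print(result)  # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]
```

Values below the mean become negative, which is why a normalized image shows many dark pixels when displayed directly.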

Using the `Normalize` Operator#

We need some boilerplate code to import DALI and a few other useful libraries, and to visualize the results.

[1]:

from nvidia.dali.pipeline import Pipeline
import math
import nvidia.dali.ops as ops
import nvidia.dali.fn as fn
import nvidia.dali.types as types
import nvidia.dali.backend as backend

batch_size = 10
image_filename = "../data/images"

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec


def display(outputs, idx, columns=2, captions=None):
    """Plot sample `idx` from each output batch in a grid of subplots."""
    rows = int(math.ceil(len(outputs) / columns))
    fig = plt.figure()
    fig.set_size_inches(16, 6 * rows)
    gs = gridspec.GridSpec(rows, columns)
    for i, out in enumerate(outputs):
        # GPU outputs must be copied to host memory before plotting
        if isinstance(out, backend.TensorListGPU):
            out = out.as_cpu()
        plt.subplot(gs[i])
        plt.axis("off")
        if captions is not None:
            plt.title(captions[i])
        plt.imshow(out.at(idx))


def show(pipe, idx, columns=2, captions=None):
    pipe.build()
    display(pipe.run(), idx, columns, captions)

A Simple Pipeline#

Create a simple pipeline that just loads some images and normalizes them, treating the image data as a flat array that contains 3*W*H numbers (3 for the RGB channels).

[2]:

pipe = Pipeline(batch_size=batch_size, num_threads=1, device_id=0)
with pipe:
    jpegs, _ = fn.readers.file(file_root=image_filename)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    norm = fn.normalize(images)

    pipe.set_outputs(images, norm)

[3]:

show(pipe, 1)

Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

../../_images/examples_general_normalize_6_1.svg

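Adjusting Output Dynamic Range#

As you can see in the example above, the image intensity values have been scaled and shifted, with many pixels forced below 0 and displayed as black. You might want this result in many use cases, but if the output type has a limited dynamic range (for example, uint8), you might want to map the mean and standard deviation to values that use that limited range more effectively. For this purpose, Normalize offers two scalar arguments, shift and scale. The normalization formula now becomes: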

\[Y_i = \frac{X_i - \mu}{\sigma} \cdot {scale} + {shift}\]

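Modify the pipeline to produce uint8 output, with the mean mapped to 128 and the standard deviation to 64, which allows values in the \(\mu \pm 2\sigma\) range to be correctly represented in the output.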

[4]:

pipe = Pipeline(batch_size=batch_size, num_threads=1, device_id=0)
with pipe:
    jpegs, _ = fn.readers.file(file_root=image_filename)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    norm = fn.normalize(images, scale=64, shift=128, dtype=types.UINT8)

    pipe.set_outputs(images, norm)

[5]:

show(pipe, 1)

../../_images/examples_general_normalize_9_0.svg
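The effect of scale and shift can be checked on a small array with NumPy. This is a sketch of the arithmetic only, not of DALI itself: the helper `normalize_to_uint8` is a hypothetical name, and it assumes the conversion to uint8 rounds to the nearest integer and saturates at the ends of the range.

```python
import numpy as np


def normalize_to_uint8(x, scale=64.0, shift=128.0):
    """Normalize x, then map the mean to `shift` and one sigma to `scale`."""
    x = np.asarray(x, dtype=np.float64)
    y = (x - x.mean()) / x.std() * scale + shift
    # Assumption: round to nearest and saturate to the uint8 range [0, 255]
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)


# Illustrative data with mean 5 and standard deviation 2
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
out = normalize_to_uint8(data)
print(out)  # [ 32  96  96  96 128 128 192 255]
```

The last element sits exactly at \(\mu + 2\sigma\), which maps to 256 and is clamped to 255, so the representable range ends just short of two standard deviations above the mean.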

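Directional Reductions#

For multidimensional data, it might be useful to calculate the mean and standard deviation for only a subset of the dimensions. For example, the dimensions might correspond to the height (0), width (1) and color channels (2) of an image. Reducing dimensions 0 and 1 (height and width) produces a separate mean and standard deviation for each channel. Normalize supports two arguments to specify the directions:

axes - a tuple of dimension indices, with 0 being the outermost.
axis_names - axis symbols that are looked up in the input layout.

The following example normalizes the data along WC, H, HW and C.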

[6]:

pipe = Pipeline(batch_size=batch_size, num_threads=1, device_id=0)
with pipe:
    jpegs, _ = fn.readers.file(file_root=image_filename)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    normwc = fn.normalize(
        images, axes=(1, 2), scale=64, shift=128, dtype=types.UINT8
    )
    normh = fn.normalize(
        images, axis_names="H", scale=64, shift=128, dtype=types.UINT8
    )
    normhw = fn.normalize(
        images, axis_names="HW", scale=64, shift=128, dtype=types.UINT8
    )
    normc = fn.normalize(
        images, axes=(2,), scale=64, shift=128, dtype=types.UINT8
    )

    pipe.set_outputs(images, normwc, normh, normhw, normc)

[7]:

titles = [
    "Original",
    "Width and channels",
    "Height",
    "Height and width",
    "Channel",
]
show(pipe, 9, captions=titles)

../../_images/examples_general_normalize_12_0.svg
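The same reduction over a subset of axes can be sketched in NumPy, which may help when checking what a given axes or axis_names value means. The helpers `axes_from_names` and `normalize_axes` are hypothetical names introduced for illustration, and the random array stands in for a decoded HWC image.

```python
import numpy as np


def axes_from_names(layout, names):
    """Translate axis symbols such as "HW" into indices within a layout such as "HWC"."""
    return tuple(layout.index(name) for name in names)


def normalize_axes(x, axes):
    """Normalize x using statistics reduced over the given axes only."""
    x = np.asarray(x, dtype=np.float64)
    # keepdims=True keeps the reduced axes with extent 1 so the result broadcasts
    mu = x.mean(axis=axes, keepdims=True)
    sigma = x.std(axis=axes, keepdims=True)
    return (x - mu) / sigma


rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(4, 5, 3))  # stand-in for an HWC image

hw = axes_from_names("HWC", "HW")  # (0, 1)
per_channel = normalize_axes(image, hw)

# Each channel now has its own zero mean and unit standard deviation
print(per_channel.mean(axis=hw))
print(per_channel.std(axis=hw))
```

Reducing over height and width leaves one mean and one standard deviation per channel, which matches the "Height and width" panel in the example above.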

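Externally Provided Parameters#

By default, Normalize calculates the mean and standard deviation internally. However, these values can be provided externally through the mean and stddev arguments, and can be either scalar values or inputs. When mean or stddev are provided as inputs, the directions of reduction can be inferred from the shape of the argument. If both mean and stddev are inputs, they must have the same shape.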

[8]:

pipe = Pipeline(batch_size=batch_size, num_threads=1, device_id=0)
with pipe:
    jpegs, _ = fn.readers.file(file_root=image_filename)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    norm_mean = fn.normalize(
        images, mean=64, axis_names="HW", scale=64, shift=128, dtype=types.UINT8
    )
    norm_stddev = fn.normalize(
        images,
        stddev=200,
        axis_names="HW",
        scale=64,
        shift=128,
        dtype=types.UINT8,
    )

    pipe.set_outputs(images, norm_mean, norm_stddev)

[9]:

show(pipe, 1, captions=["Original", "Fixed mean", "Fixed standard deviation"])

../../_images/examples_general_normalize_15_0.svg
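One way to picture how a reduction direction can follow from the shape of an externally supplied parameter is the NumPy sketch below. It only illustrates the idea and makes no claim about how DALI implements it; `infer_reduced_axes` and `normalize_with` are hypothetical helpers, and the rule "an extent of 1 marks a reduced axis" is an assumption made for this example.

```python
import numpy as np


def infer_reduced_axes(data_shape, param_shape):
    """Axes where the parameter has extent 1 but the data does not (assumed rule)."""
    return tuple(
        i
        for i, (d, p) in enumerate(zip(data_shape, param_shape))
        if p == 1 and d != 1
    )


def normalize_with(x, mean, stddev):
    """Normalize x with externally supplied mean and stddev (scalars or arrays)."""
    x = np.asarray(x, dtype=np.float64)
    mean = np.asarray(mean, dtype=np.float64)
    stddev = np.asarray(stddev, dtype=np.float64)
    # Two non-scalar parameters are required to have the same shape
    if mean.ndim > 0 and stddev.ndim > 0 and mean.shape != stddev.shape:
        raise ValueError("mean and stddev must have the same shape")
    return (x - mean) / stddev


image = np.arange(24, dtype=np.float64).reshape(2, 4, 3)  # stand-in for HWC data

# Per-channel statistics with shape (1, 1, 3), computed outside the normalization
mean = image.mean(axis=(0, 1), keepdims=True)
stddev = image.std(axis=(0, 1), keepdims=True)

print(infer_reduced_axes(image.shape, mean.shape))  # (0, 1)
out = normalize_with(image, mean, stddev)
```

Because the parameters have shape (1, 1, 3), they broadcast over height and width, so no axes argument is needed in this sketch.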

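Batch Normalization#

Normalize can calculate the mean and standard deviation for the entire batch instead of on a per-item basis. You can enable this behavior by setting the batch argument to True. Batch normalization requires that the extents of the non-reduced dimensions match across all samples in the batch. For example, because the channels are normalized separately, the pipeline below expects all images to have three channels.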

[10]:

pipe = Pipeline(batch_size=batch_size, num_threads=1, device_id=0)
with pipe:
    jpegs, _ = fn.readers.file(file_root=image_filename)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    norm_sample = fn.normalize(
        images,
        batch=False,
        axis_names="HW",
        scale=64,
        shift=128,
        dtype=types.UINT8,
    )
    norm_batch = fn.normalize(
        images,
        batch=True,
        axis_names="HW",
        scale=64,
        shift=128,
        dtype=types.UINT8,
    )

    pipe.set_outputs(images, norm_sample, norm_batch)

[11]:

show(
    pipe,
    1,
    columns=3,
    captions=["Original", "Per-sample normalization", "Batch normalization"],
)
show(
    pipe,
    4,
    columns=3,
    captions=["Original", "Per-sample normalization", "Batch normalization"],
)
show(
    pipe,
    7,
    columns=3,
    captions=["Original", "Per-sample normalization", "Batch normalization"],
)
show(
    pipe,
    9,
    columns=3,
    captions=["Original", "Per-sample normalization", "Batch normalization"],
)

../../_images/examples_general_normalize_18_0.svg

../../_images/examples_general_normalize_18_1.svg

../../_images/examples_general_normalize_18_2.svg

../../_images/examples_general_normalize_18_3.svg
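The difference between per-sample and batch statistics can be sketched with NumPy for samples of different sizes. This is an illustration under stated assumptions rather than DALI's implementation: `batch_normalize_hw` is a hypothetical helper, the samples are random HWC arrays, and the statistics are pooled over the pixels of every sample, one value per channel.

```python
import numpy as np


def batch_normalize_hw(samples):
    """Normalize HWC samples with per-channel statistics shared by the whole batch."""
    channels = samples[0].shape[-1]
    # The non-reduced dimension (channels) must match across the batch
    if any(s.shape[-1] != channels for s in samples):
        raise ValueError("all samples must have the same number of channels")
    # Pool the pixels of every sample into one (total_pixels, channels) array
    pixels = np.concatenate(
        [np.asarray(s, dtype=np.float64).reshape(-1, channels) for s in samples]
    )
    mu = pixels.mean(axis=0)
    sigma = pixels.std(axis=0)
    return [(np.asarray(s, dtype=np.float64) - mu) / sigma for s in samples]


rng = np.random.default_rng(0)
batch = [
    rng.uniform(0, 100, size=(2, 3, 3)),    # darker sample
    rng.uniform(100, 255, size=(4, 2, 3)),  # brighter sample, different size
]
normalized = batch_normalize_hw(batch)

# Pooled over the batch, each channel has zero mean and unit standard deviation
pooled = np.concatenate([n.reshape(-1, 3) for n in normalized])
print(pooled.mean(axis=0), pooled.std(axis=0))
```

With shared statistics the darker sample stays darker than the brighter one after normalization, whereas per-sample normalization would center each of them separately.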