Geometric Transforms#

In this example, we demonstrate the operators from the transforms module and show how to use them to transform images and point clouds.

Affine Transform#

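The operators in the transforms module can generate and combine affine transform matrices of different kinds. An affine transform is defined by the formula: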

\[\begin{split}X_{out} = \begin{vmatrix} M & T \end{vmatrix} \begin{vmatrix} X_{in} \\ 1 \end{vmatrix}\end{split}\]

where \(X_{in}\) is an input point, \(X_{out}\) is the corresponding output, \(M\) is the linear part of the transformation, and \(T\) is the translation vector.

If the points are in 2D space, the formula can be written as:

\[\begin{split}\begin{vmatrix} x_{out} \\ y_{out} \end{vmatrix} = \begin{vmatrix} m_{00} & m_{01} & t_x \\ m_{10} & m_{11} & t_y \end{vmatrix} \begin{vmatrix} x_{in} \\ y_{in} \\ 1 \end{vmatrix}\end{split}\]
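The 2D formula can be checked outside of DALI with a few lines of NumPy. This is a minimal sketch, not DALI code: the helper name `apply_affine_2d` and the sample matrix are made up for illustration. It splits a 2x3 matrix into its linear part \(M\) and translation \(T\) and applies them to a list of points.

```python
import numpy as np


def apply_affine_2d(matrix, points):
    """Apply a 2x3 affine matrix [M | T] to an (N, 2) array of points."""
    matrix = np.asarray(matrix, dtype=np.float64)
    points = np.asarray(points, dtype=np.float64)
    linear = matrix[:, :2]  # M: the 2x2 linear part
    offset = matrix[:, 2]   # T: the translation vector
    # Row-vector form of X_out = M * X_in + T
    return points @ linear.T + offset


# Hypothetical transform: scale x by 2, then shift by (10, 20)
mt = [[2, 0, 10],
      [0, 1, 20]]
pts = [[0, 0], [1, 1], [3, 4]]
result = apply_affine_2d(mt, pts)
print(result)
```

Each output row is the transformed version of the corresponding input point; for example, (3, 4) becomes (2·3 + 10, 4 + 20) = (16, 24).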

Transform Catalog#

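There are several transforms available in the transforms module. Each of these operators can generate an affine transform matrix and combine it with a pre-existing transform. Here is the list of available transforms: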

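  • rotation - rotate by a given angle (in degrees) around a given point and, in 3D only, a given axis

  • translation - translate by a given offset

  • scale - scale by given factors

  • shear - shear by given factors or angles; there are 2 shear factors for 2D and 6 factors for 3D

  • crop - translate and scale so that the input corners (from_start, from_end) map to the output corners (to_start, to_end).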

The documentation of each operator contains detailed information about its parameters.
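To make the catalog concrete, here is a NumPy sketch of the 2D matrices these transforms correspond to, written as 3x3 homogeneous matrices. It is not DALI's implementation: the function names mirror the catalog only for readability, and the sign convention of the rotation and the layout of the shear factors are assumptions that should be checked against the operator documentation.

```python
import numpy as np


def translation(offset):
    tx, ty = offset
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=np.float64)


def scale(factors):
    sx, sy = factors
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=np.float64)


def rotation(angle_deg):
    # Assumed convention: counter-clockwise for a y-up coordinate system
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float64)


def shear(factors):
    # Assumed layout: shx shifts x in proportion to y, shy shifts y with x
    shx, shy = factors
    return np.array([[1, shx, 0], [shy, 1, 0], [0, 0, 1]], dtype=np.float64)


def crop(from_start, from_end, to_start, to_end):
    # Scale and translate so that from_start -> to_start, from_end -> to_end
    from_start, from_end, to_start, to_end = (
        np.asarray(v, dtype=np.float64)
        for v in (from_start, from_end, to_start, to_end)
    )
    s = (to_end - to_start) / (from_end - from_start)
    t = to_start - s * from_start
    return np.array([[s[0], 0, t[0]], [0, s[1], t[1]], [0, 0, 1]])


# Hypothetical region (10, 20)-(110, 220) mapped to a 400x400 window
m = crop((10, 20), (110, 220), (0, 0), (400, 400))
print(m @ np.array([10.0, 20.0, 1.0]))    # from_start lands on to_start
print(m @ np.array([110.0, 220.0, 1.0]))  # from_end lands on to_end
```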

There is also an operator, combine, which combines multiple affine transforms.
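Combining affine transforms amounts to multiplying their matrices, and the order matters. The sketch below is plain NumPy with a hypothetical `combine` helper; it assumes the inputs are applied to a point in the order they are listed, which means each later matrix is multiplied on the left.

```python
import numpy as np


def combine(*matrices):
    """Compose 3x3 homogeneous matrices so that they apply left to right."""
    result = np.eye(3)
    for m in matrices:
        # A transform applied later is multiplied on the left
        result = np.asarray(m, dtype=np.float64) @ result
    return result


scale2 = np.diag([2.0, 2.0, 1.0])               # scale by 2
shift = np.array([[1.0, 0.0, 5.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])             # translate by (5, 0)
p = np.array([1.0, 1.0, 1.0])                   # the point (1, 1)

scale_then_shift = combine(scale2, shift) @ p   # (1, 1) -> (2, 2) -> (7, 2)
shift_then_scale = combine(shift, scale2) @ p   # (1, 1) -> (6, 1) -> (12, 2)
print(scale_then_shift[:2], shift_then_scale[:2])
```

The two results differ, so the order in which transforms are listed is part of the definition of the combined transform.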

Case Study: Transforming Keypoints#

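To illustrate the capabilities of the transforms, we apply them to images with corresponding keypoint data - in this case, facial landmarks. We start by importing the necessary modules, defining the location of the data, and writing a utility that displays the images with the keypoints drawn on them.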

[1]:
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import math
import os

dali_extra_dir = os.environ["DALI_EXTRA_PATH"]
root_dir = os.path.join(dali_extra_dir, "db", "face_landmark")

# images are in JPEG format
image_files = ["{}.jpeg".format(i) for i in range(6)]
# keypoints are in NumPy files
keypoint_files = ["{}.npy".format(i) for i in range(6)]
[2]:
def show(images, landmarks):
    if hasattr(images, "as_cpu"):
        images = images.as_cpu()
    batch_size = len(images)

    import matplotlib.gridspec as gridspec

    fig = plt.figure(figsize=(16, 14))
    plt.suptitle(None)
    columns = 3
    rows = int(math.ceil(batch_size / columns))
    gs = gridspec.GridSpec(rows, columns)
    for i in range(batch_size):
        ax = plt.subplot(gs[i])
        plt.axis("off")
        plt.title("")
        img = images.at(i)
        r = 0.002 * max(img.shape[0], img.shape[1])
        for p in landmarks.at(i):
            circle = patches.Circle(p, r, color=(0, 1, 0, 1))
            ax.add_patch(circle)
        plt.imshow(img)

First, let's build a pipeline that only loads the images and keypoints, without any augmentations:

[3]:
@pipeline_def
def basic_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")
    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    return images, keypoints


pipe = basic_pipe(batch_size=6, num_threads=3, device_id=0)
[4]:
pipe.build()
images, keypoints = pipe.run()
[5]:
show(images, keypoints)
../../_images/examples_math_geometric_transforms_7_0.png

Adding Transforms to the Pipeline#

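In this step, we apply a transform to both the images and the keypoints. We use warp_affine to transform the images and coord_transform to transform the keypoints. By default, the warp_affine operator uses the transform matrix to perform inverse mapping: destination pixel coordinates are mapped to source coordinates. This effectively moves image features by the inverse of the transform matrix. To transform the keypoints and the images in the same way, we need to specify inverse_map=False in warp_affine.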
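The difference between the two mapping directions can be seen on a toy image. The following is a simplified NumPy sketch, not how warp_affine is implemented: the hypothetical helper `warp_forward_matrix` takes a matrix that maps source to destination (the inverse_map=False interpretation), inverts it, and then performs nearest-neighbour inverse mapping for every destination pixel.

```python
import numpy as np


def warp_forward_matrix(image, matrix, fill_value=0):
    """Nearest-neighbour warp; `matrix` (2x3) maps source -> destination."""
    h, w = image.shape
    full = np.vstack([np.asarray(matrix, dtype=np.float64), [0, 0, 1]])
    inverse = np.linalg.inv(full)  # destination -> source
    out = np.full_like(image, fill_value)
    for y in range(h):
        for x in range(w):
            sx, sy, _ = inverse @ np.array([x, y, 1.0])
            sx, sy = int(round(sx)), int(round(sy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = image[sy, sx]
    return out


image = np.zeros((4, 4), dtype=np.uint8)
image[1, 1] = 255                          # one bright feature at (x=1, y=1)
shift = [[1, 0, 2], [0, 1, 1]]             # forward transform: move by (+2, +1)
warped = warp_forward_matrix(image, shift)

# A keypoint transformed with the same matrix follows the image feature
keypoint = np.array([1.0, 1.0])
moved = np.asarray(shift, dtype=np.float64) @ np.append(keypoint, 1.0)
print(np.argwhere(warped == 255), moved)
```

The bright pixel ends up at row 2, column 3, which is the same location (x=3, y=2) that the keypoint reaches. If the matrix were used directly as the destination-to-source mapping, the feature would move in the opposite direction from the keypoint.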

[6]:
@pipeline_def
def rotate_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")
    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    mt = fn.transforms.rotation(angle=fn.random.uniform(range=(-45, 45)))
    images = fn.warp_affine(images, matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = rotate_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[7]:
show(images, keypoints)
../../_images/examples_math_geometric_transforms_10_0.png

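As we can see, the images have been rotated around the point (0, 0), which is the top-left corner. To rotate around the center, we can pass the additional center argument to transforms.rotation. To obtain the image shape, we must either use the dynamic executor (which allows the shape of a GPU tensor to be used by a CPU operator) or look up the image shape before decoding with the peek_image_shape operator.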
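Rotating around a point other than the origin is equivalent to moving that point to the origin, rotating, and moving it back. The NumPy sketch below illustrates this with a hypothetical helper `rotation_about` and a made-up image shape; it also shows how the `[1::-1]` slice turns an HWC shape into a (width, height) pair.

```python
import numpy as np


def rotation_about(angle_deg, center):
    """3x3 matrix rotating by `angle_deg` around `center` (assumed CCW sign)."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    cx, cy = center
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=np.float64)
    rotate = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float64)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=np.float64)
    # Applied right to left: shift to origin, rotate, shift back
    return back @ rotate @ to_origin


shape_hwc = np.array([480, 640, 3])  # hypothetical image shape (H, W, C)
size_wh = shape_hwc[1::-1]           # elements 1 and 0: (W, H)
center = size_wh / 2

mt = rotation_about(30, center)
moved_center = mt @ np.array([center[0], center[1], 1.0])
print(size_wh, moved_center[:2])
```

Whatever the angle, the center maps to itself, which is the property the center argument is used for.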

[8]:
@pipeline_def
def center_rotate_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")

    # look up the shape of the encoded images and convert them from HWC to WH
    size = fn.peek_image_shape(jpegs)[1::-1]
    center = size / 2

    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    mt = fn.transforms.rotation(
        angle=fn.random.uniform(range=(-45, 45)), center=center
    )
    images = fn.warp_affine(images, matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = center_rotate_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[9]:
show(images, keypoints)
../../_images/examples_math_geometric_transforms_13_0.png

Combining Transforms#

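We can also combine multiple transforms. This can be achieved in two ways:

  1. by passing an existing transform matrix as an input to a transform operator,

  2. by explicitly using transforms.combine.

In the example below, we apply a rotation followed by a horizontal translation.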
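As a worked instance, take a rotation by an angle \(\theta\) about the origin (a simplification; the pipeline below rotates about the image center) followed by a translation by (300, 0). In homogeneous form, and assuming the counter-clockwise sign convention for the rotation matrix, the transform that is applied second is multiplied on the left:

\[\begin{split}\begin{vmatrix} 1 & 0 & 300 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} \begin{vmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{vmatrix} = \begin{vmatrix} \cos\theta & -\sin\theta & 300 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{vmatrix}\end{split}\]

With the order reversed (translation first, then rotation), the last column would instead be \((300\cos\theta, 300\sin\theta, 1)\), so the offset itself would be rotated.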

[10]:
@pipeline_def(exec_dynamic=True)
def multi_transform_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")

    # with exec_dynamic=True, we can just use the images' shape directly
    size = images.shape()[1::-1]  # get WH from HWC shape
    center = size / 2

    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    mt = fn.transforms.rotation(
        angle=fn.random.uniform(range=(-45, 45)), center=center
    )
    mt = fn.transforms.translation(mt, offset=(300, 0))
    images = fn.warp_affine(images, matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = multi_transform_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[11]:
show(images, keypoints)
../../_images/examples_math_geometric_transforms_16_0.png

Using transforms.combine to Combine Multiple Transforms#

This section demonstrates how to use the combine operator with the results of other transforms and with constants.
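Assuming combine applies its inputs in the order in which they are listed, the pipeline below corresponds to the following product, where \(T_{c}\) denotes a translation by the image center \(c\), \(R_{\theta}\) the random rotation, and \(S\) the constant matrix passed as a NumPy array (the symbols are introduced here for illustration only):

\[\begin{split}X_{out} = T_{c} \, S \, R_{\theta} \, T_{-c} \, X_{in}, \qquad S = \begin{vmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix}\end{split}\]

The center is first moved to the origin by \(T_{-c}\), is left in place by both \(R_{\theta}\) and \(S\), and is moved back by \(T_{c}\), so the image center is a fixed point of the combined transform.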

[12]:
@pipeline_def(exec_dynamic=True)
def transform_combine_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")

    size = images.shape()[1::-1]  # get WH from HWC shape
    center = size / 2

    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)
    tr1 = fn.transforms.translation(offset=-center)
    tr2 = fn.transforms.translation(offset=center)
    rot = fn.transforms.rotation(angle=fn.random.uniform(range=(-45, 45)))
    mt = fn.transforms.combine(
        tr1, rot, np.float32([[1, 1, 0], [0, 1, 0]]), tr2
    )
    images = fn.warp_affine(images, matrix=mt, fill_value=0, inverse_map=False)
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = transform_combine_pipe(
    batch_size=6, num_threads=3, device_id=0, seed=1234
)
pipe.build()
images, keypoints = pipe.run()
[13]:
show(images, keypoints)
../../_images/examples_math_geometric_transforms_19_0.png

Keypoint Cropping#

In the following example, we apply some randomized transforms and crop the result so that the face is at the center of the output image.

[14]:
@pipeline_def
def crop_pipe():
    jpegs, _ = fn.readers.file(file_root=root_dir, files=image_files)
    images = fn.decoders.image(jpegs, device="mixed")
    keypoints = fn.readers.numpy(file_root=root_dir, files=keypoint_files)

    # This part defines the augmentations: shear + rotation
    mt = fn.transforms.shear(shear=fn.random.uniform(range=(-1, 1), shape=[2]))
    mt = fn.transforms.rotation(mt, angle=fn.random.uniform(range=(-45, 45)))

    # Now, let's see where the keypoints would be after applying this transform
    uncropped = fn.coord_transform(keypoints, MT=mt)

    # Find the bounding box of the keypoints
    lo = fn.reductions.min(uncropped, axes=[0])
    hi = fn.reductions.max(uncropped, axes=[0])
    # ...and get its larger extent (width or height)
    size = fn.reductions.max(hi - lo)
    center = (lo + hi) / 2
    # make a square region centered at the center of the bounding box
    lo = center - size  # full size - this adds 50% margin
    hi = center + size  # likewise

    # Now we can calculate a crop transform that will map the bounding box to
    # a 400x400 window and combine it with the previous transform.
    mt = fn.transforms.crop(
        mt, from_start=lo, from_end=hi, to_start=[0, 0], to_end=[400, 400]
    )

    # Apply the transform to the images (output size 400x400) and keypoints.
    images = fn.warp_affine(
        images, size=[400, 400], matrix=mt, fill_value=0, inverse_map=False
    )
    keypoints = fn.coord_transform(keypoints, MT=mt)
    return images, keypoints


pipe = crop_pipe(batch_size=6, num_threads=3, device_id=0, seed=1234)
pipe.build()
images, keypoints = pipe.run()
[15]:
show(images, keypoints)
../../_images/examples_math_geometric_transforms_22_0.png
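The bounding-box arithmetic of the crop step can be reproduced with NumPy to check where the keypoints end up. This sketch leaves out the random shear and rotation and works on made-up keypoints; `crop_matrix_for_keypoints` is a hypothetical helper that follows the same steps as the pipeline: bounding box, larger extent, square region around the center, and a scale-and-translate matrix mapping that region to the output window.

```python
import numpy as np


def crop_matrix_for_keypoints(points, out_size=400.0):
    """2x3 matrix mapping a square region around `points` to the output."""
    points = np.asarray(points, dtype=np.float64)
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    size = (hi - lo).max()       # larger extent (width or height)
    center = (lo + hi) / 2
    lo = center - size           # full size on each side: 50% margin
    hi = center + size
    scale = out_size / (hi - lo)
    offset = -scale * lo         # the region's corner `lo` maps to (0, 0)
    return np.array([[scale[0], 0.0, offset[0]],
                     [0.0, scale[1], offset[1]]])


# Made-up keypoints in (x, y) order
pts = np.array([[100.0, 200.0], [300.0, 250.0], [200.0, 300.0]])
mt = crop_matrix_for_keypoints(pts)
cropped = pts @ mt[:, :2].T + mt[:, 2]
print(cropped)
```

After the crop, the bounding box of the keypoints is centered at (200, 200), the middle of the 400x400 output, and every keypoint lies inside the window.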