Reinterpreting Tensors

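Sometimes the data in a tensor needs to be interpreted as if it had a different type or shape. For example, reading a binary file into memory produces a flat tensor of byte-valued data, which the application code may want to interpret as an array with a specific shape and possibly a different type.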

DALI provides the following operations, which affect tensor metadata (shape, type, layout):

reshape
reinterpret
squeeze
expand_dims

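These operations neither modify nor copy the data. The output tensor is just another view of the same region of memory, which makes these operations very cheap.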
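
The no-copy behavior resembles a NumPy view. The sketch below uses NumPy only as an analogy (it does not run DALI) to show that a reshaped array can share its buffer with the source array:

```python
import numpy as np

# NumPy analogy only: DALI is not involved here.
flat = np.arange(6, dtype=np.int32)
view = flat.reshape(2, 3)  # new shape metadata over the same buffer

# Both arrays refer to the same memory, so no data was copied.
shares_buffer = np.shares_memory(flat, view)

# Writing through the view is visible through the original array.
view[0, 0] = 42
print(shares_buffer, flat[0])
```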

Fixed Output Shape

This example demonstrates the simplest use of the reshape operation: assigning a new fixed shape to an existing tensor.

First, we import DALI and the other necessary modules and define a utility for displaying the data, which is used throughout this tutorial.

[1]:

import nvidia.dali as dali
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def
import nvidia.dali.types as types
import numpy as np


def show_result(outputs, names=["Input", "Output"], formatter=None):
    if not isinstance(outputs, tuple):
        # Wrap a single output in a tuple, keeping the caller's options.
        return show_result((outputs,), names, formatter)

    outputs = [
        out.as_cpu() if hasattr(out, "as_cpu") else out for out in outputs
    ]

    for i in range(len(outputs[0])):
        print(f"---------------- Sample #{i} ----------------")
        for o, out in enumerate(outputs):
            a = np.array(out[i])
            s = "x".join(str(x) for x in a.shape)
            if names is not None and o < len(names):
                title = names[o]
            else:
                title = f"Output #{o}"
            l = out.layout()
            if l:
                l += " "
            print(f"{title} ({l}{s})")
            np.set_printoptions(formatter=formatter)
            print(a)


def rand_shape(dims, lo, hi):
    return list(np.random.randint(lo, hi, [dims]))

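Now let's define our pipeline. It takes data from an external source and returns it both in its original form and reshaped to a fixed square shape [5, 5]. Additionally, the layout of the output tensor is set to HW.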

[2]:

@pipeline_def(device_id=0, num_threads=4, batch_size=3)
def example1(input_data):
    np.random.seed(1234)
    inp = fn.external_source(input_data, batch=False, dtype=types.INT32)
    return inp, fn.reshape(inp, shape=[5, 5], layout="HW")


pipe1 = example1(lambda: np.random.randint(0, 10, size=[25], dtype=np.int32))
pipe1.build()
show_result(pipe1.run())

---------------- Sample #0 ----------------
Input (25)
[3 6 5 4 8 9 1 7 9 6 8 0 5 0 9 6 2 0 5 2 6 3 7 0 9]
Output (HW 5x5)
[[3 6 5 4 8]
 [9 1 7 9 6]
 [8 0 5 0 9]
 [6 2 0 5 2]
 [6 3 7 0 9]]
---------------- Sample #1 ----------------
Input (25)
[0 3 2 3 1 3 1 3 7 1 7 4 0 5 1 5 9 9 4 0 9 8 8 6 8]
Output (HW 5x5)
[[0 3 2 3 1]
 [3 1 3 7 1]
 [7 4 0 5 1]
 [5 9 9 4 0]
 [9 8 8 6 8]]
---------------- Sample #2 ----------------
Input (25)
[6 3 1 2 5 2 5 6 7 4 3 5 6 4 6 2 4 2 7 9 7 7 2 9 7]
Output (HW 5x5)
[[6 3 1 2 5]
 [2 5 6 7 4]
 [3 5 6 4 6]
 [2 4 2 7 9]
 [7 7 2 9 7]]

As we can see, the numbers from the flat input tensor have been rearranged into a 5x5 matrix.
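
The rearrangement follows row-major order: consecutive input elements fill each output row in turn. As a rough NumPy analogy (not DALI code), using the values of sample #0 above:

```python
import numpy as np

# Values copied from sample #0 in the output above.
flat = np.array(
    [3, 6, 5, 4, 8, 9, 1, 7, 9, 6, 8, 0, 5, 0, 9, 6, 2, 0, 5, 2, 6, 3, 7, 0, 9],
    dtype=np.int32,
)

# A fixed shape requires the element counts to match: 5 * 5 == 25.
matrix = flat.reshape(5, 5)

# Element (row, col) of the matrix is element row * 5 + col of the flat input.
row, col = 2, 3
print(matrix[row, col], flat[row * 5 + col])
```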

Reshape with Wildcards

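Let's now consider a more advanced use case. Imagine you have a flattened array that represents a fixed number of columns, but the number of rows may vary from sample to sample. In that case, you can mark a dimension as a wildcard by specifying its extent as -1. When a wildcard is used, the output is sized so that its total number of elements equals that of the input.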

[3]:

@pipeline_def(device_id=0, num_threads=4, batch_size=3)
def example2(input_data):
    np.random.seed(12345)
    inp = fn.external_source(input_data, batch=False, dtype=types.INT32)
    return inp, fn.reshape(inp, shape=[-1, 5])


pipe2 = example2(
    lambda: np.random.randint(
        0, 10, size=[5 * np.random.randint(3, 10)], dtype=np.int32
    )
)
pipe2.build()
show_result(pipe2.run())

---------------- Sample #0 ----------------
Input (25)
[5 1 4 9 5 2 1 6 1 9 7 6 0 2 9 1 2 6 7 7 7 8 7 1 7]
Output (5x5)
[[5 1 4 9 5]
 [2 1 6 1 9]
 [7 6 0 2 9]
 [1 2 6 7 7]
 [7 8 7 1 7]]
---------------- Sample #1 ----------------
Input (35)
[0 3 5 7 3 1 5 2 5 3 8 5 2 5 3 0 6 8 0 5 6 8 9 2 2 2 9 7 5 7 1 0 9 3 0]
Output (7x5)
[[0 3 5 7 3]
 [1 5 2 5 3]
 [8 5 2 5 3]
 [0 6 8 0 5]
 [6 8 9 2 2]
 [2 9 7 5 7]
 [1 0 9 3 0]]
---------------- Sample #2 ----------------
Input (30)
[0 6 2 1 5 8 6 5 1 0 5 8 2 9 4 7 9 5 2 4 8 2 5 6 5 9 6 1 9 5]
Output (6x5)
[[0 6 2 1 5]
 [8 6 5 1 0]
 [5 8 2 9 4]
 [7 9 5 2 4]
 [8 2 5 6 5]
 [9 6 1 9 5]]
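
The wildcard extent can be derived by dividing the total number of elements by the product of the known extents. The helper below, `resolve_wildcard`, is a hypothetical illustration in plain Python and is not part of DALI:

```python
from math import prod


def resolve_wildcard(num_elements, shape):
    """Replace a single -1 in `shape` so the volume equals `num_elements`.

    Hypothetical helper for illustration; not part of the DALI API.
    """
    if shape.count(-1) != 1:
        raise ValueError("expected exactly one wildcard (-1) extent")
    known = prod(e for e in shape if e != -1)
    if known == 0 or num_elements % known != 0:
        raise ValueError("element count is not divisible by the known extents")
    return [num_elements // known if e == -1 else e for e in shape]


# Input sizes taken from the three samples above.
for n in (25, 35, 30):
    print(n, "->", resolve_wildcard(n, [-1, 5]))
```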

Removing and Adding Unit Dimensions

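Two dedicated operators, squeeze and expand_dims, remove and add dimensions of unit extent. The following example removes a redundant dimension and then adds two new ones.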

[4]:

@pipeline_def(device_id=0, num_threads=4, batch_size=3)
def example_squeeze_expand(input_data):
    np.random.seed(4321)
    inp = fn.external_source(
        input_data, batch=False, layout="CHW", dtype=types.INT32
    )
    squeezed = fn.squeeze(inp, axes=[0])
    expanded = fn.expand_dims(squeezed, axes=[0, 3], new_axis_names="FC")
    return inp, squeezed, expanded


def single_channel_generator():
    return np.random.randint(
        0, 10, size=[1] + rand_shape(2, 1, 7), dtype=np.int32
    )


pipe_squeeze_expand = example_squeeze_expand(single_channel_generator)
pipe_squeeze_expand.build()
show_result(pipe_squeeze_expand.run())

---------------- Sample #0 ----------------
Input (CHW 1x6x3)
[[[8 2 1]
  [7 5 9]
  [2 4 6]
  [0 8 6]
  [5 3 1]
  [1 6 1]]]
Output (HW 6x3)
[[8 2 1]
 [7 5 9]
 [2 4 6]
 [0 8 6]
 [5 3 1]
 [1 6 1]]
Output #2 (FHWC 1x6x3x1)
[[[[8]
   [2]
   [1]]

  [[7]
   [5]
   [9]]

  [[2]
   [4]
   [6]]

  [[0]
   [8]
   [6]]

  [[5]
   [3]
   [1]]

  [[1]
   [6]
   [1]]]]
---------------- Sample #1 ----------------
Input (CHW 1x2x2)
[[[6 9]
  [0 9]]]
Output (HW 2x2)
[[6 9]
 [0 9]]
Output #2 (FHWC 1x2x2x1)
[[[[6]
   [9]]

  [[0]
   [9]]]]
---------------- Sample #2 ----------------
Input (CHW 1x2x6)
[[[4 4 6 6 6 3]
  [8 2 1 7 9 7]]]
Output (HW 2x6)
[[4 4 6 6 6 3]
 [8 2 1 7 9 7]]
Output #2 (FHWC 1x2x6x1)
[[[[4]
   [4]
   [6]
   [6]
   [6]
   [3]]

  [[8]
   [2]
   [1]
   [7]
   [9]
   [7]]]]
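
For comparison, NumPy offers functions with the same names that change the shape in the same way. The sketch below is an analogy using the values of sample #1. NumPy arrays carry no layout string, so the layout handling is not reproduced:

```python
import numpy as np

# Sample #1 from the output above: a single-channel 1x2x2 array.
chw = np.array([[[6, 9], [0, 9]]], dtype=np.int32)

# Remove the unit dimension at axis 0: 1x2x2 -> 2x2.
hw = np.squeeze(chw, axis=0)

# Insert unit dimensions at positions 0 and 3 of the result: 2x2 -> 1x2x2x1.
# Passing a tuple of axes requires NumPy 1.18 or newer.
fhwc = np.expand_dims(hw, axis=(0, 3))

print(chw.shape, hw.shape, fhwc.shape)
```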

Rearranging Dimensions

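Reshape lets you swap, insert, or remove dimensions. The src_dims argument specifies which source dimension is used for a given output dimension. You can also insert a new dimension by specifying -1 as the source dimension index. As the output below shows, only the extents are rearranged: the elements keep their order in memory, so swapping dimensions this way is not a transpose.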

[5]:

@pipeline_def(device_id=0, num_threads=4, batch_size=3)
def example_reorder(input_data):
    np.random.seed(4321)
    inp = fn.external_source(input_data, batch=False, dtype=types.INT32)
    return inp, fn.reshape(inp, src_dims=[1, 0])


pipe_reorder = example_reorder(
    lambda: np.random.randint(0, 10, size=rand_shape(2, 1, 7), dtype=np.int32)
)
pipe_reorder.build()
show_result(pipe_reorder.run())

---------------- Sample #0 ----------------
Input (6x3)
[[8 2 1]
 [7 5 9]
 [2 4 6]
 [0 8 6]
 [5 3 1]
 [1 6 1]]
Output (3x6)
[[8 2 1 7 5 9]
 [2 4 6 0 8 6]
 [5 3 1 1 6 1]]
---------------- Sample #1 ----------------
Input (2x2)
[[6 9]
 [0 9]]
Output (2x2)
[[6 9]
 [0 9]]
---------------- Sample #2 ----------------
Input (2x6)
[[4 4 6 6 6 3]
 [8 2 1 7 9 7]]
Output (6x2)
[[4 4]
 [6 6]
 [6 3]
 [8 2]
 [1 7]
 [9 7]]
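
In the output above the elements keep their original order and only the extents are swapped. The NumPy analogy below (not DALI code) contrasts this with a transpose, using the values of sample #2:

```python
import numpy as np

# Sample #2 from the output above.
a = np.array([[4, 4, 6, 6, 6, 3], [8, 2, 1, 7, 9, 7]], dtype=np.int32)

# Swapping the extents, as src_dims=[1, 0] does: elements stay in the same order.
swapped = a.reshape(a.shape[1], a.shape[0])

# A transpose also yields a 6x2 array, but it moves the elements.
transposed = a.T

print(swapped.tolist())
print(transposed.tolist())
```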

Adding and Removing Dimensions

Dimensions can be added or removed by specifying the src_dims argument or by using the dedicated squeeze and expand_dims operators.

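The following example reinterprets single-channel data from the CHW layout to HWC by discarding the leading dimension and adding a new trailing one. It also specifies the output layout.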

[6]:

@pipeline_def(device_id=0, num_threads=4, batch_size=3)
def example_remove_add(input_data):
    np.random.seed(4321)
    inp = fn.external_source(
        input_data, batch=False, layout="CHW", dtype=types.INT32
    )
    return inp, fn.reshape(
        inp,
        src_dims=[1, 2, -1],  # select HW and add a new one at the end
        layout="HWC",  # specify the layout string
    )


pipe_remove_add = example_remove_add(
    lambda: np.random.randint(0, 10, [1, 4, 3], dtype=np.int32)
)
pipe_remove_add.build()
show_result(pipe_remove_add.run())

---------------- Sample #0 ----------------
Input (CHW 1x4x3)
[[[2 8 2]
  [1 7 5]
  [9 2 4]
  [6 0 8]]]
Output (HWC 4x3x1)
[[[2]
  [8]
  [2]]

 [[1]
  [7]
  [5]]

 [[9]
  [2]
  [4]]

 [[6]
  [0]
  [8]]]
---------------- Sample #1 ----------------
Input (CHW 1x4x3)
[[[6 5 3]
  [1 1 6]
  [1 1 9]
  [6 9 0]]]
Output (HWC 4x3x1)
[[[6]
  [5]
  [3]]

 [[1]
  [1]
  [6]]

 [[1]
  [1]
  [9]]

 [[6]
  [9]
  [0]]]
---------------- Sample #2 ----------------
Input (CHW 1x4x3)
[[[9 9 5]
  [4 4 6]
  [6 6 3]
  [8 2 1]]]
Output (HWC 4x3x1)
[[[9]
  [9]
  [5]]

 [[4]
  [4]
  [6]]

 [[6]
  [6]
  [3]]

 [[8]
  [2]
  [1]]]
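
The shape produced by src_dims can be worked out by hand. The helper below, `apply_src_dims`, is a hypothetical plain-Python sketch, not part of DALI. It assumes that the element count must be preserved, so only unit dimensions may be dropped:

```python
def apply_src_dims(in_shape, src_dims):
    """Build an output shape by picking input extents; -1 inserts an extent of 1.

    Hypothetical helper for illustration; not part of the DALI API.
    """
    out_shape = [1 if d == -1 else in_shape[d] for d in src_dims]
    # Assumption: dropped dimensions must have extent 1, otherwise the
    # number of elements would change.
    dropped = [in_shape[d] for d in range(len(in_shape)) if d not in src_dims]
    if any(extent != 1 for extent in dropped):
        raise ValueError("only unit dimensions can be removed")
    return out_shape


print(apply_src_dims([1, 4, 3], [1, 2, -1]))  # CHW 1x4x3 -> HWC 4x3x1
print(apply_src_dims([6, 3], [1, 0]))  # 6x3 -> 3x6
```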

Relative Shape

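The output shape can be calculated in relative terms, with each new extent being a multiple of a source extent. For example, you may want to combine two consecutive rows into one, doubling the number of columns and halving the number of rows. Relative shape can be combined with dimension rearranging, in which case each new output extent is a multiple of a different source extent.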

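The example below reinterprets the input as having twice as many columns as the input has rows (and, correspondingly, half as many rows as the input has columns).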

[7]:

@pipeline_def(device_id=0, num_threads=4, batch_size=3)
def example_rel_shape(input_data):
    np.random.seed(1234)
    inp = fn.external_source(input_data, batch=False, dtype=types.INT32)
    return inp, fn.reshape(inp, rel_shape=[0.5, 2], src_dims=[1, 0])


pipe_rel_shape = example_rel_shape(
    lambda: np.random.randint(
        0,
        10,
        [np.random.randint(1, 7), 2 * np.random.randint(1, 5)],
        dtype=np.int32,
    )
)

pipe_rel_shape.build()
show_result(pipe_rel_shape.run())

---------------- Sample #0 ----------------
Input (4x6)
[[5 4 8 9 1 7]
 [9 6 8 0 5 0]
 [9 6 2 0 5 2]
 [6 3 7 0 9 0]]
Output (3x8)
[[5 4 8 9 1 7 9 6]
 [8 0 5 0 9 6 2 0]
 [5 2 6 3 7 0 9 0]]
---------------- Sample #1 ----------------
Input (4x6)
[[3 1 3 1 3 7]
 [1 7 4 0 5 1]
 [5 9 9 4 0 9]
 [8 8 6 8 6 3]]
Output (3x8)
[[3 1 3 1 3 7 1 7]
 [4 0 5 1 5 9 9 4]
 [0 9 8 8 6 8 6 3]]
---------------- Sample #2 ----------------
Input (2x6)
[[5 2 5 6 7 4]
 [3 5 6 4 6 2]]
Output (3x4)
[[5 2 5 6]
 [7 4 3 5]
 [6 4 6 2]]
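
The shapes above follow from multiplying each selected source extent by its factor. The helper below, `relative_shape`, is a hypothetical plain-Python sketch of that arithmetic, not DALI's implementation. It accepts only factors that produce whole-number extents:

```python
def relative_shape(in_shape, rel_shape, src_dims=None):
    """Compute out[d] = rel_shape[d] * in_shape[src_dims[d]].

    Hypothetical helper for illustration; not part of the DALI API.
    """
    if src_dims is None:
        src_dims = list(range(len(rel_shape)))
    out_shape = []
    for factor, src in zip(rel_shape, src_dims):
        extent = factor * in_shape[src]
        # This sketch only accepts factors that give whole-number extents.
        if extent != int(extent):
            raise ValueError(f"{factor} * {in_shape[src]} is not an integer")
        out_shape.append(int(extent))
    return out_shape


# Input shapes taken from the samples above.
print(relative_shape([4, 6], [0.5, 2], src_dims=[1, 0]))  # [3, 8]
print(relative_shape([2, 6], [0.5, 2], src_dims=[1, 0]))  # [3, 4]
# Without src_dims: merge pairs of consecutive rows.
print(relative_shape([4, 6], [0.5, 2]))  # [2, 12]
```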

Reinterpreting Data Type

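The reinterpret operation can view the data as if it were of a different type. When a new shape is not specified, the innermost dimension is resized accordingly. In the example below, viewing UINT8 data as UINT32 divides the innermost extent by 4.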

[8]:

@pipeline_def(device_id=0, num_threads=4, batch_size=3)
def example_reinterpret(input_data):
    np.random.seed(1234)
    inp = fn.external_source(input_data, batch=False, dtype=types.UINT8)
    return inp, fn.reinterpret(inp, dtype=types.UINT32)


pipe_reinterpret = example_reinterpret(
    lambda: np.random.randint(
        0,
        255,
        [np.random.randint(1, 7), 4 * np.random.randint(1, 5)],
        dtype=np.uint8,
    )
)

pipe_reinterpret.build()


def hex_bytes(x):
    f = f"0x{{:0{2*x.nbytes}x}}"
    return f.format(x)


show_result(pipe_reinterpret.run(), formatter={"int": hex_bytes})

---------------- Sample #0 ----------------
Input (4x12)
[[0x35 0xdc 0x5d 0xd1 0xcc 0xec 0x0e 0x70 0x74 0x5d 0xb3 0x9c]
 [0x98 0x42 0x0d 0xc9 0xf9 0xd7 0x77 0xc5 0x8f 0x7e 0xac 0xc7]
 [0xb1 0xda 0x54 0xdc 0x17 0xa1 0xc8 0x45 0xe9 0x24 0x90 0x26]
 [0x9a 0x5c 0xc6 0x46 0x1e 0x20 0xd2 0x32 0xab 0x7e 0x47 0xcd]]
Output (4x3)
[[0xd15ddc35 0x700eeccc 0x9cb35d74]
 [0xc90d4298 0xc577d7f9 0xc7ac7e8f]
 [0xdc54dab1 0x45c8a117 0x269024e9]
 [0x46c65c9a 0x32d2201e 0xcd477eab]]
---------------- Sample #1 ----------------
Input (5x4)
[[0x1a 0x1f 0x3d 0xe0]
 [0x76 0x35 0xbb 0x1d]
 [0xba 0xe9 0x99 0x5b]
 [0x78 0xe8 0x4d 0x03]
 [0x70 0x37 0x41 0x80]]
Output (5x1)
[[0xe03d1f1a]
 [0x1dbb3576]
 [0x5b99e9ba]
 [0x034de878]
 [0x80413770]]
---------------- Sample #2 ----------------
Input (5x8)
[[0x50 0x6d 0xbd 0x54 0xc9 0xa3 0x73 0xb6]
 [0x7f 0xc9 0x79 0xcd 0xf6 0xc0 0xc8 0x5e]
 [0xfe 0x09 0x27 0x19 0xaf 0x8d 0xaa 0x8f]
 [0x32 0x96 0x55 0x0e 0xf0 0x0e 0xca 0x80]
 [0xfb 0x56 0x52 0x71 0x4c 0x54 0x86 0x03]]
Output (5x2)
[[0x54bd6d50 0xb673a3c9]
 [0xcd79c97f 0x5ec8c0f6]
 [0x192709fe 0x8faa8daf]
 [0x0e559632 0x80ca0ef0]
 [0x715256fb 0x0386544c]]
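
The values in the output above are consistent with a little-endian byte order: the first byte of each group of four becomes the least significant byte of the 32-bit word. A NumPy analogy (not DALI code), using the first two rows of sample #1:

```python
import numpy as np

# First two rows of sample #1 from the output above.
raw = np.array(
    [[0x1A, 0x1F, 0x3D, 0xE0], [0x76, 0x35, 0xBB, 0x1D]], dtype=np.uint8
)

# View every 4 consecutive bytes as one little-endian 32-bit unsigned integer.
# "<u4" fixes the byte order so the result does not depend on the host CPU.
words = raw.view("<u4")

print(words.shape)  # the innermost extent shrinks from 4 to 1
print([hex(w) for w in words.ravel().tolist()])
```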