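Quantize

Quantizes a floating-point input tensor into a quantized output tensor. The quantization is computed as: \(output_{i_0,..,i_n} = \text{clamp}(\text{round}(\frac{input_{i_0,..,i_n}}{scale} + \text{zero_point}))\), where clamp saturates the result to the representable range of the output data type.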
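The formula above can be mirrored in NumPy as a quick sanity check. This is a minimal sketch under stated assumptions, not TensorRT's implementation: the helper name `quantize_reference`, the default int8 range, and the use of `np.round` (which rounds halves to even) are all chosen for illustration.

```python
import numpy as np


def quantize_reference(x, scale, zero_point=0.0, qmin=-128, qmax=127):
    """Reference quantization: clamp(round(x / scale + zero_point)).

    qmin/qmax default to the int8 range; pass -8 and 7 for an int4 range.
    """
    x = np.asarray(x, dtype=np.float64)
    # Rescale, shift by the zero point, then round to the nearest integer.
    # np.round rounds halves to even; the rounding mode is an assumption here.
    rounded = np.round(x / scale + zero_point)
    # Saturate to the representable range of the output type.
    return np.clip(rounded, qmin, qmax).astype(np.int8)


print(quantize_reference([0.56, 1.4, -3.6], scale=1 / 127))
```

With a scale of 1/127, the value 0.56 maps to 71, while 1.4 and -3.6 fall outside the int8 range and saturate to 127 and -128.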

Attributes

axis The axis along which quantization is performed.

toType The data type of the output tensor. Defaults to int8.

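Inputs

input: tensor of type T1.

scale: tensor of type T1 that provides the quantization scale. The scale tensor must be a build-time constant. It must be a scalar (per-tensor quantization), a 1-D tensor (per-channel quantization), or a tensor of the same rank as the input tensor (block quantization, supported for DataType::kINT4 and DataType::kFP4).

zero_point: tensor of type T2 that provides the quantization zero point. The zero_point tensor is optional and is assumed to be zero when not set. If zero_point is set, it must contain only zero-valued coefficients and must have the same shape as scale.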
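To make the three accepted scale layouts concrete, the sketch below expands a scale to the shape of the input so that it can be applied element-wise. The helper `expand_scale` is hypothetical, and the block rule it uses (each scale dimension must evenly divide the matching input dimension, with every coefficient covering a contiguous block) is an assumption inferred from the block quantization example in this document, not a statement of TensorRT's validation logic.

```python
import numpy as np


def expand_scale(scale, input_shape, axis=0):
    """Expand a scale tensor to input_shape (illustrative helper only)."""
    scale = np.asarray(scale, dtype=np.float32)
    if scale.size == 1:
        # Per-tensor: a single coefficient applies to every element.
        return np.broadcast_to(scale.reshape(()), input_shape)
    if scale.ndim == 1:
        # Per-channel: one coefficient per slice along `axis`.
        if scale.shape[0] != input_shape[axis]:
            raise ValueError("scale length must match the size of `axis`")
        shape = [1] * len(input_shape)
        shape[axis] = scale.shape[0]
        return np.broadcast_to(scale.reshape(shape), input_shape)
    if scale.ndim == len(input_shape):
        # Block: repeat each coefficient over its block (assumed semantics).
        out = scale
        for dim, (n_in, n_scale) in enumerate(zip(input_shape, scale.shape)):
            if n_in % n_scale != 0:
                raise ValueError("scale dims must divide the input dims")
            out = np.repeat(out, n_in // n_scale, axis=dim)
        return out
    raise ValueError("scale must be scalar, 1-D, or the same rank as input")


per_channel = expand_scale([1.0, 2.0, 3.0], (2, 3), axis=1)
per_block = expand_scale([[1.0, 2.0], [3.0, 4.0]], (4, 2))
print(per_channel)
print(per_block)
```

In the block case, a (2, 2) scale applied to a (4, 2) input means that every two consecutive rows share one row of scale coefficients.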

Outputs

output: tensor of type T3.

Data Types

T1: float16, bfloat16, float32

T2: float32

T3: int4, int8, float4, float8

Shape Information

input and output are tensors with a shape of \([a_0,...,a_n]\).

If zero_point is defined, scale and zero_point must have the same shape.

Volume Limits

input, scale, and zero_point can each have up to \(2^{31}-1\) elements.
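The shape, zero-point, and volume constraints listed above can be checked before building a network. The following is a minimal sketch; `check_quantize_inputs` and `MAX_VOLUME` are hypothetical names, and the function only covers the rules stated in this document, not every check TensorRT may perform.

```python
import math

import numpy as np

MAX_VOLUME = 2**31 - 1  # element limit stated in this document


def check_quantize_inputs(input_shape, scale, zero_point=None):
    """Return a list of violated constraints (empty if none were found)."""
    scale = np.asarray(scale)
    problems = []
    # math.prod works on Python ints, so very large shapes cannot overflow.
    if math.prod(input_shape) > MAX_VOLUME:
        problems.append("input exceeds 2**31 - 1 elements")
    if scale.size > MAX_VOLUME:
        problems.append("scale exceeds 2**31 - 1 elements")
    if zero_point is not None:
        zero_point = np.asarray(zero_point)
        if zero_point.size > MAX_VOLUME:
            problems.append("zero_point exceeds 2**31 - 1 elements")
        if zero_point.shape != scale.shape:
            problems.append("zero_point and scale shapes differ")
        if np.any(zero_point != 0):
            problems.append("zero_point contains non-zero coefficients")
    return problems


print(check_quantize_inputs((4, 8), np.ones((2, 8)), np.zeros((2, 8))))
print(check_quantize_inputs((4, 8), np.ones((2, 8)), np.ones((2, 4))))
```

Returning a list of messages instead of raising on the first failure makes it easier to report every violated constraint at once.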

Examples

Quantize

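# Network input with an explicit 4-D shape.
in1 = network.add_input("input1", dtype=trt.float32, shape=(1, 1, 3, 3))
# Single-element scale of 1/127, supplied as a build-time constant.
scale = network.add_constant(shape=(1,), weights=np.array([1 / 127], dtype=np.float32))
# Quantize to the default output type (int8), then dequantize with the same scale.
quantize = network.add_quantize(in1, scale.get_output(0))
quantize.axis = 3
dequantize = network.add_dequantize(quantize.get_output(0), scale.get_output(0))
dequantize.axis = 3
network.mark_output(dequantize.get_output(0))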

inputs[in1.name] = np.array(
    [
        [
            [0.56, 0.89, 1.4],
            [-0.56, 0.39, 6.0],
            [0.67, 0.11, -3.6],
        ]
    ]
)

outputs[dequantize.get_output(0).name] = dequantize.get_output(0).shape
expected[dequantize.get_output(0).name] = np.array(
    [
        [
            [0.56, 0.89, 1],
            [-0.56, 0.39, 1.0],
            [0.67, 0.11, -1.0],
        ]
    ]
)
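The expected values in the example above appear to be rounded to two decimals. As a rough cross-check, the quantize and dequantize round trip can be reproduced in NumPy. This is a sketch that assumes an int8 range of [-128, 127] and NumPy's round-half-to-even behavior; it does not call TensorRT.

```python
import numpy as np

scale = np.float32(1 / 127)
x = np.array([[[0.56, 0.89, 1.4], [-0.56, 0.39, 6.0], [0.67, 0.11, -3.6]]])

# Quantize: rescale, round, and saturate to the int8 range.
q = np.clip(np.round(x / scale), -128, 127)
# Dequantize: multiply the integer codes by the same scale.
roundtrip = q * scale

expected = np.array([[[0.56, 0.89, 1.0], [-0.56, 0.39, 1.0], [0.67, 0.11, -1.0]]])
max_error = float(np.abs(roundtrip - expected).max())
print(roundtrip)
print(max_error)
```

Inputs inside [-1, 1] come back within one quantization step of their original value, while 1.4, 6.0, and -3.6 saturate and return as roughly 1.0, 1.0, and -1.0 (more precisely -128/127 for the negative case).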

Block Quantize

in1 = network.add_input("input1", dtype=trt.float32, shape=(1, 8))
weights = network.add_constant(shape=(4, 8), weights=np.array([
    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0],
    [1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1, 4.2],
    [4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0],
    [4.1, 4.2, 5.1, 5.2, 6.1, 6.2, 7.1, 7.2],
], dtype=np.float32))
scale = network.add_constant(shape=(2, 8), weights=np.array([
    [1, 1, 2, 2, 3, 3, 4, 4],
    [4, 4, 5, 5, 6, 6, 7, 7],
], dtype=np.float32))
quantize = network.add_quantize(weights.get_output(0), scale.get_output(0), trt.int4)
dequantize = network.add_dequantize(quantize.get_output(0), scale.get_output(0), trt.float32)
network.mark_output(dequantize.get_output(0))

inputs[in1.name] = np.array(
    [
        [2, 2, 2, 2, 2, 2, 2, 2],
    ]
)

outputs[dequantize.get_output(0).name] = dequantize.get_output(0).shape
expected[dequantize.get_output(0).name] = np.array(
    [
        [
            [1, 1, 2, 2, 3, 3, 4, 4],
            [1, 1, 2, 2, 3, 3, 4, 4],
            [4, 4, 5, 5, 6, 6, 7, 7],
            [4, 4, 5, 5, 6, 6, 7, 7],
        ]
    ]
)
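The block quantization example can also be reproduced outside TensorRT by reshaping the weights so that each block lines up with its scale row. The sketch below assumes that the block size is the ratio of the weight and scale shapes along axis 0 (here 4 / 2 = 2) and that int4 covers [-8, 7]; both are assumptions made for illustration.

```python
import numpy as np

weights = np.array([
    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0],
    [1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1, 4.2],
    [4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0],
    [4.1, 4.2, 5.1, 5.2, 6.1, 6.2, 7.1, 7.2],
], dtype=np.float32)
scale = np.array([
    [1, 1, 2, 2, 3, 3, 4, 4],
    [4, 4, 5, 5, 6, 6, 7, 7],
], dtype=np.float32)

# Two consecutive rows of weights share one row of scale coefficients.
block = weights.shape[0] // scale.shape[0]
blocks = weights.reshape(scale.shape[0], block, weights.shape[1])

# Quantize each block with its own scale row, saturating to the int4 range.
q = np.clip(np.round(blocks / scale[:, None, :]), -8, 7)
# Dequantize and restore the original (4, 8) layout.
dequantized = (q * scale[:, None, :]).reshape(weights.shape)
print(dequantized)
```

Every weight in this example is within a few percent of its scale, so each one quantizes to 1 and dequantizes back to the scale value itself, which matches the expected output listed above.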

C++ API

For more information about the C++ IQuantizeLayer operator, refer to the C++ IQuantizeLayer documentation.

Python API

For more information about the Python IQuantizeLayer operator, refer to the Python IQuantizeLayer documentation.