4.6. Half2 Arithmetic Functions

To use these functions, include the header file cuda_fp16.h in your program.
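The examples in this section use a few conversion helpers from cuda_fp16.h that are not listed here. The sketch below shows how a __half2 is built and taken apart. It assumes `__floats2half2_rn`, `__low2float` and `__high2float` are available as `__host__ __device__` functions in your toolkit, and the helper name `round_trip` is hypothetical. The first float becomes the low half, the second becomes the high half, and each is rounded to binary16 with round-to-nearest-even.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>  // float2, make_float2

// Hypothetical helper: packs two floats into one __half2 and unpacks them again.
// Each float is rounded to binary16 (round-to-nearest-even) on the way in.
__host__ __device__ float2 round_trip(float x, float y) {
    const __half2 p = __floats2half2_rn(x, y);  // low half = x, high half = y
    return make_float2(__low2float(p), __high2float(p));
}
```

For example, `round_trip(0.1f, 3.0f)` returns 3.0f exactly in `.y`. In `.x` it returns 0.0999755859375f, the binary16 value closest to 0.1.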

Functions

__host__ __device__ __half2 __h2div(const __half2 a, const __half2 b): Performs half2 vector division in round-to-nearest-even mode.
__host__ __device__ __half2 __habs2(const __half2 a): Calculates the absolute value of both halves of the input half2 number and returns the result.
__host__ __device__ __half2 __hadd2(const __half2 a, const __half2 b): Performs half2 vector addition in round-to-nearest-even mode.
__host__ __device__ __half2 __hadd2_rn(const __half2 a, const __half2 b): Performs half2 vector addition in round-to-nearest-even mode. Prevents floating-point contraction of mul+add into fma.
__host__ __device__ __half2 __hadd2_sat(const __half2 a, const __half2 b): Performs half2 vector addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].
__device__ __half2 __hcmadd(const __half2 a, const __half2 b, const __half2 c): Performs fast complex multiply-accumulate.
__device__ __half2 __hfma2(const __half2 a, const __half2 b, const __half2 c): Performs half2 vector fused multiply-add in round-to-nearest-even mode.
__device__ __half2 __hfma2_relu(const __half2 a, const __half2 b, const __half2 c): Performs half2 vector fused multiply-add in round-to-nearest-even mode, with ReLU saturation.
__device__ __half2 __hfma2_sat(const __half2 a, const __half2 b, const __half2 c): Performs half2 vector fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].
__host__ __device__ __half2 __hmul2(const __half2 a, const __half2 b): Performs half2 vector multiplication in round-to-nearest-even mode.
__host__ __device__ __half2 __hmul2_rn(const __half2 a, const __half2 b): Performs half2 vector multiplication in round-to-nearest-even mode. Prevents floating-point contraction of mul+add or mul+sub into fma.
__host__ __device__ __half2 __hmul2_sat(const __half2 a, const __half2 b): Performs half2 vector multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].
__host__ __device__ __half2 __hneg2(const __half2 a): Negates both halves of the input half2 number and returns the result.
__host__ __device__ __half2 __hsub2(const __half2 a, const __half2 b): Performs half2 vector subtraction in round-to-nearest-even mode.
__host__ __device__ __half2 __hsub2_rn(const __half2 a, const __half2 b): Performs half2 vector subtraction in round-to-nearest-even mode. Prevents floating-point contraction of mul+sub into fma.
__host__ __device__ __half2 __hsub2_sat(const __half2 a, const __half2 b): Performs half2 vector subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].
__device__ __half2 atomicAdd(__half2 *const address, const __half2 val): Adds val, as a vector, to the value stored at address in global or shared memory, and writes the result back to address.
__host__ __device__ __half2 operator*(const __half2 &lh, const __half2 &rh): Performs packed half multiplication.
__host__ __device__ __half2 & operator*=(__half2 &lh, const __half2 &rh): Performs packed half compound assignment with multiplication.
__host__ __device__ __half2 operator+(const __half2 &h): Implements the packed half unary plus operator; returns the input value.
__host__ __device__ __half2 operator+(const __half2 &lh, const __half2 &rh): Performs packed half addition.
__host__ __device__ __half2 operator++(__half2 &h, const int ignored): Performs packed half postfix increment.
__host__ __device__ __half2 & operator++(__half2 &h): Performs packed half prefix increment.
__host__ __device__ __half2 & operator+=(__half2 &lh, const __half2 &rh): Performs packed half compound assignment with addition.
__host__ __device__ __half2 operator-(const __half2 &h): Implements the packed half unary minus operator.
__host__ __device__ __half2 operator-(const __half2 &lh, const __half2 &rh): Performs packed half subtraction.
__host__ __device__ __half2 & operator--(__half2 &h): Performs packed half prefix decrement.
__host__ __device__ __half2 operator--(__half2 &h, const int ignored): Performs packed half postfix decrement.
__host__ __device__ __half2 & operator-=(__half2 &lh, const __half2 &rh): Performs packed half compound assignment with subtraction.
__host__ __device__ __half2 operator/(const __half2 &lh, const __half2 &rh): Performs packed half division.
__host__ __device__ __half2 & operator/=(__half2 &lh, const __half2 &rh): Performs packed half compound assignment with division.

4.6.1. Functions

__host__ __device__ __half2 __h2div(const __half2 a, const __half2 b)

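Performs half2 vector division in round-to-nearest-even mode.

Divides half2 input vector a by input vector b in round-to-nearest-even mode.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The elementwise division of a by b.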
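Here is a minimal sketch of elementwise division. It assumes your toolkit lets host code call `__h2div`, as its `__host__ __device__` signature states. The helper `div_pair` is hypothetical. It takes and returns the two lanes as floats so that the result is easy to inspect.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>  // float2, make_float2

// Hypothetical helper: (a.x / b.x, a.y / b.y) computed in half precision.
__host__ __device__ float2 div_pair(float2 a, float2 b) {
    const __half2 ha = __floats2half2_rn(a.x, a.y);
    const __half2 hb = __floats2half2_rn(b.x, b.y);
    const __half2 q = __h2div(ha, hb);  // lane-wise quotient, round-to-nearest-even
    return make_float2(__low2float(q), __high2float(q));
}
```

Both quotients in the test values (6/3 and 1/4) are exactly representable in binary16, so no rounding is visible there.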

__host__ __device__ __half2 __habs2(const __half2 a)

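Calculates the absolute value of both halves of the input half2 number and returns the result.

See also

__habs(__half) for further details.

Parameters

a – [in] - half2. Read-only.

Returns

half2

Returns a with the absolute value of both halves.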
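A short sketch that assumes `__habs2` is host-callable as declared. The helper `abs_pair` is hypothetical. One call handles both lanes.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>  // float2, make_float2

// Hypothetical helper: absolute value of both lanes with a single __habs2 call.
__host__ __device__ float2 abs_pair(float2 v) {
    const __half2 r = __habs2(__floats2half2_rn(v.x, v.y));
    return make_float2(__low2float(r), __high2float(r));
}
```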

__host__ __device__ __half2 __hadd2(const __half2 a, const __half2 b)

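Performs half2 vector addition in round-to-nearest-even mode.

Performs half2 vector addition of inputs a and b in round-to-nearest-even mode.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The sum of vectors a and b.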

__host__ __device__ __half2 __hadd2_rn(const __half2 a, const __half2 b)

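Performs half2 vector addition in round-to-nearest-even mode.

Performs half2 vector addition of inputs a and b in round-to-nearest-even mode. Prevents floating-point contraction of mul+add into fma.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The sum of vectors a and b.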
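The sketch below shows why the non-contraction guarantee matters. It assumes `__hmul2` and `__hadd2_rn` are host-callable as declared, and `mul_then_add_rn` is a hypothetical name. `__hmul2` rounds the product to binary16 first, and the explicitly rounded add is then kept as a separate step. Take a = 1 + 2^-10 and b = 1 - 2^-11. The exact product is 1 + 2^-11 - 2^-21, which rounds to 1.0 in binary16, so adding -1.0 gives 0.0. A single fused multiply-add would instead keep the small remainder, about 4.88e-4.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>  // float2, make_float2

// Hypothetical helper: a * b + c per lane, with the product rounded to half
// before the addition; __hadd2_rn keeps the two steps from being fused into fma.
__host__ __device__ float2 mul_then_add_rn(float2 a, float2 b, float2 c) {
    const __half2 p = __hmul2(__floats2half2_rn(a.x, a.y), __floats2half2_rn(b.x, b.y));
    const __half2 s = __hadd2_rn(p, __floats2half2_rn(c.x, c.y));
    return make_float2(__low2float(s), __high2float(s));
}
```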

__host__ __device__ __half2 __hadd2_sat(const __half2 a, const __half2 b)

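Performs half2 vector addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].

Performs half2 vector addition of inputs a and b in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The sum of a and b, with saturation applied.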
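A minimal sketch of the clamping behavior, assuming `__hadd2_sat` is host-callable as declared. The helper `add_sat_pair` is hypothetical. In the test, one lane overflows above 1.0 and the other falls below 0.0.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>  // float2, make_float2

// Hypothetical helper: per-lane sum clamped to [0.0, 1.0].
__host__ __device__ float2 add_sat_pair(float2 a, float2 b) {
    const __half2 s = __hadd2_sat(__floats2half2_rn(a.x, a.y), __floats2half2_rn(b.x, b.y));
    return make_float2(__low2float(s), __high2float(s));
}
```

With (0.75, -0.5) + (0.5, 0.25), the raw sums are 1.25 and -0.25, which saturate to 1.0 and 0.0.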

__device__ __half2 __hcmadd(const __half2 a, const __half2 b, const __half2 c)

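Performs fast complex multiply-accumulate.

Interprets the half2 input vectors a, b, and c as complex numbers in half precision: (a.x + I*a.y), (b.x + I*b.y), (c.x + I*c.y), and performs the complex multiply-accumulate operation a*b + c in a simple way: ((a.x*b.x + c.x) - a.y*b.y) + I*((a.x*b.y + c.y) + a.y*b.x)

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.
c – [in] - half2. Read-only.

Returns

half2

The result of the complex multiply-accumulate operation on complex numbers a, b, and c.

__half2 result = __hcmadd(a, b, c) is numerically in agreement with:

result.x = __hfma(-a.y, b.y, __hfma(a.x, b.x, c.x))
result.y = __hfma( a.y, b.x, __hfma(a.x, b.y, c.y))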
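Because `__hcmadd` is device-only, this sketch runs it in a kernel. It stores one complex number per __half2, with the low half as the real part and the high half as the imaginary part, matching the (a.x + I*a.y) convention above. The names `complex_mac` and `complex_mac_one` are hypothetical. The sketch needs a CUDA-capable device, and error checking is omitted to keep it short.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// One complex number per __half2: .x (low half) = real, .y (high half) = imaginary.
// acc[i] = a[i] * b[i] + acc[i], one thread per element.
__global__ void complex_mac(const __half2 *a, const __half2 *b, __half2 *acc, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) acc[i] = __hcmadd(a[i], b[i], acc[i]);
}

// Hypothetical host helper for a single element; returns (real, imag) of a*b + c.
float2 complex_mac_one(float2 a, float2 b, float2 c) {
    __half2 h[3] = {__floats2half2_rn(a.x, a.y), __floats2half2_rn(b.x, b.y),
                    __floats2half2_rn(c.x, c.y)};
    __half2 *d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    complex_mac<<<1, 1>>>(d, d + 1, d + 2, 1);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);
    return make_float2(__low2float(h[2]), __high2float(h[2]));
}
```

For (1 + 2i)(3 + 4i) + (0.5 + 0.5i), the formula above gives (3 + 0.5) - 8 = -4.5 for the real part and (4 + 0.5) + 6 = 10.5 for the imaginary part. Both are exact in binary16.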

__device__ __half2 __hfma2(const __half2 a, const __half2 b, const __half2 c)

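Performs half2 vector fused multiply-add in round-to-nearest-even mode.

Performs a half2 vector multiplication of inputs a and b, then a half2 vector addition of the result with c, rounding the result only once, in round-to-nearest-even mode.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.
c – [in] - half2. Read-only.

Returns

half2

The elementwise fused multiply-add of vectors a, b, and c.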
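A typical use of `__hfma2` is a packed "axpy" kernel, where each thread updates two half values with one rounding per lane. The kernel and the single-pair host helper below are hypothetical names. The sketch needs a CUDA-capable device, and error checking is omitted. `__float2half2_rn` broadcasts the scalar alpha into both lanes.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// y[i] = alpha * x[i] + y[i] for n2 packed pairs (2 * n2 half values).
__global__ void haxpy2(int n2, __half2 alpha, const __half2 *x, __half2 *y) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) y[i] = __hfma2(alpha, x[i], y[i]);  // rounded once per lane
}

// Hypothetical host helper for a single pair; returns alpha * x + y.
float2 haxpy2_one(float alpha, float2 x, float2 y) {
    __half2 h[2] = {__floats2half2_rn(x.x, x.y), __floats2half2_rn(y.x, y.y)};
    __half2 *d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    haxpy2<<<1, 1>>>(1, __float2half2_rn(alpha), d, d + 1);  // alpha in both lanes
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return make_float2(__low2float(h[1]), __high2float(h[1]));
}
```

For a real array, launch enough threads to cover n2, for example (n2 + 255) / 256 blocks of 256 threads.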

__device__ __half2 __hfma2_relu(const __half2 a, const __half2 b, const __half2 c)

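Performs half2 vector fused multiply-add in round-to-nearest-even mode, with ReLU saturation.

Performs a half2 vector multiplication of inputs a and b, then a half2 vector addition of the result with c, rounding the result only once, in round-to-nearest-even mode. Negative results are then clamped to 0. NaN results are converted to canonical NaN.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.
c – [in] - half2. Read-only.

Returns

half2

The elementwise fused multiply-add of vectors a, b, and c, with ReLU saturation.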
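This sketch uses `__hfma2_relu` to fuse a scale, a bias, and a ReLU activation into one call per pair. It assumes a device and toolkit on which `__hfma2_relu` is supported. The kernel and the host helper are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// out[i] = ReLU(w[i] * x[i] + b[i]) per lane; negative lane results become 0.
__global__ void scale_bias_relu(const __half2 *w, const __half2 *x, const __half2 *b,
                                __half2 *out, int n2) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) out[i] = __hfma2_relu(w[i], x[i], b[i]);
}

// Hypothetical host helper for a single pair.
float2 scale_bias_relu_one(float2 w, float2 x, float2 b) {
    __half2 h[4] = {__floats2half2_rn(w.x, w.y), __floats2half2_rn(x.x, x.y),
                    __floats2half2_rn(b.x, b.y), __floats2half2_rn(0.0f, 0.0f)};
    __half2 *d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    scale_bias_relu<<<1, 1>>>(d, d + 1, d + 2, d + 3, 1);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return make_float2(__low2float(h[3]), __high2float(h[3]));
}
```

With w = (1, 2), x = (-2, 3), and b = (0.5, 1), the fused results are -1.5 and 7. The first is clamped to 0.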

__device__ __half2 __hfma2_sat(const __half2 a, const __half2 b, const __half2 c)

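Performs half2 vector fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].

Performs a half2 vector multiplication of inputs a and b, then a half2 vector addition of the result with c, rounding the result only once, in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.
c – [in] - half2. Read-only.

Returns

half2

The elementwise fused multiply-add of vectors a, b, and c, with saturation applied.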

__host__ __device__ __half2 __hmul2(const __half2 a, const __half2 b)

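Performs half2 vector multiplication in round-to-nearest-even mode.

Performs half2 vector multiplication of inputs a and b in round-to-nearest-even mode.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The elementwise product of vectors a and b.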

__host__ __device__ __half2 __hmul2_rn(const __half2 a, const __half2 b)

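Performs half2 vector multiplication in round-to-nearest-even mode.

Performs half2 vector multiplication of inputs a and b in round-to-nearest-even mode. Prevents floating-point contraction of mul+add or mul+sub into fma.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The elementwise product of vectors a and b.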

__host__ __device__ __half2 __hmul2_sat(const __half2 a, const __half2 b)

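Performs half2 vector multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].

Performs half2 vector multiplication of inputs a and b in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The elementwise product of vectors a and b, with saturation applied.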

__host__ __device__ __half2 __hneg2(const __half2 a)

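Negates both halves of the input half2 number and returns the result.

Negates both halves of the input half2 number a and returns the result.

See also

__hneg(__half) for further details.

Parameters

a – [in] - half2. Read-only.

Returns

half2

Returns a with both halves negated.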

__host__ __device__ __half2 __hsub2(const __half2 a, const __half2 b)

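Performs half2 vector subtraction in round-to-nearest-even mode.

Subtracts half2 input vector b from input vector a in round-to-nearest-even mode.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The result of subtracting vector b from a.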
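Subtraction combines naturally with `__habs2` to compute a per-lane absolute difference. This sketch assumes both functions are host-callable as declared, and `abs_diff_pair` is a hypothetical name.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>  // float2, make_float2

// Hypothetical helper: |a - b| per lane, using __hsub2 followed by __habs2.
__host__ __device__ float2 abs_diff_pair(float2 a, float2 b) {
    const __half2 diff = __hsub2(__floats2half2_rn(a.x, a.y),
                                 __floats2half2_rn(b.x, b.y));  // a - b
    const __half2 r = __habs2(diff);
    return make_float2(__low2float(r), __high2float(r));
}
```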

__host__ __device__ __half2 __hsub2_rn(const __half2 a, const __half2 b)

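Performs half2 vector subtraction in round-to-nearest-even mode.

Subtracts half2 input vector b from input vector a in round-to-nearest-even mode. Prevents floating-point contraction of mul+sub into fma.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The result of subtracting vector b from a.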

__host__ __device__ __half2 __hsub2_sat(const __half2 a, const __half2 b)

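Performs half2 vector subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].

Subtracts half2 input vector b from input vector a in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.

Parameters

a – [in] - half2. Read-only.
b – [in] - half2. Read-only.

Returns

half2

The result of subtracting vector b from a, with saturation applied.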

__device__ __half2 atomicAdd(__half2 *const address, const __half2 val)

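Adds val, as a vector, to the value stored at address in global or shared memory, and writes the result back to address.

Atomicity of the addition is guaranteed separately for each of the two __half elements. The __half2 as a whole is not guaranteed to be updated atomically as a single 32-bit access.

address must point to global or shared memory; otherwise the behavior is undefined. This operation is natively supported by devices of compute capability 6.x and higher; older devices use an emulation path.

Note

For more details on this function, see the Atomic Functions section of the CUDA C++ Programming Guide.

Parameters

address – [in] - half2*. An address in global or shared memory.
val – [in] - half2. The value to add.

Returns

half2

The old value read from address.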
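This sketch accumulates packed pairs into a single global accumulator with `atomicAdd`. Each lane is updated atomically on its own, as described above. The kernel and the host helper are hypothetical names, and error checking is omitted. Zeroing the accumulator with `cudaMemset` works because an all-zero bit pattern encodes (+0.0, +0.0).

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

// Every thread adds its pair into one accumulator in global memory.
__global__ void sum_pairs(const __half2 *in, int n, __half2 *total) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(total, in[i]);
}

// Hypothetical host helper: sums n copies of the pair (x, y) on the device.
float2 sum_constant_pairs(float x, float y, int n) {
    std::vector<__half2> h(n, __floats2half2_rn(x, y));
    __half2 *d_in = nullptr, *d_total = nullptr;
    cudaMalloc(&d_in, n * sizeof(__half2));
    cudaMalloc(&d_total, sizeof(__half2));
    cudaMemcpy(d_in, h.data(), n * sizeof(__half2), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(__half2));  // (+0.0, +0.0)
    sum_pairs<<<(n + 255) / 256, 256>>>(d_in, n, d_total);
    __half2 total;
    cudaMemcpy(&total, d_total, sizeof(__half2), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_total);
    return make_float2(__low2float(total), __high2float(total));
}
```

The order of the additions is not deterministic. The test values are chosen so that every partial sum is exact in binary16, which makes the final result independent of that order. With general data, the rounding can differ from run to run.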

__host__ __device__ __half2 operator*(const __half2 &lh, const __half2 &rh)

Performs packed half multiplication.

See also

__hmul2(__half2, __half2)

__host__ __device__ __half2 &operator*=(__half2 &lh, const __half2 &rh)

Performs packed half compound assignment with multiplication.

See also

__hmul2(__half2, __half2)

__host__ __device__ __half2 operator+(const __half2 &h): 实现 packed half 一元加运算符，返回输入值。

__host__ __device__ __half2 operator+(const __half2 &lh, const __half2 &rh)

Performs packed half addition.

See also

__hadd2(__half2, __half2)

__host__ __device__ __half2 operator++(__half2 &h, const int ignored)

Performs packed half postfix increment.

See also

__hadd2(__half2, __half2)

__host__ __device__ __half2 &operator++(__half2 &h)

Performs packed half prefix increment.

See also

__hadd2(__half2, __half2)

__host__ __device__ __half2 &operator+=(__half2 &lh, const __half2 &rh)

Performs packed half compound assignment with addition.

See also

__hadd2(__half2, __half2)

__host__ __device__ __half2 operator-(const __half2 &h)

Implements the packed half unary minus operator.

See also

__hneg2(__half2)

__host__ __device__ __half2 operator-(const __half2 &lh, const __half2 &rh)

Performs packed half subtraction.

See also

__hsub2(__half2, __half2)

__host__ __device__ __half2 &operator--(__half2 &h)

Performs packed half prefix decrement.

See also

__hsub2(__half2, __half2)

__host__ __device__ __half2 operator--(__half2 &h, const int ignored)

Performs packed half postfix decrement.

See also

__hsub2(__half2, __half2)

__host__ __device__ __half2 &operator-=(__half2 &lh, const __half2 &rh)

Performs packed half compound assignment with subtraction.

See also

__hsub2(__half2, __half2)

__host__ __device__ __half2 operator/(const __half2 &lh, const __half2 &rh)

Performs packed half division.

See also

__h2div(__half2, __half2)

__host__ __device__ __half2 &operator/=(__half2 &lh, const __half2 &rh)

Performs packed half compound assignment with division.

See also

__h2div(__half2, __half2)
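The operators let packed-half code read like ordinary arithmetic. This sketch assumes that `__CUDA_NO_HALF2_OPERATORS__` is not defined and that, as their `__host__ __device__` signatures state, the operators are host-callable in your toolkit. The helper `operator_demo` is hypothetical.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>  // float2, make_float2

// Hypothetical helper built only from the packed-half operators listed above.
__host__ __device__ float2 operator_demo(float2 a, float2 b, float2 c, float2 d) {
    const __half2 ha = __floats2half2_rn(a.x, a.y);
    const __half2 hb = __floats2half2_rn(b.x, b.y);
    const __half2 hc = __floats2half2_rn(c.x, c.y);
    const __half2 hd = __floats2half2_rn(d.x, d.y);
    __half2 r = (ha * hb + hc) / hd;  // operator*, operator+, operator/
    ++r;                              // prefix increment: adds 1 to both lanes
    r = -r;                           // unary minus, same as __hneg2
    return make_float2(__low2float(r), __high2float(r));
}
```

With a = (1.5, 2), b = (2, -1), c = (1, 0.5), and d = (2, 0.5), the intermediate values are (3, -2), (4, -1.5), and (2, -3). After the increment and the negation, the result is (-3, 2). On the device, `ha * hb + hc` may be contracted into an fma. That cannot change these exact values, but it can change rounded ones; use the `_rn` functions when the separate rounding matters.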