Fixed Point Binary Functions

okk_bdc_fixed_point_packed_add

void okk_bdc_fixed_point_packed_add(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)

Perform element-wise addition of the source_0 and source_1 tensors for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) + src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) + src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data types of the destination, source_0 and source_1 tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_U8, U8_OP_U8_TO_U16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_U8 and U16_OP_U16_TO_U16.
rshift is in [0, 31].
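To check results on the host, the formula above can be modelled per element in C. The sketch below is a hypothetical reference model, not part of the OKKernel API: the `ref_*` names are made up for illustration, the sum is computed in 64-bit arithmetic, and the conversion of the result to the destination type (saturation or wrap-around) is deliberately left out because this section does not specify it. The right shift is written as a floor division by 2^rshift so that it behaves as an arithmetic shift for negative values on any C compiler.

```c
#include <stdint.h>

/* Arithmetic right shift (floor division by 2^rshift) that does not rely
   on the implementation-defined behaviour of >> on negative values. */
int64_t ref_floor_rsh(int64_t x, int rshift) {
    if (x >= 0) return x >> rshift;
    return -((-x - 1) >> rshift) - 1;
}

/* (x + 2^(rshift - 1)) RSH rshift if rshift > 0, otherwise x unchanged. */
int64_t ref_round_rsh(int64_t x, int rshift) {
    if (rshift <= 0) return x;
    return ref_floor_rsh(x + ((int64_t)1 << (rshift - 1)), rshift);
}

/* Hypothetical per-element model of okk_bdc_fixed_point_packed_add;
   narrowing to the destination data type is not modelled. */
int64_t ref_packed_add(int64_t src0, int64_t src1, int rshift) {
    return ref_round_rsh(src0 + src1, rshift);
}
```

For example, with rshift = 1 the sum 3 + 4 = 7 becomes (7 + 1) RSH 1 = 4, i.e. the halved sum rounded half up.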

okk_bdc_fixed_point_packed_add_C

void okk_bdc_fixed_point_packed_add_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)

Perform element-wise addition of the source tensor and a constant value for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) + C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) + C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to add.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data types of the destination and source tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
C is in [-128, 127] if its data type is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_U8, U8_OP_U8_TO_U16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_U8 and U16_OP_U16_TO_U16.
rshift is in [0, 31].
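The constant C must lie in the range of its data type, and rshift must lie in [0, 31]. A host-side check along the following lines could catch out-of-range scalar arguments before the instruction is issued. This is a hypothetical helper: the `ref_dtype_t` enum and the function names are illustrative only and are not part of OKKernel, which has its own data-type definitions.

```c
#include <stdbool.h>

/* Illustrative data-type tags; OKKernel's own type enum is not used here. */
typedef enum { REF_INT8, REF_UINT8, REF_INT16, REF_UINT16 } ref_dtype_t;

/* Returns true if C fits the range documented for its data type. */
bool ref_const_in_range(int C, ref_dtype_t dtype) {
    switch (dtype) {
    case REF_INT8:   return C >= -128 && C <= 127;
    case REF_UINT8:  return C >= 0 && C <= 255;
    case REF_INT16:  return C >= -32768 && C <= 32767;
    case REF_UINT16: return C >= 0 && C <= 65535;
    }
    return false;
}

/* Checks the scalar arguments of the constant add/sub variants. */
bool ref_check_add_C_args(int C, ref_dtype_t c_dtype, int rshift) {
    return ref_const_in_range(C, c_dtype) && rshift >= 0 && rshift <= 31;
}
```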

okk_bdc_fixed_point_packed_sub

void okk_bdc_fixed_point_packed_sub(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)

Perform element-wise subtraction of the source_1 tensor from the source_0 tensor for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) - src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) - src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data types of the destination, source_0 and source_1 tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_S8, U8_OP_U8_TO_S16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_S8 and U16_OP_U16_TO_S16.
rshift is in [0, 31].
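Unlike addition, the op_type list for subtraction pairs unsigned inputs with signed results (for example U8_OP_U8_TO_S8 rather than U8_OP_U8_TO_U8), presumably because the difference of two unsigned values can be negative. The hypothetical reference model below, which is not part of the OKKernel API, evaluates the formula in 64-bit arithmetic; narrowing to the destination type is not modelled.

```c
#include <stdint.h>

/* Floor division by 2^rshift (arithmetic right shift). */
int64_t ref_floor_rsh(int64_t x, int rshift) {
    if (x >= 0) return x >> rshift;
    return -((-x - 1) >> rshift) - 1;
}

/* (x + 2^(rshift - 1)) RSH rshift if rshift > 0, otherwise x unchanged. */
int64_t ref_round_rsh(int64_t x, int rshift) {
    if (rshift <= 0) return x;
    return ref_floor_rsh(x + ((int64_t)1 << (rshift - 1)), rshift);
}

/* Hypothetical per-element model of okk_bdc_fixed_point_packed_sub. */
int64_t ref_packed_sub(int64_t src0, int64_t src1, int rshift) {
    return ref_round_rsh(src0 - src1, rshift);
}
```

For instance, two uint8 inputs 200 and 255 give -55, which only a signed destination can hold.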

okk_bdc_fixed_point_packed_sub_C

void okk_bdc_fixed_point_packed_sub_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)

Perform element-wise subtraction of a constant value from the source tensor for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) - C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) - C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to subtract.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data types of the destination and source tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
C is in [-128, 127] if its data type is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_S8, U8_OP_U8_TO_S16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_S8 and U16_OP_U16_TO_S16.
rshift is in [0, 31].

okk_bdc_fixed_point_packed_C_sub

void okk_bdc_fixed_point_packed_C_sub(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)

Perform element-wise subtraction of the source tensor from a constant value for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (C - src(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = C - src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value from which the elements of the source tensor are subtracted.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data types of the destination and source tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
C is in [-128, 127] if its data type is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_S8, U8_OP_U8_TO_S16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_S8 and U16_OP_U16_TO_S16.
rshift is in [0, 31].
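okk_bdc_fixed_point_packed_sub_C computes src - C, while okk_bdc_fixed_point_packed_C_sub computes C - src. The hypothetical host-side models below (not OKKernel functions; narrowing to the destination type is not modelled) make the operand order explicit.

```c
#include <stdint.h>

/* Floor division by 2^rshift (arithmetic right shift). */
int64_t ref_floor_rsh(int64_t x, int rshift) {
    if (x >= 0) return x >> rshift;
    return -((-x - 1) >> rshift) - 1;
}

/* (x + 2^(rshift - 1)) RSH rshift if rshift > 0, otherwise x unchanged. */
int64_t ref_round_rsh(int64_t x, int rshift) {
    if (rshift <= 0) return x;
    return ref_floor_rsh(x + ((int64_t)1 << (rshift - 1)), rshift);
}

/* Hypothetical model of okk_bdc_fixed_point_packed_sub_C: src - C. */
int64_t ref_sub_C(int64_t src, int64_t C, int rshift) {
    return ref_round_rsh(src - C, rshift);
}

/* Hypothetical model of okk_bdc_fixed_point_packed_C_sub: C - src. */
int64_t ref_C_sub(int64_t src, int64_t C, int rshift) {
    return ref_round_rsh(C - src, rshift);
}
```

With rshift > 0 the two results are not simply negations of each other, because the rounding term 2^(rshift-1) is always added: with src = 10, C = 3 and rshift = 1, 7 rounds to 4 but -7 rounds to -3.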

okk_bdc_fixed_point_packed_mul

void okk_bdc_fixed_point_packed_mul(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)

Perform element-wise multiplication of the source_0 and source_1 tensors for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) \times src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data type of the source_0 and source_1 tensors is int8 or uint8, and the data type of the destination tensor is int8, uint8, int16 or uint16; the data types must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_U8 and U8_OP_U8_TO_U16.
rshift is in [0, 31].
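Because both operands are 8-bit, every exact product fits in 16 bits: the extremes are (-128) × (-128) = 16384, (-128) × 255 = -32640 and 255 × 255 = 65025. An int16 or uint16 destination of matching signedness can therefore hold the unshifted product, while an 8-bit destination will generally need rshift to bring the result back into range. The hypothetical model below (not part of OKKernel; narrowing is not modelled) evaluates one element.

```c
#include <stdint.h>

/* Floor division by 2^rshift (arithmetic right shift). */
int64_t ref_floor_rsh(int64_t x, int rshift) {
    if (x >= 0) return x >> rshift;
    return -((-x - 1) >> rshift) - 1;
}

/* (x + 2^(rshift - 1)) RSH rshift if rshift > 0, otherwise x unchanged. */
int64_t ref_round_rsh(int64_t x, int rshift) {
    if (rshift <= 0) return x;
    return ref_floor_rsh(x + ((int64_t)1 << (rshift - 1)), rshift);
}

/* Hypothetical per-element model of okk_bdc_fixed_point_packed_mul. */
int64_t ref_packed_mul(int64_t src0, int64_t src1, int rshift) {
    return ref_round_rsh(src0 * src1, rshift);
}
```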

okk_bdc_fixed_point_packed_mul_C

void okk_bdc_fixed_point_packed_mul_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)

Perform element-wise multiplication of the source tensor and a constant value for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) \times C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) \times C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to multiply.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data type of the source tensor is int8 or uint8, and the data type of the destination tensor is int8, uint8, int16 or uint16; the data types must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
C is in [-128, 127] if its data type is int8, and [0, 255] if uint8.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_U8 and U8_OP_U8_TO_U16.
rshift is in [0, 31].
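By the formula above, a constant multiplier combined with rshift approximates scaling by the fraction C / 2^rshift, rounded half up; for example, C = 3 and rshift = 2 scale by 0.75. The model below is a hypothetical host-side sketch for checking such results, not an OKKernel function; narrowing to the destination type is not modelled.

```c
#include <stdint.h>

/* Floor division by 2^rshift (arithmetic right shift). */
int64_t ref_floor_rsh(int64_t x, int rshift) {
    if (x >= 0) return x >> rshift;
    return -((-x - 1) >> rshift) - 1;
}

/* (x + 2^(rshift - 1)) RSH rshift if rshift > 0, otherwise x unchanged. */
int64_t ref_round_rsh(int64_t x, int rshift) {
    if (rshift <= 0) return x;
    return ref_floor_rsh(x + ((int64_t)1 << (rshift - 1)), rshift);
}

/* Hypothetical per-element model of okk_bdc_fixed_point_packed_mul_C:
   approximately src * C / 2^rshift, rounded half up. */
int64_t ref_packed_mul_C(int64_t src, int64_t C, int rshift) {
    return ref_round_rsh(src * C, rshift);
}
```

With C = 3 and rshift = 2, an input of 5 gives 3.75, which rounds to 4, and an input of -5 gives -3.75, which rounds to -4.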

okk_bdc_fixed_point_packed_mac

void okk_bdc_fixed_point_packed_mac(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, bool is_origin_dst_signed, op_type_t op_type, int lshift, int rshift)

Perform element-wise multiply-accumulation of the source_0 and source_1 tensors into the destination tensor for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (dst(n, c, h, w) \times 2^{lshift} + src\_0(n, c, h, w)\times src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = dst(n, c, h, w) \times 2^{lshift} + src\_0(n, c, h, w)\times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
is_origin_dst_signed – Whether the original destination tensor is signed: true means int16, false means uint16.
op_type – Operation type.
lshift – Number of bits by which the original elements of the destination tensor are left-shifted.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data type of the source_0 and source_1 tensors is int8 or uint8, and the data type of the destination tensor is int16 or uint16; the data types must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
If is_origin_dst_signed is true, the valid choices of op_type are S8_OP_S8_TO_S16, S8_OP_U8_TO_S16 and U8_OP_S8_TO_S16; otherwise, they are S8_OP_S8_TO_S16, S8_OP_U8_TO_S16, U8_OP_S8_TO_S16 and U8_OP_U8_TO_U16.
lshift is in [0, 14], rshift is in [0, 31].
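The lshift parameter rescales the existing destination value before the new product is added, so repeated calls can accumulate a sum of products. The sketch below models one element in 64-bit arithmetic. It is hypothetical and not part of OKKernel: the names are illustrative, the left shift is written as a multiplication to avoid undefined behaviour on negative values in C, and narrowing to the int16/uint16 destination is not modelled.

```c
#include <stdint.h>

/* Floor division by 2^rshift (arithmetic right shift). */
int64_t ref_floor_rsh(int64_t x, int rshift) {
    if (x >= 0) return x >> rshift;
    return -((-x - 1) >> rshift) - 1;
}

/* (x + 2^(rshift - 1)) RSH rshift if rshift > 0, otherwise x unchanged. */
int64_t ref_round_rsh(int64_t x, int rshift) {
    if (rshift <= 0) return x;
    return ref_floor_rsh(x + ((int64_t)1 << (rshift - 1)), rshift);
}

/* Hypothetical per-element model of okk_bdc_fixed_point_packed_mac. */
int64_t ref_packed_mac(int64_t dst, int64_t src0, int64_t src1,
                       int lshift, int rshift) {
    int64_t acc = dst * ((int64_t)1 << lshift) + src0 * src1;
    return ref_round_rsh(acc, rshift);
}

/* Accumulates a dot product by repeated MAC with lshift = rshift = 0. */
int64_t ref_dot_by_mac(const int8_t *a, const int8_t *b, int n) {
    int64_t dst = 0;
    for (int i = 0; i < n; ++i)
        dst = ref_packed_mac(dst, a[i], b[i], 0, 0);
    return dst;
}
```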

okk_bdc_fixed_point_packed_mac_C

void okk_bdc_fixed_point_packed_mac_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, bool is_origin_dst_signed, op_type_t op_type, int lshift, int rshift)

Perform element-wise multiply-accumulation of the source tensor and a constant value into the destination tensor for fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (dst(n, c, h, w) \times 2^{lshift} + src(n, c, h, w)\times C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = dst(n, c, h, w) \times 2^{lshift} + src(n, c, h, w)\times C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to multiply.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
is_origin_dst_signed – Whether the original destination tensor is signed: true means int16, false means uint16.
op_type – Operation type.
lshift – Number of bits by which the original elements of the destination tensor are left-shifted.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data type of the source tensor is int8 or uint8, and the data type of the destination tensor is int16 or uint16; the data types must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
C is in [-128, 127] if its data type is int8, and [0, 255] if uint8.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
If is_origin_dst_signed is true, the valid choices of op_type are S8_OP_S8_TO_S16, S8_OP_U8_TO_S16 and U8_OP_S8_TO_S16; otherwise, they are S8_OP_S8_TO_S16, S8_OP_U8_TO_S16, U8_OP_S8_TO_S16 and U8_OP_U8_TO_U16.
lshift is in [0, 14], rshift is in [0, 31].

okk_bdc_fixed_point_packed_max

void okk_bdc_fixed_point_packed_max(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type)

Compute the element-wise maximum of the source_0 and source_1 tensors for fixed-point data types.

\[dst(n, c, h, w) = \max(src\_0(n, c, h, w), src\_1(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.

Remarks

The data types of the destination, source_0 and source_1 tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, U8_OP_U8_TO_U8, S16_OP_S16_TO_S16 and U16_OP_U16_TO_U16.

okk_bdc_fixed_point_packed_max_C

void okk_bdc_fixed_point_packed_max_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type)

Compute the element-wise maximum of the source tensor and a constant value for fixed-point data types.

\[dst(n, c, h, w) = \max(src(n, c, h, w), C)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.

Remarks

The data types of the destination and source tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
C is in [-128, 127] if its data type is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, U8_OP_U8_TO_U8, S16_OP_S16_TO_S16 and U16_OP_U16_TO_U16.

okk_bdc_fixed_point_packed_min

void okk_bdc_fixed_point_packed_min(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type)

Compute the element-wise minimum of the source_0 and source_1 tensors for fixed-point data types.

\[dst(n, c, h, w) = \min(src\_0(n, c, h, w), src\_1(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.

Remarks

The data types of the destination, source_0 and source_1 tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, U8_OP_U8_TO_U8, S16_OP_S16_TO_S16 and U16_OP_U16_TO_U16.

okk_bdc_fixed_point_packed_min_C

void okk_bdc_fixed_point_packed_min_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type)

Compute the element-wise minimum of the source tensor and a constant value for fixed-point data types.

\[dst(n, c, h, w) = \min(src(n, c, h, w), C)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.

Remarks

The data types of the destination and source tensors can be int8, uint8, int16 or uint16, and must match op_type.
A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if it is int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
C is in [-128, 127] if its data type is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, U8_OP_U8_TO_U8, S16_OP_S16_TO_S16 and U16_OP_U16_TO_U16.
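Applying okk_bdc_fixed_point_packed_max_C and then okk_bdc_fixed_point_packed_min_C clamps a tensor to [lo, hi]; with lo = 0 the max step alone acts as a ReLU. The host-side model below mirrors that two-step sequence per element. It is a hypothetical sketch, not an OKKernel function, and assumes lo <= hi and that both bounds satisfy the C range constraints above.

```c
#include <stdint.h>

/* Per-element models of max_C and min_C. */
int32_t ref_max_C(int32_t src, int32_t C) { return src > C ? src : C; }
int32_t ref_min_C(int32_t src, int32_t C) { return src < C ? src : C; }

/* max_C with lo, then min_C with hi: clamps src to [lo, hi]. */
int32_t ref_clamp(int32_t src, int32_t lo, int32_t hi) {
    return ref_min_C(ref_max_C(src, lo), hi);
}
```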

okk_bdc_fixed_point_packed_16bit_arithmetic_shift

void okk_bdc_fixed_point_packed_16bit_arithmetic_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform an element-wise arithmetic shift of the source_0 tensor by the amounts in the source_1 tensor for the 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is int16.
The destination, source_0 and source_1 tensors are in the 2N-mode.
The elements of the source_1 tensor are in [-16, 16]; a positive value shifts left and a negative value shifts right.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_logical_shift

void okk_bdc_fixed_point_packed_16bit_logical_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform an element-wise logical shift of the source_0 tensor by the amounts in the source_1 tensor for the 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination and source_0 tensors is uint16, and the data type of the source_1 tensor is int16.
The destination, source_0 and source_1 tensors are in the 2N-mode.
The elements of the source_1 tensor are in [-16, 16]; a positive value shifts left and a negative value shifts right.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
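The arithmetic and logical variants differ for right shifts of values whose top bit is set: the arithmetic shift replicates the sign bit of an int16 value, while the logical shift of a uint16 value fills with zeros. The sketch below models one element of each. It is hypothetical and not part of OKKernel; left-shift results are returned in 32 bits, without modelling how they are narrowed to 16 bits, which this section does not specify.

```c
#include <stdint.h>

/* Arithmetic shift of an int16 value: left if amount > 0, else right by
   -amount with the sign bit replicated (floor division by 2^-amount). */
int32_t ref_arith_shift16(int16_t x, int amount) {
    if (amount > 0) return (int32_t)x * (1 << amount);
    int r = -amount;
    int32_t v = x;
    return v >= 0 ? (v >> r) : -((-v - 1) >> r) - 1;
}

/* Logical shift of a uint16 value: left if amount > 0, else right by
   -amount with zeros shifted in. */
uint32_t ref_logic_shift16(uint16_t x, int amount) {
    if (amount > 0) return (uint32_t)x << amount;
    return (uint32_t)x >> -amount;
}
```

For the bit pattern 0x8000, a right shift by 15 yields -1 when the value is read as int16 (-32768) and 1 when it is read as uint16 (32768).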

okk_bdc_fixed_point_packed_16bit_arithmetic_shift_C

void okk_bdc_fixed_point_packed_16bit_arithmetic_shift_C(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform an element-wise arithmetic shift of the source tensor by a constant amount for the 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to shift by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is int16.
The destination and source tensors are in the 2N-mode.
The constant value C is in [-16, 16]; a positive value shifts left and a negative value shifts right.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_logical_shift_C

void okk_bdc_fixed_point_packed_16bit_logical_shift_C(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform an element-wise logical shift of the source tensor by a constant amount for the 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to shift by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is uint16.
The destination and source tensors are in the 2N-mode.
The constant value C is in [-16, 16]; a positive value shifts left and a negative value shifts right.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_C_arithmetic_shift

void okk_bdc_fixed_point_packed_16bit_C_arithmetic_shift(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform an element-wise arithmetic shift of a constant value by the amounts in the source tensor for the 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to be shifted.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is int16.
The destination and source tensors are in the 2N-mode.
The elements of the source tensor are in [-16, 16]; a positive value shifts left and a negative value shifts right.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_C_logical_shift

void okk_bdc_fixed_point_packed_16bit_C_logical_shift(local_addr_t dst_addr, local_addr_t src_addr, unsigned short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform an element-wise logical shift of a constant value by the amounts in the source tensor for the 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to be shifted.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination tensor is uint16, and the data type of the source tensor is int16.
The destination and source tensors are in the 2N-mode.
The elements of the source tensor are in [-16, 16]; a positive value shifts left and a negative value shifts right.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
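In the constant-shifted variants the roles are swapped: C is the value being shifted and the source tensor supplies the per-element shift amounts. By the formula, C = 1 with the logical variant yields 2^src for src in [0, 15], which could serve as a per-element power-of-two table. The models below are hypothetical host-side sketches, not OKKernel functions; left-shift results are kept in 32 bits because narrowing to 16 bits is not specified here.

```c
#include <stdint.h>

/* Logical variant: C LSH src if src > 0, otherwise C RSH -src. */
uint32_t ref_C_logic_shift16(uint16_t C, int16_t src) {
    if (src > 0) return (uint32_t)C << src;
    return (uint32_t)C >> -src;
}

/* Arithmetic variant: as above, but the right shift replicates the sign
   bit of the int16 constant (floor division by 2^-src). */
int32_t ref_C_arith_shift16(int16_t C, int16_t src) {
    if (src > 0) return (int32_t)C * (1 << src);
    int r = -src;
    int32_t v = C;
    return v >= 0 ? (v >> r) : -((-v - 1) >> r) - 1;
}
```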

okk_bdc_fixed_point_packed_16bit_mul_8bit

void okk_bdc_fixed_point_packed_16bit_mul_8bit(local_addr_t dst_addr, local_addr_t src0_high_addr, local_addr_t src0_low_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_high_stride, const dim4 *src0_low_stride, const dim4 *src1_stride, mul_type_t mul_type, int rshift)

Perform element-wise multiplication of the source_0 (16-bit) and source_1 (8-bit) tensors for mixed fixed-point data types.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) \times src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_high_addr – Address of the source_0_high tensor.
src0_low_addr – Address of the source_0_low tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0_high, source_0_low and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_high_stride – Pointer to the stride of the source_0_high tensor.
src0_low_stride – Pointer to the stride of the source_0_low tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
mul_type – Multiplication type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data type of the source_0 tensor is int16 or uint16, the data type of the source_1 tensor is int8 or uint8, and the data type of the destination tensor is int16 or uint16; the data types must match mul_type.
The source_0_high and source_0_low tensors store the most and least significant 8 bits, respectively, of the elements of the source_0 tensor (see okk_bdc_fixed_point_packed_16bit_split_high_8bit() and okk_bdc_fixed_point_packed_16bit_split_low_8bit()).
The destination tensor is in the 2N-mode, and the source_0_high, source_0_low and source_1 tensors are in the 4N-mode.
The destination, source_0_high, source_0_low and source_1 tensors start at the first NPU.
dst_addr, src0_high_addr, src0_low_addr and src1_addr must be divisible by 4; multiples of 128 are preferred.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_high_stride, src0_low_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of mul_type are S16_MUL_S8_TO_S16, U16_MUL_S8_TO_S16 and U16_MUL_U8_TO_U16. S16_MUL_U8_TO_S16 is not supported on BM1684 for unspecified reasons.
rshift is in [0, 31].
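Splitting source_0 into two 8-bit halves works because multiplication distributes over the split: with src_0 = high × 2^8 + low, the product is high × src_1 × 2^8 + low × src_1. The sketch below checks that identity on the host. It assumes, without confirmation from this section, that the low half is the unsigned least significant byte and that the high half carries the sign of an int16 value; the exact behaviour of okk_bdc_fixed_point_packed_16bit_split_high_8bit() and okk_bdc_fixed_point_packed_16bit_split_low_8bit() should be checked in their own documentation. All names are hypothetical, and narrowing of the result to 16 bits is not modelled.

```c
#include <stdint.h>

/* Splits a 16-bit value (passed as its int32 value, signed or unsigned)
   so that x == high * 256 + low with 0 <= low <= 255. */
void ref_split16(int32_t x, int32_t *high, int32_t *low) {
    *low = (int32_t)((uint32_t)x & 0xFFu);
    *high = (x - *low) / 256; /* exact: x - low is a multiple of 256 */
}

/* Floor division by 2^rshift (arithmetic right shift). */
int64_t ref_floor_rsh(int64_t x, int rshift) {
    if (x >= 0) return x >> rshift;
    return -((-x - 1) >> rshift) - 1;
}

/* (x + 2^(rshift - 1)) RSH rshift if rshift > 0, otherwise x unchanged. */
int64_t ref_round_rsh(int64_t x, int rshift) {
    if (rshift <= 0) return x;
    return ref_floor_rsh(x + ((int64_t)1 << (rshift - 1)), rshift);
}

/* Recombines the two partial products, then applies the rounding shift. */
int64_t ref_mul_16x8(int32_t src0, int32_t src1, int rshift) {
    int32_t high, low;
    ref_split16(src0, &high, &low);
    int64_t prod = (int64_t)high * src1 * 256 + (int64_t)low * src1;
    return ref_round_rsh(prod, rshift);
}
```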