FP32 Binary Functions

okk_bdc_add

void okk_bdc_add(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform addition of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) + src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
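The formula and stride parameters above can be illustrated with a host-side reference model. This is only a sketch under stated assumptions: `ref_dim4`, `ref_offset` and `ref_add` are hypothetical names that are not part of the library, strides are taken to be counted in elements, and the buffers are plain host memory rather than the device's local memory. The NULL-stride case is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dim4; the field names follow the shape->n,
   shape->c, shape->h and shape->w accesses used in the Remarks. */
typedef struct { int n, c, h, w; } ref_dim4;

/* Element offset of (n, c, h, w) for a stride expressed in elements. */
static size_t ref_offset(const ref_dim4 *stride, int n, int c, int h, int w) {
    return (size_t)n * (size_t)stride->n + (size_t)c * (size_t)stride->c +
           (size_t)h * (size_t)stride->h + (size_t)w * (size_t)stride->w;
}

/* Host-side model of dst = src_0 + src_1 over flat float buffers.
   Assumption: plain host memory, not the device's local memory. */
static void ref_add(float *dst, const float *src0, const float *src1,
                    const ref_dim4 *shape, const ref_dim4 *dst_stride,
                    const ref_dim4 *src0_stride, const ref_dim4 *src1_stride) {
    /* The NULL-stride (128-Byte Aligned Layout) case is not modelled here. */
    assert(shape && dst_stride && src0_stride && src1_stride);
    for (int n = 0; n < shape->n; ++n)
        for (int c = 0; c < shape->c; ++c)
            for (int h = 0; h < shape->h; ++h)
                for (int w = 0; w < shape->w; ++w)
                    dst[ref_offset(dst_stride, n, c, h, w)] =
                        src0[ref_offset(src0_stride, n, c, h, w)] +
                        src1[ref_offset(src1_stride, n, c, h, w)];
}
```

Because each tensor has its own stride, the destination may use a padded layout while the sources stay contiguous; only the shape is shared.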

okk_bdc_add_C

void okk_bdc_add_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform addition of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) + C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to add.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
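The address and shape constraints listed in the Remarks can be checked on the host before issuing a call. The sketch below is an assumption-laden illustration: `ref_dim4` and the `ref_*` helpers are hypothetical, and `uint32_t` is assumed as a stand-in for `local_addr_t`. The "start at the same NPU" requirement is not checked, since it depends on the local memory map, which this section does not describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for dim4. */
typedef struct { int n, c, h, w; } ref_dim4;

/* Required alignment: the address is divisible by 4. */
static bool ref_addr_ok(uint32_t addr) { return addr % 4u == 0u; }

/* Preferred alignment: the address is divisible by 128. */
static bool ref_addr_preferred(uint32_t addr) { return addr % 128u == 0u; }

/* Documented ranges: n, h, w in [1, 65535] and c in [1, 4095]. */
static bool ref_shape_ok(const ref_dim4 *s) {
    return s->n >= 1 && s->n <= 65535 &&
           s->h >= 1 && s->h <= 65535 &&
           s->w >= 1 && s->w <= 65535 &&
           s->c >= 1 && s->c <= 4095;
}

/* Debug-build check for a two-operand call such as okk_bdc_add_C. */
static void ref_debug_check(uint32_t dst_addr, uint32_t src_addr,
                            const ref_dim4 *shape) {
    assert(ref_addr_ok(dst_addr) && ref_addr_ok(src_addr));
    assert(ref_shape_ok(shape));
}
```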

okk_bdc_sub

void okk_bdc_sub(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform subtraction of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) - src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_sub_C

void okk_bdc_sub_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Subtract a constant value from the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) - C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to subtract.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_C_sub

void okk_bdc_C_sub(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Subtract the elements of the source tensor from a constant value for fp32 data type.

\[dst(n, c, h, w) = C - src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to subtract from.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
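okk_bdc_sub_C and okk_bdc_C_sub share a signature and differ only in operand order, which is easy to mix up. The following host-side sketch contrasts the two formulas on contiguous buffers; `ref_sub_C` and `ref_C_sub` are hypothetical reference helpers, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the okk_bdc_sub_C formula: dst[i] = src[i] - C. */
static void ref_sub_C(float *dst, const float *src, float C, size_t count) {
    assert(dst && src);
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] - C;
}

/* Model of the okk_bdc_C_sub formula: dst[i] = C - src[i]. */
static void ref_C_sub(float *dst, const float *src, float C, size_t count) {
    assert(dst && src);
    for (size_t i = 0; i < count; ++i)
        dst[i] = C - src[i];
}
```

For the same inputs the two results are negations of each other, so `ref_C_sub` with C = 1 yields the complement 1 - x, while `ref_sub_C` yields x - 1.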

okk_bdc_mul

void okk_bdc_mul(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform multiplication of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_mul_C

void okk_bdc_mul_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform multiplication of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) \times C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to multiply by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_div

void okk_bdc_div(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform division of the elements of the source_0 tensor by the elements of the source_1 tensor for fp32 data type.

\[dst(n, c, h, w) = \frac{src\_0(n, c, h, w)}{src\_1(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_div_C

void okk_bdc_div_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform division of the elements of the source tensor by a constant value for fp32 data type.

\[dst(n, c, h, w) = \frac{src(n, c, h, w)}{C}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to divide by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_C_div

void okk_bdc_C_div(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform division of a constant value by the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = \frac{C}{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to be divided.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_mac

void okk_bdc_mac(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform multiply accumulation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = dst(n, c, h, w) + src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
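Unlike the other functions in this section, the multiply-accumulate formula reads the destination as well as writing it, so the destination needs a defined value before the first call. The host-side sketch below models that behaviour on contiguous buffers; `ref_mac` is a hypothetical helper, and whether the device rounds the product before the addition is not stated here and is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the okk_bdc_mac formula: dst[i] = dst[i] + src0[i] * src1[i].
   dst is both an input and an output. */
static void ref_mac(float *dst, const float *src0, const float *src1,
                    size_t count) {
    assert(dst && src0 && src1);
    for (size_t i = 0; i < count; ++i)
        dst[i] = dst[i] + src0[i] * src1[i];
}
```

Calling the model repeatedly on a zero-initialised destination sums the element-wise products of successive operand pairs, which is the usual way an accumulation over several tiles is built up.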

okk_bdc_mac_C

void okk_bdc_mac_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform multiply accumulation of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = dst(n, c, h, w) + src(n, c, h, w) \times C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to multiply by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_max

void okk_bdc_max(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform maximum operation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = \max(src\_0(n, c, h, w), src\_1(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_max_C

void okk_bdc_max_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform maximum operation of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = \max(src(n, c, h, w), C)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_min

void okk_bdc_min(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform minimum operation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = \min(src\_0(n, c, h, w), src\_1(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_min_C

void okk_bdc_min_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform minimum operation of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = \min(src(n, c, h, w), C)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
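The max_C and min_C formulas compose into a clamp: a maximum against the lower bound followed by a minimum against the upper bound. The sketch below models this on the host with contiguous buffers. The `ref_*` helpers are hypothetical, NaN handling is not specified in this section and is not modelled, and the model reuses the destination buffer as the input of the second step, which is an assumption about host memory only; this section does not say whether the device permits dst_addr to equal src_addr.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the okk_bdc_max_C formula: dst[i] = max(src[i], C). */
static void ref_max_C(float *dst, const float *src, float C, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] > C ? src[i] : C;
}

/* Model of the okk_bdc_min_C formula: dst[i] = min(src[i], C). */
static void ref_min_C(float *dst, const float *src, float C, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] < C ? src[i] : C;
}

/* Clamp each element to [lo, hi] by chaining the two models. */
static void ref_clamp(float *dst, const float *src, float lo, float hi,
                      size_t count) {
    assert(dst && src && lo <= hi);
    ref_max_C(dst, src, lo, count); /* raise values below lo */
    ref_min_C(dst, dst, hi, count); /* lower values above hi */
}
```

With lo = 0 and a very large hi, the first step alone corresponds to the familiar ReLU activation.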

okk_bdc_greater_select_value

void okk_bdc_greater_select_value(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform a greater-than comparison of the elements of the source_0 and source_1 tensors for fp32 data type, then select a constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src\_0(n, c, h, w)>src\_1(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
select_val – Constant value to select.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the source_0 and source_1 tensors is fp32; the destination tensor may be of any 32-bit type.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
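The compare-and-select formula can be modelled on the host as shown below. This is a sketch under assumptions: `ref_x32` is a hypothetical stand-in for `x32`, assumed here to be a single 32-bit word that can be viewed as a float or an integer, and `ref_greater_select` is not a library function. Buffers are contiguous host memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for x32: one 32-bit word with several views. */
typedef union {
    float fp32;
    int32_t i32;
    uint32_t u32;
} ref_x32;

/* Model of the okk_bdc_greater_select_value formula:
   dst[i] = select_val if src0[i] > src1[i], otherwise all-zero bits. */
static void ref_greater_select(ref_x32 *dst, const float *src0,
                               const float *src1, ref_x32 select_val,
                               size_t count) {
    ref_x32 zero;
    zero.u32 = 0u;
    assert(dst && src0 && src1);
    for (size_t i = 0; i < count; ++i)
        dst[i] = (src0[i] > src1[i]) ? select_val : zero;
}
```

Passing a `select_val` whose float view is 1.0 produces a 0/1 mask in fp32 that can be fed to a later multiplication. The comparison is strict, so equal elements yield zero.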

okk_bdc_greater_C_select_value

void okk_bdc_greater_C_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform a greater-than comparison of the elements of the source tensor and a constant value for fp32 data type, then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src(n, c, h, w)>C,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
select_val – Constant value to select.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the source tensor is fp32; the destination tensor may be of any 32-bit type.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_C_greater_select_value

void okk_bdc_C_greater_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform a greater-than comparison of a constant value and the elements of the source tensor for fp32 data type, then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}C>src(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
select_val – Constant value to select.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the source tensor is fp32; the destination tensor may be of any 32-bit type.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_equal_select_value

void okk_bdc_equal_select_value(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform an equality comparison of the elements of the source_0 and source_1 tensors for fp32 data type, then select a constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src\_0(n, c, h, w)=src\_1(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
select_val – Constant value to select.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the source_0 and source_1 tensors is fp32; the destination tensor may be of any 32-bit type.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_equal_C_select_value

void okk_bdc_equal_C_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform an equality comparison of the elements of the source tensor and a constant value for fp32 data type, then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src(n, c, h, w)=C,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
select_val – Constant value to select.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the source tensor is fp32; the destination tensor may be of any 32-bit type.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535]; shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
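Since the destination only has to be some 32-bit type, the selected value can also be an integer bit pattern rather than a float. The host-side sketch below models the equal-to-constant formula with an unsigned destination; `ref_equal_C_select` is a hypothetical helper, the destination is assumed to be viewed as `uint32_t`, and buffers are contiguous host memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the okk_bdc_equal_C_select_value formula with an integer view
   of the destination: dst[i] = select_bits if src[i] == C, otherwise 0. */
static void ref_equal_C_select(uint32_t *dst, const float *src, float C,
                               uint32_t select_bits, size_t count) {
    assert(dst && src);
    for (size_t i = 0; i < count; ++i)
        dst[i] = (src[i] == C) ? select_bits : 0u;
}
```

The comparison in the formula is exact, so values that differ from C only by rounding error do not match; a tolerance test would need to be built from other operations.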