About Function Names¶

GDMA functions are named with the prefix okk_gdma_, and BDC functions are named with the prefix okk_bdc_.

Some Definitions¶

typedef unsigned int local_addr_t¶

typedef unsigned long long system_addr_t¶

typedef unsigned long long global_addr_t¶

typedef char s8x4[4]¶

typedef unsigned char u8x4[4]¶

typedef short s16x2[2]¶

typedef unsigned short u16x2[2]¶

union x32¶

    float fp32¶
    int s32¶
    unsigned int u32¶
    s8x4 _4N_s8¶
    u8x4 _4N_u8¶
    s16x2 _2N_s16¶
    u16x2 _2N_u16¶

union x16¶

    short s16¶
    unsigned short u16¶

union x8¶

    char s8¶
    unsigned char u8¶

class dim4¶

    int n¶
    int c¶
    int h¶
    int w¶

class dim2¶

    int h¶
    int w¶

enum op_type_t¶: Operation type of the fixed point binary operation.

enumerator S8_OP_S8_TO_S8 = 31¶: Value of int8 operates value of int8 to value of int8.

enumerator S8_OP_S8_TO_S16 = 23¶: Value of int8 operates value of int8 to value of int16.

enumerator S8_OP_U8_TO_S8 = 27¶: Value of int8 operates value of uint8 to value of int8.

enumerator S8_OP_U8_TO_S16 = 19¶: Value of int8 operates value of uint8 to value of int16.

enumerator U8_OP_S8_TO_S8 = 29¶: Value of uint8 operates value of int8 to value of int8.

enumerator U8_OP_S8_TO_S16 = 21¶: Value of uint8 operates value of int8 to value of int16.

enumerator U8_OP_U8_TO_S8 = 25¶: Value of uint8 operates value of uint8 to value of int8.

enumerator U8_OP_U8_TO_S16 = 17¶: Value of uint8 operates value of uint8 to value of int16.

enumerator U8_OP_U8_TO_U8 = 9¶: Value of uint8 operates value of uint8 to value of uint8.

enumerator U8_OP_U8_TO_U16 = 1¶: Value of uint8 operates value of uint8 to value of uint16.

enumerator S16_OP_S16_TO_S8 = 30¶: Value of int16 operates value of int16 to value of int8.

enumerator S16_OP_S16_TO_S16 = 22¶: Value of int16 operates value of int16 to value of int16.

enumerator S16_OP_U16_TO_S8 = 26¶: Value of int16 operates value of uint16 to value of int8.

enumerator S16_OP_U16_TO_S16 = 18¶: Value of int16 operates value of uint16 to value of int16.

enumerator U16_OP_S16_TO_S8 = 28¶: Value of uint16 operates value of int16 to value of int8.

enumerator U16_OP_S16_TO_S16 = 20¶: Value of uint16 operates value of int16 to value of int16.

enumerator U16_OP_U16_TO_S8 = 24¶: Value of uint16 operates value of uint16 to value of int8.

enumerator U16_OP_U16_TO_S16 = 16¶: Value of uint16 operates value of uint16 to value of int16.

enumerator U16_OP_U16_TO_U8 = 8¶: Value of uint16 operates value of uint16 to value of uint8.

enumerator U16_OP_U16_TO_U16 = 0¶: Value of uint16 operates value of uint16 to value of uint16.
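The numeric values of op_type_t listed above appear to follow a regular bit pattern. The source does not state this, so the decoding below is only an observation drawn from the listed numbers, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Observed from the listed values (NOT stated by the source):
   bit 0 set: both operands are 8-bit (clear: 16-bit)
   bit 1 set: first operand is signed
   bit 2 set: second operand is signed
   bit 3 set: result is 8-bit (clear: 16-bit)
   bit 4 set: result is signed */
bool op_operands_are_8bit(int op) { assert(op >= 0 && op <= 31); return (op & 0x01) != 0; }
bool op_first_is_signed(int op)   { assert(op >= 0 && op <= 31); return (op & 0x02) != 0; }
bool op_second_is_signed(int op)  { assert(op >= 0 && op <= 31); return (op & 0x04) != 0; }
bool op_result_is_8bit(int op)    { assert(op >= 0 && op <= 31); return (op & 0x08) != 0; }
bool op_result_is_signed(int op)  { assert(op >= 0 && op <= 31); return (op & 0x10) != 0; }

/* Hypothetical helper: rebuild the numeric value from the five properties. */
int op_encode(bool operands_8bit, bool first_signed, bool second_signed,
              bool result_8bit, bool result_signed) {
    return (operands_8bit ? 0x01 : 0) | (first_signed ? 0x02 : 0) |
           (second_signed ? 0x04 : 0) | (result_8bit ? 0x08 : 0) |
           (result_signed ? 0x10 : 0);
}
```

Under this reading, 20 decodes as an unsigned 16-bit first operand, a signed 16-bit second operand and a signed 16-bit result, which matches the name U16_OP_S16_TO_S16.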

enum mul_type_t¶: Type of the fixed point multiplication.

enumerator S16_MUL_S8_TO_S16 = 7¶: Value of int16 multiplies by value of int8 to value of int16.

enumerator U16_MUL_S8_TO_S16 = 6¶: Value of uint16 multiplies by value of int8 to value of int16.

enumerator U16_MUL_U8_TO_U16 = 0¶: Value of uint16 multiplies by value of uint8 to value of uint16.

enumerator S16_MUL_U8_TO_S16 = 5¶: Value of int16 multiplies by value of uint8 to value of int16.

DIV_UP¶: DIV_UP(a, b) (((a) - 1) / (b) + 1)

ALIGN¶: ALIGN(a, b) DIV_UP (a, b) * (b)
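As a quick check of the two macros above, the sketch below reproduces them and wraps them in two helper functions. The helper names are hypothetical, and the outer parentheses around ALIGN are an addition made for this sketch so that the macro composes safely inside larger expressions.

```c
#include <assert.h>

/* Same definitions as listed above; ALIGN gets outer parentheses here
   (an addition for this sketch, not part of the original listing). */
#define DIV_UP(a, b) (((a) - 1) / (b) + 1)
#define ALIGN(a, b) (DIV_UP(a, b) * (b))

/* Hypothetical helper: number of b-sized blocks needed to hold a items. */
int blocks_needed(int a, int b) {
    assert(a > 0 && b > 0); /* the macro is only meaningful for positive values */
    return DIV_UP(a, b);
}

/* Hypothetical helper: a rounded up to the next multiple of b. */
int round_up(int a, int b) {
    assert(a > 0 && b > 0);
    return ALIGN(a, b);
}
```

For example, 9 items in blocks of 4 need 3 blocks, and 9 rounded up to a multiple of 4 is 12. The positivity check matters: with C's truncating division, DIV_UP(0, b) evaluates to 1 rather than 0 when b > 1.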

Common Functions¶

okk_initialize¶

void okk_initialize()¶: Initialize device before calling GDMA and BDC functions.

okk_poll¶

void okk_poll()¶

Synchronize the device by waiting until all previous GDMA and BDC functions are done.

Remarks

The parallel mode must be inactive when this function is called.
This function blocks until all previous GDMA and BDC functions are done.

okk_parallel_start¶

void okk_parallel_start()¶

Start the parallel mode.

Remarks

The parallel mode must be inactive when this function is called.
After this call, the parallel mode is active, and subsequent GDMA and BDC functions run in parallel.

okk_parallel_end¶

void okk_parallel_end()¶

End the parallel mode.

Remarks

The parallel mode must be active when this function is called.
After this call, the parallel mode is inactive, and subsequent GDMA and BDC functions run serially.

okk_is_parallel_state¶

bool okk_is_parallel_state()¶

Get the flag of the current parallel mode.

Returns: Flag of the current parallel mode; true means active, false means inactive.
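The preconditions in the remarks of okk_poll, okk_parallel_start and okk_parallel_end can be read as a small state machine. The sketch below models it with mock functions; every mock_ name is a hypothetical stand-in written for this illustration, not the library implementation, and the call ordering shown is just one sequence consistent with the remarks.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock state for illustration only; this is NOT the library's state. */
static bool mock_parallel_active = false;

void mock_parallel_start(void) {
    assert(!mock_parallel_active); /* parallel mode must be inactive */
    mock_parallel_active = true;
}

void mock_parallel_end(void) {
    assert(mock_parallel_active);  /* parallel mode must be active */
    mock_parallel_active = false;
}

bool mock_is_parallel_state(void) {
    return mock_parallel_active;
}

void mock_poll(void) {
    assert(!mock_parallel_active); /* polling requires inactive parallel mode */
}

/* One ordering consistent with the remarks: start, issue work, end, poll.
   Returns the parallel flag observed after the sequence. */
bool mock_sequence(void) {
    mock_parallel_start();
    /* GDMA and BDC calls issued here would run in parallel. */
    mock_parallel_end();
    mock_poll();
    return mock_is_parallel_state();
}
```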

okk_local_mem_size_per_npu¶

unsigned int okk_local_mem_size_per_npu()¶

Get the size in bytes of local memory in each NPU.

Returns: Size of local memory per NPU.

okk_l2_sram_size¶

unsigned int okk_l2_sram_size()¶

Get the size in bytes of L2-SRAM.

Returns: Size of L2-SRAM.

okk_dtcm_size¶

unsigned int okk_dtcm_size()¶

Get the size in bytes of DTCM.

Returns: Size of DTCM.

okk_npu_num¶

int okk_npu_num()¶

Get the number of NPUs in each TPU.

Returns: Number of NPUs.

Utils Functions¶

okk_start_npu_index¶

int okk_start_npu_index(local_addr_t addr)¶

Calculate the index of the NPU where the tensor starts.

Parameters: addr – Address of the tensor in local memory.
Returns: Index of NPU.

okk_channle_num_per_npu¶

int okk_channle_num_per_npu(int start_idx, int num_channels)¶

Calculate the number of channels in each NPU.

Parameters

start_idx – Index of the NPU where the tensor starts.
num_channels – Number of channels of the tensor.

Returns

Number of channels per NPU.

okk_128_byte_aligned_stride_for_32bit¶

void okk_128_byte_aligned_stride_for_32bit(dim4 *stride, int start_idx, const dim4 *shape)¶

Calculate strides of the tensor in the 128-Byte Aligned Layout for 32-bit data type.

Parameters

stride[out] – Pointer to the stride of the tensor.
start_idx – Index of the NPU where the tensor starts.
shape – Pointer to the shape of the tensor.

okk_128_byte_aligned_stride_for_16bit¶

void okk_128_byte_aligned_stride_for_16bit(dim4 *stride, int start_idx, const dim4 *shape)¶

Calculate strides of the tensor in the 128-Byte Aligned Layout for 16-bit data type.

Parameters

stride[out] – Pointer to the stride of the tensor.
start_idx – Index of the NPU where the tensor starts.
shape – Pointer to the shape of the tensor.

okk_128_byte_aligned_stride_for_8bit¶

void okk_128_byte_aligned_stride_for_8bit(dim4 *stride, int start_idx, const dim4 *shape)¶

Calculate strides of the tensor in the 128-Byte Aligned Layout for 8-bit data type.

Parameters

stride[out] – Pointer to the stride of the tensor.
start_idx – Index of the NPU where the tensor starts.
shape – Pointer to the shape of the tensor.
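The three okk_128_byte_aligned_stride_for_* functions differ only in element width. The sketch below shows one plausible way such strides could be computed. It rests on assumptions that this page does not state: strides are counted in elements, each channel's h * w block is padded to a multiple of 128 bytes, and channels are distributed over the NPUs starting at start_idx. The type dim4_sketch, the function name and the npu_num parameter are hypothetical.

```c
#include <assert.h>

#define DIV_UP(a, b) (((a) - 1) / (b) + 1)
#define ALIGN(a, b) (DIV_UP(a, b) * (b))

/* Local mirror of the dim4 fields listed in Some Definitions. */
typedef struct { int n, c, h, w; } dim4_sketch;

/* ASSUMPTION: 128-Byte Aligned Layout pads each channel's h * w block to
   128 bytes and spreads channels over npu_num NPUs from start_idx. */
void aligned_stride_sketch(dim4_sketch *stride, int start_idx,
                           const dim4_sketch *shape,
                           int elem_bytes, int npu_num) {
    assert(elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4);
    assert(npu_num > 0 && start_idx >= 0 && start_idx < npu_num);
    int elems_per_128_bytes = 128 / elem_bytes;
    stride->w = 1;
    stride->h = shape->w;
    stride->c = ALIGN(shape->h * shape->w, elems_per_128_bytes);
    /* Channels that land on the same NPU are stacked one after another. */
    stride->n = DIV_UP(start_idx + shape->c, npu_num) * stride->c;
}
```

With a 3 x 3 spatial size and 32-bit elements, the nine elements of a channel are padded to 32 under these assumptions; with 8-bit elements they are padded to 128.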

okk_compact_stride¶

void okk_compact_stride(dim4 *stride, int start_idx, const dim4 *shape)¶

Calculate strides of the tensor in the Compact Layout.

Parameters

stride[out] – Pointer to the stride of the tensor.
start_idx – Index of the NPU where the tensor starts.
shape – Pointer to the shape of the tensor.

okk_continuous_stride¶

void okk_continuous_stride(dim4 *stride, const dim4 *shape)¶

Calculate strides of the tensor in the Continuous Layout.

Parameters

stride[out] – Pointer to the stride of the tensor.
shape – Pointer to the shape of the tensor.
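For orientation, a minimal sketch of what strides in a Continuous Layout would look like, assuming the usual dense packing with w varying fastest, then h, c and n, and strides counted in elements. The type dim4_sketch and the function name are hypothetical; this is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the dim4 fields listed in Some Definitions. */
typedef struct { int n, c, h, w; } dim4_sketch;

/* ASSUMPTION: Continuous Layout means densely packed elements with
   w fastest, then h, then c, then n. */
void continuous_stride_sketch(dim4_sketch *stride, const dim4_sketch *shape) {
    assert(stride != NULL && shape != NULL);
    stride->w = 1;
    stride->h = shape->w;
    stride->c = shape->h * shape->w;
    stride->n = shape->c * shape->h * shape->w;
}
```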

GDMA Functions¶

okk_gdma_32bit_cpy_S2L¶

void okk_gdma_32bit_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from system memory to local memory for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in local memory.
src_addr – Address of the source tensor in system memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 32-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.
If src_stride is NULL, the source tensor is in the Continuous Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_32bit_cpy_L2S¶

void okk_gdma_32bit_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from local memory to system memory for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in system memory.
src_addr – Address of the source tensor in local memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 32-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.
If src_stride is NULL, the source tensor is in the 128-Byte Aligned Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_32bit_cpy_L2L¶

void okk_gdma_32bit_cpy_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from local memory to local memory for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in local memory.
src_addr – Address of the source tensor in local memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 32-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.
If src_stride is NULL, the source tensor is in the 128-Byte Aligned Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_32bit_cpy_S2S¶

void okk_gdma_32bit_cpy_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from system memory to system memory for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in system memory.
src_addr – Address of the source tensor in system memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 32-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.
If src_stride is NULL, the source tensor is in the Continuous Layout.
dst_stride->w and src_stride->w are only allowed to be one.
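The formula dst(n, c, h, w) = src(n, c, h, w) with separate strides can be illustrated by a host-side reference model on flat buffers. This sketch only mirrors the indexing; it assumes strides are counted in elements, and dim4_sketch and cpy_32bit_reference are hypothetical names, not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the dim4 fields listed in Some Definitions. */
typedef struct { int n, c, h, w; } dim4_sketch;

/* Reference model of a strided 32-bit copy between two flat buffers.
   ASSUMPTION: strides are expressed in elements, not bytes. */
void cpy_32bit_reference(uint32_t *dst, const uint32_t *src,
                         const dim4_sketch *shape,
                         const dim4_sketch *dst_stride,
                         const dim4_sketch *src_stride) {
    /* The remarks require a w stride of one on both sides. */
    assert(dst_stride->w == 1 && src_stride->w == 1);
    for (int n = 0; n < shape->n; ++n)
        for (int c = 0; c < shape->c; ++c)
            for (int h = 0; h < shape->h; ++h)
                for (int w = 0; w < shape->w; ++w) {
                    size_t d = (size_t)n * dst_stride->n + (size_t)c * dst_stride->c +
                               (size_t)h * dst_stride->h + (size_t)w;
                    size_t s = (size_t)n * src_stride->n + (size_t)c * src_stride->c +
                               (size_t)h * src_stride->h + (size_t)w;
                    dst[d] = src[s];
                }
}
```

Using a destination h stride larger than shape->w leaves gaps between rows, which is how a tensor can be copied into a sub-region of a larger buffer.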

okk_gdma_32bit_matrix_S2L¶

void okk_gdma_32bit_matrix_S2L(local_addr_t dst_addr, system_addr_t src_addr, int rows, int cols, int cols_per_channel, int row_stride)¶

Copy matrix from system memory to local memory for 32-bit data type.

\[dst(x, y) = src(x, y)\]

Parameters

dst_addr – Address of the destination tensor in local memory.
src_addr – Address of the source tensor in system memory.
rows – Number of the rows of the matrix.
cols – Number of the columns of the matrix.
cols_per_channel – Number of the columns per channel of the destination matrix tensor.
row_stride – Stride of the row of the source matrix tensor.

Remarks

The destination tensor is in the Matrix Layout.
The elements of each row of the source matrix are continuous.
The data type of the destination and source tensors is 32-bit.
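The role of cols_per_channel can be sketched as follows, under the assumption (not stated on this page) that the Matrix Layout stores a rows x cols matrix as a tensor with n = rows, c = DIV_UP(cols, cols_per_channel), h = 1 and w = cols_per_channel. All names in the block are hypothetical.

```c
#include <assert.h>

#define DIV_UP(a, b) (((a) - 1) / (b) + 1)

/* Local mirror of the dim4 fields listed in Some Definitions. */
typedef struct { int n, c, h, w; } dim4_sketch;

/* ASSUMPTION: each matrix row becomes one n slice, and its columns are
   split into channels of cols_per_channel elements each. */
void matrix_layout_shape_sketch(dim4_sketch *shape, int rows, int cols,
                                int cols_per_channel) {
    assert(rows > 0 && cols > 0 && cols_per_channel > 0);
    shape->n = rows;
    shape->c = DIV_UP(cols, cols_per_channel);
    shape->h = 1;
    shape->w = cols_per_channel;
}

/* Channel that holds column col under the same assumption. */
int matrix_col_to_channel(int col, int cols_per_channel) {
    return col / cols_per_channel;
}

/* Position of column col inside its channel. */
int matrix_col_to_w(int col, int cols_per_channel) {
    return col % cols_per_channel;
}
```

Under this assumption, a 3 x 100 matrix with cols_per_channel = 32 occupies four channels, the last of which is only partly filled.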

okk_gdma_32bit_matrix_L2S¶

void okk_gdma_32bit_matrix_L2S(system_addr_t dst_addr, local_addr_t src_addr, int rows, int cols, int cols_per_channel, int row_stride)¶

Copy matrix from local memory to system memory for 32-bit data type.

\[dst(x, y) = src(x, y)\]

Parameters

dst_addr – Address of the destination tensor in system memory.
src_addr – Address of the source tensor in local memory.
rows – Number of the rows of the matrix.
cols – Number of the columns of the matrix.
cols_per_channel – Number of the columns per channel of the source matrix tensor.
row_stride – Stride of the row of the destination matrix tensor.

Remarks

The elements of each row of the destination matrix are continuous.
The source tensor is in the Matrix Layout.
The data type of the destination and source tensors is 32-bit.

okk_gdma_32bit_set_C_system¶

void okk_gdma_32bit_set_C_system(system_addr_t dst_addr, x32 C, const dim4 *shape, const dim4 *dst_stride)¶

Set all the elements of the destination tensor in system memory to be a constant value for 32-bit data type.

\[dst(n, c, h, w) = C\]

Parameters

dst_addr – Address of the destination tensor in system memory.
C – Constant value of 32-bit to set.
shape – Pointer to the shape of the destination tensor.
dst_stride – Pointer to the stride of the destination tensor.

Remarks

The data type of destination tensor is 32-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.

okk_gdma_32bit_set_C_local¶

void okk_gdma_32bit_set_C_local(local_addr_t dst_addr, x32 C, const dim4 *shape, const dim4 *dst_stride)¶

Set all the elements of the destination tensor in local memory to be a constant value for 32-bit data type.

\[dst(n, c, h, w) = C\]

Parameters

dst_addr – Address of the destination tensor in local memory.
C – Constant value of 32-bit to set.
shape – Pointer to the shape of the destination tensor.
dst_stride – Pointer to the stride of the destination tensor.

Remarks

The data type of destination tensor is 32-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.
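The constant C is passed as the union x32, so the same 32 bits can be supplied as a float or as an integer. The sketch below uses a reduced mirror of that union (x32_sketch, a hypothetical name with only three of the members) to show how a constant might be prepared. It assumes a platform with 32-bit int and IEEE 754 floats.

```c
#include <assert.h>

/* Reduced mirror of union x32 from Some Definitions (three members only). */
typedef union {
    float fp32;
    int s32;
    unsigned int u32;
} x32_sketch;

/* Hypothetical helper: constant whose 32 bits encode a float value. */
x32_sketch make_fp32_constant(float value) {
    x32_sketch c;
    c.fp32 = value;
    return c;
}

/* Hypothetical helper: constant whose 32 bits encode a signed integer. */
x32_sketch make_s32_constant(int value) {
    x32_sketch c;
    c.s32 = value;
    return c;
}

/* Compile-time check of the size assumption (C99-compatible idiom). */
typedef char x32_sketch_is_4_bytes[(sizeof(x32_sketch) == 4) ? 1 : -1];
```

Filling a tensor with the float 1.0 and filling it with the integer 1 therefore write different bit patterns, so the member that is set should match the data type the tensor is later read as.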

okk_gdma_16bit_cpy_S2L¶

void okk_gdma_16bit_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from system memory to local memory for 16-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in local memory.
src_addr – Address of the source tensor in system memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 16-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.
If src_stride is NULL, the source tensor is in the Continuous Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_16bit_cpy_L2S¶

void okk_gdma_16bit_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from local memory to system memory for 16-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in system memory.
src_addr – Address of the source tensor in local memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 16-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.
If src_stride is NULL, the source tensor is in the 128-Byte Aligned Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_16bit_cpy_L2L¶

void okk_gdma_16bit_cpy_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from local memory to local memory for 16-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in local memory.
src_addr – Address of the source tensor in local memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 16-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.
If src_stride is NULL, the source tensor is in the 128-Byte Aligned Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_16bit_cpy_S2S¶

void okk_gdma_16bit_cpy_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from system memory to system memory for 16-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in system memory.
src_addr – Address of the source tensor in system memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 16-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.
If src_stride is NULL, the source tensor is in the Continuous Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_16bit_set_C_system¶

void okk_gdma_16bit_set_C_system(system_addr_t dst_addr, x16 C, const dim4 *shape, const dim4 *dst_stride)¶

Set all the elements of the destination tensor in system memory to be a constant value for 16-bit data type.

\[dst(n, c, h, w) = C\]

Parameters

dst_addr – Address of the destination tensor in system memory.
C – Constant value of 16-bit to set.
shape – Pointer to the shape of the destination tensor.
dst_stride – Pointer to the stride of the destination tensor.

Remarks

The data type of destination tensor is 16-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.

okk_gdma_16bit_set_C_local¶

void okk_gdma_16bit_set_C_local(local_addr_t dst_addr, x16 C, const dim4 *shape, const dim4 *dst_stride)¶

Set all the elements of the destination tensor in local memory to be a constant value for 16-bit data type.

\[dst(n, c, h, w) = C\]

Parameters

dst_addr – Address of the destination tensor in local memory.
C – Constant value of 16-bit to set.
shape – Pointer to the shape of the destination tensor.
dst_stride – Pointer to the stride of the destination tensor.

Remarks

The data type of destination tensor is 16-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.

okk_gdma_8bit_cpy_S2L¶

void okk_gdma_8bit_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from system memory to local memory for 8-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in local memory.
src_addr – Address of the source tensor in system memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 8-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.
If src_stride is NULL, the source tensor is in the Continuous Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_8bit_cpy_L2S¶

void okk_gdma_8bit_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from local memory to system memory for 8-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in system memory.
src_addr – Address of the source tensor in local memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 8-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.
If src_stride is NULL, the source tensor is in the 128-Byte Aligned Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_8bit_cpy_L2L¶

void okk_gdma_8bit_cpy_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from local memory to local memory for 8-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in local memory.
src_addr – Address of the source tensor in local memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 8-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.
If src_stride is NULL, the source tensor is in the 128-Byte Aligned Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_8bit_cpy_S2S¶

void okk_gdma_8bit_cpy_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy tensor from system memory to system memory for 8-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor in system memory.
src_addr – Address of the source tensor in system memory.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 8-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.
If src_stride is NULL, the source tensor is in the Continuous Layout.
dst_stride->w and src_stride->w are only allowed to be one.

okk_gdma_8bit_set_C_system¶

void okk_gdma_8bit_set_C_system(system_addr_t dst_addr, x8 C, const dim4 *shape, const dim4 *dst_stride)¶

Set all the elements of the destination tensor in system memory to be a constant value for 8-bit data type.

\[dst(n, c, h, w) = C\]

Parameters

dst_addr – Address of the destination tensor in system memory.
C – Constant value of 8-bit to set.
shape – Pointer to the shape of the destination tensor.
dst_stride – Pointer to the stride of the destination tensor.

Remarks

The data type of destination tensor is 8-bit.
If dst_stride is NULL, the destination tensor is in the Continuous Layout.

okk_gdma_8bit_set_C_local¶

void okk_gdma_8bit_set_C_local(local_addr_t dst_addr, x8 C, const dim4 *shape, const dim4 *dst_stride)¶

Set all the elements of the destination tensor in local memory to be a constant value for 8-bit data type.

\[dst(n, c, h, w) = C\]

Parameters

dst_addr – Address of the destination tensor in local memory.
C – Constant value of 8-bit to set.
shape – Pointer to the shape of the destination tensor.
dst_stride – Pointer to the stride of the destination tensor.

Remarks

The data type of destination tensor is 8-bit.
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.

Memory Functions¶

okk_bdc_32bit_cpy¶

void okk_bdc_32bit_cpy(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Copy the elements of the source tensor to the destination tensor for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination tensor.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 32-bit.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
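The shape and address constraints in these remarks recur for most BDC functions, so a caller may want to check them before issuing a call. The sketch below is one way to write such checks; the type and function names are hypothetical and are not provided by the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local mirrors of the types listed in Some Definitions. */
typedef struct { int n, c, h, w; } dim4_sketch;
typedef unsigned int local_addr_sketch;

/* True if n, h, w are in [1, 65535] and c is in [1, 4095]. */
bool bdc_shape_in_range(const dim4_sketch *shape) {
    assert(shape != NULL);
    return shape->n >= 1 && shape->n <= 65535 &&
           shape->h >= 1 && shape->h <= 65535 &&
           shape->w >= 1 && shape->w <= 65535 &&
           shape->c >= 1 && shape->c <= 4095;
}

/* True if the address meets the required divisibility by 4. */
bool bdc_addr_is_valid(local_addr_sketch addr) {
    return addr % 4 == 0;
}

/* True if the address also meets the preferred divisibility by 128. */
bool bdc_addr_is_preferred(local_addr_sketch addr) {
    return addr % 128 == 0;
}
```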

okk_bdc_32bit_set_C¶

void okk_bdc_32bit_set_C(local_addr_t dst_addr, x32 C, const dim4 *shape, const dim4 *dst_stride)¶

Set all the elements of the destination tensor to be a constant value for 32-bit data type.

\[dst(n, c, h, w) = C\]

Parameters

dst_addr – Address of the destination tensor.
C – Constant value of 32-bit to set.
shape – Pointer to the shape of the destination tensor.
dst_stride – Pointer to the stride of the destination tensor.

Remarks

The data type of the destination tensor is 32-bit.
dst_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride is NULL, the destination tensor is in the 128-Byte Aligned Layout.

Data Type Converting Functions¶

okk_bdc_fp32_to_int32¶

void okk_bdc_fp32_to_int32(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)¶

Convert the elements of the source tensor from fp32 to int32 by lookup table.

\[dst(n, c, h, w) = \mathbf{INT32}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the source tensor is fp32, the data type of the destination tensor is int32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_lookup_int32_to_fp32¶

void okk_bdc_lookup_int32_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)¶

Convert the elements of the source tensor from int32 to fp32 by lookup table.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the source tensor is int32, the data type of the destination tensor is fp32.
The elements of the source tensor are in [-128, 127].
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_4N_int8_to_fp32¶

void okk_bdc_4N_int8_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, bool is_signed, bool is_aligned_layout)¶

Convert the elements of the source tensor from int8 or uint8 to fp32.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
is_signed – Flag of the data type of the source tensor, true means int8, otherwise, uint8.
is_aligned_layout – Flag of the layout of the destination, source and work tensor, true means 128-Byte Aligned Layout, otherwise, Compact Layout.

Remarks

The destination, source and work tensors are either all in the 128-Byte Aligned Layout or all in the Compact Layout.
The data type of the source and work tensors is int8 or uint8, the data type of the destination tensor is fp32.
The source and work tensors are in the 4N-mode.
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; dst_addr = work_addr is not allowed.
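This page does not define the 4N-mode. The sketch below illustrates one common reading, stated here purely as an assumption: the 8-bit values of four consecutive batches at the same (c, h, w) position share one 32-bit word, in line with the s8x4 and u8x4 members of union x32. All names are hypothetical.

```c
#include <assert.h>

/* ASSUMPTION: in 4N-mode, batches 4k .. 4k+3 at the same (c, h, w)
   position are packed into the four bytes of one 32-bit word. */

/* Index of the 32-bit word group that holds batch n. */
int word_index_4n(int n) {
    assert(n >= 0);
    return n / 4;
}

/* Byte lane inside that word for batch n. */
int lane_index_4n(int n) {
    assert(n >= 0);
    return n % 4;
}

/* Number of word groups needed along the batch dimension. */
int word_count_4n(int batch) {
    assert(batch > 0);
    return (batch - 1) / 4 + 1;
}
```

Under this reading a batch of 5 needs two word groups, with three lanes of the second group unused.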

okk_bdc_int8_to_fp32¶

void okk_bdc_int8_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, bool is_signed, bool is_aligned_layout)¶

Convert the elements of the source tensor from int8 or uint8 to fp32.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
is_signed – Flag of the data type of the source tensor, true means int8, otherwise, uint8.
is_aligned_layout – Flag of the layout of the destination, source and work tensor, true means 128-Byte Aligned Layout, otherwise, Compact Layout.

Remarks

The destination, source and work tensors are either all in the 128-Byte Aligned Layout or all in the Compact Layout.
The data type of the source and work tensors is int8 or uint8, the data type of the destination tensor is fp32.
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; dst_addr = work_addr is not allowed.
If the source and work tensors are in the Compact Layout, the C stride must additionally be ALIGN(shape->h * shape->w, 4) rather than shape->h * shape->w, so the source and work tensors are in an approximate Compact Layout.
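The last remark can be made concrete with a little arithmetic. The sketch below evaluates the C stride ALIGN(shape->h * shape->w, 4) for a few spatial sizes; the function name is hypothetical, and the stride is assumed to be counted in elements.

```c
#include <assert.h>

#define DIV_UP(a, b) (((a) - 1) / (b) + 1)
#define ALIGN(a, b) (DIV_UP(a, b) * (b))

/* C stride of the approximate Compact Layout described in the remark:
   h * w rounded up to a multiple of 4. */
int approx_compact_c_stride(int h, int w) {
    assert(h > 0 && w > 0);
    return ALIGN(h * w, 4);
}
```

For a 3 x 3 spatial size the stride is 12 rather than 9, so each channel carries three padding elements; for a 2 x 4 size the stride stays at 8 and the layout coincides with the plain Compact Layout.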

FP32 Binary Functions¶

okk_bdc_add¶

void okk_bdc_add(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform addition of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) + src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_add_C¶

void okk_bdc_add_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform addition of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) + C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to add.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_sub¶

void okk_bdc_sub(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform subtraction of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) - src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_sub_C¶

void okk_bdc_sub_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform subtraction of the elements of the source tensor by a constant value for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) - C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to subtract.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_C_sub¶

void okk_bdc_C_sub(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform subtraction of the elements of the source tensor from a constant value for fp32 data type.

\[dst(n, c, h, w) = C - src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to subtract from.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
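
okk_bdc_sub_C and okk_bdc_C_sub differ only in operand order. A host-side reference sketch of the two formulas, assuming a flat array and ignoring strides and the local-memory layout (ref_sub_C and ref_C_sub are hypothetical names, not okk functions):

```c
#include <assert.h>
#include <stddef.h>

/* Host-side reference models over a flat array. They only reproduce the
 * per-element formulas; strides and the NPU layout are not modelled. */

/* Models okk_bdc_sub_C: dst = src - C */
void ref_sub_C(float *dst, const float *src, float C, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] - C;
}

/* Models okk_bdc_C_sub: dst = C - src */
void ref_C_sub(float *dst, const float *src, float C, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i)
        dst[i] = C - src[i];
}
```

For the same input, the two results are negatives of each other.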

okk_bdc_mul¶

void okk_bdc_mul(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform multiplication of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_mul_C¶

void okk_bdc_mul_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform multiplication of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) \times C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to multiply by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_div¶

void okk_bdc_div(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform division of the elements of the source_0 tensor by the elements of the source_1 tensor for fp32 data type.

\[dst(n, c, h, w) = \frac{src\_0(n, c, h, w)}{src\_1(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_div_C¶

void okk_bdc_div_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform division of the elements of the source tensor by a constant value for fp32 data type.

\[dst(n, c, h, w) = \frac{src(n, c, h, w)}{C}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to divide by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_C_div¶

void okk_bdc_C_div(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform division of a constant value by the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = \frac{C}{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to be divided.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_mac¶

void okk_bdc_mac(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform multiply accumulation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = dst(n, c, h, w) + src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_mac_C¶

void okk_bdc_mac_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform multiply accumulation of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = dst(n, c, h, w) + src(n, c, h, w) \times C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to multiply by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
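
Because the destination appears on both sides of the formula, okk_bdc_mac_C reads the destination tensor before writing it, so the destination must hold valid data before the call. A host-side reference sketch over a flat array (ref_mac_C is a hypothetical name; strides and the local-memory layout are not modelled):

```c
#include <assert.h>
#include <stddef.h>

/* Host-side reference for the okk_bdc_mac_C formula:
 * dst = dst + src * C. The previous content of dst is an input. */
void ref_mac_C(float *dst, const float *src, float C, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i)
        dst[i] = dst[i] + src[i] * C;
}
```

Calling the function repeatedly keeps accumulating into the same destination, which is the usual way to build a weighted sum of several source tensors.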

okk_bdc_max¶

void okk_bdc_max(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform maximum operation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = \max(src\_0(n, c, h, w), src\_1(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_max_C¶

void okk_bdc_max_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform maximum operation of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = \max(src(n, c, h, w), C)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_min¶

void okk_bdc_min(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform minimum operation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = \min(src\_0(n, c, h, w), src\_1(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is fp32.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_min_C¶

void okk_bdc_min_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform minimum operation of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = \min(src(n, c, h, w), C)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
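
The max_C and min_C formulas can be chained to clamp a tensor to a range. A host-side reference sketch over a flat array, assuming no NaN inputs (ref_max_C, ref_min_C and ref_clamp are hypothetical names; strides and the local-memory layout are not modelled):

```c
#include <assert.h>
#include <stddef.h>

/* Models the okk_bdc_max_C formula: dst = max(src, C) */
void ref_max_C(float *dst, const float *src, float C, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] > C ? src[i] : C;
}

/* Models the okk_bdc_min_C formula: dst = min(src, C) */
void ref_min_C(float *dst, const float *src, float C, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] < C ? src[i] : C;
}

/* Clamp to [lo, hi] as max followed by min. The second step runs in
 * place here, which is fine for this host model; whether the device
 * functions accept dst_addr == src_addr is not stated in this section. */
void ref_clamp(float *dst, const float *src, float lo, float hi,
               size_t count) {
    assert(dst != NULL && src != NULL && lo <= hi);
    ref_max_C(dst, src, lo, count);
    ref_min_C(dst, dst, hi, count);
}
```

With lo = 0 and no upper bound, the max step alone is the ReLU activation.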

okk_bdc_greater_select_value¶

void okk_bdc_greater_select_value(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform greater-than comparison of the elements of the source_0 and source_1 tensors for fp32 data type and then select a constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src\_0(n, c, h, w)>src\_1(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
select_val – Constant value to select.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the source_0 and source_1 tensors is fp32; the data type of the destination tensor is any 32-bit type.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_greater_C_select_value¶

void okk_bdc_greater_C_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform greater-than comparison of the elements of the source tensor and a constant value for fp32 data type and then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src(n, c, h, w)>C,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
select_val – Constant value to select.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the source tensor is fp32; the data type of the destination tensor is any 32-bit type.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
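
select_val is passed as an x32 union, so the destination receives its raw 32-bit pattern regardless of which member was set. A host-side reference sketch over a flat array (ref_greater_C_select_value is a hypothetical name, and the x32 stand-in below declares only three of the members listed under Some Definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for union x32 (only the members used here). */
typedef union { float fp32; int s32; unsigned int u32; } x32;

/* Models the okk_bdc_greater_C_select_value formula:
 * dst = select_val if src > C, otherwise 0.
 * The destination is written as raw 32-bit words. */
void ref_greater_C_select_value(uint32_t *dst, const float *src, float C,
                                x32 select_val, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i)
        dst[i] = (src[i] > C) ? select_val.u32 : 0u;
}
```

Setting select_val.fp32 = 1.0f yields a 0.0/1.0 fp32 mask, while setting select_val.u32 = 0xFFFFFFFF yields an all-ones bit mask suitable for the 32-bit AND functions.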

okk_bdc_C_greater_select_value¶

void okk_bdc_C_greater_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform greater-than comparison of a constant value and the elements of the source tensor for fp32 data type and then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}C>src(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
select_val – Constant value to select.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the source tensor is fp32; the data type of the destination tensor is any 32-bit type.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_equal_select_value¶

void okk_bdc_equal_select_value(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform equality comparison of the elements of the source_0 and source_1 tensors for fp32 data type and then select a constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src\_0(n, c, h, w)=src\_1(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
select_val – Constant value to select.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the source_0 and source_1 tensors is fp32; the data type of the destination tensor is any 32-bit type.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_equal_C_select_value¶

void okk_bdc_equal_C_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform equality comparison of the elements of the source tensor and a constant value for fp32 data type and then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src(n, c, h, w)=C,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare with.
select_val – Constant value to select.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the source tensor is fp32; the data type of the destination tensor is any 32-bit type.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

32-Bit Binary Functions¶

okk_bdc_32bit_and¶

void okk_bdc_32bit_and(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform bit-wise AND operation of the elements of the source_0 and source_1 tensors for 32-bit data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w)\ \mathbf{AND}\ src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is 32-bit.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_and_C¶

void okk_bdc_32bit_and_C(local_addr_t dst_addr, local_addr_t src_addr, x32 C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform bit-wise AND operation of the elements of the source tensor and a constant value for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\ \mathbf{AND}\ C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – 32-bit constant value to operate with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 32-bit.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
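
Since the operation works on raw 32-bit words, it can also be applied to fp32 data. One possible use, shown here as a host-side sketch rather than a documented recipe, is clearing the sign bit of IEEE 754 fp32 values with the mask 0x7FFFFFFF to obtain absolute values (ref_32bit_and_C is a hypothetical name, and the x32 stand-in declares only three of the union members):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for union x32 (only the members used here). */
typedef union { float fp32; int s32; unsigned int u32; } x32;

/* Models the okk_bdc_32bit_and_C formula on a flat array of raw
 * 32-bit words: dst = src AND C. Strides are not modelled. */
void ref_32bit_and_C(uint32_t *dst, const uint32_t *src, x32 C,
                     size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] & C.u32;
}
```

With C.u32 = 0x7FFFFFFF, the word 0xBFC00000 (the fp32 value -1.5) becomes 0x3FC00000 (the fp32 value 1.5), and non-negative values pass through unchanged.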

okk_bdc_32bit_or¶

void okk_bdc_32bit_or(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform bit-wise OR operation of the elements of the source_0 and source_1 tensors for 32-bit data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w)\ \mathbf{OR}\ src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is 32-bit.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_or_C¶

void okk_bdc_32bit_or_C(local_addr_t dst_addr, local_addr_t src_addr, x32 C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform bit-wise OR operation of the elements of the source tensor and a constant value for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\ \mathbf{OR}\ C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – 32-bit constant value to operate with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 32-bit.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_xor¶

void okk_bdc_32bit_xor(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform bit-wise XOR operation of the elements of the source_0 and source_1 tensors for 32-bit data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w)\ \mathbf{XOR}\ src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is 32-bit.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_xor_C¶

void okk_bdc_32bit_xor_C(local_addr_t dst_addr, local_addr_t src_addr, x32 C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform bit-wise XOR operation of the elements of the source tensor and a constant value for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\ \mathbf{XOR}\ C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – 32-bit constant value to operate with.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is 32-bit.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_arithmetic_shift¶

void okk_bdc_32bit_arithmetic_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform arithmetic shift operation of the elements of the source_0 tensor by the elements of the source_1 tensor for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is int32.
The elements of the source_1 tensor are in [-32, 32]; a positive value performs a left shift and a negative value performs a right shift.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_logical_shift¶

void okk_bdc_32bit_logical_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform logical shift operation of the elements of the source_0 tensor by the elements of the source_1 tensor for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination and source_0 tensors is uint32, the data type of the source_1 tensor is int32.
The elements of the source_1 tensor are in [-32, 32]; a positive value performs a left shift and a negative value performs a right shift.
The destination, source_0 and source_1 tensors start at the same NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_arithmetic_shift_C¶

void okk_bdc_32bit_arithmetic_shift_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform arithmetic shift operation of the elements of the source tensor by a constant value for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to shift by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is int32.
The constant value C is in [-32, 32]; a positive value performs a left shift and a negative value performs a right shift.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_logical_shift_C¶

void okk_bdc_32bit_logical_shift_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform logical shift operation of the elements of the source tensor by a constant value for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to shift by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is uint32.
The constant value C is in [-32, 32]; a positive value performs a left shift and a negative value performs a right shift.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
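
The difference between the arithmetic and logical variants of the shift-by-constant functions can be illustrated with a scalar host-side model. This is only a sketch: the function names are hypothetical, and the results for a shift of exactly 32 bits (zero, or all sign bits for an arithmetic right shift) are an assumption, since the remarks state the allowed range but not the result at its limits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of a logical shift on one uint32 element.
 * Assumption: shifting by 32 bits in either direction yields 0. */
uint32_t logical_shift_ref(uint32_t x, int c)
{
    assert(c >= -32 && c <= 32);
    if (c == 32 || c == -32)
        return 0u;                      /* a 32-bit shift is undefined in C */
    return c > 0 ? x << c : x >> -c;    /* vacated bits are filled with 0 */
}

/* Hypothetical reference model of an arithmetic shift on one int32 element.
 * Right shifts replicate the sign bit; left shifts discard the high bits. */
int32_t arithmetic_shift_ref(int32_t x, int c)
{
    assert(c >= -32 && c <= 32);
    if (c > 0) {
        /* Shift as unsigned to avoid signed overflow; the conversion back
         * wraps on the usual two's complement targets. */
        uint32_t u = (c == 32) ? 0u : (uint32_t)x << c;
        return (int32_t)u;
    }
    int s = -c;
    if (s == 32)
        return x < 0 ? -1 : 0;          /* assumption: only sign bits remain */
    /* Portable sign-propagating right shift. */
    return x < 0 ? ~(~x >> s) : x >> s;
}
```

For the bit pattern 0x80000000 and C = -31, the logical model returns 1 while the arithmetic model returns -1, which is the practical reason the two functions use uint32 and int32 tensors respectively.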

okk_bdc_32bit_C_arithmetic_shift¶

void okk_bdc_32bit_C_arithmetic_shift(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform arithmetic shift operation of a constant value by the elements of the source tensor for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to be shifted.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is int32.
The elements of the source tensor are in [-32, 32]; a positive value performs a left shift and a negative value performs a right shift.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_C_logical_shift¶

void okk_bdc_32bit_C_logical_shift(local_addr_t dst_addr, local_addr_t src_addr, unsigned int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform logical shift operation of a constant value by the elements of the source tensor for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to be shifted.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is int32.
The elements of the source tensor are in [-32, 32]; a positive value performs a left shift and a negative value performs a right shift.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

FP32 Unary Functions¶

okk_bdc_rsqrt¶

void okk_bdc_rsqrt(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)¶

Calculate reciprocal of the square-root of the elements of the source tensor.

\[dst(n, c, h, w) = \frac{1}{\sqrt{src(n, c, h, w)}}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_sqrt¶

void okk_bdc_sqrt(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)¶

Calculate square-root of the elements of the source tensor.

\[dst(n, c, h, w) = \sqrt{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_taylor_exp¶

void okk_bdc_taylor_exp(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, int num_series)¶

Calculate exponential of the elements of the source tensor by Taylor expansion.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
num_series – Number of terms in the Taylor series expansion.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
num_series is in [1, 64]; a larger value improves accuracy at the cost of performance.
This function is suitable only when the absolute values of the elements of the source tensor are small, at least less than one.
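
The following scalar sketch shows why a truncated Taylor series is only accurate for small inputs. The function name is hypothetical, and it assumes that num_series counts the terms of the series including the constant term 1; the on-chip implementation may differ.

```c
#include <assert.h>

/* Hypothetical reference: sum of the first num_series terms of
 * e^x = 1 + x + x^2/2! + x^3/3! + ...
 * Assumption: num_series includes the constant term. */
float taylor_exp_ref(float x, int num_series)
{
    assert(num_series >= 1 && num_series <= 64);
    float sum = 0.0f;
    float term = 1.0f;                  /* x^0 / 0! */
    for (int k = 0; k < num_series; ++k) {
        sum += term;
        term *= x / (float)(k + 1);     /* next term: x^(k+1) / (k+1)! */
    }
    return sum;
}
```

With ten terms, taylor_exp_ref(0.5f, 10) agrees with e^0.5 to within float rounding, whereas taylor_exp_ref(5.0f, 10) is still several units below e^5, so larger inputs need many more terms or a different method.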

okk_bdc_lookup_exp¶

void okk_bdc_lookup_exp(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)¶

Calculate exponential of the elements of the source tensor by lookup table.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the source tensor is int32, the data type of the destination tensor is fp32.
The elements of the source tensor are in [-103, 88].
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_exp¶

void okk_bdc_exp(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)¶

Calculate exponential of the elements of the source tensor.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; neither dst_addr = work_addr nor src_addr = work_addr is allowed.
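
The address constraints in the remarks can be checked on the host before issuing the call. The helper below is a hypothetical sketch, not part of the API; it checks only the alignment and the two equalities that the remarks forbid, and does not detect partially overlapping buffers.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int local_addr_t;      /* as defined in this reference */

/* Hypothetical pre-flight check for okk_bdc_exp-style arguments:
 * every address divisible by 128, and the workspace distinct from
 * both the destination and the source. */
bool exp_addrs_ok(local_addr_t dst_addr, local_addr_t src_addr,
                  local_addr_t work_addr)
{
    if (dst_addr % 128 != 0 || src_addr % 128 != 0 || work_addr % 128 != 0)
        return false;
    if (work_addr == dst_addr || work_addr == src_addr)
        return false;
    return true;
}
```

A caller might wrap the check in an assertion, for example assert(exp_addrs_ok(dst_addr, src_addr, work_addr)), immediately before the call.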

okk_bdc_exp_tunable¶

void okk_bdc_exp_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)¶

Calculate exponential of the elements of the source tensor with a tunable number of Taylor series terms.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
num_series – Number of terms in the Taylor series expansion.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
num_series is in [1, 64]; a larger value improves accuracy at the cost of performance.
The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; neither dst_addr = work_addr nor src_addr = work_addr is allowed.
okk_bdc_exp() is equivalent to okk_bdc_exp_tunable() with num_series = 32.

okk_bdc_sigmoid¶

void okk_bdc_sigmoid(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)¶

Calculate sigmoid of the elements of the source tensor.

\[dst(n, c, h, w) = \text{sigmoid}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; neither dst_addr = work_addr nor src_addr = work_addr is allowed.

okk_bdc_sigmoid_tunable¶

void okk_bdc_sigmoid_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)¶

Calculate sigmoid of the elements of the source tensor with a tunable number of Taylor series terms.

\[dst(n, c, h, w) = \text{sigmoid}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
num_series – Number of terms in the Taylor series expansion.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
num_series is in [1, 64]; a larger value improves accuracy at the cost of performance.
The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; neither dst_addr = work_addr nor src_addr = work_addr is allowed.
okk_bdc_sigmoid() is equivalent to okk_bdc_sigmoid_tunable() with num_series = 32.

okk_bdc_tanh¶

void okk_bdc_tanh(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)¶

Calculate tanh of the elements of the source tensor.

\[dst(n, c, h, w) = \text{tanh}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; neither dst_addr = work_addr nor src_addr = work_addr is allowed.

okk_bdc_tanh_tunable¶

void okk_bdc_tanh_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)¶

Calculate tanh of the elements of the source tensor with a tunable number of Taylor series terms.

\[dst(n, c, h, w) = \text{tanh}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
num_series – Number of terms in the Taylor series expansion.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
num_series is in [1, 64]; a larger value improves accuracy at the cost of performance.
The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; neither dst_addr = work_addr nor src_addr = work_addr is allowed.
okk_bdc_tanh() is equivalent to okk_bdc_tanh_tunable() with num_series = 32.

okk_bdc_reciprocal¶

void okk_bdc_reciprocal(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Calculate reciprocal of the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w)^{-1}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_neg¶

void okk_bdc_neg(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Calculate negative of the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = -src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

FP32 Neural Network Functions¶

okk_bdc_relu¶

void okk_bdc_relu(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Calculate ReLU of the elements of the source tensor for fp32 data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_bias¶

void okk_bdc_bias(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t bias_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Add a bias to the elements of the source tensor per channel.

\[dst(n, c, h, w) = src(n, c, h, w) + bias(0, c, 0, 0)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
bias_addr – Address of the bias tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The bias tensor is in the Compact Layout.
The data type of the destination, source and bias tensors is fp32.
The shape of the bias tensor is [1, shape->c, 1, 1].
The destination, source and bias tensors start at the same NPU.
dst_addr, src_addr and bias_addr are divisible by 4, where dst_addr and src_addr are preferred to be divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_scale¶

void okk_bdc_scale(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t scale_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Scale the elements of the source tensor per channel.

\[dst(n, c, h, w) = src(n, c, h, w)\times scale(0, c, 0, 0)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
scale_addr – Address of the scale tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The scale tensor is in the Compact Layout.
The data type of the destination, source and scale tensors is fp32.
The shape of the scale tensor is [1, shape->c, 1, 1].
The destination, source and scale tensors start at the same NPU.
dst_addr, src_addr and scale_addr are divisible by 4, where dst_addr and src_addr are preferred to be divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_scale_bias¶

void okk_bdc_scale_bias(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t scale_addr, local_addr_t bias_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Scale the elements of the source tensor and add a bias per channel.

\[dst(n, c, h, w) = src(n, c, h, w)\times scale(0, c, 0, 0) + bias(0, c, 0, 0)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
scale_addr – Address of the scale tensor.
bias_addr – Address of the bias tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The scale and bias tensors are in the Compact Layout.
The data type of the destination, source, scale and bias tensors is fp32.
The shape of the scale and bias tensors is [1, shape->c, 1, 1].
The destination, source, scale and bias tensors start at the same NPU.
dst_addr, src_addr, scale_addr and bias_addr are divisible by 4, where dst_addr and src_addr are preferred to be divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
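
The per-channel broadcast used by okk_bdc_bias, okk_bdc_scale and okk_bdc_scale_bias can be expressed as a host-side reference, which is handy for verifying device results. The function below is a hypothetical sketch that assumes contiguous NCHW host arrays; it does not model the on-chip 128-Byte Aligned or Compact layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host reference of
 *   dst(n, c, h, w) = src(n, c, h, w) * scale(0, c, 0, 0) + bias(0, c, 0, 0)
 * Assumption: dst and src are contiguous NCHW arrays of N*C*H*W floats,
 * scale and bias are arrays of C floats. */
void scale_bias_ref(float *dst, const float *src,
                    const float *scale, const float *bias,
                    int N, int C, int H, int W)
{
    assert(N >= 1 && C >= 1 && H >= 1 && W >= 1);
    size_t plane = (size_t)H * (size_t)W;   /* elements per (n, c) slice */
    for (int n = 0; n < N; ++n) {
        for (int c = 0; c < C; ++c) {
            size_t base = ((size_t)n * (size_t)C + (size_t)c) * plane;
            for (size_t i = 0; i < plane; ++i)
                dst[base + i] = src[base + i] * scale[c] + bias[c];
        }
    }
}
```

Passing a scale array of ones reproduces the bias-only formula, and a bias array of zeros reproduces the scale-only formula.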

okk_bdc_conv2d¶

void okk_bdc_conv2d(local_addr_t output_addr, local_addr_t input_addr, local_addr_t weight_addr, local_addr_t bias_addr, const dim4 *input_shape, int output_c, int kernel_h, int kernel_w, const dim4 *input_stride, const dim4 *kernel_stride, bool using_bias, bool result_add, const Padding *padding, const dim2 *stride, const dim2 *dilation)¶

Perform 2D convolution with or without adding bias and result accumulation by addition.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
weight_addr – Address of the weight tensor.
bias_addr – Address of the bias tensor, only used when using_bias = true.
input_shape – Pointer to the shape of the input tensor.
output_c – Channel number of the output tensor.
kernel_h – Height of the convolution kernel.
kernel_w – Width of the convolution kernel.
input_stride – Pointer to the stride of the input tensor.
kernel_stride – Pointer to the stride of the weight tensor.
using_bias – Flag of adding bias.
result_add – Flag of performing result accumulation by addition.
padding – Pointer to the amount of paddings applied to the input tensor.
stride – Pointer to the strides for the cross-correlation.
dilation – Pointer to the spacings between the kernel points.

Remarks

The output tensor is in the 128-Byte Aligned Layout, the bias tensor is in the Compact Layout.
The data type of the output, input, weight and bias tensors is fp32.
The weight tensor is in the 2IC-mode.
The output, weight and bias tensors start at the same NPU.
output_addr is divisible by 128, input_addr, weight_addr and bias_addr are divisible by 4.
input_shape->n is in [1, 65535], input_shape->c is in [1, 4095], input_shape->h and input_shape->w are in [1, 2047].
It is required that

input_shape->h + padding->top + padding->bottom <= 2047,

input_shape->w + padding->left + padding->right <= 2047.
The shape of the output tensor is [input_shape->n, output_c, output_h, output_w], where

output_h = (input_shape->h + padding->top + padding->bottom - ((kernel_h - 1) * dilation->h + 1)) / stride->h + 1,

output_w = (input_shape->w + padding->left + padding->right - ((kernel_w - 1) * dilation->w + 1)) / stride->w + 1,

and it is required that output_h <= 2047 and output_w <= 2047.
The shape of the bias tensor is [1, output_c, 1, 1].
padding->top, padding->bottom, padding->left and padding->right are in [0, 15], stride->h and stride->w are in [1, 15], dilation->h and dilation->w are in [1, 15].
If padding is NULL, no padding is applied.
If stride is NULL, the stride defaults to one.
If dilation is NULL, the dilation defaults to one.
If input_stride is NULL, the input tensor is in the 128-Byte Aligned Layout.
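
The output_h and output_w formulas above can be evaluated with a small helper when sizing the output buffer. This is a sketch with a hypothetical function name; the assertion that the padded input is at least as large as the dilated kernel is an added assumption, not a documented requirement.

```c
#include <assert.h>

/* Hypothetical helper: output extent along one axis, following
 *   out = (in + pad_before + pad_after - ((kernel - 1) * dilation + 1)) / stride + 1
 * with integer division, as in the remarks above. */
int conv_out_size(int in, int pad_before, int pad_after,
                  int kernel, int dilation, int stride)
{
    assert(kernel >= 1 && dilation >= 1 && stride >= 1);
    int kernel_ext = (kernel - 1) * dilation + 1;   /* dilated kernel extent */
    int padded = in + pad_before + pad_after;
    assert(padded >= kernel_ext);                   /* assumption: kernel fits */
    return (padded - kernel_ext) / stride + 1;
}
```

For example, an input height of 224 with padding 3 on both sides, a kernel height of 7, dilation 1 and stride 2 gives (230 - 7) / 2 + 1 = 112. The same helper covers the pooling formulas when dilation is set to 1.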

okk_bdc_depthwise2d¶

void okk_bdc_depthwise2d(local_addr_t output_addr, local_addr_t input_addr, local_addr_t weight_addr, local_addr_t bias_addr, const dim4 *input_shape, int kernel_h, int kernel_w, bool using_bias, const Padding *padding, const dim2 *stride, const dim2 *dilation)¶

Perform 2D depthwise convolution with or without adding bias.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
weight_addr – Address of the weight tensor.
bias_addr – Address of the bias tensor, only used when using_bias = true.
input_shape – Pointer to the shape of the input tensor.
kernel_h – Height of the convolution kernel.
kernel_w – Width of the convolution kernel.
using_bias – Flag of adding bias.
padding – Pointer to the amount of paddings applied to the input tensor.
stride – Pointer to the strides for the cross-correlation.
dilation – Pointer to the spacings between the kernel points.

Remarks

The output and input tensors are in the 128-Byte Aligned Layout, the weight and bias tensors are in the Compact Layout.
The data type of the output, input, weight and bias tensors is fp32.
The output, input, weight and bias tensors start at the same NPU.
output_addr and input_addr are divisible by 128, weight_addr and bias_addr are divisible by 4.
input_shape->n is in [1, 65535], input_shape->c is in [1, 4095], input_shape->h and input_shape->w are in [1, 2047].
It is required that

input_shape->h + padding->top + padding->bottom <= 2047,

input_shape->w + padding->left + padding->right <= 2047.
The shape of the output tensor is [input_shape->n, input_shape->c, output_h, output_w], where

output_h = (input_shape->h + padding->top + padding->bottom - ((kernel_h - 1) * dilation->h + 1)) / stride->h + 1,

output_w = (input_shape->w + padding->left + padding->right - ((kernel_w - 1) * dilation->w + 1)) / stride->w + 1,

and it is required that output_h <= 2047 and output_w <= 2047.
The shape of the weight tensor is [1, input_shape->c, kernel_h, kernel_w], the shape of the bias tensor is [1, input_shape->c, 1, 1].
padding->top, padding->bottom, padding->left and padding->right are in [0, 15], stride->h and stride->w are in [1, 15], dilation->h and dilation->w are in [1, 15].
If padding is NULL, no padding is applied.
If stride is NULL, the stride defaults to one.
If dilation is NULL, the dilation defaults to one.

okk_bdc_avg_pool2d¶

void okk_bdc_avg_pool2d(local_addr_t output_addr, local_addr_t input_addr, const dim4 *input_shape, int kernel_h, int kernel_w, const Padding *padding, const dim2 *stride)¶

Perform 2D average pooling.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
input_shape – Pointer to the shape of the input tensor.
kernel_h – Height of the pooling kernel.
kernel_w – Width of the pooling kernel.
padding – Pointer to the amount of paddings applied to the input tensor.
stride – Pointer to the strides of the pooling window.

Remarks

The output and input tensors are in the 128-Byte Aligned Layout.
The data type of the output and input tensors is fp32.
The output and input tensors start at the same NPU.
output_addr and input_addr are divisible by 128.
input_shape->n is in [1, 65535], input_shape->c is in [1, 4095], input_shape->h and input_shape->w are in [1, 2047].
It is required that

input_shape->h + padding->top + padding->bottom <= 2047,

input_shape->w + padding->left + padding->right <= 2047.
The shape of the output tensor is [input_shape->n, input_shape->c, output_h, output_w], where

output_h = (input_shape->h + padding->top + padding->bottom - kernel_h) / stride->h + 1,

output_w = (input_shape->w + padding->left + padding->right - kernel_w) / stride->w + 1,

and it is required that output_h <= 2047 and output_w <= 2047.
padding->top, padding->bottom, padding->left and padding->right are in [0, 15], stride->h and stride->w are in [1, 15].
If padding is NULL, no padding is applied.
If stride is NULL, the stride defaults to one.

okk_bdc_avg_pool2d_v2¶

void okk_bdc_avg_pool2d_v2(local_addr_t output_addr, local_addr_t input_addr, const dim4 *input_shape, float scale, int kernel_h, int kernel_w, const Padding *padding, const dim2 *stride)¶

Perform 2D average pooling, but with custom scale value instead.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
input_shape – Pointer to the shape of the input tensor.
scale – Scale factor of each pooling window.
kernel_h – Height of the pooling kernel.
kernel_w – Width of the pooling kernel.
padding – Pointer to the amount of paddings applied to the input tensor.
stride – Pointer to the strides of the pooling window.

Remarks

Refer to okk_bdc_avg_pool2d.

okk_bdc_max_pool2d¶

void okk_bdc_max_pool2d(local_addr_t output_addr, local_addr_t input_addr, const dim4 *input_shape, int kernel_h, int kernel_w, const Padding *padding, const dim2 *stride)¶

Perform 2D max pooling.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
input_shape – Pointer to the shape of the input tensor.
kernel_h – Height of the pooling kernel.
kernel_w – Width of the pooling kernel.
padding – Pointer to the amount of paddings applied to the input tensor.
stride – Pointer to the strides of the pooling window.

Remarks

The output and input tensors are in the 128-Byte Aligned Layout.
The data type of the output and input tensors is fp32.
The output and input tensors start at the same NPU.
output_addr and input_addr are divisible by 128.
input_shape->n is in [1, 65535], input_shape->c is in [1, 4095], input_shape->h and input_shape->w are in [1, 2047].
It is required that

input_shape->h + padding->top + padding->bottom <= 2047,

input_shape->w + padding->left + padding->right <= 2047.
The shape of the output tensor is [input_shape->n, input_shape->c, output_h, output_w], where

output_h = (input_shape->h + padding->top + padding->bottom - kernel_h) / stride->h + 1,

output_w = (input_shape->w + padding->left + padding->right - kernel_w) / stride->w + 1,

and it is required that output_h <= 2047 and output_w <= 2047.
padding->top, padding->bottom, padding->left and padding->right are in [0, 15], stride->h and stride->w are in [1, 15].
If padding is NULL, no padding is applied.
If stride is NULL, the stride defaults to one.
The implicit padding value is -3.4028234663852886E38 (0xff7fffff).
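
The implicit padding value is the most negative finite fp32 number, so a padded position can never win the maximum. The relation between the decimal value and the bit pattern 0xff7fffff can be confirmed on the host with the sketch below; the helper name is hypothetical and the code assumes the host float type is IEEE 754 binary32.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a float.
 * Assumption: float is a 32-bit IEEE 754 binary32 value on the host. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);           /* well-defined type punning */
    return u;
}
```

On such a host, float_bits(-FLT_MAX) equals 0xff7fffff, which matches the value quoted in the remark.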

okk_bdc_matmul¶

void okk_bdc_matmul(local_addr_t output_addr, local_addr_t left_addr, local_addr_t right_addr, local_addr_t bias_addr, int left_rows, int left_cols, int right_cols, int left_cols_per_channel, int right_cols_per_channel, bool using_bias, bool result_add)¶

Perform matrix multiplication with or without adding bias and result accumulation by addition.

Parameters

output_addr – Address of the output tensor.
left_addr – Address of the left matrix tensor.
right_addr – Address of the right matrix tensor.
bias_addr – Address of the bias tensor, only used when using_bias = true.
left_rows – Number of the rows of the left matrix.
left_cols – Number of the columns of the left matrix.
right_cols – Number of the columns of the right matrix.
left_cols_per_channel – Number of the columns of the left matrix per channel.
right_cols_per_channel – Number of the columns of the right matrix per channel.
using_bias – Flag of adding bias.
result_add – Flag of performing result accumulation by addition.

Remarks

The output, left matrix, right matrix and bias tensors are in the matrix layout.
The data type of the output, left matrix, right matrix and bias tensors is fp32.
The output, right matrix and bias tensors start at the same NPU.
output_addr, left_addr, right_addr and bias_addr are divisible by 128.
The bias is a 1-by-right_cols matrix.
left_cols_per_channel is in [1, min(128, left_cols)], left_rows is in [1, 65535], and right_cols_per_channel is in [1, min(128, right_cols)].
It is required that ceil(left_cols / left_cols_per_channel) <= 4095 and ceil(right_cols / right_cols_per_channel) <= 4095.
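
The size constraints in the remarks above can be bundled into a host-side check. This is a sketch only: `ceil_div`, `min_int` and `matmul_args_ok` are hypothetical helpers, and the lower bound of 1 on left_cols and right_cols is inferred from the stated intervals rather than quoted.

```c
#include <assert.h>
#include <stdbool.h>

/* Ceiling of a / b for non-negative a and positive b. */
static int ceil_div(int a, int b)
{
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

static int min_int(int a, int b) { return a < b ? a : b; }

/* Returns true when the arguments satisfy the ranges listed in the remarks. */
static bool matmul_args_ok(int left_rows, int left_cols, int right_cols,
                           int left_cols_per_channel, int right_cols_per_channel)
{
    if (left_rows < 1 || left_rows > 65535) return false;
    /* Assumption: both column counts must be at least 1. */
    if (left_cols < 1 || right_cols < 1) return false;
    if (left_cols_per_channel < 1 ||
        left_cols_per_channel > min_int(128, left_cols)) return false;
    if (right_cols_per_channel < 1 ||
        right_cols_per_channel > min_int(128, right_cols)) return false;
    /* Number of channels needed to hold each matrix row. */
    return ceil_div(left_cols, left_cols_per_channel) <= 4095 &&
           ceil_div(right_cols, right_cols_per_channel) <= 4095;
}
```

With right_cols = 1000 and right_cols_per_channel = 128, each row of the right matrix spans ceil(1000 / 128) = 8 channels.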

Fixed Point Binary Functions¶

okk_bdc_fixed_point_packed_add¶

void okk_bdc_fixed_point_packed_add(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)¶

Perform addition of the elements of the source_0 and source_1 tensors for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) + src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) + src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data types of the destination, source_0 and source_1 tensors are int8, uint8, int16 or uint16, and are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_U8, U8_OP_U8_TO_U16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_U8 and U16_OP_U16_TO_U16.
rshift is in [0, 31].
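
The rounding shift in the formulas above can be modelled per element on the host. The sketch below is plain C, not okk code; `arith_rsh` and `add_round_rsh` are hypothetical names. It returns the wide intermediate value, because this section does not state how the result is narrowed to the destination type.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift computed as floor(v / 2^r), written so that it does
 * not depend on how the compiler shifts negative operands. */
static int64_t arith_rsh(int64_t v, int r)
{
    assert(r >= 0 && r <= 31);
    if (v >= 0) return v >> r;
    return -(((-v) + ((int64_t)1 << r) - 1) >> r);
}

/* Per-element model of the add formula: when rshift > 0, add 2^(rshift - 1)
 * before shifting so that the result is rounded instead of truncated. */
static int64_t add_round_rsh(int32_t src0, int32_t src1, int rshift)
{
    int64_t sum = (int64_t)src0 + src1;
    if (rshift > 0) sum += (int64_t)1 << (rshift - 1);
    return arith_rsh(sum, rshift);
}
```

For instance, 5 + 6 with rshift = 2 gives (11 + 2) RSH 2 = 3, whereas a plain shift of 11 would give 2.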

okk_bdc_fixed_point_packed_add_C¶

void okk_bdc_fixed_point_packed_add_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)¶

Perform addition of the elements of the source tensor and a constant value for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) + C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) + C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to add.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data types of the destination and source tensors are int8, uint8, int16 or uint16, and are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
C is in [-128, 127] if the data type of it is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_U8, U8_OP_U8_TO_U16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_U8 and U16_OP_U16_TO_U16.
rshift is in [0, 31].
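
The range rule for C can be expressed as a small host-side check. This is a sketch with a hypothetical helper name, `constant_in_range`; it takes the signedness and bit width of C explicitly and does not try to derive them from op_type.

```c
#include <assert.h>
#include <stdbool.h>

/* True when C fits the given type: [-128, 127] for int8, [0, 255] for uint8,
 * [-32768, 32767] for int16 and [0, 65535] for uint16. */
static bool constant_in_range(int C, bool is_signed, int bits)
{
    assert(bits == 8 || bits == 16);
    int lo = is_signed ? -(1 << (bits - 1)) : 0;
    int hi = is_signed ? (1 << (bits - 1)) - 1 : (1 << bits) - 1;
    return C >= lo && C <= hi;
}
```

Because the parameter C is declared as int, an out-of-range value compiles without complaint, so a check of this kind before the call may help catch mistakes early.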

okk_bdc_fixed_point_packed_sub¶

void okk_bdc_fixed_point_packed_sub(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)¶

Perform subtraction of the elements of the source_0 tensor by the elements of the source_1 tensor for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) - src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) - src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data types of the destination, source_0 and source_1 tensors are int8, uint8, int16 or uint16, and are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_S8, U8_OP_U8_TO_S16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_S8 and U16_OP_U16_TO_S16.
rshift is in [0, 31].

okk_bdc_fixed_point_packed_sub_C¶

void okk_bdc_fixed_point_packed_sub_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)¶

Perform subtraction of the elements of the source tensor by a constant value for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) - C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) - C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to subtract by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data types of the destination and source tensors are int8, uint8, int16 or uint16, and are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
C is in [-128, 127] if the data type of it is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_S8, U8_OP_U8_TO_S16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_S8 and U16_OP_U16_TO_S16.
rshift is in [0, 31].

okk_bdc_fixed_point_packed_C_sub¶

void okk_bdc_fixed_point_packed_C_sub(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)¶

Perform subtraction of a constant value by the elements of the source tensor for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (C - src(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = C - src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value that the elements of the source tensor are subtracted from.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data types of the destination and source tensors are int8, uint8, int16 or uint16, and are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
C is in [-128, 127] if the data type of it is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_S8, U8_OP_U8_TO_S16, S16_OP_S16_TO_S8, S16_OP_S16_TO_S16, S16_OP_U16_TO_S8, S16_OP_U16_TO_S16, U16_OP_S16_TO_S8, U16_OP_S16_TO_S16, U16_OP_U16_TO_S8 and U16_OP_U16_TO_S16.
rshift is in [0, 31].
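
The only difference between okk_bdc_fixed_point_packed_sub_C and okk_bdc_fixed_point_packed_C_sub is the operand order. The per-element sketch below (plain C with hypothetical names, returning the un-narrowed value) shows that with rounding the two results are not simply negatives of each other.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 2^r), independent of the compiler's shift of negative values. */
static int64_t arith_rsh(int64_t v, int r)
{
    assert(r >= 0 && r <= 31);
    if (v >= 0) return v >> r;
    return -(((-v) + ((int64_t)1 << r) - 1) >> r);
}

/* Add 2^(rshift - 1) before shifting when rshift > 0, as in the formulas. */
static int64_t round_rsh(int64_t v, int rshift)
{
    if (rshift > 0) v += (int64_t)1 << (rshift - 1);
    return arith_rsh(v, rshift);
}

/* Model of sub_C: dst = src - C. */
static int64_t sub_C_model(int32_t src, int32_t C, int rshift)
{
    return round_rsh((int64_t)src - C, rshift);
}

/* Model of C_sub: dst = C - src. */
static int64_t C_sub_model(int32_t src, int32_t C, int rshift)
{
    return round_rsh((int64_t)C - src, rshift);
}
```

With src = 10, C = 3 and rshift = 1, the sub_C model yields 4 while the C_sub model yields -3, because the rounding term is added before the shift in both cases.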

okk_bdc_fixed_point_packed_mul¶

void okk_bdc_fixed_point_packed_mul(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)¶

Perform multiplication of the elements of the source_0 and source_1 tensors for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) \times src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data type of the source_0 and source_1 tensors is int8 or uint8, the data type of the destination tensor is int8, uint8, int16 or uint16, and the data types are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_U8 and U8_OP_U8_TO_U16.
rshift is in [0, 31].
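
When the destination is 8-bit, rshift has to be large enough for the rounded product to fit. The sketch below is a host-side model in plain C with hypothetical names; it searches for the smallest such rshift given the extreme products. Whether the hardware saturates or wraps an out-of-range result is not stated in this section, so the sketch only reports when no narrowing would be needed.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 2^r), independent of the compiler's shift of negative values. */
static int64_t arith_rsh(int64_t v, int r)
{
    assert(r >= 0 && r <= 31);
    if (v >= 0) return v >> r;
    return -(((-v) + ((int64_t)1 << r) - 1) >> r);
}

/* Add 2^(rshift - 1) before shifting when rshift > 0, as in the formulas. */
static int64_t round_rsh(int64_t v, int rshift)
{
    if (rshift > 0) v += (int64_t)1 << (rshift - 1);
    return arith_rsh(v, rshift);
}

/* Smallest rshift in [0, 31] for which every product in [prod_min, prod_max]
 * lands in [dst_min, dst_max] after the rounding shift, or -1 if none does.
 * Checking the two ends suffices because round_rsh is monotonic in v. */
static int min_rshift_to_fit(int64_t prod_min, int64_t prod_max,
                             int64_t dst_min, int64_t dst_max)
{
    for (int r = 0; r <= 31; ++r) {
        if (round_rsh(prod_min, r) >= dst_min && round_rsh(prod_max, r) <= dst_max)
            return r;
    }
    return -1;
}
```

For int8 times int8 the products span [-16256, 16384], and the smallest rshift that keeps every result inside [-128, 127] is 8.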

okk_bdc_fixed_point_packed_mul_C¶

void okk_bdc_fixed_point_packed_mul_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)¶

Perform multiplication of the elements of the source tensor and a constant value for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) \times C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) \times C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to multiply.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data type of the source tensor is int8 or uint8, the data type of the destination tensor is int8, uint8, int16 or uint16, and the data types are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
C is in [-128, 127] if the data type of it is int8, [0, 255] if uint8.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, S8_OP_S8_TO_S16, S8_OP_U8_TO_S8, S8_OP_U8_TO_S16, U8_OP_S8_TO_S8, U8_OP_S8_TO_S16, U8_OP_U8_TO_U8 and U8_OP_U8_TO_U16.
rshift is in [0, 31].

okk_bdc_fixed_point_packed_mac¶

void okk_bdc_fixed_point_packed_mac(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, bool is_origin_dst_signed, op_type_t op_type, int lshift, int rshift)¶

Perform multiply accumulation of the elements of the source_0 and source_1 tensors for fixed-point data type.

If rshift > 0

\[ \begin{align}\begin{aligned}dst(n, c, h, w) = (dst(n, c, h, w) \times 2^{lshift} + src\_0(n, c, h, w)\times src\_1(n, c, h, w)\\+ 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\end{aligned}\end{align} \]

else

\[dst(n, c, h, w) = dst(n, c, h, w) \times 2^{lshift} + src\_0(n, c, h, w)\times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
is_origin_dst_signed – Flag for the data type of the original destination tensor: true means int16, false means uint16.
op_type – Operation type.
lshift – Number of bits by which the original elements of the destination tensor are left-shifted.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data type of the source_0 and source_1 tensors is int8 or uint8, the data type of the destination tensor is int16 or uint16, and the data types are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
If is_origin_dst_signed = true, the valid choices of op_type are S8_OP_S8_TO_S16, S8_OP_U8_TO_S16 and U8_OP_S8_TO_S16, otherwise, S8_OP_S8_TO_S16, S8_OP_U8_TO_S16, U8_OP_S8_TO_S16 and U8_OP_U8_TO_U16.
lshift is in [0, 14], rshift is in [0, 31].
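
The multiply-accumulate formula above can be followed step by step with a per-element model. This is a sketch in plain C with hypothetical names; it keeps the accumulator in 64 bits and returns it without narrowing, since the narrowing behaviour is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 2^r), independent of the compiler's shift of negative values. */
static int64_t arith_rsh(int64_t v, int r)
{
    assert(r >= 0 && r <= 31);
    if (v >= 0) return v >> r;
    return -(((-v) + ((int64_t)1 << r) - 1) >> r);
}

/* Add 2^(rshift - 1) before shifting when rshift > 0, as in the formulas. */
static int64_t round_rsh(int64_t v, int rshift)
{
    if (rshift > 0) v += (int64_t)1 << (rshift - 1);
    return arith_rsh(v, rshift);
}

/* Per-element model: (dst * 2^lshift + src0 * src1), then the rounding shift. */
static int64_t mac_model(int32_t dst, int32_t src0, int32_t src1,
                         int lshift, int rshift)
{
    assert(lshift >= 0 && lshift <= 14);
    int64_t acc = (int64_t)dst * ((int64_t)1 << lshift) + (int64_t)src0 * src1;
    return round_rsh(acc, rshift);
}
```

With dst = 10, lshift = 2, src0 = 3, src1 = 4 and rshift = 1 the model computes (40 + 12 + 1) RSH 1 = 26.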

okk_bdc_fixed_point_packed_mac_C¶

void okk_bdc_fixed_point_packed_mac_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, bool is_origin_dst_signed, op_type_t op_type, int lshift, int rshift)¶

Perform multiply accumulation of the elements of the source tensor and a constant value for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (dst(n, c, h, w) \times 2^{lshift} + src(n, c, h, w)\times C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = dst(n, c, h, w) \times 2^{lshift} + src(n, c, h, w)\times C\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to multiply.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
is_origin_dst_signed – Flag for the data type of the original destination tensor: true means int16, false means uint16.
op_type – Operation type.
lshift – Number of bits by which the original elements of the destination tensor are left-shifted.
rshift – Number of the arithmetic right-shift to the result.

Remarks

The data type of the source tensor is int8 or uint8, the data type of the destination tensor is int16 or uint16, and the data types are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
C is in [-128, 127] if the data type of it is int8, [0, 255] if uint8.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
If is_origin_dst_signed = true, the valid choices of op_type are S8_OP_S8_TO_S16, S8_OP_U8_TO_S16 and U8_OP_S8_TO_S16, otherwise, S8_OP_S8_TO_S16, S8_OP_U8_TO_S16, U8_OP_S8_TO_S16 and U8_OP_U8_TO_U16.
lshift is in [0, 14], rshift is in [0, 31].

okk_bdc_fixed_point_packed_max¶

void okk_bdc_fixed_point_packed_max(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type)¶

Perform maximum operation of the elements of the source_0 and source_1 tensors for fixed-point data type.

\[dst(n, c, h, w) = \max(src\_0(n, c, h, w), src\_1(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.

Remarks

The data type of the destination, source_0 and source_1 tensors is int8, uint8, int16 or uint16, and the data types are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, U8_OP_U8_TO_U8, S16_OP_S16_TO_S16 and U16_OP_U16_TO_U16.

okk_bdc_fixed_point_packed_max_C¶

void okk_bdc_fixed_point_packed_max_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type)¶

Perform maximum operation of the elements of the source tensor and a constant value for fixed-point data type.

\[dst(n, c, h, w) = \max(src(n, c, h, w), C)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare against.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.

Remarks

The data type of the destination and source tensors is int8, uint8, int16 or uint16, and the data types are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
C is in [-128, 127] if the data type of it is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, U8_OP_U8_TO_U8, S16_OP_S16_TO_S16 and U16_OP_U16_TO_U16.

okk_bdc_fixed_point_packed_min¶

void okk_bdc_fixed_point_packed_min(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type)¶

Perform minimum operation of the elements of the source_0 and source_1 tensors for fixed-point data type.

\[dst(n, c, h, w) = \min(src\_0(n, c, h, w), src\_1(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
op_type – Operation type.

Remarks

The data type of the destination, source_0 and source_1 tensors is int8, uint8, int16 or uint16, and the data types are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, U8_OP_U8_TO_U8, S16_OP_S16_TO_S16 and U16_OP_U16_TO_U16.

okk_bdc_fixed_point_packed_min_C¶

void okk_bdc_fixed_point_packed_min_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type)¶

Perform minimum operation of the elements of the source tensor and a constant value for fixed-point data type.

\[dst(n, c, h, w) = \min(src(n, c, h, w), C)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to compare against.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
op_type – Operation type.

Remarks

The data type of the destination and source tensors is int8, uint8, int16 or uint16, and the data types are required to match op_type.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
C is in [-128, 127] if the data type of it is int8, [0, 255] if uint8, [-32768, 32767] if int16, and [0, 65535] if uint16.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
The valid choices of op_type are S8_OP_S8_TO_S8, U8_OP_U8_TO_U8, S16_OP_S16_TO_S16 and U16_OP_U16_TO_U16.
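
A max with a constant followed by a min with a constant clamps every element to a range. The sketch below models that composition on the host for int8 data, using hypothetical function names and flat arrays in place of tensors. The second step of the model runs in place; whether the hardware permits dst_addr to equal src_addr is not stated in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of max_C for int8 data: dst = max(src, C). */
static void max_C_s8(int8_t *dst, const int8_t *src, size_t count, int C)
{
    assert(C >= -128 && C <= 127);
    for (size_t i = 0; i < count; ++i)
        dst[i] = (int8_t)(src[i] > C ? src[i] : C);
}

/* Model of min_C for int8 data: dst = min(src, C). */
static void min_C_s8(int8_t *dst, const int8_t *src, size_t count, int C)
{
    assert(C >= -128 && C <= 127);
    for (size_t i = 0; i < count; ++i)
        dst[i] = (int8_t)(src[i] < C ? src[i] : C);
}

/* Clamp to [lo, hi] by chaining the two models. */
static void clamp_s8(int8_t *dst, const int8_t *src, size_t count, int lo, int hi)
{
    assert(lo <= hi);
    max_C_s8(dst, src, count, lo);
    min_C_s8(dst, dst, count, hi);
}
```

Clamping {-100, -1, 0, 50, 127} to [0, 64] with this model gives {0, 0, 0, 50, 64}.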

okk_bdc_fixed_point_packed_16bit_arithmetic_shift¶

void okk_bdc_fixed_point_packed_16bit_arithmetic_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform arithmetic shift operation of the elements of the source_0 tensor by the elements of the source_1 tensor for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination, source_0 and source_1 tensors is int16.
The destination, source_0 and source_1 tensors are in the 2N-mode.
The elements of the source_1 tensor are in [-16, 16], positive one performs left-shift and negative one performs right-shift.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_logical_shift¶

void okk_bdc_fixed_point_packed_16bit_logical_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)¶

Perform logical shift operation of the elements of the source_0 tensor by the elements of the source_1 tensor for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src0_addr – Address of the source_0 tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0 and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_stride – Pointer to the stride of the source_0 tensor.
src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

The data type of the destination and source_0 tensors is uint16, the data type of the source_1 tensor is int16.
The destination, source_0 and source_1 tensors are in the 2N-mode.
The elements of the source_1 tensor are in [-16, 16], positive one performs left-shift and negative one performs right-shift.
The destination, source_0 and source_1 tensors start at the first NPU.
dst_addr, src0_addr and src1_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_stride or src1_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.
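
The arithmetic and logical variants share the same formula and differ only in how a right shift fills the vacated high bits. The per-element sketch below is plain C with hypothetical names. It assumes that a left shift keeps only the low 16 bits of the result; this section does not say whether the hardware truncates or saturates.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret the low 16 bits as a signed value without relying on
 * implementation-defined narrowing casts. */
static int16_t to_s16(uint32_t bits)
{
    bits &= 0xFFFFu;
    return (int16_t)(bits >= 0x8000u ? (int32_t)bits - 0x10000 : (int32_t)bits);
}

/* Logical shift on uint16: positive shift goes left, otherwise right by -shift,
 * with zeros filling the vacated bits. */
static uint16_t logical_shift_u16(uint16_t src, int shift)
{
    assert(shift >= -16 && shift <= 16);
    uint32_t v = src;
    return (uint16_t)(shift > 0 ? (v << shift) & 0xFFFFu : v >> -shift);
}

/* Arithmetic shift on int16: a right shift preserves the sign. */
static int16_t arithmetic_shift_s16(int16_t src, int shift)
{
    assert(shift >= -16 && shift <= 16);
    if (shift > 0)
        return to_s16((uint32_t)(uint16_t)src << shift); /* assumed truncation */
    int r = -shift;
    int32_t v = src;
    if (v >= 0) return (int16_t)(v >> r);
    return (int16_t)(-(((-v) + (1 << r) - 1) >> r)); /* floor(v / 2^r) */
}
```

The bit pattern 0xFF00 shifted by -4 becomes 0x0FF0 under the logical model, while the same bits read as the int16 value -256 become -16 under the arithmetic model.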

okk_bdc_fixed_point_packed_16bit_arithmetic_shift_C¶

void okk_bdc_fixed_point_packed_16bit_arithmetic_shift_C(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform arithmetic shift operation of the elements of the source tensor by a constant value for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to shift by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is int16.
The destination and source tensors are in the 2N-mode.
The constant value C is in [-16, 16], positive one performs left-shift and negative one performs right-shift.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_logical_shift_C¶

void okk_bdc_fixed_point_packed_16bit_logical_shift_C(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform logical shift operation of the elements of the source tensor by a constant value for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to shift by.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is uint16.
The destination and source tensors are in the 2N-mode.
The constant value C is in [-16, 16], positive one performs left-shift and negative one performs right-shift.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_C_arithmetic_shift¶

void okk_bdc_fixed_point_packed_16bit_C_arithmetic_shift(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform arithmetic shift operation of a constant value by the elements of the source tensor for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to be shifted.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is int16.
The destination and source tensors are in the 2N-mode.
The elements of the source tensor are in [-16, 16], positive one performs left-shift and negative one performs right-shift.
The destination and source tensors start at the first NPU.
dst_addr and src_addr are divisible by 4 and preferred by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the relative tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_C_logical_shift¶

void okk_bdc_fixed_point_packed_16bit_C_logical_shift(local_addr_t dst_addr, local_addr_t src_addr, unsigned short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Perform logical shift operation of a constant value by the elements of the source tensor for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
C – Constant value to be shifted.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination tensor is uint16, the data type of the source tensor is int16.
The destination and source tensors are in the 2N-mode.
The elements of the source tensor are in [-16, 16]. A positive element shifts C left by that many bits, a negative element shifts C right by its absolute value, and a zero element leaves C unchanged.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
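
The difference from the arithmetic variant is that a right shift fills the vacated high bits with zeros. A minimal host-side sketch of one element follows; ref_c_logical_shift is a hypothetical name, and truncating a left-shifted result to its low 16 bits is an assumption rather than documented behaviour.

```c
#include <stdint.h>

/* Hypothetical host-side model of one output element.
 * c   : the unsigned constant C being shifted
 * src : shift amount taken from the source tensor, expected in [-16, 16]
 * Assumption: a left shift keeps only the low 16 bits of the result. */
static uint16_t ref_c_logical_shift(uint16_t c, int16_t src)
{
    if (src > 0)
        return (uint16_t)((uint32_t)c << src);
    /* src <= 0: zero-filling right shift by -src (no change when src == 0). */
    return (uint16_t)((uint32_t)c >> -src);
}
```

With C = 0x8000, a source element of -15 yields 1, whereas an arithmetic shift of the same bit pattern interpreted as int16 would yield -1.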

okk_bdc_fixed_point_packed_16bit_mul_8bit¶

void okk_bdc_fixed_point_packed_16bit_mul_8bit(local_addr_t dst_addr, local_addr_t src0_high_addr, local_addr_t src0_low_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_high_stride, const dim4 *src0_low_stride, const dim4 *src1_stride, mul_type_t mul_type, int rshift)¶

Perform multiplication of the elements of the source_0 (16-bit) and source_1 (8-bit) tensors for mixed fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) \times src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src0_high_addr – Address of the source_0_high tensor.
src0_low_addr – Address of the source_0_low tensor.
src1_addr – Address of the source_1 tensor.
shape – Pointer to the shape of the destination, source_0_high, source_0_low and source_1 tensors.
dst_stride – Pointer to the stride of the destination tensor.
src0_high_stride – Pointer to the stride of the source_0_high tensor.
src0_low_stride – Pointer to the stride of the source_0_low tensor.
src1_stride – Pointer to the stride of the source_1 tensor.
mul_type – Multiplication type.
rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

The data type of the source_0 tensor is int16 or uint16, the data type of the source_1 tensor is int8 or uint8, the data type of the destination tensor is int16 or uint16, and the data types are required to match mul_type.
The source_0_high and source_0_low tensors respectively store the most and least significant 8 bits of elements of the source_0 tensor. (See okk_bdc_fixed_point_packed_16bit_split_high_8bit() and okk_bdc_fixed_point_packed_16bit_split_low_8bit())
The destination tensor is in the 2N-mode, the source_0_high, source_0_low and source_1 tensors are in the 4N-mode.
The destination, source_0_high, source_0_low and source_1 tensors start at the first NPU.
dst_addr, src0_high_addr, src0_low_addr and src1_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src0_high_stride, src0_low_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The valid choices of mul_type are S16_MUL_S8_TO_S16, U16_MUL_S8_TO_S16 and U16_MUL_U8_TO_U16. S16_MUL_U8_TO_S16 is not supported on BM1684.
rshift is in [0, 31].
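
Adding 2^(rshift - 1) before the right shift rounds the product to the nearest integer instead of truncating it, with exact halves rounded toward positive infinity. The scalar sketch below illustrates that arithmetic only. The names asr64 and ref_mul_rshift are hypothetical, and the final narrowing to the 16-bit destination type is deliberately not modelled because this reference does not say how out-of-range results are handled.

```c
#include <stdint.h>

/* Portable arithmetic (sign-preserving) right shift, 0 <= s <= 63. */
static int64_t asr64(int64_t x, int s)
{
    return x < 0 ? ~((~x) >> s) : x >> s;
}

/* Hypothetical host-side model of one element before narrowing.
 * src0   : 16-bit operand (signed or unsigned), widened by the caller
 * src1   : 8-bit operand (signed or unsigned), widened by the caller
 * rshift : expected in [0, 31] */
static int64_t ref_mul_rshift(int32_t src0, int32_t src1, int rshift)
{
    int64_t prod = (int64_t)src0 * (int64_t)src1;
    if (rshift > 0) {
        /* Add half of the divisor, then shift: round to nearest. */
        prod = asr64(prod + ((int64_t)1 << (rshift - 1)), rshift);
    }
    return prod;
}
```

For instance, 7 × 3 with rshift = 2 gives (21 + 2) RSH 2 = 5, and -7 × 3 with rshift = 2 gives (-21 + 2) RSH 2 = -5.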

Fixed Point Unary Functions¶

okk_bdc_fixed_point_packed_16bit_split_high_8bit¶

void okk_bdc_fixed_point_packed_16bit_split_high_8bit(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)¶

Split the most significant 8 bits from the elements of the source tensor of 16-bit data type.

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the source tensor is int16 or uint16, the data type of the destination tensor is int8 or uint8.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr and src_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
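
At the level of a single element, taking the most significant 8 bits amounts to a shift by 8 on the raw bit pattern. The sketch below shows this on the host; ref_split_high_8bit is a hypothetical name, and whether the resulting byte is read as int8 or uint8 is left to the caller.

```c
#include <stdint.h>

/* Hypothetical host-side model of one element: return bits 15..8 of the
 * 16-bit pattern x. Pass an int16 value cast to uint16_t to model a signed
 * source, and reinterpret the result as int8_t if a signed byte is wanted. */
static uint8_t ref_split_high_8bit(uint16_t x)
{
    return (uint8_t)(x >> 8);
}
```

For example, 0x1234 yields 0x12 and 0xFF00 yields 0xFF.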

okk_bdc_fixed_point_packed_16bit_split_low_8bit¶

void okk_bdc_fixed_point_packed_16bit_split_low_8bit(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, const dim4 *work_stride)¶

Split the least significant 8 bits from the elements of the source tensor of 16-bit data type.

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.
work_stride – Pointer to the stride of the work tensor.

Remarks

The data type of the source and work tensors is int16 or uint16, the data type of the destination tensor is int8 or uint8.
The tensor is in the 4N-mode if its data type is int8 or uint8, 2N-mode if int16 or uint16.
The destination and source tensors start at the first NPU.
dst_addr, src_addr and work_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride, src_stride or work_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
The work tensor is a workspace that holds a temporary tensor of the same size as the source tensor. dst_addr must not be equal to work_addr.
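
For a single element, the least significant 8 bits are obtained by masking, and together with the most significant 8 bits they reconstruct the original 16-bit pattern. The host-side sketch below illustrates this relationship; ref_split_low_8bit and ref_join_16bit are hypothetical names and do not model the work tensor used on the device.

```c
#include <stdint.h>

/* Hypothetical host-side model of one element: return bits 7..0 of x. */
static uint8_t ref_split_low_8bit(uint16_t x)
{
    return (uint8_t)(x & 0xFFu);
}

/* Hypothetical helper: rebuild a 16-bit pattern from its two bytes. */
static uint16_t ref_join_16bit(uint8_t high, uint8_t low)
{
    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

For example, 0x1234 yields a low byte of 0x34, and joining 0x12 with 0x34 gives 0x1234 again.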