About Function Names

GDMA functions are named with the prefix okk_gdma_, and BDC functions are named with the prefix okk_bdc_.

Some Definitions

typedef unsigned int local_addr_t
typedef unsigned long long system_addr_t
typedef unsigned long long global_addr_t
typedef char s8x4[4]
typedef unsigned char u8x4[4]
typedef short s16x2[2]
typedef unsigned short u16x2[2]
union x32
  float fp32
  int s32
  unsigned int u32
  s8x4 _4N_s8
  u8x4 _4N_u8
  s16x2 _2N_s16
  u16x2 _2N_u16
union x16
  short s16
  unsigned short u16
union x8
  char s8
  unsigned char u8
class dim4
  int n
  int c
  int h
  int w
class dim2
  int h
  int w
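
As a small illustration of how the x32 union lets one 32-bit word be read as several types, the sketch below redeclares the union locally so that it compiles on a host. The redeclaration and the helpers x32_from_fp32 and fp32_bits are assumptions made for this example; on the device the definition would come from the library headers.

```c
#include <assert.h>

/* Local redeclaration for host-side experiments; member names follow the
 * list above. This is not the library header. */
typedef char s8x4[4];
typedef unsigned char u8x4[4];
typedef short s16x2[2];
typedef unsigned short u16x2[2];

typedef union {
    float fp32;
    int s32;
    unsigned int u32;
    s8x4 _4N_s8;
    u8x4 _4N_u8;
    s16x2 _2N_s16;
    u16x2 _2N_u16;
} x32;

/* Hypothetical helper: wrap an fp32 constant in an x32. */
static x32 x32_from_fp32(float value) {
    x32 c;
    c.fp32 = value;
    return c;
}

/* Hypothetical helper: read back the raw bit pattern of an fp32 value. */
static unsigned int fp32_bits(float value) {
    return x32_from_fp32(value).u32;
}
```

A constant built this way has the type that the x32 C parameter of functions such as okk_gdma_32bit_set_C_local expects.
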
enum op_type_t

Operation type of the fixed point binary operation.

enumerator S8_OP_S8_TO_S8 = 31

Value of int8 operates value of int8 to value of int8.

enumerator S8_OP_S8_TO_S16 = 23

Value of int8 operates value of int8 to value of int16.

enumerator S8_OP_U8_TO_S8 = 27

Value of int8 operates value of uint8 to value of int8.

enumerator S8_OP_U8_TO_S16 = 19

Value of int8 operates value of uint8 to value of int16.

enumerator U8_OP_S8_TO_S8 = 29

Value of uint8 operates value of int8 to value of int8.

enumerator U8_OP_S8_TO_S16 = 21

Value of uint8 operates value of int8 to value of int16.

enumerator U8_OP_U8_TO_S8 = 25

Value of uint8 operates value of uint8 to value of int8.

enumerator U8_OP_U8_TO_S16 = 17

Value of uint8 operates value of uint8 to value of int16.

enumerator U8_OP_U8_TO_U8 = 9

Value of uint8 operates value of uint8 to value of uint8.

enumerator U8_OP_U8_TO_U16 = 1

Value of uint8 operates value of uint8 to value of uint16.

enumerator S16_OP_S16_TO_S8 = 30

Value of int16 operates value of int16 to value of int8.

enumerator S16_OP_S16_TO_S16 = 22

Value of int16 operates value of int16 to value of int16.

enumerator S16_OP_U16_TO_S8 = 26

Value of int16 operates value of uint16 to value of int8.

enumerator S16_OP_U16_TO_S16 = 18

Value of int16 operates value of uint16 to value of int16.

enumerator U16_OP_S16_TO_S8 = 28

Value of uint16 operates value of int16 to value of int8.

enumerator U16_OP_S16_TO_S16 = 20

Value of uint16 operates value of int16 to value of int16.

enumerator U16_OP_U16_TO_S8 = 24

Value of uint16 operates value of uint16 to value of int8.

enumerator U16_OP_U16_TO_S16 = 16

Value of uint16 operates value of uint16 to value of int16.

enumerator U16_OP_U16_TO_U8 = 8

Value of uint16 operates value of uint16 to value of uint8.

enumerator U16_OP_U16_TO_U16 = 0

Value of uint16 operates value of uint16 to value of uint16.
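
The enumerator values listed above are consistent with a five-flag bit encoding: bit 0 for 8-bit operands, bit 1 for a signed first operand, bit 2 for a signed second operand, bit 3 for an 8-bit result, and bit 4 for a signed result. This is an observation drawn from the listed values, not something this reference states, and the helper op_type_value is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: rebuild an op_type_t value from five flags.
 * The bit assignment is inferred from the listed enumerators only. */
static int op_type_value(bool operands_8bit, bool src0_signed, bool src1_signed,
                         bool result_8bit, bool result_signed) {
    return (operands_8bit ? 1 : 0)
         | (src0_signed   ? 2 : 0)
         | (src1_signed   ? 4 : 0)
         | (result_8bit   ? 8 : 0)
         | (result_signed ? 16 : 0);
}
```

For example, op_type_value(false, true, false, true, true) reproduces the value listed for S16_OP_U16_TO_S8. Code that needs an operation type should still use the named enumerators rather than computed values.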

enum mul_type_t

Type of the fixed point multiplication.

enumerator S16_MUL_S8_TO_S16 = 7

Value of int16 multiplies by value of int8 to value of int16.

enumerator U16_MUL_S8_TO_S16 = 6

Value of uint16 multiplies by value of int8 to value of int16.

enumerator U16_MUL_U8_TO_U16 = 0

Value of uint16 multiplies by value of uint8 to value of uint16.

enumerator S16_MUL_U8_TO_S16 = 5

Value of int16 multiplies by value of uint8 to value of int16.

DIV_UP

DIV_UP(a, b) (((a) - 1) / (b) + 1)

ALIGN

ALIGN(a, b) DIV_UP (a, b) * (b)
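
The sketch below shows how the two macros behave for positive arguments and points out one pitfall. The macro bodies are copied from the definitions above; the helper padded_bytes is hypothetical. Because ALIGN as written expands to a product without outer parentheses, it is safest to wrap each use in parentheses when it appears inside a larger expression.

```c
#include <assert.h>

/* Macro bodies as given in this reference. */
#define DIV_UP(a, b) (((a) - 1) / (b) + 1)
#define ALIGN(a, b) DIV_UP (a, b) * (b)

/* Hypothetical helper: bytes needed for `count` elements of `elem_size`
 * bytes each, rounded up to a multiple of 128. Assumes count > 0. */
static int padded_bytes(int count, int elem_size) {
    /* The outer parentheses guard against the precedence pitfall. */
    return (ALIGN(count * elem_size, 128));
}
```

With these definitions, 256 / ALIGN(100, 128) expands to 256 / 1 * 128 rather than 256 / 128, which is why the helper parenthesizes the macro call.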

Common Functions

okk_initialize

void okk_initialize()

Initialize device before calling GDMA and BDC functions.

okk_poll

void okk_poll()

Synchronize the device so that all previously issued GDMA and BDC functions are done.

Remarks

  • Before calling this function, the parallel mode is required to be inactive.

  • This function blocks until all previously issued GDMA and BDC functions are done.

okk_parallel_start

void okk_parallel_start()

Start the parallel mode.

Remarks

  • Before calling this function, the parallel mode is required to be inactive.

  • After calling this function, the parallel mode is active, and subsequent GDMA and BDC functions run in parallel.

okk_parallel_end

void okk_parallel_end()

End the parallel mode.

Remarks

  • Before calling this function, the parallel mode is required to be active.

  • After calling this function, the parallel mode is inactive, and subsequent GDMA and BDC functions run serially.

okk_is_parallel_state

bool okk_is_parallel_state()

Get the flag of the current parallel mode.

Returns

Flag of the current parallel mode: true means active, false means inactive.
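
The ordering rules in the remarks for okk_poll, okk_parallel_start and okk_parallel_end can be sketched as a small state machine. The mock_ functions below are host-side stand-ins written for this example; they only track the parallel flag and are not the library implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side stand-in for the device's parallel flag. */
static bool mock_parallel_active = false;

static void mock_parallel_start(void) {
    assert(!mock_parallel_active);   /* parallel mode must be inactive */
    mock_parallel_active = true;
}

static void mock_parallel_end(void) {
    assert(mock_parallel_active);    /* parallel mode must be active */
    mock_parallel_active = false;
}

static bool mock_is_parallel_state(void) {
    return mock_parallel_active;
}

static void mock_poll(void) {
    assert(!mock_parallel_active);   /* polling requires inactive parallel mode */
}

/* One valid ordering: start, issue work, end, then poll. */
static bool run_parallel_section(void) {
    mock_parallel_start();
    bool was_active = mock_is_parallel_state();
    /* GDMA and BDC calls that should overlap would be issued here. */
    mock_parallel_end();
    mock_poll();
    return was_active && !mock_is_parallel_state();
}
```

In real kernel code the same sequence would use okk_parallel_start, okk_parallel_end and okk_poll in that order.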

okk_local_mem_size_per_npu

unsigned int okk_local_mem_size_per_npu()

Get the size in bytes of local memory in each NPU.

Returns

Size of local memory per NPU.

okk_l2_sram_size

unsigned int okk_l2_sram_size()

Get the size in bytes of L2-SRAM.

Returns

Size of L2-SRAM.

okk_dtcm_size

unsigned int okk_dtcm_size()

Get the size in bytes of DTCM.

Returns

Size of DTCM.

okk_npu_num

int okk_npu_num()

Get the number of NPUs in each TPU.

Returns

Number of NPUs.

Utils Functions

okk_start_npu_index

int okk_start_npu_index(local_addr_t addr)

Calculate the index of the NPU where the tensor starts.

Parameters

addr – Address of the tensor in local memory.

Returns

Index of NPU.

okk_channle_num_per_npu

int okk_channle_num_per_npu(int start_idx, int num_channels)

Calculate the number of channels in each NPU.

Parameters
  • start_idx – Index of the NPU where the tensor starts.

  • num_channels – Number of channels of the tensor.

Returns

Number of channels per NPU.
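
A host-side model can make the calculation concrete. The sketch below assumes that channel k of the tensor is placed on NPU (start_idx + k) % npu_num, so that the result is a ceiling division. That placement rule, the helper name model_channel_num_per_npu and the explicit npu_num parameter are assumptions for illustration and are not taken from this reference.

```c
#include <assert.h>

/* Assumed model: the tensor occupies ceil((start_idx + num_channels) / npu_num)
 * channel slots on each NPU, counting the unused slots before start_idx. */
static int model_channel_num_per_npu(int start_idx, int num_channels, int npu_num) {
    assert(npu_num > 0);
    assert(start_idx >= 0 && start_idx < npu_num);
    assert(num_channels > 0);
    return (start_idx + num_channels + npu_num - 1) / npu_num;
}
```

On the device, the NPU count would come from okk_npu_num and the result from okk_channle_num_per_npu itself.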

okk_128_byte_aligned_stride_for_32bit

void okk_128_byte_aligned_stride_for_32bit(dim4 *stride, int start_idx, const dim4 *shape)

Calculate strides of the tensor in the 128-Byte Aligned Layout for 32-bit data type.

Parameters
  • stride[out] – Pointer to the stride of the tensor.

  • start_idx – Index of the NPU where the tensor starts.

  • shape – Pointer to the shape of the tensor.

okk_128_byte_aligned_stride_for_16bit

void okk_128_byte_aligned_stride_for_16bit(dim4 *stride, int start_idx, const dim4 *shape)

Calculate strides of the tensor in the 128-Byte Aligned Layout for 16-bit data type.

Parameters
  • stride[out] – Pointer to the stride of the tensor.

  • start_idx – Index of the NPU where the tensor starts.

  • shape – Pointer to the shape of the tensor.

okk_128_byte_aligned_stride_for_8bit

void okk_128_byte_aligned_stride_for_8bit(dim4 *stride, int start_idx, const dim4 *shape)

Calculate strides of the tensor in the 128-Byte Aligned Layout for 8-bit data type.

Parameters
  • stride[out] – Pointer to the stride of the tensor.

  • start_idx – Index of the NPU where the tensor starts.

  • shape – Pointer to the shape of the tensor.
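
The three aligned-stride functions differ only in element width. As a rough host-side model, assume that strides are counted in elements, that the C stride is the h * w plane rounded up to a whole number of 128-byte lines, and that the N stride is the C stride times the per-NPU channel count. None of these rules is stated in this reference, so dim4_model, model_aligned_stride and the elem_bytes and npu_num parameters are illustrative only.

```c
#include <assert.h>

/* Stand-in for dim4, declared locally so the sketch compiles on a host. */
typedef struct { int n, c, h, w; } dim4_model;

/* Assumed model of the 128-Byte Aligned Layout, strides in elements.
 * elem_bytes is 4, 2 or 1 for 32-bit, 16-bit or 8-bit data. */
static void model_aligned_stride(dim4_model *stride, int start_idx,
                                 const dim4_model *shape,
                                 int elem_bytes, int npu_num) {
    int elems_per_line = 128 / elem_bytes;          /* 32, 64 or 128 */
    int plane = shape->h * shape->w;
    int c_per_npu = (start_idx + shape->c + npu_num - 1) / npu_num;

    stride->w = 1;
    stride->h = shape->w;
    /* Round the plane up to a multiple of one 128-byte line. */
    stride->c = (plane + elems_per_line - 1) / elems_per_line * elems_per_line;
    stride->n = c_per_npu * stride->c;
}
```

For a shape of (2, 64, 3, 5) starting at NPU 0 with 64 NPUs, this model gives a C stride of 32 elements for 32-bit data and 128 elements for 8-bit data.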

okk_compact_stride

void okk_compact_stride(dim4 *stride, int start_idx, const dim4 *shape)

Calculate strides of the tensor in the Compact Layout.

Parameters
  • stride[out] – Pointer to the stride of the tensor.

  • start_idx – Index of the NPU where the tensor starts.

  • shape – Pointer to the shape of the tensor.

okk_continuous_stride

void okk_continuous_stride(dim4 *stride, const dim4 *shape)

Calculate strides of the tensor in the Continuous Layout.

Parameters
  • stride[out] – Pointer to the stride of the tensor.

  • shape – Pointer to the shape of the tensor.
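
The difference between the Compact and Continuous layouts can be sketched on the host. The model below assumes strides are counted in elements, that the Compact Layout packs each h * w plane without padding and scales the N stride by the per-NPU channel count, and that the Continuous Layout is plain row-major NCHW. These rules and the names dim4_model, model_compact_stride and model_continuous_stride are assumptions for illustration.

```c
#include <assert.h>

/* Stand-in for dim4, declared locally so the sketch compiles on a host. */
typedef struct { int n, c, h, w; } dim4_model;

/* Assumed Compact Layout: unpadded planes, channels spread across NPUs. */
static void model_compact_stride(dim4_model *stride, int start_idx,
                                 const dim4_model *shape, int npu_num) {
    int c_per_npu = (start_idx + shape->c + npu_num - 1) / npu_num;
    stride->w = 1;
    stride->h = shape->w;
    stride->c = shape->h * shape->w;
    stride->n = c_per_npu * stride->c;
}

/* Assumed Continuous Layout: row-major NCHW in one flat memory. */
static void model_continuous_stride(dim4_model *stride, const dim4_model *shape) {
    stride->w = 1;
    stride->h = shape->w;
    stride->c = shape->h * shape->w;
    stride->n = shape->c * stride->c;
}
```

Under these assumptions the two layouts share W, H and C strides and differ only in the N stride, because only the Compact Layout divides the channels among NPUs.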

GDMA Functions

okk_gdma_32bit_cpy_S2L

void okk_gdma_32bit_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from system memory to local memory for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • src_addr – Address of the source tensor in system memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_32bit_cpy_L2S

void okk_gdma_32bit_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from local memory to system memory for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • src_addr – Address of the source tensor in local memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_32bit_cpy_L2L

void okk_gdma_32bit_cpy_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from local memory to local memory for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • src_addr – Address of the source tensor in local memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_32bit_cpy_S2S

void okk_gdma_32bit_cpy_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from system memory to system memory for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • src_addr – Address of the source tensor in system memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks
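
The formula dst(n, c, h, w) = src(n, c, h, w) combined with separate destination and source strides can be read as the loop nest below. This is a host-side reference model over flat arrays with strides counted in elements; it ignores how local memory is divided among NPUs. The names dim4_model, elem_offset and model_32bit_cpy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for dim4, declared locally so the sketch compiles on a host. */
typedef struct { int n, c, h, w; } dim4_model;

/* Element offset of (n, c, h, w) under the given strides. */
static size_t elem_offset(const dim4_model *stride, int n, int c, int h, int w) {
    return (size_t)n * stride->n + (size_t)c * stride->c
         + (size_t)h * stride->h + (size_t)w * stride->w;
}

/* Flat-memory model of a strided 32-bit tensor copy. */
static void model_32bit_cpy(uint32_t *dst, const uint32_t *src,
                            const dim4_model *shape,
                            const dim4_model *dst_stride,
                            const dim4_model *src_stride) {
    for (int n = 0; n < shape->n; ++n)
        for (int c = 0; c < shape->c; ++c)
            for (int h = 0; h < shape->h; ++h)
                for (int w = 0; w < shape->w; ++w)
                    dst[elem_offset(dst_stride, n, c, h, w)] =
                        src[elem_offset(src_stride, n, c, h, w)];
}
```

Copying a tightly packed source into a destination with a larger H stride, for instance, leaves the padding elements of the destination untouched.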

okk_gdma_32bit_matrix_S2L

void okk_gdma_32bit_matrix_S2L(local_addr_t dst_addr, system_addr_t src_addr, int rows, int cols, int cols_per_channel, int row_stride)

Copy matrix from system memory to local memory for 32-bit data type.

\[dst(x, y) = src(x, y)\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • src_addr – Address of the source tensor in system memory.

  • rows – Number of the rows of the matrix.

  • cols – Number of the columns of the matrix.

  • cols_per_channel – Number of the columns per channel of the destination matrix tensor.

  • row_stride – Stride of the row of the source matrix tensor.

Remarks

  • The destination tensor is in the Matrix Layout.

  • The elements of each row of the source matrix are continuous.

  • The data type of the destination and source tensors is 32-bit.
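
To illustrate how rows, cols, cols_per_channel and row_stride interact, the sketch below models the copy on flat host arrays. It assumes that the Matrix Layout splits each row into channels of cols_per_channel elements, giving a tensor of shape (rows, ceil(cols / cols_per_channel), 1, cols_per_channel). That interpretation is an assumption, as is the flat indexing of the destination; the name model_matrix_S2L is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flat-memory model: dst is indexed as [row][channel][w]; src rows are
 * row_stride elements apart and continuous within a row. */
static void model_matrix_S2L(uint32_t *dst, const uint32_t *src,
                             int rows, int cols,
                             int cols_per_channel, int row_stride) {
    int channels = (cols + cols_per_channel - 1) / cols_per_channel;
    for (int r = 0; r < rows; ++r) {
        for (int col = 0; col < cols; ++col) {
            int c = col / cols_per_channel;   /* channel holding this column */
            int w = col % cols_per_channel;   /* position inside the channel */
            dst[((size_t)r * channels + c) * cols_per_channel + w] =
                src[(size_t)r * row_stride + col];
        }
    }
}
```

When cols is not a multiple of cols_per_channel, the tail of the last channel in each row is left unwritten in this model.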

okk_gdma_32bit_matrix_L2S

void okk_gdma_32bit_matrix_L2S(system_addr_t dst_addr, local_addr_t src_addr, int rows, int cols, int cols_per_channel, int row_stride)

Copy matrix from local memory to system memory for 32-bit data type.

\[dst(x, y) = src(x, y)\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • src_addr – Address of the source tensor in local memory.

  • rows – Number of the rows of the matrix.

  • cols – Number of the columns of the matrix.

  • cols_per_channel – Number of the columns per channel of the source matrix tensor.

  • row_stride – Stride of the row of the destination matrix tensor.

Remarks

  • The elements of each row of the destination matrix are continuous.

  • The source tensor is in the Matrix Layout.

  • The data type of the destination and source tensors is 32-bit.

okk_gdma_32bit_set_C_system

void okk_gdma_32bit_set_C_system(system_addr_t dst_addr, x32 C, const dim4 *shape, const dim4 *dst_stride)

Set all the elements of the destination tensor in system memory to be a constant value for 32-bit data type.

\[dst(n, c, h, w) = C\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • C – Constant value of 32-bit to set.

  • shape – Pointer to the shape of the destination tensor.

  • dst_stride – Pointer to the stride of the destination tensor.

Remarks

okk_gdma_32bit_set_C_local

void okk_gdma_32bit_set_C_local(local_addr_t dst_addr, x32 C, const dim4 *shape, const dim4 *dst_stride)

Set all the elements of the destination tensor in local memory to be a constant value for 32-bit data type.

\[dst(n, c, h, w) = C\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • C – Constant value of 32-bit to set.

  • shape – Pointer to the shape of the destination tensor.

  • dst_stride – Pointer to the stride of the destination tensor.

Remarks

okk_gdma_16bit_cpy_S2L

void okk_gdma_16bit_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from system memory to local memory for 16-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • src_addr – Address of the source tensor in system memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_16bit_cpy_L2S

void okk_gdma_16bit_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from local memory to system memory for 16-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • src_addr – Address of the source tensor in local memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_16bit_cpy_L2L

void okk_gdma_16bit_cpy_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from local memory to local memory for 16-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • src_addr – Address of the source tensor in local memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_16bit_cpy_S2S

void okk_gdma_16bit_cpy_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from system memory to system memory for 16-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • src_addr – Address of the source tensor in system memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_16bit_set_C_system

void okk_gdma_16bit_set_C_system(system_addr_t dst_addr, x16 C, const dim4 *shape, const dim4 *dst_stride)

Set all the elements of the destination tensor in system memory to be a constant value for 16-bit data type.

\[dst(n, c, h, w) = C\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • C – Constant value of 16-bit to set.

  • shape – Pointer to the shape of the destination tensor.

  • dst_stride – Pointer to the stride of the destination tensor.

Remarks

okk_gdma_16bit_set_C_local

void okk_gdma_16bit_set_C_local(local_addr_t dst_addr, x16 C, const dim4 *shape, const dim4 *dst_stride)

Set all the elements of the destination tensor in local memory to be a constant value for 16-bit data type.

\[dst(n, c, h, w) = C\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • C – Constant value of 16-bit to set.

  • shape – Pointer to the shape of the destination tensor.

  • dst_stride – Pointer to the stride of the destination tensor.

Remarks

okk_gdma_8bit_cpy_S2L

void okk_gdma_8bit_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from system memory to local memory for 8-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • src_addr – Address of the source tensor in system memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_8bit_cpy_L2S

void okk_gdma_8bit_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from local memory to system memory for 8-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • src_addr – Address of the source tensor in local memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_8bit_cpy_L2L

void okk_gdma_8bit_cpy_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from local memory to local memory for 8-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • src_addr – Address of the source tensor in local memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_8bit_cpy_S2S

void okk_gdma_8bit_cpy_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy tensor from system memory to system memory for 8-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • src_addr – Address of the source tensor in system memory.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_gdma_8bit_set_C_system

void okk_gdma_8bit_set_C_system(system_addr_t dst_addr, x8 C, const dim4 *shape, const dim4 *dst_stride)

Set all the elements of the destination tensor in system memory to be a constant value for 8-bit data type.

\[dst(n, c, h, w) = C\]
Parameters
  • dst_addr – Address of the destination tensor in system memory.

  • C – Constant value of 8-bit to set.

  • shape – Pointer to the shape of the destination tensor.

  • dst_stride – Pointer to the stride of the destination tensor.

Remarks

okk_gdma_8bit_set_C_local

void okk_gdma_8bit_set_C_local(local_addr_t dst_addr, x8 C, const dim4 *shape, const dim4 *dst_stride)

Set all the elements of the destination tensor in local memory to be a constant value for 8-bit data type.

\[dst(n, c, h, w) = C\]
Parameters
  • dst_addr – Address of the destination tensor in local memory.

  • C – Constant value of 8-bit to set.

  • shape – Pointer to the shape of the destination tensor.

  • dst_stride – Pointer to the stride of the destination tensor.

Remarks

Memory Functions

okk_bdc_32bit_cpy

void okk_bdc_32bit_cpy(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Copy the elements of the source tensor to the destination tensor for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination tensor.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_32bit_set_C

void okk_bdc_32bit_set_C(local_addr_t dst_addr, x32 C, const dim4 *shape, const dim4 *dst_stride)

Set all the elements of the destination tensor to be a constant value for 32-bit data type.

\[dst(n, c, h, w) = C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • C – Constant value of 32-bit to set.

  • shape – Pointer to the shape of the destination tensor.

  • dst_stride – Pointer to the stride of the destination tensor.

Remarks

Data Type Converting Functions

okk_bdc_fp32_to_int32

void okk_bdc_fp32_to_int32(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Convert the elements of the source tensor from fp32 to int32.

\[dst(n, c, h, w) = \mathbf{INT32}(src(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

Remarks

  • The destination and source tensors are in the 128-Byte Aligned Layout.

  • The data type of the source tensor is fp32, the data type of the destination tensor is int32.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
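
The address and shape restrictions in the remarks above can be checked before a call is issued. The sketch below is a host-side validator written for this example; dim4_model, in_range and check_convert_args are hypothetical names, and the check does not cover the requirement that both tensors start at the same NPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for dim4, declared locally so the sketch compiles on a host. */
typedef struct { int n, c, h, w; } dim4_model;

static bool in_range(int value, int lo, int hi) {
    return value >= lo && value <= hi;
}

/* Returns true when the addresses are divisible by 128 and the shape is
 * within the documented limits. */
static bool check_convert_args(unsigned int dst_addr, unsigned int src_addr,
                               const dim4_model *shape) {
    return dst_addr % 128 == 0
        && src_addr % 128 == 0
        && in_range(shape->n, 1, 65535)
        && in_range(shape->h, 1, 65535)
        && in_range(shape->w, 1, 65535)
        && in_range(shape->c, 1, 4095);
}
```

A kernel could call such a check in debug builds and skip it in release builds, since it only restates the documented preconditions.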

okk_bdc_lookup_int32_to_fp32

void okk_bdc_lookup_int32_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Convert the elements of the source tensor from int32 to fp32 by lookup table.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

Remarks

  • The destination and source tensors are in the 128-Byte Aligned Layout.

  • The data type of the source tensor is int32, the data type of the destination tensor is fp32.

  • The elements of the source tensor are in [-128, 127].

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_4N_int8_to_fp32

void okk_bdc_4N_int8_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, bool is_signed, bool is_aligned_layout)

Convert the elements of the source tensor from int8 or uint8 to fp32.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

  • is_signed – Flag of the data type of the source tensor, true means int8, otherwise, uint8.

  • is_aligned_layout – Flag of the layout of the destination, source and work tensors: true means the 128-Byte Aligned Layout, otherwise the Compact Layout.

Remarks

  • The destination, source and work tensors are either all in the 128-Byte Aligned Layout or all in the Compact Layout.

  • The data type of the source and work tensors is int8 or uint8, the data type of the destination tensor is fp32.

  • The source and work tensors are in the 4N-mode.

  • The destination, source and work tensors start at the same NPU.

  • dst_addr, src_addr and work_addr are divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; dst_addr = work_addr is not allowed.

okk_bdc_int8_to_fp32

void okk_bdc_int8_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, bool is_signed, bool is_aligned_layout)

Convert the elements of the source tensor from int8 or uint8 to fp32.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

  • is_signed – Flag of the data type of the source tensor, true means int8, otherwise, uint8.

  • is_aligned_layout – Flag of the layout of the destination, source and work tensors: true means the 128-Byte Aligned Layout, otherwise the Compact Layout.

Remarks

  • The destination, source and work tensors are either all in the 128-Byte Aligned Layout or all in the Compact Layout.

  • The data type of the source and work tensors is int8 or uint8, the data type of the destination tensor is fp32.

  • The destination, source and work tensors start at the same NPU.

  • dst_addr, src_addr and work_addr are divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; dst_addr = work_addr is not allowed.

  • If the source and work tensors are in the Compact Layout, one more restriction applies: the C stride is ALIGN (shape->h * shape->w, 4) rather than shape->h * shape->w, so the source and work tensors are in an approximate Compact Layout.
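
The C stride rule for the approximate Compact Layout is a one-line calculation. The sketch below reuses the DIV_UP and ALIGN macros as defined earlier in this reference; the helper approx_compact_c_stride is hypothetical, and the stride is assumed to be counted in elements.

```c
#include <assert.h>

/* Macro bodies as given earlier in this reference. */
#define DIV_UP(a, b) (((a) - 1) / (b) + 1)
#define ALIGN(a, b) DIV_UP (a, b) * (b)

/* C stride of the int8/uint8 source and work tensors when
 * is_aligned_layout is false: h * w rounded up to a multiple of 4. */
static int approx_compact_c_stride(int h, int w) {
    return (ALIGN(h * w, 4));
}
```

For a 3 x 5 plane this gives 16 instead of the 15 that a strict Compact Layout would use.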

FP32 Binary Functions

okk_bdc_add

void okk_bdc_add(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform addition of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) + src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_add_C

void okk_bdc_add_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform addition of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) + C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to add.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_sub

void okk_bdc_sub(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform subtraction of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) - src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_sub_C

void okk_bdc_sub_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform subtraction of the elements of the source tensor by a constant value for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) - C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to subtract by.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_C_sub

void okk_bdc_C_sub(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform subtraction of a constant value by the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = C - src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value from which the source elements are subtracted.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks
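
okk_bdc_sub_C and okk_bdc_C_sub differ only in operand order, which is easy to mix up. The host-side model below applies the two documented formulas to flat arrays, ignoring shapes and strides; model_sub_C and model_C_sub are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] = src[i] - C, the formula documented for okk_bdc_sub_C. */
static void model_sub_C(float *dst, const float *src, float C, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i] - C;
}

/* dst[i] = C - src[i], the formula documented for okk_bdc_C_sub. */
static void model_C_sub(float *dst, const float *src, float C, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = C - src[i];
}
```

For the same input the two results are negatives of each other; okk_bdc_div_C and okk_bdc_C_div are related in the analogous way through their documented formulas.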

okk_bdc_mul

void okk_bdc_mul(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform multiplication of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_mul_C

void okk_bdc_mul_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform multiplication of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w) \times C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to multiply.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_div

void okk_bdc_div(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform division of the elements of the source_0 tensor by the elements of the source_1 tensor for fp32 data type.

\[dst(n, c, h, w) = \frac{src\_0(n, c, h, w)}{src\_1(n, c, h, w)}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_div_C

void okk_bdc_div_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform division of the elements of the source tensor by a constant value for fp32 data type.

\[dst(n, c, h, w) = \frac{src(n, c, h, w)}{C}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to divide by.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_C_div

void okk_bdc_C_div(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform division of a constant value by the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = \frac{C}{src(n, c, h, w)}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to be divided.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_mac

void okk_bdc_mac(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform multiply accumulation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = dst(n, c, h, w) + src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.
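Because the formula reads dst before writing it, the destination has to hold valid data (for example zeros) before the first call. The following is a minimal host-side sketch of the formula only, assuming flat contiguous fp32 arrays instead of local memory addressed through shape and strides; ref_mac is a hypothetical helper and not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for the okk_bdc_mac formula.
 * Assumption: flat, contiguous fp32 arrays; the real function works on
 * local memory described by shape and stride arguments. */
void ref_mac(float *dst, const float *src0, const float *src1, size_t count)
{
    assert(dst != NULL && src0 != NULL && src1 != NULL);
    for (size_t i = 0; i < count; ++i) {
        /* dst = dst + src0 * src1: the old dst value is part of the result */
        dst[i] += src0[i] * src1[i];
    }
}
```

Calling ref_mac repeatedly on the same dst accumulates one product per call, which is the pattern the accumulate form is meant for.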

Remarks

okk_bdc_mac_C

void okk_bdc_mac_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform multiply accumulation of the elements of the source and a constant value for fp32 data type.

\[dst(n, c, h, w) = dst(n, c, h, w) + src(n, c, h, w) \times C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to multiply.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_max

void okk_bdc_max(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform maximum operation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = \max(src\_0(n, c, h, w), src\_1(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_max_C

void okk_bdc_max_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform maximum operation of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = \max(src(n, c, h, w), C)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to compare each source element with.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_min

void okk_bdc_min(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform minimum operation of the elements of the source_0 and source_1 tensors for fp32 data type.

\[dst(n, c, h, w) = \min(src\_0(n, c, h, w), src\_1(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_min_C

void okk_bdc_min_C(local_addr_t dst_addr, local_addr_t src_addr, float C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform minimum operation of the elements of the source tensor and a constant value for fp32 data type.

\[dst(n, c, h, w) = \min(src(n, c, h, w), C)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to compare each source element with.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_greater_select_value

void okk_bdc_greater_select_value(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform greater-than comparison of the elements of the source_0 and source_1 tensors for fp32 data type, then select a constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src\_0(n, c, h, w)>src\_1(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • select_val – Constant value to select.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_greater_C_select_value

void okk_bdc_greater_C_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform greater-than comparison of the elements of the source tensor and a constant value for fp32 data type, then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src(n, c, h, w)>C,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to compare each source element with.

  • select_val – Constant value to select.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the source tensor is fp32, the data type of the destination tensor is some 32-bit type.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
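The destination is described as "some 32-bit type" because select_val is an x32 union: whichever member is set, its 32 bits are what ends up in the destination. The sketch below models a single element on the host, assuming a reduced copy of the x32 union; ref_greater_C_select is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* Reduced copy of the x32 union from the definitions above
 * (only the members this sketch needs). */
typedef union {
    float fp32;
    int s32;
    unsigned int u32;
} x32;

/* Hypothetical host-side model of one element of
 * okk_bdc_greater_C_select_value. Returns the raw 32 bits stored in dst. */
unsigned int ref_greater_C_select(float src, float C, x32 select_val)
{
    assert(sizeof(x32) == 4);
    /* select_val if src > C, otherwise all-zero bits */
    return (src > C) ? select_val.u32 : 0u;
}
```

Setting select_val.s32 = 1 would yield an integer 0/1 mask, while select_val.fp32 = 1.0f would yield an fp32 mask with the bit pattern 0x3F800000.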

okk_bdc_C_greater_select_value

void okk_bdc_C_greater_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform greater-than comparison of a constant value and the elements of the source tensor for fp32 data type, then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}C>src(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to compare against each source element.

  • select_val – Constant value to select.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the source tensor is fp32, the data type of the destination tensor is some 32-bit type.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_equal_select_value

void okk_bdc_equal_select_value(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform equality comparison of the elements of the source_0 and source_1 tensors for fp32 data type, then select a constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src\_0(n, c, h, w)=src\_1(n, c, h, w),\\0&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • select_val – Constant value to select.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_equal_C_select_value

void okk_bdc_equal_C_select_value(local_addr_t dst_addr, local_addr_t src_addr, float C, x32 select_val, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform equality comparison of the elements of the source tensor and a constant value for fp32 data type, then select another constant value or zero as the result.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}select\_val&{\text{if }}src(n, c, h, w)=C,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to compare each source element with.

  • select_val – Constant value to select.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the source tensor is fp32, the data type of the destination tensor is some 32-bit type.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

32-Bit Binary Functions

okk_bdc_32bit_and

void okk_bdc_32bit_and(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform bit-wise AND operation of the elements of the source_0 and source_1 tensors for 32-bit data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w)\ \mathbf{AND}\ src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_32bit_and_C

void okk_bdc_32bit_and_C(local_addr_t dst_addr, local_addr_t src_addr, x32 C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform bit-wise AND operation of the elements of the source tensor and a constant value for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\ \mathbf{AND}\ C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – 32-bit constant operand.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.
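Since C is passed as an x32 union, a bit mask can be supplied through its u32 member. One illustrative use, sketched on the host: AND-ing an IEEE 754 fp32 bit pattern with 0x7FFFFFFF clears the sign bit and so yields the absolute value. The x32 below is a reduced copy of the union, and ref_32bit_and_C and ref_abs_via_and are hypothetical helpers that model a single element, not library functions.

```c
#include <assert.h>
#include <string.h>

/* Reduced copy of the x32 union from the definitions above
 * (only the members this sketch needs). */
typedef union {
    float fp32;
    int s32;
    unsigned int u32;
} x32;

/* Hypothetical host-side model of one element of okk_bdc_32bit_and_C:
 * the 32-bit pattern of src is AND-ed with the 32-bit pattern of C. */
unsigned int ref_32bit_and_C(unsigned int src_bits, x32 C)
{
    return src_bits & C.u32;
}

/* Absolute value of an fp32 number by clearing the IEEE 754 sign bit. */
float ref_abs_via_and(float v)
{
    x32 mask;
    unsigned int bits;
    float out;

    assert(sizeof(float) == sizeof(unsigned int));
    mask.u32 = 0x7FFFFFFFu;
    memcpy(&bits, &v, sizeof bits);   /* reinterpret the float as raw bits */
    bits = ref_32bit_and_C(bits, mask);
    memcpy(&out, &bits, sizeof out);  /* reinterpret the bits as a float */
    return out;
}
```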

Remarks

okk_bdc_32bit_or

void okk_bdc_32bit_or(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform bit-wise OR operation of the elements of the source_0 and source_1 tensors for 32-bit data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w)\ \mathbf{OR}\ src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_32bit_or_C

void okk_bdc_32bit_or_C(local_addr_t dst_addr, local_addr_t src_addr, x32 C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform bit-wise OR operation of the elements of the source tensor and a constant value for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\ \mathbf{OR}\ C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – 32-bit constant operand.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_32bit_xor

void okk_bdc_32bit_xor(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform bit-wise XOR operation of the elements of the source_0 and source_1 tensors for 32-bit data type.

\[dst(n, c, h, w) = src\_0(n, c, h, w)\ \mathbf{XOR}\ src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

okk_bdc_32bit_xor_C

void okk_bdc_32bit_xor_C(local_addr_t dst_addr, local_addr_t src_addr, x32 C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform bit-wise XOR operation of the elements of the source tensor and a constant value for 32-bit data type.

\[dst(n, c, h, w) = src(n, c, h, w)\ \mathbf{XOR}\ C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – 32-bit constant operand.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_32bit_arithmetic_shift

void okk_bdc_32bit_arithmetic_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform arithmetic shift operation of the elements of the source_0 tensor by the elements of the source_1 tensor for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

  • The data type of the destination, source_0 and source_1 tensors is int32.

  • The elements of the source_1 tensor are in [-32, 32]; a positive value shifts left and a negative value shifts right.

  • The destination, source_0 and source_1 tensors start at the same NPU.

  • dst_addr, src0_addr and src1_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
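The signed shift amount can be modelled per element on the host as follows. This is a sketch under stated assumptions, not the hardware definition: ref_arithmetic_shift is a hypothetical helper, and the results for the boundary shifts of 32 and -32 (every bit moved out, with sign fill on the right) are assumed rather than taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of one element of
 * okk_bdc_32bit_arithmetic_shift. */
int32_t ref_arithmetic_shift(int32_t value, int32_t shift)
{
    assert(shift >= -32 && shift <= 32);
    if (shift > 0) {
        if (shift == 32) {
            return 0;  /* assumption: all bits are shifted out */
        }
        /* Shift the unsigned pattern to avoid signed-overflow issues in C. */
        return (int32_t)((uint32_t)value << shift);
    }
    int32_t n = -shift;
    if (n == 32) {
        return (value < 0) ? -1 : 0;  /* assumption: only sign bits remain */
    }
    /* Portable sign-propagating right shift: for negative values, shift the
     * complement (which is non-negative) and complement the result. */
    return (value < 0) ? ~(~value >> n) : (value >> n);
}
```

With this model, a shift of -1 applied to -7 gives -4, that is, the right shift rounds toward negative infinity.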

okk_bdc_32bit_logical_shift

void okk_bdc_32bit_logical_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform logical shift operation of the elements of the source_0 tensor by the elements of the source_1 tensor for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

  • The data type of the destination and source_0 tensors is uint32, the data type of the source_1 tensor is int32.

  • The elements of the source_1 tensor are in [-32, 32]; a positive value shifts left and a negative value shifts right.

  • The destination, source_0 and source_1 tensors start at the same NPU.

  • dst_addr, src0_addr and src1_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
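A logical shift differs from an arithmetic one only on the right-shift side: vacated high bits are filled with zeros instead of copies of the sign bit. A per-element host-side sketch, where ref_logical_shift is a hypothetical helper and the results for shifts of exactly 32 and -32 are an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of one element of
 * okk_bdc_32bit_logical_shift: uint32 data, int32 shift amount. */
uint32_t ref_logical_shift(uint32_t value, int32_t shift)
{
    assert(shift >= -32 && shift <= 32);
    if (shift > 0) {
        /* assumption: a shift of 32 moves every bit out */
        return (shift == 32) ? 0u : (value << shift);
    }
    int32_t n = -shift;
    /* Right shift with zero fill. */
    return (n == 32) ? 0u : (value >> n);
}
```

For example, the pattern 0xFFFFFFF8 shifted by -1 becomes 0x7FFFFFFC here, whereas a sign-filling shift of the same bits would keep the top bit set.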

okk_bdc_32bit_arithmetic_shift_C

void okk_bdc_32bit_arithmetic_shift_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform arithmetic shift operation of the elements of the source tensor by a constant value for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to shift by.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the destination and source tensors is int32.

  • The constant value C is in [-32, 32]; a positive value shifts left and a negative value shifts right.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_logical_shift_C

void okk_bdc_32bit_logical_shift_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform logical shift operation of the elements of the source tensor by a constant value for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to shift by.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the destination and source tensors is uint32.

  • The constant value C is in [-32, 32]; a positive value shifts left and a negative value shifts right.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_C_arithmetic_shift

void okk_bdc_32bit_C_arithmetic_shift(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform arithmetic shift operation of a constant value by the elements of the source tensor for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to be shifted.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the destination and source tensors is int32.

  • The elements of the source tensor are in [-32, 32]; a positive value shifts left and a negative value shifts right.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_32bit_C_logical_shift

void okk_bdc_32bit_C_logical_shift(local_addr_t dst_addr, local_addr_t src_addr, unsigned int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform logical shift operation of a constant value by the elements of the source tensor for 32-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to be shifted.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the destination and source tensors is int32.

  • The elements of the source tensor are in [-32, 32]; a positive value shifts left and a negative value shifts right.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr must be divisible by 4, preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

FP32 Unary Functions

okk_bdc_rsqrt

void okk_bdc_rsqrt(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Calculate reciprocal of the square-root of the elements of the source tensor.

\[dst(n, c, h, w) = \frac{1}{\sqrt{src(n, c, h, w)}}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

Remarks

  • The destination and source tensors are in the 128-Byte Aligned Layout.

  • The data type of the destination and source tensors is fp32.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_sqrt

void okk_bdc_sqrt(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Calculate square-root of the elements of the source tensor.

\[dst(n, c, h, w) = \sqrt{src(n, c, h, w)}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

Remarks

  • The destination and source tensors are in the 128-Byte Aligned Layout.

  • The data type of the destination and source tensors is fp32.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_taylor_exp

void okk_bdc_taylor_exp(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, int num_series)

Calculate exponential of the elements of the source tensor by Taylor expansion.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

  • num_series – Number of terms in the Taylor series expansion.

Remarks

  • The destination and source tensors are in the 128-Byte Aligned Layout.

  • The data type of the destination and source tensors is fp32.

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • num_series is in [1, 64]; larger values improve accuracy at the cost of performance.

  • This function is suitable only when the absolute values of the elements of the source tensor are small, that is, less than one.
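The trade-off behind num_series, and the reason for the restriction to small inputs, can be seen in a plain truncated Taylor sum. The sketch below is a host-side model only: ref_taylor_exp is a hypothetical helper, and counting num_series as the number of summed terms (starting from the constant term 1) is an assumption about how the hardware counts.

```c
#include <assert.h>

/* Hypothetical host-side model: sum of the first num_series terms of
 * e^x = 1 + x + x^2/2! + x^3/3! + ...
 * Assumption: num_series counts terms including the leading 1. */
float ref_taylor_exp(float x, int num_series)
{
    assert(num_series >= 1 && num_series <= 64);
    float term = 1.0f;  /* x^0 / 0! */
    float sum = 1.0f;
    for (int k = 1; k < num_series; ++k) {
        term *= x / (float)k;  /* x^k / k! derived from the previous term */
        sum += term;
    }
    return sum;
}
```

In this model, ten terms already reproduce e^0.5 to within float rounding, while ten terms at x = 8 give a value far below e^8 (about 2981), because the terms of the series keep growing until k exceeds |x|.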

okk_bdc_lookup_exp

void okk_bdc_lookup_exp(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Calculate exponential of the elements of the source tensor by lookup table.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

Remarks

  • The destination and source tensors are in the 128-Byte Aligned Layout.

  • The data type of the source tensor is int32, the data type of the destination tensor is fp32.

  • The elements of the source tensor are in [-103, 88].

  • The destination and source tensors start at the same NPU.

  • dst_addr and src_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

okk_bdc_exp

void okk_bdc_exp(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)

Calculate exponential of the elements of the source tensor.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

Remarks

  • The destination, source and work tensors are in the 128-Byte Aligned Layout.

  • The data type of the destination, source and work tensors is fp32.

  • The elements of the source tensor are in [-103.0, 88.0].

  • The destination, source and work tensors start at the same NPU.

  • dst_addr, src_addr and work_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • The work tensor is a workspace that holds a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.
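The address rules in these remarks are easy to get wrong when laying out local memory by hand, so a small guard before the call can help. The following sketch checks only the two address rules stated above (128-byte divisibility and the work tensor not sharing an address with the destination or source); exp_addrs_ok and require_exp_addrs are hypothetical helpers, not library functions, and they do not check tensor sizes or overlap beyond equal start addresses.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int local_addr_t;  /* as in the definitions above */

/* Hypothetical pre-flight check for the okk_bdc_exp address arguments. */
bool exp_addrs_ok(local_addr_t dst_addr, local_addr_t src_addr,
                  local_addr_t work_addr)
{
    /* All three addresses must be divisible by 128. */
    if (dst_addr % 128u != 0 || src_addr % 128u != 0 || work_addr % 128u != 0) {
        return false;
    }
    /* The workspace must not start at the destination or the source. */
    if (work_addr == dst_addr || work_addr == src_addr) {
        return false;
    }
    return true;
}

/* Debug-build guard built on the predicate above. */
void require_exp_addrs(local_addr_t dst_addr, local_addr_t src_addr,
                       local_addr_t work_addr)
{
    assert(exp_addrs_ok(dst_addr, src_addr, work_addr));
}
```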

okk_bdc_exp_tunable

void okk_bdc_exp_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)

Calculate exponential of the elements of the source tensor with a tunable number of Taylor series terms.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

  • num_series – Number of terms in the Taylor series expansion.

Remarks

okk_bdc_sigmoid

void okk_bdc_sigmoid(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)

Calculate sigmoid of the elements of the source tensor.

\[dst(n, c, h, w) = \text{sigmoid}(src(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

Remarks

  • The destination, source and work tensors are in the 128-Byte Aligned Layout.

  • The data type of the destination, source and work tensors is fp32.

  • The elements of the source tensor are in [-103.0, 88.0].

  • The destination, source and work tensors start at the same NPU.

  • dst_addr, src_addr and work_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • The work tensor is a workspace that holds a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.

okk_bdc_sigmoid_tunable

void okk_bdc_sigmoid_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)

Calculate sigmoid of the elements of the source tensor with a tunable number of Taylor series terms.

\[dst(n, c, h, w) = \text{sigmoid}(src(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

  • num_series – Number of terms in the Taylor series expansion.

Remarks

okk_bdc_tanh

void okk_bdc_tanh(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)

Calculate tanh of the elements of the source tensor.

\[dst(n, c, h, w) = \text{tanh}(src(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

Remarks

  • The destination, source and work tensors are in the 128-Byte Aligned Layout.

  • The data type of the destination, source and work tensors is fp32.

  • The elements of the source tensor are in [-103.0, 88.0].

  • The destination, source and work tensors start at the same NPU.

  • dst_addr, src_addr and work_addr are divisible by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • The work tensor is a workspace that holds a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.

okk_bdc_tanh_tunable

void okk_bdc_tanh_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)

Calculate tanh of the elements of the source tensor with a tunable number of Taylor expansion terms.

\[dst(n, c, h, w) = \text{tanh}(src(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

  • num_series – Number of Taylor expansion terms.

Remarks

okk_bdc_reciprocal

void okk_bdc_reciprocal(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Calculate reciprocal of the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = src(n, c, h, w)^{-1}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_neg

void okk_bdc_neg(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Calculate negative of the elements of the source tensor for fp32 data type.

\[dst(n, c, h, w) = -src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

FP32 Neural Network Functions

okk_bdc_relu

void okk_bdc_relu(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Calculate ReLU of the elements of the source tensor for fp32 data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_bias

void okk_bdc_bias(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t bias_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Add a bias to the elements of the source tensor per channel.

\[dst(n, c, h, w) = src(n, c, h, w) + bias(0, c, 0, 0)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • bias_addr – Address of the bias tensor.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_scale

void okk_bdc_scale(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t scale_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Scale the elements of the source tensor per channel.

\[dst(n, c, h, w) = src(n, c, h, w)\times scale(0, c, 0, 0)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • scale_addr – Address of the scale tensor.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

okk_bdc_scale_bias

void okk_bdc_scale_bias(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t scale_addr, local_addr_t bias_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Scale the elements of the source tensor and add a bias, per channel.

\[dst(n, c, h, w) = src(n, c, h, w)\times scale(0, c, 0, 0) + bias(0, c, 0, 0)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • scale_addr – Address of the scale tensor.

  • bias_addr – Address of the bias tensor.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks
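The per-channel broadcast in the formula above can be written as a plain host-side reference. This is a sketch only: it assumes contiguous NCHW float arrays in host memory and ignores dst_stride and src_stride and the device memory layout; scale_bias_ref is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical host-side reference for
 * dst(n, c, h, w) = src(n, c, h, w) * scale(0, c, 0, 0) + bias(0, c, 0, 0).
 * dst and src are contiguous NCHW arrays; scale and bias hold one value
 * per channel. */
void scale_bias_ref(float *dst, const float *src,
                    const float *scale, const float *bias,
                    int n, int c, int h, int w) {
    size_t plane = (size_t)h * (size_t)w; /* elements per (n, c) slice */
    for (int ni = 0; ni < n; ++ni) {
        for (int ci = 0; ci < c; ++ci) {
            size_t base = ((size_t)ni * (size_t)c + (size_t)ci) * plane;
            for (size_t i = 0; i < plane; ++i) {
                /* The same scale and bias apply to every element of a channel. */
                dst[base + i] = src[base + i] * scale[ci] + bias[ci];
            }
        }
    }
}
```

A reference like this is mainly useful for comparing device results against expected values in a test.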

okk_bdc_conv2d

void okk_bdc_conv2d(local_addr_t output_addr, local_addr_t input_addr, local_addr_t weight_addr, local_addr_t bias_addr, const dim4 *input_shape, int output_c, int kernel_h, int kernel_w, const dim4 *input_stride, const dim4 *kernel_stride, bool using_bias, bool result_add, const Padding *padding, const dim2 *stride, const dim2 *dilation)

Perform 2D convolution with or without adding bias and result accumulation by addition.

Parameters
  • output_addr – Address of the output tensor.

  • input_addr – Address of the input tensor.

  • weight_addr – Address of the weight tensor.

  • bias_addr – Address of the bias tensor, only used when using_bias = true.

  • input_shape – Pointer to the shape of the input tensor.

  • output_c – Channel number of the output tensor.

  • kernel_h – Height of the convolution kernel.

  • kernel_w – Width of the convolution kernel.

  • input_stride – Pointer to the stride of the input tensor.

  • kernel_stride – Pointer to the stride of the weight tensor.

  • using_bias – Flag of adding bias.

  • result_add – Flag of performing result accumulation by addition.

  • padding – Pointer to the amount of padding applied to the input tensor.

  • stride – Pointer to the strides for the cross-correlation.

  • dilation – Pointer to the spacings between the kernel points.

Remarks
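The output height and width are not stated in this section. Assuming the conventional convolution arithmetic applies, each spatial output size follows from the input size, kernel size, padding, stride and dilation as sketched below; conv_out_size and the pad_before and pad_after parameters are hypothetical names, since the fields of Padding are not shown here.

```c
#include <assert.h>

/* Hypothetical helper: conventional output size along one spatial axis.
 * A dilated kernel spans dilation * (kernel - 1) + 1 input positions. */
int conv_out_size(int in_size, int kernel, int pad_before, int pad_after,
                  int stride, int dilation) {
    assert(in_size >= 1 && kernel >= 1 && stride >= 1 && dilation >= 1);
    assert(pad_before >= 0 && pad_after >= 0);
    int effective_kernel = dilation * (kernel - 1) + 1;
    int padded = in_size + pad_before + pad_after;
    assert(padded >= effective_kernel); /* the kernel must fit at least once */
    return (padded - effective_kernel) / stride + 1;
}
```

For example, a 3-wide kernel with one element of padding on each side and unit stride keeps a 5-wide input at 5 outputs, while a stride of 2 without padding reduces a 7-wide input to 3 outputs.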

okk_bdc_depthwise2d

void okk_bdc_depthwise2d(local_addr_t output_addr, local_addr_t input_addr, local_addr_t weight_addr, local_addr_t bias_addr, const dim4 *input_shape, int kernel_h, int kernel_w, bool using_bias, const Padding *padding, const dim2 *stride, const dim2 *dilation)

Perform 2D depthwise convolution with or without adding bias.

Parameters
  • output_addr – Address of the output tensor.

  • input_addr – Address of the input tensor.

  • weight_addr – Address of the weight tensor.

  • bias_addr – Address of the bias tensor, only used when using_bias = true.

  • input_shape – Pointer to the shape of the input tensor.

  • kernel_h – Height of the convolution kernel.

  • kernel_w – Width of the convolution kernel.

  • using_bias – Flag of adding bias.

  • padding – Pointer to the amount of padding applied to the input tensor.

  • stride – Pointer to the strides for the cross-correlation.

  • dilation – Pointer to the spacings between the kernel points.

Remarks

okk_bdc_avg_pool2d

void okk_bdc_avg_pool2d(local_addr_t output_addr, local_addr_t input_addr, const dim4 *input_shape, int kernel_h, int kernel_w, const Padding *padding, const dim2 *stride)

Perform 2D average pooling.

Parameters
  • output_addr – Address of the output tensor.

  • input_addr – Address of the input tensor.

  • input_shape – Pointer to the shape of the input tensor.

  • kernel_h – Height of the pooling kernel.

  • kernel_w – Width of the pooling kernel.

  • padding – Pointer to the amount of padding applied to the input tensor.

  • stride – Pointer to the strides of the pooling window.

Remarks

okk_bdc_avg_pool2d_v2

void okk_bdc_avg_pool2d_v2(local_addr_t output_addr, local_addr_t input_addr, const dim4 *input_shape, float scale, int kernel_h, int kernel_w, const Padding *padding, const dim2 *stride)

Perform 2D average pooling with a custom scale value applied to each pooling window.

Parameters
  • output_addr – Address of the output tensor.

  • input_addr – Address of the input tensor.

  • input_shape – Pointer to the shape of the input tensor.

  • scale – Scale factor of each pooling window.

  • kernel_h – Height of the pooling kernel.

  • kernel_w – Width of the pooling kernel.

  • padding – Pointer to the amount of padding applied to the input tensor.

  • stride – Pointer to the strides of the pooling window.

Remarks
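One plausible reading of the scale parameter is that each output is the sum over its pooling window multiplied by scale, so that scale = 1 / (kernel_h * kernel_w) reproduces an ordinary average. The sketch below illustrates that reading for a single channel without padding; it is an assumption rather than documented behaviour, and avg_pool2d_scaled_ref is a hypothetical name.

```c
/* Hypothetical host-side reference: single-channel 2D pooling where each
 * output is (sum of the window) * scale. No padding is modelled. */
void avg_pool2d_scaled_ref(float *out, const float *in,
                           int in_h, int in_w,
                           int kernel_h, int kernel_w,
                           int stride_h, int stride_w, float scale) {
    int out_h = (in_h - kernel_h) / stride_h + 1;
    int out_w = (in_w - kernel_w) / stride_w + 1;
    for (int oh = 0; oh < out_h; ++oh) {
        for (int ow = 0; ow < out_w; ++ow) {
            float sum = 0.0f;
            for (int kh = 0; kh < kernel_h; ++kh) {
                for (int kw = 0; kw < kernel_w; ++kw) {
                    int ih = oh * stride_h + kh; /* input row of this tap */
                    int iw = ow * stride_w + kw; /* input column of this tap */
                    sum += in[ih * in_w + iw];
                }
            }
            out[oh * out_w + ow] = sum * scale;
        }
    }
}
```

Under this reading, passing scale = 1.0 turns the operation into sum pooling, which is one reason a caller might prefer the v2 variant.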

okk_bdc_max_pool2d

void okk_bdc_max_pool2d(local_addr_t output_addr, local_addr_t input_addr, const dim4 *input_shape, int kernel_h, int kernel_w, const Padding *padding, const dim2 *stride)

Perform 2D max pooling.

Parameters
  • output_addr – Address of the output tensor.

  • input_addr – Address of the input tensor.

  • input_shape – Pointer to the shape of the input tensor.

  • kernel_h – Height of the pooling kernel.

  • kernel_w – Width of the pooling kernel.

  • padding – Pointer to the amount of padding applied to the input tensor.

  • stride – Pointer to the strides of the pooling window.

Remarks

okk_bdc_matmul

void okk_bdc_matmul(local_addr_t output_addr, local_addr_t left_addr, local_addr_t right_addr, local_addr_t bias_addr, int left_rows, int left_cols, int right_cols, int left_cols_per_channel, int right_cols_per_channel, bool using_bias, bool result_add)

Perform matrix multiplication with or without adding bias and result accumulation by addition.

Parameters
  • output_addr – Address of the output tensor.

  • left_addr – Address of the left matrix tensor.

  • right_addr – Address of the right matrix tensor.

  • bias_addr – Address of the bias tensor, only used when using_bias = true.

  • left_rows – Number of the rows of the left matrix.

  • left_cols – Number of the columns of the left matrix.

  • right_cols – Number of the columns of the right matrix.

  • left_cols_per_channel – Number of the columns of the left matrix per channel.

  • right_cols_per_channel – Number of the columns of the right matrix per channel.

  • using_bias – Flag of adding bias.

  • result_add – Flag of performing result accumulation by addition.

Remarks
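The interplay of using_bias and result_add can be illustrated with a host-side reference. The sketch assumes row-major float matrices in host memory, one bias value per output column, and that result_add adds the new product to the existing output; none of these details are confirmed in this section. The left_cols_per_channel and right_cols_per_channel parameters describe the device layout and are not modelled. matmul_ref is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical host-side reference:
 * out = (result_add ? out : 0) + left x right + (using_bias ? bias : 0).
 * left is left_rows x left_cols, right is left_cols x right_cols,
 * bias holds right_cols values (assumed: one per output column). */
void matmul_ref(float *out, const float *left, const float *right,
                const float *bias, int left_rows, int left_cols,
                int right_cols, bool using_bias, bool result_add) {
    for (int r = 0; r < left_rows; ++r) {
        for (int c = 0; c < right_cols; ++c) {
            float acc = 0.0f;
            for (int k = 0; k < left_cols; ++k) {
                acc += left[r * left_cols + k] * right[k * right_cols + c];
            }
            if (using_bias) acc += bias[c];                  /* bias is read only when enabled */
            if (result_add) acc += out[r * right_cols + c];  /* accumulate onto the old result */
            out[r * right_cols + c] = acc;
        }
    }
}
```

Accumulation of this kind is typically what lets a large inner dimension be processed in several slices, each call adding its partial product to the same output.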

Fixed Point Binary Functions

okk_bdc_fixed_point_packed_add

void okk_bdc_fixed_point_packed_add(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)

Perform addition of the elements of the source_0 and source_1 tensors for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) + src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) + src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

  • op_type – Operation type.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks
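The two formulas above amount to an addition followed by a round-half-up right shift: adding 2^(rshift - 1) before shifting rounds the result instead of truncating it. A scalar sketch follows; packed_add_ref and arith_rsh are hypothetical helpers, and narrowing the result to the destination type selected by op_type (for example saturation versus wrap-around) is not modelled because it is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: arithmetic right shift (floor division by 2^shift)
 * written so that it does not rely on implementation-defined behaviour
 * for negative operands. */
static int32_t arith_rsh(int32_t value, int shift) {
    return value >= 0 ? value >> shift : ~((~value) >> shift);
}

/* Hypothetical scalar reference for one element of the packed addition. */
int32_t packed_add_ref(int32_t src0, int32_t src1, int rshift) {
    assert(rshift >= 0 && rshift < 31);
    int32_t sum = src0 + src1; /* operands are assumed to be 8- or 16-bit values */
    if (rshift > 0) {
        sum += (int32_t)1 << (rshift - 1); /* rounding term 2^(rshift - 1) */
        return arith_rsh(sum, rshift);
    }
    return sum;
}
```

For instance, 7 shifted right by 2 with rounding gives 2 (from 1.75), whereas 5 gives 1 (from 1.25).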

okk_bdc_fixed_point_packed_add_C

void okk_bdc_fixed_point_packed_add_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)

Perform addition of the elements of the source tensor and a constant value for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) + C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) + C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to add.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

  • op_type – Operation type.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

okk_bdc_fixed_point_packed_sub

void okk_bdc_fixed_point_packed_sub(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)

Perform subtraction of the elements of the source_0 tensor by the elements of the source_1 tensor for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) - src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) - src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

  • op_type – Operation type.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

okk_bdc_fixed_point_packed_sub_C

void okk_bdc_fixed_point_packed_sub_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)

Perform subtraction of the elements of the source tensor by a constant value for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) - C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) - C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to subtract by.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

  • op_type – Operation type.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

okk_bdc_fixed_point_packed_C_sub

void okk_bdc_fixed_point_packed_C_sub(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)

Perform subtraction of a constant value by the elements of the source tensor for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (C - src(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = C - src(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value from which the source elements are subtracted.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

  • op_type – Operation type.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

okk_bdc_fixed_point_packed_mul

void okk_bdc_fixed_point_packed_mul(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type, int rshift)

Perform multiplication of the elements of the source_0 and source_1 tensors for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) \times src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

  • op_type – Operation type.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

okk_bdc_fixed_point_packed_mul_C

void okk_bdc_fixed_point_packed_mul_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type, int rshift)

Perform multiplication of the elements of the source tensor and a constant value for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src(n, c, h, w) \times C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src(n, c, h, w) \times C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to multiply.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

  • op_type – Operation type.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

okk_bdc_fixed_point_packed_mac

void okk_bdc_fixed_point_packed_mac(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, bool is_origin_dst_signed, op_type_t op_type, int lshift, int rshift)

Perform multiply accumulation of the elements of the source_0 and source_1 tensors for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (dst(n, c, h, w) \times 2^{lshift} + src\_0(n, c, h, w)\times src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = dst(n, c, h, w) \times 2^{lshift} + src\_0(n, c, h, w)\times src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

  • is_origin_dst_signed – Flag for the data type of the original destination tensor: true means int16, false means uint16.

  • op_type – Operation type.

  • lshift – Number of bits by which the original elements of the destination tensor are left-shifted.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks
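As a scalar illustration of the multiply-accumulate formulas above: the original destination element is scaled by 2^lshift, the product of the two sources is added, and the sum is optionally rounded and right-shifted. packed_mac_ref and arith_rsh64 are hypothetical helpers; the lshift bound in the assertion is an arbitrary guard chosen for this sketch, and narrowing the result to the destination type is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: arithmetic right shift (floor division by 2^shift)
 * that avoids implementation-defined behaviour for negative operands. */
static int64_t arith_rsh64(int64_t value, int shift) {
    return value >= 0 ? value >> shift : ~((~value) >> shift);
}

/* Hypothetical scalar reference for one element of the packed
 * multiply-accumulate. A 64-bit accumulator keeps the sketch free of
 * overflow for 16-bit dst and 8- or 16-bit sources. */
int64_t packed_mac_ref(int32_t dst, int32_t src0, int32_t src1,
                       int lshift, int rshift) {
    assert(lshift >= 0 && lshift < 16); /* guard for this sketch only */
    assert(rshift >= 0 && rshift < 31);
    int64_t acc = (int64_t)dst * ((int64_t)1 << lshift)  /* dst * 2^lshift */
                + (int64_t)src0 * (int64_t)src1;         /* src_0 * src_1 */
    if (rshift > 0) {
        acc += (int64_t)1 << (rshift - 1); /* rounding term 2^(rshift - 1) */
        return arith_rsh64(acc, rshift);
    }
    return acc;
}
```

With dst = 3, lshift = 2, src_0 = 4, src_1 = 5 and rshift = 3, the sketch computes (12 + 20 + 4) shifted right by 3, which is 4.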

okk_bdc_fixed_point_packed_mac_C

void okk_bdc_fixed_point_packed_mac_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, bool is_origin_dst_signed, op_type_t op_type, int lshift, int rshift)

Perform multiply accumulation of the elements of the source tensor and a constant value for fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (dst(n, c, h, w) \times 2^{lshift} + src(n, c, h, w)\times C + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = dst(n, c, h, w) \times 2^{lshift} + src(n, c, h, w)\times C\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to multiply.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

  • is_origin_dst_signed – Flag for the data type of the original destination tensor: true means int16, false means uint16.

  • op_type – Operation type.

  • lshift – Number of bits by which the original elements of the destination tensor are left-shifted.

  • rshift – Number of bits by which the result is arithmetically right-shifted.

Remarks

okk_bdc_fixed_point_packed_max

void okk_bdc_fixed_point_packed_max(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type)

Perform maximum operation of the elements of the source_0 and source_1 tensors for fixed-point data type.

\[dst(n, c, h, w) = \max(src\_0(n, c, h, w), src\_1(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

  • op_type – Operation type.

Remarks

okk_bdc_fixed_point_packed_max_C

void okk_bdc_fixed_point_packed_max_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type)

Perform maximum operation of the elements of the source tensor and a constant value for fixed-point data type.

\[dst(n, c, h, w) = \max(src(n, c, h, w), C)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to compare against.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

  • op_type – Operation type.

Remarks

okk_bdc_fixed_point_packed_min

void okk_bdc_fixed_point_packed_min(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride, op_type_t op_type)

Perform minimum operation of the elements of the source_0 and source_1 tensors for fixed-point data type.

\[dst(n, c, h, w) = \min(src\_0(n, c, h, w), src\_1(n, c, h, w))\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

  • op_type – Operation type.

Remarks

okk_bdc_fixed_point_packed_min_C

void okk_bdc_fixed_point_packed_min_C(local_addr_t dst_addr, local_addr_t src_addr, int C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, op_type_t op_type)

Perform minimum operation of the elements of the source tensor and a constant value for fixed-point data type.

\[dst(n, c, h, w) = \min(src(n, c, h, w), C)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to compare against.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

  • op_type – Operation type.

Remarks

okk_bdc_fixed_point_packed_16bit_arithmetic_shift

void okk_bdc_fixed_point_packed_16bit_arithmetic_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform arithmetic shift operation of the elements of the source_0 tensor by the elements of the source_1 tensor for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

  • The data type of the destination, source_0 and source_1 tensors is int16.

  • The destination, source_0 and source_1 tensors are in the 2N-mode.

  • The elements of the source_1 tensor are in [-16, 16]; a positive value performs a left shift and a negative value performs a right shift.

  • The destination, source_0 and source_1 tensors start at the first NPU.

  • dst_addr, src0_addr and src1_addr are divisible by 4, and preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_logical_shift

void okk_bdc_fixed_point_packed_16bit_logical_shift(local_addr_t dst_addr, local_addr_t src0_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_stride, const dim4 *src1_stride)

Perform logical shift operation of the elements of the source_0 tensor by the elements of the source_1 tensor for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src\_0(n, c, h, w)\ \mathbf{LSH}\ src\_1(n, c, h, w)&{\text{if }}src\_1(n, c, h, w)>0,\\src\_0(n, c, h, w)\ \mathbf{RSH}\ -src\_1(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_addr – Address of the source_0 tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0 and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_stride – Pointer to the stride of the source_0 tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

Remarks

  • The data type of the destination and source_0 tensors is uint16, the data type of the source_1 tensor is int16.

  • The destination, source_0 and source_1 tensors are in the 2N-mode.

  • The elements of the source_1 tensor are in [-16, 16]; a positive value performs a left shift and a negative value performs a right shift.

  • The destination, source_0 and source_1 tensors start at the first NPU.

  • dst_addr, src0_addr and src1_addr are divisible by 4, and preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride, src0_stride or src1_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
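The arithmetic and logical variants share the same formula and differ only in how a right shift fills the vacated high bits: the arithmetic form replicates the sign bit of an int16 value, the logical form fills a uint16 value with zeros. The scalar sketch below contrasts the two. It assumes that a left shift keeps only the low 16 bits of the result, which is not stated in this section; shift16_arithmetic, shift16_logical and to_int16 are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret 16 bits as a signed value without
 * relying on implementation-defined narrowing conversions. */
static int16_t to_int16(uint16_t bits) {
    return bits >= 0x8000u ? (int16_t)((int32_t)bits - 0x10000) : (int16_t)bits;
}

/* Hypothetical scalar reference: shift > 0 shifts left, otherwise the
 * value is arithmetically shifted right by -shift (sign bit replicated). */
int16_t shift16_arithmetic(int16_t value, int shift) {
    assert(shift >= -16 && shift <= 16);
    if (shift > 0) {
        uint32_t wide = (uint32_t)(uint16_t)value << shift;
        return to_int16((uint16_t)(wide & 0xFFFFu)); /* assumed: low 16 bits kept */
    }
    int n = -shift;
    int32_t x = value;
    return (int16_t)(x >= 0 ? x >> n : ~((~x) >> n));
}

/* Hypothetical scalar reference: shift > 0 shifts left, otherwise the
 * value is logically shifted right by -shift (zeros shifted in). */
uint16_t shift16_logical(uint16_t value, int shift) {
    assert(shift >= -16 && shift <= 16);
    if (shift > 0) {
        return (uint16_t)(((uint32_t)value << shift) & 0xFFFFu);
    }
    return (uint16_t)((uint32_t)value >> -shift);
}
```

The same bit pattern 0xFFF8 illustrates the difference: read as int16 it is -8 and an arithmetic shift by -1 gives -4, while read as uint16 a logical shift by -1 gives 0x7FFC.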

okk_bdc_fixed_point_packed_16bit_arithmetic_shift_C

void okk_bdc_fixed_point_packed_16bit_arithmetic_shift_C(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform arithmetic shift operation of the elements of the source tensor by a constant value for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to shift by.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the destination and source tensors is int16.

  • The destination and source tensors are in the 2N-mode.

  • The constant value C is in [-16, 16]; a positive value performs a left shift and a negative value performs a right shift.

  • The destination and source tensors start at the first NPU.

  • dst_addr and src_addr are divisible by 4, and preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_logical_shift_C

void okk_bdc_fixed_point_packed_16bit_logical_shift_C(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform logical shift operation of the elements of the source tensor by a constant value for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)\ \mathbf{LSH}\ C&{\text{if }}C>0,\\src(n, c, h, w)\ \mathbf{RSH}\ -C&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to shift by.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the destination and source tensors is uint16.

  • The destination and source tensors are in the 2N-mode.

  • The constant value C is in [-16, 16]; a positive value performs a left shift and a negative value performs a right shift.

  • The destination and source tensors start at the first NPU.

  • dst_addr and src_addr are divisible by 4, and preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_C_arithmetic_shift

void okk_bdc_fixed_point_packed_16bit_C_arithmetic_shift(local_addr_t dst_addr, local_addr_t src_addr, short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform arithmetic shift operation of a constant value by the elements of the source tensor for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to be shifted.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the destination and source tensors is int16.

  • The destination and source tensors are in the 2N-mode.

  • The elements of the source tensor are in [-16, 16]; a positive value performs a left shift and a negative value performs a right shift.

  • The destination and source tensors start at the first NPU.

  • dst_addr and src_addr are divisible by 4, and preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

okk_bdc_fixed_point_packed_16bit_C_logical_shift

void okk_bdc_fixed_point_packed_16bit_C_logical_shift(local_addr_t dst_addr, local_addr_t src_addr, unsigned short C, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Perform logical shift operation of a constant value by the elements of the source tensor for 16-bit data type.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}C\ \mathbf{LSH}\ src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\C\ \mathbf{RSH}\ -src(n, c, h, w)&{\text{otherwise}}.\end{cases}}\end{split}\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • C – Constant value to be shifted.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the destination tensor is uint16, the data type of the source tensor is int16.

  • The destination and source tensors are in the 2N-mode.

  • The elements of the source tensor are in [-16, 16]; a positive element performs a left shift and a negative element performs a right shift.

  • The destination and source tensors start at the first NPU.

  • dst_addr and src_addr must be divisible by 4, and preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
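
The per-element behaviour of the formula above can be illustrated with a scalar model on the host. This is a sketch, not the hardware implementation: logical_shift_ref is a hypothetical name, and it assumes that the result is truncated to 16 bits and that a shift by 16 in either direction yields 0, which the remarks do not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the documented formula for one element:
 * shift the constant C left by s if s > 0, otherwise right by -s.
 * Assumption: the result keeps only the low 16 bits. */
uint16_t logical_shift_ref(uint16_t C, int16_t s) {
    assert(s >= -16 && s <= 16);       /* documented element range */
    uint32_t wide = C;                 /* widen so a shift by 16 is well defined */
    if (s > 0)
        return (uint16_t)(wide << s);  /* left shift, keep the low 16 bits */
    return (uint16_t)(wide >> -s);     /* logical right shift, zero fill */
}
```

For example, under these assumptions logical_shift_ref(0x00FF, 4) gives 0x0FF0 and logical_shift_ref(0x8000, -15) gives 1, since the right shift fills with zeros rather than copying the top bit.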

okk_bdc_fixed_point_packed_16bit_mul_8bit

void okk_bdc_fixed_point_packed_16bit_mul_8bit(local_addr_t dst_addr, local_addr_t src0_high_addr, local_addr_t src0_low_addr, local_addr_t src1_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src0_high_stride, const dim4 *src0_low_stride, const dim4 *src1_stride, mul_type_t mul_type, int rshift)

Perform multiplication of the elements of the source_0 (16-bit) and source_1 (8-bit) tensors for mixed fixed-point data type.

If rshift > 0

\[dst(n, c, h, w) = (src\_0(n, c, h, w) \times src\_1(n, c, h, w) + 2^{rshift - 1}) \ \mathbf{RSH}\ rshift\]

else

\[dst(n, c, h, w) = src\_0(n, c, h, w) \times src\_1(n, c, h, w)\]
Parameters
  • dst_addr – Address of the destination tensor.

  • src0_high_addr – Address of the source_0_high tensor.

  • src0_low_addr – Address of the source_0_low tensor.

  • src1_addr – Address of the source_1 tensor.

  • shape – Pointer to the shape of the destination, source_0_high, source_0_low and source_1 tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src0_high_stride – Pointer to the stride of the source_0_high tensor.

  • src0_low_stride – Pointer to the stride of the source_0_low tensor.

  • src1_stride – Pointer to the stride of the source_1 tensor.

  • mul_type – Multiplication type.

  • rshift – Number of bits by which the result is arithmetically right-shifted.
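
The two formulas above amount to a multiplication followed by a round-half-up right shift. The scalar sketch below illustrates this for one element under stated assumptions: mul_rshift_ref is a hypothetical name, both operands are treated as signed (the actual signedness is selected by mul_type, which is not modelled here), and any narrowing or saturation to the destination data type is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one element: multiply a 16-bit value by an 8-bit value,
 * then add 2^(rshift - 1) and shift right by rshift when rshift > 0.
 * Assumptions: signed operands, and no narrowing of the result. */
int32_t mul_rshift_ref(int16_t src0, int8_t src1, int rshift) {
    assert(rshift >= 0 && rshift <= 30);   /* keeps the rounding term in range */
    int32_t prod = (int32_t)src0 * (int32_t)src1;
    if (rshift > 0) {
        /* >> on a negative int32_t is implementation-defined in C;
         * common compilers perform an arithmetic shift. */
        return (prod + ((int32_t)1 << (rshift - 1))) >> rshift;
    }
    return prod;                           /* rshift == 0: plain product */
}
```

With src0 = 100, src1 = 3 and rshift = 2, the product 300 becomes (300 + 2) >> 2 = 75.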

Remarks

Fixed Point Unary Functions

okk_bdc_fixed_point_packed_16bit_split_high_8bit

void okk_bdc_fixed_point_packed_16bit_split_high_8bit(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Split the most significant 8 bits from the elements of the source tensor of 16-bit data type.

Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • shape – Pointer to the shape of the destination and source tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

Remarks

  • The data type of the source tensor is int16 or uint16, the data type of the destination tensor is int8 or uint8.

  • A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if its data type is int16 or uint16.

  • The destination and source tensors start at the first NPU.

  • dst_addr and src_addr must be divisible by 4, and preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
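
For a single element, taking the most significant 8 bits can be modelled on the host as below. This is a sketch with a hypothetical name, split_high_8bit_ref. It returns the raw bit pattern as an unsigned byte and leaves the int8 or uint8 interpretation to the caller.

```c
#include <stdint.h>

/* Scalar model for one element: keep bits 15..8 of a 16-bit value.
 * The result is the raw byte; whether it is read as int8 or uint8
 * depends on the destination data type. */
uint8_t split_high_8bit_ref(uint16_t x) {
    return (uint8_t)(x >> 8);
}
```

For example, split_high_8bit_ref(0x1234) gives 0x12.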

okk_bdc_fixed_point_packed_16bit_split_low_8bit

void okk_bdc_fixed_point_packed_16bit_split_low_8bit(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, const dim4 *work_stride)

Split the least significant 8 bits from the elements of the source tensor of 16-bit data type.

Parameters
  • dst_addr – Address of the destination tensor.

  • src_addr – Address of the source tensor.

  • work_addr – Address of the work tensor.

  • shape – Pointer to the shape of the destination, source and work tensors.

  • dst_stride – Pointer to the stride of the destination tensor.

  • src_stride – Pointer to the stride of the source tensor.

  • work_stride – Pointer to the stride of the work tensor.

Remarks

  • The data type of the source and work tensors is int16 or uint16, the data type of the destination tensor is int8 or uint8.

  • A tensor is in the 4N-mode if its data type is int8 or uint8, and in the 2N-mode if its data type is int16 or uint16.

  • The destination and source tensors start at the first NPU.

  • dst_addr, src_addr and work_addr must be divisible by 4, and preferably by 128.

  • shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].

  • If dst_stride, src_stride or work_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.

  • The work tensor is a workspace that stores a temporary tensor of the same size as the source tensor; dst_addr = work_addr is not allowed.
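
The sketch below models the least significant 8 bits for one element and checks the address remarks above on the host. Both split_low_8bit_ref and split_low_addrs_ok are hypothetical helpers that are not part of the library, and the local_addr_t definition is a local stand-in for the documented one. The check covers only what the remarks state (divisibility by 4 and dst_addr differing from work_addr); it does not test for partially overlapping buffers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in mirroring the documented definition. */
typedef unsigned int local_addr_t;

/* Scalar model for one element: keep bits 7..0 of a 16-bit value. */
uint8_t split_low_8bit_ref(uint16_t x) {
    return (uint8_t)(x & 0xFFu);
}

/* Hypothetical helper: true if all three addresses are divisible by 4
 * and the destination does not share its address with the workspace. */
bool split_low_addrs_ok(local_addr_t dst_addr, local_addr_t src_addr,
                        local_addr_t work_addr) {
    return dst_addr % 4 == 0 &&
           src_addr % 4 == 0 &&
           work_addr % 4 == 0 &&
           dst_addr != work_addr;
}
```

For example, split_low_8bit_ref(0x1234) gives 0x34, and split_low_addrs_ok(0, 128, 0) is false because the destination and the workspace share an address.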