FP32 Neural Network Functions

okk_bdc_relu

void okk_bdc_relu(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Compute the element-wise ReLU of the source tensor, whose data type is fp32.

\[\begin{split}dst(n, c, h, w) = {\begin{cases}src(n, c, h, w)&{\text{if }}src(n, c, h, w)>0,\\0&{\text{otherwise}}.\end{cases}}\end{split}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 4, and preferably divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
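When checking device results on the host, a plain C reference of the formula above can be handy. The sketch below is a hypothetical helper (`relu_ref` is not part of the library): it works on an ordinary contiguous array of `n * c * h * w` floats and ignores strides, NPU placement and the local-memory layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for okk_bdc_relu semantics:
 * dst[i] = src[i] if src[i] > 0, otherwise 0, over a contiguous buffer
 * of count = n * c * h * w fp32 values. Strides and layout are ignored. */
void relu_ref(float *dst, const float *src, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i] > 0.0f ? src[i] : 0.0f;
    }
}
```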

okk_bdc_bias

void okk_bdc_bias(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t bias_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Add a per-channel bias to the elements of the source tensor.

\[dst(n, c, h, w) = src(n, c, h, w) + bias(0, c, 0, 0)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
bias_addr – Address of the bias tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The bias tensor is in the Compact Layout.
The data type of the destination, source and bias tensors is fp32.
The shape of the bias tensor is [1, shape->c, 1, 1].
The destination, source and bias tensors start at the same NPU.
dst_addr, src_addr and bias_addr are divisible by 4, where dst_addr and src_addr are preferred to be divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
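The per-channel broadcast in the formula above can be expressed as a small host-side reference. This is a sketch under stated assumptions: `bias_ref` is a hypothetical name, the tensors are ordinary contiguous NCHW float arrays (no strides, no NPU placement), and `bias` holds the `c` values of the `[1, c, 1, 1]` bias tensor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for okk_bdc_bias semantics on a
 * contiguous NCHW fp32 array: dst(n, c, h, w) = src(n, c, h, w) + bias[c]. */
void bias_ref(float *dst, const float *src, const float *bias,
              int n, int c, int h, int w) {
    assert(dst != NULL && src != NULL && bias != NULL);
    size_t hw = (size_t)h * (size_t)w;
    for (int ni = 0; ni < n; ++ni) {
        for (int ci = 0; ci < c; ++ci) {
            /* Start of the (ni, ci) plane in the contiguous array. */
            size_t base = ((size_t)ni * (size_t)c + (size_t)ci) * hw;
            for (size_t k = 0; k < hw; ++k) {
                dst[base + k] = src[base + k] + bias[ci];
            }
        }
    }
}
```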

okk_bdc_scale

void okk_bdc_scale(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t scale_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Scale the elements of the source tensor per channel.

\[dst(n, c, h, w) = src(n, c, h, w)\times scale(0, c, 0, 0)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
scale_addr – Address of the scale tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The scale tensor is in the Compact Layout.
The data type of the destination, source and scale tensors is fp32.
The shape of the scale tensor is [1, shape->c, 1, 1].
The destination, source and scale tensors start at the same NPU.
dst_addr, src_addr and scale_addr are divisible by 4, where dst_addr and src_addr are preferred to be divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
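A host-side reference of the per-channel scaling can help when validating results. This is a minimal sketch, assuming contiguous NCHW float arrays with no strides; `scale_ref` is a hypothetical helper, and `scale` holds the `c` values of the `[1, c, 1, 1]` scale tensor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for okk_bdc_scale semantics on a
 * contiguous NCHW fp32 array: dst(n, c, h, w) = src(n, c, h, w) * scale[c]. */
void scale_ref(float *dst, const float *src, const float *scale,
               int n, int c, int h, int w) {
    assert(dst != NULL && src != NULL && scale != NULL);
    size_t hw = (size_t)h * (size_t)w;
    for (int ni = 0; ni < n; ++ni) {
        for (int ci = 0; ci < c; ++ci) {
            /* Start of the (ni, ci) plane in the contiguous array. */
            size_t base = ((size_t)ni * (size_t)c + (size_t)ci) * hw;
            for (size_t k = 0; k < hw; ++k) {
                dst[base + k] = src[base + k] * scale[ci];
            }
        }
    }
}
```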

okk_bdc_scale_bias

void okk_bdc_scale_bias(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t scale_addr, local_addr_t bias_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Scale the elements of the source tensor and add a bias, both per channel.

\[dst(n, c, h, w) = src(n, c, h, w)\times scale(0, c, 0, 0) + bias(0, c, 0, 0)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
scale_addr – Address of the scale tensor.
bias_addr – Address of the bias tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The scale and bias tensors are in the Compact Layout.
The data type of the destination, source, scale and bias tensors is fp32.
The shape of the scale and bias tensors is [1, shape->c, 1, 1].
The destination, source, scale and bias tensors start at the same NPU.
dst_addr, src_addr, scale_addr and bias_addr are divisible by 4, where dst_addr and src_addr are preferred to be divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
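The combined scale-and-bias formula can be mirrored on the host as follows. This is a sketch, assuming contiguous NCHW float arrays; `scale_bias_ref` is a hypothetical helper. The remarks do not say whether the device rounds after the multiply or fuses the two steps, so host and device results may differ in the last bit for inexact values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for okk_bdc_scale_bias semantics on a
 * contiguous NCHW fp32 array:
 * dst(n, c, h, w) = src(n, c, h, w) * scale[c] + bias[c].
 * The compiler may contract the multiply-add into an FMA. */
void scale_bias_ref(float *dst, const float *src, const float *scale,
                    const float *bias, int n, int c, int h, int w) {
    assert(dst != NULL && src != NULL && scale != NULL && bias != NULL);
    size_t hw = (size_t)h * (size_t)w;
    for (int ni = 0; ni < n; ++ni) {
        for (int ci = 0; ci < c; ++ci) {
            /* Start of the (ni, ci) plane in the contiguous array. */
            size_t base = ((size_t)ni * (size_t)c + (size_t)ci) * hw;
            for (size_t k = 0; k < hw; ++k) {
                dst[base + k] = src[base + k] * scale[ci] + bias[ci];
            }
        }
    }
}
```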

okk_bdc_conv2d

void okk_bdc_conv2d(local_addr_t output_addr, local_addr_t input_addr, local_addr_t weight_addr, local_addr_t bias_addr, const dim4 *input_shape, int output_c, int kernel_h, int kernel_w, const dim4 *input_stride, const dim4 *kernel_stride, bool using_bias, bool result_add, const Padding *padding, const dim2 *stride, const dim2 *dilation)

Perform 2D convolution, optionally adding bias and optionally performing result accumulation by addition.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
weight_addr – Address of the weight tensor.
bias_addr – Address of the bias tensor, only used when using_bias = true.
input_shape – Pointer to the shape of the input tensor.
output_c – Channel number of the output tensor.
kernel_h – Height of the convolution kernel.
kernel_w – Width of the convolution kernel.
input_stride – Pointer to the stride of the input tensor.
kernel_stride – Pointer to the stride of the weight tensor.
using_bias – Flag of adding bias.
result_add – Flag of performing result accumulation by addition.
padding – Pointer to the amount of padding applied to the input tensor.
stride – Pointer to the strides for the cross-correlation.
dilation – Pointer to the spacings between the kernel points.

Remarks

The output tensor is in the 128-Byte Aligned Layout, the bias tensor is in the Compact Layout.
The data type of the output, input, weight and bias tensors is fp32.
The weight tensor is in the 2IC-mode.
The output, weight and bias tensors start at the same NPU.
output_addr is divisible by 128, input_addr, weight_addr and bias_addr are divisible by 4.
input_shape->n is in [1, 65535], input_shape->c is in [1, 4095], input_shape->h and input_shape->w are in [1, 2047].
It is required that

input_shape->h + padding->top + padding->bottom <= 2047,

input_shape->w + padding->left + padding->right <= 2047.
The shape of the output tensor is [input_shape->n, output_c, output_h, output_w], where

output_h = (input_shape->h + padding->top + padding->bottom - ((kernel_h - 1) * dilation->h + 1)) / stride->h + 1,

output_w = (input_shape->w + padding->left + padding->right - ((kernel_w - 1) * dilation->w + 1)) / stride->w + 1,

and it is required that output_h <= 2047 and output_w <= 2047.
The shape of the bias tensor is [1, output_c, 1, 1].
padding->top, padding->bottom, padding->left and padding->right are in [0, 15], stride->h and stride->w are in [1, 15], dilation->h and dilation->w are in [1, 15].
If padding is NULL, no padding is applied.
If stride is NULL, the stride defaults to 1.
If dilation is NULL, the dilation defaults to 1.
If input_stride is NULL, the input tensor is in the 128-Byte Aligned Layout.
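The output-size formulas above can be evaluated on the host before reserving local memory for the output. This is a minimal sketch, assuming `/` denotes integer (floor) division on non-negative operands; `conv_out_dim` is a hypothetical helper applied once per axis. A caller passing NULL for `padding`, `stride` or `dilation` would use 0, 1 and 1 respectively. The check that the dilated kernel fits inside the padded input is an extra assumption added here, since otherwise the formula yields no valid output.

```c
#include <assert.h>

/* Output size along one spatial axis, following the Remarks above.
 * Returns -1 if a documented range is violated, or if the dilated kernel
 * is larger than the padded input (extra check added in this sketch). */
int conv_out_dim(int in, int pad_a, int pad_b, int kernel,
                 int dilation, int stride) {
    assert(kernel >= 1);
    if (in < 1 || in > 2047) return -1;
    if (pad_a < 0 || pad_a > 15 || pad_b < 0 || pad_b > 15) return -1;
    if (stride < 1 || stride > 15) return -1;
    if (dilation < 1 || dilation > 15) return -1;
    int padded = in + pad_a + pad_b;
    if (padded > 2047) return -1;
    int eff_kernel = (kernel - 1) * dilation + 1; /* dilated kernel extent */
    if (eff_kernel > padded) return -1;
    int out = (padded - eff_kernel) / stride + 1;
    /* Always true once padded <= 2047; kept to mirror the Remarks. */
    return out <= 2047 ? out : -1;
}
```

For example, a 224-wide input with 3 padding columns on each side, a 7-wide kernel and stride 2 gives `conv_out_dim(224, 3, 3, 7, 1, 2)`, that is (230 - 7) / 2 + 1 = 112.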

okk_bdc_depthwise2d

void okk_bdc_depthwise2d(local_addr_t output_addr, local_addr_t input_addr, local_addr_t weight_addr, local_addr_t bias_addr, const dim4 *input_shape, int kernel_h, int kernel_w, bool using_bias, const Padding *padding, const dim2 *stride, const dim2 *dilation)

Perform 2D depthwise convolution, optionally adding bias.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
weight_addr – Address of the weight tensor.
bias_addr – Address of the bias tensor, only used when using_bias = true.
input_shape – Pointer to the shape of the input tensor.
kernel_h – Height of the convolution kernel.
kernel_w – Width of the convolution kernel.
using_bias – Flag of adding bias.
padding – Pointer to the amount of padding applied to the input tensor.
stride – Pointer to the strides for the cross-correlation.
dilation – Pointer to the spacings between the kernel points.

Remarks

The output and input tensors are in the 128-Byte Aligned Layout, the weight and bias tensors are in the Compact Layout.
The data type of the output, input, weight and bias tensors is fp32.
The output, input, weight and bias tensors start at the same NPU.
output_addr and input_addr are divisible by 128, weight_addr and bias_addr are divisible by 4.
input_shape->n is in [1, 65535], input_shape->c is in [1, 4095], input_shape->h and input_shape->w are in [1, 2047].
It is required that

input_shape->h + padding->top + padding->bottom <= 2047,

input_shape->w + padding->left + padding->right <= 2047.
The shape of the output tensor is [input_shape->n, input_shape->c, output_h, output_w], where

output_h = (input_shape->h + padding->top + padding->bottom - ((kernel_h - 1) * dilation->h + 1)) / stride->h + 1,

output_w = (input_shape->w + padding->left + padding->right - ((kernel_w - 1) * dilation->w + 1)) / stride->w + 1,

and it is required that output_h <= 2047 and output_w <= 2047.
The shape of the weight tensor is [1, input_shape->c, kernel_h, kernel_w], the shape of the bias tensor is [1, input_shape->c, 1, 1].
padding->top, padding->bottom, padding->left and padding->right are in [0, 15], stride->h and stride->w are in [1, 15], dilation->h and dilation->w are in [1, 15].
If padding is NULL, no padding is applied.
If stride is NULL, the stride defaults to 1.
If dilation is NULL, the dilation defaults to 1.
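In a depthwise convolution each channel is cross-correlated with its own `kernel_h` x `kernel_w` filter, which the host-side sketch below shows for a single `(n, c)` plane. Assumptions: `depthwise_plane_ref` is a hypothetical helper, planes and the per-channel filter are contiguous row-major float arrays, `out_h`/`out_w` come from the formulas above, and padded positions contribute zero (the remarks do not state the implicit padding value for this function). Pass `bias = 0.0f` to model `using_bias = false`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for one (n, c) plane of
 * okk_bdc_depthwise2d: cross-correlation (no kernel flip) with one
 * kernel_h x kernel_w filter. ASSUMPTION: padded positions count as 0. */
void depthwise_plane_ref(float *out, const float *in, const float *weight,
                         float bias, int in_h, int in_w,
                         int kernel_h, int kernel_w,
                         int pad_top, int pad_left,
                         int out_h, int out_w,
                         int stride_h, int stride_w,
                         int dilation_h, int dilation_w) {
    assert(out != NULL && in != NULL && weight != NULL);
    for (int oh = 0; oh < out_h; ++oh) {
        for (int ow = 0; ow < out_w; ++ow) {
            float acc = bias;
            for (int kh = 0; kh < kernel_h; ++kh) {
                for (int kw = 0; kw < kernel_w; ++kw) {
                    /* Input coordinate in the unpadded plane. */
                    int ih = oh * stride_h + kh * dilation_h - pad_top;
                    int iw = ow * stride_w + kw * dilation_w - pad_left;
                    if (ih < 0 || ih >= in_h || iw < 0 || iw >= in_w) {
                        continue; /* padded position: contributes 0 */
                    }
                    acc += in[ih * in_w + iw] * weight[kh * kernel_w + kw];
                }
            }
            out[oh * out_w + ow] = acc;
        }
    }
}
```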

okk_bdc_avg_pool2d

void okk_bdc_avg_pool2d(local_addr_t output_addr, local_addr_t input_addr, const dim4 *input_shape, int kernel_h, int kernel_w, const Padding *padding, const dim2 *stride)

Perform 2D average pooling.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
input_shape – Pointer to the shape of the input tensor.
kernel_h – Height of the pooling kernel.
kernel_w – Width of the pooling kernel.
padding – Pointer to the amount of padding applied to the input tensor.
stride – Pointer to the strides of the pooling window.

Remarks

The output and input tensors are in the 128-Byte Aligned Layout.
The data type of the output and input tensors is fp32.
The output and input tensors start at the same NPU.
output_addr and input_addr are divisible by 128.
input_shape->n is in [1, 65535], input_shape->c is in [1, 4095], input_shape->h and input_shape->w are in [1, 2047].
It is required that

input_shape->h + padding->top + padding->bottom <= 2047,

input_shape->w + padding->left + padding->right <= 2047.
The shape of the output tensor is [input_shape->n, input_shape->c, output_h, output_w], where

output_h = (input_shape->h + padding->top + padding->bottom - kernel_h) / stride->h + 1,

output_w = (input_shape->w + padding->left + padding->right - kernel_w) / stride->w + 1,

and it is required that output_h <= 2047 and output_w <= 2047.
padding->top, padding->bottom, padding->left and padding->right are in [0, 15], stride->h and stride->w are in [1, 15].
If padding is NULL, no padding is applied.
If stride is NULL, the stride defaults to 1.
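The pooling output size uses the same formula as convolution without dilation, so it can be precomputed per axis on the host. This is a minimal sketch, assuming `/` is integer (floor) division; `pool_out_dim` is a hypothetical helper, and a caller passing NULL for `padding` or `stride` would use 0 or 1. Rejecting a kernel larger than the padded input is an extra check added here.

```c
#include <assert.h>

/* Output size along one axis for okk_bdc_avg_pool2d, following the
 * Remarks above. Returns -1 if a documented range is violated, or if the
 * kernel is larger than the padded input (extra check in this sketch). */
int pool_out_dim(int in, int pad_a, int pad_b, int kernel, int stride) {
    assert(kernel >= 1);
    if (in < 1 || in > 2047) return -1;
    if (pad_a < 0 || pad_a > 15 || pad_b < 0 || pad_b > 15) return -1;
    if (stride < 1 || stride > 15) return -1;
    int padded = in + pad_a + pad_b;
    if (padded > 2047 || kernel > padded) return -1;
    return (padded - kernel) / stride + 1;
}
```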

okk_bdc_max_pool2d

void okk_bdc_max_pool2d(local_addr_t output_addr, local_addr_t input_addr, const dim4 *input_shape, int kernel_h, int kernel_w, const Padding *padding, const dim2 *stride)

Perform 2D max pooling.

Parameters

output_addr – Address of the output tensor.
input_addr – Address of the input tensor.
input_shape – Pointer to the shape of the input tensor.
kernel_h – Height of the pooling kernel.
kernel_w – Width of the pooling kernel.
padding – Pointer to the amount of padding applied to the input tensor.
stride – Pointer to the strides of the pooling window.

Remarks

The output and input tensors are in the 128-Byte Aligned Layout.
The data type of the output and input tensors is fp32.
The output and input tensors start at the same NPU.
output_addr and input_addr are divisible by 128.
input_shape->n is in [1, 65535], input_shape->c is in [1, 4095], input_shape->h and input_shape->w are in [1, 2047].
It is required that

input_shape->h + padding->top + padding->bottom <= 2047,

input_shape->w + padding->left + padding->right <= 2047.
The shape of the output tensor is [input_shape->n, input_shape->c, output_h, output_w], where

output_h = (input_shape->h + padding->top + padding->bottom - kernel_h) / stride->h + 1,

output_w = (input_shape->w + padding->left + padding->right - kernel_w) / stride->w + 1,

and it is required that output_h <= 2047 and output_w <= 2047.
padding->top, padding->bottom, padding->left and padding->right are in [0, 15], stride->h and stride->w are in [1, 15].
If padding is NULL, no padding is applied.
If stride is NULL, the stride defaults to 1.
The implicit padding value is -3.4028234663852886E38 (0xff7fffff), the most negative finite fp32 value (-FLT_MAX).
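Because padded positions hold -FLT_MAX, they never exceed any finite input, so an all-negative window near the border still yields its true maximum rather than 0. The host-side sketch below shows this for one `(n, c)` plane. Assumptions: `max_pool_plane_ref` is a hypothetical helper, planes are contiguous row-major float arrays, and `out_h`/`out_w` come from the formulas above.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical host-side reference for one (n, c) plane of
 * okk_bdc_max_pool2d. Padded positions take the documented implicit
 * padding value -FLT_MAX (0xff7fffff). */
void max_pool_plane_ref(float *out, const float *in, int in_h, int in_w,
                        int kernel_h, int kernel_w,
                        int pad_top, int pad_left,
                        int out_h, int out_w, int stride_h, int stride_w) {
    assert(out != NULL && in != NULL);
    for (int oh = 0; oh < out_h; ++oh) {
        for (int ow = 0; ow < out_w; ++ow) {
            float acc = -FLT_MAX;
            for (int kh = 0; kh < kernel_h; ++kh) {
                for (int kw = 0; kw < kernel_w; ++kw) {
                    int ih = oh * stride_h + kh - pad_top;
                    int iw = ow * stride_w + kw - pad_left;
                    int inside = ih >= 0 && ih < in_h && iw >= 0 && iw < in_w;
                    float v = inside ? in[ih * in_w + iw] : -FLT_MAX;
                    if (v > acc) acc = v;
                }
            }
            out[oh * out_w + ow] = acc;
        }
    }
}
```

A window that lies entirely in the padding (possible when a padding amount is at least the kernel size) produces -FLT_MAX.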

okk_bdc_matmul

void okk_bdc_matmul(local_addr_t output_addr, local_addr_t left_addr, local_addr_t right_addr, local_addr_t bias_addr, int left_rows, int left_cols, int right_cols, int left_cols_per_channel, int right_cols_per_channel, bool using_bias, bool result_add)

Perform matrix multiplication, optionally adding bias and optionally performing result accumulation by addition.

Parameters

output_addr – Address of the output tensor.
left_addr – Address of the left matrix tensor.
right_addr – Address of the right matrix tensor.
bias_addr – Address of the bias tensor, only used when using_bias = true.
left_rows – Number of the rows of the left matrix.
left_cols – Number of the columns of the left matrix.
right_cols – Number of the columns of the right matrix.
left_cols_per_channel – Number of the columns of the left matrix per channel.
right_cols_per_channel – Number of the columns of the right matrix per channel.
using_bias – Flag of adding bias.
result_add – Flag of performing result accumulation by addition.

Remarks

The output, left matrix, right matrix and bias tensors are in the matrix layout.
The data type of the output, left matrix, right matrix and bias tensors is fp32.
The output, right matrix and bias tensors start at the same NPU.
output_addr, left_addr, right_addr and bias_addr are divisible by 128.
The bias is a 1-by-right_cols matrix.
left_cols_per_channel is in [1, min(128, left_cols)], left_rows is in [1, 65535], and right_cols_per_channel is in [1, min(128, right_cols)].
It is required that ceil(left_cols / left_cols_per_channel) <= 4095 and ceil(right_cols / right_cols_per_channel) <= 4095.
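The range constraints above can be checked on the host before issuing the call. This is a sketch with hypothetical names (`matmul_params_ok`, `ceil_div`); it interprets `ceil(cols / cols_per_channel)` as the number of channels each matrix is split across, which is an assumption since the remarks only state the bound. The ceiling is computed without the `(a + b - 1) / b` idiom to avoid overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Integer ceiling division for positive operands. */
static int ceil_div(int a, int b) {
    assert(a > 0 && b > 0);
    return a / b + (a % b != 0);
}

/* Hypothetical helper: checks the documented okk_bdc_matmul ranges and,
 * on success, reports ceil(cols / cols_per_channel) for each matrix. */
bool matmul_params_ok(int left_rows, int left_cols, int right_cols,
                      int left_cols_per_channel, int right_cols_per_channel,
                      int *left_channels, int *right_channels) {
    if (left_rows < 1 || left_rows > 65535) return false;
    if (left_cols < 1 || right_cols < 1) return false;
    int lmax = left_cols < 128 ? left_cols : 128;   /* min(128, left_cols) */
    int rmax = right_cols < 128 ? right_cols : 128; /* min(128, right_cols) */
    if (left_cols_per_channel < 1 || left_cols_per_channel > lmax) return false;
    if (right_cols_per_channel < 1 || right_cols_per_channel > rmax) return false;
    int lc = ceil_div(left_cols, left_cols_per_channel);
    int rc = ceil_div(right_cols, right_cols_per_channel);
    if (lc > 4095 || rc > 4095) return false;
    if (left_channels != NULL) *left_channels = lc;
    if (right_channels != NULL) *right_channels = rc;
    return true;
}
```

For instance, a 64 x 1000 left matrix with 128 columns per channel spans ceil(1000 / 128) = 8 channels.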