FP32 Unary Functions

okk_bdc_rsqrt

void okk_bdc_rsqrt(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Calculate the reciprocal of the square root of each element of the source tensor.

\[dst(n, c, h, w) = \frac{1}{\sqrt{src(n, c, h, w)}}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].

okk_bdc_sqrt

void okk_bdc_sqrt(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Calculate the square root of each element of the source tensor.

\[dst(n, c, h, w) = \sqrt{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].

okk_bdc_taylor_exp

void okk_bdc_taylor_exp(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, int num_series)

Calculate the exponential of each element of the source tensor using a Taylor series expansion.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
num_series – Number of terms in the Taylor series expansion.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
num_series is in [1, 64]; it trades performance for accuracy, since more terms take longer but reduce the truncation error.
This function is only suitable when the absolute values of the elements of the source tensor are small, at least less than one, because the error of a truncated Taylor series grows rapidly with the magnitude of the input.
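The following sketch shows a truncated Taylor series for e^x and how its error depends on both the term count and the input magnitude. It is a hypothetical host model (taylor_exp_ref is not part of the API), and it assumes num_series counts terms including the constant term, which the parameter description does not state explicitly.

```c
#include <assert.h>
#include <math.h>

/* Truncated Taylor series: e^x ~= sum_{k=0}^{num_series-1} x^k / k!
 * ASSUMPTION: num_series is the number of terms, constant term included. */
float taylor_exp_ref(float x, int num_series)
{
    assert(num_series >= 1 && num_series <= 64);  /* documented range */
    float term = 1.0f;  /* x^0 / 0! */
    float sum = 1.0f;
    for (int k = 1; k < num_series; ++k) {
        term *= x / (float)k;  /* x^k / k! derived from the previous term */
        sum += term;
    }
    return sum;
}
```

With x = 0.5 and 16 terms the truncation error is far below fp32 precision, whereas 8 terms at x = 8 give about 1350 instead of e^8 ≈ 2981, which illustrates why the input magnitude should stay small.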

okk_bdc_lookup_exp

void okk_bdc_lookup_exp(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Calculate the exponential of each element of the source tensor using a lookup table.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.

Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the source tensor is int32, and the data type of the destination tensor is fp32.
The elements of the source tensor are in [-103, 88].
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].

okk_bdc_exp

void okk_bdc_exp(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)

Calculate the exponential of each element of the source tensor.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
The work tensor is a workspace for a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.

okk_bdc_exp_tunable

void okk_bdc_exp_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)

Calculate the exponential of each element of the source tensor with a tunable number of Taylor series terms.

\[dst(n, c, h, w) = e^{src(n, c, h, w)}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
num_series – Number of terms in the Taylor series expansion.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
num_series is in [1, 64]; it trades performance for accuracy, since more terms take longer but reduce the truncation error.
The work tensor is a workspace for a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.
okk_bdc_exp() is equivalent to okk_bdc_exp_tunable() with num_series = 32.
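Suppose, as an assumption this page does not confirm, that the tunable exponential applies its Taylor series only to a fractional part f in [0, 1). The Lagrange remainder after N terms (k = 0 .. N-1) is then at most e/N!, which gives a rough way to pick num_series. The helper min_taylor_terms below is hypothetical and only evaluates that bound.

```c
#include <assert.h>

/* Hypothetical helper: smallest number of Taylor terms N (k = 0..N-1) such
 * that the remainder bound e / N! on [0, 1) is below tol. Returns -1 if no
 * N in the documented num_series range [1, 64] suffices. */
int min_taylor_terms(double tol)
{
    assert(tol > 0.0);
    const double e = 2.718281828459045;
    double n_fact = 1.0;  /* running N! */
    for (int n = 1; n <= 64; ++n) {
        n_fact *= n;
        if (e / n_fact < tol)
            return n;
    }
    return -1;
}
```

For the fp32 machine epsilon (about 1.19e-7) this returns 11, so under that assumption the default of 32 terms leaves a wide margin; the bound ignores rounding error, which also limits the achievable accuracy.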

okk_bdc_sigmoid

void okk_bdc_sigmoid(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)

Calculate the sigmoid of each element of the source tensor.

\[dst(n, c, h, w) = \text{sigmoid}(src(n, c, h, w)) = \frac{1}{1 + e^{-src(n, c, h, w)}}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
The work tensor is a workspace for a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.

okk_bdc_sigmoid_tunable

void okk_bdc_sigmoid_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)

Calculate the sigmoid of each element of the source tensor with a tunable number of Taylor series terms.

\[dst(n, c, h, w) = \text{sigmoid}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
num_series – Number of terms in the Taylor series expansion.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
num_series is in [1, 64]; it trades performance for accuracy, since more terms take longer but reduce the truncation error.
The work tensor is a workspace for a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.
okk_bdc_sigmoid() is equivalent to okk_bdc_sigmoid_tunable() with num_series = 32.

okk_bdc_tanh

void okk_bdc_tanh(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape)

Calculate the hyperbolic tangent of each element of the source tensor.

\[dst(n, c, h, w) = \tanh(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
The work tensor is a workspace for a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.

okk_bdc_tanh_tunable

void okk_bdc_tanh_tunable(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, int num_series)

Calculate the hyperbolic tangent of each element of the source tensor with a tunable number of Taylor series terms.

\[dst(n, c, h, w) = \tanh(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
num_series – Number of terms in the Taylor series expansion.

Remarks

The destination, source and work tensors are in the 128-Byte Aligned Layout.
The data type of the destination, source and work tensors is fp32.
The elements of the source tensor are in [-103.0, 88.0].
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
num_series is in [1, 64]; it trades performance for accuracy, since more terms take longer but reduce the truncation error.
The work tensor is a workspace for a temporary tensor of the same size as the source tensor; work_addr must not equal dst_addr or src_addr.
okk_bdc_tanh() is equivalent to okk_bdc_tanh_tunable() with num_series = 32.

okk_bdc_reciprocal

void okk_bdc_reciprocal(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Calculate the reciprocal of each element of the source tensor (fp32).

\[dst(n, c, h, w) = src(n, c, h, w)^{-1}\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4; divisibility by 128 is preferred.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
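The stride parameters let okk_bdc_reciprocal read and write non-contiguous views, with element (n, c, h, w) located at n*stride.n + c*stride.c + h*stride.h + w*stride.w. The host sketch below models that addressing under two assumptions: strides are counted in elements (the remarks do not say whether they are elements or bytes), and the distribution of channels across NPUs is ignored. The name strided_reciprocal_ref is hypothetical, and unlike the API it requires explicit, non-negative strides rather than accepting NULL.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int n, c, h, w; } dim4;  /* ASSUMPTION: mirrors shape/stride fields */

/* Offset of element (n, c, h, w); ASSUMPTION: non-negative strides in elements. */
static size_t offset_of(const dim4 *stride, int n, int c, int h, int w)
{
    return (size_t)n * stride->n + (size_t)c * stride->c +
           (size_t)h * stride->h + (size_t)w * stride->w;
}

/* Hypothetical host reference: dst = 1 / src element-wise over strided views. */
void strided_reciprocal_ref(float *dst, const float *src, const dim4 *shape,
                            const dim4 *dst_stride, const dim4 *src_stride)
{
    assert(shape && dst_stride && src_stride);
    for (int n = 0; n < shape->n; ++n)
        for (int c = 0; c < shape->c; ++c)
            for (int h = 0; h < shape->h; ++h)
                for (int w = 0; w < shape->w; ++w)
                    dst[offset_of(dst_stride, n, c, h, w)] =
                        1.0f / src[offset_of(src_stride, n, c, h, w)];
}
```

For example, a source stride with w = 2 reads every other element of each row, while a dense destination stride packs the results contiguously.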

okk_bdc_neg

void okk_bdc_neg(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride)

Calculate the negation of each element of the source tensor (fp32).

\[dst(n, c, h, w) = -src(n, c, h, w)\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
dst_stride – Pointer to the stride of the destination tensor.
src_stride – Pointer to the stride of the source tensor.

Remarks

The data type of the destination and source tensors is fp32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr must be divisible by 4; divisibility by 128 is preferred.
shape->n, shape->h and shape->w are in [1, 65535], and shape->c is in [1, 4095].
If dst_stride or src_stride is NULL, the corresponding tensor is in the 128-Byte Aligned Layout.
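Under IEEE 754, negating an fp32 value only flips its sign bit, so the negation of +0.0 is -0.0. The host reference below makes this explicit; neg_ref and neg_ref_buffer are hypothetical names, and the remarks above do not state whether the device preserves signed zeros or NaN payloads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host reference: negate an fp32 value by flipping bit 31. */
float neg_ref(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined type punning */
    bits ^= UINT32_C(0x80000000);     /* flip the IEEE 754 sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Apply neg_ref to a dense buffer of count elements. */
void neg_ref_buffer(float *dst, const float *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < count; ++i)
        dst[i] = neg_ref(src[i]);
}
```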