Data Type Converting Functions

okk_bdc_fp32_to_int32

void okk_bdc_fp32_to_int32(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Convert the elements of the source tensor from fp32 to int32.

\[dst(n, c, h, w) = \mathbf{INT32}(src(n, c, h, w))\]

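Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
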
Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the source tensor is fp32, and the data type of the destination tensor is int32.
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
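
The address and shape constraints in the remarks above can be checked on the host before the call is issued. The following is a minimal sketch, not part of the SDK: the helper name `fp32_to_int32_args_ok` is hypothetical, and the `local_addr_t` and `dim4` definitions are stand-ins assumed here so the snippet compiles on its own (the real ones come from the OKKernel headers). The "same NPU" condition is not checked, because it depends on the per-NPU local memory size, which this section does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in types, assumed for illustration only. */
typedef uint32_t local_addr_t;
typedef struct { int n, c, h, w; } dim4;

/* True when v lies in the closed interval [lo, hi]. */
bool in_closed_range(int v, int lo, int hi)
{
    return v >= lo && v <= hi;
}

/* Hypothetical helper: returns true when the arguments satisfy the
 * documented remarks (addresses divisible by 128, shape fields in range). */
bool fp32_to_int32_args_ok(local_addr_t dst_addr, local_addr_t src_addr,
                           const dim4 *shape)
{
    if (dst_addr % 128 != 0 || src_addr % 128 != 0)
        return false;
    return in_closed_range(shape->n, 1, 65535) &&
           in_closed_range(shape->c, 1, 4095) &&
           in_closed_range(shape->h, 1, 65535) &&
           in_closed_range(shape->w, 1, 65535);
}
```

A caller might run this check in debug builds just before okk_bdc_fp32_to_int32 and report an error when it returns false.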

okk_bdc_lookup_int32_to_fp32

void okk_bdc_lookup_int32_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape)

Convert the elements of the source tensor from int32 to fp32 by lookup table.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]

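Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
shape – Pointer to the shape of the destination and source tensors.
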
Remarks

The destination and source tensors are in the 128-Byte Aligned Layout.
The data type of the source tensor is int32, and the data type of the destination tensor is fp32.
The elements of the source tensor are in [-128, 127].
The destination and source tensors start at the same NPU.
dst_addr and src_addr are divisible by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
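
Because the lookup-table conversion only accepts source elements in [-128, 127], it can be useful to validate the data on the host before it is loaded into local memory. The sketch below assumes the source data is available as a plain `int32_t` array; the function name `lookup_src_in_range` is hypothetical and not part of the SDK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if every element lies in [-128, 127],
 * the range stated in the remarks for the lookup-table conversion. */
bool lookup_src_in_range(const int32_t *src, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (src[i] < -128 || src[i] > 127)
            return false;
    }
    return true;
}
```

If the check fails, the data falls outside the range this function documents, and a conversion without that restriction would be the safer choice.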

okk_bdc_4N_int8_to_fp32

void okk_bdc_4N_int8_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, bool is_signed, bool is_aligned_layout)

Convert the elements of the source tensor from int8 or uint8 to fp32.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
is_signed – Flag for the data type of the source tensor: true means int8, false means uint8.
is_aligned_layout – Flag for the layout of the destination, source and work tensors: true means 128-Byte Aligned Layout, false means Compact Layout.

Remarks

The destination, source and work tensors are either all in the 128-Byte Aligned Layout or all in the Compact Layout.
The data type of the source and work tensors is int8 or uint8, and the data type of the destination tensor is fp32.
The source and work tensors are in the 4N-mode.
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
The work tensor is a workspace that holds a temporary tensor of the same size as the source tensor; dst_addr = work_addr is not allowed.
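
The effect of is_signed on a single element can be illustrated with a host-side reference. This is a sketch of the expected per-element result only, assuming the usual two's-complement reading of int8; it does not model the 4N-mode storage or the work tensor, and the function name `byte_to_fp32` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side reference for one element: interpret the raw byte
 * as int8 (two's complement) when is_signed is true, otherwise as uint8,
 * then convert the value to fp32. */
float byte_to_fp32(uint8_t raw, bool is_signed)
{
    int value = raw;
    if (is_signed && raw >= 128)
        value -= 256; /* map 128..255 to -128..-1 */
    return (float)value;
}
```

The same byte therefore yields different results depending on the flag: 0xFF is read as -1 when is_signed is true and as 255 otherwise.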

okk_bdc_int8_to_fp32

void okk_bdc_int8_to_fp32(local_addr_t dst_addr, local_addr_t src_addr, local_addr_t work_addr, const dim4 *shape, bool is_signed, bool is_aligned_layout)

Convert the elements of the source tensor from int8 or uint8 to fp32.

\[dst(n, c, h, w) = \mathbf{FP32}(src(n, c, h, w))\]

Parameters

dst_addr – Address of the destination tensor.
src_addr – Address of the source tensor.
work_addr – Address of the work tensor.
shape – Pointer to the shape of the destination, source and work tensors.
is_signed – Flag for the data type of the source tensor: true means int8, false means uint8.
is_aligned_layout – Flag for the layout of the destination, source and work tensors: true means 128-Byte Aligned Layout, false means Compact Layout.

Remarks

The destination, source and work tensors are either all in the 128-Byte Aligned Layout or all in the Compact Layout.
The data type of the source and work tensors is int8 or uint8, and the data type of the destination tensor is fp32.
The destination, source and work tensors start at the same NPU.
dst_addr, src_addr and work_addr must be divisible by 4, and preferably by 128.
shape->n, shape->h and shape->w are in [1, 65535], shape->c is in [1, 4095].
The work tensor is a workspace that holds a temporary tensor of the same size as the source tensor; dst_addr = work_addr is not allowed.
If the source and work tensors are in the Compact Layout, one further restriction applies: the C stride must be ALIGN(shape->h * shape->w, 4) rather than shape->h * shape->w, so the source and work tensors are in an approximate Compact Layout.
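
The C stride required for the approximate Compact Layout can be computed as follows. This sketch assumes that ALIGN(x, a) rounds x up to the nearest multiple of a and that the stride is counted in elements; the macro `ALIGN_UP`, the function name `approx_compact_c_stride` and the stand-in `dim4` definition are introduced here for illustration and are not taken from the SDK.

```c
#include <stdint.h>

/* Stand-in type, assumed for illustration only. */
typedef struct { int n, c, h, w; } dim4;

/* Round x up to the nearest multiple of a (assumed meaning of ALIGN). */
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Hypothetical helper: C stride of the source and work tensors in the
 * approximate Compact Layout. 64-bit arithmetic is used because h * w
 * can exceed the range of a 32-bit int. */
uint64_t approx_compact_c_stride(const dim4 *shape)
{
    uint64_t hw = (uint64_t)shape->h * (uint64_t)shape->w;
    return ALIGN_UP(hw, (uint64_t)4);
}
```

For a 3 x 3 feature map, h * w is 9, so the C stride under this assumption is 12 rather than 9; when h * w is already a multiple of 4, the stride matches the plain Compact Layout.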