GDMA Operations

tpu_gdma_general_cpy_S2L

Copies the elements of a tensor from system memory to local memory.

void tpu_gdma_general_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *dst_shape, const dim4 *src_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, c, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

dst_shape – pointer to the shape of dst

src_shape – pointer to the shape of src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

dst_shape and src_shape must describe the same number of elements.

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the continuous layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).
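The w-stride limit above is simple arithmetic, and a small host-side check can make it concrete. The sketch below is illustrative only: ref_max_w_stride and ref_w_stride_ok are hypothetical helper names, not part of the TPU kernel API, and the element size is passed in bytes in place of tpu_data_type_size(dtype). The lower bound of 1 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest w stride (in elements) allowed for an
   element of elem_size bytes, following the documented rule
   stride->w <= 128 / tpu_data_type_size(dtype). */
int ref_max_w_stride(size_t elem_size) {
    assert(elem_size > 0 && elem_size <= 128);
    return (int)(128 / elem_size);
}

/* Returns 1 if w_stride respects the limit, 0 otherwise.
   Requiring w_stride >= 1 is an assumption of this sketch. */
int ref_w_stride_ok(int w_stride, size_t elem_size) {
    return w_stride >= 1 && w_stride <= ref_max_w_stride(elem_size);
}
```

For a 4-byte element type the limit works out to 32, and for a 1-byte type to 128.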

tpu_gdma_general_cpy_L2S

Copies the elements of a tensor from local memory to system memory.

void tpu_gdma_general_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *dst_shape, const dim4 *src_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, c, h, w)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

dst_shape – pointer to the shape of dst

src_shape – pointer to the shape of src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

dst_shape and src_shape must describe the same number of elements.

If dst_stride is NULL, dst uses the continuous layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the 64-byte aligned layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_cpy_S2L

Copies the elements of a tensor from system memory to local memory.

void tpu_gdma_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, c, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the continuous layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_cpy_L2S

Copies the elements of a tensor from local memory to system memory.

void tpu_gdma_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, c, h, w)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

If dst_stride is NULL, dst uses the continuous layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the 64-byte aligned layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_cpy_L2L

Copies the elements of a tensor from local memory to local memory.

void tpu_gdma_cpy_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, c, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in local memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the 64-byte aligned layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_cpy_S2S

Copies the elements of a tensor from system memory to system memory.

void tpu_gdma_cpy_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, c, h, w)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in system memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

If dst_stride is NULL, dst uses the continuous layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the continuous layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).
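To make the stride semantics of the copy functions concrete, the following host-side reference model mimics dst(n, c, h, w) = src(n, c, h, w) on plain C arrays. It is a sketch under stated assumptions, not the device implementation: ref_dim4 is a stand-in for dim4, ref_cpy_S2S and ref_continuous_stride are hypothetical names, elements are float, strides are counted in elements, and a NULL stride is taken to mean the continuous layout as in the S2S case.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SDK's dim4 (assumed to have n, c, h, w fields). */
typedef struct { int n, c, h, w; } ref_dim4;

/* Strides, in elements, of a densely packed (continuous) tensor. */
ref_dim4 ref_continuous_stride(const ref_dim4 *s) {
    ref_dim4 st = { s->c * s->h * s->w, s->h * s->w, s->w, 1 };
    return st;
}

/* Host model of dst(n,c,h,w) = src(n,c,h,w).
   A NULL stride selects the continuous layout. */
void ref_cpy_S2S(float *dst, const float *src, const ref_dim4 *shape,
                 const ref_dim4 *dst_stride, const ref_dim4 *src_stride) {
    assert(dst != NULL && src != NULL && shape != NULL);
    ref_dim4 ds = dst_stride ? *dst_stride : ref_continuous_stride(shape);
    ref_dim4 ss = src_stride ? *src_stride : ref_continuous_stride(shape);
    for (int n = 0; n < shape->n; ++n)
        for (int c = 0; c < shape->c; ++c)
            for (int h = 0; h < shape->h; ++h)
                for (int w = 0; w < shape->w; ++w)
                    dst[n * ds.n + c * ds.c + h * ds.h + w * ds.w] =
                        src[n * ss.n + c * ss.c + h * ss.h + w * ss.w];
}
```

Passing a source stride whose h component is larger than shape->w models reading rows out of a padded buffer into a dense destination.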

tpu_gdma_cpy_nc_trans_S2L

Copies the elements of a tensor from system memory to local memory, transposing the N and C dimensions.

void tpu_gdma_cpy_nc_trans_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *dst_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(c, n, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

dst_shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->c, dst_shape->n, dst_shape->h, dst_shape->w].

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the continuous layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_cpy_nc_trans_L2S

Copies the elements of a tensor from local memory to system memory, transposing the N and C dimensions.

void tpu_gdma_cpy_nc_trans_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *dst_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(c, n, h, w)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

dst_shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->c, dst_shape->n, dst_shape->h, dst_shape->w].

If dst_stride is NULL, dst uses the continuous layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the 64-byte aligned layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_cpy_nc_trans_L2L

Copies the elements of a tensor from local memory to local memory, transposing the N and C dimensions.

void tpu_gdma_cpy_nc_trans_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *dst_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(c, n, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in local memory

dst_shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->c, dst_shape->n, dst_shape->h, dst_shape->w].

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the 64-byte aligned layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_cpy_nc_trans_S2S

Copies the elements of a tensor from system memory to system memory, transposing the N and C dimensions.

void tpu_gdma_cpy_nc_trans_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *dst_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(c, n, h, w)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in system memory

dst_shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->c, dst_shape->n, dst_shape->h, dst_shape->w].

If dst_stride is NULL, dst uses the continuous layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the continuous layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).
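The N/C transpose dst(n, c, h, w) = src(c, n, h, w) can be checked against a small host-side model. The sketch below assumes both tensors are in the continuous layout (both strides NULL) and hold float elements; ref_dim4 and ref_nc_trans are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SDK's dim4 (assumed to have n, c, h, w fields). */
typedef struct { int n, c, h, w; } ref_dim4;

/* Host model of dst(n,c,h,w) = src(c,n,h,w) on densely packed arrays.
   src has shape [dst_shape->c, dst_shape->n, dst_shape->h, dst_shape->w]. */
void ref_nc_trans(float *dst, const float *src, const ref_dim4 *dst_shape) {
    assert(dst != NULL && src != NULL && dst_shape != NULL);
    const int N = dst_shape->n, C = dst_shape->c;
    const int H = dst_shape->h, W = dst_shape->w;
    for (int n = 0; n < N; ++n)
        for (int c = 0; c < C; ++c)
            for (int h = 0; h < H; ++h)
                for (int w = 0; w < W; ++w)
                    dst[((n * C + c) * H + h) * W + w] =
                        src[((c * N + n) * H + h) * W + w];
}
```

Note that the loop bounds come from dst_shape, while the source index swaps the roles of n and c.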

tpu_gdma_cpy_cw_trans_S2L

Copies the elements of a tensor from system memory to local memory, transposing the C and W dimensions.

void tpu_gdma_cpy_cw_trans_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *dst_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, w, h, c)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

dst_shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->n, dst_shape->w, dst_shape->h, dst_shape->c].

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the continuous layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be 1.

tpu_gdma_cpy_cw_trans_L2S

Copies the elements of a tensor from local memory to system memory, transposing the C and W dimensions.

void tpu_gdma_cpy_cw_trans_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *dst_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, w, h, c)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

dst_shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->n, dst_shape->w, dst_shape->h, dst_shape->c].

If dst_stride is NULL, dst uses the continuous layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the 64-byte aligned layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be 1.

tpu_gdma_cpy_cw_trans_L2L

Copies the elements of a tensor from local memory to local memory, transposing the C and W dimensions.

void tpu_gdma_cpy_cw_trans_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *dst_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, w, h, c)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in local memory

dst_shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->n, dst_shape->w, dst_shape->h, dst_shape->c].

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the 64-byte aligned layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be 1.

tpu_gdma_cpy_cw_trans_S2S

Copies the elements of a tensor from system memory to system memory, transposing the C and W dimensions.

void tpu_gdma_cpy_cw_trans_S2S(system_addr_t dst_addr, system_addr_t src_addr, const dim4 *dst_shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, w, h, c)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in system memory

dst_shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->n, dst_shape->w, dst_shape->h, dst_shape->c].

If dst_stride is NULL, dst uses the continuous layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the continuous layout; otherwise it uses a free layout.

dst_stride->w and src_stride->w must be 1.
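The C/W transpose dst(n, c, h, w) = src(n, w, h, c) follows the same pattern. This host-side sketch again assumes continuous layouts and float elements; ref_dim4 and ref_cw_trans are hypothetical names, not SDK symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SDK's dim4 (assumed to have n, c, h, w fields). */
typedef struct { int n, c, h, w; } ref_dim4;

/* Host model of dst(n,c,h,w) = src(n,w,h,c) on densely packed arrays.
   src has shape [dst_shape->n, dst_shape->w, dst_shape->h, dst_shape->c]. */
void ref_cw_trans(float *dst, const float *src, const ref_dim4 *dst_shape) {
    assert(dst != NULL && src != NULL && dst_shape != NULL);
    const int N = dst_shape->n, C = dst_shape->c;
    const int H = dst_shape->h, W = dst_shape->w;
    for (int n = 0; n < N; ++n)
        for (int c = 0; c < C; ++c)
            for (int h = 0; h < H; ++h)
                for (int w = 0; w < W; ++w)
                    dst[((n * C + c) * H + h) * W + w] =
                        src[((n * W + w) * H + h) * C + c];
}
```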

tpu_gdma_mask_select_L2S

Filters a tensor stored in local memory by a mask and copies the selected elements to global memory.

void tpu_gdma_mask_select_L2S(global_addr_t dst_addr, local_addr_t src_addr, addr_t mask_addr, int mask_in_lmem, const dim4 *shape, data_type_t data_dtype, data_type_t mask_dtype)

Parameters:

dst_addr – address of dst in global memory

src_addr – address of src in local memory

mask_addr – address of mask in global memory or local memory

mask_in_lmem – flag indicating whether mask is in local memory

shape – pointer to the shape of the input data or mask

data_dtype – data type of the input/output data

mask_dtype – data type of mask

Notes

The filter_num produced by mask_select can be obtained with tpu_gdma_get_filter_num(). That function takes no arguments and returns a value of type DT_UINT32.

tpu_gdma_mask_select_S2S

Filters a tensor stored in global memory by a mask and copies the selected elements to global memory.

void tpu_gdma_mask_select_S2S(global_addr_t dst_addr, global_addr_t src_addr, addr_t mask_addr, int mask_in_lmem, const dim4 *shape, data_type_t data_dtype, data_type_t mask_dtype)

Parameters:

dst_addr – address of dst in global memory

src_addr – address of src in global memory

mask_addr – address of mask in global memory or local memory

mask_in_lmem – flag indicating whether mask is in local memory

shape – pointer to the shape of the input data or mask

data_dtype – data type of the input/output data

mask_dtype – data type of mask

Notes

The filter_num produced by mask_select can be obtained with tpu_gdma_get_filter_num(). That function takes no arguments and returns a value of type DT_UINT32.
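A host-side model may help clarify what mask_select produces and what filter_num counts. The sketch below rests on assumptions the text does not state explicitly: an element is kept when its mask entry is nonzero, kept elements are written densely in their original order, the data is float and the mask is one byte per element. ref_mask_select is a hypothetical name; its return value plays the role of the count reported by tpu_gdma_get_filter_num().

```c
#include <assert.h>
#include <stddef.h>

/* Host model of mask selection over a flattened tensor.
   Assumption: a nonzero mask entry keeps the element.
   Returns the number of elements written to dst. */
unsigned int ref_mask_select(float *dst, const float *src,
                             const unsigned char *mask, size_t count) {
    assert(dst != NULL && src != NULL && mask != NULL);
    unsigned int kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (mask[i] != 0) {
            dst[kept++] = src[i];  /* pack selected elements densely */
        }
    }
    return kept;
}
```

Because the output length depends on the data, a caller would size dst for the worst case (all elements kept) and read the count afterwards.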

tpu_gdma_nonzero_L2S

Writes the indices of the nonzero elements of an input tensor in local memory to global memory.

void tpu_gdma_nonzero_L2S(global_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, data_type_t data_type, unsigned int base_idx)

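Parameters:

dst_addr – address of dst in global memory

src_addr – address of src in local memory

shape – pointer to the shape of the input tensor

data_type – data type of the elements of the input tensor

base_idx – starting index of dst

Notes

dst uses the 64-byte aligned layout and src uses the compact layout.

The number of indices written to dst can be obtained with tpu_gdma_get_filter_num(). That function takes no arguments and returns a value of type DT_UINT32.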

tpu_gdma_nonzero_S2S

Writes the indices of the nonzero elements of an input tensor in global memory to global memory.

void tpu_gdma_nonzero_S2S(global_addr_t dst_addr, global_addr_t src_addr, const dim4 *shape, data_type_t data_type, unsigned int base_idx)

Parameters:

dst_addr – address of dst in global memory

src_addr – address of src in global memory

shape – pointer to the shape of the input tensor

data_type – data type of the elements of the input tensor

base_idx – starting index of dst

Notes

dst uses the compact layout and src uses the compact layout.

The number of indices written to dst can be obtained with tpu_gdma_get_filter_num(). That function takes no arguments and returns a value of type DT_UINT32.
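The nonzero operation can be pictured with a small host-side model. The sketch assumes, without confirmation from the text, that each reported index is the flattened position of the element plus base_idx, and that indices are emitted in increasing order; the data is taken to be float. ref_nonzero is a hypothetical name, and its return value stands in for the count reported by tpu_gdma_get_filter_num().

```c
#include <assert.h>
#include <stddef.h>

/* Host model of the nonzero operation over a flattened tensor.
   Assumption: reported index = base_idx + flattened position.
   Returns the number of indices written to dst. */
unsigned int ref_nonzero(unsigned int *dst, const float *src,
                         size_t count, unsigned int base_idx) {
    assert(dst != NULL && src != NULL);
    unsigned int found = 0;
    for (size_t i = 0; i < count; ++i) {
        if (src[i] != 0.0f) {
            dst[found++] = base_idx + (unsigned int)i;
        }
    }
    return found;
}
```

Under this reading, base_idx would let a caller process a large tensor in slices while still obtaining indices relative to the whole tensor.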

tpu_gdma_compact_S2L

Copies the elements of a tensor from system memory to local memory.

void tpu_gdma_compact_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, c, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

shape – pointer to the shape of dst and src

dtype – data type of the elements of dst and src

Notes

dst uses the compact layout and src uses the continuous layout.

tpu_gdma_compact_L2S

Copies the elements of a tensor from local memory to system memory.

void tpu_gdma_compact_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, c, h, w)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

shape – pointer to the shape of dst and src

dtype – data type of the elements of dst and src

Notes

dst uses the continuous layout and src uses the compact layout.

tpu_gdma_compact_nc_trans_S2L

Copies the elements of a tensor from system memory to local memory, transposing the N and C dimensions.

void tpu_gdma_compact_nc_trans_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *dst_shape, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(c, n, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

dst_shape – pointer to the shape of dst

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->c, dst_shape->n, dst_shape->h, dst_shape->w].

dst uses the compact layout and src uses the continuous layout.

tpu_gdma_compact_nc_trans_L2S

Copies the elements of a tensor from local memory to system memory, transposing the N and C dimensions.

void tpu_gdma_compact_nc_trans_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *dst_shape, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(c, n, h, w)}\]

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

dst_shape – pointer to the shape of dst

dtype – data type of the elements of dst and src

Notes

The shape of src is [dst_shape->c, dst_shape->n, dst_shape->h, dst_shape->w].

dst uses the continuous layout and src uses the compact layout.

tpu_gdma_set_C_system

Sets the elements of a tensor in system memory to a constant.

void tpu_gdma_set_C_system(system_addr_t dst_addr, scalar_t C, const dim4 *shape, const dim4 *dst_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = C}\]

Parameters:

dst_addr – address of dst in system memory

C – the constant

shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

dtype – data type of C and of the elements of dst

Notes

If dst_stride is NULL, dst uses the continuous layout; otherwise it uses a free layout.

dst_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_set_C_local

Sets the elements of a tensor in local memory to a constant.

void tpu_gdma_set_C_local(local_addr_t dst_addr, scalar_t C, const dim4 *shape, const dim4 *dst_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = C}\]

Parameters:

dst_addr – address of dst in local memory

C – the constant

shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

dtype – data type of C and of the elements of dst

Notes

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

dst_stride->w must be less than or equal to 128 / tpu_data_type_size(dtype).

tpu_gdma_matrix_S2L

Copies the elements of a matrix from system memory to local memory.

void tpu_gdma_matrix_S2L(local_addr_t dst_addr, system_addr_t src_addr, int rows, int cols, int cols_per_channel, int row_stride, data_type_t dtype)

\[\mathsf{dst(x, y) = src(x, y)}\]

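Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

rows – number of rows of the matrix

cols – number of columns of the matrix

cols_per_channel – number of columns of dst stored in each channel

row_stride – row stride of src

dtype – data type of the elements of dst and src

Notes

dst uses the matrix layout. The elements of each row of src are stored contiguously, and consecutive rows are separated by row_stride.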

tpu_gdma_matrix_L2S

Copies the elements of a matrix from local memory to system memory.

void tpu_gdma_matrix_L2S(system_addr_t dst_addr, local_addr_t src_addr, int rows, int cols, int cols_per_channel, int row_stride, data_type_t dtype)

\[\mathsf{dst(x, y) = src(x, y)}\]

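Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

rows – number of rows of the matrix

cols – number of columns of the matrix

cols_per_channel – number of columns of src stored in each channel

row_stride – row stride of dst

dtype – data type of the elements of dst and src

Notes

The elements of each row of dst are stored contiguously, and consecutive rows are separated by row_stride. src uses the matrix layout.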
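The system-memory side of the matrix copies is a row-major buffer whose rows are contiguous but spaced row_stride apart. The host-side sketch below models only that addressing; it does not model the matrix layout in local memory. It assumes row_stride is counted in elements and is at least cols, and that elements are float. ref_matrix_rows_copy is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Gathers a rows x cols matrix out of a buffer whose rows start
   row_stride elements apart, into a densely packed array.
   Assumption: row_stride is in elements and row_stride >= cols. */
void ref_matrix_rows_copy(float *dense, const float *strided,
                          int rows, int cols, int row_stride) {
    assert(dense != NULL && strided != NULL);
    assert(rows >= 0 && cols >= 0 && row_stride >= cols);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            dense[r * cols + c] = strided[r * row_stride + c];
}
```

A row_stride larger than cols is what allows a sub-matrix of a wider matrix to be addressed without first repacking it.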

tpu_gdma_matrix_trans_S2L

Copies the elements of a matrix from system memory to local memory, transposing the matrix.

void tpu_gdma_matrix_trans_S2L(local_addr_t dst_addr, system_addr_t src_addr, int src_rows, int src_cols, int dst_cols_per_channel, int src_row_stride, data_type_t dtype)

\[\mathsf{dst(x, y) = src(y, x)}\]

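Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

src_rows – number of rows of src

src_cols – number of columns of src

dst_cols_per_channel – number of columns of dst stored in each channel

src_row_stride – row stride of src

dtype – data type of the elements of dst and src

Notes

dst uses the matrix layout. The elements of each row of src are stored contiguously, and consecutive rows are separated by src_row_stride.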

tpu_gdma_matrix_trans_L2S

Copies the elements of a matrix from local memory to system memory, transposing the matrix.

void tpu_gdma_matrix_trans_L2S(system_addr_t dst_addr, local_addr_t src_addr, int src_rows, int src_cols, int src_cols_per_channel, int dst_row_stride, data_type_t dtype)

\[\mathsf{dst(x, y) = src(y, x)}\]

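Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

src_rows – number of rows of src

src_cols – number of columns of src

src_cols_per_channel – number of columns of src stored in each channel

dst_row_stride – row stride of dst

dtype – data type of the elements of dst and src

Notes

The elements of each row of dst are stored contiguously, and consecutive rows are separated by dst_row_stride. src uses the matrix layout.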

tpu_gdma_vector_S2L

Copies the elements of a vector from system memory to local memory.

void tpu_gdma_vector_S2L(local_addr_t dst_addr, system_addr_t src_addr, int len, int len_per_channel, data_type_t dtype)

\[\mathsf{dst(x) = src(x)}\]

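Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

len – length of the vector

len_per_channel – length of dst stored in each channel

dtype – data type of the elements of dst and src

Notes

dst uses the vector layout, and the elements of src are stored contiguously.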

tpu_gdma_vector_L2S

Copies the elements of a vector from local memory to system memory.

void tpu_gdma_vector_L2S(system_addr_t dst_addr, local_addr_t src_addr, int len, int len_per_channel, data_type_t dtype)

\[\mathsf{dst(x) = src(x)}\]

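Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

len – length of the vector

len_per_channel – length of src stored in each channel

dtype – data type of the elements of dst and src

Notes

The elements of dst are stored contiguously, and src uses the vector layout.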

tpu_gdma_channel_bcast_S2L

Copies the elements of a tensor from system memory to local memory, broadcasting along the channel dimension.

void tpu_gdma_channel_bcast_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, 0, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [shape->n, 1, shape->h, shape->w].

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the continuous layout; otherwise it uses a free layout.

If dst starts at NPU X, X must be in the range [0, NPU_NUM - 1] and shape->c must be less than or equal to NPU_NUM - X.

dst_stride->w and src_stride->w must be 1.

tpu_gdma_channel_bcast_L2L

Copies the elements of a tensor from local memory to local memory, broadcasting along the channel dimension.

void tpu_gdma_channel_bcast_L2L(local_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride, data_type_t dtype)

\[\mathsf{dst(n, c, h, w) = src(n, 0, h, w)}\]

Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in local memory

shape – pointer to the shape of dst

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

Notes

The shape of src is [shape->n, 1, shape->h, shape->w].

If dst_stride is NULL, dst uses the 64-byte aligned layout; otherwise it uses a free layout.

If src_stride is NULL, src uses the 64-byte aligned layout; otherwise it uses a free layout.

If dst starts at NPU X, X must be in the range [0, NPU_NUM - 1] and shape->c must be less than or equal to NPU_NUM - X.

dst_stride->w and src_stride->w must be 1.
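The channel broadcast dst(n, c, h, w) = src(n, 0, h, w) replicates a single-channel source across every destination channel. The host-side sketch below models it on densely packed float arrays, which is a simplification of the local-memory layouts described above; ref_dim4 and ref_channel_bcast are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SDK's dim4 (assumed to have n, c, h, w fields). */
typedef struct { int n, c, h, w; } ref_dim4;

/* Host model of dst(n,c,h,w) = src(n,0,h,w).
   shape describes dst; src has shape [n, 1, h, w]. */
void ref_channel_bcast(float *dst, const float *src, const ref_dim4 *shape) {
    assert(dst != NULL && src != NULL && shape != NULL);
    const int N = shape->n, C = shape->c, H = shape->h, W = shape->w;
    for (int n = 0; n < N; ++n)
        for (int c = 0; c < C; ++c)
            for (int h = 0; h < H; ++h)
                for (int w = 0; w < W; ++w)
                    dst[((n * C + c) * H + h) * W + w] =
                        src[(n * H + h) * W + w];
}
```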

tpu_gdma_h_gather_S2L

Builds the output tensor by indexing param along the h dimension, that is, output = param[index].

void tpu_gdma_h_gather_S2L(local_addr_t output_addr, system_addr_t param_addr, addr_t index_addr, bool index_is_local, scalar_t C, const dim4 *shape, int param_h, const dim4 *output_stride, const dim4 *param_stride, const dim4 *index_stride, data_type_t dtype)

\[\begin{split}\mathsf{output(0, c, h, w)} = {\begin{cases} \mathsf{param(0, c, index(0, c, h, 0), w)}&\mathsf{\text{if}~index(0, c, h, 0)~\text{is valid}}\\ \mathsf{C}&\mathsf{\text{otherwise}}\end{cases}}\end{split}\]

Parameters:

output_addr – address of output in local memory

param_addr – address of param in system memory

index_addr – address of index in system memory or local memory

index_is_local – flag indicating whether index is in local memory

C – the constant

shape – pointer to the shape of output

param_h – size of the h dimension of param

output_stride – pointer to the stride of output

param_stride – pointer to the stride of param

index_stride – pointer to the stride of index

dtype – data type of the elements of output and param

Notes

If output_stride is NULL, output uses the 64-byte aligned layout; otherwise it uses a free layout.

If param_stride is NULL, param uses the continuous layout; otherwise it uses a free layout.

If index_stride is NULL, index uses the 64-byte aligned layout (when index_is_local is true) or the continuous layout (when index_is_local is false); otherwise it uses a free layout.

Performance is better when index_addr is divisible by 512.

shape->n must be 1. The shape of param is [1, shape->c, param_h, shape->w], and the shape of index is [1, shape->c, shape->h, 1].

output_stride->w, param_stride->w and index_stride->h must be 1.

The elements of index have data type DT_UINT32, and their valid range is [0, param_h - 1].

tpu_gdma_h_gather_L2S

Builds the output tensor by indexing param along the h dimension, that is, output = param[index].

void tpu_gdma_h_gather_L2S(system_addr_t output_addr, local_addr_t param_addr, addr_t index_addr, bool index_is_local, scalar_t C, const dim4 *shape, int param_h, const dim4 *output_stride, const dim4 *param_stride, const dim4 *index_stride, data_type_t dtype)

\[\begin{split}\mathsf{output(0, c, h, w)} = {\begin{cases} \mathsf{param(0, c, index(0, c, h, 0), w)}&\mathsf{\text{if}~index(0, c, h, 0)~\text{is valid}}\\ \mathsf{C}&\mathsf{\text{otherwise}}\end{cases}}\end{split}\]

Parameters:

output_addr – address of output in system memory

param_addr – address of param in local memory

index_addr – address of index in system memory or local memory

index_is_local – flag indicating whether index is in local memory

C – the constant

shape – pointer to the shape of output

param_h – size of the h dimension of param

output_stride – pointer to the stride of output

param_stride – pointer to the stride of param

index_stride – pointer to the stride of index

dtype – data type of the elements of output and param

Notes

If output_stride is NULL, output uses the continuous layout; otherwise it uses a free layout.

If param_stride is NULL, param uses the 64-byte aligned layout; otherwise it uses a free layout.

If index_stride is NULL, index uses the 64-byte aligned layout (when index_is_local is true) or the continuous layout (when index_is_local is false); otherwise it uses a free layout.

Performance is better when index_addr is divisible by 512.

shape->n must be 1. The shape of param is [1, shape->c, param_h, shape->w], and the shape of index is [1, shape->c, shape->h, 1].

output_stride->w, param_stride->w and index_stride->h must be 1.

The elements of index have data type DT_UINT32, and their valid range is [0, param_h - 1].

tpu_gdma_h_gather_L2L

Builds the output tensor by indexing param along the h dimension, that is, output = param[index].

void tpu_gdma_h_gather_L2L(local_addr_t output_addr, local_addr_t param_addr, addr_t index_addr, bool index_is_local, scalar_t C, const dim4 *shape, int param_h, const dim4 *output_stride, const dim4 *param_stride, const dim4 *index_stride, data_type_t dtype)

\[\begin{split}\mathsf{output(0, c, h, w)} = {\begin{cases} \mathsf{param(0, c, index(0, c, h, 0), w)}&\mathsf{\text{if}~index(0, c, h, 0)~\text{is valid}}\\ \mathsf{C}&\mathsf{\text{otherwise}}\end{cases}}\end{split}\]

Parameters:

output_addr – address of output in local memory

param_addr – address of param in local memory

index_addr – address of index in system memory or local memory

index_is_local – flag indicating whether index is in local memory

C – the constant

shape – pointer to the shape of output

param_h – size of the h dimension of param

output_stride – pointer to the stride of output

param_stride – pointer to the stride of param

index_stride – pointer to the stride of index

dtype – data type of the elements of output and param

Notes

If output_stride is NULL, output uses the 64-byte aligned layout; otherwise it uses a free layout.

If param_stride is NULL, param uses the 64-byte aligned layout; otherwise it uses a free layout.

If index_stride is NULL, index uses the 64-byte aligned layout (when index_is_local is true) or the continuous layout (when index_is_local is false); otherwise it uses a free layout.

Performance is better when index_addr is divisible by 512.

shape->n must be 1. The shape of param is [1, shape->c, param_h, shape->w], and the shape of index is [1, shape->c, shape->h, 1].

output_stride->w, param_stride->w and index_stride->h must be 1.

The elements of index have data type DT_UINT32, and their valid range is [0, param_h - 1].

tpu_gdma_h_gather_S2S

Builds the output tensor by indexing param along the h dimension, that is, output = param[index].

void tpu_gdma_h_gather_S2S(system_addr_t output_addr, system_addr_t param_addr, addr_t index_addr, bool index_is_local, scalar_t C, const dim4 *shape, int param_h, const dim4 *output_stride, const dim4 *param_stride, const dim4 *index_stride, data_type_t dtype)

\[\begin{split}\mathsf{output(0, c, h, w)} = {\begin{cases} \mathsf{param(0, c, index(0, c, h, 0), w)}&\mathsf{\text{if}~index(0, c, h, 0)~\text{is valid}}\\ \mathsf{C}&\mathsf{\text{otherwise}}\end{cases}}\end{split}\]

Parameters:

output_addr – address of output in system memory

param_addr – address of param in system memory

index_addr – address of index in system memory or local memory

index_is_local – flag indicating whether index is in local memory

C – the constant

shape – pointer to the shape of output

param_h – size of the h dimension of param

output_stride – pointer to the stride of output

param_stride – pointer to the stride of param

index_stride – pointer to the stride of index

dtype – data type of the elements of output and param

Notes

If output_stride is NULL, output uses the continuous layout; otherwise it uses a free layout.

If param_stride is NULL, param uses the continuous layout; otherwise it uses a free layout.

If index_stride is NULL, index uses the 64-byte aligned layout (when index_is_local is true) or the continuous layout (when index_is_local is false); otherwise it uses a free layout.

Performance is better when index_addr is divisible by 512.

shape->n must be 1. The shape of param is [1, shape->c, param_h, shape->w], and the shape of index is [1, shape->c, shape->h, 1].

output_stride->w, param_stride->w and index_stride->h must be 1.

The elements of index have data type DT_UINT32, and their valid range is [0, param_h - 1].
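The gather formula, including the constant fill for invalid indices, can be written as a short host-side reference. The sketch assumes densely packed float arrays for output and param and a densely packed unsigned int array for index; ref_dim4 and ref_h_gather are hypothetical names, and the fill parameter plays the role of C.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SDK's dim4 (assumed to have n, c, h, w fields). */
typedef struct { int n, c, h, w; } ref_dim4;

/* Host model of the h gather:
   out(0,c,h,w) = param(0,c,index(0,c,h,0),w) if the index is valid,
   and fill otherwise. shape describes out; param is [1, c, param_h, w]
   and index is [1, c, h, 1]. */
void ref_h_gather(float *out, const float *param, const unsigned int *index,
                  float fill, const ref_dim4 *shape, int param_h) {
    assert(out != NULL && param != NULL && index != NULL && shape != NULL);
    assert(shape->n == 1 && param_h >= 0);
    const int C = shape->c, H = shape->h, W = shape->w;
    for (int c = 0; c < C; ++c) {
        for (int h = 0; h < H; ++h) {
            unsigned int idx = index[c * H + h];
            int valid = idx < (unsigned int)param_h;
            for (int w = 0; w < W; ++w)
                out[(c * H + h) * W + w] =
                    valid ? param[(c * param_h + (int)idx) * W + w] : fill;
        }
    }
}
```

Each index selects a whole row of w elements, so the w dimension is copied as a unit.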

tpu_gdma_h_scatter_S2L

Updates the elements of the output tensor selected by an index along the h dimension, that is, output[index] = param.

void tpu_gdma_h_scatter_S2L(local_addr_t output_addr, system_addr_t param_addr, addr_t index_addr, bool index_is_local, const dim4 *shape, int param_h, const dim4 *output_stride, const dim4 *param_stride, const dim4 *index_stride, data_type_t dtype)

\[\mathsf{output(0, c, index(0, c, h, 0), w) = param(0, c, h, w)}\]

Parameters:

output_addr – address of output in local memory

param_addr – address of param in system memory

index_addr – address of index in system memory or local memory

index_is_local – flag indicating whether index is in local memory

shape – pointer to the shape of output

param_h – size of the h dimension of param

output_stride – pointer to the stride of output

param_stride – pointer to the stride of param

index_stride – pointer to the stride of index

dtype – data type of the elements of output and param

Notes

If output_stride is NULL, output uses the 64-byte aligned layout; otherwise it uses a free layout.

If param_stride is NULL, param uses the continuous layout; otherwise it uses a free layout.

If index_stride is NULL, index uses the 64-byte aligned layout (when index_is_local is true) or the continuous layout (when index_is_local is false); otherwise it uses a free layout.

Performance is better when index_addr is divisible by 512.

shape->n must be 1. The shape of param is [1, shape->c, param_h, shape->w], and the shape of index is [1, shape->c, param_h, 1].

output_stride->w, param_stride->w and index_stride->h must be 1.

The elements of index have data type DT_UINT32, and their range is [0, shape->h - 1].

tpu_gdma_h_scatter_L2S

Updates the elements of the output tensor selected by an index along the h dimension, that is, output[index] = param.

void tpu_gdma_h_scatter_L2S(system_addr_t output_addr, local_addr_t param_addr, addr_t index_addr, bool index_is_local, const dim4 *shape, int param_h, const dim4 *output_stride, const dim4 *param_stride, const dim4 *index_stride, data_type_t dtype)

\[\mathsf{output(0, c, index(0, c, h, 0), w) = param(0, c, h, w)}\]

Parameters:

output_addr – address of output in system memory

param_addr – address of param in local memory

index_addr – address of index in system memory or local memory

index_is_local – flag indicating whether index is in local memory

shape – pointer to the shape of output

param_h – size of the h dimension of param

output_stride – pointer to the stride of output

param_stride – pointer to the stride of param

index_stride – pointer to the stride of index

dtype – data type of the elements of output and param

Notes

If output_stride is NULL, output uses the continuous layout; otherwise it uses a free layout.

If param_stride is NULL, param uses the 64-byte aligned layout; otherwise it uses a free layout.

If index_stride is NULL, index uses the 64-byte aligned layout (when index_is_local is true) or the continuous layout (when index_is_local is false); otherwise it uses a free layout.

Performance is better when index_addr is divisible by 512.

shape->n must be 1. The shape of param is [1, shape->c, param_h, shape->w], and the shape of index is [1, shape->c, param_h, 1].

output_stride->w, param_stride->w and index_stride->h must be 1.

The elements of index have data type DT_UINT32, and their range is [0, shape->h - 1].

tpu_gdma_h_scatter_L2L

Updates the elements of the output tensor selected by an index along the h dimension, that is, output[index] = param.

void tpu_gdma_h_scatter_L2L(local_addr_t output_addr, local_addr_t param_addr, addr_t index_addr, bool index_is_local, const dim4 *shape, int param_h, const dim4 *output_stride, const dim4 *param_stride, const dim4 *index_stride, data_type_t dtype)

\[\mathsf{output(0, c, index(0, c, h, 0), w) = param(0, c, h, w)}\]

Parameters:

output_addr – address of output in local memory

param_addr – address of param in local memory

index_addr – address of index in system memory or local memory

index_is_local – flag indicating whether index is in local memory

shape – pointer to the shape of output

param_h – size of the h dimension of param

output_stride – pointer to the stride of output

param_stride – pointer to the stride of param

index_stride – pointer to the stride of index

dtype – data type of the elements of output and param

Notes

If output_stride is NULL, output uses the 64-byte aligned layout; otherwise it uses a free layout.

If param_stride is NULL, param uses the 64-byte aligned layout; otherwise it uses a free layout.

If index_stride is NULL, index uses the 64-byte aligned layout (when index_is_local is true) or the continuous layout (when index_is_local is false); otherwise it uses a free layout.

Performance is better when index_addr is divisible by 512.

shape->n must be 1. The shape of param is [1, shape->c, param_h, shape->w], and the shape of index is [1, shape->c, param_h, 1].

output_stride->w, param_stride->w and index_stride->h must be 1.

The elements of index have data type DT_UINT32, and their range is [0, shape->h - 1].

tpu_gdma_h_scatter_S2S

Updates the elements of the output tensor selected by an index along the h dimension, that is, output[index] = param.

void tpu_gdma_h_scatter_S2S(system_addr_t output_addr, system_addr_t param_addr, addr_t index_addr, bool index_is_local, const dim4 *shape, int param_h, const dim4 *output_stride, const dim4 *param_stride, const dim4 *index_stride, data_type_t dtype)

\[\mathsf{output(0, c, index(0, c, h, 0), w) = param(0, c, h, w)}\]

Parameters:

output_addr – address of output in system memory

param_addr – address of param in system memory

index_addr – address of index in system memory or local memory

index_is_local – flag indicating whether index is in local memory

shape – pointer to the shape of output

param_h – size of the h dimension of param

output_stride – pointer to the stride of output

param_stride – pointer to the stride of param

index_stride – pointer to the stride of index

dtype – data type of the elements of output and param

Notes

If output_stride is NULL, output uses the continuous layout; otherwise it uses a free layout.

If param_stride is NULL, param uses the continuous layout; otherwise it uses a free layout.

If index_stride is NULL, index uses the 64-byte aligned layout (when index_is_local is true) or the continuous layout (when index_is_local is false); otherwise it uses a free layout.

Performance is better when index_addr is divisible by 512.

shape->n must be 1. The shape of param is [1, shape->c, param_h, shape->w], and the shape of index is [1, shape->c, param_h, 1].

output_stride->w, param_stride->w and index_stride->h must be 1.

The elements of index have data type DT_UINT32, and their range is [0, shape->h - 1].
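The scatter formula output(0, c, index(0, c, h, 0), w) = param(0, c, h, w) is the mirror image of the gather. The host-side sketch below assumes densely packed float arrays and a densely packed unsigned int index; ref_dim4 and ref_h_scatter are hypothetical names. Rows of output that no index selects are left untouched, and the behavior for duplicate indices on the device is not specified here (in this model the last write wins).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SDK's dim4 (assumed to have n, c, h, w fields). */
typedef struct { int n, c, h, w; } ref_dim4;

/* Host model of the h scatter:
   out(0,c,index(0,c,h,0),w) = param(0,c,h,w) for h in [0, param_h).
   shape describes out; param is [1, c, param_h, w] and
   index is [1, c, param_h, 1]. */
void ref_h_scatter(float *out, const float *param, const unsigned int *index,
                   const ref_dim4 *shape, int param_h) {
    assert(out != NULL && param != NULL && index != NULL && shape != NULL);
    assert(shape->n == 1 && param_h >= 0);
    const int C = shape->c, H = shape->h, W = shape->w;
    for (int c = 0; c < C; ++c) {
        for (int h = 0; h < param_h; ++h) {
            unsigned int idx = index[c * param_h + h];
            assert(idx < (unsigned int)H);  /* index must address a row of out */
            for (int w = 0; w < W; ++w)
                out[(c * H + (int)idx) * W + w] =
                    param[(c * param_h + h) * W + w];
        }
    }
}
```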

tpu_gdma_system_cpy

Copies data within system memory.

void tpu_gdma_system_cpy(system_addr_t dst_addr, system_addr_t src_addr, unsigned int count, data_type_t dtype)

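Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in system memory

count – length of the data

dtype – data type of the elements of dst and src

Notes

Both dst_addr and src_addr must be divisible by the width of the element data type.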

tpu_gdma_reverse_S2S

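Copies data from system memory to another region of system memory, reversing its order along the specified dimension. The N, C and H dimensions are currently supported.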

void tpu_gdma_reverse_S2S(system_addr_t dst_addr, system_addr_t src_addr, dim4 *shape, dim4 *dst_stride, dim4 *src_stride, int reverse_axis, data_type_t dtype);

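Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in system memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

reverse_axis – dimension to reverse; supported values are 0 (N), 1 (C) and 2 (H)

dtype – data type of the elements of dst and src

Notes

This function is not supported on BM1684X devices.

dst_stride and src_stride may be NULL, in which case the default continuous layout is used.

If src_stride is provided, src_stride->w must be 1.

If dst_stride is provided, dst_stride->w must be 1.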
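As a host-side illustration of the reversal, here is a minimal reference model. It assumes that reversing along an axis means dst(..., i, ...) = src(..., size - 1 - i, ...) on that axis, that both tensors use the continuous layout (the NULL-stride case), and that the data type is float. ref_dim4, ref_continuous_stride and ref_reverse are hypothetical names, not SDK functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for the SDK's dim4 (field names assumed). */
typedef struct { int n, c, h, w; } ref_dim4;

/* Strides, in elements, of a continuous [n, c, h, w] tensor. */
void ref_continuous_stride(const ref_dim4 *shape, ref_dim4 *stride)
{
    stride->w = 1;
    stride->h = shape->w;
    stride->c = shape->h * shape->w;
    stride->n = shape->c * shape->h * shape->w;
}

/* Copies src to dst, reversing the order along reverse_axis
 * (0 = N, 1 = C, 2 = H). The W dimension is never reversed. */
void ref_reverse(float *dst, const float *src, const ref_dim4 *shape,
                 int reverse_axis)
{
    assert(reverse_axis >= 0 && reverse_axis <= 2);
    ref_dim4 s;
    ref_continuous_stride(shape, &s);
    for (int n = 0; n < shape->n; ++n) {
        for (int c = 0; c < shape->c; ++c) {
            for (int h = 0; h < shape->h; ++h) {
                /* Mirror only the coordinate on the selected axis. */
                int rn = reverse_axis == 0 ? shape->n - 1 - n : n;
                int rc = reverse_axis == 1 ? shape->c - 1 - c : c;
                int rh = reverse_axis == 2 ? shape->h - 1 - h : h;
                const float *src_row = src + (size_t)n * s.n + (size_t)c * s.c
                                           + (size_t)h * s.h;
                float *dst_row = dst + (size_t)rn * s.n + (size_t)rc * s.c
                                     + (size_t)rh * s.h;
                for (int w = 0; w < shape->w; ++w)
                    dst_row[w] = src_row[w];
            }
        }
    }
}
```

Each row of w elements is copied as a unit, which matches the requirement that the w stride be 1.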

tpu_gdma_reverse_S2L

Copies data from system memory to local memory, reversing the order along the specified dimension. The N, C and H dimensions are currently supported.

void tpu_gdma_reverse_S2L(local_addr_t dst_addr, system_addr_t src_addr, dim4 *shape, dim4 *dst_stride, dim4 *src_stride, int reverse_axis, data_type_t dtype);

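Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

reverse_axis – dimension to reverse; supported values are 0 (N), 1 (C) and 2 (H)

dtype – data type of the elements of dst and src

Notes

This function is not supported on BM1684X devices.

dst_stride and src_stride may be NULL, in which case the default continuous layout is used.

If src_stride is provided, src_stride->w must be 1.

If dst_stride is provided, dst_stride->w must be 1.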

tpu_gdma_reverse_L2S

Copies data from local memory to system memory, reversing the order along the specified dimension. Only the C dimension is currently supported.

void tpu_gdma_reverse_L2S(system_addr_t dst_addr, local_addr_t src_addr, dim4 *shape, dim4 *dst_stride, dim4 *src_stride, int reverse_axis, data_type_t dtype);

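Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

reverse_axis – dimension to reverse; only 1 (C) is supported

dtype – data type of the elements of dst and src

Notes

This function is not supported on BM1684X devices.

dst_stride and src_stride may be NULL, in which case the default continuous layout is used.

If src_stride is provided, src_stride->w must be 1.

If dst_stride is provided, dst_stride->w must be 1.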

tpu_gdma_reverse_L2L

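Copies data from local memory to another region of local memory, reversing the order along the specified dimension. Only the C dimension is currently supported.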

void tpu_gdma_reverse_L2L(local_addr_t dst_addr, local_addr_t src_addr, dim4 *shape, dim4 *src_stride, dim4 *dst_stride, int reverse_axis, data_type_t dtype);

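Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in local memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_stride – pointer to the stride of src

reverse_axis – dimension to reverse; only 1 (C) is supported

dtype – data type of the elements of dst and src

Notes

This function is not supported on BM1684X devices.

dst_stride and src_stride may be NULL, in which case the default continuous layout is used.

If src_stride is provided, src_stride->w must be 1.

If dst_stride is provided, dst_stride->w must be 1.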

tpu_gdma_compress_normal_L2S

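Compresses data in local memory and stores it to system memory. This compression algorithm does not support random access during decompression. It suits loading weights and other cases where the tensor does not need to be accessed in slices.

void tpu_gdma_compress_normal_L2S(global_addr_t dst_addr, local_addr_t src_addr, dim4 *shape, dim4 *src_stride, data_type_t dtype, unsigned char bias0, unsigned char bias1, bool zero_guard);

Parameters:

dst_addr – address of dst in system memory

src_addr – address of src in local memory

shape – pointer to the shape of dst and src

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

bias0 – compression parameter

bias1 – compression parameter

zero_guard – whether denormal values are treated as 0 (only takes effect for fp16; bf16 defaults to true, and other data types default to false)

Notes

This function is not supported on BM1684X devices.

src_stride may be NULL, in which case the default continuous layout is used.

If src_stride is provided, src_stride->w must be 1.

bias0 and bias1 are offsets used to transform the values being compressed (both are 8-bit unsigned integers). Choosing suitable values can improve the compression ratio.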

tpu_gdma_decompress_normal_S2L

Decompresses compressed data in system memory, produced by tpu_gdma_compress_normal_L2S, into local memory.

void tpu_gdma_decompress_normal_S2L(local_addr_t dst_addr, global_addr_t src_addr, dim4 *shape, dim4 *dst_stride, data_type_t dtype, unsigned char bias0, unsigned char bias1, bool zero_guard);

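Parameters:

dst_addr – address of dst in local memory

src_addr – address of src in system memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

dtype – data type of the elements of dst and src

bias0 – compression parameter; must match the value used for compression

bias1 – compression parameter; must match the value used for compression

zero_guard – whether denormal values are treated as 0 (only takes effect for fp16; bf16 defaults to true, and other data types default to false)

Notes

This function is not supported on BM1684X devices.

dst_stride may be NULL, in which case the default continuous layout is used.

If dst_stride is provided, dst_stride->w must be 1.

bias0 and bias1 are offsets used to transform the values being compressed (both are 8-bit unsigned integers). They must match the values used for compression.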

tpu_gdma_compress_normal_max_bytes

Computes the maximum number of bytes of the compressed data, for use when allocating system memory.

tpu_gdma_compress_RACU_L2S

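Compresses data in local memory and stores it to system memory. This compression algorithm supports random access during decompression and suits cases where slices of a tensor are accessed repeatedly. Compressed data can be accessed at a minimum granularity of one RACU (Random Access Compression Unit). Compression produces two blocks of data: RACU and meta.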

void tpu_gdma_compress_RACU_L2S(global_addr_t dst_racu_addr, global_addr_t dst_meta_addr, local_addr_t src_addr, dim4 *shape, dim4 *dst_racu_stride, dim4 *dst_meta_stride, dim4 *src_stride, data_type_t dtype, unsigned char bias0, unsigned char bias1, bool zero_guard);

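Parameters:

dst_racu_addr – address of the racu data of dst in system memory

dst_meta_addr – address of the meta data of dst in system memory

src_addr – address of src in local memory

shape – pointer to the shape of dst and src

dst_racu_stride – output parameter; compression also computes the stride of the racu data, which is used for random-access decompression

dst_meta_stride – output parameter; compression also computes the stride of the meta data, which is used for random-access decompression

src_stride – pointer to the stride of src

dtype – data type of the elements of dst and src

bias0 – compression parameter

bias1 – compression parameter

zero_guard – whether denormal values are treated as 0 (only takes effect for fp16; bf16 defaults to true, and other data types default to false)

Notes

This function is not supported on BM1684X devices.

src_stride may be NULL, in which case the default continuous layout is used.

If src_stride is provided, src_stride->w must be 1.

bias0 and bias1 are offsets used to transform the values being compressed (both are 8-bit unsigned integers). Choosing suitable values can improve the compression ratio.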

tpu_gdma_decompress_RACU_S2L

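Decompresses data in system memory that was compressed with tpu_gdma_compress_RACU_L2S into local memory.

void tpu_gdma_decompress_RACU_S2L(local_addr_t dst_addr, global_addr_t src_racu_addr, global_addr_t src_meta_addr, dim4 *shape, dim4 *dst_stride, dim4 *src_racu_stride, dim4 *src_meta_stride, data_type_t dtype, unsigned char bias0, unsigned char bias1, bool zero_guard)

Parameters:

dst_addr – address of dst in local memory

src_racu_addr – address of the racu data of src in system memory

src_meta_addr – address of the meta data of src in system memory

shape – pointer to the shape of dst and src

dst_stride – pointer to the stride of dst

src_racu_stride – pointer to the stride of the racu data of src; only the n and c values are used, and the h and w values are ignored

src_meta_stride – pointer to the stride of the meta data of src; only the n and c values are used, and the h and w values are ignored

dtype – data type of the elements of dst and src

bias0 – compression parameter; must match the value used for compression

bias1 – compression parameter; must match the value used for compression

zero_guard – whether denormal values are treated as 0 (only takes effect for fp16; bf16 defaults to true, and other data types default to false)

Notes

This function is not supported on BM1684X devices.

dst_stride may be NULL, in which case the default continuous layout is used.

If dst_stride is provided, dst_stride->w must be 1.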

tpu_gdma_compress_RACU_max_racu_bytes

Computes the maximum number of bytes of the racu data produced by RACU compression, for use when allocating system memory.

int tpu_gdma_compress_RACU_max_racu_bytes(const dim4 *shape, data_type_t dtype);

Parameters:

shape – pointer to the shape of the data to be compressed

dtype – data type of the data to be compressed

tpu_gdma_compress_RACU_max_meta_bytes

Computes the maximum number of bytes of the meta data produced by RACU compression, for use when allocating system memory.

int tpu_gdma_compress_RACU_max_meta_bytes(const dim4 *shape, data_type_t dtype);

Parameters:

shape – pointer to the shape of the data to be compressed

dtype – data type of the data to be compressed