TPUKernel User Development Guide
Introduction
TPU Architecture
TPU Programming Model
TPU API
- Basic Definitions
- Rounding Modes
- Utility Functions
- tpu_initialize
- tpu_poll
- tpu_parallel_start
- tpu_parallel_end
- tpu_is_parallel_state
- tpu_npu_num
- tpu_bank_num
- tpu_eu_num
- tpu_local_mem_size_per_npu
- tpu_l2_sram_size
- tpu_l2_sram_get_start_addr
- tpu_local_mem_get_start_addr
- tpu_global_mem_addr
- tpu_local_mem_addr
- tpu_local_mem_addr_unified
- tpu_l2_sram_addr
- tpu_flush_cache
- tpu_invalidate_cache
- Helper Functions
- GDMA Operations
- tpu_gdma_cpy_S2L
- tpu_gdma_cpy_L2S
- tpu_gdma_cpy_L2L
- tpu_gdma_cpy_S2S
- tpu_gdma_cpy_nc_trans_S2L
- tpu_gdma_cpy_nc_trans_L2S
- tpu_gdma_cpy_nc_trans_L2L
- tpu_gdma_cpy_nc_trans_S2S
- tpu_gdma_cpy_cw_trans_S2L
- tpu_gdma_cpy_cw_trans_L2S
- tpu_gdma_cpy_cw_trans_L2L
- tpu_gdma_cpy_cw_trans_S2S
- tpu_gdma_mask_select_L2S
- tpu_gdma_mask_select_S2S
- tpu_gdma_nonzero_L2S
- tpu_gdma_nonzero_S2S
- tpu_gdma_compact_S2L
- tpu_gdma_compact_L2S
- tpu_gdma_compact_nc_trans_S2L
- tpu_gdma_compact_nc_trans_L2S
- tpu_gdma_set_C_system
- tpu_gdma_set_C_local
- tpu_gdma_matrix_S2L
- tpu_gdma_matrix_L2S
- tpu_gdma_matrix_trans_S2L
- tpu_gdma_matrix_trans_L2S
- tpu_gdma_vector_S2L
- tpu_gdma_vector_L2S
- tpu_gdma_channel_bcast_S2L
- tpu_gdma_channel_bcast_L2L
- tpu_gdma_h_gather_S2L
- tpu_gdma_h_gather_L2S
- tpu_gdma_h_gather_L2L
- tpu_gdma_h_gather_S2S
- tpu_gdma_h_scatter_S2L
- tpu_gdma_h_scatter_L2S
- tpu_gdma_h_scatter_L2L
- tpu_gdma_h_scatter_S2S
- tpu_gdma_system_cpy
- Basic Data Operations
- Data Type Conversion and Rounding Operations
- Unary Operations
- tpu_bdc_abs
- tpu_bdc_not
- tpu_bdc_neg
- tpu_bdc_fp32_reciprocal
- tpu_bdc_fp32_tunable_reciprocal
- tpu_bdc_fp32_rsqrt
- tpu_bdc_fp32_tunable_rsqrt
- tpu_bdc_fp32_sqrt
- tpu_bdc_fp32_tunable_sqrt
- tpu_bdc_fp32_exp
- tpu_bdc_fp32_expm1
- tpu_bdc_fp32_log
- tpu_bdc_fp32_log1p
- tpu_bdc_fp32_logx
- tpu_bdc_sign
- tpu_bdc_fp32_sin
- tpu_bdc_fp32_cos
- tpu_bdc_fp32_tan
- tpu_bdc_fp32_cot
- tpu_bdc_fp32_arcsin
- tpu_bdc_fp32_arccos
- Binary Operations
- tpu_bdc_and
- tpu_bdc_and_C
- tpu_bdc_or
- tpu_bdc_or_C
- tpu_bdc_xor
- tpu_bdc_xor_C
- tpu_bdc_min
- tpu_bdc_min_C
- tpu_bdc_max
- tpu_bdc_max_C
- tpu_bdc_arithmetic_shift
- tpu_bdc_arithmetic_shift_C
- tpu_bdc_logical_shift
- tpu_bdc_logical_shift_C
- tpu_bdc_greater
- tpu_bdc_greater_C
- tpu_bdc_less
- tpu_bdc_less_C
- tpu_bdc_equal
- tpu_bdc_equal_C
- tpu_bdc_greater_equal
- tpu_bdc_greater_equal_C
- tpu_bdc_less_equal
- tpu_bdc_less_equal_C
- tpu_bdc_not_equal
- tpu_bdc_not_equal_C
- tpu_bdc_vc_and
- tpu_bdc_vc_or
- tpu_bdc_vc_xor
- tpu_bdc_vc_min
- tpu_bdc_vc_max
- tpu_bdc_vc_greater
- tpu_bdc_vc_less
- tpu_bdc_vc_equal
- tpu_bdc_vc_greater_equal
- tpu_bdc_vc_less_equal
- tpu_bdc_vc_not_equal
- Floating-Point Binary Operations
- tpu_bdc_fp_add
- tpu_bdc_fp_add_C
- tpu_bdc_fp_sub
- tpu_bdc_fp_sub_C
- tpu_bdc_fp_C_sub
- tpu_bdc_fp_mul
- tpu_bdc_fp_mul_C
- tpu_bdc_fp32_div
- tpu_bdc_fp32_div_C
- tpu_bdc_fp32_C_div
- tpu_bdc_fp32_tunable_div
- tpu_bdc_fp32_tunable_div_C
- tpu_bdc_fp32_tunable_C_div
- tpu_bdc_fp32_mac
- tpu_bdc_fp32_mac_C
- tpu_bdc_fp_diff_abs
- tpu_bdc_fp_diff_abs_C
- tpu_bdc_fp32_pow
- tpu_bdc_fp32_pow_C
- tpu_bdc_fp32_C_pow
- tpu_bdc_fp_vc_add
- tpu_bdc_fp_vc_sub
- tpu_bdc_fp_vc_mul
- tpu_bdc_fp32_vc_div
- Integer Binary Operations
- tpu_bdc_int_add
- tpu_bdc_int_add_C
- tpu_bdc_int_pcs_add
- tpu_bdc_int_pcs_add_C
- tpu_bdc_int_sub
- tpu_bdc_int_sub_C
- tpu_bdc_int_C_sub
- tpu_bdc_int_pcs_sub
- tpu_bdc_int_pcs_sub_C
- tpu_bdc_int_pcs_C_sub
- tpu_bdc_int_mul
- tpu_bdc_int_mul_C
- tpu_bdc_int_pcs_mul
- tpu_bdc_int_pcs_mul_C
- tpu_bdc_int8_mac
- tpu_bdc_int8_mac_C
- tpu_bdc_int_min_C
- tpu_bdc_int_max_C
- tpu_bdc_int_vc_add
- tpu_bdc_int_vc_sub
- tpu_bdc_int_vc_mul
- Compare-and-Select Functions
- Floating-Point Matrix Operations
- Integer Matrix Operations
- tpu_bdc_int_mm
- tpu_bdc_int_mm_L_trans
- tpu_bdc_int_mm_L_const
- tpu_bdc_int_pcs_mm
- tpu_bdc_int_pcs_mm_L_trans
- tpu_bdc_int_pcs_mm_L_const
- tpu_bdc_int8_mm
- tpu_bdc_int8_mm_L_trans
- tpu_bdc_int8_mm_L_const
- tpu_bdc_int8_zp_mm
- tpu_bdc_int8_zp_mm_R_trans
- tpu_bdc_int8_zp_mm_all_trans
- tpu_bdc_int8_zp_mm_L_const
- tpu_bdc_int8_zp_mm_R_const
- tpu_bdc_int8_zp_mm_L_const_R_trans
- tpu_bdc_int8_zp_mm_L_const_all_trans
- tpu_bdc_int8_zp_mm_R_const_all_trans
- tpu_bdc_int8_pc_zp_mm
- tpu_bdc_int8_pc_zp_mm_R_trans
- tpu_bdc_int8_pc_zp_mm_all_trans
- tpu_bdc_int8_pc_zp_mm_L_const
- tpu_bdc_int8_pc_zp_mm_R_const
- tpu_bdc_int8_pc_zp_mm_L_const_R_trans
- tpu_bdc_int8_pc_zp_mm_L_const_all_trans
- tpu_bdc_int8_pc_zp_mm_R_const_all_trans
- Floating-Point Neural Network Operations
- tpu_bdc_fp_bias
- tpu_bdc_fp_scale
- tpu_bdc_fp_scale_bias
- tpu_bdc_fp_scale_bias_C
- tpu_bdc_fp_add_bias_sqr
- tpu_bdc_fp_add_C_sqr
- tpu_bdc_fp_sub_bias_sqr
- tpu_bdc_fp_sub_C_sqr
- tpu_bdc_fp_conv2d
- tpu_bdc_fp_conv2d_for_deconv2d
- tpu_bdc_fp_max_pool2d
- tpu_bdc_fp_ins_avg_pool2d
- tpu_bdc_fp_avg_pool2d
- tpu_bdc_fp_depthwise2d
- Fixed-Point Neural Network Operations
- Activation Functions
- Scatter and Gather Operations
- tpu_bdc_w_gather
- tpu_bdc_w_gather_exception
- tpu_bdc_w_scatter
- tpu_bdc_hw_gather
- tpu_bdc_hw_gather_exception
- tpu_bdc_hw_scatter
- tpu_bdc_batch_bcast_w_gather
- tpu_bdc_batch_bcast_w_gather_exception
- tpu_bdc_batch_bcast_w_scatter
- tpu_bdc_batch_bcast_w_mask_select
- tpu_bdc_batch_bcast_h_gather
- tpu_bdc_batch_bcast_h_gather_exception
- tpu_bdc_batch_bcast_h_scatter
- Special Functions
- tpu_bdc_fp_taylor
- tpu_bdc_table_lookup
- tpu_bdc_arithmetic_sequence_bcast
- tpu_bdc_arithmetic_sequence_distribute
- tpu_bdc_arithmetic_sequence_general
- tpu_bdc_load_fp32_exp_coeff
- tpu_bdc_load_fp32_exp_table
- tpu_bdc_load_fp32_log_coeff
- tpu_bdc_load_fp32_erf_coeff
- tpu_bdc_load_fp32_sin_coeff
- tpu_bdc_load_fp32_cos_coeff
- tpu_bdc_load_fp32_tan_coeff
- tpu_bdc_load_fp32_arcsin_coeff
- Quantization Operations
- HAU Operations