6.6. Model Inference
For details about the C interface, see the BMRUNTIME Development Reference Manual.
For a detailed description of the Python interface, see the SAIL User Development Manual.
BMRuntime reads the compilation output of BMCompiler (a .bmodel file) and drives its execution on the Sophon TPU chip. It provides a rich set of interfaces for users to migrate their algorithms.
BMRuntime implements the C/C++ interfaces, and the SAIL module implements the Python interface by encapsulating BMRuntime and BMLib. This chapter describes the commonly used C and Python interfaces:
BMLib interface: handle management, memory management, data transfer, API submission and synchronization, A53 enabling, and TPU working-frequency setting
C interface of BMRuntime
Python interfaces of BMLib and BMRuntime
6.6.1. C Interface of the BMLib Module
The BMLib interface is used for device management, including device memory management. It is not part of BMRuntime, but it needs to be used together with it.
The BMLib interface is a C language interface. The corresponding header file is bmlib_runtime.h, and the corresponding library is libbmlib.so.
BMLib provides many interfaces; this section describes the ones commonly used by applications.
bm_dev_request
Used to request a device and obtain a device handle. All other device interfaces require this handle. devid is the device ID: in PCIe mode it selects among multiple devices if more than one is present; in SoC mode it must be set to 0.
/**
 * @name bm_dev_request
 * @brief To create a handle for the given device
 * @ingroup bmlib_runtime
 *
 * @param [out] handle The created handle
 * @param [in] devid Specify on which device to create handle
 * @retval BM_SUCCESS Succeeds.
 *         Other code Fails.
 */
bm_status_t bm_dev_request(bm_handle_t *handle, int devid);
bm_dev_free
Used to release a device. Typically an application starts by requesting a device and releases it before exiting.
/**
 * @name bm_dev_free
 * @brief To free a handle
 * @param [in] handle The handle to free
 */
void bm_dev_free(bm_handle_t handle);
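A minimal sketch of the request/release pairing described above (device 0, with the return code checked):

#include "bmlib_runtime.h"

int main() {
  bm_handle_t handle;
  // Request device 0 (in SoC mode devid must be 0).
  if (bm_dev_request(&handle, 0) != BM_SUCCESS) {
    return -1;  // request failed
  }
  // ... use the handle with other BMLib/BMRuntime interfaces ...
  bm_dev_free(handle);  // release the device before exiting
  return 0;
}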
6.6.2. C Interface of the BMRuntime Module
The corresponding header file is bmruntime_interface.h, and the corresponding library is libbmrt.so.
User programs are advised to use this C interface. It supports statically compiled networks with multiple shapes as well as dynamically compiled networks.
bmrt_create
/**
 * @name bmrt_create
 * @brief To create the bmruntime with bm_handle.
 * This API creates the bmruntime. It returns a void* pointer, which is the
 * pointer to the bmruntime. The device id is set when the bm_handle is obtained.
 * @param [in] bm_handle bm handle. It must be initialized by using bmlib.
 * @retval void* the pointer of bmruntime
 */
void* bmrt_create(bm_handle_t bm_handle);
bmrt_destroy
/**
 * @name bmrt_destroy
 * @brief To destroy the bmruntime pointer
 * @ingroup bmruntime
 * This API destroys the bmruntime.
 * @param [in] p_bmrt Bmruntime that had been created
 */
void bmrt_destroy(void* p_bmrt);
bmrt_load_bmodel
After the bmodel file is loaded, the bmruntime holds the data of one or more networks, on which inference can then be run.
/**
 * @name bmrt_load_bmodel
 * @brief To load the bmodel which is created by BM compiler
 * This API is to load the bmodel created by the BM compiler.
 * After loading the bmodel, we can run inference of the neural network.
 * @param [in] p_bmrt Bmruntime that had been created
 * @param [in] bmodel_path Bmodel file path.
 * @retval true Load context success.
 * @retval false Load context failed.
 */
bool bmrt_load_bmodel(void* p_bmrt, const char *bmodel_path);
bmrt_load_bmodel_data
Loads a bmodel. Unlike bmrt_load_bmodel, the bmodel data is read from a buffer in memory rather than from a file.
/*
Parameters: [in] p_bmrt      - Bmruntime that had been created.
            [in] bmodel_data - Pointer to the bmodel data buffer.
            [in] size        - Bmodel data size.
Returns:    bool - true: success; false: failed.
*/
bool bmrt_load_bmodel_data(void* p_bmrt, const void * bmodel_data, size_t size);
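For illustration, a minimal sketch that reads a bmodel file into memory and loads it through this interface (the file name is a placeholder, and p_bmrt is assumed to have been created with bmrt_create):

#include <fstream>
#include <vector>

// Read the whole bmodel file into a buffer.
std::ifstream file("net.bmodel", std::ios::binary | std::ios::ate);  // hypothetical path
std::streamsize size = file.tellg();
file.seekg(0, std::ios::beg);
std::vector<char> bmodel_data(size);
file.read(bmodel_data.data(), size);

// Hand the in-memory buffer to bmruntime.
bool ret = bmrt_load_bmodel_data(p_bmrt, bmodel_data.data(), bmodel_data.size());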
bmrt_get_network_info
bmrt_get_network_info obtains the information of a network by its network name.
/* bm_stage_info_t holds input shapes and output shapes;
   every network can contain one or more stages */
typedef struct {
  bm_shape_t* input_shapes;   /* input_shapes[0] / [1] / ... / [input_num-1] */
  bm_shape_t* output_shapes;  /* output_shapes[0] / [1] / ... / [output_num-1] */
} bm_stage_info_t;

/* bm_net_info_t holds all information of one net */
typedef struct {
  const char* name;              /* net name */
  bool is_dynamic;               /* dynamic or static */
  int input_num;                 /* number of inputs */
  char const** input_names;      /* input_names[0] / [1] / ... / [input_num-1] */
  bm_data_type_t* input_dtypes;  /* input_dtypes[0] / [1] / ... / [input_num-1] */
  float* input_scales;           /* input_scales[0] / [1] / ... / [input_num-1] */
  int output_num;                /* number of outputs */
  char const** output_names;     /* output_names[0] / [1] / ... / [output_num-1] */
  bm_data_type_t* output_dtypes; /* output_dtypes[0] / [1] / ... / [output_num-1] */
  float* output_scales;          /* output_scales[0] / [1] / ... / [output_num-1] */
  int stage_num;                 /* number of stages */
  bm_stage_info_t* stages;       /* stages[0] / [1] / ... / [stage_num-1] */
} bm_net_info_t;
bm_net_info_t holds the overall information of a network, and bm_stage_info_t describes the different shapes the network supports.
/**
 * @name bmrt_get_network_info
 * @brief To get network info by net name
 * @param [in] p_bmrt Bmruntime that had been created
 * @param [in] net_name Network name
 * @retval bm_net_info_t* Pointer to the net info; it does not need to be freed
 *         by the user. If the net name is not found, NULL is returned.
 */
const bm_net_info_t* bmrt_get_network_info(void* p_bmrt, const char* net_name);
Sample code:
const char *model_name = "VGG_VOC0712_SSD_300X300_deploy";
std::string bmodel;  // path to the bmodel file
const char **net_names = NULL;
bm_handle_t bm_handle;
bm_dev_request(&bm_handle, 0);
void *p_bmrt = bmrt_create(bm_handle);
bool ret = bmrt_load_bmodel(p_bmrt, bmodel.c_str());
// Query the named network's info, then enumerate all loaded networks.
const bm_net_info_t *net_info = bmrt_get_network_info(p_bmrt, model_name);
int net_num = bmrt_get_network_number(p_bmrt);
bmrt_get_network_names(p_bmrt, &net_names);
for (int i = 0; i < net_num; i++) {
  // do something here
  ......
}
free(net_names);
bmrt_destroy(p_bmrt);
bm_dev_free(bm_handle);
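As a sketch of what the loop body might do, each name in net_names can be passed to bmrt_get_network_info and the returned bm_net_info_t inspected (printing is just an illustrative choice):

#include <cstdio>

// Inside the loop above: query the i-th network and print its basic info.
const bm_net_info_t *info = bmrt_get_network_info(p_bmrt, net_names[i]);
if (info != NULL) {
  printf("net %s: %s, %d input(s), %d output(s), %d stage(s)\n",
         info->name, info->is_dynamic ? "dynamic" : "static",
         info->input_num, info->output_num, info->stage_num);
}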
bmrt_shape_count
The interface declaration is as follows:
/*
number of shape elements; shape should not be NULL and num_dims should not be
larger than BM_MAX_DIMS_NUM
*/
uint64_t bmrt_shape_count(const bm_shape_t* shape);
Gets the number of elements of a shape. For example, if num_dims is 4, the result is dims[0]*dims[1]*dims[2]*dims[3].
The bm_shape_t structure is introduced below:
typedef struct {
  int num_dims;
  int dims[BM_MAX_DIMS_NUM];
} bm_shape_t;
bm_shape_t represents a tensor shape and supports tensors of up to eight dimensions. num_dims is the actual number of dimensions of the tensor, and dims holds the value of each dimension starting from dims[0]; for example, the four dimensions (n, c, h, w) correspond to (dims[0], dims[1], dims[2], dims[3]) respectively.
A constant shape can be initialized as follows:

bm_shape_t shape = {4, {4,3,228,228}};
bm_shape_t shape_array[2] = {
  {4, {4,3,28,28}},  // [0]
  {2, {2,4}},        // [1]
};
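For example, applying bmrt_shape_count to the first shape above yields its element count:

bm_shape_t shape = {4, {4, 3, 228, 228}};
// 4 * 3 * 228 * 228 = 623,808 elements
uint64_t count = bmrt_shape_count(&shape);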
bm_image_from_mat
// To use this function, enable the USE_OPENCV macro in include/bmruntime/bm_wrapper.hpp
/**
 * @name bm_image_from_mat
 * @brief Convert an OpenCV Mat object to a BMCV bm_image object
 * @param [in] in OpenCV Mat object
 * @param [out] out BMCV bm_image object
 * @retval true Launch success.
 * @retval false Launch failed.
 */
static inline bool bm_image_from_mat (cv::Mat &in, bm_image &out)

// @brief Convert multiple OpenCV Mat objects to multiple BMCV bm_image objects
static inline bool bm_image_from_mat (std::vector<cv::Mat> &in, std::vector<bm_image> &out)
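A minimal usage sketch for the single-image overload (the image path is a placeholder, and USE_OPENCV is assumed to be enabled as noted above):

#include <opencv2/opencv.hpp>
#include "bmruntime/bm_wrapper.hpp"

cv::Mat mat = cv::imread("test.jpg");  // hypothetical input image
bm_image image;
if (bm_image_from_mat(mat, image)) {
  // ... use `image` with BMCV or bm_inference ...
  bm_image_destroy(image);  // free it once it is no longer used
}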
bm_image_from_frame
/**
 * @name bm_image_from_frame
 * @brief Convert an ffmpeg AVFrame object to a BMCV bm_image object
 * @ingroup bmruntime
 *
 * @param [in] bm_handle the low level device handle
 * @param [in] in a read-only AVFrame
 * @param [out] out an uninitialized BMCV bm_image object;
 *              use the bm_image_destroy function to free it when you no longer use it.
 * @retval true Conversion success.
 * @retval false Conversion failed.
 */
static inline bool bm_image_from_frame (bm_handle_t &bm_handle,
                                        AVFrame &in,
                                        bm_image &out)
/**
 * @name bm_image_from_frame
 * @brief Convert ffmpeg AVFrames to BMCV bm_image objects
 * @ingroup bmruntime
 *
 * @param [in] bm_handle the low level device handle
 * @param [in] in a read-only ffmpeg AVFrame vector
 * @param [out] out an uninitialized BMCV bm_image vector;
 *              use the bm_image_destroy function to free the elements when you
 *              no longer use them.
 * @retval true Conversion success.
 * @retval false Conversion failed.
 */
static inline bool bm_image_from_frame (bm_handle_t &bm_handle,
                                        std::vector<AVFrame> &in,
                                        std::vector<bm_image> &out)
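For illustration, a minimal sketch of the single-frame overload, assuming frame points to an AVFrame that was already decoded elsewhere (for example by an FFmpeg decode pipeline) and bm_handle was obtained with bm_dev_request:

extern "C" {
#include <libavutil/frame.h>
}
#include "bmruntime/bm_wrapper.hpp"

// `bm_handle` and `frame` are assumed to exist already.
bm_image image;
if (bm_image_from_frame(bm_handle, *frame, image)) {
  // ... run BMCV processing or inference on `image` ...
  bm_image_destroy(image);  // free it once it is no longer used
}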
bm_inference
// To use this function, enable the USE_OPENCV macro in include/bmruntime/bm_wrapper.hpp
/**
 * @name bm_inference
 * @brief A blocking inference wrapper call
 * @ingroup bmruntime
 *
 * This API supports neural networks that are static-compiled or dynamic-compiled.
 * After calling this API, inference on the TPU is launched and the CPU
 * program is blocked until it finishes.
 * This API supports single input && single output, and is multi-thread safe.
 *
 * @param [in] p_bmrt Bmruntime that had been created
 * @param [in] input bm_image of the single-input data
 * @param [in] output Pointer to the single-output buffer
 * @param [in] net_name The name of the neural network
 * @param [in] input_shape single-input shape
 *
 * @retval true Launch success.
 * @retval false Launch failed.
 */
static inline bool bm_inference (void *p_bmrt,
                                 bm_image *input,
                                 void *output,
                                 bm_shape_t input_shape,
                                 const char *net_name)
// This API supports single input && multiple outputs, and is multi-thread safe.
static inline bool bm_inference (void *p_bmrt,
                                 bm_image *input,
                                 std::vector<void*> outputs,
                                 bm_shape_t input_shape,
                                 const char *net_name)
// This API supports multiple inputs && multiple outputs, and is multi-thread safe.
static inline bool bm_inference (void *p_bmrt,
                                 std::vector<bm_image*> inputs,
                                 std::vector<void*> outputs,
                                 std::vector<bm_shape_t> input_shapes,
                                 const char *net_name)
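Putting the pieces together, a hedged end-to-end sketch of the single-input/single-output overload; the bmodel path, image path, input shape, and output size are placeholders, and real code should check every return value:

#include <vector>
#include <opencv2/opencv.hpp>
#include "bmruntime/bm_wrapper.hpp"

bm_handle_t bm_handle;
bm_dev_request(&bm_handle, 0);
void *p_bmrt = bmrt_create(bm_handle);
bmrt_load_bmodel(p_bmrt, "net.bmodel");          // hypothetical bmodel path

// Convert the input image and describe its shape (hypothetical NCHW shape).
cv::Mat mat = cv::imread("test.jpg");            // hypothetical input image
bm_image image;
bm_image_from_mat(mat, image);
bm_shape_t input_shape = {4, {1, 3, 300, 300}};

// Output buffer sized for the network's output (hypothetical size).
std::vector<float> output(1 * 200 * 7);
bool ok = bm_inference(p_bmrt, &image, output.data(), input_shape,
                       "VGG_VOC0712_SSD_300X300_deploy");

bm_image_destroy(image);
bmrt_destroy(p_bmrt);
bm_dev_free(bm_handle);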
6.6.3. Python Interface
This section provides only a brief introduction to the interface functions used in the YOLOv5 use case.
See the SAIL User Development Manual for more interface definitions.
Engine
def __init__(tpu_id):
    """ Constructor does not load bmodel.
    Parameters
    ----------
    tpu_id : int
        TPU ID. You can use bm-smi to see available IDs.
    """
load
def load(bmodel_path):
    """ Load bmodel from file.
    Parameters
    ----------
    bmodel_path : str
        Path to bmodel
    """
set_io_mode
def set_io_mode(mode):
    """ Set IOMode for a graph.
    Parameters
    ----------
    mode : sail.IOMode
        Specified io mode
    """
get_graph_names
def get_graph_names():
    """ Get all graph names in the loaded bmodels.
    Returns
    -------
    graph_names : list
        Graph names list in loaded context
    """
get_input_names
def get_input_names(graph_name):
    """ Get all input tensor names of the specified graph.
    Parameters
    ----------
    graph_name : str
        Specified graph name
    Returns
    -------
    input_names : list
        All the input tensor names of the graph
    """
get_output_names
def get_output_names(graph_name):
    """ Get all output tensor names of the specified graph.
    Parameters
    ----------
    graph_name : str
        Specified graph name
    Returns
    -------
    output_names : list
        All the output tensor names of the graph
    """
sail.IOMode
# Input tensors are in system memory while output tensors are in device memory.
sail.IOMode.SYSI
# Input tensors are in device memory while output tensors are in system memory.
sail.IOMode.SYSO
# Both input and output tensors are in system memory.
sail.IOMode.SYSIO
# Both input and output tensors are in device memory.
sail.IOMode.DEVIO
sail.Tensor
def __init__(handle, shape, dtype, own_sys_data, own_dev_data):
    """ Constructor allocates system memory and device memory of the tensor.
    Parameters
    ----------
    handle : sail.Handle
        Handle instance
    shape : tuple
        Tensor shape
    dtype : sail.Dtype
        Data type
    own_sys_data : bool
        Indicator of whether to own system memory
    own_dev_data : bool
        Indicator of whether to own device memory
    """
get_input_dtype
def get_input_dtype(graph_name, tensor_name):
    """ Get the data type of an input tensor.
    Parameters
    ----------
    graph_name : str
        The specified graph name
    tensor_name : str
        The specified input tensor name
    Returns
    -------
    dtype : sail.Dtype
        Data type of the input tensor
    """
get_output_dtype
def get_output_dtype(graph_name, tensor_name):
    """ Get the data type of an output tensor.
    Parameters
    ----------
    graph_name : str
        The specified graph name
    tensor_name : str
        The specified output tensor name
    Returns
    -------
    dtype : sail.Dtype
        Data type of the output tensor
    """
process
def process(graph_name, input_tensors, output_tensors):
    """ Inference with provided input and output tensors.
    Parameters
    ----------
    graph_name : str
        The specified graph name
    input_tensors : dict {str : sail.Tensor}
        Input tensors managed by user
    output_tensors : dict {str : sail.Tensor}
        Output tensors managed by user
    """
get_input_scale
def get_input_scale(graph_name, tensor_name):
    """ Get scale of an input tensor. Only used for int8 models.
    Parameters
    ----------
    graph_name : str
        The specified graph name
    tensor_name : str
        The specified input tensor name
    Returns
    -------
    scale : float32
        Scale of the input tensor
    """
get_output_scale
def get_output_scale(graph_name, tensor_name):
    """ Get scale of an output tensor. Only used for int8 models.

    Parameters
    ----------
    graph_name : str
        The specified graph name
    tensor_name : str
        The specified output tensor name

    Returns
    -------
    scale : float32
        Scale of the output tensor
    """
get_input_shape
def get_input_shape(graph_name, tensor_name):
    """ Get the maximum dimension shape of an input tensor in a graph.
    There are cases where multiple input shapes exist under one input name;
    this API only returns the one with the maximum dimensions, so that memory
    can be allocated for the best performance.

    Parameters
    ----------
    graph_name : str
        The specified graph name
    tensor_name : str
        The specified input tensor name

    Returns
    -------
    tensor_shape : list
        The maximum dimension shape of the tensor
    """
get_output_shape
def get_output_shape(graph_name, tensor_name):
    """ Get the shape of an output tensor in a graph.

    Parameters
    ----------
    graph_name : str
        The specified graph name
    tensor_name : str
        The specified output tensor name

    Returns
    -------
    tensor_shape : list
        The shape of the tensor
    """