4.2. Overview of Transplant Development

4.2.1. Algorithm Transplant Process

Product development based on the Sophon BM168X chip goes through the following stages:

  1. Evaluation and Selection: Determine the product form to use according to the application scenario.

  2. Model Migration: Convert the model trained under the original deep learning framework into an FP32 BModel that runs on the BM168X platform, using BMLang to develop unsupported operators if necessary. Then use a quantization (calibration) dataset to generate an INT8 BModel, and test and tune until the INT8 BModel meets the accuracy requirements (a 4N-batch BModel is recommended for best performance). A compilation sketch follows this list.

  3. Algorithm Transplant: Port the pre-processing, inference, and post-processing of the existing algorithm onto the BM168X hardware acceleration interfaces.

  4. Program Transplant: Port the algorithm engine code, such as task management and resource scheduling, and the business code, such as logic processing, result presentation, and data pushing.

  5. Test and Tuning: Test network performance and accuracy, run stress tests, and perform in-depth optimization using network compilation, the quantization tools, multi-card/multi-core execution, task pipelining, etc.

  6. Deployment and Joint Tuning: Package and deploy the algorithm service on BM168X hardware products and tune its functionality against the business or integration platform in real-life scenarios; if necessary, adjust parameters in the production environment and collect data to further optimize the model.
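For the model migration step, FP32 compilation with the tpu-nntc toolchain can be driven from Python. The sketch below assumes a traced PyTorch model; the file names, input shape, and network name are placeholders, and keyword arguments may differ slightly between SDK releases, so treat it as illustrative rather than definitive.

```python
import bmnetp  # PyTorch front end of tpu-nntc; bmnetc/bmnett/bmneto cover other frameworks

# Compile a traced TorchScript model into an FP32 BModel for a BM168X target.
# "model.pt", the shape, and the output directory are placeholders.
bmnetp.compile(
    model="./model.pt",          # traced TorchScript model
    outdir="./compilation_out",  # compilation.bmodel is written here
    target="BM1684",             # BM168X family chip to compile for
    shapes=[[4, 3, 224, 224]],   # a 4N batch is recommended for best throughput
    net_name="demo_net",
    opt=2,                       # optimization level
    cmp=True,                    # compare TPU results against the original framework
)
```

INT8 quantization is then performed with the calibration tools in tpu-nntc against the quantization dataset; see the TPU-NNTC Development Reference Manual for the exact workflow.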

Before starting the migration, please ensure that you have downloaded, installed, and configured the required environment as described in section 1:

  • Host and hardware environment are ready

  • SDK compressed packages are correctly unpacked

  • If using a PCIe accelerator card, ensure that the driver is installed correctly and that the device is detected (a quick check is sketched below)
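A quick sanity check for PCIe mode can be scripted as below. The /dev/bm-sophon* device-node pattern and the bm-smi status tool reflect common SophonSDK driver installations; treat them as assumptions and adjust to your release if they differ.

```python
import glob
import subprocess

# Look for the Sophon PCIe device nodes created by the driver (assumed naming pattern).
devices = glob.glob("/dev/bm-sophon*")
print("Sophon device nodes:", devices if devices else "none found - check the driver installation")

# bm-smi is installed with the driver package and reports board/chip status;
# a failure here usually means the driver is not loaded correctly.
subprocess.run(["bm-smi"], check=False)
```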

4.2.2. Typical Video AI Analysis Tasks

A typical AI video analysis pipeline consists of: video source > video decoding > pre-processing > inference > post-processing > business logic > video/image encoding. Hardware acceleration support on Sophon devices for each step is as follows.

Algorithm Steps         | Hardware Acceleration Support | BM-OpenCV | BM-FFmpeg | Native Interface
Video/Photo Codec       | Support                       | Y         | Y         | BMCV (photo) / SAIL
Input Pre-processing    | Support                       | Y         | N         | BMCV / SAIL
Model Inference         | Support                       | N         | N         | BMRuntime / SAIL
Output Post-processing  | Partial Support               | N         | N         | BMCV
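To show how these stages map onto the Sophon interfaces in code, here is a minimal single-image sketch using the SAIL Python API with BM-OpenCV for decoding. The file paths, input size, and simple scaling pre-processing are placeholders for a real model's requirements, and the exact SAIL call signatures should be checked against the SAIL reference in your SDK release.

```python
import cv2                  # the SDK's BM-OpenCV keeps the standard cv2 interface
import numpy as np
import sophon.sail as sail  # SAIL Python binding

def run_once(bmodel_path, image_path, dev_id=0):
    """Decode -> pre-process -> inference for one image; post-processing is model specific."""
    # Load the BModel; SYSIO mode lets us feed and fetch plain numpy arrays.
    engine = sail.Engine(bmodel_path, dev_id, sail.IOMode.SYSIO)
    graph = engine.get_graph_names()[0]
    in_name = engine.get_input_names(graph)[0]

    # Decode (hardware accelerated when BM-OpenCV is used) and pre-process.
    img = cv2.imread(image_path)
    blob = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0
    blob = np.expand_dims(blob.transpose(2, 0, 1), 0)  # HWC -> NCHW

    # Inference; returns a dict mapping output names to numpy arrays.
    return engine.process(graph, {in_name: blob})
```

In a full pipeline, decoding would come from a video stream via BM-FFmpeg or BM-OpenCV, pre-processing would usually move to BMCV, and post-processing plus business logic would run on the CPU; the sophon-pipeline samples referenced below show the multi-threaded, multi-model version.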

  • For unsupported layers or operators, you can implement them with BMLang or run them on the ARM CPU and then fuse them into the BModel; refer to the BMLang_cpp Technical Reference Manual and the BMLang_python Technical Reference Manual in the SophonSDK/tpu-nntc/doc directory

  • For other algorithms that need TPU acceleration, TPUKernel development based on the TPU atomic operation interface can be used; see the TPUKernel User Development Documentation

  • For operator support and the models that have been tested, see the TPU-NNTC Development Reference Manual

  • For samples covering a single model or scenario, please refer to sophon-demo; sample code for pipelines that use multithreading to build multi-model inference tasks can be found in sophon-pipeline.