2. Introduction
2.1. Explanation of Terms
| Term | Description |
| --- | --- |
| BM1688/CV186AH | Two fifth-generation tensor processors launched by SOPHON for the field of deep learning |
| BM1684X | The fourth-generation tensor processor launched by SOPHON for the field of deep learning |
| BM1684 | The third-generation tensor processor launched by SOPHON for the field of deep learning |
| Intelligent Vision Deep Learning Processor | The neural network computing unit in BM1688/CV186AH/BM1684/BM1684X |
| VPSS | The video processing subsystem in BM1688/CV186AH, which includes the graphics computing acceleration unit (also known as VPP) |
| VPU | The video decoding unit in BM1688/CV186AH/BM1684/BM1684X |
| VPP | The graphics computing acceleration unit in BM1684/BM1684X |
| JPU | The JPEG image encoding and decoding unit in BM1688/CV186AH/BM1684/BM1684X |
| SDK | SOPHON's deep learning development toolkit for BM1688/CV186AH/BM1684/BM1684X |
| PCIe Mode | A working mode of BM1684/BM1684X in which the board serves as an acceleration device and customer algorithms run on an x86 host |
| SoC Mode | A working mode of BM1688/CV186AH/BM1684/BM1684X in which the device operates as an independent host and customer algorithms run directly on it |
| arm_pcie Mode | A working mode of BM1684/BM1684X in which the board carrying BM1684/BM1684X is plugged into an ARM server as a PCIe slave device and customer algorithms run on the ARM host |
| BMCompiler | An optimizing compiler for deep neural networks targeting the intelligent vision deep learning processors; it converts networks from various deep learning frameworks into instruction streams that run on the processor |
| BMRuntime | The inference interface library for the intelligent vision deep learning processors |
| BMCV | A hardware-accelerated interface library for graphics and vision computing |
| BMLib | A low-level library on top of the kernel driver that encapsulates device management, memory management, data transfer, API dispatch, A53 enablement, and power control |
| mlir | An intermediate model format generated by TPU-MLIR, used for model migration or quantization |
| BModel | A deep neural network model file format for the intelligent vision deep learning processors, containing the weights and instruction streams of the target network |
| BMLang | A high-level programming model for the intelligent vision deep learning processors that lets users develop without knowledge of the underlying hardware |
| TPUKernel | A development library based on the atomic operations of the intelligent vision deep learning processors (a set of interfaces encapsulating the BM1688/CV186AH/BM1684/BM1684X instruction set) |
| SAIL | The SOPHON Inference library, with Python/C++ interfaces, which further encapsulates BMCV, sophon-media, BMLib, BMRuntime, etc. |
| TPU-MLIR | A compiler project for the intelligent vision deep learning processors that converts pre-trained neural networks from different frameworks into BModels that run efficiently on SOPHON intelligent vision deep learning processors |
2.1.1. Operating Modes
The SOPHON BM168X series covers a variety of product forms from the edge to the core and supports two operating modes, each corresponding to different product forms. The specific information is as follows:
| BM168X | SoC Mode | PCIe Mode |
| --- | --- | --- |
| Independent operation | Yes; BM168X is an independent host and the algorithm runs on BM168X | No; the algorithm is deployed on an x86 or ARM host and inference runs on the PCIe card |
| External IO method | Gigabit Ethernet | PCIe interface |
| Corresponding products | Microserver/module | PCIe accelerator card |
2.1.2. Development and Runtime Environments
The development environment refers to the environment used for model conversion or verification, as well as program compilation and other development processes; the runtime environment refers to the environment used to deploy and run algorithm applications on a platform with SOPHON devices.
The development environment and the runtime environment may be unified (for example, an x86 host with a PCIe accelerator card serves as both), or they may be separated (for example, using an x86 host as the development environment to convert models and compile programs, and using an SoC Mode device to deploy and run the final algorithm application).
However, regardless of whether a SoC Mode or a PCIe Mode product is used, an x86 host is required as the development environment; the runtime environment can be any system platform that has been tested and supported.
2.1.3. Hardware Memory
Memory is a recurring concern when debugging BM168X applications. In particular, three concepts must be clearly distinguished: Global Memory, Host Memory, and Device Memory.
Global Memory: Refers to the off-processor DDR storage of BM168X. For BM1684 it is usually 12GB and can be customized up to 16GB; for BM1688/CV186AH, please refer to the product manual.
Device Memory and Host Memory (system memory): Depending on the type of BM168X product or its operating mode, device memory and host memory have different meanings:

| Mode | SoC Mode | PCIe Mode |
| --- | --- | --- |
| Product | SM5/SE5/SM7/SE7 | SC5/SC5H/SC5+/SC7FP75/SC7HP75 |
| Global Memory | Up to 4GB dedicated to the Tensor Computing Processor; up to 3GB dedicated to the VPU; up to 4GB dedicated to the VPP (remaining memory is used by the host Cortex-A53) | Up to 4GB dedicated to the Tensor Computing Processor; 4GB dedicated to the VPU; 4GB dedicated to the VPP/A53 |
| Host Memory | Memory of the host Cortex-A53 | Memory of the host machine |
| Device Memory | Memory allocated to the Tensor Computing Processor/VPP/VPU | Physical memory on the PCIe card (i.e., Global Memory) |
Memory synchronization is an important and easily overlooked issue that comes up frequently in application debugging. Both the sophon-opencv and sophon-ffmpeg frameworks provide memory synchronization functions. The BMCV API, by contrast, operates only on device memory, so no synchronization issue arises there; however, before calling a BMCV API, the input data must already be prepared in device memory. BMLib provides interfaces for transferring data between Host Memory and Global Memory, within Global Memory, and between the Global Memory of different devices. For more detailed information, please refer to the "BMLIB Development Reference Manual" and the "Multimedia Development Reference Manual".
2.1.4. BModel
BModel: A deep neural network model file format for the SOPHON intelligent vision deep learning processors, which contains the weights and instruction streams of the target network.
Stage: Models of the same network with different batch sizes can be combined into one BModel; each batch size corresponds to a different stage, and BMRuntime automatically selects the matching stage according to the input shape at inference time. Different networks can also be combined into one BModel and retrieved by network name.
Dynamic compilation and static compilation: Both dynamic and static compilation of models are supported, set via parameters during model conversion. A dynamically compiled BModel accepts, at runtime, any input shape no larger than the shape set at compilation; a statically compiled BModel accepts only the exact shape(s) set at compilation.
Note
Prefer statically compiled models: A dynamically compiled model requires the ARM9 microcontroller inside the BM168X to generate instructions for the intelligent vision deep learning processor at runtime based on the actual input shape, so it executes less efficiently than a statically compiled model. When possible, prefer statically compiled models, or statically compiled models built for multiple input shapes.
2.1.5. bm_image
BMCV: BMCV provides a set of machine vision libraries optimized for SOPHON Deep learning processors. By utilizing the Tensor Computing Processor and Video Post-Processing modules of the processor, it can perform operations such as color space conversion, scale transformation, affine transformation, perspective transformation, linear transformation, drawing frames, JPEG encoding and decoding, BASE64 encoding and decoding, NMS, sorting, feature matching, and more.
bm_image: BMCV APIs are all centered around bm_image; a bm_image object corresponds to one image. Users create bm_image objects through bm_image_create and pass them to the various BMCV functions. After use, bm_image_destroy must be called to destroy them.
BMImage: In the SAIL library, bm_image is encapsulated as BMImage. For more information, please refer to the "SOPHON-SAIL User Manual".
The following is the definition of the bm_image structure and related data formats:
```c
typedef enum bm_image_format_ext_ {
    FORMAT_YUV420P,
    FORMAT_YUV422P,
    FORMAT_YUV444P,
    FORMAT_NV12,
    FORMAT_NV21,
    FORMAT_NV16,
    FORMAT_NV61,
    FORMAT_RGB_PLANAR,
    FORMAT_BGR_PLANAR,
    FORMAT_RGB_PACKED,
    FORMAT_BGR_PACKED,
    FORMAT_RGBP_SEPARATE,
    FORMAT_BGRP_SEPARATE,
    FORMAT_GRAY,
    FORMAT_COMPRESSED
} bm_image_format_ext;

typedef enum bm_image_data_format_ext_ {
    DATA_TYPE_EXT_FLOAT32,
    DATA_TYPE_EXT_1N_BYTE,
    DATA_TYPE_EXT_4N_BYTE,
    DATA_TYPE_EXT_1N_BYTE_SIGNED,
    DATA_TYPE_EXT_4N_BYTE_SIGNED,
} bm_image_data_format_ext;

struct bm_image {
    int width;                           /* image width in pixels */
    int height;                          /* image height in pixels */
    bm_image_format_ext image_format;    /* pixel layout, e.g. FORMAT_NV12 */
    bm_image_data_format_ext data_type;  /* element type, e.g. DATA_TYPE_EXT_1N_BYTE */
    bm_image_private *image_private;     /* internal state managed by BMCV */
};
```