2. Introduction
2.1. Explanation of Terms
| Term | Description |
| --- | --- |
| BM1688/CV186AH | Two fifth-generation tensor processors launched by SOPHON for the field of deep learning |
| BM1684X | The fourth-generation tensor processor launched by SOPHON for the field of deep learning |
| BM1684 | The third-generation tensor processor launched by SOPHON for the field of deep learning |
| Intelligent Vision Deep Learning Processor | The neural network computing unit in BM1688/CV186AH/BM1684/BM1684X |
| VPSS | The video processing subsystem in BM1688/CV186AH, which includes the graphics computing acceleration unit (also known as VPP) |
| VPU | The video decoding unit in BM1688/CV186AH/BM1684/BM1684X |
| VPP | The graphics computing acceleration unit in BM1684/BM1684X |
| JPU | The JPEG image encoding and decoding unit in BM1688/CV186AH/BM1684/BM1684X |
| SDK | SOPHON's deep learning development toolkit for BM1688/CV186AH/BM1684/BM1684X |
| PCIe Mode | A working mode of BM1684/BM1684X in which the board serves as an acceleration device and customer algorithms run on an x86 host |
| SoC Mode | A working mode of BM1688/CV186AH/BM1684/BM1684X in which the device operates as an independent host and customer algorithms run directly on it |
| arm_pcie Mode | A working mode of BM1684/BM1684X in which the board carrying BM1684/BM1684X is plugged into an ARM server as a PCIe slave device and customer algorithms run on the ARM host |
| BMCompiler | An optimizing compiler for deep neural networks targeting the intelligent vision deep learning processors; it converts networks from various deep learning frameworks into instruction streams that run on the processor |
| BMRuntime | The inference interface library for the intelligent vision deep learning processors |
| BMCV | A hardware-accelerated interface library for graphics and vision computing |
| BMLib | A low-level library on top of the kernel driver that encapsulates device management, memory management, data transfer, API dispatch, A53 enablement, and power control |
| mlir | An intermediate model format generated by TPU-MLIR, used for model migration or quantization |
| BModel | A deep neural network model file format for the intelligent vision deep learning processors, containing the weights and instruction streams of the target network |
| BMLang | A high-level programming model for the intelligent vision deep learning processors that lets users develop without knowledge of the underlying hardware |
| TPUKernel | A development library based on the atomic operations of the intelligent vision deep learning processors (a set of interfaces encapsulating the BM1688/CV186AH/BM1684/BM1684X instruction set) |
| SAIL | The SOPHON Inference library, with Python/C++ interfaces, which further encapsulates BMCV, sophon-media, BMLib, BMRuntime, etc. |
| TPU-MLIR | A compiler project for the intelligent vision deep learning processors that converts pre-trained neural networks from different frameworks into BModels that run efficiently on SOPHON intelligent vision deep learning processors |
2.1.1. Operating Modes
The SOPHON BM168X series covers a variety of product forms from the edge to the core and supports two operating modes, each corresponding to different product forms. The specific information is as follows:
| BM168X | SoC Mode | PCIe Mode |
| --- | --- | --- |
| Independent operation | Yes; BM168X is an independent host and the algorithm runs on BM168X | No; the algorithm is deployed on an x86 or ARM host and inference runs on the PCIe card |
| External IO method | Gigabit Ethernet | PCIe interface |
| Corresponding products | Microserver/module | PCIe accelerator card |
2.1.2. Development and Runtime Environments
The development environment refers to the environment used for model conversion or verification, as well as program compilation and other development processes; the runtime environment refers to the environment used to deploy and run algorithm applications on a platform with SOPHON devices.
The development environment and the runtime environment may be unified (for example, an x86 host with a PCIe accelerator card serves as both), or they may be separated (for example, using an x86 host as the development environment to convert models and compile programs, and using an SoC Mode device to deploy and run the final algorithm application).
However, regardless of whether a SoC Mode or a PCIe Mode product is used, an x86 host is required as the development environment; the runtime environment can be any system platform that has been tested and supported.
2.1.3. Hardware Memory
Memory is a recurring concern when debugging BM168X applications. In particular, three concepts must be clearly distinguished: Global Memory, Host Memory, and Device Memory.
Global Memory: Refers to the off-processor DDR storage of BM168X. For BM1684 it is usually 12GB and can be customized up to 16GB; for BM1688/CV186AH, please refer to the product manual.
Device Memory and Host Memory (system memory): Depending on the type of BM168X product or its operating mode, device memory and host memory have different meanings:

| Mode | SoC Mode | PCIe Mode |
| --- | --- | --- |
| Product | SM5/SE5/SM7/SE7 | SC5/SC5H/SC5+/SC7FP75/SC7HP75 |
| Global Memory | Up to 4GB dedicated to the Tensor Computing Processor; up to 3GB dedicated to the VPU; up to 4GB dedicated to the VPP (remaining memory is used by the host Cortex-A53) | Up to 4GB dedicated to the Tensor Computing Processor; 4GB dedicated to the VPU; 4GB dedicated to the VPP/A53 |
| Host Memory | Memory of the host Cortex-A53 | Memory of the host machine |
| Device Memory | Memory allocated to the Tensor Computing Processor/VPP/VPU | Physical memory on the PCIe card (i.e., Global Memory) |
Memory synchronization is an important and easily overlooked issue that comes up frequently in application debugging. Both the sophon-opencv and sophon-ffmpeg frameworks provide memory synchronization functions. The BMCV API, by contrast, operates only on device memory, so no synchronization issue arises there; however, before calling a BMCV API, the input data must already be prepared in device memory. BMLib provides interfaces for transferring data between Host Memory and Global Memory, within Global Memory, and between the Global Memory of different devices. For more detailed information, please refer to the "BMLIB Development Reference Manual" and the "Multimedia Development Reference Manual".
2.1.4. BModel
BModel: A deep neural network model file format for the SOPHON intelligent vision deep learning processors, which contains the weights and instruction streams of the target network.
Stage: Models of the same network with different batch sizes can be combined into one BModel; each batch size corresponds to a different stage, and BMRuntime automatically selects the matching stage according to the input shape at inference time. Different networks can also be combined into one BModel and retrieved by network name.
Dynamic compilation and static compilation: Both dynamic and static compilation of models are supported, set via parameters during model conversion. A dynamically compiled BModel accepts, at runtime, any input shape no larger than the shape set at compilation; a statically compiled BModel accepts only the exact shape(s) set at compilation.
Note
Prefer statically compiled models: A dynamically compiled model requires the ARM9 microcontroller inside the BM168X to generate instructions for the intelligent vision deep learning processor at runtime based on the actual input shape, so it executes less efficiently than a statically compiled model. When possible, prefer statically compiled models, or statically compiled models built for multiple input shapes.
2.1.5. bm_image
BMCV: BMCV provides a set of machine vision libraries optimized for SOPHON Deep learning processors. By utilizing the Tensor Computing Processor and Video Post-Processing modules of the processor, it can perform operations such as color space conversion, scale transformation, affine transformation, perspective transformation, linear transformation, drawing frames, JPEG encoding and decoding, BASE64 encoding and decoding, NMS, sorting, feature matching, and more.
bm_image: BMCV APIs are all centered around bm_image; a bm_image object corresponds to one image. Users create bm_image objects through bm_image_create and pass them to the various BMCV functions. After use, bm_image_destroy must be called to destroy them.
BMImage: In the SAIL library, bm_image is encapsulated as BMImage. For more information, please refer to the "SOPHON-SAIL User Manual".
The following is the definition of the bm_image structure and related data formats:
```c
typedef enum bm_image_format_ext_ {
    FORMAT_YUV420P,
    FORMAT_YUV422P,
    FORMAT_YUV444P,
    FORMAT_NV12,
    FORMAT_NV21,
    FORMAT_NV16,
    FORMAT_NV61,
    FORMAT_RGB_PLANAR,
    FORMAT_BGR_PLANAR,
    FORMAT_RGB_PACKED,
    FORMAT_BGR_PACKED,
    FORMAT_RGBP_SEPARATE,
    FORMAT_BGRP_SEPARATE,
    FORMAT_GRAY,
    FORMAT_COMPRESSED
} bm_image_format_ext;

typedef enum bm_image_data_format_ext_ {
    DATA_TYPE_EXT_FLOAT32,
    DATA_TYPE_EXT_1N_BYTE,
    DATA_TYPE_EXT_4N_BYTE,
    DATA_TYPE_EXT_1N_BYTE_SIGNED,
    DATA_TYPE_EXT_4N_BYTE_SIGNED,
} bm_image_data_format_ext;

struct bm_image {
    int width;                           /* image width in pixels */
    int height;                          /* image height in pixels */
    bm_image_format_ext image_format;    /* pixel layout, e.g. FORMAT_NV12 */
    bm_image_data_format_ext data_type;  /* element type, e.g. DATA_TYPE_EXT_1N_BYTE */
    bm_image_private *image_private;     /* internal state managed by BMCV */
};
```