3.6. SDK Update Record

3.0.0

SDK

middleware

bm168x

bmcompiler & bmruntime

  • Add support for BM1684X

Quantization Tool

Sophon-inference

  • Add an interface to save input and output: set_dump_io_flag (python/c++)

  • Handle adds get_sn interface (python/c++)

  • Add multi-card inference engine MultiEngine (python)

  • Add putText and putText_inference (python/c++)

  • Add image_add_weighted interface (python/c++)

  • Add image_copy_to interface (python/c++)

  • Add image_copy_to_padding interface (python/c++)

  • Add set_decoder_env interface (python/c++) to customize ffmpeg parameters

  • PaddingAttr adds a constructor (python/c++)

  • Engine adds create_input_tensors_map interface (python/c++) to create input tensor map according to bmodel

  • Engine adds create_output_tensors_map interface (python/c++) to create output tensor map according to bmodel

  • Engine optimizes the handling of numpy input stored in Fortran order by speeding up the Fortran-to-contiguous conversion. On an ordinary PC, the conversion time for a 640x640 color image is shortened from 40 ms to 3.5 ms.

  • Add nms interface (python/c++)
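The Fortran-to-contiguous conversion mentioned in the list above can be illustrated with plain numpy. This is only a minimal sketch of what such a conversion means, not the Engine's internal implementation; the helper name to_c_contiguous is hypothetical and no timing is claimed.

```python
import numpy as np


def to_c_contiguous(image: np.ndarray) -> np.ndarray:
    """Return a C-contiguous (row-major) array holding the same values.

    Hypothetical helper for illustration only; it is not part of sail.
    """
    if image.flags["C_CONTIGUOUS"]:
        # Already row-major, so no copy is needed.
        return image
    # np.ascontiguousarray copies the data into row-major layout.
    return np.ascontiguousarray(image)


# A 640x640 3-channel image stored in Fortran (column-major) order.
fortran_image = np.asfortranarray(np.zeros((640, 640, 3), dtype=np.uint8))
fortran_image[10, 20, 1] = 255

c_image = to_c_contiguous(fortran_image)
print(fortran_image.flags["C_CONTIGUOUS"], c_image.flags["C_CONTIGUOUS"])
```

Only the memory layout changes; the element values and the shape stay the same.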

SoC Firmware

2.7.0

middleware

  • Bmcpu opencv is supported: in pcie mode, the cpu on the card executes the corresponding opencv functions. For the list of supported functions, see the Multimedia User's Manual.

  • Bmcv jpeg decoder interface automatically switches to turbojpeg soft decoding when it encounters a format that is not supported by hardware.

  • Add opencv capture.read_record interface, which decodes and records the stream at the same time.

  • Improved the robustness of opencv imread decoding: when the image contains errors, the decodable part of the image is still output as far as possible. This is controlled by the IMREAD_RETRY_SOFTDEC flag.

  • The GPL osip library used for sip in gb28181 is replaced with a self-developed sip library

  • Added fourcc support in opencv capture

  • opencv adds abi0/abi1 libraries on x86: abi0 for systems whose compiler is older than gcc5, such as centos, and abi1 for ubuntu systems

  • Fixed the decoding failure caused by insufficient secondary axi ram in some 4k videos

  • Enrich the vidmulti example by incorporating command line options for multiplexing

  • bmcv::toBMI adds a conversion of type 8SC3/8SC1

  • bmcv::dumpMat adds support for video compression formats

  • Improved robustness against erroneous streams. When the underlying layer is blocked by a code stream error, the avcodec_decode_video2 or send_packet/receive_frame interface returns -1 so that the upper-layer application can reconnect or handle the error

  • Bug fixes

bm168x

  • A53 is enabled by default when a PCIE driver is loaded.

  • Supports the PCIE virtual NIC function.

  • bm-smi: when a53 is enabled, bm-smi recovery reloads a53

  • Supports PCIE MIX MODE. In pcie mode, the complete soc functions are enabled. Inference and video codec tasks are completed in the soc environment, and pcie only serves as a communication channel.

bmcompiler and bmruntime

  • Added new operator support and optimizations

  • Improved the bmodel and bmruntime memory allocation mechanism to support running large models such as BasicVSR

  • Added a data statistics mode. After running export BMCOMPILER_STAT_ERR=1, the per-layer data similarity is collected when comparison is enabled, and compilation is not interrupted if some data exceeds the threshold

  • Code structure optimization: decoupling with bmlib, bmcompiler layer reconstruction, etc

  • bug fix
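The idea behind the data statistics mode listed above (collect per-layer similarity and keep going instead of stopping at the first threshold violation) can be sketched as follows. The source does not say which similarity metric or threshold bmcompiler uses, so cosine similarity, the 0.99 threshold, and the function names here are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(reference, actual):
    """Cosine similarity of two tensors (metric chosen as an assumption)."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    act = np.asarray(actual, dtype=np.float64).ravel()
    denom = np.linalg.norm(ref) * np.linalg.norm(act)
    if denom == 0.0:
        # Treat two all-zero tensors as identical.
        return 1.0 if np.array_equal(ref, act) else 0.0
    return float(np.dot(ref, act) / denom)


def collect_layer_stats(layers, threshold=0.99):
    """Record the similarity of every layer without raising.

    `layers` is an iterable of (name, reference, actual) tuples. Layers
    below the hypothetical `threshold` are flagged, but later layers are
    still compared.
    """
    report = []
    for name, reference, actual in layers:
        sim = cosine_similarity(reference, actual)
        report.append({"layer": name, "similarity": sim, "ok": sim >= threshold})
    return report


layers = [
    ("conv1", [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
    ("relu1", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
]
report = collect_layer_stats(layers)
for entry in report:
    print(entry["layer"], round(entry["similarity"], 4), entry["ok"])
```

A strict mode would raise on the first flagged layer; the statistics-style loop returns the full report so every layer can be inspected afterwards.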

Quantization Tool

  • New ubuntu16.04+python3.7 docker is supported

  • U-FrameWork optimizes the lmdb creation interface and provides the ufwio python installation package, which is easy to embed in the user's development environment to make lmdb.

  • The convert_imageset binary tool is removed; use the ufwio interface above to make lmdb. Python routines are provided that can be used directly or modified.

  • Quantization tool adds the MSE and PERCENTILE quantization algorithms.

  • auto-cali is integrated into the ufw whl package for easy dependency management. The code was refactored to optimize the one-click quantization call interface and add more automatic quantization strategies. One-click quantization is recommended for cv quantization.

  • Added a new version of the visualization tool; the old version coexists with it for now. The new version shows the network structure, is more convenient to operate, and is recommended.

  • bug-fix

Sophon-inference

  • Refactor headers to hide implementation details and improve compatibility

  • Fixed crop/resize/convert interface forcing the input BMImage with specified output format to BGR_PLANAR

  • Added the set_print_flag() switch interface for printing the main time consumption during inference

  • In Python, sail.Decoder removes the read_interface that takes bm_image as the return value

  • Added copy_from() and attach_from() interfaces to copy and attach data from BMImageArray.

  • Changed the initialization method of sail.Tensor in Python: the own_sys_data flag is added to allow a Tensor that consists only of device memory

  • The imwrite_ interface for bm_image is added to Bmcv

  • In Python, sail.Bmcv adds rectangle_() to draw rectangles on bm_image

  • In Python, sail.Bmcv adds the vpp_convert_format() interface

  • In Python, sail.Bmcv adds the convert_format() interface

  • In Python, sail.Bmcv adds the crop_and_resize_padding() interface

  • Added syntax hints for Python development

  • Fixed several documentation errors

SE5 firmware

  • SE5 v2 supported

  • Decoupled gate-related components (the pre-installed face capture and recognition application); for the Ubuntu20.04 release only

  • Ported the sophon system interface and restful api to the family box to provide customers with some system information (tpu/vpu/fan, etc.)

  • Port some basic commands of bm_bin from gate file system to family bucket box to improve user experience

  • oemconfig.ini is generated

  • Add the upgrade command (bm_upgrade_runtime), which downloads the latest upgrade package from the official website and installs it

  • qt5.14 lib is ported to box

  • SophUI(hdmi display interface), display information, change ip

documents

The module documents previously released with the SDK are now published in the document center on the official website, where you can view them online or download the PDFs.

2.6.0

bmlib&driver

  • Added support for the loongarch64 platform

  • Fixed an issue where firmware would overflow after 1M

bmvid / middleware, bmcv

  • Add the vpu/jpu usage information in proc

  • opencv videowrite adds direct support for rtsp/rtmp and adds examples

  • The multimedia_guide document has been significantly updated and the multimedia_faq document has been added

  • Updated and improved the bm-scale filter in ffmpeg: added csc conversions for more color formats and support for using the filter in normal mode

  • pcie opencv enables the bmcpu function and provides basic drawing operations for the A53, such as font, rectangle, and line

  • bmcv/vpp Adds support for argb colors

  • ffmpeg bmcodec adds support for loop decoding

  • Added support for the loongarch64 platform

  • Added the samples available on x86 Linux to the Windows, mips64, sw64, and loongarch64 platforms

  • Added various examples of opencv/ffmpeg to show switching calls between ffmpeg+bmcv

  • Bug fixes: supported jpeg encoding of cropped images, reduced the memory footprint of opencv video decoding, removed libbm_x264, removed the absolute path from opencvModule.cmake, etc

Sophon-inference

  • Fix bugs and improve stability

  • Added frame rate acquisition interface

  • Supports int32 model input types

  • Add a multi-process Demo

  • Document modification

nntc

  • Added bmneto to support onnx model compilation

  • bmnett supports the tensorflow 2.x and saved_model formats

  • 3d operators such as conv3d, deconv3d and related optimizations were added, and 3d models such as slowfast and 3dunet were supported

  • strideslice, transpose, depth2space and other optimizations were added to improve the performance of yolov5 related models

  • Optimize the timestep process to reduce compilation time

  • The python version supported by related tools has been upgraded to support Python 3.7

  • Fixed several bugs

cali

  • Document modification

  • Graph optimization is faster: the python implementation is replaced with a c++ implementation

  • Added quantization support for the reverse layer; lstm and batchmatmul layers are set to floating-point inference during quantization

  • bugfix

ufw

  • Added support for converting onnx models to umodel

  • Supports retention of user-defined input and output names

  • Several bug fixes

bsp

  • Update SM750 driver; Added XFS support

  • Added pcie mixed mode support; fixed an overflow caused by an oversized single gz file in the flashing package; upgraded the open source sdk and newos packaging scripts; added the lte modem service; fixed ethtool version issues

2.5.0

bmlib&driver

  • Windows SDK development support

  • Currently, the driver can be installed in debug mode and supports single-core and three-core cards

  • Basic functions of bm-smi are supported

  • Supports basic VPU/JPU codec functions

  • Support for bm_opencv, ffmpeg library acceleration

  • Supports the network inference runtime library

  • Added the statistics and display of the usage of vpu and jpu

  • Made an adaptation on the Sunway server

  • Changed the display interface of Bm-smi

  • Added protection against system crashes when executing the Bm-smi recovery operation

bmcv

  • Add two operators: bmcv_image_axpy, bmcv_image_laplacian

  • vpp padding support with stride.

bmvid / middleware

  • Supports the A53 opencv running mode in SC5 mode for functions including warp, affine, sobel, erode, dilate, morphologyEx, line, calcOpticalFlowPyrLK, OpticalFlowFarneback, calcHist, boxFilter, bilateralFilter, and gaussianBlur

  • Fixed an issue with rtsp mjpeg decoding

  • Fixed packaging issues with FLVS in hardware coding

  • ffmpeg supports lame mp3 decoding

  • ffmpeg supports the hls protocol

  • ffmpeg command line adds zero_copy option to support hardware-decoded yuv output

  • Extended platform support to Loongson, Sunway, and windows systems

  • Add vpu/jpu usage information to the proc output

  • bug fix

Sophon-inference

  • bug fix

  • Document correction

nntc

  • Added a user-defined tpu layer plug-in, so that a layer written with bmkernel can be inserted into the network

  • Added user-defined compilation optimization plug-ins and demo

  • The nms supports a maximum of 65536 detection boxes

  • bmnetu supports the gru layer

  • bmcpu/bmruntime adapted to windows/mips64/sw64

  • int8 model dynamic compilation supports conv3d/pooling3d

  • Performance optimization and bug fix

  • Document update

cali

  • Incorporate the leaky relu layer and other optimizations

  • Added an option to use maximum-value quantization

  • Document update

  • bug fix

ufw

  • Optimized memory usage, effectively reducing memory usage by 50%

  • The layer adds the Tag function to classify and describe functions of the Layer

  • Added semantic analysis of CFG

  • Fixed some bugs

bsp

  • Added the a53 soft reboot function to ATF. u-boot fixed the mcu watchdog timeout issue after the CLI is restarted. linux added efuse secure key protection and modified tpll switching mode

  • kernel disabled erratum 843419 and fixed the cpufreq bug

  • Added support for kernel5.4 and ubuntu20.04; compatible with both old and new its node names; added a dts for sm5 (incompatible!); debian adds device discovery tools for ethtool, gate, etc.; modified lpddr4 parameters; improved u-boot usb compatibility; deleted the duplicate realtek config in kernel4.9; added usb acm support

  • Added SE5 lite support; memory layout correction tool; fixed the low configuration of SM5 mini

  • Update SM750 driver; Added XFS support

2.4.0

bmlib&driver

  • Added a card management interface

  • Added mips tool chain and function support, reaching customer trial level

  • Added Windows SDK bmlib and some test case cmakelist files, driver and Bmlib platform adaptation code (not yet at the release level).

  • Fixed an issue where drivers could not be re-installed after A53 was enabled

  • Refactored how files are organized under proc on Linux

  • Refactored the bm-smi code and interface organization to support per-card display

  • Added soft reset function after hang for vpp module

  • Added an interface for retrieving memory heap information in bmlib

  • Fixed a failure to read the smbus temperature caused by the INTC iic2 interrupt being masked

bmcv

  • New features: put_text, draw_lines;

  • In PCIe mode, A53 can be enabled and the on-chip CPU is used to perform some operations.

bmvid / middleware

  • Added mips64 platform support

  • The assert judgment in the bmvid API is removed, and an error value is returned

  • Unify all jpeg decoded output to planar format

  • Added support for dumpBMImage for float32 int8 and other formats

  • Update osip library to improve compatibility and stability of gb28181

  • Adjusted the underlying scheduling so that multi-channel decoding is scheduled more evenly

  • The soc lifts the 8-way limit and supports 24-way video coding

  • Fix problems and enhance stability

Sophon-inference

  • Added base64 support to the Python interface

  • Enhanced Bmcv::imwrite to support more formats.

  • Some optimizations in BMImage processing

nntc

  • 【BMNETC】Power operator bugfix, support L2Normalize operator

  • 【BMNETC】Fixed a bug where the rpnproposal operator was calculated incorrectly

  • 【BMNETP】pytorch version upgraded to 1.8.1

  • 【BMNETP】support BERT model

  • 【BMNETT】Added support for trigonometric functions such as arcsin, arccos, arcsinh, arccosh, arctanh, cosh, sinh, and tan

  • 【BMNETU】The new layer supports: ShapeRange

  • 【BMNETU】The python interface supports int8 umodel splitting

  • 【BMNETU】daily test has been enhanced

  • 【BMNETU】The Tile layer supports two inputs

  • 【BMNETU】Fixed a bug in dynamic networks containing the reorg fix8b layer

  • 【BMCOMPILER】Refactored the graph filter optimizer

  • 【BMCOMPILER】Refactored the layer group optimizer into a pass structure

  • 【BMCOMPILER】Optimized the performance of the SILU fix8b activation, improving yolov5 performance

  • 【BMCOMPILER】Added support for the SIGN activation layer

  • 【BMCOMPILER】Added Timestep merge optimization and fixed 3IC optimization issues, which somewhat improves the performance of yolov3 and other detection networks

  • 【BMCOMPILER】TOPK operators now use TPU acceleration, improving the performance of networks containing TOPK

  • 【BMLANG】stride class computing performance optimization and coeff input support

  • 【BMLANG】condition_select computations support data broadcast

  • 【BMLANG】Support for Shape Tensor control input, and update bmlang demo rgb2yuv support for dynamic crop regions

  • 【BMLANG】Performance optimization of multiple masked_select operators using the same mask

  • 【BMLANG】The performance of the nms operator is improved by several times

  • 【BMRUNTIME】Removed direct exit calls; a c++ exception is now used to return an error value

  • 【BMRUNTIME】Support for Loongson mips

  • 【BMRUNTIME】Networks with FC layers also support dynamically variable resolution

  • 【BMPROFILE】Fixed layer and GDMA display errors

  • 【BMKERNEL】Supports mips64

  • 【BMKERNEL】Added BMRT + BMKERNEL running yolov3 backbone + post-processing demo

  • 【BACKEND 1684】Added the bmkernel demo of high-performance group topk

  • 【BACKEND 1684】Fixed an issue where dynamic networks containing reduce would hang after running many rounds

  • 【BACKEND 1684】Support for any multi-operator dynamic network

  • 【BACKEND 1684】Fixed a calculation error bug for 3d max pooling

  • 【BACKEND 1684】Optimized performance of 4N/1N conversion, some network performance will be improved

cali

  • Fixed bugs in layers such as batchnorm and leakyrelu

  • Add is_shape_layer tag to better support shape-related operations, and add shapeRange, expandDims, shapeAssign, and shapeCast layers to support int32 data

  • Reorganized the forward_with_float function to increase stability

  • Added the ADMM statistics threshold

  • Improve the auto_calib function

ufw

  • Networks containing control flows are supported by default

  • Supports networks with arbitrary dynamic shapes and networks that include run-time shape inference

  • Some bug fixes

  • Some layer functions are enhanced

bsp

  • Increased the ramdisk size in recovery mode; added a usb flashing function

  • Added support for SM5 min wide temperature version; fixed the temperature points and CPU frequency scaling mechanism of the wide-temperature board; adjusted the pcie rc initialization code to support boards without refclk0; spacc secure key function; slimmed u-boot and BL2; upgraded the flash update tool; enlarged the recovery partition to support OTA; bound pcie interrupts to cpu7; added the fl2000 re-enumeration function

  • Added the a53 soft reboot function to ATF

2.3.2

bmlib&driver

  • Added adaptation to mips

  • Added some libraries for the mips architecture

  • Fixed several SC5P bugs

  • Added some cmake code and scripts to support Windows compilation

  • Added support for SM5-W

bmcv

  • Fixed several corner case bugs;

  • Added interfaces: absdiff, threshold, fft, max_min, cal-hist;

  • Reorganized where the back-end code is stored: only a few apis are placed in itcm, and all new ones are placed in ddr.

bmvid

  • Supports Loongson platform compilation

middleware

  • Extend opencv support to 512 channels for video

  • ffmpeg osip library update

  • Fix bugs

  • Supports Loongson platform compilation

Sophon-inference

  • Fixed an issue where the URL in the document would not connect.

nntc

  • 【BMLANG】Added deconv’s perchannel int8 compute demo.

  • 【BMLANG】Select/Condition select supports int8.

  • 【BMLANG】The performance of the NMS is optimized

  • 【BMNETC】Added CONV3D and POOLING3D support.

  • 【BMNETT】Add user-defined input data and data types.

  • 【BMCOMPILER】Fixed some bugs to support the customer model.

  • 【BMCOMPILER】Optimized the wait time for running INT8 networks.

  • 【BMNETP】Added support for BMM, LAYER NORM, and EINSUM operators.

  • 【BMNETU】Improved unit testing.

  • 【BMPROFILE】Added the CSV export function.

  • 【BMPROFILE】Added the ability to parse Global memory operations.

  • 【BMRUNTIME】Adapted to Loongson.

  • 【BACKEND_1684】Optimized the time overhead for GDMA GEN CMD.

  • 【BACKEND_1684】Fixed back end bugs.

  • 【BACKEND_1684】Optimized group topk and full library topk demo.

cali

  • Fixed a bug in the batchnorm layer

  • Add the auto_calib function

ufw

  • Fixed several bugs

  • Enhanced stability of analysis tools

  • conv per-channel computation is supported

  • Removed some of the python apis for ufw blob data

bsp

  • The DDR data in the ATF is separated into a separate bin file

  • Switch to MCU watchdog

  • SM5 wide-temperature board support

  • BSP SDK open source related

2.3.1

bmlib&driver

  • Added a device management interface and updated the document

  • Added a MIPS architecture project

  • Fixed compilation warnings

  • Added adaptation to SC5P

  • Refactored the mechanism for updating chip and board temperature and tpu utilization

bmcv

  • sobel, gaussian blur, add-weighted, dct, yuv2hsv, and batch-topk operators are added.

  • Optimize and extend the nms to support a maximum of 65535 boxes and improve its performance;

  • Extended matmul to support float32 output;

  • bugfix
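For readers unfamiliar with the nms operator mentioned in the list above, greedy non-maximum suppression can be sketched in numpy as follows. This is a generic reference version written under stated assumptions, not the bmcv implementation: the function name, the [x1, y1, x2, y2] box layout, and the 0.5 IoU threshold are all chosen here for illustration.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns the indices of the boxes that are kept.

    Boxes are rows of [x1, y1, x2, y2] (layout assumed for this sketch).
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    if boxes.size == 0:
        return []
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    # Visit boxes from the highest score to the lowest.
    order = np.argsort(-scores, kind="stable")
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        iw = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = iw * ih
        union = areas[best] + areas[rest] - inter
        iou = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
        # Drop every box that overlaps the best box too much.
        order = rest[iou <= iou_threshold]
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of 81/119 and is suppressed, while the third box does not overlap and is kept.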

middleware

  • The 8UC1→8UC1, BORDER_DEFAULT case of the sobel interface is implemented with hardware acceleration

  • The 8UC1 input of the gaussian_blur interface is implemented with hardware acceleration

  • Added support for hikvision smart options

  • Optimized opencv font rendering in yuv

  • Update multimedia document

  • Improve stability and fix bugs

Sophon-inference

  • Fixed problems and improved stability

nntc

  • 【bmcompiler】Performance optimization for networks containing C axis CONCAT operators.

  • 【bmnetu】【bmnetp】Added support for 3D Conv/Pooling.

  • 【bmnetu】Added support for multi-input networks.

  • 【bmnett】Added the check ops function.

  • 【bmnetp】Added support for the Pytorch GRU/LSTM model.

  • 【bmnetp】Upgrade to support pytorch version 1.7.

  • 【bmlang】The Deconv operator supports 16bit output.

  • 【bmlang】Added performance optimized group topk business program demo.

  • 【bmlang】GEMM supports INT8 input /FP32 output and supports perchannel scale.

  • 【bmlang】Added a demo of deconv perchannel quantization calculation.

  • 【backend_1684】When the nms operator handles more than 1024 boxes, performance is better than in 2.3.0.

  • 【backend_1684】Improved performance of 1N and 4N data conversion.

  • 【all】Fixed bugs to improve stability.

ufw

  • float computing supports networks containing control flows

  • UFW python Blobs are compatible with the numpy data type

  • The UFW supports partially empty set input and computation

bsp

  • The perfetto tool is pre-installed

  • The deb packages required for compiling kernel modules are pre-installed on the board

  • VPP driver code optimization

  • The PMU was enabled

3.7. Release Note

22.09.02 release note:

  • TPU-NNTC: bugfix

  • TPU-KERNEL: Optimize soc and cmodel testing; bugfix

  • TPU-MLIR: None

  • TPU-PERF: None

  • sophon-mw: bugfix

  • libsophon: Improve bm1684 soc mode support; bugfix

  • sophon-img: bugfix

  • sophon-pipeline: None

22.10.01 release note:

  • TPU-NNTC: Send packages directly from nntoolchain

  • TPU-KERNEL: Add some CV operators

  • TPU-MLIR: Support for compiling caffe model

  • TPU-PERF: bugfix

  • sophon-mw: bugfix

  • libsophon: reorganized the bmvid directory structure; added ARM architecture support; bugfix

  • sophon-img: update the kernel version to 5.4.219, u-boot version to 2022.10; bugfix

  • sophon-pipeline: add yolov5 and video_stitch routines

  • sophon-sail:

  1. The following APIs have been added to Python:

     1. BMImage adds an interface to convert to opencv Mat: asmat()

     2. BMImage adds an interface to get the device it belongs to: get_device_id()

     3. BMImageArray adds an interface to get the device it belongs to: get_device_id()

     4. Bmcv adds a point-drawing interface for BMImage: drawPoint()

     5. Bmcv adds a point-drawing interface for bm_image: drawPoint_()

  2. The following APIs have been added to C++:

     1. BMImage adds an interface to get the device it belongs to: get_device_id()

     2. BMImageArray adds an interface to get the device it belongs to: get_device_id()

     3. Bmcv adds a point-drawing interface for BMImage: drawPoint()

     4. Bmcv adds a point-drawing interface for bm_image: drawPoint_()

  3. Decoder: fixed a problem where the Python GIL was not released during decoding

  • sophon-demo: Released for the first time, including 10 routines such as YOLOv5 and ResNet

  • sophon-rpc: Supports starting the on-board A53 in PCIe mode, and automatically enables the A53 after reboot by configuring udev rules

  • bmmonkey

  • bmpanda

22.11.01 release note:

  • TPU-NNTC: bugfix

  • TPU-KERNEL: Some cv operators samples have been added

  • TPU-MLIR: Support for hybrid quantization

  • TPU-PERF: bugfix

  • sophon-mw: bugfix; added memory allocation through bmlib and removed the old ION LIB allocation method

  • libsophon: centos rpm package support; centos abi0 support

  • sophon-img: changed the kernel to non-preemptive mode; bugfix; and added a documentation description of how to use custom software packages and how to modify kernel memory layout

  • sophon-pipeline: add retinaface routines

  • sophon-sail:

  1. crop_and_resize API adds support for BMCV_INTER_LINEAR and BMCV_INTER_BICUBIC methods

  2. delete the vpp_crop_padding interface because there is a problem with the interface implementation

  3. resize API adds support for BMCV_INTER_LINEAR and BMCV_INTER_BICUBIC methods

  4. crop_and_resize_padding API adds support for BMCV_INTER_LINEAR and BMCV_INTER_BICUBIC methods

  5. add a common preprocessing interface

  • sophon-demo: refactored the cpp/lprnet_bmcv routine of LPRNet

  • sophon-rpc: bugfix

  • bmmonkey

  • bmpanda

22.12.01 release note:

  • TPU-NNTC: bugfix

  • TPU-KERNEL: bugfix

  • TPU-MLIR: Tflite supports inception/yolo network

  • TPU-PERF: fix mlir model run error; bugfix

  • sophon-mw: bugfix; add A53 coding case; change ffbmcv case

  • libsophon: bugfix

  • sophon-img: enable nfsd function; bugfix

  • sophon-pipeline: add face_recognition and multi routines

  • sophon-sail:

  1. crop_and_resize interface adds configurable nearest neighbor algorithm, linear interpolation algorithm, bicubic interpolation algorithm

  2. resize interface adds configurable nearest neighbor algorithm, linear interpolation algorithm, bicubic interpolation algorithm

  3. crop_and_resize_padding interface adds configurable nearest neighbor algorithm, linear interpolation algorithm, bicubic interpolation algorithm

  4. bugfix

  • sophon-demo: bugfix; LPRNet/cpp/lprnet_bmcv replaces opencv decoding with ff_decode; refactored SSD-related routines

  • sophon-rpc: Support for uninstalling libraries; fix some bugs during installation.

  • bmmonkey

  • bmpanda

3.8. SDK known problems

This section describes known but unresolved SDK-related problems and provides temporary workarounds for users' reference.

None at present.