3. Deployment of YOLOv5 Model for General Use

3.1. Introduction

This document describes the process of deploying a YOLOv5-architecture model on the CV181x development board. The main steps are:

  • Convert the PyTorch (.pt) YOLOv5 model to an ONNX model

  • Convert the ONNX model to the cvimodel format

  • Finally, write a calling interface to obtain the inference results

3.2. Convert the .pt Model to ONNX

1. First, download the official YOLOv5 repository code from: [ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite](https://github.com/ultralytics/yolov5)

git clone https://github.com/ultralytics/yolov5.git

2. Obtain the YOLOv5 model in .pt format; for example, the YOLOv5s model can be downloaded from: [yolov5s](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt)

3. Modify the forward function of the Detect class in the yolov5/models/yolo.py file so that the latter part of YOLOv5 (the decoding) is left to the RISC-V CPU and the model outputs nine branches. This is referred to below as the TDL SDK export method.

In the official export, the model produces a single output and the post-processing is done inside the model.

The reason for the change is that the decoded output covers a relatively large coordinate range, which easily leads to quantization failure or poor quantization accuracy (a sketch of the CPU-side decoding is given after the code below).

def forward(self, x):
    z = []  # inference output
    for i in range(self.nl):
        x[i] = self.m[i](x[i])  # conv
        bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
        x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

        xywh, conf, score = x[i].split((4, 1, self.nc), 4)
        z.append(xywh[0])
        z.append(conf[0])
        z.append(score[0])
    return z
# The modified model outputs 9 branches: a box, conf and class tensor for each of the
# three detection scales (80x80, 40x40 and 20x20 grids for a 640x640 input):
# (3, 80, 80, 4)  - box
# (3, 80, 80, 1)  - conf
# (3, 80, 80, 80) - class
# (3, 40, 40, 4)  - box
# (3, 40, 40, 1)  - conf
# (3, 40, 40, 80) - class
# (3, 20, 20, 4)  - box
# (3, 20, 20, 1)  - conf
# (3, 20, 20, 80) - class
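
These nine branches are raw, undecoded outputs; decoding and NMS are then performed on the CPU side by the TDL SDK. For reference, below is a minimal NumPy sketch of the standard YOLOv5 decoding for one (box, conf, class) branch; it only illustrates the idea and is not the SDK's actual implementation:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_branch(box, conf, cls, anchors, stride):
    # box:     (na, ny, nx, 4)  raw xywh predictions
    # conf:    (na, ny, nx, 1)  raw objectness
    # cls:     (na, ny, nx, nc) raw class scores
    # anchors: (na, 2) anchor sizes in pixels for this stride
    gy, gx = np.meshgrid(np.arange(box.shape[1]), np.arange(box.shape[2]), indexing="ij")
    grid = np.stack((gx, gy), axis=-1).astype(np.float32)               # (ny, nx, 2) cell indices
    xy = (sigmoid(box[..., :2]) * 2 - 0.5 + grid) * stride              # box centers in pixels
    wh = (sigmoid(box[..., 2:4]) * 2) ** 2 * anchors[:, None, None, :]  # box sizes in pixels
    scores = sigmoid(conf) * sigmoid(cls)                               # per-class confidences
    return np.concatenate((xy, wh), axis=-1), scores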

4. Export the ONNX model using the officially provided export.py.

# --weights specifies the (relative) path of the weight file, and --include onnx selects ONNX as the export format
python export.py --weights ./yolov5s.pt --include onnx
# The generated onnx model is placed in the current directory
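
Optionally, you can verify that the export produced the expected nine output branches by inspecting the ONNX graph with the onnx Python package (a small sketch; the output names and dims depend on the export):

import onnx

model = onnx.load("yolov5s.onnx")
# With the TDL SDK export method there should be 9 outputs:
# box / conf / class tensors for each of the three detection scales.
for out in model.graph.output:
    dims = [d.dim_value for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)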

3.3. Preparing the Environment for Model Conversion

Converting the ONNX model to a cvimodel requires the TPU-MLIR release package. TPU-MLIR is the compiler project for the TPU/TDL processors.

TPU-MLIR toolkit download: the TPU-MLIR source code is available at https://github.com/sophgo/tpu-mlir, and interested developers are welcome to join the open-source community that maintains it. For this guide we only need the corresponding release toolkit, which can be downloaded from the TPU-MLIR forum on the SOPHGO developer website (https://developer.sophgo.com/thread/473.html); it is referred to below as the toolchain toolkit.

The TPU-MLIR project provides a complete toolchain that can transform pre-trained neural networks from different frameworks into files that can be executed efficiently on the TPU. Currently it supports direct conversion of ONNX and Caffe models; models from other frameworks need to be converted to ONNX first and then converted with the TPU-MLIR tool.
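
For example, a model built in plain PyTorch could first be exported to ONNX roughly as follows (a generic sketch using a torchvision model as a stand-in; YOLOv5 itself already provides export.py for this step):

import torch
import torchvision

# Any PyTorch model works here; resnet18 is used purely as a placeholder example.
model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=12,  # choose an opset supported by the toolchain
)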

The model conversion needs to be executed inside the specified Docker container, and the main procedure consists of two steps:

  • The first step uses model_transform.py to convert the original model into an MLIR file

  • The second step uses model_deploy.py to convert the MLIR file into a cvimodel

> If you need to convert to an INT8 model, you also need to run run_calibration.py before the second step to generate a calibration table, which is then passed to model_deploy.py

Docker Configuration

TPU-MLIR needs to be run inside its Docker environment. The Docker image can be pulled directly (which is relatively slow); refer to the following command:

docker pull sophgo/tpuc_dev:latest

Alternatively, you can obtain the Docker image from the TPU toolchain toolkit (which is faster) and load it into Docker:

docker load -i  docker_tpuc_dev_v2.2.tar.gz

If using Docker for the first time, you can execute the following commands for installation and configuration (only for the first time):

sudo apt install docker.io
sudo systemctl start docker
sudo systemctl enable docker
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

Enter Docker Environment

Ensure that the installation package is in the current directory, and then create a container in the current directory as follows:

docker run --privileged --name myname -v $PWD:/workspace -it sophgo/tpuc_dev:v2.2

The following steps assume that the user is currently in the /workspace directory inside the Docker container.

Loading the TPU-MLIR Toolkit and Preparing the Working Directory

The following operations need to be performed inside the Docker container.

[Extract the tpu_mlir toolkit] The following folders are created mainly for convenience of later management; you can also organize the files in whatever way you prefer.

Create a new folder tpu_mlir, extract the toolchain into the tpu_mlir/ directory, and set the environment variables:

## xxx in tpu-mlir_xxx.tar.gz is the version number; use the actual file name
mkdir tpu_mlir && cd tpu_mlir
cp tpu-mlir_xxx.tar.gz ./
tar zxf tpu-mlir_xxx.tar.gz
source tpu-mlir_xxx/envsetup.sh

[Copy the ONNX model] Create a working folder, using yolov5s as an example: create a folder named yolov5s and place the ONNX model (and a test image) in it:

mkdir yolov5s && cd yolov5s
## Copy the yolov5 onnx model exported in the previous section into the yolov5s directory
cp yolov5s.onnx ./
## Copy dog.jpg from the official repository into this directory for verification
cp dog.jpg ./

After the above preparation is complete, you can start converting the model.

3.4. ONNX to MLIR

If the model takes images as input, we need to know the model's preprocessing before converting it.

If the model uses preprocessed npz files as input, preprocessing does not need to be considered.

In this example, the yolov5 input image is RGB, and the corresponding mean and scale are:

  • mean: 0.0, 0.0, 0.0

  • scale: 0.0039216, 0.0039216, 0.0039216
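
These values correspond to YOLOv5's standard preprocessing, which simply scales pixel values from [0, 255] to [0, 1]. Conceptually, the normalization applied per channel is y = (x - mean) * scale, which can be sketched as follows (for illustration only):

import numpy as np

MEAN = np.array([0.0, 0.0, 0.0], dtype=np.float32)
SCALE = np.array([0.0039216, 0.0039216, 0.0039216], dtype=np.float32)  # ~ 1/255

def normalize(img_rgb):
    # img_rgb: HWC uint8 RGB image; returns float32 values in [0, 1]
    return (img_rgb.astype(np.float32) - MEAN) * SCALE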

The command for model conversion is as follows:

model_transform.py \
--model_name yolov5s \
--model_def yolov5s.onnx \
--input_shapes [[1,3,640,640]] \
--mean 0.0,0.0,0.0 \
--scale 0.0039216,0.0039216,0.0039216 \
--keep_aspect_ratio \
--pixel_format rgb \
--test_input ./dog.jpg \
--test_result yolov5s_top_outputs.npz \
--mlir yolov5s.mlir

For details of the model_transform.py parameters, please refer to the TPU-MLIR Quick Start Guide under tpu-mlir_xxx/doc/.

After the conversion to MLIR, a yolov5s_in_f32.npz file is also generated, which is the preprocessed input file for the model.
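
This npz file is an ordinary NumPy archive, so it can be inspected directly if you want to verify the preprocessed input (a small sketch; the array name inside the archive may differ):

import numpy as np

data = np.load("yolov5s_in_f32.npz")
for name in data.files:
    # Expect a float32 array shaped like the model input, e.g. (1, 3, 640, 640)
    print(name, data[name].shape, data[name].dtype)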

3.5. MLIR to INT8 Model

[Generate Calibration Table]

Before converting to an INT8 model, calibration needs to be run to obtain a calibration table; prepare roughly 100 to 1000 input images depending on the situation.

The calibration table is then used to generate the cvimodel. The images used for calibration should match the distribution of the training data as closely as possible.

## Here 100 images extracted from COCO2017 are used as the calibration dataset; other images are also acceptable, there is no mandatory requirement.
run_calibration.py yolov5s.mlir \
--dataset COCO2017 \
--input_num 100 \
-o yolov5s_cali_table

After the run completes, a file named yolov5s_cali_table is generated, which is used as an input file for the subsequent compilation of the cvimodel.

[Generate cvi-model]

Then generate the INT8 symmetric quantized cvimodel by executing the following command:

The --quant_output parameter indicates that the output layers are also quantized to INT8; if this parameter is not added, the output layers are kept as float32.

According to the subsequent test results, quantizing the output layers to INT8 reduces ION memory usage and improves inference speed, while the detection accuracy does not drop significantly. It is therefore recommended to add the --quant_output parameter.

model_deploy.py \
--mlir yolov5s.mlir \
--quant_input \
--quant_output \
--quantize INT8 \
--calibration_table yolov5s_cali_table \
--processor cv181x \
--test_input yolov5s_in_f32.npz \
--test_reference yolov5s_top_outputs.npz \
--tolerance 0.85,0.45 \
--model yolov5_cv181x_int8_sym.cvimodel

For the main parameters of model_deploy.py, please refer to the TPU-MLIR Quick Start Guide under tpu-mlir_xxx/doc/.

After compilation, a file named yolov5_cv181x_int8_sym.cvimodel is generated.

Once the above steps have been completed successfully, the cvimodel compilation is done, and the TDL SDK can then call the exported cvimodel to run YOLOv5 object detection inference.

3.6. TDL SDK Interface Description

The TDL SDK toolkit can be obtained by contacting us.

The integrated YOLOv5 interface exposes the preprocessing settings as well as the anchor, confidence (conf), and NMS threshold settings of the YOLOv5 algorithm.

The structure used for the preprocessing settings is YoloPreParam:

/** @struct YoloPreParam
*  @ingroup core_cvitdlcore
*  @brief Config the yolov5 detection preprocess.
*  @var YoloPreParam::factor
*  Preprocess factor, one dimension matrix, r g b channel
*  @var YoloPreParam::mean
*  Preprocess mean, one dimension matrix, r g b channel
*  @var YoloPreParam::rescale_type
*  Preprocess config, vpss rescale type config
*  @var YoloPreParam::pad_reverse
*  Preprocess padding config
*  @var YoloPreParam::keep_aspect_ratio
*  Preprocess config, whether to keep the aspect ratio when resizing
*  @var YoloPreParam::use_quantize_scale
*  Preprocess config, use the quantize scale of the model
*  @var YoloPreParam::use_crop
*  Preprocess config, config crop
*  @var YoloPreParam::resize_method
*  Preprocess resize method config
*  @var YoloPreParam::format
*  Preprocess pixel format config
*/
typedef struct {
  float factor[3];
  float mean[3];
  meta_rescale_type_e rescale_type;
  bool pad_reverse;
  bool keep_aspect_ratio;
  bool use_quantize_scale;
  bool use_crop;
  VPSS_SCALE_COEF_E resize_method;
  PIXEL_FORMAT_E format;
} YoloPreParam;

Here is a simple setup example:

  • Initialize the preprocessing settings YoloPreParam and the YOLOv5 model settings YoloAlgParam, and pass them in with CVI_TDL_Set_YOLOV5_Param.

  • YOLOv5 is an anchor-based detection algorithm. For ease of use, the anchor settings are customizable. When filling in YoloAlgParam, note that the anchors and strides must correspond one-to-one, otherwise the inference results may be wrong.

  • A custom number of classes is also supported: if the model's number of output classes has been modified, YoloAlgParam.cls needs to be set to the modified number of classes.

  • The YoloPreParam and YoloAlgParam fields that appear in the following code must not be left empty.

  • Then open the model with CVI_TDL_OpenModel.

  • After the model has been opened, the corresponding confidence and NMS thresholds can be set:

    • CVI_TDL_SetModelThreshold sets the confidence threshold, which defaults to 0.5.

    • CVI_TDL_SetModelNmsThreshold sets the NMS threshold, which defaults to 0.5.

// yolo preprocess setup
YoloPreParam p_preprocess_cfg;
for (int i = 0; i < 3; i++) {
    p_preprocess_cfg.factor[i] = 0.003922;
    p_preprocess_cfg.mean[i] = 0.0;
}
p_preprocess_cfg.use_quantize_scale = true;
p_preprocess_cfg.format = PIXEL_FORMAT_RGB_888_PLANAR;

// setup yolov5 param
YoloAlgParam p_yolov5_param;
uint32_t p_anchors[3][3][2] = {{{10, 13}, {16, 30}, {33, 23}},
                            {{30, 61}, {62, 45}, {59, 119}},
                            {{116, 90}, {156, 198}, {373, 326}}};
p_yolov5_param.anchors = &p_anchors;
uint32_t strides[3] = {8, 16, 32};
p_yolov5_param.strides = &strides;
p_yolov5_param.anchor_len = 3;
p_yolov5_param.stride_len = 3;
p_yolov5_param.cls = 80;

printf("setup yolov5 param \n");
ret = CVI_TDL_Set_YOLOV5_Param(tdl_handle, &p_preprocess_cfg, &p_yolov5_param);
if (ret != CVI_SUCCESS) {
    printf("Can not set Yolov5 parameters %#x\n", ret);
    return ret;
}

ret = CVI_TDL_OpenModel(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV5, model_path.c_str());
if (ret != CVI_SUCCESS) {
    printf("open model failed %#x!\n", ret);
    return ret;
}

// set thresholds for yolov5
CVI_TDL_SetModelThreshold(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV5, 0.5);
CVI_TDL_SetModelNmsThreshold(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV5, 0.5);

3.7. Compilation Instructions

  1. Obtain the cross-compilation tools

wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/23/03/07/16/host-tools.tar.gz
tar xvf host-tools.tar.gz
cd host-tools
export PATH=$PATH:$(pwd)/gcc/riscv64-linux-musl-x86_64/bin
  2. Download the TDL SDK

The download site for the TDL SDK toolkit is sftp://218.17.249.213 (account: cvitek_mlir_2023, password: 7&2Wd%cu5k).

We download the cvitek_tdl_sdk_1227.tar.gz file.

  3. Compile the TDL SDK

We enter the sample directory under cvitek_tdl_sdk.

chmod 777 compile_sample.sh
./compile_sample.sh
  4. After compilation, connect to the development board and execute the program:

    • Connect the development board to the network, ensuring that the board and computer are on the same gateway.

    • Connect the computer to the development board via the serial port, set the baud rate to 115200, and enter ifconfig in the serial console to obtain the development board's IP address.

    • Connect to the development board using an SSH remote tool to the corresponding IP address, with the default username: root, and the default password: cvitek_tpu_sdk.

    • After connecting to the development board, you can mount an SD card or a computer folder:
      • The command to mount the SD card is:

      mount /dev/mmcblk0 /mnt/sd
      # or
      mount /dev/mmcblk0p1 /mnt/sd
      
      • The command to mount a computer folder is:

      mount -t nfs 10.80.39.3:/sophgo/nfsuser ./admin1_data -o nolock
      

      Be sure to change the IP address to your computer’s IP and modify the path to your own path accordingly.

  5. Export the Dynamic Dependency Libraries

The main dynamic dependency libraries required are:

  • lib under the ai_sdk directory

  • lib under the tpu_sdk directory

  • middleware/v2/lib

  • middleware/v2/3rd

  • lib under the ai_sdk/sample/3rd directory

Example as follows:

export LD_LIBRARY_PATH=/tmp/lfh/cvitek_tdl_sdk/lib:\
                        /tmp/lfh/cvitek_tdl_sdk/sample/3rd/opencv/lib:\
                        /tmp/lfh/cvitek_tdl_sdk/sample/3rd/tpu/lib:\
                        /tmp/lfh/cvitek_tdl_sdk/sample/3rd/ive/lib:\
                        /tmp/lfh/cvitek_tdl_sdk/sample/3rd/middleware/v2/lib:\
                        /tmp/lfh/cvitek_tdl_sdk/sample/3rd/lib:\
                        /tmp/lfh/cvitek_tdl_sdk/sample/3rd/middleware/v2/lib/3rd:

Caution

Be sure to change /tmp/lfh to a path accessible by the development board. If you are using an SD card mount, you can copy all the necessary files from the lib directories into one folder in advance, and then export the corresponding path on the SD card.

  6. Run the Sample Program

  • Switch to the mounted cvitek_tdl_sdk/bin directory.

  • Then run the following test case:

./sample_yolov5 /path/to/yolov5s.cvimodel /path/to/test.jpg

Be mindful to select your own cvimodel and the mounted path for the test image when running the above command.

Inference and Result Acquisition

Obtain an image locally or from a stream, read it with the CVI_TDL_ReadImage function, and then call the YOLOv5 inference interface CVI_TDL_Yolov5.

The inference results are stored in the obj_meta structure; traverse it to obtain the top-left and bottom-right coordinates and the score of each bounding box (x1, y1, x2, y2, score), as well as the class index classes.

VIDEO_FRAME_INFO_S fdFrame;
ret = CVI_TDL_ReadImage(img_path.c_str(), &fdFrame, PIXEL_FORMAT_RGB_888);
std::cout << "CVI_TDL_ReadImage done!\n";

if (ret != CVI_SUCCESS) {
    std::cout << "Convert out video frame failed with :" << ret << ".file:" << str_src_dir
              << std::endl;
}

cvtdl_object_t obj_meta = {0};

CVI_TDL_Yolov5(tdl_handle, &fdFrame, &obj_meta);

for (uint32_t i = 0; i < obj_meta.size; i++) {
    printf("detect res: %f %f %f %f %f %d\n", obj_meta.info[i].bbox.x1,
                              obj_meta.info[i].bbox.y1,
                              obj_meta.info[i].bbox.x2,
                              obj_meta.info[i].bbox.y2,
                              obj_meta.info[i].bbox.score,
                              obj_meta.info[i].classes);
}

3.8. Test Result

The following are the results of testing the converted official YOLOv5 models on the COCO2017 dataset, using a CV1811H_WEVB_0007A_SPINOR board as the test platform.

The thresholds used for the following tests are:

  • Conf Threshold: 0.001

  • NMS Threshold: 0.65

The input resolution is 640 x 640 in all cases.

Performance of the officially exported YOLOv5s ONNX model:

| platform | Inference time (ms) | bandwidth (MB) | ION (MB) | mAP 0.5 | mAP 0.5-0.95 |
|----------|---------------------|----------------|----------|---------|--------------|
| pytorch  | N/A | N/A | N/A | 56.8 | 37.4 |
| cv181x   | 92.8 | 100.42 | 16.01 | Quantization failure | Quantization failure |
| cv182x   | 69.89 | 102.74 | 16 | Quantization failure | Quantization failure |
| cv183x   | 25.66 | 73.4 | N/A | Quantization failure | Quantization failure |

Performance of the YOLOv5s ONNX model exported with the TDL SDK method:

| platform | Inference time (ms) | bandwidth (MB) | ION (MB) | mAP 0.5 | mAP 0.5-0.95 |
|----------|---------------------|----------------|----------|---------|--------------|
| onnx     | N/A | N/A | N/A | 55.4241 | 36.6361 |
| cv181x   | 87.76 | 85.74 | 15.8 | 54.204 | 34.3985 |
| cv182x   | 65.33 | 87.99 | 15.77 | 54.204 | 34.3985 |
| cv183x   | 22.86 | 58.38 | 14.22 | 54.204 | 34.3985 |

Performance of the officially exported YOLOv5m ONNX model:

| platform | Inference time (ms) | bandwidth (MB) | ION (MB) | mAP 0.5 | mAP 0.5-0.95 |
|----------|---------------------|----------------|----------|---------|--------------|
| pytorch  | N/A | N/A | N/A | 64.1 | 45.4 |
| cv181x   | ION allocation failure | ION allocation failure | 35.96 | Quantization failure | Quantization failure |
| cv182x   | 180.85 | 258.41 | 35.97 | Quantization failure | Quantization failure |
| cv183x   | 59.36 | 137.86 | 30.49 | Quantization failure | Quantization failure |

Performance of the YOLOv5m ONNX model exported with the TDL SDK method:

| platform | Inference time (ms) | bandwidth (MB) | ION (MB) | mAP 0.5 | mAP 0.5-0.95 |
|----------|---------------------|----------------|----------|---------|--------------|
| onnx     | N/A | N/A | N/A | 62.770 | 44.4973 |
| cv181x   | N/A | N/A | 35.73 | ION allocation failure | ION allocation failure |
| cv182x   | 176.04 | 243.62 | 35.74 | 61.5907 | 42.0852 |
| cv183x   | 56.53 | 122.9 | 30.27 | 61.5907 | 42.0852 |