6. General YOLOv8 Model Deployment

6.1. Introduction

This document describes how to deploy a YOLOv8-architecture model on the CV181x development board. The main steps are:

  • convert the PyTorch version of the YOLOv8 model to an ONNX model

  • convert the ONNX model to the cvimodel format

  • finally, write code against the calling interface to obtain the inference results

6.2. Converting the .pt Model to ONNX

First, get the official YOLOv8 repository code from [ultralytics/ultralytics: NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite (github.com)](https://github.com/ultralytics/ultralytics):

git clone https://github.com/ultralytics/ultralytics.git

Then download the corresponding YOLOv8 model file, taking [yolov8n](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt) as an example, and place the downloaded yolov8n.pt under the ultralytics/weights/ directory, as shown below:

cd ultralytics && mkdir weights
cd weights
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt

Adjust the YOLOv8 output branches: remove the decoding part from the forward function and split the box and cls outputs of the three feature maps, yielding six output branches in total. This step can be done directly with the yolo_export script, as sketched below.
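For reference, here is a minimal sketch of what such a modified forward looks like, assuming the ultralytics Detect head layout (self.nl scales, a cv2 box branch and a cv3 cls branch); the actual yolov8_export.py script may differ in detail:

# Illustrative sketch only: a six-branch Detect.forward for export,
# returning the raw box/cls maps instead of decoded predictions.
def forward_export(self, x):
    outputs = []
    for i in range(self.nl):      # self.nl == 3 detection scales
        box = self.cv2[i](x[i])   # box regression map, shape (b, 4*reg_max, h, w)
        cls = self.cv3[i](x[i])   # class score map, shape (b, nc, h, w)
        outputs.extend([box, cls])
    return outputs                # 6 tensors: (box, cls) per scale, no decode/NMS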

The yolo_export scripts can be obtained via SFTP. Site: sftp://218.17.249.213, account: cvitek_mlir_2023, password: 7&2Wd%cu5k

Log in via SFTP and locate the corresponding file:

/home/公版深度学习SDK/yolo_export.zip

Copy yolo_export/yolov8_export.py into the YOLOv8 repository, then export the branch version of the ONNX model with the following command:

python yolov8_export.py --weights ./weights/yolov8n.pt

After running the command above, the yolov8n.onnx file is generated in the ./weights/ directory. The next step is to convert the ONNX model to a cvimodel.
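To sanity-check the export, you can list the model's outputs; a branch-version model should expose six of them. Below is a quick check with the onnx Python package (the output names depend on the export script):

import onnx

model = onnx.load("./weights/yolov8n.onnx")
# expect 6 outputs: a box map and a cls map for each of the 3 scales
print([o.name for o in model.graph.output])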

Tip

If the input is a 1080p video stream, it is recommended to change the model input size to 384x640, which reduces redundant computation and speeds up inference, as follows:

python yolov8_export.py --weights ./weights/yolov8n.pt --img-size 384 640

6.3. Converting the ONNX Model to cvimodel

The cvimodel conversion procedure is described in the ONNX-to-cvimodel part of the yolo-v5 porting chapter; a condensed sketch follows below.
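For orientation, the TPU-MLIR flow is sketched here with placeholder file names and a placeholder calibration dataset; flag names can vary slightly between TPU-MLIR versions, so follow the yolo-v5 chapter for the authoritative commands:

model_transform.py \
    --model_name yolov8n \
    --model_def yolov8n.onnx \
    --input_shapes [[1,3,640,640]] \
    --mean 0.0,0.0,0.0 \
    --scale 0.0039216,0.0039216,0.0039216 \
    --keep_aspect_ratio \
    --pixel_format rgb \
    --mlir yolov8n.mlir

run_calibration.py yolov8n.mlir \
    --dataset ./images \
    --input_num 100 \
    -o yolov8n_cali_table

model_deploy.py \
    --mlir yolov8n.mlir \
    --quantize INT8 \
    --calibration_table yolov8n_cali_table \
    --chip cv181x \
    --model yolov8n_cv181x_int8.cvimodel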

6.4. TDL_SDK Interface Description

The YOLOv8 preprocessing settings are configured as follows. YOLOv8 only rescales input pixels to [0, 1], so factor is set to 1/255 ≈ 0.003922 and mean to 0:

// set preprocess and algorithm parameters for yolov8 detection
// when using the official model, there is no need to change these parameters
CVI_S32 init_param(const cvitdl_handle_t tdl_handle) {
  // setup preprocess
  YoloPreParam preprocess_cfg =
      CVI_TDL_Get_YOLO_Preparam(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV8_DETECTION);

  for (int i = 0; i < 3; i++) {
    printf("assign val %d \n", i);
    preprocess_cfg.factor[i] = 0.003922;  // 1/255: rescale pixel values to [0, 1]
    preprocess_cfg.mean[i] = 0.0;         // yolov8 applies no mean subtraction
  }
  preprocess_cfg.format = PIXEL_FORMAT_RGB_888_PLANAR;

  printf("setup yolov8 param \n");
  CVI_S32 ret = CVI_TDL_Set_YOLO_Preparam(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV8_DETECTION,
                                          preprocess_cfg);
  if (ret != CVI_SUCCESS) {
    printf("Can not set yolov8 preprocess parameters %#x\n", ret);
    return ret;
  }

  // setup yolov8 algorithm parameters
  YoloAlgParam yolov8_param =
      CVI_TDL_Get_YOLO_Algparam(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV8_DETECTION);
  yolov8_param.cls = 80;  // number of classes (80 for COCO)

  printf("setup yolov8 algorithm param \n");
  ret =
      CVI_TDL_Set_YOLO_Algparam(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV8_DETECTION, yolov8_param);
  if (ret != CVI_SUCCESS) {
    printf("Can not set yolov8 algorithm parameters %#x\n", ret);
    return ret;
  }

  // set model confidence and NMS thresholds
  CVI_TDL_SetModelThreshold(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV8_DETECTION, 0.5);
  CVI_TDL_SetModelNmsThreshold(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV8_DETECTION, 0.5);

  printf("yolov8 algorithm parameters setup success!\n");
  return ret;
}

Inference test code (in the SDK sample, init_param above is called before the model is opened):

// argv[1] is the path of the yolov8 cvimodel
ret = CVI_TDL_OpenModel(tdl_handle, CVI_TDL_SUPPORTED_MODEL_YOLOV8_DETECTION, argv[1]);

if (ret != CVI_SUCCESS) {
  printf("open model failed with %#x!\n", ret);
  return ret;
}
printf("---------------------to do detection-----------------------\n");

VIDEO_FRAME_INFO_S bg;
// strf1 holds the path of the test image
ret = CVI_TDL_ReadImage(strf1.c_str(), &bg, PIXEL_FORMAT_RGB_888_PLANAR);
if (ret != CVI_SUCCESS) {
  printf("open img failed with %#x!\n", ret);
  return ret;
} else {
  printf("image read, width: %d\n", bg.stVFrame.u32Width);
  printf("image read, height: %d\n", bg.stVFrame.u32Height);
}
cvtdl_object_t obj_meta = {0};
CVI_TDL_YOLOV8_Detection(tdl_handle, &bg, &obj_meta);

std::cout << "objnum:" << obj_meta.size << std::endl;
std::stringstream ss;
ss << "boxes=[";
for (uint32_t i = 0; i < obj_meta.size; i++) {
  ss << "[" << obj_meta.info[i].bbox.x1 << "," << obj_meta.info[i].bbox.y1 << ","
    << obj_meta.info[i].bbox.x2 << "," << obj_meta.info[i].bbox.y2 << ","
    << obj_meta.info[i].classes << "," << obj_meta.info[i].bbox.score << "],";
}
ss << "]\n";
std::cout << ss.str();
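Note that the snippet above omits cleanup: the full TDL_SDK samples release the detection results and the image frame after use (CVI_TDL_Free and CVI_TDL_ReleaseImage in recent releases; the exact calls depend on your SDK version).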

6.5. Test Results

The official yolov8n and yolov8s models were converted and evaluated on the COCO2017 dataset with the following thresholds:

  • conf: 0.001

  • nms_thresh: 0.6

All model input resolutions are 640 x 640.

Performance of the yolov8n model with the official export method:

| Test platform | Inference time (ms) | Bandwidth (MB) | ION (MB) | mAP 0.5 | mAP 0.5-0.95 |
| --- | --- | --- | --- | --- | --- |
| pytorch | N/A | N/A | N/A | 53 | 37.3 |
| cv180x | ION allocation failed | ION allocation failed | 13.26 | ION allocation failed | ION allocation failed |
| cv181x | 54.91 | 44.16 | 8.64 | Quantization failed | Quantization failed |
| cv182x | 40.21 | 44.32 | 8.62 | Quantization failed | Quantization failed |
| cv183x | 17.81 | 40.46 | 8.3 | Quantization failed | Quantization failed |
| cv186x | 7.03 | 55.03 | 13.92 | Quantization failed | Quantization failed |

Performance of the yolov8n model with the TDL_SDK export method:

| Test platform | Inference time (ms) | Bandwidth (MB) | ION (MB) | mAP 0.5 | mAP 0.5-0.95 |
| --- | --- | --- | --- | --- | --- |
| onnx | N/A | N/A | N/A | 51.32 | 36.4577 |
| cv180x | 299 | 78.78 | 12.75 | 45.986 | 31.798 |
| cv181x | 45.62 | 31.56 | 7.54 | 51.2207 | 35.8048 |
| cv182x | 32.8 | 32.8 | 7.72 | 51.2207 | 35.8048 |
| cv183x | 12.61 | 28.64 | 7.53 | 51.2207 | 35.8048 |
| cv186x | 5.20 | 43.06 | 12.02 | 51.03 | 35.61 |

Performance of the yolov8s model with the official export method:

| Test platform | Inference time (ms) | Bandwidth (MB) | ION (MB) | mAP 0.5 | mAP 0.5-0.95 |
| --- | --- | --- | --- | --- | --- |
| pytorch | N/A | N/A | N/A | 61.8 | 44.9 |
| cv180x | Model conversion failed | Model conversion failed | Model conversion failed | Model conversion failed | Model conversion failed |
| cv181x | 144.72 | 101.75 | 17.99 | Quantization failed | Quantization failed |
| cv182x | 103 | 101.75 | 17.99 | Quantization failed | Quantization failed |
| cv183x | 38.04 | 38.04 | 16.99 | Quantization failed | Quantization failed |
| cv186x | 13.16 | 95.03 | 23.44 | Quantization failed | Quantization failed |

Performance of the yolov8s model with the TDL_SDK export method:

| Test platform | Inference time (ms) | Bandwidth (MB) | ION (MB) | mAP 0.5 | mAP 0.5-0.95 |
| --- | --- | --- | --- | --- | --- |
| onnx | N/A | N/A | N/A | 60.1534 | 44.034 |
| cv180x | Model conversion failed | Model conversion failed | Model conversion failed | Model conversion failed | Model conversion failed |
| cv181x | 135.55 | 89.53 | 18.26 | 60.2784 | 43.4908 |
| cv182x | 95.95 | 89.53 | 18.26 | 60.2784 | 43.4908 |
| cv183x | 32.88 | 58.44 | 16.9 | 60.2784 | 43.4908 |
| cv186x | 11.37 | 82.61 | 21.96 | 60.27 | 43.52 |