6. Mixed Precision
This chapter uses yolov3 tiny as an example to introduce how to use mixed precision. The model comes from <https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/tiny-yolov3>.
This chapter requires the following files (where xxxx corresponds to the actual version information):
tpu-mlir_xxxx.tar.gz (The release package of tpu-mlir)
6.1. Load tpu-mlir
The following operations need to be in a Docker container. For the use of Docker, please refer to Setup Docker Container.
$ tar zxf tpu-mlir_xxxx.tar.gz
$ source tpu-mlir_xxxx/envsetup.sh
envsetup.sh adds the following environment variables:
| Name | Value | Explanation |
|---|---|---|
| TPUC_ROOT | tpu-mlir_xxxx | The location of the SDK package after decompression |
| MODEL_ZOO_PATH | ${TPUC_ROOT}/../model-zoo | The location of the model-zoo folder, at the same level as the SDK |
envsetup.sh modifies the environment variables as follows:
export PATH=${TPUC_ROOT}/bin:$PATH
export PATH=${TPUC_ROOT}/python/tools:$PATH
export PATH=${TPUC_ROOT}/python/utils:$PATH
export PATH=${TPUC_ROOT}/python/test:$PATH
export PATH=${TPUC_ROOT}/python/samples:$PATH
export LD_LIBRARY_PATH=$TPUC_ROOT/lib:$LD_LIBRARY_PATH
export PYTHONPATH=${TPUC_ROOT}/python:$PYTHONPATH
export MODEL_ZOO_PATH=${TPUC_ROOT}/../model-zoo
6.2. Prepare working directory
Create a yolov3_tiny directory at the same level as tpu-mlir, and put both the model file and the image files into it.
The operation is as follows:
$ mkdir yolov3_tiny && cd yolov3_tiny
$ wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/tiny-yolov3/model/tiny-yolov3-11.onnx
$ cp -rf $TPUC_ROOT/regression/dataset/COCO2017 .
$ mkdir workspace && cd workspace
$TPUC_ROOT is an environment variable corresponding to the tpu-mlir_xxxx directory.
6.3. Sample for onnx
detect_yolov3.py is a Python program that runs the yolov3_tiny model.
The operation is as follows:
$ detect_yolov3.py \
--model ../tiny-yolov3-11.onnx \
--input ../COCO2017/000000366711.jpg \
--output yolov3_onnx.jpg
The printed results are as follows:
person:60.7%
orange:77.5%
The result image yolov3_onnx.jpg is also generated, as shown below (yolov3_tiny ONNX):
Fig. 6.1 yolov3_tiny ONNX
6.4. To INT8 symmetric model
6.4.1. Step 1: To F32 mlir
$ model_transform.py \
--model_name yolov3_tiny \
--model_def ../tiny-yolov3-11.onnx \
--input_shapes [[1,3,416,416]] \
--scale 0.0039216,0.0039216,0.0039216 \
--pixel_format rgb \
--keep_aspect_ratio \
--pad_value 128 \
--output_names=transpose_output1,transpose_output \
--mlir yolov3_tiny.mlir
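Here, --scale 0.0039216 is 1/255, i.e. pixel values are normalized to [0, 1], and --keep_aspect_ratio together with --pad_value 128 letterboxes the input to 416x416. The following is a minimal Python sketch of equivalent preprocessing; it is an illustration rather than the tool's actual code, and the exact padding placement may differ:

import numpy as np
from PIL import Image

def preprocess(path, size=416, pad_value=128, scale=0.0039216):
    # Letterbox: resize keeping aspect ratio, pad the remainder with pad_value
    img = Image.open(path).convert("RGB")  # pixel_format rgb
    ratio = min(size / img.width, size / img.height)
    new_w, new_h = int(img.width * ratio), int(img.height * ratio)
    canvas = Image.new("RGB", (size, size), (pad_value,) * 3)
    canvas.paste(img.resize((new_w, new_h)), ((size - new_w) // 2, (size - new_h) // 2))
    # Scale pixel values by 1/255 and lay out as NCHW to match [[1,3,416,416]]
    x = np.asarray(canvas, dtype=np.float32) * scale
    return np.expand_dims(x.transpose(2, 0, 1), 0)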
6.4.2. Step 2: Gen calibration table
$ run_calibration.py yolov3_tiny.mlir \
--dataset ../COCO2017 \
--input_num 100 \
-o yolov3_cali_table
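The calibration table records a threshold for each tensor, collected over the 100 sample images. In symmetric INT8 quantization, that threshold determines the scale by which values are mapped to 8-bit integers. The following is a simplified sketch of the standard scheme, not tpu-mlir's exact code, and the threshold value used here is hypothetical:

import numpy as np

def int8_symmetric(x, threshold):
    # Symmetric INT8: the scale is derived from the calibrated threshold
    scale = threshold / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, q.astype(np.float32) * scale  # quantized values and their fp32 approximation

q, x_hat = int8_symmetric(np.array([0.5, -1.2, 3.9], np.float32), threshold=4.0)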
6.4.3. Step 3: To INT8 model
$ model_deploy.py \
--mlir yolov3_tiny.mlir \
--quantize INT8 \
--calibration_table yolov3_cali_table \
--chip bm1684x \
--model yolov3_int8.bmodel
6.4.4. Step 4: Run model
$ detect_yolov3.py \
--model yolov3_int8.bmodel \
--input ../COCO2017/000000366711.jpg \
--output yolov3_int8.jpg
The printed result below indicates that only one target is detected:
orange:73.0%
The image yolov3_int8.jpg is also generated, as shown below (yolov3_tiny int8 symmetric):
Fig. 6.2 yolov3_tiny int8 symmetric
It can be seen that the int8 symmetric quantization model performs poorly compared to the original model on this image and only detects one target.
6.5. To Mixed Precision Model
After the int8 conversion, execute the following commands.
6.5.1. Step 1: Gen quantization table
Use run_qtable.py to generate the quantization table (qtable). Its parameters are as follows:
| Name | Required? | Explanation |
|---|---|---|
| (None) | Y | mlir file |
| dataset | N | Directory of input samples. Images, npz or npy files are placed in this directory |
| data_list | N | The sample list (cannot be used together with "dataset") |
| calibration_table | Y | Name of the calibration table file |
| chip | Y | The platform that the model will use. Supports bm1684x/bm1684/cv183x/cv182x/cv181x/cv180x |
| fp_type | N | The float type used for mixed precision. Supports auto, F16, F32 and BF16. The default is auto, meaning the type is selected automatically by the program |
| input_num | N | The number of samples; the default is 10 |
| expected_cos | N | The minimum cos similarity expected for the final network output (see the sketch after this table). The default is 0.99. The smaller the value, the more layers may be set to floating point |
| min_layer_cos | N | The minimum cos similarity expected per layer; below this value an attempt is made to use floating-point computation. The default is 0.99 |
| debug_cmd | N | A debug command string for development. Empty by default |
| o | Y | Output quantization table |
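Both expected_cos and min_layer_cos are cosine similarities between a tensor produced by the quantized network and its fp32 reference. As a minimal sketch (not the actual run_qtable.py internals), such a metric can be computed as:

import numpy as np

def cos_sim(a, b):
    # Cosine similarity between two flattened tensors; 1.0 means identical direction
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))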
The operation is as follows:
$ run_qtable.py yolov3_tiny.mlir \
--dataset ../COCO2017 \
--calibration_table yolov3_cali_table \
--min_layer_cos 0.999 \
--expected_cos 0.9999 \
--chip bm1684x \
-o yolov3_qtable
Note that --min_layer_cos is set to 0.999 here: if the default 0.99 were used, the program would detect that the original int8 model already meets a cos of 0.99 and simply stop searching.
The final output after execution is printed as follows:
int8 outputs_cos:0.999317
mix model outputs_cos:0.999739
Output mix quantization table to yolov3_qtable
total time:44 second
Above, int8 outputs_cos is the cos similarity between the int8 model's network output and the fp32 reference; mix model outputs_cos is the cos similarity of the network output after mixed precision is applied to some layers; total time indicates that the search took 44 seconds.
In addition, the quantization table yolov3_qtable is generated. Its content is as follows:
# op_name quantize_mode
convolution_output11_Conv F16
model_1/leaky_re_lu_2/LeakyRelu:0_LeakyRelu F16
model_1/leaky_re_lu_2/LeakyRelu:0_pooling0_MaxPool F16
convolution_output10_Conv F16
convolution_output9_Conv F16
model_1/leaky_re_lu_4/LeakyRelu:0_LeakyRelu F16
model_1/leaky_re_lu_5/LeakyRelu:0_LeakyRelu F16
model_1/leaky_re_lu_5/LeakyRelu:0_pooling0_MaxPool F16
model_1/concatenate_1/concat:0_Concat F16
In the table, the first column is the layer name and the second is the quantization type.
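The format is simple enough to read directly. The following is a minimal, hypothetical sketch (not part of tpu-mlir) of parsing such a table:

def load_qtable(path):
    # Parse a mixed-precision quantization table: one "layer_name dtype" pair per line
    table = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip the "# op_name quantize_mode" header and blank lines
            name, dtype = line.rsplit(None, 1)
            table[name] = dtype  # e.g. {"convolution_output11_Conv": "F16"}
    return table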
A full_loss_table.txt file is also generated. Its content is as follows:
# chip: bm1684x mix_mode: F16
###
No.0 : Layer: convolution_output11_Conv Cos: 0.984398
No.1 : Layer: model_1/leaky_re_lu_5/LeakyRelu:0_LeakyRelu Cos: 0.998341
No.2 : Layer: model_1/leaky_re_lu_2/LeakyRelu:0_pooling0_MaxPool Cos: 0.998500
No.3 : Layer: convolution_output9_Conv Cos: 0.998926
No.4 : Layer: convolution_output8_Conv Cos: 0.999249
No.5 : Layer: model_1/leaky_re_lu_4/LeakyRelu:0_pooling0_MaxPool Cos: 0.999284
No.6 : Layer: model_1/leaky_re_lu_1/LeakyRelu:0_LeakyRelu Cos: 0.999368
No.7 : Layer: model_1/leaky_re_lu_3/LeakyRelu:0_LeakyRelu Cos: 0.999554
No.8 : Layer: model_1/leaky_re_lu_1/LeakyRelu:0_pooling0_MaxPool Cos: 0.999576
No.9 : Layer: model_1/leaky_re_lu_3/LeakyRelu:0_pooling0_MaxPool Cos: 0.999723
No.10 : Layer: convolution_output12_Conv Cos: 0.999810
The table is sorted by cos in ascending order. Each entry gives the cos computed for a layer's output after its predecessor layers have been converted to the corresponding floating-point mode. If this cos is still smaller than the min_layer_cos parameter described above, the layer and its immediate successor are set to floating-point computation.
run_qtable.py computes the cos of the whole network's output each time a pair of adjacent layers is set to floating point. If that cos is larger than the specified expected_cos, the search stops. Therefore, setting a larger expected_cos causes more layers to be tried in floating point.
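To make the procedure concrete, here is an illustrative Python sketch of the search just described. The function and attribute names (layers_by_cos, layer.next, network_output_cos) are hypothetical and do not reflect the actual run_qtable.py implementation:

def search_mix_table(layers_by_cos, min_layer_cos, expected_cos):
    # layers_by_cos: layers sorted by per-layer cos in ascending order,
    # as listed in full_loss_table.txt
    qtable = []
    for layer in layers_by_cos:
        if layer.cos >= min_layer_cos:
            continue  # this layer is accurate enough in int8; leave it quantized
        # Set this layer and its immediate successor to floating point
        qtable += [layer.name, layer.next.name]
        # Re-evaluate the whole network's output cos after the change
        if network_output_cos(qtable) >= expected_cos:
            break  # the network output meets the target; stop searching
    return qtable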
6.5.2. Step 2: Gen mixed precision model
$ model_deploy.py \
--mlir yolov3_tiny.mlir \
--quantize INT8 \
--quantize_table yolov3_qtable \
--calibration_table yolov3_cali_table \
--chip bm1684x \
--model yolov3_mix.bmodel
6.5.3. Step 3: Run mixed precision model
$ detect_yolov3.py \
--model yolov3_mix.bmodel \
--input ../COCO2017/000000366711.jpg \
--output yolov3_mix.jpg
The printed results are as follows:
person:63.9%
orange:73.0%
The image yolov3_mix.jpg is also generated, as shown below (yolov3_tiny mix):
Fig. 6.3 yolov3_tiny mix
It can be seen that a target that could not be detected by the int8 model is detected again once mixed precision is used.