Decompress test package
mkdir -p sophon/model-zoo
tar -xvf path/to/model-zoo_<date>.tar.bz2 --strip-components=1 -C sophon/model-zoo
cd sophon/model-zoo
The directory structure of the test package is as follows:
├── config.yaml
├── requirements.txt
├── data
├── dataset
├── harness
├── output
└── ...
config.yaml holds the general configuration: the dataset directory, the model root directory, and some reusable parameters and commands.
requirements.txt lists the Python dependencies of model-zoo.
The dataset directory contains the dataset pre-processing code for each model (e.g. for ImageNet), which tpu_perf invokes as a plugin.
The data directory is used to store the LMDB datasets.
The output directory is used to store the bmodel files produced by compilation and some intermediate data.
The remaining directories hold the information and configuration of the individual models. Each model's directory contains a config.yaml file, which specifies the model's name, path, FLOPs, dataset production parameters, and quantization/compilation command.
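For orientation, the sketch below shows the general shape of a per-model config.yaml. Apart from cali_set (described in the precision chapter below), the field names are illustrative assumptions, not the exact schema; consult a config.yaml shipped in the package for the authoritative layout.
# Illustrative sketch only -- most field names here are assumptions
name: resnet50          # model name
gops: ...               # model FLOPs
dataset:                # dataset production parameters
  cali_set: 200         # e.g. number of calibration images
cali: ...               # quantization/compilation command for the model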
Prepare dataset
ImageNet
Download the following file from the ImageNet 2012 dataset:
ILSVRC2012_img_val.tar(MD5 29b22e2961454d5413ddabcf34fc5622).
Decompress ILSVRC2012_img_val.tar into the dataset/ILSVRC2012/ILSVRC2012_img_val directory:
cd path/to/sophon/model-zoo
mkdir -p dataset/ILSVRC2012/ILSVRC2012_img_val
tar xvf path/to/ILSVRC2012_img_val.tar -C dataset/ILSVRC2012/ILSVRC2012_img_val
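To rule out a corrupted download, the archive can be checked against the MD5 checksum given above:
md5sum path/to/ILSVRC2012_img_val.tar
# Expect: 29b22e2961454d5413ddabcf34fc5622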
COCO (optional)
If the COCO dataset is used for the accuracy test, download and decompress it as follows:
cd path/to/sophon/model-zoo/dataset/COCO2017/
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
unzip annotations_trainval2017.zip
unzip val2017.zip
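After unzipping, the directory layout can be sanity-checked:
ls
# Expect to see: annotations  val2017  (alongside the two downloaded zip archives)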
Run performance and accuracy tests on SoC
Performance and accuracy testing depends only on the libsophon runtime environment, so models compiled in the toolchain compilation environment can be packaged together with model-zoo, and tpu_perf can then run the performance and accuracy tests in the SoC environment. However, because eMMC space is limited, the complete model-zoo plus compiled output may not fit on the SoC. The method below compiles the models in the toolchain environment and runs the tests on the SoC by mounting a Linux NFS remote file system.
First, install the NFS service on the toolchain environment server (host system):
sudo apt install nfs-kernel-server
Add the shared directory to /etc/exports:
/path/to/sophon *(rw,sync,no_subtree_check,no_root_squash)
The * means any host can access the shared directory; it can instead be restricted to a specific network segment or IP, such as 192.168.43.0/24.
Then execute the following commands to make the configuration take effect:
sudo exportfs -a
sudo systemctl restart nfs-kernel-server
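You can confirm that the export is active by querying the server locally:
showmount -e localhost
# The output should list /path/to/sophon and its allowed clients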
In addition, you need to add read permissions to the images in the dataset directory:
chmod -R +r path/to/sophon/model-zoo/dataset
On the SoC, install the NFS client and mount the shared directory:
mkdir sophon
sudo apt-get install -y nfs-common
sudo mount -t nfs <IP>:/path/to/sophon ./sophon
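To confirm the mount succeeded, list the shared contents from the SoC:
ls ./sophon
# model-zoo and the rest of the shared directory should be visible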
This makes the test directory accessible in the SoC environment. The remaining SoC test steps are basically the same as for PCIe; where a command must instead be executed on the SoC, this is noted at the corresponding step below.
Prepare running environment
Using Python 3.7 or above, install the required dependencies for model-zoo with pip:
sudo apt-get install -y libgl1 # For OpenCV
# The following steps are required only for the accuracy test; the performance test can skip them
cd path/to/sophon/model-zoo
pip3 install -r requirements.txt
If running the tests in the SoC environment, execute the above commands on the SoC.
In addition, the runtime environment needs access to the TPU hardware for the performance and accuracy tests. Please install libsophon by following the libsophon user manual.
Install the tpu_perf tool
With the tpu_perf tool you can conveniently verify model performance and accuracy, and test multiple models and batch sizes in batch. You can obtain the latest tpu_perf release for your architecture from the release page, or compile it from source. The source compilation steps follow.
Please compile the tpu_perf source code in the deployment environment. Compilation depends on libsophon-dev; refer to the libsophon manual for installation.
Compilation also depends on the Python packaging tools; install them with pip:
pip3 install setuptools wheel
Compilation also depends on protoc. On x86 machines, install it as follows to maintain compatibility with the tpu-nntc environment:
wget -O /tmp/protoc-3.19.4-linux-x86_64.zip \
https://github.com/protocolbuffers/protobuf/releases/download/v3.19.4/protoc-3.19.4-linux-x86_64.zip
cd path/to/sophon
mkdir protoc
unzip -o -d ./protoc /tmp/protoc-3.19.4-linux-x86_64.zip
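You can check that the expected protoc binary is now in place:
./protoc/bin/protoc --version
# Expect: libprotoc 3.19.4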
In the SoC environment, protoc can be installed directly with the package manager:
sudo apt-get install -y protobuf-compiler
Then fetch the tpu_perf source code, unzip it, and compile. The commands are as follows:
mkdir tpu_perf
tar xvf path/to/tpu-perf-X.Y.Z.tar.gz --strip-components=1 -C tpu_perf
# If the tpu_perf/build directory already exists, it is recommended to delete it first.
# rm -r tpu_perf/build
# Execute compilation.
mkdir -p tpu_perf/build
cd tpu_perf/build
PATH=$PATH:../../protoc/bin cmake ..
make install/strip -j4
The PATH environment variable set in the cmake generation step ensures that the correct protoc program is used; it can be omitted in the SoC environment.
After successful compilation, the whl package is generated under tpu_perf/python/dist. At runtime, tpu_perf also depends on numpy, lmdb, protobuf==3.19.*, psutil, and pyyaml, so make sure you can connect to the Internet, or install these dependencies manually, when installing the whl.
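If you prefer, the runtime dependencies listed above can be installed explicitly before the whl (quote the version spec so the shell does not expand the *):
pip3 install numpy lmdb 'protobuf==3.19.*' psutil pyyaml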
cd ..
pip3 install --upgrade python/dist/tpu_perf-X.Y.Z-py3-none-<arch>.whl
Please note that if you run the tests in the SoC environment, you need to download and compile tpu_perf on both the SoC and the toolchain environment (host system), so that you obtain an aarch64 package and an x86_64 package for the two machines respectively.
Prepare toolchain compilation environment
It is recommended to use the toolchain software inside a Docker environment. The latest version of Docker can be installed by following the official tutorial. After installation, execute the following commands to add the current user to the docker group, so that Docker can be run without root privileges.
sudo usermod -aG docker $USER
newgrp docker
Next, pull the Docker image from Docker Hub:
docker pull sophgo/tpuc_dev:v2.1
Then decompress the toolchain development package in the test package directory. The latest toolchain development package can be obtained from the official website.
cd path/to/sophon
tar xvf tpu-nntc_vx.y.z-<hash>-<date>.tar.gz
# Optional, copy the tpu_perf installation package to the working directory
cp path/to/tpu_perf-X.Y.Z-py3-none-x86_64.whl ./
Next, start a Docker container and map the current directory into it:
docker run -td -v $(pwd):/workspace --name nntc sophgo/tpuc_dev:v2.1 bash
Enter the Docker environment by executing the following command:
docker exec -it nntc bash
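Inside the container, the mapped directory should now be visible under /workspace:
ls /workspace
# model-zoo and the decompressed tpu-nntc directory should be listed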
A complete example
Taking the resnet50 network as an example, this section runs through the complete performance and accuracy test flow once, so that users can gain a full understanding of the test process.
First, compile the model in the Docker toolchain environment:
# Env setup
cd /workspace/tpu-nntc_vx.y.z-<hash>-<date>
source scripts/envsetup.sh
# Prepare working directory
cd /workspace/model-zoo
mkdir -p output/resnet50
cp vision/classification/ResNet50-Caffe/ResNet-50-* output/resnet50
cd output/resnet50
python3 -m ufw.cali.cali_model \
--model ./ResNet-50-deploy.prototxt \
--weights ./ResNet-50-model.caffemodel \
--cali_image_path /workspace/model-zoo/dataset/ILSVRC2012/caliset \
--test_iterations 10 \
--net_name resnet50 \
--postprocess_and_calc_score_class none \
--target BM1684X \
--cali_iterations 100 \
--cali_image_preprocess='
resize_side=256;
crop_w=224,crop_h=224;
mean_value=103.94:116.78:123.68,scale=1' \
--input_shapes=[1,3,224,224]
This command quantizes the model to int8 using the auto_cali quantization tool. A calibration dataset is required, and the pre-processing parameters must be specified.
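If quantization and compilation succeed, the bmodel is placed at the path the test commands below expect; a quick existence check:
# Still inside Docker, from /workspace/model-zoo/output/resnet50
ls resnet50_batch1/compilation.bmodel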
Next, run the performance test and accuracy verification in the runtime environment. First, exit the toolchain Docker environment:
exit
If running the tests in the SoC environment, execute the following model test commands on the SoC.
Verify the model inference time with the bmrt_test tool:
cd path/to/sophon/model-zoo
bmrt_test --bmodel output/resnet50/resnet50_batch1/compilation.bmodel
After execution, the key performance parameters of the model are printed.
Next, run the accuracy test program to verify the bmodel accuracy on the dataset:
python3 harness/topk.py \
--mean 103.94,116.78,123.68 --scale 1 --size 224 \
--bmodel output/resnet50/resnet50_batch1/compilation.bmodel \
--image_path ./dataset/ILSVRC2012/ILSVRC2012_img_val \
--list_file ./dataset/ILSVRC2012/caffe_val.txt \
--devices 10
The program's last line of output reports the top-1 and top-5 accuracy measured for the bmodel.
The following two chapters describe how to verify performance and accuracy with tpu_perf. Since multiple models are involved and the commands take a relatively long time, it is recommended, when working over an SSH session, to run them inside a terminal multiplexer such as screen or tmux so that tasks are not killed when the session ends.
Performance test
First, compile the models in the Docker toolchain environment:
# Env setup
cd /workspace/tpu-nntc_vx.y.z-<hash>-<date>
source scripts/envsetup.sh
# Install tpu_perf tool
pip3 install --upgrade path/to/tpu_perf-X.Y.Z-py3-none-x86_64.whl
cd /workspace/model-zoo
python3 -m tpu_perf.build --list default_cases.txt --time
All tpu_perf commands can take a case list file; here default_cases.txt is used. The full set of cases can be run by specifying full_cases.txt (this may take a long time), or a custom list file. If no list is specified, tpu_perf traverses the current directory looking for config.yaml files and executes them one by one.
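The list-file format is assumed here to match default_cases.txt: plain text naming one case directory per line. A minimal custom list reusing the resnet50 example from earlier might look like:
# my_cases.txt -- assumed format: one case directory per line
vision/classification/ResNet50-Caffe
which would then be run with python3 -m tpu_perf.build --list my_cases.txt --time.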
The tests themselves run in the runtime environment outside Docker; if testing on this machine, exit the Docker environment:
exit
If running the tests in the SoC environment, execute the following model test commands on the SoC.
Since the models are generated by the root user inside Docker by default, change the owner of the output directory to the current user:
cd path/to/sophon/model-zoo
sudo chown -R $(whoami):$(whoami) output
Then install the tpu_perf tool and use it to run the models and generate performance data:
pip3 install --upgrade path/to/tpu_perf-X.Y.Z-py3-none-<arch>.whl
python3 -m tpu_perf.run --list default_cases.txt
Performance data can be found in output/stats.csv.
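Since stats.csv is a plain comma-separated file, one convenient way to skim it in a terminal is:
column -s, -t output/stats.csv | less -S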
Precision verification
First, prepare the datasets and compile the models in the Docker toolchain environment.
Prepare quantization and test LMDB
When models are quantized with auto_cali, either an image set or an LMDB dataset can be used. If an image set is specified, auto_cali automatically generates an LMDB dataset from it, based on the pre-processing parameters, for quantization. Since several models share the same image set and pre-processing during batch quantization, the tpu_perf tool is used here to generate all the LMDB datasets in one batch before calling auto_cali for quantization.
# Install the tpu_perf tool
cd /workspace/tpu_perf
pip3 install --upgrade path/to/tpu_perf-X.Y.Z-py3-none-x86_64.whl
cd /workspace/model-zoo
python3 -m tpu_perf.make_lmdb --list default_cases.txt
Executing this command generates pre-processed LMDB datasets for quantization and testing, based on each model's configuration.
The tool configuration uses 200 calibration images by default. To use a different image set, put the images in the dataset/ILSVRC2012/caliset directory and set the cali_set field in config.yaml to the number of calibration images. For details, see the pre-processing implementation in the dataset directory.
The script may take a long time to run; please be patient.
Model quantization and compilation
# Env setup
cd /workspace/tpu-nntc_vx.y.z-<hash>-<date>
source scripts/envsetup.sh
cd /workspace/model-zoo
python3 -m tpu_perf.build --list default_cases.txt
This command quantizes and compiles the models according to each model's configuration. With many models, the script can take a long time to run; please be patient.
The accuracy test runs in the runtime environment outside Docker; if testing on this machine, exit the Docker environment:
exit
Precision test
If running the tests in the SoC environment, execute the following model test commands on the SoC.
Since the datasets and models are generated by the root user inside Docker by default, change the owner of the generated directories to the current user.
# Install the tpu_perf tool
pip3 install --upgrade path/to/tpu_perf-X.Y.Z-py3-none-<arch>.whl
cd path/to/sophon/model-zoo
sudo chown -R $(whoami):$(whoami) data
sudo chown -R $(whoami):$(whoami) output
The tests can then be run using the tpu_perf tool:
python3 -m tpu_perf.precision_benchmark --list default_cases.txt
The various accuracy results are written to individual CSV files in the output directory.
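The generated report files can be located with:
ls output/*.csv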