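7.2. Video Codec Performance

7.2.1. BM1684 Performance Tests

This section lists the video encoding, decoding, and transcoding performance of the BM1684 chip. Each test is accompanied by the command used to run it, so you can reproduce the results with the commands given in the tables.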

Codec performance tests

Chip: 1684
Scenario: decode
ffmpeg command line:

#!/bin/bash
video=test_video/jellyfish-3-mbps-hd-h264.mkv
tpus=1          # number of chips to test
thread_num=8    # number of concurrent decode processes per chip
# Remove leftovers from earlier runs; the logs are written next to the input video.
rm -rf *.log *.txt ${video}_thread_*_tpu_*.log

# Abort if more chips are requested than the driver reports.
if [[ $tpus -gt $(cat /proc/bmsophon/chip_num) ]]; then
    echo "The number of tpu entered is greater than the actual number."
    exit 1
fi

# Start thread_num background decodes on each chip; each process writes its own log.
for ((i=0; i<tpus; i++))
do
    for ((j=0; j<thread_num; j++))
    do
        nohup ffmpeg -benchmark -output_format 101 -zero_copy 1 -sophon_idx ${i} -c:v h264_bm -i $video -f null /dev/null >> ${video}_thread_${j}_tpu_${i}.log 2>&1 &
    done
done

echo "Start Wait End..."
sleep 30    # must be long enough for every ffmpeg process to finish
echo "Start Calc fps..."
# The tail offsets below depend on the layout of the end of the ffmpeg log.
for ((i=0; i<tpus; i++))
do
    for ((j=0; j<thread_num; j++))
    do
        log=${video}_thread_${j}_tpu_${i}.log
        tail -n 7 $log | head -n 1 | awk -F '=' '{print $3}' | tr -d 'a-z' | tr -d ' ' >> fps.txt      # fps
        tail -n 7 $log | head -n 1 | awk -F '=' '{print $2}' | tr -d 'a-z' | tr -d ' ' >> frame.txt    # frames
        tail -n 5 $log | head -n 1 | awk -F '=' '{print $4}' | tr -d 'a-z' | tr -d ' ' >> time.txt     # elapsed time
    done
done
awk '{sum+=$1} END {print sum}' fps.txt > fpssum.txt
awk '{sum+=$1} END {print sum}' frame.txt > framesum.txt
awk '{sum+=$1} END {print sum}' time.txt > timesum.txt

echo "chip_num: $tpus"
echo "total_frames: $(<framesum.txt)"
# Average elapsed time per process.
awk -v task_num=$((tpus * thread_num)) '{print $1/task_num}' timesum.txt > timeavg.txt
echo "avg_time: $(<timeavg.txt)"
#echo "fps: $(<fpssum.txt)"
f_sum=$(<framesum.txt)
time_avg=$(<timeavg.txt)
# Aggregate speed: total decoded frames divided by the average elapsed time.
awk -v f="$f_sum" -v t="$time_avg" 'BEGIN{printf "%.2f\n", f/t}' > speed.txt
speed=$(<speed.txt)
echo "speed: ${speed}"
rm -rf f*.txt
rm -rf t*.txt
rm -rf *.log ${video}_thread_*_tpu_*.log
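
The field extraction used by the script can be tried without hardware. The following is a minimal sketch: the two log lines are hypothetical samples shaped like ffmpeg's progress line and its -benchmark line, and the helper names extract and aggregate_speed are illustrative only. It applies the same split on "=" and the same speed formula (total frames divided by the average elapsed time) as the script.

```shell
#!/bin/bash
# Hypothetical log lines, shaped like ffmpeg's progress and -benchmark output.
frame_line='frame=  900 fps=300 q=-0.0 Lsize=N/A time=00:00:30.00 bitrate=N/A speed=10x'
bench_line='bench: utime=1.500s stime=0.500s rtime=3.000s'

# Same extraction as the test script: split on "=", then drop letters and spaces.
extract() {   # usage: extract <line> <field-number>
    echo "$1" | awk -F '=' -v n="$2" '{print $n}' | tr -d 'a-z' | tr -d ' '
}

frames=$(extract "$frame_line" 2)   # 900
fps=$(extract "$frame_line" 3)      # 300
rtime=$(extract "$bench_line" 4)    # 3.000

# Same formula as the script: total frames / (summed time / number of processes).
aggregate_speed() {   # usage: aggregate_speed <total_frames> <time_sum> <task_num>
    awk -v f="$1" -v t="$2" -v n="$3" 'BEGIN { printf "%.2f\n", f / (t / n) }'
}

# Eight identical processes: 8 * 900 frames, 8 * 3 seconds.
speed=$(aggregate_speed 7200 24 8)
echo "frames=$frames fps=$fps rtime=$rtime speed=$speed"
```

Because the extraction relies on fixed field positions, it only works while the log lines keep this shape.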

Codec performance tests

Chip: 1684
Scenario: encode
ffmpeg command line:

#!/bin/bash
video=test_video/jellyfish-3-mbps-hd-h264.mkv
tpus=1          # number of chips to test; one SC5+ card has chip_num=3
thread_num=8    # 8 cores
# Remove leftovers from earlier runs; the logs are written next to the input video.
rm -rf *.log f*.txt t*.txt ${video}_thread_*_tpu_*.log

# Abort if more chips are requested than the driver reports.
if [[ $tpus -gt $(cat /proc/bmsophon/chip_num) ]]; then
    echo "The number of tpu entered is greater than the actual number."
    exit 1
fi

# Start thread_num background decode-and-encode jobs on each chip; each process writes its own log.
for ((i=0; i<tpus; i++))
do
    for ((j=0; j<thread_num; j++))
    do
        nohup ffmpeg -benchmark -hwaccel bmcodec -hwaccel_device $i -c:v h264_bm -i $video -c:v h264_bm -g 256 -b:v 3MB -enc-params "gop_preset=4:mb_rc=1:delta_qp=3:min_qp=20:max_qp=33" -f null /dev/null >> ${video}_thread_${j}_tpu_${i}.log 2>&1 &
    done
done

echo "Start Wait End..."
sleep 105    # must be long enough for every ffmpeg process to finish
echo "Start Calc fps..."
# The tail offsets below depend on the layout of the end of the ffmpeg log.
for ((i=0; i<tpus; i++))
do
    for ((j=0; j<thread_num; j++))
    do
        log=${video}_thread_${j}_tpu_${i}.log
        tail -n 7 $log | head -n 1 | awk -F "=" '{print $3}' | tr -d "a-z" | tr -d " " >> fpsall.txt      # fps
        tail -n 7 $log | head -n 1 | awk -F "=" '{print $2}' | tr -d "a-z" | tr -d " " >> frameall.txt    # frames
        tail -n 5 $log | head -n 1 | awk -F "=" '{print $4}' | tr -d "a-z" | tr -d " " >> timeall.txt     # elapsed time
    done
done
awk '{sum+=$1} END {print sum}' fpsall.txt > fpssum.txt
awk '{sum+=$1} END {print sum}' frameall.txt > framesum.txt
awk '{sum+=$1} END {print sum}' timeall.txt > timesum.txt

echo "chip_num: $tpus"
echo "total_frames: $(<framesum.txt)"
# Average elapsed time per process.
awk -v task_num=$((tpus * thread_num)) '{print $1/task_num}' timesum.txt > timeavg.txt
echo "avg_time: $(<timeavg.txt)"
#echo "fps: $(<fpssum.txt)"
f_sum=$(<framesum.txt)
time_avg=$(<timeavg.txt)
# Aggregate speed: total encoded frames divided by the average elapsed time.
awk -v f="$f_sum" -v t="$time_avg" 'BEGIN{printf "%.2f\n", f/t}' > speed.txt
speed=$(<speed.txt)
echo "speed: ${speed}"
rm -rf f*.txt
rm -rf t*.txt
rm -rf *.log ${video}_thread_*_tpu_*.log
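
The script relies on a fixed sleep 105 being long enough for all encodes to finish. One possible alternative, shown here only as a sketch, is to record the process ID of each background job and use the bash builtin wait. The run_job function below is a stand-in that sleeps briefly; it is an assumption for illustration and does not call ffmpeg.

```shell
#!/bin/bash
# Stand-in for one ffmpeg job (assumption); a real run would start ffmpeg here.
run_job() {   # usage: run_job <index> <log-file>
    sleep 0.2
    echo "job $1 done" >> "$2"
}

log=$(mktemp)
pids=()
for ((j=0; j<4; j++)); do
    run_job "$j" "$log" &
    pids+=("$!")            # remember the PID of each background job
done

# Block until every background job has exited, and count non-zero exit codes.
failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
done

finished=$(wc -l < "$log")
echo "finished: $finished, failed: $failed"
rm -f "$log"
```

With this pattern the parsing step starts as soon as the last job exits, and a failed job is noticed instead of silently producing an incomplete log.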

Codec performance tests

Chip: 1684

Scenario: decode
Test result: ../_images/1684_decode.png

Scenario: encode
Test result: ../_images/1684_encode.png

Codec performance tests

Chip: 1684
Scenario: transcode
ffmpeg command line:

Transcode to a 32 kbit/s low-bitrate stream:

ffmpeg -hwaccel bmcodec -hwaccel_device 0 -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v 32K -y result.264

../_images/1684_transcode_1.png

Transcode to a 1 Mbit/s high-bitrate stream:

ffmpeg -hwaccel bmcodec -hwaccel_device 0 -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v 1M -y result.ts

../_images/1684_transcode_2.png

The options used in these commands are:

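-hwaccel: use the hardware acceleration API.
-hwaccel_device: select the device; use bm-smi to list the available devices.
-c:v (before -i): the decoder.
-output_format: format of the decoder output. 0 outputs uncompressed data in linear layout; 101 outputs compressed data. The default is 0. Setting 101 is recommended because compressed output saves memory and bandwidth. The compressed output can be decompressed into ordinary YUV data by the scale_bm filter.
-i: the input file.
-vf: ffmpeg filter; scale_bm resizes each frame to the target width and height.
-c:v (after -i): the encoder.
-g: key frame interval.
-b:v: output bitrate.
-y: overwrite the output file if it already exists.
result.ts: the output file.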
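
To show how these options fit together, here is a minimal sketch that assembles the transcode command from parameters and prints it without running it. The function name build_transcode_cmd and its argument order are hypothetical; only the ffmpeg options themselves come from the commands above.

```shell
#!/bin/bash
# Assemble the transcode command into the global array "cmd" (illustrative helper).
build_transcode_cmd() {   # usage: build_transcode_cmd <device> <input> <WxH> <bitrate> <output>
    local device=$1 input=$2 size=$3 bitrate=$4 output=$5
    local width=${size%x*} height=${size#*x}
    cmd=(ffmpeg -hwaccel bmcodec -hwaccel_device "$device"
         -c:v h264_bm -output_format 101 -i "$input"      # decoder options precede -i
         -vf "scale_bm=${width}:${height}"                # resize on the device
         -c:v h264_bm -g 256 -b:v "$bitrate" -y "$output" # encoder options follow -i
    )
}

build_transcode_cmd 0 test.264 352x288 32K result.264
# Dry run: print the command, shell-quoted, instead of executing it.
printf '%q ' "${cmd[@]}"; echo
# On a machine with the hardware-enabled ffmpeg build it could be run with: "${cmd[@]}"
```

Keeping the command in an array avoids quoting problems with the filter string and makes it easy to vary the device, resolution, or bitrate between runs.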

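7.2.2. BM1684X Performance Tests

This section lists the video encoding, decoding, and transcoding performance of the BM1684X chip. Each test is accompanied by the command used to run it, so you can reproduce the results with the commands given in the tables.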

Codec performance tests

Chip: 1684X
Scenario: decode
ffmpeg command line:

#!/bin/bash
video=test_video/jellyfish-3-mbps-hd-h264.mkv
tpus=1          # number of chips to test
thread_num=8    # number of concurrent decode processes per chip
# Remove leftovers from earlier runs; the logs are written next to the input video.
rm -rf *.log *.txt ${video}_thread_*_tpu_*.log

# Abort if more chips are requested than the driver reports.
if [[ $tpus -gt $(cat /proc/bmsophon/chip_num) ]]; then
    echo "The number of tpu entered is greater than the actual number."
    exit 1
fi

# Start thread_num background decodes on each chip; each process writes its own log.
for ((i=0; i<tpus; i++))
do
    for ((j=0; j<thread_num; j++))
    do
        nohup ffmpeg -benchmark -output_format 101 -zero_copy 1 -sophon_idx ${i} -c:v h264_bm -i $video -f null /dev/null >> ${video}_thread_${j}_tpu_${i}.log 2>&1 &
    done
done

echo "Start Wait End..."
sleep 30    # must be long enough for every ffmpeg process to finish
echo "Start Calc fps..."
# The tail offsets below depend on the layout of the end of the ffmpeg log.
for ((i=0; i<tpus; i++))
do
    for ((j=0; j<thread_num; j++))
    do
        log=${video}_thread_${j}_tpu_${i}.log
        tail -n 7 $log | head -n 1 | awk -F '=' '{print $3}' | tr -d 'a-z' | tr -d ' ' >> fps.txt      # fps
        tail -n 7 $log | head -n 1 | awk -F '=' '{print $2}' | tr -d 'a-z' | tr -d ' ' >> frame.txt    # frames
        tail -n 5 $log | head -n 1 | awk -F '=' '{print $4}' | tr -d 'a-z' | tr -d ' ' >> time.txt     # elapsed time
    done
done
awk '{sum+=$1} END {print sum}' fps.txt > fpssum.txt
awk '{sum+=$1} END {print sum}' frame.txt > framesum.txt
awk '{sum+=$1} END {print sum}' time.txt > timesum.txt

echo "chip_num: $tpus"
echo "total_frames: $(<framesum.txt)"
# Average elapsed time per process.
awk -v task_num=$((tpus * thread_num)) '{print $1/task_num}' timesum.txt > timeavg.txt
echo "avg_time: $(<timeavg.txt)"
#echo "fps: $(<fpssum.txt)"
f_sum=$(<framesum.txt)
time_avg=$(<timeavg.txt)
# Aggregate speed: total decoded frames divided by the average elapsed time.
awk -v f="$f_sum" -v t="$time_avg" 'BEGIN{printf "%.2f\n", f/t}' > speed.txt
speed=$(<speed.txt)
echo "speed: ${speed}"
rm -rf f*.txt
rm -rf t*.txt
rm -rf *.log ${video}_thread_*_tpu_*.log
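
The chip-count check at the start of the script can be isolated into a small function, which makes it possible to try the logic on a machine without the driver. This is a sketch under assumptions: the function name check_chip_num is hypothetical, and the file path is passed as a parameter so that a temporary file can stand in for /proc/bmsophon/chip_num.

```shell
#!/bin/bash
# Succeed when the requested chip count does not exceed the count stored in the file.
check_chip_num() {   # usage: check_chip_num <requested> <chip_num_file>
    local requested=$1 file=$2 actual
    [[ -r $file ]] || { echo "cannot read $file" >&2; return 2; }
    actual=$(<"$file")
    if (( requested > actual )); then
        echo "The number of tpu entered is greater than the actual number." >&2
        return 1
    fi
}

# Demo: a temporary file stands in for /proc/bmsophon/chip_num and reports 3 chips.
fake=$(mktemp)
echo 3 > "$fake"
check_chip_num 1 "$fake" && echo "1 chip: ok"
check_chip_num 4 "$fake" || echo "4 chips: rejected"
```

In the test script itself, the call would be followed by exit 1 on failure, since return is only valid inside a function or a sourced script.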

Codec performance tests

Chip: 1684X
Scenario: encode
ffmpeg command line:

#!/bin/bash
video=test_video/jellyfish-3-mbps-hd-h264.mkv
tpus=1          # number of chips to test; one SC5+ card has chip_num=3
thread_num=8    # 8 cores
# Remove leftovers from earlier runs; the logs are written next to the input video.
rm -rf *.log f*.txt t*.txt ${video}_thread_*_tpu_*.log

# Abort if more chips are requested than the driver reports.
if [[ $tpus -gt $(cat /proc/bmsophon/chip_num) ]]; then
    echo "The number of tpu entered is greater than the actual number."
    exit 1
fi

# Start thread_num background decode-and-encode jobs on each chip; each process writes its own log.
for ((i=0; i<tpus; i++))
do
    for ((j=0; j<thread_num; j++))
    do
        nohup ffmpeg -benchmark -hwaccel bmcodec -hwaccel_device $i -c:v h264_bm -i $video -c:v h264_bm -g 256 -b:v 3MB -enc-params "gop_preset=4:mb_rc=1:delta_qp=3:min_qp=20:max_qp=33" -f null /dev/null >> ${video}_thread_${j}_tpu_${i}.log 2>&1 &
    done
done

echo "Start Wait End..."
sleep 105    # must be long enough for every ffmpeg process to finish
echo "Start Calc fps..."
# The tail offsets below depend on the layout of the end of the ffmpeg log.
for ((i=0; i<tpus; i++))
do
    for ((j=0; j<thread_num; j++))
    do
        log=${video}_thread_${j}_tpu_${i}.log
        tail -n 7 $log | head -n 1 | awk -F "=" '{print $3}' | tr -d "a-z" | tr -d " " >> fpsall.txt      # fps
        tail -n 7 $log | head -n 1 | awk -F "=" '{print $2}' | tr -d "a-z" | tr -d " " >> frameall.txt    # frames
        tail -n 5 $log | head -n 1 | awk -F "=" '{print $4}' | tr -d "a-z" | tr -d " " >> timeall.txt     # elapsed time
    done
done
awk '{sum+=$1} END {print sum}' fpsall.txt > fpssum.txt
awk '{sum+=$1} END {print sum}' frameall.txt > framesum.txt
awk '{sum+=$1} END {print sum}' timeall.txt > timesum.txt

echo "chip_num: $tpus"
echo "total_frames: $(<framesum.txt)"
# Average elapsed time per process.
awk -v task_num=$((tpus * thread_num)) '{print $1/task_num}' timesum.txt > timeavg.txt
echo "avg_time: $(<timeavg.txt)"
#echo "fps: $(<fpssum.txt)"
f_sum=$(<framesum.txt)
time_avg=$(<timeavg.txt)
# Aggregate speed: total encoded frames divided by the average elapsed time.
awk -v f="$f_sum" -v t="$time_avg" 'BEGIN{printf "%.2f\n", f/t}' > speed.txt
speed=$(<speed.txt)
echo "speed: ${speed}"
rm -rf f*.txt
rm -rf t*.txt
rm -rf *.log ${video}_thread_*_tpu_*.log
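
The script collects a sum of per-process frame rates (fpssum.txt) but reports total frames divided by the average elapsed time as "speed". The two figures agree only when all processes take the same time. The sketch below illustrates the difference with hypothetical numbers; it approximates each process's frame rate as frames divided by elapsed time, which is an assumption, since the script reads the fps value that ffmpeg prints.

```shell
#!/bin/bash
# Hypothetical per-process results, one "<frames> <elapsed seconds>" pair per line.
results='900 3.0
900 3.6'

# Estimate 1: sum of per-process frame rates (the role of fpssum.txt).
sum_fps=$(echo "$results" | awk '{s += $1 / $2} END {printf "%.2f\n", s}')

# Estimate 2: total frames over mean elapsed time (what the script prints as "speed").
speed=$(echo "$results" | awk '{f += $1; t += $2} END {printf "%.2f\n", f / (t / NR)}')

echo "sum of fps: $sum_fps"   # 550.00
echo "speed: $speed"          # 545.45
```

When the processes finish at noticeably different times, it may be worth reporting both values.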

Codec performance tests

Chip: 1684X

Scenario: decode
Test result: ../_images/1684x_decode.png

Scenario: encode
Test result: ../_images/1684x_encode.png

Codec performance tests

Chip: 1684X
Scenario: transcode
ffmpeg command line:

Transcode to a 32 kbit/s low-bitrate stream:

ffmpeg -hwaccel bmcodec -hwaccel_device 0 -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v 32K -y result.264

../_images/1684x_transcode_1.png

Transcode to a 1 Mbit/s high-bitrate stream:

ffmpeg -hwaccel bmcodec -hwaccel_device 0 -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v 1M -y result.ts

../_images/1684x_transcode_2.png

The options used in these commands are:

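-hwaccel: use the hardware acceleration API.
-hwaccel_device: select the device; use bm-smi to list the available devices.
-c:v (before -i): the decoder.
-output_format: format of the decoder output. 0 outputs uncompressed data in linear layout; 101 outputs compressed data. The default is 0. Setting 101 is recommended because compressed output saves memory and bandwidth. The compressed output can be decompressed into ordinary YUV data by the scale_bm filter.
-i: the input file.
-vf: ffmpeg filter; scale_bm resizes each frame to the target width and height.
-c:v (after -i): the encoder.
-g: key frame interval.
-b:v: output bitrate.
-y: overwrite the output file if it already exists.
result.ts: the output file.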