7.2. Video Codec Performance
This section lists video encoding, decoding, and transcoding performance data for the BM1684 chip, together with the command used for each test. You can reproduce the results by running the commands given in each subsection.
Note
The test file can be obtained from the following link: http://disk-sophgo-vip.quickconnect.cn/sharing/J16mZJNRl
7.2.1. Decode capability performance test
Command line
#!/bin/bash
video=jellyfish-3-mbps-hd-h264.mkv
tpu_id=1
thread_num=8
sleep_time=30
decode_type=h264_bm
rm -rf *.log
rm -rf fpssum.txt fps.txt framesum.txt frame.txt speed.txt timeavg.txt timesum.txt time.txt
for ((j=0; j<$thread_num; j++))
do
{
nohup ffmpeg -benchmark -output_format 101 -zero_copy 1 -sophon_idx ${tpu_id} -c:v ${decode_type} -i $video -f null /dev/null >> ${video}_thread_${j}_tpu_${tpu_id}.log 2>&1 &
} &
done
echo "Start Wait End..."
sleep ${sleep_time}
echo "Start Calc fps..."
for ((j=0; j<$thread_num; j++))
do
    log=${video}_thread_${j}_tpu_${tpu_id}.log
    # Final progress line: frame= N fps= F ...
    tail -n 7 "$log" | head -n 1 | awk -F '=' '{print $3}' | tr -d 'a-z' | tr -d ' ' >> fps.txt    # fps
    tail -n 7 "$log" | head -n 1 | awk -F '=' '{print $2}' | tr -d 'a-z' | tr -d ' ' >> frame.txt  # frames
    # Benchmark line: bench: utime=... stime=... rtime=...
    tail -n 5 "$log" | head -n 1 | awk -F '=' '{print $4}' | tr -d 'a-z' | tr -d ' ' >> time.txt   # rtime in seconds
done
# Sum the per-thread values once all logs have been parsed
awk '{sum+=$1} END {print sum}' fps.txt > fpssum.txt
awk '{sum+=$1} END {print sum}' frame.txt > framesum.txt
awk '{sum+=$1} END {print sum}' time.txt > timesum.txt
echo "tpuid: $tpu_id"
echo "total_frames: $(<framesum.txt)"
task_num=$(cat frame.txt | grep "." -c)
cat timesum.txt | awk '{print $1/'$task_num'}' > timeavg.txt
echo "avg_time: $(<timeavg.txt)"
f_sum=$(<framesum.txt)
time_avg=$(<timeavg.txt)
awk 'BEGIN{printf "%.2f\n",'$f_sum'/'$time_avg'}' > speed.txt
speed=$(<speed.txt)
echo "speed: ${speed}"
Parameter
tpu_id: The index of the TPU under test; it can be looked up with bm-smi.
thread_num: The number of decoding threads, typically set to 16.
sleep_time: The time to wait for decoding to complete. It depends on TPU performance and the video format. For example, with the 7200-frame video in the test file, it is generally set to 150s on the 1684.
decode_type: h264_bm, hevc_bm.
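The script above reads three numbers from each log (frames, fps, and rtime) and reports speed as total frames divided by the average rtime. The following is a minimal sketch of that parsing and arithmetic. The two sample log lines only mimic the shape of ffmpeg -benchmark output and contain made-up numbers, and the function names are hypothetical helpers, not part of any SDK.

```shell
#!/bin/bash
# Illustrative lines only: the values below are invented, not measurements.
sample_progress='frame= 7200 fps=300 q=-0.0 Lsize=N/A time=00:04:00.00 bitrate=N/A speed=10x'
sample_bench='bench: utime=1.500s stime=0.500s rtime=24.000s'

# Split on '=': field 2 is " 7200 fps"; dropping letters and spaces leaves 7200.
extract_frames() {
    printf '%s\n' "$1" | awk -F '=' '{print $2}' | tr -d 'a-z' | tr -d ' '
}

# Field 3 is "300 q"; dropping letters and spaces leaves 300.
extract_fps() {
    printf '%s\n' "$1" | awk -F '=' '{print $3}' | tr -d 'a-z' | tr -d ' '
}

# Field 4 of the bench line is "24.000s"; dropping letters leaves 24.000.
extract_rtime() {
    printf '%s\n' "$1" | awk -F '=' '{print $4}' | tr -d 'a-z' | tr -d ' '
}

# speed = total frames over all threads / average rtime per thread
compute_speed() {
    awk -v frames="$1" -v avg_time="$2" 'BEGIN{printf "%.2f\n", frames/avg_time}'
}

echo "frames: $(extract_frames "$sample_progress")"
echo "fps:    $(extract_fps "$sample_progress")"
echo "rtime:  $(extract_rtime "$sample_bench")"
# Two hypothetical threads, each decoding 7200 frames in 24 s
echo "speed:  $(compute_speed 14400 24.000)"
```

Because the field positions are fixed, the parsing depends on the log ending with the final progress line and the bench line at the offsets the script uses (tail -n 7 and tail -n 5). If ffmpeg has not finished when sleep_time expires, those offsets point at other lines and the result is not meaningful.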
Typical results
Chip | Scene | Test result
1684 | decode | (screenshot not available)
1684X | decode | (screenshot not available)
7.2.2. Encode capability performance test
Command line
#!/bin/bash
video=jellyfish-3-mbps-hd-h264.mkv
tpu_id=1
thread_num=3
sleep_time=150
encode_type=h264_bm
rm -rf *.log
rm -rf fpsall.txt fpssum.txt frameall.txt framesum.txt speed.txt timeall.txt timeavg.txt timesum.txt
for ((j=0; j<$thread_num; j++))
do
{
nohup ffmpeg -benchmark -hwaccel bmcodec -hwaccel_device $tpu_id -extra_frame_buffer_num 5 -i $video -c:v ${encode_type} -g 256 -is_dma_buffer 1 -b:v 3MB -enc-params "gop_preset=2:mb_rc=1:delta_qp=3:min_qp=35:max_qp=40" -f null /dev/null >> ${video}_thread_${j}_tpu_${tpu_id}.log 2>&1 &
} &
done
echo "Start Wait End..."
sleep ${sleep_time}
echo "Start Calc fps..."
for ((j=0; j<$thread_num; j++))
do
    log=${video}_thread_${j}_tpu_${tpu_id}.log
    # Final progress line: frame= N fps= F ...
    tail -n 7 "$log" | head -n 1 | awk -F "=" '{print $3}' | tr -d "a-z" | tr -d " " >> fpsall.txt    # fps
    tail -n 7 "$log" | head -n 1 | awk -F "=" '{print $2}' | tr -d " " | tr -d "a-z" >> frameall.txt  # frames
    # Benchmark line: bench: utime=... stime=... rtime=...
    tail -n 5 "$log" | head -n 1 | awk -F "=" '{print $4}' | tr -d "a-z" | tr -d " " >> timeall.txt   # rtime in seconds
done
# Sum the per-thread values once all logs have been parsed
awk '{sum+=$1} END {print sum}' fpsall.txt > fpssum.txt
awk '{sum+=$1} END {print sum}' frameall.txt > framesum.txt
awk '{sum+=$1} END {print sum}' timeall.txt > timesum.txt
echo "tpu_id: $tpu_id"
echo "total_frames: $(<framesum.txt)"
task_num=$(cat frameall.txt | grep "." -c)
cat timesum.txt | awk '{print $1/'$task_num'}' > timeavg.txt
echo "avg_time: $(<timeavg.txt)"
f_sum=$(<framesum.txt)
time_avg=$(<timeavg.txt)
awk 'BEGIN{printf "%.2f\n",'$f_sum'/'$time_avg'}' > speed.txt
speed=$(<speed.txt)
echo "speed: ${speed}"
Parameter
tpu_id: The index of the TPU under test; it can be looked up with bm-smi.
thread_num: The number of encoding threads, typically set to 2 on the BM1684 chip and 12 on the BM1684X chip.
sleep_time: The time to wait for encoding to complete. It depends on TPU performance and the video format. For example, with 12 threads on the 1684X and the 7200-frame video in the test file, it is generally set to 150s.
encode_type: h264_bm, h265_bm
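Since sleep_time has to be long enough for every thread to finish, one possible alternative is to record the PID of each background job and block on the shell builtin wait. The sketch below illustrates only that idea: run_job is a hypothetical stand-in for the ffmpeg command line, and run_all is not part of the original script.

```shell
#!/bin/bash
# Hypothetical stand-in for one ffmpeg run; it only writes a small log file.
run_job() {
    local idx=$1 log_dir=$2
    sleep 1
    echo "job ${idx} done" > "${log_dir}/job_${idx}.log"
}

# Start thread_num jobs in the background, then block until all have exited.
run_all() {
    local thread_num=$1 log_dir=$2
    local pids=() j
    for ((j=0; j<thread_num; j++)); do
        run_job "$j" "$log_dir" &
        pids+=("$!")            # PID of the job just started
    done
    wait "${pids[@]}"           # returns after every listed PID has exited
}

log_dir=$(mktemp -d)
run_all 3 "$log_dir"
ls "$log_dir"
```

Note that the original script wraps each ffmpeg call in a second background group, so $! there would refer to the wrapper rather than to ffmpeg. This sketch starts each job directly in the background for that reason.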
Typical results
Chip | Scene | Test result
1684 | encode | (screenshot not available)
1684X | encode | (screenshot not available)
7.2.3. Transcode capability performance test
Generate the original video required for transcoding
ffmpeg -i jellyfish-3-mbps-hd-h264.mkv -c copy -bsf:v h264_mp4toannexb -f mpegts 1.264
ffmpeg -i "concat:1.264|1.264|1.264|1.264|1.264|1.264|1.264|1.264" -c copy -bsf:a aac_adtstoasc -movflags +faststart test.264
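The second command repeats the same file eight times in the concat: input string. As a small sketch, the string can be built for any repeat count with a helper; build_concat_input is a hypothetical name introduced here for illustration.

```shell
#!/bin/bash
# Build "concat:file|file|...|file" with the file repeated count times.
build_concat_input() {
    local file=$1 count=$2
    local out="concat:" i
    for ((i=0; i<count; i++)); do
        if ((i > 0)); then
            out+="|"            # separator between entries, none before the first
        fi
        out+="$file"
    done
    printf '%s\n' "$out"
}

build_concat_input 1.264 8
```

With a count of 8 the helper prints the same input string as the command above, so it can be passed to ffmpeg as -i "$(build_concat_input 1.264 8)".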
Transcode to a 32K low-bitrate stream
ffmpeg -hwaccel bmcodec -hwaccel_device 0 -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v 32K -y result.264
Transcode to a 1M high-bitrate stream
ffmpeg -hwaccel bmcodec -hwaccel_device 0 -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v 1M -y result.ts
Parameter
-hwaccel: selects the hardware acceleration API (bmcodec here).
-hwaccel_device: specifies the device index. Use bm-smi to view the available devices.
-c:v (before -i): decoder.
-output_format: format of the decoder output data. If set to 0, uncompressed linear data is output; if set to 101, compressed data is output. The default value is 0. The recommended value is 101, because compressed output saves memory and bandwidth. The compressed output can be decompressed into normal YUV data by the scale_bm filter.
-i: input file.
-vf: ffmpeg filter; scale_bm resizes the frame to the target width and height.
-c:v (after -i): encoder.
-g: keyframe interval.
-b:v: bitrate of the output stream.
-y: overwrite the output file if it already exists.
result.264, result.ts: output files.
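The two transcode commands differ only in the target bitrate and the output file. As a sketch, a small dry-run helper can print the command line for a given device, bitrate, and output, so the command can be reviewed before it is run on hardware. transcode_cmd is a hypothetical helper; it uses only the flags that appear in the commands above.

```shell
#!/bin/bash
# Print (not run) the transcode command for a device index, bitrate and output file.
transcode_cmd() {
    local device=$1 bitrate=$2 output=$3
    printf 'ffmpeg -hwaccel bmcodec -hwaccel_device %s -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v %s -y %s\n' \
        "$device" "$bitrate" "$output"
}

transcode_cmd 0 32K result.264   # low-bitrate case
transcode_cmd 0 1M result.ts     # high-bitrate case
```

Printing the command instead of executing it keeps the helper usable on machines without the Sophon ffmpeg build; on a target device the printed line can be copied into a shell as-is.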
Typical results
Chip | Scene | Test result
1684 | To 32K low-bitrate stream | (screenshot not available)
1684 | To 1M high-bitrate stream | (screenshot not available)
1684X | To 32K low-bitrate stream | (screenshot not available)
1684X | To 1M high-bitrate stream | (screenshot not available)