7.2. Video Codec Performance

This section lists video decoding, encoding, and transcoding performance data for the BM1684 and BM1684X chips. Each test is accompanied by the commands used to run it, so you can reproduce the results yourself.

Note

The test file can be obtained from the following link: http://disk-sophgo-vip.quickconnect.cn/sharing/J16mZJNRl

7.2.1. Decode capability performance test

Command line

#!/bin/bash
video=jellyfish-3-mbps-hd-h264.mkv
tpu_id=1
thread_num=8
sleep_time=30
decode_type=h264_bm

rm -rf *.log
rm -rf fpssum.txt  fps.txt  framesum.txt  frame.txt  speed.txt  timeavg.txt  timesum.txt  time.txt

for ((j=0; j<$thread_num; j++))
do
{
    nohup ffmpeg -benchmark -output_format 101 -zero_copy 1 -sophon_idx ${tpu_id} -c:v ${decode_type} -i $video -f null /dev/null >> ${video}_thread_${j}_tpu_${tpu_id}.log 2>&1 &
} &
done


echo "Start Wait End..."

sleep ${sleep_time}

echo "Start Calc fps..."


for ((j=0; j<$thread_num; j++))
do
    cat ${video}_thread_${j}_tpu_${tpu_id}.log | tail -n 7|head -n 1| awk -F '=' '{print $3}'|tr -d 'a-z'|tr -d ' ' >> fps.txt    #fps
    cat ${video}_thread_${j}_tpu_${tpu_id}.log | tail -n 7|head -n 1| awk -F '=' '{print $2}'|tr -d 'a-z'|tr -d ' ' >> frame.txt    #frames
    cat ${video}_thread_${j}_tpu_${tpu_id}.log | tail -n 5|head -n 1| awk -F '=' '{print $4}'|tr -d 'a-z'|tr -d ' ' >> time.txt    #time
done

# Sum each metric once, after every per-thread log has been parsed.
cat fps.txt | awk '{sum+=$1} END {print sum}' > fpssum.txt
cat frame.txt | awk '{sum+=$1} END {print sum}' > framesum.txt
cat time.txt | awk '{sum+=$1} END {print sum}' > timesum.txt


echo "tpuid: $tpu_id"
echo "total_frames: $(<framesum.txt)"
task_num=$(cat frame.txt | grep "." -c)
cat timesum.txt | awk '{print $1/'$task_num'}' > timeavg.txt
echo "avg_time: $(<timeavg.txt)"
f_sum=$(<framesum.txt)
time_avg=$(<timeavg.txt)
awk 'BEGIN{printf "%.2f\n",'$f_sum'/'$time_avg'}' > speed.txt
speed=$(<speed.txt)
echo "speed: ${speed}"
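The field extraction in the script above can be tried on its own without running ffmpeg. The sketch below assumes the usual shape of an ffmpeg progress line and of the "bench:" line printed by -benchmark; the two sample lines and their numbers are made up for illustration and are not measured results. The helper name extract_field is hypothetical.

```shell
# Illustrative log lines only; the numbers are invented, not test results.
progress_line='frame= 7200 fps=601 q=-0.0 Lsize=N/A time=00:04:00.00 bitrate=N/A speed=20x'
bench_line='bench: utime=3.456s stime=0.123s rtime=12.345s'

# extract_field LINE N: print the N-th "="-separated field of LINE
# with lowercase letters and spaces removed, as the script above does.
extract_field() {
    printf '%s\n' "$1" | awk -F '=' -v n="$2" '{print $n}' | tr -d 'a-z' | tr -d ' '
}

frames=$(extract_field "$progress_line" 2)   # field 2 is " 7200 fps"
fps=$(extract_field "$progress_line" 3)      # field 3 is "601 q"
rtime=$(extract_field "$bench_line" 4)       # field 4 is "12.345s"
echo "frames=$frames fps=$fps time=$rtime"
```

Because the extraction depends on field positions, it is worth checking a real log by hand if the ffmpeg build prints its progress line in a different layout.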

Parameter

  • tpu_id: Index of the TPU under test. Use bm-smi to list the available devices.

  • thread_num: The number of decoding threads, typically set to 16.

  • sleep_time: How long the script waits, in seconds, for decoding to finish before it reads the logs. Choose it according to TPU performance and the video format. For example, with the 7200-frame video in the test file, it is generally set to 150s on the 1684.

  • decode_type: The decoder to use, either h264_bm or hevc_bm.
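A fixed sleep_time has to be tuned per chip and per video. One possible alternative, shown here only as a sketch, is to start the jobs as direct background children of the script and use the shell's wait builtin to block until all of them exit. The run_job function is a hypothetical stand-in for the real ffmpeg command. Note that wait only covers direct children, so it would not work unchanged with the nested "{ nohup ... & } &" launch used in the script above.

```shell
# Hypothetical stand-in for one ffmpeg job; swap in the real ffmpeg command.
run_job() {
    sleep 1
    echo "job $1 done" > "$2/job_$1.log"
}

# Start N jobs in the background, then block until all of them have exited.
run_all() {
    n=$1
    dir=$2
    j=0
    while [ "$j" -lt "$n" ]; do
        run_job "$j" "$dir" &
        j=$((j + 1))
    done
    wait    # returns once every background child of this shell has exited
    ls "$dir" | grep -c '^job_.*\.log$'
}

workdir=$(mktemp -d)
finished=$(run_all 4 "$workdir")
echo "finished jobs: $finished"
rm -rf "$workdir"
```

With this approach the log parsing can start as soon as the slowest job ends, instead of after a worst-case delay.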

Typical results

Chip | Scene | Test Result
1684 | encode | ../_images/1684_encode.png
1684X | encode | ../_images/1684x_encode.png

7.2.2. Encode capability performance test

Command line

#!/bin/bash
video=jellyfish-3-mbps-hd-h264.mkv
tpu_id=1
thread_num=3
sleep_time=150
encode_type=h264_bm

rm -rf *.log
rm -rf fpsall.txt  fpssum.txt  frameall.txt  framesum.txt  speed.txt  timeall.txt  timeavg.txt  timesum.txt

for ((j=0; j<$thread_num; j++))
do
{
    nohup ffmpeg -benchmark -hwaccel bmcodec -hwaccel_device $tpu_id -extra_frame_buffer_num 5 -i $video -c:v ${encode_type} -g 256 -is_dma_buffer 1 -b:v 3MB -enc-params "gop_preset=2:mb_rc=1:delta_qp=3:min_qp=35:max_qp=40" -f null /dev/null >> ${video}_thread_${j}_tpu_${tpu_id}.log 2>&1 &
} &
done

echo "Start Wait End..."
sleep ${sleep_time}
echo "Start Calc fps..."

for ((j=0; j<$thread_num; j++))
do
    cat ${video}_thread_${j}_tpu_${tpu_id}.log | tail -n 7|head -n 1|awk -F "=" '{print $3}'|tr -d "a-z"|tr -d " " >> fpsall.txt
    cat ${video}_thread_${j}_tpu_${tpu_id}.log | tail -n 7|head -n 1|awk -F "=" '{print $2}'|tr -d " "|tr -d "a-z" >> frameall.txt
    cat ${video}_thread_${j}_tpu_${tpu_id}.log | tail -n 5|head -n 1|awk -F "=" '{print $4}'|tr -d "a-z"|tr -d " " >> timeall.txt
done

# Sum each metric once, after every per-thread log has been parsed.
cat fpsall.txt | awk '{sum+=$1} END {print sum}' > fpssum.txt
cat frameall.txt | awk '{sum+=$1} END {print sum}' > framesum.txt
cat timeall.txt | awk '{sum+=$1} END {print sum}' > timesum.txt

echo "tpu_id: $tpu_id"
echo "total_frames: $(<framesum.txt)"
task_num=$(cat frameall.txt | grep "." -c)
cat timesum.txt | awk '{print $1/'$task_num'}' > timeavg.txt
echo "avg_time: $(<timeavg.txt)"
f_sum=$(<framesum.txt)
time_avg=$(<timeavg.txt)
awk 'BEGIN{printf "%.2f\n",'$f_sum'/'$time_avg'}' > speed.txt
speed=$(<speed.txt)
echo "speed: ${speed}"
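The final figure printed by the script is the total number of frames across all threads divided by the average per-thread time. The sketch below reproduces that arithmetic in a single awk pass and prints "n/a" instead of dividing by zero when no log could be parsed. The function name aggregate_speed and the sample values are assumptions for illustration, not measured results.

```shell
# Reads "frames seconds" pairs (one line per thread) on stdin and prints
# total_frames / average_seconds, or "n/a" if no usable data was read.
aggregate_speed() {
    awk '
        NF >= 2 { frames += $1; seconds += $2; n++ }
        END {
            if (n == 0 || seconds == 0) { print "n/a"; exit }
            printf "%.2f\n", frames / (seconds / n)
        }'
}

# Made-up sample values: two threads, 7200 frames each, 100 s and 140 s.
speed=$(printf '7200 100\n7200 140\n' | aggregate_speed)
echo "speed: $speed"
```

Here the total is 14400 frames and the average time is 120 s, so the reported speed is 120.00 frames per second.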

Parameter

  • tpu_id: Index of the TPU under test. Use bm-smi to list the available devices.

  • thread_num: The number of encoding threads, typically set to 2 on the BM1684 chip and 12 on the BM1684X chip.

  • sleep_time: How long the script waits, in seconds, for encoding to finish before it reads the logs. Choose it according to TPU performance and the video format. For example, with 12 threads on the 1684X and the 7200-frame video in the test file, it is generally set to 150s.

  • encode_type: The encoder to use, either h264_bm or h265_bm.
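The typical thread counts listed above can be encoded in a small helper so that one script serves both chips. This is only a sketch: the function name default_encode_threads and the accepted chip spellings are assumptions, and the values are simply the typical settings quoted in the parameter list.

```shell
# Print the typical encoding thread count for a chip name.
default_encode_threads() {
    case "$1" in
        1684|BM1684)   echo 2 ;;
        1684X|BM1684X) echo 12 ;;
        *) echo "unknown chip: $1" >&2; return 1 ;;
    esac
}

thread_num=$(default_encode_threads BM1684X)
echo "thread_num=$thread_num"
```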

Typical results

Chip | Scene | Test Result
1684 | encode | ../_images/1684_encode.png
1684X | encode | ../_images/1684x_encode.png

7.2.3. Transcode capability performance test

Generate the original video required for transcoding

ffmpeg -i jellyfish-3-mbps-hd-h264.mkv -c copy -bsf:v h264_mp4toannexb -f mpegts 1.264
ffmpeg -i "concat:1.264|1.264|1.264|1.264|1.264|1.264|1.264|1.264" -c copy -bsf:a aac_adtstoasc -movflags +faststart test.264
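The second command repeats the same file eight times in the concat input. If a different repetition count is needed, the input string can be generated rather than typed by hand. The helper name build_concat_input is hypothetical; the sketch only builds the string and does not call ffmpeg.

```shell
# build_concat_input FILE COUNT: print "concat:FILE|FILE|..." with COUNT copies.
build_concat_input() {
    file=$1
    count=$2
    list=""
    i=0
    while [ "$i" -lt "$count" ]; do
        list="${list:+$list|}$file"   # add "|" only between entries
        i=$((i + 1))
    done
    printf 'concat:%s\n' "$list"
}

input=$(build_concat_input 1.264 8)
echo "$input"
```

The resulting string can then be passed to ffmpeg as -i "$input" in place of the hand-written one.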

To 32K low-bitrate stream

ffmpeg -hwaccel bmcodec -hwaccel_device 0 -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v 32K -y result.264

To 1M high-bitrate stream

ffmpeg -hwaccel bmcodec -hwaccel_device 0 -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v 1M -y result.ts

Parameter

  • -hwaccel: Selects the hardware acceleration API (bmcodec here).

  • -hwaccel_device: Specifies the device index. Use bm-smi to list the available devices.

  • -c:v (before -i): The decoder for the input stream.

  • -output_format: Format of the decoder output. If set to 0, uncompressed data is output as a linear array. If set to 101, compressed data is output. The default value is 0. The recommended value is 101, because compressed output saves memory and bandwidth. The compressed data can be decompressed into normal YUV data by the scale_bm filter.

  • -i: The input file.

  • -vf: The ffmpeg filter chain. scale_bm resizes the frame to the target width and height.

  • -c:v (after -i): The encoder for the output stream.

  • -g: The keyframe interval (GOP size).

  • -b:v: The output bitrate.

  • -y: Overwrite the output file if it already exists.

  • result.ts: The output file.
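The two transcode commands differ only in the bitrate and the output file. The sketch below prints the command for a given device, bitrate, and output name so the options can be reviewed before running anything. The function name transcode_cmd is hypothetical, and every flag is copied from the commands above.

```shell
# transcode_cmd DEVICE BITRATE OUTPUT: print (do not run) the transcode command.
transcode_cmd() {
    printf 'ffmpeg -hwaccel bmcodec -hwaccel_device %s -c:v h264_bm -output_format 101 -i test.264 -vf "scale_bm=352:288" -c:v h264_bm -g 256 -b:v %s -y %s\n' "$1" "$2" "$3"
}

transcode_cmd 0 32K result.264
transcode_cmd 0 1M result.ts
```

Printing the command first is a cheap way to confirm the device index and bitrate before starting a long transcode.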

Typical results

Chip | Scene | Test result
1684 | To 32K low-bitrate stream | ../_images/1684_transcode_1.png
1684 | To 1M high-bitrate stream | ../_images/1684_transcode_2.png
1684X | To 32K low-bitrate stream | ../_images/1684x_transcode_1.png
1684X | To 1M high-bitrate stream | ../_images/1684x_transcode_2.png