BModel

About BModel

Bmodel is a deep neural network model file format for Sophon’s TPU processors. It is generated by model compilers such as bmnetc and bmnett, and it contains the parameter information of one or more networks, including their inputs and outputs. At runtime it is loaded and used as the model file.

Bmodel also serves as the compilation output file of the BMLang programming language and is generated in the BMLang compilation phase. In this case, the bmodel contains the information of one or more BMLang functions, such as parameters, inputs and outputs.

Multi-stage bmodel description:

A stage is one compiled variant of a network for a particular input_shape. To build a multi-stage bmodel, compile bmodels with different input shapes and then use tpu_model to combine them into one bmodel; each bmodel contained in it is a stage. Stage_num indicates the number of combined bmodels, and stage_num=1 for a bmodel that has not been combined. When running a model with a given shape, bmruntime automatically chooses the stage with the same input shape.

Combining bmodels for several common input shapes improves running efficiency and achieves the effect of dynamic operation. For example, compile two bmodels with the inputs [1,3,200,200] and [2,3,200,200] and combine them. When the combined bmodel is run with the input [2,3,200,200], the stage compiled for [2,3,200,200] is found and run automatically.

Alternatively, compile two bmodels with the inputs [1,3,200,200] and [1,3,100,100] to build a model that supports both 200*200 and 100*100 inputs.
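
The shape-matching rule described above can be sketched in a few lines of Python. This is only an illustration of exact-match stage selection, not bmruntime's actual implementation; the function name select_stage and the list-of-shapes representation are hypothetical.

```python
from typing import Optional, Sequence


def select_stage(stage_shapes: Sequence[Sequence[int]],
                 input_shape: Sequence[int]) -> Optional[int]:
    """Return the index of the stage compiled for exactly input_shape.

    Hypothetical helper that mirrors the exact-match rule for static stages.
    Returns None when no stage was compiled for the requested shape.
    """
    wanted = tuple(input_shape)
    for index, shape in enumerate(stage_shapes):
        if tuple(shape) == wanted:
            return index
    return None


# Two stages combined into one bmodel, as in the example above.
stages = [[1, 3, 200, 200], [2, 3, 200, 200]]
chosen = select_stage(stages, [2, 3, 200, 200])    # second stage
missing = select_stage(stages, [4, 3, 200, 200])   # no stage for this shape
```

A shape that was never compiled as a stage has no match in this sketch, which is why the set of stages should cover every input shape the application will use.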

Static bmodel description:

  1. A static bmodel stores atomic operation instructions with fixed parameters that can be used directly on the chip. The TPU reads these instructions automatically and executes them in a pipeline without interruption.

  2. When a static bmodel is executed, the input size must be the same as the size used at compile time.

  3. Because the static interface is simple and stable, a model compiled with a new sdk can usually run on an old machine without updating the firmware. Note that even when static compilation is specified, some operators with heavy logical processing, such as sorting, nms, where and detect_out, require the MCU inside the TPU or the host cpu. These parts are split into subnets and implemented dynamically. If a model compiled with an updated sdk is dynamic or contains such dynamic parts, it is preferable to update the firmware so that the sdk and runtime stay consistent. This can be judged from the output of tpu_model --info xx.bmodel: for static compilation, a subnet number of 1 indicates a purely static network. See the section on tpu_model use for details.

  4. If the input shape only has several fixed discrete cases, the multi-stage bmodel aforementioned may be used to achieve the effect of the dynamic model.

Dynamic bmodel description:

  1. A dynamic bmodel stores the parameter information of each operator and cannot run directly on the TPU. The MCU inside the TPU must parse the parameters layer by layer, perform shape and dtype inference, and call atomic operations to implement each function. As a result, a dynamic bmodel performs worse than a static one.

  2. When running a dynamic bmodel on the bm168x platform, it is preferable to enable icache. Otherwise, the bmodel will run slowly.

  3. At compile time, specify the maximum supported shape through the shapes parameter. At run time, every dimension except the c-dimension may vary. Dynamic compilation should normally be considered when there are too many variable shapes; otherwise, the multi-stage bmodel is recommended.

  4. To keep parameters extensible and compatible, the runtime of a new sdk is guaranteed to run dynamic bmodels compiled with an older sdk. It is usually recommended to refresh the machine's firmware after switching to a new sdk so that the two versions stay consistent.
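
The shape rule in item 3 can be illustrated with a small check. This is a minimal sketch under stated assumptions, not the runtime's own validation: it assumes the c-dimension is axis 1 (NCHW layout), that the rank of the input does not change, and that a variable dimension may take any value from 1 up to the compiled maximum. The function name fits_dynamic_bmodel is hypothetical.

```python
from typing import Sequence


def fits_dynamic_bmodel(max_shape: Sequence[int],
                        runtime_shape: Sequence[int],
                        c_axis: int = 1) -> bool:
    """Check a runtime shape against the maximum shape given at compile time.

    Assumptions (not taken from the toolchain): c_axis is 1, the rank is
    fixed, and every other dimension may range from 1 to its maximum.
    """
    if len(max_shape) != len(runtime_shape):
        return False
    for axis, (limit, actual) in enumerate(zip(max_shape, runtime_shape)):
        if axis == c_axis:
            # The c-dimension must stay exactly as compiled.
            if actual != limit:
                return False
        elif not 1 <= actual <= limit:
            return False
    return True


smaller_ok = fits_dynamic_bmodel([4, 3, 200, 200], [2, 3, 100, 100])
channel_changed = fits_dynamic_bmodel([4, 3, 200, 200], [2, 1, 100, 100])
too_large = fits_dynamic_bmodel([4, 3, 200, 200], [8, 3, 200, 200])
```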

Tpu_model use

With the tpu_model tool, you can view the parameter information of a bmodel file, decompose a multi-network bmodel into multiple single-network bmodels, or combine multiple bmodels into one bmodel.

Currently, the following six operation methods are available:

  1. View brief information (commonly used)

tpu_model --info xxx.bmodel

The output information is described as follows:

bmodel version: B.2.2                         # bmodel version No.
chip: BM1684                                  # Chip type supported
create time: Mon Apr 11 13:37:45 2022         # Creation time

==========================================    # Network dividing line: one per network
                                              # when the bmodel contains several nets.
net 0: [informer_frozen_graph]  static        # Network name (informer_frozen_graph) and
                                              # type: static here, or dynamic for a
                                              # dynamically compiled network.
------------                                  # Stage dividing line: one per stage when a
                                              # network contains several stages.
stage 0:                                      # Information of the first stage
subnet number: 41                             # Number of subnets in this stage. Subnets are
                                              # split at compile time to support switching
                                              # between devices. Normally, the fewer the
                                              # subnets, the better.
input: x_1, [1, 600, 9], float32, scale: 1    # Input and output information: name, shape, data type, quantization scale
input: x_3, [1, 600, 9], float32, scale: 1
input: x, [1, 500, 9], float32, scale: 1
input: x_2, [1, 500, 9], float32, scale: 1
output: Identity, [1, 400, 7], float32, scale: 1

device mem size: 942757216 (coeff: 141710112, instruct: 12291552, runtime: 788755552)
# Memory occupied by the model on the TPU, in bytes. Format: total size
# (constant memory, instruction memory, runtime data memory).
host mem size: 8492192 (coeff: 32, runtime: 8492160)
# Memory occupied on the host, in bytes. Format: total size
# (constant memory, runtime data memory).
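
The --info output is plain text, so the fields described above can be extracted with a short script, for example to check whether every stage has a subnet number of 1. The sketch below assumes the line format shown in the sample output; the real output may differ between sdk versions. The names parse_info and is_purely_static are hypothetical helpers, not part of tpu_model.

```python
import re
from typing import Dict, List

# Excerpt of the sample output above, without the explanatory comments.
SAMPLE_INFO = """\
net 0: [informer_frozen_graph]  static
------------
stage 0:
subnet number: 41
input: x_1, [1, 600, 9], float32, scale: 1
input: x, [1, 500, 9], float32, scale: 1
output: Identity, [1, 400, 7], float32, scale: 1
"""

SUBNET_RE = re.compile(r"^subnet number:\s*(\d+)")
TENSOR_RE = re.compile(r"^(input|output):\s*([^,]+),\s*\[([^\]]*)\],\s*(\w+)")


def parse_info(text: str) -> Dict[str, List]:
    """Collect subnet numbers and tensor descriptions from --info text."""
    summary: Dict[str, List] = {"subnet_numbers": [], "inputs": [], "outputs": []}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        subnet = SUBNET_RE.match(line)
        if subnet:
            summary["subnet_numbers"].append(int(subnet.group(1)))
            continue
        tensor = TENSOR_RE.match(line)
        if tensor:
            kind, name, dims, dtype = tensor.groups()
            shape = [int(d) for d in dims.split(",") if d.strip()]
            summary[kind + "s"].append((name.strip(), shape, dtype))
    return summary


def is_purely_static(summary: Dict[str, List]) -> bool:
    """True when every listed stage has exactly one subnet."""
    numbers = summary["subnet_numbers"]
    return bool(numbers) and all(n == 1 for n in numbers)


info = parse_info(SAMPLE_INFO)
```

For the sample above, the stage has 41 subnets, so is_purely_static(info) is False even though the network is marked static.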

  2. View detailed parameter information

tpu_model --print xxx.bmodel

  3. Decompose

tpu_model --extract xxx.bmodel

Decompose a bmodel that contains several networks, each with several stages, into separate bmodels that each contain a single stage of a single network. The decomposed bmodels are named bm_net0_stage0.bmodel, bm_net1_stage0.bmodel and so on, according to the net and stage indices.
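
The naming scheme can be made concrete with a short sketch that lists the file names expected after extraction. The pattern bm_net{N}_stage{S}.bmodel is extrapolated from the two names given above, and extracted_names is a hypothetical helper, not a tpu_model feature.

```python
from typing import List, Sequence


def extracted_names(stages_per_net: Sequence[int]) -> List[str]:
    """List the expected file names, given the stage count of each network.

    Assumes net and stage indices both start at 0, following the
    bm_net0_stage0.bmodel example.
    """
    return [
        f"bm_net{net}_stage{stage}.bmodel"
        for net, count in enumerate(stages_per_net)
        for stage in range(count)
    ]


# A bmodel with two networks: net 0 has two stages, net 1 has one.
names = extracted_names([2, 1])
```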

  4. Combine

tpu_model --combine a.bmodel b.bmodel c.bmodel -o abc.bmodel

Combine multiple bmodels into one bmodel. -o is used to specify the output file name. If not specified, it is named compilation.bmodel by default.

Upon the combination of multiple bmodels:

  • Combination of bmodels with different net_names: The interface will select the corresponding network for inference according to net_name.

  • Combination of bmodels with the same net_name: The network with that net_name then supports multiple stages, that is, multiple input shapes. The interface selects among the stages of the network according to the input shape you provide. For a static network, the stage that exactly matches the shape is selected. For a dynamic network, the nearest stage is selected.

Restrictions: When combining bmodels with the same net_name, all of them must be statically compiled or all of them must be dynamically compiled. Combining static and dynamic compilations of the same net_name is not supported.
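
The restriction can be checked before calling combine. The sketch below is a hypothetical pre-check, not part of tpu_model: it assumes you already know each bmodel's net_name and compilation type (for example from tpu_model --info) and reports the net_names that mix static and dynamic compilation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_mixed_nets(models: Iterable[Tuple[str, str]]) -> List[str]:
    """Return net_names that appear with both "static" and "dynamic" types.

    models is an iterable of (net_name, kind) pairs, one per input bmodel.
    """
    kinds: Dict[str, Set[str]] = defaultdict(set)
    for net_name, kind in models:
        kinds[net_name].add(kind)
    return sorted(name for name, seen in kinds.items() if len(seen) > 1)


# Hypothetical inputs: "resnet" is compiled both ways, "yolo" only statically.
conflicts = find_mixed_nets([
    ("resnet", "static"),
    ("resnet", "dynamic"),
    ("yolo", "static"),
    ("yolo", "static"),
])
```

An empty result means the inputs satisfy the restriction; any name in the result would need to be recompiled consistently before combining.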

  5. Combine folders

tpu_model --combine_dir a_dir b_dir c_dir -o abc_dir

This works like combine, but in addition to the bmodel it also combines the input and output reference files used for testing. It operates on folders, each of which must contain the three files generated by the compiler: input_ref_data.dat, output_ref_data.dat and compilation.bmodel.

  6. Export binary data

tpu_model --dump xxx.bmodel start_offset byte_size out_file

Save binary data from the bmodel to a file. Use the print function to view the [start, size] of every binary data block; these values correspond to start_offset and byte_size.
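
When these operations are scripted, it can help to build the command lines in one place. The following sketch only assembles argument lists from the options documented above; the helper function names are hypothetical, and actually running a command requires tpu_model to be installed and on the PATH.

```python
from typing import List, Sequence

TOOL = "tpu_model"


def info_cmd(bmodel: str) -> List[str]:
    return [TOOL, "--info", bmodel]


def print_cmd(bmodel: str) -> List[str]:
    return [TOOL, "--print", bmodel]


def extract_cmd(bmodel: str) -> List[str]:
    return [TOOL, "--extract", bmodel]


def combine_cmd(bmodels: Sequence[str], output: str) -> List[str]:
    return [TOOL, "--combine", *bmodels, "-o", output]


def combine_dir_cmd(dirs: Sequence[str], output_dir: str) -> List[str]:
    return [TOOL, "--combine_dir", *dirs, "-o", output_dir]


def dump_cmd(bmodel: str, start_offset: int, byte_size: int,
             out_file: str) -> List[str]:
    # start_offset and byte_size come from the [start, size] pairs
    # reported by the print function.
    return [TOOL, "--dump", bmodel, str(start_offset), str(byte_size), out_file]


combine_args = combine_cmd(["a.bmodel", "b.bmodel", "c.bmodel"], "abc.bmodel")
dump_args = dump_cmd("xxx.bmodel", 0, 1024, "out_file")
```

Each list can be passed to subprocess.run(args, check=True) on a machine where the tool is available. The offset 0 and size 1024 above are placeholder values, not offsets from a real bmodel.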