4. Overall Design

4.1. Layered

TPU-MLIR divides the compilation of a network model into two layers.

Top Dialect

The chip-independent layer, which covers graph optimization, quantization, and inference.

Tpu Dialect

The chip-related layer, which covers weight reordering, operator slicing, address assignment, and inference.

The overall flow is shown in Fig. 4.1 (TPU-MLIR overall process), where Passes gradually convert the model into the final instructions. The sections below briefly describe what each Pass does in the Top layer and the Tpu layer. The following chapters explain the key points of each Pass in detail.


Fig. 4.1 TPU-MLIR overall process
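
The idea of a model being converted step by step through Passes can be sketched in a few lines of Python. This is only an illustration of the pipeline concept, not TPU-MLIR code: the `Module` dictionary, the helper functions, and the pass names are all hypothetical.

```python
from typing import Callable, Dict, List

# A toy "module": records the current dialect and the passes applied so far.
Module = Dict[str, object]
Pass = Callable[[Module], Module]


def make_pass(name: str, dialect_after: str) -> Pass:
    """Build a hypothetical pass that records its name and the resulting dialect."""
    def run(module: Module) -> Module:
        history = list(module["history"]) + [name]
        return {"dialect": dialect_after, "history": history}
    return run


def run_pipeline(module: Module, passes: List[Pass]) -> Module:
    """Apply passes in order; each pass consumes the output of the previous one."""
    for p in passes:
        module = p(module)
    return module


# Hypothetical pass names, chosen only for illustration.
pipeline = [
    make_pass("top-optimize", "top"),      # stays in the chip-independent layer
    make_pass("lower-top-to-tpu", "tpu"),  # crosses into the chip-related layer
    make_pass("tpu-optimize", "tpu"),      # stays in the chip-related layer
]

result = run_pipeline({"dialect": "top", "history": []}, pipeline)
print(result["dialect"], result["history"])
```

The point of the sketch is that the layer boundary is crossed exactly once, by the lowering step; every other pass rewrites the model within its own layer.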

4.2. Top Pass


Perform shape inference and constant folding.


Graph optimization related to specific OP, such as merging relu into conv, shape merge, etc.


Apply extra patterns, such as computing FLOPs and removing unused outputs.


Assign the target chip, such as bm1684x or cv183x, and adjust the top mlir for that chip; for example, set all cv18xx input types to F32.


Import the calibration table and assign min and max values to all ops, for use in quantization later.


Optimize top ops for the assigned chip.


Lower top ops to tpu ops. In F32/F16/BF16 mode, a top op is normally converted directly to a tpu op; in INT8 mode, quantization is required.
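
To illustrate why the min and max values from the calibration table matter for INT8 lowering, here is a minimal sketch of symmetric per-tensor quantization. The scheme is an assumption made for illustration: this section does not state which quantization formula TPU-MLIR applies, and the function names are hypothetical.

```python
def quantize_int8(values, cali_min, cali_max):
    """Quantize floats to INT8 using a symmetric threshold (assumed scheme).

    The threshold is the larger magnitude of the calibrated min and max,
    and it is mapped to the INT8 value 127.
    """
    threshold = max(abs(cali_min), abs(cali_max))
    scale = threshold / 127.0 if threshold > 0 else 1.0
    # Round to the nearest integer, then clamp to the INT8 range.
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 values back to approximate floats."""
    return [q * scale for q in quantized]


q, scale = quantize_int8([-1.0, 0.0, 0.25, 2.0], cali_min=-1.0, cali_max=1.0)
print(q, dequantize(q, scale))
```

Values outside the calibrated range, such as 2.0 in the example, saturate at the INT8 limit, which is why the calibration step needs representative data.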

4.3. Tpu Pass


Graph optimization related to specific OP, such as merging of consecutive Requants, etc.


If this option is true, the model's input and output types are kept quantized; otherwise they are F32.


Optimize tpu ops for the assigned chip.


Reorder the weights of individual OP based on chip characteristics, such as filter and bias for convolution.


Split the network into subnets according to whether operators run on the TPU or the CPU. If all operators run on the TPU, there is only one subnet.


Reorder ops so that each op is placed close to its users.


Slice the network so that as many OPs as possible are computed consecutively in local memory.


Assign addresses to the OPs that need global memory.


Use the Builder module to generate the final model in flatbuffers format.
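
The TPU/CPU subnet split described in this section can be sketched for the simplest case, a linear chain of ops. This is an assumption-laden illustration: a real network is a graph rather than a chain, and the function name, op names, and the `cpu_only` set are hypothetical.

```python
from itertools import groupby


def divide_subnets(ops, cpu_only):
    """Group consecutive ops that run on the same device into one subnet.

    ops:      op names in execution order (a linear chain is assumed)
    cpu_only: set of op names that must run on the CPU
    """
    tagged = [("CPU" if op in cpu_only else "TPU", op) for op in ops]
    return [
        (device, [op for _, op in group])
        for device, group in groupby(tagged, key=lambda item: item[0])
    ]


# All ops run on the TPU: a single subnet.
all_tpu = divide_subnets(["conv", "relu", "pool"], cpu_only=set())

# One CPU op in the middle: the chain is cut into three subnets.
mixed = divide_subnets(["conv", "topk", "matmul"], cpu_only={"topk"})
print(all_tpu, mixed)
```

Each CPU op in the middle of a chain forces a device switch on both sides, so keeping operators on the TPU reduces the number of subnets.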

4.4. Other Passes

Some optional passes, not shown in the diagram, provide special functions.


Fuse image preprocessing into the model.


Fuse postprocessing into the model; only ssd and yolo are supported.