13. MLIR Definition

This chapter introduces the definition of each MLIR element, including dialects, operations, and interfaces.

13.1. Top Dialect

13.1.1. Operations

13.1.1.1. AddOp

Brief intro

Add operation, \(Y = coeff_0 * X_0 + coeff_1 * X_1\)

Input
  • inputs: tensor array, corresponding to 2 or more input tensors

Output
  • output: tensor

Attributes
  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

  • coeff: the coefficient corresponding to each tensor, 1.0 by default

Interface

None

Example
%2 = "top.Add"(%0, %1) {do_relu = false} : (tensor<1x3x27x27xf32>, tensor<1x3x27x27xf32>) -> tensor<1x3x27x27xf32> loc("add")

13.1.1.2. AvgPoolOp

Brief intro

Perform average pooling on the input tensor, \(S=\frac{1}{width\ *\ height}\sum_{i,j}a_{ij}\), where \(width\) and \(height\) are the width and height of kernel_shape, and \(\sum_{i,j}a_{ij}\) sums the elements inside the window. A sliding window of the given size moves over the input tensor and pools each position in turn

Input
  • input: tensor

Output
  • output: tensor

Attributes
  • kernel_shape: controls the size of the sliding window

  • strides: step size, controlling each step of the sliding window

  • pads: controls the shape of the padding

  • pad_value: padding content, constant, 0 by default

  • count_include_pad: whether the padded values are included when computing the average

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%90 = "top.AvgPool"(%89) {do_relu = false, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]} : (tensor<1x256x20x20xf32>) -> tensor<1x256x20x20xf32> loc("resnetv22_pool1_fwd_GlobalAveragePool")

13.1.1.3. Depth2SpaceOp

Brief intro

Depth to space operation, \(Y = Depth2Space(X)\)

Input
  • input: tensor

Output
  • output: tensor

Attributes
  • block_h: tensor block size of h dimension, i64 type

  • block_w: tensor block size of w dimension, i64 type

  • is_CRD: column-row-depth. If true, the data is arranged in the depth direction according to the order of HWC, otherwise it is CHW, bool type

  • is_inversed: if true, the shape of the result is: \([n, c * block_h * block_w, h / block_h, w / block_w]\), otherwise it is: \([n, c / (block_h * block_w), h * block_h, w * block_w]\), bool type

Interface

None

Example
%2 = "top.Depth2Space"(%0) {block_h = 2, block_w = 2, is_CRD = true, is_inversed = false} : (tensor<1x8x2x3xf32>) -> tensor<1x2x4x6xf32> loc("add")

13.1.1.4. BatchNormOp

Brief intro

Perform Batch Normalization on a 4D input tensor. More details on batch normalization can be found in the paper “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”.

The specific calculation formula is as follows:

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
Input
  • input: 4D input tensor

  • mean: mean of the input tensor

  • variance: variance of the input tensor

  • gamma: \(\gamma\) tensor in the formula, can be None

  • beta: \(\beta\) tensor in the formula, can be None

Output
  • output: tensor

Attributes
  • epsilon: constant \(\epsilon\) in formula, 1e-05 by default

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%5 = "top.BatchNorm"(%0, %1, %2, %3, %4) {epsilon = 1e-05, do_relu = false} : (tensor<1x3x27x27xf32>, tensor<3xf32>, tensor<3xf32>, tensor<3xf32>, tensor<3xf32>) -> tensor<1x3x27x27xf32> loc("BatchNorm")

13.1.1.5. CastOp

(To be implemented)

13.1.1.6. ClipOp

Brief intro

Clamp each element of the given input to the range [min, max]

Input
  • input: tensor

Output
  • output: tensor

Attributes
  • min: the lower limit

  • max: the upper limit

Interface

None

Example
%3 = "top.Clip"(%0) {max = 1%: f64,min = 2%: f64} : (tensor<1x3x32x32xf32>) -> tensor<1x3x32x32xf32> loc("Clip")

13.1.1.7. ConcatOp

Brief intro

Concatenates the given sequence of tensors in the given dimension. All input tensors either have the same shape (except the dimension to be concatenated) or are all empty.

Input
  • inputs: tensor array, corresponding to 2 or more input tensors

Output
  • output: tensor

Attributes
  • axis: the index of the dimension along which to concatenate

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%2 = "top.Concat"(%0, %1) {axis = 1, do_relu = false} : (tensor<1x3x27x27xf32>, tensor<1x3x27x27xf32>)  -> tensor<1x6x27x27xf32> loc("Concat")

13.1.1.8. ConvOp

Brief intro

Perform 2D convolution operation on the input tensor.

In simple terms, the size of the given input is \((N, C_{\text{in}}, H, W)\). The output \((N, C_{\text{out}}, H_{ \text{out}}, W_{\text{out}})\) is calculated as:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k),\]

where \(\star\) is a valid cross-correlation operation, \(N\) is the batch size, \(C\) is the number of channels, \(H, W\) is the input image height and width.

Input
  • input: tensor

  • filter: parameter tensor. The shape is

\((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size[0]}, \text{kernel\_size[1]})\)

  • bias: learnable bias tensor with the shape of \((out\_channels)\)

Output
  • output: tensor

Attributes
  • kernel_shape: the size of the convolution kernel

  • strides: strides of convolution

  • pads: the amount of zero padding added to each side of the input

  • group: the number of blocked connections from the input channel to the output channel, the default is 1

  • dilations: the spacing between convolution kernel elements, optional

  • inserts: optional

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%2 = "top.Conv"(%0, %1) {kernel_shape = [3, 5], strides = [2, 1], pads = [4, 2]} : (tensor<20x16x50x100xf32>, tensor<33x3x5xf32>)  -> tensor<20x33x28x49xf32> loc("Conv")

13.1.1.9. DeconvOp

Brief intro

Perform a deconvolution operation on the input tensor.

Input
  • input: tensor

  • filter: parameter tensor. The shape is

\((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size[0]}, \text{kernel\_size[1]})\)

  • bias: learnable bias tensor with the shape of \((out\_channels)\)

Output
  • output: tensor

Attributes
  • kernel_shape: the size of the convolution kernel

  • strides: strides of convolution

  • pads: the amount of zero padding added to each side of the input

  • group: the number of blocked connections from the input channel to the output channel, the default is 1

  • dilations: the spacing between convolution kernel elements, optional

  • inserts: optional

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%2 = "top.Deconv"(%0, %1) {kernel_shape = (3, 5), strides = (2, 1), pads = (4, 2)} : (tensor<20x16x50x100xf32>, tensor<33x3x5xf32>)  -> tensor<20x33x28x49xf32> loc("Deconv")

13.1.1.10. DivOp

Brief intro

Division operation, \(Y = X_0 / X_1\)

Input
  • inputs: tensor array, corresponding to 2 input tensors

Output
  • output: tensor

Attributes
  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

  • multiplier: the multiplier for quantization, the default is 1

  • rshift: right shift for quantization, 0 by default

Interface

None

Example
%2 = "top.Div"(%0, %1) {do_relu = false, relu_limit = -1.0, multiplier = 1, rshift = 0} : (tensor<1x3x27x27xf32>, tensor<1x3x27x27xf32>) -> tensor<1x3x27x27xf32> loc("div")

13.1.1.11. InputOp

(To be implemented)

13.1.1.12. LeakyReluOp

Brief intro

Apply the LeakyRelu function to each element in the tensor. The function can be expressed as: \(f(x) = \alpha x\) for \(x < 0\), \(f(x) = x\) for \(x \ge 0\)

Input
  • input: tensor

Output
  • output: tensor

Attributes
  • alpha: the slope applied to negative input elements

Interface

None

Example
%4 = "top.LeakyRelu"(%3) {alpha = 0.67000001668930054 : f64} : (tensor<1x32x100x100xf32>) -> tensor<1x32x100x100xf32> loc("LeakyRelu")

13.1.1.13. LSTMOp

Brief intro

Perform the LSTM (long short-term memory) recurrent operation

Input
  • input: tensor

  • filter: weight tensor

  • recurrence: recurrent weight tensor

  • bias: bias tensor of the LSTM

  • initial_h: initial hidden state. Each sequence processed by the LSTM yields a state tuple (c, h) after the current cell, where h = [batch_size, hidden_size]

  • initial_c: initial cell state, where c = [batch_size, hidden_size]

Output
  • output: tensor

Attributes
  • have_bias: whether a bias is used, the default is false

  • bidirectional: whether the LSTM is bidirectional, the default is false

  • batch_first: whether the batch is the first dimension, the default is false

Interface

None

Example
%6 = "top.LSTM"(%0, %1, %2, %3, %4, %5) {batch_first = false, bidirectional = true, have_bias = true} : (tensor<75x2x128xf32>,tensor<2x256x128xf32>, tensor<2x256x64xf32>, tensor<2x512xf32>, tensor<2x2x64xf32>, tensor<2x2x64xf32>) -> tensor<75x2x2x64xf32> loc("LSTM")

13.1.1.14. LogOp

Brief intro

Perform element-wise logarithm on the given input tensor

Input
  • input: tensor

Output
  • output: tensor

Attributes

None

Interface

None

Example
%1 = "top.Log"(%0) : (tensor<1x3x32x32xf32>) -> tensor<1x3x32x32xf32> loc("Log")

13.1.1.15. MaxPoolOp

Brief intro

Perform max pooling on the given input tensor

Input
  • input: tensor

Output
  • output: tensor

Attributes
  • kernel_shape: controls the size of the sliding window

  • strides: step size, controlling each step of the sliding window

  • pads: controls the shape of the padding

  • pad_value: padding content, constant, 0 by default

  • count_include_pad: whether the padded values are included in the pooling computation

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%8 = "top.MaxPool"(%7) {do_relu = false, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]} : (tensor<1x256x20x20xf32>) -> tensor<1x256x20x20xf32> loc("resnetv22_pool0_fwd_MaxPool")

13.1.1.16. MatMulOp

Brief intro

2D matrix multiplication operation, \(C = A * B\)

Input
  • input: tensor, a matrix of size m*k

  • right: tensor, a matrix of size k*n

Output
  • output: tensor, a matrix of size m*n

Attributes
  • bias: the bias_scale will be calculated according to the bias during quantization (can be empty)

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%2 = "top.MatMul"(%0, %1) {do_relu = false, relu_limit = -1.0} : (tensor<3x4xf32>, tensor<4x5xf32>) -> tensor<3x5xf32> loc("matmul")

13.1.1.17. MulOp

Brief intro

Multiplication operation, \(Y = X_0 * X_1\)

Input
  • inputs: tensor array, corresponding to 2 or more input tensors

Output
  • output: tensor

Attributes
  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

  • multiplier: the multiplier for quantization, the default is 1

  • rshift: right shift for quantization, default is 0

Interface

None

Example
%2 = "top.Mul"(%0, %1) {do_relu = false, relu_limit = -1.0, multiplier = 1, rshift = 0} : (tensor<1x3x27x27xf32>, tensor<1x3x27x27xf32>) -> tensor<1x3x27x27xf32> loc("mul")

13.1.1.18. MulConstOp

Brief intro

Multiply by a constant, \(Y = X * const\_val\)

Input
  • input: tensor

Output
  • output: tensor

Attributes
  • const_val: constants of type f64

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%1 = "top.MulConst"(%0) {const_val = 4.700000e+00 : f64, do_relu = false, relu_limit = -1.0} : (tensor<1x3x27x27xf64>) -> tensor<1x3x27x27xf64> loc("mulconst")

13.1.1.19. PermuteOp

Brief intro

Change the tensor layout: reorder the dimensions of the input tensor according to the given order

Input
  • input: tensor of any type

Attributes
  • order: the order in which tensors are rearranged

Output
  • output: rearranged tensor

Interface

None

Example
%2 = "top.Permute"(%1) {order = [0, 1, 3, 4, 2]} : (tensor<4x3x85x20x20xf32>) -> tensor<4x3x20x20x85xf32> loc("output_Transpose")

13.1.1.20. ReluOp

Brief intro

Perform the ReLU function on each element of the input tensor; if the limit is zero, no upper limit is applied

Input
  • input: tensor

Output
  • output: tensor

Attributes
  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%1 = "top.Relu"(%0) {relu_limit = 6.000000e+00 : f64} : (tensor<1x3x32x32xf32>) -> tensor<1x3x32x32xf32> loc("Clip")

13.1.1.21. ReshapeOp

Brief intro

Reshape operator, which returns a tensor of the given shape with the same type and values as the input tensor. Reshape may act on any dimension of the tensor; no data values are modified during the reshaping process

Input
  • input: tensor

Output
  • output: tensor

Attributes

None

Interface

None

Example
%133 = "top.Reshape"(%132) : (tensor<1x255x20x20xf32>) -> tensor<1x3x85x20x20xf32> loc("resnetv22_flatten0_reshape0_Reshape")

13.1.1.22. ScaleOp

Brief intro

Scale operation \(Y = X * S + B\), where the shape of X/Y is [N, C, H, W], and the shape of S/B is [1, C, 1, 1].

Input
  • input: tensor

  • scale: the magnification of the input

  • bias: the bias added after scaling

Output
  • output: tensor

Attributes
  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Interface

None

Example
%3 = "top.Scale"(%0, %1, %2) {do_relu = false} : (tensor<1x3x27x27xf32>, tensor<1x3x1x1xf32>, tensor<1x3x1x1xf32>) -> tensor<1x3x27x27xf32> loc("Scale")

13.1.1.23. SigmoidOp

Brief intro

The activation function, which maps elements in the tensor to a specific interval, [0, 1] by default. The calculation method is:

\[Y = \frac{scale}{1 + e^{-X}} + bias\]
Input
  • input: tensor of any type

Attributes
  • scale: the magnification of the input, 1 by default

  • bias: default is 0

Output
  • output: tensor

Interface

None

Example
%2 = "top.Sigmoid"(%1) {bias = 0.000000e+00 : f64, scale = 1.000000e+00 : f64} : (tensor<1x16x64x64xf32>) -> tensor<1x16x64x64xf32> loc("output_Sigmoid")

13.1.1.24. SiLUOp

Brief intro

The activation function, \(Y = \frac{X}{1 + e^{-X}}\) or \(Y = X * Sigmoid(X)\)

Input
  • input: tensor of any type

Attributes

None

Output
  • output: tensor

Interface

None

Example
%1 = "top.SiLU"(%0) : (tensor<1x16x64x64xf32>) -> tensor<1x16x64x64xf32> loc("output_Mul")

13.1.1.25. SliceOp

Brief intro

Tensor slice: slice each dimension of the input tensor according to the offsets and step sizes given in the offset and steps arrays, producing a new tensor

Input
  • input: tensor of any type

Attributes
  • offset: an array for storing slice offsets. The index of the offset array corresponds to the dimension index of the input tensor

  • steps: an array that stores the step size of the slice. The index of the steps array corresponds to the index of the input tensor dimension

Output
  • output: tensor

Interface

None

Example
%1 = "top.Slice"(%0) {offset = [2, 10, 10, 12], steps = [1, 2, 2, 3]} : (tensor<5x116x64x64xf32>) -> tensor<3x16x16x8xf32> loc("output_Slice")

13.1.1.26. SoftmaxOp

Brief intro

For the input tensor, the normalized index value is calculated on the dimension of the specified axis. The calculation method is as follows:

\[\sigma(Z)_i = \frac{e^{\beta{Z_i}}}{\sum_{j=0}^{K-1}{e^{\beta{Z_j}}}},\]

where \(\sum_{j=0}^{K-1}{e^{\beta{Z_j}}}\) does the exponential summation on the axis dimension. j ranges from 0 to K-1 and K is the size of the input tensor in the axis dimension.

For example, the size of the input tensor is \((N, C, W, H)\), and the Softmax is calculated on the channel of axis=1. The calculation method is:

\[Y_{n,i,w,h} = \frac{e^{\beta{X_{n,i,w,h}}}}{\sum_{j=0}^{C-1}{e^{\beta{X_{n,j,w,h}}}}}\]
Input
  • input: tensor array, tensor of any types

Attributes
  • axis: dimension index, which specifies the dimension along which to perform softmax. It can take a value in [-r, r-1], where r is the number of dimensions of the input tensor; a negative axis counts from the last dimension

  • beta: The scaling factor for the input in the tflite model, invalid for non-tflite models, 1.0 by default.

Output
  • output: the tensor on which the softmax is performed.

Interface

None

Example
%1 = "top.Softmax"(%0) {axis = 1 : i64} : (tensor<1x1000x1x1xf32>) -> tensor<1x1000x1x1xf32> loc("output_Softmax")

13.1.1.27. SqueezeOp

Brief intro

Remove the specified dimensions from the input tensor and return the resulting tensor

Input
  • input: tensor

Output
  • output: tensor

Attributes
  • axes: specifies the dimensions to be removed. 0 represents the first dimension and -1 represents the last dimension

Interface

None

Example
%133 = "top.Squeeze"(%132) {axes = [-1]} : (tensor<1x255x20x20xf32) -> tensor<1x255x20xf32> loc(#loc278)

13.1.1.28. UpsampleOp

Brief intro

Upsampling op: upsample the input tensor with nearest-neighbor interpolation and return the resulting tensor

Input
  • input: tensor

Attributes
  • scale_h: the ratio of the height of the target image to the original image

  • scale_w: the ratio of the width of the target image to the original image

  • do_relu: whether to perform Relu operation on the result, False by default

  • relu_limit: specify the upper limit value if doing Relu. There is no upper limit if it is a negative number

Output
  • output: tensor

Interface

None

Example
%179 = "top.Upsample"(%178) {scale_h = 2 : i64, scale_w = 2 : i64} : (tensor<1x128x40x40xf32>) -> tensor<1x128x80x80xf32> loc("268_Resize")

13.1.1.29. WeightOp

Brief intro

The weight op, including the reading and creation of weights. Weights will be stored in the npz file. The location of the weight corresponds to the tensor name in npz.

Input

None

Attributes

None

Output
  • output: weight Tensor

Interface
  • read: read weight data, the type is specified by the model

  • read_as_float: convert the weight data to float type for reading

  • read_as_byte: read the weight data in byte type

  • create: create weight op

  • clone_bf16: convert the current weight to bf16 and create a weight Op

  • clone_f16: convert the current weight to f16 and create a weight Op

Example
%1 = "top.Weight"() : () -> tensor<32x16x3x3xf32> loc("filter")