TPU Architecture

This section describes the architecture of the Sophon TPU BM1684x.

The Sophon TPU has a multi-core architecture in which each core is called a Neural network Processing Unit (NPU). Each NPU has its own independent memory and multiple kinds of Execution Units (EUs).

All NPUs execute instructions in Single Instruction Multiple Data (SIMD) fashion: at any given moment, every NPU executes the same instruction, but each NPU operates on different data.
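The SIMD model can be pictured with a minimal Python simulation. This is not the TPU's real programming interface. The names `NUM_NPUS`, `Npu` and `broadcast_instruction` are hypothetical, and the NPU count of 4 is chosen only for illustration. Each simulated NPU holds its own slice of a tensor in its private local memory, and one instruction is applied to all of them in the same step.

```python
from dataclasses import dataclass, field
from typing import Callable, List

NUM_NPUS = 4  # hypothetical NPU count, for illustration only


@dataclass
class Npu:
    """A simulated NPU with its own private local memory."""
    local_mem: List[float] = field(default_factory=list)


def broadcast_instruction(npus: List[Npu], op: Callable[[float], float]) -> None:
    """Apply the same instruction (op) on every NPU in the same step.

    Every NPU runs identical code; only the data in each local memory differs.
    """
    for npu in npus:
        npu.local_mem = [op(x) for x in npu.local_mem]


# Split one 8-element tensor across the NPUs: NPU i holds a different slice.
data = [float(i) for i in range(8)]
chunk = len(data) // NUM_NPUS
npus = [Npu(local_mem=data[i * chunk:(i + 1) * chunk]) for i in range(NUM_NPUS)]

broadcast_instruction(npus, lambda x: 2.0 * x + 1.0)
print([npu.local_mem for npu in npus])
# [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0], [13.0, 15.0]]
```

The data is split so that each NPU holds a different part of the tensor. Issuing one instruction therefore processes the whole tensor in parallel.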

The memory inside each NPU is called local memory, and the EUs can access only local memory. Data must therefore be copied from system memory (usually global memory) into NPU local memory by GDMA before the EUs can use it. CDMA, GDMA, and BDC can run in parallel.
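Because these engines can run in parallel, data movement can overlap with computation. The sketch below models one common way to exploit this, double buffering, assuming every stage takes one time unit. The function name `overlapped_schedule` and the schedule itself are illustrative assumptions. They do not describe how the toolchain actually schedules BM1684x instructions.

```python
from typing import List, Optional, Tuple


def overlapped_schedule(num_tiles: int) -> List[Tuple[Optional[int], Optional[int]]]:
    """Per time step, return (tile loaded by GDMA, tile computed by BDC).

    Assumes a double-buffered local memory: while BDC computes on tile t-1
    in one buffer, GDMA fills the other buffer with tile t.
    """
    steps = []
    for t in range(num_tiles + 1):
        load = t if t < num_tiles else None   # GDMA: global -> local memory
        compute = t - 1 if t >= 1 else None   # BDC: works on the previous tile
        steps.append((load, compute))
    return steps


for step, (load, compute) in enumerate(overlapped_schedule(3)):
    print(f"step {step}: GDMA loads {load}, BDC computes {compute}")
# step 0: GDMA loads 0, BDC computes None
# step 1: GDMA loads 1, BDC computes 0
# step 2: GDMA loads 2, BDC computes 1
# step 3: GDMA loads None, BDC computes 2
```

Under the unit-time assumption, 3 tiles finish in 4 steps instead of the 6 steps needed when each load waits for the previous computation.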

Accelerating a computation on the TPU usually involves the following steps:

1. Copy the data from host-side memory to the TPU's system memory (global memory).
2. Copy the data from system memory (global memory) to local memory.
3. Compute on the data in local memory and write the results back to local memory.
4. Copy the results from local memory back to global memory.
5. Copy the results from global memory back to host-side memory.
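The five steps can be traced with a minimal Python model in which each memory is a dictionary of named buffers. The helper names `transfer` and `run_on_tpu` are hypothetical. The squaring operation simply stands in for whatever computation the EUs perform. Only the copy in step 2 is attributed to GDMA, because the text names the engine for that transfer only.

```python
from typing import Dict, List

Buffer = List[float]


def transfer(src: Dict[str, Buffer], dst: Dict[str, Buffer], name: str) -> None:
    """Copy a named buffer from one memory to another (models a DMA transfer)."""
    dst[name] = list(src[name])


def run_on_tpu(host_input: Buffer) -> Buffer:
    host: Dict[str, Buffer] = {"x": host_input}
    global_mem: Dict[str, Buffer] = {}
    local_mem: Dict[str, Buffer] = {}

    transfer(host, global_mem, "x")        # 1. host memory -> global memory
    transfer(global_mem, local_mem, "x")   # 2. global memory -> local memory (GDMA)
    # 3. compute: reads and writes local memory only, as the EUs must
    local_mem["y"] = [v * v for v in local_mem["x"]]
    transfer(local_mem, global_mem, "y")   # 4. local memory -> global memory
    transfer(global_mem, host, "y")        # 5. global memory -> host memory
    return host["y"]


print(run_on_tpu([1.0, 2.0, 3.0]))  # [1.0, 4.0, 9.0]
```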

Memory Types

The TPU contains the following memory types:

- System Memory
    - Global Memory: Off-chip memory (DDR).
    - L2-SRAM: On-chip memory that can be used as an intermediate cache.
- Local Memory: On-chip memory, mainly used to store the data for BDC.
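The memory hierarchy can be summarized in a small Python data structure. The class and field names are hypothetical. The contents follow the list above and the earlier rule that the EUs can access only local memory.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Location(Enum):
    OFF_CHIP = "off-chip"
    ON_CHIP = "on-chip"


@dataclass(frozen=True)
class MemoryType:
    name: str
    location: Location
    is_system_memory: bool  # True for Global Memory and L2-SRAM
    role: str


MEMORY_TYPES: List[MemoryType] = [
    MemoryType("Global Memory", Location.OFF_CHIP, True, "DDR"),
    MemoryType("L2-SRAM", Location.ON_CHIP, True, "intermediate cache"),
    MemoryType("Local Memory", Location.ON_CHIP, False, "data for BDC"),
]


def eu_can_access(mem: MemoryType) -> bool:
    """EUs can access only local memory; other data must be copied in first."""
    return mem.name == "Local Memory"


print([m.name for m in MEMORY_TYPES if eu_can_access(m)])  # ['Local Memory']
```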