Date: Mar 20, 2023
UltraScale FPGAs are built on a 20nm process, while UltraScale+ FPGAs are built on a 16nm process. Although the process differs, the internal architecture is the same, so unless otherwise stated, the UltraScale architecture described below also applies to UltraScale+.
In UltraScale, each I/O bank lies within a single clock region (CR) and contains 52 input/output pins. Of these 52 pins, 8 (4 pairs) are global clock (GC) pins, which are used in the same way as in the 7 series FPGAs.
The difference is that these 4 GC pairs are equal in status, and there is no longer a distinction between MRCC and SRCC.
The 7 series FPGAs contain both global and regional clock buffers. UltraScale simplifies this and provides only global clock buffers. A clock region that contains an input/output column has 24 BUFGCE, 4 BUFGCE_DIV, and 8 BUFGCTRL buffers, but only 24 of them can be used simultaneously, as shown in the figure below.
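The buffer limits above can be expressed as a small budget check. The following Python sketch is only an illustration: the function name `check_buffer_budget` and the dictionary layout are assumptions made here, and the model checks only the counts quoted in the text (24 BUFGCE, 4 BUFGCE_DIV, 8 BUFGCTRL, at most 24 in use), not which physical buffer sites share resources.

```python
# Per-clock-region buffer counts quoted in the text.
AVAILABLE = {"BUFGCE": 24, "BUFGCE_DIV": 4, "BUFGCTRL": 8}
MAX_SIMULTANEOUS = 24  # at most 24 buffers can be in use at once


def check_buffer_budget(requested):
    """Return a list of problems; an empty list means the request fits.

    `requested` maps a buffer type name to the number of instances wanted.
    Simplified model: only per-type counts and the overall limit are checked.
    """
    problems = []
    for kind, count in requested.items():
        if kind not in AVAILABLE:
            problems.append(f"unknown buffer type: {kind}")
        elif count > AVAILABLE[kind]:
            problems.append(
                f"{kind}: requested {count}, only {AVAILABLE[kind]} present"
            )
    total = sum(requested.values())
    if total > MAX_SIMULTANEOUS:
        problems.append(f"total {total} exceeds {MAX_SIMULTANEOUS} usable at once")
    return problems
```

Under this model, 20 BUFGCE plus 4 BUFGCE_DIV fits, while 24 BUFGCE plus 1 BUFGCTRL is rejected because the total exceeds 24.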
These global clock buffers are located in the clock column. They can drive horizontal clock routing/distribution tracks and vertical clock routing/distribution tracks, neither of which exist in the 7 series FPGAs. As shown in the figure below, these tracks are all located in the center of the clock region (some chips have high-speed transceivers on only one side). It is easy to see that each clock region is narrower than in the 7 series FPGAs, where a region spans half the width of the chip, and that its height has changed from 50 CLBs in the 7 series to 60 CLBs. The granularity of the clock regions is therefore finer.
Both the horizontal and the vertical clock routing/distribution tracks are segmented at clock region boundaries. This means that if the resources in a clock region do not need a clock, the tool turns off the corresponding track in that region, which saves power. Routing tracks can drive routing tracks and distribution tracks in adjacent clock regions, but distribution tracks can only drive horizontal distribution tracks in adjacent clock regions.
The purpose of a routing track is to carry the clock from the global clock buffer to a central point. From this central point, the clock reaches the clock ports of its loads via the distribution tracks. The point can also be moved along the distribution track to improve the local clock skew. We call this point the clock root.
Each clock region has 24 horizontal clock routing/distribution tracks and 24 vertical clock routing/distribution tracks. On the horizontal clock distribution tracks there are 32 BUFCE_LEAFs, called leaf clock buffers. The clocks come down from the horizontal clock distribution tracks and reach the clock ports of the logic resources via BUFCE_LEAF. BUFCE_LEAF is used automatically by Vivado and cannot be instantiated in code.
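To make the role of the clock root described above more concrete, here is a toy Python model. It is a sketch under strong assumptions, not a description of how Vivado works: clock regions are treated as grid cells, the delay from the root to a load is approximated by the number of clock-region hops, and the names `skew_for_root` and `best_clock_root` are invented for this illustration.

```python
def skew_for_root(root, loads):
    """Spread between the farthest and nearest load, in clock-region hops.

    `root` and every entry of `loads` are (column, row) clock-region
    coordinates. Hop count stands in for delay in this toy model.
    """
    distances = [abs(root[0] - x) + abs(root[1] - y) for x, y in loads]
    return max(distances) - min(distances)


def best_clock_root(loads, columns, rows):
    """Try every clock region as the root and keep the one with least skew.

    Ties are broken by the coordinate tuple so the result is deterministic.
    """
    candidates = [(x, y) for x in range(columns) for y in range(rows)]
    return min(candidates, key=lambda root: (skew_for_root(root, loads), root))
```

With loads in regions (0, 0) and (2, 0) of a 3x1 grid, a root at (0, 0) gives a spread of 2 hops, while the model picks (1, 0), where both loads are equally far away.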
UltraScale has a dedicated BUFGCE, which no longer needs to be built by configuring a BUFGCTRL. BUFGCTRL itself is still configurable, however: BUFGCE_1, BUFGMUX, and BUFGMUX_1, for example, are all derived by configuring a BUFGCTRL. BUFGCE_DIV replaces BUFR and, because it is now a global clock buffer, has greater drive capability than BUFR.
Also, BUFGCE_DIV has a frequency division function, and the division factor can be any integer from 1 to 8 inclusive. Only when the division factor is an odd number greater than 1 does the duty cycle of the output clock deviate from 50%.
BUFG_GT has been added to UltraScale. BUFG_GT can only be driven by a high-speed transceiver or by the ADC/DAC blocks in RFSoC devices. BUFG_GT_SYNC is the synchronizer for BUFG_GT and is inserted into the design automatically when Vivado infers a BUFG_GT. Like BUFGCE_DIV, BUFG_GT has a frequency division function with a division factor that can be any integer from 1 to 8 inclusive. The division factor is supplied through the DIV port. DIV is 3 bits wide; a value of 3'b000 corresponds to a division factor of 1. There are 24 BUFG_GTs in each clock region that contains a high-speed transceiver.
A new global clock buffer, BUFG_PS, has been added to the Zynq UltraScale+ MPSoC (it is not available in the Zynq 7000 family). This buffer is located next to the internal ARM processor.
The number of BUFG_PS varies from chip to chip. For example, the ZU4EG has 96 BUFG_PS, and the ZU2CG has 72 BUFG_PS.
Application example: Simple frequency division with BUFG_GT
BUFG_GT has a frequency division function and supports integer division factors from 1 to 8 inclusive. The input port DIV carries the division factor control word, which is 3 bits wide. When DIV is 3'b000, the division factor is 1. With this function, BUFG_GT can generate a divided clock directly, which saves an MMCM. When using BUFG_GT, pay attention to its clock source: it can only be driven by a high-speed transceiver or, in RFSoC devices, by the ADC/DAC blocks.
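The DIV encoding can be sketched in a few lines of Python. This is an illustration, not vendor code: the helper names are invented here, and the linear mapping (control word = division factor - 1) is an assumption inferred from the two facts given in the text, namely that DIV is 3 bits wide and that 3'b000 selects a factor of 1. The last helper checks whether a target frequency can be reached by integer division alone, which is the case where the MMCM can be saved.

```python
DIV_WIDTH = 3  # width of the BUFG_GT DIV port, as stated in the text


def factor_to_div_code(factor):
    """Map a division factor (1..8) to the DIV control word (0..7).

    Assumes a linear encoding in which 3'b000 selects a factor of 1.
    """
    if not 1 <= factor <= 2 ** DIV_WIDTH:
        raise ValueError(f"division factor must be 1..{2 ** DIV_WIDTH}")
    return factor - 1


def as_verilog_literal(code):
    """Format a control word the way the text writes it, e.g. 3'b000."""
    return f"{DIV_WIDTH}'b{code:0{DIV_WIDTH}b}"


def div_code_for(source_hz, target_hz):
    """Return the DIV code that yields target_hz from source_hz.

    Returns None when no integer factor from 1 to 8 gives the target,
    i.e. when division in the buffer alone is not enough.
    Frequencies are integers in hertz.
    """
    if source_hz <= 0 or target_hz <= 0:
        raise ValueError("frequencies must be positive")
    factor, remainder = divmod(source_hz, target_hz)
    if remainder != 0 or not 1 <= factor <= 2 ** DIV_WIDTH:
        return None
    return factor_to_div_code(factor)
```

Under these assumptions, deriving 125 MHz from a 250 MHz transceiver clock needs a factor of 2, i.e. DIV = 3'b001, whereas 100 MHz cannot be reached with an integer factor and would still need an MMCM.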
BUFGCE_DIV offers the same frequency division function, as shown in the figure below. The input of BUFGCE_DIV can come from an MMCM output; in the figure, the frequency of clk2x is twice that of clk1x. Using BUFGCE_DIV is an effective way to reduce clock skew on synchronous clock-domain-crossing paths.
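As a numerical companion to the BUFGCE_DIV discussion, the Python sketch below computes the divided frequency and an approximate duty cycle. The function name `bufgce_div_output` is made up for this illustration. The duty-cycle model is an assumption: it supposes the divider counts whole input periods and, for an odd factor, gives the shorter share to the high phase. The text only states that odd factors do not yield 50%, not which phase is longer.

```python
from fractions import Fraction


def bufgce_div_output(input_hz, divide):
    """Return (output frequency in Hz, duty cycle) for a division factor.

    `input_hz` is an integer frequency in hertz and `divide` an integer
    from 1 to 8. Results are exact fractions.
    """
    if not 1 <= divide <= 8:
        raise ValueError("division factor must be 1..8")
    output_hz = Fraction(input_hz, divide)
    if divide == 1:
        # Pass-through: assume the input clock itself has a 50% duty cycle.
        duty = Fraction(1, 2)
    else:
        # Assumed model: high for floor(divide / 2) whole input periods.
        duty = Fraction(divide // 2, divide)
    return output_hz, duty
```

For example, dividing a 300 MHz clk2x by 2 gives a 150 MHz clk1x at 50% duty cycle, while dividing by 3 gives 100 MHz at a duty cycle of 1/3 under this model.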