In this section we provide some known recipes for the challenges explained in Sect. 18.3. The recipes below should help design teams realize their own FPGA-based emulator. We assume (by this chapter, toward the end of the book) a basic understanding of FPGA-based design.
Note that you should perform an RTL-to-RTL logic equivalence check after any RTL transformation.
PLLs: All ASIC technology libraries contain PLLs. Each PLL has a basic reference clock input and a clock output, with pins indicating the multiplier factor in terms of Numerator and Denominator values. These have to be mapped to the equivalent PLLs in the selected FPGA. The methodology is to keep the ASIC PLL entity (its interface) identical and to instantiate the FPGA clocking resource inside it. If the PLL has multiple clock outputs, these are also remapped to the FPGA clocking resource.
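When remapping a PLL, it can help to compute the output frequency implied by the Numerator and Denominator pins and compare it with what the FPGA clocking resource is configured to produce. The following is a minimal sketch, assuming the simple model f_out = f_ref * Numerator / Denominator; the function name and the example values are hypothetical and not taken from any library.

```python
from fractions import Fraction

def pll_output_hz(ref_hz, numerator, denominator):
    """Output frequency under the assumed model f_out = f_ref * N / D."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("numerator and denominator must be positive")
    # Fraction keeps the result exact, so two configurations can be compared with ==.
    return Fraction(ref_hz) * numerator / denominator

# Hypothetical example: 10 MHz reference, Numerator = 8, Denominator = 4.
asic_out = pll_output_hz(10_000_000, 8, 4)
# Hypothetical FPGA-side setting reaching the same ratio with different factors.
fpga_out = pll_output_hz(10_000_000, 40, 20)
frequencies_match = (asic_out == fpga_out)
```

An exact comparison is used here so that a mismatch caused by a rounded multiplier on the FPGA side is not hidden.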
Clock Dividers: If there are dividers in the design, it is appropriate to remove the divider circuits and replace them with the FPGA clock resource outputs as defined in the MMCM clock tile.
It would be useful to maintain a table similar to Table 18.1.
In Table 18.1, for (#2) and (#3), the clock frequencies are the same, i.e., 20 MHz. It is worth investigating, from an ASIC clocking point of view, whether the same 20 MHz PLL output can drive the clock endpoints of both (#2) and (#3). If the clocks are of the same frequency, but asynchronous to each other, it would be OK to drop one PLL, which frees up routing resources and reduces the complexity of mapping to the FPGA.
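Finding such sharing candidates in a clock table can be scripted. The sketch below is illustrative only: the rows are hypothetical (only the 20 MHz value for rows 2 and 3 comes from the text; the endpoint names and row 1 are made up), and the result is a list of candidates to investigate, not a decision.

```python
from collections import defaultdict

# Hypothetical rows in the spirit of Table 18.1: (row id, clock endpoint, MHz).
clock_table = [
    (1, "cpu_clk", 100),   # made-up entry
    (2, "uart_clk", 20),   # 20 MHz as stated in the text; name is made up
    (3, "spi_clk", 20),    # 20 MHz as stated in the text; name is made up
]

def sharing_candidates(rows):
    """Group row ids by frequency and keep frequencies used by more than one row."""
    by_freq = defaultdict(list)
    for row_id, _endpoint, freq_mhz in rows:
        by_freq[freq_mhz].append(row_id)
    return {freq: ids for freq, ids in by_freq.items() if len(ids) > 1}

candidates = sharing_candidates(clock_table)
```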
Programmable Clock Dividers: Programmable clock dividers are commonly used to select a baud rate, as in the case of a UART. In such cases, the reconfigurable registers of the ASIC need to be remapped to the Dynamic Reconfiguration Data Input of the clocking tile. Most emulation designers make the dynamic reconfiguration data input part of the instrumentation in the testbench, so that they have better control over the clock.
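To illustrate the kind of value such a programmable divider carries, the sketch below computes a UART divisor and the resulting baud-rate error. It assumes the common 16x oversampling scheme, which the text does not specify; the function names and the 20 MHz / 115200 example are hypothetical. It does not model the dynamic reconfiguration port itself.

```python
OVERSAMPLE = 16  # assumed UART oversampling factor, not stated in the text

def uart_divisor(clk_hz, baud):
    """Nearest integer divisor such that clk_hz / (OVERSAMPLE * divisor) ~= baud."""
    step = OVERSAMPLE * baud
    return max(1, (clk_hz + step // 2) // step)

def baud_error_percent(clk_hz, baud):
    """Relative error of the achieved baud rate versus the requested one."""
    actual = clk_hz / (OVERSAMPLE * uart_divisor(clk_hz, baud))
    return 100.0 * (actual - baud) / baud

# Hypothetical example: 20 MHz divider input clock, 115200 baud.
divisor = uart_divisor(20_000_000, 115_200)
error = baud_error_percent(20_000_000, 115_200)
```

A testbench that drives the divider value directly can use the error figure to reject settings the UART on the other side would not tolerate.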
Clock Gating Cells: Integrated clock gating cells are instantiated by the RTL designer to enable dynamic power reduction. This can be a problem on FPGAs, which can become resource limited if there are too many clock gating cells in the design. A solution is a tool-based or hand-scripted transformation of the clock gating cells. A typical example is provided in Fig. 18.4.
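One common transformation, assumed here because Fig. 18.4 is not reproduced, replaces a gated clock with a free-running clock plus a clock enable on the register. The behavioral model below is a simplified cycle-level sketch (latch-based gating cell, enable and data stable before the rising edge) showing that both forms produce the same register contents; it is not the book's circuit.

```python
def simulate(stimulus):
    """stimulus: list of (en, d) pairs, one per clock cycle, applied while clk is low.

    Returns a list of (q_gated, q_enabled) pairs, one per cycle.
    """
    q_gated = 0     # register clocked by the gated clock
    q_enabled = 0   # register on the free-running clock with a clock enable
    trace = []
    for en, d in stimulus:
        # Low phase: the gating cell's latch is transparent and tracks the enable.
        latch = en
        prev_gclk = 0 & latch
        # High phase: the latch holds, so the gated clock rises only if enable was 1.
        gclk = 1 & latch
        if gclk and not prev_gclk:   # rising edge of the gated clock
            q_gated = d
        # Clock-enable form: every rising edge arrives, data loads only when enabled.
        if en:
            q_enabled = d
        trace.append((q_gated, q_enabled))
    return trace

trace = simulate([(1, 1), (0, 0), (1, 0), (0, 1), (1, 1)])
equivalent = all(a == b for a, b in trace)
```

This only checks functional behavior per cycle; the RTL-to-RTL logic equivalence check mentioned above remains the actual sign-off for the transformation.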
Now that the individual pieces of your RTL have been readied for FPGA-based emulation, the next level of complexity arises if the design cannot be mapped onto one FPGA. A particular design might not fit into a single FPGA for any of the following reasons:
• Design logic size exceeding the logic that can be mapped onto the FPGA.
• Design logic could be mapped, but it could not be routed.
• Design logic was mapped and routed, but design has more memory than the block RAMs on the FPGA.
• Design ran out of IO that could be appropriately mapped on the FPGA.
Irrespective of the situation leading to the use of multiple FPGAs, all of the above need to be resolved on a per-FPGA basis on a multi-FPGA emulation system. To start with, get a gate, memory, and pin count estimate for the big blocks in the design. Also, assume that each FPGA may be about 60% utilized to begin with. Typically, most big IPs fall within 5 to 6 sub-hierarchical levels of logic. This exercise gives a rough estimate of the number of FPGAs required to fit the design and testbench.
The exercise is iterative. Start by partitioning on the most constrained of the three resources (gate count, pin count, memory) and then make grouping changes to see if the other constraints can also be met. Figure 18.5 depicts the hierarchical view of the DUV and the testbench BFM components, and Table 18.2 gives the tabular view of the same. Both views (hierarchical and tabular) help in converging on the right partitioning across multiple FPGAs.
Once the gross-level partitioning is known through the analytical method as per Table 18.2, it needs to be implemented. There are tools that can read in the RTL files and write out a regrouped file. Such grouping results in new hierarchical tables being generated, as shown in Table 18.3.
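The rough estimate described above can be sketched as a small calculation. All numbers below are hypothetical, and applying the 60% starting utilization uniformly to gates, memory, and pins is an assumption made for illustration; the result is a first guess that ignores the extra inter-FPGA IO created by partitioning.

```python
import math

def fpgas_per_resource(design, per_fpga, utilization=0.60):
    """FPGAs needed for each resource if each FPGA is filled only to `utilization`."""
    return {
        name: math.ceil(design[name] / (per_fpga[name] * utilization))
        for name in design
    }

# Hypothetical design totals and per-FPGA capacities.
design = {"gates": 250_000_000, "memory_kb": 40_000, "pins": 1_500}
per_fpga = {"gates": 100_000_000, "memory_kb": 30_000, "pins": 1_000}

needed = fpgas_per_resource(design, per_fpga)
estimate = max(needed.values())                 # the worst resource sets the FPGA count
most_constrained = max(needed, key=needed.get)  # the resource to partition on first
```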
For this example, considering a per-FPGA gate count of ~100M gates, Table 18.3 shows that FPGA3 is OK, but FPGA1 and FPGA2 are likely to pose challenges at the P&R stage. These considerations and iterations go on until there is sufficient convergence. Table 18.3 omits pin count and memory, as it is for illustration purposes only.
However, since the modules BLOCK2 and BFM2 are closely knit with each other, there could be a pin count challenge if some modules of BLOCK2 are moved onto FPGA3, which seems to be the least constrained.
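A per-FPGA utilization report makes this kind of judgment repeatable across iterations. The sketch below uses hypothetical gate counts (the values of Table 18.3 are not reproduced here) against the ~100M-gate capacity mentioned in the text, and flags any FPGA above the 60% starting utilization as a P&R risk; that threshold choice is an assumption.

```python
CAPACITY_MGATES = 100       # per-FPGA capacity from the text (~100M gates)
TARGET_UTILIZATION = 0.60   # starting utilization suggested earlier in this section

# Hypothetical gate counts after regrouping, in millions of gates.
gates_per_fpga = {"FPGA1": 92, "FPGA2": 85, "FPGA3": 40}

def utilization_report(gates, capacity, target):
    """Map each FPGA to (utilization, verdict)."""
    report = {}
    for fpga, mgates in gates.items():
        util = mgates / capacity
        report[fpga] = (util, "OK" if util <= target else "P&R risk")
    return report

report = utilization_report(gates_per_fpga, CAPACITY_MGATES, TARGET_UTILIZATION)
```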
The multi-FPGA board usually has a fixed pin count, which can be summarized in a template table as in Table 18.4.
In Table 18.4, PF12 is the number of physical IO pins available between FPGA1 and FPGA2 (F1 <--> F2) on the FPGA board.
In Table 18.4 we have a Not Applicable (NA) if the particular FPGA is not used in the implementation. The implemented pin count across the FPGAs (IPF) should be less than the provisioned pin count across the FPGAs (PF). Thus, the pin count criteria are met when IPF12 < PF12, and likewise for every other FPGA pair.
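The IPF < PF criterion can be checked mechanically for every FPGA pair. In the sketch below the pin counts are hypothetical, NA entries are represented as None, and the function name is made up for illustration.

```python
def pin_violations(provisioned, implemented):
    """Return {pair: (ipf, pf)} for pairs where IPF < PF does not hold.

    Pairs marked NA (None) on either side are skipped.
    """
    violations = {}
    for pair, pf in provisioned.items():
        ipf = implemented.get(pair)
        if pf is None or ipf is None:
            continue
        if not ipf < pf:
            violations[pair] = (ipf, pf)
    return violations

# Hypothetical provisioned (PF) and implemented (IPF) pin counts per FPGA pair.
PF = {("F1", "F2"): 200, ("F1", "F3"): 150, ("F2", "F3"): None}
IPF = {("F1", "F2"): 260, ("F1", "F3"): 120}

violations = pin_violations(PF, IPF)
```

Any pair that shows up in the result is a candidate for regrouping or for the pin multiplexing described next.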
If the pin count criteria are not satisfied, you can resort to pin multiplexing for the IO. This means adding utility RTL that sends multiple bits of data over a single IO from one FPGA to another. This utility RTL is inserted ahead of the pin-multiplexed IO. Figure 18.6 shows the circuit for the utility RTL on the FPGAs for pin multiplexing. Three main operations are performed:
• Load: convert from parallel to serial.
• Shift: shift the serial data from one FPGA to the other.
• Restore: convert serial data back to parallel.
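The three operations above can be sketched behaviorally as follows. This is a software model of the idea only, not the circuit of Fig. 18.6; the function names, the bit ordering, and the 8-bit example are assumptions.

```python
def load(parallel_bits):
    """Load: capture the parallel signals into a shift register on the sending FPGA."""
    return list(parallel_bits)

def shift(register):
    """Shift: emit one bit per fast-clock tick over the single shared IO pin."""
    for bit in register:
        yield bit

def restore(serial_stream, width):
    """Restore: collect `width` serial bits back into parallel form on the receiver."""
    received = []
    for bit in serial_stream:
        received.append(bit)
    if len(received) != width:
        raise ValueError("expected %d bits, got %d" % (width, len(received)))
    return received

# Hypothetical example: 8 inter-FPGA signals sharing one physical pin.
signals = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = restore(shift(load(signals)), width=len(signals))
```

In this model the 8 signals need 8 fast-clock ticks per design-clock cycle, which is why the multiplexing ratio directly limits the emulation clock.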
EDA tools like Certify™ from Synopsys® are a major enabler of this convergence.
It is also possible to use the FPGA SERDES lanes as an extension to the pin multiplexing. SERDES provides a convenient serializer and deserializer over a two-wire link, which can transmit and receive data in the Gbps (gigabits per second) range. The SERDES lanes are useful for converting FPGA-to-FPGA IOs into serial form, sending them across at high speed, and reconstructing them at the other end.
As soon as we move to multiple FPGAs, the clocking complexity increases. One approach is to treat each hop or evaluation as a phase (a dedicated time slot) and increase the emulation clock period accordingly. This means that the performance of the emulator drops every time there is a signal hop.
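The performance cost of hops can be estimated with a simple model. The sketch below assumes that every hop adds one fixed-length time slot to the emulation clock period; the slot length, base period, and function name are hypothetical, and real tools may schedule phases differently.

```python
def emulation_clock_mhz(base_period_ns, hops, slot_ns):
    """Emulation clock frequency when each signal hop adds one slot to the period."""
    period_ns = base_period_ns + hops * slot_ns
    return 1000.0 / period_ns

# Hypothetical example: 50 ns base period, 25 ns per hop.
single_fpga = emulation_clock_mhz(50, hops=0, slot_ns=25)
two_hops = emulation_clock_mhz(50, hops=2, slot_ns=25)
```

Under these assumed numbers, two hops on the critical path halve the emulation clock, which is why partitions are usually chosen to keep multi-hop paths to a minimum.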