

General Methodology


In this section we provide some known recipes for the challenges explained in Sect. 18.3. These recipes should help design teams realize their own FPGA-based emulator. Since this chapter comes toward the end of the book, we assume a basic understanding of FPGA-based design.

Note that you should run an RTL-to-RTL logic equivalence check after every RTL transformation.

RTL-Related Transformations

PLLs: ASIC technology libraries generally include PLLs. Each PLL has a basic reference clock input and a clock output, plus pins that set the multiplication factor as Numerator and Denominator values. These PLLs have to be mapped to equivalent PLLs in the selected FPGA. The usual methodology is to keep the ASIC PLL entity (its name and ports) identical but to instantiate the FPGA clocking resource inside it. If the PLL has multiple clock outputs, each of them is also remapped to an FPGA clock output.
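The Python sketch below illustrates the remapping arithmetic. It first computes the ASIC PLL output from its Numerator and Denominator values. It then searches for settings of an MMCM-style tile that reproduce the same frequencies from one shared VCO: a feedback multiply M, an input divide D, and one output divide O per clock output. The VCO and divider ranges are hypothetical placeholders, and the phase-detector frequency limit and fractional dividers are ignored. The function names are illustrative only. In practice the Vivado Clocking Wizard computes these settings and validates them against the actual device.

```python
from fractions import Fraction
from itertools import product

# Hypothetical limits for an MMCM-style clocking tile; the real ranges depend
# on the FPGA family and speed grade and must come from the device data sheet.
VCO_MIN_MHZ, VCO_MAX_MHZ = 600, 1200
MULT_RANGE = range(2, 65)   # feedback multiply M
DIV_RANGE = range(1, 107)   # input divide D
OUT_RANGE = range(1, 129)   # output divide O (one per clock output)

def asic_pll_out(fin_mhz, numerator, denominator):
    """ASIC PLL output frequency: fin * Numerator / Denominator (exact)."""
    return Fraction(fin_mhz) * numerator / denominator

def map_to_mmcm(fin_mhz, targets_mhz):
    """Find (M, D, [O...]) so that fin * M / (D * O_i) == target_i for every output.

    All outputs share one VCO (fin * M / D), as in an MMCM. Pass ints, Fractions
    or decimal strings so the comparison stays exact. Returns None if no integer
    setting within the assumed ranges reproduces every frequency exactly.
    """
    fin = Fraction(fin_mhz)
    targets = [Fraction(t) for t in targets_mhz]
    for d, m in product(DIV_RANGE, MULT_RANGE):
        vco = fin * m / d
        if not VCO_MIN_MHZ <= vco <= VCO_MAX_MHZ:
            continue
        divides = [vco / t for t in targets]
        if all(o.denominator == 1 and o.numerator in OUT_RANGE for o in divides):
            return m, d, [int(o) for o in divides]
    return None

# Example: 25 MHz reference, ASIC PLL with Numerator = 4 and Denominator = 5,
# plus a second (hypothetical) 100 MHz output on the same PLL.
print(asic_pll_out(25, 4, 5))        # 20
print(map_to_mmcm(25, [20, 100]))    # (24, 1, [30, 6])
```

The search returns the first exact match, which uses the lowest legal VCO frequency for the smallest input divide. A real flow might instead prefer a higher VCO for lower jitter.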

Clock Dividers: If there are clock dividers in the design, remove the divider circuits and replace them with FPGA clock resource outputs, as defined in the MMCM (mixed-mode clock manager) clock tile.

It is useful to maintain a table that maps each ASIC clock frequency to an FPGA clock, similar to Table 18.1.

In Table 18.1, entries #2 and #3 have the same clock frequency, 20 MHz. From an ASIC clocking point of view, it is worth investigating whether a single 20 MHz PLL output can drive the clock endpoints of both #2 and #3. If the two clocks have the same frequency but are asynchronous to each other, it is acceptable to drive both from one PLL output. This saves a PLL, frees up routing resources, and reduces the complexity of mapping to the FPGA.
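A clock mapping table like Table 18.1 can also be kept as data, so that sharing candidates are found automatically. The sketch below flags every frequency used by more than one row. Rows #2 and #3 at 20 MHz come from the text. The other rows, the source names, and the `ClockRow`/`sharing_candidates` names are hypothetical. The code only finds candidates; whether two same-frequency clocks can really share one FPGA clock output is still decided by the ASIC clocking review described above.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ClockRow:
    index: int         # row number, as in "#2" and "#3" of the mapping table
    asic_source: str   # ASIC PLL output or divider driving the clock endpoints
    freq_mhz: float    # required clock frequency

def sharing_candidates(rows):
    """Map each frequency used by more than one row to the list of those rows.

    Rows in one group could be driven from a single FPGA clock output, saving a
    PLL/MMCM output and clock routing, if the design review allows it.
    """
    by_freq = defaultdict(list)
    for row in rows:
        by_freq[row.freq_mhz].append(row.index)
    return {freq: idx for freq, idx in by_freq.items() if len(idx) > 1}

# Hypothetical table: only rows #2 and #3 (both 20 MHz) come from the text.
table = [
    ClockRow(1, "PLL0.clkout0", 100.0),
    ClockRow(2, "PLL1.clkout0", 20.0),
    ClockRow(3, "PLL2.clkout0", 20.0),
    ClockRow(4, "DIV0.out", 50.0),
]
print(sharing_candidates(table))  # {20.0: [2, 3]}
```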

Programmable Clock Dividers: Programmable clock dividers are commonly used to select a baud rate, as in the case of a UART. In such cases, the reconfigurable registers of the ASIC need to be remapped to the Dynamic Reconfiguration data input of the clocking tile. Most emulation designers make the dynamic reconfiguration data input part of the testbench instrumentation, so that they have better control over the clock.
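The sketch below shows the kind of translation the testbench instrumentation performs when software writes the ASIC divider register. It converts the divide value N into an MMCM output divide O that gives the same frequency. It assumes, hypothetically, that the ASIC divider produces `REF_MHZ / N` and that the MMCM VCO runs at an integer multiple of that reference. Writing O through the Dynamic Reconfiguration Port requires the register encoding from the device documentation, which is not shown.

```python
# Hypothetical parameters: the ASIC divider outputs REF_MHZ / N, where N is the
# value software writes to the divider register. On the FPGA the MMCM VCO is
# assumed to run at an integer multiple of REF_MHZ, so the same clock is VCO_MHZ / O.
REF_MHZ = 100
VCO_MHZ = 800            # assumed fixed VCO (M and D chosen elsewhere)
O_MIN, O_MAX = 1, 128    # assumed legal output-divide range

def asic_reg_to_mmcm_divide(n):
    """Translate the ASIC divider register value N into an MMCM output divide O."""
    if n < 1:
        raise ValueError("divider register value must be >= 1")
    if VCO_MHZ % REF_MHZ:
        raise ValueError("VCO must be an integer multiple of the reference clock")
    o = n * (VCO_MHZ // REF_MHZ)
    if not O_MIN <= o <= O_MAX:
        raise ValueError(f"N={n} needs O={o}, outside {O_MIN}..{O_MAX}")
    return o

# The testbench would call this on every register write and then program O
# through the DRP (register encoding not shown here).
for n in (1, 4, 16):
    o = asic_reg_to_mmcm_divide(n)
    print(n, o, VCO_MHZ / o, REF_MHZ / n)   # the last two columns match
```

Register values whose O falls outside the legal range are rejected rather than silently clamped. This reflects a real limitation: the output divider cannot reach every frequency an ASIC counter can.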

Clock Gating Cells: RTL designers instantiate integrated clock gating (ICG) cells to reduce dynamic power. This can be a problem for FPGAs, which have a limited number of clock resources and can run out of them if the design contains too many clock gating cells. A solution is to apply a tool-based or hand-scripted transformation to the clock gating cells. A typical example is shown in Fig. 18.4.

Table 18.1: Mapping of ASIC clock frequencies to FPGA clocks

Fig. 18.4: Typical ASIC and FPGA implementation for a clock gating cell
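A common form of this transformation runs the register from the free-running clock and moves the gating enable onto the register's clock-enable (CE) pin. The figure is not reproduced here, so the circuit it shows may differ in detail. The cycle-level Python model below compares three styles: a latch-based ICG, a naive AND-gate clock gate, and the CE version (all function names are hypothetical). At this level of abstraction the ICG and CE models reduce to the same update rule, and that is the point. The latch ensures that only the enable value set up before the clock edge matters, which is exactly what a CE pin samples. Random-vector comparison like this is only a weak stand-in for the formal RTL-to-RTL equivalence check recommended earlier in this section.

```python
import random

def icg_gated_flop(d_seq, en_lo, en_hi):
    """ASIC style: the flop is clocked by gclk = clk & latch(en).

    Each cycle is modeled as a low phase (enable value en_lo) followed by a
    high phase in which the enable may change to en_hi.
    """
    q, trace = 0, []
    for d, lo, _hi in zip(d_seq, en_lo, en_hi):
        latch = lo            # low phase: latch is transparent and follows en
        if latch:             # clk rises: gclk rises only if the latched enable is 1
            q = d
        # high phase: latch is closed, so en_hi cannot create another gclk edge
        trace.append(q)
    return trace

def and_gated_flop(d_seq, en_lo, en_hi):
    """Naive gate without a latch: gclk = clk & en can rise mid high phase."""
    q, trace = 0, []
    for d, lo, hi in zip(d_seq, en_lo, en_hi):
        if lo or hi:          # edge at clk rise (lo), or a glitch edge when en rises later (hi)
            q = d             # simplification: d is assumed stable for the whole cycle
        trace.append(q)
    return trace

def fpga_ce_flop(d_seq, en_lo, en_hi):
    """FPGA style: the flop runs on the free-running clk and en drives its CE pin."""
    q, trace = 0, []
    for d, lo, _hi in zip(d_seq, en_lo, en_hi):
        if lo:                # CE is sampled at the clk edge: the value set up in the low phase
            q = d
        trace.append(q)
    return trace

rng = random.Random(1)
N = 10_000
d = [rng.randint(0, 1) for _ in range(N)]
lo = [rng.randint(0, 1) for _ in range(N)]
hi = [rng.randint(0, 1) for _ in range(N)]
print(icg_gated_flop(d, lo, hi) == fpga_ce_flop(d, lo, hi))  # True
```

The naive AND-gate model diverges as soon as the enable rises during the high phase. For example, `and_gated_flop([1], [0], [1])` captures the data while the ICG model does not.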

Multiple-FPGA Specifics (The Partitioning Problem)

Now that the individual pieces of your RTL have been readied for FPGA-based emulation, the next level of complexity arises when the design cannot be mapped onto one FPGA. A particular design might not fit into a single FPGA for any of the following reasons:

• The design logic exceeds the logic capacity of the FPGA. 

• The design logic could be mapped, but it could not be routed. 

• The design logic was mapped and routed, but the design needs more memory than the block RAMs on the FPGA provide. 

• The design ran out of I/Os that could be appropriately mapped on the FPGA.

Irrespective of the situation that leads to the use of multiple FPGAs, all of the above constraints need to be resolved on a per-FPGA basis in a multi-FPGA emulation system. To start with, obtain gate count, memory, and pin count estimates for the big blocks in the design. Also, assume about 60% utilization per FPGA to begin with. Typically, most big IPs fall within 5 to 6 sub-hierarchical levels of logic. This exercise gives a rough estimate of the number of FPGAs required to fit the design and testbench.
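The estimate can be written down directly. The sketch below uses the 60% initial utilization from the text and the ~100M gates per FPGA used in the Table 18.3 example. The block list, memory figures, and block-RAM capacity are hypothetical. It covers gate count and memory only, because pin usage depends on how blocks are split and is checked per FPGA pair after partitioning (see Partitioning Pin Count).

```python
TARGET_UTIL_PCT = 60   # initial per-FPGA utilization assumed in the text

# Hypothetical per-FPGA capacity: ~100M gates (as in the Table 18.3 example)
# and an assumed 50 Mbit of block RAM.
CAPACITY = {"gates": 100_000_000, "mem_mbit": 50}

# Hypothetical estimates for the big blocks of the DUV and the testbench BFMs.
BLOCKS = {
    "BLOCK1": {"gates": 70_000_000, "mem_mbit": 12},
    "BLOCK2": {"gates": 80_000_000, "mem_mbit": 20},
    "BLOCK3": {"gates": 30_000_000, "mem_mbit": 5},
    "BFM1":   {"gates": 10_000_000, "mem_mbit": 2},
    "BFM2":   {"gates": 15_000_000, "mem_mbit": 3},
}

def ceil_div(a, b):
    """Integer ceiling division (avoids floating-point rounding surprises)."""
    return -(-a // b)

def estimate_fpga_count(blocks, capacity, util_pct=TARGET_UTIL_PCT):
    """Return (FPGAs needed, per-resource FPGA counts) at the target utilization."""
    per_resource = {}
    for res, cap in capacity.items():
        total = sum(block[res] for block in blocks.values())
        per_resource[res] = ceil_div(total * 100, cap * util_pct)
    return max(per_resource.values()), per_resource

print(estimate_fpga_count(BLOCKS, CAPACITY))
# (4, {'gates': 4, 'mem_mbit': 2})
```

The estimate says nothing about whether each block fits on its own. Here BLOCK2 alone exceeds 60% of one FPGA, so it would have to be split along its sub-hierarchies.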

The exercise is iterative. Start partitioning with the most constrained of the three resources (gate count, pin count, memory), and then make grouping changes to see whether the other constraints can also be met. Figure 18.5 depicts the hierarchical view of the DUV and the testbench BFM components, and Table 18.2 gives the tabular view of the same. Both views (hierarchical and tabular) help in converging on the right partitioning between multiple FPGAs.

Fig. 18.5: Hierarchical view for embedded synthesizable testbench with DUV and BFM

Table 18.2: FPGA view for the embedded synthesizable testbench with DUV and BFM
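One way to automate the first pass is a greedy first-fit-decreasing heuristic. It is shown below as a swapped-in illustration, not as the method used by the book or by any partitioning tool. The heuristic works in three steps:

1. Pick the most constrained resource, meaning the one with the highest total demand relative to the per-FPGA limit.
2. Place blocks largest first on that resource.
3. Accept a placement only if every resource stays within the limit.

Block sizes and the block-RAM capacity are hypothetical, pins are ignored, and `first_fit_partition` is an illustrative name.

```python
UTIL_PCT = 60   # initial per-FPGA utilization assumed in the text

# Hypothetical per-FPGA capacity: ~100M gates and an assumed 50 Mbit of block RAM.
CAPACITY = {"gates": 100_000_000, "mem_mbit": 50}

# Hypothetical sub-hierarchy estimates for the DUV blocks and testbench BFMs.
BLOCKS = {
    "BLOCK1": {"gates": 50_000_000, "mem_mbit": 10},
    "BLOCK2": {"gates": 45_000_000, "mem_mbit": 12},
    "BLOCK3": {"gates": 30_000_000, "mem_mbit": 6},
    "BFM1":   {"gates": 10_000_000, "mem_mbit": 2},
    "BFM2":   {"gates": 15_000_000, "mem_mbit": 4},
}

def first_fit_partition(blocks, capacity, n_fpgas, util_pct=UTIL_PCT):
    """Greedy first-fit-decreasing assignment of blocks to FPGAs.

    Returns (most constrained resource, {block: FPGA index}).
    """
    limits = {r: capacity[r] * util_pct // 100 for r in capacity}
    # Most constrained resource: highest total demand relative to its limit.
    key = max(capacity,
              key=lambda r: sum(b[r] for b in blocks.values()) / limits[r])
    used = [dict.fromkeys(capacity, 0) for _ in range(n_fpgas)]
    placement = {}
    # Place the largest blocks (measured on the key resource) first.
    for name in sorted(blocks, key=lambda n: blocks[n][key], reverse=True):
        for i, fpga in enumerate(used):
            if all(fpga[r] + blocks[name][r] <= limits[r] for r in capacity):
                for r in capacity:
                    fpga[r] += blocks[name][r]
                placement[name] = i
                break
        else:
            raise ValueError(f"{name} does not fit: add an FPGA or split the block")
    return key, placement

key, placement = first_fit_partition(BLOCKS, CAPACITY, n_fpgas=3)
print(key)        # gates
print(placement)  # {'BLOCK1': 0, 'BLOCK2': 1, 'BLOCK3': 2, 'BFM2': 1, 'BFM1': 0}
```

With these numbers BFM2 lands on the same FPGA as BLOCK2, which happens to suit closely knit modules. A real flow would also weigh the nets cut between FPGAs and iterate by hand, as the text describes.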

Partitioning Gate Count Challenge

Once the gross-level partitioning is known through the analytical method of Table 18.2, it needs to be implemented. There are tools that can read in the RTL files and write out a regrouped file. Such regrouping results in new hierarchical tables being generated, as shown in Table 18.3.

For this example, assuming a capacity of about 100M gates per FPGA, Table 18.3 shows that FPGA3 is OK, but FPGA1 and FPGA2 are likely to be challenging for the place-and-route (P&R) stage. These considerations and iterations continue until there is sufficient convergence. Table 18.3 is deficient in terms of pin count and memory, as it is for illustration purposes only.

Table 18.3: Sorted list of hierarchies on a per-FPGA basis

Table 18.4: Actual partitioned pin count vs. available connections between FPGAs

However, since modules BLOCK2 and BFM2 are closely knit, moving some of BLOCK2's modules onto FPGA3, which appears to be the least constrained, could create a pin count challenge.

Partitioning Pin Count

A multi-FPGA board usually has a fixed pin count between FPGAs, which can be summarized in a template table such as Table 18.4.

In Table 18.4, PF12 is the number of physical I/O pins available between FPGA1 and FPGA2 (F1 <--> F2) on the FPGA board.

In Table 18.4, Not Applicable (NA) marks an FPGA that is not used in the implementation. The implemented pin count across the FPGAs (IPF) should be less than the provisioned pin count across the FPGAs (PF). Thus, the pin count criterion is met when IPF12 < PF12, and likewise for every other FPGA pair.
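Once both tables exist, the criterion is easy to check mechanically. In this sketch the PF and IPF values are hypothetical, and pairs are keyed as (lower, higher) FPGA index. A pair marked NA (`None`) is treated as having no physical connection, so any implemented pins on it are flagged.

```python
# Hypothetical board: provisioned pins PF per FPGA pair; None marks NA.
PF = {(1, 2): 400, (1, 3): 200, (2, 3): 200, (1, 4): None}
# Hypothetical implemented pin counts IPF after partitioning.
IPF = {(1, 2): 520, (1, 3): 150, (2, 3): 90}

def pin_violations(pf, ipf):
    """Return [(pair, IPF, PF)] for every pair where IPF < PF does not hold."""
    violations = []
    for pair, provisioned in pf.items():
        implemented = ipf.get(pair, 0)       # no crossing nets -> 0 pins used
        if provisioned is None:              # NA: no usable connection
            ok = implemented == 0
        else:
            ok = implemented < provisioned   # criterion from the text
        if not ok:
            violations.append((pair, implemented, provisioned))
    return violations

print(pin_violations(PF, IPF))  # [((1, 2), 520, 400)]
```

A violating pair is a candidate for the pin multiplexing described next, or for moving modules to reduce the nets cut between those two FPGAs.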

If the pin count criteria are not satisfied, you can resort to pin multiplexing of the I/O. This means adding utility RTL that sends multiple bits of data over a single I/O from one FPGA to another. This utility RTL is inserted before the pin-multiplexed I/O. Figure 18.6 shows the utility RTL circuit on the FPGAs for pin multiplexing. It performs three main operations:

• Load: convert from parallel to serial. 

• Shift: shift the serial data from one FPGA to the other (FPGA2FPGA).

• Restore: convert the serial data back to parallel.  

Fig. 18.6: Pin muxing for IOs over two FPGAs
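The three operations can be modeled in a few lines of Python as a time-division multiplexer, in which each transfer slot drives one group of `n_pins` bits. The function names are hypothetical. The model ignores the synchronization between the two FPGAs that real pin-multiplexing logic must handle. It does show the cost: each emulation cycle needs at least `mux_ratio` transfer slots, so the transfer clock must run at least that many times faster than the emulation clock.

```python
import math

def mux_ratio(n_signals, n_pins):
    """Transfer slots needed to carry n_signals over n_pins physical pins."""
    return math.ceil(n_signals / n_pins)

def load(word_bits, n_pins):
    """Load: capture the parallel word, padded to a whole number of slots."""
    ratio = mux_ratio(len(word_bits), n_pins)
    return list(word_bits) + [0] * (ratio * n_pins - len(word_bits))

def shift(shift_reg, n_pins):
    """Shift: yield the n_pins bits driven onto the pins in each transfer slot."""
    for i in range(0, len(shift_reg), n_pins):
        yield shift_reg[i:i + n_pins]

def restore(slots, width):
    """Restore: reassemble the received slots into the original parallel word."""
    bits = [b for slot in slots for b in slot]
    return bits[:width]

word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]       # 10 signals crossing FPGA1 -> FPGA2
slots = list(shift(load(word, 4), 4))       # only 4 physical pins available
print(mux_ratio(len(word), 4), len(slots))  # 3 3
print(restore(slots, len(word)) == word)    # True
```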

EDA tools such as Certify™ from Synopsys® are a major enabler of this convergence.

Using SERDES Lanes

It is also possible to use the FPGA SERDES lanes as an extension of pin multiplexing. A SERDES provides a convenient serializer and deserializer over a two-wire (differential) link, which can transmit and receive data in the Gbps (gigabits per second) range. SERDES lanes are useful for converting FPGA-to-FPGA I/Os into a serial stream, sending it across at high speed, and reconstructing the data at the other end.
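A rough capacity calculation helps decide whether SERDES lanes can replace pin multiplexing. The sketch assumes that the only overheads are the line encoding (8b/10b and 64b/66b are the common choices) and an optional number of framing bits per emulation cycle. It ignores link latency, which adds to every hop and can limit emulation speed as much as bandwidth does. Line rates, clock values, and function names are illustrative.

```python
def bits_per_emulation_cycle(line_rate_mbps, enc_payload, enc_total,
                             emu_clock_mhz, framing_bits=0):
    """Signals one lane can carry per emulation clock cycle (rounded down).

    Integer arguments keep the floor division exact.
    """
    raw = (line_rate_mbps * enc_payload) // (enc_total * emu_clock_mhz)
    return max(raw - framing_bits, 0)

def lanes_needed(n_signals, per_lane):
    """Lanes required to carry n_signals crossing one FPGA boundary."""
    return -(-n_signals // per_lane)   # integer ceiling division

per_lane = bits_per_emulation_cycle(10_000, 64, 66, 10)  # 10 Gbps, 64b/66b, 10 MHz
print(per_lane)                        # 969
print(lanes_needed(2000, per_lane))    # 3
print(bits_per_emulation_cycle(6_250, 8, 10, 5))         # 1000
```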

Handling Clocks Over Multiple FPGAs

As soon as we move to multiple FPGAs, the clocking complexity increases. One approach is to treat each hop, or evaluation, as a phase (a dedicated time slot) and to increase the emulation clock period accordingly. This means that the performance of the emulator drops with every signal hop.
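A back-of-the-envelope model of this scheme makes the slowdown concrete. It assumes that every phase has the same length and that a path crossing h FPGA boundaries needs one initial evaluation phase plus one phase per hop. Both the phase model and the function name are illustrative assumptions, not a description of any particular emulator.

```python
def emulation_clock_mhz(phase_clock_mhz, hops):
    """Emulation clock when the longest path needs (hops + 1) equal phases."""
    phases = hops + 1          # initial evaluation plus one phase per hop (assumed)
    return phase_clock_mhz / phases

for hops in range(4):
    print(hops, emulation_clock_mhz(50, hops))
```

Under this model a 50 MHz phase clock gives 50 MHz with no hops and 12.5 MHz with three hops. This is why partitioning tries to keep timing-critical paths within one FPGA.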
