
Basic Design Flow Based on RTL Synthesis and Implementation Tools


The combination of RTL synthesis and back-end tools is the core of the traditional synthesis-based FPGA design flow. These tools are essential in all other FPGA design flows, since all of them eventually converge into this one. In essence, an RTL synthesizer takes as input HDL files with synthesizable code, which define the system to be designed. The output generated by the synthesizer is an intermediate representation of the circuit, in which its basic structure can be identified but which has no link to any target technology. Back-end tools translate this generic structural representation into components available in the selected technology, map them into suitable locations within the FPGA fabric, and create the required interconnections by means of routing resources. If they succeed (which may not be the case, for instance, when the target device lacks enough logic or interconnect resources), the configuration bitstream is generated. Finally, the bitstream may be used either to directly configure the FPGA or to program an external nonvolatile memory whose contents are loaded into the FPGA at power-up.

The main elements this flow consists of, as well as the main information coming in and out of the different tools, are depicted in Figure 6.1. Grayed elements represent the information to be provided by the user. Elements marked with an asterisk are optional. 

The natural order to follow in this design flow starts with the creation of an HDL description of the system. This description is simulated in order to verify its functional correctness. After functional simulation, RTL synthesis and back-end tools transform the HDL description into a placed and routed design. At this point, accurate timing information is available, enabling detailed timing simulations to be carried out. Finally, by generating the bitstream and configuring the FPGA with it, the correct operation of the actual implementation can be verified. All these steps are discussed in detail in the following sections, in the same natural order.


FIGURE 6.1 RTL synthesis and implementation tools’ design flow.

Design Entry

The first stage of this flow corresponds to the entry of the required information into the design framework in order to specify the circuit to be designed. As shown in Figure 6.1, there are three entry points where external data have to be provided by the user, because they are design specific:

• The file(s) containing the HDL description(s) of the circuit to be designed for implementation in an FPGA. 

• The file(s) describing the testbenches for the device under a somewhat realistic context. In many cases, device subsystems, if complex enough, should have their own testbenches, too. Testbenches are used for simulation purposes and are discussed in Section 6.2.2.

• The file defining the placement restrictions for I/O connections, which map signals in the design to I/O pins of the FPGA. Optionally, other placement restrictions and configuration attributes of internal components and signals within the design can be included. This file is used to guide placement and is therefore discussed in Section 6.2.3.3.

For medium- and high-complexity designs, HDL descriptions (entity/architecture pairs in the case of VHDL, modules in the case of Verilog) are better organized hierarchically. Typically, the top-level descriptions just show the decomposition of the system into independent elements (modules), connected by signals, whose internal functionality is not described at this level. These descriptions simply place (instantiate) components and interconnect them, so they represent structure, not behavior. Ports of the components are linked to signals by port mappings associated with each instantiation. This approach is followed down the module hierarchy until functional descriptions, in which behavior can be identified, are obtained for all circuit components. A circuit can be neither simulated nor synthesized until the behavior of all its components is described.

The RTL statements and description styles used to represent behavior are relatively simple but quite different from those of programming languages, mainly because HDLs are not programming but description languages. They easily express concurrency (all hardware elements at architecture level are concurrent with one another, so the order in which they are listed in the code is not significant) as well as data transfers at every active clock edge.

Concurrent statements, either conditional (e.g., when…else) or selected (e.g., with…select), represent combinational logic. They may define anything from simple Boolean expressions to more complex functional blocks, such as decoders, encoders, MUXs, and code translators. Data merging and splitting can be described, respectively, through concatenation, for example, “MacHeader <= DestMAC & SourceMAC & TypeLength;”, and vector slices (subranges), for example, “MSByte (7 downto 0) <= Word16 (15 downto 8);”.
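The effect of concatenation and slicing on bit vectors can be sketched in Python (a behavioral model, not HDL). This is a minimal sketch. The helper names `concat` and `downto` are hypothetical, and widths are tracked explicitly because Python integers are unbounded. The MAC and type/length values are purely illustrative.

```python
# Bit-vector helpers mimicking VHDL "&" (concatenation) and "(hi downto lo)"
# slicing on unsigned integers. Helper names are hypothetical.

def concat(*fields):
    """Concatenate (value, width) pairs; the leftmost field ends up in the MSBs."""
    value, width = 0, 0
    for field_value, field_width in fields:
        if not 0 <= field_value < (1 << field_width):
            raise ValueError("field value does not fit in its declared width")
        value = (value << field_width) | field_value
        width += field_width
    return value, width


def downto(value, hi, lo):
    """Return bits hi..lo of value, like VHDL value(hi downto lo)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


# Illustrative header fields: 48-bit MAC addresses and a 16-bit type/length.
dest_mac = 0x0180C2000001
source_mac = 0x001122334455
type_length = 0x8808

mac_header, header_width = concat((dest_mac, 48), (source_mac, 48), (type_length, 16))

word16 = 0xABCD
ms_byte = downto(word16, 15, 8)   # MSByte(7 downto 0) <= Word16(15 downto 8)
ls_byte = downto(word16, 7, 0)
```

Slicing the 112-bit header at bits 111 down to 64 recovers `dest_mac`, just as a VHDL slice would. Note that both sides of a slice assignment must have the same width: an 8-bit target requires an 8-bit source range.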

On the other hand, synchronous sequential logic is typically described by means of processes where all synchronous signal assignments are conditionally executed within the clause “if clk'event and clk = '1'” or similar (e.g., “if rising_edge(clk)”). In this way, each signal assigned there is equivalent to one flip-flop (single-bit signal) or one register (vector signal). Processes run concurrently with other processes and other concurrent constructs, but they are “triggered” only when any of the signals in their so-called sensitivity list changes. Once a process is triggered, the statements within it are executed sequentially in “zero time.” Signal assignments do not take effect in the statement where they appear. Instead, all of them take effect together when the process suspends, after a minimum time delay (a “delta”). This delay expresses causality: a process is triggered when signals in its sensitivity list change, and signals assigned within the process change slightly after that. This behavior matches nicely the actual register updates taking place in hardware at active clock edges.
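The scheduling rule for signals can be illustrated with a small Python model. This is a minimal sketch, assuming one call to the hypothetical function `clock_edge` stands for one activation of a clocked process. Every right-hand side reads present values, and all updates are committed together afterward.

```python
# Model of signal assignments inside a clocked VHDL process: right-hand sides
# read *present* values; all updates take effect together when the process
# suspends. If a target is assigned twice, the last assignment wins.

def clock_edge(signals, assignments):
    """Apply one active clock edge.

    signals:     dict of present signal values
    assignments: list of (target, function_of_present_values) in code order
    """
    scheduled = {}
    for target, rhs in assignments:
        scheduled[target] = rhs(signals)   # evaluated with present values only
    updated = dict(signals)
    updated.update(scheduled)              # all updates committed together
    return updated


# q1 <= d; q2 <= q1;  -> two flip-flops in series (2-stage shift register)
shift_register = [
    ("q1", lambda s: s["d"]),
    ("q2", lambda s: s["q1"]),
]

state = {"d": 1, "q1": 0, "q2": 0}
state = clock_edge(state, shift_register)   # q1 takes d, q2 takes the OLD q1
after_first_edge = (state["q1"], state["q2"])
state = clock_edge(state, shift_register)
after_second_edge = (state["q1"], state["q2"])

# a <= b; b <= a;  -> the registers swap, regardless of statement order
swap = [("a", lambda s: s["b"]), ("b", lambda s: s["a"])]
swapped = clock_edge({"a": 3, "b": 7}, swap)
```

After the first edge, `q1` is 1 but `q2` is still 0, which shows that two registers are inferred. The swap works only because both right-hand sides read the values present before the edge.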

Since signal assignments do not take effect until the process suspends, all statements within such clauses describe how the new values of the memory elements are assigned according to the present values of signals and/or memory elements. This strictly follows the register-transfer level paradigm (which is what RTL stands for). Not all HDL constructs are acceptable for synthesis, but the RTL subset is.

If the conditions to assign new values to registers within a process are too complex to define, variables can be used. Unlike signals, variables change immediately when they are assigned within the sequential structure of a process or function. Variables are therefore used to describe algorithms, in the sense that register values depend on combinations of variables and signals that are related algorithmically. As discussed in Section 6.4, HLS has a similar purpose. The difference, however, is that the use of variables within a process does not modify the concept of register transfer. All operations involving variables within a process that represents synchronous sequential logic take place within one clock cycle. In fact, complex algorithmic expressions may produce long critical paths by accumulating operators between register outputs and inputs. In contrast, HLS allows pipelining and other optimization techniques to be applied, since it does not start from a time-explicit description.
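The difference between variables and signals inside a clocked process can be sketched as follows. This is a Python model of two hypothetical descriptions of q = a + b + c, not synthesizable code. With the variable, both additions are chained within one clock cycle, so they lie on a single register-to-register path. With the intermediate signal `s`, a register separates the two additions.

```python
# Two ways of describing q = a + b + c inside a clocked process
# (Python model; function names are hypothetical).

def edge_with_variable(regs, a, b, c):
    # v := a + b; v := v + c; q <= v;
    # The variable updates immediately, so both adders are chained between
    # the inputs and register q: a longer combinational path, result in 1 cycle.
    v = a + b
    v = v + c
    return {"s": regs["s"], "q": v}


def edge_with_signals(regs, a, b, c):
    # s <= a + b; q <= s + c;
    # q reads the OLD value of s, so register s splits the two additions.
    # (A proper pipeline would also delay c; here a, b, c are held constant.)
    return {"s": a + b, "q": regs["s"] + c}


regs_v = {"s": 0, "q": 0}
regs_v = edge_with_variable(regs_v, 1, 2, 3)   # q is 6 after one edge

regs_s = {"s": 0, "q": 0}
regs_s = edge_with_signals(regs_s, 1, 2, 3)    # q is 3 (old s was 0)
regs_s = edge_with_signals(regs_s, 1, 2, 3)    # q is 6 one edge later
```

The variable version is shorter in latency but places two adders in series on one path. This is the kind of operator accumulation that lengthens critical paths.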

Nowadays, design entry is automated and/or facilitated in many respects to simplify the designer’s tasks. FPGA design frameworks are more integrated than ever, with all options available within the same environment. Editors for design entry may offer features such as templates for basic constructs, syntax highlighting, automatic or aided indentation, on-the-fly syntax checking, code beautifiers, context search, and automatic block comment/uncomment. Also, some frameworks can automatically generate schematics from structural descriptions and allow navigation throughout the module hierarchy. For instance, name matching of modules within working libraries allows the hierarchy to be identified automatically. All modules are sorted as a hierarchy tree with no need for configuration (i.e., no need to define methods for associating a component’s module name with its description).

After design entry, a fearless designer might proceed straight into synthesis and the back-end process. However, it is common practice to perform complete simulations of the important blocks, as well as of the whole design, as an intermediate step. Simulation tools are described in Section 6.2.2.

Simulation Tools

Simulation is the preferred method to ensure that the description of a circuit matches its expected functionality. In simulations, a circuit must be set to work under the required conditions or, in other words, to receive a suitable and realistic set of input stimuli, allowing correct operation to be verified. The required stimuli sets are obtained through the generation of testbench descriptions. A testbench is an HDL file that contains an instantiation of the unit under test (UUT) and the elements that provide the stimuli to it. Optionally, testbench descriptions may include assertions to automatically check the fulfillment of some operating conditions. 

Multivalued logic is available, and its use is strongly recommended. In this way, the ability to describe the behavior of digital signals is extended beyond the strong logic values “0” and “1” to many other situations that may occur in an actual circuit. These are “U” (uninitialized), “X” (forcing unknown, e.g., a conflict), “Z” (high impedance), “W” (weak unknown), “H” (weak high), “L” (weak low), and “-” (don’t care). This makes some problems in either the UUT or the testbench itself easier to identify. For instance, consider a noninitialized flip-flop in the design (because it does not have a reset signal) or in the testbench (because the reset signal is not asserted at the beginning). It produces uninitialized values that rapidly propagate through the design. This is because simulators are conservative (or even pessimistic), in the sense that they intend to highlight any possibly hazardous condition in the circuit, drawing the designer’s attention to it. Simulators can also highlight other common mistakes, such as a signal being driven from several concurrent statements or processes, by setting the affected signal to “X.”
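How a simulator decides the value of a signal with several drivers can be sketched with the resolution table of the IEEE 1164 std_logic type. This is a Python transcription for illustration; the names `RESOLUTION` and `resolve` are hypothetical. Two processes driving opposite strong values yield “X,” whereas a weak pull-up (“H”) is overridden by a strong “0.”

```python
from functools import reduce

# std_logic values in the order used by the IEEE 1164 tables.
STD_LOGIC = "UX01ZWLH-"

# Resolution table (IEEE 1164): RESOLUTION[a][b] is the value seen on a net
# driven simultaneously by a and b.
_ROWS = {
    "U": "UUUUUUUUU",
    "X": "UXXXXXXXX",
    "0": "UX0X0000X",
    "1": "UXX11111X",
    "Z": "UX01ZWLHX",
    "W": "UX01WWWWX",
    "L": "UX01LWLWX",
    "H": "UX01HWWHX",
    "-": "UXXXXXXXX",
}
RESOLUTION = {a: dict(zip(STD_LOGIC, row)) for a, row in _ROWS.items()}


def resolve(drivers):
    """Resolve all drivers of one net, folding the table from an initial 'Z'
    (single drivers are returned unchanged), as the std_logic resolution does."""
    if len(drivers) == 1:
        return drivers[0]
    return reduce(lambda acc, d: RESOLUTION[acc][d], drivers, "Z")
```

For example, `resolve(["0", "1"])` gives “X” (a multiple-driver conflict), and `resolve(["H", "0"])` gives “0.” Any “U” driver makes the result “U,” which is how an uninitialized value spreads.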

For relatively simple circuits, stimuli are generated from processes that define the evolution of input signals over time. As simulation time elapses, these processes describe changes in input signals by using wait for (or similar) constructs, until all conditions have been exercised. Clocks are modeled in dedicated separate processes that take advantage of the fact that a process restarts as soon as it finishes, in order to achieve continuous operation. Testbench template generation tools can model this feature automatically. Other signals are typically grouped into different processes according to the origin of the incoming stimuli in order to mimic realistic operation. For instance, all signals involved in a communication channel are grouped into one process in order to produce input signal variations resembling the communication standard used in that channel (regarding issues such as timing, signal polarity, and coding). In this respect, functions or procedures that carry out repetitive operations with different data are very helpful.
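Both stimulus styles can be sketched as lists of (time, value) events. This is a Python illustration, not testbench HDL. `clock_events` mimics a clock process that toggles and waits for half a period before restarting. `send_byte` plays the role of a reusable procedure that serializes bytes on a UART-like line (start bit, eight data bits LSB first, stop bit). Both names, the framing, and the roughly 115200-baud bit time of 8680 ns are assumptions for illustration.

```python
# Testbench-style stimulus generation as (time_ns, value) event lists.
# Function names, timings, and the UART-like framing are illustrative.

def clock_events(period_ns, stop_ns, start_level=0):
    """Free-running clock: like a process doing 'clk <= not clk;
    wait for period/2;' that restarts every time it reaches its end."""
    events, level, t = [], start_level, 0
    while t < stop_ns:
        events.append((t, level))
        level ^= 1
        t += period_ns // 2
    return events


def send_byte(byte, bit_time_ns, t0_ns=0):
    """Reusable 'procedure': start bit (0), 8 data bits LSB first, stop bit (1)."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [(t0_ns + i * bit_time_ns, b) for i, b in enumerate(bits)]


BIT_TIME_NS = 8680   # about 115200 baud (assumed)

clk = clock_events(period_ns=10, stop_ns=40)
rx_line = send_byte(0x41, BIT_TIME_NS) + send_byte(0x42, BIT_TIME_NS, t0_ns=10 * BIT_TIME_NS)
```

Calling the same procedure with different data and start times keeps the channel timing consistent across all transmitted bytes.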

Processes, however, may not be the best approach for generating stimuli when UUTs are connected to (many) other elements, or when there are intensive I/O operations, possibly requiring fine timing relationships among signals. In this case, the environment of the UUT (the other elements connected to it) needs to be modeled in such a way that stimuli for the different modules are provided at the right times. For instance, if an external memory is used, a model is needed for it. This model defines a more or less precise timing behavior, depending on the target timing precision for the particular simulation, so that the interaction between both modules is accurately described. Otherwise, the designer would have to foresee precisely when the UUT would issue a read transaction to the memory. The designer would then have to generate the right data value at the right time, according to the address the UUT is supposed to point to.
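A behavioral memory model of this kind can be sketched in Python. This is a minimal sketch, assuming a hypothetical interface (read request and address in, data-valid flag and data out) and a fixed read latency counted in clock cycles. A real model would follow the memory datasheet. The point is that the testbench writer no longer has to predict when the UUT reads: the model answers whatever address it is given, with the right delay.

```python
from collections import deque

class SyncReadMemoryModel:
    """Behavioral memory model with a fixed read latency in clock cycles.
    The interface and latency are hypothetical simplifications."""

    def __init__(self, contents, latency_cycles):
        self.contents = dict(contents)
        # One slot per cycle of latency; None means "no data returned".
        self.pipeline = deque([None] * latency_cycles)

    def clock(self, read_request, addr=None):
        """Advance one clock edge; return (data_valid, data)."""
        self.pipeline.append(self.contents.get(addr, 0) if read_request else None)
        out = self.pipeline.popleft()
        return (out is not None, out)


mem = SyncReadMemoryModel({0x10: 0xAA, 0x11: 0xBB}, latency_cycles=2)
requests = [(True, 0x10), (True, 0x11), (False, None), (False, None)]
responses = [mem.clock(req, addr) for req, addr in requests]
```

The data for a read issued at cycle 0 appears, flagged as valid, at cycle 2. A UUT model only has to watch the valid flag.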

In general, this type of testbench modeling is required when there is strong module interaction or when closed loops are present in the system. The need for such testbenches must be foreseen when estimating design and validation efforts. Sometimes, it may be more difficult to model the environment of a circuit than the circuit itself. Simulation and verification times can never be neglected within the whole design process, but they are particularly significant for these heavily interacting systems.

Figure 6.2 shows sample block diagrams of the two aforementioned testbench types. Case (a) corresponds to a basic, process-based testbench, where input stimuli come from a stand-alone process. Case (b) corresponds to a simulation with model components set together. The memory model interacts with the UUT by means of the corresponding bus signals. The communication module produces the necessary data sequences in its connections to the UUT, according to a process that generates data packets at the required moments.

FIGURE 6.2 Testbench examples: (a) Stimuli process based; (b) side module based.

As shown in Figure 6.1 and discussed in the following, simulations can be performed at two different stages of the design process, namely, at the functional validation level and at the timing verification level. In the first case, no timing information derived from the characteristics of the implementation is included, so it is often referred to as “ideal model” simulation. The second case corresponds to a stage where accurate timing information (at subclock cycle level) is available for analysis. Timing-accurate simulations can be performed thanks to the data coming out of the timing analysis, which can be conducted after the place and route process (described in Section 6.2.3.3). The simulation model of the UUT is then back-annotated with data regarding the delays of the internal logic and its associated connections inside the FPGA (including wires and switching blocks). Since this model contains a lot of information derived from the structure of the FPGA, simulation runs much more slowly than with the ideal model. This is the main reason why an initial functional simulation of the circuit is strongly advisable. It validates the circuit's functions at a high level, as well as its overall activity and interactions, before proceeding to place and route and then to timing simulation. It is also possible to use a “golden” reference testbench that can be applied to both models. Such a testbench is equipped with assertions that automatically verify that no deviations from the target behavior occur during the synthesis and back-end design phases.
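The idea of a golden testbench reused across models can be sketched in Python. The same stimuli are applied to a reference model and to a model under test, and outputs are compared cycle by cycle. This is a minimal sketch with hypothetical names. The “timed” counter is a deliberately crude stand-in for a back-annotated model: when its path delay exceeds the clock period, it simply fails to capture the new value. A real timing simulator would instead report a setup violation and typically drive “X.”

```python
def reference_counter(q, _x):
    """Ideal (functional) 3-bit counter: returns (next_state, output)."""
    return (q + 1) % 8, q


def make_timed_counter(path_delay_ns, period_ns):
    """Crude hypothetical timing model: if the register-to-register path is
    slower than the clock period, the register keeps its old value."""
    def step(q, _x):
        nq = (q + 1) % 8 if path_delay_ns <= period_ns else q
        return nq, q
    return step


def golden_testbench(dut_step, ref_step, stimuli, reset_state=0):
    """Apply identical stimuli to both models; return (cycle, dut, ref)
    for every cycle whose outputs differ (an empty list means 'pass')."""
    q_dut = q_ref = reset_state
    mismatches = []
    for cycle, x in enumerate(stimuli):
        q_dut, y_dut = dut_step(q_dut, x)
        q_ref, y_ref = ref_step(q_ref, x)
        if y_dut != y_ref:
            mismatches.append((cycle, y_dut, y_ref))
    return mismatches


stimuli = [None] * 20   # the counter has no data inputs; one entry per cycle
passing = golden_testbench(make_timed_counter(8, 10), reference_counter, stimuli)
failing = golden_testbench(make_timed_counter(12, 10), reference_counter, stimuli)
```

Because the checker only compares outputs, the same harness applies unchanged to the functional and the timing model.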

Not only very sophisticated simulation environments but also conventional ones within a classic design flow are nowadays equipped with feature-rich visualization and analysis tools, which significantly simplify design validation. Data can be visualized as individual signal lines or grouped into buses, using different representations (such as binary, hexadecimal, or ASCII), or even as analog waveforms displayed in an oscilloscope-like format. Signals exhibiting atypical or unexpected behaviors (e.g., taking an “X” value) are displayed in attention-calling colors. Navigation through long simulation runs can be accelerated by all kinds of zooming and panning. Navigation may also be done by selecting a signal and stepping through its evolution transition by transition. In this way, there is no need to search for specific values of a signal across long periods of time. The visualization tool can be asked to move forward (or backward) and find that small “cycle” almost hidden among all other signals. It is also possible to select the signals to be traced by navigating through the hierarchy of components and processes.

The simulation process can be enhanced with features that allow more realistic results to be achieved, execution to be accelerated, or interacting discrete- and continuous-time systems to be analyzed together. The resulting simulation approaches can be summarized in the following categories, addressed in subsequent sections:

• Interactive simulation 

• Mixed-mode simulation 

• HIL verification 

Interactive Simulation

Simulations can be made interactive (and given customized graphical interfaces) in order to build virtual models that resemble as closely as possible the appearance of the system and the interactivity among its elements. This is particularly useful when interaction with humans is to be verified/validated. Some simulators offer these possibilities by means of scripting languages or standard programming languages (such as C), native tools embedded into other frameworks, or even communication sockets enabling distributed virtual or remote simulations. If fast-enough simulation platforms are available (e.g., combining powerful processors with simple and fast system models), interactivity can be made somewhat similar to “real-life” behavior. For instance, user interfaces can be validated and “mock-ups” of products can be made well before the products are actually produced or built. However, this technique is not advisable from the design validation viewpoint, since unpredictable human interaction makes experiments lose repeatability.

Mixed-Mode Simulation

Mixed-mode simulation is a technique to be considered when the embedded system to be designed is part of a control loop in which the system to be controlled is modeled in continuous time rather than as a discrete-event system. In this case, it is possible to combine discrete-event simulators (such as HDL simulators for the required digital designs) with continuous-time simulators, which are effective for either analog circuits or any other physical systems modeled with continuous signals. For instance, when designing a motor controller, its behavior can be analyzed more realistically if the discrete events coming out of it are converted to analog signals and applied to a motor model so that both elements are simulated jointly. There are also some HDLs specifically targeting mixed-mode simulation, such as VHDL-AMS, that can be useful for this type of simulation.
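The motor-controller example can be sketched as a tiny co-simulation in Python. A discrete-time proportional speed controller updates its output once per sample period; the output is then held constant, as a DAC or averaged PWM would hold it. Meanwhile, a continuous-time first-order motor model is integrated with a much smaller Euler step. This is a minimal sketch: the plant model dω/dt = (K·u − ω)/τ, the controller structure, and all parameter values are assumptions chosen for illustration.

```python
# Mixed-mode sketch: sampled P controller (discrete events) + continuous
# first-order motor model (Euler integration). All values are assumptions.

def simulate(ref_speed=30.0, kp=0.05, gain=100.0, tau=0.1,
             ts=1e-3, dt=1e-4, t_end=1.0):
    """Return (final_speed, final_duty) after t_end seconds."""
    speed, duty = 0.0, 0.0
    steps_per_sample = round(ts / dt)
    n_samples = round(t_end / ts)
    for _ in range(n_samples):
        # Discrete event: the controller samples the speed once per period
        # and updates its (clamped) duty cycle, held until the next sample.
        duty = min(max(kp * (ref_speed - speed), 0.0), 1.0)
        # Continuous part: d(speed)/dt = (gain * duty - speed) / tau
        for _ in range(steps_per_sample):
            speed += dt * (gain * duty - speed) / tau
    return speed, duty


final_speed, final_duty = simulate()
```

With pure proportional control, the speed settles at K·kp/(1 + K·kp) of the reference. That is 25 for these values, with a duty cycle of 0.25. This steady-state error is the kind of closed-loop effect that simulating the digital controller in isolation would not reveal.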

HIL Verification

The evolution of the features and performance of simulation platforms currently enables emulation and simulation tasks to be combined through HIL techniques. This approach is similar to mixed-mode simulation, but instead of a model of the physical system, the real system itself is used. A necessary condition for HIL platforms to provide realistic results is that the model being simulated executes as fast as the real system it interacts with. Of course, this condition cannot be met for some systems with demanding hard real-time requirements. Nevertheless, it is feasible in a wide range of applications, and HIL is being adopted in many design and verification flows.

Some companies include in their simulators features intended to support HIL operation. These features provide suitable interfaces between the simulation and emulation domains, similar to those required in mixed-mode simulation for the interaction between the digital and the analog/physical models. However, since there is no standardization so far in this respect, the mixed emulated-simulated scenarios have to be customized on a case-by-case basis, usually implying a considerable amount of work. As a consequence, the decision on whether or not to use this technique is strongly application dependent. Even so, it is becoming a very interesting possibility for many embedded systems in industrial applications, as discussed in Chapter 9.

RTL Synthesis and Back-End Tools

The validation of the functional description of a system in synthesizable HDL code is the green flag to proceed to the synthesis and implementation of the design. As discussed in Section 6.2 (and highlighted in Figure 6.1), the transformation from the HDL description of a circuit to the corresponding FPGA programming bitstream consists of several steps. These are RTL synthesis (analyzed in detail in Section 6.2.3.1), translation into the target technology (Section 6.2.3.2), placement and routing (Section 6.2.3.3), and bitstream generation (Section 6.2.3.4). In addition to HDL descriptions, the specification of constraints for guided placement and some other parameters may be required to configure and guide the synthesis and implementation processes.

RTL Synthesis

This step is the most important one in the basic design flow, because it is where the logic elements that will actually perform the functions described in the HDL input file(s) are obtained. All elements described in the file(s) are translated into an intermediate representation, where operators that process signals are placed between registers that store signal values from one clock cycle to the next. This characteristic is what gives this type of synthesis its name, RTL (register-transfer level).

The operation of an RTL synthesizer can be explained from the state and output equations that define the evolution of any sequential system, which may be expressed as 

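$$Q_{t+1} = f(Q_t, X_t)$$

and

$$Y_t = g(Q_t, X_t)$$

where

• Q are the state variables

• X are the inputs

• Y are the outputs

• f and g are the next-state and output functions, respectively

• Subindex “t” denotes the current time, and “t+1” denotes the next time step (according to the discrete nature of time in synchronous sequential systems)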

On the one hand, an RTL synthesizer translates the expressions inside the clauses conditioned to work only at active clock edges (e.g., within clk'event clauses inside the processes that model sequential logic in VHDL) into state equations. On the other hand, concurrent statements are translated into output equations, that is, combinational functions that determine the value of the outputs at any time, given the state and the input values at that very same time.
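The two equations can be exercised directly with a small Python simulation loop. This is a minimal sketch in which `run_sequential` and the example functions are hypothetical. It models a rising-edge detector: the state Q stores the previous input sample, and the output Y = X·(not Q) depends on the state and the input at the same time step. This is exactly the form Y_t = g(Q_t, X_t).

```python
def run_sequential(f, g, q0, inputs):
    """Simulate Q[t+1] = f(Q[t], X[t]) and Y[t] = g(Q[t], X[t])."""
    q, outputs = q0, []
    for x in inputs:
        outputs.append(g(q, x))   # output equation: combinational, same step
        q = f(q, x)               # state equation: effective at the next edge
    return outputs, q


def f_edge(q, x):
    """Next state: remember the current input sample."""
    return x


def g_edge(q, x):
    """Output: 1 only when the input is 1 now and was 0 before."""
    return x & (1 - q)


outputs, final_state = run_sequential(f_edge, g_edge, 0, [0, 1, 1, 0, 1])
```

The output pulses only in the cycles where the input goes from 0 to 1, giving [0, 1, 0, 0, 1].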

Figure 6.3 illustrates with an example how a design (a simple binary counter) is gradually transformed from its description in the HDL input file to the final circuit implementation in the FPGA fabric. The VHDL process describing the behavior of the counter is shown in Figure 6.3a.


FIGURE 6.3 Stages during synthesis and implementation: (a) HDL code; (b) abstract model; (c) after Boolean optimization; (d) mapped into target technology; (e) placed design; (f) placed and routed design.

The assignments representing changes in the state of the counter that may take place at each active edge of the clock signal “clk” (i.e., conditioned to the occurrence of “clk'event and clk = '1'”) are

• “counter <= '0';”, which makes the counter roll over when the MaxVal value has been reached

• “counter <= counter + 1;”, which increases the counter value otherwise 

Since the state of the counter is assigned by a synchronous operation, a register is required. Therefore, the translation of these statements into the intermediate representation of the circuit shown in Figure 6.3b produces a register and a binary adder. The adder adds 1 to the current value of the register, and the result is transferred into the register's next value if the counter is not at the MaxVal value (a comparator is used to check this condition). Otherwise, the register synchronously rolls over to zero. This “synchronous clear” function must have higher priority than the “increment” function. This priority is explicitly expressed in the code, where the roll-over condition is evaluated in the if clause and the increment is performed in the corresponding else part.
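The intermediate representation just described (register, adder with constant 1, comparator, and a clear that overrides the increment) can be modeled in a few lines of Python. This is a minimal sketch of the structure as the text describes it. The function names are hypothetical, and the register is assumed wide enough to hold MaxVal.

```python
def counter_next(q, max_val):
    """Next-state logic: a comparator selects between synchronous clear
    (higher priority, the 'if' branch) and the adder output q + 1."""
    at_max = (q == max_val)                 # comparator
    incremented = q + 1                     # adder with constant input 1
    return 0 if at_max else incremented     # 2:1 MUX, clear has priority


def run_counter(max_val, cycles, q=0):
    """Register: sample the next-state logic at each active clock edge."""
    trace = []
    for _ in range(cycles):
        trace.append(q)
        q = counter_next(q, max_val)
    return trace


trace = run_counter(max_val=4, cycles=7)
```

With MaxVal = 4, the counter produces 0, 1, 2, 3, 4, 0, 1: it counts up to MaxVal and then rolls over.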

At this point, a significant question arises. The HDL code has been transformed into a register, an adder, and a comparator. Does this mean that the hardware structure of a counter implemented from an HDL description is different from the well-known one in Figure 6.3c, typically taught in basic digital electronics courses? Apparently yes, but the real answer is no. After the translation into the intermediate representation, it is time for Boolean optimization. In this step, logic is simplified as much as possible, unnecessary logic is removed, and redundant logic is reduced when identified as such (which is only possible to a certain extent in complex circuits). If specified, logic is also arranged to fulfill timing constraints. These constraints are specified by the designer and, in their simplest form, just consist of a minimum operating frequency, which determines the maximum critical path delay of the design.

How is the structure in Figure 6.3c inferred from Figure 6.3b? As can be seen, the adder has the constant integer value 1 at one of its inputs, that is, the binary combination “00…0001.” The 0s in this input greatly simplify the adder, and the chain of AND gates that appears in the reduced circuit is what remains of the carry propagation chain. The transformation from D flip-flops (the usual ones for registers) to T flip-flops comes from the fact that the basic addition of two bits is equivalent to an XOR gate. Indeed, 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, and 1 + 1 = 0, with the carry going to the next bit; “+” is, in this case, the arithmetic operator, not the logic one. A T flip-flop, in turn, is obtained from a D flip-flop by feeding back the output of the register through an XOR gate. Finally, if the counter’s count cycle is a power of 2, the comparator disappears. Otherwise, it appears as an AND gate that triggers the register rollover.
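The equivalence claimed above can be checked exhaustively for small widths with a Python sketch. The behavioral form is “register + adder with constant 1.” The reduced form has each bit toggling (XOR with the carry, i.e., a T flip-flop) when all lower bits are 1, the running AND being the residue of the carry chain. A second check shows why the comparator vanishes only when the count cycle is a power of 2. All function names are hypothetical.

```python
def incrementer_ref(q, n):
    """Behavioral n-bit increment: register plus adder with constant 1."""
    return (q + 1) % (1 << n)


def incrementer_tff(q, n):
    """Reduced structure: bit i toggles when all lower bits are 1."""
    nxt, carry = 0, 1                 # carry into bit 0 is the constant 1
    for i in range(n):
        bit = (q >> i) & 1
        nxt |= (bit ^ carry) << i     # XOR = 1-bit sum (T flip-flop behavior)
        carry &= bit                  # AND chain = what is left of the carry chain
    return nxt


def equivalent(n):
    """Exhaustively compare both forms over all 2**n states."""
    return all(incrementer_ref(q, n) == incrementer_tff(q, n) for q in range(1 << n))


def comparator_redundant(n, count_cycle):
    """True if clearing at count_cycle - 1 matches the natural n-bit wrap-around."""
    max_val = count_cycle - 1
    return all((0 if q == max_val else q + 1) == incrementer_ref(q, n)
               for q in range(count_cycle))
```

For a 3-bit register, a count cycle of 8 makes the clear identical to the natural wrap-around, so the comparator can be optimized away. A count cycle of 6 does not, because state 5 must jump to 0 instead of 6.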

Translation

The result of the synthesis is an intermediate generic representation with reduced logic expressions. However, this is not necessarily equivalent to a minimized circuit built with conventional logic gates, nor are these gates size optimized for the target. Actually, the target technology might use building blocks that are far from being these logic gates, as in the case of FPGAs, where LUTs and flip-flops are the basic constituents of the logic (as analyzed in Section 2.3.1). Therefore, in FPGA design, logic expressions must, in principle, be implemented with LUTs (specialized hardware blocks may also be used). The same functionality must be obtained, but using the elements that allow the circuit to be mapped into the target FPGA device.
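Why any logic expression of up to k inputs fits in one k-input LUT can be shown with a short Python sketch. The LUT is simply a 2^k-bit truth table addressed by its inputs. Mapping a function to it therefore amounts to computing that table, often called the INIT value. The function names are hypothetical, and the bit ordering (input I0 as the least significant address bit) is an assumption that mirrors common LUT primitives. It should be checked against the vendor documentation for a given family.

```python
def lut_init(func, k=4):
    """Pack the truth table of a k-input Boolean function into an integer:
    bit i holds func(inputs), where input j is bit j of i (I0 = LSB)."""
    init = 0
    for i in range(1 << k):
        inputs = [(i >> j) & 1 for j in range(k)]
        if func(*inputs):
            init |= 1 << i
    return init


def lut_eval(init, *inputs):
    """What the LUT hardware does: use the inputs as an address into INIT."""
    address = sum(bit << j for j, bit in enumerate(inputs))
    return (init >> address) & 1


and4_init = lut_init(lambda a, b, c, d: a & b & c & d)   # only address 15 is 1
xor2_init = lut_init(lambda a, b, c, d: a ^ b)           # c and d unused
```

A 4-input AND yields 0x8000, and a 2-input XOR on I0 and I1 yields 0x6666. The cost in LUT resources is the same regardless of how “complex” the function looks as gates.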

With the advent of proprietary synthesizers for some FPGA families, the synthesis and translation steps have become more tightly linked. Mapping some structures into specialized resources, such as memories or DSP blocks, requires some knowledge of the underlying technology in order not to miss the possibility of using them. In some cases, elements can only be inferred if their description follows a specific coding style that the synthesizer recognizes. In other cases, attributes in the code (or synthesis and optimization options) can be used to guide the tools to infer some specific elements and to choose among different implementation possibilities. This is the case, for instance, for memories: the designer may decide whether a memory in his/her design is to be mapped into embedded memory blocks or distributed along the logic (among other possibilities). Tools, in principle, should be able to choose a suitable solution (if one exists) so that the circuit fits into the target FPGA. However, the system may need to be “fine-tuned,” for instance, for speed optimization or detailed positioning and mapping. In that case, the designer may decide to instantiate technology-specific primitive blocks, such as an LUT, to have full control over the mapping. This may be combined with placement restriction specifications, which are analyzed in Section 6.2.3.3.

After technology translation, a circuit like the one in Figure 6.3d is obtained. At this point, the specific location of the elements and the specific connecting paths (wires and switching logic) to be used have not been determined yet. These issues are addressed in the placement and routing processes, described next.

Placement and Routing

The placement and routing processes are in charge of providing fully mapped logic and fully specified interconnects, respectively. In the placement process, the already device-specific circuit must be mapped into specific locations within the configurable fabric such that it meets two basic, but opposing, criteria. Connected elements must be as close as possible in order to minimize signal propagation delays. At the same time, logic density (or occupation) should not be too high, so that routing of all required connections remains possible with the available routing resources. Typically, placement and routing follow an iterative scheme, in which (preliminary) placements are followed by (nondetailed) routings. In each iteration, delay optimization steps are executed to minimize critical delays, and routing is progressively completed (in terms of the percentage of routed signals). If there is no routing solution for some resources within an area, some elements are swapped, displaced, or separated to facilitate routing. Routes that were not feasible in the previous iteration are normally routed first, assuming that the easier ones will still be routable afterward.
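The “swap elements and keep the change if it helps” idea can be illustrated with a toy placer in Python. Cells are placed on a grid, the cost is half-perimeter wirelength (HPWL), a common proxy for both delay and routing demand, and random pairwise swaps are accepted only if the cost does not increase. This is a deliberately simplified sketch with hypothetical names. Real placers use techniques such as simulated annealing or analytical placement and also model timing and congestion; none of that is represented here.

```python
import random

def hpwl(placement, nets):
    """Half-perimeter wirelength: sum of bounding-box half perimeters."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def greedy_swap_placement(cells, nets, grid_w, grid_h, iterations=2000, seed=0):
    """Random initial placement, then greedy pairwise swaps that never
    increase wirelength. Moves to empty sites are omitted for brevity."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(grid_w) for y in range(grid_h)]
    rng.shuffle(sites)
    placement = dict(zip(cells, sites))
    initial = cost = hpwl(placement, nets)
    for _ in range(iterations):
        a, b = rng.sample(cells, 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = hpwl(placement, nets)
        if new_cost <= cost:
            cost = new_cost                                          # keep the swap
        else:
            placement[a], placement[b] = placement[b], placement[a]  # undo
    return placement, initial, cost


cells = [f"c{i}" for i in range(8)]
nets = [(cells[i], cells[i + 1]) for i in range(7)]   # a chain of 2-pin nets
placement, initial_cost, final_cost = greedy_swap_placement(cells, nets, grid_w=4, grid_h=4)
```

Greedy acceptance can stagnate in a local minimum. That is one reason production tools accept occasional uphill moves and iterate placement together with routing.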

Placement and routing are among the most time-consuming tasks in the design flow. In spite of the simplified explanation given earlier, the actual procedures are very complex indeed. For instance, the possibility of stagnation of the iteration loop is addressed not only at design tool level but also at architecture level. As discussed in Section 2.3.3, FPGAs include local connections and midlength connections of different lengths. These allow signals to reach other areas of the device using different wire lengths, avoiding wire congestion and covering large distances in the FPGA without crossing too many interconnection switches (which account for most of the signal propagation delay). Design tools must also make consistent use of global lines (also discussed in Section 2.3.3). These are mostly dedicated to clock or reset signals, but they may also be used for other tasks, such as globally enabling or disabling large portions of a circuit by means of a high-fan-out enable signal. Because of the importance and complexity of this process, routing tools have become one of the key elements of the implementation tool flow.

In principle, placement is left to the tool for most of the logic elements, with some exceptions. First, designers have to specify the mapping of signals to pins in the FPGA. This is done by a collection of so-called placement restrictions (stored in a “restrictions file”), which specify the I/O type and I/O pin for each signal in the design. In addition to these mandatory placement restrictions, further ones may optionally be used to “guide” the tool in placing design components or elements in specific regions of the device. This not only helps obtain more optimized designs but also enables incremental or difference-oriented design. Complex designs require a lot of computation time for placement and routing. Therefore, parts of a design that have been previously validated may be consistently kept placed in the same regions with the same routing. Placement and routing efforts are then mainly concentrated on the parts of the design still being built and debugged, thus reducing the overall design effort.

Once routing is completed, the full circuit is known, so detailed reports may be produced, as shown in Figure 6.1. On the one hand, the utilization of logic resources is summarized, both in general terms and for every particular component (LUTs, flip-flops, slices, LBs, I/O blocks, DSP blocks, RAM blocks, etc.). On the other hand, the timing report is of particular interest because it includes an analysis of the critical paths and the resulting maximum operating frequency that may be achieved. If a minimum operating frequency is specified at synthesis time, the tools check whether this requirement is fulfilled. As a result, either the remaining slack time is reported (i.e., the margin by which the circuit meets the timing requirement) or a list of paths whose propagation delays exceed the maximum allowed time is provided. In some cases, these violating paths do not correspond to realistic situations, since they may represent conditions that are unlikely to happen during normal operation. In all other cases, the circuit must be redesigned using delay reduction techniques, such as pipelining or segmentation of large combinational areas in the design. For complex circuits with critical timing issues, this iterative design flow may be tedious, and it may turn out that no solution can be found for implementing a given design in a given FPGA. When this happens, the first solution would be to look for equivalent devices with higher speed grades (i.e., faster ones), which will unfortunately most likely be more expensive. In the worst cases, the FPGA device or family must be changed, which may have significant negative implications, such as the need for PCB redesign.
Therefore, it is highly advisable, in particular for time-critical designs or those where FPGA utilization is high, not to proceed to any subsequent system design step until placement and routing have been completed and extensive timing simulations have been carried out, so that the circuit is known both to behave correctly and to fit in the target device.
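The relationship between path delays, slack, and maximum operating frequency described above can be sketched as follows. This is a deliberately simplified model, assuming each register-to-register path delay is already known: slack is the required clock period minus the path delay, and the maximum frequency is the inverse of the critical (longest) path delay. Real timing analyzers also account for setup time, clock skew, and jitter, which are ignored here; the path names and delay values are made up for illustration.

```python
from typing import Dict, List, Tuple


def timing_summary(path_delays_ns: Dict[str, float],
                   required_period_ns: float) -> Tuple[float, float, List[str]]:
    """Return (fmax_mhz, worst_slack_ns, failing_paths).

    Simplified model: each path delay is taken as given; setup time,
    clock skew, and jitter are ignored.
    """
    critical_ns = max(path_delays_ns.values())
    fmax_mhz = 1000.0 / critical_ns          # f = 1 / T, with T in ns
    worst_slack_ns = required_period_ns - critical_ns
    # Paths longer than the required period, worst violation first.
    failing = sorted((p for p, d in path_delays_ns.items()
                      if d > required_period_ns),
                     key=lambda p: path_delays_ns[p], reverse=True)
    return fmax_mhz, worst_slack_ns, failing


if __name__ == "__main__":
    paths = {"reg_a->reg_b": 7.5, "reg_b->reg_c": 4.0, "reg_c->reg_d": 9.2}
    for period in (10.0, 8.0):   # 100 MHz and 125 MHz requirements
        fmax, slack, failing = timing_summary(paths, period)
        print(f"T={period} ns: fmax={fmax:.1f} MHz, slack={slack:+.1f} ns, "
              f"failing={failing}")
```

With a 10 ns requirement the 9.2 ns critical path leaves a positive slack of 0.8 ns; tightening the requirement to 8 ns turns it into a violation of 1.2 ns, and `reg_c->reg_d` becomes the path that must be pipelined or otherwise shortened.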

Figure 6.3e shows an illustrative example of a circuit after placement, where elements are placed in specific positions of the device, typically represented by their Cartesian coordinates. Figure 6.3f presents the final result after routing; that is, the circuit described in the design specification files has been synthesized, then mapped and translated into the corresponding FPGA technology, and finally placed and routed.

Bitstream Generation

Once the circuit is fully implemented, after the placement and routing steps have been successfully completed, the bitstream to be downloaded into the FPGA to configure it can be generated. This bitstream contains the data to be written to the FPGA configuration memory so that the elements inside the device are arranged to operate as specified.

Although many FPGA vendors keep confidential the way configuration memory elements map to the corresponding logic elements in the reconfigurable fabric, some information is usually disclosed about the relationship between placement (in the fabric) and addressing (in the memory). This allows block relocation in a reconfigurable platform (as described in Section 8.2) or the application of memory fault diagnosis and correction schemes to critical areas. Fault diagnosis is of paramount importance for FPGAs working in environments prone to causing memory bit flips (known in the specific jargon as “single-event upsets,” SEUs). For SRAM-based FPGAs configured at boot time from an external nonvolatile memory, bit flips in the configuration memory can be detected periodically during operation by comparing the contents of the nonvolatile memory with those of the internal configuration memory. This may also be combined with the error detection and correction mechanisms available in some FPGAs. Placement with area restrictions, together with the aforementioned knowledge of the bitstream structure, enables this verification to be concentrated on critical areas of the design.
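The periodic comparison described above can be sketched as follows, assuming the configuration memory is read back as a set of addressed frames and that the "golden" copy comes from the external nonvolatile memory. The frame addresses, frame sizes, and the list of critical frames are hypothetical; on a real device they depend on the vendor's (often partially disclosed) bitstream layout, and bits that legitimately change during operation, such as memory contents, would also have to be masked out, which this sketch ignores.

```python
from typing import Dict, Iterable, List


def find_upsets(golden: Dict[int, bytes], readback: Dict[int, bytes],
                critical_frames: Iterable[int]) -> List[int]:
    """Return critical frame addresses whose readback differs from the golden copy."""
    return [addr for addr in sorted(critical_frames)
            if readback[addr] != golden[addr]]


def flipped_bits(expected: bytes, actual: bytes) -> List[int]:
    """Bit offsets (LSB first within each byte) where two frames differ."""
    offsets = []
    for i, (x, y) in enumerate(zip(expected, actual)):
        diff = x ^ y
        for bit in range(8):
            if (diff >> bit) & 1:
                offsets.append(i * 8 + bit)
    return offsets


if __name__ == "__main__":
    golden = {0: b"\x00\xff", 1: b"\x12\x34", 2: b"\xaa\xaa"}
    readback = {0: b"\x00\xff", 1: b"\x12\x35", 2: b"\xab\xaa"}
    critical = [0, 1]   # frame 2 holds non-critical logic and is not checked
    for addr in find_upsets(golden, readback, critical):
        print(f"frame {addr}: bits {flipped_bits(golden[addr], readback[addr])} flipped")
```

Restricting the check to the frames that hold critical logic (here, frames 0 and 1) is what area-constrained placement makes possible: the upset in frame 2 is deliberately not examined, which shortens each verification pass.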

Bitstreams are clearly a potential source of IP vulnerability, because in principle any third party with access to the bitstream could reproduce the design. With the increasing complexity of FPGAs and the variety of programming and read-back options available, this problem requires special attention. To mitigate it, vendors provide bitstream encryption capabilities, the option of disabling configuration memory read-back after programming, and other sophisticated techniques. Remote configuration in networked systems is also a possibility to consider, since in this case there is no physical device (e.g., a flash memory) whose contents may be copied; instead, an encrypted bitstream is transmitted every time a device is to be configured. This has the advantage of increased system maintainability (also enabling upgrades), but also the risk of malfunction in case of network failure. As an alternative approach, some FPGAs have unique identifiers (one identifier per device), allowing designs to be used only in the device with the right identifier. That is, the same bitstream downloaded to an identical FPGA device on an identical PCB will not work. Regarding security, however, as with any other technology, there is no infallible solution in the FPGA domain either. If IP protection and design privacy are important issues, designers should contact FPGA vendors to assess the specific capabilities of their devices in this respect.

Bitstreams may be compressed in order to minimize the memory needed for bitstream storage, as well as to reduce reconfiguration time, which is mostly due to the process of receiving the bitstream through the configuration port rather than to the internal reconfiguration process itself (analyzed in Chapter 8). Simple encodings, such as run-length encoding, can be used for bitstream compression. In this type of encoding, consecutive bytes (or words) with the same value in the bitstream are compressed by specifying the value and the number of times it is repeated. As a clear example of the usefulness of this technique, one may think of the large savings (in terms of bitstream storage and configuration time) associated with unused areas of the FPGA. While tools encode the information at bitstream generation time, FPGAs are required to have the corresponding decoding resources, which, fortunately, are very simple and inexpensive in terms of silicon.
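A minimal byte-oriented run-length encoder and decoder, in the spirit described above, can be sketched as follows. It stores each run as a (length, value) pair with the run length capped at 255 so that it fits in one byte; this format is an assumption chosen for illustration, and vendors' actual bitstream compression formats differ and are not reproduced here.

```python
from typing import List, Tuple


def rle_encode(data: bytes, max_run: int = 255) -> List[Tuple[int, int]]:
    """Encode bytes as (run_length, value) pairs, with runs capped at max_run."""
    runs: List[Tuple[int, int]] = []
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte repeats the current value.
        while (i + run < len(data) and data[i + run] == value
               and run < max_run):
            run += 1
        runs.append((run, value))
        i += run
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> bytes:
    """Expand (run_length, value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * run for run, value in runs)


if __name__ == "__main__":
    # Mostly-zero data, as for configuration frames of unused fabric areas.
    data = b"\xaa\x99" + b"\x00" * 1000 + b"\x5a"
    runs = rle_encode(data)
    print(len(data), "bytes ->", 2 * len(runs), "bytes as", runs)
    assert rle_decode(runs) == data
```

Here 1003 bytes shrink to 7 pairs (14 bytes), because the long run of zeros is stored as four (length, 0) pairs. The decoder needs little more than a counter and a register, which is consistent with the low silicon cost mentioned above. Note that this naive format doubles the size of data with no repeated values, so practical schemes typically mix literal and run-length segments.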
