

FPGA Architecture


FPGA Architecture Overview

The primary function of the FPGA is to implement programmable logic which can be used by end customers to create new hardware devices. FPGAs are built around an array of programmable logic blocks embedded in a sea of programmable interconnect. This array is often referred to as the programmable logic fabric or just the fabric. At the edges are programmable I/O blocks designed to interface the fabric signals to the external world. It was this set of innovations that sparked the FPGA industry. Figure 1.2 shows a basic architecture of an FPGA.

Fig. 1.2 Basic FPGA architecture

Interestingly, nearly all the other special FPGA features such as carry chains, block RAM, or DSP blocks can also be implemented in programmable logic. This is in fact the approach the initial FPGAs took, and users did implement these functions in LUTs. However, as the FPGA markets developed, it became clear that these special functions would be more cost effective as dedicated functions built from hard gates, and later FPGA families such as the Xilinx 4K series and Virtex began to harden these special functions. This hardening improved not only cost but also frequency substantially.

Within any one FPGA family, all devices will share a common fabric architecture, but each device will contain a different amount of programmable logic. This enables the user to match their logic requirements to the right-sized FPGA device. FPGAs are also available in two or more package sizes which allow the user to match the application I/O requirements to the device package. FPGA devices are also available in multiple speed grades and multiple temperature grades as well as multiple voltage levels. The highest speed devices are typically 25% faster than the lower speed devices. By designing to the lowest speed devices, users can save on cost, but the higher performance of the faster devices may minimize system level cost.

Modern FPGAs commonly operate at 100–500 MHz. In general, most logic designs which are not targeted at FPGA architectures will run at the lower frequency range, and designs targeted at FPGAs will run in the mid-frequency range. The highest frequency designs are typically DSP designs constructed specifically to take advantage of FPGA DSP and BRAM blocks.

The sections below give a high-level overview of FPGA architectures. Please refer to Xilinx’s data sheets and user guides for more detailed and current information.

Programmable Interconnect

Woven through the FPGA logic fabric is a set of wires which can be wired together to connect any two blocks in an FPGA. This enables arbitrary logic networks to be constructed by the user. The architecture of the interconnect wires varies from generation to generation and is hidden from the user by the tools.

Programmable Logic Block

An array of programmable logic blocks is embedded into the programmable interconnect. These are called CLBs (configurable logic blocks) in Xilinx devices. Today, each logic block consists of one or more programmable logic functions implemented as a 4–6-input configurable lookup table (LUT), a configurable carry chain, and configurable registers. We use the word configurable to indicate a hard block which can be configured through the FPGA’s configuration memory to be used as part of the user’s logic. For instance, if the user design called for a register with a clock enable (CE), the register is configured to have the clock enable active and connected to the user’s CE signal. Figure 1.3a through c illustrates the UltraScale CLB architecture, showing the CLB, LUT-flip-flop pair, and the carry chain structures.

The combination of a LUT, carry chain, and register is called a logic cell or LC. The capacity of FPGAs is commonly measured in logic cells. For instance, the largest Xilinx Virtex UltraScale FPGA supports up to 4 million LCs, while the smallest Spartan device contains as few as 2000 logic cells. Depending on usage, each logic cell can map between 5 and 25 ASIC gates. The lower number is commonly used for ASIC netlist emulation, while the higher number is achievable under expert mapping.
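The gate-equivalence rule of thumb above is simple arithmetic, and a short Python sketch makes the resulting range concrete. The function name `asic_gate_range` and the two constants are illustrative names chosen here, and the output is only as good as the 5 to 25 gates-per-LC estimate quoted in the text.

```python
# Rule-of-thumb figures quoted in the text: one logic cell maps to 5-25 ASIC gates.
GATES_PER_LC_LOW = 5    # commonly used for ASIC netlist emulation
GATES_PER_LC_HIGH = 25  # achievable under expert mapping


def asic_gate_range(logic_cells):
    """Return the (low, high) ASIC gate estimate for a logic-cell count."""
    return logic_cells * GATES_PER_LC_LOW, logic_cells * GATES_PER_LC_HIGH


if __name__ == "__main__":
    # Device sizes taken from the text: 2000 LCs and 4 million LCs.
    for label, cells in [("smallest", 2_000), ("largest", 4_000_000)]:
        low, high = asic_gate_range(cells)
        print(f"{label}: {cells:,} LCs is roughly {low:,} to {high:,} ASIC gates")
```

By this estimate, a 4-million-LC device corresponds to somewhere between 20 million and 100 million ASIC gates, depending on how well the design maps.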

For Xilinx UltraScale devices, the CLB supports up to 8 × 6-input LUTs, 16 registers, and 8 carry chain blocks. Each 6-LUT can be configured as 2 × 5-LUTs if the 5-LUTs share common signals. For comparison purposes, Xilinx rates each 6-LUT as the equivalent of 1.6 LCs (logic cells).
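A LUT is easiest to understand as a small truth table that is addressed by its inputs. The following Python sketch models a 6-input LUT as a 64-bit value and shows how the same 64 bits can serve as two 5-input functions that share their inputs. The class and method names (`Lut6`, `from_two_5input_functions`, `evaluate_dual`) are invented for illustration, and the split into a lower and an upper 32-bit half is a simplifying assumption about how the fractured mode is organized, not a description of the device's exact wiring.

```python
def truth_table(func, num_inputs):
    """Pack func's output for every input combination into an integer bit mask."""
    table = 0
    for address in range(1 << num_inputs):
        bits = [(address >> i) & 1 for i in range(num_inputs)]
        if func(*bits):
            table |= 1 << address
    return table


class Lut6:
    """A 6-input LUT modeled as a 64-entry truth table held in one integer."""

    def __init__(self, init):
        if not 0 <= init < (1 << 64):
            raise ValueError("init must fit in 64 bits")
        self.init = init

    @classmethod
    def from_function(cls, func):
        """Build the table for any Boolean function of six inputs."""
        return cls(truth_table(func, 6))

    @classmethod
    def from_two_5input_functions(cls, func_a, func_b):
        """Assumed layout: func_a in the lower 32 bits, func_b in the upper 32."""
        return cls(truth_table(func_a, 5) | (truth_table(func_b, 5) << 32))

    def evaluate(self, *inputs):
        """Look up the output for six input bits (inputs[0] is address bit 0)."""
        if len(inputs) != 6:
            raise ValueError("expected 6 inputs")
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return (self.init >> address) & 1

    def evaluate_dual(self, *inputs):
        """Look up both 5-input functions using the five shared inputs."""
        if len(inputs) != 5:
            raise ValueError("expected 5 inputs")
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return (self.init >> address) & 1, (self.init >> (32 + address)) & 1


if __name__ == "__main__":
    parity = Lut6.from_function(lambda *bits: sum(bits) % 2)
    print(hex(parity.init), parity.evaluate(1, 0, 0, 0, 0, 0))
```

Any function of six inputs costs the same single LUT, which is why LUT count rather than gate count is the natural capacity measure for the fabric.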

Embedded in the CLB is a high-performance look-ahead carry chain which enables the FPGA to implement very high-performance adders. Current FPGAs have carry chains which can implement a 64-bit adder at 500 MHz. 
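The division of labor between the LUT and the carry chain can be sketched functionally in Python. In this model, which is an illustration rather than a description of the silicon, the LUT computes a per-bit propagate signal, a multiplexer either passes the incoming carry along or generates a new one, and an XOR produces the sum bit. The function name `carry_chain_add` is hypothetical, and the loop says nothing about timing: the dedicated chain exists precisely so that the carry does not travel through general interconnect.

```python
def carry_chain_add(a, b, width=64, carry_in=0):
    """Add two unsigned values bit by bit; returns (sum, carry_out)."""
    total = 0
    carry = carry_in & 1
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        propagate = a_bit ^ b_bit           # computed in the LUT
        total |= (propagate ^ carry) << i   # sum bit: propagate XOR incoming carry
        # Carry mux: pass the incoming carry through when propagating,
        # otherwise both operand bits are equal and a_bit is the new carry.
        carry = carry if propagate else a_bit
    return total, carry


if __name__ == "__main__":
    print(carry_chain_add(5, 7, width=8))
```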

Associated with each LUT is an embedded register. The rich register resources of the FPGA programmable logic enable highly pipelined designs, which are a key to maintaining higher speeds. Each register can be configured to support a clock enable and reset with configurable polarity.
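To show what "configurable" means for a register, here is a minimal behavioral sketch in Python. The constructor arguments stand in for configuration memory bits, and `clock` represents one rising clock edge. The class name, argument names, and the choice to give reset priority over clock enable are assumptions made for this illustration.

```python
class Register:
    """Behavioral model of a single configurable flip-flop."""

    def __init__(self, has_clock_enable=False, has_reset=False,
                 reset_active_high=True, reset_value=0):
        # These settings play the role of configuration memory bits.
        self.has_clock_enable = has_clock_enable
        self.has_reset = has_reset
        self.reset_active_high = reset_active_high
        self.reset_value = reset_value
        self.q = reset_value

    def clock(self, d, ce=1, reset=None):
        """Apply one rising clock edge and return the new output.

        reset=None means the reset pin is not asserted, whatever its polarity.
        """
        if self.has_reset and reset is not None:
            asserted = bool(reset) == self.reset_active_high
            if asserted:
                self.q = self.reset_value  # assumed: reset overrides clock enable
                return self.q
        if not self.has_clock_enable or ce:
            self.q = d
        return self.q


if __name__ == "__main__":
    reg = Register(has_clock_enable=True)
    print(reg.clock(1, ce=1), reg.clock(0, ce=0), reg.clock(0, ce=1))
```

When `has_clock_enable` is left off, the `ce` argument is ignored, which mirrors a register whose clock enable is not used by the design.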

An important additional feature of the Xilinx CLB’s 6-LUT is that it can be configured to implement a small memory, 64 bits deep by 1 bit wide, called a distributed RAM. An alternate configuration allows the 6-LUT to implement a configurable-depth shift register with a delay of 1–32 clocks.
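Both alternate LUT modes can be modeled in a few lines of Python. The sketch below is behavioral only: `DistributedRam64x1` treats the 64 LUT bits as writable storage, and `ShiftRegisterDelay` taps a 32-stage shift register at a chosen depth. Both class names are hypothetical, and details such as read timing and write enables are deliberately left out.

```python
from collections import deque


class DistributedRam64x1:
    """64-deep by 1-bit-wide memory built from one LUT's storage bits."""

    def __init__(self):
        self.cells = [0] * 64

    def write(self, address, bit):
        self.cells[address & 0x3F] = bit & 1  # 6 address bits select one cell

    def read(self, address):
        return self.cells[address & 0x3F]


class ShiftRegisterDelay:
    """Shift register tapped to delay a bit stream by 1 to 32 clocks."""

    def __init__(self, delay):
        if not 1 <= delay <= 32:
            raise ValueError("delay must be between 1 and 32 clocks")
        self.delay = delay
        self.stages = deque([0] * 32, maxlen=32)

    def clock(self, bit):
        """Shift in one bit on a clock edge and return the tapped output."""
        self.stages.appendleft(bit & 1)  # the oldest bit falls off the far end
        return self.stages[self.delay - 1]


if __name__ == "__main__":
    srl = ShiftRegisterDelay(3)
    print([srl.clock(b) for b in [1, 0, 0, 0, 0]])
```

A bit shifted in on one clock edge appears at the output after the edge `delay - 1` clocks later, that is, after passing through `delay` register stages.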

Memory

Access to memory is extremely important in modern logic designs. Programmable logic designs commonly use a combination of memories embedded in the FPGA logic fabric and external DDR memories. Within the logic fabric, memory can be implemented as discrete registers, shift registers, distributed RAM, or block RAM. Xilinx UltraScale devices support two sizes of block RAM, 36-kbit RAMs and 288-kbit RAMs. In most cases the Xilinx tools will select the best memory type to map each memory in the user design. In some cases, netlists optimized for FPGAs will hand instantiate memory types to achieve higher density and performance.

Special forms of memory called dual-port memories and FIFOs are supported as special modes of the block RAMs or can be implemented using distributed RAM.

System memory access to external DDR memory (Chap. 5) is via a bus interface, which is commonly an AXI protocol internal to the FPGA. UltraScale FPGAs support 72-bit wide DDR4 at up to 3200 MB/s.

Fig. 1.3 (a) UltraScale CLB, (b) one of the eight LUT-flip-flop pairs from an UltraScale CLB, (c) carry chain paths

Fig. 1.4 DSP flow graph

In general, registers or flip-flops are used for status and control registers, pipelining, and shallow (1–2 deep) FIFOs. Shift registers are commonly used for signal delay elements and for pipeline balancing in DSP designs. Distributed RAMs are used for shallow memories up to 64 entries deep and can be as wide as necessary. Block RAMs are used for buffers and deeper memories. They can also be aggregated together to support arbitrary widths and depths. For instance, a memory 64 bits wide by 32K deep would require 64 block RAMs. Generally, FPGAs contain around one 36-kbit block RAM for every 500–1000 logic cells.
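The block RAM count for a given memory can be estimated by tiling the requested width and depth with one of the shapes a single block RAM can take. The Python sketch below does this for a 36-kbit block RAM. The list of depth and width combinations in `BRAM36_SHAPES` is an assumption for illustration (the wider shapes include parity bits), and the function name `block_rams_needed` is hypothetical; the device's memory user guide is the authority on the supported aspect ratios.

```python
from math import ceil

# Assumed (depth, width) shapes for one 36-kbit block RAM.
BRAM36_SHAPES = [
    (32768, 1), (16384, 2), (8192, 4), (4096, 9),
    (2048, 18), (1024, 36), (512, 72),
]


def block_rams_needed(width_bits, depth_words):
    """Smallest block RAM count that tiles the requested memory."""
    return min(
        ceil(width_bits / width) * ceil(depth_words / depth)
        for depth, width in BRAM36_SHAPES
    )


if __name__ == "__main__":
    # The example from the text: 64 bits wide by 32K deep.
    print(block_rams_needed(64, 32 * 1024))
```

For the 64-bit by 32K example, every shape in the list tiles to the same total of 64 block RAMs, which matches the figure given in the text.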

DSP Blocks

Modern FPGAs contain discrete multipliers to enable efficient DSP processing. Commonly, DSP applications build pipelines or flow graphs of DSP operations, and data streams through this flow graph. A typical DSP filter called an FIR (finite impulse response) filter is shown in Fig. 1.4. It consists of sample delay blocks, multipliers, adders, and memories for coefficients. Interestingly, this graph can be almost directly implemented as an FPGA circuit.
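The structure of an FIR filter is compact enough to write out directly. The following Python sketch is a software model of the flow graph, not the FPGA implementation: the list `delay_line` plays the role of the sample delay blocks, and each pass of the inner loop corresponds to one multiplier and one adder. The function name `fir_filter` and the use of plain integers are choices made for this illustration.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR: each output is the sum of coefficient * delayed sample."""
    delay_line = [0] * len(coefficients)
    outputs = []
    for sample in samples:
        # Sample delay blocks: the newest sample enters, the oldest drops out.
        delay_line = [sample] + delay_line[:-1]
        accumulator = 0
        for tap, coefficient in zip(delay_line, coefficients):
            accumulator += tap * coefficient  # one multiplier and one adder per tap
        outputs.append(accumulator)
    return outputs


if __name__ == "__main__":
    # Feeding a single 1 followed by zeros returns the coefficients themselves.
    print(fir_filter([1, 0, 0, 0], [1, 2, 3]))
```

In software the taps are evaluated one after another; in an FPGA each tap can have its own multiplier and adder, so all taps are computed in the same clock cycle.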

For filtering and many other DSP applications, multipliers and adders are used to implement the flow graph. Xilinx FPGAs contain a DSP block known as a DSP48 which supports an 18-bit × 25-bit multiplier, a 48-bit accumulator, and a 25-bit pre-adder. In addition, up to four levels of pipelining can be supported for operation up to 500 MHz. The DSP48 supports integer math directly; however, 32-bit and 64-bit floating point operations are supported as library elements. A 32-bit floating point multiplier will require two DSP48s and several hundred LCs.

Xilinx tools will generally map multipliers and associated adders described in RTL to DSP48 blocks. However, designs optimized for DSP in FPGAs may use DSP48-aware libraries for optimal performance, power, and density.

Clock Management

Logic netlists almost universally require one or more system clocks to implement synchronous netlists for I/O and for internal operation. Synchronous operation uses a clock edge to register the results of upstream logic and hold it steady for use by downstream logic until the next clock edge. The use of synchronous operation allows for pipelined flow graphs which process multiple samples in parallel. External digital communications interfaces use I/O clocks to transfer data to and from the FPGA. Commonly, interface logic will run at the I/O clock rate (or a multiple of the I/O clock rate). Chapter 12 covers more on clocking resources available on Xilinx FPGAs.

I/O Blocks

One of the key capabilities of FPGAs is that they interface directly to external input and output (I/O) signals of all types and formats. To support these diverse requirements, modern FPGAs contain a special block called the I/O block or IOB. This block contains powerful buffers to drive external signals out of the FPGA and input receivers, along with registers for I/O signals and output enables (OE). IOBs typically support 1.2–3.3 V CMOS as well as LVDS and multiple industry I/O memory standards such as SSTL3. For a complete list, refer to the device datasheet. I/Os are abstracted from the user RTL and HDL design and are typically configured using a text file to specify each I/O’s signaling standard.

UltraScale devices also include multiplexing and demultiplexing features in the I/O block. This feature supports double data rate (DDR) operation and 4:1 or 8:1 multiplexing and demultiplexing. This allows the fabric to operate at a lower clock rate than the I/O clock. For example, Gigabit Ethernet (SGMII) operates at 1.25 Gb/s over a single LVDS link, which is too fast for the FPGA fabric to support directly. The serial signal is expanded to 8/10 bits in the IOB interface to the fabric, allowing the fabric to operate at 125 MHz. I/Os are commonly a limited resource, and FPGAs are available in multiple package sizes to allow the user to use smaller lower-cost FPGAs with lower signal count applications and larger package sizes for higher signal count applications. This helps to minimize system cost and board space.
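The relationship between the serial line rate and the fabric clock is a simple division, and the demultiplexing itself amounts to grouping serial bits into parallel words. The Python sketch below illustrates both. The function names `fabric_clock_mhz` and `deserialize` are hypothetical, placing the first received bit in the least significant position is an arbitrary assumption, and the word alignment that a real receiver must perform is not modeled.

```python
def fabric_clock_mhz(line_rate_mbps, parallel_width):
    """Clock rate the fabric needs after 1:N demultiplexing."""
    return line_rate_mbps / parallel_width


def deserialize(bits, width):
    """Group a serial bit stream into parallel words of the given width.

    Assumption: the first bit received lands in bit 0 of each word.
    Trailing bits that do not fill a whole word are dropped.
    """
    words = []
    for start in range(0, len(bits) - width + 1, width):
        word = 0
        for position, bit in enumerate(bits[start:start + width]):
            word |= (bit & 1) << position
        words.append(word)
    return words


if __name__ == "__main__":
    # SGMII example from the text: 1250 Mb/s expanded to 10-bit words.
    print(fabric_clock_mhz(1250, 10))
```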

A primary application of FPGA I/Os is for interfacing to memory systems. UltraScale devices support high-bandwidth memory systems such as DDR4. 

High-Speed Serial I/Os (HSSIO)

CMOS and LVDS signaling are limited in performance and can be costly in terms of power and signal count. For this reason, high-speed serial I/Os have been developed to enable low-cost, high-bandwidth interfaces. This evolution can be seen in the evolving PCI standard, which has moved from low-speed 32-bit CMOS interfaces at 33 MHz to PCIe Gen3 with 1–8 lanes at 8 Gb/s per lane. An eight-lane PCIe Gen3 interface can transfer 64 Gb/s of data in each direction. Xilinx UltraScale devices support up to 128 MGTs (Multi-Gigabit Transceivers) at up to 32.75 Gb/s.
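The bandwidth figures above follow from multiplying lane count by lane rate. The Python sketch below reproduces the 64 Gb/s figure and, as an addition not stated in the text, also accounts for the 128b/130b line encoding that PCIe Gen3 uses, which slightly lowers the rate available for data. The function names are hypothetical, and protocol overhead above the line encoding (packet headers, flow control) is ignored.

```python
def raw_bandwidth_gbps(lanes, rate_gbps_per_lane):
    """Aggregate line rate in one direction."""
    return lanes * rate_gbps_per_lane


def encoded_payload_gbps(lanes, rate_gbps_per_lane, payload_bits=128, encoded_bits=130):
    """Line rate scaled by the encoding efficiency (128b/130b for PCIe Gen3)."""
    return raw_bandwidth_gbps(lanes, rate_gbps_per_lane) * payload_bits / encoded_bits


if __name__ == "__main__":
    print(raw_bandwidth_gbps(8, 8.0))       # eight-lane PCIe Gen3, raw line rate
    print(encoded_payload_gbps(8, 8.0))     # after 128b/130b encoding
    print(raw_bandwidth_gbps(128, 32.75))   # 128 transceivers at the maximum rate
```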

Within the FPGA, the HSSIO are interfaced directly to a custom logic block which multiplexes and demultiplexes the signals to wide interfaces at lower clock rates. This block also performs link calibration and formatting.

