Comparison of the latest released FPGAs from Xilinx, Intel, and Lattice

Date: Jul 27, 2020

The FPGA market has seen a burst of activity over the past month. In this article, we briefly look at three newly released FPGAs from Xilinx, Intel, and Lattice.

Each of these FPGAs improves performance in a different way. The Xilinx VU57P attempts to bypass the memory bandwidth bottleneck in demanding applications. The Intel Stratix 10 NX integrates AI-optimized DSP blocks that help implement large-scale AI models with low latency. The Lattice Nexus platform attempts to redefine low-power, small-size FPGAs.

Xilinx VU57P FPGA: high-bandwidth memory

In the past decade, the computing bandwidth required by many applications has increased exponentially. For example, the number of DSP slices provided by Xilinx FPGAs for machine learning applications has grown from approximately 2,000 slices in the largest Virtex-6 FPGA to approximately 12,000 slices in modern Virtex UltraScale+ devices. As shown below, similar trends have been observed in other application areas, such as networking and video.

[Figure: Requirements for memory bandwidth]

The figure above shows that the memory bandwidth of DDR technology has increased only modestly over the same ten years: from DDR3 to DDR4 it rose by approximately a factor of two. (It is worth noting that the leap from DDR4 to DDR5 may be more significant.)

The bandwidth gap in the figure means that the limited data transfer rate between the FPGA and memory is the bottleneck in these applications. To work around it, designers usually place multiple DDR chips in parallel to increase memory bandwidth (not necessarily memory capacity). However, because of high power consumption, form factor and cost constraints, and PCB design challenges, this approach becomes impractical once the required memory bandwidth exceeds about 85GB/s.
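As a rough illustration of the parallel-DDR approach, the sketch below estimates how many DDR4 channels would be needed to reach a target bandwidth. It assumes a 64-bit DDR4-2666 channel and uses theoretical peak rates only, so real designs would need more headroom. The function names are illustrative and do not come from any vendor tool.

```python
import math

def ddr_peak_gb_per_s(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth of one DDR channel in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

def channels_needed(target_gb_per_s, per_channel_gb_per_s):
    """Smallest number of parallel channels whose combined peak meets the target."""
    return math.ceil(target_gb_per_s / per_channel_gb_per_s)

# Assumption: DDR4-2666 on a 64-bit channel, about 21.3 GB/s peak.
ddr4_2666 = ddr_peak_gb_per_s(2666)
print(f"One DDR4-2666 channel: {ddr4_2666:.1f} GB/s")
print(f"Channels for 85 GB/s: {channels_needed(85, ddr4_2666)}")
```

Under these assumptions, the 85GB/s threshold already corresponds to four full 64-bit channels, each with its own set of data, address, and control traces on the PCB.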

A more effective solution to the memory bandwidth problem is a type of DRAM-based memory called high-bandwidth memory (HBM). With HBM, silicon stacking technology is used to place both the DRAM and the FPGA in the same package, as shown in the figure below.

[Figure: Silicon stacking places DRAM memory and the FPGA in the same package]

HBM technology allows us to eliminate the relatively long PCB traces that connect the DDR chip to the FPGA. Using an integrated HBM interface with a large number of pins can significantly increase memory bandwidth, and its latency is similar to DDR-based technology.

Xilinx recently released the VU57P FPGA (from the Virtex UltraScale+ series), which integrates 16GB of HBM and provides up to 460GB/s of memory bandwidth. The device uses an integrated AXI port switch that allows any HBM memory location to be accessed from any memory port.

In addition to the power-efficient compute and large memory bandwidth discussed above, the VU57P provides high-speed interfaces such as 100G Ethernet with RS-FEC, 150G Interlaken, and PCIe Gen4. Its 58G PAM4 transceivers support the latest optical standards, which is useful in applications such as next-generation firewalls and QoS-enabled switches and routers.
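The 460GB/s figure can be related to the wide HBM interface with a short calculation. The sketch below assumes two HBM2 stacks, each with a 1024-bit data interface running at 1.8 Gb/s per pin. These parameters are assumptions used for illustration rather than values stated in this article, and the function name is hypothetical.

```python
def hbm_peak_gb_per_s(stacks, bits_per_stack, gbit_per_s_per_pin):
    """Theoretical peak HBM bandwidth in GB/s: total data pins times pin rate, in bytes."""
    total_pins = stacks * bits_per_stack
    return total_pins * gbit_per_s_per_pin / 8

# Assumed configuration: 2 stacks x 1024 data bits x 1.8 Gb/s per pin.
peak = hbm_peak_gb_per_s(stacks=2, bits_per_stack=1024, gbit_per_s_per_pin=1.8)
print(f"Peak HBM bandwidth: {peak:.1f} GB/s")
```

The bandwidth comes from the very wide interface (2048 data pins under these assumptions) rather than from an unusually high per-pin rate, and that width is only practical because the memory sits inside the package.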

Intel Stratix 10 NX FPGA: AI-optimized DSP blocks

Many conventional applications of digital signal processing (DSP) require high-precision arithmetic. This is why FPGAs usually have DSP modules with high-precision multipliers and adders. For example, XC7A50T (Xilinx) and 5CGXC4 (Intel) have 120 and 140 18×18 multipliers, respectively.

It turns out that many deep learning applications can be implemented with fewer bits without significantly sacrificing accuracy. A lower precision approximation reduces the amount of computing resources and the required memory bandwidth.

Another advantage of reducing the bit width is lower power consumption, because lower-precision calculations are cheaper and fewer bits are transferred in each memory transaction. In fact, according to UC Davis researchers, INT8 or even lower-precision calculations yield acceptable results in many deep learning applications.
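To make the idea of reduced precision concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It is a generic textbook scheme, not the method used by any particular FPGA toolchain, and the weight values are invented for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale factor."""
    return [q * scale for q in quantized]

# Hypothetical weights, chosen only to demonstrate the round trip.
weights = [0.82, -0.41, 0.05, -1.27, 0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("INT8 values:", q)
print("Largest round-trip error:", max_error)
```

Each weight now occupies one byte instead of the four bytes of a 32-bit float, so the same memory transaction carries four times as many values, at the cost of a rounding error of at most half the scale factor.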

The Intel Stratix 10 NX FPGA is the first AI-optimized FPGA from Intel. These devices integrate arithmetic blocks called AI Tensor Blocks, which contain a dense array of low-precision multipliers. The base precisions of these blocks are INT8 and INT4, and they also support the FP16 and FP12 numerical formats through shared-exponent support hardware.

Compared with the DSP block of the standard Intel Stratix 10 FPGA, the AI Tensor Block used in the Stratix 10 NX FPGA can increase INT8 throughput by up to 15 times. The high-level block diagram of the AI Tensor Block is shown below.

[Figure: Block diagram of AI Tensor Block]
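The shared-exponent idea can be sketched as block floating point: a group of values stores small integer mantissas plus a single common exponent. The code below is a generic illustration under that assumption and does not reproduce Intel's actual number formats or hardware behavior; all names and values are hypothetical.

```python
import math

def block_quantize(values, mantissa_bits=8):
    """Encode a block of floats as signed integer mantissas and one shared exponent."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0] * len(values), 0
    limit = 2 ** (mantissa_bits - 1) - 1  # 127 for 8-bit mantissas
    # Smallest exponent for which the largest value still fits in the mantissa range.
    exp = math.ceil(math.log2(max_abs / limit))
    mantissas = [max(-limit, min(limit, round(v / 2.0 ** exp))) for v in values]
    return mantissas, exp

def block_dot(mant_a, exp_a, mant_b, exp_b):
    """Dot product using integer multiply-accumulate, scaled once at the end."""
    acc = sum(a * b for a, b in zip(mant_a, mant_b))
    return acc * 2.0 ** (exp_a + exp_b)

# Hypothetical blocks of weights and activations.
w_mant, w_exp = block_quantize([1.0, -0.5, 0.25, 0.125])
x_mant, x_exp = block_quantize([0.5, 0.5, 1.0, 2.0])
print(w_mant, w_exp)
print(x_mant, x_exp)
print(block_dot(w_mant, w_exp, x_mant, x_exp))
```

Because the exponent is applied only once per block, the inner loop consists entirely of integer multiplications and additions, which is the kind of work a dense array of low-precision multipliers handles well.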

The most notable feature of the Intel Stratix 10 NX FPGA is the high compute density provided by its AI-optimized blocks. However, the new device also integrates two other features that help designers implement large-scale AI models with low latency: abundant near-compute memory (integrated HBM) and high-bandwidth networking (up to 57.8G PAM4 transceivers).

Lattice Nexus: low-power, small-size FPGA

Lattice Semiconductor recently released its Certus-NX FPGA series, which uses a 28nm fully depleted silicon-on-insulator (FD-SOI) process technology from Samsung. FD-SOI is somewhat similar to the traditional CMOS process. However, as shown in the figure below, it allows a programmable bias to be applied to most transistors.

[Figure: Circuit architecture of the Lattice Nexus platform]

The programmable bias voltage greatly reduces chip area and power consumption. Compared with other FPGAs with a similar number of logic cells, Certus-NX consumes up to four times less power.

As a result of the FD-SOI technology, the new device can be as small as 6mm x 6mm. Compared with similar FPGAs, it offers twice as many I/Os per mm². The table below compares the Certus-NX-40 with similar products from Intel and Xilinx.

[Table image: Comparison of three popular FPGAs for PCIe design]

The new device supports AES for bulk encryption and elliptic curve cryptography (ECDSA) for authentication, so it can provide stronger security for networked devices. It is also highly resistant to soft errors, which makes it suitable for aerospace applications.

FPGA development trend

By studying these newly released FPGAs from Xilinx, Intel, and Lattice Semiconductor, we gain a clearer picture of where FPGA development is heading: higher memory bandwidth, AI optimization, low power consumption, and small size.

