Date: Jul 27, 2020
The FPGA market has been busy over the past month. This article briefly looks at three newly released FPGAs from Xilinx, Intel and Lattice.
Each of these FPGAs targets a different aspect of performance. The Xilinx VU57P aims to overcome the memory bandwidth bottleneck in demanding applications. The Intel Stratix 10 NX integrates AI-optimized DSP blocks that help implement large AI models with low latency. The Lattice Nexus platform aims to redefine low-power, small-size FPGAs.
Xilinx VU57P FPGA: high-bandwidth memory
Over the past decade, the compute capability demanded by many applications has grown exponentially. For example, the number of DSP slices that Xilinx FPGAs provide for machine learning applications has increased from approximately 2,000 in the largest Virtex-6 FPGA to approximately 12,000 in modern Virtex UltraScale+ devices. As shown below, similar trends have been observed in other application areas, such as networking and video.
The figure above shows that over the same ten years, the memory bandwidth of DDR technology has grown only modestly: from DDR3 to DDR4 it roughly doubled. (It is worth noting that the leap from DDR4 to DDR5 may be more significant.)
The bandwidth gap in the figure means that the limited data transfer rate between the FPGA and memory is the bottleneck in these applications. To work around it, designers usually place multiple DDR chips in parallel to increase memory bandwidth (not necessarily memory capacity). However, because of high power consumption, form factor and cost constraints, and PCB design challenges, this approach becomes impractical once the required memory bandwidth exceeds about 85GB/s.
A more effective solution to the memory bandwidth problem is a type of DRAM-based memory called high-bandwidth memory (HBM). Here, silicon stacking technology is used to place the DRAM and the FPGA in the same package, as shown in the figure below.
HBM eliminates the relatively long PCB traces that connect DDR chips to the FPGA. An integrated HBM interface with a large number of pins significantly increases memory bandwidth, while its latency remains similar to that of DDR-based technology.
Xilinx recently released the VU57P FPGA (part of the Virtex UltraScale+ series), which integrates 16GB of HBM and offers up to 460GB/s of memory bandwidth. The device uses an integrated AXI port switch, so any HBM memory location can be accessed from any memory port.
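A rough calculation illustrates how far 460GB/s is from what parallel DDR can deliver. The sketch below assumes DDR4-2666 on a 64-bit channel (an illustrative choice, not a figure from this article) and uses theoretical peak rates; sustained bandwidth is lower in practice.

```python
import math


def ddr_peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth of one DDR channel in GB/s.

    transfer_rate_mts: transfers per second in millions (e.g. 2666 for DDR4-2666).
    bus_width_bits: data bus width of the channel (64 is assumed here).
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


def channels_needed(target_gbs: float, per_channel_gbs: float) -> int:
    """Number of parallel channels needed to reach a target peak bandwidth."""
    return math.ceil(target_gbs / per_channel_gbs)


# Assumed memory configuration: DDR4-2666, 64-bit channel.
ddr4_channel = ddr_peak_bandwidth_gbs(2666)

# 85GB/s is the practical limit quoted for parallel DDR;
# 460GB/s is the HBM bandwidth quoted for the VU57P.
channels_for_ddr_limit = channels_needed(85, ddr4_channel)
channels_for_hbm_rate = channels_needed(460, ddr4_channel)

print(f"Peak per channel: {ddr4_channel:.1f} GB/s")
print(f"Channels for 85 GB/s: {channels_for_ddr_limit}")
print(f"Channels for 460 GB/s: {channels_for_hbm_rate}")
```

Under these assumptions, about 4 channels reach the 85GB/s limit, while matching 460GB/s would take 22, which is where the power, cost and PCB routing problems described above come from.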
In addition to the power-efficient compute and large memory bandwidth discussed above, the VU57P provides high-speed interfaces such as 100G Ethernet with RS-FEC, 150G Interlaken and PCIe Gen4. The device's 58G PAM4 transceivers support the latest optical standards. This is useful in applications such as next-generation firewalls, and switches and routers with QoS.
Intel Stratix 10 NX FPGA: AI-optimized DSP blocks
Many conventional applications of digital signal processing (DSP) require high-precision arithmetic. This is why FPGAs usually have DSP modules with high-precision multipliers and adders. For example, XC7A50T (Xilinx) and 5CGXC4 (Intel) have 120 and 140 18×18 multipliers, respectively.
It turns out that many deep learning applications can be implemented with fewer bits without significantly sacrificing accuracy. A lower precision approximation reduces the amount of computing resources and the required memory bandwidth.
Reducing the bit width also lowers power consumption, because lower-precision calculations are cheaper and fewer bits are transferred in each memory transaction. In fact, according to UC Davis researchers, INT8 or even lower-precision calculations can yield acceptable results in many deep learning applications.
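The idea of trading precision for resources can be sketched with symmetric INT8 quantization. This is a generic illustration in plain Python, not a method described by any of the vendors; the weight values and the function names `quantize_int8` and `dequantize` are made up for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Returns the integer codes and the scale needed to recover approximate floats.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage per value: 4 bytes as FP32 versus 1 byte as INT8.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)

print("INT8 codes:", codes)
print("Max absolute error:", max_error)
print("Bytes as FP32 / INT8:", fp32_bytes, "/", int8_bytes)
```

Each value needs a quarter of the storage and memory traffic of FP32, at the cost of a bounded rounding error. Whether that error is acceptable depends on the model.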
The Intel Stratix 10 NX FPGA is Intel's first AI-optimized FPGA. These devices integrate arithmetic blocks called AI Tensor Blocks, which contain a dense array of low-precision multipliers. The base precisions of these blocks are INT8 and INT4, and they also support FP16 and FP12 numerical formats through shared-exponent support hardware.
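A shared exponent means that a group of values stores one common exponent and a small integer mantissa per value, an approach often called block floating point. The sketch below is a generic illustration of that idea with 8-bit mantissas; it does not reproduce the actual FP16 or FP12 formats of the AI Tensor Block, and the names `to_block_fp` and `from_block_fp` are hypothetical.

```python
import math


def to_block_fp(values, mantissa_bits=8):
    """Encode a block of floats as signed integer mantissas plus one shared exponent.

    Each value is approximated as mantissa * 2**shared_exp.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0] * len(values), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    limit = 2 ** (mantissa_bits - 1) - 1          # 127 for 8-bit mantissas
    shared_exp = e - (mantissa_bits - 1)          # scale so the largest value fills the range
    mantissas = [
        max(-limit, min(limit, round(math.ldexp(v, -shared_exp))))
        for v in values
    ]
    return mantissas, shared_exp


def from_block_fp(mantissas, shared_exp):
    """Decode integer mantissas and a shared exponent back to floats."""
    return [math.ldexp(float(m), shared_exp) for m in mantissas]


# Hypothetical block of values with very different magnitudes.
block = [0.75, -0.031, 0.2, 0.0049]

mantissas, shared_exp = to_block_fp(block)
decoded = from_block_fp(mantissas, shared_exp)

print("Mantissas:", mantissas)
print("Shared exponent:", shared_exp)
print("Decoded:", decoded)
```

Because the exponent is set by the largest value in the block, the multipliers only ever see small integers, but small values in the same block lose relative precision. That trade-off is inherent to the technique.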
Compared with the DSP block of the standard Intel Stratix 10 FPGA, the AI Tensor Block used in the Stratix 10 NX FPGA can increase INT8 throughput by up to 15 times. The high-level block diagram of the AI Tensor Block is shown below.
The most notable feature of the Intel Stratix 10 NX FPGA is the high compute density provided by its AI-optimized blocks. However, the new device also integrates two other features that help designers implement large-scale AI models with low latency: abundant near-compute memory (integrated HBM) and high-bandwidth networking (PAM4 transceivers running at up to 57.8G).
Lattice Nexus: low-power, small-size FPGA
Lattice Semiconductor recently released its Certus-NX FPGA series, which is built on a 28nm fully depleted silicon-on-insulator (FD-SOI) process from Samsung. FD-SOI is broadly similar to the traditional bulk CMOS process. However, as shown in the figure below, it allows a programmable body bias to be applied to most transistors.
The programmable body bias greatly reduces chip area and power consumption. Compared with other FPGAs with a similar number of logic cells, Certus-NX consumes up to four times less power.
Thanks to FD-SOI technology, the new device can be as small as 6mm x 6mm, and it offers twice the I/O per mm² of similar FPGAs. The table below compares the Certus-NX-40 with similar products from Intel and Xilinx.
The new device also supports AES for bulk encryption and elliptic curve cryptography (ECDSA) for authentication, so it can provide stronger security for networked devices. In addition, it is highly resistant to soft errors, which makes it suitable for aerospace applications.
FPGA development trends
Taken together, these newly released FPGAs from Xilinx, Intel and Lattice Semiconductor give a clearer picture of where FPGA development is heading: higher memory bandwidth, AI optimization, low power consumption and small size.