Date: Oct 24, 2020
FPGA is a further development of earlier programmable devices such as PAL, GAL, and CPLD. It emerged as a semi-custom circuit in the ASIC field, addressing both the inflexibility of fully custom circuits and the limited gate count of the earlier programmable devices.
Why is an FPGA faster than a CPU or GPU?
Both the CPU and the GPU are von Neumann architectures: instructions are decoded and executed, and memory is shared. The reason an FPGA can be faster than a CPU or GPU is essentially architectural: it has no instructions and no shared memory.
In the von Neumann architecture, because the execution unit may execute arbitrary instructions, it needs an instruction memory, a decoder, arithmetic units for the various instructions, and branch and jump handling logic. In an FPGA, the function of each logic element is fixed when the device is programmed, so no instructions are needed. Memory in the von Neumann architecture serves two purposes: saving state and communication between execution units.
1) Saving state: the registers and on-chip memory (BRAM) in an FPGA belong to their own control logic, with no unnecessary arbitration or buffering.
2) Communication: the connections between each FPGA logic element and its neighbors are fixed when the device is programmed, so there is no need to communicate through shared memory.
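The contrast between instruction-driven execution and configuration-fixed logic can be sketched in software terms. The following is a minimal illustrative sketch (a hypothetical toy instruction set, not real hardware behavior): an interpreter must fetch and decode every instruction before doing any work, while a "hard-wired" function, like configured FPGA logic, has its behavior fixed in advance.

```python
# Toy illustration: von Neumann fetch-decode-execute vs. fixed-function logic.
# The opcode names and program format below are invented for this sketch.

def run_program(program, x):
    """Interpret a tiny instruction stream: each step must fetch and
    decode an opcode before any computation happens."""
    for op, operand in program:          # fetch
        if op == "ADD":                  # decode + dispatch
            x += operand
        elif op == "MUL":
            x *= operand
        else:
            raise ValueError(f"unknown opcode {op}")
    return x

def hardwired(x):
    """The same computation with the 'circuit' fixed ahead of time:
    no instruction fetch, no decoding, no dispatch overhead."""
    return (x + 3) * 2

program = [("ADD", 3), ("MUL", 2)]
print(run_program(program, 5))  # 16
print(hardwired(5))             # 16
```

Both produce the same result, but the interpreter pays a per-step overhead for generality; the fixed function, like configured logic, does not.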
In computationally intensive tasks:
In the data center, the core advantage of FPGA over GPU is latency. Why does FPGA have much lower latency than GPU? Essentially, it is a difference in architecture. FPGA has both pipeline parallelism and data parallelism, while GPU is almost only data parallel (the pipeline depth is limited).
Suppose processing a data packet takes 10 steps. An FPGA can build a 10-stage pipeline in which different stages work on different packets simultaneously; each packet is finished after passing through all 10 stages and can be output as soon as it is done. The GPU's data-parallel approach is to build 10 computing units, each also working on a different packet, but all units must march in lockstep, doing the same thing (SIMD); this requires 10 packets to enter and leave together. When tasks arrive one by one rather than in batches, pipeline parallelism achieves lower latency than data parallelism. Therefore, for pipelined computing tasks, FPGAs have an inherent latency advantage over GPUs.
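The latency difference above can be made concrete with a toy timing model (an illustrative sketch with assumed parameters: packets arrive one per time unit, each of the 10 steps takes one time unit, and the SIMD batch size is 10):

```python
# Toy latency model: pipeline parallelism vs. SIMD batch processing.
# Assumptions: packet i arrives at time t = i; each processing step
# takes 1 time unit; there are 10 steps; the SIMD batch size is 10.

STAGES = 10

def pipeline_latencies(n, stages=STAGES):
    """FPGA-style pipeline: a packet enters as soon as it arrives and
    exits 'stages' units later; packets overlap inside the pipeline."""
    # packet i arrives at t=i and leaves at t=i+stages, so latency is constant
    return [(i + stages) - i for i in range(n)]

def simd_batch_latencies(n, stages=STAGES, batch=10):
    """GPU-style SIMD: wait until a full batch of packets has arrived,
    then process the whole batch together for 'stages' units."""
    latencies = []
    for i in range(n):
        last_arrival = ((i // batch) + 1) * batch - 1  # last packet of this batch
        finish = last_arrival + stages                 # whole batch completes
        latencies.append(finish - i)                   # wait + processing time
    return latencies

print(max(pipeline_latencies(10)))    # worst-case pipeline latency: 10
print(max(simd_batch_latencies(10)))  # worst-case batch latency: 19
```

In this model the pipeline's latency is a constant 10 units for every packet, while the first packet of a SIMD batch waits 9 units for the batch to fill before 10 units of processing, for a worst-case latency of 19 units.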
An ASIC is the best in throughput, latency, and power consumption, but its development cost is high and its design cycle long. The flexibility of FPGAs also protects the investment: a data center is leased to different tenants, and if some machines carry neural-network accelerator cards, some Bing-search accelerator cards, and some network-virtualization accelerator cards, task scheduling and operations become very troublesome. Using FPGAs keeps the data center homogeneous.
In communication-intensive tasks, FPGAs have greater advantages over GPUs and CPUs.
1) Throughput: an FPGA can connect directly to a 40 Gbps or 100 Gbps network cable and process packets of any size at wire speed. A CPU needs a network card to receive packets. A GPU can also process packets with high performance, but it has no network port and likewise needs a network card, so its throughput is limited by the network card and/or the CPU.
2) Latency: the network card passes the data to the CPU, and the CPU passes it back to the network card after processing. In addition, clock interrupts and task scheduling in the operating system add jitter to the latency.
To sum up, the main advantage of FPGA in the data center is stable, extremely low latency, which suits both streaming compute-intensive tasks and communication-intensive tasks.
The biggest difference between FPGA and GPU lies in the architecture. FPGA is more suitable for streaming processing that requires low latency, and GPU is more suitable for processing large quantities of homogeneous data.
The absence of instructions is at once the strength and the weakness of the FPGA. Every distinct operation consumes some FPGA logic resources; if the work to be done is complex and not repetitive, it occupies a large amount of logic, most of which sits idle. In that case a von Neumann processor is the better choice.
FPGA and CPU therefore work together: tasks with strong locality and repetition go to the FPGA, while the complex, irregular parts go to the CPU. This division of labor is why FPGAs are widely used in deep learning.