Date: Dec 02, 2020
Readers familiar with Xilinx FPGAs will know that the Zynq® UltraScale+™ MPSoC series is based on the Xilinx® UltraScale™ MPSoC architecture, which integrates a feature-rich, ARM-based 64-bit quad-core or dual-core processing system (PS) and Xilinx UltraScale programmable logic (PL) in a single device. It also includes on-chip memory, multi-port external memory interfaces, and a wealth of peripheral interfaces, notably the 16.3 Gbps GTH transceivers, which support interfacing with PCI Express® Gen3 storage devices such as NVMe SSDs. This article shows a solution for implementing the NVMe solid-state drive (SSD) interface on Xilinx’s ZCU102 evaluation kit using Design Gateway’s NVMeG3-IP core. The solution achieves high performance: a write speed of 2,319 MB/s and a read speed of 3,347 MB/s.
Introduction to Zynq® UltraScale+ MPSoC ZCU102 Evaluation Kit
The ZCU102 is a general-purpose evaluation board for rapid prototyping, based on the XCZU9EG-2FFVB1156E MPSoC device. The board includes high-speed DDR4 SODIMM and component memory interfaces, FMC expansion ports, multi-gigabit-per-second serial transceivers, various peripheral interfaces, and FPGA logic for custom designs, providing a flexible prototyping platform.
ZCU102 provides programmable logic functions and can be used in the most advanced applications such as 5G wireless networks, next-generation advanced driver assistance systems (ADAS), and industrial Internet of Things (IIoT) solutions.
In short, applications that require high-performance, high-reliability external data storage such as NVMe SSDs need a solution that makes full use of the GTH transceivers, which support PCI Express® Gen3 interfaces.
Introduction to NVMe SSD storage
NVM Express (NVMe) defines the interface through which a host controller accesses an SSD over PCI Express. NVMe streamlines command issue and completion by using only two registers (one for command issue and one for command completion). NVMe also supports parallel operation, with up to 64K commands in a single queue. This queue depth improves transfer performance for both sequential and random access.
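The issue/completion flow described above can be illustrated with a toy model of one queue pair. This is only a sketch for intuition, not driver code and not how NVMeG3-IP is implemented: the class name `QueuePair`, its methods, and the simulated controller are all hypothetical. The opcodes (Flush 00h, Write 01h, Read 02h) and the 65,536-entry limit follow the NVMe specification.

```python
# Toy model of an NVMe submission/completion queue pair (illustrative only).
OPC_FLUSH, OPC_WRITE, OPC_READ = 0x00, 0x01, 0x02  # NVM command set opcodes
MAX_QUEUE_ENTRIES = 65536                           # "64K" entries per queue


class QueuePair:
    """Hypothetical host-side view of one submission ring and its completions."""

    def __init__(self, depth):
        if not 2 <= depth <= MAX_QUEUE_ENTRIES:
            raise ValueError("queue depth must be between 2 and 65536 entries")
        self.depth = depth
        self.sq = [None] * depth  # submission ring buffer
        self.sq_tail = 0          # value the host would write to the issue register
        self.sq_head = 0          # advanced by the (simulated) controller
        self.cq = []              # completions posted by the controller
        self.next_cid = 0         # 16-bit command identifier

    def submit(self, opcode, lba, blocks):
        """Place one command in the ring and advance the tail."""
        next_tail = (self.sq_tail + 1) % self.depth
        if next_tail == self.sq_head:
            # One slot always stays empty so that full and empty are distinguishable.
            raise RuntimeError("submission queue full")
        cid = self.next_cid
        self.next_cid = (self.next_cid + 1) % 65536
        self.sq[self.sq_tail] = {"cid": cid, "opcode": opcode,
                                 "lba": lba, "blocks": blocks}
        self.sq_tail = next_tail
        return cid

    def controller_step(self):
        """Simulated SSD: consume one command and post a successful completion."""
        if self.sq_head == self.sq_tail:
            return False  # nothing pending
        cmd = self.sq[self.sq_head]
        self.sq_head = (self.sq_head + 1) % self.depth
        self.cq.append({"cid": cmd["cid"], "status": 0})
        return True

    def reap(self):
        """Collect all posted completions (the host would then update the completion register)."""
        done, self.cq = self.cq, []
        return done
```

Because several commands can sit in the ring before any completion is collected, the device is free to work on them in parallel, which is where the deep queue helps random access in particular.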
NVMe drives have paved the way for high-speed data storage and computing. With PCI Express® Gen3 technology, new NVMe SSDs can achieve peak performance of up to 40 Gbps.
An example of an NVMe storage device is shown here.
Implementation of NVMe host controller on ZCU102
Conventionally, an NVMe host is implemented with a host processor working together with a PCIe controller to transfer data to and from the NVMe SSD. The NVMe protocol is implemented in a device driver that communicates with the PCIe controller hardware, a CPU peripheral connected over a high-speed bus. Both the data buffer and the command queues require external DDR memory to transfer data between the PCIe controller and the SSD.
Because the XCZU9EG-2FFVB1156E FPGA device on the ZCU102 does not contain a PCIe Gen3 integrated block, this traditional implementation method cannot be used.
Therefore, Design Gateway proposed a solution that uses the NVMeG3-IP core to implement the NVMe SSD interface of the Zynq® UltraScale+™ MPSoC device (without the PCIe integrated block). Through the NVMe interface, ZCU102 can build a multi-channel RAID system with higher performance while minimizing FPGA resource consumption. The NVMeG3-IP core license includes reference design examples to help designers shorten development time and reduce costs.
Overview of Design Gateway's NVMeG3-IP
The NVMe IP core with a PCIe Gen3 soft IP core (NVMeG3-IP) is well suited to accessing an NVMe SSD when no PCIe integrated block, CPU, or external memory is available. NVMeG3-IP includes the PCIe Gen3 soft IP core and 256 KB of memory. If your application requires high-speed NVMe SSD storage but uses a low-cost FPGA that does not contain a PCIe integrated block, this solution is recommended.
NVMeG3-IP has many features, some of which are listed below:
Implements parts of the application layer, transaction layer, data link layer, and physical layer to access an NVMe SSD without using a CPU
Works with the Xilinx PCIe PHY IP configured as 4-lane PCIe Gen3 (128-bit bus interface)
Contains a 256 KB RAM data buffer
Simple user interface via dgIF typeS
Supports six commands: Identify, Shutdown, Write, Read, SMART, and Flush (other commands can be supported as an option)
Supported NVMe devices:
Base class code: 01h (mass storage), subclass code: 08h (non-volatile), programming interface: 02h (NVMHCI)
Minimum memory page size (MPSMIN): 0 (4 KB)
Maximum data transfer size (MDTS): at least 5 (128 KB) or 0 (unlimited)
LBA unit: 512 bytes or 4096 bytes
The user clock frequency must be greater than or equal to the PCIe clock (Gen3 is 250 MHz)
Available reference designs:
ZCU102 with AB17-M2FMC adapter board
KCU105 with AB18-PCIeX16/AB16-PCIeXOVR adapter board
VCU118 with AB18-PCIeX16 adapter board
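The device requirements in the list above can be expressed as a small compatibility check. This is a sketch only: the `SsdInfo` record and the function names are hypothetical and are not part of NVMeG3-IP or any vendor tool. The encodings are those of the NVMe specification, where the minimum page size is 2^(12 + MPSMIN) bytes and MDTS is a power-of-two multiple of that page size, with 0 meaning no limit.

```python
from dataclasses import dataclass


@dataclass
class SsdInfo:
    """Hypothetical record of the values a host reads from the SSD."""
    base_class: int  # PCI base class code
    sub_class: int   # PCI subclass code
    prog_if: int     # PCI programming interface
    mpsmin: int      # controller capability field CAP.MPSMIN
    mdts: int        # Identify Controller field MDTS
    lba_bytes: int   # LBA unit size in bytes


def min_page_bytes(mpsmin):
    """Minimum memory page size: 2^(12 + MPSMIN) bytes, so MPSMIN = 0 means 4 KB."""
    return 1 << (12 + mpsmin)


def max_transfer_bytes(mpsmin, mdts):
    """Maximum data transfer size in bytes, or None when MDTS = 0 (unlimited)."""
    if mdts == 0:
        return None
    return min_page_bytes(mpsmin) << mdts


def is_supported(info):
    """Check an SSD against the requirements listed for NVMeG3-IP."""
    class_ok = (info.base_class, info.sub_class, info.prog_if) == (0x01, 0x08, 0x02)
    page_ok = info.mpsmin == 0
    mdts_ok = info.mdts == 0 or info.mdts >= 5
    lba_ok = info.lba_bytes in (512, 4096)
    return class_ok and page_ok and mdts_ok and lba_ok
```

For example, `max_transfer_bytes(0, 5)` evaluates to 131072 bytes, which is the 128 KB minimum quoted in the list.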
Design Gateway developed NVMeG3-IP to operate as an NVMe host controller for accessing an NVMe SSD. The user interface and standard features are designed for ease of use and require no knowledge of the NVMe protocol. An additional feature of NVMeG3-IP is its built-in PCIe soft IP core, which implements parts of the data link layer and physical layer of the PCIe protocol in pure logic. With this soft core and the Xilinx PCIe PHY IP core, NVMeG3-IP can therefore run on FPGAs without a PCIe integrated block. The Xilinx PCIe PHY IP is a free IP core that includes the transceiver and equalizer logic.
NVMeG3-IP supports six NVMe commands: Identify, Shutdown, Write, Read, SMART, and Flush. It integrates 256 KB of BlockRAM, which serves as the data buffer. The system requires neither a CPU nor external memory.
The FPGA resource usage of the XCZU9EG-2FFVB1156E FPGA device is shown in Table 1 below.
Implementation and performance results of ZCU102
The following figure shows an overview of the reference design based on ZCU102 to demonstrate the operation of NVMeG3-IP. The NVMeG3IPTest module in the demo system includes the following modules: TestGen, LAxi2Reg, CtmRAM, IdenRAM and FIFO.
The demonstration system writes data to the NVMe SSD on the ZCU102 and verifies it. The user controls the test operation through the serial console. Connecting the NVMe SSD to the ZCU102 requires the AB17-M2FMC adapter board, as shown in the figure below.
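A write/verify test of this kind can be sketched in software as follows. The sketch makes several assumptions that are not taken from the reference design: the test pattern (32-bit words counting up from the sector's LBA), the 512-byte sector size, and the use of an in-memory `bytearray` as a stand-in for the SSD are all hypothetical, and the actual TestGen module is hardware logic whose pattern is not described here.

```python
import struct

SECTOR_BYTES = 512                 # assumed LBA unit for this sketch
WORDS_PER_SECTOR = SECTOR_BYTES // 4


def make_sector(lba):
    """Hypothetical pattern: little-endian 32-bit words counting up from the LBA."""
    words = [(lba + i) & 0xFFFFFFFF for i in range(WORDS_PER_SECTOR)]
    return struct.pack("<%dI" % WORDS_PER_SECTOR, *words)


def write_range(disk, start_lba, count):
    """Fill `count` sectors of the fake disk with the pattern."""
    for lba in range(start_lba, start_lba + count):
        offset = lba * SECTOR_BYTES
        disk[offset:offset + SECTOR_BYTES] = make_sector(lba)


def verify_range(disk, start_lba, count):
    """Return the list of LBAs whose contents do not match the pattern."""
    bad = []
    for lba in range(start_lba, start_lba + count):
        offset = lba * SECTOR_BYTES
        if bytes(disk[offset:offset + SECTOR_BYTES]) != make_sector(lba):
            bad.append(lba)
    return bad
```

Deriving the pattern from the LBA means the verify pass needs no stored copy of the data, which suits a design without a CPU or external memory.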
Sample test results from running the demo system on the ZCU102 with a 512 GB Samsung 970 Pro are shown in the figure below.
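To put the speeds quoted in the introduction (2,319 MB/s write, 3,347 MB/s read) in context, they can be compared with the raw capacity of the 128-bit, 250 MHz data path named in the feature list. The calculation below is a rough sketch: it treats the bus width times the clock as an upper bound, ignores PCIe and NVMe protocol overhead, and uses decimal megabytes. The function names are illustrative only.

```python
BUS_WIDTH_BITS = 128           # PCIe PHY bus interface width from the feature list
PCIE_CLOCK_HZ = 250_000_000    # Gen3 PCIe clock from the feature list
WRITE_MB_S = 2319              # write speed quoted in the introduction
READ_MB_S = 3347               # read speed quoted in the introduction


def bus_bandwidth_mb_s():
    """Raw data-path capacity in MB/s (1 MB = 10^6 bytes)."""
    return (BUS_WIDTH_BITS // 8) * PCIE_CLOCK_HZ / 1e6


def utilization(measured_mb_s):
    """Fraction of the raw data-path capacity reached by a measured speed."""
    return measured_mb_s / bus_bandwidth_mb_s()


if __name__ == "__main__":
    print("bus capacity: %.0f MB/s" % bus_bandwidth_mb_s())
    print("write: %.1f%% of capacity" % (100 * utilization(WRITE_MB_S)))
    print("read:  %.1f%% of capacity" % (100 * utilization(READ_MB_S)))
```

Under these assumptions the data path carries at most 4,000 MB/s, so the quoted read speed is a little over 80% of that bound and the write speed a little under 60%.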
The NVMeG3-IP core provides a solution for implementing the NVMe SSD interface on the ZCU102 evaluation kit, and more generally on Xilinx® Zynq® UltraScale+™ MPSoC devices without a PCIe integrated block. Its design goal is to achieve the highest NVMe SSD access performance with the lowest FPGA resource usage and without using a CPU, which makes it well suited to high-performance, CPU-less NVMe storage. It can use GTH transceivers to implement multiple NVMe SSD interfaces without being limited by the number of PCIe integrated blocks on the FPGA device.