As stated in Section 3.1.3, soft processors lie at the origin of FPSoC architectures. They are processor IP cores (usually general-purpose ones) implemented using the logic resources of the FPGA fabric (distributed logic, specialized hardware blocks, and interconnect resources), with the advantage of having a very flexible architecture.
FIGURE 3.7 Soft processor architecture.
As shown in Figure 3.7, a soft processor consists of a processor core, a set of on-chip peripherals, on-chip memory, and interfaces to off-chip memory. Like microcontroller families, each soft processor family uses a consistent instruction set and programming model.
Although some of the characteristics of a given soft processor are predefined and cannot be modified (e.g., the number of instruction and data bits, instruction set architecture [ISA], or some functional blocks), others can be defined by the designer (e.g., type and number of peripherals or memory map). In this way, the soft processor can, to a certain extent, be tailored to the target application. In addition, if a peripheral is required that is not available as part of the standard configuration possibilities of the soft processor, or a given available functionality needs to be optimized (for instance, because of the need to increase processing speed in performance-critical systems), it is always possible for the designer to implement a custom peripheral using available FPGA resources and connect it to the CPU in the same way as any “standard” peripheral.
The main alternative to soft processors is hard processors, which are fixed hardware blocks implementing specific processors, such as ARM’s Cortex-A9 (ARM 2012), included by Altera and Xilinx in their latest families of devices. Although hard processors (analyzed in detail in Section 3.3) provide some advantages over soft ones, their architecture is fixed, so many applications do not need all of their resources, whereas others may need more than are available. Flexibility then becomes the main advantage of soft processors, enabling the development of custom solutions to meet performance, complexity, or cost requirements. Scalability and reduced risk of obsolescence are other significant advantages of soft processors. Scalability refers both to the ability to add resources to support new features or update existing ones throughout the lifetime of the system and to the possibility of replicating a system, implementing more than one processor in the same FPGA chip. In terms of reduced risk of obsolescence, soft processors can usually be migrated to new families of devices. Limiting factors in this regard are that the soft processor may use logic resources specific to a given family of devices, which may not be available in others, or that the designer is not the actual owner of the HDL code describing the soft processor.
Soft processor cores can be divided into two groups:
1. Proprietary cores, associated with an FPGA vendor, that is, supported only by devices from that vendor.
2. Open-source cores, which are technology independent and can, therefore, be implemented in devices from different vendors.
These two types of soft processors are analyzed in Sections 3.2.1 and 3.2.2, respectively. Although there are many soft processors with many diverse features available in the market, without loss of generality, we will focus on the main features and the most widely used cores, which will give a fairly comprehensive view of the different options available for designers.
Proprietary cores are optimized for a particular FPGA architecture, so they usually provide more reliable performance, in the sense that processing speed, resource utilization, and power consumption can be accurately determined, because their behavior can be simulated from accurate hardware models. Their major drawback is that code portability and reusability are quite limited.
Open-source cores are portable and more affordable. They are relatively easy to adapt to different FPGA architectures and to modify. On the other hand, since they are not optimized for any particular architecture, their performance is usually worse and less predictable, and their implementation requires more FPGA resources.
Xilinx’s PicoBlaze (Xilinx 2011a) and MicroBlaze (Xilinx 2016a) and Altera’s Nios-II* (Altera 2015c), whose block diagrams are shown in Figure 3.8a through c, respectively, have consistently been the most popular proprietary processor cores over the years. More recently, Lattice Semiconductor released the LatticeMico32 (LM32) (Lattice 2012) and LatticeMico8 (LM8) (Lattice 2014) processors,† whose block diagrams are shown in Figure 3.8d and e, respectively.
* Altera previously developed and commercialized the Nios soft processor, predecessor of Nios-II.
† Although LM8 and LM32 are actually open-source, free IP cores, since they are optimized for Lattice FPGAs, they are better analyzed together with proprietary cores.
FIGURE 3.8 Block diagrams of proprietary processor cores: (a) Xilinx’s PicoBlaze, (b) Xilinx’s MicroBlaze, (c) Altera’s Nios-II, (d) Lattice’s LM32, and (e) Lattice’s LM8.
PicoBlaze and LM8 are 8-bit RISC microcontroller cores optimized for Xilinx* and Lattice FPGAs, respectively. Both have predictable behavior, particularly PicoBlaze, all of whose instructions are executed in two clock cycles. Both also have similar architectures, including:
* KCPSM3 is the PicoBlaze version for Spartan-3 FPGAs, and KCPSM6 for Spartan-6, Virtex-6, and Virtex-7 Series.
• General-purpose registers (16 in PicoBlaze, 16 or 32 in LM8).
• Up to 4K words of 18-bit-wide instruction memory.
• Internal scratchpad RAM memory (64 bytes in PicoBlaze, up to 4 GB in 256-byte pages in LM8).
• Arithmetic Logic Unit (ALU).
• Interrupt management (one interrupt source in PicoBlaze, up to 8 in LM8).
The main difference between PicoBlaze and LM8 is the communication interface. Neither includes internal peripherals, so all required peripherals must be separately implemented in the FPGA fabric. PicoBlaze communicates with them through up to 256 input and up to 256 output ports, whereas LM8 uses the Wishbone interface from OpenCores, described in Section 3.5.4.
Similarly, although MicroBlaze, Nios-II, and LM32 are also associated with the FPGAs of their respective vendors, they have many common characteristics and features:
• 32-bit general-purpose RISC processors.
• 32-bit instruction set, data path, and address space.
• Harvard architecture.
• Thirty-two 32-bit general-purpose registers.
• Instruction and data cache memories.
• Memory management unit (MMU) to support OSs requiring virtual memory management (only in MicroBlaze and Nios-II).
• Possibility of variable pipeline, to optimize area or performance.
• Wide range of standard peripherals such as timers, serial communication interfaces, general-purpose I/O, SDRAM controllers, and other memory interfaces.
• Single-precision floating point computation capabilities (only in MicroBlaze and Nios-II).
• Interfaces to off-chip memories and peripherals.
• Multiple interrupt sources.
• Exception handling capabilities.
• Possibility for creating and adding custom peripherals.
• Hardware debug logic.
• Standard and real-time OS support: Linux, μCLinux, MicroC/OS-II, ThreadX, eCos, FreeRTOS, or embOS (only in MicroBlaze and Nios-II).
A soft processor is designed to support a certain ISA. Besides instruction and data memories, peripherals, and the resources that connect the core to external elements, this requires a set of functional blocks. The functional blocks supporting the ISA are usually implemented in hardware, but some of them can also be emulated in software to reduce FPGA resource usage. On the other hand, not all blocks building up the core are required for all applications. Some of them are optional, and it is up to the designer whether to include them or not, according to system requirements for functionality, performance, or complexity. In other words, a soft processor core does not have a fixed structure and can be adapted to some extent to the specific needs of the target application.
Most of the remainder of this section focuses on the architecture of the Nios-II soft processor core as an example, but the vast majority of the analyses are also applicable to other similar soft processors. As shown in Figure 3.8c, the Nios-II architecture consists of the following functional blocks:
• Register sets: They are organized in thirty-two 32-bit general-purpose registers and up to thirty-two 32-bit control registers. Optionally, up to 63 shadow register sets may be defined to reduce context switch latency and, in turn, execution time.
• ALU: It operates with the contents of the general-purpose registers and supports arithmetic, logic, relational, and shift and rotate instructions. When configuring the core, designers may choose to have some instructions (e.g., division) implemented in hardware or emulated in software, to save FPGA resources for other purposes at the expense of performance.
• Custom instruction logic (optional): Nios-II supports the addition not only of custom components but also of custom instructions, for example, to accelerate algorithm execution. The idea is for the designer to be able to replace a sequence of native instructions with a single one executed in hardware. Each new custom instruction created generates a logic block that is integrated in the ALU, as shown in Figure 3.9. This is an interesting feature of the Nios-II architecture not provided by others.
Up to 256 custom instructions of five different types (combinational, multicycle, extended, internal register file, and external interface) can be supported. A combinational instruction is implemented through a logic block that performs its function within a single clock cycle, whereas multicycle (sequential) instructions require more than one clock cycle to be completed. Extended instructions allow several (up to 256) combinational or multicycle instructions to be implemented in a single logic block. Internal register file custom instructions are those that can operate with the internal registers of their logic block instead of with Nios-II general-purpose registers (the ones used by other custom instructions and by native instructions).
FIGURE 3.9 Connection of custom instruction logic to the ALU.
Finally, external interface custom instructions generate communication interfaces to access elements outside of the processor’s data path.
Whenever a new custom instruction is created, a macro is generated that can be directly instantiated in any C or C++ application code, eliminating the need for programmers to use assembly code (they may use it anyway if they wish) to take advantage of custom instructions.
In addition to user-defined instructions, Nios-II offers a set of predefined instructions built from custom instruction logic. These include single-precision floating-point instructions (according to the IEEE Std. 754-2008 or IEEE Std. 754-1985 specifications) to support computation-intensive floating-point applications.
• Exception controller: It provides response to all possible exceptions, including internal hardware interrupts, through an exception handler that assesses the cause of the exception and calls the corresponding exception response routine.
• Internal and external interrupt controller (EIC) (optional): Nios-II supports up to 32 internal hardware interrupt sources, whose priority is determined by software. Designers may also create an EIC and connect it to the core through an EIC interface. When using EIC, internal interrupt sources are also connected to it and the internal interrupt controller is not implemented.
• Instruction and data buses: Nios-II is based on a Harvard architecture. The separate instruction and data buses are both implemented using 32-bit Avalon-MM master ports, according to Altera’s proprietary Avalon interface specification. The Avalon bus is analyzed in Section 3.5.2.
The data bus allows memory-mapped read/write access to both data memory and peripherals, whereas the instruction bus only fetches (reads) the instructions to be executed by the processor. The Nios-II architecture does not specify the number or type of memories and peripherals that can be used, nor how they are connected. These features are configured when defining the FPSoC. Most commonly, however, a combination of (fast) on-chip embedded memory, slower off-chip memory, and on-chip peripherals (implemented in the FPGA fabric) is used.
• Instruction and data cache memories (optional): Cache memories are supported in the instruction and data master ports. Both instruction and data caches are an intrinsic part of the core, but their use is optional. Software methods are available to bypass one or both of them. Cache management and coherence are handled in software.
• Tightly coupled memories (TCM) (optional): The Nios-II architecture includes optional TCM ports aimed at ensuring low-latency memory access in time-critical applications. These ports connect both instruction and data TCMs, which are on chip but external to the core. Several TCMs may be used, each one associated with a TCM port.
• MMU (optional): This block handles virtual memory, so it only makes sense to use it in conjunction with an OS requiring virtual memory. Its main tasks are memory allocation to processes, translation of virtual (software) memory addresses into physical addresses (the ones the hardware sets in the address lines of the Avalon bus), and memory protection to prevent any process from writing to memory sections without proper authorization, thus avoiding errant software execution.
• Memory protection unit (MPU) (optional): This block is used when memory protection features are required but virtual memory management is not. It allows access permissions to the different regions in the memory map to be defined by software. If a process attempts to perform an unauthorized memory access, an exception is generated.
• JTAG debug module (optional): As shown in Figure 3.10, this block connects to the on-chip JTAG circuitry and to internal core signals. This allows the soft processor to be remotely accessed for debugging purposes. Some of the supported debugging tasks are downloading programs to memory, starting and stopping program execution, setting breakpoints and watchpoints, analyzing and editing register and memory contents, and collecting real-time execution trace data. In this context, the advantage over hard processors is that the debugging module can be used during the design and verification phase and removed for normal operation, thus releasing FPGA resources.
FIGURE 3.10 Connection of the JTAG debug module.
To ease the task of configuring the Nios-II architecture to fit the requirements of different applications, Altera provides three basic models from which designers can build their own core, depending on whether performance or complexity weighs more in their decisions. Nios-II/f (fast) is designed to maximize performance at the expense of FPGA resource usage. Nios-II/s (standard) offers a balanced trade-off between performance and resource usage. Finally, Nios-II/e (economy) optimizes resource usage at the expense of performance.
The similarities between the hardware architectures of Altera’s Nios-II and Xilinx’s MicroBlaze can be clearly noticed in Figure 3.8. Both are 32-bit RISC processors with Harvard architecture and include fixed and optional blocks, most of which are present in both architectures, even if there may be some differences in the implementation details. Lattice’s LM32 is also a 32-bit RISC processor, but much simpler than the other two. For instance, it does not include an MMU block. It can be integrated with OSs such as μCLinux, uC/OS-II, and TOPPERS/JSP kernel (Lattice 2008).
The core processor is not the only element of a soft processor, but it is the most important one, since it has to ensure that any instruction in the ISA can be executed regardless of the core configuration. In addition, the soft processor includes peripherals, memory resources, and the required interconnections. A large number of peripherals can be integrated in the soft processor architecture. They range from standard resources (GPIO, timers, counters, or UARTs) to complex, specialized blocks oriented to signal processing, networking, or biometrics, among other fields. Peripherals supporting soft processors are provided not only by FPGA vendors but also by third parties.
Communication of the core processor with peripherals and external circuits in the FPGA fabric is a key aspect in the architecture of soft processors. In this regard, there are significant differences among the three soft processors being analyzed. Nios-II has always used, from its very first versions to date, Altera’s proprietary Avalon bus. On the other hand, Xilinx initially used IBM’s CoreConnect bus, together with proprietary ones (such as the local memory bus [LMB] and Xilinx CacheLink [XCL]), but the most recent devices use ARM’s AXI interface. Lattice’s LM32 processor uses Wishbone interfaces. A detailed analysis of the on-chip buses most widely used in FPSoCs is made in Section 3.5.
At this point, readers may feel overwhelmed by the amount and diversity of concepts, terms, hardware and software alternatives, and design decisions one must face when dealing with soft processors. Fortunately, designers have at their disposal robust design environments as well as an ecosystem of design tools and IP cores that dramatically simplify the design process. The tools supporting the design of SoPCs are described in Section 6.3.
In addition to proprietary cores, associated with certain FPGA architectures/vendors, there are also open-source soft processor cores available from other parties. Some examples are ARM’s Cortex-M1 and Cortex-M3, Freescale’s ColdFire V1, MIPS Technologies’ MP32, OpenRISC 1200 from the OpenCores community, Aeroflex Gaisler’s LEON4, as well as implementations of many different well-known processors, such as the 8051, 80186 (88), and 68000. The main advantages of these solutions are that they are technology independent, low cost, based on well-known, proven architectures, and supported by a full set of tools and OSs.
The Cortex-M1 processor (ARM 2008), whose block diagram is shown in Figure 3.11a, was developed by ARM specifically targeting FPGAs. It has a 32-bit RISC architecture and, among other features, includes configurable instruction and data TCMs, an interrupt controller, and configurable debug logic. The communication interface is ARM’s proprietary AMBA AHB-Lite 32-bit bus (described in Section 3.5.1.1). The core supports Altera, Microsemi, and Xilinx devices, and it can operate in a frequency range from 70 to 200 MHz, depending on the FPGA family.
The OpenRISC 1200 processor (OpenCores 2011) is based on the OpenRISC 1000 architecture, developed by OpenCores for the implementation of 32- and 64-bit processors. OpenRISC 1200, whose block diagram is shown in Figure 3.11b, is a 32-bit RISC processor with Harvard architecture. Among other features, it includes general-purpose registers, instruction and data caches, an MMU, a floating-point unit (FPU), a MAC unit for the efficient implementation of signal processing functions, and exception/interrupt management units. The communication interface is Wishbone (described in Section 3.5.4). It supports different OSs, such as Linux, RTEMS, FreeRTOS, and eCos.
LEON4 is a 32-bit processor based on the SPARC V8 architecture, originating from the European Space Agency’s LEON project. It is one of the most complex and flexible (configurable) open-source cores. It includes an ALU with hardware multiply, divide, and MAC units, an IEEE-754 FPU, an MMU, and a debug module with instruction and data trace buffer. It supports two levels of instruction and data caches and uses the AMBA 2.0 AHB bus (described in Section 3.5.1.1) as communication interface. From a software point of view, it supports Linux, eCos, RTEMS, Nucleus, VxWorks, and ThreadX.
FIGURE 3.11 Some open-source soft processors: (a) Cortex-M1, (b) OpenRISC1200, and (c) LEON4.
Table 3.1 summarizes the performance of the different soft processors analyzed in this chapter. It should be noted that the data have been extracted from information provided by vendors and, in some cases, it is not clear how this information has been obtained.
Since several soft processors can be instantiated in an FPGA design (to the extent that enough resources are available), many diverse FPSoC solutions can be developed based on them, from single-core to multicore. These multicore systems may be based on the same or different soft processors, or on their combination with hard processors, and support different OSs. Therefore, it is possible to design homogeneous or heterogeneous FPSoCs, with SMP or AMP architectures.