The embedded multipliers and DSP blocks described so far cannot perform floating-point operations. In fact, most resources of this kind available in FPGAs are designed to operate in fixed point. This is a significant limitation in many cases, because many signal processing designs must then be adapted to work in fixed point, which not only adds design burden but also creates potential accuracy problems. These issues are being overcome by the availability of new DSP blocks supporting the IEEE 754 floating-point standard.
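The accuracy problem mentioned above can be illustrated with a minimal sketch. It assumes a 16-bit signed Q1.15 fixed-point format, which is only one common choice and is not prescribed by the text; the helper names `to_q15` and `to_single` are hypothetical.

```python
import struct

def to_q15(x: float) -> float:
    """Quantize x to signed Q1.15 fixed point (assumed format), with saturation."""
    n = round(x * 2**15)                   # scale to an integer number of LSBs
    n = max(-2**15, min(2**15 - 1, n))     # saturate to the 16-bit range
    return n / 2**15                       # back to a real value for comparison

def to_single(x: float) -> float:
    """Round a Python float (double precision) to IEEE 754 single precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)

if __name__ == "__main__":
    for x in (0.5, 0.001, 0.00001):
        print(x,
              "fixed:", relative_error(to_q15(x), x),
              "float:", relative_error(to_single(x), x))
```

In this sketch the fixed-point error is absolute (half an LSB at most), so small values lose most or all of their significant bits, whereas the floating-point error stays relative to the magnitude of the value.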
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is the most widely used standard in floating-point computation circuits (IEEE 2008). It defines data formats, operations, and exceptions (such as division by zero, asymptotic functions, overflow, or inputs/outputs producing undefined or unrepresentable numbers, called NaN, for Not a Number). The two basic data formats in IEEE 754 are single (32-bit) and double (64-bit) precision. Any IEEE 754–compliant computing system must at least support single-precision operations.
The single-precision format consists of a sign bit (the most significant one in any data word), followed by 8 bits for the exponent (represented in excess-$(2^{n-1} - 1)$ format, i.e., with a bias of 127 for $n = 8$) and 23 bits for the mantissa, which is normalized so that it always starts with a nonzero bit (because this leading bit is always 1 for normalized numbers, it is implicit and not stored). Therefore, in order for some operations (e.g., addition and subtraction) to be performed, it is first necessary to align the mantissas of the operands (so that the binary point is in the same position in all of them), then operate, and finally round off and normalize the result again. In fact, IEEE 754 specifies that alignment and normalization be done for each operation.
* Up to 3600 in Xilinx 7 Series devices.
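The field layout described above can be checked with a short sketch. It uses only Python's standard `struct` module; the function names `decode_single` and `value_of_normal` are hypothetical, and the reconstruction shown is valid for normal numbers only (zeros, subnormals, infinities, and NaN are not handled).

```python
import struct

BIAS = 127  # 2**(8 - 1) - 1 for the 8-bit exponent field

def decode_single(x: float):
    """Return (sign, biased_exponent, fraction) of x in single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # 1 bit, most significant
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF             # 23 stored mantissa bits
    return sign, biased_exponent, fraction

def value_of_normal(sign: int, biased_exponent: int, fraction: int) -> float:
    """Rebuild a normal number as (-1)^s * 1.fraction * 2^(e - BIAS)."""
    significand = 1 + fraction / 2**23     # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - BIAS)

if __name__ == "__main__":
    for x in (1.0, -2.5, 0.15625):
        s, e, f = decode_single(x)
        print(x, s, e, hex(f), value_of_normal(s, e, f))
```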
If fixed-point multipliers or DSP blocks are to be used for IEEE 754–compliant floating-point operations, alignment and normalization must necessarily be done using distributed logic in the FPGA fabric. This usually implies the need for barrel shifters of up to 48 bits (when working in single precision), which consume a large amount of logic and interconnection resources. This in turn negatively affects operating frequency, to the extent that it may become the limiting factor in the performance of the whole processing system. Performance degradation becomes more significant as the complexity of the target algorithm grows, because alignment and normalization steps must be executed in all operations.
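To make the cost of these steps concrete, the following is a minimal software sketch of a single-precision addition built only from integer shifts and adds. It is an illustration under stated assumptions, not a description of any vendor's circuit: it handles only positive, normal operands whose sum does not overflow, it rounds to nearest (ties to even), and it aligns by shifting the larger operand left using Python's arbitrary-precision integers, whereas hardware would typically shift the smaller operand right and keep extra guard bits. All function names are hypothetical.

```python
import struct

def unpack_single(x: float):
    """Split x (as single precision) into biased exponent and 23-bit fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF

def pack_single(biased_exponent: int, fraction: int) -> float:
    """Assemble a positive single-precision value from its fields."""
    bits = (biased_exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def add_positive_normals(a: float, b: float) -> float:
    ea, fa = unpack_single(a)
    eb, fb = unpack_single(b)
    ma = (1 << 23) | fa                    # restore implicit leading 1 (24 bits)
    mb = (1 << 23) | fb
    if ea < eb:                            # make (ea, ma) the larger-exponent operand
        ea, eb, ma, mb = eb, ea, mb, ma

    # 1) Alignment: express both significands at the smaller exponent.
    total = (ma << (ea - eb)) + mb         # 2) Operate: exact integer sum

    # 3) Normalization: shift back to 24 bits, rounding to nearest even.
    extra = total.bit_length() - 24        # always >= 1 for two normal inputs
    remainder = total & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    total >>= extra
    if remainder > half or (remainder == half and total & 1):
        total += 1
        if total == 1 << 24:               # rounding carried out of 24 bits
            total >>= 1
            extra += 1
    return pack_single(eb + extra, total & 0x7FFFFF)

if __name__ == "__main__":
    print(add_positive_normals(1.5, 0.375))
    print(add_positive_normals(16777216.0, 1.0))
```

Every addition needs a variable-distance shift before the add and another after it, which is the work the barrel shifters perform when these steps are mapped onto fabric logic.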
Currently, DSP blocks supporting IEEE 754–compliant single-precision operations are available in some FPGAs (Parker 2014; Sinha 2014; Altera 2016). As the sample block in Figure 4.6 shows, they include an adder and a multiplier, both IEEE 754 compliant, and some registers and MUXes that, as in the blocks described in Sections 4.2 and 4.3, are intended to allow high operating frequencies to be achieved and to provide configurability. Supported operating modes are addition/subtraction, multiplication, MAC, multiplication and addition/subtraction, vector one/two, and complex multiplication mode, among others.
In this case, alignment and normalization operations are carried out inside the DSP block itself, avoiding the need to use distributed logic resources for these purposes and, therefore, eliminating the aforementioned negative impact of these operations on performance. These blocks also include the logic resources required to detect and flag the exceptions defined by the IEEE 754 standard.
Figures 4.7 through 4.10 show some of the operating modes for floating-point arithmetic supported by the DSP block in Figure 4.6.
FIGURE 4.6 Variable Precision DSP Block from Altera Arria 10 FPGAs.
FIGURE 4.7 Multiplication mode: floating-point multiplication of input operands y and z.
FIGURE 4.8 MAC mode: floating-point multiplication of input operands y and z, followed by floating-point addition/subtraction of the result and the previously accumulated value (y · z + acc or y · z − acc).
FIGURE 4.9 Vector two mode: simultaneous floating-point multiplication (whose result is sent to the following DSP block through the chainout output) and addition of the value received through the chainin input (from the previous DSP block) to the x input operand ($\text{Result}_n = x_n + \text{chainin}_n = x_n + \text{chainout}_{n-1} = x_n + y_{n-1} \cdot z_{n-1}$).
FIGURE 4.10 Complex multiplication mode: floating-point complex multiplication using four DSP blocks, according to the expression (a + j · b) · (c + j · d) = (a · c − b · d) + j · (a · d + b · c).
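The arithmetic performed in the modes of Figures 4.8 through 4.10 can be modeled in software as a rough behavioral sketch. The model assumes that each multiplier and adder output is rounded to single precision separately and that the chainin input of the first block in a chain is zero; neither detail is stated in the text, and the function names are hypothetical. Registers, pipelining, and exception flags are not modeled.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float to IEEE 754 single precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def fmul(y: float, z: float) -> float:
    return f32(f32(y) * f32(z))            # single-precision multiplier

def fadd(a: float, b: float) -> float:
    return f32(f32(a) + f32(b))            # single-precision adder

def mac(pairs) -> float:
    """MAC mode: accumulate y * z + acc over a sequence of (y, z) pairs."""
    acc = 0.0
    for y, z in pairs:
        acc = fadd(fmul(y, z), acc)
    return acc

def vector_two(x, y, z):
    """Vector two mode: result[n] = x[n] + y[n-1] * z[n-1] along a chain."""
    results = []
    chain = 0.0                            # assumed chainin of the first block
    for xn, yn, zn in zip(x, y, z):
        results.append(fadd(xn, chain))    # adder uses the incoming chain value
        chain = fmul(yn, zn)               # product goes out through chainout
    return results

def complex_mul(a: float, b: float, c: float, d: float):
    """(a + jb)(c + jd) using four real multiplications, one per DSP block."""
    real = fadd(fmul(a, c), -fmul(b, d))   # a*c - b*d
    imag = fadd(fmul(a, d), fmul(b, c))    # a*d + b*c
    return real, imag

if __name__ == "__main__":
    print(mac([(1.5, 2.0), (0.5, 4.0)]))
    print(vector_two([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [2.0, 2.0, 2.0]))
    print(complex_mul(1.0, 2.0, 3.0, 4.0))
```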
Vendors provide sets of floating-point mathematical functions (many of which comply with specifications such as OpenCL 1.2) optimized for implementation in these blocks.
In general, the design tools from the different vendors significantly automate the optimization and use of the DSP resources available in their FPGAs. In this way, for applications without extremely demanding timing requirements, designers can easily develop fully functional systems without having to deal with complex hardware issues, such as the internal topology of the blocks, pipeline acceleration, or time-division multiplexing techniques.