HLS Tools

As introduced in Section 6.1, HLS tools make it possible to map algorithms into hardware from descriptions that are not timed explicitly. In other words, these descriptions do not specify the register-to-register transfers that take place in every clock cycle. The tools schedule the operations onto a given set of operators, each of which may be reused over time for different purposes.

The algorithm specification defines relationships between variables, which hold data, and operations, so that data are transformed as the algorithm executes. The tools identify the operators involved and the data dependencies between operations, so that the modules in charge of these operations can be reused at different times during algorithm execution. They also generate the multiplexing schemes and the associated control required to select the appropriate data path(s) at every point in time during execution. As a result, they provide:

• A datapath (the hardware implementation of the data flow graph), which contains the registers required to hold data, the multiplexing schemes required to feed operators with these data, and the operators themselves

• A control state machine, which controls the datapath so that the required operations are performed in the required sequence
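The scheduling and binding step can be illustrated with a minimal Python sketch. This is a software model, not input for any particular HLS tool, and it rests on several assumptions:

- It uses a hypothetical data flow graph for y = a*b + c*d + e*f.
- Every operator takes one clock cycle.
- Scheduling follows a simple list-scheduling heuristic that favors operations on the longest path to the output.

Commercial tools use their own, more elaborate algorithms that also account for operator delays and the target clock period. In this model, each control step of the resulting schedule becomes one state of the control state machine. The operator instance assigned to an operation (e.g., `mul0`) determines what the multiplexers in front of that operator must select in that state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    kind: str           # operator type needed, e.g. "mul" or "add"
    preds: tuple = ()   # names of the operations whose results this one consumes


def priorities(ops):
    """Priority of each op = number of ops on the longest path from it to a DFG sink."""
    succs = {o.name: [] for o in ops}
    for o in ops:
        for p in o.preds:
            succs[p].append(o.name)
    memo = {}

    def height(name):
        if name not in memo:
            memo[name] = 1 + max((height(s) for s in succs[name]), default=0)
        return memo[name]

    return {o.name: height(o.name) for o in ops}


def list_schedule(ops, units):
    """Resource-constrained list scheduling, assuming single-cycle operators.

    units maps each operator type to the number of instances available (must be >= 1).
    Returns {op_name: (control_step, operator_instance)}.
    """
    prio = priorities(ops)
    schedule = {}
    step = 0
    while len(schedule) < len(ops):
        # An op is ready when all of its inputs were produced in earlier control steps.
        ready = [o for o in ops if o.name not in schedule
                 and all(p in schedule and schedule[p][0] < step for p in o.preds)]
        ready.sort(key=lambda o: (-prio[o.name], o.name))
        used = {kind: 0 for kind in units}
        for o in ready:
            if used[o.kind] < units[o.kind]:  # bind the op to a free operator instance
                schedule[o.name] = (step, f"{o.kind}{used[o.kind]}")
                used[o.kind] += 1
        step += 1
    return schedule


def control_fsm(schedule):
    """One FSM state per control step; each state lists (operator instance, op) pairs."""
    states = [[] for _ in range(1 + max(s for s, _ in schedule.values()))]
    for name, (s, unit) in sorted(schedule.items()):
        states[s].append((unit, name))
    return states


# Hypothetical DFG for y = a*b + c*d + e*f (inputs a..f assumed already in registers).
dfg = [
    Op("m1", "mul"), Op("m2", "mul"), Op("m3", "mul"),
    Op("s1", "add", ("m1", "m2")),
    Op("s2", "add", ("s1", "m3")),
]

for units in ({"mul": 1, "add": 1}, {"mul": 2, "add": 1}):
    states = control_fsm(list_schedule(dfg, units))
    print(units, "->", len(states), "states")
    for i, active in enumerate(states):
        print(f"  S{i}: {active}")
```

With one multiplier and one adder, the schedule needs four states, and `mul0` is reused for all three products. In that case the multiplexers at its inputs select (a, b), (c, d) and (e, f) in successive states. Adding a second multiplier shortens the schedule to three states, which illustrates the reuse-versus-latency trade-off.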

Reuse of operators can be maximized to reduce logic resource utilization, usually at the expense of longer latency. Alternatively, if execution speed has higher priority than size, the circuit may be “widened” by instantiating operators multiple times so that parallelism can be exploited. In this case, pipelining, loop unrolling, parallel memory access (memory reshaping), and I/O adaptation are the main techniques used to speed up algorithm execution. These techniques are briefly described below:

• Pipelined structures achieve high execution speeds at the expense of a large number of registers and long latencies. In a well-designed pipelined circuit, all stages perform operations and hold data in every clock cycle. Each stage works on data from a different execution cycle as the signals propagate through the pipeline. Thus, pipelined structures are incompatible with resource reuse, since sharing an operator between stages that are active in the same cycle would produce structural hazards.

• Loop unrolling is a technique that uses several functional instances of the loop body (typically that of an inner loop) so that all iterations of the loop are executed in parallel. For this to be feasible, the loop must have a fixed number of iterations known beforehand (i.e., its bound is a constant, not a variable). If loops are nested, more than one loop may be unrolled, but the risk of very high resource utilization increases. In general, this technique requires high resource utilization but few additional registers. It should be complemented with memory reshaping and I/O adaptation, because all replicated resources must be fed with the appropriate data simultaneously and at high speed. Otherwise, no performance improvement would be achieved.

• Fast access to memories by the functional resources is crucial to achieving high computing bandwidth. To this end, memories (in particular those storing vectors or arrays) may be organized with wide parallel buses capable of providing data to the (possibly replicated) computing resources at the required rate. Since the memory contents do not change, memory utilization inside the FPGA remains unchanged, and the only overhead is that caused by the parallel wiring. Because only the shape of the memory (its width and organization) changes, this technique is called memory reshaping.

• Data from external elements have to be fed to the blocks designed through HLS techniques fast enough for all required data to be available at the right times. Similarly, these blocks must be capable of delivering output values to their destinations at the right times. High data throughput may be achieved by using DMA engines on dedicated ports. These engines may be embedded into the system under design so that its control flow part produces the proper transactions at the right times.

Apart from traditional HLS tools, which are being integrated into design suites, there are also tools aimed at embedding hardware accelerators within SoPC systems in a somewhat automated way. They target a restricted set of devices or families and are conceived to support software designers with little expertise in hardware development.
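The interplay between pipelining, loop unrolling and memory reshaping described in the list above can be estimated with a rough cycle-count model. The Python sketch below is such a model under stated assumptions, not the output of an HLS tool:

- The loop is a hypothetical `c[i] = a[i] * b[i] + k` with 1024 iterations.
- Each iteration has a depth of 3 cycles.
- The arrays are stored in dual-port memories.
- Memory reshaping uses cyclic partitioning, in which element i goes to bank i mod B.

The model uses partial unrolling by a factor of 4, a common variant of the full unrolling described above. In AMD/Xilinx Vitis HLS, for example, these optimizations are requested with the `PIPELINE`, `UNROLL` and `ARRAY_PARTITION` directives.

```python
import math


def cyclic_bank(i, banks):
    """Cyclic partitioning: element i is stored in bank i % banks, at address i // banks."""
    return i % banks, i // banks


def group_ii(unroll, banks, ports_per_bank=2):
    """Cycles needed to read one operand for `unroll` consecutive iterations.

    A bank serves at most ports_per_bank reads per cycle (2 models a dual-port RAM),
    so the busiest bank sets the minimum initiation interval (II) between groups.
    """
    reads_per_bank = [0] * banks
    for i in range(unroll):
        bank, _ = cyclic_bank(i, banks)
        reads_per_bank[bank] += 1
    return max(1, math.ceil(max(reads_per_bank) / ports_per_bank))


def loop_cycles(n, depth, pipelined=False, unroll=1, banks=1, ports_per_bank=2):
    """Rough cycle estimate for the hypothetical loop: c[i] = a[i] * b[i] + k.

    n must be a constant for unrolling to be possible; depth is the number of
    cycles one iteration needs from reading a[i], b[i] to writing c[i].
    """
    groups = math.ceil(n / unroll)            # unrolled copies run side by side
    ii = group_ii(unroll, banks, ports_per_bank)
    group_latency = depth + ii - 1            # extra cycles waiting for memory ports
    if not pipelined:
        return groups * group_latency         # next group starts after the previous ends
    return group_latency + (groups - 1) * ii  # a new group enters every ii cycles


N, DEPTH = 1024, 3  # hypothetical trip count and per-iteration depth
for label, cfg in {
    "sequential": dict(),
    "pipelined": dict(pipelined=True),
    "pipelined + unroll 4, 1 bank": dict(pipelined=True, unroll=4, banks=1),
    "pipelined + unroll 4, 2 banks": dict(pipelined=True, unroll=4, banks=2),
}.items():
    print(f"{label:30s} {loop_cycles(N, DEPTH, **cfg):5d} cycles")
```

Under these assumptions, the four configurations take 3072, 1026, 514 and 258 cycles. Unrolling by 4 with a single dual-port bank per array only about halves the pipelined cycle count, because that bank delivers just two of the four operands needed per cycle. Splitting each array into two cyclic banks restores one new group per cycle. The model ignores clock frequency, register and operator cost, and external I/O bandwidth.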

A special case of this approach is the development of hardware accelerators from programming languages that allow explicit parallelism to be described. OpenCL is becoming a widely used standard for such specifications because it can target a variety of devices, such as GPGPUs, multicore systems, and SoPCs. It also supports heterogeneous computing, in the sense that different portions of the code may run on different computing platforms, as discussed in Section 3.1.1.1. This is very convenient for the newest FPGA families, which integrate several different hard processing fabrics in the same device. Because this approach is expected to become increasingly significant, the issues related to the design of these particular accelerators are discussed in Section 6.5.
