

Vivado Power Optimization


Vivado power optimization uses a variety of techniques to reduce the dynamic power consumption of a design. As shown in Fig. 15.5, it detects the clock cycles in which certain sequential circuit elements do not contribute to observable design functionality, and applies ASIC-like clock-gating techniques to reduce their activity. Because FPGAs have dedicated clock routing resources, the gating is actually applied to the enable port of sequential elements such as a flop or block RAM. Compared with coarse-grained clock gating, which requires a nontrivial amount of design effort, Vivado power optimization can automatically infer more fine-grained gating conditions across multiple levels of logic and sequential boundaries.

[Figure 15.5: Vivado power optimization]

Optimization Paradigms

The foundation of Vivado power optimization is the inference of logic conditions under which a sequential element can be disabled without disturbing observable design state or functionality. Vivado power optimization explores two major paradigms: the output don’t care (ODC) paradigm and the input don’t toggle (IDT) paradigm. A brief introduction to these paradigms helps in understanding the netlist-level changes that Vivado power optimization may apply, which is important for designing and analyzing low-power systems.

The ODC paradigm infers the enable condition by exploring the output side of a sequential element. The key idea is that the sequential element only needs to be enabled when its output is consumed by logic in its fan-out cone. As shown in Fig. 15.6, the output of FF1 becomes a don’t care when FF2’s CLR signal is asserted. Consequently, Vivado power optimization infers that FF1 only needs to be enabled when FF2’s CLR signal is de-asserted, and applies that signal to the enable port of FF1 through the inverter. Since a flop’s enable determines the data available at its output in the next clock cycle, the enable of FF1 must be traced back by one clock cycle, which in the example is done through FF3.
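The ODC idea can be checked with a small cycle-level model. The sketch below is only an illustration under assumptions that are not taken from Fig. 15.6: FF2 has a synchronous CLR driven by FF3's output, FF2's data input is FF1's output, and the names `simulate`, `d_stream`, and `clr_stream` are hypothetical. The enable for FF1 is taken from FF3's input, that is, FF2's CLR one cycle early.

```python
import random

def simulate(d_stream, clr_stream, gate_ff1):
    """Return (FF2 output trace, number of FF1 output toggles)."""
    q1 = q2 = q3 = 0            # FF1, FF2, FF3 outputs after reset
    q2_trace = []
    ff1_toggles = 0
    for d, clr_src in zip(d_stream, clr_stream):
        q2_trace.append(q2)
        # ODC enable: FF1 loads only if FF2 will NOT be cleared next cycle.
        # clr_src is FF3's D input, i.e. FF2's CLR seen one cycle early.
        en1 = (clr_src == 0) if gate_ff1 else True
        next_q1 = d if en1 else q1
        next_q2 = 0 if q3 else q1   # assumed synchronous CLR with priority
        next_q3 = clr_src
        ff1_toggles += int(next_q1 != q1)
        q1, q2, q3 = next_q1, next_q2, next_q3
    return q2_trace, ff1_toggles

rng = random.Random(0)
d_stream = [rng.randint(0, 1) for _ in range(200)]
clr_stream = [rng.randint(0, 1) for _ in range(200)]

base_trace, base_toggles = simulate(d_stream, clr_stream, gate_ff1=False)
odc_trace, odc_toggles = simulate(d_stream, clr_stream, gate_ff1=True)

print("FF2 output identical:", base_trace == odc_trace)
print("FF1 toggles without/with ODC enable:", base_toggles, odc_toggles)
```

In this model the FF2 trace is the same with and without gating, while FF1 holds its value in every cycle whose result FF2 would discard anyway.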

To infer enable conditions across sequential boundaries, Vivado power optimization performs multiple iterations of ODC analysis. This essentially unrolls the time span and back-propagates ODC conditions across multiple levels of flops. In the example shown in Fig. 15.7, the ODC enable for flop FF2 is inferred in the first iteration from its output observability at the MUX, while the ODC enable for FF1 is decided in the second iteration based on FF2’s inferred ODC enable.

[Figure 15.7: Multiple iterations of ODC]

The IDT paradigm, on the other hand, searches for the enable condition by exploring the input side of a sequential element. The idea is that if its input data remains the same, the sequential element can be safely disabled without altering its direct output. In the example illustrated in Fig. 15.8, the input of flop FF1 does not toggle when a = 1 and b = 0, since its output is fed directly back into its input. Consequently, Vivado power optimization generates the IDT enable signal of FF1 as the complement of this disable condition, i.e., EN = ~a + b. Generally speaking, the IDT paradigm is useful for reducing the dynamic power of designs with many feedback loops.
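The IDT enable can be sanity-checked in the same way. The sketch below assumes a hypothetical next-state function `d_logic` that feeds Q back when a = 1 and b = 0 and otherwise loads an external value `x`; the actual logic in Fig. 15.8 may differ. It confirms that EN = ~a + b is low only when the input equals the current output, so gating does not change the flop's behavior.

```python
from itertools import product
import random

def d_logic(a, b, q, x):
    """Hypothetical next-state logic: feeds Q back when a=1 and b=0."""
    return q if (a == 1 and b == 0) else x

def idt_enable(a, b):
    """EN = ~a + b, the complement of the hold condition a & ~b."""
    return int((not a) or b)

# Exhaustive check: whenever EN is 0, the flop's input equals its output.
for a, b, q, x in product((0, 1), repeat=4):
    if idt_enable(a, b) == 0:
        assert d_logic(a, b, q, x) == q

def run(stimulus, gated):
    """Return (Q trace, number of cycles in which the flop was enabled)."""
    q, trace, enabled_cycles = 0, [], 0
    for a, b, x in stimulus:
        trace.append(q)
        en = idt_enable(a, b) if gated else 1
        enabled_cycles += en
        if en:
            q = d_logic(a, b, q, x)
    return trace, enabled_cycles

rng = random.Random(1)
stimulus = [(rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1))
            for _ in range(200)]
plain_trace, plain_enabled = run(stimulus, gated=False)
idt_trace, idt_enabled = run(stimulus, gated=True)
print("Q identical:", plain_trace == idt_trace)
print("Enabled cycles without/with IDT:", plain_enabled, idt_enabled)
```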

[Figure 15.8: IDT optimization paradigm]

In addition to the general ODC and IDT paradigms, Vivado power optimization also applies specific optimization techniques to certain high-power-consuming components such as block RAMs. To illustrate a few, the following techniques are deployed:

• Block RAM Structural ODC Optimization: Unlike the general ODC paradigm, this optimization searches for the conditions under which the block RAM is used in a write-only manner, and then uses the write-enable signal directly as the block RAM’s global enable control to suppress any unnecessary READ operations.

• Block RAM Write-Mode Optimization: Write mode defines the behavior of the block RAM’s outputs while data is being written into it, and can be set to NO_CHANGE to suppress any unnecessary output toggling. To take full advantage of this feature, Vivado power optimization searches for block RAMs whose outputs are not consumed during WRITE operations and sets their write mode to NO_CHANGE.

• Block RAM Quiescent IDT (QIDT) Optimization: When the block RAM’s input address remains the same in two consecutive READ cycles, Vivado power optimization safely disables the block RAM without disturbing its functionality.

• Cascaded Block RAM Optimization: When multiple block RAMs are cascaded, only one block RAM needs to be active at any time. Consequently, Vivado power optimization generates the enable signal for each block RAM in the cascaded chain from the most-significant bits (MSBs) of the address bus, such that only the block RAM being accessed is enabled.
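Two of the block RAM techniques above reduce to simple enable logic, sketched here as a behavioral model rather than as what Vivado actually emits. The address widths (a 13-bit address over 1K-deep blocks), the read-only memory model, and all function names are assumptions made for illustration.

```python
def cascade_enables(addr, addr_bits=13, block_addr_bits=10):
    """One-hot enables for cascaded block RAMs, decoded from the address MSBs."""
    n_blocks = 1 << (addr_bits - block_addr_bits)
    selected = (addr >> block_addr_bits) & (n_blocks - 1)
    return [int(i == selected) for i in range(n_blocks)]

def qidt_read_enables(addresses):
    """Enable a read only when the address differs from the previous cycle."""
    enables, prev = [], None
    for addr in addresses:
        enables.append(int(addr != prev))
        prev = addr
    return enables

def read_trace(rom, addresses, enables):
    """Model a read port whose output holds its value while disabled."""
    dout, trace = None, []
    for addr, en in zip(addresses, enables):
        if en:
            dout = rom[addr]
        trace.append(dout)
    return trace

# Cascade decode: address 0x1400 falls in block 5 of 8.
print(cascade_enables(0x1400))

# QIDT: repeated addresses need no new read (no writes occur in this model).
rom = {2: "c", 3: "a", 7: "b"}
addresses = [3, 3, 7, 7, 7, 2]
gated = qidt_read_enables(addresses)
always_on = [1] * len(addresses)
print(gated)
print(read_trace(rom, addresses, gated) == read_trace(rom, addresses, always_on))
```

The QIDT part only holds as written for a memory whose contents do not change between the two reads; a real implementation also has to account for intervening writes.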

After Vivado power optimization, simulation may show different outputs for certain sequential elements such as flops or block RAMs. This is expected, since the activity of these elements is reduced by gating their EN ports. However, Vivado power optimization guarantees that the design’s observable functionality (i.e., as seen at the primary outputs) remains undisturbed, since these sequential elements are only disabled during clock cycles in which their outputs are not consumed or remain unchanged.

Suggestions for Low-Power Design

Beyond the automatic power reduction performed by the tool, some design-level considerations can further improve the power characteristics of the design and create more optimization opportunities for the Vivado power optimizer. This subsection proposes a few techniques suited to low-power design:

• Cascaded block RAMs: To implement the same memory, block RAMs can be cascaded in different ways, which affects the power and timing of the design. For example, to implement a 36*8K memory, one option is to place nine 4*8K block RAMs in parallel, each contributing 4 bits of the data. Although this achieves the highest speed, it requires all block RAMs to be active concurrently, which consumes a significant amount of power. Alternatively, the same memory can be implemented by cascading eight 36*1K block RAMs, which is power optimal since only one block RAM is active at any time. In general, you should weigh parallel against cascaded block RAM implementations to achieve the best power and speed trade-off.

• Distributed RAM vs. block RAM: Similarly, the choice between distributed RAM and block RAM to implement a memory can affect the power consumption of a design. For instance, to implement a 32*100 memory, using one block RAM is functionally correct but wastes a large portion of the data capacity of the block RAM. On the other hand, the same memory can be implemented by 100 distributed RAMs without wasting any resource or power. Consequently, it is also a good idea to consider distributed RAM vs. block RAM under certain power and resource constraints.

• MUX chain design: The structure of the MUX chain determines how Vivado power optimization performs ODC analysis. Pushing a high-power-consuming element such as a block RAM to the end of the MUX chain increases the chance that Vivado power optimization finds the best ODC enable condition for that element.

• XOR tree design: While an XOR tree is good for implementing arithmetic logic, it has the disadvantage of causing excessive glitches as the number of XOR gate levels increases. Consequently, for power-centric designs, it is suggested to limit the logic levels of the XOR tree by applying techniques such as inserting pipeline stages in between.
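The memory-organization trade-offs in the first two suggestions come down to simple counting. The following sketch is a rough estimate only: the helper `ram_usage` is hypothetical, it assumes each primitive is used in a single fixed width-by-depth configuration, and it assumes that only one cascade row is enabled per access.

```python
from math import ceil

def ram_usage(width, depth, prim_width, prim_depth):
    """Return (total primitives, primitives active per access)."""
    columns = ceil(width / prim_width)   # primitives side by side (data bits)
    rows = ceil(depth / prim_depth)      # primitives cascaded (address range)
    total = columns * rows
    active_per_access = columns          # only the addressed row is enabled
    return total, active_per_access

# 36-bit x 8K memory built from 4-bit x 8K block RAMs in parallel.
print(ram_usage(36, 8192, prim_width=4, prim_depth=8192))

# The same memory built from cascaded 36-bit x 1K block RAMs.
print(ram_usage(36, 8192, prim_width=36, prim_depth=1024))

# 100-bit wide, 32-deep memory built from 1-bit x 32 distributed RAMs.
print(ram_usage(100, 32, prim_width=1, prim_depth=32))
```

Under these assumptions the parallel organization keeps all nine block RAMs active on every access, while the cascaded one uses eight block RAMs with a single one active at a time.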
