Optimizing the Datapath

Examining the nine control states in the main loop, and relating them to the mapping of the control graph onto the dataflow graph, showed that the last eight states were performing the S-block substitutions and the first two were mainly concerned with transforming the key. The second state is an overlap state in which both key and data transforms take place. The problem with the last eight states was fairly self-evident: there are eight substitutions and eight control states to perform them. Clearly something was locking each substitution into a separate control state and so preventing optimization with respect to latency. It wasn't difficult to see what each of these states contained: just register assignments, concatenations and a ROM read operation. The last of these is the problem. The ROM implementation being targeted is a synchronous circuit, so the S-block ROM can only be accessed once per clock cycle, in other words once per control state. This is what prevents the datapath operations from being performed in parallel. Attacking this problem is beyond the capabilities of behavioral synthesis because it requires knowledge of the dataflow at a much higher level than can be automatically extracted. The solution therefore requires modification of the original design.
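
A minimal sketch of such a synchronous ROM may make the constraint concrete. The entity name sync_rom, its port names and the all-zero placeholder contents are assumptions for illustration only; they are not part of the original design or of what MOODS generates. With one address port and a registered output, at most one word can be read per clock cycle:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port synchronous ROM (illustration only).
entity sync_rom is
  port (
    clk  : in  std_logic;
    addr : in  unsigned(5 downto 0);   -- one address per clock cycle
    data : out unsigned(3 downto 0)    -- registered 4-bit output word
  );
end entity sync_rom;

architecture rtl of sync_rom is
  type rom_type is array (0 to 63) of natural range 0 to 15;
  -- Placeholder contents only; these are NOT the DES S-block values.
  constant rom : rom_type := (others => 0);
begin
  -- The read is clocked, so only the word selected by addr at the
  -- rising edge is available; a second read needs a second edge.
  process (clk)
  begin
    if rising_edge(clk) then
      data <= to_unsigned(rom(to_integer(addr)), 4);
    end if;
  end process;
end architecture rtl;
```

Eight reads from a single instance of this kind need eight clock cycles, whereas eight separate instances could each return a word in the same cycle.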

There are two obvious solutions to this problem: either split the S-block into eight smaller ROMs that can be accessed in parallel, or make the S-block a non-ROM so that the array is expanded into a decoder block once for each access, giving eight decoders. The latter appears simplest, but it would result in eight 512-way decoders, which would be a very large implementation. Splitting the ROM is more likely to yield a useful result. The substitute function was therefore rewritten to use eight mini-ROMs:


    function substitute(data : vec48) return vec32 is
      --moods inline
      type S_block_type is
        array(0 to 63) of natural range 0 to 15;
      constant S_block0 : S_block_type := ( ... );
      --moods ROM
      ...
      constant S_block7 : S_block_type := ( ... );
    begin
      --moods ROM
      return std_logic_vector(to_unsigned(S_block0(to_integer(
               unsigned(data(1) & data(6) & data(2 to 5)))),4)) &
             ...
             std_logic_vector(to_unsigned(S_block7(to_integer(
               unsigned(data(43) & data(48) & data(44 to 47)))),4));
    end;
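
The address expression in each mini-ROM read follows the usual DES S-box convention: the first and last bits of a 6-bit group select one of four rows, and the middle four bits select one of sixteen columns. The helper below is only an illustrative sketch. The package name sbox_index_pkg, the subtype vec6 and the function sbox_index are hypothetical and do not appear in the original code, and the sketch assumes that each 64-entry table is stored row by row.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package (names are assumptions, not from the source).
package sbox_index_pkg is
  subtype vec6 is std_logic_vector(1 to 6);
  function sbox_index(group6 : vec6) return natural;
end package sbox_index_pkg;

package body sbox_index_pkg is
  function sbox_index(group6 : vec6) return natural is
    variable row : natural range 0 to 3;
    variable col : natural range 0 to 15;
  begin
    -- Outer bits (1 and 6) form the row number.
    row := to_integer(unsigned'(group6(1) & group6(6)));
    -- Inner bits (2 to 5) form the column number.
    col := to_integer(unsigned(group6(2 to 5)));
    -- Assumed row-major layout: 16 entries per row.
    return row * 16 + col;
  end function sbox_index;
end package body sbox_index_pkg;
```

For the first group, sbox_index(data(1 to 6)) gives the same value as to_integer(unsigned(data(1) & data(6) & data(2 to 5))), which is the address used for S_block0 in the listing.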

The rewritten substitute function was resynthesized and resulted in the control graph shown in Figure 19.3. The inner loop was found to have been reduced to two states, and examination of the last state confirmed that all of the S-block substitutions were being carried out in the single state c4. The key transformations were still split across the two inner states c3 and c4.

One interesting side-effect of this optimization is that it is also a smaller design. MOODS predicts that this design has the area and delay characteristics shown in Table 19.1 in the line labeled (2).

Figure 19.3 Control state machine for optimized S-blocks.

Optimizing the Key Transformations

Examination of the two control states in the main loop, both of which contain key transformations, showed that each was performing ROM access and rotate operations. Examination of the original key_rotate function showed that the shift distance ROMs are accessed twice per call, so this turned out to be exactly the same problem as with the S-block ROM. Since ROMs are synchronous, they can only be accessed once per cycle, and this forces at least two cycles to be used for the rotate. To solve this, the function can be rewritten to access the ROMs only once per call:


    if encrypt = 1 then
      distance := encrypt_shift_distance(round);
      result :=
        vec28(unsigned(key(1 to 28)) rol distance) &
        vec28(unsigned(key(29 to 56)) rol distance);
    else
      distance := decrypt_shift_distance(round);
      result :=
        vec28(unsigned(key(1 to 28)) ror distance) &
        vec28(unsigned(key(29 to 56)) ror distance);
    end if;
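
The difference between two ROM accesses and one can be sketched as follows. The original key_rotate function is not shown in the text, so the "two reads" form below is a guess at the kind of code that causes the problem, not the author's code. The package and function names are hypothetical, only the encrypt direction is shown, and the table holds the standard DES left-shift schedule, which may differ from the contents of the author's ROM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package for illustration; names are assumptions.
package key_rotate_sketch is
  subtype vec28 is std_logic_vector(1 to 28);
  subtype vec56 is std_logic_vector(1 to 56);
  subtype round_type is natural range 0 to 15;
  function rotate_two_reads(key : vec56; round : round_type) return vec56;
  function rotate_one_read (key : vec56; round : round_type) return vec56;
end package key_rotate_sketch;

package body key_rotate_sketch is
  type distance_rom is array (round_type) of natural range 0 to 2;
  -- Standard DES left-shift schedule (assumed, not taken from the source).
  constant encrypt_shift_distance : distance_rom :=
    (1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1);

  -- Assumed problem form: the table is referenced once per key half,
  -- which a synthesis tool may schedule as two separate ROM accesses.
  function rotate_two_reads(key : vec56; round : round_type) return vec56 is
  begin
    return vec28(unsigned(key(1 to 28))  rol encrypt_shift_distance(round)) &
           vec28(unsigned(key(29 to 56)) rol encrypt_shift_distance(round));
  end function rotate_two_reads;

  -- Single-read form: the table is read once into a variable and the
  -- variable is reused for both halves, as in the listing above.
  function rotate_one_read(key : vec56; round : round_type) return vec56 is
    variable distance : natural range 0 to 2;
  begin
    distance := encrypt_shift_distance(round);
    return vec28(unsigned(key(1 to 28))  rol distance) &
           vec28(unsigned(key(29 to 56)) rol distance);
  end function rotate_one_read;
end package body key_rotate_sketch;
```

Both functions return the same value for any key and round; they differ only in how many times the table is referenced, which is what matters when the table is mapped to a synchronous ROM.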

The rewritten key_rotate function was resynthesized and resulted in the control graph shown in Figure 19.4. The inner loop was found to have been reduced to one state (c3) containing both the key and data transformations, which are repeated 16 times. As before, states c1 and c2 implement the input handshake.

With this optimization, the target of one clock cycle per round of the core was achieved. MOODS predicts that this design has the area and delay characteristics shown in Table 19.1 in the line labeled (3).

Figure 19.4 Control state machine for optimized key rotate.
