This website uses cookies. By using this site, you consent to the use of cookies. For more information, please take a look at our Privacy Policy.
Home > FPGA Technical Tutorials > Design Recipes for FPGAs Using Verilog and VHDL > Secure Systems > The Data Encryption Standard (DES)

TABLE OF CONTENTS

Xilinx FPGA FPGA Forum

The Data Encryption Standard (DES)

FONT SIZE : AAA

Introduction 

The Data Encryption Standard (DES) was adopted by the National Institute of Standards and Technology (NIST) in 1977 as the Federal Information Processing Standards 46 (FIPS PUB 46). As mentioned previously, the algorithm operates on plaintext blocks of 64 bits and the key size is 56 bits. By 1999, NIST had decreed that DES was no longer secure and should only be used for legacy systems and that triple DES should be used instead. As will be described later, DES has since been superceded by the Advanced Encryption Standards (AES). The coarse structure (overall architecture) of DES is shown in Figure 10.3. 

The center section (where the main repetition occurs) is called the fine structure and is where the details of the encryption take place. This fine structure is detailed in Figure 10.4.

DES coarse structurepng

Figure 10.3 

DES coarse structure.

DES fine structurepng

Figure 10.4 

DES fifine structure.

The fine structure of DES consists of several important functional blocks: 

• Initial permutation Fixed, known mapping 64-64 bits. 

• Key transformations Circular L shift of keys by A(i) bits in round (A(i) is known and fixed). 

• Compression Permutation Fixed known subset of 56-bit input mapped onto 48-bit output. 

• Expansion permutation 32-bit data shuffled and mapped (both operations fixed and known) onto 48 bits by duplicating 16 input bits. This makes diffusion quicker. 

Another significant section of the algorithm is the substitution or S-box. The nonlinear aspect of the cipher is vital in cryptography. In DES the eight S boxes each contain four different (fixed and known) 4:4 input maps. These are selected by the extra bits created in the expansion box. The S boxes are structured as shown in Figure 10.5.

The final part of the DES structure is the key generation architecture for the individual round keys and this is given in Figure 10.6.

DES S-box architecturepng

Figure 10.5 

DES S-box architecture.

DES round key generationpng

Figure 10.6 

DES round key generation.

The remaining functional block is the initial and final permutation. The initial permutation (P-Box) is a 32:32 fixed, known bit permutation. The final permutation is the inverse of the initial permutation. The initial permutation is defined using the following table:

3.png

DES VHDL Implementation 

DES can be implemented in VHDL using a structural or a functional approach. As has been discussed previously, there are advantages to both methods; however, the DES algorithm is tied implicitly to the structure, so a structural approach will give an efficient implementation. 

Implementing the initial permutation in VHDL requires a 64-bit input vector and a 64-bit output vector. We can create this in VHDL with an entity that defines an input and output std_logic vector as follows. The architecture is simply the assignment of bits from input to output according to the initial permutation table previously defined.

1 library ieee;

2 use ieee.std_logic_1164.all; 34 entity des_ip is port

5 ( 6 d : in std_logic_vector(1 to 64);

7 y : out std_logic_vector(1 to 64)

8 );

9 end des_ip;

10

11 architecture behavior of des_ip is

12 begin

13 y(1)<=d(58); y(2)<=d(50); y(3)<=d(42); y(4)<=d(34);

14 y(5)<=d(26); y(6)<=d(18); y(7)<=d(10); y(8)<=d(2);

15 y(9)<=d(60); y(10)<=d(52); y(11)<=d(44); y(12)<=d(36);

16 y(13)<=d(28); y(14)<=d(20); y(15)<=d(12); y(16)<=d(4);

17 y(17)<=d(62); y(18)<=d(54); y(19)<=d(46); y(20)<=d(38);

18 y(21)<=d(30); y(22)<=d(22); y(23)<=d(14); y(24)<=d(6);

19 y(25)<=d(64); y(26)<=d(56); y(27)<=d(48); y(28)<=d(40);

20 y(29)<=d(32); y(30)<=d(24); y(31)<=d(16); y(32)<=d(8);

21 y(33)<=d(57); y(34)<=d(49); y(35)<=d(41); y(36)<=d(33);

22 y(37)<=d(25); y(38)<=d(17); y(39)<=d(9); y(40)<=d(1);

23 y(41)<=d(59); y(42)<=d(51); y(43)<=d(43); y(44)<=d(35);

24 y(45)<=d(27); y(46)<=d(19); y(47)<=d(11); y(48)<=d(3);

25 y(49)<=d(61); y(50)<=d(53); y(51)<=d(45); y(52)<=d(37);

26 y(53)<=d(29); y(54)<=d(21); y(55)<=d(13); y(56)<=d(5);

27 y(57)<=d(63); y(58)<=d(55); y(59)<=d(47); y(60)<=d(39);

28 y(61)<=d(31); y(62)<=d(23); y(63)<=d(15); y(64)<=d(7);

29 end behavior;

As this function is purely combinatorial we don’t need to have a register (i.e., clocked input) on this model, although we could implement that if required using a simple process. 

As shown in the previous description of the expansion function, we need to take a word consisting of 32 bits and expand it to 48 bits. This requires a translation table as shown below. Notice that there are duplicates in the cell which means that you only need 32 input bits to obtain 48 output bits.

4.png

We can use a VHDL model similar to the initial permutation function, except that in this case there are 32 input bits and 48 output bits. Notice that some of the input bits are repeated, giving a straightforward expansion function.

1 library ieee;

2 use ieee.std_logic_1164.all; 34 entity des_e is port

5 ( 6 d : in std_logic_vector(1 to 32);

7 y : out std_logic_vector(1 to 48)

8 );

9 end des_e;

The architecture is simply the assignment of bits from input to output according to the initial permutation table previously defined.

1 architecture behavior of des_e is

2 begin

3 y(1)<=d(32); y(2)<=d(1); y(3)<=d(2); y(4)<=d(3);

4 y(5)<=d(4); y(6)<=d(5); y(7)<=d(4); y(8)<=d(5);

5 y(9)<=d(6); y(10)<=d(7); y(11)<=d(8); y(12)<=d(9);

6 y(13)<=d(8); y(14)<=d(9); y(15)<=d(10); y(16)<=d(11);

7 y(17)<=d(12); y(18)<=d(13); y(19)<=d(12); y(20)<=d(13);

8 y(21)<=d(14); y(22)<=d(15); y(23)<=d(16); y(24)<=d(17);

9 y(25)<=d(16); y(26)<=d(17); y(27)<=d(18); y(28)<=d(19);

10 y(29)<=d(20); y(30)<=d(21); y(31)<=d(20); y(32)<=d(21);

11 y(33)<=d(22); y(34)<=d(23); y(35)<=d(24); y(36)<=d(25);

12 y(37)<=d(24); y(38)<=d(25); y(39)<=d(26); y(40)<=d(27);

13 y(41)<=d(28); y(42)<=d(29); y(43)<=d(28); y(44)<=d(29);

14 y(45)<=d(30); y(46)<=d(31); y(47)<=d(32); y(48)<=d(1);

15 end behavior;

The final permutation block is the permutation marked (P) on the fine structure after the key function. This is a straightforward bit substitution function with 32 bits input and 32 bits output. The bit translation table is shown in the following table:

5.png

This is implemented in VHDL using exactly the same approach as the previous expansion and permutation functions as follows:

1 library ieee;

2 use ieee.std_logic_1164.all; 34 entity des_p is port

5 ( 6 d : in std_logic_vector(1 to 32);

7 y : out std_logic_vector(1 to 32)

8 );

9 end des_p;

The architecture is simply the assignment of bits from input to output according to the initial permutation table previously defined.

1 architecture behavior of des_p is

2 begin

3 y(1)<=d(16); y(2)<=d(7); y(3)<=d(20); y(4)<=d(21);

4 y(5)<=d(29); y(6)<=d(12); y(7)<=d(28); y(8)<=d(17);

5 y(9)<=d(1); y(10)<=d(15); y(11)<=d(23); y(12)<=d(26);

6 y(13)<=d(5); y(14)<=d(18); y(15)<=d(31); y(16)<=d(10);

7 y(17)<=d(2); y(18)<=d(8); y(19)<=d(24); y(20)<=d(14);

8 y(21)<=d(32); y(22)<=d(27); y(23)<=d(3); y(24)<=d(9);

9 y(25)<=d(19); y(26)<=d(13); y(27)<=d(30); y(28)<=d(6);

10 y(29)<=d(22); y(30)<=d(11); y(31)<=d(4); y(32)<=d(25);

11 end behavior;

The nonlinear part of the DES algorithm is the S box. This is a set of 6->4 bit transformations that reduce the 48 bits of the expanded word in the DES f function to the 32 bits for the next round. The required row and column are obtained from the data passed into the S box. The data into the S box is a 6-bit binary word. The row is obtained from 2.b1 + b6 and the column is obtained from b2b3b4b5. For example, S(011011) would give a row of 01 (1) and a column of 1101 (13). For S8 this would result in a value returning of 1110 (14).

The basic S-box entity can therefore be constructed using the following VHDL:

1 library ieee;

2 use ieee.std_logic_1164.all; 3 entity des_sbox is

4 port ( 5 d : in std_logic_vector (1 to 6);

6 y : out std_logic_vector (1 to 4)

7 );

8 end entity des_sbox;

One approach is to define the row and column from the input D word and then calculate the output Y word from that using a lookup table approach or minimize the logic as a truth table. The basic architecture could then look something like this:

1 architecture behavior of sbox is

2 signal r : std_logic_vector (1 to 2);

3 signal c : std_logic_vector (3 to 6);

4 begin

5 r <= d ( 1 to 2);

6 c <= d (3 to 6 );

7 −− the look up table or logic goes here

8 end;

Another approach is to define a simple lookup table with the input d as the unique address and the output y stored in the memory; this is exactly the same as a ROM, so the input is defined as an unsigned integer to look up the required value. In this case the memory is defined in exactly the same way as the ROM separately in this book. 

The S-box substitutions are given in the table following and the VHDL can either use the lookup table approach to store the address of each substitution, or logic can be used to decode the correct output. 

In order to use this table, the appropriate S box is selected and then the 2 bits of the row select the appropriate row and the same for the column. For example, for S box S1, if the row is 3

6.png

(11) and the Column is 10 (1010) then the output can be read off as 3 (0011). This can be coded in VHDL using nested case statements as follows:

1 case row is

2 when 0 =>

3 case column is

4 when 0 => y <= 14;

5 when 1 => y <= 4;

6 −− and so on

7 end case; 8 when 1 =>

9 case column is

10 −− and the lookup goes here

11 End case

12 −− and so on for all the rows

13 end case;

Obviously this is quite cumbersome, but also very easy to code automatically using a simple code generator and offers the possibility of the synthesis tool carrying out logic optimization and providing a much more efficient implementation than a memory block.

DES Verilog Implementation 

DES can also be implemented in Verilog using a structural or a functional approach. As has been discussed previously, there are advantages to both methods; however, the DES algorithm is tied implicitly to the structure, so a structural approach will give an efficient implementation. 

Implementing the initial permutation in Verilog requires a 64-bit input vector and a 64-bit output vector. In Verilog this is very easy to create by simply specifying the size of the input and output accordingly. In this combinatorial example (the same as the VHDL example previously in this chapter), we have not used an enable signal, but this would be easy to add, and then of course we can implement the output with a high impedance (z) state for unknown input values.

1 module desip (y,d);

2 / Define the IO

3 input [64:1] d;

4 output [64:1] y;

56 / Define the output as a register

7 reg [64:1] y;

89 / Assign the permutations on any change in d

10 always @ d

11 begin

12 y[1]<=d[58]; y[2]<=d[50]; y[3]<=d[42]; y[4]<=d[34];

13 y[5]<=d[26]; y[6]<=d[18]; y[7]<=d[10]; y[8]<=d[2];

14 y[9]<=d[60]; y[10]<=d[52]; y[11]<=d[44]; y[12]<=d[36];

15 y[13]<=d[28]; y[14]<=d[20]; y[15]<=d[12]; y[16]<=d[4];

16 y[17]<=d[62]; y[18]<=d[54]; y[19]<=d[46]; y[20]<=d[38];

17 y[21]<=d[30]; y[22]<=d[22]; y[23]<=d[14]; y[24]<=d[6];

18 y[25]<=d[64]; y[26]<=d[56]; y[27]<=d[48]; y[28]<=d[40];

19 y[29]<=d[32]; y[30]<=d[24]; y[31]<=d[16]; y[32]<=d[8];

20 y[33]<=d[57]; y[34]<=d[49]; y[35]<=d[41]; y[36]<=d[33];

21 y[37]<=d[25]; y[38]<=d[17]; y[39]<=d[9]; y[40]<=d[1];

22 y[41]<=d[59]; y[42]<=d[51]; y[43]<=d[43]; y[44]<=d[35];

23 y[45]<=d[27]; y[46]<=d[19]; y[47]<=d[11]; y[48]<=d[3];

24 y[49]<=d[61]; y[50]<=d[53]; y[51]<=d[45]; y[52]<=d[37];

25 y[53]<=d[29]; y[54]<=d[21]; y[55]<=d[13]; y[56]<=d[5];

26 y[57]<=d[63]; y[58]<=d[55]; y[59]<=d[47]; y[60]<=d[39];

27 y[61]<=d[31]; y[62]<=d[23]; y[63]<=d[15]; y[64]<=d[7];

28 end

29 endmodule

As this function is purely combinatorial we don’t need to have a register (i.e., clocked input) on this model, although we could implement that if required using a simple addition of a clock. 

As shown in the previous description of the expansion function, we need to take a word consisting of 32 bits and expand it to 48 bits. This requires a translation table as shown below. Notice that there are duplicates in the cell, which means that you only need 32 input bits to obtain 48 output bits.

7.png

We can use a Verilog model similar to the initial permutation function, except that in this case there are 32 input bits and 48 output bits. Notice that some of the input bits are repeated, giving a straightforward expansion function.


1 module dese (y,d);

2 / Define the IO

3 input [32:1] d;

4 output [48:1] y;

56 / Define the output as a register

7 reg [48:1] y;

89 / Assign the permutations on any change in d

10 always @ d

11 begin

12 y[1]<=d[32]; y[2]<=d[1]; y[3]<=d[2]; y[4]<=d[3];

13 y[5]<=d[4]; y[6]<=d[5]; y[7]<=d[4]; y[8]<=d[5];

14 y[9]<=d[6]; y[10]<=d[7]; y[11]<=d[8]; y[12]<=d[9];

15 y[13]<=d[8]; y[14]<=d[9]; y[15]<=d[10]; y[16]<=d[11];

16 y[17]<=d[12]; y[18]<=d[13]; y[19]<=d[12]; y[20]<=d[13];

17 y[21]<=d[14]; y[22]<=d[15]; y[23]<=d[16]; y[24]<=d[17];

18 y[25]<=d[16]; y[26]<=d[17]; y[27]<=d[18]; y[28]<=d[19];

19 y[29]<=d[20]; y[30]<=d[21]; y[31]<=d[20]; y[32]<=d[21];

20 y[33]<=d[22]; y[34]<=d[23]; y[35]<=d[24]; y[36]<=d[25];

21 y[37]<=d[24]; y[38]<=d[25]; y[39]<=d[26]; y[40]<=d[27];

22 y[41]<=d[28]; y[42]<=d[29]; y[43]<=d[28]; y[44]<=d[29];

23 y[45]<=d[30]; y[46]<=d[31]; y[47]<=d[32]; y[48]<=d[1];

24 end

25 endmodule

The final permutation block is the permutation marked (P) on the fine structure after the key function. This is a straightforward bit substitution function with 32 bits input and 32 bits output. The bit translation is shown in the following table:

8.png

This is implemented in Verilog using exactly the same approach as the previous expansion and permutation functions as follows, with the behavior simply the assignment of bits from input to output according to the initial permutation table previously defined.

1 module desp (y,d);

2 / Define the IO

3 input [32:1] d;

4 output [32:1] y;

56 / Define the output as a register

7 reg [32:1] y;

8

9 / Assign the permutations on any change in d

10 always @ d

11 begin

12 y[1]<=d[16]; y[2]<=d[7]; y[3]<=d[20]; y[4]<=d[21];

13 y[5]<=d[29]; y[6]<=d[12]; y[7]<=d[28]; y[8]<=d[17];

14 y[9]<=d[1]; y[10]<=d[15]; y[11]<=d[23]; y[12]<=d[26];

15 y[13]<=d[5]; y[14]<=d[18]; y[15]<=d[31]; y[16]<=d[10];

16 y[17]<=d[2]; y[18]<=d[8]; y[19]<=d[24]; y[20]<=d[14];

17 y[21]<=d[32]; y[22]<=d[27]; y[23]<=d[3]; y[24]<=d[9];

18 y[25]<=d[19]; y[26]<=d[13]; y[27]<=d[30]; y[28]<=d[6];

19 y[29]<=d[22]; y[30]<=d[11]; y[31]<=d[4]; y[32]<=d[25];

20 end

21 endmodule

The nonlinear part of the DES algorithm is the S box. This is a set of 6->4 bit transformations that reduce the 48 bits of the expanded word in the DES f function to the 32 bits for the next round. The required row and column are obtained from the data passed into the S box. The data into the S box is a 6-bit binary word. The row is obtained from 2.b1 + b6 and the column is obtained from b2b3b4b5. For example, S(011011) would give a row of 01 (1) and a column of 1101 (13). For S8 this would result in a value returning of 1110 (14). 

The basic S-box model can therefore be constructed using the following Verilog, with one approach to define the row and column from the input d word and then calculate the output Y word from that using a lookup table approach or minimize the logic as a truth table. The basic architecture could then look something like this:

1 module sbox (y,d);

2 / Define the IO

3 input [6:1] d;

4 output [4:1] y;

56 / Define the output as a register

7 reg [4:1] y;

89 / Assign the permutations on any change in d

10 reg [2:1] r;

11 reg [4:1] c;

12

13 always @ d

14 begin

15 r <= d[2:1];

16 c <= d[6:3];

17

18 / The remainder of the SBOX logic goes here

19

20 end

21 endmodule

Another approach is to define a simple lookup table with the input d as the unique address and the output y stored in the memory; this is exactly the same as a ROM, so the input is defined as an unsigned integer to look up the required value. In this case the memory is defined in exactly the same way as the ROM is defined separately in this book.

In order to use the table, the appropriate S box is selected and then the 2 bits of the row select the appropriate row and the same for the column. For example, for Sbox S1, if the row is 3 (11) and the Column is 10 (1010) then the output can be read off as 3 (0011). This can be coded in VHDL using nested case statements as follows:

1 case (row)

2 0: case (column)

3 0: y=14;

4 1: y=4;

5 / and so on

6 endcase

7 1: case (column)

8 0: y=0;

9 1: y=15;

10 / and so on

11 endcase

12 / and so on for all the rows

13 endcase

Obviously this is quite cumbersome, but also very easy to code automatically using a simple code generator and offers the possibility of the synthesis tool carrying out logic optimization and providing a much more efficient implementation than a memory block.

Validation of DES 

In order to validate the implementation of DES, a set of test vectors can be used (i.e., plaintext/ciphertext pairs to ensure that the correct processing is taking place). A suitable set of test vectors is given as:

9.png

In this case the key to be used is 0123456789ABCDEF. 

Each of the groups of hexadecimal characters is represented by 7-bit ASCII and adding an extra bit.


  • XC2V2000-4FG676C

    Manufacturer:Xilinx

  • FPGA Virtex-II Family 2M Gates 24192 Cells 650MHz 0.15um Technology 1.5V 676-Pin FBGA
  • Product Categories:

    Lifecycle:Obsolete -

    RoHS:

  • XC4028XLA-08BG256C

    Manufacturer:Xilinx

  • FPGA XC4000XLA Family 28K Gates 2432 Cells 263MHz 0.35um Technology 3.3V 256-Pin BGA
  • Product Categories:

    Lifecycle:Obsolete -

    RoHS: No RoHS

  • XCV300-4BG432I

    Manufacturer:Xilinx

  • FPGA Virtex Family 322.97K Gates 6912 Cells 250MHz 0.22um Technology 2.5V 432-Pin BGA
  • Product Categories: Voltage regulator tube

    Lifecycle:Obsolete -

    RoHS: No RoHS

  • XC5VLX110-2FF676C

    Manufacturer:Xilinx

  • FPGA Virtex-5 LX Family 110592 Cells 65nm Technology 1V 676-Pin FCBGA
  • Product Categories: FPGAs (Field Programmable Gate Array)

    Lifecycle:Active Active

    RoHS: No RoHS

  • XC5VLX110-2FFG1760I

    Manufacturer:Xilinx

  • FPGA Virtex-5 LX Family 110592 Cells 65nm Technology 1V 1760-Pin FCBGA
  • Product Categories: FPGAs (Field Programmable Gate Array)

    Lifecycle:Active Active

    RoHS:

Need Help?

Support

If you have any questions about the product and related issues, Please contact us.