A useful way to introduce embedded processors is to consider a generic microcontroller in the context of an FPGA platform. Take the simple example of a generic 8-bit microcontroller shown in Figure 8.1.
As can be seen from Figure 8.1, the microcontroller is a general-purpose microprocessor with a simple clock (clk) and reset (clr), and three 8-bit ports (A, B, and C). Within the microcontroller itself, the following basic elements are needed:
1. A control unit: this is required to manage the clock and reset of the processor, manage the data flow and instruction set flow, and control the port interfaces. There will also need to be a program counter (PC).
2. An ALU: a microcontroller will need to be able to carry out at least some rudimentary processing which is carried out in the ALU (Arithmetic Logic Unit).
3. An Address Bus.
4. A Data Bus.
5. Internal Registers.
6. An instruction decoder.
7. A ROM to hold the program.
While each of these individual elements (1-6) can be implemented simply enough using a standard FPGA, the ROM presents a specific difficulty. If we implement a ROM as a set of registers, then obviously this will be hugely inefficient in an FPGA architecture. However, in most modern FPGA platforms, there are blocks of RAM on the FPGA that can be accessed and it makes a lot of sense to design a RAM block for use as a ROM by initializing it with the ROM values on reset and then using that to run the program.
This aspect of the embedded core raises an important issue, which is the reduction in efficiency of using embedded rather than dedicated cores. There is usually a compromise involved and in this case it is that the ROM needs to be implemented in a different manner, in this case with a hardware penalty. The second issue is what type of memory core to use.
In an FPGA RAM, the memory can usually be organized in a variety of configurations to vary the depth (number of memory addresses) and the width (width of the data bus). For example, a 512-address RAM block with an 8-bit data width holds the same number of bits as a 256-address RAM block with a 16-bit data width. If the equivalent microcontroller ROM is, say, 12 bits wide and 256 words deep, then we can use a 256 × 16 RAM block and ignore the top 4 bits. The resulting embedded microcontroller core architecture could be of the form shown in Figure 8.2.
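The depth and width trade-off can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, used here purely for illustration; the function names are hypothetical, and the 4096-bit block size is simply what a 512 × 8 organization implies rather than a figure for any particular FPGA.

```python
# Illustrative sketch only: the block size and helper names are assumptions.
BLOCK_BITS = 512 * 8  # total capacity implied by a 512 x 8 organization


def depth_for_width(width_bits, block_bits=BLOCK_BITS):
    """Number of addresses available when each word is width_bits wide."""
    return block_bits // width_bits


def store_rom_word(word12):
    """Place a 12-bit ROM word in a 16-bit RAM word; the top 4 bits stay 0."""
    if not 0 <= word12 < (1 << 12):
        raise ValueError("ROM word must fit in 12 bits")
    return word12 & 0x0FFF


def read_rom_word(ram_word16):
    """Recover the 12-bit ROM word by ignoring the top 4 bits."""
    return ram_word16 & 0x0FFF


print(depth_for_width(8))          # 512
print(depth_for_width(16))         # 256
print(hex(read_rom_word(0xFABC)))  # 0xabc
```

Masking on read means that whatever ends up in the unused top 4 bits never reaches the instruction decoder.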
When we program a microprocessor of any type, there are three different ways of representing the code that will run on the processor. These are machine code (1s and 0s), assembler (low level instructions such as LOAD, STORE), and high level code (such as C, Fortran, or Pascal). Regardless of the language used, the code will always be compiled or assembled into machine code at the lowest level for programming into memory. High level code (e.g., C) is compiled and assembler code is assembled (as the name suggests) into machine code for the specific platform.
Clearly a detailed explanation of a compiler is beyond the scope of this book, but the same basic process can be seen in an assembler and this is useful to discuss in this context. Every processor has a basic Instruction Set which is simply the list of functions that can be run in a program on the processor. Take the simple example of the following pseudocode expression:
b = a + 2;
In this example, we are taking the variable a and adding the integer value 2 to it, and then storing the result in the variable b. In a processor, the use of a variable is simply a memory location that stores the value, and so to load a variable we use an assembler command as follows:
LOAD a
What is actually going on here? Whenever we retrieve a variable value from memory, the implication is that we are going to put the value of the variable in the register called the accumulator (ACC). The command “LOAD a” could be expressed in natural language as “LOAD the value of the memory location denoted by a into the accumulator register ACC.”
The next stage of the process is to add the integer value 2 to the accumulator. This is a simple matter, as instead of an address, the value is simply added to the current value stored in the accumulator. The assembly language command would be something like:
ADD #x02
Notice that we have used the x to denote a hexadecimal number. If we wished to add a variable, say called c, then the command would be the same, except that it would use the address c instead of the absolute number. The command would therefore be:
ADD c
Now we have the value of a+2 stored in the accumulator register (ACC). This could be stored in a memory location, or put onto a port (e.g., PORT A). It is useful to notice that for a number we use the key character # to indicate that we are adding the value and not using the argument as the address. In the pseudocode example, we are storing the result of the addition in the variable called b, so the command would be something like this:
STORE b
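The three assembler commands for b = a + 2 can be traced step by step. The sketch below uses Python only as a convenient notation; the dictionary standing in for data memory and the starting value of a are assumptions made for the example.

```python
# Illustrative sketch: a dictionary stands in for data memory, and the
# initial value of a is an arbitrary choice for the example.
memory = {"a": 5, "b": 0}
acc = 0  # the accumulator register (ACC)

acc = memory["a"]          # LOAD a   : ACC <- memory[a]
acc = (acc + 0x02) & 0xFF  # ADD #x02 : ACC <- ACC + 2, kept to 8 bits
memory["b"] = acc          # STORE b  : memory[b] <- ACC

print(memory["b"])  # 7
```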
While this is superficially a complete definition of the instruction set requirements, there is one specific design detail that has to be decided on for any processor. This is the number of instructions and the data bus size. If we have a set of instructions with the number of instructions denoted by N, then the number of bits in the opcode (n) must conform to the following rule:
$N \le 2^{n}$ (8.1)
In other words, the number of bits provides the number of unique different codes that can be defined, and this defines the size of the instruction set possible. For example, if n = 3, then with 3 bits there are 8 possible unique opcodes, and so the maximum size of the instruction set is 8.
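Turning the relationship around gives the smallest opcode width that provides enough unique codes for a given instruction count. A minimal sketch follows, with a hypothetical function name:

```python
def opcode_bits(num_instructions):
    """Smallest n such that n bits give at least num_instructions codes."""
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    # (N - 1).bit_length() is the smallest n with 2**n >= N for N >= 1.
    return (num_instructions - 1).bit_length()


print(opcode_bits(8))   # 3
print(opcode_bits(9))   # 4
print(opcode_bits(16))  # 4
```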
The standard method of executing a program in a processor is to store the program in memory and then follow a strict sequence of events to carry out the instructions. The first stage is to use the program counter to select the current program line; this calls up the next command from memory in the correct order, and the instruction can then be loaded into the appropriate register for execution. This is called the fetch-execute cycle.
What is happening at this point? First, the contents of the program counter (PC) are loaded into the memory address register (MAR). The data in that memory location are then retrieved and loaded into the memory data register (MDR). The contents of the MDR can then be transferred into the instruction register (IR). In a basic processor, the PC can then be incremented by one (or in fact this could take place immediately after the PC has been loaded into the MAR). Once the opcode (and arguments if appropriate) are loaded, the instruction can be executed. Essentially, each instruction has its own state machine and control path, which is linked to the instruction register (IR) and a sequencer that defines all the control signals required to move the data correctly around the memory and registers for that instruction. We will discuss registers in the next section, but in addition to the program counter (PC), instruction register (IR), and accumulator (ACC) mentioned already, we require at least two memory registers: the Memory Data Register (MDR) and the Memory Address Register (MAR).
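The fetch half of the cycle can be sketched as a sequence of register transfers. This is a behavioral illustration in Python rather than a hardware description; the 8-bit program counter and the program words are placeholder assumptions, not real machine code.

```python
def fetch(regs, program_memory):
    """One fetch: PC -> MAR, memory[MAR] -> MDR, MDR -> IR, then PC + 1."""
    regs["MAR"] = regs["PC"]
    regs["MDR"] = program_memory[regs["MAR"]]
    regs["IR"] = regs["MDR"]
    regs["PC"] = (regs["PC"] + 1) & 0xFF  # assumed 8-bit program counter
    return regs["IR"]


# Placeholder instruction words, chosen only to make the trace visible.
program_memory = [0x1005, 0x2002, 0x3006]
regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0}

first = fetch(regs, program_memory)
second = fetch(regs, program_memory)
print(hex(first), hex(second), regs["PC"])  # 0x1005 0x2002 2
```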
For example, consider the simple command LOAD a from the previous example. What is required to actually execute this instruction? First, the opcode is decoded, which identifies the command as a LOAD. The next stage is to identify the operand. As the command has not used the # symbol to denote an absolute number, the argument is the address of the variable a. The next stage, therefore, is to load the value in location a into the MDR, by setting MAR = a and then retrieving the value of a from the RAM. This value is then transferred to the accumulator (ACC).
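The execute phase of LOAD can be sketched in the same register-transfer style. In this illustration the address chosen for a, the 256-word RAM, and the immediate flag passed to the function are all assumptions; the text does not specify how the decoder signals the # prefix.

```python
def execute_load(regs, ram, operand, immediate):
    """Execute LOAD: either ACC <- operand, or ACC <- ram[operand] via MAR/MDR."""
    if immediate:
        regs["ACC"] = operand & 0xFF      # LOAD #n: the operand is the value
    else:
        regs["MAR"] = operand             # MAR <- address of the variable
        regs["MDR"] = ram[regs["MAR"]]    # MDR <- RAM[MAR]
        regs["ACC"] = regs["MDR"]         # ACC <- MDR
    return regs["ACC"]


A_ADDR = 0x10  # hypothetical address assigned to the variable a
ram = [0] * 256
ram[A_ADDR] = 42
regs = {"ACC": 0, "MAR": 0, "MDR": 0}

print(execute_load(regs, ram, A_ADDR, immediate=False))  # 42
```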
The design of the registers partly depends on whether we wish to clone a “real” device or create a modified version that has more custom behavior. In either case there are some mandatory registers that must be defined as part of the design. We can assume that we need an accumulator (ACC), a program counter (PC), and the three input/output ports (PORTA, PORTB, PORTC). We also need the instruction register (IR), the Memory Address Register (MAR), and the Memory Data Register (MDR).
In addition to the data for the ports, we need to have a definition of the port direction and this requires three more registers for managing the tristate buffers into the data bus to and from the ports (DIRA, DIRB, DIRC). In addition to this, we can define a number (essentially arbitrary) of registers for general purpose usage. In the general case the naming, order, and numbering of registers does not matter; however, if we intend to use a specific device as a template, and perhaps use the same bit code, then it is vital that the registers are configured in exactly the same way as the original device and in the same order.
In this example, we do not have a base device to worry about, and so we can define the general purpose registers (24 in all) with the names REG0 to REG23. In conjunction with the general purpose registers, we need to have a small decoder to select the correct register and put the contents onto the data bus (F).
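The register decoder can be pictured as a select value that picks one of the 24 registers and places its contents on the data bus. The sketch below is a behavioral illustration with hypothetical names; the 5-bit select width follows from needing to distinguish 24 registers.

```python
NUM_GP_REGS = 24                               # REG0 to REG23
SELECT_BITS = (NUM_GP_REGS - 1).bit_length()   # 5 bits address 24 registers


def select_register(reg_file, sel):
    """Model of the decoder: return the value driven onto the data bus."""
    if not 0 <= sel < NUM_GP_REGS:
        raise ValueError("register select out of range")
    return reg_file[sel]


reg_file = [0] * NUM_GP_REGS
reg_file[5] = 0xA7                        # write a test value into REG5
data_bus = select_register(reg_file, 5)   # decoder puts REG5 on the bus

print(SELECT_BITS, hex(data_bus))  # 5 0xa7
```

Because 5 select bits allow 32 codes, 8 of them are unused here, which is why the sketch rejects out-of-range values explicitly.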
In order for the device to operate as a processor, we must define some basic instructions in the form of an instruction set. For this simple example we can define some very basic instructions that will carry out basic program elements, ALU functions, memory functions. These are summarized in the following list of instructions:
• LOAD arg This command loads an argument into the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address. Examples: LOAD #01 LOAD abc
• STORE arg This command stores the contents of the accumulator into memory. If the argument has the prefix # then it is the absolute address, otherwise it is the name of a variable whose memory address is used. Examples: STORE #01 STORE abc
• ADD arg This command adds an argument to the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address. Examples: ADD #01 ADD abc
• NOT This command carries out the NOT function on the accumulator.
• AND arg This command ands an argument with the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address. Examples: AND #01 AND abc
• OR arg This command ors an argument with the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address. Examples: OR #01 OR abc
• XOR arg This command xors an argument with the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address. Examples: XOR #01 XOR abc
• INC This command carries out an increment by one on the accumulator.
• SUB arg This command subtracts an argument from the accumulator. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address. Examples: SUB #01 SUB abc
• BRANCH arg This command allows the program to branch to a specific point in the program. This may be very useful for looping and program flow. If the argument has the prefix # then it is the absolute number, otherwise it is the address and this is taken from the relevant memory address. Examples: BRANCH #01 BRANCH abc
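The behavior of the ten instructions can be captured in a small interpreter. This is a sketch under stated assumptions, not a model of the final hardware: each instruction is a (mnemonic, argument) pair, an integer argument stands for the # form, a string argument names a variable, the accumulator is kept to 8 bits, and BRANCH is treated as unconditional. The max_steps guard is an addition to stop runaway loops.

```python
def run(program, memory, max_steps=100):
    """Interpret (mnemonic, arg) pairs; int arg = '#' form, str arg = variable."""
    acc, pc = 0, 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, arg = program[pc]
        pc += 1
        # Operand value: the number itself, or the contents of the variable.
        value = arg if isinstance(arg, int) else memory.get(arg, 0)
        if op == "LOAD":
            acc = value
        elif op == "STORE":
            memory[arg] = acc      # arg is the destination, not a value
        elif op == "ADD":
            acc += value
        elif op == "SUB":
            acc -= value
        elif op == "AND":
            acc &= value
        elif op == "OR":
            acc |= value
        elif op == "XOR":
            acc ^= value
        elif op == "NOT":
            acc = ~acc
        elif op == "INC":
            acc += 1
        elif op == "BRANCH":
            pc = value
        else:
            raise ValueError("unknown instruction: " + op)
        acc &= 0xFF                # keep the accumulator to 8 bits
    return acc


memory = {"a": 5}
program = [("LOAD", "a"), ("ADD", 2), ("STORE", "b")]
print(run(program, memory), memory["b"])  # 7 7
```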
In this simple instruction set, there are 10 separate instructions. This implies, from the rule given in equation (8.1) previously in this chapter, that we need at least 4 bits to describe each of the instructions given in the list above. Given that we wish to have 8 bits for each data word, the program memory must be stored in a ROM with words at least 12 bits wide. In order to cater for a greater number of instructions, and also to allow different addressing modes to be specified (such as the difference between absolute numbers and variables), we can therefore suggest a 16-bit system for the program memory.
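One possible way to pack an instruction into a 16-bit program word is sketched below. The field layout (opcode in bits 15 to 12, an immediate flag in bit 8, operand in bits 7 to 0) and the numeric opcode values are hypothetical placeholders chosen for illustration; they are not taken from the opcode table of this design.

```python
# Hypothetical opcode numbering: simply the order of the instruction list.
MNEMONICS = ["LOAD", "STORE", "ADD", "NOT", "AND",
             "OR", "XOR", "INC", "SUB", "BRANCH"]
OPCODES = {name: code for code, name in enumerate(MNEMONICS)}


def encode(mnemonic, operand=0, immediate=False):
    """Pack opcode (4 bits), immediate flag (1 bit), operand (8 bits)."""
    return (OPCODES[mnemonic] << 12) | (int(immediate) << 8) | (operand & 0xFF)


def decode(word):
    """Split a 16-bit word back into (opcode, immediate, operand)."""
    return (word >> 12) & 0xF, bool((word >> 8) & 1), word & 0xFF


word = encode("ADD", 0x02, immediate=True)
print(hex(word))     # 0x2102
print(decode(word))  # (2, True, 2)
```

With this layout, 3 of the 16 bits remain unused, which leaves room for further addressing modes or a larger instruction set.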
Notice that at this stage there are no definitions for port interfaces or registers. We can extend the model to handle this behavior later.
So far in the design of this simple microprocessor, we have not specified details beyond a fairly abstract structural description of the processor in terms of registers and busses. At this stage we have a decision about the implementation of the design with regard to the program and architecture.
One option is to take a program (written in assembly language) and simply convert this into a state machine that can easily be implemented in a VHDL model for testing out the algorithm. Using this approach, the program can be very simply modified and recompiled based on simple rules that restrict the code to the use of registers and techniques applicable to the processor in question. This can be useful for investigating and developing algorithms, but the result is more idealized than the final implementation: a processor plus memory configuration introduces control signals and memory access delays that a dedicated hardware design avoids.
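As an illustration of this first option, the three-instruction program for b = a + 2 can be written directly as a state machine with one state per instruction. The sketch uses Python to show the idea; the state names and the starting value of a are assumptions, and in VHDL the same structure would typically be a case statement inside a clocked process.

```python
def step(state, acc, memory):
    """Advance the hard-wired state machine by one clock tick."""
    if state == "S_LOAD":        # LOAD a
        return "S_ADD", memory["a"]
    if state == "S_ADD":         # ADD #x02
        return "S_STORE", (acc + 0x02) & 0xFF
    if state == "S_STORE":       # STORE b
        memory["b"] = acc
        return "S_DONE", acc
    return "S_DONE", acc         # remain in the final state


memory = {"a": 5, "b": 0}
state, acc = "S_LOAD", 0
ticks = 0
while state != "S_DONE":
    state, acc = step(state, acc, memory)
    ticks += 1

print(memory["b"], ticks)  # 7 3
```

Each instruction completes in a single tick here, which is exactly the idealization noted above: there is no fetch, decode, or memory access delay.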
Another option is to develop a simple model of the processor that does have some of the features of the final implementation of the processor, but still uses an assembly language description of the model to test. This has advantages in that no compilation to machine code is required, but there are still not the detailed hardware characteristics of the final processor architecture that may cause practical issues on final implementation.
The third option is to develop the model of the processor structurally and then the machine code can be read in directly from the ROM. This is an excellent approach that is very useful for checking both the program and the possible quirks of the hardware/software combination, as the architecture of the model reflects directly the structure of the model to be implemented on the FPGA.
In order to create a suitable instruction set for decoding instructions for our processor, the assembly language instruction set needs to have an equivalent machine code instruction set that can be decoded by the sequencer in the processor. The resulting opcode/instruction table is given here:
Taking the abstract design of the microprocessor given in Figure 8.2 we can redraw with the exact registers and bus configuration as shown in the structural diagram in Figure 8.3. Using this model we can create separate VHDL models for each of the blocks that are connected to the internal bus and then design the control block to handle all the relevant sequencing and control flags to each of the blocks in turn. Before this can be started, however, it makes sense to define the basic criteria of the models and the first is to define the basic type. In any digital model (as we have seen elsewhere in this book) it is sensible to ensure that data can be passed between standard models and so in this case we shall use the std_logic_1164 library that is the standard for digital models.
In order to use this library, each signal shall be defined in VHDL with the basic type std_logic, and the package ieee.std_logic_1164.all shall be referenced (with library and use clauses) in the header of each of the models in the processor. Finally, each block in the processor shall be defined as a separate block for implementation in VHDL or Verilog.