RISC (Reduced Instruction Set Computing) is a microprocessor design approach in which the processor executes fewer types of instructions. It became prominent in the 1980s with machines such as MIPS, and the microprocessors used in these RISC machines are collectively called RISC processors. Because a computer needs additional transistors and circuit elements for each instruction type it executes, a larger instruction set makes the microprocessor more complicated and slower. With fewer instruction types, a RISC processor can therefore operate at a higher speed (more millions of instructions per second, or MIPS). John Cocke of the IBM Research Center in Yorktown, New York, observed that about 20% of a computer's instructions account for about 80% of the work, and in 1974 he proposed the concept of RISC. Many current microchips use the RISC concept.
Reduced instruction set computer (RISC): a computer whose instructions are short and which generally runs faster than a CISC machine. RISC and CISC are the two types of CPU distinguished by the characteristics of their instruction sets: RISC stands for Reduced Instruction Set Computing, and CISC for Complex Instruction Set Computing. A RISC instruction system is relatively simple. It requires the hardware to execute only a limited set of the most commonly used instructions; most complex operations are synthesized from simple instructions using mature compilation techniques. CPUs of this kind have been widely used in high-end servers. They include Alpha from Compaq (now part of HP), PA-RISC from HP, PowerPC from IBM, MIPS from MIPS, and SPARC from Sun.
RISC is defined relative to the complex instruction set computer (CISC). A CISC machine meets growing performance requirements by adding to the machine's hardware structure, and for a long time the development of computer architecture was dominated by increasingly complex processors. To narrow the gap between machine operations and high-level languages and to improve the operating characteristics of machines, designers added more and more machine instructions, and instruction systems grew more and more complicated. In particular, the mismatch between early high-speed CPUs and slow memory drove the growth of complex instruction sets, which were meant to minimize the number of memory accesses and so increase machine speed.
However, with the development of semiconductor process technology, memory became faster, and the use of caches in particular brought fundamental changes to computer architecture. Software developed alongside the hardware: optimizing compilers emerged that reduce execution time as far as possible and minimize the memory occupied by machine code. With advanced memory technology and advanced compilers, the CISC architecture was no longer the best fit, and the RISC architecture was born. The basic starting point of RISC technology is to reduce the complexity of hardware design and increase the speed of instruction execution by reducing the machine instruction set. In a RISC machine, the computer executes an instruction in every machine cycle; every operation, simple or complex, is carried out by a sequence of simple instructions, which gives the machine a strong ability to emulate complex operations.
In a RISC machine, every instruction is expected to execute within a single machine cycle, and the most fundamental limit on system throughput is set by the proportion of time spent on memory access during program execution. Therefore, as long as the CPU's instruction execution time equals its instruction fetch time, the system achieves maximum throughput (one instruction executed per machine cycle). RISC machines use hardwired control to decode and recognize instructions; a small number of instructions, simple addressing modes, and a fixed instruction format simplify instruction decoding and the hardwired control logic. In addition, RISC design relies on sophisticated compiler optimization in exchange for simple hardware on the chip, and this optimization improves the efficiency of high-level language (HLL) programs.
The RISC design eliminates microcode routines and hands low-level control of the machine to software. That is, faster RAM replaces the processor's microcode ROM as an instruction cache, and the instructions that control the computer reside in that cache, so that the instruction streams generated by the system and its compilers closely match both the needs of high-level languages and the capabilities of the hardware.
The performance of a computer can be measured by the time required to complete a specific task. This time equals C × T × I, where:
C = number of cycles required to complete each instruction
T = time per cycle
I = number of instructions per task
RISC technology is an effort to minimize C and T. Reducing C and T may increase I, but optimizing compilers and other techniques can compensate for the effect of I on machine performance. RISC technology has grown rapidly from a new idea into a promising part of the computer market for four main reasons. First, the RISC structure suits the rapid development of VLSI technology. Second, RISC simplifies the processor structure, so it is easier to implement and debug, design cost is low, and the development cycle is short. Third, the simplified processor occupies a smaller chip area, so a larger register file, a translation lookaside buffer (TLB), a coprocessor, a fast multiplier, and similar units can be integrated on the same chip, giving the processor higher performance. Fourth, RISC supports HLL programs better than earlier complex instruction set computers: users (programmers) work with a uniform instruction set, can easily estimate the effect of code optimization, and can have more confidence in the correctness of the hardware.
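The trade-off between C, T, and I can be made concrete with a small calculation. The sketch below is only illustrative: the function name and every numeric value are hypothetical assumptions chosen to show how a lower C and T can outweigh a higher I; they are not measurements of any real processor.

```python
def execution_time(cycles_per_instruction, seconds_per_cycle, instructions_per_task):
    """Return the task time as C * T * I."""
    return cycles_per_instruction * seconds_per_cycle * instructions_per_task


# Hypothetical figures, chosen only for illustration.
# "CISC-like": fewer instructions per task, but more cycles each and a slower clock.
cisc_time = execution_time(4.0, 10e-9, 1_000_000)
# "RISC-like": 30% more instructions per task, but fewer cycles each and a faster clock.
risc_time = execution_time(1.2, 8e-9, 1_300_000)

speedup = cisc_time / risc_time
print(f"CISC-like: {cisc_time * 1e3:.2f} ms")
print(f"RISC-like: {risc_time * 1e3:.2f} ms")
print(f"speedup:   {speedup:.2f}x")
```

With these assumed values, the increase in I is more than offset by the reductions in C and T, which is the argument the paragraph above makes.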
Multi-stage instruction pipeline
Pipelining lets the execution of multiple instructions overlap in time, which reduces the CPI (cycles per instruction) so that the CPU does not waste idle cycles. Example: the Pentium II/Pro/Celeron can issue five instructions at the same time, and the AMD-K6/K6-2 can issue six.
Frequently used simple instructions, plus a few complex ones
This reduces the number of clock cycles and increases CPU speed; in essence, it lowers the CPI. Example: the main instruction set consists of arithmetic and logic instructions, load and store instructions, and branch instructions.
Load/store structure
Only Load and Store instructions may access memory; all other instructions operate on registers. Examples: the AMD-K6/K6-2 and the Pentium II/Celeron/Pro all support direct operation on registers and register renaming, and greatly increase the number of general-purpose registers.
Delayed load and delayed branch instructions
Memory is slower than registers, and a branch instruction must compute its target address; both greatly limit CPU execution speed. Therefore, to keep the pipeline running at full speed, RISC technology allows an unrelated instruction that can execute immediately to be placed after such an instruction, so the delay is put to use.
Example: this is reflected mainly in speculative execution, out-of-order execution, and data forwarding. The Intel P54C/P55C does not support these; the K6-2 and Pentium II do.
Use cache structure
To keep instructions flowing continuously to the CPU, the CPU includes a cache of a certain size, which effectively increases memory bandwidth to meet the CPU's demand for frequent instruction fetches. Generally there are two independent caches, one for instructions and one for data.
Example: the Pentium II/Celeron has 16K + 16K, the AMD-K6/K6-2 has 32K + 32K, and the Cyrix M II has 64K (a shared cache, equivalent to two 32K caches). The Pentium II also adds an L2 cache, which greatly increases CPU speed.
RISC is characterized by a small number of instructions and instruction formats and by simple operation and control. The specific aspects are as follows.
Reduced instruction set
The RISC structure uses a streamlined instruction set so that most operations are as efficient as possible. Some operations that require multi-cycle instructions in a traditional structure are instead implemented in a RISC structure, through machine-language programming, as several single-cycle instructions. The reduced instruction set greatly improves processor performance and has driven RISC design. There is no definite answer to how reduced an instruction set must be, but comparing existing RISC systems with CISC systems gives a rough idea. Generally, for RISC:
The number of instructions is small, no more than 128.
There are few addressing modes, no more than four.
There are few instruction formats, no more than 4 types.
Proposals to extend the instruction set are treated with great caution: each must be carefully weighed and verified to see whether it really improves the computer's performance. For example, MIPS uses a rule that a new instruction must yield a performance gain of at least 1% over some range of applications; otherwise it is rejected.
Uniform instruction cycle time and instruction length
If the task performed by each instruction is simple and well defined, the time needed to execute each instruction can be reduced. The design goal of RISC is to execute one instruction per machine cycle, making system operation more efficient. Techniques for approaching this goal include instruction pipelining and a dedicated load/store structure. A typical instruction passes through fetch, decode, execute, and result-store steps. Single-cycle operation also implies that every instruction has a standard length. The standard instruction length should equal the basic word length of the computer system, which usually equals the number of data lines in the system.
In any fetch cycle, one complete instruction is passed to the CPU. For example, if the basic word length is 32 bits and the data portion of the system bus has 32 lines, the standard instruction length is 32 bits. Making the execution time of all instructions identical is difficult. Some instructions, such as simple logic operations on CPU registers (clearing a register, for example), execute easily within one CPU clock cycle; others, such as memory accesses (reads and writes) or multi-cycle operations (multiplication, division, and so on), may not complete in a single cycle. The requirement on the designer is therefore that the most frequently used instructions execute in a single cycle.
The way to reduce the number of cycles required per instruction is to overlap the execution of multiple instructions. An instruction pipeline works this way: it divides the execution of each instruction into several discrete stages and then works on several instructions at once, each in a different stage. The fetch and execute phases of any instruction take the same time, ideally a single cycle. This is arguably the most important design principle of RISC. Instructions flow from memory to the CPU in a constant stream, each executing at the same pace with none left waiting, so the CPU is always busy. The necessary conditions for pipelined operation are:
A standard, fixed instruction length equal to the computer word length and the width of the data bus.
A standard execution time for all instructions, preferably a single CPU cycle.
For example, the SPARC chip uses a four-stage pipeline (fetch, decode, execute, and write result) to maximize processor performance. A new instruction can start at the beginning of each clock cycle, so on average one new instruction is fetched from memory per machine cycle and, overall, most instructions effectively complete in a single cycle. Instruction pipelining can be compared to an assembly line: instructions flow from one stage to the next like a product being processed, until execution is complete.
An instruction pipeline can therefore reduce the number of cycles per instruction by a factor equal to its depth, but only if the pipeline is always full of useful instructions and nothing blocks their passage through it. This requirement places a certain burden on the architecture. For example, competition for resources such as the ALU can stall instructions in the pipeline. Differences in execution time have an even more obvious adverse effect, which is why RISC defines an instruction set with the characteristics described above.
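The claim that pipelining approaches a speedup equal to the pipeline depth, and that stalls erode it, can be checked with a simple cycle count. The following is a minimal sketch under idealized assumptions: every stage takes exactly one cycle, and all stalls are lumped into a single hypothetical `stall_cycles` parameter. The function names are illustrative, not taken from any real simulator.

```python
def unpipelined_cycles(num_instructions, depth):
    """Each instruction passes through all stages before the next one starts."""
    return num_instructions * depth


def pipelined_cycles(num_instructions, depth, stall_cycles=0):
    """The first instruction takes `depth` cycles to fill the pipeline;
    each later instruction completes one cycle after the previous one,
    plus any cycles lost to stalls (an assumed lump sum)."""
    if num_instructions == 0:
        return 0
    return depth + (num_instructions - 1) + stall_cycles


def speedup(num_instructions, depth, stall_cycles=0):
    """Ratio of unpipelined to pipelined cycle counts."""
    return (unpipelined_cycles(num_instructions, depth)
            / pipelined_cycles(num_instructions, depth, stall_cycles))


# A four-stage pipeline, matching the depth of the SPARC example above.
ideal = speedup(1000, 4)                        # close to, but below, 4
with_stalls = speedup(1000, 4, stall_cycles=250)
print(f"ideal: {ideal:.2f}x, with stalls: {with_stalls:.2f}x")
```

The ideal figure stays just under the depth because of the cycles spent filling the pipeline, and any stall pushes it lower, which is why uniform instruction length and execution time matter.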
Load/store structure (Load/Store)
Instructions that operate on memory require either a longer cycle time or more cycles per instruction. They must calculate the operand address, read the operand from memory, compute the result, and then send the result back to memory, so they take much longer to execute. To eliminate this negative effect, RISC uses a load/store structure: only the load (Load) and store (Store) instructions access memory, and all other operations access only operands in the processor registers. The advantages are:
Fewer memory accesses, which lowers the memory bandwidth requirement.
Restricting all operations to registers helps simplify the instruction set.
Removing memory operands makes it easier for the compiler to optimize register allocation. This reduces memory accesses and also reduces the number of instructions per task.
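The load/store discipline can be illustrated with a toy register machine. This is a hedged sketch, not a model of any real instruction set: the opcode names, the tuple encoding, and the `run` function are all assumptions made for the example. The only point it demonstrates is that memory is touched solely by LOAD and STORE, while arithmetic works on registers.

```python
def run(program, memory, num_registers=32):
    """Execute a toy load/store program and return the number of memory accesses.

    Hypothetical instruction encoding (tuples):
      ("LOAD",  rd, addr)    rd <- memory[addr]
      ("STORE", rs, addr)    memory[addr] <- rs
      ("ADD",   rd, ra, rb)  rd <- ra + rb   (registers only)
      ("MUL",   rd, ra, rb)  rd <- ra * rb   (registers only)
    """
    regs = [0] * num_registers
    memory_accesses = 0
    for op, *args in program:
        if op == "LOAD":
            rd, addr = args
            regs[rd] = memory[addr]
            memory_accesses += 1
        elif op == "STORE":
            rs, addr = args
            memory[addr] = regs[rs]
            memory_accesses += 1
        elif op == "ADD":
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "MUL":
            rd, ra, rb = args
            regs[rd] = regs[ra] * regs[rb]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory_accesses


# Compute memory[2] = (memory[0] + memory[1]) ** 2 with intermediates kept in registers.
memory = {0: 3, 1: 4, 2: 0}
program = [
    ("LOAD", 1, 0),      # r1 <- a
    ("LOAD", 2, 1),      # r2 <- b
    ("ADD", 3, 1, 2),    # r3 <- a + b
    ("MUL", 4, 3, 3),    # r4 <- (a + b) * (a + b)
    ("STORE", 4, 2),     # result written back once
]
accesses = run(program, memory)
print(memory[2], accesses)
```

Because the intermediate sum stays in a register, the program touches memory only three times: two loads and one store.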
A large register file
For most instructions to operate between registers (so-called register-to-register operation), the CPU must have enough general-purpose registers. With enough registers, intermediate results can be held in CPU registers for use in subsequent operations, which reduces the number of memory loads and stores and speeds up execution. Commercial RISC systems use at least 32 general-purpose CPU registers.
Hardwired control
Because microprogramming gives designers flexibility, many CISC systems use microprogrammed control. Different instructions usually have microprograms of different lengths, so each instruction takes a different number of cycles, which conflicts with the principle of uniform, pipelined processing of all instructions. Hardwired control solves this problem and is also faster, so RISC should use hardwired control. An exception is possible when each instruction corresponds one-to-one with a single microinstruction, that is, when each microprogram consists of a single control word. Such a design can be as fast and efficient as hardwired control while still giving designers the benefits of microprogramming. Hardwired control keeps the RISC controller simple. The simple design makes the layout of the machine more rational and lets designers concentrate on optimizing the few remaining, but critical, processor characteristics. The simplified structure also eases the constraint on chip area: performance-critical structures such as a large register file, translation lookaside buffers (TLBs), coprocessors, and multiply and divide units can be placed on the same chip, and these additional resources give the processor a large performance advantage.
In practice, a RISC does not necessarily have all of the above characteristics, and some systems described as RISC even violate some of them. The characteristics should be taken as guiding principles that explain the nature of RISC. Loosely speaking, a system that meets most of them can be considered a RISC.
RISC can improve speed while keeping costs down.
Because the RISC instruction set is simple, it needs only a relatively small and simple control unit for decoding and a simple hardware execution subsystem. When a computer system is implemented in VLSI, this has the following results:
The chip area occupied by the control unit is greatly reduced: about 10% in RISC I, compared with typically more than 50% in CISC designs. A RISC VLSI chip therefore has more space available, so the entire CPU and other components (such as a cache, floating-point unit, part of main memory, memory management unit, and I/O ports) can fit on one chip.
Because the control area is smaller, the number of CPU registers on the chip can be increased (RISC I has 138).
Reducing the control unit's area on the VLSI chip and placing a large number of uniform registers raises the chip's regularization factor. Broadly, the higher the regularization factor, the lower the VLSI design cost.
RISC lends itself to implementation in GaAs (gallium arsenide) VLSI technology, because a simplified design with fewer complex structures suits the chip densities that technology can achieve. In short, RISC reduces complex procedures and simplifies the structure.
One characteristic of RISC is the instruction pipeline, and the uniformity of instruction length and execution time minimizes waiting and stalls in the pipeline. These factors help increase computation speed. The simpler, smaller control unit in a RISC also has fewer gates, which shortens the signal paths in the control unit and speeds up operation. The streamlined instruction set leads to a small, simple decoder, which speeds up decoding. Hardwired control makes a RISC run faster than systems controlled by microprograms. The relatively large set of CPU registers reduces the traffic of fetch and store operations between the CPU and memory; the large register set can hold the parameters passed between calling and called procedures and the state of interrupted programs, information that would otherwise have to be saved in memory. All of this saves a great deal of processing time. Compiler optimization of delayed branches also helps improve speed. Overall, a RISC is generally 2 to 4 times faster than a CISC of roughly equivalent function.
The relatively small and simple control unit of the CPU usually brings the following cost and reliability benefits:
a. The design time of the RISC control unit is shorter, which reduces overall design cost.
b. The short design time reduces the chance that the final product is obsolete by the time the design is completed.
c. A simpler, smaller control unit reduces design errors and so improves reliability; locating and correcting errors is also easier than in a CISC.
d. Because the instruction formats are simple and few (one or two) and all instructions have a standard length, an instruction never crosses a word boundary or spans different pages in virtual memory, which removes potential difficulties in the design of the virtual memory management subsystem.
The evolution from CISC to RISC resembles the development from assembly language to high-level languages. Writing programs in assembly language requires elaborate and complex instructions, whereas programs written in a high-level language have little to do with complex instructions. While pursuing streamlined instructions, RISC tightly couples the architecture with the design of an optimizing compiler so that the combination improves overall performance. Since the development of RISC rests on improvements in VLSI technology and compilation technology, it can be understood as replacing a complex instruction system with complex compilation; one could even say that a hardware problem has been transferred to software. The rapid development of intelligent compilers in recent years makes this task manageable, and this may be where the advantage of RISC lies. A traditional CISC requires complex microcode to be written and designed, and the use of assembly language involves developing assembly programs, which costs a great deal of labor and time. RISC is better suited to supporting high-level languages, which is also one way to address the long-standing "software crisis" in computing. The success of RISC depends on software compatibility: as long as programs are compatible at the source level and can be recompiled, existing software can easily run on a RISC machine.
The simplified structure gives programmers many benefits:
A more unified instruction set is convenient to use.
Since there is a strict correspondence between the number of instructions and the number of cycles, the actual effect of code optimization is easy to measure.
The programmer has a more accurate grasp of the hardware.
In today's computer world there is a strong drive toward better performance. RISC and CISC both compete with and complement each other, and RISC has shortcomings of its own.
The shortcomings of RISC are directly related to some of its advantages. Because a RISC has few instructions, some functions that a CISC completes with one instruction require two, three, or more instructions on a RISC. RISC code is therefore longer, RISC programs need more memory, and instruction traffic between memory and the CPU increases. Research shows that, on average, a RISC program is 30% longer than a CISC program that performs the same function. RISC also places high demands on the compiler. Designing an optimizing compiler is important and technically demanding work, and it generally has to be done by the RISC machine manufacturer itself, because a compiler cannot generate object code without detailed data about the RISC. This makes it harder for third-party companies to provide new versions, leaves users with less choice, and raises software costs.
A controversial feature of RISC systems is the large register set. A large register file has the advantages described earlier, but it also has drawbacks: it increases register addressing time, and some compilers make efficient use of a small register file. How large the CPU register set should be remains open to discussion, and a large register set can also be replaced by a cache. The shortcomings of a large register set can be summarized as follows:
Long access time.
The register set occupies more chip space.
Advanced compilation technology makes the small register set more effective.
If all CPU registers are saved on a context switch, a large register set takes longer to save.
If a window pointer is used, register address decoding takes longer. (One of the main features of RISC implementations is the overlapping register window, whose purpose is to simplify parameter passing; it is selected by a window pointer.) Overlapping registers also complicate the CPU logic.
Errors are more likely and harder to find and fix, and complex instructions are more difficult to handle.
Single-word instructions cannot use direct memory addressing with full 32-bit addresses. For this reason, some manufacturers provide a small number of double-word instructions (as in the Intel 80960). Whether to use such instructions is up to the programmer, who can write complete programs using only single-word instructions.
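The overlapping register window mentioned in the list above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the class name, the window geometry (8 shared and 8 local registers per window), and the direction in which the window pointer moves on a call are all hypothetical choices, and window overflow (spilling to memory when calls nest too deeply) is not modeled. The sketch shows the two points the text makes: a caller's "out" registers are physically the callee's "in" registers, and every register access needs an extra address computation from the window pointer.

```python
class RegisterWindows:
    """Toy model of overlapping register windows selected by a window pointer."""

    def __init__(self, num_windows=8, shared=8, local=8):
        self.num_windows = num_windows
        self.shared = shared                  # registers shared between adjacent windows
        self.stride = shared + local          # physical registers advanced per call
        self.size = num_windows * self.stride
        self.physical = [0] * self.size
        self.cwp = 0                          # current window pointer

    def _index(self, kind, n):
        # The extra address computation: window base plus an offset per register class.
        base = self.cwp * self.stride
        offset = {"in": 0, "local": self.shared, "out": self.stride}[kind]
        return (base + offset + n) % self.size

    def read(self, kind, n):
        return self.physical[self._index(kind, n)]

    def write(self, kind, n, value):
        self.physical[self._index(kind, n)] = value

    def call(self):
        """Enter a procedure: slide the window so the caller's 'out' becomes 'in'."""
        self.cwp = (self.cwp + 1) % self.num_windows

    def ret(self):
        """Return from a procedure: slide the window back."""
        self.cwp = (self.cwp - 1) % self.num_windows


rw = RegisterWindows()
rw.write("out", 0, 42)        # caller places an argument in out0
rw.call()
arg = rw.read("in", 0)        # callee sees it as in0 without any memory traffic
rw.write("in", 0, arg + 1)    # callee leaves a return value in the shared register
rw.ret()
result = rw.read("out", 0)    # caller reads the return value from out0
print(arg, result)
```

No parameter is copied through memory, which is the benefit; the `_index` computation on every access is the cost that the list item about longer address decoding refers to.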
The basic starting point of RISC technology is to reduce the complexity of hardware design and increase the speed of instruction execution by reducing the machine instruction set. Although RISC design ideas have had a huge impact on computer architecture and achieved great success, complex instruction set computer (CISC) technology makes programming relatively easier, so CISC and RISC technology are not isolated from each other. A newer design approach aims to improve the performance of the whole computer system and absorbs the structural advantages of both CISC and RISC. For example, RISC techniques are used in many CISC designs. The NSC32532 microprocessor from National Semiconductor Corporation adopts RISC techniques in a CISC design, reducing the average instruction execution time from 6 machine cycles to less than 2.4 machine cycles; at a clock frequency of 26 MHz, it runs at 10 to 12 MIPS. The Intel 80486 and Motorola 68040 also adopt RISC design techniques, so that the average execution time per instruction is less than 2 machine cycles. The Fairchild Clipper is a 32-bit microprocessor that combines the advantages of RISC and CISC technology and runs at up to 33 MIPS.
The two main approaches to processor design (RISC and CISC) are thus not completely separate but complementary. Some designers have combined CISC and RISC technology and proposed the writable instruction set computer (WISC), with design principles intended to concentrate the advantages of RISC and CISC. However, it is still a further development of RISC technology built on the RISC concept.
At present, most RISC processors have reached the goal of executing one instruction per cycle (that is, a CPI of 1), but this is not the limit. Superscalar and superpipelined techniques have appeared in RISC technology. A superscalar microprocessor executes several instructions in parallel in one clock cycle. In a superpipelined processor, the main pipeline stages (instruction decode and instruction execution) each occupy only part of a clock cycle, so several instructions can still be in execution within one clock cycle. The Intel8096 uses superscalar technology and can execute integer and floating-point instructions simultaneously. IBM's RS/6000 also uses a superscalar structure: the processor contains three processing units (a fixed-point processor, a floating-point processor, and a branch processor) and can execute four instructions per clock cycle (4 IPC), and up to 6 IPC. RISC researchers have pointed out that the encouraging pace of past microprocessor performance improvement is unlikely to continue. In the future, cache capacity and organization and compiler optimization will become the key factors in improving computer performance, and development will focus on multiprocessor technology.