
MMU​

MMU is an abbreviation of Memory Management Unit; it is sometimes called a paged memory management unit (PMMU). It is a piece of computer hardware responsible for handling memory access requests from the central processing unit (CPU). Its functions include translating virtual addresses to physical addresses (that is, virtual memory management), memory protection, and CPU cache control. In simpler computer architectures it may also be responsible for bus arbitration and bank switching (especially on 8-bit systems).

History

Many years ago, when people were still using DOS or even older operating systems, computer memory was very small, generally measured in kilobytes. Programs of that era were correspondingly small, so even the limited memory could hold them. As graphical interfaces rose and user demands grew, however, applications expanded until programmers faced a problem: the application was too large to fit in memory. The usual solution was to divide the program into many pieces called overlays.

Overlay 0 would run first, and at its end it would call another overlay. Although the swapping of overlays was done by the OS, the programmer had to divide the program first; this was time-consuming, labor-intensive, and quite tedious. People needed a better way to solve the problem fundamentally, and soon found one: virtual memory. The basic idea of virtual memory is that the combined size of the program, data, and stack may exceed the size of physical memory. The operating system keeps the currently used parts in memory and saves the unused parts on disk. For example, given a 16 MB program and a machine with only 4 MB of memory, the operating system can choose which 4 MB to keep in memory at any given time, swapping program fragments between memory and disk as needed. In this way a 16 MB program can run on a machine with only 4 MB of memory, and the program does not need to be divided by the programmer beforehand.

Introduction

The memory management unit is usually found in desktop computers and servers, where virtual memory allows the computer to use more storage space than its actual physical memory. The memory management unit also partitions and protects physical memory so that each software task can access only the memory space allocated to it. If a task attempts to access the memory space of another task, the memory management unit automatically raises an exception, protecting the programs and data of other tasks from damage. The figure on the right shows a typical memory map for multiple tasks using a memory management unit. This mechanism makes the memory management unit a very powerful tool for debugging errors such as bad pointers or out-of-bounds array subscripts.

Basic concepts

The MMU sits between the processor core and the bus connecting the cache and physical memory. When the processor core fetches an instruction or accesses data, it presents an effective address, also called a logical or virtual address. This address is generated by the linker when the executable code is built. Unlike programmers of embedded processor systems, desktop programmers usually know very little about the physical configuration of the hardware. With a virtualized memory system, programmers do not need to understand the physical layout of memory: when application code needs storage, the operating system allocates appropriate physical storage through the MMU. The effective address need not match an actual hardware physical address; instead, the MMU maps the effective address to the corresponding physical address used to access instructions and data.

The amount of memory covered by each MMU mapping rule is defined as a page. The page size is usually set to the smallest length that does not significantly hurt program performance. When the contents of a region of physical memory are temporarily unused, they can be saved to external storage such as a hard disk so the space can serve other programs; when that content is needed again, it is read back from external storage into physical memory. In this way the system can provide more "virtual memory" than it has physical memory. If the page defined by the MMU is too large, each virtual memory page swap takes too long; if the page is too small, swaps become too frequent. The smallest page is usually set to 4 KB.
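The division of an address into a page number and an in-page offset can be sketched in a few lines. This is an illustrative sketch, not code for any particular MMU, assuming the 4 KB minimum page size mentioned above: the low 12 bits of an address are the offset within the page, and the remaining high bits are the page number.

```python
PAGE_SIZE = 4 * 1024                       # 4 KB, the minimum page size above
OFFSET_BITS = 12                           # log2(4096) bits of in-page offset

def split_address(vaddr):
    """Split a virtual address into (virtual page number, in-page offset)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

vpn, offset = split_address(0x00402ABC)    # vpn = 0x402, offset = 0xABC
```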

To speed up MMU rule matching, the mapping between effective addresses and physical addresses is usually stored in a separate cache called the translation lookaside buffer (TLB). The TLB and physical memory can be accessed in parallel. The high-order bits of the effective address are used to match entries in the TLB, while the low-order bits serve as the offset within the page.

The TLB can contain many entries, each corresponding to one MMU page. The operating system or application startup code must correctly initialize all TLB entries. When the effective address presented by the application falls within the address range specified by some TLB entry, a TLB hit occurs; if the effective address is not covered by any TLB entry, a TLB miss occurs. TLB misses often accompany application errors, so the exception handling triggered by a TLB miss can be very effective for finding and debugging such errors. Virtual memory uses the TLB miss exception to perform page swaps, adjusting the parameters of TLB entries according to the swapped content. TLB entries usually also specify additional parameters for memory reads and writes; a TLB hit is generated only when those parameters match the current access.
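A TLB hit or miss can be modeled as a lookup in a small table keyed by the virtual page number. The table contents and the miss handling below are hypothetical, a sketch of the mechanism rather than any specific hardware:

```python
class TLBMissError(Exception):
    """Raised when no TLB entry covers the effective address (a TLB miss)."""

# Hypothetical TLB contents: virtual page number -> physical frame number.
TLB = {0x402: 0x019, 0x403: 0x01A}

def lookup(vaddr, page_bits=12):
    vpn = vaddr >> page_bits
    offset = vaddr & ((1 << page_bits) - 1)
    if vpn not in TLB:                       # TLB miss: the OS exception
        raise TLBMissError(hex(vaddr))       # handler would swap the page in
    return (TLB[vpn] << page_bits) | offset  # TLB hit: form physical address
```

A real MMU would also compare the extra read/write parameters stored in each entry before declaring a hit, as noted above.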

MMU properties

The MMU conforms to the Power Architecture specification. TLB entries are fully associative, and additional hardware is provided to speed up the handling of TLB miss exceptions. Several special instructions are available for managing TLB entries.

Page size

The memory range mapped by a TLB entry is a page. A page's attributes include:

① The effective address it maps.

② Read and write permissions.

③ Its storage and cache attributes.

The MMU has 32 TLB entries and can therefore support 32 pages. Each page's size can be set to 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, 16 MB, 64 MB, or 256 MB.

The real page number (RealPageNumber, RPN) determines the starting physical address of the page. This starting address must be an integer multiple of the page size; for example, a 16 KB page must start on a 0, 16 KB, 32 KB, or 48 KB boundary. Special attention is therefore required when mapping the actual physical addresses of the MPC5554/5553.
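The alignment rule can be checked mechanically. A minimal illustrative sketch, verifying that a candidate page base address is an integer multiple of the page size:

```python
KB = 1024

def valid_page_base(base, page_size):
    """A page's starting physical address must be a multiple of its size."""
    return base % page_size == 0

# A 16 KB page may start at 0, 16 KB, 32 KB, 48 KB, ...
assert valid_page_base(48 * KB, 16 * KB)       # legal 16 KB boundary
assert not valid_page_base(8 * KB, 16 * KB)    # not a legal 16 KB boundary
```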

The memory allocation map takes this into account, carefully assigning the start address and address space of each on-chip module so that as few TLB entries as possible are needed. In fact, the BAM code needs only the first 5 TLB entries to cover the entire address space of the microprocessor. Multiple pages are also allowed to occupy the same physical address space.

For example, a 64 KB section of on-chip FLASH memory can be mapped as a 64 KB read-only page; in addition, a 16 KB read-write page can be mapped at the same 64 KB starting address. The 64 KB address space is then effectively divided into a 48 KB part and a 16 KB part, as shown on the right. Although both pages can generate the same physical address, combining other page attributes prevents multiple pages from hitting at the same time. Note that read and write permissions alone cannot be used to avoid multiple page hits.

In this configuration, the read-only 48 KB portion can hold fixed data such as constants and tables, while the read-write 16 KB portion can be used to emulate an EEPROM storage area. The application must be very careful to ensure that code operating on the 48 KB constant segment does not incorrectly access the preceding 16 KB area, because in this configuration the MMU cannot check out-of-bounds accesses from the 48 KB area. Implementing bounds checking for the 48 KB area requires a total of 4 TLB entries: 3 TLB entries map the 48 KB space as 3 independent 16 KB pages, none of which overlaps the original 16 KB page.

By configuring TLB entries, all on-chip memory and I/O space can also be mapped into one large contiguous virtual address space. The address spaces of the I/O modules of the MPC5554/5553 are scattered across the entire 4 GB addressing space. In the example below, the 16 KB dual-port shared RAM of the eTPU module and the 64 KB of on-chip RAM are mapped into a contiguous 80 KB virtual space.

Of course, the eTPU module can no longer be used in that case. By configuring TLB entries, the addresses of peripherals that the processor accesses frequently can also be mapped to page spaces that allow more efficient addressing modes. For example, by mapping the address space of the eMIOS module to a page starting at address 0, the processor can use efficient immediate addressing instead of more complicated register-indirect addressing.

TS address space type indication

According to the Power Architecture specification, the effective address is extended by an additional bit indicating the address space of the current access. For instruction fetches this bit is taken from the IS bit of the MSR register; for data accesses it is taken from the DS bit of the MSR register. Both bits can be modified by program instructions, but when an interrupt occurs, both bits in the MSR register are cleared to 0.

When the processor fetches instructions or reads and writes data, the IS or DS bit is compared with the TS bit stored in each TLB entry, and the result of the comparison affects whether a TLB hit occurs.

Since the IS and DS bits of the MSR register are cleared to 0 when an interrupt occurs, the TS of the page containing the interrupt handler must be set to 0, while the TS of pages containing normal application code should be set to 1.

The address space type indication can be used to distinguish multiple program modules mapped to the same effective address. For example, the interrupt service routine can be placed in the first 32 KB of the address space with system-state execution attributes, while the application's fixed data tables can also be mapped to that same 32 KB with read-only, normal-state attributes.

TID TLB process ID number

According to the Power Architecture specification, the MMU also provides a process ID (PID) feature. When the TLB performs effective address mapping, the value of the PID register is compared with the TID value in the TLB entry. The PID and TID are compared for both instruction and data operations.

If the TID is set to 0, the result of the PID/TID comparison is ignored and does not affect TLB hit processing.

When multiple TLB entries correspond to the same effective address, they can be distinguished by TID. By modifying the value of the PID register, memory can easily be switched at run time. For example, while the application is running, modifying the PID register through the NEXUS debug interface causes the same effective address to fetch data from a different physical storage space. This is very useful for debugging.

EPN, RPN effective page number, real page number

For each TLB entry, all of the following conditions must be met to correctly map an effective address:

● The designated high-order bits of the effective address are the same as the effective page number (EPN) in the TLB entry.

● The IS bit (for instruction operations) or DS bit (for data operations) of the MSR register is the same as the TS bit in the TLB entry.

● The value of the PID register is the same as the TID value in the TLB entry, or the TID value is zero.

The figure on the right shows the logical structure of TLB hit processing.

If no TLB entry satisfies the above conditions for a valid address, a TLB miss occurs, raising an instruction or data TLB miss exception.

The page size defined in the TLB entry determines how many bits of the effective address must be compared with the EPN. On a TLB hit, the RPN in the TLB entry replaces the corresponding bits of the effective address to form the actual physical address.
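The three match conditions and the RPN substitution can be combined into one illustrative model. The field names follow the text above, but the entry layout itself is hypothetical and is not the MPC5554/5553 register format:

```python
from dataclasses import dataclass

@dataclass
class TLBEntry:
    epn: int        # effective page number
    rpn: int        # real page number
    page_bits: int  # log2 of the page size (e.g. 12 for 4 KB)
    ts: int         # address space type bit
    tid: int        # process ID tag; 0 matches any PID

def matches(e, vaddr, space_bit, pid):
    """All three conditions listed above must hold for a TLB hit."""
    return (vaddr >> e.page_bits == e.epn       # EPN vs. high address bits
            and e.ts == space_bit               # TS vs. MSR[IS] or MSR[DS]
            and (e.tid == 0 or e.tid == pid))   # TID vs. PID (0 = wildcard)

def translate(entries, vaddr, space_bit, pid):
    for e in entries:
        if matches(e, vaddr, space_bit, pid):
            offset = vaddr & ((1 << e.page_bits) - 1)
            return (e.rpn << e.page_bits) | offset  # RPN replaces high bits
    return None    # TLB miss: an instruction/data TLB miss exception follows
```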

Memory access

A program can assign access rights to each virtual page, including whether access is permitted in the system state or the normal state, and whether reading, writing, and instruction execution are allowed. For some applications these access rights are configured only once, after a system reset.

For example, the area containing program code is configured as execute-only, the data variable area as read-write but non-executable, and the data constant area as read-only and non-executable. For other applications these access rights are dynamically modified by the operating system according to the needs of the application and the system's operating policy.

The UX, SX, UW, SW, UR, and SR access permission bits set the access rights of a virtual page. These bits are described as follows:

● SR -- System-state read permission: in the system state (MSR[PR]=0), allows memory read operations and read-type cache management operations.

● SW -- System-state write permission: in the system state (MSR[PR]=0), allows memory write operations and write-type cache management operations.

● SX -- System-state execute permission: in the system state (MSR[PR]=0), allows instructions to be fetched from memory and executed.

● UR -- Normal-state read permission: in the normal state (MSR[PR]=1), allows memory read operations and read-type cache management operations.

● UW -- Normal-state write permission: in the normal state (MSR[PR]=1), allows memory write operations and write-type cache management operations.

● UX -- Normal-state execute permission: in the normal state (MSR[PR]=1), allows instructions to be fetched from memory and executed.

After address comparison and page attribute comparison, these access permission settings are also checked. If a permission conflict occurs, an instruction or data storage interrupt (ISI or DSI) is raised.
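The permission check reduces to selecting the right bit from the six above based on the privilege state and the access type. A minimal sketch; representing a page's permissions as a set of bit names is purely illustrative:

```python
def access_allowed(page_perms, supervisor, access):
    """
    page_perms: set of granted bits, e.g. {"SR", "UR"} for a read-only page.
    supervisor: True when MSR[PR] = 0 (system state).
    access:     "R", "W", or "X".
    """
    needed = ("S" if supervisor else "U") + access
    return needed in page_perms

# A constant-data page: readable in both states, never writable or executable.
rodata = {"SR", "UR"}
assert access_allowed(rodata, supervisor=False, access="R")
assert not access_allowed(rodata, supervisor=False, access="W")  # -> DSI
```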

Related concepts

Address range

At any time, there is a set of addresses that a program running on the computer can generate; we call this the address range.

Virtual address

The size of the address range is determined by the CPU's word width. For a 32-bit CPU, the range is 0 to 0xFFFFFFFF (4 GB); for a 64-bit CPU, the range is 0 to 0xFFFFFFFFFFFFFFFF (16 EB). This is the range of addresses a program can generate. We call this range the virtual address space, and an address within it a virtual address.

Physical address

Corresponding to the virtual address space and virtual address are the physical address space and physical address. Most of the time, the physical address space of a system is only a subset of the virtual address space. A simple example illustrates both: for a 32-bit x86 host with 256 MB of memory, the virtual address space ranges from 0 to 0xFFFFFFFF (4 GB), while the physical address space ranges from 0x00000000 to 0x0FFFFFFF (256 MB).

Address mapping

On a machine without virtual memory, an address is sent directly to the memory bus, causing the physical memory at that address to be read or written. With virtual memory, the virtual address is not sent directly to the memory address bus; instead it is sent to the memory management unit (MMU), which maps the virtual address to a physical address.

Paging mechanism

Most systems that use virtual memory use a mechanism called paging. The virtual address space is divided into units called pages, and the physical address space is divided into units called page frames; pages and page frames must be the same size. Consider a machine that generates 32-bit addresses, with a virtual address range from 0 to 0xFFFFFFFF (4 GB) but only 256 MB of physical memory. The machine can run a 4 GB program, but the program cannot be loaded into memory all at once; the machine must have external storage (such as a disk or FLASH) large enough to hold the 4 GB program, so that program fragments can be brought in when needed. In this example the page size is 4 KB, and the page frame size is the same as the page size -- this must hold, because transfers between memory and external storage are always in units of pages. The 4 GB virtual address space and 256 MB of physical memory thus contain 1M pages and 64K page frames, respectively.
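The page and page-frame counts in the example follow directly from the sizes; a quick check of the arithmetic:

```python
KB, MB, GB = 2**10, 2**20, 2**30
PAGE_SIZE = 4 * KB

virtual_pages = (4 * GB) // PAGE_SIZE   # 4 GB of virtual space / 4 KB pages
page_frames = (256 * MB) // PAGE_SIZE   # 256 MB of RAM / 4 KB frames

assert virtual_pages == 2**20           # 1M pages
assert page_frames == 2**16             # 64K page frames
```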

Features

1) Map linear addresses to physical addresses

Modern multi-user, multi-process operating systems rely on the MMU so that each user process can have its own independent address space. Using the MMU, the operating system defines an address region in which what each process sees need not be the same. For example, the Microsoft Windows operating system designates the address range 4M to 2G as user address space. Process A maps its executable file at address 0x400000 (4M), and process B also maps its executable file at address 0x400000 (4M). When process A reads address 0x400000, it reads the contents of A's executable file mapped into RAM; when process B reads address 0x400000, it reads the contents of B's executable file mapped into RAM. This is the MMU's role in address translation.
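The two-process example can be modeled with a hypothetical page table per process (all frame numbers below are made up for illustration): with 4 KB pages, address 0x400000 is virtual page number 0x400, and that same page number resolves to a different physical frame in each process.

```python
# Hypothetical per-process page tables: virtual page number -> physical frame.
PAGE_TABLES = {
    "A": {0x400: 0x1200},   # frame holding A's executable at 0x400000
    "B": {0x400: 0x3400},   # frame holding B's executable at 0x400000
}

def resolve(process, vaddr, page_bits=12):
    vpn = vaddr >> page_bits
    offset = vaddr & ((1 << page_bits) - 1)
    return (PAGE_TABLES[process][vpn] << page_bits) | offset

# The same virtual address maps to different physical addresses per process.
assert resolve("A", 0x400000) != resolve("B", 0x400000)
```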

2) Provide hardware-enforced memory access authorization

For many years, microprocessors have been equipped with on-chip memory management units (MMUs) that allow individual software threads to run in hardware-protected address spaces. Yet in many commercial real-time operating systems, the MMU is not used even when the system includes the hardware.

When all threads of an application share the same memory space, any thread can intentionally or unintentionally corrupt the code, data, or stack of other threads. An errant thread may even corrupt kernel code or internal data structures. For example, a stray pointer in one thread can easily crash the entire system, or at least cause it to behave abnormally.

In terms of safety and reliability, a process-based real-time operating system (RTOS) is superior. To create processes with separate address spaces, the RTOS need only create some RAM-based data structures and have the MMU enforce protection based on them. The basic idea is that each context switch activates a new set of logical address mappings. The MMU uses the current mapping to translate the logical addresses used by instruction fetches and data accesses into physical memory addresses. The MMU also flags accesses to illegal logical addresses that are not mapped to any physical address.

Although this adds the inherent overhead of accessing memory through a lookup table, the benefits are substantial. At a process boundary, careless or incorrect operations do no harm: a defect in a user-interface thread cannot corrupt the code or data of more critical threads. It is remarkable that operating systems without memory protection are still used in complex embedded systems that demand high reliability and safety.

Using the MMU also makes it easy to selectively map or unmap pages in the logical address space. Physical memory pages are mapped into the logical space to hold the code of the current process, and remaining pages are mapped for data. Likewise, physical memory pages can be mapped to hold each thread stack of a process, and the RTOS can easily leave the page adjacent to each thread stack unmapped. That way, if any thread's stack overflows, a hardware memory protection fault occurs and the kernel suspends the thread without damaging other important memory areas in the address space, such as another thread's stack. This adds memory protection not only between processes but also within the same address space.

Memory protection (including this kind of stack overflow detection) is usually very effective during application development. With memory protection, program errors raise exceptions that are detected immediately and can be traced back to the source code. Without memory protection, program errors cause subtle failures that are difficult to track down. In fact, in a flat memory model, RAM is usually located at physical page zero, so even a NULL pointer dereference cannot be detected.
