Memory Hierarchy and Cache

4 minute read

Virtual memory

This extends the notes over here // These are a series of note write-ups, kept as a summary for anyone's reference.

Virtual memory is a technique for logically addressing a larger amount of memory than is physically present in the system. This can be useful for multiple reasons, including hiding the actual location of the memory (such as a page file on the HDD) and allowing programs to have their own mapping of addresses, independent of the hardware.

It also allows physical RAM to be allocated in non-contiguous blocks, so that the hardware can be better utilized.

All of this remains transparent to your application: where in physical memory the data is actually located is hidden. In your code you can access what appears to be a large contiguous block of memory, while in hardware that block may be made up of many smaller blocks scattered across physical memory.

This is then taken further by allowing multiple programs to share blocks. Provided they are only reading from that area of memory, this can help avoid loading duplicate data into memory.

Design

In part one, a cache was defined to split the provided address into a block address (the lower bits) and a tag (the higher bits). Virtual memory expands on this notation by packing extra bits at the beginning of the address; these hold the extra information needed to make these addresses mappable to physical addresses.

For example, given a 32-bit hardware address, virtual memory will split this into a virtual page number in the upper 20 bits and a page offset in the lower 12 bits. In this example, pages are 2^12 bytes (4 KB) in size.
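A minimal sketch of that split in C, assuming the 20/12-bit layout described above (the macro and function names are illustrative, not from any particular OS):

```c
#include <stdint.h>

/* Upper 20 bits = virtual page number (VPN), lower 12 bits = page offset. */
#define PAGE_OFFSET_BITS 12
#define PAGE_SIZE        (1u << PAGE_OFFSET_BITS)   /* 2^12 = 4096 bytes */

static inline uint32_t vpn_of(uint32_t vaddr)    { return vaddr >> PAGE_OFFSET_BITS; }
static inline uint32_t offset_of(uint32_t vaddr) { return vaddr & (PAGE_SIZE - 1); }
```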

The virtual page number is used to index into a page table, which stores the mapping from the virtual page to the corresponding physical address of the page that backs that data.
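Continuing the sketch above, a hypothetical flat page table indexed by the VPN might look like this (the entry layout is an assumption for illustration only):

```c
/* One entry per virtual page, indexed by the VPN. */
typedef struct {
    uint32_t frame : 20;   /* physical frame number backing this page */
    uint32_t valid : 1;    /* is the mapping currently present in RAM? */
} pte_t;

static pte_t page_table[1u << 20];   /* 2^20 entries for a 20-bit VPN */

/* Translate a virtual address; returns 0 on a page fault. */
static int translate(uint32_t vaddr, uint32_t *paddr)
{
    pte_t pte = page_table[vpn_of(vaddr)];
    if (!pte.valid)
        return 0;                     /* invalid mapping: the OS must step in */
    *paddr = ((uint32_t)pte.frame << PAGE_OFFSET_BITS) | offset_of(vaddr);
    return 1;
}
```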

TLB (Translation Look-aside Buffer)

This is a small cache that buffers the translation from a virtual address to the physical memory address; like the main system cache, it stores only a small subset of the translations. If an entry is not found in this cache, it needs to be determined whether this is a simple cache miss or a page fault. If it is a page fault (i.e. an invalid mapping), an exception is raised to the host processor. If it is just a TLB miss, then the full address is looked up in the main page table.
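The lookup order can be sketched as follows, reusing the helpers from the earlier snippets. The TLB here is a tiny illustrative array (real TLBs are hardware structures, and the replacement policy below is deliberately naive):

```c
#define TLB_ENTRIES 16

typedef struct { uint32_t vpn, frame; int valid; } tlb_entry_t;
static tlb_entry_t tlb[TLB_ENTRIES];

enum lookup_result { TLB_HIT, TLB_MISS, PAGE_FAULT };

static enum lookup_result lookup(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vpn_of(vaddr);

    /* 1. Probe the TLB first: the fast path, no page table access. */
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].frame << PAGE_OFFSET_BITS) | offset_of(vaddr);
            return TLB_HIT;
        }
    }

    /* 2. TLB miss: fall back to the full page table walk. */
    if (translate(vaddr, paddr)) {
        tlb[vpn % TLB_ENTRIES] = (tlb_entry_t){ vpn, *paddr >> PAGE_OFFSET_BITS, 1 };
        return TLB_MISS;
    }

    /* 3. Invalid mapping: raise a page fault for the OS to handle. */
    return PAGE_FAULT;
}
```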

A miss can carry a very high cost, sometimes hundreds of clock cycles to resolve, or even longer in the case of multi-level page tables.

If the requested page is no longer available (for example, it has been paged out from RAM to the hard disk), then a page fault is raised and the host OS can handle it: retrieving the page and causing the program to wait. (The OS can schedule other programs in the meantime.)

Because the cost of a page fault is so high, main memory is almost always treated as a fully associative cache for pages, meaning that any entry can be mapped to any location.

Page table size and layout

On a 32-bit system the page table can be quite large and consume a significant portion of system RAM; on a 64-bit system it becomes completely impractical.
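A back-of-the-envelope calculation shows why, assuming 4 KB pages and a single flat table. The 48-bit virtual address width and the 4/8-byte entry sizes are assumptions chosen for illustration:

```c
#include <stdio.h>

int main(void)
{
    unsigned long long entries32 = 1ULL << (32 - 12);  /* 2^20 entries */
    unsigned long long entries64 = 1ULL << (48 - 12);  /* 2^36 entries */

    /* 4-byte entries on 32-bit, 8-byte entries on 64-bit */
    printf("32-bit: %llu entries -> %llu MB per process\n",
           entries32, entries32 * 4 / (1ULL << 20));    /* ~4 MB   */
    printf("64-bit: %llu entries -> %llu GB per process\n",
           entries64, entries64 * 8 / (1ULL << 30));    /* ~512 GB */
    return 0;
}
```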

One possible solution to this issue would be to increase the page size (and proportionally reduce the size of the page table). Intel IA-32 offers a 4 MB page size for this reason.

Another alternative would be to use a hash table to store the mappings, but this can come with a significant runtime computation cost without extensive hardware support.

Windows and UNIX instead use a third option: multi-level page tables. These are a set of tables that use pointers to sub-tables to resolve the address. When a virtual address is passed in, the top-level table resolves the first few bits, and the entry stored under that tag is the memory address of the next table. This can continue for as many levels as desired.
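A minimal sketch of a two-level walk, assuming a 10/10/12-bit split of a 32-bit address (the split and structure names are illustrative, not a specific OS layout). The benefit is that second-level tables only need to be allocated for address ranges actually in use:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint32_t frame[1024]; uint8_t valid[1024]; } leaf_table_t;
typedef struct { leaf_table_t *sub[1024]; } directory_t;

static int walk(const directory_t *dir, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t top = (vaddr >> 22) & 0x3FF;   /* bits 31..22: top-level directory index */
    uint32_t mid = (vaddr >> 12) & 0x3FF;   /* bits 21..12: second-level table index  */
    uint32_t off = vaddr & 0xFFF;           /* bits 11..0:  offset within the 4 KB page */

    leaf_table_t *tab = dir->sub[top];
    if (tab == NULL || !tab->valid[mid])
        return 0;                           /* unmapped range: page fault */

    *paddr = (tab->frame[mid] << 12) | off;
    return 1;
}
```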

Memory Protection

Virtual memory also has another essential function: placing a barrier between each running process and the rest of the system (other programs and the OS).

This is implemented using flags for the privilege level of the running code and also for the memory in the system.

Additional flags can be used to indicate whether the memory is read-only or can be written to.
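An illustrative set of per-page flags and the corresponding check might look like this; the flag names are made up for this sketch, but they mirror the kind of bits a real page table entry carries:

```c
enum page_flags {
    PAGE_PRESENT  = 1 << 0,   /* mapping is valid and resident in RAM            */
    PAGE_WRITABLE = 1 << 1,   /* writes allowed; otherwise the page is read-only */
    PAGE_USER     = 1 << 2,   /* accessible from unprivileged (user-mode) code   */
};

/* Would an access with this privilege level and intent be permitted? */
static int access_allowed(unsigned flags, int is_user_mode, int is_write)
{
    if (!(flags & PAGE_PRESENT))              return 0;  /* page fault          */
    if (is_user_mode && !(flags & PAGE_USER)) return 0;  /* privilege violation */
    if (is_write && !(flags & PAGE_WRITABLE)) return 0;  /* read-only violation */
    return 1;
}
```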

PAE (Physical Address Extension)

PAE was introduced by Intel as a recycling of an old idea from the Apple IIe: your OS may only use 32 bits for its address space, but your system can hold more memory than that can address.

Intel worked around this by keeping the 4 GB limitation for each program, but expanding the page table with an extra level whose wider, 64-bit entries can hold physical hardware addresses beyond the 32-bit limit.