「MIT 6.828」MIT 6.828 Fall 2018 lab2

Lab2—Memory management

Review

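Lab 1 took the CPU from 16-bit real mode to 32-bit protected mode. Before the kernel can really take over memory and build its final page tables, it needs a simple page table under which the kernel code can run at its link address: the physical address of that table is loaded into cr3, and address translation is switched on by setting the paging bit in cr0.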

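That table lives in kern/entrypgdir.c. It maps virtual addresses 0x00000000-0x00400000 and 0xF0000000-0xf0400000 to physical addresses [0, 4MB); a reference to any virtual address outside these two ranges raises a page fault, so only a small part of the virtual address space is covered. entry.S then jumps to the high part of the virtual address space to keep running the kernel, sets up the kernel stack so that C code can run, and finally calls i386_init(). i386_init() prints a few lines to the console with cprintf() and then calls mem_init().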

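Physical memory layout after the boot loader has read in the kernel: the kernel image starts at physical address 0x00100000 (1MB), the start of extended memory, just above the I/O hole [IOPHYSMEM, EXTPHYSMEM).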

Once paging is enabled, addressing goes through three stages (logical address -> linear address -> physical address):

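A virtual address space can be divided into code, data, stack and other segments, and every access to a segment is made as base + offset. The base comes from the descriptor that the segment register's selector picks out of the descriptor table, and the offset is the address the program itself uses (the offset part of the logical address). So linear address = segment base + offset, and the physical address is then found from the linear address through the page tables. (Segmentation first, then paging.)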
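
The two translation stages can be sketched as plain integer arithmetic. This is a minimal illustration, not JOS code: the function names are made up for this note, and the field widths (10 bits of directory index, 10 bits of table index, 12 bits of page offset) are the standard 32-bit x86 two-level split.

```c
#include <assert.h>
#include <stdint.h>

/* Segmentation step: linear address = segment base + offset (32-bit wraparound). */
uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}

/* Paging step: split a 32-bit linear address into its three fields. */
uint32_t dir_index(uint32_t la)   { return (la >> 22) & 0x3FF; } /* page directory index */
uint32_t table_index(uint32_t la) { return (la >> 12) & 0x3FF; } /* page table index */
uint32_t page_offset(uint32_t la) { return la & 0xFFF; }         /* offset within the page */
```

With a segment base of 0, as JOS uses, linear_address(0, 0xf0100000) is just 0xf0100000, which splits into directory index 0x3C0, table index 0x100 and offset 0.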

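Address translation for the JOS kernel works as follows. The kernel starts executing at physical address 0x00100000. Once paging is turned on, the kernel runs in the virtual address space above KERNBASE (0xF0000000), and from then on all JOS code uses virtual addresses only. Paging needs page tables, so until the full paging setup exists JOS uses a hand-written mapping that maps both [KERNBASE, KERNBASE+4MB) and [0, 4MB) to physical memory [0, 4MB); entry_pgdir is the base address of this hand-written page directory. After that, the kernel jumps to i386_init(), which calls mem_init() in kern/pmap.c. Most of the work in this lab happens in kern/pmap.c: it builds the page tables that translate virtual addresses to physical addresses, which all later memory allocation relies on.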
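
As a sanity check on the hand-written mapping, here is a small model of what the MMU does under entry_pgdir. It is only a sketch: entry_translate and the NO_MAPPING sentinel are names invented for this note, and a real unmapped access raises a page fault rather than returning a value.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE   0xF0000000u
#define FOUR_MB    0x00400000u
#define NO_MAPPING 0xFFFFFFFFu   /* sentinel for this sketch; the hardware would fault */

/* Translate a virtual address under the boot-time mapping:
 * [0, 4MB) and [KERNBASE, KERNBASE+4MB) both map to physical [0, 4MB). */
uint32_t entry_translate(uint32_t va)
{
    if (va < FOUR_MB)
        return va;                      /* identity mapping of low memory */
    if (va >= KERNBASE && va - KERNBASE < FOUR_MB)
        return va - KERNBASE;           /* high kernel mapping */
    return NO_MAPPING;
}
```

Both 0x00100000 and 0xf0100000 translate to physical 0x00100000, which is why the kernel can keep running across the jump from low to high addresses in entry.S.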

Introduction

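In this lab you write the memory management code for the OS.

Memory management has two components:

Physical memory allocator: allocates physical memory for the kernel, so that the kernel can allocate memory and later free it.
- The allocator works in units of 4KB, i.e. the page size is 4KB
- write the routines to allocate and free pages of memory.
- maintain data structures that record which physical pages are free, which are allocated, and how many processes are sharing each allocated page.
Virtual memory: maps the virtual addresses used by kernel and user software to addresses in physical memory
- The x86 hardware’s memory management unit (MMU) performs the mapping when instructions use memory, consulting a set of page tables
- In other words, whenever an instruction accesses memory, the MMU translates the virtual address to a physical address by consulting the page tables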

Lab 2 adds the following source files:

inc/memlayout.h
kern/pmap.c
kern/pmap.h
kern/kclock.h
kern/kclock.c

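inc/memlayout.h describes the layout of the virtual address space, which you implement by modifying pmap.c. (The part of pmap.c that queries the hardware to find out how much physical memory there is has already been written; you do not need to know how the CMOS hardware works.)

memlayout.h and pmap.h define the PageInfo structure, which is used to keep track of which physical pages are free.

kclock.c and kclock.h manipulate the PC's battery-backed clock and CMOS RAM hardware, in which the BIOS records how much physical memory the PC contains. pmap.c reads this device to find out how much physical memory there is.

Before starting, get familiar with memlayout.h and pmap.h, and review the definitions in mmu.h.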

    4 Gig -------->  +------------------------------+
                     |                              | RW/--
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     :              .               :
                     :              .               :
                     :              .               :
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
                     |                              | RW/--
                     |   Remapped Physical Memory   | RW/--
                     |                              | RW/--
    KERNBASE, ---->  +------------------------------+ 0xf0000000      --+
    KSTACKTOP        |     CPU0's Kernel Stack      | RW/--  KSTKSIZE   |
                     | - - - - - - - - - - - - - - -|                   |
                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
                     +------------------------------+                   |
                     |     CPU1's Kernel Stack      | RW/--  KSTKSIZE   |
                     | - - - - - - - - - - - - - - -|                 PTSIZE
                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
                     +------------------------------+                   |
                     :              .               :                   |
                     :              .               :                   |
    MMIOLIM ------>  +------------------------------+ 0xefc00000      --+
                     |       Memory-mapped I/O      | RW/--  PTSIZE
 ULIM, MMIOBASE -->  +------------------------------+ 0xef800000
                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
    UVPT      ---->  +------------------------------+ 0xef400000
                     |          RO PAGES            | R-/R-  PTSIZE
    UPAGES    ---->  +------------------------------+ 0xef000000
                     |           RO ENVS            | R-/R-  PTSIZE
 UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
                     +------------------------------+ 0xeebff000
                     |       Empty Memory (*)       | --/--  PGSIZE
    USTACKTOP  --->  +------------------------------+ 0xeebfe000
                     |      Normal User Stack       | RW/RW  PGSIZE
                     +------------------------------+ 0xeebfd000
                     |                              |
                     |                              |
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     .                              .
                     .                              .
                     .                              .
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
                     |     Program Data & Heap      |
    UTEXT -------->  +------------------------------+ 0x00800000
    PFTEMP ------->  |       Empty Memory (*)       |        PTSIZE
                     |                              |
    UTEMP -------->  +------------------------------+ 0x00400000      --+
                     |       Empty Memory (*)       |                   |
                     | - - - - - - - - - - - - - - -|                   |
                     |  User STAB Data (optional)   |                 PTSIZE
    USTABDATA ---->  +------------------------------+ 0x00200000        |
                     |       Empty Memory (*)       |                   |
    0 ------------>  +------------------------------+                 --+

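The diagram above shows the layout of the JOS virtual address space; the JOS kernel lives above KERNBASE. The following global variables in kern/pmap.c are worth noting:

kern_pgdir: pointer to (base address of) the kernel's page directory, the first level of the page table
pages: base of the array of page descriptors. Physical memory is divided into pages, and physical page i is described by pages[i]. (pages holds metadata only; it is not a page table.)
page_free_list: head of the linked list of free physical pages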

Part 1: Physical Page Management

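The OS must keep track of which parts of physical RAM are free and which are in use. JOS manages physical memory at page granularity so that it can use the MMU to map and protect each piece of allocated memory.

You now write the physical page allocator. It keeps track of free pages through struct PageInfo objects, one per physical page.

Note: the physical page allocator has to exist before the virtual memory code, because the page table management code needs to allocate physical memory in which to store page tables.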

Exercise 1

You need to write the physical page allocator before you can write the rest of the virtual memory implementation, because your page table management code will need to allocate physical memory in which to store page tables.

In the file kern/pmap.c, you must implement code for the following functions (probably in the order given).

boot_alloc()
mem_init() (only up to the call to `check_page_free_list(1)`)
page_init()
page_alloc()
page_free()

check_page_free_list() and check_page_alloc() test your physical page allocator. You should boot JOS and see whether check_page_alloc() reports success. Fix your code so that it passes. You may find it helpful to add your own assert()s to verify that your assumptions are correct.

boot_alloc()

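We implement them in order, starting with boot_alloc(uint32_t n):

This is a simple physical memory allocator used only while JOS is setting up its virtual memory system; the real allocator is page_alloc().

If n > 0, it allocates enough contiguous physical pages to hold n bytes, does not initialize them, and returns a kernel virtual address. (Paging is already on with the initial kernel page table, so every address the code handles is a virtual address.)
If n == 0, it returns the address of the next free page without allocating anything.

The code that follows from this description: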

// This simple physical memory allocator is used only while JOS is setting
// up its virtual memory system.  page_alloc() is the real allocator.
//
// If n>0, allocates enough pages of contiguous physical memory to hold 'n'
// bytes.  Doesn't initialize the memory.  Returns a kernel virtual address.
//
// If n==0, returns the address of the next free page without allocating
// anything.
//
// If we're out of memory, boot_alloc should panic.
// This function may ONLY be used during initialization,
// before the page_free_list list has been set up.
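// Paging is already on, and the entry page table maps only [0, 4MB) and [KERNBASE, KERNBASE+4MB), so memory outside those ranges cannot be touched yet.
// boot_alloc hands out contiguous kernel virtual memory right after the kernel image.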
static void *
boot_alloc(uint32_t n)
{
	// nextfree is a static variable, so it keeps its value across calls.
	static char *nextfree;	// virtual address of next byte of free memory; allocate from here, then advance the pointer
	char *result;

	// Initialize nextfree if this is the first time.
	// 'end' is a magic symbol automatically generated by the linker,
	// which points to the end of the kernel's bss segment:
	// the first virtual address that the linker did *not* assign
	// to any kernel code or global variables.
	if (!nextfree) {
		extern char end[];
		nextfree = ROUNDUP((char *) end, PGSIZE);  // first page-aligned address after the kernel's bss
	}

	// Allocate a chunk large enough to hold 'n' bytes, then update
	// nextfree.  Make sure nextfree is kept aligned
	// to a multiple of PGSIZE.
	//
	// LAB 2: Your code here.
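	if (n == 0) {
		return nextfree;
	}
	// nextfree is a kernel virtual address; convert it to a physical address
	// and make sure the request fits in the npages * PGSIZE bytes of detected memory.
	if (PADDR(nextfree) + n > npages * PGSIZE) {
		panic("boot_alloc: out of memory");
	}
	result = nextfree;
	nextfree = ROUNDUP(nextfree + n, PGSIZE);	// keep nextfree page-aligned
	return result;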
}

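The symbol end is defined in kernel.ld: it is the (virtual) address where .bss, the kernel's last loaded section, ends.
So nextfree starts out as the first page-aligned free virtual address above the loaded kernel.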
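
The bump-pointer idea behind boot_alloc can be tried outside the kernel. The sketch below is a user-space model under stated assumptions: addresses are plain integers, struct bump and bump_alloc are names invented here, limit is assumed page-aligned, and exhaustion returns 0 where the kernel would panic.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096u

/* Round a up to the next multiple of PGSIZE (PGSIZE is a power of two). */
uint32_t round_up_page(uint32_t a)
{
    return (a + PGSIZE - 1) & ~(PGSIZE - 1);
}

/* Hypothetical allocator state for this sketch. */
struct bump {
    uint32_t next_free;   /* always page-aligned */
    uint32_t limit;       /* first address past usable memory, page-aligned */
};

/* Returns the start of an n-byte chunk, or 0 if memory is exhausted.
 * n == 0 only reports the next free address without allocating. */
uint32_t bump_alloc(struct bump *b, uint32_t n)
{
    uint32_t result = b->next_free;
    if (n == 0)
        return result;
    if (n > b->limit - b->next_free)
        return 0;
    b->next_free = round_up_page(b->next_free + n);
    return result;
}
```

Allocating a single byte still advances next_free by a whole page, which is why every boot_alloc result is page-aligned.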

  // The initial entry_pgdir maps [KERNBASE, KERNBASE+4MB) and [0, 4MB) to physical [0, 4MB), so subtracting KERNBASE from a high kernel virtual address gives the physical address.
  // That is what this macro does: it takes a kernel virtual address and returns the corresponding physical address.
  /* This macro takes a kernel virtual address -- an address that points above
   * KERNBASE, where the machine's maximum 256MB of physical memory is mapped --
   * and returns the corresponding physical address.  It panics if you pass it a
   * non-kernel virtual address.
   */
  #define PADDR(kva) _paddr(__FILE__, __LINE__, kva)
  
  static inline physaddr_t
  _paddr(const char *file, int line, void *kva)
  {
  	if ((uint32_t)kva < KERNBASE)
  		_panic(file, line, "PADDR called with invalid kva %08lx", kva);
  	return (physaddr_t)kva - KERNBASE;
  }

mem_init()

Now look at mem_init(), the caller of boot_alloc().

It first calls i386_detect_memory(void),

which finds out how much memory is available:

// Find out how much memory the machine has (npages & npages_basemem).
i386_detect_memory();
// The results are stored in the globals npages and npages_basemem.
// npages covers both base and extended memory; the values are read from NVRAM,
// the CMOS RAM hardware in which the BIOS records the amount of physical memory the PC contains.

Next comes kern_pgdir = (pde_t *) boot_alloc(PGSIZE);
- Until now only entry_pgdir, the boot-time page directory, has been set up.
- The entry.S page directory maps the first 4MB of physical memory starting at virtual address KERNBASE (that is, it maps virtual addresses [KERNBASE, KERNBASE+4MB) to physical addresses [0, 4MB)).
- We choose 4MB because that’s how much we can map with one page table and it’s enough to get us through early boot. We also map virtual addresses [0, 4MB) to physical addresses [0, 4MB); this region is critical for a few instructions in entry.S and then we never use it again.
- kern_pgdir is allocated right after the kernel (whose last linked section is .bss) and is zeroed with memset.

Then kern_pgdir is filled in.

See The UVPT (user virtual page table) at the end of this article.

A self-referencing pointer is installed in page directory entry 0x3BD.

//Recursively insert PD in itself as a page table, to form a virtual page table at virtual address UVPT.
// Permissions: kernel R, user R
kern_pgdir[PDX(UVPT)] = PADDR(kern_pgdir) | PTE_U | PTE_P;
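// kern_pgdir[PDX(UVPT)] = kern_pgdir[0x3BD] = physical base address of the page directory, plus permission bits (present, user-readable)
// PADDR takes a kernel virtual address and returns the physical address (it subtracts KERNBASE)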
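
The index 0x3BD and the effect of the self-referencing entry can be checked with a little arithmetic. This is a sketch with helper names invented for this note; the UVPT value comes from the memory layout diagram above.

```c
#include <assert.h>
#include <stdint.h>

#define UVPT 0xEF400000u   /* from the layout diagram */

/* Page directory index of a virtual address (top 10 bits). */
uint32_t pdx(uint32_t va) { return (va >> 22) & 0x3FF; }

/* With the page directory installed as its own page table at slot PDX(UVPT),
 * the PTE for va shows up at this virtual address: 4 bytes per entry,
 * one entry per virtual page number. */
uint32_t uvpt_pte_addr(uint32_t va)
{
    return UVPT + (va >> 12) * 4;
}

/* The page directory itself appears inside the UVPT window, at the slot
 * that would hold the "page table" for the UVPT region. */
uint32_t uvpd_addr(void)
{
    return UVPT + (pdx(UVPT) << 12);
}
```

pdx(UVPT) evaluates to 0x3BD, matching the entry used above, and uvpd_addr() gives 0xEF7BD000.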

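Next comes the part you write yourself.

Allocate an array of npages struct PageInfo entries (one per physical page), store it in pages, give it room in the kernel virtual address space, and zero every field.

npages is the number of physical pages in memory.
pages stores bookkeeping information about those physical pages (not the physical memory itself) and is the base of the descriptor array.

// defined in memlayout.h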
/*
 * Page descriptor structures, mapped at UPAGES.
 * Read/write to the kernel, read-only to user programs.
 *
 * Each struct PageInfo stores metadata for one physical page.
 * Is it NOT the physical page itself, but there is a one-to-one
 * correspondence between physical pages and struct PageInfo's.
 * You can map a struct PageInfo * to the corresponding physical address
 * with page2pa() in kern/pmap.h.
 */
struct PageInfo {
	// Next page on the free list.
	struct PageInfo *pp_link;

	// pp_ref is the count of pointers (usually in page table entries)
	// to this page, for pages allocated using page_alloc.
	// Pages allocated at boot time using pmap.c's
	// boot_alloc do not have valid reference count fields.

	uint16_t pp_ref;
};


// defined in pmap.c
// These variables are set by i386_detect_memory()
size_t npages;			// Amount of physical memory (in pages)
static size_t npages_basemem;	// Amount of base memory (in pages)

// These variables are set in mem_init()
pde_t *kern_pgdir;		// Kernel's initial page directory
struct PageInfo *pages;		// Physical page state array
static struct PageInfo *page_free_list;	// Free list of physical pages

The code added:

n = sizeof(struct PageInfo) * npages;
pages = (struct PageInfo*)boot_alloc(n);
memset(pages,0,n);

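Together, the kernel page directory kern_pgdir (which records where the second-level page tables live; those are not the pages array) and the page descriptor array pages let the kernel manage and map all of physical memory apart from the kernel image itself. At this point the kernel image is still mapped by the hand-built entry_pgdir and entry_pgtable.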

page_init()

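The next step in mem_init() is page_init().

As the hints say, the initial kernel data structures have now been allocated, so the next job is to set up the list of free physical pages. Once that is done, all further memory management goes through the page_* functions.

page_init() has to:

initialize the page structures
build the memory free list

In short, this function initializes the pages array allocated earlier and builds a linked list of PageInfo entries for the free physical pages, headed by the global page_free_list.

After that, boot_alloc() is never used again; physical memory is allocated and freed through the page_xxxx functions, which maintain page_free_list.

// The main task is to decide which pages are free and which are in use.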

// --------------------------------------------------------------
// Tracking of physical pages.
// The 'pages' array has one 'struct PageInfo' entry per physical page.
// Pages are reference counted, and free pages are kept on a linked list.
// --------------------------------------------------------------
// Track physical memory: pages holds one entry per page, and some pages must never be handed out.
//
// Initialize page structure and memory free list.
// After this is done, NEVER use boot_alloc again.  ONLY use the page
// allocator functions below to allocate and deallocate physical
// memory via the page_free_list.
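// From here on boot_alloc (which can only hand out contiguous pages right after the kernel) is retired; only the page allocator functions operate on page_free_list.
// Once this function has run, page_free_list says exactly which memory is free, and allocation happens only from it.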
void
page_init(void)
{
	// The example code here marks all physical pages as free.
	// However this is not truly the case.  What memory is free?
	//  1) Mark physical page 0 as in use.  (It holds the real-mode IDT and BIOS structures and must not be allocated.)
	//     This way we preserve the real-mode IDT and BIOS structures
	//     in case we ever need them.  (Currently we don't, but...)
	//  2) The rest of base memory, [PGSIZE, npages_basemem * PGSIZE)
	//     is free. (Everything from page 1 to the end of base memory is usable.)
	//  3) Then comes the IO hole [IOPHYSMEM, EXTPHYSMEM), which must
	//     never be allocated. (This hole is reserved for I/O.)
	//  4) Then extended memory [EXTPHYSMEM, ...). (Its low end is already used by the kernel.)
	//     Some of it is in use, some is free. Where is the kernel
	//     in physical memory?  Which pages are already in use for
	//     page tables and other data structures?
	//
	// Change the code to reflect this.
	// NB: DO NOT actually touch the physical memory corresponding to
	// free pages!
	size_t i;
	for (i = 0; i < npages; i++) {
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i]; // the list is built in reverse, so allocation uses extended memory first and base memory last
        // NULL <- [0] <- [1] <- [2] <- ... <- page_free_list
	}
	// Following the hints above: 1) page 0 holds the real-mode IDT and BIOS structures, so unlink it from the free list.
	pages[1].pp_link = pages[0].pp_link; // pages[0].pp_link is NULL, so pages[1] now ends the list
	pages[0].pp_ref = 1; // any value works: the page is not on the free list, so it is never allocated
	pages[0].pp_link = NULL; // not on the free list, i.e. in use
	// 2) The rest of base memory is usable, so pages 1 .. npages_basemem-1 stay as they are.
	// 3) The IO hole [IOPHYSMEM, EXTPHYSMEM) must never be allocated.
	size_t range_io = PGNUM(IOPHYSMEM), range_ext = PGNUM(EXTPHYSMEM); // PGNUM(pa) is pa / PGSIZE
	pages[range_ext].pp_link = pages[range_io].pp_link; // link around the hole
	for (i = range_io; i < range_ext; i++) pages[i].pp_link = NULL;  // IO pages are off the free list

	// 4) The start of extended memory holds the kernel, followed by kern_pgdir and the pages array, both handed out by boot_alloc. boot_alloc(0) returns the first address after all of them, so every page below it is in use.
	size_t free_top = PGNUM(PADDR(boot_alloc(0))); // page number of the first free page after the boot-time allocations
	pages[free_top].pp_link = pages[range_ext].pp_link;
	for(i = range_ext; i < free_top; i++) pages[i].pp_link = NULL; // the kernel, the kernel page directory and the pages array must not be reallocated
}
// Note: the kernel is loaded at physical address 1MB = 0x100000, the start of extended memory.

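Looking back, the pages array (an array of PageInfo) set up by this function records, indexed by page number, which parts of physical memory are unused and can be allocated and which are in use and cannot; the latter are taken out of page_free_list.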
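
The four rules from the page_init() comment can be written as a single predicate, which is handy for checking a free list against. This is a sketch, not part of JOS: page_is_free and the first_free_ext parameter (the page number that PGNUM(PADDR(boot_alloc(0))) would give) are names chosen for this note, and it follows the comment strictly, so pages between the end of base memory and the IO hole count as in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PGSIZE     4096u
#define EXTPHYSMEM 0x100000u   /* 1MB: end of the IO hole, start of extended memory */

/* Should physical page i be on the free list after page_init()? */
bool page_is_free(size_t i, size_t npages, size_t npages_basemem, size_t first_free_ext)
{
    if (i >= npages)
        return false;                 /* no such page */
    if (i == 0)
        return false;                 /* rule 1: real-mode IDT and BIOS structures */
    if (i < npages_basemem)
        return true;                  /* rule 2: rest of base memory */
    if (i < EXTPHYSMEM / PGSIZE)
        return false;                 /* rule 3: IO hole (and anything below it past base memory) */
    return i >= first_free_ext;       /* rule 4: kernel, kern_pgdir and pages are in use */
}
```

With 160 pages of base memory, pages 1 to 159 are free, 160 to 255 are not, and extended memory becomes free from first_free_ext upward.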

// An alternative implementation
    pages[0].pp_ref = 1;
    pages[0].pp_link = page_free_list; // null
    for (i = 1; i < npages_basemem; i++) {
        pages[i].pp_ref = 0;
        pages[i].pp_link = page_free_list;
        page_free_list = &pages[i];
    }
    // First page past the pages array, rounded up so that a partly used page is not put on the free list.
    // PADDR subtracts KERNBASE to turn the kernel virtual address into a physical address.
    size_t npages_freeextmem = PGNUM(PADDR(ROUNDUP((char *)(pages + npages), PGSIZE)));
    for (i = npages_freeextmem; i < npages; i++)
    {
        pages[i].pp_ref = 0 ;
        pages[i].pp_link = page_free_list;
        page_free_list = &pages[i];
    }

Once page_init() is done, memory operations go through the page_xxx functions; to create mappings, use boot_map_region or page_insert.

page_alloc() and page_free()

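page_alloc() and page_free() have to be implemented before check_page_free_list() and check_page_alloc() are called.

page_alloc simply takes the node at the head of the list.

Note: it returns only the PageInfo structure of a free physical page, not the page's address.
A PageInfo only describes a physical page (the number of pointers to the page and the link to the next free page). The actual address takes a small conversion: page k has physical base address k << 12.
The physical address is then converted to a virtual address. For pages reached through the kernel mapping this is +/- KERNBASE, because that part of the page table is a linear map with offset KERNBASE; user space goes through the user page tables instead.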

struct PageInfo *
page_alloc(int alloc_flags)
{
    if (!page_free_list) return NULL;
    struct PageInfo* ret = page_free_list;
    page_free_list = page_free_list->pp_link;
    ret -> pp_link = NULL;
    // Paging is on (cr0 and cr3 are set), so the page has to be written through its kernel virtual address, not its physical address. Zero it only when the caller asks for that.
    if (alloc_flags & ALLOC_ZERO)
    {
        memset(page2kva(ret),'\0',PGSIZE);
    }
    // cprintf("ret:x\n",ret);
    return ret;
}
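// page2kva converts through page2pa and KADDR
static inline void*
page2kva(struct PageInfo *pp) // kernel virtual address of the page described by pp (valid only for pages covered by the KERNBASE mapping)
{
	return KADDR(page2pa(pp));
}
// Converts a physical address to a kernel virtual address, i.e. adds KERNBASE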
/* This macro takes a physical address and returns the corresponding kernel
 * virtual address.  It panics if you pass an invalid physical address. */
#define KADDR(pa) _kaddr(__FILE__, __LINE__, pa)
static inline void*
_kaddr(const char *file, int line, physaddr_t pa)
{
	if (PGNUM(pa) >= npages)
		_panic(file, line, "KADDR called with invalid pa %08lx", pa);
	return (void *)(pa + KERNBASE);
}

static inline physaddr_t
page2pa(struct PageInfo *pp) 
{
	return (pp - pages) << PGSHIFT;  // pp - pages is the page number; shifting left by 12 gives the page's physical base address
}
#define PGSHIFT		12		// log2(PGSIZE), PGSIZE = 2^12

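pages holds the PageInfo of every physical page (all of physical memory).
The free ones are linked into page_free_list.

page_free is the counterpart: it inserts a node at the head of the list.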

void
page_free(struct PageInfo *pp)
{
    // Fill this function in
    // Hint: You may want to panic if pp->pp_ref is nonzero or
    // pp->pp_link is not NULL.
    if (pp->pp_ref!=0 || pp->pp_link)
    { // the page must have no remaining references and must not already be linked into a list
        panic("page_free: pp_ref is nonzero or pp_link is not NULL");
    }
    // push the page onto the head of the free list
    pp->pp_link = page_free_list;
    page_free_list = pp ;
}
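
The list discipline used by page_alloc() and page_free() is an ordinary LIFO stack of descriptors. The sketch below models it in user space; struct page_info, free_list_pop and free_list_push are stand-in names for this note, and the push returns -1 where the kernel version panics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct PageInfo in this sketch. */
struct page_info {
    struct page_info *link;   /* next page on the free list */
    uint16_t ref;             /* reference count */
};

/* Take the page at the head of the list, or NULL if the list is empty. */
struct page_info *free_list_pop(struct page_info **head)
{
    struct page_info *pp = *head;
    if (pp == NULL)
        return NULL;
    *head = pp->link;
    pp->link = NULL;          /* an allocated page is not linked anywhere */
    return pp;
}

/* Put a page back at the head. Returns -1 if the page is still referenced
 * or already linked into a list, 0 on success. */
int free_list_push(struct page_info **head, struct page_info *pp)
{
    if (pp->ref != 0 || pp->link != NULL)
        return -1;
    pp->link = *head;
    *head = pp;
    return 0;
}
```

Because both operations work on the head, the most recently freed page is the next one allocated.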

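That completes Exercise 1. To check the implementation, boot JOS: check_page_free_list() and check_page_alloc() should pass and the kernel should then stop at an assertion involving page_insert, and make grade should report Physical page allocator: OK.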


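After Part 1 (mem_init() up to this point), physical memory looks like this: page 0 is in use, the rest of base memory is free, the IO hole [IOPHYSMEM, EXTPHYSMEM) is reserved, the kernel image, kern_pgdir and the pages array occupy the start of extended memory, and everything above them is free.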

Part 2: Virtual Memory

Exercise 2

Before going further, get familiar with the x86 protected-mode memory management architecture: segmentation and page translation.

See chapters 5 and 6 of: https://pdos.csail.mit.edu/6.828/2018/readings/i386/toc.htm

Virtual, Linear, and Physical Addresses

On x86, a virtual address consists of a segment selector and an offset within the segment; it is sometimes also called a logical address.

A linear address is what you get after segment translation but before page translation.
A physical address is what you finally get after both segment and page translation and what ultimately goes out on the hardware bus to your RAM.

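In a C program, an address (for example the value of a C pointer) is only the offset part of the virtual (logical) address. So why do we usually just say "virtual address" when debugging or writing programs?

boot/boot.S has the answer: during boot we installed a GDT (Global Descriptor Table) whose segment descriptors all have base address 0 and limit 0xffffffff. The linear address is therefore base + offset = 0 + offset = offset, so the address a program uses is numerically the same as the linear address. In other words, the GDT setup effectively switches segmentation off.
- A C pointer is the “offset” component of the virtual address. In boot/boot.S, we installed a Global Descriptor Table (GDT) that effectively disabled segment translation by setting all segment base addresses to 0 and limits to 0xffffffff. Hence the “selector” has no effect and the linear address always equals the offset of the virtual address.
- Lab 3 touches segmentation a little to set up privilege levels, but for memory translation we can ignore segments entirely and consider only page translation.

Recall that in Part 3 of lab 1 we installed a simple page table so that the kernel could run at its link address 0xf0100000 even though it is loaded in physical memory just above the ROM BIOS at 0x00100000. That page table mapped only 4MB of memory. In this lab we extend it to map the first 256MB of physical memory starting at virtual address 0xf0000000, and to map a number of other regions of the virtual address space.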

Exercise 3

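GDB can only access QEMU's memory by virtual address, but you can open the QEMU monitor (press Ctrl-a c in the terminal running QEMU; see the manual). With the xp command in the QEMU monitor and the x command in GDB you can inspect physical and virtual memory side by side and check that they agree.

If Ctrl-a c does not work, running the following in the lab directory also opens the monitor:
- qemu-system-i386 -hda obj/kern/kernel.img -monitor stdio -gdb tcp::26000 -D qemu.log
The patched QEMU used in this lab also has info pg, which shows the current page tables in detail.
info mem shows which ranges of virtual memory are mapped and with what permissions.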

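Once we are in protected mode (entered in boot/boot.S), there is no way to use a linear or physical address directly. All memory references are interpreted as virtual addresses and translated by the MMU, which means every pointer in C is a virtual address.

For readability, the JOS source uses uintptr_t for virtual addresses and physaddr_t for physical addresses (both are really just uint32_t).

The JOS kernel sometimes does arithmetic on addresses, which is why these are integer types rather than pointers. A uintptr_t must be cast to a pointer type before it is dereferenced. A physaddr_t must not be dereferenced at all: paging is already on, so the MMU would treat its value as a virtual address and translate it, and the access would land somewhere other than the intended physical address.

Summary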

C type	Address type
`T*`	Virtual
`uintptr_t`	Virtual
`physaddr_t`	Physical

Question 1

Assuming the following JOS kernel code is correct, what type should variable x have, uintptr_t or physaddr_t?

mystery_t x;
char* value = return_a_pointer();
*value = 10;
x = (mystery_t) value;

Answer: value is dereferenced with the * operator, so it holds a virtual address, and x, which is cast from it, should be uintptr_t.

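The JOS kernel sometimes needs to read or modify memory for which it knows only the physical address. For example, adding a mapping to a page table may require allocating physical memory to store a page directory or page table and then initializing that memory. But the kernel cannot bypass virtual address translation, so it cannot load from or store to a physical address directly.

So what can it do?

Recall the initial page table from lab 1, where [0, 4MB) and [KERNBASE, KERNBASE+4MB) both map to [0, 4MB). One reason JOS maps virtual address 0xf0000000 to physical address 0x0 is to give the kernel a simple way to convert between physical and virtual addresses: add or subtract KERNBASE.

Given a physical address pa, adding 0xf0000000 gives the corresponding kernel virtual address; the KADDR(pa) macro does this. Given a kernel virtual address va, subtracting 0xf0000000 gives the physical address; the PADDR(va) macro does this.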
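
The two conversions, including the validity checks that make the real macros panic, can be modelled as follows. This is a user-space sketch: kva_to_pa and pa_to_kva are names invented for this note, and they return -1 where _paddr and _kaddr would call _panic.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0xF0000000u
#define PGSHIFT  12

/* Model of PADDR: only addresses at or above KERNBASE are kernel virtual addresses. */
int kva_to_pa(uint32_t kva, uint32_t *pa)
{
    if (kva < KERNBASE)
        return -1;
    *pa = kva - KERNBASE;
    return 0;
}

/* Model of KADDR: the physical address must fall inside the npages pages
 * of detected memory. With at most 256MB of physical memory, as JOS assumes,
 * pa + KERNBASE cannot wrap around. */
int pa_to_kva(uint32_t pa, uint32_t npages, uint32_t *kva)
{
    if ((pa >> PGSHIFT) >= npages)
        return -1;
    *kva = pa + KERNBASE;
    return 0;
}
```

For example, with 128MB of memory (32768 pages), physical 0x00100000 converts to 0xf0100000 and back, while physical 0x08000000 is rejected because it is the first address past the end of memory.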

Reference counting

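In later labs you will often have the same physical page mapped at several virtual addresses at once. We therefore keep a count of the references to each physical page in the pp_ref field of its PageInfo; when the count drops to zero, the page can be freed. In general, the pp_ref of a physical page equals the number of times it is mapped below UTOP in all page tables (the mappings above UTOP are set up at boot time and are not changed afterwards, so they need no reference counting).

Be careful when using page_alloc: the page it returns always has a reference count of 0, so pp_ref should be incremented as soon as you do something with the page, such as inserting it into a page table. Sometimes another function handles this (page_insert, for example), and sometimes the caller of page_alloc must do it directly.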

Page Table Management

Next comes some code to manage page tables:

to insert and remove linear-to-physical mappings,
to create page table pages when needed.

Exercise 4

In the file kern/pmap.c, you must implement code for the following functions.

        pgdir_walk()
        boot_map_region()
        page_lookup()
        page_remove()
        page_insert()
	

check_page(), called from mem_init(), tests these functions.

1. pgdir_walk()

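As the comment explains, given pgdir, a pointer to the current page directory, the function returns a pointer to the PTE (page table entry) for the linear address va, which means walking the two-level page table structure.

Note that cr3 and the entries in page directories and page tables all hold physical addresses. If they held virtual addresses, looking one up would itself need a page table translation, and the walk would never terminate.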

// Given 'pgdir', a pointer to a page directory, pgdir_walk returns
// a pointer to the page table entry (PTE) for linear address 'va'.
// This requires walking the two-level page table structure.
//
// The relevant page table page might not exist yet (check the Present bit of the pgdir entry).
// If this is true, and create == false, then pgdir_walk returns NULL.
// Otherwise, pgdir_walk allocates a new page table page with page_alloc.
//    - If the allocation fails, pgdir_walk returns NULL.
//    - Otherwise, the new page's reference count is incremented,
//	the page is cleared,
//	and pgdir_walk returns a pointer into the new page table page.
//
// Hint 1: you can turn a PageInfo * into the physical address of the
// page it refers to with page2pa() from kern/pmap.h.
//
// Hint 2: the x86 MMU checks permission bits in both the page directory
// and the page table, so it's safe to leave permissions in the page
// directory more permissive than strictly necessary.
//
// Hint 3: look at inc/mmu.h for useful macros that manipulate page
// table and page directory entries.
// Given a page directory pointer pgdir, return a pointer to the page table entry for linear address va.
pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{   // va is a linear address
	// Fill this function in
	assert(pgdir);  // pgdir itself has no present bit; a scheduled task always has a valid page directory (see the Present Bit reference)
	// With a valid pgdir there are two cases: the page table exists or it does not.
	// If it does not and create is set, allocate a new page table.
	// inc/mmu.h has macros for extracting each field.
    
    pde_t *pde = &pgdir[PDX(va)]; // page directory entry for va
    if (!(*pde & PTE_P)) { // PTE_P says whether this entry is present; here it is not
        if (!create) return NULL;  // not allowed to create one: return NULL
        struct PageInfo *page = page_alloc(ALLOC_ZERO);   // allocate a page table (one page table is exactly one page)
        if (!page) return NULL;  // allocation failed
        assert(page->pp_ref == 0);  // a freshly allocated page has no references
        page->pp_ref++;       // the page directory now refers to it
        assert(page->pp_link == NULL);  // no longer on page_free_list
        *pde = page2pa(page) | PTE_P | PTE_U | PTE_W;  // store the page table's physical base address plus permission bits: PTE_W (read/write), PTE_U (user level)
    }
    // return the page table entry
    return (pte_t*)(KADDR(PTE_ADDR(*pde))) + PTX(va); // PTE_ADDR(*pde) strips the flag bits to get the page table's physical base, KADDR turns that into a kernel virtual address, and adding PTX(va) gives a pointer to the entry for va
}

// Address in page table or page directory entry (from inc/mmu.h)
#define PTE_ADDR(pte)	((physaddr_t) (pte) & ~0xFFF)

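Note: page directories and page tables store physical addresses. When the physical base address of a page table is stored in the page directory, the kernel virtual address is first converted to a physical address with the simple offset.

UVPT, mentioned earlier, works by choosing the directory and table indices so that the resulting virtual address lands on the page tables themselves. The entry at that index still holds a physical address.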
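
The shape of the walk can be tried in user space. The model below is a deliberate simplification and not the kernel code: directory slots hold ordinary C pointers instead of physical addresses with flag bits, calloc stands in for page_alloc(ALLOC_ZERO), and struct sim_pgdir and sim_walk are names invented for this note.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NENTRIES 1024

typedef uint32_t pte_t;

/* Simulated page directory: a NULL slot means "page table not present". */
struct sim_pgdir {
    pte_t *tables[NENTRIES];
};

/* Return a pointer to the entry for va, creating the page table on demand
 * when create is nonzero. Returns NULL if the table is missing and either
 * create is zero or the allocation fails. */
pte_t *sim_walk(struct sim_pgdir *pd, uint32_t va, int create)
{
    uint32_t pdx = (va >> 22) & 0x3FF;   /* index into the directory */
    uint32_t ptx = (va >> 12) & 0x3FF;   /* index into the page table */

    if (pd->tables[pdx] == NULL) {
        if (!create)
            return NULL;
        pd->tables[pdx] = calloc(NENTRIES, sizeof(pte_t));  /* zeroed, like ALLOC_ZERO */
        if (pd->tables[pdx] == NULL)
            return NULL;
    }
    return &pd->tables[pdx][ptx];
}
```

Two addresses in the same 4MB region share one table, so after the first walk with create set, a neighbouring page is found without any further allocation.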

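2. boot_map_region()

This function adds the mapping from virtual range [va, va+size) to physical range [pa, pa+size) to the page table rooted at pgdir.
It is intended for the region above UTOP (the top of the user virtual address space). These mappings are static and never change while the OS runs, so the pp_ref field of the PageInfo for these pages is left untouched.
Implementation: walk the page table tree rooted at pgdir and set each entry so that [va, va+size) maps to [pa, pa+size). va and pa are both page-aligned.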

static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
	// Fill this function in
	size_t pgs = size / PGSIZE;    
	if (size % PGSIZE != 0) {
		pgs++;
	}                            // number of pages to map
	for (int i = 0; i < pgs; i++) {
		pte_t *pte = pgdir_walk(pgdir, (void *)va, 1); // address of the PTE for va, creating the page table if needed
		if (pte == NULL) {
			panic("boot_map_region(): out of memory\n");
		}
		*pte = pa | PTE_P | perm; // point va's PTE at pa; both are page-aligned base addresses, so pa's low bits need no masking
		pa += PGSIZE;             // advance pa and va for the next iteration
		va += PGSIZE;
	}
}
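
The page count at the top of boot_map_region() is a round-up division. A standalone version, with a helper name chosen for this note, looks like this:

```c
#include <assert.h>
#include <stddef.h>

#define PGSIZE 4096u

/* Number of pages needed to cover size bytes. Written as divide-then-adjust
 * rather than (size + PGSIZE - 1) / PGSIZE so it cannot overflow for sizes
 * close to the top of the address range. */
size_t pages_needed(size_t size)
{
    return size / PGSIZE + (size % PGSIZE != 0);
}
```

Mapping the 256MB above KERNBASE (0x10000000 bytes) takes 65536 entries, which is 64 full page tables of 1024 entries each.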

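3. page_lookup()

Returns a pointer to the PageInfo of the physical page mapped at virtual address va (that is, given a virtual address, find the physical page behind it). If pte_store is not zero, the address of the page table entry for this page is stored in it.

This is straightforward: call pgdir_walk to get the page table entry for va, check whether the entry is present, and if so return the page's PageInfo pointer and store the address of the entry in *pte_store.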

// Return the page mapped at virtual address 'va'.
// If pte_store is not zero, then we store in it the address
// of the pte for this page.  This is used by page_remove and
// can be used to verify page permissions for syscall arguments,
// but should not be used by most callers.
//
// Return NULL if there is no page mapped at va.
//
// Hint: the TA solution uses pgdir_walk and pa2page.
//
struct PageInfo *
page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
	// Fill this function in
	pte_t * pte =pgdir_walk(pgdir,(const void*)va,false); // this is only a lookup, so do not create a missing page table
	if(pte==NULL)
		return NULL;
	if(!(*pte & PTE_P)){
		return NULL;
	}

	struct PageInfo* retpp = pa2page(PTE_ADDR(*pte));
	if(pte_store!=NULL){
		*pte_store = pte; 
	}
	return retpp;
}

4. page_remove()

This function removes the mapping between a virtual address and its physical page.

// Unmaps the physical page at virtual address 'va'.
// If there is no physical page at that address, silently does nothing.
//
// Details:
//   - The ref count on the physical page should decrement.
//   - The physical page should be freed if the refcount reaches 0.
//   - The pg table entry corresponding to 'va' should be set to 0.
//     (if such a PTE exists)
//   - The TLB must be invalidated if you remove an entry from
//     the page table.
//
// Hint: The TA solution is implemented using page_lookup,
// 	tlb_invalidate, and page_decref.
//
void
page_remove(pde_t *pgdir, void *va)
{
	// Fill this function in
	pte_t * pte = NULL;
	struct PageInfo* pp= page_lookup(pgdir,va,&pte);
	if(!pp)
		return;
	*pte = 0;  // clear the page table entry for this page
	page_decref(pp); //free and decrease pp_ref here
	tlb_invalidate(pgdir,va);
}

// Related function:
//
// Decrement the reference count on a page,
// freeing it if there are no more refs.
//
void
page_decref(struct PageInfo* pp)
{
	if (--pp->pp_ref == 0)  // one fewer virtual page maps to this physical page
		page_free(pp);     // only when no mapping is left does the page go back on page_free_list
}

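5. page_insert()

Next is page_insert(), with prototype page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm). It maps the physical page pp at virtual address va. (boot_map_region() maps a whole range; this maps a single page.)

The main steps are:

Use pgdir_walk to find the page table entry for va.
Increment pp_ref.
Check the entry to see whether va is already mapped; if so, remove that mapping.
Write the mapping from va to pp into the page table entry.
Finally, invalidate any TLB entry for va.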

// Map the physical page 'pp' at virtual address 'va'.
// The permissions (the low 12 bits) of the page table entry
// should be set to 'perm|PTE_P'.
//
// Requirements
//   - If there is already a page mapped at 'va', it should be page_remove()d.
//   - If necessary, on demand, a page table should be allocated and inserted
//     into 'pgdir'.
//   - pp->pp_ref should be incremented if the insertion succeeds.
//   - The TLB must be invalidated if a page was formerly present at 'va'.
//
// Corner-case hint: Make sure to consider what happens when the same
// pp is re-inserted at the same virtual address in the same pgdir.
// However, try not to distinguish this case in your code, as this
// frequently leads to subtle bugs; there's an elegant way to handle
// everything in one code path.
//
// RETURNS:
//   0 on success
//   -E_NO_MEM, if page table couldn't be allocated
//
// Hint: The TA solution is implemented using pgdir_walk, page_remove,
// and page2pa.
//
int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
	// Fill this function in
	pte_t* pte = pgdir_walk(pgdir,(const void *)va,true); // PTE for va; if the page table does not exist yet, a physical page is allocated for it
	if(pte==NULL){
		return -E_NO_MEM;
	}
    pp->pp_ref++; // this must come before page_remove to handle the corner case where pp is already mapped at va
	if(*pte & PTE_P){
		page_remove(pgdir,va); // if va already maps pp and that is pp's only mapping, page_remove would drop pp_ref to 0 and return pp to page_free_list, freeing the very page we are about to map. Incrementing pp_ref first prevents that.
	}
	*pte = page2pa(pp) | PTE_P | perm;
	tlb_invalidate(pgdir,va);

	return 0;
}

If all goes well, every assert in check_page(), called from mem_init(), now passes.
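
The re-insert corner case is easy to get wrong, so here is a tiny model of it. Everything in it is a stand-in invented for this note: one struct sim_slot plays the role of a single PTE, and on_free_list records whether the page was handed back to the allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sim_page { uint16_t ref; bool on_free_list; };
struct sim_slot { struct sim_page *mapped; };   /* models one page table entry */

/* Model of page_remove: unmap, drop one reference, free on zero. */
void sim_remove(struct sim_slot *slot)
{
    struct sim_page *old = slot->mapped;
    if (old == NULL)
        return;
    slot->mapped = NULL;
    if (--old->ref == 0)
        old->on_free_list = true;
}

/* Model of page_insert: take the new reference first, then remove whatever
 * was mapped before. If the old page is pp itself, its count goes 1 -> 2 -> 1
 * and it is never freed. */
void sim_insert(struct sim_slot *slot, struct sim_page *pp)
{
    pp->ref++;
    sim_remove(slot);
    slot->mapped = pp;
}
```

Inserting the same page twice leaves its count at 1 and keeps it off the free list; inserting a different page afterwards drops the first page's count to 0 and frees it. Swapping the first two statements of sim_insert would free the page on re-insert, which is exactly the bug the ordering avoids.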

Part 3: Kernel Address Space

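JOS divides the processor's 32-bit linear address space into two parts: user space and kernel space.

Loading and running user environments is the subject of lab 3. For now, what matters is that user environments control the lower part of the address space, while the kernel keeps control of the upper part. The dividing line is ULIM in inc/memlayout.h, which reserves roughly 256MB of virtual address space for the kernel, as the layout diagram from inc/memlayout.h shows.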


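This also explains why lab 1 linked the kernel at such a high virtual address even though it is loaded low in physical memory. If the kernel occupied low virtual addresses, there would not be enough room below it to map a user environment in the same address space, so the kernel takes the top of the address space and leaves the low addresses to the user.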

This explains why we needed to give the kernel such a high link address in lab 1: otherwise there would not be enough room in the kernel’s virtual address space to map in a user environment below it at the same time.

JOS maps the 256MB of physical memory [0x00000000, 0x0fffffff] to virtual addresses [0xf0000000, 0xffffffff]. As the lab text puts it: "One reason JOS remaps all of physical memory starting from physical address 0 at virtual address 0xf0000000 is to help the kernel read and write memory for which it knows just the physical address"

Permissions and Fault Isolation

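Kernel and user memory are both present in every environment's address space, so we use permission bits in the page tables to let user code access only the user part of the address space.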

Note that the writable permission bit (PTE_W) affects both user and kernel code!

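User environments have no permission to any memory above ULIM, while the kernel can read and write it. For the range [UTOP, ULIM), kernel and user environment have the same permission: both can read it and neither can write it. This range exposes certain read-only kernel data structures to user environments. Finally, the address space below UTOP is for the user environment to use; the user environment sets the permissions for this memory.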
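
The permission rules can be condensed into one check. This is a simplified model for illustration, not a full description of the x86 MMU: access_allowed is a name made up for this note, the flag values are the usual PTE_P/PTE_W/PTE_U bits, and the sketch assumes CR0.WP is set so that read-only pages are read-only for the kernel too, in line with the remark above that PTE_W affects both user and kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P 0x001u   /* present */
#define PTE_W 0x002u   /* writeable */
#define PTE_U 0x004u   /* user-accessible */

/* Is the access allowed, given the page directory entry and page table entry
 * on the path to the page? A bit counts only if it is set at both levels. */
bool access_allowed(uint32_t pde, uint32_t pte, bool user_mode, bool is_write)
{
    uint32_t both = pde & pte;
    if (!(both & PTE_P))
        return false;                       /* not mapped */
    if (user_mode && !(both & PTE_U))
        return false;                       /* kernel-only page */
    if (is_write && !(both & PTE_W))
        return false;                       /* read-only page (assumes CR0.WP = 1) */
    return true;
}
```

A page in [UTOP, ULIM) mapped with PTE_U | PTE_P is readable from both modes and writable from neither, while a page above ULIM mapped with PTE_W | PTE_P is invisible to user code.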

Initializing the Kernel Address Space

Now set up the address space above UTOP, the kernel part of the address space. inc/memlayout.h shows the layout to use, and the functions you just wrote set up the appropriate linear to physical mappings.

Exercise 5

Fill in the missing code in mem_init() after the call to check_page().

The code has to pass the check_kern_pgdir() and check_page_installed_pgdir() checks.

Hint: use boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm) from earlier, which sets page table entries in the given pgdir so that the virtual range [va, va+size) maps to the physical range [pa, pa+size).

The three calls to add are:

①Map pages read-only by the user at linear address UPAGES.

把pages变量映射到UPAGES区域，kernel和用户都可读取。

//////////////////////////////////////////////////////////////////////
	// Map 'pages' read-only by the user at linear address UPAGES
	// Permissions:
	//    - the new image at UPAGES -- kernel R, user R
	//      (ie. perm = PTE_U | PTE_P)
	//    - pages itself -- kernel RW, user NONE
	// Your code goes here:
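	// Within kern_pgdir, map the virtual address UPAGES to the physical address where the pages array starts
	boot_map_region(kern_pgdir, UPAGES, PTSIZE, PADDR(pages), PTE_U); // boot_map_region sets PTE_P itself
// The pages array in kernel virtual memory is something the kernel can modify (its physical location is low in memory, a small region right after the kernel's .bss, allocated by boot_alloc and reached through the hand-built initial page directory and page table, entry_pgdir and entry_pgtable).
// Here pages is mapped a second time at UPAGES with PTE_U (for applications procedures and data) and the present bit PTE_P, and no other permissions.
// Both user code and the kernel can access the range [UTOP, ULIM).
// UPAGES lies inside that range, but a user access there has no write permission: it is read-only.
// The kernel's own view of pages (in the high kernel virtual address space) is RW, because the hand-built initial page directory and page table set PTE_W.
// So the kernel can reach the physical pages array either through the UPAGES mapping set up here or through its own kernel virtual address, translated by entry_pgdir and entry_pgtable.
// User code can reach pages only through UPAGES, and only for reading.
// An entry with only PTE_P set has its R/W bit at 0, which means read-only. Setting PTE_W makes it read/write.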

perm is set to PTE_U because this region, [UTOP, ULIM), must be accessible to code in both kernel space and user space.

If only the kernel should have access, PTE_U is left unset.
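
To make the permission bits concrete, here is a small sketch of how a page table entry is put together from a physical address and a perm value. The bit values are the standard x86 ones that inc/mmu.h defines. The function names toy_make_pte and toy_pte_addr and the example physical addresses are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096u
#define PTE_P  0x001u  /* present */
#define PTE_W  0x002u  /* writeable */
#define PTE_U  0x004u  /* user */

/* Build a page table entry from a page-aligned physical address and
 * permission bits; PTE_P is always added. */
static uint32_t toy_make_pte(uint32_t pa, uint32_t perm) {
    assert(pa % PGSIZE == 0);
    return pa | perm | PTE_P;
}

/* Physical page address stored in an entry (what PTE_ADDR extracts). */
static uint32_t toy_pte_addr(uint32_t pte) {
    return pte & ~(PGSIZE - 1);
}
```

With an illustrative physical address of 0x0011B000, toy_make_pte(0x0011B000, PTE_U) gives 0x0011B005: present and user-accessible, with the write bit clear.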

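② Use the physical memory that 'bootstack' refers to as the kernel stack.

The kernel stack must be mapped as well: the physical memory starting at PADDR(bootstack), marked by the bootstack symbol, backs the kernel stack. The kernel stack occupies the virtual range [KSTACKTOP-PTSIZE, KSTACKTOP), which is split into two parts:

[KSTACKTOP-KSTKSIZE, KSTACKTOP): this part is mapped in the page table.
[KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE): this part is left unmapped and serves as a guard page. If the kernel stack overflows, the access faults instead of overwriting memory. (In other words, it is invalid memory.)

[Figure: mit232]

The permissions for this range are kernel read/write, user none, so perm is PTE_W.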

//////////////////////////////////////////////////////////////////////
	// Use the physical memory that 'bootstack' refers to as the kernel
	// stack.  The kernel stack grows down from virtual address KSTACKTOP.
	// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
	// to be the kernel stack, but break this into two pieces:
	//     * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
	//     * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
	//       the kernel overflows its stack, it will fault rather than
	//       overwrite memory.  Known as a "guard page".
	//     Permissions: kernel RW, user NONE
	// Your code goes here:
	// 'bootstack' is defined in /kernel/entry.
	boot_map_region(kern_pgdir, KSTACKTOP-KSTKSIZE, KSTKSIZE, PADDR(bootstack), PTE_W);
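
The two stack ranges work out to concrete addresses. The sketch below computes them, assuming the constants have the values inc/memlayout.h and inc/mmu.h are believed to give them (KSTACKTOP equal to KERNBASE, KSTKSIZE of 8 pages, PTSIZE of 4MB). The toy_* helpers are hypothetical names.

```c
#include <stdint.h>

/* Assumed values, taken to match inc/mmu.h and inc/memlayout.h. */
#define PGSIZE    4096u
#define PTSIZE    (1024u * PGSIZE)   /* bytes mapped by one page table: 4MB */
#define KERNBASE  0xF0000000u
#define KSTACKTOP KERNBASE
#define KSTKSIZE  (8u * PGSIZE)

/* Lowest address of the mapped part of the kernel stack. */
static uint32_t toy_stack_bottom(void) { return KSTACKTOP - KSTKSIZE; }

/* Lowest address of the unmapped guard region below it. */
static uint32_t toy_guard_bottom(void) { return KSTACKTOP - PTSIZE; }

/* Nonzero if va falls in the guard region, where any access faults. */
static int toy_in_guard(uint32_t va) {
    return va >= toy_guard_bottom() && va < toy_stack_bottom();
}
```

Under these assumptions the mapped stack is [0xEFFF8000, 0xF0000000) and the guard region is [0xEFC00000, 0xEFFF8000), so a push that runs just past the bottom of the stack lands on an unmapped address.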

③ Map all of physical memory at KERNBASE.

This maps all of physical memory, and with it the whole kernel, at KERNBASE. The permissions are again kernel read/write, user none.

//////////////////////////////////////////////////////////////////////
	// Map all of physical memory at KERNBASE.
	// Ie.  the VA range [KERNBASE, 2^32) should map to
	//      the PA range [0, 2^32 - KERNBASE)
	// We might not have 2^32 - KERNBASE bytes of physical memory, but
	// we just set up the mapping anyway.
// The initial hand-built page table mapped VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB)
// Now the mapping is VA's [KERNBASE, 2^32) to PA's [0, 2^32 - KERNBASE)
// So the new mapping covers the initial one, and the permissions stay PTE_P|PTE_W
	// Permissions: kernel RW, user NONE
	// Your code goes here:
	boot_map_region(kern_pgdir, KERNBASE, 0xffffffff - KERNBASE + 1, 0, PTE_W); // size is 2^32 - KERNBASE, a whole number of pages
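
The size 2^32 - KERNBASE cannot be written directly with a 32-bit constant, because 2^32 itself does not fit in 32 bits. The sketch below shows one way to compute it in unsigned arithmetic, and why 0xffffffff - KERNBASE on its own is one byte short of a whole number of pages. Whether that missing byte matters depends on how boot_map_region turns size into a page count. The toy_* names are hypothetical.

```c
#include <stdint.h>

#define PGSIZE   4096u
#define KERNBASE 0xF0000000u

/* 2^32 - KERNBASE in 32-bit unsigned arithmetic: 0 - KERNBASE wraps
 * around modulo 2^32, which gives exactly the wanted value. */
static uint32_t toy_kern_map_size(void) {
    return (uint32_t)(0u - KERNBASE);
}

/* Number of whole pages in a byte count, rounding down. */
static uint32_t toy_whole_pages(uint32_t size) {
    return size / PGSIZE;
}
```

Here toy_kern_map_size() is 0x10000000, which is 65536 pages, while 0xffffffff - KERNBASE is 0x0fffffff, which rounds down to 65535 whole pages.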

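This completes all of the code for lab 2:

[Figure: mit233]

The resulting memory layout is shown below:

Figure 1: Mapping between physical memory and virtual memory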

Question And Challenge

Reference links:

https://jiyou.github.io/blog/2018/04/19/mit.6.828/jos-lab2/
https://111qqz.github.io/2019/02/mit-6-828-lab-2/#part-2-virtual-memory
https://www.cnblogs.com/gatsby123/p/9832223.html
https://www.cnblogs.com/fatsheep9146/p/5324692.html

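Reference

The UVPT (user virtual page table)

Reference link: https://pdos.csail.mit.edu/6.828/2014/lec/l-josmem.html

An ordinary virtual address in a user program is translated to a physical address through the page tables. But how does a virtual address that refers to a page table itself get translated to that page table's physical address? Does the CPU check whether a virtual address is a page-table address (that would have to be hard-wired, which seems unsafe)? The rest of this section answers that question.

First, here is how x86 translates a virtual address to a physical address:

[Figure: pagetables]

A virtual address is split into three parts: PDX, PTX, and the low bits.
cr3 points to the page directory. PDX indexes the page directory to find a page table, PTX indexes that page table to find the physical base address of the page, and the low bits are the offset within the page, which gives the final physical address.
- Whenever a process is scheduled, cr3 is updated to the physical address of that process's page directory.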
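
The split into PDX, PTX, and low bits is plain bit arithmetic on the 32-bit address: 10 bits, 10 bits, and 12 bits. The sketch below mirrors what the PDX, PTX, PGOFF and PGADDR macros in inc/mmu.h do, written as functions with hypothetical toy_* names.

```c
#include <stdint.h>

/* Top 10 bits: index into the page directory. */
static uint32_t toy_pdx(uint32_t va) { return (va >> 22) & 0x3FFu; }

/* Middle 10 bits: index into the page table. */
static uint32_t toy_ptx(uint32_t va) { return (va >> 12) & 0x3FFu; }

/* Low 12 bits: offset within the 4KB page. */
static uint32_t toy_pgoff(uint32_t va) { return va & 0xFFFu; }

/* Rebuild an address from its three parts. */
static uint32_t toy_pgaddr(uint32_t d, uint32_t t, uint32_t o) {
    return (d << 22) | (t << 12) | o;
}
```

For the address 0xF0100ABC this gives PDX = 0x3C0, PTX = 0x100, and offset 0xABC.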
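The processor has no concept of page directories, page tables, and pages being anything other than plain memory. It does not skip indexing because an address happens to be the page directory's, index once because it is a page table's, and index twice for everything else (So there's nothing that says a particular page in memory can't serve as two or three of these at once). For every virtual address the processor does exactly the same thing:
- pd = lcr3();
- pt = *(pd+4*PDX);
- page = *(pt+4*PTX);
So how does the processor know that a given virtual address refers to a page table or to the page directory? It does not know at all. Instead, a self-referencing pointer is set up:
- Suppose one entry of the page directory (say the entry at index V) holds a pointer back to the page directory itself:
  - Then translating a virtual address whose PDX and PTX both equal V follows three arrows that all end at the physical base of the page directory. With PDX=PTX=V:
    - pd = lcr3(); -----> reads cr3, the physical base address of the page directory
    - pt = *(pd+4*PDX); -----> indexes the page directory with PDX (byte offset 4*PDX = 4V) and gets the pointer in entry V, which points at the page directory base (normally this would point one level down, at a page table, but the self-referencing pointer cancels one level of translation)
    - page = *(pt+4*PTX); -----> indexes the page directory again with PTX (byte offset 4*PTX = 4V) and gets the pointer in entry V, which again points at the page directory base
  - The CPU translates the page directory's virtual address exactly as it translates any other address. The special entry is what cancels the translation steps.
  - In JOS, V is 0x3BD, so the virtual address of the UVPD is (0x3BD << 22) | (0x3BD << 12) (both PDX and PTX are V, so both steps land on entry V).
- Likewise, translating a virtual address with PDX = V and an arbitrary PTX != V yields the physical base address of a page table.
  - In JOS, V is 0x3BD, so the virtual address of the UVPT is (0x3BD << 22) (PDX is V).
Because this "no-op" arrow (the self-referencing pointer above) is inserted into the page directory, the pages that serve as the page directory and the page tables can be mapped into the virtual address space.
- So the page directory and the page tables each have their own virtual addresses.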
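
The self-referencing trick can be checked with a toy simulation of the two-level walk. Everything below is a sketch under stated assumptions: physical memory is modeled as four small frames, the frame layout is made up, and all toy_* names are hypothetical. Only the index V = 0x3BD comes from JOS.

```c
#include <stdint.h>

#define PTE_P     0x001u
#define TOY_V     0x3BDu                    /* index of the self-referencing entry */
#define TOY_UVPT  (TOY_V << 22)
#define TOY_UVPD  (TOY_UVPT | (TOY_V << 12))
#define TOY_FAULT 0xFFFFFFFFu

/* Toy physical memory: frame f sits at physical address f * 4096 and
 * holds 1024 four-byte entries. */
static uint32_t toy_mem[4][1024];

static uint32_t *toy_frame(uint32_t pa) { return toy_mem[pa >> 12]; }

/* Frame 1 is the page directory, frame 2 a page table, frame 3 a data
 * page. Returns the value cr3 would hold. */
static uint32_t toy_setup(void) {
    toy_mem[1][TOY_V] = 0x1000u | PTE_P;    /* page directory points at itself */
    toy_mem[1][5]     = 0x2000u | PTE_P;    /* directory entry 5 -> page table */
    toy_mem[2][7]     = 0x3000u | PTE_P;    /* table entry 7 -> data page */
    return 0x1000u;
}

/* The same two lookups for every address, with no special cases. */
static uint32_t toy_translate(uint32_t cr3, uint32_t va) {
    uint32_t pde = toy_frame(cr3)[va >> 22];
    if (!(pde & PTE_P))
        return TOY_FAULT;
    uint32_t pte = toy_frame(pde & ~0xFFFu)[(va >> 12) & 0x3FFu];
    if (!(pte & PTE_P))
        return TOY_FAULT;
    return (pte & ~0xFFFu) | (va & 0xFFFu);
}
```

In this model, translating TOY_UVPD gives 0x1000, the physical address of the page directory, and translating TOY_UVPT + 5*4096 gives 0x2000, the physical address of the page table behind directory entry 5. An ordinary address such as (5 << 22) | (7 << 12) | 0x10 still translates normally, to 0x3010.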

Present Bit

More details through https://pdos.csail.mit.edu/6.828/2018/readings/i386/s05_02.htm

The Present bit indicates whether a page table entry can be used in address translation. P=1 indicates that the entry can be used.

When P=0 in either level of page tables, the entry is not valid for address translation, and the rest of the entry is available for software use; none of the other bits in the entry is tested by the hardware. Figure 5-11 illustrates the format of a page-table entry when P=0.

If P=0 in either level of page tables when an attempt is made to use a page-table entry for address translation, the processor signals a page exception. In software systems that support paged virtual memory, the page-not-present exception handler can bring the required page into physical memory. The instruction that caused the exception can then be reexecuted. Refer to Chapter 9 for more information on exception handlers.

Note that there is no present bit for the page directory itself. The page directory may be not-present while the associated task is suspended, but the operating system must ensure that the page directory indicated by the CR3 image in the TSS is present in physical memory before the task is dispatched. Refer to Chapter 7 for an explanation of the TSS and task dispatching.
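
As an illustration of "the rest of the entry is available for software use", the sketch below stores a software-defined value in a not-present entry. The encoding is entirely hypothetical (a swap slot number kept in bits 31..1) and is not something JOS does in this lab. It only shows that with P=0 the hardware ignores the other 31 bits.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P 0x001u

/* Hypothetical encoding: keep a 31-bit software value (for example a
 * swap slot number) in bits 31..1 of an entry whose P bit is clear. */
static uint32_t toy_make_swapped(uint32_t slot) {
    assert(slot < (1u << 31));
    return slot << 1;
}

static int toy_is_present(uint32_t pte) { return (pte & PTE_P) != 0; }

/* Recover the software value from a not-present entry. */
static uint32_t toy_swap_slot(uint32_t pte) {
    assert(!toy_is_present(pte));
    return pte >> 1;
}
```

A page-not-present handler in such a system could read the slot number back with toy_swap_slot(), bring the page in, and then install a normal entry with P=1.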

Page-Level Protection

More details through https://pdos.csail.mit.edu/6.828/2018/readings/i386/s06_04.htm

Two kinds of protection are related to pages:

Restriction of addressable domain.
- Supervisor level (U/S=0) – for the operating system and other systems software and related data.
- User level (U/S=1) – for applications procedures and data.
- The current level (U or S) is related to CPL. (Held in the hidden part of the segment register, it records the current privilege level.)
Type checking.
- Read-only access (R/W=0)
- Read/write access (R/W=1)
- When the processor is executing at supervisor level, all pages are both readable and writable.
- When the processor is executing at user level, only pages that belong to user level and are marked for read/write access are writable;
- pages that belong to supervisor level are neither readable nor writable from user level.
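
The rules above fit in one small function. This is a simplified sketch that models only the i386 rules quoted here: it looks at a single entry, whereas real hardware considers the page directory entry and the page table entry together. The name toy_access_ok is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P 0x001u  /* present */
#define PTE_W 0x002u  /* R/W = 1: read/write */
#define PTE_U 0x004u  /* U/S = 1: user level */

/* True if an access is allowed under the rules quoted above.
 *   user:  the processor is executing at user level
 *   write: the access is a write */
static bool toy_access_ok(uint32_t pte, bool user, bool write) {
    if (!(pte & PTE_P))
        return false;   /* not present: page exception */
    if (!user)
        return true;    /* supervisor level: readable and writable */
    if (!(pte & PTE_U))
        return false;   /* supervisor page touched from user level */
    if (write && !(pte & PTE_W))
        return false;   /* user write to a read-only page */
    return true;
}
```

Applied to the UPAGES mapping from Exercise 5 (PTE_P | PTE_U), a user read passes, a user write fails, and a kernel write passes.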

Lab 2 is underway, and there is a lot of it...