Chapter 16: The Page Cache and Page Writeback

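The page cache exists to reduce disk I/O. It keeps disk data in physical memory so that later accesses are served from RAM instead of the disk. This pays off for two reasons: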

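  • Accessing memory is several orders of magnitude faster than accessing a disk.
  • Locality: data that has been accessed once is likely to be accessed again soon (temporal locality), so cached data tends to be reused.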

1. Caching approach

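The page cache consists of physical pages in RAM whose contents correspond to physical blocks on disk. When a read is issued, the kernel first checks whether the data is already in the page cache. If it is (a cache hit), the data is returned from memory; if not (a cache miss), the kernel reads it from disk and adds it to the cache.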

Write caching policies

  • Write-through: the write is performed synchronously both to the cache and to the backing store.
  • Write-back (also called write-behind): initially, the write goes only to the cache. The write to the backing store is postponed until the cache blocks containing the data are about to be modified or replaced by new content.

Linux uses write-back: a write marks the cached page dirty, and dirty pages are written to disk later (see section 3).

Cache eviction policies

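Deciding which cached pages to discard when memory is needed is the job of a page replacement algorithm. Ideally the cache evicts the pages least likely to be used soon. Linux approximates this with a variant of LRU (least recently used) that keeps pages on two lists, an active list and an inactive list, and evicts only from the inactive list.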

2. The Linux page cache

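The goal of the Linux page cache is to cache any page-based object, including many types of files and many types of memory mappings. To keep this general, Linux manages cache entries and page I/O through a dedicated object, struct address_space.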

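  • The address_space structure
    struct address_space is defined in <linux/fs.h>. A file can be mapped by many vm_area_struct instances (one per mapping), but it has only one address_space. The listing below is from a 4.x kernel; later kernels replaced the page_tree radix tree and tree_lock with an XArray named i_pages.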
struct address_space {
    struct inode            *host;          /* owner: inode, block_device */
    struct radix_tree_root  page_tree;      /* radix tree of all pages */
    spinlock_t              tree_lock;      /* and lock protecting it */
    atomic_t                i_mmap_writable;/* count VM_SHARED mappings */
    struct rb_root          i_mmap;         /* tree of private and shared mappings */
    struct rw_semaphore     i_mmap_rwsem;   /* protect tree, count, list */
    /* Protected by tree_lock together with the radix tree */
    unsigned long           nrpages;        /* number of total pages */
    /* number of shadow or DAX exceptional entries */
    unsigned long           nrexceptional;
    pgoff_t                 writeback_index;/* writeback starts here */
    const struct address_space_operations *a_ops;   /* methods */
    unsigned long           flags;          /* error bits */
    spinlock_t              private_lock;   /* for use by the address_space */
    gfp_t                   gfp_mask;       /* implicit gfp mask for allocations */
    struct list_head        private_list;   /* ditto */
    void                    *private_data;  /* ditto */
} __attribute__((aligned(sizeof(long))));
/*
 * On most architectures that alignment is already the case; but
 * must be enforced here for CRIS, to let the least significant bit
 * of struct page's "mapping" pointer be used for PAGE_MAPPING_ANON.
 */
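  • address_space operations
    As with the VFS objects, the a_ops field points to the table of operations for the address_space, represented by struct address_space_operations. Each filesystem supplies its own implementation; readpage() and writepage() are the most important methods.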
struct address_space_operations {
    int (*writepage)(struct page *page, struct writeback_control *wbc);
    int (*readpage)(struct file *, struct page *);
    /* Write back some dirty pages from this mapping. */
    int (*writepages)(struct address_space *, struct writeback_control *);
    /* Set a page dirty. Return true if this dirtied it */
    int (*set_page_dirty)(struct page *page);
    int (*readpages)(struct file *filp, struct address_space *mapping,
                     struct list_head *pages, unsigned nr_pages);
    int (*write_begin)(struct file *, struct address_space *mapping,
                       loff_t pos, unsigned len, unsigned flags,
                       struct page **pagep, void **fsdata);
    int (*write_end)(struct file *, struct address_space *mapping,
                     loff_t pos, unsigned len, unsigned copied,
                     struct page *page, void *fsdata);
    /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
    sector_t (*bmap)(struct address_space *, sector_t);
    void (*invalidatepage) (struct page *, unsigned int, unsigned int);
    int (*releasepage) (struct page *, gfp_t);
    void (*freepage)(struct page *);
    ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
    /*
     * migrate the contents of a page to the specified target. If
     * migrate_mode is MIGRATE_ASYNC, it must not block.
     */
    int (*migratepage) (struct address_space *,
                        struct page *, struct page *, enum migrate_mode);
    bool (*isolate_page)(struct page *, isolate_mode_t);
    void (*putback_page)(struct page *);
    int (*launder_page) (struct page *);
    int (*is_partially_uptodate) (struct page *, unsigned long,
                                  unsigned long);
    void (*is_dirty_writeback) (struct page *, bool *, bool *);
    int (*error_remove_page)(struct address_space *, struct page *);
    /* swapfile support */
    int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
                         sector_t *span);
    void (*swap_deactivate)(struct file *file);
};

3. The flusher threads

Because of the page cache, writes are deferred, and dirty data in memory is written back to disk in the following situations:

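  • When free memory shrinks below a specific threshold, the kernel writes dirty pages back to disk so that the memory can be freed (only clean pages can be evicted).
  • When dirty data has stayed in memory longer than a specific threshold, the kernel writes the expired dirty pages back to disk, so that data does not remain dirty indefinitely.
  • When a user process calls sync() or fsync(), the kernel performs writeback on demand.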

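This work is done by the kernel's flusher threads. The related code is in mm/page-writeback.c and mm/backing-dev.c, and the writeback mechanism itself is implemented in fs/fs-writeback.c.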

Further reading: