8. File System Interface and Framework

8.1 Introduction

The vnode/vfs interface allows multiple types of file systems to coexist within a single system

In these notes, FFS = the Berkeley Fast File System; ufs = FFS implemented within the vnode/vfs framework

8.2 The User Interface to Files

8.2.1 Files and Directories

8.2.2 File Attributes

Commonly supported attributes

Type: regular, directory, FIFO, symbolic link, block device, character device

Number of hard links

File size

Device ID: identifies the logical disk (partition) that holds the file

inode number

User and group id

Timestamps

Permission and mode flags

Sticky bit on an executable file: keep the program image in the swap area after the program exits so that reloading it later is fast

Sticky bit on a directory: restricts who may delete or rename its entries

Write permission on the directory alone is not enough; to delete or rename a file in it, the process must also have

the same uid as the file's owner, or write permission on the file

8.2.3 File Descriptor

inode: unique for the file

Open file object: the status of an open - has offset, mode, etc.

fd: per-process handle that refers to an open file object

Two fds that refer to the same open file object share its offset and access mode

Passing fd between processes

has the effect of passing a reference to the open file object

kernel copies the descriptor to the first free slot in the receiver's fd table

8.2.4 File I/O

Each individual read or write call on a file is atomic

8.2.5 Scatter-Gather I/O

Ordinary read/write copies data between one contiguous buffer in the process address space and a logically contiguous range of the file

readv/writev: copy data between multiple scattered buffers and a logically contiguous range of the file, in a single call
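
A minimal sketch of a gather write with writev(): two separate buffers end up back to back in the file. The function name `gather_write_demo`, the buffer contents, and the temporary file path are made up for the example.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Writes two separate buffers with one writev() call, then reads up to
 * outlen - 1 bytes back from the start of the file into `out`.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t gather_write_demo(char *out, size_t outlen)
{
    char header[] = "hello ";
    char body[] = "world";
    struct iovec iov[2] = {
        { .iov_base = header, .iov_len = strlen(header) },
        { .iov_base = body,   .iov_len = strlen(body) },
    };
    ssize_t total = (ssize_t)(iov[0].iov_len + iov[1].iov_len);
    char path[] = "/tmp/writevXXXXXX";
    ssize_t n = -1;
    int fd;

    if (outlen == 0 || (fd = mkstemp(path)) < 0)
        return -1;
    unlink(path);
    if (writev(fd, iov, 2) == total)     /* one call, two source buffers */
        n = pread(fd, out, outlen - 1, 0);
    if (n >= 0)
        out[n] = '\0';
    close(fd);
    return n;
}
```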

8.2.6 File Locking

The atomicity of individual read/write calls does not guarantee consistency of a file across multiple calls -> locking required

Advisory & mandatory locking

Advisory: not enforced by the kernel; cooperating processes check the locks before accessing the file

Mandatory: enforced by kernel
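
One common way to take an advisory lock is fcntl() record locking. The sketch below locks the whole file around an update; the helper names `set_whole_file_lock` and `locked_write_demo` are hypothetical, and a process that never asks for the lock is not stopped by it.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Sets (F_WRLCK / F_RDLCK) or releases (F_UNLCK) an advisory lock covering
 * the whole file. Returns 0 on success, -1 if another process holds a
 * conflicting lock or on error. */
int set_whole_file_lock(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                        /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLK, &fl);      /* F_SETLK does not block */
}

/* Cooperative update: take the lock, write, release the lock. */
int locked_write_demo(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int rc = -1;
    if (set_whole_file_lock(fd, F_WRLCK) == 0) {
        if (write(fd, "x", 1) == 1)
            rc = 0;
        set_whole_file_lock(fd, F_UNLCK);
    }
    close(fd);
    return rc;
}
```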

8.3 File Systems

The Unix file hierarchy is a collection of subtrees (file systems) connected together to form one big tree rooted at the root file system

8.3.1 Logical Disks

A storage abstraction that the kernel sees as an independent, randomly accessible, linear sequence of blocks

Contains at most one file system

May contain no file system and be used as a swap area

A physical disk can be divided into multiple logical disks

Multiple physical disks can be combined into a single logical disk

A file larger than a disk becomes possible

Mirroring, RAID, and striping become possible
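
As an illustration of combining physical disks, the sketch below maps a logical block number onto a (disk, block) pair for simple concatenation. The struct and function names are hypothetical, and striping or mirroring would use a different mapping.

```c
#include <stddef.h>

struct extent {
    int disk;                 /* index of the physical disk */
    unsigned long block;      /* block number on that disk */
};

/* Maps logical block `lbn` of a logical disk built by concatenating `ndisks`
 * physical disks, where disk_blocks[i] is the size of disk i in blocks.
 * Returns 0 on success, -1 if `lbn` lies beyond the end of the logical disk. */
int map_concat(const unsigned long *disk_blocks, size_t ndisks,
               unsigned long lbn, struct extent *out)
{
    for (size_t i = 0; i < ndisks; i++) {
        if (lbn < disk_blocks[i]) {
            out->disk = (int)i;
            out->block = lbn;
            return 0;
        }
        lbn -= disk_blocks[i];           /* skip past this disk */
    }
    return -1;
}
```

Because the file system only ever sees logical block numbers, a file can span the boundary between two physical disks without knowing it.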

8.4 Special Files

8.4.1 Symbolic Links

8.4.2 Pipes and FIFOs

Both pipes and FIFOs are first-in first-out data streams

FIFO: named pipe

created by 'mknod'

opened by the name by any process which has the permission

deleted explicitly

Pipe: unnamed pipe

created by 'pipe'

inherited by child processes

automatically deleted when no readers or writers remain

Reading from and writing to a pipe or FIFO behave like the corresponding socket operations

BSD implements pipes and FIFOs using sockets, while SVR4 uses STREAMS
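
A small sketch of the unnamed-pipe behaviour described above, kept within one process for brevity (normally the two ends are used by a parent and a child). The function name `pipe_roundtrip` is illustrative.

```c
#include <string.h>
#include <unistd.h>

/* Writes `msg` into an unnamed pipe and reads it back in FIFO order.
 * Once the write end is closed, a further read returns 0 (end of file).
 * Returns 0 if both behaviours are observed, -1 otherwise. */
int pipe_roundtrip(const char *msg)
{
    int p[2];                            /* p[0] = read end, p[1] = write end */
    char buf[64];
    size_t len = strlen(msg);

    if (len >= sizeof buf || pipe(p) < 0)
        return -1;

    int ok = write(p[1], msg, len) == (ssize_t)len;
    close(p[1]);                         /* no writers remain */
    ok = ok && read(p[0], buf, sizeof buf) == (ssize_t)len
            && memcmp(buf, msg, len) == 0;
    ok = ok && read(p[0], buf, sizeof buf) == 0;   /* EOF: the writer is gone */
    close(p[0]);                         /* no readers remain: pipe disappears */
    return ok ? 0 : -1;
}
```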

8.5 File System Framework

Need to support multiple types of file systems within a single system

DOS formatted floppy disk

Network File System, ...

vnode/vfs is the de facto standard

8.6 vnode/vfs Architecture

8.6.1 Objectives

Support several file system types

Continue to provide the single big, homogeneous tree structure

Support network file systems

Modular design

8.6.2 Lessons from Device I/O

Analogy to virtual file system:

Each device driver provides device-specific open/read/write/... operations

The file system calls generic functions to open/read/write/... devices

Implementation of device drivers

The kernel has an array indexed by the major device number

Each slot in the array holds function pointers to the open/read/write/... routines that the corresponding driver provides
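
The device switch idea can be sketched as a table of function pointers. Everything below is a simplified, hypothetical model: the struct `devsw`, the two toy drivers, and `dev_read` are illustration names, not the real kernel definitions.

```c
#include <stddef.h>

/* Simplified device switch entry: one set of entry points per driver. */
struct devsw {
    int (*d_open)(int minor);
    int (*d_read)(int minor, char *buf, size_t n);
};

/* Toy "null" driver: open succeeds, read always reports end of file. */
static int null_open(int minor) { (void)minor; return 0; }
static int null_read(int minor, char *buf, size_t n)
{
    (void)minor; (void)buf; (void)n;
    return 0;
}

/* Toy "zero" driver: read fills the buffer with zero bytes. */
static int zero_open(int minor) { (void)minor; return 0; }
static int zero_read(int minor, char *buf, size_t n)
{
    (void)minor;
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
    return (int)n;
}

/* The switch table, indexed by major device number. */
static const struct devsw cdevsw[] = {
    { null_open, null_read },            /* major 0 */
    { zero_open, zero_read },            /* major 1 */
};
#define NMAJOR (sizeof cdevsw / sizeof cdevsw[0])

/* Generic read: the caller names a (major, minor) pair, never a driver. */
int dev_read(unsigned major, int minor, char *buf, size_t n)
{
    if (major >= NMAJOR)
        return -1;
    return cdevsw[major].d_read(minor, buf, n);
}
```

The vnode/vfs interface applies the same indirection to files and file systems instead of devices.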

8.6.3 Overview of vnode/vfs Interface

Object oriented implementation

Kernel provides base classes for

vfs with generic data structures and virtual functions

vnode with generic data structures and virtual functions

Each file system extends the base classes by

adding file system specific data

implementing the virtual functions

=> a vnode object per file & a vfs object per file system
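
In C, the "base class plus virtual functions" pattern is usually expressed as a struct holding a pointer to a table of function pointers and a pointer to private data. The sketch below is modeled loosely on that idea; the single operation `vop_getsize` and the toy file system are hypothetical, and the real vnode carries many more fields and operations.

```c
#include <stddef.h>

struct vnode;

/* "Virtual functions": one table per file system type. */
struct vnodeops {
    long (*vop_getsize)(struct vnode *vp);
};

/* Base class: generic data shared by all file systems. */
struct vnode {
    const struct vnodeops *v_op;         /* method table */
    void *v_data;                        /* file-system-specific part */
    int v_count;                         /* reference count */
};

/* Private per-file data of a toy file system. */
struct toy_inode {
    long size;
};

static long toy_getsize(struct vnode *vp)
{
    struct toy_inode *ip = vp->v_data;
    return ip->size;
}

static const struct vnodeops toy_vnodeops = { toy_getsize };

/* Binds a generic vnode to the toy file system's data and methods. */
void toy_vnode_init(struct vnode *vp, struct toy_inode *ip)
{
    vp->v_op = &toy_vnodeops;
    vp->v_data = ip;
    vp->v_count = 1;
}

/* Generic kernel code calls through the table without knowing the type. */
long vn_getsize(struct vnode *vp)
{
    return vp->v_op->vop_getsize(vp);
}
```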

8.7 Implementation Overview

vfs objects are connected together as a linked list

The root file system is pointed to by the 'rootvfs' variable

Each vnode has a pointer to the vfs to which it belongs

Each vnode has a collection of methods that implement the virtual functions defined by the kernel

Each process has an fd table whose entries refer to open file objects, each of which points to the corresponding vnode

8.9 Mounting a File System: Creating a New File System

Check whether the logical disk is formatted for the file system type

Create a vfs object

Link the vfs object with the existing vfs objects to form the linked list

Create a vnode for the "/" directory of the new file system

Connect the vnode to the new vfs object

Connect the vnode to the mount point using "mounted-on" link
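
The linking steps above can be sketched with two small structs. This is a toy model under stated assumptions: the format check of the first step is omitted, `toy_mount` is a hypothetical name, and the `vfs_root` field is an illustration shortcut for however the real vfs finds its root vnode.

```c
#include <stdlib.h>

struct vfs;

struct vnode {
    struct vfs *v_vfsp;                  /* file system this vnode belongs to */
    struct vfs *v_vfsmountedhere;        /* file system mounted on this vnode */
};

struct vfs {
    struct vfs *vfs_next;                /* next mounted file system */
    struct vnode *vfs_vnodecovered;      /* mount point ("mounted-on" vnode) */
    struct vnode *vfs_root;              /* assumption: cached root vnode */
};

/* Mounts a new (empty) toy file system on `mountpoint`; `list` is the head
 * of the vfs list. Pass mountpoint == NULL for the root file system.
 * Returns the new vfs, or NULL on failure. */
struct vfs *toy_mount(struct vfs **list, struct vnode *mountpoint)
{
    if (mountpoint != NULL && mountpoint->v_vfsmountedhere != NULL)
        return NULL;                     /* something is already mounted here */

    struct vfs *vfsp = calloc(1, sizeof *vfsp);
    struct vnode *root = calloc(1, sizeof *root);
    if (vfsp == NULL || root == NULL) {
        free(vfsp);
        free(root);
        return NULL;
    }
    root->v_vfsp = vfsp;                 /* root vnode belongs to the new vfs */
    vfsp->vfs_root = root;
    vfsp->vfs_vnodecovered = mountpoint; /* "mounted-on" link */
    if (mountpoint != NULL)
        mountpoint->v_vfsmountedhere = vfsp;

    while (*list != NULL)                /* append, so the root fs stays first */
        list = &(*list)->vfs_next;
    *list = vfsp;
    return vfsp;
}
```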

8.10 Operations on Files

8.10.1 Pathname Traversal

lookuppn() - counterpart of namei() in traditional Unix

input: pathname

output: the pointer to the vnode

Start from the root vnode ("rootdir" variable) or the vnode of the current directory ("u_cdir" variable in the u-area)

Call the current vnode's lookup operation, vnode->lookup(), once for each pathname component
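
A toy version of component-by-component traversal, assuming an in-memory tree in place of real directories. The names `toy_lookup` and `toy_lookuppn` are hypothetical, and the sketch ignores ".", "..", symbolic links, and mount points.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NAMELEN 32

/* Toy vnode: child and sibling links stand in for directory contents. */
struct vnode {
    const char *name;
    struct vnode *child;                 /* first entry, if a directory */
    struct vnode *sibling;               /* next entry in the same directory */
};

/* Stand-in for the per-file-system lookup operation: one component only. */
static struct vnode *toy_lookup(struct vnode *dir, const char *name)
{
    for (struct vnode *vp = dir->child; vp != NULL; vp = vp->sibling)
        if (strcmp(vp->name, name) == 0)
            return vp;
    return NULL;
}

/* Resolves `path` one component at a time. Absolute paths start at
 * `rootdir`, relative paths at `cdir`. Returns the vnode or NULL. */
struct vnode *toy_lookuppn(const char *path, struct vnode *rootdir,
                           struct vnode *cdir)
{
    struct vnode *vp = (path[0] == '/') ? rootdir : cdir;
    char comp[TOY_NAMELEN];

    while (vp != NULL && *path != '\0') {
        while (*path == '/')             /* skip separators */
            path++;
        size_t len = strcspn(path, "/");
        if (len == 0)
            break;                       /* trailing slash or end of path */
        if (len >= sizeof comp)
            return NULL;                 /* too long for this sketch */
        memcpy(comp, path, len);
        comp[len] = '\0';
        vp = toy_lookup(vp, comp);       /* one lookup call per component */
        path += len;
    }
    return vp;
}
```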

8.10.2 Directory Lookup Cache

Global resource shared by vfs objects

Cache vnode pointers for recently accessed files & directories

Hash buckets keyed on the parent directory & file name

Cache hits eliminate disk I/O for directory lookup
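
A deliberately small sketch of such a cache, keyed on (parent vnode, name). The names, table size, and hash function are arbitrary choices for illustration; each bucket holds a single entry here, whereas a real cache chains entries and evicts them in LRU order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DNLC_BUCKETS 64
#define DNLC_NAMELEN 32

struct vnode { int v_id; };              /* placeholder vnode for the sketch */

struct dnlc_entry {
    const struct vnode *parent;          /* directory the name lives in */
    char name[DNLC_NAMELEN];
    struct vnode *vp;                    /* cached result of the lookup */
    int valid;
};

static struct dnlc_entry dnlc[DNLC_BUCKETS];

static size_t dnlc_hash(const struct vnode *parent, const char *name)
{
    size_t h = (size_t)(uintptr_t)parent >> 4;
    while (*name != '\0')
        h = h * 31 + (unsigned char)*name++;
    return h % DNLC_BUCKETS;
}

/* Remembers that `name` in directory `parent` resolves to `vp`. */
void dnlc_enter(const struct vnode *parent, const char *name, struct vnode *vp)
{
    if (strlen(name) >= DNLC_NAMELEN)
        return;                          /* long names are simply not cached */
    struct dnlc_entry *e = &dnlc[dnlc_hash(parent, name)];
    e->parent = parent;
    strcpy(e->name, name);
    e->vp = vp;
    e->valid = 1;
}

/* Returns the cached vnode, or NULL on a miss. */
struct vnode *dnlc_lookup(const struct vnode *parent, const char *name)
{
    struct dnlc_entry *e = &dnlc[dnlc_hash(parent, name)];
    if (e->valid && e->parent == parent && strcmp(e->name, name) == 0)
        return e->vp;                    /* hit: no directory read needed */
    return NULL;                         /* miss: ask the file system */
}
```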

9. File System Implementations

9.1 Introduction

System V file system (s5fs) and FFS are the two representative local file systems in Unix

SVR4 now includes FFS

vnode/vfs made it possible to have both s5fs and FFS within one system

ufs = FFS within the framework of vnode/vfs

Earlier Unix file systems used the "buffer cache" for all file I/O. Modern systems integrate file I/O with virtual memory management and use the buffer cache only for metadata

9.2 System V File System: s5fs

Disk layout: boot area + superblock + inode list + data blocks

Boot area: holds bootstrap code; empty unless the file system is used for booting

Superblock: metadata for file system

Inode list: one inode per file

Data blocks

9.2.1 Directories

A special file containing a list of files

Each entry is 16 bytes long: 2 bytes for the inode number & 14 bytes for the file name

-> at most 65535 files per file system (16-bit inode numbers, with 0 reserved to mark an unused entry)
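
The entry layout can be written down directly as a struct. The search helper `s5_dir_search` is a hypothetical name for illustration; it compares at most 14 characters, so longer names are effectively truncated.

```c
#include <stdint.h>
#include <string.h>

#define S5_NAMELEN 14

/* On-disk directory entry as described above: 16 bytes in total. */
struct s5_dirent {
    uint16_t d_ino;                      /* inode number; 0 = unused slot */
    char     d_name[S5_NAMELEN];         /* NUL-padded; a 14-character name
                                            has no terminating NUL */
};

/* Scans `n` entries for `name`; returns its inode number, or 0 if absent. */
uint16_t s5_dir_search(const struct s5_dirent *ents, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (ents[i].d_ino != 0 &&
            strncmp(ents[i].d_name, name, S5_NAMELEN) == 0)
            return ents[i].d_ino;
    }
    return 0;
}
```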

9.2.2 Inodes

Metadata for a file

On-disk inode & in-core inode

Fields of on-disk inode

mode: file type (regular, dir, block device, ...), suid, sgid, sticky, access right

number of links

...

data block pointers

10 direct blocks

1 indirect block

1 double-indirect block

1 triple-indirect block
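
To see how the pointers divide up a file, the sketch below classifies a logical block number by the level of indirection needed to reach it. The function name is hypothetical, and `nindir` (pointers per indirect block) depends on block and pointer size, which these notes do not fix.

```c
#define NDIRECT 10    /* direct block pointers in the inode */

enum blk_level { BLK_DIRECT, BLK_SINGLE, BLK_DOUBLE, BLK_TRIPLE, BLK_TOO_BIG };

/* Classifies logical block `lbn` of a file. `nindir` is the number of block
 * pointers that fit in one indirect block (block size / pointer size). */
enum blk_level s5_block_level(unsigned long long lbn, unsigned long long nindir)
{
    if (lbn < NDIRECT)
        return BLK_DIRECT;
    lbn -= NDIRECT;
    if (lbn < nindir)
        return BLK_SINGLE;
    lbn -= nindir;
    if (lbn < nindir * nindir)
        return BLK_DOUBLE;
    lbn -= nindir * nindir;
    if (lbn < nindir * nindir * nindir)
        return BLK_TRIPLE;
    return BLK_TOO_BIG;                  /* beyond the maximum file size */
}
```

For example, assuming 1024-byte blocks and 4-byte pointers (nindir = 256), the direct pointers cover the first 10 KB and the single-indirect block the next 256 KB.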

9.2.3 Superblock

The kernel reads the superblock when it mounts the file system and keeps it in memory

Fields

Size of file system

Size of inode list

Number of free blocks and inodes

First part of free block list

Some free inodes: sort of free inode cache

9.3 s5fs Kernel Organization

9.3.1 In-core inode

Fields

All fields of on-disk inode

Pointer to the vnode: remember that this has the pointer to vfs

Device id

inode number

Flags for synchronization and cache management

Pointer to maintain the free list of inodes

Pointer to maintain hash queues of inodes

9.3.2 inode Lookup

Goal: get the in-core inode, and in turn the vnode pointer, for a given inode number

Hash into the hash queues by the inode number

If found, return the vnode pointer

Otherwise, allocate a free inode from the head of the free list, read the on-disk inode fields from disk, and initialize the in-core-specific fields

9.3.3 File I/O

read(fd, buf, size)

index into the fd table to reach the open file object, and from it get the vnode pointer for the file

check mode of open

lock the vnode, which results in the lock of inode

call virtual read -> s5read

get (logical block number, offset within the block) from file offset

read data pages

map data page number into kernel virtual memory page number

if the page is in memory (= in the buffer cache in earlier releases), copy the data to 'buf'

otherwise, handle the page fault (= bring the block into the buffer cache)

unlock the vnode and advance the file offset by the number of bytes read
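
The offset arithmetic in the middle of this path is simple enough to show in full. The struct and function names below are hypothetical; the block size is a parameter because these notes do not fix one.

```c
struct blkpos {
    unsigned long lbn;                   /* logical block number in the file */
    unsigned long off;                   /* byte offset within that block */
};

/* Splits a byte offset within a file into (logical block, offset in block). */
struct blkpos offset_to_block(unsigned long long file_off, unsigned long bsize)
{
    struct blkpos p;
    p.lbn = (unsigned long)(file_off / bsize);
    p.off = (unsigned long)(file_off % bsize);
    return p;
}

/* Bytes a read of `want` bytes can take from the current block before it
 * has to move on to the next logical block. */
unsigned long bytes_in_block(struct blkpos p, unsigned long bsize,
                             unsigned long want)
{
    unsigned long left = bsize - p.off;
    return want < left ? want : left;
}
```

A read that crosses block boundaries repeats this computation once per block until the requested byte count or end of file is reached.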

9.3.4 Allocating and Reclaiming inodes

When the reference count of a vnode reaches zero, the corresponding inode is freed -> add the inode to the free list:

Remember that the inode can still be found through its hash queue by the inode number

If any data page of the file is in memory, put the inode at the tail of the inode free list

Otherwise, put the inode at the head of the free list

When the kernel needs an inode and cannot find it in core, it takes the inode at the head of the free list

Any data pages of the previous file that are still cached under that inode must be released
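
The lookup of 9.3.2 and the free-list policy of 9.3.4 fit together as sketched below. This is a toy model under several assumptions: a fixed table of three in-core inodes, hashing on the inode number alone, and a `has_pages` flag standing in for "some data page of the file is in memory". All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NINODE 3      /* in-core inode table size (tiny, for illustration) */
#define NHASH  4

struct inode {
    unsigned ino;                        /* inode number; 0 = never used */
    int refcnt;
    bool has_pages;                      /* file still has pages in memory */
    struct inode *hash_next;             /* hash queue link */
    struct inode *free_prev, *free_next; /* free list links */
};

static struct inode itable[NINODE];
static struct inode *hashq[NHASH];
static struct inode free_head;           /* sentinel of a circular free list */

static void free_insert(struct inode *ip, bool at_tail)
{
    struct inode *after = at_tail ? free_head.free_prev : &free_head;
    ip->free_next = after->free_next;
    ip->free_prev = after;
    after->free_next->free_prev = ip;
    after->free_next = ip;
}

static void free_remove(struct inode *ip)
{
    ip->free_prev->free_next = ip->free_next;
    ip->free_next->free_prev = ip->free_prev;
    ip->free_prev = ip->free_next = NULL;
}

static void hash_remove(struct inode *ip)
{
    struct inode **pp = &hashq[ip->ino % NHASH];
    while (*pp != NULL && *pp != ip)
        pp = &(*pp)->hash_next;
    if (*pp == ip)
        *pp = ip->hash_next;
}

void icache_init(void)
{
    free_head.free_next = free_head.free_prev = &free_head;
    for (int i = 0; i < NHASH; i++)
        hashq[i] = NULL;
    for (int i = 0; i < NINODE; i++) {
        itable[i] = (struct inode){0};
        free_insert(&itable[i], true);
    }
}

/* Lookup: find the in-core inode, or recycle the head of the free list. */
struct inode *iget(unsigned ino)
{
    struct inode *ip;
    assert(ino != 0);                    /* 0 marks a never-used slot */
    for (ip = hashq[ino % NHASH]; ip != NULL; ip = ip->hash_next) {
        if (ip->ino == ino) {
            if (ip->refcnt++ == 0)
                free_remove(ip);         /* was free but still hashed */
            return ip;
        }
    }
    ip = free_head.free_next;
    if (ip == &free_head)
        return NULL;                     /* every in-core inode is in use */
    free_remove(ip);
    if (ip->ino != 0)
        hash_remove(ip);                 /* forget the previous identity */
    ip->ino = ino;                       /* stand-in for reading the disk inode */
    ip->refcnt = 1;
    ip->has_pages = false;               /* previous file's pages are dropped */
    ip->hash_next = hashq[ino % NHASH];
    hashq[ino % NHASH] = ip;
    return ip;
}

/* Release: on the last reference, queue for reuse but keep it hashed. */
void iput(struct inode *ip)
{
    if (--ip->refcnt == 0)
        free_insert(ip, ip->has_pages);  /* cached pages: tail, else head */
}
```

Putting inodes with cached pages at the tail delays their reuse, so a file that is reopened soon can still find both its inode and its pages in memory.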

9.4 Analysis of s5fs

Reliability: the single copy of the superblock is a single point of failure

Performance

Segregation of the inode list from the data blocks -> long seek between accessing an inode and accessing the data blocks it points to

Random allocation of inodes -> slow access to all the files of a directory

Disk block allocation and deallocation is more or less random -> sequential access to a file is slow

Block size issue: performance vs. fragmentation

Functionality

File names limited to 14 characters

At most 65535 files per file system

9.5 Berkeley FFS

The major differences from s5fs are the disk layout, the on-disk structures, and the free block allocation methods

9.6 Hard Disk Structure

9.7 On-Disk Organization

A disk partition comprises a set of consecutive cylinders

Cylinder group: a small set of consecutive cylinders

Store related information in the same cylinder group -> reduce head movement

The traditional superblock is split into

file system info:

the number, size, and location of cylinder groups; block size; file system size, etc.

replicated in every cylinder group, at a different offset in each

cylinder group info: free list of data and inode blocks

9.7.1 Blocks and Fragments

Block size increased to 8K

-> better performance & no need for triple-indirect block

Fragment

Subdivision of a block

Individually addressable

File = multiple blocks + consecutive fragments (within the last block)
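
The space accounting implied by this layout can be sketched as follows. The names are hypothetical and the fragment size is a parameter, since these notes give only the block size; a tail that needs every fragment of a block is counted as a full block.

```c
struct ffs_layout {
    unsigned long full_blocks;           /* whole blocks used by the file */
    unsigned long frags;                 /* fragments holding the file's tail */
};

/* Space needed for a file of `size` bytes with block size `bsize` and
 * fragment size `fsize` (bsize must be a multiple of fsize). */
struct ffs_layout ffs_space(unsigned long long size,
                            unsigned long bsize, unsigned long fsize)
{
    struct ffs_layout l;
    unsigned long tail = (unsigned long)(size % bsize);

    l.full_blocks = (unsigned long)(size / bsize);
    l.frags = (tail + fsize - 1) / fsize;    /* round the tail up */
    if (l.frags == bsize / fsize) {          /* tail fills a whole block */
        l.full_blocks++;
        l.frags = 0;
    }
    return l;
}
```

With 8 KB blocks and an assumed 1 KB fragment size, a 20000-byte file takes 2 full blocks plus 4 fragments, instead of 3 full blocks.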

9.7.2 Allocation Policies

9.8 FFS Functionality Enhancement

9.9 Analysis

9.10 Temporary File Systems

9.10.1 The Memory File System

9.10.2 The tmpfs File System

9.11 Special-Purpose File Systems

9.11.1 specfs File System

9.11.2 /proc File System

9.11.3 Processor File System

9.11.4 Translucent File System

9.12 Old Buffer Cache

9.12.1 Basic Operation

9.12.2 Buffer Headers

9.12.3 Advantages

9.12.4 Disadvantages

9.12.5 Ensuring File System Consistency