**THE FAST FILE SYSTEM**
========================

# 0. Problem with the simple Unix file system

It is not disk-aware; it treats the disk like random-access memory. Hence:
- bad performance
- utilizes only a very small fraction of disk bandwidth (~2%)

Examples of the problem:
1) expensive positioning cost:
   + data blocks of a file are far from the file's inode --> expensive seeks
2) fragmentation because of bad free-space management (a free list):

       A1 A2 B1 B2 C1 C2 D1 D2    in the beginning
       A1 A2 f  f  C1 C2 f  f     B and D are deleted
       A1 A2 E1 E2 C1 C2 E3 E4    E is allocated (4 blocks)

   ==> accessing E requires additional seeks
3) small block size (512 bytes)

Question: how do we make the file system *disk aware*?

# 1. FFS: enforcing disk awareness

FFS seeks to improve both performance and reliability:
- performance:
  + place related stuff close together to avoid seek time
  + disk awareness: parameterization
- reliability: replication of the super block

Note: FFS retains the simple abstraction (i.e. the interface), changing only the underlying implementation.

# 1.1. On-disk structure: cylinder group

- the disk is divided into *cylinder groups*
- each group has:
  + a copy of the super block
    ==> the super block is replicated; if one copy gets corrupted, use another
  + bitmaps for both inodes and data blocks (instead of a free list)
    ==> better than a free list: can find a large chunk of free space to allocate to a file, hence may avoid some fragmentation
  + inodes and data blocks

# 1.2. Policies: place related stuff together

- allocating a directory: find a cylinder group with
  + a low number of allocated directories ==> balances directories across groups
  + a high number of free inodes ==> subsequent files in that directory may be allocated in the same group
- allocating a file:
  + allocate data blocks in the same group as the file's inode ==> avoid long seeks
  + place files in the same directory in the cylinder group of that directory

*Problem*: allocating a large file may entirely fill its first group
==> undesirable, because subsequent "related" files cannot be placed in the same group, which hurts later file-access locality

*Solution*:
+ for a large file, divide it into chunks
+ place the first chunk in the first group
+ place each subsequent chunk in a different group, and so on
==> this might hurt performance, because of the additional seek to read the next chunk
==> hence, the *chunk size* is important here:
  > a large chunk size increases the portion of time spent transferring data (compared to seeking)
  (in the paper: chunk size is 1 MB)

# 1.3. Other things: large block size and parameterization

- 4-KB block size ==> efficient, more data is transferred per disk transaction
  But this causes *internal fragmentation* when the number of small files is large.
  ==> Solution: use *sub-blocks*, each 512 bytes.
  + when a small file, say 512 bytes, is created, it uses up 1 sub-block
    ==> it does not waste an entire 4-KB block
  + as the file grows, FFS keeps allocating 512-byte sub-blocks until it reaches a full 4 KB of data. At that point:
    > FFS finds a whole free 4-KB block,
    > copies the sub-blocks into it,
    > and frees the sub-blocks for future use
  But now comes the problem of *copy overhead*.
  Hence, a hack in the libc library: *the library buffers writes and then issues them in 4-KB chunks to the file system*, thus avoiding the sub-block copying overhead (see the sketch at the end of this section).
- Block layout on disk: parameterization
  Blocks are placed on disk to obtain optimal rotational access, avoiding extra rotations when reading subsequent blocks on a track.
  ==> this still requires at least 2 rotations to read a whole track
  ==> newer disks have a *track buffer* (an internal cache), which removes this complex placement logic from file-system code
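To make the libc write-buffering idea concrete, here is a minimal sketch (not the actual libc code) of a user-level wrapper that accumulates small writes and hands the file system data in full 4-KB units; the names `struct wbuf`, `buffered_write`, and `buffered_flush` are invented for illustration.

```c
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* FFS block size */

/* Per-file write buffer: small writes accumulate here so the file
 * system sees mostly full 4-KB writes and rarely needs to allocate
 * 512-byte sub-blocks and later copy them into a full block. */
struct wbuf {
    int    fd;
    char   data[BLOCK_SIZE];
    size_t used;              /* bytes currently buffered */
};

/* Flush whatever is buffered (used when the buffer fills, or at the end). */
static int buffered_flush(struct wbuf *b) {
    if (b->used > 0) {
        if (write(b->fd, b->data, b->used) != (ssize_t)b->used)
            return -1;
        b->used = 0;
    }
    return 0;
}

/* Buffer the caller's bytes; issue I/O only in BLOCK_SIZE units. */
static int buffered_write(struct wbuf *b, const char *p, size_t n) {
    while (n > 0) {
        size_t room = BLOCK_SIZE - b->used;
        size_t take = n < room ? n : room;
        memcpy(b->data + b->used, p, take);
        b->used += take;
        p += take;
        n -= take;
        if (b->used == BLOCK_SIZE && buffered_flush(b) < 0)
            return -1;
    }
    return 0;
}

int main(void) {
    struct wbuf b = { .fd = STDOUT_FILENO, .used = 0 };
    for (int i = 0; i < 1000; i++)
        buffered_write(&b, "tiny record\n", 12);   /* many small writes */
    buffered_flush(&b);                            /* final partial block */
    return 0;
}
```

With a wrapper like this, only the final partial block of a file can end up in sub-blocks; everything before it reaches the file system as full 4-KB writes.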
# 1.4. File system functionality

FFS also added:
- long file names
- symbolic links: allow cross-file-system references
- atomic rename
- locking (?? which I don't understand much)

# 3.3 Layout policies

- two ways to improve performance:
  1. increase the locality of reference to minimize seek latency
  2. improve the layout of data to make larger transfers possible
- global layout policies:
  + try to improve performance by clustering related information
  + spread unrelated data among different cylinder groups
- inode allocation:
  + place all inodes of files in a directory in the same cylinder group
    ~ since files in the same directory tend to be accessed together (e.g. ls)
  + a new directory is placed in a cylinder group that has a greater-than-average number of free inodes and few existing directories (see the sketch after this list)
    ~ ensures files are distributed throughout the disk
- data block allocation:
  + place a file's data blocks in the same cylinder group as its inode (as in 1.2 above)
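A minimal sketch of the directory-placement heuristic described above. The names `struct cg_summary` and `pick_group_for_new_dir` are invented for illustration; real FFS consults its per-group summary information, this toy version only expresses the policy (above-average free inodes, then fewest directories).

```c
#include <stdio.h>

/* Toy summary of one cylinder group, just enough to express the policy. */
struct cg_summary {
    int free_inodes;   /* inodes still unallocated in this group */
    int ndirs;         /* directories already placed in this group */
};

/* Pick a cylinder group for a new directory: only consider groups with a
 * greater-than-average number of free inodes, and among those take the
 * one with the fewest directories, so directories (and the files later
 * created in them) are spread across the disk. */
static int pick_group_for_new_dir(const struct cg_summary *cg, int ngroups) {
    long total_free = 0;
    for (int i = 0; i < ngroups; i++)
        total_free += cg[i].free_inodes;
    long avg_free = total_free / ngroups;

    int best = -1;
    for (int i = 0; i < ngroups; i++) {
        if (cg[i].free_inodes < avg_free)
            continue;                      /* not enough free inodes */
        if (best < 0 || cg[i].ndirs < cg[best].ndirs)
            best = i;                      /* fewest directories so far */
    }
    return best;   /* -1 only if ngroups == 0 */
}

int main(void) {
    struct cg_summary groups[] = {
        { .free_inodes = 100, .ndirs = 8 },
        { .free_inodes = 300, .ndirs = 2 },
        { .free_inodes = 250, .ndirs = 5 },
    };
    /* Prints group 1: above-average free inodes and the fewest directories. */
    printf("new directory goes in group %d\n",
           pick_group_for_new_dir(groups, 3));
    return 0;
}
```

File inodes then go into the group chosen for their directory, which is what gives "files in the same directory are in the same cylinder group" its locality benefit.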