
The Design and Implementation of a Log-Structured File System

Mendel Rosenblum and John K. Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Trans. on Computer Systems 10(1), February 1992, pp. 26-52.

Reviews due Thursday, 4/9.

Comments

Summary:
The paper introduces a new disk storage management technique, the log-structured file system, and evaluates it by implementing Sprite LFS and comparing its performance with Berkeley Unix FFS.

Problem:
CPU speeds have grown much faster than disk access times, so disk I/O limits the performance of existing file systems, and random writes to disk make this worse. The problem can be mitigated if file modifications are written to disk sequentially.

Solution:
The paper introduces the log-structured file system, which writes file modifications to disk sequentially in a log-like structure. Key features of this solution are:
-a single disk write operation flushes the sequence of changes buffered in the file cache.
-index structures stored in the log to allow random-access retrievals.
-division of the log into segments to maintain large free areas on disk, avoiding the overhead of managing one monolithic log.
-segment summary blocks in each segment to identify the live data during cleaning.
-a write-cost metric, accounting for both new-data writes and cleaning I/O, used to compare segment cleaning policies.
-a cost-benefit policy that lets cold segments be cleaned at a comparatively higher utilization.
-a segment usage table maintaining per-segment utilization statistics.
-fast crash recovery using checkpoints (consistent file system states) and roll-forward (recovering the writes made since the last checkpoint).
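
For reference, the paper quantifies cleaning overhead with a write-cost formula: if u is the fraction of live data in the segments being cleaned, reading N segments costs N segment-transfers, writing their live data back costs Nu, and the remaining N(1-u) of space receives new data, so

\[ \text{write cost} = \frac{\text{total bytes read and written}}{\text{new data written}} = \frac{N + Nu + N(1-u)}{N(1-u)} = \frac{2}{1-u}. \]

(For u = 0 the segments need not be read at all, and the write cost drops to 1.)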

Evaluation:
Sprite LFS improves disk writes for small files compared with Berkeley Unix FFS: it achieves about 70% disk bandwidth utilization for writing new data versus 5-10% for FFS, though in the real implementation it is only about 20% faster than the SunOS file system on some workloads. Crash recovery is not directly benchmarked. And although LFS leaves large-file sequential write performance largely to the hardware, the evaluation against SunOS still shows slightly better comparative performance there as well.

Concerns/Questions:
Can we discuss applying LFS to flash storage, where writes are spread across the address space? How can it be leveraged there, and are there better file system designs for flash?

1. Summary
This paper introduces the log-structured file system, which writes all modifications to disk sequentially in a log-like structure rather than randomly. It discusses the challenges and motivations for such a file system in the early 1990s, and explains the segment cleaning mechanism and crash recovery in detail.

2. Problem
In the early 1990s, processor speeds were increasing at a nearly exponential rate, while disk improvements focused on cost and capacity rather than performance, making access time an I/O bottleneck. Main memory was also growing exponentially, leading to larger file caches. Two issues arise with larger file caches:


  • Larger file caches absorb a greater fraction of the read requests, so disk traffic is increasingly dominated by writes.

  • It is possible to improve performance by writing more blocks in a single sequential transfer with only one seek, but buffering writes increases the amount of data lost during a crash.


Existing file systems suffer from two general problems, given these technology trends and a workload dominated by accesses to small files:

  1. Files are spread around the disk in a way that causes too many small accesses.

  2. For workloads with many small files, the disk traffic is dominated by the synchronous metadata writes, reducing write performance.

3. Contributions
This paper presents a new idea to improve write performance: buffer a sequence of file system changes in the file cache and then write all the changes to disk sequentially in a single disk write operation. The paper addresses two key issues in such a file system:


  • How to retrieve information from the log:

  • Sprite LFS uses a data structure called an inode map to maintain the current location of each inode, and caches inode maps in main memory, reducing the disk accesses required.
  • How to manage the free space on disk efficiently:

  • Sprite LFS uses a combination of threading and copying: it divides the disk into large fixed-size segments and performs segment cleaning to reclaim the space taken by old data (a sketch of the cleaning step follows below).
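
A minimal sketch, in Python, of the copy-and-compact cleaning step; the structures and names are illustrative, not Sprite LFS's actual code:

from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Block:
    inode_num: int   # file this block belongs to
    offset: int      # block offset within that file
    addr: int        # address of this copy in the log

def clean(segments: List[List[Block]], live_addr: Dict[Tuple[int, int], int]) -> List[Block]:
    # live_addr maps (inode_num, offset) to the address of the current live
    # copy, i.e. what the file's inode (found via the inode map) points at.
    survivors = []
    for seg in segments:
        for blk in seg:
            if live_addr.get((blk.inode_num, blk.offset)) == blk.addr:
                survivors.append(blk)   # live: must be copied forward
    return survivors                     # rewritten compactly into clean segments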

4. Evaluation
The paper benchmarks Sprite LFS against FFS implementations of the day. The conclusion of the simulation analysis of Sprite LFS is that low cleaning overheads can be achieved with a simple policy based on cost and benefit.

5. Confusion
I don’t fully understand how “threading” works.

1.Summary
This paper presents a radical new approach to file system design called the Log-Structured File System (LFS), mainly aimed at increasing system I/O performance. LFS aims to utilize the full disk bandwidth by caching large amounts of writable data in main memory and asynchronously performing a sequential write to the log, an on-disk structure, thus speeding up file writing and improving crash recovery. The paper also discusses the mechanism for maintaining large free extents on disk, called segment cleaning, and evaluates several cleaning policies. Results show that LFS achieves an order of magnitude higher performance compared to Unix FFS for small files.

2.Problem
With ever-changing technology, increasing memory sizes and new workloads, I/O performance came to depend mainly on the disk's write performance. Existing file systems utilize only a small percentage of disk bandwidth for actual writes, as most of their time is spent in disk seeks and rotations to access small files spread across the disk. These file systems also perform synchronous random writes, hurting disk performance. The paper suggests a new design: cache large amounts of data in memory and asynchronously write it all at once, sequentially, to an on-disk log, utilizing maximum disk bandwidth.

3.Contributions
One contribution of the paper is the mechanism LFS uses to locate files and access them on disk. LFS maintains an on-disk structure called the log, to which large batches of data are written sequentially, and uses inode map blocks to keep inode and disk block locations up to date. The biggest contribution of the paper is the mechanism and policies needed for free space management. LFS divides the disk into large free extents called segments and uses a combination of threading and copy-and-compact for segment cleaning: a number of segments are read into main memory, and their live data is copied, compacted and written back to clean segments. Live data is identified using a 'segment summary block' written with each segment. The paper also discusses cleaning policies by introducing a metric, the 'write cost': the average disk time needed to write a byte of new data, including cleaning overhead. LFS uses a cost-benefit policy to improve disk bandwidth utilization and reduce cleaning overhead. Finally, crash recovery is discussed: a checkpoint region is updated periodically and used for recovery after a crash, and a roll-forward mechanism helps recover the data written after the last checkpoint.
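
Concretely, the paper's cost-benefit policy rates each candidate segment by

\[ \frac{\text{benefit}}{\text{cost}} = \frac{(1-u)\,\text{age}}{1+u}, \]

where u is the segment's utilization (cleaning it costs one segment read plus u of live-data writeback, and frees 1-u of space) and age is the most recent modified time of any block in the segment. Cleaning the segments with the highest ratio naturally cleans cold segments at higher utilizations than hot ones.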

4.Evaluation
Throughout the paper, the policies implemented are justified by simulations. Segment utilization thresholds for cleaning are simulated, and a cost-benefit policy is used to account for locality of access when choosing segments to clean. Sprite LFS, a prototype, is evaluated against the SunOS file system on small-file read/write micro-benchmarks: LFS has a 10x speedup for creating and deleting files and is slightly faster for reading. For large files, Sprite LFS has comparable performance and is even better in some cases. Overall, LFS achieves disk bandwidth utilization of 65-75% for writing new data, compared to Unix FFS's 5-10%. A lot of empirical data on disk utilization and cleaning overheads is also presented.

5.Confusion
One trivial concept I am missing: why doesn't LFS need inode/disk-block bitmaps, as Unix FFS has, to keep track of free inodes and disk blocks? Why is the write cost still 2 (50% of disk bandwidth) in Fig. 4 even with zero live blocks present in a segment for the no-variance case, while uniform-access LFS can achieve 100% bandwidth? And why does cleaning target the least-utilized segments? Shouldn't it be the opposite, with cleaning done as more data becomes live and free space gets more fragmented?

Summary

This paper presents the log-structured file system, in which writes to the disk are optimized by always writing sequentially in a log-like structure. This speeds up both file writing and crash recovery. The log also contains additional indexing information for read efficiency. Policies to maintain large free areas on disk for fast writing are also discussed.

Problem
The speed of processors increases at a nearly exponential rate, but disk access times improve only slowly. So disk access time creates a bottleneck and causes applications to become disk-bound. This is exacerbated by the fact that other file systems may store a single file and its metadata scattered throughout the disk, resulting in high seek times. If writes can be made sequential, we can obtain a significant speed-up in write performance.

Contributions
The main contribution is to conceive and implement the file system as data permanently stored in a log on disk. This allows all writes to be batched and written sequentially, and also enables fast crash recovery. LFS uses another level of indirection, the inode map, because it doesn't place inodes at fixed positions; they are written to the log. The inode map is divided into blocks that are also written to the log, and a fixed checkpoint region identifies the locations of all the inode map blocks. Free space management, which must maintain large free extents for writing new data, is achieved by the segment cleaning mechanism. The disk is divided into segments; periodically, a number of segments are read into memory, the live data is identified, and then the live data is written to a smaller number of clean segments, compacting it and freeing up contiguous segments.
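
A minimal sketch of the resulting read path, assuming illustrative structure names (not Sprite LFS's actual code):

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Inode:
    block_addrs: List[int]   # pointers to the file's data blocks in the log

def read_block(file_num: int, offset: int,
               inode_map: Dict[int, int],   # file number -> inode address in log
               disk: Dict[int, object]):
    # One extra level of indirection: the (normally cached) inode map gives
    # the inode's current address; the inode gives the data block's address.
    inode = disk[inode_map[file_num]]
    return disk[inode.block_addrs[offset]]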

Evaluation
Sprite LFS is compared with the FFS implementation in SunOS using microbenchmarks. They demonstrate that the raw writing speed of Sprite LFS is an order of magnitude greater than that of FFS for small files. For other workloads, such as reads and large-file accesses, Sprite LFS is comparable to FFS. They found that Sprite LFS permits about 65-75% of a disk's raw bandwidth to be used for writing new data, whereas FFS can only utilize 5-10%.

Confusion
The crash recovery and roll-forward parts are not clear to me. Also, what are the various consistency issues that should be kept in mind during crash recovery?

1. Summary
This paper introduces the Sprite log structured filesystem, a filesystem designed to improve the speed of disk writes by assuming that most reads will happen from data cached in RAM. Writes are made by sequentially appending them to a log, eliminating the disk seeking required in a randomly distributed filesystem.
2. Problem
Hard disks of the day had fallen behind CPUs in terms of performance, while still decreasing in cost and increasing in capacity. Standard filesystems required disk seeks to write files, which ended up accounting for ~90% of disk write time in some situations. The authors aimed to improve write performance with Sprite.
3. Contributions
All updates to the filesystem are organized sequentially, appended to the end of the log as writes happen. New inodes are also appended to the end of the log on write, and an inode map that points to all of them is generally cached entirely in memory. A global list of pointers to the sections of the inode map is stored in a known location on disk, allowing recovery from a crash.
Old data that has been invalidated by new writes is handled by a cleaning system that decides whether to perform a copy-and-compact on it to reduce fragmentation or leave it alone (if it is long-lived, rarely written data). Segments are chosen for cleaning based on their age, their utilization, and the amount of free space that cleaning would generate.
4. Evaluation
Sprite LFS is benchmarked against Unix FFS and shown to be about 10x faster in situations involving the creation and deletion of small files. In write situations it is shown to be able to consume about 70% of available disk bandwidth vs. the Unix FFS, which can only manage about 5%-10%, with the rest of the time spent seeking. LFS is beaten in some cases involving random writes to large files followed immediately by sequential reads.
5. Confusion
It seems to me that modern solid-state hardware negates a lot of the benefits provided by Sprite. Seek time becomes much more negligible; is there still a reason to use a log structure on modern hardware?

Summary:
This paper introduces a log-structured file system, which writes all modifications to disk sequentially in a log-like structure.

Problem:
Disk access times have improved only slowly compared to the CPU and memory. Existing file systems make many small accesses, and the seek overhead of the disk is very large. Besides, they tend to write synchronously. Thus existing file systems are slow when there are many small file accesses.

Contributions:
The paper introduces the log-structured file system. File system changes are buffered in the file cache and then written to disk sequentially in a single disk write operation. Index structures are output in the log to permit random-access retrievals. The segment is the basic unit of free space management on disk. The log is threaded on a segment-by-segment basis, and segments containing long-lived data can skip copying. Every segment has a segment summary block used for cleaning (a liveness-check sketch follows below). No free-block list or bitmap is needed, which simplifies crash recovery. The cost-benefit cleaning policy is used, which allows cold segments to be cleaned at a much higher utilization than hot segments.
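
A hedged sketch of the liveness check the summary block enables; the version-number fast path is as the paper describes, but the structures here are illustrative:

from dataclasses import dataclass

@dataclass
class SummaryEntry:        # one entry per block in the segment summary block
    inode_num: int
    version: int           # file's version when the block was written
    offset: int            # block offset within the file
    addr: int              # where this copy of the block lives

@dataclass
class ImapEntry:
    version: int           # bumped when the file is deleted or truncated
    inode_addr: int        # current location of the file's inode

def is_live(entry: SummaryEntry, inode_map: dict, read_inode) -> bool:
    imap = inode_map[entry.inode_num]
    if imap.version != entry.version:
        return False       # file deleted or truncated: dead, no inode I/O needed
    inode = read_inode(imap.inode_addr)
    # Live only if the file's inode still points at this copy of the block.
    return inode.block_addrs[entry.offset] == entry.addr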

Evaluation:
The authors implement the Sprite LFS system and compare it with the Unix fast file system (FFS). Sprite LFS outperforms FFS for sequential and random writes, and matches it for reads except for sequential reads of files that were written randomly. For small-file performance, Sprite LFS is an order of magnitude faster. Segment cleaning overheads and crash recovery overheads are low.

Confusion:
If we replace the disk with an SSD, how will the performance of FFS and LFS change?

Summary:
This paper talks about the design and development of a log-structured file system called Sprite LFS. The authors show that such a file system can improve the performance of small random writes and that it provides competitive performance in other cases as well. They also discuss the challenges in maintaining such a system (such as space reclamation).

Problem/Motivation:
The authors argue that as CPU speeds increase, the speed of disk accesses becomes a very prominent bottleneck for application performance. They say that while the file systems existing then (around 1991) could handle workloads of long sequential reads/writes (as the seek and rotational latency is amortized over the disk access itself) and small random reads (through file caches in memory), the workload of small random writes is much more difficult to optimize. They posit that a log-structured file system such as the one they develop optimizes performance for this scenario.

Contributions:
-While log-based storage techniques were already in use in write-once media and in databases (for crash recovery), one major contribution of the paper is an efficient mechanism (and the identification of effective policies) for reclaiming the space occupied by 'dead' log entries.
-Also, unlike databases (and like storage systems for write-once media), Sprite LFS uses the log as the final storage format and not just as a recovery mechanism, and it updates indices (such as inode maps) and other segment usage data to keep track of the latest location of data (inodes, to be exact) and of the 'liveness' of blocks within segments of the log.
-The authors also point out that since using the log as the final source of truth turns previously random writes into sequential ones, there is a substantial reduction in disk write latency (even before considering the grouping of writes into segments).
-The cleaning mechanism (to reclaim space occupied by old log entries while avoiding fragmentation) works as follows: Sprite LFS always batches a set of writes into fixed-size segments and performs segment-sized (or multi-segment) disk writes; this grouping of smaller writes further improves performance. When the number of available 'clean' segments falls below a low threshold, data from segments with 'dead/old' entries is combined and compacted into fewer segments until the number of available 'clean' segments rises above a high threshold.
-The paper shows that keeping track of hot and cold segments, preferentially cleaning the 'colder' segments (i.e. segments not frequently changed), and sorting the live blocks from cleaned segments by 'age' can be an effective policy, as the space gained this way is less likely to be invalidated again soon.
-The authors also talk about 'checkpointing', i.e. periodically writing the inode map and segment usage data to a fixed location on disk (alternating between two fixed locations), and then rolling forward from the last successfully checkpointed log segment to the most recent one to update the checkpoint region's indexes and other metadata after a crash (a recovery sketch follows below).
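
A minimal sketch of that recovery sequence, assuming illustrative structures (not Sprite LFS's actual code):

from dataclasses import dataclass
from typing import List

@dataclass
class Checkpoint:
    timestamp: int
    log_pos: int       # position in the log when this checkpoint was taken
    complete: bool     # False if a crash interrupted writing this region

def recover(region_a: Checkpoint, region_b: Checkpoint,
            log_records: List[dict]) -> List[dict]:
    # The two regions are written alternately, so at least one is intact;
    # recovery starts from the newest complete one.
    cp = max((r for r in (region_a, region_b) if r.complete),
             key=lambda r: r.timestamp)
    # Roll-forward: records written after the checkpoint are replayed
    # rather than discarded, recovering recent writes.
    return [rec for rec in log_records if rec["pos"] > cp.log_pos]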

Evaluation:
-The authors run their experiments on Sun-4/260 machines. The paper shows that the time taken to create or delete a new file drops nearly tenfold in Sprite LFS compared with FFS, due to the improvement in small-write performance.
-The authors also report that the cleaning overheads were lower than their earlier simulations had led them to expect (write costs of 1.2-1.6, i.e. only 1.2-1.6 bytes written in total per byte of actual new data, cleaning included).
-They show a crash recovery time of around one second for a one-hour checkpoint interval with their roll-forward mechanism, but do not offer data on any other recovery mechanism's performance for comparison.

Confusion:
-Didn't the database systems that were using logs for crash recovery at the time have their own 'cleaning' mechanisms? If they did, how different was the proposed mechanism from what the database community used?

1. Summary

The paper introduces the concept of log-structured file systems, where data is written sequentially to a circular buffer (a log). In particular, the design of Sprite LFS and its cleaning mechanism are expounded in detail.

2. Problem

In conventional update-in-place file systems such as FFS, block locations are static once assigned by some block allocation policy. All subsequent reads and writes to a block are sent to that location. This works well for workloads with large files but fails for small-file accesses.

Firstly, data writes are spread around the disk because file metadata (e.g. inodes) is separate from file contents. This means one update operation on a file may require up to five different disk seeks, which severely reduces the disk's bandwidth utilization to less than 5%. Secondly, for workloads with small files, traffic is dominated by writes to metadata. And as metadata updates are synchronous, the application needs to wait for the completion of the I/O, so its performance suffers.

3. Contributions

The authors note that increasing main memory sizes have two implications.
1) Most reads won't go to disk because of larger caches, so write requests will dominate disk accesses.
2) Larger memories present the opportunity to buffer writes, making it more efficient to write blocks out.

Hence, LFS seeks to improve write performance by buffering updates, which enables large batched disk transfers. This has the effect of combining small writes into larger ones that utilize the disk bandwidth.
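
A minimal sketch of such a write buffer, with illustrative names (the segment size is of the order Sprite LFS used):

SEGMENT_BYTES = 512 * 1024   # Sprite LFS used segments of roughly this size

class SegmentBuffer:
    def __init__(self, flush):
        self.blocks, self.size, self.flush = [], 0, flush

    def write(self, block: bytes) -> None:
        self.blocks.append(block)
        self.size += len(block)
        if self.size >= SEGMENT_BYTES:   # segment full:
            self.flush(self.blocks)      # one large sequential disk transfer
            self.blocks, self.size = [], 0

# Usage: buf = SegmentBuffer(flush=lambda blks: disk_log.extend(blks))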

To retrieve a file, each file has an associated inode which contains the file's attributes. The inode itself is also in the log, but its position can be found through the inode map, which can be kept in memory.

There is also the problem of ensuring there are large free extents on disk for new writes. Over time, the log fragments, as free space from deleted or updated files is spread throughout it. For this, the disk is divided into large segments and live data is moved around and compacted into fewer segments, maintaining large free segments for future writes. A segment summary block helps identify which blocks in a segment are live (for writing out) and which file each block belongs to (to update the file's inode when the block moves to another segment). A block's liveness can thus be determined by checking whether the file's inode still refers to it. All of the above cleaning is carried out by the segment cleaner. The performance of the system depends on the policies that decide which segments to clean and how to group live blocks into segments; these policies are governed by a cost-benefit approach.

In the case of a crash, a traditional FS cannot determine its last changes, so the system wastes time scanning all the metadata to restore consistency. In an LFS this is easy, as the last data is at the end of the log. To help recovery, LFS maintains the notion of a checkpoint, a position in the log at which all file system structures are consistent. This checkpoint is updated at periodic intervals; at recovery, LFS reverts to it. In addition, LFS uses roll-forward to recover data written after the last checkpoint.

4. Evaluation

On microbenchmarks, LFS is shown to vastly outperform SunOS on small-file accesses and is competitive for larger files, though one special case of reading (a sequential reread after random writes) is much slower in LFS. The difference between temporal and logical locality is also explained, and how each affects the performance of an LFS versus a traditional FS. Overall, LFS achieves 65%-75% disk bandwidth utilization while the UNIX systems only utilize 5%-10%. Finally, LFS recovery time is shown to be on the order of seconds.

5. Confusions

Isn't creating an entire new version of the file in the log every time it is updated expensive compared to updating a file in place? I guess this overhead is much smaller than the extra seeks that the other mechanism requires?

Why are log-structured file systems not used nowadays?

Summary
In this paper the authors present a new way to implement a file system, called a log-structured file system. They discuss how the Unix fast file system of the time is slow for writes of small files, among other issues. A technique is then presented that writes all modifications to disk sequentially, in a log-like fashion, which speeds up both writes and crash recovery. In the evaluation section they show how the new file system, Sprite LFS, can write an order of magnitude faster than Unix FFS and read at the same bandwidth, while providing fast crash recovery and efficient free space management.

Problem
The major problem with other file systems like Unix FFS is that they spread different files all over the disk; when a new file is created, at least five head seeks are required. Hence, in these file systems most of the time is spent seeking and only 5-10% of the disk bandwidth is used for the actual writes. For crash recovery, these file systems must scan the entire disk for consistency checks. A log-structured file system, Sprite LFS, is implemented that buffers modifications in the cache and writes them at the head of the log, all in one place sequentially, thus utilizing 65-75% of the disk bandwidth for actual writes. Also, for crash recovery only the most recent portion of the log needs to be examined.

Contributions
1. Basic design of the file system - While the basic notions of inodes and directories remain the same between Unix FFS and Sprite LFS, the location of the inodes differs. Unix FFS stores all inodes in a fixed area of the disk, while Sprite LFS stores inodes in the log; they are thus spread over the disk, usually stored alongside the file data. The location of an inode can be found from the file number by a lookup in the inode map, whose blocks' locations are stored with the checkpoint information at a fixed location on disk.

2. Free space management - A log-structured file system needs long runs of free space to commit modifications sequentially. When a disk is new, the entire space is free; as files get overwritten or deleted, holes appear. Two techniques can be used to write into these holes. Either the live blocks can be consolidated (defragmented) and the interspersed free space squeezed out, which is called copying; or the live blocks are not moved at all and the log is written only into the free/dead blocks, which is called threading. Sprite LFS uses a combination of these techniques by dividing disk space into segments: within a segment, copying is used to consolidate free space; across segments, threading is used.

3. Crash recovery - At regular intervals, checkpoint information is written containing the locations of the inode map blocks, the segment usage table and a pointer to the last segment written. After a crash, the checkpoint information, together with a technique called roll-forward in which the log segments after the checkpoint are read, is used to recover as much information as possible.
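
Roll-forward also consults a directory operation log, written ahead of the corresponding directory blocks and inodes, so that directory entries and inode reference counts can be made mutually consistent. A hedged sketch of the kind of record the paper describes (field names are illustrative):

from dataclasses import dataclass

@dataclass
class DirLogRecord:
    op: str           # e.g. "create", "link", "rename", "unlink"
    dir_inode: int    # directory whose entry is changing
    entry_pos: int    # position of the entry within the directory
    name: str         # entry name
    file_inode: int   # inode the entry refers to
    ref_count: int    # file's new reference count after the operation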

Evaluation
A comparison is made between Sprite LFS and the FFS used by SunOS. In a benchmark where a large number of small files are read, written and deleted, Sprite LFS achieves better performance than FFS. Since Sprite LFS writes all modifications sequentially, it loses logical locality; hence sequential reads after random writes are worse than in FFS.

Confusion
The concept of directory operation log in the context of crash recovery is not clear to me.

Summary
Ousterhout's team at Berkeley presents the log-structured file system (LFS). The paper details the workings, design approach and implementation of LFS, introducing several new ideas over the file systems available at the time. The FS buffers all writes in segments and writes to disk sequentially, thus using the majority of the available disk bandwidth. There are no in-place updates; instead, the log is organized into segments, and a garbage collector cleans fragmented and outdated segments. Another highlight of LFS is that it allows fast and easy recovery.

Problem
In most existing file systems, information is spread across the disk, causing many small seeks during reads, and writes to metadata happen synchronously whenever a data block is updated. In-place updates mean increased write times due to large seeks. Memory was becoming cheaper, and with larger caches serving many reads without disk access, disk traffic came to consist mostly of writes, a crucial determinant of file system performance. Typical file systems of the time performed poorly on common workloads; FFS, for example, requires a large number of writes for file creation alone. In addition, file systems were not RAID-aware.

Contribution
The basic idea of LFS is close to what is seen in databases. When writing to disk, all updates to data and metadata (inode blocks) are buffered into segments; the size of a segment is typically 512 KB. All writes go first to the segment, and when the segment is full, the entire segment is written at once to disk in one long sequential transfer. This leads to faster, asynchronous writes. There is no requirement that segments be contiguous on disk.

The above features, however, lead to a few challenges. Data and its updates are scattered on disk, each with a different version, so keeping track of the newly placed data and inodes is crucial for access and recovery. LFS introduces the imap, a mapping from inode number to the inode's location on disk. While writing the data, the imap is also written to the log in the same segment, which means the location of the imap itself must now be tracked: all the imap locations are written to a checkpoint region (CR) stored at a fixed location on disk. During a read, the FS looks up the imap; if the imap is not in memory, it reads the CR and fetches the imap. Subsequent lookups then need no extra I/O to find an inode.

No in-place updates means a lot of stale data accumulates and must be garbage-collected. During cleanup, LFS reads a set of segments, writes their live blocks out to other segments and frees the old ones. The next challenge is determining the live blocks: each segment has a segment summary block storing, for each block, its inode number and file offset, and live and dead blocks are determined by matching the contents of the segment summary block against the imap. Another issue is how often garbage collection should happen. The paper introduces the concept of hot and cold segments: hot segments are those whose blocks are frequently overwritten, and the policy is to clean cold segments in preference to hot ones. A segment usage table is maintained to implement the policy; it stores, for each segment, the number of live bytes and the most recent modification time of any block in the segment (see the sketch below). For crash recovery, LFS reads the CR, follows the pointer from the CR to the end of the log, and applies all changes that happened since the last checkpoint.
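
A minimal sketch of the segment usage table and the cost-benefit score it feeds; the structures are illustrative, but the ratio is the paper's:

from dataclasses import dataclass

@dataclass
class SegmentUsage:          # one row of the segment usage table
    live_bytes: int
    last_modified: float     # most recent modified time of any block

def cleaning_priority(seg: SegmentUsage, seg_bytes: int, now: float) -> float:
    u = seg.live_bytes / seg_bytes       # utilization
    age = now - seg.last_modified
    return (1 - u) * age / (1 + u)       # benefit (free space * age) over cost

# The cleaner picks the segments with the highest priority, so cold
# segments are chosen for cleaning even at fairly high utilization.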

Evaluation
Micro-benchmarks on Sprite LFS and SunOS show that LFS is ten times faster for small-file and random-write operations. The write performance of LFS is clearly better, and read performance is similar to Unix; the sequential write pays off for LFS. Sprite LFS is able to use 65%-75% of the disk's raw bandwidth, where Unix uses only 5%-10% for writes. LFS exploits temporal locality: data used at around the same time sits at the same place in the log. If logical locality and temporal locality are about equivalent (sequential read after sequential write), both systems perform about the same; otherwise they differ. Crash recovery in LFS is fast thanks to checkpoints and roll-forward, in contrast to a Unix FS where fsck takes a long time scanning the entire file system.

Confusion
LFS attains crash recovery by reading the checkpoint region and then replaying the log. I did not quite understand the roll-forward mechanism and the importance of the directory operation log in recovery.

Summary :
The paper deals with the design and implementation of a log-structured file system, wherein updates are always written to free locations, as in a log, and never overwrite existing data. It also covers the buffering of updates into segments, the policies of the segment cleaner, and mechanisms for crash recovery. The evaluation is done on a prototype called Sprite LFS, which obtains its major performance benefit for small writes.

Problem :
Disk traffic is mainly dominated by writes, since modifying or creating a file requires multiple random writes of data blocks and metadata. Write performance is heavily impacted by the resulting high number of seeks. The problem is to design a file system that makes effective use of the disk's sequential bandwidth by converting this set of random writes into sequential writes, thus improving write performance.

Contributions :
1. Buffering updates as segments in memory makes it efficient to transfer them to disk in a sequential write with a minimal number of seeks.
2. An additional layer of indirection, the inode map, is used to find the location of the latest version of an inode given the inode number. Since inodes are spread across the disk, this mechanism is required to locate them. The checkpoint region maintains the locations of the inode map blocks.
3. A mix of copying and threading at the segment level is used to free segments: live data from a segment is copied to a new location before the segment is freed.
4. The segment summary block makes it possible to go to the inode and check whether a block is live or dead. Version numbers in the segment summary block can be compared quickly against the inode map to determine block liveness.
5. A cost-benefit policy with the segment usage table is used to distinguish hot and cold segments when choosing segments for cleaning.
6. Checkpointing is used with the roll-forward technique to handle crashes during writes to a segment, recovering the information written since the last checkpoint.

Evaluation :
A couple of microbenchmarks are used to compare the performance of Sprite LFS with SunOS 4.0.3. The evaluation of create, read and delete for small files indicates a 10x improvement over SunOS. Sprite LFS was observed to keep the disk only 17% busy, compared to 85% for SunOS. It has higher write bandwidth for sequential and random writes and the same read bandwidth as SunOS.

Confusion :
Which application level benchmarks would be most suited to LFS? Are there better policies for when to schedule the segment cleaner?

1. Summary

Log structured filesystems can offer serious performance benefits over more classical filesystems in the case of small files or random writes. In addition they can offer better crash tolerance and crash recovery.

2. Problem

Contemporary file systems are inefficient with random writes, and inefficient with manipulating small files. Much of this inefficiency can be traced to the seeks required to get to the relevant data.

3. Contributions

The core concept is that of a log-based filesystem, one where all the data is held in a log. Within the log, the actual structure of the filesystem is similar to classical filesystems, with inodes and data blocks written similarly. But instead of fixed locations for inodes and data, the file system is separated into segments, each containing both inodes and data; when data is changed, the change, with the new inode and changed data blocks, is written out to a new segment. Changes are first buffered in memory until either sufficient new data exists or a fixed period has passed, at which time a new segment is written out.

Because the data is written out as a log, with new data written separately from old data, the system is also crash-tolerant: in its simplest form, by recording a "checkpoint" where the filesystem is guaranteed to be in a consistent state, the system can be brought up after a crash by simply ignoring data in segments that have not been committed. On top of this, a "roll forward" technique allows them to recover some of the data written since the last checkpoint.

This works until one runs out of segments to use. To clean segments, the data from multiple segments is read and then rewritten to a new set of segments, with any dead data discarded. This allows not only the reclamation of unused space inside segments but also some reorganization of the data in the rewritten segments, such as grouping older data together. Using this along with a cost-benefit analysis achieves efficient compaction of the data without too much copying.

4. Evaluation

As their goal is small-file performance, they reasonably analyze small-file performance and see a significant speedup compared with the SunOS filesystem. They also estimate the performance if the tests were not CPU-limited, which seems to indicate that with a faster CPU and the same speed disk, the gains of LFS would be even larger. Here though, it is not clear that this is of interest, since both

They also analyze the case of large files. Here the advantage is less pronounced, and in particular there is a significant degradation in the case of sequentially rereading a file after random writes into it. Here the cost of optimizing for the small-file case is clear. Certain workloads do require random writes to large files, so while they are not common, there are considerations to be made here.

Of interest would have been an analysis of crash performance, perhaps in simulation, to see what percentage of recently written data would be recoverable under the new system. Presumably this would be more than on the SunOS filesystem, but the test data would be enlightening.

5. Confusion

Why did they want a bimodal distribution of segment utilization, and what is the optimal segment distribution, ignoring the cost of reaching it?

1. Summary
This paper presents Sprite LFS, a log-structured file system (LFS) which stores data, metadata and all modifications to the data in a sequential log. This speeds up disk access, improves the disk bandwidth and improves the recovery process in case of a crash when compared to Unix file systems. It also presents some other mechanisms like segment cleaning, checkpointing and roll forward to maintain optimal performance and consistency of the file system.

2. Motivation
In designing a log-structured file system, the authors focused on improving disk I/O for workloads containing a large number of small files. A side effect was also faster crash recovery. They assumed that improvements in hardware would take care of other kinds of workloads, such as large-file access; some of the techniques developed, however, seem to help with all kinds of workloads.

3. Contributions


  • The notion of logging was not new in file systems; many used logs as auxiliary data structures to speed up writes or crash recovery, while the data was stored permanently in a traditional random-access structure on disk. In contrast, LFS stores the data permanently in the log.

  • For workloads with many small files, an LFS converts the many small synchronous random writes of traditional file systems into large asynchronous sequential transfers that can better utilize the disk bandwidth.

  • The disk is divided into logical units called segments, which are further composed of blocks, for efficient use of disk space. The defragmentation process is called segment cleaning. The policy used, the cost-benefit policy, regains fragmented space without incurring too much cost in moving data.

  • The sequential nature of the log also permits faster crash recovery. Unix file systems must scan the entire disk to restore consistency, while LFS only needs to examine the log since the last checkpoint. Checkpoints are taken regularly and correspond to states of the file system in which all data structures and data are consistent and complete.

  • LFS keeps some metadata similar to Unix file systems, like the inode, and also maintains new data structures such as the inode map, segment summary, segment usage table, checkpoint region and directory change log, for efficient access to data, improved disk bandwidth, defragmentation and crash recovery.

4. Evaluation
Benchmark experiments show that the writing speed of Sprite LFS is almost ten times that of Unix FFS for a large number of small files, and it matches Unix FFS for large-file accesses. Sprite LFS has higher write bandwidth than Unix FFS in all cases, achieving a significant improvement for random writes, which it converts into sequential writes to the log; it is also faster for sequential writes. The read performance is similar to Unix FFS except for sequential reads of files written randomly. The above figures are with the segment cleaner not running; however, the cleaner was seen not to add too much overhead when the cost-benefit policy was used. Sprite LFS achieves disk bandwidth usage of 65-75% compared to 5-10% for Unix FFS. The crash recovery duration depends on factors like the checkpoint interval and the nature of the actions to be recovered.

5. Confusions
I did not understand why a segment can have more than one segment summary block.
Will LFS be faster than traditional file systems on machines with hardware fast enough to make them disk-bound? Can traditional file systems match LFS's small-file performance with asynchronous writing of metadata and data?

1. Summary
The authors detail the issues of contemporary file systems, primarily poor disk bandwidth utilization in the common case. They then present the log-structured file system, a technique that writes file system changes sequentially to disk in the form of a log. They implement this technique in Sprite LFS and evaluate it to show its benefits.

2. Problem
Typical file system workloads involve mostly small files (on the order of a few kilobytes), and creating, modifying and deleting them involves lots of random I/O due to metadata updates. This led contemporary file systems to use only a very small fraction of the disk bandwidth in the common case. The authors design a file system that performs well in this common case and speeds up file writes and crash recovery.

3. Contributions
The contributions in this work can be categorized into: (a) file storage management in using logs; (b) cleanup to maintain large extent of free space; (c) supporting faster crash recovery.

Sprite LFS uses the same inode-based file storage management as existing file systems to maintain the metadata that locates the data blocks of files. However, instead of placing these inodes at fixed disk locations (like existing file systems), it uses an inode map to locate the most recent version of each inode. Inode map blocks are themselves written as blocks in the log, with checkpoint information recording the set of blocks that form the inode map.

As files are actively created and deleted, they leave holes in the log, which need to be reclaimed to write more data. This issue can be addressed by copying live data to the beginning of the log (copying), or by writing the log into the dead space of the old log (threading). Sprite LFS maintains a large extent of free space using a hybrid of the two approaches: it splits the disk into large segments, cleans under-utilized segments by copying their live data into free segments, and threads around well-utilized segments.

Sprite LFS achieves faster crash recovery by writing checkpoints at regular intervals containing the locations of the inode map blocks, the segment usage table, and the last segment written at checkpoint time. On recovery, Sprite LFS recreates the main-memory structures using the latest checkpoint information and replays the log from the point where the checkpoint was written.

4. Evaluation
The authors measured the performance of Sprite LFS and compared it to the Unix FFS-based file system in SunOS. In a micro-benchmark of creating, reading, and deleting a large number of small files, Sprite LFS performed an order of magnitude better than SunOS for create/delete and at least as well for read. The authors also showed that Sprite LFS loses locality in file content by making random writes sequential, i.e., random writes to a file followed by a sequential read perform badly in Sprite LFS. The authors provide usage statistics from a real-world deployment of Sprite LFS and show that it performs better in the real world than in simulation in terms of write cost.

5. Confusion
The authors mention in passing other techniques to create large free areas; one such technique is a hierarchy of logs. Can you elaborate on this?

Summary:

This paper describes how a file system that logs data can reduce the cost of writing to disk. The authors argue that this improves write performance by accumulating a number of smaller writes and exploiting the benefits of sequential writes. What is even more interesting is that the authors add a garbage collection mechanism to their file system, which helps create a more efficient log-based file system implementation.

Problem:

The authors attempt to bridge the widening gap between CPU and disk performance. Their main issue with existing file systems was the need to update metadata with small random writes, which create latency due to seek and rotation time.

Contributions:

The biggest contribution of this paper is the idea of a log structured file system that incorporates a garbage collection mechanism to ensure that free space is almost always available. To achieve this design goal, the authors designed a segmented file system that grouped blocks, allowing them to use a combination of threading and copying when dealing with cold data.

They were also able to exploit the indirect nature of the inode map to implement a compaction mechanism within these segments.

Other interesting contributions include a two-pronged approach to crash recovery: a checkpoint system creates images of consistent states, and roll-forward reduces the amount of data lost when resuming from a previous checkpoint.

Evaluation:

The authors compare the performance of their implementation, Sprite LFS, with the FFS implementation used by SunOS. They achieve better performance in most cases on their microbenchmarks; the only case in which they perform worse is the final sequential file read of the benchmark.

Confusion:

The authors mention in two places that if the segment utilization is 0, the write cost would be 1, implying that writes would have the entire bandwidth available. However, their formula gives a minimum write cost of 2. Is the extra cost due to compaction?

Summary:

In this paper, the authors describe the log-structured file system, which writes all modifications to the disk sequentially. The LFS is designed to speed up writing and crash recovery. The paper discusses in detail free space management using segments, the cleaning mechanism and the policies governing it. The authors also cover recovering from crashes using checkpoints and roll-forward. Their implementation, Sprite LFS, is benchmarked.

Problem:

The speed of the CPU is much higher than that of the disk, so disk transfer speed becomes the bottleneck: a slow file system can hold back a fast CPU. Also, increasing main memory sizes mean more file data can be cached, so reads are served from main memory and writes to disk become the major factor affecting performance; file systems like FFS did not consider this. Recovery from crashes was also slow, as the Unix file systems of the time had to scan the entire disk. LFS tries to improve system performance by speeding up writes and crash recovery.

Contribution:

LFS tries to improve write performance by buffering a sequence of file system changes and then writing them to disk sequentially. Changes to a file are written to new locations on disk, and the old versions' space is reclaimed by a cleaning process. LFS uses inodes as the basic structures for files: as a file is updated, the new version, along with its inode, is written to a new location, and the current inode location is tracked via the inode map. The disk is divided into segments for free space management; within a segment, data is written sequentially. To identify live and outdated blocks, LFS uses segment summary blocks, which record the file number and block number of each block when the segment is written. The authors analyze cleaning policies in terms of write cost. LFS classifies segments as hot or cold based on how often the files in them are modified; cold segments are preferred for cleaning, since their files will not change often, and a segment usage table keeps track of this. LFS alternates between two checkpoint regions at fixed locations, each containing a consistent snapshot of the inode map and related blocks along with a timestamp; checkpointing is done at periodic intervals to limit overhead. Roll-forward is used to recover as much data as possible.

Evaluation:

Microbenchmarks comparing Sprite LFS against SunOS showed that LFS performed much better, though this was without considering cleaning overheads. The only case where LFS does not outperform SunOS is large-file rereads after random writes, where SunOS has better logical locality. The cleaning overheads in the actual implementation were lower than in the simulations, partly because the simulations lacked larger files.

Confused about:

In roll-forward, I did not get how the directory operation log is used to restore consistency between directories and inodes.

1. Summary
The paper talks about the design and implementation of the Log-Structured File System (LFS), which improves the bandwidth utilization of the disks of the time by performing sequential writes instead of the random writes of previous file systems, while keeping read speeds comparable to existing file systems.

2. Problem
Unix file systems use only 5-10% of the disk's raw bandwidth; the rest of the time is spent seeking to the corresponding inodes and data sectors. With processor speed and memory capacity growing, it was important to improve disk performance so as not to unbalance the system. Also, with small-file workloads, most of the time is spent updating metadata, and since the existing file systems wrote synchronously, applications had to wait for disk writes to complete, causing many unnecessary stalls.

3. Contributions
a) Designed a log structure that stores all the data needed to access a file block. When the buffer becomes full, an entire segment of data is written sequentially, preceded by the metadata that describes it, increasing write speed by an order of magnitude. An inode map indexes each file's latest inode.
b) Obsolete data is cleared by threading and copying: threading skips over live blocks, while copying compacts the live data of several partly dead segments into fewer segments, making space for new data. The segment cleaner performs this task.
c) The segment cleaner identifies blocks that need cleaning by matching each block against the file's latest inode; this information is stored in the segment summary block written at the beginning of every segment write. To increase the efficiency of cleaning, a segment usage table is also maintained, storing for each segment the number of live bytes and its most recent modification time.
d) Crash recovery uses checkpoints and roll-forward. Checkpoints are stored periodically (every 30 seconds) at a fixed position on disk that is read at boot, restoring the system to its most recent consistent state. Roll-forward lets the system move beyond the last checkpoint by identifying completed log writes and inode data.

4. Evaluation
The authors evaluate LFS by building Sprite LFS. Initially, they perform simulations to identify good cleaning policies: cost-benefit cleaning, which accounts for segment age and produces a bimodal distribution of segment utilizations (hot and cold segments), beat a greedy policy that simply cleans the least-utilized segments. On most tests LFS is better than the existing file systems thanks to sequential writes; however, in the case of sequential reads after random writes, traditional file systems prove better, as such reads tend to be faster in them.

5. Confusion
Where is the directory operation log written? Ideally, it should be written at the beginning of the log so that it records all the directory data that has been modified and can be restored during a crash.

1. summary
LFS incorporates the novel idea of using a log-like structure for disk storage management. The basic idea behind LFS is to collect large amounts of new data in a file cache in main memory and then write it sequentially to disk in a single large I/O. LFS is based on the assumption that with increasing memory sizes, most read requests will be satisfied from the cache. LFS drastically increases write performance, since writes are sequential, and makes crash recovery much faster, since only the recent portion of the log needs to be scanned. The paper presents the design of LFS, its cleaning policies and its crash recovery mechanism. Sprite LFS, a prototype, is implemented and its performance evaluated against the traditional Unix FFS.
2. Problem
Improvements in technology, like increasing processor speed and main memory size, are not leveraged by current file systems. Two general problems make it hard for current file systems to keep up. First, related information is spread across the disk, causing many small accesses, so most of the disk's bandwidth is spent seeking rather than transferring data. Second, writes are synchronous, which ties the application's performance to that of the disk, defeating the purpose of faster processors and large main-memory caches.
3. Contributions
The main contributions of this paper is the design and implementation of LFS.
File Location and reading: LFS maintains per file DS inode. Inode map is used to track the inode locations as inodes are written to log. Inode map is divided into blocks and written to log. Fixed checkpoint region tracks these inode map blocks. Usually the inode map is small and kept in memory.
Segment cleaning Mechanism: Segment cleaning is done in three steps: read a number of segments, identify live blocks, write back only live blocks. To identify live blocks, LFS uses Segment summary block DS. Segment summary block is a per segment DS containing UID(inode# , version number) , file#, block# within the file for each block in the segment. If the UID in the inode map doesn't match with the UID of the block then the block is dead.
Segment cleaning policy: cleaning starts when the number of clean segments drops below a threshold and continues until enough clean segments are obtained. A cost-benefit policy selects the segments to clean: each segment is rated by the benefit of cleaning it (free space generated times the age of its data) against the cost of cleaning it, and LFS picks the segments with the highest benefit-to-cost ratio, as sketched below.
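
A small Python sketch of that ratio, following the paper's formulation (a cost of 1 + u to read a segment and write back its live fraction u, a benefit of the (1 - u) freed space weighted by age); the function name and sample inputs are illustrative:

    def benefit_to_cost(u, age):
        # Paper's rating: cleaning frees (1 - u) of a segment, weighted
        # by the age of its data; the cost is 1 (read) + u (write-back).
        return ((1.0 - u) * age) / (1.0 + u)

    # The cleaner picks the highest-rated segments, e.g.:
    segments = [(0.9, 1000), (0.5, 10), (0.75, 500)]   # (u, age) pairs
    best = max(segments, key=lambda s: benefit_to_cost(*s))
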
Crash recovery: LFS uses checkpoints and roll-forward. A checkpoint is a position in the log at which all file system data structures are consistent and complete; LFS checkpoints at periodic intervals. During recovery, the latest checkpoint region is read to bring the system to a consistent state, and then all log segments written after that checkpoint are scanned and the file system structures updated accordingly. This latter operation is called roll-forward.
4. Evaluation
Sprite LFS uses 70% of the disk bandwidth for writing while FFS uses only 5-10%. LFS provides better write performance and performs about as well as FFS for reads. In the micro-benchmarks, LFS is compared with SunOS running FFS and is almost 10 times faster for the create, read, and delete operations. Cost-benefit LFS also outperforms an improved FFS, since its write costs stay low even at high utilizations. Crash recovery is much faster as well, because only the part of the log after the last checkpoint is scanned.
5. Confusion
How does LFS perform on modern hardware?

Summary:
The paper is about the log-structured file system, which was designed to make full use of disk bandwidth by aggregating small writes and writing them sequentially. This improves write performance and also helps with crash recovery. Free space management of segments and the cleanup policy are also discussed.

Problem:
CPU speeds were increasing at the time, and disk operations were the bottleneck. As main memory grew, read requests were served from memory, but write requests still required disk access. The authors had observed a large number of small writes, each requiring a seek, so write operations were slow and disk bandwidth was underutilized. By aggregating small write requests and writing them sequentially, the authors try to improve performance.

Contribution:
The major contribution is the design of LFS. For every file update, the metadata (inode, inode map) and data are written sequentially at the end of the log, reducing seek time. The inode map points to the most recent version of each inode, and the location of the inode map is cached in memory. The log is made up of segments, and segments are written sequentially. Since each file update writes a new inode, inode map block, and data, a mechanism and policy are needed to reclaim the old garbage data. The segment summary block records the inode number and offset for each block, which is compared against the offset information in the corresponding inode to detect whether the block is live; to make this check efficient, a version number is stored in both the inode and the summary block. The file system is checkpointed periodically, and after a crash it is made consistent by reading the checkpoint. The checkpoint is written alternately at two different locations so that a failure during a checkpoint update can be handled, as in the sketch below.
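
A minimal sketch of that two-region scheme in Python, assuming each checkpoint region carries a timestamp written both first and last so a torn checkpoint can be detected; the tuple layout and names are illustrative, not Sprite LFS's on-disk format:

    def pick_checkpoint(region_a, region_b):
        """Choose which of the two fixed checkpoint regions to recover
        from: the most recent one whose write completed.

        Each region is modeled as (header_stamp, payload, trailer_stamp);
        the trailer is written last, so mismatched stamps mean the
        checkpoint write was interrupted midway."""
        def valid(region):
            return region is not None and region[0] == region[2]
        candidates = [r for r in (region_a, region_b) if valid(r)]
        if not candidates:
            raise RuntimeError("no valid checkpoint region")
        return max(candidates, key=lambda r: r[0])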

Evaluation:
The performance of LFS was compared to FFS. Both random and sequential write performance of LFS is better than FFS's. LFS was able to use around 70% of the disk bandwidth, while FFS used only 5-10%. LFS is also 10 times faster than FFS at creating and deleting small files. By simulation, the cost-benefit policy was shown to be better than the greedy policy.

Confusion:
I didn't get the concept of roll-forward used for crash recovery that well.

1. Summary
This paper presented a log-structured file system called Sprite LFS for disk storage management. It caches new data in main memory and writes it back sequentially, in batches, into a log-like structure on disk. A segment cleaning mechanism and policy are developed to manage free space, enabling fast cleaning and writing, while checkpoints and roll-forward enable fast and easy crash recovery.

2. Problem
Imbalanced speed improvements between processors and disks make the disk the performance bottleneck, while larger main memories make disk traffic dominated by writes rather than reads.
Existing file system designs 1) spread information around the disk, so most of the bandwidth is spent seeking among metadata while only a little goes to reading and writing new data; and 2) write data and metadata synchronously instead of using the file cache as a write buffer, which reduces write performance and cannot benefit from faster CPUs.

3. Contributions
The idea of a log-structured file system is that all writes are buffered and then issued to disk as a single sequential write. It shares the same basic structures as Unix FFS, such as inodes and indirect blocks, but writes them into the log continuously and compactly. An inode map helps track where each inode is located.
Free space is managed by a combination of threading and copying. The disk is divided into a number of large fixed-size segments; the log is threaded on a segment-by-segment basis, while live data is copied and compacted within segments. Segments are large enough that seek cost becomes negligible.
The segment cleaning mechanism copies a number of segments into memory and writes the live data back into a smaller number of segments, leaving the rest free for new data. The segment summary block facilitates this by identifying the blocks in each segment.
This paper explored several segment cleaning policies to determine which segments are cleaned and how live blocks should be grouped when written back. For data with locality, the lowest write cost came from using the cost-benefit policy to select segments and age-sorting to group live blocks; a segment usage table holds the statistics needed to implement this policy (the write-cost arithmetic is sketched below).
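
A small Python sketch of the write-cost arithmetic the paper uses to compare cleaning policies, assuming (as the paper does) that whole segments are read and rewritten during cleaning; the function itself is illustrative:

    def write_cost(u):
        # To free one segment that is fraction u live, the cleaner reads
        # the whole segment and writes back its live part, leaving (1 - u)
        # of a segment for new data:
        #   (1 read + u write-back + (1 - u) new data) / (1 - u) = 2 / (1 - u)
        # A completely empty segment needs no read, so its cost is 1.
        if u == 0:
            return 1.0
        return 2.0 / (1.0 - u)

    assert write_cost(0.75) == 8.0   # cleaning 75%-live segments costs 8x
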
Checkpoints mark positions at which the file system is complete and consistent. A new checkpoint is created by writing all modifications to disk and then adding a checkpoint to one of the checkpoint regions, which are examined during system boot. Roll-forward then recovers data modified after the latest valid checkpoint, which includes updating the inode map, discarding incomplete data, and using the directory operation log to keep directories consistent.

4. Evaluation
Sprite LFS was compared to SunOS 4.0.3 running Unix FFS on micro-benchmarks. With no segment cleaning, it shows a speedup factor of ten for file creation and deletion, far better write performance, and comparable read performance, except for sequentially rereading data that was written randomly. Cleaning overhead is significantly lower than expected in terms of write cost and utilization, which may be partly due to differences between real usage patterns and the simulations. Crash recovery cost grows with the size and number of files recovered. Bandwidth analysis shows that over 85% of the bytes written are data, with about 7% used by the inode map.

5. Confusion
I don't understand how to determine a block's liveness. The paper reads: "Once a block's identity is known, its liveness can be determined by checking the file's inode or indirect block to see if the appropriate block pointer still refers to this block." Does this mean we need to scan through all inodes? That seems very slow.

Summary
The paper introduces the Log-structured File System (LFS), designed for high write throughput: LFS buffers all updates (including metadata) in an in-memory segment and then writes the segment to disk in one long, sequential, asynchronous bulk transfer to an unused part of the disk. Write performance can therefore approach the raw maximum of the disk, and this approach speeds up the crash recovery process as well.

Problems
Traditional file systems like FFS can use large disk caches to increase the efficiency of reads, but they still perform writes synchronously. Especially for metadata, this can involve several seeks per block of data transferred, limiting effective disk utilization to a small percentage of the theoretical maximum. As memory becomes cheaper and disk caches grow, disk traffic will be dominated by writes; the authors therefore designed LFS to focus on write performance and exploit the sequential bandwidth of the disk.

Contributions
LFS improves write performance by buffering a sequence of file system updates in the file cache and then writing all the changes to disk in one sequential structure called the "log". This approach eliminates almost all seeks, and the sequential nature of the log also permits much faster crash recovery. Because everything lives in the log, LFS doesn't place inodes at fixed positions; they are written to the log, and an "inode map" data structure maintains the current location of each inode. The inode map is updated whenever an inode is written to disk and is compact enough to be cached in main memory for quick access. LFS always writes to an unused portion of the disk (a copy-on-write discipline) and later reclaims the old space through a segment cleaner. The authors explain both the policy (which segments to clean, distinguishing hot and cold segments) and the mechanism (determining block liveness via a per-segment summary block) behind this garbage collection, ensuring large chunks remain available for new data. LFS also uses checkpointing and roll-forward (redo) for crash recovery. A sketch of the write buffering follows.
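
A toy Python sketch of that buffering discipline, assuming a fixed segment size and treating the disk as an append-only list; the class, names, and sizes are illustrative:

    SEGMENT_SIZE = 512 * 1024   # e.g. 512 KB segments

    class LogBuffer:
        """Accumulates dirty blocks in memory; flushes them to disk as
        one large sequential write once a full segment has built up."""
        def __init__(self, disk):
            self.disk = disk          # append-only list of segments
            self.pending = []         # buffered (block_id, data) pairs
            self.pending_bytes = 0

        def write(self, block_id, data):
            self.pending.append((block_id, data))
            self.pending_bytes += len(data)
            if self.pending_bytes >= SEGMENT_SIZE:
                self.flush()

        def flush(self):
            # One sequential I/O instead of one seek per small write.
            self.disk.append(list(self.pending))
            self.pending.clear()
            self.pending_bytes = 0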

Evaluations
The authors have implemented a prototype LFS called Sprite LFS, and evaluation results show that it outperforms the Unix FS by an order of magnitude for small-file writes while matching or exceeding its performance for reads and large writes. Even with the overhead of cleaning included, Sprite LFS can use 70% of the disk bandwidth for writing, whereas the Unix FS typically uses only 5-10%.

Confusions
What if the file being written exceeds the 1 MB segment size? And how will LFS perform for concurrent writes to a file?

Summary:
This paper presents the log-structured file system, a new technique for disk storage management that writes all modifications to disk sequentially. The system increases write performance by buffering writes in memory and then writing them out sequentially, reducing the time spent seeking. LFS requires large contiguous free areas to write new data; it relies on its garbage collector to produce free segments, copying live data out of segments with low utilization and making the old segments available for new writes. The paper also proposes a new crash recovery scheme that saves consistent states periodically: during recovery, the latest checkpoint information is retrieved and the entries at the end of the log are parsed to recover information written since that checkpoint.

Problem:
There was a growing disparity between CPU speeds and disk access times, which made applications disk-bound. LFS was designed to utilize disks an order of magnitude better than other file systems.

Contributions:
- LFS always writes data sequentially to the log, eliminating time spent on disk seeks.
- LFS buffers data blocks and metadata in memory before writing them to disk, converting random writes into sequential writes and making writes faster.
- For its segment cleaning mechanism, LFS uses a unique identifier (uid) formed from a version number combined with the inode map entry for each file, so it does not rely on a free list or bitmap. Eliminating these data structures not only saves memory and disk space, but also simplifies crash recovery.
- LFS performs crash recovery using checkpoints together with a method called roll-forward.
File systems prior to LFS did not have logs and could not identify the changes made to the disk just before a crash; they had to scan all metadata structures to restore consistency. In LFS, the last disk operations reside at the end of the log, so LFS reads the log after the last checkpoint and checks for valid updates. It uses the segment summary block to find new inodes in a segment; if any are present, LFS updates the inode map to point to them, which automatically restores the data blocks the new inodes point to (see the sketch below).
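
A simplified Python sketch of that roll-forward pass, assuming each post-checkpoint segment carries a summary listing the inodes it contains; the structures and names are illustrative, not the real on-disk format:

    def roll_forward(log, checkpoint):
        # checkpoint.inode_map: the inode map as of checkpoint time
        # checkpoint.last_segment: index of the last checkpointed segment
        # log: ordered segments, each with .complete and .summary
        #   (a list of (inode_no, inode_addr) pairs for its inodes)
        inode_map = dict(checkpoint.inode_map)
        for segment in log[checkpoint.last_segment + 1:]:
            if not segment.complete:      # torn write: stop recovering
                break
            for inode_no, inode_addr in segment.summary:
                # Pointing the map at the newer inode implicitly
                # recovers the data blocks that inode references.
                inode_map[inode_no] = inode_addr
        return inode_map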

Evaluation:
Benchmark programs demonstrate that the raw writing speed of LFS is more than an order of magnitude greater than that of Unix for small files. Even for other workloads, such as those including reads and large file accesses, LFS is at least as fast as Unix in all cases except one: files read sequentially after being written randomly. LFS also permits about 65-75% of the disk's raw bandwidth to be used for writing new data.

Confusions:
What is the size of a segment?

1. Summary
The paper describes the design of the log-structured file system. It is based on the assumption that most reads will be served from memory, so LFS optimizes for writes by making all writes sequential.

2. Problem
Technological progress resulted in faster CPUs and faster, cheaper main memory. However, disk performance did not improve much, so disk latency became the bottleneck for I/O operations.

3. Contributions
The main contribution of this work is the design of the log-structured file system. While logs were already being used in file systems, they were not used as the primary data store as they are in LFS. In LFS, the data and metadata are all stored in a sequential log, which makes disk writes sequential and requires only one seek per write. LFS prioritizes very fast writes, since it assumes that most reads will be served by the file system cache in memory. To manage free disk space effectively, LFS divides the disk into contiguous segments and writes an additional segment summary block per segment to speed up the cleaning process. It takes into account both the cost of cleaning a segment and the benefit of cleaning it before deciding which segments to clean. LFS also writes two checkpoint regions at fixed places on disk to aid crash recovery; having two regions makes it possible to recover from a crash that occurs during checkpointing. While recovering from a crash, LFS also reads beyond the last checkpoint to find any recoverable data.

4. Evaluation
The authors run microbenchmarks to evaluate the performance of LFS. They find writes to be considerably faster on LFS than on FFS, with comparable read performance, and they find that segment cleaning does not add a significant amount of overhead to overall LFS performance.

5. Confusion
How would LFS deal with concurrent writes to a file? How will it affect performance?

Summary
The paper introduces Sprite LFS, a log-structured file system that buffers writes and then batches them into a single large sequential write to disk. This reduces the seek time for writes while maintaining comparable read times. The sequential log write pattern makes crash recovery easy, since only the most recent portion of the log needs to be examined instead of scanning the entire disk.
Problem
Processors became faster, and main memory became larger and cheaper, but disk technology saw no significant performance improvement. Larger main memories support larger file caches, which handle most read requests quickly; as a result, disk traffic became write-dominated rather than read-dominated, and there was a need to make write requests fast.
Contribution
The main idea of buffering file system changes and then writing them to a log sequentially, to achieve greater write bandwidth and low seek time, is a major contribution. Instead of storing all the inodes at a fixed place, LFS uses a data structure called the inode map that indexes the inodes stored at different places; the inode map for the active set is compact enough to be kept in main memory, so locating an inode requires almost no disk access. Another contribution is how free space is handled: the disk is divided into segments, each of which tracks whether its blocks are still referenced by an inode, and when the disk runs low the segments are cleaned. A cost-benefit approach to cleaning achieves higher utilization; the resulting bimodal behavior is good for performance, since files that are likely to change soon anyway (by temporal locality, the ones recently changed) aren't unnecessarily moved. LFS also enables easy crash recovery through checkpoints (consistent snapshots of the file system state) and roll-forward, which recovers information written since the last checkpoint.
Evaluation
The performance of LFS was compared to SunOS 4.0.3, whose file system was based on Unix FFS. Sprite LFS was substantially faster for random writes because it turns them into sequential log writes, and also faster for sequential writes because it groups many blocks into a single large I/O. Read performance was similar in the two systems except when reading a file sequentially after it had been written randomly: in that case Sprite LFS requires seeks for the reads while SunOS does not. To estimate the cleaning overhead, the authors collected cleaning statistics over a period of four months and observed write costs lower than those in the simulations.
Confusion
Will LFS have performance gain because of write fragmentation, even on flash memory where seek times are usually negligible?

Summary:
The paper discusses a new storage technique that uses a log structure to buffer writes to the disk, achieving near-sequential write performance through improved bandwidth utilization. The paper first describes the problems with existing file system techniques, then discusses the design and policy decisions of the new log-structured approach, and finally evaluates the performance of Sprite LFS against Unix FFS to show that it is an order of magnitude faster for small files. The authors also discuss crash recovery techniques for returning the system to a consistent state, as far as possible, when a crash occurs.

Problem:
The primary motivation is that disk access times are slow and most accesses are writes, since reads are served from the cache; overall throughput therefore depends heavily on write time. Processor speeds were increasing drastically, and disk writes needed to scale to improve overall system performance. There are three more problems with the then-current file systems under newer workload characteristics:
a. Information is spread across the disk, so there are many small accesses.
b. Writes are synchronous: the application has to wait until a write is serviced before moving ahead.
c. A common operation, such as writing to an inode and then to the file, touches inode and data blocks that are spread across the disk.

Contributions:
Segments and free space management: The key idea is to write into an in-memory log and go to disk only when enough data has accumulated to amortize the cost of the disk write. The log is thus written a whole segment at a time, which proceeds at near-sequential speed since seek time is amortized away. LFS reaches data through several levels of indirection: from the checkpoint region to the inode map, from the inode map to the inode, and from the inode to the data (sketched below). For free space management, two techniques are combined: threading leaves live blocks in place and skips over them, whereas copying groups live data together so that the freed blocks can hold new data.
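
A tiny Python sketch of that chain of indirection for reading one block, with dictionaries standing in for on-disk structures; all names, the fan-out constant, and the flat disk map are illustrative assumptions:

    IMAP_ENTRIES_PER_BLOCK = 1024   # illustrative fan-out

    def read_block(disk, checkpoint_addr, inode_no, block_no):
        # disk: dict mapping addresses to on-disk blocks (a stand-in
        # for real I/O). Each hop below is one level of indirection.
        checkpoint = disk[checkpoint_addr]                  # fixed location
        imap_addr = checkpoint["imap"][inode_no // IMAP_ENTRIES_PER_BLOCK]
        imap_block = disk[imap_addr]                        # normally cached
        inode_addr = imap_block[inode_no % IMAP_ENTRIES_PER_BLOCK]
        inode = disk[inode_addr]                            # lives in the log
        return disk[inode["blocks"][block_no]]              # the data itself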

Segment cleaner: This eliminates old blocks that are no longer needed once their live data has been written into clean segments, which are tracked using segment summary blocks. The authors also describe the segment cleaning policy decisions: which segments should be cleaned, and how the grouping of live blocks can be optimized for future performance.

Crash recovery: LFS uses checkpointing and roll-forward for recovery. Checkpointing is done periodically (every 30 s), writing the file system state to a fixed location on disk; after a crash, the system is known to be consistent up to the checkpointed state. Beyond that, roll-forward scans the log and segment summary blocks written since the last checkpoint to get back to the most recent consistent state.

Evaluation:

Sprite LFS was built and evaluated against the Unix Fast File System on synthetic benchmarks consisting of small and large files. Sprite LFS is an order of magnitude (10x) faster than Unix for small files and comparable to Unix FFS for larger files. This stems from the fact that disk bandwidth utilization in Sprite LFS is much higher than in Unix FFS, since large segment writes use the disk better.

Confusions:
The primary performance benefit is with small files, which is expected, since buffering small writes and then updating the disk together is beneficial. Do today's workloads consist of small files, and does the log structure still hold up in those terms? How have workloads changed over the years, and what were the drawbacks of logging with the old and new workloads? Also, do read requests served from the cache scale as the cache size increases?

Summary:
The paper describes the log-structured file system, a new file system that performs writes sequentially in the form of a log, an idea inspired by database systems, and offers faster crash recovery. The paper also describes the segment cleaning mechanism for reclaiming disk space.

Problem:
While processor speeds and main memory sizes are increasing at an exponential rate, the slow improvement in disk access times is making applications disk-bound. With larger cache sizes, read requests can be handled from the cache, leaving disk traffic dominated by write requests. The authors suggest that disk write performance can be improved by combining the small writes.

Existing file systems spread information around the disk (leaving only ~5% of disk bandwidth for data access) and use synchronous writes, which are required for consistency but are less efficient.

Contributions:
1. Writing all modifications to disk sequentially: the file cache is used as a buffer, and small writes are combined into larger sequential writes.
- The log structures contain inodes, data blocks, and indexing information.
- The indexing information makes reads efficient; also, most reads can be served from the cache.
2. A comparison of segment cleaning policies (greedy vs. cost-benefit). With the greedy policy, a segment does not get cleaned until its utilization drops below the threshold value; treating hot and cold segments differently, as the cost-benefit policy does, leads to better disk utilization (see the sketch after this list).
3. Faster crash recovery through checkpoints and a roll-forward mechanism.
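
A small Python sketch contrasting the two selection rules on the same pair of segments; the segment dictionaries and numbers are made up for illustration:

    def greedy_pick(segments):
        # Greedy: always clean the least-utilized segment.
        return min(segments, key=lambda s: s["u"])

    def cost_benefit_pick(segments):
        # Cost-benefit: rate by ((1 - u) * age) / (1 + u), so a cold
        # segment (large age) gets cleaned even at high utilization.
        return max(segments,
                   key=lambda s: (1 - s["u"]) * s["age"] / (1 + s["u"]))

    segs = [{"u": 0.75, "age": 900},   # cold, fairly full
            {"u": 0.60, "age": 5}]     # hot, emptier
    assert greedy_pick(segs) == segs[1]        # greedy grabs the emptier one
    assert cost_benefit_pick(segs) == segs[0]  # cost-benefit prefers cold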

Evaluation:
The authors implemented a prototype called Sprite LFS and evaluated it against the file system in SunOS 4.0.3 (based on Unix FFS). Sprite LFS is 10x faster for the create, read, and delete phases of the benchmark, and it uses 70% of the disk bandwidth for writing whereas the Unix FS uses only 5-10%. Various segment cleanup policies were evaluated for uniform and hot/cold access patterns, and the cost-benefit policy was found to provide better disk utilization than greedy. Sprite performs quite well for large file accesses even though it was designed for small ones. The authors found that 13% of the data written was metadata (inode, inode map, and segment map blocks, etc.), which could be lowered by increasing the checkpoint interval.

Confusions
Would like to discuss consistency issues faced while recovering from a crash.
Also, is there a remote possibility that LFS accepts writes into its buffer while there is no disk space left to perform the "actual" write? How does it handle such a scenario?

Summary:

The paper presents the design and implementation of a new file system, the Log-Structured File System (LFS). LFS improves the performance of random writes by writing them sequentially to the disk and maintaining data structures such as the imap to retrieve file inodes quickly. The tradeoff made by LFS is that reads can become expensive, since a file's data blocks may be scattered all over the disk, but large caches mitigate that problem.

Problem:

As cache sizes keep increasing, read requests are satisfied quickly, but the problem comes with small random writes: a lot of time is spent seeking to the appropriate disk location to make each small write, so seek time dominates the data transfer. The paper provides a way to make random writes fast enough that most of the disk's transfer bandwidth can be utilized.

Contributions:

LFS makes random writes sequential. It divides the disk into large extents called segments; when a user makes a random write, it is cached in the buffer, and once there is enough data to fill a segment it is written sequentially to the log in one segment-sized I/O.

A problem with making such writes sequential is that certain operations modify the directory entry, the file's inode, and the data blocks together, so the inodes get scattered all over the disk. LFS solves this by maintaining an imap that maps each inode number to the address of the latest version of that inode. Because updates never happen in place, blocks get invalidated frequently and segments become fragmented. A segment cleaner reclaims them by reading the live data out of multiple segments and writing it back as fewer full segments in the log, freeing the original segments (a sketch of this loop follows). Segment summary blocks and the segment usage table help LFS easily identify the live blocks, and LFS applies certain policies to determine which segments to clean.
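
A compact Python sketch of that read-compact-write loop; is_live stands in for the summary/imap check described above, and the whole layout is an illustrative assumption:

    def clean(segments_to_clean, is_live, append_segment, segment_capacity):
        """Copy live blocks out of dirty segments and free the originals.

        is_live(block) consults the segment summary and inode map;
        append_segment(blocks) writes one new segment at the log tail.
        Returns the segments that can now be reused."""
        live = [b for seg in segments_to_clean for b in seg.blocks
                if is_live(b)]
        # Repack the survivors into as few full segments as possible.
        for i in range(0, len(live), segment_capacity):
            append_segment(live[i:i + segment_capacity])
        return segments_to_clean   # all input segments are now free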

LFS uses a checkpoint region to help with crash recovery. At a fixed interval, LFS makes sure all the imap blocks and segment usage information are written to disk so that consistency is maintained; the checkpoint region itself is written at a known disk location. Two checkpoint regions are maintained so that recovery is possible even if a crash happens while a checkpoint is being written. During recovery, the checkpoint region is used to determine the last checkpointed state, and the log beyond it is checked for inconsistencies.

Evaluation:

The paper provides a good way of making random-write workloads fast, and the simulation results clearly show the efficiency of LFS. The paper cleanly explains how to handle the different scenarios that can come up while using LFS. The imap data structure greatly simplifies the management of inodes, which is the real takeaway from LFS.

Confusions:

I am a little confused about the roll-forward operation. How does it find the new inode blocks, and how does it determine the corresponding inode number for each inode block?

Summary
This paper introduces the Log-Structured File System, which claims to be faster than the conventional file systems of its era at file writing and crash recovery. The speedup comes from writing all file modifications to disk sequentially in the form of a log, thereby using a major portion of the disk's bandwidth. The log is divided into segments, and a segment cleaner is used to clean up fragmented segments. The authors developed a prototype, Sprite LFS, which outperformed Unix file systems for small file writes and matched their performance for large file writes.

Problem
Disk access was becoming the bottleneck for performance, since CPU speeds and RAM sizes were increasing at a faster pace than disk access times. Traditional file systems perform many separate I/Os and seeks just to create a new file, consuming a lot of time. Also, the write operations are synchronous, so applications keep waiting for each operation to complete.

Contribution
The main idea of this file system is to buffer a sequence of file system changes in the file cache and then write all the changes to disk sequentially in a single write operation. This improves performance for workloads containing lots of small writes by utilizing the full bandwidth of the disk. The prototype LFS puts index structures in the log for random-access retrievals: inodes can land at arbitrary locations when they are written into the log, so an inode map maintains their locations. The file system also maintains a segment summary block containing the file number and block number for each file data block, which is used to determine live blocks, as needed for cleaning the segments. The amount of live data in a segment and its modification timestamp are maintained in a data structure called the segment usage table, which is consulted during cleaning operations. To handle recovery, LFS stores two checkpoint regions containing the addresses of all the blocks in the inode map and segment usage table, the current time, and a pointer to the last segment written; during recovery, the checkpoint with the latest timestamp is used. It also incorporates an operation called roll-forward, which recovers data written after the last checkpoint by adding to the inode map any new inodes it finds in the summary blocks.

Evaluation
Small benchmark programs were used to test the performance of Sprite LFS against SunOS 4.0.3, whose file system is based on the Unix FS. LFS performed better than SunOS for random writes and matched its performance for reads. Also, the measured overhead of cleaning was very small and was found to be acceptable.

Confusion
If disk access times are causing a bottleneck, is it feasible to remove those interfaces completely? With SSDs becoming popular, is it possible to eliminate the slower SATA and PCI interfaces, plug flash memory directly into the system, and dedicate a few cores to act as the flash memory controller? Would this structure improve performance?

1. Summary
This paper introduces Sprite LFS and tests its performance against Unix FFS in both simulation and real measurements. Sprite LFS not only outperforms FFS on small files, but also performs about the same on large files.

2. Problems
The problems the paper tries to solve include the following.
While CPU speeds increased rapidly, disk access times improved only slowly, so a way to improve I/O speed was needed. Previous file systems used the log only as temporary storage, which is inefficient when reading files back. Operating efficiently requires ensuring that large extents of free space are always available for writing new data, which is challenging. Current file systems spread information around the disk, causing too many small accesses, and they tend to write synchronously. A log-structured file system therefore faces two major problems: how to retrieve information from the log, and how to manage free space on disk so that large extents of free space remain available.

3. Contributions
Sprite LFS uses the same inode data structure as Unix FFS; however, it doesn't place inodes at fixed positions. Instead, they are written to the log, and Sprite LFS uses an inode map to maintain the current location of each inode. Inode map lookups rarely require disk access.

To deal with the problem of managing free space, Sprite LFS combines threading and copying. The disk is divided into large fixed-size extents called segments; segments are written sequentially, and all live data must be copied out of a segment before the segment can be rewritten, while the log is threaded on a segment-by-segment basis.

The segment cleaning mechanism needs no free-block list or bitmap, which not only saves memory and disk space but also simplifies crash recovery. On top of this mechanism, four policy issues are addressed to achieve good cleaner behavior, culminating in the cost-benefit policy. To support that policy, Sprite LFS maintains a data structure called the segment usage table, which improves segment cleaning efficiency (a sketch of such a table follows).
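
A minimal Python sketch of what such a usage table could track per segment; the class, field names, and update hooks are assumptions for illustration:

    import time

    class SegmentUsageTable:
        """Per-segment statistics driving the cost-benefit policy:
        live byte count and the newest modification time of any block."""
        def __init__(self, n_segments):
            self.live_bytes = [0] * n_segments
            self.newest_mtime = [0.0] * n_segments

        def block_written(self, seg, nbytes):
            self.live_bytes[seg] += nbytes
            self.newest_mtime[seg] = time.time()

        def block_died(self, seg, nbytes):
            # Called when a newer copy supersedes a block in seg.
            self.live_bytes[seg] -= nbytes

        def utilization(self, seg, seg_size):
            return self.live_bytes[seg] / seg_size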

4. Evaluation
This paper evaluates the performance of Sprite LFS with both simulation and real measurements. Through simulation, the authors found the defects of the initial cleaning policy and improved it into the cost-benefit policy. The benchmark measurements show that Sprite LFS is almost ten times as fast as SunOS for the create and delete phases of the benchmark, and it is also faster at reading the files back. Moreover, the disk is only 17% busy under Sprite LFS compared with 85% under SunOS. Performance on large files is also competitive on Sprite, and the measured cleaning overhead in Sprite LFS is even lower than in the simulations.

5. Confusions
I'm curious about a side issue: in long-term testing, such as the example that takes four months, what happens if something goes wrong during the test?

1. Summary
The paper proposes a log-structured file system mechanism and discusses the underlying principles, policies and cleaning and recovery mechanisms.
2. Problem
The primary problem the paper looks at is the overhead of small file writes. For small file accesses, most of the time is spent seeking, resulting in low bandwidth utilization. In addition, the write traffic is dominated by metadata writes, which are performed synchronously. This problem is not alleviated even with file caches.
3. Contributions
The authors propose the basic mechanisms for implementing a log-structured file system. Whenever a new file is created or modified, the associated metadata structures (the inode and inode map) and the data blocks are appended to the log at the current log head.
In order to manage free space, the disk is divided into segments and a combination of threading and copying is used to compact the live data blocks and make space for new data segments.
A segment summary block identifies the live data blocks in a segment, which is useful for calculating the segment's utilization. Utilization drives the policies for cleaning segments: segments with lower utilization have lower write cost and are thus better candidates for cleaning. A cost-benefit policy is proposed that selects segments and regroups their live data based on latest access time and free space; it showed low write cost even at high utilization and helps take advantage of locality. For crash recovery, checkpointing and roll-forward are used.
4. Evaluation
The LFS is implemented on a Sun system, and its performance is compared against the Unix FFS implementation in SunOS. On micro-benchmarks, LFS has higher write bandwidth and is faster than FFS for both random and sequential writes; but for sequential rereads of randomly written files, LFS incurs more seeks and thus higher latency than FFS. To estimate the cleaning overheads, the overhead was measured over a period of 4 months, and the write costs were shown to be lower than expected, ranging from 1.2 to 1.6.
5. Confusion
What are the other disadvantages of LFS apart from high latency sequential re-reads? How important is it to have low latency for such re-reads?

Summary
The authors present Sprite LFS, a log-based file system that aims to use the limited bandwidth available for disk access more effectively. The idea is that because files can be cached in memory and read very quickly once there, any optimization that improves disk writes while sacrificing reads is beneficial, and Sprite LFS does just this: the log structure allows writes to happen very quickly.

Problem
Hard disks have not seen the same performance increases over time as CPUs, creating a need for a file system that gets as much performance out of disks as possible. Current file systems require several disk seeks before a write can be done, which is incredibly slow and causes the majority of the disk's bandwidth to be consumed by seeking, effectively wasting it.

Contribution
Sprite LFS minimizes the number of seeks required by writing all new data contiguously at the end of a log. As the log grows and files get deleted or overwritten, fragmentation can occur. Sprite LFS solves this with segments, a hybrid between threading and copying data. Segments are large blocks that are always written contiguously; threading then occurs at the segment level, skipping live segments when making new writes. A segment cleaner runs in the background and consolidates fragmented segments into fewer full ones.

Evaluation
The paper compares Sprite LFS to Unix FFS: Sprite allows about 65-75% of the disk's bandwidth to be used for writes, whereas Unix allows only 5-10%. They also note that Sprite handles large and small files in similar ways, but Unix suffers when a workload consists of writing many small files. Additionally, the authors run simulations of different LFS segment cleaning policies and compare them to current FFS implementations, showing that at low disk utilizations they are substantially faster.

Confusion
Do the performance gains carry over to SSDs as well where seek time doesn’t hamper reads (and therefore writes in Unix FFS) as heavily?

Summary: This paper shows the design and implementation of a log-structured file system. The new file system optimizes file write performance by appending every update to the end of a log on the disk.

Problem:
1. A traditional file system splits the information of a single file across the disk. For example, the inode and the actual data blocks of a file may not be contiguous. As a result, access times are dominated by disk seek times, and disk bandwidth utilization is very low.

2. Crash recovery in traditional file systems is slow and difficult, because an entire disk scan must be performed to identify all corrupted blocks and metadata.

Contribution:
1. Organizing all updates sequentially on the disk while preserving read performance by using an inode map, which maps each inode number to the physical location of that inode. The inode map is usually cached entirely in memory. During a read, the inode map is used to efficiently locate the inode of the file to access; during a write, a data block, an updated inode, and an updated portion of the inode map are all appended to the end of the log. There is a fixed place on the disk that stores the addresses of all inode map blocks, and it is updated periodically.

2. Segmentation that allows efficient garbage collection. Since every update writes new data blocks and inodes, outdated blocks and inodes are never erased in place; these stale blocks become garbage. To identify garbage, a segment summary is stored in each segment; it is like a reverse inode map that maps each block in the segment to its inode. To test whether a block is stale, go to the inode recorded in the segment summary and check whether that inode still points to this block. When a cleanup is needed, several segments are scanned for garbage, their live data is rewritten sequentially in new places, and the old locations of these segments can then be reused.

Evaluation: The authors evaluated the performance of LFS under several workloads. One of them tests performance under temporal locality, where LFS outperforms the others by a lot. Another tests the performance of large files. The third involves a series of random writes to a large file followed by a sequential re-read of the entire file; since this workload exhibits logical locality but not temporal locality, LFS is beaten by traditional file systems there.

Confusions: Why do they use the additional layer of indirection that the inode map provides? They could have just written the address of each inode in a table indexed by inode number, and updated that table periodically the way they update the addresses of inode map blocks in the checkpoint region.

Summary:
The paper presents Sprite LFS, a new file system that uses a sequential log structure and segments to lay out data on disk, speeding up small write operations and improving recovery time. The working principles and data structures of the file system are explained, along with free space management, the recovery procedure, and the performance tradeoffs involved in choosing a cleanup policy.

Problem:
As the gap between processor and disk speeds grows, applications become increasingly disk-bound. With memory sizes increasing, read operations can be serviced without going to disk, which makes most disk traffic write-based. The spread-out layout of existing file systems and the small-access nature of most workloads mean these writes are dominated by expensive disk seeks; coupled with the fact that the writes are synchronous, this makes the operations highly inefficient. LFS aims to speed up disk I/O by making writes sequential in order to eliminate the seeks.

Contributions:
Since the premise of the log structure was that writes had to be sequential, the authors had to address the challenges that came with this copy-on-write approach, such as providing index-style lookup to allow random retrievals. The inode map was a simple and small enough indirection to allow access to all the inodes while fitting in memory. Free space management was handled with a combination of threading and copy-and-compact, avoiding both fragmentation and the expense of copying long-lived files. The segment summary block allowed the cleaning mechanism to determine live blocks and clean up underutilized segments. More importantly, the cleanup policy of reordering live blocks by age and treating hot and cold segments differently based on their cost-benefit ratings allowed the authors to achieve high disk utilization while keeping the write overhead of cleanup small. Double checkpoints of the inode map allowed recovery and provided a starting point for the roll-forward mechanism.

Evaluations:
The authors evaluated their cleanup policy by simulating write costs across a range of disk utilizations, which showed the cost-benefit policy outperforming both the greedy policy and the lowest Unix FFS write cost. Benchmark programs showed Sprite LFS utilizing about 65-75% of the disk's raw bandwidth for writing new data, compared with Unix FFS's 5-10% utilization. The authors noted an overhead in the writes, as 13% of the written bytes were file system metadata such as inode map and segment map blocks, but they attributed it to the short checkpoint interval, which could be remedied.

Confusions:
The segment size is chosen to be large enough to make the transfer time to read/write greater than the cost of a seek to the segment. What happens if the segment size is too large?

1. Summary
This paper presents the log-structured file system, a new approach built to cope with the slowness of disk writes. The basic log-based design is detailed, the method of cleaning segments is explained, and crash recovery is described.

2. Problem
CPU speeds are increasing more quickly than disk speeds. Additionally, large memories mean that many disk reads are increasingly served by a file cache in memory, so the majority of disk accesses are writes. The physical machinery needed to seek to the correct location on disk quickly becomes the bottleneck in traditional file systems. Because many workloads consist of random, small reads and writes, the problem will only become worse as the disparity between CPU and disk grows.

3. Contributions
The main contribution of the paper is the formalized idea of a log-based file system, in which long disk seek times are amortized by combining multiple small file writes into a single large write to disk. These log writes are laid out consecutively on disk to take advantage of minimal seek times; for the same reason, the relevant inodes and inode maps are included in the same chunk.
Because writing the log in this manner essentially invalidates old versions of data, a cleaning system is implemented that garbage-collects segments that have become fragmented, using both threading and copy-and-compact: data is separated into segments, long-lived, rarely written segments are left alone, and transient data is cycled through the disk via copy-and-compact. This mechanism requires a policy for choosing which segments to clean. The authors argue that a policy should clean hot segments at very low utilization while cleaning cold segments at much higher utilization, because hot segments are likely to lose more of their data soon (precisely because they are hot), and the goal is a state in which most segments are nearly full. To accomplish this, a cost-benefit function chooses the best segment to clean based on the free space generated, the age of the data, and the utilization, yielding the bimodal behavior in which hot and cold segments are chosen at different utilizations.
Finally, the log-based design allows for simple crash recovery. Checkpoint regions containing pointers to all the blocks of the inode map, the current time, and the last segment written are updated at periodic intervals. To recover from a crash, the most recent checkpoint region is simply read, and the log is then read forward to recover the operations that followed it.

4. Evaluation
The authors first use simulation to examine which segment cleaning policy is best. They find that a least-utilized (greedy) policy causes performance problems, because long-lived cold segments drop in utilization only slowly and naturally tie up a large number of free blocks; this leads to the cost-benefit function mentioned above. They also provide a number of comparisons against the FFS in SunOS, finding that their system compares favorably. As the disparity between CPU and disk speed increases, the performance difference will only increase. The authors use the system in their daily work and find it effective as a real-world file system.


5. Confusion
I would appreciate going over the directory/inode consistency when rolling forward - this didn’t make much sense to me.

1. Summary
In the paper, "The Design and Implementation of a Log-Structured File System", the authors propose the log structured file system, with a focus on improving write performance by making use of sequential bandwidth of the disk, thereby eliminating almost all seeks. They also show how the sequential logging of both data and metadata aids faster in crash recovery. They develop a file system simulator to demonstrate the segment cleaning policy based on cost and benefit and also implement a prototype called Sprite LFS, which outperforms the Unix File System.

2. Problem
While CPU speed and main memory size increase exponentially, disk improvements are mostly in capacity and cost, not performance; hence programs become disk-bound, i.e., the disk becomes the bottleneck. Moreover, in traditional file systems, information (both data and metadata) is spread around the disk, causing many small accesses and multiple seeks that decrease disk bandwidth utilization. Applications also write synchronously, which prevents them from exploiting the benefit of faster CPUs.

3. Contributions
- To write to disk, LFS buffers all updates, including data and metadata, and issues the writes as whole segments (usually 512 KB) to leverage sequential bandwidth
- Inode maps provide a level of indirection to reach the latest inode of a file, which in turn points to the latest versions of the file's data blocks
- A segment summary block records the inode and file offset of each data block and is used to detect live blocks
- A segment usage table records the number of live bytes in each segment and the most recent modification time of any block in it, supporting the cost-benefit cleaning policy
- The cost-benefit cleaning policy ensures cold segments are cleaned at much higher utilization than hot segments (since delaying the cleaning of a hot segment lets more of its blocks die first) and age-sorts the live data when writing it back to disk
- Two checkpoint regions (CRs) are maintained so that the one with the latest timestamp can be used for recovery
- Roll-forward recovers data and metadata updates that reached the disk but are not yet reflected in the CR (a sketch of the checkpoint write itself follows this list)
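
A rough Python sketch of the periodic checkpoint write those last two bullets rely on; the region's contents follow the paper's description (inode map and segment usage table block addresses, a timestamp, the last segment written), while the function and field names are assumptions:

    import time

    def write_checkpoint(disk, region_addr, imap_block_addrs,
                         usage_table_addrs, last_segment):
        """Stamp one of the two fixed checkpoint regions.

        The timestamp is stored in both header and trailer positions;
        recovery treats a region whose two stamps disagree as torn and
        falls back to the other region."""
        now = time.time()
        disk[region_addr] = {
            "stamp": now,
            "imap_blocks": list(imap_block_addrs),
            "usage_table_blocks": list(usage_table_addrs),
            "last_segment": last_segment,
            "trailer_stamp": now,
        }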

4. Evaluation
The micro-benchmarks on Sprite LFS and SunOS show that LFS is 10 times faster at creating and deleting small files. It also has higher write performance and the same read performance as Unix, except for reading a file sequentially after writing it randomly. Sprite LFS utilizes about 65-75% of the disk's raw bandwidth for writing new data, while Unix uses only 5-10%. LFS exploits temporal locality: data used at around the same time sits at the same place in the log. If logical locality and temporal locality are about equivalent (a sequential read after a sequential write), both systems perform about the same; where they differ, the systems perform differently. Crash recovery is also very fast in LFS thanks to checkpoints and roll-forward, in contrast to the Unix FS, where fsck time grows with disk size.

5. Confusion
A discussion on commercial systems where LFS is used today would be helpful.

Summary: This paper presents the log-structured file system (FS), which provides more efficient disk storage management than traditional file systems. The authors divide the log into segments and use a segment cleaner to compact live information. In experiments, they show an order-of-magnitude improvement over Unix for small-file writes and improvements for other reads and writes.

Problem: As CPU speed and RAM capacity increase, disk operations (seek, read, write) become the bottleneck of the OS. Traditional file systems suffer from too many small accesses and from synchronous writes (each write must complete before the next proceeds). The file organization structure needed to change to better utilize the CPU and RAM.

Contributions:
1. The log-structured FS, whose main feature is eliminating seek operations (the dominant overhead) when doing small-file writes; another feature is fast crash recovery compared with traditional file systems.
2. To make the log-structured FS efficient, they divide the log into segments and design a segment cleaning procedure that compacts live log data and releases new space. To improve cleaning efficiency, the cleaning policy weighs older data more heavily.
Evaluation:
The authors implement a prototype called Sprite LFS and compare it with SunOS running the Unix fast file system (FFS). In sequential and random read/write workloads, Sprite LFS consistently outperforms SunOS, and for small-file writes LFS is an order of magnitude faster.
Confusion:
In modern OSes, where is LFS used? Actually, I think LFS would be useful in data analysis, where the data is not organized with a rich hierarchy.

Summary:

The paper describes the log-structured file system, intended to be optimal for small random writes, as the authors predicted that reads would mostly be satisfied by larger and faster main memories. The authors mainly describe the placement of data structures such as inodes, free-space management (segment cleaning), and crash recovery. For comparison, the authors consider Unix FFS.

Problem:

The existing FS, FFS, spreads data across the disk, causing the disk to perform multiple accesses and excessive seeks, which limits the effective bandwidth of the disk. The other problem is that synchronous writes slow down or stall the application that issued the write, rendering high CPU speed useless, since the disk bottlenecks the application.

Contributions

The key idea proposed is to buffer a sequence of writes in the file cache (logging) and issue a single sequential disk write request. Buffering small random writes this way helps utilize disk bandwidth better and also avoids the problem of the application having to wait for synchronous writes. Unlike in FFS, the inodes in LFS are part of the log; they are tracked using inode maps, and the inode maps are in turn identified and tracked through checkpoints. The idea is that even though there are two levels of indirection, caching the inode maps gives quick access to data blocks. For free space management, LFS uses a combination of threading and copying, realized by partitioning the disk into segments, copying all live data out of a segment before it is rewritten, and skipping segments with long-lived data. To clean the segments, the authors propose segment summary blocks to keep track of the live blocks that need to be rewritten; an optimization of the check for stale blocks is a version number associated with each file, stored in the inodes and (per block) in the summary blocks, where a mismatch immediately identifies a block as stale. A cost-benefit policy, supported by a segment usage table, is used to determine which segments to clean: the file system cleans cold segments even at high utilization, reducing the overall write cost, since cleaned cold segments stay free for longer periods. LFS stores two checkpoints at fixed locations on disk for crash recovery, using the more recent one.

Evaluation

Micro-benchmarks are used for evaluation, with all performance measures compared against FFS-based SunOS 4.0.3. The evaluation shows that LFS was about an order of magnitude (~10x) faster and was able to utilize 70% of the disk bandwidth, while SunOS utilized only 5-10%. Segment cleaning and crash recovery overheads were considerably low.

Confusions
How applicable is LFS to a system in a distributed environment? Seems like the file caching, logs, can provide benefits, but is there a catch?

Summary

As modern CPU speeds increase rapidly, applications become more prone to bottlenecks from disk performance. This paper presents a log-structured file system, which optimizes write times by storing all disk data in a log and making all writes sequential. This requires a novel approach to maintaining free space, accomplished with a segment cleaner that compacts and defragments disk data. In the authors' tests, the log-structured file system performs an order of magnitude faster than traditional Unix file systems.

Problem

Disk I/O performance is bound by the fact that disks are mechanical devices with slow seek times, leaving few opportunities to improve performance, whereas non-mechanical CPUs and system memory continue to improve dramatically in both speed and capacity. As such, disk performance has become a bottleneck in modern applications. Moreover, common workloads are typically dominated by small-file accesses, meaning that I/O requests are frequent, short, and random instead of the long-running, sequential accesses that maximize disk throughput.

Contributions

The main contribution of this paper is the log-structured file system. Other file systems had previously used logs, but only as temporary storage units; in this case, the log is the actual data structure for the entire file system. To handle cleaning up old copies of data in the log, the paper introduces segments and a segment cleaner, which periodically deletes stale data from the log and compacts live data into the newly available free space. Finally, the authors build a prototype log-structured file system, Sprite LFS, which they use to measure and evaluate this new approach.

Evaluation

The authors use their prototype Sprite LFS to measure the performance of a log-based file system against SunOS using standard Unix FFS. In all sequential and random read/write operations except sequential re-reads of randomly written files, Sprite LFS performs consistently faster than SunOS, in some cases by more than 100%. Other tests measure small-file performance, where Sprite LFS is an order of magnitude faster. Cleaning overhead is also demonstrated to be minimal.

Confusions

I don't understand why we need both an inode map and a checkpoint region. Isn't the checkpoint region sufficient?

Summary
The main motivation behind the creation of the log-structured file system was to create a file system optimized for writes. The authors predicted that CPU and memory performance would increase exponentially while disk performance stagnated. In LFS, writes are buffered in the file cache and sequentially written to disk in a single I/O operation, improving disk bandwidth utilization by spending less time seeking and more time writing new data to the disk. Additionally, the log can be used for recovery when the system crashes: periodically, a consistent state of the system (a checkpoint) is written to the log, so during recovery only the log records after the checkpoint have to be processed.
Problem
Traditional file systems suffer from a couple of problems that LFS attempts to solve. First, a file's metadata, the directory the file is located in, the directory's metadata, and the file's contents each live in a separate spot on disk. This induces extra seeks, which decreases the bandwidth utilization of the disk. Second, traditional file systems write synchronously, which makes applications I/O-bound and prevents them from fully utilizing faster CPUs and memory.
Contributions
The main contribution of LFS is to buffer writes in the file cache and write them sequentially to the log in one operation. LFS differs from FFS in that it uses an inode map to locate inodes rather than placing them at fixed positions; an inode is updated by writing a new copy of it to the log and updating its location in the inode map. In order to keep enough free space for the log to grow into, LFS had to invent a mechanism and policy for cleaning dead log data. As its mechanism, Sprite LFS keeps a version number in each entry of the inode map and uses it to detect dead blocks within a segment. For the policy, the authors define a write cost: the total I/O performed, expressed as a multiple of the time it would take to write the new data with no overhead. They initially decided to use this write cost greedily, cleaning the least-utilized segments, but found that this led to poorer performance when the workload mixed hot and cold segments. To address this, they adopted a policy based on the benefit and cost of cleaning a segment: the benefit is the free space a segment would yield weighted by the age of its data, while the cost is determined by the segment's utilization. Using this benefit/cost model, the authors achieved the desired bimodal distribution of segment utilizations.
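A small Python sketch of the two formulas from the paper (u is the fraction of a segment that is live; age here is a hypothetical stand-in for the modification time of the segment's data):

    # Write cost for a log-structured FS with cleaning: reclaiming a segment
    # means reading all of it (1) and rewriting the live fraction (u), while
    # the payoff is the freed fraction (1 - u).  A cost of 1.0 means no
    # cleaning overhead at all.
    def write_cost(u):
        assert 0.0 <= u < 1.0
        return 2.0 / (1.0 - u)

    # Cost-benefit policy: free space generated, weighted by data age,
    # divided by the cost of reading the segment and rewriting its live
    # data.  Cleaning the highest-ratio segments lets cold segments be
    # cleaned at a higher utilization than hot ones.
    def benefit_to_cost(u, age):
        return ((1.0 - u) * age) / (1.0 + u)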
Evaluation
Sprite LFS performs well when files are read sequentially in the same order they were written. Sprite LFS is almost 10 times faster than SunOS for the creation and deletion phases of the micro-benchmarks they ran. Additionally, Sprite LFS saturated the CPU while keeping the disk only 17% busy, which implies that as CPUs become faster, Sprite LFS will scale with them rather than being bound by the disk. While Sprite LFS was designed for small-file accesses, it also provides performance similar to traditional file systems for larger files. The system also performs very well under random writes, because it converts them into sequential writes.
Confusions
Because writes are asynchronous, how often is the log written to disk? Are there any more recent evaluations of the log structured file system with better CPUs and larger memory that show even more significant improvements over the evaluations shown in the paper?

Summary:
This paper presents the log-structured file system, designed to optimize disks for writes and small-file accesses. It also improves the performance of crash recovery. The issues in designing a log-based file system are discussed and the solutions analyzed.

Problem:
The authors point out various problems with existing file systems:
1. Memory got bigger, so more data could be cached and most reads were serviced from the cache. Thus, file system performance increasingly depended on write performance.
2. Transfer bandwidth in disks improved at a much faster rate than seek times and rotational delays. This meant that sequential storage and access would improve performance substantially compared to the traditional scattered layout.
3. Existing file systems were not optimized for the most common workloads. For example, even when a small file is created, a lot of time is wasted in seeks and rotational delays because several separate writes are made: the new inode, an update to the inode bitmap, the directory data block, the directory inode, and so on.
4. Small-write performance was very poor: a logical write to a single file resulted in 4 physical I/Os. File systems need to take advantage of the sequential write bandwidth that disks provide (see the back-of-the-envelope sketch below).
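
A back-of-the-envelope illustration in Python, using assumed round numbers (not figures from the paper), of how seek and rotational delays dominate a small random write:

    # Assumed numbers for a disk of that era; only the ratio matters.
    seek_plus_rotation = 0.015            # ~15 ms per random access (assumed)
    transfer_rate = 4 * 1024**2           # ~4 MB/s sustained transfer (assumed)
    block = 4096                          # one 4 KB block per physical I/O

    transfer_time = block / transfer_rate
    utilization = transfer_time / (seek_plus_rotation + transfer_time)
    print(f"fraction of disk time spent moving data: {utilization:.0%}")  # ~6%

With numbers in this range, nearly all of each physical I/O is positioning time rather than data transfer, consistent with the 5-10% write-bandwidth figure reported for Unix FFS.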

Contributions:
1. Using a log as the permanent storage representation on disk, and making all writes sequential appends to this log. This way, writes are faster and data can be written in bulk in a single disk access.
2. Since inodes are now scattered throughout the log rather than sitting at fixed locations, LFS uses an inode map as another level of indirection to get to them quickly (see the sketch after this list). Chunks of the inode map are written to the log along with the inodes and are located via the checkpoint regions.
3. Since the file system may now contain multiple copies of the same data, some form of garbage collection has to be done without causing fragmentation. This is handled by a segment cleaner, which periodically frees large chunks of segments to obtain new write space (keeping only the most recent version of each block).
4. Recovery from crashes is ensured both at the checkpoint-region write level and at the segment write level.
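
A minimal Python sketch of the indirection mentioned in point 2; the names here are illustrative, not Sprite LFS's:

    # Instead of computing an inode's disk address from its number (as in
    # FFS's fixed layout), LFS looks it up, so each new copy of an inode
    # can land anywhere in the log.
    class InodeMap:
        def __init__(self):
            self.addr = {}              # inode number -> latest log address

        def update(self, inum, log_addr):
            # Called after a fresh copy of the inode is appended to the log.
            self.addr[inum] = log_addr

        def lookup(self, inum):
            return self.addr[inum]

Checkpoint regions then only need to record where the chunks of this map live on disk, rather than the location of every inode.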

Evaluations:
Sprite LFS, an LFS-based file system, was built and evaluated against Unix FFS. The authors observed that Unix file systems typically used only 5-10% of the disk's bandwidth for writes, whereas Sprite LFS could use around 70%. The system also matched Unix's read bandwidth.

What I found confusing:
LFS seems very good for write performance, but nowadays there is a lot of data that needs to be read into memory and not everything can be cached, for example during big-data analysis or graph search. In that case, wouldn't LFS perform very badly on reads? Also, where is it used these days?

1. Summary
This paper presents LFS, in which all writes are batched and performed sequentially to a log. Additional indexing information is maintained to allow reads. Periodic cleaning of stale log entries keeps large free extents available for future writes. LFS provides an order of magnitude better performance than FFS for random writes, and much quicker crash recovery than FFS, which requires scanning the entire disk.

2. Problem
FFS spreads FS metadata and data all over the disk. This causes lots of seeks for random access patterns, reducing the fraction of disk bandwidth used for actual data transfer. Also, CPU speeds are growing faster than disk access speeds, making more workloads disk-bound. Memory sizes are also growing, making it feasible to have larger FS caches which can filter out many reads. Hence, LFS is built for a workload where disk traffic is mostly writes.

3. Contributions
The basic idea of LFS is to buffer small writes in memory and periodically issue them as one sequential write at the end of the log. Data always lives in the log. To retrieve information, LFS uses the same i-node structure as Unix, but an i-node's disk address is no longer fixed: an i-node map returns the latest disk address of each i-node, and chunks of the i-node map are written to the log as well. For efficient free-space management, LFS structures the log as segments. A segment is the unit of free-space management and is always written/read sequentially. The log is threaded from segment to segment, which ensures that long-lived, untouched segments need not be copied during cleaning. The segment cleaning mechanism reads some segments into memory, identifies and compacts the live data, and writes it into a smaller number of clean segments; the segments that were read can then be reused. To identify live data within a segment, LFS maintains a segment summary block giving the i-number and file offset for each data block in the segment. If this information matches the block pointer in the corresponding i-node, the block is live. Segment cleaning involves multiple policy decisions, the critical ones being which segments to clean and how to group live data when it is rewritten. A segment selection policy based on cleaning cost plus estimated benefit is found to work well, as it selects cold segments more often for cleaning. During crash recovery, the most recent checkpoint region contains a snapshot of a consistent FS; more recent data can be recovered using roll-forward, where the segments written since the last checkpoint are scanned.
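A small Python sketch of the liveness check described above; SummaryEntry and block_addr_of are illustrative stand-ins, not Sprite LFS's actual structures:

    from dataclasses import dataclass

    @dataclass
    class SummaryEntry:
        inum: int        # i-number of the file the block belonged to
        offset: int      # block offset within that file
        disk_addr: int   # where this copy of the block sits in the segment

    def is_live(entry: SummaryEntry, block_addr_of) -> bool:
        # block_addr_of(inum, offset) follows the i-node map to the file's
        # current i-node and returns where that file block lives now.  The
        # block is live iff the file still points at this exact copy.
        return block_addr_of(entry.inum, entry.offset) == entry.disk_addr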

4. Evaluation
LFS policies are evaluated using a simulator with two access patterns, Uniform and Hot-and-cold; the latter approximates access locality by directing the majority of accesses at a small portion of the disk. In general, LFS performs much better than FFS for random writes, and as well as FFS for reads. The cleaning overheads are shown to be limited, and the average write cost stays consistently low.

5. Confusions
How valid is the assumption that most reads will be filtered out by growing memory caches, given that disk capacity grows much faster than memory size?

Summary
This paper describes a novel file system implementation which improves write performance by writing data sequentially into a log-like arrangement, and optimizes reads using large file caches. Normal file system concerns like read/write performance, crash consistency and free space management are also discussed.

Problem
The high-level problem is the increasing gap between CPU speeds and disk latencies. Furthermore, existing file systems were unable to deliver performance close to raw disk speeds, particularly for small files and random writes. The scattered layout of files on disk in these systems results in a large number of I/O operations per file operation, which hurts performance.

Contributions
The central idea of this paper is to use the log as the permanent form of storage for files, allowing all write operations to be batched into large sequential writes. Increasing main memory capacity is exploited to support large file caches, minimizing the I/O operations needed to read file data, and using the log for data storage allows simple and fast crash recovery. The authors also discuss policies and mechanisms for cleaning the log to ensure large free chunks for new data: the disk is divided into segments, which are large chunks of disk blocks, and the live data in multiple segments is compacted regularly, guided by observed data access patterns.
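A minimal Python sketch of the batching idea, with an invented disk_append callback standing in for the actual driver interface; the segment size here is an assumption (Sprite LFS used segments on the order of half a megabyte to a megabyte):

    SEGMENT_SIZE = 512 * 1024   # assumed; large enough to amortize one seek

    class LogWriter:
        def __init__(self, disk_append):
            self.disk_append = disk_append   # one sequential write to log tail
            self.buffer = []
            self.buffered = 0

        def write_block(self, block: bytes):
            # Many small logical writes accumulate in memory...
            self.buffer.append(block)
            self.buffered += len(block)
            if self.buffered >= SEGMENT_SIZE:
                self.flush()

        def flush(self):
            # ...and reach the disk as a single large sequential transfer.
            if self.buffer:
                self.disk_append(b"".join(self.buffer))
                self.buffer.clear()
                self.buffered = 0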

Evaluation
Microbenchmarks are used to evaluate the best-case performance of the prototype Sprite LFS against the FFS implementation in SunOS. Sprite LFS is almost 10 times faster than FFS for small-file operations and random writes, while being comparable in the case of large files. The central result is that LFS allows about 70% of the raw disk bandwidth to be harnessed for writing, while Unix file systems typically use under 10%. The cost of overheads such as segment cleaning and crash recovery is also suitably demonstrated. However, there is no mention of the file cache size required to satisfy most read requests, or of the cost of reading files from disk.

Confusions
While I understand the math behind needing lower utilization within segments to guarantee better performance, it seems ironic that greater fragmentation yields greater performance (by the paper's formula the write cost is 2/(1-u), so u = 0.8 gives a cost of 10 while u = 0.2 gives only 2.5). I think this is due to the way they equate performance with write cost. Shouldn't write cost be a metric of efficiency, with read/write latency being the performance indicator?

