
The Design and Implementation of a Log-Structured File System

Mendel Rosenblum and John K. Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Trans. on Computer Systems 10(1), February 1992, pp. 26-52.

Reviews due Tuesday, 3/28.

Comments

1. Summary
This paper introduces log-structured file systems, which store all information on disk as a log. The authors implement Sprite LFS, which performs compaction of old data in the log and uses checkpoints to recover from crashes.

2. Problem
By the early 1990s, it was clear that processor speeds and main memory sizes were increasing, but disk I/O performance was unable to keep up. Applications partly avoided this problem by caching data in memory for future reads, but disk writes remained a bottleneck. Existing file systems performed many disk writes for even small operations. For example, creating a new file in Unix FFS required five disk I/Os. In addition, certain operations, such as updating an inode, needed to complete synchronously. These problems are exacerbated if a file system contains many small files, such as in office or engineering environments.

3. Contributions
Log-structured file systems improve disk performance by buffering all desired changes in memory and then writing these changes to disk sequentially. Sprite LFS contains an inode for each file and indirect blocks for large files, like Unix FFS. Since inodes are written to the log and are not at a fixed position on disk, Sprite LFS uses an inode map in memory to track the current location of each inode. The log is a linked list of segments, where each segment contains multiple contiguous disk blocks. Each segment is written sequentially, and the system moves to the next segment on the next write operation. Live data in old segments can be compacted to make more segments available. The policy in Sprite LFS is to run the segment cleaner when the number of clean segments becomes low, and to run it until the number of clean segments passes a threshold. The system prefers to clean cold segments, using a cost-benefit policy. Sprite LFS writes checkpoint information every 30 seconds and begins roll-forward after a crash using the information in the checkpoint.
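As a rough illustration of the inode map idea (a sketch with made-up names, not Sprite's actual code): a write appends a fresh inode copy to the log and updates the map, and a read follows the map to the newest copy.

class LFS:
    def __init__(self):
        self.inode_map = {}   # inode number -> disk address of the latest inode copy
        self.disk = {}        # disk address -> contents (stands in for real I/O)

    def write_inode(self, inum, inode, addr):
        # A new copy of the inode is appended to the log at 'addr';
        # the inode map then points at this newest copy.
        self.disk[addr] = inode
        self.inode_map[inum] = addr

    def read_inode(self, inum):
        # One map lookup replaces the fixed inode locations of Unix FFS.
        return self.disk[self.inode_map[inum]]

fs = LFS()
fs.write_inode(1, {"size": 0}, addr=100)
fs.write_inode(1, {"size": 4096}, addr=250)  # an update appends a new copy
print(fs.read_inode(1))                      # reads the copy at address 250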

4. Evaluation
The evaluation is fairly comprehensive. In Section 3, the authors compare various cleaning policies before choosing the cost-benefit policy as the best one. In Section 5, the authors run several microbenchmarks showing that Sprite LFS outperforms the SunOS file system on workloads with many small files, as well as on a single large file in all cases except random writes followed by a sequential read.

5. Confusion
(1) Do log-structured file systems have any special code to deal with crashes during crash recovery? For example, databases can create checkpoints during recovery in case they crash again.
(2) Are log-structured file systems still used today? Have they run into any of the IO limitations described in the conclusion (even faster processors, etc.)?

1. Summary
The paper presents a new type of file system implementation for persistent storage that is increasingly write-bound, a trend the authors project will only worsen. The proposed system makes previously fixed-location data structures, primarily inodes, mobile, and introduces the concept of log segments, which contain versioned, incrementally updated inodes placed close to the actual data, with a cached index to their locations. In doing so the authors introduce a new problem: cleaning segments to get rid of stale copies of inodes once a log segment has been committed.
2. Problem
In the authors' view, there was a trend in the preceding decade where CPU speeds increased dramatically while disk access times improved only slowly. They expect this trend to continue and to cause more and more applications to become disk-bound. They assume that files will mostly be cached in main memory and that increasing memory sizes will make these caches more and more effective at satisfying read requests. As a result, disk traffic will become dominated by writes. Hence there was a need for a file system that is aware of these facts and can improve write performance while providing crash recovery.
3. Contributions
The authors propose that the file system behave like a log, with writes buffered in memory and then written to disk sequentially, thereby improving performance.
In developing this system the authors introduce a new class of problems for their proposed file system, known as the segment cleaning problem (how to get rid of stale inodes with minimal performance overhead and without losing essential data). This was a significant issue that resulted in a complicated development cycle and delayed adoption of these kinds of file systems.
4. Evaluation
The paper does a very detailed evaluation of the system, with almost half the paper dedicated to comparing its performance with a commercial competitor (SunOS's Unix-like file system). The authors do a good job of convincing the reader that the proposed system works, and works better than existing systems, given some restrictions such as log size and the system's ability to get rid of stale values. Overall the paper showed that it is possible to build a file system as a log and have it perform well. The approach has since been adapted for use in SSD controllers and even in file systems for NVM.
5. Confusion
The paper was not very clear on how crash recovery is handled; it seems tedious and error-prone if a log segment is buffered in memory for too long and the system crashes before it is written back.

Summary

The log-structured file system aims to improve file write performance and the crash recovery time of the file system. The paper focuses on write performance because it argues that, with increasing memory sizes, most reads are served from the cache, making disk traffic write-bound.

Problem

Disk access times have not kept up with the rapid progress in CPU speeds and have thus become the bottleneck in application performance. Since hardware improvements were not progressing at the desired rate, file system researchers started to look at new storage management techniques. The log-structured file system is one such successful technique.

LFS buffers writes until the buffer reaches a certain size and then writes all the changes sequentially to a log. The main challenge faced by LFS is therefore to constantly find large extents of free space for writing new data.

Contribution

LFS always writes sequentially to the log. This eliminates the many small random accesses to inodes, directories, and data blocks in traditional file systems like Unix FFS.

To deal with the issue of retrieving data from the log, LFS also writes new inodes to the log, and their locations are indexed in the inode map. The inode map is divided into blocks, and the locations of these blocks are recorded in the checkpoint region, which has a fixed location on disk.

To deal with the main issue of free space management LFS made a number of contributions:

1. LFS uses a combination of threading and copying of live data to make more space. For this it introduces an abstraction called segments. Each segment is always written sequentially, but the log can be threaded on a segment-by-segment basis.

2. LFS uses a segment summary block to identify the file, and the position within the file, for each block in a segment. The segment summary block is also used to determine the liveness of a block by checking its inode; a block that is not live can be cleaned.

3. The paper shows that the best performance at low cost is achieved when the disk has a bimodal distribution of segment utilizations, that is, most segments nearly full and a few empty or nearly empty.

4. To achieve this bimodal distribution, LFS comes up with a policy called the cost-benefit policy for choosing the segments to be cleaned.
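
A minimal sketch of that selection rule (the data layout here is illustrative, not Sprite's actual code): the cleaner picks the segment maximizing benefit/cost = (1 - u) * age / (1 + u), where u is the fraction of live bytes and age is the time since the segment's most recent modification.

def choose_segment_to_clean(segments, now):
    """segments: list of (live_fraction, last_modified_time) tuples."""
    def score(seg):
        u, mtime = seg
        age = now - mtime
        # benefit: free space generated (1 - u), weighted by how long it is
        # likely to stay free (age); cost: read the segment (1) plus rewrite
        # its live data (u).
        return (1 - u) * age / (1 + u)
    return max(range(len(segments)), key=lambda i: score(segments[i]))

# The cold, fairly full segment (index 2) beats the emptier hot one (index 1):
# its reclaimed space is expected to stay free for longer.
print(choose_segment_to_clean([(0.9, 100), (0.3, 90), (0.5, 10)], now=100))  # -> 2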

LFS uses checkpoint regions and a roll-forward operation to recover from crashes.

Evaluation

Sprite LFS is compared against SunOS 4.0.3, both running with 32 MB of memory and 300 MB of storage space. SunOS used a block size of 8 KB; Sprite LFS used a 4 KB block size and a 1 MB segment size.

When a large number of small files are created, read, and deleted, Sprite LFS is almost 10 times as fast as SunOS for create and delete, and also a bit faster for reading. Also, Sprite LFS kept the disk only 17% busy, while SunOS kept it 85% busy even though only 1.2% of the disk's bandwidth went to writing new data.

Read performance is similar in most cases, but LFS degrades many-fold when a file is read sequentially after many random writes.

Confusion

The discussion of bandwidth and seek times was specific to disk drives. Which aspects are still relevant in newer storage technologies?

Summary:
This paper presents the design and implementation of a new technique (the log-structured file system) to manage disk storage, wherein all modifications to the disk are buffered and then written sequentially in a log-like structure. The paper discusses issues with the file systems of the time, such as too many small accesses and synchronous writes, and aims to improve write performance (by minimizing seeks: buffer large amounts of data in memory and do large sequential I/Os, optimizing disk bandwidth usage) and crash recovery time. It explores various design choices in implementing on-disk data structures to retrieve data and a segment cleaner to manage free space for a log-structured file system. The paper also discusses checkpoint- and roll-forward-based crash recovery, which performs better than the recovery mechanisms used by traditional file systems.

Problem:
Disk access times were improving, but at a rate much slower than the growth in CPU speed and memory size in the 1990s. This motivated the authors to build a more efficient file system by making full use of rapidly advancing technology (such as increased memory size) and to solve two major problems of the file systems of the day: 1) they required a large number of small accesses to disk to manage data, the major factor behind suboptimal disk bandwidth utilization since too much time was spent seeking, and 2) they required synchronous writes, i.e. an application can't proceed until the write is finished, which made it difficult for user programs to benefit from high CPU speeds. Other systems that had implemented log-like structures earlier used the log only as temporary storage, not as the full disk storage management scheme.

Contributions:
1) Improves write performance by buffering large amounts of data in memory and then writing that data sequentially in one go, minimizing the number of disk seeks required. Index information in the log allows random access.
2) Updates are written to a new block, and the old block is invalidated by moving the inode's block pointer from the old block to the new one.
3) Since inode positions are not fixed in LFS, it uses an inode map to track the physical address of the latest copy of each inode.
4) Discusses the importance of large empty disk regions, several cleaning policies, and issues with them (such as with a greedy cleaner). Uses a segment cleaner to continually create large empty segments by compacting live data from fragmented segments. The segment summary block and segment usage table support this cleaner.
5) Creates checkpoints (consistent states) at regular intervals and uses a roll-forward mechanism to re-apply data modifications made after the last checkpoint when recovering from a crash. Uses a directory operation log to restore metadata and avoid inconsistency between inodes and directories.

Evaluation:
The authors implement LFS on a system called Sprite and test it using a variety of workloads and microbenchmarks over a long period of time. They compare it with Unix FFS (as implemented in SunOS) and observe that Sprite LFS performed similarly to the SunOS file system for read operations and dramatically better for write operations. These tests were done for both small and large files, and the results favored Sprite LFS except in one case, wherein sequential reads are performed after random writes to large files. The paper, however, does not compare Sprite LFS with file systems other than Unix FFS.

Confusion:
The bimodal distribution part is confusing. Why does a bimodal segment distribution perform better than unimodal or multimodal distributions?

1. Summary
CPU speeds and RAM sizes are growing a lot faster than disk access times are improving.
Applications will become disk-bound even with lightning-fast CPUs and large RAM. Current Unix file systems don't utilize the full disk write potential. The authors propose a new file system that ensures high write throughput, close to the actual disk write speed.
2. Problem
Increasing RAM sizes will improve caching performance for read requests. The current file system involves at least five disk I/Os for a simple file creation. As a result, the current file system cannot exploit the full disk write speed. Also, crash recovery in current file systems is very slow, involving a scan of almost the entire disk. The authors propose a new file system providing near-hardware write performance and faster crash recovery.
3. Contributions
The paper proposes a new file system to achieve high write performance, faster crash recovery, and asynchronous writes. To achieve high write performance, LFS uses a log to track updates. Writes are buffered in the memory cache and pushed to disk in a single bulk write operation. LFS uses large fixed-size extents called segments. LFS relies on an inode map to track the current location of each inode; blocks of the inode map are themselves written to the log. To tackle the problem of fragmentation, LFS employs segment cleaning, which involves copying live data out of target segments into a smaller number of clean segments. Live blocks are tracked via the segment summary block in each segment. LFS eliminates the free-block list and bitmap used in Unix FFS. LFS uses a simple utilization-based approach to identify target segments for cleaning, and a segment usage table to track the usage of each segment and its last modification time. LFS implements crash recovery using checkpoints and roll-forward. There are two checkpoint blocks; each contains information about the blocks of the inode map and segment usage table and the last checkpoint time. The two checkpoint blocks are used alternately to facilitate recovery from crashes that occur during checkpointing. The roll-forward operation uses segment summary blocks to recover recently written blocks, and uses directory operation log records to fix erroneous inode reference counts.
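
A toy sketch of the alternating-checkpoint idea just described (structure and names are my assumptions, not Sprite's code): each checkpoint is written to one of two fixed regions in turn, so a crash in the middle of writing one checkpoint still leaves the other region complete and usable.

class CheckpointRegions:
    def __init__(self):
        self.regions = [None, None]   # two fixed locations on disk
        self.next = 0

    def checkpoint(self, state, now):
        # Write the new checkpoint with its timestamp, then switch regions,
        # so the previous checkpoint is never overwritten mid-write.
        self.regions[self.next] = {"time": now, "state": state}
        self.next = 1 - self.next

    def recover(self):
        # After a crash, use whichever region holds the newest complete checkpoint.
        valid = [r for r in self.regions if r is not None]
        return max(valid, key=lambda r: r["time"])["state"]

cp = CheckpointRegions()
cp.checkpoint({"inode_map": {1: 100}}, now=30)
cp.checkpoint({"inode_map": {1: 250}}, now=60)   # even if this write had failed,
print(cp.recover())                              # the t=30 checkpoint would survive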
4. Evaluation
The authors compared the performance of LFS against the SunOS file system using benchmark programs. Sprite LFS was almost 10 times faster than the SunOS file system for write operations, and it also improved read performance for these workloads. To measure segment cleaning overhead, the authors recorded statistics over a period of six months. In spite of the cleaning overhead, Sprite LFS was able to achieve 70% of the maximum sequential write bandwidth.
5. Confusion
Do users need to control the utilization factor to get better performance?

1. Summary:

This paper presents a new type of file system called the log-structured file system. In such a file system all data is kept in a contiguous log, and updates to files are buffered in a cache and written to disk at once. The authors implement Sprite LFS and compare it with other file systems such as the Berkeley Unix Fast File System, reporting an order-of-magnitude improvement for small files; they also show that its performance is comparable or better for larger files, even with the additional overhead of segment cleaning.

2. Problems:

With increasing CPU speeds and main memory sizes, disk access time became the major bottleneck in application performance.
Due to the non-contiguous placement of files on disk, less than 5% of disk bandwidth was used for reading and writing data; the rest of the time was spent seeking.
Most file systems were designed to write synchronously. Synchronous writes to disk make applications slow, and for workloads with small files, writing metadata dominated the disk time, preventing applications from benefiting from faster CPUs.

3. Contributions:

The main contribution of this paper is the use of a log-like structure for reading and writing the disk. A sequence of file system changes is first collected in the file cache and then written sequentially to disk, utilizing most of the disk's bandwidth.
An index structure in the form of an inode map is used for random-access retrievals. Thus disk seeks are minimized in most cases when reading, writing, or creating a file, compared to other file systems.

Other key ideas and contributions are as follows.

->The whole disk is divided into segments, which are fixed-size extents. The segment size is chosen such that the time to read or write a segment is much greater than the seek time.
->Free space on the disk is managed by a combination of threading and copying techniques.
->Policy issues related to segment cleaning (the process of copying live data out of a segment), such as which segments to clean and how to group the live blocks, are addressed methodically.
->A segment summary block, which contains the file number and corresponding block number, is used to distinguish live blocks from deleted or overwritten blocks.
->A segment usage table, which records the number of live bytes in the segment and the most recent modification time of any block in the segment, is used to choose the segments to clean.
->The authors use the following metrics to analyze segment cleaning policies (restated in the equations after this list): 1. the write cost metric (the average amount of time the disk is busy per new byte of data written)
2. the cost-benefit metric - the amount of free space generated, and the age of that free space, per cost of cleaning
->The authors choose to implement the cost-benefit policy, as it manages cold segments better, cleans segments more efficiently, and reduces the write cost significantly.
->Sprite LFS performs crash recovery using two-phase checkpointing, and further uses roll-forward to improve crash recovery by scanning through the log segments written after the last checkpoint, making the inode map consistent and updating the segment usage table.
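
For reference, the two metrics can be written out explicitly (my restatement of the paper's formulas, with u the live fraction of a cleaned segment). Cleaning reads a segment in full and rewrites its live data, so

\[
\text{write cost} \;=\; \frac{\text{total bytes read and written}}{\text{new data written}} \;=\; \frac{1 + u + (1-u)}{1-u} \;=\; \frac{2}{1-u}
\]

(for u = 0 the segment need not be read at all, so the cost drops to 1), and the cleaner selects the segment maximizing

\[
\frac{\text{benefit}}{\text{cost}} \;=\; \frac{(1-u)\cdot \text{age}}{1+u}.
\]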

4. Evaluation:

The authors have done a thorough evaluation of the file system, both by simulation and by testing the production system for several months.
The authors evaluated Sprite LFS cleaning policies using a file system simulator under controlled conditions and concluded that the cost-benefit policy is best suited.
The authors compared the performance of Sprite LFS against SunOS using different microbenchmarks and found it to be ten times faster during the creation and deletion phases. They also find that even though Sprite LFS was designed for better performance with small files, it also performs well for larger files.
They estimate that the maximum recovery time would grow by one second for every 70 seconds of checkpoint interval length.

5. Confusions:
When does the write buffer/file cache get flushed to the disk?
It was not clear why better grouping resulted in worse performance than a system with no locality.

1. Summary
The paper describes the log-structured file system, which improves the performance of disk writes by writing all modifications to disk sequentially. A segment cleaner is used to compact live data and reduce fragmentation of data on the disk.

2. Problem
In the 1990s, processors were becoming exponentially faster, while the improvement in disk drive performance was much slower. The size of main memory was also growing exponentially. As a result, large amounts of data were buffered in main memory and satisfied read requests, while disk traffic became dominated by writes. Existing file systems distributed data across various parts of the disk, which resulted in poor utilization of disk bandwidth because much time was spent in disk seeks.

3. Contributions
The main contribution of the paper is improving the performance of disk writes by writing all data sequentially, thus avoiding the overhead of disk seeks. Writes to disk are buffered in memory in large units called segments, which can be written to disk at once. New copies of inodes and inode map blocks are appended on every new write to the disk. The older copies become invalid as a result, causing fragmentation of data on the disk. A segment cleaner can be invoked that scans segments looking for live data and writes new segments containing a compacted version of that live data, thus freeing space. The concept of write cost is introduced to evaluate policies for cleaning segments, and cleaning overheads for regularly accessed parts of the file system are simulated. Segments are ranked by their utilization and the time they were last modified when choosing which to clean. Checkpoints with timestamps are written at fixed locations on the disk to aid crash recovery. Data written after the last checkpoint but before a crash can be partially recovered through roll-forward.
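
A toy comparison of the two ranking rules just mentioned (my own illustration, not the paper's simulator): under hot/cold locality, greedy cleaning keeps picking recently emptied hot segments, whereas the cost-benefit rule prefers a fuller but cold segment whose reclaimed space will stay free longer.

# Each segment: u = live fraction, age = time since last modification.
def greedy_pick(segments):
    return min(segments, key=lambda s: s["u"])

def cost_benefit_pick(segments):
    return max(segments, key=lambda s: (1 - s["u"]) * s["age"] / (1 + s["u"]))

segs = [{"name": "hot", "u": 0.15, "age": 1},     # nearly empty, but refills fast
        {"name": "cold", "u": 0.75, "age": 500}]  # fuller, but its space is stable
print(greedy_pick(segs)["name"])        # -> hot
print(cost_benefit_pick(segs)["name"])  # -> cold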

4. Evaluation
The evaluation is done in a production environment with the Sprite network operating system. Microbenchmarks are used to measure the performance of small and large file accesses. The new file system significantly outperforms Unix FFS for accessing small files. For large files, reads perform on par with Unix FFS, while writes outperform it due to lower write overheads. Statistics from a production environment collected over several months show the overheads of disk cleaning and the benefits of the cost-benefit policy on segment utilization.

5. Confusion
How are crashes during checkpoint updates handled?

Summary:
This paper proposes a new way of storing and managing data on disk called the log-structured file system. All writes to the disk are appended to a log-like structure, which also contains indexing information to keep reads efficient. It greatly improves the performance of small random writes, which were observed to be a major bottleneck in existing file systems. In order to maintain large free areas on disk, the log is divided into segments, and a segment cleaner is used to clean segments with little live data, using a cost-benefit policy.
Problem:
The rapid increase in processor speed increased the need to speed up disk I/O. Furthermore, large main memories are able to satisfy most reads, so the majority of disk accesses are write operations. Most of the delay was observed to be due to disk access time. Applications dominated by small random disk I/Os suffered due to the disk accesses required to update file system metadata.
Contribution:
The log-structured file system buffers changes in the file cache and writes them to the disk sequentially.
Two main issues addressed here are how to make read operations efficient and how to manage free space so that large extents are available for efficient writes. LFS contains inodes like traditional file systems, but rather than keeping them at fixed locations, it uses an inode map to track the current location of each inode. To address the second issue, LFS uses a combination of threading and copying. It writes data grouped into segments, which are large contiguous extents written sequentially on disk. A segment cleaner frees poorly utilized segments by copying their live data into other segments. Various segment cleaning policies are evaluated; the best results are obtained by treating hot and cold segments differently, using a cost-benefit policy to select the segments to be cleaned. In addition, crash recovery is much faster and is achieved by using checkpoints and roll-forward to recover most of the data written after the checkpoint.
Evaluation:
Different cleaning policies are analyzed using a file system simulator. For a uniform random access pattern, the simple greedy policy of selecting the segment with the lowest utilization worked well, while access patterns with locality and hot/cold grouping performed badly under the greedy policy. In that case the new cost-benefit policy performed very well, outperforming the best possible Unix FFS even at high disk capacity utilizations. Even when cleaning overhead is included, Sprite LFS can use 70% of the disk bandwidth for writing, whereas Unix file systems typically can use only 5-10%.
Confusion:
How does the cost-benefit policy produce a bimodal distribution of segments?

1. Summary
To overcome the slow progress in disk access speeds compared to processor speeds, a new technique of disk storage management is introduced, known as the log-structured file system. All modifications in a log-structured file system are written sequentially in a log-like structure, which speeds up both file writing and crash recovery.

2. Problem
Though processor speed, main memory size, and memory access speed have been increasing rapidly, disk transfer bandwidth and access time have improved only slowly. Also, if an application produces a series of small disk transfers separated by seeks, which is the case in most office and engineering applications, the application will not speed up. There are two major problems with file systems that preceded LFS. First, those file systems spread data around the disk, requiring too many small accesses to retrieve it. Second, they wrote synchronously, forcing the application to wait until each write completed.

3. Contributions
The log-structured file system focuses mainly on converting many small random writes into a single large sequential write. To achieve this, the paper tackles two major hurdles: first, the efficient retrieval of information from the log; second, management of free space on disk to ensure the availability of large contiguous free regions for writing new data.
Though most of the structures used in LFS are the same as in Unix FFS, inodes are not kept at fixed positions. Instead they are also written to the log, and a new structure called the inode map is maintained to locate them. Fixed checkpoint regions on disk mark the locations of the inode map blocks. To maintain large free regions for the log, LFS uses a hybrid approach combining threading and copying. The disk is divided into large fixed-size extents called segments. LFS maintains a structure called the segment summary block to support copying live data out of a segment in a process called segment cleaning. A combination of version number and inode number is used as a unique identifier, which simplifies liveness checking. This also eliminates the free-block bitmap, which simplifies crash recovery.
The paper also explains the policy considerations for segment cleaning, defining and using a write cost metric for this purpose. The paper then presents a two-pronged approach to crash recovery using checkpoints and roll-forward. LFS also introduces a directory operation log, which aids in restoring consistency.

4. Evaluation
Sprite LFS is implemented as part of the Sprite network operating system. The authors provide a detailed analysis of different segment cleaning policies using the write cost metric. The authors compare Sprite LFS with Unix FFS on SunOS: microbenchmark workloads are run on both, and performance under different circumstances is measured. Performance for both small-file and large-file workloads is evaluated on both file systems, demonstrating that LFS performs better. Detailed segment cleaning statistics and recovery times are also presented in the paper.

5. Confusion
1. Can you please explain the role of the segment summary block in data cleaning?
2. What happens if the system crashes again while roll-forward is happening?

Summary
This paper talks about the design and implementation of the log-structured file system, which aims at improving performance for small writes by eliminating seek costs. Additionally, it provides better crash recovery. The performance of LFS is compared with FFS, and it is shown that LFS performs better on most workloads.

Problem
CPU speeds and memory sizes have increased exponentially, allowing fast read and write operations. Disk access time has improved at a very slow rate, as it is determined by mechanical movement of the head, which is hard to speed up. Read operations have become fast because most files can be cached in memory, requiring no disk access. On the other hand, write operations need to access the disk to store the most recent data. Slow disk access is thus the bottleneck for small write operations.

Contributions
This paper presents the design and implementation of the log-structured file system.
>>The basic idea behind LFS is to buffer a sequence of write operations in the cache and write them to disk sequentially in a single write operation.
>>Inodes are not written at a fixed location; an inode map is used to determine the position of each inode.
>>The disk is divided into segments, which are used for large sequential write operations and to avoid fragmentation.
>>Segment cleaning is used to remove dead data and provide large contiguous free space. For this, all the live data in the segment to be cleaned is moved to a different segment, and the segment is marked clean so it can be used for writing.
>>A cost-benefit policy is used for cleaning, which allows cold segments to be cleaned at a much higher utilization than hot segments.
>>Techniques like checkpoints and roll-forward are used to provide crash recovery.

Evaluation
Simulation results showed that, under a greedy policy, locality and better grouping result in worse performance than a system with no locality. Thus the authors used the cost-benefit policy for segment cleaning. Experiments showed that LFS permits about 65-75% of the disk bandwidth to be used for writing, whereas Unix permits only 5-10%. The authors compared Sprite LFS with Unix FFS on SunOS 4.0.3. Sprite LFS was 10 times as fast as Unix FFS on workloads involving creating, reading, and deleting a large number of small files. In addition, it provided competitive performance even for large files.

Confusion
I am not very clear on why better grouping and locality resulted in worse performance under the greedy policy.

Summary
The paper talks about the design and implementation of the log-structured file system. The main goal of LFS is to optimize writes to disk by avoiding seeks (writing sequentially rather than updating already-written data in place).

Problem
The authors mention a few issues with existing file systems:
- CPU speeds have increased dramatically while disk access times have only improved slowly.
- Existing log-based file systems use the log only for temporary storage.
- Inodes are spread around the disk and are separate from the file's contents; hence it takes at least five separate I/Os to create a new file in Unix.
- Writes are synchronous, making it hard for applications to benefit from fast CPUs.

Contributions
LFS buffers a sequence of writes and writes them all at once sequentially; the writes happen asynchronously.
In this paper the authors discuss two main problems to tackle in the LFS design:
1. How to retrieve data from the log.
LFS uses an inode map to locate inodes. The inode map is divided into blocks that are written to the log; a fixed checkpoint region on each disk identifies the location of all the inode map blocks.
LFS's read path is as good as Unix's, and the focus is on optimizing writes.
2. Free space management (garbage collection).
Free space can be managed in two ways: 1. threading, which leaves live data in place and threads the log through the free extents, and 2. copying, which copies live data to a new place. LFS uses a combination of threading and copying. The disk is divided into large fixed-size extents called segments. To reclaim free space, the system must find out which blocks are live, which is done by keeping a segment summary block; to optimize this, version numbers for files are maintained. Segment cleaning policies are compared using the write cost, which is the total number of bytes moved to and from the disk divided by the number of those bytes that represent new data.

Another important contribution is faster crash recovery:
Sprite LFS uses a two-pronged approach to recovery: 1) checkpoints and 2) roll-forward. Sprite LFS uses a checkpoint interval of 30 seconds. Roll-forward is used to avoid losing data written after the last checkpoint: during roll-forward, LFS uses the information in segment summary blocks to recover recently written file data. Certain conditions are checked during roll-forward, and changes are incorporated only if they leave the file system in a consistent state.
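
A minimal sketch of the roll-forward step just described (the data layout is assumed for illustration, not Sprite's actual format): scan the summary blocks of segments written after the last checkpoint and re-point the inode map at any newer inode copies found in the log.

def roll_forward(checkpoint_imap, segments_after_checkpoint):
    """segments_after_checkpoint: one summary block per segment, each a list
    of (kind, inode_number, disk_address) tuples describing the segment's blocks."""
    imap = dict(checkpoint_imap)
    for summary in segments_after_checkpoint:
        for kind, inum, addr in summary:
            if kind == "inode":       # a newer inode copy survived the crash;
                imap[inum] = addr     # data blocks with no inode in the log are ignored
    return imap

# Inode 7 was rewritten at address 300 after the checkpoint recorded address 120.
print(roll_forward({7: 120}, [[("inode", 7, 300), ("data", 7, 304)]]))  # {7: 300}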


Evaluations
The authors use a collection of microbenchmarks to evaluate the performance of Sprite LFS and compare it with SunOS. For small-file writes, Sprite LFS outperforms SunOS. For large files, performance is as good as SunOS except for files read sequentially after being written randomly.

Confusions
How is LFS helpful today? Most disks are now flash, where writing sequentially doesn't improve performance as much.

1. Summary
The paper talks about the log-structured file system, which has faster crash recovery and file writing that is an order of magnitude faster than in existing file systems.

2. Problem
With improvements in CPU performance, disk technology could not keep up. With reads increasingly served from large memory caches, disk traffic is dominated by writes rather than reads. In existing file systems, much of the disk bandwidth was wasted on seeks. Thus, the benefit of improved CPU performance was undermined by disk access times.

3. Contributions
The novel contribution of the paper is that it proves that a log-structured file system can use disks an order of magnitude more efficiently than existing file systems.
In a log-structured file system, all modifications to the disk are written sequentially in a log-like structure. The sequential nature of the log permits much faster crash recovery, since a log-structured file system only needs to examine the most recent portion of the log. This sequential log is also the only structure on disk, and it contains indexing information so that files can be read back from the log efficiently.
In order to maintain large free areas on disk for fast writing, the log is divided into fixed-size segments. Segment cleaning overhead is reduced by adopting a cost-benefit cleaning policy that treats cold and hot data differently. Such an approach leads to the desired bimodal distribution of segment utilizations. Also, in the cleaned segments, cold data is grouped together to form more stable segments. Sprite LFS takes a two-pronged approach to crash recovery: checkpoints and roll-forward. A checkpoint is a position in the log at which all of the file system structures are consistent and complete. To recover as much information as possible, the log segments written after the last checkpoint are scanned; this is called roll-forward.

4. Evaluation
First, the authors present simulations used to test segment cleaning policies, from which they choose a policy based on cost and benefit. The authors implemented a prototype log-structured file system called Sprite LFS. It is shown to outperform then-current Unix file systems by an order of magnitude for small-file writes. Furthermore, while Unix file systems can typically use 5-10% of the disk bandwidth for writing, Sprite LFS can use 70%. For large files, LFS is comparable to FFS. Segment cleaning cost was evaluated experimentally and was found to be much lower than that observed in the simulations.

5. Confusion
Could you explain what exactly live data is in the context of segment cleaning?
Not exactly a confusion: could you talk about state-of-the-art crash recovery mechanisms?

Summary
The authors present a new file system in the Sprite OS called the log-structured file system, which writes file system updates sequentially. Its concept is to cache a large amount of new data in main memory and do a single large sequential I/O that can utilize all of the disk bandwidth. The aim of LFS was to optimize small-file accesses, but as a positive side effect it does well for large files too. LFS utilizes the disk an order of magnitude better than existing file systems.

Problem
CPU speeds have increased exponentially, whereas disk access times haven't improved at the same rate. In addition, with more CPU power and larger main memories, applications are becoming disk-bound. The problems with current file systems like Unix are: 1) information is spread around the disk in a way that causes too many accesses (five accesses to create a new file), and 2) synchronous writes: metadata must be written to disk before returning to the application, even though data writes are buffered.

Contributions
The main contributions of the paper are: 1) the design and implementation of LFS, which utilizes disk bandwidth an order of magnitude better, and 2) easier crash recovery.

LFS: Sprite LFS achieves better disk utilization through sequential writes. A modified block of a file is not overwritten in place; instead, the new version is appended to the log. To invalidate the previous version of the block, the inode's block pointer is made to point to the new disk block, and the updated inode is also appended to the log. The inode update is in turn reflected by writing the inode map to the log; the inode map is divided into blocks that are pointed to by a fixed location on disk.
With this sequential organization of writing, disk space would be exhausted by frequent writes, so the challenging part of LFS is free space management. A segment cleaner is run to free older regions that may hold no-longer-referenced blocks. An issue arises with long-lived files: with every pass of the cleaner, the blocks of such files have to be copied into the log to free segments. LFS divides the disk into large fixed-size segments, and cleaning is done per segment. To clean a segment, live data has to be tracked; thus a segment summary block is written for every segment, identifying the file to which each block belongs and the block's position within that file. By checking the inode's pointer, it can be determined whether a block is live. For faster detection of live blocks, a combination of version number and inode number, called the UID, is used. Whenever a file is created or deleted, the version number is incremented. To check a block's liveness, its UID is matched with the UID currently stored in the inode map: if they match, the block is live; otherwise it is not.
Because of these arrangements, LFS needs no free-block list or bitmap.
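
A small sketch of that liveness shortcut (names and structures are illustrative, not Sprite's code): the cleaner compares the UID recorded in a block's summary entry against the current version in the inode map; a stale version means the block can be discarded without reading its inode.

def block_is_live(summary_uid, inode_map_versions):
    """summary_uid: (inode_number, version) recorded when the block was written."""
    inum, version = summary_uid
    return inode_map_versions.get(inum) == version

versions = {7: 3}                       # file 7 is currently at version 3
print(block_is_live((7, 3), versions))  # True: block belongs to the live file
print(block_is_live((7, 2), versions))  # False: written before a delete/truncate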

Crash Recovery: Recovery from a crash in LFS is quite fast, enabled through checkpoints. Checkpoints are consistent file system states recorded periodically. After a crash, recovery can start from a checkpoint (pointed to by a fixed location on disk). To avoid problems when a crash happens during a checkpoint write, two fixed locations are used as ping-pong buffers (the checkpoint with the latest timestamp is chosen for recovery). After reboot, LFS starts from the checkpoint and goes through the log segments to see what information can be recovered; this phase is called roll-forward. One issue that can occur during recovery is inconsistency between directories and inodes (e.g., an inode's reference count decremented by an unlink but not yet reflected in the directory). Such situations are handled by writing a special record called the directory operation log, which describes the operations performed on the directory and is written before the directory block.

Evaluation
The authors provide a decent evaluation of LFS. They test LFS as a production file system on natural workloads for months, which gives better insight into natural use of the FS. The authors note that because the benchmarks are CPU-bound, the speed-up achieved by Sprite LFS is limited. They compare the best-case performance of LFS with the SunOS file system based on Unix FFS. On microbenchmarks, the authors show that LFS kept the disk only 17% busy while saturating the CPU, whereas SunOS kept the disk busy 85% of the time. The authors compare the two file systems by creating new files and measuring files created per second. As claimed, LFS performs an order of magnitude better than SunOS. Even for large files, LFS performs better than SunOS, except for sequential reads after random writes.

Questions
1. Please explain LFS's inherent advantage on flash storage.
2. What is bimodal distribution and how is it helpful in achieving higher disk capacity utilization with low write costs?

1. Summary
This paper presents the log-structured file system, which writes all modifications to disk sequentially in a log-like structure to improve write speed by reducing seek time, and also speeds up crash recovery.
2. Problem
Processors, disks, and main memory are the three technologies affecting file system design, and among them disk transfer bandwidth is the bottleneck since it is the hardest to increase, especially compared with processor speed. Increasingly large main memories serve as a file cache, which makes writes the dominant disk operation and also allows a batch of writes to be buffered before presenting them to the disk. Emerging office and engineering applications tend to be dominated by accesses to small files, which result in small random disk I/Os. However, the file systems of the time did not cope well with these technology and application trends, because they tended to spread information around the disk, requiring multiple separate I/Os, each preceded by a seek, to access, create, or delete files, and because they wrote metadata synchronously, leaving disk traffic dominated by metadata writes.
3. Contributions
This paper proposes solutions to two key issues in a log-structured file system: indexing and free space management. For indexing, LFS keeps an inode for each file and writes inodes to the log. To locate inodes, LFS uses an inode map whose blocks are pointed to by a fixed-location checkpoint region on each disk. For free space management, LFS uses segments, which combine threading with a copy-and-compact policy. The segment cleaning policy considers both a segment's utilization and the age of its data. LFS cleans cold segments (not modified for a long time) at a higher utilization threshold than hot segments, reclaiming free space in cold segments earlier and postponing the cleaning of hot segments, since data in hot segments dies more quickly. The utilization and age data are stored per segment in a segment usage table. For crash recovery, LFS uses both checkpointing and roll-forward, and since the most recent changes are all at the tail of the log, crash recovery in LFS does not require scanning the whole disk as in FFS.
4. Evaluation
The authors implement LFS and put it into a real production environment. Everyday usage does not reveal obvious benefits of LFS, since the machines used are not fast enough to be disk-bound under current workloads. They also use a set of microbenchmarks, which create, read, and delete a large number of small files or create large files, to show the best-case performance of LFS, and show that LFS has higher bandwidth than FFS except for sequential re-reads after random writes. They also show with real usage data that the cleaning overheads are acceptable and were reduced after installing the roll-forward crash recovery code.
5. Confusion
When LFS does segment cleaning, does it clean each segment separately or clean multiple segments and compact the live data in different segments together before writing them out?

1. Summary

This paper presents the design and implementation of the Log-Structured File System (LFS). LFS is optimized for write traffic to the disk, assuming that most read requests can be satisfied by the in-memory cache. The log is the only structure on the disk, and everything, including file data and metadata, is written sequentially at the head of the log. This avoids the rotations and seeks to block locations on each write and improves write performance significantly. Read performance in the common case is no worse than in traditional Unix file systems.

2. Problem

Due to increasing memory sizes, most read requests can be satisfied by the in-memory cache, so disk traffic is dominated by writes. The authors noted, however, that the write bandwidth achieved by most file systems was a meager fraction of the peak available disk bandwidth. This was because each write updated multiple on-disk data structures located at different places, so each write involved multiple rotations and seeks, degrading application-perceived throughput significantly.

3. Contributions

The basic idea of LFS is to keep a log structure on the disk and write everything by appending to that log. This avoids the need to update multiple disk locations for metadata on each write; as a result, seeks and rotations are avoided. Furthermore, LFS performs write buffering such that the buffer is only spilled to disk when its size reaches one segment. A segment spans multiple blocks and can be megabytes in size, further improving write throughput. LFS also writes updated inodes along with the newly written data blocks at the head of the log, unlike traditional file systems where inode locations are fixed and can be computed statically. To locate the most recent version of an inode, LFS keeps another data structure called the inode map, which maps each inode number to the disk location of its latest copy. The inode map itself is written in chunks along the log; however, it is also always cached in memory, which makes finding the latest inodes much faster. A checkpoint region at a fixed location on disk stores pointers to all the inode map chunks. This region is only read during crash recovery to find the inode map and is updated at 30-second intervals; in the common case, the inode map is read from memory.
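
A tiny sketch of the write-buffering behavior just described (sizes and names are my assumptions): dirty blocks accumulate in memory and are flushed as one large sequential write once a full segment's worth has been collected.

SEGMENT_SIZE = 1 << 20   # e.g. 1 MB segments, as in Sprite LFS
BLOCK_SIZE = 4096

buffer = []

def write_block(block, flush_segment):
    """Buffer a dirty block; emit one sequential segment write when full."""
    buffer.append(block)
    if len(buffer) * BLOCK_SIZE >= SEGMENT_SIZE:
        flush_segment(list(buffer))   # one large sequential disk write
        buffer.clear()

# 256 buffered 4 KB blocks trigger exactly one 1 MB segment write.
for _ in range(SEGMENT_SIZE // BLOCK_SIZE):
    write_block(bytes(BLOCK_SIZE), lambda seg: print(f"wrote {len(seg)} blocks sequentially"))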

LFS also has a garbage collector to create contiguous regions of free space on the disk. It runs a cleaner as a background process that consolidates segments by getting rid of the disk blocks no longer in use.

4. Evaluation

The authors implemented LFS as part of the Sprite network operating system. They compare Sprite LFS with SunOS, whose file system is based on Unix FFS. The microbenchmarks show that Sprite LFS performs much better than SunOS for read, write, and create operations. For large-file read and write bandwidth, LFS performs better in almost all scenarios except the one in which the file is written randomly but read sequentially.

5. Confusions

Do any large-scale commercial file systems today use something similar to LFS? With current memory and storage capacities, LFS seems to be a decent choice. High storage availability might mean the cleaner is not needed at all, or only very infrequently.

1) Summary

As processor speeds increase and memory capacities increase, many applications are becoming disk-write-bound. Thus, the authors propose the log-structured filesystem, in which most writes are sequential and few seeks are required. This optimizes the filesystem for the common-case disk access: a write to a small file. They show substantially better disk usage than Unix-like filesystems.

2) Problem

Processor speeds and memory capacities are increasing more quickly than disk latency and bandwidth improve. This means that computation and cached disk reads are much faster, but writes are slow. Traditional filesystems often require multiple disk accesses separated by seeks to write a file. This is particularly slow on HDDs, where parts must physically move to read the disk platters, wasting a great deal of time and disk bandwidth.

3) Contributions

The authors propose a novel filesystem organization on disk. The log-structured filesystem turns all writes into a log append. This keeps writes sequential. Reads are assumed to be cached in the common case.

Part of the contribution of the paper is a discussion of the different design alternatives and choices the authors made. For example, the authors describe at length why they chose their reclamation strategy. Their design discussions also serve to identify performance bottlenecks in more traditional filesystems.

4) Evaluation

The idea of a log-structured filesystem is rather elegant, in my opinion. Almost every operation is expressed as a log append -- even cleaning! The simplicity of the idea makes it intuitive to understand and simplifies implementation somewhat.

The paper itself is well-written. The authors state their assumptions about the common case before proceeding to optimize for it. They assume that reads are usually cached, that sequential operations are fast, and that most files are small. Moreover, the authors extensively discuss their design alternatives and compare their design choices with Unix filesystems to show where their performance improvement comes from.

However, one wonders why they chose to compare only with Unix filesystems. Why not journaling or FAT filesystems? Likewise, the authors fail to answer a few key questions in their evaluation. For example, what percentage of disk writes are of data that has not changed (e.g. one might expect that inodes don't change much after they are first written, so do we really want to keep writing new inodes to segments)?

Also, their crash recovery seems weak. 30s to "minutes" of work is a lot, even from a human perspective; it is forever in the computer world, though, and losing 30s to minutes of work might be catastrophic, depending on the workload.

Finally, I found the graphs in the paper extremely hard to read because they used dashed and dotted lines for both the borders of the plot and the plotted data.

5) Confusions/Questions

1) Are log-structured filesystems still valid with SSDs? I am guessing from the reading list that they require some changes. One would assume that since sequential accesses are not that much faster than random accesses, log-structuring would not offer much benefit. Moreover, constantly rewriting cleaned segments would wear down the disk.

2) Why are journaling filesystems more common in consumer systems? Don't they require more writes than log-structured filesystems? If I recall correctly, NTFS, HFS, and ext4 are all journaling filesystems.

1. Summary
This paper introduces the log-structured file system, which collects a large amount of new data in a memory file cache and then writes it all to disk sequentially in a single large I/O, in a log-like structure.

2. Problems
The main file system problem this paper deals with is write performance. Since memory is getting larger, read performance can be largely addressed by a big file cache, but a file write in a traditional filesystem includes modifying the inode, directory, data, etc., which requires many write operations and much random access on disk, so it can utilize only a small portion of the ideal disk bandwidth. Also, a traditional FS is not good at dealing with small files, because they create more small write operations scattered across the whole disk.

3. Contributions
The first contribution of this paper is how to locate data, since data blocks, inodes, and segment usage tables are all scattered across the disk. The paper uses the inode map to find the newest version of an inode; after finding the correct inode, the rest is the same as in a traditional filesystem. There is also a checkpoint region at a fixed location that records the locations of all inode map blocks.
The second contribution is how to manage free space, since LFS does not overwrite existing data blocks. The paper merges blocks into segments as the unit of cleaning, and runs the segment cleaner when there is not much space left. It also discusses which segments should be chosen for cleaning: instead of greedily choosing the most fragmented segment, the paper uses simulation results to show that hot and cold segments should be considered separately, and proposes a cost-benefit criterion for segment cleaning.

4. Evaluation
The implementation of LFS, called Sprite LFS, is evaluated first on microbenchmarks, without considering the effect of cleaning. For small-file operations, the Unix FS uses only 1.2% of the ideal bandwidth, and Sprite LFS is about 10 times faster. For large-file or sequential operations, Sprite LFS is not much slower than the Unix FS, which means LFS is not limited to processing small files. To measure the overhead of cleaning, the authors ran a long experiment on real workloads for months; the average write cost was 1.2 to 1.6, which means that the data copying and threading do not impose a large overhead.

5. Confusion
(1) Can you give some details about the segment usage table? Is it scattered across the whole disk just like the inode map? What is its storage structure? The paper only describes it in prose, not in a figure, so I am not sure I understand it correctly.

1.Summary
The log-structured file system improves write performance significantly while keeping read performance as good as Unix FFS. LFS buffers writes in a large in-memory cache and writes large chunks of cached blocks to the disk at once, thereby minimizing disk writes. All writes (file system changes) are appended to a log on the disk.

2. Problem
The performance of the disk has not improved as much as the performance of the CPU and main memory, so operations involving the disk have become the bottleneck to overall performance improvement.
In the Unix FFS, most of the time is spent performing metadata updates rather than writing actual data, especially for small random writes.
Unix FFS forces writes to file system data structures such as directories and inodes to happen synchronously, which is a major performance bottleneck for small writes.

3. Contributions
The two key issues to be addressed in implementing LFS are retrieving information from the log (i.e., how to make reads fast) and managing disk space so that large extents of free space are always available for writing new data.
Sprite LFS solves the first issue by using an inode map that points to the latest inode of each file. It solves the second issue by combining threading and copying policies to reclaim space. Sprite LFS writes segments (large fixed-size contiguous runs of blocks) at once. Segment cleaning refers to copying the live data out of a segment before marking the segment clean so it can be used for new data.
Sprite LFS considers various segment cleaning policies and answers questions such as when the segment cleaner should execute, how many segments it should clean at a time, which segments should be cleaned, and how the live blocks should be grouped when written out.
The write cost is used to decide which segment cleaning policy works best.
Checkpoints and roll-forward are discussed as the methods for crash recovery.

4. Evaluation
Sprite LFS is evaluated against Unix FFS in SunOS. The authors simulate the file system and run workloads, discovering that hot and cold segments should be treated differently by the segment cleaner, which led to the cost-benefit policy. It is shown that Sprite LFS with the cost-benefit policy outperforms Unix FFS on almost all kinds of workloads.

5. Confusion
What segment cleaning policies are used in the current file systems which are implementation of the log structured file systems?

1. summary
The paper proposes a new model for managing storage: the log-based file system. The authors target problems encountered by systems like Unix and provide the LFS solution. They discuss the crucial issues that LFS needs to tackle and conduct experiments showing the performance improvement for writes.

2. Problem
With processors becoming faster and memories becoming bigger, disk access has become the bottleneck of systems. While reads can be cached, reducing their performance impact, write operations need a new strategy to optimize, to boost the performance of the whole system. Specifically, most disk traffic is writes, so the authors would like to optimize writes to small files while maintaining the performance of other operations. Current file systems like Unix do too many disk accesses (seeks) while writing, which should be reduced.

3. Contributions
a) Proposed the log-structured file system, which has had a large influence on later systems.
b) Uses consolidation of writes to improve write performance, reducing the roughly five writes and five seeks needed to create a file to a single sequential write.
c) Segment cleaning: read segments into memory, find the live data, and write the live data out to clean segments.
d) Analysis and support for the cost-benefit cleaning policy (maintaining a table that stores the number of live bytes in each segment and the latest write time to that segment); see the sketch after this list.
e) Checkpoints and roll-forward for recovery.
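As a rough illustration of item d), the cost-benefit policy ranks segments by benefit over cost: the benefit is the free space generated times how long it is likely to stay free (the age of the youngest data in the segment), and the cost is reading the segment plus writing back its live data. A hedged C sketch (names are ours):

    /* clean_priority: the paper's benefit/cost ratio for one segment.
       u   = fraction of the segment still live (from the usage table)
       age = time since the most recent modification of any block */
    double clean_priority(double u, double age) {
        return (1.0 - u) * age / (1.0 + u);
    }

The cleaner picks the segments with the highest ratio, which is what makes it clean cold segments at a higher utilization than hot ones.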

4. Evaluation
a) The experiments show that write performance can be an order of magnitude higher than FFS while read performance is not hurt.
b) They also show simulation results for two file access patterns.
c) Sprite LFS can use up to 70% of the disk bandwidth for writing, while Unix FFS typically uses only 5-10%.
d) It is somewhat surprising that locality and grouping result in worse performance under the greedy policy.

5. Confusion
a) How does LFS interact with calls like fsync()?

Summary:
The paper presents the design and implementation of a new file system called the log-structured file system. The new technique improves the speed of small-file writes and crash recovery as compared to the Unix file systems of the time.

Problem:
During the early 1990s, existing file systems spread information around the disk in a way that caused too many small disk accesses. Another issue was synchronous write operations to disk. These two problems limited the ability of existing file systems to cope with increasing memory sizes and changing workloads; they typically could use only 5-10% of the disk bandwidth for writing.

Contributions:
The authors implemented a prototype log-structured file system called Sprite LFS. In Sprite LFS, modifications are buffered in the file cache and then written sequentially to a log-like structure in a single write operation. This eliminates most seek time since the log is written sequentially. Sprite LFS needs large chunks of free space to operate effectively, so it divides the disk into large chunks called segments, and a segment cleaner process creates empty segments by compacting live data from fragmented segments. The segment cleaner uses a cost-benefit policy to choose which segments to clean; under this policy, hot and cold segments are treated differently, with cold segments cleaned at higher utilization. Sprite LFS uses checkpoints and roll-forward to handle crash recovery. It also maintains a directory operation log to restore consistency between directory entries and inodes after a crash.

Evaluation:
The authors used a collection of small benchmarks to measure the best-case performance of Sprite LFS and compare it to SunOS, whose file system is based on Unix FFS. Small-file write performance in Sprite LFS is an order of magnitude better than in SunOS, and read and large-write performance is as good as SunOS. An interesting result is that Sprite LFS kept the disk only 17% busy while saturating the CPU during the file creation phase, so Sprite LFS will be able to take advantage of increasing CPU speeds.

Confusion:
The authors tried the greedy policy plus age sorting to select which segments to clean. I could not understand why this policy performs worse than the plain greedy policy, i.e., why "locality" and better "grouping" result in worse performance than a system with no locality?

1. Summary
This paper presents a log-structured file system. The fundamental goal is to avoid slow disk seeks on small-file accesses, which is achieved by writing buffered I/Os sequentially.

2. Problem
At the time the paper was published, processors were speeding up at a nearly exponential rate, but disks did not keep pace with the CPU. Disk bandwidth is limited by slow seeks, which are common in small-file writes. Main memory was also increasing in size at an exponential rate, which enabled file systems to cache recently used file data in main memory, so disk traffic became dominated by writes. Thus small-file writes were the bottleneck of file systems. Unfortunately, most office and engineering applications frequently access small files, resulting in small random disk I/Os limited by seek performance.

3. Contributions
This paper presents a log-structured file system. It buffers writes and commits them all in a single sequential write of blocks, which increases write performance dramatically by eliminating almost all seeks, and it also gives much faster crash recovery. The idea of logging is not new, but this paper is the first to store data permanently in the log. The most difficult challenge in a log-structured FS is to guarantee that large extents of free space are always available. The authors handle this by batching writes into segments and using a segment cleaner process to regenerate empty segments by compacting the live data from heavily fragmented segments. The authors also build a prototype log-structured FS called Sprite LFS, whose raw writing speed is more than an order of magnitude greater than that of Unix for small files.

4. Evaluation
The authors first evaluate Sprite LFS on micro-benchmarks to see its best-case performance. Sprite LFS is compared with SunOS 4.0.3 (Unix FFS) on a 300MB file system. For small-file performance, Sprite LFS outperforms SunOS and will also speed up further given faster CPUs. For large files, Sprite LFS is better at random and sequential writing. The authors also measure the overhead of cleaning on five real file systems over several months: the measured write costs of 1.2-1.6 compare favorably with the simulated write costs of 2.5-3.0.

5. Confusion
None.

1. Summary
Rosenblum and Ousterhout present the log-structured file system, designed in an effort to curb the bottleneck of disk efficiency. They cleanly lay out the problem at hand and combine techniques from various fields, including programming languages and databases. The result is Sprite LFS, which dramatically outperformed FFS on both real workloads and synthetic benchmarks.

2. Problem
The three core components that influence file system performance are processors, disks, and main memory. Among the three, the rate of improvement of disks is significantly slower than that of the other two. How can file systems be designed so that disks do not become the primary bottleneck?

3. Contribution
I would say the major contribution here is the design and implementation of the log structure itself. While their design goals and resulting implementation focused on file systems, the idea continues on today in the log structuring used by SSDs and in the popularization of LSM trees.

Providing some high-level details, they solved two key issues: how to retrieve information and how to manage free space. The first issue was solved rather simply by adding another level of indirection from the checkpoint region; thus, the inode map and inodes are allowed to reside within the log. Additionally, the log is maintained using segments rather than individual blocks. The second issue, free space management, accounted for a significant amount of their work, since it dominates efficiency. It is handled using a garbage collection technique that combines the temperature of the data with the amount of live space in each segment.
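A minimal sketch of that level of indirection, assuming hypothetical names and an in-memory cache of inode map blocks (none of this is the Sprite LFS source):

    #define IMAP_PER_BLOCK 512                /* assumed entries per imap block */

    typedef struct { long inode_addr[IMAP_PER_BLOCK]; } ImapBlock;

    extern long       checkpoint_region[];    /* fixed on disk: imap block addrs */
    extern ImapBlock *read_block(long addr);  /* hypothetical cached block read */

    /* Resolve an inode number to the disk address of its latest inode;
       the inode map blocks themselves live in the log. */
    long lookup_inode(int inum) {
        long imap_blk = checkpoint_region[inum / IMAP_PER_BLOCK];
        return read_block(imap_blk)->inode_addr[inum % IMAP_PER_BLOCK];
    }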

Crash recovery is another important aspect, handled using the checkpoint region and an operation called roll-forward. A checkpoint is taken in two phases: first, all modified data and structures are flushed to disk; second, the checkpoint region is updated along with the current time and a pointer to the last segment. Two checkpoint regions are used to avoid the problem of incomplete checkpoints, by using the one with the later of the two timestamps. Roll-forward is more or less a byproduct of the log-structured design, with the addition of the directory operation log to keep directories consistent. As its name implies, it reads through the log after a crash and applies the corresponding advancements to the FS state.
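The two-checkpoint-region trick is simple enough to sketch; after a crash, the timestamps decide which region to trust (the struct fields are our guess at the minimum needed, not the actual on-disk format):

    typedef struct {
        long last_segment;     /* where roll-forward starts scanning */
        long imap_block_addr;  /* locations of inode map blocks, etc. */
        long timestamp;        /* written last, so a torn write is detectable */
    } CheckpointRegion;

    /* Recover from the newer of the two regions; a crash mid-checkpoint
       leaves the other region intact. */
    const CheckpointRegion *pick_checkpoint(const CheckpointRegion *a,
                                            const CheckpointRegion *b) {
        return (a->timestamp > b->timestamp) ? a : b;
    }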

4. Evaluation
The evaluation of the LFS is composed of three parts: simulation, synthetic benchmarks, and experience in production. The key metric used is write cost (write amplification), which makes sense as it conveys the maximum throughput of a disk for a workload. Sprite LFS outperformed FFS in all categories except random writes followed by sequential reads. Ultimately, the real-world measurements showed the benchmark cleaning costs to be upper bounds, as real-world cold data is colder than the benchmark data and greater locality can be found within segments.

5. Discussion
With respect to current day file systems, where do we see the findings drive file system design for disks?

Summary

The paper presents the log-structured file system (LFS), a technique for disk storage management. LFS speeds up both file writing and crash recovery by writing changes to disk sequentially in a log-like structure.

The log is the only structure on disk; no bitmap or free-block list is needed. The log is divided into segments to maintain large free areas on disk. The paper also proposes a segment cleaner to compact the live information from heavily fragmented segments, with a cost-benefit policy adopted for the cleaner. The design uses checkpoints and roll-forward to recover from inconsistency after a crash.

The authors implemented an LFS called Sprite LFS, which outperformed Unix for small-file writes and matched or exceeded Unix performance for reads and large writes.

Problem

1. CPU's impact on disk
CPU speeds have increased dramatically while disk access times have improved only slowly, causing more applications to become disk-bound.
2. Main memory's impact on disk
a. Larger file caches absorb more read requests, so disk traffic (and hence disk performance) becomes increasingly dominated by writes.
b. Larger file caches can serve as write buffers, collecting a large number of modified blocks before writing them to disk.
3. Workload: small-file I/O performance is dominated by updates to FS metadata.
4. Problem with file systems of the time
They spread information around the disk, causing too many small accesses, and they write metadata synchronously.

Contribution

LFS improves write performance by buffering a sequence of file system changes in the file cache and then writing all the changes to disk sequentially in a single disk write operation. This approach increases write performance dramatically by eliminating almost all seeks.
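The buffering itself might look something like the following sketch, assuming a raw-disk descriptor and a helper that returns the next clean segment's offset (all names are ours, not Sprite LFS source):

    #include <string.h>
    #include <unistd.h>

    #define SEG_SIZE (512 * 1024)   /* assumption; the paper uses ~0.5-1 MB */

    extern int  disk_fd;                    /* hypothetical raw-disk fd */
    extern long next_segment_offset(void);  /* hypothetical segment allocator */

    typedef struct { char data[SEG_SIZE]; size_t used; } SegBuf;

    /* Accumulate dirty blocks in memory; when the segment fills, flush
       it with a single large sequential write instead of many small ones. */
    void lfs_append(SegBuf *b, const void *blk, size_t len) {
        if (b->used + len > SEG_SIZE) {
            pwrite(disk_fd, b->data, b->used, next_segment_offset());
            b->used = 0;
        }
        memcpy(b->data + b->used, blk, len);
        b->used += len;
    }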

At the same time, LFS also improves crash recovery due to its sequential nature. After a crash, the FS does not need to scan the entire disk; it only needs to examine the most recent portion of the log.

Here are some specific mechanisms worth noting:
1. File Location and reading
Sprite LFS writes inodes to the log instead of placing them at fixed positions. An inode map is used by Sprite LFS to track the current location of each inode, and a fixed checkpoint region on each disk identifies the locations of all the inode map blocks.
2. Segments
Sprite LFS uses a combination of threading and copying: any given segment is always written sequentially from its beginning to its end, while threading happens among segments. All live data must be copied out of a segment before the segment can be rewritten.
3. Segment cleaning
Copying live data out of segments (sketched after this list): read a number of segments into memory, identify the live data, and write the live data back to a smaller number of clean segments.
4. Cleaning policy: clean cold segments at a much higher utilization than hot segments.
5. Crash recovery: checkpoints and roll-forward. Checkpoints define consistent states of the file system, and roll-forward recovers information written since the last checkpoint.
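Mechanism 3 above can be sketched as a simple loop (the helper functions and the Segment layout are hypothetical, not Sprite LFS source):

    typedef struct Segment {
        int   n_blocks;
        void *block[1024];   /* assumed blocks per segment */
    } Segment;

    extern void read_segment(Segment *s);
    extern int  block_is_live(const Segment *s, int j);  /* via segment summary */
    extern void append_to_clean_segment(const void *blk);
    extern void mark_clean(Segment *s);

    /* One cleaning pass: read n segments, keep only the live blocks,
       then mark the sources clean for reuse. */
    void clean_pass(Segment *segs[], int n) {
        for (int i = 0; i < n; i++) {
            read_segment(segs[i]);
            for (int j = 0; j < segs[i]->n_blocks; j++)
                if (block_is_live(segs[i], j))
                    append_to_clean_segment(segs[i]->block[j]);
            mark_clean(segs[i]);
        }
    }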

Evaluation

The authors implemented Sprite LFS, which was in production as part of the Sprite network operating system. Sprite LFS was first compared against SunOS 4.0.3, whose file system is based on Unix FFS. According to the paper, the benchmark programs demonstrate that the raw writing speed of Sprite LFS is more than an order of magnitude greater than that of Unix for small files. It is worth noting that no cleaning occurred while running these small-file benchmarks, so the measurements represent best-case performance.

When it comes to read and large-file performance, Sprite LFS matches or exceeds SunOS except for files read sequentially after being written randomly.

Later, the authors also provide measurements of cleaning overhead and crash recovery.

Confusion

1. Many of the efforts in the LFS design address access time, with a focus on seek time. As SSDs are getting increasingly popular nowadays, does LFS still provide much benefit on SSDs?
2. What is the significance of Figure 3? It looks like LFS has a heavier write cost there than both "FFS today" and "FFS improved".
3. How does the indexing mechanism (which supports reading back from the log efficiently) work?

1. Summary
The paper presents a new file system called the log-structured file system (LFS), which aims to improve write performance by buffering writes in sequential log-like structures called segments and writing them to disk later. Data and metadata are written in sequential structures, which allows sequential write speeds for workloads with random writes. The paper also deals with the problems that follow from this design, such as metadata no longer having a fixed location and the need for garbage collection.

2. Problem
Disk speeds are not growing at the same rate as processor speeds. With memory growing cheaper, performance of reads can be improved by caching more data in memory instead of keeping it in disks. Thus application performance would be limited by the performance of writes to disk. Existing file systems achieve very low disk bandwidth utilization (5-10%) for random writes because of the data & metadata layout and the mechanical delays associated with seeks which can’t be eliminated. The authors propose eliminating this problem by buffering writes to memory and later writing to disk in big, sequential chunks, thus achieving high disk bandwidth utilization and write speeds because of minimal time spent in seeks.

3. Contributions
Their main contribution lies in identifying that, for buffering writes in memory and writing to disk later to work, the file system needs both the data and the metadata to be written in the sequential structures. At first glance this may seem like a simple system, but a closer inspection reveals several challenges. The other contributions of this paper lie in addressing those challenges:
1. Since inodes are now scattered across the log structures called segments and are not located at a fixed place, inode maps are introduced, which allow storage of both metadata and data in the same structure.
2. Garbage collection mechanisms so that old data and metadata segments are cleaned up for correctness and also to ensure that there is enough free space for sequential segments.
3. Proposal of different garbage collection policies for hot and cold data. This is essential because we don’t want to waste our garbage collection efforts on hot data that will get overwritten soon.
4. Checkpointing and roll-forward mechanisms for crash recovery (a sketch of roll-forward follows this list).
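A hedged sketch of roll-forward as described in item 4: scan the log segments written after the last checkpoint and re-apply the inode locations recorded in their summaries (all types and helpers here are assumptions, not the actual implementation):

    typedef struct { int is_inode; int inum; long addr; } SumEnt;
    typedef struct Segment {
        long   addr;
        int    n_summary;
        SumEnt summary[256];   /* assumed summary capacity */
    } Segment;

    extern Segment *next_segment(long addr);   /* NULL at end of log */
    extern int      summary_valid(const Segment *s);
    extern void     inode_map_set(int inum, long addr);

    void roll_forward(long last_checkpoint_segment) {
        for (Segment *s = next_segment(last_checkpoint_segment);
             s != NULL && summary_valid(s);     /* stop at a torn segment */
             s = next_segment(s->addr)) {
            for (int j = 0; j < s->n_summary; j++)
                if (s->summary[j].is_inode)     /* newer inode copy in log */
                    inode_map_set(s->summary[j].inum, s->summary[j].addr);
        }
    }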

4. Evaluation
The authors implement LFS for the Sprite OS. It is evaluated against the Unix FFS-based file system on SunOS 4.0.3. For a microbenchmark which creates, reads, and deletes a large number of small files with no cleaning overhead, Sprite LFS is 10 times faster than SunOS FFS, proving its effectiveness for its targeted workload of small files. Benchmarks for large files show that Sprite LFS is competitive with SunOS in most cases, demonstrating that it does not degrade performance for large-file workloads. The authors also present statistics from several months of production use of Sprite LFS to demonstrate the cleaning overheads, and show that Sprite LFS achieves higher disk capacity utilization with the cost-benefit policy than SunOS FFS.

5. Confusion
This will probably get answered by reading the next paper, but can we discuss why the idea of log-structured FS fell out of favor? Was it because of the high cleaning overheads or because of the limitations of the crash recovery mechanisms?

1. Summary
This paper presents the log-structured file system, implemented as Sprite LFS, as a competitor to the typical file systems of the day. Sprite LFS was designed to improve write speed and crash recovery speed. Since log-structured file systems need sequential runs of blocks for writing, the paper also presents the segment cleaner as a way of regenerating empty segments.

2. Problem
With CPU speeds increasing exponentially while disk speeds have improved only slightly, the disk has become a major bottleneck of systems. The paper notes that most office and engineering workloads are dominated by accesses to small files, leading to large metadata-update overheads in typical file systems. One problem with typical file systems is that they spread information around the disk, leading to many small accesses and poor bandwidth utilization. Another is that after a crash they must scan the entire disk to restore consistency, a major time loss in recovery that only grows with disk sizes. It is also beneficial for performance to allow writes to happen asynchronously, instead of coupling an application's performance to the disk through synchronous writes.

3. Contributions
A technology push that the paper is based around is that, with memory sizes growing, most disk traffic will be writes. The log-structured file system groups many small synchronous writes into larger asynchronous sequential transfers. Sprite LFS does not place inodes at fixed positions and therefore implements an inode map for finding inodes; the inode map is compact enough that the active portions can be kept in cache to avoid extra disk I/O. To deal with fragmentation, Sprite LFS uses a combination of threading on a segment-by-segment basis and copying of live data. The segment cleaning mechanism has three steps: read a certain number of segments, identify the live data, and write the live data back to a smaller number of segments. While evaluating cleaning policies, the paper found that a cost-benefit policy with a higher utilization threshold for cold segments than for hot ones was best. This necessitated a segment usage table that records the live bytes in each segment as well as the most recent modified time of any block in the segment. Another benefit of log-structured file systems is quicker crash recovery; Sprite LFS relies on both checkpoints and roll-forward to maximize data recovery.
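The segment usage table described above needs only two fields per segment to drive the cost-benefit policy; a guessed-at layout (not the actual on-disk format):

    #define N_SEGMENTS 1024   /* assumption: depends on disk and segment size */

    typedef struct {
        unsigned live_bytes;     /* drops as blocks die; 0 means reusable */
        long     last_mod_time;  /* newest block's write time, the "age" */
    } SegUsageEntry;

    /* The table itself is written to the log and located via the
       checkpoint regions, much like the inode map. */
    SegUsageEntry seg_usage[N_SEGMENTS];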

4. Evaluation
Sprite LFS had a rather robust amount of testing done on it, including simulations as well as real-world usage statistics. The paper used a file system simulator to examine different cleaning policies and reached the surprising conclusion that locality and grouping performed worse than no locality. This finding caused a change in the system, leading to cold segments being cleaned at a much higher utilization than hot segments. Using several small benchmark programs, the paper found that for creating and deleting small files Sprite LFS is almost ten times as fast as SunOS. The more remarkable aspect of this paper, however, is the real-world feedback recorded over several months. This allowed the authors to notice that typical files occupy multiple blocks that are written and deleted together, leading to much better cleaning write costs than in simulation.

5. Confusion
It seems odd that this paper finds that the time at which cleaning executes does not warrant much thought. The segment cleaner would take up a chunk of bandwidth, and it might be better to examine running it at low priority or at night, as the paper suggests.

1. Summary
Log-structured file systems (LFS) seek to improve FS performance for random writes to small files by collecting writes together into sequential logs before writing them out to disk.

2. Problem
As processors have grown faster and memory sizes have increased, disk latency has not improved. This means that systems will soon become bound by disk write performance. Previous FS designs, such as FFS, require many writes to multiple locations on disk to maintain their persistent data structures. This will give bad performance in a disk-latency bound system. Additionally, FFS and its ilk are optimized for workloads with large files that are written and read sequentially. Modern workloads often feature writes to many small files, which can appear and perform like random I/O to systems like FFS.

3. Contributions
This paper contributes the high-level idea of a log-structured file system for modern workloads in a system that is bound by disk performance.
The authors also provide the performance and some implementation details of Sprite LFS, a real FS implemented in a production OS. They identify some of the challenges in designing a real LFS, including cleaning policy choices and checkpointing strategies.
One of the biggest contributions is the set of policy choices they outline for their segment cleaner. They discovered that file hotness is an important criterion when choosing which segments to clean, and they integrate it into their “cost-benefit” policy.

4. Evaluation
They have implemented Sprite LFS in a production OS, the Sprite network operating system.
First, they examine the overhead in write bandwidth of their LFS using a FS simulator running a pair of synthetic workloads. These experiments lead to the discovery of their “cost-benefit” cleaning policy.
They then compare their LFS implemented in the Sprite OS with FS performance in SunOS 4.0.3, which is based on the Unix FFS. They run microbenchmarks to measure their performance on small- and large-file traffic. LFS does predictably well. It’s important to note these measurements are optimistic, as they don’t penalize LFS at all for its cleaning overheads.
In order to measure the effect of cleaning overhead on the LFS over time, they provide the disk bandwidth utilization of their system over time. They estimate cleaning adds about 30% write-traffic overhead.
Finally, they provide some data on the recovery times provided by their crash mechanism, but don’t offer any point of comparison.

5. Confusion
I had difficulty following what happened with the “hot-and-cold” simulation and what I was supposed to take from Figure 4 showing write cost versus disk utilization.

1. Summary
This paper introduces the log-structured file system (LFS), whose most important feature is to buffer file system write operations in memory and then write them to disk sequentially.

2. Problem
The first problem is that CPU speed increased much faster than disk speed. This trend implies the disk will become the bottleneck for more and more applications, so the file system should be designed to improve the performance of disk-bound applications. In addition, the authors state that workloads of small files are prevalent in office and engineering environments, but existing file systems (e.g., the fast file system) were not optimized for small-file operations. The traditionally scattered file system layout (separate updates of a file's data and metadata) incurs several disk seeks for one file system operation (such as creating a file). Due to the physical limitations of disks, we would like to keep disk seeks as few as possible, which means we prefer sequential writes to random writes. LFS leverages sequential writes to optimize small-file operations.

3. Contributions
The most important contribution is the design of the disk layout (segments) and the corresponding garbage collection mechanism. LFS still follows the design of inodes (with pointers to data blocks) and directories (arrays of filename + inode number), but inodes and data blocks are no longer updated in place. Changed disk blocks (both data and metadata) are first buffered in memory; then a series of changed blocks (a so-called segment) is written to disk sequentially. Because the disk address of an inode can no longer be calculated from the inode number itself, a map structure (the so-called inode map: map[inode number] = disk address of inode) is used to find inodes. Because the inode map is itself updated through the log, a fixed structure (the checkpoint region) records the disk addresses of the inode map blocks; the checkpoint region also stores other information for crash recovery. Because changed blocks are written to new locations, old blocks need to be freed (garbage collection). The mechanism used in LFS is to read M segments from disk, figure out which blocks among them are still live, and write the live blocks back as N segments (N smaller than M). Liveness of a block (whether it is the newest version) can be determined by recording each block's identity (its inode number and offset within the file) in the segment (the so-called segment summary). The segment usage table is used to implement the cost-benefit garbage collection policy in the paper. For crash recovery, checkpointing is used, optimized with roll-forward.
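The liveness check described here is easy to sketch given the segment summary entries (the struct and helper are hypothetical stand-ins):

    typedef struct {
        int inode_number;   /* owning file */
        int file_offset;    /* block's offset within that file */
    } SummaryEntry;

    /* Hypothetical: follows inode map -> inode -> block pointer. */
    extern long inode_block_addr(int inum, int offset);

    /* A block is live iff its file's current inode still points at this
       exact disk address; otherwise it has been overwritten or deleted. */
    int block_is_live(const SummaryEntry *e, long block_addr) {
        return inode_block_addr(e->inode_number, e->file_offset) == block_addr;
    }

(Sprite LFS also keeps version numbers in the inode map, so blocks of deleted files can be discarded without chasing the inode at all.)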

4. Evaluation
The authors implemented the log-structured file system (Sprite LFS) as part of the Sprite network operating system. They compared Sprite LFS with Unix FFS on SunOS. In micro-benchmarks without cleaning (garbage collection), Sprite LFS is much better than FFS for small-file operations and as good as FFS for large-file operations in most cases. The authors also show the effectiveness of their cost-benefit cleaning policy (segment utilization distributions), crash recovery (recovery time), and overheads (disk and bandwidth usage).

5. Confusion
Is LFS still useful for SSDs, which do not suffer from random-write overhead the way disks do?

Summary

The authors come up with a new FS called the log-structured file system, which improves small-write performance. The problem with small writes in Unix is that the disk spends a lot of time seeking to the correct locations before writing, since the inode and data are spread out over the disk. LFS resolves this issue by writing in contiguous chunks with the aid of write buffering. The biggest concern with such an FS is cleaning up the log once it is exhausted; the authors provide policies for a cleaner which runs through the log and consolidates live data.

Problem

  1. CPU speeds have increased while disk access speeds have not improved at the same rate, so the disk will become the bottleneck for applications.

  2. We need a way to write without spending all that time seeking, since disk transfer bandwidth can still be improved somewhat.

  3. In Unix, inodes are separate from the data, so even a large write requires multiple disk seeks to different locations to find the right blocks.

  4. Unix also ends up distributing small files across the disk; it only places data in the same cylinder for large writes.

  5. Synchronous writes prevent us from using the true power of the CPU.

Contribution

  1. The FS is based on a major assumption: increasing memory caches are effective at satisfying most read requests; it is writes we should worry about (especially small writes).

  2. A minor assumption: crashes are rare, so write buffering is a wonderful thing to do.

  3. All writes are written sequentially to a log so no time is wasted seeking.

  4. Faster crash recovery: no need to scan the whole FS tree, just the most recent portion of the log.

  5. Free space management in the log: a mixture of copying and threading.
    • Read segments

    • Identify live data using the segment summary block

    • Consolidate live data into a smaller number of segments

    • Mark old segments as clean so they can be overwritten

    • There are policies regarding the frequency, the number of segments, etc. for the cleaner

  6. Inodes are placed along with the data in a sequential manner, unlike Unix, where inodes are at fixed locations; this prevents multiple seeks. Only the checkpoint region (which points to the inode map) is at a fixed location, so everything else can be written in one go.
  7. Checkpointing and roll-forward to recover from crashes.
