
The Design and Implementation of a Log-Structured File System.

Mendel Rosenblum and John K. Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Trans. on Computer Systems 10(1), February 1992, pp. 26-52.

Reviews due Thursday, 3/17

Comments

1. Summary
The paper presents the design of the log-structured file system (LFS) for efficient disk storage management. The key idea is to write all updates in a contiguous, log-like structure to avoid the penalty of disk seeks. This targets the use case of small-file writes, for which existing file systems perform poorly. LFS performs an order of magnitude better for small-file writes and has comparable or better performance in almost all other cases.
2. Problem
The main problem with existing file systems was that they were able to utilize only 5-10% of disk bandwidth for actual disk writes; the remaining time was wasted in operations like disk seeks. This was primarily due to the distributed nature of a file's data and metadata across non-contiguous locations on disk. Several technological trends aggravated this situation. CPU speeds were increasing dramatically while disk speeds, though improving, were nowhere near catching up. Also, larger main memories could be used for a larger FS cache, which meant that most file reads could be satisfied out of main memory, leaving disk traffic dominated by file writes. These factors cumulatively led to a situation where improving the utilization of available disk bandwidth for useful work was crucial to sustain application performance. LFS does this by optimizing for small-file writes to disk, the dominant kind of disk traffic.
3. Contributions
The biggest contribution of this work was its novel and efficient use of log-based structures to design a file system. Firstly, unlike many earlier log-based systems, LFS does not treat the log as a temporary structure separate from the actual file storage; the log is the permanent storage, used whenever a write to disk is required. To exploit contiguity, writes to different files go into the same segment, and the inodes are relocated and updated with new attributes. To enable retrieving data from the log during reads, inode maps are used to track the inodes. This involves fewer disk seeks while reading/writing/creating a file compared to other systems, in most cases. Secondly, and more importantly, LFS provides sophisticated mechanisms and policies to manage free space. The disk is divided into segments: segments are managed using “threading” without worrying about contiguity; within a segment, data is always written to contiguous locations. The authors studied various cleaning policies - a greedy policy, which yielded unexpected results, and a cost-benefit policy, which manages cold segments better and cleans segments more efficiently. LFS also uses two checkpoint regions for crash recovery, subsequently optimized with a roll-forward technique.
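To make the cost-benefit idea concrete, here is a minimal Python sketch of how a cleaner might rank segments using the paper's benefit-to-cost formula, ((1 - u) * age) / (1 + u); the segment tuples and the example numbers are illustrative assumptions, not Sprite LFS code.

# Cost-benefit segment selection, as described in the paper:
#   benefit/cost = (free space generated * age of data) / cost
#                = ((1 - u) * age) / (1 + u)
# where u is the fraction of live bytes in the segment and age is the
# most recent modified time of any block in the segment.

def cost_benefit(utilization, age):
    """Score a segment for cleaning; higher is better."""
    # Reading the segment costs 1 segment of I/O; writing back the live
    # data costs u more, hence the (1 + u) in the denominator.
    return (1.0 - utilization) * age / (1.0 + utilization)

def pick_segments_to_clean(segments, count):
    """segments: list of (segment_id, utilization, age) tuples."""
    ranked = sorted(segments, key=lambda s: cost_benefit(s[1], s[2]),
                    reverse=True)
    return [seg_id for seg_id, _, _ in ranked[:count]]

# Example: a nearly empty but hot segment loses to a fuller, much colder one,
# which is exactly the behaviour that separates this policy from greedy.
print(pick_segments_to_clean([("hot", 0.10, 1), ("cold", 0.60, 100)], 1))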
4. Evaluation
The paper evaluates the system well. Using hot and cold regions in simulations, the authors showed how a naive greedy cleaner can lead to unexpected results. The benefits of the ingenious cost-benefit policy are also shown - the policy is effective in bringing down the write cost significantly. They compare LFS against the existing FFS and against a projected, improved FFS to give an idea of how much and when LFS would yield benefits. Using Sprite LFS, the authors show that LFS performs about 10x better than FFS for small-file workloads (creating and deleting 1000 1KB files) and point out room for further improvement. They also show that performance in the other cases - sequential reads, sequential writes and random writes - improves or does not suffer in LFS. Finally, they show the impact of cleaning overheads using data from deployments over a period of 4 months.
The insights provided in the evaluation are invaluable and thorough. However, the authors did not discuss or quantify the extra memory pressure imposed by cleaning. Further, the basic assumption that disk traffic is dominated by writes could have been substantiated with some statistics.
5. Confusion
How is the inode map managed when an inode is relocated? Is the inode map logged in pieces or in its entirety? How is the inode map handled during checkpointing?
Why do the frequency of cleaning and the number of segments cleaned at a time not matter? The paper sets them aside without much reasoning.

1. Summary
This paper introduces the log-structured file system (LFS) and discusses a specific implementation of this idea - Sprite LFS. LFS writes metadata and data updates to a file in a log. These writes are sequential in nature, allowing the system to achieve significantly higher write bandwidth than existing systems. In order to keep writing in the form of segments, LFS introduces a notion of segment cleaning and compaction. The paper also covers crash recovery and its associated mechanisms. Compared to Unix FFS, LFS achieves a significant improvement for workloads that involve writes to small files and slightly better performance for large sequential reads and writes.

2. Problem
File system performance was not able to match the performance improvement of CPUs. As a result, the FS became a bottleneck. Although measures were taken to improve FS bandwidth, access time remained the same. The increasing capacity of main-memory caches provided a unique opportunity: most reads could be handled by the cache, so the FS would see more write accesses than reads. This, coupled with the fact that the state-of-the-art system (Berkeley FFS) achieved a write bandwidth of only about 5% of peak, motivated the authors to optimize for the common case. The authors noted that Berkeley FFS failed to achieve peak write bandwidth because writing to many small files leads to significant seek overhead in order to look up directory and file metadata. Given that large caches can serve as write buffers, the authors believed it made more sense to develop a system where such writes could be carried out sequentially.

3. Contribution
The contributions of this paper are twofold. Firstly, it notes that large caches should make a system designer question the basic assumptions that motivated the previous generation of file systems. Secondly, it implements such a system: the log-structured file system. The basic idea of LFS is to perform writes sequentially in order to avoid seek overheads. Writes are sequential within a segment, with potential seek overhead only when a write spills beyond one segment. While writing data to the log, LFS also writes the file inode, directory inode and directory data to the log (depending on whether the operation is a file update, file creation, file deletion, etc.). This spreads the inodes across multiple segments. In order to track these inodes, LFS introduces inode maps. This second level of indirection helps LFS track inodes across segments. Each segment also has a segment summary block which identifies the nature of the data within each block of the segment. LFS needs free segments to write data sequentially, so it needs a way to garbage collect unused segments or create them by compacting partially used segments. The paper discusses the tradeoffs associated with compaction and garbage collection policies. The best scheme uses an age-based policy which combines age information with segment utilization (and different thresholds for hot and cold segments) to determine candidate segments for garbage collection and compaction. To facilitate this process, it maintains a segment usage table. Finally, the paper also discusses crash recovery using checkpointing and roll-forward. For crash recovery, LFS saves checkpoints to a fixed region on the disk. Each checkpoint ends with a timestamp indicating the last complete checkpoint of the system. The checkpoint also stores the addresses of the inode map blocks and a pointer to the last segment. This is used to additionally roll forward through the log and thus reclaim work done before the crash.
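As a rough illustration of the extra level of indirection described above, the following Python sketch walks a simplified read path from the inode map to an inode to a data block; the dictionary-based "disk" and all field names are assumptions made for illustration, not the actual Sprite LFS on-disk layout.

# Simplified read path in an LFS-style design:
#   checkpoint region -> inode map -> inode -> data block.
# Everything is modeled as in-memory dictionaries for illustration only.

checkpoint_region = {"imap_block_addrs": [1000, 1001]}   # consulted only at mount/recovery
inode_map = {7: 2048}            # inode number -> disk address of the latest inode
disk = {
    2048: {"type": "inode", "block_ptrs": [4096, 4097]},  # inode for file 7
    4096: b"first block of file 7 ",
    4097: b"second block of file 7",
}

def read_file_block(inode_num, block_index):
    # 1. The inode map (largely cached in memory) gives the inode's location.
    inode_addr = inode_map[inode_num]
    # 2. Read the inode; it points at the file's data blocks, just as in FFS.
    inode = disk[inode_addr]
    # 3. Read the data block itself.
    return disk[inode["block_ptrs"][block_index]]

print(read_file_block(7, 1))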

4. Evaluation
1. The authors provide an insightful analysis related to cleaning policies. They clearly show the intuition of how free space can be more valuable in a colder segment.
2. In order to test their prototype they run multiple benchmarks which create, read and delete many small files, or carry out sequential reads and writes of large files. LFS outperforms Unix FFS by nearly 10x in creating and deleting small files. For large-file sequential reads LFS performs on par, and for sequential writes it outperforms FFS due to batching of disk I/O.
3. To assess the cleaning overheads, they measure the performance of the system over several months. They find that they are able to achieve significantly better write bandwidth with a modest write cost.
4. They discuss the crash recovery time and its relation to the number and size of files written since the last checkpoint.
5. They also show that 17% of a log consists of metadata structures.

5. Questions
Could we go through Unix FFS and its advantages/disadvantages in class?

1. Summary
This paper introduces a log-structured file system, which aims to reduce the disk access time on file read and write compared to the traditional Berkeley Fast File System (FFS), while preserving a good level of crash recoverability.
2. Problem
Processor speeds have been growing exponentially, while disk I/O speed lags behind. Moreover, although there is the possibility of further speeding up the data transfer rate using techniques like parallel reads, the seek time of disk I/O is limited by mechanical movement and is unlikely to improve much, which will lead to more and more applications becoming I/O bound. LFS tries to minimize the overhead of disk access time and improve utilization of the disk bandwidth by enabling large, contiguous reads and writes.
3. Contributions
In the design of LFS, new data is always appended to the end of the log contiguously. This provides speedup by keeping writes large and sequential. Unlike in FFS, inodes are placed next to their data blocks in the log; when the inode maps are cached in main memory, extra random reads to locate a file's inode can also be avoided.
A major design component they provide to make such a file system work is the segment cleaner, which runs when the log reaches the end of the disk. The cleaner uses the information stored in a special block, the segment summary block, within each segment to determine which data in it is no longer valid, then moves and compacts live data into a smaller number of segments, leaving clean segments to write into.
In determining the policy for which data to move, they found it beneficial to treat “cold” files specially, as they don't tend to grow or shrink very often. They therefore sort live data by age when rewriting it, so that files of similar age are grouped together.
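To show how the segment summary block can be used during cleaning, here is a small Python sketch of a liveness check: a block is live only if the owning file's current inode still points at that block's address. The data structures are simplified assumptions, not the Sprite LFS format.

# Liveness check during cleaning: a block is live iff the owning file's
# current inode still points at the block's address in this segment.

def live_blocks(segment_base, summary, inode_map, disk):
    """
    summary: list of (inode_num, block_offset) entries, one per block in the
             segment, in block order (as in a segment summary block).
    Returns the addresses of blocks that are still live.
    """
    live = []
    for i, (inode_num, offset) in enumerate(summary):
        block_addr = segment_base + i
        inode_addr = inode_map.get(inode_num)
        if inode_addr is None:
            continue                      # file was deleted: block is dead
        inode = disk[inode_addr]
        if inode["block_ptrs"][offset] == block_addr:
            live.append(block_addr)       # inode still points here: live
    return live

# Example: file 7's inode points at block 4097 but no longer at 4096.
disk = {900: {"block_ptrs": [5000, 4097]}}
inode_map = {7: 900}
print(live_blocks(4096, [(7, 0), (7, 1)], inode_map, disk))   # -> [4097]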
4. Evaluation
To evaluate the performance of LFS under scenarios resembling real workloads, they actually installed the file system for several months of real-world use to measure the overhead of the segment cleaning process. They found a much better result than what they had estimated, which they attribute to the fact that although small files are prevalent, there are also certain larger files which are themselves less fragmented.
5. Confusion
Of the four policy questions the paper lists for cleaning, why does the second one (how many segments to clean at a time) not matter, as they claim?

Summary
This paper introduces the log-structured file system - all updates to both data and metadata are written out in a log. It discusses a design to aid garbage collection and compaction, and to provide fast lookups/reads and fast crash recovery.
Problem
With rapid evolution in processors and main memory, disk latencies dominate the performance of I/O-intensive programs. Due to mechanical limitations, disk seeks are expensive and offset the gain from high read/write bandwidth. Contemporary file systems stored much of their metadata (inode blocks) away from the actual data blocks, requiring multiple seeks to update/create files. A second issue was that metadata updates were required to be synchronous. The systems of the 1990s and today use memory to filter most reads and to buffer writes to the disk. The authors argue that large sequential writes thus become commonplace, motivating the log idea.
Contribution
The basic idea of LFS is appending all file system updates, i.e. both data and metadata, to a log-like structure. Because LFS appends inode blocks to the log as well, it uses a level of indirection via inode_maps. Since most of the inode_map working set can be cached in memory, LFS can perform faster lookups than naively reading the entire log.
1- An issue with using a log structure is dealing with fragmentation and freeing large extents of free space. The LFS solution is to divide the disk into regions called segments. The large segment size amortizes seek costs over long sequential writes, so segments can be threaded easily. This avoids moving long-lived data around unnecessarily.
2- A segment cleaning mechanism searches for under-utilized segments and compacts their live data into fewer segments, freeing up segments for new log writes. To help identify live data, each segment contains one or more segment summary blocks that provide a back pointer to the file and block number for each block in the log.
3- The cleaning mechanism requires a policy to select candidate segments to compact. The authors systematically simulate the cleaning overheads (write costs) for the common case where hot portions of the file system account for most accesses (90%). They develop a scheme that ranks segments based on both their utilization and their chances of being further fragmented (estimated by age). Simulations validate that this scheme achieves better write efficiency by incurring a lower write cost.
4- For crash recovery, LFS saves checkpoints to fixed regions on the disk. Each checkpoint ends with a timestamp that allows the LFS recovery code to identify whether a checkpoint has completed. The checkpoint also stores the addresses of the inode_maps and a pointer to the last segment. This is used to additionally roll forward through the log and thus reclaim work done before the crash.
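A minimal sketch of the recovery step in point 4, choosing between the two checkpoint regions by their trailing timestamps; the field names are assumptions for illustration only.

# Crash recovery, step 1: pick the newer of the two checkpoint regions.
# A checkpoint interrupted by a crash (missing/invalid trailing timestamp)
# is simply ignored, so a valid older checkpoint is always available.

def choose_checkpoint(region_a, region_b):
    """Each region is a dict, or None if its trailing timestamp is invalid."""
    candidates = [r for r in (region_a, region_b) if r is not None]
    return max(candidates, key=lambda r: r["timestamp"])

def recover(region_a, region_b):
    cp = choose_checkpoint(region_a, region_b)
    # The checkpoint names the inode map blocks and the tail of the log;
    # roll-forward would then scan segments written after cp["last_segment"].
    return cp["imap_block_addrs"], cp["last_segment"]

a = {"timestamp": 100, "imap_block_addrs": [1000], "last_segment": 42}
b = None   # e.g. the crash happened while this region was being rewritten
print(recover(a, b))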
Evaluation
I found their analytical evaluation of cleaning policies interesting and insightful. They clearly show the intuition of why free space can be more valuable in a colder segment. The authors build a production-ready prototype, which is impressive. They use micro-benchmarks for small and big file accesses. LFS outperforms Unix FFS by nearly 10x in creating and deleting small files. For large-file sequential reads LFS performs on par, and for sequential writes it outperforms FFS due to batching of disk I/O. To assess the cleaning overheads the authors measure a production system over several months, with the write costs turning out quite low. The crash recovery time increases with the number of files written since the last checkpoint. They also show that 17% of the log consists of metadata structures such as inodes and other tables (overhead).
Confusion
How do disk reads work in LFS? If the inode_map is cached in memory, won't that be a limitation if the disk is large, say tens of TB?

1. Summary
This paper discusses the first implementation of a log-structured file system. The distinguishing attribute of LFS is that all modifications/new data are written to disk (almost) sequentially, by utilising a file cache in main memory; i.e., they are grouped into a single large I/O, which is very useful in environments with many small file writes. For this sequential writing, large free contiguous areas of disk space need to be created/maintained, which requires a policy and a mechanism. A prototype implementation called Sprite LFS was built and evaluated.

2. Problem
At the time of writing of this paper, CPU speeds were increasing considerably, while disk access times were not. Thus, the authors hypothesized that more and more applications would become disk bound. Further, they assumed that with exponentially increasing main memory size, files would be cached for reads, and hence most disk activity would be writes. This memory increase would also enable write buffering.
Office and engineering applications were dominated by accesses to small files, and the creation and deletion times for these were mostly FS metadata updates.
Also, most FS at that time were writing synchronously.

3. Contributions
Primary contribution : Faster I/O, better disk bandwidth usage.
Secondary : Faster crash recovery.
Design Choices : How to retrieve information from the log, how to maintain large extents of free space.
Information Retrieval:
Index structures for random access retrievals.
Inode map :- usually small enough to keep in memory.
Free Contiguous Space Management:
Problem :- fragmentation of free space.
Idea - combination of threading (using fixed size extents called 'segments'), and copying (live data copied out from less utilized segments and compacted.) Read a number of segments into memory, identify live blocks, compact and write live blocks to a smaller number of segments.
Policy :- Selection of segments for cleaning, grouping of segments to be written.
'Write cost' used to evaluate policies.
'Bimodal segment distribution' - try to achieve this for best performance.
New Policy based on (benefit : cost) ratio, which requires usage knowledge, tracked by a 'segment usage table'.
Mechanism :-
'Segment Summary Blocks' - live block identification, position of that block within file.
Crash Recovery:
Last disk ops - at the end of the log. Checkpoints and roll-forward. Two well-known checkpoint regions. Roll-forward is enabled by the information in the Segment Summary Blocks. Directory Operation Log - consistency between directories and inodes.
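The 'write cost' used above to evaluate policies has a simple closed form in the paper: cleaning a segment with live fraction u costs 1 + u segments of I/O and yields room for 1 - u of new data, so write cost = 2 / (1 - u). A short sketch of that formula (my framing, not the paper's code):

# Write cost as defined in the paper: total bytes read and written,
# divided by bytes of new data written. For a segment with live fraction u,
# cleaning reads 1 segment and rewrites u of it, leaving room for (1 - u)
# of new data:
#     write_cost(u) = (1 + u + (1 - u)) / (1 - u) = 2 / (1 - u)

def write_cost(u):
    if u == 0.0:
        return 1.0          # empty segments need not be read at all
    return 2.0 / (1.0 - u)

for u in (0.0, 0.2, 0.4, 0.6, 0.8):
    print(f"u = {u:.1f}  ->  write cost = {write_cost(u):.1f}")

# Keeping cleaned segments near-empty (the bimodal distribution mentioned
# above) is what keeps the write cost low.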

4. Evaluation
Both a Simulation, and the Prototype were evaluated.
Every day / Real Life use :- Then-current machines were not fast enough to be disk-bound.
Synthetic Micro-Benchmarks :- large numbers of small files.
Temporal locality patterns observed.
Profiling :- 13% of writes were metadata.
Prototype performance was better than simulations (lower cleaning costs).
Sprite LFS achieves a much higher fraction of disk bandwidth (~70%) than Unix FFS.
Matches or betters Unix performance for large files. For small file writes, up to an order of magnitude faster.
Moreover, a pretty large disk utilization is required before FFS outperforms LFS.
My Opinion:
What about SSDs, other forms of backing storage?
Since the segment cleaner uses main memory, it could cause contention between the cleaner and applications in cases where main memory is small, or the disk is heavily used (cleaner runs nearly constantly).

5. Confusion

1 Summary
The paper introduces log-structured file systems to counter the low disk utilization achieved by the file systems of the time. The new design amortizes the seek costs of random writes by doing all writes to an append-only log and invalidating previous entries. A file system called Sprite LFS was implemented based on this principle and benchmarked.
2. Problem
As CPU speed and memory size increased at a much higher pace than disk write speeds, most workloads became bottlenecked by the disk, and synchronous writes exacerbated the problem. Additionally, the file system designs of the time optimized for sequential reads without taking into account the high seek/rotational delay caused by the resulting random writes. The benefit of sequential reads could still be had by using memory as a cache for the file system; however, the seek-time losses were not easily recoverable.
3. Contribution
The primary contribution of this paper is taking the write-ahead log concepts of databases, combining them with the log-based file systems of write-once media, and producing a usable file system for read-write media that uses the log as the final repository of information rather than only a staging area. The paper tackles the newly introduced issue of segment reclamation by identifying the emptiest segments to compact with the minimum number of writes. The log-structured design also introduces consistency issues, which lead to a checkpointing solution along with a roll-forward mechanism to recover any un-checkpointed I/O. These two policies have been implemented but not tuned for efficiency.
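A rough sketch of the read-compact-write step of segment reclamation described above; the block representation and the is_live test are placeholders (in Sprite LFS the segment summary block and the inode map play that role).

# Segment cleaning, mechanically: read several dirty segments into memory,
# keep only the live blocks, and write them back into fewer clean segments.

SEGMENT_BLOCKS = 4   # illustrative; real segments hold hundreds of blocks

def clean(dirty_segments, is_live):
    """dirty_segments: list of lists of blocks; is_live: block -> bool."""
    survivors = [b for seg in dirty_segments for b in seg if is_live(b)]
    # Repack survivors into as few full segments as possible.
    compacted = [survivors[i:i + SEGMENT_BLOCKS]
                 for i in range(0, len(survivors), SEGMENT_BLOCKS)]
    freed = len(dirty_segments) - len(compacted)
    return compacted, freed

# Example: three half-dead segments compact into two, freeing one.
segs = [["a", "dead", "b", "dead"],
        ["c", "d", "dead", "dead"],
        ["e", "f", "g", "dead"]]
new_segs, freed = clean(segs, is_live=lambda b: b != "dead")
print(new_segs, "freed:", freed)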
4. Evaluation
Sprite LFS is used in a production environment, which makes evaluation easier as data can be accumulated from the production instance. The paper catalogues the time taken for creating and accessing small files as well as sequential/random read/write performance. The authors also show the cleaning overheads of an old instance to establish that most recovered segments are almost completely empty, leading to very little write overhead. The production nature of the evaluation inspires confidence and very few questions remain unanswered. Still, the paper could have considered a worst-case workload that randomly updates files but accesses them sequentially, such as a database, and measured the impact of the random seeks needed to read logically sequential data.
5. Confusion
The paper talks of single disk operations as if they were atomic; I am confused as to the behaviour when a failure occurs in the middle of an I/O operation. This seems critical for cases when the time field is being written for a checkpoint and the system crashes in the middle of that operation. The resulting time may still be higher than the previous checkpoint's while not being completely valid.

Summary
The paper presents a log-structured file system for efficient disk storage management. Further, the implementation of one such prototype, Sprite LFS, is discussed, with workload evaluations showing a performance increase compared to the traditional Unix file system.

Problem
With increasing CPU speeds and stagnant disk speeds, it seemed that applications were going to become more disk-bound. Plus, due to the physical constraints of disk drives, there did not seem to be much scope for increasing disk speed. Hence, an attempt is made here to bridge this gap by introducing a new file system: LFS. The authors mainly targeted improving write speeds, as they felt that with increasing memory sizes reads would increasingly be served from larger file caches, leaving writes as the major concern.

Contributions
a. Introduced a new file system - LFS: a file system that maintains both data and metadata in the form of a sequential log, with minimal additional data structures.
b. Write performance is improved by buffering writes/changes in the file cache and then writing out the changes to disk in a single write operation (see the sketch after this list). Such an approach reduces the number of disk seeks and hence boosts performance.
c. Since data and metadata are appended to the end of the log, an inode map is maintained to track the most recent inode (the metadata for a file) of each file.
d. Fragmentation issues are taken care of by the segment cleaner, which frees disk space (i.e. segments) and maintains large contiguous regions that can be used for fast writes. The cleaner uses the segment usage table and segment summary blocks to decide which segments to clean.
e. Using such a log inherently keeps multiple versions of data, and hence crash recovery is easier. Inconsistencies are handled by only looking at the recent portion of the log, avoiding the traditional full-disk scans that consume more time. Checkpoints mark positions in the log at which the file system is consistent.
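As referenced in point b, here is a tiny sketch of write buffering: dirty blocks accumulate in memory and are flushed in one sequential burst once a segment's worth has piled up. The class name, sizes and flush callback are illustrative assumptions, not Sprite LFS code.

# Write buffering: collect dirty blocks in memory and flush them as one
# large sequential write once a full segment has accumulated.

class SegmentBuffer:
    def __init__(self, segment_size, flush):
        self.segment_size = segment_size   # bytes per segment
        self.flush = flush                 # callback: write bytes sequentially
        self.pending = []
        self.pending_bytes = 0

    def write(self, block):
        self.pending.append(block)
        self.pending_bytes += len(block)
        if self.pending_bytes >= self.segment_size:
            # One disk seek, then a long sequential transfer.
            self.flush(b"".join(self.pending))
            self.pending, self.pending_bytes = [], 0

buf = SegmentBuffer(segment_size=16,
                    flush=lambda data: print("flush", len(data), "bytes"))
for _ in range(5):
    buf.write(b"4kb!")     # stands in for a 4 KB block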

Evaluation
The paper evaluates LFS against SunOS (whose file system is based on FFS) and shows a good performance boost. Multiple small benchmarks were run to illustrate read/write performance for small files. The evaluations showed LFS dominating FFS, with lower disk utilization and higher CPU utilization, which further boosts throughput (in terms of the number of files processed/sec). Another benchmark covered reads/writes of larger files, which again showed a boost in performance for writes and near-equal performance for reads. However, the performance of LFS decreased for sequential rereads after random writes, where FFS had an advantage due to its logical locality of data. Further, crash recovery for LFS is also evaluated, showing its effectiveness, but the paper does not show how it compares to FFS. The log bandwidth usage shows the share consumed by each of the block types present in LFS. Overall, the evaluation seemed good enough to prove the authors' point of improving write speeds for file systems. Though, it would have been good to see a CPU-overhead evaluation for the segment cleaner, which forms an integral part of LFS.

Confusion
The paper doesn't seem to clearly mention when the segment cleaner runs.

1. Summary

This paper presents a new disk storage management technique - the log-structured file system - using ideas borrowed from the database community. The file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The paper describes the design, implementation and performance of a prototype log-structured file system called Sprite LFS.

2. Problem

Disk performance is the bottleneck in many operations due to its slow rate of improvement. The problem with existing file systems is that they spread information around the disk in a way that causes too many small accesses. They also tend to write synchronously. This leads to poor performance on common workloads. The authors propose a file system that achieves better write performance for most workloads by addressing the above problems.

3. Contribution

The log-structured file system improves write performance by buffering file system changes in the file cache and then writing the changes to disk sequentially in a single disk write operation. The large write buffer is called a segment. When a segment is full, it is written sequentially to an unused part of the disk, making it fast. Freeing space later requires live data to be copied out of old segments (segment cleaning). They also discuss different cleaning policies for hot and cold segments. Writes are asynchronous, as they are buffered in segments before being flushed to disk. Since data and metadata are scattered across the log, the file system uses an inode map to maintain the current location of each inode in the log. A fixed checkpoint region on each disk identifies the locations of all the imap blocks. Garbage collection of old versions is done by the cleaner. Crash recovery is easy thanks to the log and checkpoints; to improve it further, they use roll-forward. The authors also describe the implementation of Sprite LFS. I feel that the authors do a great job in determining the current bottlenecks in the system and propose a system that addresses these issues. They perform a careful analysis of their solution and come up with various policies and techniques to make it faster. The ideas are used currently to get the best performance from SSDs, whose flash translation layers resemble a log-structured file system in several ways.
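To make the inode-map bookkeeping concrete, here is a simplified sketch of what happens when a file is rewritten: new data blocks and a fresh inode are appended to the log, and the inode map entry is updated to the inode's new address. All structures here are assumptions for illustration, not the Sprite LFS format.

# When a file is updated, its new blocks and a fresh inode are appended to
# the log; the inode map is then pointed at the inode's new location.

log = []                 # the on-disk log, modeled as a list of records
inode_map = {}           # inode number -> index of the latest inode in the log

def append(record):
    log.append(record)
    return len(log) - 1  # "disk address" of the record

def write_file(inode_num, blocks):
    block_addrs = [append(("data", inode_num, b)) for b in blocks]
    inode_addr = append(("inode", inode_num, block_addrs))
    inode_map[inode_num] = inode_addr        # the old inode becomes dead data
    return inode_addr

write_file(7, [b"v1"])
write_file(7, [b"v2"])          # relocates inode 7; the v1 records are now dead
print(inode_map[7], log[inode_map[7]])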

4. Evaluation

The authors compare the performance of LFS and Unix FFS installed on similar systems. They use varied workloads and explain the reasons for the observed behaviour. Implementing LFS was no harder than Unix FFS. LFS was mainly designed for many small file accesses, and its performance proves to be better than Unix FFS for creating, reading and deleting small files. The system also performs competitively for large files. LFS achieves temporal locality: it performs better on random rereads, while Unix FFS performs better on sequential rereads after random writes. The authors also analyse cleaning, crash recovery and other miscellaneous overheads.

5. Confusion

Drawbacks?

Summary :
This paper explains the log-structured file system, which presents a new technique for disk storage management. The basic idea is that the file system writes all modifications to disk sequentially in a log-like manner. In order to maintain large free areas on disk for fast writing, the log is divided into segments, and a segment cleaner is used to compress the live information from heavily fragmented segments.

Problem:
In recent years disk access times have not been able to keep up with increasing CPU speeds, so application performance risks becoming disk-bound. Increases in main memory size have made file caches more effective for read requests; as a result, disk traffic is dominated by writes. The log approach improves write performance by eliminating almost all seeks, and the sequential nature of the log permits much faster crash recovery as well. It has been observed that most applications, such as office and engineering workloads, tend to be dominated by accesses to small files. Small files result in random disk I/O, and the creation and deletion times for such files are dominated by updates to file system "metadata". The log-structured file system hence aims to improve the write efficiency of small-file accesses. Overall, Sprite LFS permits about 65-75% of a disk's raw bandwidth to be used for writing new data, compared to Unix systems which only utilize 5-10%.

Contribution:
i) The log-structured file system proposes large asynchronous sequential transfers to the disk that can utilize nearly 100% of the raw disk bandwidth, versus synchronous writes, which couple application performance to that of the disk. This is important because otherwise applications would not be able to exploit faster CPUs.
ii) The biggest contribution of this paper is the free space management policy, which creates large free areas on disk for log writes. This cleaning policy is called the "cost-benefit policy". It allows cold segments to be cleaned at a much higher utilization than hot segments, because the benefit/cost ratio used to guide the cleaner is a function of (free space generated * age of data) / cost. This policy reduces the write cost by about 50% compared to the greedy policy, which simply picks the least-utilized segment.
iii) They also added a new data structure to support the cleaning policy: the segment usage table. The table records, for each segment, the number of live bytes and the most recent modified time of any block in it.
iv) Checkpoints define consistent states of the file system (there are two checkpoint regions, so if a crash occurs during checkpointing the other region can be used), and roll-forward recovers information written since the last checkpoint.
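A rough picture of point iv): recovery loads the newer checkpoint and then rolls forward, scanning segments written after it and re-applying any inodes found there. The record format below is an assumed simplification, not the Sprite LFS layout.

# Roll-forward: after loading the latest checkpoint, scan log segments
# written since then and re-incorporate inodes found in their summaries.

def roll_forward(checkpoint, segments_after_checkpoint):
    inode_map = dict(checkpoint["inode_map"])
    for segment in segments_after_checkpoint:
        for record in segment:
            # Only complete, self-describing inode records are trusted here;
            # data blocks without a recovered inode are ignored.
            if record["kind"] == "inode":
                inode_map[record["inode_num"]] = record["addr"]
    return inode_map

cp = {"inode_map": {7: 100}}
later = [[{"kind": "data", "addr": 200},
          {"kind": "inode", "inode_num": 7, "addr": 201},
          {"kind": "inode", "inode_num": 9, "addr": 202}]]
print(roll_forward(cp, later))   # {7: 201, 9: 202}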


Evaluation:
The evaluation compares Sprite LFS to SunOS 4.0.3, whose file system is based on Unix FFS. The benchmarks are synthetic rather than real workloads. The machine used for both systems was a Sun-4/260 (8.7 integer SPECmarks) with 32 megabytes of memory, a Sun SCSI3 HBA, and a Wren IV disk.
The evaluations were first carried out with no cleaning in Sprite LFS and the results were better than simulation:
i) Sprite LFS kept the disk only 17% busy during the create phase while saturating the CPU. In contrast, SunOS kept the disk busy 85% of the time during the create phase. Therefore performance should improve by another factor of 4-6 as CPUs get faster.
ii) It is faster for random writes because it turns them into sequential writes, and also faster for sequential writes since it groups them into larger I/Os.
iii) Read performance is similar when files are written sequentially and then read, but worse when files are written randomly and then read sequentially; in that case Sprite LFS requires seeks while SunOS does not.
The system was then evaluated with cleaning over a four-month period on /user6, /pcs, /swap2, /tmp and /src/kernel. Even with disk utilizations of 11-75%, most of the segments cleaned were empty. Write costs ranged from 1.4-1.6, compared to 2.5-3 in the simulations.
This evaluation does give a good understanding of file system performance. However, in my opinion several more things should have been measured. Firstly, cleaning should also have been tried at shorter intervals to evaluate write costs; cleaning after a large gap of 4 months would have let most data become dead, reducing the cleaning cost. Secondly, the crash recovery mechanism is crucial for this file system, as a lot of data is cached in the file cache before being written to disk. The primary assumption is that crashes will be rare. However, this feature did not go into the production system, and crash recovery has not been timed there; the code was only run to understand how the time to recover depends on the checkpoint interval and on the rate and type of operations performed. Lastly, the overall performance has been described mostly in terms of write cost. It would have been good to see the overall performance improvement of an application, and a breakdown of the performance of different operations - for example, a reread could affect performance negatively, while a write would improve it in comparison to Unix FFS. Such a comparison would help users evaluate the overall benefit of the file system.

Confusion:
Could we discuss the roll forward mechanism in class?

1. Summary
The main idea behind LFS is to use the disk purely sequentially for writes. This new file system uses a log-like structure on the disk and buffers file system modifications in memory in order to achieve higher disk performance and bandwidth. The authors show that this approach gives nearly seek-free writes, efficient garbage collection, and faster crash recovery.

2. Problem
There was a growing gap between sequential and random I/O performance. CPU speeds were much faster than disk access times, and applications were becoming disk-bound. RAID-5 was especially bad with small random writes. The disk was too fragmented when there were many small files of 8 KB, 16 KB, etc. This led to many slow disk reads and writes in file systems like FFS and was highly inefficient at extracting the maximum benefit from the disk bandwidth.

3. Contribution
It is easier for writes to use the disk sequentially, since all write operations can then go to empty space near each other. This also works well with large sequential writes (RAID-5). Read operations may not be hurt even if data is not located close together on disk, since cache sizes are getting larger. Buffered data is written from memory to disk sequentially into a new segment, ensuring good bandwidth. Inodes are no longer at fixed offsets; their current offsets on disk are used. Imaps were added to track the inodes' locations on disk, and a global structure (the checkpoint region) maintains the locations of the imap pieces. Pointers to the recently used imap pieces are kept in memory as an optimization. Checkpoint and roll-forward mechanisms were introduced for efficient crash recovery. There are two checkpoint regions; after a crash, the one with the latest valid timestamp is used for recovery, followed by roll-forward, which scans the log segments written after that checkpoint to recover as much recent data as possible.

4. Evaluation
The authors used real-world usage to evaluate the time the FS spends copying blocks to maintain free space, gathering that about 40% more time was taken; this copying could also be scheduled during idle time so it has less impact on performance. Crash recovery was tested by creating megabytes of fixed-size files; with growing sizes, the recovery time increased, thereby exercising both the checkpoint and roll-forward mechanisms. I really like the efficient optimization done for garbage collection, wherein a version number is combined with the inode number to generate a unique id for the contents of a file: if the uid of a block doesn't match the uid currently stored in the inode map, the block is discarded.
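The uid trick can be sketched as follows: each block carries the (inode number, version) current when it was written, and the cleaner can discard the block without reading any inode if the version recorded in the inode map has since moved on. The structures below are simplified assumptions.

# Quick liveness filter using (inode number, version) as a uid:
# if the file's version in the inode map has advanced past the version
# recorded when the block was written, the block is dead (the file was
# deleted or truncated) and the inode need not be read at all.

def maybe_dead(block_uid, imap_versions):
    inode_num, version_at_write = block_uid
    current = imap_versions.get(inode_num)
    return current is None or current > version_at_write

imap_versions = {7: 3}                     # file 7 has been truncated/recreated 3 times
print(maybe_dead((7, 2), imap_versions))   # True: written under an older version
print(maybe_dead((7, 3), imap_versions))   # False: still needs the full inode check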

5. Question
I want to understand how this works with paging in solid state drives.

1. Summary
This paper presents the log-structured file system, a novel technique with a simple cleaning policy for disk storage management in which all writes are sequential, targeting speedups for small-file writes and crash consistency. It introduces structures like the inode map to locate each inode, the checkpoint region to locate inode-map blocks, and the segment summary block to identify live blocks. Sprite LFS, which implements this technique, proved to be disk-efficient, utilizing about 7 times as much of the disk bandwidth as the Unix file system.
2. Problem
Other elements of the system need to keep up with CPU speed. Even though disk transfer bandwidth has increased, mechanically limited properties like disk access time see only small speedups. There is an opportunity to employ large main memories to absorb the majority of read requests in the cache and to serve as write buffers, turning many small writes into a single sequential write with one seek. Current file systems spread information around the disk, which causes many small seeks, and they tend to write synchronously due to metadata structure dependencies, which ties application performance to that of the disk.
3. Contributions
Due to the sequential-write nature of this file system, most seeks are eliminated, and only the recent portion of the log needs to be analyzed for crash recovery. It is responsive to different workloads and gains performance by employing a cost-benefit algorithm which segregates older, slowly changing data from younger, rapidly changing data and treats them differently while cleaning. It preserves the read performance of Unix FFS by using index structures that permit random-access retrievals. Inode access performance is matched with FFS by caching the compact inode map in main memory. It uses a combination of threading and copying to reduce fragmentation and copying cost, dividing the disk into segments whose size ensures that disk transfer time dominates seek time. Thanks to the segment summary block, the free-block bitmap and its maintenance are eliminated, simplifying crash recovery. It eliminates the need to scan all metadata structures to ensure crash consistency by using checkpoints, which define a consistent file system state at a point in the log, and roll-forward, which recovers most of the information (inode map entries, live-data utilizations and directory entries) written since the last checkpoint.
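To see why the segment size is chosen so that transfer time dominates seek time, here is a back-of-the-envelope sketch; the positioning cost and bandwidth figures are assumed example numbers, not measurements from the paper.

# Effective bandwidth of writing one segment = segment_size / (seek + transfer).
# Larger segments amortize the fixed positioning cost over more data.

SEEK_PLUS_ROTATION_S = 0.015     # assumed ~15 ms positioning cost
PEAK_BANDWIDTH_BPS = 1.3e6       # assumed ~1.3 MB/s raw transfer rate

def efficiency(segment_bytes):
    transfer = segment_bytes / PEAK_BANDWIDTH_BPS
    return transfer / (SEEK_PLUS_ROTATION_S + transfer)

for size in (4 * 1024, 64 * 1024, 512 * 1024, 1024 * 1024):
    print(f"{size // 1024:5d} KB segment -> {efficiency(size):4.0%} of peak bandwidth")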
4. Evaluation
Since Sprite LFS buffers all small writes, roll-forward seems like a useful mechanism, particularly when the system crashes just as it is about to issue the single write for the buffered content; yet the authors did not enable it in the production system even though the code does not seem complicated. This is worsened by the short checkpoint interval, which frequently writes metadata to disk. Since most available applications are not disk-bound, the authors were not able to evaluate the performance gained by the log-structured file system in everyday use, though whatever speedup was achieved is attributed to the removal of synchronous writes. It was 10 times faster than the Unix FS without cleaning overhead, delivering high performance and disk utilization for random writes, and for reads as long as temporal locality matches spatial locality. The authors observed two dissimilarities between the real workload and their simulations: greater locality within individual segments and a larger number of cold files, which necessitates re-evaluation of the segment cleaning policy.
5. Confusion
How does it handle multiple concurrent large writes? Wouldn’t this cause fragmentation and affect read performance by buffer flushing due to cleaning?

1. Summary
This paper proposes a log-structured file system aimed at speeding up file writes and crash recovery. LFS uses a copy-on-write technique and amortizes seek cost by buffering all writes (both data and metadata) in memory and writing them out as a sequential segment to the disk, thus maximizing sequential bandwidth utilization.

2. Problem
As CPUs are getting faster, disk becomes the bottleneck. Disk transfer bandwidth is improving faster while the seek times are hard to improve and hence there is a huge gap between random I/O and sequential I/O performances. LFS is based on the assumption that increasing memory sizes allow more data to be cached in main memory and hence disk traffic will be dominated by writes. Writes in existing file systems would require a number of random I/Os to update the file system metadata.

3. Contributions
The main contribution is the building of a log-structured file system that speeds up file writes and crash recovery. i) The key idea behind LFS is to buffer all writes, data as well as metadata, in main memory and write them out as a sequential segment to the disk. LFS never overwrites existing data and writes each sequential segment to a new, unused segment on the disk. Inodes in LFS are no longer at fixed locations, as they are written to a new location whenever modified; LFS introduces two new structures, the inode map and the checkpoint region, to locate the most recent version of each inode. ii) LFS introduces a cost-benefit cleaning policy to garbage collect the older versions of data and metadata scattered on disk and to minimize fragmentation. Colder segments are garbage collected even when they have relatively few dead blocks, while hotter segments are left alone longer, since more of their blocks are likely to die soon anyway. The segment summary block, together with the inode map, determines which blocks are live, while the segment usage table tracks per-segment live bytes and ages. iii) LFS uses the following approach for faster crash recovery: checkpointing from the most recent complete checkpoint region, and roll-forward to recover data written to the log after that checkpoint. Since data is never overwritten, a few older versions of a file can be retained, so LFS was well suited to the versioning file systems that came later. Also, the basic copy-on-write idea fits flash-based SSDs well, where overwriting an existing block is costlier due to the erase/program cycle while writing to a new unused block requires only a program. Thus an LFS-like approach was adopted in the flash translation layers of flash-based SSDs.
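A minimal sketch of the segment usage table bookkeeping implied above: per-segment live-byte counts drop as blocks die, and the recorded ages feed the cost-benefit policy. The class and field names are illustrative assumptions, not Sprite LFS structures.

# Segment usage table: per segment, the count of live bytes and the most
# recent modified time of any block in it. The cleaner consults this table
# (without reading the segment itself) to rank cleaning candidates.

class SegmentUsageTable:
    def __init__(self):
        self.live_bytes = {}     # segment id -> live byte count
        self.youngest = {}       # segment id -> newest block's write time

    def block_written(self, seg, nbytes, now):
        self.live_bytes[seg] = self.live_bytes.get(seg, 0) + nbytes
        self.youngest[seg] = max(self.youngest.get(seg, 0), now)

    def block_died(self, seg, nbytes):
        # Called when a block is overwritten or its file is deleted.
        self.live_bytes[seg] -= nbytes

    def utilization(self, seg, segment_size):
        return self.live_bytes.get(seg, 0) / segment_size

table = SegmentUsageTable()
table.block_written(seg=1, nbytes=4096, now=10)
table.block_written(seg=1, nbytes=4096, now=11)
table.block_died(seg=1, nbytes=4096)         # one of the two blocks overwritten
print(table.utilization(1, segment_size=512 * 1024))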

4. Evaluation
The paper presents a thorough evaluation of its mechanisms and policies. Through a simulation-based approach, the best policy for segment cleaning is determined. Evaluations show that LFS outperforms FFS by an order of magnitude for creation and deletion of small files, and the performance of large sequential writes is 1.3x better than FFS. LFS performs similarly to FFS on a sequential read after a sequential write and on random reads. LFS performs worse than FFS on a sequential read after random writes, which is expected because the blocks are not sequential on disk due to copy-on-write. In that experiment, they perform a random read and then a sequential reread after a random write; since some of the blocks might have been cached by the random read, I wonder how the performance of the sequential read would look without the random read in between the write and the read. Evaluations of rereads of large files that do not fit in memory would also have been interesting, as LFS assumes disk traffic consists mostly of writes. The segment utilization measurements also support their segment cleaning policy.

5. Confusion

1.Summary
This paper presents a new technique for disk storage management called a log-structured file system. A log-structured file system writes all new information to the disk sequentially in a log-like structure, which increases write performance and speeds up crash recovery.

2.Problem
Processor speed is increasing at an exponential rate, and so is main memory size. However, the rate of improvement of disk transfer bandwidth and access time remains relatively slow. Because of these factors, disk traffic is becoming more and more dominated by writes, since a large file cache can now absorb a large fraction of read requests. Further, office and engineering applications tend to be dominated by accesses to small files, whose data is spread around the disk, causing many small accesses. The log-structured file system addresses these problems by buffering writes in memory in a segment and later writing them to disk sequentially.

3.Contributions
The overall contribution of the paper is the design and implementation of LFS. At a high level the idea of LFS does not seem novel or complicated: collect writes in memory and later write them to disk sequentially. However, a closer inspection reveals several challenges, which this paper addresses:
* By writing both metadata and data to the same log, LFS reduces the number of random disk accesses (each of which requires a seek); the log also enables batching, so that many small writes can be aggregated into one I/O.
* The imap keeps track of each inode's number and location on disk. It speeds up subsequent read requests since the imap can be mostly cached in memory, avoiding extra I/O.
* Garbage collection of older segments is needed, since data and metadata are scattered on disk in multiple versions. LFS proposes various mechanisms and policies to do so.
* Crash recovery mechanism: LFS does not need an fsck scan of the whole disk, hence recovery is fast. LFS employs several mechanisms (checkpoints and roll-forward) to handle consistency.

4.Evaluation
The authors present several benchmark experiments to measure the performance of LFS and compare it to SunOS 4.0.3 with Unix FFS. On small-file creates, reads and deletes, LFS is almost 10 times faster than SunOS. For large files the performance of LFS is comparable to SunOS, and LFS is substantially faster for random and sequential writes.
The authors also provide benchmarks for the cleaning overhead, which show that it can be substantial in some cases, along with some numbers on other overheads of the LFS system.
Overall the authors attempted to cover all the cases in their evaluation section. However, they do not provide an end-to-end comparison with another system such as Unix FFS, i.e. one that includes the cleaning and crash recovery overheads.
5.Confusion
It is not clear how much the cleaning and crash recovery mechanisms cost.

1. Summary
The article describes a new file system called the log-structured file system (LFS). LFS aims to increase disk bandwidth usage by writing all modifications to the disk in a log-like structure, and it handles most disk reads by using memory to cache as much data as possible.

2. Problem
CPUs, disks and main memory are the main components that affect the performance of a file system. Persistent data is stored on slow disks, and main memory can act as a cache for this data. CPUs were getting faster, but disks were improving rapidly in cost and capacity rather than performance, while main memory was growing exponentially in size. Taking advantage of this, contemporary file systems cached more and more data in memory, absorbing a larger fraction of read requests. Write requests were cached as well, but metadata structures like directories and inodes were written synchronously via random accesses, negating the performance gains of caching data writes. Studies had shown that office and engineering applications were dominated by accesses to small files whose sizes were on the order of a few kilobytes, and in the contemporary file systems accesses to small files resulted in a high percentage of random I/O, which made inefficient use of disk bandwidth. The main problem the authors focus on is improving the efficiency of small-file accesses, but their techniques scale well to large files too.

3. Contributions
The main contribution of this work is a mechanism for writing file data and file system metadata in a log-like structure that goes to the disk as a single sequential operation, increasing disk bandwidth usage, together with a garbage collection mechanism and policy for handling old/unused versions of data structures.
To make sure that writes are sequential, LFS caches all updates in memory until the size of the cache reaches a threshold; this in-memory buffer corresponds to a segment. The information written in the segment includes data and metadata structures: inodes track all the data blocks of a file, and imaps track the locations of inodes. The file system also maintains checkpoint regions at fixed locations on the disk, which contain pointers to the imap blocks. Whenever a file is modified, LFS creates a new inode and copies the modified data blocks into the new segment in memory instead of modifying the on-disk data structures; the imap and checkpoint region are then updated to point to the new inode. The older data thus becomes unused, and the authors develop novel ways of cleaning such segments. The article proposes policies and mechanisms for segment cleaning; after conducting various simulations, the authors propose a cost-benefit approach, which cleans cold segments sooner than hot segments that are being continuously modified.
The article also describes a crash recovery approach that scans the log written after the last checkpoint to recover the latest data and restore file system consistency.

4. Evaluation
The authors implemented LFS for the Sprite OS and spent a significant amount of effort on evaluating the performance of the file system. To come up with a suitable policy for segment cleaning, the authors ran a simulation where they modeled the system as small 4 KB files belonging to either a hot or a cold group. The initial greedy policy did not perform as per their expectations; the authors clearly explain why, and propose a new cost-benefit policy based on their newfound understanding which performed well in the simulations. The authors also evaluate LFS thoroughly for the goal they set for themselves - improving the efficiency of small-file I/O. Their micro-benchmarks showed that LFS has an order of magnitude better performance than Unix FFS when no segment cleaning is involved; they also showed that the CPU was 100% utilized by LFS while performing writes, which implies that faster CPUs would result in better write performance, as the authors had claimed in their design notes. The authors also ran benchmarks to show that LFS performs at least as well as FFS for large files. They measured the cost of segment cleaning on a production system that was in use for several months and found that the results were substantially better than the simulation results: the overall write costs ranged from 1.2 to 1.6. The authors also provide measurements of recovery time after a crash based on various simulated crash scenarios, showing that recovery time varies with the number and size of files written between the last checkpoint and the crash. Finally, they show the percentage of data and metadata on disk and explain that the high percentage of imap data may be due to the small checkpoint interval of 30s. I feel the authors have done a pretty thorough analysis of their system and have given proper explanations for their observations.

5. Confusion
I would like to know more about the policy of how many segments to clean at a time, as the authors don't talk about it much.

Summary : The authors identify that the existing Unix file system (FS) is slow for write operations in various scenarios. They then present the log-structured file system, in which all changes to the disk are written sequentially in a log-like structure. The sequential log consists of data, metadata and modifications to the data. To ensure consistency of the FS and maintain good performance, mechanisms like segment cleaning, checkpointing and roll-forward are introduced. Their prototype, Sprite LFS, which is based on the above ideas, is evaluated to show that it performs better than the Unix FS in terms of write speed while reads achieve comparable bandwidth. In addition, it provides faster crash recovery and efficient free space management.

Problem : Even though CPU speed and main memory size increased, disk improvements were confined to capacity and cost rather than performance, resulting in the disk becoming the bottleneck for file operations (read, write and seek). In the then-existing file systems, file data and metadata were spread across the disk, resulting in multiple seeks even for small accesses/consistency checks, which in turn decreased utilization of the disk bandwidth to only 5%-10%. In addition, applications performed synchronous writes, which introduced a lot of unnecessary delays. Hence a prototype, Sprite LFS, was implemented based on the log-structured file system idea: it writes all cached changes sequentially at the head of the log, all in one place, utilizing 65-75% of the disk bandwidth, and crash recovery requires examining only the most recent portion of the log.

Contributions :
a] During a write-to-disk operation, data, metadata and all updates are buffered; writes are issued as segments (normally 512 KB) to utilize sequential bandwidth.
b] Inode maps - provide one level of indirection to reach the most recent inode of a file, which in turn points to the latest version of the file's data blocks.
c] Segment summary block - records the inode number and offset of each data block in the segment, which is useful for detecting live blocks.
d] Segment usage table - records the number of live bytes in a segment and the latest modified time of any block in the segment, used by the cleaning policy.
e] The cost-benefit cleaning policy ensures that cold segments are cleaned sooner (at higher utilization) than hot segments. Delaying the cleaning of hot segments lets more of their blocks die first, and live data is sorted by age when written back to disk.
f] Two checkpoint regions are maintained, and the one with the latest timestamp is used for recovery.
g] Roll-forward recovers data and metadata updates that were written to the log but not yet reflected in the checkpoint region.
Having seen the above contributions, we can contrast LFS with traditional file systems on the following grounds: i] LFS converts many small synchronous random writes into large asynchronous sequential transfers; ii] disk space is efficiently utilized by dividing the disk into segments and blocks; iii] crash recovery is faster in LFS, as only the log since the last checkpoint has to be examined, whereas in the Unix FS the entire disk had to be scanned.

Evaluation : The authors have evaluated their system well with the aid of multiple micro-benchmarks. The comparison of Sprite LFS and SunOS demonstrated that LFS is 10 times faster in creating and deleting small files. It also had higher write performance and similar read performance to Unix FFS. Sprite LFS utilized about 65%-75% of the disk's raw bandwidth for writing new data, whereas Unix used only 5%-10%. The extended evaluation of the cleaning policy, spanning several months, is highly appreciated. LFS exploits temporal locality - data written at about the same time ends up at nearby locations in the log - and it was shown that when logical locality matched temporal locality, LFS and Unix FFS performed similarly. Even the write cost has been evaluated in extensive detail, and the authors analyze these results and suggest probable ways to reduce the write cost, such as running the cleaner when the system is idle. LFS also demonstrated faster crash recovery thanks to mechanisms like checkpointing and roll-forward, whereas Unix FFS was slow due to the increasing time of fsck on large disks. Overall the evaluation in this paper is commendable, as the authors evaluate each of their newly proposed mechanisms extensively, present the empirical data in easily comprehensible tables and graphs, and analyze the observed trends to propose further improvements.

Confusion : The granularity and hierarchical organization of directories, files, segments and blocks is not very clear.

1. Summary
The paper introduces a “log-structured file system” that works by writing all modifications to a sequential, log-like structure. The authors discuss the system and the supporting structures and policies (segments, the cleaner, roll-forward) needed to make it work. They then run a series of micro-benchmarks and discuss the situations in which the system would benefit the user most.

2. Problem
The speed of disk accesses has not been able to keep up with increasing CPU speeds, and this is likely to cause more applications to be disk bound. Current file systems have two problems: 1) they spread disk data in such a way that there are too many fragmented small accesses, and 2) they tend to write synchronously, meaning that writes cannot be handled in the background. Log-structured file systems use disks more effectively than current file systems and permit more of a disk's raw bandwidth to be used for writing new data.

3. Contributions
Segments: in log-structured FS, disks are divided into fixed-size extents called segments. Each segment is written out sequentially from beginning to end. Their implementation (Sprite) uses a mix of copying and threading: all live data must be copied out of a segment before the segment can be rewritten, and the log is threaded on a segment basis.
Inodes: much like Unix, Sprite LFS uses inodes to keep track of files. Inode maps maintain the current location of each inode, and checkpoint regions at fixed locations on each disk identify the locations of all inode-map blocks (a sketch of this lookup path appears at the end of this section).
Cleaning: copying live data out of a segment is called "segment cleaning". Segments are read into memory, live data is identified, and that data is written back to a smaller number of clean segments. The old segments can then be marked as clean and rewritten as needed. To keep track of blocks, each segment has a segment summary block that records, for each block, the file it belongs to and its position in that file. The summary blocks also make it possible to distinguish live blocks from dead blocks that can be overwritten. Because no free-block list or bitmap is needed, this saves memory and disk space and simplifies crash recovery.
Policies: two policies for selecting segments to clean were tested. The first is the greedy policy, which simply chooses the least-utilized segments to clean; this performed poorly on workloads with locality. The second, the cost-benefit policy, compares the benefit of cleaning a segment (free space generated * age of its data) against the cost of cleaning it, and selects the segments with the highest benefit-to-cost ratio. This resulted in much better performance.
Crash recovery: two major mechanisms were used for crash recovery. First, checkpoints. Sprite uses checkpoint regions written out to a fixed location on disk. If a crash occurs, it can use the data written there to recover. Second, roll-forward. Sprite can use info in segment summary blocks to recover recently written data. It also adjusts utilizations in the segment usage table read from the checkpoint.
Overall, log-structured file systems rely on temporal locality: information that is written at the same time is clustered on the disk. Reads that follow the same pattern as the writes are fast, but reading data in a different order (for example, sequentially reading a randomly written file) requires extra seeks.
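As a rough illustration of the indirection described above (checkpoint region -> inode map -> inode -> data block), here is a minimal Python sketch. The disk accessors and field names are hypothetical, not Sprite LFS's actual on-disk layout.

```python
def read_block(disk, inum, block_no):
    """Follow the LFS levels of indirection to locate one data block."""
    cp = disk.read_checkpoint()                   # checkpoint lives at a fixed disk address
    imap = {}
    for addr in cp.imap_block_addrs:              # inode-map blocks are scattered in the log
        imap.update(disk.read_imap_block(addr))   # maps inode number -> inode's disk address
    inode = disk.read_inode(imap[inum])           # newest copy of the file's inode
    return disk.read(inode.block_ptrs[block_no])  # from here on, same as a Unix FS
```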

4. Evaluation
They implemented Sprite LFS as part of the Sprite operating system, reporting that it is no more complicated than Unix FFS. They ran microbenchmarks to test the system. One is the creation/reading/deletion of a large number of small files; compared to SunOS, Sprite LFS was very effective here. However, as the authors do point out, Sprite LFS works best with temporal locality. They also do not clean during the micro-benchmark, instead providing a separate set of measurements for cleaning. To their surprise, cleaning costs in production Sprite LFS were actually lower than in the simulations.

5. Confusion
What sort of memory issues are there? How are clean segments selected for rewriting? Does it just pick the first clean segment available and fill in data there?

Summary
The paper explains the Log-structured File System (LFS), a disk storage management technique designed to efficiently handle the disk-bound applications that were becoming increasingly prevalent on conventional file systems due to various technology trends and the nature of contemporary application workloads. LFS achieves higher performance and faster crash recovery than the conventional Fast File System (FFS) by using file caching, write buffering (asynchronous writes), and by writing all modifications to disk sequentially in a log-like structure. A prototype named Sprite LFS was developed and then used to demonstrate the performance superiority of LFS over an FFS counterpart.

Problem
A combination of factors caused applications' performance to become disk bound: technology trends (1. processor speed improving faster than disk speed, 2. significant improvements in memory capacity), the rise of file system workloads with many small-file accesses, and problems with existing file systems (1. non-contiguous storage of file data and inode information, which turned such workloads into many small random write I/Os, 2. synchronous write operations, which prevented applications from benefiting from faster CPUs). A solution was thus needed that preferred sequential disk accesses for writes (to minimize disk overheads and maximize disk bandwidth utilization) and made maximal use of file caching and write buffering (to allow faster reads and asynchronous writes, respectively).

Contribution
The authors propose the idea of a Log-structured File System to counter the above problem. This file system design was heavily influenced by the designs of 1. earlier log-based systems, 2. garbage collection mechanisms in programming languages and 3. database systems.

The basic idea in LFS is to buffer writes in memory and write them asynchronously to disk in a sequential log-like structure. LFS handles information retrieval from the persisted log by using index structures and a checkpoint region at a fixed location that stores the locations of the inode map blocks. The issue of free space management is resolved by dividing the disk into segments and using a segment cleaner to recover free disk space from segments that are only partially utilized or not utilized at all (segment utilization drops over time due to file deletions and overwrites). The LFS design uses segment summary blocks to identify 1. the live blocks in a segment, 2. the file which contains each block and 3. the relative position of that block in the file. The specifics of the segment cleaning operation (how often to clean, how many segments to clean, which segments to clean, etc.) are guided by the segment cleaning policies. A segment usage table also exists to determine the age of a segment, a detail required by the cost-benefit policy for choosing which segments to clean.
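A minimal sketch of the liveness check that the segment summary block enables, with hypothetical helpers: a block is live only if the file's current inode still points at the block's address in this segment.

```python
def live_blocks(fs, segment):
    """Return the summary entries whose blocks are still reachable and must be copied."""
    live = []
    for entry in segment.summary:                 # each entry: (file_no, block_no, address)
        inode = fs.current_inode(entry.file_no)   # looked up through the inode map
        if inode is None:
            continue                              # file was deleted, so the block is dead
        if inode.block_address(entry.block_no) == entry.address:
            live.append(entry)                    # inode still points here: block is live
        # otherwise a newer copy exists later in the log and this block is dead
    return live
```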

LFS achieves crash recovery by using the ideas of checkpoints (Positions in log at which all file system structures are consistent and complete) and roll-forward (process of recovering from a previous checkpoint) which were borrowed from crash-recovery mechanisms in Database systems.

Evaluation
The authors evaluate the design and implementation of LFS in multiple ways. First, an initial analysis shows that the write cost (a measure of disk write overhead) during cleaning grows as the fraction of live blocks in the cleaned segments grows; it is inversely proportional to the fraction of free space reclaimed. Second, the greedy and cost-benefit segment cleaning policies were analyzed on a simulator with a hot-and-cold access pattern. The cost-benefit policy achieved the bimodal segment distribution that produces good performance for LFS, while the greedy policy (even coupled with age sorting to group hot and cold data) underperformed, because the slow decay of cold segments' utilization forced cleaning to happen at higher overall utilization.
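For reference, the write-cost relation that this analysis rests on can be written out as follows, where u is the fraction of live data in the segments being cleaned (and u = 0 needs no cleaning reads at all):

```latex
\text{write cost}
  = \frac{\text{total bytes read and written}}{\text{new data written}}
  = \frac{N + Nu + N(1-u)}{N(1-u)}
  = \frac{2}{1-u} \quad (0 < u < 1), \qquad
\text{write cost} = 1 \ \text{when } u = 0 .
```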

The Sprite LFS prototype was then compared against SunOS 4.0.3, which used Unix FFS. For both small-file I/O and large-file accesses, Sprite LFS came out ahead in all operations except the sequential reread after a random write. The results were attributed to the observation that LFS is suited to disk accesses exhibiting temporal locality, while a traditional file system is suited to logical locality. The cleaning overhead of the segment cleaning operation was then evaluated by observing production log-structured file systems over several months (however, the chosen startup period of several months seemed ad hoc to me). The results were encouraging, as they indicated lower write costs for all the file systems and a bimodal segment distribution. In the next experiment, the crash recovery time was measured for different file sizes and amounts of data recovered; these values were found to be reasonable. Finally, the authors studied the relative importance of the kinds of data written to disk, which showed that most segment space and log contents were dominated by data blocks. The metadata overhead could be tuned by varying the checkpoint interval.

Overall, I believe that these experiments were sufficient to demonstrate the performance of an LFS system, as they successfully evaluated all the key components and performance aspects associated with the LFS design.

Questions / confusion
1. The concept of the directory operation log was not clear.

1. Summary
This paper proposed a new technique for disk storage management known as the log-structured file system. The main idea is to buffer writes and then write them to disk sequentially, speeding up both file writing and crash recovery. The authors implement and evaluate their prototype; the initial results show that LFS outperforms the Unix FS by an order of magnitude for small writes and matches its performance for reads and large writes.
2. Problem
File system design is impacted by technology as well as by workload. Recent advances have produced an exponential rise in processor speed and memory size; however, disk performance improvements have not kept pace. The reason is that disk access time, which largely determines disk performance, is limited by mechanical motion that is hard to improve. While reads can be handled by leveraging increasing cache sizes, making writes more efficient is limited by the aforementioned factor. The authors aim to make writing to disk more efficient. In the process they also aim to solve the problems of existing file systems: a large number of small accesses caused by spreading information around the disk, and synchronous writes that limit the benefit of faster CPUs.
3. Contribution
According to me, the key idea of the proposed solution is to improve write performance by buffering a number of writes and then writing all the changes to disk sequentially in a single disk operation, improving the utilization of disk bandwidth. Firstly, in order to achieve acceptable performance for random-access reads, the authors propose a number of data structures; the key one is the inode map, which can be indexed to find the location of each inode. The second main contribution is the approach of combining threading (on a segment-by-segment basis) with copying (live data must be copied out of a segment before rewriting it) to manage the free space on disk. The next major contribution is the set of policies that answer questions such as how many and which segments should be cleaned: the authors analyze several cleaning policies and settle on a cost-benefit policy that allows cold segments to be cleaned at a higher utilization than hot segments. Lastly, the authors propose a crash recovery mechanism consisting of checkpoint regions (containing a snapshot of the file system structures) and roll-forward (used to recover information written since the last checkpoint).
4. Evaluation
The authors evaluated their proposed solution extensively. They compared their prototype with SunOS using synthetic micro-benchmarks. They also evaluate their cleaning policy by measuring the cleaning overheads, as well as the performance of the crash recovery mechanism. Though their evaluation seems complete, I feel there are a few shortcomings. Firstly, the results of the micro-benchmarks cannot be taken at face value as they do not include cleaning overheads. Secondly, it would have been ideal to compare performance while varying the segment size. Thirdly, a comparison of LFS with and without write buffering would have quantified the impact of write buffering, which I feel is one of the building blocks of the proposed solution.
5. Confusion
Is it reasonable to make the assumption that most of the read requests can be satisfied by the cache and would not require disk access?

Summary
This paper proposes a new file system for disk storage management known as log-structured file system (LFS) that buffers file system modifications in sequential log-like structures and writes them sequentially to the disk to achieve higher write performance and higher disk bandwidth utilization. While doing so, it also solves a number of challenges associated with traditional log-based system using efficient indexing data structures and efficient cleaning policy.

Problem
With growing system memory, more data can be cached, allowing for faster file reads. However, disk seek and access times are improving at only a fraction of the rate of CPU speeds, so most applications are going to be disk-bound, with disk traffic dominated by writes. Also, random write I/Os are more expensive than sequential writes to disk. So a file system that could exploit a large write buffer in memory while issuing only sequential writes to disk would outperform contemporary file systems like Unix FFS; this is the motivation for the paper.

Contributions
According to me, the following are the novel ideas presented in the paper:
(1) by storing the indexing information, the log-structured file system is able to use logs to permanently store data, instead of just using it as an auxiliary structure for crash recovery.
(2) through efficient buffering of the file system changes in main memory and use of a single sequential I/O to persist all modifications in one go, LFS is able to achieve higher disk bandwidth utilization and higher performance,
(3) through the use of specially designed data structures like inode map and fixed checkpoints, LFS is able to solve the problem of efficiently retrieving the data stored in the logs,
(4) through proposed design of segments with their segment summary blocks, LFS is able to provide efficient free space management, ensuring that free extents will be available on the disk for writing data,
(5) the design of a segment cleaner and a cost benefit policy based on the hotness and coldness of the data that is able to achieve high disk utilization,
(6) checkpointing and roll-forward are proposed as mechanisms for crash recovery and consistency management.

Evaluation
The authors implemented their proposed LFS features in Sprite LFS, part of the Sprite network operating system. They then evaluated Sprite LFS against the then-popular Unix Fast File System (FFS) of SunOS 4.0.3 on a Sun-4/260 machine with 32 MB of memory and 300 MB of usable disk storage. Since the machine was not fast enough to be disk-bound under the existing workloads, the authors used micro-benchmarks to compare the performance of Sprite LFS and SunOS FFS. For a micro-benchmark that creates, reads and deletes a large number of small files under a best-case scenario with no cleaning overhead, they found Sprite LFS to be ten times faster than SunOS FFS. The performance of Sprite LFS for large files was competitive with SunOS in most cases, except for sequential reads of randomly written files.
To demonstrate the long-term effect of the cleaning overhead, the authors have presented statistics from the production use of Sprite LFS measured over several months and found that Sprite LFS is able to achieve higher disk capacity utilization using the cost benefit policy over SunOS FFS.
According to me, the evaluations presented by the authors were fairly detailed and well reasoned. However, my suspicion is that file reads in the presence of heavily fragmented segments (due to random writes) will suffer badly in LFS; this is partly admitted by the authors in their evaluation but not stressed strongly enough (since they present only the case of minimum write cost). The cost-benefit cleaning policy presented in the paper cleans cold segments at a higher utilization than hot segments; however, the results shown are only for simulated data and for long-term measurements over a period of several months. It would have been interesting to see how well this policy works when stress-tested individually against different kinds of real workloads.

Confusion
Could we go over in the class about how roll-forwards are used in crash recovery, especially the use of directory operation log?

1. Summary

The authors are trying to solve the problem of disk storage management with a log-like structure in which all modifications are written to disk sequentially.

2. Problem

More and more applications are becoming disk bound, as disk access times have improved very slowly compared to increasing processor speeds. This makes it necessary to devise a new disk storage management technique. Additionally, the increase in main memory sizes allows larger and more effective buffer caches, which satisfy most read requests. Thus disk traffic is dominated by writes, and a design that takes this into consideration is needed.

3. Contributions

The idea of the log-structured file system is to increase disk write performance by eliminating almost all seeks and writing all new data to a sequential on-disk structure called the log. The authors prototyped this idea as Sprite LFS. In the paper they demonstrate that, while read performance is about the same as in existing systems, writes are much faster: Sprite LFS outperforms the Unix file system by an order of magnitude for small files while matching or exceeding it for large writes. Writing in bulk enables high disk bandwidth utilization.

They introduced the notion of segments, which are large extents of free space on the disk available for writing new data, and segment cleaning policies that take the nature of data access into account, including a cost-benefit technique based on age.
For crash recovery, LFS has a roll-forward technique that can restore data written since the last checkpoint. The design also includes policies for deciding when to checkpoint, such as after a fixed period of time or after writing a fixed number of bytes to disk.

4. Evaluation

The authors used small benchmark programs to measure the best-case performance of Sprite LFS and compared it to that of Unix FFS on a synthetic workload. For simple create, read and delete operations, Sprite LFS is ten times faster than FFS for create and delete, and it is also faster for read because the files are densely packed in the log. Their benchmarks also show that Sprite LFS gives competitive performance for large files. Random writes are substantially faster because they are converted into sequential writes to the log, and even sequential writes perform better because many blocks are grouped into a single large I/O.

They also evaluated the cleaning mechanism by recording statistics from their production log-structured file systems for several months, and their results show that the write costs were lower than expected. In my opinion the authors did a good evaluation covering both responsiveness and reliability. Since they mention in the related work that LFS resembles the write-ahead logging of database systems, some analogous benchmarks and a discussion of the similarities and differences with DB systems would have made the paper even more interesting.

5. Confusion

In what ways can two file system designs be compared? Besides performance, are other factors, such as how well the algorithms match the hardware design, also important in such a comparison?

Summary

The paper presents a new disk storage management technique called the log-structured file system. The basic idea is to write all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The authors have implemented a prototype called Sprite LFS; it outperforms contemporary Unix file systems by an order of magnitude for small file writes while matching or exceeding Unix performance for reads and large writes.

The problem

A number of technological trends motivated the authors to come up with the log-structured file system design. First, CPU speed was increasing exponentially, which meant disk performance was becoming a bottleneck. Second, although disk transfer bandwidth improved substantially, disk seek time was still lagging behind. Third, main memory was growing at an unprecedented rate; as a consequence, file caches became large enough to satisfy the majority of read requests and also act as write buffers. Fourth, it was observed that for workloads dominated by accesses to small files, the overhead of metadata maintenance outweighed that of writing the actual file data. Lastly, existing file systems suffered from shortcomings like synchronous writes and the scattering of information around the disk, resulting in many small accesses.

Contributions

1. Write throughput increases significantly because writes can be batched into large sequential transfers, so costly seeks are kept to a minimum. For workloads with many small files, a log-structured file system converts the many small synchronous random writes of traditional file systems into large asynchronous sequential transfers that can utilize nearly 100% of the raw disk bandwidth.
2. Writes create multiple, chronologically advancing versions of both file data and metadata. Some versioning file system implementations can leverage this to make old file versions nameable and accessible.
3. The log is the only structure on the disk. To allow files to be read back from the log efficiently, it contains indexing information: an inode map maintains the current location of each inode, and inode map blocks are cached in main memory, reducing the required disk accesses.
4. Recovery after a crash is very fast. Compared to fsck, which scans the entire disk, the state can be reconstructed from the last consistent point on disk. This is done via checkpointing, i.e. periodically writing the inode map and segment usage data to a fixed location on disk (alternating between two fixed locations), and then rolling forward from the last checkpoint to the end of the log to update the checkpoint's indexes and other metadata after a crash.
5. To manage free space on disk efficiently, Sprite LFS uses a combination of threading and copying: it divides the disk into large fixed-size segments and performs segment cleaning to reclaim the space occupied by old data. A cost-benefit cleaning policy is used, which allows cold segments to be cleaned at a much higher utilization than hot segments. No free-block list or bitmap is required in LFS, which saves memory and disk space and also simplifies crash recovery.
6. LFS exploits temporal locality of data.

Evaluation

The authors compared the performance of Sprite LFS to SunOS 4.0.3 with Unix FFS. SunOS used an 8 KB block size while LFS used 4 KB. The best-case runs, a collection of small benchmarks reading and writing small files with no cleaning, showed that LFS has a 10x speedup for creating and deleting files and a somewhat better speedup for reading. For large files, Sprite LFS has comparable and in some cases better performance, such as random and sequential writes; LFS performs worse only for sequential reads after random writes. Overall, LFS achieves much better disk bandwidth utilization (around 65-75%) than Unix FFS's 5-10%. The cleaning overheads are presented separately, from statistical data collected over a period of several months. These overheads (write costs of 1.2-1.6) were also lower than what their earlier simulations had suggested (2.5-3). The authors dutifully explain this disparity, and what I like about this part is that they paid attention to avoiding start-up effects so as to reflect true results. But the microbenchmarks used were synthetic, so what I feel is lacking in the evaluation is a better comparative analysis with different real workloads. Also, a breakdown of the overheads for checkpointing and for maintaining the imap and segment usage table would have been useful. Another valuable addition could have been a comparative analysis of the recovery mechanisms of Unix FFS and LFS, which would have helped us better appreciate the advantages of checkpointing.

Confusions

How does LFS compare with journaling?

1. Summary The authors maximize disk bandwidth utilization for writes by designing a new file system in which all writes are large sequential writes. This is enabled by a combination of in-memory write buffering and an on-disk garbage collection algorithm.

2. Problem In the early 90s, processor speed and memory size was increasing exponentially, while rotational disk performance was not keeping pace. Moreover, common desktop workloads were small-file intensive; creations, writes, and metadata manipulations of such files were the common case. For conventional UNIX file systems, write time was dominated by mechanical latency in these use cases. Seek times were so dominant that only 5% of the disk's write bandwidth was used. Additionally, metadata manipulations in these file systems must be performed synchronously, and often require multiple separate writes, further bottlenecking performance.

3. Contributions The authors construct a log-structured file system: rather than storing the data or metadata for a given block at a fixed disk location, the newest data and metadata are written out to a linear on-disk log. The authors take advantage of the speed and increasing availability of RAM to buffer disk data as much as possible, allowing writes to be coalesced into large transfers so that transfer time dominates seek time. As the newest versions of inodes are written to the log, FFS-style random-access performance is preserved by an inode map which tracks the on-disk location of all inodes in the file system. This structure is cached in memory and is periodically checkpointed to predefined regions.

As the log moves across the disk, free space gradually becomes fragmented as the space containing obsolete versions of data is reclaimed. When the log needs room for new data in a fragmented region, there are two options for using the available space. First, the log can "thread" new writes through the empty areas. However, the performance of a log-structured file system is contingent on large sequential writes that can saturate the disk bandwidth, which limits the utility of naive threading. Second, the log can periodically compact "live" data; this, however, is a potentially costly activity that must be scheduled intelligently. The authors choose a compromise: the disk is divided into fixed-size segments, which are written in their entirety and from which all live data must be copied out before reuse. The segment size is chosen so that even if segments are written in a random order, the time to write a segment dominates the seek time, allowing the log to thread between live segments. To speed up the compaction of internally fragmented segments, the authors use a segment summary block, which allows stale disk blocks to be identified by a quick comparison with the most recent inodes, without reading the full segment.
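A minimal sketch of the cleaning pass described here, using hypothetical helpers: candidate segments are read in, their live blocks are rewritten through the normal log write path (which compacts them into a few new segments and updates the owning inodes), and the old segments are then marked clean.

```python
def clean(fs, candidate_segments):
    """Compact live data out of the given segments so they can be reused."""
    for seg in candidate_segments:
        fs.read_segment(seg)                 # bring the whole segment into memory
        for block in fs.live_blocks(seg):    # identified via the segment summary block
            fs.append_to_log(block)          # rewritten at the head of the log; the
                                             # owning inode now points at the new copy
        fs.mark_clean(seg)                   # everything left in the old segment is dead
```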

4. Evaluation The authors assess their design choices through a combination of modeling, simulation, and live testing. I like the "write-cost" metric they introduce, as it cleanly encapsulates the notion of the amortized cost of writing in a garbage-collected environment, and I also appreciate how they gradually move from an idealized model to more and more concrete justification of design choices. In accordance with their mathematical model, the simulations show that, in general, while write costs are fixed for FFS, LFS's costs blow up as the fraction of live data per segment approaches 100%, and that average segment utilization needs to remain below roughly 75% to compete with FFS. Moreover, they show that garbage collection needs to differentiate between hot and cold segments to eliminate performance anomalies.

In an analysis of live performance benchmarks, they show that LFS drastically improves on Sun's file system for small-file workloads, and that large-file workloads are at least as good in all use cases except sequential reads following random writes. I was disappointed that they did not do any significant analysis of how LFS affects overall system performance - they briefly note that there does not appear to be much of a qualitative difference in interactive use. The workloads cited in the beginning are office and engineering ones, which makes this seem a little like a solution in search of a problem.

5. Confusion I'm still a little unclear on how to interpret their discussion of hot and cold segments. I understand how hot and cold files may affect performance, but I'm unsure how that translates into segments being hot and cold in the log-structured layout.

1. Summary
The authors came up with a new storage technique after studying disk access patterns and the advances in available processors and memory. Their analysis suggested that I/O traffic would be dominated by disk writes, so it is better to batch write operations and store them in a log-based layout. Through a set of clever policies and a thorough evaluation, they show that their implementation outperforms the existing FFS-based solution by a long margin.

2. Problem
Existing file systems (FFS) had high write costs because they aimed for logical locality. This resulted in high seek costs, and write operations came to dominate disk traffic. Also, existing implementations performed their metadata writes synchronously, which further slowed write operations.

3. Contribution
The most important contribution of the authors was to pick the best ideas available from the database community. Their implementation, Sprite LFS, batches disk writes, returns from the I/O call early (making writes asynchronous), and writes data out with temporal locality. Updates to inodes and other metadata are stored along with the data blocks. To do this, they introduce the notion of segments (made up of smaller units, blocks), which are filled sequentially and reused after cleaning. This leads to two issues:
- Finding free segments.
- Updates to file blocks making old blocks invalid (hence need to copy and compact).
They explain the choice of picking a particular segment for cleaning based on its benefit/cost ratio. The information needed to distinguish hot and cold segments is kept in the segment usage table, and with this policy they were able to achieve the desired bimodal segment distribution. Crash recovery is handled using checkpointing, where each checkpoint acts like a commit point guaranteeing a consistent state. A crash can occur during a write to a checkpoint; to handle this, they keep two checkpoint regions and alternate between them (see the sketch below). Roll-forward was implemented to recover as much of the data written since the last checkpoint as possible. This still beats the existing fsck-based approach, which would otherwise have to scan the entire disk for consistency.
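A minimal sketch of the two-region checkpoint scheme, with hypothetical disk and state interfaces: checkpoints alternate between two fixed regions, and recovery uses whichever region holds the newest complete checkpoint, so a crash in the middle of a checkpoint write can corrupt at most one of them.

```python
import time

CHECKPOINT_ADDRS = (0, 1)   # two fixed checkpoint locations (placeholder addresses)

def write_checkpoint(disk, state, which):
    """Write the next checkpoint, alternating between the two fixed regions."""
    region = CHECKPOINT_ADDRS[which % 2]
    disk.write(region, state.serialize(timestamp=time.time()))
    return which + 1

def recover_checkpoint(disk):
    """Pick the newest complete checkpoint; the other region still holds an
    older but consistent one if the last checkpoint write was torn."""
    candidates = [disk.read_checkpoint(addr) for addr in CHECKPOINT_ADDRS]
    valid = [c for c in candidates if c is not None and c.is_complete()]
    return max(valid, key=lambda c: c.timestamp)   # assumes at least one is valid
```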
Overall, their implementation is tailored to disks with seek costs. It would be interesting to study whether the FS would behave differently on media such as flash-based disks, where writes do not involve a seek cost. In my opinion, it may be better to buffer the writes and write them out with logical locality when the disk is idle (essentially FFS plus asynchronous behaviour), since that would give better read as well as write rates.

4. Evaluation
The solution the authors provide is based on an evaluate-learn-implement approach. They first evaluated their policies for picking segments using factors such as age, utilization and write cost. They then evaluated performance using microbenchmarks, which were quite comprehensive. Cleaning-overhead experiments performed over a long duration showed that Sprite LFS still beat the existing system despite the added complexity of moving data blocks.

5. Confusion
I did not understand the context of directory operation logs - is it just another term for the metadata associated with the data blocks, or is there something more to it?

Summary
The paper presents the design and implementation of a new log-structured file system, which increases write speed for small as well as large files by storing file information in a log that is written sequentially.
Problem
During the 1990s CPU speed was increasing rapidly, but disk access time remained roughly the same. Hence applications could not fully utilize the faster CPUs, since disk I/O time stayed the same. System memory was also growing, so more data could be cached. Existing file systems performed poorly because information is spread across the disk, which causes too many small accesses. File attributes are kept separate from file contents, so an operation like creating a file required at least five disk I/O operations. They also tend to write data synchronously. All of this hurts the utilization of disk bandwidth, so the paper suggests a new file system that attempts to overcome these problems.
Contributions
The main contribution of the log-structured file system is maximum utilization of disk bandwidth with asynchronous write operations, i.e., 1] Buffering a sequence of file system changes in the file cache and then writing all the changes (into the log) to disk sequentially in a single disk write operation (a sketch of this buffering appears after this list).
2] All the information related to a file is present in one place (the log) rather than being spread out as in the Unix file system, so write operations are faster when new files are created.
3] Recovery of unused blocks is faster because the disk is split into segments and the information in each segment is tracked in a segment summary block; segment usage is tracked in a segment usage table to tell live blocks from unused ones.
4] Faster detection of old data by maintaining a version number for each inode.
5] Efficient tracking of file system state using checkpoints, which helps with faster recovery: the system only needs to read the last checkpoint and recover from it. A roll-forward approach is applied to further recover writes completed after the checkpoint, which is not present in traditional file systems.
6] The paper presents a cost-benefit policy to reclaim the unused blocks in segments and evaluates this policy.
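A minimal sketch of the write buffering in point 1], with a hypothetical Disk interface: dirty blocks accumulate in an in-memory segment and are flushed with one large sequential write when the segment fills.

```python
SEGMENT_SIZE = 512 * 1024        # e.g. 512 KB, on the order of an LFS segment

class SegmentBuffer:
    """Accumulate dirty data/metadata blocks; flush them as one sequential write."""
    def __init__(self, disk):
        self.disk = disk
        self.blocks = []         # pending blocks (bytes objects)
        self.nbytes = 0

    def append(self, block):
        self.blocks.append(block)
        self.nbytes += len(block)
        if self.nbytes >= SEGMENT_SIZE:
            self.flush()

    def flush(self):
        if not self.blocks:
            return
        # One large sequential transfer to the next clean segment of the log.
        self.disk.write_segment(b"".join(self.blocks))
        self.blocks, self.nbytes = [], 0
```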
Evaluation
The authors have done a good evaluation of the file system. The creation, reading and deletion of a large number of files is almost 10 times faster in the log-structured file system than in the Unix FFS used in SunOS (Fig. 8). The cleaning policy is evaluated very well by running the log-structured file system over a period of several months (Fig. 9). Sequential writes are faster because of buffering and issuing a single large write when a segment fills. Random writes are also faster in the log-structured file system because they are turned into a single sequential write to the log. The only drop in performance is when a file is read sequentially after being written randomly.
The paper gives a detailed evaluation of write cost in Table II, which ranges from 1.2-1.6, and the authors suggest these costs could be reduced in a real system by running the cleaning task while the system is idle, for example at night.
The system was able to achieve about 70% of the maximum sequential write bandwidth including cleaning overhead, which is impressive compared to Unix FFS. The paper also presents crash recovery metrics in Table III, giving the time taken for different file sizes, and a detailed breakdown of bandwidth usage in Table IV.
Overall the authors' evaluation is quite convincing.

Flaws in Evaluations :
To measure the effects of cleaning, start-up effects were avoided by waiting several months; start-up effects should also have been included to show the full effect of the cleaning process. Overall the system seems to perform extremely well only when sufficient free disk space is available. The cost of cleaning when files are megabytes in size and the disk is almost completely utilized should have been shown; the evaluation only considers file sizes of at most 100 KB.


Overall the paper presents a good file system design whose ideas are still used in industry
(https://en.wikipedia.org/wiki/List_of_log-structured_file_systems).

Confusion:
Please explain the threaded log mechanism

1. Summary
This paper describes a new file system called LFS (Log-structured File System). LFS buffers all updates in an in-memory segment and, when the segment is full, performs a long sequential transfer to an unused portion of the disk. Since LFS never overwrites existing data in place, a segment cleaner is used to compact the live information from heavily fragmented segments. The use of a log also enables a speedy crash recovery mechanism. The paper clearly outlines the problems associated with mainstream disk management systems, introduces a new file system that matches or outperforms existing systems on typical workloads, and evaluates the system against SunOS's file system.

2. Problem
The following factors created the motivation for LFS:
- The growing disparity between CPU speeds and disk access times: This trend is likely to continue and cause applications to become disk-bound.
- Increasing main-memory capacity: As memory size grows, majority of the read requests can be satisfied from buffer caches and disk traffic would be dominated by writes
- No significant improvements wrt disk performance: Transfer bandwidth and access time have not improved drastically as compared to disk cost and capacity
- The growing divide between disk sequential and random I/O performance
- Poor performance of prevalent file-systems on common workloads: Small writes trigger multiple physical I/O's
- Synchronous writes: Typically file-system metadata structures are updated synchronously (FFS)

The authors attempt to address the above mentioned problems and designed LFS which leverages the disk's sequential I/O performance. The use of a redo log, also simplifies crash recovery significantly.

3. Contributions
The primary contributions are:
- Leveraging the disk's sequential I/O performance: an in-memory segment buffers updates (data blocks + inodes + imap blocks), which are then written to an unused portion of the disk sequentially. Two fixed checkpoint regions on disk are used to track the imap blocks distributed throughout the log.
- Free space management (segments): The system writes segments sequentially and the log is threaded on a segment-by-segment basis to manage free space.
- Segment cleaning: The system uses a segment summary block to determine block liveness and doesn't need a bitmap or free-block list. The system also segregates hot and cold segments to aid the segment cleaning policies. Additionally, a write-cost metric is used to compare cleaning policies.
- Crash recovery: roll-forward techniques are employed on the log, starting from the last checkpoint region and applying the valid updates found after it. Without roll-forward, the updates made between the last checkpoint and the crash (roughly one checkpoint interval's worth, about 30 seconds) would be lost.

4. Evaluation
Sprite LFS was evaluated on the following frontiers:
- Micro-benchmarks: Synthetic benchmarks were used to determine the best-case performance of the Sprite LFS and SunOS 4.0.3 file systems. Sprite LFS is almost ten times as fast for the create and delete phases of the benchmark. Re-reading the files is also very fast as they are packed densely in the log. LFS kept the disk busy for only 17% of the create phase, while SunOS kept it busy about 85% of the time. LFS also provides competitive performance for large files: Sprite LFS has higher write bandwidth than SunOS in all cases. The difference is substantial for random writes, as LFS turns them into sequential writes to the log; it is also faster for sequential writes because it groups many blocks into a single large I/O. Read performance is similar except for reading a file sequentially after it has been written randomly, which requires seeks in LFS.
- Segment cleaning overheads: A table is provided which describes the segment cleaning statistics and write costs for the production systems. The table presents a very pessimistic view of the system: even though the disk utilisations ranged from 11-75%, more than half of the cleaned segments were empty.
- Crash recovery: Even though crash recovery was not installed on the production system, a table is provided that shows recovery time with the number and size of the files written between the last checkpoint and the crash. The results are not very surprising as the recovery time is bounded by the amount of data written between checkpoints.
- The other overheads associated with the system are presented in yet another table, which shows the relative importance of the various kinds of data written to disk. The inode map (imap) accounts for around 7% of all data written to the log, because the short checkpoint interval forces frequent metadata writes.


5. Confusion
What are the improvements that could be made to LFS ?


1. Summary
This paper is about the log-structured file system, which buffers data and metadata writes in large segments that are later written sequentially to disk, thereby leveraging the better performance of sequential bandwidth. It maintains checkpoint regions for quick crash recovery, and segment summary blocks plus a segment usage table to support garbage collection.
2. Problem
With CPU’s getting faster and memory becoming bigger and less expensive the disk has become a major bottleneck to process performance. Since memory is cheaper , most reads can be serviced from memory itself.Existing file systems like FFS distribute the file data and metadata around the disk causing many small random accesses to take place. This has degraded IO performance for common workloads. Also the write is synchronous which is also impacts performance.
3. Contributions
When writing to disk, the data and metadata blocks are buffered in the file cache before being written to disk asynchronously. A new inode indexing mechanism was developed, called the inode map, which is divided into multiple blocks spread across the segments. A central checkpoint region (CR) records the location of all imap blocks, though the imap is compact enough to be cached entirely in memory. Unused blocks need to be garbage collected efficiently; to do this, the live data from a set of segments is compacted into a few segments, and the older segments can then be cleaned. The mechanism for determining the liveness of a block is to check the segment summary block, which identifies the file to which a data block belongs and the position of the block in that file. Block version numbers can be stored in the summary block as an optimization. The policy employed is the cost-benefit policy: cold segments should be cleaned sooner, at higher utilization (as they are more stable), than hot segments. To support this policy, a segment usage table is maintained. To recover from a crash, a checkpoint is written to a fixed position on disk; the checkpoint marks a position in the log at which all file system structures are consistent and complete. After a crash, the checkpoint region is read and the log written after the checkpoint, up to the end of the log, is replayed where necessary. A directory operation log maintains consistency between directory entries and inode entries.
4. Evaluation
Sprite LFS was the implementation used to evaluate the design principles. The performance of LFS is significantly better than FFS only if the machine on which they run is fast; much of the speedup is attributed to the removal of synchronous writes. A collection of small synthetic benchmarks was used to measure best-case performance. It was noted that creating and deleting files is ten times faster in LFS (which used the disk bandwidth better). If a file is written and read sequentially, FFS and LFS performance is comparable; the difference shows up in the write operations. This observation and analysis is helpful for understanding when LFS is a better choice than FFS and vice versa. To evaluate the cost-benefit cleaning policy and shed light on worst-case behaviour, production LFS systems with cleaning overheads were measured. The cleaning costs turned out to be low because there exists greater locality within individual segments and segments are accessed in a non-uniform pattern.
5. Confusion
Please explain directory operation log in more detail


1.Summary:
This paper is about the design and implementation of the log-structured file system, in which all modifications to disk are written sequentially to a log-like structure, thus speeding up both writing and crash recovery. The authors have implemented a prototype named Sprite LFS, which performs roughly ten times better than Unix file systems for small file writes.

2.Problem:
1) Increasing memory sizes have led to caching of files in the main memory thus significantly improving file read requests. As a result, the disk traffic is mainly due to writes.
2) Write performance is limited mainly by large seek times, so it is writes that need to be optimized.
3) To create a file in Unix, at least five disk I/Os are required, preceded by a seek since the inode and directory structures are spread out on the disk away from file contents.
4) Crash recovery techniques in Unix file systems had to scan the entire disk to restore consistency after a crash.
Log-structured file systems solve the above problems by writing sequentially to a log, thereby improving write times. Crash recovery is sped up since only the most recent portion of the log needs to be scanned.

3.Contributions:
1) The fundamental contribution of LFS is to buffer all writes in the cache and then write the contents sequentially to a log-like structure in a single write operation. This significantly improves write performance by avoiding unnecessary seeks.
2) Abstractions:
Inode map - a structure that stores the locations of the inodes, instead of placing inodes at fixed locations, for easy lookup.
Segment - a fixed-size extent of the disk.
Segment summary block - details of the blocks in a segment, such as the file number, its version number and the block number of each block. This is used for determining the liveness of blocks, aiding recovery and cleaning.
Segment usage table - tracks, for each segment, the live bytes and the most recent modified time of any block in it (see the sketch after this list).
3) The authors developed a policy for selecting segments for cleaning and for grouping live data by age. They use the metric 'write cost', the ratio of the total bytes moved to and from the disk to the bytes of new data written.
4) Instead of simply cleaning the segments that are least utilized, they use the 'cost-benefit' metric, which cleans hot (short-lived) segments only at low utilization while cleaning cold (long-lived) segments even at fairly high utilization. This achieves lower write costs.
5) Checkpoint regions are used to record consistent states of the file system, and a roll-forward mechanism recovers information written after the checkpoint.
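A minimal sketch of the segment usage table bookkeeping mentioned in the abstractions above (an in-memory stand-in, not Sprite's on-disk format): live-byte counts fall as blocks are overwritten or deleted, and the recorded modification time is what the cost-benefit policy later treats as the segment's age.

```python
import time

class SegmentUsageTable:
    """Per-segment count of live bytes and newest modified time of any block."""
    def __init__(self, num_segments):
        self.live_bytes = [0] * num_segments
        self.newest_mtime = [0.0] * num_segments

    def block_written(self, seg, nbytes):
        self.live_bytes[seg] += nbytes
        self.newest_mtime[seg] = time.time()

    def block_died(self, seg, nbytes):
        # Called when a block is overwritten elsewhere in the log or its file is deleted.
        self.live_bytes[seg] -= nbytes

    def is_clean(self, seg):
        return self.live_bytes[seg] == 0   # reusable without any copying
```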

4.Evaluations:
The authors have evaluated the write costs of different cleaning policies for LFS well and compared them against Unix FFS, where LFS outperforms FFS even at high disk capacity utilization. The cost-benefit cleaning policy reduces write costs by about 50% compared to the greedy policy. To measure best-case performance, they compared the cost of creating/reading/deleting small files, where Sprite LFS performs better than SunOS, whose file system is based on Unix FFS. Sprite LFS can use around 70% of the disk bandwidth for writing, compared to 5-10% for FFS.
Measurements of the cleaning overhead are also presented; the cleaning costs are lower in Sprite LFS than in the simulations because larger files tend to be written and deleted in their entirety. The only missing part is a comparison of recovery times against Unix FFS. Overall the authors have reasoned well about their observations and backed up most of their claims in the evaluation.

5.Confusion:
How much of data is buffered in asynchronous writes before writing to the log structure?

1. Summary
This paper describes a file system in which all file modifications are buffered in a cache in main memory and then written sequentially to disk, which the authors then implemented. This design requires techniques for preserving large extents of free space on the disk, using regions of the disk called segments.

2. Problem
While hard drive capacity has increased exponentially over time, the time to access data on disk has improved much more slowly. This matters because processors have increased in speed at an exponential rate, and computer users expect all activities of their computer to speed up similarly. The problem could not be solved well in hardware, since most of the slowness of disk access stems from disk seeks, which face physical limits that cannot easily be overcome. In addition, the size of main memory has also increased at an exponential rate, allowing memory to cache larger and larger amounts of data from disk.

Existing file systems were not well-designed to handle these conditions. This is particularly true for interactive use, where users typically create many small files. This can lead to multiple, slow disk seeks.

3. Contribution
The paper describes a file system in which blocks of data are written to the disk sequentially. Since inodes are also appended to the end of the log, they may now appear anywhere on disk. To handle this, the file system includes an inode map, which stores the locations of the inodes. Since the blocks of the inode map can also be located anywhere on the disk, the file system includes a fixed checkpoint region which stores the locations of the inode map blocks, as well as other data.

Besides keeping track of inodes, the other, larger challenge for this new file system is maintaining long extents of free disk space into which to write new data. These are maintained using extents called segments. When more free space is needed, segments are selected based on the amount of live data (data still referenced from some file) they contain and the age of that data. The blocks of live data are then copied into a smaller number of other segments. Then, the inodes for the files that reference the data are updated to refer to the new locations of that data. This is done using a segment summary block, which is something like a reverse inode.

The checkpoint region is also used to recover from crashes. It contains a timestamp from a known time when the file system structures were consistent and complete, a pointer to the last segment written, and the locations of the inode map blocks mentioned previously. In case the machine crashes during a checkpoint operation, there are actually two checkpoint regions that are used alternately. The file system can also "roll forward" through log segments that were written after the last checkpoint: information in the segment summary blocks is used to update the inode map read from the checkpoint, which incorporates that data into the file system.
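A rough sketch of the roll-forward step described above, with hypothetical accessors: starting from the last checkpoint, recovery scans the log segments written afterwards and uses their summary blocks to bring the inode map (and segment utilizations) up to date.

```python
def roll_forward(disk, checkpoint):
    """Recover data written after the last checkpoint by replaying newer log segments."""
    imap = dict(checkpoint.inode_map)              # state as of the checkpoint
    usage = dict(checkpoint.segment_usage)
    for seg in disk.segments_after(checkpoint.last_segment):
        summary = disk.read_summary(seg)
        if summary is None or summary.timestamp <= checkpoint.timestamp:
            break                                  # past the end of the newly written log
        for inum, inode_addr in summary.new_inodes:
            imap[inum] = inode_addr                # newer inode supersedes the checkpointed one
        usage[seg] = summary.bytes_written         # adjust utilization for the new segment
    return imap, usage
```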

4. Evaluation
The authors evaluate their work on microbenchmarks, on real-world usage, and on induced crashes. The microbenchmarks test the performance on creating, reading, and writing small and large files. With small files, the new file system is around ten times faster for creating and deleting files; it is also faster for reading files. The paper's file system saturated the CPU while creating files, indicating that its speed would increase with increases in CPU speed. The microbenchmarks also demonstrated on large files that the file system described in this paper performs as well as others, despite being designed for small files.

The authors used real-world usage to evaluate the amount of time the file system took copying blocks to maintain regions of free space. They measured the time taken for managing segments in addition to the time taken to write the data the users actually had modified. They found that only around 40% more time was taken, and speculated that much of this extra work could be scheduled during idle periods so that it was less noticeable.

To test crash recovery, the authors wrote tests that created megabytes of fixed-size files and then crashed the system. They found that recovery time increased with the number and size of the files written. This tested both the checkpoint mechanism and the roll-forward mechanism.

5. Confusion
Do any of the tradeoffs in the performance motivations for this work change with solid state drives?


1. Summary
The authors introduce a new way to store data on disk, which combines previously well-known ideas such as logging, checkpointing, and roll-forward recovery.
2. Problem
The CPU speed has been increasing exponentially, while disk speed improvement has not been even close to the same rate. As a result, programs are now seeing more of a bottleneck when using storage rather than using the CPU. In addition, because main memory is getting bigger, allowing for a bigger buffer cache size, read requests are being satisfied efficiently, but writes are still a bottleneck.
3. Contributions
The goal was to improve the performance of the file system. Their design matches, and in some cases exceeds, the read performance of previous systems. For write operations, however, it is much faster because of the sequential nature of the log, which requires no seeks and also allows for faster crash recovery.
In this design, the file system writes a batch of changes to the end of the log, where the changes include file data as well as any file system structures. The challenge is what should happen once the log reaches the end of the disk. The authors propose dividing the disk into segments, using a threading technique to hop from segment to segment and a copying technique to move live data out of fragmented segments.
Another challenge is choosing which segments to free by copying their live data to a new location. After experimenting with a couple of policies to see which performs best, the authors arrived at the cost-benefit policy, which is based on the amount of free space that would be reclaimed and the likelihood of that space staying free.
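For reference, the ratio the paper uses is benefit/cost = ((1 - u) * age) / (1 + u), where u is the fraction of the segment still live and age is the time since its most recent modification. Below is a minimal sketch of ranking segments this way; the segment representation is hypothetical.

# Sketch of cost-benefit segment selection. Reading a candidate segment costs 1,
# rewriting its live fraction u costs u, and (1 - u) of free space is reclaimed,
# weighted by how long it is likely to stay free (its age).

def benefit_cost(segment):
    u = segment["utilization"]          # fraction of bytes still live, 0.0..1.0
    age = segment["age"]                # time since the youngest block was written
    if u >= 1.0:
        return 0.0                      # nothing to gain from a completely full segment
    return ((1.0 - u) * age) / (1.0 + u)

def pick_segments_to_clean(segments, count):
    """Choose the 'count' segments with the highest benefit-to-cost ratio."""
    return sorted(segments, key=benefit_cost, reverse=True)[:count]

# Example: an old, fairly full (cold) segment wins over a young, emptier (hot) one,
# which is exactly the behaviour the policy is designed to produce.
segments = [
    {"id": 1, "utilization": 0.75, "age": 1000},   # cold
    {"id": 2, "utilization": 0.30, "age": 10},     # hot
]
print([s["id"] for s in pick_segments_to_clean(segments, 1)])   # [1]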
The authors also introduce a checkpoint/roll-forward system for recovery. To checkpoint, the file system writes modified data to the log and then writes some information, such as the location of the last segment written, to a fixed checkpoint location. During recovery, the checkpoint location is read and roll-forward proceeds from there. The paper also discusses policies for when to checkpoint, such as checkpointing every X seconds or after writing X bytes to disk.
4. Evaluation
The authors first run a few basic benchmarks that simply create, read, and delete a large number of files, which to me does not seem comparable to a real workload. However, the authors do end up also measuring random reads and writes, as well as reading a file that was written randomly. At some point they mention that their system's performance for sequential writes is similar to a newer version of FFS that groups writes, which makes LFS's main advantage the random-write case.
They also evaluate the overhead of their cleaning mechanism by running the system for months and measuring how much performance suffers once the log reaches the end of the disk and cleaning kicks in. This is a sound way to test a system, since real daily workloads are used. It turns out that for their workload the write cost was not significant at all.
5. Confusion
Do you go through the log and keep calculating the cost-benefit for each segment until you find one that is under the threshold? Should there be a policy to stop at some point if it is taking too long to find a segment under threshold and just go with a segment already visited?
Are there more standard benchmarks for file systems today that were not available then?

Summary:
This paper discusses the design and implementation of a log-structured file system, primarily to improve the performance of disk writes and provide faster crash recovery.

Problem:
CPU speed is increasing rapidly. Disk speed is not increasing nearly as fast and is slowly becoming a bottleneck for most read/write operations. Reads are fast because of caching in RAM, but writes require disk access, and random writes to disk are typically slow. Many file accesses are also synchronous, so applications must wait for them to complete. Moreover, traditional file systems were very slow at crash recovery and required a full disk scan. The new design addresses these problems.

Contribution:
Write latency dominates in current systems, since most reads hit the cache. This design decreases write latency and makes crash recovery faster. Traditional structures such as inodes, indirect blocks, and data blocks are still used for efficient lookups, but free-space bitmaps and free lists are no longer needed. Segments are the basic unit of allocation: all writes are buffered in memory until a segment's worth of data has accumulated, then written out sequentially in one batch, with blocks stored in the order in which they were written. Modified inodes are also placed in the log, and an inode map is used to locate the current inode blocks. A cleaning policy is required to produce completely empty segments so that new segments can be written; the cleaning mechanism is invoked when the number of free segments falls below a threshold and continues until enough clean segments are available. The design distinguishes short-lived (hot) and long-lived (cold) segments, with cold segments being better candidates for compaction. Liveness is determined using the segment summary information in each segment, which holds metadata about the segment's blocks; this cleaning is, however, an additional overhead. Checkpointing is done at regular intervals for crash recovery, and two checkpoint regions are alternated to survive a failure during a checkpoint. Recovery is faster thanks to checkpointing plus roll-forward, which recovers changes made after the last checkpoint and drastically reduces the number of disk blocks that need to be examined.

Evaluation:
A prototype LFS, called Sprite LFS, was implemented. Its performance is an order of magnitude better for small-file writes, and equal or better for reads and large-file writes, compared to FFS. The cleaning policy was determined by trying out various policies on a simulator. LFS is faster for random writes but slower for sequential reads of files that were written randomly. CPU utilization is lower in LFS, so a faster CPU would speed up LFS further. The paper evaluates cleaning overhead and segment utilization using real-world workloads and gives statistics on the ratio of metadata to data blocks on disk. The overhead of age sorting is not clear, since a breakdown of the cleaning-policy overhead is not provided. The paper examines uniform and hot-and-cold access patterns to evaluate segment cleaning policies, but real access patterns could push disk utilization higher and require a much higher rate of cleaning. No statistics are provided on how write cost and cleaning cost change with segment size. Comparison against a few more file systems would also have helped, given the drastic design changes in this paper.

Issues:
Cost structure of cleaning policy. What about the cost of metadata update in the “write cost”?

Summary:
The paper describes the idea and implementation of the log-structured file system (LFS) which aims at enhancing the performance of small file writes and fast recovery from crashes. It does this by writing all modifications sequentially in a log-like structure that has both the data and the metadata information.
Problem:
File systems until then offered poor performance because of synchronous writes to the disk. Hence, applications were not getting enough performance benefit even as processor performance increased. On the other hand, file systems that cached writes in memory before writing to disk had low bandwidth utilization because of fragmented writes and ran the risk of data loss across crashes. This encouraged the authors to design LFS to improve the write performance of small files and provide fast crash recovery.
Contributions:
LFS does away with UNIX FFS style of having the inode table and the file data in non-contiguous chunks. Instead, LFS caches the writes to the disk. It then does a single sequential write of both the data and metadata structures as a log. LFS maintains the location of the inodes using the inode map whose location is determined from a fixed checkpoint region. The authors claim that the inode map is small enough to be cached in memory.
LFS manages free space using segments. Each segment is a contiguous block of storage of a fixed size, and segments are the basic unit at which cleaning is done. Each segment contains a segment summary that holds information about the data in the segment. The file system cleaner reads the summary and decides on the liveness of each block. This approach removes the need to maintain free-list or bitmap data structures and thus reduces complexity.
LFS also maintains two checkpoint regions that hold the last consistent state of the disk (including a pointer to the last segment written at the end of the log). These two regions are at fixed locations on the disk and are written alternately to handle the possibility of a crash during the checkpointing operation.
Since recovery from a crash using the checkpoint mechanism described above is nearly instantaneous, LFS also tries to recover data written after the last checkpoint. During the roll-forward phase, LFS reads the segment summary blocks, checks whether new inodes are present, and updates the inode map accordingly. To maintain consistency between inodes and directory entries, LFS keeps a directory operation log that records each directory change along with the inodes involved; this log entry is written before the corresponding directory block and inode, so roll-forward can restore consistency by simply following the log.
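A simplified sketch of what roll-forward might look like, assuming each segment records a write time, a pointer to the next segment in the log, and summary entries that flag newly written inodes; this is an illustration of the idea, not Sprite LFS's code:

# Sketch of roll-forward: starting from the last checkpoint, walk the log
# segments written afterwards and fold any newly written inodes back into
# the inode map. Data blocks whose inode never reached the log are ignored.

def roll_forward(checkpoint, read_segment, imap):
    # Start at the segment following the one recorded in the checkpoint.
    seg_addr = checkpoint["next_segment"]
    while seg_addr is not None:
        seg = read_segment(seg_addr)          # returns None past the end of the log
        if seg is None or seg["write_time"] <= checkpoint["timestamp"]:
            break                             # reached old data: end of the post-checkpoint log
        for entry in seg["summary"]:
            if entry["kind"] == "inode":
                # A complete inode implies its data blocks are already on disk,
                # so pointing the inode map at it recovers the whole update.
                imap[entry["inum"]] = entry["addr"]
        seg_addr = seg["next_segment"]
    return imap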
Evaluation:
The authors have evaluated two different policies for the segment cleaning mechanism, albeit on a simulator. The policies deal with what segments need to be chosen for cleaning. The results show that a cost-benefit based policy is better than a naïve greedy policy. The authors have demonstrated the performance improvement of small file writes and how the asynchronous writes take advantage of the processing power. The authors also demonstrate that the cost-benefit policy is able to maintain a bimodal distribution of segments. The paper also determines the recovery time with different file sizes. Overall, the authors have covered all the bases here. They have demonstrated that LFS does what it claims to do. I feel the paper could have evaluated the cleaning overheads when the disk utilization is close to disk capacity. This could show the worst case performance of LFS and help in making cleaning policy decisions under such conditions.

Confusion:
Is LFS currently being used in any modern OSes?

1. Summary
The paper presents Log structured file system(LFS), which aims at improving the disk bandwidth utilization compared to the existing file systems like FFS by taking advantage of the increased primary memory and read-write access patterns.

2. Problems
Disk technology improvements were not able to keep up with CPU performance improvements. When there are a large number of small disk writes interleaved with seeks, disk access time (which depends on mechanical factors such as rotational speed) squanders the performance benefit of faster CPUs and larger main memories. The authors note that in such a scenario, disk traffic is dominated by writes rather than reads, since reads are mostly served from pages cached in main memory. Existing file systems like FFS do not take this behaviour into account and spend much of the disk's time updating metadata spread across the disk, which increases seek time. Effective disk bandwidth utilization suffers as a result (around 5% for FFS). Also, since metadata updates are synchronous, they further hurt overall performance.

3. Contribution
1. The major contribution of the paper is the design of LFS, which writes both metadata and data blocks into an append-only log, buffered in memory and written back to disk either periodically or when a segment fills up. The advantage is that the writes are sequential, so seek time is greatly reduced. The paper introduces data structures like the inode map and checkpoint region to track the data blocks of a file on disk.

2. LFS introduces fixed-size extents called segments, each of which is written to disk sequentially. The proposed free-space management is a combination of threading and copy-and-compact techniques: live blocks are copied out of fragmented segments to create contiguous free space, while the segments themselves are threaded together, which also helps when recovering from crashes. Segment summary data structures identify live blocks and help the cleaner aggregate them elsewhere, freeing segments for future use. The paper also discusses the factors affecting segment cleaning policies.

3. Cost-benefit policy: The paper proposes a new policy which takes the age into consideration while selecting segments for cleaning. The proposed policy cleans cold segments at a higher utilization rate than hot segments thus achieving a bimodal distribution which results in lower overhead due to cleaning. This mechanism improves the write cost significantly.

4. Crash Recovery: LFS borrows a two-step crash recovery mechanism from other logging-based systems. Checkpointing ensures system consistency. Sprite LFS alternates between two checkpoint regions (using the most recent one at reboot), each of which contains the addresses of the inode map and segment usage table blocks, a timestamp, and a pointer to the last segment written. The roll-forward operation scans the segments written after the last checkpoint and tries to restore as much of the system state as possible with no loss of information.

4. Evaluation
The evaluation in the paper is in two phases. First, the authors employ a file system simulator to analyze the cleaning policies, as it lets them control the access pattern and locality. This evaluation shows how a realistic access pattern like hot-and-cold can perform significantly worse than expected compared to a uniform access pattern. The authors go on to propose the cost-benefit policy, which corrects this anomalous behaviour.
The second part of the evaluation records the experience of implementing Sprite LFS on a real system and evaluates its implementation complexity, the cost of creating, reading, and deleting files, and various access patterns such as random and sequential reads and writes. The authors compare the performance of LFS against SunOS's FFS for all of these criteria. The paper also presents results on segment utilization and average write cost, and tabulates other useful metrics. Since LFS buffers entire segments (and related data structures) in memory, it would have been interesting to see the memory footprint of this technique compared to FFS, which uses bitmaps and other data structures for tracking data blocks.

5. Confusion
How accurate is the authors' claim that crashes are rare and that losing a few minutes of data in a crash is acceptable? What kinds of recovery mechanisms are employed now?
Why does Unix FFS write metadata synchronously?

1. Summary: This paper presents the log-structured file system, which uses techniques like write buffering, sequential writes, and checkpointing to allow faster crash recovery and achieve write performance an order of magnitude better than existing FSs like the Fast File System.
2. Problem: Contemporary file systems treated disks as random access memory. They provided logical locality: a directory and its related files were kept together. But either the metadata was somewhere else (UFS), or the data was written out in small chunks (UFS+FFS). The latter resulted in lots of small synchronous writes, with much of the disk bandwidth wasted in seeks. Thus, CPU performance was tied to disk performance, and vast amounts of CPU potential remained untapped.
3. Contribution: The authors recognized that technology was changing (increasing CPU performance and RAM). Thus, they could buffer more data in RAM, and reads would increasingly be serviced from RAM itself, leaving random writes as the main bottleneck for disk performance. Their biggest contribution was realizing that writes can always be made sequential: just write everything to the log (data + inodes) and never go back to update previous structures in place. They combined this with write buffering (segments) to convert lots of small synchronous writes into one big asynchronous write, freeing the CPU. Of course, they then had to keep track of the most recent inode and the inode-to-disk-address translation, for which they introduced structures like the imap. Such a strategy of never overwriting older structures leaves multiple copies of files on disk, like different versions; this idea was later used as the basis of versioning file systems like WAFL. In their implementation, the authors used garbage-collection-like strategies to produce clean segments, introducing structures like the segment usage table and segment summary. I particularly liked how they innovatively borrowed from other fields like programming languages and databases to solve their problem. One other big contribution was the crash recovery strategy: by keeping checkpoints, they removed the need to scan the entire disk, since only the log written after the most recent checkpoint has to be scanned, vastly improving recovery time. In the process of radically changing how writes to disk are performed, the authors also removed free-space management structures like free lists and bitmaps.
4. Evaluation: The authors evaluate their policies both by simulation and by implementing them in Sprite LFS. They also provide projected performance of their system with increasing CPU speed. A couple of things were missing: they did not run segment cleaning while evaluating with microbenchmarks, so they only evaluate the best-case performance of their implementation; in fact, garbage collection became a major source of debate about LFS later on. They also do not evaluate how performance varies with segment size; they assume segments are large enough that transfer time dominates seek and rotation time, and this would have been an interesting statistic. While the system was designed to be tunable (when to run the segment cleaner, for example), the corresponding evaluation was not done.
5. Confusion: If checkpoints had to track the Inode map, why not track the inodes itself? What is the need to explicitly store one extra meta-data, when it can be stored as checkpoint, and other times in RAM? How would LFS perform on Flash storage which do not have long seeks?

1. Summary
The paper presents a new technique for disk storage management called a log-structured file system. This file system writes modifications to disk segments in a sequential, log-like structure, using the disk efficiently for writes and enabling faster crash recovery. Indexing information is stored along with the log to enable fast reads. The authors also propose a cleaning policy to ensure that free segments remain available for the log.


2. Problem
Disk access times are improving only slowly compared to CPU speeds and main memory sizes, which provide better file caching; as a result, applications become disk bound. Workloads that access many small files suffer from too many small disk accesses (using only about 5% of the disk's bandwidth) and synchronous writes, coupling application performance to that of the disk. The slowness of disk writes due to seeks is the problem targeted by the log-structured file system: writing information to disk in a sequential log eliminates most of the required seeks, enabling much better disk bandwidth utilization for writes.

3. Contributions
The main contribution of this paper is the idea of treating the disk as a log where data is written to, and permanently stored in, a log-like structure. Sequences of file system changes are buffered in the file cache and written to disk in a single disk write operation. The authors explain the two issues that must be addressed to reap the benefits of faster writes: retrieving information from the log, and managing space on the disk so that large extents of free space are available for writing. The first problem is handled by maintaining a map/index that yields the disk address of an inode in the log. The paper introduces the notion of segments and a combination of threading and copying to divide the disk into large extents. Segment cleaning copies out live data so that a segment can be reused for new data; an advantage of this approach is that no free-list or bitmap structures are needed to track free space. The policy decisions addressed include when the cleaner should execute, how many segments to clean at a time, which segments should be cleaned, and how live blocks should be grouped when written out.

4. Evaluation
The key to achieving high performance at low cost in a log-structured file system is to force the disk into a bimodal segment distribution, in which most segments are nearly full and a few are empty or nearly empty. The authors use a simulator to test segment cleaning policies and choose the cost-benefit policy, which does produce a bimodal distribution in the hot-and-cold example case. The authors compare their implementation of LFS with Unix FFS and show that LFS performs an order of magnitude better for small-file writes, the case it was mainly targeted at. LFS uses up to 70% of the disk bandwidth for writes, compared to 5-10% for Unix FFS. For large files, LFS performs comparably to FFS. The cost of segment cleaning was measured experimentally, and the results were much better than those derived from the simulations. The authors also give an idea of recovery time after a crash, based on the number and size of files written after the checkpoint. They do not compare LFS with FFS for sequential rereads, for which FFS is expected to perform better.

5. Confusion
What exactly is live data? Could you cover in class about the segment cleaning mechanism?

1. Summary
The paper introduces a new file system called LFS (Log-Structured File System). The authors discuss the motivation and challenges behind this new file system. They also created a prototype called Sprite LFS and show how this system (along with some of their policies) performs for different read/write operations.

2. Problem
The improvement in disk speed (especially access time) has been slow compared to the increase in CPU performance and memory size. A lot of time is spent doing disk writes, and many applications cannot take advantage of faster CPUs because the disk acts as a bottleneck (especially for synchronous writes). Other challenges the authors faced were that a file's information is usually spread out across the disk and that workloads are dominated by accesses to small files.

3. Contributions
They introduce a new file system, LFS, in which the log contains all file system information. One of the main things this design does is create new copies on each update instead of updating in place (I believe this is one of the main ideas and the reason for the improved performance). Some of the changes they had to make to the data structures were introducing the imap, segment summary blocks, the checkpoint region, and the segment usage table. They divide the disk into segments and combine the ideas of copying and threading (all operations are performed on a segment-by-segment basis). They spend a fair amount of time explaining their cleaning policy (when/what to clean), where they introduce a new metric called "write cost" and show that the "cost-benefit policy" (clean cold segments sooner and hot segments later) performs better than other policies. They then discuss crash recovery, where they use checkpoints and roll-forward. One interesting point is that they use two checkpoint regions to ensure recovery from a failure during the checkpoint operation, and they use a short checkpoint interval of 30 seconds.
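As a reminder of what the "write cost" metric measures: total disk I/O divided by new data written. Below is a tiny sketch, under the paper's simplifying assumption that cleaning a segment with live fraction u means reading the whole segment and rewriting its live part, which gives a cost of 2 / (1 - u) (and an ideal cost of 1 for an already-empty segment).

# Sketch of the "write cost" metric: (bytes read + bytes written) / (new data bytes).
# Cleaning at live fraction u: read the whole segment (1), rewrite the live part (u),
# and fill the remaining (1 - u) with new data -> cost = 2 / (1 - u).

def write_cost(u):
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 if u == 0.0 else 2.0 / (1.0 - u)

for u in (0.0, 0.2, 0.5, 0.8):
    print(f"u = {u:.1f} -> write cost = {write_cost(u):.1f}")
# At u = 0.8, ten bytes move for every new byte written, which is why the
# bimodal segment distribution the cleaner aims for matters so much.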

4. Evaluation
They compare the prototype they created, Sprite LFS, with SunOS on a microbenchmark that creates, reads, and deletes a large number of small files. It clearly outperforms SunOS for small-file reads and writes. For large-file operations, SunOS did better only for sequential reads after the file was written randomly. SunOS exploits logical locality and LFS exploits temporal locality, so the performance of different workloads will vary depending on which locality is better suited. One problem with the evaluation was that the simulated and real results for cleaning cost differed a lot (in practice the workload was very different from the simulated one). A question I have regarding this: are simulated results usually accepted in today's research papers, and how reliable are they considered? The authors were also able to justify their policy choices and checkpoint timeout using various experiments.

5. Confusion
What File Systems are usually preferred in real-time systems? I didn't understand why FFS is better than LFS for sequential reads after random writes

1. Summary
This paper proposes a file system that is optimized for small file writes and performs as fast as Unix FFS for reads and large files. The major idea is to pack a series of random writes into a single sequential write to a log.

2. Problem
The gap between the speed of the CPU and the disk is growing, and the presence of large memories allows most read requests to be served from cache. The performance bottleneck of many systems therefore becomes disk write time, which is determined by transfer rate and access time. The former can be improved by using disk arrays, while the latter is not well optimized by existing file systems.

3. Contributions
Sprite LFS uses the sequential log as the permanent place for data. Inodes are no longer stored at fixed locations but are indexed by the inode map. This groups the changes to file data and metadata together at the end of the log, providing better temporal locality.
To maintain large free extents on disk, Sprite LFS divides the disk and the log into fixed-size segments. Logs are sequential inside a segment but segments can be in arbitrary order. This reduces the amount of copying needed compared to maintaining a completely sequential log.
The cost-benefit cleaning policy reduces cleaning overhead by treating cold and hot data differently. Segments with more free space, a potentially longer lifetime, and lower copying cost are preferred. Cold data from cleaned segments is grouped together to produce more stable segments.
Crash recovery is done with two techniques: checkpoints and roll-forward. A checkpoint captures a point at which the file system is in a consistent state and is written periodically to reserved checkpoint regions on disk. During recovery, the checkpoint with the most recent timestamp is used, and changes after that are recovered on a best-effort basis by roll-forward.

4. Evaluation
The authors evaluated their work in four aspects: I/O performance, cleaning overhead, recovery speed and other overhead.
In the I/O performance micro-benchmarking, Sprite LFS beats Unix FFS in sequentially creating, reading, and deleting small files. On large files, Sprite LFS is faster for writing, comparable for reading, and slower for sequential reading after random writing; the last result is due to the different kinds of locality the two file systems try to achieve. But since their machine was not fast enough to make the current workload disk-bound, this part is less convincing.
For cleaning overhead, statistics on real systems are given. By showing that many cleaned segments were actually empty, they conclude the overhead is smaller than the pessimistic expectation.
The parts on recovery speed and other overheads are more focused on demonstration: they measure some relevant parameters and discuss how they can be improved.

5. Confusion
How do they know the first two policies for cleaning are less important, even though they didn’t do any experiments on it?
Is the checkpoint region the only thing at a fixed location?

1. summary
This paper introduces a new file system, the log-structured file system, in which most read requests are served from a main-memory cache, so the design focuses on speeding up small, randomly addressed file writes as well as crash recovery using log-based checkpoints. The log-structured file system outperforms the traditional Unix file system for most read and write operations regardless of file size.

2. Problem
The current file system, FFS, has poor write performance when there are small random writes, because it must make several disk accesses to fetch the inodes of directories and the file before writing the data.
Using main memory as a write buffer between the CPU and the disk mitigates the speed gap between them, but data stored only in main memory is lost when a crash occurs. The current Unix system needs to scan the whole disk to recover from a crash, which takes a long time, while a log-structured system does not need such a scan. On the other hand, the LFS in this paper requires large extents of free space so that new data, even for existing files, can be written to new locations.

3. Contributions
The key contribution here is a file system built on a log-based mechanism. LFS adopts sequential writes to provide write efficiency, at some sacrifice of sequential read locality. To make LFS work efficiently, the paper comes up with the policies below.
To reduce the seek time for the inodes of directories and files, LFS uses inodes together with an inode map. The inode map, which is compact enough to be cached in memory, maintains the current location of each inode.
Sequential writing eventually causes fragmentation: the data within a segment (many small, unrelated writes) die at different times, leaving a mix of live and dead blocks in a segment. Therefore, cleaning must be performed regularly by a cleaner.
The segment cleaner involves several policy choices: when to execute it (e.g., in the background at low load), how many segments to clean at once, which segments to clean, and how to group live blocks by locality when writing them out. The cost-benefit policy is chosen because it works well when both cleaning cost and disk utilization are considered.
LFS uses the log to keep the file system consistent across crashes. Recovery is more efficient than in Unix FFS because recent changes are all at the tail of the log rather than scattered across the disk. LFS checkpoints periodically, storing the map data at a fixed place on disk, and restores state from that place (plus roll-forward) when the system recovers after a crash, reducing recovery time.

4. Evaluation
LFS is evaluated with microbenchmarks, which show that create, read, and delete operations outperform SunOS; note that these results were obtained at low disk utilization, with no cleaning. The authors then run an experiment with large files, and the results show performance better than or equal to SunOS except for sequential reads of a randomly written file. I wonder why the authors did not show disk utilization in this experiment, and why they present cleaning in a separate section rather than combining the experiments, given that they already mention performance degrades at high disk utilization. In addition, they did not show how much more reliable LFS is than SunOS.

5. Confusion
How does the segment concept in LFS work with paging?
Would it work well on byte-addressable non-volatile memory, e.g. a PCRAM disk?

1. Summary
Log-structured File System was designed to improve write performance. It buffers all new data along with the metadata into an in-memory segment which is finally written to disk in one long sequential write to a free location. It creates large extents of free space using a segment cleaner process that incorporates policies and algorithms with a good cost-benefit ratio. Finally, the authors describe a crash recovery technique based on checkpoints and roll-forward. They evaluate against Unix FFS in SunOS and show a tremendous improvement in disk bandwidth usage.
2. Problem
There were issues with existing file systems such as FFS: sequential writes get spread around the disk, the inode and contents of the same file live at different locations, and writes are performed synchronously, which hurts application performance. Meanwhile, technological trends such as larger memories mean more data is cached, so disk traffic increasingly consists of writes (reads are serviced by the cache). Thus, file system performance is largely determined by write performance.
3. Contributions
To write sequentially and effectively, LFS buffers all updates in memory until they reach a threshold segment size, ensuring efficient use of the expensive disk resource. The buffered information includes both the contents and the metadata of files. The index structures in the log (the inode map) allow information to be retrieved with a random read, so read performance is not harmed. A novel way of managing free space is proposed in this work: segment cleaning identifies versions of data by looking at a structure called the segment summary block and then determines the liveness of each block. Cleaning is governed by policies that determine both when to clean and which segments are worth cleaning. This work focuses on segment selection, grouping data as hot or cold depending on how frequently it is modified; through various experiments they arrive at the heuristic of cleaning hot segments later and cold segments sooner (i.e., at higher utilization). Finally, the authors tackle crash recovery: they use checkpoints containing all the needed bookkeeping plus the current time, written alternately to two fixed positions on disk to handle a crash during checkpointing, and they use roll-forward to recover the updates made after the last checkpoint.
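A minimal sketch of the write-buffering idea described above, with illustrative block and segment sizes and a hypothetical flush callback standing in for the actual disk write:

# Sketch of write buffering: dirty blocks accumulate in an in-memory segment
# and are flushed to disk in one long sequential write once the segment fills
# (a checkpoint or fsync could also force an early flush).

BLOCK_SIZE = 4096
SEGMENT_SIZE = 512 * 1024           # illustrative segment size

class SegmentWriter:
    def __init__(self, write_segment_to_disk):
        self.buffer = []            # list of ((inode number, offset), block bytes)
        self.write_segment_to_disk = write_segment_to_disk

    def append(self, inum, offset, data):
        self.buffer.append(((inum, offset), data))
        if len(self.buffer) * BLOCK_SIZE >= SEGMENT_SIZE:
            self.flush()

    def flush(self):
        if self.buffer:
            # One sequential write of the data blocks plus their summary entries.
            self.write_segment_to_disk(self.buffer)
            self.buffer = []

# Usage: w = SegmentWriter(lambda seg: print(len(seg), "blocks written sequentially"))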
The LFS approach of writing all updates into free space on the disk uses disk bandwidth effectively, but it makes the cleaning process complex and costly and requires maintaining many data structures. All in all, though, this work is well written, has been influential, and provides strong, sound reasoning for all of its contributions.
4. Evaluation
The authors implement a prototype LFS called Sprite LFS, and the evaluation shows that it outperforms Unix FFS by an order of magnitude for small-file writes while matching or exceeding its performance for reads and large writes. When the overhead of cleaning is included, Sprite LFS can use about 70% of the disk bandwidth for writing, whereas Unix FFS typically uses only 5-10%. With various simulations, they settle on the heuristics for the cleaning policy. They state that the performance of LFS would improve by a factor of 4-6 as CPUs get faster, since the small-file benchmark saturates the CPU rather than the disk. While traditional FFS achieves logical locality, LFS achieves temporal locality, so their behavior differs with workload: LFS is better for random writes, whereas FFS is better for sequential rereads. They also report cleaning overheads and segment utilization to characterize real-world workload patterns (hot/cold regions) and behaviors (usually longer files).
There have been objections to the measurements made in this work {Seltzer, '95}, which were rebutted by Ousterhout {https://goo.gl/mgFb4D}. It would have helped if they had tabulated a few such real-world write workloads comparing LFS vs. FFS, and compared against a few more file systems.
5.Confusion
I couldn't clearly understand the quantification of the cleaning policy and its cost, and how to interpret the results based on them. Why is FFS efficient at sequential rereads while LFS is not, given their different notions of locality?

Summary

The paper presents LFS, a new filesystem design whose basic idea is to handle reads through caching and writes by appending large segments to a log, thereby increasing disk performance for write operations.

Problem

Filesystem design is governed mostly by two factors: technology and workload. CPUs are getting faster, main memory is becoming larger and faster, and disks are rapidly growing in capacity even though disk performance is not improving nearly as quickly. Thus, modern file systems can cache more recently used data in main memory; most reads are satisfied from the buffer cache and disk traffic mostly consists of writes. To speed things up, writes need to happen faster, but disk performance is limited by disk head movement. Small-file workloads usually result in small random I/Os, and a large fraction of what is written to disk is metadata rather than file data. This data and metadata are spread across the disk to achieve locality, so operations must issue several disk seeks, resulting in lower performance.

Contribution

The fundamental idea used here is to treat the disk as a log where data is always written at the head. All file system changes are accumulated in the buffer cache and then written to disk sequentially. This lets writes use nearly the full bandwidth of the raw disk, improving write performance. To retrieve information from the log (read operations), LFS uses several data structures to allow random access: an inode map identifies where in the log an inode is located, the inode map itself is written to the log, and the location of the inode map blocks is recorded in fixed checkpoint regions. Once the inode is found, the read operation is analogous to Unix FFS. To manage free disk space so that large sections of disk are available for writes, the disk is divided into large fixed-size extents called segments, which are always written sequentially from beginning to end. The segment cleaning mechanism reads a number of segments into memory, identifies the live data, and writes that live data back to a smaller number of clean segments. The segment cleaning policy chooses segments based on utilization (how much is to be gained by cleaning them) and age (how likely the data is to change soon). To support this policy, a segment usage table is maintained, holding the amount of live data in each segment and the most recent modification time of any block in the segment. To handle crash recovery, LFS regularly checkpoints so that all the file system structures on disk are consistent and complete. After a crash, LFS uses a roll-forward operation, scanning through the log segments written after the last checkpoint, to recover recently written data and restore the consistency of the file system structures.
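As one example of the bookkeeping described above, here is a sketch of what a segment usage table could look like, tracking live bytes and the most recent modification time per segment; the layout is illustrative, not Sprite LFS's actual structure.

# Sketch of a segment usage table: one record per segment, tracking how many
# bytes are still live and the age (newest modify time) of its data.
# The cleaner consults it to rank candidate segments.

class SegmentUsageTable:
    def __init__(self, num_segments, segment_size):
        self.segment_size = segment_size
        self.live_bytes = [0] * num_segments
        self.newest_mtime = [0.0] * num_segments

    def block_written(self, seg, nbytes, now):
        # A new block landed in 'seg'.
        self.live_bytes[seg] += nbytes
        self.newest_mtime[seg] = max(self.newest_mtime[seg], now)

    def block_died(self, seg, nbytes):
        # A block in 'seg' was overwritten elsewhere or its file was deleted.
        self.live_bytes[seg] -= nbytes

    def utilization(self, seg):
        return self.live_bytes[seg] / self.segment_size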

Evaluation

I am impressed by the amount of effort spent on the evaluation. The authors used a simulator to experiment with various segment cleaning policies before deciding on the best one, and the results convincingly back their choice of policy. They then describe real-world usage of Sprite LFS in their department, which shows the confidence they had in their idea. The authors compare Sprite LFS against Unix FFS with various microbenchmarks to illustrate the strengths and weaknesses of the two file systems. In the first test (creates/reads/deletes), the benchmark is run to obtain best-case performance (without segment cleaning), and LFS performed almost 10x faster than FFS for creates and deletes. The results also showed that, since CPU utilization is lower than in FFS, a faster CPU would improve the performance of LFS by a factor of 4-6. For large files too, LFS performed better than or on par with FFS. To measure the cost of segment cleaning, the authors recorded statistics from production systems for several months and found the results substantially better than the simulation results, with overall write costs ranging from 1.2 to 1.6. The authors also measure recovery time after a crash and observe that it varies with the number and size of files written between the last checkpoint and the crash. Lastly, they show the breakdown of data versus metadata on disk: the majority of live data consists of file data blocks, though about 13% is metadata, which the authors expect to reduce once the roll-forward recovery mechanism is installed.

Confusion
I don't have any confusion about this paper, as I have spent around 28% of my lifetime writing filesystem (WAFL) code that is similar to LFS.
