Nitin Agrawal's Home Page

Intro

I'm a sixth (and final!) year PhD candidate working with Professors Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau as part of the Advanced Systems Lab. During my PhD, I have interned at Microsoft Research Redmond, Microsoft Research Silicon Valley and IBM Almaden Research.

I am looking for a full-time position beginning in Spring/Summer 2009.
CV (in pdf)

Recent news: my FAST '09 paper received the Best Paper Award!

Research

My research interests lie broadly in systems, with an emphasis on operating systems and file & storage systems.
1. Design and evaluation of novel techniques for benchmarking applications and file systems [FAST '09, HotMetrics '08, SUBMIT '09]
2. Designing and understanding solid-state drives (SSDs) [USENIX '08]
3. Large-scale longitudinal study of file-system metadata [FAST '07, ACM TOS '07]
4. Analyses and solutions for reliability in the storage stack [SOSP '05, DSN '08, USENIX '06]
5. Techniques for deconstructing commercial storage clusters (EMC Centera) [ISCA '05]

Representative Publications

Generating Realistic Impressions for File-System Benchmarking
Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau.
Proceedings of the 7th Conference on File and Storage Technologies (FAST '09), Feb 2009, San Francisco, CA.
Available as: Abstract, Postscript, PDF, BibTeX
Best Paper Award
Download the Impressions framework source code here (coming soon!)

Design Tradeoffs for SSD Performance
Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Mark Manasse, Rina Panigrahy.
USENIX Annual Technical Conference (USENIX '08), June 2008, Boston, MA.
Available as: Abstract, Postscript, PDF, BibTeX [Press: Storagemojo]
Download the SSD simulator source code here

A Five-Year Study of File-System Metadata
Nitin Agrawal, William J. Bolosky, John R. Douceur, Jacob R. Lorch.
Proceedings of the 5th Conference on File and Storage Technologies (FAST '07), Feb 2007, San Jose, CA.
Available as: Abstract, Postscript, PDF, BibTeX
Selected as a top paper and forwarded to ACM TOS
Download the dataset from SNIA's IOTTA Repository here

Towards Realistic File-System Benchmarks with CodeMRI
Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau.
Appears in ACM HotMetrics '08, June 2008, Annapolis, MD.
and in SIGMETRICS Performance Evaluation Review (PER), Volume 36, Issue 2 (Sep 2008)
Available as: Abstract, Postscript, PDF, BibTeX

Other Publications

IRON File Systems
Vijayan Prabhakaran, Lakshmi Bairavasundaram, Nitin Agrawal, Haryadi Gunawi, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau.
Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP '05), October 2005, Brighton, UK
Available as: Abstract, Postscript, PDF, BibTeX

Deconstructing Commodity Storage Clusters
Haryadi S. Gunawi, Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Jiri Schindler.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA '05), June 2005, Madison, WI
Available as: Abstract, Postscript, PDF, BibTeX

Analyzing the Effects of Disk Pointer Corruption
Lakshmi Bairavasundaram, Meenali Rungta, Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift.
38th International Conference on Dependable Systems and Networks (DSN '08), June 2008, Anchorage, AK.
Available as: Abstract, Postscript, PDF, BibTeX

A Five-Year Study of File-System Metadata
Nitin Agrawal, William J. Bolosky, John R. Douceur, Jacob R. Lorch.
ACM Transactions on Storage (TOS), Volume 3, Issue 3 (Oct 2007)
Available as: Abstract, Postscript, PDF, BibTeX

Still Other Publications

Symbolic Rule-Extraction from Artificial Neural Networks
Nitin Agrawal.
Bachelor's Thesis, Department of Computer Sciences, Institute of Technology, BHU, India, May 2003

Description of Research

Enabling Realistic and Practical File-System Benchmarking (thesis research):

Everyone cares about data, from scientists running simulations to families storing photos and tax returns. Thus, the file and storage systems that store and retrieve our important data play an essential role in our computer systems. Despite tremendous advances in file-system design, benchmarking approaches still lag far behind. My dissertation research bridges this gap with three contributions.

In the first part of my thesis, I perform a large-scale analysis of file-system metadata collected over a period of five years. We used the metadata snapshots to study temporal changes in file size, file age, file-type frequency, namespace structure, and other properties, and we derive lessons for designers of file systems and related software. We also present a generative model that explains the namespace structure and the distribution of directory sizes [FAST ’07, TOS ’07].
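As a rough illustration of the kind of per-snapshot statistics involved, here is a minimal Python sketch that walks a directory tree and tallies file sizes (in power-of-two bins) and file types (by extension). It is not the tooling used for the study; the function name `snapshot_summary` and the binning scheme are assumptions made for illustration.

```python
import os
from collections import Counter


def snapshot_summary(root):
    """Summarize one metadata snapshot of the tree under `root`.

    Returns two Counters:
      size_bins  : bin k counts files with size in [2**(k-1), 2**k);
                   bin 0 counts zero-byte files.
      type_counts: file count per lowercase extension ("<none>" if absent).
    """
    size_bins = Counter()
    type_counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # don't follow symlinks
            size_bins[st.st_size.bit_length()] += 1
            ext = os.path.splitext(name)[1].lower() or "<none>"
            type_counts[ext] += 1
    return size_bins, type_counts
```

Comparing these Counters across snapshots taken at different times yields the kind of temporal trends described above.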
    Once we understand the properties of metadata, we need a mechanism to put this information to practical use. Most system designers and evaluators rely on ad hoc assumptions and (often inaccurate) rules of thumb. Furthermore, the lack of standardization and reproducibility makes file-system benchmarking ineffective. To remedy these problems, I develop Impressions, a framework to generate statistically accurate file-system images with realistic metadata and content. Impressions flexibly supports user-specified constraints on various file-system parameters, using a number of statistical techniques to generate consistent images. Using desktop search as a case study, I demonstrate that incorporating the effects of metadata and file content is crucial to understanding the performance and storage characteristics of file systems and applications [FAST ’09].
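To make the idea of a statistically generated image concrete, here is a minimal Python sketch that produces a reproducible manifest of directories and files. It is not the Impressions code: the lognormal file-size model, its parameters `mu` and `sigma`, the parent-selection rule for directories, uniform file placement, and the names `generate_image` and `FileSpec` are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class FileSpec:
    path: str
    size: int  # bytes


def generate_image(num_files, num_dirs, mu=9.0, sigma=2.5, seed=0):
    """Return (directory paths, file specs) for a synthetic file-system image.

    Hypothetical models, for illustration only:
      * file sizes are drawn from a lognormal(mu, sigma) distribution, in bytes;
      * each new directory picks its parent with probability proportional to
        (the parent's current subdirectory count + 2).
    """
    rng = random.Random(seed)  # a fixed seed makes the image reproducible
    dirs = ["/"]
    subdir_count = [0]
    for i in range(1, num_dirs):
        weights = [c + 2 for c in subdir_count]
        parent = rng.choices(range(len(dirs)), weights=weights)[0]
        subdir_count[parent] += 1
        dirs.append(dirs[parent].rstrip("/") + f"/d{i}")
        subdir_count.append(0)

    files = []
    for j in range(num_files):
        parent = rng.randrange(len(dirs))  # files placed uniformly (assumption)
        size = int(rng.lognormvariate(mu, sigma))
        files.append(FileSpec(dirs[parent].rstrip("/") + f"/f{j}", size))
    return dirs, files


dirs, files = generate_image(num_files=1000, num_dirs=100)
print(len(dirs), len(files))  # 100 1000
```

Because the generator is seeded, the same parameters always produce the same image, which addresses the reproducibility concern; a real tool would also create the files on disk and fill them with content.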
    In the second part of my thesis, I investigate techniques for creating realistic benchmark workloads. Synthetic file-system benchmarks are widely used, but they are largely based on the benchmark writer’s interpretation of the real workload. This approximation is insufficient because even a simple operation through the API may exercise the file system in very different ways due to features such as caching and prefetching. I have taken the first steps toward creating “realistic synthetic” benchmarks by building a tool, CodeMRI, that leverages file-system domain knowledge and a small amount of system profiling to better understand how a benchmark stresses the system and to deconstruct its workload [HotMetrics ’08, PER ’08].
    The last part of my thesis addresses the problem of scalable benchmarking. Storage capacities have increased tremendously in the past few years; terabyte-sized disks are now readily available for desktop computers. However, benchmarking file systems and applications that operate on such large disk partitions often requires a cumbersome setup, and the benchmark takes an inconveniently long time to finish. I am currently working on a system that makes it practical to run benchmarks on large file systems [FAST '09 WIP].

Solid-State Storage Devices: Flash-based SSDs have the potential to change the storage landscape. Recently, I worked on design tradeoffs relevant to NAND-flash solid-state storage. We analyzed several of these tradeoffs using a trace-based disk simulator that we built to characterize different SSD organizations. More specifically, we worked on designing high-performance solid-state drives for I/O-intensive workloads. We proposed algorithms for cleaning and wear-leveling flash media to make it viable in environments with high I/O rates, along with improvements in random-write performance, a substantial drawback of flash-based disks in their current form. Our analysis was driven by traces captured from running systems, including a full-scale TPC-C benchmark, an Exchange server workload, and several standard file-system benchmarks. From our analysis, we found that SSD performance and lifetime are highly workload-sensitive, and that complex systems problems that normally appear higher in the storage stack, or even in distributed systems, are relevant to device firmware. We also presented the design of high-performance flash disks and disk-array configurations based on these disks [USENIX ’08].
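The interplay between cleaning and wear-leveling can be illustrated with a minimal Python sketch of a greedy cleaner that also avoids over-worn blocks. This is a generic textbook-style policy, not the algorithm from the USENIX '08 paper; the `Block` model, the `wear_slack` threshold, and the function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Block:
    valid_pages: int = 0   # live pages that must be copied before erasing
    erase_count: int = 0   # wear accumulated so far


def pick_victim(blocks, wear_slack=100, exclude=()):
    """Greedy victim selection with a simple wear-leveling guard.

    Among blocks not in `exclude`, ignore any whose erase count exceeds the
    least-worn block's by more than `wear_slack`, then pick the block with
    the fewest valid pages (ties go to the less-worn block).
    """
    pool = [i for i in range(len(blocks)) if i not in exclude]
    min_wear = min(blocks[i].erase_count for i in pool)
    candidates = [i for i in pool
                  if blocks[i].erase_count - min_wear <= wear_slack]
    return min(candidates,
               key=lambda i: (blocks[i].valid_pages, blocks[i].erase_count))


def clean(blocks, victim, spare):
    """Copy the victim's live pages into an erased spare block, then erase it.

    Returns the number of pages copied (the cleaning cost).
    """
    src, dst = blocks[victim], blocks[spare]
    if dst.valid_pages != 0:
        raise ValueError("spare block must be erased")
    moved = src.valid_pages
    dst.valid_pages = moved
    src.valid_pages = 0
    src.erase_count += 1   # every erase wears the flash block
    return moved


blocks = [Block(60, 5), Block(10, 500), Block(20, 6), Block(0, 5)]
victim = pick_victim(blocks, exclude={3})   # block 1 is skipped: too worn
print(victim, clean(blocks, victim, spare=3))  # 2 20
```

Greedy selection minimizes copying per clean, but left unchecked it can concentrate erases on blocks holding frequently overwritten data; the slack threshold is one simple way to trade some cleaning efficiency for more even wear.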

Reliability in the Storage Stack: Hardware components of the storage stack (storage devices, interconnects, etc.) fail, and software (device firmware, drivers, file systems) exhibits bugs and other inconsistencies, leading to data loss and corruption. I have worked on techniques to help understand the causes of failures in the storage stack:
• Analyzed how commodity file systems handle disk failures. I built a type-aware fault-injection framework for the Reiser file system and analyzed how it handles partial disk failures [SOSP ’05].
• File systems demonstrate inconsistent and inadequate handling of latent sector errors and other partial disk failures. In order to identify the root causes of the observed inconsistencies, I developed Differential Failure Analysis, a combination of static and run-time analysis to achieve a thorough understanding of the failure handling characteristics of file-system source code [USENIX ’06].
• Developed and applied type-aware corruption to understand the effects of disk-pointer corruption on file-system reliability, using Windows NTFS and Linux ext3 as case studies; I performed the ext3 analysis [DSN ’08].
• Evaluated failure handling of SCSI drivers by injecting faults at the lowest level of the tiered SCSI architecture and observing the detection and recovery mechanisms employed by the upper driver levels.
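The idea of type-aware fault injection can be sketched in Python against an in-memory disk. The class name, the block-number-to-type map, and the error format are hypothetical; a real framework sitting beneath a file system would presumably infer block types by interpreting on-disk structures rather than taking a fixed map.

```python
class FaultInjectingDisk:
    """In-memory block device that fails I/O only for one block type.

    `block_types` maps block number -> type name (e.g. "superblock",
    "inode"); unmapped blocks are treated as "data". This map is a
    hypothetical stand-in for real on-disk structure parsing.
    """

    def __init__(self, num_blocks, block_types, block_size=4096):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.block_types = block_types
        self.fail_type = None
        self.fail_op = None
        self.log = []  # (op, block number, type) for every injected fault

    def inject(self, block_type, op="read"):
        """Arm a fault: every `op` on a block of `block_type` will fail."""
        self.fail_type, self.fail_op = block_type, op

    def _maybe_fail(self, op, bno):
        btype = self.block_types.get(bno, "data")
        if op == self.fail_op and btype == self.fail_type:
            self.log.append((op, bno, btype))
            raise OSError(f"injected {op} error on {btype} block {bno}")

    def read(self, bno):
        self._maybe_fail("read", bno)
        return self.blocks[bno]

    def write(self, bno, data):
        self._maybe_fail("write", bno)
        self.blocks[bno] = data


disk = FaultInjectingDisk(8, {0: "superblock", 1: "inode"}, block_size=16)
disk.inject("inode")          # fail reads of inode blocks only
disk.read(2)                  # data block: succeeds
try:
    disk.read(1)              # inode block: injected failure
except OSError as err:
    print(err)                # injected read error on inode block 1
```

Typing the faults lets a test harness ask targeted questions, such as how a file system reacts when an inode read fails versus a data-block read, and then compare the detection and recovery behavior it observes.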

Commodity Storage Clusters: High-end storage systems are increasingly being built using commodity components. We designed techniques for characterizing complex storage clusters in the context of the EMC Centera storage system. By correlating disk and network traffic with the running workload using observation and delay, we inferred the structure of the software system as well as its policies (e.g., how it performs caching, replication, load-balancing) without any access to the source code [ISCA ’05].

Regulatory-Compliant Storage: Federal regulations such as Sarbanes-Oxley and HIPAA mandate stricter enforcement of guidelines for data retention, access, and tampering. I worked on an auditing framework to enforce regulatory compliance on archival storage. The goal was to provide continuous verification of system state and to support feature-rich querying. As part of a larger project on compliant storage at IBM Almaden, I designed and built a prototype of the auditing engine.
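One common way to make an audit trail tamper-evident is a hash chain, in which each entry's digest covers the previous entry's digest. The sketch below is a generic illustration in Python, not the design of the IBM Almaden prototype; the record format and the names `append_entry` and `verify` are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry


def append_entry(log, record):
    """Append `record` (a JSON-serializable dict) to a hash-chained log."""
    prev = log[-1]["digest"] if log else GENESIS
    body = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    log.append({"record": record, "prev": prev, "digest": digest})


def verify(log):
    """Recompute the chain; return the index of the first bad entry, or None."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["digest"] != expected:
            return i
        prev = entry["digest"]
    return None


log = []
append_entry(log, {"op": "create", "file": "ledger.db"})
append_entry(log, {"op": "read", "file": "ledger.db"})
print(verify(log))                 # None: chain intact
log[0]["record"]["op"] = "delete"  # tamper with history
print(verify(log))                 # 0: first inconsistent entry
```

Verification can be rerun continuously. A hash chain detects tampering but does not by itself stop an attacker with write access from rebuilding the whole chain, so real systems typically anchor the latest digest somewhere trusted.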