Lakshmi N. Bairavasundaram
Email: laksh AT SERVER cs.wisc.edu

        
My full name is Lakshmi Narayanan Bairavasundaram. I'm a Ph.D. student in Computer Sciences at the University of Wisconsin-Madison. My research area is operating systems. I work in the ADvanced Systems Laboratory, and my advisors are Prof. Andrea C. Arpaci-Dusseau and Prof. Remzi H. Arpaci-Dusseau.
 
 
I will be joining the Advanced Technology Group at NetApp in October! My job application materials are still available:

Resume: [PDF] (Last updated: Mar 17, 2008)
Research Statement: [PDF]

             
        

News

  • February 2008: Our FAST papers (Data corruption, Parity Lost) were featured in an article on storagemojo.com
  • February 2008: Our FAST'08 paper on data corruption won the "Best Student Paper" award.
  • February 2008: Our SIGMETRICS'07 paper on latent sector errors was featured in an article on storagemojo.com

Research


My research interests include File and Storage Systems, Operating Systems, and Fault Tolerance.
    
Dissertation: "The characteristics, impact and tolerance of partial disk failures".
My research centers on understanding how computer systems fail and developing techniques to avoid such failures. Much of the value people place in computers stems from the data stored in them. Hence, my dissertation focuses on failures related to loss or corruption of data. Disk drive failures are the primary cause of data loss. These failures are typically partial failures, where some disk sectors are unavailable due to a latent sector error or some disk blocks are silently corrupted. The goals of my dissertation are to (i) understand the characteristics of partial disk failures, (ii) analyze how these failures impact components of the storage stack, and (iii) develop solutions to tolerate such failures.

Failure characteristics: I have analyzed the occurrence and characteristics of latent sector errors and data corruption in a population of 1.53 million disk drives [SIGMETRICS07, FAST08a]:
  • The study of latent sector errors found that almost 20% of nearline (SATA) disk drives are afflicted by latent sector errors in 2 years of use, and that latent sector errors show high spatial and temporal locality.
  • The analysis of data corruption identified interesting corruption trends, including the tendency of consecutive disk blocks to become corrupt and the non-independence of corruption instances within the same disk and across different disk drives in the same storage system.
Impact on storage stack: Given that partial failures could affect a significant percentage of disks, it is important to understand their impact on different elements of the storage stack:
  • I have analyzed the impact of such errors on modern commodity file systems (IBM JFS, ext3, and Windows NTFS) using "type-aware" fault injection techniques [SOSP05, StorageSS06, DSN08]. The analyses found that even widely used file systems have bugs in failure-handling code, use illogically inconsistent policies, and do not implement fault tolerance techniques like type-checking and replication effectively.
  • I have also investigated the mechanisms used by virtual memory systems (of Linux, FreeBSD, and Windows XP) to tolerate disk errors [DSN06], and found that they are inconsistent and ineffective as well.
  • I have applied model checking to data protection schemes used in real parity-based RAID systems and found that they are vulnerable to data loss or corruption due to disk errors such as "lost" writes [FAST08b].
Tolerance: The studies above show that even widely used systems have bugs and cannot be trusted to handle partial disk failures. Therefore, it is important to lower the trust we place in any single system. I am currently developing a file system architecture based on N-version programming principles that reduces the need to trust any one file system.
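
As a rough illustration of the N-version idea (not the architecture under development), the sketch below reads the same path through several independent file system implementations and accepts a result only when a strict majority agree. The name `majority_read` and the reader callables are hypothetical, chosen only for this example.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_read(
    readers: Sequence[Callable[[str], bytes]], path: str
) -> Optional[bytes]:
    """Return the content a strict majority of versions agree on, else None.

    Each element of `readers` stands in for one independent file system
    implementation (an assumption made for this sketch).
    """
    results = []
    for read in readers:
        try:
            results.append(read(path))
        except OSError:
            # A version that fails outright simply casts no vote.
            continue
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    # Require agreement from more than half of all versions, not just
    # more than half of the versions that responded.
    return value if votes > len(readers) // 2 else None
```

With three versions, a single corrupted or failed version is outvoted by the other two; if no majority exists, the caller receives `None` and can report the error instead of returning possibly bad data.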
    
Other Research: My initial research experience was built on studying how different layers of a system interact and how "gray-box" techniques can be used in such layers to overcome the limitations of narrow interfaces. A specific application of such techniques has been in semantically-smart disk systems. These are disk systems that leverage basic knowledge of file system operation to provide significant improvements in performance, availability, and security. I have primarily explored the performance angle. I have developed a cache mechanism for disk arrays that utilizes basic knowledge of file system data structures to provide an exclusive cache, i.e., a cache that retains blocks that are not present in the file system cache [ISCA04]. I have also extended this technique to work effectively with database systems [FAST05]. Finally, I have worked on using block liveness information in a semantically-smart disk [OSDI04]. In addition to my research at Madison, I have been on research internships at IBM T. J. Watson Research Center, NY (June-Aug 2004), where I developed a distributed caching mechanism for storage systems; Intel Corporation, OR (May-Aug 2005), where I developed techniques for hypervisor-based fault injection; and Network Appliance, CA (Jun-Aug 2006, Jun-Aug 2007), where I analyzed data on storage system errors.
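
To make the notion of an exclusive cache concrete, here is a minimal sketch of the policy only. It is not the mechanism from [ISCA04]: that work relies on knowledge of file system data structures, whereas this sketch assumes the array is told explicitly when the client reads or evicts a block. The class `ExclusiveArrayCache` and its methods `client_read` and `client_evicted` are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional


class ExclusiveArrayCache:
    """Array-side cache that tries to hold only blocks the client cache lacks."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()

    def client_read(self, block_no: int) -> Optional[bytes]:
        # The client is about to cache this block itself, so drop our copy.
        # Returns the cached data on a hit, or None on a miss.
        return self._blocks.pop(block_no, None)

    def client_evicted(self, block_no: int, data: bytes) -> None:
        # The client no longer holds the block, so it is worth keeping here.
        self._blocks.pop(block_no, None)
        self._blocks[block_no] = data
        while len(self._blocks) > self.capacity:
            # Discard the block that was added least recently.
            self._blocks.popitem(last=False)

    def __contains__(self, block_no: int) -> bool:
        return block_no in self._blocks
```

The point of the policy is that the two caches avoid holding duplicate copies of a block, so their combined effective size approaches the sum of their capacities.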
 
 
 

Publications


Analyzing the Effects of Disk-Pointer Corruption

Lakshmi N. Bairavasundaram, Meenali Rungta, Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift
To appear in the Proceedings of the International Conference on Dependable Systems and Networks (DSN'08)
Anchorage, Alaska. June 2008.

Available as: Abstract PDF Postscript BibTex

An Analysis of Data Corruption in the Storage Stack

Lakshmi N. Bairavasundaram, Garth R. Goodson, Bianca Schroeder, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST'08)
San Jose, California. February 2008.

Best Student Paper Award!
Available as: Abstract PDF Postscript BibTex

Parity Lost and Parity Regained

Andrew Krioukov, Lakshmi N. Bairavasundaram, Garth R. Goodson, Kiran Srinivasan, Randy Thelen, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST'08)
San Jose, California. February 2008.

Available as: Abstract PDF Postscript BibTex

An Analysis of Latent Sector Errors in Disk Drives

Lakshmi N. Bairavasundaram, Garth R. Goodson, Shankar Pasupathy, Jiri Schindler
Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'07)
San Diego, California. June 2007.

Kenneth C. Sevcik Outstanding Student Paper Award!
Available as: Abstract PDF Postscript BibTex

Limiting Trust in the Storage Stack

Lakshmi N. Bairavasundaram, Meenali Rungta, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Proceedings of the 2nd International Workshop on Storage Security and Survivability (StorageSS'06)
Alexandria, Virginia. October 2006.

Available as: Abstract PDF Postscript BibTex

Dependability Analysis of Virtual Memory Systems

Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Proceedings of the International Conference on Dependable Systems and Networks (DSN'06)
Philadelphia, Pennsylvania. June 2006.

Available as: Abstract PDF Postscript BibTex

Semantically-Smart Disk Systems: Past, Present, and Future

Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Lakshmi N. Bairavasundaram, Timothy E. Denehy, Florentina I. Popovici, Vijayan Prabhakaran, Muthian Sivathanu
SIGMETRICS Performance Evaluation Review (PER)
Volume 33, Number 4. March 2006.

Available as: Abstract PDF Postscript BibTex

Database Aware Semantically-smart Storage

Muthian Sivathanu, Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST'05)
San Francisco, California. December 2005.

Available as: Abstract PDF Postscript BibTex

IRON File Systems

Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP'05)
Brighton, United Kingdom. October 2005.

Available as: Abstract PDF Postscript BibTex

Life or Death at Block Level

Muthian Sivathanu, Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI'04)
San Francisco, California. December 2004.

Available as: Abstract PDF Postscript BibTex

X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs

Lakshmi N. Bairavasundaram, Muthian Sivathanu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA'04)
Munich, Germany. June 2004.

Available as: Abstract PDF Postscript BibTex

Dynamic Path Profile Aided Recompilation in a JAVA Just-In-Time Compiler

R. Vinodh Kumar, B. Lakshmi Narayanan, R. Govindarajan
Proceedings of the International Conference on High Performance Computing (HiPC'02)
Bangalore, India. December 2002.

Available as: PDF

Functional Unit Usage Based Thread Selection in a Simultaneous Multithreaded Processor

Deepak Babu M.I, Lakshmi Narayanan B, Madhu Saravana Sibi G, Ranjani Parthasarathi
Poster Session of the International Conference on High Performance Computing (HiPC'01)
Hyderabad, India. December 2001.

Available as: PDF
 
 
 

Internships


Summer Intern. Advanced Development Group, Network Appliance.
Sunnyvale, CA. June - August, 2007.
I analyzed the occurrence of silent data corruption in disk drives, identifying important characteristics that would be useful for corruption-proof system design.

Summer Intern. Advanced Development Group, Network Appliance.
Sunnyvale, CA. June - August, 2006.
I analyzed data on latent sector errors (disk errors wherein a sector becomes inaccessible), examining the dependence on factors such as disk drive age and capacity, and identifying characteristics such as spatial and temporal locality.

Summer Intern. Core Virtualization Research Group, Intel Corporation.
Hillsboro, OR. May - August, 2005.
I developed techniques for hypervisor-based fault injection and applied these techniques to study operating system behavior when memory and disk errors occur.

Summer Intern. IBM T.J. Watson Research Center.
Hawthorne, NY. May - August, 2004.
I designed and implemented an extensible and scalable distributed cache architecture for an enterprise storage system. My scheme involved the use of remote memory (on machines over a high-speed LAN) for caching disk blocks.

Summer Research Fellow. Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR)
Bangalore, India. May - June, 2001.
I developed an instruction scheduler for a Java Just-in-Time compiler. The scheduler was built to utilize path profile information.