Block-level Inline Data Deduplication in ext3

Aaron Brown, Kris Kosmatka

Abstract: Solid State Disk (SSD) media are increasingly being used as primary storage in consumer and mobile devices. This trend is driven by low power demands, resistance to environmental shocks and vibrations, and superior random access performance. However, SSDs have some important limitations, including high cost, small capacity, and a limited erase-write cycle lifespan. Inline data deduplication offers one possible way to ameliorate these problems by avoiding unnecessary writes and enabling more efficient use of space. In this work we propose an inline block-level deduplication layer for ext3 called Dedupfs. To identify potential deduplication opportunities, Dedupfs maintains an in-memory cache of block hashes. A reference count is maintained for each block in the filesystem to prevent freeing a block that is still referenced. The new metadata structures in Dedupfs are independent of and complementary to existing ext3 structures, which ensures easy backward compatibility.
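To make the two mechanisms in the abstract concrete, the following is a minimal user-space sketch of a hash cache combined with per-block reference counts. It is not the Dedupfs implementation, which lives inside ext3. The class name `DedupStore`, its methods, the 4096-byte block size, the choice of SHA-1, and the byte-for-byte comparison on a hash hit are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; not stated in the abstract


class DedupStore:
    """Toy in-memory block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}      # block number -> block contents
        self.hash_cache = {}  # block hash -> block number
        self.refcount = {}    # block number -> number of references
        self.next_block = 0

    def write_block(self, data):
        """Store one block and return its block number, sharing duplicates."""
        digest = hashlib.sha1(data).digest()  # hash choice is an assumption
        blk = self.hash_cache.get(digest)
        # Verify contents on a hash hit to guard against collisions
        # (an assumption; the abstract does not say how hits are confirmed).
        if blk is not None and self.blocks[blk] == data:
            self.refcount[blk] += 1  # duplicate: no new block is written
            return blk
        blk = self.next_block
        self.next_block += 1
        self.blocks[blk] = data
        self.hash_cache[digest] = blk
        self.refcount[blk] = 1
        return blk

    def free_block(self, blk):
        """Drop one reference; return True only if the block was released."""
        self.refcount[blk] -= 1
        if self.refcount[blk] > 0:
            return False  # still referenced elsewhere, so keep the data
        data = self.blocks.pop(blk)
        del self.refcount[blk]
        digest = hashlib.sha1(data).digest()
        if self.hash_cache.get(digest) == blk:
            del self.hash_cache[digest]
        return True


store = DedupStore()
first = store.write_block(b"x" * BLOCK_SIZE)
second = store.write_block(b"x" * BLOCK_SIZE)  # identical contents
other = store.write_block(b"y" * BLOCK_SIZE)   # different contents
```

In this sketch the second write of identical data returns the existing block number and increments its reference count, so only two blocks are stored for three writes. The first call to `free_block` on the shared block then only decrements the count, and the data is released when the last reference is dropped.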

Paper available as: PDF

Click here to download our software.

Click here to download our presentation slides.