We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous bugs related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. Our results have implications for the design of next generation fault-tolerant distributed and cloud storage systems.
Aaaaaaa this is fantastic work https://t.co/otNLPDt00R— Bear Conditioning (@aphyr) March 8, 2017
what happens to distributed systems when there are IO errors? https://t.co/wURSluxGoZ— 🔎Julia Evans🔍 (@b0rk) March 8, 2017