Thanh Do



I am currently a Senior Staff Software Engineer at Google, where I manage the F1 database group at the Madison office.

I received my Ph.D. from the Computer Sciences Department at the University of Wisconsin-Madison in 2014. During my Ph.D., I worked with Professors Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau. My dissertation, "Towards Reliable Cloud Systems", is about techniques to improve the reliability of cloud systems. I obtained my B.S. at Hanoi University of Technology (HUT) in 2006.

Professional Services

Publications

  • As recorded by DBLP, Google Scholar
  • Offset-value coding in database query processing
    Goetz Graefe, Thanh Do
    Proceedings of the 26th International Conference on Extending Database Technology (EDBT 2023)
    Ioannina, Greece, March 2023
    Available as: PDF

  • Robust and Efficient Sorting with Offset-value Coding
    Thanh Do, Goetz Graefe
    ACM Transactions on Database Systems, Volume 48, Issue 1, Article 2 (March 2023)
    Available as: ACM digital library entry

  • Efficient sorting, duplicate removal, grouping, and aggregation
    Thanh Do, Goetz Graefe, Jeff Naughton
    ACM Transactions on Database Systems, Volume 47, Issue 4, Article 16 (December 2022)
    Available as: PDF

  • Napa: powering scalable data warehousing with robust query performance at Google
    With the Napa team at Google
    Proceedings of the VLDB Endowment 2021
    Available as: VLDB entry

  • External merge sort for Top-K queries: Eager input filtering guided by histograms
    Yannis Chronis, Thanh Do, Goetz Graefe, Keith Peters
    Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD'20)
    Available as: ACM digital library entry

  • F1 query: Declarative querying at scale
    With the F1 database group at Google
    Proceedings of the VLDB Endowment 2018 (VLDB'18)
    Available as: ACM digital library entry

  • Towards Pre-Deployment Detection of Performance Failures in Cloud Distributed Systems
    Riza O. Suminto, Agung Laksono, Anang D. Satria, Thanh Do, Haryadi S. Gunawi
    Proceedings of the 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '15)
    Santa Clara, CA, July 2015.

  • What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems
    With many others
    Proceedings of the 2014 ACM Symposium on Cloud Computing (SoCC '14)
    Seattle, WA, November 2014.
    Available as: PDF

  • Physical Disentanglement in a Container-Based File System
    Lanyue Lu, Yupu Zhang, Thanh Do, Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
    Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI '14)
    Broomfield, CO, October 2014.
    Available as: Abstract, PDF, BibTex

  • Limplock: Understanding the Impact of Limpware on Scale-out Cloud Systems
    Thanh Do, Mingzhe Hao, Tanakorn Leesatapornwongsa, Tiratat Patana-anake, Haryadi S. Gunawi
    Proceedings of the 2013 ACM Symposium on Cloud Computing (SoCC '13)
    Santa Clara, CA, Oct 2013.
    Available as: PDF, Slides
    In the news: The Register, Online Backup Magazine

  • Impact of Limpware on HDFS: A Probabilistic Estimation
    Thanh Do, Haryadi S. Gunawi
    CoRR, 2013, abs/1311.3322
    Available as: PDF

  • The Case for Limping-Hardware Tolerant Clouds
    Thanh Do, Haryadi S. Gunawi
    Proceedings of the 5th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '13)
    San Jose, CA, June 2013.
    Available as: PDF

  • HARDFS: Hardening HDFS with Selective and Lightweight Versioning
    Thanh Do, Tyler Harter, Yingchao Liu, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
    Proceedings of the 11th Conference on File and Storage Technologies (FAST '13)
    San Jose, CA, Feb 2013.
    Available as: Abstract, PDF, BibTex
    Talk slides: PDF

  • Failure as a Service (FaaS): A Cloud Service for Large-Scale, Online Failure Drills
    Haryadi S. Gunawi, Thanh Do, Joseph M. Hellerstein, Ion Stoica, Dhruba Borthakur, Jesse Robbins
    UCB Technical Report, 2011
    Available as: PDF

  • FATE and DESTINI: A Framework for Cloud Recovery Testing
    Haryadi S. Gunawi, Thanh Do, Pallavi Joshi, Peter Alvaro, Joseph M. Hellerstein, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Koushik Sen, Dhruba Borthakur
    Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI '11)
    Boston, MA, March 30-April 1, 2011.
    Available as: Abstract, PDF, BibTex

  • Towards Automatically Checking Thousands of Failures with Micro-specifications
    Haryadi S. Gunawi, Thanh Do, Pallavi Joshi, Joseph M. Hellerstein, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Koushik Sen
    Proceedings of the 6th Workshop on Hot Topics in System Dependability (HotDep '10)
    Vancouver, BC, Canada, October 2010.
    Available as: Abstract, PDF
    Talk Slides: PowerPoint

  • ptop: A process-level power profiling tool
    Thanh Do, Suhib Rawshdeh, Weisong Shi
    Proceedings of the 2nd Workshop on Power Aware Computing and Systems (HotPower '09)
    Big Sky, MT, October 2009.
    Available as: PDF

Work Experience

  • Senior Staff Software Engineer, Google Madison, since 2023
        F1 database group

  • Senior Staff Software Engineer, Celonis, 2021-2023
        Data infrastructure group

  • Software Engineer, Databricks, 2021
        Advanced database group

  • Software Engineer, Google Madison, 2016-2021
        F1 database group

  • Research Scientist, Microsoft, 2014-2016
        Gray Systems Lab

  • Research Assistant, UW-Madison, 2010-2014
        The ADvanced Systems Laboratory (ADSL)

  • Engineering Intern, Google Inc., Summer 2012
        Platform Team at Madison Office

  • Teaching Assistant, UW-Madison, 2009-2010
        Courses: Introduction to Programming (CS-302), Undergrad OS (CS-537)

  • Lecturer, Hanoi University of Technology, 2006-2008
        Courses: Introduction to Programming, Undergrad Operating Systems

Misc