I am with the Data Systems Group at Microsoft Research. I received my Ph.D. from the Database Group at the University of Wisconsin-Madison, under the supervision of Prof. Jeffrey Naughton. I have broad interests in database systems, data mining, and machine learning. I am currently working on query optimization, query processing, database system performance tuning, big data systems, distributed systems, data stream processing, and machine learning systems. In the past, I have worked on various topics including graph data management, personal data management, knowledge base construction, social network analysis, data privacy, entity matching in data integration, and database as a service in the cloud.
Selected Projects
- Autonomous Performance Tuning for Database/Big Data Systems:
- Budget-aware Query Tuning: An AutoML Perspective [SIGMOD Record 2024]
- Hybrid Cost Modeling for Reducing Query Performance Regression in Index Tuning [IEEE TKDE 2024]
- Wii: Dynamic Budget Reallocation In Index Tuning [SIGMOD 2024]
- Wred: Workload Reduction for Scalable Index Tuning [SIGMOD 2024]
- ML-Powered Index Tuning: An Overview of Recent Progress and Open Challenges [SIGMOD Record 2023]
- DISTILL: Low-Overhead Data-Driven Techniques for Filtering and Costing Indexes for Scalable Index Tuning [VLDB 2022]
- Budget-aware Index Tuning with Reinforcement Learning [SIGMOD 2022]
- ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning [SIGMOD 2022]
- Hyperspace: The Indexing Subsystem of Azure Synapse [VLDB 2021]
- Helios: Hyperscale Indexing for the Cloud & Edge [VLDB 2020]
- AI Meets AI: Leveraging Query Executions to Improve Index Recommendations [SIGMOD 2019]
- Plan Stitch: Harnessing the Best of Many Plans [VLDB 2018]
- Efficient and Scalable Machine Learning Systems:
- Stochastic Gradient Descent without Full Data Shuffle: with Applications to In-Database Machine Learning and Deep Learning Systems [VLDB Journal 2024]
- How Good are Machine Learning Clouds? Benchmarking Two Snapshots over 5 Years [VLDB Journal 2023]
- A Systematic Evaluation of Machine Learning on Serverless Infrastructure [VLDB Journal 2023]
- In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle [SIGMOD 2022]
- Towards Demystifying Serverless Machine Learning Training [SIGMOD 2021]
- OpenBox: A Generalized Black-box Optimization Service [KDD 2021]
- VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition [VLDB 2021]
- Magpie: Python at Speed and Scale using Cloud Backends [CIDR 2021]
- Model Averaging in Distributed Machine Learning: A Case Study with Apache Spark [VLDB Journal 2021]
- ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent [ICDE 2020]
- MLlib*: Fast Training of GLMs using Spark MLlib [ICDE 2019]
- MLBench: Benchmarking Machine Learning Services Against Human Experts [VLDB 2018]
- MLog: Towards Declarative In-Database Machine Learning [VLDB 2017]
- Ease.ML: A Lifecycle Management System for MLDev and MLOps:
- Data Debugging with Shapley Importance over Machine Learning Pipelines [ICLR 2024]
- Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise [ICDE 2023]
- Data Science Through the Looking Glass: Analysis of Millions of GitHub Notebooks and ML.NET Pipelines [SIGMOD Record 2022]
- A Data Quality-Driven View of MLOps [IEEE Data Engineering Bulletin 2021]
- Ease.ML: A Lifecycle Management System for MLDev and MLOps [CIDR 2021]
- Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions [VLDB 2021]
- Ease.ml/snoopy in Action: Towards Automatic Feasibility Analysis for Machine Learning Application Development [VLDB 2020]
- Building Continuous Integration Services for Machine Learning [KDD 2020]
- Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization [VLDB 2019]
- Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment [SysML 2019]
- Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads [VLDB 2018]
- Ease.ml in Action: Towards Multi-tenant Declarative Learning Services [VLDB 2018]
- Query Optimization for Data Stream Processing Systems:
- Factor Windows: Cost-based Query Rewriting for Optimizing Correlated Window Aggregates [ICDE 2022]
- Optimization of Threshold Functions over Streams [VLDB 2021]
- Serverless Event-Stream Processing over Virtual Actors [CIDR 2019]
- Cost Modeling and Query Optimization for Database Systems:
- Sampling-Based Query Re-Optimization [SIGMOD 2016]
- Uncertainty Aware Query Execution Time Prediction [VLDB 2014]
- Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads [VLDB 2013]
- Predicting Query Execution Time: Are Optimizer Cost Models Really Unusable? [ICDE 2013]
- Probase: A Probabilistic Taxonomy for Text Understanding:
Professional Services
I have served on the program committees of the following conferences:
- ACM International Conference on Management of Data (SIGMOD):
- 2025 (Demo Track)
- 2024 (Demo Track)
- 2020 (Research Track)
- 2018 (Research Track)
- 2017 (Research Track)
- 2016 (PhD Symposium Track)
- International Conference on Very Large Data Bases (VLDB):
- 2023 (Research Track)
- 2020 (Demo Track)
- IEEE International Conference on Data Engineering (ICDE):
- 2025 (Industry Track)
- 2023 (Industry and Applications Track)
- ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD):
- 2025 (Applied Data Science Track)
- 2024 (Applied Data Science Track)
- 2023 (Applied Data Science Track, 2nd Workshop on Decision Intelligence and Analytics for Online Marketplaces)
- 2022 (Applied Data Science Track)
- 2021 (Applied Data Science Track)
- ACM International Conference on Research and Development in Information Retrieval (SIGIR):
- 2024 (Research Track)
- 2023 (Research Track)
- 2022 (Research Track)
- 2021 (Research Track)
- 2020 (Research Track)
- International Conference on World Wide Web (WWW):
- 2024 (Industry Track)
- 2016 (PhD Symposium Track)
- ACM International Conference on Web Search and Data Mining (WSDM):
- 2025 (Research Track)
- 2024 (Research Track)
- 2023 (Research Track)
- 2021 (Research Track)
- ACM Conference on Information and Knowledge Management (CIKM):
- 2024 (Research Track)
- 2023 (Research Track)
- 2022 (Research Track)
- 2021 (Research Track)
- 2020 (Research Track)
- 2018 (Research Track)
- 2017 (Research Track)
- European Conference on Machine Learning and Data Mining (ECML/PKDD):
- 2024 (Applied Data Science Track)
- 2023 (Applied Data Science Track)
- International Conference on Extending Database Technology (EDBT):
2024
- Hybrid Cost Modeling for Reducing Query Performance Regression in Index Tuning.
Wentao Wu.
In IEEE Transactions on Knowledge and Data Engineering, 2024. [PDF] [arXiv]
- Budget-aware Query Tuning: An AutoML Perspective.
Wentao Wu and Chi Wang.
In SIGMOD Record, Vol. 53, No. 3: 20-26, 2024. [PDF] [arXiv]
- Wii: Dynamic Budget Reallocation In Index Tuning.
Xiaoying Wang, Wentao Wu, Chi Wang, Vivek Narasayya, and Surajit Chaudhuri.
In Proceedings of the ACM on Management of Data (SIGMOD 2024), Vol. 2, Issue 3, Article No. 182: 1-26, 2024. [PDF] [FULL]
- Wred: Workload Reduction for Scalable Index Tuning.
Matteo Brucato, Tarique Siddiqui, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri.
In Proceedings of the ACM on Management of Data (SIGMOD 2024), Vol. 2, Issue 1, Article No. 50: 1-26, 2024. [PDF]
- Data Debugging with Shapley Importance over Machine Learning Pipelines.
Bojan Karlas, David Dao, Matteo Interlandi, Sebastian Schelter, Wentao Wu, and Ce Zhang.
In International Conference on Learning Representations (ICLR 2024), 2024. [PDF] [arXiv]
- Stochastic Gradient Descent without Full Data Shuffle: with Applications to In-Database Machine Learning and Deep Learning Systems.
Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, and Ce Zhang.
In VLDB Journal, Vol. 33, No. 5: 1231-1255, 2024. [PDF]
2023
- ML-Powered Index Tuning: An Overview of Recent Progress and Open Challenges.
Tarique Siddiqui and Wentao Wu.
In SIGMOD Record, Vol. 52, No. 4: 19-30, 2023. [PDF] [arXiv]
- Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise.
Cedric Renggli, Luka Rimanic, Luka Kolar, Wentao Wu, and Ce Zhang.
In Proceedings of the IEEE 39th International Conference on Data Engineering (ICDE 2023): 218-231, 2023. [PDF] [arXiv]
- A Systematic Evaluation of Machine Learning on Serverless Infrastructure.
Jiawei Jiang, Shaoduo Gan, Bo Du, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Sheng Wang, and Ce Zhang.
In VLDB Journal, Vol. 33, No. 2: 425-449, 2023. [PDF]
- How Good are Machine Learning Clouds? Benchmarking Two Snapshots over 5 Years.
Jiawei Jiang, Yi Wei, Yu Liu, Wentao Wu, Chuang Hu, Zhigao Zheng, Ziyi Zhang, Yingxia Shao, and Ce Zhang.
In VLDB Journal, Vol. 33, No. 3: 833-857, 2023. [PDF]
2022
- Budget-aware Index Tuning with Reinforcement Learning.
Wentao Wu, Chi Wang, Tarique Siddiqui, Junxiong Wang, Vivek Narasayya, Surajit Chaudhuri, and Philip A. Bernstein.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2022): 1528-1541, 2022. [PDF] [FULL] [Slides]
- ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning.
Tarique Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek Narasayya, and Surajit Chaudhuri.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2022): 660-673, 2022. [PDF] [FULL]
- In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle.
Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, and Ce Zhang.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2022): 1286-1300, 2022. [PDF] [arXiv]
- DISTILL: Low-Overhead Data-Driven Techniques for Filtering and Costing Indexes for Scalable Index Tuning.
Tarique Siddiqui, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri.
In Proceedings of the VLDB Endowment, Vol. 15, No. 10 (VLDB 2022): 2019-2031, 2022. [PDF]
- Factor Windows: Cost-based Query Rewriting for Optimizing Correlated Window Aggregates.
Wentao Wu, Philip A. Bernstein, Alex Raizman, and Christina Pavlopoulou.
In Proceedings of the IEEE 38th International Conference on Data Engineering (ICDE 2022): 2723-2735, 2022. [PDF] [FULL] [arXiv] [Slides]
- Data Science Through the Looking Glass: Analysis of Millions of GitHub Notebooks and ML.NET Pipelines.
Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Jordan Henkel, Matteo Interlandi, Subru Krishnan, Brian Kroth, Venkatesh Emani, Wentao Wu, Ce Zhang, Markus Weimer, Avrilia Floratou, Carlo Curino, and Konstantinos Karanasos.
In SIGMOD Record, Vol. 51, No. 2: 30-37, 2022. [PDF] [arXiv]
2021
- OpenBox: A Generalized Black-box Optimization Service.
Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang, and Bin Cui.
In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2021): 3209-3219, 2021. [PDF] [arXiv]
- Hyperspace: The Indexing Subsystem of Azure Synapse.
Rahul Potharaju, Terry Kim, Eunjin Song, Wentao Wu, Lev Novik, Apoorve Dave, Andrew Fogarty, Pouria Pirzadeh, Vidip Acharya, Gurleen Dhody, Jiying Li, Sinduja Ramanujam, Nicolas Bruno, Cesar Galindo-Legaria, Vivek Narasayya, Surajit Chaudhuri, Anil K. Nori, Tomas Talius, and Raghu Ramakrishnan.
In Proceedings of the VLDB Endowment, Vol. 14, No. 12 (VLDB 2021): 3043-3055, 2021. [PDF]
- VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition.
Yang Li, Yu Shen, Wentao Zhang, Jiawei Jiang, Yaliang Li, Bolin Ding, Jingren Zhou, Zhi Yang, Wentao Wu, Ce Zhang, and Bin Cui.
In Proceedings of the VLDB Endowment, Vol. 14, No. 11 (VLDB 2021): 2167-2176, 2021. [PDF] [arXiv]
- Optimization of Threshold Functions over Streams.
Walter Cai, Philip A. Bernstein, Wentao Wu, and Badrish Chandramouli.
In Proceedings of the VLDB Endowment, Vol. 14, No. 6 (VLDB 2021): 878-889, 2021. [PDF]
- Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions.
Bojan Karlas, Peng Li, Renzhi Wu, Nezihe Merve Gurel, Xu Chu, Wentao Wu, and Ce Zhang.
In Proceedings of the VLDB Endowment, Vol. 14, No. 3 (VLDB 2021): 255-267, 2021. [PDF] [arXiv]
- The Case for ML-Enhanced High-Dimensional Indexes.
Rong Kang, Wentao Wu, Chen Wang, Ce Zhang, and Jianmin Wang.
In Proceedings of the 3rd International Workshop on Applied AI for Database Systems and Applications (AIDB@VLDB 2021), 2021. [PDF]
- Towards Demystifying Serverless Machine Learning Training.
Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, and Ce Zhang.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2021): 857-871, 2021. [PDF] [arXiv]
- Towards Understanding End-to-End Learning in the Context of Data: Machine Learning Dancing over Semirings & Codd's Table.
Wentao Wu and Ce Zhang.
In Proceedings of the Fifth Workshop on Data Management for End-To-End Machine Learning (DEEM@SIGMOD 2021): 1-4, 2021. [PDF]
- Magpie: Python at Speed and Scale using Cloud Backends.
Alekh Jindal, Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas Mueller, Wentao Wu, and Hiren Patel.
In Conference on Innovative Data Systems Research (CIDR 2021), 2021. [PDF]
- Ease.ML: A Lifecycle Management System for MLDev and MLOps.
Leonel Aguilar, David Dao, Shaoduo Gan, Nezihe Merve Gurel, Nora Hollenstein, Jiawei Jiang, Bojan Karlas, Thomas Lemmin, Tian Li, Yang Li, Susie Rao, Johannes Rausch, Cedric Renggli, Luka Rimanic, Maurice Weber, Shuai Zhang, Zhikuan Zhao, Kevin Schawinski, Wentao Wu, and Ce Zhang.
In Conference on Innovative Data Systems Research (CIDR 2021), 2021. [PDF]
- A Data Quality-Driven View of MLOps.
Cedric Renggli, Luka Rimanic, Nezihe Merve Gurel, Bojan Karlas, Wentao Wu, and Ce Zhang.
In IEEE Data Engineering Bulletin, Vol. 44, No. 1: 11-23, 2021. [PDF] [arXiv]
- Model Averaging in Distributed Machine Learning: A Case Study with Apache Spark.
Yunyan Guo, Zhipeng Zhang, Jiawei Jiang, Wentao Wu, Ce Zhang, Bin Cui, and Jianzhong Li.
In VLDB Journal, Vol. 30, No. 4: 693-712, 2021. [PDF]
2020
- Helios: Hyperscale Indexing for the Cloud & Edge.
Rahul Potharaju, Terry Kim, Wentao Wu, Vidip Acharya, Steve Suh, Andrew Fogarty, Apoorve Dave, Sinduja Ramanujam, Tomas Talius, Lev Novik, and Raghu Ramakrishnan.
In Proceedings of the VLDB Endowment, Vol. 13, No. 12 (VLDB 2020): 3231-3244, 2020. [PDF] ["The Morning Paper" Part I] ["The Morning Paper" Part II]
- Ease.ml/snoopy in Action: Towards Automatic Feasibility Analysis for Machine Learning Application Development.
Cedric Renggli, Luka Rimanic, Luka Kolar, Wentao Wu, and Ce Zhang.
In Proceedings of the VLDB Endowment, Vol. 13, No. 12 (VLDB 2020): 2837-2840, 2020. [PDF]
- Building Continuous Integration Services for Machine Learning.
Bojan Karlas, Matteo Interlandi, Cedric Renggli, Wentao Wu, Ce Zhang, Deepak Mukunthu Iyappan Babu, Jordan Edwards, Chris Lauren, Andy Xu, and Markus Weimer.
In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2020): 2407-2415, 2020 (oral presentation, 44/756). [PDF]
- ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent.
Zhipeng Zhang, Wentao Wu, Jiawei Jiang, Lele Yu, Bin Cui, and Ce Zhang.
In Proceedings of the IEEE 36th International Conference on Data Engineering (ICDE 2020): 1513-1524, 2020. [PDF]
2011 - 2019
- AI Meets AI: Leveraging Query Executions to Improve Index Recommendations.
Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2019): 1241-1258, 2019. [PDF]
- Serverless Event-Stream Processing over Virtual Actors.
Philip A. Bernstein, Todd Porter, Rahul Potharaju, Alejandro Z. Tomsic, Shivaram Venkataraman, and Wentao Wu.
In Conference on Innovative Data Systems Research (CIDR 2019), 2019. [PDF]
- Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization.
Cedric Renggli, Frances Ann Hubis, Bojan Karlas, Kevin Schawinski, Wentao Wu, and Ce Zhang.
In Proceedings of the VLDB Endowment, Vol. 12, No. 12 (VLDB 2019): 1962-1965, 2019. [PDF]
- Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment.
Cedric Renggli, Bojan Karlas, Bolin Ding, Feng Liu, Kevin Schawinski, Wentao Wu, and Ce Zhang.
In Proceedings of the 2nd SysML Conference (SysML 2019), 2019. [PDF] [arXiv] ["The Morning Paper"]
- MLlib*: Fast Training of GLMs using Spark MLlib.
Zhipeng Zhang, Jiawei Jiang, Wentao Wu, Ce Zhang, Lele Yu, and Bin Cui.
In Proceedings of the IEEE 35th International Conference on Data Engineering (ICDE 2019): 1778-1789, 2019. [PDF]
- Plan Stitch: Harnessing the Best of Many Plans.
Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya.
In Proceedings of the VLDB Endowment, Vol. 11, No. 10 (VLDB 2018): 1123-1136, 2018. [PDF]
- Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads.
Tian Li, Jie Zhong, Ji Liu, Wentao Wu, and Ce Zhang.
In Proceedings of the VLDB Endowment, Vol. 11, No. 5 (VLDB 2018): 607-620, 2018. [PDF] [arXiv]
- MLBench: Benchmarking Machine Learning Services Against Human Experts.
Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, and Ce Zhang.
In Proceedings of the VLDB Endowment, Vol. 11, No. 10 (VLDB 2018): 1220-1232, 2018. [PDF] [arXiv] [Datasets]
- Ease.ml in Action: Towards Multi-tenant Declarative Learning Services.
Bojan Karlas, Ji Liu, Wentao Wu, and Ce Zhang.
In Proceedings of the VLDB Endowment, Vol. 11, No. 12 (VLDB 2018): 2054-2057, 2018. [PDF]
- Semantic Bootstrapping: A Theoretical Perspective.
Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q. Zhu.
In Proceedings of the 33rd International Conference on Data Engineering (ICDE 2017): 7-8, 2017 (TKDE poster). [PDF]
- Semantic Bootstrapping: A Theoretical Perspective.
Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q. Zhu.
In IEEE Transactions on Knowledge and Data Engineering, Vol. 29, No. 2: 446-457, 2017. [PDF]
- Towards Interactive Debugging of Rule-based Entity Matching.
Fatemah Panahi, Wentao Wu, AnHai Doan, and Jeffrey F. Naughton.
In Proceedings of the 20th International Conference on Extending Database Technology (EDBT 2017): 354-365, 2017. [PDF]
- MLog: Towards Declarative In-Database Machine Learning.
Xupeng Li, Bin Cui, Yiru Chen, Wentao Wu, and Ce Zhang.
In Proceedings of the VLDB Endowment, Vol. 10, No. 12 (VLDB 2017): 1933-1936, 2017. [PDF]
- How Good Are Machine Learning Clouds for Binary Classification with Good Features?
Hantian Zhang, Luyuan Zeng, Wentao Wu, and Ce Zhang.
In Proceedings of the 2017 Symposium on Cloud Computing (SoCC 2017): 649, 2017 (extended abstract). [PDF]
- An Overreaction to the Broken Machine Learning Abstraction: The ease.ml Vision.
Ce Zhang, Wentao Wu, and Tian Li.
In Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics (HILDA@SIGMOD 2017): 3:1-3:6, 2017. [PDF]
- Sampling-Based Query Re-Optimization.
Wentao Wu, Jeffrey F. Naughton, and Harneet Singh.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016): 1721-1736, 2016. [PDF] [FULL] [arXiv] [Slides]
- On Debugging Non-Answers in Keyword Search Systems.
Akanksha Baid, Wentao Wu, Chong Sun, AnHai Doan, and Jeffrey F. Naughton.
In Proceedings of the 18th International Conference on Extending Database Technology (EDBT 2015): 37-48, 2015. [PDF]
- Uncertainty Aware Query Execution Time Prediction.
Wentao Wu, Xi Wu, Hakan Hacigümüs, and Jeffrey F. Naughton.
In Proceedings of the VLDB Endowment, Vol. 7, No. 14 (VLDB 2014): 1857-1868, 2014. [PDF] [FULL] [arXiv] [Slides]
- Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads.
Wentao Wu, Yun Chi, Hakan Hacigümüs, and Jeffrey F. Naughton.
In Proceedings of the VLDB Endowment, Vol. 6, No. 10 (VLDB 2013): 925-936, 2013. [PDF] [FULL] [Slides]
- Predicting Query Execution Time: Are Optimizer Cost Models Really Unusable?
Wentao Wu, Yun Chi, Shenghuo Zhu, Junichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton.
In Proceedings of the 29th International Conference on Data Engineering (ICDE 2013): 1081-1092, 2013. [PDF] [FULL] [Slides]
- Probase: A Probabilistic Taxonomy for Text Understanding.
Wentao Wu, Hongsong Li, Haixun Wang and Kenny Q. Zhu.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2012): 481-492, 2012. [PDF] [FULL] [Slides]
- Context-aware Search for Personal Information Management Systems.
Jidong Chen, Wentao Wu, Hang Guo and Wei Wang.
In Proceedings of the 12th SIAM International Conference on Data Mining (SDM 2012): 708-719, 2012. [PDF]
- iMecho: A Context-Aware Desktop Search System.
Jidong Chen, Hang Guo, Wentao Wu and Wei Wang.
In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011): 1269-1270, 2011. [PDF]
- K-Symmetry Model for Identity Anonymization in Social Networks.
Wentao Wu, Yanghua Xiao, Wei Wang, Zhenying He and Zhihui Wang.
In Proceedings of the 13th International Conference on Extending Database Technology (EDBT 2010): 111-122, 2010. [PDF] [FULL]
2000 - 2009
- iMecho: An Associative Memory Based Desktop Search System.
Jidong Chen, Hang Guo, Wentao Wu and Wei Wang.
In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009): 731-740, 2009. [PDF]
- Personalization As A Service: The Architecture and A Case Study.
Hang Guo, Jidong Chen, Wentao Wu and Wei Wang.
In Proceedings of the 1st International CIKM Workshop on Cloud Data Management (CloudDb 2009): 1-8, 2009. [PDF]
- Search Your Memory! - An Associative Memory Based Desktop Search System.
Jidong Chen, Hang Guo, Wentao Wu and Chunxin Xie.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2009): 1099-1102, 2009. [PDF]
- Efficiently Indexing Shortest Paths by Exploiting Symmetry in Graphs.
Yanghua Xiao, Wentao Wu, Jian Pei, Wei Wang and Zhenying He.
In Proceedings of the 12th International Conference on Extending Database Technology (EDBT 2009): 493-504, 2009. [PDF]
- Efficient Algorithms for Node Disjoint Subgraph Homeomorphism Determination.
Yanghua Xiao, Wentao Wu, Wei Wang and Zhenying He.
In Proceedings of the 13th International Conference on Database Systems for Advanced Applications (DASFAA 2008): 452-460, 2008. [FULL] [arXiv]
- Structure-based Graph Distance Measures of High Degree of Precision.
Yanghua Xiao, Hua Dong, Wentao Wu, Momiao Xiong, Wei Wang and Baile Shi.
In Pattern Recognition, Vol. 41, No. 12: 3547-3561, 2008. [PDF]
- Symmetry-based Structure Entropy of Complex Networks.
Yanghua Xiao, Wentao Wu, Hui Wang, Momiao Xiong and Wei Wang.
In Physica A, Vol. 387, No. 11: 2611-2619, 2008. [PDF]
- Mining Conserved Topological Structures from Large Protein-Protein Interaction Networks.
Yanghua Xiao, Wei Wang, and Wentao Wu.
In Proceedings of the 18th IEICE Data Engineering Workshop/5th DBSJ Annual Meeting (DEWS 2007), 2007. [PDF]
Preprints
- TablePuppet: A Generic Framework for Relational Federated Learning.
Lijie Xu, Chulin Xie, Yiran Guo, Gustavo Alonso, Bo Li, Guoliang Li, Wei Wang, Wentao Wu, and Ce Zhang.
In arXiv Preprint, 2024. [arXiv]
- Quantitative Overfitting Management for Human-in-the-loop ML Application Development with ease.ml/meter.
Frances Ann Hubis, Wentao Wu, and Ce Zhang.
In arXiv Preprint, 2019. [arXiv]
- Revisiting Differentially Private Regression: Lessons From Learning Theory and their Consequences.
Xi Wu, Matthew Fredrikson, Wentao Wu, Somesh Jha, and Jeffrey F. Naughton.
In arXiv Preprint, 2015. [arXiv]
Unpublished and Miscellaneous
- A Brief Overview of Query Optimization.
Wentao Wu, 2018.
- Suppression Strikes Back: On the Interaction of Thresholding and Differential Privacy.
Xi Wu, Wentao Wu, Chen Zeng, and Jeffrey F. Naughton, 2015.
- Sampling-Based Cardinality Estimation Algorithms: A Survey and An Empirical Evaluation.
Wentao Wu, 2012.
- Probase: A Universal Knowledge Base for Semantic Search.
Zhongyuan Wang, Jiuming Huang, Hongsong Li, Bin Liu, Bin Shao, Haixun Wang, Jingjing Wang, Yue Wang, Wentao Wu, Jing Xiao, and Kenny Q. Zhu, 2010.
Theses