Arun Kumar

4354 Computer Sciences,
1210 West Dayton Street,
Madison, WI-53706
Email: (1stname)@cs.wisc.edu
Phone: 608-890-0447


I am a fourth-year graduate student in the Department of Computer Sciences at the University of Wisconsin-Madison. I obtained my Bachelor's degree in Computer Science and Engineering from the Indian Institute of Technology, Madras.

Research:

My primary research interests are in data management, especially the intersection of data management and machine learning (popularly known as "Big Data Analytics"). I am also interested in applied machine learning.

I am advised by Professor Christopher Ré in the Database Group here. My current research focuses on problems in data analytics and managing uncertain data, as part of Project Hazy.

I spent two summers interning at Oracle Labs (2012) and IBM Research Almaden (2011), working on projects in data analytics. I was an RA at the Microsoft Jim Gray Systems Lab for a year (2010-11). In the past, I have worked on data management issues in networked systems.

News:

  • Article in ACM Queue magazine (also invited to CACM) on Project Hazy's approach to Big Data Analytics!
  • Talk on Brainwash at CIDR 2013
  • Poster on Bismarck at Wisconsin DBA 2012
  • Talk on Staccato at VLDB 2012
  • Talk on Bismarck at SIGMOD 2012
  • Poster on Staccato at Wisconsin DBA 2011

Projects/Publications:

Brainwash
In this project, we aim to systematize the black art of feature engineering in analytics using data management ideas.
  • Brainwash: A Data System for Feature Engineering
    Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang
    CIDR 2013 [Vision Paper]
Victor/Bismarck
In this project, we are building a unified system to handle several data analytics techniques. We use some ideas from the mathematical optimization literature and standard RDBMS features to achieve simplicity, efficiency and scalability. We also contributed some code to the open-source library MADlib.
  • Towards a Unified Architecture for in-RDBMS Analytics
    Xixuan Feng, Arun Kumar, Benjamin Recht, and Christopher Ré
    ACM SIGMOD 2012 [Paper] [TechReport] [Code and Data]
  • The MADlib Analytics Library or MAD Skills, the SQL
    Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar
    VLDB 2012 (Industrial Track) [Paper]

Staccato
In this project, we integrate the management of uncertain content, specifically Optical Character Recognition (OCR) data, with an RDBMS. We use a probabilistic model and some ideas from statistics to trade off quality against performance.

Past Projects/Publications:

InfoNames: Information-based naming of networked content
  • Flexible Multimedia Content Retrieval Using InfoNames
    Arun Kumar, Ashok Anand, Athula Balachandran, Vyas Sekar, Aditya Akella, and Srinivasan Seshan
    ACM SIGCOMM 2010 (Demo) [pdf]
  • InfoNames: An Information-Based Naming Scheme for Multimedia Content
    Arun Kumar, Athula Balachandran, Vyas Sekar, Aditya Akella, and Srinivasan Seshan
    UW-Madison Technical Report TR1677 [pdf]

WSNs: Mobile agent-based data collection in sensor networks
  • On Reducing Delay in Mobile Data Collection-based WSNs
    Arun K. Kumar, Krishna M. Sivalingam, and Adithya Kumar
    Springer Wireless Networks 2012 [pdf]
  • Energy-Efficient Mobile Data Collection in WSNs with Delay Reduction using Wireless Communication
    Arun K. Kumar and Krishna M. Sivalingam
    IEEE/ACM COMSNETS 2010 [pdf]


Courses:

MA443: Applied Linear Algebra (Fall 2012)
ST632: Stochastic Processes (Fall 2011)
CS769: Advanced Natural Language Processing (Spring 2011)
CS537: Operating Systems (Spring 2011)
CS760: Machine Learning (Fall 2010)
CS787: Advanced Algorithms (Fall 2010)
CS764: Advanced Database Management Systems (Spring 2010)
CS740: Advanced Computer Networks (Spring 2010)
CS784: Data Models and Languages (Fall 2009)
CS838: Rethinking the Internet Architecture (Fall 2009)


Misc fun stuff:


Last Updated: Dec 25, 2012