Projects

This page is OUTDATED!

Contact me for the latest resume.

Automatic Resume Segmentation

Worked with the Yahoo! HotJobs team on the research, design and development of a smart tool that automatically extracts relevant information from a resume. The tool employed a novel approach that eliminated the need for manual training. Coded in Perl and C++. I did this project during my internship at Yahoo.

Virtual Machines in Grid Environments

The ability to securely run arbitrary untrusted code on a wide variety of execution platforms is a challenging problem in the Grid community. One way to achieve this is to run the code inside a contained, isolated environment, namely a "sandbox". Virtual machines provide a natural solution to the security and resource management issues that arise in sandboxing. We explore different designs for the VM-enabled sandbox and evaluate them with respect to various factors like structure, security guarantees, user convenience, feasibility and overheads in one such grid environment. Our experiments indicate that the use of on-demand VMs imposes a constant startup overhead, with I/O-intensive applications incurring additional overheads depending on the design of the sandbox. Used Xen VMM and Condor. Appeared in the proceedings of the 2nd Workshop on Real, Large Distributed Systems (WORLDS '05), December 2005, San Francisco. Click here for the paper. This project was done as part of my Graduate Distributed Systems Class.

Clustering of Images using LDA

Image representation and vocabulary definition assume greater importance when clustering methods that have worked well for text are applied to images. In this project, which was done as part of my Graduate Computer Vision class, the Latent Dirichlet Allocation (LDA) model was applied to the problem of image clustering. The LDA model provides a highly compressed yet succinct model of an image, which can be used for applications like clustering, retrieval, and classification. A bag-of-words model was used, with image segments forming the vocabulary words. Experimented with simple datasets. Read the paper here.
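As a rough illustration of the bag-of-words step, the sketch below turns the segments of one image into a word-count vector of the kind an LDA model takes as input. It is not the project's code: the feature vectors, the fixed vocabulary of centroids, and the names nearest_word and bag_of_words are all assumptions made for the example.

```python
from collections import Counter

def nearest_word(feature, vocabulary):
    """Return the index of the vocabulary centroid closest to `feature`
    (squared Euclidean distance). Both are hypothetical numeric tuples."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((f - v) ** 2 for f, v in zip(feature, vocabulary[i])),
    )

def bag_of_words(segment_features, vocabulary):
    """Map each image segment to its nearest 'visual word' and count
    how often each word occurs in the image."""
    counts = Counter(nearest_word(f, vocabulary) for f in segment_features)
    return [counts.get(i, 0) for i in range(len(vocabulary))]

# Illustrative data only: two visual words, three segments from one image.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
segments = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)]
histogram = bag_of_words(segments, vocabulary)
```

Each image then becomes one such histogram, playing the role that a document's word counts play in text clustering.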

Survey of Coreference Resolution Techniques

Coreference resolution is the task of resolving noun phrases to the entities that they refer to. Much work has been done in the past in this area and the related area of anaphora resolution. In this paper, we present a literature survey that is divided into two broad categories. Discussed first are papers that are linguistically motivated, based on syntax, focus and Centering theory. We then discuss machine learning techniques that are applied to coreference resolution, which include decision trees, conditional random fields, clustering, co-training, and others. Further, we discuss evaluation methods and coreference classes, and examine simple genre-dependent noun phrase characteristics of the Brown corpus. Finally, a preliminary proposal is presented, along with future directions. The project was done as part of my Advanced Natural Language Processing graduate class.

Utilization of Disk Idleness in a Virtual Machine Environment

In virtualized environments, an operating system may not have complete knowledge about its resources, as it sees only virtualized forms of physical resources. In this paper, we show how the lack of information about a disk's activity can affect performance of a virtual machine. Specifically, we address how information about disk idleness can be passed to virtual machines so that idle disk periods can be effectively utilized to maximize disk bandwidth. The main focus of this paper is the discussion of various mechanisms that could be applicable in a virtualized environment in order to effectively expose such information and exercise control. We discuss designs to infer the number of dirty pages in each domain from the VMM, and to coerce a domain to flush its dirty pages. Finally, we present an evaluation of our approaches. Designed, implemented and evaluated various approaches to transparently pass information and exercise control over guest operating systems from a host OS. Modified various parts of Xen VMM and Linux code. Appeared in the Proceedings of the Workshop on Interaction between Operating Systems and Computer Architecture (WIOSCA '06) held in Boston, 2006. Click here for our paper. This project was done as part of my Graduate Operating Systems Class.

Domain-Specific Document Clustering

This project was a two-semester independent study during my junior year in undergrad. Got introduced to Information Retrieval and Natural Language Processing. Surveyed lexical, syntactic and semantic approaches. Developed and experimented with various models. Coded in C and Perl.

Predictive Texting for Tamil

The T9 (Text on 9 keys) system is now popular in modern mobile phones. It reduces the number of keystrokes required to spell a word when using a mobile phone for services such as text messaging, by guessing the most likely word given a sequence of input digits. I designed and developed such a Predictive Texting System for the Tamil language. This included compiling a sizeable lexicon of Tamil words represented in ISCII format. Appeared in the proceedings of SIMPLE '04 (Symposium on Indian Morphology, Phonology, and Language Engineering), March 2004, Kharagpur, India. This project was done as a voluntary independent study at the Indian Institute of Science.
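The digit-to-word guessing described above can be sketched as a lookup table keyed by digit sequences. This is a minimal illustration, not the Tamil system itself: it uses the standard English keypad layout, and the lexicon, its frequency counts, and the function names are hypothetical. A Tamil version would need its own letter-to-key mapping, which the text does not specify.

```python
from collections import defaultdict

# Standard English phone keypad; used here only for illustration.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}

def word_to_digits(word):
    """Encode a word as the key sequence that would type it."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())

def build_index(lexicon):
    """Group words by key sequence, most frequent word first.
    `lexicon` maps word -> frequency count (hypothetical data)."""
    groups = defaultdict(list)
    for word, freq in lexicon.items():
        groups[word_to_digits(word)].append((freq, word))
    return {
        digits: [w for _, w in sorted(pairs, key=lambda p: (-p[0], p[1]))]
        for digits, pairs in groups.items()
    }

def predict(index, digits):
    """Return candidate words for a digit sequence, best guess first."""
    return index.get(digits, [])

# Illustrative lexicon: all four words share the key sequence 4663.
lexicon = {"home": 50, "good": 80, "gone": 30, "hood": 5}
index = build_index(lexicon)
candidates = predict(index, "4663")
```

Here one key press per letter replaces multi-tap entry, and the frequency ordering decides which of the colliding words is offered first.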

Design and Implementation of a Distributed Linux-based Kernel

As an independent study project, we designed a distributed Linux-based kernel, and implemented it by modifying a few parts of the Linux 2.4.18 kernel. I handled the process migration part of the project. We added system calls, changed the fork() routine, modified task descriptor data structures and implemented a few networking functions at the kernel level. This was done during my undergrad senior year.

Design and Implementation of Continual Flow Pipelines

Large instruction window processors can achieve high performance by supplying more instructions during long-latency load misses, thus effectively hiding these latencies. Continual Flow Pipeline (CFP) architectures provide high performance by effectively increasing the number of actively executing instructions without increasing the size of the cycle-critical structures. A CFP consists of a Slice Processing Unit which stores missed loads and their forward slices inside a Slice Data Buffer. This makes it possible to open up the resources occupied by these idle instructions to new instructions. In this project, we designed and implemented CFP in SimpleScalar. We compared conventional pipelines to CFPs by running them on various benchmarks in the SPEC integer benchmark suite. We also studied the behavior of mispredicted branches dependent on load misses, which turn out to be the main bottleneck in CFPs. We also compared the performance of CFPs with ideal and non-ideal fetch mechanisms. Read our project report here. This project was done as part of my Graduate Advanced Computer Architecture Class.

Design and Implementation of a Network Traffic Analyzer

We designed software in Java that provides different ways of analyzing network traffic. We implemented efficient flow sampling algorithms to store network flows, and used clustering algorithms for the analysis of traffic data. Done as part of the Introduction to Computer Networks class.
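The text does not say which flow sampling algorithms were used. As one possible illustration only, the sketch below uses reservoir sampling, which keeps a fixed-size uniform sample of flows from a stream of unknown length. The flow tuples and the name reservoir_sample are assumptions for the example, and it is written in Python rather than the project's Java.

```python
import random

def reservoir_sample(flows, k, rng=None):
    """Keep a uniform random sample of at most k items from an iterable
    of flows, using O(k) memory (reservoir sampling, Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, flow in enumerate(flows):
        if i < k:
            # Fill the reservoir with the first k flows.
            sample.append(flow)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = flow
    return sample

# Hypothetical flow records: (source address, destination address, bytes).
flows = [("10.0.0.%d" % i, "10.0.1.1", 100 * i) for i in range(1, 21)]
sampled = reservoir_sample(flows, 5, rng=random.Random(0))
```

The sample size stays constant no matter how many flows arrive, which is the property that makes this kind of sampling attractive for storing traffic data.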

Design and Implementation of a Java Program Auralizer

We created a tool for auralizing Java programs. When executed, an auralized Java program sends commands to a sound server, which carries them out to produce sounds. Likely applications are debugging, announcement of special events such as intrusion attempts, and pure entertainment. The software package was developed in Java based on a client-server model. The project involved the design of a small programming language, the implementation of its compiler, and networking, among other issues. This project was done during my undergrad senior year as part of the Software Engineering class.

Compiler Design and Implementation

Developed a compiler in C from scratch for a procedural language called Cradle. Done as part of my undergrad Programming Languages & Compiler Construction class.

Design and Implementation of a Mini-Operating System

Developed a boot-strapping OS with a command-line interface and minimal file operations, in x86 Assembly (as part of the undergrad OS course).