SaiSuresh KrishnaKumaran

Wisconsin Madison


Welcome to the Home Page of SAISURESH KRISHNAKUMARAN!

I am a graduate student in the Department of Computer Sciences at the University of Wisconsin-Madison. I received my B.E. (Bachelor of Engineering) in Computer Science and Engineering from the College of Engineering, Guindy, Anna University, India.


Courses

Spring 2006
     CS838 - Advanced Natural Language Processing
     CS747 - Advanced Computer Systems Analysis Techniques

Fall 2005
     CS701 - Construction of Compilers
     CS547 - Computer System Modelling Fundamentals
     Beginning Swimming

Spring 2005
     CS736 - Advanced Operating Systems
     CS757 - Advanced Computer Architecture II
     Ice Skating
     Introduction to Yoga Practice

Fall 2004
     CS764 - Topics in Database Management
     CS752 - Advanced Computer Architecture I

Contact Information

Mailing Address
     42 N Orchard Street,
     Madison, WI 53715
     Phone: +1-608-335-6970

Office Address
     5385, Department of Computer Sciences,
     1210 West Dayton Street,
     Madison, WI 53706

Permanent Address
     5 Astalakshmi Avenue,
     Chennai - 601302,
     Tamil Nadu, India

Email
     ksai AT cs.wisc.edu
     sai_suresh_krishna AT yahoo.com

    Please send an email to <ksai AT cs DOT wisc DOT edu>.


Publication

Saisuresh Krishnakumaran and Sai Arunachalam, "Towards Economic Trace Caches: A Profile-Based Approach," poster session of the 10th International Conference on High Performance Computing (2003), Hyderabad, India.

Abstract: Our scheme takes a profile-based approach to increasing the efficiency of the trace cache. The application program is profiled on its first run, and the resulting profile is used for all subsequent runs, provided the run-time environment does not change. The program is partitioned, and the profile distinguishes the most important traces from the less important ones in each partition. This information is then used judiciously to improve trace cache performance.
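A minimal Python sketch of the idea, for illustration only. The profile format, the 90% coverage cutoff used to call a trace "important", and the policy of admitting only important traces into an LRU trace cache are all assumptions made here, not details taken from the paper.

```python
from collections import Counter, OrderedDict


def classify_traces(profile, coverage=0.9):
    """Mark the 'important' traces of each program partition.

    profile maps a partition id to the sequence of trace ids seen in that
    partition during the profiling run (hypothetical format). Within each
    partition, the most frequent traces are taken until they cover
    `coverage` of its dynamic trace executions (an assumed criterion).
    """
    important = set()
    for traces in profile.values():
        counts = Counter(traces)
        total = sum(counts.values())
        covered = 0
        for trace, n in counts.most_common():
            if covered >= coverage * total:
                break
            important.add(trace)
            covered += n
    return important


class FilteredTraceCache:
    """LRU trace cache that admits only traces the profile marked important."""

    def __init__(self, capacity, important):
        self.capacity = capacity
        self.important = important
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, trace):
        if trace in self.lines:
            self.lines.move_to_end(trace)  # refresh LRU position
            self.hits += 1
            return True
        self.misses += 1
        if trace in self.important:  # unimportant traces bypass the cache
            self.lines[trace] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return False
```

With a one-entry cache and an important set of {"A"}, the access sequence A, C, A hits on the second A, because the rarely used trace C is never allowed to evict it.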


Graduate Projects:

"Compiler support for Transactional Memories."
(This was a two-member group project done as part of the Advanced Compiler Construction course under the guidance of Prof. Charles Fischer.)

New architectures open avenues for innovation in compiler technology. With growing interest in transactional memory as a mechanism for concurrency management on multiprocessors, compiler techniques, especially optimizations, need to be revisited. In this project we explore optimizations that can benefit from the atomicity of transactions. We investigate how traditional optimizations change and also explore a new way to use transactional architectures.

"Exploring I/O in a Virtual Machine Environment."
(This was a two-member group project done as part of the Advanced Operating Systems course under the guidance of Prof. Remzi H. Arpaci-Dusseau.)

In virtualized environments, an operating system may not have complete knowledge of its resources, because it sees only virtualized forms of the physical resources. In this paper, we show how the lack of information about a disk's activity can affect the performance of a virtual machine. Specifically, we address how information about disk idleness can be passed to virtual machines so that idle disk periods can be used effectively to maximize disk bandwidth. The main focus of this paper is a discussion of mechanisms that could be applied in a virtualized environment to expose such information and exercise control effectively. We discuss designs to infer the number of dirty pages in each domain from the VMM, and to coerce a domain to flush its dirty pages. Finally, we present an evaluation of our approaches.

"Reducing Request Bandwidth in Token Coherence."
(This was a two-member group project done as part of the Parallel Computer Architecture (Advanced Computer Architecture II) course under the guidance of Prof. Mark Hill.)

The Coarse-Grained Coherence Tracking (CGCT) technique uses knowledge of the coherence states of cache blocks in external caches to optimize broadcast requests. In this paper, we have extended CGCT to a directory protocol and analyzed how effectively this approach reduces request-broadcast overhead in the Token-B protocol. Our experiments show that we avoid more than 50% of unnecessary broadcasts. However, bandwidth measurements show only a modest improvement, about 5% savings in total network bandwidth. False sharing and workload characteristics determine the benefits that can be obtained from our technique.
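The Python sketch below illustrates the basic region-tracking idea behind CGCT in a heavily simplified form. The 512-byte region size, the two region states, and the absence of region eviction are assumptions for illustration; it does not model Token-B or the directory extension described above.

```python
REGION_SIZE = 512  # bytes per coarse-grained region (assumed)


def region_of(addr):
    return addr // REGION_SIZE


class RegionTracker:
    """Per-processor region state: 'private' means no other processor caches
    any block of the region, so a miss there needs no broadcast."""

    def __init__(self, n_procs):
        self.n_procs = n_procs
        self.regions = [dict() for _ in range(n_procs)]  # region -> state
        self.broadcasts = 0
        self.avoided = 0

    def miss(self, proc, addr):
        r = region_of(addr)
        if self.regions[proc].get(r) == "private":
            self.avoided += 1  # no other cache holds the region: skip broadcast
            return
        self.broadcasts += 1
        # The broadcast is observed by every other processor.
        shared = False
        for other in range(self.n_procs):
            if other != proc and r in self.regions[other]:
                shared = True
                self.regions[other][r] = "shared"  # downgrade their view too
        self.regions[proc][r] = "shared" if shared else "private"
```

Three misses by processor 0 to blocks of the same region cost one broadcast; once processor 1 touches that region, both processors see it as shared and must broadcast again.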

"Optimization by query reordering using the Buffer-pool."

(This was a two-member group project done as part of the Topics in Database Management course under the guidance of Prof. Jeff Naughton.)

Conventional query optimizers assume that all data are disk-resident while optimizing queries. Techniques such as Buffer Pool Aware Query Optimization try to attain better performance by utilizing the contents of the buffer pool; they analyze how data present in the buffer pool can affect an optimizer's choice of query plans. However, these techniques still leave room for improvement because they perform optimization and reordering as separate phases. In our work, we show how query reordering and optimization can be combined into a single phase to gain a significant performance improvement. Reordering queries minimizes the total I/O cost by exploiting the existing buffer pool contents. We developed two heuristics to reorder the queries and experimentally validated that one of them outperforms the optimal algorithm when the cost of computing the optimal order is included.
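A small Python sketch of buffer-pool-aware reordering, assuming each query is reduced to the list of pages it reads and the pool uses LRU replacement. The greedy rule (run next the query with the fewest non-resident pages) is a hypothetical heuristic for illustration and is not necessarily either of the two heuristics we developed. The exhaustive `optimal_order` shows why computing the optimal order is expensive: it tries all n! orders.

```python
from collections import OrderedDict
from itertools import permutations


def touch(pool, page, capacity):
    """Access `page` in an LRU pool; return 1 if it had to be read from disk."""
    if page in pool:
        pool.move_to_end(page)
        return 0
    pool[page] = True
    if len(pool) > capacity:
        pool.popitem(last=False)  # evict least recently used page
    return 1


def io_cost(order, queries, capacity):
    """Total page reads when queries run in `order` against an empty pool."""
    pool = OrderedDict()
    return sum(touch(pool, p, capacity) for q in order for p in queries[q])


def greedy_order(queries, capacity):
    """Hypothetical heuristic: next, run the query with most pages resident."""
    pool = OrderedDict()
    remaining = list(queries)
    order = []
    while remaining:
        best = min(remaining, key=lambda q: sum(p not in pool for p in queries[q]))
        remaining.remove(best)
        order.append(best)
        for p in queries[best]:
            touch(pool, p, capacity)
    return order


def optimal_order(queries, capacity):
    """Exhaustive search over all orders; exponential in the number of queries."""
    return min(permutations(queries), key=lambda o: io_cost(o, queries, capacity))
```

With queries {"Q1": ["a", "b"], "Q2": ["c", "d"], "Q3": ["a", "b"]} and a two-page pool, the submitted order Q1, Q2, Q3 costs 6 page reads, while the greedy order Q1, Q3, Q2 costs 4, which matches the exhaustive optimum here.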

"Design and Implementation of Continual Flow Pipelines (CFP)."
(This was a three-member group project done as part of the Advanced Computer Architecture I course under the guidance of Prof. Mark Hill.)

Large-instruction-window processors can achieve high performance by continuing to supply instructions during long-latency load misses, thus effectively hiding these latencies. Continual Flow Pipeline (CFP) architectures provide high performance by increasing the number of actively executing instructions without increasing the size of the cycle-critical structures. A CFP includes a Slice Processing Unit, which stores missed loads and their forward slices in a Slice Data Buffer. This frees the resources occupied by these idle instructions for use by new instructions. In this project, we designed and implemented CFP in SimpleScalar. We compared conventional pipelines to CFPs by running them on benchmarks from the SPEC integer benchmark suite. We also studied the behavior of mispredicted branches that depend on load misses, which turn out to be the main bottleneck in CFPs. Finally, we compared the performance of CFPs with ideal and non-ideal fetch mechanisms.
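The core step of a CFP, separating a missed load's forward slice from the instructions that do not depend on it, can be sketched in Python with poison bits on registers. The instruction tuple format and register names are hypothetical, and the sketch ignores memory dependences as well as the actual Slice Data Buffer and re-insertion logic.

```python
def split_forward_slice(instrs, missed_loads):
    """Partition instructions into the forward slice of missed loads and the rest.

    instrs: list of (name, dest_reg, src_regs) in program order (hypothetical format).
    missed_loads: names of load instructions that missed in the cache.
    Returns (slice_names, independent_names).
    """
    poisoned = set()  # registers whose values depend on a missed load
    slice_names, independent = [], []
    for name, dest, srcs in instrs:
        if name in missed_loads or any(s in poisoned for s in srcs):
            slice_names.append(name)  # would wait in the Slice Data Buffer
            if dest is not None:
                poisoned.add(dest)
        else:
            independent.append(name)  # can keep executing
            if dest is not None:
                poisoned.discard(dest)  # overwritten with a valid value
    return slice_names, independent
```

For example, if `ld1` misses and writes `r1`, then `add1` (reading `r1`) and `mul1` (reading `add1`'s result) join the slice, while a later `mov1` that overwrites `r1` with an independent value un-poisons it, so instructions reading `r1` after that point proceed normally.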

Current Course Information:

"Natural Language Processing."
 (Project information will be updated soon.)

Course web site: http://www.cs.wisc.edu/~jerryzhu/cs838.html
Take a look at an interesting survey on semi-supervised learning.

Undergraduate Projects:

"A Soft Core Processor for Parameterized HPL-PD Architecture."
(This was my undergraduate senior-year project, done under the guidance of Dr. Ranjani Parthasarathi.)

Working in a two-member team, I was involved in the design and development of a soft core, written in Handel-C, for HPL-PD, a parameterized EPIC architecture. Its main purpose is to serve as a tool for investigating processor architectures that exploit significant ILP with compiler support. The core can serve as an effective research medium that shortens the time to hardware realization. Because the soft core can be synthesized onto an FPGA, it can give more accurate results than software simulators.

"Disk access optimization using 'deferred copy' and disk block sharing in the UNIX file system."
Added fields to the inode and disk block structures of the UNIX file system. These fields were manipulated by new system calls in order to optimize disk block accesses.
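One plausible reading of "deferred copy" is copy-on-write at the disk-block level. The Python sketch below models it under that assumption: a hypothetical per-block reference count stands in for the added disk-block fields, and a copied file only gets its own block when one of the sharing files writes to it. The function names are illustrative, not the actual system calls.

```python
class Disk:
    """Toy disk: block contents plus a hypothetical per-block sharing count."""

    def __init__(self):
        self.blocks = {}    # block number -> contents
        self.refcount = {}  # block number -> number of inodes sharing it
        self.next_free = 0
        self.copies = 0     # physical block copies actually performed

    def alloc(self, data):
        b = self.next_free
        self.next_free += 1
        self.blocks[b] = data
        self.refcount[b] = 1
        return b


class Inode:
    def __init__(self, block_list=None):
        self.blocks = list(block_list or [])  # private copy of the block map


def deferred_copy(disk, src):
    """'Copy' a file by sharing its blocks instead of duplicating them."""
    for b in src.blocks:
        disk.refcount[b] += 1
    return Inode(src.blocks)


def write_block(disk, inode, i, data):
    """Write block i of a file, copying it first only if it is shared."""
    b = inode.blocks[i]
    if disk.refcount[b] > 1:  # shared: perform the deferred copy now
        disk.refcount[b] -= 1
        disk.copies += 1
        inode.blocks[i] = disk.alloc(data)
    else:
        disk.blocks[b] = data  # sole owner: update in place
```

Copying a two-block file this way touches no data blocks; a later write to one block of the copy duplicates only that block and leaves the original file's contents unchanged.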

"Re-configurable Architectural Kit (RAK)."
RAK is implemented in VHDL and serves as a platform for analyzing the performance of various static processor configurations. I worked on it as part of the SIGARCH group at Anna University, adding a number of modules to the kit and improving some existing ones. I was also involved in porting the VHDL bit files onto FPGA boards.

"A compiler using LEX and YACC."
Developed a compiler for a subset of the C programming language using lex and yacc.

"Dynamic Instruction Reuse."
Implemented dynamic instruction reuse in the SimpleScalar 3.0 toolkit and analyzed the resulting performance gain. Compared the relative performance of dynamic instruction reuse and value prediction. We also observed that combining these two techniques brings a significant improvement in processor performance.
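A Python sketch contrasting the two mechanisms, assuming a value-based reuse buffer keyed by PC and operand values and a simple last-value predictor; the buffer size, PCs, and instruction stream are hypothetical. Reuse needs the operands to repeat, while last-value prediction needs the result to repeat, which is one way to see why the two can complement each other.

```python
import operator
from collections import Counter, OrderedDict


class ReuseBuffer:
    """Caches results keyed by (pc, operand values), with LRU replacement."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.reused = Counter()  # per-PC count of reused results

    def execute(self, pc, op, a, b):
        key = (pc, a, b)
        if key in self.entries:
            self.entries.move_to_end(key)
            self.reused[pc] += 1
            return self.entries[key]  # skip the functional unit
        result = op(a, b)
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return result


class LastValuePredictor:
    """Predicts that an instruction produces the same result as last time."""

    def __init__(self):
        self.last = {}
        self.correct = Counter()  # per-PC count of correct predictions

    def observe(self, pc, actual):
        if pc in self.last and self.last[pc] == actual:
            self.correct[pc] += 1
        self.last[pc] = actual


rb, lvp = ReuseBuffer(), LastValuePredictor()
for i in range(10):
    # pc 0x10: operands alternate, so the result alternates 3, 4, 3, 4, ...
    lvp.observe(0x10, rb.execute(0x10, operator.add, i % 2, 3))
    # pc 0x20: operands never repeat, but the result is always 0
    lvp.observe(0x20, rb.execute(0x20, operator.mul, i, 0))
```

In this toy stream the reuse buffer reuses 8 results at PC 0x10 and none at 0x20, while the last-value predictor is right 9 times at 0x20 and never at 0x10.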

"Simulation of Task Scheduling and Interrupt Processing by Microprocessors."

"Implementation of 'A mail server and a messenger' designed using Rational Rose."

"Device Driver for a virtual CD drive in a Linux environment."

"Simulation of the Control Logic for an Automatic Teller Machine using VHDL."

"Designed and implemented an Automatic Traffic Controller using digital circuits."


Miscellaneous

    Secured second place in the 2004 ACM North Central America Programming Contest.

     Team members (Team Q): Jesse Beder, Dylan Dewitt, Saisuresh Krishnakumaran
     An interesting article in "The Capital Times" about the achievements of the UW teams (including ours, "Team Q") at the regionals.
     The article as it appeared in the paper (continuation).


Page last modified: Monday, 23-May-2005 19:30:01 CDT
Feedback or content questions: send email to ksai [at] cs.wisc.edu
Technical or accessibility issues: lab@cs.wisc.edu
Copyright © 2002, 2003, 2004 The Board of Regents of the University of Wisconsin System.
