Graduate Student, Computer Architecture
My main research focus is Computer Architecture. Since Spring 2013, I have been working with Prof. Karu Sankaralingam in the Vertical Research Group. My main interests lie in processor microarchitecture, programmable accelerators, von Neumann/dataflow architectures and GPU computing/microarchitecture.
Refer to our HPCA 2016 publication for more details on the preliminary idea behind this research → [pdf]
The main focus of my PhD thesis has been exploring and designing an efficient, general-purpose programmable accelerator that serves multiple application domains. We believe that any such programmable accelerator architecture must employ five important specialization principles [Concurrency, Spatial Communication, Computation, Data-Reuse and Coordination] to achieve a good balance between generality and efficiency.
In recent years, with Moore's law ending (or slowing) and Dennard scaling having broken down, domain-specific accelerators (DSAs) have become increasingly popular. This has led to proposals for a "sea of accelerators" that act as co-processors or offload engines for the main host processor. However, given the growing area and power costs of such designs, it is important to consider accelerators that are general-purpose programmable across multiple application domains, thereby saving area, power and NRE (non-recurring engineering) design costs. This project focuses on one such big-picture idea: a generic programmable accelerator fabric that can be tuned to accelerate multiple application domains by employing simple specialization principles.
GenAccel -- General Purpose Programmable Hardware Accelerator
More details coming soon.
Currently, my research focuses on designing and implementing one such instance of a general-purpose programmable hardware accelerator, named GenAccel, which employs all of the specialization principles explained above. This specialization fabric aims to accelerate any general-purpose, data-intensive application with regular memory access patterns, matching the performance of an ASIC or DSA with only a 2x-4x overhead in area/power.
We are building a complete programming framework for such specialization fabrics, along with a Chisel-based hardware design framework for exploring different microarchitectural design decisions for these fabrics.
We plan to release the entire framework as open source, so that others can explore such programmable accelerators, and to push forward research on easier programming models and frameworks for these fabrics.
Von-Neumann/Dataflow Hybrid Architecture
More details are available in our ISCA 2015 paper → [pdf]
I began my PhD working on a project with a senior research member of the Vertical Research Group. This work focuses on a von Neumann/dataflow hybrid architecture that exploits Instruction Level Parallelism (ILP) in irregular applications. General-purpose processors incur significant power overheads from heavyweight instruction processing and from the hardware support needed to dynamically extract the data-dependence graph (DDG). Hybrid architectures with explicit dataflow support can directly offload and execute the DDG for coarser-grained program regions, gaining better performance and energy efficiency. The challenge, however, is to achieve these benefits without the excess hardware complexity and switching overheads of dataflow architectures. This work addresses those challenges with a novel dataflow microarchitecture inspired by decades of dataflow research, and analyzes how irregular applications can take advantage of the large amount of instruction-level parallelism it exposes.
Master's Research [Open Source GPGPU → MIAOW]
Published in HotChips 27, CoolChips-XVIII and the ACM TACO journal (Volume 12, Issue 2, 2015): Link to Publications
Interested readers can refer to the MIAOW White paper and MIAOW Poster.
The entire source code has been publicly released and is available here: Github Source Code.
A detailed wiki walks through compiling MIAOW and running tests: MIAOW Wiki
For more support or any questions, contact: email@example.com
For my Master's research, I was fortunate to be an active member of the MIAOW GPU group. MIAOW (Many-Core Accelerator of Wisconsin) implements a General Purpose GPU (GPGPU) compute unit, cache modules and an ultra-threaded dispatcher based on the AMD Southern Islands ISA. Unmodified OpenCL GPU kernels can run on MIAOW, which provides realistic area and power estimates. MIAOW does not aim to reproduce the performance of an industry-standard GPU; it is built solely for low-level GPU microarchitecture research.
A primary motivation for creating MIAOW is the belief that software simulators of hardware such as CPUs and GPUs often miss many subtle aspects that can skew the performance, power and other quantitative results they produce. Because MIAOW is an actual implementation of a GPU's logic, the Vertical Research Group believes it can be a useful tool not only for producing more accurate quantitative results when benchmarking GPGPU workloads, but also for providing context on the architectural complexities of actually implementing newly proposed algorithms and designs intended to improve performance or other desired characteristics.