Polymorphous Architectures: A Unified Approach for Extracting Concurrency of Different Granularities


Karthikeyan Sankaralingam. Polymorphous Architectures: A Unified Approach for Extracting Concurrency of Different Granularities. Ph.D. Thesis, The University of Texas at Austin, Department of Computer Sciences, 2006.

Download

[PDF] [HTML]

Abstract

Processor architects today are faced with two daunting challenges: emerging applications with heterogeneous computation needs and technology limitations of power, wire delay, and process variation. Designing multiple application-specific processors or specialized architectures introduces design complexity, creates a software programmability problem, and reduces economies of scale. There is a pressing need for design methodologies that can provide support for heterogeneous applications, combat processor complexity, and achieve economies of scale. In this dissertation, we introduce the notion of architectural polymorphism to build such scalable processors that support heterogeneous computation by supporting different granularities of parallelism. Polymorphism configures coarse-grained microarchitecture blocks to provide an adaptive and flexible processor substrate. Technology scalability is achieved by designing an architecture using scalable and modular microarchitecture blocks.

We use the dataflow graph as the unifying abstraction layer across three granularities of parallelism: instruction-level, thread-level, and data-level. To first order, this granularity of parallelism is the main difference between different classes of applications. All programs are expressed in terms of dataflow graphs and directly mapped to the hardware, appropriately partitioned as required by the granularity of parallelism. We introduce Explicit Data Graph Execution (EDGE) ISAs, a class of ISAs that serve as an architectural solution for efficiently expressing parallelism and building technology-scalable architectures.

We developed the TRIPS architecture, which implements an EDGE ISA using a heavily partitioned and distributed microarchitecture to achieve technology scalability. The two most significant features of the TRIPS microarchitecture are its heavily partitioned and modular design, and its use of microarchitecture networks for communication across modules. We have also built a prototype TRIPS chip in 130nm ASIC technology, composed of two processor cores and a distributed 1MB non-uniform cache access (NUCA) on-chip memory system. Our performance results show that the TRIPS microarchitecture, which provides a 16-issue machine with a 1024-entry instruction window, can sustain good instruction-level parallelism. On a set of hand-optimized kernels, we see IPCs in the range of 4 to 6, and on a set of benchmarks with ample data-level parallelism (DLP), compiler-generated code produces IPCs in the range of 1 to 4. On the EEMBC and SPEC CPU2000 benchmarks we see IPCs in the range of 0.5 to 2.3. Compared to the Alpha 21264, a high-performance architecture tuned for ILP, TRIPS is up to 3.4 times better on the hand-optimized kernels. However, compiler-generated binaries for the DLP, EEMBC, and SPEC CPU2000 benchmarks perform worse on TRIPS than on the Alpha 21264. With more aggressive compiler optimization, we expect the performance of the compiler-produced binaries to improve.

The polymorphous mechanisms proposed in this dissertation are effective at exploiting thread-level parallelism and data-level parallelism. When executing four threads on a single processor, we see high levels of processor utilization; IPCs are in the range of 0.7 to 3.9 for an application mix of EEMBC and SPEC CPU2000 workloads. When executing programs with DLP, the polymorphous mechanisms we propose provide a harmonic mean speedup of 2.1X across a set of DLP workloads, compared to an execution model that extracts only ILP. Compared to specialized architectures, these mechanisms provide competitive performance using a single execution substrate.
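The speedup figures above are summarized with a harmonic mean, which weights each workload by its execution time so that one large outlier cannot dominate the summary number. A minimal sketch of that computation (the per-workload speedups below are hypothetical, not data from the dissertation):

```python
def harmonic_mean_speedup(speedups):
    # Harmonic mean of per-workload speedups: n / sum(1/s_i).
    # Equivalent to total baseline time divided by total new time
    # when all workloads have equal baseline runtimes.
    n = len(speedups)
    return n / sum(1.0 / s for s in speedups)

# Hypothetical speedups over an ILP-only execution model:
example = [1.5, 2.0, 2.5, 3.0]
print(round(harmonic_mean_speedup(example), 2))  # → 2.11
```

Note that the harmonic mean (2.11 here) is lower than the arithmetic mean (2.25) of the same numbers, which is why it is the conservative choice for reporting speedups.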


BibTeX

 @phdthesis{diss06:sankaralingam,
   author={Karthikeyan Sankaralingam},
   title="{Polymorphous Architectures: A Unified Approach for Extracting Concurrency of Different Granularities}",
   school={The University of Texas at Austin, Department of Computer Sciences},
   year={2006},
   month={October},
   abstract = {
 Processor architects today are faced by two daunting challenges:
 emerging applications with heterogeneous computation needs and
 technology limitations of power, wire delay, and process
 variation. Designing multiple application-specific processors or
 specialized architectures introduces design complexity, a software
 programmability problem, and reduces economies of scale. There is a
 pressing need for design methodologies that can provide support for
 heterogeneous applications, combat processor complexity, and achieve
 economies of scale. In this dissertation, we introduce the notion of
 architectural polymorphism to build such scalable processors that
 provide support for heterogeneous computation by supporting different
 granularities of parallelism. Polymorphism configures coarse-grained
 microarchitecture blocks to provide an adaptive and flexible processor
 substrate. Technology scalability is achieved by designing an
 architecture using scalable and modular microarchitecture blocks.
 We use the dataflow graph as the unifying abstraction layer across
 three granularities of parallelism--instruction-level, thread-level,
 and data-level. To first order, this granularity of parallelism is the
 main difference between different classes of applications. All
 programs are expressed in terms of dataflow graphs and directly mapped
 to the hardware, appropriately partitioned as required by the
 granularity of parallelism. We introduce Explicit Data Graph Execution
 (EDGE) ISAs, a class of ISAs as an architectural solution for
 efficiently expressing parallelism for building technology scalable
 architectures.
 We developed the TRIPS architecture implementing an EDGE ISA using a
 heavily partitioned and distributed microarchitecture to achieve
 technology scalability. The two most significant features of the TRIPS
 microarchitecture are its heavily partitioned and modular design, and
 the use of microarchitecture networks for communication across
 modules. We have also built a prototype TRIPS chip in 130nm ASIC
 technology composed of two processor cores and a distributed 1MB
 Non-Uniform Cache Access Architecture (NUCA) on-chip memory system.
 Our performance results show that the TRIPS microarchitecture which
 provides a 16-issue machine with a 1024-entry instruction window can
 sustain good instruction-level parallelism. On a set of hand-optimized
 kernels IPCs in the range of 4 to 6 are seen, and on a set of
 benchmarks with ample data-level parallelism (DLP), compiler generated
 code produces IPCs in the range of 1 to 4. On the EEMBC and SPEC
 CPU2000 benchmarks we see IPCs in the range of 0.5 to 2.3. Comparing
 performance to the Alpha 21264, which is a high performance
 architecture tuned for ILP, TRIPS is up to 3.4 times better on the
 hand optimized kernels. However, compiler generated binaries for the
 DLP, EEMBC, and SPEC CPU2000 benchmarks perform worse on TRIPS
 compared to an Alpha 21264. With more aggressive compiler optimization
 we expect the performance of the compiler produced binaries to
 improve.
 The polymorphous mechanisms proposed in this dissertation are
 effective at exploiting thread-level parallelism and data-level
 parallelism. When executing four threads on a single processor,
 significantly high levels of processor utilization are seen; IPCs are
 in the range of 0.7 to 3.9 for an application mix consisting of EEMBC
 and SPEC CPU2000 workloads. When executing programs with DLP, the
 polymorphous mechanisms we propose provide harmonic mean speedups of
 2.1X across a set of DLP workloads, compared to an execution model of
 extracting only ILP. Compared to specialized architectures, these
 mechanisms provide competitive performance using a single execution
 substrate.
 },
   bib_dl_pdf = "http://www.cs.wisc.edu/~karu/docs/diss/karu-diss.pdf",
   bib_dl = "http://www.cs.wisc.edu/~karu/docs/diss/",
   bib_pubtype = {Other},
   bib_rescat = {Simulation,Hardware,Architecture},
 }

Generated by bib.pl (written by Patrick Riley) on Sat Jul 15, 2017 14:55:44


Page last modified on December 12, 2017