--------------------------------------------------------------------
CS 758 Programming Multicore Processors
Fall 2012 Section 1
Instructor Mark D. Hill
--------------------------------------------------------------------

------------
DSL, etc.
------------

OUTLINE

* Project Proposal Due
* MapReduce
* LINQ
* Domain-Specific Languages
* (Warehouse-Scale Computers)

-------------

Dean and Ghemawat, MapReduce, OSDI 2004

Motivation
* Google needs BS grads to write programs for their clusters
* Want to abstract away parallelism, distribution, and fault tolerance

Example: Counting Word Frequency in a Large Corpus

MAP (String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

REDUCE (String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

Implementation Notes
* Done by gurus
* MAP is completely parallel
* REDUCE must be associative
* REDUCE requires gathering map records with like keys (e.g., by hashing)
* Load balance, fault tolerance, etc.

C.f. Hadoop -- open-source implementation for clusters
C.f. Phoenix, Ranger et al., HPCA 2007 -- study & code for multicore (shared memory)

How will we do parallel programming? Models:
* Note: data parallel, MapReduce, SQL, LINQ, etc.
  --> very high level

---------------------
Micro LINQ
---------------------

Language Integrated Query (LINQ, pronounced "link")
Visual Basic & C# -- queries
Related to SQL and Haskell
http://en.wikipedia.org/wiki/Language_Integrated_Query

var results = SomeCollection
                .Where(c => c.SomeProperty < 10)
                .Select(c => new {c.SomeProperty, c.OtherProperty});

foreach (var result in results)
{
    Console.WriteLine(result.ToString());
}

Domain-Specific Languages
-------------------------

http://en.wikipedia.org/wiki/Domain-specific_language

A domain-specific language is created specifically to solve problems in a particular domain and is not intended to be able to solve problems outside it (although that may be technically possible). In contrast, general-purpose languages are created to solve problems in many domains. The domain can also be a business area. Some examples of business areas include:

* domain-specific language for life insurance policies developed internally in a large insurance enterprise
* domain-specific language for combat simulation
* domain-specific language for salary calculation
* domain-specific language for billing

A domain-specific language is somewhere between a tiny programming language and a scripting language, and is often used in a way analogous to a programming library. The boundaries between these concepts are quite blurry, much like the boundary between scripting languages and general-purpose languages.

Examples
* Spiral for DSP
  http://www.spiral.net/
* Green-Marl: A DSL for Easy and Efficient Graph Analysis
  Sungpack Hong, Hassan Chafi, Eric Sedlar, and Kunle Olukotun
  ASPLOS 2012
* Stanford ... http://cgo2012.hyperdsls.org/ .. make DSL writing easier

DSL advantages/disadvantages
http://en.wikipedia.org/wiki/Domain-specific_language

Some of the advantages:[1][2]

* Domain-specific languages allow solutions to be expressed in the idiom and at the level of abstraction of the problem domain.
  The idea is that domain experts themselves may understand, validate, modify, and often even develop domain-specific language programs. However, this is seldom the case.[5]
* Self-documenting code.
* Domain-specific languages enhance quality, productivity, reliability, maintainability, portability, and reusability.
* Domain-specific languages allow validation at the domain level. As long as the language constructs are safe, any sentence written with them can be considered safe.

Some of the disadvantages:

* Cost of learning a new language vs. its limited applicability
* Cost of designing, implementing, and maintaining a domain-specific language, as well as the tools required to develop with it (IDE)
* Finding, setting, and maintaining proper scope
* Difficulty of balancing trade-offs between domain-specificity and general-purpose programming language constructs
* Potential loss of processor efficiency compared with hand-coded software
* Proliferation of similar non-standard domain-specific languages, e.g., a DSL used within insurance company A versus a DSL used within insurance company B.[6]
* Non-technical domain experts can find it hard to write or modify DSL programs by themselves.[5]
* Increased difficulty of integrating the DSL with other components of the IT system (as compared to integrating with a general-purpose language)
* Low supply of experts in a particular DSL tends to raise labor costs
* Harder to find code examples

------------------------------

----------------------------
Warehouse-Scale Computers
----------------------------

See Synthesis Lecture -- Barroso and Holzle

Luiz Andre Barroso, Jeffrey Dean, Urs Holzle,
Web Search for a Planet: The Google Cluster Architecture,
IEEE Micro, 23(2):22-28, March-April 2003

Comments by Mark D. Hill, 15 March 2004

Commodity Parts
---------------

* mid-range PCs with big disks
* carefully amortize cost
* lots of thread-level parallelism, but SMP prices don't make sense

Google Query
------------

* DNS lookup with load balancing
* phase 1: inverted-index lookup for pages that match
  * split among shards, with several machines per shard, selected by load balancing
  * produces docids (document ids)
* phase 2: look up docs (at least their beginnings)

GETTING SMALLER ...

SeaMicro HotChips 2011 slides
* Start w/ slide 14 -- big picture
* Slide 11 -- four virtual devices: PCIe Ethernet, 4 SATA disks, BIOS, UART

GETTING LARGER ...

Shipping Containers
* 40' Dry Freight Container
  Outside: 40' x 8' x 8'6'' (length x width x height)
* 20' Dry Freight Container
  Outside: 19'10'' x 8' x 8'6'' (length x width x height)
* and others.

Three plugs:
* Power: 220 V or higher, three phase?
* Chilled water in, then out
* Network

Advantages
* Little on-site assembly (time)
* Measurement isolation: e.g., performance/watt is clear
* Gives vendor more degrees of freedom
* Service-level agreement (SLA) regarding failures?
* Bid vendors on a big item
* Become commodity?

Disadvantages
* Large-grain (can't buy small)
* Hard to service (but see SLA) ???
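The two-phase Google query described above can be sketched as a toy Python model. This is a sketch under stated assumptions, not Google's code: `IndexReplica` and `search` are hypothetical names, each shard's posting lists are an in-memory dict mapping a word to a set of docids, load balancing simply picks the least-loaded replica of each shard, and phase 2 assumes documents are spread over doc servers by docid mod (number of doc servers).

```python
class IndexReplica:
    """One machine holding a copy of one index shard (hypothetical toy model)."""

    def __init__(self, name, postings):
        self.name = name          # e.g. "shard0-a"
        self.postings = postings  # word -> set of docids stored in this shard
        self.load = 0             # number of requests this replica has served

    def lookup(self, words):
        """Return the docids in this shard that contain every query word."""
        if not words:
            return set()
        hits = [self.postings.get(w, set()) for w in words]
        return set.intersection(*hits)


def search(query, shards, doc_servers, snippet_chars=20):
    """shards: list of replica lists, one list per shard.
    doc_servers: list of dicts (docid -> text); docid d is assumed to live
    on server d % len(doc_servers)."""
    words = query.lower().split()

    # Phase 1: every shard must be consulted (in parallel on a real cluster);
    # within a shard, send the request to the least-loaded replica.
    docids = set()
    for replicas in shards:
        replica = min(replicas, key=lambda r: r.load)
        replica.load += 1
        docids |= replica.lookup(words)

    # Phase 2: fetch the beginning of each matching document from its doc server.
    return [(d, doc_servers[d % len(doc_servers)][d][:snippet_chars])
            for d in sorted(docids)]


shards = [
    [IndexReplica("shard0-a", {"cat": {0, 2}, "hat": {2}}),
     IndexReplica("shard0-b", {"cat": {0, 2}, "hat": {2}})],
    [IndexReplica("shard1-a", {"cat": {1, 3}, "hat": {3}})],
]
doc_servers = [{0: "cat video", 2: "cat in a hat"}, {1: "cat facts", 3: "hat trick cat"}]
print(search("cat hat", shards, doc_servers))
# [(2, 'cat in a hat'), (3, 'hat trick cat')]
```

Replicating each shard is what lets the load balancer spread queries: the index size fixes the number of shards, while query volume sets how many replicas each shard needs.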
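Returning to the MapReduce word-count example earlier in these notes, here is a minimal, single-process Python sketch of its three steps: map, group-by-key, reduce. It is not Google's or Hadoop's API; `map_fn`, `reduce_fn`, `partition`, and `map_reduce` are hypothetical names. It assumes intermediate keys are routed to reduce tasks by hash(key) mod R (CRC32 is used so the hash is stable) and includes an optional combiner that pre-reduces each map task's output. The combiner is safe here only because summing counts is associative and commutative.

```python
import re
import zlib
from collections import defaultdict


def map_fn(doc_name, contents):
    """MAP: emit (word, "1") for every word in one document (doc_name is unused, like the key)."""
    for word in re.findall(r"[a-z']+", contents.lower()):
        yield word, "1"


def reduce_fn(word, counts):
    """REDUCE: add up the (string) counts emitted for one word."""
    return str(sum(int(c) for c in counts))


def partition(key, num_reducers):
    """Pick the reduce task for a key: hash(key) mod R, using a stable CRC32 hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_reduce(documents, num_reducers=3, use_combiner=True):
    """documents: dict of document name -> contents. Returns word -> count string."""
    # One bucket per reduce task; each bucket groups intermediate values by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]

    # Map phase: each document is an independent map task (parallel on a real system).
    for doc_name, contents in documents.items():
        local = defaultdict(list)
        for key, value in map_fn(doc_name, contents):
            local[key].append(value)
        for key, values in local.items():
            if use_combiner:
                # Pre-reduce this task's output; valid because addition is
                # associative and commutative.
                values = [reduce_fn(key, values)]
            buckets[partition(key, num_reducers)][key].extend(values)

    # Reduce phase: buckets hold disjoint key sets, so reduce tasks are independent.
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = reduce_fn(key, values)
    return result


counts = map_reduce({"d1": "the cat sat", "d2": "The cat ran"})
print(counts["the"], counts["cat"], counts["sat"], counts["ran"])  # 2 2 1 1
```

On a cluster (or with Phoenix on a multicore), the map loop and the reduce loop would each run as independent parallel tasks; the hash partition is what guarantees that all values for a given word reach the same reducer.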