CS 838: Topics in Parallel Computing
Spring 1999

UNIVERSITY OF WISCONSIN-MADISON
Computer Sciences Department

Administrative details

Instructor: Pavel Tvrdik
email: tvrdik@cs.wisc.edu
Office: CS 6376
Phone: 265-5907
Office hours: Tuesday/Thursday 9:30-11:00 a.m. or by appointment
Lecture times: Tuesday/Thursday 08:00-09:15 a.m.
Classroom: 1221 Computer Sciences

Contents

  1. Syllabus
  2. Schedule and materials
  3. Optional books
  4. Course requirements
  5. Term projects
  6. Grading policy

Syllabus of the lectures

The aim of the course is to introduce you to the art of designing and analyzing efficient parallel algorithms for both shared-memory and distributed-memory machines. It is structured into four major parts.

The first part of the course is a theoretical introduction to the design and analysis of parallel algorithms. We will explain the metrics for measuring the quality and performance of parallel algorithms, with emphasis on scalability and isoefficiency. To prepare the ground for parallel complexity theory, we will introduce a fundamental model, the PRAM. Then we will cover the basics of parallel complexity theory, which provides a formal framework for explaining why some problems are easier to parallelize than others. More specifically, we will study NC-algorithms and P-completeness.
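
For orientation, here are the two basic metrics in my notation (which may differ slightly from the lecture notes): if T_1 denotes the running time of the best sequential algorithm and T_p the running time of the parallel algorithm on p processors, then

    speedup     S(p) = T_1 / T_p
    efficiency  E(p) = S(p) / p = T_1 / (p * T_p)

A parallel algorithm is scalable if its efficiency can be held constant as p grows, provided the problem size W grows appropriately; the isoefficiency function expresses how fast W must grow with p to keep the efficiency fixed.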

The second part of the course deals with the communication issues of distributed-memory machines. Processors in a distributed-memory machine need to communicate to overcome the fact that there is no global shared storage and all information is scattered among the processors' local memories. First, we survey interconnection topologies and communication technologies, their structural and computational properties, and embeddings and simulations among them. All this forms a framework for studying interprocessor communication algorithms, both point-to-point and collective communication operations. We will concentrate mainly on orthogonal topologies, such as hypercubes, meshes, and tori, and will study basic routing algorithms, permutation routing, and one-to-all as well as all-to-all communication algorithms (a one-to-all broadcast is sketched below). We conclude with some more realistic abstract models of distributed-memory parallel computation.
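
As a foretaste of the collective operations we will study, here is a minimal sketch in C of one-to-all broadcast on a hypercube by recursive doubling, written against the standard MPI API. It is an illustration only, not the course's reference code; the function name is mine, and it assumes the number of processors p is a power of two and that rank 0 is the root holding the data.

    /* One-to-all broadcast by recursive doubling: in round k, every
     * processor that already holds the message forwards it across
     * hypercube dimension k, i.e., to the neighbor rank ^ (1 << k).
     * Takes log2(p) rounds. */
    #include <mpi.h>

    void hypercube_bcast(int *buf, int count, MPI_Comm comm)
    {
        int rank, p;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &p);

        for (int mask = 1; mask < p; mask <<= 1) {
            if (rank < mask) {                  /* already has the data */
                int partner = rank ^ mask;      /* neighbor across this dimension */
                if (partner < p)
                    MPI_Send(buf, count, MPI_INT, partner, 0, comm);
            } else if (rank < (mask << 1)) {    /* receives in this round */
                MPI_Recv(buf, count, MPI_INT, rank ^ mask, 0, comm,
                         MPI_STATUS_IGNORE);
            }
        }
    }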

The third and fourth parts are devoted to case studies of parallel algorithms. In the third part, we will study three families of distributed-memory parallel algorithms in which the communication algorithms explained above are heavily used. First, we explain basic oblivious sorting algorithms (one of them is sketched below). Then, parallel branch-and-bound algorithms serve to teach load balancing and program-termination protocols for irregular parallel algorithms. Finally, we discuss important instances of efficient parallel algorithms on dense matrices, namely basic matrix operations and algorithms for solving systems of linear equations.
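
To give a flavor of oblivious sorting, here is a small sequential C sketch of odd-even transposition sort; the compare-exchanges in each round are independent, so on a linear array of n processors each round runs in constant time and the whole sort finishes in n rounds. The function names are mine, for illustration only.

    /* Odd-even transposition sort: n rounds over the array; even rounds
     * compare-exchange pairs (0,1),(2,3),..., odd rounds (1,2),(3,4),...
     * The comparison pattern is fixed in advance (oblivious): it never
     * depends on the data. */
    static void compare_exchange(int *a, int i)
    {
        if (a[i] > a[i + 1]) {
            int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
        }
    }

    void odd_even_sort(int *a, int n)
    {
        for (int round = 0; round < n; round++)
            for (int i = round % 2; i + 1 < n; i += 2)
                compare_exchange(a, i);
    }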

Finally, in the fourth part, we will return to the PRAM model to enjoy the elegance of several well-known shared-memory parallel algorithms. We start with fundamental PRAM NC-algorithms, such as parallel prefix sum, pointer jumping, and Euler tour construction. Building on these, we will explain more complicated algorithms, such as parallel list ranking (sketched below), tree contraction, and a connected-components algorithm.
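
As a preview of the pointer-jumping technique, here is a sequential C simulation of PRAM list ranking; it is a sketch under my own naming, not the course's code. Here succ[i] is the successor of node i (the tail points to itself), and rank[i] ends up holding the distance from i to the tail.

    #include <stdlib.h>
    #include <string.h>

    /* List ranking by pointer jumping.  Each round doubles how far every
     * node's pointer reaches, so the chains collapse in ceil(log2 n)
     * rounds; all updates in a round read only the previous round's
     * values, exactly as on an EREW PRAM. */
    void list_rank(const int *succ, int *rank, int n)
    {
        int *next = malloc(n * sizeof *next);
        int *tmpn = malloc(n * sizeof *tmpn);
        int *tmpr = malloc(n * sizeof *tmpr);

        for (int i = 0; i < n; i++) {
            next[i] = succ[i];
            rank[i] = (succ[i] == i) ? 0 : 1;   /* tail has rank 0 */
        }
        for (int step = 1; step < n; step <<= 1) {
            for (int i = 0; i < n; i++) {
                tmpr[i] = rank[i] + ((next[i] == i) ? 0 : rank[next[i]]);
                tmpn[i] = next[next[i]];
            }
            memcpy(rank, tmpr, n * sizeof *rank);
            memcpy(next, tmpn, n * sizeof *next);
        }
        free(next); free(tmpn); free(tmpr);
    }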

Class schedule and course materials

The following schedule is tentative and may change. I will prepare handout materials on the web for each lecture. The primary materials are lecture notes in condensed form, prepared in LaTeX with small fonts and placed on the web in PostScript (ps) no later than the end of the preceding week. Besides bringing the handout materials to class, make sure to bring paper or a notebook; you may need it to copy what we write on the blackboard. Two other kinds of materials will be generated semiautomatically from the lecture notes. The slides you will see during lectures will also appear on the web in PostScript form; they are prepared from the lecture notes by omitting less important parts and using larger fonts. Finally, I will try to place HTML versions of (at least some) lecture notes on the web.

Specifications of homework assignments will also appear in the last column of the schedule table, simultaneously with the corresponding course materials. For books you might find useful for this course, see the list below.

Week   | Day    | Topic                                              | Lecture Notes                  | Homeworks
1      | Jan 19 | Performance and scalability of parallel algorithms | Section 1 (ps)                 |
1      | Jan 21 | PRAM models                                        | Section 2 (ps) (slides) (html) |
2      | Jan 26 | Parallel complexity theory - NC algorithms         | Section 3 (ps) (slides) (html) |
2      | Jan 28 | Parallel complexity theory - P-completeness        | Section 4 (ps) (slides) (html) | Homework 1 (ps) (html)
3      | Feb 2  | Interconnection networks - survey of topologies    | Section 5 (ps) (slides) (html) |
3      | Feb 4  | Interconnection networks - properties              | Section 5 (ps) (slides) (html) | Homework 2 (ps) (html)
4      | Feb 9  | Embeddings and simulations among INs I             | Section 6 (ps) (slides) (html) |
4      | Feb 11 | Embeddings and simulations among INs II            | Section 6 (ps) (slides) (html) | Homework 3 (ps) (html)
5      | Feb 16 | Routing and switching technologies                 | Section 7 (ps) (slides) (html) |
5      | Feb 18 | Virtual channels and deadlocks                     | Section 8 (ps) (slides) (html) |
6      | Feb 23 | Permutation routing in mesh-based networks         | Section 9 (ps) (slides) (html) |
6      | Feb 25 | Permutation routing in hypercubic networks         | Section 10 (ps) (slides) (html) | Homework 4 (ps)
7      | Mar 2  | One-to-all broadcast algorithms                    | Section 11 (ps) (slides) (html) |
7      | Mar 4  | Multicast, IRC, and one-to-all scatter algorithms  | Section 12 (ps) (slides) (html) | Homework 5 (ps) (html)
8      | Mar 16 | All-to-all broadcast algorithms in SF networks     | Section 13 (ps) (slides) (html) |
8      | Mar 18 | All-to-all scatter algorithms                      | Section 14 (ps) (slides) (html) | Homework 6 (ps) (html)
9      | Mar 23 | Introduction to parallel sorting on meshes         | Section 15 (ps) (slides) (html) |
9      | Mar 25 | Asymptotically optimal mesh sorting algorithm      | Section 16 (ps) (slides) (html) | Homework 7 (ps) (html)
10     | Mar 30 | Parallel sorting on hypercubic machines            | Section 17 (ps) (slides) (html) |
10     | Apr 1  | Optimal EREW PRAM sorting                          | Section 18 (ps) (slides) (html) |
11     | Apr 6  | Parallel matrix operations                         | Section 19 (ps) (slides)        |
11     | Apr 8  | Solving systems of equations in parallel I         | Section 20 (ps) (slides)        | Homework 8 (ps)
12     | Apr 13 | Solving systems of equations in parallel II        | Section 20 (ps) (slides)        |
12     | Apr 15 | Parallel scan - introduction                       | Section 21 (ps) (slides)        | Homework 9 (ps)
13     | Apr 20 | Advanced applications of parallel scan             | Section 22 (ps) (slides)        |
13     | Apr 22 | Parallel scan on linked lists                      | Section 23 (ps) (slides)        |
14     | Apr 27 | Deterministic coin tossing                         | Section 24 (ps) (slides)        |
14     | Apr 29 | Euler tour technique and its applications          | Section 25 (ps) (slides)        | Homework 10 (ps)
15     | May 4  | Parallel tree contraction                          | Section 26 (ps) (slides)        |
15     | May 6  | Parallel connected component algorithms            | Section 27 (ps) (slides)        |
Finals | May 11 | Final Exam (Tuesday, 8:00)                         |                                 |

List of optional books for this course

There are a couple of books on parallel algorithms and parallel computing that you might find useful as a supplementary source of information, but you are by no means required to read them to get through this course. They can serve as a source of additional information if you wish to understand more about the problems discussed in the course. An incomplete list of such useful books follows:

Course requirements

Administrative requirements

Each student registered for this course is required either to give me a reference to their personal home page with a photo and information on their professional background, or to provide me with a passport photo and send me the following information by email:
  1. first and family name,
  2. thesis topic,
  3. supervisor's name,
  4. experience with parallel computing and/or machines in general.

Programming skills

You are expected to have some programming experience with either C or C++. PVM and MPI-2 are parallel programming libraries with C/C++ interfaces.
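
If you have never seen an MPI program, the following complete C example shows the level of programming assumed; compilation and launch details depend on the local installation (typically something like mpicc and mpirun), so check the machine's documentation.

    /* Minimal MPI program: every processor prints its rank. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from processor %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }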

Prerequisites

Background in graph theory, programming techniques, data structures, computer architecture, and complexity theory will be of great help.

Computer facilities and accounts

We will be using UNIX workstations (running the Sun Solaris operating system and the X Window System) and an IBM SP-2 machine for the programming term project. You will need an account on the SP-2 machine and on at least one of the local UNIX workstation clusters.

Homeworks

There will be 10 small homework assignments. At the end of selected Thursday lectures (always check the class schedule on this page for the specific dates), you will be given a small problem-solving homework tightly related to the topics covered in that week's two lectures. The deadline for emailing me the solutions and/or answers will always be noon on Wednesday of the following week. Each homework is worth 2 points, so that the total contribution of all homeworks to the course grade is 20% (see below).

Written final exam

The written final exam is mandatory. It will be a problem-solving exam: you will be given a collection of problems similar or related to those discussed and solved (often as homeworks) during the course. Tentatively, the written exam should take two hours or more.

Grading Policy

This course will be graded on an A-F scale. The major part of the grade will be based on the results of your programming project and the written exam. More detailed criteria on how the term project will be evaluated will appear here soon, but basically, the correctness of your parallel algorithm, your performance and scalability figures, and the quality of your report will be the main criteria.

Term project: 40%
Homeworks: 20%
Final exam: 40%

