CS 701
Section 1
Spring 2020
News
Watch this space for the latest updates.
The class mailing list is
compsci701-1-s20@lists.wisc.edu.
An archive of mail sent to the class can be found
here.
Abstracts for the final-project presentations posted
The abstracts are available here.
Sign-up sheet for project presentations
Please sign up for one of the slots on 4/29 and 5/1
here.
Remaining Lectures by Webex
Starting Friday, March 13, 2020, all remaining lectures will be by Webex.
Either click here
to join the meeting, or click on the date of the lecture in the course schedule.
Problem Set 3 Assigned
Problem Set 3 is now available. Click here for more information.
Last updated:
Sun Apr 5 23:53:13 CDT 2020
Contents
General Information
Course Subject, Number, and Title
Computer Sciences 701: Construction of Compilers
Meeting Times and Location
Mondays, Wednesdays, and Fridays, 11:00 AM - 12:15 PM, on the days indicated in the
course schedule, in 1257 CS.
Credits
3
How the Course Meets the Credit-Hour-Policy Standards
Modified ``3-Credit Course, Option B'':
The classroom is reserved for three 75-minute class periods each week over the
spring semester;
however, on average we will meet for two 75-minute class periods per week.
Until late March, we will front-load the classes and meet for three 75-minute class periods most weeks,
after which there will be no classes, and you will have about five uninterrupted weeks to work on a project
of your own choosing.
In the final week of the term, we will hold enough classes for each project-group to make
a short presentation about the work they carried out.
See the course schedule for the dates on which
classes will be held.
The class carries the expectation that students will work on course learning activities
(reading, writing, problem sets, projects, etc) for about 3 hours out of the classroom
for every class period.
More information about the expectations for student work can be found elsewhere on this page.
Instructional Mode
Lecture by the instructor, and at the end of the term, a short project presentation by each project-group.
Instructor
Professor Thomas Reps
Office: 6361 Computer Sciences
Office hours: By appointment
Phone: 262-2091
E-mail: reps at cs.wisc.edu
Home page: www.cs.wisc.edu/~reps/
Textbook and Notes
There is no required textbook for the course.
Much of the reading material for the course is available on-line;
see the course bibliography.
Over the semester, we will accumulate a set of on-line notes in this
directory.
The notes (will) cover the following topics:
- Static program analysis
- Uses for static-analysis information (PPT)
- Intraprocedural dataflow analysis (PPT)
- Part 1 (PDF)
- Part 2 (PDF)
- Part 3 (PDF)
- Part 4 (PDF)
- Part 5 (PDF)
- Part 6 (PDF)
- Pointer analysis (HTML)
- Automatic differentiation and back-propagation (PDF)
- Partial evaluation
- Partial-evaluation notes from Spring 2020 on-line lectures
- Compiling for profiling
- Dataflow-frequency analysis from whole-program paths (PDF)
- Bottom-up rewrite systems (BURS) (PDF of lecture notes)
- Superoptimization
and possibly other topics as time permits.
Assignments
There will be a few problem sets (2-4), a few paper critiques (2-4), and a course project.
Problem Set 1
For Problem Set 1, do problems 3 and 4 from the list of
dataflow-analysis problems.
Due: Friday, Feb. 28, by 5:15 PM in the box outside 6361 CS.
Problem Set 2
For Problem Set 2, do problems 1 and 2 from the list of
regular-tree problems.
Due: Tuesday, March 24, 2020, by 5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu
Problem Set 3
For Problem Set 3, do problems 3 and 4 from the list of
partial-evaluation problems.
Due: Friday, April 10, 2020, by 5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu
Course Project
For the course project, you will work on a problem with a partner.
The first step is to find a partner and write a project proposal (1-2 pages) that describes
the following items:
- The statement of the problem to be investigated
- An explanation of why the problem is interesting
- A description of what you propose to do
- Explain the elements that you will have to build
- Explain the elements that you can pick up from open-source sites
- Explain the experiment(s) and/or performance measurement(s) that you
plan to carry out. A good way to make it clear what you hope to do is as follows:
- State the hypotheses that you hope to refute.
- Complete the following sentence with 1-4 bullet points:
``The experiments are designed to shed light on the following questions: . . .''
Then explain what you plan to measure; how you will measure it (if it is not obvious); and
where you will obtain test cases.
- List the tasks, broken down into two or three milestones
A list of possible project topics will be circulated by e-mail to the class mailing list.
However, you are not required to choose one of the topics from the list;
you may propose a topic of your own devising.
If you are already doing research in some area of CS, you may wish to exploit
your existing domain expertise by exploring some compilation-related
issue in that area.
So that you have an idea of the scope of an appropriate project, here are the abstracts of the
presentations from the Fall 2015 version of the course:
Fall 2015 abstracts.
To help keep you on track, I will impose five deadlines:
- Due date (for proposal): Friday, March 27 (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
- Milestone 1: I am interested in evidence that
you are making forward progress. I will leave it up
to you as to what makes the most sense to turn in.
For some projects, it could be some worked examples
and a summary of planned next steps;
for others, it could be an implementation plan with completed
steps checked off. Please turn in an updated proposal (with
changes marked with changebars [\usepackage{changebar}
if you are using LaTeX]), and your new material added as
``Appendix A: Milestone 1''.
Due date (for Milestone 1 document): Wednesday, April 8 (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu).
- Milestone 2: Description of progress, implementation
plan with completed steps checked off, and experimentation
plan. Please turn in an updated proposal (with changes marked
with changebars, and your new material added as
``Appendix B: Milestone 2''.
Due date (for Milestone 2 document): Friday, April 17 (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu).
- Presentation: 14-minute oral presentations
(plus 4 minutes for questions/discussion) will be given during class at the end of the semester.
You will need to e-mail me an abstract (in plaintext)
giving the title, project participants, and a two-paragraph to
three-paragraph summary of what will be presented.
Due date (for abstract): 11 AM on Monday, April 27, in plaintext, sent by e-mail to reps at cs.wisc.edu.
Due date (for presentations): Wednesday, April 29 and Friday, May 1.
Please sign up for one of the slots on 4/29 and 5/1
here.
- Final Writeup: The final writeup should be modeled
after a typical conference paper. There is no length
requirement or limit, but I would expect it to be somewhere
around 6-10 pages in one of the full-page, double-column
conference formats.
Due date: Monday, May 4 (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu).
Grading
The course will be graded on the conventional (A-F) system.
- Homework (problem sets and paper critiques): 25%
- Course project: 60%
- Project presentations: 15%
The final grades will be curved.
I will be looking for a few volunteers to prepare LaTeX notes for one
or more of the term's classes -- more about this topic later in the
semester.
Students who do such a task may receive extra credit.
Policy on Collaborative Work
A certain amount of discussion of problem sets and programming
problems is permitted.
You should not ``go to someone'' simply to get an answer;
the kind of collaboration intended is a discussion with someone else
in which you discover the answer jointly.
You are not to make a habit of it over the course of the semester.
(Points will be deducted if you do an excessive amount
of the course work in collaboration with other people.)
When you do work in collaboration with someone else,
make a note of it on the solution set or program you turn in.
In all cases, you are to do your own write-up.
For the programming assignments, you are to do your own programming;
sharing of code is not permitted.
For the end-of-course project, you are encouraged to work in groups of two.
Course Summary
CS 701 is a graduate course on compilation (broadly defined).
The official course listing no longer accurately reflects the material
taught in the course, which is described below.
It is useful for students to have completed an undergraduate compiler course,
such as Wisconsin's CS 536;
however, that course is not mandatory for CS 701.
CS 701 is a graduate course on compilation (broadly defined).
Topics to be covered include partial evaluation, static program
analysis, compiling for profiling, compiling for data mining and
machine learning, and machine learning for compiling.
We will begin by studying static analysis, which provides a way
to obtain information about the possible states that a program reaches
during execution, but without actually running the program on specific
inputs. Instead, static-analysis techniques explore a program's
behavior for all possible inputs and all possible states that the
program can reach. To make this feasible, the program is "run in the
aggregate"---i.e., on descriptors that represent collections of many
states.
In particular, we will cover several varieties of interprocedural dataflow analysis.
What we learn about static-analysis methods will serve as a way to
understand the back-propagation algorithm used in machine learning.
Compiling for machine learning will be our second topic.
A third topic is partial evaluation. Partial
evaluation involves evaluating a program when only part of the
program's input has been supplied. A partial evaluator operates on a
program p that takes a pair of inputs, s (for ``static'', or
``supplied'') and d (for ``dynamic'', or ``delayed''), of which only s
is known at partial-evaluation time. Partial evaluation of p with
respect to s produces a residual program ps with the property that ps
applied to d produces the same answer as that obtained when p is
applied to s and d together. Partial evaluation provides insight into
such topics as translation, optimization, and run-time code
generation.
One of the important issues one faces when trying to get a software system
to perform as desired is to understand what it is actually doing.
Thus, a fourth topic is compiling for profiling.
The remaining topics have the common feature of involving a search
in a huge search space of possible results:
superoptimization, program synthesis, and auto-tuning.
Criteria to Use for Reading/Critiquing Assignments
Please use the following criteria (taken from Ben Liblit's CS706 class)
for the critiques that you write for the reading/critiquing assignments:
-
Stated goals and solution. What problem are the authors trying to
solve? What are the bounds on this problem, i.e., what are they not
trying to solve? What techniques or tools do the authors offer to
solve the problem at hand? How do the authors know they have solved
the problem? Do the authors test or validate their approach
experimentally? Does the solution meet the stated goals, or does it
fall short in some way? Avoid simply quoting the authors' own
abstract. Restating in your own words demonstrates your understanding.
-
Cool or significant ideas. What is new here? What are the main
contributions of the paper? What did you find most interesting? Is
this whole paper just a one-off clever trick or are there fundamental
ideas here which could be reused in other contexts?
-
Fallacies and blind spots. Did the authors make any assumptions or
disregard any issues that make their approach less appealing? Are
there any theoretical problems, practical difficulties, implementation
complexities, overlooked influences of evolving technology, and so on?
Do you expect the technique to be more or less useful in the future?
What kind of code or situation would defeat this approach, and are
those programs or scenarios important in practice?
Note: we are not interested in flaws in presentation, such as trivial
examples, confusing notation, or spelling errors. However, if you have
a great idea on how some concept could be presented or formalized
better, mention it.
-
New ideas and connections to other work. How could the paper be
extended? How could some of the flaws of the paper be corrected or
avoided? Also, how does this paper relate to others we have read, or
even any other research you are familiar with? Are there similarities
between this approach and other work, or differences that highlight
important facets of both?
Course Schedule
What follows is a tentative schedule of topics:
DATES TOPIC READINGS
===========================================================================
1/22 Synopsis of CS536 (AST-based) compiler
project; course mechanics
1/24 From ASTs to CFGs
Uses for static-analysis information PPT
Static program analysis Lecture Notes
----------------------------------------------------------------------------
1/27 Brief intro to abstract interpretation
1/29 Static analysis as a path problem Lecture Notes
1/31 Intraprocedural dataflow analysis PPT
----------------------------------------------------------------------------
2/3 Collecting semantics as a path problem Lecture Notes
2/5 Abstract semantics as a path problem Lecture Notes
2/7 Automatic differentiation Lecture Notes
and back-propagation [Olah 2015]
----------------------------------------------------------------------------
2/10 Functional approach to intraprocedural
dataflow analysis
2/12 Interprocedural dataflow analysis Lecture Notes
[Sharir & Pnueli 81]
2/14 No class
----------------------------------------------------------------------------
2/17 No class
2/19 Interprocedural dataflow analysis Lecture Notes
[RHS 1995]
2/19 Problem Set 1 assigned
2/21 Program analysis via CFL-reachability [Reps 1998]
PLDI00 Tutorial Slides
Abstract interpretation and logic [RSY 2004]
----------------------------------------------------------------------------
2/24 Newtonian program analysis Lecture Notes
PPT
[RTP16],[RTP17]
2/26 Ball-Larus & Melski-Reps path profiling [BL 1996] and [MR 1999]
(Ancient) PowerPoint Slides for [MR 1999]
2/28 Whole program paths [Larus 1999]
Lecture Notes
2/28 Problem Set 1 due (5:15 PM, in the box outside 6361 CS)
----------------------------------------------------------------------------
3/2 Dataflow frequency analysis [SM 2002]
based on whole program paths Lecture Notes
3/4 Finish dataflow frequency analysis [SM 2002]
based on whole program paths Lecture Notes
3/6 Probabilistic calling context [BM 2007]
Efficient call-trace collection [WXCZZ 2016], PPT
Lecture Notes
----------------------------------------------------------------------------
3/9 Bottom-up rewrite systems (BURS) [FHP 1992], [AGT 1989]
3/9 Problem Set 2 assigned
3/11 Tree Transducers
3/13 Finish Bottom-up rewrite systems (BURS) Lecture Notes
----------------------------------------------------------------------------
3/14-22 Spring Recess
----------------------------------------------------------------------------
3/23 Intro to partial evaluation and [Jones et al. 93], Chapter 1
Futamura projections Notes on partial evaluation (Part 1)
Notes on partial evaluation (Part 2)
Hand-written notes from on-line lecture, Part 1
Partial-evaluation problems
3/24 Problem Set 2 due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
3/25 Partial evaluation [Jones et al. 93], Section 3.3.3 and Chapter 4
Notes on partial evaluation (Part 2)
Hand-written notes from on-line lecture, Part 2
3/27 Partial evaluation Hand-written notes from on-line lecture, Part 3
3/27 Initial Course-Project Proposal Due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
----------------------------------------------------------------------------
3/29 Problem Set 3 assigned
3/30 Generating extensions Notes on partial evaluation (Part 2; Generating Extensions)
4/1 Partial evaluation of a first-order Notes on partial evaluation (Part 2; Partial Evaluation of Function Programs)
functional language (Summary) [Jones et al. 93], Chapter 5
Tips on writing a research paper Slides
4/3 ``Mystery'' lecturer Recording; the password is Rptp6tSh; recommended speed: 1.25x
----------------------------------------------------------------------------
4/6 No class
4/8 No class
4/8 Course Project Milestone 1 Due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
4/10 No class
4/10 Problem Set 3 due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
----------------------------------------------------------------------------
4/13 No Class
4/15 No class
4/17 No class
4/17 Course Project Milestone 2 Due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
----------------------------------------------------------------------------
4/20 No class
4/22 No class
4/24 No class
----------------------------------------------------------------------------
4/27 FINAL PROJECT ABSTRACTS DUE (11:00 AM, in plaintext, by e-mail to reps at cs.wisc.edu)
4/27 No class
4/29 FINAL PROJECT PRESENTATIONS
5/1 FINAL PROJECT PRESENTATIONS
----------------------------------------------------------------------------
5/4 FINAL PROJECT WRITEUPS DUE (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
----------------------------------------------------------------------------
Bibliography
- [AGT 1989]
-
Aho, A.V., Ganapathi, M., Tjiang, S.W.K.,
Code generation using tree matching and dynamic programming.
ACM Trans. Program. Lang. Syst. 11(4): 491-516 (1989).
[On-line copy:
On-campus access.]
- [BL 1996]
-
Ball, T., and Larus, J.R.,
Efficient Path Profiling. MICRO, 1996.
[On-line copies:
On-campus access;
Off-campus access.]
- [BA 2006]
-
Bansal, S. and Aiken, A.,
Automatic generation of peephole superoptimizers,
ASPLOS, 2006.
[On-line copies:
On-campus access;
Off-campus access.]
- [BM 2007]
-
Bond, M.D., McKinley, K.S.,
Probabilistic calling context. OOPSLA, 2007.
[On-line copies:
On-campus access;
Off-campus access.]
- [FHP 1992]
-
Fraser, C.W., Hanson, D.R., Proebsting, T.A.,
Engineering a simple, efficient code-generator generator.
ACM Letters on Programming Languages and Systems (LOPLAS) 1(3): 213-226 (1992)
[On-line copy:
PDF.]
- [Hoos 2012]
-
Hoos, H.H., Programming by optimization. Commun. ACM 55(2): 70-80 (2012).
[On-line copies:
On-campus access;
Off-campus access.]
- [Jones et al. 93]
-
Jones, N.D., Gomard, C.K., and Sestoft, P.,
Partial Evaluation and Automatic Program Generation,
Prentice-Hall International, Englewood Cliffs, NJ, 1993.
[On-line copy:
PostScript,
PDF.]
- [Larus 1999]
-
Larus, J.R., Whole program paths.
PLDI 1999.
[On-line copies:
On-campus access;
Off-campus access.]
- [Leroy 2009]
-
Leroy, X.,
Formal verification of a realistic compiler. Commun. ACM 52(7): 107-115 (2009)
[On-line copy (from campus only):
PDF.]
- [Massalin 1987]
-
Massalin, H.,
Superoptimizer: A look at the smallest program.
ASPLOS, 1987.
[On-line copies:
On-campus access;
Off-campus access.]
- [MR 1999]
-
Melski, D. and Reps, T.,
Interprocedural path profiling. In Proc. Int. Conf. on Compiler Construction (CC), 1999.
[pdf]
- [ND 2010]
-
Nickolls, J., and Dally, W.J., The GPU computing era.
IEEE Micro 30(2): 56-69 (2010).
[On-line copies:
On-campus access;
Off-campus access.]
- [Nielson, Nielson, & Hankin 1999]
-
Nielson, F., Nielson, H.R., and Hankin, C.,
Principles of program analysis, Springer, 1999.
- [Olah 2015]
-
Olah, C.,
Calculus on computational graphs: Backpropagation.
Aug. 2015.
[Blog post.]
- [Reps 1998]
-
Reps, T.,
Program analysis via graph reachability.
Information and Software Technology 40, 11-12 (November/December 1998),
pp. 701-726.
[pdf]
- [RHS 1995]
-
Reps, T., Horwitz, S., and Sagiv, M.,
Precise interprocedural dataflow analysis via graph reachability.
POPL 1995.
[pdf]
- [RSY 2004]
-
Reps, T., Sagiv, M., and Yorsh, G.,
Symbolic implementation of the best transformer.
Verification, Model Checking, and Abstract Interpretation (VMCAI), 2004.
[pdf,
ppt]
- [RTP 2016]
-
Reps, T., Turetsky, E., and Prabhu, P.,
Newtonian program analysis via tensor product.
POPL 2016.
[pdf,
ppt]
- [RTP 2017]
-
Reps, T., Turetsky, E., and Prabhu, P.,
Newtonian program analysis via tensor product.
ACM Trans. on Program. Lang. and Syst. 39(2): 9:1-9:72 (2017).
[pdf,
ppt]
- [SM 2002]
-
Scholz, B., and Mehofer, E.,
Dataflow frequency analysis based on whole program paths
IEEE PACT 2002.
[pdf]
- [SP 1981]
-
Sharir, M. and Pnueli, A.,
Two approaches to interprocedural data flow analysis,
Program Flow Analysis: Theory and Applications,
Chapter 7, Prentice-Hall, 1981.
[On-line copy (from campus only):
PDF.]
- [Veldhuizen 1999]
-
Veldhuizen, T.L., C++ templates as partial evaluation.
Proc. Partial Evaluation and Semantic-Based Program Manipulation, 1999.
[pdf]
- [WXCZZ 2016]
-
Wu, R., Xiao, X., Cheung, S.-C., Zhang, H., and Zhang, C.,
Casper: An Efficient Approach to Call Trace Collection
POPL 2016.
[pdf,
ppt]