CS 701
Section 1
Spring 2020


News

Watch this space for the latest updates.

The class mailing list is compsci701-1-s20@lists.wisc.edu. An archive of mail sent to the class can be found here.

Abstracts for the final-project presentations posted

The abstracts are available here.

Sign-up sheet for project presentations

Please sign up for one of the slots on 4/29 and 5/1 here.

Remaining Lectures by Webex

Starting Friday, March 13, 2020, all remaining lectures will be by Webex. Either click here to join the meeting, or click on the date of the lecture in the course schedule.

Problem Set 3 Assigned

Problem Set 3 is now available. Click here for more information.

Last updated: Sun Apr 5 23:53:13 CDT 2020


Contents


General Information

Course Subject, Number, and Title
Computer Sciences 701: Construction of Compilers

Meeting Times and Location
Mondays, Wednesdays, and Fridays, 11:00 AM - 12:15 PM, on the days indicated in the course schedule, in 1257 CS.

Credits
3

How the Course Meets the Credit-Hour-Policy Standards
Modified ``3-Credit Course, Option B'': The classroom is reserved for three 75-minute class periods each week over the spring semester; however, on average we will meet for two 75-minute class periods per week. Until late March, we will front-load the classes and meet for three 75-minute class periods most weeks, after which there will be no classes, and you will have about five uninterrupted weeks to work on a project of your own choosing. In the final week of the term, we will hold enough classes for each project-group to make a short presentation about the work they carried out. See the course schedule for the dates on which classes will be held.

The class carries the expectation that students will work on course learning activities (reading, writing, problem sets, projects, etc) for about 3 hours out of the classroom for every class period.

More information about the expectations for student work can be found elsewhere on this page.

Instructional Mode
Lecture by the instructor, and at the end of the term, a short project presentation by each project-group.

Instructor

Professor Thomas Reps
Office: 6361 Computer Sciences
Office hours: By appointment
Phone: 262-2091
E-mail: reps at cs.wisc.edu
Home page: www.cs.wisc.edu/~reps/

Textbook and Notes

There is no required textbook for the course. Much of the reading material for the course is available on-line; see the course bibliography.

Over the semester, we will accumulate a set of on-line notes in this directory. The notes (will) cover the following topics:

and possibly other topics as time permits.

Assignments

There will be a few problem sets (2-4), a few paper critiques (2-4), and a course project.

Problem Set 1

For Problem Set 1, do problems 3 and 4 from the list of dataflow-analysis problems.
Due: Friday, Feb. 28, by 5:15 PM in the box outside 6361 CS.

Problem Set 2

For Problem Set 2, do problems 1 and 2 from the list of regular-tree problems.
Due: Tuesday, March 24, 2020, by 5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu

Problem Set 3

For Problem Set 3, do problems 3 and 4 from the list of partial-evaluation problems.
Due: Friday, April 10, 2020, by 5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu

Course Project

For the course project, you will work on a problem with a partner. The first step is to find a partner and write a project proposal (1-2 pages) that describes the following items:
  1. The statement of the problem to be investigated
  2. An explanation of why the problem is interesting
  3. A description of what you propose to do
A list of possible project topics will be circulated by e-mail to the class mailing list. However, you are not required to choose one of the topics from the list; you may propose a topic of your own devising. If you are already doing research in some area of CS, you may wish to exploit your existing domain expertise by exploring some compilation-related issue in that area.

So that you have an idea of the scope of an appropriate project, here are the abstracts of the presentations from the Fall 2015 version of the course: Fall 2015 abstracts.

To help keep you on track, I will impose five deadlines:

  1. Due date (for proposal): Friday, March 27 (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)

  2. Milestone 1: I am interested in evidence that you are making forward progress. I will leave it up to you as to what makes the most sense to turn in. For some projects, it could be some worked examples and a summary of planned next steps; for others, it could be an implementation plan with completed steps checked off. Please turn in an updated proposal (with changes marked with changebars [\usepackage{changebar} if you are using LaTeX]), and your new material added as ``Appendix A: Milestone 1''.
    Due date (for Milestone 1 document): Wednesday, April 8 (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu).

  3. Milestone 2: Description of progress, implementation plan with completed steps checked off, and experimentation plan. Please turn in an updated proposal (with changes marked with changebars, and your new material added as ``Appendix B: Milestone 2''.
    Due date (for Milestone 2 document): Friday, April 17 (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu).

  4. Presentation: 14-minute oral presentations (plus 4 minutes for questions/discussion) will be given during class at the end of the semester. You will need to e-mail me an abstract (in plaintext) giving the title, project participants, and a two-paragraph to three-paragraph summary of what will be presented.
    Due date (for abstract): 11 AM on Monday, April 27, in plaintext, sent by e-mail to reps at cs.wisc.edu.
    Due date (for presentations): Wednesday, April 29 and Friday, May 1. Please sign up for one of the slots on 4/29 and 5/1 here.

  5. Final Writeup: The final writeup should be modeled after a typical conference paper. There is no length requirement or limit, but I would expect it to be somewhere around 6-10 pages in one of the full-page, double-column conference formats.
    Due date: Monday, May 4 (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu).

Grading

The course will be graded on the conventional (A-F) system. The final grades will be curved.

I will be looking for a few volunteers to prepare LaTeX notes for one or more of the term's classes -- more about this topic later in the semester. Students who do such a task may receive extra credit.

Policy on Collaborative Work

A certain amount of discussion of problem sets and programming problems is permitted. You should not ``go to someone'' simply to get an answer; the kind of collaboration intended is a discussion with someone else in which you discover the answer jointly. You are not to make a habit of it over the course of the semester. (Points will be deducted if you do an excessive amount of the course work in collaboration with other people.)

When you do work in collaboration with someone else, make a note of it on the solution set or program you turn in. In all cases, you are to do your own write-up. For the programming assignments, you are to do your own programming; sharing of code is not permitted.

For the end-of-course project, you are encouraged to work in groups of two.


Course Summary

CS 701 is a graduate course on compilation (broadly defined). The official course listing no longer accurately reflects the material taught in the course, which is described below.

It is useful for students to have completed an undergraduate compiler course, such as Wisconsin's CS 536; however, that course is not mandatory for CS 701.

CS 701 is a graduate course on compilation (broadly defined). Topics to be covered include partial evaluation, static program analysis, compiling for profiling, compiling for data mining and machine learning, and machine learning for compiling.

We will begin by studying static analysis, which provides a way to obtain information about the possible states that a program reaches during execution, but without actually running the program on specific inputs. Instead, static-analysis techniques explore a program's behavior for all possible inputs and all possible states that the program can reach. To make this feasible, the program is "run in the aggregate"---i.e., on descriptors that represent collections of many states. In particular, we will cover several varieties of interprocedural dataflow analysis.

What we learn about static-analysis methods will serve as a way to understand the back-propagation algorithm used in machine learning. Compiling for machine learning will be our second topic.

A third topic is partial evaluation. Partial evaluation involves evaluating a program when only part of the program's input has been supplied. A partial evaluator operates on a program p that takes a pair of inputs, s (for ``static'', or ``supplied'') and d (for ``dynamic'', or ``delayed''), of which only s is known at partial-evaluation time. Partial evaluation of p with respect to s produces a residual program ps with the property that ps applied to d produces the same answer as that obtained when p is applied to s and d together. Partial evaluation provides insight into such topics as translation, optimization, and run-time code generation.

One of the important issues one faces when trying to get a software system to perform as desired is to understand what it is actually doing. Thus, a fourth topic is compiling for profiling.

The remaining topics have the common feature of involving a search in a huge search space of possible results: superoptimization, program synthesis, and auto-tuning.


Criteria to Use for Reading/Critiquing Assignments

Please use the following criteria (taken from Ben Liblit's CS706 class) for the critiques that you write for the reading/critiquing assignments:
  1. Stated goals and solution. What problem are the authors trying to solve? What are the bounds on this problem, i.e., what are they not trying to solve? What techniques or tools do the authors offer to solve the problem at hand? How do the authors know they have solved the problem? Do the authors test or validate their approach experimentally? Does the solution meet the stated goals, or does it fall short in some way? Avoid simply quoting the authors' own abstract. Restating in your own words demonstrates your understanding.

  2. Cool or significant ideas. What is new here? What are the main contributions of the paper? What did you find most interesting? Is this whole paper just a one-off clever trick or are there fundamental ideas here which could be reused in other contexts?

  3. Fallacies and blind spots. Did the authors make any assumptions or disregard any issues that make their approach less appealing? Are there any theoretical problems, practical difficulties, implementation complexities, overlooked influences of evolving technology, and so on? Do you expect the technique to be more or less useful in the future? What kind of code or situation would defeat this approach, and are those programs or scenarios important in practice?

    Note: we are not interested in flaws in presentation, such as trivial examples, confusing notation, or spelling errors. However, if you have a great idea on how some concept could be presented or formalized better, mention it.

  4. New ideas and connections to other work. How could the paper be extended? How could some of the flaws of the paper be corrected or avoided? Also, how does this paper relate to others we have read, or even any other research you are familiar with? Are there similarities between this approach and other work, or differences that highlight important facets of both?

Course Schedule

What follows is a tentative schedule of topics:

DATES   TOPIC                                   READINGS
===========================================================================
1/22    Synopsis of CS536 (AST-based) compiler
        project; course mechanics

1/24    From ASTs to CFGs
        Uses for static-analysis information     PPT
        Static program analysis                  Lecture Notes
----------------------------------------------------------------------------
1/27    Brief intro to abstract interpretation

1/29    Static analysis as a path problem        Lecture Notes

1/31    Intraprocedural dataflow analysis        PPT
----------------------------------------------------------------------------
2/3     Collecting semantics as a path problem   Lecture Notes

2/5     Abstract semantics as a path problem     Lecture Notes


2/7     Automatic differentiation                Lecture Notes
        and back-propagation                     [Olah 2015]
----------------------------------------------------------------------------
2/10    Functional approach to intraprocedural
        dataflow analysis

2/12    Interprocedural dataflow analysis        Lecture Notes
                                                 [Sharir & Pnueli 81]
2/14    No class
----------------------------------------------------------------------------
2/17    No class 

2/19    Interprocedural dataflow analysis        Lecture Notes
                                                 [RHS 1995]

2/19    Problem Set 1 assigned

2/21    Program analysis via CFL-reachability    [Reps 1998]
                                                 PLDI00 Tutorial Slides
        Abstract interpretation and logic        [RSY 2004]
----------------------------------------------------------------------------
2/24    Newtonian program analysis               Lecture Notes
                                                 PPT
                                                 [RTP16],[RTP17]

2/26    Ball-Larus & Melski-Reps path profiling  [BL 1996] and [MR 1999]
                                                 (Ancient) PowerPoint Slides for [MR 1999]

2/28    Whole program paths                      [Larus 1999]
                                                 Lecture Notes

2/28    Problem Set 1 due (5:15 PM, in the box outside 6361 CS)
----------------------------------------------------------------------------
3/2     Dataflow frequency analysis              [SM 2002]
            based on whole program paths         Lecture Notes

3/4     Finish dataflow frequency analysis       [SM 2002]
            based on whole program paths         Lecture Notes  

3/6     Probabilistic calling context            [BM 2007]
        Efficient call-trace collection          [WXCZZ 2016], PPT
                                                 Lecture Notes
----------------------------------------------------------------------------
3/9     Bottom-up rewrite systems (BURS)         [FHP 1992], [AGT 1989]

3/9     Problem Set 2 assigned

3/11    Tree Transducers

3/13    Finish Bottom-up rewrite systems (BURS)  Lecture Notes
----------------------------------------------------------------------------
3/14-22 Spring Recess
----------------------------------------------------------------------------
3/23    Intro to partial evaluation and          [Jones et al. 93], Chapter 1
            Futamura projections                 Notes on partial evaluation (Part 1)
                                                 Notes on partial evaluation (Part 2)
                                                 Hand-written notes from on-line lecture, Part 1
                                                 Partial-evaluation problems

3/24    Problem Set 2 due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)


3/25    Partial evaluation                       [Jones et al. 93], Section 3.3.3 and Chapter 4
                                                 Notes on partial evaluation (Part 2)
                                                 Hand-written notes from on-line lecture, Part 2


3/27    Partial evaluation                       Hand-written notes from on-line lecture, Part 3
   

3/27    Initial Course-Project Proposal Due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
----------------------------------------------------------------------------
3/29    Problem Set 3 assigned

3/30    Generating extensions                    Notes on partial evaluation (Part 2; Generating Extensions)

4/1     Partial evaluation of a first-order      Notes on partial evaluation (Part 2; Partial Evaluation of Function Programs)
        functional language (Summary)            [Jones et al. 93], Chapter 5
        Tips on writing a research paper         Slides


4/3     ``Mystery'' lecturer                     Recording; the password is Rptp6tSh; recommended speed: 1.25x

----------------------------------------------------------------------------
4/6     No class   

4/8     No class

4/8     Course Project Milestone 1 Due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)

4/10    No class

4/10    Problem Set 3 due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
----------------------------------------------------------------------------
4/13    No Class

4/15    No class

4/17    No class

4/17    Course Project Milestone 2 Due (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
----------------------------------------------------------------------------
4/20    No class

4/22    No class

4/24    No class
----------------------------------------------------------------------------
4/27    FINAL PROJECT ABSTRACTS DUE (11:00 AM, in plaintext, by e-mail to reps at cs.wisc.edu)

4/27    No class

4/29    FINAL PROJECT PRESENTATIONS

5/1     FINAL PROJECT PRESENTATIONS
----------------------------------------------------------------------------
5/4     FINAL PROJECT WRITEUPS DUE (5:15 PM, as a PDF file, by e-mail to reps at cs.wisc.edu)
----------------------------------------------------------------------------

Bibliography

[AGT 1989]
Aho, A.V., Ganapathi, M., Tjiang, S.W.K., Code generation using tree matching and dynamic programming. ACM Trans. Program. Lang. Syst. 11(4): 491-516 (1989). [On-line copy: On-campus access.]
[BL 1996]
Ball, T., and Larus, J.R., Efficient Path Profiling. MICRO, 1996. [On-line copies: On-campus access; Off-campus access.]
[BA 2006]
Bansal, S. and Aiken, A., Automatic generation of peephole superoptimizers, ASPLOS, 2006. [On-line copies: On-campus access; Off-campus access.]
[BM 2007]
Bond, M.D., McKinley, K.S., Probabilistic calling context. OOPSLA, 2007. [On-line copies: On-campus access; Off-campus access.]
[FHP 1992]
Fraser, C.W., Hanson, D.R., Proebsting, T.A., Engineering a simple, efficient code-generator generator. ACM Letters on Programming Languages and Systems (LOPLAS) 1(3): 213-226 (1992) [On-line copy: PDF.]
[Hoos 2012]
Hoos, H.H., Programming by optimization. Commun. ACM 55(2): 70-80 (2012). [On-line copies: On-campus access; Off-campus access.]
[Jones et al. 93]
Jones, N.D., Gomard, C.K., and Sestoft, P., Partial Evaluation and Automatic Program Generation, Prentice-Hall International, Englewood Cliffs, NJ, 1993. [On-line copy: PostScript, PDF.]
[Larus 1999]
Larus, J.R., Whole program paths. PLDI 1999. [On-line copies: On-campus access; Off-campus access.]
[Leroy 2009]
Leroy, X., Formal verification of a realistic compiler. Commun. ACM 52(7): 107-115 (2009) [On-line copy (from campus only): PDF.]
[Massalin 1987]
Massalin, H., Superoptimizer: A look at the smallest program. ASPLOS, 1987. [On-line copies: On-campus access; Off-campus access.]
[MR 1999]
Melski, D. and Reps, T., Interprocedural path profiling. In Proc. Int. Conf. on Compiler Construction (CC), 1999. [pdf]
[ND 2010]
Nickolls, J., and Dally, W.J., The GPU computing era. IEEE Micro 30(2): 56-69 (2010). [On-line copies: On-campus access; Off-campus access.]
[Nielson, Nielson, & Hankin 1999]
Nielson, F., Nielson, H.R., and Hankin, C., Principles of program analysis, Springer, 1999.
[Olah 2015]
Olah, C., Calculus on computational graphs: Backpropagation. Aug. 2015. [Blog post.]
[Reps 1998]
Reps, T., Program analysis via graph reachability. Information and Software Technology 40, 11-12 (November/December 1998), pp. 701-726. [pdf]
[RHS 1995]
Reps, T., Horwitz, S., and Sagiv, M., Precise interprocedural dataflow analysis via graph reachability. POPL 1995. [pdf]
[RSY 2004]
Reps, T., Sagiv, M., and Yorsh, G., Symbolic implementation of the best transformer. Verification, Model Checking, and Abstract Interpretation (VMCAI), 2004. [pdf, ppt]
[RTP 2016]
Reps, T., Turetsky, E., and Prabhu, P., Newtonian program analysis via tensor product. POPL 2016. [pdf, ppt]
[RTP 2017]
Reps, T., Turetsky, E., and Prabhu, P., Newtonian program analysis via tensor product. ACM Trans. on Program. Lang. and Syst. 39(2): 9:1-9:72 (2017). [pdf, ppt]
[SM 2002]
Scholz, B., and Mehofer, E., Dataflow frequency analysis based on whole program paths IEEE PACT 2002. [pdf]
[SP 1981]
Sharir, M. and Pnueli, A., Two approaches to interprocedural data flow analysis, Program Flow Analysis: Theory and Applications, Chapter 7, Prentice-Hall, 1981. [On-line copy (from campus only): PDF.]
[Veldhuizen 1999]
Veldhuizen, T.L., C++ templates as partial evaluation. Proc. Partial Evaluation and Semantic-Based Program Manipulation, 1999. [pdf]
[WXCZZ 2016]
Wu, R., Xiao, X., Cheung, S.-C., Zhang, H., and Zhang, C., Casper: An Efficient Approach to Call Trace Collection POPL 2016. [pdf, ppt]