CS 701, Program #1

Introduction to LLVM and Simple Local Optimization

Due: Monday, September 22, 2014 (by midnight)

Not accepted after midnight Wednesday, October 1, 2014

CS 701 and LLVM

The main focus of CS 701 is on the (backend) optimization phase of a compiler, including several kinds of analysis that must be done to enable optimizations.

To reinforce your understanding of the concepts involved, you'll do four projects that implement various backend components. You'll be using the LLVM Compiler Infrastructure. (LLVM means "Low Level Virtual Machine.") LLVM was initially developed by a group led by Vikram Adve, an alumnus of the University of Wisconsin (and CS 701!).

LLVM is implemented in C++. It includes the commands clang, opt, and llc, which run a C front-end, an optimizer, and a backend, respectively. LLVM includes many more commands, most of which are documented at http://llvm.org/docs/CommandGuide/index.html, but you won't need those for this class.

LLVM is installed and ready to use in /unsup/llvm-3.3/. The projects that you will build will produce dynamic libraries, which will be used at runtime when you invoke the opt and llc commands. This means you don't ever need to copy or compile the entire LLVM source. (The complete LLVM tree is about 1.5 GB, and a full compilation takes about half an hour.)

Note: You will need to frequently reference the LLVM documentation in order to do the four CS 701 projects. The LLVM documentation is voluminous. We will try to give you enough information to save you from unnecessary frustration with the documentation. If you are having trouble anyway, don't hesitate to ask for help (particularly through Piazza).

To Access LLVM

To use the standard LLVM commands (like opt and llc), you should add the LLVM binary directory to your PATH. If you don't have your own elaborate environment configuration, you can add to PATH like this:

If you use the csh or tcsh shell, add the following to the file ~/.cshrc.local:
```
set path=($path /unsup/llvm-3.3/bin )
```
If you use ksh or bash, add the following to the file ~/.profile (for ksh), or to ~/.bash.local or ~/.bashrc.local (for bash, whichever one your account uses):
```
export PATH=$PATH:/unsup/llvm-3.3/bin
```

For the changes to your PATH to take effect, restart your console session (log out and log back in).

To Run LLVM

LLVM is composed of many separate pieces. To use LLVM to turn a C source file into an x86 (or x86_64) executable, we need to:

transform the C source into LLVM bitcode
optionally optimize the bitcode
transform the bitcode into assembly
assemble and link the program

Just as C files use the extension .c, LLVM bitcode uses the extension .bc, assembly uses the extension .s, and LLVM human-readable assembly uses the extension .ll.

Suppose we have a C program in the file foo.c. Below are the steps needed to create an executable (with no optimization):

    clang -emit-llvm -O0 -c foo.c -o foo.bc    # create bitcode .bc
    llc foo.bc                                 # create assembly .s
    gcc foo.s -o foo                           # create executable "foo"

To run your program:

    ./foo

To turn LLVM bitcode into human-readable LLVM assembly (foo.ll):

    llvm-dis -f foo.bc

The above LLVM commands (clang, llc, and llvm-dis) are all available in /unsup/llvm-3.3/bin.

LLVM Instructions

Let's get a little more familiar with LLVM's instructions. Consider the following C program, sum.c:

#include <stdio.h>

int main() {
  int n;
  int sum;
  sum = 0;
  for (n = 0; n < 10; n++)
    sum = sum + n*n;
  printf("sum: %d\n", sum);
}

Running clang and llvm-dis produces the following LLVM assembly code (on my 64-bit machine):


; ModuleID = 'sum.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [9 x i8] c"sum: %d\0A\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %n = alloca i32, align 4
  %sum = alloca i32, align 4
  store i32 0, i32* %retval
  store i32 0, i32* %sum, align 4
  store i32 0, i32* %n, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %0 = load i32* %n, align 4
  %cmp = icmp slt i32 %0, 10
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %1 = load i32* %sum, align 4
  %2 = load i32* %n, align 4
  %3 = load i32* %n, align 4
  %mul = mul nsw i32 %2, %3
  %add = add nsw i32 %1, %mul
  store i32 %add, i32* %sum, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %4 = load i32* %n, align 4
  %inc = add nsw i32 %4, 1
  store i32 %inc, i32* %n, align 4
  br label %for.cond

for.end:                                          ; preds = %for.cond
  %5 = load i32* %sum, align 4
  %call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([9 x i8]* @.str, i32 0, i32 0), i32 %5)
  %6 = load i32* %retval
  ret i32 %6
}

declare i32 @printf(i8*, ...) #1

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }

Some remarks about this assembly code:

Anything on a line after ; is a comment.
There are 5 basic blocks named entry, for.cond, for.body, for.inc, and for.end.
Note that the for.body and for.inc blocks could be merged into a single basic block: for.body ends with an unconditional branch to for.inc, and for.inc has no other predecessor. clang keeps them separate because it emits one block for each part of the for statement. The entry and for.cond blocks, on the other hand, must remain separate even though entry ends with an unconditional branch to for.cond, because for.cond is also the target of the branch from for.inc.
Names that start with a percent sign (like %0, %1, %n, and %for.cond) are either virtual register names (more about this later) or block labels.

You'll want to become comfortable with LLVM assembly (the human readable form of LLVM bitcode) because the first three projects that you will write for this class will accept LLVM bitcode as input and will emit LLVM bitcode as output.

Read About LLVM

Here are links to some LLVM documents that you may find useful during the semester (but see Useful Links below for documentation specific to project 1):

Programmer's Manual
LLVM assembly language reference
Bugpoint, an LLVM debugging tool.
Writing an LLVM Pass. You will solve the remaining projects in this class by writing various LLVM passes. Ignore what this documentation says about Setting up the build environment.
LLVM API Documentation
LLVM files: Most of LLVM's function-level documentation is in comments in header files (.h files). This link provides links to each of those files.

Overview of Project 1

Each of the four class projects will be implemented by an individual or a two-person team (your choice). For each project you will write one or more new LLVM passes. For Project 1, you will write two passes. All passes will be invoked by command-line flags recognized by the LLVM opt command. The two passes (explained in more detail below) for Project 1 are:

A printCode pass, implemented in a file called printCode.cpp. Your printCode pass will print the LLVM assembly code for each function in a useful format (defined below).
An optLoads pass, implemented in a file called optLoads.cpp. Your optLoads pass will find and remove unnecessary load instructions in each basic block of a function (as defined below).

Useful Links for Project 1

Before starting the project, it might be a good idea to read some of the LLVM documentation. First, it might help to review some of the core LLVM classes. Concentrate on the following:

The Value class (because Instructions, BasicBlocks, and Functions are all Values)
The Function class
The Instruction class
The BasicBlock class

And here are some other useful links, mentioned in the description of the project below:

Basic Inspection and Traversal Routines: explains how to iterate through all instructions in a function, all basic blocks in a function, and all instructions in a basic block.
Instruction.h includes some information about opcodes.

Build the `proj1` Tree

A skeleton for the printCode pass has been prepared for you. To set up the skeleton and build it, navigate to the location where you want to put your 701 projects and type the following:

    cp -r /p/course/cs701-fischer/public/proj1 proj1
    cd proj1
    make

Run printCode

For the first part of this project, you'll work on the file:

    lib/p1/printCode.cpp

in the proj1 directory that you just made. The class printCode is a FunctionPass that is run when you invoke the opt command. Its Makefile is configured to build it as a dynamic library. Given an LLVM bitcode file foo.bc (created from a C file as described above) we can tell opt to run the printCode pass as follows:

    opt -load Debug/lib/P1.so -printCode foo.bc > foo.opt

Note:

P1.so is a shared library file created from the C++ source code in the lib/p1 subdirectory of your proj1 directory (for now, that's just printCode.cpp). Debug/lib/P1.so is in the proj1 directory that you created.
The -load flag loads P1.so as a dynamic library. This allows us to build and test the printCode pass without rebuilding the opt binary.
The -printCode flag indicates the pass we'd like opt to run. (That flag is defined in printCode.cpp.) To see all of the built-in passes, type opt -help.
Normally, we run opt to modify (optimize) the bitcode, so when it is run it outputs the optimized version of the bitcode. Therefore, we send the output to a new file (foo.opt). Since printCode only prints (does not do any optimization), foo.opt will be the same as foo.bc. For passes that do modify the bitcode, you'll need to
```
    mv foo.opt foo.bc
```
before calling llc to transform the optimized bitcode to assembly code if you want to run the program.

For more information about opt, see its documentation online.

To ensure that everything runs properly at this point, you should write a small C program foo.c in the proj1 directory, create the corresponding LLVM bitcode file, and run the printCode pass over that bitcode file. You can do the steps explicitly like this:


    clang -emit-llvm -O0 -c foo.c -o foo.bc

    opt -load Debug/lib/P1.so -printCode foo.bc > foo.opt

Or you can use the Makefile rule for running printCode by typing


    make foo.printCode

The version of printCode we've given you doesn't do much -- it simply prints the name of each function in the source program. Your job will be to modify printCode to print information about each LLVM instruction.

Modify printCode

For your first programming assignment, you will make printCode print a useful version of the bitcode file on which it runs. Getting more familiar with the bitcode and being able to output it will be helpful for the remaining projects.

Before discussing how you should modify printCode, here's some important information about how LLVM represents virtual registers. When you look at LLVM assembly code (a .ll file), you see virtual register names: names that start with a percent sign, like %1, %2, %n, and %mul in the example code given above.

You can think of virtual registers whose names use numbers or names that are not the names of variables in the source code (e.g., %1, %mul) as temporaries, and those that use identifiers from the source code (e.g., %n, %sum) as registers that hold pointers to the memory allocated for local variables. However, in the intermediate representation (IR) used by LLVM, there are no virtual-register objects. Instead, for each instruction that assigns to a virtual register, that register is represented by the instruction's address. An instruction that uses the virtual register has the defining instruction's address as its operand. For example, one of the instructions in the example code given above is

   %mul = mul nsw i32 %2, %3

The LLVM IR for that instruction has the following fields:

a "mul" opcode
two operands, one for %2 and one for %3.

The operand for %2 is the address of the instruction that "assigned" to virtual register %2, i.e., the address of the instruction

%2 = load i32* %n, align 4

Similarly, the operand for %3 is the address of the instruction that assigned to %3. There is no operand for the target register, %mul; instead, the subsequent use of %mul (in the instruction %add = add nsw i32 %1, %mul) uses the address of the instruction %mul = mul nsw i32 %2, %3 as an operand.
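To make the "a register is just the address of its defining instruction" idea concrete, here is a minimal sketch using a toy model. ToyInst, usesResultOf, and mulExampleHolds are hypothetical names invented for this illustration; they are not LLVM classes or functions, and the real llvm::Instruction carries much more information.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction (NOT the real llvm::Instruction).
// There is no "register" field: an operand is simply a pointer to the
// instruction that produced the value.
struct ToyInst {
    std::string opcode;
    std::vector<const ToyInst*> operands;
};

// Returns true if `user` has the result of `def` as one of its operands,
// i.e. if one of its operand pointers is the address of `def`.
bool usesResultOf(const ToyInst& user, const ToyInst& def) {
    for (const ToyInst* op : user.operands) {
        assert(op != nullptr);
        if (op == &def) return true;
    }
    return false;
}

// Rebuilds the shape of "%mul = mul nsw i32 %2, %3" from the example,
// where %2 and %3 are both loads from the memory allocated for n.
bool mulExampleHolds() {
    ToyInst allocN{"alloca", {}};
    ToyInst load2{"load", {&allocN}};        // plays the role of %2
    ToyInst load3{"load", {&allocN}};        // plays the role of %3
    ToyInst mul{"mul", {&load2, &load3}};    // plays the role of %mul
    return usesResultOf(mul, load2) && usesResultOf(mul, load3) &&
           !usesResultOf(load2, mul);
}
```

Note that the toy mul instruction stores nothing for its own target register; a later user would simply hold the address of mul.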

Now we'll talk about how you should modify printCode. The printCode class is a FunctionPass. It includes a runOnFunction method that is called once for each function in the input program. Here's what your version of runOnFunction should do:

Create a map (you can use a DenseMap, or a C++ STL map, or you can define your own Map class) that maps each instruction in the function to a unique integer starting with 1. Do this by iterating over all instructions in the function and mapping each to the next integer value. (See Useful Links for Project 1 above for how to do various kinds of iterations.) Note:
- Every instruction in the program should be mapped to a unique value, not just every instruction within a function.
- You will need to create an instruction map for other passes that you write, so you might want to put the code in a separate file rather than in printCode.cpp.
- The LLVM documentation says that a FunctionPass should not maintain state across functions. That would make it very difficult to implement the map from instructions to unique integers. However, it doesn't seem to matter if you violate this; e.g., it works fine to have a method with a static int variable (initialized to 1) that keeps track of the current instruction number.
Print "FUNCTION" and the name of the function.
Iterate over all basic blocks in the function. For each, print a blank line, then "BASIC BLOCK" and the name of the basic block.
After printing the block name, iterate over the instructions in the block. For each, print a percent sign, then the number of the instruction (using your map), a colon, the name of the opcode, and each operand. (Instruction.h has methods for getting the opcode and the opcode name, and the User class has methods for getting the number of operands and the operands themselves.) When you print an operand that is an instruction, print a percent sign and the instruction's number (using your map). For an operand that is not an instruction, if it has a name, print that name; otherwise print XXX. You can use the isa<> operator to see whether an operand is an Instruction, and use the hasName and getName methods of the Value class to see whether an operand has a name and if so to get that name.

All output should go to stderr (i.e., use std::cerr << ...).

For example, for the program shown above, your output should look like this:


FUNCTION main

BASIC BLOCK entry
%1:     alloca   XXX
%2:     alloca   XXX
%3:     alloca   XXX
%4:     store    XXX %1
%5:     store    XXX %3
%6:     store    XXX %2
%7:     br       for.cond

BASIC BLOCK for.cond
%8:     load     %2
%9:     icmp     %8 XXX
%10:    br       %9 for.end for.body

BASIC BLOCK for.body
%11:    load     %3
%12:    load     %2
%13:    load     %2
%14:    mul      %12 %13
%15:    add      %11 %14
%16:    store    %15 %3
%17:    br       for.inc

BASIC BLOCK for.inc
%18:    load     %2
%19:    add      %18 XXX
%20:    store    %19 %2
%21:    br       for.cond

BASIC BLOCK for.end
%22:    load     %3
%23:    call     XXX %22 printf
%24:    load     %1
%25:    ret      %24

You don't have to match the whitespace within each line exactly, but please try to make your output as similar to this as possible so that we can compare your output with the expected output using diff -w. In particular, if there is whitespace in the example above, please make sure that your output has whitespace, too. So for example, you should not output the opcode immediately after the instruction number, like this: %25:ret %24.
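The numbering and operand-printing rules described above can be sketched with a small toy model. This is only an illustration under stated assumptions: ToyValue, InstMap, numberInstructions, and formatInst are hypothetical names, not part of LLVM. In the real pass you would iterate over LLVM's own Function, BasicBlock, and Instruction objects, use isa<>, hasName, and getName as described, and write the result to stderr rather than returning a string.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for LLVM's Value hierarchy (hypothetical, for illustration).
struct ToyValue {
    bool isInstruction;                     // plays the role of isa<Instruction>(v)
    std::string name;                       // empty plays the role of !hasName()
    std::string opcode;                     // meaningful only for instructions
    std::vector<const ToyValue*> operands;
};

typedef std::map<const ToyValue*, int> InstMap;

// Gives each instruction the next unused number, in program order.
// `next` is passed by reference so that numbering continues across
// functions instead of restarting at 1 for each one.
void numberInstructions(const std::vector<const ToyValue*>& insts,
                        InstMap& ids, int& next) {
    for (const ToyValue* inst : insts) {
        assert(inst->isInstruction);
        ids[inst] = next++;
    }
}

// Formats one instruction as "%N:<tab>opcode operand operand ...".
std::string formatInst(const ToyValue& inst, const InstMap& ids) {
    std::ostringstream out;
    out << '%' << ids.at(&inst) << ":\t" << inst.opcode;
    for (const ToyValue* op : inst.operands) {
        out << ' ';
        if (op->isInstruction)
            out << '%' << ids.at(op);       // operand defined by an instruction
        else if (!op->name.empty())
            out << op->name;                // named non-instruction, e.g. a block
        else
            out << "XXX";                   // unnamed non-instruction, e.g. a constant
    }
    return out.str();
}
```

Passing the counter by reference is one alternative to the static variable mentioned earlier; either way the goal is that numbers are unique across the whole program.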

Implement the optLoads pass

For the second part of this project, you will implement a pass that finds and removes unnecessary load instructions in each function. An instruction that loads a value from memory into a virtual register %k is unnecessary if the previous instruction in the same basic block stored a value v to the same memory location. You should find all such loads and replace all uses of %k with uses of v. Then you should remove the unnecessary load instruction.

For example, if the original code looks like this:


  store i32 12, i32* %x, align 4    ; store the value 12 into the memory location pointed to by %x
  %0 = load i32* %x, align 4        ; load the value in the memory location pointed to by %x into %0
  %add = add nsw i32 %0, 22         ; set %add to be the value in %0 + 22
  store i32 %add, i32* %y, align 4  ; store the value in %add into the memory location pointed to by %y
  %1 = load i32* %y, align 4        ; load the value in the memory location pointed to by %y into %1
  %add1 = add nsw i32 %1, 33        ; set %add1 to be the value in %1 + 33
  store i32 %add1, i32* %z, align 4 ; store the value in %add1 into the memory location pointed to by %z

You would change it to the following:


  store i32 12, i32* %x, align 4     ; store the value 12 into the memory location pointed to by %x
                                     ; 1st unnecessary load was removed
  %add = add nsw i32 12, 22          ; set %add to be the value 12 + 22
  store i32 %add, i32* %y, align 4   ; store the value in %add into the memory location pointed to by %y
                                     ; 2nd unnecessary load was removed
  %add1 = add nsw i32 %add, 33       ; set %add1 to be the value in %add + 33
  store i32 %add1, i32* %z, align 4  ; store the value in %add1 into the memory location pointed to by %z

Note that you can get the above code from the following source code:

int main() {
  int x, y, z;

  x = 12;
  y = x + 22;  /* load value of x that was just stored */
  z = y + 33;  /* load value of y that was just stored */
}

Implement the optLoads pass as a FunctionPass in a file called optLoads.cpp, run from opt using the -optLoads flag. For example:

    clang -emit-llvm -O0 -c foo.c -o foo.bc
    opt -load Debug/lib/P1.so -optLoads foo.bc -o foo.optLoads
    mv foo.optLoads foo.bc

Here is what you should do:

Make a copy of printCode.cpp called optLoads.cpp in the same directory. Change everything specific to printCode to refer instead to optLoads.
Make sure that everything is OK so far: Add optLoads.o to the definition of OBJS in the Makefile in proj1/lib/p1. Type make in the proj1 directory to create the optLoads pass as well as the printCode pass (both will be in the library file P1.so). To run the optLoads pass use:
```
    opt -load Debug/lib/P1.so -optLoads foo.bc -o foo.optLoads
```
Now write the new runOnFunction code:
- Create an instruction map as you did for printCode.
- Iterate over all basic blocks in the function, and all instructions in each basic block. Look for an instruction that stores a value v to the memory location pointed to by virtual register %m, immediately followed by an instruction that loads from the location pointed to by %m into register %k. The second instruction (the load) is unnecessary. Note: The opcode for a load instruction is Instruction::Load, and the opcode for a store instruction is Instruction::Store.
- Print (to stderr)
```
      %n is a useless load
```
  where n is the number of the instruction that is the useless load, retrieved from your instruction map (which probably will not be the same as the target virtual register you'll see for that instruction if you look at the output of llvm-dis).
- Replace all uses of %k with a use of v. The Value class includes a replaceAllUsesWith method that you can use.
- Save the (address of the) unnecessary load instruction so that you can remove it when you're done iterating over all basic blocks in the current function (you will mess up the iteration if you remove it now). The Instruction class has an eraseFromParent method that you can use. The documentation is here.
- Change the function to return true or false depending on whether you actually made a change.
Finally, modify the getAnalysisUsage method by giving it an empty body (since your optLoads pass does modify the program), and change the call to RegisterPass to the following: RegisterPass<optLoads> X("optLoads", "optimize unnecessary loads", false, false);
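The find, replace, and deferred-erase steps above can be sketched on a toy model of a single basic block. This is an illustration only: ToyInst and removeUselessLoads are hypothetical names, and operands are modeled as plain strings rather than LLVM Values. In the real pass the replacement is done with replaceAllUsesWith and the removal with eraseFromParent, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction (NOT llvm::Instruction); operands are names such as
// "%x" or "12", and `result` is the register defined, or "" if none.
struct ToyInst {
    std::string opcode;
    std::vector<std::string> operands;   // store: {value, pointer}; load: {pointer}
    std::string result;
};

// Removes every load that immediately follows a store to the same pointer
// in one basic block, and returns the number of loads removed.
int removeUselessLoads(std::vector<ToyInst>& block) {
    std::vector<std::size_t> dead;       // indices of loads to erase later
    for (std::size_t i = 1; i < block.size(); ++i) {
        const ToyInst& prev = block[i - 1];
        const ToyInst& cur = block[i];
        if (prev.opcode != "store" || cur.opcode != "load") continue;
        assert(prev.operands.size() == 2 && cur.operands.size() == 1);
        if (prev.operands[1] != cur.operands[0]) continue;

        // Replace all uses of the loaded register with the stored value.
        const std::string stored = prev.operands[0];
        const std::string loaded = cur.result;
        for (ToyInst& inst : block)
            for (std::string& op : inst.operands)
                if (op == loaded) op = stored;

        dead.push_back(i);               // do not erase while still iterating
    }
    // Erase back to front so the saved indices stay valid.
    for (std::size_t k = dead.size(); k-- > 0; )
        block.erase(block.begin() + dead[k]);
    return static_cast<int>(dead.size());
}
```

Collecting the dead loads first and erasing them afterwards mirrors the advice above about not removing an instruction in the middle of the iteration. The return value corresponds to the "did anything change" result that runOnFunction reports.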

An easy way to verify that optLoads is operating properly is to produce a "before and after" listing using printCode:

  make foo.printCode
  make foo.optLoads
  make foo.printCode

The program foo.c is first compiled and the unoptimized generated code is displayed using printCode. Then optLoads is run and the optimized code is listed. If n loads have been removed, the second listing should be n instructions shorter. Moreover, at each removal point, an operand stored into memory should be reused in a following instruction.

Submit Your Work

To submit your work, copy all of your source code (printCode.cpp, optLoads.cpp, plus any other C++ files that you wrote, and your Makefile) to your handin directory:

    cp *.cpp Makefile ~cs701-1/HANDIN/<YOUR-LOGIN>/P1

using your actual login in place of <YOUR-LOGIN>.

Please verify that make foo.printCode and make foo.optLoads will build properly using the files you've provided.

Good Luck and don't dawdle!

Late Policy

The project is due on Monday, September 22. It may be handed in late, with a penalty of 3% per day, up to Wednesday, October 1. The maximum late penalty is therefore 27% (the maximum possible grade becomes 73). This assignment will not be accepted after Wednesday, October 1.

Fri Aug 8 15:11:08 CDT 2014