Homework 2
Due 02/19
Weight: 20%
1. Problem 1 (20 points)
Design a 16-bit barrel shifter with the following interface. Consult the lecture notes for barrel shifter design.
Inputs:
- In(15:0) - 16-bit input operand to be shifted
- Cnt(3:0) - 4-bit shift amount (number of bit positions to shift)
- Op(1:0) - shift type, see encoding in table below
Output:
- Out(15:0) - 16-bit output operand
| Opcode | Operation |
|--------|-----------|
| 00 | rotate left |
| 01 | shift left |
| 10 | shift right arithmetic |
| 11 | shift right logical |
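A software reference model can supply expected values for the simulation trace. The C sketch below is one such model, not the required Verilog design. The names `barrel_shift` and `shift_stage` are hypothetical, and the sketch assumes the common logarithmic structure (one stage per Cnt bit, shifting by 1, 2, 4 and 8) rather than any specific design from the lecture notes.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode encoding from the Problem 1 table. */
enum { OP_ROL = 0, OP_SLL = 1, OP_SRA = 2, OP_SRL = 3 };

/* One stage of a logarithmic barrel shifter: rotate or shift `in` by a
 * fixed `amt` (1, 2, 4 or 8) bit positions according to `op`. */
static uint16_t shift_stage(uint16_t in, unsigned amt, unsigned op) {
    switch (op) {
    case OP_ROL: /* bits shifted out on the left re-enter on the right */
        return (uint16_t)((in << amt) | (in >> (16 - amt)));
    case OP_SLL: /* zeros enter on the right */
        return (uint16_t)(in << amt);
    case OP_SRA: { /* copies of the sign bit enter on the left */
        uint16_t fill = (in & 0x8000u) ? (uint16_t)(0xFFFFu << (16 - amt)) : 0u;
        return (uint16_t)((in >> amt) | fill);
    }
    default: /* OP_SRL: zeros enter on the left */
        return (uint16_t)(in >> amt);
    }
}

/* Bit k of cnt enables the stage that shifts by 2^k, so four stages
 * cover every shift count from 0 to 15. */
uint16_t barrel_shift(uint16_t in, unsigned cnt, unsigned op) {
    assert(cnt < 16 && op < 4);
    uint16_t out = in;
    for (unsigned k = 0; k < 4; k++)
        if (cnt & (1u << k))
            out = shift_stage(out, 1u << k, op);
    return out;
}
```

For example, `barrel_shift(0x8000, 15, OP_SRA)` returns `0xFFFF`, while the same inputs with `OP_SRL` return `0x0001`. In hardware, each enabled-or-bypassed stage corresponds to one level of 2:1 multiplexers.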
You should use Verilog for this homework. Before you start writing any Verilog, I suggest the following:
- Break down your design into sub-modules.
- Define the interfaces between these modules.
- Draw paper-and-pencil schematics for these modules.
- Then start writing Verilog.
Verify the design using representative inputs.
What to submit:
- Turn in neatly and legibly drawn schematics of your design.
- Annotated simulation trace of the complete design. Pick representative cases for your simulation input to turn in.
- Explain your choice of inputs and why they are representative.
- Electronically submit your Verilog source code. All of your source code must be in a single tgz file called hw2-p1.tgz, and the Vcheck output must be included in the tgz.
Handin instructions for homework 2
2. Problem 2 (30 points)
This problem should also be done in Verilog. Design a simple 16-bit ALU that performs 2's complement ADD, bitwise OR, bitwise XOR, bitwise AND, and the shift operations of the shift unit from Problem 1. In addition, the ALU must be able to invert either of its data inputs before performing the operation, and it must have a carry-in input (Cin) to enable subtraction: for example, A - B is computed as A + ~B + 1 by setting invB = 1 and Cin = 1. Another input line determines whether the arithmetic is signed or unsigned. Use a carry look-ahead adder (CLA) in your design. (Hint: First design a 4-bit CLA, then use blocks of this CLA to build the 16-bit CLA.)
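The CLA hint can be illustrated with a behavioral C model of the two-level structure. This is a sketch, not Verilog and not the required implementation. The names `cla4` and `cla16` are hypothetical. Propagate is taken as `a XOR b`; an `a OR b` propagate would also give correct carries, but not the sum bit. The second lookahead level computes each block's carry-in from the blocks' group generate (G) and group propagate (P) signals.

```c
#include <assert.h>
#include <stdint.h>

/* Per-bit generate and propagate for a 4-bit slice. */
static void gen_prop(unsigned a, unsigned b, unsigned g[4], unsigned p[4]) {
    for (int i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        g[i] = ai & bi; /* this bit generates a carry */
        p[i] = ai ^ bi; /* this bit propagates an incoming carry */
    }
}

/* 4-bit CLA block: every carry is a flat sum of products of g, p and c0,
 * so no carry ripples between bits. Also outputs group G and P. */
static unsigned cla4(unsigned a, unsigned b, unsigned c0,
                     unsigned *G, unsigned *P) {
    unsigned g[4], p[4], c[4];
    gen_prop(a, b, g, p);
    c[0] = c0;
    c[1] = g[0] | (p[0] & c0);
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0);
    *G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0]);
    *P = p[3] & p[2] & p[1] & p[0];
    unsigned sum = 0;
    for (int i = 0; i < 4; i++)
        sum |= (p[i] ^ c[i]) << i; /* sum bit = propagate XOR carry-in */
    return sum;
}

/* 16-bit adder: four cla4 blocks plus a second-level lookahead unit that
 * computes each block's carry-in C[k] from the group G and P signals. */
uint16_t cla16(uint16_t a, uint16_t b, unsigned cin, unsigned *cout) {
    assert(cin < 2);
    unsigned G[4], P[4], C[5];
    /* Group G and P do not depend on a block's carry-in, so get them first. */
    for (int k = 0; k < 4; k++)
        (void)cla4((a >> (4 * k)) & 0xFu, (b >> (4 * k)) & 0xFu, 0, &G[k], &P[k]);
    C[0] = cin;
    C[1] = G[0] | (P[0] & C[0]);
    C[2] = G[1] | (P[1] & G[0]) | (P[1] & P[0] & C[0]);
    C[3] = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & C[0]);
    C[4] = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
         | (P[3] & P[2] & P[1] & P[0] & C[0]);
    uint16_t sum = 0;
    for (int k = 0; k < 4; k++) {
        unsigned s = cla4((a >> (4 * k)) & 0xFu, (b >> (4 * k)) & 0xFu, C[k], &G[k], &P[k]);
        sum |= (uint16_t)(s << (4 * k));
    }
    *cout = C[4];
    return sum;
}
```

Because each carry expression depends only on g, p and the incoming carry, the same equations could be written as Verilog `assign` statements inside the 4-bit block and in a separate lookahead block.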
| Opcode | Function | Result |
|--------|----------|--------|
| 000 | rll | rotate left |
| 001 | sll | shift left |
| 010 | sra | shift right arithmetic |
| 011 | srl | shift right logical |
| 100 | ADD | A+B |
| 101 | OR | A OR B |
| 110 | XOR | A XOR B |
| 111 | AND | A AND B |
The external interface of the ALU should be:
Inputs
- A[15:0], B[15:0] - Data input lines A and B (16 bits each).
- Cin - A carry-in for the LSB of the adder.
- Op(2:0) - The 3-bit opcode, which determines the operation to be performed (see the table above).
- invA - An invert-A input (active high) that causes the A input to be inverted before the operation is performed.
- invB - An invert-B input (active high) that causes the B input to be inverted before the operation is performed.
- sign - A signed/unsigned select (active high for signed) that indicates whether the ADD function treats the data lines as signed or unsigned numbers. (This affects the OFL output.)
Outputs
- Out(15:0) - Data out (16 bits.)
- OFL - (1 bit) High if an overflow occurred.
- Zero - (1 bit) High if the result is exactly zero.
Other assumptions:
- You can assume 2's complement numbers.
- For logic functions, OFL is not asserted (i.e., it is kept low).
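A reference model of the whole ALU can help when annotating the simulation trace. The C sketch below is hypothetical (the names `alu` and `shift_ref` are illustrative) and rests on assumptions the handout does not fix: invA and invB apply to every opcode, shifts use A as the operand and B[3:0] as the count, shifts leave OFL low, and unsigned overflow is reported as the carry out of bit 15. Signed overflow uses the usual rule that two operands of the same sign produce a result of the other sign.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t out;
    unsigned ofl;  /* overflow flag */
    unsigned zero; /* 1 when out == 0 */
} alu_result;

/* Direct (non-barrel) reference for opcodes 000-011; n is 0..15. */
static uint16_t shift_ref(uint16_t x, unsigned n, unsigned op) {
    switch (op) {
    case 0: return n ? (uint16_t)((x << n) | (x >> (16 - n))) : x; /* rll */
    case 1: return (uint16_t)(x << n);                              /* sll */
    case 2: { /* sra: sign-extend to 32 bits, then shift */
        uint32_t wide = (x & 0x8000u) ? (0xFFFF0000u | x) : x;
        return (uint16_t)(wide >> n);
    }
    default: return (uint16_t)(x >> n);                             /* srl */
    }
}

alu_result alu(uint16_t A, uint16_t B, unsigned op, unsigned cin,
               unsigned invA, unsigned invB, unsigned sign) {
    assert(op < 8 && cin < 2 && invA < 2 && invB < 2 && sign < 2);
    uint16_t a = invA ? (uint16_t)~A : A; /* optional input inversion */
    uint16_t b = invB ? (uint16_t)~B : B;
    alu_result r = { 0, 0, 0 };
    switch (op) {
    case 4: { /* ADD: a + b + cin */
        uint32_t wide = (uint32_t)a + b + cin;
        r.out = (uint16_t)wide;
        if (sign) /* same-sign operands, result sign differs */
            r.ofl = ((~(a ^ b) & (a ^ r.out)) >> 15) & 1u;
        else      /* assumed: carry out of bit 15 */
            r.ofl = (wide >> 16) & 1u;
        break;
    }
    case 5: r.out = a | b; break;
    case 6: r.out = a ^ b; break;
    case 7: r.out = a & b; break;
    default: r.out = shift_ref(a, b & 0xFu, op); break; /* 000-011 */
    }
    r.zero = (r.out == 0);
    return r;
}
```

For example, with Op = 100, invB = 1 and Cin = 1 the model computes A - B (`alu(5, 3, 4, 1, 0, 1, 1).out` is 2). Under the carry-out assumption, an unsigned subtraction that does not borrow reports OFL = 1. Adjust `ofl` if your course uses a different convention.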
Use hierarchical design and simulate each block by itself before you try the complete design. You must reuse the shift unit designed in Problem 1.
What to submit:
- Neat, legible schematics (hand-drawn is fine).
- Annotated simulation trace output of the complete design. Pick representative cases for your simulation input.
- You should explain why your inputs are representative.
- Electronically submit your Verilog source code. All of your source code must be in a single tgz file called hw2-p2.tgz, and the Vcheck output must be included in the tgz.
Handin instructions for homework 2
3. Problem 3 (6 points)
Problem 2.52 in the In More Depth material on the CD of COD3e (Patterson and Hennessy, Computer Organization and Design, 3rd edition)
Use this instruction sequence instead of the one given:
c = a + b;
d = c - b;
a = c + d;
4. Problem 4 (6 points)
Problem 2.57 in the In More Depth material on the COD3e CD
5. Problem 5 (6 points)
See page B-43 (Appendix B) on the COD3e CD.
Answer the problem using these operands instead of the ones given:
a: 0110 1111 1011 1010
b: 1001 1100 0110 0110
6. Problem 6 (6 points)
Problem 4.10 on page 273 of COD3e
7. Problem 7 (6 points)
Problem 4.12 on page 274 of COD3e
8. Problem 8 (20 points)
Understanding ISAs. In this problem you will briefly analyze the x86 ISA and use some system software tools.
- You will first compile a simple C program and generate a binary.
- Disassemble the program (i.e., look at the instructions the compiler generated).
- Annotate the disassembled program and map it back to the original C source code.
Use ONLY the CS Linux machines for this problem.
Type in the following program and name it sum.c:
int main(void) {
    int i, sum;
    sum = 0;
    for (i = 0; i < 24; i++) {
        sum += i;
    }
    return sum;
}
Compile this program with gcc by typing the following at the Linux prompt:
prompt % gcc -O0 sum.c -o sum.exe
sum.exe is the generated binary. The -O0 flag turns off most compiler optimizations. We will now look at the instructions the compiler generated for us, using a tool called objdump. At the Linux prompt, type:
prompt % objdump --disassemble sum.exe
You will see a lot of output (close to 250 lines). Search for the string "<main>", which marks the start of the disassembled code for the function main in our program.
The first column is the address in the program, the second column lists the actual bytes that encode each instruction (notice the variable instruction lengths), and the third column is the assembly-language instruction.
Annotate each instruction with what it is doing and map them back to the few lines of C-source code.
x86 ISA reference:
- Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture
- Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M
Describes the instruction format and provides reference pages for instructions (from A to M). This volume also contains the table of contents for both Volumes 2A and 2B.
- Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2B: Instruction Set Reference, N-Z
Provides reference pages for instructions (from N to Z). VMX instructions are treated in a separate chapter. This volume also contains the appendices and index for Volumes 2A and 2B.
What you must submit:
- Read up on what objdump is and describe what it does in a few lines.
- Print the assembly-language code and annotate it with your comments on how each instruction maps back to the source code.
- Bonus question 1: Compile the program with -O3 (which enables more aggressive optimizations), run objdump, and examine the code for main. You will notice far fewer instructions. Explain what the instructions are doing now and why there are fewer of them.
- Bonus question 2: Why does the program binary contain close to 250 instructions for only a few lines of C source code? What are the instructions outside of the main function doing?
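For Bonus question 1, it helps to notice that the loop in sum.c has no inputs, so its result can be worked out ahead of time. The C sketch below (the function names `loop_sum` and `closed_form_sum` are hypothetical) checks the loop against the closed form 0 + 1 + ... + (n - 1) = n(n - 1)/2. Whether gcc -O3 actually replaces the loop with a constant on your machine is what the bonus question asks you to observe.

```c
#include <assert.h>

/* The loop from sum.c, parameterized by the iteration count. */
int loop_sum(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Closed form of the same sum: for n = 24 it is 24 * 23 / 2 = 276,
 * a value that is fixed at compile time. */
int closed_form_sum(int n) {
    assert(n >= 0);
    return n * (n - 1) / 2;
}
```

If you run ./sum.exe and then `echo $?` at the shell, you should see 20 rather than 276, because only the low 8 bits of main's return value (276 mod 256) survive as the process exit status.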
9. Errata to Solution Set
Q2. ALU Diagram
The OFL output also needs to consider output[2] as one of its inputs.
Q3. Stack-based assembly code.
There are two consecutive 'push addrC' instructions in the code. Either one of them should be 'push addrD'.
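The corrected stack code can be checked with a small interpreter. The C sketch below is hypothetical. The mnemonics echo the erratum's 'push addrX', but the instruction encoding, the `run` function, and the operand order of SUB (next-to-top minus top) are assumptions. The program listing is one possible stack sequence for Problem 3's statements, not a copy of the official solution.

```c
#include <assert.h>

/* Hypothetical zero-address (stack) machine. */
enum opcode { PUSH, POP, ADD, SUB };
enum var { VA, VB, VC, VD, NVARS };

struct insn {
    enum opcode op;
    enum var addr; /* memory operand; ignored by ADD and SUB */
};

/* Execute prog against mem (a, b, c, d). ADD and SUB pop two values and
 * push the result; SUB computes next-to-top minus top (assumed order). */
static void run(const struct insn *prog, int n, int mem[NVARS]) {
    int stack[16];
    int sp = 0; /* number of values currently on the stack */
    for (int i = 0; i < n; i++) {
        switch (prog[i].op) {
        case PUSH: assert(sp < 16); stack[sp++] = mem[prog[i].addr]; break;
        case POP:  assert(sp > 0);  mem[prog[i].addr] = stack[--sp]; break;
        case ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case SUB:  assert(sp >= 2); sp--; stack[sp - 1] -= stack[sp]; break;
        }
    }
}

/* c = a + b;  d = c - b;  a = c + d; */
static const struct insn program[] = {
    {PUSH, VA}, {PUSH, VB}, {ADD}, {POP, VC},
    {PUSH, VC}, {PUSH, VB}, {SUB}, {POP, VD},
    {PUSH, VC}, {PUSH, VD}, {ADD}, {POP, VA}, /* 'push addrD', not a second 'push addrC' */
};
#define PROGRAM_LEN ((int)(sizeof program / sizeof program[0]))
```

Starting from a = 3 and b = 5, this program leaves c = 8, d = 3 and a = 11. In this reconstruction, a second 'push addrC' in place of 'push addrD' would instead store a = 16.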
Q4. The correct solution is:
2.57 solution:
MIPS:
- add $t0, $zero, $zero # i = 0
- addi $t1, $zero, 10 # set max iterations of loop
- loop: sll $t2, $t0, 2 # $t2 = i * 4
- add $t3, $t2, $a1 # $t3 = address of b[i]
- lw $t4, 0($t3) # $t4 = b[i]
- add $t4, $t4, $t0 # $t4 = b[i] + i
- sll $t2, $t0, 3 # $t2 = i * 4 * 2
- add $t3, $t2, $a0 # $t3 = address of a[2i]
- sw $t4, 0($t3) # a[2i] = b[i] + i
- addi $t0, $t0, 1 # i++
- bne $t0, $t1, loop # loop if i != 10
PowerPC:
- add $t0, $zero, $zero # i = 0
- addi $t1, $zero, 10 # set max iterations of loop
- addi $a1, $a1, -4 # back up one word so the first lwu loads b[0]
- loop: lwu $t4, 4($a1) # $t4 = b[i]; $a1 = $a1 + 4
- add $t4, $t4, $t0 # $t4 = b[i] + i
- sll $t2, $t0, 3 # $t2 = i * 4 * 2
- sw $t4, $a0+$t2 # a[2i] = b[i] + i
- addi $t0, $t0, 1 # i++
- bne $t0, $t1, loop # loop if i != 10
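The comments in both versions describe the same loop, a[2i] = b[i] + i for i = 0 to 9. The C sketch below reconstructs it from those comments. It is not quoted from the textbook, and the function names and array sizes are hypothetical. It also shows the byte offsets the assembly computes for 4-byte words.

```c
#include <assert.h>
#include <stddef.h>

/* a[2i] = b[i] + i for i = 0 .. 9 (reconstructed from the comments). */
void loop_2_57(int a[20], const int b[10]) {
    for (int i = 0; i != 10; i++) /* like bne $t0, $t1, loop */
        a[2 * i] = b[i] + i;
}

/* Byte offsets for 4-byte MIPS words. */
size_t offset_b_i(size_t i)  { return i << 2; } /* b[i]:  i * 4     (sll by 2) */
size_t offset_a_2i(size_t i) { return i << 3; } /* a[2i]: i * 4 * 2 (sll by 3) */
```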
Q6. For compiler C3 solution,
- Using I1, C3 produces programs with CPI (0.5*2 + 0.25*3 + 0.25*3) =3
- Using I2, C3 produces programs with CPI (0.5*1 + 0.25*2 + 0.25*2) = 1.5
Performance ratio of I1 vs I2 = (6/3)/(3/1.5) = 1
- Thus, I1 and I2 have the same performance on C3.