Project 1: Intro to C/Unix

Important Dates

Due: Friday, 9/28

Questions?

Send questions to 354-help@cs.wisc.edu (not to the TA or professor directly), or, of course, visit us in person during office hours! If the question is about your code, copy all of your code into the handin directory (described below) and include your login in your email (don't worry, you are free to modify the contents of your handin directory prior to the due date). Also include all other relevant information, such as a copy of what you typed and the output that appeared on the screen. In general, the more information you give, the more we can help!

Overview

This project is designed to quickly get you up to speed on C and the surrounding fun of programming in a Unix environment.

This project should be done by yourself. Copying code (from other students) is considered cheating. Read this for more info on what is OK and what is not.

Before Beginning

Before you begin, please read this section carefully. It includes a fair amount of necessary background for the project. READ CAREFULLY!

Read this tutorial. It has some useful tips for programming in the C environment. You should definitely read this once or twice or three times. The source of the info is yours truly, from our free operating systems book.

Also read key chapters of the C book (K+R):

  • character arrays, strings, character pointers (1.9, 5.5, 5.10)
  • using array notation on pointers and vice-versa (5.3)
  • pointer arithmetic (also called address arithmetic) (5.4)
  • self-referential structures (6.5)
Actually, read the entire C book if you can. Skip over parts that are highly detailed, and just get a sense of all that is in there. It is a short book!

Finally, you have to familiarize yourself with logging into the machines in the mumble lab, using a Unix shell as well as a text editor, and all of that other fun stuff. Read more about each to become a master of Unix/C programming!

In particular, there are two tools you need to become quite familiar with. The first is a shell. The shell is where you type commands; getting efficient at doing so (and in particular, learning all of the keyboard shortcuts, etc.) makes you more wizard-like. Spend the time and read the documentation! You can type man bash to learn more about the bash shell, for example; it is a long but worthwhile read! And while many people use the bash shell, others use tcsh (like me) or zsh or even more exotic ones. Pick one you like, learn how to use it well, customize it to make yourself more efficient, and in general spend the time to get better at it!

The second tool of note is your text editor. Becoming efficient within a particular text editor means spending a lot of time with it, learning all of the keyboard shortcuts, and in general being able to type in and move around code quickly. The editor I use is emacs, an amazingly powerful and programmable editing environment. Some others use vim, a variant of another old editor known as vi. Neophytes use tools like gedit, which perhaps is not a bad way to begin, but is really not powerful enough to get the job done in your career; even if you start here, you'll need to eventually move to something better. There are some rather cool platform-specific editors out there, e.g., TextMate for the Mac. You might feel like checking something like that out too.

Here are some useful tutorials. Do note that it takes a long time of simply using and learning about any of these tools to become proficient. Start spending the time now!

Project Description

This project has lots of little parts, each of which is a C program that you will write. To begin with, you are given two tools, strgen.c and intgen.c. These tools do basically the same thing: generate some number of random numbers (from -25000 to 25000 or so), and write them out to a file.

However, they differ in one important way: strgen generates its output as a series of strings, which in C are arrays of characters; each character is represented via its ASCII code. In contrast, intgen generates its output in binary (integer) format, and thus each number is encoded in compact two's-complement form. Your task will be to build tools that deal with the different types of files each of these generates.

Step One: Use The Tools

To get yourself warmed up, first make a directory where you will be working in your CS account. Probably best to do this in your private/ directory, which has permissions set such that no one else can easily access your files. Like this:

shell% cd private/
shell% mkdir cs354
shell% cd cs354
shell% mkdir p1
shell% cd p1
The above sequence changes directories (using the cd command) into your private directory, creates a new directory (sometimes called a folder) called cs354 inside of the private directory, changes into it, and then makes a new directory for this project called p1 and finally cd's into it.

Then copy the files strgen.c and intgen.c into that directory. We've put these files in a public folder for the class called ~cs354-3/public/p1. Thus, to copy the files over do the following:

shell% cp ~cs354-3/public/p1/strgen.c .
shell% cp ~cs354-3/public/p1/intgen.c .
Note the copy command copies a file from the source (first argument) to a target (second argument). When you list the second argument as a dot (which looks like a period), that is just telling the command to put the result into the current directory (which in Unix systems is referred to by a dot).

Finally, compile the two programs with the following lines:

shell% gcc -o intgen intgen.c -Wall -m32 -O
shell% gcc -o strgen strgen.c -Wall -m32 -O

At this point, you should have two executable programs. Let's run strgen first. To run it, type: ./strgen testfile 10 0 into the shell, like this:

shell% ./strgen testfile 10 0
Note that the shell% above is called the shell prompt, and may be different for the shell you are using; indeed, you can customize it in any way that you like (read your shell's man page for details). The reason we write ./strgen instead of simply strgen is to tell the shell where to find the program you are trying to run; putting those two characters in front of strgen tells the shell that the program strgen is in the current directory. You can do away with the need for that by adding the current directory (dot) to your search path; read the man page of your shell for details.

If you've done everything correctly, running the program should create a file named testfile with 10 numbers in it, one per line. The last number passed in to strgen, zero, is a random number seed; by changing this value, you can put a different set of randomly-generated numbers into the file. Try it!

To see the results of your work, simply type cat testfile into the shell. The cat program is a standard Unix tool for displaying text files; type man cat at the shell prompt to learn more about cat and its various options. Because strgen just creates a bunch of text in a file (which happen to be ASCII numbers), cat works just fine and displays the output to you.

The program intgen is similar, with one difference. Instead of writing numbers to the file as text, it writes them in binary form, storing each as a 4-byte integer in raw two's-complement representation. Run it too, using directions quite similar to those above.

Data inside files created by intgen is a little harder to view. Try running cat on it, for example. What do you see? Do you understand why you see something weird?

Step Two: Understand The Tools

Now that you have some sense of how the tools work, you should spend some time studying their code. How does each open a file? How does it print contents to the screen? How does it get input from the user via the argv array? All of these mysteries are answered in the simple code we give you; study the code and make sure you understand it before proceeding. The reason is simple: you can copy much of this code in building your remaining tools!

Step Three: Building Your Own Cat

Don't worry, we're not going to build a real cat here (that said, if you know how to build a real live cat from C code, please do contact me). What we'll do is make two versions of cat; the first is called strcat, and the second intcat. Each of these is run in a similar way, and should be used to display the contents of a file made by the respective tools strgen and intgen.

Let's start with the strcat tool. This tool will be run as follows:

shell% ./strcat testfile
When run like this on a file generated by strgen, the program should display the numbers in the file, one number per line. Note that if the user calls the program incorrectly (say, without an argument, or with two or more arguments), the program should print an error message (usage: strcat ) and call exit(1).

To build this tool, you should use the following routines: fopen (to open the file), fgets (to read each line), printf (to print each line), and fclose (to close the file when done). To learn more about these routines, read the man pages by typing, for example, man fgets at the prompt:

shell% man fgets
Note that strgen.c uses some of these (e.g., fopen), but perhaps slightly differently. For example, fopen is called by strgen with a “w” as the second argument, indicating the file should be created and opened for writing. You'll need to use the “r” flag as this program only needs to read the file, not write it.

The second tool you'll be building (intcat) is slightly different. Instead of using fopen/fgets/fclose to read the file, you should instead use the calls open/read/close, much like intgen. The flags passed to open are a little different; see the intgen.c source code for how to open and create a file (lots of ugly flags like O_WRONLY|O_CREAT|O_TRUNC are used); reading a file is easier, however, just needing an O_RDONLY flag and nothing else. Read the man pages of these calls carefully to see how to open, read, and close a file correctly.

shell% man 2 open
shell% man 2 read
shell% man 2 close
Note that the number two is required above to get the right version of the manual page (section 2 of the man pages is for system calls, which are calls to the operating system for service of some kind).

Both of these tools (intcat and strcat) generate human-readable text and print it to the screen; thus, the output format is in readable ASCII characters, not binary, even for intcat.

Step Four: Building Reverse

In this penultimate portion of the project, you probably have the hardest task in front of you: to build a program that reverses what is in a file generated by one of the tools above, producing a new file with the numbers listed backwards. Thus, you will be creating strrev, which takes a file with ASCII number strings (as generated by strgen) and produces an output file with those numbers listed in reverse order (and still in ASCII format). You will also create intrev, which does the same but takes an input file filled with binary integers (as generated by intgen) and reverses those, thus creating a new binary file. Some more details follow.

Specifically, the form each of these tools should take is:

shell% ./strrev infile.str outfile.str
shell% ./intrev infile.int outfile.int
This means that each program is run with two arguments, the first being the name of the input file (in the right format!), and the second being the name of the output file (which is in the same format).

To achieve these goals, you'll be using two different implementation strategies. We'll start with intrev as it is a little easier. For the intrev program, the idea is to allocate a large array (big enough for the entire file), read the file into that array, and then simply step through the array backwards, writing out the elements in reverse order.

There are a few routines you need to learn how to use in order to make this work as desired. The first is the fstat call (or the stat call), which can be used to figure out the size of the file. Read the manual page (man fstat) to see how to call it and what it returns about the size of the file.

Then you'll have to call malloc to allocate a big array of integers of that size. This is rather easy; for example, to allocate an integer array of size 100 bytes (which can hold 25 integers, as each integer is 4 bytes), we just do the following in C:

int *p = malloc(100);
// now p[0] refers to the first int, p[1] the second, ..., up to p[24]

Then you should just call one big read() system call, with the array as the second parameter and the size of the file as the third parameter, to read in the contents of the file. At this point, the entire file is in memory, and each element can be easily accessed via array subscripts.

Your last task for intrev is to step through this array backwards in a loop, printing out each integer to a file, one integer at a time. Use your intcat program to see if it works!

The second reverse program, strrev, is a little more complicated, and requires you to build a simple stack in C. A stack is a data structure in which the last element put in will be the first removed, much like a stack of plates. In your strrev program, every time you read a line of input from the file, you will add that line to your stack; when you have reached the end of the file, you simply open (and create) a new output file, and start popping elements off your stack and writing them to the file. Doing this (correctly!) will lead to a nice newly created file in reverse order.

To do this you'll basically have to understand how to build a simple list-like data structure in C. Read the appropriate chapters of K+R and see if you can figure it out. There's not much code to write, but of course getting it wrong makes things go very poorly. That's C for you!

Step Five: Building strtoint

In this final part of this rather detailed project, you'll be building one last tool: strtoint. This tool takes a file in str format (as generated by strgen) and writes out an integer (binary) version of the same, such that the resulting new file seems as if it has been generated by intgen and not strgen.

This code will be much like the code in strcat above (for reading in a file), and it should have some similarity to intgen as well (for writing the file); the tricky part is parsing the input. Given an input line, you'll have to figure out how to read in a number in ASCII format (including, possibly, a leading negative sign) and then turn it into the binary form for writing. The routine atoi() will be useful for the conversion from string form to integer form; read the man page to see exactly what input it accepts, including how it treats a leading sign. Understanding how character arrays work, and how to access elements of them, will be key in this part.

Bonus question to ponder: Why isn't building a similar tool inttostr very challenging?

Grading

Our grading will be mostly based on automated tests that the TA creates to test each of your programs. More on this shortly. The tests are usually released one week before the deadline.

Handing It In

To turn in your source code for grading, copy all the source files into your handin directory. Your handin directory for this first project is ~cs354-3/handin/login/p1 where login is your login. For example, Remzi's login is remzi, and thus he would copy his beautiful code into ~cs354-3/handin/remzi/p1. Copying of these files is accomplished with the cp program, as follows:

shell% cp strcat.c ~cs354-3/handin/remzi/p1/
shell% cp intcat.c ~cs354-3/handin/remzi/p1/
shell% cp strrev.c ~cs354-3/handin/remzi/p1/
shell% cp intrev.c ~cs354-3/handin/remzi/p1/
shell% cp strtoint.c ~cs354-3/handin/remzi/p1/
A faster way to copy all of the files at once might be this:
shell% cp *.c ~cs354-3/handin/remzi/p1/
The use of *.c matches all files that end in .c (or dot-c as it is pronounced).

Finally, please create a README file in your p1 directory. You can create it with your text editor of choice. In it, briefly describe what you did, as you would if someone were downloading the code and you wanted to tell them a little about it and how it all works.