CS 537 Notes, C Programming

1 Why C? And Why this Primitive Command-Line Stuff?

C is, by far, the most common language used for operating system implementation. It is simple to the point of being primitive, with few unexpected behaviors. System designers like it because it is predictable.

In addition, it is a least common denominator: C is close to a subset of, or can interoperate with, almost any other language (for example, C++, Fortran, or Java).

Visual programming environments, with their automatic dependence analysis and GUIs, are quite pleasant and helpful. However, they are almost never used in operating system programming. First, the code tends to be so large and complex that it is difficult to manage in one of these environments strictly through the visual interface. Even Microsoft used Makefiles and command-line operations in building and testing Windows.

Second, debugging an operating system is still a black art. In many cases, especially relating to the lowest levels of the system, you do not have a debugger available. In the cases where you do have a debugger (such as when using a virtual machine partition, firmware debugger, or network debugger), it is often quite primitive and uses simple command lines.

As my advisor (Mike Powell) likes to say, "Life is tough on the frontiers of computer science."

2 Compiling

The simplest case (which you will not encounter in CS537) is when the source program is entirely in a single file. C source files end with the .c suffix. To compile a single program (foo.c) into an executable file, you would type:

    gcc foo.c

The result of a successful compilation will be an executable file with the name a.out. You can specify the name of the executable with the -o option:

    gcc -o foo foo.c

To get more help from the compiler in finding questionable programming practices that can lead to errors, include options that turn on warnings and stricter checking against the C standard:

    gcc -o foo -Wall -pedantic foo.c

3 Separate Compilation 1: Compiling

Compiling your whole program in one source file is a truly bad idea. You should divide your program into clear, separate abstractions, based on data types (such as lists, search trees, or page tables) and basic functional units (such as command-line options processing or memory management).

Your program, therefore, will be divided into several .c files, each of which will be compiled separately. For each source file, you will create a binary file called an object file. In the Linux world, these files have the suffix .o, while in the Windows world the suffix tends to be .obj. After you have compiled each source file to an object file, you will link them all together into a single executable file.

To compile file bar.c to an object file (called bar.o):

    gcc -c -Wall -pedantic bar.c

After you have compiled each of your source files, foo.c, bar.c, and glarch.c, into object files, you can link them together into an executable with:

    gcc -o foo foo.o bar.o glarch.o

4 Separate Compilation 2: The C Code

4.1 Functions

Functions in one .c file need to be able to reference functions in another file. This means that the compiler needs to know the name, the parameter types, and the return type of each function that is defined in another file. C handles this situation with a couple of features. First, we can write a prototype declaration for a function. A prototype describes the name of the function and the types of its parameters and return value. Note that a prototype does not need to name the parameters; only their types are required. So, if you had a function:

    int PageNumber (int virtualAddress)
    {
        const int PAGESIZE = 1024;
        return (virtualAddress / PAGESIZE);
    }

Its prototype declaration would be:

    int PageNumber (int);

Second, for each source file, it is a good idea to create a separate header (.h) file that contains the prototype declarations. Then, we include the header file in every other source file that uses the functions.
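As a minimal sketch of what the prototype buys you, the code below calls PageNumber before its definition appears. The caller SamePage is a hypothetical function added only for illustration; it is not part of the course code.

```c
/* Prototype: gives the compiler the name, parameter types, and return type.
 * The parameter name is optional here; it is kept only as documentation. */
int PageNumber(int virtualAddress);

/* Hypothetical caller (an assumption, not from the notes). It can use
 * PageNumber before the definition because the prototype is in scope. */
int SamePage(int addressA, int addressB)
{
    return PageNumber(addressA) == PageNumber(addressB);
}

/* Definition: the actual body, as given in the notes. */
int PageNumber(int virtualAddress)
{
    const int PAGESIZE = 1024;
    return (virtualAddress / PAGESIZE);
}
```

Without the prototype, the compiler would reach the call inside SamePage knowing nothing about the argument or return types of PageNumber, and -Wall would complain about an implicit declaration.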

Assume that function PageNumber is defined in a file called PageTable.c and its prototype appears in PageTable.h. If this function needed to be used from file VirtualMem.c, then the code in VirtualMem.c would look something like:

    #include "PageTable.h"

    . . .
    p = PageNumber (pn);
    . . .

If PageTable.c also defines data types (for example, a new structure type), and other code that uses PageTable.c needs these types, then you should put those type declarations in the header file as well.
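The layout described above might look like the following sketch. It is shown as one listing for brevity; the comments mark where each file would begin. The structure PageTableEntry, its fields, the function FrameFor, and the include-guard name PAGETABLE_H are all assumptions made for illustration and are not taken from the course code.

```c
/* ---- PageTable.h (sketch) ---- */
#ifndef PAGETABLE_H          /* include guard: safe to include more than once */
#define PAGETABLE_H

/* Hypothetical type shared with every file that includes this header. */
struct PageTableEntry {
    int frameNumber;         /* physical frame holding the page */
    int valid;               /* nonzero if the page is in memory */
};

int PageNumber(int virtualAddress);
int FrameFor(const struct PageTableEntry *table, int tableSize,
             int virtualAddress);

#endif /* PAGETABLE_H */

/* ---- PageTable.c (sketch) ---- */
/* In a real project this file would begin with: #include "PageTable.h" */

int PageNumber(int virtualAddress)
{
    const int PAGESIZE = 1024;
    return (virtualAddress / PAGESIZE);
}

/* Returns the frame number for an address, or -1 if the page is
 * outside the table or not valid. */
int FrameFor(const struct PageTableEntry *table, int tableSize,
             int virtualAddress)
{
    int page = PageNumber(virtualAddress);
    if (page < 0 || page >= tableSize || !table[page].valid) {
        return -1;
    }
    return table[page].frameNumber;
}
```

Including the header in PageTable.c itself is a common habit, because the compiler then checks each prototype against its definition.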

4.2 Variables

A global variable is one that is not local to any specific function, i.e., it is statically allocated and accessible to all functions in all files.

Note, however, that global variables should be used exceedingly sparingly. Because they can be accessed and modified from anywhere, it is easy to use such a variable in an undisciplined and careless way. As a result, they are quite error prone.

As with a function, a global variable has exactly one definition, in one file, that actually allocates the variable, plus declarations that reference it from other files. So, in only one file, outside any function, we define a character pointer:

    char *fileName;

And in each other file that wants to use this variable, we declare it as:

    extern char *fileName;
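The split between the extern declaration and the single definition can be sketched in one listing, with comments marking which line would live in which file. The helper CurrentFileName is hypothetical and exists only to show a use of the variable.

```c
#include <stddef.h>

/* What every other file would see: a declaration that names the
 * variable but allocates nothing. */
extern char *fileName;

/* Hypothetical helper in "another file" (an assumption, not from the
 * notes). It compiles with only the extern declaration in scope. */
const char *CurrentFileName(void)
{
    return (fileName != NULL) ? fileName : "(none)";
}

/* The single definition, in exactly one file, outside any function.
 * This line allocates the storage. With no initializer, a file-scope
 * pointer starts out as NULL. */
char *fileName;
```

If the definition were missing from every file, each file would still compile, but the link step would fail with an undefined reference to fileName.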

5 Dynamic Memory Allocation

By indirections find directions out...
Polonius in
Hamlet, Act 2

Systems programming heavily uses dynamically allocated data structures. Examples of such data structures are lists or queues that grow as you add elements. Typically, new elements are allocated when needed and linked into the data structure by storing pointers to the newly allocated element. When elements are removed from the data structure, the memory must be freed.

You can find more explanation in OSTEP Chapter 14.

The library call malloc is used to allocate memory and free is used to free it. In C++, you would accomplish the same thing with the new and delete operations. In Java, new allocates an object and the garbage collector reclaims it, so there is no explicit delete.

A few observations about dynamically allocated memory:

You must master the use of malloc, free, and pointers. There is no way to survive as a systems programmer without it.

Pointers and dynamically allocated memory are extremely error prone. Common errors are forgetting to allocate an object (uninitialized pointer), using a pointer after memory has been freed, freeing memory more than once, using a pointer to reference the wrong type of data structure, and forgetting to free memory ("memory leak"). Careful and disciplined use of pointers is important.

When things go wrong with the use of pointers, debugging can be a serious challenge. You can scribble on a random memory location with a bad pointer and then not encounter the problem until much later in the program's execution. Finding the source of such problems is not easy. Later in the semester, we will try out some debugging tools that help with such problems.
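To make the observations above concrete, here is a minimal sketch of a queue that grows as elements are added and frees each element as it is removed. The type names item and queue and the functions Enqueue and Dequeue are assumptions chosen for illustration. The comments point out where each of the common errors listed above is avoided.

```c
#include <stdlib.h>

/* Hypothetical singly linked queue of ints. */
struct item {
    int value;
    struct item *next;
};

struct queue {
    struct item *head;   /* oldest element, removed first */
    struct item *tail;   /* newest element */
};

/* Adds value at the tail. Returns 0 on success, -1 if malloc fails. */
int Enqueue(struct queue *q, int value)
{
    struct item *n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;               /* never use a pointer that was not allocated */
    }
    n->value = value;
    n->next = NULL;              /* initialize every field before linking */
    if (q->tail == NULL) {
        q->head = n;
    } else {
        q->tail->next = n;
    }
    q->tail = n;
    return 0;
}

/* Removes the head into *out. Returns 0 on success, -1 if empty. */
int Dequeue(struct queue *q, int *out)
{
    struct item *old = q->head;
    if (old == NULL) {
        return -1;
    }
    *out = old->value;           /* read what is needed before freeing */
    q->head = old->next;
    if (q->head == NULL) {
        q->tail = NULL;          /* do not leave a dangling tail pointer */
    }
    free(old);                   /* freed exactly once; old is not used again */
    return 0;
}
```

A caller would start with both head and tail set to NULL and should dequeue every remaining element before discarding the queue, so that no memory is leaked.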

Here is a simple structure declaration for a node in a binary tree:

    struct node {
        char *nodeName;            /* Name of tree node */
        int nodeValue;             /* A value associated w/this node */
        struct node *leftChild;
        struct node *rightChild;
    };

If we declare a pointer:

    struct node *p;

We allocate a new element of type node with:

    p = (struct node *)malloc(sizeof(struct node));
    if (p == NULL) {
        /* handle the error case */
    }

And, when we're done with this node, we can deallocate the memory with:

    free(p);
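Building on the struct node declaration above, the following sketch wraps allocation and deallocation in two helpers. NewNode and FreeTree are hypothetical names, and the choice to give each node its own copy of the name string is an assumption, not something the notes specify.

```c
#include <stdlib.h>
#include <string.h>

struct node {
    char *nodeName;            /* Name of tree node */
    int nodeValue;             /* A value associated w/this node */
    struct node *leftChild;
    struct node *rightChild;
};

/* Allocates a leaf node holding its own copy of name.
 * Returns NULL if either allocation fails. */
struct node *NewNode(const char *name, int value)
{
    struct node *p = malloc(sizeof(struct node));
    if (p == NULL) {
        return NULL;
    }
    p->nodeName = malloc(strlen(name) + 1);   /* +1 for the terminating '\0' */
    if (p->nodeName == NULL) {
        free(p);               /* avoid leaking the node on partial failure */
        return NULL;
    }
    strcpy(p->nodeName, name);
    p->nodeValue = value;
    p->leftChild = NULL;       /* malloc does not zero memory */
    p->rightChild = NULL;
    return p;
}

/* Frees a whole subtree, children first. Returns the number of nodes freed. */
int FreeTree(struct node *p)
{
    int count;
    if (p == NULL) {
        return 0;
    }
    count = 1 + FreeTree(p->leftChild) + FreeTree(p->rightChild);
    free(p->nodeName);         /* free what the node owns before the node */
    free(p);
    return count;
}
```

Freeing the children before the parent matters: once free(p) has run, p->leftChild can no longer be read safely.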

Copyright © 2011, 2018, 2020 Barton P. Miller