Project #3: A Real Page Turner

Project Due Date: Thursday, April 18 at 11:59 PM

0 Updates and Notes

4/14: Correction: The hand-in directory was previously specified incorrectly. Please look below for the correction (which is as before).

4/8: Correction: Please realize that all of the scheduling mechanisms should work even if a process never calls File_Mmap(). Thus, once a process calls OS_Init(), it is a part of the system and should be scheduled until OS_Fini() is called. This updates a previous version of this document which stated something about contacting the OS during File_Mmap() to get scheduled, which has now been removed from the document.

4/4: The quantum time on the command line to the OS should be specified in milliseconds (ms).

4/1: If you want to use PAGESIZE in a struct, you may have difficulty, because it is really a run-time variable defined by the system. Therefore, to use it in a struct, you may just want to go ahead and declare your own MYPAGESIZE definition and set it to (8192), the page size used by the SPARC machines in the Nova lab. If you want to be a careful programmer (which you do), put an assertion into your code to ensure that MYPAGESIZE is equal to PAGESIZE, i.e., assert(PAGESIZE == MYPAGESIZE). To use assertions, you have to include the header file "assert.h". Read more about assert() in its man page.
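The MYPAGESIZE advice above can be sketched as follows. This is only an illustration: `struct page_msg` and `page_size_matches()` are hypothetical names, and the run-time page size is read with `sysconf(_SC_PAGESIZE)` here rather than the PAGESIZE symbol, which is an assumption made for portability.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations; ignored elsewhere */
#include <assert.h>
#include <unistd.h>

/* Compile-time constant, so it can size an array inside a struct.
 * 8192 is the page size the handout gives for the Nova lab SPARC machines. */
#define MYPAGESIZE 8192

/* Hypothetical message layout: one page of file data plus its page number. */
struct page_msg {
    int  page_number;
    char data[MYPAGESIZE];
};

/* Returns 1 if the run-time page size equals MYPAGESIZE, 0 otherwise. */
int page_size_matches(void)
{
    return sysconf(_SC_PAGESIZE) == (long)MYPAGESIZE;
}
```

A careful program would then call `assert(page_size_matches());` once during start-up, for example at the top of OS_Init().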

Tip: "/usr/bin/man -s 3head signal" gives a reasonable overview of signals. How did I find this? Look in the SEE ALSO section of a man page sometime (in this case, I was looking at "/usr/bin/man -s 2 kill"), and you'll see other man pages that are related to the one you are looking at.

Tip: In this description, I will call your user-level operating system (the thing that you implement), the "OS", and I will call the real OS "Solaris". Hopefully, there will be no confusion between the two.

1 Objective

When you are finished with this project, you should have a much better understanding of how to schedule processes and how to manage virtual memory. You will also expand your Unix knowledge to include signals, memory mapping, and memory protection.

2 Overview

This project will again include our two old friends, the user-level OS, and the library interface to OS services, LibOS. Specifically, we are adding two new features to the OS: the ability to schedule processes that are connected to it, and some basic memory management services. Both of these aspects will require you to learn about Unix Signals, as they will form the core mechanism for accomplishing our goals. The memory management part of the project will require you to understand mmap() and mprotect(), two of the fundamental memory management system calls of Unix-based systems.

A note about threads: Your previous OS was a multi-threaded beast, but this OS does not have to be. In other words, you can use a much simpler non-threaded structure inside your OS to service requests from the LibOS. Of course, if you want to use threads, you can, but if you do, please see me first.

A note about signals: In a number of parts of this project, you'll need to use signals to implement some key piece of functionality. The basic idea is this: a process (or even the real OS, i.e., Solaris) can send signals to other processes; if the process receiving the signal expects that signal and wants to know when it happens, it can register a signal handler with Solaris. A handler is just a function that gets called when the process is sent a particular signal. Setting up a signal handler, and then having it invoked when that signal occurs, is called catching a signal. There are many more details that you'll need to learn about signals, and they are described further below, including which signals to use, how to send signals, how to set up your signal handler, and so forth.

As always, this project involves a lot of work, so start early! A lot of coding is involved, as well as careful design work to make sure your overall structuring will be satisfactory.

3 Scheduling

We'll start by describing the scheduling part of the project. Your OS will now have to keep track of which processes it is scheduling, and then implement a scheduling policy in order to switch between them. All policies will only allow one process to run at a time (by process, I mean one of the processes that is connected to your OS). To understand how this all will work, there are a number of items we need to understand better.

3.1 How does the OS know which processes to schedule?

To schedule processes, the OS must know which processes to schedule, and in particular needs to know their process identification numbers, or what are commonly known as PIDs. To get the process PID, you should add code into the OS_Init() routine, which contacts the OS and tells it the PID of the process that is sending the message to the OS. To obtain the PID of a process, just call getpid(), a system call which returns the PID of the process that called it. Read the man page for more details.

3.2 How do I control process execution?

For some scheduling policies (e.g., round robin), you will need to stop and start processes during their execution. Stopping and starting can both be accomplished by sending processes a signal from the OS. How is this done? The process is quite simple, actually. To send a signal to a process, you should use the kill system call, or the more general sigsend() system call. Type "/usr/bin/man -s 2 kill" or "/usr/bin/man -s 2 sigsend" to find out more about this call (note: there is also a command line program called kill, and it can be used to send signals to processes from the command line. Internally, all the kill program does is call the kill() system call).

Of course, you have to know which signal to send to a process to stop it from running, and then perhaps to get another one running. To stop a process, use the SIGSTOP signal. To continue running a process that has been stopped, use the SIGCONT signal. Of course, there are many other signals that can be sent to a process, but those two should be good enough for now. Note: The signals that are used to stop and start processes do not have to be "caught". In other words, the LibOS does not have to set up a handler to try and catch them. When you don't set up a handler, Solaris takes the "default" action of that signal; in this case, it will be to either stop or start the process. SIGSTOP can't be caught anyhow (SIGCONT can be, but there is no need to), but that may be beside the point.
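As a rough sketch of stopping and continuing a process with kill(): the helper names below are hypothetical, and the demo forks a child only so that waitpid() can confirm the state changes. In the project, the scheduled processes are not children of the OS, so the OS cannot use waitpid() this way.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations; ignored elsewhere */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Refuse pid <= 0: kill() treats those values as process groups or "everyone". */
int stop_process(pid_t pid)     { assert(pid > 0); return kill(pid, SIGSTOP); }
int continue_process(pid_t pid) { assert(pid > 0); return kill(pid, SIGCONT); }

/* Fork a child, stop it, continue it, then clean up.
 * Returns 0 if both state changes were observed, -1 otherwise. */
int demo_stop_and_continue(void)
{
    int status;
    int result = -1;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause();              /* child: sleep until a signal arrives */
    }

    if (stop_process(child) == 0 &&
        waitpid(child, &status, WUNTRACED) == child && WIFSTOPPED(status) &&
        continue_process(child) == 0 &&
        waitpid(child, &status, WCONTINUED) == child && WIFCONTINUED(status))
        result = 0;

    kill(child, SIGKILL);         /* demo clean-up only */
    waitpid(child, &status, 0);
    return result;
}
```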

3.3 How should I implement a time slice?

For some policies, you will need to run a process for an amount of time, stop it, and then start another one running, let it run for a while, stop it, and so forth. To do this, you will need to have the ability to implement a "time slice" (also called a "time quantum"). How should the OS do this? Well, we'll use signals again to get the job done. Specifically, we'll use the ualarm() call. When you call ualarm(), what you are doing is telling the real OS, Solaris, to send you a signal at some time in the future. Specifically, Solaris will send you the SIGALRM signal. What you need to do is to set up a handler to catch this signal, and inside of that handler, implement your "context switch" code, that is, the code that stops the currently running process, and starts the next one running, depending on the policy in use.

You must be careful when receiving this or any signal. Remember that a signal can be received at any time. If the function that handles the signal needs to manipulate some data, you need to make sure that some other function was not in the middle of manipulating this data when the signal arrived.

For example, assume the OS has received a create process request from some process and is in the middle of manipulating a queue to add this new process. Now the timer interrupt goes off and the SIGALRM signal is sent to the OS. This would mean that the function that does scheduling would get invoked. This could be bad news if this function also manipulates the same queue (which is now in an inconsistent state from the process create operation). To prevent this from happening, you will need to mask the interrupt signal while you are manipulating the runnable queue (or any other data the signal handler uses). You can think of this process as quite similar to the OS turning off interrupts in order to provide mutual exclusion.

You can find a short snippet of code that describes how this masking works.
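One possible shape for such masking code, using sigprocmask(), is sketched here. It is not the course's own snippet. All function names are hypothetical, and `sample_queue_update()` merely stands in for real queue manipulation by raising SIGALRM in the middle of the "update" to imitate the timer going off.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations; ignored elsewhere */
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t alarms_handled = 0;
static int handled_during_update = -1;

static void on_alarm(int sig)
{
    (void)sig;
    alarms_handled++;             /* a real handler would do the context switch */
}

int install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Stand-in for a run-queue update that is interrupted by the timer. */
void sample_queue_update(void)
{
    raise(SIGALRM);                         /* "timer" fires mid-update */
    handled_during_update = alarms_handled; /* still 0: signal is only pending */
}

/* Run update() with SIGALRM masked, then restore the previous mask. */
int with_alarm_masked(void (*update)(void))
{
    sigset_t block, old;

    assert(update != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;
    update();
    /* Unmask: a pending SIGALRM is delivered before this call returns. */
    return sigprocmask(SIG_SETMASK, &old, NULL);
}
```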

Of course, your code can be done differently from that and there will probably be more to it, but that is the basic idea. If the timer goes off while the SIGALRM signal is masked, your process will delay handling the signal until it is unmasked. Recall from the previous section that if a signal is masked, it does not mean it is lost. As soon as you unmask it (the last line in the code above), the pending SIGALRM signal will be delivered and the handler for it will be run.

3.4 Which scheduling policies should be implemented?

For this project, you will have to implement two different scheduling policies. They are:

FIFO policy:

Round-robin policy,

3.5 How exactly should the OS implement a "context switch"?

Inside the scheduler, you will use an alarm to implement a context-switch, which is needed for the round-robin and lottery policies. Thus, when running the OS with a FIFO scheduler, there is no need to use the alarm feature, and for the rest of this discussion, we will assume we are talking about round-robin.

So what should the signal handler for SIGALRM do? Well, it should schedule another process to run (if there is another process available to run). When running as a round-robin scheduler, you will need to set the timer and then catch the signal generated when this timer expires. To set the timer, you will use the ualarm() function. The prototype for this function is as follows:

useconds_t ualarm(useconds_t useconds, useconds_t interval);

The useconds_t type is simply an unsigned integer. The useconds argument indicates how many microseconds the timer should be set to. The timer then ticks backwards until it reaches zero. At this point, it sends the SIGALRM signal to your process (which you will need to catch). You should be able to always set the second argument, interval, to zero. For your own knowledge, though, this argument would be set if you wanted the timer to go off at set intervals after the first useconds have passed. Setting it to zero simply means that the timer will only go off once - after useconds. If you want the timer to go off again, you will have to make a second call to ualarm().

To make this work, you will have to write a function to deal with the timer signal. You will then have to register this function with Solaris (as described above) so that it gets invoked whenever there is a timer interrupt. Inside this function (or other functions that it calls) you will need to stop the current running process and start a new one running. To stop a process, simply send it the SIGSTOP signal. To restart it, send it the SIGCONT signal. You do not need to define a special signal handler for either of these two signals - the defaults provided by Unix will work just fine. To send a signal to a process, use the sigsend() system call (check it out in the man pages). Another option is to use the kill() system call.
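A minimal sketch of arming a one-shot quantum timer with ualarm() and catching SIGALRM follows. The function names are hypothetical, the handler only sets a flag where a real one would perform the context switch, and `wait_for_quantum_end()` is just a stand-in for whatever the OS main loop does while a process runs. The quantum is converted from milliseconds (as given on the command line) to the microseconds ualarm() expects; the limit of under one second is an assumption based on the restriction some systems place on ualarm().

```c
#define _DEFAULT_SOURCE   /* glibc: expose ualarm() and POSIX declarations */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t quantum_expired = 0;

static void on_quantum_end(int sig)
{
    (void)sig;
    quantum_expired = 1;   /* real handler: SIGSTOP current, SIGCONT next, re-arm */
}

/* Install the handler and arm a one-shot timer. Returns 0 on success, -1 on error. */
int start_quantum(unsigned int quantum_ms)
{
    struct sigaction sa;

    assert(quantum_ms > 0 && quantum_ms < 1000);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_quantum_end;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    quantum_expired = 0;
    ualarm((useconds_t)quantum_ms * 1000, 0);   /* interval 0: fire once */
    return 0;
}

/* Sleep until the handler has run, without missing an early signal. */
void wait_for_quantum_end(void)
{
    sigset_t block, old;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old);
    while (!quantum_expired)
        sigsuspend(&old);            /* atomically unmask SIGALRM and wait */
    sigprocmask(SIG_SETMASK, &old, NULL);
}
```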

3.6 What data structures will the OS need to implement scheduling?

How does the OS know which process is next to run? All you have to do here is keep a data structure that allows you to cycle through all processes, picking the next one to run. An array may be a possibility, or a circular queue. Any time a new process is created, add it to the data structure. Any time a process gets stopped by the context-switch code (for round-robin), or completes (for any policy), you should just move to the next process in the data structure and start it running. Finally, whenever a process terminates, you should remove it from the data structure, and make sure to start the next process running.
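One way such a structure might look is an array used as a circular queue. The names (`struct run_queue`, `rq_*`) and the MAX_PROCS limit are hypothetical; this is a sketch of the idea, not a required design.

```c
#include <assert.h>
#include <sys/types.h>

#define MAX_PROCS 64   /* hypothetical upper bound on attached processes */

struct run_queue {
    pid_t pids[MAX_PROCS];
    int   count;
    int   current;     /* index of the running process, -1 if none */
};

void rq_init(struct run_queue *q) { q->count = 0; q->current = -1; }

pid_t rq_current(const struct run_queue *q)
{
    return q->current < 0 ? (pid_t)-1 : q->pids[q->current];
}

/* Add a newly created process. Returns 0, or -1 if the queue is full. */
int rq_add(struct run_queue *q, pid_t pid)
{
    if (q->count == MAX_PROCS)
        return -1;
    q->pids[q->count++] = pid;
    if (q->current == -1)
        q->current = 0;
    return 0;
}

/* Advance to the next process in round-robin order and return its PID. */
pid_t rq_next(struct run_queue *q)
{
    if (q->count == 0)
        return (pid_t)-1;
    q->current = (q->current + 1) % q->count;
    return q->pids[q->current];
}

/* Remove a terminated process; if it was running, its successor becomes current. */
int rq_remove(struct run_queue *q, pid_t pid)
{
    int i, j;

    for (i = 0; i < q->count && q->pids[i] != pid; i++)
        ;
    if (i == q->count)
        return -1;                       /* not in the queue */
    for (j = i; j < q->count - 1; j++)
        q->pids[j] = q->pids[j + 1];
    q->count--;
    if (q->count == 0)
        q->current = -1;
    else if (i < q->current)
        q->current--;                    /* keep pointing at the same process */
    else if (q->current >= q->count)
        q->current = 0;                  /* removed the last slot: wrap around */
    return 0;
}
```

Remember that the SIGALRM handler would also touch this structure, so updates made outside the handler need the signal masked.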

3.7 Is there any extra credit available in the scheduling part of the assignment?

For extra credit, implement a lottery scheduling policy. Each process that attaches to your OS will get scheduled for the time quantum based on a lottery, as described in class. To implement a lottery, each process needs to be assigned some number of tickets. In a real system, this would be something that is carefully controlled by the administrator, but for the sake of this part of the project, just add a new interface to the LibOS, called Sched_SetTickets(). With this routine, a process can specify how many tickets it has to the OS. By default, assume that each process has one ticket. More details on the interface specification can be found below.
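The core of a lottery draw might be sketched like this. The struct and function names are hypothetical, and rand() is used only for illustration; how the OS stores ticket counts and generates random numbers is up to you.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

struct lottery_entry {
    pid_t pid;
    int   tickets;    /* 1 by default, changed via Sched_SetTickets() */
};

int total_tickets(const struct lottery_entry *procs, int n)
{
    int i, total = 0;
    for (i = 0; i < n; i++)
        total += procs[i].tickets;
    return total;
}

/* Return the PID holding ticket number `winning` (0-based), or -1 if out of range. */
pid_t lottery_winner(const struct lottery_entry *procs, int n, int winning)
{
    int i, seen = 0;

    if (winning < 0)
        return (pid_t)-1;
    for (i = 0; i < n; i++) {
        seen += procs[i].tickets;
        if (winning < seen)
            return procs[i].pid;
    }
    return (pid_t)-1;
}

/* Pick the process to run for the next quantum. */
pid_t lottery_draw(const struct lottery_entry *procs, int n)
{
    int total = total_tickets(procs, n);
    if (total <= 0)
        return (pid_t)-1;
    return lottery_winner(procs, n, rand() % total);
}
```

With tickets of 1, 3, and 2, the second process holds tickets 1 through 3 and so should win about half of all draws.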

4 Memory Management

In the second part of the project, you are going to provide the functionality of a "memory-mapped file" to processes that link with your LibOS. A memory-mapped file is just a file that gets mapped into a part of the address space of a process. Once mapped, you can access the file's contents with pointers, instead of using read() and write(). It's a pretty neat idea, and we'll provide some examples of how to use such a feature. In any case, it's what you'll be implementing in your OS.

For this project, any file that is mapped into the address space of a process will be "demand paged"; that is, instead of bringing the entire file into the process address space at once, pages of the file will be brought in one at a time, as the process accesses them. How will this "page fault" be implemented? Again, with signals. Specifically, your LibOS will set up a memory mapping such that any access to a page will cause a fault to occur; by catching a "segmentation violation", the LibOS will be able to tell that a page was accessed. At that point, the LibOS will send a message to the OS to get the page, and get a message back with the data. The handler should make sure the page is now valid, and then copy the data into the right spot in the address space. At that point, the process should be able to access the page and get the data.

For the standard project (not including extra credit), you will only have to worry about files that are accessed "read only"; that is, a file can be mapped into the address space of a process, and the process can use pointers to read the data, but not to update it. In the extra credit portion of the project (described below), you would add the ability to write the data (and all of the details that ensue from that).

Unfortunately, it's not that simple. At some point, the OS will decide that too many pages are actively in use, and will have to reclaim some. For example, if the OS is emulating a system with N pages of physical memory and a process asks for what would be the N+1'th page, the OS will first have to reclaim one from some process. To decide which page to reclaim, the OS will use the Clock algorithm, as described in class, but with something called "software emulation of the reference bit." The traditional clock algorithm assumes that the hardware sets a reference bit for you; unfortunately, this is not always available (as in this project). Thus, instead, your OS will periodically tell all processes to change their mappings such that subsequent accesses will again induce faults; the OS will use that information to determine which pages are under heavy utilization, and which aren't. More details on all of this are found below.

4.1 Are there any new interfaces in LibOS that are relevant to memory management?

New interface: void *File_Mmap(char *file, int *size). The only new interface in LibOS to support memory-mapped files is the File_Mmap() interface. It takes two arguments, the name of the file to be mapped and a pointer to an integer. It returns a pointer to where the file got mapped into the address space, and sets the integer to the size of the named file. The process should then be able to access the contents of the file through the pointer that was returned. More details on this interface are available below.

4.2 What should File_Mmap() do?

File_Mmap() has to do a bunch of things. First, it has to make space in the process address space for the data that will be read in from disk (through the OS). To do this, it needs to know how big the file is. Thus, a message must be sent to the OS, which gets the file size and then returns it to the LibOS. Once the size is known, the LibOS must create a space in the process's address space where the file will be placed. This space is created with the mmap() system call, which is described in more detail below. The basic idea is simple: the LibOS just asks Solaris for a region of memory in the process address space that is big enough to contain the file.

After this memory has been allocated, the LibOS has to make sure that when the process tries to access the memory, it will generate a fault of some kind that the LibOS can catch. The LibOS accomplishes this by using the mprotect() system call, which is also described more below. The mprotect() call allows you to change the protection bits of pages within the process address space; in this case, the protection bits should be changed to PROT_NONE, which means that any access (load or store) will cause a segmentation violation. Thus, the LibOS must make sure to set up a signal handler to catch the SIGSEGV signal.

After the mapping is complete, the LibOS needs to inform the OS of where the mapping is in the address space of the process, and which file is mapped to that location. The OS will need to track these things in order to handle a page fault when one occurs.
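The allocate-then-protect steps could look roughly like the sketch below. The function names are hypothetical, and the use of MAP_ANON to obtain anonymous memory is an assumption: some older systems lack it and map /dev/zero instead, so check the mmap() man page on your machine.

```c
#define _DEFAULT_SOURCE   /* glibc: expose MAP_ANON; ignored elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a whole number of pages. */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return (size + page_size - 1) / page_size * page_size;
}

/* Reserve enough address space for a file of file_size bytes and make every
 * page inaccessible, so the first touch of each page raises SIGSEGV.
 * Returns the start of the region, or NULL on failure. */
void *reserve_region(size_t file_size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = round_up_to_page(file_size, page);
    void *p;

    if (len == 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mprotect(p, len, PROT_NONE) == -1) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```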

4.3 What happens on a page fault?

At this point, after your process has called File_Mmap(), it should get back a pointer to the first page of the file. It may then access the file by dereferencing that pointer. For example, if the process thinks the first four bytes of the file contain an integer, it could access that integer like this:

int size;
void *vptr = File_Mmap("/tmp/file", &size);
if (vptr == NULL) {
    // error
}
int *iptr = (int *) vptr;
printf("First integer of the file is: %d\n", *iptr);

If you have set things up correctly within the File_Mmap() call, this access will trigger a segmentation violation, and the segmentation fault handler within the LibOS will get called. In that handler, you will have to do a couple of things. First, you'll have to figure out at what address the fault was generated (how to do this is described in an example below). Then, you'll have to send a message to the OS, telling it what address the fault occurred at. The OS will have to be able to take this address, look it up in a "page table" for this process, figure out if it's a legal reference (or out of bounds), and, if it's a legal reference, read the data from disk (one page worth) and send the data back to the LibOS.

The LibOS will be waiting for the data, and once received, it should copy it into the correct spot in the process address space, and change the mapping with mprotect() to PROT_READ so that the process can read the data from the page. When LibOS returns from the handler, the instruction that generated the fault will run again, and it should be able to access the data at this point.
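The fault path can be sketched in a self-contained way as follows. Everything here is illustrative: the names are hypothetical, `fetch_page()` stands in for the message exchange with the OS by filling the page with a letter, and MAP_ANON is assumed to be available. The faulting address comes from the `si_addr` field of the `siginfo_t` passed to a handler registered with SA_SIGINFO. Calling mprotect() from a signal handler works in practice on common systems, but it is not on the list of functions POSIX guarantees to be async-signal-safe.

```c
#define _DEFAULT_SOURCE   /* glibc: expose MAP_ANON; ignored elsewhere */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char  *region;       /* start of the demand-paged region */
static size_t region_len;
static size_t page_size;
static volatile sig_atomic_t fault_count = 0;

/* Stand-in for "ask the OS for this page": fill it with 'A', 'B', 'C', ... */
static void fetch_page(size_t index, char *dest)
{
    memset(dest, 'A' + (int)(index % 26), page_size);
}

static void on_segv(int sig, siginfo_t *info, void *context)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;
    size_t index;
    char *page;

    (void)sig; (void)context;
    if (region == NULL || addr < base || addr >= base + region_len)
        _exit(1);                          /* a genuine segmentation fault */

    index = (addr - base) / page_size;
    page = region + index * page_size;
    fault_count++;
    mprotect(page, page_size, PROT_READ | PROT_WRITE);  /* writable for the copy */
    fetch_page(index, page);
    mprotect(page, page_size, PROT_READ);  /* read-only once the data is in place */
}

/* Map `pages` inaccessible pages and register the SIGSEGV handler. */
int setup_demand_region(size_t pages)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_len = pages * page_size;
    region = mmap(NULL, region_len, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (region == MAP_FAILED) {
        region = NULL;
        return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Read one byte; the first touch of each page goes through on_segv(). */
char read_byte(size_t offset)
{
    return *(volatile char *)(region + offset);
}
```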

4.4 How should the OS handle a fault (without replacement)?

When the OS gets a page fault from a process, it has to do a number of things. First, the OS has to find the page table for the process that generated the fault, which should be easy since the LibOS should send the PID of the process along with the fault message. Second, the OS must look up the address within the page table to find information about the specific page. Specifically, the OS has to check if the address was a valid mapping, that is, to a part of a file that has been mapped with File_Mmap(). If not, the OS has to tell the LibOS there was an error, and the LibOS should just print out a message and exit, thus terminating the process.

If it is a legal mapping, the OS must do a number of additional steps. First, the OS should find a "free physical page" and allocate it to this process, and record that mapping in the page table. If there is no free page, a replacement will have to take place, as described in the next section. Assuming a free page is found, the OS will then update its page table to record that this page is mapped for read-only access and is in memory, set the reference bit for the page to 1, and figure out which page of the on-disk file must be read in order to fulfill that request. The OS should then read the page, and send it back to the LibOS, which will put it in place and let the process continue running.

4.5 How should the OS handle a fault (with replacement)?

The hardest part of the memory-management part of the project will be to implement the replacement code. Replacement occurs when the processes have used up all of "physical memory", and thus the OS needs to kick a page out before allocating a page to the current page-fault request. To do this, the OS has to run the Clock algorithm. Roughly, this is how the Clock algorithm works:

1. Check the reference bit of the page currently being pointed to by the "hand". The hand should be pointing to the first page after the page the last search replaced.
2. If the reference bit is zero, choose this page for replacement and make the hand point to the next page in memory. This is where the next search will start.
3. If the reference bit is one, set it to zero and go on to the next page.
4. Repeat steps 1, 2, and 3 until a zero reference page is found (this is guaranteed to terminate because you are setting the reference bits to zero as you go).
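The Clock steps listed above can be sketched over an array of frames. The `struct frame` layout and the function name are hypothetical, and the sketch assumes every frame is currently in use (otherwise a free frame would have been handed out without any replacement).

```c
#include <assert.h>

struct frame {
    int referenced;   /* software-emulated reference bit */
};

/* Choose a victim frame among n frames, starting the search at *hand.
 * On return, *hand points just past the victim, where the next search starts. */
int clock_select_victim(struct frame *frames, int n, int *hand)
{
    assert(n > 0 && *hand >= 0 && *hand < n);
    for (;;) {
        int i = *hand;
        *hand = (*hand + 1) % n;         /* move the hand to the next frame */
        if (frames[i].referenced == 0)
            return i;                    /* unreferenced: replace this one */
        frames[i].referenced = 0;        /* referenced: clear it, keep going */
    }
}
```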

Unfortunately, the Clock algorithm doesn't quite work without hardware support. Specifically, the reference bit needs to be set every time the process touches the page, which cannot be done without hardware support. We could implement straightforward Clock without hardware support, but imagine what would happen over time. Each process would have a number of pages mapped, and the OS would never really have a good idea of which one to replace.

What are we to do? Fortunately, someone once had the idea that we can use software techniques to emulate the reference bit. Specifically, the OS can periodically tell all LibOS's to invalidate their page mappings (say, by changing their protection so that a fault will be generated upon next access). By doing this, the process will generate a fault upon the next access to the page, and the OS can use this information to set the reference bit to one. Note that the OS does not have to re-send the page to the LibOS in this case, as the page is already there. However, it has to be able to distinguish between this case and the first-time demand fault as described above.

4.6 How should the OS tell a process that one of its pages has been reclaimed?

For the OS to reclaim a page from a process, it has to let the LibOS know which page to unmap. To do this, the OS should send a signal to the process. When the LibOS catches that signal, it should immediately perform a Domain_Read() to wait for further instructions from the OS. Of course, the OS had better send a message at that point, which tells the LibOS exactly what to do.

4.7 How should the OS tell a process to change the protection status of all of its pages?

In the exact same way as the previous question, except the OS will send a different message!

4.8 How often should the OS turn off protection so as to implement software-emulated reference bits?

In a real OS, this would be a "voo-doo constant" that the implementors would decide, say once every 10 milliseconds. However, for testing purposes, you should only clear the reference bits of all mapped pages when the OS receives a special message from one of the processes; this message is sent when a testing interface of the LibOS is called, known as OS_ResetReferenceBits().

4.9 How should the OS read the data from disk?

When the OS wants to read a page of an on-disk file to fulfill a page fault request, it can use the simple open(), lseek(), read(), and close() interfaces, as shown in this example code. Important: do NOT assume that all files are a multiple of a page in size! That means you have to be able to read in less than a page full of bytes for the last page of the mmap'd region.
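A sketch of reading one page with open(), lseek(), read(), and close() follows, including the short last page. The function name and signature are hypothetical; zero-filling the unused tail of the buffer is a design choice made here, not something the project requires.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations; ignored elsewhere */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read page number page_index of the file at path into buf, which must hold
 * page_size bytes. Returns the number of bytes actually read (less than
 * page_size for a short last page, 0 past end of file), or -1 on error.
 * Any unused tail of buf is zero-filled. */
ssize_t read_file_page(const char *path, size_t page_index,
                       size_t page_size, char *buf)
{
    ssize_t total = 0;
    int fd;

    assert(path != NULL && buf != NULL && page_size > 0);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (lseek(fd, (off_t)(page_index * page_size), SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    while ((size_t)total < page_size) {
        ssize_t n = read(fd, buf + total, page_size - (size_t)total);
        if (n == -1) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                       /* end of file: short last page */
        total += n;
    }
    close(fd);
    memset(buf + total, 0, page_size - (size_t)total);
    return total;
}
```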

4.10 What data structures will the OS need to implement memory management?

To implement memory management within the OS, the OS is going to need a few data structures and some state information. How you do this is completely up to you, but here are a few suggestions (you do not have to follow these if you don't want to).

Have a process descriptor structure. This can store all the information the OS needs to know about a specific process. In particular, the process ID, a pointer to the process's memory map structure, the socket address of the process (needed for sending messages to the process), etc.
A memory map structure. This should contain an entry per file that is mapped by a process. For each file that is mapped, the name of the file, the range of addresses it covers (a start address and length), and a pointer to a page-table structure could be included.
A page table structure. For each file that is mapped, a page table will be needed. This can just be an array of page table entries, each of which tells you which "physical page" got allocated to this virtual page, and maybe stores some protection information about each page.
You may want some structure that represents all of the frames in memory. This can make life much simpler and faster when trying to find a page in memory to replace. It will prevent you from having to search through all of the various processes' page tables looking for valid pages that haven't been referenced.

4.11 Is there any extra credit for the memory-management part of the project?

For extra credit, modify your OS to also allow for writes. Thus, as a page gets modified by a process, the OS will have to record that the page is "dirty"; if the page gets "paged out", the process should send it back to the OS, and the OS should write it to disk, thus updating the file. However, note that the process may have dirty pages at the end of its run, which never got forced out to disk. In this case, the OS_Fini() routine should make sure that all dirty pages have been flushed to the OS, and the OS again should flush those pages to disk.

If you are doing this part of the project, please first talk to me about it.

5 LibOS Interface

Like project 2, this project will require you to create a shared library called libOS.so. You will do this in the same manner you did for project 2. However, the code in LibOS.c will be much different. It still contains the osErrno variable for defining errors, but you will now implement the following functions:

int OS_Init(char* file)
Create a socket for the process and obtain the address of the OS socket. Any other initialization you need to do for your library should also be done in here. What is different from the last project is that OS_Init() should communicate with the OS, in particular telling the OS its process ID (PID). The PID of a process can be obtained with the getpid() system call. If this call fails for any reason, simply return a -1 (osErrno set to E_GENERAL). If the call is successful, return 0.

void *File_Mmap(char* file, int *size)
This function will allocate space in the process's address space for the file, and demand page the file into the mapped region by contacting the OS upon a fault. The function should map enough bytes so that the entire file can be accommodated. Upon success, File_Mmap() should set the size parameter to the size of the file, and return a pointer to the beginning of the mapped region. If for some reason mmap fails, File_Mmap() should return NULL and set osErrno to E_MEM_ALLOC. If it should fail for any other reason it should return NULL and set osErrno to E_GENERAL.

int OS_ResetReferenceBits()
When OS_ResetReferenceBits() is called, the LibOS should send a message to the OS telling it so. The OS will take this as a command to tell all processes to mprotect() their mapped file regions, in order to get a better idea of which pages are truly being accessed. This is required for testing only (a real LibOS wouldn't provide such an interface). OS_ResetReferenceBits() should return 0 upon success, and -1 upon any failure, with osErrno set to E_GENERAL.

int Sched_SetTickets(int tickets)
This routine is used to inform the OS of how many tickets a process has, when using lottery scheduling (extra credit). Sched_SetTickets() should return 0 upon success, and -1 upon any failure, with osErrno set to E_GENERAL.

int OS_Fini()
Send a message to the OS indicating this process has terminated and would like to be removed from the list of active processes. If this call fails because the process is not currently in the list of active processes, return -1 and set osErrno to E_NO_SUCH_PROC. Otherwise return 0. The OS should take this message and make sure to clean up all state associated with that process.

You will also be required to create a second file called Handlers.c. This file will be linked with your library. LibOS will use the signal handlers defined in Handlers.c (which you implement) in order to catch various signals sent by the OS. This file will contain the following functions:

void RemovePage_Handler(int sig);
This should be registered by the LibOS as the signal handler for dealing with the 'RemoveSignal' signal. 'RemoveSignal' is defined inside Handlers.h and is actually SIGUSR1. The OS generates this signal either because some page belonging to this process has been revoked and given to another process, or because software emulation of the reference bit is occurring. When a process receives this signal, it should call Domain_Read() to await further instruction from the OS. The OS should then send a message telling the LibOS what to do.

void SegFault_Handler(int sig, siginfo_t* sigInfo, void* context);
This should be registered by the LibOS as the signal handler for dealing with the 'FaultSignal' signal (actually SIGSEGV as defined in Handlers.h). When a process receives this signal it should send a message to the operating system notifying it of the signal and telling it what address (or page) caused the segmentation fault. The OS will examine its page tables to determine why the fault occurred and will then send back a message to indicate what the fault was all about. There are three possibilities:

The process has accessed a piece of memory that is in its address space, is considered in physical memory by the OS, but has its reference bit cleared (an emulation fault). If this is the case, the OS only needs to know about this signal so it can mark the page as referenced. The process should then use the mprotect() function to make this page readable.
The process has accessed a piece of memory that is in its address space but is not considered to be in physical memory by the OS (a page fault). In this case, the OS should handle the fault as described in the memory management section, eventually sending the page to the LibOS.
The process has accessed a piece of memory outside its address space. In other words, this is a real segmentation fault (remember, segmentation faults that occur because you try to actually reference an invalid piece of memory also get sent to this handler). If this is the case, you can either catch this error inside Handlers.c or have the OS catch it and tell you about it. Either way, your program should print out an error message and terminate.

Any program you write that needs to use libOS.so should be linked to it just as in project 2. You should also compile Handlers.c into Handlers.o and link it into libOS.so.
As for LibOS, you are just creating a library, which should be a file called libOS.so. This library has a pre-defined set of interfaces, as defined above. As before, libraries don't have a main routine, so you will have to create your own test programs in order to test if your library (and your OS) are working correctly.

Important Note: As before, for testing, we will be linking our own programs with your library. Thus, it is very important that you don't change any of the interfaces specified for the LibOS!

6 The OS Interface

You should make the OS runnable as follows:



prompt> ./os -f filename -n numPages -p policy [-q quantum]

The -f filename flag is required, and passes to the OS the name of the file that it will bind to; other processes will use this name to direct their messages to the OS process. The -n numPages flag tells the OS how many physical pages it has. The -p policy flag specifies the scheduling policy that the OS should use, and there are three options: fifo, rr, and optionally, lottery. If the policy is round-robin, we need a time quantum, so we use an optional flag -q quantum, which specifies the quantum duration in milliseconds. Of course, bad file names, negative numPages, bad policies, and bad quanta should all be rejected, and an appropriate message printed out.

One thing you may notice: this command line parsing is getting to be a pain! To help ease your burden, we provide a nice piece of example code using getopt(), a simple and effective argument parser.

Click on getopt.c to get the code.

7 Program Design and Implementation

Before writing a single line of code, both partners should sit down together and design the entire system. This is so important that it will be repeated, this time in bold. Before writing a single line of code, both partners should sit down together and design the entire system.

Now that we have that out of the way, here are some suggestions on how to approach this.

Read the entire project description at least twice.

Meet with your partner at least once to come up with some preliminary designs. At this point, both partners should be looking at how all the major components should work and fit together. This should be from a very high level. Don't worry about the tiny details here. Some people feel that doing a flow chart at this stage (showing how the different components are to interact) is very helpful in doing the next step.
Decide on which partner should do the detailed designs for which parts. Some good suggestions at this point are for one person to analyze how the scheduling mechanisms will work and for another to look at the details in the paging mechanisms. There is a little cross-over here so make sure you talk to your partner regularly to stay on the same page with each other.
Complete the detailed design. Both partners should have a written description of all the functions they see being needed. This description should also include pseudo-code describing what each function should do. Remember, no code has been written at this point - you are designing the system first! When both people have their halves done, the partners should sit down and review each other's work, make suggestions, corrections, modifications, etc. Of course, if you've been in touch with each other throughout, this part goes much more smoothly - there are few surprises in store for your partner.
Start coding. First, write small pieces of code that get you familiar with the new functionality. Then, slowly move towards the desired end-goal, testing as you go.

DO NOT WRITE ALL OF THE CODE WITHOUT TESTING AS YOU GO.

Put it all together. If you've really taken the time before this to design a good system, you will be amazed at how easily this step goes. Of course you will run into a few snags but they should be few and quickly dispatched. Believe it or not, many professional software developers will tell you that coding is the easiest stage in any software development. The only reason for that, however, is because they do their homework up front designing the system.
Hand it in. Have a beer (if you're of the legal age limit, of course).

There are three major points about the above strategy. Number one, stay in touch with your partner. Do not divide the work and then speak to each other only the day before the due date. Stay in touch! Number two, work hard to develop a good design before writing any code. It is very hard to over-emphasize this point (although I'm trying hard). And lastly, get started early. Don't wait until the last week. We give you three weeks to do these projects for a reason - they take that long.

8 Help with signals, mmap(), and mprotect()

8.1 Help with Signals

Signals provide a way to deliver a notification of an event to a process, in a way quite similar to the hardware raising an interrupt for the OS (in fact, many signals originate from hardware interrupts). Whenever some major event happens - say a segmentation fault, a timer interrupt, a keyboard interrupt (control-C), and so forth - the operating system sends the relevant process a signal. The program has three different options when dealing with signals. These are:

Ignore the signal. Some signals (SIGKILL for example) cannot be ignored.
Let the operating system's default handler deal with it. This is what happens most of the time. For example, on a SIGSEGV (segmentation fault) signal the default handler terminates the offending process.
Define a specific function to run when the signal arrives. This is what you will do for this project for several signals (including SIGSEGV). Now, whenever a signal arrives that has a handler, whatever the process is doing is stopped and the signal handling function is run. If the handling function finishes and returns, the process continues its execution from where it was interrupted. There are several signals that don't quite work this way but you do not need to worry about them for this project.

For a complete list of all the different signals, you can simply type /usr/bin/man -s 3head signal inside the shell on a nova machine.

To define a function as a signal handler, you need to write the function and then register it. Depending on how much information you want to pass the interrupt handling routine you have two options for how to declare a handling function and how to register it with the operating system. These two methods are described below:

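Method 1: the handler receives only the signal number.

Handling function prototype:
    void functionName(int sig);
Registration:
    struct sigaction action;
    sigemptyset(&action.sa_mask);   /* sa_mask must be initialized before use */
    action.sa_handler = functionName;
    action.sa_flags = SA_RESTART | SA_RESETHAND;
    if (sigaction(SIGNAL, &action, NULL) < 0) { error(); }

Method 2: the handler also receives a siginfo_t structure and a context pointer.

Handling function prototype:
    void functionName(int sig, siginfo_t* sigInfo, void* ucontext);
Registration:
    struct sigaction action;
    sigemptyset(&action.sa_mask);   /* sa_mask must be initialized before use */
    action.sa_sigaction = functionName;
    action.sa_flags = SA_RESTART | SA_SIGINFO | SA_RESETHAND;
    if (sigaction(SIGNAL, &action, NULL) < 0) { error(); }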

You will need to replace functionName with the actual name of your handling function. You will also need to replace SIGNAL with whatever signal you are trying to catch. The big difference between the two is that in the first case your function accepts only a single integer, the signal number, while in the second case it must accept three arguments. The first form is registered through the sa_handler field of the sigaction structure; the second uses the sa_sigaction field together with the SA_SIGINFO flag. If you are using the siginfo_t structure in your function, there are several fields of interest. These are:

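pid_t si_pid: the process id of the process that sent the signal (meaningful when the signal was sent by a process, for example with kill()).
void* si_addr: the address associated with the fault. For SIGSEGV, this is the memory address whose access caused the segmentation fault (for SIGILL and SIGFPE, it is the address of the faulting instruction). This is only valid if the signal was caused by some instruction - a memory access instruction that caused a segmentation fault, for example.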

There are many more fields in this structure and you can look them up if you want.
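
As a rough illustration of the three-argument form, the sketch below installs a handler for SIGUSR1 and records the si_pid field when the process signals itself. The names on_signal, install_info_handler, demo_siginfo, and the two static variables are made up for this example. SIGUSR1 is used only because it is easy to generate on demand; the project's own handlers would be registered for other signals.

```c
#define _DEFAULT_SOURCE   /* glibc: expose sigaction and kill under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t lastSignal = 0;
static volatile pid_t lastSender = 0;

/* Three-argument handler: records which signal arrived and who sent it. */
static void on_signal(int sig, siginfo_t *sigInfo, void *ucontext)
{
    (void)ucontext;
    lastSignal = sig;
    lastSender = sigInfo->si_pid;
}

/* Register on_signal for signo; returns 0 on success, -1 on failure. */
int install_info_handler(int signo)
{
    struct sigaction action;

    memset(&action, 0, sizeof action);
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = on_signal;
    action.sa_flags = SA_RESTART | SA_SIGINFO | SA_RESETHAND;
    return sigaction(signo, &action, NULL);
}

/* Send ourselves SIGUSR1 and check what the handler saw.
 * Returns 1 if si_pid matched our own pid, 0 if not, -1 on error. */
int demo_siginfo(void)
{
    if (install_info_handler(SIGUSR1) < 0)
        return -1;
    if (kill(getpid(), SIGUSR1) < 0)
        return -1;
    /* The unblocked signal is delivered before kill() returns. */
    assert(lastSignal == SIGUSR1);
    return lastSender == getpid();
}
```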

Whenever a signal handler gets invoked, you need to reregister it. The reason is that, because the handlers above are registered with the SA_RESETHAND flag, Solaris automatically resets the signal's disposition to the default as soon as the handler is invoked. Hence, all of your signal handlers will have code similar to the following in them (this handler is for a SIGSEGV signal):

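#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

void foo(int sig) {
    struct sigaction action;

    // reregister handler (the disposition was reset to the default on entry)
    sigemptyset(&action.sa_mask);   // sa_mask must be initialized before use
    action.sa_handler = foo;
    action.sa_flags = SA_RESTART | SA_RESETHAND;
    if (sigaction(SIGSEGV, &action, NULL) < 0) {
        perror("registering seg fault handler");
        exit(1);
    }

    // do the actual handling of the signal now
}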

Of course, if your function requires the information provided by the siginfo_t structure, your handling declaration and registration is going to look slightly different but the concept is exactly the same.

There is an important thing to realize about signals - they can happen at any time. This means your program can be in the middle of manipulating some data structure when the signal arrives. Your process will stop what it is doing and jump to the signal handler. If your signal handler is also going to manipulate that data structure, you will most likely have problems. The reason is that the data structure is in an inconsistent state. You have no idea where the last function was when the signal occurred (this should sound eerily similar to having multiple threads manipulating the same data). To prevent this case from happening, Unix provides the programmer a means of masking signals. When a signal is masked, the signal is blocked until it becomes unmasked. The signal is not ignored, it is just blocked from delivery. As soon as you unmask the signal, it will get delivered to the process - no matter how long it has been waiting (blocked).

To mask signals you will want to use the sigprocmask function as well as several other functions. Their prototypes are defined as follows:

#include <signal.h>

int sigprocmask(int how, const sigset_t* set, sigset_t* oldset);
int sigemptyset(sigset_t* set);
int sigfillset(sigset_t* set);
int sigaddset(sigset_t* set, int sig);
int sigdelset(sigset_t* set, int sig);
int sigismember(const sigset_t* set, int sig);

In some sections below, you will see examples of how to use some of these functions. For more details, check out the man pages or look on-line. The basic purpose of these is to prevent the delivery of a signal until some future time. Be careful, though, and don't forget to unblock a signal if you want it to be delivered at a later time.
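
A minimal sketch of the block, work, unblock pattern is shown here. It uses SIGUSR1 and a counting handler purely for illustration, and the names count_signal, demo_masking, and delivered are hypothetical. It also calls sigpending(), a standard function not listed above, to show that the blocked signal is waiting rather than lost.

```c
#define _DEFAULT_SOURCE   /* glibc: expose sigaction under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t delivered = 0;

static void count_signal(int sig)
{
    (void)sig;
    delivered++;
}

/* Returns the number of deliveries seen after unblocking, or -1 on error. */
int demo_masking(void)
{
    struct sigaction action;
    sigset_t block, old, pending;

    memset(&action, 0, sizeof action);
    sigemptyset(&action.sa_mask);
    action.sa_handler = count_signal;
    if (sigaction(SIGUSR1, &action, NULL) < 0)
        return -1;

    /* Block SIGUSR1 and remember the previous mask. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    /* Critical section: shared data structures may be modified safely here. */
    raise(SIGUSR1);
    assert(delivered == 0);                       /* blocked, so not delivered yet */
    sigpending(&pending);
    assert(sigismember(&pending, SIGUSR1) == 1);  /* but it is waiting */

    /* Restore the old mask; the pending signal is delivered at this point. */
    if (sigprocmask(SIG_SETMASK, &old, NULL) < 0)
        return -1;
    return delivered;
}
```

Restoring the saved mask with SIG_SETMASK, rather than unconditionally unblocking, keeps the code correct even if the caller had already blocked the signal for its own reasons.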

8.2 Help with mmap() and mprotect()

One of the useful features of Unix is the ability to map a file into memory and then access the file just as if you were accessing memory. To do this, you use the mmap() function. Its prototype is as follows:

#include <sys/mman.h>

void* mmap(void* addr, size_t len, int prot, int flags, int fd, off_t off);

Don't worry, this function is not as daunting as it looks. The fd parameter normally takes the file descriptor of an open file, which means you would have to open the file before calling this function. For this project, however, you will create anonymous mappings that are not backed by any file, so there is no file to open. So here is what the parameters mean:

addr: a pointer to the recommended starting address. For this project you can always make this NULL, which lets Solaris pick where to put the mapping.
len: the number of contiguous bytes you want allocated. This number has to be a multiple of the page size (PAGESIZE), and therefore may be slightly larger than the actual file you are mapping.
prot: access permissions to the memory area. These can be:
- PROT_READ: you can read from the region
- PROT_WRITE: you can write to the region
- PROT_NONE: no access to the region
There are more protection options but these are the major ones, and they can be combined with a bitwise OR (for example, PROT_READ | PROT_WRITE). If you try to access a memory region in a way not permitted (for example, writing to a region that is set for read only) the system will deliver a SIGSEGV signal to your process.
flags: how the memory and file are to relate to one another. For this project you will always use MAP_PRIVATE | MAP_ANONYMOUS.
fd: the file descriptor of the file to mmap. In this project, this will always be -1.
off: the offset into the file at which to start the mapping. For this project you will always make this value 0.

On success, mmap() will return the starting address of a memory region that starts on a page boundary. On failure, mmap() returns MAP_FAILED. You should always check this return value. If you want more information on mmap(), check out the man pages or look on-line. You will find an example of how to use this function in the next subsection.
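
A minimal sketch of such a call, with the parameter values described above, might look like the following. The helper name alloc_pages is hypothetical. The sketch queries the page size with sysconf(_SC_PAGESIZE) rather than the PAGESIZE macro so that it does not depend on the Solaris headers.

```c
#define _DEFAULT_SOURCE   /* glibc: expose MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map numPages anonymous, private pages with the given protection.
 * Returns the page-aligned start of the region, or NULL on failure. */
void *alloc_pages(size_t numPages, int prot)
{
    size_t pageSize = (size_t)sysconf(_SC_PAGESIZE);
    void *region;

    assert(numPages > 0);
    region = mmap(NULL,                        /* let the system pick the address */
                  numPages * pageSize,         /* a multiple of the page size */
                  prot,
                  MAP_PRIVATE | MAP_ANONYMOUS, /* not backed by a file */
                  -1,                          /* no file descriptor */
                  0);                          /* offset */
    return region == MAP_FAILED ? NULL : region;
}
```

A region obtained with alloc_pages(n, PROT_NONE) cannot be touched until its protection is changed, which is the starting point for catching accesses with a SIGSEGV handler.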

The mprotect() Function
The mmap() function allows you to declare an initial protection value for a memory region. The mprotect() function allows you to change this value. It is fairly simple to use. Here is the prototype:

#include <sys/mman.h>

int mprotect(void* addr, size_t len, int prot);

The addr is the starting address of the page you want to protect and the len parameter is the number of bytes to protect. The len parameter should be a multiple of the page size of the system. The prot parameter is the new protection for the region. Again, these are PROT_READ, PROT_WRITE, PROT_NONE, and a few others. On success, mprotect() returns 0. On failure, it returns -1 and sets errno appropriately.

The only tricky part about using this is getting the starting address to be the starting address of a page. Fortunately, this is quite simple. All you have to do is perform a bitwise AND of the address with the page mask (PAGEMASK); in C, this is accomplished with the '&' operator.
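
Putting the pieces together, the sketch below maps one page with PROT_NONE, touches it, and lets a SIGSEGV handler round the faulting address down to its page and unprotect that page. It is an illustration under stated assumptions, not the provided example.c: the names page_start, on_fault, demo_fault, and the static variables are hypothetical, and the mask is computed from sysconf(_SC_PAGESIZE) instead of using PAGEMASK so the code does not depend on the Solaris headers. Note also that mprotect() is not on the official list of async-signal-safe functions, although calling it from a fault handler in this way is a common technique.

```c
#define _DEFAULT_SOURCE   /* glibc: expose MAP_ANONYMOUS and sigaction under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t faultCount = 0;
static size_t pageSize;
static uintptr_t pageMask;   /* plays the role of PAGEMASK: ~(pageSize - 1) */

/* Round an address down to the start of its page. */
static void *page_start(void *addr)
{
    return (void *)((uintptr_t)addr & pageMask);
}

/* On a fault, make the whole faulting page accessible and return,
 * so that the interrupted instruction is retried. */
static void on_fault(int sig, siginfo_t *sigInfo, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    faultCount++;
    if (mprotect(page_start(sigInfo->si_addr), pageSize, PROT_READ | PROT_WRITE) < 0)
        _exit(1);   /* not one of our pages: a real segmentation fault */
}

/* Returns the number of faults taken (expected: 1), or -1 on error. */
int demo_fault(void)
{
    struct sigaction action;
    char *page;
    volatile char *p;

    pageSize = (size_t)sysconf(_SC_PAGESIZE);
    pageMask = ~((uintptr_t)pageSize - 1);

    memset(&action, 0, sizeof action);
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = on_fault;
    action.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &action, NULL) < 0)
        return -1;

    page = mmap(NULL, pageSize, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    p = page;
    p[100] = 'x';             /* faults once; the handler unprotects the page */
    assert(p[100] == 'x');    /* the retried store succeeded */

    signal(SIGSEGV, SIG_DFL); /* restore the default disposition */
    munmap(page, pageSize);
    return faultCount;
}
```

In the project, the handler would contact the OS before calling mprotect(), and the protection it applies would depend on the OS's reply.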

8.3 An Example

A full example of how to use mmap(), mprotect(), and signals is provided in the following file:

example.c

Note that in the example, there are a few useful macros defined in sys/param.h that you will find useful, both in LibOS and your OS. These are:

PAGESIZE: this is the size of the page on your system. For the Solaris operating system the pagesize is 8 KB.
PAGEMASK: this is a mask that can be used to obtain a page's starting address.

9 Provided Materials (Summary)

The following files have been provided here for you. The first four are most definitely required for your project. The example.c program shows how to use signals, mmap(), and mprotect(). To download any of these, simply right mouse click on the file name and select "save as" from the popup menu.

signals, mmap, mprotect

masking snippet

reading a file block

processing arguments with getopt

10 Handing in Your Project

The directory for handing in your program can be found at:

~cs537-1/handin/(username)/p3

where (username) is your login. You only need to put copies of your code into one partner's handin directory.

You should only hand in the files that you created and/or modified. You should probably also include Domain.c, Domain.h, and all of that other stuff that is required so we can just type make and build the entire darn thing. You should also submit the Makefile needed to build your program. Lastly, don't forget to hand in a README file that indicates how to run your program, known bugs, the names of both partners, and any other information that is important to running your program.

11 Grading

No late assignments will be accepted. This project is due on Thursday, April 18 at 11:59 PM.

This assignment will be graded based on correctness of implementation as well as robustness. This means your program should work under all the test cases all the time. Programs that only partially work or fail intermittently will be penalized.

If you do not have a fully functional program, it is your responsibility to be able to quickly and efficiently show which of the above functionality is working properly. For example, to show that you are creating and terminating processes correctly at the OS, you could print out the entire runnable queue every time a new process enters or leaves the system.
