Chapter 1: Internet Basics
In Depth
I'm old enough to remember when making an international telephone call was a big deal. You had to get an operator to help you. Not only that, the operator would probably take your number and call you back once the call was set up.
Once the operator connected you, you'd have a poor-quality line full of noise and echoes. The real agony, however, came with the bill. Today, international calls are simple to make. The global phone network is one of those modern marvels that people don't even notice. However, no matter how easy it is to make a call, there is still one fundamental problem: language. Calling China won't help if you don't share a language with the person on the other end. Even on a domestic call, you probably can't directly talk to a fax machine or modem unless you are even geekier than I am.
The Internet is like a phone network of its own. The underlying network infrastructure allows any two computers to make a connection. However, connecting an IBM mainframe to a PalmPilot requires more than just a connection. The two computers have to agree on the topic of conversation and the format of the data.
Computers agree to communicate through a variety of protocols. Some of these are very familiar to you. For example, HTTP (Hypertext Transfer Protocol) is the protocol that allows Web browsers to fetch Web pages. At a deeper level, low-level protocols govern the flow of raw data over the vast Internet.
What about Java?
Java is especially important to Internet protocols. Why? Because, at least in theory, the same program can run on different types of computers. If you are connecting a PC to a Unix workstation, it makes things simpler if they both are running the exact same program.
Java's "write once, run anywhere" philosophy is ideal for Internet programming. In addition, the standard Java libraries have many very useful classes that take the pain out of traditional network programming.
With the Java libraries, making a connection with a server is as simple as asking for a new Socket object. You'll need to know the server's address (like a phone number) and a port number (an extension). Building a server to listen for requests is just as easy.
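As a minimal sketch of that idea, the following Java program asks for a new Socket using an address and a port number. The class name SocketSketch and the helper connectAndReportPort are illustrative names, not part of the Java libraries, and the example starts its own throwaway listener on the local machine so that there is something to call.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketSketch {
    /** Connects to host:port (hypothetical helper) and returns the remote port of the open connection. */
    static int connectAndReportPort(String host, int port) throws IOException {
        // The address is the "phone number"; the port is the "extension".
        try (Socket socket = new Socket(host, port)) {
            return socket.getPort();
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system for any free port on the local machine.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            String host = server.getInetAddress().getHostAddress();
            int port = server.getLocalPort();
            System.out.println("Connected to port " + connectAndReportPort(host, port));
        }
    }
}
```

In a real client, the host and port would name a server elsewhere on the network rather than a listener created in the same program.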
As important as Java is to the Internet process, protocols are not Java-specific. So for the rest of this chapter, don't worry about Java. You'll read more about Java's relationship with networking in Chapter 2.
Protocol Soup
An incredible number of protocols are in common use on the Internet. Many of them are for special purposes, and you'll probably never use them. Of the common protocols, many build on each other, which makes life easier.
For example, consider Telnet. You've probably used a Telnet program to log in to a remote computer. You can identify three things as Telnet when you do this. First, you are using a Telnet client on your computer. The computer you log in to has to have a Telnet server (or a daemon in Unix parlance). Finally, the client and server communicate with the Telnet protocol.
So, the Telnet client uses the Telnet protocol to communicate with the Telnet server. That doesn't seem very surprising. However, email clients use essentially the same kind of connection to talk to Simple Mail Transfer Protocol (SMTP) servers. In fact, you can use a Telnet program to manually talk to an SMTP server. Mail messages have a distinct way of representing data, and Web servers use the same format. What's more, Web servers also use a Telnet-like connection.
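To illustrate the shared format, here is a rough Java sketch that reads the header block of a message: "Name: value" lines followed by a blank line, the layout used by both mail messages and Web responses. The class HeaderBlockSketch, the method parseHeaders, and the sample messages are made up for illustration. The sketch ignores refinements such as headers folded across several lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderBlockSketch {
    /** Collects "Name: value" lines up to the first blank line (hypothetical helper). */
    static Map<String, String> parseHeaders(String message) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : message.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the headers; the body follows
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a header line, for example a Web server's status line
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        // Invented sample messages: one mail message and one Web response.
        String mail = "From: alice@example.com\r\nSubject: Lunch\r\n\r\nNoon?";
        String web = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
        System.out.println(parseHeaders(mail));
        System.out.println(parseHeaders(web));
    }
}
```

The same few lines handle both messages, which is the kind of reuse between protocols described above.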
So while it may seem daunting to learn so many protocols, the truth is that many of the higher-level protocols build on the lower-level protocols, which makes the learning curve less steep than it appears. You can often recycle some of what you know about simpler protocols when developing code for more sophisticated ones.
Another thing that can make your life simpler is the wide array of source code available on the Web to allow Java programs to work with different protocols. Many open source packages and examples for any protocol you can imagine exist on the Web. Java has some built-in support for adding custom protocol handlers. Also, Java's object-oriented approach makes it a natural for creating reusable building blocks to handle Internet protocols.
Of particular interest is the NetComponents package from the Jakarta Project (http://jakarta.apache.org). The Jakarta Project is the Java arm of the group that produces the popular Apache Web server. NetComponents (originally written by Daniel F. Savarese) contains classes for the common protocols you'll encounter on the Web.
Jakarta is a good place to find many useful Java classes, not all of which are Internet related. Another interesting project is the Giant Java Tree (www.gjt.org). In addition, you'll find plenty of code elsewhere online and in this book.
Internet Addressing
If you think of the Internet as a phone network, you need to know how to call different computers. There are actually several ways you can specify the exact program you want to use. Suppose you call your bank on the phone. You need to know the bank's main number, of course. When you call, you'll probably get an automated system, and you'll have to punch in the extension of the department you want, for example, the loan department. Of course, the loan department's phones probably roll over so that they can handle many callers at one time.
The same situation exists with computers on the Internet. Each computer on the network has an IP (Internet Protocol) address that looks like four decimal numbers between 0 and 255, separated by periods (for example, 192.16.32.182). Each number is known as an octet because it represents 8 bits. This IP address corresponds to the bank's main number.
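Because each of the four numbers is an 8-bit octet, the whole address fits in 32 bits. The Java sketch below packs the example address into a single number and unpacks it again. The class DottedQuad and its two methods are illustrative names, not a standard API.

```java
class DottedQuad {
    /** Packs an address such as "192.16.32.182" into one 32-bit value (held in a long). */
    static long toNumber(String address) {
        String[] parts = address.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + address);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet; // make room for 8 more bits, then add the octet
        }
        return value;
    }

    /** Unpacks a 32-bit value back into dotted form. */
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    public static void main(String[] args) {
        long number = toNumber("192.16.32.182");
        System.out.println(number);           // prints 3222282422
        System.out.println(toDotted(number)); // prints 192.16.32.182
    }
}
```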
Of course, one computer might provide many services, including email, Web documents, file transfers, and other services. You need what amounts to an extension. This is known as a port number. Port numbers 1023 and below are reserved for well-known services. For example, Web servers usually use port 80. That way, any interested Web browser can connect to the server and request port 80 to fetch Web pages. You'll find a list of common port numbers later in the "Immediate Solutions" section. Just as the loan department has multiple lines on the same extension, a server can respond to multiple requests on the same port. That way, many Web browsers can access the server at once. Of course, just as a small company might have a single phone line, a server can elect to handle only one request at a time. The choice is up to the author of the server.
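One way a server author might answer several callers on the same port is to hand each accepted connection to its own worker thread. The Java sketch below assumes that design. The class SharedPortSketch, its methods, and the greeting text are invented for illustration, and the listener uses a free port on the local machine rather than a well-known port.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SharedPortSketch {
    /** Accepts the given number of callers on one port, giving each its own worker thread. */
    static void serve(ServerSocket listener, int callers) throws IOException {
        ExecutorService workers = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < callers; i++) {
                Socket caller = listener.accept();   // every caller arrives on the same port
                workers.submit(() -> greet(caller)); // the accept loop is free for the next caller
            }
        } finally {
            workers.shutdown(); // greetings already submitted still run to completion
        }
    }

    /** Sends a one-line greeting, then hangs up. */
    private static void greet(Socket caller) {
        try (Socket c = caller;
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            out.println("hello from port " + c.getLocalPort());
        } catch (IOException e) {
            // The caller hung up early; nothing more to do in this sketch.
        }
    }

    /** Reads the one-line greeting from an already-open connection. */
    static String readGreeting(Socket connection) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        return in.readLine();
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> {
                try {
                    serve(listener, 2);
                } catch (IOException e) {
                    // The listener was closed before both callers arrived.
                }
            });
            server.start();
            // Both callers are connected at once, and both are answered on the same port.
            try (Socket first = new Socket(listener.getInetAddress(), listener.getLocalPort());
                 Socket second = new Socket(listener.getInetAddress(), listener.getLocalPort())) {
                System.out.println(readGreeting(second));
                System.out.println(readGreeting(first));
            }
            server.join();
        }
    }
}
```

A server that prefers the single-phone-line approach could simply call greet directly inside the accept loop instead of submitting it to a worker.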
How do computers get IP addresses? That depends. At some level, a central authority (the Internet Corporation for Assigned Names and Numbers, or ICANN) assigns organizations blocks of IP addresses. For most people, however, their computer's IP address is assigned by their Internet provider or a network administrator. For client machines, it is common to use Dynamic Host Configuration Protocol (DHCP) to automatically assign IP addresses from a pool of available addresses. This isn't a good idea for a server, however, because clients may depend on the server being at the same address all the time.
The numbers of an IP address actually have some meaning and aren't just arbitrary. Addresses fall into three major categories. Each category uses a different number of bits to specify the network number. Everyone who has the same network number is on the same network. Requests to other networks must be routed off the network....
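Assuming the three categories meant here are the traditional address classes A, B, and C, the first octet alone tells you how many bits belong to the network number. The Java sketch below works under that assumption. The class AddressClassSketch and the method networkBits are illustrative names.

```java
class AddressClassSketch {
    /** Returns the number of network bits for a traditional class A, B, or C address. */
    static int networkBits(String address) {
        int firstOctet = Integer.parseInt(address.split("\\.")[0]);
        if (firstOctet < 0 || firstOctet > 255) {
            throw new IllegalArgumentException("octet out of range: " + address);
        }
        if (firstOctet < 128) {
            return 8;  // class A: the first octet is the network number
        }
        if (firstOctet < 192) {
            return 16; // class B: the first two octets
        }
        if (firstOctet < 224) {
            return 24; // class C: the first three octets
        }
        throw new IllegalArgumentException("not a class A, B, or C address: " + address);
    }

    public static void main(String[] args) {
        System.out.println(networkBits("192.16.32.182")); // prints 24
    }
}
```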