M. Horton
AT&T Bell Laboratories
R. Adams
Center for Seismic Studies
December 1987
STATUS OF THIS MEMO
This document defines the standard format for the interchange of network News messages among USENET hosts. It updates and replaces RFC-850, reflecting version B2.11 of the News program. This memo is distributed as an RFC to make this information easily accessible to the Internet community. It does not specify an Internet standard. Distribution of this memo is unlimited.
There are five sections to this document. Section two defines the format. Section three defines the valid control messages. Section four specifies some valid transmission methods. Section five describes the overall news propagation algorithm.
Here is an example USENET message to illustrate the fields.
     From: jerry@eagle.ATT.COM (Jerry Schwarz)
     Path: cbosgd!mhuxj!mhuxt!eagle!jerry
     Newsgroups: news.announce
     Subject: Usenet Etiquette -- Please Read
     Message-ID: <642@eagle.ATT.COM>
     Date: Fri, 19 Nov 82 16:14:55 GMT
     Followup-To: news.misc
     Expires: Sat, 1 Jan 83 00:00:00 -0500
     Organization: AT&T Bell Laboratories, Murray Hill

     The body of the message comes here, after a blank line.

Here is an example of a message in the old format (before the existence of this standard). It is recommended that implementations also accept messages in this format to ease upward conversion.
     From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)
     Newsgroups: news.misc
     Title: Usenet Etiquette -- Please Read
     Article-I.D.: eagle.642
     Posted: Fri Nov 19 16:14:55 1982
     Received: Fri Nov 19 16:59:30 1982
     Expires: Mon Jan 1 00:00:00 1990

     The body of the message comes here, after a blank line.

Some news systems transmit news in the A format, which looks like this:
     Aeagle.642
     news.misc
     cbosgd!mhuxj!mhuxt!eagle!jerry
     Fri Nov 19 16:14:55 1982
     Usenet Etiquette - Please Read
     The body of the message comes here, with no blank line.

A standard USENET message consists of several header lines, followed by a blank line, followed by the body of the message. Each header line consists of a keyword, a colon, a blank, and some additional information. This is a subset of the Internet standard (RFC-822), simplified to allow simpler software to handle it. The "From" line may optionally include a full name, in the format above, or use the Internet angle bracket syntax. To keep the implementations simple, other formats (for example, with part of the machine address after the close parenthesis) are not allowed. The Internet convention of continuation header lines (beginning with a blank or tab) is allowed.
Certain headers are required, and certain other headers are optional. Any unrecognized headers are allowed, and will be passed through unchanged. The required header lines are "From", "Date", "Newsgroups", "Subject", "Message-ID", and "Path". The optional header lines are "Followup-To", "Expires", "Reply-To", "Sender", "References", "Control", "Distribution", "Keywords", "Summary", "Approved", "Lines", "Xref", and "Organization". Each of these header lines will be described below.
RFC-822 specifies that all text in parentheses is to be interpreted as a comment. It is common in Internet mail to place the full name of the user in a comment at the end of the "From" line. This standard specifies a more rigid syntax. The full name is not considered a comment, but an optional part of the header line. Either the full name is omitted, or it appears in parentheses after the electronic address of the person posting the message, or it appears before an electronic address which is enclosed in angle brackets. Thus, the three permissible forms are:
     From: mark@cbosgd.ATT.COM
     From: mark@cbosgd.ATT.COM (Mark Horton)
     From: Mark Horton <mark@cbosgd.ATT.COM>

Full names may contain any printing ASCII characters from space through tilde, except "(" (left parenthesis), ")" (right parenthesis), "<" (left angle bracket), and ">" (right angle bracket). The mail standard may place additional restrictions on full names; in particular, the characters "," (comma), ":" (colon), "@" (at), "!" (bang), "/" (slash), "=" (equal), and ";" (semicolon) are inadvisable in full names.
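The three forms can be told apart mechanically. The following Python sketch accepts only those forms and applies the full-name character rule; parse_from and _valid_full_name are hypothetical names, and the address itself is not validated.

```python
import re

# Characters forbidden in a full name by this standard.
_FORBIDDEN_IN_NAME = set("()<>")


def _valid_full_name(name):
    """True if every character is printing ASCII (space..tilde) other than ( ) < >."""
    return all(" " <= c <= "~" and c not in _FORBIDDEN_IN_NAME for c in name)


def parse_from(value):
    """Split a "From" value into (address, full_name); full_name is None if absent.

    Only the three permitted forms are accepted:
        address
        address (Full Name)
        Full Name <address>
    """
    value = value.strip()
    m = re.fullmatch(r"(\S+) \((.*)\)", value)
    if m:
        address, name = m.group(1), m.group(2)
    else:
        m = re.fullmatch(r"(.*?) <(\S+)>", value)
        if m:
            name, address = m.group(1), m.group(2)
        elif re.fullmatch(r"[^\s()<>]+", value):
            address, name = value, None
        else:
            raise ValueError(f"unsupported From syntax: {value!r}")
    if name is not None and not _valid_full_name(name):
        raise ValueError(f"invalid full name: {name!r}")
    return address, name
```

A "From" line with text after the closing parenthesis, such as "mark@cbosgd (Mark Horton) .ATT.COM", is rejected with ValueError rather than guessed at.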
One format for the "Date" line that is acceptable under RFC-822 is:

     Wdy, DD Mon YY HH:MM:SS TIMEZONE

Several examples of valid dates appear in the sample message above. Note in particular that the ctime(3) format:
     Wdy Mon DD HH:MM:SS YYYY

is not acceptable because it is not a valid RFC-822 date. However, since older software still generates this format, news implementations are encouraged to accept this format and translate it into an acceptable format.
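As a minimal sketch of that translation, assuming the caller knows which zone the ctime(3) time is in (ctime output carries none), the hypothetical function below reorders the fields into the "Wdy, DD Mon YY HH:MM:SS TIMEZONE" form:

```python
import re

_WEEKDAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"}
_MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}


def ctime_to_rfc822(ctime_date, zone="GMT"):
    """Rewrite 'Wdy Mon DD HH:MM:SS YYYY' as 'Wdy, DD Mon YY HH:MM:SS ZONE'.

    ctime(3) output has no timezone, so the caller must say which zone the
    time is in; defaulting to GMT is an assumption of this sketch.
    """
    fields = ctime_date.split()  # split() also absorbs ctime's padding blank
    if len(fields) != 5:
        raise ValueError(f"not a ctime(3) date: {ctime_date!r}")
    wdy, mon, dd, hms, yyyy = fields
    if (wdy not in _WEEKDAYS or mon not in _MONTHS or not dd.isdigit()
            or not re.fullmatch(r"\d\d:\d\d:\d\d", hms)
            or not re.fullmatch(r"\d{4}", yyyy)):
        raise ValueError(f"not a ctime(3) date: {ctime_date!r}")
    return f"{wdy}, {int(dd)} {mon} {yyyy[2:]} {hms} {zone}"
```

For instance, "Fri Nov 19 16:14:55 1982" in GMT becomes "Fri, 19 Nov 82 16:14:55 GMT", the date used in the first sample message.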
There is no hope of having a complete list of timezones. Universal Time (GMT), the North American timezones (PST, PDT, MST, MDT, CST, CDT, EST, EDT), and the +/-hhmm offset specified in RFC-822 should be supported. It is recommended that times in message headers be transmitted in GMT and displayed in the local time zone.
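A receiving host needs a numeric offset for each zone it supports in order to convert times to GMT or to local time. A sketch covering exactly the zones listed above plus the RFC-822 +/-hhmm form (zone_offset_minutes is a hypothetical helper):

```python
import re

# Offsets in minutes east of GMT for the named zones listed above.
ZONE_OFFSETS = {
    "GMT": 0,
    "EST": -5 * 60, "EDT": -4 * 60,
    "CST": -6 * 60, "CDT": -5 * 60,
    "MST": -7 * 60, "MDT": -6 * 60,
    "PST": -8 * 60, "PDT": -7 * 60,
}


def zone_offset_minutes(zone):
    """Return a zone's offset from GMT in minutes (negative west of Greenwich)."""
    if zone in ZONE_OFFSETS:
        return ZONE_OFFSETS[zone]
    m = re.fullmatch(r"([+-])(\d\d)(\d\d)", zone)
    if m is None:
        raise ValueError(f"unsupported timezone: {zone!r}")
    minutes = int(m.group(2)) * 60 + int(m.group(3))
    return -minutes if m.group(1) == "-" else minutes
```

Subtracting the offset from a local clock time gives GMT; for example, 11:00 EST (-300 minutes) is 16:00 GMT.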
Wildcards (e.g., the word "all") are never allowed in a "Newsgroups" line. For example, a newsgroup comp.all is illegal, although a newsgroup rec.sport.football is permitted.
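A posting program can enforce this rule before a message enters the network. This Python sketch assumes that newsgroup names are separated by commas with no blanks, as in the examples in this document, and that a name component spelled exactly "all" is a wildcard; check_newsgroups is a hypothetical name.

```python
def check_newsgroups(value):
    """Split a "Newsgroups" value into names, rejecting wildcards.

    Assumes names are comma-separated with no blanks, as in this
    document's examples.
    """
    groups = value.split(",")
    for group in groups:
        if not group or any(ch.isspace() for ch in group):
            raise ValueError(f"malformed newsgroup name: {group!r}")
        if "all" in group.split("."):
            raise ValueError(f"wildcard not allowed in Newsgroups: {group!r}")
    return groups
```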
If a message is received with a "Newsgroups" line listing some valid newsgroups and some invalid newsgroups, a host should not remove invalid newsgroups from the list. Instead, the invalid newsgroups should be ignored. For example, suppose host A subscribes to the classes btl.all and comp.all, and exchanges news messages with host B, which subscribes to comp.all but not btl.all. Suppose A receives a message with Newsgroups: comp.unix,btl.general.
This message is passed on to B because B receives comp.unix, but B does not receive btl.general. A must leave the "Newsgroups" line unchanged. If it were to remove btl.general, the edited header could eventually re-enter the btl.all class, resulting in a message that is not shown to users subscribing to btl.general. Also, follow-ups from outside btl.all would not be shown to such users.
The "Message-ID" line gives the message a unique identifier. Its general form is:

     <string not containing blank or ">">

In order to conform to RFC-822, the Message-ID must have the format:
     <unique@full_domain_name>

where full_domain_name is the full name of the host at which the message entered the network, including the domain that host is in, and unique is any string of printing ASCII characters, not including "<" (left angle bracket), ">" (right angle bracket), or "@" (at sign).
For example, the unique part could be an integer representing a sequence number for messages submitted to the network, or a short string derived from the date and time the message was created. Thus, a valid Message-ID for a message submitted from host ucbvax in domain "Berkeley.EDU" would be "<4123@ucbvax.Berkeley.EDU>". Programmers are urged not to make assumptions about the content of Message-ID fields from other hosts, but to treat them as opaque character strings. It is not safe, for example, to assume that a Message-ID will be under 14 characters, that it is unique in the first 14 characters, or that it does not contain a "/".
The angle brackets are considered part of the Message-ID. Thus, in references to the Message-ID, such as the ihave/sendme and cancel control messages, the angle brackets are included. White space characters (e.g., blank and tab) are not allowed in a Message-ID. Slashes ("/") are strongly discouraged. All characters between the angle brackets must be printing ASCII characters.
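These rules are easy to check mechanically. The sketch below generates a Message-ID from a per-host sequence number and validates the <unique@full_domain_name> form. Restricting full_domain_name to letters, digits, periods, and hyphens is an assumption borrowed from the host-name rule for the "Path" line; the helper names are hypothetical.

```python
import re

# unique: printing ASCII other than blank, "<", ">" and "@"
#   (ranges 0x21-0x3B, 0x3D, 0x3F, 0x41-0x7E).
# full_domain_name: letters, digits, periods and hyphens (an assumption
#   borrowed from the host-name rule given for the "Path" line).
_MESSAGE_ID = re.compile(r"<[!-;=?A-~]+@[A-Za-z0-9.-]+>")


def make_message_id(sequence, full_domain_name):
    """Build a Message-ID from a per-host sequence number, brackets included."""
    return f"<{sequence}@{full_domain_name}>"


def is_valid_message_id(message_id):
    """Check the RFC-822-conforming <unique@full_domain_name> form."""
    return _MESSAGE_ID.fullmatch(message_id) is not None
```

Beyond this check, a Message-ID should be handled only as an opaque string, for example as a key in a history table, with the angle brackets kept.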
Host names in the "Path" line may be separated by punctuation or blanks, so all of the following are valid:

     cbosgd!mhuxj!mhuxt
     cbosgd, mhuxj, mhuxt
     @cbosgd.ATT.COM,@mhuxj.ATT.COM,@mhuxt.ATT.COM
     teklabs, zehntel, sri-unix@cca!decvax

(The last path indicates a message that passed through decvax, cca, sri-unix, zehntel, and teklabs, in that order.) Additional names should be added on the left. For example, the most recently added name in the fourth example was teklabs. Letters, digits, periods, and hyphens are considered part of host names; all other punctuation, including blanks, is treated as a separator.
Normally, the rightmost name will be the name of the originating system. However, it is also permissible to include an extra entry on the right, which is the name of the sender. This is for upward compatibility with older systems.
The "Path" line is not used for replies, and should not be taken as a mailing address. It is intended to show the route the message traveled to reach the local host. There are several uses for this information. One is to monitor USENET routing for performance reasons. Another is to establish a path to reach new hosts. Perhaps the most important use is to cut down on redundant USENET traffic by not forwarding a message to a host that is known to have already received it. In particular, when host A sends a message to host B, the "Path" line includes A, so that host B will not immediately send the message back to host A. The name each host uses to identify itself should be the same as the name by which its neighbors know it, in order to make this optimization possible.
A host adds its own name to the front of a path when it receives a message from another host. Thus, if a message with path "A!X!Y!Z" is passed from host A to host B, B will add its own name to the path when it receives the message from A, e.g., "B!A!X!Y!Z". If B then passes the message on to C, the message sent to C will contain the path "B!A!X!Y!Z", and when C receives it, C will change it to "C!B!A!X!Y!Z".
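A minimal sketch of both operations on the "Path" line: splitting it into names using the separator rule above, and prepending the local host's name on receipt. The "!" separator follows the existing convention; the function names are hypothetical.

```python
import re


def path_hosts(path):
    """List the names in a "Path" line, most recently added first.

    Letters, digits, periods and hyphens form names; every other
    character (including blanks) is treated as a separator.
    """
    return [name for name in re.split(r"[^A-Za-z0-9.-]+", path) if name]


def add_to_path(path, local_host):
    """Prepend the receiving host's name, using the customary "!" separator."""
    return f"{local_host}!{path}"
```

If the rightmost entry is a user name (the older convention), path_hosts returns it as an ordinary name, so callers should not assume every entry is a host.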
Special upward compatibility note: Since the "From", "Sender", and "Reply-To" lines are in Internet format, and since many USENET hosts do not yet have mailers capable of understanding Internet format, it would break the reply capability to completely sever the connection between the "Path" header and the reply function. It is recognized that the path is not always a valid reply string in older implementations, and no requirement to fix this problem is placed on implementations. However, the existing convention of placing the host name and an "!" at the front of the path, and of starting the path with the host name, an "!", and the user name, should be maintained when possible.
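The "Sender" line records who actually submitted a message when that differs from the "From" line. For example, if John Smith is visiting CCA and wishes to post a message to the network using his friend Sarah Jones' account, the message might read: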
     From: smith@ucbvax.Berkeley.EDU (John Smith)
     Sender: jones@cca.COM (Sarah Jones)

If a gateway program enters a mail message into the network at host unix.SRI.COM, the lines might read:
     From: John.Doe@A.CS.CMU.EDU
     Sender: network@unix.SRI.COM

The primary purpose of this field is to make it possible to track down messages to determine how they were entered into the network. The full name may optionally be given, in parentheses, as in the "From" line.
If the keyword poster appears in the "Followup-To" line, follow-up messages are not permitted; instead, responses should be mailed to the submitter of the message.
The purpose of the "References" header is to allow messages to be grouped into conversations by the user interface program. This allows conversations within a newsgroup to be kept together, and potentially users might shut off entire conversations without unsubscribing to a newsgroup. User interfaces need not make use of this header, but all automatically generated follow-ups should generate the "References" line for the benefit of systems that do use it, and manually generated follow-ups (e.g., typed in well after the original message has been printed by the machine) should be encouraged to include them as well.
If the previous "References" line is too long, it need not be copied in its entirety. An attempt should be made, however, to include a reasonable number of backward references.
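A follow-up program might build the header as sketched below: copy the parent's "References" list, append the parent's Message-ID, and drop the oldest entries beyond a limit. The limit of 10 and the choice to drop the oldest entries are assumptions, since the standard only asks for a reasonable number; followup_references is a hypothetical name.

```python
def followup_references(parent_references, parent_message_id, max_ids=10):
    """Return the "References" value for a follow-up to a parent message.

    parent_references is the parent's "References" value (or None); the
    parent's own Message-ID is appended. Message-IDs cannot contain blanks,
    so the list is kept blank-separated. max_ids is a hypothetical cap
    standing in for "a reasonable number"; the oldest IDs are dropped first.
    """
    if max_ids < 1:
        raise ValueError("max_ids must be at least 1")
    ids = parent_references.split() if parent_references else []
    ids.append(parent_message_id)
    return " ".join(ids[-max_ids:])
```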
For upward compatibility, messages that match the newsgroup pattern "all.all.ctl" should also be interpreted as control messages. If no "Control" header is present on such messages, the subject is used as the control message. However, messages on newsgroups matching this pattern do not conform to this standard.
Also for upward compatibility, if the first 4 characters of the "Subject:" line are "cmsg", the rest of the "Subject:" line should be interpreted as a control message.
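Taken together, a receiving host can decide whether a message is a control message in three steps, sketched here in Python. The headers are assumed to be available as a dict, and each "all" in "all.all.ctl" is read as matching any non-empty string, consistent with the description of misc.all in the propagation section; control_command is a hypothetical name.

```python
import re

# "all.all.ctl", reading each "all" as "any non-empty string" (assumption).
_OLD_CONTROL_GROUP = re.compile(r".+\..+\.ctl")


def control_command(headers):
    """Return the control message text, or None for an ordinary message.

    headers is a hypothetical dict mapping header names to values.
    """
    if "Control" in headers:
        return headers["Control"]
    subject = headers.get("Subject", "")
    groups = headers.get("Newsgroups", "").split(",")
    if any(_OLD_CONTROL_GROUP.fullmatch(g) for g in groups):
        return subject  # old style: the subject is the control message
    if subject.startswith("cmsg"):
        return subject[4:].lstrip()  # old style: text after "cmsg"
    return None
```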
The "Distribution" line restricts the scope of a message. For example, a message might carry the header lines:

     Newsgroups: rec.auto,misc.forsale
     Distribution: nj,ny

so that it would only go to persons subscribing to rec.auto or misc.forsale within New Jersey or New York. The intent of this header is to restrict the distribution of a newsgroup further, not to increase it. A local newsgroup, such as nj.crazy-eddie, will probably not be propagated by hosts outside New Jersey that do not show such a newsgroup as valid. A follow-up message should default to the same "Distribution" line as the original message, but the user can change it to a more limited one, or escalate the distribution if it was originally restricted and a more widely distributed reply is appropriate.
The "Xref" line is only of value to the local system, so it should not be transmitted. For example, in the message header:
     Path: seismo!lll-crg!lll-lcc!pyramid!decwrl!reid
     From: reid@decwrl.DEC.COM (Brian Reid)
     Newsgroups: news.lists,news.groups
     Subject: USENET READERSHIP SUMMARY REPORT FOR SEP 86
     Message-ID: <5658@decwrl.DEC.COM>
     Date: 1 Oct 86 11:26:15 GMT
     Organization: DEC Western Research Laboratory
     Lines: 441
     Approved: reid@decwrl.UUCP
     Xref: seismo news.lists:461 news.groups:6378

the "Xref" line shows that the message is message number 461 in the newsgroup news.lists, and message number 6378 in the newsgroup news.groups, on host seismo. This information may be used by certain user interfaces.
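The "Xref" line in this example can be taken apart as follows; parse_xref is a hypothetical helper that returns the host name and a mapping from newsgroup to local message number.

```python
def parse_xref(value):
    """Split an "Xref" value into (host, {newsgroup: message number})."""
    host, *pairs = value.split()
    numbers = {}
    for pair in pairs:
        group, sep, number = pair.rpartition(":")
        if not sep or not group or not number.isdigit():
            raise ValueError(f"malformed Xref entry: {pair!r}")
        numbers[group] = int(number)
    return host, numbers
```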
Implementors and administrators may choose to allow control messages to be carried out automatically, or to queue them for manual processing. However, manually processed messages should be dealt with promptly.
Failed control messages should NOT be mailed to the originator of the message, but to the local "usenet" account.
     cancel <Message-ID>

If a message with the given Message-ID is present on the local system, the message is cancelled. This mechanism allows a user to cancel a message after the message has been distributed over the network.
If the system is unable to cancel the message as requested, it should not forward the cancellation request to its neighbor systems.
Only the author of the message or the local news administrator is allowed to send this message. The verified sender of a message is the "Sender" line, or if no "Sender" line is present, the "From" line. The verified sender of the cancel message must be the same as either the "Sender" or "From" field of the original message. A verified sender in the cancel message is allowed to match an unverified "From" in the original message.
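The authorization rule can be sketched as follows. The sketch compares bare addresses after stripping any full name, using exact string comparison (a simplification: it does not treat address case or domain aliases specially), and is_local_admin is a hypothetical flag for the local news administrator.

```python
def _address(value):
    """Strip an optional full name, leaving just the electronic address."""
    value = value.strip()
    if value.endswith(">") and "<" in value:
        return value[value.rindex("<") + 1:-1]  # Full Name <address>
    return value.split()[0]  # address  or  address (Full Name)


def verified_sender(headers):
    """The "Sender" line if present, otherwise the "From" line."""
    return _address(headers.get("Sender") or headers["From"])


def may_cancel(cancel_headers, original_headers, is_local_admin=False):
    """Decide whether a cancel message is authorized.

    is_local_admin is a hypothetical flag for the local news administrator.
    """
    if is_local_admin:
        return True
    allowed = {_address(original_headers[name])
               for name in ("Sender", "From") if name in original_headers}
    return verified_sender(cancel_headers) in allowed
```

When a cancel carries a "Sender" line, that line is its verified sender; if it matches neither the "Sender" nor the "From" of the original, the cancel is refused even if the cancel's own "From" matches.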
     ihave <Message-ID list> [<remotesys>]
     sendme <Message-ID list> [<remotesys>]

This message is part of the ihave/sendme protocol, which allows one host (say A) to tell another host (B) that a particular message has been received on A. Suppose that host A receives message "<1234@ucbvax.Berkeley.edu>", and wishes to transmit the message to host B.
A sends the control message "ihave <1234@ucbvax.Berkeley.edu> A" to host B (by posting it to newsgroup to.B). B responds with the control message "sendme <1234@ucbvax.Berkeley.edu> B" (on newsgroup to.A), if it has not already received the message. Upon receiving the sendme message, A sends the message to B.
This protocol can be used to cut down on redundant traffic between hosts. It is optional and should be used only if the particular situation makes it worthwhile. Frequently, the outcome is that, since most original messages are short, and since there is a high overhead to start sending a new message with UUCP, it costs as much to send the ihave as it would cost to send the message itself.
One possible solution to this overhead problem is to batch requests. Several Message-ID's may be announced or requested in one message. If no Message-ID's are listed in the control message, the body of the message should be scanned for Message-ID's, one per line.
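A sketch of both sides of the exchange, assuming the Message-ID list is blank-separated (which is unambiguous, since Message-IDs contain no white space) and that history is the set of Message-IDs already seen; all names are hypothetical.

```python
def parse_ihave_sendme(control, body=""):
    """Split an ihave/sendme control line into (command, ids, remotesys).

    Assumes the Message-ID list is blank-separated. If no IDs are listed,
    the body is scanned instead, one Message-ID per line.
    """
    command, *args = control.split()
    if command not in ("ihave", "sendme"):
        raise ValueError(f"not an ihave/sendme message: {control!r}")
    ids = [a for a in args if a.startswith("<") and a.endswith(">")]
    rest = [a for a in args if a not in ids]
    if len(rest) > 1:
        raise ValueError(f"unexpected arguments: {rest!r}")
    remotesys = rest[0] if rest else None
    if not ids:
        ids = [line.strip() for line in body.splitlines() if line.strip()]
    return command, ids, remotesys


def sendme_reply(offered_ids, history, local_host):
    """Build the sendme for offered IDs not yet in history, or None."""
    wanted = [m for m in offered_ids if m not in history]
    if not wanted:
        return None
    return "sendme " + " ".join(wanted) + " " + local_host
```

With the example above, host B answers "ihave <1234@ucbvax.Berkeley.edu> A" with "sendme <1234@ucbvax.Berkeley.edu> B" unless the ID is already in its history.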
     newgroup <groupname> [moderated]

This control message creates a new newsgroup with the given name. Since no messages may be posted or forwarded until a newsgroup is created, this message is required before a newsgroup can be used. The body of the message is expected to be a short paragraph describing the intended use of the newsgroup.
If the second argument is present and it is the keyword moderated, the group should be created moderated instead of the default of unmoderated. The newgroup message should be ignored unless there is an "Approved" line in the same message header.
     rmgroup <groupname>

This message removes a newsgroup with the given name. Since the newsgroup is removed from every host on the network, this command should be used carefully by a responsible administrator. The rmgroup message should be ignored unless there is an "Approved" line in the same message header.
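A sketch of how a host might apply these two control messages to its table of newsgroups, honoring the "Approved" requirement. The table representation (a dict from group name to "moderated" or "unmoderated") and the function name are hypothetical; an unapproved message is simply ignored.

```python
def apply_group_control(control, headers, groups):
    """Apply a newgroup or rmgroup control message to a table of groups.

    groups is a hypothetical dict mapping group name -> "moderated" or
    "unmoderated". Returns False if the message is ignored for lack of
    an "Approved" line, True if it was applied.
    """
    command, *args = control.split()
    if command not in ("newgroup", "rmgroup") or not args:
        raise ValueError(f"unsupported control message: {control!r}")
    if "Approved" not in headers:
        return False
    name = args[0]
    if command == "newgroup":
        moderated = len(args) > 1 and args[1] == "moderated"
        groups[name] = "moderated" if moderated else "unmoderated"
    else:
        groups.pop(name, None)
    return True
```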
     sendsys        (no arguments)

The sys file, listing all neighbors and the newsgroups to be sent to each neighbor, will be mailed to the author of the control message ("Reply-To", if present, otherwise "From"). This information is considered public information, and it is a requirement of membership in USENET that this information be provided on request, either automatically in response to this control message, or manually, by mailing the requested information to the author of the message. This information is used to keep the map of USENET up to date, and to determine where netnews is sent.
The format of the file mailed back to the author should be the same as that of the sys file. This format has one line per neighboring host (plus one line for the local host), containing four colon-separated fields. The first field has the host name of the neighbor; the second field has a newsgroup pattern describing the newsgroups sent to the neighbor. The third and fourth fields are not defined by this standard. The sys file is not the same as the UUCP L.sys file. A sample response is:
     From: cbosgd!mark (Mark Horton)
     Date: Sun, 27 Mar 83 20:39:37 -0500
     Subject: response to your sendsys request
     To: mark@cbosgd.ATT.COM

     Responding-System: cbosgd.ATT.COM
     cbosgd:osg,cb,btl,bell,world,comp,sci,rec,talk,misc,news,soc,to,test
     ucbvax:world,comp,to.ucbvax:L:
     cbosg:world,comp,bell,btl,cb,osg,to.cbosg:F:/usr/spool/outnews/cbosg
     cbosgb:osg,to.cbosgb:F:/usr/spool/outnews/cbosgb
     sescent:world,comp,bell,btl,cb,to.sescent:F:/usr/spool/outnews/sescent
     npois:world,comp,bell,btl,ug,to.npois:F:/usr/spool/outnews/npois
     mhuxi:world,comp,bell,btl,ug,to.mhuxi:F:/usr/spool/outnews/mhuxi
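A parser for the sys lines in such a response might look like this. Since the third and fourth fields are undefined by this standard, the sketch keeps them as opaque strings and treats missing trailing fields as empty, as in the first line of the sample; SysEntry and parse_sys_line are hypothetical names, and the mail headers and Responding-System line are assumed to have been removed already.

```python
from dataclasses import dataclass


@dataclass
class SysEntry:
    host: str
    patterns: list  # newsgroup patterns from the second field
    field3: str = ""  # not defined by this standard
    field4: str = ""  # not defined by this standard


def parse_sys_line(line):
    """Parse one colon-separated sys file line into a SysEntry.

    Missing trailing fields are treated as empty (an assumption; the
    first line of the sample response has only two fields).
    """
    fields = line.rstrip("\n").split(":", 3)
    if len(fields) < 2 or not fields[0]:
        raise ValueError(f"malformed sys line: {line!r}")
    fields += [""] * (4 - len(fields))
    host, patterns, field3, field4 = fields
    return SysEntry(host, patterns.split(","), field3, field4)
```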
     version        (no arguments)

The name and version of the software running on the local system is to be mailed back to the author of the message ("Reply-To" if present, otherwise "From").
It is not a requirement that USENET hosts have mail systems capable of understanding the Internet mail syntax, but it is strongly recommended. Since "From", "Reply-To", and "Sender" lines use the Internet syntax, replies will be difficult or impossible without an Internet mailer. A host without an Internet mailer can attempt to use the "Path" header line for replies, but this field is not guaranteed to be a working path for replies. In any event, any host generating or forwarding news messages must have an Internet address that allows it to receive mail from hosts with Internet mailers, and it must include its Internet address on its "From" lines.
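News is normally transmitted by executing the rnews program on the remote host, with the message on its standard input. Over UUCP this can be done with:

     uux - remote!rnews

and on a Berknet: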
     net -mremote rnews

It is important that the message be sent via a reliable mechanism, normally involving the possibility of spooling, rather than direct real-time remote execution. This is because, if the remote system is down, a direct execution command will fail, and the message will never be delivered. If the message is spooled, it will eventually be delivered when both systems are up.
One problem with transmitting a news message directly as a mail message is that it may not be possible to convince the mail system that the "From" line of the message is valid, since the mail message was generated by a program on a system different from the source of the news message. Another problem is that error messages caused by the mail transmission would be sent to the originator of the news message, who has no control over news transmission between two cooperating hosts and does not know whom to contact. Transmission error messages should be directed to a responsible contact person on the sending machine.
A solution to this problem is to encapsulate the news message into a mail message, such that the entire message (headers and body) is part of the body of the mail message. The convention here is that such mail is sent to user rnews on the remote system. A mail message body is generated by prepending the letter N to each line of the news message, and then attaching whatever mail headers are convenient to generate. The N's are attached to prevent any special lines in the news message from interfering with mail transmission, and to prevent any extra lines inserted by the mailer (headers, blank lines, etc.) from becoming part of the news message. A program on the receiving machine receives mail to rnews, extracting the message itself and invoking the rnews program. An example in this format might look like this:
     Date: Mon, 3 Jan 83 08:33:47 MST
     From: news@cbosgd.ATT.COM
     Subject: network news message
     To: rnews@npois.ATT.COM

     NPath: cbosgd!mhuxj!harpo!utah-cs!sask!derek
     NFrom: derek@sask.UUCP (Derek Andrew)
     NNewsgroups: misc.test
     NSubject: necessary test
     NMessage-ID: <176@sask.UUCP>
     NDate: Mon, 3 Jan 83 00:59:15 MST
     N
     NThis really is a test. If anyone out there more than 6
     Nhops away would kindly confirm this note I would
     Nappreciate it. We suspect that our news postings
     Nare not getting out into the world.
     N

Using mail solves the spooling problem, since mail must always be spooled if the destination host is down. However, it adds more overhead to the transmission process (to encapsulate and extract the message) and makes it harder for software to give different priorities to news and mail.
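The encapsulation and extraction steps can be sketched as a pair of Python functions. The sketch assumes "\n" line endings and that extract receives only the mail body, after the mailer's headers have been removed; body lines that lack the N prefix are treated as mailer additions and dropped. The function names are hypothetical.

```python
def encapsulate(news_message):
    """Prefix every line of a news message with "N" to form a mail body."""
    lines = news_message.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final newline does not start another line
    return "".join("N" + line + "\n" for line in lines)


def extract(mail_body):
    """Recover the news message; lines without the N prefix are discarded."""
    return "".join(line[1:] + "\n"
                   for line in mail_body.split("\n")
                   if line.startswith("N"))
```

The blank line separating the news headers from the body becomes a line consisting of a lone "N", as in the example above.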
News messages are combined into a script, separated by a header of the form:
     #! rnews 1234

where 1234 is the length of the message in bytes. Each such line is followed by a message containing the given number of bytes. (The newline at the end of each line of the message is counted as one byte, for purposes of this count, even if it is stored as <CARRIAGE RETURN><LINE FEED>.) For example, a batch of messages might look like this:
     #! rnews 239
     From: jerry@eagle.ATT.COM (Jerry Schwarz)
     Path: cbosgd!mhuxj!mhuxt!eagle!jerry
     Newsgroups: news.announce
     Subject: Usenet Etiquette -- Please Read
     Message-ID: <642@eagle.ATT.COM>
     Date: Fri, 19 Nov 82 16:14:55 EST
     Approved: mark@cbosgd.ATT.COM

     Here is an important message about USENET Etiquette.
     #! rnews 234
     From: jerry@eagle.ATT.COM (Jerry Schwarz)
     Path: cbosgd!mhuxj!mhuxt!eagle!jerry
     Newsgroups: news.announce
     Subject: Notes on Etiquette message
     Message-ID: <643@eagle.ATT.COM>
     Date: Fri, 19 Nov 82 17:24:12 EST
     Approved: mark@cbosgd.ATT.COM

     There was something I forgot to mention in the last message.

Batched news is recognized because the first character in the message is #. The message is then passed to the unbatcher for interpretation.
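A sketch of a batcher and an rnews unbatcher built on the byte counts. It works on bytes with single-byte "\n" newlines, as the counting rule requires; the function names are hypothetical, and split_batch handles only the rnews scheme.

```python
def make_batch(messages):
    """Join news messages (bytes, "\n" line endings) into an rnews batch."""
    return b"".join(b"#! rnews %d\n" % len(m) + m for m in messages)


def split_batch(batch):
    """Split an rnews batch back into its messages using the byte counts."""
    messages, pos = [], 0
    while pos < len(batch):
        eol = batch.index(b"\n", pos)  # end of the "#! rnews N" line
        fields = batch[pos:eol].split()
        if (len(fields) != 3 or fields[:2] != [b"#!", b"rnews"]
                or not fields[2].isdigit()):
            raise ValueError(f"bad batch header: {batch[pos:eol]!r}")
        start = eol + 1
        end = start + int(fields[2])
        if end > len(batch):
            raise ValueError("batch is truncated")
        messages.append(batch[start:end])
        pos = end
    return messages
```

Because each message is located by its byte count rather than by scanning for the next "#!" line, a message body may itself contain lines beginning with "#!".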
The second argument (in this example rnews) determines which batching scheme is being used. Cooperating hosts may use whatever scheme is appropriate for them.
USENET is a directed graph. Each node in the graph is a host computer, and each arc in the graph is a transmission path from one host to another host. Each arc is labeled with a newsgroup pattern, specifying which newsgroup classes are forwarded along that link. Most arcs are bidirectional, that is, if host A sends a class of newsgroups to host B, then host B usually sends the same class of newsgroups to host A. This bidirectionality is not, however, required.
USENET is made up of many subnetworks. Each subnet has a name, such as comp or btl. Each subnet is a connected graph, that is, a path exists from every node to every other node in the subnet. In addition, the entire graph is (theoretically) connected. (In practice, some political considerations have caused some hosts to be unable to post messages reaching the rest of the network.)
A message is posted on one machine to a list of newsgroups. That machine accepts it locally, then forwards it to all its neighbors that are interested in at least one of the newsgroups of the message. (Host A deems host B to be "interested" in a newsgroup if the newsgroup matches the pattern on the arc from A to B. This pattern is stored in a file on the A machine.) The hosts receiving the incoming message examine it to make sure they really want the message, accept it locally, and then in turn forward the message to all their interested neighbors. This process continues until the entire network has seen the message.
An important part of the algorithm is the prevention of loops. The above process would cause a message to loop along a cycle forever. In particular, when host A sends a message to host B, host B will send it back to host A, which will send it to host B, and so on. One solution to this is the history mechanism. Each host keeps track of all messages it has seen (by their Message-ID) and whenever a message comes in that it has already seen, the incoming message is discarded immediately. This solution is sufficient to prevent loops, but additional optimizations can be made to avoid sending messages to hosts that will simply throw them away.
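The history mechanism alone is enough to stop loops, as this small simulation sketches. It assumes every neighbor is interested in the message and models each host's history as a set of Message-IDs; flood is a hypothetical name.

```python
from collections import deque


def flood(neighbors, origin, message_id):
    """Propagate one message through a graph using only the history check.

    neighbors maps each host to the hosts it forwards to; every neighbor is
    assumed interested. Returns (hosts in order of acceptance, transmissions).
    """
    history = {host: set() for host in neighbors}  # Message-IDs seen per host
    accepted, transmissions = [], 0
    queue = deque([origin])  # hosts that have just received a copy
    while queue:
        host = queue.popleft()
        if message_id in history[host]:
            continue  # already seen: discard immediately
        history[host].add(message_id)
        accepted.append(host)
        for neighbor in neighbors[host]:
            transmissions += 1
            queue.append(neighbor)
    return accepted, transmissions
```

On a three-host triangle with bidirectional links, every host accepts the message exactly once, but four of the six transmissions carry copies that are discarded on arrival, which is the waste the optimizations that follow aim to reduce.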
One optimization is that a message should never be sent to a machine listed in the "Path" line of the header. When a machine name is in the "Path" line, the message is known to have passed through the machine. Another optimization is that, if the message originated on host A, then host A has already seen the message, so it need not be sent back to A.

If a message is posted to newsgroup misc.misc, it will match the pattern misc.all (where all is a metasymbol that matches any string), and will be forwarded to all hosts that subscribe to misc.all (as determined by what their neighbors send them). These hosts make up the misc subnetwork. A message posted to btl.general will reach all hosts receiving btl.all, but will not reach hosts that do not get btl.all. In effect, the message reaches the btl subnetwork. A message posted to newsgroups misc.misc,btl.general will reach all hosts subscribing to either of the two classes.
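Putting the "Path" check and the newsgroup patterns together, the decision to forward a message to one neighbor can be sketched as below. The pattern matcher reads "all" as a whole name component that matches any non-empty string; bare prefixes such as "comp", which appear in the sample sys file, are not interpreted by this sketch. Both function names are hypothetical.

```python
import re


def pattern_matches(pattern, newsgroup):
    """Match one newsgroup pattern; an "all" component matches any string."""
    regex = r"\.".join(".+" if part == "all" else re.escape(part)
                       for part in pattern.split("."))
    return re.fullmatch(regex, newsgroup) is not None


def should_forward(newsgroups, path, neighbor, arc_patterns):
    """Decide whether to send a message to one neighbor.

    newsgroups: the message's newsgroup names; path: its "Path" value;
    arc_patterns: comma-separated patterns on the arc to the neighbor.
    """
    hosts_in_path = re.split(r"[^A-Za-z0-9.-]+", path)
    if neighbor in hosts_in_path:
        return False  # the message has already passed through the neighbor
    patterns = arc_patterns.split(",")
    return any(pattern_matches(p, g) for p in patterns for g in newsgroups)
```

A message posted to misc.misc,btl.general is forwarded to a neighbor whose arc carries btl.all, even though only one of its groups matches.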