Connected: An Internet Encyclopedia
5. The News Propagation Algorithm

Up: Connected: An Internet Encyclopedia
Up: Requests For Comments
Up: RFC 850
Prev: 4.3 Batching

5. The News Propagation Algorithm

5. The News Propagation Algorithm

This section describes the overall scheme of USENET and the algorithm followed by sites in propagating news to the entire network. Since all sites are affected by incorrectly formatted articles and by propagation errors, it is important for the method to be standardized.

USENET is a directed graph. Each node in the graph is a host computer, each arc in the graph is a transmission path from one host to another host. Each arc is labelled with a newsgroup pattern, specifying which newsgroup classes are forwarded along that link. Most arcs are bidirectional, that is, if site A sends a class of newsgroups to site B, then site B usually sends the same class of newsgroups to site A. This bidirectionality is not, however, required.

USENET is made up of many subnetworks. Each subnet has a name, such as "net" or "btl". The special subnet "net" is defined to be USENET, although the union of all subnets may be a superset of USENET (because of sites that get local newsgroup classes but do not get net.all). Each subnet is a connected graph, that is, a path exists from every node to every other node in the subnet. In addition, the entire graph is (theoretically) connected. (In practice, some political considerations have caused some sites to be unable to post articles reaching the rest of the network.)

An article is posted on one machine to a list of newsgroups. That machine accepts it locally, then forwards it to all its neighbors that are interested in at least one of the newsgroups of the message. (Site A deems site B to be "interested" in a newsgroup if the newsgroup matches the pattern on the arc from A to B. This pattern is stored in a file on the A machine.) The sites receiving the incoming article examine it to make sure they really want the article, accept it locally, and then in turn forward the article to all their interest neighbors. This process continues until the entire network has seen the article.

An important part of the algorithm is the prevention of loops. The above process would cause a message to loop along a cycle forever. In particular, when site A sends an article to site B, site B will send it back to site A, which will send it to site B, and so on. One solution to this is the history mechanism. Each site keeps track of all articles it has seen (by their message ID) and whenever an article comes in that it has already seen, the incoming article is discarded immediately. This solution is sufficient to prevent loops, but additional optimizations can be made to avoid sending articles to sites that will simply throw them away.

One optimization is that an article should never be sent to a machine listed in the Path line of the header. When a machine name is in the Path line, the message is known to have passed through the machine. Another optimization is that, if the article originated on site A, then site A has already seen the article. (Origination can be determined by the Posting-Version line.)

Thus, if an article is posted to newsgroup "net.misc", it will match the pattern "net.all" (where "all" is a metasymbol that matches any string), and will be forwarded to all sites that subscribe to net.all (as determined by what their neighbors send them). These sites make up the "net" subnetwork. An article posted to "btl.general" will reach all sites receiving "btl.all", but will not reach sites that do not get "btl.all". In effect, the articles reaches the "btl" subnetwork. An article posted to newsgroups "net.micro,btl.general" will reach all sites subscribing to either of the two classes.


Connected: An Internet Encyclopedia
5. The News Propagation Algorithm