Congestion collapse

Before we proceed with a discussion of the two specific problems and their solutions, a description of what happens when these problems are not addressed is in order. In heavily loaded pure datagram networks with end-to-end retransmission, as switching nodes become congested, the round-trip time through the net increases and the count of datagrams in transit within the net also increases. This is normal behavior under load. As long as there is only one copy of each datagram in transit, congestion is under control. Once retransmission of datagrams not yet delivered begins, there is potential for serious trouble.

Host TCP implementations are expected to retransmit packets several times at increasing time intervals until some upper limit on the retransmit interval is reached. Normally, this mechanism is enough to prevent serious congestion problems. Even with the better adaptive host retransmission algorithms, though, a sudden load on the net can cause the round-trip time to rise faster than the sending hosts' measurements of round-trip time can be updated. Such a load occurs when a new bulk transfer, such as a file transfer, begins and starts filling a large window. Should the round-trip time exceed the maximum retransmission interval for any host, that host will begin to introduce more and more copies of the same datagrams into the net. The network is now in serious trouble. Eventually all available buffers in the switching nodes will be full and packets must be dropped. The round-trip time for packets that are delivered is now at its maximum. Hosts are sending each packet several times, and eventually some copy of each packet arrives at its destination. This is congestion collapse.
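To make the failure mode concrete, here is a minimal sketch (not from this RFC) of the classic smoothed round-trip-time estimator of RFC 793; the bounds and the load step below are assumptions chosen for illustration. It shows how a sudden jump in actual round-trip time stays ahead of the retransmit timeout for several rounds, so the timer fires while the original datagrams are still in transit:

    # Sketch of the RFC 793 retransmit-timer calculation (alpha = 0.9,
    # beta = 2); the RTO bounds and the load step are assumptions for
    # illustration, not values from this RFC.
    ALPHA, BETA = 0.9, 2.0
    UBOUND, LBOUND = 60.0, 1.0   # upper/lower bounds on RTO, in seconds

    srtt = 1.0  # smoothed round-trip time, seeded under light load
    for tick in range(10):
        # Assumed load step: a new bulk transfer pushes the actual
        # round-trip time from 1 s to 8 s.
        actual_rtt = 1.0 if tick < 3 else 8.0
        rto = min(UBOUND, max(LBOUND, BETA * srtt))
        if actual_rtt > rto:
            # The timer fires before the acknowledgment arrives: a
            # duplicate copy enters the net while the original is
            # still in transit.
            print(f"t={tick}: RTO {rto:.1f}s < RTT {actual_rtt:.1f}s "
                  f"-> spurious retransmission")
        srtt = ALPHA * srtt + (1 - ALPHA) * actual_rtt  # RFC 793 smoothing

With these constants the estimator needs about six round trips to catch up with the new round-trip time, and every one of those rounds injects at least one duplicate copy of each outstanding datagram.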

This condition is stable. Once the saturation point has been reached, if the algorithm for selecting packets to be dropped is fair, the network will continue to operate in a degraded condition. In this condition every packet is being transmitted several times and throughput is reduced to a small fraction of normal. We have pushed our network into this condition experimentally and observed its stability. It is possible for round-trip time to become so large that connections are broken because the hosts involved time out.
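As a back-of-the-envelope illustration of the degraded state (the capacity figure below is assumed, not a measurement): if every packet must cross the net an average of k times before one copy is delivered, useful throughput can be no better than 1/k of capacity:

    # Assumed link capacity; the point is only the 1/k scaling.
    capacity_pps = 1000.0  # packets per second the net can carry
    for k in (1, 2, 4, 8):
        print(f"{k} transmissions per delivered packet -> "
              f"goodput <= {capacity_pps / k:.0f} pps "
              f"({100.0 / k:.1f}% of capacity)")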

Congestion collapse and pathological congestion are not normally seen in the ARPANET/MILNET system because these networks have substantial excess capacity. Where connections do not pass through IP gateways, the IMP-to-host flow control mechanisms usually prevent congestion collapse, especially since TCP implementations tend to be well adjusted for the time constants associated with the pure ARPANET case. However, other than ICMP Source Quench messages, nothing fundamentally prevents congestion collapse when TCP is run over the ARPANET/MILNET and packets are being dropped at gateways. It is worth noting that a few badly-behaved hosts can by themselves congest the gateways and prevent other hosts from passing traffic. We have observed this problem repeatedly with certain hosts (with whose administrators we have communicated privately) on the ARPANET.

Adding more memory to the gateways will not solve the problem. The more memory added, the longer round-trip times must become before packets are dropped. The onset of congestion collapse will thus be delayed, but when collapse occurs an even larger fraction of the packets in the net will be duplicates and throughput will be even worse.
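A rough sketch of this effect, with assumed figures: at saturation the gateway queue is full, so queuing delay, and with it the round-trip time, grows in proportion to buffer size, and each retransmit interval that elapses while the original waits in queue puts another duplicate copy into the net:

    # All figures assumed for illustration.
    service_rate_pps = 100.0  # packets/second the gateway can forward
    rto_s = 2.0               # a host's maximum retransmit interval

    for buffer_pkts in (50, 200, 800):
        delay_s = buffer_pkts / service_rate_pps  # queuing delay, full queue
        # Every full retransmit interval spent waiting in the queue puts
        # one more copy of the datagram into the net.
        dup_copies = int(delay_s // rto_s)
        print(f"buffer={buffer_pkts:4d} pkts: queuing delay {delay_s:5.1f}s, "
              f"~{dup_copies} duplicates per packet at saturation")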

