The solution to the small-packet problem

Connected: An Internet Encyclopedia
RFC 896

Clearly an adaptive approach is desirable. One would expect a proposal for an adaptive inter-packet time limit based on the round-trip delay observed by TCP. While such a mechanism could certainly be implemented, it is unnecessary. A simple and elegant solution has been discovered.

The solution is to inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged. This inhibition is to be unconditional; no timers, tests for size of data received, or other conditions are required. Implementation typically requires one or two lines inside a TCP program.
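The rule above can be illustrated with a minimal sketch. This is not code from any real TCP implementation; the class NagleSender and its methods write and on_ack are hypothetical names chosen for illustration, and the sketch ignores window size, segment size limits, and retransmission. It assumes that queued data is sent as soon as any acknowledgment arrives.

```python
class NagleSender:
    """Toy model of the send rule: hold new user data while earlier
    data is still unacknowledged. Names here are illustrative only."""

    def __init__(self):
        self.pending = b""   # data accepted from the user, not yet sent
        self.unacked = 0     # bytes sent but not yet acknowledged
        self.sent = []       # segments handed to the network, for inspection

    def write(self, data: bytes) -> None:
        """User writes data. Send at once only if nothing is outstanding."""
        self.pending += data
        if self.pending and self.unacked == 0:
            self._send_pending()

    def on_ack(self, nbytes: int) -> None:
        """An acknowledgment arrives. Re-examine state and send queued data."""
        self.unacked = max(0, self.unacked - nbytes)
        if self.pending:
            self._send_pending()

    def _send_pending(self) -> None:
        # Everything queued goes out as one segment (no window or MSS limit here).
        self.sent.append(self.pending)
        self.unacked += len(self.pending)
        self.pending = b""


sender = NagleSender()
sender.write(b"a")   # connection idle: sent immediately
sender.write(b"b")   # "a" still unacknowledged: held
sender.write(b"c")   # held, coalesced with "b"
sender.on_ack(1)     # ACK for "a": the queued "bc" goes out as one segment
print(sender.sent)   # [b'a', b'bc']
```

The inhibition itself is the single test on self.unacked in write, which is consistent with the remark that an implementation needs only one or two lines.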

At first glance, this solution seems to imply drastic changes in the behavior of TCP. This is not so. It all works out right in the end. Let us see why this is so.

When a user process writes to a TCP connection, TCP receives some data. It may hold that data for future sending or may send a packet immediately. If it refrains from sending now, it will typically send the data later when an incoming packet arrives and changes the state of the system. The state changes in one of two ways; the incoming packet acknowledges old data the distant host has received, or announces the availability of buffer space in the distant host for new data. (This last is referred to as "updating the window"). Each time data arrives on a connection, TCP must reexamine its current state and perhaps send some packets out. Thus, when we omit sending data on arrival from the user, we are simply deferring its transmission until the next message arrives from the distant host. A message must always arrive soon unless the connection was previously idle or communications with the other end have been lost. In the first case, the idle connection, our scheme will result in a packet being sent whenever the user writes to the TCP connection. Thus we do not deadlock in the idle condition. In the second case, where the distant host has failed, sending more data is futile anyway. Note that we have done nothing to inhibit normal TCP retransmission logic, so lost messages are not a problem.

Examination of the behavior of this scheme under various conditions demonstrates that the scheme does work in all cases. The first case to examine is the one we wanted to solve, that of the character-oriented Telnet connection. Let us suppose that the user is sending TCP a new character every 200ms, and that the connection is via an Ethernet with a round-trip time including software processing of 50ms. Without any mechanism to prevent small-packet congestion, one packet will be sent for each character, and response will be optimal. Overhead will be 4000%, but this is acceptable on an Ethernet. The classic timer scheme, with a limit of 2 packets per second, will cause two or three characters to be sent per packet. Response will thus be degraded even though on a high-bandwidth Ethernet this is unnecessary. Overhead will drop to 1500%, but on an Ethernet this is a bad tradeoff. With our scheme, every character the user types will find TCP with an idle connection, and the character will be sent at once, just as in the no-control case. The user will see no visible delay. Thus, our scheme performs as well as the no-control scheme and provides better responsiveness than the timer scheme.
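The claim that the timer scheme sends two or three characters per packet can be checked with a short sketch. It assumes a simplified timer that fires on fixed 500ms boundaries and sends whatever arrived in each period; the function name timer_batches and its parameters are illustrative, not taken from any implementation.

```python
from collections import Counter


def timer_batches(n_chars, interval_ms=200, period_ms=500):
    """Group keystrokes by the timer period in which they arrive.

    Keystroke i arrives at i * interval_ms. Each timer period yields one
    packet carrying every keystroke that arrived during that period.
    """
    counts = Counter((i * interval_ms) // period_ms for i in range(n_chars))
    return [counts[k] for k in sorted(counts)]


print(timer_batches(10))  # [3, 2, 3, 2]
```

With a keystroke every 200ms and a packet every 500ms, the batches alternate between three and two characters, so each character waits up to 400ms before being sent even though the Ethernet could have carried it at once.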

The second case to examine is the same Telnet test but over a long-haul link with a 5-second round trip time. Without any mechanism to prevent small-packet congestion, 25 new packets would be sent in 5 seconds.¹ Overhead here is 4000%. With the classic timer scheme, and the same limit of 2 packets per second, there would still be 10 packets outstanding and contributing to congestion. Round-trip time will not be improved by sending many packets, of course; in general it will be worse since the packets will contend for line time. Overhead now drops to 1500%. With our scheme, however, the first character from the user would find an idle TCP connection and would be sent immediately. The next 24 characters, arriving from the user at 200ms intervals, would be held pending a message from the distant host. When an ACK arrived for the first packet at the end of 5 seconds, a single packet with the 24 queued characters would be sent. Our scheme thus results in an overhead reduction to 320% with no penalty in response time. Response time will usually be improved with our scheme because packet overhead is reduced, here by a factor of 4.7 over the classic timer scheme. Congestion will be reduced by this factor and round-trip delay will decrease sharply. For this case, our scheme has a striking advantage over either of the other approaches.
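The packet counts and the 320% figure can be reproduced with a small simulation. This is a sketch under stated assumptions: one character is one byte, each packet carries the 40 bytes of header used elsewhere in this memo, an ACK arrives exactly one round trip after each send and covers all outstanding data, and the function names are hypothetical.

```python
HEADER_BYTES = 40  # per-packet header size assumed throughout the text


def nagle_packets(n_chars, interval_ms, rtt_ms):
    """Return the payload size (in characters) of each packet sent."""
    packets = []
    queued = 0
    ack_due = None  # time the outstanding data is acknowledged; None if idle
    for i in range(n_chars):
        t = i * interval_ms
        # Handle any ACK that arrived at or before this keystroke.
        while ack_due is not None and ack_due <= t:
            if queued:
                packets.append(queued)   # ACK releases the queued characters
                queued = 0
                ack_due += rtt_ms
            else:
                ack_due = None           # nothing outstanding: connection idle
        queued += 1
        if ack_due is None:
            packets.append(queued)       # idle connection: send at once
            queued = 0
            ack_due = t + rtt_ms
    if queued:
        packets.append(queued)           # released by the final ACK
    return packets


def overhead_percent(packets):
    """Header bytes as a percentage of payload bytes."""
    return 100 * HEADER_BYTES * len(packets) / sum(packets)


long_haul = nagle_packets(25, interval_ms=200, rtt_ms=5000)
print(long_haul)                          # [1, 24]
print(overhead_percent(long_haul))        # 320.0
print(overhead_percent([1] * 25))         # 4000.0 (one packet per character)
print(round(1500 / 320, 1))               # 4.7, using the 1500% timer figure from the text
```

Two packets carry all 25 characters, so 80 bytes of header accompany 25 bytes of data, which gives the 320% overhead quoted above.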

We use our scheme for all TCP connections, not just Telnet connections. Let us see what happens for a file transfer data connection using our technique. The two extreme cases will again be considered.

As before, we first consider the Ethernet case. The user is now writing data to TCP in 512 byte blocks as fast as TCP will accept them. The user's first write to TCP will start things going; our first datagram will be 512+40 bytes or 552 bytes long. The user's second write to TCP will not cause a send but will cause the block to be buffered. Assume that the user fills up TCP's outgoing buffer area before the first ACK comes back. Then when the ACK comes in, all queued data up to the window size will be sent. From then on, the window will be kept full, as each ACK initiates a sending cycle and queued data is sent out. Thus, after a one round-trip time initial period when only one block is sent, our scheme settles down into a maximum-throughput condition. The delay in startup is only 50ms on the Ethernet, so the startup transient is insignificant. All three schemes provide equivalent performance for this case.

Finally, let us look at a file transfer over the 5-second round trip time connection. Again, only one packet will be sent until the first ACK comes back; the window will then be filled and kept full. Since the round-trip time is 5 seconds, only 512 bytes of data are transmitted in the first 5 seconds. Assuming a 2K window, once the first ACK comes in, 2K of data will be sent and a steady rate of 2K per 5 seconds will be maintained thereafter. Only for this case is our scheme inferior to the timer scheme, and the difference is only in the startup transient; steady-state throughput is identical. The naive scheme and the timer scheme would both take 250 seconds to transmit a 100K byte file under the above conditions and our scheme would take 254 seconds, a difference of 1.6%.
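The transfer times can be worked through with a short calculation. The sketch assumes that K means 1024 bytes, that exactly one window of data moves per round trip in steady state, and that the file is counted as delivered at the end of the last round trip; the function transfer_seconds is a hypothetical helper.

```python
def transfer_seconds(file_bytes, window_bytes, rtt_s, first_send_bytes=None):
    """Estimate transfer time at one window of data per round trip.

    If first_send_bytes is given, the first round trip carries only that
    many bytes (the startup transient); later round trips carry a full window.
    """
    if first_send_bytes is None:
        return file_bytes / window_bytes * rtt_s
    remaining = file_bytes - first_send_bytes
    return rtt_s + remaining / window_bytes * rtt_s


FILE = 100 * 1024    # 100K byte file
WINDOW = 2 * 1024    # 2K window
RTT = 5              # seconds

baseline = transfer_seconds(FILE, WINDOW, RTT)
ours = transfer_seconds(FILE, WINDOW, RTT, first_send_bytes=512)
print(baseline)  # 250.0
print(ours)      # 253.75
```

Under these assumptions the startup transient costs 3.75 seconds, which the text rounds up to 254 seconds in total, a difference of about 1.6%.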

Thus, for all cases examined, our scheme provides at least 98% of the performance of both other schemes, and provides a dramatic improvement in Telnet performance over paths with long round trip times. We use our scheme in the Ford Aerospace Software Engineering Network, and are able to run screen editors over Ethernet and talk to distant TOPS-20 hosts with improved performance in both cases.

¹ This problem is not seen in the pure ARPANET case because the IMPs will block the host when the count of packets outstanding becomes excessive, but in the case where a pure datagram local net (such as an Ethernet) or a pure datagram gateway (such as an ARPANET / MILNET gateway) is involved, it is possible to have large numbers of tiny packets outstanding.
