Connected: An Internet Encyclopedia
5.3 Interaction with data compression


Since the early 1980s, fast, effective data compression algorithms such as Lempel-Ziv[7], and programs that embody them, such as the compress program shipped with Berkeley Unix, have become widely available. When using low-speed or long-haul lines, it has become common practice to compress data before sending it. For dialup connections, this compression is often done in the modems, independent of the communicating hosts. This raises three questions: (1) Given a good data compressor, is there any need for header compression? (2) Does header compression interact with data compression? (3) Should data be compressed before or after header compression?/39/

To investigate (1), Lempel-Ziv compression was done on a trace of 446 TCP/IP packets taken from the user's side of a typical telnet conversation. Since the packets resulted from typing, almost all contained only one data byte plus 40 bytes of header. That is, the test essentially measured L-Z compression of TCP/IP headers. The compression ratio (the ratio of uncompressed to compressed data) was 2.6. In other words, the average header was reduced from 40 to 16 bytes. While this is good compression, it is far from the 5 bytes of header needed for good interactive response and far from the 3 bytes of header (a compression ratio of 13.3) that header compression yielded on the same packet trace.
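The arithmetic behind these figures can be checked directly. The sketch below is only an illustration: the function names are hypothetical, and it treats each packet as 40 bytes of header, ignoring the single data byte.

```python
HEADER_BYTES = 40  # uncompressed TCP/IP header size quoted in the text


def compressed_size(ratio: float, original: float = HEADER_BYTES) -> float:
    """Average size after compression, given uncompressed/compressed ratio."""
    return original / ratio


def ratio_for(target_bytes: float, original: float = HEADER_BYTES) -> float:
    """Compression ratio needed to shrink `original` to `target_bytes`."""
    return original / target_bytes


# L-Z at a ratio of 2.6 leaves about 15.4 bytes of a 40-byte header
# (about 15.8 if the one data byte is counted), i.e. roughly 16 bytes.
print(round(compressed_size(2.6), 1))
print(round(compressed_size(2.6, original=41), 1))

# A 3-byte header corresponds to a ratio of about 13.3; the 5-byte
# target for good interactive response corresponds to a ratio of 8.
print(round(ratio_for(3), 1))
print(ratio_for(5))
```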

Figure 10: Data compression alternatives

The second and third questions are more complex. To investigate them, several packet traces from FTP file transfers were analyzed/40/ with and without header compression and with and without L-Z compression. The L-Z compression was tried at two places in the outgoing data stream (Figure 10): (1) just before the data was handed to TCP for encapsulation (simulating compression done at the "application" level) and (2) after the data was encapsulated (simulating compression done in the modem). Table 1 summarizes the results for a 78,776-byte ASCII text file (the Unix csh.1 manual entry)/41/ transferred using the guidelines of the previous section (256-byte MTU or 216-byte MSS; 368 packets total). Compression ratios for the following ten tests are shown (reading left to right and top to bottom):

              No data    L-Z       L-Z       L-Z
              compress.  on data   on wire   on both
Raw Data      1.00       2.44      -         -
+ TCP Encap.  0.83       2.03      1.97      1.58
w/Hdr Comp.   0.98       2.39      2.26      1.66

Table 1: ASCII Text File Compression Ratios

The first column of Table 1 says the data expands by 19% ("compresses" by .83) when encapsulated in TCP/IP and by 2% when encapsulated in header compressed TCP/IP./42/ The first row says L-Z compression is quite effective on this data, shrinking it to less than half its original size. Column four illustrates the well-known fact that it is a mistake to L-Z compress already compressed data. The interesting information is in rows two and three of columns two and three. These columns say that the benefit of data compression overwhelms the cost of encapsulation, even for straight TCP/IP. They also say that it is slightly better to compress the data before encapsulating it rather than compressing at the framing/modem level. The differences, however, are small: 3% and 6%, respectively, for the TCP/IP and header compressed encapsulations./43/
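The 3% and 6% figures can be recomputed from the entries of Table 1. A minimal sketch, with the ratios copied from the table; the dictionary keys and the helper name are hypothetical:

```python
# Compression ratios from rows two and three of Table 1 (ASCII text file).
TABLE1 = {
    "tcp":      {"none": 0.83, "data": 2.03, "wire": 1.97, "both": 1.58},
    "hdr_comp": {"none": 0.98, "data": 2.39, "wire": 2.26, "both": 1.66},
}


def advantage_pct(a: float, b: float) -> float:
    """Percentage by which compression ratio `a` exceeds ratio `b`."""
    return (a / b - 1.0) * 100.0


for name, row in TABLE1.items():
    # Compressing before encapsulation ("data") vs. at the modem ("wire").
    print(name, round(advantage_pct(row["data"], row["wire"]), 1))
# prints:
#   tcp 3.0
#   hdr_comp 5.8
```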

Table 2 shows the same experiment for a 122,880-byte binary file (the Sun-3 ps executable). Although the raw data doesn't compress nearly as well, the results are qualitatively the same as for the ASCII data. The one significant change is in row two: it is about 3% better to compress the data in the modem rather than at the source if doing TCP/IP encapsulation (apparently, Sun binaries and TCP/IP headers have similar statistics). However, with header compression (row three) the results were similar to the ASCII data: it's about 3% worse to compress at the modem rather than the source./44/

              No data    L-Z       L-Z       L-Z
              compress.  on data   on wire   on both
Raw Data      1.00       1.72      -         -
+ TCP Encap.  0.83       1.43      1.48      1.21
w/Hdr Comp.   0.98       1.69      1.64      1.28

Table 2: Binary File Compression Ratios
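The shape of this experiment can be approximated in a few lines. The sketch below is not the procedure used for the tables, and it rests on several assumptions: it uses zlib (DEFLATE) as a stand-in for the L-Z compressor in compress, it replaces real TCP/IP headers with a simple packet counter, it assumes a fixed 3-byte compressed header, and it runs on synthetic text. Its numbers will therefore not match Tables 1 and 2; it only shows where each compression step sits relative to encapsulation.

```python
import zlib

MSS = 216         # data bytes per packet, per the guidelines cited in the text
TCPIP_HDR = 40    # uncompressed TCP/IP header size
COMP_HDR = 3      # assumed average compressed header size (illustrative)


def encapsulate(payload: bytes, header_len: int) -> bytes:
    """Split payload into MSS-sized segments, each led by a stand-in header."""
    out = bytearray()
    for seq, off in enumerate(range(0, len(payload), MSS)):
        # Placeholder header: just a packet counter. Real headers vary more.
        out += seq.to_bytes(header_len, "big")
        out += payload[off:off + MSS]
    return bytes(out)


def ratio(data: bytes, header_len: int, on_data: bool = False,
          on_wire: bool = False) -> float:
    """Uncompressed data size divided by the bytes placed on the wire."""
    payload = zlib.compress(data) if on_data else data   # "L-Z on data"
    wire = encapsulate(payload, header_len)
    if on_wire:                                          # "L-Z on wire"
        wire = zlib.compress(wire)
    return len(data) / len(wire)


# Synthetic, highly repetitive text; a real file would compress far less.
sample = b"".join(b"line %d: the quick brown fox jumps over the lazy dog\n" % i
                  for i in range(1500))

# Columns: no compression, on data, on wire, on both.
modes = ((False, False), (True, False), (False, True), (True, True))
for label, hdr in (("+ TCP Encap.", TCPIP_HDR), ("w/Hdr Comp.", COMP_HDR)):
    cells = [ratio(sample, hdr, d, w) for d, w in modes]
    print(label, " ".join(f"{c:.2f}" for c in cells))
```

Substituting a real file for `sample` gives a rough feel for the trade-off, subject to the same caveats about the compressor and the placeholder headers.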

