The tests were performed sequentially, so they represent slightly different network loads. Regardless, a straightforward comparison of total time is not an adequate measure of the performance difference, because loss on an individual request can severely perturb the overall average. In fact, the total elapsed time for the set of URLs shows the persistent connection configuration taking longer than the non-persistent one.
The tcpdump traces between a New Zealand proxy and a parent proxy cache in Palo Alto, California, show the benefits of persistent connections and the impact of cache hierarchy decisions on performance, and they illustrate TCP implementation problems over long links.
----------       ------------       -------------
| Client |       | NZ cache |       | Palo Alto |
|        |       |          |       |   cache   |
----------       ------------       -------------
no persistent     Persistent         Persistent
capabilities      capabilities       capabilities
We only save the SYN/SYN between the NZ cache and the PA cache.

The steady state case for the Non-persistent configuration:

Client - NZ cache  : tcp connection setup
                   : GET HTTP request
                   : tcp data transfer back to Client
NZ cache - PA cache: UDP ICP
NZ cache - PA cache: tcp connection setup
                   : GET HTTP request
                   : tcp data transfer back to NZ cache

The steady state case for the Persistent configuration:

Client - NZ cache  : tcp connection setup
                   : GET HTTP request
                   : tcp data transfer back to Client
NZ cache - PA cache: UDP ICP
NZ cache - PA cache:
                   : GET HTTP request
                   : tcp data transfer back to NZ cache
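The two steady state cases differ by a single step on the NZ cache - PA cache leg. As a rough sketch only, the saving can be estimated by counting each step that must complete before the first data comes back as one round trip. That counting rule is a simplifying assumption, as is reusing the .278 second RTT quoted for this link; the names below are illustrative.

```python
# Rough per-request latency model for the NZ cache - PA cache leg.
# Assumption: each listed step costs one round trip before the first
# data packet arrives; the transfer time of the data itself is ignored.

RTT_S = 0.278  # round-trip time quoted for this link, in seconds

NON_PERSISTENT_STEPS = ["UDP ICP", "tcp connection setup", "GET HTTP request"]
PERSISTENT_STEPS = ["UDP ICP", "GET HTTP request"]


def time_to_first_data(steps, rtt_s=RTT_S):
    """Return the estimated delay before the first data packet arrives."""
    return len(steps) * rtt_s


saving = (time_to_first_data(NON_PERSISTENT_STEPS)
          - time_to_first_data(PERSISTENT_STEPS))

print(f"non-persistent: {time_to_first_data(NON_PERSISTENT_STEPS):.3f} s")
print(f"persistent:     {time_to_first_data(PERSISTENT_STEPS):.3f} s")
print(f"saved:          {saving:.3f} s per request")
```

Under this model the persistent configuration saves exactly one round trip per request, and the ICP query still costs as much as the step that was removed.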
The persistent connection did eliminate slow start for many requests, but not all. Any additional delay in the UDP or request setup reactivated the slow start mechanism. The real reason for little or no performance improvement from avoiding slow start is that the TCP window for the NZ cache was much too small to fill the pipe, so most of the time the two caches were waiting for an ack or data from the other. See section 3.
One of the main advantages of persistent connections is that they reduce the amount of state required at the server. Persistent connections halved the number of connections made in the NZ cache and the PA cache. If the client had supported persistent connections, the number of TCP connections for the test on the NZ cache would have been 2, instead of 465 with the client plus 1 with the PA cache. A similar situation holds if servers support persistent connections: many connections to the same server would collapse into a single connection.
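The connection counts above can be reproduced with a small tally. This is a sketch that assumes every one of the 465 requests was forwarded to the PA cache, and that a non-persistent leg opens one TCP connection per request; the function name is illustrative.

```python
# Tally of TCP connections held by the NZ cache during the test.
# Assumption: all requests are forwarded to the PA cache, and a
# non-persistent leg opens one connection per request.

REQUESTS = 465


def nz_cache_connections(requests, client_persistent, parent_persistent):
    """Return the number of TCP connections seen by the NZ cache."""
    client_side = 1 if client_persistent else requests
    parent_side = 1 if parent_persistent else requests
    return client_side + parent_side


neither = nz_cache_connections(REQUESTS, False, False)      # no persistence
parent_only = nz_cache_connections(REQUESTS, False, True)   # tested setup
both = nz_cache_connections(REQUESTS, True, True)           # persistent client

print(neither, parent_only, both)
print(f"ratio with persistent parent leg: {parent_only / neither:.2f}")
```

With these assumptions the tested configuration uses 466 connections against 930 without persistence, which is the halving described in the text.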
Client FIN for previous request and ack to NZ cache:
12:31:53.958256 memphis.cc.waikato.ac.nz.1736 > osiris.3128: F 91:91(0) ack 7519 win 33580
12:31:53.958381 osiris.3128 > memphis.cc.waikato.ac.nz.1736: . ack 92 win 8760

Client tcp SYN with NZ cache for next request:
12:31:53.997898 memphis.cc.waikato.ac.nz.1737 > osiris.3128: S 473856000:473856000(0) win 32768
12:31:53.998199 osiris.3128 > memphis.cc.waikato.ac.nz.1737: S 1811449806:1811449806(0) ack
12:31:53.999886 memphis.cc.waikato.ac.nz.1737 > osiris.3128: . ack 1 win 33580

Client sends GET request in data pkt:
12:31:54.000257 memphis.cc.waikato.ac.nz.1737 > osiris.3128: P 1:89(88) ack 1 win 33580

NZ Cache sends UDP request to parent PA cache and waits for response:
12:31:54.006140 osiris.3130 > cache.nlanr.pa-x.dec.com.3130: udp 65
12:31:54.048889 osiris.3128 > memphis.cc.waikato.ac.nz.1737: . ack 89 win 8760
12:31:54.279761 cache.nlanr.pa-x.dec.com.3130 > osiris.3130: udp 61

Got response - now send GET request via tcp (for non-persistent
connections this would first involve establishing a tcp connection):
12:31:54.282252 osiris.46918 > cache.nlanr.pa-x.dec.com.3128: P 257:375(118) ack 56491 win 8760
**12:31:54.468909 osiris.46918 > cache.nlanr.pa-x.dec.com.3128: P 257:375(118) ack 56491 win 8760
12:31:54.647664 cache.nlanr.pa-x.dec.com.3128 > osiris.46918: . ack 375 win 33580

PA cache sends back first data packets:
12:31:54.679161 cache.nlanr.pa-x.dec.com.3128 > osiris.46918: P 56491:56703(212) ack 375 win 33580
12:31:54.718401 cache.nlanr.pa-x.dec.com.3128 > osiris.46918: . ack 375 win 33580**

Note: The NZ cache tcp implementation doesn't properly calculate the RTT
for use as a response timeout, even though it has sufficient information
to do so.
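The intervals behind that note can be read straight off the trace timestamps. The sketch below computes three of them; treating the second copy of the GET segment (same 257:375 sequence range) as a timeout-driven retransmission is an interpretation of the trace, and the helper name is illustrative.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"  # tcpdump timestamp layout used in the trace


def seconds_between(start, end):
    """Return the elapsed seconds between two tcpdump timestamps."""
    delta = (datetime.strptime(end, TIME_FORMAT)
             - datetime.strptime(start, TIME_FORMAT))
    return delta.total_seconds()


# UDP ICP query sent -> ICP reply received
icp_round_trip = seconds_between("12:31:54.006140", "12:31:54.279761")
# first GET segment sent -> same segment sent again
retransmit_gap = seconds_between("12:31:54.282252", "12:31:54.468909")
# first GET segment sent -> ack 375 received from the PA cache
ack_delay = seconds_between("12:31:54.282252", "12:31:54.647664")

print(f"ICP round trip: {icp_round_trip:.6f} s")
print(f"retransmit gap: {retransmit_gap:.6f} s")
print(f"GET ack delay:  {ack_delay:.6f} s")
```

The GET segment is sent again about 0.19 seconds after the first copy, which is less than the roughly 0.27 second round trip the NZ cache had just observed on the ICP exchange.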
If there is a single parent cache, the parent should not be polled prior to the actual data request. Under most circumstances multiple parenting should be avoided for similar reasons. It is unlikely that the additional parents will contribute substantially to the hit ratio, and waiting for the response is costly. Having multiple parents that resolve to a single parent for each URL is fine, or should be (caches shouldn't do ICP when there is a single parent that will receive the request regardless of the answer.) Caches should automatically determine if the ICP is superfluous.
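The check described above, whether an ICP query can change where the request goes, might look like the following. This is a hypothetical sketch and not the API of any particular cache; `should_send_icp` and the second host name are invented for illustration.

```python
def should_send_icp(candidate_parents):
    """Return True only if an ICP answer could change the chosen parent.

    candidate_parents holds the parents eligible for one URL after any
    per-URL routing rules have been applied (an assumed input).
    """
    return len(set(candidate_parents)) > 1


# Single parent: it receives the request whatever the answer, so the
# ICP round trip is pure delay.
print(should_send_icp(["cache.nlanr.pa-x.dec.com"]))

# Two distinct parents for the same URL: ICP can influence the choice.
print(should_send_icp(["cache.nlanr.pa-x.dec.com", "parent2.example"]))
```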
Multiple parents are often used for redundancy in the case where a parent
dies or becomes unreachable. There should be other mechanisms for dealing
with this. A particular cache might not have these features, but to the end
user multiple parenting is a costly way to build in fault tolerance. Losing
or stalling on a few requests is much better. Detecting the stall and
switching to a backup parent is the way to go.
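One way to picture stall detection with a backup parent is a loop that tries parents in order and moves on when a fetch times out. This is only a sketch: `fetch_via_parents`, the `fetch` callable and its signature, the host names, and the 5 second default are all assumptions made for illustration.

```python
def fetch_via_parents(url, parents, fetch, stall_timeout_s=5.0):
    """Try each parent in order; switch to the next one on a stall.

    fetch(parent, url, timeout_s) is a caller-supplied function that
    returns the response body or raises TimeoutError when it stalls.
    """
    last_error = None
    for parent in parents:
        try:
            return parent, fetch(parent, url, stall_timeout_s)
        except TimeoutError as exc:
            last_error = exc  # remember the stall and try the next parent
    raise TimeoutError(f"all parents stalled for {url}") from last_error


def fake_fetch(parent, url, timeout_s):
    """Stand-in fetcher for the demo: the primary parent always stalls."""
    if parent == "primary.example":
        raise TimeoutError(f"{parent} did not answer within {timeout_s} s")
    return f"body of {url} via {parent}"


used, body = fetch_via_parents(
    "http://www.example/", ["primary.example", "backup.example"], fake_fetch)
print(used, body)
```

Only the requests that hit the stall pay for the timeout; no request pays an extra round trip to poll several parents up front.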
Each 1460-byte packet took about .01 seconds to transmit on a link with an
RTT of .278 seconds. This should translate to a max. window size of about 25
packets, or 36500 bytes. The PA cache had a max window size of 33580 bytes, which is
appropriate for the link latency. But in this configuration the PA cache
was not in the position to receive packet streams from the NZ network.
Still need to figure out exactly how much time was wasted.
About 1/3 of the pipe is utilized, which means that for every packet
sent the receiving host waits two packet wait times for the next packet.
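The window arithmetic above can be sketched as follows. The inputs are the rounded figures from the text, so the packet count comes out a little higher than the "about 25" used there; the function names are illustrative.

```python
MSS_BYTES = 1460       # packet payload size from the text
RTT_S = 0.278          # link round-trip time from the text
PACKET_TIME_S = 0.01   # approximate transmit time per packet from the text


def packets_in_flight(rtt_s, packet_time_s):
    """Packets needed in flight to keep the link busy for one RTT."""
    return rtt_s / packet_time_s


def window_bytes(packets, mss_bytes=MSS_BYTES):
    """Window size in bytes for a given number of full-size packets."""
    return packets * mss_bytes


def idle_packet_times(utilization):
    """Packet times spent waiting for each packet sent."""
    return 1.0 / utilization - 1.0


print(f"packets to fill the pipe: {packets_in_flight(RTT_S, PACKET_TIME_S):.1f}")
print(f"window for 25 packets:    {window_bytes(25)} bytes")
print(f"idle per packet at 1/3:   {idle_packet_times(1 / 3):.1f} packet times")
```

At one third utilization each packet sent is followed by two packet times of waiting, which is the figure given above.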
Donald's measurements on a set of 3 unique URLs:
These figures need to be taken with great caution, since the different sets
of URLs had markedly different times when compared one set against
another, so the overall median is in fact quite unstable. More useful is
the fact that for every set the test configuration was faster than the
production server using a Squid parent.
Similarly for the three-URL tests:

                     Production   Test (persistent)   Direct
Average                 27.0            29.9            12.3
Median                  18.4            11.8             9.0
Median (midnight        12.3            11.2             7.0
  - 9am only)