Observations from a Trans-Pacific cache pair

Kathy J Richardson, Digital Network Systems Laboratory
in collaboration with

Donald Neal, University of Waikato, New Zealand
Duane Wessels and Kim Claffy at NLANR

Abstract/Introduction


This paper reports on a detailed analysis of tcpdump traces with persistent and non-persistent connections for identical URL sets. On March 17th, 1997, tcpdump traces were taken on a benchmark configuration. A client on the University of Waikato network made sequential HTTP requests to its local cache for 93 unique URLs. The list of requests was repeated 5 times for each of two configurations: Harvest with persistent connections and Harvest with non-persistent connections.

The tests were performed sequentially, so they represent slightly different network loads. Regardless, a straightforward comparison of the total time is not an adequate measure of the performance difference; loss on an individual request can severely perturb the overall average. In fact, the total elapsed time for the set of URLs shows the persistent configuration taking longer than the non-persistent configuration.

The tcpdump traces between a New Zealand proxy and a parent proxy cache in Palo Alto, California, show the benefits of persistent connections and the impact of cache hierarchy decisions on performance, and they illustrate TCP implementation problems over long links.

Performance benefits from persistent connections


The primary goal is to understand how persistent connections improve customer service. The primary customer service metric for a cache is the latency distribution for servicing page requests.
Because long-tailed distributions dramatically affect the average or cumulative response time, it is important to look at the median time, a truncated average, and/or a distribution of the response times.
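
The three summaries named above can be sketched in a few lines of Python. This is only an illustration, not the script used for this paper: the function name `summarize`, the 5 second default cutoff, and the sample latencies are all assumptions made for the example.

```python
from statistics import median

def summarize(latencies, cutoff=5.0):
    """Summarise per-request latencies (seconds) so the tail cannot dominate."""
    body = [t for t in latencies if t < cutoff]    # requests under the cutoff
    tail = [t for t in latencies if t >= cutoff]   # the long tail
    return {
        "median": median(latencies),
        "truncated_mean": sum(body) / len(body) if body else None,
        "coverage": len(body) / len(latencies),    # fraction of requests in the body
        "tail_mean": sum(tail) / len(tail) if tail else None,
    }

# Made-up sample: five ordinary requests and one that hit a long stall.
sample = [0.8, 0.9, 1.0, 1.1, 1.2, 30.0]
stats = summarize(sample)
plain_mean = sum(sample) / len(sample)
print(stats, plain_mean)
```

In the made-up sample a single 30 second request pulls the plain mean far above what a typical request saw, while the median and the truncated mean stay close to one second.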

Median bin:
	Persistent: in the >.75 and <1.0 second bin. The bin encompasses the 30-60 percentiles.
	Non-persistent: median at the 1.25 second bin boundary.

Truncated average (averaging all requests requiring less than 5 seconds):
	Persistent: 1.127 seconds, includes 92% of all references. Tail average: 25.129 seconds.
	Non-persistent: 1.446 seconds, includes 89% of all references. Tail average: 14.169 seconds.

The truncated-average round-trip time (including everything less than 5 seconds), as measured from the UDP RT times, was .2778 seconds. The improvement in truncated-average response time (1.446 - 1.127 = .319 seconds) is approximately one RTT. This is in fact what we would expect.

Here is the configuration:
	---------		------------		-------------
	| Client |              | NZ cache |		| Palo Alto |
	|        |              | 	   |		| cache     |
	---------		------------		-------------
       no persistent		Persistent		Persistent
	capabilities		capabilities		capabilities

The only saving is the TCP connection setup (the SYN/SYN-ACK exchange) between the NZ cache and the PA cache.

The steady state case for the Non-persistent configuration:	
	Client - NZ cache  : tcp connection setup
			   : GET HTTP request
			   : tcp data transfer back to Client
	NZ cache - PA cache: UDP ICP
	NZ cache - PA cache: tcp connection setup
			   : GET HTTP request
			   : tcp data transfer back to NZ cache

The steady state case for the Persistent configuration:	
	Client - NZ cache  : tcp connection setup
			   : GET HTTP request
			   : tcp data transfer back to Client
	NZ cache - PA cache: UDP ICP
	NZ cache - PA cache:
			   : GET HTTP request
			   : tcp data transfer back to NZ cache
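
The two step lists above differ by one trans-Pacific round trip, which the following sketch makes explicit. It is a deliberately simplified model: it counts only round trips on the NZ cache to PA cache leg before the first data packet arrives, and it ignores loss, retransmission, and slow start. The function name `upstream_round_trips` is made up for this example; the timing figures are the ones reported in this paper.

```python
RTT_PACIFIC = 0.278  # seconds; truncated-average ICP round trip reported in this paper

def upstream_round_trips(persistent):
    """Count NZ cache <-> PA cache round trips before the first data packet."""
    trips = 1            # ICP query and reply over UDP
    if not persistent:
        trips += 1       # SYN / SYN-ACK to open a new TCP connection
    trips += 1           # GET request and first data packet
    return trips

# Saving predicted by the model: one round trip.
saving = (upstream_round_trips(False) - upstream_round_trips(True)) * RTT_PACIFIC

# Gap between the truncated averages reported in this paper.
measured_gap = 1.446 - 1.127

print(f"predicted saving {saving:.3f} s, measured gap {measured_gap:.3f} s")
```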

An additional hope is that persistent connections will eliminate the slow start effects for subsequent connections. Since the saving was roughly equal to a single RTT, it is unlikely that avoiding slow start gave a significant performance improvement for this test case.

The persistent connection did eliminate slow start for many requests, but not all. Any additional delays in the UDP or request setup reactivated the slow start mechanism. The real reason for little or no performance improvement from slow start is that the tcp window for the NZ cache was much too small to fill the pipe, and most of the time the two caches were waiting for ack/data from the other. See section 3.
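
A rough calculation shows why a small window cannot fill a long pipe. The sketch below uses the 8760 byte window that the NZ cache (osiris) advertises in the trace later in this paper and the .278 second round trip. The link speed is not stated in this paper, so `ASSUMED_LINK_BYTES_PER_S` is a purely hypothetical figure chosen for illustration, and both function names are made up for the example.

```python
def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes / rtt_seconds

def window_to_fill_pipe(bandwidth_bytes_per_s, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to keep the link busy."""
    return bandwidth_bytes_per_s * rtt_seconds

NZ_WINDOW = 8760                      # bytes, window advertised by osiris in the trace
RTT = 0.278                           # seconds, truncated-average round trip
ASSUMED_LINK_BYTES_PER_S = 250_000    # hypothetical 2 Mbit/s link, NOT from the paper

ceiling = window_limited_throughput(NZ_WINDOW, RTT)
needed = window_to_fill_pipe(ASSUMED_LINK_BYTES_PER_S, RTT)

print(f"throughput ceiling: {ceiling:.0f} bytes/s")
print(f"window needed for the assumed link: {needed:.0f} bytes")
```

With these figures the ceiling is about 31.5 kilobytes per second whatever the capacity of the link, because the sender must stop and wait for an ack once a full window is outstanding.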

One of the main advantages of persistent connections is that they reduce the amount of state required at the server. Persistent connections halved the number of connections made at the NZ cache and at the PA cache. If the client supported persistent connections, the number of TCP connections for the test on the NZ cache would have been 2, instead of 465 with the client plus 1 with the PA cache. The same would hold if servers supported persistent connections: many connections to the same server would collapse into a single connection.
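
The connection counts above follow directly from the test parameters (93 URLs, 5 passes). The sketch below reproduces the arithmetic under the idealised assumption that a persistent connection is opened once and never dropped during the test; the function name `nz_cache_connections` is made up for the example.

```python
URLS = 93
PASSES = 5
REQUESTS = URLS * PASSES  # 465 requests per configuration

def nz_cache_connections(client_persistent, upstream_persistent):
    """TCP connections seen by the NZ cache over one full test run."""
    client_side = 1 if client_persistent else REQUESTS
    upstream_side = 1 if upstream_persistent else REQUESTS
    return client_side + upstream_side

print(nz_cache_connections(False, False))  # neither side persistent
print(nz_cache_connections(False, True))   # the persistent configuration tested here
print(nz_cache_connections(True, True))    # if the client were persistent as well
```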

Hierarchical cache


There are several reasons to use hierarchical caches, but if they are not used carefully, they result in much poorer response time without providing any other benefits. The inter-cache communication adds latency to each cache miss. Care needs to be taken to understand the impact and the benefits of the ICP. In the test configuration the NZ cache had a single parent, the PA cache. The parent cache was polled to see if it had the data before the data was retrieved. I'm told this was a configuration error, but it illustrates the importance of understanding how the cache protocol affects performance. First asking the parent cache if it has the data, and then requesting the data from that parent regardless of the answer, introduces an additional RT delay without providing any bandwidth saving. From the tcpdump trace the penalty is evaluated by examining the UDP time.

Inter-cache Protocol overhead - UDP message time

For this trans-Pacific link the truncated-average UDP response time was .278 seconds. By running the cache in single parent mode, the response time to the client would have improved by another .278 seconds, to .849 sec for the persistent configuration and 1.168 sec for the non-persistent configuration. The tcpdump trace looks like:
Client FIN for previous request and ack to NZ cache:
12:31:53.958256 memphis.cc.waikato.ac.nz.1736 > osiris.3128: F 91:91(0) ack 7519 win 33580
12:31:53.958381 osiris.3128 > memphis.cc.waikato.ac.nz.1736: . ack 92 win 8760
Client tcp SYN with NZ cache for next request:
12:31:53.997898 memphis.cc.waikato.ac.nz.1737 > osiris.3128: S 473856000:473856000(0) win 32768
12:31:53.998199 osiris.3128 > memphis.cc.waikato.ac.nz.1737: S 1811449806:1811449806(0) ack
12:31:53.999886 memphis.cc.waikato.ac.nz.1737 > osiris.3128: . ack 1 win 33580
Client sends GET request in data pkt:
12:31:54.000257 memphis.cc.waikato.ac.nz.1737 > osiris.3128: P 1:89(88) ack 1 win 33580
NZ Cache sends UDP request to parent PA cache and waits for response:
12:31:54.006140 osiris.3130 > cache.nlanr.pa-x.dec.com.3130: udp 65
12:31:54.048889 osiris.3128 > memphis.cc.waikato.ac.nz.1737: . ack 89 win 8760
12:31:54.279761 cache.nlanr.pa-x.dec.com.3130 > osiris.3130: udp 61
Got response - now send GET request via tcp (for non-persistent connections this would first involve establishing a tcp connection):
12:31:54.282252 osiris.46918 > cache.nlanr.pa-x.dec.com.3128: P 257:375(118) ack 56491 win 8760
**12:31:54.468909 osiris.46918 > cache.nlanr.pa-x.dec.com.3128: P 257:375(118) ack 56491 win 8760
12:31:54.647664 cache.nlanr.pa-x.dec.com.3128 > osiris.46918: . ack 375 win 33580
PA cache sends back first data packets:
12:31:54.679161 cache.nlanr.pa-x.dec.com.3128 > osiris.46918: P 56491:56703(212) ack 375 win 33580
12:31:54.718401 cache.nlanr.pa-x.dec.com.3128 > osiris.46918: . ack 375 win 33580
** Note: The NZ cache tcp implementation doesn't properly calculate the RTT for use as a response timeout, even though it has sufficient information to do so.
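
The intervals behind this note can be read straight off the timestamps in the trace. The sketch below is only a convenience for doing that subtraction; the helper name `seconds_between` is made up, and the timestamps are copied from the trace above.

```python
from datetime import datetime

def seconds_between(start, end):
    """Difference in seconds between two tcpdump-style HH:MM:SS.ffffff stamps."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# ICP query leaves the NZ cache -> ICP reply arrives from the PA cache.
icp_rtt = seconds_between("12:31:54.006140", "12:31:54.279761")

# First GET to the PA cache -> the repeated GET marked ** in the trace.
retransmit_gap = seconds_between("12:31:54.282252", "12:31:54.468909")

# First GET to the PA cache -> the PA cache's ack 375 arrives.
ack_delay = seconds_between("12:31:54.282252", "12:31:54.647664")

print(f"ICP round trip  {icp_rtt:.6f} s")
print(f"retransmit gap  {retransmit_gap:.6f} s")
print(f"ack delay       {ack_delay:.6f} s")
```

The GET is resent after roughly 0.19 seconds, which is shorter than the roughly 0.27 second round trip the NZ cache had just observed on the ICP exchange, so the repeat was sent before an ack could have arrived.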

If there is a single parent cache, the parent should not be polled prior to the actual data request. Under most circumstances multiple parenting should be avoided for similar reasons: it is unlikely that the additional parents will contribute substantially to the hit ratio, and waiting for the response is costly. Having multiple parents that resolve to a single parent for each URL is fine, or should be: caches shouldn't do ICP when there is a single parent that will receive the request regardless of the answer. Caches should automatically determine if the ICP is superfluous.
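
The check a cache could make before sending ICP can be sketched as follows. This is a hypothetical illustration of the rule stated above, not the Harvest implementation or its configuration syntax; the function names `should_query_icp` and `plan_fetch` and the parent names are invented for the example.

```python
def should_query_icp(candidate_parents):
    """ICP is only worth a round trip if the replies can change the destination."""
    return len(set(candidate_parents)) > 1

def plan_fetch(candidate_parents):
    """Decide how to fetch a URL given the parents eligible for it."""
    if not candidate_parents:
        raise ValueError("no parent configured for this URL")
    if should_query_icp(candidate_parents):
        # Several possible destinations: the ICP answers can pick between them.
        return ("icp_then_fetch", sorted(set(candidate_parents)))
    # Only one possible destination: send the request to it straight away.
    return ("fetch", candidate_parents[0])

print(plan_fetch(["pa-cache"]))                 # single parent, no ICP
print(plan_fetch(["pa-cache", "other-cache"]))  # more than one choice, ICP first
```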

Multiple parents are often used for redundancy in the case where a parent dies or becomes unreachable. There should be other mechanisms for dealing with this. A particular cache might not have these features, but to the end user it is a costly way to build in fault tolerance. Losing or stalling on a few requests is much better. Detecting the stall and switching to a backup parent is the way to go.
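
One way to detect a stall and switch to a backup parent is sketched below. It is a minimal illustration under stated assumptions, not a feature of any particular cache: the class name `ParentFailover`, the 5 second stall timeout, and the `fetch(parent, url, timeout=...)` callable (assumed to raise `TimeoutError` or `ConnectionError` when a parent stalls or is unreachable) are all hypothetical.

```python
class ParentFailover:
    """Send requests to one parent at a time; switch parents only after a stall."""

    def __init__(self, parents, fetch, stall_timeout=5.0):
        self.parents = list(parents)
        self.fetch = fetch                  # hypothetical callable, see lead-in
        self.stall_timeout = stall_timeout  # seconds before a parent counts as stalled
        self.active = 0                     # index of the parent currently in use

    def get(self, url):
        # Try each parent at most once per request, starting with the active one.
        for _ in range(len(self.parents)):
            parent = self.parents[self.active]
            try:
                return self.fetch(parent, url, timeout=self.stall_timeout)
            except (TimeoutError, ConnectionError):
                # Remember the switch so later requests skip the stalled parent.
                self.active = (self.active + 1) % len(self.parents)
        raise RuntimeError(f"no parent could serve {url}")
```

Only the request that hits the stall pays the timeout. Every other request goes to a single parent with no ICP round trip, which is the trade-off argued for above.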

Parental miss distributions

Miss times for requests through the Digital Equipment Palo Alto proxy have a median of between .25 and .5 sec. The PA cache used in these experiments shares the same Internet connection with the Digital proxy, and should experience the same miss distribution. This means that even when the parent cache is likely to miss, the time spent asking it whether it has the data is comparable to the time it takes the parent to fetch data it doesn't have.

TCP over long/slow links


The TCP implementation and parameters on the NZ cache seemed very poor. Several problems were evident from the tcpdump trace:

Future work - extrapolating.

Extrapolating to Satellite links
Extrapolating improvements to hierarchies.

Discussion - is a State-side cache a win?