Ingrid Melve, UNINETT
Web proxy cache systems with several servers currently have no well documented ways of evaluating the relationships between cache proxy servers, for gains in terms of money, latency and bandwidth. This work intends to find a threshold value for relations between servers, based on bandwidth saved, latency reduction and local cost for Internet access.
This threshold will be of interest for work on auto configuration of large multi server proxy cache systems.
The web cache proxy software used is Squid. Co-operating web proxy cache servers using sibling mode is used to illustrate the principles. The work may easily be used to analyze parent relations. Local proxy performance is not considered, only traffic into the proxy.
Cache managers need some way of evaluating relationship. Today this is done on the basis of analyzing web cache logs for various parameters. There is no easy or consistent analysis.
My proposal is to form a "magic number" per relationship that is used to evaluate the relations. This may be used to automatically reconfigure relationships, to terminate relations if the gain drops below a threshold.
Gerhard Winkler, ACCOnet, has found that hit rates drop over time, since cache content gets homogenized. One solution to this is to use the proxy-only mode, another to establish a round-robin solution and change relations to the next on list once a cache drops below the threshold. He needs a threshold to start with.
The simplest solution is to look at bytes saved per relation. This approach is promoted by NetCache
Basic assumption is that the two reasons for running a web cache are
Latency reduction is important for clients placed a long distance from the web servers, as setting up a TCP connection takes time (1.5 round trip times). Most popular web servers are placed in the US, this is far away for non-American users.
Time to look up if the sibling has object is negligible compared with download time. Typical value is 2 orders of magnitude smaller.
Assume non-persistent connections, as used by HTTP/1.0 Even though some browsers implement Keep-Alive, Squid/1.1 does not support this.

The figure shows a cache system where the green part is clients requesting documents from web servers in pale blue. The primary proxy cache is shown in brown, and its sibling in blue.
Documents requested through the proxy is either served from the cache, directly from the origin web server (solid red line) or from the sibling (blue line). The cost of getting an object from the sibling should be lower than getting an object directly.
Calculations may happen in real time or evaluation may happen every night (or all nights except weekends). There is still work to be done to evaluate if continuous calculation is feasible.
Basic components for this threshold value are
The optimum latency measurement would be to measure median response time for hits during peak load and median response time for misses during peak load. Different response types should be separated, and emphasis placed on HTTP "200 OK" responses.
Performance issues related to computation of real-time median suggests that using weighted mean (throw away values larger than 3 times the standard deviation) yields a good enough result.
Experience in network measurements indicate that peak load may be chosen as the hour which has most load, counting Monday to Friday. Weekend traffic is significantly lower i UNINETT, students seem to take time off to party and faculty probably writes scientific papers. This may be different for other network environments.
Dimensioning should be done using the extreme conditions where web traffic is at its high water mark (high water is when your feet get wet and you feel uncomfortable unless you move on to a better place (or higher bandwidth)).
Delay with sibling is calculated by adding together delay for hits and delay for misses and divide by number of requests. Delay for hits is given by weighted mean multiplied by number of hits. Delay for misses is given by weighted mean multiplied by number of misses.
Suggested formula for delay when using web proxy cache sibling:
D = Ds*Th + Dw*Tm
D= Delay
Tc= Number of object domain HTTP connections
Th= Number of hits in object domain
Tm=Tc -Th
Ds= Delay for sibling hits
Dw= Delay for object direct from web origin server
D is measured in peak hour.
Delay without sibling is approximated by calculating delay for misses in object domain multiplied by number of requests.
Suggested formula for delay without web proxy cache sibling:
D = Dw*Tc
D= Delay
Tc= Number of object domain HTTP connections
Dw= Delay for object direct from web origin server
High document hit ratio gives low latency (there is a trade off between latency and high byte hit ratio). The document hit ratio is implicitly handled by considering latency.
Traffic volume to proxy is calculated for traffic from web servers directly to proxy (misses) and for traffic from sibling (HTTP hits and ICP responses). The ratio between direct traffic and sibling HTTP traffic is the net byte gain. The ICP traffic is added to compute the total traffic flow from sibling to proxy. If cost of traffic from sibling is low, the ICP traffic may be neglected.
The total traffic volume is composed of ICP traffic from sibling, HTTP traffic from sibling (hits) and HTTP traffic from web servers (misses).
ICP traffic from sibling is response to ICP requests. All requests for documents result in an ICP response-request to sibling (unless there is a hit at the proxy, but that case falls outside the scope of this document). Using an average object size of 3kB yields a break even of 3.3% object hit rate before the ICP traffic exceeds the HTTP traffic on the link to the sibling. Each ICP request/response uses on average 128B, 64B per request.
Traffic needs not be symmetric, the traffic volume is calculated only flowing into the web cache. Similar calculations may be done for traffic flowing to sibling cache, this is left out to simplify calculations. Most Internet connections have a loop sided traffic profile, importing more than they export. Normally connections are full duplex, and the dimensioning factor is traffic flowing into the network. In this case, outflowing traffic needs not be considered.
Cost factor for an ISP may be cost of international connection as opposed to local cost of bandwidth, or it may be cost per byte transferred out of cache. For a local sysadmin the cost factor may be the access line cost compared with local traffic exchange cost. Finding this factor is not necessarily easy, as some costs are not paid at the same organizational level as the cache servers are run.
Cost/benefit analysis from DESIRE project gives examples of costs for international versus national bandwidth.
Example: Cd = cost of manhours
Cd
is a cost factor for delay, an ISP is likely to set this to zero (unless there
is a demand for faster web access).
Example: Ct = cost of international bandwidth / national cost
Ct
is a cost factor for traffic volume, an ISP is likely to but most emphasis
on this.
Cost factors for traffic volume:
Cts Cost of traffic from sibling
Ctw Cost of traffic direct from web origin servers
Cts and multiplying direct traffic by Ctw.
Traffic from sibling is composed of ICP responses (see above) and HTTP hits.
If the cost of sibling traffic is neglible, the cost of traffic is equal
to the cost of direct traffic. Comparing this number to the case where all
traffic is direct, gives the gain of web caching.
Basic components for this threshold value are
Suggested formula for cost when using web proxy cache sibling:
(Ds*Th + Dw*Tm)*Cd + (Tb - Hb)*Ctw + Hb*Cts + Ib*Cts
Suggested formula for cost without sibling:
Dw*Tc*Cd + Tb*Ctw
Results common for object domain
Tc= Number of object domain HTTP connections
Th= Number of hits in object domain
Tm=Tc -ThNumber of misses in object domain
Dw= Delay for object direct from web origin server
Tb= Total object domain HTTP traffic in bytes
Ib=64B*TcICP traffic in bytes (64 bytes per connection)
Tc= Number of object domain HTTP connections
Various cost factors (individual per sibling)
Cts= Cost of traffic from sibling
Ctw= Cost of traffic direct from web origin servers
Cd= Cost of delay
Results per sibling:
Ds= Delay for sibling hits
Hb= Byte hit rate
Ds and Dw is measured in peak hour.
When considering only bandwidth savings, the cost factor Cd
is set to zero. The simplified result is
(Tb - Hb)*Ctw + Hb*Cts + Ib*Ctswith sibling
Tb*Ctwwithout sibling
If cost of sibling is neglible (i.e. same LAN), Cts may be set to
zero, which results in
(Tb - Hb)*Ctwwith sibling
Tb*Ctwwithout sibling
Subtracting these two to find total saving yields
Hb*Ctw
Byte hit rate multiplied with cost of connection if not hit.
Neglecting traffic volume, and setting Cd = 1 gives
Ds*Th + Dw*Tmwith sibling
Dw*Tcwithout sibling
Subtracting these two to find savings results in
(Dw - Ds)*Th
Total number of connections multiplied with median delay difference for hit and miss.
These cost figures shows that in the European context is makes sense to ignore all national and European costs.
Using Calamaris to extract numbers per day per object domain (thanks to Ernst Heiri for his patch). Look at results for .com domain.
Chose to use this date, as there are other results gathered from other web caches that may be used for comparison (at a later stage).
Results common for object domain
Tc= 78364
Th= 32194
Tm= 46170
Dw= 5.25 s
Tb= 867258 kB
Ib = Tc*64B= 5015 kB
Various cost factors
Cts= 1
Ctw= 250
Cd= 0.05 NOK/s (estimate)
Total for all 8 siblings
Ds= 5.19 s
Hb= 56686 kB
The first question is if there is any overall benefit from the sibling
relations of this cache. Computing the byte cost with siblings (using
Ib as ICPsize multiplied with the number of siblings) gives a
byte cost of (Tb - Hb)*Ctw + Hb*Cts + Ib*Cts with sibling
and a byte cost of Tb*Ctw without sibling. Assumes that all
siblings have the same cost factors (this is not entirely correct as one
sibling is located in the US and the rest in Europe).
The cost gain from the siblings is 14074694/216814500 = 6.5%
The extra byte cost introduced by the siblings is not significant
for values of Ctw > 1.7*Cts.
Cost of delay for the overall configuration is given by
(Ds*Th + Dw*Tm)*Cd, whereas the delay without sibling may
be estimated by Dw*Tc.
The savings are 1933 ksec, which amounts to some
96650 NOK (if time is money).
If the cost factors are Ctw = 10, Cts = 1,
the byte gain is significantly lower, around 0.2%.
The complexity of the configuration does not justify such a small gain.
Results per sibling:
cache.nic.surf.net
Ds= 3.31 s
Hb= 9416 kB
For values of Ctw > 1.53*Cts there is a gain for byte cost.
Results common for object domain
Tc= 202267
Th= 87788
Tm= 114479
Dw= 9.1 s
Tb= 2563825 kB
Ib = Tc*64B= 12945 kB
Various cost factors
Cts= 1
Ctw= 250
Cd= 0.06 NOK/s (estimate)
Total for all 7 siblings
Ds= 6.68 s
Hb= 303856 kB
Results per sibling:
www-cache.uninett.no
Ds= 5.94 s
Hb= 34899 kB
Analysing relation with www-cache.uninett.no gives a reduction in delay and
cost.
For values of Ctw > 1.37*Cts there is a gain for byte cost.
There is a higher gain for this relation than for the opposite relation,
the gain is not symmetric.
More work on all relations involved in COM-MESH needs to be done, to properly evaluate the gain from such a large scale mesh. All relations for the given day should be calculated, as the data is available.
Calculations need to be tested in real time, to see if this framework is applicable to real time "best sibling" selection mechanisms. This may be one step on the road to mesh autoconfiguration.