Analyzing Performance of Partitioned Caches for the WWW

Cristina Duarte Murta      Virgilio A. F. Almeida      Wagner Meira Jr.
{cristina, virgilio, meira}@dcc.ufmg.br
Computer Science Department
Federal University of Minas Gerais, Brazil

Abstract

There are different approaches to Web caching, such as client caching, server caching, proxy caching, hierarchical caching and cooperative server caching. All the caching approaches share a common issue: How to use Web-cache space efficiently? A basic issue of a caching scheme is therefore how to manage its finite storage space in order to obtain good hit rates. Web caching is though different from traditional file caching schemes and virtual memory systems, that inspired most of the current replacement policies. Web caches must handle variable size documents. The problem of managing Web caches is compounded by the high variability in the WWW, where the size of documents range from hundreds of bytes to tens of megabytes. Basically, cache performance is measured by two metrics: hit rate and byte hit rate. The WWW caching mechanisms reported in the literature attempt to optmize only one metric, either hit rate or byte hit rate. In this paper, we propose a new approach to manage Web-cache space, that takes into consideration the high variability noticed in the WWW. Instead of having a single cache to store all documents, we propose to split the cache into partitions, that store classes of documents, based on their size. The proposed approach is compared to current cache management policies, that use LRU and size-based algorithms. Using four different and realistic workloads, our trace-driven simulations show that the use of size-based partitions improves Web caching performance in terms of both metrics.

1 Introduction

  Caching is a key strategy for scaling the World Wide Web. By storing frequently accessed Web documents, caches can potentially improve the overall WWW performance in many ways. Caching helps to reduce load on busy servers and volume of network traffic resulting from Web document requests. Cached documents lead to fast response times to Web users. There are different approaches to Web caching, such as client caching, server caching, proxy caching, hierarchical caching and cooperative server caching. All the caching approaches share a common issue: How to use Web-cache space efficiently? A basic issue of a caching scheme is therefore how to manage its finite storage space in order to obtain good hit rates. Cache replacement policies (also known as removal policies) are used to purge documents to make room for new documents. Performance of replacement policies is critical to the effectiveness of a caching scheme. Various replacement algorithms have been carefully examined in the literature [7,2,8], such as first-in, first-out (FIFO), least recently used (LRU), least frequently used (LFU) and largest file removed first (SIZE).

Web caching is though different from traditional file caching schemes and virtual memory systems, that inspired most of the current replacement policies. First of all, Web caches handle variable size documents. The number of potentially cachable objects could be very large, in the order of millions documents, depending on the location of the cache. The problem of managing Web caches is compounded by the high variability in the WWW [5]. For example, the size of Web documents range from hundreds of bytes to tens of megabytes. Also the document popularity in the Web is highly variable, following the Zipf's Law [1]. As a result, there are situations that, in order to make room for a very large, but unpopular document, a replacement policy may have to evict thousands of frequently accessed small documents, that are more likely to satisfy future cache requests. Such a that situation contributes to reduce performance of Web caching.

In this paper we propose a new approach to manage Web-cache space, that takes into consideration the high variability noticed in the WWW. Instead of having a single cache to store all documents, we propose to split the cache into partitions, that store classes of documents, based on their size. A class is defined by a subrange of document sizes. Thus, each partition is designated to hold one or more classes of documents. As a consequence, a partition holds documents that are somehow homogeneous and that can be adequately managed by traditional replacement algorithms. The proposed approach is compared to current cache management policies, that use LRU and size-based algorithms. Using four different and realistic workloads, our trace-driven simulations show that the use of size-based partitions improves Web caching performance.

The remainder of this paper is organized as follows. Section 2 describes the partitioned cache scheme and discusses its rationale. In section 3, we present the experimental methodology used to conduct the performance analysis of partitioned caching. The traces, the cache workloads and the simulation model are discussed in details. Results are analyzed in Section 4 and our conclusions are in Section 5.

2 Partitioned Caching

  Partitioned caches have been previously used in computer architecture to increase bandwidth between the memory hierarchy and the CPU. Two separate caches, one dedicated to instructions and other to data, are found in most recent processors. It was observed that instruction caches have lower miss rates than data caches [6]. Our approach is to have a size-based partitioned Web cache. The rationale behind the partitioning strategy is to improve hit rate in each partition, by preserving reference locality. As documents stored in one partition can only be replaced by documents of the same class, we will not see the case where the reference locality of small documents is disrupted by the arrival of a large non-cached document. Using this approach, we are able to explore and improve the temporal locality at each partition.

There are several parameters involved in the design of a partitioned cache for the WWW, namely: the number of cache partitions, the size of each partition, the subrange limits of document sizes for each partition and the replacement algorithm used in each partition. For the purpose of this study, we have set these parameters as follows. We specified three cache partitions dedicated to small, medium and large documents. The size of each partition was determined experimentally. Tests were made with an equal size partition configuration (1/3, 1/3, 1/3) and two non-equal size configurations, (1/6, 1/3, 1/2) and (1/10, 2/10, 7/10). The last configuration provided the best performance among those three. The subrange limits for the partitions are based on Web workload characteristics shown in [2]. The median and the average size of documents found in several real workloads are 2 kbytes and 6 kbytes, respectively [2]. Small class consists of documents whose sizes are less than the median. The size of documents that make up the medium class is in between the median and the average sizes for the workloads. The best values for the partition size can be solved using optimizing techniques. In our experiment we have the first 10 percent of the cache size dedicated to documents whose sizes are less or equal to 2 kbytes. Then the next two tenth of the cache is dedicated to requests which sizes are between 2 and 6 kbytes, and the largest partition holds documents with more than 6 kbytes. In our approach, the cache is viewed as a combination of n separate caches (n = 3, in the cases studied in the paper), whose sizes are (1/10, 2/10, 7/10) of the originating cache size, respectively. Another caching decision concerns the replacement policy. Each partition can use a different policy that is more appropriate to the class of documents stored in the partition. In this work all partitions use LRU replacement algorithm.

The implementation is straightforward. The cache manager maintains a list of the documents cached. When a request arrives, the manager searches the list for the requested document. If it is not present (a miss), the manager has to order it from the originating web server. The operation is a file transfer, so the file size is known. The document must be placed in the partition responsible for files of its range. If the cache is full, a LRU policy is applied only in that partition.

To study performance of the partitioned caching, we built a trace-driven simulator of a proxy cache. For comparison purposes, we also implemented the traditional single cache with two different replacement algorithms: LRU and SIZE (largest file removed first). The performance metrics used in the comparisons are hit rate (HR) and byte hit rate (BHR). The former represents the fraction of requests satisfied by the cache, and byte hit rate is the fraction of the bytes requested by the client that are returned by the cache. The cache management scheme that achieves the highest HR and BHR is considered to be the caching scheme with the best overall performance.

A simple partitioned caching was proposed by [7]. However, they only discussed a simple workload with two types of documents, audio and non-audio. They conclude that splitting the cache would maximize the overall weighted hit rate, but they do not present results for other metrics such as hit rate. Reference [8,7] compared a document size-based policy (called SIZE) and LRU and concluded that SIZE is the best policy when hit rate is considered, but it is clearly the worst for byte hit rate. Reference [8] points out that LRU is the best for byte hit rate. In [7], the authors were inconclusive about which policy maximize byte hit rate for proxy caching servers. Reference [8] also reports LFU (least frequently used) policy as the worst for hit rate. References [4,3] reinforce the importance of the use of document size in Web cache document replacement policies.

3 Experimental Methodology

The approach taken in this study was to obtain access logs of Web proxy caching servers and to use these logs to drive simulation models implementing the different cache management policies to be compared. The workloads are described next, followed by an explanation of the simulation model and the performance metrics used in the comparisons.

3.1 Workload Description

Four workloads are used in this study, three of them collected at Virginia Tech (BL, BR and U) as described in [7]. The last one (POP) was collected from busy proxy server in Brazil. The workloads represent different document access patterns. POP and U represent a proxy cache workload. BL represents a client cache workload of a local community accessing remote servers. BR is the workload of a server cache for remote client accesses.

The number of valid requests (no cgi requests, server return code equal to 200 and URL size greater than 0) and bytes transferred is presented in Table 1. It is also shown the fraction of valid requests representing unique documents and the fraction of bytes transferred that are unique bytes for all traces. In our experiments we consider that a cache hit is a match in both URL and size. So, if a URL appears twice or more with different sizes in the trace, we assume that the document was modified between the two references. The cached copy is out of date or stale, so it is a miss.


 

 
Table 1: Cache workload characteristics
name valid requests unique docs bytes transferred unique bytes
    % MBytes %
BL 53399 61 644 65
BR 179600 9 1481 14
POP 500000 66 3214 58
U 173597 64 2070 62

Table 2 exhibits statistics of request sizes. The median is around 2 kbytes for the workloads under study. The largest document size is 107, seven order of magnitudes greater than the smallest one. The last column of Table 2 show how many files whose size is less or equal the median could be in the place of the largest document. The high variability of the Web workloads is well evidenced in the graphics of Figure 1, where the cumulative distribution of request size for all workloads is shown in the first graph and the document popularity for BL is shown in the second graph (we do not present the document popularity graphs for the remaining workloads because they are similar to BL).


 

 
Table 2: Cache workload statistics
name mean median largest smallest largest/median
BL 12600 2654 7949704 8 2995
BR 62400 1999 10032887 18 6837
POP 6428 1703 11644863 1 8300
U 11927 2268 18825928 1 5018


  
Figure 1: Distribution of request sizes for the workloads and document popularity of the BL workload
\begin{figure}
\centerline{

\psfig {figure=gplotfx.ps,width=0.8\largura}

\psfig {figure=pop2BL.ps,width=0.8\largura}
}\end{figure}

3.2 Simulation Methodology

Table 3 summarizes the parameters of the experiment. The goal is to assess performance of single caching using LRU, SIZE replacement algorithms and the partitioned caching, called PART. An important parameter is the cache size. It was fixed in 10% of the unique bytes of each trace. It means 10% of the ``infinite cache'', that is, the cache size so that no document is ever removed. With growing cache size, the hit rates become more and more equal and tend toward the same value for all algorithms. The importance of a replacement policy with a good performance becomes more evident when cache space is scarce.


 

 
Table 3: Summary of the simulation parameters
workloads BL, BR, POP, U
policies LRU, SIZE, PART
cache size 10% of the unique bytes
PART options number of partitions: 3
subranges partition 1: 0 < size < 2kbytes
  partition 2: 2k < size < 6kbytes
  partition 3: size > 6kbytes
policy replacement LRU in all partitions

4 Results

  The results of our simulations are summarized using two metrics, hit rate (HR) and byte hit rate (BHR). Figure 2 shows the hit rate as a function of the number of requests in the trace. The best algorithm is SIZE. However, the hit rates for PART are very close to SIZE. LRU is clearly the worst, well behind the two the policies. SIZE optimizes the hit rate because it keeps small files in the cache and the popularity graphic shown in Fig. 1 shows that most references go to small files. Figure 3 exhibits the byte hit rate as a function of the number of requests. In this case, LRU is the best policy. PART's results are again close to the best results obtained. Now, SIZE is clearly the worst policy. Its poor performance stems from the fact that SIZE discards large files, that are responsible for increasing the byte hit rate.

We can see that PART works well for both metrics, hit rate and byte hit rate. This is its main advantage. Normalizing these results in relation to the best results for each metric, and taking the mean over all points for each curve, we can see that PART performance is equivalent to 97.5% of the best policy for hit rate and 94.1% of the best policy for byte hit rate. Table 4 shows the normalized results for the three policies. An important point is that the results are consistent across all workloads. Even the relative ordering of the policies studied is the same.


  
Figure 2: Hit rate results
\begin{figure}
\begin{center}

\psfig {figure=BLhit10.ps,width=0.7\largura}

\ps...
 ...7\largura}

\psfig {figure=Uhit10.ps,width=0.7\largura}
\end{center}\end{figure}


  
Figure 3: Byte hit rate results
\begin{figure}
\begin{center}

\psfig {figure=BLbytehit10.ps,width=0.7\largura}
...
 ...rgura}

\psfig {figure=Ubytehit10.ps,width=0.7\largura}
\end{center}\end{figure}


 
 
Table 4: hit rate and byte hit rate: mean for each workload and overall
  HR HR HR BHR BHR BHR
log size part lru lru part size
BL 1.000 0.980 0.858 1.000 0.978 0.818
BR 1.000 0.935 0.700 1.000 0.840 0.587
POP 1.000 0.996 0.938 1.000 0.964 0.846
U 1.000 0.988 0.938 1.000 0.983 0.904
mean 1.000 0.975 0.859 1.000 0.941 0.789

5 Conclusion

  In this paper we proposed and analyzed performance of partitioned caching, that takes into consideration key characteristics of the WWW workloads, namely, the high variability in the document sizes and the document popularity. On contrary of other policies, numerical results show that the partitioned caching scheme obtains good results for both metrics used to evaluate performance of cache management policies, i.e., hit rate and byte hit rate. Future work includes the use of partitioned caching for designing cooperative and hierarchical proxy caching. We are also working on numerical methods for determining the number of partitions and the best size for each partition as well as the choice of subrange limits.

References

1
Virgilio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira.
Characterizing reference locality in the www.
In Proceedings of the Fourth International Conference on Parallel and Distributed Information Systems (PDIS '96), 1996.

2
M. F. Arlitt.
A performance study of internet web servers.
Master's thesis, Department of Computer Science, University of Saskatchewan, Saskatoon, Canadá, 1996.

3
J-C. Bolot, S.M. Lamblot, and A. Simonian.
Design of efficient caching schemes for the world wide web.
In Teletraffic Contributions for the Information Age -- Proceedings of the 15th International Teletraffic Congress, pages 403-412, 1997.

4
J.C. Bolot and P. Hoschka.
Performance engineering of the www: Application to dimensioning and cache design.
Computer Networks and ISDN systems, pages 1397-1405, 1996.

5
Mark E. Crovella and Azer Bestavros.
Self-similarity in world wide web traffic: Evidence and possible causes.
In Proceedings of the 1996 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 1996.

6
D. A. Patterson and J. L. Hennessy.
Computer Arquitecture A Quantitative Approach.
Morgan Kaufmann Publishers, California, 1996.

7
Stephen Williams, Marc Abrams, C. R. Standridge, Ghaleb Abdulla, and Edward A. Fox.
Removal policies in network caches for world wide web documents.
ACM SIGCOMM 96, pages 293-305, August 1996.

8
Roland P. Wooster and Marc Abrams.
Proxy caching that estimates page load delays.
In Proceedings of the Sixth International World Wide Web Conference, 1997.