A Distributed WWW Cache

Michal Kurcewicz, Wojtek Sylwestrzak, Adam Wierzbicki

Interdisciplinary Center for Mathematical and Computational Modeling

University of Warsaw

{mkur, wojsyl, adamw}@icm.edu.pl

 

Topic: Caching software, Cache server performance analysis

 

1.  Introduction

The growth of HTTP traffic and experience with use of proxy caches have demonstrated the need for increasing the scalability and efficiency of proxy cache software. To increase scalability, proxy caching software needs to be written in a modular, decentralized fashion. On the other hand, to improve efficiency, the problem needs to be decomposed. Both these basic and common-sense considerations point to a new design solution: a distributed proxy cache.

In a centralized cache server, the main bottleneck will be a central resource. In a distributed cache, it will be communication. Therefore we stand before design considerations different than in a centralized solution: efficient request direction to minimize communication overhead, and keeping essential data structures consistent - as cheaply as possible.

Finally, to exploit fully the opportunities for improving performance by a distributed design, we must manage global resources of a distributed cache as efficiently as possible. This means that components of a distributed server must cooperate with each other closely. To achieve improved scalability, we must distinguish functional components of a proxy cache and build the design from autonomous modules which realize these functions.

In this paper, we focus on the described above problems. We attempt to improve the efficiency of proxy cache operation in basic ways. This can be achieved through additional functionality of our design. The distributed cache has better global memory management and avoids use of disk in favor of transfers between memory caches.

Another field of our attention is resilience of a proxy cache. The distributed cache does not have a single point of failure. This is achieved through the use of election algorithms (see section 4). As a result, our design is safer than currently used solutions.

Our design is in many respect based on the distributed file system xFS [1], [2]. However, there are significant differences because of the specific character of our application.

Our design uses a hierarchy similar to that of xFS, but it is more simple. A distributed file system has complex functions for consistency control, which are needed because of repeated, concurrent writes. A cache or WWW server writes objects once, then removes them if they expire - possibly replaces them after a modification.

In the next section, we briefly discuss some design considerations which arose from the evaluation of performance of existing solutions. In section three we describe the functional architecture of our distributed proxy cache. Section four discusses failure control of the functional components. In section five, we describe load sharing. In section six, we recount our experiences with a prototype implementation of our design and conclude.

2.  Design considerations

In this section, let us describe some problems which we aim to solve by our design. We do this by analyzing the characteristics of two mechanisms for dividing load among individual caches.

A commonly suggested way to partition traffic to a set of proxy servers is to use a client-initiated proxy selection mechanism called proxy auto-configuration. In the suggested scheme [5], [9] a hash function is computed from the URL to route requests to different proxies.

Let us briefly criticize this solution: the points mentioned here shall be elaborated in further sections.

  1. Hot-spots. The division of requests by the use of a hashing function is vulnerable to hot-spots. These are very popular web servers or objects, always routed to the same proxy server, which makes the distribution of load uneven.

  2. Proxy failure. Auto-configuration schemes rely on the client browser to recognize the situation when a proxy has failed and to remove it from the set of proxies. This could be accomplished more efficiently by a specialized manager which is part of the distributed cache.

  3. A one-level hierarchy. The structure used by auto-configuration schemes is a one-level hierarchy. We believe that that the structure used in our design provides better scalability.

  4. Persistent HTTP connections. In [5] the hash function is computed only from the part of the URL which is the server address. When objects from one server are routed to a single proxy, the use of persistent HTTP connections can improve performance. However, when hash functions are computed in such a way the quality of load sharing deteriorates, as in the case of hot-spots.

Another way to cluster individual caches into a ``composite cache'' is to use the mechanism of a cache hierarchy.

In this solution, several caches - members of the cluster - are provided to clients as a single cache through a load balancing TCP router or round-robin DNS. Internally, every cache has a routing array which is used to select a parent.

  1. Use of global main memory. Members of the cluster are autonomous caches. Objects requested by them from a parent are then cached by themselves. In this way, all popular objects are stored in main memory. In our design, the level of replication of objects in main memory is explicitly controlled and depends on changing object popularity (see section 5). Therefore, we are able to use global main memory more efficiently.

  2. Transfers between cluster members. Through the use of autonomous caches, a cluster seeks to avoid local object transfers. However, when an object resides in main memory, a local transfer among components of a distributed server which are connected by a fast switching network can be faster than reading the object from local disk.

To summarize, caches connected by a hierarchy are not cooperating closely enough to use global resources efficiently.

3.  Architecture of a Distributed Cache

Our design is based on a division of a distributed server into functional components. These components are not separate caches connected by a hierarchy; in fact, more than one of the functional components could run together on one computer. There are three kinds of server components which we distinguish: the Frontend, the Backend and the Manager. No components of the same kind can run together on one machine.

Main Memory Caches on Frontends

The Frontends are responsible for handling client connections. One can think of a Frontend as an autonomous proxy, which maintains its own cache in main memory.

A Frontend is selected for a client using Round Robin DNS or a load balancing TCP router. After that, the selected Frontend checks if it has the requested object in its main memory cache. If not, it asks the Manager where the object can be found.

The Frontend's main memory caches store popular objects. Because of that, there is a high probability that a client request will be handled directly by some Frontend, without the need to touch disk. The hit rate of the Frontends is determined by three factors: the sizes of their main memory caches, the quality of their replacement policy and the quality of the algorithm which determines if a object should be placed on some Frontend.

We can now demonstrate how our design removes the problem with using persistent connections and proxy auto-configuration. A persistent HTTP connection is made to a Frontend, and it can remain open regardless of what objects are requested and where the requests are directed.

Processing a client request

On Figure 1 we show two kinds of components: the Frontends and Backends. A Manager is also present: the table on the figure is part of the Manager's data structures and is called the Manager routing table. The small boxes containing ``Hit'' or ``Miss'' on Figure 1 also represent information internal to the Manager. Figure 1 shows three Frontends (F1, F2, F3).



Figure 1. Frontends and Backends working with one Manager

On Figure 1, there is only one Manager. The distributed cache usually has more than one Manager. However, let us discuss the simpler case first. This will enable us to show how client requests are handled.

After a client makes a request to some Frontend (for example, F2) and the object (URL3) is not found immediately in F2's memory cache, F2 asks the Manager to find URL3. He consults his data structures to determine if the requested object is cached on some other Frontend. On Figure1, URL3 is found on F1. The Manager asks F1 to send the data to F2 and determines if F2 can store a duplicate of the object himself (for details, see section 5). If the requested object is not found on some Frontend (as in the case of URL2), the Manager computes the value of the hash function on the object URL modulo the size of the routing array. This value is used as an index to determine the id of the Backend which is written in the indexed field of the routing array.

The Manager checks if the object is present on the selected Backend. This is done without querying the Backend, because the Manager has all object metadata. If the object is found, the Manager asks the Backend to send it to the Frontend which made the request. Otherwise, the Manager asks the Frontend to fetch the object from the home server and to store it on the selected Backend.

One of the issues raised in section two can be resolved here. In our distributed cache, we use transfers between Frontends to fetch the object because this can be faster than using local disk. This strategy makes sense if the cluster is connected by a fast, switched network such as ATM. The remote Frontend's memory becomes a stage in the memory hierarchy which fits between local memory and local disk [3].

Disk Caches on Backends

As mentioned earlier, the Backends form a distributed file server to which Managers direct requests. Basically the function of a Backend is very simple: it only needs to store objects and return them on request. In our design the Backend does not even decide whether it has an object or not: this is done by the Manager, who stores the object metadata. We make no other functional assumptions about Backends. However, we believe that a Backends main memory use should be as small as possible, so that it would not retard Manager or Frontend operation.

We believe that the Backend could use dedicated solutions for optimal storing of files, such as the WAFL file system used by NetCache. In our design the role of Backend could be fulfilled by any caching software: in fact, in our first prototype implementation Backends were Squid caches.

We say that a Backend ``owns'' the fields in the routing table which have its identifier. The amount of fields owned by a Backend is a variable which can be used to control the portion of all requests which will be served by the Backend. We return to this point when we discuss load sharing in section 5.

Management of Metadata and Routing

All object metadata and the routing table to Backends are maintained by the Manager. Frontends ask Managers to direct object requests. If the design is to scale well, there cannot be too many Frontends to one Manager (for example, not more than 8 Frontends to one Manager). Otherwise, the Managers will be forced to store too much data and become overloaded. Therefore, when the distributed server has more components, there should be more than one Manager.

In such a case, the stream of requests has to be divided by Frontends among the Managers. Every Frontend has a Frontend routing table which tells him which Manager to ask for the requested URL. The Frontend routing table is provided for the Frontends by a selected Manager, who acts as a consistency server. We shall say more about this in the next section, when we discuss Manager failures.

On Figure 2 we show the situation when there are many Managers. F1 and F2 have an identical Frontend routing table, which contains the id's of 3 Managers. Only two are shown on the Figure, along with requests for which the F1 asks them. The two Managers are still shown in the form of Manager routing tables, to demonstrate why our design is not a one-level hierarchy.



Figure 2. A system with more than one Manager

The set of objects for which a Manager is responsible is determined by the second Frontend routing table and is separate from the object sets of other Managers. Therefore, the Manager routing tables do not need to be consistent with each other. The same is true about object metadata.

On Figure 2, the Backends are not numbered. The reason for this is that the two shown requests, for URL1 and URL2, could be directed to the same Backend by the two Managers. Such a solution would make sense if there are many Frontends and few Backends in the distributed cache.

4.  Controlling Failures

In this section, we discuss how failures of various functional components are handled in our design.

A client selects a Frontend by the use of a mechanism external to our design, such as round-robin DNS or a load balancing TCP router. The failure of a Frontend must be discovered by that mechanism. It is common knowledge that the load balancing TCP router has an advantage over the DNS server in that respect. This is because clients often cache DNS server responses. The used mechanism would also need to handle Frontend recovery. Regardless of the mechanism used, the delay and damage associated with correcting a Frontend failure will be smaller than in the proxy auto-configuration scheme, where each client must individually notice and correct the failure.

The reaction to a failure of a Backend is more complex. A Backend owns several fields of the routing tables of the system's Managers. After a Backend failure, the fields which it owned must be assigned to a different Backend. After a Backend recovery, the Managers repartition their routing arrays to add the Backend again.

To complete our discussion of Backend failures, let us remind the reader that the objects which are stored on Backends are not very popular. A Backend failure does not seriously impair the distributed cache.

The Manager is a critical component with respect to resilience. A Manager failure means a loss of object metadata and consequently, of direction for object requests. However, a Manager can create a backup of object metadata on a selected group of Backends. To make this less expensive and more resilient, the backup can be done in a log-structured way (as in standard caching software). In this way, the backed up metadata is always consistent, and after a Manager failure a different Manager can pick up the backed up metadata and add it to his own.

The Frontend routing table (described in section 4) is kept consistent by a selected Manager, called the coordinating Manager. In the case of a failure of an ordinary Manager, the coordinating Manager determines which Manager should replace the failed one.

Let us stress that the coordinating Managers role is restricted to intervention in the case of a failure or recovery. During ordinary operation, the coordinating Manager does not have additional work or store much additional data.

Because of the need for a coordinating Manager, another issue arises. It is the failure of the coordinating Manager itself. We can discuss this issue together with the situation of a failure of the only one Manager in the system. This only Manager can also be called coordinating.

In the case of failure of the coordinating Manager, an election takes place. We use the well-known tyrant algorithm [7], in which all parties participating in the election have unique numbers - and the one with the largest number wins. If we allow all machines running Frontends without Managers and all machines running Managers to participate in the election, and assign to the former numbers smaller than to the latter, then we have a solution to the stated problem. If any Managers are present, one of them will win. If no Managers are running, one of the Machines running Frontends will win and start a new Manager.

In order to achieve good scalability, the system needs a certain minimal number of Managers (as stated in the previous section). Therefore, after a Manager failure, it may be necessary to start a new Manager on another machine. This should be decided by the coordinating Manager.

5.  Load Sharing in a Distributed Cache

Frontends Replacement Policy and Filtering Algorithms

The Frontend's main memory caches cannot be very large. Therefore, a Frontend replacement policy will need to be very efficient, because most Frontend objects are popular and a removal from the cache might result in significant loss. However, on the basis of investigation of log files at the Polish national cache we can say that the 10% most popular objects account for about 50% of all requests. The Frontend's main memory cache does not need to be large to achieve a very good hit rate.

The algorithm which places objects in Frontend caches needs to select the most popular objects. It will therefore require a model of Web object popularity or client access characteristics. Such models and algorithms have been proposed in our paper ``A Filtering Algorithm for Proxy Caches'' and will not be discussed here.

The main memory of all Frontends is coordinated globally by the Managers. This is done by the use of a mechanism of popularity counters. After a object is placed in some Frontend's cache by the filtering algorithm, it will not be placed into another Frontend's cache until the Manager which is responsible for that object decides that it is sufficiently popular. Presently, we use a simple set of popularity intervals to determine how many times a object can be replicated in the Frontend's caches. The most popular objects will be placed on all Frontends.

Every time a Frontend removes a object, it needs to notify the object's Manager. These notifications are carried out every specified time interval. In this way, the Frontend can at the same time report to the Managers that it is still operational, and some communication is avoided.

Here we describe how our design solves the problem of hot-spots. Since requests for hot-spots are satisfied by Frontends, the requests which are forwarded to Backends are for less popular objects. Because of that, the routing of these requests by the means of a hashing function creates a quite even distribution of load. To prove this contention, we presents on the following table some quantitative data about the distribution of load in 800 fields of a routing array before and after the application of a filtering algorithm. The data were gathered from a simulation of algorithm operation on a log file, which lasted 2 hours. The data in the table describe the last hour.

Percent max-min max-min variance variance mean mean
of requests without with without with without with
to Frontends filter filter filter filter filter filter
43% 42 16 5.2 2.7 6.6 3.75

Partitioning the Routing Arrays

We have said much about directing requests by the means of routing arrays. The efficiency of this method is crucial to the performance of the distributed cache under heavy load. The amount of load which will be placed on the individual components of the distributed cache (a Manager or a Backend) is controlled by the number of fields assigned to that component in the routing array. Assigning too many fields may lead to overloading of the component and deterioration of distributed cache performance.

We have been able to show that a proportional division of the array is not the best method and have developed an algorithm which takes into account load information to partition the array as to minimize overall load. The algorithm has been tested in a simulator of a queuing model for a proxy cache. However, the discussion of the algorithm operation, simulator construction and performance tests is beyond the scope of this paper. It will be the subject of another paper, which is in preparation.

6.  Concluding remarks

The described design was prototyped and its algorithms tested on real workloads from several caches in Poland. However, none of the workloads available to us had a sufficiently high request rate and came from a cache with high network speed. Because of these limitations, we cannot consider prototype testing of the design to be sufficiently reliable. This will be one area of future work.

The prototype was implemented in Java for maximum portability and ease of implementation. During the first set of tests and work on the prototype we investigated the effect of a simple filtering algorithm on Frontend hit rate. This led to a detailed investigation of filtering algorithms, which is beyond the scope of this paper.

The functional components of a distributed cache to cooperate closely to improve performance. We have investigated techniques for better global memory management and increased the resilience of the distributed cache. All the described improvements were made possible by an increase of functionality due to the distributed design of our cache. The distributed cache is in several respects optimized to reduce communication overhead. The amount of data which is kept consistent is minimal. We believe that a production implementation of the presented design is feasible with the advent of high-speed, switched local networks.

7.  References


File translated from TEX by TTH, version 1.41.