| The WWW Infrastructure at Manchester and Future Developments |
| Manchester Computing, The University of Manchester, Oxford Road, Manchester, M13 9PL United Kingdom | |
|---|---|
The University of Manchester is one of the UK’s leading Universities. It has a large student and staff population and is part of an evolving Metropolitan Area Network which will ultimately incorporate many Higher Education organizations in Greater Manchester and the North West of England. Demand for networked services, in particular the WWW, is growing exponentially. In order to maintain and improve levels of service at Manchester we have implemented, and are aiming to develop further, WWW caching mechanisms. In this paper we describe the WWW and network infrastructure at Manchester and the evolution of our caching service. We then discuss current patterns of use and user demands and expectations. Finally we consider possible future developments of caching services at Manchester.
Manchester is home to several universities and Higher Education Institutions (HEIs) and hosts the largest concentration of students in the UK. The University of Manchester is one of the largest universities in Manchester and the North West region. It attracts a wide range of students and is renowned for excellence in many disciplines. It is at the forefront of implementation of new communications technologies and facilities through various state-of-the-art developments, such as G-MING, a Metropolitan Area Network. It is a major node on JANET (the UK Joint Academic Network) and SuperJANET and is connected to the global Internet. It is, in terms of its size, expertise and geographical location, uniquely placed to harness the potential of the latest communications technologies for the development of teaching, training and research and for establishing new network-based services and facilities.
Manchester Computing at the University of Manchester is responsible for the maintenance and development of computing facilities and network infrastructure across the whole of the University of Manchester and UMIST (University of Manchester Institute of Science and Technology). Nationally, it provides services to approximately 100 universities and research establishments throughout the UK. One of its major roles is the coordination and development of WWW infrastructure on the Manchester campus.
WWW services are in great demand at Manchester. In this paper we describe our attempts at Manchester Computing to establish caching facilities as part of our overall WWW service, and our thoughts and plans for the future. We begin by giving an overview of the networking infrastructure and technology at Manchester showing the scale of the operation and the associated WWW infrastructure. We then discuss the evolution of the Manchester cache service up to the present day giving its actual configuration at the time of writing. We then move on to discuss current patterns of use and perceived user requirements. This leads us to a discussion of some of the developments we plan to undertake to improve levels of service and produce new facilities. Clearly WWW object caching will play an increasingly important role in the evolution of WWW services. We list some of the options and give our views on future developments.
We now give an overview of the Manchester network and WWW infrastructure. Our aim is to give an indication of the level of demand for WWW services now and in the future. In section 4 we describe how the Manchester cache service has evolved over the last 3 years or so to meet this demand to date.
The Manchester Local Area Network actually incorporates two campuses: the University of Manchester and UMIST (University of Manchester Institute of Science and Technology). It consists of an FDDI ring linking Ethernet segments into university departments and buildings and supports approximately 10,000 PCs and workstations. It has gateways into JANET, SuperJANET and the world-wide Internet via links to the USA and Europe (via Sweden and Amsterdam). A diagram of the Manchester and UMIST LAN is given below. It illustrates how any University department accesses JANET and the Internet. Note that the Manchester LAN is now linked to other similar LANs by the Manchester Metropolitan Network described below.
G-MING (the Greater Manchester Internetworking Group) is a Metropolitan Area Network currently linking several universities in Greater Manchester. Future expansion will link various other organizations such as teaching hospitals and public libraries and will ultimately incorporate student ‘Halls of Residence’. The MAN diagram indicates both the nature of the technology employed and the number of organizations connected. G-MING essentially links the individual LANs of each organization and provides gateways to the global Internet. It incorporates the latest technologies which include ATM and multicasting and provides opportunities for the development of new networked services. As this infrastructure develops the demand for WWW services may be expected to rise relentlessly.
G-MING: Core Site Interconnections
A large proportion of academic staff and postgraduate students at Manchester, and at organizations connected to G-MING, have access to JANET and the Internet from their office desks. Most of our undergraduate students are able to access the network from the growing number of PC clusters scattered around the campuses. Our student Halls of Residence are being ‘flood-wired’ to meet the demand from the increasing number of students with their own PC kit. In this highly networked scenario we have many thousands of users requiring WWW services for project work, research and last but not least, leisure activities.
We have a modified version of wwwstat with two extra sections in its output: the first summarises the User-Agent string and the second summarises the Referer-URL string. The following extract shows that Netscape (Mozilla) is the most popular WWW browser/client currently accessing www.mcc.ac.uk:
Summary of User-Agent Strings (greater than 1.0000%)
From 1 Sep 1996 to 8 Sep 1996
requests percentage user-agent
5197 3.2709% Harvest/1.4.pl2
2870 1.8063% Mozilla/1.0N
7092 4.4636% Mozilla/1.1N
12027 7.5697% Mozilla/1.22
1912 1.2034% Mozilla/1.2N
29805 18.7590% Mozilla/2.0
13919 8.7605% Mozilla/2.01
19498 12.2718% Mozilla/2.02
2678 1.6855% Mozilla/2.02Gold
21501 13.5325% Mozilla/3.0
2525 1.5892% Mozilla/3.0b5aGold
2678 1.6855% Mozilla/3.0b6Gold
1649 1.0379% Mozilla/3.0b7
2348 1.4778% Mozilla/3.0b8Gold
Total entries in log = 158884
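A summary of this shape can be produced with a simple tally. The following Python sketch is illustrative only: it is not the modified wwwstat, and the function name `summarise_user_agents` and the sample data are hypothetical. It counts User-Agent strings and keeps those whose share exceeds a percentage threshold, as the extract above does.

```python
from collections import Counter


def summarise_user_agents(agents, threshold_pct=1.0):
    """Return (total, rows): rows are (count, percentage, agent) tuples
    for every agent whose share of all entries exceeds threshold_pct,
    sorted by agent name as in the extract above."""
    counts = Counter(agents)
    total = sum(counts.values())
    rows = []
    for agent, count in sorted(counts.items()):
        pct = 100.0 * count / total
        if pct > threshold_pct:
            rows.append((count, pct, agent))
    return total, rows


if __name__ == "__main__":
    # Hypothetical sample, not taken from the real log.
    sample = ["Mozilla/2.0"] * 6 + ["Mozilla/3.0"] * 3 + ["Harvest/1.4.pl2"]
    total, rows = summarise_user_agents(sample)
    for count, pct, agent in rows:
        print(f"{count:6d} {pct:8.4f}% {agent}")
    print(f"Total entries in log = {total}")
```

The threshold is a parameter so that the same tally could be reused for the Referer-URL summary.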
Moreover we host an expanding ‘dial-in’ service, and the use of Netscape from home and Halls of Residence by staff and students is increasing. There are some 1500 ‘dial-in’ users, of whom we estimate around 200 use Netscape from home.
Most University departments now have, or are in the process of installing, their own WWW servers and are rapidly making more information available. Several departments are in the process of setting up their own proxy caching service, which will feed off the Manchester Computing cache. The Manchester Computing cache is linked to other caches in the UK and around the world and individual departments can avail themselves of this enhanced service, the evolution of which we now describe.
During 1994 it was decided to implement a local Proxy/Cache service at Manchester. After examining what was available, we settled on the Lagoon program to run in conjunction with our existing NCSA-based WWW server. The Lagoon program had some limitations: each URL had to be rewritten by a CGI program so that it could be forced through the cache, and the Lagoon cache would serve out incomplete documents. Lagoon was later abandoned in favour of the CERN server. The CERN server ran solely as a Proxy/Cache server because, at that time, some parts of the Manchester Web made heavy use of the SSI (server-side include) features of the NCSA server, which the CERN server did not support.
The CERN Proxy/Cache was initially configured without any access restrictions, but we had to tighten it up because more accesses appeared to be coming from outside the Manchester campus than from inside. Even with the access restrictions in place, the Proxy/Cache became so popular that the system, a twin-processor SUN 630MP, could not cope with the load and regularly ground to a halt with more than 200 processes running.
In April 1995 we began to use the Harvest Resource Discovery system to index all the WWW servers (by this time around 40 individual systems) and Gopher servers, including MIDAS (Manchester Information Datasets and Associated Services). After experimenting with the Harvest cached program, we replaced the CERN Proxy/Cache with Harvest cached in May 1995, and noticed that the load on the Proxy/Cache system dropped to around 1/50th of its previous level.
In June/July 1995 we were approached by Martin Hamilton of Loughborough University to join an experimental linked cache system on the JANET network between Manchester, Loughborough, Nottingham and Hull.
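Linked caches of this kind query one another with the Internet Cache Protocol (ICP), which appears as ICP_QUERY in Table 1. As a rough illustration only, the Python sketch below builds an ICP version 2 query datagram using the layout later documented in RFC 2186. The function name `build_icp_query` is hypothetical, the sender address field is simply left as zero, and the sketch does not send anything on the network.

```python
import socket
import struct

ICP_OP_QUERY = 1   # opcode for a query message
ICP_VERSION = 2    # ICP version 2


def build_icp_query(url, request_number, requester_ip="0.0.0.0"):
    """Build an ICP v2 query datagram for url (layout per RFC 2186)."""
    # Query payload: 4-byte requester address, then the null-terminated URL.
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = 20 + len(payload)          # the fixed header is 20 bytes
    header = struct.pack(
        "!BBHIII4s",
        ICP_OP_QUERY,
        ICP_VERSION,
        length,             # total message length, header included
        request_number,     # echoed back in the reply
        0,                  # options
        0,                  # option data
        b"\x00" * 4,        # sender host address (left as zero here)
    )
    return header + payload
```

A cache would send such a datagram over UDP to each neighbour and use the replies to decide where to fetch the object from.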
When the Harvest cached project ended, we resisted the change from the Harvest cached program to Squid, as the early versions appeared to be very unstable on our system. The different versions of Squid were evaluated on another system (a P120 running Linux 1.3.94) and when it appeared to have stabilised, we switched to Squid 1.0 on the main system, which has been running very well without any major problems.
One problem with changing servers has been that of user education. Our CERN Proxy ran on port 8585, whereas the later Harvest/Squid Proxy ran on port 3128. We attempted a gradual changeover by cutting the size of the CERN cache to around 10 Megabytes and increasing the Harvest cache to around 100 Megabytes. Later we modified the CERN proxy to return an informative error message saying that it had been disabled and giving a URL pointing to the new Harvest details. We have seen some browsers still configured to use the CERN server on port 8585 a year after the service was withdrawn!
To give a flavour of cache use at Manchester we now present some of our Squid Proxy/Cache statistics for a typical week in September 1996 in Tables 1 through 6. Due to disc space limitations and the size of the cache logs (typically 200 Mbytes for one week), we are only able to keep the previous week’s logs. This will be rectified in the near future. We can state however that the Proxy/Cache use has been steadily increasing over the last year or so.
Table 1 shows that we received almost 814,000 ICP queries and over 349,000 GET requests, shipping some 3328 Mbytes, in the sample period. Table 2 indicates that HTTP is by far the most popular protocol with FTP and GOPHER trailing far behind. We aim to use these statistics for planning future capacity. Table 3 indicates that the uk.ac.lut.egate (University of Loughborough) cache was our most popular customer. In fact most of the sites shown are caches within the UK, with the exception of the Polish site pl.edu.icm.sunsite. Table 4 indicates that uk.ac.midas was the most popular server; however, this was unusual. Sites such as com.netscape.hom and com.microsoft.ww are often far more popular. Table 5, as might be expected, shows that image and HTML are the most popular URL types. Although movie and audio URLs are the least popular, we expect their volume to grow in the coming years. Table 6 shows our most popular top level domains. There is nothing particularly surprising about the order. As might be expected, com is by far the most popular followed by uk and net.
We do not at this time have statistics from any other caches to enable comparisons. However the Manchester cache is certainly one of the most heavily used in the UK. In fact we would wish to offer a more comprehensive service within the UK but this is not currently possible given our limited hardware resources and shortage of funding.
At the University of Manchester we do not enforce the use of the cache; it is entirely voluntary. It is difficult to determine the exact proportion of cache users but from rudimentary investigations we are confident it is above the 50% mark. Some of our users employ other caches such as hensa.ac.uk, the UK national cache. It is not clear however whether this practice is more or less efficient in terms of bandwidth utilisation, and perhaps this could be a topic for further investigation.
Our users, like all others, expect to retrieve a WWW object within seconds of issuing the request. If this does not happen they complain to the Computer Services Department. We then advise them to use the cache, which usually resolves the problem. However, this is by no means always the case: sometimes it is faster to go directly to the remote site than to use the cache.
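One way to see why the cache is not always faster is a simple expected-latency model. The Python sketch below is a back-of-envelope illustration under assumed figures: the timings are hypothetical, not measurements, and only the 24% GET hit rate is taken from Table 1. A miss pays the direct fetch time plus the extra hop through the cache, so a low hit rate or a heavily loaded cache can make the direct route quicker.

```python
def expected_latency_via_cache(hit_rate, hit_time, direct_time, miss_overhead):
    """Mean time to fetch an object through the cache, in seconds.

    hit_rate      -- fraction of requests served from the cache (0..1)
    hit_time      -- time to serve a cached object
    direct_time   -- time to fetch the object from the remote site
    miss_overhead -- extra delay a miss incurs by going through the cache
    """
    miss_time = direct_time + miss_overhead
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_time


def cache_is_faster(hit_rate, hit_time, direct_time, miss_overhead):
    """True when the cache route has the lower expected latency."""
    via_cache = expected_latency_via_cache(
        hit_rate, hit_time, direct_time, miss_overhead)
    return via_cache < direct_time


if __name__ == "__main__":
    # Hypothetical timings in seconds; 0.24 is the GET hit rate in Table 1.
    print(cache_is_faster(0.24, 0.5, 4.0, 0.5))   # slow remote site
    print(cache_is_faster(0.24, 0.5, 1.0, 1.0))   # fast remote site, loaded cache
```

The model ignores variation between objects and between times of day, so it indicates the trade-off rather than predicting actual response times.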
With the increasing pervasiveness of LANs and the development of MANs, as detailed in previous sections, user demand for access to the WWW will necessitate the implementation of local and national caching mechanisms on a wider scale, if their response time expectations are to be met. In the next section we consider how this may be achieved.
| Method | UDP count | UDP %all | UDP %hit | TCP count | TCP %all | TCP %hit | TCP Mbytes | TCP Mbytes %all | TCP Mbytes %hit |
|---|---|---|---|---|---|---|---|---|---|
| ICP_QUERY | 813955 | 100% | 8% | 0 | - | - | 0.00 | - | - |
| GET | 0 | - | - | 349125 | 99% | 24% | 3328.40 | 99% | 31% |
| POST | 0 | - | - | 3760 | 1% | - | 43.12 | 1% | - |
| HEAD | 0 | - | - | 89 | 0% | - | 0.06 | 0% | - |
| CONNECT | 0 | - | - | 22 | 0% | - | 0.06 | 0% | - |
TABLE 1. SUMMARY OF REQUEST METHOD USAGE
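The %all columns in Table 1 are each method's share of the column total. As a worked check, the Python sketch below recomputes the TCP shares from the counts and Mbytes printed in Table 1. The helper name `shares` is hypothetical, and rounding to whole percentages is an assumption about how the report was formatted.

```python
def shares(values):
    """Map each key to its percentage of the total, rounded to a whole number."""
    total = sum(values.values())
    return {key: round(100.0 * value / total) for key, value in values.items()}


# TCP request counts and Mbytes per method, copied from Table 1.
tcp_counts = {"GET": 349125, "POST": 3760, "HEAD": 89, "CONNECT": 22}
tcp_mbytes = {"GET": 3328.40, "POST": 43.12, "HEAD": 0.06, "CONNECT": 0.06}

if __name__ == "__main__":
    print(shares(tcp_counts))
    print(shares(tcp_mbytes))
```

On these figures GET accounts for about 99% of both TCP requests and TCP bytes, with POST at about 1%.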
| Protocol | UDP count | UDP %all | UDP %hit | TCP count | TCP %all | TCP %hit | TCP Mbytes | TCP Mbytes %all | TCP Mbytes %hit |
|---|---|---|---|---|---|---|---|---|---|
| HTTP | 806490 | 99% | 8% | 349752 | 99% | 24% | 2861.22 | 85% | 30% |
| FTP | 6369 | 1% | 2% | 2690 | 1% | 18% | 497.90 | 15% | 36% |
| GOPHER | 1065 | 0% | 0% | 518 | 0% | 5% | 12.33 | 0% | 38% |
| FILE | 29 | 0% | - | 0 | - | - | 0.00 | - | - |
TABLE 2. SUMMARY OF PROTOCOL USAGE
| Client | UDP count | UDP %all | UDP %hit | TCP count | TCP %all | TCP %hit | TCP Mbytes | TCP Mbytes %all | TCP Mbytes %hit |
|---|---|---|---|---|---|---|---|---|---|
| egate.lut.ac.uk | 189423 | 23% | 6% | 7043 | 2% | 100% | 105.10 | 3% | 100% |
| lumen.brunel.ac.uk | 117409 | 14% | 8% | 1153 | 0% | 100% | 42.75 | 1% | 100% |
| peg.cranfield.ac.uk | 108263 | 13% | 6% | 3086 | 1% | 100% | 33.99 | 1% | 100% |
| humus2.ucc.hull.ac.uk | 71113 | 9% | 13% | 34800 | 10% | 5% | 438.02 | 13% | 7% |
| norse.mcc.ac.uk | 0 | - | - | 72590 | 21% | 6% | 357.01 | 11% | 7% |
| io.salford.ac.uk | 57838 | 7% | 2% | 1278 | 0% | 100% | 37.59 | 1% | 100% |
| freya.dmu.ac.uk | 46799 | 6% | 2% | 346 | 0% | 100% | 10.33 | 0% | 100% |
| ccc.nottingham.ac.uk | 40470 | 5% | 17% | 926 | 0% | 100% | 9.76 | 0% | 100% |
| maple.shu.ac.uk | 25439 | 3% | 14% | 2464 | 1% | 100% | 49.30 | 1% | 100% |
| cass41.ast.cam.ac.uk | 13871 | 2% | 11% | 460 | 0% | 100% | 3.71 | 0% | 100% |
| akis.csc.umist.ac.uk | 0 | - | - | 10156 | 3% | 20% | 139.63 | 4% | 23% |
| sunsite.icm.edu.pl | 7189 | 1% | 10% | 1263 | 0% | 22% | 21.86 | 1% | 31% |
TABLE 3. SUMMARY OF CLIENT USAGE
| Server (http) | UDP count | UDP %all | UDP %hit | TCP count | TCP %all | TCP %hit | TCP Mbytes | TCP Mbytes %all | TCP Mbytes %hit |
|---|---|---|---|---|---|---|---|---|---|
| midas.ac.uk | 12 | 0% | 25% | 51689 | 15% | 3% | 115.84 | 3% | 9% |
| home.netscape.com | 13432 | 2% | 47% | 16847 | 5% | 67% | 93.43 | 3% | 89% |
| www.yahoo.com | 6514 | 1% | 14% | 4253 | 1% | 47% | 19.88 | 1% | 46% |
| www.microsoft.com | 7580 | 1% | 20% | 2906 | 1% | 46% | 28.02 | 1% | 44% |
| www.geocities.com | 7282 | 1% | 5% | 2786 | 1% | 25% | 17.02 | 1% | 24% |
| count.digits.com | 7467 | 1% | 5% | 2050 | 1% | 25% | 0.87 | 0% | 30% |
| www.mcc.ac.uk | 67 | 0% | - | 7112 | 2% | - | 78.70 | 2% | - |
| info.mcc.ac.uk | 86 | 0% | - | 6075 | 2% | - | 104.95 | 3% | - |
| www.lycos.com | 2863 | 0% | 21% | 3115 | 1% | 48% | 18.79 | 1% | 48% |
| images.infoseek.com | 3899 | 0% | 31% | 1430 | 0% | 61% | 6.70 | 0% | 67% |
TABLE 4. SUMMARY OF SERVER USAGE
| Type | UDP count | UDP %all | UDP %hit | TCP count | TCP %all | TCP %hit | TCP Mbytes | TCP Mbytes %all | TCP Mbytes %hit |
|---|---|---|---|---|---|---|---|---|---|
| Image | 571524 | 70% | 9% | 195120 | 55% | 33% | 1531.14 | 45% | 36% |
| HTML | 107532 | 13% | 16% | 96338 | 27% | 10% | 396.97 | 12% | 25% |
| Other | 77715 | 10% | 3% | 39522 | 11% | 11% | 785.17 | 23% | 24% |
| Directory | 49125 | 6% | 5% | 19047 | 5% | 23% | 101.02 | 3% | 32% |
| Bundle | 2808 | 0% | 2% | 817 | 0% | 16% | 402.72 | 12% | 32% |
| SHTML | 2245 | 0% | 2% | 548 | 0% | 25% | 3.43 | 0% | 34% |
| Text | 1511 | 0% | 0% | 1250 | 0% | 19% | 18.74 | 1% | 4% |
| Movie | 1226 | 0% | 1% | 248 | 0% | 19% | 126.67 | 4% | 35% |
| Audio | 268 | 0% | 2% | 106 | 0% | 8% | 5.85 | 0% | 4% |
TABLE 5. SUMMARY OF URL TYPES
| Domain | UDP count | UDP %all | UDP %hit | TCP count | TCP %all | TCP %hit | TCP Mbytes | TCP Mbytes %all | TCP Mbytes %hit |
|---|---|---|---|---|---|---|---|---|---|
| com Commercial | 433580 | 53% | 10% | 157301 | 45% | 34% | 1603.32 | 48% | 40% |
| uk United Kingdom | 60812 | 7% | 7% | 107739 | 31% | 10% | 691.32 | 48% | 40% |
| net Network | 68788 | 8% | 6% | 17192 | 5% | 25% | 188.43 | 6% | 31% |
| edu Educational | 55102 | 7% | 3% | 16119 | 5% | 20% | 247.74 | 7% | 23% |
| org Non profit | 21523 | 3% | 4% | 6970 | 2% | 21% | 59.28 | 2% | 18% |
| nl Netherlands | 16966 | 2% | 3% | 3313 | 1% | 19% | 50.04 | 1% | 14% |
| jp Japan | 14129 | 2% | 3% | 1728 | 0% | 23% | 17.94 | 1% | 23% |
| ca Canada | 10717 | 1% | 7% | 3098 | 1% | 17% | 23.96 | 1% | 23% |
| de Germany | 9874 | 1% | 0% | 2807 | 1% | 14% | 27.30 | 1% | 8% |
TABLE 6. SUMMARY OF TOP LEVEL DOMAINS
The statistics shown in the tables above were generated by three Perl scripts produced by NLANR:
access-extract.pl -hsummary_temp
access-extract-urls.pl -h >summary_temp
access-summary.pl summary_report
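The general idea of such a summary can be sketched in a few lines of Python. This is not the NLANR code: the function name `summarise_by_method` is hypothetical, and the log layout assumed here (whitespace-separated fields of timestamp, elapsed time, client, result code/status, bytes, method, URL) is an assumption that should be checked against the actual cache log format.

```python
from collections import defaultdict


def summarise_by_method(lines):
    """Tally requests, hits and Mbytes per request method.

    Each line is assumed to hold at least seven whitespace-separated
    fields: timestamp elapsed client code/status bytes method URL.
    A request is counted as a hit when its result code contains "HIT".
    """
    stats = defaultdict(lambda: {"count": 0, "hits": 0, "bytes": 0})
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or not fields[4].isdigit():
            continue                      # skip malformed lines
        code = fields[3].split("/")[0]    # e.g. TCP_HIT from TCP_HIT/200
        entry = stats[fields[5]]          # keyed by request method
        entry["count"] += 1
        entry["bytes"] += int(fields[4])
        if "HIT" in code:
            entry["hits"] += 1
    report = {}
    for method, entry in stats.items():
        report[method] = {
            "count": entry["count"],
            "hit_pct": 100.0 * entry["hits"] / entry["count"],
            "mbytes": entry["bytes"] / (1024.0 * 1024.0),
        }
    return report
```

Grouping by a different field, for example the client address or the host part of the URL, would give summaries along the lines of Tables 3 and 4.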
Ideally we wish to minimise resource access and retrieval times for our users and develop a scalable and robust cache infrastructure. In particular we wish to utilise our Metropolitan Area Network and its multicasting capabilities to achieve these objectives. There are however several issues that we believe require some further investigation. We look forward to suggestions and comments from members of the workshop.
We believe all the above issues apply to the national and international caching networks and should also be considered in this context. One other issue which needs to be resolved is that of dedicating bandwidth for inter-cache communication. It would be useful to have some idea of the effect this would have on latency or transfer times.
We have indicated the nature of our network and WWW infrastructures at Manchester. It is clear that as demand grows we will probably have to implement more sophisticated caching mechanisms and policies. However we believe some investigative work still needs to be done before we can confidently design a caching infrastructure for the future at Manchester.