Caching in the Washington State K-20 Network

Justin Pietsch
pietsch@cac.washington.edu
Networks and Distributed Computing
Computing & Communications
University of Washington 

In 1996 the Washington State Legislature appropriated $42 million for the K-20 Educational Telecommunications Network. Under the guidance of the K-20 Telecommunications Oversight & Policy Committee, the network is to be ". . . an integrated and interoperable educational technology network serving kindergarten through higher education and promoting access for Washington citizens." Phase I of the project will create a network for the six public State Baccalaureate Universities, the 34 Community and Technical Colleges (CTC), and the nine Educational Service Districts (ESD). Phase II will extend the network to the 294 school districts statewide.

Three distinct networks, K-12, CTC, and Baccalaureate, will be combined to form the K-20 Educational Telecommunications Network. The three networks will be connected to one another via the Seattle Network-to-Network Access Point. Each Phase I network node will be connected to the hub site by inverse-multiplexed DS1s, and initially the connected institutions will have 1.5 to 6 Mbps of bandwidth. The network is expected to be built over the next several months, in time for the beginning of the 1997-98 school year.
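
The quoted bandwidth range follows from the DS1 line rate. A DS1 carries a nominal 1.544 Mbps, and inverse multiplexing spreads traffic across several DS1s so that they behave as one wider link. The short sketch below assumes bundles of one to four DS1s per node (the bundle sizes are not stated here) and ignores framing and multiplexing overhead; `bundle_mbps` is a hypothetical helper name.

```python
# Nominal DS1 (T1) line rate in Mbps. The usable payload is slightly lower
# (1.536 Mbps) once framing is subtracted; this sketch ignores that overhead.
DS1_MBPS = 1.544


def bundle_mbps(num_ds1: int) -> float:
    """Aggregate nominal bandwidth of `num_ds1` inverse-multiplexed DS1s."""
    if num_ds1 < 1:
        raise ValueError("a bundle needs at least one DS1")
    return num_ds1 * DS1_MBPS


if __name__ == "__main__":
    # Assumed bundle sizes: one to four DS1s per Phase I node.
    for n in range(1, 5):
        print(f"{n} x DS1 = {bundle_mbps(n):.3f} Mbps")
```

Run as a script, it prints 1.544, 3.088, 4.632 and 6.176 Mbps, which spans the 1.5 to 6 Mbps range given above.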

From the beginning, the network was designed with caching in mind: both performance and bandwidth conservation were critical factors in the network design. Caching servers will be placed at each of the Phase I nodes, and each sector will use Pentium-based PCs. Because there are more than 45 network nodes, choosing less expensive machines lets us afford redundant cache servers, and Intel-based machines give us many options for software. Each planned machine will have a 166 MHz Pentium processor, five 2 GB disks (8 GB for caching, 2 GB for the system), and at least 128 MB of RAM (most will have 192 MB).
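
The node count and per-machine disk layout can be checked with simple arithmetic. The sketch below takes the Phase I site counts from this paper, assumes that the nine ESDs are the K-12 Phase I nodes, and applies the plan (stated in this paper) of two Squid servers at each K-12 and Baccalaureate node. The variable names are illustrative only.

```python
# Phase I sites, as counted in the text.
PHASE1_SITES = {
    "baccalaureate": 6,  # State Baccalaureate Universities
    "ctc": 34,           # Community and Technical Colleges
    "esd": 9,            # Educational Service Districts (assumed K-12 Phase I nodes)
}

# Per-machine disk plan: five 2 GB disks, with 2 GB reserved for the system.
DISKS_PER_SERVER = 5
DISK_GB = 2
SYSTEM_GB = 2

# Planned number of Squid servers at each K-12 and Baccalaureate node.
SERVERS_PER_NDC_NODE = 2

total_nodes = sum(PHASE1_SITES.values())
cache_gb_per_server = DISKS_PER_SERVER * DISK_GB - SYSTEM_GB
ndc_node_servers = SERVERS_PER_NDC_NODE * (
    PHASE1_SITES["baccalaureate"] + PHASE1_SITES["esd"]
)

print(f"Phase I nodes: {total_nodes}")                      # 49
print(f"Cache space per server: {cache_gb_per_server} GB")  # 8 GB
print(f"NDC-run node servers: {ndc_node_servers}")          # 30
```

The 30 node-level servers, before counting the parents at the hub, are consistent with the "30+ boxes" mentioned in the closing questions of this paper.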

Under the current plan, the K-12 and Baccalaureate caching hierarchies will be run together by Networks and Distributed Computing (NDC), which is part of Computing and Communications (C&C) at the University of Washington. Caching for the CTC network will be run by the Communications Technology Center (CTC). The K-12 and Baccalaureate caching servers will use Squid caching software running on the Linux operating system; the CTC caching servers will use Netscape Proxy Server running on Microsoft Windows NT Server. (For educational institutions, Netscape software is free and Microsoft Windows NT Server costs $45.)

Each of the three networks of caching servers will have parents at the K-20 hub site. These parents will also be run by NDC and will be based on Squid and Linux. Each of the three parent groups will use the other two as siblings. Under the current implementation plan for the K-12 and Baccalaureate caching servers, each network node will have two Squid caching servers that are siblings of each other, and each of these will point to its parent cache servers.
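
The parent and sibling roles can be illustrated with a simplified model. In Squid, a cache that misses asks its siblings (via ICP) whether they hold the object; a sibling may serve only hits, so a request that no sibling can satisfy goes to a parent. The sketch below is a toy model under that assumption, not Squid's code: it queries siblings one at a time rather than in parallel, ignores timeouts and object expiry, and all cache names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # identity equality: caches reference each other in cycles
class Cache:
    """Toy model of one caching server in a parent/sibling hierarchy."""
    name: str
    parent: Optional["Cache"] = None
    siblings: List["Cache"] = field(default_factory=list)
    store: Set[str] = field(default_factory=set)

    def fetch(self, url: str) -> List[str]:
        """Resolve `url` and return the caches consulted, in order."""
        if url in self.store:
            return [f"{self.name} (hit)"]
        # Ask each sibling whether it already holds the object. A sibling
        # answers only from its own store; it never forwards a miss.
        for sib in self.siblings:
            if url in sib.store:
                self.store.add(url)
                return [self.name, f"{sib.name} (sibling hit)"]
        # No sibling hit: go to the parent, or to the origin at the top.
        if self.parent is not None:
            path = [self.name] + self.parent.fetch(url)
        else:
            path = [self.name, "origin server"]
        self.store.add(url)  # keep a copy for later requests
        return path


# Hypothetical hub parents; the parent groups treat each other as siblings.
hub_k12 = Cache("hub-k12")
hub_ctc = Cache("hub-ctc")
hub_k12.siblings = [hub_ctc]
hub_ctc.siblings = [hub_k12]

# Two sibling servers at one hypothetical node, both pointing at the hub parent.
node_a = Cache("node1-a", parent=hub_k12)
node_b = Cache("node1-b", parent=hub_k12)
node_a.siblings = [node_b]
node_b.siblings = [node_a]

# A server in the CTC tree, under the CTC parent.
ctc_x = Cache("ctc-x", parent=hub_ctc)

url = "http://www.example.com/"
print(node_a.fetch(url))  # ['node1-a', 'hub-k12', 'origin server']
print(node_b.fetch(url))  # ['node1-b', 'node1-a (sibling hit)']
print(ctc_x.fetch(url))   # ['ctc-x', 'hub-ctc', 'hub-k12 (sibling hit)']
```

The last line shows the intended payoff of making the parent groups siblings: an object first fetched through the K-12 tree can be served to the CTC tree from the hub without another trip to the origin server.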

This K-20 cache deployment will provide a testbed for the concepts of cache hierarchies and cache siblings. We are very interested in the idea of using many less expensive caching servers rather than a few high-end ones. If this approach works well, it will be easy and inexpensive to add more machines as the load grows, rather than having to replace or upgrade existing ones.
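
The scaling argument can be made concrete with simple capacity arithmetic. The sketch below assumes identical machines, load measured in requests per second, and a configurable number of spare machines held for redundancy; the function names and figures are hypothetical, not measurements from the K-20 network. It also shows why many small machines degrade more gracefully: losing one of n equal machines removes only 1/n of the capacity.

```python
import math


def servers_needed(peak_load: float, per_server: float, spares: int = 1) -> int:
    """Machines needed to carry `peak_load`, plus `spares` held for failures.

    Both loads are in the same unit (for example, requests per second).
    """
    if peak_load < 0 or per_server <= 0 or spares < 0:
        raise ValueError("invalid load, capacity, or spare count")
    return math.ceil(peak_load / per_server) + spares


def capacity_after_failure(n: int) -> float:
    """Fraction of total capacity left when one of `n` equal machines fails."""
    if n < 1:
        raise ValueError("need at least one machine")
    return (n - 1) / n


if __name__ == "__main__":
    # Hypothetical figures: load grows from 150 to 250 requests/s,
    # and each machine handles 100 requests/s.
    for load in (150, 250):
        print(load, servers_needed(load, per_server=100))
    # Two large machines versus eight small ones.
    print(capacity_after_failure(2), capacity_after_failure(8))
```

With these made-up figures, growing from 150 to 250 requests per second means adding one machine (3 to 4, including one spare), and a single failure leaves 50% of capacity with two machines but 87.5% with eight.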

In the K-12 network, the caching hierarchy will be four levels deep: from the top at the K-20 hub site, to the ESD, to the school district main office, and finally to the individual school building. As of this writing, we have little experience with caching hierarchies. NDC has been running several caching servers for over six months, but their clients connect to them over Ethernet, and the University is directly peered with our ISP, NorthWestNet, which has multiple DS3 connections. The differences in latency and bandwidth between our campus and the K-20 network, with its 1.5 to 6 Mbps DS1 links, could prove significant, and making the hierarchy successful could be more involved than it would be on our own campus.
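
One way to reason about whether a deep hierarchy pays off is to estimate the expected latency per request. In the sketch below, each level adds a round-trip hop cost that every request reaching it pays, and only that level's misses continue upward; requests that miss at every level also pay the trip to the origin server. The hop costs, hit ratios, and level names are hypothetical placeholders, and the model ignores transfer time, server load, and sibling queries.

```python
from typing import Sequence, Tuple

# (level name, extra round-trip ms to reach this level, hit ratio at this level)
Level = Tuple[str, float, float]


def expected_latency_ms(levels: Sequence[Level], origin_ms: float) -> float:
    """Expected latency of one request walking up a cache hierarchy.

    `levels` is ordered from the cache nearest the client to the top of
    the hierarchy. Each hit ratio is conditional: the fraction of requests
    reaching that level which it can answer.
    """
    p_reach = 1.0  # probability that a request gets this far up
    total = 0.0
    for _name, hop_ms, hit_ratio in levels:
        if not 0.0 <= hit_ratio <= 1.0:
            raise ValueError("hit ratio must be between 0 and 1")
        total += p_reach * hop_ms    # every request reaching the level pays the hop
        p_reach *= 1.0 - hit_ratio   # only misses continue upward
    return total + p_reach * origin_ms  # misses everywhere go to the origin


if __name__ == "__main__":
    # Hypothetical four-level K-12 path; replace with measured values.
    k12 = [
        ("school building", 2.0, 0.20),
        ("district office", 10.0, 0.15),
        ("ESD", 15.0, 0.15),
        ("K-20 hub", 20.0, 0.30),
    ]
    print(f"with caching: {expected_latency_ms(k12, origin_ms=150.0):.1f} ms")
    # Same links with no caching: every request pays all hops plus the origin.
    no_cache = [(name, hop, 0.0) for name, hop, _ in k12]
    print(f"without caching: {expected_latency_ms(no_cache, origin_ms=150.0):.1f} ms")
```

The hop costs here stand for network round trips only; the per-cache processing delay that a real hierarchy adds at every level a request passes through is left out and would be the natural next term to add.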

Another critical factor is how to get K-20 end users to use this caching service. We do not plan to block outbound port 80 or otherwise force end users onto the caching servers, so end users must choose to use them voluntarily. We intend to work with representatives from each sector to explain the benefits of caching, and we hope this will lead to more widespread use.
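
Because use of the caches is voluntary, each client has to be pointed at a cache explicitly, typically in the browser's proxy settings. The same opt-in step for a Python client, using the standard library's `urllib.request.ProxyHandler`, is sketched below. The host name is a hypothetical placeholder, and port 3128 is assumed because it is Squid's default HTTP port.

```python
import urllib.request

# Hypothetical cache address; 3128 is Squid's default HTTP port.
CACHE_PROXY = "http://cache.example.edu:3128"


def make_cached_opener(proxy_url: str = CACHE_PROXY) -> urllib.request.OpenerDirector:
    """Build an opener that sends plain HTTP requests through the given cache."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_cached_opener()
# opener.open("http://www.example.com/") would now fetch the page via the cache.
```

Without such a setting, or an equivalent `http_proxy` environment variable (which `urllib` also honors), requests go directly to the origin servers.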

Still another unresolved detail is the role that the individual campus caching servers will play. We do not know whether they will in turn be parents for each site's own caching servers, whether individual clients will point directly to the K-20 caching servers, or both. The caching requirements of a large university with over 20,000 students will differ from those of a community college, which will in turn differ from those of an ESD.

We have an exciting opportunity to learn about caching in this project. How well does a caching hierarchy really work? How much will the three networks differ from one another as caching environments? How well does Netscape Proxy Server use Squid as a parent? How does one organization manage 30+ boxes all across the State without leaving the central office?
last modified 4/25/97 by Justin Pietsch