From about 2.30PM yesterday (Tuesday 18 April) there was a general network outage on the St Peter's campus. This outage will have affected all sites hanging off the Music regional switching centre including, but not limited to, the Law, Music & Education departments, Salisbury, Winchester and Canterbury houses, and Environmental Education.
This outage was our core switch's response to an abnormally high amount of broadcast traffic originating from the regional switching centre in the Music department. This sort of problem usually occurs when someone intentionally or unintentionally creates a closed loop in the network (by, for instance, plugging the same device into two different ports on a switch). The reason for this is that loops act as repeaters: they repeat the same broadcast traffic over and over as fast as they can. The outage occurred when the number of broadcast packets from the Music switching centre exceeded 10,000 per second (a normal volume is less than 100 per second), peaking at around 23,000 per second late yesterday afternoon. This triggered protection mechanisms on the core switch that are designed to contain and limit the impact of this traffic. If these protection mechanisms hadn't activated, it is likely that this fault would have caused a far larger outage spanning most of the campus.
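As a rough illustration of the threshold behaviour described above, the sketch below classifies a broadcast rate using the figures quoted in this notice. It is not the configuration or logic of our core switch; the function name, the constant names and the three labels are hypothetical and exist only for this example.

```python
# Figures taken from this notice; all names here are hypothetical.
NORMAL_MAX_PPS = 100         # normal broadcast volume is below this
TRIP_THRESHOLD_PPS = 10_000  # protection activated once this was exceeded


def classify_broadcast_rate(pps: int) -> str:
    """Classify a broadcast rate, in packets per second, for one uplink."""
    if pps > TRIP_THRESHOLD_PPS:
        # Rate exceeds the trip threshold: traffic would be contained.
        return "contain"
    if pps >= NORMAL_MAX_PPS:
        # Above normal volume but not yet over the trip threshold.
        return "elevated"
    return "normal"
```

On these assumptions, yesterday's peak of around 23,000 packets per second falls well inside the "contain" range, while everyday traffic of under 100 packets per second is classed as "normal".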
What was unusual about this outage was the scope of the problem. The vast majority of our campus, including all of St Peter's, runs on managed switching. One of the main selling points of this technology is the ability to detect and break loops close to their source, thus limiting their impact. This doesn't appear to have happened in this case, which causes us some concern. At present, the only logical explanation we have is that someone has installed an unauthorised, unmanaged device on the network in violation of the University's acceptable use policy. We were unable to confirm this yesterday since, by the time we figured out what was going on, most people had already left for the day.
So, whilst we haven't yet determined the exact cause of the problem, we did manage to isolate it to two individual switch ports during the course of yesterday afternoon. As a result, most of St Peter's campus should have had normal networking restored by about 4PM yesterday.
Investigations into the cause, and more specifically into why the protection that should be offered by our investment in managed switching failed to detect and resolve this, will continue during the course of today (Wednesday).