noticeboard.ru.ac.za

2007/03/18 - Student Networking Outage
The Student Networking firewall located in the Africa Media Matrix has crashed twice in the last 24 hours. The most recent crash was just after midnight last night, at which point the firewall server went into livelock and could not be reset remotely. As a result, most of the residences located on the upper half of campus have been without networking for the majority of the morning.

This outage is unrelated to the planned work that is going to be carried out this afternoon.

The cause of the instability isn't yet known, but we're investigating a number of different possibilities. One of these possibilities involves misconfigured device(s) in Centenary house. As a result, we've disconnected Centenary house from the network in order to establish or rule out networking in that residence as a cause.

Networking for the rest of the upper half of campus should be restored shortly.

We've pretty much ruled out Centenary house as the cause. The third and fourth crashes happened in quick succession, both while Centenary was disconnected from the network. We've now tried a different tack, changing some kernel configuration variables. We're going to leave Centenary disconnected just in case, though -- if we get beyond six hours of uptime we'll reconnect them.

The machine has now been up for about eight hours. Centenary house got networking back around two hours ago.

The problem persists. The firewall went into livelock again over lunch today (2007/03/19). We've taken steps to try to ensure it recovers more quickly, but we're still uncertain as to why it is happening. Unfortunately, we have to make intelligent guesses and eliminate things by trial and error. This means that until we work out what's going on, the system is likely to remain unstable.