noticeboard.ru.ac.za

2007/10/30 - Unscheduled Mail Server Outage
At about 1:30AM this morning, the University's incoming mail and IMAP server, elephant.ru.ac.za, crashed. As a result, users are currently unable to retrieve or read their e-mail. All incoming e-mail is being queued on the University's secondary mail exchanger. Outgoing e-mail should not be affected.

This outage appears to be a repeat of last week's. If so, the cause is the nightly backup trying to read from the disk faster than the disk can serve data. In that case, the machine will be back up within an hour or so, but responses are likely to be slow throughout the day while the disk array rebuilds.
This time around, things are slightly different. The machine didn't crash; it went into livelock. Whilst it did shut down, it didn't unmount its operating system disks cleanly, and as a result some operating system files have been lost. We've taken the mail server offline again until this can be resolved.

There's no reason to believe that any mail has been lost at this stage -- the disk that houses mail was unaffected by this problem.
Once again, the IMAP server's RAID array has entered a failed state and is currently rebuilding onto a hot spare. Users will notice slow responses when retrieving their mail, partly because of the disk overhead the rebuild imposes.

From past experience, the rebuild, which started about twenty minutes ago, will take approximately four hours. Once it has completed, we'd like to take the machine offline temporarily so we can check the physical hardware. It's not yet clear when this will happen, but it will likely be at short notice. We'll try to minimise the downtime and inconvenience.