2007/10/22 - Unscheduled Mail Server Outage
At about 11:30PM last night, the University's incoming mail and IMAP server crashed for as yet unknown reasons. As a result, users are currently unable to retrieve or read their e-mail. All incoming e-mail is being queued on the University's secondary mail exchanger. Outgoing e-mail should not be affected.

We are currently investigating the problem and will post more details here when they are known.
Whilst the mail server has been up since about 8:30AM, we're still seeing a number of knock-on effects caused by this:
  • The mail server is busy trying to process a backlog of mail. As a result, some incoming mail is significantly delayed. This is affecting mail that was sent after the mail server came online again.
  • The backlog is causing a large amount of disk I/O and a high CPU load, which is in turn affecting people trying to retrieve mail. Users will notice this as their mailbox being slow to open, and mail taking longer than usual to be transferred.
  • The backlog is also causing the number of incoming connections to occasionally spike beyond what the machine can handle. This will show up as an SMTP outage on our monitoring system.
  • The crash corrupted a replicated database that exists on other machines. This will have affected users of ROSS, and some sites that use ROSS's data for authentication (mealserver, studentzone, etc.). We only became aware of the problem shortly after lunch, and have since repaired and restored the affected database. This issue should now be resolved.
  • The filesystem on one of the disks in elephant is corrupt and unrecoverable. Fortunately this disk didn't contain any significant data.

We haven't yet determined the cause of the crash.
It appears that one of the disks in elephant's RAID array has gone offline. The RAID array is currently rebuilding on a hot spare. This will be contributing to the current slow performance that we're seeing from the machine.