noticeboard.ru.ac.za

2012/07/31 - Disk Failure in Main Web Server
At about 15:30 yesterday (Tuesday, 31 July), the University's main web server suffered multiple hard disk failures. This caused the University's main web page, www.ru.ac.za, to be unavailable for about four hours.

Web sites affected by this failure include:
  • www.ru.ac.za
  • ross.ru.ac.za
  • bots.ru.ac.za, corymedia.ru.ac.za, cran.ru.ac.za, desktopnotices.ru.ac.za, display.ru.ac.za, files.ru.ac.za, gallery.ru.ac.za, ginx.org.za, help.ru.ac.za, librarymedia.ru.ac.za, noticeboard.ru.ac.za, people.ru.ac.za, software.ru.ac.za, studsoft.ru.ac.za, virtualplant.ru.ac.za, et al.

At present the University's main web page is being served from our off-campus disaster recovery site. Unfortunately not all of the content is available; only content that existed within the TERMINALFOUR content management system is available. Any content stored on the original web server itself is not (this includes all of http://www.ru.ac.za/static/). In addition, content is not necessarily being synchronised between the CMS and the DR web server.

Attempts to access content that doesn't currently exist will redirect you to an error message. Attempts to access other sites that used to be hosted on the University's main web server will do the same.

On investigation, we discovered that two disks had failed. The first of these was in a RAID array, and the RAID array appears to have rebuilt successfully; the second hosted the CRAN mirror, and appears to be irrecoverably damaged.

As a result of the disk failures the web server suffered extensive file system corruption. This primarily affected the operating system, and does not appear to have affected web content. (It affected files that were changing at the time the disks failed.)

At the moment we're trying to determine the extent of the damage. Once we've ascertained that, we'll move sites back from the DR site one by one.

This site, noticeboard.ru.ac.za, was the first to move back and is currently served off the original web server. Other sites will follow. www.ru.ac.za will likely be last.

All sites bar www.ru.ac.za have now moved back to their original home; www.ru.ac.za will likely return later this evening.

If you experience any problems with sites affected (particularly pages that might be missing, but were there on Monday), please report them to the unit responsible for the site -- i.e. the Web Unit for www.ru.ac.za, the Data Management Unit for ross.ru.ac.za, the IT Division for software.ru.ac.za, Grounds & Gardens for bots.ru.ac.za, and so on.

Last night we moved all services back to the old web server, and things looked mostly okay.

However, I've just noticed that there is some corruption in the backend database for gallery.ru.ac.za. This is causing the database software to crash each time the site is accessed. To keep other sites that depend on this database available, I've temporarily redirected gallery.ru.ac.za back to our off-campus site. Anyone trying to access gallery.ru.ac.za will now get an error. We'll look into the problem in more depth in the morning.

gallery.ru.ac.za has now been recovered from backups.