noticeboard.ru.ac.za

2014/04/02 - Scheduled maintenance: AMM data centre
During the April major maintenance window, electrical maintenance work affecting all services hosted in the Africa Media Matrix data centre will be performed in addition to other work already announced. The work in AMM will commence at 5.30 AM on Wednesday 2nd April, and should be completed before the end of the maintenance window at 7.30 AM.

A fault has developed in the static bypass module of the uninterruptible power supply (UPS) in our AMM data centre. The UPS provides electrical power to all servers and network infrastructure hosted in the data centre, and failure of the UPS would result in the loss of all services in the data centre. In order to try to prevent this, emergency maintenance will be performed by an outside service provider on the static bypass module. Neither the fault nor the subsequent maintenance work is expected to cause a power outage in the data centre, but that remains a possibility until the problem is fixed.

The early morning start, whilst not ideal, is the only time the outside provider could accommodate us that still fitted within the maintenance window. If all goes to plan, work will be completed well ahead of the 7.30 AM end of the maintenance window. However, there is a risk that, if things go wrong, we may overrun the end of the maintenance window into the start of lectures and the working day.

In order to mitigate the potential impact of this, most core services that run out of AMM will be temporarily migrated to our Struben data centre during the previous evening. Whilst this should prevent outages of any critical services, it may result in temporary capacity problems in Struben. This would typically be experienced as some services running slower than usual, which will be more noticeable if we overrun and load starts to pick up.

The ITSC-approved maintenance windows extend from 5.30 PM on a Tuesday evening until 7.30 AM on the following Wednesday morning. This is the first time we've used both the evening and morning of a maintenance window, so please take careful note of the dates and times. More information about maintenance windows and maintenance periods is available.
Three servers (elijah.csa.ru.ac.za, stream.cc.ru.ac.za, weatherstation.ru.ac.za) have had to be shut down for the duration of the maintenance window because, due to their networking configurations, they can only be hosted in the AMM data centre.
The work on the UPS in AMM did result in a power outage in the data centre. It was also not completed, largely because the supplier neglected to bring the requisite spares with them. As a result, this work will need to be rescheduled, likely into one of the minor maintenance periods.
A number of servers (including canna, gnu, lists, stanleyaccess, and others) were unavailable between about 3 AM and 8 AM. One of the host servers in the Struben data centre became unresponsive, and as a result all of the virtual machines residing on it became unresponsive as well. This may or may not be related to the work done in preparation for the power maintenance in AMM.