noticeboard.ru.ac.za

2010/07/05 - Unscheduled Internet Outage
At about 11:30 this morning, Rhodes lost access to the Internet outside South Africa. You will likely experience this outage as an ERR_DNS_FAIL or ERR_CONNECT_FAIL error when trying to access websites hosted internationally (including .za websites hosted outside the country). When accessing secure (https://) web sites, your browser may report that the connection has timed out or the site is unreachable.

We will follow up with TENET and further information will be posted here as we receive it.
We have received the following notification from TENET:

QUOTE
A fault on SEACOM has caused a complete failure of international
bandwidth to all TENET sites. The nature of the fault is not yet known,
though it is not suspected to be a cable break; nor is the estimated
time to repair.


Further details will follow as we receive them.
A further update from TENET:

QUOTE
SEACOM report that the fault is between Mumbai and Mombasa, and that a
repeater has failed. More information is expected shortly.
SEACOM have published an initial statement on their web page. This states that the faulty repeater has been located and that a cable ship has been dispatched to the site. They'll have to lift the cable, replace/repair the faulty repeater, and then drop it back into the ocean. They say that "the overall process may last a minimum of 6-8 days", but that they're actively negotiating an alternative to provide access whilst repairs are done.

We're told that there'll be a further announcement, hopefully containing details of the alternative plans, in the next couple of hours.

Rhodes currently has an interim plan that allows incoming and outgoing e-mail to continue working, albeit very slowly (this facility exists during most prolonged Internet outages).
During the course of last night, TENET's SEACOM connectivity was restored. It appears that this is an anomaly; most other SEACOM customers still have no connectivity, and SEACOM's NOC shows TENET's circuit as down. As such, we can expect international Internet access to fail again at any point.

The cable ship is still en route, and repairs have not yet commenced. No alternative route has yet been found. As such, the original 6-8 day prediction remains the best-case scenario at this stage.
TENET's international links failed again about an hour ago. Other than that, the situation remains essentially unchanged: the SEACOM undersea cable appears to need repairs that will take a minimum of a week, and TENET are investigating alternative methods to provide international access.
We've received word from TENET that they're now treating the SEACOM outage in terms of their disaster management strategy, and are attempting to procure their own connectivity via another cable rather than waiting for SEACOM to do so.

If and when this happens, it is likely that we'll only have a small fraction of our normal Internet bandwidth available to us. We've therefore been asked to proactively conserve Internet bandwidth and to ensure that it's only used to perform essential tasks. One of the first steps we'll take in this regard is to adjust the demand-side management values currently in force; we may also need to take more drastic steps, depending on the amount of Internet bandwidth TENET is able to procure and the absolute demand for bandwidth generated by the university.
We currently have international access again, and have had it since about 5 AM. This is functioning by the same unexplained anomaly (IP-over-Mermaid?) that restored access yesterday morning. As such, we should not expect it to remain stable or reliable.

The status quo remains essentially unchanged: SEACOM's cable ship is en route to do repairs, and they expect the outage to last 6-8 days. TENET is looking for alternatives via other cables, but has not yet had any success.
QUOTE
Service on the SEACOM cable is up and down - more up than down today, and currently up. SEACOM still have no definitive diagnosis of the failure, and have not changed their estimate of six to eight days to repair. Why we're getting any service at all is something of a mystery, and it's entirely possible that the intermittent service we've been having will finally die completely - or it may stabilise and remain that way until the repair ship arrives, at which point there will be some hours of further downtime while the actual repair is effected

Unfortunately, rather than stabilise, at the moment things appear to be getting worse. Hopefully, as with the two preceding nights, things will improve in the early hours of the morning. Maybe the mermaids needed some sleep?
TENET have managed to make arrangements with other South African ISPs to provide limited international access via other cable systems (TEAMS & SAT-3) until such time as SEACOM is restored. This capacity will hopefully be available during the course of this evening or early tomorrow.

Note that this means that there'll be substantially less bandwidth available than normal. As such you are requested to limit your use of international bandwidth, particularly during working hours. Only business-critical functions should be performed during these times. In this regard, we've made some emergency changes to the demand-side management system to try and enforce this behaviour to some extent. Please be aware that, unless we have your co-operation in this regard, it may become necessary to introduce further restrictions.
TENET have managed to bring 100Mbps of international bandwidth online. This is about 1% of their usual capacity, so we can expect things to be very, very slow and congested. This is particularly true at the moment as a number of institutions start sending and receiving e-mail that's been queued for the last few days (we're fortunate that we had alternative arrangements in place for e-mail, and so avoided this problem; many universities did not).

Once again, we appeal to you to be very conservative in your use of international bandwidth until SEACOM is fully repaired. If we can't collectively manage our usage, it is likely that further temporary restrictions on Internet use will be introduced.
TENET has managed to bring a bit more international bandwidth online, which will help relieve the situation. They're now operating at roughly 1/3 of their normal peak load, so we still need to be careful and conservative about how we use international bandwidth.
SEACOM's current estimate is that repairs will be complete by 22 July -- considerably worse than their first estimate of six to eight days. This estimate is based on their assumptions about the root cause of the problem, and is subject to external factors such as weather. As such it is not inconceivable that the actual time to repair will exceed this.

At the moment we're relying on the limited international bandwidth TENET has secured. Thanks to our collective attempts to manage our use of this bandwidth, thus far things are working reasonably well. We cannot become complacent, however. We still need to be conservative in our use of international Internet bandwidth.

Unfortunately the estimated time to repair will take us into the start of the third term. With the return of undergraduate students, we expect that the demand for international bandwidth at Rhodes (and at other institutions) will increase. We'll need to manage this fairly carefully to ensure that we don't exceed the limited supply that's available, and particularly so that business-critical functions can continue. As such, you can expect changes to our Internet quota and demand-side management systems starting on about 17 July -- things are likely to become fairly restrictive until normal capacity on SEACOM is restored.
QUOTE(guy @ Jul 12 2010, 12:40 PM)
As such, you can expect changes to our Internet quota and demand-side management systems starting on about 17 July -- things are likely to become fairly restrictive until normal capacity on SEACOM is restored.

Information about these changes is now available at http://noticeboard.ru.ac.za/post.5532296.
You may have seen reports in the media (and even on SEACOM's site) about restoration capacity being made available by TEAMS out of Mombasa. Unfortunately, TENET has not been able to take advantage of this for sound technical reasons -- they simply don't have the right interfaces to connect to the cable, and cannot procure them in any reasonable time-frame. As such, at present our restoration bandwidth is coming from two South African ISPs who are making capacity available to us via the SAT-3 cable and satellite. TENET currently has about 750Mbps of international bandwidth available; the aggregate use of all the universities usually peaks at about 2300Mbps.
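
For anyone wanting to put those two figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the 750Mbps and 2300Mbps numbers quoted above; the variable and function names are made up for illustration, and it assumes (simplistically) that demand at peak would be the same as usual.

```python
# Figures quoted in the post above, in Mbps. Names are illustrative only.
available_mbps = 750     # restoration bandwidth TENET currently has
usual_peak_mbps = 2300   # usual aggregate peak use across all universities


def fraction_of_peak(available: float, peak: float) -> float:
    """Return available bandwidth as a fraction of usual peak demand."""
    return available / peak


share = fraction_of_peak(available_mbps, usual_peak_mbps)
shortfall_mbps = usual_peak_mbps - available_mbps

# Assumes peak demand is unchanged from normal, which is a simplification.
print(f"Available: {share:.0%} of usual peak; shortfall at peak: {shortfall_mbps} Mbps")
```

In other words, the restoration capacity covers roughly a third of the usual peak, which is why usage needs to stay well below normal.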

During the course of yesterday, ASAUDIT (the Association of South African University Directors of Information Technology) met with TENET to discuss the issues around the SEACOM outage. The discussion looked at two separate aspects: how to deal with the immediate crisis, and how to mitigate the risk of a prolonged SEACOM outage in the longer term.

In terms of the immediate crisis, most universities, like Rhodes, raised the issue of returning students. TENET undertook to try and acquire additional restoration capacity to address this, but have not been successful thus far. There's more information about this at http://noticeboard.ru.ac.za/post.5532296.

In the longer term, the aim is to address the SEACOM risk by contracting with another provider for capacity on another submarine fibre cable, preferably up the West coast of Africa. There are a couple of options available in this regard, but it would be premature to publicise them at this point. Suffice it to say, the problem is being taken fairly seriously -- nobody would like a repeat of the current outage.
A fibre break in Jo'burg meant that international sites were unavailable between 18:20 and 22:05. The reason the break affected us is that the temporary restoration bandwidth we're using at the moment is provisioned via the Jo'burg Internet exchange.
post.5532286