noticeboard.ru.ac.za

2005/05/18 - Jackal Disk Repair
Jackal experienced a critical error at about 9:00. The disk is now being repaired, a process which will take about an hour, i.e. it should be completed at about 11:00.
The disk repair on Jackal is taking longer than originally estimated. At the present rate, the expected completion time is nearer 12:45.
The first attempt at rebuilding the disk on Jackal failed, unfortunately. Last time this happened, a second attempt at rebuilding the disk worked, and a second attempt is now in progress. Hopefully it will work, in which case the system should be up again at about 14:00.
We are now running a 3rd repair on the disk, as the 2nd was only a moderate success. It seems that eDirectory (the user database) has been corrupted on the server, and we hope a 3rd pass on the disk will fix it. This should be complete by about 20:00.
We have been unable to fix the eDirectory damage on Jackal this evening. We will be working on this early tomorrow and will get outside assistance if necessary.
We have been unable to repair the damaged eDir database on Jackal. The two options available now are to call for outside assistance from Novell, or to reinstall the server. The first option is potentially the quickest and is the likely next step. In either case it looks very much like Jackal will be unavailable today.

Unfortunately, we are also unable to get online help because of the Telkom problem with our Internet connection.
We are getting telephonic assistance now, and have removed the damaged eDir database from Jackal. We are busy adding a new replica from the server holding a master. The time this takes depends on the size of the database and other factors.
With the very friendly help of one Fanie Joubert, Jackal is now up. We had to delete the server object from eDirectory to re-install the eDir database on the server, so there are a few things that still need attention. The server will probably need to be rebooted a few times over the next couple of days.

The good news is that no data was lost.

We are still not sure what caused the problem, so please remember to keep backups of work in progress.