Matrix currently unavailable due to a problem with its data storage.

Situation

As of 10/3/13, 3pm:

Addressing this issue is expected to take days, not hours. Just copying the data takes a day or more, let alone doing all the work required to diagnose and solve the problem.

We still hope that no data on the data storage system has been irretrievably lost. However, the situation is precarious since two fail-safes have failed.

  • Since the system can only tolerate the loss of 2 hard drives (of 6), we are now at high risk: 2 of the hard drives seem to have failed, and a third is now showing warning signs.

Status

10/3/13, afternoon: ChemIT has placed an order for a 3TB hard drive for Matrix.

  • ChemIT has also ordered, for its own general use, a 4TB consumer drive. This can be used for a short-term backup in this situation, further decreasing the risk of data loss by enabling one more copy of the data.

10/3/13, 2:45pm: 3TB hard drive approved by Harold Scheraga. ChemIT has placed an order.

10/3/13, noonish: Using the data on 4 hard disks, RAID 6 is reconstituting the data on the other 2 drives, each of which tested OK separately with a "quick" test.

  • 2% done after over 3 hours...
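As a rough guide to what that progress figure implies, the sketch below extrapolates the total rebuild time. It assumes the rebuild proceeds at a constant rate, which may not hold in practice, and the function name is made up for illustration. With 2% done after 3 hours, the estimate comes to about 150 hours, or a little over 6 days, and since the 2% mark was reached after "over" 3 hours, that is a lower bound.

```python
def estimate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate the total duration from the progress so far.

    Assumes a constant rate of progress (an assumption, not a measured fact).
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


elapsed = 3.0          # hours elapsed so far (from the status note above)
fraction_done = 0.02   # 2% complete

total = estimate_total_hours(elapsed, fraction_done)
remaining = total - elapsed
print(f"Estimated total: {total:.0f} h, about {total / 24:.2f} days")
print(f"Estimated remaining: {remaining:.0f} h")
```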

Plan

Get a copy of the data off the system, as a precautionary measure. This process may take days, not hours.

  • Confirm that data copy is complete.
  • Further duplicate that data, especially before deleting any original data.
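One possible way to confirm that a copy is complete is to compare checksums of every file in the original against the copy. The sketch below is illustrative only: the function names are hypothetical, and it is not a description of the procedure ChemIT actually uses. Note that hashing re-reads every source file, which puts extra load on an already degraded array, so it may be preferable to verify one copy against a second copy instead.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source: Path, copy: Path) -> List[str]:
    """List files that are missing from the copy or whose content differs."""
    problems = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = copy / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel}")
    return problems


# Example call (both paths are placeholders, not real mount points):
# problems = compare_trees(Path("/mnt/original"), Path("/mnt/backup"))
# An empty list means every source file has an identical counterpart in the copy.
```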

Install new hard drive.

Further analyze one of the two suspect hard drives to try to isolate the source of the data corruption.

Notes

On 10/2/13 (Wed), Matrix became unavailable.

System has 8 drives on a hardware controller.

  • 2 (150GB) hard drives for the OS, RAID 1 ("6" and "7").
  • 6 (3 TB) hard drives for the data storage ("0" - "5").
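To make the risk concrete, the sketch below works through the arithmetic for the data array, assuming it is a single RAID 6 set across all six 3 TB drives (consistent with the notes on this page, but the helper function and its field names are made up for illustration). RAID 6 stores two parity blocks per stripe, so two drives' worth of space goes to redundancy and any two drives can fail without data loss.

```python
def raid6_summary(drive_count: int, drive_tb: float, failed: int) -> dict:
    """Summarize a RAID 6 array: nominal usable capacity and remaining tolerance."""
    parity_drives = 2  # RAID 6 keeps two parity blocks per stripe
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return {
        # Nominal capacity before filesystem overhead
        "usable_tb": (drive_count - parity_drives) * drive_tb,
        # How many more drives can fail before data is lost
        "further_failures_survivable": max(parity_drives - failed, 0),
        # True once more drives have failed than parity can cover
        "array_failed": failed > parity_drives,
    }


# Six 3 TB data drives with two suspected failures, as described on this page
summary = raid6_summary(drive_count=6, drive_tb=3, failed=2)
print(summary)
```

With two drives out, the array can still serve data but has no redundancy left, which is why a third drive showing warning signs is a serious concern.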

OS hard drives seem fine.

However, the system refuses to work with all 6 data hard drives. We can only get it to work with 4 of them, so we initially suspect that two hard drives have failed.
