Impact on Chemistry's IT of an unexpected power outage lasting about 40 minutes.

Summary

  • Parts of the recovery from this power outage went fairly well; others less so.
  • Cluster head nodes and compute nodes & drives seem to be in good shape.

Things we know failed:

248 Baker

| Topic or event | Action taken | Action required | Notes |
|---|---|---|---|
| ChemIT UPS in cluster rack died. Would not turn on. | Moved Widom HeadNode to Loring UPS; moved C4 to Loring UPS; moved switch (Collum+Widom) to Collum UPS (Lulu add); removed ChemIT UPS and plugged it in to charge, just in case. | | |
| ChemIT UPS for Windows servers: limited battery power, so Hyper-V machines were not able to shut down gracefully. | Even after 24 hours of charging, Synology shows it with only 672 seconds of battery life. Probably not even that. | Needs battery or replacement. | |
| Schernology (Scheraga's Synology) could not be accessed to cleanly shut down since the network was not working. | Lulu pushed the power button to shut down the system. It should go to safe mode when the battery gets too low. Could have let the UPS shut it down cleanly if we had wanted to wait (confirm true?). | Lulu investigating options. | |
| Mathematica license server did not start after restart. | Restarted manually. (Common issue.) | | |
| NMR Web server didn't start up right. | Needed to be kicked a bit to start. | This Gateway needs to go. | |
| NMR router puked. | Reprogrammed passwords and forwarding for SSH & RDP. | Needs external administration access configured. | |

Other

| Topic or event | Action taken | Action required | Notes |
|---|---|---|---|
| Lee - Steven Lee's SGI had several problems: boot, date, etc. | Lulu wrestled with resetting the (hardware) time. | Advise getting a UPS? If so, how to automate shutdown if no monitor is powered? | INC000001652417 |
| Marohn - B19 PSB's AS-CHM-Maro-03 RAID-1 had a drive fail. | Oliver tested the drive OK, wiped it, and re-added it to the RAID. | Should the group invest in more risk management, including a UPS? | INC000001652216 |
| Fors - Fors instrument came up and started working. Were both the computer and the instrument previously rebooted? | Roger asked the group about the reboot history of the instrument. | Awaiting the group's response (Dillon) on the reboot history of the instrument. | INC000001645245 |
| Petersen - His group's UPS died. | Roger got him a quote. | | INC000001653506 |
| VoIP phones did not boot up correctly and hung. | Oliver wrote to CIT. | | CIT: INC000001653941 |