Action / Notes
Schedule / notify representative / users

Communication timeline
Evaluate hard drives (system left in operation)

Review all:

  • SMART
  • dmesg
  • /var/log/messages
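
The three reviews above could be scripted roughly as follows. This is a minimal sketch, not an established procedure: the function names and the keyword pattern are illustrative assumptions, the SMART and dmesg helpers need root, and drives behind a RAID controller may need an extra -d option for smartctl.

```shell
#!/usr/bin/env bash
# Sketch: pre-maintenance drive review. Names and patterns are assumptions.

# Count lines in a log file that match common disk-error keywords.
count_disk_errors() {
    local logfile="$1"
    grep -ciE 'i/o error|medium error|ata[0-9]+.*(failed|error)|smart.*fail' "$logfile"
}

# Print the overall SMART health verdict for each device smartctl detects.
smart_health_all() {
    smartctl --scan | awk '{print $1}' | while read -r dev; do
        echo "== $dev"
        smartctl -H "$dev"
    done
}

# Show only error- and warning-level kernel messages.
kernel_warnings() {
    dmesg --level=err,warn
}
```

Example use: count_disk_errors /var/log/messages, then smart_health_all and kernel_warnings as root. A non-zero count is only a prompt to read the matching lines, not a verdict on the drive.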

Verify backups and related (system left in operation)

Back-in-time (local versioning)

  • Q: What should be checked to verify this?

EZ-Backup (remote backup)

  • Review log for recent uploads.
  • Q: Occasionally spot-check restoring a random, recent file?
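
If the spot-check idea is adopted, picking the file and comparing the restored copy could look like the sketch below. The restore itself is left out because it depends on the backup client; pick_recent_file and restore_matches are hypothetical helper names, and file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Sketch: choose a random recent file and compare it with a restored copy.

# Pick one random regular file under DIR modified within the last DAYS days.
pick_recent_file() {
    local dir="$1" days="${2:-7}"
    find "$dir" -type f -mtime "-$days" | shuf -n 1
}

# Succeed only if the restored copy is byte-identical to the original.
restore_matches() {
    cmp -s "$1" "$2"
}
```

Restore the chosen file to a scratch location with the backup client, then run restore_matches ORIGINAL RESTORED. A mismatch can also mean the file changed after the last upload, so check the timestamps before treating it as a backup failure.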
Disable outside access 

Run touch /etc/nologin, or create the file with text (vi /etc/nologin). Any text in the file is shown to users whose login is refused.

This file will be removed automatically after system reboot.
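
A small sketch of the same step as two helpers, so the message is written in one command and access can be restored later. The NOLOGIN_FILE variable and the function names are assumptions made for illustration; writing to /etc/nologin requires root.

```shell
#!/usr/bin/env bash
# Sketch: block and restore non-root logins via the nologin file.

# Path of the nologin file; overridable so the helpers can be tried safely.
NOLOGIN_FILE="${NOLOGIN_FILE:-/etc/nologin}"

# Block logins and leave a message for users who try to connect.
disable_logins() {
    printf '%s\n' "$1" > "$NOLOGIN_FILE"
}

# Allow logins again by removing the file.
enable_logins() {
    rm -f "$NOLOGIN_FILE"
}
```

Example: disable_logins "Cluster maintenance in progress, see the announcement email." at the start, and enable_logins as the last step before sending the all-clear.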

Delete jobs 
Shutdown headnode & nodes 
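
For the compute nodes, a loop over ssh is one possible approach. The sketch below assumes passwordless root ssh from the headnode and uses made-up node names; it defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: shut down compute nodes over ssh. Node names are hypothetical.

# With DRY_RUN=1 (the default) only print what would be run.
shutdown_nodes() {
    local run="${DRY_RUN:-1}" node
    for node in "$@"; do
        if [ "$run" = "1" ]; then
            echo "ssh $node shutdown -h now"
        else
            ssh "$node" shutdown -h now
        fi
    done
}
```

Example: shutdown_nodes node01 node02 to preview, then DRY_RUN=0 shutdown_nodes node01 node02 to act. Shut the headnode down last, since the loop runs from it.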
Scheraga only: Synology update

Requires reboot. Safer with headnode off.
Reboot switches

Usually only for Scheraga Matrix. All others have shared switches!

Q: Are the Hoffmann compute nodes on isolated switches?

Boot Synology

Scheraga Matrix only
Boot headnode 
Verify drives with fsck

Only do every ~6 months. Time-consuming.

Ex: Abruna's cluster (small): 1-2 hours
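
To decide whether the ~6 month interval has passed, the last check time can be read from the filesystem itself. This sketch assumes ext2/3/4 filesystems (tune2fs) and GNU date, and treats 180 days as the threshold; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: is a full fsck due? Assumes ext2/3/4 and GNU date.

# Succeed when more than 180 days lie between LAST_EPOCH and NOW_EPOCH.
fsck_due() {
    local last_epoch="$1" now_epoch="${2:-$(date +%s)}"
    [ $(( now_epoch - last_epoch )) -gt $(( 180 * 24 * 3600 )) ]
}

# Read the "Last checked" time of a filesystem as epoch seconds (needs root).
last_checked_epoch() {
    local dev="$1"
    date -d "$(tune2fs -l "$dev" | sed -n 's/^Last checked: *//p')" +%s
}
```

Example, with the device name as a placeholder: fsck_due "$(last_checked_epoch /dev/sdX1)" && echo "fsck due".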
Test UPS and its notifications

Does it work as expected?

How can this reasonably be tested? Pull power? Rely on the self-test?

Reboot anything that needs it
Boot nodes 
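
Before re-enabling access, it may help to confirm that every node came back. A rough sketch, assuming the nodes answer ping; the PROBE variable and the node names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list nodes that are not reachable yet. Node names are hypothetical.

# Command used to probe a node: one ping with a 2 second timeout by default.
PROBE="${PROBE:-ping -c 1 -W 2}"

# Print every node for which the probe command fails.
nodes_down() {
    local node
    for node in "$@"; do
        $PROBE "$node" >/dev/null 2>&1 || echo "$node"
    done
}
```

Example: nodes_down node01 node02 node03 prints nothing once all three respond. A reply to ping only shows the network stack is up, so a quick ssh or scheduler check is still worthwhile.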
Enable access

rm /etc/nologin
Send email 
Add a "maintenance record" description to the HPC wiki page