...
Action | Notes |
---|---|
Schedule / notify representative / users | Communication timeline |
Evaluate hard drives (system left in operation) | Review all: |
Verify backups and related (system left in operation) | Back-in-time (local versioning); EZ-Backup (remote backup) |
Disable outside access | touch /etc/nologin, or vi /etc/nologin to add message text. This file is removed automatically after a system reboot. |
Delete jobs | |
Shutdown headnode & nodes | |
Scheraga only: Synology update | Requires reboot. Safer with headnode off. |
Reboot switches | Usually only for Scheraga Matrix. All others have shared switches! Q: Are the Hoffmann compute nodes on isolated switches? |
Boot Synology | Scheraga Matrix only |
Boot headnode | |
Verify drives with fsck | Do only every ~6 months; it is time-consuming. Ex: Abruna's cluster (small) takes 1-2 hours |
Test UPS and its notifications | Does it work as expected? How can it reasonably be tested? Pull power? Rely on the self-test? |
Reboot anything that needs it | |
Boot nodes | |
Enable access | rm /etc/nologin |
Send email | |
Add "maintenance record" description to HPC's wiki page | |
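
The "Disable outside access" and "Enable access" steps in the table can be wrapped in two small shell functions. This is only a minimal sketch: the function names `disable_logins` and `enable_logins` and the `NOLOGIN_FILE` variable are hypothetical, introduced here for illustration, and the default message text is an assumption. The path defaults to `/etc/nologin` as in the table but can be overridden so the functions can be tried on a scratch file first.

```shell
#!/bin/sh
# Sketch only: names below are illustrative, not part of any existing script.
# NOLOGIN_FILE can be overridden to test against a scratch file.
NOLOGIN_FILE="${NOLOGIN_FILE:-/etc/nologin}"

disable_logins() {
    # Write a message into the nologin file; an optional first argument
    # replaces the (assumed) default text.
    printf '%s\n' "${1:-System is down for scheduled maintenance.}" > "$NOLOGIN_FILE"
}

enable_logins() {
    # -f: do not fail if the file is already gone, e.g. after a reboot.
    rm -f "$NOLOGIN_FILE"
}
```

When run as root on the headnode, `disable_logins "Maintenance until 17:00"` would correspond to the vi /etc/nologin variant with text, and `enable_logins` to rm /etc/nologin.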