Action	Notes
Schedule maintenance; notify the representative and users	Follow the communication timeline.
Evaluate hard drives (system left in operation)	Review all of: SMART status, dmesg output, /var/log/messages.
Verify backups and related items (system left in operation)	Back-in-time (local versioning): what to verify is still undecided. EZ-Backup (remote backup): review the log for recent uploads. Open question: should we occasionally spot-check by restoring a random, recent file?
Crane WS only: disable logins, kill all user sessions, unmount NFS from all workstations	Create /etc/nologin; pkill -KILL -u *** (kill each user's processes, one user at a time); umount /notbackedup; umount /home/local/CORNELL
Disable outside access	Create /etc/nologin, either empty (touch /etc/nologin) or with a message (vi /etc/nologin); non-root users who try to log in see that message. The file is removed automatically when the system reboots.
Delete all jobs	qdel all
Shut down headnode and nodes	cd /root; ./shutdown_nodes.sh (shuts down all compute nodes). Run pestat first: if any nodes are already "down", you may need to modify the script to shut down all nodes except those.
Crane WS only: shut down NFS server; apply Synology update and Windows updates
Scheraga only: Synology update	Requires reboot. Safer with headnode off.
Reboot switches	Usually only for Scheraga Matrix; all other clusters are on shared switches! Open question: are the Hoffmann compute nodes on isolated switches?
Boot Synology	Scheraga Matrix only
Image the OS root partition with dd	Boot from a CentOS live CD; sfdisk -d /dev/sda > sda.partition; dd if=/dev/sda1 of=root.img (use /dev/md0 instead of /dev/sda1 where root is on software RAID)
Crane WS only: test yum update on as-chm-cran-12 first	Run yum update on as-chm-cran-12; wait until the NFS server is up, then reboot as-chm-cran-12. If it reboots cleanly, run yum update on as-chm-cran-13, as-chm-cran-14, and as-chm-cran-15, and reboot them.
Boot headnode
Verify drives with fsck	To force fsck at the next boot: touch /forcefsck. Do this only about every 6 months, since it is time-consuming (e.g., Abruna's small cluster takes 1-2 hours).
Test UPS and its notifications	Open questions: does it work as expected? How can it be tested reasonably: by pulling power, or by relying on the UPS self-test?
Reboot anything else that needs it
Boot nodes
Enable access	rm /etc/nologin
Send email
Add a "maintenance record" entry to the HPC wiki page
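The "Evaluate hard drives" step could be scripted roughly as follows. This is a minimal sketch, assuming smartmontools (smartctl) is installed and the drives appear as /dev/sd?. The function names smart_health_ok and check_drives are hypothetical and are not part of any existing script on these clusters. The grep patterns are a starting point, not an exhaustive list of disk errors.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "Evaluate hard drives" step.
# Assumes smartmontools (smartctl) is installed; run as root.

# Succeeds if `smartctl -H` output (read on stdin) reports a healthy drive.
# ATA drives print "...self-assessment test result: PASSED";
# SCSI/SAS drives print "SMART Health Status: OK".
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Check each device given as an argument, then show recent disk-related errors.
check_drives() {
    local dev
    for dev in "$@"; do
        if smartctl -H "$dev" | smart_health_ok; then
            echo "$dev: SMART health OK"
        else
            echo "$dev: SMART health NOT OK, inspect with: smartctl -a $dev" >&2
        fi
    done
    # Kernel ring buffer (errors since boot)
    dmesg | grep -iE 'i/o error|ata[0-9].*(error|failed)' | tail -n 20
    # Persistent system log (path taken from the checklist)
    grep -iE 'i/o error|ata[0-9].*(error|failed)' /var/log/messages | tail -n 20
}
```

Example, as root: check_drives /dev/sd?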
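One possible answer to the open question about spot-checking backups is sketched below. It picks one random file from a snapshot and compares it with the live copy. The sketch assumes the snapshot is a plain directory tree; Back-in-time's rsync-based snapshots can be browsed this way. The function name and both directory arguments are hypothetical placeholders. A mismatch does not necessarily mean the backup is bad, because the live file may simply have changed after the snapshot was taken.

```shell
#!/usr/bin/env bash
# Hypothetical spot-check for the "Verify backups" step.
# SNAPSHOT_DIR and LIVE_DIR are placeholders: a snapshot's copy of a directory
# and the corresponding live directory.

spot_check_snapshot() {
    local snapshot_dir=$1 live_dir=$2
    local snap_file rel
    # Choose one regular file at random from the snapshot.
    snap_file=$(find "$snapshot_dir" -type f | shuf -n 1)
    if [ -z "$snap_file" ]; then
        echo "no files found under $snapshot_dir" >&2
        return 2
    fi
    # Path of that file relative to the snapshot root
    rel=${snap_file#"$snapshot_dir"/}
    if cmp -s "$snap_file" "$live_dir/$rel"; then
        echo "MATCH: $rel"
    else
        # May simply mean the file changed after the snapshot was taken.
        echo "DIFFERS or missing in live tree: $rel" >&2
        return 1
    fi
}
```

Example: spot_check_snapshot /path/to/snapshot/home /home (both paths are placeholders).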
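The "disable login / kill all user logins" steps can be combined so that users no longer have to be killed one by one. The sketch below reads the logged-in users from who and runs the same pkill -KILL -u command for each one. The function names and the nologin message text are hypothetical examples. Note that who only lists users who have a login session, so a user who left background processes running without a session is not caught by this loop.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "Disable login; kill all user logins" steps.
# Run as root. The /etc/nologin message text is only an example.

# Print the unique non-root user names from `who` output read on stdin.
list_login_users() {
    awk '$1 != "root" { print $1 }' | sort -u
}

# Block new non-root logins, then kill every process of each logged-in user.
lock_out_users() {
    echo "System maintenance in progress; logins are disabled." > /etc/nologin
    local user
    for user in $(who | list_login_users); do
        echo "killing processes of $user"
        pkill -KILL -u "$user"
    done
}
```

After lock_out_users, the NFS file systems can be unmounted as in the checklist (umount /notbackedup; umount /home/local/CORNELL).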
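For the compute-node shutdown step, the sketch below shows one way to skip nodes that pestat reports as "down" without editing the script each time. It is NOT the contents of /root/shutdown_nodes.sh. It assumes passwordless root ssh from the headnode to every node. It also assumes two hypothetical one-name-per-line files: a list of all nodes, and a list of down nodes filled in by hand from the pestat output.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the real /root/shutdown_nodes.sh.
# Assumes passwordless root ssh from the headnode to every compute node.

# Print nodes from ALL_FILE that are not listed in DOWN_FILE (exact whole-line match).
nodes_to_shutdown() {
    local all_file=$1 down_file=$2
    grep -vxF -f "$down_file" "$all_file"
}

# Shut down every node name read on stdin.
shutdown_nodes() {
    local node
    while read -r node; do
        [ -n "$node" ] || continue
        # -n keeps ssh from swallowing the rest of the node list on stdin
        if ssh -n -o ConnectTimeout=5 "root@$node" shutdown -h now; then
            echo "shutdown sent: $node"
        else
            # Can be harmless if the node dropped the connection while going down.
            echo "ssh exited non-zero: $node (check the node)" >&2
        fi
    done
}
```

Example (file names are hypothetical): nodes_to_shutdown /root/nodes.txt /root/down.txt | shutdown_nodes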
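The root-partition imaging step can also record a checksum, so that the image can later be checked against the source. The sketch below is meant to run from the live CD with the root partition unmounted. The function names and the root.img file name are illustrative only. conv=fsync is a GNU dd option that flushes the output file before dd exits.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for imaging the OS root partition from a live CD.
# Run with the source partition unmounted; device and file names are examples.

# Copy a partition (or md device) to an image file and record its SHA-256 sum.
image_partition() {
    local src=$1 img=$2
    dd if="$src" of="$img" bs=4M conv=fsync || return 1
    sha256sum < "$img" | cut -d' ' -f1 > "$img.sha256"
}

# Succeeds if the source still matches the checksum recorded for the image.
verify_image() {
    local src=$1 img=$2
    [ "$(sha256sum < "$src" | cut -d' ' -f1)" = "$(cat "$img.sha256")" ]
}
```

Example: run sfdisk -d /dev/sda > sda.partition, then image_partition /dev/sda1 root.img, then verify_image /dev/sda1 root.img (substitute /dev/md0 where root is on software RAID). The saved partition table can be written back later with sfdisk /dev/sda < sda.partition.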
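The staged yum update on the Crane workstations follows a canary pattern: update as-chm-cran-12 first, and continue only if it comes back. A sketch under stated assumptions is shown below. It assumes root ssh access to each workstation and that the NFS server is already up. It also assumes that "reboot OK" can be approximated by two checks: ssh answers again, and the NFS mount /home/local/CORNELL is back. The helper names retry, update_and_reboot, and staged_update are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the staged "yum update" on the Crane workstations.
# Assumes root ssh access and that the NFS server is already up.

# Run a command up to TRIES times, sleeping DELAY seconds between attempts.
retry() {
    local tries=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= tries; i++)); do
        "$@" && return 0
        [ "$i" -lt "$tries" ] && sleep "$delay"
    done
    return 1
}

# Update and reboot one host, then check that it came back.
update_and_reboot() {
    local host=$1
    ssh "root@$host" 'yum -y update' || return 1
    # The connection usually drops during reboot, so ignore ssh's exit status.
    ssh "root@$host" reboot || true
    sleep 60   # give the host time to actually go down
    # Wait up to roughly 15 minutes for ssh to answer again.
    retry 60 10 ssh -o ConnectTimeout=5 "root@$host" true || return 1
    # Assumed check: the NFS home mount from the checklist is back.
    ssh "root@$host" mountpoint -q /home/local/CORNELL
}

staged_update() {
    # Canary first; stop if it does not come back cleanly.
    update_and_reboot as-chm-cran-12 || { echo "as-chm-cran-12 failed; stopping" >&2; return 1; }
    local host
    for host in as-chm-cran-13 as-chm-cran-14 as-chm-cran-15; do
        update_and_reboot "$host" || echo "$host did not come back cleanly" >&2
    done
}
```

Updating the remaining three workstations one at a time is slower than doing them in parallel, but a bad update then affects only one machine at a time.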
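The "only every ~6 months" rule for fsck can be checked against the file system's own record, instead of relying on memory. The sketch below assumes ext2/3/4 file systems, whose "Last checked" time is reported by tune2fs -l. It also uses 180 days as an approximation of 6 months. The function names are hypothetical, and date -d assumes GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "fsck about every 6 months" rule (ext2/3/4 only).

# Succeeds if more than MAX_DAYS whole days passed between LAST_EPOCH and NOW_EPOCH.
fsck_due() {
    local last_epoch=$1 now_epoch=$2 max_days=$3
    (( (now_epoch - last_epoch) / 86400 > max_days ))
}

# Read "Last checked" from tune2fs; create /forcefsck if a check is overdue.
maybe_force_fsck() {
    local dev=$1 last
    last=$(tune2fs -l "$dev" | sed -n 's/^Last checked: *//p')
    if fsck_due "$(date -d "$last" +%s)" "$(date +%s)" 180; then
        echo "$dev last checked $last; creating /forcefsck"
        touch /forcefsck
    else
        echo "$dev last checked $last; fsck not due yet"
    fi
}
```

Example, as root: maybe_force_fsck /dev/sda1 (or /dev/md0).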
