Clusters and other high-performance servers require maintenance. Documented procedures reduce surprises during both scheduled maintenance and emergency work.
Typical work done during maintenance
Update BIOS (and why it's done...)
Update OS
Update...
Test UPS
Test backups
Test...
Scheduled maintenance procedures
Emergency work procedures