...

Scheduled maintenance and upgrade procedures

Summary

Details

ChemIT notifies the cluster lead that maintenance will occur on a specific upcoming date.

  • What is a long enough lead time for the group?
  • What is a short enough lead time for ChemIT?

Message will state:

  • Date and time of shut-down. Expected duration of shut-down.
    • Most events will occur Monday through Thursday, 9am-5pm EST, when staff and backup personnel are available.
  • Purpose summary.

Message will be sent to:

  • Whom?

Typical work done during maintenance

  • Proactive maintenance should be scheduled approximately quarterly, with no more than 6 months between maintenance windows.
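
The scheduling guidelines above (a Monday to Thursday, 9am-5pm window and no more than 6 months between maintenance events) could be checked with a small script. This is only a sketch under stated assumptions: the function names are hypothetical, times are treated as naive local times, and "6 months" is approximated as 183 days.

```python
from datetime import date, datetime, time, timedelta

# Assumed values taken from the guidelines above; adjust to local policy.
WORK_START = time(9, 0)
WORK_END = time(17, 0)
ALLOWED_WEEKDAYS = {0, 1, 2, 3}  # Monday=0 ... Thursday=3
MAX_GAP_DAYS = 183  # assumption: "6 months" approximated as 183 days


def window_is_staffed(start: datetime, duration: timedelta) -> bool:
    """True if the whole window falls on one Mon-Thu day between 9am and 5pm."""
    end = start + duration
    return (
        start.weekday() in ALLOWED_WEEKDAYS
        and start.date() == end.date()
        and WORK_START <= start.time()
        and end.time() <= WORK_END
    )


def maintenance_overdue(last_done: date, today: date) -> bool:
    """True if more than MAX_GAP_DAYS have passed since the last maintenance."""
    return (today - last_done).days > MAX_GAP_DAYS
```

For example, `window_is_staffed(datetime(2024, 3, 5, 13, 0), timedelta(hours=3))` checks a three-hour window starting at 1pm on a Tuesday.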

Sample message:

To: ?
Subject: PI's ClusterName: Date/time planned down-time.

-----------------------------------------------

To all users of the PI's ClusterName,

On Date/time, the cluster will be down for planned maintenance for 3 hours.

During this down-time, we intend to:

  • Test new GPU software capabilities.
  • Update the OS of the storage system.
  • Update the BIOS of the 4 GPU clusters.
  • Update the UPS software to address the current software's limitations.
  • Confirm backups and review other system software configurations.

-----------------------------------------------
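
If the notice is sent regularly, the sample message could be filled in from a template rather than edited by hand. The following is a minimal sketch using Python's standard `string.Template`; the placeholder names (`cluster`, `when`, `hours`, `tasks`) and the `render_notice` helper are assumptions for illustration, not an existing ChemIT tool.

```python
from string import Template

# Hypothetical template mirroring the sample message above.
# Placeholder names are assumptions chosen for this sketch.
MESSAGE = Template(
    "Subject: $cluster: $when planned down-time.\n"
    "\n"
    "To all users of the $cluster,\n"
    "\n"
    "On $when, the cluster will be down for planned maintenance for $hours hours.\n"
    "\n"
    "During this down-time, we intend to:\n"
    "\n"
    "$tasks\n"
)


def render_notice(cluster, when, hours, tasks):
    """Fill in the template; each task becomes one bullet line."""
    bullets = "\n".join(f"  • {task}" for task in tasks)
    return MESSAGE.substitute(cluster=cluster, when=when, hours=hours, tasks=bullets)
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, which helps avoid sending a notice that still contains a blank field.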


Emergency work procedures

...