...
Scheduled maintenance and upgrade procedures
Summary
Details
ChemIT notifies cluster lead that maintenance will occur on a specific upcoming date.
- What is a long enough lead time for the group?
- What is a short enough lead time for ChemIT?
Message will state:
- Date and time of shut-down. Expected duration of shut-down.
- Most events will occur Mon-Thu, 9am-5pm Eastern time, when staff and backup personnel are available.
- Purpose summary.
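The preferred window above (Mon-Thu, 9am-5pm) can be checked before a date is proposed. The following is a minimal sketch, not an existing ChemIT tool: the function name `in_staffed_window` is hypothetical, and it assumes naive datetimes that are already in local Eastern time.

```python
from datetime import datetime, time, timedelta

# Assumed window, taken from the guideline above: Mon-Thu, 9am-5pm local time.
WINDOW_START = time(9, 0)
WINDOW_END = time(17, 0)
LAST_ALLOWED_WEEKDAY = 3  # Monday is 0, so 3 is Thursday


def in_staffed_window(start: datetime, duration_hours: float) -> bool:
    """Return True if the whole shut-down fits inside one staffed day."""
    end = start + timedelta(hours=duration_hours)
    if start.date() != end.date():
        return False  # work that runs past midnight is outside the window
    if start.weekday() > LAST_ALLOWED_WEEKDAY:
        return False  # Friday, Saturday, or Sunday
    return start.time() >= WINDOW_START and end.time() <= WINDOW_END


# Example: a 3-hour shut-down starting Tuesday 2030-01-15 at 09:00.
print(in_staffed_window(datetime(2030, 1, 15, 9, 0), 3))
```

Since the guideline says "most events", a failed check is better treated as a prompt to confirm staffing than as a hard block.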
Message will be sent to:
- Whom?
Typical work done during maintenance
- Proactive maintenance should be scheduled approximately quarterly, with no more than 6 months between maintenance windows.
Sample message:
To: ?
Subject: PI's ClusterName: Date/time planned down-time.
-----------------------------------------------
To all users of the PI's ClusterName,
On Date/time, the cluster will be down for planned maintenance for 3 hours.
During this down-time, we intend to:
- Test new GPU software capabilities.
- Update the OS of the storage system.
- Update the BIOS of the 4 GPU clusters.
- Update the UPS software to address the current software's limitations.
- Confirm backups and review other system software configurations.
-----------------------------------------------
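If the notice is sent regularly, the sample message can be filled in from a template so the wording stays consistent. This is a sketch under stated assumptions: `render_notice`, its parameters, and the date format are hypothetical, and the recipient line is left out because the recipients are still an open question above.

```python
from datetime import datetime
from string import Template

# Wording follows the sample message; placeholders are filled per event.
NOTICE = Template(
    "Subject: $cluster: $when planned down-time.\n"
    "\n"
    "To all users of the $cluster,\n"
    "On $when, the cluster will be down for planned maintenance "
    "for $hours hours.\n"
    "During this down-time, we intend to:\n"
    "$tasks\n"
)


def render_notice(cluster: str, start: datetime, hours: int, tasks: list[str]) -> str:
    """Build the notice text for one maintenance event (hypothetical helper)."""
    when = start.strftime("%Y-%m-%d %H:%M")  # assumed date format
    task_lines = "\n".join(f"- {task}" for task in tasks)
    return NOTICE.substitute(
        cluster=cluster, when=when, hours=hours, tasks=task_lines
    )


# Example with a placeholder cluster name and two tasks from the sample list.
print(
    render_notice(
        "ExampleCluster",
        datetime(2030, 1, 15, 9, 0),
        3,
        [
            "Update the OS of the storage system.",
            "Confirm backups and review other system software configurations.",
        ],
    )
)
```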
Emergency work procedures
...