Clusters and other high-performance servers require maintenance. Documented procedures reduce surprises during both scheduled maintenance and emergency work.

Table of contents

Scheduled maintenance and upgrades procedures

Emergency work procedures

See also

Communication timeline

1) An unscheduled problem occurs.

2) ChemIT learns of the emergency situation.

3) ChemIT characterizes the problem and develops an initial prognosis.

4) ChemIT notifies the group rep (users) of the status and prognosis as soon as practicable.

A record (and notes) of emergency actions

| Cluster or HPC name | Event date and action |
| --- | --- |
| Abruna | |
| Ananth | |
| Collum | |
| Hoffmann | |
| Lancaster (w/ Crane) | |
| Scheraga | |
| Widom-Loring | |
| Eldor | |
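
If the record above is ever kept as structured data rather than a hand-edited table, a minimal sketch might look like the following. This is an illustration only, not an existing ChemIT tool: the names `EmergencyAction`, `add_action`, and `actions_for` are hypothetical, and the only values taken from this page are the cluster names and the two column meanings (event date, action).

```python
from dataclasses import dataclass
from datetime import date

# Cluster names copied from the table above.
CLUSTERS = [
    "Abruna", "Ananth", "Collum", "Hoffmann",
    "Lancaster (w/ Crane)", "Scheraga", "Widom-Loring", "Eldor",
]


@dataclass
class EmergencyAction:
    """One row of the record: which cluster, when, and what was done."""
    cluster: str
    event_date: date
    action: str


def add_action(log, cluster, event_date, action):
    """Append an entry to the log, rejecting cluster names not in CLUSTERS."""
    if cluster not in CLUSTERS:
        raise ValueError(f"unknown cluster: {cluster}")
    entry = EmergencyAction(cluster, event_date, action)
    log.append(entry)
    return entry


def actions_for(log, cluster):
    """Return the entries for one cluster, oldest first."""
    return sorted(
        (e for e in log if e.cluster == cluster),
        key=lambda e: e.event_date,
    )


if __name__ == "__main__":
    # Placeholder entry for illustration; not a real event.
    log = []
    add_action(log, "Collum", date(2000, 1, 1), "placeholder action")
    for entry in actions_for(log, "Collum"):
        print(entry.cluster, entry.event_date.isoformat(), entry.action)
```

Validating the cluster name on entry keeps the record aligned with the rows listed above; a new cluster would be added to `CLUSTERS` first.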