UPS inventory for CCB Clusters and non-cluster HPCs

Note 1: Currently, none of the clusters ChemIT manages have UPSs for their compute nodes. Thus, this is our community standard of practice. (Is this what CAC does, too?)

  • ChemIT staff must be called in to restart the compute nodes after any power failure.
  • After even the briefest power outage, all compute nodes will be off, leaving the cluster unusable. This is true even if the headnode and network switch have UPS backup.

Note 2: A UPS is expected to provide backup power for perhaps less than 10 minutes, depending on the size of the UPS, the condition and age of its battery, and the load placed on it. This protects against most power outages.

  • Adding a USB connection from the UPS to a system allows that system to execute a shutdown command, properly shutting the system down in a timely manner during a prolonged power outage. Otherwise the system will simply lose power, and that kind of forced shutdown can often cause software and hardware failures.
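To make the shutdown decision described above concrete, here is a minimal sketch in Python. It assumes the UPS state is available as "key: value" text, the format printed by NUT's `upsc` command, with `ups.status` flags such as OB (on battery) and LB (low battery). The function names and the 10% default threshold are illustrative assumptions, not ChemIT's actual configuration, and the sketch only decides whether to shut down; it does not issue the shutdown command itself.

```python
def parse_ups_vars(text):
    """Parse 'key: value' lines (the format printed by NUT's `upsc`) into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            result[key.strip()] = value.strip()
    return result


def should_shut_down(ups_vars, min_charge_percent=10.0):
    """Return True when the UPS is on battery and nearly drained.

    min_charge_percent is a hypothetical threshold chosen for illustration.
    """
    status_flags = ups_vars.get("ups.status", "").split()
    on_battery = "OB" in status_flags
    low_battery = "LB" in status_flags
    try:
        charge = float(ups_vars.get("battery.charge", "100"))
    except ValueError:
        # Unreadable charge value: fall back to the status flags alone.
        charge = 100.0
    return on_battery and (low_battery or charge <= min_charge_percent)


if __name__ == "__main__":
    sample = "battery.charge: 9\nups.status: OB DISCHRG\n"
    print(should_shut_down(parse_ups_vars(sample)))
```

In practice a monitoring tool would poll the UPS and run the operating system's own shutdown command once this condition holds. The sketch deliberately leaves that step out because it is OS-specific.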

Cluster name

UPS for
main system or headnode

UPS shutdown algorithm, if any

Tools used

Other notes

Abruna

NONE
Unique: Need to do ASAP (no backup of OS!)

n/a

n/a 

Ananth

 (Unknown)

n/a

n/a

Cluster managed by CAC, not ChemIT

Collum

Done Spring'14

 

  

Hoffmann

Done Spring'14

 

  

Lancaster (w/ Crane)

Done Spring'14
(Funded by Crane)

 

  

Loring

NONE
Unique: Need to do ASAP (no backup of OS!)

n/a

n/a 

Scheraga: Current, production Matrix

Done Fall'14
(See below for stand-alone computational (GPU) computers)

 

  
Scheraga: Forthcoming Matrix

Done Fall'14
(See below for stand-alone computational (GPU) computers, if any are to remain as such)

UPS supporting both Synology storage system and headnode.

UPS USB-connected to Synology storage system. Synology thus sends a signal to headnode. Algorithms are:

  • Synology:
  • Headnode:

Synology: its own s/w.

Linux systems: running "nut" (Network UPS Tools).

 

Widom

Yes
Waiting for head node to deploy (on C4 at the moment)

 

  

ChemIT (C4)

Done
Old UPS; on the margin

 

  
Baird: 1 rack-mounted computational computer

NONE
ChemIT recommended making this investment (standard of practice), but group decided explicitly not to make the investment.

n/a

n/a

Freed: Eldor

NONE

n/a

n/a
Petersen: 2 rack-mounted computational computers

Yes, but needs to be deployed in true production; using Widom's UPS for now.
ChemIT using UPS for testing UPS-related control software.

UPS supporting both system #50 and system #51.

UPS USB-connected to system #50, which itself does not send a signal to system #51.

On system #50: Shut down when only 10% battery power is left.

On system #51: No s/w running to listen for a signal from system #50 (if a signal were to be sent).

Windows OS

To do: Establish sending a signal from system #50 to system #51 and have system #51 properly shut down in the event of a prolonged outage.
Scheraga: 4 GPU rack-mounted computational computers

NONE
($900, estimate)
Need to protect? Data point: Feb'14 outage resulted in one of these not booting up correctly.

n/a

n/a
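The to-do recorded for the Petersen systems (having system #50 signal system #51 so that both shut down cleanly) could be prototyped along the following lines. This is only a sketch under stated assumptions: the message text, function names, and the use of a plain TCP connection are hypothetical choices made for illustration, there is no authentication, and the actual shutdown command on the receiving system is left out.

```python
import socket

# Hypothetical message agreed on by both systems; not an existing protocol.
SHUTDOWN_MESSAGE = b"UPS_SHUTDOWN"


def send_shutdown_notice(host, port, timeout=5.0):
    """Run on the UPS-connected system: tell the other system to shut down."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(SHUTDOWN_MESSAGE)


def wait_for_shutdown_notice(server_socket):
    """Run on the listening system: block until one client connects.

    Returns True only if the client sent exactly the expected message.
    """
    conn, _addr = server_socket.accept()
    received = b""
    with conn:
        # Read until the full message has arrived or the sender closes.
        while len(received) < len(SHUTDOWN_MESSAGE):
            part = conn.recv(64)
            if not part:
                break
            received += part
    return received == SHUTDOWN_MESSAGE
```

On the listening system, a small service would create a listening socket, call `wait_for_shutdown_notice` in a loop, and start an orderly shutdown when it returns True. On the UPS-connected system, `send_shutdown_notice` would be called just before that system begins its own shutdown.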
