...

Cluster name

UPS for main system or headnode

UPS shutdown algorithm, if any

Tools used

Other notes

Abruna

NONE
Unique: Need to do ASAP (no backup of OS!)

n/a

n/a 

Ananth

 (Unknown)

n/a

n/a

Cluster managed by CAC, not ChemIT

Collum

Done Spring'14

 

  

Hoffmann

Done Spring'14

 

  

Lancaster (w/ Crane)

Done Spring'14
(Funded by Crane)

 

  

Loring

Merged with Widom cluster

n/a

n/a 

Scheraga: Current, production Matrix

Done Fall'14
(See below for stand-alone computational (GPU) computers)

 

  
Scheraga: Forthcoming Matrix

Done Fall'14
(See below for stand-alone computational (GPU) computers, if any are to remain as such)

UPS supporting both Synology storage system and headnode.

UPS USB-connected to Synology storage system. Synology thus sends a signal to headnode. Algorithms are:

  • Synology:
  • Headnode:

Synology's own s/w.

On Linux systems, running NUT (Network UPS Tools).

 

Widom (w/ Loring)

Done April 2016
Head node deployment finished

 

 

Moved Widom HeadNode to Loring UPS

 

ChemIT (C4)

Done

 

 Moved C4 to Loring UPS
Baird: 1 rack-mounted computational computer

NONE
ChemIT recommended making this investment (standard of practice), but group decided explicitly not to make the investment.

n/a

n/a

Freed: Eldor

NONE

n/a

n/a
Petersen: 2 rack-mounted computational computers

Yes, but needs to be deployed in true production; using Widom's UPS for now.
ChemIT using UPS for testing UPS-related control software.

UPS supporting both system #50 and system #51.

UPS is USB-connected to system #50, which does not itself send a signal to system #51. The algorithm for system #50 is:

Shut down when only 10% battery power is left.

(System #51 currently has no way to shut down properly during a prolonged power outage.)

Windows OS

ChemIT would like to establish a signal from system #50 to system #51 so that system #51 shuts down properly in the event of a prolonged outage.
Scheraga: 4 GPU rack-mounted computational computers

NONE
($900, estimate)
Need to protect? Data point: Feb'14 outage resulted in one of these not booting up correctly.

n/a

n/a
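The shutdown rules described in the table above (shut down at 10% battery, and forward the signal from the USB-connected system to a second system on the same UPS) can be sketched as follows. This is only an illustrative sketch, not ChemIT's actual configuration. It assumes NUT-style status strings ("OL" for on line power, "OB" for on battery, "LB" for low battery) as reported by NUT's `upsc` client. The function names, the host names, and the choice to shut down the dependent system first are all assumptions made for the example.

```python
import subprocess

LOW_BATTERY_PERCENT = 10  # threshold taken from the table above


def should_shut_down(status, charge_percent, threshold=LOW_BATTERY_PERCENT):
    """Return True when the UPS is on battery and the charge is low.

    `status` is a NUT-style ups.status string, e.g. "OL" or "OB LB".
    """
    flags = status.split()
    on_battery = "OB" in flags
    return on_battery and ("LB" in flags or charge_percent <= threshold)


def shutdown_order(status, charge_percent, usb_host, dependents):
    """List hosts to shut down, dependents first, USB-connected host last.

    The USB-connected host goes last (an assumption of this sketch) so it
    can keep monitoring the UPS while it signals the other systems.
    """
    if not should_shut_down(status, charge_percent):
        return []
    return list(dependents) + [usb_host]


def read_ups(ups):
    """Query a NUT server with `upsc`; `ups` looks like "upsname@hostname"."""
    def query(variable):
        result = subprocess.run(
            ["upsc", ups, variable],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    return query("ups.status"), float(query("battery.charge"))
```

For example, `shutdown_order("OB", 9, "system50", ["system51"])` yields `["system51", "system50"]`, while the same call with status `"OL"` yields an empty list. How each host is actually told to shut down (NUT's own monitoring daemon, a remote command, or vendor software) is left open here.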

Power outage impact on systems with and without UPS

~5-10 minute outage on Sunday, 4/23/2017, per Michael Hint's investigations

Group or server

UPS info

(details in above table)

Impact of outage:

Headnode or main server

 

Impact of outage:

Storage

Impact of outage:

Compute nodes

(expect "down")

Impact of outage:

Other

Chemistry IT: SERV-05: HyperV production

(Dell, rack)

Worthless: died within 2 minutes.

FAILED

Chemistry IT: SERV-05: HyperV backup

(RedBarn, rack)

Worthless: died within 2 minutes.

FAILED

RESE-01: HyperV hosts to CRANE-19 (NFS)

Crane Synology

Survived

Fine

Fine

Scheraga Matrix headnode

Scheraga Matrix Synology

Survived

Fine

Fine

(down)

Hoffmann

Survived

Fine

n/a

(down)

Router config reset, so failed

Lancaster-Crane

Survived

Fine

n/a

(down)

Widom-Loring-Abruna

Survived

Fine

n/a

bw001 up, since part of twin head node

(all the rest were down)

 
Baird compute server

No UPS

Down (MH restarted remotely via IPMI)

n/a

 

What does it cost to UPS a research system?

...