
A snapshot of the UPSs that support servers, switches, and other equipment under ChemIT's management, mostly within ChemIT's Baker 248 server room.

See also

UPS inventory for CCB Clusters and non-cluster HPCs

...

Cluster name

UPS for
main system or headnode

UPS shutdown algorithm, if any

Tools used

Other notes

Abruna

NONE
Unique: Need to do ASAP (no backup of OS!)

n/a

n/a 

Ananth

 (Unknown)

n/a

n/a

Cluster managed by CAC, not ChemIT

Collum

Done Spring'14

 

  

Hoffmann

Done Spring'14

 

  

Lancaster (w/ Crane)

Done Spring'14
(Funded by Crane)

 

  

Loring

NONE
Unique: Need to do ASAP (no backup of OS!)

Merged with Widom cluster

n/a

n/a 

Scheraga: Current, production Matrix

Done Fall'14
(See below for stand-alone computational (GPU) computers)

 

  
Scheraga: Forthcoming Matrix

Done Fall'14
(See below for stand-alone computational (GPU) computers, if any are to remain as such)

UPS supporting both Synology storage system and headnode.

The UPS is USB-connected to the Synology storage system, which in turn sends a signal to the headnode. Algorithms are:

  • Synology:
  • Headnode:

Synology's own software.

Linux systems run NUT (Network UPS Tools, "nut").

 

Widom (w/ Loring)

Done April 2016
Yes
Waiting for head node to deploy (on C4 at the moment); finished

 

 

Moved Widom HeadNode to Loring UPS

 

ChemIT (C4)

Done
Old UPS; on the margin

 

  Moved C4 to Loring UPS
Baird: 1 rack-mounted computational computer

NONE
ChemIT recommended making this investment (standard of practice), but the group explicitly decided not to.

n/a

n/a

Freed: Eldor

NONE

n/a

n/a

Petersen: 2 rack-mounted computational computers

Yes, but needs to be deployed in true production; using Widom's UPS for now.
ChemIT using UPS for testing UPS-related control software.

UPS supporting both system #50 and system #51.

UPS is USB-connected to system #50, which does not itself send a signal to system #51. The algorithm for system #50 is:

Shut down when only 10% battery power is left.

(System #51 currently has no way to shut down properly during a prolonged power outage.)

Windows OS

ChemIT would like to establish a signal from system #50 to system #51 so that system #51 shuts down properly in the event of a prolonged outage.
Scheraga: 4 GPU rack-mounted computational computers

NONE
($900, estimate)
Need to protect? Data point: Feb'14 outage resulted in one of these not booting up correctly.

n/a

n/a
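
The shutdown rule recorded above for Petersen's system #50 (act when only 10% battery power is left), together with the wish to have system #50 signal system #51, can be sketched as follows. This is a minimal illustration under stated assumptions, not ChemIT's actual configuration: it assumes UPS readings arrive as "key: value" text in the style printed by NUT's upsc tool, with the NUT variables battery.charge and ups.status; the function names and the host name "system51" are hypothetical, and the code only decides what should be stopped rather than stopping anything.

```python
def parse_upsc(text):
    """Parse 'key: value' lines (the style printed by NUT's upsc) into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values, threshold_percent=10.0):
    """True when the UPS is on battery and charge is at or below the threshold."""
    # ups.status is a space-separated list of flags, e.g. "OL" or "OB LB".
    on_battery = "OB" in values.get("ups.status", "").split()
    try:
        charge = float(values["battery.charge"])
    except (KeyError, ValueError):
        # Missing or unreadable charge: do not trigger on that alone.
        return False
    return on_battery and charge <= threshold_percent


def hosts_to_stop(values, dependents, threshold_percent=10.0):
    """Hosts to shut down, dependents first and the UPS-attached host last."""
    if not should_shut_down(values, threshold_percent):
        return []
    return list(dependents) + ["localhost"]


# Hypothetical reading: on battery, 9% charge left.
SAMPLE = "battery.charge: 9\nups.status: OB LB\n"

if __name__ == "__main__":
    # "system51" is a placeholder name for the second machine on the same UPS.
    print(hosts_to_stop(parse_upsc(SAMPLE), ["system51"]))
```

Listing the dependent machine before the UPS-attached host reflects the idea that the machine holding the USB connection has to stay up long enough to pass the signal along.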

Power outage impact on systems with and without UPS

~5-10 minute outage on Sunday, 4/23/2017, per Michael Hint's investigations

Group or server

UPS info (details in above table)

Impact of outage: Headnode or main server

Impact of outage: Storage

Impact of outage: Compute nodes (expect "down")

Impact of outage: Other

Chemistry IT: SERV-05: HyperV production hosts:

Stockroom QB, Stockroom WebApp, ChemIT file share, test WSUS.

(Dell, rack)

 

Worthless: Died within 2 minutes.

(Was a hand-me-down)

FAILED

Plan: All but ChemIT file share going to AWS.

Chemistry IT: SERV-05: HyperV backup.

(RedBarn, rack)

Worthless: Died within 2 minutes.

(Was a hand-me-down)

FAILED   

RESE-01: HyperV hosts to CRANE-19 (NFS)

Crane Synology

Survived

Fine

Fine

Scheraga Matrix headnode

Scheraga Matrix Synology

Survived

Fine

Fine

(down)

Hoffmann

Survived

Fine

n/a

(down)

Router config reset, so failed

Lancaster-Crane

Survived

Fine

n/a

(down)

Widom-Loring-Abruna

Survived

Fine

n/a

bw001 up, since part of twin head node (all the rest were down)

 
Baird compute server

No UPS

Down (MH restarted remotely via IPMI)

n/a

Petersen

Survived

Freed's Eldor?

 

What does it cost to UPS a research system?

...

Most were done in Spring '14, after the spate of power failures. See CCB's HPC page (first chart, in the "UPS for headnode" column) for details.

Unique: Need to do ASAP

Cluster

Done

Not done

Notes

Loring

 

X

Abruna

 

X

Unique: Need to do ASAP

...

See CCB's HPC page (second chart, in "UPS" column) and CCB's non-HPC page (in "UPS" column) for details of the few that are already done.

...