A snapshot of the UPS's that support servers, switches, and other equipment under ChemIT's management, mostly within ChemIT's Baker 248 server room.

UPS inventory for CCB Clusters and non-cluster HPCs

| Cluster name | UPS for headnode | Maintenance window | Upgrade status |
|---|---|---|---|
| Abruna | NONE. Unique: Need to do ASAP (no backup of OS!) | | 4/14: Within a year, upgrade OS? No h/w upgrades planned |
| Ananth | (unknown) | | n/a |
| Collum | Done Spring'14 | | Fall'13: Upgraded OS and added 2 nodes |
| Hoffmann | Done Spring'14 | | Winter'13/14: Upgraded OS and added 2 nodes |
| Lancaster (w/ Crane) | Done Spring'14 (Funded by Crane) | | Spring'14: Upgraded OS and added 2 nodes |
| Loring | NONE. Unique: Need to do ASAP (no backup of OS!) | | 4/14: When do OS upgrades, and why? No hardware upgrades planned |
| Scheraga | Done Fall'14 (See chart for stand-alone computational (GPU) computers) | | Summer'14: $50K hardware upgrades; to include OS upgrade. |
| Widom | Yes. Waiting for head node to deploy (on C4 at the moment) | | Spring'14: Upgraded OS and added 2 nodes |
| ChemIT (C4) | Done. Old UPS; on the margin | | 4/14: When turn into production, and for whom? |
| Totals: | | | |

CCB non-cluster HPCs, summary information

Inventory and summary notes regarding non-cluster HPC systems in 248, including computational stand-alone systems.

Columns needed - software installed / managed (do per system; dedicated page?), Cores, Age, storage and related (RAID: h/w or s/w?)

Name of system, and purpose

ChemIT
/ other
support

DNS name
(may have CNAME)

IP
Gateway

ChemIT Network
Headnode IPMI Network

OS

OS Version

UPS

Maintenance
window

Upgrade status

 

Baird: 1 rack-mounted computational computer

ChemIT

as-chm-bair-08.ad.cornell.edu

compute.baird.chem.cornell.edu

10.253.229.178

192.168.255.120
192.168.255.121

Windows Server

2012R2

NONE
(Suggested but not done)

 

 

 

Freed: Eldor

ChemIT

eldor.acert.chem.cornell.edu

10.253.229.96

192.168.255.87 

CentOS

6.4

NONE

 

 

 

Petersen: 2 rack-mounted computational computers

ChemIT

calc01.petersen.chem.cornell.edu
calc02.petersen.chem.cornell.edu

10.253.229.196/192

  

Windows Server

2012R2

Yes, but needs to be deployed in true production; using Widom's UPS for now.
ChemIT using UPS for testing UPS-related control software.

 

 

 

Scheraga: 4 GPU rack-mounted computational computers

ChemIT

gpu.scheraga.chem.cornell.edu

10.253.229.70

192.168.255.139
192.168.255.138

CentOS

6.4

NONE
($900, estimate)
Need to protect? Data point: Feb'14 outage resulted in one of these not booting up correctly.

 

 

 

 

What would it cost to UPS our research systems?

Assuming protection for 1-3 minutes MAXIMUM:

Do all head nodes and stand-alone computers in 248 Baker Lab

  • In progress. About $180 (APC brand) per head node or server every ~3-4 years (3-year warranty; ~4 years of actual battery life), and ~$900 per set of 4 GPU systems every ~3-4 years (approach and estimates still to be confirmed).
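To put the per-unit figures above on a yearly footing, the following is a minimal Python sketch that spreads each quoted price over the ~3-4 year replacement cycle. The helper name `annualized_range` is hypothetical, and the dollar amounts are the unconfirmed estimates quoted above, not vendor prices.

```python
def annualized_range(unit_cost, min_years=3, max_years=4):
    """Return (low, high) cost per year for a unit replaced every min_years to max_years."""
    return unit_cost / max_years, unit_cost / min_years

# Estimates quoted above (still to be confirmed).
HEAD_NODE_UPS = 180  # dollars, APC brand, per head node or server
GPU_SET_UPS = 900    # dollars, per set of 4 GPU systems

head_low, head_high = annualized_range(HEAD_NODE_UPS)
gpu_low, gpu_high = annualized_range(GPU_SET_UPS)

print(f"Head node or server: ${head_low:.0f}-${head_high:.0f} per year")
print(f"Set of 4 GPU systems: ${gpu_low:.0f}-${gpu_high:.0f} per year")
```

The low end assumes the battery lasts the full ~4 years; the high end assumes replacement when the 3-year warranty ends.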

CCB head nodes' UPS status:

Remaining UPS's to invest in

Clusters

Most were done in Spring'14, after the spate of power failures. See CCB's HPC page (first chart, in "UPS for headnode" column) for details.

| Cluster | Done | Not done | Notes |
|---|---|---|---|
| Loring | | X | Unique: Need to do ASAP |
| Abruna | | X | Unique: Need to do ASAP |

Non-clusters

See CCB's HPC page (second chart, in "UPS" column) and CCB's non-HPC page (in "UPS" column) for details of the few that are already done.

Stand-alone computers' UPS status:

| Computer | Done | Not done | Notes |
|---|---|---|---|
| Coates: MS SQL Server | | X | Unique: Need to do ASAP |
| Freed: Eldor | | X | Unique: Need to do ASAP? (Q: Is OS backed up?) |
| Baird: 1 rack-mounted computational computer | | X | Need? |

After the above are done, review the other systems on the two pages cited above that might need a UPS.

Switches

Do all switches: Maybe ~$340 ($170*2), every ~4 years.

  • Recommend: Do ASAP.
  • Q: Funding?
  • Other issues and concerns with actually implementing this approach:
    • Rack space. Maybe power. Maybe cord lengths. What other issues?

Do all compute nodes: ~$18K initially, and perhaps ~$4.5K every ~4 years to replace batteries and deal with UPS hardware failures.

  • ~20 20-amp UPS's ($900 each) required.
    • Replacement batteries ~$200 each, or ~1/4 of the UPS replacement cost.
  • Estimates are simply back-of-the-envelope calculations.
  • If we were to actually implement this, there may be smarter ways to do it, but the total cost will likely not be lower.
    • In fact, costs may be higher if, for example, a different approach offers sufficiently higher benefit.
  • Issues and concerns with actually implementing this approach:
    • Costs. Rack space. Maybe power. Maybe cord lengths. What other issues?
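The back-of-the-envelope arithmetic behind these figures can be sketched in a few lines of Python. The constants are the rough estimates quoted above; the variable names are illustrative only.

```python
# Rough estimates quoted above; all values are approximate.
UPS_COUNT = 20       # ~20 20-amp UPS's
UPS_PRICE = 900      # dollars per UPS
BATTERY_PRICE = 200  # dollars per replacement battery

initial_cost = UPS_COUNT * UPS_PRICE          # up-front purchase of all UPS's
battery_refresh = UPS_COUNT * BATTERY_PRICE   # replacing every battery once
battery_fraction = BATTERY_PRICE / UPS_PRICE  # battery price relative to a new UPS

print(f"Initial purchase: ${initial_cost:,}")
print(f"Battery refresh:  ${battery_refresh:,}")
print(f"Battery / UPS price: {battery_fraction:.2f}")
```

Batteries alone come to about $4K per cycle. The ~$4.5K figure above also covers UPS hardware failures, though the page does not break that allowance down.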

Compute node counts, for UPS pricing estimates. Does not include head node:

| Cluster | Compute node count | Power strip equivalents (~8/strip MAX) | Cost estimate, every 4 years | Notes |
|---|---|---|---|---|
| Collum | 8 | 1 | $900 | |
| Lancaster, with Crane (new) | 10 | 2 | $1.8K | |
| Hoffmann | 19 | 2 | $1.8K | |
| Scheraga | 91 | 13 | $11.7K | |
| Loring | 4 | 1 | $900 | |
| Abruna | 9 | 1 | $900 | |
| C4 head node: pilot. Widom's 2 nodes there. | N/A | N/A | N/A | This CCB Community head node pilot has no compute nodes of its own. It hosts compute nodes from CCB researchers. |
| Widom | 2 | 1 | ? | Compute nodes are hanging off of "C4" head node, above. |
| TOTALS | ~140? | 21 | $18K + Widom | |
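The totals row can be cross-checked with a short Python sketch. It assumes one ~$900 UPS per power strip equivalent, which matches the per-cluster costs listed in the table, and it leaves Widom unpriced because its cost is listed as "?". The data structure and names are illustrative only.

```python
COST_PER_STRIP = 900  # dollars; assumed: one 20-amp UPS per power strip equivalent

# (cluster, compute nodes, power strip equivalents, cost is listed in the table)
clusters = [
    ("Collum", 8, 1, True),
    ("Lancaster, with Crane", 10, 2, True),
    ("Hoffmann", 19, 2, True),
    ("Scheraga", 91, 13, True),
    ("Loring", 4, 1, True),
    ("Abruna", 9, 1, True),
    ("Widom", 2, 1, False),  # cost listed as "?" in the table
]

total_nodes = sum(nodes for _, nodes, _, _ in clusters)
total_strips = sum(strips for _, _, strips, _ in clusters)
priced_cost = sum(strips * COST_PER_STRIP
                  for _, _, strips, priced in clusters if priced)

print(f"Compute nodes: {total_nodes}")
print(f"Power strip equivalents: {total_strips}")
print(f"Cost excluding Widom: ${priced_cost:,}")
```

The listed rows sum to 143 compute nodes, in line with the "~140?" in the totals row, and to 21 strip equivalents and $18K before Widom.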