A snapshot of the UPSs used to support servers, switches, and other equipment under ChemIT's management, mostly within ChemIT's Baker 248 server room.
UPS inventory for CCB Clusters and non-cluster HPCs
Cluster name | UPS for headnode | Maintenance | Upgrade status |
---|---|---|---|
| | NONE | | 4/14: Within a year, upgrade OS? |
| | (unknown) | | n/a |
| | Done Spring'14 | | Fall'13: Upgraded OS and added 2 nodes |
| | Done Spring'14 | | Winter'13/14: Upgraded OS and added 2 nodes |
| | Done Spring'14 | | Spring'14: Upgraded OS and added 2 nodes |
| | NONE | | 4/14: When do OS upgrades, and why? |
| | Done Fall'14 | | Summer'14: $50K hardware upgrades; to include OS upgrade. |
| | Yes | | Spring'14: Upgraded OS and added 2 nodes |
ChemIT (C4) | Done | | 4/14: When turn into production, and for whom? |
Totals: | | | |
CCB non-cluster HPCs, summary information
Inventory and summary notes regarding non-cluster HPC systems in 248, including computational stand-alone systems.
Columns still needed: software installed/managed (do per system; dedicated page?), cores, age, and storage and related details (RAID: hardware or software?).
Name of system, and purpose | ChemIT | DNS name | IP | ChemIT Network | Headnode IPMI Network | OS | OS Version | UPS | Maintenance | Upgrade status |
---|---|---|---|---|---|---|---|---|---|---|
Baird: 1 rack-mounted computational computer | ChemIT | as-chm-bair-08.ad.cornell.edu compute.baird.chem.cornell.edu | 10.253.229.178 | 192.168.255.120 | 192.168.255.121 | Windows Server | 2012R2 | NONE | | |
Freed: Eldor | ChemIT | eldor.acert.chem.cornell.edu | 10.253.229.96 | 192.168.255.87 | CentOS | 6.4 | NONE |
Petersen: 2 rack-mounted computational computers | ChemIT | calc01.petersen.chem.cornell.edu | 10.253.229.196/192 | | | Windows Server | 2012R2 | Yes, but needs to be deployed in true production; using Widom's UPS for now. | | |
Scheraga: 4 GPU rack-mounted computational computers | ChemIT | gpu.scheraga.chem.cornell.edu | 10.253.229.70 | 192.168.255.139 | 192.168.255.138 | CentOS | 6.4 | NONE | | |
What would it cost to UPS our research systems?
Assuming protection for 1-3 minutes MAXIMUM:
Do all head nodes and stand-alone computers in 248 Baker Lab
- Work on this has started. About $180 (APC brand) per head node or server every ~3-4 years (3-year warranty; ~4 years actual battery life), and ~$900 per set of 4 GPU systems every ~3-4 years (approach and estimates still to be confirmed).
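To compare these one-time prices with other recurring costs, they can be spread across their replacement cycle. A minimal sketch, assuming a fixed 3-4 year replacement cycle and ignoring tax, shipping, and early failures; `annualized_cost` is a hypothetical helper, not an existing ChemIT tool:

```python
# Back-of-the-envelope annualized UPS cost, using the rough figures on this page.
# Assumption: each UPS (or its battery) is replaced on a fixed 3-4 year cycle.

def annualized_cost(unit_cost: float, cycle_years: float) -> float:
    """Spread a one-time purchase evenly across its replacement cycle."""
    return unit_cost / cycle_years

HEAD_NODE_UPS = 180.0  # ~$180 (APC brand) per head node or server
GPU_SET_UPS = 900.0    # ~$900 per set of 4 GPU systems

for label, cost in [("head node/server", HEAD_NODE_UPS), ("4-GPU set", GPU_SET_UPS)]:
    low = annualized_cost(cost, 4)   # longer cycle gives the lower yearly cost
    high = annualized_cost(cost, 3)  # shorter cycle gives the higher yearly cost
    print(f"{label}: ${low:.0f}-${high:.0f} per year")
```

With these inputs, a head node or server works out to roughly $45-$60 per year, and a 4-GPU set to roughly $225-$300 per year.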
CCB head nodes' UPS status:
Remaining UPSs to invest in
Clusters
Most were done in Spring'14, after the spate of power failures. See CCB's HPC page (first chart, "UPS for headnode" column) for details.
Cluster | Done | Not done | Notes |
---|---|---|---|
Loring | | X | Unique: Need to do ASAP |
Abruna | | X | Unique: Need to do ASAP |
Non-clusters
See CCB's HPC page (second chart, in "UPS" column) and CCB's non-HPC page (in "UPS" column) for details of the few that are already done.
Stand-alone computers' UPS status:
Computer | Done | Not done | Notes |
---|---|---|---|
Coates: MS SQL Server | | X | Unique: Need to do ASAP |
Freed: Eldor | | X | Unique: Need to do ASAP? (Q: Is OS backed up?) |
Baird: 1 rack-mounted computational computer | | X | Need? |
After the above are done, review the two pages cited above for other systems that might need a UPS.
Switches
Do all switches: Maybe ~$340 ($170*2), every ~4 years.
- Recommend: Do ASAP.
- Q: Funding?
- Other issues and concerns with actually implementing this approach:
- Rack space. Maybe power. Maybe cord lengths. What other issues?
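The switch estimate can be annualized the same way. A minimal sketch, assuming "$170*2" means two UPS units at ~$170 each, replaced together every ~4 years (the unit count is an interpretation, not confirmed on this page):

```python
# Switch UPS estimate from this page: ~$170 per unit, two units, ~4-year cycle.
SWITCH_UPS_COST = 170.0  # ~$170 per UPS
SWITCH_UPS_COUNT = 2     # assumption: "$170*2" means two UPS units
CYCLE_YEARS = 4          # replaced every ~4 years

total = SWITCH_UPS_COST * SWITCH_UPS_COUNT
per_year = total / CYCLE_YEARS
print(f"${total:.0f} every ~{CYCLE_YEARS} years (~${per_year:.0f}/year)")
```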
Do all compute nodes: ~$18K initially, and perhaps ~$4.5K every ~4 years to replace batteries and deal with UPS hardware failures.
- ~20 20-amp UPSs ($900 each) required.
- Replacement batteries ~$200 each, or roughly 1/4 of a new UPS's cost.
- Estimates are simply back-of-the-envelope calculations.
- If we were to actually implement this, there may be smarter ways to do it, but the total cost will likely not be lower.
- In fact, costs may be higher, for example if a different approach offered sufficiently greater benefit.
- Issues and concerns with actually implementing this approach:
- Costs. Rack space. Maybe power. Maybe cord lengths. What other issues?
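The compute-node figures above follow from simple multiplication. A minimal sketch using only the page's own numbers; it assumes one ~$200 battery replacement per UPS per ~4-year cycle, and reads the remaining ~$0.5K of the ~$4.5K recurring figure as an allowance for UPS hardware failures (that reading is an assumption):

```python
# Compute-node UPS estimate from this page's back-of-the-envelope numbers.
UPS_COUNT = 20         # ~20 20-amp UPSs
UPS_UNIT_COST = 900.0  # ~$900 each
BATTERY_COST = 200.0   # ~$200 per replacement battery

initial = UPS_COUNT * UPS_UNIT_COST              # 20 * 900
batteries = UPS_COUNT * BATTERY_COST             # 20 * 200, per ~4-year cycle
battery_fraction = BATTERY_COST / UPS_UNIT_COST  # roughly 1/4 of a new UPS

print(f"Initial purchase: ${initial:,.0f}")
print(f"Battery replacement per ~4-year cycle: ${batteries:,.0f}")
print(f"Battery cost as a fraction of UPS cost: {battery_fraction:.2f}")
```

This gives $18,000 initially and $4,000 in batteries per cycle; the battery cost is about 0.22 of a new UPS, which the page rounds to ~1/4.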
Compute node counts, for UPS pricing estimates. Does not include head nodes:
- Count source: ChemIT's Computer counts with CCB clusters
Cluster | Compute node count | Power strip equivalents | Cost estimate | Notes |
---|---|---|---|---|
Collum | 8 | 1 | $900 | |
Lancaster, with Crane (new) | 10 | 2 | $1.8K | |
Hoffmann | 19 | 2 | $1.8K | |
Scheraga | 91 | 13 | $11.7K | |
Loring | 4 | 1 | $900 | |
Abruna | 9 | 1 | $900 | |
C4 head node: pilot | N/A | N/A | N/A | This CCB Community head node pilot has no compute nodes of its own. |
Widom | 2 | 1 | ? | Compute nodes hang off the "C4" head node, above. |
TOTALS | ~140? | 21 | $18K + Widom | |
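The table's totals can be checked by multiplying each cluster's power strip equivalents by ~$900 per UPS. A minimal sketch using the table's rows; Widom is kept separate because its cost is left as "?":

```python
# Check the compute-node UPS table: cost = power strip equivalents * ~$900.
COST_PER_UPS = 900  # ~$900 per 20-amp UPS ("power strip equivalent")

# (cluster, compute nodes, power strip equivalents), copied from the table.
clusters = [
    ("Collum", 8, 1),
    ("Lancaster, with Crane", 10, 2),
    ("Hoffmann", 19, 2),
    ("Scheraga", 91, 13),
    ("Loring", 4, 1),
    ("Abruna", 9, 1),
]
widom = ("Widom", 2, 1)  # cost not yet priced on this page

priced_nodes = sum(nodes for _, nodes, _ in clusters)
priced_strips = sum(strips for _, _, strips in clusters)
priced_cost = priced_strips * COST_PER_UPS

total_nodes = priced_nodes + widom[1]
total_strips = priced_strips + widom[2]

print(f"Priced clusters: {priced_nodes} nodes, {priced_strips} UPSs, ${priced_cost:,}")
print(f"Including Widom: {total_nodes} nodes, {total_strips} UPSs")
```

This reproduces the table's $18K for the priced clusters and its count of 21 power strip equivalents once Widom is included; the node count comes to 143, consistent with the table's "~140?". If Widom were priced at the same ~$900 rate, it would add one more UPS's cost, but the page leaves that open.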