A list of considerations when buying a new cluster, or adding to an existing cluster. Also applies to other high performance computing (HPC) systems.

General

Deployed, or simply ideas or options which would require further study

Notes

Scheraga's Matrix upgrade, 2014
$50K (1st of 3 yrs; ~$150K total)

Software

ChemIT: OS, the cluster's software "stack", and core applications.
Researchers: Their applications, configurations, scripts, sharing, etc.
Maintenance and updates, including scheduling. Roll-back process, if any. Process for major upgrades.

See Roles and responsibilities for clusters managed by ChemIT.

Confirm this is Czerek.
Confirm applications and their locations (some shared apps are in user directories, which is not a best practice).

Backup

EZ-Backup service
Local, on head node: Copies and/or synced
Local, off head node: Copies and/or synced
Using SFS, which itself has versioning and whose price includes backups with EZ-Backup.
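
As a rough illustration of the "copies and/or synced" options, the sketch below only builds an rsync command line for mirroring a directory to another location. The function name, the paths, and the host name are hypothetical, and nothing here reflects ChemIT's actual backup setup. It defaults to a dry run so that nothing is copied or deleted by accident.

```python
import shlex


def build_rsync_command(source_dir, dest, delete=False, dry_run=True):
    """Return an rsync argument list that mirrors source_dir into dest.

    All names and paths are illustrative assumptions, not an existing setup.
    """
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps permissions and times)
    if delete:
        cmd.append("--delete")  # remove files at dest that no longer exist at source
    if dry_run:
        cmd.append("--dry-run")  # report what would change without changing anything
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(dest)
    return cmd


# Hypothetical example: sync home directories to a host other than the head node.
print(shlex.join(build_rsync_command("/home", "backuphost:/backups/home", delete=True)))
```

Note that a plain sync without versioning protects against hardware loss but not against accidental deletion, since deletions are mirrored when `--delete` is used.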

See Cluster backups and related considerations.
Not all options are mutually exclusive.
Options vary in what they protect against and their start-up and on-going costs.
Options vary in restore times and in end-user vs. mediated restores.
Rule of thumb: The faster you pull unique data off, the less you have to invest in backups.

Get input from Czerek on our current practices, costs, and value, as well as the other ideas listed.
See data storage, below.

Head nodes and compute nodes

Ensure a contemporary head node, taking into account its age, warranty, and ease of replacement with a compute node (unique attributes, including hard drive bays).
Node form factors: Singles, twins, quads. And 1U, 2U, etc.
Compute node technologies: Anything special required? Examples are GPUs and InfiniBand.
Upgrade, removal, and expansion process.

See ChemIT's inventory snapshots of CCB's clusters.

Will require a new, dedicated head node.
Consider buying compute nodes only after the head node is set up, with the required software running on a few old compute nodes. (Cheaper or better compute nodes if we wait months?)
Q: Are any GPUs required this first year? If so, should they be kept off the cluster? (That is how the four other GPU-based compute nodes are set up: they run completely independently of the cluster and of each other.)

Data storage

Storage required for the head node and computational use (short term), including job store and user accounts.
Longer-term storage needs, which a file server may meet better. Examples include the SFS service (NFS is an option there).

Storing large amounts of data makes restores harder, riskier, and more time-consuming. Backing up large amounts of data also costs more than backing up smaller amounts.
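
To make the restore-time point concrete, here is a minimal back-of-the-envelope sketch. The function name and the example figures are assumptions for illustration only, not measurements from any CCB cluster. It simply divides data size by a sustained transfer rate.

```python
def restore_hours(data_tb, throughput_mb_per_s):
    """Estimate hours needed to restore data_tb terabytes at a sustained rate.

    Uses decimal units (1 TB = 1,000,000 MB). Real restores are usually slower
    because of many small files, verification, and contention.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = data_tb * 1_000_000
    return total_mb / throughput_mb_per_s / 3600


# Hypothetical figures: compare a small and a large data set at the same rate.
for size_tb in (1, 10):
    print(f"{size_tb} TB at 100 MB/s: about {restore_hours(size_tb, 100):.1f} hours")
```

Because the estimate scales linearly with data size, separating long-term files from current computational data directly shortens the restore window for whichever set is lost.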

Consider the value in separating out longer-term user files from those related to current computational data.
See backups, above.

Networking

Ensure an adequate number of network switches is provisioned. Cabling. Physical arrangement/proximity.

 

Will require some more switches and cables.

Power

Power strips required (limits!).
Power interruptions: Require an Uninterruptible Power Supply (UPS). Costs usually limit protection to the head nodes only. Duration of protection? Auto-shutdown of head node on protracted outage? Form factor options.
Procedure for staff when power goes out, and when it is restored (recovery, restart). Both during office hours and outside of normal office hours, or if key staff are unavailable.
Heat dissipation (HVAC), including emergencies.
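
The auto-shutdown question above comes down to a small decision rule. The sketch below is one possible rule under stated assumptions: the function name, the grace period, and the shutdown-time margin are all hypothetical, and how the on-battery time and remaining runtime are obtained from a given UPS is left out entirely.

```python
def should_shut_down(seconds_on_battery, estimated_runtime_left_s,
                     grace_period_s=300, shutdown_time_s=120):
    """Decide whether to begin a clean shutdown of the head node.

    All thresholds are illustrative assumptions:
    - grace_period_s: how long to ride out an outage before giving up.
    - shutdown_time_s: how long a clean shutdown is expected to take.
    """
    if seconds_on_battery <= 0:
        return False  # on mains power, nothing to do
    if estimated_runtime_left_s <= shutdown_time_s:
        return True  # battery nearly exhausted, shut down immediately
    # Otherwise wait out short outages, and shut down once the grace period passes.
    return seconds_on_battery >= grace_period_s


# Hypothetical readings: one minute into an outage with plenty of battery left.
print(should_shut_down(seconds_on_battery=60, estimated_runtime_left_s=1000))
```

A rule like this only covers the head node itself. The staff procedure for restart after power returns still needs to be written down separately.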

 

Recently purchased UPS can be used for the new head node. Any further protection required to reduce downtime?

Cornell Active Directory

When is it of value to the research group or ChemIT?

 

ChemIT is not ready to do this at this time. But maybe we will be by the time the cluster is ready to deploy?

Rack space

Physical arrangement. Form factors (see nodes, above).

 

 

Upgrade process contacts and roles

Funder(s). Technical lead (in research group). Testers. Users.

 

 
