A list of considerations when buying a new cluster, or adding to an existing cluster. Also applies to other high performance computing (HPC) systems.
General |
Deployed, or simply ideas or options which would require further study |
Notes |
Software |
ChemIT: OS, the cluster's software "stack", and core applications.
Researchers: Their applications, configurations, scripts, sharing, etc.
Maintenance and updates, including scheduling. |
See Roles and responsibilities for clusters managed by ChemIT. |
Backup |
EZ-Backup service
Local, on head node: Copies and/or synced
Local, off head node: Copies and/or synced
Using SFS, which itself has versioning, and whose price includes backups with EZ-Backup. |
See Cluster backups and related considerations.
Not all options are mutually exclusive.
Options vary in what they protect against and their start-up and on-going costs.
Options vary in restore times and in end-user vs. mediated restores.
Rule of thumb: The faster you pull unique data off, the less you have to invest in backups. |
Head nodes and compute nodes |
Ensure a contemporary head node, taking into account its age, warranty, and ease of replacement with a compute node (unique attributes, including hard drive bays).
Node form factors: Single, twin, quad. Also 1U, 2U, etc.
Compute node technologies: Anything special required? Examples are GPUs and InfiniBand. |
See ChemIT's inventory snapshots of CCB's clusters. |
Data storage |
Storage required for head node and computational use (short term), including job store and user accounts.
Longer-term storage needs, which a file server may meet better. Examples include the SFS service (NFS is an option there). |
Storing large amounts of data makes restores harder, riskier, and more time-consuming. Storing large amounts of data needing backups will cost more than smaller amounts of data. |
Networking |
Ensure an adequate number of network switches is provisioned. Cabling. Physical arrangement/proximity. |
|
Power |
Power strips required (limits!).
Power interruptions: Require an Uninterruptible Power Supply (UPS). Cost usually limits UPS protection to the head nodes only. Duration of protection? Auto-shutdown of head node on protracted outage? Form factor options.
Procedure for staff when power goes out, and when it is restored (recovery, restart). Both during office hours and outside of normal office hours, or if key staff are unavailable.
Heat dissipation (HVAC), including emergencies. |
|
Cornell Active Directory |
When is it of value to a research group or ChemIT? |
|
Rack space |
Physical arrangement. Form factors (see nodes, above). |
|
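For the power strip limits noted under Power, a rough sizing check might look like the sketch below. It is only an illustration. The function names, the 80% continuous-load derating, and all voltage, amperage, and wattage figures are assumptions, not values from ChemIT's clusters. Check actual limits against the power strip's rating and the node vendor's measured draw.

```python
import math


def max_nodes_per_circuit(volts, breaker_amps, node_watts, derate=0.8):
    """Estimate how many nodes one circuit (or power strip) can carry.

    derate is an assumed safety factor for continuous load; 0.8 is a
    common planning figure, not a value taken from this page.
    """
    if node_watts <= 0:
        raise ValueError("node_watts must be positive")
    usable_watts = volts * breaker_amps * derate
    return math.floor(usable_watts / node_watts)


def circuits_needed(node_count, nodes_per_circuit):
    """Estimate how many circuits are needed for node_count nodes."""
    if nodes_per_circuit <= 0:
        raise ValueError("a single node exceeds the circuit's usable power")
    return math.ceil(node_count / nodes_per_circuit)


if __name__ == "__main__":
    # Hypothetical figures: 208 V / 30 A circuit, 450 W per node, 32 nodes.
    per_circuit = max_nodes_per_circuit(volts=208, breaker_amps=30, node_watts=450)
    print("Nodes per circuit:", per_circuit)
    print("Circuits for 32 nodes:", circuits_needed(32, per_circuit))
```

With these made-up figures, 11 nodes fit on each circuit and 32 nodes need 3 circuits. The same arithmetic gives a first estimate of the load a UPS would have to carry if it protected more than the head node.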