List does not include Linux file servers under ChemIT management. Add later?
See also
CCB Clusters, summary information
The following counts include head nodes and compute nodes.
Columns needed - software installed / managed (do per cluster), Nodes/Cores, Age, storage (head/compute) and related (RAID: h/w or s/w?)
Cluster name | Number of nodes | ChemIT | DNS name | Two-step (Duo) option for access. NOTE: If so, still offers VPN alternative access | IP | Internal network | ChemIT Network | Headnode IPMI Network | Headnode OS | OS Version | Provisioning | Provisioning version | Scheduler | Scheduler version | Backup information | UPS for See: | Maintenance | Upgrade status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
41 | CAC | astra.cac.cornell.edu | 128.84.3.66 | N/A | N/A | CentOS | 6.2 | Rocks |
|
|
| CAC: EZ-Backup | Yes, per CAC (via building's UPS) |
| n/a | |||
Collum-Loring-Abruna-Widom "CLAW" Cluster | ?? | ChemIT | boltzmann.chem.cornell.edu | As of 1/18/17: Duo installed, NOT enabled | 10.253.229.28 | 172.0.50.x | 192.168.255.50 | 192.168.255.51 | CentOS | 6.6 | Warewulf |
|
| Partial. Collum: EZ-Backup. All others: NOT BACKED UP | Yes |
| Fall'13: Upgraded OS and added 2 nodes | |
28 | ChemIT | sol.hoffmann.chem.cornell.edu | 128.253.229.81 | 172.0.100.x | 192.168.255.100 | 192.168.255.101 | CentOS | 6.4 | Warewulf | 3.4 | Torque/Maui | torque-2.5.13 / maui-3.3.1 | ?? | Yes |
| Winter'13/14: Upgraded OS and added 2 nodes | ||
11 | ChemIT | revc.lancaster.chem.cornell.edu | 128.253.229.213 | 172.0.60.x | none | 192.168.255.61 | CentOS | 6.5 | Warewulf | 3.4 | Torque/Maui | torque-2.5.13 / maui-3.3.1 | ?? | Yes |
| Spring'14: Upgraded OS and added 2 nodes | ||
95 | ChemIT | scheraga.chem.cornell.edu | 128.253.229.65 | 192.168.255.1 | 192.168.255.5 | CentOS | 6.5 | Warewulf |
| Torque/Maui | torque-2.5.9 / maui-3.3.1 | EZ-Backup. 1) Home 2) Boot and applications. 3) Fileshare. (Not "notbackedup" cache data drive, obviously.) | Yes |
| Summer'14: $50K hardware upgrades; to include OS upgrade. | |||
ChemIT (C4) | 1 | ChemIT | cluster.chem.cornell.edu | 10.253.229.9 | 192.168.255.110 | none | CentOS | 6.4 from 6.2 | Warewulf |
| Torque/Maui | torque-2.5.12 / maui-3.3.1 | N/A? | Yes |
| 4/14: When will this turn into production, and for whom? | ||
Totals: |
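The Totals row is still empty. As a minimal sketch, the node total could be tallied from the counts listed in the table above. It assumes each count belongs to the DNS name on the same row; the rows are flattened, so that pairing is an assumption. The CLAW count, listed as "??", is treated as unknown. The names `node_counts` and `total_known_nodes` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Node counts (head nodes plus compute nodes) as listed in the table above,
# keyed by DNS name. None marks a count the table records as "??".
node_counts: Dict[str, Optional[int]] = {
    "astra.cac.cornell.edu": 41,
    "boltzmann.chem.cornell.edu": None,  # CLAW: count listed as "??"
    "sol.hoffmann.chem.cornell.edu": 28,
    "revc.lancaster.chem.cornell.edu": 11,
    "scheraga.chem.cornell.edu": 95,
    "cluster.chem.cornell.edu": 1,
}


def total_known_nodes(counts: Dict[str, Optional[int]]) -> Tuple[int, List[str]]:
    """Return the sum of the known counts and the names with unknown counts."""
    known = sum(n for n in counts.values() if n is not None)
    unknown = [name for name, n in counts.items() if n is None]
    return known, unknown


known, unknown = total_known_nodes(node_counts)
print(f"Known nodes: {known}; counts still missing for: {', '.join(unknown)}")
```

Keeping the unknown entries separate means the total is reported as a lower bound rather than silently treating "??" as zero.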
Deprecated Chemistry Clusters, summary info
Cluster name | Number of nodes | ChemIT | DNS name | IP | Internal network | ChemIT Network | UPS for | Upgrade status |
---|---|---|---|---|---|---|---|---|
Abruna (as stand-alone) | 10 | ChemIT | Used to be: | Used to be: 10.253.229.249 | Used to be: 192.168.255.30 | 4/14: No h/w upgrades planned. | ||
Collum (as stand-alone) | 9 | ChemIT | Used to be: | Used to be: 10.253.229.248 | Used to be: 172.0.40.x | Used to be: none | Used to be: Yes (Where is this used now?!) | Fall'13: Upgraded OS and added 2 nodes |
CCB non-cluster HPCs, summary information
Inventory and summary notes regarding non-cluster HPC systems in 248, including computational stand-alone systems.
Columns needed - software installed / managed (do per system; dedicated page?), Cores, Age, storage and related (RAID: h/w or s/w?)
Name of system, and purpose | ChemIT | DNS name | IP | ChemIT Network | Headnode IPMI Network | OS | OS Version | UPS | Maintenance | Upgrade status |
|
---|---|---|---|---|---|---|---|---|---|---|---|
Baird: 1 rack-mounted computational computer | ChemIT | as-chm-bair-08.ad.cornell.edu compute.baird.chem.cornell.edu | 10.253.229.178 | 192.168.255.120 | 192.168.255.121 | Windows Server | 2012R2 | NONE |
|
|
|
Freed: Eldor | ChemIT | eldor.acert.chem.cornell.edu | 10.253.229.96 | 192.168.255.87 | CentOS | 6.4 | NONE |
|
|
| |
Petersen: 2 rack-mounted computational computers | ChemIT | calc01.petersen.chem.cornell.edu | 10.253.229.196/192 | 192.168.255.54/55 | CentOS | 7.2v1511 | Yes |
|
|
| |
Scheraga: 4 GPU rack-mounted computational computers | ChemIT | gpu.scheraga.chem.cornell.edu | 10.253.229.70 | 192.168.255.139 | 192.168.255.138 | CentOS | 6.4 | NONE |
|
|
|
See: UPS inventory and status |
Index to the many children pages of this page
aaClusters moving to CAC
Documentation page place-holder for collecting information related to having current Chemistry IT-managed clusters (in 248 Baker Lab) moved to CAC.
- Why move Chemistry's clusters from Baker to CAC? — Printable PDF flyer (800px*2000px)
Abruna Cluster
ChemIT Cluster
Collum Cluster
8 compute nodes, 1 head node. Details on this page.
Collum-Loring-Abruna-Widom "CLAW" Cluster
Cluster built on Widom's headnode. 1 headnode and xx compute nodes.
Freed Acert Eldor HPC
Non-cluster HPC
Hoffmann Cluster
Lancaster Crane Cluster
Petersen Independent Nodes
Scheraga Cluster
Upgrading summer 2014.
- Matrix compute nodes — Table containing node numbers and hardware information.
- Processor info and core counts — Matrix has 952 typical processor cores when all nodes are connected. Matrix could have as many as 9,144 cores if it could utilize its 4 nodes with GPUs, which together contain another 8,192 cores. However, these additional GPU-based cores require specialized programming and have not to date been made available to researchers via the cluster. Those cores have only been accessible to researchers accessing these specialized nodes provisioned as workstations (and thus not attached
- Matrix end-user documentation
- Matrix end-user application information — Details for end-users regarding their applications on Matrix, including who the group contact is.
- Matrix end-user documentation, from ChemIT — The new Matrix is faster, but it is different. Learn about the differences here to reduce your aggravation.
- Matrix end-user documentation, from Group
- Matrix user job limits
- Matrix users information (name, netid, status, quota) — On Matrix, researchers have both a quota for their home directory (keep as small as reasonable), and a quota for their storage directory.
- Scheraga Synology
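The Matrix core counts quoted under "Processor info and core counts" can be checked with simple arithmetic. This is a minimal sketch; the per-node figure assumes the 8,192 GPU cores are split evenly across the 4 GPU nodes, which the page does not state.

```python
cpu_cores = 952    # typical processor cores with all nodes connected
gpu_cores = 8192   # additional cores on the GPU nodes
gpu_nodes = 4      # number of nodes with GPUs

# Total if the GPU nodes were usable through the cluster.
total_cores = cpu_cores + gpu_cores

# Assumption: the GPU cores are divided evenly among the GPU nodes.
cores_per_gpu_node = gpu_cores // gpu_nodes

print(total_cores)         # 9144
print(cores_per_gpu_node)  # 2048
```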
zClarifying cluster responsibilities and ownership
Effective use of a cluster for research is enhanced with clarity of roles and responsibilities, along with shared conventions and procedures.
- Buying or adding to a cluster — Technical considerations when buying a new cluster, or adding to an existing cluster. Also applies to other high performance computing (HPC) systems.
- Lancaster and Crane's cluster — Lancaster and Crane share a headnode, so social conventions are required to ensure researchers are not negatively surprised.
- Roles and responsibilities for clusters managed by ChemIT
- Cluster application software specifics — Various scientific applications are used on CCB clusters. In general, Research Groups are responsible for knowing their software. In many cases, ChemIT staff can assist researchers, but cannot take responsibility or expend unlimited effort.
zCluster backups and related considerations
Although there may be unique considerations regarding backups for high performance computer systems, including clusters, first see Backups and file storage options for research groups.
zCluster Computational Software
Computational software installed on CCB clusters, and who supports and manages which software.
zCluster counts details and history
Inventory counts of CCB's HPC computers, clusters only.
zConnecting to Clusters
zMaintenance and emergency procedures
Clusters and other high performance servers require maintenance. Documented procedures reduce surprises during both scheduled maintenance and emergency work.
- Cluster and HPC maintenance schedules — Regular maintenance of clusters requires downtime. A maintenance schedule can reduce surprises and avoid unnecessarily delaying required maintenance.
- Cluster Maintenance SOP — This page includes a checklist for preparing any maintenance work, and a listing of the sequence of steps to take.
- Templates of notification emails
zStorage for HPCs and other systems
Sometimes one or more local hard drives are all you need. But often the right solution is something else. Look here for information on alternatives, some of which are successfully used in production and are very cost-effective.
zUseful Linux HPC commands
- Linux commands — Commands Oliver wants to keep track of, at a minimum.