Excerpt |
---|
Upgrade to CentOS and add 2 nodes to existing cluster |
Time and labor estimates
Start-to-finish time (taking into account availability): ~5 weeks (start 7/31, Wed)
Est. labor: ~110 hours for Lulu, plus Michael's hours for networking, hardware configuration, consultation, etc.
Work descriptions | Effort | Elapsed time | Est. date |
---|---|---|---|
Install and config CentOS on headnode | 3 days | 1 week | |
Install and config Warewulf | 5 days | 1 week | |
Install and config Torque | 2 days | 0.5 week | |
Install and config WebMO | 5 days | 1 week | |
Install and config 2nd (new) node | 1 day | 0.2 week | |
Jun et al. test | 2 days | 1 week | End of August |
Plan
Cluster Upgrade Description:
Goal: Upgrade and expand the existing Collum High Performance Computing (HPC) cluster.
The existing cluster consists of a head node and 6 slave compute nodes. The desire is to purchase 2 new nodes and expand the cluster to 8 compute nodes.
Cluster software currently in use includes:
- Fedora 11 - Operating System (2009 release, current version is Fedora 19)
- Perceus - a slave node provisioning and management system (Note: Perceus is now obsolete; development has ended.)
- Torque - a distributed resource manager / queuing system, providing control over batch jobs and distributed compute nodes
- Maui Cluster Scheduler - job scheduler for use on clusters
- WebMO - a Web-based interface to computational chemistry packages
Notes:
- Discovery shows that the new nodes require a newer operating system version: they are not supported by Fedora 11 and will not boot on the current OS configuration.
- ChemIT is now using CentOS (6.4) for new OS installations instead of Fedora
- Warewulf has superseded Perceus for cluster node provisioning and management, and is ChemIT's current provisioning package
- Torque and Maui are still the preferred manager and scheduler.
To upgrade to a current OS while providing a smooth transition, the proposed upgrade sequence is as follows:
- Build a new cluster with current software, using one of the new nodes as a head node. Once this is working, transition the existing cluster hardware to the new cluster.
- Install HPC cluster software: CentOS 6.4, Warewulf, Torque, Maui, and WebMO
- Install applications
- Add 2nd new node as a slave, creating a functioning cluster
- Test and verify
- Move accounts, data, and computing nodes from the old cluster to the new cluster.
This will result in a fully upgraded cluster, using current HPC tools, with a newer head node that is under warranty.
Plan
See the work descriptions above for detailed steps and the work estimates below.
Overview plan:
- Pull an old Compute Node (CN) and convert it to a new CentOS Head Node ("new HN")
- Add new CNs to new HN
- Add one old CN to new HN
- After testing, shift production to new HN.
- Add the rest of the old CNs to new HN
- Convert old HN to CN and add it to new HN
- Later; see P41: Migrate to ChemIT Community Head Node
- Add all CNs to ChemIT Community HN
- Convert Collum HN to CN and add to ChemIT Community HN
...
- If so, use another node (for example, 3).
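As a sanity check on the overview plan, the node counts can be tracked step by step. The following is a minimal sketch, assuming the starting counts from the cluster description (1 head node, 6 compute nodes, 2 new nodes) and assuming the converted old head node counts as a compute node. The function and variable names are hypothetical and do not correspond to any cluster tool.

```python
def run_overview_plan():
    """Track node counts through the overview plan steps (illustrative only)."""
    old = {"head": 1, "compute": 6}   # existing cluster
    new = {"head": 0, "compute": 0}   # cluster being built
    new_nodes = 2                     # purchased, not yet assigned

    def move(src_role, dst_role, count=1):
        # Move nodes from the old cluster to the new one, never going negative.
        if old[src_role] < count:
            raise ValueError(f"not enough old {src_role} nodes")
        old[src_role] -= count
        new[dst_role] += count

    move("compute", "head")            # pull an old CN, convert it to the new HN
    new["compute"] += new_nodes        # add the new CNs to the new HN
    move("compute", "compute")         # add one old CN for testing
    # Production shifts to the new HN here; counts do not change.
    move("compute", "compute", old["compute"])  # add the rest of the old CNs
    move("head", "compute")            # convert the old HN to a CN
    return old, new


if __name__ == "__main__":
    old, new = run_overview_plan()
    print("old cluster:", old)
    print("new cluster:", new)
```

Under these assumptions the new cluster ends with 1 head node and 8 compute nodes, which matches the stated goal of 8 compute nodes, and no hardware is left in the old cluster.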
Time and labor estimates
View Project timeline: Collum Cluster timeline.pdf
Est. labor: ~286 hours
Duration - Start-to-finish time (taking into account availability): 30 work days (start 7/31, Wed)
Work descriptions | Effort | Elapsed time | Est. date |
---|---|---|---|
Install and config CentOS on headnode | 3 days | 1 week | |
Install and config Warewulf | 5 days | 1 week | |
Install and config Torque / Maui | 2 days | 0.5 week | |
Install and config WebMO | 5 days | 1 week | |
Install and config 2nd (new) node | 1 day | 0.2 week | |
Copy Jun data for test | 2 days | 0.5 week | |
Jun (and group) test & tweak | 2 days | 1 week | |
Cleanup / additional | 2 days | 0.5 week | |
Move old cluster nodes to new cluster | 1 day | 0.2 week | |
Move old cluster user data to new cluster | 2 - 3 days | 0.5 week | |
Total | 25 days | 6.4 weeks | |
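The totals row can be reproduced by summing the effort and elapsed-time columns. The following is a minimal sketch, assuming the low end (2 days) of the "2 - 3 days" range for the user data move; the `TASKS` list and helper names are hypothetical and only restate the figures from the table.

```python
# Each entry: (work description, effort in days, elapsed time in weeks).
TASKS = [
    ("Install and config CentOS on headnode", 3, 1.0),
    ("Install and config Warewulf", 5, 1.0),
    ("Install and config Torque / Maui", 2, 0.5),
    ("Install and config WebMO", 5, 1.0),
    ("Install and config 2nd (new) node", 1, 0.2),
    ("Copy Jun data for test", 2, 0.5),
    ("Jun (and group) test & tweak", 2, 1.0),
    ("Cleanup / additional", 2, 0.5),
    ("Move old cluster nodes to new cluster", 1, 0.2),
    # Assumption: low end of the "2 - 3 days" estimate.
    ("Move old cluster user data to new cluster", 2, 0.5),
]


def total_effort_days(tasks):
    """Sum the effort column, in days."""
    return sum(effort for _, effort, _ in tasks)


def total_elapsed_weeks(tasks):
    """Sum the elapsed-time column, rounded to one decimal to hide float noise."""
    return round(sum(weeks for _, _, weeks in tasks), 1)


if __name__ == "__main__":
    print("Total effort (days):", total_effort_days(TASKS))
    print("Total elapsed (weeks):", total_elapsed_weeks(TASKS))
```

With the low-end figure the sums come to 25 days of effort and 6.4 weeks elapsed, as in the totals row; using 3 days for the user data move would raise the effort total to 26 days.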