Upgrade to CentOS and add 2 nodes to the existing cluster

Cluster Upgrade Description:

Goal: Upgrade and expand the existing Collum High Performance Computing (HPC) cluster.

The existing cluster consists of a head node and 6 slave compute nodes. The plan is to purchase 2 new nodes, expanding the cluster to 8 compute nodes.

Cluster software currently in use includes:

Notes:

To upgrade to a current OS while providing a smooth transition, the proposed upgrade and sequence are as follows:

This will result in a fully upgraded cluster, using the current HPC tools, with a newer head node (under warranty).

Plan

See work estimates below for detailed steps.

Overview plan:

Risks

And possible ways to address them.

Gaussian needs to be recompiled (under new OS).

WebMO needs to be upgraded from the 2010 version. This may be worth doing anyway, if "good reasons" are identified.

Node 2 turns out to be a dud (fails or is unstable).

Time and labor estimates

View Project timeline: Collum Cluster timeline.pdf

Est. labor: ~286 hours

Duration (start-to-finish time, taking availability into account): 30 work days (start Wed, 7/31)

Work description | Effort (hours or days) | Elapsed time (usually days or weeks) | Est. date
--- | --- | --- | ---
Install and config CentOS on head node (use Node 2) | 3 days | 1 week |
Install and config Warewulf; includes attaching one (new) node | 5 days | 1 week |
Install and config Torque / Maui | 2 days | 0.5 week |
Install and config WebMO | 5 days | 1 week |
Install and config 2nd (new) node and 3rd (old) node | 1 day | 0.2 week |
Copy Jun data for test | 2 days | 0.5 week |
Jun (and group) test & tweak | 2 days | 1 week |
Cleanup / additional | 2 days | 0.5 week |
Move old cluster nodes to new cluster (keep the old head node for 1 month?) | 1 day | 0.2 week |
Move old cluster user data to new cluster (Collum group down time) | 2 - 3 days | 0.5 week |
Total | 25 days | 6.4 weeks |
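
The totals in the table above can be re-derived with a short script, which may be handy if individual estimates change. This is only a sketch: the task labels are abbreviated from the table, the `TASKS` list and `totals` helper are illustrative names rather than part of any existing tooling, and the "2 - 3 days" entry is treated as a low/high range (an assumption about how the 25-day total was reached).

```python
from decimal import Decimal

# (task, (effort_low_days, effort_high_days), elapsed_weeks)
# Values copied from the estimate table; labels are abbreviated.
TASKS = [
    ("CentOS on head node",           (3, 3), Decimal("1")),
    ("Warewulf + one new node",       (5, 5), Decimal("1")),
    ("Torque / Maui",                 (2, 2), Decimal("0.5")),
    ("WebMO",                         (5, 5), Decimal("1")),
    ("2nd (new) and 3rd (old) node",  (1, 1), Decimal("0.2")),
    ("Copy Jun data for test",        (2, 2), Decimal("0.5")),
    ("Jun (and group) test & tweak",  (2, 2), Decimal("1")),
    ("Cleanup / additional",          (2, 2), Decimal("0.5")),
    ("Move old nodes to new cluster", (1, 1), Decimal("0.2")),
    ("Move old user data",            (2, 3), Decimal("0.5")),  # "2 - 3 days"
]


def totals(tasks):
    """Return (low effort days, high effort days, elapsed weeks)."""
    low = sum(effort[0] for _, effort, _ in tasks)
    high = sum(effort[1] for _, effort, _ in tasks)
    # Decimal avoids binary floating-point drift when adding 0.2 and 0.5.
    weeks = sum((w for _, _, w in tasks), Decimal("0"))
    return low, high, weeks


if __name__ == "__main__":
    low, high, weeks = totals(TASKS)
    print(f"Effort: {low}-{high} days; elapsed: {weeks} weeks")
    # prints "Effort: 25-26 days; elapsed: 6.4 weeks"
```

The low end of the effort range matches the 25-day total in the table, and the elapsed times sum to the stated 6.4 weeks.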