Upgrade to CentOS and add 2 nodes to existing cluster

Cluster Upgrade Description:

Goal: Upgrade and expand the existing Collum High Performance Computing (HPC) cluster.

The existing cluster consists of a head node and 6 slave compute nodes. The plan is to purchase 2 new nodes, expanding the cluster to 8 compute nodes.

Cluster software currently in use includes:

  • Fedora 11 - Operating System (2009 release, current version is Fedora 19)
  • Perceus - a slave node provisioning and management system (Note: Perceus is now obsolete; development has ended.)
  • Torque - a distributed resource manager / queuing system, providing control over batch jobs and distributed compute nodes
  • Maui Cluster Scheduler - job scheduler for use on clusters
  • WebMO - a Web-based interface to computational chemistry packages

Notes:

  • Discovery shows that the new nodes require an updated operating system version: they are not supported by Fedora 11 and will not boot on the current OS configuration.
  • ChemIT now uses CentOS (6.4) instead of Fedora for new OS installations.
  • Warewulf has superseded Perceus for cluster node provisioning and management, and is ChemIT's current provisioning package.
  • Torque and Maui are still the preferred manager and scheduler.

To upgrade to a current OS while keeping the transition smooth, the proposed upgrade sequence is as follows:

  • Build a new cluster with current software, using one of the new nodes as a head node. Once this is working, transition the existing cluster hardware to the new cluster.
    • Install HPC cluster software: CentOS 6.4, Warewulf, Torque, Maui, and WebMO
    • Install applications
    • Add the 2nd new node as a slave, creating a functioning cluster
    • Test and verify
    • Move accounts, data, and computing nodes from old cluster to the new cluster.

This will result in a fully upgraded cluster, using the current HPC tools, with a newer head node (under warranty).

Plan

See work estimates below for detailed steps.

Overview plan:

  • Pull an old Compute Node (CN) and convert to a new CentOS Head Node ("new HN")
  • Add new CNs to new HN
  • Add one old CN to new HN
  • After testing, shift production to new HN.
  • Add the rest of the old CNs to new HN
  • Convert old HN to CN and add it to new HN
  • Later (see P41): Migrate to ChemIT Community Head Node
    • Add all CNs to ChemIT Community HN
    • Convert Collum HN to CN and add to ChemIT Community HN

Risks

And possible ways to address them.

Gaussian needs to be recompiled (under new OS).

  • If so, this adds a lot of time and uncertainty. It spins up a whole new, large project and means working with the PGI compiler.

WebMO needs to be upgraded from 2010 version. Perhaps a good idea to do anyway, if "good reasons" are identified.

  • If so, $1,000, plus the time and process to order and obtain the software, learn the differences, and apply it.

Node 2 turns out to be a dud (fails or is unstable).

  • If so, use another node (for example, 3).

Time and labor estimates

Start-to-finish time (taking into account availability): ~5 weeks (start 7/31, Wed)

Est. labor: ~110 hours for Lulu, plus Michael's hours for networking, hardware configuration, consultation, etc.

| Work descriptions | Effort (hours or days) | Elapsed time (usually days or weeks) | Est. date |
|---|---|---|---|
| Install and config CentOS on head node (Use Node 2) | 3 days | 1 week | |
| Install and config Warewulf. Includes attaching one (new) node | 5 days | 1 week | |
| Install and config Torque / Maui | 2 days | .5 wk | |
| Install and config WebMO | 5 days | 1 week | |
| Install and config 2nd (new) node and 3rd (old) node. | 1 day | .2 week | |
| Copy Jun data for test | 2 days | .5 wk | |
| Jun (and group) test & tweak | 2 days | 1 week | |
| Cleanup / additional | 2 days | .5 wk | |
| Move old cluster nodes to new cluster. Keep the old head node for 1 month? | 1 day | .2 wk | |
| Move old cluster user data to new cluster (Collum group down time) | 2 - 3 days | .5 week | |
| Total | 25 days | 6.4 weeks | |
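
As a cross-check on the totals in the estimates table above, the short sketch below re-adds the per-task figures. The task labels are abbreviated from the table. Entering the "2 - 3 days" row at its lower bound of 2 days is an assumption; the helper name `totals` is hypothetical and exists only for this illustration.

```python
# Effort (days) and elapsed time (weeks), copied from the estimates table.
# Assumption: the "2 - 3 days" row is entered at its lower bound (2 days).
ESTIMATES = [
    ("Install and config CentOS on head node", 3, 1.0),
    ("Install and config Warewulf", 5, 1.0),
    ("Install and config Torque / Maui", 2, 0.5),
    ("Install and config WebMO", 5, 1.0),
    ("Install and config 2nd (new) and 3rd (old) node", 1, 0.2),
    ("Copy Jun data for test", 2, 0.5),
    ("Jun (and group) test & tweak", 2, 1.0),
    ("Cleanup / additional", 2, 0.5),
    ("Move old cluster nodes to new cluster", 1, 0.2),
    ("Move old cluster user data to new cluster", 2, 0.5),
]


def totals(rows):
    """Return (total effort in days, total elapsed time in weeks)."""
    effort_days = sum(effort for _, effort, _ in rows)
    # Round to one decimal place to avoid floating-point noise in the sum.
    elapsed_weeks = round(sum(elapsed for _, _, elapsed in rows), 1)
    return effort_days, elapsed_weeks


effort_days, elapsed_weeks = totals(ESTIMATES)
print(f"Total effort: {effort_days} days")
print(f"Total elapsed: {elapsed_weeks} weeks")
```

With the lower bound for the last row, the sums match the table's totals of 25 days and 6.4 weeks; taking the upper bound of 3 days instead gives 26 days of effort.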


 

 

 

 
