Increase computational capacity for Crane's group.
Brian Crane

Karen

Lulu

Overall strategy

Upgrade the cluster's OS and add 2 new compute nodes. This is a very similar retrofit to the one we did with Collum's system this summer.

Broad steps

Columns: Top Level Task Description, Effort Est., Assignee. (Effort Est. and Assignee are blank for every step.)

1) Keep cluster running until cut-over noted below.

2) Pull an existing, old compute node from Lancaster's cluster and convert it into a new head node.

3) Install new OS.

4) Add a new compute node (from Crane) to this new head node.

5) Pull another existing, old compute node from Lancaster's cluster and test it mounts to new head node.

6) Test new head node with 2 new compute nodes (from Crane), and old compute node.

7) Cut-over accounts and data from old compute node to new compute node.

8) Move the rest of Lancaster's old compute nodes to the new head node.

9) Confirm all is working.

10) Convert old head node into a compute node.
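
The mount test and the final "confirm all is working" step could be partly scripted. What follows is only a minimal sketch, assuming each compute node is expected to mount shared file systems from the new head node at known paths. The paths in EXPECTED_MOUNTS and the function name check_mounts are hypothetical placeholders, not taken from the cluster's actual configuration.

```python
import os

# Hypothetical mount points (assumption): replace with the paths a compute
# node is actually expected to mount from the new head node.
EXPECTED_MOUNTS = ["/home", "/opt/shared"]


def check_mounts(paths):
    """Return a dict mapping each path to True if it is currently a mount point."""
    return {path: os.path.ismount(path) for path in paths}


if __name__ == "__main__":
    # Run on a compute node after it has been attached to the new head node.
    for path, mounted in check_mounts(EXPECTED_MOUNTS).items():
        status = "mounted" if mounted else "NOT mounted"
        print(f"{path}: {status}")
```

Running this on each node after it is moved gives a quick yes/no per path. It only checks that something is mounted at the path, not that the mount comes from the intended head node.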

 

 


Current and recent activity and decisions

8/27/13Tue: Oliver met with Brian Crane

  • Brian Crane graciously agreed to wait, allowing ChemIT to start and focus on Hoffmann's new cluster instead.
  • Crane will email Kyle Lancaster with an update. Plan is to move forward with retrofit of Lancaster's cluster (new OS), pending Kyle's approval. See above for more details. No need for interim solution from ChemIT.
  • Oliver to meet with Kyle to review steps and timing in more detail.
  • Having come close to completing Collum's cluster retrofit, Oliver estimates Lulu will be able to start in ~8 weeks. However, we can and should start working with Brian and his designates much earlier (4 weeks earlier) to spec and order hardware, so that it is ready for Lulu to work with in ~8 weeks.
  • Oliver to provide Crane a status report every 2-3 weeks, until we start.
  • Idea: Have Lulu install SBGrid on Lancaster's cluster so Crane's group can test. This might help inform the hardware spec if bottlenecks are identified using Lancaster's current compute nodes.
    • Oliver to discuss this idea with Lancaster. Then, determine when Lulu would have time to invest in this installation.
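
The timing in the notes above can be turned into approximate calendar dates. This is a rough sketch only: it assumes the "~8 weeks" is counted from the 8/27/13 meeting and that "4 weeks earlier" means 4 weeks before Lulu's estimated start. The function names and the 3-week reporting interval (the upper end of "every 2-3 weeks") are illustrative choices.

```python
from datetime import date, timedelta

MEETING = date(2013, 8, 27)   # meeting with Brian Crane
LULU_START_WEEKS = 8          # "~8 weeks" estimate (approximate)
HARDWARE_LEAD_WEEKS = 4       # assumption: spec/order begins 4 weeks before Lulu starts


def plan_dates(meeting=MEETING):
    """Return (hardware_spec_start, lulu_start) as approximate dates."""
    lulu_start = meeting + timedelta(weeks=LULU_START_WEEKS)
    hardware_start = lulu_start - timedelta(weeks=HARDWARE_LEAD_WEEKS)
    return hardware_start, lulu_start


def status_report_dates(meeting, until, interval_weeks=3):
    """List status report dates after the meeting and strictly before `until`."""
    dates = []
    current = meeting + timedelta(weeks=interval_weeks)
    while current < until:
        dates.append(current)
        current += timedelta(weeks=interval_weeks)
    return dates


if __name__ == "__main__":
    hardware_start, lulu_start = plan_dates()
    print("Start hardware spec around:", hardware_start)  # 2013-09-24
    print("Lulu starts around:", lulu_start)              # 2013-10-22
    for d in status_report_dates(MEETING, lulu_start):
        print("Status report due:", d)
```

Since both estimates in the notes are approximate, the computed dates are targets for planning, not commitments.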

Previously shared activity and ideas

From ~3/13:

1) CRCF to advise on what system would work, informed by the applications used (RedBarn and Dell; ~$5K). To be done within 2 weeks of 3/4/13.

2) CRCF to explore possible opportunities for mutual benefit, including:

  • CRCF provisions a shared head node so researchers only need to invest in compute nodes.
  • Discuss with existing cluster owners whether they'd prefer we "add" to theirs and share, rather than remain stand-alone. Kyle's cluster is an example.

Applications and other software required

Rosetta (build tools: gcc, g++, SCons)

Gaussian

ORCA - DFT Free

SBGrid

Charm-based software

Firefly - QM/MM Free

ADF

NAMD - MD Free

QSIH

TINKER

MOLCAS

GROMACS
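
Once the new OS is installed, the Rosetta build tools named in the list above (gcc, g++, SCons) could be checked on each node before attempting a build. A minimal sketch, assuming the tools are expected on the PATH under the command names gcc, g++, and scons; the function name find_tools is illustrative.

```python
import shutil

# Command names assumed for the Rosetta build tools listed above.
BUILD_TOOLS = ["gcc", "g++", "scons"]


def find_tools(tools):
    """Map each tool name to its full path, or None if it is not on the PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools(BUILD_TOOLS).items():
        print(f"{tool}: {path if path else 'NOT FOUND'}")
```

This only confirms that the commands exist. It does not check versions, which may also matter for a given Rosetta release.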

Notes

8/27/13Tue: Oliver met with Brian Crane (notes, above).

End of July 2013: Oliver met with Crane, reviewing likely options and timing. Agreed on next check-in in ~1 month.

3/4/13: First meeting, Crane and Oliver. Characterize needs, explore some alternatives, and define next steps.


Options considered

1) Add Crane nodes to Lancaster's cluster

New node hardware can't easily, if at all, be added to existing cluster, which is running an old OS.

2) Add Crane's 2 new nodes to Chem Cluster

The Chemistry Community Headnode will hopefully be ready for business in the coming months (as of 8/2013).

3) Create one (or two) stand-alone workstations, as a stop-gap measure

ChemIT could "quickly" stand up new hardware to give Crane's group access to more powerful systems. But clearly this would only be an interim solution.
