
NSF grant awarded. Thus, this is a "go" as of August 2013.

Next steps

  • Meet to review all options and confirm desired direction and expected timing.
  • Review resources. Huayun Gen has cluster management experience, including set-up.

Draft idea

  • Create a stand-alone cluster using new hardware ($25K for a minimum of 3 years of operations (to be confirmed!); thus, ~$8K/yr in hardware).
    • Uses new OS and related cluster management software.
    • Install and configure necessary applications.
    • Enable NetID-based access, if possible (limit 2-3 days for a "go/no-go" decision on this functionality)
  • Confirm old nodes can successfully be added to that new cluster.
  • Migrate users and data to new cluster.
  • Migrate old nodes to new cluster.

Unknowns

  • Time needed to install all necessary applications, many of which are new to Lulu, and then to configure, verify, and debug issues related to the new installation.
  • Whether NetID-based access will succeed. This is not a do-or-die step, so we will limit the duration of our investigation, while hoping we can make it work.

Tasks and estimated timing

| Top Level Task Description | Effort Est. | Assignee |
|---|---|---|
| Planning | | |
| Discovery/Overview mtg | 1.5 hrs | |
| Vet options and conduct needs analysis to match to hardware order | 1-2 weeks | |
| Specify exactly the systems to order within budget. Includes iterating with vendor experts. | 1 week | |
| Approval | 0 days | |
| Order & Installation | | |
| Place & Process order | 1/2 week | |
| Delivery, after order is placed at Cornell | ~3 weeks | |
| Receive order and set-up hardware in 248 Baker Lab | 1 week | |
| Build New Cluster | | |
| Get head node and 1st cluster node operational with OS and cluster management software | 3 weeks | |
| Test / Verify / Approval | 1 week | |
| Convert Old Cluster | | |
| Move user accounts and data; test, prep, and do | 1 week | |
| Move old nodes to new cluster | 1 week | |

  • Lulu becomes available ~mid-September or early Oct, as of 8/21/13.
  • See the unknowns, above, which relate to tasks that will take additional time to accomplish.
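
As a rough cross-check of the effort estimates in the task table, the per-phase and overall totals can be summed. This is only a sketch under stated assumptions: tasks run strictly one after another, the 1.5-hour overview meeting is ignored, the "~3 weeks" delivery is treated as exactly 3 weeks, and the unknowns listed above are not included. The names `PHASES`, `phase_totals`, and `overall_total` are made up for illustration.

```python
# Effort estimates from the task table, in weeks, as (low, high) ranges.
# Assumptions: tasks run sequentially; the 1.5-hour meeting is ignored;
# "~3 weeks" delivery is treated as exactly 3 weeks.
PHASES = {
    "Planning": [(1, 2), (1, 1), (0, 0)],
    "Order & Installation": [(0.5, 0.5), (3, 3), (1, 1)],
    "Build New Cluster": [(3, 3), (1, 1)],
    "Convert Old Cluster": [(1, 1), (1, 1)],
}


def phase_totals(phases):
    """Return {phase: (low_weeks, high_weeks)} by summing each phase's tasks."""
    return {
        name: (sum(lo for lo, _ in tasks), sum(hi for _, hi in tasks))
        for name, tasks in phases.items()
    }


def overall_total(phases):
    """Return the (low, high) total in weeks across all phases."""
    totals = list(phase_totals(phases).values())
    return (sum(lo for lo, _ in totals), sum(hi for _, hi in totals))


if __name__ == "__main__":
    for name, (lo, hi) in phase_totals(PHASES).items():
        print(f"{name}: {lo}-{hi} weeks")
    print("Total: {}-{} weeks".format(*overall_total(PHASES)))
```

Under these assumptions the estimates add up to roughly 12.5 to 13.5 weeks from the start of planning, before any time for the unknowns.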

Other provisioning models and related ideas

  • We can walk through rates and scenarios, as appropriate.
  • We can meet with CAC since they may be willing to do more with a commitment of $25K than is published with their $400 min. offering.
    • Brainstorming idea: Would CAC be willing to add hardware to RedCloud so that the buyer of that hardware gets a better cost and/or privileged access?

Buy cycles, on demand

Good for irregular high-performance demands, especially when there are high peaks of need and long-lasting jobs.

  • Buy cycles from CAC (RedCloud; minimum of $400 for 8585 core*hours)
    • http://www.cac.cornell.edu/RedCloud/start.aspx
    • 12 cores available at any one time on one system.
      • Can access more than one system at a time, but systems are not linked.
    • $400 (minimum) buys you 8585 core*hours
      • This comes out to ~1 core for an entire year, non-stop.
    • For 96 cores, that's $38.4K for 1 year, non-stop. (They have a max of 96 cores <http://www.it.cornell.edu/about/projects/virtual/test.cfm>.)
      • 96 = 8 nodes, each with dual 6-core procs => 8 * 12 = 96
    • Or, for $25K, that's ~536,562 core*hours.
      • $25K = $400 * 62.5 units. Each unit is 8585 core*hours, so 62.5 of them gets you 536,562.5 core*hours.
      • That comes to ~178,854 core*hours/yr for 3 years, which is roughly a 20.8-core system running non-stop each year (using the approximation above of one unit per core-year). Compare to one hardware node, which has 12 cores.
  • Determine costs, processes, and trade-offs of using another cloud service, such as:
    • Amazon (Amazon EC2?)
    • Google (Google Compute Engine?)
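
The RedCloud arithmetic above can be reproduced with a short script. This is a minimal sketch that uses only the figures quoted on this page ($400 per 8585 core*hours); the function names are hypothetical, and `HOURS_PER_YEAR` assumes a non-leap year of 8,760 hours rather than the "one unit is about one core-year" shortcut.

```python
# Figures quoted above for CAC RedCloud (as listed on this page).
UNIT_PRICE_USD = 400        # minimum purchase, one "unit"
UNIT_CORE_HOURS = 8585      # core*hours bought per unit
HOURS_PER_YEAR = 24 * 365   # 8,760; assumption: non-leap year


def core_hours_for(budget_usd):
    """Core*hours that a given budget buys at the quoted unit price."""
    return budget_usd / UNIT_PRICE_USD * UNIT_CORE_HOURS


def sustained_cores(core_hours, years):
    """Number of cores that could run non-stop for `years` on `core_hours`."""
    return core_hours / (years * HOURS_PER_YEAR)


def cost_for_cores(cores, years=1):
    """Cost using the page's shortcut of one $400 unit per core-year."""
    return cores * years * UNIT_PRICE_USD


if __name__ == "__main__":
    total = core_hours_for(25_000)
    print(f"$25K buys {total:,.1f} core*hours")
    print(f"Per year over 3 years: {total / 3:,.0f} core*hours")
    print(f"Sustained cores over 3 years: {sustained_cores(total, 3):.1f}")
    print(f"96 cores for 1 year: ${cost_for_cores(96):,}")
```

With exactly 8,760 hours per year, `sustained_cores` gives about 20.4 cores, slightly below the figure obtained from the one-unit-per-core-year shortcut, since a unit of 8585 core*hours is a little short of a full core-year.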

Host hardware at CAC rather than with ChemIT

Hosting costs at CAC cover the basics: expert initial configuration, keeping the system current, and keeping the lights on. Other services are charged hourly.

Per the above rate calculator, the rate for 9 nodes (1 head node + 8 compute nodes) would be $8,291/yr. Or, $24,873 for 3 years for this service.

At current ChemIT rates, 9 nodes would be $321.84/yr. Or, $965.52 for 3 years of service.

  • ChemIT rates are set by the CCB Computing Cmt and may change at any time. The rate for a group's single system (in a cluster or not) is $2.98/month, or $35.76/yr.
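
The two hosting figures above can be recomputed side by side. A minimal sketch, assuming the rates quoted on this page stay fixed for all 3 years (the ChemIT rate may change at any time, as noted) and that the CAC quote of $8,291/yr applies to the 9-node configuration as a whole. The constant and function names are hypothetical, and the comparison is on price only, not on what each service includes.

```python
NODES = 9                          # 1 head node + 8 compute nodes
YEARS = 3
CAC_HOSTING_USD_PER_YEAR = 8291    # quoted for the 9-node configuration
CHEMIT_CENTS_PER_SYSTEM_MONTH = 298  # $2.98/month per system, kept in cents


def cac_hosting_cost(years):
    """Total CAC hosting cost in dollars for the quoted 9-node setup."""
    return CAC_HOSTING_USD_PER_YEAR * years


def chemit_cost(nodes, years):
    """Total ChemIT cost in dollars; computed in cents to avoid float drift."""
    cents = CHEMIT_CENTS_PER_SYSTEM_MONTH * 12 * nodes * years
    return cents / 100


if __name__ == "__main__":
    cac = cac_hosting_cost(YEARS)
    chemit = chemit_cost(NODES, YEARS)
    print(f"CAC hosting, {YEARS} years:  ${cac:,.2f}")
    print(f"ChemIT, {YEARS} years:       ${chemit:,.2f}")
    print(f"ChemIT per year:          ${chemit_cost(NODES, 1):,.2f}")
    print(f"CAC / ChemIT cost ratio:  {cac / chemit:.1f}x")
```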

Table, related to our options

| Consideration (below) / Option (right) | ChemIT | CAC: RedCloud | CAC: Hosting | Amazon (EC2?) or Google (Compute Engine?) | Other ideas? |
|---|---|---|---|---|---|
| Hardware costs | $25K | - | $25K | - | |
| Hardware support | Yes. | - | Yes. | - | |
| OS install and configuration | Yes. CentOS 6.4 | | Yes. CentOS 6.4 | | |
| Cluster and queuing management | Yes. Warewulf, with options | - | Yes. ROCKS, no options. | - | |
| Research software install and configuration | Yes | No | Yes; additional cost | No | |
| Application debugging and optimization support | Not usually. Available from CAC, at additional cost? | Yes; additional cost | Yes; additional cost | No. Available from CAC, at additional cost? | |
