NSF grant awarded, so this is a "go", August 2013
Next steps
- Meet to review all options and confirm desired direction and expected timing.
- Review resources. Xiao-Qiu Ye has cluster experience, including set-up.
Draft idea
High-level overview
- Create a stand-alone cluster using new hardware ($25K).
- Use a new OS and related cluster management software.
- Install and configure necessary applications.
- Enable NetID-based access, if possible (limit of 2-3 days to reach a "go/no-go" decision on this functionality).
- Confirm old nodes can successfully be added to that new cluster.
- Migrate users to new cluster.
- Migrate old nodes to new cluster.
Unknowns
- Time for install of all necessary applications, many of which are new to Lulu.
- Whether NetID-based access will succeed. Note that this is not a do-or-die step.
Tasks and estimated timing
Vet options and conduct needs analysis to match to hardware order
- 1-2 weeks
Specify exactly the systems to order within budget. Includes iterating with vendor experts.
- 1 week
Place order
- 1/2 week
Delivery, after order is placed at Cornell
- ~3 weeks
Receiving order and set-up in 248 Baker Lab
- 1 week
Get head node and 1st cluster node operational with OS and cluster management software
- 3 weeks
- Lulu becomes available ~mid-September?
See unknowns, above, which relate to tasks that will take additional time to accomplish.
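As a rough check on the schedule, the task estimates above can be summed. This is only a sketch: it assumes the tasks run strictly one after another with no overlap, treats the "~3 weeks" delivery estimate as exactly 3 weeks, and leaves out the unknowns. The task labels and the `total_weeks` helper are illustrative names, not part of the plan.

```python
# (task, min_weeks, max_weeks), copied from the estimates listed above.
# Assumption: tasks are strictly sequential and the unknowns are excluded.
tasks = [
    ("Vet options and conduct needs analysis", 1.0, 2.0),
    ("Specify systems to order", 1.0, 1.0),
    ("Place order", 0.5, 0.5),
    ("Delivery (approximate)", 3.0, 3.0),
    ("Receiving order and set-up", 1.0, 1.0),
    ("Head node and 1st cluster node operational", 3.0, 3.0),
]


def total_weeks(task_list):
    """Return (low, high) total elapsed weeks for a list of estimates."""
    low = sum(t[1] for t in task_list)
    high = sum(t[2] for t in task_list)
    return low, high


low, high = total_weeks(tasks)
print(f"Estimated elapsed time: {low} to {high} weeks")
```

Under those assumptions the total comes to 9.5 to 10.5 weeks from the start of the first task, before any time for the unknowns is added.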
Other ideas
- Rent from CAC.
- Walk through rates and scenarios.
- 12 cores available at any one time.
- $400 buys you 8585 core*hours, which is ~1 core for an entire year, non-stop.
- For 96 cores, that's $38.4K for 1 year, non-stop.
- 96 cores = 8 nodes, each with dual 6-core processors => 8 * 12 = 96
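The rental arithmetic above can be reproduced with a short sketch. It uses only the figures quoted in these notes ($400 per 8585 core-hours, 8 nodes with dual 6-core processors). Two things are assumptions rather than confirmed CAC terms: that the price scales linearly with core-hours, and that a year is 365 days. The function names are illustrative.

```python
# Figures quoted in the notes above.
PRICE_PER_BLOCK_USD = 400
CORE_HOURS_PER_BLOCK = 8585
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def total_cores(nodes, sockets_per_node, cores_per_socket):
    """Core count for a homogeneous cluster."""
    return nodes * sockets_per_node * cores_per_socket


def approx_annual_cost(cores):
    """The notes' shortcut: one $400 block is roughly one core-year."""
    return cores * PRICE_PER_BLOCK_USD


def prorated_annual_cost(cores):
    """Assumption: price scales linearly with core-hours consumed."""
    core_hours = cores * HOURS_PER_YEAR
    return core_hours / CORE_HOURS_PER_BLOCK * PRICE_PER_BLOCK_USD


cores = total_cores(nodes=8, sockets_per_node=2, cores_per_socket=6)
print(f"Cores: {cores}")
print(f"Approximate annual cost: ${approx_annual_cost(cores):,}")
print(f"Prorated annual cost: ${prorated_annual_cost(cores):,.0f}")
```

The shortcut gives the $38.4K figure quoted above. The prorated figure is slightly higher because 8585 core-hours falls a little short of the 8760 hours in a full year.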