Excerpt |
---|
Eldor must find a new physical home by Feb. 1st. Priority is Liang's project, which in the short term is not likely to require Eldor. |
Project lead:
...
Lulu Zhu <cz67@cornell.edu>
Team: Zhichun Liang, Peter Borbat, Lulu Zhu, and Oliver Habicht.
Goal
Run humongous jobs, and run them hundreds of times. We expect that owning our equipment is the most cost-effective approach, but will reality-check that against other options such as CAC's RedCloud service (the options are not mutually exclusive).
To buy us time, move Eldor under CRCF's management for ~6 months. This requires a new OS and file system service to replace CCMR's AFS.
Note
Required changes to Eldor are linked to moving off of CCMR's AFS. Hence the move from AFS is explained here, as a sub-project of the Eldor move project.
Steps and draft timeline
Ensure CIFS (the replacement for AFS) is working for Liang. Includes testing permissions, including writes.
Draft start date: May 24
Duration: About two days.
Make sure everyone has alternative ways to access AFS.
Turn off Eldor
Tentative date: May 29.
Turn on Eldor, connected to CIFS
Tentative date: About a week after Eldor is turned off.
Ensure Eldor is working for Liang. Includes testing ssh, software.
Draft start date: June 6.
Duration: About two days.
Move Liang's data from AFS to CIFS. Ensure CIFS is working for Liang.
Draft start date: June 10.
Duration: About two days.
Move other users' data from AFS to CIFS one by one.
Draft start date: Week of June 10. As CIFS is added for each user and that user's data is moved to CIFS, turn off AFS for that user.
Duration: About one to two weeks. Do this quickly to shorten the period in which some folks are on AFS and others are on CIFS.
Turn off AFS (waiting this late allows us to roll back if Eldor must be reverted)
...
Tentative date: Soon after all accounts have been successfully moved to CIFS and their AFS access has thus been turned off.
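The permission tests and data moves in the steps above could be spot-checked with a small script. What follows is a minimal sketch, not the procedure the team actually used: it assumes both the old AFS directory and the new CIFS share are mounted as ordinary directories on the machine running it, and the function names (`can_write`, `file_digest`, `compare_trees`) are hypothetical helpers made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def can_write(directory):
    """Return True if a file can be created, written, and read back in `directory`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".write_test_") as handle:
            handle.write(b"ok")
            handle.flush()
            handle.seek(0)
            return handle.read() == b"ok"
    except OSError:
        # Covers a missing directory, permission denied, read-only mounts, etc.
        return False


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, dest_root):
    """List (relative_path, problem) pairs for files missing or different under dest_root."""
    source_root, dest_root = Path(source_root), Path(dest_root)
    problems = []
    for source_file in sorted(p for p in source_root.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_root)
        dest_file = dest_root / relative
        if not dest_file.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif file_digest(source_file) != file_digest(dest_file):
            problems.append((relative.as_posix(), "differs"))
    return problems
```

Under these assumptions, one would call `can_write` on a user's CIFS directory before moving data, then call `compare_trees` with the AFS directory as the source and the CIFS directory as the destination after the copy. An empty result suggests the copy is complete, which is a reasonable precondition for turning off AFS for that user. The check compares file contents only; it does not compare ownership or permission bits.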
Status
Must identify all current users of Eldor
Required to coordinate the downtime required to move and reinstall the OS.
Lulu is coordinating the downtime dates and downtime duration with Liang.
4/19 Discussions Oliver had:
- Alex is using Eldor regularly. He simply needs to be notified of the downtime and its expected duration. No other coordination expected, per him.
- Peter Borbat is not currently using Eldor, nor will he be using it in the short term. No need to keep him informed, per him.
...
4/17/13 (Wed): Oliver and Barry looked at Eldor's partitions. The /a and /b partitions are each 1.7TB in size. They looked like they had no data beyond that assigned as overhead by the OS.
...
- Oliver's understanding, based on prior conversation with Barry: There are two high-end GPUs and one low-end one. These would be removed. The on-board video remains, of course.
To do's
...
- Lulu and Liang, with Barry, work out timing for when server gets moved from CCMR to ChemIT, and the necessary steps.
- Lulu and Liang work out downtime required for Lulu to replace OS with an OS ChemIT can support.
- ChemIT staff propose file share service to be used by Eldor post-migration, with pricing estimates.
- File share system goes through a technical review by Barry, Liang, Peter (and others?) so they can recommend it to Jack.
- Jack approves the file share service his staff recommend.
- Migration done per schedule created in previous to-do's.
- Server is moved to ChemIT.
- Lulu removes the GPUs and gives them to Peter.
- Lulu installs OS.
- Lulu and Liang test server post OS change.
- Lulu and Liang to work out service windows for Lulu to do necessary periodic server administration tasks.
- Liang confirms server is operational.
...
Project notes and efforts
5/22/13: Lulu set up CIT SFS CIFS service, in preparation for Eldor's move to CRCF.
As of 5/22/13: Lulu worked out Linux/CU AD integration.
As of 5/1/13: Lulu is working on Linux/CU AD integration in preparation for Eldor's move to CRCF.
4/3/13: Oliver spoke with Liang and Peter. Liang and Peter approved removing the (computational, GPU) video cards before Lulu upgrades the OS.
...