Excerpt |
---|
Add true backup functionality to the existing cluster. Discussion started Oct. 2013. |
Challenges to doing backups: The problem to solve
The amount of data and the number of files can put heavy demand on the head-node, and backups may take too long
- Will the backup put too much demand on the computer, competing with the computer's primary research purpose?
- Will the backups take too long?
System uses ~800 cores, with 30 programs (per Yi He).
The number of files
- Hundreds of millions of small files, along with "normal" files.
The amount of data
- 3.1TB of data (research and users), on a 6TB partition
- 77GB of OS data, on a 160GB partition
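To get a feel for whether backups might "take too long", a rough estimate can combine bulk transfer time with a fixed overhead per file. The sketch below is only illustrative: the data sizes come from the figures above, but the file count (200 million), the sustained throughput (100 MB/s), and the per-file overhead (1 ms) are assumptions, not measurements from this cluster.

```python
# Rough estimate of how long one full backup pass might take.
# Throughput and per-file overhead are ASSUMED values for illustration;
# they should be replaced with numbers measured on the actual system.

TB = 10**12
GB = 10**9


def estimate_backup_hours(total_bytes, file_count, bytes_per_sec, secs_per_file):
    """Return estimated hours: bulk transfer time plus per-file overhead."""
    transfer_secs = total_bytes / bytes_per_sec
    overhead_secs = file_count * secs_per_file
    return (transfer_secs + overhead_secs) / 3600


# From the notes: 3.1TB of research/user data plus 77GB of OS data.
data_bytes = 3.1 * TB + 77 * GB

# Hypothetical inputs: 200 million files, 100 MB/s, 1 ms per file.
hours = estimate_backup_hours(data_bytes, 200_000_000, 100 * 10**6, 0.001)
print(f"Estimated full pass: {hours:.1f} hours")
```

With these assumed inputs, the per-file overhead accounts for far more of the total than the data transfer itself, which is why the number of files matters so much here.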
Cost is an issue since backups were not originally budgeted.
Key take-home
- Anything that can reduce the number of files from "hundreds of millions" will help tremendously.
- Reducing the total amount to back up can help keep costs down and speed up restoration efforts.
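One way to find where the small files are concentrated is to count files under each top-level directory of a partition. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and any path passed in (for example a users or research directory) is up to whoever runs it. Note that walking hundreds of millions of files is itself slow and puts load on the head-node, so a survey like this is best run off-hours.

```python
import os
from collections import Counter


def count_files_by_top_dir(root):
    """Count non-directory entries under each immediate subdirectory of root.

    Files directly in root are counted under ".". Symbolic links to
    directories are not followed (the os.walk default).
    """
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts


def largest_contributors(counts, n=10):
    """Return the n top-level directories holding the most files."""
    return counts.most_common(n)
```

Directories that dominate the count are candidates for cleanup, exclusion from backup, or bundling into archive files before backup.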
Objectives
Capability to restore the system and data to all-new hard drives, if necessary
Deal with various failures, such as:
- file system corruption#
- hard disk failures
- RAID controller card failures
- motherboard failure
- other server hardware or sub-system failure
- fire damage of room*
- water damage of room*
- theft*
- malicious incursion into the system*#
* Requires an off-site copy to be effective.
# May require versioning to get prior copies of since-corrupted data.
NOTE: If hardware other than hard disks must be replaced, the OS may need to be modified.
Ability to have some versioning, if not too expensive an add-on.
Options
Option 1: In-room, off-box copy
HDs in a dedicated computing box.
Option 2: Out of room
HDs in a dedicated computing box.
Hosted file service.
EZ-Backup.
- $850-1,650/yr, depending on level of file compression.
- Software may not have time to process all of the files during each backup.
Ideas to consider for above options
Software to sync data, ideally with versioning
EZ-Backup
- See ChemIT's backup page, with links and general costs.
- See ChemIT's calculation estimating specific costs (handout, from Excel file)
CrashPlan
Compare
How CrashPlan backup works (technical)
Real-time Backup Version Retention
Backup Frequency and Version Retention
Backing Up Very Large File Selections
Software that backs up changes to the partition, rather than looking at individual files per se.
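To illustrate the general idea of block-level change detection (as opposed to per-file scanning), here is a minimal sketch that hashes fixed-size blocks of an image and reports which blocks differ between two snapshots. It is not a description of how any particular product works; real tools may track changed blocks in other ways. The 4096-byte block size and the function names are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_digests(path, block_size=BLOCK_SIZE):
    """Return a list of SHA-256 digests, one per fixed-size block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).digest())
    return digests


def changed_blocks(old_digests, new_digests):
    """Return indices of blocks that differ, or exist in only one snapshot."""
    longest = max(len(old_digests), len(new_digests))
    changed = []
    for i in range(longest):
        old = old_digests[i] if i < len(old_digests) else None
        new = new_digests[i] if i < len(new_digests) else None
        if old != new:
            changed.append(i)
    return changed
```

The appeal for this cluster is that the work scales with the size of the partition and the amount of change, not with the number of files.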
One possibility Oliver found:
CA ARCserve D2D I2 Backups
12-minute technical review of D2D
PDF Brief
Pricing
We'd like to know the educational price (if any), but here are the upper-bound prices:
http://shop.arcserve.com/Products/D2D/CA-ARCSERVE-D2D-FOR-LINUX-Product-plus-1-Year-Maintenance
- (List) Price: $732.00
http://shop.arcserve.com/Products/D2D/CA-ARCSERVE-D2D-FOR-LINUX-Product-plus-3-Year-Maintenance
- (List) Price: $976.00
Oliver can't tell whether one pays more for additional clients. The other OS options specify a number of seats, so the above implies to Oliver that a single price gets you unlimited clients(?) per server.
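For a quick side-by-side, the list prices above can be annualized and set next to the EZ-Backup range quoted earlier. This is a rough sketch only: it assumes the 3-year price covers exactly three years, it ignores any renewal or educational pricing (both unknown), and it does not include the cost of disk storage that a D2D setup would presumably also need.

```python
def annual_cost(total_price, years):
    """Spread a one-time price evenly over the number of years it covers."""
    return total_price / years


# List prices from the ARCserve shop pages cited above.
d2d_1yr = annual_cost(732.00, 1)
d2d_3yr = annual_cost(976.00, 3)

# EZ-Backup yearly range quoted under Option 2.
ez_backup_low, ez_backup_high = 850.00, 1650.00

print(f"D2D, 1-year maintenance: ${d2d_1yr:,.2f}/yr")
print(f"D2D, 3-year maintenance: ${d2d_3yr:,.2f}/yr")
print(f"EZ-Backup: ${ez_backup_low:,.2f} to ${ez_backup_high:,.2f}/yr")
```

The comparison is not like-for-like, since EZ-Backup is a hosted service that includes off-site storage, while D2D is software only.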