Introduction
AWS S3 (including S3 Glacier) can provide very cheap object storage for some on-premises backup and archiving use cases, but be careful. There are some pitfalls:
- S3 is an object store, not a file system. You will need to make sure that the tools you use to accomplish backup/archive are S3-savvy.
- Since S3 is not a file system, some features you might expect are missing or can be costly to replicate.
- For example, retrieving the MD5 hash or creation date of an S3 object takes an API request, and API requests are billed. Each request costs very little, but the total can add up when dealing with hundreds of thousands or millions of objects.
- S3 storage can be very cheap indeed, but you need to be careful that the tools you are using don't end up costing you a lot for S3 API operations used for checking object hashes and collecting other metadata from objects.
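To make the API-cost pitfall concrete, here is a rough back-of-the-envelope sketch. The per-request prices are assumptions for illustration only (they resemble published S3 Standard request pricing, but check the AWS Pricing Calculator for your region and storage class), and the function names are hypothetical. The sketch compares checking metadata with one HEAD request per object against enumerating the same objects with LIST requests, which return up to 1,000 keys each.

```python
import math

# ASSUMED illustrative prices in USD per 1,000 requests.
# These are not authoritative; confirm against current AWS pricing.
ASSUMED_PRICE_PER_1000 = {
    "HEAD": 0.0004,  # assumption: GET/HEAD-class requests
    "LIST": 0.005,   # assumption: PUT/COPY/POST/LIST-class requests
}

LIST_PAGE_SIZE = 1000  # a single S3 LIST request returns at most 1,000 keys


def request_cost(num_requests: int, kind: str) -> float:
    """Cost in USD of num_requests requests of the given kind."""
    return num_requests / 1000 * ASSUMED_PRICE_PER_1000[kind]


def head_per_object_cost(num_objects: int, runs: int) -> float:
    """Cost of issuing one HEAD request per object on every sync run."""
    return request_cost(num_objects * runs, "HEAD")


def list_only_cost(num_objects: int, runs: int) -> float:
    """Cost of enumerating all objects with paginated LIST requests on every run."""
    pages = math.ceil(num_objects / LIST_PAGE_SIZE)
    return request_cost(pages * runs, "LIST")


if __name__ == "__main__":
    objects, runs = 1_000_000, 30  # e.g. one million objects, synced daily for a month
    print(f"HEAD per object: ${head_per_object_cost(objects, runs):.2f}")
    print(f"LIST only:       ${list_only_cost(objects, runs):.2f}")
```

Under these assumed prices, a daily sync of one million objects costs on the order of ten dollars a month in HEAD requests alone, while listing the same objects costs well under a dollar. The absolute numbers matter less than the ratio: tools that touch every object individually scale their request costs with object count.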
General Approaches
There are many ways to get on-premises files into S3 or other AWS services. Picking the right one depends on your use case, your budget, your ability or desire to tinker and monitor costs, and your willingness to deploy additional on-premises resources.
rclone
rclone is a CLI tool in the same vein as rsync, but it is savvy about cloud object stores like S3.
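As a minimal sketch of what an S3-aware invocation might look like, the helper below assembles an `rclone sync` command line. The flags used (`--fast-list`, `--size-only`, `--checksum`, `--update`, `--use-server-modtime`, `--dry-run`) are existing rclone options; rclone's S3 documentation describes the comparison flags as ways to avoid the per-object HEAD request otherwise needed to read modification times. The helper function, the remote name `s3remote`, the bucket, and the paths are all hypothetical, and which comparison mode is appropriate depends on your data.

```python
import shlex


def build_rclone_sync(source: str, dest: str,
                      compare: str = "size-only",
                      dry_run: bool = True) -> list[str]:
    """Build an rclone sync command as an argument list (hypothetical helper).

    compare selects how rclone decides whether a file has changed:
      "size-only"      -> --size-only
      "checksum"       -> --checksum
      "server-modtime" -> --update --use-server-modtime
    """
    # --fast-list trades memory for fewer listing requests on bucket-based remotes.
    cmd = ["rclone", "sync", source, dest, "--fast-list"]
    if compare == "size-only":
        cmd.append("--size-only")
    elif compare == "checksum":
        cmd.append("--checksum")
    elif compare == "server-modtime":
        cmd += ["--update", "--use-server-modtime"]
    else:
        raise ValueError(f"unknown compare mode: {compare!r}")
    if dry_run:
        # Report what would be transferred or deleted without changing anything.
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # "s3remote" and "example-bucket" are placeholder names, not real resources.
    cmd = build_rclone_sync("/data/archive", "s3remote:example-bucket/archive")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Defaulting to `--dry-run` is a deliberate choice here: `rclone sync` deletes destination files that are missing from the source, so it is worth reviewing the planned changes before running it for real.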
True Backup Software Using S3 for Storage
Many backup software solutions can use S3 for backend storage. An example of this for smaller-scale deployments is MSP360 Managed Backup (formerly CloudBerry Backup).
AWS Tools and Services
AWS has many tools and services that make it easier to move or sync data from on-premises sources to AWS. These services can be a great solution if they do exactly what you need and you have the budget for them. Tools in this category include:
Anti-Patterns
Data Already in AWS
Don't try to roll your own backup/archive solution if your data is already in AWS. Use built-in AWS services and features:
- AWS Backup
- EBS Snapshots
- RDS Snapshots
- S3 Replication
- ...
Resources
Internal
- SFTP and Cloud Object Store Clients - OS X and Windows
- S3 Sync Analysis – Analyzes S3 API-costs of various rclone settings
External
- AWS Pricing Calculator
- rclone
- S3P - Massively Parallel S3 Copying
- https://wasabi.com/ – Provides S3-compatible storage
- Cornell has a contract with Wasabi, and we expect to roll out access for Cornell units under that contract in the future (no specific timeline has been set).