...

  • S3 is an object store, not a file system. You will need to make sure that the tools you use to accomplish backup/archive are S3-savvy. Since S3 is not a file system, some features you might expect are missing or can be costly to replicate.
  • S3 storage can be very cheap indeed, but you need to be careful that the tools you are using don't end up costing you a lot for S3 API operations used for checking object hashes and collecting other metadata from objects.
    • E.g., it costs money just to get the MD5 hash or creation date for an S3 object. It's not much money, but it can add up when dealing with hundreds of thousands or millions of objects.
  • Remember that AWS S3 isn't the only cloud storage option. Azure, Google, and Wasabi are alternatives.

 If you aren't sure that AWS S3 is right for you, this Cornell tool can show other alternatives: https://finder.research.cornell.edu/storage
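
To make the earlier point about S3 API operation costs concrete, here is a minimal arithmetic sketch. The function name and the per-1,000-request price are assumptions for illustration only; check current pricing for your region and storage class before relying on any number here.

```python
def metadata_request_cost(num_objects, price_per_1000_requests, requests_per_object=1):
    """Estimate the cost of one metadata pass (e.g., one HEAD request per object)."""
    total_requests = num_objects * requests_per_object
    return total_requests / 1000 * price_per_1000_requests


# ASSUMPTION: placeholder price in USD per 1,000 requests, not an authoritative rate.
ASSUMED_PRICE_PER_1000 = 0.0004

for n in (100_000, 1_000_000, 50_000_000):
    cost = metadata_request_cost(n, ASSUMED_PRICE_PER_1000)
    print(f"{n:>12,} objects: ${cost:,.2f} per full metadata scan")
```

A single scan is cheap, but a tool that re-checks every object on every run multiplies this figure by the number of runs per month.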

Framing Your Use Case

Here are some questions that may be valuable to answer when thinking about backup and archiving:

  • What do you want to restore from your backup or archive? Specific files? All files as of a specific date? 
  • How fast do you need it restored? I.e., what is your Recovery Time Objective (RTO)?
  • What is your Recovery Point Objective (RPO)? I.e., how far out-of-date can objects be when restored?
    • E.g., a server RPO may be 24 hours, meaning that it's OK for restored files to be as much as 24 hours old, but no older.
  • Should different versions of backed-up/archive objects be kept? Or, do you always want the latest version?
  • How often do you envision having to restore data?
    • Some services charge considerably more when you actually need to restore data, especially on short timeframes.
  • What are the basic metrics of the data in scope for your use case?
    • Total cumulative size of target data?
    • Total number of target files/objects?
    • Total number of target files/objects < 128KB?
      • Some services handle smaller objects differently than larger objects.
    • Estimated number of target files/objects deleted before 90 days?
      • Some services require a minimum object lifetime and will charge you for storing the object for the entirety of that period, even if the object is deleted before reaching that age.
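
The basic metrics in the list above can be gathered from a local directory tree with a short script. This is a sketch under stated assumptions: the names `collect_metrics`, `DataMetrics`, and `billable_days` are hypothetical, the 128 KB threshold and 90-day minimum come from the questions above, and real services may define their thresholds and minimums differently.

```python
import os
from dataclasses import dataclass

SMALL_OBJECT_THRESHOLD = 128 * 1024  # 128 KB, taken from the question above


@dataclass
class DataMetrics:
    total_bytes: int = 0
    total_files: int = 0
    small_files: int = 0  # files strictly smaller than the threshold


def collect_metrics(root, small_threshold=SMALL_OBJECT_THRESHOLD):
    """Walk a directory tree and tally size and count metrics."""
    metrics = DataMetrics()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so targets are not double counted
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; leave it out
            metrics.total_bytes += size
            metrics.total_files += 1
            if size < small_threshold:
                metrics.small_files += 1
    return metrics


def billable_days(actual_days_stored, minimum_days=90):
    """Days charged when a service enforces a minimum object lifetime.

    ASSUMPTION: the service bills at least `minimum_days` per object.
    """
    return max(actual_days_stored, minimum_days)
```

Calling `collect_metrics` on the directory you plan to back up gives the totals to plug into a pricing calculator, and `billable_days(30)` shows that an object deleted after 30 days would still be charged for 90 under the assumed minimum.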

...

  • The AWS CLI includes a basic sync command (aws s3 sync).
  • rclone is a CLI tool in the same vein as rsync, but it is savvy about cloud object stores like S3.
  • s3cmd is another third-party option.
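
As a rough illustration of how the first two tools in the list above are invoked, the sketch below builds dry-run command lines for each. The helper names, bucket name, paths, and rclone remote name are hypothetical; the rclone remote is assumed to have been set up beforehand with `rclone config`. Nothing is executed here, so the commands can be reviewed before running them.

```python
import shlex


def aws_sync_command(source_dir, bucket, prefix="", dry_run=True):
    """Build an `aws s3 sync` invocation from a local directory to a bucket."""
    dest = f"s3://{bucket}/{prefix}".rstrip("/")
    cmd = ["aws", "s3", "sync", source_dir, dest]
    if dry_run:
        cmd.append("--dryrun")  # report what would be copied without copying
    return cmd


def rclone_sync_command(source_dir, remote, bucket, prefix="", dry_run=True):
    """Build an `rclone sync` invocation using a preconfigured remote."""
    dest = f"{remote}:{bucket}/{prefix}".rstrip("/")
    cmd = ["rclone", "sync", source_dir, dest]
    if dry_run:
        cmd.append("--dry-run")  # rclone spells the flag with a hyphen
    return cmd


# Hypothetical paths and names, for illustration only.
print(shlex.join(aws_sync_command("/data/projects", "example-backup-bucket", "projects")))
print(shlex.join(rclone_sync_command("/data/projects", "s3remote", "example-backup-bucket", "projects")))
```

Note that the two tools differ in behavior: `rclone sync` makes the destination match the source, which includes deleting destination files that no longer exist locally, while `aws s3 sync` only deletes at the destination when `--delete` is passed. Starting with a dry run is a cheap way to catch surprises.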

True Backup Software Using S3 for Storage

...

AWS offers many tools and services that make it easier to move or sync data from on-premises sources to AWS. These services can be a great solution if they do exactly what you need and you have the budget for them. Tools in this category include:

...

Anti-Patterns

Data Already in AWS

Don't roll your own backup/archive solution if your data is already in AWS. Use built-in AWS services and features:

...

...