I realize it’s been around for at least two years, but only recently have I begun spending some time with Amazon’s Simple Storage Service (S3). Beyond just enjoying the exploration of something new, I’m trying to figure out whether it makes sense to incorporate S3 into our day-to-day IT operations (e.g., as another destination for off-site server backups) and perhaps even include it as part of a more ambitious digital preservation strategy.
I’m not looking for a single “best place to put all my digital assets” preservation service (e.g., OCLC’s Digital Archive). After all, if you want to preserve something that can be infinitely and non-destructively cloned, you’ll have to convince me that reliance on a single copy stored somewhere else is a compelling strategy.
Oh I know what you’re thinking…
Yes, I do appreciate that long-term digital preservation involves more than just the storage of bitstreams and that it is concerned with risks and issues that can extend beyond the simple failures of man or machine. I get that. But regardless of how many “archival” services you layer on, there’s no denying that the actual bitstreams are what’s fundamental and in their absence the other services won’t much matter. All of which means right now I’m concentrating on preserving the bitstreams and I’ll devote more time to worrying about the higher-order issues as we get closer to the future (and further from the time that just having the bits around is pretty much enough).
A quick aside: Peter Murray offered an interesting apples-to-oranges comparison of OCLC’s Digital Archive service and S3 last May that is worth reading. His post dug a bit deeper into issues raised by Barbara Quint’s “OCLC Introduces High-Priced Digital Archive Service” post on Information Today, Inc. Main point: there’s more to preservation than simple file storage, so you can’t say OCLC is overpriced just because they charge $7.50 per gigabyte per year for their service while Amazon S3 charges only $1.80. Well, maybe…
Using S3
I actually began using Amazon’s S3 service several months ago without even realizing it, thanks to Dropbox. If you haven’t tried Dropbox, let me recommend it as a simple cross-platform (Win, Mac & Linux) way to sync files across multiple machines. Dropbox offers a free 2GB account and the option to upgrade to a 50GB account (which costs $99 per year). Dropbox stores your files on Amazon’s S3 service (but under its own account, not one Amazon has assigned to you).
Thus far I’ve tried two methods for accessing my personal S3 account:
JungleDisk – Cross platform
I’ve had mixed results with JungleDisk. Like Dropbox, JungleDisk has clients for Mac, Windows and Linux, but I’ve had more than a little trouble getting it to function reliably on my EeePC running Ubuntu (by contrast, Dropbox works like a charm on the EeePC). Unlike Dropbox, you’ll need to set up your own S3 account before using the JungleDisk client. A “lifetime” license to JungleDisk costs $20.00 (US).
Transmit – Mac only
Panic Software’s Transmit is a great S3 client that also does FTP, SFTP and WebDAV. If you’re moving a lot of data to S3, an FTP-like program makes more sense than a locally cached desktop drive (e.g., Dropbox). I’ve also set up a section of my S3 space for hosting web content (e.g., the Transmit icon above is being served up by S3) and find that Transmit works really well for managing this sort of content.
Here are a few other client options:
S3Hub (Mac Only)
CloudBerry Explorer (Windows)
S3Fox (Firefox extension)
If you don’t already have an S3 account, you can sign up with Amazon in just a few minutes. You’re not charged anything until you actually begin using your account for storage. Put a gigabyte there and you’ll be charged $1.80 per year (15 cents per month) to store it, plus transfer and request charges that depend on how much activity the account sees:
United States
Storage
* $0.150 per GB – first 50 TB / month of storage used
* $0.140 per GB – next 50 TB / month of storage used
* $0.130 per GB – next 400 TB / month of storage used
* $0.120 per GB – storage used / month over 500 TB
Data Transfer
* $0.100 per GB – all data transfer in
* $0.170 per GB – first 10 TB / month data transfer out
* $0.130 per GB – next 40 TB / month data transfer out
* $0.110 per GB – next 100 TB / month data transfer out
* $0.100 per GB – data transfer out / month over 150 TB
Requests
* $0.01 per 1,000 PUT, COPY, POST, or LIST requests
* $0.01 per 10,000 GET and all other requests (no charge for delete requests)
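To get a feel for what a real backup job might cost, here is a rough Python sketch that applies the tiers listed above. It’s only an estimate under a few assumptions of mine: the function names are made up for illustration, I’m treating a TB as 1,024 GB (the price list doesn’t say), and the rates are hard-coded from the list above, so they will go stale whenever Amazon changes its pricing.

```python
# Rough S3 monthly cost estimator using the US rates listed above (USD).
# Assumption: 1 TB = 1,024 GB; the price list does not specify.
GB_PER_TB = 1024

# Each tier is (tier size in GB, price per GB); None means "no upper limit".
STORAGE_TIERS = [
    (50 * GB_PER_TB, 0.150),
    (50 * GB_PER_TB, 0.140),
    (400 * GB_PER_TB, 0.130),
    (None, 0.120),
]
TRANSFER_OUT_TIERS = [
    (10 * GB_PER_TB, 0.170),
    (40 * GB_PER_TB, 0.130),
    (100 * GB_PER_TB, 0.110),
    (None, 0.100),
]
TRANSFER_IN_RATE = 0.100  # flat rate per GB


def tiered_cost(gb, tiers):
    """Charge each slice of usage at the rate of the tier it falls into."""
    cost = 0.0
    remaining = gb
    for size, rate in tiers:
        if remaining <= 0:
            break
        portion = remaining if size is None else min(remaining, size)
        cost += portion * rate
        remaining -= portion
    return cost


def request_cost(puts=0, gets=0):
    """PUT/COPY/POST/LIST cost $0.01 per 1,000; GET and others $0.01 per 10,000."""
    return puts / 1000 * 0.01 + gets / 10000 * 0.01


def monthly_cost(stored_gb, gb_in=0, gb_out=0, puts=0, gets=0):
    """Estimated bill for one month of storage, transfer, and requests."""
    return (
        tiered_cost(stored_gb, STORAGE_TIERS)
        + gb_in * TRANSFER_IN_RATE
        + tiered_cost(gb_out, TRANSFER_OUT_TIERS)
        + request_cost(puts, gets)
    )


if __name__ == "__main__":
    # One gigabyte sitting idle for a year: the $1.80 figure quoted above.
    print(f"${monthly_cost(1) * 12:.2f}")  # prints $1.80
```

Calling something like `monthly_cost(200, gb_in=200, puts=5000)` would estimate the first month of a 200 GB backup, upload charges included; later months would mostly be the storage line on its own.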