I’ve been learning a few things about backing up my home network, and I’ve pretty much decided I’m not that interested in dealing with this much longer. All in all, I have a couple of terabytes of digital content: videos, music, photos, books, and miscellaneous other files that have seemed worth keeping. I’m slowly coming to the conclusion that there’s some portion of this content that I just don’t care about anymore and quite a bit of it that I can get online somewhere else on demand. So, why hold onto and maintain this stuff myself? I took the plunge a while back and ripped all my DVDs and CDs to high-resolution digital files and got rid of the hard copies (I know, I wasn’t supposed to do that, right?), but I’m not quite there yet with the digital content.

So, I’ve got all these files, and I set up a Synology DiskStation a while back as a home NAS for all this stuff. That platform has a whole lot of out-of-the-box, plug-and-play features that have been great because I don’t have to think about anything. I serve videos out to watch anywhere in the house on various devices. I stream my music out to a couple of places as well via AirPort Express. I manage my eBooks via Calibre, build out a catalog with Calibre2OPDS, drop it on the NAS, and sync it out to a personal web site only I can access. I also pushed most of my music up to Google Play a while back to see what that’s like. So, I have these data in several different places.

I did get a little worried at one point about losing everything, so when I saw the nice little Amazon Glacier app on the DiskStation, I thought I might as well give it a try. Backing everything up was super simple, and it was only costing me a few bucks a month. Then I had hard disks start crashing because I was cheap and only bought desktop-quality drives instead of the NAS drives that can spin longer. One drive wasn’t a big deal; my RAID config took care of that. I replaced it but didn’t do what I should have done, which was set up a hot spare in the configuration, so when I had two more disks bomb on me at the same time, the whole system went Tango Uniform.

Glacier backup to the rescue, thought I. Well, it wasn’t quite that easy. AWS Glacier is kind of like the Hotel California: it’s really easy to check in, but it’s dang expensive to check out. Costs go way up for retrieval, which I hadn’t bothered to check into when I set it up. I missed the little config deal in the DiskStation app that lets you decide how much you’re willing to spend on getting your stuff out of the vaults. Fortunately, I have expenses categorized in Mint such that I got an alert that I was going way over budget on “Internet Stuff.” I went in, figured it out, and shut down the process. I ended up using the retrieval to get most of my 50,000+ photos, and then pulled most everything else from some other place. I dug around and found copies of most of my movies, music, and books and decided some of the other files just didn’t matter anymore. So, I then said to hell with Glacier, found a couple of decent hard disks I had around, and spun up a fresh backup to those in case things go south again. I’ll try to remember to occasionally refresh that local backup, but I’m also leaning more toward deciding that most of this content is irrelevant in today’s connected and content-rich networked world.

So, now I’m trying to wipe out all my Glacier vault space so I’m not paying for anything on that AWS service, and that’s also proving to be a royal pain. For a few of the vaults that were created by the DiskStation app, I was able to use the app to erase the archives and then delete the vaults. But there’s one in there with about 750 GB that somehow got disconnected from the app itself, so I’m screwing around trying to find out how to do all this some other way. It turns out it’s not that bad with the AWS command line interface, but holy crap does the documentation on all of this really suck. I swear it’s like whoever wrote Amazon’s online docs in this area is a frustrated philosopher or maybe a textbook writer. Maybe I’ve become too much of a lazy script kiddie since I stopped trying to write any real code a long time ago, but I just wanted to find a simple recipe somewhere on how to do this instead of having to wade through piles of documentation on the thinking and reasoning behind the 3 or 4 ways that I might possibly be able to do what I’m trying to do.

So, for my own sake in possibly ever having to do this again, here’s my reference and essential recipe:

  1. Installing the AWS Command Line Interface (CLI) was pretty straightforward with Python and pip, and then configuring it with the AWS ID and secret key went smoothly enough (I think I’ll get everything done in one session and won’t have to wade through the incredibly lengthy discussion on setting up config files or environment variables).
  2. Here’s the list of Glacier commands (running aws glacier help from the CLI prints the same list)
  3. I used describe-vault to see what things look like; vault-name was easily enough obtained from the AWS console
  4. I used initiate-job to kick off an inventory-retrieval job; make sure you copy the dang job ID from the response or you’ll have to go dig it up some other way (list-jobs should show it)
  5. I’m now waiting on that job to get done so I can go use get-job-output to retrieve the list of all my archive IDs. Meanwhile, I’m using describe-job to see when the job actually finishes so I can get the output.
  6. Once I have the IDs, I then have to write a script that loops through them and calls delete-archive on each one.
  7. Once that’s done, I will presumably be able to delete my last remaining vault using delete-vault or by just going to the AWS console via a browser and doing it from there.
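
For future me, here’s a rough sketch of what steps 4 through 6 might look like as a single script. Fair warning: I’m swapping in Python and the boto3 library here instead of the CLI itself, and I haven’t run this against my real vault. The function names (archive_ids_from_inventory, empty_vault), the vault name, and the 15-minute polling interval are all made up for illustration. The boto3 calls are meant to mirror the CLI commands of the same names, and the script assumes the inventory comes back in the default JSON format with an ArchiveList of entries that each carry an ArchiveId.

```python
import json
import time


def archive_ids_from_inventory(inventory_json):
    """Pull the archive IDs out of the JSON text that an inventory job returns."""
    inventory = json.loads(inventory_json)
    return [entry["ArchiveId"] for entry in inventory.get("ArchiveList", [])]


def empty_vault(glacier, vault_name, job_id=None, poll_seconds=900):
    """Delete every archive listed in the vault's inventory; return the count.

    glacier is a boto3 Glacier client. Pass job_id to reuse an inventory job
    that is already running instead of starting a new one.
    """
    if job_id is None:
        # Step 4: kick off the inventory-retrieval job and hang on to its ID.
        job = glacier.initiate_job(
            accountId="-",  # "-" means the account that owns the credentials
            vaultName=vault_name,
            jobParameters={"Type": "inventory-retrieval"},
        )
        job_id = job["jobId"]
        print("Inventory job started:", job_id)

    # Step 5: keep checking describe-job until the job reports it is done.
    while True:
        status = glacier.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id
        )
        if status["Completed"]:
            break
        time.sleep(poll_seconds)

    # Still step 5: fetch the inventory and read the archive IDs out of it.
    output = glacier.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    archive_ids = archive_ids_from_inventory(output["body"].read())

    # Step 6: loop through the IDs and delete the archives one at a time.
    for archive_id in archive_ids:
        glacier.delete_archive(
            accountId="-", vaultName=vault_name, archiveId=archive_id
        )
    return len(archive_ids)
```

To run it for real, I’d expect something like empty_vault(boto3.client("glacier"), "my-vault-name") to do the job once boto3 is installed and the credentials from step 1 are in place. As far as I can tell, Glacier won’t let you delete a vault until its own inventory has caught up with the deletions, so step 7 may still mean waiting a while before delete-vault goes through.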

It’s taking quite a while to retrieve the inventory. In the meantime, I went out and picked up a copy of CrossFTP and connected it up to my AWS Glacier account. It managed to pull the inventory right away, so perhaps my other job is stalled somewhere, and I’m now recklessly deleting all my archives so I can get myself out of this mess. So much for the CLI; I’m back to the trusty ol’ UI, and it’s working great.

So, my conclusions are these:

  1. I think I’d be happier if I didn’t have to deal with this shit at all, so perhaps it’s time to completely cut the cord from having to own any more of my own digital content than I can reasonably store cheaply on a few local devices and my web site.
  2. Amazon Glacier has a really interesting operational model that probably makes complete sense to the people who built it and to real developers but is altogether a major pain in the ass to manage for the mere mortal. It’s a cautionary tale for my own line of work, where we are building advanced data capabilities for earth system science: I should never assume that because I understand how something works well enough to make it operate, everyone else will be able to get it.