Moving Forward with Backing Up

If you have sysadmin duties in a library like Mason’s (where our core technologies actually reside in the library and not the computer center), backing up is one big part of what you do. Though heretical at the time, we abandoned tape for disk-to-disk backup in the early ’90s, so while backups don’t take a lot of any one person’s time, they still have to be scripted, tested, monitored and regularly thought about. When it comes to the actual work of getting the backups done, well, it’s pretty much just one machine talking to another at cron-induced intervals.

For some of our systems, backup is a simple script built around a find/cpio combination. Here’s the key part:

# Pass-through copy: -p pass mode, -d create directories, -m preserve mtimes, -v verbose
cd /DirectoryToBackup
find . -depth -print | cpio -pdmv /PlaceWhereYouWantToPutStuff

‘find’ steps recursively through all the files and sub-directories, sending each file’s name to ‘cpio’, which then copies it to the destination if the file is newer than the copy already in the destination directory. You probably see the gotcha here: if you have deleted a file that was backed up last week, it will stay in the destination directory since there’s nothing newer to overwrite it.

That’s where rsync shines: it makes sure the destination is identical to the original (deleting files in the destination directory that don’t exist in the original). But I digress.
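
For comparison, a minimal rsync equivalent of the script above might look like this (the paths are placeholders; the --delete flag is what removes files from the destination that no longer exist in the source):

# --delete mirrors deletions; the trailing slash on the source copies its contents
rsync -av --delete /DirectoryToBackup/ /PlaceWhereYouWantToPutStuff/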

For some of our systems we’ve created elaborate backup/redundancy environments. Our Voyager ILS, for example, lives on a striped, mirrored, hot-spared RAID5 array inside a Sun V880, which gives an immediate measure of protection against any single disk failure. Each evening, we take the system down for about 13 minutes and run a backup from this mirrored array to yet another striped partition on a different set of drives in the same machine. When that finishes, Voyager returns and a follow-on process copies that new “backup” partition to another machine in yet another part of the library. This copy of the backup goes to a different machine each night so that over the course of seven days, we have seven copies spread out across seven different systems. Were we to have a catastrophic failure, we could cycle back through these backups and surely find one that could be restored.
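
As a rough sketch of how that nightly rotation can be driven from cron (the host names and paths here are hypothetical, and rsync over ssh is just one possible transport, not necessarily the one we use):

# Day of week (1-7) picks tonight's destination host for the backup partition
DEST="backuphost$(date +%u).example.edu"
rsync -a --delete /voyager_backup/ "${DEST}:/voyager_backup/"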

For most of our other systems, like MARS (our DSpace institutional/digital repository), less manic measures are sufficient. The hardware environment for that system enables us to function well on a weekly backup.

MARS lives on an XServe RAID (RAID5) array with two hot spare drives, so we’re really only talking about needing to survive the failure of more than three drives (an array member plus both spares), which isn’t likely in the course of a single week. At weekly intervals we mirror the bitstreams partition to another RAID5 partition on a nearby server. We follow a slightly different process for the postgres-based metadata but always have several restorable copies lying about.
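
For the curious, a weekly dump of a PostgreSQL-backed DSpace database generally looks something like this (the database name, user and destination path are placeholders rather than our actual settings):

# Hypothetical weekly dump of the DSpace metadata database
pg_dump -U dspace dspace | gzip > /backups/mars-metadata-$(date +%Y%m%d).sql.gz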

For those services that live on Apple XServes (e.g., E-Reserves, our research portals, MARS and parts of the library’s website), we use SuperDuper! to clone the system boot drive (with Applications) to a spare internal drive on a weekly schedule and then mirror important data via rsync to another machine.

At the center of our backup strategy for every system is an Apple XServe with a 1.8 TB RAID5 array. Throughout the week this machine sets up and tears down NFS exports to our Solaris, Apple or Linux servers for rsync mirroring of important data.

Note: To improve security, we export the relevant directory from the backup server to the specific machine we want to back up. The backup process is launched on the production machine (via cron), which then mounts the exported drive from our backup server. We never export directories from our production machines, and we restrict access to the NFS daemons via hosts.allow. Not foolproof, of course, but combined with a few other measures it is much more secure than failing to make backups would prove to be.
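
In outline, the cron job on a production machine amounts to a mount, a mirror and an unmount, something like the sketch below (host names, mount points and paths are placeholders, and on Solaris the mount invocation would be mount -F nfs rather than mount -t nfs):

#!/bin/sh
# Placeholder names throughout; run from cron on the production machine
mount -t nfs backupserver.example.edu:/backups/thishost /mnt/backup
rsync -a --delete /important/data/ /mnt/backup/
umount /mnt/backup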

Then, each week this large data store of backups is mirrored to a 500GB external drive attached to the XServe via the firewire port. We have two of these firewire drives and rotate them weekly: when finished, this week’s copy goes across campus to another building and last week’s drive returns and is connected to capture next week’s data. In this way, we have a copy of all data stored off site and assume it will never be more than a couple of weeks old. We’ll next have to look into getting a drive completely off campus since we’re reasonably well protected in case of something like a building fire but still quite vulnerable to a disaster with a slightly larger footprint (e.g., a meteorite or flood).

Earlier this week I incorporated a Drobo into the backup regimen as one more spot where I can store the most important 500 or so gigabytes of library data. I decided to go with the Drobo since I also had a half-terabyte of video that I needed to back up somewhere, and putting it on an expensive RAID unit seemed a waste of higher-throughput space. The Drobo seemed a reasonable alternative. The Drobo isn’t fast by any means (throttled by a USB 2 interface and a controller that doesn’t appear to offer much in the way of throughput) but it is dead simple to use. To quote Howlin’ Wolf, “it’s built for comfort, it ain’t built for speed.”

I put four 500GB Seagate SATA-2 drives in the Drobo and in no time had a functioning array ready for data. The four drives total 2TB raw; the Drobo reserves roughly one drive’s worth of space for redundancy, which works out to about 1.36 terabytes of protected storage (by the way, SATA-I would be fine since the Drobo can’t keep up with that level of throughput either). It’s not RAID; the Drobo uses some sort of proprietary storage virtualization scheme instead. According to the documentation, if a drive in the Drobo fails, doing a hot swap with a replacement drive will restore the data and reestablish redundancy. I’ll just think of it as a black box appliance.

With the Drobo attached to my Mac Pro, I just completed a backup of close to 280 gigabytes of data from an external drive connected to the same computer via Firewire 800, and on larger files throughput averaged roughly 15MB/s according to the rsync progress reports:

/weblogs/ezproxy/FY0506/01_31_06-03_03_06.log
  1015703564 100%   15.83MB/s    0:01:01  (65313, 100.0% of 268220)
/weblogs/ezproxy/FY0506/10_25_05-01_30_2006.log
  2147483647 100%   15.97MB/s    0:02:08  (65314, 100.0% of 268220)
/weblogs/ezproxy/FY0506/ezproxy.log.030306_050306
  1847436796 100%   16.10MB/s    0:01:49  (65315, 100.0% of 268220)
/weblogs/ezproxy/FY0506/ezproxy.log.082505_102405
  1561279782 100%   15.86MB/s    0:01:33  (65316, 100.0% of 268220)

The plan going forward is to take the external drive that’s pulled from our XServe each week and rsync it to the Drobo before sending it across campus to its “undisclosed” location. This will give me a nearby copy of the data and, thanks to the Drobo’s built-in redundancy, a safer feeling than I have exposing backup data to the failure of a single-spindle drive.
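
That step is just one more rsync between two locally attached volumes, something along these lines (the volume names are made up):

# Hypothetical volume names for the rotated firewire drive and the Drobo
rsync -a --delete /Volumes/WeeklyBackup/ /Volumes/Drobo/library-backup/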