The cback-span command

Introduction

Cedar Backup was designed — and is still primarily focused — around weekly backups to a single CD or DVD. Most users who back up more data than fits on a single disc seem to stop their backup process at the stage step, using Cedar Backup as an easy way to collect data.

However, some users have expressed a need to write these kinds of large backups to disc, if not every day, then at least occasionally. The cback-span tool was written to meet those needs. If you have staged more data than fits on a single CD or DVD, you can use cback-span to split that data between multiple discs.

cback-span is not a general-purpose disc-splitting tool. It is a specialized program that requires Cedar Backup configuration to run. All it can do is read Cedar Backup configuration, find any staging directories that have not yet been written to disc, and split the files in those directories between discs.

cback-span accepts many of the same command-line options as cback, but must be run interactively. It cannot be run from cron. This is intentional. It is intended to be a useful tool, not a new part of the backup process (that is the purpose of an extension).

In order to use cback-span, you must configure your backup such that the largest individual backup file can fit on a single disc. The command will not split a single file onto more than one disc. All it can do is split large directories onto multiple discs. Files in those directories will be arbitrarily split up so that space is utilized most efficiently.
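The single-file requirement above can be checked before running the tool. The following is a minimal sketch, not part of Cedar Backup: the function names `largest_file` and `fits_on_disc` are hypothetical, and the capacity is assumed to be supplied in bytes by the caller.

```python
import os


def largest_file(root):
    """Return (size_in_bytes, path) for the largest regular file under root.

    Returns (0, None) if no regular files are found. Symbolic links are
    skipped so that a link is not counted as the file it points to.
    """
    largest = (0, None)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size > largest[0]:
                largest = (size, path)
    return largest


def fits_on_disc(root, capacity_bytes):
    """True if every individual file under root is no larger than capacity_bytes."""
    size, _path = largest_file(root)
    return size <= capacity_bytes
```

For example, `fits_on_disc("/tmp/staging", 650 * 1024 * 1024)` would report whether any one staged file is too large for a 650 MB disc, assuming that directory holds your staged data.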

Syntax

The cback-span command has the following syntax:

 Usage: cback-span [switches]

 Cedar Backup 'span' tool.

 This Cedar Backup utility spans staged data between multiple discs.
 It is a utility, not an extension, and requires user interaction.

 The following switches are accepted, mostly to set up underlying
 Cedar Backup functionality:

   -h, --help     Display this usage/help listing
   -V, --version  Display version information
   -b, --verbose  Print verbose output as well as logging to disk
   -c, --config   Path to config file (default: /etc/cback.conf)
   -l, --logfile  Path to logfile (default: /var/log/cback.log)
   -o, --owner    Logfile ownership, user:group (default: root:adm)
   -m, --mode     Octal logfile permissions mode (default: 640)
   -O, --output   Record some sub-command (i.e. cdrecord) output to the log
   -d, --debug    Write debugging information to the log (implies --output)
   -s, --stack    Dump a Python stack trace instead of swallowing exceptions
         

Switches

-h, --help

Display usage/help listing.

-V, --version

Display version information.

-b, --verbose

Print verbose output to the screen as well as writing to the logfile. When this option is enabled, most information that would normally be written to the logfile will also be written to the screen.

-c, --config

Specify the path to an alternate configuration file. The default configuration file is /etc/cback.conf.

-l, --logfile

Specify the path to an alternate logfile. The default logfile is /var/log/cback.log.

-o, --owner

Specify the ownership of the logfile, in the form user:group. The default ownership is root:adm, to match the Debian standard for most logfiles. This value will only be used when creating a new logfile. If the logfile already exists when the cback-span command is executed, it will retain its existing ownership and mode. Only user and group names may be used, not numeric uid and gid values.

-m, --mode

Specify the permissions for the logfile, using the numeric mode as in chmod(1). The default mode is 0640 (-rw-r-----). This value will only be used when creating a new logfile. If the logfile already exists when the cback-span command is executed, it will retain its existing ownership and mode.

-O, --output

Record some sub-command output to the logfile. When this option is enabled, all output from system commands will be logged. This might be useful for debugging or just for reference. Cedar Backup uses system commands mostly for dealing with the CD/DVD recorder and its media.

-d, --debug

Write debugging information to the logfile. This option produces a high volume of output, and would generally only be needed when debugging a problem. This option implies the --output option, as well.

-s, --stack

Dump a Python stack trace instead of swallowing exceptions. This forces Cedar Backup to dump the entire Python stack trace associated with an error, rather than just propagating the last message it received back up to the user interface. Under some circumstances, this is useful information to include along with a bug report.
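To illustrate the values taken by the --owner and --mode switches, here is a small sketch using only the Python standard library. It is not Cedar Backup's own parsing code; `parse_owner` and `describe_mode` are hypothetical helper names introduced for this example.

```python
import stat


def parse_owner(owner):
    """Split a 'user:group' string into its two names.

    Raises ValueError if the separator or either name is missing.
    """
    user, sep, group = owner.partition(":")
    if not sep or not user or not group:
        raise ValueError("owner must be in the form user:group")
    return user, group


def describe_mode(mode_text):
    """Render an octal mode string such as '640' in ls-style notation."""
    mode = int(mode_text, 8)  # interpret the digits as octal, as chmod(1) does
    return stat.filemode(stat.S_IFREG | mode)
```

With these helpers, `describe_mode("640")` renders the default logfile mode in the same `-rw-r-----` notation used above, and `parse_owner("root:adm")` separates the default owner into its user and group names.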

Using cback-span

As discussed above, cback-span is an interactive command. It cannot be run from cron.

You can typically use the default answer for most questions. The only two questions that you may not want the default answer for are the fit algorithm and the cushion percentage.

The cushion percentage is used by cback-span to determine what capacity to shoot for when splitting up your staging directories. A 650 MB disc cannot actually hold a full 650 MB of data; the usable capacity is usually closer to 627 MB. The cushion percentage tells cback-span how much overhead to reserve for the filesystem. The default of 4% is usually OK, but if you have problems you may need to increase it slightly.
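The arithmetic behind the cushion can be sketched as follows. This is only an illustration of the idea that a cushion of N percent leaves (100 - N) percent of the nominal capacity. The function names are hypothetical, and the figures cback-span reports come from its own internal calculation, which may not match this simple formula exactly.

```python
import math


def usable_capacity(capacity, cushion_percent):
    """Capacity left after setting aside cushion_percent of it.

    capacity may be in any unit (bytes, MB); the result uses the same unit.
    """
    if not 0 <= cushion_percent < 100:
        raise ValueError("cushion percentage must be at least 0 and below 100")
    return capacity * (100.0 - cushion_percent) / 100.0


def minimum_discs(total_size, capacity, cushion_percent):
    """Lower bound on the number of discs needed for total_size of data.

    This is only a lower bound: because individual files are never split,
    an actual solution may need more discs.
    """
    usable = usable_capacity(capacity, cushion_percent)
    return math.ceil(total_size / usable)
```

For instance, `usable_capacity(650.0, 1.5)` applies a 1.5% cushion to a 650 MB disc, and `minimum_discs(1024.0, 650.0, 4.0)` estimates how many such discs 1 GB of staged data would need with the default cushion.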

The fit algorithm tells cback-span how it should determine which items should be placed on each disc. If you don't like the result from one algorithm, you can reject that solution and choose a different algorithm.

The four available fit algorithms are:

worst

The worst-fit algorithm.

The worst-fit algorithm proceeds through a sorted list of items (sorted from smallest to largest) until running out of items or meeting capacity exactly. If capacity is exceeded, the item that caused capacity to be exceeded is thrown away and the next one is tried. The algorithm effectively includes the maximum number of items possible in its search for optimal capacity utilization. It tends to be somewhat slower than either the best-fit or alternate-fit algorithm, probably because on average it has to look at more items before completing.

best

The best-fit algorithm.

The best-fit algorithm proceeds through a sorted list of items (sorted from largest to smallest) until running out of items or meeting capacity exactly. If capacity is exceeded, the item that caused capacity to be exceeded is thrown away and the next one is tried. The algorithm effectively includes the minimum number of items possible in its search for optimal capacity utilization. For large lists of mixed-size items, it's not unusual to see the algorithm achieve 100% capacity utilization by including fewer than 1% of the items. Probably because it often has to look at fewer of the items before completing, it tends to be a little faster than the worst-fit or alternate-fit algorithms.

first

The first-fit algorithm.

The first-fit algorithm proceeds through an unsorted list of items until running out of items or meeting capacity exactly. If capacity is exceeded, the item that caused capacity to be exceeded is thrown away and the next one is tried. This algorithm generally performs more poorly than the other algorithms both in terms of capacity utilization and item utilization, but can be as much as an order of magnitude faster on large lists of items because it doesn't require any sorting.

alternate

A hybrid algorithm that I call alternate-fit.

This algorithm tries to balance small and large items to achieve better end-of-disk performance. Instead of just working one direction through a list, it alternately works from the start and end of a sorted list (sorted from smallest to largest), throwing away any item which causes capacity to be exceeded. The algorithm tends to be slower than the best-fit and first-fit algorithms, and slightly faster than the worst-fit algorithm, probably because of the number of items it considers on average before completing. It often achieves slightly better capacity utilization than the worst-fit algorithm, while including slightly fewer items.
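The four descriptions above can be sketched in Python as follows. This is a simplified reading of the prose, not Cedar Backup's actual implementation: all function names are hypothetical, items are assumed to be given as a dictionary mapping a name to a size, and the `span` helper (which simply repeats a fit until nothing is left) is an assumption about how one disc's result feeds the next.

```python
def _greedy_fit(items, capacity):
    """Walk (name, size) pairs in order, keeping each one that still fits."""
    chosen, used = [], 0
    for name, size in items:
        if used == capacity:
            break  # capacity met exactly, so stop looking
        if used + size <= capacity:
            chosen.append(name)
            used += size
        # otherwise this item is thrown away and the next one is tried
    return chosen, used


def first_fit(sizes, capacity):
    """Unsorted: items are taken in the order the dictionary yields them."""
    return _greedy_fit(list(sizes.items()), capacity)


def worst_fit(sizes, capacity):
    """Sorted from smallest to largest."""
    return _greedy_fit(sorted(sizes.items(), key=lambda kv: kv[1]), capacity)


def best_fit(sizes, capacity):
    """Sorted from largest to smallest."""
    ordered = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    return _greedy_fit(ordered, capacity)


def alternate_fit(sizes, capacity):
    """Alternate between the small end and the large end of a sorted list."""
    ordered = sorted(sizes.items(), key=lambda kv: kv[1])
    interleaved = []
    low, high = 0, len(ordered) - 1
    take_small = True
    while low <= high:
        if take_small:
            interleaved.append(ordered[low])
            low += 1
        else:
            interleaved.append(ordered[high])
            high -= 1
        take_small = not take_small
    return _greedy_fit(interleaved, capacity)


def span(sizes, capacity, fit):
    """Apply one fit function repeatedly until every item is on some disc."""
    remaining = dict(sizes)
    discs = []
    while remaining:
        chosen, used = fit(remaining, capacity)
        if not chosen:
            raise ValueError("an item is larger than the disc capacity")
        discs.append((chosen, used))
        for name in chosen:
            del remaining[name]
    return discs
```

Calling `span(sizes, capacity, best_fit)` returns one `(names, used)` pair per disc. Swapping in a different fit function is the programmatic equivalent of rejecting a solution and choosing another algorithm at the prompt.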

Sample run

Below is a log showing a sample cback-span run.

================================================
           Cedar Backup 'span' tool
================================================

This the Cedar Backup span tool.  It is used to split up staging
data when that staging data does not fit onto a single disc.

This utility operates using Cedar Backup configuration.  Configuration
specifies which staging directory to look at and which writer device
and media type to use.

Continue? [Y/n]: 
===

Cedar Backup store configuration looks like this:

   Source Directory...: /tmp/staging
   Media Type.........: cdrw-74
   Device Type........: cdwriter
   Device Path........: /dev/cdrom
   Device SCSI ID.....: None
   Drive Speed........: None
   Check Data Flag....: True
   No Eject Flag......: False

Is this OK? [Y/n]: 
===

Please wait, indexing the source directory (this may take a while)...
===

The following daily staging directories have not yet been written to disc:

   /tmp/staging/2007/02/07
   /tmp/staging/2007/02/08
   /tmp/staging/2007/02/09
   /tmp/staging/2007/02/10
   /tmp/staging/2007/02/11
   /tmp/staging/2007/02/12
   /tmp/staging/2007/02/13
   /tmp/staging/2007/02/14

The total size of the data in these directories is 1.00 GB.

Continue? [Y/n]: 
===

Based on configuration, the capacity of your media is 650.00 MB.

Since estimates are not perfect and there is some uncertainly in
media capacity calculations, it is good to have a "cushion",
a percentage of capacity to set aside.  The cushion reduces the
capacity of your media, so a 1.5% cushion leaves 98.5% remaining.

What cushion percentage? [4.00]: 
===

The real capacity, taking into account the 4.00% cushion, is 627.25 MB.
It will take at least 2 disc(s) to store your 1.00 GB of data.

Continue? [Y/n]: 
===

Which algorithm do you want to use to span your data across
multiple discs?

The following algorithms are available:

   first....: The "first-fit" algorithm
   best.....: The "best-fit" algorithm
   worst....: The "worst-fit" algorithm
   alternate: The "alternate-fit" algorithm

If you don't like the results you will have a chance to try a
different one later.

Which algorithm? [worst]: 
===

Please wait, generating file lists (this may take a while)...
===

Using the "worst-fit" algorithm, Cedar Backup can split your data
into 2 discs.

Disc 1: 246 files, 615.97 MB, 98.20% utilization
Disc 2: 8 files, 412.96 MB, 65.84% utilization

Accept this solution? [Y/n]: n
===

Which algorithm do you want to use to span your data across
multiple discs?

The following algorithms are available:

   first....: The "first-fit" algorithm
   best.....: The "best-fit" algorithm
   worst....: The "worst-fit" algorithm
   alternate: The "alternate-fit" algorithm

If you don't like the results you will have a chance to try a
different one later.

Which algorithm? [worst]: alternate
===

Please wait, generating file lists (this may take a while)...
===

Using the "alternate-fit" algorithm, Cedar Backup can split your data
into 2 discs.

Disc 1: 73 files, 627.25 MB, 100.00% utilization
Disc 2: 181 files, 401.68 MB, 64.04% utilization

Accept this solution? [Y/n]: y
===

Please place the first disc in your backup device.
Press return when ready.
===

Initializing image...
Writing image to disc...