├── .gitignore
├── README
├── example-catchup.cfg
├── example.cfg
└── zfs-backup.sh

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
zfs-backup.cfg

--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
This is a backup script to replicate a ZFS filesystem and its children to
another server via zfs snapshots and zfs send/receive over ssh. It was
developed on Solaris 10 but should run with minor modification on other
platforms with ZFS support.

It supplements zfs-auto-snapshot, but runs independently. I prefer that
snapshots continue to be taken even if the backup fails. It does not
necessarily require that package -- anything that regularly generates
snapshots that follow a given pattern will suffice.

Command-line options:
    -n		debug/dry-run mode
    -v		verbose mode
    -f file	specify a configuration file
    -r N	use the Nth most recent local snapshot rather than the newest
    -h, -?	display help message


Basic installation: After following the prerequisites, run manually to verify
operation, and then add a line like the following to zfssnap's crontab:
    30 * * * * /path/to/zfs-backup.sh [ options ]
(This is for an hourly sync -- adjust accordingly if you only want to back up
daily, etc. zfs-backup now supports command-line options and configuration
files, so you can schedule different cron jobs with different config files,
e.g. to back up to two different targets. If you schedule multiple cron
jobs, you should use different lockfiles in each configuration.)
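
For example, to back up hourly to one server and daily to another, you might
schedule two jobs with separate config files (the paths and filenames here
are hypothetical; each config should set its own LOCK and PID files):
    30 * * * * /path/to/zfs-backup.sh /path/to/hourly-primary.cfg
    45 0 * * * /path/to/zfs-backup.sh /path/to/daily-offsite.cfg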

This aims to be much more robust than the backup functionality of
zfs-auto-snapshot, namely:
* it uses 'zfs send -I' to send all intermediate snapshots (including
  any daily/weekly/etc.), and should still work even if it isn't run
  every hour -- as long as the newest remote snapshot hasn't been
  rotated out locally yet
* 'zfs recv -dF' on the destination host removes any snapshots not
  present locally, so you don't have to worry about manually removing
  old snapshots there.

PREREQUISITES:
1. zfs-auto-snapshot or an equivalent package installed locally and regular
snapshots enabled (hourly, daily, etc.)
2. home directory set for the zfssnap role (the user taking snapshots and
doing the sending):
	# rolemod -d /path/to/home zfssnap
3. ssh keys set up between zfssnap@localhost and remuser@remhost:
	# su - zfssnap
	$ ssh-keygen
Copy the contents of .ssh/id_rsa.pub into ~remuser/.ssh/authorized_keys on
remhost. Test that key-based ssh works:
	$ ssh remuser@remhost
4. zfs allow done for remuser on remhost:
	# zfs allow remuser atime,create,destroy,mount,mountpoint,receive,rollback,snapshot,userprop backuppool/fs
This can be done on a top-level filesystem, and is inherited by default.
Depending on your usage, you may also need to allow further permissions such
as share, sharenfs, hold, etc.
5. an initial (full) zfs send/receive done so that remhost has the fs we
are backing up, and the associated snapshots -- something like:
	zfs send -R $POOL/$FS@zfs-auto-snap_daily-(latest) | ssh $REMUSER@$REMHOST zfs recv -dvF $REMPOOL
Note: 'zfs send -R' will send *all* snapshots associated with a dataset, so
if you wish to purge old snapshots, do that first.
6. zfs allow any additional permissions needed, to fix any errors produced
in step 5
7. configure the TAG/PROP/REMUSER/REMHOST/REMPOOL variables in this script
or in a config file
8. zfs set $PROP={ fullpath | basename | rootfs } pool/fs
for each FS or volume you wish to back up.
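
For example, with the script's default PROP of edu.tamu:backuptarget, backing
up a hypothetical pool/home by its full path would look like:
	$ pfexec zfs set edu.tamu:backuptarget=fullpath pool/home
	$ zfs get -s local -H -o name,value edu.tamu:backuptarget
	pool/home	fullpath
The second command is the same listing the script itself uses to discover
which datasets to back up.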

PROPERTY VALUES:
Given the hierarchy pool/a/b,
* with 'fullpath' (zfs recv -d), this is replicated to backupserver:backuppool/a/b
* with 'basename' (zfs recv -e), this is replicated to backupserver:backuppool/b
  This is useful for replicating a sub-level FS into the top level of the
  backup pool; e.g. pool/backup/foo => backuppool/foo (instead of
  backuppool/backup/foo)
* with 'rootfs' set on pool (the root filesystem in the pool; uses zfs recv -d
  with the target set to $REMPOOL), pool is replicated to
  backupserver:backuppool. It is an error to set this property value on any
  child filesystem.

WARNING: This can be dangerous -- any filesystems in $REMPOOL which do not
exist in the source will be deleted! For reasons of safety and simplicity,
it is usually preferable to work with ZFS filesystems rather than the root
fs, or to use the 'fullpath' property value, which will receive a root
filesystem into a child filesystem of the same name (otherwise replicating
all children into top-level child filesystems), and will not touch any
unknown filesystems.

If this backup is not run for a long enough period that the newest
remote snapshot has been removed locally, manually run an incremental
zfs send/recv to bring it up to date, a la
	zfs send -I zfs-auto-snap_daily-(latest on remote) -R $POOL/$FS@zfs-auto-snap_daily-(latest local) |
	ssh $REMUSER@$REMHOST zfs recv -dvF $REMPOOL
It's probably best to do a dry-run first (zfs recv -ndvF).

Note: I use daily snapshots in these manual send/recv examples because
it is less likely that the snapshot you are using will be rotated out
in the middle of a send. Also, note that ZFS will send all snapshots for a
given filesystem before sending any for its children, rather than going in
global date order.

Alternatively, use a different tag (e.g. weekly) that still has common
snapshots, possibly in combination with the -r option (Nth most recent) to
avoid short-lived snapshots (e.g. hourly) being rotated out in the middle
of your sync. This is a good use case for an alternate configuration file;
see example-catchup.cfg.


PROCEDURE:
* find the newest local hourly snapshot
* find the newest remote hourly snapshot (via ssh)
* check that both $newest_local and $newest_remote snaps exist locally
* zfs send incremental (-I) from $newest_remote to $newest_local to dsthost
* if anything fails, set svc to maint. and exit

--------------------------------------------------------------------------------
/example-catchup.cfg:
--------------------------------------------------------------------------------
# Use the standard backup settings, but use daily snapshots to catch up
# in case of missed backups. You probably should run with '-n' (debug)
# first to be safe.

# $0 is the backup script that sources this file, but $1 is in fact this
# very file.
. `dirname $1`/example.cfg

TAG="zfs-auto-snap_daily"
VERBOSE="-v"

# Use the 3rd most recent daily snapshot, to avoid the hourly snapshots
# entirely. To override this and get the most recent snapshot, use '-r 1' on
# the cmdline. I recommend you first run it as-is, then run with -r 1
# afterwards. After that you should be ready to re-enable the hourly backups.
RECENT=3
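
# A catch-up session along those lines might look like this (the path is
# hypothetical):
#	zfs-backup.sh -n /path/to/example-catchup.cfg    # dry-run first
#	zfs-backup.sh /path/to/example-catchup.cfg       # sync via 3rd-newest daily
#	zfs-backup.sh -r 1 /path/to/example-catchup.cfg  # then catch up to the newest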

--------------------------------------------------------------------------------
/example.cfg:
--------------------------------------------------------------------------------
# sample configuration for hourly backups

# Don't forget to schedule a cron job such as:
# 30 * * * * /path/to/zfs-backup.sh /path/to/this.cfg

TAG="zfs-auto-snap_hourly"
LOCK="/var/tmp/zfsbackup.lock"
PID="/var/tmp/zfsbackup.pid"
# remote settings (on destination host)
REMUSER="zfsbak"
REMHOST="backupserver.local"
REMPOOL="backuppool"

--------------------------------------------------------------------------------
/zfs-backup.sh:
--------------------------------------------------------------------------------
#!/usr/bin/env ksh
# Needs a POSIX-compatible sh, like ash (Debian & FreeBSD /bin/sh), ksh, or
# bash. On Solaris 10 you need to use /usr/xpg4/bin/sh (the POSIX shell) or
# /bin/ksh -- its /bin/sh is an ancient Bourne shell, which does not work.

# backup script to replicate a ZFS filesystem and its children to another
# server via zfs snapshots and zfs send/receive
#
# SMF manifests welcome!
#
# v0.4 (unreleased) - misc. fixes; portability & doc improvements
# v0.3 - cmdline options and cfg file support
# v0.2 - multiple datasets
# v0.1 - initial working version

# Copyright (c) 2009-15 Andrew Daugherity
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.


# Basic installation: After following the prerequisites, run manually to verify
# operation, and then add a line like the following to zfssnap's crontab:
# 30 * * * * /path/to/zfs-backup.sh
#
# Consult the README file for details.

# If this backup is not run for a long enough period that the newest
# remote snapshot has been removed locally, manually run an incremental
# zfs send/recv to bring it up to date, a la
#   zfs send -I zfs-auto-snap_daily-(latest on remote) -R \
#     $POOL/$FS@zfs-auto-snap_daily-(latest local) | \
#     ssh $REMUSER@$REMHOST zfs recv -dvF $REMPOOL
# It's probably best to do a dry-run first (zfs recv -ndvF).


# PROCEDURE:
#   * find the newest local hourly snapshot
#   * find the newest remote hourly snapshot (via ssh)
#   * check that both $newest_local and $newest_remote snaps exist locally
#   * zfs send incremental (-I) from $newest_remote to $newest_local to dsthost
#   * if anything fails, set svc to maint. and exit

# all of the following variables (except CFG) may be set in the config file
DEBUG=""		# set to non-null to enable debug (dry-run) mode
VERBOSE=""		# "-v" for verbose, null string for quiet
LOCK="/var/tmp/zfsbackup.lock"
PID="/var/tmp/zfsbackup.pid"
CFG="/var/lib/zfssnap/zfs-backup.cfg"
ZFS="/usr/sbin/zfs"
# Replace with sudo(8) if pfexec(1) is not available on your OS
PFEXEC=`which pfexec`

# local settings -- datasets to back up are now found by property
TAG="zfs-auto-snap_daily"
PROP="edu.tamu:backuptarget"
# remote settings (on destination host)
REMUSER="zfsbak"
# special case: when $REMHOST=localhost, ssh is bypassed
REMHOST="backupserver.my.domain"
REMPOOL="backuppool"
REMZFS="$ZFS"


usage() {
    echo "Usage: $(basename $0) [ -nv ] [ -r N ] [ [-f] cfg_file ]"
    # use printf, not echo: POSIX leaves echo's handling of \t undefined
    printf " -n\t\tdebug (dry-run) mode\n"
    printf " -v\t\tverbose mode\n"
    printf " -f\t\tspecify a configuration file\n"
    printf " -r N\t\tuse the Nth most recent snapshot instead of the newest\n"
    printf " -h, -?\tdisplay this help message\n"
    echo "If the configuration file is the last option specified, the -f flag is optional."
    exit 1
}

# simple ordinal-suffix function; does not validate input
ord() {
    case $1 in
	1|*[02-9]1)	echo "$1st";;
	2|*[02-9]2)	echo "$1nd";;
	3|*[02-9]3)	echo "$1rd";;
	*1[123]|*[04-9]) echo "$1th";;
	*)		echo $1;;
    esac
}
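# e.g. 'ord 1' -> 1st, 'ord 2' -> 2nd, 'ord 11' -> 11th, 'ord 22' -> 22nd,
# 'ord 113' -> 113th; used below for '-r N' error messages.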

# Option parsing
# Note: legacy getopt(1) cannot handle arguments containing whitespace.
set -- $(getopt h?nvf:r: $*)
if [ $? -ne 0 ]; then
    usage
fi
for opt; do
    case $opt in
	-h|-\?)	usage;;
	-n)	dbg_flag=Y; shift;;
	-v)	verb_flag=Y; shift;;
	-f)	CFG=$2; shift 2;;
	-r)	recent_flag=$2; shift 2;;
	--)	shift; break;;
    esac
done
if [ $# -gt 1 ]; then
    usage
elif [ $# -eq 1 ]; then
    CFG=$1
fi
# If the file is in the current directory, add ./ to make sure the correct
# file is sourced
if [ $(basename $CFG) = "$CFG" ]; then
    CFG="./$CFG"
fi
# Read any settings from a config file, if present
if [ -r "$CFG" ]; then
    # Pass its name as a parameter so it can use $(dirname $1) to source
    # other config files in the same directory.
    . $CFG $CFG
fi
# Set options now, so cmdline opts override the cfg file
[ "$dbg_flag" ] && DEBUG=1
[ "$verb_flag" ] && VERBOSE="-v"
[ "$recent_flag" ] && RECENT=$recent_flag
# set a default value so integer tests work
if [ -z "$RECENT" ]; then RECENT=0; fi

# local (non-ssh) backup handling: REMHOST=localhost
if [ "$REMHOST" = "localhost" ]; then
    REMZFS_CMD="$ZFS"
else
    REMZFS_CMD="ssh $REMUSER@$REMHOST $REMZFS"
fi

# Usage: do_backup pool/fs/to/backup receive_option
# receive_option should be -d for full path and -e for base name.
# See the descriptions in the 'zfs receive' section of zfs(1M) for more details.
do_backup() {

    DATASET=$1
    FS=${DATASET#*/}		# strip local pool name
    FS_BASE=${DATASET##*/}	# only the last part
    RECV_OPT=$2
    BAD=""			# reset; not local, so it would persist across calls

    case $RECV_OPT in
	-e)	TARGET="$REMPOOL/$FS_BASE"
		;;
	-d)	TARGET="$REMPOOL/$FS"
		;;
	rootfs)	if [ "$DATASET" = "$(basename $DATASET)" ]; then
		    TARGET="$REMPOOL"
		    RECV_OPT="-d"
		else
		    BAD=1
		fi
		;;
	*)	BAD=1
    esac
    if [ $# -ne 2 -o "$BAD" ]; then
	echo "Oops! do_backup called improperly:" 1>&2
	echo "  $*" 1>&2
	return 2
    fi
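
    # Illustrative mappings, assuming REMPOOL=backuppool (dataset names
    # hypothetical):
    #   pool/a/b with -d (fullpath) => backuppool/a/b
    #   pool/a/b with -e (basename) => backuppool/b
    #   pool     with rootfs        => backuppool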

    if [ $RECENT -gt 1 ]; then
	newest_local="$($ZFS list -t snapshot -H -S creation -o name -d 1 $DATASET | grep $TAG | awk "NR == $RECENT")"
	if [ -z "$newest_local" ]; then
	    echo "Error: could not find $(ord $RECENT) most recent snapshot matching tag" >&2
	    echo "'$TAG' for ${DATASET}!" >&2
	    return 1
	fi
	msg="using local snapshot ($(ord $RECENT) most recent):"
    else
	newest_local="$($ZFS list -t snapshot -H -S creation -o name -d 1 $DATASET | grep $TAG | head -1)"
	if [ -z "$newest_local" ]; then
	    echo "Error: no snapshots matching tag '$TAG' for ${DATASET}!" >&2
	    return 1
	fi
	msg="newest local snapshot:"
    fi
    snap2=${newest_local#*@}
    [ "$DEBUG" -o "$VERBOSE" ] && echo "$msg $snap2"

    if [ "$REMHOST" = "localhost" ]; then
	newest_remote="$($ZFS list -t snapshot -H -S creation -o name -d 1 $TARGET | grep $TAG | head -1)"
	err_msg="Error fetching snapshot listing for local target pool $REMPOOL."
    else
	# ssh needs public-key auth configured beforehand
	# Not using $REMZFS_CMD because we need 'ssh -n' here, but must not
	# use 'ssh -n' for the actual zfs recv.
	newest_remote="$(ssh -n $REMUSER@$REMHOST $REMZFS list -t snapshot -H -S creation -o name -d 1 $TARGET | grep $TAG | head -1)"
	err_msg="Error fetching remote snapshot listing via ssh to $REMUSER@$REMHOST."
    fi
    if [ -z "$newest_remote" ]; then
	echo "$err_msg" >&2
	[ $DEBUG ] || touch $LOCK
	return 1
    fi
    snap1=${newest_remote#*@}
    [ "$DEBUG" -o "$VERBOSE" ] && echo "newest remote snapshot: $snap1"

    if ! $ZFS list -t snapshot -H $DATASET@$snap1 > /dev/null 2>&1; then
	exec 1>&2
	echo "Newest remote snapshot '$snap1' does not exist locally!"
	echo "Perhaps it has already been rotated out."
	echo ""
	echo "Manually run zfs send/recv to bring $TARGET on $REMHOST"
	echo "up to a snapshot that exists on this host (the newest local snapshot"
	echo "with the tag $TAG is $snap2)."
	[ $DEBUG ] || touch $LOCK
	return 1
    fi
    if ! $ZFS list -t snapshot -H $DATASET@$snap2 > /dev/null 2>&1; then
	exec 1>&2
	echo "Something has gone horribly wrong -- local snapshot $snap2"
	echo "has suddenly disappeared!"
	[ $DEBUG ] || touch $LOCK
	return 1
    fi

    if [ "$snap1" = "$snap2" ]; then
	[ $VERBOSE ] && echo "Remote snapshot is the same as local; not running."
	return 0
    fi

    # sanity-check snapshot times -- avoid going too far back with -r
    snap1time=$($ZFS get -Hp -o value creation $DATASET@$snap1)
    snap2time=$($ZFS get -Hp -o value creation $DATASET@$snap2)
    if [ $snap2time -lt $snap1time ]; then
	echo "Error: target snapshot $snap2 is older than $snap1!" >&2
	echo "Did you go too far back with '-r'?" >&2
	return 1
    fi

    if [ $DEBUG ]; then
	echo "would run: $PFEXEC $ZFS send -R -I $snap1 $DATASET@$snap2 |"
	echo "  $REMZFS_CMD recv $VERBOSE $RECV_OPT -F $REMPOOL"
    else
	if ! $PFEXEC $ZFS send -R -I $snap1 $DATASET@$snap2 | \
	    $REMZFS_CMD recv $VERBOSE $RECV_OPT -F $REMPOOL; then
	    echo 1>&2 "Error sending snapshot."
	    touch $LOCK
	    return 1
	fi
    fi
}

# begin main script
if [ -e $LOCK ]; then
    # this would be nicer as SMF maintenance state
    if [ -s $LOCK ]; then
	# in normal mode, only send one email about the failure, not every run
	if [ "$VERBOSE" ]; then
	    echo "Service is in maintenance state; please correct and then"
	    echo "rm $LOCK before running again."
	fi
    else
	# write something to the file so it will be caught by the above
	# test and cron output (and thus, emails sent) won't happen again
	echo "Maintenance mode, email has been sent once." > $LOCK
	echo "Service is in maintenance state; please correct and then"
	echo "rm $LOCK before running again."
    fi
    exit 2
fi

if [ -e "$PID" ]; then
    [ "$VERBOSE" ] && echo "Backup job already running!"
    exit 0
fi
echo $$ > $PID

FAIL=0
# get the datasets that have our backup property set
COUNT=$($ZFS get -s local -H -o name,value $PROP | wc -l)
if [ $COUNT -lt 1 ]; then
    echo "No datasets configured for backup! Please set the '$PROP' property"
    echo "appropriately on the datasets you wish to back up."
    rm $PID
    exit 2
fi
# Feed the loop from a here-document rather than a pipeline, so that updates
# to $FAIL inside the loop remain visible afterwards even in shells that run
# pipeline components in subshells (e.g. bash, ash).
while read dataset value
do
    case $value in
	# property values:
	#   Given the hierarchy pool/a/b,
	#   * fullpath: replicate to backuppool/a/b
	#   * basename: replicate to backuppool/b
	fullpath)	[ $VERBOSE ] && printf '\n%s:\n' "$dataset"
		do_backup $dataset -d
		STATUS=$?
		;;
	basename)	[ $VERBOSE ] && printf '\n%s:\n' "$dataset"
		do_backup $dataset -e
		STATUS=$?
		;;
	rootfs)	[ $VERBOSE ] && printf '\n%s:\n' "$dataset"
		if [ "$dataset" = "$(basename $dataset)" ]; then
		    do_backup $dataset rootfs
		    STATUS=$?
		else
		    echo "Warning: $dataset has 'rootfs' backuptarget property but is a non-root filesystem -- skipping." >&2
		    STATUS=2
		fi
		;;
	*)	echo "Warning: $dataset has invalid backuptarget property '$value' -- skipping." >&2
		STATUS=2
		;;
    esac
    if [ $STATUS -gt 0 ]; then
	FAIL=$((FAIL | STATUS))
    fi
done <<EOF
$($ZFS get -s local -H -o name,value $PROP)
EOF
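
# FAIL is a bitmask: bit 0 (value 1) means at least one backup failed; bit 1
# (value 2) means at least one dataset had a misconfigured $PROP value.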
if [ $FAIL -gt 0 ]; then
    if [ $((FAIL & 1)) -gt 0 ]; then
	echo "There were errors backing up some datasets." >&2
    fi
    if [ $((FAIL & 2)) -gt 0 ]; then
	echo "Some datasets had misconfigured $PROP properties." >&2
    fi
fi

rm $PID
exit $FAIL

--------------------------------------------------------------------------------