├── LICENSE ├── movestough-ownership.conf ├── movestough-scaffolding.conf ├── README.md └── movestough.sh /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 John Mark Mitchell 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /movestough-ownership.conf: -------------------------------------------------------------------------------- 1 | # This is the config file containing destination directory ownership to enforce 2 | # 3 | # Each line below should contain an owner:group pair followed by a tab followed 4 | # by a fully qualified path to which the owner information should be applied. 5 | # For safety reasons directories specified below must be subdirectories of the 6 | # destination directory. 
7 | # 8 | # This file is not required for the script to run but, if needed, should be 9 | # specified via the -o or --ownership flag. 10 | # 11 | # More info: 12 | # By default, the contents that are moved from the source directory (specified 13 | # via the -s or --source flag) to the destination directory (-d or 14 | # --destination flag) will inherit the ownership of the system/user account 15 | # under which the script is run. 16 | # 17 | # The default behavior can be overridden by supplying an owner:group pair and the 18 | # corresponding path that will be supplied to chown. Each line supplied below 19 | # will be recursively applied to the specified directory. 20 | # 21 | # Note that each line is processed in order. As a result, if a path is listed 22 | # multiple times or a parent path is listed after the subdirectory, the last 23 | # supplied ownership will win. This same effect can be used in reverse to 24 | # purposefully enforce different ownership on a subdirectory than is applied to 25 | # the parent directory. This is accomplished by supplying an entry below for the 26 | # subdirectory after the listing for the parent directory. 27 | # 28 | # example if the destination directory is "/shares/users/": 29 | # 30 | # root:root /shares/users/ 31 | # usera:usera /shares/users/User A/ 32 | # userb:userb /shares/users/User B/ 33 | # userc:userc /shares/users/User B/Private to User C/ 34 | # root:root /shares/users/admin/backups/ 35 | -------------------------------------------------------------------------------- /movestough-scaffolding.conf: -------------------------------------------------------------------------------- 1 | # This is the config file containing directory structures to preserve 2 | # 3 | # Each line below should be a fully qualified path to a directory housed in the 4 | # source directory path. 
The directory paths specified below (we'll call it 5 | # scaffolding) will be preserved from the directory clean up that happens after 6 | # all the incoming source files and directories are moved and cleaned up. This 7 | # file is not required for the script to run but, if needed, should be 8 | # specified via the -p or --preserve flag. 9 | # 10 | # More info: 11 | # By default, the contents of the source directory (specified via the -s or 12 | # --source flag) will be replicated to the destination directory (-d or 13 | # --destination flag) and the directory skeleton in the source directory will 14 | # be removed if the directory is empty and older than a specified aging period 15 | # (default is 15 mins). The aging period is a default behavior so as to supply 16 | # a reasonable window of time for a directory to be created and used without it 17 | # being deleted during other user-generated operations. 18 | 19 | # To preserve a scaffolding of directories in the source directory, include 20 | # the fully qualified paths to preserve below. If this config file is supplied to 21 | # the script via the -p or --preserve flag, this will result in the default 22 | # aging being overridden and the directories specified below being left intact 23 | # on the source side. 24 | # 25 | # It should be noted that this has no effect on default handling of files in 26 | # the directories specified below. Files are effectively moved from the source 27 | # directory to the destination directory regardless. 
28 | # 29 | # example if the source directory is "/shares/users/": 30 | # /shares/users/User A/ 31 | # /shares/users/User B/ 32 | # /shares/users/Another User/ 33 | 34 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # movestough.sh 2 | 3 | 4 | **tl;dr: Bash script to monitor a source directory and move any added objects to a target directory via fail-safe handling of data, then clean up the source directory as directed. All actions are logged.** 5 | 6 | 7 | ## Intended Use 8 | The principal use of this script is to move files from a source directory to a target directory, on the same filesystem, via fail-safe handling. 9 | 10 | For more background information, usage suggestions and the latest version of this script, please visit: https://github.com/jmmitchell/movestough 11 | 12 | 13 | **Strengths over alternative options:** 14 | For those more familiar with linux cli handiwork, you can think of this script as the best of `rsync`, `find`, `mv`, `mkdir` & `rmdir` rolled into one. 15 | 16 | This script excels in comparison to using `rsync` or `mv` alone in that it provides finer-grained handling of the move process. 17 | 18 | While there may be faster or less complicated ways to simply move some files, this script has been crafted to err on the side of data preservation and to produce an audit trail for all actions taken. 19 | 20 | To expand upon that, a few of the distinctive features are: 21 | 22 | (1) Every create/update/delete action can be logged in a timestamped log. The `mv` command does not do this and `rsync`, without some serious cajoling, omits key actions from its output and therefore leaves you with no record. 23 | 24 | (2) The directory structure on the source filesystem can be cleaned up once it is empty. 
Specifically, after moving the source files to their target, if there are empty directories on the source filesystem they can be deleted after a specified delay or they can be selectively preserved. The ability to clean up the empty source subdirectories after moving the files contained within is a feature that is sorely missing from `rsync`. 25 | 26 | (3) Directories that are targeted for deletion as part of the clean up process will not be deleted if they, for unforeseen reasons, are not empty when we attempt to delete them. This may seem obvious but is a subtle use-case that can easily manifest itself if this script is used on a computer where files are being added by other local processes (e.g. dropbox) or has a shared filesystem where network users may be actively creating new files in the same directory structure in which we are working. 27 | 28 | (4) Any attempt to move a file is done with the greatest care for data preservation. As an example, a file that appears to be a duplicate, by file name conflict, should be examined closely before any potentially destructive action is taken. Files that are suspected to be duplicates are validated by a hash function. If the source file (file to be moved) is found to be a byte-for-byte duplicate of the existing target (i.e. destination) file, the source file is not copied and can be safely disregarded. If, on the other hand, the source file is found to have the same name as the existing target file but contains different data, it will be moved but will be given a unique filename so as to preserve the data and allow a person to manually inspect both files after the fact and to make a judgement call on what to keep. 
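The duplicate check described in the preceding paragraph can be illustrated with a minimal sketch. It assumes `sha256sum` from GNU coreutils as the hash tool, which is not necessarily the mechanism the script itself uses, and the function names `is_exact_duplicate` and `classify_collision` are hypothetical helpers introduced only for this illustration:

```shell
#!/bin/bash
# Hypothetical helper, not part of movestough.sh: succeeds (returns 0) only
# when both paths are regular files and their SHA-256 digests match.
is_exact_duplicate() {
    local src="$1" dst="$2"
    [ -f "$src" ] && [ -f "$dst" ] || return 1
    local src_hash dst_hash
    # Hash from stdin so the file name is not part of the digest output.
    src_hash="$(sha256sum < "$src" | cut -d' ' -f1)" || return 1
    dst_hash="$(sha256sum < "$dst" | cut -d' ' -f1)" || return 1
    [ "$src_hash" = "$dst_hash" ]
}

# Hypothetical decision helper: prints "duplicate" when the source can be
# disregarded, or "conflict" when it must be kept under a unique name.
classify_collision() {
    if is_exact_duplicate "$1" "$2"; then
        printf 'duplicate\n'
    else
        printf 'conflict\n'
    fi
}
```

Calling `classify_collision "/incoming/a.jpg" "/media/a.jpg"` (example paths only) would report whether the source is a byte-for-byte copy or a same-name file with different content. Comparing digests rather than names or dates is what makes it safe to discard the source in the duplicate case.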
29 | 30 | 31 | **Feature summary:** 32 | 33 | - empty directories cleaned up from source directory (`rsync` does not do this) 34 | - via optional config, source subdirectories can be selectively preserved 35 | - consistent timestamped logging of all create/update/delete actions 36 | - log all items removed from the source directory (`rsync` does not do this) 37 | - log level can be specified to record increasing levels of detail 38 | - warning messages in logs include stderr and return code of failed command 39 | - two levels of verbose output are available when run interactively 40 | - target directory structure will be created, if needed 41 | - via optional config file, new ownership rules can be specified for items moved to the target directory 42 | - each potential file collision (files with the same name & date) is verified via checksum of both source and target files 43 | - verified file collisions are moved to the target via deconflicted filename as opposed to `rsync` delta copy that might replicate a partially written file 44 | - exact duplicate files, verified via checksum, are not replicated 45 | - if source and target are on the same filesystem, files are "moved" super fast via metadata update (directory entries) rather than reading and rewriting file contents 46 | - file-level atomic moves ensuring there is never a partially moved file 47 | 48 | 49 | **Shout-outs:** 50 | 51 | > If I have seen further than others, it is by standing upon the shoulders of giants. 52 | > - Isaac Newton 53 | 54 | There are many nameless people who have unselfishly contributed their time to help share their enthusiasm for and knowledge about solving problems via software development. There are not enough words to properly thank each. There are also some notably brilliant minds who had creative solutions to challenges that were faced in wrangling bash into doing what was needed in this script. 
References to their contributions are included below for your further reference and enjoyment: 55 | 56 | - [handling of various levels of verbose output via file descriptors](http://stackoverflow.com/a/20942015/171475) 57 | - [capturing stderr, stdout, return code from a command executed in a subshell](http://stackoverflow.com/a/26827443/171475) 58 | 59 | 60 | **WARNING** 61 | It should be recognized that while this script makes every attempt to fail safely, there are most probably edge cases that are not accounted for, so proceed with caution, question everything and, if you find bugs please contribute back with a pull request. You have been warned. 62 | 63 | 64 | **LICENSE** 65 | This software is licensed using the MIT License 66 | 67 | Copyright (c) 2016 John Mark Mitchell 68 | 69 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 70 | 71 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 72 | 73 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 74 | SOFTWARE. 75 | 76 | 77 | 78 | 79 | ## The Details 80 | 81 | 1. 
Using rsync, look in the source directory for files, directories and links that need to be moved to the destination directory. 82 | 2. Using rsync, replicate the identified structural items (folders & links) to the destination directory, applying attribute changes as needed to mirror the source. Once replicated, rsync will remove the source items. An exception to this is subdirectories. These are handled last. 83 | 3. For completely new files found in the source directory, move them via the mv command. 84 | 4. Manage any possible file collisions by making offending source file names unique before moving them to the target directory. The collision-free change list is fed to mv to complete the move from source to destination. 85 | 5. For files that appear exactly the same (size, attributes and change date) in the source and destination, check each via hash against the matching destination file. If the hashes match, there is no need to mv the file, so the source file is deleted. Else, move the file via unique file name. Uniqueness style can be specified via the `--unique-style` flag (or `-u` for short). 86 | 6. If changes were made in the destination directory, check the supplied ownership config file and enforce ownership as indicated. 87 | 7. Look for stale, empty directories in the source directory and remove them. Staleness is determined by the change date as compared to the number of minutes passed into the flag `--minutes-until-stale` (or `-ms` for short). If no flag value is set, staleness defaults to 15 mins. 88 | 89 | **Dependencies** 90 | 91 | - linux - versions / flavors?? 92 | - bash and builtins (declare, echo, let, local, printf, read) 93 | - external dependencies 94 | - mkdir 95 | - rsync 96 | - mv 97 | 98 | 99 | 100 | ## Usage 101 | The script accepts arguments via the following flags: 102 | 103 | > **`-s=` or `--source=`** 104 | REQUIRED. Source directory path. 
Should be a path to the source directory to inspect for new file system changes that will be moved to the target directory. The argument passed can be a symbolic link to the intended source directory. 105 | 106 | > **`-d=` or `--destination=`** 107 | REQUIRED. Destination directory path. Should be a path to the target directory to which source directory additions will be added. The argument passed can be a symbolic link to the intended destination directory. 108 | 109 | 110 | > **`-p=` or `--preserve=`** 111 | Directories to preserve config file. Parameter should be the path to the file that contains directives about source subdirectories to preserve from the stale directory clean up process. 112 | 113 | > **`-o=` or `--ownership=`** 114 | Ownership config file. Parameter should be the path to the file that contains file ownership and group permissions to enforce on the files and subdirectories of the target path. 115 | 116 | > **`-l=` or `--log=`** 117 | Log File. Parameter should be the path to the file where the script should log changes. 118 | 119 | > **`-ms=` or `--minutes-until-stale=`** 120 | Minutes until directories are stale. Parameter should be a whole number. After files are moved from the source directory to the target directory, the subdirectories of the source directory are left intact for a set period of time. By default that is 15 minutes. The number supplied as the parameter can override the default behavior. 121 | 122 | > **`-u=` or `--unique-style=`** 123 | Style to use when renaming files. 
Source files that are conflicting (same name but different content) with files in the target directory are made unique so that they can be moved to the target directory via one of the following styles: 124 | >> `1` : style 1, unique string is inserted at the last period found in the filename 125 | >> `2` : style 2, unique string is inserted at the first period of the filename 126 | >> < default >: if no style is specified, a unique string is appended to the end of the file name 127 | 128 | > **`-v=` or `--verbose=`** 129 | Verboseness level of the interactive output. Options are: 130 | >> `1` : basic information for each major file type 131 | >> `2` : detailed information about each action taken 132 | 133 | **CLI** 134 | To run the script only two parameters are required: 135 | 1. source directory path (via `-s` or `--source`), and 136 | 2. destination directory path (via `-d` or `--destination`) 137 | 138 | An example of how the script might be typically used: 139 | 140 | /my/scripts/movestough.sh \ 141 | -s=/incoming/pictures/ \ 142 | -d="/media/pictures/to be processed/" \ 143 | -p=/my/scripts/movestough-scaffolding.conf \ 144 | -o=/my/scripts/movestough-ownership.conf \ 145 | -l=/var/log/movestough.log \ 146 | -v=2 \ 147 | -u=1 148 | 149 | **Cron** 150 | To run the script in an automated, unattended manner, the preferred method is via `cron`. 151 | First, if you are unfamiliar with `cron`, [read the cron man page](http://linux.die.net/man/5/crontab). It is important to use `flock` when running this script from a crontab. More details on `flock` can be found on [the flock man page](http://linux.die.net/man/2/flock). 152 | This allows the script to be executed often but keeps the system from having 153 | simultaneous copies of the script running. Not only is this resource (CPU, RAM) friendly, this eliminates race conditions and other nasty unintended side effects. 
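To illustrate the locking behaviour described above, here is a minimal sketch, assuming the `flock` utility from util-linux is installed. The function name `run_exclusively` and the lock file path in the usage note are hypothetical and are not part of movestough.sh:

```shell
#!/bin/bash
# Hypothetical wrapper: run a command only if nobody else holds the lock.
# $1 is the lock file; the remaining arguments are the command to run.
run_exclusively() {
    local lockfile="$1"
    shift
    # -n: fail immediately instead of waiting when the lock is already held,
    # so an overlapping run is skipped rather than queued.
    flock -n "$lockfile" "$@"
}
```

For example, `run_exclusively /tmp/movestough-demo.lock /my/scripts/movestough.sh -s=/incoming/pictures/ -d=/media/pictures/` would start the script only when no other invocation holds that lock. If the lock is taken, `flock` returns a non-zero status and the command is not started.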
154 | 155 | A suggested crontab entry to run the script every 3 minutes with the same configuration as the CLI example above would look something like: 156 | 157 | 158 | */3 * * * * /usr/bin/flock -w 0 -n /var/lock/movestough.lock /my/scripts/movestough.sh -s=/incoming/pictures/ -d="/media/pictures/to be processed/" -p=/my/scripts/movestough-scaffolding.conf -o=/my/scripts/movestough-ownership.conf -l=/var/log/movestough.log -u=1 159 | 160 | It is unfortunate that crontab entries have no means of inline commenting or multiline configuration. To be accurate, the above entry is displayed as a single line as it would need to be in your crontab. If you want to understand the details of the example crontab entry, it is recommended that you copy the contents above to a text editor where you can enable line wrapping or temporarily insert white space to better visualize the contents. 161 | 162 | If you expect to use the script in an unattended manner for an extended period of time, consider using `logrotate` to manage your logs so they do not grow indefinitely. If you are unfamiliar with `logrotate`, [read the logrotate man page](http://linux.die.net/man/8/logrotate). 
An example config for `logrotate` might look something like: 163 | 164 | /var/log/movestough*.log { 165 | size 10M 166 | weekly 167 | rotate 12 168 | maxage 90 169 | compress 170 | missingok 171 | copytruncate 172 | } 173 | 174 | 175 | 176 | ## History 177 | **changelog:** 178 | 179 | 2016-05-04 no change; verified that source or target directory arguments can be a symbolic link to a directory and the script functions properly on both fronts 180 | 2016-05-04 no change; verified that `if [ -d "/dev/null" ]` fails which keeps it from erroneously being supplied as the source or target directory 181 | 2016-04-28 added checks to `make_filename_unique` function to account for file names that don't have a period or an extension 182 | 2016-04-28 fix display of verbose level number on interactive run 183 | 2016-04-27 fixed error where `--ownership` was assumed active even if it was not specified 184 | 2016-04-26 fixed error where `--preserve` was assumed active even if it was not specified 185 | 2016-04-25 corrected verbose output messages for file collision processing 186 | 2016-04-24 improved `make_filename_unique` function allowing a style number to be specified via a `--unique-style=` (or `-u=` for short) flag 187 | 2016-04-23 just in case it was not specified via a `--preserve=` (or `-p` for short) config file, the source directory is preserved during the stale directory clean up process 188 | 2016-04-23 updated docs and comments to prepare for github 189 | 2016-04-20 added basic exit codes 190 | 2016-04-18 changed verbose output from notify function to use file descriptors (allows safe handling of all chars in output) 191 | 2016-04-17 fixed use of vars in `printf` statements; example: `printf "Hello, %s\n" "$NAME"` 192 | 2016-04-16 standardized log file output mimicking `rsync` bit flag output 193 | 2015-04-15 added `cmd_out`, `cmd_err`, `cmd_rtn` fu to each `rsync`, `mv`, `rmdir` command 194 | 2016-04-10 fixed nasty bug in stale dir check introduced by file 
checks 195 | 2016-04-09 optimized flag checks and added file exist/write checks 196 | 2016-04-07 fixed egregious bug with rsync use (missing `--dry-run`) 197 | 2016-04-05 add destination file ownership via a conf file 198 | 2016-04-05 add verbose level2 checking for errors of file mv commands 199 | 2016-04-04 add handling of commented lines, as delimited by the # symbol, in the input config files 200 | 2016-04-03 add levels of verboseness to stdout 201 | 2016-04-02 standardize verbose output to stdout in function 202 | 2016-03-30 add args for source and destination directories 203 | 2016-03-29 allow directories to be preserved in source via conf file 204 | 2016-03-27 rewrite to use rsync to only list changes (fixes overwrites) 205 | 2016-12-02 updated README.md and comments to clarify intended uses and distinctive features of this script over traditional alternatives like using `rsync` or `mv` alone 206 | 207 | 208 | **backlog in no particular order:** 209 | 210 | 211 | - verify what happens if source and target are the same 212 | - create a help function triggered by checking for `--help` or `-?` 213 | - implement check on success of rsync when creating the checksum_match 214 | - add some checking for dependencies and minimum versions of bash, rsync, etc.; once discovered, this should also be added to the documentation in the dependencies section of the README 215 | - add the ability for the script to be run with prompts for required parameters 216 | - add the ability to daemonize the script 217 | - implement an internal list of files or directory changes and narrow the use of chown to only those 218 | - consider the use of `trap` to ensure consistent state when forced to exit 219 | - clean up output and make it more uniform between levels of verboseness 220 | - for efficiency, the structure changes loop should ignore owner and group differences because we very well might be enforcing the variance; this manifests itself when directories are not in the 
scaffolding to keep but have yet to be marked as stale 221 | - consider use of echo `-n` or `<<<` to suppress newlines or eliminate piped sed 222 | - verify proper use of `[[ ]]` as opposed to `[ ]` in if statements 223 | - consider converting `[[ ]]` to `(( ))` when the comparison is numerical (as opposed to string comparison) 224 | - periodically test and check notices from http://www.shellcheck.net/ 225 | - implement check on success of `rsync` in hash check 226 | - add different deconflicting options, like keeping file extension intact 227 | - add levels of verboseness to logging 228 | - test links following behavior 229 | - test moving of links, esp in light of: `--include='*/' --exclude='*'` 230 | - test chown ownership of links see: `--no-dereference` flag in man page 231 | - consider grouping piped sed and grep to one: `sed -e 'pattern 1' -e 'pattern 2'` 232 | - implement use of warning_count and output even if not verbose or on stderr 233 | - create option to inherit default ownership of receiving dir 234 | - create option to inherit default permissions of receiving dir 235 | - create a default ownership/permission behavior to change the files/directories that were changed 236 | - if `mv --no-clobber` fails, see if it can be determined if collision, add to collision list 237 | - check the difference in `rsync --perms --owner --group` and `--acls` 238 | - add a feature to let files inherit the ownership from parent directory via: chown -R `stat . -c %u:%g` * 239 | - at present files are filtered for ^>f.s|>f..t therefore attribute changes would be ignored, change to accommodate other changes 240 | - add a `--dry-run` feature 241 | - verify that the `rm --force` is not able to delete open, partially written files. 
242 | 243 | 244 | 245 | 246 | 247 | 248 | 249 | -------------------------------------------------------------------------------- /movestough.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ################################### LICENSE ################################### 4 | # 5 | # This software is licensed using the MIT License 6 | # 7 | # Copyright (c) 2016 John Mark Mitchell 8 | # 9 | # Permission is hereby granted, free of charge, to any person obtaining a copy 10 | # of this software and associated documentation files (the "Software"), to deal 11 | # in the Software without restriction, including without limitation the rights 12 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 13 | # copies of the Software, and to permit persons to whom the Software is 14 | # furnished to do so, subject to the following conditions: 15 | # 16 | # The above copyright notice and this permission notice shall be included in 17 | # all copies or substantial portions of the Software. 18 | # 19 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 20 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 21 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 22 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 23 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 24 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 25 | # SOFTWARE. 26 | # 27 | ############################################################################### 28 | 29 | 30 | ################################# READ ME TOO ################################# 31 | # tl;dr: Move files to target without loss. Clean up source directory. 
32 | # 33 | # Intended use: 34 | # The principal use of this script is to move files from a source directory to 35 | # a target directory, on the same filesystem, via fail-safe handling. 36 | # 37 | # For more background information, usage suggestions and the latest version of 38 | # this script, please visit: https://github.com/jmmitchell/movestough 39 | # 40 | # 41 | # What it is not intended for: 42 | # This software is not intended for copying files to a remote filesystem. If that 43 | # is what you need, then Rsync, Unison, SyncThing or a host of other 44 | # options might be a place to start. In addition, this software is really 45 | # not intended to be backup software by itself, though it certainly could be 46 | # part of an overall backup plan. 47 | # 48 | # 49 | # Strengths over alternative options: 50 | # For those more familiar with linux cli handiwork, you can think of this 51 | # script as the best of rsync, find, mv, mkdir & rmdir rolled into one. 52 | # 53 | # This script excels in comparison to using rsync or mv alone in that it 54 | # provides finer-grained handling of the move process. 55 | # 56 | # While there may be faster or less complicated ways to simply move some files, 57 | # this script has been crafted to err on the side of data preservation and to 58 | # produce an audit trail for all actions taken. 59 | # 60 | # To expand upon that, a few of the distinctive features are: 61 | # 62 | # (1) Every create/update/delete action can be logged in a timestamped log. The 63 | # mv command does not do this and rsync, without some serious cajoling, omits 64 | # key actions from its output and therefore leaves you with no record. 65 | # 66 | # (2) The directory structure on the source filesystem can be cleaned up once 67 | # it is empty. 
Specifically, after moving the source files to their target, 68 | # if there are empty directories on the source filesystem they can be deleted after 69 | # a specified delay or they can be selectively preserved. The ability to clean 70 | # up the empty source subdirectories after moving the files contained within 71 | # is a feature that is sorely missing from rsync. 72 | # 73 | # (3) Directories that are targeted for deletion as part of the clean up 74 | # process will not be deleted if they, for unforeseen reasons, are not empty 75 | # when we attempt to delete them. This may seem obvious but is a subtle 76 | # use-case that can easily manifest itself if this script is used on a computer 77 | # where files are being added by other local processes (e.g. dropbox) or has a 78 | # shared filesystem where network users may be actively creating new files in 79 | # the same directory structure in which we are working. 80 | # 81 | # (4) Any attempt to move a file is done with the greatest care for data 82 | # preservation. As an example, a file that appears to be a duplicate, by file 83 | # name conflict, should be examined closely before any potentially destructive 84 | # action is taken. Files that are suspected to be duplicates are validated by a 85 | # hash function. If the source file (file to be moved) is found to be a 86 | # byte-for-byte duplicate of the existing target (i.e. destination) file, the 87 | # source file is not copied and can be safely disregarded. If, on the other hand, 88 | # the source file is found to have the same name as the existing target file 89 | # but contains different data, it will be moved but will be given a unique 90 | # filename so as to preserve the data and allow a person to manually inspect 91 | # both files after the fact and to make a judgement call on what to keep. 
92 | # 93 | # 94 | # Feature summary: 95 | # - empty directories cleaned up from source directory (rsync does not do this) 96 | # - via optional config, source subdirectories can be selectively preserved 97 | # - consistent timestamped logging of all create/update/delete actions 98 | # - log all items removed from the source directory (rsync does not do this) 99 | # - log level can be specified to record increasing levels of detail 100 | # - warning messages in logs include stderr and return code of failed command 101 | # - two levels of verbose output are available when run interactively 102 | # - target directory structure will be created, if needed 103 | # - via optional config file, new ownership rules can be specified for items 104 | # moved to the target directory 105 | # - each potential file collision (files with the same name & date) is 106 | # verified via checksum of both source and target files 107 | # - verified file collisions are moved to the target via deconflicted filename 108 | # as opposed to rsync delta copy that might replicate a partially written file 109 | # - exact duplicate files, verified via checksum, are not replicated 110 | # - if source and target are on the same filesystem, files are "moved" super 111 | # fast via metadata update (directory entries) rather than reading and 112 | # rewriting file contents 113 | # - file-level atomic moves ensuring there is never a partially moved file 114 | # 115 | # 116 | # High level steps taken by the script in performing its task: 117 | # 1. Using rsync, look in the source directory for files, directories and links 118 | # that need to be moved to the destination directory. 119 | # 2. Using rsync, replicate the identified structural items (folders & links) 120 | # to the destination directory, applying attribute changes as needed to 121 | # mirror the source. Once replicated, rsync will remove the source items. An 122 | # exception to this is subdirectories. These are handled last. 123 | # 3. 
For completely new files found in the source directory, move them via the 124 | # mv command. 125 | # 4. For files that appear exactly the same (size, attributes and change date) 126 | # in the source and destination, check each via hash against the matching 127 | # destination file. If the hashes are the same, there is no need to mv the 128 | # file, so the source file is deleted. 129 | # 5. Manage any possible file collisions by making offending source file names 130 | # unique while moving them to the target directory. The collision-free 131 | # change list is then fed to mv to complete the move from source to 132 | # destination. 133 | # 6. If changes were made in the destination directory, check the supplied 134 | # ownership config file and enforce ownership as indicated. 135 | # 7. Look for stale, empty directories in the source directory and remove them. 136 | # Staleness is determined by the change date as compared to the number of 137 | # minutes passed into the flag --minutes-until-stale (or -ms for short). If 138 | # no flag is set, staleness defaults to 15 mins. 139 | # 140 | # 141 | # *** IMPORTANT *** 142 | # It is important to use flock when running this script from a crontab. This 143 | # allows the script to be executed often but keeps the system from having 144 | # simultaneous copies of the script running. Not only is this resource (CPU, 145 | # RAM) friendly, this eliminates race conditions and other nasty unintended 146 | # side effects. For more information, see the readme on github: 147 | # https://github.com/jmmitchell/movestough/blob/master/README.md 148 | # ***************** 149 | # 150 | # 151 | # **** WARNING **** 152 | # It should be recognized that, while this script makes every attempt to fail 153 | # safely, there are most probably edge cases that are not accounted for so 154 | # proceed with caution, question everything and, if you find bugs please 155 | # contribute back with a pull request on github. You have been warned. 
156 | # ***************** 157 | # 158 | ############################################################################### 159 | 160 | 161 | 162 | ################################## FUNCTIONS ################################## 163 | # 164 | 165 | function make_filename_unique { 166 | # were we passed anything? 167 | if [ -n "$1" ]; then 168 | local unique_string="deconflicted-$(date +"%Y-%m-%d-%H-%M-%S-%N")" 169 | # get the file name part of the filepath 170 | local filename="$(basename "$1")" 171 | # get the directory path part of the filepath 172 | local path="$(dirname "$1")/" 173 | 174 | if [[ "${UNIQUESTYLE}" -eq "1" ]]; then 175 | # shortest extension; insert string before 176 | # Get the root name of $filename, taking the last occurrence of . as 177 | # the end of the name. In other words, remove the shortest possible 178 | # match of .* from the end of $filename. 179 | local rootname="${filename%.*}" 180 | local ext="${filename##*.}" 181 | if [[ "${filename}" != *.* ]]; then 182 | ext="" # no dot in the name, so there is no extension to keep 183 | fi 184 | printf -- "%s.%s" "${path}${rootname}" "${unique_string}${ext:+.${ext}}" 185 | elif [[ "${UNIQUESTYLE}" -eq "2" ]]; then 186 | # longest extension; insert string before 187 | # Get the root name of $filename, taking the first occurrence of . 188 | # as the end of the name. In other words, remove the longest 189 | # possible match of .* from the end of $filename.
190 | local rootname="${filename%%.*}" 191 | local ext="${filename#*.}" 192 | if [[ "${filename}" != *.* ]]; then 193 | ext="" # no dot in the name, so there is no extension to keep 194 | fi 195 | printf -- "%s.%s" "${path}${rootname}" "${unique_string}${ext:+.${ext}}" 196 | else 197 | # append the string at the end 198 | printf -- "%s-%s" "${1}" "${unique_string}" 199 | fi 200 | else 201 | echo "" 202 | fi 203 | } 204 | 205 | function uuid { 206 | local N B C='89ab' 207 | for (( N=0; N < 16; ++N )) ; do 208 | B=$(( RANDOM%256 )) 209 | case $N in 210 | 6) 211 | printf -- '4%x' $(( B%16 )) 212 | ;; 213 | 8) 214 | printf -- '%c%x' ${C:$RANDOM%${#C}:1} $(( B%16 )) 215 | ;; 216 | 3 | 5 | 7 | 9) 217 | printf -- '%02x-' $B 218 | ;; 219 | *) 220 | printf -- '%02x' $B 221 | ;; 222 | esac 223 | done 224 | echo 225 | } 226 | 227 | function adddate { 228 | while IFS= read -r line; do 229 | printf -- '%s\t%s\n' "$(date)" "${line}" 230 | done <<<"${1}" 231 | } 232 | 233 | function logger { 234 | # was a param passed and does LOGFILE have some value 235 | if [ -n "$1" ] && [ -n "${LOGFILE}" ]; then 236 | # create the log file if it is missing, then test that it is writable 237 | if { [[ -e "${LOGFILE}" ]] || touch "${LOGFILE}" 2>/dev/null; } && [[ -w "${LOGFILE}" ]]; then 238 | # prepend date then write to log, ensuring a newline at the end 239 | adddate "$1" >> "${LOGFILE}" 240 | fi 241 | fi 242 | } 243 | 244 | # 245 | ############################################################################### 246 | 247 | 248 | 249 | ##################### PROCESS ARGUMENTS & SET DEFAULTS ######################## 250 | # 251 | 252 | # set some default values 253 | MINSUNTILSTALE="15" 254 | exit_now="0" 255 | 256 | # use file descriptors 3 through 5 for our verbose output 257 | # important to start counting at 2 so that any increase to this will result in 258 | # a min of file descriptor 3 (remember 0-2 are used by stdin, stdout, stderr) 259 | VERBOSITYMIN="2" 260 | 261 | # to start VERBOSELEVEL = VERBOSITYMIN, effectively turning off verbosity 262 | VERBOSELEVEL="${VERBOSITYMIN}" 263 | 264 | # the highest
verbosity level 265 | VERBOSITYMAX="5" 266 | 267 | 268 | # process arguments passed to the script 269 | for opt in "$@"; do 270 | case $opt in 271 | -v=*|--verbose=*) 272 | if [[ "${opt#*=}" =~ ^[1-3]$ ]]; then 273 | let VERBOSELEVEL+="${opt#*=}" 274 | fi 275 | # if the arg was passed more than once, cap the level at VERBOSITYMAX 276 | if [[ "${VERBOSELEVEL}" -gt "${VERBOSITYMAX}" ]]; then 277 | VERBOSELEVEL="${VERBOSITYMAX}" 278 | fi 279 | shift # past argument=value 280 | ;; 281 | -v|--verbose) 282 | if [[ "${VERBOSELEVEL}" -lt "3" ]]; then 283 | VERBOSELEVEL="3" 284 | fi 285 | shift # past argument with no value 286 | ;; 287 | -s=*|--source=*) 288 | SOURCEPATH="${opt#*=}" 289 | if [ -d "${opt#*=}" ]; then 290 | # cd and pwd magic ensures that the path is fully qualified, but 291 | # without link expansion. pwd always returns paths without the 292 | # trailing slash and rsync is sensitive to this so we append one 293 | SOURCEPATH="$(cd "${opt#*=}"; pwd)/" 294 | # SOURCEPATH_SLASH_ESCAPED="${SOURCEPATH//\//\\/}" 295 | # SOURCEPATH_SPACE_ESCAPED="${SOURCEPATH// /\\ }" 296 | else 297 | printf -- '\n%s : the supplied source path is not valid. No such directory.\n' "${opt}" 298 | exit_now="1" 299 | fi 300 | shift # past argument=value 301 | ;; 302 | -d=*|--destination=*) 303 | DESTINATIONPATH="${opt#*=}" 304 | if [ -d "${opt#*=}" ]; then 305 | # cd and pwd magic ensures that the path is fully qualified, but 306 | # without link expansion. pwd always returns paths without the 307 | # trailing slash and rsync is sensitive to this so we append one 308 | DESTINATIONPATH="$(cd "${opt#*=}"; pwd)/" 309 | # DESTINATIONPATH_SLASH_ESCAPED="${DESTINATIONPATH//\//\\/}" 310 | # DESTINATIONPATH_SPACE_ESCAPED="${DESTINATIONPATH// /\\ }" 311 | else 312 | printf -- '\n%s : the supplied destination path is not valid.
No such directory.\n' "${opt}" 313 | exit_now="1" 314 | fi 315 | shift # past argument=value 316 | ;; 317 | -p=*|--preserve=*) 318 | DIRSTOPRESERVEFILE="${opt#*=}" 319 | # test if file exists and is readable 320 | if ! [ -r "${DIRSTOPRESERVEFILE}" ]; then 321 | printf -- '\n%s : either the file does not exist or is not readable. Check that the path is correct and that permissions are set correctly.\n' "${opt}" 322 | exit_now="1" 323 | fi 324 | shift # past argument=value 325 | ;; 326 | -o=*|--ownership=*) 327 | OWNERSHIPFILE="${opt#*=}" 328 | # test if file exists and is readable 329 | if ! [ -r "${OWNERSHIPFILE}" ]; then 330 | printf -- '\n%s : either the file does not exist or is not readable. Check that the path is correct and that permissions are set correctly.\n' "${opt}" 331 | exit_now="1" 332 | fi 333 | shift # past argument=value 334 | ;; 335 | -ms=*|--minutes-until-stale=*) 336 | if [[ "${opt#*=}" =~ ^[0-9]+$ ]]; then 337 | MINSUNTILSTALE="${opt#*=}" 338 | fi 339 | shift # past argument=value 340 | ;; 341 | -l=*|--log=*) 342 | LOGFILE="${opt#*=}" 343 | # create the log file if it is missing, then test that it is writable 344 | if ! { [[ -e "${LOGFILE}" ]] || touch "${LOGFILE}" 2>/dev/null; } || [[ ! -w "${LOGFILE}" ]]; then 345 | printf -- '\n%s : either the file could not be created or is not writable.
Check that the path is correct and that permissions are set correctly.\n' "${opt}" 346 | exit_now="1" 347 | fi 348 | shift # past argument=value 349 | ;; 350 | -u=*|--unique-style=*) 351 | if [[ "${opt#*=}" =~ ^[0-2]+$ ]]; then 352 | UNIQUESTYLE="${opt#*=}" 353 | fi 354 | shift # past argument=value 355 | ;; 356 | *) 357 | # unknown option 358 | printf -- '\n%s : unknown option\n' "${opt}" 359 | ;; 360 | esac 361 | done 362 | 363 | # check for the minimum required arguments 364 | if [ -z "${SOURCEPATH}" ] || [ -z "${DESTINATIONPATH}" ]; then 365 | printf -- '\n A source and target path are required arguments.\n' 366 | exit_now="1" 367 | fi 368 | 369 | # if critical information is missing or wrong, exit the script 370 | if [ "${exit_now}" -eq "1" ]; then 371 | printf -- "\nPREVIOUS ISSUE(S) HAVE CAUSED A FATAL ERROR.\n\nExiting now.\n\n" 372 | exit 1; 373 | fi 374 | 375 | 376 | # open the file descriptors needed to handle verbose output 377 | # start counting from 3 since 1 and 2 are standards (stdout/stderr). 378 | for level in $(seq 3 "${VERBOSELEVEL}"); do 379 | (( "${level}" <= "${VERBOSITYMAX}" )) && eval exec "${level}>&2" # Don't change anything higher than the maximum verbosity allowed. 
380 | done 381 | 382 | # any handler higher than requested, pipe to null 383 | for level in $(seq $(( VERBOSELEVEL+1 )) "${VERBOSITYMAX}" ); do 384 | # start at 3 (1 & 2 are stdout and stderr) direct these to bitbucket 385 | (( "${level}" >= "3" )) && eval exec "${level}>/dev/null" 386 | done 387 | 388 | 389 | 390 | # if verbose output was triggered, list out the args 391 | if [[ "${VERBOSELEVEL}" -gt "${VERBOSITYMIN}" ]]; then 392 | printf -- 'Verbose mode was triggered...\n' 393 | printf -- ' Verbose Level = %s\n' "$(( ${VERBOSELEVEL} - ${VERBOSITYMIN} ))" 394 | printf -- ' Source Path = %s\n' "${SOURCEPATH:- < none >}" 395 | printf -- ' Destination Path = %s\n' "${DESTINATIONPATH:- < none >}" 396 | printf -- ' Directories to Preserve Config = %s\n' "${DIRSTOPRESERVEFILE:- < none >}" 397 | printf -- ' Log File = %s\n' "${LOGFILE:- < none >}" 398 | printf -- ' Ownership Config = %s\n' "${OWNERSHIPFILE:- < none >}" 399 | printf -- ' Minutes Until Directories Are Stale = %s\n' "${MINSUNTILSTALE:- < none >}" 400 | printf -- ' Deconflict Files With Style Number = %s\n\n' "${UNIQUESTYLE:- < none >}" 401 | fi 402 | 403 | # 404 | ############################################################################### 405 | 406 | 407 | 408 | # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 409 | # 410 | # tl;dr: Move files from source to destination while accounting for collisions. 411 | # 412 | # The details: 413 | # 1. Rsync is employed to look for new changes in the source directory that are 414 | # not in the destination directory. 415 | # 2. The --whole-file flag is used to force rsync to use name, size and change 416 | # date/time to determine file uniqueness rather than using a file checksum. 417 | # This will help to keep CPU and RAM usage low.
418 | # NOTE this has one possible edge case that could be undesirable: if the 419 | # source directory contained files, like those used in databases, where 420 | # changes could be made to the internals of the file and yet the file change 421 | # date and size remain unchanged, this change would go unnoticed by rsync 422 | # when run with the --whole-file flag. If this scenario is anticipated, be 423 | # certain to remove this flag below and test, test, test before relying on 424 | # this script to handle your precious data. If you do have these kinds of 425 | # files it is recommended that you utilize a utility specifically written for 426 | # backing up such files. Checksum is later used and would catch this change 427 | # so data would not be lost but for things like actively written DB data 428 | # files, look into the tools specifically designed to handle them. 429 | # Use the right tool to back them up. 430 | # 3. The --dry-run flag is used to force rsync to simulate the move but not 431 | # act on the changes, thus we get a targeted list of changes to process. 432 | # 4. The combination of the --itemize-changes flag and the -vvv flag gives us 433 | # detailed output of any file system changes that rsync found in the source 434 | # directory when compared to the target directory. The bit flags produced by 435 | # rsync's itemized changes are parsed to determine required actions.
436 | # ------------------------------------------------------ 437 | # Explanation of each bit position and value in rsync's output: 438 | # YXcstpoguax path/to/file 439 | # ||||||||||| 440 | # ||||||||||╰- x: The extended attribute information changed 441 | # |||||||||╰-- a: The ACL information changed 442 | # ||||||||╰--- u: The u slot is reserved for future use 443 | # |||||||╰---- g: Group is different 444 | # ||||||╰----- o: Owner is different 445 | # |||||╰------ p: Permissions are different 446 | # ||||╰------- t: Modification time is different 447 | # |||╰-------- s: Size is different 448 | # ||╰--------- c: Different checksum (for regular files), or 449 | # || changed value (for symlinks, devices, and special files) 450 | # |╰---------- the file type: 451 | # | f: for a file, 452 | # | d: for a directory, 453 | # | L: for a symlink, 454 | # | D: for a device, 455 | # | S: for a special file (e.g. named sockets and fifos) 456 | # ╰----------- the type of update being done: 457 | # <: file is being transferred to the remote host (sent) 458 | # >: file is being transferred to the local host (received) 459 | # c: local change/creation for the item, such as: 460 | # - the creation of a directory 461 | # - the changing of a symlink, 462 | # - etc. 463 | # h: the item is a hard link to another item (requires 464 | # --hard-links). 465 | # .: the item is not being updated (though it might have 466 | # attributes that are being modified) 467 | # *: means that the rest of the itemized-output area contains 468 | # a message (e.g.
"deleting") 469 | # ------------------------------------------------------ 470 | # For reference, some example output from rsync for various scenarios: 471 | # >f+++++++++ some/dir/new-file.txt 472 | # .f....og..x some/dir/existing-file-with-changed-owner-and-group.txt 473 | # .f........x some/dir/existing-file-with-changed-unnamed-attribute.txt 474 | # >f...p....x some/dir/existing-file-with-changed-permissions.txt 475 | # >f..t..g..x some/dir/existing-file-with-changed-time-and-group.txt 476 | # >f.s......x some/dir/existing-file-with-changed-size.txt 477 | # >f.st.....x some/dir/existing-file-with-changed-size-and-time-stamp.txt 478 | # cd+++++++++ some/dir/new-directory/ 479 | # .d....og... some/dir/existing-directory-with-changed-owner-and-group/ 480 | # .d..t...... some/dir/existing-directory-with-different-time-stamp/ 481 | # ------------------------------------------------------ 482 | # 5. To focus only on the changes that are substantive for each context (new 483 | # file, duplicate file, etc), the output produced by rsync --itemize-changes 484 | # is piped to sed to focus on items desired in each context. 485 | # 6. For information on other rsync flags used check the manpage. 486 | # 487 | 488 | # set default values 489 | change_count="0" # nothing changed yet 490 | ownership_changes="0" # nothing yet 491 | warning_count="0" # nothing yet, fingers crossed 492 | dirs_to_preserve="" # nothing yet 493 | 494 | # get the file/directory change list, as determined by rsync 495 | full_file_list=$(rsync --archive --acls --xattrs --dry-run\ 496 | --itemize-changes -vvv \ 497 | "${SOURCEPATH}" \ 498 | "${DESTINATIONPATH}" \ 499 | | grep -E '^(\.|>|<|c|h|\*).......... 
.') 500 | 501 | # level 2 verbose message, sent to file descriptor 4 502 | printf -- "==================\nRaw List:\n%s\n__________________\n\n" "${full_file_list}" >&4 503 | 504 | 505 | ##################################################################### 506 | ##### PREPARE CHANGE LIST ########################################### 507 | 508 | if [ -n "${DIRSTOPRESERVEFILE}" ]; then 509 | # loop over each line in DIRSTOPRESERVEFILE and remove it from 510 | # full_file_list as these are false positives, especially if we are 511 | # applying ownership or permission changes to the destination directories 512 | while read -r line || [[ -n "${line}" ]]; do # allows for last line with no newline 513 | # skip lines that are delimited as a comment (with #) or are blank 514 | if ! [[ "${line}" =~ ^[[:space:]]*# || "${line}" =~ ^[[:space:]]*$ ]]; then 515 | # add a trailing slash if one is not already present 516 | line="${line%/}/" 517 | 518 | # save this info as it is needed later during stale directory cleanup 519 | dirs_to_preserve="${dirs_to_preserve}${line}"$'\n' 520 | 521 | # make the path local; remove $SOURCEPATH from the beginning of $line 522 | # via the substring removal capacities of bash's parameter expansion 523 | local_path="${line#"${SOURCEPATH}"}" 524 | 525 | # search for a match to $line in $full_file_list. if found, remove the matching line 526 | full_file_list=$(sed -r "\|^........... \.?${local_path}$|d" <<< "${full_file_list}") 527 | fi 528 | done < "${DIRSTOPRESERVEFILE}" 529 | fi 530 | 531 | 532 | # extra check to make sure ./ is removed as the previous loop will only have 533 | # removed it if DIRSTOPRESERVEFILE contains an entry that is an exact match for 534 | # the source path 535 | full_file_list=$(echo "${full_file_list}" | sed '/^\s*$/d' | sed '/^...........
\.\//d') 536 | 537 | # get the count of items in full_file_list 538 | full_list_count=$(echo "${full_file_list}" | sed '/^\s*$/d' | wc -l) 539 | 540 | # level 2 verbose message, sent to file descriptor 4 541 | printf -- "==================\nItems to Process:\n%s\n__________________\n\n" "${full_file_list:- < none >}" >&4 542 | 543 | # if something has changed, process it 544 | if [ "${full_list_count}" -gt "0" ]; then 545 | 546 | # level 1 verbose message, sent to file descriptor 3 547 | printf -- "\nFound (%s) additions or changes, starting processing now.\n" "${full_list_count}" >&3 548 | 549 | ##################################################################### 550 | ##### PROCESS FILE SYSTEM CHANGES LIKE DIRECTORIES, LINKS, ETC ###### 551 | 552 | # isolate any file system changes (directories, links, etc) excluding files 553 | # grep for anything not a file (.f) and grep for anything not an unchanged directory 554 | # unchanged directories are duplicates and will be removed later if empty 555 | structure_changes=$(echo "${full_file_list}" | grep -E --invert-match '^.f' | grep -E --invert-match '^\.d ') 556 | structure_changes_count=$(echo "${structure_changes}" | sed '/^\s*$/d' | wc -l) 557 | 558 | # process any file system changes 559 | if [ "${structure_changes_count}" -gt "0" ]; then 560 | # level 1 verbose message, sent to file descriptor 3 561 | printf -- "\n==================\nFound (%s) structure changes, starting processing now.\n" "${structure_changes_count}" >&3 562 | 563 | # level 2 verbose message, sent to file descriptor 4 564 | printf -- "\nStructure Changes:\n%s\n\n" "${structure_changes}" >&4 565 | 566 | structure_changes_actual_count="0" # nothing yet 567 | structure_changes_actual="" # nothing yet 568 | 569 | IFS=" " # set word break on space for splitting 570 | 571 | # read the input by line; split the line to extract rsync bits flags and path 572 | while read -r -d $'\n' bits path; do 573 | 574 | # Use rsync to populate directories, 
symlink, hard link and other 575 | # associated attribute changes in the destination. rsync does not 576 | # remove source directories, those are cleaned up later. 577 | 578 | # via some bash fu, capture stdout, stderr and return code for rsync 579 | # ⬇ start of out-err-rtn capture fu 580 | eval "$({ cmd_err=$({ cmd_out="$( \ 581 | rsync --itemize-changes --links --times \ 582 | --group --owner --devices --specials --acls --xattrs \ 583 | --remove-source-files --files-from=- \ 584 | --include='*/' --exclude='*' \ 585 | "${SOURCEPATH}" "${DESTINATIONPATH}" \ 586 | <<< "${path}" \ 587 | )"; cmd_ret=$?; } 2>&1; declare -p cmd_out cmd_ret >&2); declare -p cmd_err; } 2>&1)" 588 | # ⬆ close of out-err-rtn capture fu 589 | 590 | if [ "${cmd_ret}" -eq "0" ]; then 591 | logger "${bits}"$'\t'"\"${SOURCEPATH}${path}\""$'\t'"\"${DESTINATIONPATH}${path}\"" 592 | # level 2 verbose message, sent to file descriptor 4 593 | printf -- "Changes (%s) were replicated from (%s) to the destination.\n" "${bits}" "${SOURCEPATH}${path}" >&4 594 | 595 | structure_changes_actual="${structure_changes_actual}${path}"$'\n' 596 | let structure_changes_actual_count+="1" 597 | 598 | let change_count+="1" 599 | ownership_changes="1" 600 | else 601 | logger "*warning***"$'\t'"Failed to replicate changes (${bits}) from (${SOURCEPATH}${path}) to the destination. [${cmd_ret} : ${cmd_err}]" 602 | # level 2 verbose message, sent to file descriptor 4 603 | printf -- "Warning: failed to replicate changes (%s) from (%s) to the destination.
[%s : %s]\n" "${bits}" "${SOURCEPATH}${path}" "${cmd_ret}" "${cmd_err}" >&4 604 | fi 605 | 606 | # clean up reused vars 607 | unset cmd_out 608 | unset cmd_err 609 | unset cmd_ret 610 | done <<< "${structure_changes}" 611 | unset IFS 612 | 613 | # level 2 verbose message, sent to file descriptor 4 614 | printf -- "\nStructure changes were processed for the following:\n%s\n" "${structure_changes_actual}" >&4 615 | 616 | # level 1 verbose message, sent to file descriptor 3 617 | printf -- "\n Done processing (%s) needed structure changes.\n" "${structure_changes_actual_count}" >&3 618 | fi 619 | 620 | 621 | ##################################################################### 622 | ##### PROCESS NEW FILES ############################################# 623 | 624 | # isolate any files that are new 625 | new_files=$(echo "${full_file_list}" | grep -E '^>f\+\+\+\+\+\+\+\+\+' | sed 's/^........... //g') 626 | new_count=$(echo "${new_files}" | sed '/^\s*$/d' | wc -l) 627 | 628 | # process any new files 629 | if [ "${new_count}" -gt "0" ]; then 630 | # level 1 verbose message, sent to file descriptor 3 631 | printf -- "\n==================\nFound (%s) new files, starting processing now.\n" "${new_count}" >&3 632 | 633 | # level 2 verbose message, sent to file descriptor 4 634 | printf -- "\nNew Files:\n%s\n\n" "${new_files}" >&4 635 | 636 | IFS=$'\n' # make newline the only separator 637 | for path in ${new_files}; do 638 | # move the file; --no-clobber is used to ensure fail safe function. 639 | # If the file fails to move because of a collision, this is very 640 | # important to catch and not overwrite any file. This would 641 | # indicate that a rare boundary case has manifested itself. In that 642 | # case, leave the file in the source directory as it is best handled 643 | # on a later run of the script.
644 | 645 | 646 | # via some bash fu, capture stdout, stderr and return code for mv 647 | # ⬇ start of out-err-rtn capture fu 648 | eval "$({ cmd_err=$({ cmd_out="$( \ 649 | mv --no-clobber "${SOURCEPATH}${path}" "${DESTINATIONPATH}${path}" \ 650 | )"; cmd_ret=$?; } 2>&1; declare -p cmd_out cmd_ret >&2); declare -p cmd_err; } 2>&1)" 651 | # ⬆ close of out-err-rtn capture fu 652 | 653 | if [ "${cmd_ret}" -eq "0" ]; then 654 | logger ">f+++++++++"$'\t'"\"${SOURCEPATH}${path}\""$'\t'"\"${DESTINATIONPATH}${path}\"" 655 | # level 2 verbose message, sent to file descriptor 4 656 | printf -- "File (%s) was moved from source to destination directory.\n" "${path}" >&4 657 | 658 | let change_count+="1" 659 | ownership_changes="1" 660 | else 661 | logger "*warning***"$'\t'"Failed to move file (${SOURCEPATH}${path}) to the destination. [${cmd_ret} : ${cmd_err}]" 662 | # level 2 verbose message, sent to file descriptor 4 663 | printf -- "Warning: File (%s) failed to move from source to destination directory.
[%s : %s]\n" "${path}" "${cmd_ret}" "${cmd_err}" >&4 664 | fi 665 | done 666 | unset IFS # good practice to reset IFS to its default value 667 | 668 | # level 1 verbose message, sent to file descriptor 3 669 | printf -- "\n Done processing new files.\n" >&3 670 | fi 671 | 672 | 673 | ##################################################################### 674 | ##### PROCESS SIMILAR (SAME NAME, DIFFERENT CONTENT) FILES ########## 675 | 676 | # isolate any potential file collisions (size or modification time differs) 677 | offending_files=$(echo "${full_file_list}" | grep -E '^>f(.s|..t)') 678 | offending_count=$(echo "${offending_files}" | sed '/^\s*$/d' | wc -l) 679 | 680 | # process any file name collisions 681 | if [ "${offending_count}" -gt "0" ]; then 682 | # level 1 verbose message, sent to file descriptor 3 683 | printf -- "\n==================\nFound (%s) file collisions (same name, different content), starting processing now.\n" "${offending_count}" >&3 684 | 685 | # level 2 verbose message, sent to file descriptor 4 686 | printf -- "\nFile Collisions:\n%s\n\n" "${offending_files}" >&4 687 | 688 | offending_files_actual_count="0" # nothing yet 689 | 690 | IFS=" " # set word break on space for splitting 691 | 692 | # read the input by line; split the line to extract rsync bits flags and path 693 | while read -r -d $'\n' bits path; do 694 | unique_path=$(make_filename_unique "${path}") 695 | 696 | # move the file while renaming it to its new unique name; 697 | # --no-clobber is used to ensure fail safe handling of the file 698 | # to the destination directory.
699 | 700 | # via some bash fu, capture stdout, stderr and return code for mv 701 | # ⬇ start of out-err-rtn capture fu 702 | eval "$({ cmd_err=$({ cmd_out="$( \ 703 | mv --no-clobber "${SOURCEPATH}${path}" "${DESTINATIONPATH}${unique_path}" \ 704 | )"; cmd_ret=$?; } 2>&1; declare -p cmd_out cmd_ret >&2); declare -p cmd_err; } 2>&1)" 705 | # ⬆ close of out-err-rtn capture fu 706 | 707 | if [ "${cmd_ret}" -eq "0" ]; then 708 | logger "${bits}"$'\t'"\"${SOURCEPATH}${path}\""$'\t'"\"${DESTINATIONPATH}${unique_path}\"" 709 | # level 2 verbose message, sent to file descriptor 4 710 | printf -- "File (%s) was moved from source to destination directory with deconflicted name (%s).\n" "${SOURCEPATH}${path}" "${DESTINATIONPATH}${unique_path}" >&4 711 | 712 | let offending_files_actual_count+="1" 713 | 714 | let change_count+="1" 715 | ownership_changes="1" 716 | 717 | else 718 | logger "*warning***"$'\t'"Failed to move file (${SOURCEPATH}${path}) to destination directory with deconflicted name (${DESTINATIONPATH}${unique_path}). [${cmd_ret} : ${cmd_err}]" 719 | # level 2 verbose message, sent to file descriptor 4 720 | printf -- "Warning: failed to move file (%s) to destination directory with deconflicted name (%s).
[%s : %s]\n" "${SOURCEPATH}${path}" "${DESTINATIONPATH}${unique_path}" "${cmd_ret}" "${cmd_err}" >&4 721 | fi 722 | 723 | # clean up reused vars 724 | unset cmd_out 725 | unset cmd_err 726 | unset cmd_ret 727 | done <<< "${offending_files}" 728 | unset IFS 729 | 730 | # level 1 verbose message, sent to file descriptor 3 731 | printf -- "\n Done processing (%s) file collisions by renaming offending files.\n" "${offending_files_actual_count}" >&3 732 | fi 733 | 734 | 735 | ##################################################################### 736 | ##### PROCESS DUPLICATE (SAME NAME, SAME CONTENT) FILES ############# 737 | 738 | # isolate files that are potential duplicates of existing files 739 | same_files=$(echo "${full_file_list}" | grep -E '^\.f') 740 | same_count=$(echo "${same_files}" | sed '/^\s*$/d' | wc -l) 741 | 742 | # process any unchanged files 743 | # as an extra measure of precaution, verify each file again for sameness directly before deletion 744 | if [ "${same_count}" -gt "0" ]; then 745 | # level 1 verbose message, sent to file descriptor 3 746 | printf -- "\n==================\nFound (%s) duplicate (same name, same content) files, starting processing now.\n" "${same_count}" >&3 747 | 748 | # level 2 verbose message, sent to file descriptor 4 749 | printf -- "\nDuplicate Files:\n%s\n\n" "${same_files}" >&4 750 | 751 | same_files_actual_count="0" # nothing yet 752 | same_files_actual="" # nothing yet 753 | 754 | IFS=" " # set word break on space for splitting 755 | 756 | # read the input by line; split the line to extract rsync bits flags and path 757 | while read -r -d $'\n' bits path; do 758 | 759 | # In the spirit of failing safe, we validate the full contents of 760 | # the file via checksum hash before proceeding with deletion.
761 | checksum_match=$(rsync --checksum --archive --acls --xattrs --dry-run -vvv --whole-file \ 762 | "${SOURCEPATH}${path}" "${DESTINATIONPATH}${path}" | grep -c "uptodate") 763 | 764 | # If the checksum of the destination file matched the source file, 765 | # delete the source file as it is an exact duplicate. 766 | if [ "${checksum_match}" -eq "1" ]; then 767 | 768 | # via some bash fu, capture stdout, stderr and return code for rm 769 | # ⬇ start of out-err-rtn capture fu 770 | eval "$({ cmd_err=$({ cmd_out="$( \ 771 | rm --force "${SOURCEPATH}${path}" \ 772 | )"; cmd_ret=$?; } 2>&1; declare -p cmd_out cmd_ret >&2); declare -p cmd_err; } 2>&1)" 773 | # ⬆ close of out-err-rtn capture fu 774 | 775 | if [ "${cmd_ret}" -eq "0" ]; then 776 | logger "${bits}"$'\t'"\"${SOURCEPATH}${path}\""$'\t'"\"${DESTINATIONPATH}${path}\"" 777 | # level 2 verbose message, sent to file descriptor 4 778 | printf -- "File (%s) was a confirmed duplicate via checksum, so it was deleted from the source directory.\n" "${SOURCEPATH}${path}" >&4 779 | 780 | same_files_actual="${same_files_actual}${path}"$'\n' 781 | let same_files_actual_count+="1" 782 | 783 | let change_count+="1" 784 | ownership_changes="1" 785 | else 786 | logger "*warning***"$'\t'"File (${SOURCEPATH}${path}) was confirmed duplicate via checksum; attempts to delete file from source directory have failed. [${cmd_ret} : ${cmd_err}]" 787 | # level 2 verbose message, sent to file descriptor 4 788 | printf -- "Warning: File (%s) was confirmed duplicate via checksum; attempts to delete file from source directory have failed. [%s : %s]\n" "${SOURCEPATH}${path}" "${cmd_ret}" "${cmd_err}" >&4 789 | fi 790 | 791 | else 792 | # If the checksum of the destination file did not match the 793 | # source file, the source file is a different file, despite its 794 | # name being the same. To safely move the file, make the filename 795 | # unique, then move the file.
796 | unique_path=$(make_filename_unique "${path}") 797 | 798 | # move the file while renaming it to its new unique name 799 | # --no-clobber is again used to ensure fail safe mode 800 | 801 | # via some bash fu, capture stdout, stderr and return code for mv 802 | # ⬇ start of out-err-rtn capture fu 803 | eval "$({ cmd_err=$({ cmd_out="$( \ 804 | mv --no-clobber "${SOURCEPATH}${path}" "${DESTINATIONPATH}${unique_path}" \ 805 | )"; cmd_ret=$?; } 2>&1; declare -p cmd_out cmd_ret >&2); declare -p cmd_err; } 2>&1)" 806 | # ⬆ close of out-err-rtn capture fu 807 | 808 | if [ "${cmd_ret}" -eq "0" ]; then 809 | logger "${bits}"$'\t'"\"${SOURCEPATH}${path}\""$'\t'"\"${DESTINATIONPATH}${unique_path}\"" 810 | # level 2 verbose message, sent to file descriptor 4 811 | printf -- "File (%s) was suspected a duplicate but instead was confirmed CHANGED, so it was moved from source to destination directory with deconflicted name (%s).\n" "${SOURCEPATH}${path}" "${DESTINATIONPATH}${unique_path}" >&4 812 | 813 | same_files_actual="${same_files_actual}${path}"$'\n' 814 | let same_files_actual_count+="1" 815 | 816 | let change_count+="1" 817 | ownership_changes="1" 818 | else 819 | logger "*warning***"$'\t'"File (${SOURCEPATH}${path}) was suspected a duplicate but instead was confirmed CHANGED; failed to move it to the destination directory with deconflicted name (${DESTINATIONPATH}${unique_path}). [${cmd_ret} : ${cmd_err}]" 820 | # level 2 verbose message, sent to file descriptor 4 821 | printf -- "Warning: File (%s) was suspected a duplicate but instead was confirmed CHANGED; failed to move it to the destination directory with deconflicted name (%s).
[%s : %s]\n" "${SOURCEPATH}${path}" "${DESTINATIONPATH}${unique_path}" "${cmd_ret}" "${cmd_err}" >&4 822 | fi 823 | fi 824 | 825 | # clean up reused vars 826 | unset cmd_out 827 | unset cmd_err 828 | unset cmd_ret 829 | done <<< "${same_files}" 830 | unset IFS 831 | 832 | # level 1 verbose message, sent to file descriptor 3 833 | printf -- "\n Done processing duplicate files.\n" >&3 834 | fi 835 | fi 836 | 837 | 838 | 839 | # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 840 | # 841 | # tl;dr: Delete stale subdirectories from the source directory. 842 | # 843 | # The details: 844 | # 1. The find command is used to identify directories whose status-change 845 | # time (ctime, as tested by -cmin) is older than 15 minutes. The 15 minute 846 | # grace period is intended to allow people time to create a directory and 847 | # move files to it without it being deleted out from under them. The 848 | # default age of 15 minutes can be changed via the --minutes-until-stale 849 | # flag (or -ms for short). The -print0 arg is used with find to ensure 850 | # that paths with newlines in them, though rare, will be handled 851 | # correctly. 852 | # 2. The directories found are redirected to a while loop for individual 853 | # processing. 854 | # 3. $dirs_to_preserve is populated above from the config file indicated via 855 | # the --preserve (or -p for short) flag for this script. 856 | # 4. An exclamation point is used to negate the affirmative return code 857 | # from grep. 858 | # 5. Grep searches for the directory path from match_dir in dirs_to_preserve. 859 | # If a match is found in the file, it should be preserved, thus the 860 | # preceding boolean negation is needed. If the path is not found the 861 | # negation makes the return result true. 862 | # 6. If true, the rmdir (remove dir) command is executed against the match_dir.
863 | # We expect the directories to be empty at this point but we don't control 864 | # what users or other applications might do, so we have to account for files 865 | # being added to the directory outside of our control. There is a fail-safe 866 | # mechanism in the fact that rmdir is designed to fail if the directory is 867 | # not empty. 868 | 869 | # level 1 verbose message, sent to file descriptor 3 870 | printf -- "\n==================\nChecking for any stale (older than %s mins) directories in the source path.\n\n" "${MINSUNTILSTALE}" >&3 871 | 872 | dir_cleanup_count="0" # nothing yet 873 | 874 | # using process substitution, feed the find stdout to while's stdin 875 | while IFS= read -r -d $'\0' line; do 876 | # skip over an exact match of SOURCEPATH, thus preserving it. 877 | if [[ "${line}" != "${SOURCEPATH}" ]]; then 878 | # if there is not a trailing slash, add one via some bash fu 879 | match_dir="${line}$( printf \\$( printf '%03o' $(( $(printf '%d' "'${line:(-1)}") == 47 ? 0 : 47 )) ) )" 880 | 881 | # If dirs_to_preserve was not populated, nothing needs preserving. 882 | # Otherwise, verify that match_dir is not listed in dirs_to_preserve 883 | if [ -z "${dirs_to_preserve}" ] || !
grep -Fxq -- "${match_dir}" <<< "${dirs_to_preserve}"; then 884 | 885 | # via some bash fu, capture stdout, stderr and return code for rmdir 886 | # ⬇ start of out-err-rtn capture fu 887 | eval "$({ cmd_err=$({ cmd_out="$( \ 888 | rmdir "${line}" 2> /dev/null \ 889 | )"; cmd_ret=$?; } 2>&1; declare -p cmd_out cmd_ret >&2); declare -p cmd_err; } 2>&1)" 890 | # ⬆ close of out-err-rtn capture fu 891 | 892 | if [ "${cmd_ret}" -eq "0" ]; then 893 | logger "*deleting**"$'\t'"\"${match_dir}\"" 894 | # level 2 verbose message, sent to file descriptor 4 895 | printf -- "Directory (%s) was deleted from the source directory.\n" "${match_dir}" >&4 896 | 897 | let change_count+="1" 898 | let dir_cleanup_count+="1" 899 | else 900 | logger "*warning***"$'\t'"failed to delete directory (${match_dir}) from source directory. [${cmd_ret} : ${cmd_err}]" 901 | # level 1 verbose message, sent to file descriptor 3 902 | printf -- "Attempted to delete directory (%s) but failed to do so. [%s : %s]\n" "${match_dir}" "${cmd_ret}" "${cmd_err}" >&3 903 | fi 904 | else 905 | # level 2 verbose message, sent to file descriptor 4 906 | printf -- "Directory (%s) is stale but was preserved in the source directory.\n" "${match_dir}" >&4 907 | fi 908 | fi 909 | done < <(find "${SOURCEPATH}" -type d -cmin "+${MINSUNTILSTALE}" -print0) 910 | 911 | # level 1 verbose message, sent to file descriptor 3 912 | printf -- "\n Done processing (%s) stale directories.\n" "${dir_cleanup_count}" >&3 913 | 914 | 915 | 916 | # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 917 | # 918 | # tl;dr: If substantive file system changes were made, verify ownership. 919 | # 920 | # The details: 921 | # 1. If rsync 3.1.0+ were available we could use the --chown flag, but since 922 | # it cannot be counted upon, a more structured approach will be taken. Thus, 923 | # the iterative use of chown as prescribed in the ownership file.
The 924 | # ownership file is set via the flag --ownership (or -o for short). 925 | # 2. The chown commands are only needed if files were moved by rsync, therefore 926 | # the if statement, which checks to see if ownership_changes is greater than 927 | # zero. 928 | 929 | # loop over each line in OWNERSHIPFILE and enforce the ownership supplied 930 | if [ "${ownership_changes}" -gt "0" ]; then 931 | if [ -n "${OWNERSHIPFILE}" ]; then 932 | # level 1 verbose message, sent to file descriptor 3 933 | printf -- "\n==================\nChanges require us to validate ownership.\n\n" >&3 934 | 935 | while read -r line || [[ -n "${line}" ]]; do # allows for last line with no newline 936 | # skip lines that are delimited as a comment (with #) or are blank 937 | if ! [[ "${line}" =~ ^[[:space:]]*# || "${line}" =~ ^[[:space:]]*$ ]]; then 938 | 939 | # split the line on its first whitespace (a tab, per the config format) to extract ownership info from dir path 940 | while read -r a b;do perm="$a"; path="$b"; done <<<"$line" 941 | 942 | # if both the owner:group ($perm) and the path ($path) are supplied 943 | if [ -n "${perm}" ] && [ -n "${path}" ]; then 944 | 945 | # check if the path supplied is really a directory 946 | if [ -d "${path}" ]; then 947 | # resolve any symlinks in the paths before comparing them 948 | realpath="$(readlink -f "${path}")" 949 | realdest="$(readlink -f "${DESTINATIONPATH}")" 950 | 951 | # make sure that the path is local to the destination path (literal prefix match on a directory boundary) 952 | if [[ "${realpath}/" == "${realdest%/}/"* ]]; then 953 | 954 | # As required by chown, make sure the path does not end 955 | # in a slash. sed is used to make sure that multiple 956 | # slashes at the end are removed if they exist.
957 | realpath="$(echo "${realpath}" | sed -r 's/\/+$//')" 958 | 959 | # no-dereference is used to keep chown from following links 960 | 961 | # via some bash fu, capture stdout, stderr and return code for chown 962 | # ⬇ start of out-err-rtn capture fu 963 | eval "$({ cmd_err=$({ cmd_out="$( \ 964 | chown --recursive --preserve-root --silent --no-dereference "${perm}" "${realpath}" \ 965 | )"; cmd_ret=$?; } 2>&1; declare -p cmd_out cmd_ret >&2); declare -p cmd_err; } 2>&1)" 966 | # ⬆ close of out-err-rtn capture fu 967 | 968 | if [ "${cmd_ret}" -eq "0" ]; then 969 | # level 2 verbose message, sent to file descriptor 4 970 | printf -- "Directory path (%s) had ownership (%s) applied.\n" "${path}" "${perm}" >&4 971 | else 972 | # level 2 verbose message, sent to file descriptor 4 973 | printf -- "Warning: attempt to set directory path (%s) with ownership (%s) failed. [%s : %s]\n" "${path}" "${perm}" "${cmd_ret}" "${cmd_err}" >&4 974 | fi 975 | else 976 | # level 2 verbose message, sent to file descriptor 4 977 | printf -- "Directory (%s) is not local to the destination directory; ignoring it.\n" "${path}" >&4 978 | fi 979 | else 980 | # level 2 verbose message, sent to file descriptor 4 981 | printf -- "Path (%s) listed in the ownership file is not a directory; ignoring it.\n" "${path}" >&4 982 | fi 983 | else 984 | # level 2 verbose message, sent to file descriptor 4 985 | printf -- "The line (%s) in the ownership file does not follow the required pattern; ignoring it.\n" "${line}" >&4 986 | fi 987 | 988 | fi 989 | done <"${OWNERSHIPFILE}" 990 | printf -- '\n==================\nMade (%s) total changes. ' "${change_count}" 991 | else 992 | # level 2 verbose message, sent to file descriptor 4 993 | printf -- '\n==================\nNo ownership file was supplied. Skipping ownership enforcement.\n' >&4 994 | fi 995 | else 996 | printf -- '\nNo (%s) changes.
' "${change_count}" 997 | fi 998 | 999 | # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1000 | # 1001 | # tl;dr: tidy up and close out 1002 | 1003 | # because we're cautious like that, we'll close the file descriptors we used for our verbose output 1004 | for level in $(seq 3 "${VERBOSELEVEL}"); do 1005 | (( "${level}" <= "${VERBOSITYMAX}" )) && eval exec "${level}>&-" 1006 | done 1007 | 1008 | printf -- "\nAll done.\n" 1009 | exit 0 --------------------------------------------------------------------------------
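The "out-err-rtn capture fu" that movestough.sh wraps around rm, mv, rmdir and chown can be hard to read in place. The following is a minimal standalone sketch of the same idiom, assuming bash and run at the top level of a script (not inside a function). `demo_cmd` is a hypothetical stand-in and is not part of movestough.sh; the variable names `cmd_out`, `cmd_err` and `cmd_ret` are the ones the script uses.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in command, for illustration only: it writes one line
# to each stream and then fails with status 3.
demo_cmd() {
  echo "to stdout"
  echo "to stderr" >&2
  return 3
}

# The innermost $( ) captures the command's stdout into cmd_out and its exit
# status into cmd_ret. The surrounding "2>&1" routes the command's stderr
# into the cmd_err capture. "declare -p" then serialises all three variables
# so that eval can recreate them in the calling shell.
eval "$({ cmd_err=$({ cmd_out="$( \
    demo_cmd \
  )"; cmd_ret=$?; } 2>&1; declare -p cmd_out cmd_ret >&2); declare -p cmd_err; } 2>&1)"

printf 'out=%s\nerr=%s\nret=%s\n' "${cmd_out}" "${cmd_err}" "${cmd_ret}"
```

Because the variables are recreated through `eval` of `declare -p` output, running the idiom inside a function would make them local to that function; that is one reason the sketch keeps it at the top level, as the script does.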