├── LICENSE
├── README.md
└── unraid-fast-copy.sh


/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 calr0

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# unraid-fast-copy

An optimized parallel copy utility for copying data from an Unraid share to a mounted NFS/CIFS share.


## Overview

This script is still a work in progress, so at the moment all parameters must be manually defined at the top of the script.

Each parameter has a descriptive comment above it that outlines how the primary values should be set based on:
1. The Unraid share you want to copy, optionally including a subdirectory path relative to the root of the share if you want to scope the copy to just a portion of the share.
2. The target destination of the copy, which should be a locally mounted SMB or NFS share with the current implementation of this script.

The script first determines which disks contain data from the share (including the subdirectory, if provided). Then, for each disk that contains share data, one rsync process is spawned to recursively copy the share data directly from the disk mount. This lets each process read from its own disk as fast as possible without contending with the other processes for the same physical disk.


## Dependencies

The only requirement is an Unraid app/plugin that makes the rsync command available in the Unraid shell.

Once that's installed, fill in the script parameters, then add execute permissions to the script:
```bash
chmod +x ./unraid-fast-copy.sh
```
then call the script to run the copy:
```bash
./unraid-fast-copy.sh
```

## Results

With an array of 12 spinning disks (2 parity disks + 10 data disks) and no SSD/RAM cache, I saw copy speeds peak slightly over 1 GB/s and average a fairly consistent ~800 MB/s when copying a ~75 TB share that spanned all 10 data disks on an ancient Dell R510 server. With the latest script changes, I started getting bottlenecked by the 10G network link and/or the remote share server's write speed.

IO wait times reported by netdata are now down to just ~10-15%, compared to the 50-75%+ iowait I saw when copying directly from the user share mount with a few different approaches.
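As a rough illustration of the per-disk discovery step described in the Overview, the sketch below lists the disk mounts that hold data for a given share. The function name `find_share_disks` and its `mount_root` argument are hypothetical and not part of the script; they exist only so the logic can be tried against a scratch directory instead of `/mnt`.

```bash
#!/bin/bash

# Hypothetical helper (not part of unraid-fast-copy.sh): print every numbered
# disk mount under $1 that contains the directory "$2/$3"
# (share name / optional subdirectory).
find_share_disks() {
    local mount_root=$1
    local share_name=$2
    local share_subdir=${3:-}
    local disk_path disk_id

    for disk_path in "$mount_root"/disk*; do
        # Keep only mounts named disk<number> (disk1, disk2, ...)
        disk_id=${disk_path##*/}
        disk_id=${disk_id#disk}
        case $disk_id in
            ''|*[!0-9]*) continue ;;
        esac

        # A disk takes part in the copy only if it holds share data
        if [ -d "$disk_path/$share_name/$share_subdir" ]; then
            printf '%s\n' "$disk_path"
        fi
    done
}

# Example (assumed paths): list the disks that would each get an rsync process
# find_share_disks /mnt media tv
```

Each path printed corresponds to one disk that would get its own rsync process; disks without a matching directory are skipped.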


## TODO

- Add command line arg parser
- Validate and sanitize input paths for the source and target directories to avoid issues caused by including/excluding leading or trailing slashes (which is something to keep an eye out for now; read the comments in the script for more info).
- Improve the `print_progress` function to eventually provide the status of each child rsync process while the script is running (a prototype of this function exists, but is buggy and currently commented out)
  - Get it to properly handle terminal resizing
  - Monitor active processes by polling for their activity to further improve what type of progress can be printed
- Update the main loop so it exits when all child processes are finished copying the data for each disk that contains share data.
- Add dry-run mode that just generates a report of what the script plans to do without copying/modifying anything.

--------------------------------------------------------------------------------
/unraid-fast-copy.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# The {share_name} and {share_subdir} variables define the
# name of the unraid share you want to copy as well as an
# optional subdirectory path within the share if you want
# to keep the copy scoped to one specific share folder.
#
# The values below would copy the unraid share named "storage" in its entirety to
# {target_path}/storage/
#   > share_name="storage"
#   > share_subdir=""
#
# These values would copy the "season 1" directory from the share named "media"
# to this location in your {target_path}: {target_path}/media/tv/show name/season 1/
#   > share_name="media"
#   > share_subdir="tv/show name/season 1"
#
# These values would copy the "tv" directory located in the "media" share
# to this location in your {target_path}: {target_path}/media/tv/
#   > share_name="media"
#   > share_subdir="tv"
#
# If {share_subdir} is defined, it must be relative to the root
# of the share, and cannot have a leading or trailing slash in the path
share_name="media"
share_subdir="tv"

# The {target_path} variable defines the location that you'll want the unraid share copied to.
#
# The media folder will be copied directly into the target dir, creating this structure:
#   {target_path}/{share_name}/
#
# Or if share_subdir was defined, it'd look like this:
#   {target_path}/{share_name}/{share_subdir}
#
# In either case, the directories will be fully copied and
# preserved exactly as they appeared in the unraid share.
# Any directories that don't exist on the {target_path} will
# automatically be created.
#
# This path is passed directly to rsync so you should also be able to define it as a remote path.
# So far I've only tested it by setting it to the path of a locally mounted nfs/smb network share.
target_path="/mnt/cifs-or-nfs/copy/destination/"

# Defines the location where all temp output
# files will be generated by the script
script_output_dir="$(pwd)/output"

# Defines whether or not the script should automatically delete all of the files it created when the script completes.
# The default is 'no' because many of the rsync files (input files, stdout redirected to a file, logfile) will be in
# there and can be useful to review during/after the transfer for general stats/info about the run. Changing this value
# to 'yes' will enable the cleanup functionality on script exit.
#
# Note: if you rerun this script, it will always clean out temp data from past runs when it starts.
# This cleanup option only defines whether or not it should delete temp files when the script exits.
cleanup='no'

setup_env() {
    echo "Setting up script environment"
    if [[ -d "$script_output_dir" ]]; then
        # If the script's output dir already exists just
        # delete any old output files from previous runs
        echo "Deleting temp files generated from past runs"
        rm -vrf "$script_output_dir"/*.*
    else
        echo "Creating directory for temp script files"
        mkdir -v "$script_output_dir"
    fi
    echo
}


cleanup_env() {
    # If the cleanup parameter is set to 'yes'
    # the script's output directory will be deleted
    if [[ $cleanup == yes ]]; then
        echo "Cleaning up script environment"
        if [[ -d "$script_output_dir" ]]; then
            rm -vrf "$script_output_dir"
        fi
        echo
    fi
}


terminate() {
    echo -e "\n\nCaught termination signal, killing child rsync processes...\n"

    # Ignore SIGTERM in this script so that the `kill 0` below, which signals
    # the whole process group, only terminates the child rsync processes.
    # Then clean up and exit this script.
    trap "" SIGTERM
    kill 0

    tput init
    cleanup_env

    exit
}


print_progress() {
    # Count the rsync stdout files to size the progress display
    local rsync_process_count
    rsync_process_count=$(find "$script_output_dir" -name 'rsync-*.out' | wc -l)
    printf "\n%.0s" $(seq $(( (rsync_process_count * 2) + 5 )))

    # Collect one "<disk id>;<stdout file>" entry per rsync process
    local process_info=()
    local stdout_file rsync_id
    for stdout_file in "$script_output_dir"/rsync-*.out; do
        if [[ $stdout_file =~ rsync-([0-9]+)\.out$ ]]; then
            process_info+=("${BASH_REMATCH[1]};$stdout_file")
        fi
    done

    while true; do
        local terminal_rows info text
        terminal_rows=$(tput lines)
        tput cup $(( terminal_rows - ((rsync_process_count * 2) + 4) )) 0

        for info in "${process_info[@]}"; do
            IFS=';' read -r rsync_id stdout_file <<< "$info"
            text=$(tail -n 1 "$stdout_file")
            printf "Disk %s rsync progress:\n" "$rsync_id"
            printf "%s\n" "$text"
        done

        sleep 0.1
    done
}


copy_unraid_share() {
    # First we'll want to make sure we trap any SIGINT or SIGTERM signals so that either of
    # those can be properly handled by killing any child processes that are spawned below.
    trap terminate SIGINT SIGTERM

    # -maxdepth is placed before -type because GNU find warns when a
    # global option follows a test
    for disk_mount_path in $(find /mnt/disk* -maxdepth 0 -type d | sort -t "/" -hk 3.5,3); do
        disk_share_data_dir="$disk_mount_path/$share_name/$share_subdir"
        disk_id=${disk_mount_path//"/mnt/disk"/}
        rsync_file_basename="$script_output_dir"/rsync-"$disk_id"
        # The glob is left outside the quotes so that it actually expands
        rm -vf "$rsync_file_basename".*

        if [[ ! -d "$disk_share_data_dir" ]]; then
            echo "No data found on disk $disk_id for $share_name/$share_subdir, skipping"
            continue
        fi

        echo "Starting rsync process for disk $disk_id.."
        echo "[rsync-$disk_id] => Copying $share_name/$share_subdir data from: /mnt/disk${disk_id}/$share_name/$share_subdir"

        {
            # Assign a real-time io priority for the
            # child rsync process being spawned below
            ionice -c 1 \
                rsync --recursive \
                    --whole-file \
                    --inplace \
                    --sparse \
                    --no-compress \
                    --max-alloc=8GiB \
                    --size-only \
                    --human-readable \
                    --info=progress2 \
                    --log-file="$rsync_file_basename.log" \
                    --log-file-format="%o=%-7'''b | total=%-7'''l [%i] => %f%L" \
                    "/mnt/disk${disk_id}/$share_name/$share_subdir/" \
                    "$target_path" \
                    >> "$rsync_file_basename.out"
        } &
    done

    # Uncomment the function call below to output progress for
    # each child process in the terminal. It's a work in
    # progress and doesn't work very well at the moment,
    # so it's disabled for now.

    # print_progress

    # Once all rsync processes have been spawned, wait for
    # all of them to complete before exiting. Ctrl + C can
    # be used to terminate the script early and terminate
    # all child rsync processes that are still active
    echo "Waiting for rsync child processes to complete..."
    wait

}

setup_env
copy_unraid_share
cleanup_env

--------------------------------------------------------------------------------
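The bare `wait` at the end of `copy_unraid_share` blocks until every child has exited, but it discards their exit codes. Below is a minimal sketch of one way to track each child individually, assuming the PIDs are recorded as the jobs are spawned. The `run_jobs` helper and the commands passed to it are placeholders for illustration only and do not appear in the script.

```bash
#!/bin/bash

# Hypothetical helper (not part of unraid-fast-copy.sh): run each argument as
# a background command, wait for every child individually, and return the
# number of children that exited with a non-zero status.
run_jobs() {
    local pids=""
    local failed=0
    local cmd pid

    for cmd in "$@"; do
        sh -c "$cmd" &
        # Remember the PID of the job that was just started
        pids="$pids $!"
    done

    for pid in $pids; do
        # `wait <pid>` returns the exit status of that specific child
        if ! wait "$pid"; then
            echo "Child process $pid exited with an error" >&2
            failed=$((failed + 1))
        fi
    done

    return "$failed"
}

# Example (placeholder commands): the second job fails, so run_jobs returns 1
# run_jobs "true" "exit 3" "true"
```

In the real script, the same idea would mean appending `$!` after each `} &` block and waiting on those PIDs, so a failed disk copy can be reported instead of passing silently.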