├── CONTRIBUTING.md ├── README.md ├── LICENSE └── cassandra-cloud-backup.sh /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | Want to contribute? Great! First, read this page (including the small print at the end). 2 | 3 | ### Before you contribute 4 | Before we can use your code, you must sign the 5 | [Google Individual Contributor License Agreement](https://cla.developers.google.com/about/google-individual) 6 | (CLA), which you can do online. The CLA is necessary mainly because you own the 7 | copyright to your changes, even after your contribution becomes part of our 8 | codebase, so we need your permission to use and distribute your code. We also 9 | need to be sure of various other things—for instance that you'll tell us if you 10 | know that your code infringes on other people's patents. You don't have to sign 11 | the CLA until after you've submitted your code for review and a member has 12 | approved it, but you must do it before we can put your code into our codebase. 13 | Before you start working on a larger contribution, you should get in touch with 14 | us first through the issue tracker with your idea so that we can help out and 15 | possibly guide you. Coordinating up front makes it much easier to avoid 16 | frustration later on. 17 | 18 | ### Code reviews 19 | All submissions, including submissions by project members, require review. We 20 | use GitHub pull requests for this purpose. 21 | 22 | ### The small print 23 | Contributions made by corporations are covered by a different agreement than 24 | the one above, the 25 | [Software Grant and Corporate Contributor License Agreement](https://cla.developers.google.com/about/google-corporate). 
28 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Cassandra Backup and Restore with Google Cloud Storage 2 | ==================== 3 | Shell script for creating and managing Cassandra backups using Google Cloud Storage. 4 | ## Features 5 | - Take snapshot backups 6 | - Copy incremental backup files 7 | - Compress with gzip or bzip2 to save space 8 | - Prune old incremental and snapshot files 9 | - Execute dry-run mode to identify target files 10 | 11 | ## Requirements 12 | - Google Cloud SDK installed, with the gsutil utility configured for authentication 13 | - An existing Google Cloud Storage bucket 14 | - Linux system with a Bash shell 15 | - Cassandra 2+ 16 | 17 | 18 | ## Usage 19 | ./cassandra-cloud-backup.sh [ options ] <command> 20 | 21 | ### Examples 22 | - Take a full snapshot, gzip-compress it with nice level 15, use the /var/lib/cassandra/backups directory to stage the backup before 23 | uploading it to the GCS bucket, and clear old incremental and snapshot files 24 | 25 | `./cassandra-cloud-backup.sh -b gs://cassandra-backups123/ -zCc -N 15 -d /var/lib/cassandra/backups backup` 26 | 27 | - Do a dry run of a full snapshot with verbose output and create a list of files that would have been copied 28 | 29 | `./cassandra-cloud-backup.sh -b gs://cassandra-backups123/ -vn backup` 30 | 31 | - Back up and bzip2-compress copies of the most recent incremental backup files since the last incremental backup 32 | 33 | `./cassandra-cloud-backup.sh -b gs://cassandra-backups123/ -ji -d /var/lib/cassandra/backups backup` 34 | 35 | - Restore a backup without prompting from the given bucket path and keep the old files locally 36 | 37 | `./cassandra-cloud-backup.sh -b gs://cass-bk123/backups/host01/snpsht/2016-01-20_18-57/ -fk -d /var/lib/cassandra/backups restore` 38 | 39 | - List inventory of available backups stored in Google Cloud Storage 40 | 41 | `
./cassandra-cloud-backup.sh -b gs://cass-bk123 inventory` 42 | 43 | - List inventory of available backups stored in Google Cloud Storage for a different server 44 | 45 | `./cassandra-cloud-backup.sh -b gs://cass-bk123 inventory -a testserver01` 46 | 47 | ### Commands: 48 | 49 | - backup --- Backup the Cassandra node based on the passed-in options 50 | - restore --- Restore the Cassandra node from a specific snapshot backup 51 | - inventory --- List available backups 52 | - commands --- List available commands 53 | - options --- List available options 54 | 55 | ### Options: 56 | Flags: 57 | 58 | -a, --alt-hostname 59 | Specify an alternate server name to be used in the bucket path construction. Used 60 | to create or retrieve backups from different servers 61 | 62 | -B, backup 63 | Default action is to take a backup 64 | 65 | -b, --gcsbucket 66 | Google Cloud Storage bucket used in deployment and by the cluster. 67 | 68 | -c, --clear-old-ss 69 | Clear any old snapshots taken prior to this backup run to save space. 70 | Additionally, this will clear any old incremental backup files taken immediately 71 | following a successful snapshot. This option does nothing with the -i flag 72 | 73 | -C, --clear-old-inc 74 | Clear any old incremental backups taken prior to the current snapshot 75 | 76 | -d, --backupdir 77 | The directory in which to store the backup files; be sure that this directory 78 | has enough space and the appropriate permissions 79 | 80 | -D, --download-only 81 | During a restore this will only download the target files from GCS 82 | 83 | -f, --force 84 | Used to force the restore without a confirmation prompt 85 | 86 | -h, --help 87 | Print this help message. 88 | 89 | -H, --home-dir 90 | This is the $CASSANDRA_HOME directory and is only used if the data_directories, commitlog_directory, 91 | or the saved_caches_directory values cannot be parsed out of the yaml file. 92 | 93 | -i, --incremental 94 | Copy the incremental backup files and do not take a snapshot. 
Can only 95 | be run when compression is enabled with -z or -j 96 | 97 | -j, --bzip 98 | Compresses the backup files with bzip2 prior to pushing to Google Cloud Storage. 99 | This option will use additional local disk space; set --target-gz-dir 100 | to use an alternate disk location if free space is an issue 101 | 102 | -k, --keep-old 103 | Set this flag on restore to keep a local copy of the old data files 104 | Set this flag on backup to keep a local copy of the compressed backup and schema dump 105 | 106 | -l, --log-dir 107 | Activate logging to file 'CassandraBackup${DATE}.log' from stdout 108 | Include an optional directory path to write the file 109 | Default path is /var/log/cassandra 110 | 111 | -n, --noop 112 | Will attempt a dry run and verify all the settings are correct 113 | 114 | -N, --nice 115 | Set the process priority, default 10 116 | 117 | -p 118 | The Cassandra user password, if required for security 119 | 120 | -r, restore 121 | Restore a backup; requires a --gcsbucket path and an optional --backupdir 122 | 123 | -s, --split-size 124 | Split the resulting tar archive into the configured size in megabytes, default 100M 125 | 126 | -S, --service-name 127 | Specify the service name for Cassandra, default is 'cassandra'; used to stop and start the service 128 | 129 | -T, --target-gz-dir 130 | Override the directory to save compressed files in case compression is used 131 | default is --backupdir/compressed, also used to decompress for restore 132 | 133 | -u 134 | The Cassandra user account, if required for security 135 | 136 | -U, --auth-file 137 | A file that contains authentication credentials for cqlsh and nodetool consisting of 138 | two lines: 139 | CASSANDRA_USER=username 140 | CASSANDRA_PASS=password 141 | 142 | -v, --verbose 143 | When provided will print additional information to the log file 144 | 145 | -y, --yaml 146 | Path to the Cassandra yaml configuration file 147 | default: /etc/cassandra/cassandra.yaml 148 | 149 | -z, --zip 150 | Compresses the 
backup files with gzip prior to pushing to Google Cloud Storage. 151 | This option will use additional local disk space; set --target-gz-dir 152 | to use an alternate disk location if free space is an issue 153 | 154 | 155 | ### Cron Examples 156 | - Full gzip-compressed snapshot every day at 1:30 am with nice level 10 157 | 158 | `30 1 * * * /path_to_scripts/cassandra-cloud-backup.sh -zvCc -N 10 -b gs://cass-bk123 -d /var/lib/cassandra/backups backup > /var/log/cassandra/$(date +\%Y\%m\%d\%H\%M\%S)-fbackup.log 2>&1` 159 | 160 | - Incremental gzip-compressed backups copied every hour with nice level 10 161 | 162 | `0 * * * * /path_to_scripts/cassandra-cloud-backup.sh -viz -N 10 -b gs://cass-bk123 -d /var/lib/cassandra/backups backup > /var/log/cassandra/$(date +\%Y\%m\%d\%H\%M\%S)-ibackup.log 2>&1` 163 | 164 | ### Notes 165 | 166 | The script must be run with sufficient privileges to be able to stop/start processes and create/delete directories and files within the working directories. 167 | 168 | The restore command is designed to perform a simple restore of a full snapshot. In the event that you want to restore incremental backups you should start by restoring the last full snapshot prior to your target incremental backup file and manually move the files from each incremental backup in chronological order leading up to the target incremental backup file. The schema dump is included in the snapshot backups, but if necessary it must also be restored manually. 169 | 170 | Snapshots are taken at the system level; the script currently does not support backup or restore of an individual keyspace or column family. 171 | 172 | To enable incremental backups, set `incremental_backups: true` in the cassandra.yaml file. 173 | 174 | ### License 175 | Copyright 2016 Google Inc. All Rights Reserved. 176 | 177 | Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at 178 | http://www.apache.org/licenses/LICENSE-2.0 179 | Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS-IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 180 | 181 | This is not an official Google product. 182 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 
30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 
62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. 
We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /cassandra-cloud-backup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # Copyright 2016 Google Inc. All Rights Reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS-IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # 17 | # 18 | # Description : Take snapshot and incremental backups of Cassandra and copy them to Google Cloud Storage 19 | # Optionally restore full system from snapshot 20 | # This is not an official Google product. 
21 | # 22 | VERSION='1.0' 23 | SCRIPT_NAME="cassandra-cloud-backup.sh" 24 | # Exit on any error 25 | set -e 26 | # Prints the usage for this script 27 | function print_usage() { 28 | echo "Cassandra Backup to Google Cloud Storage Version: ${VERSION}" 29 | cat <<'EOF' 30 | Usage: ./cassandra-cloud-backup.sh [ options ] <command> 31 | Description: 32 | Utility for creating and managing Cassandra backups with Google Cloud Storage. 33 | Run with admin-level privileges. 34 | 35 | The backup command can use gzip or bzip2 for compression, and split large files 36 | into multiple smaller files. If incremental backups are enabled in 37 | Cassandra, this script can incrementally copy them as they are created, saving 38 | time and space. Additionally, this script can be used to clean up old snapshot 39 | and incremental files locally. 40 | 41 | The restore command is designed to perform a simple restore of a full snapshot. 42 | In the event that you want to restore incremental backups you should start by 43 | restoring the last full snapshot prior to your target incremental backup file 44 | and manually move the files from each incremental backup in chronological order 45 | leading up to the target incremental backup file. The schema dump and token ring 46 | are included in the snapshot backups, but if necessary they must also be restored 47 | manually. 48 | 49 | Flags: 50 | -a, --alt-hostname 51 | Specify an alternate server name to be used in the bucket path construction. Used 52 | to create or retrieve backups from different servers 53 | 54 | -B, backup 55 | Default action is to take a backup 56 | 57 | -b, --gcsbucket 58 | Google Cloud Storage bucket used in deployment and by the cluster. 59 | 60 | -c, --clear-old-ss 61 | Clear any old snapshots taken prior to this backup run to save space. 62 | Additionally, this will clear any old incremental backup files taken immediately 63 | following a successful snapshot. 
This option does nothing with the -i flag 64 | 65 | -C, --clear-old-inc 66 | Clear any old incremental backups taken prior to the current snapshot 67 | 68 | -d, --backupdir 69 | The directory in which to store the backup files; be sure that this directory 70 | has enough space and the appropriate permissions 71 | 72 | -D, --download-only 73 | During a restore this will only download the target files from GCS 74 | 75 | -f, --force 76 | Used to force the restore without a confirmation prompt 77 | 78 | -h, --help 79 | Print this help message. 80 | 81 | -H, --home-dir 82 | This is the $CASSANDRA_HOME directory and is only used if the data_directories, 83 | commitlog_directory, or the saved_caches_directory values cannot be parsed out of the 84 | yaml file. 85 | 86 | -i, --incremental 87 | Copy the incremental backup files and do not take a snapshot. Can only 88 | be run when compression is enabled with -z or -j 89 | 90 | -j, --bzip 91 | Compresses the backup files with bzip2 prior to pushing to Google Cloud Storage. 92 | This option will use additional local disk space; set --target-gz-dir 93 | to use an alternate disk location if free space is an issue 94 | 95 | -k, --keep-old 96 | Set this flag on restore to keep a local copy of the old data files 97 | Set this flag on backup to keep a local copy of the compressed backup, schema dump, 98 | and token ring 99 | 100 | -l, --log-dir 101 | Activate logging to file 'CassandraBackup${DATE}.log' from stdout 102 | Include an optional directory path to write the file 103 | Default path is /var/log/cassandra 104 | 105 | -L, --inc-commit-logs 106 | Add commit logs to the backup archive. 
WARNING: This option can cause the script to 107 | fail on an active server as the files roll over 108 | 109 | -n, --noop 110 | Will attempt a dry run and verify all the settings are correct 111 | 112 | -N, --nice 113 | Set the process priority, default 10 114 | 115 | -p 116 | The Cassandra user password, if required for security 117 | 118 | -r, restore 119 | Restore a backup; requires a --gcsbucket path and an optional --backupdir 120 | 121 | -s, --split-size 122 | Split the resulting tar archive into the configured size in megabytes, default 100M 123 | 124 | -S, --service-name 125 | Specify the service name for Cassandra, default is 'cassandra'; used to stop and start the service 126 | 127 | -T, --target-gz-dir 128 | Override the directory to save compressed files in case compression is used 129 | default is --backupdir/compressed, also used to decompress for restore 130 | 131 | -u 132 | The Cassandra user account, if required for security 133 | 134 | -U, --auth-file 135 | A file that contains authentication credentials for cqlsh and nodetool consisting of 136 | two lines: 137 | CASSANDRA_USER=username 138 | CASSANDRA_PASS=password 139 | 140 | -v, --verbose 141 | When provided will print additional information to the log file 142 | 143 | -w, --with-caches 144 | For posterity's sake, use this flag to save the read caches in a backup, although it 145 | likely represents a waste of space 146 | 147 | -y, --yaml 148 | Path to the Cassandra yaml configuration file 149 | default: /etc/cassandra/cassandra.yaml 150 | 151 | -z, --zip 152 | Compresses the backup files with gzip prior to pushing to Google Cloud Storage. 153 | This option will use additional local disk space; set --target-gz-dir 154 | to use an alternate disk location if free space is an issue 155 | 156 | Commands: 157 | backup, restore, inventory, commands, options 158 | 159 | backup Backup the Cassandra node based on the passed-in options 160 | 161 | restore Restore the Cassandra node from a specific snapshot backup 162 | or 
download an incremental backup locally and extract it 163 | 164 | inventory List available backups 165 | 166 | commands List available commands 167 | 168 | options List available options 169 | 170 | Examples: 171 | Take a full snapshot, gzip-compress it with nice=15, 172 | upload it into the GCS bucket, and clear old incremental and snapshot files 173 | ./cassandra-cloud-backup.sh -b gs://cassandra-backups123/ -zCc -N 15 backup 174 | 175 | Do a dry run of a full snapshot with verbose output and 176 | create a list of files that would have been copied 177 | ./cassandra-cloud-backup.sh -b gs://cassandra-backups123/ -vn backup 178 | 179 | Back up and bzip2-compress copies of the most recent incremental 180 | backup files since the last incremental backup 181 | ./cassandra-cloud-backup.sh -b gs://cassandra-backups123/ -ji backup 182 | 183 | Restore a backup without prompting from the specified bucket path and keep the old files locally 184 | ./cassandra-cloud-backup.sh -b gs://cass-bk123/backups/host01/snpsht/2016-01-20_18-57/ -fk restore 185 | 186 | Restore a specific backup to a custom CASSANDRA_HOME directory with secure credentials in 187 | a password.txt file, with Cassandra running as a Linux service named cass 188 | ./cassandra-cloud-backup.sh -b gs://cass-bk123/backups/host01/snpsht/2016-01-20_18-57/ \ 189 | -y /opt/cass/conf/cassandra.yaml -H /opt/cass -U password.txt -S cass restore 190 | 191 | List inventory of available backups stored in Google Cloud Storage 192 | ./cassandra-cloud-backup.sh -b gs://cass-bk123 inventory 193 | 194 | EOF 195 | } 196 | 197 | # List all commands for command completion. 198 | function commands() { 199 | print_usage | sed -n -e '/^Commands:/,/^$/p' | tail -n +2 | head -n -1 | tr -d ',' 200 | } 201 | 202 | # List all options for command completion. 
203 | function options() { 204 | print_usage | grep -E '^ *-' | tr -d ',' 205 | } 206 | 207 | # Override the date function 208 | function prepare_date() { 209 | date "$@" 210 | } 211 | 212 | # Prefix a date prior to echo output 213 | function loginfo() { 214 | 215 | if ${LOG_OUTPUT}; then 216 | echo "$(prepare_date +%F_%H:%M:%S): ${*}" >> "${LOG_FILE}" 217 | else 218 | echo "$(prepare_date +%F_%H:%M:%S): ${*}" 219 | fi 220 | } 221 | 222 | # Only used if -v --verbose is passed in 223 | function logverbose() { 224 | if ${VERBOSE}; then 225 | loginfo "VERBOSE: ${*}" 226 | fi 227 | } 228 | 229 | # Pass errors to stderr. 230 | function logerror() { 231 | loginfo "ERROR: ${*}" >&2 232 | # Avoid 'let ERROR_COUNT++': its post-increment returns the old value (0), which exits under 'set -e' 233 | ERROR_COUNT=$((ERROR_COUNT + 1)) 234 | } 235 | 236 | # Bad option was found. 237 | function print_help() { 238 | logerror "Unknown Option Encountered. For help run '${SCRIPT_NAME} --help'" 239 | print_usage 240 | exit 1 241 | } 242 | 243 | # Validate that all configuration options are correct and no conflicting options are set 244 | function validate() { 245 | touch_logfile 246 | single_script_check 247 | set_auth_string 248 | verbose_vars 249 | loginfo "***************VALIDATING INPUT******************" 250 | if [ -z ${GSUTIL} ]; then 251 | logerror "Cannot find the gsutil utility; please make sure it is in the PATH" 252 | exit 1 253 | fi 254 | if [ -z ${GCS_BUCKET} ]; then 255 | logerror "Please pass in the GCS Bucket to use with this script" 256 | exit 1 257 | else 258 | if ! ${GSUTIL} ls ${GCS_BUCKET} &> /dev/null; then 259 | logerror "Cannot access Google Cloud Storage bucket ${GCS_BUCKET}; make sure" \ 260 | " it exists" 261 | exit 1 262 | fi 263 | fi 264 | if [ ${ACTION} != "inventory" ]; then 265 | if [ -z ${NODETOOL} ]; then 266 | logerror "Cannot find the nodetool utility; please make sure it is in the PATH" 267 | fi 268 | if [ -z ${CQLSH} ]; then 269 | logerror "Cannot find the cqlsh utility; please make sure it is in the PATH" 270 | fi 271 | if [ ! 
-f ${YAML_FILE} ]; then 271 | logerror "Yaml File ${YAML_FILE} does not exist and --yaml argument is missing" 272 | else 273 | #different values are needed for backup and for restore 274 | eval "parse_yaml_${ACTION}" 275 | fi 276 | 277 | if [ -z ${data_file_directories} ]; then 278 | if [ -z ${CASS_HOME} ]; then 279 | logerror "Cannot parse data_file_directories from ${YAML_FILE} and --home-dir argument" \ 280 | " is missing, which should be the \$CASSANDRA_HOME path" 281 | else 282 | data_file_directories="${CASS_HOME}/data" 283 | fi 284 | fi 285 | if ${INCLUDE_COMMIT_LOGS}; then 286 | loginfo "WARNING: Backing up Commit Logs can cause script to fail if server is under load" 287 | fi 288 | if [ -z ${commitlog_directory} ]; then 289 | if [ -z ${CASS_HOME} ]; then 290 | logerror "Cannot parse commitlog_directory from ${YAML_FILE} and --home-dir argument" \ 291 | " is missing, which should be the \$CASSANDRA_HOME path" 292 | else 293 | commitlog_directory="${CASS_HOME}/data/commitlog" 294 | fi 295 | fi 296 | if [ ! -d ${commitlog_directory} ]; then 297 | logerror "commitlog_directory does not exist: ${commitlog_directory} " 298 | fi 299 | if ${INCLUDE_CACHES}; then 300 | loginfo "Backing up saved caches can waste space and time, but proceeding as requested" 301 | fi 302 | if [ -z ${saved_caches_directory} ]; then 303 | if [ -z ${CASS_HOME} ]; then 304 | logerror "Cannot parse saved_caches_directory from ${YAML_FILE} and --home-dir argument" \ 305 | " is missing, which should be the \$CASSANDRA_HOME path" 306 | else 307 | saved_caches_directory="${CASS_HOME}/data/saved_caches" 308 | fi 309 | fi 310 | if [ ! -d ${saved_caches_directory} ]; then 311 | logerror "saved_caches_directory does not exist: ${saved_caches_directory} " 312 | fi 313 | if [ ! -d ${data_file_directories} ]; then 314 | logerror "data_file_directories does not exist: ${data_file_directories} " 315 | fi 316 | #BACKUP_DIR is used to stage backups and stage restores, so create it either way 317 | if [ !
-d ${BACKUP_DIR} ]; then 318 | loginfo "Creating backup directory ${BACKUP_DIR}" 319 | mkdir -p ${BACKUP_DIR} 320 | fi 321 | if ${SPLIT_FILE}; then 322 | SPLIT_FILE_SUFFIX="${TAR_EXT}-${SPLIT_FILE_SUFFIX}" 323 | fi 324 | if [ ! -d ${COMPRESS_DIR} ]; then 325 | loginfo "Creating compression target directory" 326 | mkdir -p ${COMPRESS_DIR} 327 | fi 328 | if [ -z ${TAR} ] || [ -z ${NICE} ]; then 329 | logerror "The tar and nice utilities must be present for this script to run." 330 | fi 331 | if [ ${ACTION} = "restore" ]; then 332 | GCS_LS=$(${GSUTIL} ls ${GCS_BUCKET} | head -n1) 333 | loginfo "GCS first file listed: ${GCS_LS}" 334 | if grep -q 'incr' <<< "${GCS_LS}"; then 335 | loginfo "Detected incremental backup requested for restore. This script " \ 336 | "will only download the files locally" 337 | DOWNLOAD_ONLY=true 338 | INCREMENTAL=true 339 | SUFFIX="incr" 340 | else 341 | if grep -q 'snpsht' <<< "${GCS_LS}"; then 342 | loginfo "Detected full snapshot backup requested for restore." 343 | else 344 | logerror "Detected a Google Cloud Storage bucket path that is not a backup" \ 345 | " location.
Make sure the --gcsbucket is the full path to a specific backup" 346 | fi 347 | fi 348 | if grep -q "tgz" <<< "${GCS_LS}"; then 349 | loginfo "Detected compressed .tgz file for restore" 350 | COMPRESSION=true 351 | TAR_EXT="tgz" 352 | TAR_CFLAG="-z" 353 | fi 354 | if grep -q "tbz" <<< "${GCS_LS}"; then 355 | loginfo "Detected compressed .tbz file for restore" 356 | COMPRESSION=true 357 | TAR_EXT="tbz" 358 | TAR_CFLAG="-j" 359 | fi 360 | if grep -q "tar" <<< "${GCS_LS}"; then 361 | loginfo "Detected uncompressed .tar file for restore" 362 | COMPRESSION=false 363 | TAR_EXT="tar" 364 | TAR_CFLAG="" 365 | fi 366 | RESTORE_FILE=$(awk -F"/" '{print $NF}' <<< "${GCS_LS}") 367 | if [[ "${RESTORE_FILE}" != *.${TAR_EXT} ]] ; then 368 | #detect split files named ${TAR_EXT}-* 369 | if [[ "${RESTORE_FILE}" == ${TAR_EXT}-* ]]; then 370 | SPLIT_FILE=true 371 | loginfo "Split file restore detected" 372 | else 373 | logerror "Restore target is not a tar file: ${GCS_BUCKET}" 374 | fi 375 | fi 376 | if [[ ! ${GCS_BUCKET} =~ ^.*\.${TAR_EXT}$ ]]; then 377 | if ${SPLIT_FILE}; then 378 | #remove the trailing digits and replace the suffix 379 | RESTORE_FILE="${RESTORE_FILE%${SUFFIX}*}${SUFFIX}*" 380 | GCS_BUCKET="${GCS_BUCKET%/}/${RESTORE_FILE}" 381 | else 382 | GCS_BUCKET="${GCS_BUCKET%/}/${RESTORE_FILE}" 383 | loginfo "Fixed up restore bucket path: ${GCS_BUCKET}" 384 | fi 385 | fi 386 | 387 | if grep -q "," <<< "${seed_provider_class_name_parameters_seeds}"; then 388 | loginfo "Restore target node is likely part of a cluster.
Restore script" \ 389 | " will not start node automatically" 390 | AUTO_RESTART=false 391 | fi 392 | loginfo "creating staging directory for restore: ${BACKUP_DIR}/restore" 393 | mkdir -p "${BACKUP_DIR}/restore" 394 | else 395 | if ${INCREMENTAL}; then 396 | if ${CLEAR_INCREMENTALS}; then 397 | logerror "--clear-old-inc option is not compatible with --incremental option" 398 | fi 399 | if ${CLEAR_SNAPSHOTS}; then 400 | logerror "--incremental option is not compatible with --clear-old-ss option" 401 | fi 402 | if [ -z ${incremental_backups} ] || [ ${incremental_backups} = false ]; then 403 | logerror "Cannot copy incremental backups until 'incremental_backups' is true " \ 404 | "in ${YAML_FILE} " 405 | fi 406 | if [ ! -f "${BACKUP_DIR}/last_inc_backup_time" ]; then 407 | touch "${BACKUP_DIR}/last_inc_backup_time" 408 | fi 409 | else 410 | if [ ${CLEAR_INCREMENTALS} = true ] && [ ${incremental_backups} != true ]; then 411 | logerror "Cannot clear incremental backups because 'incremental_backups' is " \ 412 | "false in ${YAML_FILE} " 413 | fi 414 | if [ ! -d "${SCHEMA_DIR}" ]; then 415 | loginfo "Creating schema dump directory: ${SCHEMA_DIR}" 416 | mkdir -p "${SCHEMA_DIR}" 417 | fi 418 | if [ ! 
-d "${TOKEN_RING_DIR}" ]; then 419 | loginfo "Creating token ring dump directory: ${TOKEN_RING_DIR}" 420 | mkdir -p "${TOKEN_RING_DIR}" 421 | fi 422 | fi 423 | fi 424 | else 425 | # ${ACTION} = "inventory" 426 | parse_yaml_inventory 427 | fi 428 | 429 | logverbose "ERROR_COUNT: ${ERROR_COUNT}" 430 | 431 | if [ ${ERROR_COUNT} -gt 0 ]; then 432 | loginfo "*************ERRORS WHILE VALIDATING INPUT*************" 433 | exit 1 434 | fi 435 | loginfo "*************SUCCESSFULLY VALIDATED INPUT**************" 436 | } 437 | 438 | # Print out all the important variables if -v is set 439 | function verbose_vars() { 440 | logverbose "************* PRINTING VARIABLES ****************\n" 441 | logverbose "ACTION: ${ACTION}" 442 | logverbose "AUTO_RESTART: ${AUTO_RESTART}" 443 | logverbose "BACKUP_DIR: ${BACKUP_DIR}" 444 | logverbose "CASSANDRA_PASS: ${CASSANDRA_PASS}" 445 | logverbose "CASSANDRA_USER: ${CASSANDRA_USER}" 446 | logverbose "CASSANDRA_OG: ${CASSANDRA_OG}" 447 | logverbose "CLEAR_INCREMENTALS: ${CLEAR_INCREMENTALS}" 448 | logverbose "CLEAR_SNAPSHOTS: ${CLEAR_SNAPSHOTS}" 449 | logverbose "COMPRESS_DIR: ${COMPRESS_DIR}" 450 | logverbose "COMPRESSION: ${COMPRESSION}" 451 | logverbose "CQLSH: ${CQLSH}" 452 | logverbose "CQLSH_DEFAULT_HOST: ${CQLSH_DEFAULT_HOST}" 453 | logverbose "DATE: ${DATE}" 454 | logverbose "DOWNLOAD_ONLY: ${DOWNLOAD_ONLY}" 455 | logverbose "DRY_RUN: ${DRY_RUN}" 456 | logverbose "FORCE_RESTORE: ${FORCE_RESTORE}" 457 | logverbose "GCS_BUCKET: ${GCS_BUCKET}" 458 | logverbose "GCS_TMPDIR: ${GCS_TMPDIR}" 459 | logverbose "GSUTIL: ${GSUTIL}" 460 | logverbose "HOSTNAME: ${HOSTNAME}" 461 | logverbose "INCREMENTAL: ${INCREMENTAL}" 462 | logverbose "INCLUDE_CACHES: ${INCLUDE_CACHES}" 463 | logverbose "INCLUDE_COMMIT_LOGS: ${INCLUDE_COMMIT_LOGS}" 464 | logverbose "KEEP_OLD_FILES: ${KEEP_OLD_FILES}" 465 | logverbose "LOG_DIR: ${LOG_DIR}" 466 | logverbose "LOG_FILE: ${LOG_FILE}" 467 | logverbose "LOG_OUTPUT: ${LOG_OUTPUT}" 468 | logverbose "NICE: ${NICE}" 469 | 
logverbose "NICE_LEVEL: ${NICE_LEVEL}" 470 | logverbose "NODETOOL: ${NODETOOL}" 471 | logverbose "SCHEMA_DIR: ${SCHEMA_DIR}" 472 | logverbose "TOKEN_RING_DIR: ${TOKEN_RING_DIR}" 473 | logverbose "SERVICE_NAME: ${SERVICE_NAME}" 474 | logverbose "SNAPSHOT_NAME: ${SNAPSHOT_NAME}" 475 | logverbose "SPLIT_FILE: ${SPLIT_FILE}" 476 | logverbose "SPLIT_SIZE: ${SPLIT_SIZE}" 477 | logverbose "SUFFIX: ${SUFFIX}" 478 | logverbose "TARGET_LIST_FILE: ${TARGET_LIST_FILE}" 479 | logverbose "USER_OPTIONS: ${USER_OPTIONS}" 480 | logverbose "USER_FILE: ${USER_FILE}" 481 | logverbose "YAML_FILE: ${YAML_FILE}" 482 | logverbose "************* DONE PRINTING VARIABLES ************\n" 483 | } 484 | 485 | # Check that script is not running more than once 486 | function single_script_check() { 487 | local grep_script 488 | #wraps a [] around the first letter to trick the grep statement into ignoring itself 489 | grep_script="$(echo ${SCRIPT_NAME} | sed 's/^/\[/' | sed 's/^\(.\{2\}\)/\1\]/')" 490 | logverbose "checking that script isn't already running" 491 | logverbose "grep_script: ${grep_script}" 492 | status="$(ps -feww | grep -w "${grep_script}" \ 493 | | awk -v pid="$(echo $$)" '$2 != pid { print $2 }')" 494 | if [ ! -z "${status}" ]; then 495 | logerror " ${SCRIPT_NAME} : Process is already running. Aborting" 496 | exit 1; 497 | fi 498 | } 499 | 500 | # Create the log file if requested 501 | function touch_logfile() { 502 | if [ "${LOG_OUTPUT}" = true ] && [ !
-f "${LOG_FILE}" ]; then 503 | touch "${LOG_FILE}" 504 | fi 505 | } 506 | 507 | # List available backups in GCS 508 | function inventory() { 509 | loginfo "Available Snapshots:" 510 | ${GSUTIL} ls -d "${GCS_BUCKET}/backups/${HOSTNAME}/snpsht/*" 511 | if [ -z $incremental_backups ] || [ $incremental_backups = false ]; then 512 | loginfo "Incremental Backups are not enabled for Cassandra" 513 | fi 514 | loginfo "Available Incremental Backups:" 515 | ${GSUTIL} ls -d "${GCS_BUCKET}/backups/${HOSTNAME}/incr/*" 516 | } 517 | 518 | # This is the main backup function that orchestrates all the options 519 | # to create the backup set and then push it to GCS 520 | function backup() { 521 | create_gcs_backup_path 522 | clear_backup_file_list 523 | if ${CLEAR_SNAPSHOTS}; then 524 | clear_snapshots 525 | fi 526 | if ${INCREMENTAL}; then 527 | find_incrementals 528 | else 529 | export_schema 530 | export_token_ring 531 | take_snapshot 532 | find_snapshots 533 | fi 534 | copy_other_files 535 | if ${SPLIT_FILE}; then 536 | split_archive 537 | else 538 | archive_compress 539 | fi 540 | copy_to_gcs 541 | save_last_inc_backup_time 542 | backup_cleanup 543 | if ${CLEAR_INCREMENTALS}; then 544 | clear_incrementals 545 | fi 546 | } 547 | 548 | #specific variables are needed for backup 549 | function parse_yaml_backup() { 550 | loginfo "Parsing Cassandra Yaml Config Values" 551 | fields=('data_file_directories' \ 552 | 'commitlog_directory' \ 553 | 'saved_caches_directory' \ 554 | 'incremental_backups' \ 555 | 'native_transport_port' \ 556 | 'rpc_address') 557 | parse_yaml ${YAML_FILE} 558 | } 559 | 560 | #specific variables are needed for restore 561 | function parse_yaml_restore() { 562 | loginfo "Parsing Cassandra Yaml Config Values" 563 | fields=('data_file_directories' \ 564 | 'commitlog_directory' \ 565 | 'saved_caches_directory' \ 566 | 'incremental_backups' \ 567 | 'seed_provider_class_name_parameters_seeds') 568 | 569 | parse_yaml ${YAML_FILE} 570 | } 571 | 572 | function 
parse_yaml_inventory() { 573 | fields=('incremental_backups') 574 | parse_yaml ${YAML_FILE} 575 | } 576 | 577 | # Based on https://gist.github.com/pkuczynski/8665367 578 | # 579 | # Works for arrays of hashes, and some hashes with arrays 580 | # Variable names will be underscore delimited based on nested parent names 581 | # Send in yaml file as first argument and create a global array named $fields 582 | # with necessary yaml field names fully underscore delimited to match nesting 583 | # then this will register those variables into the shell's scope if they exist in Yaml 584 | # $VERBOSE=true will also print the full values 585 | # Defaults to indent of 4 so d=4 586 | # To use with indent of 2 change d to 2 587 | function parse_yaml() { 588 | local s 589 | local w 590 | local fs 591 | local d 592 | s='[[:space:]]*' 593 | w='[a-zA-Z0-9_]*' 594 | fs="$(echo @|tr @ '\034')" 595 | d=4 596 | eval $( 597 | sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \ 598 | -e "s|^\($s\)-\?$s\($w\)$s[:-]$s\(.*\)$s\$|\1$fs\2$fs\3|p" $1 | 599 | awk -F"$fs" -v names="${fields[*]}" ' 600 | BEGIN { split(names,n," ") } 601 | { sc=length($1) % "'$d'"; 602 | if ( sc == 0 ) { 603 | indent = length($1)/"'$d'" 604 | } else { 605 | indent = (length($1)+("'$d'"-sc))/"'$d'" 606 | } 607 | vname[indent] = $2; 608 | for (i in vname) {if (i > indent){ delete vname[i];}} 609 | if (length($3) > 0 ) { 610 | vn=""; 611 | for (i=0; i<indent; i++) { 612 | if (length(vname[i]) > 0) {vn=(vn)(vname[i])("_");} 613 | } 614 | ap=""; 615 | if($2 ~ /^ *$/ && vn ~ /_$/) { vn=substr(vn,1,length(vn)-1);ap="+" } 616 | for ( name in n ) { 617 | if ( $2 == n[name] || vn == n[name] || (vn)($2) == n[name]) { 618 | printf("%s%s%s=(\"%s\")\n", vn, $2, ap, $3); 619 | if ("'"$VERBOSE"'" == "true"){ 620 | printf(";logverbose %s%s%s=\\(\\\"%s\\\"\\);", vn, $2, ap, $3); 621 | } 622 | } 623 | } 624 | } 625 | }' 626 | ) 627 | }
then 632 | if [ -n "${USER_FILE}" ] && [ -f "${USER_FILE}" ]; then 633 | source "${USER_FILE}" 634 | fi 635 | if [ -z "${CASSANDRA_USER}" ] || [ -z "${CASSANDRA_PASS}" ]; then 636 | logerror "Cassandra authentication values are missing or empty CASSANDRA_USER or CASSANDRA_PASS" 637 | fi 638 | USER_OPTIONS=" -u ${CASSANDRA_USER} --password ${CASSANDRA_PASS} " 639 | fi 640 | } 641 | 642 | # Set the backup path bucket URL 643 | function create_gcs_backup_path() { 644 | GCS_BACKUP_PATH="${GCS_BUCKET}/backups/${HOSTNAME}/${SUFFIX}/${DATE}/" 645 | loginfo "Will use target backup directory: ${GCS_BACKUP_PATH}" 646 | } 647 | 648 | # In case there is an existing backup file list, clear it out 649 | function clear_backup_file_list() { 650 | loginfo "Clearing target list file: ${TARGET_LIST_FILE}" 651 | > "${TARGET_LIST_FILE}" 652 | } 653 | 654 | # Use nodetool to take a snapshot with a specific name 655 | function take_snapshot() { 656 | loginfo "Taking Snapshot ${SNAPSHOT_NAME}" 657 | #later used to remove older incrementals 658 | SNAPSHOT_TIME=$(prepare_date "+%F %H:%M:%S") 659 | if ${DRY_RUN}; then 660 | loginfo "DRY RUN: ${NODETOOL} ${USER_OPTIONS} snapshot -t ${SNAPSHOT_NAME} " 661 | else 662 | ${NODETOOL} ${USER_OPTIONS} snapshot -t "${SNAPSHOT_NAME}" #2>&1 > ${LOG_FILE} 663 | loginfo "Completed Snapshot ${SNAPSHOT_NAME}" 664 | fi 665 | } 666 | 667 | # Export the whole schema for safety 668 | function export_schema() { 669 | loginfo "Exporting Schema to ${SCHEMA_DIR}/${DATE}-schema.cql" 670 | local cqlsh_host=${rpc_address:-$CQLSH_DEFAULT_HOST} 671 | local cmd 672 | cmd="${CQLSH} ${cqlsh_host} ${native_transport_port} ${USER_OPTIONS} -e 'DESC SCHEMA;'" 673 | if ${DRY_RUN}; then 674 | loginfo "DRY RUN: ${cmd} > ${SCHEMA_DIR}/${DATE}-schema.cql" 675 | else 676 | #cqlsh does not behave consistently when executed directly from inside a script 677 | bash -c "${cmd} > ${SCHEMA_DIR}/${DATE}-schema.cql" 678 | fi 679 | echo "${SCHEMA_DIR}/${DATE}-schema.cql" >> 
"${TARGET_LIST_FILE}" 680 | } 681 | 682 | # Export the whole token ring for safety 683 | function export_token_ring() { 684 | loginfo "Exporting Token Ring to ${TOKEN_RING_DIR}/${DATE}-token-ring" 685 | local cmd 686 | cmd="${NODETOOL} ${USER_OPTIONS} ring" 687 | if ${DRY_RUN}; then 688 | loginfo "DRY RUN: ${cmd} > ${TOKEN_RING_DIR}/${DATE}-token-ring" 689 | else 690 | bash -c "${cmd} > ${TOKEN_RING_DIR}/${DATE}-token-ring" 691 | fi 692 | echo "${TOKEN_RING_DIR}/${DATE}-token-ring" >> "${TARGET_LIST_FILE}" 693 | } 694 | 695 | 696 | # Copy the commit logs, saved caches directory and the yaml config file 697 | function copy_other_files() { 698 | loginfo "Copying caches, commitlogs and config file paths to backup list" 699 | #resolves issue #2 700 | if ${INCLUDE_COMMIT_LOGS}; then 701 | find "${commitlog_directory}" -type f >> "${TARGET_LIST_FILE}" 702 | fi 703 | #resolves issue #3 704 | if ${INCLUDE_CACHES}; then 705 | find "${saved_caches_directory}" -type f >> "${TARGET_LIST_FILE}" 706 | fi 707 | echo "${YAML_FILE}" >> "${TARGET_LIST_FILE}" 708 | } 709 | 710 | # Since incrementals are automatically created as needed 711 | # this script has to find them for each keyspace and then 712 | # compare against a timestamp file to copy only the newest files 713 | function find_incrementals() { 714 | loginfo "Locating Incremental backup files" 715 | LAST_INC_BACKUP_TIME="$(head -n 1 ${BACKUP_DIR}/last_inc_backup_time)" 716 | #take the time before the backup list is compiled 717 | local time_before_find=$(prepare_date "+%F %H:%M:%S") 718 | for i in "${data_file_directories[@]}" 719 | do 720 | if [ -n "${LAST_INC_BACKUP_TIME}" ]; then 721 | find ${i} -mindepth 4 -maxdepth 4 -path '*/backups/*' -type f \ 722 | \( -name "*.db" -o -name "*.crc32" -o -name "*.txt" \) \ 723 | -newermt "${LAST_INC_BACKUP_TIME}" >> "${TARGET_LIST_FILE}" 724 | else 725 | find ${i} -mindepth 4 -maxdepth 4 -path '*/backups/*' -type f \ 726 | \( -name "*.db" -o -name "*.crc32" -o -name "*.txt" \) \ 727 | >>
"${TARGET_LIST_FILE}" 728 | fi 729 | done 730 | #if the file is empty then no new files were found 731 | if [ $(wc -l < "${TARGET_LIST_FILE}") -lt 1 ]; then 732 | loginfo "No new incremental files detected, aborting backup" 733 | exit 0 734 | fi 735 | #store time right before backup list creation to update after successful backup 736 | LAST_INC_BACKUP_TIME=${time_before_find} 737 | } 738 | 739 | # After successful backup, update last_inc_backup_time file 740 | function save_last_inc_backup_time() { 741 | if ! ${DRY_RUN}; then 742 | echo "${LAST_INC_BACKUP_TIME}" > ${BACKUP_DIR}/last_inc_backup_time 743 | fi 744 | } 745 | 746 | # Find snapshots to include in backup 747 | function find_snapshots() { 748 | loginfo "Locating Snapshot ${SNAPSHOT_NAME}" 749 | for i in "${data_file_directories[@]}" 750 | do 751 | find ${i} -path "*/snapshots/${SNAPSHOT_NAME}/*" -type f >> "${TARGET_LIST_FILE}" 752 | done 753 | } 754 | 755 | # Compress contents of backup directory 756 | function archive_compress() { 757 | loginfo "Creating Archive file: ${COMPRESS_DIR}/${ARCHIVE_FILE}" 758 | local cmd 759 | cmd="${NICE} -n${NICE_LEVEL} ${TAR} -pc ${TAR_CFLAG} -f " 760 | cmd+="${COMPRESS_DIR}/${ARCHIVE_FILE} --files-from=${TARGET_LIST_FILE}" 761 | if ${DRY_RUN}; then 762 | loginfo "DRY RUN: ${cmd}" 763 | else 764 | eval "${cmd}" 765 | fi 766 | } 767 | 768 | #For large backup files, this will split the file into multiple smaller files 769 | #which allows for more efficient upload / download from Google Cloud Storage 770 | function split_archive() { 771 | loginfo "Compressing and splitting backup" 772 | local cmd 773 | cmd="(cd ${COMPRESS_DIR} && ${NICE} -n${NICE_LEVEL} ${TAR} -pc ${TAR_CFLAG} -f - " 774 | cmd+="--files-from=${TARGET_LIST_FILE} " 775 | cmd+=" | split -d -b${SPLIT_SIZE} - ${SPLIT_FILE_SUFFIX})" 776 | if ${DRY_RUN}; then 777 | loginfo "DRY RUN: ${cmd}" 778 | else 779 | eval "${cmd}" 780 | fi 781 | } 782 | 783 | # Remove old snapshots to free space 784 |
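The `split_archive`/`restore_split` pair streams a tar archive through `split` on the way out and reassembles the chunks with `cat` on the way back. A minimal sketch of that round trip in a throwaway directory (paths here are illustrative; the real script works in `${COMPRESS_DIR}` and `${BACKUP_DIR}/restore`, and like the script this assumes GNU tar and `split -d`):

```shell
workdir="$(mktemp -d)"
mkdir -p "${workdir}/data" "${workdir}/parts" "${workdir}/restore"
printf 'hello\n' > "${workdir}/data/a.txt"

# Backup side: stream a gzipped tar to stdout and cut it into fixed-size,
# numbered chunks (tgz-00, tgz-01, ...), as split_archive does.
( cd "${workdir}" && tar -pcz -f - data | split -d -b 1k - parts/tgz- )

# Restore side: concatenate the chunks in order and extract from stdin,
# as restore_split does.
( cd "${workdir}/restore" && cat "${workdir}"/parts/tgz-* | tar -xz -f - )

restored="$(cat "${workdir}/restore/data/a.txt")"
rm -rf "${workdir}"
```

Because the numeric chunk suffixes sort lexically (`tgz-00`, `tgz-01`, ...), a plain glob feeds them to `cat` in the correct order.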
function clear_snapshots() { 785 | loginfo "Clearing old Snapshots" 786 | if ${DRY_RUN}; then 787 | loginfo "DRY RUN: did not clear snapshots" 788 | else 789 | $NODETOOL ${USER_OPTIONS} clearsnapshot 790 | fi 791 | } 792 | 793 | # If requested the old incremental backup files will be pruned following the fresh snapshot 794 | #only files not newer than the snapshot time (SNAPSHOT_TIME) are removed 795 | function clear_incrementals() { 796 | loginfo "Clearing old incremental backups" 797 | for i in "${data_file_directories[@]}" 798 | do 799 | if ${DRY_RUN}; then 800 | loginfo "DRY RUN: did not clear old incremental backups" 801 | else 802 | find ${i} -mindepth 4 -maxdepth 4 -path '*/backups/*' -type f \ 803 | \( -name "*.db" -o -name "*.crc32" -o -name "*.txt" \) \ 804 | \! -newermt "${SNAPSHOT_TIME}" -exec rm -f ${VERBOSE_RM} {} \; 805 | fi 806 | done 807 | } 808 | 809 | # Copy the backup files up to the GCS bucket 810 | function copy_to_gcs() { 811 | loginfo "Copying files to ${GCS_BACKUP_PATH}" 812 | if ${DRY_RUN}; then 813 | if ${SPLIT_FILE}; then 814 | loginfo "DRY RUN: ${GSUTIL} -m cp ${COMPRESS_DIR}/${SPLIT_FILE_SUFFIX}* ${GCS_BACKUP_PATH}" 815 | else 816 | loginfo "DRY RUN: ${GSUTIL} cp ${COMPRESS_DIR}/${ARCHIVE_FILE} ${GCS_BACKUP_PATH}" 817 | fi 818 | else 819 | if ${SPLIT_FILE}; then 820 | ${GSUTIL} -m cp "${COMPRESS_DIR}/${SPLIT_FILE_SUFFIX}*" "${GCS_BACKUP_PATH}" 821 | else 822 | ${GSUTIL} cp "${COMPRESS_DIR}/${ARCHIVE_FILE}" "${GCS_BACKUP_PATH}" 823 | fi 824 | fi 825 | } 826 | 827 | # This will optionally go through and delete files generated by the backup 828 | # if the -k --keep-old flag is set then it will not delete these files 829 | function backup_cleanup() { 830 | if ${DRY_RUN}; then 831 | loginfo "DRY RUN: Would have deleted old backup files" 832 | else 833 | if ${KEEP_OLD_FILES}; then 834 | loginfo "Keeping backup files:" 835 | loginfo " ${COMPRESS_DIR}/*" 836 | loginfo " ${SCHEMA_DIR}/${DATE}-schema.cql" 837 | loginfo "
${TOKEN_RING_DIR}/${DATE}-token-ring" 838 | else 839 | loginfo "Deleting backup files" 840 | find "${COMPRESS_DIR}/" -type f -exec rm -f ${VERBOSE_RM} {} \; 841 | find "${SCHEMA_DIR}/" -type f -exec rm -f ${VERBOSE_RM} {} \; 842 | find "${TOKEN_RING_DIR}/" -type f -exec rm -f ${VERBOSE_RM} {} \; 843 | rm -f ${VERBOSE_RM} ${TARGET_LIST_FILE} 844 | fi 845 | fi 846 | } 847 | 848 | # This restore function is designed to perform a simple restore of a full snapshot 849 | # In the event that you want to restore incremental backups you should start by 850 | # restoring the last full snapshot prior to your target incremental backup file 851 | # and manually move the files from each incremental file in chronological order 852 | # leading up to the target incremental backup file 853 | function restore() { 854 | loginfo "****NOTE: Simple restore procedure activated*****************" 855 | loginfo "****NOTE: Restore requires a full snapshot backup************" 856 | loginfo "****NOTE: Incremental backups must be manually restored******\n" 857 | restore_get_files 858 | if ${DOWNLOAD_ONLY} ; then 859 | loginfo "Backup file downloaded to ${BACKUP_DIR}/restore, this script will only" \ 860 | " restore a full snapshot" 861 | loginfo "You must manually restore incremental files in sequence after first" \ 862 | "restoring the last full snapshot taken prior to your incremental file's creation date" 863 | exit 0 864 | else 865 | restore_confirm 866 | restore_stop_cassandra 867 | restore_files 868 | restore_start_cassandra 869 | restore_cleanup 870 | fi 871 | } 872 | 873 | # Orchestrate the retrieval and extraction of the files to recover 874 | function restore_get_files() { 875 | loginfo "Starting file retrieval process" 876 | if ${DRY_RUN}; then 877 | loginfo "DRY RUN: Would have cleared restore dir ${BACKUP_DIR}/restore/*" 878 | else 879 | rm -rf ${VERBOSE_RM} ${BACKUP_DIR}/restore/* 880 | fi 881 | if ${SPLIT_FILE}; then 882 | restore_split_from_gcs 883 | else 884 | 
restore_compressed_from_gcs 885 | fi 886 | 887 | } 888 | 889 | # Download the split backup files from GCS 890 | function restore_split_from_gcs() { 891 | loginfo "Downloading restore files from GCS" 892 | if ${DRY_RUN}; then 893 | loginfo "DRY RUN: ${GSUTIL} -m cp -r ${GCS_BUCKET} ${COMPRESS_DIR}" 894 | else 895 | ${GSUTIL} -m cp -r "${GCS_BUCKET}" "${COMPRESS_DIR}" 896 | fi 897 | restore_split 898 | } 899 | 900 | # Retrieve the compressed backup file 901 | function restore_compressed_from_gcs() { 902 | if ${DRY_RUN}; then 903 | loginfo "DRY RUN: ${GSUTIL} cp ${GCS_BUCKET} ${COMPRESS_DIR}" 904 | else 905 | #copy the tar.gz file 906 | ${GSUTIL} cp "${GCS_BUCKET}" "${COMPRESS_DIR}" 907 | fi 908 | restore_decompress 909 | } 910 | 911 | # Extract the compressed backup file 912 | function restore_decompress() { 913 | loginfo "Decompressing restore files" 914 | local cmd 915 | cmd="${NICE} -n${NICE_LEVEL} ${TAR} -x ${TAR_CFLAG} " 916 | cmd+="-f ${COMPRESS_DIR}/${RESTORE_FILE} -C ${BACKUP_DIR}/restore/" 917 | if ${DRY_RUN}; then 918 | loginfo "DRY RUN: ${cmd}" 919 | else 920 | eval "${cmd}" 921 | fi 922 | } 923 | 924 | # Concatenate the split backup files and extract them 925 | function restore_split() { 926 | loginfo "Concatenating split archive and extracting files" 927 | local cmd 928 | cmd="(cd ${BACKUP_DIR}/restore/ && ${NICE} -n${NICE_LEVEL} " 929 | cmd+="cat ${COMPRESS_DIR}/${RESTORE_FILE} | ${TAR} -x ${TAR_CFLAG} " 930 | cmd+="-f - -C ${BACKUP_DIR}/restore/ )" 931 | if ${DRY_RUN}; then 932 | loginfo "DRY RUN: ${cmd}" 933 | else 934 | eval "${cmd}" 935 | fi 936 | } 937 | 938 | # The archive commands save permissions but the new directories need this 939 | # @param directory path to chown 940 | function restore_fix_perms() { 941 | loginfo "Fixing file ownership" 942 | if ${DRY_RUN}; then 943 | loginfo "DRY RUN: chown -R ${CASSANDRA_OG} ${1} " 944 | else 945 | chown -R ${CASSANDRA_OG} ${1} 946 | fi 947 | } 948 | 949 | # Do the heavy lifting of moving the files from
the restore directory back to the 950 | # correct target directories. This will also rename the current important directories 951 | # in order to keep a local copy to roll back. It then moves the restored snapshot files into place 952 | function restore_files() { 953 | loginfo "Attempting to restore files" 954 | #temporarily move current files 955 | if ${DRY_RUN}; then 956 | loginfo "DRY RUN: Copying files from ${BACKUP_DIR}/restore/" 957 | else 958 | for i in "${data_file_directories[@]}" 959 | do 960 | loginfo "Renaming ${i} to ${i}_old_${DATE} if anything fails, manually rename it" 961 | mv "${i}" "${i}_old_${DATE}" 962 | done 963 | 964 | loginfo "Renaming ${commitlog_directory} to ${commitlog_directory}_old_${DATE} "\ 965 | "if anything fails, manually rename it" 966 | mv "${commitlog_directory}" "${commitlog_directory}_old_${DATE}" 967 | 968 | loginfo "Renaming ${saved_caches_directory} to ${saved_caches_directory}_old_${DATE}"\ 969 | " if anything fails, manually rename it" 970 | mv "${saved_caches_directory}" "${saved_caches_directory}_old_${DATE}" 971 | 972 | #copy the full paths back to the root directory, excluding the yaml file 973 | mkdir -p "${commitlog_directory}" 974 | restore_fix_perms "${commitlog_directory}" 975 | mkdir -p "${saved_caches_directory}" 976 | restore_fix_perms "${saved_caches_directory}" 977 | loginfo "Performing rsync commitlogs and caches from restore directory to full path" 978 | if [ -d "${BACKUP_DIR}/restore${commitlog_directory}" ]; then 979 | rsync -aH ${VERBOSE_RSYNC} ${BACKUP_DIR}/restore${commitlog_directory}/* ${commitlog_directory}/ 980 | fi 981 | if [ -d "${BACKUP_DIR}/restore${saved_caches_directory}" ]; then 982 | rsync -aH ${VERBOSE_RSYNC} ${BACKUP_DIR}/restore${saved_caches_directory}/* ${saved_caches_directory}/ 983 | fi 984 | 985 | for i in "${data_file_directories[@]}" 986 | do 987 | #have to recreate it since we moved the old one for safety 988 | mkdir -p ${i} && restore_fix_perms ${i} 989 | loginfo "Performing rsync data files from
restore directory to full path ${i}" 990 | rsync -aH ${VERBOSE_RSYNC} ${BACKUP_DIR}/restore${i}/* ${i}/ 991 | loginfo "Moving snapshot files up two directories to their keyspace base directories" 992 | #assume the snap* pattern is safe since no other 993 | # snapshots should have been copied in the backup process 994 | find ${i} -mindepth 2 -path '*/snapshots/snap*/*' -type f \ 995 | -exec bash -c 'dir={}&& cd ${dir%/*} && mv {} ../..' \; 996 | restore_fix_perms ${i} 997 | done 998 | fi 999 | } 1000 | 1001 | # Stop the Cassandra service after flushing the transaction logs 1002 | # since we're doing a full restore in this case flushing is irrelevant 1003 | # but in future versions of this script there should be the option 1004 | # to restore a specific keyspace and stopping would require a flush first 1005 | function restore_stop_cassandra() { 1006 | if ${DRY_RUN}; then 1007 | loginfo "DRY RUN: Flushing and Stopping Cassandra" 1008 | loginfo "DRY RUN: $NODETOOL ${USER_OPTIONS} " \ 1009 | "flush; service $SERVICE_NAME stop" 1010 | else 1011 | set +e 1012 | #the following status script often throws an error, ignore it 1013 | if $NODETOOL ${USER_OPTIONS} status | grep -q "Connection refused"; then 1014 | loginfo "Attempted to Stop Cassandra service but it seems to already be stopped" 1015 | else 1016 | $NODETOOL ${USER_OPTIONS} flush 1017 | loginfo "Stopping Cassandra Service ${SERVICE_NAME} and sleep for 10 seconds" 1018 | service ${SERVICE_NAME} stop 1019 | sleep 10 1020 | fi 1021 | set -e 1022 | fi 1023 | } 1024 | 1025 | # If Cassandra is not part of a cluster then restart it. 
If it is part of a cluster, 1026 | # the restore must be completed for every node before restarting them, or the newer 1027 | # data on the other nodes will overwrite the old data that was just restored 1028 | function restore_start_cassandra() { 1029 | if ${DRY_RUN}; then 1030 | loginfo "DRY RUN: Starting Cassandra" 1031 | else 1032 | if "${AUTO_RESTART}"; then 1033 | service ${SERVICE_NAME} start 1034 | fi 1035 | fi 1036 | } 1037 | 1038 | # This will optionally go through and delete any copies of old data files 1039 | #if the -k --keep-old flag is set then it will not delete these files 1040 | function restore_cleanup() { 1041 | if ${DRY_RUN}; then 1042 | loginfo "DRY RUN: Would have deleted old data files" 1043 | else 1044 | if ${KEEP_OLD_FILES}; then 1045 | loginfo "Keeping old files:" 1046 | loginfo " ${commitlog_directory}_old_${DATE}" 1047 | loginfo " ${saved_caches_directory}_old_${DATE}" 1048 | loginfo " ${BACKUP_DIR}/restore/" 1049 | else 1050 | loginfo "Deleting old files" 1051 | rm -rf ${VERBOSE_RM} "${commitlog_directory}_old_${DATE}" 1052 | rm -rf ${VERBOSE_RM} "${saved_caches_directory}_old_${DATE}" 1053 | rm -rf ${VERBOSE_RM} "${BACKUP_DIR}/restore/" 1054 | fi 1055 | 1056 | for i in "${data_file_directories[@]}" 1057 | do 1058 | if ${KEEP_OLD_FILES}; then 1059 | loginfo "keeping old data: ${i}_old_${DATE}" 1060 | else 1061 | loginfo "deleting old data: ${i}_old_${DATE}" 1062 | rm -rf ${VERBOSE_RM} "${i}_old_${DATE}" 1063 | fi 1064 | done 1065 | rm -rf ${VERBOSE_RM} ${BACKUP_DIR}/restore 1066 | rm -rf ${VERBOSE_RM} ${COMPRESS_DIR:?"aborting bad compress_dir"}/* 1067 | fi 1068 | } 1069 | 1070 | #restore should be performed by a person 1071 | #the -f option will force restore without confirmation 1072 | function restore_confirm() { 1073 | 1074 | if ${FORCE_RESTORE}; then 1075 | return 1076 | fi 1077 | while true 1078 | do 1079 | read -p "Confirm: Stop Cassandra and restore the files \ 1080 | from ${BACKUP_DIR}/restore? 
Y or N" ans 1081 | case $ans in 1082 | [yY]* ) 1083 | echo "Okay, commencing restore"; 1084 | break 1085 | ;; 1086 | [nN]* ) 1087 | loginfo "Exiting restore" 1088 | exit 0 1089 | break 1090 | ;; 1091 | 1092 | * ) 1093 | echo "Enter Y or N, please."; 1094 | ;; 1095 | esac 1096 | done 1097 | } 1098 | 1099 | # Transform long options to short ones 1100 | for arg in "$@"; do 1101 | shift 1102 | case "$arg" in 1103 | 1104 | "backup") set -- "$@" "-B" ;; 1105 | "restore") set -- "$@" "-r" ;; 1106 | "commands") 1107 | commands 1108 | exit 0 1109 | ;; 1110 | "options") 1111 | options 1112 | exit 0 1113 | ;; 1114 | "inventory") set -- "$@" "-I" ;; 1115 | "--alt-hostname") set -- "$@" "-a" ;; 1116 | "--auth-file") set -- "$@" "-U" ;; 1117 | "--gcsbucket") set -- "$@" "-b" ;; 1118 | "--backupdir") set -- "$@" "-d" ;; 1119 | "--bzip") set -- "$@" "-j" ;; 1120 | "--clear-old-ss") set -- "$@" "-c" ;; 1121 | "--clear-old-inc") set -- "$@" "-C" ;; 1122 | "--download-only") set -- "$@" "-D" ;; 1123 | "--force") set -- "$@" "-f" ;; 1124 | "--help") set -- "$@" "-h" ;; 1125 | "--home-dir") set -- "$@" "-H" ;; 1126 | "--inc-commit-logs") set -- "$@" "-L" ;; 1127 | "--incremental") set -- "$@" "-i" ;; 1128 | "--log-dir") set -- "$@" "-l" ;; 1129 | "--keep-old") set -- "$@" "-k" ;; 1130 | "--noop") set -- "$@" "-n" ;; 1131 | "--nice") set -- "$@" "-N" ;; 1132 | "--service-name") set -- "$@" "-S" ;; 1133 | "--split-size") set -- "$@" "-s" ;; 1134 | "--target-gz-dir") set -- "$@" "-T" ;; 1135 | "--verbose") set -- "$@" "-v" ;; 1136 | "--with-caches") set -- "$@" "-w" ;; 1137 | "--yaml") set -- "$@" "-y" ;; 1138 | "--zip") set -- "$@" "-z" ;; 1139 | *) set -- "$@" "$arg" 1140 | esac 1141 | done 1142 | 1143 | while getopts 'a:b:BcCd:DfhH:iIjkl:LnN:p:rs:S:T:u:U:vwy:z' OPTION 1144 | do 1145 | case $OPTION in 1146 | a) 1147 | HOSTNAME=${OPTARG} 1148 | ;; 1149 | b) 1150 | GCS_BUCKET=${OPTARG%/} 1151 | ;; 1152 | B) 1153 | ACTION="backup" 1154 | ;; 1155 | c) 1156 | CLEAR_SNAPSHOTS=true 1157 | ;; 
    C)
      CLEAR_INCREMENTALS=true
      ;;
    d)
      BACKUP_DIR=${OPTARG}
      ;;
    D)
      DOWNLOAD_ONLY=true
      ;;
    f)
      FORCE_RESTORE=true
      ;;
    h)
      print_usage
      exit 0
      ;;
    H)
      CASS_HOME=${OPTARG%/}
      ;;
    i)
      INCREMENTAL=true
      ;;
    I)
      ACTION="inventory"
      ;;
    j)
      BZIP=true
      COMPRESSION=true
      TAR_CFLAG="-j"
      TAR_EXT="tbz"
      ;;
    k)
      KEEP_OLD_FILES=true
      ;;
    l)
      LOG_OUTPUT=true
      [ -d "${OPTARG}" ] && LOG_DIR=${OPTARG%/}
      ;;
    L)
      INCLUDE_COMMIT_LOGS=true
      ;;
    n)
      DRY_RUN=true
      ;;
    N)
      NICE_LEVEL=${OPTARG}
      ;;
    p)
      CASSANDRA_PASS=${OPTARG}
      ;;
    r)
      ACTION="restore"
      ;;
    s)
      SPLIT_SIZE="${OPTARG//[a-zA-Z]/}M" # strip any letters, then append M
      SPLIT_FILE=true
      ;;
    S)
      SERVICE_NAME=${OPTARG}
      ;;
    T)
      COMPRESS_DIR=${OPTARG%/}
      ;;
    u)
      CASSANDRA_USER=${OPTARG}
      USE_AUTH=true
      ;;
    U)
      USER_FILE=${OPTARG}
      USE_AUTH=true
      ;;
    v)
      VERBOSE=true
      ;;
    w)
      INCLUDE_CACHES=true
      ;;
    y)
      YAML_FILE=${OPTARG}
      ;;
    z)
      COMPRESSION=true
      TAR_CFLAG="-z"
      TAR_EXT="tgz"
      ;;
    ?)
      print_help
      ;;
  esac
done

ACTION=${ACTION:-backup} # backup, restore, or inventory
AGE=5 # incremental backups last modified more than this many minutes ago are pruned
AUTO_RESTART=true # set to false if Cassandra is part of a cluster
BACKUP_DIR=${BACKUP_DIR:-/cassandra/backups} # backups base directory
BZIP=${BZIP:-false} # use bzip2 compression
CASSANDRA_PASS=${CASSANDRA_PASS:-''} # password for Cassandra CQLSH account
CASSANDRA_USER=${CASSANDRA_USER:-''} # username for Cassandra CQLSH account
CASSANDRA_OG="cassandra:cassandra" # modify this if you changed the system cassandra user and group
CLEAR_INCREMENTALS=${CLEAR_INCREMENTALS:-false} # delete incrementals after the snapshot
CLEAR_SNAPSHOTS=${CLEAR_SNAPSHOTS:-false} # clear old snapshots before taking a new one
COMPRESS_DIR=${COMPRESS_DIR:-${BACKUP_DIR}/compressed} # directory to house the backup archive
COMPRESSION=${COMPRESSION:-false} # flag to use tar+gzip
CQLSH="$(which cqlsh)" # path to the cqlsh command
CQLSH_DEFAULT_HOST="127.0.0.1" # cqlsh host - currently hard coded
DATE="$(date +%F_%H-%M)" # nicely formatted date string for files
DOWNLOAD_ONLY=${DOWNLOAD_ONLY:-false} # user flag, also set when an incremental restore is requested
DRY_RUN=${DRY_RUN:-false} # only print what would have been executed
ERROR_COUNT=0 # used in the validation step; the script exits if > 0
FORCE_RESTORE=${FORCE_RESTORE:-false} # bypass the restore confirmation prompt
GSUTIL="$(which gsutil)" # path to the gsutil script
HOSTNAME=${HOSTNAME:-"$(hostname)"} # used for the GCS backup location
INCLUDE_CACHES=${INCLUDE_CACHES:-false} # include the saved caches for posterity
INCLUDE_COMMIT_LOGS=${INCLUDE_COMMIT_LOGS:-false} # include the commit logs for extra safety
INCREMENTAL=${INCREMENTAL:-false} # back up only incremental files
KEEP_OLD_FILES=${KEEP_OLD_FILES:-false} # keep the pre-restore copies of old data files
LOG_DIR=${LOG_DIR:-/var/log/cassandra} # where to write the log files
LOG_FILE="${LOG_DIR}/CassandraBackup${DATE}.log" # script log file
LOG_OUTPUT=${LOG_OUTPUT:-false} # write output to the log file instead of stdout
NICE="$(which nice)" # path to nice, for low-impact tar
NICE_LEVEL=${NICE_LEVEL:-10} # 10 is the default nice level
NODETOOL="$(which nodetool)" # path to nodetool
USER_OPTIONS="" # nodetool and cqlsh options
SCHEMA_DIR="${BACKUP_DIR}/schema" # schema backups directory
TOKEN_RING_DIR="${BACKUP_DIR}/token_ring" # token ring backups directory
SERVICE_NAME=${SERVICE_NAME:-cassandra} # sometimes the service name is different
SNAPSHOT_NAME=snap-${DATE} # name of the new snapshot to take
SNAPSHOT_TIME="" # used to keep track of when the snapshot was taken
SPLIT_SIZE=${SPLIT_SIZE:-"100M"} # size of the pieces if the split command is used
SPLIT_FILE=${SPLIT_FILE:-false} # whether or not to run split on the backup archive
SUFFIX="snpsht" # differentiates the two types of backup files
${INCREMENTAL} && SUFFIX="incr" # in incremental mode, change the file suffix
TAR="$(which tar)" # path to the tar command
TAR_EXT=${TAR_EXT:-tar} # archive extension: tar by default, tgz/tbz when compression is requested
TAR_CFLAG=${TAR_CFLAG:-""} # tar compression flag: -z for gzip, -j for bzip2
#TARGET_SCHEMA=${TARGET_SCHEMA:-schema} # restore a specific schema, not implemented yet
USE_AUTH=${USE_AUTH:-false} # use cqlsh authentication
VERBOSE=${VERBOSE:-false} # print detailed information
VERBOSE_RSYNC="" # add more detail to rsync when verbose mode is active
${VERBOSE} && VERBOSE_RSYNC="-v --progress"
VERBOSE_RM="" # add more detail to rm when verbose mode is active
${VERBOSE} && VERBOSE_RM="-v"
YAML_FILE=${YAML_FILE:-/etc/cassandra/cassandra.yaml} # Cassandra config file
ARCHIVE_FILE="cass-${DATE}-${SUFFIX}.${TAR_EXT}"
SPLIT_FILE_SUFFIX="cass-${DATE}-${SUFFIX}"
TARGET_LIST_FILE="${BACKUP_DIR}/${SUFFIX}_backup_files-${DATE}"
LAST_INC_BACKUP_TIME="" # used to keep track of the incremental backup time

# Validate input
validate
# Execute the requested action
eval $ACTION
--------------------------------------------------------------------------------
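The script's two-pass argument handling — rewrite long options into their short forms with `set --`, then parse everything with a single `getopts` loop — can be exercised standalone. A minimal sketch of the same technique (a hypothetical demo, not part of cassandra-cloud-backup.sh; the `parse_demo` wrapper and its option set are invented for illustration):

```shell
#!/usr/bin/env bash
# Sketch of the long-to-short option translation technique used above.
parse_demo() {
  local VERBOSE=false BUCKET=""
  # Pass 1: translate long options into short ones by rebuilding "$@".
  # "for arg in \"$@\"" snapshots the argument list before the body runs,
  # so the shift/set -- mutations inside the loop are safe.
  for arg in "$@"; do
    shift
    case "$arg" in
      "--verbose")   set -- "$@" "-v" ;;
      "--gcsbucket") set -- "$@" "-b" ;;
      *)             set -- "$@" "$arg" ;; # pass everything else through
    esac
  done
  # Pass 2: a single getopts loop now handles both spellings.
  local OPTIND=1 OPTION
  while getopts 'b:v' OPTION; do
    case $OPTION in
      b) BUCKET=${OPTARG} ;;
      v) VERBOSE=true ;;
    esac
  done
  echo "verbose=${VERBOSE} bucket=${BUCKET}"
}

parse_demo --verbose --gcsbucket gs://example-bucket
# prints: verbose=true bucket=gs://example-bucket
```

Note that an option's value (here `gs://example-bucket`) is shifted and re-appended unchanged by the `*)` arm, so it still immediately follows its rewritten flag when `getopts` consumes it.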