├── .gitignore ├── AUTHORS ├── CONTRIBUTING.md ├── CONTRIBUTORS ├── LICENSE ├── MAINTAINERS ├── README.md ├── archiver_sample.ini ├── diagram.asciio ├── osarchiver ├── __init__.py ├── archiver.py ├── common │ ├── __init__.py │ └── db.py ├── config.py ├── destination │ ├── __init__.py │ ├── base.py │ ├── db │ │ ├── __init__.py │ │ ├── db.py │ │ └── errors.py │ └── file │ │ ├── __init__.py │ │ ├── base.py │ │ ├── csv.py │ │ ├── remote_store │ │ ├── __init__.py │ │ ├── base.py │ │ └── swift.py │ │ └── sql.py ├── errors.py ├── main.py └── source │ ├── __init__.py │ ├── base.py │ └── db.py ├── requirements.txt ├── setup.cfg └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | archiver.ini 2 | # Created by https://www.gitignore.io/api/python,vim 3 | # Edit at https://www.gitignore.io/?templates=python,vim 4 | 5 | ### Python ### 6 | # Byte-compiled / optimized / DLL files 7 | __pycache__/ 8 | *.py[cod] 9 | *$py.class 10 | 11 | # C extensions 12 | *.so 13 | 14 | # Distribution / packaging 15 | .Python 16 | build/ 17 | develop-eggs/ 18 | dist/ 19 | downloads/ 20 | eggs/ 21 | .eggs/ 22 | lib/ 23 | lib64/ 24 | parts/ 25 | sdist/ 26 | var/ 27 | wheels/ 28 | share/python-wheels/ 29 | *.egg-info/ 30 | .installed.cfg 31 | *.egg 32 | MANIFEST 33 | 34 | # PyInstaller 35 | # Usually these files are written by a python script from a template 36 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 37 | *.manifest 38 | *.spec 39 | 40 | # Installer logs 41 | pip-log.txt 42 | pip-delete-this-directory.txt 43 | 44 | # Unit test / coverage reports 45 | htmlcov/ 46 | .tox/ 47 | .nox/ 48 | .coverage 49 | .coverage.* 50 | .cache 51 | nosetests.xml 52 | coverage.xml 53 | *.cover 54 | .hypothesis/ 55 | .pytest_cache/ 56 | 57 | # Translations 58 | *.mo 59 | *.pot 60 | 61 | # Django stuff: 62 | *.log 63 | local_settings.py 64 | db.sqlite3 65 | 66 | # Flask stuff: 67 | instance/ 68 | .webassets-cache 69 | 70 | # Scrapy stuff: 71 | .scrapy 72 | 73 | # Sphinx documentation 74 | docs/_build/ 75 | 76 | # PyBuilder 77 | target/ 78 | 79 | # Jupyter Notebook 80 | .ipynb_checkpoints 81 | 82 | # IPython 83 | profile_default/ 84 | ipython_config.py 85 | 86 | # pyenv 87 | .python-version 88 | 89 | # celery beat schedule file 90 | celerybeat-schedule 91 | 92 | # SageMath parsed files 93 | *.sage.py 94 | 95 | # Environments 96 | .env 97 | .venv 98 | env/ 99 | venv/ 100 | ENV/ 101 | env.bak/ 102 | venv.bak/ 103 | 104 | # Spyder project settings 105 | .spyderproject 106 | .spyproject 107 | 108 | # Rope project settings 109 | .ropeproject 110 | 111 | # mkdocs documentation 112 | /site 113 | 114 | # mypy 115 | .mypy_cache/ 116 | .dmypy.json 117 | dmypy.json 118 | 119 | # Pyre type checker 120 | .pyre/ 121 | 122 | ### Python Patch ### 123 | .venv/ 124 | 125 | ### Vim ### 126 | # Swap 127 | [._]*.s[a-v][a-z] 128 | [._]*.sw[a-p] 129 | [._]s[a-rt-v][a-z] 130 | [._]ss[a-gi-z] 131 | [._]sw[a-p] 132 | 133 | # Session 134 | Session.vim 135 | 136 | # Temporary 137 | .netrwhist 138 | *~ 139 | # Auto-generated tag files 140 | tags 141 | # Persistent undo 142 | [._]*.un~ 143 | 144 | # End of https://www.gitignore.io/api/python,vim 145 | -------------------------------------------------------------------------------- /AUTHORS: -------------------------------------------------------------------------------- 1 | # This is the official list of authors for copyright purposes. 
2 | # This file is distinct from the CONTRIBUTORS file
3 | # and it lists the copyright holders only.
4 | 
5 | # Names should be added to this file as one of
6 | #     Organization's name
7 | #     Individual's name
8 | #     Individual's name <email address>
9 | # See CONTRIBUTORS for the meaning of multiple email addresses.
10 | 
11 | # Please keep the list sorted.
12 | #
13 | 
14 | OVH SAS
15 | 
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to OSArchiver
2 | 
3 | This project accepts contributions. In order to contribute, you should
4 | pay attention to a few things:
5 | 
6 | 1. your code must follow the coding style rules
7 | 2. your code must be unit-tested
8 | 3. your code must be documented
9 | 4. your work must be signed (see below)
10 | 5. you may contribute through GitHub Pull Requests
11 | 
12 | # Coding and documentation Style
13 | - The coding style follows [PEP-8: Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) (~100 chars/line is a good limit)
14 | - The documentation style follows [PEP-257: Docstring Conventions](https://www.python.org/dev/peps/pep-0257/)
15 | 
16 | A good practice is to frequently run your code through [pylint](https://www.pylint.org/) and make sure the code's grade does not decrease.
17 | 
18 | # Submitting Modifications
19 | 
20 | Contributions should be submitted through GitHub Pull Requests
21 | and follow the DCO which is defined below.
22 | 
23 | # Licensing for new files
24 | 
25 | OSArchiver is licensed under a Modified 3-Clause BSD license. Anything
26 | contributed to OSArchiver must be released under this license.
27 | 
28 | When introducing a new file into the project, please make sure it has a
29 | copyright header making clear under which license it's being released.
30 | 
31 | # Developer Certificate of Origin (DCO)
32 | 
33 | To improve tracking of contributions to this project we will use a
34 | process modeled on the modified DCO 1.1 and use a "sign-off" procedure
35 | on patches that are being emailed around or contributed in any other
36 | way.
37 | 
38 | The sign-off is a simple line at the end of the explanation for the
39 | patch, which certifies that you wrote it or otherwise have the right
40 | to pass it on as an open-source patch. The rules are pretty simple:
41 | if you can certify the below:
42 | 
43 | By making a contribution to this project, I certify that:
44 | 
45 | (a) The contribution was created in whole or in part by me and I have
46 |     the right to submit it under the open source license indicated in
47 |     the file; or
48 | 
49 | (b) The contribution is based upon previous work that, to the best of
50 |     my knowledge, is covered under an appropriate open source License
51 |     and I have the right under that license to submit that work with
52 |     modifications, whether created in whole or in part by me, under
53 |     the same open source license (unless I am permitted to submit
54 |     under a different license), as indicated in the file; or
55 | 
56 | (c) The contribution was provided directly to me by some other person
57 |     who certified (a), (b) or (c) and I have not modified it.
58 | 
59 | (d) The contribution is made free of any other party's intellectual
60 |     property claims or rights.
61 | 62 | (e) I understand and agree that this project and the contribution are 63 | public and that a record of the contribution (including all 64 | personal information I submit with it, including my sign-off) is 65 | maintained indefinitely and may be redistributed consistent with 66 | this project or the open source license(s) involved. 67 | 68 | 69 | then you just add a line saying 70 | 71 | Signed-off-by: Random J Developer 72 | 73 | using your real name (sorry, no pseudonyms or anonymous contributions.) 74 | -------------------------------------------------------------------------------- /CONTRIBUTORS: -------------------------------------------------------------------------------- 1 | # This is the official list of people who can contribute 2 | # (and typically have contributed) code to the OSArchiver repository. 3 | 4 | Pierre-Samuel Le Stang 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2019, OVH SAS. 2 | All rights reserved. 3 | Modified 3-Clause BSD 4 | 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | * Redistributions of source code must retain the above copyright 10 | notice, this list of conditions and the following disclaimer. 11 | * Redistributions in binary form must reproduce the above copyright 12 | notice, this list of conditions and the following disclaimer in the 13 | documentation and/or other materials provided with the distribution. 14 | * Neither the name of OVH SAS nor the 15 | names of its contributors may be used to endorse or promote products 16 | derived from this software without specific prior written permission. 17 | 18 | THIS SOFTWARE IS PROVIDED BY OVH SAS AND CONTRIBUTORS ``AS IS'' AND ANY 19 | EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 20 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 21 | DISCLAIMED. IN NO EVENT SHALL OVH SAS AND CONTRIBUTORS BE LIABLE FOR ANY 22 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 24 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 25 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 26 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 27 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 28 | -------------------------------------------------------------------------------- /MAINTAINERS: -------------------------------------------------------------------------------- 1 | # This is the official list of the project maintainers. 2 | # This is mostly useful for contributors that want to push 3 | # significant pull requests or for project management issues. 4 | # 5 | # 6 | # Names should be added to this file like so: 7 | # Individual's name 8 | # Individual's name 9 | # 10 | # Please keep the list sorted. 11 | # 12 | # 13 | Pierre-Samuel Le Stang 14 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # OSArchiver: OpenStack databases archiver 3 | 4 | OSArchiver is a python package that aims to archive and remove soft deleted data from OpenStack databases. 
5 | The package ships with a main script called osarchiver that reads a configuration file and runs the archivers.
6 | 
7 | # Philosophy
8 | 
9 | * OSArchiver doesn't have any knowledge of OpenStack business objects
10 | * OSArchiver relies purely on the common way OpenStack marks data as deleted: setting the column 'deleted_at' to a datetime.
11 | It means that a row is archivable/removable if its 'deleted_at' column is not NULL
12 | 
13 | # Limitations
14 | 
15 | * Only MySQL/MariaDB is supported as DB backend.
16 | * Python >= 3.5 is required.
17 | 
18 | # Design
19 | 
20 | OSArchiver reads an INI configuration file in which you can define:
21 | 
22 | * archivers: a section that holds one source and an optional list of destinations
23 | * sources: a section that defines where the data should be read from (basically the OpenStack DB)
24 | * destinations: a section that defines where the data should be archived
25 | 
26 | # How does it work?
27 | 
28 | .----------.
29 | .--------------------------| Archiver |-----------------------------.
30 | | '----------' |
31 | | |
32 | | |
33 | | |
34 | v _______________ v
35 | .--------. \ \ .--------------.
36 | | Source |-------------------->) ARCHIVE DATA )------------------>| Destinations |
37 | '--------' /______________/ '--------------'
38 | | | |
39 | | | |
40 | | | |
41 | | | |
42 | | | |
43 | | v |
44 | | .--------------------------. |
45 | v ( No error and delete_data=1 ) |
46 | '--------------------------' |
47 | _.-----._ | _.-----._ |
48 | .- -. | .- -. | ___
49 | |-_ _-| | |-_ _-| | | |\
50 | | ~-----~ | | | ~-----~ |<--'->| ' ___
51 | | | | | | | SQL| |\
52 | `._ _.' | `._ _.' |____| '-|---.
53 | "-----" | "-----" | CSV | |
54 | OpenStack DB v Archiving DB |_____| |
55 | ^ _______________ v
56 | | \ \ .-----------------------.
57 | '-------------------------) DELETE DATA ) ( remote_store configured )
58 | /______________/ '-----------------------'
59 | |
60 | v
61 | __________
62 | [_|||||||_°]
63 | [_|||||||_°]
64 | [_|||||||_°]
65 | 
66 | Remote Storage (Swift, ...)
67 | 
68 | # Installation
69 | 
70 | ```
71 | git clone https://github.com/ovh/osarchiver.git
72 | cd osarchiver
73 | pip install -r requirements.txt
74 | python setup.py install
75 | ```
76 | 
77 | # osarchiver script
78 | 
79 | ```
80 | # osarchiver --help
81 | usage: osarchiver [-h] --config CONFIG [--log-file LOG_FILE]
82 | [--log-level {info,warn,error,debug}] [--debug] [--dry-run]
83 | 
84 | optional arguments:
85 | -h, --help show this help message and exit
86 | --config CONFIG Configuration file to read
87 | --log-file LOG_FILE Append log to the specified file
88 | --log-level {info,warn,error,debug}
89 | Set log level
90 | --debug Enable debug mode
91 | --dry-run Display what would be done without really deleting or
92 | writing data
93 | ```
94 | 
95 | # Configuration
96 | The configuration is an INI file containing several sections. You configure your
97 | different archivers in this configuration file. An example is available at the
98 | root of the repository.
99 | 
100 | ## DEFAULT section:
101 | * Description: default section that defines default/fallback values for options
102 | * Format **[DEFAULT]**
103 | * configuration parameters: all the parameters of the archiver, source, destination
104 | and backend sections can be added in this section; they will be the fallback
105 | values when an option is not set in a specific section.
106 | 
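Under the hood, `osarchiver/config.py` (shown further down in this repository) parses this file with Python's `configparser` and `ExtendedInterpolation`, which is what makes the DEFAULT fallback and `${...}` references work. A minimal, self-contained sketch of that behaviour (section and option names are illustrative):

```python
# Sketch of DEFAULT fallback and ${...} interpolation, mirroring the
# ConfigParser(ExtendedInterpolation()) setup used in osarchiver/config.py.
import configparser

parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation())
parser.read_string("""
[DEFAULT]
retention=1 MONTH
deleted_column=deleted_at
where=${deleted_column} <= SUBDATE('{now}', INTERVAL ${retention})

[src:nova]
retention=12 MONTH
""")

# 'retention' is overridden in the section, 'deleted_column' falls back to
# DEFAULT, and ${...} references are resolved when the option is read;
# the literal '{now}' placeholder is left untouched by configparser.
print(parser.get('src:nova', 'retention'))
# 12 MONTH
print(parser.get('src:nova', 'where'))
# deleted_at <= SUBDATE('{now}', INTERVAL 12 MONTH)
```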
107 | ## Archiver section:
108 | 
109 | * Description: defines where data is read from, where it is archived, and whether it is deleted.
110 | * Format **[archiver:*name*]**
111 | * configuration parameters:
112 | * **src**: name of the src section
113 | * **dst**: comma separated list of destination section names
114 | * **enable**: 1 or 0; if set to 0 the archiver is ignored and not run
115 | 
116 | Example:
117 | ```properties
118 | [archiver:My_Archiver]
119 | src: os_prod
120 | dst: file, db
121 | 
122 | [src:os_prod]
123 | ...
124 | 
125 | [dst:file]
126 | ...
127 | 
128 | [dst:db]
129 | ....
130 | ```
131 | 
132 | ## Source section:
133 | 
134 | * Description: defines where the OpenStack databases are. For now it supports
135 | one backend (db), but it may easily be extended
136 | * Format **[src:*name*]**
137 | * configuration parameters:
138 | * **backend**: the name of the backend to use; only `db` is supported
139 | * **retention**: how long data should be kept (SQL interval format, e.g. 12 MONTH)
140 | * **archive_data**: 0 or 1; if set to 1 a destination is expected to archive the
141 | data, otherwise the archiving step is skipped and only the delete step runs.
142 | * **delete_data**: 0 or 1; if set to 1 the delete step is run. If the
143 | archive step fails, the delete step is not run to prevent loss of data.
144 | * *backend specific options*
145 | 
146 | 
147 | ## Destination section:
148 | 
149 | * Description: defines where the data should be written. For now it supports
150 | two backends (db for database and file [csv, sql]) and may be extended
151 | * Format **[dst:*name*]**
152 | * configuration parameters:
153 | * **backend**: the name of the backend to use, `db` or `file`
154 | * *backend specific options*
155 | 
156 | 
157 | ## Backends options:
158 | 
159 | ### db
160 | * Description: the database (MySQL/MariaDB) backend
161 | * options:
162 | * **host**: DB host to connect to
163 | * **port**: port the MariaDB server is running on
164 | * **user**: user to connect to the MariaDB server with
165 | * **password**: password of that user
166 | * **delete_limit**: LIMIT applied to DELETE statements
167 | * **select_limit**: LIMIT applied to SELECT statements
168 | * **bulk_insert**: data is inserted into the DB every bulk_insert rows
169 | * **deleted_column**: name of the column that holds the soft-delete date; it is
170 | also used to filter the tables to archive: a table must have
171 | the deleted_column to be archived
172 | * **where**: the literal SQL WHERE clause applied to the SELECT statement.
173 | Ex: where=${deleted_column} <= SUBDATE(NOW(), INTERVAL ${retention})
174 | * **foreign_key_check**: true or false; if set to false, disables foreign key
175 | checks (default: true)
176 | * **retention**: how long to keep data in the database (SQL interval format: 12
177 | MONTH, 1 DAY, etc.)
178 | * **excluded_databases**: comma, carriage return or semicolon separated
179 | regexps of databases to exclude when specifying '*' as database. The following
180 | databases are always ignored: 'mysql', 'performance_schema', 'information_schema'
181 | * **excluded_tables**: comma, carriage return or semicolon separated regexps
182 | of tables to exclude when specifying '*' as table. Ex: shadow_.*,.*_archived
183 | * **db_suffix**: an optional suffix to apply to the archiving DB. The
184 | default suffix '_archive' is applied if you archive on the same host as the
185 | source without setting a db_suffix or table_suffix (to avoid reading and
186 | writing on the same db.table)
187 | * **table_suffix**: a suffix applied to the archiving table, if specified
188 | 
189 | ### file
190 | * Description: the file archiving destination backend; it writes SQL data to a
191 | file using one or several formats (supported: SQL, CSV)
192 | * **directory**: the directory path where data is archived. You may use the
193 | {date} keyword to automatically append the date to the directory path
194 | (/backup/archive_{date})
195 | * **formats**: a comma, semicolon or carriage return separated list that
196 | defines the formats in which to archive the data (csv, sql)
197 | 
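Putting it all together, driving the archivers from Python rather than through the osarchiver script is roughly equivalent to the following sketch, based on the Config and Archiver classes shipped in this repository ('archiver.ini' is a placeholder path):

```python
# Roughly what the osarchiver entry point does: load the INI file, build
# the enabled archivers (one Source, N Destinations each) and run them.
from osarchiver.config import Config

conf = Config(file_path='archiver.ini', dry_run=True)
for archiver in conf.archivers:
    # run() reads batches from the source, archives them to every
    # destination, then deletes them only if archiving succeeded
    archiver.run()
```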
198 | You've developed a cool new feature? Fixed an annoying bug? We'd be happy
199 | to hear from you!
200 | 
201 | Have a look in [CONTRIBUTING.md](https://github.com/ovh/osarchiver/blob/master/CONTRIBUTING.md)
202 | 
203 | # Related links
204 | 
205 | * Contribute: https://github.com/ovh/osarchiver/blob/master/CONTRIBUTING.md
206 | * Report bugs: https://github.com/ovh/osarchiver/issues
207 | 
208 | # License
209 | 
210 | See https://github.com/ovh/osarchiver/blob/master/LICENSE
211 | 
--------------------------------------------------------------------------------
/archiver_sample.ini:
--------------------------------------------------------------------------------
1 | [DEFAULT]
2 | # The following parameter enables or disables an archiver
3 | # Default is to enable all archivers
4 | # can be overridden per archiver section
5 | # boolean options: yes/no 0/1 true/false
6 | enable=false
7 | # LIMIT applied to DELETE requests
8 | # can be overridden in src section
9 | delete_limit=500
10 | # How many seconds to wait between 2 delete loops
11 | # can be overridden in src section
12 | delete_loop_delay=2
13 | # The LIMIT applied to a SELECT
14 | # can be overridden in src section
15 | select_limit=1000
16 | # Number of statements to stack before committing
17 | # will take the minimum of select_limit and bulk_insert
18 | bulk_insert=500
19 | # To skip archiving (and only delete) set
20 | # archive_data to false
21 | # can be overridden in src section
22 | archive_data=true
23 | # Set delete_data to true if you want to delete data
24 | # from the source backend
25 | # can be overridden in src section
26 | delete_data=false
27 | # The default name of the column that holds the date of
28 | # soft-deleted data
29 | # can be overridden in src section
30 | deleted_column=deleted_at
31 | # The WHERE statement used to select rows to archive/delete
32 | # can be overridden in src section
33 | # {now}: python interpolation that will be set to the utcnow SQL-format date
34 | # ${deleted_column}, ${retention}: options of the config file
35 | where=${deleted_column} <= SUBDATE('{now}', INTERVAL ${retention})
36 | # Set foreign_key_check to false to disable foreign key checks
37 | # can be overridden in src section
38 | foreign_key_check=true
39 | # Data lifetime
40 | retention=1 MONTH
41 | # Comma, carriage return or semicolon separated regexps which define databases to
42 | # exclude; the defaults are 'mysql', 'performance_schema', 'information_schema'
43 | excluded_databases=
44 | # Comma, carriage return or semicolon separated regexps which define tables to
45 | # exclude
46 | excluded_tables=shadow_.*
47 | # default file archive format
48 | archive_format=bztar
49 | 
50 | # Declare an archiver called 'nova'
51 | # Read data from the src named 'nova'
52 | # And write data to the dsts named db_archiver and file_archiver
53 | [archiver:nova]
54 | src=nova
55 | dst=db_archiver, file_archiver
56 | enable=true
57 | 
58 | # Declare an archiver called 'glance'
59 | # Read data from the src named 'glance'
60 | # And write data to the dsts named db_archiver and file_archiver
61 | # Disabled
62 | [archiver:glance]
63 | src=glance
64 | dst=db_archiver, file_archiver
65 | enable=false
66 | 
67 | # Here we define the src 'nova'
68 | # which is a db backend
69 | # We want to archive all the tables of the nova database except those matched
70 | # by the excluded_tables regexps; data is archived then deleted
71 | [src:nova]
72 | backend=db
73 | host=localhost
74 | port=3307
75 | user=root
76 | password=***********
77 | retention=12 MONTH
78 | databases=nova
79 | tables=*
80 | archive_data=true
81 | delete_data=true
82 | 
83 | [src:glance]
84 | backend=db
85 | host=localhost
86 | port=3307
87 | user=root
88 | password=***********
89 | retention=12 MONTH
90 | databases=glance
91 | tables=*
92 | excluded_tables=images
93 | archive_data=true
94 | delete_data=true
95 | 
96 | # db_archiver destination configuration
97 | # backend is a db
98 | # db name is suffixed with '_archived'
99 | [dst:db_archiver]
100 | backend=db
101 | host=localhost
102 | port=3307
103 | user=root
104 | password=*********
105 | db_suffix=_archived
106 | 
107 | # file_archiver destination configuration
108 | # backend is a file
109 | # with 2 formats: csv and sql
110 | [dst:file_archiver]
111 | backend=file
112 | directory=/tmp/archive_{date}
113 | formats=csv,sql
114 | remote_store=swift
115 | 
116 | [remote_store:swift]
117 | backend=swift
118 | container=osarchiver
119 | # The remote filename is by default of this format:
120 | # 2022-01-12_11:08:19/cinder.volumes.sql.tar.bz2
121 | # With file_name_prefix it will become:
122 | # <file_name_prefix>/2022-01-12_11:08:19/cinder.volumes.sql.tar.bz2
123 | file_name_prefix=
124 | # All the opt_* keys will be available in the store_options attribute
125 | # of the remote_store instance, useful to pass specific options
126 | # to the underlying library which is used to send the data
127 | opt_auth_version=3
128 | opt_os_project_name=project_name
129 | opt_os_username=username
130 | opt_os_password=password
131 | opt_os_auth_url=https://auth.cloud.domain.net/v3
132 | opt_os_region_name=
133 | opt_os_project_domain_name=project_domain_name
134 | opt_os_user_domain_name=user_domain_name
135 | # opt_<key>=value
136 | # https://docs.openstack.org/python-swiftclient/latest/service-api.html
137 | # opt_retries = 5
138 | 
--------------------------------------------------------------------------------
/diagram.asciio:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ovh/osarchiver/73a9e6377a44b64a759f663bf99ac798e4ec026a/diagram.asciio
--------------------------------------------------------------------------------
/osarchiver/__init__.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | 
--------------------------------------------------------------------------------
/osarchiver/archiver.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | 
5 | """
6 | The Archiver class file
7 | """
8 | 
9 | import logging
10 | import traceback
11 | from osarchiver.errors import OSArchiverArchivingFailed
12 | 
13 | 
14 | class Archiver():
15 |     """
16 |     Archiver class
17 |     """
18 | 
19 |     def __init__(self, name=None, src=None, dst=None, conf=None):
20 |         """
21 |         constructor; takes one source and a list of destinations
22 |         """
23 |         self.name = name
24 |         # One source
25 |         self.src = src
26 |         # Pool of destinations
27 |         self.dst = dst or []
28 |         # Config parser instance
29 |         self.conf = conf
30 | 
31 |     def __repr__(self):
32 |         return "Archiver {name}: {src} -> {dst}".\
33 |             format(name=self.name, src=self.src, dst=self.dst)
34 | 
35 |     def read(self):
36 |         """
37 |         read method which loops over each set of data from the Source
38 |         instance and yields (database, table, items)
39 |         """
40 |         for data in self.src.read():
41 |             for items in data['data']:
42 |                 yield (data['database'], data['table'], items)
43 | 
44 |     def write(self, database=None, table=None, data=None):
45 |         """
46 |         write method; takes a database, a table and a set of data as
47 |         arguments, loops over each configured destination and calls the
48 |         write method of the destination object. Raises
49 |         OSArchiverArchivingFailed on archiving error to prevent deletion
50 |         """
51 |         if not self.src.archive_data:
52 |             logging.info("Ignoring data archiving because archive_data is "
53 |                          "set to %s", self.src.archive_data)
54 |         else:
55 |             for dst in self.dst:
56 |                 try:
57 |                     dst.write(database=database, table=table, data=data)
58 |                 except Exception as my_exception:
59 |                     logging.error(
60 |                         "An error occurred while archiving data: %s",
61 |                         my_exception)
62 |                     logging.error("Full traceback is: %s",
63 |                                   traceback.format_exc())
64 |                     raise OSArchiverArchivingFailed
65 | 
66 |     def delete(self, database=None, table=None, data=None):
67 |         """
68 |         delete method; takes a database, a table and a set of data as
69 |         arguments and deletes the data from the source if the delete_data
70 |         parameter is true
71 |         """
72 |         if not self.src.delete_data:
73 |             logging.debug("Ignoring data deletion because delete_data is "
74 |                           "set to %s", self.src.delete_data)
75 |         else:
76 |             try:
77 |                 self.src.delete(database=database, table=table, data=data)
78 |             except Exception as my_exception:
79 |                 logging.error("An error occurred while deleting data: %s",
80 |                               my_exception)
81 |                 logging.error("Full traceback is: %s", traceback.format_exc())
82 | 
83 |     def run(self):
84 |         """
85 |         main method: reads sets of data from the source, archives them and
86 |         deletes them if no exception was caught
87 |         """
88 |         if not self.src.archive_data and not self.src.delete_data:
89 |             logging.info("Nothing to do for archiver %s: archive_data and "
90 |                          "delete_data are disabled", self.name)
91 |             return 0
92 | 
93 |         if not self.src.delete_data:
94 |             logging.info("Data won't be deleted because 'delete_data' is set"
95 |                          " to %s", self.src.delete_data)
96 | 
97 |         for (database, table, items) in self.read():
98 |             try:
99 |                 self.write(database=database, table=table, data=items)
100 |             except OSArchiverArchivingFailed:
101 |                 logging.info("Ignoring deletion step because an error "
102 |                              "occurred while archiving data")
103 |             else:
104 |                 self.delete(database=database, table=table, data=items)
105 | 
106 |         self.clean_exit()
107 |         return 0
108 | 
109 |     def clean_exit(self):
110 |         """
111 |         method called when archiving is finished; it calls the clean_exit
112 |         method of the Source and Destination instances
113 |         """
114 |         logging.info("Please wait for clean exit...")
115 |         self.src.clean_exit()
116 |         for dst in self.dst:
117 |             dst.clean_exit()
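Archiver.read() above also documents the contract a Source must honour: its read() generator yields dicts carrying a 'database' name, a 'table' name and an iterable of row batches under 'data'. A toy illustration of that shape (FakeSource is hypothetical and not part of OSArchiver):

```python
# Toy illustration of the Source contract consumed by Archiver.read():
# each yielded dict carries a database name, a table name and an
# iterable of row batches.
class FakeSource:
    def read(self):
        yield {
            'database': 'nova',
            'table': 'instances',
            'data': [
                [{'id': 1, 'deleted_at': '2018-01-01'}],  # batch 1
                [{'id': 2, 'deleted_at': '2018-02-01'}],  # batch 2
            ],
        }

# Same iteration pattern as Archiver.read()
for data in FakeSource().read():
    for items in data['data']:
        print(data['database'], data['table'], items)
```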
--------------------------------------------------------------------------------
/osarchiver/common/__init__.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | 
5 | """
6 | Common functions of osarchiver
7 | """
8 | from importlib import import_module
9 | 
10 | 
11 | def backend_factory(*args, backend='db', module=None, subclass=None, **kwargs):
12 |     """
13 |     This factory function returns backend instances.
14 |     It raises an exception on import error, attribute error or unavailable backend
15 |     """
16 |     try:
17 |         class_name = backend.capitalize()
18 |         backend_module = import_module(module + '.' + backend)
19 |         backend_class = getattr(backend_module, class_name)
20 |         instance = backend_class(*args, **kwargs)
21 |     except (AttributeError, ImportError):
22 |         raise ImportError("{} is not part of our backend"
23 |                           " collection!".format(backend))
24 |     else:
25 |         if not issubclass(backend_class, subclass):
26 |             raise ImportError("Unsupported '{}' destination"
27 |                               " backend".format(backend))
28 |         return instance
--------------------------------------------------------------------------------
/osarchiver/common/db.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | """
5 | DB base class file which provides helpers and common methods for the Source
6 | and Destination Db backends
7 | 
8 | The class provides a metadata store to avoid doing some computations
9 | several times.
It also keeps a reference on pymysq.cursor per table to 10 | avoid creating too much cursor 11 | """ 12 | 13 | import logging 14 | import re 15 | import warnings 16 | import time 17 | import timeit 18 | # need to include datetime to handle some result 19 | # of pymysql (integrity exception helpers) 20 | import datetime # noqa 21 | import pymysql 22 | from sqlalchemy import create_engine 23 | 24 | 25 | class DbBase(): 26 | """ 27 | The DbBase class that should be inherited from Source and Destination Db 28 | backend 29 | """ 30 | 31 | def __init__(self, 32 | host=None, 33 | user=None, 34 | password=None, 35 | select_limit=1000, 36 | delete_limit=500, 37 | port=3306, 38 | dry_run=False, 39 | deleted_column=None, 40 | max_retries=5, 41 | bulk_insert=1000, 42 | retry_time_limit=2, 43 | delete_loop_delay=2, 44 | foreign_key_check=True, 45 | **kwargs): 46 | """ 47 | instantiator of database base class 48 | """ 49 | self.host = host 50 | self.user = user 51 | self.port = int(port) 52 | self.password = password 53 | self.delete_limit = int(delete_limit) 54 | self.deleted_column = deleted_column 55 | self.connection = None 56 | self.select_limit = int(select_limit) 57 | self.bulk_insert = int(bulk_insert) 58 | self.dry_run = dry_run 59 | self.metadata = {} 60 | self._sqlalchemy_engine = None 61 | # number of retries when an error occure 62 | self.max_retries = max_retries 63 | # how long wait between two retry 64 | self.retry_time_limit = retry_time_limit 65 | self.delete_loop_delay = delete_loop_delay 66 | self.foreign_key_check = foreign_key_check 67 | 68 | # hide some warnings we do not care 69 | warnings.simplefilter("ignore") 70 | self.connect() 71 | 72 | @property 73 | def sqlalchemy_engine(self): 74 | if self._sqlalchemy_engine is None: 75 | url = "mysql+pymysql://{user}:{password}@".format( 76 | user=self.user, password=self.password) 77 | if self.host is not None: 78 | url += self.host 79 | if self.port is not None: 80 | url += ':{port}'.format(port=self.port) 81 | self._sqlalchemy_engine = create_engine(url) 82 | 83 | return self._sqlalchemy_engine 84 | 85 | def connect(self): 86 | """ 87 | connect to the database and set the connection attribute to 88 | pymysql.connect 89 | """ 90 | self.connection = pymysql.connect(host=self.host, 91 | user=self.user, 92 | port=self.port, 93 | password=self.password, 94 | database=None) 95 | logging.debug("Successfully connected to mysql://%s:%s@%s:%s", 96 | self.user, '*' * len(self.password), self.host, 97 | self.port) 98 | 99 | def disconnect(self): 100 | """ 101 | disconnect from the databse if connection is open 102 | """ 103 | if self.connection.open: 104 | self.connection.close() 105 | 106 | def add_metadata(self, database=None, table=None, key=None, value=None): 107 | """ 108 | store for one database/table a key with a value 109 | """ 110 | if database not in self.metadata: 111 | self.metadata[database] = {} 112 | 113 | if table not in self.metadata[database]: 114 | self.metadata[database][table] = {} 115 | 116 | logging.debug("Adding metadata %s.%s.%s = %s", database, table, key, 117 | value) 118 | 119 | self.metadata[database][table][key] = value 120 | return self.metadata[database][table][key] 121 | 122 | def get_metadata(self, database=None, table=None, key=None): 123 | """ 124 | return the key's value for a database.table 125 | """ 126 | if database is None or table is None: 127 | return None 128 | 129 | if database in self.metadata and table in self.metadata[database]: 130 | return self.metadata[database][table].get(key) 131 | 132 | return 
None 133 | 134 | def disable_fk_check(self, cursor=None): 135 | """ 136 | Disable foreign key check for a cursor 137 | """ 138 | logging.debug("Disabling foreign_key_check") 139 | cursor.execute("SET FOREIGN_KEY_CHECKS=0;") 140 | 141 | def enable_fk_check(self, cursor=None): 142 | """ 143 | Enable foreign key check for a cursor 144 | """ 145 | logging.debug("Enabling foreign_key_check") 146 | cursor.execute("SET FOREIGN_KEY_CHECKS=1;") 147 | 148 | def check_request_retry(self): 149 | """ 150 | When an SQL error occured, this method is called and do some check 151 | Right now it only re-open connection if the connection is closed 152 | """ 153 | logging.debug("Sleeping %s sec before retrying....", 154 | self.retry_time_limit) 155 | time.sleep(int(self.retry_time_limit)) 156 | # Handle auto reconnect 157 | if not self.connection.open: 158 | logging.info("Re-opening connection which seems abnormaly " 159 | "closed") 160 | self.connect() 161 | 162 | def set_foreign_key_check(self, 163 | foreign_key_check=None, 164 | cursor=None, 165 | database=None, 166 | table=None, 167 | new_cursor=False): 168 | """ 169 | This method set the correct value to foreign key check. Instead of 170 | executing it at each requests which is time consuming and overloading 171 | it checks in metadata the current value and change it if needed 172 | """ 173 | fk_check_in_cache = False 174 | current_fk_check = self.get_metadata( 175 | database=database, 176 | table=table, 177 | key='fk_check_{c}'.format(c=cursor)) 178 | 179 | # nothing in cache we want to apply the foreign_key_check value 180 | # set current_fk_check to negate of foreign_key_check 181 | if current_fk_check is None or new_cursor is True: 182 | logging.debug("foreign key check value not found in cache") 183 | current_fk_check = not foreign_key_check 184 | 185 | if foreign_key_check is False and current_fk_check is True: 186 | self.disable_fk_check(cursor=cursor) 187 | elif foreign_key_check is True and current_fk_check is False: 188 | self.enable_fk_check(cursor=cursor) 189 | else: 190 | fk_check_in_cache = True 191 | 192 | if database is not None \ 193 | and table is not None and not fk_check_in_cache: 194 | self.add_metadata(database=database, 195 | table=table, 196 | key='fk_check_{c}'.format(c=cursor), 197 | value=foreign_key_check) 198 | 199 | def get_cursor(self, 200 | database=None, 201 | table=None, 202 | cursor_type=None, 203 | new=False, 204 | fk_check=None): 205 | """ 206 | Return the pymysql cursor mapped to a database.table if exists in 207 | metadata otherwise it create a new cursor 208 | """ 209 | default_cursor_type = pymysql.cursors.Cursor 210 | cursor_type = cursor_type or default_cursor_type 211 | cursor = None 212 | cursor_in_cache = False 213 | # open db connection if not opened 214 | if not self.connection.open: 215 | self.connect() 216 | cursor = self.connection.cursor(cursor_type) 217 | else: 218 | # if this is not a cursor creation 219 | # try to get the cached one from metadata 220 | if not new: 221 | cursor = self.get_metadata( 222 | database=database, 223 | table=table, 224 | key='cursor_{c}'.format(c=cursor_type)) 225 | 226 | # if cursor is None (creation or not found in metadata) 227 | # set the cursor type to default one 228 | type_of_cursor = type(cursor) 229 | if cursor is None: 230 | type_of_cursor = default_cursor_type 231 | 232 | # Check if the cursor retrieved is well typed 233 | # if not force the cursor to be unset 234 | # it will be re-created after 235 | if cursor is not None and cursor_type != type_of_cursor: 236 | 
logging.debug( 237 | "Type of cursor found in cache is %s, we want %s" 238 | " instead, need to create a new cursor", type_of_cursor, 239 | cursor_type) 240 | cursor = None 241 | 242 | # cursor creation 243 | if cursor is None: 244 | logging.debug("No existing cursor found, creating a new one") 245 | cursor = self.connection.cursor(cursor_type) 246 | else: 247 | cursor_in_cache = True 248 | logging.debug("Using cached cursor %s", cursor) 249 | # set the foreign key check value if needed 250 | # for the cursor 251 | if fk_check is not None: 252 | self.set_foreign_key_check(cursor=cursor, 253 | database=database, 254 | table=table, 255 | foreign_key_check=fk_check, 256 | new_cursor=new) 257 | # Add the cursor in cache 258 | if database is not None and table is not None and not cursor_in_cache: 259 | logging.debug("Caching cursor for %s.%s", database, table) 260 | self.add_metadata(database=database, 261 | table=table, 262 | key='cursor_{c}'.format(c=cursor_type), 263 | value=cursor) 264 | return cursor 265 | 266 | def _db_execute(self, sql=None, cursor=None, method=None, values=None): 267 | """ 268 | Execute a request on database 269 | """ 270 | logging.debug("Executing SQL command: '%s'", sql) 271 | # execute / execute_many method 272 | start = timeit.default_timer() 273 | getattr(cursor, method)(sql, values) 274 | end = timeit.default_timer() 275 | logging.debug("SQL duration: %s sec", end - start) 276 | 277 | def _db_fetch(self, fetch_method=None, cursor=None, fetch_args=None): 278 | """ 279 | This method fetch data in database 280 | """ 281 | start = timeit.default_timer() 282 | fetched_values = getattr(cursor, fetch_method)(**fetch_args) 283 | end = timeit.default_timer() 284 | logging.debug("Data fetch duration: %s sec", end - start) 285 | return fetched_values 286 | 287 | def _db_commit(self, cursor=None, sql=None, values_length=None): 288 | """ 289 | Commit the executed request, return the number of row modified 290 | """ 291 | if self.dry_run: 292 | logging.info( 293 | "[DRY RUN]: here is what I should have " 294 | "commited: '%s'", cursor.mogrify(query=sql)) 295 | self.connection.rollback() 296 | return values_length 297 | # Not dry-run mode: commit the request 298 | # return the number of row affected by the request 299 | start = timeit.default_timer() 300 | self.connection.commit() 301 | end = timeit.default_timer() 302 | logging.debug("Commit duration: %s sec", end - start) 303 | return cursor.rowcount 304 | 305 | def db_request(self, 306 | sql=None, 307 | values=None, 308 | fetch_method=None, 309 | fetch_args=None, 310 | database=None, 311 | table=None, 312 | cursor_type=None, 313 | foreign_key_check=None, 314 | execute_method='execute'): 315 | """ 316 | generic method to do a request to the db 317 | It handles a retry on failure, execept for foreign key exception which 318 | in our case useless 319 | In case of error connection, it sleeps 20 seconds before retrying 320 | """ 321 | retry = 0 322 | cursor = None 323 | force_cursor_creation = False 324 | values = values or [] 325 | 326 | fetch_args = fetch_args or {} 327 | if foreign_key_check is None: 328 | foreign_key_check = self.foreign_key_check 329 | if self.dry_run: 330 | foreign_key_check = False 331 | logging.debug("Force disabling foreign key check because we are in" 332 | " dry run mode") 333 | 334 | while retry <= self.max_retries: 335 | try: 336 | if retry > 0: 337 | logging.info("Retry %s/%s", retry, self.max_retries) 338 | self.check_request_retry() 339 | 340 | if cursor is None: 341 | cursor = 
self.get_cursor(database=database, 342 | table=table, 343 | cursor_type=cursor_type, 344 | fk_check=foreign_key_check, 345 | new=force_cursor_creation) 346 | 347 | if database is not None: 348 | self.connection.select_db(database) 349 | 350 | # Execute the query 351 | self._db_execute(sql=sql, 352 | cursor=cursor, 353 | method=execute_method, 354 | values=values) 355 | 356 | # Fetch and return the data 357 | if fetch_method is not None: 358 | return self._db_fetch(fetch_method=fetch_method, 359 | cursor=cursor, 360 | fetch_args=fetch_args) 361 | # no fetch_method means we need to commit the request 362 | # In dry_run mode just display what would have been commited 363 | return self._db_commit(cursor=cursor, 364 | sql=sql, 365 | values_length=len(values)) 366 | 367 | except pymysql.Error as sql_exception: 368 | logging.error("SQL error: %s", sql_exception.args) 369 | if sql_exception.args[0] == "(0, '')": 370 | logging.debug("Cursor need to be recreated") 371 | if cursor is not None: 372 | cursor.close() 373 | cursor = None 374 | force_cursor_creation = True 375 | # foreign key constraint error, there is no sense in continuing 376 | if sql_exception.args[0] == 1451: 377 | logging.debug("Foreign key constraint error no retry " 378 | "attempted") 379 | retry = self.max_retries 380 | if sql_exception.args[0] == 2003: 381 | self.connection.close() 382 | logging.error("MySQL connection error, sleeping 20 " 383 | "seconds before reconnecting...") 384 | retry += 1 385 | if retry > self.max_retries: 386 | raise sql_exception 387 | continue 388 | finally: 389 | # We want to rollback regardless the error 390 | # This to prevent some undo log to be stacked on server side 391 | self.connection.rollback() 392 | 393 | def get_os_databases(self): 394 | """ 395 | Return a list of databases available 396 | """ 397 | sql = "SHOW DATABASES" 398 | result = self.db_request(sql=sql, fetch_method='fetchall') 399 | logging.debug("DB result: %s", result) 400 | return [i[0] for i in result] 401 | 402 | def get_database_tables(self, database=None): 403 | """ 404 | Return a list of tables available for a database 405 | """ 406 | if database is None: 407 | logging.warning( 408 | "Can not call get_database_tables on None database") 409 | return [] 410 | 411 | sql = "SHOW TABLES" 412 | return self.db_request(sql=sql, 413 | database=database, 414 | fetch_method='fetchall') 415 | 416 | def table_has_column(self, database=None, table=None, column=None): 417 | """ 418 | Return True/False after checking that a column exists in a table 419 | """ 420 | sql = "SELECT column_name FROM information_schema.columns WHERE "\ 421 | "table_schema='{db}' and table_name='{table}' AND "\ 422 | "column_name='{column}'".format( 423 | db=database, table=table, column=column) 424 | return bool( 425 | self.db_request(sql=sql, 426 | fetch_method='fetchall', 427 | database=database, 428 | table=table)) 429 | 430 | def table_has_deleted_column(self, database=None, table=None): 431 | """ 432 | Return True/False depending if the table has the deleted column 433 | """ 434 | return self.table_has_column(database=database, 435 | table=table, 436 | column=self.deleted_column) 437 | 438 | def get_table_primary_key(self, database=None, table=None): 439 | """ 440 | Return the first primary key of a table 441 | Store the pk in metadata and return it if exists 442 | """ 443 | primary_key = self.get_metadata(database=database, 444 | table=table, 445 | key='primary_key') 446 | if primary_key is not None: 447 | return primary_key 448 | 449 | sql = "SHOW KEYS 
FROM {db}.{table} WHERE "\ 450 | "Key_name='PRIMARY'".format(db=database, table=table) 451 | # Dirty but .... Column name is the 5 row 452 | primary_key = self.db_request(sql=sql, fetch_method='fetchone')[4] 453 | logging.debug("Primary key of %s.%s is %s", database, table, 454 | primary_key) 455 | self.add_metadata(database=database, 456 | table=table, 457 | key='primary_key', 458 | value=primary_key) 459 | return primary_key 460 | 461 | def get_tables_with_fk(self, database=None, table=None): 462 | """ 463 | For a given table return a list of foreign key 464 | """ 465 | sql = "SELECT table_schema, table_name, column_name "\ 466 | "FROM information_schema.key_column_usage "\ 467 | "WHERE referenced_table_name IS NOT NULL" \ 468 | " AND referenced_table_schema='{db}'"\ 469 | " AND table_name='{table}'".format( 470 | db=database, table=table) 471 | 472 | result = self.db_request(sql=sql, 473 | fetch_method='fetchall', 474 | cursor_type=pymysql.cursors.DictCursor) 475 | if result: 476 | logging.debug("Table %s.%s have child tables with foreign key: %s", 477 | database, table, result) 478 | else: 479 | logging.debug( 480 | "Table %s.%s don't have child tables with foreign " 481 | "key", database, table) 482 | return result 483 | 484 | def sql_integrity_exception_parser(self, error): 485 | """ 486 | Parse a foreign key integrity exception and return a dict of pattern 487 | with useful information 488 | """ 489 | result = {} 490 | regexp = r'^.+fails \(`'\ 491 | r'(?P.+)`\.`'\ 492 | r'(?P.+)`, CONSTRAINT `.+`'\ 493 | r' FOREIGN KEY \(`'\ 494 | r'(?P.+)`\) REFERENCES `'\ 495 | r'(?P.+)` \(`'\ 496 | r'(?P.+)`\)\)$' 497 | match = re.match(regexp, error) 498 | if match: 499 | result = match.groupdict() 500 | else: 501 | logging.warning("SQL error '%s' does not match regexp " 502 | "'%s'", error, regexp) 503 | return result 504 | 505 | def integrity_exception_select_statement(self, error="", row=None): 506 | """ 507 | Parse a foreign key excpetion and return a SELECT statement to 508 | retrieve the offending children rows 509 | """ 510 | row = row or {} 511 | data = self.sql_integrity_exception_parser(error) 512 | # empty dict is when failing to parse exception 513 | if not data: 514 | return "Unable to parse exception, here data: "\ 515 | "{row}".format(row=row) 516 | 517 | return "SELECT * FROM `{db}`.`{table}` WHERE `{fk}` = "\ 518 | "'{value}'".format(value=row[data['ref_column']], 519 | **data) 520 | 521 | def integrity_exception_potential_fix(self, error="", row=None): 522 | """ 523 | Parse a foerign key exception and return an UPDATE sql statement that 524 | mark non deleted children data as deleted 525 | """ 526 | row = row or {} 527 | data = self.sql_integrity_exception_parser(error) 528 | if not data: 529 | return "Unable to parse exception, here data: "\ 530 | "{row}".format(row=row) 531 | 532 | update = "UPDATE `{db}`.`{table}` INNER JOIN `{db}`.`{ref_table}` ON "\ 533 | "`{db}`.`{ref_table}`.`{ref_column}` = `{db}`.`{table}`.`{fk}` "\ 534 | "SET `{db}`.`{table}`.`{deleted_column}` = "\ 535 | "`{db}`.`{ref_table}`.`{deleted_column}` WHERE {fk} = " 536 | 537 | if str(row[data['ref_column']]).isdigit(): 538 | update += "{value}" 539 | else: 540 | update += "'{value}'" 541 | 542 | update += " AND `{db}`.`{table}`.`{deleted_column}` IS NULL" 543 | update = update.format(deleted_column=self.deleted_column, 544 | value=row[data['ref_column']], 545 | **data) 546 | 547 | return update 548 | -------------------------------------------------------------------------------- /osarchiver/config.py: 
-------------------------------------------------------------------------------- 1 | # Use of this source code is governed by a BSD-style 2 | # license that can be found in the LICENSE file. 3 | # Copyright 2019 The OSArchiver Authors. All rights reserved. 4 | """ 5 | Configuration class to handle osarchiver config 6 | """ 7 | 8 | import re 9 | import logging 10 | import configparser 11 | 12 | from osarchiver.archiver import Archiver 13 | from osarchiver.destination import factory as dst_factory 14 | from osarchiver.source import factory as src_factory 15 | 16 | BOOLEAN_OPTIONS = ['delete_data', 'archive_data', 'enable', 'foreign_key_check'] 17 | 18 | 19 | class Config(): 20 | """ 21 | This class is able to read an ini configuration file and instanciate 22 | Archivers to be run 23 | """ 24 | 25 | def __init__(self, file_path=None, dry_run=False): 26 | self.file_path = file_path 27 | """ 28 | Config class instantiator. Instantiate a configparser 29 | """ 30 | self.parser = configparser.ConfigParser( 31 | interpolation=configparser.ExtendedInterpolation()) 32 | self.loaded = 0 33 | self._archivers = [] 34 | self._sources = [] 35 | self._destinations = [] 36 | self.dry_run = dry_run 37 | 38 | def load(self, file_path=None): 39 | """ 40 | Load a file given in arguments and make the configparser read it 41 | and return the parser 42 | """ 43 | 44 | if self.loaded == 1: 45 | return self.parser 46 | 47 | if file_path is not None: 48 | self.file_path = file_path 49 | 50 | logging.info("Loading configuration file %s", self.file_path) 51 | loaded_files = self.parser.read(self.file_path) 52 | self.loaded = len(loaded_files) 53 | logging.debug("Config object loaded") 54 | logging.debug(loaded_files) 55 | return self.parser 56 | 57 | def sections(self): 58 | """ 59 | return call to sections() of ConfigParser for the config file 60 | """ 61 | if self.loaded == 0: 62 | self.load() 63 | return self.parser.sections() 64 | 65 | def section(self, name, default=True): 66 | """ 67 | return a dict of key/value for the given section 68 | if defaults is set to False, it will remove defaults value from the section 69 | """ 70 | if not name or not self.parser.has_section(name): 71 | return {} 72 | default_keys = [] 73 | if not default: 74 | default_keys = [k for k, v in self.parser.items('DEFAULT')] 75 | return { 76 | k: v for k, v in self.parser.items(name) if k not in default_keys 77 | } 78 | 79 | @property 80 | def archivers(self): 81 | """ 82 | This method load the configuration and instantiate all the Source and 83 | Destination objects needed for each archiver 84 | """ 85 | self.load() 86 | if self._archivers: 87 | return self._archivers 88 | 89 | archiver_sections = [ 90 | a for a in self.sections() if str(a).startswith('archiver:') 91 | ] 92 | 93 | def args_factory(section): 94 | """ 95 | Generic function that takes a section from configuration file 96 | and return arguments that are passed to source or destination 97 | factory 98 | """ 99 | args_factory = { 100 | k: v if k not in BOOLEAN_OPTIONS else self.parser.getboolean( 101 | section, k) 102 | for (k, v) in self.parser.items(section) 103 | } 104 | args_factory['name'] = re.sub('^(src|dst):', '', section) 105 | args_factory['dry_run'] = self.dry_run 106 | args_factory['conf'] = self 107 | logging.debug( 108 | "'%s' factory parameters: %s", args_factory['name'], { 109 | k: v if k != 'password' else '***********' 110 | for (k, v) in args_factory.items() 111 | }) 112 | 113 | return args_factory 114 | 115 | # Instanciate archivers: 116 | # One archiver is 
basically a process of archiving
117 |         # One archiver has one source and at least one destination
118 |         # It means we have a total of sources*count(destinations)
119 |         # processes to run per archiver
120 |         for archiver in archiver_sections:
121 |             # If enable: 0 in the archiver config, ignore it
122 |             if not self.parser.getboolean(archiver, 'enable'):
123 |                 logging.info("Archiver %s is disabled, ignoring it", archiver)
124 |                 continue
125 | 
126 |             # src and dst sections are comma, semicolon, or carriage return
127 |             # separated names
128 |             src_sections = [
129 |                 'src:{}'.format(i.strip())
130 |                 for i in re.split(r'\n|,|;', self.parser[archiver]['src'])
131 |             ]
132 | 
133 |             # destination is not mandatory
134 |             # useful to just delete data from the DB
135 |             dst_sections = [
136 |                 'dst:{}'.format(i.strip()) for i in re.split(
137 |                     r'\n|,|;', self.parser[archiver].get('dst', '')) if i
138 |             ]
139 | 
140 |             for src_section in src_sections:
141 |                 src_args_factory = args_factory(src_section)
142 |                 src = src_factory(**src_args_factory)
143 |                 destinations = []
144 |                 for dst_section in dst_sections:
145 |                     dst_args_factory = args_factory(dst_section)
146 |                     dst_args_factory['source'] = src
147 |                     dst = dst_factory(**dst_args_factory)
148 |                     destinations.append(dst)
149 | 
150 |                 self._archivers.append(
151 |                     Archiver(name=re.sub('^archiver:', '', archiver),
152 |                              src=src,
153 |                              dst=destinations,
154 |                              conf=self))
155 | 
156 |         return self._archivers
157 | 
158 |     @property
159 |     def sources(self):
160 |         """
161 |         Return a list of source section names after having loaded the
162 |         configuration file
163 |         """
164 |         self.load()
165 |         self._sources.extend(
166 |             [s for s in self.sections() if str(s).startswith('src:')])
167 |         return self._sources
168 | 
169 |     @property
170 |     def destinations(self):
171 |         """
172 |         Return a list of destination section names after having loaded the
173 |         configuration file
174 |         """
175 |         self.load()
176 |         self._destinations.extend(
177 |             [d for d in self.sections() if str(d).startswith('dst:')])
178 |         return self._destinations
--------------------------------------------------------------------------------
/osarchiver/destination/__init__.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | """
5 | init file that allows loading osarchiver.destination without loading submodules
6 | """
7 | 
8 | from osarchiver.common import backend_factory
9 | from osarchiver.destination.base import Destination
10 | 
11 | 
12 | def factory(*args, backend='db', **kwargs):
13 |     """
14 |     backend factory
15 |     """
16 |     return backend_factory(*args,
17 |                            backend=backend,
18 |                            module='osarchiver.destination',
19 |                            subclass=Destination,
20 |                            **kwargs)
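The factory above resolves a backend purely by convention: it imports osarchiver.destination.<backend>, looks up the capitalized class name on that module, then verifies it subclasses Destination. A sketch of a direct call (connection parameters are placeholders, and note that instantiating the db backend immediately opens a MySQL connection, so this only runs against a reachable server):

```python
# Sketch: factory(backend='db', ...) imports osarchiver.destination.db and
# looks up 'db'.capitalize() == 'Db' on it. When built by
# osarchiver.config.Config, the destination also receives a 'source'
# instance; it is omitted here for brevity.
from osarchiver.destination import factory

dst = factory(backend='db',
              name='db_archiver',
              host='localhost',
              user='root',
              password='secret',
              db_suffix='_archived')
print(dst)  # Destination db_archiver [Backend:db - Host:localhost]
```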
--------------------------------------------------------------------------------
/osarchiver/destination/base.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | 
5 | """
6 | Destination abstract base class file
7 | """
8 | 
9 | from abc import ABCMeta, abstractmethod
10 | 
11 | 
12 | class Destination(metaclass=ABCMeta):
13 |     """
14 |     The Destination abstract base class
15 |     """
16 | 
17 |     def __init__(self, name=None, backend='db', conf=None):
18 |         """
19 |         A Destination object is defined by a name and a backend
20 |         """
21 |         self.name = name
22 |         self.backend = backend
23 |         self.conf = conf
24 | 
25 |     @abstractmethod
26 |     def write(self, database=None, table=None, data=None):
27 |         """
28 |         write method that should be implemented by the backend
29 |         """
30 | 
31 |     @abstractmethod
32 |     def clean_exit(self):
33 |         """
34 |         clean_exit method that should be implemented by the backend;
35 |         provides a way to close and clean up backend resources properly
36 |         """
37 | 
--------------------------------------------------------------------------------
/osarchiver/destination/db/__init__.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | """
5 | init file that allows importing Db from osarchiver.destination.db without
6 | loading submodules
7 | """
8 | from osarchiver.destination.db.db import Db
--------------------------------------------------------------------------------
/osarchiver/destination/db/db.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | """
5 | This module implements the database destination backend which handles
6 | writing data into a MySQL/MariaDB backend
7 | """
8 | 
9 | import logging
10 | import difflib
11 | import re
12 | import time
13 | import arrow
14 | from osarchiver.destination import Destination
15 | from osarchiver.common.db import DbBase
16 | from . import errors as db_errors
17 | 
18 | 
19 | class Db(Destination, DbBase):
20 |     """
21 |     Db class which is instantiated when the Db backend is required
22 |     """
23 | 
24 |     def __init__(self,
25 |                  table=None,
26 |                  archive_data=None,
27 |                  name=None,
28 |                  source=None,
29 |                  db_suffix='',
30 |                  table_suffix='',
31 |                  database=None,
32 |                  **kwargs):
33 |         """
34 |         instantiate the osarchiver.destination.db Db backend
35 |         """
36 |         self.database = database
37 |         self.table = table
38 |         self.archive_data = archive_data
39 |         self.source = source
40 |         self.archive_db_name = None
41 |         self.archive_table_name = None
42 |         self.table_db_name = None
43 |         self.db_suffix = db_suffix
44 |         self.table_suffix = table_suffix
45 |         self.normalized_db_suffixes = {}
46 |         Destination.__init__(self, backend='db', name=name)
47 |         DbBase.__init__(self, **kwargs)
48 | 
49 |     def __repr__(self):
50 |         return "Destination {name} [Backend:{backend} - Host:{host}]".format(
51 |             backend=self.backend, host=self.host, name=self.name)
52 | 
53 |     def normalize_db_suffix(self, db_suffix='', database=None):
54 |         """
55 |         Return the suffix that should be added to the database name to
56 |         build the archive database name in which data is archived. It checks
57 |         that data is not archived in the same DB as the Source.
58 | The database name may contains '{date}' which will be replaced by the 59 | date of archiving in the format '2019-01-17_10:42:42' 60 | """ 61 | 62 | if database is not None and database in self.normalized_db_suffixes: 63 | logging.debug("Using cached db suffix '%s' of '%s' database", 64 | self.normalized_db_suffixes[database], database) 65 | return self.normalized_db_suffixes[database] 66 | 67 | if db_suffix: 68 | self.db_suffix = db_suffix 69 | 70 | # in case source and destination are the same 71 | # archiving in the same db is a non sense 72 | # force db suffix to _'archive in that case 73 | # unless table_suffix is set 74 | if self.source.host == self.host and \ 75 | self.source.port == self.port and \ 76 | not self.db_suffix: 77 | self.db_suffix = '_archive' 78 | logging.warning( 79 | "Your destination host is the same as the source " 80 | "host, to prevent writing on the same database, " 81 | "which could result in data loss the suffix of DB " 82 | "is forced to %s", self.db_suffix) 83 | 84 | if self.source.host == self.host and \ 85 | self.source.port != self.port and \ 86 | not self.db_suffix and not self.table_suffix: 87 | logging.warning("!!!! I can't verify that destination database is " 88 | "different of source database, you may loose data," 89 | " BE CAREFULL!!!") 90 | logging.warning("Sleeping 10 sec...") 91 | time.sleep(10) 92 | 93 | self.db_suffix = str( 94 | self.db_suffix).format(date=arrow.now().strftime('%F_%T')) 95 | 96 | if database is not None: 97 | self.normalized_db_suffixes[database] = self.db_suffix 98 | logging.debug("Caching db suffix '%s' of '%s' database", 99 | self.normalized_db_suffixes[database], database) 100 | 101 | return self.db_suffix 102 | 103 | def normalize_table_suffix(self, table_suffix=None): 104 | """ 105 | Return the suffix of table in which archive data. 
106 | The table name may contains '{date}' which will be replaced by the date 107 | of archiving in the format '2019-01-17_10:42:42' 108 | """ 109 | if table_suffix: 110 | self.table_suffix = table_suffix 111 | 112 | self.table_suffix = str( 113 | self.table_suffix).format(date=arrow.now().strftime('%F_%T')) 114 | 115 | return self.table_suffix 116 | 117 | def get_archive_db_name(self, database=None): 118 | """ 119 | Return the name of the archiving database, which is build from the name 120 | of the source database plus a suffix 121 | """ 122 | self.archive_db_name = database + \ 123 | self.normalize_db_suffix(database=database) 124 | return self.archive_db_name 125 | 126 | def archive_db_exists(self, database=None): 127 | """ 128 | Check if a databae already exists, return True/False 129 | """ 130 | self.get_archive_db_name(database=database) 131 | show_db_sql = "SHOW DATABASES LIKE "\ 132 | "'{db}'".format(db=self.archive_db_name) 133 | return bool(self.db_request(sql=show_db_sql, fetch_method='fetchall')) 134 | 135 | def get_src_create_db_statement(self, database=None): 136 | """ 137 | Return result of SHOW CREATE DATABASE of the Source 138 | """ 139 | src_db_create_sql = "SHOW CREATE DATABASE "\ 140 | "{db}".format(db=database) 141 | src_db_create_statement = self.source.db_request( 142 | sql=src_db_create_sql, fetch_method='fetchone')[1] 143 | logging.debug("Source database '%s' CREATE statement: '%s'", database, 144 | src_db_create_statement) 145 | return src_db_create_statement 146 | 147 | def get_dst_create_db_statement(self, database=None): 148 | """ 149 | Return result of SHOW CREATE DATABASE of the Destination 150 | """ 151 | dst_db_create_sql = "SHOW CREATE DATABASE "\ 152 | "{db}".format(db=database) 153 | dst_db_create_statement = self.db_request(sql=dst_db_create_sql, 154 | fetch_method='fetchone')[1] 155 | logging.debug("Destination database '%s' CREATE statement: '%s'", 156 | database, dst_db_create_statement) 157 | return dst_db_create_statement 158 | 159 | def create_archive_db(self, database=None): 160 | """ 161 | Create the Destination database 162 | It checks that if the Destination database exists, the show create 163 | statement are the same than Source which is useful to detect Db schema 164 | upgrade 165 | """ 166 | # Check if db exists 167 | archive_db_exists = self.archive_db_exists(database=database) 168 | 169 | # retrieve source db create statement 170 | # if archive database exists, compare create statement 171 | # else use the statement to create it 172 | src_db_create_statement = self.get_src_create_db_statement( 173 | database=database) 174 | 175 | if archive_db_exists: 176 | logging.debug("Destination DB has '%s' database", 177 | self.archive_db_name) 178 | dst_db_create_statement = self.get_dst_create_db_statement( 179 | database=self.archive_db_name) 180 | 181 | # compare create statement substituing db name in dst (arbitrary 182 | # choice) 183 | to_compare_dst_db_create_statement = re.sub( 184 | 'DATABASE `{dst_db}`'.format(dst_db=self.archive_db_name), 185 | 'DATABASE `{src_db}`'.format(src_db=database), 186 | dst_db_create_statement) 187 | if src_db_create_statement == to_compare_dst_db_create_statement: 188 | logging.info("source and destination database are identical") 189 | else: 190 | logging.debug( 191 | difflib.SequenceMatcher( 192 | None, src_db_create_statement, 193 | to_compare_dst_db_create_statement)) 194 | raise db_errors.OSArchiverNotEqualDbCreateStatements 195 | 196 | else: 197 | logging.debug("'%s' on remote DB does not exists", 198 
| self.archive_db_name) 199 | sql = re.sub('`{db}`'.format(db=database), 200 | '`{db}`'.format(db=self.archive_db_name), 201 | src_db_create_statement) 202 | self.db_request(sql=sql) 203 | if not self.dry_run: 204 | logging.debug("Successfully created '%s'", 205 | self.archive_db_name) 206 | 207 | def archive_table_exists(self, database=None, table=None): 208 | """ 209 | Check if the archiving tabel exists, return True or False 210 | """ 211 | self.archive_table_name = table + self.normalize_table_suffix() 212 | show_table_sql = 'SHOW TABLES LIKE '\ 213 | '\'{table}\''.format(table=self.archive_table_name) 214 | return bool( 215 | self.db_request(sql=show_table_sql, 216 | fetch_method='fetchall', 217 | database=self.archive_db_name)) 218 | 219 | def get_src_create_table_statement(self, database=None, table=None): 220 | """ 221 | Return the SHOW CREATE TABLE of Source database 222 | """ 223 | src_table_create_sql = 'SHOW CREATE TABLE '\ 224 | '{table}'.format(table=table) 225 | src_table_create_statement = self.source.db_request( 226 | sql=src_table_create_sql, 227 | fetch_method='fetchone', 228 | database=database)[1] 229 | logging.debug("Source table '%s' CREATE statement: '%s'", database, 230 | src_table_create_statement) 231 | return src_table_create_statement 232 | 233 | def get_dst_create_table_statement(self, database=None, table=None): 234 | """ 235 | Return the SHOW CREATE TABLE of Destination database 236 | """ 237 | dst_table_create_sql = 'SHOW CREATE TABLE '\ 238 | '{table}'.format(table=table) 239 | dst_table_create_statement = self.db_request(sql=dst_table_create_sql, 240 | fetch_method='fetchone', 241 | database=database)[1] 242 | logging.debug("Destination table '%s' CREATE statement: '%s'", 243 | self.archive_db_name, dst_table_create_statement) 244 | return dst_table_create_statement 245 | 246 | def compare_src_and_dst_create_table_statement(self, 247 | src_statement=None, 248 | dst_statement=None, 249 | src_table=None, 250 | dst_table=None): 251 | """ 252 | Check that Source and Destination table are identical to prevent errors 253 | due to db schema upgrade 254 | It raises an exception if there is a difference and display the 255 | difference 256 | """ 257 | # compare create statement substituing db name in dst (arbitrary 258 | # choice) 259 | dst_statement = re.sub( 260 | 'TABLE `{dst_table}`'.format(dst_table=dst_table), 261 | 'TABLE `{src_table}`'.format(src_table=src_table), dst_statement) 262 | 263 | # Remove autoincrement statement 264 | dst_statement = re.sub(r'AUTO_INCREMENT=\d+ ', '', dst_statement) 265 | src_statement = re.sub(r'AUTO_INCREMENT=\d+ ', '', src_statement) 266 | 267 | logging.debug("Comparing source create statement %s", src_statement) 268 | logging.debug("Comparing dest create statement %s", dst_statement) 269 | 270 | if dst_statement == src_statement: 271 | logging.info("source and destination tables are identical") 272 | else: 273 | for diff in difflib.context_diff(src_statement.split('\n'), 274 | dst_statement.split('\n')): 275 | logging.debug(diff.strip()) 276 | 277 | raise db_errors.OSArchiverNotEqualTableCreateStatements 278 | 279 | def create_archive_table(self, database=None, table=None): 280 | """ 281 | Create the archive table in the archive database. 282 | It checks that Source and Destination table are the identical. 
283 | """ 284 | # Call create db if archive_db_name is None 285 | if self.archive_db_name is None: 286 | self.create_archive_db(database=database) 287 | else: 288 | logging.debug("Archive db is '%s'", self.archive_db_name) 289 | 290 | # Check if table exists 291 | archive_table_exists = False 292 | if self.archive_db_exists: 293 | archive_table_exists = self.archive_table_exists(database=database, 294 | table=table) 295 | 296 | # retrieve source tabe create statement 297 | # if archive table exists, compare create statement 298 | # else use the statement to create it 299 | src_create_table_statement = self.get_src_create_table_statement( 300 | database=database, table=table) 301 | 302 | if archive_table_exists: 303 | logging.debug("Remote DB has '%s.%s' table", self.archive_db_name, 304 | self.archive_table_name) 305 | dst_table_create_statement = self.get_dst_create_table_statement( 306 | database=self.archive_db_name, table=self.archive_table_name) 307 | self.compare_src_and_dst_create_table_statement( 308 | src_statement=src_create_table_statement, 309 | dst_statement=dst_table_create_statement, 310 | src_table=table, 311 | dst_table=self.archive_table_name) 312 | else: 313 | logging.debug("'%s' table on remote DB does not exists", 314 | self.archive_table_name) 315 | sql = re.sub( 316 | 'TABLE `{table}`'.format(table=table), 317 | 'TABLE `{table}`'.format(table=self.archive_table_name), 318 | src_create_table_statement) 319 | self.db_request(sql=sql, 320 | database=self.archive_db_name, 321 | foreign_key_check=False) 322 | 323 | if not self.dry_run: 324 | logging.debug("Successfully created '%s.%s'", 325 | self.archive_db_name, self.archive_table_name) 326 | 327 | def prerequisites(self, database=None, table=None): 328 | """ 329 | Check that destination database and tables exists before proceeding to 330 | archiving. Keep the result in metadata for performance purpose. 331 | """ 332 | if database in self.metadata and table in self.metadata[database]: 333 | logging.debug("Use cached prerequisites metadata") 334 | return 335 | 336 | self.metadata[database] = {} 337 | logging.info("Checking prerequisites") 338 | 339 | self.create_archive_db(database=database) 340 | self.create_archive_table(database=database, table=table) 341 | self.metadata[database][table] = \ 342 | {'checked': True, 343 | 'primary_key': self.get_table_primary_key(database=database, 344 | table=table)} 345 | return 346 | 347 | def db_bulk_insert(self, 348 | sql=None, 349 | database=None, 350 | table=None, 351 | values=None, 352 | force_commit=False): 353 | """ 354 | Insert a set of data when there are enough data or when the 355 | force_commit is True 356 | Retrurn the remaining values to insert 357 | """ 358 | values = values or [] 359 | # execute and commit if we have enough data to commit(bulk_insert) or 360 | # if commit is forced 361 | if len(values) >= self.bulk_insert or (values and force_commit): 362 | logging.info("Processing bulk insert") 363 | count = self.db_request(sql=sql, 364 | values=values, 365 | database=database, 366 | table=table, 367 | foreign_key_check=False, 368 | execute_method='executemany') 369 | values = [] 370 | logging.info("%s rows inserted into %s.%s", count, database, table) 371 | 372 | return values 373 | 374 | def write(self, database=None, table=None, data=None): 375 | """ 376 | Write method implemented which is in charge of writing data from 377 | Source into archive database. 
It calls the db_bulk_insert method to 378 | write by set of data 379 | """ 380 | if not self.archive_data: 381 | logging.info( 382 | "Ignoring data archiving because archive_data is " 383 | "set to % s", self.archive_data) 384 | return 385 | 386 | self.prerequisites(database=database, table=table) 387 | primary_key = self.get_table_primary_key(database=database, 388 | table=table) 389 | 390 | values = [] 391 | for item in data: 392 | placeholders = ', '.join(['%s'] * len(item)) 393 | columns = '`' + '`, `'.join(item.keys()) + '`' 394 | sql = "INSERT INTO {database}.{table} ({columns}) VALUES "\ 395 | "({placeholders}) ON DUPLICATE KEY UPDATE {pk} = {pk}".format( 396 | database=self.archive_db_name, 397 | table=table, 398 | columns=columns, 399 | placeholders=placeholders, 400 | pk=primary_key) 401 | values.append([v for v in item.values()]) 402 | values = self.db_bulk_insert(sql=sql, 403 | values=values, 404 | database=self.archive_db_name, 405 | table=table) 406 | 407 | # Force commit of remaining data even if we do not reach the 408 | # bulk_insert limit 409 | self.db_bulk_insert(sql=sql, 410 | database=self.archive_db_name, 411 | table=table, 412 | values=values, 413 | force_commit=True) 414 | return 415 | 416 | def clean_exit(self): 417 | """ 418 | Tasks to be executed to exit cleanly 419 | - disconnect from the db 420 | """ 421 | logging.info("Closing destination DB connection") 422 | self.disconnect() 423 | -------------------------------------------------------------------------------- /osarchiver/destination/db/errors.py: -------------------------------------------------------------------------------- 1 | # Use of this source code is governed by a BSD-style 2 | # license that can be found in the LICENSE file. 3 | # Copyright 2019 The OSArchiver Authors. All rights reserved. 4 | """ 5 | Destination Db implementation exceptions 6 | """ 7 | 8 | from osarchiver.errors import OSArchiverException 9 | 10 | 11 | class OSArchiverNotEqualDbCreateStatements(OSArchiverException): 12 | """ 13 | Exception raised when create statement is different between source and 14 | destination database 15 | """ 16 | 17 | def __init__(self, message=None): 18 | super().__init__(message='The CREATE DATABASE statement is not equal ' 19 | 'between src and dst') 20 | 21 | 22 | class OSArchiverNotEqualTableCreateStatements(OSArchiverException): 23 | """ 24 | Exception raised when create statement is different between source and 25 | destination table 26 | """ 27 | 28 | def __init__(self, message=None): 29 | super().__init__(message='The SHOW CREATE TABLE statement is not equal' 30 | ' between src and dst table') 31 | -------------------------------------------------------------------------------- /osarchiver/destination/file/__init__.py: -------------------------------------------------------------------------------- 1 | # Use of this source code is governed by a BSD-style 2 | # license that can be found in the LICENSE file. 3 | # Copyright 2019 The OSArchiver Authors. All rights reserved. 4 | 5 | """ 6 | init file that allow to import File from osarchiver.destination.file 7 | """ 8 | 9 | from osarchiver.destination.file.base import File 10 | -------------------------------------------------------------------------------- /osarchiver/destination/file/base.py: -------------------------------------------------------------------------------- 1 | # Use of this source code is governed by a BSD-style 2 | # license that can be found in the LICENSE file. 3 | # Copyright 2019 The OSArchiver Authors. All rights reserved. 
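
The Db destination's write() above builds an idempotent INSERT: the `ON DUPLICATE KEY UPDATE pk = pk` clause turns already-inserted rows into no-ops, so re-running an interrupted archive pass does not duplicate data. A small, self-contained sketch of the statement it generates (database, table and column names here are illustrative):

# The '%s' placeholders are filled by the DB driver via executemany(),
# not by Python string interpolation.
row = {'id': 42, 'deleted_at': '2019-01-17_10:42:42', 'status': 'deleted'}
columns = '`' + '`, `'.join(row.keys()) + '`'
placeholders = ', '.join(['%s'] * len(row))
sql = ("INSERT INTO {database}.{table} ({columns}) VALUES "
       "({placeholders}) ON DUPLICATE KEY UPDATE {pk} = {pk}").format(
           database='nova_archive', table='instances', columns=columns,
           placeholders=placeholders, pk='id')
print(sql)
# INSERT INTO nova_archive.instances (`id`, `deleted_at`, `status`)
# VALUES (%s, %s, %s) ON DUPLICATE KEY UPDATE id = id
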
4 | """ 5 | Base class file of file backend implementation. 6 | """ 7 | 8 | import logging 9 | import os 10 | import shutil 11 | import re 12 | from importlib import import_module 13 | from abc import ABCMeta, abstractmethod 14 | import arrow 15 | from osarchiver.destination.base import Destination 16 | from osarchiver.destination.file.remote_store import factory as remote_store_factory 17 | 18 | 19 | class File(Destination): 20 | """ 21 | The base File class is a Destination like class which implement file 22 | backend. 23 | """ 24 | 25 | def __init__(self, 26 | directory=None, 27 | archive_format='tar', 28 | formats='csv', 29 | dry_run=False, 30 | source=None, 31 | remote_store=None, 32 | **kwargs): 33 | """ 34 | Initiator 35 | :param str directory: the directory where store the files 36 | :param str archive_format: which format to use to compress file 37 | default is tar, format available are formats available with 38 | shutil.make_archive 39 | :param list formats: list of formats in which data will be written. 40 | The format should be implemented as a subclass of the current class, it 41 | is called a formatter 42 | :param bool dry_run: if enable will not write for real 43 | :param source: the Source instance 44 | """ 45 | 46 | # Archive formats: zip, tar, gztar, bztar, xztar 47 | Destination.__init__(self, backend='file', 48 | conf=kwargs.get('conf', None)) 49 | self.date = arrow.now().strftime('%F_%T') 50 | self.directory = str(directory).format(date=self.date) 51 | self.archive_format = archive_format 52 | self.formats = re.split(r'\n|,|;', formats) 53 | self.formatters = {} 54 | self.source = source 55 | self.dry_run = dry_run 56 | self.remote_store = None 57 | if remote_store is not None: 58 | self.remote_store = re.split(r'\n|,|;', remote_store) 59 | 60 | self.init() 61 | 62 | def close(self): 63 | """ 64 | This method close will call close() method of each formatter 65 | """ 66 | for formatter in self.formatters: 67 | getattr(self.formatters[formatter], 'close')() 68 | 69 | def clean_exit(self): 70 | """ 71 | clean_exit method that should be implemented. 
Close all formatter and 72 | compress file 73 | """ 74 | self.close() 75 | compressed_files = self.compress() 76 | # Send log files remotely if needed 77 | if self.remote_store: 78 | logging.info("Sending osarchiver files remotely") 79 | for store in self.remote_store: 80 | logging.info("Sending remotely on '%s'", store) 81 | # Retrieve store config options 82 | store_options = self.conf.section( 83 | 'remote_store:%s' % store, default=False) 84 | remote_store = remote_store_factory( 85 | name=store, date=self.date, store_options=store_options) 86 | if self.dry_run: 87 | logging.info( 88 | "As we are in dry-run mode we do not send on %s store", store) 89 | continue 90 | remote_store.send(files=compressed_files) 91 | 92 | if self.dry_run: 93 | try: 94 | logging.info( 95 | "Removing target directory %s because dry-run " 96 | "mode enabled", self.directory) 97 | os.rmdir(self.directory) 98 | except OSError as oserror_exception: 99 | logging.error( 100 | "Unable to remove dest directory (certainly not " 101 | "empty dir): %s", oserror_exception) 102 | 103 | def files(self): 104 | """ 105 | Return a list of files open by all formatters 106 | """ 107 | files = [] 108 | for formatter in self.formatters: 109 | files.extend(getattr(self.formatters[formatter], 'files')()) 110 | 111 | return files 112 | 113 | def compress(self): 114 | """ 115 | Compress all the files open by formatters 116 | """ 117 | compressed_files = [] 118 | for file_to_compress in self.files(): 119 | logging.info("Archiving %s using %s format", file_to_compress, 120 | self.archive_format) 121 | compressed_file = shutil.make_archive( 122 | file_to_compress, 123 | self.archive_format, 124 | root_dir=os.path.dirname(file_to_compress), 125 | base_dir=os.path.basename(file_to_compress), 126 | dry_run=self.dry_run) 127 | 128 | if compressed_file: 129 | logging.info("Compressed file available at %s", 130 | compressed_file) 131 | compressed_files.append(compressed_file) 132 | os.remove(file_to_compress) 133 | return compressed_files 134 | 135 | def init(self): 136 | """ 137 | init stuff 138 | """ 139 | # in case of multiple destinations using file backend 140 | # the class will be instantiated multiple times 141 | # so we need to accept that destination directory 142 | # already exist 143 | # https://github.com/ovh/osarchiver/issues/11 144 | os.makedirs(self.directory, exist_ok=True) 145 | 146 | def write(self, database=None, table=None, data=None): 147 | """ 148 | Write method that should be implemented 149 | For each format instanciate a formatter and writes the data set 150 | """ 151 | logging.info("Writing on backend %s %s data length", self.backend, 152 | len(data)) 153 | 154 | for write_format in self.formats: 155 | # initiate formatter 156 | if write_format not in self.formatters: 157 | try: 158 | class_name = write_format.capitalize() 159 | module = import_module( 160 | 'osarchiver.destination.file.{write_format}'.format( 161 | write_format=write_format)) 162 | formatter_class = getattr(module, class_name) 163 | formatter_instance = formatter_class( 164 | directory=self.directory, 165 | dry_run=self.dry_run, 166 | source=self.source) 167 | self.formatters[write_format] = formatter_instance 168 | except (AttributeError, ImportError) as my_exception: 169 | logging.error(my_exception) 170 | raise ImportError( 171 | "{} is not part of our file formatter".format( 172 | write_format)) 173 | else: 174 | if not issubclass(formatter_class, Formatter): 175 | raise ImportError( 176 | "Unsupported '{}' file format ".format( 177 | 
write_format))
178 | 
179 |             writer = self.formatters[write_format]
180 |             writer.write(database=database, table=table, data=data)
181 | 
182 | 
183 | class Formatter(metaclass=ABCMeta):
184 |     """
185 |     Formatter base class which implements a file format backend; each
186 |     formatter has to inherit from that class
187 |     """
188 | 
189 |     def __init__(self, name=None, directory=None, dry_run=None, source=None):
190 |         """
191 |         Initiator
192 |         """
193 |         self.directory = directory
194 |         self.source = source
195 |         self.handlers = {}
196 |         self.now = arrow.now().strftime('%F_%T')
197 |         self.dry_run = dry_run
198 |         self.name = name or type(self).__name__.upper()
199 | 
200 |     def files(self):
201 |         """
202 |         Return the list of files opened by the formatter
203 |         """
204 |         return [self.handlers[h]['file'] for h in self.handlers]
205 | 
206 |     @abstractmethod
207 |     def write(self, data=None):
208 |         """
209 |         Write method that should be implemented by the classes that inherit
210 |         from the Formatter class
211 |         """
212 | 
213 |     def close(self):
214 |         """
215 |         Close all the file handlers which are not already closed
216 |         """
217 |         for handler in self.handlers:
218 |             if self.handlers[handler]['fh'].closed:
219 |                 continue
220 |             logging.info("Closing handler of %s",
221 |                          self.handlers[handler]['file'])
222 |             self.handlers[handler]['fh'].close()
--------------------------------------------------------------------------------
/osarchiver/destination/file/csv.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | 
5 | """
6 | Implementation of the CSV writer (SQL data -> CSV file)
7 | """
8 | 
9 | import logging
10 | import csv
11 | from osarchiver.destination.file.base import Formatter
12 | 
13 | 
14 | class Csv(Formatter):
15 |     """
16 |     The class implements a formatter of CSV type which is able to convert a
17 |     list of dicts of SQL data into one CSV file
18 |     """
19 | 
20 |     def write(self, database=None, table=None, data=None):
21 |         """
22 |         The write method which should be implemented because of the
23 |         inherited Formatter class.
24 |         The name of the file is of the form <db>.<table>.csv
25 |         """
26 | 
27 |         destination_file = '{directory}/{db}.{table}.csv'.format(
28 |             directory=self.directory, db=database, table=table,
29 |         )
30 |         key = '{db}.{table}'.format(db=database, table=table)
31 | 
32 |         writer = None
33 |         if key in self.handlers:
34 |             writer = self.handlers[key]['csv_writer']
35 |         else:
36 |             self.handlers[key] = {}
37 |             self.handlers[key]['file'] = destination_file
38 |             self.handlers[key]['fh'] = open(
39 |                 destination_file, 'w', encoding='utf-8')
40 |             self.handlers[key]['csv_writer'] = \
41 |                 csv.DictWriter(
42 |                     self.handlers[key]['fh'],
43 |                     fieldnames=[h for h in data[0].keys()])
44 |             writer = self.handlers[key]['csv_writer']
45 |             if not self.dry_run:
46 |                 logging.debug("It seems this is the first write set, adding"
47 |                               " headers to CSV file")
48 |                 writer.writeheader()
49 |             else:
50 |                 logging.debug(
51 |                     "[DRY RUN] headers not written in %s", destination_file)
52 | 
53 |         logging.info("%s formatter: writing %s lines in %s", self.name,
54 |                      len(data), destination_file)
55 |         if not self.dry_run:
56 |             writer.writerows(data)
57 |         else:
58 |             logging.debug("[DRY RUN] No data written in %s", destination_file)
--------------------------------------------------------------------------------
/osarchiver/destination/file/remote_store/__init__.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | """
5 | init file that allows importing osarchiver.destination.file.remote_store
6 | without loading submodules
7 | """
8 | 
9 | from osarchiver.common import backend_factory
10 | from osarchiver.destination.file.remote_store.base import RemoteStore
11 | 
12 | 
13 | def factory(*args, backend='swift', **kwargs):
14 |     """
15 |     remote store backend factory
16 |     """
17 |     return backend_factory(*args,
18 |                            backend=backend,
19 |                            module='osarchiver.destination.file.remote_store',
20 |                            subclass=RemoteStore,
21 |                            **kwargs)
--------------------------------------------------------------------------------
/osarchiver/destination/file/remote_store/base.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | 
5 | """
6 | RemoteStore abstract base class file
7 | """
8 | 
9 | import re
10 | from abc import ABCMeta, abstractmethod
11 | import arrow
12 | 
13 | 
14 | class RemoteStore(metaclass=ABCMeta):
15 |     """
16 |     The RemoteStore abstract class
17 |     """
18 | 
19 |     def __init__(self, name=None, backend='swift', date=None,
20 |                  store_options=None):
21 |         """
22 |         RemoteStore object is defined by a name and a backend
23 |         """
24 |         store_options = store_options or {}
25 |         self.name = name
26 |         self.date = date or arrow.now().strftime('%F_%T')
27 |         self.backend = backend
28 |         # only options prefixed with 'opt_' are forwarded to the backend,
29 |         # with the prefix stripped
30 |         self.store_options = {
31 |             re.sub('^opt_', '', k): v
32 |             for k, v in store_options.items() if k.startswith('opt_')
33 |         }
34 | 
35 |     @abstractmethod
36 |     def send(self, files=None):
37 |         """
38 |         Send method that should be implemented by the backend
39 |         """
--------------------------------------------------------------------------------
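
As the dict comprehension in RemoteStore.__init__ above shows, only configuration keys prefixed with `opt_` are forwarded to the store backend, with the prefix stripped; other keys stay local to OSArchiver. A runnable sketch of that filtering (the sample option keys are illustrative, not a documented list):

import re

store_options = {
    'opt_auth_url': 'https://auth.example.com/v3',
    'opt_region_name': 'GRA',
    'container': 'osarchiver',   # no 'opt_' prefix: not forwarded
}
forwarded = {
    re.sub('^opt_', '', k): v
    for k, v in store_options.items() if k.startswith('opt_')
}
print(forwarded)
# {'auth_url': 'https://auth.example.com/v3', 'region_name': 'GRA'}
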
/osarchiver/destination/file/remote_store/swift.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | """
5 | This module implements the swift remote_store backend which handles sending
6 | OSArchiver log files to a swift store.
7 | This module uses the high level module SwiftService
8 | """
9 | 
10 | import logging
11 | from os.path import basename
12 | from swiftclient.service import SwiftError, SwiftService, SwiftUploadObject
13 | 
14 | from osarchiver.destination.file.remote_store import RemoteStore
15 | 
16 | 
17 | class Swift(RemoteStore):
18 |     """
19 |     Swift class used to send files remotely to openstack's swift backend
20 |     """
21 | 
22 |     def __init__(self, name=None, date=None, store_options=None):
23 |         """
24 |         instantiate the osarchiver.destination.file.remote_store.Swift class
25 |         """
26 |         store_options = store_options or {}
27 |         RemoteStore.__init__(self, backend='swift', name=name,
28 |                              date=date, store_options=store_options)
29 |         self.container = store_options.get('container', None)
30 |         self.file_name_prefix = store_options.get('file_name_prefix', '')
31 |         self.service = None
32 | 
33 |     def send(self, files=None):
34 |         """
35 |         send method implemented which is in charge of sending local log files
36 |         to a remote swift destination.
37 |         """
38 |         files = files or []
39 |         options = self.store_options
40 |         with SwiftService(options=options) as swift:
41 |             file_objects = [
42 |                 SwiftUploadObject(f,
43 |                                   object_name='%s/%s/%s' % (
44 |                                       self.file_name_prefix,
45 |                                       self.date,
46 |                                       basename(f))
47 |                                   ) for f in files]
48 |             for r in swift.upload(self.container, file_objects):
49 |                 if r['success']:
50 |                     if 'object' in r:
51 |                         logging.info("%s successfully uploaded", r['object'])
52 |                 else:
53 |                     error = r['error']
54 |                     if r['action'] == "create_container":
55 |                         logging.error("Failed to create container %s: %s",
56 |                                       self.container, error)
57 |                     elif r['action'] == "upload_object":
58 |                         logging.error("Failed to upload file %s: %s",
59 |                                       r['object'], error)
60 |                     else:
61 |                         logging.error("Unknown error while uploading file: %s",
62 |                                       error)
63 | 
64 |     def clean_exit(self):
65 |         """
66 |         Tasks to be executed to exit cleanly
67 |         """
68 | 
69 | 
70 | if __name__ == '__main__':
71 |     pass
--------------------------------------------------------------------------------
/osarchiver/destination/file/sql.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | 
5 | """
6 | Implementation of the SQL writer (SQL data -> SQL file)
7 | """
8 | 
9 | import logging
10 | import re
11 | import pymysql
12 | from osarchiver.destination.file.base import Formatter
13 | 
14 | 
15 | class Sql(Formatter):
16 |     """
17 |     The class implements a formatter of SQL type which is able to convert a
18 |     list of dicts of data into one file of SQL statements
19 |     """
20 | 
21 |     def get_handler(self, handler=None, file_to_handle=None):
22 |         """
23 |         Return a file handler if it already exists or create a new one
24 |         """
25 |         if handler not in self.handlers:
26 |             self.handlers[handler] = {}
27 |             self.handlers[handler]['file'] = file_to_handle
28 |             self.handlers[handler]['fh'] = open(file_to_handle,
29 |                                                 'w',
30 |                                                 encoding='utf-8')
31 | 
32 |         return self.handlers[handler]['fh']
33 | 
34 |     def write(self, database=None, table=None, data=None):
35 |         """
36 |         The write method which should be implemented because of the
37 |         inherited Formatter class.
38 |         The name of the file is of the form <db>.<table>.sql
39 |         The SQL statement is:
40 |         INSERT INTO <database>.<table> (col1, col2, ...) VALUES
41 |         (val1, val2, ...)
42 |         ON DUPLICATE KEY UPDATE <primary_key> = <primary_key>
43 |         This helps re-importing a file without duplicating already
44 |         inserted lines
45 |         """
46 |         destination_file = '{directory}/{db}.{table}.sql'.format(
47 |             directory=self.directory, db=database, table=table)
48 |         key = '{db}.{table}'.format(db=database, table=table)
49 | 
50 |         writer = self.get_handler(handler=key, file_to_handle=destination_file)
51 |         lines = []
52 |         primary_key = self.source.get_table_primary_key(database=database,
53 |                                                         table=table)
54 |         for item in data:
55 |             # Build the columns part of the insert
56 |             # iterating over keys or values of a dict is consistent in Python 3
57 |             columns = '`' + '`, `'.join(item.keys()) + '`'
58 |             # SQL escaping, None is changed to NULL
59 |             values = [
60 |                 pymysql.escape_string(str(v)) if v is not None else 'NULL'
61 |                 for v in item.values()
62 |             ]
63 |             placeholders = "'" + "', '".join(values) + "'"
64 |             # Remove the single quotes around the NULL statement so it is
65 |             # understood as a MySQL NULL keyword.
66 |             placeholders = re.sub("'NULL'", "NULL", placeholders)
67 |             # The SQL statement
68 |             sql = "INSERT INTO {database}.{table} ({columns}) VALUES "\
69 |                   "({placeholders}) ON DUPLICATE KEY UPDATE {pk} = {pk};\n".\
70 |                   format(
71 |                       database=database,
72 |                       table=table,
73 |                       columns=columns,
74 |                       placeholders=placeholders,
75 |                       pk=primary_key
76 |                   )
77 |             lines.append(sql)
78 | 
79 |         logging.info("%s formatter: writing %s lines in %s", self.name,
80 |                      len(data), destination_file)
81 |         if not self.dry_run:
82 |             writer.writelines(lines)
83 |         else:
84 |             logging.debug("[DRY RUN] No data written in %s", destination_file)
--------------------------------------------------------------------------------
/osarchiver/errors.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
4 | """
5 | OSArchiver exceptions base class
6 | """
7 | 
8 | 
9 | class OSArchiverException(Exception):
10 |     """
11 |     OSArchiver base exception class
12 |     """
13 |     def __init__(self, message=None):
14 |         """
15 |         Instantiate the exception base class
16 |         """
17 |         super().__init__(message)
18 |         self.message = message
19 | 
20 |     def __str__(self):
21 |         return self.message
22 | 
23 | 
24 | class OSArchiverArchivingFailed(OSArchiverException):
25 |     """
26 |     Exception raised when archiving fails
27 |     """
28 |     def __init__(self, message=None):
29 |         super().__init__(message='Archiving of data set failed')
30 | 
--------------------------------------------------------------------------------
/osarchiver/main.py:
--------------------------------------------------------------------------------
1 | # Use of this source code is governed by a BSD-style
2 | # license that can be found in the LICENSE file.
3 | # Copyright 2019 The OSArchiver Authors. All rights reserved.
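
Worth noting in the Sql formatter above: values are escaped and single-quoted, except None, which must end up as an unquoted NULL keyword in the generated statement. A self-contained sketch of that conversion, mirroring the formatter's logic (it relies on pymysql.escape_string, available in the PyMySQL version pinned by this project; the row content is illustrative):

import re
import pymysql

row = {'id': 42, 'display_description': "it's deleted", 'deleted_at': None}
values = [
    pymysql.escape_string(str(v)) if v is not None else 'NULL'
    for v in row.values()
]
placeholders = "'" + "', '".join(values) + "'"
# strip the quotes around NULL so MySQL sees the keyword, not a string
placeholders = re.sub("'NULL'", "NULL", placeholders)
print(placeholders)
# '42', 'it\'s deleted', NULL
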
4 | """ 5 | main file providing osarchiver program 6 | """ 7 | 8 | import sys 9 | import os 10 | import logging 11 | import argparse 12 | import traceback 13 | 14 | from osarchiver.config import Config 15 | 16 | 17 | def parse_args(): 18 | """ 19 | function to parse CLI arguments 20 | return parse_args() of ArgumentParser 21 | """ 22 | parser = argparse.ArgumentParser() 23 | 24 | def file_exists(one_file): 25 | if not os.path.exists(one_file): 26 | raise argparse.ArgumentTypeError( 27 | '{f} no such file'.format(f=one_file)) 28 | return one_file 29 | 30 | parser.add_argument('--config', 31 | help='Configuration file to read', 32 | default=None, 33 | required=True, 34 | type=file_exists) 35 | parser.add_argument('--log-file', 36 | help='Append log to the specified file', 37 | default=None) 38 | parser.add_argument('--log-level', 39 | help='Set log level', 40 | choices=['info', 'warn', 'error', 'debug'], 41 | default='info') 42 | parser.add_argument('--debug', 43 | help='Enable debug mode', 44 | default=False, 45 | action='store_true') 46 | parser.add_argument('--dry-run', 47 | help='Display what would be done without' 48 | ' really deleting or writing data', 49 | default=False, 50 | action='store_true') 51 | args = parser.parse_args() 52 | 53 | if args.debug: 54 | args.log_level = 'debug' 55 | 56 | return args 57 | 58 | 59 | def configure_logger(level='info', log_file=None): 60 | """ 61 | function that configure logging module 62 | """ 63 | logger = logging.getLogger() 64 | logger.setLevel(getattr(logging, level.upper())) 65 | 66 | formatter = logging.Formatter(fmt='%(asctime)s %(levelname)s: %(message)s') 67 | 68 | stdout_handler = logging.StreamHandler(stream=sys.stdout) 69 | stdout_handler.setFormatter(formatter) 70 | logger.addHandler(stdout_handler) 71 | 72 | if log_file is not None: 73 | file_handler = logging.FileHandler(filename=log_file, encoding='utf-8') 74 | file_handler.setFormatter(formatter) 75 | logger.addHandler(file_handler) 76 | 77 | 78 | def run(): 79 | """ 80 | main function that is called when running osarchiver script 81 | It parses arguments, configure logging, load the configuration file and for 82 | each archiver call the run() method 83 | """ 84 | try: 85 | args = parse_args() 86 | config = Config(file_path=args.config, dry_run=args.dry_run) 87 | configure_logger(level=args.log_level, log_file=args.log_file) 88 | 89 | for archiver in config.archivers: 90 | logging.info("Running archiver %s", archiver.name) 91 | archiver.run() 92 | except KeyboardInterrupt: 93 | logging.info("Keyboard interrupt detected") 94 | for archiver in config.archivers: 95 | archiver.clean_exit() 96 | return 1 97 | except Exception as my_exception: 98 | logging.error(my_exception) 99 | logging.error("Full traceback is: %s", traceback.format_exc()) 100 | for archiver in config.archivers: 101 | archiver.clean_exit() 102 | return 1 103 | return 0 104 | -------------------------------------------------------------------------------- /osarchiver/source/__init__.py: -------------------------------------------------------------------------------- 1 | # Use of this source code is governed by a BSD-style 2 | # license that can be found in the LICENSE file. 3 | # Copyright 2019 The OSArchiver Authors. All rights reserved. 4 | 5 | """ 6 | init file that allow to import Source from osarchiver.source whithout loading 7 | submodules. 
8 | """ 9 | 10 | 11 | from osarchiver.common import backend_factory 12 | from osarchiver.source.base import Source 13 | 14 | 15 | def factory(*args, backend='db', **kwargs): 16 | """ 17 | backend factory 18 | """ 19 | return backend_factory(*args, 20 | backend=backend, 21 | module='osarchiver.source', 22 | subclass=Source, 23 | **kwargs) 24 | -------------------------------------------------------------------------------- /osarchiver/source/base.py: -------------------------------------------------------------------------------- 1 | # Use of this source code is governed by a BSD-style 2 | # license that can be found in the LICENSE file. 3 | # Copyright 2019 The OSArchiver Authors. All rights reserved. 4 | 5 | """ 6 | Source abstract base class file 7 | """ 8 | 9 | from abc import ABCMeta, abstractmethod 10 | 11 | 12 | class Source(metaclass=ABCMeta): 13 | """ 14 | The source absrtact base class 15 | """ 16 | 17 | def __init__(self, name=None, backend=None, conf=None): 18 | """ 19 | Source object is defined by a name and a backend 20 | """ 21 | self.name = name 22 | self.backend = backend 23 | self.conf = conf 24 | 25 | @abstractmethod 26 | def read(self, **kwargs): 27 | """ 28 | read method that should be implemented by the backend 29 | """ 30 | 31 | @abstractmethod 32 | def delete(self, **kwargs): 33 | """ 34 | delete method that should be implemented by the backend 35 | """ 36 | 37 | @abstractmethod 38 | def clean_exit(self): 39 | """ 40 | clean_exit method that should be implemented by the backend 41 | provide a way to close and clean properly backend stuff 42 | """ 43 | -------------------------------------------------------------------------------- /osarchiver/source/db.py: -------------------------------------------------------------------------------- 1 | # Use of this source code is governed by a BSD-style 2 | # license that can be found in the LICENSE file. 3 | # Copyright 2019 The OSArchiver Authors. All rights reserved. 
4 | """ 5 | OSArchiver's Source class that implement a db backend 6 | """ 7 | 8 | import re 9 | import time 10 | import logging 11 | import pymysql 12 | import arrow 13 | from numpy import array_split 14 | from osarchiver.source import Source 15 | from osarchiver.common.db import DbBase 16 | from sqlalchemy import inspect 17 | import sqlalchemy_utils 18 | 19 | NOT_OS_DB = ['mysql', 'performance_schema', 'information_schema'] 20 | 21 | 22 | class Db(Source, DbBase): 23 | """ 24 | Database backend of OSArchiver's Source 25 | """ 26 | 27 | def __init__(self, 28 | databases=None, 29 | tables=None, 30 | delete_data=0, 31 | excluded_databases='', 32 | excluded_tables='', 33 | where='1=1 LIMIT 0', 34 | archive_data=None, 35 | name=None, 36 | destination=None, 37 | **kwargs): 38 | """ 39 | Create a Source instance with relevant configuration parameters given 40 | in arguments 41 | """ 42 | self.databases = databases 43 | self.tables = tables 44 | self.configured_excluded_databases = [ 45 | d for d in re.split(',|;|\n', excluded_databases.replace(' ', '')) 46 | ] 47 | self._excluded_databases = None 48 | self.configured_excluded_tables = [ 49 | d for d in re.split(',|;|\n', excluded_tables.replace(' ', '')) 50 | ] 51 | self._excluded_tables = None 52 | self.archive_data = archive_data 53 | self.delete_data = delete_data 54 | self.destination = destination 55 | self._databases_to_archive = [] 56 | self._tables_to_archive = {} 57 | self.tables_with_circular_fk = [] 58 | # When selecting data be sure to use the same date to prevent selecting 59 | # parent data newer than children data, it is of the responsability of 60 | # the operator to use the {now} formating value in the configuration 61 | # file in the where option. If {now} is ommitted it it is possible to 62 | # get foreign key check errors because of parents data newer than 63 | # children data 64 | self.now = arrow.utcnow().format(fmt='YYYY-MM-DD HH:mm:ss') 65 | self.where = where.format(now=self.now) 66 | Source.__init__(self, backend='db', name=name, 67 | conf=kwargs.get('conf', None)) 68 | DbBase.__init__(self, **kwargs) 69 | 70 | def __repr__(self): 71 | return "Source {name} [Backend:{backend} Host:{host} - DB:{db} - "\ 72 | "Tables:{tables}]".format(backend=self.backend, db=self.databases, 73 | name=self.name, tables=self.tables, 74 | host=self.host) 75 | 76 | @property 77 | def excluded_databases(self): 78 | if self._excluded_databases is not None: 79 | return self._excluded_databases 80 | 81 | excluded_db_set = set(self.configured_excluded_databases) 82 | excluded_db_set.update(set(NOT_OS_DB)) 83 | self._excluded_databases = list(excluded_db_set) 84 | 85 | return self._excluded_databases 86 | 87 | @property 88 | def excluded_tables(self): 89 | if self._excluded_tables is not None: 90 | return self._excluded_tables 91 | 92 | self._excluded_tables = self.configured_excluded_tables 93 | 94 | return self._excluded_tables 95 | 96 | def databases_to_archive(self): 97 | """ 98 | Return a list of databases that are eligibles to archiving. 
If no 99 | database are provided or the * character is used the method basically 100 | do a SHOW DATABASE to get available databases 101 | The method exclude the databases that are explicitly excluded 102 | """ 103 | if self._databases_to_archive: 104 | return self._databases_to_archive 105 | 106 | if self.databases is None or self.databases == '*': 107 | self._databases_to_archive = self.get_os_databases() 108 | else: 109 | self._databases_to_archive = [ 110 | d for d in re.split(',|;|\n', self.databases.replace(' ', '')) 111 | ] 112 | 113 | excluded_databases_regex = \ 114 | "^(" + "|".join(self.excluded_databases) + ")$" 115 | self._databases_to_archive = [ 116 | d for d in self._databases_to_archive 117 | if not re.match(excluded_databases_regex, d) 118 | ] 119 | 120 | return self._databases_to_archive 121 | 122 | def tables_to_archive(self, database=None): 123 | """ 124 | For a given database, return the list of tables that are eligible to 125 | archiving. 126 | - Retrieve tables if needed (*, or empty) 127 | - Check that tables has 'deleted_at' column (deleted_column 128 | parameter) 129 | - Exclude tables in excluded_tables 130 | - Reorder tables depending foreign key 131 | """ 132 | if database is None: 133 | logging.warning("Can not call tables_to_archive on None database") 134 | return [] 135 | if database in self._tables_to_archive: 136 | return self._tables_to_archive[database] 137 | 138 | database_tables = [ 139 | v[0] for (i, v) in enumerate(self.get_database_tables(database)) 140 | ] 141 | logging.info("Tables list of database '%s': %s", database, 142 | database_tables) 143 | # Step 1: is to get all the tables we want to archive 144 | # no table specified or jocker used means we want all tables 145 | # else we filter against the tables specified 146 | if self.tables is None or self.tables == '*': 147 | self._tables_to_archive[database] = database_tables 148 | else: 149 | self._tables_to_archive[database] = \ 150 | [t for t in re.split(',|;|\n', self.tables.replace(' ', '')) 151 | if t in database_tables] 152 | 153 | # Step 2: verify that all tables have the deleted column 'deleted_at' 154 | logging.debug("Verifying that tables have the '%s' column", 155 | self.deleted_column) 156 | tables = [] 157 | for table in self._tables_to_archive[database]: 158 | if not self.table_has_deleted_column(table=table, 159 | database=database): 160 | logging.debug( 161 | "Table '%s' has no column named '%s'," 162 | " ignoring it", table, self.deleted_column) 163 | continue 164 | tables.append(table) 165 | # update self._tables_to_archive with the filtered tables 166 | self._tables_to_archive[database] = tables 167 | 168 | # Step 3: then exclude the one explicitly given 169 | excluded_tables_regex = "^(" + "|".join(self.excluded_tables) + ")$" 170 | logging.debug("Ignoring tables matching '%s'", excluded_tables_regex) 171 | self._tables_to_archive[database] = [ 172 | t for t in self._tables_to_archive[database] 173 | if not re.match(excluded_tables_regex, t) 174 | ] 175 | 176 | # Step 4 for each table retrieve child tables referencing the parent 177 | # table and order them childs first, parents then 178 | sorted_tables = self.sort_tables( 179 | database=database, tables=self._tables_to_archive[database]) 180 | self._tables_to_archive[database] = sorted_tables 181 | 182 | logging.debug( 183 | "Tables ordered depending foreign key dependencies: " 184 | "'%s'", self._tables_to_archive[database]) 185 | return self._tables_to_archive[database] 186 | 187 | def sort_tables(self, database=None, 
tables=None):
188 |         """
189 |         Given a DB and a list of tables, return the list ordered depending
190 |         on foreign key constraints, to get child tables before parent tables
191 |         """
192 |         inspector = inspect(self.sqlalchemy_engine)
193 |         sorted_tables = []
194 |         logging.debug("Tables to sort: %s", tables)
195 |         for table in tables or []:
196 |             if not self.table_has_deleted_column(table=table, database=database):
197 |                 continue
198 |             if table not in sorted_tables:
199 |                 logging.debug("Table %s added to final list", table)
200 |                 sorted_tables.append(table)
201 |             idx = sorted_tables.index(table)
202 |             fks = inspector.get_foreign_keys(table, schema=database)
203 |             logging.debug("Foreign keys of %s: %s", table, fks)
204 |             for fk in fks:
205 |                 t = fk['referred_table']
206 | 
207 |                 if t in sorted_tables:
208 |                     if sorted_tables.index(t) > idx:
209 |                         continue
210 |                     else:
211 |                         sorted_tables.remove(t)
212 |                 sorted_tables.insert(idx + 1, t)
213 | 
214 |         return sorted_tables
215 | 
216 |     def select(self, limit=None, database=None, table=None):
217 |         """
218 |         Select data from a database.table, applying the given limit or the
219 |         default one; the per-set select depends on the primary key type:
220 |         In case of int:
221 |         SELECT * FROM <database>.<table> WHERE <pk> > <last_id> AND ...
222 |         In case of uuid (uuids are not naturally ordered, we sort them):
223 |         SELECT * FROM <database>.<table> WHERE <pk> > '<last_id>' AND ...
224 |         ORDER BY <pk>
225 |         """
226 |         offset = 0
227 |         last_selected_id = 0
228 | 
229 |         # Use the primary key column to improve performance on large
230 |         # data sets vs using OFFSET
231 |         primary_key = self.get_table_primary_key(database=database,
232 |                                                  table=table)
233 | 
234 |         if limit is None:
235 |             limit = self.select_limit
236 | 
237 |         sql = "SELECT * FROM `{database}`.`{table}` WHERE {pk} > "\
238 |               "'{last_id}' AND {where} LIMIT {limit}"
239 | 
240 |         pk_type_checked = False
241 | 
242 |         while True:
243 |             formatted_sql = sql.format(database=database,
244 |                                        table=table,
245 |                                        where=self.where,
246 |                                        limit=limit,
247 |                                        last_id=last_selected_id,
248 |                                        pk=primary_key,
249 |                                        offset=offset)
250 |             result = self.db_request(sql=formatted_sql,
251 |                                      cursor_type=pymysql.cursors.DictCursor,
252 |                                      database=database,
253 |                                      table=table,
254 |                                      fetch_method='fetchall')
255 |             logging.info("Fetched %s results in %s.%s", len(result), database,
256 |                          table)
257 |             if not result:
258 |                 break
259 |             last_selected_id = result[-1][primary_key]
260 | 
261 |             yield result
262 | 
263 |             offset += len(result)
264 |             if pk_type_checked is False:
265 |                 # If the primary key is a digit, remove the single quotes
266 |                 # around the last_id value for performance purposes
267 |                 if str(last_selected_id).isdigit():
268 |                     # remove the single quotes around the id
269 |                     sql = "SELECT * FROM `{database}`.`{table}` WHERE {pk} >"\
270 |                           " {last_id} AND {where} LIMIT {limit}"
271 |                 else:
272 |                     # else this is a string; force ordering by that string
273 |                     # to simulate an integer primary key
274 |                     sql = "SELECT * FROM `{database}`.`{table}` WHERE {pk} >"\
275 |                           " '{last_id}' AND {where} ORDER BY {pk} LIMIT {limit}"
276 | 
277 |                 pk_type_checked = True
278 | 
279 |     def read(self, limit=None):
280 |         """
281 |         The read method that has to be implemented (Source abstract class)
282 |         """
283 |         databases_to_archive = self.databases_to_archive()
284 |         logging.info("Databases elected for archiving: %s",
285 |                      databases_to_archive)
286 |         for database in databases_to_archive:
287 |             tables_to_archive = self.tables_to_archive(database=database)
288 |             logging.info("Tables elected for archiving: %s", tables_to_archive)
289 |             for table in tables_to_archive:
290 |                 logging.info("%s.%s is to archive", database, table)
291 |                 yield {
292 |                     'database':
293 |                     database,
294 |                     'table':
295 |                     table,
296 |                     'data':
297 |                     self.select(limit=limit, database=database, table=table)
298 |                 }
299 | 
300 |     def delete_set(self, database=None, table=None, limit=None, data=None):
301 |         """
302 |         Delete a set of data using the primary key of the table
303 |         """
304 |         if not self.delete_data:
305 |             logging.info(
306 |                 "Ignoring delete step because delete_data is set to"
307 |                 " %s", self.delete_data)
308 |             return
309 |         if limit is None:
310 |             limit = self.delete_limit
311 | 
312 |         primary_key = self.get_table_primary_key(database=database,
313 |                                                  table=table)
314 | 
315 |         # Check if the primary key is a digit to prevent casting by MySQL
316 |         # and optimize the request; store the value in metadata for caching
317 |         pk_is_digit = self.get_metadata(database=database,
318 |                                         table=table,
319 |                                         key='pk_is_digit')
320 |         if pk_is_digit is None:
321 |             pk_is_digit = str(data[0][primary_key]).isdigit()
322 |             self.add_metadata(database=database,
323 |                               table=table,
324 |                               key='pk_is_digit',
325 |                               value=pk_is_digit)
326 | 
327 |         def create_array_chunks(array, chunk_size):
328 |             for i in range(0, len(array), chunk_size):
329 |                 yield array[i:i + chunk_size]
330 | 
331 |         # For performance purposes, split the data into chunks of length=limit
332 |         for subdata in list(create_array_chunks(data, limit)):
333 |             if pk_is_digit:
334 |                 ids = ', '.join([str(d[primary_key]) for d in subdata])
335 |             else:
336 |                 ids = '"' + '", "'.join([str(d[primary_key]) for d in subdata]) + '"'
337 | 
338 |             total_deleted_count = 0
339 |             # equivalent to a while True, but we know why we are looping
340 |             while "there are rows to delete":
341 |                 if total_deleted_count > 0:
342 |                     logging.debug(
343 |                         "Waiting %s seconds before deleting the next"
344 |                         " subset of data", self.delete_loop_delay)
345 |                     time.sleep(int(self.delete_loop_delay))
346 | 
347 |                 sql = "DELETE FROM `{database}`.`{table}` WHERE "\
348 |                     "`{pk}` IN ({ids}) LIMIT {limit}".format(
349 |                         database=database,
350 |                         table=table,
351 |                         ids=ids,
352 |                         pk=primary_key,
353 |                         limit=limit)
354 |                 foreign_key_check = None
355 |                 if '{db}.{table}'.format(db=database, table=table) \
356 |                         in self.tables_with_circular_fk:
357 |                     foreign_key_check = False
358 | 
359 |                 count = self.db_request(sql=sql,
360 |                                         foreign_key_check=foreign_key_check,
361 |                                         database=database,
362 |                                         table=table)
363 |                 logging.info("%s rows deleted from %s.%s", count, database,
364 |                              table)
365 |                 total_deleted_count += count
366 | 
367 |                 if int(count) < int(limit) or \
368 |                         total_deleted_count == len(subdata):
369 |                     logging.debug("No more rows to delete in this data set")
370 |                     break
371 | 
372 |                 logging.debug("Waiting %s seconds after a deletion",
373 |                               self.delete_loop_delay)
374 |                 time.sleep(int(self.delete_loop_delay))
375 | 
376 |     def delete(self, database=None, table=None, limit=None, data=None):
377 |         """
378 |         The delete method that has to be implemented (Source abstract class)
379 |         """
380 |         try:
381 |             self.delete_set(database=database,
382 |                             table=table,
383 |                             limit=limit,
384 |                             data=data)
385 |         except pymysql.err.IntegrityError as integrity_error:
386 | 
387 |             # A foreign key constraint failure usually happens because of
388 |             # errors while processing openstack tasks, leaving children
389 |             # rows pointing at the rows we try to delete.
390 |             # To prevent never deleting the rest of the set, we re-run the
391 |             # delete with one half of the set at a time whenever we catch
392 |             # an integrity error (MySQL error 1451), until we isolate the
393 |             # offending row.
394 |             if integrity_error.args[0] != 1451:
395 |                 raise integrity_error
396 | 
397 |             # we caught the row causing the integrity error
398 |             if len(data) == 1:
399 |                 logging.error("OSArchiver hit a row that will never be deleted"
400 |                               " unless you fix the remaining children data")
401 |                 logging.error("Parent row that cannot be deleted: %s", data)
402 |                 logging.error("To get children items:")
403 |                 logging.error(
404 |                     self.integrity_exception_select_statement(
405 |                         error=integrity_error.args[1], row=data[0]))
406 |                 logging.error("Here is a POTENTIAL fix; ensure BEFORE running"
407 |                               " it that the data should effectively be"
408 |                               " deleted, then run osarchiver again:")
409 |                 logging.error(
410 |                     self.integrity_exception_potential_fix(
411 |                         error=integrity_error.args[1], row=data[0]))
412 |             else:
413 |                 logging.error("Integrity error caught, deleting with "
414 |                               "dichotomy")
415 |                 for subdata in array_split(data, 2):
416 |                     logging.debug(
417 |                         "Dichotomy delete with a data set of length"
418 |                         " %s", len(subdata))
419 |                     # Add a sleep period because in case of error in delete_set
420 |                     # we never sleep; it avoids some lock wait timeouts for
421 |                     # incoming requests
422 |                     time.sleep(int(self.delete_loop_delay))
423 |                     self.delete(database=database,
424 |                                 table=table,
425 |                                 data=subdata,
426 | limit=len(subdata)) 427 | 428 | def clean_exit(self): 429 | """ 430 | Tasks to be executed to exit cleanly: 431 | - Disconnect from the database 432 | """ 433 | logging.info("Closing source DB connection") 434 | self.disconnect() 435 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | arrow==0.17.0 2 | configparser==4.0.2 3 | importlib-resources==3.2.1 4 | numpy==1.18.5 5 | PyMySQL==0.10.1 6 | python-dateutil==2.8.1 7 | six==1.15.0 8 | python_swiftclient==3.13.0 9 | python-keystoneclient==4.4.0 10 | SQLAlchemy==1.3.24 11 | SQLAlchemy-Utils==0.37.2 12 | zipp==1.2.0 13 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | name = osarchiver 3 | summary = Openstack DB archiver 4 | license = Apache 2.0 5 | version = 0.2.0 6 | description-file = 7 | README.md 8 | author = OVHcloud SAS 9 | author-email = opensource@ovh.net 10 | home-page = https://github.com/ovh/osarchiver 11 | python-requires = >=3.5 12 | classifier = 13 | Intended Audience :: Developers 14 | License :: OSI Approved :: Apache Software License 15 | Programming Language :: Python 16 | Programming Language :: Python :: 3.2 17 | Programming Language :: Python :: 3.3 18 | Programming Language :: Python :: 3.4 19 | Programming Language :: Python :: 3.5 20 | Topic :: Software Development :: Libraries :: Python Module 21 | 22 | [files] 23 | packages = 24 | osarchiver 25 | 26 | [entry_points] 27 | console_scripts = 28 | osarchiver = osarchiver.main:run 29 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | try: 4 | from setuptools import setup, find_packages 5 | except ImportError: 6 | from distutils.core import setup, find_packages 7 | 8 | setup(setup_requires=['pbr>=2.0.0'], 9 | pbr=True) 10 | --------------------------------------------------------------------------------
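
The console_scripts entry point above wires the `osarchiver` command to osarchiver.main:run, whose return value (0 on success, 1 on error or keyboard interrupt) becomes the process exit code. Invoking the CLI is therefore equivalent to this small Python stub:

# Equivalent of the 'osarchiver' console script generated from setup.cfg;
# it expects the same CLI arguments, e.g. --config archiver.ini.
import sys

from osarchiver.main import run

sys.exit(run())
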