├── requirements.txt
├── images
│   ├── jumpstart-01.png
│   ├── jumpstart-02.png
│   └── jumpstart-03.png
├── environment.yml
├── .gitignore
├── MultilabelPredictor.py
├── LICENSE
└── README.md

/requirements.txt:
--------------------------------------------------------------------------------
torch
pandas
autogluon
sagemaker>=2.80
vowpalwabbit==8.10.1
setuptools==59.5.0
openpyxl
--------------------------------------------------------------------------------
/images/jumpstart-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/machinelearnear/multi-label-classification-autogluon/main/images/jumpstart-01.png
--------------------------------------------------------------------------------
/images/jumpstart-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/machinelearnear/multi-label-classification-autogluon/main/images/jumpstart-02.png
--------------------------------------------------------------------------------
/images/jumpstart-03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/machinelearnear/multi-label-classification-autogluon/main/images/jumpstart-03.png
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
# To use:
# $ conda env create -f environment.yml
# $ conda activate machinelearnear-autogluon
name: machinelearnear-autogluon
dependencies:
  - python=3.9
  - pip
  - nb_conda_kernels
  - ipykernel
  - ipywidgets
  - gh
  - pip:
    - -r requirements.txt
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
data/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
--------------------------------------------------------------------------------
/MultilabelPredictor.py:
--------------------------------------------------------------------------------
from autogluon.tabular import TabularDataset, TabularPredictor
from autogluon.common.utils.utils import setup_outputdir
from autogluon.core.utils.loaders import load_pkl
from autogluon.core.utils.savers import save_pkl
import os.path

class MultilabelPredictor():
    """ Tabular Predictor for predicting multiple columns in a table.
        Creates multiple TabularPredictor objects which you can also use individually.
        You can access the TabularPredictor for a particular label via: `multilabel_predictor.get_predictor(label_i)`

        Parameters
        ----------
        labels : List[str]
            The ith element of this list is the column (i.e. `label`) predicted by the ith TabularPredictor stored in this object.
        path : str, default = None
            Path to directory where models and intermediate outputs should be saved.
            If unspecified, a time-stamped folder called "AutogluonModels/ag-[TIMESTAMP]" will be created in the working directory to store all models.
            Note: To call `fit()` twice and save all results of each fit, you must specify different `path` locations or don't specify `path` at all.
            Otherwise files from the first `fit()` will be overwritten by the second `fit()`.
            Caution: when predicting many labels, this directory may grow large as it needs to store many TabularPredictors.
        problem_types : List[str], default = None
            The ith element is the `problem_type` for the ith TabularPredictor stored in this object.
        eval_metrics : List[str], default = None
            The ith element is the `eval_metric` for the ith TabularPredictor stored in this object.
        consider_labels_correlation : bool, default = True
            Whether the predictions of multiple labels should account for label correlations or predict each label independently of the others.
            If True, the ordering of `labels` may affect resulting accuracy as each label is predicted conditional on the previous labels appearing earlier in this list (i.e. in an auto-regressive fashion).
            Set to False if during inference you may want to individually use just the ith TabularPredictor without predicting all the other labels.
        kwargs :
            Arguments passed into the initialization of each TabularPredictor.
    """

    multi_predictor_file = 'multilabel_predictor.pkl'

    def __init__(self, labels, path=None, problem_types=None, eval_metrics=None, consider_labels_correlation=True, **kwargs):
        if len(labels) < 2:
            raise ValueError("MultilabelPredictor is only intended for predicting MULTIPLE labels (columns), use TabularPredictor for predicting one label (column).")
        if (problem_types is not None) and (len(problem_types) != len(labels)):
            raise ValueError("If provided, `problem_types` must have same length as `labels`")
        if (eval_metrics is not None) and (len(eval_metrics) != len(labels)):
            raise ValueError("If provided, `eval_metrics` must have same length as `labels`")
        self.path = setup_outputdir(path, warn_if_exist=False)
        self.labels = labels
        self.consider_labels_correlation = consider_labels_correlation
        self.predictors = {}  # key = label, value = TabularPredictor or str path to the TabularPredictor for this label
        if eval_metrics is None:
            self.eval_metrics = {}
        else:
            self.eval_metrics = {labels[i]: eval_metrics[i] for i in range(len(labels))}
        problem_type = None
        eval_metric = None
        for i in range(len(labels)):
            label = labels[i]
            path_i = self.path + "Predictor_" + label
            if problem_types is not None:
                problem_type = problem_types[i]
            if eval_metrics is not None:
                eval_metric = eval_metrics[i]
            self.predictors[label] = TabularPredictor(label=label, problem_type=problem_type, eval_metric=eval_metric, path=path_i, **kwargs)

    def fit(self, train_data, tuning_data=None, **kwargs):
        """ Fits a separate TabularPredictor to predict each of the labels.

            Parameters
            ----------
            train_data, tuning_data : str or autogluon.tabular.TabularDataset or pd.DataFrame
                See documentation for `TabularPredictor.fit()`.
            kwargs :
                Arguments passed into the `fit()` call for each TabularPredictor.
        """
        if isinstance(train_data, str):
            train_data = TabularDataset(train_data)
        if tuning_data is not None and isinstance(tuning_data, str):
            tuning_data = TabularDataset(tuning_data)
        train_data_og = train_data.copy()
        if tuning_data is not None:
            tuning_data_og = tuning_data.copy()
        else:
            tuning_data_og = None
        save_metrics = len(self.eval_metrics) == 0
        for i in range(len(self.labels)):
            label = self.labels[i]
            predictor = self.get_predictor(label)
            if not self.consider_labels_correlation:
                labels_to_drop = [l for l in self.labels if l != label]
            else:
                labels_to_drop = [self.labels[j] for j in range(i+1, len(self.labels))]
            train_data = train_data_og.drop(labels_to_drop, axis=1)
            if tuning_data is not None:
                tuning_data = tuning_data_og.drop(labels_to_drop, axis=1)
            print(f"Fitting TabularPredictor for label: {label} ...")
            predictor.fit(train_data=train_data, tuning_data=tuning_data, **kwargs)
            self.predictors[label] = predictor.path
            if save_metrics:
                self.eval_metrics[label] = predictor.eval_metric
        self.save()

    def predict(self, data, **kwargs):
        """ Returns DataFrame with label columns containing predictions for each label.

            Parameters
            ----------
            data : str or autogluon.tabular.TabularDataset or pd.DataFrame
                Data to make predictions for. If label columns are present in this data, they will be ignored. See documentation for `TabularPredictor.predict()`.
            kwargs :
                Arguments passed into the `predict()` call for each TabularPredictor.
        """
        return self._predict(data, as_proba=False, **kwargs)

    def predict_proba(self, data, **kwargs):
        """ Returns dict where each key is a label and the corresponding value is the `predict_proba()` output for just that label.

            Parameters
            ----------
            data : str or autogluon.tabular.TabularDataset or pd.DataFrame
                Data to make predictions for. See documentation for `TabularPredictor.predict()` and `TabularPredictor.predict_proba()`.
            kwargs :
                Arguments passed into the `predict_proba()` call for each TabularPredictor (also passed into a `predict()` call).
        """
        return self._predict(data, as_proba=True, **kwargs)

    def evaluate(self, data, **kwargs):
        """ Returns dict where each key is a label and the corresponding value is the `evaluate()` output for just that label.

            Parameters
            ----------
            data : str or autogluon.tabular.TabularDataset or pd.DataFrame
                Data to evaluate predictions of all labels for; must contain all labels as columns. See documentation for `TabularPredictor.evaluate()`.
            kwargs :
                Arguments passed into the `evaluate()` call for each TabularPredictor (also passed into the `predict()` call).
        """
        data = self._get_data(data)
        eval_dict = {}
        for label in self.labels:
            print(f"Evaluating TabularPredictor for label: {label} ...")
            predictor = self.get_predictor(label)
            eval_dict[label] = predictor.evaluate(data, **kwargs)
            if self.consider_labels_correlation:
                data[label] = predictor.predict(data, **kwargs)
        return eval_dict

    def save(self):
        """ Save MultilabelPredictor to disk. """
        for label in self.labels:
            if not isinstance(self.predictors[label], str):
                self.predictors[label] = self.predictors[label].path
        save_pkl.save(path=self.path + self.multi_predictor_file, object=self)
        print(f"MultilabelPredictor saved to disk. Load with: MultilabelPredictor.load('{self.path}')")

    @classmethod
    def load(cls, path):
        """ Load MultilabelPredictor from disk `path` previously specified when creating this MultilabelPredictor. """
        path = os.path.expanduser(path)
        if path[-1] != os.path.sep:
            path = path + os.path.sep
        return load_pkl.load(path=path + cls.multi_predictor_file)

    def get_predictor(self, label):
        """ Returns TabularPredictor which is used to predict this label.
""" 163 | predictor = self.predictors[label] 164 | if isinstance(predictor, str): 165 | return TabularPredictor.load(path=predictor) 166 | return predictor 167 | 168 | def _get_data(self, data): 169 | if isinstance(data, str): 170 | return TabularDataset(data) 171 | return data.copy() 172 | 173 | def _predict(self, data, as_proba=False, **kwargs): 174 | data = self._get_data(data) 175 | if as_proba: 176 | predproba_dict = {} 177 | for label in self.labels: 178 | print(f"Predicting with TabularPredictor for label: {label} ...") 179 | predictor = self.get_predictor(label) 180 | if as_proba: 181 | predproba_dict[label] = predictor.predict_proba(data, as_multiclass=True, **kwargs) 182 | data[label] = predictor.predict(data, **kwargs) 183 | if not as_proba: 184 | return data[self.labels] 185 | else: 186 | return predproba_dict 187 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution.
      You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE.
      You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Multi-label classification using `AutoGluon`

From their [official website](https://auto.gluon.ai/stable/index.html):
> AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data.
> Intended for both ML beginners and experts, AutoGluon enables you to:
> - Quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code.
> - Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge.
> - Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing.
> - Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use-case.

In this repository we are going to see an example of how to take a sample input, pre-process it into the right format, train different algorithms and evaluate their performance, and perform inference on a validation sample. The input dataset won't be made available but, as a guideline, it follows the structure below:

| Case Details | Category | Subcategory | Sector | Tag |
|--------------|----------|-------------|--------|--------|
| Sentence 1 | A | AA | S01 | Double |
| Sentence 2 | B | BA | S01 | Mono |
| ... | ... | ... | ... | ... |
| Sentence N | Z | ZZ | S99 | Mono |

## Getting started

- [SageMaker StudioLab Explainer Video](https://www.youtube.com/watch?v=FUEIwAsrMP4)
- [AutoGluon Documentation](https://auto.gluon.ai/stable/index.html)
- [New built-in Amazon SageMaker algorithms for tabular data modeling: LightGBM, CatBoost, AutoGluon-Tabular, and TabTransformer](https://aws.amazon.com/blogs/machine-learning/new-built-in-amazon-sagemaker-algorithms-for-tabular-data-modeling-lightgbm-catboost-autogluon-tabular-and-tabtransformer/)
- [Run text classification with Amazon SageMaker JumpStart using TensorFlow Hub and Hugging Face models](https://aws.amazon.com/blogs/machine-learning/run-text-classification-with-amazon-sagemaker-jumpstart-using-tensorflow-hub-and-huggingface-models/)

## Step by step tutorial

### Set up your environment

First, you need a [SageMaker Studio Lab](https://studiolab.sagemaker.aws/) account. This is completely free and you don't need an AWS account. Because this new service is still in Preview and AWS is looking to reduce fraud (e.g. crypto mining), you will need to wait 1-3 days for your account to be approved. See [this video](https://www.youtube.com/watch?v=FUEIwAsrMP4&ab_channel=machinelearnear) for more information.

Now that you have your Studio Lab account, you can follow the steps shown in `data_prep_train_eval.ipynb` > [![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/machinelearnear/multi-label-classification-autogluon/blob/main/data_prep_train_eval.ipynb)

Click on `Copy to project` in the top right corner. This will open the Studio Lab web interface and ask you whether you want to clone the entire repo or just the notebook. Clone the entire repo and click `Yes` when asked about building the `Conda` environment automatically. You will then be running on top of a `Python` environment with `AutoGluon` and the other dependencies installed.

### Data preparation

Following the example shown in this [AWS blog post](https://aws.amazon.com/blogs/machine-learning/run-text-classification-with-amazon-sagemaker-jumpstart-using-tensorflow-hub-and-huggingface-models/), we are looking to go from the initial dataset structure to a single label to be predicted and a single input sentence column.
> Text classification

> Sentiment analysis is one of the many tasks under the umbrella of text classification. It consists of predicting what sentiment should be assigned to a specific passage of text, with varying degrees of granularity. Typical applications include social media monitoring, customer support management, and analyzing customer feedback.

> The input is a directory containing a data.csv file. The first column is the label (an integer between 0 and the number of classes in the dataset), and the second column is the corresponding passage of text. This means that you could even use a dataset with more degrees of sentiment than the original—for example, very negative (0), negative (1), neutral (2), positive (3), very positive (4). The following is an example of a data.csv file corresponding to the SST2 (Stanford Sentiment Treebank) dataset, and shows values in its first two columns. Note that the file shouldn’t have any header.

|Column 1|Column 2|
|:----|:----|
|0|hide new secretions from the parental units|
|0|contains no wit , only labored gags|
|1|that loves its characters and communicates something rather beautiful about human nature|
|0|remains utterly satisfied to remain the same throughout|
|0|on the worst revenge-of-the-nerds clichés the filmmakers could dredge up|
|0|that ‘s far too tragic to merit such superficial treatment|
|1|demonstrates that the director of such hollywood blockbusters as patriot games can still turn out a small , personal film with an emotional wallop .|

```python
# prepare for training
df_sector.columns = ['sentence', 'label']
df_sector = df_sector[['label', 'sentence']]
codes, uniques = pd.factorize(df_sector['label'])
df_sector['label'] = codes

# print labels and codes
for x, y in zip(df_sector['label'].unique(), uniques.values):
    print(f'{x} => "{y}"')

# save to disk
df_sector.to_csv('data/data_sector.csv', index=False)
```

Looking at samples in the raw dataset, each grouping (e.g., "Case") can be linked to one or more categories and to one or more subcategories, creating hundreds of variations. As a result, we treat predicting "Category" and "Subcategory" as multi-label classification problems, where each "Case" can carry one or more categories and one or more subcategories as labels.

The first step is to transform our dataset and create a label column for each of the categories. Next, for each sample we assign a `0` or `1` to each label column depending on whether the "Case" belongs to that category or not.

```python
# prepare for training
df_category = df.dropna(subset=['Category'])
df_category = df_category.drop(['Sector', 'Subcategory', 'Tag'], axis=1)

df_category = df_category.pivot_table(index='Case Details', columns=['Category'], aggfunc=len, fill_value=0)

df_category['sentence'] = df_category.index
df_category.reset_index(drop=True, inplace=True)
df_category = df_category.rename(columns=str.lower)
df_category.shape
```

We follow the same approach for subcategories.
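For reference, the subcategory transform mirrors the category one. A minimal sketch, assuming the same raw `df` and the column names used above:

```python
# prepare subcategory labels (sketch; mirrors the category transform above)
df_subcategory = df.dropna(subset=['Subcategory'])
df_subcategory = df_subcategory.drop(['Sector', 'Category', 'Tag'], axis=1)

# one column per subcategory: 1 if the "Case" belongs to it, 0 otherwise
df_subcategory = df_subcategory.pivot_table(index='Case Details', columns=['Subcategory'],
                                            aggfunc=len, fill_value=0)

df_subcategory['sentence'] = df_subcategory.index
df_subcategory.reset_index(drop=True, inplace=True)
df_subcategory = df_subcategory.rename(columns=str.lower)
```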

### Model training

#### Predict single label using `TabularPredictor`

- https://auto.gluon.ai/scoredebugweight/api/autogluon.task.html#autogluon.tabular.TabularPredictor

```python
from autogluon.tabular import TabularPredictor

time_limit = 1 * 60 * 60  # 1 hour, in seconds
pred_sector = TabularPredictor(label='label', path='pred_sector')
pred_sector.fit(df_sector, hyperparameters='multimodal', time_limit=time_limit)
```

Once the training is finished, you can view a leaderboard with results across the different algorithms.

```python
leaderboard = pred_sector.leaderboard(df_sector)
leaderboard.to_csv('data/leaderboard.csv', index=False)
leaderboard.head(10)
```

| model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
| ------------------- | ---------- | ---------- | -------------- | ------------- | ---------- | ----------------------- | ---------------------- | ----------------- | ----------- | --------- | --------- |
| WeightedEnsemble_L2 | 0.96294896 | 0.83967936 | 4.62426472 | 1.03646231 | 254.271195 | 0.22828174 | 0.00050831 | 0.28966641 | 2 | TRUE | 9 |
| VowpalWabbit | 0.95803403 | 0.79158317 | 0.73444152 | 0.11314583 | 5.56324816 | 0.73444152 | 0.11314583 | 5.56324816 | 1 | TRUE | 6 |
| LightGBM | 0.95614367 | 0.78356713 | 0.11071897 | 0.01948977 | 7.96716714 | 0.11071897 | 0.01948977 | 7.96716714 | 1 | TRUE | 1 |
| XGBoost | 0.95387524 | 0.76953908 | 0.17194605 | 0.02701616 | 9.91046214 | 0.17194605 | 0.02701616 | 9.91046214 | 1 | TRUE | 4 |
| LightGBMLarge | 0.95311909 | 0.76553106 | 0.92347479 | 0.12258124 | 22.6849813 | 0.92347479 | 0.12258124 | 22.6849813 | 1 | TRUE | 7 |
| TextPredictor | 0.95236295 | 0.82364729 | 3.30338812 | 0.82069683 | 162.585056 | 3.30338812 | 0.82069683 | 162.585056 | 1 | TRUE | 8 |
| LightGBMXT | 0.94744802 | 0.78156313 | 0.11619425 | 0.01983976 | 7.22314429 | 0.11619425 | 0.01983976 | 7.22314429 | 1 | TRUE | 2 |
| CatBoost | 0.8415879 | 0.76553106 | 0.10149503 | 0.05197072 | 63.9656713 | 0.10149503 | 0.05197072 | 63.9656713 | 1 | TRUE | 3 |
| NeuralNetTorch | 0.32400756 | 0.28657315 | 0.0297451 | 0.01081109 | 6.67724204 | 0.0297451 | 0.01081109 | 6.67724204 | 1 | TRUE | 5 |

#### Predict single label using `TextPredictor`

- https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-multimodal-text-others.html#improve-the-performance-with-stack-ensemble
- https://auto.gluon.ai/stable/tutorials/text_prediction/customization.html

`TextPredictor` provides several simple preset configurations. Let’s take a look at the available presets.

```python
from autogluon.text.text_prediction.presets import list_text_presets

list_text_presets(verbose=True)
```

Split your data into `train` and `test`:

```python
train_data = df_sector.sample(frac=0.9, random_state=42)
test_data = df_sector.drop(train_data.index)
label = "label"
y_test = test_data[label]
X_test = test_data.drop(columns=[label])
```

And begin your model training:

```python
from autogluon.text import TextPredictor

time_limit = 1 * 60 * 60  # 1 hour, in seconds
pred_sector_textpred = TextPredictor(eval_metric="acc", label="label")
pred_sector_textpred.fit(
    train_data=train_data,
    presets="best_quality",
    time_limit=time_limit,
)
```

Once this is finished, you can evaluate against your test data:

```python
pred_sector_textpred.evaluate(test_data, metrics=["f1", "acc"])
```

#### Predict `Category` and `Subcategory` (multi-label classification)

With `AutoGluon`, we can train a separate [`TabularPredictor`](https://auto.gluon.ai/stable/api/autogluon.predictor.html#autogluon.tabular.TabularPredictor.fit) for each column we want to predict and manage the collection with a custom `MultilabelPredictor` class (the one included in this repository, similar to the example in the `AutoGluon` documentation). We then apply our `MultilabelPredictor` to predict the category and subcategory labels for each "Case".

First, we split the data into train and test sets and use the train data for training. Here is an example of how we train the model to predict categories.

```python
from MultilabelPredictor import MultilabelPredictor

labels = list(category_train_data.columns)
labels.remove('sentence')
save_path = 'models'
cat_multi_predictor = MultilabelPredictor(labels=labels, path=save_path)
cat_multi_predictor.fit(category_train_data)
```

Once the training is finished, we can see the results for the different algorithms. `LightGBM` and `WeightedEnsemble_L2` have the best performance when predicting "Category" in our case.

```
Fitting model: CatBoost ...
    0.9219 = Validation score (accuracy)
    22.61s = Training runtime
    0.01s = Validation runtime
Fitting model: ExtraTreesGini ...
    0.8586 = Validation score (accuracy)
    1.72s = Training runtime
    0.1s = Validation runtime
Fitting model: ExtraTreesEntr ...
    0.8608 = Validation score (accuracy)
    1.44s = Training runtime
    0.1s = Validation runtime
Fitting model: XGBoost ...
    0.9325 = Validation score (accuracy)
    2.72s = Training runtime
    0.01s = Validation runtime
Fitting model: NeuralNetTorch ...
    0.9219 = Validation score (accuracy)
    2.89s = Training runtime
    0.02s = Validation runtime
Fitting model: LightGBMLarge ...
    0.9198 = Validation score (accuracy)
    5.81s = Training runtime
    0.03s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
    0.9325 = Validation score (accuracy)
    0.77s = Training runtime
    0.0s = Validation runtime
```

We can also view the leaderboard that shows the performance of each algorithm when predicting a specific column.
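Since `MultilabelPredictor` simply wraps one `TabularPredictor` per label, the per-label leaderboard comes from the underlying predictor. A minimal sketch, where `'category_a'` stands in for one of your actual label columns:

```python
# inspect the TabularPredictor behind a single label (hypothetical label name)
single_predictor = cat_multi_predictor.get_predictor('category_a')
single_predictor.leaderboard(silent=True)
```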
Here is an example for “XYZ”:

| |model|score_val|pred_time_val|fit_time|pred_time_val_marginal|fit_time_marginal|stack_level|can_infer|
|:----|:----|:----|:----|:----|:----|:----|:----|:----|
|0|LightGBM|0.936975|0.04584|2.798952|0.04584|2.798952|1|TRUE|
|1|WeightedEnsemble_L2|0.936975|0.046272|3.212549|0.000432|0.413597|2|TRUE|
|2|CatBoost|0.932773|0.010337|4.887029|0.010337|4.887029|1|TRUE|
|3|XGBoost|0.930672|0.014278|2.400199|0.014278|2.400199|1|TRUE|
|4|LightGBMXT|0.930672|0.021188|2.24326|0.021188|2.24326|1|TRUE|
|5|LightGBMLarge|0.930672|0.075606|5.641281|0.075606|5.641281|1|TRUE|
|6|RandomForestGini|0.918067|0.102849|1.580466|0.102849|1.580466|1|TRUE|
|7|RandomForestEntr|0.915966|0.102992|1.551508|0.102992|1.551508|1|TRUE|
|8|KNeighborsDist|0.915966|0.122099|0.05007|0.122099|0.05007|1|TRUE|
|9|KNeighborsUnif|0.915966|0.122558|0.046638|0.122558|0.046638|1|TRUE|
|10|NeuralNetFastAI|0.913866|0.015536|2.055213|0.015536|2.055213|1|TRUE|
|11|NeuralNetTorch|0.913866|0.017212|3.79111|0.017212|3.79111|1|TRUE|
|12|ExtraTreesEntr|0.913866|0.103084|1.431155|0.103084|1.431155|1|TRUE|
|13|ExtraTreesGini|0.913866|0.104022|1.44497|0.104022|1.44497|1|TRUE|

Now we are ready to use the `MultilabelPredictor` to predict all labels in new data, in this case our test data set.

```python
predictions = cat_multi_predictor.predict(category_test_data_nolab)
print("Predictions: \n", predictions)

output_cat_df = category_test_data_nolab.copy()
cat_result = pd.concat([output_cat_df, predictions], axis=1).reindex(output_cat_df.index)
cat_result.head()
cat_result.to_csv('data/cat_output.csv', index=False)
```

We can also easily evaluate the performance of our predictions:

```python
evaluations = cat_multi_predictor.evaluate(category_test_data)
print(evaluations)
print("Evaluated using metrics:", cat_multi_predictor.eval_metrics)
```

### Model inference on `validation` data

We first read our validation file, as follows:

```python
validation_df = pd.read_excel('data/validation_set.xlsx')
val_data = validation_df.drop(columns=['Case Category', 'Gender', 'District'])
val_data.columns = ['sentence']
```

And we use the model we have just trained to detect `Sector` on that data.

```python
output = pred_sector.predict(val_data)
output.head()
```

```
0    2
1    2
2    5
3    9
4    0
Name: label, dtype: int64
```

Finally, we merge both dataframes to evaluate predictions against the ground truth.

```python
output_df = val_data.copy()
output_df['predicted'] = output
output_df.head()
```

## How to solve multi-label classification using Amazon Comprehend (no-code)

"Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases."
([source](https://docs.aws.amazon.com/comprehend/latest/dg/what-is.html))

### Custom classification

> Use custom classification to organize your documents into categories (classes) that you define. Custom classification is a two-step process. First, you train a custom classification model (also called a classifier) to recognize the classes that are of interest to you. Then you use your model to classify any number of document sets.

> For example, you can categorize the content of support requests so that you can route the request to the proper support team. Or you can categorize emails received from customers to provide guidance on the requests that customers are making. You can combine Amazon Comprehend with Amazon Transcribe to convert speech to text and then to classify the requests coming from support phone calls.

> You can have multiple custom classifiers in your account, each trained using different data. When you submit a classification job, you choose which classifier to use. Amazon Comprehend returns results based on that classifier, how it was trained, and whether it was trained using multi-class or multi-label mode. For multi-class mode, you can classify a single document synchronously (in real-time) or classify a large document or set of documents asynchronously. The multi-label mode supports asynchronous jobs only.

Read more about how to prepare your training data [here](https://docs.aws.amazon.com/comprehend/latest/dg/prep-classifier-data.html).

### Multi-label

Amazon Comprehend provides the ability to train a custom classifier directly from the AWS console or through APIs without the need for any prior ML knowledge. You can [read more here](https://docs.aws.amazon.com/comprehend/latest/dg/prep-classifier-data-multi-label.html).

> In multi-label classification, individual classes represent different categories, but these categories are somehow related and are not mutually exclusive. As a result, each document has at least one class assigned to it, but can have more. For example, a movie can simply be an action movie, or it can be an action movie, a science fiction movie, and a comedy, all at the same time.

> For training, multi-label mode supports up to 1 million examples containing up to 100 unique classes.

> You can provide training data as a CSV file or as an augmented manifest file from Amazon SageMaker Ground Truth.

Using a CSV file:

> To train a custom classifier, you can provide training data as a two-column CSV file. In it, labels are provided in the first column, and documents are provided in the second.

> Do not include headers for the individual columns. Including headers in your CSV file may cause runtime errors. Each line of the file contains one or more classes and the text of the training document. More than one class can be indicated by using a delimiter (such as a | ) between each class.

```
CLASS,Text of document 1
CLASS,Text of document 2
CLASS|CLASS|CLASS,Text of document 3
```

For example, the following line belongs to a CSV file that trains a custom classifier to detect genres in movie abstracts:

```
COMEDY|MYSTERY|SCIENCE_FICTION|TEEN,"A band of misfit teens become unlikely detectives when they discover troubling clues about their high school English teacher. Could the strange Mrs. Doe be an alien from outer space?"
```

The default delimiter between class names is the pipe (|). However, you can use a different character as a delimiter; the delimiter cannot be part of your class name. For example, if your classes are CLASS_1, CLASS_2, and CLASS_3, the underscore (_) is part of the class name, so you cannot use an underscore as the delimiter for separating class names.

## How to solve multi-label classification using SageMaker JumpStart (low-code/no-code)

"SageMaker JumpStart provides pre-trained, open-source models for a wide range of problem types to help you get started with machine learning. You can incrementally train and tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for machine learning with SageMaker." ([source](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html))

We first need to log into SageMaker Studio and go to "JumpStart". We then look for "text classification", the task we are trying to solve, and select a fine-tunable model as shown below.

![Amazon SageMaker JumpStart](images/jumpstart-01.png)

From the model card / instructions, we will see the following:

> **Fine-tune the Model on a New Dataset**

> The Text Embedding model can be fine-tuned on any text classification dataset in the same way the model available for inference has been fine-tuned on the SST2 movie review dataset.

> The model available for fine-tuning attaches a classification layer to the Text Embedding model and initializes the layer parameters to random values. The output dimension of the classification layer is determined based on the number of classes detected in the input data. The fine-tuning step fine-tunes all the model parameters to minimize prediction error on the input data and returns the fine-tuned model. The model returned by fine-tuning can be further deployed for inference. Below are the instructions for how the training data should be formatted for input to the model.

> - Input: A directory containing a 'data.csv' file.
>   - Each row of the first column of 'data.csv' should have integer class labels between 0 and the number of classes.
>   - Each row of the second column should have the corresponding text.
> - Output: A trained model that can be deployed for inference.

> Below is an example of a 'data.csv' file showing values in its first two columns. Note that the file should not have any header.

> - 0 hide new secretions from the parental units
> - 0 contains no wit , only labored gags
> - 1 that loves its characters and communicates something rather beautiful about human nature
> - ... ...

We can now scroll up to "Train Model" and select where our data is located in S3.

![Amazon SageMaker JumpStart](images/jumpstart-02.png)

Finally, we set the hyperparameters we want to use for our training and click "Train". Model training starts automatically and the model weights are saved in S3. We can then find this model and deploy a SageMaker endpoint against it for real-time inference, or run a batch transform job. If you prefer to script this flow rather than click through the console, see the sketch below.
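The sketch below mirrors the retrieval pattern used in the JumpStart example notebooks. It is illustrative only: the `model_id`, instance type, and S3 path are hypothetical placeholders, and the `image_uris`/`script_uris`/`model_uris` helpers require a reasonably recent `sagemaker` SDK.

```python
import sagemaker
from sagemaker import hyperparameters, image_uris, model_uris, script_uris
from sagemaker.estimator import Estimator

# Hypothetical JumpStart text-classification model id; look up real ids in Studio
model_id, model_version = "tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2", "*"
instance_type = "ml.p3.2xlarge"
role = sagemaker.get_execution_role()

# Retrieve the training image, the fine-tuning script, and the pre-trained model artifacts
train_image_uri = image_uris.retrieve(
    region=None, framework=None, image_scope="training",
    model_id=model_id, model_version=model_version, instance_type=instance_type,
)
train_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training")
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training")

# Fetch the default hyperparameters (these can be overridden before training)
hp = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

estimator = Estimator(
    role=role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",  # entry point shipped with the JumpStart training script
    instance_count=1,
    instance_type=instance_type,
    hyperparameters=hp,
    max_run=3600,
)
estimator.fit({"training": "s3://<your-bucket>/<folder-containing-data.csv>/"})
```

After `fit()` completes, the fine-tuned artifacts land in S3 and can be deployed with `estimator.deploy(...)` like any other SageMaker model.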

![Amazon SageMaker JumpStart](images/jumpstart-03.png)

### Continue reading

We recommend the more exhaustive explanation of how to use JumpStart for multi-label classification in the blog post ["Run text classification with Amazon SageMaker JumpStart using TensorFlow Hub and Hugging Face models"](https://aws.amazon.com/blogs/machine-learning/run-text-classification-with-amazon-sagemaker-jumpstart-using-tensorflow-hub-and-huggingface-models/). Videos are included.

## Additional reading

- [New built-in Amazon SageMaker algorithms for tabular data modeling: LightGBM, CatBoost, AutoGluon-Tabular, and TabTransformer](https://aws.amazon.com/blogs/machine-learning/new-built-in-amazon-sagemaker-algorithms-for-tabular-data-modeling-lightgbm-catboost-autogluon-tabular-and-tabtransformer/)

## Disclaimer

- The content provided in this repository is for demonstration purposes only and is not meant for production. Use your own discretion when applying it.
- The ideas and opinions outlined in these examples are my own and do not represent the opinions of AWS.
--------------------------------------------------------------------------------