├── .gitignore
├── LICENSE
├── README.md
├── analysis.ipynb
├── figures
    ├── 2D_raw_and SHAP.png
    ├── clustering.excalidraw
    ├── clustering.png
    ├── clustering_.png
    ├── feature_image.png
    ├── ppt.pptx
    ├── shap.jpg
    └── shap.svg
├── plots
    ├── 2D_SHAP.png
    ├── 2D_SHAP_clusters.png
    ├── 2D_SHAP_clusters_coloured_by_raw_values.png
    ├── 2D_SHAP_clusters_feature.png
    ├── 2D_raw.png
    ├── 2D_raw_and SHAP.png
    ├── MASV.png
    └── distributions.png
├── requirements.txt
└── style.mplstyle

/.gitignore:
--------------------------------------------------------------------------------
.excalidraw
.pptx

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Aidan Cooper

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Supervised Clustering: How to Use SHAP Values for Better Cluster Analysis

Full write up: [Supervised Clustering: How to Use SHAP Values for Better Cluster Analysis](https://www.aidancooper.co.uk/supervised-clustering-shap-values/)

[Analysis notebook](analysis.ipynb)

--------------------------------------------------------------------------------
/figures/2D_raw_and SHAP.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/figures/2D_raw_and SHAP.png
--------------------------------------------------------------------------------
/figures/clustering.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/figures/clustering.png
--------------------------------------------------------------------------------
/figures/clustering_.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/figures/clustering_.png
--------------------------------------------------------------------------------
/figures/feature_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/figures/feature_image.png
--------------------------------------------------------------------------------
/figures/ppt.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/figures/ppt.pptx
--------------------------------------------------------------------------------
/figures/shap.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/figures/shap.jpg
--------------------------------------------------------------------------------
/figures/shap.svg:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/plots/2D_SHAP.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/plots/2D_SHAP.png
--------------------------------------------------------------------------------
/plots/2D_SHAP_clusters.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/plots/2D_SHAP_clusters.png
--------------------------------------------------------------------------------
/plots/2D_SHAP_clusters_coloured_by_raw_values.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/plots/2D_SHAP_clusters_coloured_by_raw_values.png
--------------------------------------------------------------------------------
/plots/2D_SHAP_clusters_feature.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/plots/2D_SHAP_clusters_feature.png
--------------------------------------------------------------------------------
/plots/2D_raw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/plots/2D_raw.png
--------------------------------------------------------------------------------
/plots/2D_raw_and SHAP.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/plots/2D_raw_and SHAP.png
--------------------------------------------------------------------------------
/plots/MASV.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/plots/MASV.png
--------------------------------------------------------------------------------
/plots/distributions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AidanCooper/shap-clustering/be6dc8ba76588635ae2ab021cea7ee81eccdbcd2/plots/distributions.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
Package                           Version
--------------------------------- ---------
black                             21.9b0
hdbscan                           0.8.28
ipython                           7.27.0
jupyter                           1.0.0
lightgbm                          3.2.1
matplotlib                        3.4.3
numpy                             1.20.3
pandas                            1.3.3
pip                               21.2.4
scikit-learn                      1.0
seaborn                           0.11.2
shap                              0.40.0
six                               1.16.0
skope-rules                       1.0.1
umap-learn                        0.5.3
--------------------------------------------------------------------------------
/style.mplstyle:
--------------------------------------------------------------------------------
axes.prop_cycle : cycler(color=["003f5c", "d45087", "ffa600", "665191", "ff7c43", "2f4b7c", "f95d6a", "a05195"])
figure.figsize : 12, 8
axes.titlesize : 22
axes.labelsize : 18
axes.facecolor : white
axes.titleweight : 500
axes.labelweight : 500
axes.linewidth : 1.5
axes.edgecolor : 2B3A42
font.family : Noto Sans, DejaVu Sans
font.weight : 500
legend.frameon : False
legend.fontsize : 12
lines.linewidth : 2.0
text.color : 2B3A42
xtick.labelsize : 14
xtick.color : 2B3A42
ytick.labelsize : 14
ytick.color : 2B3A42
savefig.bbox : tight
savefig.dpi : 300
axes.unicode_minus : False
figure.facecolor : white
--------------------------------------------------------------------------------