├── .gitignore ├── Analysis.ipynb ├── README.md ├── benchmarks ├── k8s_benchmark_pool.sh ├── k8s_benchmark_serve.sh ├── k8s_ray_pool.py ├── k8s_serve_explanations.py ├── ray_pool.py └── serve_explanations.py ├── cluster ├── Makefile.pool ├── Makefile.serve ├── README.md ├── ray_cluster.yaml └── ray_pool_cluster.yaml ├── dockerfiles ├── Dockerfile └── Makefile ├── explainers ├── __init__.py ├── distributed.py ├── interface.py ├── kernel_shap.py ├── utils.py └── wrappers.py ├── images ├── pool_1_node.PNG ├── pool_k8s_32.PNG ├── pool_k8s_56.PNG ├── serve_1_node.PNG ├── serve_k8s_32.PNG └── serve_k8s_56.PNG ├── poetry.lock ├── pyproject.toml ├── requirements.txt ├── requirements_advanced.txt └── scripts ├── fit_adult_model.py └── process_adult_data.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # PyCharm 76 | .idea/ 77 | 78 | # pyenv 79 | .python-version 80 | 81 | # celery beat schedule file 82 | celerybeat-schedule 83 | 84 | # SageMath parsed files 85 | *.sage.py 86 | 87 | # Environments 88 | .env 89 | .venv 90 | env/ 91 | venv/ 92 | ENV/ 93 | env.bak/ 94 | venv.bak/ 95 | 96 | # Spyder project settings 97 | .spyderproject 98 | .spyproject 99 | 100 | # Rope project settings 101 | .ropeproject 102 | 103 | # mkdocs documentation 104 | /site 105 | 106 | # mypy 107 | .mypy_cache/ 108 | 109 | # Model binaries 110 | examples/*.h5 111 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Distributing KernelSHAP using `ray` 2 | 3 | This repository shows how to distribute explanations with KernelSHAP one a single node or a Kubernetes cluster using [`ray`](https://github.com/ray-project/ray). The predictions of a logistic regression model on `2560` instances from the [`Adult`](http://archive.ics.uci.edu/ml/datasets/Adult) dataset are explained using KernelSHAP configured with a background set of `100` samples from the same dataset. The data preprocessing and model fitting steps are available in the `scripts/` folder, but both the data and the model will be automatically downloaded by the benchmarking scripts. 4 | 5 | ## Distributed KernelSHAP on a single multicore node 6 | ### Setup 7 | 8 | 1. 
Install [`conda`](https://problemsolvingwithpython.com/01-Orientation/01.05-Installing-Anaconda-on-Linux/)
9 | 2. Create a virtual environment with `conda create --name shap python=3.7`
10 | 3. Activate the environment with `conda activate shap`
11 | 4. Execute `pip install .` in order to install the dependencies needed to run the benchmarking scripts
12 | 
13 | ### Running the benchmarks
14 | 
15 | Two code versions are available:
16 | 
17 | - One using a parallel pool of `ray` actors, which consume small subsets of the `2560`-instance dataset to be explained
18 | - One using `ray serve` instead of the parallel pool
19 | 
20 | The two methods can be run from the repository root using the scripts `benchmarks/ray_pool.py` and `benchmarks/serve_explanations.py`, respectively. The configurable options are:
21 | - the number of actors/replicas that the task is distributed over (e.g., `--workers 5` (pool), `--replicas 5` (ray serve))
22 | - whether a benchmark (i.e., redistributing the task over an increasingly large pool or number of replicas) is to be performed (`-benchmark 0` to disable or `-benchmark 1` to enable)
23 | - the number of times the task is run for the same configuration in benchmarking mode (e.g., `--nruns 5`)
24 | - how many instances can be sent to an actor/replica at once (a required argument; e.g., `-b 1 5 10` (pool) or `-batch 1 5 10` (ray serve)). If more than one value is passed after the argument name, the task (or benchmark) is executed once for each batch size
25 | 
26 | ## Distributed KernelSHAP on a Kubernetes cluster
27 | ### Setup
28 | 
29 | This requires access to a Kubernetes cluster and a local installation of [`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/). Don't forget to export the path to the cluster configuration `.yaml` file in your `KUBECONFIG` environment variable, as described [here](https://auth0.com/blog/kubernetes-tutorial-step-by-step-introduction-to-basic-concepts/), before moving on to the next steps.
30 | 
31 | ### Running the benchmarks
32 | 
33 | The `ray_pool.py` and `serve_explanations.py` scripts have been adapted for deployment on the Kubernetes cluster; the adapted versions are prefixed with `k8s_`. The benchmark experiments can be run via the `bash` scripts in the `benchmarks/` folder. These scripts:
34 | 
35 | - Apply the appropriate k8s manifest in `cluster/` to the k8s cluster
36 | - Upload a `k8s*.py` file to it
37 | - Run the script
38 | - Pull the results and save them in the `results` directory
39 | 
40 | Specifically:
41 | 
42 | - Calling `bash benchmarks/k8s_benchmark_pool.sh 10 20` will run the benchmark with an increasing number of workers (the cluster is reset each time the number of workers is increased). By default, the experiment is run with batch sizes of `1`, `5` and `10`; this can be changed by updating the value of `BATCH` in `cluster/Makefile.pool`
43 | - Calling `bash benchmarks/k8s_benchmark_serve.sh 10 20 ray` will run the benchmark with an increasing number of workers and batch sizes of `1`, `5` and `10` for each worker setting. The batch sizes can be modified in the `.sh` script itself. The `ray` argument means that `ray serve` batches single requests together and dispatches them to the same worker. If replaced by `default`, the input is split into minibatches which are then sent to the workers
44 | 
45 | ## Sample results
46 | ### Single node
47 | The experiments were run on a compute-optimized dedicated machine in Digital Ocean with 32 vCPUs. This explains the attenuation of the performance gains shown below.
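For reference, results of this kind can be reproduced with invocations such as the following (the parameter values shown are illustrative):

- `python benchmarks/ray_pool.py -b 1 5 10 --workers 32 -benchmark 1` for the parallel pool version
- `python benchmarks/serve_explanations.py -batch 1 5 10 --replicas 32 -benchmark 1` for the `ray serve` version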
48 | 49 | The results obtained running the task using the `ray` parallel pool are below: 50 | 51 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/pool_1_node.PNG?raw=true) 52 | 53 | Distributing using ray serve yields similar results: 54 | 55 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/serve_1_node.PNG?raw=true) 56 | ### Kubernetes cluster 57 | The experiments were run on a cluster consisting of two compute-optimized dedicated machine in Digital Ocean with 32vCPUs each. This explains why the performance gains attenuation below. 58 | 59 | The results obtained running the task using the `ray` parallel pool over a two-node cluster are shown below: 60 | 61 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/pool_k8s_32.PNG?raw=true) 62 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/pool_k8s_56.PNG?raw=true) 63 | 64 | Distributing using ray serve yields similar results: 65 | 66 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/serve_k8s_32.PNG?raw=true) 67 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/serve_k8s_56.PNG?raw=true) 68 | -------------------------------------------------------------------------------- /benchmarks/k8s_benchmark_pool.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | START=$1 3 | END=$2 4 | echo "Workers range tested: {$START..$END}" 5 | cd ./cluster || exit 6 | for i in $(seq "$START" "$END"); do 7 | echo "Distributing over a pool of size $i actors" 8 | make -f Makefile.pool deploy 9 | make -f Makefile.pool upload-script 10 | make -f Makefile.pool run-experiment WORKERS="$i" 11 | make -f Makefile.pool pull-results 12 | make -f Makefile.pool destroy 13 | done 14 | -------------------------------------------------------------------------------- /benchmarks/k8s_benchmark_serve.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | START=$1 3 | END=$2 4 | BATCH_MODE=$3 5 | BATCH_SIZE=(1 5 10) 6 | echo "Workers range tested: {$START..$END}" 7 | echo "Batch mode: $BATCH_MODE" 8 | cd ./cluster || exit 9 | for i in $(seq "$START" "$END"); do 10 | for j in "${BATCH_SIZE[@]}"; do 11 | echo "Distributing explanations over $i workers" 12 | echo "Current batch size: $j instances" 13 | make -f Makefile.serve deploy 14 | make -f Makefile.serve upload-script 15 | make -f Makefile.serve run-experiment WORKERS="$i" BATCH="$j" BATCH_MODE="$BATCH_MODE" 16 | make -f Makefile.serve pull-results 17 | make -f Makefile.serve destroy 18 | done 19 | done 20 | -------------------------------------------------------------------------------- /benchmarks/k8s_ray_pool.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import pickle 5 | import ray 6 | 7 | import numpy as np 8 | 9 | from explainers.kernel_shap import KernelShap 10 | from explainers.utils import get_filename, load_data, load_model 11 | from sklearn.metrics import accuracy_score 12 | from typing import Any, Dict 13 | from timeit import default_timer as timer 14 | 15 | 16 | logging.basicConfig(level=logging.INFO) 17 | 18 | 19 | def fit_kernel_shap_explainer(clf, data: dict, distributed_opts: Dict[str, Any] = None): 20 | """ 21 | Returns an a fitted KernelShap explainer for the classifier `clf`. 
The categorical variables are grouped according 22 | to the information specified in `data`. 23 | 24 | Parameters 25 | ---------- 26 | clf 27 | Classifier whose predictions are to be explained. 28 | data 29 | Contains the background data as well as information about the features and the columns in the feature matrix 30 | they occupy. 31 | distributed_opts 32 | Options controlling the number of worker processes that will distribute the workload. 33 | """ 34 | 35 | pred_fcn = clf.predict_proba 36 | group_names, groups = data['all']['group_names'], data['all']['groups'] 37 | explainer = KernelShap(pred_fcn, link='logit', feature_names=group_names, distributed_opts=distributed_opts, seed=0) 38 | explainer.fit(data['background']['X']['preprocessed'], group_names=group_names, groups=groups) 39 | return explainer 40 | 41 | 42 | def run_explainer(explainer, X_explain: np.ndarray, distributed_opts: dict, nruns: int, batch_size: int): 43 | """ 44 | Explain `X_explain` with `explainer` configured with `distributed_opts` `nruns` times in order to obtain 45 | runtime statistics. 46 | 47 | Parameters 48 | --------- 49 | explainer 50 | Fitted KernelShap explainer object 51 | X_explain 52 | Array containing instances to be explained, layed out row-wise. Split into minibatches that are distributed 53 | by the explainer. 54 | distributed_opts 55 | A dictionary of the form:: 56 | 57 | { 58 | 'n_cpus': int - controls the number of workers on which the instances are explained 59 | 'batch_size': int - the size of a minibatch into which the dateset is split 60 | 'actor_cpu_fraction': the fraction of CPU allocated to an actor 61 | } 62 | batch_size: 63 | The minibatch size for the current set of of `nruns` 64 | nruns 65 | Number of times `X_explain` is explained for a given workers and batch size setting. 
66 | """ 67 | 68 | if not os.path.exists('./results'): 69 | os.mkdir('./results') 70 | 71 | result = {'t_elapsed': []} 72 | workers = distributed_opts['n_cpus'] 73 | # update minibatch size 74 | explainer._explainer.batch_size = batch_size 75 | for run in range(nruns): 76 | logging.info(f"run: {run}") 77 | t_start = timer() 78 | explanation = explainer.explain(X_explain, silent=True) 79 | t_elapsed = timer() - t_start 80 | logging.info(f"Time elapsed: {t_elapsed}") 81 | result['t_elapsed'].append(t_elapsed) 82 | 83 | with open(get_filename(workers, batch_size, serve=False), 'wb') as f: 84 | pickle.dump(result, f) 85 | 86 | 87 | def main(): 88 | 89 | # initialise ray 90 | ray.init(address='auto') 91 | 92 | # experiment settings 93 | nruns = args.nruns 94 | batch_sizes = [int(elem) for elem in args.batch] 95 | 96 | # load data and instances to be explained 97 | data = load_data() 98 | predictor = load_model('assets/predictor.pkl') # download if not available locally 99 | y_test, X_test_proc = data['all']['y']['test'], data['all']['X']['processed']['test'] 100 | logging.info(f"Test accuracy: {accuracy_score(y_test, predictor.predict(X_test_proc))}") 101 | X_explain = data['all']['X']['processed']['test'].toarray() # instances to be explained 102 | 103 | distributed_opts = {'n_cpus': args.workers} 104 | explainer = fit_kernel_shap_explainer(predictor, data, distributed_opts) 105 | for batch_size in batch_sizes: 106 | logging.info(f"Running experiment using {args.workers} actors...") 107 | logging.info(f"Batch size: {batch_size}") 108 | run_explainer(explainer, X_explain, distributed_opts, nruns, batch_size) 109 | 110 | 111 | if __name__ == '__main__': 112 | parser = argparse.ArgumentParser() 113 | parser.add_argument( 114 | "-b", 115 | "--batch", 116 | nargs='+', 117 | help="A list of values representing the maximum batch size of instances sent to the same worker.", 118 | required=True, 119 | ) 120 | parser.add_argument( 121 | "-w", 122 | "--workers", 123 | default=1, 124 | type=int, 125 | help="The number of workers to distribute the explanations dataset on." 126 | ) 127 | parser.add_argument( 128 | "-n", 129 | "--nruns", 130 | default=5, 131 | type=int, 132 | 133 | help="Controls how many times an experiment is run for a given number of workers to obtain run statistics." 134 | ) 135 | args = parser.parse_args() 136 | main() 137 | -------------------------------------------------------------------------------- /benchmarks/k8s_serve_explanations.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import ray 5 | import pickle 6 | import requests 7 | import numpy as np 8 | 9 | import explainers.wrappers as wrappers 10 | 11 | from collections import namedtuple 12 | from ray import serve 13 | from timeit import default_timer as timer 14 | from typing import Any, Dict, List, Tuple 15 | from explainers.utils import get_filename, batch, load_data, load_model 16 | 17 | 18 | logging.basicConfig(level=logging.INFO) 19 | 20 | PREDICTOR_URL = 'https://storage.googleapis.com/seldon-models/alibi/distributed_kernel_shap/predictor.pkl' 21 | PREDICTOR_PATH = 'assets/predictor.pkl' 22 | """ 23 | str: The file containing the predictor. The predictor can be created by running `fit_adult_model.py` or output by 24 | calling `explainers.utils.load_model()`, which will download a default predictor if `assets/` does not contain one. 
25 | """ 26 | 27 | 28 | def endpont_setup(tag: str, backend_tag: str, route: str = "/"): 29 | """ 30 | Creates an endpoint for serving explanations. 31 | 32 | Parameters 33 | ---------- 34 | tag 35 | Endpoint tag. 36 | backend_tag 37 | A tag for the backend this explainer will connect to. 38 | route 39 | The URL where the explainer can be queried. 40 | """ 41 | serve.create_endpoint(tag, backend=backend_tag, route=route, methods=["GET"]) 42 | 43 | 44 | def backend_setup(tag: str, worker_args: Tuple, replicas: int, max_batch_size: int) -> None: 45 | """ 46 | Setups the backend for the distributed explanation task. 47 | 48 | Parameters 49 | ---------- 50 | tag 51 | A tag for the backend component. The same tag must be passed to `endpoint_setup`. 52 | worker_args 53 | A tuple containing the arguments for initialising the explainer and fitting it. 54 | replicas 55 | The number of backend replicas that serve explanations. 56 | max_batch_size 57 | Maximum number of requests to batch and send to a worker process. 58 | """ 59 | 60 | if max_batch_size == 1: 61 | config = {'num_replicas': max(replicas, 1)} 62 | serve.create_backend(tag, wrappers.KernelShapModel, *worker_args) 63 | else: 64 | config = {'num_replicas': max(replicas, 1), 'max_batch_size': max_batch_size} 65 | serve.create_backend(tag, wrappers.BatchKernelShapModel, *worker_args) 66 | serve.update_backend_config(tag, config) 67 | 68 | logging.info(f"Backends: {serve.list_backends()}") 69 | 70 | 71 | def prepare_explainer_args(data: Dict[str, Any]) -> Tuple[str, np.ndarray, dict, dict]: 72 | """ 73 | Extracts the name of the features (group_names) and the columns corresponding to each feature in the faeture matrix 74 | (group_names) from the `data` dict and defines the explainer arguments. The background data necessary to initialise 75 | the explainer is also extracted from the same dictionary. 76 | 77 | Parameters 78 | ---------- 79 | data 80 | A dictionary that contains all information necessary to initialise the explainer. 81 | 82 | Returns 83 | ------- 84 | A tuple containing the positional and keyword arguments necessary for initialising the explainers. 85 | """ 86 | 87 | groups = data['all']['groups'] 88 | group_names = data['all']['group_names'] 89 | background_data = data['background']['X']['preprocessed'] 90 | assert background_data.shape[0] == 100 91 | init_kwargs = {'link': 'logit', 'feature_names': group_names, 'seed': 0} 92 | fit_kwargs = {'groups': groups, 'group_names': group_names} 93 | predictor = load_model(PREDICTOR_URL) 94 | worker_args = (predictor, background_data, init_kwargs, fit_kwargs) 95 | 96 | return worker_args 97 | 98 | 99 | @ray.remote 100 | def distribute_request(instance: np.ndarray, url: str = "http://localhost:8000/explain") -> str: 101 | """ 102 | Task for distributing the explanations across the backend. 103 | 104 | Parameters 105 | ---------- 106 | instance 107 | Instance to be explained. 108 | url: 109 | The explainer URL. 110 | 111 | Returns 112 | ------- 113 | A str representation of the explanation output json file. 114 | """ 115 | 116 | resp = requests.get(url, json={"array": instance.tolist()}) 117 | return resp.json() 118 | 119 | 120 | def request_explanations(instances: List[np.ndarray], *, url: str) -> namedtuple: 121 | """ 122 | Sends the instances to the explainer URL. 123 | 124 | Parameters 125 | ---------- 126 | instances: 127 | Array of instances to be explained. 128 | url 129 | Explainer endpoint. 
130 | 131 | 132 | Returns 133 | ------- 134 | responses 135 | A named tuple with a `responses` field and a `t_elapsed` field. 136 | """ 137 | 138 | run_output = namedtuple('run_output', 'responses t_elapsed') 139 | tstart = timer() 140 | responses_id = [distribute_request.remote(instance, url=url) for instance in instances] 141 | responses = [ray.get(resp_id) for resp_id in responses_id] 142 | t_elapsed = timer() - tstart 143 | logging.info(f"Time elapsed: {t_elapsed}...") 144 | 145 | return run_output(responses=responses, t_elapsed=t_elapsed) 146 | 147 | 148 | def run_explainer(X_explain: np.ndarray, 149 | n_runs: int, 150 | replicas: int, 151 | max_batch_size: int, 152 | batch_mode: str = 'ray', 153 | url: str = "http://localhost:8000/explain"): 154 | """ 155 | Setup an endpoint and a backend and send requests to the endpoint. 156 | 157 | Parameters 158 | ----------- 159 | X_explain 160 | Instances to be explained. Each row is an instance that is explained independently of others. 161 | n_runs 162 | Number of times to run an experiment where the entire set of explanations is sent to the explainer endpoint. 163 | Used to determine the average runtime given the number of cores. 164 | replicas 165 | How many backend replicas should be used for distributing the workload 166 | max_batch_size 167 | The maximum batch size the explainer accepts. 168 | batch_mode : {'ray', 'default'} 169 | If 'ray', ray_serve components are leveraged for minibatches. Otherwise the input tensor is split into 170 | minibatches which are sent to the endpoint. 171 | url 172 | The url of the explainer endpoint. 173 | """ 174 | 175 | result = {'t_elapsed': []} 176 | # extract instances to be explained from the dataset 177 | assert X_explain.shape[0] == 2560 178 | 179 | # split input into separate requests 180 | if batch_mode == 'ray': 181 | instances = np.split(X_explain, X_explain.shape[0]) # use ray serve to batch the requests 182 | logging.info(f"Explaining {len(instances)} instances...") 183 | else: 184 | instances = batch(X_explain, batch_size=max_batch_size) 185 | logging.info(f"Explaining {len(instances)} mini-batches of size {max_batch_size}...") 186 | 187 | # distribute it 188 | for run in range(n_runs): 189 | logging.info(f"Experiment run: {run}...") 190 | results = request_explanations(instances, url=url) 191 | result['t_elapsed'].append(results.t_elapsed) 192 | 193 | with open(get_filename(replicas, max_batch_size, serve=True), 'wb') as f: 194 | pickle.dump(result, f) 195 | 196 | 197 | def main(): 198 | 199 | if not os.path.exists('results'): 200 | os.mkdir('results') 201 | 202 | data = load_data() 203 | X_explain = data['all']['X']['processed']['test'].toarray() 204 | 205 | max_batch_size = [int(elem) for elem in args.max_batch_size][0] 206 | batch_mode, replicas = args.batch_mode, args.replicas 207 | ray.init(address='auto') # connect to the cluster 208 | serve.init(http_host='0.0.0.0') # listen on 0.0.0.0 to make endpoint accessible from other machines 209 | host, route = os.environ.get("RAY_HEAD_SERVICE_HOST", args.host), "explain" 210 | url = f"http://{host}:{args.port}/{route}" 211 | backend_tag = "kernel_shap:b100" # b100 means 100 background samples 212 | endpoint_tag = f"{backend_tag}_endpoint" 213 | worker_args = prepare_explainer_args(data) 214 | if batch_mode == 'ray': 215 | backend_setup(backend_tag, worker_args, replicas, max_batch_size) 216 | logging.info(f"Batching with max_batch_size of {max_batch_size} ...") 217 | else: # minibatches are sent to the ray worker 218 | 
backend_setup(backend_tag, worker_args, replicas, 1) 219 | logging.info(f"Minibatches distributed of size {max_batch_size} ...") 220 | endpont_setup(endpoint_tag, backend_tag, route=f"/{route}") 221 | 222 | run_explainer(X_explain, args.n_runs, replicas, max_batch_size, batch_mode=batch_mode, url=url) 223 | 224 | 225 | if __name__ == '__main__': 226 | parser = argparse.ArgumentParser() 227 | parser.add_argument( 228 | "-r", 229 | "--replicas", 230 | default=1, 231 | type=int, 232 | help="The number of backend replicas used to serve the explainer." 233 | ) 234 | parser.add_argument( 235 | "-batch", 236 | "--max_batch_size", 237 | nargs='+', 238 | help="A list of values representing the maximum batch size of pending queries sent to the same worker." 239 | "This should only contain one element as the backend is reset from `k8s_benchmark_serve.sh`.", 240 | required=True, 241 | ) 242 | parser.add_argument( 243 | "-batch_mode", 244 | type=str, 245 | default='ray', 246 | help="If set to 'ray' the batching will be leveraging ray serve. Otherwise, the input array is split into " 247 | "minibatches that are sent to the endpoint.", 248 | required=True, 249 | ) 250 | parser.add_argument( 251 | "-n", 252 | "--n_runs", 253 | default=5, 254 | type=int, 255 | help="Controls how many times an experiment is run (in benchmark mode) for a given number of cores to obtain " 256 | "run statistics." 257 | ) 258 | parser.add_argument( 259 | "-ho", 260 | "--host", 261 | default="localhost", 262 | type=str, 263 | help="Hostname." 264 | ) 265 | parser.add_argument( 266 | "-p", 267 | "--port", 268 | default="8000", 269 | type=str, 270 | help="Port." 271 | ) 272 | args = parser.parse_args() 273 | main() 274 | -------------------------------------------------------------------------------- /benchmarks/ray_pool.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import pickle 5 | import ray 6 | 7 | import numpy as np 8 | 9 | from explainers.kernel_shap import KernelShap 10 | from explainers.utils import get_filename, load_data, load_model 11 | from sklearn.metrics import accuracy_score 12 | from typing import Any, Dict 13 | from timeit import default_timer as timer 14 | 15 | logging.basicConfig(level=logging.INFO) 16 | 17 | 18 | def fit_kernel_shap_explainer(clf, data: dict, distributed_opts: Dict[str, Any] = None): 19 | """ 20 | Returns an a fitted KernelShap explainer for the classifier `clf`. The categorical variables are grouped according 21 | to the information specified in `data`. 22 | 23 | Parameters 24 | ---------- 25 | clf 26 | Classifier whose predictions are to be explained. 27 | data 28 | Contains the background data as well as information about the features and the columns in the feature matrix 29 | they occupy. 30 | distributed_opts 31 | Options controlling the number of worker processes that will distribute the workload. 
32 | """ 33 | 34 | pred_fcn = clf.predict_proba 35 | group_names, groups = data['all']['group_names'], data['all']['groups'] 36 | explainer = KernelShap(pred_fcn, link='logit', feature_names=group_names, distributed_opts=distributed_opts, seed=0) 37 | explainer.fit(data['background']['X']['preprocessed'], group_names=group_names, groups=groups) 38 | return explainer 39 | 40 | 41 | def run_explainer(explainer, X_explain: np.ndarray, distributed_opts: dict, nruns: int): 42 | """ 43 | Explain `X_explain` with `explainer` configured with `distributed_opts` `nruns` times in order to obtain 44 | runtime statistics. 45 | 46 | Parameters 47 | --------- 48 | explainer 49 | Fitted KernelShap explainer object 50 | X_explain 51 | Array containing instances to be explained, layed out row-wise. Split into minibatches that are distributed 52 | by the explainer. 53 | distributed_opts 54 | A dictionary of the form:: 55 | 56 | { 57 | 'n_cpus': int - controls the number of workers on which the instances are explained 58 | 'batch_size': int - the size of a minibatch into which the dateset is split 59 | 'actor_cpu_fraction': the fraction of CPU allocated to an actor 60 | } 61 | nruns 62 | Number of times `X_explain` is explained for a given workers and batch size setting. 63 | """ 64 | 65 | if not os.path.exists('./results'): 66 | os.mkdir('./results') 67 | batch_size = distributed_opts['batch_size'] 68 | result = {'t_elapsed': []} 69 | workers = distributed_opts['n_cpus'] 70 | for run in range(nruns): 71 | logging.info(f"run: {run}") 72 | t_start = timer() 73 | explanation = explainer.explain(X_explain, silent=True) 74 | t_elapsed = timer() - t_start 75 | logging.info(f"Time elapsed: {t_elapsed}") 76 | result['t_elapsed'].append(t_elapsed) 77 | 78 | with open(get_filename(workers, batch_size, serve=False), 'wb') as f: 79 | pickle.dump(result, f) 80 | 81 | 82 | def main(): 83 | 84 | # experiment settings 85 | nruns = args.nruns if args.benchmark else 1 86 | batch_sizes = [int(elem) for elem in args.batch] 87 | 88 | # load data and instances to be explained 89 | data = load_data() 90 | predictor = load_model('assets/predictor.pkl') # download if not available locally 91 | y_test, X_test_proc = data['all']['y']['test'], data['all']['X']['processed']['test'] 92 | logging.info(f"Test accuracy: {accuracy_score(y_test, predictor.predict(X_test_proc))}") 93 | X_explain = data['all']['X']['processed']['test'].toarray() # instances to be explained 94 | 95 | if args.workers == -1: # sequential benchmark 96 | logging.info(f"Running sequential benchmark without ray ...") 97 | distributed_opts = {'batch_size': None, 'n_cpus': None, 'actor_cpu_fraction': 1.0} 98 | explainer = fit_kernel_shap_explainer(predictor, data, distributed_opts=distributed_opts) 99 | run_explainer(explainer, X_explain, distributed_opts, nruns) 100 | # run distributed benchmark or simply explain on a number of cores, depending on args.benchmark value 101 | else: 102 | workers_range = range(1, args.workers + 1) if args.benchmark == 1 else range(args.workers, args.workers + 1) 103 | for workers in workers_range: 104 | for batch_size in batch_sizes: 105 | logging.info(f"Running experiment using {workers} actors...") 106 | logging.info(f"Running experiment with batch size {batch_size}") 107 | distributed_opts = {'batch_size': int(batch_size), 'n_cpus': workers, 'actor_cpu_fraction': 1.0} 108 | explainer = fit_kernel_shap_explainer(predictor, data, distributed_opts) 109 | run_explainer(explainer, X_explain, distributed_opts, nruns) 110 | ray.shutdown() 111 
| 112 | 113 | if __name__ == '__main__': 114 | parser = argparse.ArgumentParser() 115 | parser.add_argument( 116 | "-b", 117 | "--batch", 118 | nargs='+', 119 | help="A list of values representing the maximum batch size of instances sent to the same worker.", 120 | required=True, 121 | ) 122 | parser.add_argument( 123 | "-w", 124 | "--workers", 125 | default=-1, 126 | type=int, 127 | help="The number of processes to distribute the explanations dataset on. Set to -1 to run sequenential (without" 128 | "ray) version." 129 | ) 130 | parser.add_argument( 131 | "-benchmark", 132 | default=0, 133 | type=int, 134 | help="Set to 1 to benchmark parallel computation. In this case, explanations are distributed over cores in " 135 | "range(1, args.workers).!" 136 | ) 137 | parser.add_argument( 138 | "-n", 139 | "--nruns", 140 | default=5, 141 | type=int, 142 | help="Controls how many times an experiment is run (in benchmark mode) for a given number of workers to obtain " 143 | "run statistics." 144 | ) 145 | args = parser.parse_args() 146 | main() 147 | -------------------------------------------------------------------------------- /benchmarks/serve_explanations.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import ray 5 | import pickle 6 | import requests 7 | 8 | import numpy as np 9 | 10 | from collections import namedtuple 11 | from ray import serve 12 | from timeit import default_timer as timer 13 | from typing import Any, Dict, Tuple 14 | from explainers.wrappers import BatchKernelShapModel, KernelShapModel 15 | from explainers.utils import get_filename, load_data, load_model 16 | 17 | logging.basicConfig(level=logging.INFO) 18 | 19 | PREDICTOR_URL = 'https://storage.googleapis.com/seldon-models/alibi/distributed_kernel_shap/predictor.pkl' 20 | PREDICTOR_PATH = 'assets/predictor.pkl' 21 | """ 22 | str: The file containing the predictor. The predictor can be created by running `fit_adult_model.py` or output by 23 | calling `utils.utils.load_model()`, which will download a default predictor if `assets/` does not contain one. 24 | """ 25 | 26 | 27 | def endpont_setup(tag: str, backend_tag: str, route: str = "/"): 28 | """ 29 | Creates an endpoint for serving explanations. 30 | Parameters 31 | ---------- 32 | tag 33 | Endpoint tag. 34 | backend_tag 35 | A tag for the backend this explainer will connect to. 36 | route 37 | The URL where the explainer can be queried. 38 | """ 39 | serve.create_endpoint(tag, backend=backend_tag, route=route, methods=["GET"]) 40 | 41 | 42 | def backend_setup(tag: str, worker_args: Tuple, replicas: int, max_batch_size: int) -> None: 43 | """ 44 | Setups the backend for the distributed explanation task. 45 | Parameters 46 | ---------- 47 | tag 48 | A tag for the backend component. The same tag must be passed to `endpoint_setup`. 49 | worker_args 50 | A tuple containing the arguments for initialising the explainer and fitting it. 51 | replicas 52 | The number of backend replicas that serve explanations. 53 | max_batch_size 54 | Maximum number of requests to batch and send to a worker process. 
55 | """ 56 | 57 | serve.init() 58 | 59 | if max_batch_size == 1: 60 | config = {'num_replicas': max(replicas, 1)} 61 | serve.create_backend(tag, KernelShapModel, *worker_args) 62 | else: 63 | config = {'num_replicas': max(replicas, 1), 'max_batch_size': max_batch_size} 64 | serve.create_backend(tag, BatchKernelShapModel, *worker_args) 65 | serve.update_backend_config(tag, config) 66 | 67 | logging.info(f"Backends: {serve.list_backends()}") 68 | 69 | 70 | def prepare_explainer_args(data: Dict[str, Any]) -> Tuple[str, np.ndarray, dict, dict]: 71 | """ 72 | Extracts the name of the features (group_names) and the columns corresponding to each feature in the faeture matrix 73 | (group_names) from the `data` dict and defines the explainer arguments. The background data necessary to initialise 74 | the explainer is also extracted from the same dictionary. 75 | Parameters 76 | ---------- 77 | data 78 | A dictionary that contains all information necessary to initialise the explainer. 79 | Returns 80 | ------- 81 | A tuple containing the positional and keyword arguments necessary for initialising the explainers. 82 | """ 83 | 84 | groups = data['all']['groups'] 85 | group_names = data['all']['group_names'] 86 | background_data = data['background']['X']['preprocessed'] 87 | # assert background_data.shape[0] == 100 88 | init_kwargs = {'link': 'logit', 'feature_names': group_names, 'seed': 0} 89 | fit_kwargs = {'groups': groups, 'group_names': group_names} 90 | predictor = load_model(PREDICTOR_URL) 91 | worker_args = (predictor, background_data, init_kwargs, fit_kwargs) 92 | 93 | return worker_args 94 | 95 | 96 | @ray.remote 97 | def distribute_request(instance: np.ndarray, url: str = "http://localhost:8000") -> str: 98 | """ 99 | Task for distributing the explanations across the backend. 100 | Parameters 101 | ---------- 102 | instance 103 | Instance to be explained. 104 | url: 105 | The explainer URL. 106 | Returns 107 | ------- 108 | A str representation of the explanation output json file. 109 | """ 110 | 111 | resp = requests.get(url, json={"array": instance.tolist()}) 112 | return resp.json() 113 | 114 | 115 | def explain(data: np.ndarray, *, url: str) -> namedtuple: 116 | """ 117 | Sends the requests to the explainer URL. The `data` array is split into sub-array containing only one instance. 118 | Parameters 119 | ---------- 120 | data: 121 | Array of instances to be explained. 122 | url 123 | Explainer endpoint. 124 | Returns 125 | ------- 126 | responses 127 | A named tuple with a `responses` field and a `t_elapsed` field. 128 | """ 129 | 130 | run_output = namedtuple('run_output', 'responses t_elapsed') 131 | instances = np.split(data, data.shape[0]) 132 | logging.info(f"Explaining {len(instances)} instances!") 133 | tstart = timer() 134 | responses_id = [distribute_request.remote(instance, url=url) for instance in instances] 135 | responses = [ray.get(resp_id) for resp_id in responses_id] 136 | t_elapsed = timer() - tstart 137 | logging.info(f"Time elapsed: {t_elapsed}") 138 | 139 | return run_output(responses=responses, t_elapsed=t_elapsed) 140 | 141 | 142 | def distribute_explanations(n_runs: int, replicas: int, max_batch_size: int, address: str = "http://localhost:8000"): 143 | """ 144 | Setup an endpoint and a backend and send requests to the endpoint. 145 | Parameters 146 | ----------- 147 | n_runs 148 | Number of times to run an experiment where the entire set of explanations is sent to the explainer endpoint. 149 | Used to determine the average runtime given the number of cores. 
150 | replicas 151 | How many backend replicas should be used for distributing the workload 152 | max_batch_size 153 | The maximum batch size the explainer accepts. 154 | address 155 | The url for the explainer endpoint. 156 | """ 157 | 158 | result = {'t_elapsed': []} 159 | route = "explain" 160 | backend_tag = "kernel_shap:b100" # b100 means 100 background samples 161 | endpoint_tag = f"{backend_tag}_endpoint" 162 | data = load_data() 163 | worker_args = prepare_explainer_args(data) 164 | backend_setup(backend_tag, worker_args, replicas, max_batch_size) 165 | endpont_setup(endpoint_tag, backend_tag, route=f"/{route}") 166 | # extract instances to be explained from the dataset 167 | X_explain = data['all']['X']['processed']['test'].toarray() 168 | assert X_explain.shape[0] == 2560 169 | for run in range(n_runs): 170 | logging.info(f"Experiment run: {run}") 171 | results = explain(X_explain, url=f"{address}/{route}") 172 | result['t_elapsed'].append(results.t_elapsed) 173 | 174 | with open(get_filename(replicas, max_batch_size, serve=True), 'wb') as f: 175 | pickle.dump(result, f) 176 | 177 | ray.shutdown() 178 | 179 | 180 | def main(): 181 | 182 | if not os.path.exists('results'): 183 | os.mkdir('results') 184 | 185 | address = f"http://{args.host}:{args.port}" 186 | batch_size_limits = [int(elem) for elem in args.max_batch_size] 187 | if args.benchmark: 188 | for replicas in range(1, args.replicas + 1): 189 | logging.info(f"Running on {replicas} backend replicas!") 190 | for max_batch_size in batch_size_limits: 191 | logging.info(f"Batching with max_batch_size of {max_batch_size}") 192 | distribute_explanations(args.nruns, replicas, max_batch_size, address=address) 193 | else: 194 | nruns = 1 195 | for max_batch_size in batch_size_limits: 196 | distribute_explanations(nruns, args.replicas, max_batch_size, address=address) 197 | 198 | 199 | if __name__ == '__main__': 200 | parser = argparse.ArgumentParser() 201 | parser.add_argument( 202 | "-r", 203 | "--replicas", 204 | default=1, 205 | type=int, 206 | help="The number of backend replicas used to serve the explainer." 207 | ) 208 | parser.add_argument( 209 | "-batch", 210 | "--max_batch_size", 211 | nargs='+', 212 | help="A list of values representing the maximum batch size of pending queries sent to the same worker.", 213 | required=True, 214 | ) 215 | parser.add_argument( 216 | "-benchmark", 217 | default=0, 218 | type=int, 219 | help="Set to 1 to benchmark parallel computation. In this case, explanations are distributed over replicas in " 220 | "range(1, args.replicas).!" 221 | ) 222 | parser.add_argument( 223 | "-n", 224 | "--nruns", 225 | default=5, 226 | type=int, 227 | help="Controls how many times an experiment is run (in benchmark mode) for a given number of cores to obtain " 228 | "run statistics." 229 | ) 230 | parser.add_argument( 231 | "-ho", 232 | "--host", 233 | default="localhost", 234 | type=str, 235 | help="Hostname." 236 | ) 237 | parser.add_argument( 238 | "-p", 239 | "--port", 240 | default="8000", 241 | type=str, 242 | help="Port." 
243 | ) 244 | args = parser.parse_args() 245 | main() 246 | -------------------------------------------------------------------------------- /cluster/Makefile.pool: -------------------------------------------------------------------------------- 1 | NAMESPACE ?= pool-ray-cluster 2 | WORKERS ?= 2 3 | BATCH ?= 1 5 10 4 | 5 | SHELL := /bin/bash 6 | 7 | 8 | .ONESHELL: 9 | 10 | deploy: 11 | kubectl apply -f ray_pool_cluster.yaml 12 | kubectl rollout status deployment/ray-head -n ${NAMESPACE} 13 | kubectl rollout status deployment/ray-worker -n ${NAMESPACE} 14 | 15 | destroy: 16 | kubectl delete -f ray_pool_cluster.yaml 17 | 18 | reset: 19 | kubectl delete -n ${NAMESPACE} pod --all 20 | kubectl rollout status deployment/ray-head -n ${NAMESPACE} 21 | kubectl rollout status deployment/ray-worker -n ${NAMESPACE} 22 | 23 | upload-script: 24 | POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"` 25 | kubectl cp -n ${NAMESPACE} ../benchmarks/k8s_ray_pool.py $${POD}:k8s_ray_pool.py 26 | 27 | run-experiment: 28 | POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"` 29 | kubectl exec -it -n ${NAMESPACE} $${POD} -- python -W ignore k8s_ray_pool.py --batch ${BATCH} --workers ${WORKERS} 30 | 31 | pull-results: 32 | POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"` 33 | kubectl cp ${NAMESPACE}/$${POD}:/distributed_explainers/results ../results/ 34 | -------------------------------------------------------------------------------- /cluster/Makefile.serve: -------------------------------------------------------------------------------- 1 | NAMESPACE ?= kernel-shap-ray-cluster 2 | WORKERS ?= 2 3 | BATCH_MODE ?= ray 4 | # do not pass a list of values here as they are ignored in the python script. Batch is changed via bash. 
5 | BATCH ?= 5
6 | 
7 | SHELL := /bin/bash
8 | 
9 | 
10 | .ONESHELL:
11 | 
12 | deploy:
13 | 	kubectl apply -f ray_cluster.yaml
14 | 	kubectl rollout status deployment/ray-head -n ${NAMESPACE}
15 | 	kubectl rollout status deployment/ray-worker -n ${NAMESPACE}
16 | 
17 | destroy:
18 | 	kubectl delete -f ray_cluster.yaml
19 | 
20 | reset:
21 | 	kubectl delete -n ${NAMESPACE} pod --all
22 | 	kubectl rollout status deployment/ray-head -n ${NAMESPACE}
23 | 	kubectl rollout status deployment/ray-worker -n ${NAMESPACE}
24 | 
25 | upload-script:
26 | 	POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"`
27 | 	kubectl cp -n ${NAMESPACE} ../benchmarks/k8s_serve_explanations.py $${POD}:k8s_serve_explanations.py
28 | 
29 | run-experiment:
30 | 	POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"`
31 | 	kubectl exec -it -n ${NAMESPACE} $${POD} -- python -W ignore k8s_serve_explanations.py -batch ${BATCH} -r ${WORKERS} -batch_mode ${BATCH_MODE}
32 | 
33 | pull-results:
34 | 	POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"`
35 | 	kubectl cp ${NAMESPACE}/$${POD}:/distributed_explainers/results ../results/
36 | 
--------------------------------------------------------------------------------
/cluster/README.md:
--------------------------------------------------------------------------------
1 | # Running distributed KernelSHAP
2 | 
3 | To create a virtual environment that allows you to run KernelSHAP in a distributed fashion with [`ray`](https://github.com/ray-project/ray), you first need to configure your environment, which requires [`conda`](https://problemsolvingwithpython.com/01-Orientation/01.05-Installing-Anaconda-on-Linux/) to be installed. You can then run the command:
4 | 
5 | `conda env create -f environment.yml -p /home/user/anaconda3/envs/env_name`
6 | 
7 | to create the environment and then activate it with `conda activate shap`. If you do not wish to change the installation path, you can skip the `-p` option. You are now ready to run the experiments. The steps involved are:
8 | 
9 | 1. data processing
10 | 2. running the experiments
11 | 
12 | To process the data it is sufficient to run `python preprocess_data.py` with the default options. This will output a preprocessed version of the [`Adult`](http://archive.ics.uci.edu/ml/datasets/Adult) dataset and a partition of it that is used to initialise the KernelSHAP explainer. However, you can proceed to step 2 if you don't intend to change the default parameters, as the same data will be downloaded automatically.
13 | 
14 | You can run an experiment with the command `python experiment.py`. By default, this will run the explainer on the `2560` examples from the `Adult` dataset with a background dataset of `100` samples, sequentially (5 times if the `-benchmark 1` option is passed to it). The results are saved in the `results/` folder. If you wish to run the same explanations in parallel, then run the command
15 | 
16 | `python experiment.py -cores 3`
17 | 
18 | which will use `ray` to perform explanations across multiple cores.
19 | 
20 | Other options for the script are:
21 | 
22 | - `-benchmark`: if set to 1, `-cores` will be treated as the upper bound on the number of cores used to compute the explanations. The lower bound is `2`, and the explanations are computed 5 times (by default) to provide runtime averages. The number of repetitions can be controlled using the `-nruns` argument.
23 | - `-batch_size`: controls how many instances are explained by a core at once. This parameter has an important bearing to the code runtime performance 24 | -------------------------------------------------------------------------------- /cluster/ray_cluster.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: kernel-shap-ray-cluster 5 | --- 6 | 7 | # Ray head node service, allowing worker pods to discover the head node. 8 | apiVersion: v1 9 | kind: Service 10 | metadata: 11 | namespace: kernel-shap-ray-cluster 12 | name: ray-head 13 | spec: 14 | ports: 15 | # Redis ports. 16 | - name: redis-primary 17 | port: 6379 18 | targetPort: 6379 19 | - name: redis-shard-0 20 | port: 6380 21 | targetPort: 6380 22 | - name: redis-shard-1 23 | port: 6381 24 | targetPort: 6381 25 | 26 | # Ray internal communication ports. 27 | - name: object-manager 28 | port: 12345 29 | targetPort: 12345 30 | - name: node-manager 31 | port: 12346 32 | targetPort: 12346 33 | - name: serve-explain 34 | port: 8000 35 | targetPort: 8000 36 | 37 | selector: 38 | component: ray-head 39 | 40 | --- 41 | 42 | apiVersion: apps/v1 43 | kind: Deployment 44 | metadata: 45 | namespace: kernel-shap-ray-cluster 46 | name: ray-head 47 | spec: 48 | # Do not change this - Ray currently only supports one head node per cluster. 49 | replicas: 1 50 | selector: 51 | matchLabels: 52 | component: ray-head 53 | type: ray 54 | template: 55 | metadata: 56 | labels: 57 | component: ray-head 58 | type: ray 59 | spec: 60 | # If the head node goes down, the entire cluster (including all worker 61 | # nodes) will go down as well. If you want Kubernetes to bring up a new 62 | # head node in this case, set this to "Always," else set it to "Never." 63 | restartPolicy: Always 64 | 65 | # This volume allocates shared memory for Ray to use for its plasma 66 | # object store. If you do not provide this, Ray will fall back to 67 | # /tmp which cause slowdowns if is not a shared memory volume. 68 | volumes: 69 | - name: dshm 70 | emptyDir: 71 | medium: Memory 72 | containers: 73 | - name: ray-head 74 | image: alexcoca/distributedkernelshap:0.6 75 | imagePullPolicy: Always 76 | command: [ "/bin/bash", "-c", "--" ] 77 | args: 78 | - "ray start --head --node-ip-address=$MY_POD_IP --redis-port=6379 --redis-shard-ports=6380,6381 --num-cpus=$MY_CPU_REQUEST --object-manager-port=12345 --node-manager-port=12346 --block" 79 | ports: 80 | - containerPort: 6379 # Redis port. 81 | - containerPort: 6380 # Redis port. 82 | - containerPort: 6381 # Redis port. 83 | - containerPort: 12345 # Ray internal communication. 84 | - containerPort: 12346 # Ray internal communication. 85 | - containerPort: 8000 86 | 87 | # This volume allocates shared memory for Ray to use for its plasma 88 | # object store. If you do not provide this, Ray will fall back to 89 | # /tmp which cause slowdowns if is not a shared memory volume. 90 | volumeMounts: 91 | - mountPath: /dev/shm 92 | name: dshm 93 | env: 94 | - name: MY_POD_IP 95 | valueFrom: 96 | fieldRef: 97 | fieldPath: status.podIP 98 | 99 | # This is used in the ray start command so that Ray can spawn the 100 | # correct number of processes. Omitting this may lead to degraded 101 | # performance. 
102 | - name: MY_CPU_REQUEST 103 | valueFrom: 104 | resourceFieldRef: 105 | resource: requests.cpu 106 | resources: 107 | requests: 108 | cpu: 1 109 | memory: 512Mi 110 | 111 | --- 112 | apiVersion: apps/v1 113 | kind: Deployment 114 | metadata: 115 | namespace: kernel-shap-ray-cluster 116 | name: ray-worker 117 | spec: 118 | # Change this to scale the number of worker nodes started in the Ray cluster. 119 | replicas: 14 120 | selector: 121 | matchLabels: 122 | component: ray-worker 123 | type: ray 124 | template: 125 | metadata: 126 | labels: 127 | component: ray-worker 128 | type: ray 129 | spec: 130 | restartPolicy: Always 131 | volumes: 132 | - name: dshm 133 | emptyDir: 134 | medium: Memory 135 | containers: 136 | - name: ray-worker 137 | image: alexcoca/distributedkernelshap:0.6 138 | imagePullPolicy: Always 139 | command: ["/bin/bash", "-c", "--"] 140 | args: 141 | - "ray start --node-ip-address=$MY_POD_IP --num-cpus=$MY_CPU_REQUEST --address=$RAY_HEAD_SERVICE_HOST:$RAY_HEAD_SERVICE_PORT_REDIS_PRIMARY --object-manager-port=12345 --node-manager-port=12346 --block" 142 | ports: 143 | - containerPort: 12345 # Ray internal communication. 144 | - containerPort: 12346 # Ray internal communication. 145 | volumeMounts: 146 | - mountPath: /dev/shm 147 | name: dshm 148 | env: 149 | - name: MY_POD_IP 150 | valueFrom: 151 | fieldRef: 152 | fieldPath: status.podIP 153 | 154 | # This is used in the ray start command so that Ray can spawn the 155 | # correct number of processes. Omitting this may lead to degraded 156 | # performance. 157 | - name: MY_CPU_REQUEST 158 | valueFrom: 159 | resourceFieldRef: 160 | resource: requests.cpu 161 | resources: 162 | requests: 163 | cpu: 4 164 | memory: 512Mi 165 | -------------------------------------------------------------------------------- /cluster/ray_pool_cluster.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: pool-ray-cluster 5 | --- 6 | 7 | # Ray head node service, allowing worker pods to discover the head node. 8 | apiVersion: v1 9 | kind: Service 10 | metadata: 11 | namespace: pool-ray-cluster 12 | name: ray-head 13 | spec: 14 | ports: 15 | # Redis ports. 16 | - name: redis-primary 17 | port: 6379 18 | targetPort: 6379 19 | - name: redis-shard-0 20 | port: 6380 21 | targetPort: 6380 22 | - name: redis-shard-1 23 | port: 6381 24 | targetPort: 6381 25 | 26 | # Ray internal communication ports. 27 | - name: object-manager 28 | port: 12345 29 | targetPort: 12345 30 | - name: node-manager 31 | port: 12346 32 | targetPort: 12346 33 | - name: serve-explain 34 | port: 8000 35 | targetPort: 8000 36 | 37 | selector: 38 | component: ray-head 39 | 40 | --- 41 | 42 | apiVersion: apps/v1 43 | kind: Deployment 44 | metadata: 45 | namespace: pool-ray-cluster 46 | name: ray-head 47 | spec: 48 | # Do not change this - Ray currently only supports one head node per cluster. 49 | replicas: 1 50 | selector: 51 | matchLabels: 52 | component: ray-head 53 | type: ray 54 | template: 55 | metadata: 56 | labels: 57 | component: ray-head 58 | type: ray 59 | spec: 60 | # If the head node goes down, the entire cluster (including all worker 61 | # nodes) will go down as well. If you want Kubernetes to bring up a new 62 | # head node in this case, set this to "Always," else set it to "Never." 63 | restartPolicy: Always 64 | 65 | # This volume allocates shared memory for Ray to use for its plasma 66 | # object store. 
If you do not provide this, Ray will fall back to 67 | # /tmp which cause slowdowns if is not a shared memory volume. 68 | volumes: 69 | - name: dshm 70 | emptyDir: 71 | medium: Memory 72 | containers: 73 | - name: ray-head 74 | image: alexcoca/distributedkernelshap:0.6 75 | imagePullPolicy: Always 76 | command: [ "/bin/bash", "-c", "--" ] 77 | args: 78 | - "ray start --head --node-ip-address=$MY_POD_IP --redis-port=6379 --redis-shard-ports=6380,6381 --num-cpus=$MY_CPU_REQUEST --object-manager-port=12345 --node-manager-port=12346 --block" 79 | ports: 80 | - containerPort: 6379 # Redis port. 81 | - containerPort: 6380 # Redis port. 82 | - containerPort: 6381 # Redis port. 83 | - containerPort: 12345 # Ray internal communication. 84 | - containerPort: 12346 # Ray internal communication. 85 | - containerPort: 8000 86 | 87 | # This volume allocates shared memory for Ray to use for its plasma 88 | # object store. If you do not provide this, Ray will fall back to 89 | # /tmp which cause slowdowns if is not a shared memory volume. 90 | volumeMounts: 91 | - mountPath: /dev/shm 92 | name: dshm 93 | env: 94 | - name: MY_POD_IP 95 | valueFrom: 96 | fieldRef: 97 | fieldPath: status.podIP 98 | 99 | # This is used in the ray start command so that Ray can spawn the 100 | # correct number of processes. Omitting this may lead to degraded 101 | # performance. 102 | - name: MY_CPU_REQUEST 103 | valueFrom: 104 | resourceFieldRef: 105 | resource: requests.cpu 106 | resources: 107 | requests: 108 | cpu: 1 109 | memory: 512Mi 110 | 111 | --- 112 | apiVersion: apps/v1 113 | kind: Deployment 114 | metadata: 115 | namespace: pool-ray-cluster 116 | name: ray-worker 117 | spec: 118 | # Change this to scale the number of worker nodes started in the Ray cluster. 119 | replicas: 14 120 | selector: 121 | matchLabels: 122 | component: ray-worker 123 | type: ray 124 | template: 125 | metadata: 126 | labels: 127 | component: ray-worker 128 | type: ray 129 | spec: 130 | restartPolicy: Always 131 | volumes: 132 | - name: dshm 133 | emptyDir: 134 | medium: Memory 135 | containers: 136 | - name: ray-worker 137 | image: alexcoca/distributedkernelshap:0.6 138 | imagePullPolicy: Always 139 | command: ["/bin/bash", "-c", "--"] 140 | args: 141 | - "ray start --node-ip-address=$MY_POD_IP --num-cpus=$MY_CPU_REQUEST --address=$RAY_HEAD_SERVICE_HOST:$RAY_HEAD_SERVICE_PORT_REDIS_PRIMARY --object-manager-port=12345 --node-manager-port=12346 --block" 142 | ports: 143 | - containerPort: 12345 # Ray internal communication. 144 | - containerPort: 12346 # Ray internal communication. 145 | volumeMounts: 146 | - mountPath: /dev/shm 147 | name: dshm 148 | env: 149 | - name: MY_POD_IP 150 | valueFrom: 151 | fieldRef: 152 | fieldPath: status.podIP 153 | 154 | # This is used in the ray start command so that Ray can spawn the 155 | # correct number of processes. Omitting this may lead to degraded 156 | # performance. 157 | - name: MY_CPU_REQUEST 158 | valueFrom: 159 | resourceFieldRef: 160 | resource: requests.cpu 161 | resources: 162 | requests: 163 | cpu: 4 164 | memory: 512Mi 165 | -------------------------------------------------------------------------------- /dockerfiles/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM rayproject/autoscaler:ray-0.8.6 2 | WORKDIR /distributed_explainers 3 | COPY pyproject.toml . 4 | COPY explainers ./explainers 5 | RUN conda install python=3.7 6 | RUN pip install . 
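# The image can be built and pushed with the Makefile in this directory, e.g.
# `make kernel-shap-image` followed by `make push-kernel-shap-image` (see below).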
7 | -------------------------------------------------------------------------------- /dockerfiles/Makefile: -------------------------------------------------------------------------------- 1 | DOCKER_REPOSITORY ?= alexcoca 2 | 3 | IMAGE_NAME ?= distributedkernelshap 4 | IMAGE_VERSION ?= 0.6 5 | 6 | kernel-shap-image: 7 | docker build ../ -f Dockerfile -t ${IMAGE_NAME}:${IMAGE_VERSION} 8 | docker tag ${IMAGE_NAME}:${IMAGE_VERSION} ${DOCKER_REPOSITORY}/${IMAGE_NAME}:${IMAGE_VERSION} 9 | 10 | push-kernel-shap-image: kernel-shap-image 11 | docker push ${DOCKER_REPOSITORY}/${IMAGE_NAME}:${IMAGE_VERSION} 12 | -------------------------------------------------------------------------------- /explainers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/explainers/__init__.py -------------------------------------------------------------------------------- /explainers/distributed.py: -------------------------------------------------------------------------------- 1 | import ray 2 | 3 | import numpy as np 4 | 5 | from functools import partial 6 | from scipy import sparse 7 | from typing import Any, Callable, Dict, List, Optional, Union 8 | from explainers.utils import batch 9 | 10 | 11 | def kernel_shap_target_fn(actor: Any, instances: tuple, kwargs: Optional[Dict] = None) -> Callable: 12 | """ 13 | A target function that is executed in parallel given an actor pool. Its arguments must be an actor and a batch of 14 | values to be processed by the actor. Its role is to execute distributed computations when an actor is available. 15 | 16 | Parameters 17 | ---------- 18 | actor 19 | A `ray` actor. This is typically a class decorated with the @ray.remote decorator, that has been subsequently 20 | instantiated using cls.remote(*args, **kwargs). 21 | instances 22 | A (batch_index, batch) tuple containing the batch of instances to be explained along with a batch index. 23 | kwargs 24 | A list of keyword arguments for the actor `shap_values` method. 25 | 26 | Returns 27 | ------- 28 | A callable that can be used as a target process for a parallel pool of actor objects. 29 | """ 30 | 31 | if kwargs is None: 32 | kwargs = {} 33 | 34 | return actor.get_explanation.remote(instances, **kwargs) 35 | 36 | 37 | def kernel_shap_postprocess_fn(ordered_result: List[Union[np.ndarray, List[np.ndarray]]]) \ 38 | -> List[Union[np.ndarray, List[np.ndarray]]]: 39 | """ 40 | Merges the results of the batched computation for KernelShap. 41 | 42 | Parameters 43 | ---------- 44 | ordered_result 45 | A list containing the results for each batch, in the order that the batch was submitted to the parallel pool. 46 | It may contain: 47 | - `np.ndarray` objects (single-output predictor) 48 | - lists of `np.ndarray` objects (multi-output predictors) 49 | 50 | Returns 51 | ------- 52 | concatenated 53 | A list containing the concatenated results for all the batches. 54 | """ 55 | if isinstance(ordered_result[0], np.ndarray): 56 | return np.concatenate(ordered_result, axis=0) 57 | 58 | # concatenate explanations for every class 59 | n_classes = len(ordered_result[0]) 60 | to_concatenate = [list(zip(*ordered_result))[idx] for idx in range(n_classes)] 61 | concatenated = [np.concatenate(arrays, axis=0) for arrays in to_concatenate] 62 | return concatenated 63 | 64 | 65 | def invert_permutation(p: list): 66 | """ 67 | Inverts a permutation. 
68 | 69 | Parameters 70 | ---------- 71 | p 72 | Some permutation of 0, 1, ..., len(p)-1. 73 | 74 | Returns 75 | ------- 76 | s 77 | `s[i]` gives the index of `i` in `p`. 78 | """ 79 | 80 | s = np.empty_like(p) 81 | s[p] = np.arange(len(p)) 82 | return s 83 | 84 | 85 | class DistributedExplainer: 86 | """ 87 | A class that orchestrates the execution of a batch of explanations in parallel. 88 | """ 89 | 90 | def __init__(self, distributed_opts, explainer_type, init_args, init_kwargs): 91 | 92 | self.n_jobs = distributed_opts['n_cpus'] 93 | self.n_actors = int(distributed_opts['n_cpus'] // distributed_opts['actor_cpu_fraction']) 94 | self.actor_cpu_frac = distributed_opts['actor_cpu_fraction'] 95 | self.batch_size = distributed_opts['batch_size'] 96 | self.algorithm = distributed_opts['algorithm'] 97 | self.target_fn = globals()[f"{distributed_opts['algorithm']}_target_fn"] 98 | try: 99 | self.post_process_fcn = globals()[f"{distributed_opts['algorithm']}_postprocess_fn"] 100 | except KeyError: 101 | self.post_process_fcn = None 102 | 103 | self.explainer = explainer_type 104 | self.explainer_args = init_args 105 | self.explainer_kwargs = init_kwargs 106 | 107 | if not ray.is_initialized(): 108 | print(f"Initialising ray on {distributed_opts['n_cpus']} cpus!") 109 | ray.init(num_cpus=distributed_opts['n_cpus']) 110 | 111 | self.pool = self.create_parallel_pool() 112 | 113 | def __getattr__(self, item): 114 | """ 115 | Access to actor attributes. Should be used to retrieve only state that is shared by all actors in the pool. 116 | """ 117 | actor = self.pool._idle_actors[0] 118 | return ray.get(actor.return_attribute.remote(item)) 119 | 120 | def create_parallel_pool(self): 121 | """ 122 | Creates a pool of actors (i.e., processes containing explainers) that can execute explanations in parallel. 123 | """ 124 | 125 | actor_handles = [ray.remote(self.explainer).options(num_cpus=self.actor_cpu_frac) for _ in range(self.n_actors)] 126 | 127 | actors = [handle.remote(*self.explainer_args, **self.explainer_kwargs) for handle in actor_handles] 128 | return ray.util.ActorPool(actors) 129 | 130 | def get_explanation(self, X: np.ndarray, **kwargs) -> np.ndarray: 131 | """ 132 | Performs distributed explanations of instances in `X`. 133 | 134 | Parameters 135 | ---------- 136 | X 137 | A batch of instances to be explained. Split into batches according to the settings passed to the constructor. 138 | kwargs 139 | Any keyword-arguments for the explainer `explain` method. 140 | 141 | Returns 142 | -------- 143 | An array of explanations. 144 | """ # noqa E501 145 | 146 | if kwargs is not None: 147 | target_fn = partial(self.target_fn, kwargs=kwargs) 148 | else: 149 | target_fn = self.target_fn 150 | batched_instances = batch(X, batch_size=self.batch_size, n_batches=self.n_jobs) 151 | 152 | unordered_explanations = self.pool.map_unordered(target_fn, enumerate(batched_instances)) 153 | 154 | return self.order_result(unordered_explanations) 155 | 156 | def order_result(self, unordered_result: List[tuple]) -> np.ndarray: 157 | """ 158 | Re-orders the result of a distributed explainer so that the explanations follow the same order as the input to 159 | the explainer. 160 | 161 | 162 | Parameters 163 | ---------- 164 | unordered_result 165 | Each tuple contains the batch id as the first entry and the explanations for that batch as the second.
166 | 167 | Returns 168 | ------- 169 | A numpy array where the the batches ordered according to their batch id are concatenated in a single array. 170 | """ 171 | 172 | # TODO: THIS DOES NOT LEVERAGE THE FACT THAT THE RESULTS ARE RETURNED AS AVAILABLE. ISSUE TO BE RAISED. 173 | 174 | result_order, results = list(zip(*[(idx, res) for idx, res in unordered_result])) 175 | orig_order = invert_permutation(list(result_order)) 176 | ordered_result = [results[idx] for idx in orig_order] 177 | if self.post_process_fcn is not None: 178 | return self.post_process_fcn(ordered_result) 179 | return ordered_result 180 | -------------------------------------------------------------------------------- /explainers/interface.py: -------------------------------------------------------------------------------- 1 | import abc 2 | import attr 3 | import copy 4 | import json 5 | import logging 6 | 7 | import numpy as np 8 | 9 | from collections import ChainMap 10 | from prettyprinter import pretty_repr 11 | from typing import Any 12 | 13 | # KernelSHAP 14 | DEFAULT_META_KERNEL_SHAP = { 15 | "name": None, 16 | "type": ["blackbox"], 17 | "task": None, 18 | "explanations": ["local", "global"], 19 | "params": {} 20 | } # type: dict 21 | """ 22 | Default KernelSHAP metadata. 23 | """ 24 | 25 | DEFAULT_DATA_KERNEL_SHAP = { 26 | "shap_values": [], 27 | "expected_value": [], 28 | "link": 'identity', 29 | "categorical_names": {}, 30 | "feature_names": [], 31 | "raw": { 32 | "raw_prediction": None, 33 | "prediction": None, 34 | "instances": None, 35 | "importances": {}, 36 | } 37 | } # type: dict 38 | """ 39 | Default KernelSHAP data. 40 | """ 41 | 42 | 43 | logger = logging.getLogger(__name__) 44 | 45 | # default metadata 46 | DEFAULT_META = { 47 | "name": None, 48 | "type": [], 49 | "explanations": [], 50 | "params": {}, 51 | } # type: dict 52 | 53 | 54 | @attr.s 55 | class Explainer(abc.ABC): 56 | """ 57 | Base class for explainer algorithms 58 | """ 59 | 60 | meta = attr.ib(default=copy.deepcopy(DEFAULT_META), repr=pretty_repr) # type: dict 61 | 62 | def __attrs_post_init__(self): 63 | # add a name to the metadata dictionary 64 | self.meta["name"] = self.__class__.__name__ 65 | 66 | # expose keys stored in self.meta as attributes of the class. 67 | for key, value in self.meta.items(): 68 | setattr(self, key, value) 69 | 70 | @abc.abstractmethod 71 | def explain(self, X: Any) -> "Explanation": 72 | pass 73 | 74 | 75 | class FitMixin(abc.ABC): 76 | @abc.abstractmethod 77 | def fit(self, X: Any) -> "Explainer": 78 | pass 79 | 80 | 81 | @attr.s 82 | class Explanation: 83 | """ 84 | Explanation class returned by explainers. 85 | """ 86 | meta = attr.ib(repr=pretty_repr) # type: dict 87 | data = attr.ib(repr=pretty_repr) # type: dict 88 | 89 | def __attrs_post_init__(self): 90 | """ 91 | Expose keys stored in self.meta and self.data as attributes of the class. 92 | """ 93 | for key, value in ChainMap(self.meta, self.data).items(): 94 | setattr(self, key, value) 95 | 96 | def to_json(self) -> str: 97 | """ 98 | Serialize the explanation data and metadata into a json format. 99 | 100 | Returns 101 | ------- 102 | String containing json representation of the explanation 103 | """ 104 | return json.dumps(attr.asdict(self), cls=NumpyEncoder) 105 | 106 | @classmethod 107 | def from_json(cls, jsonrepr) -> "Explanation": 108 | """ 109 | Create an instance of an Explanation class using a json representation of the Explanation. 
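For example, `Explanation.from_json(explanation.to_json())` round-trips an explanation; note that numpy arrays serialised via `NumpyEncoder` come back as plain lists after deserialisation.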
110 | 111 | Parameters 112 | ---------- 113 | jsonrepr 114 | json representation of an explanation 115 | 116 | Returns 117 | ------- 118 | An Explanation object 119 | """ 120 | dictrepr = json.loads(jsonrepr) 121 | try: 122 | meta = dictrepr['meta'] 123 | data = dictrepr['data'] 124 | except KeyError: 125 | logger.exception("Invalid explanation representation") 126 | return cls(meta=meta, data=data) 127 | 128 | def __getitem__(self, item): 129 | """ 130 | This method is purely for deprecating previous behaviour of accessing explanation 131 | data via items in the returned dictionary. 132 | """ 133 | import warnings 134 | msg = "The Explanation object is not a dictionary anymore and accessing elements should " \ 135 | "be done via attribute access. Accessing via item will stop working in a future version." 136 | warnings.warn(msg, DeprecationWarning, stacklevel=2) 137 | return getattr(self, item) 138 | 139 | 140 | class NumpyEncoder(json.JSONEncoder): 141 | def default(self, obj): 142 | if isinstance( 143 | obj, 144 | ( 145 | np.int_, 146 | np.intc, 147 | np.intp, 148 | np.int8, 149 | np.int16, 150 | np.int32, 151 | np.int64, 152 | np.uint8, 153 | np.uint16, 154 | np.uint32, 155 | np.uint64, 156 | ), 157 | ): 158 | return int(obj) 159 | elif isinstance(obj, (np.float_, np.float16, np.float32, np.float64)): 160 | return float(obj) 161 | elif isinstance(obj, (np.ndarray,)): 162 | return obj.tolist() 163 | return json.JSONEncoder.default(self, obj) 164 | -------------------------------------------------------------------------------- /explainers/kernel_shap.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import logging 3 | import shap 4 | import warnings 5 | 6 | import numpy as np 7 | import pandas as pd 8 | 9 | from explainers.interface import DEFAULT_META_KERNEL_SHAP, DEFAULT_DATA_KERNEL_SHAP, Explanation, Explainer, FitMixin 10 | from explainers.utils import methdispatch 11 | from explainers.distributed import DistributedExplainer 12 | from functools import partial 13 | from scipy import sparse 14 | from shap import KernelExplainer 15 | from shap.common import DenseData, DenseDataWithIndex, convert_to_link 16 | from typing import Any, Callable, Dict, List, Optional, Sequence, Union, Tuple, TYPE_CHECKING 17 | 18 | if TYPE_CHECKING: 19 | import catboost # noqa F401 20 | 21 | logger = logging.getLogger(__name__) 22 | 23 | KERNEL_SHAP_PARAMS = [ 24 | 'link', 25 | 'group_names', 26 | 'groups', 27 | 'weights', 28 | 'summarise_background', 29 | 'summarise_result', 30 | 'kwargs', 31 | ] 32 | 33 | KERNEL_SHAP_BACKGROUND_THRESHOLD = 300 34 | 35 | 36 | def rank_by_importance(shap_values: List[np.ndarray], 37 | feature_names: Union[List[str], Tuple[str], None] = None) -> Dict: 38 | """ 39 | Given the shap values estimated for a multi-output model, this function ranks 40 | features according to their importance. The feature importance is the average 41 | absolute value for a given feature. 42 | 43 | Parameters 44 | ---------- 45 | shap_values 46 | Each element corresponds to a samples x features array of shap values corresponding 47 | to each model output. 48 | feature_names 49 | Each element is the name of the column with the corresponding index in each of the 50 | arrays in the `shap_values` list. 
51 | 52 | Returns 53 | ------- 54 | importances 55 | A dictionary of the form:: 56 | 57 | { 58 | '0': {'ranked_effect': array([0.2, 0.5, ...]), 'names': ['feat_3', 'feat_5', ...]}, 59 | '1': {'ranked_effect': array([0.3, 0.2, ...]), 'names': ['feat_6', 'feat_1', ...]}, 60 | ... 61 | 'aggregated': {'ranked_effect': array([0.9, 0.7, ...]), 'names': ['feat_3', 'feat_6', ...]} 62 | } 63 | 64 | The keys of the first level represent the index of the model output. The feature effects in 65 | `ranked_effect` and the corresponding feature names in `names` are sorted from highest (most 66 | important) to lowest (least important). The values in the `aggregated` field are obtained by 67 | summing the shap values for all the model outputs and then computing the effects. Given an 68 | output, the effects are defined as the average magnitude of the shap values across the instances 69 | to be explained. 70 | """ 71 | 72 | if len(shap_values[0].shape) == 1: 73 | shap_values = [np.atleast_2d(arr) for arr in shap_values] 74 | 75 | if not feature_names: 76 | feature_names = ['feature_{}'.format(i) for i in range(shap_values[0].shape[1])] 77 | else: 78 | if len(feature_names) != shap_values[0].shape[1]: 79 | msg = "The feature names provided do not match the number of shap values estimated. " \ 80 | "Received {} feature names but estimated {} shap values!" 81 | logger.warning(msg.format(len(feature_names), shap_values[0].shape[1])) 82 | feature_names = ['feature_{}'.format(i) for i in range(shap_values[0].shape[1])] 83 | 84 | importances = {} # type: Dict[str, Dict[str, np.ndarray]] 85 | avg_mag = [] # type: List 86 | 87 | # rank the features by average shap value for each class in turn 88 | for class_idx in range(len(shap_values)): 89 | avg_mag_shap = np.abs(shap_values[class_idx]).mean(axis=0) 90 | avg_mag.append(avg_mag_shap) 91 | feature_order = np.argsort(avg_mag_shap)[::-1] 92 | most_important = avg_mag_shap[feature_order] 93 | most_important_names = [feature_names[i] for i in feature_order] 94 | importances[str(class_idx)] = { 95 | 'ranked_effect': most_important, 96 | 'names': most_important_names, 97 | } 98 | 99 | # rank feature by average shap value for aggregated classes 100 | combined_shap = np.sum(avg_mag, axis=0) 101 | feature_order = np.argsort(combined_shap)[::-1] 102 | most_important_c = combined_shap[feature_order] 103 | most_important_c_names = [feature_names[i] for i in feature_order] 104 | importances['aggregated'] = { 105 | 'ranked_effect': most_important_c, 106 | 'names': most_important_c_names 107 | } 108 | 109 | return importances 110 | 111 | 112 | def sum_categories(values: np.ndarray, start_idx: Sequence[int], enc_feat_dim: Sequence[int]): 113 | """ 114 | This function is used to reduce specified slices in a two- or three- dimensional tensor. 115 | 116 | For two-dimensional `values` arrays, for each entry in start_idx, the function sums the 117 | following k columns where k is the corresponding entry in the enc_feat_dim sequence. 118 | The columns whose indices are not in start_idx are left unchanged. This arises when the slices 119 | contain the shap values for each dimension of an encoded categorical variable and a single shap 120 | value for each variable is desired. 121 | 122 | For three-dimensional `values` arrays, the reduction is applied for each rank 2 subtensor, first along 123 | the column dimension and then across the row dimension. This arises when summarising shap interaction values. 
124 | Each rank 2 tensor is an E x E matrix of shap interaction values, where E is the dimension of the data after 125 | one-hot encoding. The result of applying the reduction yields a rank 2 tensor of dimension F x F, where F is the 126 | number of features (i.e., the feature dimension of the data matrix before encoding). By applying this transformation, 127 | a single value describing the interaction of categorical features i and j and a single value describing the 128 | interaction of j and i is returned. 129 | 130 | Parameters 131 | ---------- 132 | values 133 | A two or three dimensional array to be reduced, as described above. 134 | start_idx 135 | The start indices of the columns to be summed. 136 | enc_feat_dim 137 | The number of columns to be summed, one for each start index. 138 | Returns 139 | ------- 140 | new_values 141 | An array whose columns have been summed according to the entries in `start_idx` and `enc_feat_dim`. 142 | """ 143 | 144 | if start_idx is None or enc_feat_dim is None: 145 | raise ValueError("Both the start indices and the encoding dimensions need to be specified!") 146 | 147 | if not len(enc_feat_dim) == len(start_idx): 148 | raise ValueError("The lengths of the sequences of start indices and encodings must be equal!") 149 | 150 | n_encoded_levels = sum(enc_feat_dim) 151 | if n_encoded_levels > values.shape[-1]: 152 | raise ValueError("The sum of the encoded features dimensions exceeds data dimension!") 153 | 154 | if len(values.shape) not in (2, 3): 155 | raise ValueError( 156 | f"Shap value summarisation can only be applied to tensors of shap values (dim=2) or shap " 157 | f"interaction values (dim=3). The tensor to be summarised had dimension {values.shape}!" 158 | ) 159 | 160 | def _get_slices(start: Sequence[int], dim: Sequence[int], arr_trailing_dim: int) -> List[int]: 161 | """ 162 | Given start indices, encoding dimensions and the array trailing shape, this function returns 163 | a list of indices whose consecutive entries delimit slices. This list is used to reduce along an axis 164 | only the slices `slice(start[i], start[i] + dim[i], 1)` from a tensor and leave all other slices 165 | unchanged.
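For example (with illustrative indices), `_get_slices([2, 7], [4, 3], 12)` returns `[0, 1, 2, 6, 7, 10, 11]`, so the subsequent `np.add.reduceat` call sums columns 2-5 and 7-9 of a 12-column array while leaving columns 0, 1, 6, 10 and 11 untouched.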
166 | """ 167 | 168 | slices = [] # type: List[int] 169 | # first columns may not be reduced 170 | if start[0] > 0: 171 | slices.extend(tuple(range(start[0]))) 172 | 173 | # add all slices to reduce 174 | slices.extend([start[0], start[0] + dim[0]]) 175 | for s_idx, d in zip(start[1:], dim[1:]): 176 | last_idx = slices[-1] 177 | # some columns might not be reduced 178 | if last_idx < s_idx - 1: 179 | slices.extend(tuple(range(last_idx + 1, s_idx))) 180 | last_idx += (s_idx - last_idx - 2) 181 | # handle contiguous slices 182 | if s_idx == last_idx: 183 | slices.append(s_idx + d) 184 | else: 185 | slices.extend((s_idx, s_idx + d)) 186 | 187 | # avoid index error 188 | if start[-1] + dim[-1] == arr_trailing_dim: 189 | slices.pop() 190 | return slices 191 | 192 | # last few columns may not be reduced 193 | last_idx = slices[-1] 194 | if last_idx < arr_trailing_dim: 195 | slices.extend(tuple(range(last_idx + 1, arr_trailing_dim))) 196 | 197 | return slices 198 | 199 | def _reduction(arr, axis, indices=None): 200 | return np.add.reduceat(arr, indices, axis) 201 | 202 | # create array of slices to be reduced 203 | slices = _get_slices(start_idx, enc_feat_dim, values.shape[-1]) 204 | if len(values.shape) == 3: 205 | reduction = partial(_reduction, indices=slices) 206 | return np.apply_over_axes(reduction, values, axes=(2, 1)) 207 | return np.add.reduceat(values, slices, axis=1) 208 | 209 | 210 | DISTRIBUTED_OPTS = { 211 | 'n_cpus': None, 212 | 'batch_size': None, 213 | 'actor_cpu_fraction': 1.0 214 | } 215 | 216 | 217 | class KernelExplainerWrapper(KernelExplainer): 218 | """ 219 | A wrapper around `shap.KernelExplainer` that supports: 220 | 221 | - fixing the seed when instantiating the KernelExplainer in a separate process 222 | - passing a batch index to the explainer so that a parallel explainer pool can return batches in arbitrary order 223 | """ 224 | 225 | def __init__(self, *args, **kwargs): 226 | if 'seed' in kwargs: 227 | seed = kwargs.pop('seed') 228 | np.random.seed(seed) 229 | super().__init__(*args, **kwargs) 230 | 231 | def get_explanation(self, X: Union[Tuple[int, np.ndarray], np.ndarray], **kwargs) -> Tuple[int, np.ndarray]: 232 | """ 233 | Wrapper around `shap.KernelExplainer.shap_values` that allows calling the method with a tuple containing a 234 | batch index and a batch of instances. 235 | 236 | Parameters 237 | ---------- 238 | X 239 | When called from a distributed context, it is a tuple containing a batch index and a batch to be explained. 240 | Otherwise, it is an array of instances to be explained. 241 | kwargs 242 | `shap.KernelExplainer` kwarg values. 243 | """ 244 | 245 | # handle call from distributed context 246 | with warnings.catch_warnings(): 247 | warnings.simplefilter("ignore") 248 | if isinstance(X, tuple): 249 | batch_idx, batch = X 250 | shap_values = super().shap_values(batch, **kwargs) 251 | return batch_idx, shap_values 252 | else: 253 | shap_values = super().shap_values(X, **kwargs) 254 | return shap_values 255 | 256 | def return_attribute(self, name: str) -> Any: 257 | """ 258 | Returns an attribute specified by its name. Used in a distributed context where the actor properties cannot be 259 | accessed using the dot syntax. 
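For example, `ray.get(actor.return_attribute.remote('expected_value'))` retrieves the fitted explainer's `expected_value` from a remote actor; `DistributedExplainer.__getattr__` uses exactly this pattern to expose state shared by the actors in the pool.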
260 | """ 261 | return self.__getattribute__(name) 262 | 263 | 264 | class KernelShap(Explainer, FitMixin): 265 | 266 | def __init__(self, 267 | predictor: Callable, 268 | link: str = 'identity', 269 | feature_names: Union[List[str], Tuple[str], None] = None, 270 | categorical_names: Optional[Dict[int, List[str]]] = None, 271 | task: str = 'classification', 272 | seed: int = None, 273 | distributed_opts: Optional[Dict] = None): 274 | """ 275 | A wrapper around the `shap.KernelExplainer` class. It extends the current `shap` library functionality 276 | by allowing the user to specify variable groups in order to treat one-hot encoded categorical as one during 277 | sampling. The user can also specify whether to aggregate the `shap` values estimate for the encoded levels 278 | of categorical variables as an optional argument to `explain`, if grouping arguments are not passed to `fit`. 279 | 280 | Parameters 281 | ---------- 282 | predictor 283 | A callable that takes as an input a samples x features array and outputs a samples x n_outputs 284 | model outputs. The n_outputs should represent model output in margin space. If the model outputs 285 | probabilities, then the link should be set to 'logit' to ensure correct force plots. 286 | link 287 | Valid values are `'identity'` or `'logit'`. A generalized linear model link to connect the feature 288 | importance values to the model output. Since the feature importance values, :math:`\phi`, sum up to the 289 | model output, it often makes sense to connect them to the ouput with a link function where 290 | :math:`link(output - expected\_value) = sum(\phi)`. Therefore, for a model which outputs probabilities, 291 | `link='logit'` makes the feature effects have log-odds (evidence) units and `link='identity'` means that the 292 | feature effects have probability units. Please see this `example`_ for an in-depth discussion about the 293 | semantics of explaining the model in the probability or margin space. 294 | 295 | .. _example: 296 | https://github.com/slundberg/shap/blob/master/notebooks/kernel_explainer/Squashing%20Effect.ipynb 297 | 298 | feature_names 299 | Used to infer group names when categorical data is treated by grouping and `group_names` input to `fit` 300 | is not specified, assuming it has the same length as the `groups` argument of `fit` method. It is also used 301 | to compute the `names` field, which appears as a key in each of the values of 302 | `explanation.data['raw']['importances']`. 303 | categorical_names 304 | Keys are feature column indices in the `background_data` matrix (see `fit`). Each value contains strings 305 | with the names of the categories for the feature. Used to select the method for background data 306 | summarisation (if specified, subsampling is performed as opposed to k-means clustering). In the future it 307 | may be used for visualisation. 308 | task 309 | Can have values `'classification'` and `'regression'`. It is only used to set the contents of 310 | `explanation.data['raw']['prediction']` 311 | seed 312 | Fixes the random number stream, which influences which subsets are sampled during shap value estimation. 313 | distributed_opts 314 | A dictionary with the following structure:: 315 | 316 | { 317 | 'n_cpus': None, 318 | 'batch_size': None, 319 | } 320 | 321 | The entries represent: 322 | - `n_cpus`: an ``int`` representing the number of CPUs on which the input `X` to explain will be \ 323 | explained. If set to `None`, the code will run sequentially. 
324 | - `batch_size`: and ``int`` representing how many instances should be explained on every CPU. If set to \ 325 | `None`, an input array is split in (roughly) equal parts and distributed across the available CPUs. 326 | 327 | The distributed explanation works only the `ray`_ library is installed. 328 | 329 | .. _ray: 330 | https://docs.ray.io/en/master/ 331 | 332 | Raises 333 | ------ 334 | ModuleNotFoundError 335 | If the `ray` library is not installed and `n_cpus` is set in `distributed_opts`. 336 | 337 | """ # noqa W605 338 | 339 | super().__init__(meta=copy.deepcopy(DEFAULT_META_KERNEL_SHAP)) 340 | 341 | self.link = link 342 | self.predictor = predictor 343 | self.feature_names = feature_names if feature_names else [] 344 | self.categorical_names = categorical_names if categorical_names else {} 345 | self.task = task 346 | self.seed = seed 347 | self._update_metadata({"task": self.task}) 348 | 349 | # if the user specifies groups but no names, the groups are automatically named 350 | self.use_groups = False 351 | # changes if feature groups indices are passed but not names 352 | self.create_group_names = False 353 | # if sum of groups entries matches first dimension as opposed to second, warn user 354 | self.transposed = False 355 | # if weights are not correctly specified, they are ignored 356 | self.ignore_weights = False 357 | # sums up shap values for each level of categorical var 358 | self.summarise_result = False 359 | # selects a subset of the background data to avoid excessively slow runtimes 360 | self.summarise_background = False 361 | # checks if it has been fitted: 362 | self._fitted = False 363 | self.distributed_opts = copy.deepcopy(DISTRIBUTED_OPTS) 364 | if distributed_opts: 365 | self.distributed_opts.update(distributed_opts) 366 | self.distributed_opts['algorithm'] = 'kernel_shap' 367 | self.distribute = True if self.distributed_opts['n_cpus'] else False 368 | 369 | def _check_inputs(self, 370 | background_data: Union[shap.common.Data, pd.DataFrame, np.ndarray, sparse.spmatrix], 371 | group_names: Union[Tuple, List, None], 372 | groups: Optional[List[Union[Tuple[int], List[int]]]], 373 | weights: Union[Union[List[float], Tuple[float]], np.ndarray, None]) -> None: 374 | """ 375 | If user specifies parameter grouping, then we check input is correct or inform 376 | them if the settings they put might not behave as expected. 377 | """ 378 | 379 | if isinstance(background_data, shap.common.Data): 380 | # don't provide checks for situations where the user passes 381 | # the data object directly 382 | if not self.summarise_background: 383 | self.use_groups = False 384 | return 385 | # if summarisation took place, we do the checks to ensure everything is correct 386 | else: 387 | background_data = background_data.data 388 | 389 | if isinstance(background_data, np.ndarray) and background_data.ndim == 1: 390 | background_data = np.atleast_2d(background_data) 391 | 392 | if background_data.shape[0] > KERNEL_SHAP_BACKGROUND_THRESHOLD: 393 | msg = "Large datasets can cause slow runtimes for shap. The background dataset " \ 394 | "provided has {} records. Consider passing a subset or allowing the algorithm " \ 395 | "to automatically summarize the data by setting the summarise_background=True or" \ 396 | "setting summarise_background to 'auto' which will default to {} samples!" 
397 | logger.warning(msg.format(background_data.shape[0], KERNEL_SHAP_BACKGROUND_THRESHOLD)) 398 | 399 | if group_names and not groups: 400 | logger.info( 401 | "Specified group_names but no corresponding sequence 'groups' with indices " 402 | "for each group was specified. All groups will have len=1." 403 | ) 404 | if not len(group_names) in background_data.shape: 405 | msg = "Specified {} group names but data dimension is {}. When grouping " \ 406 | "indices are not specifies the number of group names should equal " \ 407 | "one of the data dimensions! Igoring grouping inputs!" 408 | logger.warning(msg.format(len(group_names), background_data.shape)) 409 | self.use_groups = False 410 | 411 | if groups and not group_names: 412 | logger.warning( 413 | "No group names specified but groups specified! Automatically " 414 | "assigning 'group_' name for every index group specified!") 415 | if self.feature_names: 416 | n_groups = len(groups) 417 | n_features = len(self.feature_names) 418 | if n_features != n_groups: 419 | msg = "Number of feature names specified did not match the number of groups." \ 420 | "Specified {} groups and {} features names. Creating default names for " \ 421 | "specified groups" 422 | logger.warning(msg.format(n_groups, n_features)) 423 | self.create_group_names = True 424 | else: 425 | group_names = self.feature_names 426 | else: 427 | self.create_group_names = True 428 | 429 | if groups: 430 | if not (isinstance(groups[0], tuple) or isinstance(groups[0], list)): 431 | msg = "groups should be specified as List[Union[Tuple[int], List[int]]] where each " \ 432 | "sublist represents a group and int represent group instance. Specified group " \ 433 | "elements have type {}. Ignoring grouping inputs!" 434 | logger.warning(msg.format(type(groups[0]))) 435 | self.use_groups = False 436 | 437 | expected_dim = sum(len(g) for g in groups) 438 | if background_data.ndim == 1: 439 | actual_dim = background_data.shape[0] 440 | else: 441 | actual_dim = background_data.shape[1] 442 | if expected_dim != actual_dim: 443 | if background_data.shape[0] == expected_dim: 444 | logger.warning( 445 | "The sum of the group indices list did not match the " 446 | "data dimension along axis=1 but matched dimension " 447 | "along axis=0. Consider transposing the data!" 448 | ) 449 | self.transposed = True 450 | else: 451 | msg = "The sum of the group sizes specified did not match the number of features. " \ 452 | "Sum of group sizes: {}. Number of features: {}. Ignoring grouping inputs!" 453 | logger.warning(msg.format(expected_dim, actual_dim)) 454 | self.use_groups = False 455 | 456 | if group_names: 457 | n_groups = len(groups) 458 | n_group_names = len(group_names) 459 | if n_group_names != n_groups: 460 | msg = "The number of group names specified does not match the number of groups. " \ 461 | "Received {} groups and {} names! Ignoring grouping inputs!" 462 | logger.warning(msg.format(n_groups, n_group_names)) 463 | self.use_groups = False 464 | 465 | if weights is not None: 466 | if background_data.ndim == 1 or background_data.shape[0] == 1: 467 | logger.warning( 468 | "Specified weights but the background data has only one record. " 469 | "Weights will be ignored!" 
470 | ) 471 | self.ignore_weights = True 472 | else: 473 | data_dim = background_data.shape[0] 474 | feat_dim = background_data.shape[1] 475 | weights_dim = len(weights) 476 | if data_dim != weights_dim: 477 | if not (feat_dim == weights_dim and self.transposed): 478 | msg = "The number of weights specified did not match data dimension. " \ 479 | "Number of weights: {}. Number of datapoints: {}. Weights will " \ 480 | "be ignored!" 481 | logger.warning(msg.format(weights_dim, data_dim)) 482 | self.ignore_weights = True 483 | 484 | # NB: we have already summarised the data at this point 485 | if self.summarise_background: 486 | 487 | weights_dim = len(weights) 488 | if background_data.ndim == 1: 489 | n_background_samples = 1 490 | else: 491 | if not self.transposed: 492 | n_background_samples = background_data.shape[0] 493 | else: 494 | n_background_samples = background_data.shape[1] 495 | 496 | if weights_dim != n_background_samples: 497 | msg = "The number of weights vector provided ({}) did not match the number of " \ 498 | "summary data points ({}). The weights provided will be ignored!" 499 | logger.warning(msg.format(weights_dim, n_background_samples)) 500 | 501 | self.ignore_weights = True 502 | 503 | def _summarise_background(self, 504 | background_data: Union[shap.common.Data, pd.DataFrame, np.ndarray, sparse.spmatrix], 505 | n_background_samples: int) -> \ 506 | Union[shap.common.Data, pd.DataFrame, np.ndarray, sparse.spmatrix]: 507 | """ 508 | Summarises the background data to n_background_samples in order to reduce the computational cost. If the 509 | background data is a `shap.common.Data object`, no summarisation is performed. 510 | 511 | Returns 512 | ------- 513 | If the user has specified grouping, then the input object is subsampled and an object of the same 514 | type is returned. Otherwise, a `shap.common.Data` object containing the result of a k-means algorithm 515 | is wrapped in a `shap.common.DenseData` object and returned. The samples are weighted according to the 516 | frequency of the occurrence of the clusters in the original data. 517 | """ 518 | 519 | if isinstance(background_data, shap.common.Data): 520 | msg = "Received option to summarise the data but the background_data object " \ 521 | "was an instance of shap.common.Data. No summarisation will take place!" 522 | logger.warning(msg) 523 | return background_data 524 | 525 | if background_data.ndim == 1: 526 | msg = "Received option to summarise the data but the background_data object only had " \ 527 | "one record with {} features. No summarisation will take place!" 528 | logger.warning(msg.format(len(background_data))) 529 | return background_data 530 | 531 | self.summarise_background = True 532 | 533 | # if the input is sparse, we assume there are categorical variables and use random sampling, not kmeans 534 | if self.use_groups or self.categorical_names or isinstance(background_data, sparse.spmatrix): 535 | return shap.sample(background_data, nsamples=n_background_samples) 536 | else: 537 | logger.info( 538 | "When summarising with kmeans, the samples are weighted in proportion to their " 539 | "cluster occurrence frequency. Please specify a different weighting of the samples " 540 | "through the by passing a weights of len=n_background_samples to the constructor!" 
541 | ) 542 | return shap.kmeans(background_data, n_background_samples) 543 | 544 | @methdispatch 545 | def _get_data(self, 546 | background_data: Union[shap.common.Data, pd.DataFrame, np.ndarray, sparse.spmatrix], 547 | group_names: Sequence, 548 | groups: List[Sequence[int]], 549 | weights: Sequence[Union[float, int]], 550 | **kwargs): 551 | """ 552 | Groups the data if grouping options are specified, returning a shap.common.Data object in this 553 | case. Otherwise, the original data is returned and handled internally by the shap library. 554 | """ 555 | 556 | raise TypeError("Type {} is not supported for background data!".format(type(background_data))) 557 | 558 | @_get_data.register(shap.common.Data) 559 | def _(self, background_data, *args, **kwargs) -> shap.common.Data: 560 | """ 561 | Initialises background data if the user passes a `shap.common.Data` object as input. 562 | 563 | Notes 564 | _____ 565 | 566 | If `self.summarise_background = True`, then a `shap.common.Data` object is 567 | returned if the user passed a `shap.common.Data` object to `fit` or didn't specify groups. 568 | """ 569 | 570 | group_names, groups, weights = args 571 | if weights is not None and self.summarise_background: 572 | if not self.ignore_weights: 573 | background_data.weights = weights 574 | if self.use_groups: 575 | background_data.groups = groups 576 | background_data.group_names = group_names 577 | background_data.group_size = len(groups) 578 | 579 | return background_data 580 | 581 | @_get_data.register(np.ndarray) # type: ignore 582 | def _(self, background_data, *args, **kwargs) -> Union[np.ndarray, shap.common.Data]: 583 | """ 584 | Initialises background data if the user passes an `np.ndarray` object as input. 585 | If the user specifies feature grouping then a `shap.common.DenseData` object 586 | is returned. Weights are handled separately to avoid triggering assertion 587 | correct inside `shap` library. Otherwise, the original data is returned and 588 | is handled by the `shap` library internally. 589 | """ 590 | 591 | group_names, groups, weights = args 592 | new_args = (group_names, groups, weights) if weights is not None else (group_names, groups) 593 | if self.use_groups: 594 | return DenseData(background_data, *new_args) 595 | else: 596 | return background_data 597 | 598 | @_get_data.register(sparse.spmatrix) # type: ignore 599 | def _(self, background_data, *args, **kwargs) -> Union[shap.common.Data, sparse.spmatrix]: 600 | """ 601 | Initialises background data if the user passes a sparse matrix as input. If the 602 | user specifies feature grouping, then the sparse array is converted to a dense 603 | array. Otherwise, the original array is returned and handled internally by `shap` 604 | library. 605 | """ 606 | 607 | group_names, groups, weights = args 608 | new_args = (group_names, groups, weights) if weights is not None else (group_names, groups) 609 | 610 | if self.use_groups: 611 | logger.warning( 612 | "Grouping is not currently compatible with sparse matrix inputs. " 613 | "Converting background data sparse array to dense matrix." 614 | ) 615 | background_data = background_data.toarray() 616 | return DenseData( 617 | background_data, 618 | *new_args, 619 | ) 620 | 621 | return background_data 622 | 623 | @_get_data.register(pd.core.frame.DataFrame) # type: ignore 624 | def _(self, background_data, *args, **kwargs) -> Union[shap.common.Data, pd.core.frame.DataFrame]: 625 | """ 626 | Initialises background data if the user passes a `pandas.core.frame.DataFrame` as input. 
627 | If the user has specified groups and given a data frame, it initialises a `shap.common.DenseData` 628 | object explicitly as this is not handled by `shap` library internally. Otherwise, data initialisation, 629 | is left to the `shap` library. 630 | """ 631 | 632 | _, groups, weights = args 633 | new_args = (groups, weights) if weights is not None else (groups,) 634 | if self.use_groups: 635 | logger.info("Group names are specified by column headers, group_names will be ignored!") 636 | keep_index = kwargs.get("keep_index", False) 637 | if keep_index: 638 | return DenseDataWithIndex( 639 | background_data.values, 640 | list(background_data.columns), 641 | background_data.index.values, 642 | background_data.index.name, 643 | *new_args, 644 | ) 645 | else: 646 | return DenseData( 647 | background_data.values, 648 | list(background_data.columns), 649 | *new_args, 650 | ) 651 | else: 652 | return background_data 653 | 654 | @_get_data.register(pd.core.frame.Series) # type: ignore 655 | def _(self, background_data, *args, **kwargs) -> Union[shap.common.Data, pd.core.frame.Series]: 656 | """ 657 | Initialises background data if the user passes a `pandas.Series` object as input. 658 | Original object is returned as this is initialised internally by `shap` is there 659 | is no group structure specified. Otherwise, a `shap.common.DenseData` object 660 | is initialised. 661 | """ 662 | 663 | _, groups, _ = args 664 | if self.use_groups: 665 | return DenseData( 666 | background_data.values.reshape(1, len(background_data)), 667 | list(background_data.index), 668 | groups, 669 | ) 670 | 671 | return background_data 672 | 673 | def _update_metadata(self, data_dict: dict, params: bool = False) -> None: 674 | """ 675 | This function updates the metadata of the explainer using the data from 676 | the `data_dict`. If the params option is specified, then each key-value 677 | pair is added to the metadata `'params'` dictionary only if the key is 678 | included in `KERNEL_SHAP_PARAMS`. 679 | 680 | Parameters 681 | ---------- 682 | data_dict 683 | Dictionary containing the data to be stored in the metadata. 684 | params 685 | If True, the method updates the `'params'` attribute of the metatadata. 686 | """ 687 | 688 | if params: 689 | for key in data_dict.keys(): 690 | if key not in KERNEL_SHAP_PARAMS: 691 | continue 692 | else: 693 | self.meta['params'].update([(key, data_dict[key])]) 694 | else: 695 | self.meta.update(data_dict) 696 | 697 | def fit(self, # type: ignore 698 | background_data: Union[np.ndarray, sparse.spmatrix, pd.DataFrame, shap.common.Data], 699 | summarise_background: Union[bool, str] = False, 700 | n_background_samples: int = KERNEL_SHAP_BACKGROUND_THRESHOLD, 701 | group_names: Union[Tuple[str], List[str], None] = None, 702 | groups: Optional[List[Union[Tuple[int], List[int]]]] = None, 703 | weights: Union[Union[List[float], Tuple[float]], np.ndarray, None] = None, 704 | **kwargs) -> "KernelShap": 705 | """ 706 | This takes a background dataset (usually a subsample of the training set) as an input along with several 707 | user specified options and initialises a `KernelShap` explainer. The runtime of the algorithm depends on the 708 | number of samples in this dataset and on the number of features in the dataset. To reduce the size of the 709 | dataset, the `summarise_background` option and `n_background_samples` should be used. 
To reduce the feature 710 | dimensionality, encoded categorical variables can be treated as one during the feature perturbation process; 711 | this decreases the effective feature dimensionality, can reduce the variance of the shap values estimation and 712 | reduces slightly the number of calls to the predictor. Further runtime savings can be achieved by changing the 713 | `nsamples` parameter in the call to explain. Runtime reduction comes with an accuracy trade-off, so it is better 714 | to experiment with a runtime reduction method and understand results stability before using the system. 715 | 716 | Parameters 717 | ----------- 718 | background_data 719 | Data used to estimate feature contributions and baseline values for force plots. The rows of the 720 | background data should represent samples and the columns features. 721 | summarise_background 722 | A large background dataset impacts the runtime and memory footprint of the algorithm. By setting 723 | this argument to `True`, only `n_background_samples` from the provided data are selected. If group_names or 724 | groups arguments are specified, the algorithm assumes that the data contains categorical variables so 725 | the records are selected uniformly at random. Otherwise, `shap.kmeans` (a wrapper around `sklearn` k-means 726 | implementation) is used for selection. If set to `'auto'`, a default of 727 | `KERNEL_SHAP_BACKGROUND_THRESHOLD` samples is selected. 728 | n_background_samples 729 | The number of samples to keep in the background dataset if `summarise_background=True`. 730 | groups: 731 | A list containing sub-lists specifying the indices of features belonging to the same group. 732 | group_names: 733 | If specified, this array is used to treat groups of features as one during feature perturbation. 734 | This feature can be useful, for example, to treat encoded categorical variables as one and can 735 | result in computational savings (this may require adjusting the `nsamples` parameter). 736 | weights: 737 | A sequence or array of weights. This is used only if grouping is specified and assigns a weight 738 | to each point in the dataset. 739 | kwargs: 740 | Expected keyword arguments include `keep_index` (bool) and should be used if a data frame containing an 741 | index column is passed to the algorithm. 
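An illustrative usage sketch (the predictor, feature names and group layout below are hypothetical, not taken from this repository):

    # hypothetical classifier and data; 'education' is assumed to be one-hot encoded into columns 1-4
    explainer = KernelShap(clf.predict_proba, link='logit', feature_names=['age', 'education'])
    # treat the four encoded columns of 'education' as a single feature during sampling
    explainer.fit(background_data, group_names=['age', 'education'], groups=[[0], [1, 2, 3, 4]])
    explanation = explainer.explain(X_explain, nsamples=500)

To distribute the computation, the explainer could instead be constructed with, e.g., `distributed_opts={'n_cpus': 4, 'batch_size': 10}`, which requires `ray` to be installed.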
742 | """ 743 | 744 | np.random.seed(self.seed) 745 | 746 | self._fitted = True 747 | # user has specified variable groups 748 | use_groups = groups is not None or group_names is not None 749 | self.use_groups = use_groups 750 | 751 | if summarise_background: 752 | if isinstance(summarise_background, str): 753 | if not isinstance(background_data, shap.common.Data): 754 | n_samples = background_data.shape[0] 755 | else: 756 | n_samples = background_data.data.shape[0] 757 | n_background_samples = min(n_samples, KERNEL_SHAP_BACKGROUND_THRESHOLD) 758 | background_data = self._summarise_background(background_data, n_background_samples) 759 | 760 | # check user inputs to provide warnings if input is incorrect 761 | self._check_inputs(background_data, group_names, groups, weights) 762 | if self.create_group_names: 763 | group_names = ['group_{}'.format(i) for i in range(len(groups))] 764 | # disable grouping or data weights if inputs are not correct 765 | if self.ignore_weights: 766 | weights = None 767 | if not self.use_groups: 768 | group_names, groups = None, None 769 | else: 770 | self.feature_names = group_names 771 | 772 | # perform grouping if requested by the user 773 | self.background_data = self._get_data(background_data, group_names, groups, weights, **kwargs) 774 | explainer_args = (self.predictor, self.background_data) 775 | explainer_kwargs = {'link': self.link} 776 | # distribute computation 777 | if self.distribute: 778 | # set seed for each process 779 | explainer_kwargs['seed'] = self.seed 780 | self._explainer = DistributedExplainer( 781 | self.distributed_opts, 782 | KernelExplainerWrapper, 783 | explainer_args, 784 | explainer_kwargs, 785 | ) # type: DistributedExplainer 786 | else: 787 | self._explainer = KernelExplainerWrapper(*explainer_args, 788 | **explainer_kwargs) # type: KernelExplainerWrapper # noqa: E501 789 | self.expected_value = self._explainer.expected_value 790 | if not self._explainer.vector_out: 791 | logger.warning( 792 | "Predictor returned a scalar value. Ensure the output represents a probability or decision score as " 793 | "opposed to a classification label!" 794 | ) 795 | 796 | # update metadata 797 | params = { 798 | 'groups': groups, 799 | 'group_names': group_names, 800 | 'weights': weights, 801 | 'kwargs': kwargs, 802 | 'summarise_background': self.summarise_background, 803 | 'grouped': self.use_groups, 804 | 'transpose': self.transposed, 805 | } 806 | self._update_metadata(params, params=True) 807 | 808 | return self 809 | 810 | def explain(self, 811 | X: Union[np.ndarray, pd.DataFrame, sparse.spmatrix], 812 | summarise_result: bool = False, 813 | cat_vars_start_idx: Sequence[int] = None, 814 | cat_vars_enc_dim: Sequence[int] = None, 815 | **kwargs) -> Explanation: 816 | """ 817 | Explains the instances in the array `X`. 818 | 819 | Parameters 820 | ---------- 821 | X 822 | Instances to be explained. Note that the `pd.DataFrame` and `sparse.spmatrix` are not supported by the 823 | distributed version. In the future `pd.DataFrame` might be supported, please raise a feature request if you 824 | need this feature. 825 | summarise_result 826 | Specifies whether the shap values corresponding to dimensions of encoded categorical variables should be 827 | summed so that a single shap value is returned for each categorical variable. 
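For instance (an illustrative call with hypothetical indices), `explainer.explain(X, summarise_result=True, cat_vars_start_idx=[2, 7], cat_vars_enc_dim=[4, 3])` sums the shap values of columns 2-5 and 7-9 into a single value per categorical variable.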
Both the start indices of 828 | the categorical variables (`cat_vars_start_idx`) and the encoding dimensions (`cat_vars_enc_dim`) 829 | have to be specified 830 | cat_vars_start_idx 831 | The start indices of the categorical variables. If specified, `cat_vars_enc_dim` should also be specified. 832 | cat_vars_enc_dim 833 | The length of the encoding dimension for each categorical variable. If specified `cat_vars_start_idx` should 834 | also be specified. 835 | kwargs 836 | Keyword arguments specifying explain behaviour. Valid arguments are: 837 | 838 | - `nsamples`: controls the number of predictor calls and therefore runtime. 839 | 840 | - `l1_reg`: the algorithm is exponential in the feature dimension. If set to `auto` the algorithm will \ 841 | first run a feature selection algorithm to select the top features, provided the fraction of sampled \ 842 | sets of missing features is less than 0.2 from the number of total subsets. The Akaike Information \ 843 | Criterion is used in this case. See our examples for more details about available settings for this \ 844 | parameter. Note that by first running a feature selection step, the shapley values of the remainder of \ 845 | the features will be different to those estimated from the entire set. 846 | 847 | For more details, please see the shap library `documentation`_ . 848 | 849 | .. _documentation: 850 | https://shap.readthedocs.io/en/latest/. 851 | 852 | Returns 853 | ------- 854 | explanation 855 | An explanation object containing the algorithm results. 856 | 857 | Raises 858 | ------ 859 | TypeError 860 | In the following conditions: 861 | - `fit` method has not been called prior to explain 862 | - distributed context has been specified but `X` is a `pd.DataFrame` or `sparse.spmatrix` object. 863 | """ # noqa W605 864 | 865 | if not self._fitted: 866 | raise TypeError( 867 | "Called explain on an unfitted object! Please fit the explainer using the .fit method first!" 868 | ) 869 | 870 | if self.distribute: 871 | if isinstance(X, sparse.spmatrix) or isinstance(X, pd.DataFrame): 872 | raise TypeError( 873 | "Incorrect type for `X` due to distributed context. Cast `X` to np.ndarray." 874 | ) 875 | 876 | # convert data to dense format if sparse 877 | if self.use_groups and isinstance(X, sparse.spmatrix): 878 | X = X.toarray() 879 | 880 | shap_values = self._explainer.get_explanation(X, **kwargs) 881 | self.expected_value = self._explainer.expected_value 882 | expected_value = self.expected_value 883 | # for scalar model outputs a single numpy array is returned 884 | if isinstance(shap_values, np.ndarray): 885 | shap_values = [shap_values] 886 | if isinstance(expected_value, float): 887 | expected_value = [expected_value] 888 | 889 | explanation = self.build_explanation( 890 | X, 891 | shap_values, 892 | expected_value, 893 | summarise_result=summarise_result, 894 | cat_vars_start_idx=cat_vars_start_idx, 895 | cat_vars_enc_dim=cat_vars_enc_dim, 896 | ) 897 | 898 | return explanation 899 | 900 | def build_explanation(self, 901 | X: Union[np.ndarray, pd.DataFrame, sparse.spmatrix], 902 | shap_values: List[np.ndarray], 903 | expected_value: List[float], 904 | **kwargs) -> Explanation: 905 | """ 906 | Create an explanation object. If output summarisation is required and all inputs necessary for this operation 907 | are passed, the raw shap values are summed first so that a single shap value is returned for each categorical 908 | variable, as opposed to a shap value per dimension of categorical variable encoding. 
909 | 910 | Parameters 911 | ---------- 912 | X 913 | Instances to be explained. 914 | shap_values 915 | Each entry is a n_instances x n_features array, and the length of the list equals the dimensionality 916 | of the predictor output. The rows of each array correspond to the shap values for the instances with 917 | the corresponding row index in `X`. The length of the list equals the number of model outputs. 918 | expected_value 919 | A list containing the expected value of the prediction for each class. Its length should be equal to that of 920 | `shap_values`. 921 | 922 | Returns 923 | ------- 924 | explanation 925 | An explanation object containing the shap values and prediction in the `data` field, along with a `meta` 926 | field containing additional data. See usage `examples`_ for details. 927 | 928 | .. _examples: 929 | https://docs.seldon.io/projects/alibi/en/latest/methods/KernelSHAP.html 930 | 931 | """ 932 | 933 | # TODO: DEFINE COMPLETE SCHEMA FOR THE METADATA (ONGOING) 934 | # TODO: Plotting default should be same space as the explanation? How do we figure out what space they 935 | # explain in? 936 | 937 | cat_vars_start_idx = kwargs.get('cat_vars_start_idx', ()) # type: Sequence[int] 938 | cat_vars_enc_dim = kwargs.get('cat_vars_enc_dim', ()) # type: Sequence[int] 939 | summarise_result = kwargs.get('summarise_result', False) # type: bool 940 | if summarise_result: 941 | self._check_result_summarisation(summarise_result, cat_vars_start_idx, cat_vars_enc_dim) 942 | if self.summarise_result: 943 | summarised_shap = [] 944 | for shap_array in shap_values: 945 | summarised_shap.append(sum_categories(shap_array, cat_vars_start_idx, cat_vars_enc_dim)) 946 | shap_values = summarised_shap 947 | 948 | # apply explainer link function to obtain raw predictions on the same scale as used by the explainer 949 | linkfv = np.vectorize(convert_to_link(self.link).f) 950 | raw_predictions = linkfv(self.predictor(X)) 951 | 952 | if self.task != 'regression': 953 | argmax_pred = np.argmax(np.atleast_2d(raw_predictions), axis=1) 954 | else: 955 | argmax_pred = [] 956 | importances = rank_by_importance(shap_values, feature_names=self.feature_names) 957 | 958 | if isinstance(X, sparse.spmatrix): 959 | X = X.toarray() 960 | else: 961 | X = np.array(X) 962 | 963 | # output explanation dictionary 964 | data = copy.deepcopy(DEFAULT_DATA_KERNEL_SHAP) 965 | data.update( 966 | shap_values=shap_values, 967 | expected_value=np.array(expected_value), 968 | link=self.link, 969 | categorical_names=self.categorical_names, 970 | feature_names=self.feature_names 971 | ) 972 | data['raw'].update( 973 | raw_prediction=raw_predictions, 974 | prediction=argmax_pred, 975 | instances=X, 976 | importances=importances 977 | ) 978 | self._update_metadata({"summarise_result": self.summarise_result}, params=True) 979 | 980 | return Explanation(meta=copy.deepcopy(self.meta), data=data) 981 | 982 | def _check_result_summarisation(self, 983 | summarise_result: bool, 984 | cat_vars_start_idx: Sequence[int], 985 | cat_vars_enc_dim: Sequence[int]) -> None: 986 | """ 987 | This function checks whether the result summarisation option is correct given the inputs and explainer setup. 988 | 989 | Parameters 990 | ---------- 991 | summarise_result: 992 | See `explain` documentation. 993 | cat_vars_start_idx: 994 | See `explain` documentation. 995 | cat_vars_enc_dim: 996 | See `explain` documentation. 
997 | """ 998 | 999 | self.summarise_result = summarise_result 1000 | if summarise_result: 1001 | if not cat_vars_start_idx or not cat_vars_enc_dim: 1002 | logger.warning( 1003 | "Results cannot be summarised as either the" 1004 | "start indices for categorical variables or" 1005 | "the encoding dimensions were not passed!" 1006 | ) 1007 | self.summarise_result = False 1008 | elif self.use_groups: 1009 | logger.warning( 1010 | "Specified both groups as well as summarisation for categorical variables. " 1011 | "By grouping, only one shap value is estimated for each categorical variable. " 1012 | "Summarisation is not necessary!" 1013 | ) 1014 | self.summarise_result = False 1015 | -------------------------------------------------------------------------------- /explainers/utils.py: -------------------------------------------------------------------------------- 1 | import io 2 | import logging 3 | import os 4 | import pickle 5 | import requests 6 | 7 | import numpy as np 8 | 9 | from functools import singledispatch, update_wrapper 10 | from scipy import sparse 11 | from typing import Callable, List 12 | 13 | 14 | EXPLANATIONS_SET_URL = 'https://storage.googleapis.com/seldon-datasets/experiments/distributed_kernel_shap/adult_processed.pkl' 15 | BACKGROUND_SET_URL = 'https://storage.googleapis.com/seldon-datasets/experiments/distributed_kernel_shap/adult_background.pkl' 16 | EXPLANATIONS_SET_LOCAL = 'data/adult_processed.pkl' 17 | BACKGROUND_SET_LOCAL = 'data/adult_background.pkl' 18 | MODEL_URL = 'https://storage.googleapis.com/seldon-models/alibi/distributed_kernel_shap/predictor.pkl' 19 | MODEL_LOCAL = 'assets/predictor.pkl' 20 | 21 | 22 | class Bunch(dict): 23 | """ 24 | Container object for internal datasets. Dictionary-like object that exposes its keys as attributes. 25 | """ 26 | 27 | def __init__(self, **kwargs): 28 | super().__init__(kwargs) 29 | 30 | def __setattr__(self, key, value): 31 | self[key] = value 32 | 33 | def __dir__(self): 34 | return self.keys() 35 | 36 | def __getattr__(self, key): 37 | try: 38 | return self[key] 39 | except KeyError: 40 | raise AttributeError(key) 41 | 42 | 43 | def methdispatch(func: Callable): 44 | """ 45 | A decorator that is used to support singledispatch style functionality 46 | for instance methods. By default, singledispatch selects a function to 47 | call from registered based on the type of args[0]: 48 | 49 | def wrapper(*args, **kw): 50 | return dispatch(args[0].__class__)(*args, **kw) 51 | 52 | This uses singledispatch to do achieve this but instead uses args[1] 53 | since args[0] will always be self. 54 | """ 55 | 56 | dispatcher = singledispatch(func) 57 | 58 | def wrapper(*args, **kw): 59 | return dispatcher.dispatch(args[1].__class__)(*args, **kw) 60 | 61 | wrapper.register = dispatcher.register 62 | update_wrapper(wrapper, dispatcher) 63 | 64 | return wrapper 65 | 66 | 67 | def get_filename(workers: int, batch_size: int, cpu_fraction: float = 1.0, serve: bool = True): 68 | """ 69 | Creates a filename for an experiment given the inputs. 70 | 71 | Parameters 72 | ---------- 73 | workers 74 | How many worker processes are used for the explanation task. 75 | batch_size 76 | Mini-batch size: how many explanations are sent to one worker process at a time. 77 | cpu_fraction 78 | CPU fraction utilized by a worker process. 79 | serve 80 | A different naming convention is used depending on whether ray serve is used to distribute the explanations or 81 | not. 
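For example, `get_filename(8, 5, serve=False)` returns 'results/ray_workers_8_bsize_5_actorfr_1.0.pkl', while `get_filename(8, 5, serve=True)` returns 'results/ray_replicas_8_maxbatch_5_actorfr_1.0.pkl'.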
82 | """ 83 | 84 | if serve: 85 | return f"results/ray_replicas_{workers}_maxbatch_{batch_size}_actorfr_{cpu_fraction}.pkl" 86 | return f"results/ray_workers_{workers}_bsize_{batch_size}_actorfr_{cpu_fraction}.pkl" 87 | 88 | 89 | def batch(X: np.ndarray, batch_size: int = None, n_batches: int = 4) -> List[np.ndarray]: 90 | """ 91 | Splits the input into mini-batches. 92 | 93 | Parameters 94 | ---------- 95 | X 96 | Array to be split. 97 | batch_size 98 | If not `None`, batches of this size are created. The sizes of the batches created might vary if the 0-th 99 | dimension of `X` is not divisible by `batch_size`. For an array of len l that should be split into n sections, 100 | it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n. 101 | n_batches 102 | If `batch_size` is `None`, then `X` is split into `n_batches` mini-batches. 103 | 104 | Returns 105 | ------ 106 | A list of sub-arrays of X. 107 | """ 108 | 109 | n_records = X.shape[0] 110 | if isinstance(X, sparse.spmatrix): 111 | X = X.toarray() 112 | 113 | if batch_size: 114 | n_batches = n_records // batch_size 115 | if n_records % batch_size != 0: 116 | n_batches += 1 117 | slices = [batch_size * i for i in range(1, n_batches)] 118 | batches = np.array_split(X, slices) 119 | else: 120 | batches = np.array_split(X, n_batches) 121 | return batches 122 | 123 | 124 | def _download(path: str): 125 | """ Download from Seldon GC bucket indicated by `path`.""" 126 | 127 | try: 128 | resp = requests.get(path) 129 | resp.raise_for_status() 130 | except requests.RequestException: 131 | logging.exception("Could not connect to bucket, URL may be out of service!") 132 | raise ConnectionError 133 | 134 | return resp 135 | 136 | 137 | def load_model(path: str): 138 | """ 139 | Load a model that has been saved locally or download a default model from a Seldon bucket. 140 | """ 141 | 142 | try: 143 | with open(path, "rb") as f: 144 | model = pickle.load(f) 145 | return model 146 | except FileNotFoundError: 147 | logging.info(f"Could not find model {path}. Downloading from {MODEL_URL}...") 148 | model_raw = _download(MODEL_URL) 149 | model = pickle.load(io.BytesIO(model_raw.content)) 150 | 151 | if not os.path.exists('assets'): 152 | os.mkdir('assets') 153 | 154 | with open("assets/predictor.pkl", "wb") as f: 155 | pickle.dump(model, f) 156 | 157 | return model 158 | 159 | 160 | def load_data(): 161 | """ 162 | Load instances to be explained and background data from the data/ directory if they exist, otherwise download 163 | from Seldon Google Cloud bucket. 
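The returned dictionary has two keys: 'all', holding the instances to be explained, and 'background', holding the background dataset used to fit the explainer.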
164 | """ 165 | 166 | data = {'all': None, 'background': None} 167 | try: 168 | with open(BACKGROUND_SET_LOCAL, 'rb') as f: 169 | data['background'] = pickle.load(f) 170 | with open(EXPLANATIONS_SET_LOCAL, 'rb') as f: 171 | data['all'] = pickle.load(f) 172 | except FileNotFoundError: 173 | logging.info(f"Downloading data from {EXPLANATIONS_SET_URL}") 174 | all_data_raw = _download(EXPLANATIONS_SET_URL) 175 | data['all'] = pickle.load(io.BytesIO(all_data_raw.content)) 176 | logging.info(f"Downloading data from {BACKGROUND_SET_URL}") 177 | background_data_raw = _download(BACKGROUND_SET_URL) 178 | data['background'] = pickle.load(io.BytesIO(background_data_raw.content)) 179 | 180 | # save the data locally so we don't download it every time we run the main script 181 | if not os.path.exists('data'): 182 | os.mkdir('data') 183 | with open('data/adult_background.pkl', 'wb') as f: 184 | pickle.dump(data['background'], f) 185 | with open('data/adult_processed.pkl', 'wb') as f: 186 | pickle.dump(data['all'], f) 187 | 188 | return data 189 | -------------------------------------------------------------------------------- /explainers/wrappers.py: -------------------------------------------------------------------------------- 1 | import logging 2 | 3 | import numpy as np 4 | 5 | from explainers.kernel_shap import KernelShap 6 | from ray import serve 7 | from typing import Any, Dict, List 8 | 9 | 10 | class KernelShapModel: 11 | """Backend class for distributing explanations with Ray Serve.""" 12 | def __init__(self, 13 | predictor, 14 | background_data: np.ndarray, 15 | constructor_kwargs: Dict[str, Any], 16 | fit_kwargs: Dict[str, Any]): 17 | """ 18 | Initialises backend for distributed explanations. 19 | 20 | 21 | Parameters 22 | ---------- 23 | predictor 24 | Model to be explained. 25 | background_data 26 | Background data used for fitting the explainer. 27 | constructor_kwargs 28 | Any other arguments for the explainer constructor. See `explainers.kernel_shap.KernelShap` for details. 29 | fit_kwargs 30 | Any other arguments for the explainer `fit` method. See `explainers.kernel_shap.KernelShap` for details. 31 | """ 32 | 33 | if not hasattr(predictor, "predict_proba"): 34 | logging.warning("Predictor does not have predict_proba attribute, defaulting to predict") 35 | predict_fcn = predictor.predict 36 | else: 37 | predict_fcn = predictor.predict_proba 38 | self.explainer = KernelShap(predict_fcn, **constructor_kwargs) 39 | 40 | # TODO: REFACTOR THIS TO USE THE BACKEND METHOD CALLING FUNCTIONALITY 41 | self.explainer.fit(background_data, **fit_kwargs) 42 | 43 | def __call__(self, flask_request) -> str: 44 | """ 45 | Serves explanations for a single instance. 46 | 47 | Parameters 48 | ---------- 49 | flask_request 50 | A json flask request that contains a list with the instance to be explained in the ``array`` field. 51 | 52 | Returns 53 | ------- 54 | A `str` object containing a json representation of the explainer output. 55 | """ 56 | instance = np.array(flask_request.json["array"]) 57 | explanations = self.explainer.explain(instance, silent=True) 58 | 59 | return explanations.to_json() 60 | 61 | 62 | class BatchKernelShapModel(KernelShapModel): 63 | """Extends KernelShapModel to achieve batching of requests.""" 64 | 65 | @serve.accept_batch 66 | def __call__(self, flask_requests: List) -> List[str]: 67 | """ 68 | Serves explanations for a batch of requests. 69 | 70 | Parameters 71 | ---------- 72 | flask_requests: 73 | A list of json flask requests.
Each request should contain an instance to be explained in the ``array`` 74 | field. 75 | 76 | Returns 77 | ------- 78 | A list of `str` objects, each containing a json representation of the explainer output for one instance. 79 | """ 80 | 81 | instances = [request.json["array"] for request in flask_requests] 82 | explanations = [] 83 | for instance in instances: 84 | explanations.append( 85 | self.explainer.explain(np.array(instance), silent=True).to_json() 86 | ) 87 | 88 | return explanations 89 | -------------------------------------------------------------------------------- /images/pool_1_node.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/pool_1_node.PNG -------------------------------------------------------------------------------- /images/pool_k8s_32.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/pool_k8s_32.PNG -------------------------------------------------------------------------------- /images/pool_k8s_56.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/pool_k8s_56.PNG -------------------------------------------------------------------------------- /images/serve_1_node.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/serve_1_node.PNG -------------------------------------------------------------------------------- /images/serve_k8s_32.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/serve_k8s_32.PNG -------------------------------------------------------------------------------- /images/serve_k8s_56.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/serve_k8s_56.PNG -------------------------------------------------------------------------------- /poetry.lock: -------------------------------------------------------------------------------- 1 | [[package]] 2 | category = "main" 3 | description = "Async http client/server framework (asyncio)" 4 | name = "aiohttp" 5 | optional = false 6 | python-versions = ">=3.5.3" 7 | version = "3.6.2" 8 | 9 | [package.dependencies] 10 | async-timeout = ">=3.0,<4.0" 11 | attrs = ">=17.3.0" 12 | chardet = ">=2.0,<4.0" 13 | multidict = ">=4.5,<5.0" 14 | yarl = ">=1.0,<2.0" 15 | 16 | [package.extras] 17 | speedups = ["aiodns", "brotlipy", "cchardet"] 18 | 19 | [[package]] 20 | category = "main" 21 | description = "Timeout context manager for asyncio programs" 22 | name = "async-timeout" 23 | optional = false 24 | python-versions = ">=3.5.3" 25 | version = "3.0.1" 26 | 27 | [[package]] 28 | category = "main" 29 | description = "Classes Without Boilerplate" 30 | name = "attrs" 31 | optional = false 32 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" 33 | version = "20.1.0" 34 | 35 | [package.extras] 36 | dev = ["coverage (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)",
"six", "zope.interface", "sphinx", "sphinx-rtd-theme", "pre-commit"] 37 | docs = ["sphinx", "sphinx-rtd-theme", "zope.interface"] 38 | tests = ["coverage (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)", "six", "zope.interface"] 39 | 40 | [[package]] 41 | category = "main" 42 | description = "Screen-scraping library" 43 | name = "beautifulsoup4" 44 | optional = false 45 | python-versions = "*" 46 | version = "4.9.1" 47 | 48 | [package.dependencies] 49 | soupsieve = [">1.2", "<2.0"] 50 | 51 | [package.extras] 52 | html5lib = ["html5lib"] 53 | lxml = ["lxml"] 54 | 55 | [[package]] 56 | category = "main" 57 | description = "a list-like type with better asymptotic performance and similar performance on small lists" 58 | name = "blist" 59 | optional = false 60 | python-versions = "*" 61 | version = "1.3.6" 62 | 63 | [[package]] 64 | category = "main" 65 | description = "Python package for providing Mozilla's CA Bundle." 66 | name = "certifi" 67 | optional = false 68 | python-versions = "*" 69 | version = "2020.6.20" 70 | 71 | [[package]] 72 | category = "main" 73 | description = "Universal encoding detector for Python 2 and 3" 74 | name = "chardet" 75 | optional = false 76 | python-versions = "*" 77 | version = "3.0.4" 78 | 79 | [[package]] 80 | category = "main" 81 | description = "Composable command line interface toolkit" 82 | name = "click" 83 | optional = false 84 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 85 | version = "7.1.2" 86 | 87 | [[package]] 88 | category = "main" 89 | description = "Cross-platform colored terminal text." 90 | name = "colorama" 91 | optional = false 92 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 93 | version = "0.4.3" 94 | 95 | [[package]] 96 | category = "main" 97 | description = "Terminal string styling done right, in Python." 98 | name = "colorful" 99 | optional = false 100 | python-versions = "*" 101 | version = "0.5.4" 102 | 103 | [package.dependencies] 104 | colorama = "*" 105 | 106 | [[package]] 107 | category = "main" 108 | description = "A platform independent file lock." 109 | name = "filelock" 110 | optional = false 111 | python-versions = "*" 112 | version = "3.0.12" 113 | 114 | [[package]] 115 | category = "main" 116 | description = "A simple framework for building complex web applications." 117 | name = "flask" 118 | optional = false 119 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 120 | version = "1.1.2" 121 | 122 | [package.dependencies] 123 | Jinja2 = ">=2.10.1" 124 | Werkzeug = ">=0.15" 125 | click = ">=5.1" 126 | itsdangerous = ">=0.24" 127 | 128 | [package.extras] 129 | dev = ["pytest", "coverage", "tox", "sphinx", "pallets-sphinx-themes", "sphinxcontrib-log-cabinet", "sphinx-issues"] 130 | docs = ["sphinx", "pallets-sphinx-themes", "sphinxcontrib-log-cabinet", "sphinx-issues"] 131 | dotenv = ["python-dotenv"] 132 | 133 | [[package]] 134 | category = "main" 135 | description = "Python bindings to the Google search engine." 
136 | name = "google" 137 | optional = false 138 | python-versions = "*" 139 | version = "3.0.0" 140 | 141 | [package.dependencies] 142 | beautifulsoup4 = "*" 143 | 144 | [[package]] 145 | category = "main" 146 | description = "HTTP/2-based RPC framework" 147 | name = "grpcio" 148 | optional = false 149 | python-versions = "*" 150 | version = "1.31.0" 151 | 152 | [package.dependencies] 153 | six = ">=1.5.2" 154 | 155 | [package.extras] 156 | protobuf = ["grpcio-tools (>=1.31.0)"] 157 | 158 | [[package]] 159 | category = "main" 160 | description = "A pure-Python, bring-your-own-I/O implementation of HTTP/1.1" 161 | name = "h11" 162 | optional = false 163 | python-versions = "*" 164 | version = "0.9.0" 165 | 166 | [[package]] 167 | category = "main" 168 | description = "A collection of framework independent HTTP protocol utils." 169 | marker = "sys_platform != \"win32\" and sys_platform != \"cygwin\" and platform_python_implementation != \"PyPy\"" 170 | name = "httptools" 171 | optional = false 172 | python-versions = "*" 173 | version = "0.1.1" 174 | 175 | [package.extras] 176 | test = ["Cython (0.29.14)"] 177 | 178 | [[package]] 179 | category = "main" 180 | description = "Internationalized Domain Names in Applications (IDNA)" 181 | name = "idna" 182 | optional = false 183 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" 184 | version = "2.10" 185 | 186 | [[package]] 187 | category = "main" 188 | description = "Read metadata from Python packages" 189 | marker = "python_version < \"3.8\"" 190 | name = "importlib-metadata" 191 | optional = false 192 | python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,>=2.7" 193 | version = "1.7.0" 194 | 195 | [package.dependencies] 196 | zipp = ">=0.5" 197 | 198 | [package.extras] 199 | docs = ["sphinx", "rst.linker"] 200 | testing = ["packaging", "pep517", "importlib-resources (>=1.3)"] 201 | 202 | [[package]] 203 | category = "main" 204 | description = "Various helpers to pass data to untrusted environments and back." 205 | name = "itsdangerous" 206 | optional = false 207 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" 208 | version = "1.1.0" 209 | 210 | [[package]] 211 | category = "main" 212 | description = "A very fast and expressive template engine." 213 | name = "jinja2" 214 | optional = false 215 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 216 | version = "2.11.2" 217 | 218 | [package.dependencies] 219 | MarkupSafe = ">=0.23" 220 | 221 | [package.extras] 222 | i18n = ["Babel (>=0.8)"] 223 | 224 | [[package]] 225 | category = "main" 226 | description = "Lightweight pipelining: using Python functions as pipeline jobs." 
227 | name = "joblib" 228 | optional = false 229 | python-versions = ">=3.6" 230 | version = "0.16.0" 231 | 232 | [[package]] 233 | category = "main" 234 | description = "An implementation of JSON Schema validation for Python" 235 | name = "jsonschema" 236 | optional = false 237 | python-versions = "*" 238 | version = "3.2.0" 239 | 240 | [package.dependencies] 241 | attrs = ">=17.4.0" 242 | pyrsistent = ">=0.14.0" 243 | setuptools = "*" 244 | six = ">=1.11.0" 245 | 246 | [package.dependencies.importlib-metadata] 247 | python = "<3.8" 248 | version = "*" 249 | 250 | [package.extras] 251 | format = ["idna", "jsonpointer (>1.13)", "rfc3987", "strict-rfc3339", "webcolors"] 252 | format_nongpl = ["idna", "jsonpointer (>1.13)", "webcolors", "rfc3986-validator (>0.1.0)", "rfc3339-validator"] 253 | 254 | [[package]] 255 | category = "main" 256 | description = "Safely add untrusted strings to HTML/XML markup." 257 | name = "markupsafe" 258 | optional = false 259 | python-versions = ">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*" 260 | version = "1.1.1" 261 | 262 | [[package]] 263 | category = "main" 264 | description = "MessagePack (de)serializer." 265 | name = "msgpack" 266 | optional = false 267 | python-versions = "*" 268 | version = "1.0.0" 269 | 270 | [[package]] 271 | category = "main" 272 | description = "multidict implementation" 273 | name = "multidict" 274 | optional = false 275 | python-versions = ">=3.5" 276 | version = "4.7.6" 277 | 278 | [[package]] 279 | category = "main" 280 | description = "NumPy is the fundamental package for array computing with Python." 281 | name = "numpy" 282 | optional = false 283 | python-versions = ">=3.6" 284 | version = "1.19.1" 285 | 286 | [[package]] 287 | category = "main" 288 | description = "Powerful data structures for data analysis, time series, and statistics" 289 | name = "pandas" 290 | optional = false 291 | python-versions = ">=3.6.1" 292 | version = "1.1.1" 293 | 294 | [package.dependencies] 295 | numpy = ">=1.15.4" 296 | python-dateutil = ">=2.7.3" 297 | pytz = ">=2017.2" 298 | 299 | [package.extras] 300 | test = ["pytest (>=4.0.2)", "pytest-xdist", "hypothesis (>=3.58)"] 301 | 302 | [[package]] 303 | category = "main" 304 | description = "Syntax-highlighting, declarative and composable pretty printer for Python 3.5+" 305 | name = "prettyprinter" 306 | optional = false 307 | python-versions = "*" 308 | version = "0.18.0" 309 | 310 | [package.dependencies] 311 | Pygments = ">=2.2.0" 312 | colorful = ">=0.4.0" 313 | 314 | [[package]] 315 | category = "main" 316 | description = "Protocol Buffers" 317 | name = "protobuf" 318 | optional = false 319 | python-versions = "*" 320 | version = "3.13.0" 321 | 322 | [package.dependencies] 323 | setuptools = "*" 324 | six = ">=1.9" 325 | 326 | [[package]] 327 | category = "main" 328 | description = "A Sampling Profiler for Python" 329 | name = "py-spy" 330 | optional = false 331 | python-versions = "*" 332 | version = "0.3.3" 333 | 334 | [[package]] 335 | category = "main" 336 | description = "Pygments is a syntax highlighting package written in Python." 
337 | name = "pygments" 338 | optional = false 339 | python-versions = ">=3.5" 340 | version = "2.6.1" 341 | 342 | [[package]] 343 | category = "main" 344 | description = "Persistent/Functional/Immutable data structures" 345 | name = "pyrsistent" 346 | optional = false 347 | python-versions = "*" 348 | version = "0.16.0" 349 | 350 | [package.dependencies] 351 | six = "*" 352 | 353 | [[package]] 354 | category = "main" 355 | description = "Extensions to the standard Python datetime module" 356 | name = "python-dateutil" 357 | optional = false 358 | python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,>=2.7" 359 | version = "2.8.1" 360 | 361 | [package.dependencies] 362 | six = ">=1.5" 363 | 364 | [[package]] 365 | category = "main" 366 | description = "World timezone definitions, modern and historical" 367 | name = "pytz" 368 | optional = false 369 | python-versions = "*" 370 | version = "2020.1" 371 | 372 | [[package]] 373 | category = "main" 374 | description = "YAML parser and emitter for Python" 375 | name = "pyyaml" 376 | optional = false 377 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 378 | version = "5.3.1" 379 | 380 | [[package]] 381 | category = "main" 382 | description = "A system for parallel and distributed Python that unifies the ML ecosystem." 383 | name = "ray" 384 | optional = false 385 | python-versions = "*" 386 | version = "0.8.6" 387 | 388 | [package.dependencies] 389 | aiohttp = "*" 390 | click = ">=7.0" 391 | colorama = "*" 392 | filelock = "*" 393 | google = "*" 394 | grpcio = "*" 395 | jsonschema = "*" 396 | msgpack = ">=0.6.0,<2.0.0" 397 | numpy = ">=1.16" 398 | protobuf = ">=3.8.0" 399 | py-spy = ">=0.2.0" 400 | pyyaml = "*" 401 | redis = ">=3.3.2,<3.5.0" 402 | 403 | [package.dependencies.blist] 404 | optional = true 405 | version = "*" 406 | 407 | [package.dependencies.flask] 408 | optional = true 409 | version = "*" 410 | 411 | [package.dependencies.uvicorn] 412 | optional = true 413 | version = "*" 414 | 415 | [package.extras] 416 | all = ["scipy", "atari-py", "dm-tree", "opencv-python-headless", "pandas", "gym", "tensorboardx", "blist", "tabulate", "lz4", "msgpack (>=0.6.2)", "pyyaml", "gpustat", "uvicorn", "requests", "flask"] 417 | dashboard = ["requests", "gpustat"] 418 | rllib = ["tabulate", "tensorboardx", "pandas", "atari-py", "dm-tree", "gym", "lz4", "opencv-python-headless", "pyyaml", "scipy"] 419 | serve = ["uvicorn", "flask", "blist"] 420 | streaming = ["msgpack (>=0.6.2)"] 421 | tune = ["tabulate", "tensorboardx", "pandas"] 422 | 423 | [[package]] 424 | category = "main" 425 | description = "Python client for Redis key-value store" 426 | name = "redis" 427 | optional = false 428 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 429 | version = "3.4.1" 430 | 431 | [package.extras] 432 | hiredis = ["hiredis (>=0.1.3)"] 433 | 434 | [[package]] 435 | category = "main" 436 | description = "Python HTTP for Humans." 
437 | name = "requests" 438 | optional = false 439 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 440 | version = "2.24.0" 441 | 442 | [package.dependencies] 443 | certifi = ">=2017.4.17" 444 | chardet = ">=3.0.2,<4" 445 | idna = ">=2.5,<3" 446 | urllib3 = ">=1.21.1,<1.25.0 || >1.25.0,<1.25.1 || >1.25.1,<1.26" 447 | 448 | [package.extras] 449 | security = ["pyOpenSSL (>=0.14)", "cryptography (>=1.3.4)"] 450 | socks = ["PySocks (>=1.5.6,<1.5.7 || >1.5.7)", "win-inet-pton"] 451 | 452 | [[package]] 453 | category = "main" 454 | description = "A set of python modules for machine learning and data mining" 455 | name = "scikit-learn" 456 | optional = false 457 | python-versions = ">=3.6" 458 | version = "0.23.2" 459 | 460 | [package.dependencies] 461 | joblib = ">=0.11" 462 | numpy = ">=1.13.3" 463 | scipy = ">=0.19.1" 464 | threadpoolctl = ">=2.0.0" 465 | 466 | [package.extras] 467 | alldeps = ["numpy (>=1.13.3)", "scipy (>=0.19.1)"] 468 | 469 | [[package]] 470 | category = "main" 471 | description = "SciPy: Scientific Library for Python" 472 | name = "scipy" 473 | optional = false 474 | python-versions = ">=3.6" 475 | version = "1.5.2" 476 | 477 | [package.dependencies] 478 | numpy = ">=1.14.5" 479 | 480 | [[package]] 481 | category = "main" 482 | description = "A unified approach to explain the output of any machine learning model." 483 | name = "shap" 484 | optional = false 485 | python-versions = "*" 486 | version = "0.35.0" 487 | 488 | [package.dependencies] 489 | numpy = "*" 490 | pandas = "*" 491 | scikit-learn = "*" 492 | scipy = "*" 493 | tqdm = ">4.25.0" 494 | 495 | [package.extras] 496 | all = ["lime", "ipython", "matplotlib"] 497 | others = ["lime"] 498 | plots = ["matplotlib", "ipython"] 499 | 500 | [[package]] 501 | category = "main" 502 | description = "Python 2 and 3 compatibility utilities" 503 | name = "six" 504 | optional = false 505 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*" 506 | version = "1.15.0" 507 | 508 | [[package]] 509 | category = "main" 510 | description = "A modern CSS selector implementation for Beautiful Soup." 511 | name = "soupsieve" 512 | optional = false 513 | python-versions = "*" 514 | version = "1.9.6" 515 | 516 | [[package]] 517 | category = "main" 518 | description = "threadpoolctl" 519 | name = "threadpoolctl" 520 | optional = false 521 | python-versions = ">=3.5" 522 | version = "2.1.0" 523 | 524 | [[package]] 525 | category = "main" 526 | description = "Fast, Extensible Progress Meter" 527 | name = "tqdm" 528 | optional = false 529 | python-versions = ">=2.6, !=3.0.*, !=3.1.*" 530 | version = "4.48.2" 531 | 532 | [package.extras] 533 | dev = ["py-make (>=0.1.0)", "twine", "argopt", "pydoc-markdown"] 534 | 535 | [[package]] 536 | category = "main" 537 | description = "Backported and Experimental Type Hints for Python 3.5+" 538 | marker = "python_version < \"3.8\"" 539 | name = "typing-extensions" 540 | optional = false 541 | python-versions = "*" 542 | version = "3.7.4.3" 543 | 544 | [[package]] 545 | category = "main" 546 | description = "HTTP library with thread-safe connection pooling, file post, and more." 
547 | name = "urllib3" 548 | optional = false 549 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, <4" 550 | version = "1.25.10" 551 | 552 | [package.extras] 553 | brotli = ["brotlipy (>=0.6.0)"] 554 | secure = ["certifi", "cryptography (>=1.3.4)", "idna (>=2.0.0)", "pyOpenSSL (>=0.14)", "ipaddress"] 555 | socks = ["PySocks (>=1.5.6,<1.5.7 || >1.5.7,<2.0)"] 556 | 557 | [[package]] 558 | category = "main" 559 | description = "The lightning-fast ASGI server." 560 | name = "uvicorn" 561 | optional = false 562 | python-versions = "*" 563 | version = "0.11.8" 564 | 565 | [package.dependencies] 566 | click = ">=7.0.0,<8.0.0" 567 | h11 = ">=0.8,<0.10" 568 | httptools = ">=0.1.0,<0.2.0" 569 | uvloop = ">=0.14.0" 570 | websockets = ">=8.0.0,<9.0.0" 571 | 572 | [package.extras] 573 | watchgodreload = ["watchgod (>=0.6,<0.7)"] 574 | 575 | [[package]] 576 | category = "main" 577 | description = "Fast implementation of asyncio event loop on top of libuv" 578 | marker = "sys_platform != \"win32\" and sys_platform != \"cygwin\" and platform_python_implementation != \"PyPy\"" 579 | name = "uvloop" 580 | optional = false 581 | python-versions = "*" 582 | version = "0.14.0" 583 | 584 | [[package]] 585 | category = "main" 586 | description = "An implementation of the WebSocket Protocol (RFC 6455 & 7692)" 587 | name = "websockets" 588 | optional = false 589 | python-versions = ">=3.6.1" 590 | version = "8.1" 591 | 592 | [[package]] 593 | category = "main" 594 | description = "The comprehensive WSGI web application library." 595 | name = "werkzeug" 596 | optional = false 597 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 598 | version = "1.0.1" 599 | 600 | [package.extras] 601 | dev = ["pytest", "pytest-timeout", "coverage", "tox", "sphinx", "pallets-sphinx-themes", "sphinx-issues"] 602 | watchdog = ["watchdog"] 603 | 604 | [[package]] 605 | category = "main" 606 | description = "Yet another URL library" 607 | name = "yarl" 608 | optional = false 609 | python-versions = ">=3.5" 610 | version = "1.5.1" 611 | 612 | [package.dependencies] 613 | idna = ">=2.0" 614 | multidict = ">=4.0" 615 | 616 | [package.dependencies.typing-extensions] 617 | python = "<3.8" 618 | version = ">=3.7.4" 619 | 620 | [[package]] 621 | category = "main" 622 | description = "Backport of pathlib-compatible object wrapper for zip files" 623 | marker = "python_version < \"3.8\"" 624 | name = "zipp" 625 | optional = false 626 | python-versions = ">=3.6" 627 | version = "3.1.0" 628 | 629 | [package.extras] 630 | docs = ["sphinx", "jaraco.packaging (>=3.2)", "rst.linker (>=1.9)"] 631 | testing = ["jaraco.itertools", "func-timeout"] 632 | 633 | [metadata] 634 | content-hash = "e2f69ddb89178d8865436ba1b16f0ac88ddad592cf9e215eacc27615829896a1" 635 | lock-version = "1.0" 636 | python-versions = "^3.7" 637 | 638 | [metadata.files] 639 | aiohttp = [ 640 | {file = "aiohttp-3.6.2-cp35-cp35m-macosx_10_13_x86_64.whl", hash = "sha256:1e984191d1ec186881ffaed4581092ba04f7c61582a177b187d3a2f07ed9719e"}, 641 | {file = "aiohttp-3.6.2-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:50aaad128e6ac62e7bf7bd1f0c0a24bc968a0c0590a726d5a955af193544bcec"}, 642 | {file = "aiohttp-3.6.2-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:65f31b622af739a802ca6fd1a3076fd0ae523f8485c52924a89561ba10c49b48"}, 643 | {file = "aiohttp-3.6.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:ae55bac364c405caa23a4f2d6cfecc6a0daada500274ffca4a9230e7129eac59"}, 644 | {file = "aiohttp-3.6.2-cp36-cp36m-win32.whl", hash = 
"sha256:344c780466b73095a72c616fac5ea9c4665add7fc129f285fbdbca3cccf4612a"}, 645 | {file = "aiohttp-3.6.2-cp36-cp36m-win_amd64.whl", hash = "sha256:4c6efd824d44ae697814a2a85604d8e992b875462c6655da161ff18fd4f29f17"}, 646 | {file = "aiohttp-3.6.2-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:2f4d1a4fdce595c947162333353d4a44952a724fba9ca3205a3df99a33d1307a"}, 647 | {file = "aiohttp-3.6.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:6206a135d072f88da3e71cc501c59d5abffa9d0bb43269a6dcd28d66bfafdbdd"}, 648 | {file = "aiohttp-3.6.2-cp37-cp37m-win32.whl", hash = "sha256:b778ce0c909a2653741cb4b1ac7015b5c130ab9c897611df43ae6a58523cb965"}, 649 | {file = "aiohttp-3.6.2-cp37-cp37m-win_amd64.whl", hash = "sha256:32e5f3b7e511aa850829fbe5aa32eb455e5534eaa4b1ce93231d00e2f76e5654"}, 650 | {file = "aiohttp-3.6.2-py3-none-any.whl", hash = "sha256:460bd4237d2dbecc3b5ed57e122992f60188afe46e7319116da5eb8a9dfedba4"}, 651 | {file = "aiohttp-3.6.2.tar.gz", hash = "sha256:259ab809ff0727d0e834ac5e8a283dc5e3e0ecc30c4d80b3cd17a4139ce1f326"}, 652 | ] 653 | async-timeout = [ 654 | {file = "async-timeout-3.0.1.tar.gz", hash = "sha256:0c3c816a028d47f659d6ff5c745cb2acf1f966da1fe5c19c77a70282b25f4c5f"}, 655 | {file = "async_timeout-3.0.1-py3-none-any.whl", hash = "sha256:4291ca197d287d274d0b6cb5d6f8f8f82d434ed288f962539ff18cc9012f9ea3"}, 656 | ] 657 | attrs = [ 658 | {file = "attrs-20.1.0-py2.py3-none-any.whl", hash = "sha256:2867b7b9f8326499ab5b0e2d12801fa5c98842d2cbd22b35112ae04bf85b4dff"}, 659 | {file = "attrs-20.1.0.tar.gz", hash = "sha256:0ef97238856430dcf9228e07f316aefc17e8939fc8507e18c6501b761ef1a42a"}, 660 | ] 661 | beautifulsoup4 = [ 662 | {file = "beautifulsoup4-4.9.1-py2-none-any.whl", hash = "sha256:e718f2342e2e099b640a34ab782407b7b676f47ee272d6739e60b8ea23829f2c"}, 663 | {file = "beautifulsoup4-4.9.1-py3-none-any.whl", hash = "sha256:a6237df3c32ccfaee4fd201c8f5f9d9df619b93121d01353a64a73ce8c6ef9a8"}, 664 | {file = "beautifulsoup4-4.9.1.tar.gz", hash = "sha256:73cc4d115b96f79c7d77c1c7f7a0a8d4c57860d1041df407dd1aae7f07a77fd7"}, 665 | ] 666 | blist = [ 667 | {file = "blist-1.3.6.tar.gz", hash = "sha256:3a12c450b001bdf895b30ae818d4d6d3f1552096b8c995f0fe0c74bef04d1fc3"}, 668 | ] 669 | certifi = [ 670 | {file = "certifi-2020.6.20-py2.py3-none-any.whl", hash = "sha256:8fc0819f1f30ba15bdb34cceffb9ef04d99f420f68eb75d901e9560b8749fc41"}, 671 | {file = "certifi-2020.6.20.tar.gz", hash = "sha256:5930595817496dd21bb8dc35dad090f1c2cd0adfaf21204bf6732ca5d8ee34d3"}, 672 | ] 673 | chardet = [ 674 | {file = "chardet-3.0.4-py2.py3-none-any.whl", hash = "sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691"}, 675 | {file = "chardet-3.0.4.tar.gz", hash = "sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae"}, 676 | ] 677 | click = [ 678 | {file = "click-7.1.2-py2.py3-none-any.whl", hash = "sha256:dacca89f4bfadd5de3d7489b7c8a566eee0d3676333fbb50030263894c38c0dc"}, 679 | {file = "click-7.1.2.tar.gz", hash = "sha256:d2b5255c7c6349bc1bd1e59e08cd12acbbd63ce649f2588755783aa94dfb6b1a"}, 680 | ] 681 | colorama = [ 682 | {file = "colorama-0.4.3-py2.py3-none-any.whl", hash = "sha256:7d73d2a99753107a36ac6b455ee49046802e59d9d076ef8e47b61499fa29afff"}, 683 | {file = "colorama-0.4.3.tar.gz", hash = "sha256:e96da0d330793e2cb9485e9ddfd918d456036c7149416295932478192f4436a1"}, 684 | ] 685 | colorful = [ 686 | {file = "colorful-0.5.4-py2.py3-none-any.whl", hash = "sha256:8d264b52a39aae4c0ba3e2a46afbaec81b0559a99be0d2cfe2aba4cf94531348"}, 687 | {file = "colorful-0.5.4.tar.gz", hash = 
"sha256:86848ad4e2eda60cd2519d8698945d22f6f6551e23e95f3f14dfbb60997807ea"}, 688 | ] 689 | filelock = [ 690 | {file = "filelock-3.0.12-py3-none-any.whl", hash = "sha256:929b7d63ec5b7d6b71b0fa5ac14e030b3f70b75747cef1b10da9b879fef15836"}, 691 | {file = "filelock-3.0.12.tar.gz", hash = "sha256:18d82244ee114f543149c66a6e0c14e9c4f8a1044b5cdaadd0f82159d6a6ff59"}, 692 | ] 693 | flask = [ 694 | {file = "Flask-1.1.2-py2.py3-none-any.whl", hash = "sha256:8a4fdd8936eba2512e9c85df320a37e694c93945b33ef33c89946a340a238557"}, 695 | {file = "Flask-1.1.2.tar.gz", hash = "sha256:4efa1ae2d7c9865af48986de8aeb8504bf32c7f3d6fdc9353d34b21f4b127060"}, 696 | ] 697 | google = [ 698 | {file = "google-3.0.0-py2.py3-none-any.whl", hash = "sha256:889cf695f84e4ae2c55fbc0cfdaf4c1e729417fa52ab1db0485202ba173e4935"}, 699 | {file = "google-3.0.0.tar.gz", hash = "sha256:143530122ee5130509ad5e989f0512f7cb218b2d4eddbafbad40fd10e8d8ccbe"}, 700 | ] 701 | grpcio = [ 702 | {file = "grpcio-1.31.0-cp27-cp27m-macosx_10_9_x86_64.whl", hash = "sha256:e8c3264b0fd728aadf3f0324471843f65bd3b38872bdab2a477e31ffb685dd5b"}, 703 | {file = "grpcio-1.31.0-cp27-cp27m-manylinux2010_i686.whl", hash = "sha256:5fb0923b16590bac338e92d98c7d8effb3cfad1d2e18c71bf86bde32c49cd6dd"}, 704 | {file = "grpcio-1.31.0-cp27-cp27m-manylinux2010_x86_64.whl", hash = "sha256:58d7121f48cb94535a4cedcce32921d0d0a78563c7372a143dedeec196d1c637"}, 705 | {file = "grpcio-1.31.0-cp27-cp27m-win32.whl", hash = "sha256:ea849210e7362559f326cbe603d5b8d8bb1e556e86a7393b5a8847057de5b084"}, 706 | {file = "grpcio-1.31.0-cp27-cp27m-win_amd64.whl", hash = "sha256:ba3e43cb984399064ffaa3c0997576e46a1e268f9da05f97cd9b272f0b59ee71"}, 707 | {file = "grpcio-1.31.0-cp27-cp27mu-linux_armv7l.whl", hash = "sha256:ebb2ca09fa17537e35508a29dcb05575d4d9401138a68e83d1c605d65e8a1770"}, 708 | {file = "grpcio-1.31.0-cp27-cp27mu-manylinux2010_i686.whl", hash = "sha256:292635f05b6ce33f87116951d0b3d8d330bdfc5cac74f739370d60981e8c256c"}, 709 | {file = "grpcio-1.31.0-cp27-cp27mu-manylinux2010_x86_64.whl", hash = "sha256:92e54ab65e782f227e751c7555918afaba8d1229601687e89b80c2b65d2f6642"}, 710 | {file = "grpcio-1.31.0-cp35-cp35m-linux_armv7l.whl", hash = "sha256:013287f99c99b201aa8a5f6bc7918f616739b9be031db132d9e3b8453e95e151"}, 711 | {file = "grpcio-1.31.0-cp35-cp35m-macosx_10_7_intel.whl", hash = "sha256:d2c5e05c257859febd03f5d81b5015e1946d6bcf475c7bf63ee99cea8ab0d590"}, 712 | {file = "grpcio-1.31.0-cp35-cp35m-manylinux2010_i686.whl", hash = "sha256:c9016ab1eaf4e054099303287195f3746bd4e69f2631d040f9dca43e910a5408"}, 713 | {file = "grpcio-1.31.0-cp35-cp35m-manylinux2010_x86_64.whl", hash = "sha256:baaa036540d7ace433bdf38a3fe5e41cf9f84cdf10a88bac805f678a7ca8ddcc"}, 714 | {file = "grpcio-1.31.0-cp35-cp35m-manylinux2014_i686.whl", hash = "sha256:75e383053dccb610590aa53eed5278db5c09bf498d3b5105ce6c776478f59352"}, 715 | {file = "grpcio-1.31.0-cp35-cp35m-manylinux2014_x86_64.whl", hash = "sha256:739a72abffbd36083ff7adbb862cf1afc1e311c35834bed9c0361d8e68b063e1"}, 716 | {file = "grpcio-1.31.0-cp35-cp35m-win32.whl", hash = "sha256:f04c59d186af3157dc8811114130aaeae92e90a65283733f41de94eed484e1f7"}, 717 | {file = "grpcio-1.31.0-cp35-cp35m-win_amd64.whl", hash = "sha256:ef9fce98b6fe03874c2a6576b02aec1a0df25742cd67d1d7b75a49e30aa74225"}, 718 | {file = "grpcio-1.31.0-cp36-cp36m-linux_armv7l.whl", hash = "sha256:08a9b648dbe8852ff94b73a1c96da126834c3057ba2301d13e8c4adff334c482"}, 719 | {file = "grpcio-1.31.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:c22b19abba63562a5a200e586b5bde39d26c8ec30c92e26d209d81182371693b"}, 720 
| {file = "grpcio-1.31.0-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:0397616355760cd8282ed5ea34d51830ae4cb6613b7e5f66bed3be5d041b8b9a"}, 721 | {file = "grpcio-1.31.0-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:259240aab2603891553e17ad5b2655693df79e02a9b887ff605bdeb2fcd3dcc9"}, 722 | {file = "grpcio-1.31.0-cp36-cp36m-manylinux2014_i686.whl", hash = "sha256:8ca26b489b5dc1e3d31807d329c23d6cb06fe40fbae25b0649b718947936e26a"}, 723 | {file = "grpcio-1.31.0-cp36-cp36m-manylinux2014_x86_64.whl", hash = "sha256:bf39977282a79dc1b2765cc3402c0ada571c29a491caec6ed12c0993c1ec115e"}, 724 | {file = "grpcio-1.31.0-cp36-cp36m-win32.whl", hash = "sha256:f5b0870b733bcb7b6bf05a02035e7aaf20f599d3802b390282d4c2309f825f1d"}, 725 | {file = "grpcio-1.31.0-cp36-cp36m-win_amd64.whl", hash = "sha256:074871a184483d5cd0746fd01e7d214d3ee9d36e67e32a5786b0a21f29fb8304"}, 726 | {file = "grpcio-1.31.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:220c46b1fc9c9a6fcca4caac398f08f0ed43cdd63c45b7458983c4a1575ef6df"}, 727 | {file = "grpcio-1.31.0-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:7a11b1ebb3210f34913b8be6995936bf9ebc541a65ab69e75db5ce1fe5047e8f"}, 728 | {file = "grpcio-1.31.0-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:3c2aa6d7a5e5bf73fdb1715eee777efe06dd39df03383f1cc095b2fdb34883e6"}, 729 | {file = "grpcio-1.31.0-cp37-cp37m-manylinux2014_i686.whl", hash = "sha256:e64bddd09842ef508d72ca354319b0eb126205d951e8ac3128fe9869bd563552"}, 730 | {file = "grpcio-1.31.0-cp37-cp37m-manylinux2014_x86_64.whl", hash = "sha256:5d7faa89992e015d245750ca9ac916c161bbf72777b2c60abc61da3fae41339e"}, 731 | {file = "grpcio-1.31.0-cp37-cp37m-win32.whl", hash = "sha256:43d44548ad6ee738b941abd9f09e3b83a5c13f3e1410321023c3c148ba50e796"}, 732 | {file = "grpcio-1.31.0-cp37-cp37m-win_amd64.whl", hash = "sha256:bf00ab06ea4f89976288f4d6224d4aa120780e30c955d4f85c3214ada29b3ddf"}, 733 | {file = "grpcio-1.31.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:344b50865914cc8e6d023457bffee9a640abb18f75d0f2bb519041961c748da9"}, 734 | {file = "grpcio-1.31.0-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:63ee8e02d04272c3d103f44b4bce5d43ea757dd288673cea212d2f7da27967d2"}, 735 | {file = "grpcio-1.31.0-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:a9a7ae74cb3108e6457cf15532d4c300324b48fbcf3ef290bcd2835745f20510"}, 736 | {file = "grpcio-1.31.0-cp38-cp38-manylinux2014_i686.whl", hash = "sha256:64077e3a9a7cf2f59e6c76d503c8de1f18a76428f41a5b000dc53c48a0b772ff"}, 737 | {file = "grpcio-1.31.0-cp38-cp38-manylinux2014_x86_64.whl", hash = "sha256:8b42f0ac76be07a5fa31117a3388d754ad35ef05e2e34be185ca9ccbcfac2069"}, 738 | {file = "grpcio-1.31.0-cp38-cp38-win32.whl", hash = "sha256:8002a89ea91c0078c15d3c0daf423fd4968946be78f08545e807ea9a5ff8054a"}, 739 | {file = "grpcio-1.31.0-cp38-cp38-win_amd64.whl", hash = "sha256:0fa86ac4452602c79774783aa68979a1a7625ebb7eaabee2b6550b975b9d61e6"}, 740 | {file = "grpcio-1.31.0.tar.gz", hash = "sha256:5043440c45c0a031f387e7f48527541c65d672005fb24cf18ef6857483557d39"}, 741 | ] 742 | h11 = [ 743 | {file = "h11-0.9.0-py2.py3-none-any.whl", hash = "sha256:4bc6d6a1238b7615b266ada57e0618568066f57dd6fa967d1290ec9309b2f2f1"}, 744 | {file = "h11-0.9.0.tar.gz", hash = "sha256:33d4bca7be0fa039f4e84d50ab00531047e53d6ee8ffbc83501ea602c169cae1"}, 745 | ] 746 | httptools = [ 747 | {file = "httptools-0.1.1-cp35-cp35m-macosx_10_13_x86_64.whl", hash = "sha256:a2719e1d7a84bb131c4f1e0cb79705034b48de6ae486eb5297a139d6a3296dce"}, 748 | {file = "httptools-0.1.1-cp35-cp35m-manylinux1_x86_64.whl", hash = 
"sha256:fa3cd71e31436911a44620473e873a256851e1f53dee56669dae403ba41756a4"}, 749 | {file = "httptools-0.1.1-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:86c6acd66765a934e8730bf0e9dfaac6fdcf2a4334212bd4a0a1c78f16475ca6"}, 750 | {file = "httptools-0.1.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:bc3114b9edbca5a1eb7ae7db698c669eb53eb8afbbebdde116c174925260849c"}, 751 | {file = "httptools-0.1.1-cp36-cp36m-win_amd64.whl", hash = "sha256:ac0aa11e99454b6a66989aa2d44bca41d4e0f968e395a0a8f164b401fefe359a"}, 752 | {file = "httptools-0.1.1-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:96da81e1992be8ac2fd5597bf0283d832287e20cb3cfde8996d2b00356d4e17f"}, 753 | {file = "httptools-0.1.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:56b6393c6ac7abe632f2294da53f30d279130a92e8ae39d8d14ee2e1b05ad1f2"}, 754 | {file = "httptools-0.1.1-cp37-cp37m-win_amd64.whl", hash = "sha256:96eb359252aeed57ea5c7b3d79839aaa0382c9d3149f7d24dd7172b1bcecb009"}, 755 | {file = "httptools-0.1.1-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:fea04e126014169384dee76a153d4573d90d0cbd1d12185da089f73c78390437"}, 756 | {file = "httptools-0.1.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:3592e854424ec94bd17dc3e0c96a64e459ec4147e6d53c0a42d0ebcef9cb9c5d"}, 757 | {file = "httptools-0.1.1-cp38-cp38-win_amd64.whl", hash = "sha256:0a4b1b2012b28e68306575ad14ad5e9120b34fccd02a81eb08838d7e3bbb48be"}, 758 | {file = "httptools-0.1.1.tar.gz", hash = "sha256:41b573cf33f64a8f8f3400d0a7faf48e1888582b6f6e02b82b9bd4f0bf7497ce"}, 759 | ] 760 | idna = [ 761 | {file = "idna-2.10-py2.py3-none-any.whl", hash = "sha256:b97d804b1e9b523befed77c48dacec60e6dcb0b5391d57af6a65a312a90648c0"}, 762 | {file = "idna-2.10.tar.gz", hash = "sha256:b307872f855b18632ce0c21c5e45be78c0ea7ae4c15c828c20788b26921eb3f6"}, 763 | ] 764 | importlib-metadata = [ 765 | {file = "importlib_metadata-1.7.0-py2.py3-none-any.whl", hash = "sha256:dc15b2969b4ce36305c51eebe62d418ac7791e9a157911d58bfb1f9ccd8e2070"}, 766 | {file = "importlib_metadata-1.7.0.tar.gz", hash = "sha256:90bb658cdbbf6d1735b6341ce708fc7024a3e14e99ffdc5783edea9f9b077f83"}, 767 | ] 768 | itsdangerous = [ 769 | {file = "itsdangerous-1.1.0-py2.py3-none-any.whl", hash = "sha256:b12271b2047cb23eeb98c8b5622e2e5c5e9abd9784a153e9d8ef9cb4dd09d749"}, 770 | {file = "itsdangerous-1.1.0.tar.gz", hash = "sha256:321b033d07f2a4136d3ec762eac9f16a10ccd60f53c0c91af90217ace7ba1f19"}, 771 | ] 772 | jinja2 = [ 773 | {file = "Jinja2-2.11.2-py2.py3-none-any.whl", hash = "sha256:f0a4641d3cf955324a89c04f3d94663aa4d638abe8f733ecd3582848e1c37035"}, 774 | {file = "Jinja2-2.11.2.tar.gz", hash = "sha256:89aab215427ef59c34ad58735269eb58b1a5808103067f7bb9d5836c651b3bb0"}, 775 | ] 776 | joblib = [ 777 | {file = "joblib-0.16.0-py3-none-any.whl", hash = "sha256:d348c5d4ae31496b2aa060d6d9b787864dd204f9480baaa52d18850cb43e9f49"}, 778 | {file = "joblib-0.16.0.tar.gz", hash = "sha256:8f52bf24c64b608bf0b2563e0e47d6fcf516abc8cfafe10cfd98ad66d94f92d6"}, 779 | ] 780 | jsonschema = [ 781 | {file = "jsonschema-3.2.0-py2.py3-none-any.whl", hash = "sha256:4e5b3cf8216f577bee9ce139cbe72eca3ea4f292ec60928ff24758ce626cd163"}, 782 | {file = "jsonschema-3.2.0.tar.gz", hash = "sha256:c8a85b28d377cc7737e46e2d9f2b4f44ee3c0e1deac6bf46ddefc7187d30797a"}, 783 | ] 784 | markupsafe = [ 785 | {file = "MarkupSafe-1.1.1-cp27-cp27m-macosx_10_6_intel.whl", hash = "sha256:09027a7803a62ca78792ad89403b1b7a73a01c8cb65909cd876f7fcebd79b161"}, 786 | {file = "MarkupSafe-1.1.1-cp27-cp27m-manylinux1_i686.whl", hash = 
"sha256:e249096428b3ae81b08327a63a485ad0878de3fb939049038579ac0ef61e17e7"}, 787 | {file = "MarkupSafe-1.1.1-cp27-cp27m-manylinux1_x86_64.whl", hash = "sha256:500d4957e52ddc3351cabf489e79c91c17f6e0899158447047588650b5e69183"}, 788 | {file = "MarkupSafe-1.1.1-cp27-cp27m-win32.whl", hash = "sha256:b2051432115498d3562c084a49bba65d97cf251f5a331c64a12ee7e04dacc51b"}, 789 | {file = "MarkupSafe-1.1.1-cp27-cp27m-win_amd64.whl", hash = "sha256:98c7086708b163d425c67c7a91bad6e466bb99d797aa64f965e9d25c12111a5e"}, 790 | {file = "MarkupSafe-1.1.1-cp27-cp27mu-manylinux1_i686.whl", hash = "sha256:cd5df75523866410809ca100dc9681e301e3c27567cf498077e8551b6d20e42f"}, 791 | {file = "MarkupSafe-1.1.1-cp27-cp27mu-manylinux1_x86_64.whl", hash = "sha256:43a55c2930bbc139570ac2452adf3d70cdbb3cfe5912c71cdce1c2c6bbd9c5d1"}, 792 | {file = "MarkupSafe-1.1.1-cp34-cp34m-macosx_10_6_intel.whl", hash = "sha256:1027c282dad077d0bae18be6794e6b6b8c91d58ed8a8d89a89d59693b9131db5"}, 793 | {file = "MarkupSafe-1.1.1-cp34-cp34m-manylinux1_i686.whl", hash = "sha256:62fe6c95e3ec8a7fad637b7f3d372c15ec1caa01ab47926cfdf7a75b40e0eac1"}, 794 | {file = "MarkupSafe-1.1.1-cp34-cp34m-manylinux1_x86_64.whl", hash = "sha256:88e5fcfb52ee7b911e8bb6d6aa2fd21fbecc674eadd44118a9cc3863f938e735"}, 795 | {file = "MarkupSafe-1.1.1-cp34-cp34m-win32.whl", hash = "sha256:ade5e387d2ad0d7ebf59146cc00c8044acbd863725f887353a10df825fc8ae21"}, 796 | {file = "MarkupSafe-1.1.1-cp34-cp34m-win_amd64.whl", hash = "sha256:09c4b7f37d6c648cb13f9230d847adf22f8171b1ccc4d5682398e77f40309235"}, 797 | {file = "MarkupSafe-1.1.1-cp35-cp35m-macosx_10_6_intel.whl", hash = "sha256:79855e1c5b8da654cf486b830bd42c06e8780cea587384cf6545b7d9ac013a0b"}, 798 | {file = "MarkupSafe-1.1.1-cp35-cp35m-manylinux1_i686.whl", hash = "sha256:c8716a48d94b06bb3b2524c2b77e055fb313aeb4ea620c8dd03a105574ba704f"}, 799 | {file = "MarkupSafe-1.1.1-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:7c1699dfe0cf8ff607dbdcc1e9b9af1755371f92a68f706051cc8c37d447c905"}, 800 | {file = "MarkupSafe-1.1.1-cp35-cp35m-win32.whl", hash = "sha256:6dd73240d2af64df90aa7c4e7481e23825ea70af4b4922f8ede5b9e35f78a3b1"}, 801 | {file = "MarkupSafe-1.1.1-cp35-cp35m-win_amd64.whl", hash = "sha256:9add70b36c5666a2ed02b43b335fe19002ee5235efd4b8a89bfcf9005bebac0d"}, 802 | {file = "MarkupSafe-1.1.1-cp36-cp36m-macosx_10_6_intel.whl", hash = "sha256:24982cc2533820871eba85ba648cd53d8623687ff11cbb805be4ff7b4c971aff"}, 803 | {file = "MarkupSafe-1.1.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:00bc623926325b26bb9605ae9eae8a215691f33cae5df11ca5424f06f2d1f473"}, 804 | {file = "MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:717ba8fe3ae9cc0006d7c451f0bb265ee07739daf76355d06366154ee68d221e"}, 805 | {file = "MarkupSafe-1.1.1-cp36-cp36m-win32.whl", hash = "sha256:535f6fc4d397c1563d08b88e485c3496cf5784e927af890fb3c3aac7f933ec66"}, 806 | {file = "MarkupSafe-1.1.1-cp36-cp36m-win_amd64.whl", hash = "sha256:b1282f8c00509d99fef04d8ba936b156d419be841854fe901d8ae224c59f0be5"}, 807 | {file = "MarkupSafe-1.1.1-cp37-cp37m-macosx_10_6_intel.whl", hash = "sha256:8defac2f2ccd6805ebf65f5eeb132adcf2ab57aa11fdf4c0dd5169a004710e7d"}, 808 | {file = "MarkupSafe-1.1.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:46c99d2de99945ec5cb54f23c8cd5689f6d7177305ebff350a58ce5f8de1669e"}, 809 | {file = "MarkupSafe-1.1.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:ba59edeaa2fc6114428f1637ffff42da1e311e29382d81b339c1817d37ec93c6"}, 810 | {file = "MarkupSafe-1.1.1-cp37-cp37m-win32.whl", hash = 
"sha256:b00c1de48212e4cc9603895652c5c410df699856a2853135b3967591e4beebc2"}, 811 | {file = "MarkupSafe-1.1.1-cp37-cp37m-win_amd64.whl", hash = "sha256:9bf40443012702a1d2070043cb6291650a0841ece432556f784f004937f0f32c"}, 812 | {file = "MarkupSafe-1.1.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:6788b695d50a51edb699cb55e35487e430fa21f1ed838122d722e0ff0ac5ba15"}, 813 | {file = "MarkupSafe-1.1.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:cdb132fc825c38e1aeec2c8aa9338310d29d337bebbd7baa06889d09a60a1fa2"}, 814 | {file = "MarkupSafe-1.1.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:13d3144e1e340870b25e7b10b98d779608c02016d5184cfb9927a9f10c689f42"}, 815 | {file = "MarkupSafe-1.1.1-cp38-cp38-win32.whl", hash = "sha256:596510de112c685489095da617b5bcbbac7dd6384aeebeda4df6025d0256a81b"}, 816 | {file = "MarkupSafe-1.1.1-cp38-cp38-win_amd64.whl", hash = "sha256:e8313f01ba26fbbe36c7be1966a7b7424942f670f38e666995b88d012765b9be"}, 817 | {file = "MarkupSafe-1.1.1.tar.gz", hash = "sha256:29872e92839765e546828bb7754a68c418d927cd064fd4708fab9fe9c8bb116b"}, 818 | ] 819 | msgpack = [ 820 | {file = "msgpack-1.0.0-cp35-cp35m-manylinux1_i686.whl", hash = "sha256:cec8bf10981ed70998d98431cd814db0ecf3384e6b113366e7f36af71a0fca08"}, 821 | {file = "msgpack-1.0.0-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:aa5c057eab4f40ec47ea6f5a9825846be2ff6bf34102c560bad5cad5a677c5be"}, 822 | {file = "msgpack-1.0.0-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:4233b7f86c1208190c78a525cd3828ca1623359ef48f78a6fea4b91bb995775a"}, 823 | {file = "msgpack-1.0.0-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:b3758dfd3423e358bbb18a7cccd1c74228dffa7a697e5be6cb9535de625c0dbf"}, 824 | {file = "msgpack-1.0.0-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:25b3bc3190f3d9d965b818123b7752c5dfb953f0d774b454fd206c18fe384fb8"}, 825 | {file = "msgpack-1.0.0-cp36-cp36m-win32.whl", hash = "sha256:e7bbdd8e2b277b77782f3ce34734b0dfde6cbe94ddb74de8d733d603c7f9e2b1"}, 826 | {file = "msgpack-1.0.0-cp36-cp36m-win_amd64.whl", hash = "sha256:5dba6d074fac9b24f29aaf1d2d032306c27f04187651511257e7831733293ec2"}, 827 | {file = "msgpack-1.0.0-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:908944e3f038bca67fcfedb7845c4a257c7749bf9818632586b53bcf06ba4b97"}, 828 | {file = "msgpack-1.0.0-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:db685187a415f51d6b937257474ca72199f393dad89534ebbdd7d7a3b000080e"}, 829 | {file = "msgpack-1.0.0-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:ea41c9219c597f1d2bf6b374d951d310d58684b5de9dc4bd2976db9e1e22c140"}, 830 | {file = "msgpack-1.0.0-cp37-cp37m-win32.whl", hash = "sha256:e35b051077fc2f3ce12e7c6a34cf309680c63a842db3a0616ea6ed25ad20d272"}, 831 | {file = "msgpack-1.0.0-cp37-cp37m-win_amd64.whl", hash = "sha256:5bea44181fc8e18eed1d0cd76e355073f00ce232ff9653a0ae88cb7d9e643322"}, 832 | {file = "msgpack-1.0.0-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:c901e8058dd6653307906c5f157f26ed09eb94a850dddd989621098d347926ab"}, 833 | {file = "msgpack-1.0.0-cp38-cp38-manylinux1_i686.whl", hash = "sha256:271b489499a43af001a2e42f42d876bb98ccaa7e20512ff37ca78c8e12e68f84"}, 834 | {file = "msgpack-1.0.0-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:7a22c965588baeb07242cb561b63f309db27a07382825fc98aecaf0827c1538e"}, 835 | {file = "msgpack-1.0.0-cp38-cp38-win32.whl", hash = "sha256:002a0d813e1f7b60da599bdf969e632074f9eec1b96cbed8fb0973a63160a408"}, 836 | {file = "msgpack-1.0.0-cp38-cp38-win_amd64.whl", hash = "sha256:39c54fdebf5fa4dda733369012c59e7d085ebdfe35b6cf648f09d16708f1be5d"}, 837 | {file = 
"msgpack-1.0.0.tar.gz", hash = "sha256:9534d5cc480d4aff720233411a1f765be90885750b07df772380b34c10ecb5c0"}, 838 | ] 839 | multidict = [ 840 | {file = "multidict-4.7.6-cp35-cp35m-macosx_10_14_x86_64.whl", hash = "sha256:275ca32383bc5d1894b6975bb4ca6a7ff16ab76fa622967625baeebcf8079000"}, 841 | {file = "multidict-4.7.6-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:1ece5a3369835c20ed57adadc663400b5525904e53bae59ec854a5d36b39b21a"}, 842 | {file = "multidict-4.7.6-cp35-cp35m-win32.whl", hash = "sha256:5141c13374e6b25fe6bf092052ab55c0c03d21bd66c94a0e3ae371d3e4d865a5"}, 843 | {file = "multidict-4.7.6-cp35-cp35m-win_amd64.whl", hash = "sha256:9456e90649005ad40558f4cf51dbb842e32807df75146c6d940b6f5abb4a78f3"}, 844 | {file = "multidict-4.7.6-cp36-cp36m-macosx_10_14_x86_64.whl", hash = "sha256:e0d072ae0f2a179c375f67e3da300b47e1a83293c554450b29c900e50afaae87"}, 845 | {file = "multidict-4.7.6-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:3750f2205b800aac4bb03b5ae48025a64e474d2c6cc79547988ba1d4122a09e2"}, 846 | {file = "multidict-4.7.6-cp36-cp36m-win32.whl", hash = "sha256:f07acae137b71af3bb548bd8da720956a3bc9f9a0b87733e0899226a2317aeb7"}, 847 | {file = "multidict-4.7.6-cp36-cp36m-win_amd64.whl", hash = "sha256:6513728873f4326999429a8b00fc7ceddb2509b01d5fd3f3be7881a257b8d463"}, 848 | {file = "multidict-4.7.6-cp37-cp37m-macosx_10_14_x86_64.whl", hash = "sha256:feed85993dbdb1dbc29102f50bca65bdc68f2c0c8d352468c25b54874f23c39d"}, 849 | {file = "multidict-4.7.6-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:fcfbb44c59af3f8ea984de67ec7c306f618a3ec771c2843804069917a8f2e255"}, 850 | {file = "multidict-4.7.6-cp37-cp37m-win32.whl", hash = "sha256:4538273208e7294b2659b1602490f4ed3ab1c8cf9dbdd817e0e9db8e64be2507"}, 851 | {file = "multidict-4.7.6-cp37-cp37m-win_amd64.whl", hash = "sha256:d14842362ed4cf63751648e7672f7174c9818459d169231d03c56e84daf90b7c"}, 852 | {file = "multidict-4.7.6-cp38-cp38-macosx_10_14_x86_64.whl", hash = "sha256:c026fe9a05130e44157b98fea3ab12969e5b60691a276150db9eda71710cd10b"}, 853 | {file = "multidict-4.7.6-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:51a4d210404ac61d32dada00a50ea7ba412e6ea945bbe992e4d7a595276d2ec7"}, 854 | {file = "multidict-4.7.6-cp38-cp38-win32.whl", hash = "sha256:5cf311a0f5ef80fe73e4f4c0f0998ec08f954a6ec72b746f3c179e37de1d210d"}, 855 | {file = "multidict-4.7.6-cp38-cp38-win_amd64.whl", hash = "sha256:7388d2ef3c55a8ba80da62ecfafa06a1c097c18032a501ffd4cabbc52d7f2b19"}, 856 | {file = "multidict-4.7.6.tar.gz", hash = "sha256:fbb77a75e529021e7c4a8d4e823d88ef4d23674a202be4f5addffc72cbb91430"}, 857 | ] 858 | numpy = [ 859 | {file = "numpy-1.19.1-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:b1cca51512299841bf69add3b75361779962f9cee7d9ee3bb446d5982e925b69"}, 860 | {file = "numpy-1.19.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:c9591886fc9cbe5532d5df85cb8e0cc3b44ba8ce4367bd4cf1b93dc19713da72"}, 861 | {file = "numpy-1.19.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:cf1347450c0b7644ea142712619533553f02ef23f92f781312f6a3553d031fc7"}, 862 | {file = "numpy-1.19.1-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:ed8a311493cf5480a2ebc597d1e177231984c818a86875126cfd004241a73c3e"}, 863 | {file = "numpy-1.19.1-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:3673c8b2b29077f1b7b3a848794f8e11f401ba0b71c49fbd26fb40b71788b132"}, 864 | {file = "numpy-1.19.1-cp36-cp36m-manylinux2014_aarch64.whl", hash = "sha256:56ef7f56470c24bb67fb43dae442e946a6ce172f97c69f8d067ff8550cf782ff"}, 865 | {file = "numpy-1.19.1-cp36-cp36m-win32.whl", hash = 
"sha256:aaf42a04b472d12515debc621c31cf16c215e332242e7a9f56403d814c744624"}, 866 | {file = "numpy-1.19.1-cp36-cp36m-win_amd64.whl", hash = "sha256:082f8d4dd69b6b688f64f509b91d482362124986d98dc7dc5f5e9f9b9c3bb983"}, 867 | {file = "numpy-1.19.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:e4f6d3c53911a9d103d8ec9518190e52a8b945bab021745af4939cfc7c0d4a9e"}, 868 | {file = "numpy-1.19.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:5b6885c12784a27e957294b60f97e8b5b4174c7504665333c5e94fbf41ae5d6a"}, 869 | {file = "numpy-1.19.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:1bc0145999e8cb8aed9d4e65dd8b139adf1919e521177f198529687dbf613065"}, 870 | {file = "numpy-1.19.1-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:5a936fd51049541d86ccdeef2833cc89a18e4d3808fe58a8abeb802665c5af93"}, 871 | {file = "numpy-1.19.1-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:ef71a1d4fd4858596ae80ad1ec76404ad29701f8ca7cdcebc50300178db14dfc"}, 872 | {file = "numpy-1.19.1-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:b9792b0ac0130b277536ab8944e7b754c69560dac0415dd4b2dbd16b902c8954"}, 873 | {file = "numpy-1.19.1-cp37-cp37m-win32.whl", hash = "sha256:b12e639378c741add21fbffd16ba5ad25c0a1a17cf2b6fe4288feeb65144f35b"}, 874 | {file = "numpy-1.19.1-cp37-cp37m-win_amd64.whl", hash = "sha256:8343bf67c72e09cfabfab55ad4a43ce3f6bf6e6ced7acf70f45ded9ebb425055"}, 875 | {file = "numpy-1.19.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:e45f8e981a0ab47103181773cc0a54e650b2aef8c7b6cd07405d0fa8d869444a"}, 876 | {file = "numpy-1.19.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:667c07063940e934287993366ad5f56766bc009017b4a0fe91dbd07960d0aba7"}, 877 | {file = "numpy-1.19.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:480fdd4dbda4dd6b638d3863da3be82873bba6d32d1fc12ea1b8486ac7b8d129"}, 878 | {file = "numpy-1.19.1-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:935c27ae2760c21cd7354402546f6be21d3d0c806fffe967f745d5f2de5005a7"}, 879 | {file = "numpy-1.19.1-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:309cbcfaa103fc9a33ec16d2d62569d541b79f828c382556ff072442226d1968"}, 880 | {file = "numpy-1.19.1-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:7ed448ff4eaffeb01094959b19cbaf998ecdee9ef9932381420d514e446601cd"}, 881 | {file = "numpy-1.19.1-cp38-cp38-win32.whl", hash = "sha256:de8b4a9b56255797cbddb93281ed92acbc510fb7b15df3f01bd28f46ebc4edae"}, 882 | {file = "numpy-1.19.1-cp38-cp38-win_amd64.whl", hash = "sha256:92feb989b47f83ebef246adabc7ff3b9a59ac30601c3f6819f8913458610bdcc"}, 883 | {file = "numpy-1.19.1-pp36-pypy36_pp73-manylinux2010_x86_64.whl", hash = "sha256:e1b1dc0372f530f26a03578ac75d5e51b3868b9b76cd2facba4c9ee0eb252ab1"}, 884 | {file = "numpy-1.19.1.zip", hash = "sha256:b8456987b637232602ceb4d663cb34106f7eb780e247d51a260b84760fd8f491"}, 885 | ] 886 | pandas = [ 887 | {file = "pandas-1.1.1-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:8c9ec12c480c4d915e23ee9c8a2d8eba8509986f35f307771045c1294a2e5b73"}, 888 | {file = "pandas-1.1.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:e4b6c98f45695799990da328e6fd7d6187be32752ed64c2f22326ad66762d179"}, 889 | {file = "pandas-1.1.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:16ae070c47474008769fc443ac765ffd88c3506b4a82966e7a605592978896f9"}, 890 | {file = "pandas-1.1.1-cp36-cp36m-win32.whl", hash = "sha256:88930c74f69e97b17703600233c0eaf1f4f4dd10c14633d522724c5c1b963ec4"}, 891 | {file = "pandas-1.1.1-cp36-cp36m-win_amd64.whl", hash = "sha256:fe6f1623376b616e03d51f0dd95afd862cf9a33c18cf55ce0ed4bbe1c4444391"}, 892 | {file = 
"pandas-1.1.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:a81c4bf9c59010aa3efddbb6b9fc84a9b76dc0b4da2c2c2d50f06a9ef6ac0004"}, 893 | {file = "pandas-1.1.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:1acc2bd7fc95e5408a4456897c2c2a1ae7c6acefe108d90479ab6d98d34fcc3d"}, 894 | {file = "pandas-1.1.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:84c101d0f7bbf0d9f1be9a2f29f6fcc12415442558d067164e50a56edfb732b4"}, 895 | {file = "pandas-1.1.1-cp37-cp37m-win32.whl", hash = "sha256:391db82ebeb886143b96b9c6c6166686c9a272d00020e4e39ad63b792542d9e2"}, 896 | {file = "pandas-1.1.1-cp37-cp37m-win_amd64.whl", hash = "sha256:0366150fe8ee37ef89a45d3093e05026b5f895e42bbce3902ce3b6427f1b8471"}, 897 | {file = "pandas-1.1.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:d9644ac996149b2a51325d48d77e25c911e01aa6d39dc1b64be679cd71f683ec"}, 898 | {file = "pandas-1.1.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:41675323d4fcdd15abde068607cad150dfe17f7d32290ee128e5fea98442bd09"}, 899 | {file = "pandas-1.1.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:0246c67cbaaaac8d25fed8d4cf2d8897bd858f0e540e8528a75281cee9ac516d"}, 900 | {file = "pandas-1.1.1-cp38-cp38-win32.whl", hash = "sha256:01b1e536eb960822c5e6b58357cad8c4b492a336f4a5630bf0b598566462a578"}, 901 | {file = "pandas-1.1.1-cp38-cp38-win_amd64.whl", hash = "sha256:57c5f6be49259cde8e6f71c2bf240a26b071569cabc04c751358495d09419e56"}, 902 | {file = "pandas-1.1.1.tar.gz", hash = "sha256:53328284a7bb046e2e885fd1b8c078bd896d7fc4575b915d4936f54984a2ba67"}, 903 | ] 904 | prettyprinter = [ 905 | {file = "prettyprinter-0.18.0-py2.py3-none-any.whl", hash = "sha256:358a58f276cb312e3ca29d7a7f244c91e4e0bda7848249d30e4f36d2eb58b67c"}, 906 | {file = "prettyprinter-0.18.0.tar.gz", hash = "sha256:9fe5da7ec53510881dd35d7a5c677ba45f34cfe6a8e78d1abd20652cf82139a8"}, 907 | ] 908 | protobuf = [ 909 | {file = "protobuf-3.13.0-cp27-cp27m-macosx_10_9_x86_64.whl", hash = "sha256:9c2e63c1743cba12737169c447374fab3dfeb18111a460a8c1a000e35836b18c"}, 910 | {file = "protobuf-3.13.0-cp27-cp27mu-manylinux1_x86_64.whl", hash = "sha256:1e834076dfef9e585815757a2c7e4560c7ccc5962b9d09f831214c693a91b463"}, 911 | {file = "protobuf-3.13.0-cp35-cp35m-macosx_10_9_intel.whl", hash = "sha256:df3932e1834a64b46ebc262e951cd82c3cf0fa936a154f0a42231140d8237060"}, 912 | {file = "protobuf-3.13.0-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:8c35bcbed1c0d29b127c886790e9d37e845ffc2725cc1db4bd06d70f4e8359f4"}, 913 | {file = "protobuf-3.13.0-cp35-cp35m-win32.whl", hash = "sha256:339c3a003e3c797bc84499fa32e0aac83c768e67b3de4a5d7a5a9aa3b0da634c"}, 914 | {file = "protobuf-3.13.0-cp35-cp35m-win_amd64.whl", hash = "sha256:361acd76f0ad38c6e38f14d08775514fbd241316cce08deb2ce914c7dfa1184a"}, 915 | {file = "protobuf-3.13.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:9edfdc679a3669988ec55a989ff62449f670dfa7018df6ad7f04e8dbacb10630"}, 916 | {file = "protobuf-3.13.0-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:5db9d3e12b6ede5e601b8d8684a7f9d90581882925c96acf8495957b4f1b204b"}, 917 | {file = "protobuf-3.13.0-cp36-cp36m-win32.whl", hash = "sha256:c8abd7605185836f6f11f97b21200f8a864f9cb078a193fe3c9e235711d3ff1e"}, 918 | {file = "protobuf-3.13.0-cp36-cp36m-win_amd64.whl", hash = "sha256:4d1174c9ed303070ad59553f435846a2f877598f59f9afc1b89757bdf846f2a7"}, 919 | {file = "protobuf-3.13.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:0bba42f439bf45c0f600c3c5993666fcb88e8441d011fad80a11df6f324eef33"}, 920 | {file = "protobuf-3.13.0-cp37-cp37m-manylinux1_x86_64.whl", hash = 
"sha256:c0c5ab9c4b1eac0a9b838f1e46038c3175a95b0f2d944385884af72876bd6bc7"}, 921 | {file = "protobuf-3.13.0-cp37-cp37m-win32.whl", hash = "sha256:f68eb9d03c7d84bd01c790948320b768de8559761897763731294e3bc316decb"}, 922 | {file = "protobuf-3.13.0-cp37-cp37m-win_amd64.whl", hash = "sha256:91c2d897da84c62816e2f473ece60ebfeab024a16c1751aaf31100127ccd93ec"}, 923 | {file = "protobuf-3.13.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:3dee442884a18c16d023e52e32dd34a8930a889e511af493f6dc7d4d9bf12e4f"}, 924 | {file = "protobuf-3.13.0-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:e7662437ca1e0c51b93cadb988f9b353fa6b8013c0385d63a70c8a77d84da5f9"}, 925 | {file = "protobuf-3.13.0-py2.py3-none-any.whl", hash = "sha256:d69697acac76d9f250ab745b46c725edf3e98ac24763990b24d58c16c642947a"}, 926 | {file = "protobuf-3.13.0.tar.gz", hash = "sha256:6a82e0c8bb2bf58f606040cc5814e07715b2094caeba281e2e7d0b0e2e397db5"}, 927 | ] 928 | py-spy = [ 929 | {file = "py_spy-0.3.3-py2.py3-none-macosx_10_7_x86_64.whl", hash = "sha256:ac0ef13fc2bd67593be1d3fcd1bbee93a6324715b3c2944218e50eadb966c46e"}, 930 | {file = "py_spy-0.3.3-py2.py3-none-manylinux1_i686.whl", hash = "sha256:72eb5c0495b050e6e9424ea373ff7245a01554e98f218d89f8f979c0cd762681"}, 931 | {file = "py_spy-0.3.3-py2.py3-none-manylinux1_x86_64.whl", hash = "sha256:e9d6946741c267fe82aef18d2fc1e095a90a83fb5f3d9fc89b0f20a39613a639"}, 932 | {file = "py_spy-0.3.3-py2.py3-none-win_amd64.whl", hash = "sha256:a165d444cfbf24cdcdfe8cdaa858a179e1fae43adcb912e5efb3151362f67aa8"}, 933 | ] 934 | pygments = [ 935 | {file = "Pygments-2.6.1-py3-none-any.whl", hash = "sha256:ff7a40b4860b727ab48fad6360eb351cc1b33cbf9b15a0f689ca5353e9463324"}, 936 | {file = "Pygments-2.6.1.tar.gz", hash = "sha256:647344a061c249a3b74e230c739f434d7ea4d8b1d5f3721bc0f3558049b38f44"}, 937 | ] 938 | pyrsistent = [ 939 | {file = "pyrsistent-0.16.0.tar.gz", hash = "sha256:28669905fe725965daa16184933676547c5bb40a5153055a8dee2a4bd7933ad3"}, 940 | ] 941 | python-dateutil = [ 942 | {file = "python-dateutil-2.8.1.tar.gz", hash = "sha256:73ebfe9dbf22e832286dafa60473e4cd239f8592f699aa5adaf10050e6e1823c"}, 943 | {file = "python_dateutil-2.8.1-py2.py3-none-any.whl", hash = "sha256:75bb3f31ea686f1197762692a9ee6a7550b59fc6ca3a1f4b5d7e32fb98e2da2a"}, 944 | ] 945 | pytz = [ 946 | {file = "pytz-2020.1-py2.py3-none-any.whl", hash = "sha256:a494d53b6d39c3c6e44c3bec237336e14305e4f29bbf800b599253057fbb79ed"}, 947 | {file = "pytz-2020.1.tar.gz", hash = "sha256:c35965d010ce31b23eeb663ed3cc8c906275d6be1a34393a1d73a41febf4a048"}, 948 | ] 949 | pyyaml = [ 950 | {file = "PyYAML-5.3.1-cp27-cp27m-win32.whl", hash = "sha256:74809a57b329d6cc0fdccee6318f44b9b8649961fa73144a98735b0aaf029f1f"}, 951 | {file = "PyYAML-5.3.1-cp27-cp27m-win_amd64.whl", hash = "sha256:240097ff019d7c70a4922b6869d8a86407758333f02203e0fc6ff79c5dcede76"}, 952 | {file = "PyYAML-5.3.1-cp35-cp35m-win32.whl", hash = "sha256:4f4b913ca1a7319b33cfb1369e91e50354d6f07a135f3b901aca02aa95940bd2"}, 953 | {file = "PyYAML-5.3.1-cp35-cp35m-win_amd64.whl", hash = "sha256:cc8955cfbfc7a115fa81d85284ee61147059a753344bc51098f3ccd69b0d7e0c"}, 954 | {file = "PyYAML-5.3.1-cp36-cp36m-win32.whl", hash = "sha256:7739fc0fa8205b3ee8808aea45e968bc90082c10aef6ea95e855e10abf4a37b2"}, 955 | {file = "PyYAML-5.3.1-cp36-cp36m-win_amd64.whl", hash = "sha256:69f00dca373f240f842b2931fb2c7e14ddbacd1397d57157a9b005a6a9942648"}, 956 | {file = "PyYAML-5.3.1-cp37-cp37m-win32.whl", hash = "sha256:d13155f591e6fcc1ec3b30685d50bf0711574e2c0dfffd7644babf8b5102ca1a"}, 957 | {file = 
"PyYAML-5.3.1-cp37-cp37m-win_amd64.whl", hash = "sha256:73f099454b799e05e5ab51423c7bcf361c58d3206fa7b0d555426b1f4d9a3eaf"}, 958 | {file = "PyYAML-5.3.1-cp38-cp38-win32.whl", hash = "sha256:06a0d7ba600ce0b2d2fe2e78453a470b5a6e000a985dd4a4e54e436cc36b0e97"}, 959 | {file = "PyYAML-5.3.1-cp38-cp38-win_amd64.whl", hash = "sha256:95f71d2af0ff4227885f7a6605c37fd53d3a106fcab511b8860ecca9fcf400ee"}, 960 | {file = "PyYAML-5.3.1.tar.gz", hash = "sha256:b8eac752c5e14d3eca0e6dd9199cd627518cb5ec06add0de9d32baeee6fe645d"}, 961 | ] 962 | ray = [ 963 | {file = "ray-0.8.6-cp35-cp35m-macosx_10_13_intel.whl", hash = "sha256:28bddf09debbc82ff19e1523ada131e7beaf2170f2d88cea72601ff11ff71757"}, 964 | {file = "ray-0.8.6-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:3a282f770855a56d3ede321ccb6e4a4b48eccb1daead9aa08c20c457ec186d27"}, 965 | {file = "ray-0.8.6-cp36-cp36m-macosx_10_13_intel.whl", hash = "sha256:fdd4b994ffa894dfe582107b230d515c05ca41ecb2152f28e6c893d4edcc8369"}, 966 | {file = "ray-0.8.6-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:dfd01dec0eddd446c1a22f979bf7ced185149f92d761d742592b4fc887dc439c"}, 967 | {file = "ray-0.8.6-cp36-cp36m-win_amd64.whl", hash = "sha256:aaf43089881dc203c56c2bec499c9b425a989894bf2b39d767a5c0825a4a5af2"}, 968 | {file = "ray-0.8.6-cp37-cp37m-macosx_10_13_intel.whl", hash = "sha256:e79bb29c6d93bc24253a75196a86201471f8ca461102e957cee71ec999fb06cf"}, 969 | {file = "ray-0.8.6-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:c007b1e87ef6af7ac684ecb9c2be27bcc5cd881b89ebdc3ea2d99047ffc1eeee"}, 970 | {file = "ray-0.8.6-cp37-cp37m-win_amd64.whl", hash = "sha256:402ed1be4363cc4494b7022524158f862f3e052d249497ac1145b4452ff08ebe"}, 971 | {file = "ray-0.8.6-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:9124994117fe26d12c0873737b19fdf6d80b7d283d53d85d1662bb6c98a0b418"}, 972 | {file = "ray-0.8.6-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:efaf70097d6e61d0f3d05acb59b6f3a627a3d8a326f1c5d212c93a7231e41d67"}, 973 | {file = "ray-0.8.6-cp38-cp38-win_amd64.whl", hash = "sha256:dbf79b7c4d7834bc5c506c397b8f03ecfe9b03e3e4611fe37d725a6e6ccb5649"}, 974 | ] 975 | redis = [ 976 | {file = "redis-3.4.1-py2.py3-none-any.whl", hash = "sha256:b205cffd05ebfd0a468db74f0eedbff8df1a7bfc47521516ade4692991bb0833"}, 977 | {file = "redis-3.4.1.tar.gz", hash = "sha256:0dcfb335921b88a850d461dc255ff4708294943322bd55de6cfd68972490ca1f"}, 978 | ] 979 | requests = [ 980 | {file = "requests-2.24.0-py2.py3-none-any.whl", hash = "sha256:fe75cc94a9443b9246fc7049224f75604b113c36acb93f87b80ed42c44cbb898"}, 981 | {file = "requests-2.24.0.tar.gz", hash = "sha256:b3559a131db72c33ee969480840fff4bb6dd111de7dd27c8ee1f820f4f00231b"}, 982 | ] 983 | scikit-learn = [ 984 | {file = "scikit-learn-0.23.2.tar.gz", hash = "sha256:20766f515e6cd6f954554387dfae705d93c7b544ec0e6c6a5d8e006f6f7ef480"}, 985 | {file = "scikit_learn-0.23.2-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:98508723f44c61896a4e15894b2016762a55555fbf09365a0bb1870ecbd442de"}, 986 | {file = "scikit_learn-0.23.2-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:a64817b050efd50f9abcfd311870073e500ae11b299683a519fbb52d85e08d25"}, 987 | {file = "scikit_learn-0.23.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:daf276c465c38ef736a79bd79fc80a249f746bcbcae50c40945428f7ece074f8"}, 988 | {file = "scikit_learn-0.23.2-cp36-cp36m-win32.whl", hash = "sha256:cb3e76380312e1f86abd20340ab1d5b3cc46a26f6593d3c33c9ea3e4c7134028"}, 989 | {file = "scikit_learn-0.23.2-cp36-cp36m-win_amd64.whl", hash = "sha256:0a127cc70990d4c15b1019680bfedc7fec6c23d14d3719fdf9b64b22d37cdeca"}, 
990 | {file = "scikit_learn-0.23.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:2aa95c2f17d2f80534156215c87bee72b6aa314a7f8b8fe92a2d71f47280570d"}, 991 | {file = "scikit_learn-0.23.2-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:6c28a1d00aae7c3c9568f61aafeaad813f0f01c729bee4fd9479e2132b215c1d"}, 992 | {file = "scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:da8e7c302003dd765d92a5616678e591f347460ac7b53e53d667be7dfe6d1b10"}, 993 | {file = "scikit_learn-0.23.2-cp37-cp37m-win32.whl", hash = "sha256:d9a1ce5f099f29c7c33181cc4386660e0ba891b21a60dc036bf369e3a3ee3aec"}, 994 | {file = "scikit_learn-0.23.2-cp37-cp37m-win_amd64.whl", hash = "sha256:914ac2b45a058d3f1338d7736200f7f3b094857758895f8667be8a81ff443b5b"}, 995 | {file = "scikit_learn-0.23.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:7671bbeddd7f4f9a6968f3b5442dac5f22bf1ba06709ef888cc9132ad354a9ab"}, 996 | {file = "scikit_learn-0.23.2-cp38-cp38-manylinux1_i686.whl", hash = "sha256:d0dcaa54263307075cb93d0bee3ceb02821093b1b3d25f66021987d305d01dce"}, 997 | {file = "scikit_learn-0.23.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:5ce7a8021c9defc2b75620571b350acc4a7d9763c25b7593621ef50f3bd019a2"}, 998 | {file = "scikit_learn-0.23.2-cp38-cp38-win32.whl", hash = "sha256:0d39748e7c9669ba648acf40fb3ce96b8a07b240db6888563a7cb76e05e0d9cc"}, 999 | {file = "scikit_learn-0.23.2-cp38-cp38-win_amd64.whl", hash = "sha256:1b8a391de95f6285a2f9adffb7db0892718950954b7149a70c783dc848f104ea"}, 1000 | ] 1001 | scipy = [ 1002 | {file = "scipy-1.5.2-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:cca9fce15109a36a0a9f9cfc64f870f1c140cb235ddf27fe0328e6afb44dfed0"}, 1003 | {file = "scipy-1.5.2-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:1c7564a4810c1cd77fcdee7fa726d7d39d4e2695ad252d7c86c3ea9d85b7fb8f"}, 1004 | {file = "scipy-1.5.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:07e52b316b40a4f001667d1ad4eb5f2318738de34597bd91537851365b6c61f1"}, 1005 | {file = "scipy-1.5.2-cp36-cp36m-win32.whl", hash = "sha256:d56b10d8ed72ec1be76bf10508446df60954f08a41c2d40778bc29a3a9ad9bce"}, 1006 | {file = "scipy-1.5.2-cp36-cp36m-win_amd64.whl", hash = "sha256:8e28e74b97fc8d6aa0454989db3b5d36fc27e69cef39a7ee5eaf8174ca1123cb"}, 1007 | {file = "scipy-1.5.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:6e86c873fe1335d88b7a4bfa09d021f27a9e753758fd75f3f92d714aa4093768"}, 1008 | {file = "scipy-1.5.2-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:a0afbb967fd2c98efad5f4c24439a640d39463282040a88e8e928db647d8ac3d"}, 1009 | {file = "scipy-1.5.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:eecf40fa87eeda53e8e11d265ff2254729d04000cd40bae648e76ff268885d66"}, 1010 | {file = "scipy-1.5.2-cp37-cp37m-win32.whl", hash = "sha256:315aa2165aca31375f4e26c230188db192ed901761390be908c9b21d8b07df62"}, 1011 | {file = "scipy-1.5.2-cp37-cp37m-win_amd64.whl", hash = "sha256:ec5fe57e46828d034775b00cd625c4a7b5c7d2e354c3b258d820c6c72212a6ec"}, 1012 | {file = "scipy-1.5.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:fc98f3eac993b9bfdd392e675dfe19850cc8c7246a8fd2b42443e506344be7d9"}, 1013 | {file = "scipy-1.5.2-cp38-cp38-manylinux1_i686.whl", hash = "sha256:a785409c0fa51764766840185a34f96a0a93527a0ff0230484d33a8ed085c8f8"}, 1014 | {file = "scipy-1.5.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:0a0e9a4e58a4734c2eba917f834b25b7e3b6dc333901ce7784fd31aefbd37b2f"}, 1015 | {file = "scipy-1.5.2-cp38-cp38-win32.whl", hash = "sha256:dac09281a0eacd59974e24525a3bc90fa39b4e95177e638a31b14db60d3fa806"}, 1016 | {file = "scipy-1.5.2-cp38-cp38-win_amd64.whl", hash = 
"sha256:92eb04041d371fea828858e4fff182453c25ae3eaa8782d9b6c32b25857d23bc"}, 1017 | {file = "scipy-1.5.2.tar.gz", hash = "sha256:066c513d90eb3fd7567a9e150828d39111ebd88d3e924cdfc9f8ce19ab6f90c9"}, 1018 | ] 1019 | shap = [ 1020 | {file = "shap-0.35.0-cp35-cp35m-win_amd64.whl", hash = "sha256:3f8d5e1bfc2f0e8b442370e4b8253430f7078fab21b7769e89969b1a0194c3f9"}, 1021 | {file = "shap-0.35.0-cp36-cp36m-win_amd64.whl", hash = "sha256:ef9410e940396cb451039f7a1d639086e4e4e7c742faeb3fd8734e7b71cdf3d2"}, 1022 | {file = "shap-0.35.0.tar.gz", hash = "sha256:6b9a2a3636918b9cdce4d3c599786b38353fbdca49147b5407a75aee398b1018"}, 1023 | ] 1024 | six = [ 1025 | {file = "six-1.15.0-py2.py3-none-any.whl", hash = "sha256:8b74bedcbbbaca38ff6d7491d76f2b06b3592611af620f8426e82dddb04a5ced"}, 1026 | {file = "six-1.15.0.tar.gz", hash = "sha256:30639c035cdb23534cd4aa2dd52c3bf48f06e5f4a941509c8bafd8ce11080259"}, 1027 | ] 1028 | soupsieve = [ 1029 | {file = "soupsieve-1.9.6-py2.py3-none-any.whl", hash = "sha256:feb1e937fa26a69e08436aad4a9037cd7e1d4c7212909502ba30701247ff8abd"}, 1030 | {file = "soupsieve-1.9.6.tar.gz", hash = "sha256:7985bacc98c34923a439967c1a602dc4f1e15f923b6fcf02344184f86cc7efaa"}, 1031 | ] 1032 | threadpoolctl = [ 1033 | {file = "threadpoolctl-2.1.0-py3-none-any.whl", hash = "sha256:38b74ca20ff3bb42caca8b00055111d74159ee95c4370882bbff2b93d24da725"}, 1034 | {file = "threadpoolctl-2.1.0.tar.gz", hash = "sha256:ddc57c96a38beb63db45d6c159b5ab07b6bced12c45a1f07b2b92f272aebfa6b"}, 1035 | ] 1036 | tqdm = [ 1037 | {file = "tqdm-4.48.2-py2.py3-none-any.whl", hash = "sha256:1a336d2b829be50e46b84668691e0a2719f26c97c62846298dd5ae2937e4d5cf"}, 1038 | {file = "tqdm-4.48.2.tar.gz", hash = "sha256:564d632ea2b9cb52979f7956e093e831c28d441c11751682f84c86fc46e4fd21"}, 1039 | ] 1040 | typing-extensions = [ 1041 | {file = "typing_extensions-3.7.4.3-py2-none-any.whl", hash = "sha256:dafc7639cde7f1b6e1acc0f457842a83e722ccca8eef5270af2d74792619a89f"}, 1042 | {file = "typing_extensions-3.7.4.3-py3-none-any.whl", hash = "sha256:7cb407020f00f7bfc3cb3e7881628838e69d8f3fcab2f64742a5e76b2f841918"}, 1043 | {file = "typing_extensions-3.7.4.3.tar.gz", hash = "sha256:99d4073b617d30288f569d3f13d2bd7548c3a7e4c8de87db09a9d29bb3a4a60c"}, 1044 | ] 1045 | urllib3 = [ 1046 | {file = "urllib3-1.25.10-py2.py3-none-any.whl", hash = "sha256:e7983572181f5e1522d9c98453462384ee92a0be7fac5f1413a1e35c56cc0461"}, 1047 | {file = "urllib3-1.25.10.tar.gz", hash = "sha256:91056c15fa70756691db97756772bb1eb9678fa585d9184f24534b100dc60f4a"}, 1048 | ] 1049 | uvicorn = [ 1050 | {file = "uvicorn-0.11.8-py3-none-any.whl", hash = "sha256:4b70ddb4c1946e39db9f3082d53e323dfd50634b95fd83625d778729ef1730ef"}, 1051 | {file = "uvicorn-0.11.8.tar.gz", hash = "sha256:46a83e371f37ea7ff29577d00015f02c942410288fb57def6440f2653fff1d26"}, 1052 | ] 1053 | uvloop = [ 1054 | {file = "uvloop-0.14.0-cp35-cp35m-macosx_10_11_x86_64.whl", hash = "sha256:08b109f0213af392150e2fe6f81d33261bb5ce968a288eb698aad4f46eb711bd"}, 1055 | {file = "uvloop-0.14.0-cp35-cp35m-manylinux2010_x86_64.whl", hash = "sha256:4544dcf77d74f3a84f03dd6278174575c44c67d7165d4c42c71db3fdc3860726"}, 1056 | {file = "uvloop-0.14.0-cp36-cp36m-macosx_10_11_x86_64.whl", hash = "sha256:b4f591aa4b3fa7f32fb51e2ee9fea1b495eb75b0b3c8d0ca52514ad675ae63f7"}, 1057 | {file = "uvloop-0.14.0-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:f07909cd9fc08c52d294b1570bba92186181ca01fe3dc9ffba68955273dd7362"}, 1058 | {file = "uvloop-0.14.0-cp37-cp37m-macosx_10_11_x86_64.whl", hash = 
"sha256:afd5513c0ae414ec71d24f6f123614a80f3d27ca655a4fcf6cabe50994cc1891"}, 1059 | {file = "uvloop-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:e7514d7a48c063226b7d06617cbb12a14278d4323a065a8d46a7962686ce2e95"}, 1060 | {file = "uvloop-0.14.0-cp38-cp38-macosx_10_11_x86_64.whl", hash = "sha256:bcac356d62edd330080aed082e78d4b580ff260a677508718f88016333e2c9c5"}, 1061 | {file = "uvloop-0.14.0-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:4315d2ec3ca393dd5bc0b0089d23101276778c304d42faff5dc4579cb6caef09"}, 1062 | {file = "uvloop-0.14.0.tar.gz", hash = "sha256:123ac9c0c7dd71464f58f1b4ee0bbd81285d96cdda8bc3519281b8973e3a461e"}, 1063 | ] 1064 | websockets = [ 1065 | {file = "websockets-8.1-cp36-cp36m-macosx_10_6_intel.whl", hash = "sha256:3762791ab8b38948f0c4d281c8b2ddfa99b7e510e46bd8dfa942a5fff621068c"}, 1066 | {file = "websockets-8.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:3db87421956f1b0779a7564915875ba774295cc86e81bc671631379371af1170"}, 1067 | {file = "websockets-8.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:4f9f7d28ce1d8f1295717c2c25b732c2bc0645db3215cf757551c392177d7cb8"}, 1068 | {file = "websockets-8.1-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:295359a2cc78736737dd88c343cd0747546b2174b5e1adc223824bcaf3e164cb"}, 1069 | {file = "websockets-8.1-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:1d3f1bf059d04a4e0eb4985a887d49195e15ebabc42364f4eb564b1d065793f5"}, 1070 | {file = "websockets-8.1-cp36-cp36m-win32.whl", hash = "sha256:2db62a9142e88535038a6bcfea70ef9447696ea77891aebb730a333a51ed559a"}, 1071 | {file = "websockets-8.1-cp36-cp36m-win_amd64.whl", hash = "sha256:0e4fb4de42701340bd2353bb2eee45314651caa6ccee80dbd5f5d5978888fed5"}, 1072 | {file = "websockets-8.1-cp37-cp37m-macosx_10_6_intel.whl", hash = "sha256:9b248ba3dd8a03b1a10b19efe7d4f7fa41d158fdaa95e2cf65af5a7b95a4f989"}, 1073 | {file = "websockets-8.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:ce85b06a10fc65e6143518b96d3dca27b081a740bae261c2fb20375801a9d56d"}, 1074 | {file = "websockets-8.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:965889d9f0e2a75edd81a07592d0ced54daa5b0785f57dc429c378edbcffe779"}, 1075 | {file = "websockets-8.1-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:751a556205d8245ff94aeef23546a1113b1dd4f6e4d102ded66c39b99c2ce6c8"}, 1076 | {file = "websockets-8.1-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:3ef56fcc7b1ff90de46ccd5a687bbd13a3180132268c4254fc0fa44ecf4fc422"}, 1077 | {file = "websockets-8.1-cp37-cp37m-win32.whl", hash = "sha256:7ff46d441db78241f4c6c27b3868c9ae71473fe03341340d2dfdbe8d79310acc"}, 1078 | {file = "websockets-8.1-cp37-cp37m-win_amd64.whl", hash = "sha256:20891f0dddade307ffddf593c733a3fdb6b83e6f9eef85908113e628fa5a8308"}, 1079 | {file = "websockets-8.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:c1ec8db4fac31850286b7cd3b9c0e1b944204668b8eb721674916d4e28744092"}, 1080 | {file = "websockets-8.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:5c01fd846263a75bc8a2b9542606927cfad57e7282965d96b93c387622487485"}, 1081 | {file = "websockets-8.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:9bef37ee224e104a413f0780e29adb3e514a5b698aabe0d969a6ba426b8435d1"}, 1082 | {file = "websockets-8.1-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:d705f8aeecdf3262379644e4b55107a3b55860eb812b673b28d0fbc347a60c55"}, 1083 | {file = "websockets-8.1-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:c8a116feafdb1f84607cb3b14aa1418424ae71fee131642fc568d21423b51824"}, 1084 | {file = "websockets-8.1-cp38-cp38-win32.whl", hash = 
"sha256:e898a0863421650f0bebac8ba40840fc02258ef4714cb7e1fd76b6a6354bda36"}, 1085 | {file = "websockets-8.1-cp38-cp38-win_amd64.whl", hash = "sha256:f8a7bff6e8664afc4e6c28b983845c5bc14965030e3fb98789734d416af77c4b"}, 1086 | {file = "websockets-8.1.tar.gz", hash = "sha256:5c65d2da8c6bce0fca2528f69f44b2f977e06954c8512a952222cea50dad430f"}, 1087 | ] 1088 | werkzeug = [ 1089 | {file = "Werkzeug-1.0.1-py2.py3-none-any.whl", hash = "sha256:2de2a5db0baeae7b2d2664949077c2ac63fbd16d98da0ff71837f7d1dea3fd43"}, 1090 | {file = "Werkzeug-1.0.1.tar.gz", hash = "sha256:6c80b1e5ad3665290ea39320b91e1be1e0d5f60652b964a3070216de83d2e47c"}, 1091 | ] 1092 | yarl = [ 1093 | {file = "yarl-1.5.1-cp35-cp35m-macosx_10_14_x86_64.whl", hash = "sha256:db6db0f45d2c63ddb1a9d18d1b9b22f308e52c83638c26b422d520a815c4b3fb"}, 1094 | {file = "yarl-1.5.1-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:17668ec6722b1b7a3a05cc0167659f6c95b436d25a36c2d52db0eca7d3f72593"}, 1095 | {file = "yarl-1.5.1-cp35-cp35m-win32.whl", hash = "sha256:040b237f58ff7d800e6e0fd89c8439b841f777dd99b4a9cca04d6935564b9409"}, 1096 | {file = "yarl-1.5.1-cp35-cp35m-win_amd64.whl", hash = "sha256:f18d68f2be6bf0e89f1521af2b1bb46e66ab0018faafa81d70f358153170a317"}, 1097 | {file = "yarl-1.5.1-cp36-cp36m-macosx_10_14_x86_64.whl", hash = "sha256:c52ce2883dc193824989a9b97a76ca86ecd1fa7955b14f87bf367a61b6232511"}, 1098 | {file = "yarl-1.5.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:ce584af5de8830d8701b8979b18fcf450cef9a382b1a3c8ef189bedc408faf1e"}, 1099 | {file = "yarl-1.5.1-cp36-cp36m-win32.whl", hash = "sha256:df89642981b94e7db5596818499c4b2219028f2a528c9c37cc1de45bf2fd3a3f"}, 1100 | {file = "yarl-1.5.1-cp36-cp36m-win_amd64.whl", hash = "sha256:3a584b28086bc93c888a6c2aa5c92ed1ae20932f078c46509a66dce9ea5533f2"}, 1101 | {file = "yarl-1.5.1-cp37-cp37m-macosx_10_14_x86_64.whl", hash = "sha256:da456eeec17fa8aa4594d9a9f27c0b1060b6a75f2419fe0c00609587b2695f4a"}, 1102 | {file = "yarl-1.5.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:bc2f976c0e918659f723401c4f834deb8a8e7798a71be4382e024bcc3f7e23a8"}, 1103 | {file = "yarl-1.5.1-cp37-cp37m-win32.whl", hash = "sha256:4439be27e4eee76c7632c2427ca5e73703151b22cae23e64adb243a9c2f565d8"}, 1104 | {file = "yarl-1.5.1-cp37-cp37m-win_amd64.whl", hash = "sha256:48e918b05850fffb070a496d2b5f97fc31d15d94ca33d3d08a4f86e26d4e7c5d"}, 1105 | {file = "yarl-1.5.1-cp38-cp38-macosx_10_14_x86_64.whl", hash = "sha256:9b930776c0ae0c691776f4d2891ebc5362af86f152dd0da463a6614074cb1b02"}, 1106 | {file = "yarl-1.5.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:b3b9ad80f8b68519cc3372a6ca85ae02cc5a8807723ac366b53c0f089db19e4a"}, 1107 | {file = "yarl-1.5.1-cp38-cp38-win32.whl", hash = "sha256:f379b7f83f23fe12823085cd6b906edc49df969eb99757f58ff382349a3303c6"}, 1108 | {file = "yarl-1.5.1-cp38-cp38-win_amd64.whl", hash = "sha256:9102b59e8337f9874638fcfc9ac3734a0cfadb100e47d55c20d0dc6087fb4692"}, 1109 | {file = "yarl-1.5.1.tar.gz", hash = "sha256:c22c75b5f394f3d47105045ea551e08a3e804dc7e01b37800ca35b58f856c3d6"}, 1110 | ] 1111 | zipp = [ 1112 | {file = "zipp-3.1.0-py3-none-any.whl", hash = "sha256:aa36550ff0c0b7ef7fa639055d797116ee891440eac1a56f378e2d3179e0320b"}, 1113 | {file = "zipp-3.1.0.tar.gz", hash = "sha256:c599e4d75c98f6798c509911d08a22e6c021d074469042177c8c86fb92eefd96"}, 1114 | ] 1115 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "explainers" 3 | version = "0.1.0" 4 | 
description = "A packaged that distributes KernelSHAP using ray" 5 | authors = ["alexcoca "] 6 | 7 | [tool.poetry.dependencies] 8 | python = "^3.7" 9 | attrs = ">=19.1.0" 10 | numpy = ">=1.17.4" 11 | pandas = ">=0.23.4" 12 | prettyprinter = ">=0.18.0" 13 | ray = {version = "0.8.6", extras = ["serve"]} 14 | scipy = ">=1.3.1" 15 | scikit-learn = ">=0.21.2" 16 | shap = ">=0.35.0" 17 | requests = "^2.24.0" 18 | 19 | [tool.poetry.dev-dependencies] 20 | 21 | [build-system] 22 | requires = ["poetry>=0.12"] 23 | build-backend = "poetry.masonry.api" 24 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | attrs>=19.1.0 2 | numpy>=1.17.4 3 | pandas>=0.23.4 4 | prettyprinter>=0.18.0 5 | ray[serve]==0.8.6 6 | scipy>=1.3.1 7 | scikit-learn>=0.21.2 8 | shap>=0.35.0 9 | requests -------------------------------------------------------------------------------- /requirements_advanced.txt: -------------------------------------------------------------------------------- 1 | kubernetes -------------------------------------------------------------------------------- /scripts/fit_adult_model.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | 4 | import pickle 5 | from sklearn.linear_model import LogisticRegression 6 | from sklearn.metrics import accuracy_score 7 | from typing import Dict, Any 8 | from explainers.utils import load_data 9 | 10 | """ 11 | This script pulls the Adult data from the ``data/`` directory and fits a logistic regression model to it. Model is 12 | saved under ``assets/predictor.pkl``. 13 | """ 14 | 15 | 16 | def fit_adult_logistic_regression(data_dict: Dict[str, Any]): 17 | """ 18 | Fit a logistic regression model to the processed Adult dataset. 
19 | """ 20 | 21 | logging.info("Fitting model ...") 22 | X_train_proc = data_dict['X']['processed']['train'] 23 | X_test_proc = data_dict['X']['processed']['test'] 24 | y_train = data_dict['y']['train'] 25 | y_test = data_dict['y']['test'] 26 | 27 | classifier = LogisticRegression(multi_class='multinomial', 28 | random_state=0, 29 | max_iter=500, 30 | verbose=0, 31 | ) 32 | classifier.fit(X_train_proc, y_train) 33 | 34 | logging.info(f"Test accuracy: {accuracy_score(y_test, classifier.predict(X_test_proc))}") 35 | 36 | return classifier 37 | 38 | 39 | def main(): 40 | 41 | if not os.path.exists('assets'): 42 | os.mkdir('assets') 43 | 44 | data = load_data() 45 | lr_predictor = fit_adult_logistic_regression(data['all']) 46 | with open("assets/predictor.pkl", "wb") as f: 47 | pickle.dump(lr_predictor, f) 48 | 49 | 50 | if __name__ == '__main__': 51 | main() 52 | -------------------------------------------------------------------------------- /scripts/process_adult_data.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import pickle 3 | import logging 4 | import os 5 | import sys 6 | import requests 7 | 8 | import numpy as np 9 | import pandas as pd 10 | 11 | from io import StringIO 12 | from requests import RequestException 13 | from sklearn.compose import ColumnTransformer 14 | from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder 15 | from typing import Any, Dict, List, Tuple, Union 16 | from explainers.utils import Bunch 17 | 18 | logger = logging.getLogger(__name__) 19 | 20 | ADULT_URLS = [ 21 | 'https://storage.googleapis.com/seldon-datasets/adult/adult.data', 22 | 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', 23 | 'http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.data', 24 | ] # type: List[str] 25 | 26 | 27 | sys.path.append('../') 28 | 29 | 30 | def fetch_adult(features_drop: list = None, return_X_y: bool = False, url_id: int = 0) -> \ 31 | Union[Bunch, Tuple[np.ndarray, np.ndarray]]: 32 | """ 33 | Downloads and pre-processes 'adult' dataset. 34 | More info: http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/ 35 | 36 | Parameters 37 | ---------- 38 | features_drop 39 | List of features to be dropped from dataset, by default drops ["fnlwgt", "Education-Num"] 40 | return_X_y 41 | If true, return features X and labels y as numpy arrays, if False return a Bunch object 42 | url_id 43 | Index specifying which URL to use for downloading 44 | 45 | Returns 46 | ------- 47 | Bunch 48 | Dataset, labels, a list of features and a dictionary containing a list with the potential categories 49 | for each categorical feature where the key refers to the feature column. 
50 | (data, target) 51 | Tuple if ``return_X_y`` is true 52 | """ 53 | if features_drop is None: 54 | features_drop = ["fnlwgt", "Education-Num"] 55 | 56 | # download data 57 | dataset_url = ADULT_URLS[url_id] 58 | raw_features = ['Age', 'Workclass', 'fnlwgt', 'Education', 'Education-Num', 'Marital Status', 59 | 'Occupation', 'Relationship', 'Race', 'Sex', 'Capital Gain', 'Capital Loss', 60 | 'Hours per week', 'Country', 'Target'] 61 | try: 62 | resp = requests.get(dataset_url) 63 | resp.raise_for_status() 64 | except RequestException: 65 | logger.exception("Could not connect, URL may be out of service") 66 | raise 67 | 68 | raw_data = pd.read_csv(StringIO(resp.text), names=raw_features, delimiter=', ', engine='python').fillna('?') 69 | 70 | # get labels, features and drop unnecessary features 71 | labels = (raw_data['Target'] == '>50K').astype(int).values 72 | features_drop += ['Target'] 73 | data = raw_data.drop(features_drop, axis=1) 74 | features = list(data.columns) 75 | 76 | # map categorical features 77 | education_map = { 78 | '10th': 'Dropout', '11th': 'Dropout', '12th': 'Dropout', '1st-4th': 79 | 'Dropout', '5th-6th': 'Dropout', '7th-8th': 'Dropout', '9th': 80 | 'Dropout', 'Preschool': 'Dropout', 'HS-grad': 'High School grad', 81 | 'Some-college': 'High School grad', 'Masters': 'Masters', 82 | 'Prof-school': 'Prof-School', 'Assoc-acdm': 'Associates', 83 | 'Assoc-voc': 'Associates' 84 | } 85 | occupation_map = { 86 | "Adm-clerical": "Admin", "Armed-Forces": "Military", 87 | "Craft-repair": "Blue-Collar", "Exec-managerial": "White-Collar", 88 | "Farming-fishing": "Blue-Collar", "Handlers-cleaners": 89 | "Blue-Collar", "Machine-op-inspct": "Blue-Collar", "Other-service": 90 | "Service", "Priv-house-serv": "Service", "Prof-specialty": 91 | "Professional", "Protective-serv": "Other", "Sales": 92 | "Sales", "Tech-support": "Other", "Transport-moving": 93 | "Blue-Collar" 94 | } 95 | country_map = { 96 | 'Cambodia': 'SE-Asia', 'Canada': 'British-Commonwealth', 'China': 97 | 'China', 'Columbia': 'South-America', 'Cuba': 'Other', 98 | 'Dominican-Republic': 'Latin-America', 'Ecuador': 'South-America', 99 | 'El-Salvador': 'South-America', 'England': 'British-Commonwealth', 100 | 'France': 'Euro_1', 'Germany': 'Euro_1', 'Greece': 'Euro_2', 101 | 'Guatemala': 'Latin-America', 'Haiti': 'Latin-America', 102 | 'Holand-Netherlands': 'Euro_1', 'Honduras': 'Latin-America', 103 | 'Hong': 'China', 'Hungary': 'Euro_2', 'India': 104 | 'British-Commonwealth', 'Iran': 'Other', 'Ireland': 105 | 'British-Commonwealth', 'Italy': 'Euro_1', 'Jamaica': 106 | 'Latin-America', 'Japan': 'Other', 'Laos': 'SE-Asia', 'Mexico': 107 | 'Latin-America', 'Nicaragua': 'Latin-America', 108 | 'Outlying-US(Guam-USVI-etc)': 'Latin-America', 'Peru': 109 | 'South-America', 'Philippines': 'SE-Asia', 'Poland': 'Euro_2', 110 | 'Portugal': 'Euro_2', 'Puerto-Rico': 'Latin-America', 'Scotland': 111 | 'British-Commonwealth', 'South': 'Euro_2', 'Taiwan': 'China', 112 | 'Thailand': 'SE-Asia', 'Trinadad&Tobago': 'Latin-America', 113 | 'United-States': 'United-States', 'Vietnam': 'SE-Asia' 114 | } 115 | married_map = { 116 | 'Never-married': 'Never-Married', 'Married-AF-spouse': 'Married', 117 | 'Married-civ-spouse': 'Married', 'Married-spouse-absent': 118 | 'Separated', 'Separated': 'Separated', 'Divorced': 119 | 'Separated', 'Widowed': 'Widowed' 120 | } 121 | mapping = {'Education': education_map, 'Occupation': occupation_map, 'Country': country_map, 122 | 'Marital Status': married_map} 123 | 124 | data_copy = data.copy() 125 | for f, 
f_map in mapping.items(): 126 | data_tmp = data_copy[f].values 127 | for key, value in f_map.items(): 128 | data_tmp[data_tmp == key] = value 129 | data[f] = data_tmp 130 | 131 | # get categorical features and apply labelencoding 132 | categorical_features = [f for f in features if data[f].dtype == 'O'] 133 | category_map = {} 134 | for f in categorical_features: 135 | le = LabelEncoder() 136 | data_tmp = le.fit_transform(data[f].values) 137 | data[f] = data_tmp 138 | category_map[features.index(f)] = list(le.classes_) 139 | 140 | # only return data values 141 | data = data.values 142 | target_names = ['<=50K', '>50K'] 143 | 144 | if return_X_y: 145 | return data, labels 146 | 147 | return Bunch(data=data, target=labels, feature_names=features, target_names=target_names, category_map=category_map) 148 | 149 | 150 | def load_adult_dataset(): 151 | """ 152 | Load the Adult dataset. 153 | """ 154 | 155 | logging.info("Preprocessing data...") 156 | return fetch_adult() 157 | 158 | 159 | def preprocess_adult_dataset(dataset, seed=0, n_train_examples=30000) -> Dict[str, Any]: 160 | """ 161 | Splits dataset into train and test subsets and preprocesses it. 162 | """ 163 | 164 | logging.info("Splitting data...") 165 | 166 | np.random.seed(seed) 167 | data = dataset.data 168 | target = dataset.target 169 | data_perm = np.random.permutation(np.c_[data, target]) 170 | data = data_perm[:, :-1] 171 | target = data_perm[:, -1] 172 | 173 | X_train, y_train = data[:n_train_examples, :], target[:n_train_examples] 174 | X_test, y_test = data[n_train_examples + 1:, :], target[n_train_examples + 1:] 175 | 176 | logging.info("Transforming data...") 177 | category_map = dataset.category_map 178 | feature_names = dataset.feature_names 179 | 180 | ordinal_features = [x for x in range(len(feature_names)) if x not in list(category_map.keys())] 181 | ordinal_transformer = StandardScaler() 182 | 183 | categorical_features = list(category_map.keys()) 184 | categorical_transformer = OneHotEncoder(drop='first', handle_unknown='error') 185 | 186 | preprocessor = ColumnTransformer( 187 | transformers=[ 188 | ('num', ordinal_transformer, ordinal_features), 189 | ('cat', categorical_transformer, categorical_features) 190 | ] 191 | ) 192 | 193 | preprocessor.fit(X_train) 194 | X_train_proc = preprocessor.transform(X_train) 195 | X_test_proc = preprocessor.transform(X_test) 196 | 197 | # create groups for categorical variables 198 | numerical_feats_idx = preprocessor.transformers_[0][2] 199 | categorical_feats_idx = preprocessor.transformers_[1][2] 200 | ohe = preprocessor.transformers_[1][1] 201 | 202 | # compute encoded dimension; -1 as ohe is setup with drop='first' 203 | feat_enc_dim = [len(cat_enc) - 1 for cat_enc in ohe.categories_] 204 | num_feats_names = [feature_names[i] for i in numerical_feats_idx] 205 | cat_feats_names = [feature_names[i] for i in categorical_feats_idx] 206 | 207 | group_names = num_feats_names + cat_feats_names 208 | # each sublist contains the col. 
indices for each variable in group_names 209 | groups = [] 210 | cat_var_idx = 0 211 | 212 | for name in group_names: 213 | if name in num_feats_names: 214 | groups.append(list(range(len(groups), len(groups) + 1))) 215 | else: 216 | start_idx = groups[-1][-1] + 1 if groups else 0 217 | groups.append(list(range(start_idx, start_idx + feat_enc_dim[cat_var_idx]))) 218 | cat_var_idx += 1 219 | 220 | return { 221 | 'X': { 222 | 'raw': {'train': X_train, 'test': X_test}, 223 | 'processed': {'train': X_train_proc, 'test': X_test_proc}}, 224 | 'y': {'train': y_train, 'test': y_test}, 225 | 'preprocessor': preprocessor, 226 | 'orig_feature_names': feature_names, 227 | 'groups': groups, 228 | 'group_names': group_names, 229 | } 230 | 231 | 232 | def main(): 233 | 234 | if not os.path.exists('data'): 235 | os.mkdir('data') 236 | 237 | # load and preprocess data 238 | adult_dataset = load_adult_dataset() 239 | adult_preprocessed = preprocess_adult_dataset(adult_dataset, n_train_examples=args.n_train_examples) 240 | # select first args.n_background_samples in train set as background dataset 241 | background_dataset = {'X': {'raw': None, 'preprocessed': None}, 'y': None} 242 | n_examples = args.n_background_samples 243 | background_dataset['X']['raw'] = adult_preprocessed['X']['raw']['train'][0:n_examples, :] 244 | background_dataset['X']['preprocessed'] = adult_preprocessed['X']['processed']['train'][0:n_examples, :] 245 | background_dataset['y'] = adult_preprocessed['y']['train'][0:n_examples] 246 | with open('data/adult_background.pkl', 'wb') as f: 247 | pickle.dump(background_dataset, f) 248 | with open('data/adult_processed.pkl', 'wb') as f: 249 | pickle.dump(adult_preprocessed, f) 250 | 251 | 252 | if __name__ == '__main__': 253 | parser = argparse.ArgumentParser() 254 | parser.add_argument('-n_background_samples', type=int, default=100, help="Background set size.") 255 | parser.add_argument('-n_train_examples', type=int, default=30000, help="Number of training examples.") 256 | args = parser.parse_args() 257 | main() 258 | --------------------------------------------------------------------------------
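
A note on the `groups`/`group_names` output of `preprocess_adult_dataset`: each entry of `groups` lists the post-encoding column indices that belong to one original feature (a numerical feature keeps a single column, while a categorical feature spans `n_categories - 1` columns because the one-hot encoder is configured with `drop='first'`). These lists are pickled together with the processed data in `data/adult_processed.pkl` by `scripts/process_adult_data.py`, presumably so the explainer can treat each categorical variable as a single grouped feature. The snippet below is a minimal, self-contained sketch of that bookkeeping; the feature names and category counts are made up for illustration and are not read from the repository's data.

```python
# Illustration only: mirrors the `groups` construction in preprocess_adult_dataset,
# using hypothetical features (2 numerical, plus 2 categorical with 3 and 2 categories).
num_feats_names = ['Age', 'Hours per week']
cat_feats_names = ['Workclass', 'Sex']
feat_enc_dim = [3 - 1, 2 - 1]  # OneHotEncoder(drop='first') keeps n_categories - 1 columns

group_names = num_feats_names + cat_feats_names
groups = []
cat_var_idx = 0
for name in group_names:
    if name in num_feats_names:
        # a numerical feature occupies exactly one column in the processed matrix
        groups.append(list(range(len(groups), len(groups) + 1)))
    else:
        # a categorical feature occupies a contiguous block of encoded columns
        start_idx = groups[-1][-1] + 1 if groups else 0
        groups.append(list(range(start_idx, start_idx + feat_enc_dim[cat_var_idx])))
        cat_var_idx += 1

print(group_names)  # ['Age', 'Hours per week', 'Workclass', 'Sex']
print(groups)       # [[0], [1], [2, 3], [4]]
```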