├── .gitignore ├── Analysis.ipynb ├── README.md ├── benchmarks ├── k8s_benchmark_pool.sh ├── k8s_benchmark_serve.sh ├── k8s_ray_pool.py ├── k8s_serve_explanations.py ├── ray_pool.py └── serve_explanations.py ├── cluster ├── Makefile.pool ├── Makefile.serve ├── README.md ├── ray_cluster.yaml └── ray_pool_cluster.yaml ├── dockerfiles ├── Dockerfile └── Makefile ├── explainers ├── __init__.py ├── distributed.py ├── interface.py ├── kernel_shap.py ├── utils.py └── wrappers.py ├── images ├── pool_1_node.PNG ├── pool_k8s_32.PNG ├── pool_k8s_56.PNG ├── serve_1_node.PNG ├── serve_k8s_32.PNG └── serve_k8s_56.PNG ├── poetry.lock ├── pyproject.toml ├── requirements.txt ├── requirements_advanced.txt └── scripts ├── fit_adult_model.py └── process_adult_data.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # PyCharm 76 | .idea/ 77 | 78 | # pyenv 79 | .python-version 80 | 81 | # celery beat schedule file 82 | celerybeat-schedule 83 | 84 | # SageMath parsed files 85 | *.sage.py 86 | 87 | # Environments 88 | .env 89 | .venv 90 | env/ 91 | venv/ 92 | ENV/ 93 | env.bak/ 94 | venv.bak/ 95 | 96 | # Spyder project settings 97 | .spyderproject 98 | .spyproject 99 | 100 | # Rope project settings 101 | .ropeproject 102 | 103 | # mkdocs documentation 104 | /site 105 | 106 | # mypy 107 | .mypy_cache/ 108 | 109 | # Model binaries 110 | examples/*.h5 111 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Distributing KernelSHAP using `ray` 2 | 3 | This repository shows how to distribute explanations with KernelSHAP one a single node or a Kubernetes cluster using [`ray`](https://github.com/ray-project/ray). The predictions of a logistic regression model on `2560` instances from the [`Adult`](http://archive.ics.uci.edu/ml/datasets/Adult) dataset are explained using KernelSHAP configured with a background set of `100` samples from the same dataset. The data preprocessing and model fitting steps are available in the `scripts/` folder, but both the data and the model will be automatically downloaded by the benchmarking scripts. 4 | 5 | ## Distributed KernelSHAP on a single multicore node 6 | ### Setup 7 | 8 | 1. 
Install [`conda`](https://problemsolvingwithpython.com/01-Orientation/01.05-Installing-Anaconda-on-Linux/)
9 | 2. Create a virtual environment with `conda create --name shap python=3.7`
10 | 3. Activate the environment with `conda activate shap`
11 | 4. Execute `pip install .` in order to install the dependencies needed to run the benchmarking scripts
12 | 
13 | ### Running the benchmarks
14 | 
15 | Two code versions are available:
16 | 
17 | - One using a parallel pool of `ray` actors, which consume small subsets of the `2560`-instance dataset to be explained
18 | - One using `ray serve` instead of the parallel pool
19 | 
20 | The two methods can be run from the repository root using the scripts `benchmarks/ray_pool.py` and `benchmarks/serve_explanations.py`, respectively. The configurable options are:
21 | - the number of actors/replicas that the task is distributed over (e.g., `--workers 5` (pool), `--replicas 5` (ray serve))
22 | - whether a benchmark (i.e., redistributing the task over an increasingly large pool or number of replicas) is to be performed (`-benchmark 0` to disable or `-benchmark 1` to enable)
23 | - the number of times the task is run for the same configuration in benchmarking mode (e.g., `--nruns 5`)
24 | - how many instances can be sent to an actor/replica at once (a required argument; e.g., `-b 1 5 10` (pool) or `-batch 1 5 10` (ray serve)). If more than one value is passed after the argument name, the task (or benchmark) is executed once for each batch size
25 | 
26 | ## Distributed KernelSHAP on a Kubernetes cluster
27 | ### Setup
28 | 
29 | This requires access to a Kubernetes cluster and a local installation of [`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/). Don't forget to export the path to the cluster configuration `.yaml` file in your `KUBECONFIG` environment variable, as described [here](https://auth0.com/blog/kubernetes-tutorial-step-by-step-introduction-to-basic-concepts/), before moving on to the next steps.
30 | 
31 | ### Running the benchmarks
32 | 
33 | The `ray_pool.py` and `serve_explanations.py` scripts have been adapted for deployment on the Kubernetes cluster; the adapted versions are prefixed with `k8s_`. The benchmark experiments can be run via the `bash` scripts in the `benchmarks/` folder. These scripts:
34 | 
35 | - Apply the appropriate k8s manifest in `cluster/` to the k8s cluster
36 | - Upload a `k8s*.py` file to it
37 | - Run the script
38 | - Pull the results and save them in the `results` directory
39 | 
40 | Specifically:
41 | 
42 | - Calling `bash benchmarks/k8s_benchmark_pool.sh 10 20` will run the benchmark with an increasing number of workers (the cluster is reset each time the number of workers is increased). By default, the experiment is run with batch sizes of `1`, `5` and `10`; this can be changed by updating the value of `BATCH` in `cluster/Makefile.pool`
43 | - Calling `bash benchmarks/k8s_benchmark_serve.sh 10 20 ray` will run the benchmark with an increasing number of workers and batch sizes of `1`, `5` and `10` for each worker setting. The batch sizes can be modified in the `.sh` script itself. The `ray` argument means that `ray serve` batches single requests together and dispatches them to the same worker. If replaced by `default`, the input is split into minibatches which are then sent to the workers
44 | 
45 | ## Sample results
46 | ### Single node
47 | The experiments were run on a compute-optimized dedicated machine in Digital Ocean with 32 vCPUs. This explains the attenuation of the performance gains shown below.
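For reference, results of this kind can be reproduced with invocations such as the following (the parameter values shown are illustrative):

- `python benchmarks/ray_pool.py -b 1 5 10 --workers 32 -benchmark 1` for the parallel pool version
- `python benchmarks/serve_explanations.py -batch 1 5 10 --replicas 32 -benchmark 1` for the `ray serve` version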
48 | 49 | The results obtained running the task using the `ray` parallel pool are below: 50 | 51 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/pool_1_node.PNG?raw=true) 52 | 53 | Distributing using ray serve yields similar results: 54 | 55 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/serve_1_node.PNG?raw=true) 56 | ### Kubernetes cluster 57 | The experiments were run on a cluster consisting of two compute-optimized dedicated machine in Digital Ocean with 32vCPUs each. This explains why the performance gains attenuation below. 58 | 59 | The results obtained running the task using the `ray` parallel pool over a two-node cluster are shown below: 60 | 61 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/pool_k8s_32.PNG?raw=true) 62 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/pool_k8s_56.PNG?raw=true) 63 | 64 | Distributing using ray serve yields similar results: 65 | 66 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/serve_k8s_32.PNG?raw=true) 67 | ![alt text](https://github.com/alexcoca/DistributedKernelShap/blob/master/images/serve_k8s_56.PNG?raw=true) 68 | -------------------------------------------------------------------------------- /benchmarks/k8s_benchmark_pool.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | START=$1 3 | END=$2 4 | echo "Workers range tested: {$START..$END}" 5 | cd ./cluster || exit 6 | for i in $(seq "$START" "$END"); do 7 | echo "Distributing over a pool of size $i actors" 8 | make -f Makefile.pool deploy 9 | make -f Makefile.pool upload-script 10 | make -f Makefile.pool run-experiment WORKERS="$i" 11 | make -f Makefile.pool pull-results 12 | make -f Makefile.pool destroy 13 | done 14 | -------------------------------------------------------------------------------- /benchmarks/k8s_benchmark_serve.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | START=$1 3 | END=$2 4 | BATCH_MODE=$3 5 | BATCH_SIZE=(1 5 10) 6 | echo "Workers range tested: {$START..$END}" 7 | echo "Batch mode: $BATCH_MODE" 8 | cd ./cluster || exit 9 | for i in $(seq "$START" "$END"); do 10 | for j in "${BATCH_SIZE[@]}"; do 11 | echo "Distributing explanations over $i workers" 12 | echo "Current batch size: $j instances" 13 | make -f Makefile.serve deploy 14 | make -f Makefile.serve upload-script 15 | make -f Makefile.serve run-experiment WORKERS="$i" BATCH="$j" BATCH_MODE="$BATCH_MODE" 16 | make -f Makefile.serve pull-results 17 | make -f Makefile.serve destroy 18 | done 19 | done 20 | -------------------------------------------------------------------------------- /benchmarks/k8s_ray_pool.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import pickle 5 | import ray 6 | 7 | import numpy as np 8 | 9 | from explainers.kernel_shap import KernelShap 10 | from explainers.utils import get_filename, load_data, load_model 11 | from sklearn.metrics import accuracy_score 12 | from typing import Any, Dict 13 | from timeit import default_timer as timer 14 | 15 | 16 | logging.basicConfig(level=logging.INFO) 17 | 18 | 19 | def fit_kernel_shap_explainer(clf, data: dict, distributed_opts: Dict[str, Any] = None): 20 | """ 21 | Returns an a fitted KernelShap explainer for the classifier `clf`. 
The categorical variables are grouped according 22 | to the information specified in `data`. 23 | 24 | Parameters 25 | ---------- 26 | clf 27 | Classifier whose predictions are to be explained. 28 | data 29 | Contains the background data as well as information about the features and the columns in the feature matrix 30 | they occupy. 31 | distributed_opts 32 | Options controlling the number of worker processes that will distribute the workload. 33 | """ 34 | 35 | pred_fcn = clf.predict_proba 36 | group_names, groups = data['all']['group_names'], data['all']['groups'] 37 | explainer = KernelShap(pred_fcn, link='logit', feature_names=group_names, distributed_opts=distributed_opts, seed=0) 38 | explainer.fit(data['background']['X']['preprocessed'], group_names=group_names, groups=groups) 39 | return explainer 40 | 41 | 42 | def run_explainer(explainer, X_explain: np.ndarray, distributed_opts: dict, nruns: int, batch_size: int): 43 | """ 44 | Explain `X_explain` with `explainer` configured with `distributed_opts` `nruns` times in order to obtain 45 | runtime statistics. 46 | 47 | Parameters 48 | --------- 49 | explainer 50 | Fitted KernelShap explainer object 51 | X_explain 52 | Array containing instances to be explained, layed out row-wise. Split into minibatches that are distributed 53 | by the explainer. 54 | distributed_opts 55 | A dictionary of the form:: 56 | 57 | { 58 | 'n_cpus': int - controls the number of workers on which the instances are explained 59 | 'batch_size': int - the size of a minibatch into which the dateset is split 60 | 'actor_cpu_fraction': the fraction of CPU allocated to an actor 61 | } 62 | batch_size: 63 | The minibatch size for the current set of of `nruns` 64 | nruns 65 | Number of times `X_explain` is explained for a given workers and batch size setting. 
66 | """ 67 | 68 | if not os.path.exists('./results'): 69 | os.mkdir('./results') 70 | 71 | result = {'t_elapsed': []} 72 | workers = distributed_opts['n_cpus'] 73 | # update minibatch size 74 | explainer._explainer.batch_size = batch_size 75 | for run in range(nruns): 76 | logging.info(f"run: {run}") 77 | t_start = timer() 78 | explanation = explainer.explain(X_explain, silent=True) 79 | t_elapsed = timer() - t_start 80 | logging.info(f"Time elapsed: {t_elapsed}") 81 | result['t_elapsed'].append(t_elapsed) 82 | 83 | with open(get_filename(workers, batch_size, serve=False), 'wb') as f: 84 | pickle.dump(result, f) 85 | 86 | 87 | def main(): 88 | 89 | # initialise ray 90 | ray.init(address='auto') 91 | 92 | # experiment settings 93 | nruns = args.nruns 94 | batch_sizes = [int(elem) for elem in args.batch] 95 | 96 | # load data and instances to be explained 97 | data = load_data() 98 | predictor = load_model('assets/predictor.pkl') # download if not available locally 99 | y_test, X_test_proc = data['all']['y']['test'], data['all']['X']['processed']['test'] 100 | logging.info(f"Test accuracy: {accuracy_score(y_test, predictor.predict(X_test_proc))}") 101 | X_explain = data['all']['X']['processed']['test'].toarray() # instances to be explained 102 | 103 | distributed_opts = {'n_cpus': args.workers} 104 | explainer = fit_kernel_shap_explainer(predictor, data, distributed_opts) 105 | for batch_size in batch_sizes: 106 | logging.info(f"Running experiment using {args.workers} actors...") 107 | logging.info(f"Batch size: {batch_size}") 108 | run_explainer(explainer, X_explain, distributed_opts, nruns, batch_size) 109 | 110 | 111 | if __name__ == '__main__': 112 | parser = argparse.ArgumentParser() 113 | parser.add_argument( 114 | "-b", 115 | "--batch", 116 | nargs='+', 117 | help="A list of values representing the maximum batch size of instances sent to the same worker.", 118 | required=True, 119 | ) 120 | parser.add_argument( 121 | "-w", 122 | "--workers", 123 | default=1, 124 | type=int, 125 | help="The number of workers to distribute the explanations dataset on." 126 | ) 127 | parser.add_argument( 128 | "-n", 129 | "--nruns", 130 | default=5, 131 | type=int, 132 | 133 | help="Controls how many times an experiment is run for a given number of workers to obtain run statistics." 134 | ) 135 | args = parser.parse_args() 136 | main() 137 | -------------------------------------------------------------------------------- /benchmarks/k8s_serve_explanations.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import ray 5 | import pickle 6 | import requests 7 | import numpy as np 8 | 9 | import explainers.wrappers as wrappers 10 | 11 | from collections import namedtuple 12 | from ray import serve 13 | from timeit import default_timer as timer 14 | from typing import Any, Dict, List, Tuple 15 | from explainers.utils import get_filename, batch, load_data, load_model 16 | 17 | 18 | logging.basicConfig(level=logging.INFO) 19 | 20 | PREDICTOR_URL = 'https://storage.googleapis.com/seldon-models/alibi/distributed_kernel_shap/predictor.pkl' 21 | PREDICTOR_PATH = 'assets/predictor.pkl' 22 | """ 23 | str: The file containing the predictor. The predictor can be created by running `fit_adult_model.py` or output by 24 | calling `explainers.utils.load_model()`, which will download a default predictor if `assets/` does not contain one. 
25 | """ 26 | 27 | 28 | def endpont_setup(tag: str, backend_tag: str, route: str = "/"): 29 | """ 30 | Creates an endpoint for serving explanations. 31 | 32 | Parameters 33 | ---------- 34 | tag 35 | Endpoint tag. 36 | backend_tag 37 | A tag for the backend this explainer will connect to. 38 | route 39 | The URL where the explainer can be queried. 40 | """ 41 | serve.create_endpoint(tag, backend=backend_tag, route=route, methods=["GET"]) 42 | 43 | 44 | def backend_setup(tag: str, worker_args: Tuple, replicas: int, max_batch_size: int) -> None: 45 | """ 46 | Setups the backend for the distributed explanation task. 47 | 48 | Parameters 49 | ---------- 50 | tag 51 | A tag for the backend component. The same tag must be passed to `endpoint_setup`. 52 | worker_args 53 | A tuple containing the arguments for initialising the explainer and fitting it. 54 | replicas 55 | The number of backend replicas that serve explanations. 56 | max_batch_size 57 | Maximum number of requests to batch and send to a worker process. 58 | """ 59 | 60 | if max_batch_size == 1: 61 | config = {'num_replicas': max(replicas, 1)} 62 | serve.create_backend(tag, wrappers.KernelShapModel, *worker_args) 63 | else: 64 | config = {'num_replicas': max(replicas, 1), 'max_batch_size': max_batch_size} 65 | serve.create_backend(tag, wrappers.BatchKernelShapModel, *worker_args) 66 | serve.update_backend_config(tag, config) 67 | 68 | logging.info(f"Backends: {serve.list_backends()}") 69 | 70 | 71 | def prepare_explainer_args(data: Dict[str, Any]) -> Tuple[str, np.ndarray, dict, dict]: 72 | """ 73 | Extracts the name of the features (group_names) and the columns corresponding to each feature in the faeture matrix 74 | (group_names) from the `data` dict and defines the explainer arguments. The background data necessary to initialise 75 | the explainer is also extracted from the same dictionary. 76 | 77 | Parameters 78 | ---------- 79 | data 80 | A dictionary that contains all information necessary to initialise the explainer. 81 | 82 | Returns 83 | ------- 84 | A tuple containing the positional and keyword arguments necessary for initialising the explainers. 85 | """ 86 | 87 | groups = data['all']['groups'] 88 | group_names = data['all']['group_names'] 89 | background_data = data['background']['X']['preprocessed'] 90 | assert background_data.shape[0] == 100 91 | init_kwargs = {'link': 'logit', 'feature_names': group_names, 'seed': 0} 92 | fit_kwargs = {'groups': groups, 'group_names': group_names} 93 | predictor = load_model(PREDICTOR_URL) 94 | worker_args = (predictor, background_data, init_kwargs, fit_kwargs) 95 | 96 | return worker_args 97 | 98 | 99 | @ray.remote 100 | def distribute_request(instance: np.ndarray, url: str = "http://localhost:8000/explain") -> str: 101 | """ 102 | Task for distributing the explanations across the backend. 103 | 104 | Parameters 105 | ---------- 106 | instance 107 | Instance to be explained. 108 | url: 109 | The explainer URL. 110 | 111 | Returns 112 | ------- 113 | A str representation of the explanation output json file. 114 | """ 115 | 116 | resp = requests.get(url, json={"array": instance.tolist()}) 117 | return resp.json() 118 | 119 | 120 | def request_explanations(instances: List[np.ndarray], *, url: str) -> namedtuple: 121 | """ 122 | Sends the instances to the explainer URL. 123 | 124 | Parameters 125 | ---------- 126 | instances: 127 | Array of instances to be explained. 128 | url 129 | Explainer endpoint. 
130 | 131 | 132 | Returns 133 | ------- 134 | responses 135 | A named tuple with a `responses` field and a `t_elapsed` field. 136 | """ 137 | 138 | run_output = namedtuple('run_output', 'responses t_elapsed') 139 | tstart = timer() 140 | responses_id = [distribute_request.remote(instance, url=url) for instance in instances] 141 | responses = [ray.get(resp_id) for resp_id in responses_id] 142 | t_elapsed = timer() - tstart 143 | logging.info(f"Time elapsed: {t_elapsed}...") 144 | 145 | return run_output(responses=responses, t_elapsed=t_elapsed) 146 | 147 | 148 | def run_explainer(X_explain: np.ndarray, 149 | n_runs: int, 150 | replicas: int, 151 | max_batch_size: int, 152 | batch_mode: str = 'ray', 153 | url: str = "http://localhost:8000/explain"): 154 | """ 155 | Setup an endpoint and a backend and send requests to the endpoint. 156 | 157 | Parameters 158 | ----------- 159 | X_explain 160 | Instances to be explained. Each row is an instance that is explained independently of others. 161 | n_runs 162 | Number of times to run an experiment where the entire set of explanations is sent to the explainer endpoint. 163 | Used to determine the average runtime given the number of cores. 164 | replicas 165 | How many backend replicas should be used for distributing the workload 166 | max_batch_size 167 | The maximum batch size the explainer accepts. 168 | batch_mode : {'ray', 'default'} 169 | If 'ray', ray_serve components are leveraged for minibatches. Otherwise the input tensor is split into 170 | minibatches which are sent to the endpoint. 171 | url 172 | The url of the explainer endpoint. 173 | """ 174 | 175 | result = {'t_elapsed': []} 176 | # extract instances to be explained from the dataset 177 | assert X_explain.shape[0] == 2560 178 | 179 | # split input into separate requests 180 | if batch_mode == 'ray': 181 | instances = np.split(X_explain, X_explain.shape[0]) # use ray serve to batch the requests 182 | logging.info(f"Explaining {len(instances)} instances...") 183 | else: 184 | instances = batch(X_explain, batch_size=max_batch_size) 185 | logging.info(f"Explaining {len(instances)} mini-batches of size {max_batch_size}...") 186 | 187 | # distribute it 188 | for run in range(n_runs): 189 | logging.info(f"Experiment run: {run}...") 190 | results = request_explanations(instances, url=url) 191 | result['t_elapsed'].append(results.t_elapsed) 192 | 193 | with open(get_filename(replicas, max_batch_size, serve=True), 'wb') as f: 194 | pickle.dump(result, f) 195 | 196 | 197 | def main(): 198 | 199 | if not os.path.exists('results'): 200 | os.mkdir('results') 201 | 202 | data = load_data() 203 | X_explain = data['all']['X']['processed']['test'].toarray() 204 | 205 | max_batch_size = [int(elem) for elem in args.max_batch_size][0] 206 | batch_mode, replicas = args.batch_mode, args.replicas 207 | ray.init(address='auto') # connect to the cluster 208 | serve.init(http_host='0.0.0.0') # listen on 0.0.0.0 to make endpoint accessible from other machines 209 | host, route = os.environ.get("RAY_HEAD_SERVICE_HOST", args.host), "explain" 210 | url = f"http://{host}:{args.port}/{route}" 211 | backend_tag = "kernel_shap:b100" # b100 means 100 background samples 212 | endpoint_tag = f"{backend_tag}_endpoint" 213 | worker_args = prepare_explainer_args(data) 214 | if batch_mode == 'ray': 215 | backend_setup(backend_tag, worker_args, replicas, max_batch_size) 216 | logging.info(f"Batching with max_batch_size of {max_batch_size} ...") 217 | else: # minibatches are sent to the ray worker 218 | 
backend_setup(backend_tag, worker_args, replicas, 1) 219 | logging.info(f"Minibatches distributed of size {max_batch_size} ...") 220 | endpont_setup(endpoint_tag, backend_tag, route=f"/{route}") 221 | 222 | run_explainer(X_explain, args.n_runs, replicas, max_batch_size, batch_mode=batch_mode, url=url) 223 | 224 | 225 | if __name__ == '__main__': 226 | parser = argparse.ArgumentParser() 227 | parser.add_argument( 228 | "-r", 229 | "--replicas", 230 | default=1, 231 | type=int, 232 | help="The number of backend replicas used to serve the explainer." 233 | ) 234 | parser.add_argument( 235 | "-batch", 236 | "--max_batch_size", 237 | nargs='+', 238 | help="A list of values representing the maximum batch size of pending queries sent to the same worker." 239 | "This should only contain one element as the backend is reset from `k8s_benchmark_serve.sh`.", 240 | required=True, 241 | ) 242 | parser.add_argument( 243 | "-batch_mode", 244 | type=str, 245 | default='ray', 246 | help="If set to 'ray' the batching will be leveraging ray serve. Otherwise, the input array is split into " 247 | "minibatches that are sent to the endpoint.", 248 | required=True, 249 | ) 250 | parser.add_argument( 251 | "-n", 252 | "--n_runs", 253 | default=5, 254 | type=int, 255 | help="Controls how many times an experiment is run (in benchmark mode) for a given number of cores to obtain " 256 | "run statistics." 257 | ) 258 | parser.add_argument( 259 | "-ho", 260 | "--host", 261 | default="localhost", 262 | type=str, 263 | help="Hostname." 264 | ) 265 | parser.add_argument( 266 | "-p", 267 | "--port", 268 | default="8000", 269 | type=str, 270 | help="Port." 271 | ) 272 | args = parser.parse_args() 273 | main() 274 | -------------------------------------------------------------------------------- /benchmarks/ray_pool.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import pickle 5 | import ray 6 | 7 | import numpy as np 8 | 9 | from explainers.kernel_shap import KernelShap 10 | from explainers.utils import get_filename, load_data, load_model 11 | from sklearn.metrics import accuracy_score 12 | from typing import Any, Dict 13 | from timeit import default_timer as timer 14 | 15 | logging.basicConfig(level=logging.INFO) 16 | 17 | 18 | def fit_kernel_shap_explainer(clf, data: dict, distributed_opts: Dict[str, Any] = None): 19 | """ 20 | Returns an a fitted KernelShap explainer for the classifier `clf`. The categorical variables are grouped according 21 | to the information specified in `data`. 22 | 23 | Parameters 24 | ---------- 25 | clf 26 | Classifier whose predictions are to be explained. 27 | data 28 | Contains the background data as well as information about the features and the columns in the feature matrix 29 | they occupy. 30 | distributed_opts 31 | Options controlling the number of worker processes that will distribute the workload. 
32 | """ 33 | 34 | pred_fcn = clf.predict_proba 35 | group_names, groups = data['all']['group_names'], data['all']['groups'] 36 | explainer = KernelShap(pred_fcn, link='logit', feature_names=group_names, distributed_opts=distributed_opts, seed=0) 37 | explainer.fit(data['background']['X']['preprocessed'], group_names=group_names, groups=groups) 38 | return explainer 39 | 40 | 41 | def run_explainer(explainer, X_explain: np.ndarray, distributed_opts: dict, nruns: int): 42 | """ 43 | Explain `X_explain` with `explainer` configured with `distributed_opts` `nruns` times in order to obtain 44 | runtime statistics. 45 | 46 | Parameters 47 | --------- 48 | explainer 49 | Fitted KernelShap explainer object 50 | X_explain 51 | Array containing instances to be explained, layed out row-wise. Split into minibatches that are distributed 52 | by the explainer. 53 | distributed_opts 54 | A dictionary of the form:: 55 | 56 | { 57 | 'n_cpus': int - controls the number of workers on which the instances are explained 58 | 'batch_size': int - the size of a minibatch into which the dateset is split 59 | 'actor_cpu_fraction': the fraction of CPU allocated to an actor 60 | } 61 | nruns 62 | Number of times `X_explain` is explained for a given workers and batch size setting. 63 | """ 64 | 65 | if not os.path.exists('./results'): 66 | os.mkdir('./results') 67 | batch_size = distributed_opts['batch_size'] 68 | result = {'t_elapsed': []} 69 | workers = distributed_opts['n_cpus'] 70 | for run in range(nruns): 71 | logging.info(f"run: {run}") 72 | t_start = timer() 73 | explanation = explainer.explain(X_explain, silent=True) 74 | t_elapsed = timer() - t_start 75 | logging.info(f"Time elapsed: {t_elapsed}") 76 | result['t_elapsed'].append(t_elapsed) 77 | 78 | with open(get_filename(workers, batch_size, serve=False), 'wb') as f: 79 | pickle.dump(result, f) 80 | 81 | 82 | def main(): 83 | 84 | # experiment settings 85 | nruns = args.nruns if args.benchmark else 1 86 | batch_sizes = [int(elem) for elem in args.batch] 87 | 88 | # load data and instances to be explained 89 | data = load_data() 90 | predictor = load_model('assets/predictor.pkl') # download if not available locally 91 | y_test, X_test_proc = data['all']['y']['test'], data['all']['X']['processed']['test'] 92 | logging.info(f"Test accuracy: {accuracy_score(y_test, predictor.predict(X_test_proc))}") 93 | X_explain = data['all']['X']['processed']['test'].toarray() # instances to be explained 94 | 95 | if args.workers == -1: # sequential benchmark 96 | logging.info(f"Running sequential benchmark without ray ...") 97 | distributed_opts = {'batch_size': None, 'n_cpus': None, 'actor_cpu_fraction': 1.0} 98 | explainer = fit_kernel_shap_explainer(predictor, data, distributed_opts=distributed_opts) 99 | run_explainer(explainer, X_explain, distributed_opts, nruns) 100 | # run distributed benchmark or simply explain on a number of cores, depending on args.benchmark value 101 | else: 102 | workers_range = range(1, args.workers + 1) if args.benchmark == 1 else range(args.workers, args.workers + 1) 103 | for workers in workers_range: 104 | for batch_size in batch_sizes: 105 | logging.info(f"Running experiment using {workers} actors...") 106 | logging.info(f"Running experiment with batch size {batch_size}") 107 | distributed_opts = {'batch_size': int(batch_size), 'n_cpus': workers, 'actor_cpu_fraction': 1.0} 108 | explainer = fit_kernel_shap_explainer(predictor, data, distributed_opts) 109 | run_explainer(explainer, X_explain, distributed_opts, nruns) 110 | ray.shutdown() 111 
| 112 | 113 | if __name__ == '__main__': 114 | parser = argparse.ArgumentParser() 115 | parser.add_argument( 116 | "-b", 117 | "--batch", 118 | nargs='+', 119 | help="A list of values representing the maximum batch size of instances sent to the same worker.", 120 | required=True, 121 | ) 122 | parser.add_argument( 123 | "-w", 124 | "--workers", 125 | default=-1, 126 | type=int, 127 | help="The number of processes to distribute the explanations dataset on. Set to -1 to run sequenential (without" 128 | "ray) version." 129 | ) 130 | parser.add_argument( 131 | "-benchmark", 132 | default=0, 133 | type=int, 134 | help="Set to 1 to benchmark parallel computation. In this case, explanations are distributed over cores in " 135 | "range(1, args.workers).!" 136 | ) 137 | parser.add_argument( 138 | "-n", 139 | "--nruns", 140 | default=5, 141 | type=int, 142 | help="Controls how many times an experiment is run (in benchmark mode) for a given number of workers to obtain " 143 | "run statistics." 144 | ) 145 | args = parser.parse_args() 146 | main() 147 | -------------------------------------------------------------------------------- /benchmarks/serve_explanations.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import ray 5 | import pickle 6 | import requests 7 | 8 | import numpy as np 9 | 10 | from collections import namedtuple 11 | from ray import serve 12 | from timeit import default_timer as timer 13 | from typing import Any, Dict, Tuple 14 | from explainers.wrappers import BatchKernelShapModel, KernelShapModel 15 | from explainers.utils import get_filename, load_data, load_model 16 | 17 | logging.basicConfig(level=logging.INFO) 18 | 19 | PREDICTOR_URL = 'https://storage.googleapis.com/seldon-models/alibi/distributed_kernel_shap/predictor.pkl' 20 | PREDICTOR_PATH = 'assets/predictor.pkl' 21 | """ 22 | str: The file containing the predictor. The predictor can be created by running `fit_adult_model.py` or output by 23 | calling `utils.utils.load_model()`, which will download a default predictor if `assets/` does not contain one. 24 | """ 25 | 26 | 27 | def endpont_setup(tag: str, backend_tag: str, route: str = "/"): 28 | """ 29 | Creates an endpoint for serving explanations. 30 | Parameters 31 | ---------- 32 | tag 33 | Endpoint tag. 34 | backend_tag 35 | A tag for the backend this explainer will connect to. 36 | route 37 | The URL where the explainer can be queried. 38 | """ 39 | serve.create_endpoint(tag, backend=backend_tag, route=route, methods=["GET"]) 40 | 41 | 42 | def backend_setup(tag: str, worker_args: Tuple, replicas: int, max_batch_size: int) -> None: 43 | """ 44 | Setups the backend for the distributed explanation task. 45 | Parameters 46 | ---------- 47 | tag 48 | A tag for the backend component. The same tag must be passed to `endpoint_setup`. 49 | worker_args 50 | A tuple containing the arguments for initialising the explainer and fitting it. 51 | replicas 52 | The number of backend replicas that serve explanations. 53 | max_batch_size 54 | Maximum number of requests to batch and send to a worker process. 
55 | """ 56 | 57 | serve.init() 58 | 59 | if max_batch_size == 1: 60 | config = {'num_replicas': max(replicas, 1)} 61 | serve.create_backend(tag, KernelShapModel, *worker_args) 62 | else: 63 | config = {'num_replicas': max(replicas, 1), 'max_batch_size': max_batch_size} 64 | serve.create_backend(tag, BatchKernelShapModel, *worker_args) 65 | serve.update_backend_config(tag, config) 66 | 67 | logging.info(f"Backends: {serve.list_backends()}") 68 | 69 | 70 | def prepare_explainer_args(data: Dict[str, Any]) -> Tuple[str, np.ndarray, dict, dict]: 71 | """ 72 | Extracts the name of the features (group_names) and the columns corresponding to each feature in the faeture matrix 73 | (group_names) from the `data` dict and defines the explainer arguments. The background data necessary to initialise 74 | the explainer is also extracted from the same dictionary. 75 | Parameters 76 | ---------- 77 | data 78 | A dictionary that contains all information necessary to initialise the explainer. 79 | Returns 80 | ------- 81 | A tuple containing the positional and keyword arguments necessary for initialising the explainers. 82 | """ 83 | 84 | groups = data['all']['groups'] 85 | group_names = data['all']['group_names'] 86 | background_data = data['background']['X']['preprocessed'] 87 | # assert background_data.shape[0] == 100 88 | init_kwargs = {'link': 'logit', 'feature_names': group_names, 'seed': 0} 89 | fit_kwargs = {'groups': groups, 'group_names': group_names} 90 | predictor = load_model(PREDICTOR_URL) 91 | worker_args = (predictor, background_data, init_kwargs, fit_kwargs) 92 | 93 | return worker_args 94 | 95 | 96 | @ray.remote 97 | def distribute_request(instance: np.ndarray, url: str = "http://localhost:8000") -> str: 98 | """ 99 | Task for distributing the explanations across the backend. 100 | Parameters 101 | ---------- 102 | instance 103 | Instance to be explained. 104 | url: 105 | The explainer URL. 106 | Returns 107 | ------- 108 | A str representation of the explanation output json file. 109 | """ 110 | 111 | resp = requests.get(url, json={"array": instance.tolist()}) 112 | return resp.json() 113 | 114 | 115 | def explain(data: np.ndarray, *, url: str) -> namedtuple: 116 | """ 117 | Sends the requests to the explainer URL. The `data` array is split into sub-array containing only one instance. 118 | Parameters 119 | ---------- 120 | data: 121 | Array of instances to be explained. 122 | url 123 | Explainer endpoint. 124 | Returns 125 | ------- 126 | responses 127 | A named tuple with a `responses` field and a `t_elapsed` field. 128 | """ 129 | 130 | run_output = namedtuple('run_output', 'responses t_elapsed') 131 | instances = np.split(data, data.shape[0]) 132 | logging.info(f"Explaining {len(instances)} instances!") 133 | tstart = timer() 134 | responses_id = [distribute_request.remote(instance, url=url) for instance in instances] 135 | responses = [ray.get(resp_id) for resp_id in responses_id] 136 | t_elapsed = timer() - tstart 137 | logging.info(f"Time elapsed: {t_elapsed}") 138 | 139 | return run_output(responses=responses, t_elapsed=t_elapsed) 140 | 141 | 142 | def distribute_explanations(n_runs: int, replicas: int, max_batch_size: int, address: str = "http://localhost:8000"): 143 | """ 144 | Setup an endpoint and a backend and send requests to the endpoint. 145 | Parameters 146 | ----------- 147 | n_runs 148 | Number of times to run an experiment where the entire set of explanations is sent to the explainer endpoint. 149 | Used to determine the average runtime given the number of cores. 
150 | replicas 151 | How many backend replicas should be used for distributing the workload 152 | max_batch_size 153 | The maximum batch size the explainer accepts. 154 | address 155 | The url for the explainer endpoint. 156 | """ 157 | 158 | result = {'t_elapsed': []} 159 | route = "explain" 160 | backend_tag = "kernel_shap:b100" # b100 means 100 background samples 161 | endpoint_tag = f"{backend_tag}_endpoint" 162 | data = load_data() 163 | worker_args = prepare_explainer_args(data) 164 | backend_setup(backend_tag, worker_args, replicas, max_batch_size) 165 | endpont_setup(endpoint_tag, backend_tag, route=f"/{route}") 166 | # extract instances to be explained from the dataset 167 | X_explain = data['all']['X']['processed']['test'].toarray() 168 | assert X_explain.shape[0] == 2560 169 | for run in range(n_runs): 170 | logging.info(f"Experiment run: {run}") 171 | results = explain(X_explain, url=f"{address}/{route}") 172 | result['t_elapsed'].append(results.t_elapsed) 173 | 174 | with open(get_filename(replicas, max_batch_size, serve=True), 'wb') as f: 175 | pickle.dump(result, f) 176 | 177 | ray.shutdown() 178 | 179 | 180 | def main(): 181 | 182 | if not os.path.exists('results'): 183 | os.mkdir('results') 184 | 185 | address = f"http://{args.host}:{args.port}" 186 | batch_size_limits = [int(elem) for elem in args.max_batch_size] 187 | if args.benchmark: 188 | for replicas in range(1, args.replicas + 1): 189 | logging.info(f"Running on {replicas} backend replicas!") 190 | for max_batch_size in batch_size_limits: 191 | logging.info(f"Batching with max_batch_size of {max_batch_size}") 192 | distribute_explanations(args.nruns, replicas, max_batch_size, address=address) 193 | else: 194 | nruns = 1 195 | for max_batch_size in batch_size_limits: 196 | distribute_explanations(nruns, args.replicas, max_batch_size, address=address) 197 | 198 | 199 | if __name__ == '__main__': 200 | parser = argparse.ArgumentParser() 201 | parser.add_argument( 202 | "-r", 203 | "--replicas", 204 | default=1, 205 | type=int, 206 | help="The number of backend replicas used to serve the explainer." 207 | ) 208 | parser.add_argument( 209 | "-batch", 210 | "--max_batch_size", 211 | nargs='+', 212 | help="A list of values representing the maximum batch size of pending queries sent to the same worker.", 213 | required=True, 214 | ) 215 | parser.add_argument( 216 | "-benchmark", 217 | default=0, 218 | type=int, 219 | help="Set to 1 to benchmark parallel computation. In this case, explanations are distributed over replicas in " 220 | "range(1, args.replicas).!" 221 | ) 222 | parser.add_argument( 223 | "-n", 224 | "--nruns", 225 | default=5, 226 | type=int, 227 | help="Controls how many times an experiment is run (in benchmark mode) for a given number of cores to obtain " 228 | "run statistics." 229 | ) 230 | parser.add_argument( 231 | "-ho", 232 | "--host", 233 | default="localhost", 234 | type=str, 235 | help="Hostname." 236 | ) 237 | parser.add_argument( 238 | "-p", 239 | "--port", 240 | default="8000", 241 | type=str, 242 | help="Port." 
243 | ) 244 | args = parser.parse_args() 245 | main() 246 | -------------------------------------------------------------------------------- /cluster/Makefile.pool: -------------------------------------------------------------------------------- 1 | NAMESPACE ?= pool-ray-cluster 2 | WORKERS ?= 2 3 | BATCH ?= 1 5 10 4 | 5 | SHELL := /bin/bash 6 | 7 | 8 | .ONESHELL: 9 | 10 | deploy: 11 | kubectl apply -f ray_pool_cluster.yaml 12 | kubectl rollout status deployment/ray-head -n ${NAMESPACE} 13 | kubectl rollout status deployment/ray-worker -n ${NAMESPACE} 14 | 15 | destroy: 16 | kubectl delete -f ray_pool_cluster.yaml 17 | 18 | reset: 19 | kubectl delete -n ${NAMESPACE} pod --all 20 | kubectl rollout status deployment/ray-head -n ${NAMESPACE} 21 | kubectl rollout status deployment/ray-worker -n ${NAMESPACE} 22 | 23 | upload-script: 24 | POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"` 25 | kubectl cp -n ${NAMESPACE} ../benchmarks/k8s_ray_pool.py $${POD}:k8s_ray_pool.py 26 | 27 | run-experiment: 28 | POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"` 29 | kubectl exec -it -n ${NAMESPACE} $${POD} -- python -W ignore k8s_ray_pool.py --batch ${BATCH} --workers ${WORKERS} 30 | 31 | pull-results: 32 | POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"` 33 | kubectl cp ${NAMESPACE}/$${POD}:/distributed_explainers/results ../results/ 34 | -------------------------------------------------------------------------------- /cluster/Makefile.serve: -------------------------------------------------------------------------------- 1 | NAMESPACE ?= kernel-shap-ray-cluster 2 | WORKERS ?= 2 3 | BATCH_MODE ?= ray 4 | # do not pass a list of values here as they are ignored in the python script. Batch is changed via bash. 
5 | BATCH ?= 5
6 | 
7 | SHELL := /bin/bash
8 | 
9 | 
10 | .ONESHELL:
11 | 
12 | deploy:
13 | 	kubectl apply -f ray_cluster.yaml
14 | 	kubectl rollout status deployment/ray-head -n ${NAMESPACE}
15 | 	kubectl rollout status deployment/ray-worker -n ${NAMESPACE}
16 | 
17 | destroy:
18 | 	kubectl delete -f ray_cluster.yaml
19 | 
20 | reset:
21 | 	kubectl delete -n ${NAMESPACE} pod --all
22 | 	kubectl rollout status deployment/ray-head -n ${NAMESPACE}
23 | 	kubectl rollout status deployment/ray-worker -n ${NAMESPACE}
24 | 
25 | upload-script:
26 | 	POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"`
27 | 	kubectl cp -n ${NAMESPACE} ../benchmarks/k8s_serve_explanations.py $${POD}:k8s_serve_explanations.py
28 | 
29 | run-experiment:
30 | 	POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"`
31 | 	kubectl exec -it -n ${NAMESPACE} $${POD} -- python -W ignore k8s_serve_explanations.py -batch ${BATCH} -r ${WORKERS} -batch_mode ${BATCH_MODE}
32 | 
33 | pull-results:
34 | 	POD=`kubectl -n ${NAMESPACE} get pod -l component=ray-head -o jsonpath="{.items[0].metadata.name}"`
35 | 	kubectl cp ${NAMESPACE}/$${POD}:/distributed_explainers/results ../results/
36 | 
--------------------------------------------------------------------------------
/cluster/README.md:
--------------------------------------------------------------------------------
1 | # Running distributed KernelSHAP
2 | 
3 | To create a virtual environment that allows you to run KernelSHAP in a distributed fashion with [`ray`](https://github.com/ray-project/ray), you first need to configure your environment, which requires [`conda`](https://problemsolvingwithpython.com/01-Orientation/01.05-Installing-Anaconda-on-Linux/) to be installed. You can then run the command:
4 | 
5 | `conda env create -f environment.yml -p /home/user/anaconda3/envs/env_name`
6 | 
7 | to create the environment and then activate it with `conda activate shap`. If you do not wish to change the installation path, you can skip the `-p` option. You are now ready to run the experiments. The steps involved are:
8 | 
9 | 1. data processing
10 | 2. running the experiments
11 | 
12 | To process the data it is sufficient to run `python preprocess_data.py` with the default options. This will output a preprocessed version of the [`Adult`](http://archive.ics.uci.edu/ml/datasets/Adult) dataset and a partition of it that is used to initialise the KernelSHAP explainer. However, you can proceed to step 2 if you don't intend to change the default parameters, as the same data will be downloaded automatically.
13 | 
14 | You can run an experiment with the command `python experiment.py`. By default, this will run the explainer on the `2560` examples from the `Adult` dataset with a background dataset of `100` samples, sequentially (5 times if the `-benchmark 1` option is passed to it). The results are saved in the `results/` folder. If you wish to run the same explanations in parallel, then run the command
15 | 
16 | `python experiment.py -cores 3`
17 | 
18 | which will use `ray` to perform explanations across multiple cores.
19 | 
20 | Other options for the script are:
21 | 
22 | - `-benchmark`: if set to 1, `-cores` will be treated as the upper bound on the number of cores used to compute the explanations. The lower bound is `2`, and the explanations are computed 5 times (by default) to provide runtime averages. The number of repetitions can be controlled using the `-nruns` argument.
23 | - `-batch_size`: controls how many instances are explained by a core at once. This parameter has an important bearing to the code runtime performance 24 | -------------------------------------------------------------------------------- /cluster/ray_cluster.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: kernel-shap-ray-cluster 5 | --- 6 | 7 | # Ray head node service, allowing worker pods to discover the head node. 8 | apiVersion: v1 9 | kind: Service 10 | metadata: 11 | namespace: kernel-shap-ray-cluster 12 | name: ray-head 13 | spec: 14 | ports: 15 | # Redis ports. 16 | - name: redis-primary 17 | port: 6379 18 | targetPort: 6379 19 | - name: redis-shard-0 20 | port: 6380 21 | targetPort: 6380 22 | - name: redis-shard-1 23 | port: 6381 24 | targetPort: 6381 25 | 26 | # Ray internal communication ports. 27 | - name: object-manager 28 | port: 12345 29 | targetPort: 12345 30 | - name: node-manager 31 | port: 12346 32 | targetPort: 12346 33 | - name: serve-explain 34 | port: 8000 35 | targetPort: 8000 36 | 37 | selector: 38 | component: ray-head 39 | 40 | --- 41 | 42 | apiVersion: apps/v1 43 | kind: Deployment 44 | metadata: 45 | namespace: kernel-shap-ray-cluster 46 | name: ray-head 47 | spec: 48 | # Do not change this - Ray currently only supports one head node per cluster. 49 | replicas: 1 50 | selector: 51 | matchLabels: 52 | component: ray-head 53 | type: ray 54 | template: 55 | metadata: 56 | labels: 57 | component: ray-head 58 | type: ray 59 | spec: 60 | # If the head node goes down, the entire cluster (including all worker 61 | # nodes) will go down as well. If you want Kubernetes to bring up a new 62 | # head node in this case, set this to "Always," else set it to "Never." 63 | restartPolicy: Always 64 | 65 | # This volume allocates shared memory for Ray to use for its plasma 66 | # object store. If you do not provide this, Ray will fall back to 67 | # /tmp which cause slowdowns if is not a shared memory volume. 68 | volumes: 69 | - name: dshm 70 | emptyDir: 71 | medium: Memory 72 | containers: 73 | - name: ray-head 74 | image: alexcoca/distributedkernelshap:0.6 75 | imagePullPolicy: Always 76 | command: [ "/bin/bash", "-c", "--" ] 77 | args: 78 | - "ray start --head --node-ip-address=$MY_POD_IP --redis-port=6379 --redis-shard-ports=6380,6381 --num-cpus=$MY_CPU_REQUEST --object-manager-port=12345 --node-manager-port=12346 --block" 79 | ports: 80 | - containerPort: 6379 # Redis port. 81 | - containerPort: 6380 # Redis port. 82 | - containerPort: 6381 # Redis port. 83 | - containerPort: 12345 # Ray internal communication. 84 | - containerPort: 12346 # Ray internal communication. 85 | - containerPort: 8000 86 | 87 | # This volume allocates shared memory for Ray to use for its plasma 88 | # object store. If you do not provide this, Ray will fall back to 89 | # /tmp which cause slowdowns if is not a shared memory volume. 90 | volumeMounts: 91 | - mountPath: /dev/shm 92 | name: dshm 93 | env: 94 | - name: MY_POD_IP 95 | valueFrom: 96 | fieldRef: 97 | fieldPath: status.podIP 98 | 99 | # This is used in the ray start command so that Ray can spawn the 100 | # correct number of processes. Omitting this may lead to degraded 101 | # performance. 
102 | - name: MY_CPU_REQUEST 103 | valueFrom: 104 | resourceFieldRef: 105 | resource: requests.cpu 106 | resources: 107 | requests: 108 | cpu: 1 109 | memory: 512Mi 110 | 111 | --- 112 | apiVersion: apps/v1 113 | kind: Deployment 114 | metadata: 115 | namespace: kernel-shap-ray-cluster 116 | name: ray-worker 117 | spec: 118 | # Change this to scale the number of worker nodes started in the Ray cluster. 119 | replicas: 14 120 | selector: 121 | matchLabels: 122 | component: ray-worker 123 | type: ray 124 | template: 125 | metadata: 126 | labels: 127 | component: ray-worker 128 | type: ray 129 | spec: 130 | restartPolicy: Always 131 | volumes: 132 | - name: dshm 133 | emptyDir: 134 | medium: Memory 135 | containers: 136 | - name: ray-worker 137 | image: alexcoca/distributedkernelshap:0.6 138 | imagePullPolicy: Always 139 | command: ["/bin/bash", "-c", "--"] 140 | args: 141 | - "ray start --node-ip-address=$MY_POD_IP --num-cpus=$MY_CPU_REQUEST --address=$RAY_HEAD_SERVICE_HOST:$RAY_HEAD_SERVICE_PORT_REDIS_PRIMARY --object-manager-port=12345 --node-manager-port=12346 --block" 142 | ports: 143 | - containerPort: 12345 # Ray internal communication. 144 | - containerPort: 12346 # Ray internal communication. 145 | volumeMounts: 146 | - mountPath: /dev/shm 147 | name: dshm 148 | env: 149 | - name: MY_POD_IP 150 | valueFrom: 151 | fieldRef: 152 | fieldPath: status.podIP 153 | 154 | # This is used in the ray start command so that Ray can spawn the 155 | # correct number of processes. Omitting this may lead to degraded 156 | # performance. 157 | - name: MY_CPU_REQUEST 158 | valueFrom: 159 | resourceFieldRef: 160 | resource: requests.cpu 161 | resources: 162 | requests: 163 | cpu: 4 164 | memory: 512Mi 165 | -------------------------------------------------------------------------------- /cluster/ray_pool_cluster.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: pool-ray-cluster 5 | --- 6 | 7 | # Ray head node service, allowing worker pods to discover the head node. 8 | apiVersion: v1 9 | kind: Service 10 | metadata: 11 | namespace: pool-ray-cluster 12 | name: ray-head 13 | spec: 14 | ports: 15 | # Redis ports. 16 | - name: redis-primary 17 | port: 6379 18 | targetPort: 6379 19 | - name: redis-shard-0 20 | port: 6380 21 | targetPort: 6380 22 | - name: redis-shard-1 23 | port: 6381 24 | targetPort: 6381 25 | 26 | # Ray internal communication ports. 27 | - name: object-manager 28 | port: 12345 29 | targetPort: 12345 30 | - name: node-manager 31 | port: 12346 32 | targetPort: 12346 33 | - name: serve-explain 34 | port: 8000 35 | targetPort: 8000 36 | 37 | selector: 38 | component: ray-head 39 | 40 | --- 41 | 42 | apiVersion: apps/v1 43 | kind: Deployment 44 | metadata: 45 | namespace: pool-ray-cluster 46 | name: ray-head 47 | spec: 48 | # Do not change this - Ray currently only supports one head node per cluster. 49 | replicas: 1 50 | selector: 51 | matchLabels: 52 | component: ray-head 53 | type: ray 54 | template: 55 | metadata: 56 | labels: 57 | component: ray-head 58 | type: ray 59 | spec: 60 | # If the head node goes down, the entire cluster (including all worker 61 | # nodes) will go down as well. If you want Kubernetes to bring up a new 62 | # head node in this case, set this to "Always," else set it to "Never." 63 | restartPolicy: Always 64 | 65 | # This volume allocates shared memory for Ray to use for its plasma 66 | # object store. 
If you do not provide this, Ray will fall back to 67 | # /tmp which cause slowdowns if is not a shared memory volume. 68 | volumes: 69 | - name: dshm 70 | emptyDir: 71 | medium: Memory 72 | containers: 73 | - name: ray-head 74 | image: alexcoca/distributedkernelshap:0.6 75 | imagePullPolicy: Always 76 | command: [ "/bin/bash", "-c", "--" ] 77 | args: 78 | - "ray start --head --node-ip-address=$MY_POD_IP --redis-port=6379 --redis-shard-ports=6380,6381 --num-cpus=$MY_CPU_REQUEST --object-manager-port=12345 --node-manager-port=12346 --block" 79 | ports: 80 | - containerPort: 6379 # Redis port. 81 | - containerPort: 6380 # Redis port. 82 | - containerPort: 6381 # Redis port. 83 | - containerPort: 12345 # Ray internal communication. 84 | - containerPort: 12346 # Ray internal communication. 85 | - containerPort: 8000 86 | 87 | # This volume allocates shared memory for Ray to use for its plasma 88 | # object store. If you do not provide this, Ray will fall back to 89 | # /tmp which cause slowdowns if is not a shared memory volume. 90 | volumeMounts: 91 | - mountPath: /dev/shm 92 | name: dshm 93 | env: 94 | - name: MY_POD_IP 95 | valueFrom: 96 | fieldRef: 97 | fieldPath: status.podIP 98 | 99 | # This is used in the ray start command so that Ray can spawn the 100 | # correct number of processes. Omitting this may lead to degraded 101 | # performance. 102 | - name: MY_CPU_REQUEST 103 | valueFrom: 104 | resourceFieldRef: 105 | resource: requests.cpu 106 | resources: 107 | requests: 108 | cpu: 1 109 | memory: 512Mi 110 | 111 | --- 112 | apiVersion: apps/v1 113 | kind: Deployment 114 | metadata: 115 | namespace: pool-ray-cluster 116 | name: ray-worker 117 | spec: 118 | # Change this to scale the number of worker nodes started in the Ray cluster. 119 | replicas: 14 120 | selector: 121 | matchLabels: 122 | component: ray-worker 123 | type: ray 124 | template: 125 | metadata: 126 | labels: 127 | component: ray-worker 128 | type: ray 129 | spec: 130 | restartPolicy: Always 131 | volumes: 132 | - name: dshm 133 | emptyDir: 134 | medium: Memory 135 | containers: 136 | - name: ray-worker 137 | image: alexcoca/distributedkernelshap:0.6 138 | imagePullPolicy: Always 139 | command: ["/bin/bash", "-c", "--"] 140 | args: 141 | - "ray start --node-ip-address=$MY_POD_IP --num-cpus=$MY_CPU_REQUEST --address=$RAY_HEAD_SERVICE_HOST:$RAY_HEAD_SERVICE_PORT_REDIS_PRIMARY --object-manager-port=12345 --node-manager-port=12346 --block" 142 | ports: 143 | - containerPort: 12345 # Ray internal communication. 144 | - containerPort: 12346 # Ray internal communication. 145 | volumeMounts: 146 | - mountPath: /dev/shm 147 | name: dshm 148 | env: 149 | - name: MY_POD_IP 150 | valueFrom: 151 | fieldRef: 152 | fieldPath: status.podIP 153 | 154 | # This is used in the ray start command so that Ray can spawn the 155 | # correct number of processes. Omitting this may lead to degraded 156 | # performance. 157 | - name: MY_CPU_REQUEST 158 | valueFrom: 159 | resourceFieldRef: 160 | resource: requests.cpu 161 | resources: 162 | requests: 163 | cpu: 4 164 | memory: 512Mi 165 | -------------------------------------------------------------------------------- /dockerfiles/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM rayproject/autoscaler:ray-0.8.6 2 | WORKDIR /distributed_explainers 3 | COPY pyproject.toml . 4 | COPY explainers ./explainers 5 | RUN conda install python=3.7 6 | RUN pip install . 
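# The image can be built and pushed with the Makefile in this directory, e.g.
# `make kernel-shap-image` followed by `make push-kernel-shap-image` (see below).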
7 | -------------------------------------------------------------------------------- /dockerfiles/Makefile: -------------------------------------------------------------------------------- 1 | DOCKER_REPOSITORY ?= alexcoca 2 | 3 | IMAGE_NAME ?= distributedkernelshap 4 | IMAGE_VERSION ?= 0.6 5 | 6 | kernel-shap-image: 7 | docker build ../ -f Dockerfile -t ${IMAGE_NAME}:${IMAGE_VERSION} 8 | docker tag ${IMAGE_NAME}:${IMAGE_VERSION} ${DOCKER_REPOSITORY}/${IMAGE_NAME}:${IMAGE_VERSION} 9 | 10 | push-kernel-shap-image: kernel-shap-image 11 | docker push ${DOCKER_REPOSITORY}/${IMAGE_NAME}:${IMAGE_VERSION} 12 | -------------------------------------------------------------------------------- /explainers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/explainers/__init__.py -------------------------------------------------------------------------------- /explainers/distributed.py: -------------------------------------------------------------------------------- 1 | import ray 2 | 3 | import numpy as np 4 | 5 | from functools import partial 6 | from scipy import sparse 7 | from typing import Any, Callable, Dict, List, Optional, Union 8 | from explainers.utils import batch 9 | 10 | 11 | def kernel_shap_target_fn(actor: Any, instances: tuple, kwargs: Optional[Dict] = None) -> Callable: 12 | """ 13 | A target function that is executed in parallel given an actor pool. Its arguments must be an actor and a batch of 14 | values to be processed by the actor. Its role is to execute distributed computations when an actor is available. 15 | 16 | Parameters 17 | ---------- 18 | actor 19 | A `ray` actor. This is typically a class decorated with the @ray.remote decorator, that has been subsequently 20 | instantiated using cls.remote(*args, **kwargs). 21 | instances 22 | A (batch_index, batch) tuple containing the batch of instances to be explained along with a batch index. 23 | kwargs 24 | A list of keyword arguments for the actor `shap_values` method. 25 | 26 | Returns 27 | ------- 28 | A callable that can be used as a target process for a parallel pool of actor objects. 29 | """ 30 | 31 | if kwargs is None: 32 | kwargs = {} 33 | 34 | return actor.get_explanation.remote(instances, **kwargs) 35 | 36 | 37 | def kernel_shap_postprocess_fn(ordered_result: List[Union[np.ndarray, List[np.ndarray]]]) \ 38 | -> List[Union[np.ndarray, List[np.ndarray]]]: 39 | """ 40 | Merges the results of the batched computation for KernelShap. 41 | 42 | Parameters 43 | ---------- 44 | ordered_result 45 | A list containing the results for each batch, in the order that the batch was submitted to the parallel pool. 46 | It may contain: 47 | - `np.ndarray` objects (single-output predictor) 48 | - lists of `np.ndarray` objects (multi-output predictors) 49 | 50 | Returns 51 | ------- 52 | concatenated 53 | A list containing the concatenated results for all the batches. 54 | """ 55 | if isinstance(ordered_result[0], np.ndarray): 56 | return np.concatenate(ordered_result, axis=0) 57 | 58 | # concatenate explanations for every class 59 | n_classes = len(ordered_result[0]) 60 | to_concatenate = [list(zip(*ordered_result))[idx] for idx in range(n_classes)] 61 | concatenated = [np.concatenate(arrays, axis=0) for arrays in to_concatenate] 62 | return concatenated 63 | 64 | 65 | def invert_permutation(p: list): 66 | """ 67 | Inverts a permutation. 
68 | 69 | Parameters 70 | ---------- 71 | p 72 | Some permutation of 0, 1, ..., len(p)-1. 73 | 74 | Returns 75 | ------- 76 | s 77 | `s[i]` gives the index of `i` in `p`. 78 | """ 79 | 80 | s = np.empty_like(p) 81 | s[p] = np.arange(len(p)) 82 | return s 83 | 84 | 85 | class DistributedExplainer: 86 | """ 87 | A class that orchestrates the execution of a batch of explanations in parallel. 88 | """ 89 | 90 | def __init__(self, distributed_opts, explainer_type, init_args, init_kwargs): 91 | 92 | self.n_jobs = distributed_opts['n_cpus'] 93 | self.n_actors = int(distributed_opts['n_cpus'] // distributed_opts['actor_cpu_fraction']) 94 | self.actor_cpu_frac = distributed_opts['actor_cpu_fraction'] 95 | self.batch_size = distributed_opts['batch_size'] 96 | self.algorithm = distributed_opts['algorithm'] 97 | self.target_fn = globals()[f"{distributed_opts['algorithm']}_target_fn"] 98 | try: 99 | self.post_process_fcn = globals()[f"{distributed_opts['algorithm']}_postprocess_fn"] 100 | except KeyError: 101 | self.post_process_fcn = None 102 | 103 | self.explainer = explainer_type 104 | self.explainer_args = init_args 105 | self.explainer_kwargs = init_kwargs 106 | 107 | if not ray.is_initialized(): 108 | print(f"Initialising ray on {distributed_opts['n_cpus']} cpus!") 109 | ray.init(num_cpus=distributed_opts['n_cpus']) 110 | 111 | self.pool = self.create_parallel_pool() 112 | 113 | def __getattr__(self, item): 114 | """ 115 | Access to actor attributes. Should be used to retrieve only state that is shared by all actors in the pool. 116 | """ 117 | actor = self.pool._idle_actors[0] 118 | return ray.get(actor.return_attribute.remote(item)) 119 | 120 | def create_parallel_pool(self): 121 | """ 122 | Creates a pool of actors (i.e., processes containing explainers) that can execute explanations in parallel. 123 | """ 124 | 125 | actor_handles = [ray.remote(self.explainer).options(num_cpus=self.actor_cpu_frac) for _ in range(self.n_actors)] 126 | 127 | actors = [handle.remote(*self.explainer_args, **self.explainer_kwargs) for handle in actor_handles] 128 | return ray.util.ActorPool(actors) 129 | 130 | def get_explanation(self, X: np.ndarray, **kwargs) -> np.ndarray: 131 | """ 132 | Performs distributed explanations of instances in `X`. 133 | 134 | Parameters 135 | ---------- 136 | X 137 | A batch of instances to be explained. Split into batches according to the settings passed to the constructor. 138 | kwargs 139 | Any keyword-arguments for the explainer `explain` method. 140 | 141 | Returns 142 | -------- 143 | An array of explanations. 144 | """ # noqa E501 145 | 146 | if kwargs is not None: 147 | target_fn = partial(self.target_fn, kwargs=kwargs) 148 | else: 149 | target_fn = self.target_fn 150 | batched_instances = batch(X, batch_size=self.batch_size, n_batches=self.n_jobs) 151 | 152 | unordered_explanations = self.pool.map_unordered(target_fn, enumerate(batched_instances)) 153 | 154 | return self.order_result(unordered_explanations) 155 | 156 | def order_result(self, unordered_result: List[tuple]) -> np.ndarray: 157 | """ 158 | Re-orders the result of a distributed explainer so that the explanations follow the same order as the input to 159 | the explainer. 160 | 161 | 162 | Parameters 163 | ---------- 164 | unordered_result 165 | Each tuple contains the batch id as the first entry and the explanations for that batch as the second.
166 | 167 | Returns 168 | ------- 169 | A numpy array where the the batches ordered according to their batch id are concatenated in a single array. 170 | """ 171 | 172 | # TODO: THIS DOES NOT LEVERAGE THE FACT THAT THE RESULTS ARE RETURNED AS AVAILABLE. ISSUE TO BE RAISED. 173 | 174 | result_order, results = list(zip(*[(idx, res) for idx, res in unordered_result])) 175 | orig_order = invert_permutation(list(result_order)) 176 | ordered_result = [results[idx] for idx in orig_order] 177 | if self.post_process_fcn is not None: 178 | return self.post_process_fcn(ordered_result) 179 | return ordered_result 180 | -------------------------------------------------------------------------------- /explainers/interface.py: -------------------------------------------------------------------------------- 1 | import abc 2 | import attr 3 | import copy 4 | import json 5 | import logging 6 | 7 | import numpy as np 8 | 9 | from collections import ChainMap 10 | from prettyprinter import pretty_repr 11 | from typing import Any 12 | 13 | # KernelSHAP 14 | DEFAULT_META_KERNEL_SHAP = { 15 | "name": None, 16 | "type": ["blackbox"], 17 | "task": None, 18 | "explanations": ["local", "global"], 19 | "params": {} 20 | } # type: dict 21 | """ 22 | Default KernelSHAP metadata. 23 | """ 24 | 25 | DEFAULT_DATA_KERNEL_SHAP = { 26 | "shap_values": [], 27 | "expected_value": [], 28 | "link": 'identity', 29 | "categorical_names": {}, 30 | "feature_names": [], 31 | "raw": { 32 | "raw_prediction": None, 33 | "prediction": None, 34 | "instances": None, 35 | "importances": {}, 36 | } 37 | } # type: dict 38 | """ 39 | Default KernelSHAP data. 40 | """ 41 | 42 | 43 | logger = logging.getLogger(__name__) 44 | 45 | # default metadata 46 | DEFAULT_META = { 47 | "name": None, 48 | "type": [], 49 | "explanations": [], 50 | "params": {}, 51 | } # type: dict 52 | 53 | 54 | @attr.s 55 | class Explainer(abc.ABC): 56 | """ 57 | Base class for explainer algorithms 58 | """ 59 | 60 | meta = attr.ib(default=copy.deepcopy(DEFAULT_META), repr=pretty_repr) # type: dict 61 | 62 | def __attrs_post_init__(self): 63 | # add a name to the metadata dictionary 64 | self.meta["name"] = self.__class__.__name__ 65 | 66 | # expose keys stored in self.meta as attributes of the class. 67 | for key, value in self.meta.items(): 68 | setattr(self, key, value) 69 | 70 | @abc.abstractmethod 71 | def explain(self, X: Any) -> "Explanation": 72 | pass 73 | 74 | 75 | class FitMixin(abc.ABC): 76 | @abc.abstractmethod 77 | def fit(self, X: Any) -> "Explainer": 78 | pass 79 | 80 | 81 | @attr.s 82 | class Explanation: 83 | """ 84 | Explanation class returned by explainers. 85 | """ 86 | meta = attr.ib(repr=pretty_repr) # type: dict 87 | data = attr.ib(repr=pretty_repr) # type: dict 88 | 89 | def __attrs_post_init__(self): 90 | """ 91 | Expose keys stored in self.meta and self.data as attributes of the class. 92 | """ 93 | for key, value in ChainMap(self.meta, self.data).items(): 94 | setattr(self, key, value) 95 | 96 | def to_json(self) -> str: 97 | """ 98 | Serialize the explanation data and metadata into a json format. 99 | 100 | Returns 101 | ------- 102 | String containing json representation of the explanation 103 | """ 104 | return json.dumps(attr.asdict(self), cls=NumpyEncoder) 105 | 106 | @classmethod 107 | def from_json(cls, jsonrepr) -> "Explanation": 108 | """ 109 | Create an instance of an Explanation class using a json representation of the Explanation. 
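For example, `Explanation.from_json(explanation.to_json())` round-trips an explanation; note that numpy arrays serialised via `NumpyEncoder` come back as plain lists after deserialisation.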
110 | 111 | Parameters 112 | ---------- 113 | jsonrepr 114 | json representation of an explanation 115 | 116 | Returns 117 | ------- 118 | An Explanation object 119 | """ 120 | dictrepr = json.loads(jsonrepr) 121 | try: 122 | meta = dictrepr['meta'] 123 | data = dictrepr['data'] 124 | except KeyError: 125 | logger.exception("Invalid explanation representation") 126 | return cls(meta=meta, data=data) 127 | 128 | def __getitem__(self, item): 129 | """ 130 | This method is purely for deprecating previous behaviour of accessing explanation 131 | data via items in the returned dictionary. 132 | """ 133 | import warnings 134 | msg = "The Explanation object is not a dictionary anymore and accessing elements should " \ 135 | "be done via attribute access. Accessing via item will stop working in a future version." 136 | warnings.warn(msg, DeprecationWarning, stacklevel=2) 137 | return getattr(self, item) 138 | 139 | 140 | class NumpyEncoder(json.JSONEncoder): 141 | def default(self, obj): 142 | if isinstance( 143 | obj, 144 | ( 145 | np.int_, 146 | np.intc, 147 | np.intp, 148 | np.int8, 149 | np.int16, 150 | np.int32, 151 | np.int64, 152 | np.uint8, 153 | np.uint16, 154 | np.uint32, 155 | np.uint64, 156 | ), 157 | ): 158 | return int(obj) 159 | elif isinstance(obj, (np.float_, np.float16, np.float32, np.float64)): 160 | return float(obj) 161 | elif isinstance(obj, (np.ndarray,)): 162 | return obj.tolist() 163 | return json.JSONEncoder.default(self, obj) 164 | -------------------------------------------------------------------------------- /explainers/kernel_shap.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import logging 3 | import shap 4 | import warnings 5 | 6 | import numpy as np 7 | import pandas as pd 8 | 9 | from explainers.interface import DEFAULT_META_KERNEL_SHAP, DEFAULT_DATA_KERNEL_SHAP, Explanation, Explainer, FitMixin 10 | from explainers.utils import methdispatch 11 | from explainers.distributed import DistributedExplainer 12 | from functools import partial 13 | from scipy import sparse 14 | from shap import KernelExplainer 15 | from shap.common import DenseData, DenseDataWithIndex, convert_to_link 16 | from typing import Any, Callable, Dict, List, Optional, Sequence, Union, Tuple, TYPE_CHECKING 17 | 18 | if TYPE_CHECKING: 19 | import catboost # noqa F401 20 | 21 | logger = logging.getLogger(__name__) 22 | 23 | KERNEL_SHAP_PARAMS = [ 24 | 'link', 25 | 'group_names', 26 | 'groups', 27 | 'weights', 28 | 'summarise_background', 29 | 'summarise_result', 30 | 'kwargs', 31 | ] 32 | 33 | KERNEL_SHAP_BACKGROUND_THRESHOLD = 300 34 | 35 | 36 | def rank_by_importance(shap_values: List[np.ndarray], 37 | feature_names: Union[List[str], Tuple[str], None] = None) -> Dict: 38 | """ 39 | Given the shap values estimated for a multi-output model, this function ranks 40 | features according to their importance. The feature importance is the average 41 | absolute value for a given feature. 42 | 43 | Parameters 44 | ---------- 45 | shap_values 46 | Each element corresponds to a samples x features array of shap values corresponding 47 | to each model output. 48 | feature_names 49 | Each element is the name of the column with the corresponding index in each of the 50 | arrays in the `shap_values` list. 
51 | 52 | Returns 53 | ------- 54 | importances 55 | A dictionary of the form:: 56 | 57 | { 58 | '0': {'ranked_effect': array([0.2, 0.5, ...]), 'names': ['feat_3', 'feat_5', ...]}, 59 | '1': {'ranked_effect': array([0.3, 0.2, ...]), 'names': ['feat_6', 'feat_1', ...]}, 60 | ... 61 | 'aggregated': {'ranked_effect': array([0.9, 0.7, ...]), 'names': ['feat_3', 'feat_6', ...]} 62 | } 63 | 64 | The keys of the first level represent the index of the model output. The feature effects in 65 | `ranked_effect` and the corresponding feature names in `names` are sorted from highest (most 66 | important) to lowest (least important). The values in the `aggregated` field are obtained by 67 | summing the shap values for all the model outputs and then computing the effects. Given an 68 | output, the effects are defined as the average magnitude of the shap values across the instances 69 | to be explained. 70 | """ 71 | 72 | if len(shap_values[0].shape) == 1: 73 | shap_values = [np.atleast_2d(arr) for arr in shap_values] 74 | 75 | if not feature_names: 76 | feature_names = ['feature_{}'.format(i) for i in range(shap_values[0].shape[1])] 77 | else: 78 | if len(feature_names) != shap_values[0].shape[1]: 79 | msg = "The feature names provided do not match the number of shap values estimated. " \ 80 | "Received {} feature names but estimated {} shap values!" 81 | logger.warning(msg.format(len(feature_names), shap_values[0].shape[1])) 82 | feature_names = ['feature_{}'.format(i) for i in range(shap_values[0].shape[1])] 83 | 84 | importances = {} # type: Dict[str, Dict[str, np.ndarray]] 85 | avg_mag = [] # type: List 86 | 87 | # rank the features by average shap value for each class in turn 88 | for class_idx in range(len(shap_values)): 89 | avg_mag_shap = np.abs(shap_values[class_idx]).mean(axis=0) 90 | avg_mag.append(avg_mag_shap) 91 | feature_order = np.argsort(avg_mag_shap)[::-1] 92 | most_important = avg_mag_shap[feature_order] 93 | most_important_names = [feature_names[i] for i in feature_order] 94 | importances[str(class_idx)] = { 95 | 'ranked_effect': most_important, 96 | 'names': most_important_names, 97 | } 98 | 99 | # rank feature by average shap value for aggregated classes 100 | combined_shap = np.sum(avg_mag, axis=0) 101 | feature_order = np.argsort(combined_shap)[::-1] 102 | most_important_c = combined_shap[feature_order] 103 | most_important_c_names = [feature_names[i] for i in feature_order] 104 | importances['aggregated'] = { 105 | 'ranked_effect': most_important_c, 106 | 'names': most_important_c_names 107 | } 108 | 109 | return importances 110 | 111 | 112 | def sum_categories(values: np.ndarray, start_idx: Sequence[int], enc_feat_dim: Sequence[int]): 113 | """ 114 | This function is used to reduce specified slices in a two- or three- dimensional tensor. 115 | 116 | For two-dimensional `values` arrays, for each entry in start_idx, the function sums the 117 | following k columns where k is the corresponding entry in the enc_feat_dim sequence. 118 | The columns whose indices are not in start_idx are left unchanged. This arises when the slices 119 | contain the shap values for each dimension of an encoded categorical variable and a single shap 120 | value for each variable is desired. 121 | 122 | For three-dimensional `values` arrays, the reduction is applied for each rank 2 subtensor, first along 123 | the column dimension and then across the row dimension. This arises when summarising shap interaction values. 
124 | Each rank 2 tensor is an E x E matrix of shap interaction values, where E is the dimension of the data after 125 | one-hot encoding. The result of applying the reduction yields a rank 2 tensor of dimension F x F, where F is the 126 | number of features (i.e., the feature dimension of the data matrix before encoding). By applying this transformation, 127 | a single value describing the interaction of categorical features i and j and a single value describing the 128 | interaction of j and i is returned. 129 | 130 | Parameters 131 | ---------- 132 | values 133 | A two or three dimensional array to be reduced, as described above. 134 | start_idx 135 | The start indices of the columns to be summed. 136 | enc_feat_dim 137 | The number of columns to be summed, one for each start index. 138 | Returns 139 | ------- 140 | new_values 141 | An array whose columns have been summed according to the entries in `start_idx` and `enc_feat_dim`. 142 | """ 143 | 144 | if start_idx is None or enc_feat_dim is None: 145 | raise ValueError("Both the start indices and the encoding dimensions need to be specified!") 146 | 147 | if not len(enc_feat_dim) == len(start_idx): 148 | raise ValueError("The lengths of the sequences of start indices and encodings must be equal!") 149 | 150 | n_encoded_levels = sum(enc_feat_dim) 151 | if n_encoded_levels > values.shape[-1]: 152 | raise ValueError("The sum of the encoded features dimensions exceeds data dimension!") 153 | 154 | if len(values.shape) not in (2, 3): 155 | raise ValueError( 156 | f"Shap value summarisation can only be applied to tensors of shap values (dim=2) or shap " 157 | f"interaction values (dim=3). The tensor to be summarised had dimension {values.shape}!" 158 | ) 159 | 160 | def _get_slices(start: Sequence[int], dim: Sequence[int], arr_trailing_dim: int) -> List[int]: 161 | """ 162 | Given start indices, encoding dimensions and the array trailing shape, this function returns 163 | a list of indices whose consecutive entries delimit slices. This list is used to reduce along an axis 164 | only the slices `slice(start[i], start[i] + dim[i], 1)` from a tensor and leave all other slices 165 | unchanged.
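For example (with illustrative indices), `_get_slices([2, 7], [4, 3], 12)` returns `[0, 1, 2, 6, 7, 10, 11]`, so the subsequent `np.add.reduceat` call sums columns 2-5 and 7-9 of a 12-column array while leaving columns 0, 1, 6, 10 and 11 untouched.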
166 | """ 167 | 168 | slices = [] # type: List[int] 169 | # first columns may not be reduced 170 | if start[0] > 0: 171 | slices.extend(tuple(range(start[0]))) 172 | 173 | # add all slices to reduce 174 | slices.extend([start[0], start[0] + dim[0]]) 175 | for s_idx, d in zip(start[1:], dim[1:]): 176 | last_idx = slices[-1] 177 | # some columns might not be reduced 178 | if last_idx < s_idx - 1: 179 | slices.extend(tuple(range(last_idx + 1, s_idx))) 180 | last_idx += (s_idx - last_idx - 2) 181 | # handle contiguous slices 182 | if s_idx == last_idx: 183 | slices.append(s_idx + d) 184 | else: 185 | slices.extend((s_idx, s_idx + d)) 186 | 187 | # avoid index error 188 | if start[-1] + dim[-1] == arr_trailing_dim: 189 | slices.pop() 190 | return slices 191 | 192 | # last few columns may not be reduced 193 | last_idx = slices[-1] 194 | if last_idx < arr_trailing_dim: 195 | slices.extend(tuple(range(last_idx + 1, arr_trailing_dim))) 196 | 197 | return slices 198 | 199 | def _reduction(arr, axis, indices=None): 200 | return np.add.reduceat(arr, indices, axis) 201 | 202 | # create array of slices to be reduced 203 | slices = _get_slices(start_idx, enc_feat_dim, values.shape[-1]) 204 | if len(values.shape) == 3: 205 | reduction = partial(_reduction, indices=slices) 206 | return np.apply_over_axes(reduction, values, axes=(2, 1)) 207 | return np.add.reduceat(values, slices, axis=1) 208 | 209 | 210 | DISTRIBUTED_OPTS = { 211 | 'n_cpus': None, 212 | 'batch_size': None, 213 | 'actor_cpu_fraction': 1.0 214 | } 215 | 216 | 217 | class KernelExplainerWrapper(KernelExplainer): 218 | """ 219 | A wrapper around `shap.KernelExplainer` that supports: 220 | 221 | - fixing the seed when instantiating the KernelExplainer in a separate process 222 | - passing a batch index to the explainer so that a parallel explainer pool can return batches in arbitrary order 223 | """ 224 | 225 | def __init__(self, *args, **kwargs): 226 | if 'seed' in kwargs: 227 | seed = kwargs.pop('seed') 228 | np.random.seed(seed) 229 | super().__init__(*args, **kwargs) 230 | 231 | def get_explanation(self, X: Union[Tuple[int, np.ndarray], np.ndarray], **kwargs) -> Tuple[int, np.ndarray]: 232 | """ 233 | Wrapper around `shap.KernelExplainer.shap_values` that allows calling the method with a tuple containing a 234 | batch index and a batch of instances. 235 | 236 | Parameters 237 | ---------- 238 | X 239 | When called from a distributed context, it is a tuple containing a batch index and a batch to be explained. 240 | Otherwise, it is an array of instances to be explained. 241 | kwargs 242 | `shap.KernelExplainer` kwarg values. 243 | """ 244 | 245 | # handle call from distributed context 246 | with warnings.catch_warnings(): 247 | warnings.simplefilter("ignore") 248 | if isinstance(X, tuple): 249 | batch_idx, batch = X 250 | shap_values = super().shap_values(batch, **kwargs) 251 | return batch_idx, shap_values 252 | else: 253 | shap_values = super().shap_values(X, **kwargs) 254 | return shap_values 255 | 256 | def return_attribute(self, name: str) -> Any: 257 | """ 258 | Returns an attribute specified by its name. Used in a distributed context where the actor properties cannot be 259 | accessed using the dot syntax. 
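For example, `ray.get(actor.return_attribute.remote('expected_value'))` retrieves the fitted explainer's `expected_value` from a remote actor; `DistributedExplainer.__getattr__` uses exactly this pattern to expose state shared by the actors in the pool.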
260 | """ 261 | return self.__getattribute__(name) 262 | 263 | 264 | class KernelShap(Explainer, FitMixin): 265 | 266 | def __init__(self, 267 | predictor: Callable, 268 | link: str = 'identity', 269 | feature_names: Union[List[str], Tuple[str], None] = None, 270 | categorical_names: Optional[Dict[int, List[str]]] = None, 271 | task: str = 'classification', 272 | seed: int = None, 273 | distributed_opts: Optional[Dict] = None): 274 | """ 275 | A wrapper around the `shap.KernelExplainer` class. It extends the current `shap` library functionality 276 | by allowing the user to specify variable groups in order to treat one-hot encoded categorical as one during 277 | sampling. The user can also specify whether to aggregate the `shap` values estimate for the encoded levels 278 | of categorical variables as an optional argument to `explain`, if grouping arguments are not passed to `fit`. 279 | 280 | Parameters 281 | ---------- 282 | predictor 283 | A callable that takes as an input a samples x features array and outputs a samples x n_outputs 284 | model outputs. The n_outputs should represent model output in margin space. If the model outputs 285 | probabilities, then the link should be set to 'logit' to ensure correct force plots. 286 | link 287 | Valid values are `'identity'` or `'logit'`. A generalized linear model link to connect the feature 288 | importance values to the model output. Since the feature importance values, :math:`\phi`, sum up to the 289 | model output, it often makes sense to connect them to the ouput with a link function where 290 | :math:`link(output - expected\_value) = sum(\phi)`. Therefore, for a model which outputs probabilities, 291 | `link='logit'` makes the feature effects have log-odds (evidence) units and `link='identity'` means that the 292 | feature effects have probability units. Please see this `example`_ for an in-depth discussion about the 293 | semantics of explaining the model in the probability or margin space. 294 | 295 | .. _example: 296 | https://github.com/slundberg/shap/blob/master/notebooks/kernel_explainer/Squashing%20Effect.ipynb 297 | 298 | feature_names 299 | Used to infer group names when categorical data is treated by grouping and `group_names` input to `fit` 300 | is not specified, assuming it has the same length as the `groups` argument of `fit` method. It is also used 301 | to compute the `names` field, which appears as a key in each of the values of 302 | `explanation.data['raw']['importances']`. 303 | categorical_names 304 | Keys are feature column indices in the `background_data` matrix (see `fit`). Each value contains strings 305 | with the names of the categories for the feature. Used to select the method for background data 306 | summarisation (if specified, subsampling is performed as opposed to k-means clustering). In the future it 307 | may be used for visualisation. 308 | task 309 | Can have values `'classification'` and `'regression'`. It is only used to set the contents of 310 | `explanation.data['raw']['prediction']` 311 | seed 312 | Fixes the random number stream, which influences which subsets are sampled during shap value estimation. 313 | distributed_opts 314 | A dictionary with the following structure:: 315 | 316 | { 317 | 'n_cpus': None, 318 | 'batch_size': None, 319 | } 320 | 321 | The entries represent: 322 | - `n_cpus`: an ``int`` representing the number of CPUs on which the input `X` to explain will be \ 323 | explained. If set to `None`, the code will run sequentially. 
324 | - `batch_size`: and ``int`` representing how many instances should be explained on every CPU. If set to \ 325 | `None`, an input array is split in (roughly) equal parts and distributed across the available CPUs. 326 | 327 | The distributed explanation works only the `ray`_ library is installed. 328 | 329 | .. _ray: 330 | https://docs.ray.io/en/master/ 331 | 332 | Raises 333 | ------ 334 | ModuleNotFoundError 335 | If the `ray` library is not installed and `n_cpus` is set in `distributed_opts`. 336 | 337 | """ # noqa W605 338 | 339 | super().__init__(meta=copy.deepcopy(DEFAULT_META_KERNEL_SHAP)) 340 | 341 | self.link = link 342 | self.predictor = predictor 343 | self.feature_names = feature_names if feature_names else [] 344 | self.categorical_names = categorical_names if categorical_names else {} 345 | self.task = task 346 | self.seed = seed 347 | self._update_metadata({"task": self.task}) 348 | 349 | # if the user specifies groups but no names, the groups are automatically named 350 | self.use_groups = False 351 | # changes if feature groups indices are passed but not names 352 | self.create_group_names = False 353 | # if sum of groups entries matches first dimension as opposed to second, warn user 354 | self.transposed = False 355 | # if weights are not correctly specified, they are ignored 356 | self.ignore_weights = False 357 | # sums up shap values for each level of categorical var 358 | self.summarise_result = False 359 | # selects a subset of the background data to avoid excessively slow runtimes 360 | self.summarise_background = False 361 | # checks if it has been fitted: 362 | self._fitted = False 363 | self.distributed_opts = copy.deepcopy(DISTRIBUTED_OPTS) 364 | if distributed_opts: 365 | self.distributed_opts.update(distributed_opts) 366 | self.distributed_opts['algorithm'] = 'kernel_shap' 367 | self.distribute = True if self.distributed_opts['n_cpus'] else False 368 | 369 | def _check_inputs(self, 370 | background_data: Union[shap.common.Data, pd.DataFrame, np.ndarray, sparse.spmatrix], 371 | group_names: Union[Tuple, List, None], 372 | groups: Optional[List[Union[Tuple[int], List[int]]]], 373 | weights: Union[Union[List[float], Tuple[float]], np.ndarray, None]) -> None: 374 | """ 375 | If user specifies parameter grouping, then we check input is correct or inform 376 | them if the settings they put might not behave as expected. 377 | """ 378 | 379 | if isinstance(background_data, shap.common.Data): 380 | # don't provide checks for situations where the user passes 381 | # the data object directly 382 | if not self.summarise_background: 383 | self.use_groups = False 384 | return 385 | # if summarisation took place, we do the checks to ensure everything is correct 386 | else: 387 | background_data = background_data.data 388 | 389 | if isinstance(background_data, np.ndarray) and background_data.ndim == 1: 390 | background_data = np.atleast_2d(background_data) 391 | 392 | if background_data.shape[0] > KERNEL_SHAP_BACKGROUND_THRESHOLD: 393 | msg = "Large datasets can cause slow runtimes for shap. The background dataset " \ 394 | "provided has {} records. Consider passing a subset or allowing the algorithm " \ 395 | "to automatically summarize the data by setting the summarise_background=True or" \ 396 | "setting summarise_background to 'auto' which will default to {} samples!" 
397 | logger.warning(msg.format(background_data.shape[0], KERNEL_SHAP_BACKGROUND_THRESHOLD)) 398 | 399 | if group_names and not groups: 400 | logger.info( 401 | "Specified group_names but no corresponding sequence 'groups' with indices " 402 | "for each group was specified. All groups will have len=1." 403 | ) 404 | if not len(group_names) in background_data.shape: 405 | msg = "Specified {} group names but data dimension is {}. When grouping " \ 406 | "indices are not specifies the number of group names should equal " \ 407 | "one of the data dimensions! Igoring grouping inputs!" 408 | logger.warning(msg.format(len(group_names), background_data.shape)) 409 | self.use_groups = False 410 | 411 | if groups and not group_names: 412 | logger.warning( 413 | "No group names specified but groups specified! Automatically " 414 | "assigning 'group_' name for every index group specified!") 415 | if self.feature_names: 416 | n_groups = len(groups) 417 | n_features = len(self.feature_names) 418 | if n_features != n_groups: 419 | msg = "Number of feature names specified did not match the number of groups." \ 420 | "Specified {} groups and {} features names. Creating default names for " \ 421 | "specified groups" 422 | logger.warning(msg.format(n_groups, n_features)) 423 | self.create_group_names = True 424 | else: 425 | group_names = self.feature_names 426 | else: 427 | self.create_group_names = True 428 | 429 | if groups: 430 | if not (isinstance(groups[0], tuple) or isinstance(groups[0], list)): 431 | msg = "groups should be specified as List[Union[Tuple[int], List[int]]] where each " \ 432 | "sublist represents a group and int represent group instance. Specified group " \ 433 | "elements have type {}. Ignoring grouping inputs!" 434 | logger.warning(msg.format(type(groups[0]))) 435 | self.use_groups = False 436 | 437 | expected_dim = sum(len(g) for g in groups) 438 | if background_data.ndim == 1: 439 | actual_dim = background_data.shape[0] 440 | else: 441 | actual_dim = background_data.shape[1] 442 | if expected_dim != actual_dim: 443 | if background_data.shape[0] == expected_dim: 444 | logger.warning( 445 | "The sum of the group indices list did not match the " 446 | "data dimension along axis=1 but matched dimension " 447 | "along axis=0. Consider transposing the data!" 448 | ) 449 | self.transposed = True 450 | else: 451 | msg = "The sum of the group sizes specified did not match the number of features. " \ 452 | "Sum of group sizes: {}. Number of features: {}. Ignoring grouping inputs!" 453 | logger.warning(msg.format(expected_dim, actual_dim)) 454 | self.use_groups = False 455 | 456 | if group_names: 457 | n_groups = len(groups) 458 | n_group_names = len(group_names) 459 | if n_group_names != n_groups: 460 | msg = "The number of group names specified does not match the number of groups. " \ 461 | "Received {} groups and {} names! Ignoring grouping inputs!" 462 | logger.warning(msg.format(n_groups, n_group_names)) 463 | self.use_groups = False 464 | 465 | if weights is not None: 466 | if background_data.ndim == 1 or background_data.shape[0] == 1: 467 | logger.warning( 468 | "Specified weights but the background data has only one record. " 469 | "Weights will be ignored!" 
470 | ) 471 | self.ignore_weights = True 472 | else: 473 | data_dim = background_data.shape[0] 474 | feat_dim = background_data.shape[1] 475 | weights_dim = len(weights) 476 | if data_dim != weights_dim: 477 | if not (feat_dim == weights_dim and self.transposed): 478 | msg = "The number of weights specified did not match data dimension. " \ 479 | "Number of weights: {}. Number of datapoints: {}. Weights will " \ 480 | "be ignored!" 481 | logger.warning(msg.format(weights_dim, data_dim)) 482 | self.ignore_weights = True 483 | 484 | # NB: we have already summarised the data at this point 485 | if self.summarise_background: 486 | 487 | weights_dim = len(weights) 488 | if background_data.ndim == 1: 489 | n_background_samples = 1 490 | else: 491 | if not self.transposed: 492 | n_background_samples = background_data.shape[0] 493 | else: 494 | n_background_samples = background_data.shape[1] 495 | 496 | if weights_dim != n_background_samples: 497 | msg = "The number of weights vector provided ({}) did not match the number of " \ 498 | "summary data points ({}). The weights provided will be ignored!" 499 | logger.warning(msg.format(weights_dim, n_background_samples)) 500 | 501 | self.ignore_weights = True 502 | 503 | def _summarise_background(self, 504 | background_data: Union[shap.common.Data, pd.DataFrame, np.ndarray, sparse.spmatrix], 505 | n_background_samples: int) -> \ 506 | Union[shap.common.Data, pd.DataFrame, np.ndarray, sparse.spmatrix]: 507 | """ 508 | Summarises the background data to n_background_samples in order to reduce the computational cost. If the 509 | background data is a `shap.common.Data object`, no summarisation is performed. 510 | 511 | Returns 512 | ------- 513 | If the user has specified grouping, then the input object is subsampled and an object of the same 514 | type is returned. Otherwise, a `shap.common.Data` object containing the result of a k-means algorithm 515 | is wrapped in a `shap.common.DenseData` object and returned. The samples are weighted according to the 516 | frequency of the occurrence of the clusters in the original data. 517 | """ 518 | 519 | if isinstance(background_data, shap.common.Data): 520 | msg = "Received option to summarise the data but the background_data object " \ 521 | "was an instance of shap.common.Data. No summarisation will take place!" 522 | logger.warning(msg) 523 | return background_data 524 | 525 | if background_data.ndim == 1: 526 | msg = "Received option to summarise the data but the background_data object only had " \ 527 | "one record with {} features. No summarisation will take place!" 528 | logger.warning(msg.format(len(background_data))) 529 | return background_data 530 | 531 | self.summarise_background = True 532 | 533 | # if the input is sparse, we assume there are categorical variables and use random sampling, not kmeans 534 | if self.use_groups or self.categorical_names or isinstance(background_data, sparse.spmatrix): 535 | return shap.sample(background_data, nsamples=n_background_samples) 536 | else: 537 | logger.info( 538 | "When summarising with kmeans, the samples are weighted in proportion to their " 539 | "cluster occurrence frequency. Please specify a different weighting of the samples " 540 | "through the by passing a weights of len=n_background_samples to the constructor!" 
541 | ) 542 | return shap.kmeans(background_data, n_background_samples) 543 | 544 | @methdispatch 545 | def _get_data(self, 546 | background_data: Union[shap.common.Data, pd.DataFrame, np.ndarray, sparse.spmatrix], 547 | group_names: Sequence, 548 | groups: List[Sequence[int]], 549 | weights: Sequence[Union[float, int]], 550 | **kwargs): 551 | """ 552 | Groups the data if grouping options are specified, returning a shap.common.Data object in this 553 | case. Otherwise, the original data is returned and handled internally by the shap library. 554 | """ 555 | 556 | raise TypeError("Type {} is not supported for background data!".format(type(background_data))) 557 | 558 | @_get_data.register(shap.common.Data) 559 | def _(self, background_data, *args, **kwargs) -> shap.common.Data: 560 | """ 561 | Initialises background data if the user passes a `shap.common.Data` object as input. 562 | 563 | Notes 564 | _____ 565 | 566 | If `self.summarise_background = True`, then a `shap.common.Data` object is 567 | returned if the user passed a `shap.common.Data` object to `fit` or didn't specify groups. 568 | """ 569 | 570 | group_names, groups, weights = args 571 | if weights is not None and self.summarise_background: 572 | if not self.ignore_weights: 573 | background_data.weights = weights 574 | if self.use_groups: 575 | background_data.groups = groups 576 | background_data.group_names = group_names 577 | background_data.group_size = len(groups) 578 | 579 | return background_data 580 | 581 | @_get_data.register(np.ndarray) # type: ignore 582 | def _(self, background_data, *args, **kwargs) -> Union[np.ndarray, shap.common.Data]: 583 | """ 584 | Initialises background data if the user passes an `np.ndarray` object as input. 585 | If the user specifies feature grouping then a `shap.common.DenseData` object 586 | is returned. Weights are handled separately to avoid triggering assertion 587 | correct inside `shap` library. Otherwise, the original data is returned and 588 | is handled by the `shap` library internally. 589 | """ 590 | 591 | group_names, groups, weights = args 592 | new_args = (group_names, groups, weights) if weights is not None else (group_names, groups) 593 | if self.use_groups: 594 | return DenseData(background_data, *new_args) 595 | else: 596 | return background_data 597 | 598 | @_get_data.register(sparse.spmatrix) # type: ignore 599 | def _(self, background_data, *args, **kwargs) -> Union[shap.common.Data, sparse.spmatrix]: 600 | """ 601 | Initialises background data if the user passes a sparse matrix as input. If the 602 | user specifies feature grouping, then the sparse array is converted to a dense 603 | array. Otherwise, the original array is returned and handled internally by `shap` 604 | library. 605 | """ 606 | 607 | group_names, groups, weights = args 608 | new_args = (group_names, groups, weights) if weights is not None else (group_names, groups) 609 | 610 | if self.use_groups: 611 | logger.warning( 612 | "Grouping is not currently compatible with sparse matrix inputs. " 613 | "Converting background data sparse array to dense matrix." 614 | ) 615 | background_data = background_data.toarray() 616 | return DenseData( 617 | background_data, 618 | *new_args, 619 | ) 620 | 621 | return background_data 622 | 623 | @_get_data.register(pd.core.frame.DataFrame) # type: ignore 624 | def _(self, background_data, *args, **kwargs) -> Union[shap.common.Data, pd.core.frame.DataFrame]: 625 | """ 626 | Initialises background data if the user passes a `pandas.core.frame.DataFrame` as input. 
627 | If the user has specified groups and given a data frame, it initialises a `shap.common.DenseData` 628 | object explicitly as this is not handled by `shap` library internally. Otherwise, data initialisation, 629 | is left to the `shap` library. 630 | """ 631 | 632 | _, groups, weights = args 633 | new_args = (groups, weights) if weights is not None else (groups,) 634 | if self.use_groups: 635 | logger.info("Group names are specified by column headers, group_names will be ignored!") 636 | keep_index = kwargs.get("keep_index", False) 637 | if keep_index: 638 | return DenseDataWithIndex( 639 | background_data.values, 640 | list(background_data.columns), 641 | background_data.index.values, 642 | background_data.index.name, 643 | *new_args, 644 | ) 645 | else: 646 | return DenseData( 647 | background_data.values, 648 | list(background_data.columns), 649 | *new_args, 650 | ) 651 | else: 652 | return background_data 653 | 654 | @_get_data.register(pd.core.frame.Series) # type: ignore 655 | def _(self, background_data, *args, **kwargs) -> Union[shap.common.Data, pd.core.frame.Series]: 656 | """ 657 | Initialises background data if the user passes a `pandas.Series` object as input. 658 | Original object is returned as this is initialised internally by `shap` is there 659 | is no group structure specified. Otherwise, a `shap.common.DenseData` object 660 | is initialised. 661 | """ 662 | 663 | _, groups, _ = args 664 | if self.use_groups: 665 | return DenseData( 666 | background_data.values.reshape(1, len(background_data)), 667 | list(background_data.index), 668 | groups, 669 | ) 670 | 671 | return background_data 672 | 673 | def _update_metadata(self, data_dict: dict, params: bool = False) -> None: 674 | """ 675 | This function updates the metadata of the explainer using the data from 676 | the `data_dict`. If the params option is specified, then each key-value 677 | pair is added to the metadata `'params'` dictionary only if the key is 678 | included in `KERNEL_SHAP_PARAMS`. 679 | 680 | Parameters 681 | ---------- 682 | data_dict 683 | Dictionary containing the data to be stored in the metadata. 684 | params 685 | If True, the method updates the `'params'` attribute of the metatadata. 686 | """ 687 | 688 | if params: 689 | for key in data_dict.keys(): 690 | if key not in KERNEL_SHAP_PARAMS: 691 | continue 692 | else: 693 | self.meta['params'].update([(key, data_dict[key])]) 694 | else: 695 | self.meta.update(data_dict) 696 | 697 | def fit(self, # type: ignore 698 | background_data: Union[np.ndarray, sparse.spmatrix, pd.DataFrame, shap.common.Data], 699 | summarise_background: Union[bool, str] = False, 700 | n_background_samples: int = KERNEL_SHAP_BACKGROUND_THRESHOLD, 701 | group_names: Union[Tuple[str], List[str], None] = None, 702 | groups: Optional[List[Union[Tuple[int], List[int]]]] = None, 703 | weights: Union[Union[List[float], Tuple[float]], np.ndarray, None] = None, 704 | **kwargs) -> "KernelShap": 705 | """ 706 | This takes a background dataset (usually a subsample of the training set) as an input along with several 707 | user specified options and initialises a `KernelShap` explainer. The runtime of the algorithm depends on the 708 | number of samples in this dataset and on the number of features in the dataset. To reduce the size of the 709 | dataset, the `summarise_background` option and `n_background_samples` should be used. 
To reduce the feature 710 | dimensionality, encoded categorical variables can be treated as one during the feature perturbation process; 711 | this decreases the effective feature dimensionality, can reduce the variance of the shap values estimation and 712 | reduces slightly the number of calls to the predictor. Further runtime savings can be achieved by changing the 713 | `nsamples` parameter in the call to explain. Runtime reduction comes with an accuracy trade-off, so it is better 714 | to experiment with a runtime reduction method and understand results stability before using the system. 715 | 716 | Parameters 717 | ----------- 718 | background_data 719 | Data used to estimate feature contributions and baseline values for force plots. The rows of the 720 | background data should represent samples and the columns features. 721 | summarise_background 722 | A large background dataset impacts the runtime and memory footprint of the algorithm. By setting 723 | this argument to `True`, only `n_background_samples` from the provided data are selected. If group_names or 724 | groups arguments are specified, the algorithm assumes that the data contains categorical variables so 725 | the records are selected uniformly at random. Otherwise, `shap.kmeans` (a wrapper around `sklearn` k-means 726 | implementation) is used for selection. If set to `'auto'`, a default of 727 | `KERNEL_SHAP_BACKGROUND_THRESHOLD` samples is selected. 728 | n_background_samples 729 | The number of samples to keep in the background dataset if `summarise_background=True`. 730 | groups: 731 | A list containing sub-lists specifying the indices of features belonging to the same group. 732 | group_names: 733 | If specified, this array is used to treat groups of features as one during feature perturbation. 734 | This feature can be useful, for example, to treat encoded categorical variables as one and can 735 | result in computational savings (this may require adjusting the `nsamples` parameter). 736 | weights: 737 | A sequence or array of weights. This is used only if grouping is specified and assigns a weight 738 | to each point in the dataset. 739 | kwargs: 740 | Expected keyword arguments include `keep_index` (bool) and should be used if a data frame containing an 741 | index column is passed to the algorithm. 
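An illustrative usage sketch (the predictor, feature names and group layout below are hypothetical, not taken from this repository):

    # hypothetical classifier and data; 'education' is assumed to be one-hot encoded into columns 1-4
    explainer = KernelShap(clf.predict_proba, link='logit', feature_names=['age', 'education'])
    # treat the four encoded columns of 'education' as a single feature during sampling
    explainer.fit(background_data, group_names=['age', 'education'], groups=[[0], [1, 2, 3, 4]])
    explanation = explainer.explain(X_explain, nsamples=500)

To distribute the computation, the explainer could instead be constructed with, e.g., `distributed_opts={'n_cpus': 4, 'batch_size': 10}`, which requires `ray` to be installed.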
742 | """ 743 | 744 | np.random.seed(self.seed) 745 | 746 | self._fitted = True 747 | # user has specified variable groups 748 | use_groups = groups is not None or group_names is not None 749 | self.use_groups = use_groups 750 | 751 | if summarise_background: 752 | if isinstance(summarise_background, str): 753 | if not isinstance(background_data, shap.common.Data): 754 | n_samples = background_data.shape[0] 755 | else: 756 | n_samples = background_data.data.shape[0] 757 | n_background_samples = min(n_samples, KERNEL_SHAP_BACKGROUND_THRESHOLD) 758 | background_data = self._summarise_background(background_data, n_background_samples) 759 | 760 | # check user inputs to provide warnings if input is incorrect 761 | self._check_inputs(background_data, group_names, groups, weights) 762 | if self.create_group_names: 763 | group_names = ['group_{}'.format(i) for i in range(len(groups))] 764 | # disable grouping or data weights if inputs are not correct 765 | if self.ignore_weights: 766 | weights = None 767 | if not self.use_groups: 768 | group_names, groups = None, None 769 | else: 770 | self.feature_names = group_names 771 | 772 | # perform grouping if requested by the user 773 | self.background_data = self._get_data(background_data, group_names, groups, weights, **kwargs) 774 | explainer_args = (self.predictor, self.background_data) 775 | explainer_kwargs = {'link': self.link} 776 | # distribute computation 777 | if self.distribute: 778 | # set seed for each process 779 | explainer_kwargs['seed'] = self.seed 780 | self._explainer = DistributedExplainer( 781 | self.distributed_opts, 782 | KernelExplainerWrapper, 783 | explainer_args, 784 | explainer_kwargs, 785 | ) # type: DistributedExplainer 786 | else: 787 | self._explainer = KernelExplainerWrapper(*explainer_args, 788 | **explainer_kwargs) # type: KernelExplainerWrapper # noqa: E501 789 | self.expected_value = self._explainer.expected_value 790 | if not self._explainer.vector_out: 791 | logger.warning( 792 | "Predictor returned a scalar value. Ensure the output represents a probability or decision score as " 793 | "opposed to a classification label!" 794 | ) 795 | 796 | # update metadata 797 | params = { 798 | 'groups': groups, 799 | 'group_names': group_names, 800 | 'weights': weights, 801 | 'kwargs': kwargs, 802 | 'summarise_background': self.summarise_background, 803 | 'grouped': self.use_groups, 804 | 'transpose': self.transposed, 805 | } 806 | self._update_metadata(params, params=True) 807 | 808 | return self 809 | 810 | def explain(self, 811 | X: Union[np.ndarray, pd.DataFrame, sparse.spmatrix], 812 | summarise_result: bool = False, 813 | cat_vars_start_idx: Sequence[int] = None, 814 | cat_vars_enc_dim: Sequence[int] = None, 815 | **kwargs) -> Explanation: 816 | """ 817 | Explains the instances in the array `X`. 818 | 819 | Parameters 820 | ---------- 821 | X 822 | Instances to be explained. Note that the `pd.DataFrame` and `sparse.spmatrix` are not supported by the 823 | distributed version. In the future `pd.DataFrame` might be supported, please raise a feature request if you 824 | need this feature. 825 | summarise_result 826 | Specifies whether the shap values corresponding to dimensions of encoded categorical variables should be 827 | summed so that a single shap value is returned for each categorical variable. 
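For instance (an illustrative call with hypothetical indices), `explainer.explain(X, summarise_result=True, cat_vars_start_idx=[2, 7], cat_vars_enc_dim=[4, 3])` sums the shap values of columns 2-5 and 7-9 into a single value per categorical variable.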
Both the start indices of 828 | the categorical variables (`cat_vars_start_idx`) and the encoding dimensions (`cat_vars_enc_dim`) 829 | have to be specified 830 | cat_vars_start_idx 831 | The start indices of the categorical variables. If specified, `cat_vars_enc_dim` should also be specified. 832 | cat_vars_enc_dim 833 | The length of the encoding dimension for each categorical variable. If specified `cat_vars_start_idx` should 834 | also be specified. 835 | kwargs 836 | Keyword arguments specifying explain behaviour. Valid arguments are: 837 | 838 | - `nsamples`: controls the number of predictor calls and therefore runtime. 839 | 840 | - `l1_reg`: the algorithm is exponential in the feature dimension. If set to `auto` the algorithm will \ 841 | first run a feature selection algorithm to select the top features, provided the fraction of sampled \ 842 | sets of missing features is less than 0.2 from the number of total subsets. The Akaike Information \ 843 | Criterion is used in this case. See our examples for more details about available settings for this \ 844 | parameter. Note that by first running a feature selection step, the shapley values of the remainder of \ 845 | the features will be different to those estimated from the entire set. 846 | 847 | For more details, please see the shap library `documentation`_ . 848 | 849 | .. _documentation: 850 | https://shap.readthedocs.io/en/latest/. 851 | 852 | Returns 853 | ------- 854 | explanation 855 | An explanation object containing the algorithm results. 856 | 857 | Raises 858 | ------ 859 | TypeError 860 | In the following conditions: 861 | - `fit` method has not been called prior to explain 862 | - distributed context has been specified but `X` is a `pd.DataFrame` or `sparse.spmatrix` object. 863 | """ # noqa W605 864 | 865 | if not self._fitted: 866 | raise TypeError( 867 | "Called explain on an unfitted object! Please fit the explainer using the .fit method first!" 868 | ) 869 | 870 | if self.distribute: 871 | if isinstance(X, sparse.spmatrix) or isinstance(X, pd.DataFrame): 872 | raise TypeError( 873 | "Incorrect type for `X` due to distributed context. Cast `X` to np.ndarray." 874 | ) 875 | 876 | # convert data to dense format if sparse 877 | if self.use_groups and isinstance(X, sparse.spmatrix): 878 | X = X.toarray() 879 | 880 | shap_values = self._explainer.get_explanation(X, **kwargs) 881 | self.expected_value = self._explainer.expected_value 882 | expected_value = self.expected_value 883 | # for scalar model outputs a single numpy array is returned 884 | if isinstance(shap_values, np.ndarray): 885 | shap_values = [shap_values] 886 | if isinstance(expected_value, float): 887 | expected_value = [expected_value] 888 | 889 | explanation = self.build_explanation( 890 | X, 891 | shap_values, 892 | expected_value, 893 | summarise_result=summarise_result, 894 | cat_vars_start_idx=cat_vars_start_idx, 895 | cat_vars_enc_dim=cat_vars_enc_dim, 896 | ) 897 | 898 | return explanation 899 | 900 | def build_explanation(self, 901 | X: Union[np.ndarray, pd.DataFrame, sparse.spmatrix], 902 | shap_values: List[np.ndarray], 903 | expected_value: List[float], 904 | **kwargs) -> Explanation: 905 | """ 906 | Create an explanation object. If output summarisation is required and all inputs necessary for this operation 907 | are passed, the raw shap values are summed first so that a single shap value is returned for each categorical 908 | variable, as opposed to a shap value per dimension of categorical variable encoding. 
909 | 910 | Parameters 911 | ---------- 912 | X 913 | Instances to be explained. 914 | shap_values 915 | Each entry is a n_instances x n_features array, and the length of the list equals the dimensionality 916 | of the predictor output. The rows of each array correspond to the shap values for the instances with 917 | the corresponding row index in `X`. The length of the list equals the number of model outputs. 918 | expected_value 919 | A list containing the expected value of the prediction for each class. Its length should be equal to that of 920 | `shap_values`. 921 | 922 | Returns 923 | ------- 924 | explanation 925 | An explanation object containing the shap values and prediction in the `data` field, along with a `meta` 926 | field containing additional data. See usage `examples`_ for details. 927 | 928 | .. _examples: 929 | https://docs.seldon.io/projects/alibi/en/latest/methods/KernelSHAP.html 930 | 931 | """ 932 | 933 | # TODO: DEFINE COMPLETE SCHEMA FOR THE METADATA (ONGOING) 934 | # TODO: Plotting default should be same space as the explanation? How do we figure out what space they 935 | # explain in? 936 | 937 | cat_vars_start_idx = kwargs.get('cat_vars_start_idx', ()) # type: Sequence[int] 938 | cat_vars_enc_dim = kwargs.get('cat_vars_enc_dim', ()) # type: Sequence[int] 939 | summarise_result = kwargs.get('summarise_result', False) # type: bool 940 | if summarise_result: 941 | self._check_result_summarisation(summarise_result, cat_vars_start_idx, cat_vars_enc_dim) 942 | if self.summarise_result: 943 | summarised_shap = [] 944 | for shap_array in shap_values: 945 | summarised_shap.append(sum_categories(shap_array, cat_vars_start_idx, cat_vars_enc_dim)) 946 | shap_values = summarised_shap 947 | 948 | # apply explainer link function to obtain raw predictions on the same scale as used by the explainer 949 | linkfv = np.vectorize(convert_to_link(self.link).f) 950 | raw_predictions = linkfv(self.predictor(X)) 951 | 952 | if self.task != 'regression': 953 | argmax_pred = np.argmax(np.atleast_2d(raw_predictions), axis=1) 954 | else: 955 | argmax_pred = [] 956 | importances = rank_by_importance(shap_values, feature_names=self.feature_names) 957 | 958 | if isinstance(X, sparse.spmatrix): 959 | X = X.toarray() 960 | else: 961 | X = np.array(X) 962 | 963 | # output explanation dictionary 964 | data = copy.deepcopy(DEFAULT_DATA_KERNEL_SHAP) 965 | data.update( 966 | shap_values=shap_values, 967 | expected_value=np.array(expected_value), 968 | link=self.link, 969 | categorical_names=self.categorical_names, 970 | feature_names=self.feature_names 971 | ) 972 | data['raw'].update( 973 | raw_prediction=raw_predictions, 974 | prediction=argmax_pred, 975 | instances=X, 976 | importances=importances 977 | ) 978 | self._update_metadata({"summarise_result": self.summarise_result}, params=True) 979 | 980 | return Explanation(meta=copy.deepcopy(self.meta), data=data) 981 | 982 | def _check_result_summarisation(self, 983 | summarise_result: bool, 984 | cat_vars_start_idx: Sequence[int], 985 | cat_vars_enc_dim: Sequence[int]) -> None: 986 | """ 987 | This function checks whether the result summarisation option is correct given the inputs and explainer setup. 988 | 989 | Parameters 990 | ---------- 991 | summarise_result: 992 | See `explain` documentation. 993 | cat_vars_start_idx: 994 | See `explain` documentation. 995 | cat_vars_enc_dim: 996 | See `explain` documentation. 
997 | """ 998 | 999 | self.summarise_result = summarise_result 1000 | if summarise_result: 1001 | if not cat_vars_start_idx or not cat_vars_enc_dim: 1002 | logger.warning( 1003 | "Results cannot be summarised as either the" 1004 | "start indices for categorical variables or" 1005 | "the encoding dimensions were not passed!" 1006 | ) 1007 | self.summarise_result = False 1008 | elif self.use_groups: 1009 | logger.warning( 1010 | "Specified both groups as well as summarisation for categorical variables. " 1011 | "By grouping, only one shap value is estimated for each categorical variable. " 1012 | "Summarisation is not necessary!" 1013 | ) 1014 | self.summarise_result = False 1015 | -------------------------------------------------------------------------------- /explainers/utils.py: -------------------------------------------------------------------------------- 1 | import io 2 | import logging 3 | import os 4 | import pickle 5 | import requests 6 | 7 | import numpy as np 8 | 9 | from functools import singledispatch, update_wrapper 10 | from scipy import sparse 11 | from typing import Callable, List 12 | 13 | 14 | EXPLANATIONS_SET_URL = 'https://storage.googleapis.com/seldon-datasets/experiments/distributed_kernel_shap/adult_processed.pkl' 15 | BACKGROUND_SET_URL = 'https://storage.googleapis.com/seldon-datasets/experiments/distributed_kernel_shap/adult_background.pkl' 16 | EXPLANATIONS_SET_LOCAL = 'data/adult_processed.pkl' 17 | BACKGROUND_SET_LOCAL = 'data/adult_background.pkl' 18 | MODEL_URL = 'https://storage.googleapis.com/seldon-models/alibi/distributed_kernel_shap/predictor.pkl' 19 | MODEL_LOCAL = 'assets/predictor.pkl' 20 | 21 | 22 | class Bunch(dict): 23 | """ 24 | Container object for internal datasets. Dictionary-like object that exposes its keys as attributes. 25 | """ 26 | 27 | def __init__(self, **kwargs): 28 | super().__init__(kwargs) 29 | 30 | def __setattr__(self, key, value): 31 | self[key] = value 32 | 33 | def __dir__(self): 34 | return self.keys() 35 | 36 | def __getattr__(self, key): 37 | try: 38 | return self[key] 39 | except KeyError: 40 | raise AttributeError(key) 41 | 42 | 43 | def methdispatch(func: Callable): 44 | """ 45 | A decorator that is used to support singledispatch style functionality 46 | for instance methods. By default, singledispatch selects a function to 47 | call from registered based on the type of args[0]: 48 | 49 | def wrapper(*args, **kw): 50 | return dispatch(args[0].__class__)(*args, **kw) 51 | 52 | This uses singledispatch to do achieve this but instead uses args[1] 53 | since args[0] will always be self. 54 | """ 55 | 56 | dispatcher = singledispatch(func) 57 | 58 | def wrapper(*args, **kw): 59 | return dispatcher.dispatch(args[1].__class__)(*args, **kw) 60 | 61 | wrapper.register = dispatcher.register 62 | update_wrapper(wrapper, dispatcher) 63 | 64 | return wrapper 65 | 66 | 67 | def get_filename(workers: int, batch_size: int, cpu_fraction: float = 1.0, serve: bool = True): 68 | """ 69 | Creates a filename for an experiment given the inputs. 70 | 71 | Parameters 72 | ---------- 73 | workers 74 | How many worker processes are used for the explanation task. 75 | batch_size 76 | Mini-batch size: how many explanations are sent to one worker process at a time. 77 | cpu_fraction 78 | CPU fraction utilized by a worker process. 79 | serve 80 | A different naming convention is used depending on whether ray serve is used to distribute the explanations or 81 | not. 
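For example, `get_filename(8, 5, serve=False)` returns 'results/ray_workers_8_bsize_5_actorfr_1.0.pkl', while `get_filename(8, 5, serve=True)` returns 'results/ray_replicas_8_maxbatch_5_actorfr_1.0.pkl'.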
82 | """ 83 | 84 | if serve: 85 | return f"results/ray_replicas_{workers}_maxbatch_{batch_size}_actorfr_{cpu_fraction}.pkl" 86 | return f"results/ray_workers_{workers}_bsize_{batch_size}_actorfr_{cpu_fraction}.pkl" 87 | 88 | 89 | def batch(X: np.ndarray, batch_size: int = None, n_batches: int = 4) -> List[np.ndarray]: 90 | """ 91 | Splits the input into mini-batches. 92 | 93 | Parameters 94 | ---------- 95 | X 96 | Array to be split. 97 | batch_size 98 | If not `None`, batches of this size are created. The sizes of the batches created might vary if the 0-th 99 | dimension of `X` is not divisible by `batch_size`. For an array of len l that should be split into n sections, 100 | it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n. 101 | n_batches 102 | If `batch_size` is `None`, then `X` is split into `n_batches` mini-batches. 103 | 104 | Returns 105 | ------ 106 | A list of sub-arrays of X. 107 | """ 108 | 109 | n_records = X.shape[0] 110 | if isinstance(X, sparse.spmatrix): 111 | X = X.toarray() 112 | 113 | if batch_size: 114 | n_batches = n_records // batch_size 115 | if n_records % batch_size != 0: 116 | n_batches += 1 117 | slices = [batch_size * i for i in range(1, n_batches)] 118 | batches = np.array_split(X, slices) 119 | else: 120 | batches = np.array_split(X, n_batches) 121 | return batches 122 | 123 | 124 | def _download(path: str): 125 | """ Download from Seldon GC bucket indicated by `path`.""" 126 | 127 | try: 128 | resp = requests.get(path) 129 | resp.raise_for_status() 130 | except requests.RequestException: 131 | logging.exception("Could not connect to bucket, URL may be out of service!") 132 | raise ConnectionError 133 | 134 | return resp 135 | 136 | 137 | def load_model(path: str): 138 | """ 139 | Load a model that has been saved locally or download a default model from a Seldon bucket. 140 | """ 141 | 142 | try: 143 | with open(path, "rb") as f: 144 | model = pickle.load(f) 145 | return model 146 | except FileNotFoundError: 147 | logging.info(f"Could not find model {path}. Downloading from {MODEL_URL}...") 148 | model_raw = _download(MODEL_URL) 149 | model = pickle.load(io.BytesIO(model_raw.content)) 150 | 151 | if not os.path.exists('assets'): 152 | os.mkdir('assets') 153 | 154 | with open("assets/predictor.pkl", "wb") as f: 155 | pickle.dump(model, f) 156 | 157 | return model 158 | 159 | 160 | def load_data(): 161 | """ 162 | Load instances to be explained and background data from the data/ directory if they exist, otherwise download 163 | from Seldon Google Cloud bucket. 
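The returned dictionary has two keys: 'all', holding the instances to be explained, and 'background', holding the background dataset used to fit the explainer.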
164 | """ 165 | 166 | data = {'all': None, 'background': None} 167 | try: 168 | with open(BACKGROUND_SET_LOCAL, 'rb') as f: 169 | data['background'] = pickle.load(f) 170 | with open(EXPLANATIONS_SET_LOCAL, 'rb') as f: 171 | data['all'] = pickle.load(f) 172 | except FileNotFoundError: 173 | logging.info(f"Downloading data from {EXPLANATIONS_SET_URL}") 174 | all_data_raw = _download(EXPLANATIONS_SET_URL) 175 | data['all'] = pickle.load(io.BytesIO(all_data_raw.content)) 176 | logging.info(f"Downloading data from {BACKGROUND_SET_URL}") 177 | background_data_raw = _download(BACKGROUND_SET_URL) 178 | data['background'] = pickle.load(io.BytesIO(background_data_raw.content)) 179 | 180 | # save the data locally so we don't download it every time we run the main script 181 | if not os.path.exists('data'): 182 | os.mkdir('data') 183 | with open('data/adult_background.pkl', 'wb') as f: 184 | pickle.dump(data['background'], f) 185 | with open('data/adult_processed.pkl', 'wb') as f: 186 | pickle.dump(data['all'], f) 187 | 188 | return data 189 | -------------------------------------------------------------------------------- /explainers/wrappers.py: -------------------------------------------------------------------------------- 1 | import logging 2 | 3 | import numpy as np 4 | 5 | from explainers.kernel_shap import KernelShap 6 | from ray import serve 7 | from typing import Any, Dict, List 8 | 9 | 10 | class KernelShapModel: 11 | """Backend class for distributing explanations with Ray Serve.""" 12 | def __init__(self, 13 | predictor, 14 | background_data: np.ndarray, 15 | constructor_kwargs: Dict[str, Any], 16 | fit_kwargs: Dict[str, Any]): 17 | """ 18 | Initialises backend for distributed explanations. 19 | 20 | 21 | Parameters 22 | ---------- 23 | predictor 24 | Model to be explained. 25 | background_data 26 | Background data used for fitting the explainer. 27 | constructor_kwargs 28 | Any other arguments for the explainer constructor. See `explainers.kernel_shap.KernelShap` for details. 29 | fit_kwargs 30 | Any other arguments for the explainer `fit` method. See `explainers.kernel_shap.KernelShap` for details. 31 | """ 32 | 33 | if not hasattr(predictor, "predict_proba"): 34 | logging.warning("Predictor does not have predict_proba attribute, defaulting to predict") 35 | predict_fcn = predictor.predict 36 | else: 37 | predict_fcn = predictor.predict_proba 38 | self.explainer = KernelShap(predict_fcn, **constructor_kwargs) 39 | 40 | # TODO: REFACTOR THIS TO USE THE BACKEND METHOD CALLING FUNCTIONALITY 41 | self.explainer.fit(background_data, **fit_kwargs) 42 | 43 | def __call__(self, flask_request) -> str: 44 | """ 45 | Serves explanations for a single instance. 46 | 47 | Parameters 48 | ---------- 49 | flask_request 50 | A json flask request that contains a list with the instance to be explained in the ``array`` field. 51 | 52 | Returns 53 | ------- 54 | A `str` object containing a json representation of the explainer output. 55 | """ 56 | instance = np.array(flask_request.json["array"]) 57 | explanations = self.explainer.explain(instance, silent=True) 58 | 59 | return explanations.to_json() 60 | 61 | 62 | class BatchKernelShapModel(KernelShapModel): 63 | """Extends KernelShapModel to achieve batching of requests.""" 64 | 65 | @serve.accept_batch 66 | def __call__(self, flask_requests: List) -> List[str]: 67 | """ 68 | Serves explanations for a batch of requests. 69 | 70 | Parameters 71 | ---------- 72 | flask_requests: 73 | A list of json flask requests.
Each request should contain an instance to be explained in the ``array`` 74 | field. 75 | 76 | Returns 77 | ------- 78 | A list of `str` objects, each containing a json representation of the explainer output for one instance. 79 | """ 80 | 81 | instances = [request.json["array"] for request in flask_requests] 82 | explanations = [] 83 | for instance in instances: 84 | explanations.append( 85 | self.explainer.explain(np.array(instance), silent=True).to_json() 86 | ) 87 | 88 | return explanations 89 | -------------------------------------------------------------------------------- /images/pool_1_node.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/pool_1_node.PNG -------------------------------------------------------------------------------- /images/pool_k8s_32.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/pool_k8s_32.PNG -------------------------------------------------------------------------------- /images/pool_k8s_56.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/pool_k8s_56.PNG -------------------------------------------------------------------------------- /images/serve_1_node.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/serve_1_node.PNG -------------------------------------------------------------------------------- /images/serve_k8s_32.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/serve_k8s_32.PNG -------------------------------------------------------------------------------- /images/serve_k8s_56.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexcoca/DistributedKernelShap/04c96d43b9e30c28ed38d2cbf41ff292587df380/images/serve_k8s_56.PNG -------------------------------------------------------------------------------- /poetry.lock: -------------------------------------------------------------------------------- 1 | [[package]] 2 | category = "main" 3 | description = "Async http client/server framework (asyncio)" 4 | name = "aiohttp" 5 | optional = false 6 | python-versions = ">=3.5.3" 7 | version = "3.6.2" 8 | 9 | [package.dependencies] 10 | async-timeout = ">=3.0,<4.0" 11 | attrs = ">=17.3.0" 12 | chardet = ">=2.0,<4.0" 13 | multidict = ">=4.5,<5.0" 14 | yarl = ">=1.0,<2.0" 15 | 16 | [package.extras] 17 | speedups = ["aiodns", "brotlipy", "cchardet"] 18 | 19 | [[package]] 20 | category = "main" 21 | description = "Timeout context manager for asyncio programs" 22 | name = "async-timeout" 23 | optional = false 24 | python-versions = ">=3.5.3" 25 | version = "3.0.1" 26 | 27 | [[package]] 28 | category = "main" 29 | description = "Classes Without Boilerplate" 30 | name = "attrs" 31 | optional = false 32 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" 33 | version = "20.1.0" 34 | 35 | [package.extras] 36 | dev = ["coverage (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)",
"six", "zope.interface", "sphinx", "sphinx-rtd-theme", "pre-commit"] 37 | docs = ["sphinx", "sphinx-rtd-theme", "zope.interface"] 38 | tests = ["coverage (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)", "six", "zope.interface"] 39 | 40 | [[package]] 41 | category = "main" 42 | description = "Screen-scraping library" 43 | name = "beautifulsoup4" 44 | optional = false 45 | python-versions = "*" 46 | version = "4.9.1" 47 | 48 | [package.dependencies] 49 | soupsieve = [">1.2", "<2.0"] 50 | 51 | [package.extras] 52 | html5lib = ["html5lib"] 53 | lxml = ["lxml"] 54 | 55 | [[package]] 56 | category = "main" 57 | description = "a list-like type with better asymptotic performance and similar performance on small lists" 58 | name = "blist" 59 | optional = false 60 | python-versions = "*" 61 | version = "1.3.6" 62 | 63 | [[package]] 64 | category = "main" 65 | description = "Python package for providing Mozilla's CA Bundle." 66 | name = "certifi" 67 | optional = false 68 | python-versions = "*" 69 | version = "2020.6.20" 70 | 71 | [[package]] 72 | category = "main" 73 | description = "Universal encoding detector for Python 2 and 3" 74 | name = "chardet" 75 | optional = false 76 | python-versions = "*" 77 | version = "3.0.4" 78 | 79 | [[package]] 80 | category = "main" 81 | description = "Composable command line interface toolkit" 82 | name = "click" 83 | optional = false 84 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 85 | version = "7.1.2" 86 | 87 | [[package]] 88 | category = "main" 89 | description = "Cross-platform colored terminal text." 90 | name = "colorama" 91 | optional = false 92 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 93 | version = "0.4.3" 94 | 95 | [[package]] 96 | category = "main" 97 | description = "Terminal string styling done right, in Python." 98 | name = "colorful" 99 | optional = false 100 | python-versions = "*" 101 | version = "0.5.4" 102 | 103 | [package.dependencies] 104 | colorama = "*" 105 | 106 | [[package]] 107 | category = "main" 108 | description = "A platform independent file lock." 109 | name = "filelock" 110 | optional = false 111 | python-versions = "*" 112 | version = "3.0.12" 113 | 114 | [[package]] 115 | category = "main" 116 | description = "A simple framework for building complex web applications." 117 | name = "flask" 118 | optional = false 119 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 120 | version = "1.1.2" 121 | 122 | [package.dependencies] 123 | Jinja2 = ">=2.10.1" 124 | Werkzeug = ">=0.15" 125 | click = ">=5.1" 126 | itsdangerous = ">=0.24" 127 | 128 | [package.extras] 129 | dev = ["pytest", "coverage", "tox", "sphinx", "pallets-sphinx-themes", "sphinxcontrib-log-cabinet", "sphinx-issues"] 130 | docs = ["sphinx", "pallets-sphinx-themes", "sphinxcontrib-log-cabinet", "sphinx-issues"] 131 | dotenv = ["python-dotenv"] 132 | 133 | [[package]] 134 | category = "main" 135 | description = "Python bindings to the Google search engine." 
136 | name = "google" 137 | optional = false 138 | python-versions = "*" 139 | version = "3.0.0" 140 | 141 | [package.dependencies] 142 | beautifulsoup4 = "*" 143 | 144 | [[package]] 145 | category = "main" 146 | description = "HTTP/2-based RPC framework" 147 | name = "grpcio" 148 | optional = false 149 | python-versions = "*" 150 | version = "1.31.0" 151 | 152 | [package.dependencies] 153 | six = ">=1.5.2" 154 | 155 | [package.extras] 156 | protobuf = ["grpcio-tools (>=1.31.0)"] 157 | 158 | [[package]] 159 | category = "main" 160 | description = "A pure-Python, bring-your-own-I/O implementation of HTTP/1.1" 161 | name = "h11" 162 | optional = false 163 | python-versions = "*" 164 | version = "0.9.0" 165 | 166 | [[package]] 167 | category = "main" 168 | description = "A collection of framework independent HTTP protocol utils." 169 | marker = "sys_platform != \"win32\" and sys_platform != \"cygwin\" and platform_python_implementation != \"PyPy\"" 170 | name = "httptools" 171 | optional = false 172 | python-versions = "*" 173 | version = "0.1.1" 174 | 175 | [package.extras] 176 | test = ["Cython (0.29.14)"] 177 | 178 | [[package]] 179 | category = "main" 180 | description = "Internationalized Domain Names in Applications (IDNA)" 181 | name = "idna" 182 | optional = false 183 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" 184 | version = "2.10" 185 | 186 | [[package]] 187 | category = "main" 188 | description = "Read metadata from Python packages" 189 | marker = "python_version < \"3.8\"" 190 | name = "importlib-metadata" 191 | optional = false 192 | python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,>=2.7" 193 | version = "1.7.0" 194 | 195 | [package.dependencies] 196 | zipp = ">=0.5" 197 | 198 | [package.extras] 199 | docs = ["sphinx", "rst.linker"] 200 | testing = ["packaging", "pep517", "importlib-resources (>=1.3)"] 201 | 202 | [[package]] 203 | category = "main" 204 | description = "Various helpers to pass data to untrusted environments and back." 205 | name = "itsdangerous" 206 | optional = false 207 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" 208 | version = "1.1.0" 209 | 210 | [[package]] 211 | category = "main" 212 | description = "A very fast and expressive template engine." 213 | name = "jinja2" 214 | optional = false 215 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 216 | version = "2.11.2" 217 | 218 | [package.dependencies] 219 | MarkupSafe = ">=0.23" 220 | 221 | [package.extras] 222 | i18n = ["Babel (>=0.8)"] 223 | 224 | [[package]] 225 | category = "main" 226 | description = "Lightweight pipelining: using Python functions as pipeline jobs." 
227 | name = "joblib" 228 | optional = false 229 | python-versions = ">=3.6" 230 | version = "0.16.0" 231 | 232 | [[package]] 233 | category = "main" 234 | description = "An implementation of JSON Schema validation for Python" 235 | name = "jsonschema" 236 | optional = false 237 | python-versions = "*" 238 | version = "3.2.0" 239 | 240 | [package.dependencies] 241 | attrs = ">=17.4.0" 242 | pyrsistent = ">=0.14.0" 243 | setuptools = "*" 244 | six = ">=1.11.0" 245 | 246 | [package.dependencies.importlib-metadata] 247 | python = "<3.8" 248 | version = "*" 249 | 250 | [package.extras] 251 | format = ["idna", "jsonpointer (>1.13)", "rfc3987", "strict-rfc3339", "webcolors"] 252 | format_nongpl = ["idna", "jsonpointer (>1.13)", "webcolors", "rfc3986-validator (>0.1.0)", "rfc3339-validator"] 253 | 254 | [[package]] 255 | category = "main" 256 | description = "Safely add untrusted strings to HTML/XML markup." 257 | name = "markupsafe" 258 | optional = false 259 | python-versions = ">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*" 260 | version = "1.1.1" 261 | 262 | [[package]] 263 | category = "main" 264 | description = "MessagePack (de)serializer." 265 | name = "msgpack" 266 | optional = false 267 | python-versions = "*" 268 | version = "1.0.0" 269 | 270 | [[package]] 271 | category = "main" 272 | description = "multidict implementation" 273 | name = "multidict" 274 | optional = false 275 | python-versions = ">=3.5" 276 | version = "4.7.6" 277 | 278 | [[package]] 279 | category = "main" 280 | description = "NumPy is the fundamental package for array computing with Python." 281 | name = "numpy" 282 | optional = false 283 | python-versions = ">=3.6" 284 | version = "1.19.1" 285 | 286 | [[package]] 287 | category = "main" 288 | description = "Powerful data structures for data analysis, time series, and statistics" 289 | name = "pandas" 290 | optional = false 291 | python-versions = ">=3.6.1" 292 | version = "1.1.1" 293 | 294 | [package.dependencies] 295 | numpy = ">=1.15.4" 296 | python-dateutil = ">=2.7.3" 297 | pytz = ">=2017.2" 298 | 299 | [package.extras] 300 | test = ["pytest (>=4.0.2)", "pytest-xdist", "hypothesis (>=3.58)"] 301 | 302 | [[package]] 303 | category = "main" 304 | description = "Syntax-highlighting, declarative and composable pretty printer for Python 3.5+" 305 | name = "prettyprinter" 306 | optional = false 307 | python-versions = "*" 308 | version = "0.18.0" 309 | 310 | [package.dependencies] 311 | Pygments = ">=2.2.0" 312 | colorful = ">=0.4.0" 313 | 314 | [[package]] 315 | category = "main" 316 | description = "Protocol Buffers" 317 | name = "protobuf" 318 | optional = false 319 | python-versions = "*" 320 | version = "3.13.0" 321 | 322 | [package.dependencies] 323 | setuptools = "*" 324 | six = ">=1.9" 325 | 326 | [[package]] 327 | category = "main" 328 | description = "A Sampling Profiler for Python" 329 | name = "py-spy" 330 | optional = false 331 | python-versions = "*" 332 | version = "0.3.3" 333 | 334 | [[package]] 335 | category = "main" 336 | description = "Pygments is a syntax highlighting package written in Python." 
337 | name = "pygments" 338 | optional = false 339 | python-versions = ">=3.5" 340 | version = "2.6.1" 341 | 342 | [[package]] 343 | category = "main" 344 | description = "Persistent/Functional/Immutable data structures" 345 | name = "pyrsistent" 346 | optional = false 347 | python-versions = "*" 348 | version = "0.16.0" 349 | 350 | [package.dependencies] 351 | six = "*" 352 | 353 | [[package]] 354 | category = "main" 355 | description = "Extensions to the standard Python datetime module" 356 | name = "python-dateutil" 357 | optional = false 358 | python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,>=2.7" 359 | version = "2.8.1" 360 | 361 | [package.dependencies] 362 | six = ">=1.5" 363 | 364 | [[package]] 365 | category = "main" 366 | description = "World timezone definitions, modern and historical" 367 | name = "pytz" 368 | optional = false 369 | python-versions = "*" 370 | version = "2020.1" 371 | 372 | [[package]] 373 | category = "main" 374 | description = "YAML parser and emitter for Python" 375 | name = "pyyaml" 376 | optional = false 377 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 378 | version = "5.3.1" 379 | 380 | [[package]] 381 | category = "main" 382 | description = "A system for parallel and distributed Python that unifies the ML ecosystem." 383 | name = "ray" 384 | optional = false 385 | python-versions = "*" 386 | version = "0.8.6" 387 | 388 | [package.dependencies] 389 | aiohttp = "*" 390 | click = ">=7.0" 391 | colorama = "*" 392 | filelock = "*" 393 | google = "*" 394 | grpcio = "*" 395 | jsonschema = "*" 396 | msgpack = ">=0.6.0,<2.0.0" 397 | numpy = ">=1.16" 398 | protobuf = ">=3.8.0" 399 | py-spy = ">=0.2.0" 400 | pyyaml = "*" 401 | redis = ">=3.3.2,<3.5.0" 402 | 403 | [package.dependencies.blist] 404 | optional = true 405 | version = "*" 406 | 407 | [package.dependencies.flask] 408 | optional = true 409 | version = "*" 410 | 411 | [package.dependencies.uvicorn] 412 | optional = true 413 | version = "*" 414 | 415 | [package.extras] 416 | all = ["scipy", "atari-py", "dm-tree", "opencv-python-headless", "pandas", "gym", "tensorboardx", "blist", "tabulate", "lz4", "msgpack (>=0.6.2)", "pyyaml", "gpustat", "uvicorn", "requests", "flask"] 417 | dashboard = ["requests", "gpustat"] 418 | rllib = ["tabulate", "tensorboardx", "pandas", "atari-py", "dm-tree", "gym", "lz4", "opencv-python-headless", "pyyaml", "scipy"] 419 | serve = ["uvicorn", "flask", "blist"] 420 | streaming = ["msgpack (>=0.6.2)"] 421 | tune = ["tabulate", "tensorboardx", "pandas"] 422 | 423 | [[package]] 424 | category = "main" 425 | description = "Python client for Redis key-value store" 426 | name = "redis" 427 | optional = false 428 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 429 | version = "3.4.1" 430 | 431 | [package.extras] 432 | hiredis = ["hiredis (>=0.1.3)"] 433 | 434 | [[package]] 435 | category = "main" 436 | description = "Python HTTP for Humans." 
437 | name = "requests" 438 | optional = false 439 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 440 | version = "2.24.0" 441 | 442 | [package.dependencies] 443 | certifi = ">=2017.4.17" 444 | chardet = ">=3.0.2,<4" 445 | idna = ">=2.5,<3" 446 | urllib3 = ">=1.21.1,<1.25.0 || >1.25.0,<1.25.1 || >1.25.1,<1.26" 447 | 448 | [package.extras] 449 | security = ["pyOpenSSL (>=0.14)", "cryptography (>=1.3.4)"] 450 | socks = ["PySocks (>=1.5.6,<1.5.7 || >1.5.7)", "win-inet-pton"] 451 | 452 | [[package]] 453 | category = "main" 454 | description = "A set of python modules for machine learning and data mining" 455 | name = "scikit-learn" 456 | optional = false 457 | python-versions = ">=3.6" 458 | version = "0.23.2" 459 | 460 | [package.dependencies] 461 | joblib = ">=0.11" 462 | numpy = ">=1.13.3" 463 | scipy = ">=0.19.1" 464 | threadpoolctl = ">=2.0.0" 465 | 466 | [package.extras] 467 | alldeps = ["numpy (>=1.13.3)", "scipy (>=0.19.1)"] 468 | 469 | [[package]] 470 | category = "main" 471 | description = "SciPy: Scientific Library for Python" 472 | name = "scipy" 473 | optional = false 474 | python-versions = ">=3.6" 475 | version = "1.5.2" 476 | 477 | [package.dependencies] 478 | numpy = ">=1.14.5" 479 | 480 | [[package]] 481 | category = "main" 482 | description = "A unified approach to explain the output of any machine learning model." 483 | name = "shap" 484 | optional = false 485 | python-versions = "*" 486 | version = "0.35.0" 487 | 488 | [package.dependencies] 489 | numpy = "*" 490 | pandas = "*" 491 | scikit-learn = "*" 492 | scipy = "*" 493 | tqdm = ">4.25.0" 494 | 495 | [package.extras] 496 | all = ["lime", "ipython", "matplotlib"] 497 | others = ["lime"] 498 | plots = ["matplotlib", "ipython"] 499 | 500 | [[package]] 501 | category = "main" 502 | description = "Python 2 and 3 compatibility utilities" 503 | name = "six" 504 | optional = false 505 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*" 506 | version = "1.15.0" 507 | 508 | [[package]] 509 | category = "main" 510 | description = "A modern CSS selector implementation for Beautiful Soup." 511 | name = "soupsieve" 512 | optional = false 513 | python-versions = "*" 514 | version = "1.9.6" 515 | 516 | [[package]] 517 | category = "main" 518 | description = "threadpoolctl" 519 | name = "threadpoolctl" 520 | optional = false 521 | python-versions = ">=3.5" 522 | version = "2.1.0" 523 | 524 | [[package]] 525 | category = "main" 526 | description = "Fast, Extensible Progress Meter" 527 | name = "tqdm" 528 | optional = false 529 | python-versions = ">=2.6, !=3.0.*, !=3.1.*" 530 | version = "4.48.2" 531 | 532 | [package.extras] 533 | dev = ["py-make (>=0.1.0)", "twine", "argopt", "pydoc-markdown"] 534 | 535 | [[package]] 536 | category = "main" 537 | description = "Backported and Experimental Type Hints for Python 3.5+" 538 | marker = "python_version < \"3.8\"" 539 | name = "typing-extensions" 540 | optional = false 541 | python-versions = "*" 542 | version = "3.7.4.3" 543 | 544 | [[package]] 545 | category = "main" 546 | description = "HTTP library with thread-safe connection pooling, file post, and more." 
547 | name = "urllib3" 548 | optional = false 549 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, <4" 550 | version = "1.25.10" 551 | 552 | [package.extras] 553 | brotli = ["brotlipy (>=0.6.0)"] 554 | secure = ["certifi", "cryptography (>=1.3.4)", "idna (>=2.0.0)", "pyOpenSSL (>=0.14)", "ipaddress"] 555 | socks = ["PySocks (>=1.5.6,<1.5.7 || >1.5.7,<2.0)"] 556 | 557 | [[package]] 558 | category = "main" 559 | description = "The lightning-fast ASGI server." 560 | name = "uvicorn" 561 | optional = false 562 | python-versions = "*" 563 | version = "0.11.8" 564 | 565 | [package.dependencies] 566 | click = ">=7.0.0,<8.0.0" 567 | h11 = ">=0.8,<0.10" 568 | httptools = ">=0.1.0,<0.2.0" 569 | uvloop = ">=0.14.0" 570 | websockets = ">=8.0.0,<9.0.0" 571 | 572 | [package.extras] 573 | watchgodreload = ["watchgod (>=0.6,<0.7)"] 574 | 575 | [[package]] 576 | category = "main" 577 | description = "Fast implementation of asyncio event loop on top of libuv" 578 | marker = "sys_platform != \"win32\" and sys_platform != \"cygwin\" and platform_python_implementation != \"PyPy\"" 579 | name = "uvloop" 580 | optional = false 581 | python-versions = "*" 582 | version = "0.14.0" 583 | 584 | [[package]] 585 | category = "main" 586 | description = "An implementation of the WebSocket Protocol (RFC 6455 & 7692)" 587 | name = "websockets" 588 | optional = false 589 | python-versions = ">=3.6.1" 590 | version = "8.1" 591 | 592 | [[package]] 593 | category = "main" 594 | description = "The comprehensive WSGI web application library." 595 | name = "werkzeug" 596 | optional = false 597 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 598 | version = "1.0.1" 599 | 600 | [package.extras] 601 | dev = ["pytest", "pytest-timeout", "coverage", "tox", "sphinx", "pallets-sphinx-themes", "sphinx-issues"] 602 | watchdog = ["watchdog"] 603 | 604 | [[package]] 605 | category = "main" 606 | description = "Yet another URL library" 607 | name = "yarl" 608 | optional = false 609 | python-versions = ">=3.5" 610 | version = "1.5.1" 611 | 612 | [package.dependencies] 613 | idna = ">=2.0" 614 | multidict = ">=4.0" 615 | 616 | [package.dependencies.typing-extensions] 617 | python = "<3.8" 618 | version = ">=3.7.4" 619 | 620 | [[package]] 621 | category = "main" 622 | description = "Backport of pathlib-compatible object wrapper for zip files" 623 | marker = "python_version < \"3.8\"" 624 | name = "zipp" 625 | optional = false 626 | python-versions = ">=3.6" 627 | version = "3.1.0" 628 | 629 | [package.extras] 630 | docs = ["sphinx", "jaraco.packaging (>=3.2)", "rst.linker (>=1.9)"] 631 | testing = ["jaraco.itertools", "func-timeout"] 632 | 633 | [metadata] 634 | content-hash = "e2f69ddb89178d8865436ba1b16f0ac88ddad592cf9e215eacc27615829896a1" 635 | lock-version = "1.0" 636 | python-versions = "^3.7" 637 | 638 | [metadata.files] 639 | aiohttp = [ 640 | {file = "aiohttp-3.6.2-cp35-cp35m-macosx_10_13_x86_64.whl", hash = "sha256:1e984191d1ec186881ffaed4581092ba04f7c61582a177b187d3a2f07ed9719e"}, 641 | {file = "aiohttp-3.6.2-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:50aaad128e6ac62e7bf7bd1f0c0a24bc968a0c0590a726d5a955af193544bcec"}, 642 | {file = "aiohttp-3.6.2-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:65f31b622af739a802ca6fd1a3076fd0ae523f8485c52924a89561ba10c49b48"}, 643 | {file = "aiohttp-3.6.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:ae55bac364c405caa23a4f2d6cfecc6a0daada500274ffca4a9230e7129eac59"}, 644 | {file = "aiohttp-3.6.2-cp36-cp36m-win32.whl", hash = 
"sha256:344c780466b73095a72c616fac5ea9c4665add7fc129f285fbdbca3cccf4612a"}, 645 | {file = "aiohttp-3.6.2-cp36-cp36m-win_amd64.whl", hash = "sha256:4c6efd824d44ae697814a2a85604d8e992b875462c6655da161ff18fd4f29f17"}, 646 | {file = "aiohttp-3.6.2-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:2f4d1a4fdce595c947162333353d4a44952a724fba9ca3205a3df99a33d1307a"}, 647 | {file = "aiohttp-3.6.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:6206a135d072f88da3e71cc501c59d5abffa9d0bb43269a6dcd28d66bfafdbdd"}, 648 | {file = "aiohttp-3.6.2-cp37-cp37m-win32.whl", hash = "sha256:b778ce0c909a2653741cb4b1ac7015b5c130ab9c897611df43ae6a58523cb965"}, 649 | {file = "aiohttp-3.6.2-cp37-cp37m-win_amd64.whl", hash = "sha256:32e5f3b7e511aa850829fbe5aa32eb455e5534eaa4b1ce93231d00e2f76e5654"}, 650 | {file = "aiohttp-3.6.2-py3-none-any.whl", hash = "sha256:460bd4237d2dbecc3b5ed57e122992f60188afe46e7319116da5eb8a9dfedba4"}, 651 | {file = "aiohttp-3.6.2.tar.gz", hash = "sha256:259ab809ff0727d0e834ac5e8a283dc5e3e0ecc30c4d80b3cd17a4139ce1f326"}, 652 | ] 653 | async-timeout = [ 654 | {file = "async-timeout-3.0.1.tar.gz", hash = "sha256:0c3c816a028d47f659d6ff5c745cb2acf1f966da1fe5c19c77a70282b25f4c5f"}, 655 | {file = "async_timeout-3.0.1-py3-none-any.whl", hash = "sha256:4291ca197d287d274d0b6cb5d6f8f8f82d434ed288f962539ff18cc9012f9ea3"}, 656 | ] 657 | attrs = [ 658 | {file = "attrs-20.1.0-py2.py3-none-any.whl", hash = "sha256:2867b7b9f8326499ab5b0e2d12801fa5c98842d2cbd22b35112ae04bf85b4dff"}, 659 | {file = "attrs-20.1.0.tar.gz", hash = "sha256:0ef97238856430dcf9228e07f316aefc17e8939fc8507e18c6501b761ef1a42a"}, 660 | ] 661 | beautifulsoup4 = [ 662 | {file = "beautifulsoup4-4.9.1-py2-none-any.whl", hash = "sha256:e718f2342e2e099b640a34ab782407b7b676f47ee272d6739e60b8ea23829f2c"}, 663 | {file = "beautifulsoup4-4.9.1-py3-none-any.whl", hash = "sha256:a6237df3c32ccfaee4fd201c8f5f9d9df619b93121d01353a64a73ce8c6ef9a8"}, 664 | {file = "beautifulsoup4-4.9.1.tar.gz", hash = "sha256:73cc4d115b96f79c7d77c1c7f7a0a8d4c57860d1041df407dd1aae7f07a77fd7"}, 665 | ] 666 | blist = [ 667 | {file = "blist-1.3.6.tar.gz", hash = "sha256:3a12c450b001bdf895b30ae818d4d6d3f1552096b8c995f0fe0c74bef04d1fc3"}, 668 | ] 669 | certifi = [ 670 | {file = "certifi-2020.6.20-py2.py3-none-any.whl", hash = "sha256:8fc0819f1f30ba15bdb34cceffb9ef04d99f420f68eb75d901e9560b8749fc41"}, 671 | {file = "certifi-2020.6.20.tar.gz", hash = "sha256:5930595817496dd21bb8dc35dad090f1c2cd0adfaf21204bf6732ca5d8ee34d3"}, 672 | ] 673 | chardet = [ 674 | {file = "chardet-3.0.4-py2.py3-none-any.whl", hash = "sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691"}, 675 | {file = "chardet-3.0.4.tar.gz", hash = "sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae"}, 676 | ] 677 | click = [ 678 | {file = "click-7.1.2-py2.py3-none-any.whl", hash = "sha256:dacca89f4bfadd5de3d7489b7c8a566eee0d3676333fbb50030263894c38c0dc"}, 679 | {file = "click-7.1.2.tar.gz", hash = "sha256:d2b5255c7c6349bc1bd1e59e08cd12acbbd63ce649f2588755783aa94dfb6b1a"}, 680 | ] 681 | colorama = [ 682 | {file = "colorama-0.4.3-py2.py3-none-any.whl", hash = "sha256:7d73d2a99753107a36ac6b455ee49046802e59d9d076ef8e47b61499fa29afff"}, 683 | {file = "colorama-0.4.3.tar.gz", hash = "sha256:e96da0d330793e2cb9485e9ddfd918d456036c7149416295932478192f4436a1"}, 684 | ] 685 | colorful = [ 686 | {file = "colorful-0.5.4-py2.py3-none-any.whl", hash = "sha256:8d264b52a39aae4c0ba3e2a46afbaec81b0559a99be0d2cfe2aba4cf94531348"}, 687 | {file = "colorful-0.5.4.tar.gz", hash = 
"sha256:86848ad4e2eda60cd2519d8698945d22f6f6551e23e95f3f14dfbb60997807ea"}, 688 | ] 689 | filelock = [ 690 | {file = "filelock-3.0.12-py3-none-any.whl", hash = "sha256:929b7d63ec5b7d6b71b0fa5ac14e030b3f70b75747cef1b10da9b879fef15836"}, 691 | {file = "filelock-3.0.12.tar.gz", hash = "sha256:18d82244ee114f543149c66a6e0c14e9c4f8a1044b5cdaadd0f82159d6a6ff59"}, 692 | ] 693 | flask = [ 694 | {file = "Flask-1.1.2-py2.py3-none-any.whl", hash = "sha256:8a4fdd8936eba2512e9c85df320a37e694c93945b33ef33c89946a340a238557"}, 695 | {file = "Flask-1.1.2.tar.gz", hash = "sha256:4efa1ae2d7c9865af48986de8aeb8504bf32c7f3d6fdc9353d34b21f4b127060"}, 696 | ] 697 | google = [ 698 | {file = "google-3.0.0-py2.py3-none-any.whl", hash = "sha256:889cf695f84e4ae2c55fbc0cfdaf4c1e729417fa52ab1db0485202ba173e4935"}, 699 | {file = "google-3.0.0.tar.gz", hash = "sha256:143530122ee5130509ad5e989f0512f7cb218b2d4eddbafbad40fd10e8d8ccbe"}, 700 | ] 701 | grpcio = [ 702 | {file = "grpcio-1.31.0-cp27-cp27m-macosx_10_9_x86_64.whl", hash = "sha256:e8c3264b0fd728aadf3f0324471843f65bd3b38872bdab2a477e31ffb685dd5b"}, 703 | {file = "grpcio-1.31.0-cp27-cp27m-manylinux2010_i686.whl", hash = "sha256:5fb0923b16590bac338e92d98c7d8effb3cfad1d2e18c71bf86bde32c49cd6dd"}, 704 | {file = "grpcio-1.31.0-cp27-cp27m-manylinux2010_x86_64.whl", hash = "sha256:58d7121f48cb94535a4cedcce32921d0d0a78563c7372a143dedeec196d1c637"}, 705 | {file = "grpcio-1.31.0-cp27-cp27m-win32.whl", hash = "sha256:ea849210e7362559f326cbe603d5b8d8bb1e556e86a7393b5a8847057de5b084"}, 706 | {file = "grpcio-1.31.0-cp27-cp27m-win_amd64.whl", hash = "sha256:ba3e43cb984399064ffaa3c0997576e46a1e268f9da05f97cd9b272f0b59ee71"}, 707 | {file = "grpcio-1.31.0-cp27-cp27mu-linux_armv7l.whl", hash = "sha256:ebb2ca09fa17537e35508a29dcb05575d4d9401138a68e83d1c605d65e8a1770"}, 708 | {file = "grpcio-1.31.0-cp27-cp27mu-manylinux2010_i686.whl", hash = "sha256:292635f05b6ce33f87116951d0b3d8d330bdfc5cac74f739370d60981e8c256c"}, 709 | {file = "grpcio-1.31.0-cp27-cp27mu-manylinux2010_x86_64.whl", hash = "sha256:92e54ab65e782f227e751c7555918afaba8d1229601687e89b80c2b65d2f6642"}, 710 | {file = "grpcio-1.31.0-cp35-cp35m-linux_armv7l.whl", hash = "sha256:013287f99c99b201aa8a5f6bc7918f616739b9be031db132d9e3b8453e95e151"}, 711 | {file = "grpcio-1.31.0-cp35-cp35m-macosx_10_7_intel.whl", hash = "sha256:d2c5e05c257859febd03f5d81b5015e1946d6bcf475c7bf63ee99cea8ab0d590"}, 712 | {file = "grpcio-1.31.0-cp35-cp35m-manylinux2010_i686.whl", hash = "sha256:c9016ab1eaf4e054099303287195f3746bd4e69f2631d040f9dca43e910a5408"}, 713 | {file = "grpcio-1.31.0-cp35-cp35m-manylinux2010_x86_64.whl", hash = "sha256:baaa036540d7ace433bdf38a3fe5e41cf9f84cdf10a88bac805f678a7ca8ddcc"}, 714 | {file = "grpcio-1.31.0-cp35-cp35m-manylinux2014_i686.whl", hash = "sha256:75e383053dccb610590aa53eed5278db5c09bf498d3b5105ce6c776478f59352"}, 715 | {file = "grpcio-1.31.0-cp35-cp35m-manylinux2014_x86_64.whl", hash = "sha256:739a72abffbd36083ff7adbb862cf1afc1e311c35834bed9c0361d8e68b063e1"}, 716 | {file = "grpcio-1.31.0-cp35-cp35m-win32.whl", hash = "sha256:f04c59d186af3157dc8811114130aaeae92e90a65283733f41de94eed484e1f7"}, 717 | {file = "grpcio-1.31.0-cp35-cp35m-win_amd64.whl", hash = "sha256:ef9fce98b6fe03874c2a6576b02aec1a0df25742cd67d1d7b75a49e30aa74225"}, 718 | {file = "grpcio-1.31.0-cp36-cp36m-linux_armv7l.whl", hash = "sha256:08a9b648dbe8852ff94b73a1c96da126834c3057ba2301d13e8c4adff334c482"}, 719 | {file = "grpcio-1.31.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:c22b19abba63562a5a200e586b5bde39d26c8ec30c92e26d209d81182371693b"}, 720 
| {file = "grpcio-1.31.0-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:0397616355760cd8282ed5ea34d51830ae4cb6613b7e5f66bed3be5d041b8b9a"}, 721 | {file = "grpcio-1.31.0-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:259240aab2603891553e17ad5b2655693df79e02a9b887ff605bdeb2fcd3dcc9"}, 722 | {file = "grpcio-1.31.0-cp36-cp36m-manylinux2014_i686.whl", hash = "sha256:8ca26b489b5dc1e3d31807d329c23d6cb06fe40fbae25b0649b718947936e26a"}, 723 | {file = "grpcio-1.31.0-cp36-cp36m-manylinux2014_x86_64.whl", hash = "sha256:bf39977282a79dc1b2765cc3402c0ada571c29a491caec6ed12c0993c1ec115e"}, 724 | {file = "grpcio-1.31.0-cp36-cp36m-win32.whl", hash = "sha256:f5b0870b733bcb7b6bf05a02035e7aaf20f599d3802b390282d4c2309f825f1d"}, 725 | {file = "grpcio-1.31.0-cp36-cp36m-win_amd64.whl", hash = "sha256:074871a184483d5cd0746fd01e7d214d3ee9d36e67e32a5786b0a21f29fb8304"}, 726 | {file = "grpcio-1.31.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:220c46b1fc9c9a6fcca4caac398f08f0ed43cdd63c45b7458983c4a1575ef6df"}, 727 | {file = "grpcio-1.31.0-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:7a11b1ebb3210f34913b8be6995936bf9ebc541a65ab69e75db5ce1fe5047e8f"}, 728 | {file = "grpcio-1.31.0-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:3c2aa6d7a5e5bf73fdb1715eee777efe06dd39df03383f1cc095b2fdb34883e6"}, 729 | {file = "grpcio-1.31.0-cp37-cp37m-manylinux2014_i686.whl", hash = "sha256:e64bddd09842ef508d72ca354319b0eb126205d951e8ac3128fe9869bd563552"}, 730 | {file = "grpcio-1.31.0-cp37-cp37m-manylinux2014_x86_64.whl", hash = "sha256:5d7faa89992e015d245750ca9ac916c161bbf72777b2c60abc61da3fae41339e"}, 731 | {file = "grpcio-1.31.0-cp37-cp37m-win32.whl", hash = "sha256:43d44548ad6ee738b941abd9f09e3b83a5c13f3e1410321023c3c148ba50e796"}, 732 | {file = "grpcio-1.31.0-cp37-cp37m-win_amd64.whl", hash = "sha256:bf00ab06ea4f89976288f4d6224d4aa120780e30c955d4f85c3214ada29b3ddf"}, 733 | {file = "grpcio-1.31.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:344b50865914cc8e6d023457bffee9a640abb18f75d0f2bb519041961c748da9"}, 734 | {file = "grpcio-1.31.0-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:63ee8e02d04272c3d103f44b4bce5d43ea757dd288673cea212d2f7da27967d2"}, 735 | {file = "grpcio-1.31.0-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:a9a7ae74cb3108e6457cf15532d4c300324b48fbcf3ef290bcd2835745f20510"}, 736 | {file = "grpcio-1.31.0-cp38-cp38-manylinux2014_i686.whl", hash = "sha256:64077e3a9a7cf2f59e6c76d503c8de1f18a76428f41a5b000dc53c48a0b772ff"}, 737 | {file = "grpcio-1.31.0-cp38-cp38-manylinux2014_x86_64.whl", hash = "sha256:8b42f0ac76be07a5fa31117a3388d754ad35ef05e2e34be185ca9ccbcfac2069"}, 738 | {file = "grpcio-1.31.0-cp38-cp38-win32.whl", hash = "sha256:8002a89ea91c0078c15d3c0daf423fd4968946be78f08545e807ea9a5ff8054a"}, 739 | {file = "grpcio-1.31.0-cp38-cp38-win_amd64.whl", hash = "sha256:0fa86ac4452602c79774783aa68979a1a7625ebb7eaabee2b6550b975b9d61e6"}, 740 | {file = "grpcio-1.31.0.tar.gz", hash = "sha256:5043440c45c0a031f387e7f48527541c65d672005fb24cf18ef6857483557d39"}, 741 | ] 742 | h11 = [ 743 | {file = "h11-0.9.0-py2.py3-none-any.whl", hash = "sha256:4bc6d6a1238b7615b266ada57e0618568066f57dd6fa967d1290ec9309b2f2f1"}, 744 | {file = "h11-0.9.0.tar.gz", hash = "sha256:33d4bca7be0fa039f4e84d50ab00531047e53d6ee8ffbc83501ea602c169cae1"}, 745 | ] 746 | httptools = [ 747 | {file = "httptools-0.1.1-cp35-cp35m-macosx_10_13_x86_64.whl", hash = "sha256:a2719e1d7a84bb131c4f1e0cb79705034b48de6ae486eb5297a139d6a3296dce"}, 748 | {file = "httptools-0.1.1-cp35-cp35m-manylinux1_x86_64.whl", hash = 
"sha256:fa3cd71e31436911a44620473e873a256851e1f53dee56669dae403ba41756a4"}, 749 | {file = "httptools-0.1.1-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:86c6acd66765a934e8730bf0e9dfaac6fdcf2a4334212bd4a0a1c78f16475ca6"}, 750 | {file = "httptools-0.1.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:bc3114b9edbca5a1eb7ae7db698c669eb53eb8afbbebdde116c174925260849c"}, 751 | {file = "httptools-0.1.1-cp36-cp36m-win_amd64.whl", hash = "sha256:ac0aa11e99454b6a66989aa2d44bca41d4e0f968e395a0a8f164b401fefe359a"}, 752 | {file = "httptools-0.1.1-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:96da81e1992be8ac2fd5597bf0283d832287e20cb3cfde8996d2b00356d4e17f"}, 753 | {file = "httptools-0.1.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:56b6393c6ac7abe632f2294da53f30d279130a92e8ae39d8d14ee2e1b05ad1f2"}, 754 | {file = "httptools-0.1.1-cp37-cp37m-win_amd64.whl", hash = "sha256:96eb359252aeed57ea5c7b3d79839aaa0382c9d3149f7d24dd7172b1bcecb009"}, 755 | {file = "httptools-0.1.1-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:fea04e126014169384dee76a153d4573d90d0cbd1d12185da089f73c78390437"}, 756 | {file = "httptools-0.1.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:3592e854424ec94bd17dc3e0c96a64e459ec4147e6d53c0a42d0ebcef9cb9c5d"}, 757 | {file = "httptools-0.1.1-cp38-cp38-win_amd64.whl", hash = "sha256:0a4b1b2012b28e68306575ad14ad5e9120b34fccd02a81eb08838d7e3bbb48be"}, 758 | {file = "httptools-0.1.1.tar.gz", hash = "sha256:41b573cf33f64a8f8f3400d0a7faf48e1888582b6f6e02b82b9bd4f0bf7497ce"}, 759 | ] 760 | idna = [ 761 | {file = "idna-2.10-py2.py3-none-any.whl", hash = "sha256:b97d804b1e9b523befed77c48dacec60e6dcb0b5391d57af6a65a312a90648c0"}, 762 | {file = "idna-2.10.tar.gz", hash = "sha256:b307872f855b18632ce0c21c5e45be78c0ea7ae4c15c828c20788b26921eb3f6"}, 763 | ] 764 | importlib-metadata = [ 765 | {file = "importlib_metadata-1.7.0-py2.py3-none-any.whl", hash = "sha256:dc15b2969b4ce36305c51eebe62d418ac7791e9a157911d58bfb1f9ccd8e2070"}, 766 | {file = "importlib_metadata-1.7.0.tar.gz", hash = "sha256:90bb658cdbbf6d1735b6341ce708fc7024a3e14e99ffdc5783edea9f9b077f83"}, 767 | ] 768 | itsdangerous = [ 769 | {file = "itsdangerous-1.1.0-py2.py3-none-any.whl", hash = "sha256:b12271b2047cb23eeb98c8b5622e2e5c5e9abd9784a153e9d8ef9cb4dd09d749"}, 770 | {file = "itsdangerous-1.1.0.tar.gz", hash = "sha256:321b033d07f2a4136d3ec762eac9f16a10ccd60f53c0c91af90217ace7ba1f19"}, 771 | ] 772 | jinja2 = [ 773 | {file = "Jinja2-2.11.2-py2.py3-none-any.whl", hash = "sha256:f0a4641d3cf955324a89c04f3d94663aa4d638abe8f733ecd3582848e1c37035"}, 774 | {file = "Jinja2-2.11.2.tar.gz", hash = "sha256:89aab215427ef59c34ad58735269eb58b1a5808103067f7bb9d5836c651b3bb0"}, 775 | ] 776 | joblib = [ 777 | {file = "joblib-0.16.0-py3-none-any.whl", hash = "sha256:d348c5d4ae31496b2aa060d6d9b787864dd204f9480baaa52d18850cb43e9f49"}, 778 | {file = "joblib-0.16.0.tar.gz", hash = "sha256:8f52bf24c64b608bf0b2563e0e47d6fcf516abc8cfafe10cfd98ad66d94f92d6"}, 779 | ] 780 | jsonschema = [ 781 | {file = "jsonschema-3.2.0-py2.py3-none-any.whl", hash = "sha256:4e5b3cf8216f577bee9ce139cbe72eca3ea4f292ec60928ff24758ce626cd163"}, 782 | {file = "jsonschema-3.2.0.tar.gz", hash = "sha256:c8a85b28d377cc7737e46e2d9f2b4f44ee3c0e1deac6bf46ddefc7187d30797a"}, 783 | ] 784 | markupsafe = [ 785 | {file = "MarkupSafe-1.1.1-cp27-cp27m-macosx_10_6_intel.whl", hash = "sha256:09027a7803a62ca78792ad89403b1b7a73a01c8cb65909cd876f7fcebd79b161"}, 786 | {file = "MarkupSafe-1.1.1-cp27-cp27m-manylinux1_i686.whl", hash = 
"sha256:e249096428b3ae81b08327a63a485ad0878de3fb939049038579ac0ef61e17e7"}, 787 | {file = "MarkupSafe-1.1.1-cp27-cp27m-manylinux1_x86_64.whl", hash = "sha256:500d4957e52ddc3351cabf489e79c91c17f6e0899158447047588650b5e69183"}, 788 | {file = "MarkupSafe-1.1.1-cp27-cp27m-win32.whl", hash = "sha256:b2051432115498d3562c084a49bba65d97cf251f5a331c64a12ee7e04dacc51b"}, 789 | {file = "MarkupSafe-1.1.1-cp27-cp27m-win_amd64.whl", hash = "sha256:98c7086708b163d425c67c7a91bad6e466bb99d797aa64f965e9d25c12111a5e"}, 790 | {file = "MarkupSafe-1.1.1-cp27-cp27mu-manylinux1_i686.whl", hash = "sha256:cd5df75523866410809ca100dc9681e301e3c27567cf498077e8551b6d20e42f"}, 791 | {file = "MarkupSafe-1.1.1-cp27-cp27mu-manylinux1_x86_64.whl", hash = "sha256:43a55c2930bbc139570ac2452adf3d70cdbb3cfe5912c71cdce1c2c6bbd9c5d1"}, 792 | {file = "MarkupSafe-1.1.1-cp34-cp34m-macosx_10_6_intel.whl", hash = "sha256:1027c282dad077d0bae18be6794e6b6b8c91d58ed8a8d89a89d59693b9131db5"}, 793 | {file = "MarkupSafe-1.1.1-cp34-cp34m-manylinux1_i686.whl", hash = "sha256:62fe6c95e3ec8a7fad637b7f3d372c15ec1caa01ab47926cfdf7a75b40e0eac1"}, 794 | {file = "MarkupSafe-1.1.1-cp34-cp34m-manylinux1_x86_64.whl", hash = "sha256:88e5fcfb52ee7b911e8bb6d6aa2fd21fbecc674eadd44118a9cc3863f938e735"}, 795 | {file = "MarkupSafe-1.1.1-cp34-cp34m-win32.whl", hash = "sha256:ade5e387d2ad0d7ebf59146cc00c8044acbd863725f887353a10df825fc8ae21"}, 796 | {file = "MarkupSafe-1.1.1-cp34-cp34m-win_amd64.whl", hash = "sha256:09c4b7f37d6c648cb13f9230d847adf22f8171b1ccc4d5682398e77f40309235"}, 797 | {file = "MarkupSafe-1.1.1-cp35-cp35m-macosx_10_6_intel.whl", hash = "sha256:79855e1c5b8da654cf486b830bd42c06e8780cea587384cf6545b7d9ac013a0b"}, 798 | {file = "MarkupSafe-1.1.1-cp35-cp35m-manylinux1_i686.whl", hash = "sha256:c8716a48d94b06bb3b2524c2b77e055fb313aeb4ea620c8dd03a105574ba704f"}, 799 | {file = "MarkupSafe-1.1.1-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:7c1699dfe0cf8ff607dbdcc1e9b9af1755371f92a68f706051cc8c37d447c905"}, 800 | {file = "MarkupSafe-1.1.1-cp35-cp35m-win32.whl", hash = "sha256:6dd73240d2af64df90aa7c4e7481e23825ea70af4b4922f8ede5b9e35f78a3b1"}, 801 | {file = "MarkupSafe-1.1.1-cp35-cp35m-win_amd64.whl", hash = "sha256:9add70b36c5666a2ed02b43b335fe19002ee5235efd4b8a89bfcf9005bebac0d"}, 802 | {file = "MarkupSafe-1.1.1-cp36-cp36m-macosx_10_6_intel.whl", hash = "sha256:24982cc2533820871eba85ba648cd53d8623687ff11cbb805be4ff7b4c971aff"}, 803 | {file = "MarkupSafe-1.1.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:00bc623926325b26bb9605ae9eae8a215691f33cae5df11ca5424f06f2d1f473"}, 804 | {file = "MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:717ba8fe3ae9cc0006d7c451f0bb265ee07739daf76355d06366154ee68d221e"}, 805 | {file = "MarkupSafe-1.1.1-cp36-cp36m-win32.whl", hash = "sha256:535f6fc4d397c1563d08b88e485c3496cf5784e927af890fb3c3aac7f933ec66"}, 806 | {file = "MarkupSafe-1.1.1-cp36-cp36m-win_amd64.whl", hash = "sha256:b1282f8c00509d99fef04d8ba936b156d419be841854fe901d8ae224c59f0be5"}, 807 | {file = "MarkupSafe-1.1.1-cp37-cp37m-macosx_10_6_intel.whl", hash = "sha256:8defac2f2ccd6805ebf65f5eeb132adcf2ab57aa11fdf4c0dd5169a004710e7d"}, 808 | {file = "MarkupSafe-1.1.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:46c99d2de99945ec5cb54f23c8cd5689f6d7177305ebff350a58ce5f8de1669e"}, 809 | {file = "MarkupSafe-1.1.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:ba59edeaa2fc6114428f1637ffff42da1e311e29382d81b339c1817d37ec93c6"}, 810 | {file = "MarkupSafe-1.1.1-cp37-cp37m-win32.whl", hash = 
"sha256:b00c1de48212e4cc9603895652c5c410df699856a2853135b3967591e4beebc2"}, 811 | {file = "MarkupSafe-1.1.1-cp37-cp37m-win_amd64.whl", hash = "sha256:9bf40443012702a1d2070043cb6291650a0841ece432556f784f004937f0f32c"}, 812 | {file = "MarkupSafe-1.1.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:6788b695d50a51edb699cb55e35487e430fa21f1ed838122d722e0ff0ac5ba15"}, 813 | {file = "MarkupSafe-1.1.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:cdb132fc825c38e1aeec2c8aa9338310d29d337bebbd7baa06889d09a60a1fa2"}, 814 | {file = "MarkupSafe-1.1.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:13d3144e1e340870b25e7b10b98d779608c02016d5184cfb9927a9f10c689f42"}, 815 | {file = "MarkupSafe-1.1.1-cp38-cp38-win32.whl", hash = "sha256:596510de112c685489095da617b5bcbbac7dd6384aeebeda4df6025d0256a81b"}, 816 | {file = "MarkupSafe-1.1.1-cp38-cp38-win_amd64.whl", hash = "sha256:e8313f01ba26fbbe36c7be1966a7b7424942f670f38e666995b88d012765b9be"}, 817 | {file = "MarkupSafe-1.1.1.tar.gz", hash = "sha256:29872e92839765e546828bb7754a68c418d927cd064fd4708fab9fe9c8bb116b"}, 818 | ] 819 | msgpack = [ 820 | {file = "msgpack-1.0.0-cp35-cp35m-manylinux1_i686.whl", hash = "sha256:cec8bf10981ed70998d98431cd814db0ecf3384e6b113366e7f36af71a0fca08"}, 821 | {file = "msgpack-1.0.0-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:aa5c057eab4f40ec47ea6f5a9825846be2ff6bf34102c560bad5cad5a677c5be"}, 822 | {file = "msgpack-1.0.0-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:4233b7f86c1208190c78a525cd3828ca1623359ef48f78a6fea4b91bb995775a"}, 823 | {file = "msgpack-1.0.0-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:b3758dfd3423e358bbb18a7cccd1c74228dffa7a697e5be6cb9535de625c0dbf"}, 824 | {file = "msgpack-1.0.0-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:25b3bc3190f3d9d965b818123b7752c5dfb953f0d774b454fd206c18fe384fb8"}, 825 | {file = "msgpack-1.0.0-cp36-cp36m-win32.whl", hash = "sha256:e7bbdd8e2b277b77782f3ce34734b0dfde6cbe94ddb74de8d733d603c7f9e2b1"}, 826 | {file = "msgpack-1.0.0-cp36-cp36m-win_amd64.whl", hash = "sha256:5dba6d074fac9b24f29aaf1d2d032306c27f04187651511257e7831733293ec2"}, 827 | {file = "msgpack-1.0.0-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:908944e3f038bca67fcfedb7845c4a257c7749bf9818632586b53bcf06ba4b97"}, 828 | {file = "msgpack-1.0.0-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:db685187a415f51d6b937257474ca72199f393dad89534ebbdd7d7a3b000080e"}, 829 | {file = "msgpack-1.0.0-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:ea41c9219c597f1d2bf6b374d951d310d58684b5de9dc4bd2976db9e1e22c140"}, 830 | {file = "msgpack-1.0.0-cp37-cp37m-win32.whl", hash = "sha256:e35b051077fc2f3ce12e7c6a34cf309680c63a842db3a0616ea6ed25ad20d272"}, 831 | {file = "msgpack-1.0.0-cp37-cp37m-win_amd64.whl", hash = "sha256:5bea44181fc8e18eed1d0cd76e355073f00ce232ff9653a0ae88cb7d9e643322"}, 832 | {file = "msgpack-1.0.0-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:c901e8058dd6653307906c5f157f26ed09eb94a850dddd989621098d347926ab"}, 833 | {file = "msgpack-1.0.0-cp38-cp38-manylinux1_i686.whl", hash = "sha256:271b489499a43af001a2e42f42d876bb98ccaa7e20512ff37ca78c8e12e68f84"}, 834 | {file = "msgpack-1.0.0-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:7a22c965588baeb07242cb561b63f309db27a07382825fc98aecaf0827c1538e"}, 835 | {file = "msgpack-1.0.0-cp38-cp38-win32.whl", hash = "sha256:002a0d813e1f7b60da599bdf969e632074f9eec1b96cbed8fb0973a63160a408"}, 836 | {file = "msgpack-1.0.0-cp38-cp38-win_amd64.whl", hash = "sha256:39c54fdebf5fa4dda733369012c59e7d085ebdfe35b6cf648f09d16708f1be5d"}, 837 | {file = 
"msgpack-1.0.0.tar.gz", hash = "sha256:9534d5cc480d4aff720233411a1f765be90885750b07df772380b34c10ecb5c0"}, 838 | ] 839 | multidict = [ 840 | {file = "multidict-4.7.6-cp35-cp35m-macosx_10_14_x86_64.whl", hash = "sha256:275ca32383bc5d1894b6975bb4ca6a7ff16ab76fa622967625baeebcf8079000"}, 841 | {file = "multidict-4.7.6-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:1ece5a3369835c20ed57adadc663400b5525904e53bae59ec854a5d36b39b21a"}, 842 | {file = "multidict-4.7.6-cp35-cp35m-win32.whl", hash = "sha256:5141c13374e6b25fe6bf092052ab55c0c03d21bd66c94a0e3ae371d3e4d865a5"}, 843 | {file = "multidict-4.7.6-cp35-cp35m-win_amd64.whl", hash = "sha256:9456e90649005ad40558f4cf51dbb842e32807df75146c6d940b6f5abb4a78f3"}, 844 | {file = "multidict-4.7.6-cp36-cp36m-macosx_10_14_x86_64.whl", hash = "sha256:e0d072ae0f2a179c375f67e3da300b47e1a83293c554450b29c900e50afaae87"}, 845 | {file = "multidict-4.7.6-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:3750f2205b800aac4bb03b5ae48025a64e474d2c6cc79547988ba1d4122a09e2"}, 846 | {file = "multidict-4.7.6-cp36-cp36m-win32.whl", hash = "sha256:f07acae137b71af3bb548bd8da720956a3bc9f9a0b87733e0899226a2317aeb7"}, 847 | {file = "multidict-4.7.6-cp36-cp36m-win_amd64.whl", hash = "sha256:6513728873f4326999429a8b00fc7ceddb2509b01d5fd3f3be7881a257b8d463"}, 848 | {file = "multidict-4.7.6-cp37-cp37m-macosx_10_14_x86_64.whl", hash = "sha256:feed85993dbdb1dbc29102f50bca65bdc68f2c0c8d352468c25b54874f23c39d"}, 849 | {file = "multidict-4.7.6-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:fcfbb44c59af3f8ea984de67ec7c306f618a3ec771c2843804069917a8f2e255"}, 850 | {file = "multidict-4.7.6-cp37-cp37m-win32.whl", hash = "sha256:4538273208e7294b2659b1602490f4ed3ab1c8cf9dbdd817e0e9db8e64be2507"}, 851 | {file = "multidict-4.7.6-cp37-cp37m-win_amd64.whl", hash = "sha256:d14842362ed4cf63751648e7672f7174c9818459d169231d03c56e84daf90b7c"}, 852 | {file = "multidict-4.7.6-cp38-cp38-macosx_10_14_x86_64.whl", hash = "sha256:c026fe9a05130e44157b98fea3ab12969e5b60691a276150db9eda71710cd10b"}, 853 | {file = "multidict-4.7.6-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:51a4d210404ac61d32dada00a50ea7ba412e6ea945bbe992e4d7a595276d2ec7"}, 854 | {file = "multidict-4.7.6-cp38-cp38-win32.whl", hash = "sha256:5cf311a0f5ef80fe73e4f4c0f0998ec08f954a6ec72b746f3c179e37de1d210d"}, 855 | {file = "multidict-4.7.6-cp38-cp38-win_amd64.whl", hash = "sha256:7388d2ef3c55a8ba80da62ecfafa06a1c097c18032a501ffd4cabbc52d7f2b19"}, 856 | {file = "multidict-4.7.6.tar.gz", hash = "sha256:fbb77a75e529021e7c4a8d4e823d88ef4d23674a202be4f5addffc72cbb91430"}, 857 | ] 858 | numpy = [ 859 | {file = "numpy-1.19.1-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:b1cca51512299841bf69add3b75361779962f9cee7d9ee3bb446d5982e925b69"}, 860 | {file = "numpy-1.19.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:c9591886fc9cbe5532d5df85cb8e0cc3b44ba8ce4367bd4cf1b93dc19713da72"}, 861 | {file = "numpy-1.19.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:cf1347450c0b7644ea142712619533553f02ef23f92f781312f6a3553d031fc7"}, 862 | {file = "numpy-1.19.1-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:ed8a311493cf5480a2ebc597d1e177231984c818a86875126cfd004241a73c3e"}, 863 | {file = "numpy-1.19.1-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:3673c8b2b29077f1b7b3a848794f8e11f401ba0b71c49fbd26fb40b71788b132"}, 864 | {file = "numpy-1.19.1-cp36-cp36m-manylinux2014_aarch64.whl", hash = "sha256:56ef7f56470c24bb67fb43dae442e946a6ce172f97c69f8d067ff8550cf782ff"}, 865 | {file = "numpy-1.19.1-cp36-cp36m-win32.whl", hash = 
"sha256:aaf42a04b472d12515debc621c31cf16c215e332242e7a9f56403d814c744624"}, 866 | {file = "numpy-1.19.1-cp36-cp36m-win_amd64.whl", hash = "sha256:082f8d4dd69b6b688f64f509b91d482362124986d98dc7dc5f5e9f9b9c3bb983"}, 867 | {file = "numpy-1.19.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:e4f6d3c53911a9d103d8ec9518190e52a8b945bab021745af4939cfc7c0d4a9e"}, 868 | {file = "numpy-1.19.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:5b6885c12784a27e957294b60f97e8b5b4174c7504665333c5e94fbf41ae5d6a"}, 869 | {file = "numpy-1.19.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:1bc0145999e8cb8aed9d4e65dd8b139adf1919e521177f198529687dbf613065"}, 870 | {file = "numpy-1.19.1-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:5a936fd51049541d86ccdeef2833cc89a18e4d3808fe58a8abeb802665c5af93"}, 871 | {file = "numpy-1.19.1-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:ef71a1d4fd4858596ae80ad1ec76404ad29701f8ca7cdcebc50300178db14dfc"}, 872 | {file = "numpy-1.19.1-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:b9792b0ac0130b277536ab8944e7b754c69560dac0415dd4b2dbd16b902c8954"}, 873 | {file = "numpy-1.19.1-cp37-cp37m-win32.whl", hash = "sha256:b12e639378c741add21fbffd16ba5ad25c0a1a17cf2b6fe4288feeb65144f35b"}, 874 | {file = "numpy-1.19.1-cp37-cp37m-win_amd64.whl", hash = "sha256:8343bf67c72e09cfabfab55ad4a43ce3f6bf6e6ced7acf70f45ded9ebb425055"}, 875 | {file = "numpy-1.19.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:e45f8e981a0ab47103181773cc0a54e650b2aef8c7b6cd07405d0fa8d869444a"}, 876 | {file = "numpy-1.19.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:667c07063940e934287993366ad5f56766bc009017b4a0fe91dbd07960d0aba7"}, 877 | {file = "numpy-1.19.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:480fdd4dbda4dd6b638d3863da3be82873bba6d32d1fc12ea1b8486ac7b8d129"}, 878 | {file = "numpy-1.19.1-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:935c27ae2760c21cd7354402546f6be21d3d0c806fffe967f745d5f2de5005a7"}, 879 | {file = "numpy-1.19.1-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:309cbcfaa103fc9a33ec16d2d62569d541b79f828c382556ff072442226d1968"}, 880 | {file = "numpy-1.19.1-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:7ed448ff4eaffeb01094959b19cbaf998ecdee9ef9932381420d514e446601cd"}, 881 | {file = "numpy-1.19.1-cp38-cp38-win32.whl", hash = "sha256:de8b4a9b56255797cbddb93281ed92acbc510fb7b15df3f01bd28f46ebc4edae"}, 882 | {file = "numpy-1.19.1-cp38-cp38-win_amd64.whl", hash = "sha256:92feb989b47f83ebef246adabc7ff3b9a59ac30601c3f6819f8913458610bdcc"}, 883 | {file = "numpy-1.19.1-pp36-pypy36_pp73-manylinux2010_x86_64.whl", hash = "sha256:e1b1dc0372f530f26a03578ac75d5e51b3868b9b76cd2facba4c9ee0eb252ab1"}, 884 | {file = "numpy-1.19.1.zip", hash = "sha256:b8456987b637232602ceb4d663cb34106f7eb780e247d51a260b84760fd8f491"}, 885 | ] 886 | pandas = [ 887 | {file = "pandas-1.1.1-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:8c9ec12c480c4d915e23ee9c8a2d8eba8509986f35f307771045c1294a2e5b73"}, 888 | {file = "pandas-1.1.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:e4b6c98f45695799990da328e6fd7d6187be32752ed64c2f22326ad66762d179"}, 889 | {file = "pandas-1.1.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:16ae070c47474008769fc443ac765ffd88c3506b4a82966e7a605592978896f9"}, 890 | {file = "pandas-1.1.1-cp36-cp36m-win32.whl", hash = "sha256:88930c74f69e97b17703600233c0eaf1f4f4dd10c14633d522724c5c1b963ec4"}, 891 | {file = "pandas-1.1.1-cp36-cp36m-win_amd64.whl", hash = "sha256:fe6f1623376b616e03d51f0dd95afd862cf9a33c18cf55ce0ed4bbe1c4444391"}, 892 | {file = 
"pandas-1.1.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:a81c4bf9c59010aa3efddbb6b9fc84a9b76dc0b4da2c2c2d50f06a9ef6ac0004"}, 893 | {file = "pandas-1.1.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:1acc2bd7fc95e5408a4456897c2c2a1ae7c6acefe108d90479ab6d98d34fcc3d"}, 894 | {file = "pandas-1.1.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:84c101d0f7bbf0d9f1be9a2f29f6fcc12415442558d067164e50a56edfb732b4"}, 895 | {file = "pandas-1.1.1-cp37-cp37m-win32.whl", hash = "sha256:391db82ebeb886143b96b9c6c6166686c9a272d00020e4e39ad63b792542d9e2"}, 896 | {file = "pandas-1.1.1-cp37-cp37m-win_amd64.whl", hash = "sha256:0366150fe8ee37ef89a45d3093e05026b5f895e42bbce3902ce3b6427f1b8471"}, 897 | {file = "pandas-1.1.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:d9644ac996149b2a51325d48d77e25c911e01aa6d39dc1b64be679cd71f683ec"}, 898 | {file = "pandas-1.1.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:41675323d4fcdd15abde068607cad150dfe17f7d32290ee128e5fea98442bd09"}, 899 | {file = "pandas-1.1.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:0246c67cbaaaac8d25fed8d4cf2d8897bd858f0e540e8528a75281cee9ac516d"}, 900 | {file = "pandas-1.1.1-cp38-cp38-win32.whl", hash = "sha256:01b1e536eb960822c5e6b58357cad8c4b492a336f4a5630bf0b598566462a578"}, 901 | {file = "pandas-1.1.1-cp38-cp38-win_amd64.whl", hash = "sha256:57c5f6be49259cde8e6f71c2bf240a26b071569cabc04c751358495d09419e56"}, 902 | {file = "pandas-1.1.1.tar.gz", hash = "sha256:53328284a7bb046e2e885fd1b8c078bd896d7fc4575b915d4936f54984a2ba67"}, 903 | ] 904 | prettyprinter = [ 905 | {file = "prettyprinter-0.18.0-py2.py3-none-any.whl", hash = "sha256:358a58f276cb312e3ca29d7a7f244c91e4e0bda7848249d30e4f36d2eb58b67c"}, 906 | {file = "prettyprinter-0.18.0.tar.gz", hash = "sha256:9fe5da7ec53510881dd35d7a5c677ba45f34cfe6a8e78d1abd20652cf82139a8"}, 907 | ] 908 | protobuf = [ 909 | {file = "protobuf-3.13.0-cp27-cp27m-macosx_10_9_x86_64.whl", hash = "sha256:9c2e63c1743cba12737169c447374fab3dfeb18111a460a8c1a000e35836b18c"}, 910 | {file = "protobuf-3.13.0-cp27-cp27mu-manylinux1_x86_64.whl", hash = "sha256:1e834076dfef9e585815757a2c7e4560c7ccc5962b9d09f831214c693a91b463"}, 911 | {file = "protobuf-3.13.0-cp35-cp35m-macosx_10_9_intel.whl", hash = "sha256:df3932e1834a64b46ebc262e951cd82c3cf0fa936a154f0a42231140d8237060"}, 912 | {file = "protobuf-3.13.0-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:8c35bcbed1c0d29b127c886790e9d37e845ffc2725cc1db4bd06d70f4e8359f4"}, 913 | {file = "protobuf-3.13.0-cp35-cp35m-win32.whl", hash = "sha256:339c3a003e3c797bc84499fa32e0aac83c768e67b3de4a5d7a5a9aa3b0da634c"}, 914 | {file = "protobuf-3.13.0-cp35-cp35m-win_amd64.whl", hash = "sha256:361acd76f0ad38c6e38f14d08775514fbd241316cce08deb2ce914c7dfa1184a"}, 915 | {file = "protobuf-3.13.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:9edfdc679a3669988ec55a989ff62449f670dfa7018df6ad7f04e8dbacb10630"}, 916 | {file = "protobuf-3.13.0-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:5db9d3e12b6ede5e601b8d8684a7f9d90581882925c96acf8495957b4f1b204b"}, 917 | {file = "protobuf-3.13.0-cp36-cp36m-win32.whl", hash = "sha256:c8abd7605185836f6f11f97b21200f8a864f9cb078a193fe3c9e235711d3ff1e"}, 918 | {file = "protobuf-3.13.0-cp36-cp36m-win_amd64.whl", hash = "sha256:4d1174c9ed303070ad59553f435846a2f877598f59f9afc1b89757bdf846f2a7"}, 919 | {file = "protobuf-3.13.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:0bba42f439bf45c0f600c3c5993666fcb88e8441d011fad80a11df6f324eef33"}, 920 | {file = "protobuf-3.13.0-cp37-cp37m-manylinux1_x86_64.whl", hash = 
"sha256:c0c5ab9c4b1eac0a9b838f1e46038c3175a95b0f2d944385884af72876bd6bc7"}, 921 | {file = "protobuf-3.13.0-cp37-cp37m-win32.whl", hash = "sha256:f68eb9d03c7d84bd01c790948320b768de8559761897763731294e3bc316decb"}, 922 | {file = "protobuf-3.13.0-cp37-cp37m-win_amd64.whl", hash = "sha256:91c2d897da84c62816e2f473ece60ebfeab024a16c1751aaf31100127ccd93ec"}, 923 | {file = "protobuf-3.13.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:3dee442884a18c16d023e52e32dd34a8930a889e511af493f6dc7d4d9bf12e4f"}, 924 | {file = "protobuf-3.13.0-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:e7662437ca1e0c51b93cadb988f9b353fa6b8013c0385d63a70c8a77d84da5f9"}, 925 | {file = "protobuf-3.13.0-py2.py3-none-any.whl", hash = "sha256:d69697acac76d9f250ab745b46c725edf3e98ac24763990b24d58c16c642947a"}, 926 | {file = "protobuf-3.13.0.tar.gz", hash = "sha256:6a82e0c8bb2bf58f606040cc5814e07715b2094caeba281e2e7d0b0e2e397db5"}, 927 | ] 928 | py-spy = [ 929 | {file = "py_spy-0.3.3-py2.py3-none-macosx_10_7_x86_64.whl", hash = "sha256:ac0ef13fc2bd67593be1d3fcd1bbee93a6324715b3c2944218e50eadb966c46e"}, 930 | {file = "py_spy-0.3.3-py2.py3-none-manylinux1_i686.whl", hash = "sha256:72eb5c0495b050e6e9424ea373ff7245a01554e98f218d89f8f979c0cd762681"}, 931 | {file = "py_spy-0.3.3-py2.py3-none-manylinux1_x86_64.whl", hash = "sha256:e9d6946741c267fe82aef18d2fc1e095a90a83fb5f3d9fc89b0f20a39613a639"}, 932 | {file = "py_spy-0.3.3-py2.py3-none-win_amd64.whl", hash = "sha256:a165d444cfbf24cdcdfe8cdaa858a179e1fae43adcb912e5efb3151362f67aa8"}, 933 | ] 934 | pygments = [ 935 | {file = "Pygments-2.6.1-py3-none-any.whl", hash = "sha256:ff7a40b4860b727ab48fad6360eb351cc1b33cbf9b15a0f689ca5353e9463324"}, 936 | {file = "Pygments-2.6.1.tar.gz", hash = "sha256:647344a061c249a3b74e230c739f434d7ea4d8b1d5f3721bc0f3558049b38f44"}, 937 | ] 938 | pyrsistent = [ 939 | {file = "pyrsistent-0.16.0.tar.gz", hash = "sha256:28669905fe725965daa16184933676547c5bb40a5153055a8dee2a4bd7933ad3"}, 940 | ] 941 | python-dateutil = [ 942 | {file = "python-dateutil-2.8.1.tar.gz", hash = "sha256:73ebfe9dbf22e832286dafa60473e4cd239f8592f699aa5adaf10050e6e1823c"}, 943 | {file = "python_dateutil-2.8.1-py2.py3-none-any.whl", hash = "sha256:75bb3f31ea686f1197762692a9ee6a7550b59fc6ca3a1f4b5d7e32fb98e2da2a"}, 944 | ] 945 | pytz = [ 946 | {file = "pytz-2020.1-py2.py3-none-any.whl", hash = "sha256:a494d53b6d39c3c6e44c3bec237336e14305e4f29bbf800b599253057fbb79ed"}, 947 | {file = "pytz-2020.1.tar.gz", hash = "sha256:c35965d010ce31b23eeb663ed3cc8c906275d6be1a34393a1d73a41febf4a048"}, 948 | ] 949 | pyyaml = [ 950 | {file = "PyYAML-5.3.1-cp27-cp27m-win32.whl", hash = "sha256:74809a57b329d6cc0fdccee6318f44b9b8649961fa73144a98735b0aaf029f1f"}, 951 | {file = "PyYAML-5.3.1-cp27-cp27m-win_amd64.whl", hash = "sha256:240097ff019d7c70a4922b6869d8a86407758333f02203e0fc6ff79c5dcede76"}, 952 | {file = "PyYAML-5.3.1-cp35-cp35m-win32.whl", hash = "sha256:4f4b913ca1a7319b33cfb1369e91e50354d6f07a135f3b901aca02aa95940bd2"}, 953 | {file = "PyYAML-5.3.1-cp35-cp35m-win_amd64.whl", hash = "sha256:cc8955cfbfc7a115fa81d85284ee61147059a753344bc51098f3ccd69b0d7e0c"}, 954 | {file = "PyYAML-5.3.1-cp36-cp36m-win32.whl", hash = "sha256:7739fc0fa8205b3ee8808aea45e968bc90082c10aef6ea95e855e10abf4a37b2"}, 955 | {file = "PyYAML-5.3.1-cp36-cp36m-win_amd64.whl", hash = "sha256:69f00dca373f240f842b2931fb2c7e14ddbacd1397d57157a9b005a6a9942648"}, 956 | {file = "PyYAML-5.3.1-cp37-cp37m-win32.whl", hash = "sha256:d13155f591e6fcc1ec3b30685d50bf0711574e2c0dfffd7644babf8b5102ca1a"}, 957 | {file = 
"PyYAML-5.3.1-cp37-cp37m-win_amd64.whl", hash = "sha256:73f099454b799e05e5ab51423c7bcf361c58d3206fa7b0d555426b1f4d9a3eaf"}, 958 | {file = "PyYAML-5.3.1-cp38-cp38-win32.whl", hash = "sha256:06a0d7ba600ce0b2d2fe2e78453a470b5a6e000a985dd4a4e54e436cc36b0e97"}, 959 | {file = "PyYAML-5.3.1-cp38-cp38-win_amd64.whl", hash = "sha256:95f71d2af0ff4227885f7a6605c37fd53d3a106fcab511b8860ecca9fcf400ee"}, 960 | {file = "PyYAML-5.3.1.tar.gz", hash = "sha256:b8eac752c5e14d3eca0e6dd9199cd627518cb5ec06add0de9d32baeee6fe645d"}, 961 | ] 962 | ray = [ 963 | {file = "ray-0.8.6-cp35-cp35m-macosx_10_13_intel.whl", hash = "sha256:28bddf09debbc82ff19e1523ada131e7beaf2170f2d88cea72601ff11ff71757"}, 964 | {file = "ray-0.8.6-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:3a282f770855a56d3ede321ccb6e4a4b48eccb1daead9aa08c20c457ec186d27"}, 965 | {file = "ray-0.8.6-cp36-cp36m-macosx_10_13_intel.whl", hash = "sha256:fdd4b994ffa894dfe582107b230d515c05ca41ecb2152f28e6c893d4edcc8369"}, 966 | {file = "ray-0.8.6-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:dfd01dec0eddd446c1a22f979bf7ced185149f92d761d742592b4fc887dc439c"}, 967 | {file = "ray-0.8.6-cp36-cp36m-win_amd64.whl", hash = "sha256:aaf43089881dc203c56c2bec499c9b425a989894bf2b39d767a5c0825a4a5af2"}, 968 | {file = "ray-0.8.6-cp37-cp37m-macosx_10_13_intel.whl", hash = "sha256:e79bb29c6d93bc24253a75196a86201471f8ca461102e957cee71ec999fb06cf"}, 969 | {file = "ray-0.8.6-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:c007b1e87ef6af7ac684ecb9c2be27bcc5cd881b89ebdc3ea2d99047ffc1eeee"}, 970 | {file = "ray-0.8.6-cp37-cp37m-win_amd64.whl", hash = "sha256:402ed1be4363cc4494b7022524158f862f3e052d249497ac1145b4452ff08ebe"}, 971 | {file = "ray-0.8.6-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:9124994117fe26d12c0873737b19fdf6d80b7d283d53d85d1662bb6c98a0b418"}, 972 | {file = "ray-0.8.6-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:efaf70097d6e61d0f3d05acb59b6f3a627a3d8a326f1c5d212c93a7231e41d67"}, 973 | {file = "ray-0.8.6-cp38-cp38-win_amd64.whl", hash = "sha256:dbf79b7c4d7834bc5c506c397b8f03ecfe9b03e3e4611fe37d725a6e6ccb5649"}, 974 | ] 975 | redis = [ 976 | {file = "redis-3.4.1-py2.py3-none-any.whl", hash = "sha256:b205cffd05ebfd0a468db74f0eedbff8df1a7bfc47521516ade4692991bb0833"}, 977 | {file = "redis-3.4.1.tar.gz", hash = "sha256:0dcfb335921b88a850d461dc255ff4708294943322bd55de6cfd68972490ca1f"}, 978 | ] 979 | requests = [ 980 | {file = "requests-2.24.0-py2.py3-none-any.whl", hash = "sha256:fe75cc94a9443b9246fc7049224f75604b113c36acb93f87b80ed42c44cbb898"}, 981 | {file = "requests-2.24.0.tar.gz", hash = "sha256:b3559a131db72c33ee969480840fff4bb6dd111de7dd27c8ee1f820f4f00231b"}, 982 | ] 983 | scikit-learn = [ 984 | {file = "scikit-learn-0.23.2.tar.gz", hash = "sha256:20766f515e6cd6f954554387dfae705d93c7b544ec0e6c6a5d8e006f6f7ef480"}, 985 | {file = "scikit_learn-0.23.2-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:98508723f44c61896a4e15894b2016762a55555fbf09365a0bb1870ecbd442de"}, 986 | {file = "scikit_learn-0.23.2-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:a64817b050efd50f9abcfd311870073e500ae11b299683a519fbb52d85e08d25"}, 987 | {file = "scikit_learn-0.23.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:daf276c465c38ef736a79bd79fc80a249f746bcbcae50c40945428f7ece074f8"}, 988 | {file = "scikit_learn-0.23.2-cp36-cp36m-win32.whl", hash = "sha256:cb3e76380312e1f86abd20340ab1d5b3cc46a26f6593d3c33c9ea3e4c7134028"}, 989 | {file = "scikit_learn-0.23.2-cp36-cp36m-win_amd64.whl", hash = "sha256:0a127cc70990d4c15b1019680bfedc7fec6c23d14d3719fdf9b64b22d37cdeca"}, 
990 | {file = "scikit_learn-0.23.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:2aa95c2f17d2f80534156215c87bee72b6aa314a7f8b8fe92a2d71f47280570d"}, 991 | {file = "scikit_learn-0.23.2-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:6c28a1d00aae7c3c9568f61aafeaad813f0f01c729bee4fd9479e2132b215c1d"}, 992 | {file = "scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:da8e7c302003dd765d92a5616678e591f347460ac7b53e53d667be7dfe6d1b10"}, 993 | {file = "scikit_learn-0.23.2-cp37-cp37m-win32.whl", hash = "sha256:d9a1ce5f099f29c7c33181cc4386660e0ba891b21a60dc036bf369e3a3ee3aec"}, 994 | {file = "scikit_learn-0.23.2-cp37-cp37m-win_amd64.whl", hash = "sha256:914ac2b45a058d3f1338d7736200f7f3b094857758895f8667be8a81ff443b5b"}, 995 | {file = "scikit_learn-0.23.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:7671bbeddd7f4f9a6968f3b5442dac5f22bf1ba06709ef888cc9132ad354a9ab"}, 996 | {file = "scikit_learn-0.23.2-cp38-cp38-manylinux1_i686.whl", hash = "sha256:d0dcaa54263307075cb93d0bee3ceb02821093b1b3d25f66021987d305d01dce"}, 997 | {file = "scikit_learn-0.23.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:5ce7a8021c9defc2b75620571b350acc4a7d9763c25b7593621ef50f3bd019a2"}, 998 | {file = "scikit_learn-0.23.2-cp38-cp38-win32.whl", hash = "sha256:0d39748e7c9669ba648acf40fb3ce96b8a07b240db6888563a7cb76e05e0d9cc"}, 999 | {file = "scikit_learn-0.23.2-cp38-cp38-win_amd64.whl", hash = "sha256:1b8a391de95f6285a2f9adffb7db0892718950954b7149a70c783dc848f104ea"}, 1000 | ] 1001 | scipy = [ 1002 | {file = "scipy-1.5.2-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:cca9fce15109a36a0a9f9cfc64f870f1c140cb235ddf27fe0328e6afb44dfed0"}, 1003 | {file = "scipy-1.5.2-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:1c7564a4810c1cd77fcdee7fa726d7d39d4e2695ad252d7c86c3ea9d85b7fb8f"}, 1004 | {file = "scipy-1.5.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:07e52b316b40a4f001667d1ad4eb5f2318738de34597bd91537851365b6c61f1"}, 1005 | {file = "scipy-1.5.2-cp36-cp36m-win32.whl", hash = "sha256:d56b10d8ed72ec1be76bf10508446df60954f08a41c2d40778bc29a3a9ad9bce"}, 1006 | {file = "scipy-1.5.2-cp36-cp36m-win_amd64.whl", hash = "sha256:8e28e74b97fc8d6aa0454989db3b5d36fc27e69cef39a7ee5eaf8174ca1123cb"}, 1007 | {file = "scipy-1.5.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:6e86c873fe1335d88b7a4bfa09d021f27a9e753758fd75f3f92d714aa4093768"}, 1008 | {file = "scipy-1.5.2-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:a0afbb967fd2c98efad5f4c24439a640d39463282040a88e8e928db647d8ac3d"}, 1009 | {file = "scipy-1.5.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:eecf40fa87eeda53e8e11d265ff2254729d04000cd40bae648e76ff268885d66"}, 1010 | {file = "scipy-1.5.2-cp37-cp37m-win32.whl", hash = "sha256:315aa2165aca31375f4e26c230188db192ed901761390be908c9b21d8b07df62"}, 1011 | {file = "scipy-1.5.2-cp37-cp37m-win_amd64.whl", hash = "sha256:ec5fe57e46828d034775b00cd625c4a7b5c7d2e354c3b258d820c6c72212a6ec"}, 1012 | {file = "scipy-1.5.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:fc98f3eac993b9bfdd392e675dfe19850cc8c7246a8fd2b42443e506344be7d9"}, 1013 | {file = "scipy-1.5.2-cp38-cp38-manylinux1_i686.whl", hash = "sha256:a785409c0fa51764766840185a34f96a0a93527a0ff0230484d33a8ed085c8f8"}, 1014 | {file = "scipy-1.5.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:0a0e9a4e58a4734c2eba917f834b25b7e3b6dc333901ce7784fd31aefbd37b2f"}, 1015 | {file = "scipy-1.5.2-cp38-cp38-win32.whl", hash = "sha256:dac09281a0eacd59974e24525a3bc90fa39b4e95177e638a31b14db60d3fa806"}, 1016 | {file = "scipy-1.5.2-cp38-cp38-win_amd64.whl", hash = 
"sha256:92eb04041d371fea828858e4fff182453c25ae3eaa8782d9b6c32b25857d23bc"}, 1017 | {file = "scipy-1.5.2.tar.gz", hash = "sha256:066c513d90eb3fd7567a9e150828d39111ebd88d3e924cdfc9f8ce19ab6f90c9"}, 1018 | ] 1019 | shap = [ 1020 | {file = "shap-0.35.0-cp35-cp35m-win_amd64.whl", hash = "sha256:3f8d5e1bfc2f0e8b442370e4b8253430f7078fab21b7769e89969b1a0194c3f9"}, 1021 | {file = "shap-0.35.0-cp36-cp36m-win_amd64.whl", hash = "sha256:ef9410e940396cb451039f7a1d639086e4e4e7c742faeb3fd8734e7b71cdf3d2"}, 1022 | {file = "shap-0.35.0.tar.gz", hash = "sha256:6b9a2a3636918b9cdce4d3c599786b38353fbdca49147b5407a75aee398b1018"}, 1023 | ] 1024 | six = [ 1025 | {file = "six-1.15.0-py2.py3-none-any.whl", hash = "sha256:8b74bedcbbbaca38ff6d7491d76f2b06b3592611af620f8426e82dddb04a5ced"}, 1026 | {file = "six-1.15.0.tar.gz", hash = "sha256:30639c035cdb23534cd4aa2dd52c3bf48f06e5f4a941509c8bafd8ce11080259"}, 1027 | ] 1028 | soupsieve = [ 1029 | {file = "soupsieve-1.9.6-py2.py3-none-any.whl", hash = "sha256:feb1e937fa26a69e08436aad4a9037cd7e1d4c7212909502ba30701247ff8abd"}, 1030 | {file = "soupsieve-1.9.6.tar.gz", hash = "sha256:7985bacc98c34923a439967c1a602dc4f1e15f923b6fcf02344184f86cc7efaa"}, 1031 | ] 1032 | threadpoolctl = [ 1033 | {file = "threadpoolctl-2.1.0-py3-none-any.whl", hash = "sha256:38b74ca20ff3bb42caca8b00055111d74159ee95c4370882bbff2b93d24da725"}, 1034 | {file = "threadpoolctl-2.1.0.tar.gz", hash = "sha256:ddc57c96a38beb63db45d6c159b5ab07b6bced12c45a1f07b2b92f272aebfa6b"}, 1035 | ] 1036 | tqdm = [ 1037 | {file = "tqdm-4.48.2-py2.py3-none-any.whl", hash = "sha256:1a336d2b829be50e46b84668691e0a2719f26c97c62846298dd5ae2937e4d5cf"}, 1038 | {file = "tqdm-4.48.2.tar.gz", hash = "sha256:564d632ea2b9cb52979f7956e093e831c28d441c11751682f84c86fc46e4fd21"}, 1039 | ] 1040 | typing-extensions = [ 1041 | {file = "typing_extensions-3.7.4.3-py2-none-any.whl", hash = "sha256:dafc7639cde7f1b6e1acc0f457842a83e722ccca8eef5270af2d74792619a89f"}, 1042 | {file = "typing_extensions-3.7.4.3-py3-none-any.whl", hash = "sha256:7cb407020f00f7bfc3cb3e7881628838e69d8f3fcab2f64742a5e76b2f841918"}, 1043 | {file = "typing_extensions-3.7.4.3.tar.gz", hash = "sha256:99d4073b617d30288f569d3f13d2bd7548c3a7e4c8de87db09a9d29bb3a4a60c"}, 1044 | ] 1045 | urllib3 = [ 1046 | {file = "urllib3-1.25.10-py2.py3-none-any.whl", hash = "sha256:e7983572181f5e1522d9c98453462384ee92a0be7fac5f1413a1e35c56cc0461"}, 1047 | {file = "urllib3-1.25.10.tar.gz", hash = "sha256:91056c15fa70756691db97756772bb1eb9678fa585d9184f24534b100dc60f4a"}, 1048 | ] 1049 | uvicorn = [ 1050 | {file = "uvicorn-0.11.8-py3-none-any.whl", hash = "sha256:4b70ddb4c1946e39db9f3082d53e323dfd50634b95fd83625d778729ef1730ef"}, 1051 | {file = "uvicorn-0.11.8.tar.gz", hash = "sha256:46a83e371f37ea7ff29577d00015f02c942410288fb57def6440f2653fff1d26"}, 1052 | ] 1053 | uvloop = [ 1054 | {file = "uvloop-0.14.0-cp35-cp35m-macosx_10_11_x86_64.whl", hash = "sha256:08b109f0213af392150e2fe6f81d33261bb5ce968a288eb698aad4f46eb711bd"}, 1055 | {file = "uvloop-0.14.0-cp35-cp35m-manylinux2010_x86_64.whl", hash = "sha256:4544dcf77d74f3a84f03dd6278174575c44c67d7165d4c42c71db3fdc3860726"}, 1056 | {file = "uvloop-0.14.0-cp36-cp36m-macosx_10_11_x86_64.whl", hash = "sha256:b4f591aa4b3fa7f32fb51e2ee9fea1b495eb75b0b3c8d0ca52514ad675ae63f7"}, 1057 | {file = "uvloop-0.14.0-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:f07909cd9fc08c52d294b1570bba92186181ca01fe3dc9ffba68955273dd7362"}, 1058 | {file = "uvloop-0.14.0-cp37-cp37m-macosx_10_11_x86_64.whl", hash = 
"sha256:afd5513c0ae414ec71d24f6f123614a80f3d27ca655a4fcf6cabe50994cc1891"}, 1059 | {file = "uvloop-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:e7514d7a48c063226b7d06617cbb12a14278d4323a065a8d46a7962686ce2e95"}, 1060 | {file = "uvloop-0.14.0-cp38-cp38-macosx_10_11_x86_64.whl", hash = "sha256:bcac356d62edd330080aed082e78d4b580ff260a677508718f88016333e2c9c5"}, 1061 | {file = "uvloop-0.14.0-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:4315d2ec3ca393dd5bc0b0089d23101276778c304d42faff5dc4579cb6caef09"}, 1062 | {file = "uvloop-0.14.0.tar.gz", hash = "sha256:123ac9c0c7dd71464f58f1b4ee0bbd81285d96cdda8bc3519281b8973e3a461e"}, 1063 | ] 1064 | websockets = [ 1065 | {file = "websockets-8.1-cp36-cp36m-macosx_10_6_intel.whl", hash = "sha256:3762791ab8b38948f0c4d281c8b2ddfa99b7e510e46bd8dfa942a5fff621068c"}, 1066 | {file = "websockets-8.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:3db87421956f1b0779a7564915875ba774295cc86e81bc671631379371af1170"}, 1067 | {file = "websockets-8.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:4f9f7d28ce1d8f1295717c2c25b732c2bc0645db3215cf757551c392177d7cb8"}, 1068 | {file = "websockets-8.1-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:295359a2cc78736737dd88c343cd0747546b2174b5e1adc223824bcaf3e164cb"}, 1069 | {file = "websockets-8.1-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:1d3f1bf059d04a4e0eb4985a887d49195e15ebabc42364f4eb564b1d065793f5"}, 1070 | {file = "websockets-8.1-cp36-cp36m-win32.whl", hash = "sha256:2db62a9142e88535038a6bcfea70ef9447696ea77891aebb730a333a51ed559a"}, 1071 | {file = "websockets-8.1-cp36-cp36m-win_amd64.whl", hash = "sha256:0e4fb4de42701340bd2353bb2eee45314651caa6ccee80dbd5f5d5978888fed5"}, 1072 | {file = "websockets-8.1-cp37-cp37m-macosx_10_6_intel.whl", hash = "sha256:9b248ba3dd8a03b1a10b19efe7d4f7fa41d158fdaa95e2cf65af5a7b95a4f989"}, 1073 | {file = "websockets-8.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:ce85b06a10fc65e6143518b96d3dca27b081a740bae261c2fb20375801a9d56d"}, 1074 | {file = "websockets-8.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:965889d9f0e2a75edd81a07592d0ced54daa5b0785f57dc429c378edbcffe779"}, 1075 | {file = "websockets-8.1-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:751a556205d8245ff94aeef23546a1113b1dd4f6e4d102ded66c39b99c2ce6c8"}, 1076 | {file = "websockets-8.1-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:3ef56fcc7b1ff90de46ccd5a687bbd13a3180132268c4254fc0fa44ecf4fc422"}, 1077 | {file = "websockets-8.1-cp37-cp37m-win32.whl", hash = "sha256:7ff46d441db78241f4c6c27b3868c9ae71473fe03341340d2dfdbe8d79310acc"}, 1078 | {file = "websockets-8.1-cp37-cp37m-win_amd64.whl", hash = "sha256:20891f0dddade307ffddf593c733a3fdb6b83e6f9eef85908113e628fa5a8308"}, 1079 | {file = "websockets-8.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:c1ec8db4fac31850286b7cd3b9c0e1b944204668b8eb721674916d4e28744092"}, 1080 | {file = "websockets-8.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:5c01fd846263a75bc8a2b9542606927cfad57e7282965d96b93c387622487485"}, 1081 | {file = "websockets-8.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:9bef37ee224e104a413f0780e29adb3e514a5b698aabe0d969a6ba426b8435d1"}, 1082 | {file = "websockets-8.1-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:d705f8aeecdf3262379644e4b55107a3b55860eb812b673b28d0fbc347a60c55"}, 1083 | {file = "websockets-8.1-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:c8a116feafdb1f84607cb3b14aa1418424ae71fee131642fc568d21423b51824"}, 1084 | {file = "websockets-8.1-cp38-cp38-win32.whl", hash = 
"sha256:e898a0863421650f0bebac8ba40840fc02258ef4714cb7e1fd76b6a6354bda36"}, 1085 | {file = "websockets-8.1-cp38-cp38-win_amd64.whl", hash = "sha256:f8a7bff6e8664afc4e6c28b983845c5bc14965030e3fb98789734d416af77c4b"}, 1086 | {file = "websockets-8.1.tar.gz", hash = "sha256:5c65d2da8c6bce0fca2528f69f44b2f977e06954c8512a952222cea50dad430f"}, 1087 | ] 1088 | werkzeug = [ 1089 | {file = "Werkzeug-1.0.1-py2.py3-none-any.whl", hash = "sha256:2de2a5db0baeae7b2d2664949077c2ac63fbd16d98da0ff71837f7d1dea3fd43"}, 1090 | {file = "Werkzeug-1.0.1.tar.gz", hash = "sha256:6c80b1e5ad3665290ea39320b91e1be1e0d5f60652b964a3070216de83d2e47c"}, 1091 | ] 1092 | yarl = [ 1093 | {file = "yarl-1.5.1-cp35-cp35m-macosx_10_14_x86_64.whl", hash = "sha256:db6db0f45d2c63ddb1a9d18d1b9b22f308e52c83638c26b422d520a815c4b3fb"}, 1094 | {file = "yarl-1.5.1-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:17668ec6722b1b7a3a05cc0167659f6c95b436d25a36c2d52db0eca7d3f72593"}, 1095 | {file = "yarl-1.5.1-cp35-cp35m-win32.whl", hash = "sha256:040b237f58ff7d800e6e0fd89c8439b841f777dd99b4a9cca04d6935564b9409"}, 1096 | {file = "yarl-1.5.1-cp35-cp35m-win_amd64.whl", hash = "sha256:f18d68f2be6bf0e89f1521af2b1bb46e66ab0018faafa81d70f358153170a317"}, 1097 | {file = "yarl-1.5.1-cp36-cp36m-macosx_10_14_x86_64.whl", hash = "sha256:c52ce2883dc193824989a9b97a76ca86ecd1fa7955b14f87bf367a61b6232511"}, 1098 | {file = "yarl-1.5.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:ce584af5de8830d8701b8979b18fcf450cef9a382b1a3c8ef189bedc408faf1e"}, 1099 | {file = "yarl-1.5.1-cp36-cp36m-win32.whl", hash = "sha256:df89642981b94e7db5596818499c4b2219028f2a528c9c37cc1de45bf2fd3a3f"}, 1100 | {file = "yarl-1.5.1-cp36-cp36m-win_amd64.whl", hash = "sha256:3a584b28086bc93c888a6c2aa5c92ed1ae20932f078c46509a66dce9ea5533f2"}, 1101 | {file = "yarl-1.5.1-cp37-cp37m-macosx_10_14_x86_64.whl", hash = "sha256:da456eeec17fa8aa4594d9a9f27c0b1060b6a75f2419fe0c00609587b2695f4a"}, 1102 | {file = "yarl-1.5.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:bc2f976c0e918659f723401c4f834deb8a8e7798a71be4382e024bcc3f7e23a8"}, 1103 | {file = "yarl-1.5.1-cp37-cp37m-win32.whl", hash = "sha256:4439be27e4eee76c7632c2427ca5e73703151b22cae23e64adb243a9c2f565d8"}, 1104 | {file = "yarl-1.5.1-cp37-cp37m-win_amd64.whl", hash = "sha256:48e918b05850fffb070a496d2b5f97fc31d15d94ca33d3d08a4f86e26d4e7c5d"}, 1105 | {file = "yarl-1.5.1-cp38-cp38-macosx_10_14_x86_64.whl", hash = "sha256:9b930776c0ae0c691776f4d2891ebc5362af86f152dd0da463a6614074cb1b02"}, 1106 | {file = "yarl-1.5.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:b3b9ad80f8b68519cc3372a6ca85ae02cc5a8807723ac366b53c0f089db19e4a"}, 1107 | {file = "yarl-1.5.1-cp38-cp38-win32.whl", hash = "sha256:f379b7f83f23fe12823085cd6b906edc49df969eb99757f58ff382349a3303c6"}, 1108 | {file = "yarl-1.5.1-cp38-cp38-win_amd64.whl", hash = "sha256:9102b59e8337f9874638fcfc9ac3734a0cfadb100e47d55c20d0dc6087fb4692"}, 1109 | {file = "yarl-1.5.1.tar.gz", hash = "sha256:c22c75b5f394f3d47105045ea551e08a3e804dc7e01b37800ca35b58f856c3d6"}, 1110 | ] 1111 | zipp = [ 1112 | {file = "zipp-3.1.0-py3-none-any.whl", hash = "sha256:aa36550ff0c0b7ef7fa639055d797116ee891440eac1a56f378e2d3179e0320b"}, 1113 | {file = "zipp-3.1.0.tar.gz", hash = "sha256:c599e4d75c98f6798c509911d08a22e6c021d074469042177c8c86fb92eefd96"}, 1114 | ] 1115 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "explainers" 3 | version = "0.1.0" 4 | 
description = "A packaged that distributes KernelSHAP using ray" 5 | authors = ["alexcoca "] 6 | 7 | [tool.poetry.dependencies] 8 | python = "^3.7" 9 | attrs = ">=19.1.0" 10 | numpy = ">=1.17.4" 11 | pandas = ">=0.23.4" 12 | prettyprinter = ">=0.18.0" 13 | ray = {version = "0.8.6", extras = ["serve"]} 14 | scipy = ">=1.3.1" 15 | scikit-learn = ">=0.21.2" 16 | shap = ">=0.35.0" 17 | requests = "^2.24.0" 18 | 19 | [tool.poetry.dev-dependencies] 20 | 21 | [build-system] 22 | requires = ["poetry>=0.12"] 23 | build-backend = "poetry.masonry.api" 24 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | attrs>=19.1.0 2 | numpy>=1.17.4 3 | pandas>=0.23.4 4 | prettyprinter>=0.18.0 5 | ray[serve]==0.8.6 6 | scipy>=1.3.1 7 | scikit-learn>=0.21.2 8 | shap>=0.35.0 9 | requests -------------------------------------------------------------------------------- /requirements_advanced.txt: -------------------------------------------------------------------------------- 1 | kubernetes -------------------------------------------------------------------------------- /scripts/fit_adult_model.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | 4 | import pickle 5 | from sklearn.linear_model import LogisticRegression 6 | from sklearn.metrics import accuracy_score 7 | from typing import Dict, Any 8 | from explainers.utils import load_data 9 | 10 | """ 11 | This script pulls the Adult data from the ``data/`` directory and fits a logistic regression model to it. Model is 12 | saved under ``assets/predictor.pkl``. 13 | """ 14 | 15 | 16 | def fit_adult_logistic_regression(data_dict: Dict[str, Any]): 17 | """ 18 | Fit a logistic regression model to the processed Adult dataset. 
19 | """ 20 | 21 | logging.info("Fitting model ...") 22 | X_train_proc = data_dict['X']['processed']['train'] 23 | X_test_proc = data_dict['X']['processed']['test'] 24 | y_train = data_dict['y']['train'] 25 | y_test = data_dict['y']['test'] 26 | 27 | classifier = LogisticRegression(multi_class='multinomial', 28 | random_state=0, 29 | max_iter=500, 30 | verbose=0, 31 | ) 32 | classifier.fit(X_train_proc, y_train) 33 | 34 | logging.info(f"Test accuracy: {accuracy_score(y_test, classifier.predict(X_test_proc))}") 35 | 36 | return classifier 37 | 38 | 39 | def main(): 40 | 41 | if not os.path.exists('assets'): 42 | os.mkdir('assets') 43 | 44 | data = load_data() 45 | lr_predictor = fit_adult_logistic_regression(data['all']) 46 | with open("assets/predictor.pkl", "wb") as f: 47 | pickle.dump(lr_predictor, f) 48 | 49 | 50 | if __name__ == '__main__': 51 | main() 52 | -------------------------------------------------------------------------------- /scripts/process_adult_data.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import pickle 3 | import logging 4 | import os 5 | import sys 6 | import requests 7 | 8 | import numpy as np 9 | import pandas as pd 10 | 11 | from io import StringIO 12 | from requests import RequestException 13 | from sklearn.compose import ColumnTransformer 14 | from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder 15 | from typing import Any, Dict, List, Tuple, Union 16 | from explainers.utils import Bunch 17 | 18 | logger = logging.getLogger(__name__) 19 | 20 | ADULT_URLS = [ 21 | 'https://storage.googleapis.com/seldon-datasets/adult/adult.data', 22 | 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', 23 | 'http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.data', 24 | ] # type: List[str] 25 | 26 | 27 | sys.path.append('../') 28 | 29 | 30 | def fetch_adult(features_drop: list = None, return_X_y: bool = False, url_id: int = 0) -> \ 31 | Union[Bunch, Tuple[np.ndarray, np.ndarray]]: 32 | """ 33 | Downloads and pre-processes 'adult' dataset. 34 | More info: http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/ 35 | 36 | Parameters 37 | ---------- 38 | features_drop 39 | List of features to be dropped from dataset, by default drops ["fnlwgt", "Education-Num"] 40 | return_X_y 41 | If true, return features X and labels y as numpy arrays, if False return a Bunch object 42 | url_id 43 | Index specifying which URL to use for downloading 44 | 45 | Returns 46 | ------- 47 | Bunch 48 | Dataset, labels, a list of features and a dictionary containing a list with the potential categories 49 | for each categorical feature where the key refers to the feature column. 
50 | (data, target) 51 | Tuple if ``return_X_y`` is true 52 | """ 53 | if features_drop is None: 54 | features_drop = ["fnlwgt", "Education-Num"] 55 | 56 | # download data 57 | dataset_url = ADULT_URLS[url_id] 58 | raw_features = ['Age', 'Workclass', 'fnlwgt', 'Education', 'Education-Num', 'Marital Status', 59 | 'Occupation', 'Relationship', 'Race', 'Sex', 'Capital Gain', 'Capital Loss', 60 | 'Hours per week', 'Country', 'Target'] 61 | try: 62 | resp = requests.get(dataset_url) 63 | resp.raise_for_status() 64 | except RequestException: 65 | logger.exception("Could not connect, URL may be out of service") 66 | raise 67 | 68 | raw_data = pd.read_csv(StringIO(resp.text), names=raw_features, delimiter=', ', engine='python').fillna('?') 69 | 70 | # get labels, features and drop unnecessary features 71 | labels = (raw_data['Target'] == '>50K').astype(int).values 72 | features_drop += ['Target'] 73 | data = raw_data.drop(features_drop, axis=1) 74 | features = list(data.columns) 75 | 76 | # map categorical features 77 | education_map = { 78 | '10th': 'Dropout', '11th': 'Dropout', '12th': 'Dropout', '1st-4th': 79 | 'Dropout', '5th-6th': 'Dropout', '7th-8th': 'Dropout', '9th': 80 | 'Dropout', 'Preschool': 'Dropout', 'HS-grad': 'High School grad', 81 | 'Some-college': 'High School grad', 'Masters': 'Masters', 82 | 'Prof-school': 'Prof-School', 'Assoc-acdm': 'Associates', 83 | 'Assoc-voc': 'Associates' 84 | } 85 | occupation_map = { 86 | "Adm-clerical": "Admin", "Armed-Forces": "Military", 87 | "Craft-repair": "Blue-Collar", "Exec-managerial": "White-Collar", 88 | "Farming-fishing": "Blue-Collar", "Handlers-cleaners": 89 | "Blue-Collar", "Machine-op-inspct": "Blue-Collar", "Other-service": 90 | "Service", "Priv-house-serv": "Service", "Prof-specialty": 91 | "Professional", "Protective-serv": "Other", "Sales": 92 | "Sales", "Tech-support": "Other", "Transport-moving": 93 | "Blue-Collar" 94 | } 95 | country_map = { 96 | 'Cambodia': 'SE-Asia', 'Canada': 'British-Commonwealth', 'China': 97 | 'China', 'Columbia': 'South-America', 'Cuba': 'Other', 98 | 'Dominican-Republic': 'Latin-America', 'Ecuador': 'South-America', 99 | 'El-Salvador': 'South-America', 'England': 'British-Commonwealth', 100 | 'France': 'Euro_1', 'Germany': 'Euro_1', 'Greece': 'Euro_2', 101 | 'Guatemala': 'Latin-America', 'Haiti': 'Latin-America', 102 | 'Holand-Netherlands': 'Euro_1', 'Honduras': 'Latin-America', 103 | 'Hong': 'China', 'Hungary': 'Euro_2', 'India': 104 | 'British-Commonwealth', 'Iran': 'Other', 'Ireland': 105 | 'British-Commonwealth', 'Italy': 'Euro_1', 'Jamaica': 106 | 'Latin-America', 'Japan': 'Other', 'Laos': 'SE-Asia', 'Mexico': 107 | 'Latin-America', 'Nicaragua': 'Latin-America', 108 | 'Outlying-US(Guam-USVI-etc)': 'Latin-America', 'Peru': 109 | 'South-America', 'Philippines': 'SE-Asia', 'Poland': 'Euro_2', 110 | 'Portugal': 'Euro_2', 'Puerto-Rico': 'Latin-America', 'Scotland': 111 | 'British-Commonwealth', 'South': 'Euro_2', 'Taiwan': 'China', 112 | 'Thailand': 'SE-Asia', 'Trinadad&Tobago': 'Latin-America', 113 | 'United-States': 'United-States', 'Vietnam': 'SE-Asia' 114 | } 115 | married_map = { 116 | 'Never-married': 'Never-Married', 'Married-AF-spouse': 'Married', 117 | 'Married-civ-spouse': 'Married', 'Married-spouse-absent': 118 | 'Separated', 'Separated': 'Separated', 'Divorced': 119 | 'Separated', 'Widowed': 'Widowed' 120 | } 121 | mapping = {'Education': education_map, 'Occupation': occupation_map, 'Country': country_map, 122 | 'Marital Status': married_map} 123 | 124 | data_copy = data.copy() 125 | for f, 
f_map in mapping.items(): 126 | data_tmp = data_copy[f].values 127 | for key, value in f_map.items(): 128 | data_tmp[data_tmp == key] = value 129 | data[f] = data_tmp 130 | 131 | # get categorical features and apply labelencoding 132 | categorical_features = [f for f in features if data[f].dtype == 'O'] 133 | category_map = {} 134 | for f in categorical_features: 135 | le = LabelEncoder() 136 | data_tmp = le.fit_transform(data[f].values) 137 | data[f] = data_tmp 138 | category_map[features.index(f)] = list(le.classes_) 139 | 140 | # only return data values 141 | data = data.values 142 | target_names = ['<=50K', '>50K'] 143 | 144 | if return_X_y: 145 | return data, labels 146 | 147 | return Bunch(data=data, target=labels, feature_names=features, target_names=target_names, category_map=category_map) 148 | 149 | 150 | def load_adult_dataset(): 151 | """ 152 | Load the Adult dataset. 153 | """ 154 | 155 | logging.info("Preprocessing data...") 156 | return fetch_adult() 157 | 158 | 159 | def preprocess_adult_dataset(dataset, seed=0, n_train_examples=30000) -> Dict[str, Any]: 160 | """ 161 | Splits dataset into train and test subsets and preprocesses it. 162 | """ 163 | 164 | logging.info("Splitting data...") 165 | 166 | np.random.seed(seed) 167 | data = dataset.data 168 | target = dataset.target 169 | data_perm = np.random.permutation(np.c_[data, target]) 170 | data = data_perm[:, :-1] 171 | target = data_perm[:, -1] 172 | 173 | X_train, y_train = data[:n_train_examples, :], target[:n_train_examples] 174 | X_test, y_test = data[n_train_examples + 1:, :], target[n_train_examples + 1:] 175 | 176 | logging.info("Transforming data...") 177 | category_map = dataset.category_map 178 | feature_names = dataset.feature_names 179 | 180 | ordinal_features = [x for x in range(len(feature_names)) if x not in list(category_map.keys())] 181 | ordinal_transformer = StandardScaler() 182 | 183 | categorical_features = list(category_map.keys()) 184 | categorical_transformer = OneHotEncoder(drop='first', handle_unknown='error') 185 | 186 | preprocessor = ColumnTransformer( 187 | transformers=[ 188 | ('num', ordinal_transformer, ordinal_features), 189 | ('cat', categorical_transformer, categorical_features) 190 | ] 191 | ) 192 | 193 | preprocessor.fit(X_train) 194 | X_train_proc = preprocessor.transform(X_train) 195 | X_test_proc = preprocessor.transform(X_test) 196 | 197 | # create groups for categorical variables 198 | numerical_feats_idx = preprocessor.transformers_[0][2] 199 | categorical_feats_idx = preprocessor.transformers_[1][2] 200 | ohe = preprocessor.transformers_[1][1] 201 | 202 | # compute encoded dimension; -1 as ohe is setup with drop='first' 203 | feat_enc_dim = [len(cat_enc) - 1 for cat_enc in ohe.categories_] 204 | num_feats_names = [feature_names[i] for i in numerical_feats_idx] 205 | cat_feats_names = [feature_names[i] for i in categorical_feats_idx] 206 | 207 | group_names = num_feats_names + cat_feats_names 208 | # each sublist contains the col. 
indices for each variable in group_names 209 | groups = [] 210 | cat_var_idx = 0 211 | 212 | for name in group_names: 213 | if name in num_feats_names: 214 | groups.append(list(range(len(groups), len(groups) + 1))) 215 | else: 216 | start_idx = groups[-1][-1] + 1 if groups else 0 217 | groups.append(list(range(start_idx, start_idx + feat_enc_dim[cat_var_idx]))) 218 | cat_var_idx += 1 219 | 220 | return { 221 | 'X': { 222 | 'raw': {'train': X_train, 'test': X_test}, 223 | 'processed': {'train': X_train_proc, 'test': X_test_proc}}, 224 | 'y': {'train': y_train, 'test': y_test}, 225 | 'preprocessor': preprocessor, 226 | 'orig_feature_names': feature_names, 227 | 'groups': groups, 228 | 'group_names': group_names, 229 | } 230 | 231 | 232 | def main(): 233 | 234 | if not os.path.exists('data'): 235 | os.mkdir('data') 236 | 237 | # load and preprocess data 238 | adult_dataset = load_adult_dataset() 239 | adult_preprocessed = preprocess_adult_dataset(adult_dataset, n_train_examples=args.n_train_examples) 240 | # select first args.n_background_samples in train set as background dataset 241 | background_dataset = {'X': {'raw': None, 'preprocessed': None}, 'y': None} 242 | n_examples = args.n_background_samples 243 | background_dataset['X']['raw'] = adult_preprocessed['X']['raw']['train'][0:n_examples, :] 244 | background_dataset['X']['preprocessed'] = adult_preprocessed['X']['processed']['train'][0:n_examples, :] 245 | background_dataset['y'] = adult_preprocessed['y']['train'][0:n_examples] 246 | with open('data/adult_background.pkl', 'wb') as f: 247 | pickle.dump(background_dataset, f) 248 | with open('data/adult_processed.pkl', 'wb') as f: 249 | pickle.dump(adult_preprocessed, f) 250 | 251 | 252 | if __name__ == '__main__': 253 | parser = argparse.ArgumentParser() 254 | parser.add_argument('-n_background_samples', type=int, default=100, help="Background set size.") 255 | parser.add_argument('-n_train_examples', type=int, default=30000, help="Number of training examples.") 256 | args = parser.parse_args() 257 | main() 258 | --------------------------------------------------------------------------------
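
A note on the `groups`/`group_names` output of `preprocess_adult_dataset`: each entry of `groups` lists the post-encoding column indices that belong to one original feature (a numerical feature keeps a single column, while a categorical feature spans `n_categories - 1` columns because the one-hot encoder is configured with `drop='first'`). These lists are pickled together with the processed data in `data/adult_processed.pkl` by `scripts/process_adult_data.py`, presumably so the explainer can treat each categorical variable as a single grouped feature. The snippet below is a minimal, self-contained sketch of that bookkeeping; the feature names and category counts are made up for illustration and are not read from the repository's data.

```python
# Illustration only: mirrors the `groups` construction in preprocess_adult_dataset,
# using hypothetical features (2 numerical, plus 2 categorical with 3 and 2 categories).
num_feats_names = ['Age', 'Hours per week']
cat_feats_names = ['Workclass', 'Sex']
feat_enc_dim = [3 - 1, 2 - 1]  # OneHotEncoder(drop='first') keeps n_categories - 1 columns

group_names = num_feats_names + cat_feats_names
groups = []
cat_var_idx = 0
for name in group_names:
    if name in num_feats_names:
        # a numerical feature occupies exactly one column in the processed matrix
        groups.append(list(range(len(groups), len(groups) + 1)))
    else:
        # a categorical feature occupies a contiguous block of encoded columns
        start_idx = groups[-1][-1] + 1 if groups else 0
        groups.append(list(range(start_idx, start_idx + feat_enc_dim[cat_var_idx])))
        cat_var_idx += 1

print(group_names)  # ['Age', 'Hours per week', 'Workclass', 'Sex']
print(groups)       # [[0], [1], [2, 3], [4]]
```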