`_.
17 |
--------------------------------------------------------------------------------
/docs/related.rst:
--------------------------------------------------------------------------------
1 | ===================
2 | Related Resources
3 | ===================
4 | We summarize existing related resources for bandit algorithms and off-policy evaluation.
5 |
6 |
7 | Related Datasets
8 | --------------------
9 | Our dataset is most closely related to those of :cite:`Lefortier2016` and :cite:`Li2010`.
10 | :cite:`Lefortier2016` introduces a large-scale logged bandit feedback dataset (Criteo data) collected by Criteo, a leading company in display advertising.
11 | The data contains context vectors of user impressions, advertisements (ads) as actions, and click indicators as rewards.
12 | It also provides the ex ante probability of each ad being selected by the behavior policy.
13 | Therefore, this data can be used to compare different *off-policy learning* methods, which aim to learn a new bandit policy using only log data generated by a behavior policy.
14 | In contrast, :cite:`Li2010` introduces a dataset (Yahoo! data) collected on a news recommendation interface of the Yahoo! Today Module.
15 | The data contains context vectors of user impressions, presented news articles as actions, and click indicators as rewards.
16 | It was collected by running a uniform random policy on the news recommendation platform, allowing researchers to evaluate their own bandit algorithms.
17 |
18 | However, the Criteo and Yahoo! data have limitations, which we overcome as follows:
19 |
20 | * The previous datasets do not provide the code (production implementation) of their behavior policy. Moreover, the data was collected by running only a single behavior policy. As a result, these data cannot be used for the evaluation and comparison of different OPE estimators.
21 |
22 | :math:`\rightarrow` In contrast, we provide the code of our behavior policies (i.e., Bernoulli TS and Random) in our pipeline, which allows researchers to re-run the same behavior policies on the log data. Our open data also contains logged bandit feedback data generated by *multiple* behavior policies. It enables the evaluation and comparison of different OPE estimators. This is the first large-scale bandit dataset that enables such evaluation of OPE with the ground-truth policy value of behavior policies.
23 |
24 | * The previous datasets do not provide a pipeline implementation to handle their data. Researchers have to re-implement the experimental environment by themselves before implementing their own methods. This may lead to inconsistent experimental conditions across different studies, potentially causing reproducibility issues.
25 |
26 | :math:`\rightarrow` We implement the Open Bandit Pipeline to simplify and standardize the experimental processing of bandit algorithms and OPE with our open data. This tool thus contributes to the reproducible and transparent use of our data.
27 |
28 | The following table summarizes key differences between our data and existing ones.
29 |
30 | .. image:: ./_static/images/related_data.png
31 | :scale: 40%
32 | :align: center
33 |
34 | Related Packages
35 | -------------------
36 | There are several existing Python packages related to our Open Bandit Pipeline.
37 | For example, the *contextualbandits* package (https://github.com/david-cortes/contextualbandits) contains implementations of several contextual bandit algorithms :cite:`Cortes2018`.
38 | It aims to provide an easy procedure for comparing bandit algorithms and for reproducing research papers that do not provide easily available implementations.
39 | In addition, *RecoGym* (https://github.com/criteo-research/reco-gym) focuses on providing simulated bandit environments that imitate the e-commerce recommendation setting :cite:`Rohde2018`.
40 | This package also implements an online bandit algorithm based on epsilon-greedy and an off-policy learning method based on IPW.
41 |
42 | However, the following features differentiate our pipeline from the previous ones:
43 |
44 | * The previous packages focus on implementing and comparing online bandit algorithms or off-policy learning methods. However, they **cannot** be used to implement and compare off-policy evaluation methods.
45 |
46 | :math:`\rightarrow` Our package implements a wide variety of OPE estimators, including advanced ones such as Switch Estimators :cite:`Wang2016`, More Robust Doubly Robust :cite:`Farajtabar2018`, and Doubly Robust with Shrinkage :cite:`Su2019`. Moreover, it is possible to compare the estimation accuracy of these estimators with our package in a fair manner. Our package also provides flexible interfaces for implementing new OPE estimators. Thus, researchers can easily compare their own estimators with other methods using our package.
47 |
48 | * The previous packages cannot handle real-world bandit datasets.
49 |
50 | :math:`\rightarrow` Our package comes with the Open Bandit Dataset and includes the **dataset module**. This enables the evaluation of bandit algorithms and off-policy estimators using our real-world data. This function contributes to realistic experiments on these topics.
51 |
52 | The following table summarizes key differences between our pipeline and existing ones.
53 |
54 | .. image:: ./_static/images/related_packages.png
55 | :scale: 40%
56 | :align: center
57 |
--------------------------------------------------------------------------------
/docs/requirements.txt:
--------------------------------------------------------------------------------
1 | # Readthedocs requirements
2 | sphinx_rtd_theme
3 | sphinxcontrib-bibtex
4 |
--------------------------------------------------------------------------------
/examples/README.md:
--------------------------------------------------------------------------------
1 | # Open Bandit Pipeline Examples
2 |
3 | This page contains a list of examples written with Open Bandit Pipeline.
4 |
5 | - [`obd/`](./obd/): example implementations for evaluating standard off-policy estimators with the small sample Open Bandit Dataset.
6 | - [`synthetic/`](./synthetic/): example implementations for evaluating several off-policy estimators with synthetic bandit datasets.
7 | - [`multiclass/`](./multiclass/): example implementations for evaluating several off-policy estimators with multi-class classification datasets.
8 | - [`replay/`](./replay/): example implementations for evaluating the Replay Method with online bandit algorithms.
9 | - [`opl/`](./opl/): example implementations for comparing the performance of several off-policy learners with synthetic bandit datasets.
10 | - [`quickstart/`](./quickstart/): some quickstart notebooks to guide the usage of Open Bandit Pipeline.
11 |
--------------------------------------------------------------------------------
/examples/multiclass/README.md:
--------------------------------------------------------------------------------
1 | # Example Experiment with Multi-class Classification Data
2 |
3 |
4 | ## Description
5 |
6 | We use multi-class classification datasets to evaluate OPE estimators. Specifically, we evaluate the estimation performance of some well-known OPE estimators using the ground-truth policy value of an evaluation policy calculable with multi-class classification data.
7 |
8 | ## Evaluating Off-Policy Estimators
9 |
10 | In the following, we evaluate the estimation performance of
11 |
12 | - Direct Method (DM)
13 | - Inverse Probability Weighting (IPW)
14 | - Self-Normalized Inverse Probability Weighting (SNIPW)
15 | - Doubly Robust (DR)
16 | - Self-Normalized Doubly Robust (SNDR)
17 | - Switch Doubly Robust (Switch-DR)
18 | - Doubly Robust with Optimistic Shrinkage (DRos)
19 |
20 | For Switch-DR and DRos, we tune the built-in hyperparameters using SLOPE (Su et al., 2020; Tucker et al., 2021), a data-driven hyperparameter tuning method for OPE estimators.
21 | See [our documentation](https://zr-obp.readthedocs.io/en/latest/estimators.html) for the details about these estimators.
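
As a reference, the snippet below sketches how such an estimator set could be defined with Open Bandit Pipeline. This is a minimal sketch, assuming the `obp.ope` estimator classes and the tuning wrappers (`SwitchDoublyRobustTuning`, `DoublyRobustWithShrinkageTuning`, which search over a user-supplied candidate grid) available in recent obp versions; the candidate values below are illustrative, not the ones used by the script.

```python
# A minimal sketch of an OPE estimator set (assumed obp API; candidate grids are illustrative).
from obp.ope import (
    DirectMethod,
    InverseProbabilityWeighting,
    SelfNormalizedInverseProbabilityWeighting,
    DoublyRobust,
    SelfNormalizedDoublyRobust,
    SwitchDoublyRobustTuning,
    DoublyRobustWithShrinkageTuning,
)

ope_estimators = [
    DirectMethod(),
    InverseProbabilityWeighting(),
    SelfNormalizedInverseProbabilityWeighting(),
    DoublyRobust(),
    SelfNormalizedDoublyRobust(),
    # tuning wrappers pick the built-in hyperparameter from the given candidates in a data-driven way
    SwitchDoublyRobustTuning(lambdas=[10.0, 50.0, 100.0, 500.0, 1000.0]),
    DoublyRobustWithShrinkageTuning(lambdas=[10.0, 50.0, 100.0, 500.0, 1000.0]),
]
```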
22 |
23 | ### Files
24 | - [`./evaluate_off_policy_estimators.py`](./evaluate_off_policy_estimators.py) implements the evaluation of OPE estimators using multi-class classification data.
25 | - [`./conf/hyperparams.yaml`](./conf/hyperparams.yaml) defines hyperparameters of the ML methods used to define the regression model.
26 |
27 | ### Scripts
28 |
29 | ```bash
30 | # run evaluation of OPE estimators with multi-class classification data
31 | python evaluate_off_policy_estimators.py\
32 | --n_runs $n_runs\
33 | --dataset_name $dataset_name \
34 | --eval_size $eval_size \
35 | --base_model_for_behavior_policy $base_model_for_behavior_policy\
36 | --alpha_b $alpha_b \
37 | --base_model_for_evaluation_policy $base_model_for_evaluation_policy\
38 | --alpha_e $alpha_e \
39 | --base_model_for_reg_model $base_model_for_reg_model\
40 | --n_jobs $n_jobs\
41 | --random_state $random_state
42 | ```
43 | - `$n_runs` specifies the number of simulation runs in the experiment to estimate standard deviations of the performance of OPE estimators.
44 | - `$dataset_name` specifies the name of the multi-class classification dataset and should be one of "breast_cancer", "digits", "iris", or "wine".
45 | - `$eval_size` specifies the proportion of the dataset to include in the evaluation split.
46 | - `$base_model_for_behavior_policy` specifies the base ML model for defining behavior policy and should be one of "logistic_regression", "random_forest", or "lightgbm".
47 | - `$alpha_b` specifies the ratio of a uniform random policy when constructing a behavior policy.
48 | - `$base_model_for_evaluation_policy` specifies the base ML model for defining evaluation policy and should be one of "logistic_regression", "random_forest", or "lightgbm".
49 | - `$alpha_e` specifies the ratio of a uniform random policy when constructing an evaluation policy.
50 | - `$base_model_for_reg_model` specifies the base ML model for defining regression model and should be one of "logistic_regression", "random_forest", or "lightgbm".
51 | - `$n_jobs` is the maximum number of concurrently running jobs.
52 |
53 | For example, the following command compares the estimation performance (relative estimation error; relative-ee) of the OPE estimators using the digits dataset.
54 |
55 | ```bash
56 | python evaluate_off_policy_estimators.py\
57 | --n_runs 30\
58 | --dataset_name digits\
59 | --eval_size 0.7\
60 | --base_model_for_behavior_policy logistic_regression\
61 | --alpha_b 0.4\
62 | --base_model_for_evaluation_policy random_forest\
63 | --alpha_e 0.9\
64 | --base_model_for_reg_model lightgbm\
65 | --n_jobs -1\
66 | --random_state 12345
67 |
68 | # relative-ee of OPE estimators and their standard deviations (lower is better).
69 | # =============================================
70 | # random_state=12345
71 | # ---------------------------------------------
72 | # mean std
73 | # dm 0.436541 0.017629
74 | # ipw 0.030288 0.024506
75 | # snipw 0.022764 0.017917
76 | # dr 0.016156 0.012679
77 | # sndr 0.022082 0.016865
78 | # switch-dr 0.034657 0.018575
79 | # dr-os 0.015868 0.012537
80 | # =============================================
81 | ```
82 |
83 | The results may vary with different experimental settings. You can easily try the evaluation of OPE under other settings.
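
For reference, here is a minimal Python sketch of the workflow the script follows, assuming obp's `MultiClassToBanditReduction`, `RegressionModel`, and `OffPolicyEvaluation` interfaces; the digits dataset and logistic regression are used only for illustration.

```python
# A minimal sketch of the multiclass-to-bandit OPE workflow (assumed obp API).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

from obp.dataset import MultiClassToBanditReduction
from obp.ope import (
    DirectMethod,
    DoublyRobust,
    InverseProbabilityWeighting,
    OffPolicyEvaluation,
    RegressionModel,
)

# convert the digits classification data into logged bandit feedback
X, y = load_digits(return_X_y=True)
dataset = MultiClassToBanditReduction(
    X=X, y=y, base_classifier_b=LogisticRegression(max_iter=10000), alpha_b=0.4
)
dataset.split_train_eval(eval_size=0.7, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(random_state=12345)

# define an evaluation policy and its ground-truth policy value
action_dist = dataset.obtain_action_dist_by_eval_policy(
    base_classifier_e=LogisticRegression(max_iter=10000), alpha_e=0.9
)
ground_truth = dataset.calc_ground_truth_policy_value(action_dist=action_dist)

# estimate the reward function for the model-dependent estimators (DM/DR)
regression_model = RegressionModel(
    n_actions=dataset.n_actions, base_model=LogisticRegression(max_iter=10000)
)
estimated_rewards = regression_model.fit_predict(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    n_folds=3,
    random_state=12345,
)

# compare the estimates against the ground truth (relative-ee)
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[DirectMethod(), InverseProbabilityWeighting(), DoublyRobust()],
)
relative_ee = ope.evaluate_performance_of_estimators(
    ground_truth_policy_value=ground_truth,
    action_dist=action_dist,
    estimated_rewards_by_reg_model=estimated_rewards,
    metric="relative-ee",
)
print(relative_ee)
```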
84 |
85 |
86 | ## References
87 |
88 | - Yi Su, Pavithra Srinath, Akshay Krishnamurthy. [Adaptive Estimator Selection for Off-Policy Evaluation](https://arxiv.org/abs/2002.07729), ICML2020.
89 | - Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík. [Doubly Robust Off-policy Evaluation with Shrinkage](https://arxiv.org/abs/1907.09623), ICML2020.
90 | - George Tucker and Jonathan Lee. [Improved Estimator Selection for Off-Policy Evaluation](https://lyang36.github.io/icml2021_rltheory/camera_ready/79.pdf), Workshop on Reinforcement Learning
91 | Theory at ICML2021.
92 | - Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik. [Optimal and Adaptive Off-policy Evaluation in Contextual Bandits](https://arxiv.org/abs/1612.01205), ICML2017.
93 | - Miroslav Dudik, John Langford, Lihong Li. [Doubly Robust Policy Evaluation and Learning](https://arxiv.org/abs/1103.4601). ICML2011.
94 | - Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita. [Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation](https://arxiv.org/abs/2008.07146). NeurIPS2021 Track on Datasets and Benchmarks.
95 |
96 |
--------------------------------------------------------------------------------
/examples/multiclass/conf/hyperparams.yaml:
--------------------------------------------------------------------------------
1 | lightgbm:
2 | n_estimators: 30
3 | learning_rate: 0.01
4 | max_depth: 5
5 | min_samples_leaf: 10
6 | random_state: 12345
7 | logistic_regression:
8 | max_iter: 10000
9 | C: 100
10 | random_state: 12345
11 | random_forest:
12 | n_estimators: 30
13 | max_depth: 5
14 | min_samples_leaf: 10
15 | random_state: 12345
16 |
--------------------------------------------------------------------------------
/examples/obd/README.md:
--------------------------------------------------------------------------------
1 | # Example Experiment with Open Bandit Dataset
2 |
3 | ## Description
4 |
5 | We use the Open Bandit Dataset to evaluate OPE estimators. Specifically, we evaluate the estimation performance of some well-known OPE estimators using the on-policy policy value of an evaluation policy, which is calculable with the dataset.
6 |
7 | ## Evaluating Off-Policy Estimators
8 |
9 | In the following, we evaluate the estimation performance of
10 |
11 | - Direct Method (DM)
12 | - Inverse Probability Weighting (IPW)
13 | - Self-Normalized Inverse Probability Weighting (SNIPW)
14 | - Doubly Robust (DR)
15 | - Self-Normalized Doubly Robust (SNDR)
16 | - Switch Doubly Robust (Switch-DR)
17 | - Doubly Robust with Optimistic Shrinkage (DRos)
18 |
19 | For Switch-DR and DRos, we tune the built-in hyperparameters using SLOPE, a data-driven hyperparameter tuning method for OPE estimators.
20 | See [our documentation](https://zr-obp.readthedocs.io/en/latest/estimators.html) for the details about these estimators.
21 |
22 | ### Files
23 | - [`./evaluate_off_policy_estimators.py`](./evaluate_off_policy_estimators.py) implements the evaluation of OPE estimators using Open Bandit Dataset.
24 | - [`./conf/hyperparams.yaml`](./conf/hyperparams.yaml) defines hyperparameters of some ML models used as the regression model in model-dependent estimators (such as DM and DR).
25 |
26 | ### Scripts
27 |
28 | ```bash
29 | # run evaluation of OPE estimators with (small size) Open Bandit Dataset
30 | python evaluate_off_policy_estimators.py\
31 | --n_runs $n_runs\
32 | --base_model $base_model\
33 | --evaluation_policy $evaluation_policy\
34 | --behavior_policy $behavior_policy\
35 | --campaign $campaign\
36 | --n_sim_to_compute_action_dist $n_sim_to_compute_action_dist\
37 | --n_jobs $n_jobs\
38 | --random_state $random_state
39 | ```
40 | - `$n_runs` specifies the number of bootstrap samples used to estimate the means and standard deviations of the performance of OPE estimators (i.e., relative estimation error).
41 | - `$base_model` specifies the base ML model for estimating the reward function, and should be one of `logistic_regression`, `random_forest`, or `lightgbm`.
42 | - `$evaluation_policy` and `$behavior_policy` specify the evaluation and behavior policies, respectively.
43 | They should be either 'bts' or 'random'.
44 | - `$campaign` specifies the campaign and should be one of 'all', 'men', or 'women'.
45 | - `$n_sim_to_compute_action_dist` is the number of Monte Carlo simulations used to compute the action distribution of a given evaluation policy.
46 | - `$n_jobs` is the maximum number of concurrently running jobs.
47 |
48 | For example, the following command compares the estimation performance of the OPE estimators listed above, using Bernoulli TS as the evaluation policy and Random as the behavior policy in the "All" campaign.
49 |
50 | ```bash
51 | python evaluate_off_policy_estimators.py\
52 | --n_runs 30\
53 | --base_model logistic_regression\
54 | --evaluation_policy bts\
55 | --behavior_policy random\
56 | --campaign all\
57 | --n_jobs -1
58 |
59 | # relative estimation errors of OPE estimators and their standard deviations.
60 | # ==============================
61 | # random_state=12345
62 | # ------------------------------
63 | # mean std
64 | # dm 0.156876 0.109898
65 | # ipw 0.311082 0.311170
66 | # snipw 0.311795 0.334736
67 | # dr 0.292464 0.315485
68 | # sndr 0.302407 0.328434
69 | # switch-dr 0.258410 0.160598
70 | # dr-os 0.159520 0.109660
71 | # ==============================
72 | ```
73 |
74 | Please refer to [this page](https://zr-obp.readthedocs.io/en/latest/evaluation_ope.html) for the details of the protocol used to evaluate OPE estimators with our real-world data. Please visit [synthetic](../synthetic/) to try the evaluation of OPE estimators with synthetic bandit data. Moreover, in [benchmark/ope](https://github.com/st-tech/zr-obp/tree/master/benchmark/ope), we perform benchmark experiments on several OPE estimators using the full-size Open Bandit Dataset.
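
For reference, the following is a minimal Python sketch of the core of this evaluation, assuming obp's `OpenBanditDataset`, `BernoulliTS`, and `OffPolicyEvaluation` interfaces as in recent versions (with the default `data_path`, the small sample data bundled with the package is used).

```python
# A minimal sketch of OPE on the small sample Open Bandit Dataset (assumed obp API).
from obp.dataset import OpenBanditDataset
from obp.ope import InverseProbabilityWeighting, OffPolicyEvaluation
from obp.policy import BernoulliTS

# logged bandit feedback generated by the Random policy in the "All" campaign
dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# Bernoulli TS as the evaluation policy (with the priors used in production)
evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    is_zozotown_prior=True,
    campaign="all",
    random_state=12345,
)
action_dist = evaluation_policy.compute_batch_action_dist(
    n_sim=100000, n_rounds=bandit_feedback["n_rounds"]
)

# estimate the policy value of Bernoulli TS with IPW
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback, ope_estimators=[InverseProbabilityWeighting()]
)
estimated_policy_value = ope.estimate_policy_values(action_dist=action_dist)

# on-policy policy value of Bernoulli TS, used as the ground truth in this example
ground_truth = OpenBanditDataset.calc_on_policy_policy_value_estimate(
    behavior_policy="bts", campaign="all"
)
print(estimated_policy_value, ground_truth)
```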
75 |
76 |
77 |
78 | ## References
79 |
80 | - Yi Su, Pavithra Srinath, Akshay Krishnamurthy. [Adaptive Estimator Selection for Off-Policy Evaluation](https://arxiv.org/abs/2002.07729), ICML2020.
81 | - Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík. [Doubly Robust Off-policy Evaluation with Shrinkage](https://arxiv.org/abs/1907.09623), ICML2020.
82 | - George Tucker and Jonathan Lee. [Improved Estimator Selection for Off-Policy Evaluation](https://lyang36.github.io/icml2021_rltheory/camera_ready/79.pdf), Workshop on Reinforcement Learning
83 | Theory at ICML2021.
84 | - Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik. [Optimal and Adaptive Off-policy Evaluation in Contextual Bandits](https://arxiv.org/abs/1612.01205), ICML2017.
85 | - Miroslav Dudik, John Langford, Lihong Li. [Doubly Robust Policy Evaluation and Learning](https://arxiv.org/abs/1103.4601). ICML2011.
86 | - Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita. [Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation](https://arxiv.org/abs/2008.07146). NeurIPS2021 Track on Datasets and Benchmarks.
87 |
88 |
--------------------------------------------------------------------------------
/examples/obd/conf/hyperparams.yaml:
--------------------------------------------------------------------------------
1 | lightgbm:
2 | n_estimators: 30
3 | learning_rate: 0.01
4 | max_depth: 5
5 | min_samples_leaf: 10
6 | random_state: 12345
7 | logistic_regression:
8 | max_iter: 10000
9 | C: 100
10 | random_state: 12345
11 | random_forest:
12 | n_estimators: 30
13 | max_depth: 5
14 | min_samples_leaf: 10
15 | random_state: 12345
16 |
--------------------------------------------------------------------------------
/examples/opl/README.md:
--------------------------------------------------------------------------------
1 | # Example Experiment with Off-Policy Learners
2 |
3 |
4 | ## Description
5 |
6 | We use synthetic bandit data to evaluate off-policy learners, using their ground-truth policy values, which are calculable with synthetic data.
7 |
8 | ## Evaluating Off-Policy Learners
9 |
10 | In the following, we evaluate the performance of
11 |
12 | - Uniform Random Policy (`Random`)
13 | - Inverse Probability Weighting Policy Learner (`IPWLearner`)
14 | - Policy Learner using Neural Networks (`NNPolicyLearner`)
15 |
16 | See [our documentation](https://zr-obp.readthedocs.io/en/latest/_autosummary/obp.policy.offline.html) for the details about `IPWLearner` and `NNPolicyLearner`.
17 |
18 | `NNPolicyLearner` can use the following OPE estimators as the objective function:
19 | - Direct Method (DM)
20 | - Inverse Probability Weighting (IPW)
21 | - Doubly Robust (DR)
22 |
23 | See [our documentation](https://zr-obp.readthedocs.io/en/latest/estimators.html) for the details about these estimators.
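
As a reference, here is a minimal sketch of constructing `NNPolicyLearner` with IPW as its training objective; the constructor arguments shown (e.g., `dim_context`, `off_policy_objective`, `hidden_layer_size`) follow recent obp versions, and the values are illustrative only.

```python
# A minimal sketch of constructing NNPolicyLearner with IPW as its objective
# (assumed obp API; all values are illustrative).
from obp.policy import NNPolicyLearner

nn_policy_learner = NNPolicyLearner(
    n_actions=10,
    dim_context=5,
    off_policy_objective="ipw",  # "dm", "ipw", or "dr"
    hidden_layer_size=(100,),    # one hidden layer with 100 units
    activation="relu",
    solver="adam",
    batch_size=200,
    early_stopping=True,
    random_state=12345,
)
# nn_policy_learner.fit(context=..., action=..., reward=..., pscore=...)
# action_dist = nn_policy_learner.predict_proba(context=...)
```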
24 |
25 | ### Files
26 | - [`./evaluate_off_policy_learners.py`](./evaluate_off_policy_learners.py) implements the evaluation of off-policy learners using synthetic bandit data.
27 | - [`./conf/hyperparams.yaml`](./conf/hyperparams.yaml) defines hyperparameters of the ML methods used to define the regression model and IPWLearner.
28 |
29 | ### Scripts
30 |
31 | ```bash
32 | # run evaluation of off-policy learners with synthetic bandit data
33 | python evaluate_off_policy_learners.py\
34 | --n_rounds $n_rounds\
35 | --n_actions $n_actions\
36 | --dim_context $dim_context\
37 | --beta $beta\
38 |     --base_model_for_ipw_learner $base_model_for_ipw_learner\
39 | --base_model_for_reg_model $base_model_for_reg_model\
40 | --off_policy_objective $off_policy_objective\
41 | --n_hidden $n_hidden\
42 | --n_layers $n_layers\
43 | --activation $activation\
44 | --solver $solver\
45 | --batch_size $batch_size\
46 | --early_stopping\
47 | --random_state $random_state
48 | ```
49 | - `$n_rounds` and `$n_actions` specify the sample size and the number of actions of the synthetic bandit data, respectively.
50 | - `$dim_context` specifies the dimension of context vectors.
51 | - `$beta` specifies the inverse temperature parameter to control the behavior policy.
52 | - `$base_model_for_ipw_learner` specifies the base ML model for defining IPWLearner and should be one of "logistic_regression", "random_forest", or "lightgbm".
- `$base_model_for_reg_model` specifies the base ML model for defining the regression model and should be one of "logistic_regression", "random_forest", or "lightgbm".
53 | - `$off_policy_objective` specifies the OPE estimator for NNPolicyLearner and should be one of "dm", "ipw", or "dr".
54 | - `$n_hidden` specifies the size of hidden layers in NNPolicyLearner.
55 | - `$n_layers` specifies the number of hidden layers in NNPolicyLearner.
56 | - `$activation` specifies the activation function for NNPolicyLearner and should be one of "identity", "tanh", "logistic", or "relu".
57 | - `$solver` specifies the optimizer for NNPolicyLearner and should be one of "adagrad", "sgd", or "adam".
58 | - `$batch_size` specifies the batch size for NNPolicyLearner.
59 | - `$early_stopping` enables early stopping of training of NNPolicyLearner.
60 |
61 | For example, the following command compares the performance of the off-policy learners using synthetic bandit data with 10,000 rounds, 10 actions, and five-dimensional context vectors.
62 |
63 | ```bash
64 | python evaluate_off_policy_learners.py\
65 | --n_rounds 10000\
66 | --n_actions 10\
67 | --dim_context 5\
68 | --base_model_for_ipw_learner logistic_regression\
69 | --off_policy_objective ipw\
70 | --n_hidden 100\
71 | --n_layers 1\
72 | --activation relu\
73 | --solver adam\
74 | --batch_size 200\
75 | --early_stopping
76 |
77 | # policy values of off-policy learners (higher means better)
78 | # =============================================
79 | # random_state=12345
80 | # ---------------------------------------------
81 | # policy value
82 | # random_policy 0.499925
83 | # ipw_learner 0.782430
84 | # nn_policy_learner (with ipw) 0.735947
85 | # =============================================
86 | ```
87 |
88 | The results may vary with different experimental settings. You can easily try the evaluation of off-policy learners under other settings.
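
To make the comparison concrete, the following is a minimal sketch of training `IPWLearner` on synthetic data and computing its ground-truth policy value, assuming obp's `SyntheticBanditDataset` and `IPWLearner` interfaces; all configuration values are illustrative.

```python
# A minimal sketch of off-policy learning with IPWLearner on synthetic data (assumed obp API).
from sklearn.linear_model import LogisticRegression

from obp.dataset import SyntheticBanditDataset, logistic_reward_function
from obp.policy import IPWLearner

dataset = SyntheticBanditDataset(
    n_actions=10,
    dim_context=5,
    reward_function=logistic_reward_function,
    random_state=12345,
)
bandit_feedback_train = dataset.obtain_batch_bandit_feedback(n_rounds=10000)
bandit_feedback_test = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# train a policy on the logged data via inverse probability weighting
ipw_learner = IPWLearner(
    n_actions=dataset.n_actions, base_classifier=LogisticRegression(max_iter=10000)
)
ipw_learner.fit(
    context=bandit_feedback_train["context"],
    action=bandit_feedback_train["action"],
    reward=bandit_feedback_train["reward"],
    pscore=bandit_feedback_train["pscore"],
)
action_dist = ipw_learner.predict(context=bandit_feedback_test["context"])

# ground-truth policy value of the learned policy on the test data
policy_value = dataset.calc_ground_truth_policy_value(
    expected_reward=bandit_feedback_test["expected_reward"], action_dist=action_dist
)
print(policy_value)
```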
89 |
90 |
--------------------------------------------------------------------------------
/examples/opl/conf/hyperparams.yaml:
--------------------------------------------------------------------------------
1 | lightgbm:
2 | n_estimators: 30
3 | learning_rate: 0.01
4 | max_depth: 5
5 | min_samples_leaf: 10
6 | random_state: 12345
7 | logistic_regression:
8 | max_iter: 10000
9 | C: 100
10 | random_state: 12345
11 | random_forest:
12 | n_estimators: 30
13 | max_depth: 5
14 | min_samples_leaf: 10
15 | random_state: 12345
16 |
--------------------------------------------------------------------------------
/examples/quickstart/README.md:
--------------------------------------------------------------------------------
1 | # Open Bandit Pipeline Quickstart Notebooks
2 |
3 | This page contains a list of quickstart notebooks written with Open Bandit Pipeline.
4 |
5 | - [`obd.ipynb`](./obd.ipynb) [](https://colab.research.google.com/github/st-tech/zr-obp/blob/master/examples/quickstart/obd.ipynb): a quickstart guide of using Open Bandit Dataset and Pipeline to conduct some OPE experiments.
6 | - [`synthetic.ipynb`](./synthetic.ipynb) [](https://colab.research.google.com/github/st-tech/zr-obp/blob/master/examples/quickstart/synthetic.ipynb): a quickstart guide to implement the standard off-policy learning, OPE, and the evaluation of OPE on synthetic bandit data with Open Bandit Pipeline.
7 | - [`multiclass.ipynb`](./multiclass.ipynb) [](https://colab.research.google.com/github/st-tech/zr-obp/blob/master/examples/quickstart/multiclass.ipynb): a quickstart guide to handle multi-class classification data as logged bandit data for the standard off-policy learning, OPE, and the evaluation of OPE with Open Bandit Pipeline.
8 | - [`online.ipynb`](./online.ipynb) [](https://colab.research.google.com/github/st-tech/zr-obp/blob/master/examples/quickstart/online.ipynb): a quickstart guide to implement OPE and the evaluation of OPE for online bandit algorithms with Open Bandit Pipeline.
9 | - [`opl.ipynb`](./opl.ipynb) [](https://colab.research.google.com/github/st-tech/zr-obp/blob/master/examples/quickstart/opl.ipynb): a quickstart guide to implement off-policy learners and the evaluation of off-policy learners with Open Bandit Pipeline.
10 | - [`synthetic_slate.ipynb`](./synthetic_slate.ipynb) [](https://colab.research.google.com/github/st-tech/zr-obp/blob/master/examples/quickstart/synthetic_slate.ipynb): a quickstart guide to implement OPE and the evaluation of OPE for the slate recommendation setting with Open Bandit Pipeline.
11 |
--------------------------------------------------------------------------------
/examples/replay/README.md:
--------------------------------------------------------------------------------
1 | # Replay Example with Online Bandit Algorithms
2 |
3 |
4 | ## Description
5 |
6 | We use synthetic bandit datasets to evaluate OPE of online bandit algorithms.
7 | Specifically, we evaluate the estimation performance of some well-known OPE estimators using the ground-truth policy value of an evaluation policy calculable with synthetic data.
8 |
9 |
10 | ## Evaluating Off-Policy Estimators
11 |
12 | In the following, we evaluate the estimation performance of the Replay Method (RM).
13 | RM uses the subset of the logged bandit feedback data where the actions selected by the behavior policy match those selected by the evaluation policy.
14 | Theoretically, RM is unbiased when the behavior policy is uniformly random and the evaluation policy is fixed.
15 | Empirically, however, RM also works well when the evaluation policy is an online learning algorithm.
16 | Please refer to https://arxiv.org/abs/1003.5956 for the details of RM.
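
To make the idea concrete, here is a small self-contained numpy illustration of RM for a fixed (deterministic) evaluation policy; it illustrates the estimator itself and is not the obp implementation used by the script in this directory.

```python
# A toy numpy illustration of the Replay Method (RM), not the obp implementation.
import numpy as np

rng = np.random.default_rng(12345)
n_rounds, n_actions = 10000, 5

# logged data generated by a uniformly random behavior policy
logged_actions = rng.integers(n_actions, size=n_rounds)
true_ctr = np.linspace(0.05, 0.25, n_actions)  # ground-truth expected reward per action
rewards = rng.binomial(1, true_ctr[logged_actions])

# a fixed (deterministic) evaluation policy: always choose the last action
eval_actions = np.full(n_rounds, n_actions - 1)

# RM keeps only the rounds where the logged action matches the evaluation policy's action
# and averages the observed rewards over those rounds
match = logged_actions == eval_actions
rm_estimate = rewards[match].mean()
print(f"RM estimate: {rm_estimate:.3f} (ground truth: {true_ctr[-1]:.3f})")
```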
17 |
18 |
19 | ### Files
20 | - [`./evaluate_off_policy_estimators.py`](./evaluate_off_policy_estimators.py) implements the evaluation of OPE estimators by RM using synthetic bandit data.
21 |
22 | ### Scripts
23 |
24 | ```bash
25 | # run evaluation of OPE estimators with synthetic bandit data
26 | python evaluate_off_policy_estimators.py\
27 | --n_runs $n_runs\
28 | --n_rounds $n_rounds\
29 | --n_actions $n_actions\
30 | --n_sim $n_sim\
31 |     --dim_context $dim_context\
    --evaluation_policy_name $evaluation_policy_name\
32 | --n_jobs $n_jobs\
33 | --random_state $random_state
34 | ```
35 | - `$n_runs` specifies the number of simulation runs in the experiment to estimate standard deviations of the performance of OPE estimators.
36 | - `$n_rounds` and `$n_actions` specify the sample size and the number of actions of the synthetic bandit data.
37 | - `$dim_context` specifies the dimension of context vectors.
38 | - `$n_sim` specifies the number of Monte Carlo simulations used to compute the ground-truth policy value.
39 | - `$evaluation_policy_name` specifies the evaluation policy and should be one of "bernoulli_ts", "epsilon_greedy", "lin_epsilon_greedy", "lin_ts", "lin_ucb", "logistic_epsilon_greedy", "logistic_ts", or "logistic_ucb".
40 | - `$n_jobs` is the maximum number of concurrently running jobs.
41 |
42 | For example, the following command compares the estimation performance (relative estimation error; relative-ee) of RM using synthetic bandit data with 1,000 rounds, 30 actions, and five-dimensional context vectors.
43 |
44 | ```bash
45 | python evaluate_off_policy_estimators.py\
46 | --n_runs 20\
47 | --n_rounds 1000\
48 | --n_actions 30\
49 | --dim_context 5\
50 | --evaluation_policy_name bernoulli_ts\
51 | --n_sim 3\
52 | --n_jobs -1\
53 | --random_state 12345
54 |
55 | # relative-ee of OPE estimators and their standard deviations (lower means more accurate).
56 | # =============================================
57 | # random_state=12345
58 | # ---------------------------------------------
59 | # mean std
60 | # rm 0.097064 0.091453
61 | # =============================================
62 | ```
63 |
64 | The results may vary with different experimental settings.
65 | You can easily try the evaluation of OPE under other settings.
66 |
--------------------------------------------------------------------------------
/examples/replay/evaluate_off_policy_estimators.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from pathlib import Path
3 |
4 | from joblib import delayed
5 | from joblib import Parallel
6 | import numpy as np
7 | from pandas import DataFrame
8 |
9 | from obp.dataset import logistic_reward_function
10 | from obp.dataset import SyntheticBanditDataset
11 | from obp.ope import OffPolicyEvaluation
12 | from obp.ope import ReplayMethod
13 | from obp.policy import BernoulliTS
14 | from obp.policy import EpsilonGreedy
15 | from obp.policy import LinEpsilonGreedy
16 | from obp.policy import LinTS
17 | from obp.policy import LinUCB
18 | from obp.policy import LogisticEpsilonGreedy
19 | from obp.policy import LogisticTS
20 | from obp.policy import LogisticUCB
21 | from obp.simulator import calc_ground_truth_policy_value
22 | from obp.utils import run_bandit_replay
23 |
24 | ope_estimators = [ReplayMethod()]
25 |
26 | if __name__ == "__main__":
27 | parser = argparse.ArgumentParser(
28 | description="evaluate off-policy estimators with replay bandit algorithms and synthetic bandit data."
29 | )
30 | parser.add_argument(
31 | "--n_runs", type=int, default=1, help="number of simulations in the experiment."
32 | )
33 | parser.add_argument(
34 | "--n_rounds",
35 | type=int,
36 | default=10000,
37 | help="sample size of logged bandit data.",
38 | )
39 | parser.add_argument(
40 | "--n_actions",
41 | type=int,
42 | default=10,
43 | help="number of actions.",
44 | )
45 | parser.add_argument(
46 | "--dim_context",
47 | type=int,
48 | default=5,
49 | help="dimensions of context vectors.",
50 | )
51 | parser.add_argument(
52 | "--n_sim",
53 | type=int,
54 | default=1,
55 | help="number of simulations to calculate ground truth policy values",
56 | )
57 | parser.add_argument(
58 | "--evaluation_policy_name",
59 | type=str,
60 | choices=[
61 | "bernoulli_ts",
62 | "epsilon_greedy",
63 | "lin_epsilon_greedy",
64 | "lin_ts",
65 | "lin_ucb",
66 | "logistic_epsilon_greedy",
67 | "logistic_ts",
68 | "logistic_ucb",
69 | ],
70 | required=True,
71 | help="the name of evaluation policy, bernoulli_ts, epsilon_greedy, lin_epsilon_greedy, lin_ts, lin_ucb, logistic_epsilon_greedy, logistic_ts, or logistic_ucb",
72 | )
73 | parser.add_argument(
74 | "--n_jobs",
75 | type=int,
76 | default=1,
77 | help="the maximum number of concurrently running jobs.",
78 | )
79 | parser.add_argument("--random_state", type=int, default=12345)
80 | args = parser.parse_args()
81 | print(args)
82 |
83 | # configurations
84 | n_runs = args.n_runs
85 | n_rounds = args.n_rounds
86 | n_actions = args.n_actions
87 | dim_context = args.dim_context
88 | n_sim = args.n_sim
89 | evaluation_policy_name = args.evaluation_policy_name
90 | n_jobs = args.n_jobs
91 | random_state = args.random_state
92 | np.random.seed(random_state)
93 |
94 | # define evaluation policy
95 | evaluation_policy_dict = dict(
96 | bernoulli_ts=BernoulliTS(n_actions=n_actions, random_state=random_state),
97 | epsilon_greedy=EpsilonGreedy(
98 | n_actions=n_actions, epsilon=0.1, random_state=random_state
99 | ),
100 | lin_epsilon_greedy=LinEpsilonGreedy(
101 | dim=dim_context, n_actions=n_actions, epsilon=0.1, random_state=random_state
102 | ),
103 | lin_ts=LinTS(dim=dim_context, n_actions=n_actions, random_state=random_state),
104 | lin_ucb=LinUCB(dim=dim_context, n_actions=n_actions, random_state=random_state),
105 | logistic_epsilon_greedy=LogisticEpsilonGreedy(
106 | dim=dim_context, n_actions=n_actions, epsilon=0.1, random_state=random_state
107 | ),
108 | logistic_ts=LogisticTS(
109 | dim=dim_context, n_actions=n_actions, random_state=random_state
110 | ),
111 | logistic_ucb=LogisticUCB(
112 | dim=dim_context, n_actions=n_actions, random_state=random_state
113 | ),
114 | )
115 | evaluation_policy = evaluation_policy_dict[evaluation_policy_name]
116 |
117 | def process(i: int):
118 | # synthetic data generator with uniformly random policy
119 | dataset = SyntheticBanditDataset(
120 | n_actions=n_actions,
121 | dim_context=dim_context,
122 | reward_function=logistic_reward_function,
123 | behavior_policy_function=None, # uniformly random
124 | random_state=i,
125 | )
126 | # sample new data of synthetic logged bandit feedback
127 | bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=n_rounds)
128 | # simulate the evaluation policy
129 | action_dist = run_bandit_replay(
130 | bandit_feedback=bandit_feedback, policy=evaluation_policy
131 | )
132 | # estimate the ground-truth policy values of the evaluation policy
133 | # by Monte-Carlo Simulation using p(r|x,a), the reward distribution
134 | ground_truth_policy_value = calc_ground_truth_policy_value(
135 | bandit_feedback=bandit_feedback,
136 | reward_sampler=dataset.sample_reward, # p(r|x,a)
137 | policy=evaluation_policy,
138 | n_sim=n_sim, # the number of simulations
139 | )
140 | # evaluate estimators' performances using relative estimation error (relative-ee)
141 | ope = OffPolicyEvaluation(
142 | bandit_feedback=bandit_feedback,
143 | ope_estimators=ope_estimators,
144 | )
145 | metric_i = ope.evaluate_performance_of_estimators(
146 | ground_truth_policy_value=ground_truth_policy_value,
147 | action_dist=action_dist,
148 | )
149 |
150 | return metric_i
151 |
152 | processed = Parallel(
153 | n_jobs=n_jobs,
154 | verbose=50,
155 | )([delayed(process)(i) for i in np.arange(n_runs)])
156 | metric_dict = {est.estimator_name: dict() for est in ope_estimators}
157 | for i, metric_i in enumerate(processed):
158 | for (
159 | estimator_name,
160 | relative_ee_,
161 | ) in metric_i.items():
162 | metric_dict[estimator_name][i] = relative_ee_
163 | se_df = DataFrame(metric_dict).describe().T.round(6)
164 |
165 | print("=" * 45)
166 | print(f"random_state={random_state}")
167 | print("-" * 45)
168 | print(se_df[["mean", "std"]])
169 | print("=" * 45)
170 |
171 | # save results of the evaluation of off-policy estimators in './logs' directory.
172 | log_path = Path("./logs")
173 | log_path.mkdir(exist_ok=True, parents=True)
174 | se_df.to_csv(log_path / "relative_ee_of_ope_estimators.csv")
175 |
--------------------------------------------------------------------------------
/examples/synthetic/README.md:
--------------------------------------------------------------------------------
1 | # Example Experiment with Synthetic Bandit Data
2 |
3 | ## Description
4 |
5 | We use synthetic bandit datasets to evaluate OPE estimators. Specifically, we evaluate the estimation performance of well-known estimators using the ground-truth policy value of an evaluation policy calculable with synthetic data.
6 |
7 | ## Evaluating Off-Policy Estimators
8 |
9 | In the following, we evaluate the estimation performance of
10 |
11 | - Direct Method (DM)
12 | - Inverse Probability Weighting (IPW)
13 | - Self-Normalized Inverse Probability Weighting (SNIPW)
14 | - Doubly Robust (DR)
15 | - Self-Normalized Doubly Robust (SNDR)
16 | - Switch Doubly Robust (Switch-DR)
17 | - Doubly Robust with Optimistic Shrinkage (DRos)
18 |
19 | For Switch-DR and DRos, we tune the built-in hyperparameters using SLOPE, a data-driven hyperparameter tuning method for OPE estimators.
20 | See [our documentation](https://zr-obp.readthedocs.io/en/latest/estimators.html) for the details about these estimators.
21 |
22 | ### Files
23 | - [`./evaluate_off_policy_estimators.py`](./evaluate_off_policy_estimators.py) implements the evaluation of OPE estimators using synthetic bandit data.
24 | - [`./conf/hyperparams.yaml`](./conf/hyperparams.yaml) defines hyperparameters of the ML methods used to define the regression model and IPWLearner.
25 |
26 | ### Scripts
27 |
28 | ```bash
29 | # run evaluation of OPE estimators with synthetic bandit data
30 | python evaluate_off_policy_estimators.py\
31 | --n_runs $n_runs\
32 | --n_rounds $n_rounds\
33 | --n_actions $n_actions\
34 | --dim_context $dim_context\
35 | --beta $beta\
36 | --base_model_for_evaluation_policy $base_model_for_evaluation_policy\
37 | --base_model_for_reg_model $base_model_for_reg_model\
38 | --n_jobs $n_jobs\
39 | --random_state $random_state
40 | ```
41 | - `$n_runs` specifies the number of simulation runs in the experiment to estimate standard deviations of the performance of OPE estimators.
42 | - `$n_rounds` and `$n_actions` specify the sample size and the number of actions of the synthetic bandit data, respectively.
43 | - `$dim_context` specifies the dimension of context vectors.
44 | - `$beta` specifies the inverse temperature parameter to control the behavior policy.
45 | - `$base_model_for_evaluation_policy` specifies the base ML model for defining evaluation policy and should be one of "logistic_regression", "random_forest", or "lightgbm".
46 | - `$base_model_for_reg_model` specifies the base ML model for defining regression model and should be one of "logistic_regression", "random_forest", or "lightgbm".
47 | - `$n_jobs` is the maximum number of concurrently running jobs.
48 |
49 | For example, the following command compares the estimation performance (relative estimation error; relative-ee) of the OPE estimators using synthetic bandit data with 10,000 rounds, 30 actions, and five-dimensional context vectors.
50 |
51 | ```bash
52 | python evaluate_off_policy_estimators.py\
53 | --n_runs 20\
54 | --n_rounds 10000\
55 | --n_actions 30\
56 | --dim_context 5\
57 | --beta -3\
58 | --base_model_for_evaluation_policy logistic_regression\
59 | --base_model_for_reg_model logistic_regression\
60 | --n_jobs -1\
61 | --random_state 12345
62 |
63 | # relative-ee of OPE estimators and their standard deviations (lower means more accurate).
64 | # =============================================
65 | # random_state=12345
66 | # ---------------------------------------------
67 | # mean std
68 | # dm 0.074390 0.024525
69 | # ipw 0.009481 0.006899
70 | # snipw 0.006665 0.004541
71 | # dr 0.006175 0.004245
72 | # sndr 0.006118 0.003997
73 | # switch-dr 0.006175 0.004245
74 | # dr-os 0.021951 0.013337
75 | # =============================================
76 | ```
77 |
78 | The results may vary with different experimental settings. You can easily try the evaluation of OPE under other settings.
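
For reference, here is a minimal Python sketch of one evaluation run, assuming obp's `SyntheticBanditDataset`, `RegressionModel`, and `OffPolicyEvaluation` interfaces; a uniformly random evaluation policy is used only to keep the sketch short.

```python
# A minimal sketch of evaluating OPE estimators on synthetic data (assumed obp API).
import numpy as np
from sklearn.linear_model import LogisticRegression

from obp.dataset import SyntheticBanditDataset, logistic_reward_function
from obp.ope import (
    DirectMethod,
    DoublyRobust,
    InverseProbabilityWeighting,
    OffPolicyEvaluation,
    RegressionModel,
)

dataset = SyntheticBanditDataset(
    n_actions=30,
    dim_context=5,
    reward_function=logistic_reward_function,
    random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# a uniformly random evaluation policy (for illustration only)
n_rounds, n_actions = bandit_feedback["n_rounds"], dataset.n_actions
action_dist = np.full((n_rounds, n_actions, 1), 1.0 / n_actions)
ground_truth = dataset.calc_ground_truth_policy_value(
    expected_reward=bandit_feedback["expected_reward"], action_dist=action_dist
)

# reward regression for the model-dependent estimators (DM/DR)
regression_model = RegressionModel(
    n_actions=n_actions, base_model=LogisticRegression(max_iter=10000)
)
estimated_rewards = regression_model.fit_predict(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    n_folds=3,
    random_state=12345,
)

# relative estimation error of each estimator
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[DirectMethod(), InverseProbabilityWeighting(), DoublyRobust()],
)
relative_ee = ope.evaluate_performance_of_estimators(
    ground_truth_policy_value=ground_truth,
    action_dist=action_dist,
    estimated_rewards_by_reg_model=estimated_rewards,
    metric="relative-ee",
)
print(relative_ee)
```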
79 |
80 | ## References
81 |
82 | - Yi Su, Pavithra Srinath, Akshay Krishnamurthy. [Adaptive Estimator Selection for Off-Policy Evaluation](https://arxiv.org/abs/2002.07729), ICML2020.
83 | - Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík. [Doubly Robust Off-policy Evaluation with Shrinkage](https://arxiv.org/abs/1907.09623), ICML2020.
84 | - George Tucker and Jonathan Lee. [Improved Estimator Selection for Off-Policy Evaluation](https://lyang36.github.io/icml2021_rltheory/camera_ready/79.pdf), Workshop on Reinforcement Learning
85 | Theory at ICML2021.
86 | - Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik. [Optimal and Adaptive Off-policy Evaluation in Contextual Bandits](https://arxiv.org/abs/1612.01205), ICML2017.
87 | - Miroslav Dudik, John Langford, Lihong Li. [Doubly Robust Policy Evaluation and Learning](https://arxiv.org/abs/1103.4601). ICML2011.
88 | - Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita. [Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation](https://arxiv.org/abs/2008.07146). NeurIPS2021 Track on Datasets and Benchmarks.
89 |
90 |
--------------------------------------------------------------------------------
/examples/synthetic/conf/hyperparams.yaml:
--------------------------------------------------------------------------------
1 | lightgbm:
2 | n_estimators: 30
3 | learning_rate: 0.01
4 | max_depth: 5
5 | min_samples_leaf: 10
6 | random_state: 12345
7 | logistic_regression:
8 | max_iter: 10000
9 | C: 100
10 | random_state: 12345
11 | random_forest:
12 | n_estimators: 30
13 | max_depth: 5
14 | min_samples_leaf: 10
15 | random_state: 12345
16 |
--------------------------------------------------------------------------------
/images/dataset.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/st-tech/zr-obp/8cbd5fa4558b7ad2ba4781546d6604e4cc3e07c4/images/dataset.png
--------------------------------------------------------------------------------
/images/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/st-tech/zr-obp/8cbd5fa4558b7ad2ba4781546d6604e4cc3e07c4/images/logo.png
--------------------------------------------------------------------------------
/images/obd_stats.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/st-tech/zr-obp/8cbd5fa4558b7ad2ba4781546d6604e4cc3e07c4/images/obd_stats.png
--------------------------------------------------------------------------------
/images/ope_results_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/st-tech/zr-obp/8cbd5fa4558b7ad2ba4781546d6604e4cc3e07c4/images/ope_results_example.png
--------------------------------------------------------------------------------
/images/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/st-tech/zr-obp/8cbd5fa4558b7ad2ba4781546d6604e4cc3e07c4/images/overview.png
--------------------------------------------------------------------------------
/images/recommended_fashion_items.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/st-tech/zr-obp/8cbd5fa4558b7ad2ba4781546d6604e4cc3e07c4/images/recommended_fashion_items.png
--------------------------------------------------------------------------------
/obd/README.md:
--------------------------------------------------------------------------------
1 | # Open Bandit Dataset
2 |
3 | This directory contains the small-sized version of our data (10,000 records for each pair of campaign and behavior policy), which can be used for running our [quickstart guide](https://github.com/st-tech/zr-obp/blob/master/examples/quickstart/obd.ipynb) and [examples](https://github.com/st-tech/zr-obp/tree/master/examples/obd).
4 | The full-sized version of our data is available at [https://research.zozo.com/data.html](https://research.zozo.com/data.html).
5 |
6 |
7 | This dataset is released along with the paper:
8 |
9 | Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita.
10 | **Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation**
11 | [https://arxiv.org/abs/2008.07146](https://arxiv.org/abs/2008.07146)
12 |
13 | When using this dataset, please cite the paper with the following bibtex:
14 | ```
15 | @article{saito2020open,
16 | title={Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation},
17 | author={Saito, Yuta and Aihara, Shunsuke and Matsutani, Megumi and Narita, Yusuke},
18 | journal={arXiv preprint arXiv:2008.07146},
19 | year={2020}
20 | }
21 | ```
22 |
23 | ## Data description
24 | The Open Bandit Dataset was constructed in an A/B test of two multi-armed bandit policies on a large-scale fashion e-commerce platform, [ZOZOTOWN](https://zozo.jp/).
25 | It currently consists of a total of about 26M rows, each representing a user impression with feature values, selected items as actions, true propensity scores, and click indicators as outcomes.
26 | The dataset is especially suitable for evaluating *off-policy evaluation* (OPE) methods, which aim to estimate the counterfactual performance of hypothetical algorithms using data generated by a different algorithm.
27 |
28 |
29 | ## Fields
30 | Here is a detailed description of the fields (they are comma-separated in the CSV files):
31 |
32 | **{behavior_policy}/{campaign}.csv** (behavior_policy in (bts, random), campaign in (all, men, women))
33 | - timestamp: timestamps of impressions.
34 | - item_id: index of items as arms (index ranges from 0-79 in the "All" campaign, 0-33 in the "Men" campaign, and 0-45 in the "Women" campaign).
35 | - position: the position of an item being recommended (1, 2, or 3 correspond to left, center, and right position of the ZOZOTOWN recommendation interface, respectively).
36 | - click: target variable that indicates if an item was clicked (1) or not (0).
37 | - action_prob: the probability of an item being recommended at the given position.
38 | - user_features: user-related feature values.
39 | - user_item_affinity: user-item affinity scores induced by the number of past clicks observed between each user-item pair.
40 |
41 |
42 |
43 |
44 |
45 | Structure of Open Bandit Dataset
46 |
47 |
48 |
49 |
50 | **item_context.csv**
51 | - item_id: index of items as arms (index ranges from 0-79 in the "All" campaign, 0-33 in the "Men" campaign, and 0-45 in the "Women" campaign).
52 | - item feature 0-3: item-related feature values.
53 |
54 |
55 | Note that the user and item features are anonymized using a hash function.
56 |
57 | ## Contact
58 | For any questions, feel free to contact:
59 |
60 | - The authors of the paper: saito@hanjuku-kaso.com
61 | - ZOZO Research: zozo-research@zozo.com
62 |
--------------------------------------------------------------------------------
/obd/README_JN.md:
--------------------------------------------------------------------------------
1 | # Open Bandit Dataset
2 |
3 | This directory contains a small-sized version of the data (10,000 records for each pair of campaign and behavior policy) for running the [examples](https://github.com/st-tech/zr-obp/tree/master/examples). The full-sized version of the data is available at [https://research.zozo.com/data.html](https://research.zozo.com/data.html).
4 |
5 | For a detailed description of this public dataset, please refer to the following paper:
6 |
7 | Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita.
8 | **Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation**
9 | [https://arxiv.org/abs/2008.07146](https://arxiv.org/abs/2008.07146)
10 |
11 | ## Data description
12 | The Open Bandit Dataset was constructed in an A/B test of two multi-armed bandit policies on [ZOZOTOWN](https://zozo.jp/), a large-scale fashion e-commerce platform. It currently contains more than 26 million log records in total, each consisting of feature values, the fashion item selected by the policy, the true propensity score, and a click indicator. The dataset is especially suitable for evaluating the performance of *off-policy evaluation* (OPE), which aims to estimate the counterfactual performance of hypothetical algorithms using data generated by a different algorithm.
13 |
14 |
15 | ## Fields
16 | A detailed description of the fields in the dataset is given below.
17 |
18 | **{behavior_policy}/{campaign}.csv** (behavior_policy in (bts, random), campaign in (all, men, women))
19 | - timestamp: timestamps of impressions.
20 | - item_id: index of items as arms (index ranges from 0-79 in the "All" campaign, 0-33 in the "Men" campaign, and 0-45 in the "Women" campaign).
21 | - position: the position of an item being recommended (1, 2, or 3 correspond to the left, center, and right positions of the [ZOZOTOWN recommendation interface](../images/recommended_fashion_items.png), respectively).
22 | - click: binary target variable that indicates if an item was clicked (1) or not (0).
23 | - action_prob: the probability (propensity score) of an item being recommended at the given position.
24 | - user_features: user-related feature values, hashed for anonymization.
25 | - user_item_affinity: user-item affinity scores induced by the number of past clicks observed between each user-item pair.
26 |
27 | **item_context.csv**
28 | - item_id: index of items as arms (index ranges from 0-79 in the "All" campaign, 0-33 in the "Men" campaign, and 0-45 in the "Women" campaign).
29 | - item feature 0-3: item-related feature values.
30 |
31 |
32 |
33 |
34 |
35 | *Figure: Structure of Open Bandit Dataset*
36 |
37 |
38 |
39 |
40 | Note that it is not currently disclosed what each user feature and item feature represents.
41 | In addition, each feature value is anonymized using a hash function.
42 |
43 | ## Contact
44 | For any questions about the dataset, please contact:
45 |
46 | - The authors of the paper: saito@hanjuku-kaso.com
47 | - ZOZO Research: zozo-research@zozo.com
48 |
--------------------------------------------------------------------------------
/obd/bts/men/item_context.csv:
--------------------------------------------------------------------------------
1 | ,item_id,item_feature_0,item_feature_1,item_feature_2,item_feature_3
2 | 0,0,-0.6771831139635117,c82d13885d8bf7a3b8b9fa6f0842ba60,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
3 | 1,1,-0.7202996418188664,77490d05a721c6d93edf580642ffd8bd,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
4 | 2,2,0.7456623052631924,77490d05a721c6d93edf580642ffd8bd,bfcce809dad48aadc7fcbe714f9eabd7,7a0c97ee71eb7985bd0a6271ce57cec5
5 | 3,3,-0.6987413778911891,135f410ec21307919cd92df77f1e2a36,ff2de7df709624e5b79199b850382ea0,68f8b5168b2a322db725a6cd6f5c900b
6 | 4,4,1.6511093902256406,61a525de9976c0f3fa29d400caf26c56,ee987234ffe4f3d901846ac3f7417738,7a0c97ee71eb7985bd0a6271ce57cec5
7 | 5,5,0.14203091528822703,61a525de9976c0f3fa29d400caf26c56,bb7caf7f0c11f7827fb23b331777b871,8ea65bc866b36a8f00ae913e0c3acc29
8 | 6,6,1.6511093902256406,c82d13885d8bf7a3b8b9fa6f0842ba60,818dfe387422471f09a34db693a78212,7a0c97ee71eb7985bd0a6271ce57cec5
9 | 7,7,2.8583721701755715,61a525de9976c0f3fa29d400caf26c56,bfcce809dad48aadc7fcbe714f9eabd7,7a0c97ee71eb7985bd0a6271ce57cec5
10 | 8,8,1.349293695238158,61a525de9976c0f3fa29d400caf26c56,7daaf8717f83289266063b6cc1728087,7a0c97ee71eb7985bd0a6271ce57cec5
11 | 9,9,1.1983858477444165,135f410ec21307919cd92df77f1e2a36,bfcce809dad48aadc7fcbe714f9eabd7,7a0c97ee71eb7985bd0a6271ce57cec5
12 | 10,10,1.5864345984426087,135f410ec21307919cd92df77f1e2a36,bfcce809dad48aadc7fcbe714f9eabd7,7a0c97ee71eb7985bd0a6271ce57cec5
13 | 11,11,0.44384661027570976,c82d13885d8bf7a3b8b9fa6f0842ba60,24ea3b3a472c51dd6299ebdfb220a55f,0c3b42b13b5a49fcb746da9f60e63717
14 | 12,12,1.1983858477444165,c82d13885d8bf7a3b8b9fa6f0842ba60,0e077f97ef2dcda0dc404f873fc5f96c,7a0c97ee71eb7985bd0a6271ce57cec5
15 | 13,13,0.6163127216971285,135f410ec21307919cd92df77f1e2a36,0e077f97ef2dcda0dc404f873fc5f96c,7a0c97ee71eb7985bd0a6271ce57cec5
16 | 14,14,-1.000557072878672,135f410ec21307919cd92df77f1e2a36,865945b5265169a2176a6e5f084ab2eb,8ea65bc866b36a8f00ae913e0c3acc29
17 | 15,15,-0.37536741897602904,c82d13885d8bf7a3b8b9fa6f0842ba60,786ff5d72b02d1e68a43508d9579977d,68f8b5168b2a322db725a6cd6f5c900b
18 | 16,16,-0.5909500582528024,c82d13885d8bf7a3b8b9fa6f0842ba60,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
19 | 17,17,-0.6987413778911891,135f410ec21307919cd92df77f1e2a36,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
20 | 18,18,-0.9143240171679625,77490d05a721c6d93edf580642ffd8bd,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
21 | 19,19,-0.7634161696742211,77490d05a721c6d93edf580642ffd8bd,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
22 | 20,20,-0.6125083221804798,77490d05a721c6d93edf580642ffd8bd,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
23 | 21,21,-0.6987413778911891,135f410ec21307919cd92df77f1e2a36,ff2de7df709624e5b79199b850382ea0,68f8b5168b2a322db725a6cd6f5c900b
24 | 22,22,-0.6987413778911891,17ef71cb22e550d31e5eaa4d629c4abd,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
25 | 23,23,-0.5693917943251251,e1b1451d555c82a01874347dbecdfeae,01b306b40a448bff555c06d5d72c0171,7a0c97ee71eb7985bd0a6271ce57cec5
26 | 24,24,0.4222883463480324,f15de9aa508214df06454736b488717c,7daaf8717f83289266063b6cc1728087,7a0c97ee71eb7985bd0a6271ce57cec5
27 | 25,25,-0.4616004746867384,135f410ec21307919cd92df77f1e2a36,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
28 | 26,26,0.8965701527569339,77490d05a721c6d93edf580642ffd8bd,746facf4548f3da6d628b8e35bf9e6ec,7a0c97ee71eb7985bd0a6271ce57cec5
29 | 27,27,-0.8496492253849305,17ef71cb22e550d31e5eaa4d629c4abd,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
30 | 28,28,-1.0652318646617038,17ef71cb22e550d31e5eaa4d629c4abd,a46137fea33ac48f0809591a76630ea5,68f8b5168b2a322db725a6cd6f5c900b
31 | 29,29,-0.8496492253849305,17ef71cb22e550d31e5eaa4d629c4abd,008dc8758000efaf5b318227fcb71f8d,8ea65bc866b36a8f00ae913e0c3acc29
32 | 30,30,-0.9143240171679625,17ef71cb22e550d31e5eaa4d629c4abd,865945b5265169a2176a6e5f084ab2eb,8ea65bc866b36a8f00ae913e0c3acc29
33 | 31,31,-0.4616004746867384,e1b1451d555c82a01874347dbecdfeae,008dc8758000efaf5b318227fcb71f8d,8ea65bc866b36a8f00ae913e0c3acc29
34 | 32,32,-0.5262752664697704,f15de9aa508214df06454736b488717c,a46137fea33ac48f0809591a76630ea5,68f8b5168b2a322db725a6cd6f5c900b
35 | 33,33,-0.6125083221804798,f15de9aa508214df06454736b488717c,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
36 |
--------------------------------------------------------------------------------
/obd/bts/women/item_context.csv:
--------------------------------------------------------------------------------
1 | ,item_id,item_feature_0,item_feature_1,item_feature_2,item_feature_3
2 | 0,0,-0.3701057045375884,37784fea97b5827eeaf4a23dbff98b73,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
3 | 1,1,0.5251956676347125,3220392a73f0fb73e5509a3f6b89ae64,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
4 | 2,2,-0.13450008028171972,1f0bd59babc615f7876d70abd81b0703,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
5 | 3,3,-0.5350296415166964,37784fea97b5827eeaf4a23dbff98b73,30e4f82eec0c5210c403aab8007a5881,2951c610187f9e9e8281ecd31a156bd1
6 | 4,4,-0.25230289240965403,3220392a73f0fb73e5509a3f6b89ae64,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
7 | 5,5,0.03042385669738834,3220392a73f0fb73e5509a3f6b89ae64,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
8 | 6,6,-0.13450008028171972,3220392a73f0fb73e5509a3f6b89ae64,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
9 | 7,7,-0.8436730092918844,54130721ea2331736ec3cd62c6ff2a0a,92b8c61a1a556299172a5705f5a927db,360f242a6660cf5ee5249dc3c197fe62
10 | 8,8,-0.8436730092918844,54130721ea2331736ec3cd62c6ff2a0a,92b8c61a1a556299172a5705f5a927db,360f242a6660cf5ee5249dc3c197fe62
11 | 9,9,-0.6080673850360158,54130721ea2331736ec3cd62c6ff2a0a,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
12 | 10,10,-0.6080673850360158,54130721ea2331736ec3cd62c6ff2a0a,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
13 | 11,11,0.36027173065560447,e88594e2095dc09c70763bd14b6bb16e,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
14 | 12,12,0.36027173065560447,e88594e2095dc09c70763bd14b6bb16e,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
15 | 13,13,2.9990547223213335,37784fea97b5827eeaf4a23dbff98b73,cedae94e7ca42bac679afaf582fda539,e6dceba864edcc7bf60d38616a52a13d
16 | 14,14,1.3498153525302528,1f0bd59babc615f7876d70abd81b0703,cedae94e7ca42bac679afaf582fda539,e6dceba864edcc7bf60d38616a52a13d
17 | 15,15,-0.25230289240965403,3220392a73f0fb73e5509a3f6b89ae64,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
18 | 16,16,0.6901196046138206,1f0bd59babc615f7876d70abd81b0703,40b2c280a2676cf7e83a2c19a333d4a2,7ce347fef632da56f7d0cd2e3d96c9d2
19 | 17,17,2.339358974404901,1f0bd59babc615f7876d70abd81b0703,5e32ca87b332cb657386052c2962f06f,e6dceba864edcc7bf60d38616a52a13d
20 | 18,18,-0.39366626696317525,3220392a73f0fb73e5509a3f6b89ae64,c72dd70c97975aad5865c138b5c3c501,360f242a6660cf5ee5249dc3c197fe62
21 | 19,19,-0.8648775154749125,37784fea97b5827eeaf4a23dbff98b73,c72dd70c97975aad5865c138b5c3c501,360f242a6660cf5ee5249dc3c197fe62
22 | 20,20,0.8550435415929286,e88594e2095dc09c70763bd14b6bb16e,c72dd70c97975aad5865c138b5c3c501,360f242a6660cf5ee5249dc3c197fe62
23 | 21,21,-0.6292718912190439,1f0bd59babc615f7876d70abd81b0703,c72dd70c97975aad5865c138b5c3c501,360f242a6660cf5ee5249dc3c197fe62
24 | 22,22,0.8550435415929286,e88594e2095dc09c70763bd14b6bb16e,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
25 | 23,23,-0.2994240172608278,37784fea97b5827eeaf4a23dbff98b73,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
26 | 24,24,-0.7235141409213913,3220392a73f0fb73e5509a3f6b89ae64,9dc794e30838384fe6068c9636d35d39,360f242a6660cf5ee5249dc3c197fe62
27 | 25,25,-0.95911976517726,980e8ad619a60423e616b67cfb8e09b9,9dc794e30838384fe6068c9636d35d39,360f242a6660cf5ee5249dc3c197fe62
28 | 26,26,-0.6292718912190439,980e8ad619a60423e616b67cfb8e09b9,a164a8f4dbd09847e25a3956e12bccff,360f242a6660cf5ee5249dc3c197fe62
29 | 27,27,-0.6292718912190439,980e8ad619a60423e616b67cfb8e09b9,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
30 | 28,28,0.3367111682300176,72f3f67e8e9907b474c547847f8d5fd3,92b8c61a1a556299172a5705f5a927db,360f242a6660cf5ee5249dc3c197fe62
31 | 29,29,-0.2994240172608278,980e8ad619a60423e616b67cfb8e09b9,a3438007435a63dbe0ea33f5a0d1e84a,360f242a6660cf5ee5249dc3c197fe62
32 | 30,30,-0.2994240172608278,980e8ad619a60423e616b67cfb8e09b9,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
33 | 31,31,-0.2994240172608278,980e8ad619a60423e616b67cfb8e09b9,a3438007435a63dbe0ea33f5a0d1e84a,360f242a6660cf5ee5249dc3c197fe62
34 | 32,32,0.17178723125090953,72f3f67e8e9907b474c547847f8d5fd3,04a71d6c9b0aa3b9e462a6923d1e8393,25e55d04edea9bd0a20aff26ac263414
35 | 33,33,-0.32298457968641464,72f3f67e8e9907b474c547847f8d5fd3,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
36 | 34,34,0.5016351052091257,72f3f67e8e9907b474c547847f8d5fd3,5e32ca87b332cb657386052c2962f06f,e6dceba864edcc7bf60d38616a52a13d
37 | 35,35,-0.5585902039422833,72f3f67e8e9907b474c547847f8d5fd3,b02dadb348cf4ac330bf1d90cb80237e,2951c610187f9e9e8281ecd31a156bd1
38 | 36,36,-0.13450008028171972,3220392a73f0fb73e5509a3f6b89ae64,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
39 | 37,37,-0.39366626696317525,f3a3cc32a3967214164eb2709555b3f7,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
40 | 38,38,-0.3701057045375884,cd7b41b498ea6d9180ad3fd389422c39,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
41 | 39,39,-0.46434795423993586,734fc1b871abffa4db3be9bc16ad80f7,b02dadb348cf4ac330bf1d90cb80237e,2951c610187f9e9e8281ecd31a156bd1
42 | 40,40,4.15352228117509,e88594e2095dc09c70763bd14b6bb16e,cedae94e7ca42bac679afaf582fda539,e6dceba864edcc7bf60d38616a52a13d
43 | 41,41,0.03042385669738834,734fc1b871abffa4db3be9bc16ad80f7,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
44 | 42,42,-0.4879085166655227,980e8ad619a60423e616b67cfb8e09b9,9dc794e30838384fe6068c9636d35d39,360f242a6660cf5ee5249dc3c197fe62
45 | 43,43,-0.95911976517726,980e8ad619a60423e616b67cfb8e09b9,9dc794e30838384fe6068c9636d35d39,360f242a6660cf5ee5249dc3c197fe62
46 | 44,44,-0.46434795423993586,980e8ad619a60423e616b67cfb8e09b9,a164a8f4dbd09847e25a3956e12bccff,360f242a6660cf5ee5249dc3c197fe62
47 | 45,45,-0.7941958281981519,980e8ad619a60423e616b67cfb8e09b9,1b433010466b794694fc6f5f29eac0d8,360f242a6660cf5ee5249dc3c197fe62
48 |
--------------------------------------------------------------------------------
/obd/random/men/item_context.csv:
--------------------------------------------------------------------------------
1 | ,item_id,item_feature_0,item_feature_1,item_feature_2,item_feature_3
2 | 0,0,-0.6771831139635117,ceca20033d7d36b74dc683ddfb804aa7,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
3 | 1,1,-0.7202996418188664,270de57201b8ec18df9a72ed7ecf20eb,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
4 | 2,2,0.7456623052631924,270de57201b8ec18df9a72ed7ecf20eb,2ba5916b91fd2d0688459ba79f033a9b,14fb049a96497a5deef345c1c38b2467
5 | 3,3,-0.6987413778911891,cb4655bc2d2e54055efefb998883d6fe,1d8ba92fbaa83078dfe330d66b81e5d6,5cc21cc265333250f10b13783ab06472
6 | 4,4,1.6511093902256406,ca9488139d82dbbf68a4e71fc7fe52f9,f65e8237cca7eb6b12f4f009a28a6f72,14fb049a96497a5deef345c1c38b2467
7 | 5,5,0.14203091528822703,ca9488139d82dbbf68a4e71fc7fe52f9,571216af60c365e6a05e1c33c7041f5f,795091554fd8f6b4a0ca7df81bf50a64
8 | 6,6,1.6511093902256406,ceca20033d7d36b74dc683ddfb804aa7,d56aaef6375c7844851af69b354331ba,14fb049a96497a5deef345c1c38b2467
9 | 7,7,2.8583721701755715,ca9488139d82dbbf68a4e71fc7fe52f9,2ba5916b91fd2d0688459ba79f033a9b,14fb049a96497a5deef345c1c38b2467
10 | 8,8,1.349293695238158,ca9488139d82dbbf68a4e71fc7fe52f9,ef0257571cb05e9c0bba5446f9cfb0c9,14fb049a96497a5deef345c1c38b2467
11 | 9,9,1.1983858477444165,cb4655bc2d2e54055efefb998883d6fe,2ba5916b91fd2d0688459ba79f033a9b,14fb049a96497a5deef345c1c38b2467
12 | 10,10,1.5864345984426087,cb4655bc2d2e54055efefb998883d6fe,2ba5916b91fd2d0688459ba79f033a9b,14fb049a96497a5deef345c1c38b2467
13 | 11,11,0.44384661027570976,ceca20033d7d36b74dc683ddfb804aa7,b1dbb432e49fb71cc3b3e820ff31f3ad,6893a4373a4e271e7f03b7a4bdfde4a3
14 | 12,12,1.1983858477444165,ceca20033d7d36b74dc683ddfb804aa7,09122ea36aaf2a8dff8f089286af7cf3,14fb049a96497a5deef345c1c38b2467
15 | 13,13,0.6163127216971285,cb4655bc2d2e54055efefb998883d6fe,09122ea36aaf2a8dff8f089286af7cf3,14fb049a96497a5deef345c1c38b2467
16 | 14,14,-1.000557072878672,cb4655bc2d2e54055efefb998883d6fe,ec5fb795fb7b3a111ad15e1506487535,795091554fd8f6b4a0ca7df81bf50a64
17 | 15,15,-0.37536741897602904,ceca20033d7d36b74dc683ddfb804aa7,e26d13daee6e371dead874b89752bbbe,5cc21cc265333250f10b13783ab06472
18 | 16,16,-0.5909500582528024,ceca20033d7d36b74dc683ddfb804aa7,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
19 | 17,17,-0.6987413778911891,cb4655bc2d2e54055efefb998883d6fe,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
20 | 18,18,-0.9143240171679625,270de57201b8ec18df9a72ed7ecf20eb,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
21 | 19,19,-0.7634161696742211,270de57201b8ec18df9a72ed7ecf20eb,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
22 | 20,20,-0.6125083221804798,270de57201b8ec18df9a72ed7ecf20eb,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
23 | 21,21,-0.6987413778911891,cb4655bc2d2e54055efefb998883d6fe,1d8ba92fbaa83078dfe330d66b81e5d6,5cc21cc265333250f10b13783ab06472
24 | 22,22,-0.6987413778911891,dbb8044a5cc8d79d0e5c3cf996e2d0b9,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
25 | 23,23,-0.5693917943251251,0450516d22e9e70b0ee136549576d0e7,937bfc1b19face0ab0a21dddaeaf19cd,14fb049a96497a5deef345c1c38b2467
26 | 24,24,0.4222883463480324,314759c31d4b75b54dfbbeb887f7bbe8,ef0257571cb05e9c0bba5446f9cfb0c9,14fb049a96497a5deef345c1c38b2467
27 | 25,25,-0.4616004746867384,cb4655bc2d2e54055efefb998883d6fe,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
28 | 26,26,0.8965701527569339,270de57201b8ec18df9a72ed7ecf20eb,ff86755a0252ce6d030f37e89025f60f,14fb049a96497a5deef345c1c38b2467
29 | 27,27,-0.8496492253849305,dbb8044a5cc8d79d0e5c3cf996e2d0b9,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
30 | 28,28,-1.0652318646617038,dbb8044a5cc8d79d0e5c3cf996e2d0b9,3f7cf3ddf1cc36d8310a8c0a48187aa9,5cc21cc265333250f10b13783ab06472
31 | 29,29,-0.8496492253849305,dbb8044a5cc8d79d0e5c3cf996e2d0b9,5adc59d478af904390b1de5af7f33d45,795091554fd8f6b4a0ca7df81bf50a64
32 | 30,30,-0.9143240171679625,dbb8044a5cc8d79d0e5c3cf996e2d0b9,ec5fb795fb7b3a111ad15e1506487535,795091554fd8f6b4a0ca7df81bf50a64
33 | 31,31,-0.4616004746867384,0450516d22e9e70b0ee136549576d0e7,5adc59d478af904390b1de5af7f33d45,795091554fd8f6b4a0ca7df81bf50a64
34 | 32,32,-0.5262752664697704,314759c31d4b75b54dfbbeb887f7bbe8,3f7cf3ddf1cc36d8310a8c0a48187aa9,5cc21cc265333250f10b13783ab06472
35 | 33,33,-0.6125083221804798,314759c31d4b75b54dfbbeb887f7bbe8,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
36 |
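
Note on the item_context.csv files in this listing: each row describes one item (action) in the campaign, with an unnamed index column, the item_id, one standardized numeric feature (item_feature_0), and three hashed categorical features (item_feature_1 to item_feature_3). A minimal loading sketch (editor's illustration; pandas and the repository root as working directory are assumptions, not requirements stated by these files):

# Editor's sketch: load one item_context.csv and one-hot encode the hashed
# categorical features to obtain a numeric action-context matrix.
import pandas as pd

item_context = pd.read_csv("obd/random/men/item_context.csv", index_col=0)
categorical_cols = ["item_feature_1", "item_feature_2", "item_feature_3"]
action_context = pd.get_dummies(item_context, columns=categorical_cols).drop(columns=["item_id"])
print(action_context.shape)  # (n_actions, 1 numeric feature + one-hot columns)
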
--------------------------------------------------------------------------------
/obd/random/women/item_context.csv:
--------------------------------------------------------------------------------
1 | ,item_id,item_feature_0,item_feature_1,item_feature_2,item_feature_3
2 | 0,0,-0.3701057045375884,01a0a328db2dd2a2e8d91bc43f204ba7,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
3 | 1,1,0.5251956676347125,dd868ca2c498f3384250f431e7767b34,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
4 | 2,2,-0.13450008028171972,252326b1475c78b26365ebc3430adca2,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
5 | 3,3,-0.5350296415166964,01a0a328db2dd2a2e8d91bc43f204ba7,d549c11ab8eb14045de2100d6ab90c86,6476528092c639c0ea8f74062f3dd1bb
6 | 4,4,-0.25230289240965403,dd868ca2c498f3384250f431e7767b34,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
7 | 5,5,0.03042385669738834,dd868ca2c498f3384250f431e7767b34,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
8 | 6,6,-0.13450008028171972,dd868ca2c498f3384250f431e7767b34,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
9 | 7,7,-0.8436730092918844,2f872b67f01f5f2f85b24eb87e99d52c,9f43744cf9ae18357d9da7e6b130d3ea,465917095d1b8b7359e781ee782c2c26
10 | 8,8,-0.8436730092918844,2f872b67f01f5f2f85b24eb87e99d52c,9f43744cf9ae18357d9da7e6b130d3ea,465917095d1b8b7359e781ee782c2c26
11 | 9,9,-0.6080673850360158,2f872b67f01f5f2f85b24eb87e99d52c,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
12 | 10,10,-0.6080673850360158,2f872b67f01f5f2f85b24eb87e99d52c,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
13 | 11,11,0.36027173065560447,ef42bd4fa577ce60a5b82b6781a08c64,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
14 | 12,12,0.36027173065560447,ef42bd4fa577ce60a5b82b6781a08c64,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
15 | 13,13,2.9990547223213335,01a0a328db2dd2a2e8d91bc43f204ba7,ae34b8819f09a1df5bc36a3ebb2ca7c1,75b8605bfcb7433d5bd178b3a0a2d38c
16 | 14,14,1.3498153525302528,252326b1475c78b26365ebc3430adca2,ae34b8819f09a1df5bc36a3ebb2ca7c1,75b8605bfcb7433d5bd178b3a0a2d38c
17 | 15,15,-0.25230289240965403,dd868ca2c498f3384250f431e7767b34,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
18 | 16,16,0.6901196046138206,252326b1475c78b26365ebc3430adca2,0409e7011c80bccc0ff6442a03d05b29,c395d5f54cf50e223953258801be2697
19 | 17,17,2.339358974404901,252326b1475c78b26365ebc3430adca2,836017345da8a6725b8eed235c5ec3d0,75b8605bfcb7433d5bd178b3a0a2d38c
20 | 18,18,-0.39366626696317525,dd868ca2c498f3384250f431e7767b34,d34d1f81e32b9e570fab69c523409c8d,465917095d1b8b7359e781ee782c2c26
21 | 19,19,-0.8648775154749125,01a0a328db2dd2a2e8d91bc43f204ba7,d34d1f81e32b9e570fab69c523409c8d,465917095d1b8b7359e781ee782c2c26
22 | 20,20,0.8550435415929286,ef42bd4fa577ce60a5b82b6781a08c64,d34d1f81e32b9e570fab69c523409c8d,465917095d1b8b7359e781ee782c2c26
23 | 21,21,-0.6292718912190439,252326b1475c78b26365ebc3430adca2,d34d1f81e32b9e570fab69c523409c8d,465917095d1b8b7359e781ee782c2c26
24 | 22,22,0.8550435415929286,ef42bd4fa577ce60a5b82b6781a08c64,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
25 | 23,23,-0.2994240172608278,01a0a328db2dd2a2e8d91bc43f204ba7,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
26 | 24,24,-0.7235141409213913,dd868ca2c498f3384250f431e7767b34,e82eb366fa5e481c13b09c24eeeb036d,465917095d1b8b7359e781ee782c2c26
27 | 25,25,-0.95911976517726,4c508776e494a9f4bc302b34fdc6e76e,e82eb366fa5e481c13b09c24eeeb036d,465917095d1b8b7359e781ee782c2c26
28 | 26,26,-0.6292718912190439,4c508776e494a9f4bc302b34fdc6e76e,683522ca22eaee449c5ac25c2a84ee52,465917095d1b8b7359e781ee782c2c26
29 | 27,27,-0.6292718912190439,4c508776e494a9f4bc302b34fdc6e76e,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
30 | 28,28,0.3367111682300176,de083a9403b58424cb3834909131a6de,9f43744cf9ae18357d9da7e6b130d3ea,465917095d1b8b7359e781ee782c2c26
31 | 29,29,-0.2994240172608278,4c508776e494a9f4bc302b34fdc6e76e,e98a5a6ce8eca89f5d9084dee8079f60,465917095d1b8b7359e781ee782c2c26
32 | 30,30,-0.2994240172608278,4c508776e494a9f4bc302b34fdc6e76e,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
33 | 31,31,-0.2994240172608278,4c508776e494a9f4bc302b34fdc6e76e,e98a5a6ce8eca89f5d9084dee8079f60,465917095d1b8b7359e781ee782c2c26
34 | 32,32,0.17178723125090953,de083a9403b58424cb3834909131a6de,75fb3fbc11695c908a1397f96079949b,7ab06c804ac515866a347cb9a54bf2c8
35 | 33,33,-0.32298457968641464,de083a9403b58424cb3834909131a6de,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
36 | 34,34,0.5016351052091257,de083a9403b58424cb3834909131a6de,836017345da8a6725b8eed235c5ec3d0,75b8605bfcb7433d5bd178b3a0a2d38c
37 | 35,35,-0.5585902039422833,de083a9403b58424cb3834909131a6de,7e7fdf8c70a61405fea41ab1bf7cca25,6476528092c639c0ea8f74062f3dd1bb
38 | 36,36,-0.13450008028171972,dd868ca2c498f3384250f431e7767b34,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
39 | 37,37,-0.39366626696317525,5c1e1f8eb530ea4363c04483cd523ac4,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
40 | 38,38,-0.3701057045375884,3f7aceec173a91029fead403c0fa4bc9,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
41 | 39,39,-0.46434795423993586,a37dab32ea544e235487fb30dc1b29f1,7e7fdf8c70a61405fea41ab1bf7cca25,6476528092c639c0ea8f74062f3dd1bb
42 | 40,40,4.15352228117509,ef42bd4fa577ce60a5b82b6781a08c64,ae34b8819f09a1df5bc36a3ebb2ca7c1,75b8605bfcb7433d5bd178b3a0a2d38c
43 | 41,41,0.03042385669738834,a37dab32ea544e235487fb30dc1b29f1,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
44 | 42,42,-0.4879085166655227,4c508776e494a9f4bc302b34fdc6e76e,e82eb366fa5e481c13b09c24eeeb036d,465917095d1b8b7359e781ee782c2c26
45 | 43,43,-0.95911976517726,4c508776e494a9f4bc302b34fdc6e76e,e82eb366fa5e481c13b09c24eeeb036d,465917095d1b8b7359e781ee782c2c26
46 | 44,44,-0.46434795423993586,4c508776e494a9f4bc302b34fdc6e76e,683522ca22eaee449c5ac25c2a84ee52,465917095d1b8b7359e781ee782c2c26
47 | 45,45,-0.7941958281981519,4c508776e494a9f4bc302b34fdc6e76e,14692cff9f8196fb8846653310d39719,465917095d1b8b7359e781ee782c2c26
48 |
--------------------------------------------------------------------------------
/obp/__init__.py:
--------------------------------------------------------------------------------
1 | from obp import dataset
2 | from obp import ope
3 | from obp import policy
4 | from obp import simulator
5 | from obp import types
6 | from obp import utils
7 | from obp.version import __version__ # noqa
8 |
9 |
10 | __all__ = ["dataset", "ope", "policy", "simulator", "types", "utils", "version"]
11 |
--------------------------------------------------------------------------------
/obp/dataset/__init__.py:
--------------------------------------------------------------------------------
1 | from obp.dataset.base import BaseBanditDataset
2 | from obp.dataset.base import BaseRealBanditDataset
3 | from obp.dataset.multiclass import MultiClassToBanditReduction
4 | from obp.dataset.real import OpenBanditDataset
5 | from obp.dataset.synthetic import linear_behavior_policy
6 | from obp.dataset.synthetic import linear_reward_function
7 | from obp.dataset.synthetic import logistic_polynomial_reward_function
8 | from obp.dataset.synthetic import logistic_reward_function
9 | from obp.dataset.synthetic import logistic_sparse_reward_function
10 | from obp.dataset.synthetic import polynomial_behavior_policy
11 | from obp.dataset.synthetic import polynomial_reward_function
12 | from obp.dataset.synthetic import sparse_reward_function
13 | from obp.dataset.synthetic import SyntheticBanditDataset
14 | from obp.dataset.synthetic_continuous import linear_behavior_policy_continuous
15 | from obp.dataset.synthetic_continuous import linear_reward_funcion_continuous
16 | from obp.dataset.synthetic_continuous import linear_synthetic_policy_continuous
17 | from obp.dataset.synthetic_continuous import quadratic_reward_funcion_continuous
18 | from obp.dataset.synthetic_continuous import sign_synthetic_policy_continuous
19 | from obp.dataset.synthetic_continuous import SyntheticContinuousBanditDataset
20 | from obp.dataset.synthetic_continuous import threshold_synthetic_policy_continuous
21 | from obp.dataset.synthetic_embed import SyntheticBanditDatasetWithActionEmbeds
22 | from obp.dataset.synthetic_multi import SyntheticMultiLoggersBanditDataset
23 | from obp.dataset.synthetic_slate import action_interaction_reward_function
24 | from obp.dataset.synthetic_slate import linear_behavior_policy_logit
25 | from obp.dataset.synthetic_slate import SyntheticSlateBanditDataset
26 |
27 |
28 | __all__ = [
29 | "BaseBanditDataset",
30 | "BaseRealBanditDataset",
31 | "OpenBanditDataset",
32 | "SyntheticBanditDataset",
33 | "logistic_reward_function",
34 | "logistic_polynomial_reward_function",
35 | "logistic_sparse_reward_function",
36 | "linear_reward_function",
37 | "polynomial_reward_function",
38 | "sparse_reward_function",
39 | "linear_behavior_policy",
40 | "polynomial_behavior_policy",
41 | "MultiClassToBanditReduction",
42 | "SyntheticContinuousBanditDataset",
43 | "linear_reward_funcion_continuous",
44 | "quadratic_reward_funcion_continuous",
45 | "linear_behavior_policy_continuous",
46 | "linear_synthetic_policy_continuous",
47 | "threshold_synthetic_policy_continuous",
48 | "sign_synthetic_policy_continuous",
49 | "SyntheticSlateBanditDataset",
50 | "action_interaction_reward_function",
51 | "linear_behavior_policy_logit",
52 | "SyntheticBanditDatasetWithActionEmbeds",
53 | "SyntheticMultiLoggersBanditDataset",
54 | ]
55 |
--------------------------------------------------------------------------------
/obp/dataset/base.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Yuta Saito, Yusuke Narita, and ZOZO Technologies, Inc. All rights reserved.
2 | # Licensed under the Apache 2.0 License.
3 |
4 | """Abstract Base Class for Logged Bandit Feedback."""
5 | from abc import ABCMeta
6 | from abc import abstractmethod
7 |
8 |
9 | class BaseBanditDataset(metaclass=ABCMeta):
10 | """Base Class for Synthetic Bandit Dataset."""
11 |
12 | @abstractmethod
13 | def obtain_batch_bandit_feedback(self) -> None:
14 | """Obtain batch logged bandit data."""
15 | raise NotImplementedError
16 |
17 |
18 | class BaseRealBanditDataset(BaseBanditDataset):
19 | """Base Class for Real-World Bandit Dataset."""
20 |
21 | @abstractmethod
22 | def load_raw_data(self) -> None:
23 | """Load raw dataset."""
24 | raise NotImplementedError
25 |
26 | @abstractmethod
27 | def pre_process(self) -> None:
28 | """Preprocess raw dataset."""
29 | raise NotImplementedError
30 |
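
The two abstract classes above fix the dataset interface used throughout the package: synthetic datasets implement obtain_batch_bandit_feedback, and real-world datasets additionally implement load_raw_data and pre_process. A toy subclass, sketched here purely for illustration (it is not part of the repository), shows one way to satisfy that interface; the dictionary keys mirror those checked in obp/simulator/replay.py later in this listing:

# Editor's sketch (not part of the repository): a toy subclass of BaseBanditDataset.
# It draws uniform-random actions and Bernoulli rewards just to show the interface.
import numpy as np

from obp.dataset import BaseBanditDataset


class UniformToyDataset(BaseBanditDataset):
    def __init__(self, n_actions: int = 3, random_state: int = 12345) -> None:
        self.n_actions = n_actions
        self.random_ = np.random.RandomState(random_state)

    def obtain_batch_bandit_feedback(self, n_rounds: int = 100) -> dict:
        return {
            "n_rounds": n_rounds,
            "n_actions": self.n_actions,
            "context": self.random_.normal(size=(n_rounds, 1)),
            "action": self.random_.randint(self.n_actions, size=n_rounds),
            "position": np.zeros(n_rounds, dtype=int),
            "reward": self.random_.binomial(n=1, p=0.5, size=n_rounds),
            "pscore": np.full(n_rounds, 1.0 / self.n_actions),
        }
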
--------------------------------------------------------------------------------
/obp/dataset/obd/bts/men/item_context.csv:
--------------------------------------------------------------------------------
1 | ,item_id,item_feature_0,item_feature_1,item_feature_2,item_feature_3
2 | 0,0,-0.6771831139635117,c82d13885d8bf7a3b8b9fa6f0842ba60,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
3 | 1,1,-0.7202996418188664,77490d05a721c6d93edf580642ffd8bd,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
4 | 2,2,0.7456623052631924,77490d05a721c6d93edf580642ffd8bd,bfcce809dad48aadc7fcbe714f9eabd7,7a0c97ee71eb7985bd0a6271ce57cec5
5 | 3,3,-0.6987413778911891,135f410ec21307919cd92df77f1e2a36,ff2de7df709624e5b79199b850382ea0,68f8b5168b2a322db725a6cd6f5c900b
6 | 4,4,1.6511093902256406,61a525de9976c0f3fa29d400caf26c56,ee987234ffe4f3d901846ac3f7417738,7a0c97ee71eb7985bd0a6271ce57cec5
7 | 5,5,0.14203091528822703,61a525de9976c0f3fa29d400caf26c56,bb7caf7f0c11f7827fb23b331777b871,8ea65bc866b36a8f00ae913e0c3acc29
8 | 6,6,1.6511093902256406,c82d13885d8bf7a3b8b9fa6f0842ba60,818dfe387422471f09a34db693a78212,7a0c97ee71eb7985bd0a6271ce57cec5
9 | 7,7,2.8583721701755715,61a525de9976c0f3fa29d400caf26c56,bfcce809dad48aadc7fcbe714f9eabd7,7a0c97ee71eb7985bd0a6271ce57cec5
10 | 8,8,1.349293695238158,61a525de9976c0f3fa29d400caf26c56,7daaf8717f83289266063b6cc1728087,7a0c97ee71eb7985bd0a6271ce57cec5
11 | 9,9,1.1983858477444165,135f410ec21307919cd92df77f1e2a36,bfcce809dad48aadc7fcbe714f9eabd7,7a0c97ee71eb7985bd0a6271ce57cec5
12 | 10,10,1.5864345984426087,135f410ec21307919cd92df77f1e2a36,bfcce809dad48aadc7fcbe714f9eabd7,7a0c97ee71eb7985bd0a6271ce57cec5
13 | 11,11,0.44384661027570976,c82d13885d8bf7a3b8b9fa6f0842ba60,24ea3b3a472c51dd6299ebdfb220a55f,0c3b42b13b5a49fcb746da9f60e63717
14 | 12,12,1.1983858477444165,c82d13885d8bf7a3b8b9fa6f0842ba60,0e077f97ef2dcda0dc404f873fc5f96c,7a0c97ee71eb7985bd0a6271ce57cec5
15 | 13,13,0.6163127216971285,135f410ec21307919cd92df77f1e2a36,0e077f97ef2dcda0dc404f873fc5f96c,7a0c97ee71eb7985bd0a6271ce57cec5
16 | 14,14,-1.000557072878672,135f410ec21307919cd92df77f1e2a36,865945b5265169a2176a6e5f084ab2eb,8ea65bc866b36a8f00ae913e0c3acc29
17 | 15,15,-0.37536741897602904,c82d13885d8bf7a3b8b9fa6f0842ba60,786ff5d72b02d1e68a43508d9579977d,68f8b5168b2a322db725a6cd6f5c900b
18 | 16,16,-0.5909500582528024,c82d13885d8bf7a3b8b9fa6f0842ba60,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
19 | 17,17,-0.6987413778911891,135f410ec21307919cd92df77f1e2a36,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
20 | 18,18,-0.9143240171679625,77490d05a721c6d93edf580642ffd8bd,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
21 | 19,19,-0.7634161696742211,77490d05a721c6d93edf580642ffd8bd,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
22 | 20,20,-0.6125083221804798,77490d05a721c6d93edf580642ffd8bd,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
23 | 21,21,-0.6987413778911891,135f410ec21307919cd92df77f1e2a36,ff2de7df709624e5b79199b850382ea0,68f8b5168b2a322db725a6cd6f5c900b
24 | 22,22,-0.6987413778911891,17ef71cb22e550d31e5eaa4d629c4abd,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
25 | 23,23,-0.5693917943251251,e1b1451d555c82a01874347dbecdfeae,01b306b40a448bff555c06d5d72c0171,7a0c97ee71eb7985bd0a6271ce57cec5
26 | 24,24,0.4222883463480324,f15de9aa508214df06454736b488717c,7daaf8717f83289266063b6cc1728087,7a0c97ee71eb7985bd0a6271ce57cec5
27 | 25,25,-0.4616004746867384,135f410ec21307919cd92df77f1e2a36,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
28 | 26,26,0.8965701527569339,77490d05a721c6d93edf580642ffd8bd,746facf4548f3da6d628b8e35bf9e6ec,7a0c97ee71eb7985bd0a6271ce57cec5
29 | 27,27,-0.8496492253849305,17ef71cb22e550d31e5eaa4d629c4abd,32338184693488f3a469822fd0a08387,68f8b5168b2a322db725a6cd6f5c900b
30 | 28,28,-1.0652318646617038,17ef71cb22e550d31e5eaa4d629c4abd,a46137fea33ac48f0809591a76630ea5,68f8b5168b2a322db725a6cd6f5c900b
31 | 29,29,-0.8496492253849305,17ef71cb22e550d31e5eaa4d629c4abd,008dc8758000efaf5b318227fcb71f8d,8ea65bc866b36a8f00ae913e0c3acc29
32 | 30,30,-0.9143240171679625,17ef71cb22e550d31e5eaa4d629c4abd,865945b5265169a2176a6e5f084ab2eb,8ea65bc866b36a8f00ae913e0c3acc29
33 | 31,31,-0.4616004746867384,e1b1451d555c82a01874347dbecdfeae,008dc8758000efaf5b318227fcb71f8d,8ea65bc866b36a8f00ae913e0c3acc29
34 | 32,32,-0.5262752664697704,f15de9aa508214df06454736b488717c,a46137fea33ac48f0809591a76630ea5,68f8b5168b2a322db725a6cd6f5c900b
35 | 33,33,-0.6125083221804798,f15de9aa508214df06454736b488717c,088abf8a8657959e46ac19af8da80d15,8ea65bc866b36a8f00ae913e0c3acc29
36 |
--------------------------------------------------------------------------------
/obp/dataset/obd/bts/women/item_context.csv:
--------------------------------------------------------------------------------
1 | ,item_id,item_feature_0,item_feature_1,item_feature_2,item_feature_3
2 | 0,0,-0.3701057045375884,37784fea97b5827eeaf4a23dbff98b73,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
3 | 1,1,0.5251956676347125,3220392a73f0fb73e5509a3f6b89ae64,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
4 | 2,2,-0.13450008028171972,1f0bd59babc615f7876d70abd81b0703,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
5 | 3,3,-0.5350296415166964,37784fea97b5827eeaf4a23dbff98b73,30e4f82eec0c5210c403aab8007a5881,2951c610187f9e9e8281ecd31a156bd1
6 | 4,4,-0.25230289240965403,3220392a73f0fb73e5509a3f6b89ae64,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
7 | 5,5,0.03042385669738834,3220392a73f0fb73e5509a3f6b89ae64,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
8 | 6,6,-0.13450008028171972,3220392a73f0fb73e5509a3f6b89ae64,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
9 | 7,7,-0.8436730092918844,54130721ea2331736ec3cd62c6ff2a0a,92b8c61a1a556299172a5705f5a927db,360f242a6660cf5ee5249dc3c197fe62
10 | 8,8,-0.8436730092918844,54130721ea2331736ec3cd62c6ff2a0a,92b8c61a1a556299172a5705f5a927db,360f242a6660cf5ee5249dc3c197fe62
11 | 9,9,-0.6080673850360158,54130721ea2331736ec3cd62c6ff2a0a,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
12 | 10,10,-0.6080673850360158,54130721ea2331736ec3cd62c6ff2a0a,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
13 | 11,11,0.36027173065560447,e88594e2095dc09c70763bd14b6bb16e,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
14 | 12,12,0.36027173065560447,e88594e2095dc09c70763bd14b6bb16e,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
15 | 13,13,2.9990547223213335,37784fea97b5827eeaf4a23dbff98b73,cedae94e7ca42bac679afaf582fda539,e6dceba864edcc7bf60d38616a52a13d
16 | 14,14,1.3498153525302528,1f0bd59babc615f7876d70abd81b0703,cedae94e7ca42bac679afaf582fda539,e6dceba864edcc7bf60d38616a52a13d
17 | 15,15,-0.25230289240965403,3220392a73f0fb73e5509a3f6b89ae64,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
18 | 16,16,0.6901196046138206,1f0bd59babc615f7876d70abd81b0703,40b2c280a2676cf7e83a2c19a333d4a2,7ce347fef632da56f7d0cd2e3d96c9d2
19 | 17,17,2.339358974404901,1f0bd59babc615f7876d70abd81b0703,5e32ca87b332cb657386052c2962f06f,e6dceba864edcc7bf60d38616a52a13d
20 | 18,18,-0.39366626696317525,3220392a73f0fb73e5509a3f6b89ae64,c72dd70c97975aad5865c138b5c3c501,360f242a6660cf5ee5249dc3c197fe62
21 | 19,19,-0.8648775154749125,37784fea97b5827eeaf4a23dbff98b73,c72dd70c97975aad5865c138b5c3c501,360f242a6660cf5ee5249dc3c197fe62
22 | 20,20,0.8550435415929286,e88594e2095dc09c70763bd14b6bb16e,c72dd70c97975aad5865c138b5c3c501,360f242a6660cf5ee5249dc3c197fe62
23 | 21,21,-0.6292718912190439,1f0bd59babc615f7876d70abd81b0703,c72dd70c97975aad5865c138b5c3c501,360f242a6660cf5ee5249dc3c197fe62
24 | 22,22,0.8550435415929286,e88594e2095dc09c70763bd14b6bb16e,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
25 | 23,23,-0.2994240172608278,37784fea97b5827eeaf4a23dbff98b73,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
26 | 24,24,-0.7235141409213913,3220392a73f0fb73e5509a3f6b89ae64,9dc794e30838384fe6068c9636d35d39,360f242a6660cf5ee5249dc3c197fe62
27 | 25,25,-0.95911976517726,980e8ad619a60423e616b67cfb8e09b9,9dc794e30838384fe6068c9636d35d39,360f242a6660cf5ee5249dc3c197fe62
28 | 26,26,-0.6292718912190439,980e8ad619a60423e616b67cfb8e09b9,a164a8f4dbd09847e25a3956e12bccff,360f242a6660cf5ee5249dc3c197fe62
29 | 27,27,-0.6292718912190439,980e8ad619a60423e616b67cfb8e09b9,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
30 | 28,28,0.3367111682300176,72f3f67e8e9907b474c547847f8d5fd3,92b8c61a1a556299172a5705f5a927db,360f242a6660cf5ee5249dc3c197fe62
31 | 29,29,-0.2994240172608278,980e8ad619a60423e616b67cfb8e09b9,a3438007435a63dbe0ea33f5a0d1e84a,360f242a6660cf5ee5249dc3c197fe62
32 | 30,30,-0.2994240172608278,980e8ad619a60423e616b67cfb8e09b9,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
33 | 31,31,-0.2994240172608278,980e8ad619a60423e616b67cfb8e09b9,a3438007435a63dbe0ea33f5a0d1e84a,360f242a6660cf5ee5249dc3c197fe62
34 | 32,32,0.17178723125090953,72f3f67e8e9907b474c547847f8d5fd3,04a71d6c9b0aa3b9e462a6923d1e8393,25e55d04edea9bd0a20aff26ac263414
35 | 33,33,-0.32298457968641464,72f3f67e8e9907b474c547847f8d5fd3,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
36 | 34,34,0.5016351052091257,72f3f67e8e9907b474c547847f8d5fd3,5e32ca87b332cb657386052c2962f06f,e6dceba864edcc7bf60d38616a52a13d
37 | 35,35,-0.5585902039422833,72f3f67e8e9907b474c547847f8d5fd3,b02dadb348cf4ac330bf1d90cb80237e,2951c610187f9e9e8281ecd31a156bd1
38 | 36,36,-0.13450008028171972,3220392a73f0fb73e5509a3f6b89ae64,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
39 | 37,37,-0.39366626696317525,f3a3cc32a3967214164eb2709555b3f7,5e886301a0b2b816d35a1209d156acdd,2951c610187f9e9e8281ecd31a156bd1
40 | 38,38,-0.3701057045375884,cd7b41b498ea6d9180ad3fd389422c39,d746b3324d88d353b5e1b82780f4d180,680f9d18b8f0ffa4633d41a7738c3c57
41 | 39,39,-0.46434795423993586,734fc1b871abffa4db3be9bc16ad80f7,b02dadb348cf4ac330bf1d90cb80237e,2951c610187f9e9e8281ecd31a156bd1
42 | 40,40,4.15352228117509,e88594e2095dc09c70763bd14b6bb16e,cedae94e7ca42bac679afaf582fda539,e6dceba864edcc7bf60d38616a52a13d
43 | 41,41,0.03042385669738834,734fc1b871abffa4db3be9bc16ad80f7,e1f84cd2715873f04359fb55b370f328,9ef568e90bed3b76bc560e33435f7c1d
44 | 42,42,-0.4879085166655227,980e8ad619a60423e616b67cfb8e09b9,9dc794e30838384fe6068c9636d35d39,360f242a6660cf5ee5249dc3c197fe62
45 | 43,43,-0.95911976517726,980e8ad619a60423e616b67cfb8e09b9,9dc794e30838384fe6068c9636d35d39,360f242a6660cf5ee5249dc3c197fe62
46 | 44,44,-0.46434795423993586,980e8ad619a60423e616b67cfb8e09b9,a164a8f4dbd09847e25a3956e12bccff,360f242a6660cf5ee5249dc3c197fe62
47 | 45,45,-0.7941958281981519,980e8ad619a60423e616b67cfb8e09b9,1b433010466b794694fc6f5f29eac0d8,360f242a6660cf5ee5249dc3c197fe62
48 |
--------------------------------------------------------------------------------
/obp/dataset/obd/random/men/item_context.csv:
--------------------------------------------------------------------------------
1 | ,item_id,item_feature_0,item_feature_1,item_feature_2,item_feature_3
2 | 0,0,-0.6771831139635117,ceca20033d7d36b74dc683ddfb804aa7,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
3 | 1,1,-0.7202996418188664,270de57201b8ec18df9a72ed7ecf20eb,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
4 | 2,2,0.7456623052631924,270de57201b8ec18df9a72ed7ecf20eb,2ba5916b91fd2d0688459ba79f033a9b,14fb049a96497a5deef345c1c38b2467
5 | 3,3,-0.6987413778911891,cb4655bc2d2e54055efefb998883d6fe,1d8ba92fbaa83078dfe330d66b81e5d6,5cc21cc265333250f10b13783ab06472
6 | 4,4,1.6511093902256406,ca9488139d82dbbf68a4e71fc7fe52f9,f65e8237cca7eb6b12f4f009a28a6f72,14fb049a96497a5deef345c1c38b2467
7 | 5,5,0.14203091528822703,ca9488139d82dbbf68a4e71fc7fe52f9,571216af60c365e6a05e1c33c7041f5f,795091554fd8f6b4a0ca7df81bf50a64
8 | 6,6,1.6511093902256406,ceca20033d7d36b74dc683ddfb804aa7,d56aaef6375c7844851af69b354331ba,14fb049a96497a5deef345c1c38b2467
9 | 7,7,2.8583721701755715,ca9488139d82dbbf68a4e71fc7fe52f9,2ba5916b91fd2d0688459ba79f033a9b,14fb049a96497a5deef345c1c38b2467
10 | 8,8,1.349293695238158,ca9488139d82dbbf68a4e71fc7fe52f9,ef0257571cb05e9c0bba5446f9cfb0c9,14fb049a96497a5deef345c1c38b2467
11 | 9,9,1.1983858477444165,cb4655bc2d2e54055efefb998883d6fe,2ba5916b91fd2d0688459ba79f033a9b,14fb049a96497a5deef345c1c38b2467
12 | 10,10,1.5864345984426087,cb4655bc2d2e54055efefb998883d6fe,2ba5916b91fd2d0688459ba79f033a9b,14fb049a96497a5deef345c1c38b2467
13 | 11,11,0.44384661027570976,ceca20033d7d36b74dc683ddfb804aa7,b1dbb432e49fb71cc3b3e820ff31f3ad,6893a4373a4e271e7f03b7a4bdfde4a3
14 | 12,12,1.1983858477444165,ceca20033d7d36b74dc683ddfb804aa7,09122ea36aaf2a8dff8f089286af7cf3,14fb049a96497a5deef345c1c38b2467
15 | 13,13,0.6163127216971285,cb4655bc2d2e54055efefb998883d6fe,09122ea36aaf2a8dff8f089286af7cf3,14fb049a96497a5deef345c1c38b2467
16 | 14,14,-1.000557072878672,cb4655bc2d2e54055efefb998883d6fe,ec5fb795fb7b3a111ad15e1506487535,795091554fd8f6b4a0ca7df81bf50a64
17 | 15,15,-0.37536741897602904,ceca20033d7d36b74dc683ddfb804aa7,e26d13daee6e371dead874b89752bbbe,5cc21cc265333250f10b13783ab06472
18 | 16,16,-0.5909500582528024,ceca20033d7d36b74dc683ddfb804aa7,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
19 | 17,17,-0.6987413778911891,cb4655bc2d2e54055efefb998883d6fe,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
20 | 18,18,-0.9143240171679625,270de57201b8ec18df9a72ed7ecf20eb,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
21 | 19,19,-0.7634161696742211,270de57201b8ec18df9a72ed7ecf20eb,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
22 | 20,20,-0.6125083221804798,270de57201b8ec18df9a72ed7ecf20eb,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
23 | 21,21,-0.6987413778911891,cb4655bc2d2e54055efefb998883d6fe,1d8ba92fbaa83078dfe330d66b81e5d6,5cc21cc265333250f10b13783ab06472
24 | 22,22,-0.6987413778911891,dbb8044a5cc8d79d0e5c3cf996e2d0b9,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
25 | 23,23,-0.5693917943251251,0450516d22e9e70b0ee136549576d0e7,937bfc1b19face0ab0a21dddaeaf19cd,14fb049a96497a5deef345c1c38b2467
26 | 24,24,0.4222883463480324,314759c31d4b75b54dfbbeb887f7bbe8,ef0257571cb05e9c0bba5446f9cfb0c9,14fb049a96497a5deef345c1c38b2467
27 | 25,25,-0.4616004746867384,cb4655bc2d2e54055efefb998883d6fe,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
28 | 26,26,0.8965701527569339,270de57201b8ec18df9a72ed7ecf20eb,ff86755a0252ce6d030f37e89025f60f,14fb049a96497a5deef345c1c38b2467
29 | 27,27,-0.8496492253849305,dbb8044a5cc8d79d0e5c3cf996e2d0b9,03053cdb09aecdd139df91ac8068987d,5cc21cc265333250f10b13783ab06472
30 | 28,28,-1.0652318646617038,dbb8044a5cc8d79d0e5c3cf996e2d0b9,3f7cf3ddf1cc36d8310a8c0a48187aa9,5cc21cc265333250f10b13783ab06472
31 | 29,29,-0.8496492253849305,dbb8044a5cc8d79d0e5c3cf996e2d0b9,5adc59d478af904390b1de5af7f33d45,795091554fd8f6b4a0ca7df81bf50a64
32 | 30,30,-0.9143240171679625,dbb8044a5cc8d79d0e5c3cf996e2d0b9,ec5fb795fb7b3a111ad15e1506487535,795091554fd8f6b4a0ca7df81bf50a64
33 | 31,31,-0.4616004746867384,0450516d22e9e70b0ee136549576d0e7,5adc59d478af904390b1de5af7f33d45,795091554fd8f6b4a0ca7df81bf50a64
34 | 32,32,-0.5262752664697704,314759c31d4b75b54dfbbeb887f7bbe8,3f7cf3ddf1cc36d8310a8c0a48187aa9,5cc21cc265333250f10b13783ab06472
35 | 33,33,-0.6125083221804798,314759c31d4b75b54dfbbeb887f7bbe8,eb6f942c01859574cb88d2e62bf84354,795091554fd8f6b4a0ca7df81bf50a64
36 |
--------------------------------------------------------------------------------
/obp/dataset/obd/random/women/item_context.csv:
--------------------------------------------------------------------------------
1 | ,item_id,item_feature_0,item_feature_1,item_feature_2,item_feature_3
2 | 0,0,-0.3701057045375884,01a0a328db2dd2a2e8d91bc43f204ba7,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
3 | 1,1,0.5251956676347125,dd868ca2c498f3384250f431e7767b34,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
4 | 2,2,-0.13450008028171972,252326b1475c78b26365ebc3430adca2,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
5 | 3,3,-0.5350296415166964,01a0a328db2dd2a2e8d91bc43f204ba7,d549c11ab8eb14045de2100d6ab90c86,6476528092c639c0ea8f74062f3dd1bb
6 | 4,4,-0.25230289240965403,dd868ca2c498f3384250f431e7767b34,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
7 | 5,5,0.03042385669738834,dd868ca2c498f3384250f431e7767b34,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
8 | 6,6,-0.13450008028171972,dd868ca2c498f3384250f431e7767b34,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
9 | 7,7,-0.8436730092918844,2f872b67f01f5f2f85b24eb87e99d52c,9f43744cf9ae18357d9da7e6b130d3ea,465917095d1b8b7359e781ee782c2c26
10 | 8,8,-0.8436730092918844,2f872b67f01f5f2f85b24eb87e99d52c,9f43744cf9ae18357d9da7e6b130d3ea,465917095d1b8b7359e781ee782c2c26
11 | 9,9,-0.6080673850360158,2f872b67f01f5f2f85b24eb87e99d52c,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
12 | 10,10,-0.6080673850360158,2f872b67f01f5f2f85b24eb87e99d52c,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
13 | 11,11,0.36027173065560447,ef42bd4fa577ce60a5b82b6781a08c64,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
14 | 12,12,0.36027173065560447,ef42bd4fa577ce60a5b82b6781a08c64,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
15 | 13,13,2.9990547223213335,01a0a328db2dd2a2e8d91bc43f204ba7,ae34b8819f09a1df5bc36a3ebb2ca7c1,75b8605bfcb7433d5bd178b3a0a2d38c
16 | 14,14,1.3498153525302528,252326b1475c78b26365ebc3430adca2,ae34b8819f09a1df5bc36a3ebb2ca7c1,75b8605bfcb7433d5bd178b3a0a2d38c
17 | 15,15,-0.25230289240965403,dd868ca2c498f3384250f431e7767b34,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
18 | 16,16,0.6901196046138206,252326b1475c78b26365ebc3430adca2,0409e7011c80bccc0ff6442a03d05b29,c395d5f54cf50e223953258801be2697
19 | 17,17,2.339358974404901,252326b1475c78b26365ebc3430adca2,836017345da8a6725b8eed235c5ec3d0,75b8605bfcb7433d5bd178b3a0a2d38c
20 | 18,18,-0.39366626696317525,dd868ca2c498f3384250f431e7767b34,d34d1f81e32b9e570fab69c523409c8d,465917095d1b8b7359e781ee782c2c26
21 | 19,19,-0.8648775154749125,01a0a328db2dd2a2e8d91bc43f204ba7,d34d1f81e32b9e570fab69c523409c8d,465917095d1b8b7359e781ee782c2c26
22 | 20,20,0.8550435415929286,ef42bd4fa577ce60a5b82b6781a08c64,d34d1f81e32b9e570fab69c523409c8d,465917095d1b8b7359e781ee782c2c26
23 | 21,21,-0.6292718912190439,252326b1475c78b26365ebc3430adca2,d34d1f81e32b9e570fab69c523409c8d,465917095d1b8b7359e781ee782c2c26
24 | 22,22,0.8550435415929286,ef42bd4fa577ce60a5b82b6781a08c64,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
25 | 23,23,-0.2994240172608278,01a0a328db2dd2a2e8d91bc43f204ba7,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
26 | 24,24,-0.7235141409213913,dd868ca2c498f3384250f431e7767b34,e82eb366fa5e481c13b09c24eeeb036d,465917095d1b8b7359e781ee782c2c26
27 | 25,25,-0.95911976517726,4c508776e494a9f4bc302b34fdc6e76e,e82eb366fa5e481c13b09c24eeeb036d,465917095d1b8b7359e781ee782c2c26
28 | 26,26,-0.6292718912190439,4c508776e494a9f4bc302b34fdc6e76e,683522ca22eaee449c5ac25c2a84ee52,465917095d1b8b7359e781ee782c2c26
29 | 27,27,-0.6292718912190439,4c508776e494a9f4bc302b34fdc6e76e,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
30 | 28,28,0.3367111682300176,de083a9403b58424cb3834909131a6de,9f43744cf9ae18357d9da7e6b130d3ea,465917095d1b8b7359e781ee782c2c26
31 | 29,29,-0.2994240172608278,4c508776e494a9f4bc302b34fdc6e76e,e98a5a6ce8eca89f5d9084dee8079f60,465917095d1b8b7359e781ee782c2c26
32 | 30,30,-0.2994240172608278,4c508776e494a9f4bc302b34fdc6e76e,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
33 | 31,31,-0.2994240172608278,4c508776e494a9f4bc302b34fdc6e76e,e98a5a6ce8eca89f5d9084dee8079f60,465917095d1b8b7359e781ee782c2c26
34 | 32,32,0.17178723125090953,de083a9403b58424cb3834909131a6de,75fb3fbc11695c908a1397f96079949b,7ab06c804ac515866a347cb9a54bf2c8
35 | 33,33,-0.32298457968641464,de083a9403b58424cb3834909131a6de,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
36 | 34,34,0.5016351052091257,de083a9403b58424cb3834909131a6de,836017345da8a6725b8eed235c5ec3d0,75b8605bfcb7433d5bd178b3a0a2d38c
37 | 35,35,-0.5585902039422833,de083a9403b58424cb3834909131a6de,7e7fdf8c70a61405fea41ab1bf7cca25,6476528092c639c0ea8f74062f3dd1bb
38 | 36,36,-0.13450008028171972,dd868ca2c498f3384250f431e7767b34,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
39 | 37,37,-0.39366626696317525,5c1e1f8eb530ea4363c04483cd523ac4,e0b3b6a06ee21c261d6937e51fef2f47,6476528092c639c0ea8f74062f3dd1bb
40 | 38,38,-0.3701057045375884,3f7aceec173a91029fead403c0fa4bc9,5eb5c6e65b24468fef997e461cb97425,0d1d6660a28b567ddedbaa991a056feb
41 | 39,39,-0.46434795423993586,a37dab32ea544e235487fb30dc1b29f1,7e7fdf8c70a61405fea41ab1bf7cca25,6476528092c639c0ea8f74062f3dd1bb
42 | 40,40,4.15352228117509,ef42bd4fa577ce60a5b82b6781a08c64,ae34b8819f09a1df5bc36a3ebb2ca7c1,75b8605bfcb7433d5bd178b3a0a2d38c
43 | 41,41,0.03042385669738834,a37dab32ea544e235487fb30dc1b29f1,8378abb75da7f75587cb8cd4b687c929,82f96bb81ce9feeb0c973d24adccc347
44 | 42,42,-0.4879085166655227,4c508776e494a9f4bc302b34fdc6e76e,e82eb366fa5e481c13b09c24eeeb036d,465917095d1b8b7359e781ee782c2c26
45 | 43,43,-0.95911976517726,4c508776e494a9f4bc302b34fdc6e76e,e82eb366fa5e481c13b09c24eeeb036d,465917095d1b8b7359e781ee782c2c26
46 | 44,44,-0.46434795423993586,4c508776e494a9f4bc302b34fdc6e76e,683522ca22eaee449c5ac25c2a84ee52,465917095d1b8b7359e781ee782c2c26
47 | 45,45,-0.7941958281981519,4c508776e494a9f4bc302b34fdc6e76e,14692cff9f8196fb8846653310d39719,465917095d1b8b7359e781ee782c2c26
48 |
--------------------------------------------------------------------------------
/obp/dataset/reward_type.py:
--------------------------------------------------------------------------------
1 | import enum
2 |
3 |
4 | class RewardType(enum.Enum):
5 | """Reward type.
6 |
7 | Attributes
8 | ----------
9 | BINARY:
10 | The reward type is binary.
11 | CONTINUOUS:
12 | The reward type is continuous.
13 | """
14 |
15 | BINARY = "binary"
16 | CONTINUOUS = "continuous"
17 |
18 | def __repr__(self) -> str:
19 |
20 | return str(self)
21 |
--------------------------------------------------------------------------------
/obp/ope/__init__.py:
--------------------------------------------------------------------------------
1 | from obp.ope.classification_model import ImportanceWeightEstimator
2 | from obp.ope.classification_model import PropensityScoreEstimator
3 | from obp.ope.estimators import BalancedInverseProbabilityWeighting
4 | from obp.ope.estimators import BaseOffPolicyEstimator
5 | from obp.ope.estimators import DirectMethod
6 | from obp.ope.estimators import DoublyRobust
7 | from obp.ope.estimators import DoublyRobustWithShrinkage
8 | from obp.ope.estimators import InverseProbabilityWeighting
9 | from obp.ope.estimators import ReplayMethod
10 | from obp.ope.estimators import SelfNormalizedDoublyRobust
11 | from obp.ope.estimators import SelfNormalizedInverseProbabilityWeighting
12 | from obp.ope.estimators import SubGaussianDoublyRobust
13 | from obp.ope.estimators import SubGaussianInverseProbabilityWeighting
14 | from obp.ope.estimators import SwitchDoublyRobust
15 | from obp.ope.estimators_continuous import (
16 | KernelizedSelfNormalizedInverseProbabilityWeighting,
17 | )
18 | from obp.ope.estimators_continuous import BaseContinuousOffPolicyEstimator
19 | from obp.ope.estimators_continuous import cosine_kernel
20 | from obp.ope.estimators_continuous import epanechnikov_kernel
21 | from obp.ope.estimators_continuous import gaussian_kernel
22 | from obp.ope.estimators_continuous import KernelizedDoublyRobust
23 | from obp.ope.estimators_continuous import KernelizedInverseProbabilityWeighting
24 | from obp.ope.estimators_continuous import triangular_kernel
25 | from obp.ope.estimators_embed import (
26 | SelfNormalizedMarginalizedInverseProbabilityWeighting,
27 | )
28 | from obp.ope.estimators_embed import MarginalizedInverseProbabilityWeighting
29 | from obp.ope.estimators_multi import BaseMultiLoggersOffPolicyEstimator
30 | from obp.ope.estimators_multi import MultiLoggersBalancedDoublyRobust
31 | from obp.ope.estimators_multi import MultiLoggersBalancedInverseProbabilityWeighting
32 | from obp.ope.estimators_multi import MultiLoggersNaiveDoublyRobust
33 | from obp.ope.estimators_multi import MultiLoggersNaiveInverseProbabilityWeighting
34 | from obp.ope.estimators_multi import MultiLoggersWeightedDoublyRobust
35 | from obp.ope.estimators_multi import MultiLoggersWeightedInverseProbabilityWeighting
36 | from obp.ope.estimators_slate import SelfNormalizedSlateIndependentIPS
37 | from obp.ope.estimators_slate import SelfNormalizedSlateRewardInteractionIPS
38 | from obp.ope.estimators_slate import SelfNormalizedSlateStandardIPS
39 | from obp.ope.estimators_slate import SlateCascadeDoublyRobust
40 | from obp.ope.estimators_slate import SlateIndependentIPS
41 | from obp.ope.estimators_slate import SlateRewardInteractionIPS
42 | from obp.ope.estimators_slate import SlateStandardIPS
43 | from obp.ope.estimators_tuning import DoublyRobustTuning
44 | from obp.ope.estimators_tuning import DoublyRobustWithShrinkageTuning
45 | from obp.ope.estimators_tuning import InverseProbabilityWeightingTuning
46 | from obp.ope.estimators_tuning import SubGaussianDoublyRobustTuning
47 | from obp.ope.estimators_tuning import SubGaussianInverseProbabilityWeightingTuning
48 | from obp.ope.estimators_tuning import SwitchDoublyRobustTuning
49 | from obp.ope.meta import OffPolicyEvaluation
50 | from obp.ope.meta_continuous import ContinuousOffPolicyEvaluation
51 | from obp.ope.meta_multi import MultiLoggersOffPolicyEvaluation
52 | from obp.ope.meta_slate import SlateOffPolicyEvaluation
53 | from obp.ope.regression_model import RegressionModel
54 | from obp.ope.regression_model_slate import SlateRegressionModel
55 |
56 |
57 | __all__ = [
58 | "BaseOffPolicyEstimator",
59 | "ReplayMethod",
60 | "InverseProbabilityWeighting",
61 | "SelfNormalizedInverseProbabilityWeighting",
62 | "DirectMethod",
63 | "DoublyRobust",
64 | "SelfNormalizedDoublyRobust",
65 | "SwitchDoublyRobust",
66 | "DoublyRobustWithShrinkage",
67 | "SubGaussianInverseProbabilityWeighting",
68 | "SubGaussianDoublyRobust",
69 | "InverseProbabilityWeightingTuning",
70 | "DoublyRobustTuning",
71 | "SwitchDoublyRobustTuning",
72 | "DoublyRobustWithShrinkageTuning",
73 | "SubGaussianInverseProbabilityWeightingTuning",
74 | "SubGaussianDoublyRobustTuning",
75 | "MarginalizedInverseProbabilityWeighting",
76 | "SelfNormalizedMarginalizedInverseProbabilityWeighting",
77 | "BaseMultiLoggersOffPolicyEstimator",
78 | "MultiLoggersNaiveInverseProbabilityWeighting",
79 | "MultiLoggersWeightedInverseProbabilityWeighting",
80 | "MultiLoggersBalancedInverseProbabilityWeighting",
81 | "MultiLoggersNaiveDoublyRobust",
82 | "MultiLoggersBalancedDoublyRobust",
83 | "MultiLoggersWeightedDoublyRobust",
84 | "OffPolicyEvaluation",
85 | "SlateOffPolicyEvaluation",
86 | "ContinuousOffPolicyEvaluation",
87 | "MultiLoggersOffPolicyEvaluation",
88 | "RegressionModel",
89 | "SlateRegressionModel",
90 | "SlateStandardIPS",
91 | "SlateIndependentIPS",
92 | "SlateRewardInteractionIPS",
93 | "SlateCascadeDoublyRobust",
94 | "SelfNormalizedSlateRewardInteractionIPS",
95 | "SelfNormalizedSlateIndependentIPS",
96 | "SelfNormalizedSlateStandardIPS",
97 | "BalancedInverseProbabilityWeighting",
98 | "ImportanceWeightEstimator",
99 | "PropensityScoreEstimator",
100 | "BaseContinuousOffPolicyEstimator",
101 | "KernelizedInverseProbabilityWeighting",
102 | "KernelizedSelfNormalizedInverseProbabilityWeighting",
103 | "KernelizedDoublyRobust",
104 | "triangular_kernel",
105 | "gaussian_kernel",
106 | "epanechnikov_kernel",
107 | "cosine_kernel",
108 | ]
109 |
110 | __all_estimators__ = [
111 | "ReplayMethod",
112 | "InverseProbabilityWeighting",
113 | "SelfNormalizedInverseProbabilityWeighting",
114 | "DirectMethod",
115 | "DoublyRobust",
116 | "DoublyRobustWithShrinkage",
117 | "SwitchDoublyRobust",
118 | "SelfNormalizedDoublyRobust",
119 | "SubGaussianInverseProbabilityWeighting",
120 | "SubGaussianDoublyRobust",
121 | "BalancedInverseProbabilityWeighting",
122 | ]
123 |
124 |
125 | __all_estimators_tuning__ = [
126 | "InverseProbabilityWeightingTuning",
127 | "DoublyRobustTuning",
128 | "SwitchDoublyRobustTuning",
129 | "DoublyRobustWithShrinkageTuning",
130 | ]
131 |
132 |
133 | __all_estimators_tuning_sg__ = [
134 | "SubGaussianInverseProbabilityWeightingTuning",
135 | "SubGaussianDoublyRobustTuning",
136 | ]
137 |
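
The exports above group the individual OPE estimators, their tuned variants, and the OffPolicyEvaluation meta classes. A minimal sketch of how they are typically combined (keyword names follow obp's documented quickstart and should be read as the editor's assumptions, not as a verbatim excerpt from this repository):

# Editor's sketch: wiring estimators from this module into OffPolicyEvaluation.
import numpy as np

from obp.dataset import SyntheticBanditDataset
from obp.ope import (
    InverseProbabilityWeighting,
    OffPolicyEvaluation,
    SelfNormalizedInverseProbabilityWeighting,
)

dataset = SyntheticBanditDataset(n_actions=5, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=1000)

# A uniform-random evaluation policy, shape (n_rounds, n_actions, len_list).
action_dist = np.full((1000, 5, 1), 1.0 / 5)

ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[
        InverseProbabilityWeighting(),
        SelfNormalizedInverseProbabilityWeighting(),
    ],
)
print(ope.estimate_policy_values(action_dist=action_dist))
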
--------------------------------------------------------------------------------
/obp/policy/__init__.py:
--------------------------------------------------------------------------------
1 | from obp.policy.base import BaseContextFreePolicy
2 | from obp.policy.base import BaseContextualPolicy
3 | from obp.policy.base import BaseContinuousOfflinePolicyLearner
4 | from obp.policy.base import BaseOfflinePolicyLearner
5 | from obp.policy.contextfree import BernoulliTS
6 | from obp.policy.contextfree import EpsilonGreedy
7 | from obp.policy.contextfree import Random
8 | from obp.policy.linear import LinEpsilonGreedy
9 | from obp.policy.linear import LinTS
10 | from obp.policy.linear import LinUCB
11 | from obp.policy.logistic import LogisticEpsilonGreedy
12 | from obp.policy.logistic import LogisticTS
13 | from obp.policy.logistic import LogisticUCB
14 | from obp.policy.logistic import MiniBatchLogisticRegression
15 | from obp.policy.offline import IPWLearner
16 | from obp.policy.offline import NNPolicyLearner
17 | from obp.policy.offline import QLearner
18 | from obp.policy.offline_continuous import ContinuousNNPolicyLearner
19 |
20 |
21 | __all__ = [
22 | "BaseContextFreePolicy",
23 | "BaseContextualPolicy",
24 | "BaseOfflinePolicyLearner",
25 | "BaseContinuousOfflinePolicyLearner",
26 | "EpsilonGreedy",
27 | "Random",
28 | "BernoulliTS",
29 | "LinEpsilonGreedy",
30 | "LinUCB",
31 | "LinTS",
32 | "LogisticEpsilonGreedy",
33 | "LogisticUCB",
34 | "LogisticTS",
35 | "MiniBatchLogisticRegression",
36 | "IPWLearner",
37 | "NNPolicyLearner",
38 | "QLearner",
39 | "ContinuousNNPolicyLearner",
40 | ]
41 |
--------------------------------------------------------------------------------
/obp/policy/conf/prior_bts.yaml:
--------------------------------------------------------------------------------
1 | all:
2 | alpha:
3 | - 47.0
4 | - 8.0
5 | - 62.0
6 | - 142.0
7 | - 3.0
8 | - 14.0
9 | - 7.0
10 | - 857.0
11 | - 12.0
12 | - 15.0
13 | - 6.0
14 | - 100.0
15 | - 48.0
16 | - 23.0
17 | - 71.0
18 | - 61.0
19 | - 13.0
20 | - 16.0
21 | - 518.0
22 | - 30.0
23 | - 7.0
24 | - 4.0
25 | - 23.0
26 | - 8.0
27 | - 10.0
28 | - 11.0
29 | - 11.0
30 | - 18.0
31 | - 121.0
32 | - 11.0
33 | - 11.0
34 | - 10.0
35 | - 14.0
36 | - 9.0
37 | - 204.0
38 | - 58.0
39 | - 3.0
40 | - 19.0
41 | - 42.0
42 | - 1013.0
43 | - 2.0
44 | - 328.0
45 | - 15.0
46 | - 31.0
47 | - 14.0
48 | - 138.0
49 | - 45.0
50 | - 55.0
51 | - 23.0
52 | - 38.0
53 | - 10.0
54 | - 401.0
55 | - 52.0
56 | - 6.0
57 | - 3.0
58 | - 6.0
59 | - 5.0
60 | - 32.0
61 | - 35.0
62 | - 133.0
63 | - 52.0
64 | - 820.0
65 | - 43.0
66 | - 195.0
67 | - 8.0
68 | - 42.0
69 | - 40.0
70 | - 4.0
71 | - 32.0
72 | - 30.0
73 | - 9.0
74 | - 22.0
75 | - 6.0
76 | - 23.0
77 | - 5.0
78 | - 54.0
79 | - 8.0
80 | - 22.0
81 | - 65.0
82 | - 246.0
83 | beta:
84 | - 12198.0
85 | - 3566.0
86 | - 15993.0
87 | - 35522.0
88 | - 2367.0
89 | - 4609.0
90 | - 3171.0
91 | - 181745.0
92 | - 4372.0
93 | - 4951.0
94 | - 3100.0
95 | - 24665.0
96 | - 13210.0
97 | - 7061.0
98 | - 18061.0
99 | - 17449.0
100 | - 5644.0
101 | - 6787.0
102 | - 111326.0
103 | - 8776.0
104 | - 3334.0
105 | - 2271.0
106 | - 7389.0
107 | - 2659.0
108 | - 3665.0
109 | - 4724.0
110 | - 3561.0
111 | - 5085.0
112 | - 27407.0
113 | - 4601.0
114 | - 4756.0
115 | - 4120.0
116 | - 4736.0
117 | - 3788.0
118 | - 45292.0
119 | - 14719.0
120 | - 2189.0
121 | - 5589.0
122 | - 11995.0
123 | - 222255.0
124 | - 2308.0
125 | - 70034.0
126 | - 4801.0
127 | - 8274.0
128 | - 5421.0
129 | - 31912.0
130 | - 12213.0
131 | - 13576.0
132 | - 6230.0
133 | - 10382.0
134 | - 4141.0
135 | - 85731.0
136 | - 12811.0
137 | - 2707.0
138 | - 2250.0
139 | - 2668.0
140 | - 2886.0
141 | - 9581.0
142 | - 9465.0
143 | - 28336.0
144 | - 12062.0
145 | - 162793.0
146 | - 12107.0
147 | - 41240.0
148 | - 3162.0
149 | - 11604.0
150 | - 10818.0
151 | - 2923.0
152 | - 8897.0
153 | - 8654.0
154 | - 4000.0
155 | - 6580.0
156 | - 3174.0
157 | - 6766.0
158 | - 2602.0
159 | - 14506.0
160 | - 3968.0
161 | - 7523.0
162 | - 16532.0
163 | - 51964.0
164 | men:
165 | alpha:
166 | - 47.0
167 | - 8.0
168 | - 62.0
169 | - 142.0
170 | - 3.0
171 | - 6.0
172 | - 100.0
173 | - 48.0
174 | - 23.0
175 | - 71.0
176 | - 61.0
177 | - 13.0
178 | - 16.0
179 | - 518.0
180 | - 30.0
181 | - 7.0
182 | - 4.0
183 | - 23.0
184 | - 8.0
185 | - 10.0
186 | - 11.0
187 | - 11.0
188 | - 18.0
189 | - 121.0
190 | - 11.0
191 | - 4.0
192 | - 32.0
193 | - 30.0
194 | - 9.0
195 | - 22.0
196 | - 6.0
197 | - 23.0
198 | - 5.0
199 | - 54.0
200 | beta:
201 | - 12198.0
202 | - 3566.0
203 | - 15993.0
204 | - 35522.0
205 | - 2367.0
206 | - 3100.0
207 | - 24665.0
208 | - 13210.0
209 | - 7061.0
210 | - 18061.0
211 | - 17449.0
212 | - 5644.0
213 | - 6787.0
214 | - 111326.0
215 | - 8776.0
216 | - 3334.0
217 | - 2271.0
218 | - 7389.0
219 | - 2659.0
220 | - 3665.0
221 | - 4724.0
222 | - 3561.0
223 | - 5085.0
224 | - 27407.0
225 | - 4601.0
226 | - 2923.0
227 | - 8897.0
228 | - 8654.0
229 | - 4000.0
230 | - 6580.0
231 | - 3174.0
232 | - 6766.0
233 | - 2602.0
234 | - 14506.0
235 | women:
236 | alpha:
237 | - 12.0
238 | - 7.0
239 | - 984.0
240 | - 13.0
241 | - 15.0
242 | - 15.0
243 | - 11.0
244 | - 14.0
245 | - 9.0
246 | - 200.0
247 | - 72.0
248 | - 3.0
249 | - 14.0
250 | - 49.0
251 | - 1278.0
252 | - 3.0
253 | - 325.0
254 | - 14.0
255 | - 27.0
256 | - 14.0
257 | - 169.0
258 | - 48.0
259 | - 47.0
260 | - 18.0
261 | - 40.0
262 | - 12.0
263 | - 447.0
264 | - 46.0
265 | - 5.0
266 | - 3.0
267 | - 5.0
268 | - 7.0
269 | - 35.0
270 | - 34.0
271 | - 99.0
272 | - 30.0
273 | - 880.0
274 | - 51.0
275 | - 182.0
276 | - 6.0
277 | - 45.0
278 | - 39.0
279 | - 10.0
280 | - 24.0
281 | - 72.0
282 | - 229.0
283 | beta:
284 | - 3612.0
285 | - 3173.0
286 | - 204484.0
287 | - 4517.0
288 | - 4765.0
289 | - 5331.0
290 | - 4131.0
291 | - 4728.0
292 | - 4028.0
293 | - 44280.0
294 | - 17918.0
295 | - 2309.0
296 | - 4339.0
297 | - 12922.0
298 | - 270771.0
299 | - 2480.0
300 | - 68475.0
301 | - 5129.0
302 | - 7367.0
303 | - 5819.0
304 | - 38026.0
305 | - 13047.0
306 | - 11604.0
307 | - 5394.0
308 | - 10912.0
309 | - 4439.0
310 | - 94485.0
311 | - 10700.0
312 | - 2679.0
313 | - 2319.0
314 | - 2578.0
315 | - 3288.0
316 | - 9566.0
317 | - 9775.0
318 | - 20120.0
319 | - 7317.0
320 | - 172026.0
321 | - 13673.0
322 | - 37329.0
323 | - 3365.0
324 | - 10911.0
325 | - 10734.0
326 | - 4278.0
327 | - 7574.0
328 | - 16826.0
329 | - 47462.0
330 |
331 |
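
The file above stores Beta priors for the Bernoulli Thompson Sampling behavior policy: for each campaign (all, men, women) there is one alpha (prior success count) and one beta (prior failure count) per action. A hedged sketch of how such priors translate into a Thompson draw (editor's illustration; PyYAML and the repository root as working directory are assumptions):

# Editor's sketch: read the per-campaign Beta priors and draw one Thompson sample
# per action; the action with the largest sampled CTR would be chosen.
import numpy as np
import yaml

with open("obp/policy/conf/prior_bts.yaml") as f:
    prior = yaml.safe_load(f)

alpha = np.array(prior["women"]["alpha"])  # prior successes per action
beta = np.array(prior["women"]["beta"])    # prior failures per action
theta = np.random.beta(alpha, beta)        # sampled CTRs, one per action
print(int(np.argmax(theta)))               # index of the action with the largest sample
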
--------------------------------------------------------------------------------
/obp/policy/policy_type.py:
--------------------------------------------------------------------------------
1 | import enum
2 |
3 |
4 | class PolicyType(enum.Enum):
5 | """Policy type.
6 |
7 | Attributes
8 | ----------
9 | CONTEXT_FREE:
10 | The policy type is contextfree.
11 | CONTEXTUAL:
12 | The policy type is contextual.
13 | OFFLINE:
14 | The policy type is offline.
15 | """
16 |
17 | CONTEXT_FREE = enum.auto()
18 | CONTEXTUAL = enum.auto()
19 | OFFLINE = enum.auto()
20 |
21 | def __repr__(self) -> str:
22 |
23 | return str(self)
24 |
--------------------------------------------------------------------------------
/obp/simulator/__init__.py:
--------------------------------------------------------------------------------
1 | from obp.simulator.simulator import calc_ground_truth_policy_value
2 |
3 |
4 | __all__ = [
5 | "calc_ground_truth_policy_value",
6 | ]
7 |
--------------------------------------------------------------------------------
/obp/simulator/delay_sampler.py:
--------------------------------------------------------------------------------
1 | from dataclasses import dataclass
2 |
3 | import numpy as np
4 | from sklearn.utils import check_random_state
5 |
6 |
7 | @dataclass
8 | class ExponentialDelaySampler:
9 | """Class for sampling delays from different exponential functions.
10 |
11 | Parameters
12 | -----------
13 | max_scale: float, default=100.0
14 |         The maximum scale parameter for the exponential delay distribution. When the weighted exponential
15 |         function is not used, max_scale becomes the default scale.
16 |
17 | min_scale: float, default=10.0
18 | The minimum scale parameter for the exponential delay distribution. Only used when sampling from a weighted
19 | exponential function.
20 |
21 |     random_state: int, default=None
22 |         Controls the random seed in sampling delays. Must be given (None raises a ValueError).
23 | """
24 |
25 | max_scale: float = 100.0
26 | min_scale: float = 10.0
27 | random_state: int = None
28 |
29 | def __post_init__(self) -> None:
30 | if self.random_state is None:
31 | raise ValueError("`random_state` must be given")
32 | self.random_ = check_random_state(self.random_state)
33 |
34 | def exponential_delay_function(
35 | self, n_rounds: int, n_actions: int, **kwargs
36 | ) -> np.ndarray:
37 | """Exponential delay function used for sampling a number of delay rounds before rewards can be observed.
38 |
39 | Note
40 | ------
41 |         This implementation of the exponential delay function assumes that there is no causal relationship between the
42 |         context, action, or reward and the observed delay. Exponential delay functions have been observed by Ktena, S.I. et al.
43 |
44 | Parameters
45 | -----------
46 | n_rounds: int
47 | Number of rounds to sample delays for.
48 |
49 | n_actions: int
50 |             Number of actions to sample delays for. If the exponential function is not parameterised, the delays are
51 |             repeated for each action.
52 |
53 | Returns
54 | ---------
55 |         delay_rounds: array-like, shape (n_rounds, n_actions)
56 |             Rounded-up delays representing the number of rounds before the policy can observe the rewards.
57 |
58 | References
59 | ------------
60 | Ktena, S.I., Tejani, A., Theis, L., Myana, P.K., Dilipkumar, D., Huszár, F., Yoo, S. and Shi, W.
61 | "Addressing delayed feedback for continuous training with neural networks in CTR prediction." 2019.
62 |
63 | """
64 | delays_per_round = np.ceil(
65 | self.random_.exponential(scale=self.max_scale, size=n_rounds)
66 | )
67 |
68 | return np.tile(delays_per_round, (n_actions, 1)).T
69 |
70 | def exponential_delay_function_expected_reward_weighted(
71 | self, expected_rewards: np.ndarray, **kwargs
72 | ) -> np.ndarray:
73 | """Exponential delay function used for sampling a number of delay rounds before rewards can be observed.
74 |         Each delay is conditioned on the expected reward by scaling the exponential distribution with (1 - expected_reward). This encodes
75 |         the assumption that the more likely a reward is to be observed, the sooner it tends to arrive,
76 |         e.g., recommending an attractive item will likely result in a faster purchase.
77 |
78 | Parameters
79 | -----------
80 | expected_rewards : array-like, shape (n_rounds, n_actions)
81 |             The expected reward, between 0 and 1, for each arm in each round. This is used to weight the scale of the
82 |             exponential function.
83 |
84 | Returns
85 | ---------
86 |         delay_rounds: array-like, shape (n_rounds, n_actions)
87 |             Rounded-up delays representing the number of rounds before the policy can observe the rewards.
88 | """
89 | scale = self.min_scale + (
90 | (1 - expected_rewards) * (self.max_scale - self.min_scale)
91 | )
92 | delays_per_round = np.ceil(
93 | self.random_.exponential(scale=scale, size=expected_rewards.shape)
94 | )
95 |
96 | return delays_per_round
97 |
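A brief usage sketch of the sampler defined above; the round and action counts are illustrative:

import numpy as np

sampler = ExponentialDelaySampler(max_scale=100.0, min_scale=10.0, random_state=12345)

# unconditional delays: one exponential draw per round, repeated for every action
delays = sampler.exponential_delay_function(n_rounds=5, n_actions=3)
print(delays.shape)  # (5, 3)

# reward-weighted delays: higher expected reward -> smaller scale -> shorter delays
expected_rewards = np.random.uniform(size=(5, 3))
weighted_delays = sampler.exponential_delay_function_expected_reward_weighted(
    expected_rewards=expected_rewards
)
print(weighted_delays.shape)  # (5, 3)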
--------------------------------------------------------------------------------
/obp/simulator/replay.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from tqdm import tqdm
3 |
4 | from obp.policy.policy_type import PolicyType
5 | from obp.simulator.simulator import BanditPolicy
6 | from obp.types import BanditFeedback
7 | from obp.utils import check_bandit_feedback_inputs, convert_to_action_dist
8 |
9 |
10 | def run_bandit_replay(
11 | bandit_feedback: BanditFeedback, policy: BanditPolicy
12 | ) -> np.ndarray:
13 | """Run an online bandit algorithm on given logged bandit feedback data using the replay method.
14 |
15 | Parameters
16 | ----------
17 | bandit_feedback: BanditFeedback
18 | Logged bandit data used in offline bandit simulation.
19 | policy: BanditPolicy
20 | Online bandit policy to be evaluated in offline bandit simulation (i.e., evaluation policy).
21 | Returns
22 | --------
23 | action_dist: array-like, shape (n_rounds, n_actions, len_list)
24 | Action choice probabilities (can be deterministic).
25 |
26 | References
27 | ------------
28 | Lihong Li, Wei Chu, John Langford, and Xuanhui Wang.
29 | "Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms.", 2011.
30 | """
31 | for key_ in ["action", "position", "reward", "pscore", "context"]:
32 | if key_ not in bandit_feedback:
33 | raise RuntimeError(f"Missing key of {key_} in 'bandit_feedback'.")
34 | check_bandit_feedback_inputs(
35 | context=bandit_feedback["context"],
36 | action=bandit_feedback["action"],
37 | reward=bandit_feedback["reward"],
38 | position=bandit_feedback["position"],
39 | pscore=bandit_feedback["pscore"],
40 | )
41 |
42 | policy_ = policy
43 | selected_actions_list = list()
44 | dim_context = bandit_feedback["context"].shape[1]
45 | if bandit_feedback["position"] is None:
46 | bandit_feedback["position"] = np.zeros_like(
47 | bandit_feedback["action"], dtype=int
48 | )
49 | for action_, reward_, position_, context_ in tqdm(
50 | zip(
51 | bandit_feedback["action"],
52 | bandit_feedback["reward"],
53 | bandit_feedback["position"],
54 | bandit_feedback["context"],
55 | ),
56 | total=bandit_feedback["n_rounds"],
57 | ):
58 |
59 | # select a list of actions
60 | if policy_.policy_type == PolicyType.CONTEXT_FREE:
61 | selected_actions = policy_.select_action()
62 | elif policy_.policy_type == PolicyType.CONTEXTUAL:
63 | selected_actions = policy_.select_action(context_.reshape(1, dim_context))
64 | action_match_ = action_ == selected_actions[position_]
65 | # update parameters of a bandit policy
66 | # only when selected actions&positions are equal to logged actions&positions
67 | if action_match_:
68 | if policy_.policy_type == PolicyType.CONTEXT_FREE:
69 | policy_.update_params(action=action_, reward=reward_)
70 | elif policy_.policy_type == PolicyType.CONTEXTUAL:
71 | policy_.update_params(
72 | action=action_,
73 | reward=reward_,
74 | context=context_.reshape(1, dim_context),
75 | )
76 | selected_actions_list.append(selected_actions)
77 |
78 | action_dist = convert_to_action_dist(
79 | n_actions=bandit_feedback["action"].max() + 1,
80 | selected_actions=np.array(selected_actions_list),
81 | )
82 | return action_dist
83 |
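A minimal usage sketch for `run_bandit_replay`, assuming synthetic logged data from `SyntheticBanditDataset` and the context-free `EpsilonGreedy` policy; the hyperparameters are illustrative:

from obp.dataset import SyntheticBanditDataset, logistic_reward_function
from obp.policy import EpsilonGreedy
from obp.simulator.replay import run_bandit_replay

dataset = SyntheticBanditDataset(
    n_actions=10,
    dim_context=5,
    reward_function=logistic_reward_function,
    random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=1000)

# replay the online policy on the logged data; its parameters are updated
# only on rounds where the policy's choice matches the logged action
policy = EpsilonGreedy(n_actions=10, epsilon=0.1)
action_dist = run_bandit_replay(bandit_feedback=bandit_feedback, policy=policy)
print(action_dist.shape)  # (n_rounds, n_actions, len_list)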
--------------------------------------------------------------------------------
/obp/types.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Yuta Saito, Yusuke Narita, and ZOZO Technologies, Inc. All rights reserved.
2 | # Licensed under the Apache 2.0 License.
3 |
4 | """Types."""
5 | from typing import Dict
6 | from typing import Union
7 |
8 | import numpy as np
9 |
10 |
11 | # dataset
12 | BanditFeedback = Dict[str, Union[int, np.ndarray]]
13 |
--------------------------------------------------------------------------------
/obp/version.py:
--------------------------------------------------------------------------------
1 | __version__ = "0.5.5"
2 |
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [tool.poetry]
2 | name = "obp"
3 | version = "0.5.5"
4 | description = "Open Bandit Pipeline: a python library for off-policy evaluation and learning"
5 | authors = ["Yuta Saito "]
6 | license = "Apache License 2.0"
7 |
8 | [tool.poetry.dependencies]
9 | python = ">=3.7.1,<3.10"
10 | torch = "^1.9.0"
11 | scikit-learn = "1.0.2"
12 | pandas = "^1.3.2"
13 | numpy = "^1.21.2"
14 | matplotlib = "^3.4.3"
15 | tqdm = "^4.62.2"
16 | scipy = "1.7.3"
17 | PyYAML = "^5.4.1"
18 | seaborn = "^0.11.2"
19 | pyieoe = "^0.1.1"
20 | pingouin = "^0.4.0"
21 | mypy-extensions = "^0.4.3"
22 | Pillow = "9.1.1"
23 |
24 | [tool.poetry.dev-dependencies]
25 | flake8 = "^3.9.2"
26 | black = "22.1.0"
27 | pytest = "^6.2.5"
28 | isort = "^5.9.3"
29 |
30 | [build-system]
31 | requires = ["poetry-core>=1.0.0"]
32 | build-backend = "poetry.core.masonry.api"
33 |
34 | [tool.isort]
35 | profile = 'black'
36 | src_paths = ['obp', 'tests', 'examples', 'benchmark']
37 | line_length = 88
38 | lines_after_imports = 2
39 | force_single_line = 'True'
40 | force_sort_within_sections = 'True'
41 | order_by_type = 'False'
42 |
43 | [tool.pytest.ini_options]
44 | addopts = "--color=yes"
45 |
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [flake8]
2 | ignore =
3 | E501,W503,W605,E203
4 | # We ignore E501: line too long because we assume
5 | # the checking of code length is already done by black.
6 | # We ignore W503: line break before binary operator because it is incompatible with black
7 | # We ignore W605: invalid escape sequence because it is needed to write math equations
8 | # We ignore E203: whitespace before ':'
9 | exclude = .venv,build
10 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from obp.version import __version__
2 | from setuptools import setup, find_packages
3 | from os import path
4 | import sys
5 |
6 | here = path.abspath(path.dirname(__file__))
7 | sys.path.insert(0, path.join(here, "obp"))
8 |
9 | print("version")
10 | print(__version__)
11 |
12 | with open(path.join(here, "README.md"), encoding="utf-8") as f:
13 | long_description = f.read()
14 |
15 | package_data_list = ["obp/policy/conf/prior_bts.yaml", "obp/dataset/obd"]
16 |
17 | setup(
18 | name="obp",
19 | version=__version__,
20 | description="Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation",
21 | url="https://github.com/st-tech/zr-obp",
22 | author="Yuta Saito",
23 | author_email="open-bandit-project@googlegroups.com",
24 | keywords=["bandit algorithms", "off-policy evaluation"],
25 | long_description=long_description,
26 | long_description_content_type="text/markdown",
27 | install_requires=[
28 | "matplotlib>=3.4.3",
29 | "mypy-extensions>=0.4.3",
30 | "numpy>=1.21.2",
31 | "pandas>=1.3.2",
32 | "pyyaml>=5.1",
33 | "seaborn>=0.10.1",
34 | "scikit-learn>=1.0.2",
35 | "scipy>=1.7.3",
36 | "torch>=1.9.0",
37 | "tqdm>=4.62.2",
38 | "pyieoe>=0.1.1",
39 | "pingouin>=0.4.0",
40 | ],
41 | license="Apache License",
42 | packages=find_packages(
43 | exclude=["benchmark", "docs", "examples", "obd", "tests", "slides"]
44 | ),
45 | package_data={"obp": package_data_list},
46 | include_package_data=True,
47 | classifiers=[
48 | "Intended Audience :: Science/Research",
49 | "Programming Language :: Python :: 3.7",
50 | "Programming Language :: Python :: 3.8",
51 | "Programming Language :: Python :: 3.9",
52 | "Topic :: Scientific/Engineering",
53 | "Topic :: Scientific/Engineering :: Mathematics",
54 | "Topic :: Scientific/Engineering :: Artificial Intelligence",
55 | "Topic :: Software Development",
56 | "Topic :: Software Development :: Libraries",
57 | "Topic :: Software Development :: Libraries :: Python Modules",
58 | "License :: OSI Approved :: Apache Software License",
59 | ],
60 | )
61 |
--------------------------------------------------------------------------------
/slides/slides_EN.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/st-tech/zr-obp/8cbd5fa4558b7ad2ba4781546d6604e4cc3e07c4/slides/slides_EN.pdf
--------------------------------------------------------------------------------
/slides/slides_JN.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/st-tech/zr-obp/8cbd5fa4558b7ad2ba4781546d6604e4cc3e07c4/slides/slides_JN.pdf
--------------------------------------------------------------------------------
/tests/dataset/test_multiclass.py:
--------------------------------------------------------------------------------
1 | from typing import Tuple
2 |
3 | import numpy as np
4 | import pytest
5 | from sklearn.datasets import load_digits
6 | from sklearn.linear_model import LogisticRegression
7 |
8 | from obp.dataset import MultiClassToBanditReduction
9 |
10 |
11 | @pytest.fixture(scope="session")
12 | def raw_data() -> Tuple[np.ndarray, np.ndarray]:
13 | X, y = load_digits(return_X_y=True)
14 | return X, y
15 |
16 |
17 | def test_invalid_initialization(raw_data):
18 | X, y = raw_data
19 |
20 | # invalid alpha_b
21 | with pytest.raises(ValueError):
22 | MultiClassToBanditReduction(
23 | X=X, y=y, base_classifier_b=LogisticRegression(max_iter=10000), alpha_b=-0.3
24 | )
25 |
26 | with pytest.raises(ValueError):
27 | MultiClassToBanditReduction(
28 | X=X, y=y, base_classifier_b=LogisticRegression(max_iter=10000), alpha_b=1.3
29 | )
30 |
31 | # invalid classifier
32 | with pytest.raises(ValueError):
33 | from sklearn.tree import DecisionTreeRegressor
34 |
35 | MultiClassToBanditReduction(X=X, y=y, base_classifier_b=DecisionTreeRegressor)
36 |
37 | # invalid n_def_actions
38 | with pytest.raises(TypeError):
39 | MultiClassToBanditReduction(
40 | X=X,
41 | y=y,
42 | base_classifier_b=LogisticRegression(max_iter=10000),
43 | n_deficient_actions="aaa",
44 | )
45 |
46 | with pytest.raises(TypeError):
47 | MultiClassToBanditReduction(
48 | X=X,
49 | y=y,
50 | base_classifier_b=LogisticRegression(max_iter=10000),
51 | n_deficient_actions=None,
52 | )
53 |
54 | with pytest.raises(ValueError):
55 | MultiClassToBanditReduction(
56 | X=X,
57 | y=y,
58 | base_classifier_b=LogisticRegression(max_iter=10000),
59 | n_deficient_actions=-1,
60 | )
61 |
62 | with pytest.raises(ValueError):
63 | MultiClassToBanditReduction(
64 | X=X,
65 | y=y,
66 | base_classifier_b=LogisticRegression(max_iter=10000),
67 | n_deficient_actions=1000,
68 | )
69 |
70 |
71 | def test_split_train_eval(raw_data):
72 | X, y = raw_data
73 |
74 | eval_size = 1000
75 | mcbr = MultiClassToBanditReduction(
76 | X=X, y=y, base_classifier_b=LogisticRegression(max_iter=10000), alpha_b=0.3
77 | )
78 | mcbr.split_train_eval(eval_size=eval_size)
79 |
80 | assert eval_size == mcbr.n_rounds_ev
81 |
82 |
83 | def test_obtain_batch_bandit_feedback(raw_data):
84 | X, y = raw_data
85 |
86 | for n_deficient_actions in [0, 2]:
87 | mcbr = MultiClassToBanditReduction(
88 | X=X,
89 | y=y,
90 | base_classifier_b=LogisticRegression(max_iter=10000),
91 | alpha_b=0.3,
92 | n_deficient_actions=n_deficient_actions,
93 | )
94 | mcbr.split_train_eval()
95 | bandit_feedback = mcbr.obtain_batch_bandit_feedback()
96 |
97 | assert "n_actions" in bandit_feedback.keys()
98 | assert "n_rounds" in bandit_feedback.keys()
99 | assert "context" in bandit_feedback.keys()
100 | assert "action" in bandit_feedback.keys()
101 | assert "reward" in bandit_feedback.keys()
102 | assert "position" in bandit_feedback.keys()
103 | assert "pi_b" in bandit_feedback.keys()
104 | assert "pscore" in bandit_feedback.keys()
105 |
106 | n_rounds = bandit_feedback["n_rounds"]
107 | pi_b = bandit_feedback["pi_b"]
108 | assert pi_b.shape[0] == n_rounds
109 | n_actions = np.unique(y).shape[0]
110 | assert pi_b.shape[1] == n_actions
111 | assert pi_b.shape[2] == 1
112 | assert np.allclose(pi_b[:, :, 0].sum(1), np.ones(n_rounds))
113 | assert (pi_b == 0).sum() == n_deficient_actions * n_rounds
114 |
115 |
116 | def test_obtain_action_dist_by_eval_policy(raw_data):
117 | X, y = raw_data
118 |
119 | eval_size = 1000
120 | mcbr = MultiClassToBanditReduction(
121 | X=X, y=y, base_classifier_b=LogisticRegression(max_iter=10000), alpha_b=0.3
122 | )
123 | mcbr.split_train_eval(eval_size=eval_size)
124 |
125 | # invalid alpha_e
126 | with pytest.raises(ValueError):
127 | mcbr.obtain_action_dist_by_eval_policy(alpha_e=-0.3)
128 |
129 | with pytest.raises(ValueError):
130 | mcbr.obtain_action_dist_by_eval_policy(alpha_e=1.3)
131 |
132 | # valid type
133 | action_dist = mcbr.obtain_action_dist_by_eval_policy()
134 |
135 | assert action_dist.shape[0] == eval_size
136 | n_actions = np.unique(y).shape[0]
137 | assert action_dist.shape[1] == n_actions
138 | assert action_dist.shape[2] == 1
139 | assert np.allclose(action_dist[:, :, 0].sum(1), np.ones(eval_size))
140 |
141 |
142 | def test_calc_ground_truth_policy_value(raw_data):
143 | X, y = raw_data
144 |
145 | eval_size = 1000
146 | mcbr = MultiClassToBanditReduction(
147 | X=X, y=y, base_classifier_b=LogisticRegression(max_iter=10000), alpha_b=0.3
148 | )
149 | mcbr.split_train_eval(eval_size=eval_size)
150 |
151 | with pytest.raises(ValueError):
152 | invalid_action_dist = np.zeros(eval_size)
153 | mcbr.calc_ground_truth_policy_value(action_dist=invalid_action_dist)
154 |
155 | with pytest.raises(ValueError):
156 | reshaped_action_dist = mcbr.obtain_action_dist_by_eval_policy().reshape(
157 | 1, -1, 1
158 | )
159 | mcbr.calc_ground_truth_policy_value(action_dist=reshaped_action_dist)
160 |
161 | action_dist = mcbr.obtain_action_dist_by_eval_policy()
162 | ground_truth_policy_value = mcbr.calc_ground_truth_policy_value(
163 | action_dist=action_dist
164 | )
165 | assert isinstance(ground_truth_policy_value, float)
166 |
--------------------------------------------------------------------------------
/tests/dataset/test_real.py:
--------------------------------------------------------------------------------
1 | from typing import Dict
2 | from typing import Tuple
3 |
4 | import numpy as np
5 | import pandas as pd
6 | import pytest
7 |
8 | from obp.dataset import OpenBanditDataset
9 |
10 |
11 | def test_real_init():
12 | # behavior_policy
13 | with pytest.raises(ValueError):
14 | OpenBanditDataset(behavior_policy="aaa", campaign="all")
15 |
16 | # campaign
17 | with pytest.raises(ValueError):
18 | OpenBanditDataset(behavior_policy="random", campaign="aaa")
19 |
20 | # data_path
21 | with pytest.raises(ValueError):
22 | OpenBanditDataset(behavior_policy="random", campaign="all", data_path=5)
23 |
24 | # load_raw_data
25 | obd = OpenBanditDataset(behavior_policy="random", campaign="all")
26 | # check the value exists and has the right type
27 | assert (
28 | isinstance(obd.data, pd.DataFrame)
29 | and isinstance(obd.item_context, pd.DataFrame)
30 | and isinstance(obd.action, np.ndarray)
31 | and isinstance(obd.position, np.ndarray)
32 | and isinstance(obd.reward, np.ndarray)
33 | and isinstance(obd.pscore, np.ndarray)
34 | )
35 |
36 | # pre_process (context and action_context)
37 | assert isinstance(obd.context, np.ndarray) and isinstance(
38 | obd.action_context, np.ndarray
39 | )
40 |
41 |
42 | def test_obtain_batch_bandit_feedback():
43 | # invalid test_size
44 | with pytest.raises(ValueError):
45 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
46 | dataset.obtain_batch_bandit_feedback(is_timeseries_split=True, test_size=1.3)
47 |
48 | with pytest.raises(ValueError):
49 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
50 | dataset.obtain_batch_bandit_feedback(is_timeseries_split=True, test_size=-0.5)
51 |
52 | with pytest.raises(TypeError):
53 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
54 | dataset.obtain_batch_bandit_feedback(is_timeseries_split=True, test_size="0.5")
55 |
56 | with pytest.raises(TypeError):
57 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
58 | dataset.obtain_batch_bandit_feedback(is_timeseries_split="True", test_size=0.5)
59 |
60 | # existence of keys
61 | # is_timeseries_split=False (default)
62 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
63 | bandit_feedback = dataset.obtain_batch_bandit_feedback()
64 |
65 | assert "n_rounds" in bandit_feedback.keys()
66 | assert "n_actions" in bandit_feedback.keys()
67 | assert "action" in bandit_feedback.keys()
68 | assert "position" in bandit_feedback.keys()
69 | assert "reward" in bandit_feedback.keys()
70 | assert "pscore" in bandit_feedback.keys()
71 | assert "context" in bandit_feedback.keys()
72 | assert "action_context" in bandit_feedback.keys()
73 |
74 | # is_timeseries_split=True
75 | bandit_feedback_timeseries = dataset.obtain_batch_bandit_feedback(
76 | is_timeseries_split=True
77 | )
78 | assert isinstance(bandit_feedback_timeseries, Tuple)
79 | bandit_feedback_train = bandit_feedback_timeseries[0]
80 | bandit_feedback_test = bandit_feedback_timeseries[1]
81 |
82 | bf_elems = {
83 | "n_rounds",
84 | "n_actions",
85 | "action",
86 | "position",
87 | "reward",
88 | "pscore",
89 | "context",
90 | "action_context",
91 | }
92 | assert all(k in bandit_feedback_train.keys() for k in bf_elems)
93 | assert all(k in bandit_feedback_test.keys() for k in bf_elems)
94 |
95 |
96 | def test_calc_on_policy_policy_value_estimate():
97 | ground_truth_policy_value = OpenBanditDataset.calc_on_policy_policy_value_estimate(
98 | behavior_policy="random", campaign="all"
99 | )
100 | assert isinstance(ground_truth_policy_value, float)
101 |
102 |
103 | def test_sample_bootstrap_bandit_feedback():
104 | with pytest.raises(ValueError):
105 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
106 | dataset.sample_bootstrap_bandit_feedback(
107 | is_timeseries_split=True, test_size=1.3
108 | )
109 |
110 | with pytest.raises(ValueError):
111 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
112 | dataset.sample_bootstrap_bandit_feedback(
113 | is_timeseries_split=True, test_size=-0.5
114 | )
115 |
116 | with pytest.raises(ValueError):
117 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
118 | dataset.sample_bootstrap_bandit_feedback(sample_size=-50)
119 |
120 | with pytest.raises(TypeError):
121 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
122 | dataset.sample_bootstrap_bandit_feedback(sample_size=50.0)
123 |
124 | with pytest.raises(ValueError):
125 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
126 | dataset.sample_bootstrap_bandit_feedback(sample_size=10000000)
127 |
128 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
129 | bandit_feedback = dataset.obtain_batch_bandit_feedback()
130 | bootstrap_bf = dataset.sample_bootstrap_bandit_feedback()
131 |
132 | bf_keys = {"action", "position", "reward", "pscore", "context"}
133 | for k in bf_keys:
134 | assert len(bandit_feedback[k]) == len(bootstrap_bf[k])
135 |
136 | bandit_feedback_timeseries: Dict = dataset.obtain_batch_bandit_feedback(
137 | is_timeseries_split=True
138 | )[0]
139 | bootstrap_bf_timeseries = dataset.sample_bootstrap_bandit_feedback(
140 | is_timeseries_split=True
141 | )
142 | for k in bf_keys:
143 | assert len(bandit_feedback_timeseries[k]) == len(bootstrap_bf_timeseries[k])
144 |
145 | sample_size = 1000
146 | dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
147 | bootstrap_bf = dataset.sample_bootstrap_bandit_feedback(sample_size=sample_size)
148 | assert bootstrap_bf["n_rounds"] == sample_size
149 | for k in bf_keys:
150 | assert len(bootstrap_bf[k]) == sample_size
151 |
--------------------------------------------------------------------------------
/tests/ope/conftest.py:
--------------------------------------------------------------------------------
1 | import copy
2 | import os
3 | from typing import Set
4 |
5 | import numpy as np
6 | import pytest
7 | from scipy import special
8 | from sklearn.utils import check_random_state
9 |
10 | from obp.dataset import linear_behavior_policy
11 | from obp.dataset import logistic_reward_function
12 | from obp.dataset import SyntheticBanditDataset
13 | from obp.dataset import SyntheticBanditDatasetWithActionEmbeds
14 | from obp.dataset import SyntheticContinuousBanditDataset
15 | from obp.dataset import SyntheticMultiLoggersBanditDataset
16 | from obp.dataset import SyntheticSlateBanditDataset
17 | from obp.policy import Random
18 | from obp.types import BanditFeedback
19 | from obp.utils import sigmoid
20 |
21 |
22 | # resolve ImportMismatchError when using virtual environment
23 | os.environ["PY_IGNORE_IMPORTMISMATCH"] = "1"
24 |
25 |
26 | # generate synthetic bandit dataset using SyntheticBanditDataset
27 | @pytest.fixture(scope="session")
28 | def synthetic_bandit_feedback() -> BanditFeedback:
29 | n_actions = 10
30 | dim_context = 5
31 | random_state = 12345
32 | n_rounds = 10000
33 | dataset = SyntheticBanditDataset(
34 | n_actions=n_actions,
35 | dim_context=dim_context,
36 | reward_function=logistic_reward_function,
37 | behavior_policy_function=linear_behavior_policy,
38 | random_state=random_state,
39 | )
40 | bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=n_rounds)
41 | return bandit_feedback
42 |
43 |
44 | # generate synthetic slate bandit dataset using SyntheticSlateBanditDataset
45 | @pytest.fixture(scope="session")
46 | def synthetic_slate_bandit_feedback() -> BanditFeedback:
47 | # set parameters
48 | n_unique_action = 10
49 | len_list = 3
50 | dim_context = 2
51 | reward_type = "binary"
52 | random_state = 12345
53 | n_rounds = 100
54 | dataset = SyntheticSlateBanditDataset(
55 | n_unique_action=n_unique_action,
56 | len_list=len_list,
57 | dim_context=dim_context,
58 | reward_type=reward_type,
59 | random_state=random_state,
60 | )
61 | # obtain feedback
62 | bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=n_rounds)
63 | return bandit_feedback
64 |
65 |
66 | # generate synthetic continuous bandit dataset using SyntheticContinuousBanditDataset
67 | @pytest.fixture(scope="session")
68 | def synthetic_continuous_bandit_feedback() -> BanditFeedback:
69 | # set parameters
70 | dim_context = 2
71 | random_state = 12345
72 | n_rounds = 100
73 | min_action_value = -10
74 | max_action_value = 10
75 | dataset = SyntheticContinuousBanditDataset(
76 | dim_context=dim_context,
77 | min_action_value=min_action_value,
78 | max_action_value=max_action_value,
79 | random_state=random_state,
80 | )
81 | # obtain feedback
82 | bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=n_rounds)
83 | return bandit_feedback
84 |
85 |
86 | @pytest.fixture(scope="session")
87 | def synthetic_multi_bandit_feedback() -> BanditFeedback:
88 | n_actions = 10
89 | dim_context = 5
90 | betas = [-10, -5, 0, 5, 10]
91 | rhos = [1, 2, 3, 2, 1]
92 | random_state = 12345
93 | n_rounds = 10000
94 | dataset = SyntheticMultiLoggersBanditDataset(
95 | n_actions=n_actions,
96 | dim_context=dim_context,
97 | betas=betas,
98 | rhos=rhos,
99 | reward_function=logistic_reward_function,
100 | random_state=random_state,
101 | )
102 | bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=n_rounds)
103 | return bandit_feedback
104 |
105 |
106 | @pytest.fixture(scope="session")
107 | def synthetic_bandit_feedback_with_embed() -> BanditFeedback:
108 | n_actions = 10
109 | dim_context = 5
110 | n_cat_dim = 3
111 | n_cat_per_dim = 5
112 | random_state = 12345
113 | n_rounds = 10000
114 | dataset = SyntheticBanditDatasetWithActionEmbeds(
115 | n_actions=n_actions,
116 | dim_context=dim_context,
117 | n_cat_dim=n_cat_dim,
118 | n_cat_per_dim=n_cat_per_dim,
119 | reward_function=logistic_reward_function,
120 | random_state=random_state,
121 | )
122 | bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=n_rounds)
123 | return bandit_feedback
124 |
125 |
126 | # make the expected reward of synthetic bandit feedback close to that of the Open Bandit Dataset
127 | @pytest.fixture(scope="session")
128 | def fixed_synthetic_bandit_feedback(synthetic_bandit_feedback) -> BanditFeedback:
129 | # set random
130 | random_state = 12345
131 | random_ = check_random_state(random_state)
132 | # copy synthetic bandit feedback
133 | bandit_feedback = copy.deepcopy(synthetic_bandit_feedback)
134 | # expected reward would be about 0.65%, which is close to that of the ZOZO dataset
135 | logit = special.logit(bandit_feedback["expected_reward"])
136 | bandit_feedback["expected_reward"] = sigmoid(logit - 4.0)
137 | expected_reward_factual = bandit_feedback["expected_reward"][
138 | np.arange(bandit_feedback["n_rounds"]), bandit_feedback["action"]
139 | ]
140 | bandit_feedback["reward"] = random_.binomial(n=1, p=expected_reward_factual)
141 | return bandit_feedback
142 |
143 |
144 | # key set of bandit feedback data
145 | @pytest.fixture(scope="session")
146 | def feedback_key_set() -> Set[str]:
147 | return {
148 | "action",
149 | "action_context",
150 | "context",
151 | "expected_reward",
152 | "n_actions",
153 | "n_rounds",
154 | "position",
155 | "pi_b",
156 | "pscore",
157 | "reward",
158 | }
159 |
160 |
161 | # random evaluation policy
162 | @pytest.fixture(scope="session")
163 | def random_action_dist(synthetic_bandit_feedback) -> np.ndarray:
164 | n_actions = synthetic_bandit_feedback["n_actions"]
165 | evaluation_policy = Random(n_actions=n_actions, len_list=1)
166 | action_dist = evaluation_policy.compute_batch_action_dist(
167 | n_rounds=synthetic_bandit_feedback["n_rounds"]
168 | )
169 | return action_dist
170 |
171 |
172 | def generate_action_dist(i, j, k):
173 | x = np.random.uniform(size=(i, j, k))
174 | action_dist = x / x.sum(axis=1)[:, np.newaxis, :]
175 | return action_dist
176 |
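Note that `generate_action_dist` above normalizes over the action axis, so every (round, position) slice forms a proper probability distribution; a quick illustrative check:

import numpy as np

action_dist = generate_action_dist(5, 4, 3)
assert action_dist.shape == (5, 4, 3)
assert np.allclose(action_dist.sum(axis=1), 1.0)  # probabilities over actions sum to one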
--------------------------------------------------------------------------------
/tests/ope/hyperparams.yaml:
--------------------------------------------------------------------------------
1 | lightgbm:
2 | n_estimators: 100
3 | learning_rate: 0.01
4 | max_depth: 5
5 | min_samples_leaf: 10
6 | random_state: 12345
7 | logistic_regression:
8 | max_iter: 10000
9 | C: 1000
10 | random_state: 12345
11 | random_forest:
12 | n_estimators: 100
13 | max_depth: 5
14 | min_samples_leaf: 10
15 | random_state: 12345
16 | ridge:
17 | alpha: 0.2
18 | random_state: 12345
19 |
--------------------------------------------------------------------------------
/tests/ope/hyperparams_slate.yaml:
--------------------------------------------------------------------------------
1 | lightgbm:
2 | n_estimators: 100
3 | learning_rate: 0.01
4 | max_depth: 5
5 | min_samples_leaf: 10
6 | random_state: 12345
7 | random_forest:
8 | n_estimators: 100
9 | max_depth: 5
10 | min_samples_leaf: 10
11 | random_state: 12345
12 | ridge:
13 | alpha: 0.2
14 | random_state: 12345
15 |
--------------------------------------------------------------------------------
/tests/ope/test_dm_estimators.py:
--------------------------------------------------------------------------------
1 | import re
2 |
3 | from conftest import generate_action_dist
4 | import numpy as np
5 | import pytest
6 |
7 | from obp.ope import DirectMethod
8 | from obp.types import BanditFeedback
9 |
10 |
11 | # action_dist, position, estimated_rewards_by_reg_model, description
12 | invalid_input_of_dm = [
13 | (
14 | generate_action_dist(5, 4, 3),
15 | np.zeros(5, dtype=int),
16 | np.zeros((5, 4, 2)), #
17 | "Expected `estimated_rewards_by_reg_model.shape == action_dist.shape`, but found it False",
18 | ),
19 | (
20 | generate_action_dist(5, 4, 3),
21 | np.zeros(5, dtype=int),
22 | None, #
23 | "`estimated_rewards_by_reg_model` must be 3D array",
24 | ),
25 | (
26 | generate_action_dist(5, 4, 3),
27 | np.zeros(5, dtype=int),
28 | "4", #
29 | "`estimated_rewards_by_reg_model` must be 3D array",
30 | ),
31 | ]
32 |
33 |
34 | @pytest.mark.parametrize(
35 | "action_dist, position, estimated_rewards_by_reg_model, description",
36 | invalid_input_of_dm,
37 | )
38 | def test_dm_using_invalid_input_data(
39 | action_dist: np.ndarray,
40 | position: np.ndarray,
41 | estimated_rewards_by_reg_model: np.ndarray,
42 | description: str,
43 | ) -> None:
44 | dm = DirectMethod()
45 | with pytest.raises(ValueError, match=f"{description}*"):
46 | _ = dm.estimate_policy_value(
47 | action_dist=action_dist,
48 | position=position,
49 | estimated_rewards_by_reg_model=estimated_rewards_by_reg_model,
50 | )
51 | with pytest.raises(ValueError, match=f"{description}*"):
52 | _ = dm.estimate_interval(
53 | action_dist=action_dist,
54 | position=position,
55 | estimated_rewards_by_reg_model=estimated_rewards_by_reg_model,
56 | )
57 |
58 |
59 | def test_dm_using_random_evaluation_policy(
60 | synthetic_bandit_feedback: BanditFeedback, random_action_dist: np.ndarray
61 | ) -> None:
62 | """
63 | Test the performance of the direct method using synthetic bandit data and random evaluation policy
64 | """
65 | expected_reward = synthetic_bandit_feedback["expected_reward"][:, :, np.newaxis]
66 | action_dist = random_action_dist
67 | # compute ground truth policy value using expected reward
68 | q_pi_e = np.average(expected_reward[:, :, 0], weights=action_dist[:, :, 0], axis=1)
69 | # compute statistics of ground truth policy value
70 | gt_mean = q_pi_e.mean()
71 | # prepare dm
72 | dm = DirectMethod()
73 | # prepare input dict
74 | input_dict = {
75 | k: v
76 | for k, v in synthetic_bandit_feedback.items()
77 | if k in ["reward", "action", "pscore", "position"]
78 | }
79 | input_dict["action_dist"] = action_dist
80 | # estimated_rewards_by_reg_model is required
81 | with pytest.raises(
82 | TypeError,
83 | match=re.escape(
84 | "estimate_policy_value() missing 1 required positional argument: 'estimated_rewards_by_reg_model'"
85 | ),
86 | ):
87 | _ = dm.estimate_policy_value(**input_dict)
88 | # add estimated_rewards_by_reg_model
89 | input_dict["estimated_rewards_by_reg_model"] = expected_reward
90 | # check expectation
91 | estimated_policy_value = dm.estimate_policy_value(**input_dict)
92 | assert (
93 | gt_mean == estimated_policy_value
94 | ), "DM should be perfect when the regression model is perfect"
95 | # remove unnecessary keys
96 | del input_dict["reward"]
97 | del input_dict["pscore"]
98 | del input_dict["action"]
99 | estimated_policy_value = dm.estimate_policy_value(**input_dict)
100 | assert (
101 | gt_mean == estimated_policy_value
102 | ), "DM should be perfect when the regression model is perfect"
103 |
--------------------------------------------------------------------------------
/tests/ope/test_kernel_functions.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy import integrate
3 |
4 | from obp.ope import cosine_kernel
5 | from obp.ope import epanechnikov_kernel
6 | from obp.ope import gaussian_kernel
7 | from obp.ope import triangular_kernel
8 |
9 |
10 | def test_kernel_functions():
11 | # triangular
12 | assert np.isclose(
13 | integrate.quad(lambda x: triangular_kernel(x), -np.inf, np.inf)[0], 1
14 | )
15 | assert np.isclose(
16 | integrate.quad(lambda x: x * triangular_kernel(x), -np.inf, np.inf)[0], 0
17 | )
18 | assert integrate.quad(lambda x: triangular_kernel(x) ** 2, -np.inf, np.inf)[0] > 0
19 |
20 | # epanechnikov
21 | assert np.isclose(
22 | integrate.quad(lambda x: epanechnikov_kernel(x), -np.inf, np.inf)[0], 1
23 | )
24 | assert np.isclose(
25 | integrate.quad(lambda x: x * epanechnikov_kernel(x), -np.inf, np.inf)[0], 0
26 | )
27 | assert integrate.quad(lambda x: epanechnikov_kernel(x) ** 2, -np.inf, np.inf)[0] > 0
28 |
29 | # gaussian
30 | assert np.isclose(
31 | integrate.quad(lambda x: gaussian_kernel(x), -np.inf, np.inf)[0], 1
32 | )
33 | assert np.isclose(
34 | integrate.quad(lambda x: x * gaussian_kernel(x), -np.inf, np.inf)[0], 0
35 | )
36 | assert integrate.quad(lambda x: gaussian_kernel(x) ** 2, -np.inf, np.inf)[0] > 0
37 |
38 | # cosine
39 | assert np.isclose(integrate.quad(lambda x: cosine_kernel(x), -np.inf, np.inf)[0], 1)
40 | assert np.isclose(
41 | integrate.quad(lambda x: x * cosine_kernel(x), -np.inf, np.inf)[0], 0
42 | )
43 | assert integrate.quad(lambda x: cosine_kernel(x) ** 2, -np.inf, np.inf)[0] > 0
44 |
--------------------------------------------------------------------------------
/tests/policy/test_contextfree.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pytest
3 |
4 | from obp.policy.contextfree import BernoulliTS
5 | from obp.policy.contextfree import EpsilonGreedy
6 | from obp.policy.contextfree import Random
7 | from obp.policy.policy_type import PolicyType
8 |
9 |
10 | def test_contextfree_base_exception():
11 | # invalid n_actions
12 | with pytest.raises(ValueError):
13 | EpsilonGreedy(n_actions=0)
14 |
15 | with pytest.raises(TypeError):
16 | EpsilonGreedy(n_actions="3")
17 |
18 | # invalid len_list
19 | with pytest.raises(ValueError):
20 | EpsilonGreedy(n_actions=2, len_list=-1)
21 |
22 | with pytest.raises(TypeError):
23 | EpsilonGreedy(n_actions=2, len_list="5")
24 |
25 | # invalid batch_size
26 | with pytest.raises(ValueError):
27 | EpsilonGreedy(n_actions=2, batch_size=-3)
28 |
29 | with pytest.raises(TypeError):
30 | EpsilonGreedy(n_actions=2, batch_size="3")
31 |
32 | # invalid relationship between n_actions and len_list
33 | with pytest.raises(ValueError):
34 | EpsilonGreedy(n_actions=5, len_list=10)
35 |
36 | with pytest.raises(ValueError):
37 | EpsilonGreedy(n_actions=2, len_list=3)
38 |
39 |
40 | def test_egreedy_normal_epsilon():
41 |
42 | policy1 = EpsilonGreedy(n_actions=2)
43 | assert 0 <= policy1.epsilon <= 1
44 |
45 | policy2 = EpsilonGreedy(n_actions=3, epsilon=0.3)
46 | assert 0 <= policy2.epsilon <= 1
47 |
48 | # policy type
49 | assert EpsilonGreedy(n_actions=2).policy_type == PolicyType.CONTEXT_FREE
50 |
51 |
52 | def test_egreedy_abnormal_epsilon():
53 |
54 | with pytest.raises(ValueError):
55 | EpsilonGreedy(n_actions=2, epsilon=1.2)
56 |
57 | with pytest.raises(ValueError):
58 | EpsilonGreedy(n_actions=5, epsilon=-0.2)
59 |
60 |
61 | def test_egreedy_select_action_exploitation():
62 | trial_num = 50
63 | policy = EpsilonGreedy(n_actions=2, epsilon=0.0)
64 | policy.action_counts = np.array([3, 3])
65 | policy.reward_counts = np.array([3, 0])
66 | for _ in range(trial_num):
67 | assert policy.select_action()[0] == 0
68 |
69 |
70 | def test_egreedy_select_action_exploration():
71 | trial_num = 50
72 | policy = EpsilonGreedy(n_actions=2, epsilon=1.0)
73 | policy.action_counts = np.array([3, 3])
74 | policy.reward_counts = np.array([3, 0])
75 | selected_action = [policy.select_action() for _ in range(trial_num)]
76 | assert 0 < sum(selected_action)[0] < trial_num
77 |
78 |
79 | def test_egreedy_update_params():
80 | policy = EpsilonGreedy(n_actions=2, epsilon=1.0)
81 | policy.action_counts_temp = np.array([4, 3])
82 | policy.action_counts = np.copy(policy.action_counts_temp)
83 | policy.reward_counts_temp = np.array([2.0, 0.0])
84 | policy.reward_counts = np.copy(policy.reward_counts_temp)
85 | action = 0
86 | reward = 1.0
87 | policy.update_params(action, reward)
88 | assert np.array_equal(policy.action_counts, np.array([5, 3]))
89 | assert np.allclose(policy.reward_counts, np.array([2.0 + reward, 0.0]))
90 |
91 |
92 | def test_random_compute_batch_action_dist():
93 | n_actions = 10
94 | len_list = 5
95 | n_rounds = 100
96 | policy = Random(n_actions=n_actions, len_list=len_list)
97 | action_dist = policy.compute_batch_action_dist(n_rounds=n_rounds)
98 | assert action_dist.shape[0] == n_rounds
99 | assert action_dist.shape[1] == n_actions
100 | assert action_dist.shape[2] == len_list
101 | assert len(np.unique(action_dist)) == 1
102 | assert np.unique(action_dist)[0] == 1 / n_actions
103 |
104 |
105 | def test_bernoulli_ts_zozotown_prior():
106 |
107 | with pytest.raises(Exception):
108 | BernoulliTS(n_actions=2, is_zozotown_prior=True)
109 |
110 | policy_all = BernoulliTS(n_actions=2, is_zozotown_prior=True, campaign="all")
111 |     # check that the prior is not a non-informative (i.e., default) prior parameter
112 | assert len(np.unique(policy_all.alpha)) != 1
113 | assert len(np.unique(policy_all.beta)) != 1
114 |
115 | policy_men = BernoulliTS(n_actions=2, is_zozotown_prior=True, campaign="men")
116 | assert len(np.unique(policy_men.alpha)) != 1
117 | assert len(np.unique(policy_men.beta)) != 1
118 |
119 | policy_women = BernoulliTS(n_actions=2, is_zozotown_prior=True, campaign="women")
120 | assert len(np.unique(policy_women.alpha)) != 1
121 | assert len(np.unique(policy_women.beta)) != 1
122 |
123 |
124 | def test_bernoulli_ts_select_action():
125 | # invalid relationship between n_actions and len_list
126 | with pytest.raises(ValueError):
127 | BernoulliTS(n_actions=5, len_list=10)
128 |
129 | with pytest.raises(ValueError):
130 | BernoulliTS(n_actions=2, len_list=3)
131 |
132 | policy1 = BernoulliTS(n_actions=3, len_list=3)
133 | assert np.allclose(np.sort(policy1.select_action()), np.array([0, 1, 2]))
134 |
135 | policy = BernoulliTS(n_actions=5, len_list=3)
136 | assert len(policy.select_action()) == 3
137 |
138 |
139 | def test_bernoulli_ts_update_params():
140 | policy = BernoulliTS(n_actions=2)
141 | policy.action_counts_temp = np.array([4, 3])
142 | policy.action_counts = np.copy(policy.action_counts_temp)
143 | policy.reward_counts_temp = np.array([2.0, 0.0])
144 | policy.reward_counts = np.copy(policy.reward_counts_temp)
145 | action = 0
146 | reward = 1.0
147 | policy.update_params(action, reward)
148 | assert np.array_equal(policy.action_counts, np.array([5, 3]))
149 | # in bernoulli ts, reward_counts is defined as the sum of observed rewards for each action
150 | next_reward = 2.0 + reward
151 | assert np.allclose(policy.reward_counts, np.array([next_reward, 0.0]))
152 |
153 |
154 | def test_bernoulli_ts_compute_batch_action_dist():
155 | n_rounds = 10
156 | n_actions = 5
157 | len_list = 2
158 | policy = BernoulliTS(n_actions=n_actions, len_list=len_list)
159 | action_dist = policy.compute_batch_action_dist(n_rounds=n_rounds, n_sim=30)
160 | assert action_dist.shape[0] == n_rounds
161 | assert action_dist.shape[1] == n_actions
162 | assert action_dist.shape[2] == len_list
163 |
--------------------------------------------------------------------------------
/tests/policy/test_logistic.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pytest
3 |
4 | from obp.policy.logistic import LogisticEpsilonGreedy
5 | from obp.policy.logistic import LogisticTS
6 | from obp.policy.logistic import LogisticUCB
7 | from obp.policy.logistic import MiniBatchLogisticRegression
8 |
9 |
10 | def test_logistic_base_exception():
11 | # invalid dim
12 | with pytest.raises(ValueError):
13 | LogisticEpsilonGreedy(n_actions=2, dim=-3)
14 |
15 | with pytest.raises(ValueError):
16 | LogisticEpsilonGreedy(n_actions=2, dim=0)
17 |
18 | with pytest.raises(TypeError):
19 | LogisticEpsilonGreedy(n_actions=2, dim="3")
20 |
21 | # invalid n_actions
22 | with pytest.raises(ValueError):
23 | LogisticEpsilonGreedy(n_actions=-3, dim=2)
24 |
25 | with pytest.raises(ValueError):
26 | LogisticEpsilonGreedy(n_actions=1, dim=2)
27 |
28 | with pytest.raises(TypeError):
29 | LogisticEpsilonGreedy(n_actions="2", dim=2)
30 |
31 | # invalid len_list
32 | with pytest.raises(ValueError):
33 | LogisticEpsilonGreedy(n_actions=2, dim=2, len_list=-3)
34 |
35 | with pytest.raises(ValueError):
36 | LogisticEpsilonGreedy(n_actions=2, dim=2, len_list=0)
37 |
38 | with pytest.raises(TypeError):
39 | LogisticEpsilonGreedy(n_actions=2, dim=2, len_list="3")
40 |
41 | # invalid batch_size
42 | with pytest.raises(ValueError):
43 | LogisticEpsilonGreedy(n_actions=2, dim=2, batch_size=-2)
44 |
45 | with pytest.raises(ValueError):
46 | LogisticEpsilonGreedy(n_actions=2, dim=2, batch_size=0)
47 |
48 | with pytest.raises(TypeError):
49 | LogisticEpsilonGreedy(n_actions=2, dim=2, batch_size="10")
50 |
51 | # invalid relationship between n_actions and len_list
52 | with pytest.raises(ValueError):
53 | LogisticEpsilonGreedy(n_actions=5, len_list=10, dim=2)
54 |
55 | with pytest.raises(ValueError):
56 | LogisticEpsilonGreedy(n_actions=2, len_list=3, dim=2)
57 |
58 |
59 | def test_logistic_epsilon_normal_epsilon():
60 |
61 | policy1 = LogisticEpsilonGreedy(n_actions=2, dim=2)
62 | assert 0 <= policy1.epsilon <= 1
63 |
64 | policy2 = LogisticEpsilonGreedy(n_actions=2, dim=2, epsilon=0.5)
65 | assert policy2.epsilon == 0.5
66 |
67 |
68 | def test_logistic_epsilon_abnormal_epsilon():
69 |
70 | with pytest.raises(ValueError):
71 | LogisticEpsilonGreedy(n_actions=2, dim=2, epsilon=1.3)
72 |
73 | with pytest.raises(ValueError):
74 | LogisticEpsilonGreedy(n_actions=2, dim=2, epsilon=-0.3)
75 |
76 |
77 | def test_logistic_epsilon_each_action_model():
78 | n_actions = 3
79 | policy = LogisticEpsilonGreedy(n_actions=n_actions, dim=2, epsilon=0.5)
80 | for i in range(n_actions):
81 | assert isinstance(policy.model_list[i], MiniBatchLogisticRegression)
82 |
83 |
84 | def test_logistic_epsilon_select_action_exploitation():
85 | trial_num = 50
86 | policy = LogisticEpsilonGreedy(n_actions=2, dim=2, epsilon=0.0)
87 | context = np.array([1.0, 1.0]).reshape(1, -1)
88 | policy.update_params(action=0, reward=1.0, context=context)
89 | policy.update_params(action=0, reward=1.0, context=context)
90 | policy.update_params(action=1, reward=1.0, context=context)
91 | policy.update_params(action=1, reward=0.0, context=context)
92 | for _ in range(trial_num):
93 | assert policy.select_action(context=context)[0] == 0
94 |
95 |
96 | def test_logistic_epsilon_select_action_exploration():
97 | trial_num = 50
98 | policy = LogisticEpsilonGreedy(n_actions=2, dim=2, epsilon=1.0)
99 | context = np.array([1.0, 1.0]).reshape(1, -1)
100 | policy.update_params(action=0, reward=1.0, context=context)
101 | policy.update_params(action=0, reward=1.0, context=context)
102 | policy.update_params(action=1, reward=1.0, context=context)
103 | policy.update_params(action=1, reward=0.0, context=context)
104 | selected_action = [policy.select_action(context=context) for _ in range(trial_num)]
105 | assert 0 < sum(selected_action)[0] < trial_num
106 |
107 |
108 | def test_logistic_ucb_initialize():
109 | # note that the meaning of epsilon is different from that of LogisticEpsilonGreedy
110 | with pytest.raises(ValueError):
111 | LogisticUCB(n_actions=2, dim=2, epsilon=-0.2)
112 |
113 | n_actions = 3
114 | policy = LogisticUCB(n_actions=n_actions, dim=2, epsilon=0.5)
115 | for i in range(n_actions):
116 | assert isinstance(policy.model_list[i], MiniBatchLogisticRegression)
117 |
118 |
119 | def test_logistic_ucb_select_action():
120 | dim = 3
121 | len_list = 2
122 | policy = LogisticUCB(n_actions=4, dim=dim, len_list=2, epsilon=0.0)
123 | context = np.ones(dim).reshape(1, -1)
124 | action = policy.select_action(context=context)
125 | assert len(action) == len_list
126 |
127 |
128 | def test_logistic_ts_initialize():
129 | n_actions = 3
130 | policy = LogisticTS(n_actions=n_actions, dim=2)
131 | for i in range(n_actions):
132 | assert isinstance(policy.model_list[i], MiniBatchLogisticRegression)
133 |
134 |
135 | def test_logistic_ts_select_action():
136 | dim = 3
137 | len_list = 2
138 | policy = LogisticTS(n_actions=4, dim=dim, len_list=2)
139 | context = np.ones(dim).reshape(1, -1)
140 | action = policy.select_action(context=context)
141 | assert len(action) == len_list
142 |
--------------------------------------------------------------------------------
/tests/policy/test_offline_learner_continuous_performance.py:
--------------------------------------------------------------------------------
1 | from dataclasses import dataclass
2 | from typing import Optional
3 | from typing import Tuple
4 | from typing import Union
5 |
6 | from joblib import delayed
7 | from joblib import Parallel
8 | import numpy as np
9 | import pytest
10 |
11 | from obp.dataset import linear_behavior_policy_continuous
12 | from obp.dataset import linear_reward_funcion_continuous
13 | from obp.dataset import SyntheticContinuousBanditDataset
14 | from obp.policy import BaseContinuousOfflinePolicyLearner
15 | from obp.policy import ContinuousNNPolicyLearner
16 |
17 |
18 | # n_rounds, dim_context, action_noise, reward_noise, min_action_value, max_action_value, pg_method, bandwidth
19 | offline_experiment_configurations = [
20 | (
21 | 1500,
22 | 10,
23 | 1.0,
24 | 1.0,
25 | -10.0,
26 | 10.0,
27 | "dpg",
28 | None,
29 | ),
30 | (
31 | 2000,
32 | 5,
33 | 1.0,
34 | 1.0,
35 | 0.0,
36 | 100.0,
37 | "dpg",
38 | None,
39 | ),
40 | ]
41 |
42 |
43 | @dataclass
44 | class RandomPolicy(BaseContinuousOfflinePolicyLearner):
45 | output_space: Tuple[Union[int, float], Union[int, float]] = None
46 |
47 | def fit(self):
48 | raise NotImplementedError
49 |
50 | def predict(self, context: np.ndarray) -> np.ndarray:
51 |
52 | n_rounds = context.shape[0]
53 | predicted_actions = np.random.uniform(
54 | self.output_space[0], self.output_space[1], size=n_rounds
55 | )
56 | return predicted_actions
57 |
58 |
59 | @pytest.mark.parametrize(
60 | "n_rounds, dim_context, action_noise, reward_noise, min_action_value, max_action_value, pg_method, bandwidth",
61 | offline_experiment_configurations,
62 | )
63 | def test_offline_nn_policy_learner_performance(
64 | n_rounds: int,
65 | dim_context: int,
66 | action_noise: float,
67 | reward_noise: float,
68 | min_action_value: float,
69 | max_action_value: float,
70 | pg_method: str,
71 | bandwidth: Optional[float],
72 | ) -> None:
73 | def process(i: int):
74 | # synthetic data generator
75 | dataset = SyntheticContinuousBanditDataset(
76 | dim_context=dim_context,
77 | action_noise=action_noise,
78 | reward_noise=reward_noise,
79 | min_action_value=min_action_value,
80 | max_action_value=max_action_value,
81 | reward_function=linear_reward_funcion_continuous,
82 | behavior_policy_function=linear_behavior_policy_continuous,
83 | random_state=i,
84 | )
85 | # define evaluation policy using NNPolicyLearner
86 | nn_policy = ContinuousNNPolicyLearner(
87 | dim_context=dim_context,
88 | pg_method=pg_method,
89 | bandwidth=bandwidth,
90 | output_space=(min_action_value, max_action_value),
91 | hidden_layer_size=(10, 10),
92 | learning_rate_init=0.001,
93 | max_iter=200,
94 | solver="sgd",
95 | q_func_estimator_hyperparams={"max_iter": 200},
96 | )
97 | # baseline method 1. RandomPolicy
98 | random_policy = RandomPolicy(output_space=(min_action_value, max_action_value))
99 | # sample new training and test sets of synthetic logged bandit data
100 | bandit_feedback_train = dataset.obtain_batch_bandit_feedback(
101 | n_rounds=n_rounds,
102 | )
103 | bandit_feedback_test = dataset.obtain_batch_bandit_feedback(
104 | n_rounds=n_rounds,
105 | )
106 | # train the evaluation policy on the training set of the synthetic logged bandit data
107 | nn_policy.fit(
108 | context=bandit_feedback_train["context"],
109 | action=bandit_feedback_train["action"],
110 | reward=bandit_feedback_train["reward"],
111 | pscore=bandit_feedback_train["pscore"],
112 | )
113 | # predict the action decisions for the test set of the synthetic logged bandit data
114 | actions_predicted_by_nn_policy = nn_policy.predict(
115 | context=bandit_feedback_test["context"],
116 | )
117 | actions_predicted_by_random = random_policy.predict(
118 | context=bandit_feedback_test["context"],
119 | )
120 | # get the ground truth policy value for each learner
121 | gt_nn_policy_learner = dataset.calc_ground_truth_policy_value(
122 | context=bandit_feedback_test["context"],
123 | action=actions_predicted_by_nn_policy,
124 | )
125 | gt_random_policy = dataset.calc_ground_truth_policy_value(
126 | context=bandit_feedback_test["context"],
127 | action=actions_predicted_by_random,
128 | )
129 |
130 | return gt_nn_policy_learner, gt_random_policy
131 |
132 | n_runs = 10
133 | processed = Parallel(
134 | n_jobs=1, # PyTorch uses multiple threads
135 | verbose=0,
136 | )([delayed(process)(i) for i in np.arange(n_runs)])
137 | list_gt_nn_policy, list_gt_random = [], []
138 | for i, ground_truth_policy_values in enumerate(processed):
139 | gt_nn_policy, gt_random = ground_truth_policy_values
140 | list_gt_nn_policy.append(gt_nn_policy)
141 | list_gt_random.append(gt_random)
142 |
143 | assert np.mean(list_gt_nn_policy) > np.mean(list_gt_random)
144 |
--------------------------------------------------------------------------------
/tests/test_utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | from obp.utils import sample_action_fast
4 | from obp.utils import softmax
5 |
6 |
7 | def test_sample_action_fast():
8 | n_rounds = 10
9 | n_actions = 5
10 | n_sim = 100000
11 |
12 | true_probs = softmax(np.random.normal(size=(n_rounds, n_actions)))
13 | sampled_action_list = list()
14 | for _ in np.arange(n_sim):
15 | sampled_action_list.append(sample_action_fast(true_probs)[:, np.newaxis])
16 |
17 | sampled_action_arr = np.concatenate(sampled_action_list, 1)
18 | for i in np.arange(n_rounds):
19 | sampled_action_counts = np.unique(sampled_action_arr[i], return_counts=True)[1]
20 | empirical_probs = sampled_action_counts / n_sim
21 | assert np.isclose(true_probs[i], empirical_probs, rtol=5e-2, atol=1e-3).all()
22 |
--------------------------------------------------------------------------------