├── .gitignore ├── LICENSE ├── README.md ├── data_prep.ipynb ├── ecom_sample_clean.parquet ├── elasticity_dml.ipynb ├── environment.yml └── img └── elast_dml_result.png /.gitignore: -------------------------------------------------------------------------------- 1 | models/ 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 larsroemheld 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Causal Inference in the Wild: Elasticity Pricing 2 | This repository is a real-life example of causal inference in the wild. 
I use real ecommerce sales data and Double Machine Learning [1] to infer the causal effect of price on quantity sold ("price elasticity of demand"). See the [accompanying blogpost](https://medium.com/@lars.roemheld/causal-inference-example-elasticity-de4a3e2e621b?source=friends_link&sk=9165a7fcb8f806fe0dd05cf8df702216) for an easier-to-read version. 3 | 4 | The plot below shows the main result: Double Machine Learning yields a clearly observable estimate of elasticity, notably different from naive correlation. 5 | ![A binned scatterplot showing DML in action](img/elast_dml_result.png) 6 | 7 | ## Getting started 8 | - `environment.yml` describes a conda environment. Nothing too fancy; this should run under most standard Anaconda setups. 9 | - `data_prep.ipynb` contains the data preparation steps. It is included only for completeness and to give a sense of real-world data; the output is a clean dataset in `ecom_sample_clean.parquet`. 10 | - `elasticity_dml.ipynb` contains the main code, with many explanations for context. 11 | - `models/` stores pre-trained RandomForest models (they get too large for GitHub, so the directory is git-ignored). 12 | 13 | The project originally accompanied a workshop on causal inference for machine learners. Reach out if you are interested in learning more. 14 | 15 | [1] Chernozhukov, Victor, et al.: Double/Debiased Machine Learning for Treatment and Structural Parameters. The Econometrics Journal, Volume 21, 2018. 
https://academic.oup.com/ectj/article/21/1/C1/5056401 16 | -------------------------------------------------------------------------------- /ecom_sample_clean.parquet: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/larsroemheld/causalinf_ex_elasticity/f102ec3acf6f1cb6a2a6bc70c0914f51a4f783c7/ecom_sample_clean.parquet -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: ex_causalinf_env 2 | channels: 3 | - defaults 4 | dependencies: 5 | - anyio=2.2.0=py39hecd8cb5_1 6 | - appnope=0.1.2=py39hecd8cb5_1001 7 | - argon2-cffi=20.1.0=py39h9ed2024_1 8 | - arrow-cpp=4.0.1=py39hf7c73f6_3 9 | - async_generator=1.10=pyhd3eb1b0_0 10 | - attrs=21.2.0=pyhd3eb1b0_0 11 | - aws-c-common=0.4.57=hb1e8313_1 12 | - aws-c-event-stream=0.1.6=h23ab428_5 13 | - aws-checksums=0.1.9=hb1e8313_0 14 | - aws-sdk-cpp=1.8.185=he271ece_0 15 | - babel=2.9.1=pyhd3eb1b0_0 16 | - backcall=0.2.0=pyhd3eb1b0_0 17 | - blas=1.0=mkl 18 | - bleach=3.3.0=pyhd3eb1b0_0 19 | - boost-cpp=1.73.0=h9ed2024_11 20 | - bottleneck=1.3.2=py39he3068b8_1 21 | - brotli=1.0.9=hb1e8313_2 22 | - brotlipy=0.7.0=py39h9ed2024_1003 23 | - bzip2=1.0.8=h1de35cc_0 24 | - c-ares=1.17.1=h9ed2024_0 25 | - ca-certificates=2021.7.5=hecd8cb5_1 26 | - certifi=2021.5.30=py39hecd8cb5_0 27 | - cffi=1.14.5=py39h2125817_0 28 | - chardet=4.0.0=py39hecd8cb5_1003 29 | - cryptography=3.4.7=py39h2fd3fbb_0 30 | - cycler=0.10.0=py39hecd8cb5_0 31 | - decorator=5.0.9=pyhd3eb1b0_0 32 | - defusedxml=0.7.1=pyhd3eb1b0_0 33 | - double-conversion=3.1.5=haf313ee_1 34 | - entrypoints=0.3=py39hecd8cb5_0 35 | - freetype=2.10.4=ha233b18_0 36 | - gflags=2.2.2=h0a44026_0 37 | - glog=0.5.0=h23ab428_0 38 | - grpc-cpp=1.26.0=h044775b_0 39 | - icu=58.2=h0a44026_3 40 | - idna=2.10=pyhd3eb1b0_0 41 | - importlib-metadata=3.10.0=py39hecd8cb5_0 42 | - 
importlib_metadata=3.10.0=hd3eb1b0_0 43 | - intel-openmp=2021.2.0=hecd8cb5_564 44 | - ipykernel=5.3.4=py39h01d92e1_0 45 | - ipython=7.22.0=py39h01d92e1_0 46 | - ipython_genutils=0.2.0=pyhd3eb1b0_1 47 | - jedi=0.17.2=py39hecd8cb5_1 48 | - jinja2=3.0.1=pyhd3eb1b0_0 49 | - joblib=1.0.1=pyhd3eb1b0_0 50 | - jpeg=9b=he5867d9_2 51 | - json5=0.9.6=pyhd3eb1b0_0 52 | - jsonschema=3.2.0=py_2 53 | - jupyter-packaging=0.7.12=pyhd3eb1b0_0 54 | - jupyter_client=6.1.12=pyhd3eb1b0_0 55 | - jupyter_core=4.7.1=py39hecd8cb5_0 56 | - jupyter_server=1.4.1=py39hecd8cb5_0 57 | - jupyterlab=3.0.14=pyhd3eb1b0_1 58 | - jupyterlab_pygments=0.1.2=py_0 59 | - jupyterlab_server=2.4.0=pyhd3eb1b0_0 60 | - kiwisolver=1.3.1=py39h23ab428_0 61 | - krb5=1.18.2=h75d18d8_0 62 | - lcms2=2.12=hf1fd2bf_0 63 | - libboost=1.73.0=hd4c2dcd_11 64 | - libcurl=7.71.1=h8a08a2b_1 65 | - libcxx=10.0.0=1 66 | - libedit=3.1.20210216=h9ed2024_1 67 | - libevent=2.1.8=hddc9c9b_1 68 | - libffi=3.3=hb1e8313_2 69 | - libgfortran=3.0.1=h93005f0_2 70 | - libiconv=1.16=h1de35cc_0 71 | - libpng=1.6.37=ha441bb4_0 72 | - libprotobuf=3.11.2=hd9629dc_0 73 | - libsodium=1.0.18=h1de35cc_0 74 | - libssh2=1.9.0=ha12b0ac_1 75 | - libthrift=0.13.0=h054ceb0_6 76 | - libtiff=4.2.0=h87d7836_0 77 | - libwebp-base=1.2.0=h9ed2024_0 78 | - llvm-openmp=10.0.0=h28b9765_0 79 | - lz4-c=1.9.3=h23ab428_0 80 | - markupsafe=2.0.1=py39h9ed2024_0 81 | - matplotlib=3.3.4=py39hecd8cb5_0 82 | - matplotlib-base=3.3.4=py39h8b3ea08_0 83 | - mistune=0.8.4=py39h9ed2024_1000 84 | - mkl=2021.2.0=hecd8cb5_269 85 | - mkl-service=2.3.0=py39h9ed2024_1 86 | - mkl_fft=1.3.0=py39h4a7008c_2 87 | - mkl_random=1.2.1=py39hb2f4e1b_2 88 | - nbclassic=0.2.6=pyhd3eb1b0_0 89 | - nbclient=0.5.3=pyhd3eb1b0_0 90 | - nbconvert=6.1.0=py39hecd8cb5_0 91 | - nbformat=5.1.3=pyhd3eb1b0_0 92 | - ncurses=6.2=h0a44026_1 93 | - nest-asyncio=1.5.1=pyhd3eb1b0_0 94 | - notebook=6.4.0=py39hecd8cb5_0 95 | - numexpr=2.7.3=py39h5873af2_1 96 | - numpy=1.20.2=py39h4b4dc7a_0 97 | - 
numpy-base=1.20.2=py39he0bd621_0 98 | - olefile=0.46=py_0 99 | - openssl=1.1.1k=h9ed2024_0 100 | - orc=1.6.7=h001ef8f_2 101 | - packaging=21.0=pyhd3eb1b0_0 102 | - pandas=1.2.5=py39h23ab428_0 103 | - pandocfilters=1.4.3=py39hecd8cb5_1 104 | - parso=0.7.0=py_0 105 | - patsy=0.5.1=py39hecd8cb5_0 106 | - pexpect=4.8.0=pyhd3eb1b0_3 107 | - pickleshare=0.7.5=pyhd3eb1b0_1003 108 | - pillow=8.2.0=py39h5270095_0 109 | - pip=21.1.3=py39hecd8cb5_0 110 | - prometheus_client=0.11.0=pyhd3eb1b0_0 111 | - prompt-toolkit=3.0.17=pyh06a4308_0 112 | - ptyprocess=0.7.0=pyhd3eb1b0_2 113 | - pyarrow=4.0.1=py39hdf3e9eb_3 114 | - pycparser=2.20=py_2 115 | - pygments=2.9.0=pyhd3eb1b0_0 116 | - pyopenssl=20.0.1=pyhd3eb1b0_1 117 | - pyparsing=2.4.7=pyhd3eb1b0_0 118 | - pyrsistent=0.18.0=py39h9ed2024_0 119 | - pysocks=1.7.1=py39hecd8cb5_0 120 | - python=3.9.5=h88f2d9e_3 121 | - python-dateutil=2.8.1=pyhd3eb1b0_0 122 | - pytz=2021.1=pyhd3eb1b0_0 123 | - pyzmq=20.0.0=py39h23ab428_1 124 | - re2=2020.11.01=h23ab428_1 125 | - readline=8.1=h9ed2024_0 126 | - requests=2.25.1=pyhd3eb1b0_0 127 | - scikit-learn=0.24.2=py39hb2f4e1b_0 128 | - scipy=1.6.2=py39hd5f7400_1 129 | - seaborn=0.11.1=pyhd3eb1b0_0 130 | - send2trash=1.5.0=pyhd3eb1b0_1 131 | - setuptools=52.0.0=py39hecd8cb5_0 132 | - six=1.16.0=pyhd3eb1b0_0 133 | - snappy=1.1.8=hb1e8313_0 134 | - sniffio=1.2.0=py39hecd8cb5_1 135 | - sqlite=3.36.0=hce871da_0 136 | - statsmodels=0.12.2=py39h9ed2024_0 137 | - terminado=0.9.4=py39hecd8cb5_0 138 | - testpath=0.5.0=pyhd3eb1b0_0 139 | - threadpoolctl=2.1.0=pyh5ca1d4c_0 140 | - tk=8.6.10=hb0a8c7a_0 141 | - tornado=6.1=py39h9ed2024_0 142 | - traitlets=5.0.5=pyhd3eb1b0_0 143 | - tzdata=2021a=h52ac0ba_0 144 | - uriparser=0.9.3=h0a44026_1 145 | - urllib3=1.26.6=pyhd3eb1b0_1 146 | - utf8proc=2.6.1=h9ed2024_0 147 | - wcwidth=0.2.5=py_0 148 | - webencodings=0.5.1=py39hecd8cb5_1 149 | - wheel=0.36.2=pyhd3eb1b0_0 150 | - xz=5.2.5=h1de35cc_0 151 | - zeromq=4.3.4=h23ab428_0 152 | - zipp=3.5.0=pyhd3eb1b0_0 153 | - 
zlib=1.2.11=h1de35cc_3 154 | - zstd=1.4.9=h322a384_0 155 | prefix: /Users/lars/anaconda3/envs/ex_causalinf_env 156 | -------------------------------------------------------------------------------- /img/elast_dml_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/larsroemheld/causalinf_ex_elasticity/f102ec3acf6f1cb6a2a6bc70c0914f51a4f783c7/img/elast_dml_result.png --------------------------------------------------------------------------------
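The README describes using Double Machine Learning to separate the causal effect of price on quantity from naive correlation. The idea can be illustrated with a minimal, standard-library-only sketch. It is not the code from `elasticity_dml.ipynb`: the data is synthetic, the single confounder `quality` is hypothetical, and simple linear fits stand in for the RandomForest nuisance models mentioned in the README. All function and variable names here are assumptions made for illustration.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares fit of y ~ a + b*x; returns a prediction function."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def naive_slope(x, y):
    """Plain regression slope of y on x, ignoring any confounders."""
    line = fit_line(x, y)
    return line(1.0) - line(0.0)


def dml_elasticity(log_price, log_qty, confounder, n_folds=2):
    """Partialling-out DML with cross-fitting (linear nuisance models here)."""
    n = len(confounder)
    res_p, res_q = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        w_train = [confounder[i] for i in train]
        # Nuisance models are fitted on the training folds only ...
        predict_p = fit_line(w_train, [log_price[i] for i in train])
        predict_q = fit_line(w_train, [log_qty[i] for i in train])
        # ... and residuals are computed on the held-out fold.
        for i in held_out:
            res_p[i] = log_price[i] - predict_p(confounder[i])
            res_q[i] = log_qty[i] - predict_q(confounder[i])
    # Final stage: regress quantity residuals on price residuals (no intercept).
    return sum(p * q for p, q in zip(res_p, res_q)) / sum(p * p for p in res_p)


# Synthetic data (assumption): "quality" raises both price and demand.
rng = random.Random(0)
TRUE_ELASTICITY = -2.0
quality = [rng.gauss(0, 1) for _ in range(2000)]
log_price = [0.8 * w + rng.gauss(0, 0.5) for w in quality]
log_qty = [
    TRUE_ELASTICITY * p + 1.5 * w + rng.gauss(0, 0.5)
    for p, w in zip(log_price, quality)
]

theta_naive = naive_slope(log_price, log_qty)
theta_dml = dml_elasticity(log_price, log_qty, quality)
print(f"naive slope: {theta_naive:.2f}, DML estimate: {theta_dml:.2f}")
```

Because the confounder pushes price and quantity in the same direction, the naive slope is biased toward zero, while the residual-on-residual regression should land close to the elasticity used to generate the data. With real data the nuisance models would typically be flexible learners fitted on many covariates.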