├── CONTRIBUTING.md
├── solutions
│   ├── x_media_review
│   │   └── README.md
│   └── causal-impact
│       ├── README.md
│       └── CausalImpact_with_Experimental_Design.ipynb
├── README.md
├── .github
│   └── workflows
│       └── scorecard.yml
└── LICENSE
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to Contribute
2 |
3 | We'd love to accept your patches and contributions to this project. There are
4 | just a few small guidelines you need to follow.
5 |
6 | ## Contributor License Agreement
7 |
8 | Contributions to this project must be accompanied by a Contributor License
9 | Agreement (CLA). You (or your employer) retain the copyright to your
10 | contribution; this simply gives us permission to use and redistribute your
11 | contributions as part of the project. Head over to
12 | <https://cla.developers.google.com/> to see your current agreements on file or
13 | to sign a new one.
14 |
15 | You generally only need to submit a CLA once, so if you've already submitted one
16 | (even if it was for a different project), you probably don't need to do it
17 | again.
18 |
19 | ## Code Reviews
20 |
21 | All submissions, including submissions by project members, require review. We
22 | use GitHub pull requests for this purpose. Consult
23 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
24 | information on using pull requests.
25 |
26 | ## Community Guidelines
27 |
28 | This project follows
29 | [Google's Open Source Community Guidelines](https://opensource.google/conduct/).
30 |
--------------------------------------------------------------------------------
/solutions/x_media_review/README.md:
--------------------------------------------------------------------------------
1 | # Cross-Media Review with same level
2 |
3 | ##### This is not an official Google product.
4 |
5 | Advertisers often use many media to achieve their goals, but because standards vary from one medium to another,
6 | it can be difficult to properly evaluate strategies and make appropriate decisions.
7 |
8 | Using Google Analytics 4 conversions limits the evaluation to media that directly drive sessions,
9 | but it allows multiple media to be evaluated on the same level.
10 |
11 | Given the monthly investment amount for each medium,
12 | it produces consistent output on the effectiveness and efficiency of each measure, which is what marketers want to know.
13 |
14 |
15 | ## Overview
16 |
17 | ### What you can do with Cross-Media Review with same level
18 |
19 | - Visualization of effectiveness and efficiency in time series
20 | - Visualization of monthly differences in effectiveness for each medium
21 | - Visualization of monthly efficiency by medium
22 |
23 |
24 | ### Motivation to develop and open the source code
25 |
26 | There are cases where the effectiveness and efficiency of media with different standards are evaluated as-is,
27 | and cases where media with different roles are evaluated against the same standard (the last-click model).
28 | In both cases, resources may not be allocated to the measures that are truly contributing to acquisition,
29 | so we created this tool to reduce that risk and improve productivity.
30 |
31 |
32 | ### Typical procedure for use
33 |
34 | 1. TBW
35 |
36 |
37 |
38 | ## Getting started
39 |
40 | 1. Prepare the time series data on a spreadsheet (a hypothetical layout is sketched below)
41 |
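The expected spreadsheet layout is not documented yet (the procedure above is still TBW). As a rough illustration, a monthly time series with one row per month, the investment per medium, and the GA4 conversions might look like the sketch below; every column name here is a placeholder, not a requirement of the tool:

```python
import pandas as pd

# Hypothetical monthly input: investment per medium plus GA4 conversions.
# Column names are placeholders; adapt them to your own spreadsheet.
data = pd.DataFrame({
    "Month": ["2024-01", "2024-02", "2024-03"],
    "TV_cost": [1_000_000, 800_000, 900_000],
    "Display_cost": [300_000, 350_000, 320_000],
    "GA4_conversions": [1200, 1050, 1150],
})
print(data)
```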
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Business Intelligence Group - Marketing Analysis & Data Science Solution Packages
2 |
3 | ##### This is not an official Google product.
4 |
5 | ## Overview
6 | This repository provides a collection of Jupyter Notebooks designed to help marketing analysts and data scientists measure and analyze the effectiveness of marketing campaigns. These notebooks facilitate data preprocessing, visualization, statistical analysis, and machine learning model building, enabling a deep dive into campaign performance and the extraction of actionable insights. By customizing and utilizing these notebooks, marketing analysts and data scientists can develop and execute more effective campaign strategies, ultimately driving data-informed decisions and optimizing marketing ROI.
7 |
8 | ### Motivation to develop and open the source code
9 | - CausalImpact with experimental design
10 |   - Some marketing practitioners pay attention to
11 |     [Causal inference in statistics](https://en.wikipedia.org/wiki/Causal_inference). However,
12 |     using time series data without parallel trend assumptions does not allow for
13 |     appropriate analysis. Therefore, the purpose is to enable the implementation and
14 |     analysis of interventions after classifying time-series data for which parallel
15 |     trend assumptions can be made.
16 |
17 | For contributions, see [CONTRIBUTING.md](https://github.com/google/business_intelligence_group/blob/main/CONTRIBUTING.md).
18 |
19 | ### Available solution packages:
20 | - [CausalImpact with experimental design](https://github.com/google/business_intelligence_group/tree/main/solutions/causal-impact)
21 |
22 |
23 | ## Note
24 | - **Analysis should not be the goal**
25 |   - Solving business problems is the goal.
26 |   - Be clear about the decision you want to make to solve business problems.
27 |   - Make clear the path to what you need to know to make that decision.
28 |   - Analysis is one way to find out what you need to know.
29 | - **Test your hypotheses instead of merely testing effectiveness**
30 |   - Formulate hypotheses about why there are issues in the current situation and how to solve them.
31 |   - Business situations are constantly changing, so analysis without a hypothesis will not be reproducible.
32 | - **Be honest with the data**
33 |   - That said, playing with the data until it proves your hypothesis is strictly prohibited.
34 |   - Acquire the knowledge necessary to conduct appropriate verification.
35 |   - Do not do [HARKing](https://en.wikipedia.org/wiki/HARKing) (hypothesizing after the results are known)
36 |   - Do not do [p-hacking](https://en.wikipedia.org/wiki/Data_dredging)
37 |
--------------------------------------------------------------------------------
/.github/workflows/scorecard.yml:
--------------------------------------------------------------------------------
1 | # This workflow uses actions that are not certified by GitHub. They are provided
2 | # by a third-party and are governed by separate terms of service, privacy
3 | # policy, and support documentation.
4 |
5 | name: Scorecard supply-chain security
6 | on:
7 | # For Branch-Protection check. Only the default branch is supported. See
8 | # https://github.com/ossf/scorecard/blob/main/docs/checks.md#branch-protection
9 | branch_protection_rule:
10 | # To guarantee Maintained check is occasionally updated. See
11 | # https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
12 | schedule:
13 | - cron: '29 10 * * 1'
14 | push:
15 | branches: [ "main" ]
16 |
17 | # Declare default permissions as read only.
18 | permissions: read-all
19 |
20 | jobs:
21 | analysis:
22 | name: Scorecard analysis
23 | runs-on: ubuntu-latest
24 | permissions:
25 | # Needed to upload the results to code-scanning dashboard.
26 | security-events: write
27 | # Needed to publish results and get a badge (see publish_results below).
28 | id-token: write
29 | # Uncomment the permissions below if installing in a private repository.
30 | # contents: read
31 | # actions: read
32 |
33 | steps:
34 | - name: "Checkout code"
35 | uses: actions/checkout@93ea575cb5d8a053eaa0ac8fa3b40d7e05a33cc8 # v3.1.0
36 | with:
37 | persist-credentials: false
38 |
39 | - name: "Run analysis"
40 | uses: ossf/scorecard-action@99c53751e09b9529366343771cc321ec74e9bd3d # v2.0.6
41 | with:
42 | results_file: results.sarif
43 | results_format: sarif
44 | # (Optional) "write" PAT token. Uncomment the `repo_token` line below if:
45 | # - you want to enable the Branch-Protection check on a *public* repository, or
46 | # - you are installing Scorecard on a *private* repository
47 | # To create the PAT, follow the steps in https://github.com/ossf/scorecard-action#authentication-with-pat.
48 | # repo_token: ${{ secrets.SCORECARD_TOKEN }}
49 |
50 | # Public repositories:
51 | # - Publish results to OpenSSF REST API for easy access by consumers
52 | # - Allows the repository to include the Scorecard badge.
53 | # - See https://github.com/ossf/scorecard-action#publishing-results.
54 | # For private repositories:
55 | # - `publish_results` will always be set to `false`, regardless
56 | # of the value entered here.
57 | publish_results: true
58 |
59 | # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
60 | # format to the repository Actions tab.
61 | - name: "Upload artifact"
62 | uses: actions/upload-artifact@3cea5372237819ed00197afe530f5a7ea3e805c8 # v3.1.0
63 | with:
64 | name: SARIF file
65 | path: results.sarif
66 | retention-days: 5
67 |
68 | # Upload the results to GitHub's code scanning dashboard.
69 | - name: "Upload to code-scanning"
70 | uses: github/codeql-action/upload-sarif@807578363a7869ca324a79039e6db9c843e0e100 # v2.1.27
71 | with:
72 | sarif_file: results.sarif
73 |
--------------------------------------------------------------------------------
/solutions/causal-impact/README.md:
--------------------------------------------------------------------------------
1 | # CausalImpact with experimental design
2 |
3 | ##### This is not an official Google product.
4 |
5 | CausalImpact is an R package for causal inference using Bayesian structural
6 | time-series models. To use CausalImpact, the parallel trend assumption is
7 | needed for counterfactual modeling, so this code classifies time
8 | series data based on DTW distances.
9 |
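As a minimal sketch of that classification step (the notebook itself uses tslearn's `TimeSeriesKMeans` with a DTW metric; the file and column names here are hypothetical):

```python
import pandas as pd
from tslearn.clustering import TimeSeriesKMeans

# Hypothetical wide-format KPI data: a date index, one column per region.
df = pd.read_csv("kpi_by_region.csv", index_col="Date", parse_dates=True)

# Min-max scale each series so clustering compares shape, not magnitude.
scaled = (df - df.min()) / (df.max() - df.min())

# Group the series by DTW distance: series in the same cluster move
# similarly, which is where a parallel trend assumption is most plausible.
km = TimeSeriesKMeans(n_clusters=3, metric="dtw", max_iter=100, random_state=42)
for col, label in zip(df.columns, km.fit_predict(scaled.T.values)):
    print(f"{col} -> cluster {label}")
```
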
10 | ## Overview
11 |
12 | ### What you can do with CausalImpact with experimental design
13 |
14 | - Experimental Design
15 |   - Load time series data from a Google spreadsheet or CSV file
16 |   - Classify time series data so that parallel trend assumptions can be made
17 |   - Simulate the conditions required for verification
18 | - CausalImpact Analysis
19 |   - Load time series data from a Google spreadsheet or CSV file
20 |   - Run CausalImpact analysis
21 |
22 | ### Motivation to develop and open the source code
23 |
24 | Some marketing practitioners pay attention to
25 | [Causal inference in statistics](https://en.wikipedia.org/wiki/Causal_inference). However,
26 | using time series data without parallel trend assumptions does not allow for
27 | appropriate analysis. Therefore, the purpose is to enable the implementation and
28 | analysis of interventions after classifying time-series data for which parallel
29 | trend assumptions can be made.
30 |
31 | ### Typical procedure for use
32 |
33 | 1. Assume a hypothetical solution to the issue and its factors.
34 | 2. Estimate the room for KPI improvement and the mechanisms by which the solution drives KPIs.
35 | 3. In advance, decide the next actions to take for each hypothesis-testing outcome (with/without a significant difference).
36 |    - We recommend supporting the mechanism with relevant data other than KPIs.
37 | 4. Prepare time-series KPI data for at least 100 time points.
38 |    - Regional segmentation is recommended.
39 |    - Older data, such as the previous year's, may reflect a different market environment.
40 |    - Relevant data must be independent and unaffected by interventions.
41 | 5. **(Experimental Design)** Use this tool to conduct the experimental design.
42 |    - Split the data into groups that are closest to each other, where the parallel trend assumption can be placed.
43 |    - Simulate the required timeframe and budget.
44 |    - :warning: If the parallel trend assumption cannot be placed, we recommend considering another approach.
45 | 6. Implement the interventions.
46 | 7. Prepare time-series KPI data, including the intervention period and the assumed residual period, in addition to the previous data.
47 | 8. **(CausalImpact Analysis)** Conduct the CausalImpact analysis (see the sketch below).
48 | 9. Implement the next actions based on the results of hypothesis testing.
49 |
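For reference, the analysis in step 8 boils down to a call like the one below. This is a minimal sketch assuming the tfp-causalimpact package that the notebook's Step 1 installs; the file name, column layout, and dates are hypothetical:

```python
import pandas as pd
import causalimpact  # pip package: tfp-causalimpact

# Hypothetical wide-format data: the first column is the test-group KPI,
# the remaining columns are control covariates.
data = pd.read_csv("kpi_wide.csv", index_col="Date", parse_dates=True)

impact = causalimpact.fit_causalimpact(
    data,
    pre_period=("2024-01-01", "2024-03-31"),   # before the intervention
    post_period=("2024-04-01", "2024-04-30"),  # during the intervention
)
print(causalimpact.summary(impact))
```
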
50 | ## Note
51 |
52 | - Do not do [HARKing](https://en.wikipedia.org/wiki/HARKing) (hypothesizing after the results are known)
53 | - Do not do [p-hacking](https://en.wikipedia.org/wiki/Data_dredging)
54 |
55 | ## Getting started
56 |
57 | 1. Prepare the time series data on a spreadsheet or CSV file
58 | 2. Open the ipynb file with the **[Open in Colab](https://colab.research.google.com/github/google/business_intelligence_group/blob/main/solutions/causal-impact/CausalImpact_with_Experimental_Design.ipynb)** button
59 | 3. Run the cells in sequence
60 |
61 | ## Tutorial
62 | #### CausalImpact Analysis Section
63 |
64 | 1. Press the **Connect** button to connect to the runtime
65 |
66 | 2. Run the **Step 1** cell. Step 1 takes a little longer because it installs the [tfcausalImpact library](https://github.com/WillianFuks/tfcausalimpact).
Once it finishes, you will see some selections in the Step 1 cell.
67 |
68 | 3. In question 1, choose **CausalImpact Analysis** and update the period before the intervention (**Pre Start & Pre End**) and the period during the intervention (**Post Start & Post End**).
69 | 
70 |
71 | 4. In question 2, please select the data source from **google_spreadsheet**, **CSV_file**, or **Big_Query**.
72 | Then enter the required items.
73 | 
74 |
75 | 5. After entering the required items, select the data format. For CausalImpact analysis, please prepare the data in **wide format** in advance.
76 | After selecting wide format, enter the **date column name**.
77 | 
78 |
79 | 6. Once the items are filled in, run the **Step 2** cell.
80 | (:warning: If you have selected **google_spreadsheet** or **big_query**, a pop-up will appear asking you to grant permission, so please grant it to Colab.)
81 |
82 | 7. After Step 2 is executed, you will see **the results of CausalImpact Analysis**.
83 | 
84 |
85 | #### Experimental Design Section
86 |
87 | 1. Press the **Connect** button to connect to the runtime
88 |
89 | 2. Run the **Step 1** cell. Step 1 takes a little longer because it installs the [tfcausalImpact library](https://github.com/WillianFuks/tfcausalimpact).
Once it finishes, you will see some selections in the Step 1 cell.
90 |
91 | 3. In question 1, choose **Experimental Design** and update the term (**Start Date & End Date**) to be used in the experimental design.
92 | 
93 |
94 | 4. After updating the term, select the **type of Experimental Design** and update the required items.
95 | * A: divide_equally divides the time series data into n groups with similar movements.
96 | * B: similarity_selection extracts n groups that move similarly to a particular column.
97 |
98 | 
99 |
100 | 5. After updating the required items, enter the estimated incremental CPA.
101 | 
102 |
103 | 6. In question 2, please select the data source from **google_spreadsheet**, **CSV_file**, or **Big_Query**.
104 | Then enter the required items.
105 | 
106 |
107 | 7. After entering the required items, select the data format ([**narrow_format** or **wide_format**](https://en.wikipedia.org/wiki/Wide_and_narrow_data)) and enter the required fields.
108 | 
109 |
110 | 8. Once the items are filled in, run the **Step 2** cell.
111 | (:warning: If you have selected **google_spreadsheet** or **big_query**, a pop-up will appear asking you to grant permission, so please grant it to Colab.)
112 |
113 | 9. The output results will vary depending on the type of experimental design, but select the data on which you want to run the simulation.
114 |
115 | 10. Once the items are filled in, run the **Step 3** cell. Depending on the data, this may take more than 10 minutes.
116 | After Step 3 is run, the results are displayed in a table. Check the MAPE, budget, and p-value, and consider the intervention period and the assumed increments for the experimental design.
117 | 
118 |
119 | 11. Run the **Step 4** cell.
120 | 
121 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/solutions/causal-impact/CausalImpact_with_Experimental_Design.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "view-in-github",
7 | "colab_type": "text"
8 | },
9 | "source": [
10 | "
"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {
16 | "id": "dH_QXijHFzJP"
17 | },
18 | "source": [
19 | "# **CausalImpact with Experimental Design**\n",
20 | "\n",
21 | "This Colab file contains *Experimental Design* and *CausalImpact Analysis*.\n",
22 | "\n",
23 | "See [README.md](https://github.com/google/business_intelligence_group/tree/main/solutions/causal-impact) for details\n",
24 | "\n",
25 | "---\n",
26 | "\n",
27 | "Copyright 2024 Google LLC\n",
28 | "\n",
29 | "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0\n",
30 | "\n",
31 | "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "source": [
37 | "# @title Step.1 (~ 2min)\n",
38 | "%%time\n",
39 | "\n",
40 | "import sys\n",
41 | "if 'fastdtw' not in sys.modules:\n",
42 | " !pip install 'fastdtw' --q\n",
43 | "if 'tslearn' not in sys.modules:\n",
44 | " !pip install 'tslearn' --q\n",
45 | "if 'tfp-causalimpact' not in sys.modules:\n",
46 | " !pip install 'tfp-causalimpact' --q\n",
47 | "\n",
48 | "# Data Load\n",
49 | "from google.colab import auth, files, widgets\n",
50 | "from google.auth import default\n",
51 | "from google.cloud import bigquery\n",
52 | "import io\n",
53 | "import os\n",
54 | "import gspread\n",
55 | "from oauth2client.client import GoogleCredentials\n",
56 | "\n",
57 | "# Calculate\n",
58 | "import altair as alt\n",
59 | "import itertools\n",
60 | "import random\n",
61 | "import numpy as np\n",
62 | "import pandas as pd\n",
63 | "import fastdtw\n",
64 | "\n",
65 | "from tslearn.clustering import TimeSeriesKMeans\n",
66 | "from decimal import Decimal, ROUND_HALF_UP\n",
67 | "from scipy.spatial.distance import euclidean\n",
68 | "from sklearn.metrics import mean_absolute_percentage_error\n",
69 | "from sklearn.preprocessing import MinMaxScaler\n",
70 | "from statsmodels.tsa.seasonal import STL\n",
71 | "\n",
72 | "# UI/UX\n",
73 | "import datetime\n",
74 | "from dateutil.relativedelta import relativedelta\n",
75 | "import ipywidgets\n",
76 | "from IPython.display import display, Markdown, HTML, Javascript\n",
77 | "from tqdm.auto import tqdm\n",
78 | "import warnings\n",
79 | "warnings.simplefilter('ignore')\n",
80 | "\n",
81 | "# causalimpact\n",
82 | "import causalimpact\n",
83 | "import tensorflow as tf\n",
84 | "import tensorflow_probability as tfp\n",
85 | "tfd = tfp.distributions\n",
86 | "\n",
87 | "class PreProcess(object):\n",
88 | " \"\"\"PreProcess handles process from data loading to visualization.\n",
89 | "\n",
90 | " Create a UI, load time series data based on input and do some\n",
91 | " transformations to pass it to analysis. This also includes visualization of\n",
92 | " points that should be confirmed in time series data.\n",
93 | "\n",
94 | " Attributes:\n",
95 | " _apply_text_style: Decorate the text\n",
96 | " define_ui: Define the UI using ipywidget\n",
97 | " generate_ui: Generates UI for input from the user\n",
98 | " load_data: Load data from any data source\n",
99 | " _load_data_from_sheet: Load data from spreadsheet\n",
100 | " _load_data_from_csv: Load data from CSV\n",
101 | " _load_data_from_bigquery: Load data from Big Query\n",
102 | " format_date: Set index\n",
103 | " _shape_wide: Configure narrow/wide conversion\n",
104 | " _trend_check: Visualize data\n",
105 | " saving_params: Save the contents entered in the UI\n",
106 | " set_params: Set the saved input contents to the instance\n",
107 | " \"\"\"\n",
108 | "\n",
109 | " def __init__(self):\n",
110 | " self.define_ui()\n",
111 | "\n",
112 | " @staticmethod\n",
113 | " def _apply_text_style(type, text):\n",
114 | " if type == 'success':\n",
115 | " return print(f\"\\033[38;2;15;157;88m \" + text + \"\\033[0m\")\n",
116 | "\n",
117 | " if type == 'failure':\n",
118 | " return print(f\"\\033[38;2;219;68;55m \" + text + \"\\033[0m\")\n",
119 | "\n",
120 | " if isinstance(type, int):\n",
121 | " span_style = ipywidgets.HTML(\n",
122 | " \"\"\n",
124 | " + text\n",
125 | " + ''\n",
126 | " )\n",
127 | " return span_style\n",
128 | "\n",
129 | " def define_ui(self):\n",
130 | " self._define_data_source_widgets()\n",
131 | " self._define_data_format_widgets()\n",
132 | " self._define_date_widgets()\n",
133 | " self._define_experimental_design_widgets()\n",
134 | " self._define_simulation_widgets()\n",
135 | "\n",
136 | " def _define_data_source_widgets(self):\n",
137 | " # Input box for data sources\n",
138 | " self.sheet_url = ipywidgets.Text(\n",
139 | " placeholder='Please enter google spreadsheet url',\n",
140 | " value='https://docs.google.com/spreadsheets/d/1dISrbX1mZHgzpsIct2QXFOWWRRJiCxDSmSzjuZz64Tw/edit#gid=0',\n",
141 | " description='spreadsheet url:',\n",
142 | " style={'description_width': 'initial'},\n",
143 | " layout=ipywidgets.Layout(width='800px'),\n",
144 | " )\n",
145 | " self.sheet_name = ipywidgets.Text(\n",
146 | " placeholder='Please enter sheet name',\n",
147 | " value='analysis_data',\n",
148 | " # value='raw_data',\n",
149 | " description='sheet name:',\n",
150 | " )\n",
151 | " self.csv_name = ipywidgets.Text(\n",
152 | " placeholder='Please enter csv name',\n",
153 | " description='csv name:',\n",
154 | " layout=ipywidgets.Layout(width='500px'),\n",
155 | " )\n",
156 | " self.bq_project_id = ipywidgets.Text(\n",
157 | " placeholder='Please enter project id',\n",
158 | " description='project id:',\n",
159 | " layout=ipywidgets.Layout(width='500px'),\n",
160 | " )\n",
161 | " self.bq_table_name = ipywidgets.Text(\n",
162 | " placeholder='Please enter table name',\n",
163 | " description='table name:',\n",
164 | " layout=ipywidgets.Layout(width='500px'),\n",
165 | " )\n",
166 | "\n",
167 | " def _define_data_format_widgets(self):\n",
168 | " # Input box for data format\n",
169 | " self.date_col = ipywidgets.Text(\n",
170 | " placeholder='Please enter date column name',\n",
171 | " value='Date',\n",
172 | " description='date column:',\n",
173 | " )\n",
174 | " self.pivot_col = ipywidgets.Text(\n",
175 | " placeholder='Please enter pivot column name',\n",
176 | " value='Geo',\n",
177 | " description='pivot column:',\n",
178 | " )\n",
179 | " self.kpi_col = ipywidgets.Text(\n",
180 | " placeholder='Please enter kpi column name',\n",
181 | " value='KPI',\n",
182 | " description='kpi column:',\n",
183 | " )\n",
184 | "\n",
185 | " def _define_experimental_design_widgets(self):\n",
186 | " # Input box for Experimental_Design-related\n",
187 | " self.exclude_cols = ipywidgets.Text(\n",
188 | " placeholder=(\n",
189 | " 'Enter comma-separated columns if any columns are not used in the'\n",
190 | " ' design.'\n",
191 | " ),\n",
192 | " description='exclude cols:',\n",
193 | " layout=ipywidgets.Layout(width='1000px'),\n",
194 | " )\n",
195 | " self.num_of_split = ipywidgets.Dropdown(\n",
196 | " options=[2, 3, 4, 5],\n",
197 | " value=2,\n",
198 | " description='split#:',\n",
199 | " disabled=False,\n",
200 | " )\n",
201 | " self.target_columns = ipywidgets.Text(\n",
202 | " placeholder='Please enter comma-separated entries',\n",
203 | " value='Tokyo, Kanagawa',\n",
204 | " description='target_cols:',\n",
205 | " layout=ipywidgets.Layout(width='500px'),\n",
206 | " )\n",
207 | " self.num_of_pick_range = ipywidgets.IntRangeSlider(\n",
208 | " value=[5, 10],\n",
209 | " min=1,\n",
210 | " max=50,\n",
211 | " step=1,\n",
212 | " description='pick range:',\n",
213 | " orientation='horizontal',\n",
214 | " readout=True,\n",
215 | " readout_format='d',\n",
216 | " )\n",
217 | " self.num_of_covariate = ipywidgets.Dropdown(\n",
218 | " options=[1, 2, 3, 4, 5],\n",
219 | " value=1,\n",
220 | " description='covariate#:',\n",
221 | " layout=ipywidgets.Layout(width='192px'),\n",
222 | " )\n",
223 | " self.target_share = ipywidgets.FloatSlider(\n",
224 | " value=0.3,\n",
225 | " min=0.05,\n",
226 | " max=0.5,\n",
227 | " step=0.05,\n",
228 | " description='target share#:',\n",
229 | " orientation='horizontal',\n",
230 | " readout=True,\n",
231 | " readout_format='.1%',\n",
232 | " )\n",
233 | " self.control_columns = ipywidgets.Text(\n",
234 | " placeholder='Please enter comma-separated entries',\n",
235 | " value='Aomori, Akita',\n",
236 | " description='control_cols:',\n",
237 | " layout=ipywidgets.Layout(width='500px'),\n",
238 | " )\n",
239 | "\n",
240 | " def _define_simulation_widgets(self):\n",
241 | " # Input box for simulation params\n",
242 | " self.num_of_seasons = ipywidgets.IntText(\n",
243 | " value=1,\n",
244 | " description='num_of_seasons:',\n",
245 | " disabled=False,\n",
246 | " style={'description_width': 'initial'},\n",
247 | " )\n",
248 | " self.estimate_icpa = ipywidgets.IntText(\n",
249 | " value=1000,\n",
250 | " description='Estimated iCPA:',\n",
251 | " style={'description_width': 'initial'},\n",
252 | " )\n",
253 | " self.credible_interval = ipywidgets.RadioButtons(\n",
254 | " options=[70, 80, 90, 95],\n",
255 | " value=90,\n",
256 | " description='Credible interval %:',\n",
257 | " style={'description_width': 'initial'},\n",
258 | " )\n",
259 | "\n",
260 | " def _define_date_widgets(self):\n",
261 | " # Input box for Date-related\n",
262 | " self.pre_period_start = ipywidgets.DatePicker(\n",
263 | " description='Pre Start:',\n",
264 | " value=datetime.date.today() - relativedelta(days=122),\n",
265 | " )\n",
266 | " self.pre_period_end = ipywidgets.DatePicker(\n",
267 | " description='Pre End:',\n",
268 | " value=datetime.date.today() - relativedelta(days=32),\n",
269 | " )\n",
270 | " self.post_period_start = ipywidgets.DatePicker(\n",
271 | " description='Post Start:',\n",
272 | " value=datetime.date.today() - relativedelta(days=31),\n",
273 | " )\n",
274 | " self.post_period_end = ipywidgets.DatePicker(\n",
275 | " description='Post End:',\n",
276 | " value=datetime.date.today(),\n",
277 | " )\n",
278 | " self.start_date = ipywidgets.DatePicker(\n",
279 | " description='Start Date:',\n",
280 | " value=datetime.date.today() - relativedelta(days=122),\n",
281 | " )\n",
282 | " self.end_date = ipywidgets.DatePicker(\n",
283 | " description='End Date:',\n",
284 | " value=datetime.date.today() - relativedelta(days=32),\n",
285 | " )\n",
286 | " self.depend_data = ipywidgets.ToggleButton(\n",
287 | " value=False,\n",
288 | " description='Click >> Use the beginning and end of data',\n",
289 | " disabled=False,\n",
290 | " button_style='info',\n",
291 | " tooltip='Description',\n",
292 | " layout=ipywidgets.Layout(width='300px'),\n",
293 | " )\n",
294 | "\n",
295 | " def generate_ui(self):\n",
296 | " self._build_source_selection_tab()\n",
297 | " self._build_data_type_selection_tab()\n",
298 | " self._build_design_type_tab()\n",
299 | " self._build_purpose_selection_tab()\n",
300 | "\n",
301 | " def _build_source_selection_tab(self):\n",
302 | " # UI for data soure\n",
303 | " self.soure_selection = ipywidgets.Tab()\n",
304 | " self.soure_selection.children = [\n",
305 | " ipywidgets.VBox([self.sheet_url, self.sheet_name]),\n",
306 | " ipywidgets.VBox([self.csv_name]),\n",
307 | " ipywidgets.VBox([self.bq_project_id, self.bq_table_name]),\n",
308 | " ]\n",
309 | " self.soure_selection.set_title(0, 'Google_Spreadsheet')\n",
310 | " self.soure_selection.set_title(1, 'CSV_file')\n",
311 | " self.soure_selection.set_title(2, 'Big_Query')\n",
312 | "\n",
313 | " def _build_data_type_selection_tab(self):\n",
314 | " # UI for data type(narrow or wide)\n",
315 | " self.data_type_selection = ipywidgets.Tab()\n",
316 | " self.data_type_selection.children = [\n",
317 | " ipywidgets.VBox([\n",
318 | " ipywidgets.Label(\n",
319 | " 'Wide, or unstacked data is presented with each different'\n",
320 | " ' data variable in a separate column.'\n",
321 | " ),\n",
322 | " self.date_col,\n",
323 | " ]),\n",
324 | " ipywidgets.VBox([\n",
325 | " ipywidgets.Label(\n",
326 | " 'Narrow, stacked, or long data is presented with one column '\n",
327 | " 'containing all the values and another column listing the '\n",
328 | " 'context of the value'\n",
329 | " ),\n",
330 | " ipywidgets.HBox([self.date_col, self.pivot_col, self.kpi_col]),\n",
331 | " ]),\n",
332 | " ]\n",
333 | " self.data_type_selection.set_title(0, 'Wide_Format')\n",
334 | " self.data_type_selection.set_title(1, 'Narrow_Format')\n",
335 | "\n",
336 | " def _build_design_type_tab(self):\n",
337 | " # UI for experimental design\n",
338 | " self.design_type = ipywidgets.Tab(\n",
339 | " children=[\n",
340 | " ipywidgets.VBox([\n",
341 | " ipywidgets.HTML(\n",
342 | " 'divide_equally divides the time series data into N'\n",
343 | " ' groups(split#) with similar movements.'\n",
344 | " ),\n",
345 | " self.num_of_split,\n",
346 | " self.exclude_cols,\n",
347 | " ]),\n",
348 | " ipywidgets.VBox([\n",
349 | " ipywidgets.HTML(\n",
350 | " 'similarity_selection extracts N groups(covariate#) that '\n",
351 | " 'move similarly to particular columns(target_cols).'\n",
352 | " ),\n",
353 | " ipywidgets.HBox([\n",
354 | " self.target_columns,\n",
355 | " self.num_of_covariate,\n",
356 | " self.num_of_pick_range,\n",
357 | " ]),\n",
358 | " self.exclude_cols,\n",
359 | " ]),\n",
360 | " ipywidgets.VBox([\n",
361 | " ipywidgets.HTML(\n",
362 | " 'target share extracts targeted time series data from'\n",
363 | " ' the proportion of interventions.'\n",
364 | " ),\n",
365 | " self.target_share,\n",
366 | " self.exclude_cols,\n",
367 | " ]),\n",
368 | " ipywidgets.VBox([\n",
369 | " ipywidgets.HTML(\n",
370 | " 'To improve reproducibility, it is important to create an'\n",
371 | " ' accurate counterfactual model rather than a balanced'\n",
372 | " ' assignment.'\n",
373 | " ),\n",
374 | " self.target_columns,\n",
375 | " self.control_columns,\n",
376 | " ]),\n",
377 | " ]\n",
378 | " )\n",
379 | " self.design_type.set_title(0, 'A: divide_equally')\n",
380 | " self.design_type.set_title(1, 'B: similarity_selection')\n",
381 | " self.design_type.set_title(2, 'C: target_share')\n",
382 | " self.design_type.set_title(3, 'D: pre-allocated')\n",
383 | "\n",
384 | " def _build_purpose_selection_tab(self):\n",
385 | " # UI for purpose (CausalImpact or Experimental Design)\n",
386 | " self.purpose_selection = ipywidgets.Tab()\n",
387 | " self.date_selection = ipywidgets.Tab()\n",
388 | " self.date_selection.children = [\n",
389 | " ipywidgets.VBox(\n",
390 | " [\n",
391 | " ipywidgets.HTML('The minimum date of the data is '\n",
392 | " 'selected as the start date.'),\n",
393 | " ipywidgets.HTML('The maximum date in the data is '\n",
394 | " 'selected as the end date.'),\n",
395 | " ]),\n",
396 | " ipywidgets.VBox(\n",
397 | " [\n",
398 | " self.start_date,\n",
399 | " self.end_date,\n",
400 | " ]\n",
401 | " )]\n",
402 | " self.date_selection.set_title(0, 'automatic selection')\n",
403 | " self.date_selection.set_title(1, 'manual input')\n",
404 | "\n",
405 | " self.purpose_selection.children = [\n",
406 | " # Causalimpact\n",
407 | " ipywidgets.VBox([\n",
408 | " PreProcess._apply_text_style(\n",
409 | " 15, '⑶ - a: Enter the Pre and Post the intervention.'\n",
410 | " ),\n",
411 | " self.pre_period_start,\n",
412 | " self.pre_period_end,\n",
413 | " self.post_period_start,\n",
414 | " self.post_period_end,\n",
415 | " PreProcess._apply_text_style(\n",
416 | " 15,\n",
417 | " '⑶ - b: Enter the number of periodicities in the'\n",
418 | " ' time series data.(default=1)',\n",
419 | " ),\n",
420 | " ipywidgets.VBox([self.num_of_seasons, self.credible_interval]),\n",
421 | " ]),\n",
422 | " # Experimental_Design\n",
423 | " ipywidgets.VBox([\n",
424 | " PreProcess._apply_text_style(\n",
425 | " 15,\n",
426 | " '⑶ - a: Please select date for experimental design',\n",
427 | " ),\n",
428 | " self.date_selection,\n",
429 | " PreProcess._apply_text_style(\n",
430 | " 15,\n",
431 | " '⑶ - b: Select the experimental design method and'\n",
432 | " ' enter the necessary items.',\n",
433 | " ),\n",
434 | " self.design_type,\n",
435 | " PreProcess._apply_text_style(\n",
436 | " 15,\n",
437 | " '⑶ - c: (Optional) Enter Estimated incremental CPA(Cost'\n",
438 | " ' of intervention ÷ Lift from intervention without bias) & the '\n",
439 | " 'number of periodicities in the time series data.',\n",
440 | " ),\n",
441 | " ipywidgets.VBox([\n",
442 | " self.estimate_icpa,\n",
443 | " self.num_of_seasons,\n",
444 | " self.credible_interval,\n",
445 | " ]),\n",
446 | " ]),\n",
447 | " ]\n",
448 | " self.purpose_selection.set_title(0, 'Causalimpact')\n",
449 | " self.purpose_selection.set_title(1, 'Experimental_Design')\n",
450 | "\n",
451 | " display(\n",
452 | " PreProcess._apply_text_style(18, '⑴ Please select a data source.'),\n",
453 | " self.soure_selection,\n",
454 | " Markdown('
'),\n",
455 | " PreProcess._apply_text_style(\n",
456 | " 18, '⑵ Please select wide or narrow data format.'\n",
457 | " ),\n",
458 | " self.data_type_selection,\n",
459 | " Markdown('
'),\n",
460 | " PreProcess._apply_text_style(\n",
461 | " 18, '⑶ Please select the purpose and set conditions.'\n",
462 | " ),\n",
463 | " self.purpose_selection,\n",
464 | " )\n",
465 | "\n",
466 | " def load_data(self):\n",
467 | " if self.soure_selection.selected_index == 0:\n",
468 | " try:\n",
469 | " self.loaded_df = self._load_data_from_sheet(\n",
470 | " self.sheet_url.value, self.sheet_name.value\n",
471 | " )\n",
472 | " except Exception as e:\n",
473 | " self._apply_text_style('failure', '\\n\\nFailure!!')\n",
474 | " print('Error: {}'.format(e))\n",
475 | " print('Please check the following:')\n",
476 | " print('* sheet url:{}'.format(self.sheet_url.value))\n",
477 | " print('* sheet name:{}'.format(self.sheet_name.value))\n",
478 | " raise Exception('Please check Failure')\n",
479 | "\n",
480 | " elif self.soure_selection.selected_index == 1:\n",
481 | " try:\n",
482 | " self.loaded_df = self._load_data_from_csv(self.csv_name.value)\n",
483 | " except Exception as e:\n",
484 | " self._apply_text_style('failure', '\\n\\nFailure!!')\n",
485 | " print('Error: {}'.format(e))\n",
486 | " print('Please check the following:')\n",
487 | " print('* There is something wrong with the CSV-related settings.')\n",
488 | " print('* CSV namel:{}'.format(self.csv_name.value))\n",
489 | " raise Exception('Please check Failure')\n",
490 | "\n",
491 | " elif self.soure_selection.selected_index == 2:\n",
492 | " try:\n",
493 | " self.loaded_df = self._load_data_from_bigquery(\n",
494 | " self.bq_project_id.value, self.bq_table_name.value\n",
495 | " )\n",
496 | " except Exception as e:\n",
497 | " self._apply_text_style('failure', '\\n\\nFailure!!')\n",
498 | " print('Error: {}'.format(e))\n",
499 | " print('Please check the following:')\n",
500 | " print('* There is something wrong with the bq-related settings.')\n",
501 | " print('* bq project id:{}'.format(self.bq_project_id.value))\n",
502 | " print('* bq table name :{}'.format(self.bq_table_name.value))\n",
503 | " raise Exception('Please check Failure')\n",
504 | "\n",
505 | " else:\n",
506 | " raise Exception('Please select a data souce at Step.1-2.')\n",
507 | "\n",
508 | " self._apply_text_style(\n",
509 | " 'success',\n",
510 | " 'Success! The target data has been loaded.')\n",
511 | " display(self.loaded_df.head(3))\n",
512 | "\n",
513 | " @staticmethod\n",
514 | " def _load_data_from_sheet(spreadsheet_url, sheet_name):\n",
515 | " \"\"\"load_data_from_sheet load data from spreadsheet.\n",
516 | "\n",
517 | " Args:\n",
518 | " spreadsheet_url: Spreadsheet url with data.\n",
519 | " sheet_name: Sheet name with data.\n",
520 | " \"\"\"\n",
521 | " auth.authenticate_user()\n",
522 | " creds, _ = default()\n",
523 | " gc = gspread.authorize(creds)\n",
524 | " _workbook = gc.open_by_url(spreadsheet_url)\n",
525 | " _worksheet = _workbook.worksheet(sheet_name)\n",
526 | " df_sheet = pd.DataFrame(_worksheet.get_all_values())\n",
527 | " df_sheet.columns = list(df_sheet.loc[0, :])\n",
528 | " df_sheet.drop(0, inplace=True)\n",
529 | " df_sheet.reset_index(drop=True, inplace=True)\n",
530 | " df_sheet.replace(',', '', regex=True, inplace=True)\n",
531 | " df_sheet.rename(columns=lambda x: x.replace(\" \", \"\"), inplace=True)\n",
532 | " df_sheet = df_sheet.apply(pd.to_numeric, errors='ignore')\n",
533 | " return df_sheet\n",
534 | "\n",
535 | " @staticmethod\n",
536 | " def _load_data_from_csv(csv_name):\n",
537 | " \"\"\"load_data_from_csv read data from csv.\n",
538 | "\n",
539 | " Args:\n",
540 | " csv_name: csv file name.\n",
541 | " \"\"\"\n",
542 | " uploaded = files.upload()\n",
543 | " df_csv = pd.read_csv(io.BytesIO(uploaded[csv_name]))\n",
544 | " df_csv.replace(',', '', regex=True, inplace=True)\n",
545 | " df_csv.rename(columns=lambda x: x.replace(\" \", \"\"), inplace=True)\n",
546 | " df_csv = df_csv.apply(pd.to_numeric, errors='ignore')\n",
547 | " return df_csv\n",
548 | "\n",
549 | " @staticmethod\n",
550 | " def _load_data_from_bigquery(bq_project_id, bq_table_name):\n",
551 | " \"\"\"_load_data_from_bigquery load data from bigquery.\n",
552 | "\n",
553 | " Args:\n",
554 | " bq_project_id: bigquery project id.\n",
555 | " bq_table_name: bigquery table name\n",
556 | " \"\"\"\n",
557 | " auth.authenticate_user()\n",
558 | " client = bigquery.Client(project=bq_project_id)\n",
559 | " query = 'SELECT * FROM `' + bq_table_name + '`;'\n",
560 | " df_bq = client.query(query).to_dataframe()\n",
561 | " df_bq.replace(',', '', regex=True, inplace=True)\n",
562 | " df_bq.rename(columns=lambda x: x.replace(\" \", \"\"), inplace=True)\n",
563 | " df_bq = df_bq.apply(pd.to_numeric, errors='ignore')\n",
564 | " return df_bq\n",
565 | "\n",
566 | " def format_data(self):\n",
567 | " \"\"\"Formats the loaded data for causal impact analysis or experimental design.\n",
568 | "\n",
569 | " This method performs several data transformation steps:\n",
570 | " 1. Cleans column names by removing spaces from `date_col`, `pivot_col`, and `kpi_col`.\n",
571 | " 2. Converts the data to a wide format if specified by `data_type_selection`.\n",
572 | " 3. Drops columns specified in `exclude_cols`.\n",
573 | " 4. Converts the date column to datetime objects and sets it as the DataFrame index.\n",
574 | " 5. Reindexes the DataFrame to ensure a continuous date range from the minimum to maximum date.\n",
575 | " 6. Calculates `tick_count` for visualization purposes.\n",
576 | " 7. Provides visual feedback on the data formatting success or failure.\n",
577 | " 8. Displays an overview of the formatted data, including index, date range, and missing values.\n",
578 | " 9. Visualizes data trends (total and individual) and descriptive statistics.\n",
579 | "\n",
580 | " Raises:\n",
581 | " Exception: If any error occurs during data formatting, often due to\n",
582 | " mismatched data format selection (wide/narrow) or incorrect\n",
583 | " column names. Provides specific error messages to guide debugging.\n",
584 | " \"\"\"\n",
585 | " self.date_col_name = self.date_col.value.replace(' ', '')\n",
586 | " self.pivot_col_name = self.pivot_col.value.replace(' ', '')\n",
587 | " self.kpi_col_name = self.kpi_col.value.replace(' ', '')\n",
588 | "\n",
589 | " try:\n",
590 | " if self.data_type_selection.selected_index == 0:\n",
591 | " self.formatted_data = self.loaded_df.copy()\n",
592 | " elif self.data_type_selection.selected_index == 1:\n",
593 | " self.formatted_data = self._shape_wide(\n",
594 | " self.loaded_df,\n",
595 | " self.date_col_name,\n",
596 | " self.pivot_col_name,\n",
597 | " self.kpi_col_name,\n",
598 | " )\n",
599 | "\n",
600 | " self.formatted_data.drop(\n",
601 | " self.exclude_cols.value.replace(', ', ',').split(','),\n",
602 | " axis=1,\n",
603 | " errors='ignore',\n",
604 | " inplace=True,\n",
605 | " )\n",
606 | " self.formatted_data[self.date_col_name] = pd.to_datetime(\n",
607 | " self.formatted_data[self.date_col_name]\n",
608 | " )\n",
609 | " self.formatted_data = self.formatted_data.set_index(self.date_col_name)\n",
610 | " self.formatted_data = self.formatted_data.reindex(\n",
611 | " pd.date_range(\n",
612 | " start=self.formatted_data.index.min(),\n",
613 | " end=self.formatted_data.index.max(),\n",
614 | " name=self.formatted_data.index.name))\n",
615 | " self.tick_count = len(self.formatted_data.resample('M')) - 1\n",
616 | " self._apply_text_style(\n",
617 | " 'success',\n",
618 | " '\\nSuccess! The data was formatted for analysis.'\n",
619 | " )\n",
620 | " display(self.formatted_data.head(3))\n",
621 | " self._apply_text_style(\n",
622 | " 'failure',\n",
623 | " '\\nCheck! Here is an overview of the data.'\n",
624 | " )\n",
625 | " print(\n",
626 | " 'Index name:{} | The earliest date: {} | The latest date: {}'.format(\n",
627 | " self.formatted_data.index.name,\n",
628 | " min(self.formatted_data.index),\n",
629 | " max(self.formatted_data.index)\n",
630 | " ))\n",
631 | " print('* Rows with missing values')\n",
632 | " self.missing_row = self.formatted_data[\n",
633 | " self.formatted_data.isnull().any(axis=1)]\n",
634 | " if len(self.missing_row) > 0:\n",
635 | " self.missing_row\n",
636 | " else:\n",
637 | " print('>> Does not include missing values')\n",
638 | "\n",
639 | " self._apply_text_style(\n",
640 | " 'failure',\n",
641 | " '\\nCheck! below [total_trend] / [each_trend] / [describe_data]'\n",
642 | " )\n",
643 | " self._trend_check(\n",
644 | " self.formatted_data,\n",
645 | " self.date_col_name,\n",
646 | " self.tick_count)\n",
647 | "\n",
648 | " except Exception as e:\n",
649 | " self._apply_text_style('failure', '\\n\\nFailure!!')\n",
650 | " print('Error: {}'.format(e))\n",
651 | " self._apply_text_style('failure', '\\nPlease check the following:')\n",
652 | " if self.data_type_selection.selected_index == 0:\n",
653 | " print('* Your selected data format: Wide format at (2)')\n",
654 | " print('1. Check if the data source is wide.')\n",
655 | " print('2. Compare \"date column\"( {} ) and \"data source\"'.format(\n",
656 | " self.date_col.value))\n",
657 | " print('\\n\\n')\n",
658 | " else:\n",
659 | " print('* Your selected data format: Narrow format at (2)')\n",
660 | " print('1. Check if the data source is narrow.')\n",
661 | " print('2. Compare \"your input\" and \"data source')\n",
662 | " print('>> date column: {}'.format(self.date_col.value))\n",
663 | " print('>> pivot column: {}'.format(self.pivot_col.value))\n",
664 | " print('>> kpi column: {}'.format(self.kpi_col.value))\n",
665 | " print('\\n\\n')\n",
666 | " raise Exception('Please check Failure')\n",
667 | "\n",
668 | " @staticmethod\n",
669 | " def _shape_wide(dataframe, date_column, pivot_column, kpi_column):\n",
670 | " \"\"\"shape_wide pivots the data in the specified column.\n",
671 | "\n",
672 | " Converts long data to wide data suitable for experiment design.\n",
673 | "\n",
674 | " Args:\n",
675 | " dataframe: The DataFrame to be pivoted.\n",
676 | " date_column: The name of the column that contains the dates.\n",
677 | " pivot_column: The name of the column that contains the pivot keys.\n",
678 | " kpi_column: The name of the column that contains the KPI values.\n",
679 | "\n",
680 | " Returns:\n",
681 | " A DataFrame with the pivoted data.\n",
682 | " \"\"\"\n",
683 | " # Check if the pivot_column is a single column or a list of columns.\n",
684 | " if ',' in pivot_column:\n",
685 | " group_cols = pivot_column.replace(', ', ',').split(',')\n",
686 | " else:\n",
687 | " group_cols = [pivot_column]\n",
688 | "\n",
689 | " pivoted_df = pd.pivot_table(\n",
690 | " (dataframe[[date_column] + [kpi_column] + group_cols])\n",
691 | " .groupby([date_column] + group_cols)\n",
692 | " .sum(),\n",
693 | " index=date_column,\n",
694 | " columns=group_cols,\n",
695 | " fill_value=0,\n",
696 | " )\n",
697 | " # Drop the first level of the column names.\n",
698 | " pivoted_df.columns = pivoted_df.columns.droplevel(0)\n",
699 | " # If there are multiple columns, convert the column names to a single string.\n",
700 | " if len(pivoted_df.columns.names) > 1:\n",
701 | " new_cols = [\n",
702 | " '_'.join([x.replace(',', '_') for x in y])\n",
703 | " for y in pivoted_df.columns.values\n",
704 | " ]\n",
705 | " pivoted_df.columns = new_cols\n",
706 | " pivoted_df = pivoted_df.reset_index()\n",
707 | " return pivoted_df\n",
708 | "\n",
709 | " @staticmethod\n",
710 | " def _trend_check(dataframe, date_col_name, tick_count):\n",
711 | " \"\"\"trend_check visualize daily trend, 7-day moving average\n",
712 | "\n",
713 | " Args:\n",
714 | " dataframe: Wide data to check the trend\n",
715 | " date_col_name: xxx\n",
716 | " \"\"\"\n",
717 | " df_each = pd.DataFrame(index=dataframe.index)\n",
718 | " col_list = list(dataframe.columns)\n",
719 | " for i in col_list:\n",
720 | " min_max = (\n",
721 | " dataframe[i] - dataframe[i].min()\n",
722 | " ) / (dataframe[i].max() - dataframe[i].min())\n",
723 | " df_each = pd.concat([df_each, min_max], axis = 1)\n",
724 | "\n",
725 | " metric = 'dtw'\n",
726 | " n_clusters = 5\n",
727 | " tskm_base = TimeSeriesKMeans(n_clusters=n_clusters, metric=metric,\n",
728 | " max_iter=100, random_state=42)\n",
729 | " df_cluster = pd.DataFrame({\n",
730 | " \"pivot\": col_list,\n",
731 | " \"cluster\": tskm_base.fit_predict(df_each.T).tolist()})\n",
732 | " cluster_counts = (\n",
733 | " df_cluster[\"cluster\"].value_counts().sort_values(ascending=True))\n",
734 | "\n",
735 | " cluster_text = []\n",
736 | " line_each = []\n",
737 | " for i in cluster_counts.index:\n",
738 | " clust_list = df_cluster.query(\"cluster == @i\")[\"pivot\"].to_list()\n",
739 | " source = df_each.filter(items=clust_list)\n",
740 | " cluster_text.append(str(clust_list).translate(\n",
741 | " str.maketrans({'[': '', ']': '', \"'\": ''})))\n",
742 | " line_each.append(\n",
743 | " alt.Chart(source.reset_index())\n",
744 | " .transform_fold(fold=clust_list, as_=['pivot', 'kpi'])\n",
745 | " .mark_line()\n",
746 | " .encode(\n",
747 | " alt.X(\n",
748 | " date_col_name + ':T',\n",
749 | " title=None,\n",
750 | " axis=alt.Axis(\n",
751 | " grid=False, format='%Y %b', tickCount=tick_count\n",
752 | " ),\n",
753 | " ),\n",
754 | " alt.Y('kpi:Q', stack=None, axis=None),\n",
755 | " alt.Color(str(i) + ':N', title=None, legend=None),\n",
756 | " alt.Row(\n",
757 | " 'pivot:N',\n",
758 | " title=None,\n",
759 | " header=alt.Header(labelAngle=0, labelAlign='left'),\n",
760 | " ),\n",
761 | " )\n",
762 | " .properties(bounds='flush', height=30)\n",
763 | " .configure_facet(spacing=0)\n",
764 | " .configure_view(stroke=None)\n",
765 | " .configure_title(anchor='end')\n",
766 | " )\n",
767 | "\n",
768 | " df_long = (\n",
769 | " pd.melt(dataframe.reset_index(), id_vars=date_col_name)\n",
770 | " .groupby(date_col_name)\n",
771 | " .sum(numeric_only=True)\n",
772 | " .reset_index()\n",
773 | " )\n",
774 | " line_total = (\n",
775 | " alt.Chart(df_long)\n",
776 | " .mark_line()\n",
777 | " .encode(\n",
778 | " x=alt.X(\n",
779 | " date_col_name + ':T',\n",
780 | " axis=alt.Axis(\n",
781 | " title='', format='%Y %b', tickCount=tick_count\n",
782 | " ),\n",
783 | " ),\n",
784 | " y=alt.Y('value:Q', axis=alt.Axis(title='kpi')),\n",
785 | " color=alt.value('#4285F4'),\n",
786 | " )\n",
787 | " )\n",
788 | " moving_average = (\n",
789 | " alt.Chart(df_long)\n",
790 | " .transform_window(\n",
791 | " rolling_mean='mean(value)',\n",
792 | " frame=[-4, 3],\n",
793 | " )\n",
794 | " .mark_line()\n",
795 | " .encode(\n",
796 | " x=alt.X(date_col_name + ':T'),\n",
797 | " y=alt.Y('rolling_mean:Q'),\n",
798 | " color=alt.value('#DB4437'),\n",
799 | " )\n",
800 | " )\n",
801 | " tab_total_trend = ipywidgets.Output()\n",
802 | " tab_each_trend = ipywidgets.Output()\n",
803 | " tab_describe_data = ipywidgets.Output()\n",
804 | " tab_result = ipywidgets.Tab(children = [\n",
805 | " tab_total_trend,\n",
806 | " tab_each_trend,\n",
807 | " tab_describe_data,\n",
808 | " ])\n",
809 | " tab_result.set_title(0, '>> total_trend')\n",
810 | " tab_result.set_title(1, '>> each_trend')\n",
811 | " tab_result.set_title(2, '>> describe_data')\n",
812 | " display(tab_result)\n",
813 | " with tab_total_trend:\n",
814 | " display(\n",
815 | " (line_total + moving_average).properties(\n",
816 | " width=700,\n",
817 | " height=200,\n",
818 | " title={\n",
819 | " 'text': ['Daily Trend(blue) & 7days moving average(red)'],\n",
820 | " },\n",
821 | " )\n",
822 | " )\n",
823 | " with tab_each_trend:\n",
824 | " for i in range(len(cluster_text)):\n",
825 | " print('cluster {}:{}'.format(i, cluster_text[i]))\n",
826 | " display(line_each[i].properties(width=700))\n",
827 | " with tab_describe_data:\n",
828 | " display(dataframe.describe(include='all'))\n",
829 | "\n",
830 | " @staticmethod\n",
831 | " def saving_params(instance):\n",
832 | " params_dict = {\n",
833 | " # section for data source\n",
834 | " 'soure_selection': instance.soure_selection.selected_index,\n",
835 | " 'sheet_url': instance.sheet_url.value,\n",
836 | " 'sheet_name': instance.sheet_name.value,\n",
837 | " 'csv_name': instance.csv_name.value,\n",
838 | " 'bq_project_id': instance.bq_project_id.value,\n",
839 | " 'bq_table_name': instance.bq_table_name.value,\n",
840 | "\n",
841 | " # section for data format(narrow or wide)\n",
842 | " 'data_type_selection': instance.data_type_selection.selected_index,\n",
843 | " 'date_col': instance.date_col.value,\n",
844 | " 'pivot_col': instance.pivot_col.value,\n",
845 | " 'kpi_col': instance.kpi_col.value,\n",
846 | "\n",
847 | " # section for porpose(CausalImpact or Experimental Design)\n",
848 | " 'purpose_selection': instance.purpose_selection.selected_index,\n",
849 | " 'pre_period_start': instance.pre_period_start.value,\n",
850 | " 'pre_period_end': instance.pre_period_end.value,\n",
851 | " 'post_period_start': instance.post_period_start.value,\n",
852 | " 'post_period_end': instance.post_period_end.value,\n",
853 | " 'start_date': instance.start_date.value,\n",
854 | " 'end_date': instance.end_date.value,\n",
855 | " 'depend_data': instance.depend_data.value,\n",
856 | "\n",
857 | " 'design_type': instance.design_type.selected_index,\n",
858 | " 'num_of_split': instance.num_of_split.value,\n",
859 | " 'target_columns': instance.target_columns.value,\n",
860 | " 'control_columns': instance.control_columns.value,\n",
861 | " 'num_of_pick_range': instance.num_of_pick_range.value,\n",
862 | " 'num_of_covariate': instance.num_of_covariate.value,\n",
863 | " 'target_share': instance.target_share.value,\n",
864 | " 'exclude_cols': instance.exclude_cols.value,\n",
865 | "\n",
866 | " 'num_of_seasons': instance.num_of_seasons.value,\n",
867 | " 'estimate_icpa': instance.estimate_icpa.value,\n",
868 | " 'credible_interval': instance.credible_interval.value,\n",
869 | " }\n",
870 | " return params_dict\n",
871 | "\n",
872 | " @staticmethod\n",
873 | " def set_params(instance, dict_params):\n",
874 | " # section for data source\n",
875 | " instance.soure_selection.selected_index = dict_params['soure_selection']\n",
876 | " instance.sheet_url.value = dict_params['sheet_url']\n",
877 | " instance.sheet_name.value = dict_params['sheet_name']\n",
878 | " instance.csv_name.value = dict_params['csv_name']\n",
879 | " instance.bq_project_id.value = dict_params['bq_project_id']\n",
880 | " instance.bq_table_name.value = dict_params['bq_table_name']\n",
881 | "\n",
882 | " # section for data format(narrow or wide)\n",
883 | " instance.data_type_selection.selected_index = dict_params['data_type_selection']\n",
884 | " instance.date_col.value = dict_params['date_col']\n",
885 | " instance.pivot_col.value = dict_params['pivot_col']\n",
886 | " instance.kpi_col.value = dict_params['kpi_col']\n",
887 | "\n",
888 | " # section for porpose(CausalImpact or Experimental Design)\n",
889 | " instance.purpose_selection.selected_index = dict_params['purpose_selection']\n",
890 | " instance.pre_period_start.value = dict_params['pre_period_start']\n",
891 | " instance.pre_period_end.value = dict_params['pre_period_end']\n",
892 | " instance.post_period_start.value = dict_params['post_period_start']\n",
893 | " instance.post_period_end.value = dict_params['post_period_end']\n",
894 | " instance.start_date.value = dict_params['start_date']\n",
895 | " instance.end_date.value = dict_params['end_date']\n",
896 | " instance.depend_data.value = dict_params['depend_data']\n",
897 | "\n",
898 | " instance.design_type.selected_index = dict_params['design_type']\n",
899 | " instance.num_of_split.value = dict_params['num_of_split']\n",
900 | " instance.target_columns.value = dict_params['target_columns']\n",
901 | " instance.control_columns.value = dict_params['control_columns']\n",
902 | " instance.num_of_pick_range.value = dict_params['num_of_pick_range']\n",
903 | " instance.num_of_covariate.value = dict_params['num_of_covariate']\n",
904 | " instance.target_share.value = dict_params['target_share']\n",
905 | " instance.exclude_cols.value = dict_params['exclude_cols']\n",
906 | "\n",
907 | " instance.num_of_seasons.value = dict_params['num_of_seasons']\n",
908 | " instance.estimate_icpa.value = dict_params['estimate_icpa']\n",
909 | " instance.credible_interval.value = dict_params['credible_interval']\n",
910 | "\n",
911 | "# @title dev\n",
912 | "class CausalImpact(PreProcess):\n",
913 | " \"\"\"CausalImpact analysis and experimental design on CausalImpact.\n",
914 | "\n",
915 | " CausalImpact Analysis performs a CausalImpact analysis on the given data and\n",
916 | " outputs the results. The experimental design will be based on N partitions,\n",
917 | " similarity, or share, with 1000 iterations of random sampling, and will output\n",
918 | " the three candidate groups with the closest DTW distance. A combination of\n",
919 | " increments and periods will be used to simulate and return which combination\n",
920 | " will result in a significantly different validation.\n",
921 | "\n",
922 | " Attributes:\n",
923 | " run_causalImpact: Runs CausalImpact on the given case.\n",
924 | " create_causalimpact_object:\n",
925 | " display_causalimpact_result:\n",
926 | " plot_causalimpact:\n",
927 | "\n",
928 | " Returns:\n",
929 | " The CausalImpact object.\n",
930 | " \"\"\"\n",
931 | "\n",
932 | " colors = [\n",
933 | " '#DB4437',\n",
934 | " '#AB47BC',\n",
935 | " '#4285F4',\n",
936 | " '#00ACC1',\n",
937 | " '#0F9D58',\n",
938 | " '#9E9D24',\n",
939 | " '#F4B400',\n",
940 | " '#FF7043',\n",
941 | " ]\n",
942 | " NUM_OF_ITERATION = 1000\n",
943 | " COMBINATION_TARGET = 10\n",
944 | " TREAT_DURATION = [14, 21, 28]\n",
945 | " TREAT_IMPACT = [1, 1.01, 1.03, 1.05, 1.10, 1.15]\n",
946 | " MAX_STRING_LENGTH = 150\n",
947 | "\n",
948 | " def __init__(self):\n",
949 | " super().__init__()\n",
950 | "\n",
951 | " def run_causalImpact(self):\n",
952 | " self.ci_objs = []\n",
953 | " try:\n",
954 | " self.ci_obj = self.create_causalimpact_object(\n",
955 | " self.formatted_data,\n",
956 | " self.date_col_name,\n",
957 | " self.pre_period_start.value,\n",
958 | " self.pre_period_end.value,\n",
959 | " self.post_period_start.value,\n",
960 | " self.post_period_end.value,\n",
961 | " self.num_of_seasons.value,\n",
962 | " self.credible_interval.value,\n",
963 | " )\n",
964 | " self.ci_objs.append(self.ci_obj)\n",
965 | " self._apply_text_style(\n",
966 | " 'success',\n",
967 | " '\\nSuccess! CausalImpact has been performed. Check the'\n",
968 | " ' results in the next cell.',\n",
969 | " )\n",
970 | "\n",
971 | " except Exception as e:\n",
972 | " self._apply_text_style('failure', '\\n\\nFailure!!')\n",
973 | " print('Error: {}'.format(e))\n",
974 | " print('Please check the following:')\n",
975 | " print('* Date source.')\n",
976 | " print('* Date Column Name.')\n",
977 | " print('* Duration of experiment (pre and post).')\n",
978 | " raise Exception('Please check Failure')\n",
979 | "\n",
980 | " @staticmethod\n",
981 | " def create_causalimpact_object(\n",
982 | " data,\n",
983 | " date_col,\n",
984 | " pre_start,\n",
985 | " pre_end,\n",
986 | " post_start,\n",
987 | " post_end,\n",
988 | " num_of_seasons,\n",
989 | " credible_interval):\n",
990 | " if data.index.name != date_col: data.set_index(date_col, inplace=True)\n",
991 | "\n",
992 | " if num_of_seasons == 1:\n",
993 | " causalimpact_object = causalimpact.fit_causalimpact(\n",
994 | " data=data,\n",
995 | " pre_period=(str(pre_start), str(pre_end)),\n",
996 | " post_period=(str(post_start), str(post_end)),\n",
997 | " alpha= 1 - credible_interval / 100,\n",
998 | " )\n",
999 | " else:\n",
1000 | " causalimpact_object = causalimpact.fit_causalimpact(\n",
1001 | " data=data,\n",
1002 | " pre_period=(str(pre_start), str(pre_end)),\n",
1003 | " post_period=(str(post_start), str(post_end)),\n",
1004 | " alpha= 1 - credible_interval / 100,\n",
1005 | " model_options=causalimpact.ModelOptions(\n",
1006 | " seasons=[\n",
1007 | " causalimpact.Seasons(num_seasons=num_of_seasons),\n",
1008 | " ]\n",
1009 | " ),\n",
1010 | " )\n",
1011 | " return causalimpact_object\n",
1012 | "\n",
1013 | " def display_causalimpact_result(self):\n",
1014 | " print('Test & Control Time Series')\n",
1015 | " line = (\n",
1016 | " alt.Chart(self.formatted_data.reset_index())\n",
1017 | " .transform_fold(list(self.formatted_data.columns))\n",
1018 | " .mark_line()\n",
1019 | " .encode(\n",
1020 | " alt.X(\n",
1021 | " self.date_col_name + ':T',\n",
1022 | " title=None,\n",
1023 | " axis=alt.Axis(format='%Y %b', tickCount=self.tick_count),\n",
1024 | " ),\n",
1025 | " y=alt.Y('value:Q', axis=alt.Axis(title='kpi')),\n",
1026 | " color=alt.Color(\n",
1027 | " 'key:N',\n",
1028 | " legend=alt.Legend(\n",
1029 | " title=None,\n",
1030 | " orient='none',\n",
1031 | " legendY=-20,\n",
1032 | " direction='horizontal',\n",
1033 | " titleAnchor='start',\n",
1034 | " ),\n",
1035 | " scale=alt.Scale(\n",
1036 | " domain=list(self.formatted_data.columns),\n",
1037 | " range=CausalImpact.colors,\n",
1038 | " ),\n",
1039 | " ),\n",
1040 | " )\n",
1041 | " .properties(height=200, width=600)\n",
1042 | " )\n",
1043 | " rule = (\n",
1044 | " alt.Chart(\n",
1045 | " pd.DataFrame({\n",
1046 | " 'Date': [\n",
1047 | " str(self.post_period_start.value),\n",
1048 | " str(self.post_period_end.value)\n",
1049 | " ],\n",
1050 | " 'color': ['red', 'orange'],\n",
1051 | " })\n",
1052 | " )\n",
1053 | " .mark_rule(strokeDash=[5, 5])\n",
1054 | " .encode(x='Date:T', color=alt.Color('color:N', scale=None))\n",
1055 | " )\n",
1056 | " display((line+rule).properties(height=200, width=600))\n",
1057 | " print('=' * 100)\n",
1058 | "\n",
1059 | " self.plot_causalimpact(\n",
1060 | " self.ci_objs[0],\n",
1061 | " self.pre_period_start.value,\n",
1062 | " self.pre_period_end.value,\n",
1063 | " self.post_period_start.value,\n",
1064 | " self.post_period_end.value,\n",
1065 | " self.credible_interval.value,\n",
1066 | " self.date_col_name,\n",
1067 | " self.tick_count,\n",
1068 | " self.purpose_selection.selected_index\n",
1069 | " )\n",
1070 | "\n",
1071 | " @staticmethod\n",
1072 | " def plot_causalimpact(\n",
1073 | " causalimpact_object,\n",
1074 | " pre_start,\n",
1075 | " pre_end,\n",
1076 | " tread_start,\n",
1077 | " treat_end,\n",
1078 | " credible_interval,\n",
1079 | " date_col_name,\n",
1080 | " tick_count,\n",
1081 | " purpose_selection\n",
1082 | " ):\n",
1083 | " causalimpact_df = causalimpact_object.series#.copy()\n",
1084 | " mape = mean_absolute_percentage_error(\n",
1085 | " causalimpact_df['observed'][str(pre_start) : str(pre_end)],\n",
1086 | " causalimpact_df['posterior_mean'][str(pre_start) : str(pre_end)],\n",
1087 | " )\n",
1088 | " threshold = round(1 - credible_interval / 100, 2)\n",
1089 | "\n",
1090 | " line_1 = (\n",
1091 | " alt.Chart(causalimpact_df.reset_index())\n",
1092 | " .transform_fold([\n",
1093 | " 'observed',\n",
1094 | " 'posterior_mean',\n",
1095 | " ])\n",
1096 | " .mark_line()\n",
1097 | " .encode(\n",
1098 | " x=alt.X(\n",
1099 | " 'yearmonthdate(' + date_col_name + ')',\n",
1100 | " axis=alt.Axis(\n",
1101 | " title='',\n",
1102 | " labels=False,\n",
1103 | " ticks=False,\n",
1104 | " format='%Y %b',\n",
1105 | " tickCount=tick_count,\n",
1106 | " ),\n",
1107 | " ),\n",
1108 | " y=alt.Y(\n",
1109 | " 'value:Q',\n",
1110 | " scale=alt.Scale(zero=False),\n",
1111 | " axis=alt.Axis(title=''),\n",
1112 | " ),\n",
1113 | " color=alt.Color(\n",
1114 | " 'key:N',\n",
1115 | " legend=alt.Legend(\n",
1116 | " title=None,\n",
1117 | " orient='none',\n",
1118 | " legendY=-20,\n",
1119 | " direction='horizontal',\n",
1120 | " titleAnchor='start',\n",
1121 | " ),\n",
1122 | " sort=['posterior_mean', 'observed'],\n",
1123 | " ),\n",
1124 | " strokeDash=alt.condition(\n",
1125 | " alt.datum.key == 'posterior_mean',\n",
1126 | " alt.value([5, 5]),\n",
1127 | " alt.value([0]),\n",
1128 | " ),\n",
1129 | " )\n",
1130 | " )\n",
1131 | " area_1 = (\n",
1132 | " alt.Chart(causalimpact_df.reset_index())\n",
1133 | " .mark_area(opacity=0.3)\n",
1134 | " .encode(\n",
1135 | " x=alt.X('yearmonthdate(' + date_col_name + ')'),\n",
1136 | " y=alt.Y('posterior_lower:Q', scale=alt.Scale(zero=False)),\n",
1137 | " y2=alt.Y2('posterior_upper:Q'),\n",
1138 | " )\n",
1139 | " )\n",
1140 | " line_2 = (\n",
1141 | " alt.Chart(causalimpact_df.reset_index())\n",
1142 | " .mark_line(strokeDash=[5, 5])\n",
1143 | " .encode(\n",
1144 | " x=alt.X(\n",
1145 | " 'yearmonthdate(' + date_col_name + ')',\n",
1146 | " axis=alt.Axis(\n",
1147 | " title='',\n",
1148 | " labels=False,\n",
1149 | " ticks=False,\n",
1150 | " format='%Y %b',\n",
1151 | " tickCount=tick_count,\n",
1152 | " ),\n",
1153 | " ),\n",
1154 | " y=alt.Y(\n",
1155 | " 'point_effects_mean:Q',\n",
1156 | " scale=alt.Scale(zero=False),\n",
1157 | " axis=alt.Axis(title=''),\n",
1158 | " ),\n",
1159 | " )\n",
1160 | " )\n",
1161 | " area_2 = (\n",
1162 | " alt.Chart(causalimpact_df.reset_index())\n",
1163 | " .mark_area(opacity=0.3)\n",
1164 | " .encode(\n",
1165 | " x=alt.X('yearmonthdate(' + date_col_name + ')'),\n",
1166 | " y=alt.Y('point_effects_lower:Q', scale=alt.Scale(zero=False)),\n",
1167 | " y2=alt.Y2('point_effects_upper:Q'),\n",
1168 | " )\n",
1169 | " )\n",
1170 | " line_3 = (\n",
1171 | " alt.Chart(causalimpact_df.reset_index())\n",
1172 | " .mark_line(strokeDash=[5, 5])\n",
1173 | " .encode(\n",
1174 | " x=alt.X(\n",
1175 | " 'yearmonthdate(' + date_col_name + ')',\n",
1176 | " axis=alt.Axis(title='', format='%Y %b', tickCount=tick_count),\n",
1177 | " ),\n",
1178 | " y=alt.Y(\n",
1179 | " 'cumulative_effects_mean:Q',\n",
1180 | " scale=alt.Scale(zero=False),\n",
1181 | " axis=alt.Axis(title=''),\n",
1182 | " ),\n",
1183 | " )\n",
1184 | " )\n",
1185 | " area_3 = (\n",
1186 | " alt.Chart(causalimpact_df.reset_index())\n",
1187 | " .mark_area(opacity=0.3)\n",
1188 | " .encode(\n",
1189 | " x=alt.X(\n",
1190 | " 'yearmonthdate(' + date_col_name + ')',\n",
1191 | " axis=alt.Axis(title='')),\n",
1192 | " y=alt.Y('cumulative_effects_lower:Q', scale=alt.Scale(zero=False),\n",
1193 | " axis=alt.Axis(title='')),\n",
1194 | " y2=alt.Y2('cumulative_effects_upper:Q'),\n",
1195 | " )\n",
1196 | " )\n",
1197 | " zero_line = (\n",
1198 | " alt.Chart(pd.DataFrame({'y': [0]}))\n",
1199 | " .mark_rule()\n",
1200 | " .encode(y='y', color=alt.value('gray'))\n",
1201 | " )\n",
1202 | " rules = (\n",
1203 | " alt.Chart(\n",
1204 | " pd.DataFrame({\n",
1205 | " 'Date': [str(tread_start), str(treat_end)],\n",
1206 | " 'color': ['red', 'orange'],\n",
1207 | " })\n",
1208 | " )\n",
1209 | " .mark_rule(strokeDash=[5, 5])\n",
1210 | " .encode(x='Date:T', color=alt.Color('color:N', scale=None))\n",
1211 | " )\n",
1212 | " watermark = alt.Chart(pd.DataFrame([1])).mark_text(\n",
1213 | " align='center',\n",
1214 | " dx=0,\n",
1215 | " dy=0,\n",
1216 | " fontSize=48,\n",
1217 | " text='mock experiment',\n",
1218 | " color='red'\n",
1219 | " ).encode(\n",
1220 | " opacity=alt.value(0.5)\n",
1221 | " )\n",
1222 | " if purpose_selection == 1:\n",
1223 | " cumulative = line_3 + area_3 + rules + zero_line + watermark\n",
1224 | " elif causalimpact_object.summary.p_value.average >= threshold:\n",
1225 | " cumulative = area_3 + rules + zero_line\n",
1226 | " else:\n",
1227 | " cumulative = line_3 + area_3 + rules + zero_line\n",
1228 | " plot = alt.vconcat(\n",
1229 | " (line_1 + area_1 + rules).properties(height=100, width=600),\n",
1230 | " (line_2 + area_2 + rules + zero_line).properties(height=100, width=600),\n",
1231 | " (cumulative).properties(height=100, width=600),\n",
1232 | " )\n",
1233 | "\n",
1234 | " tab_data = ipywidgets.Output()\n",
1235 | " tab_report = ipywidgets.Output()\n",
1236 | " tab_summary = ipywidgets.Output()\n",
1237 | " tab_result = ipywidgets.Tab(children = [tab_summary, tab_report, tab_data])\n",
1238 | " tab_result.set_title(0, '>> summary')\n",
1239 | " tab_result.set_title(1, '>> report')\n",
1240 | " tab_result.set_title(2, '>> data')\n",
1241 | " with tab_summary:\n",
1242 | " print('Approximate model accuracy >> MAPE:{:.2%}'.format(mape))\n",
1243 | " if mape <= 0.05:\n",
1244 | " PreProcess._apply_text_style(\n",
1245 | " 'success',\n",
1246 | " 'Very Good: The difference between actual and predicted values is slight.')\n",
1247 | " elif mape <= 0.10:\n",
1248 | " PreProcess._apply_text_style(\n",
1249 | " 'success',\n",
1250 | " 'Good: The difference between the actual and predicted values is within the acceptable range.')\n",
1251 | " elif mape <= 0.15:\n",
1252 | " PreProcess._apply_text_style(\n",
1253 | " 'failure',\n",
1254 | " 'Medium: he difference between the actual and predicted values ismoderate, so this is only a reference value.')\n",
1255 | " else:\n",
1256 | " PreProcess._apply_text_style(\n",
1257 | " 'failure',\n",
1258 | " 'Bad: The difference between actual and predicted values is large, so we do not recommend using it.')\n",
1259 | " if causalimpact_object.summary.p_value.average <= threshold:\n",
1260 | " PreProcess._apply_text_style('success', f'\\nP-Value is under {threshold}. There is a statistically significant difference.')\n",
1261 | " else:\n",
1262 | " PreProcess._apply_text_style('failure', f'\\nP-Value is over {threshold}. There is not a statistically significant difference.')\n",
1263 | "\n",
1264 | " print(causalimpact.summary(\n",
1265 | " causalimpact_object,\n",
1266 | " output_format='summary',\n",
1267 | " alpha= 1 - credible_interval / 100))\n",
1268 | " display(plot)\n",
1269 | " with tab_report:\n",
1270 | " print(causalimpact.summary(\n",
1271 | " causalimpact_object,\n",
1272 | " output_format=\"report\",\n",
1273 | " alpha= 1 - credible_interval / 100))\n",
1274 | " with tab_data:\n",
1275 | " df = causalimpact_object.series\n",
1276 | " df.insert(2, 'diff_percentage', df['point_effects_mean'] / df['observed'])\n",
1277 | " display(df)\n",
1278 | " display(tab_result)\n",
1279 | "\n",
1280 | " def run_experimental_design(self):\n",
1281 | " if self.date_selection.selected_index == 0:\n",
1282 | " self.start_date_value = min(self.formatted_data.index).date()\n",
1283 | " self.end_date_value = max(self.formatted_data.index).date()\n",
1284 | " else:\n",
1285 | " self.start_date_value = self.start_date.value\n",
1286 | " self.end_date_value = self.end_date.value\n",
1287 | "\n",
1288 | " if self.design_type.selected_index == 0:\n",
1289 | " self.distance_data = self._n_part_split(\n",
1290 | " self.formatted_data.query(\n",
1291 | " '@self.start_date_value <= index <= @self.end_date_value'\n",
1292 | " ),\n",
1293 | " self.num_of_split.value,\n",
1294 | " CausalImpact.NUM_OF_ITERATION\n",
1295 | " )\n",
1296 | " elif self.design_type.selected_index == 1:\n",
1297 | " self.distance_data = self._find_similar(\n",
1298 | " self.formatted_data.query(\n",
1299 | " '@self.start_date_value <= index <= @self.end_date_value'\n",
1300 | " ),\n",
1301 | " self.target_columns.value,\n",
1302 | " self.num_of_pick_range.value,\n",
1303 | " self.num_of_covariate.value\n",
1304 | " )\n",
1305 | " elif self.design_type.selected_index == 2:\n",
1306 | " self.distance_data = self._from_share(\n",
1307 | " self.formatted_data.query(\n",
1308 | " '@self.start_date_value <= index <= @self.end_date_value'\n",
1309 | " ),\n",
1310 | " self.target_share.value,\n",
1311 | " )\n",
1312 | " elif self.design_type.selected_index == 3:\n",
1313 | " self.distance_data = self._given_assignment(\n",
1314 | " self.target_columns.value,\n",
1315 | " self.control_columns.value,\n",
1316 | " )\n",
1317 | " else:\n",
1318 | " self._apply_text_style('failure', '\\n\\nFailure!!')\n",
1319 | " print('Please check the following:')\n",
1320 | " print('* There is something wrong with design type.')\n",
1321 | " raise Exception('Please check Failure')\n",
1322 | "\n",
1323 | " self._visualize_candidate(\n",
1324 | " self.formatted_data,\n",
1325 | " self.distance_data,\n",
1326 | " self.start_date_value,\n",
1327 | " self.end_date_value,\n",
1328 | " self.date_col_name,\n",
1329 | " self.tick_count\n",
1330 | " )\n",
1331 | " self._generate_choice()\n",
1332 | "\n",
1333 | " @staticmethod\n",
1334 | " def _n_part_split(dataframe, num_of_split, NUM_OF_ITERATION):\n",
1335 | " \"\"\"n_part_split\n",
1336 | "\n",
1337 | " Args:\n",
1338 | " dataframe: xxx.\n",
1339 | " num_of_split: xxx.\n",
1340 | " NUM_OF_ITERATION: xxx.\n",
1341 | " \"\"\"\n",
1342 | " distance_data = pd.DataFrame(columns=['distance'])\n",
1343 | " num_of_pick = len(dataframe.columns) // num_of_split\n",
1344 | "\n",
1345 | " for l in tqdm(range(NUM_OF_ITERATION)):\n",
1346 | " col_list = list(dataframe.columns)\n",
1347 | " picked_data = pd.DataFrame()\n",
1348 | "\n",
1349 | " # random pick\n",
1350 | " picks = []\n",
1351 | " for s in range(num_of_split):\n",
1352 | " random_pick = random.sample(col_list, num_of_pick)\n",
1353 | " picks.append(random_pick)\n",
1354 | " col_list = [i for i in col_list if i not in random_pick]\n",
1355 | " picks[0].extend(col_list)\n",
1356 | "\n",
1357 | " for i in range(len(picks)):\n",
1358 | " picked_data = pd.concat([\n",
1359 | " picked_data,\n",
1360 | " pd.DataFrame(dataframe[picks[i]].sum(axis=1), columns=[i])\n",
1361 | " ], axis=1)\n",
1362 | "\n",
1363 | " # calculate distance\n",
1364 | " distance = CausalImpact._calculate_distance(\n",
1365 | " picked_data.reset_index(drop=True)\n",
1366 | " )\n",
1367 | " distance_data.loc[l, 'distance'] = float(distance)\n",
1368 | " for j in range(len(picks)):\n",
1369 | " distance_data.at[l, j] = str(sorted(picks[j]))\n",
1370 | "\n",
1371 | " distance_data = (\n",
1372 | " distance_data.drop_duplicates()\n",
1373 | " .sort_values('distance')\n",
1374 | " .head(3)\n",
1375 | " .reset_index(drop=True)\n",
1376 | " )\n",
1377 | " return distance_data\n",
1378 | "\n",
1379 | " @staticmethod\n",
1380 | " def _find_similar(\n",
1381 | " dataframe,\n",
1382 | " target_columns,\n",
1383 | " num_of_pick_range,\n",
1384 | " num_of_covariate,\n",
1385 | " ):\n",
1386 | " distance_data = pd.DataFrame(columns=['distance'])\n",
1387 | " target_cols = target_columns.replace(', ', ',').split(',')\n",
1388 | "\n",
1389 | " # An error occurs when the number of candidates (max num_of_range times\n",
1390 | " # num_of_covariates) is greater than num_of_columns excluding target column.\n",
1391 | " if (\n",
1392 | " len(dataframe.columns) - len(target_cols)\n",
1393 | " >= num_of_pick_range[1] * num_of_covariate):\n",
1394 | " pass\n",
1395 | " else:\n",
1396 | " print('Please check the following:')\n",
1397 | " print('* There is something wrong with similarity settings.')\n",
1398 | " print('* Total number of columns ー the target = {}'.format(\n",
1399 | " len(dataframe.columns) - len(target_cols)))\n",
1400 | " print('* But your settings are {}(max pick#) × {}(covariate#)'.format(\n",
1401 | " num_of_pick_range[1], num_of_covariate))\n",
1402 | " print('* Please set it so that it does not exceed.')\n",
1403 | " PreProcess._apply_text_style('failure', '▲▲▲▲▲▲\\n\\n')\n",
1404 | " raise Exception('Please check Failure')\n",
1405 | "\n",
1406 | " for l in tqdm(range(CausalImpact.NUM_OF_ITERATION)):\n",
1407 | " picked_data = pd.DataFrame()\n",
1408 | " remained_list = [\n",
1409 | " i for i in list(dataframe.columns) if i not in target_cols\n",
1410 | " ]\n",
1411 | " picks = []\n",
1412 | " for s in range(num_of_covariate):\n",
1413 | " pick = random.sample(remained_list, random.randrange(\n",
1414 | " num_of_pick_range[0], num_of_pick_range[1] + 1, 1\n",
1415 | " )\n",
1416 | " )\n",
1417 | " picks.append(pick)\n",
1418 | " remained_list = [\n",
1419 | " ele for ele in remained_list if ele not in pick\n",
1420 | " ]\n",
1421 | " picks.insert(0, target_cols)\n",
1422 | " for i in range(len(picks)):\n",
1423 | " picked_data = pd.concat([\n",
1424 | " picked_data,\n",
1425 | " pd.DataFrame(dataframe[picks[i]].sum(axis=1), columns=[i])\n",
1426 | " ], axis=1)\n",
1427 | "\n",
1428 | " # calculate distance\n",
1429 | " distance = CausalImpact._calculate_distance(\n",
1430 | " picked_data.reset_index(drop=True)\n",
1431 | " )\n",
1432 | " distance_data.loc[l, 'distance'] = float(distance)\n",
1433 | " for j in range(len(picks)):\n",
1434 | " distance_data.at[l, j] = str(sorted(picks[j]))\n",
1435 | "\n",
1436 | " distance_data = (\n",
1437 | " distance_data.drop_duplicates()\n",
1438 | " .sort_values('distance')\n",
1439 | " .head(3)\n",
1440 | " .reset_index(drop=True)\n",
1441 | " )\n",
1442 | " return distance_data\n",
1443 | "\n",
1444 | " @staticmethod\n",
1445 | " def _from_share(\n",
1446 | " dataframe,\n",
1447 | " target_share\n",
1448 | " ):\n",
1449 | " distance_data = pd.DataFrame(columns=['distance'])\n",
1450 | " combinations = []\n",
1451 | "\n",
1452 | " n = CausalImpact.NUM_OF_ITERATION\n",
1453 | " while len(combinations) < CausalImpact.COMBINATION_TARGET:\n",
1454 | " n -= 1\n",
1455 | " picked_col = np.random.choice(\n",
1456 | " dataframe.columns,\n",
1457 | " # Shareは50%までなので列数を2分割\n",
1458 | " random.randint(1, len(dataframe.columns)//2 + 1),\n",
1459 | " replace=False)\n",
1460 | "\n",
1461 | " # (todo)@rhirota シェアを除外済みか全体か検討\n",
1462 | " if float(Decimal(dataframe[picked_col].sum().sum() / dataframe.sum().sum()\n",
1463 | " ).quantize(Decimal('0.1'), ROUND_HALF_UP)) == target_share:\n",
1464 | " combinations.append(sorted(set(picked_col)))\n",
1465 | " if n == 1:\n",
1466 | " PreProcess._apply_text_style('failure', '\\n\\nFailure!!')\n",
1467 | " print('Please check the following:')\n",
1468 | " print('* There is something wrong with design type C.')\n",
1469 | " print(\"* You couldn't find the right combination in the repetitions.\")\n",
1470 | " print('* Please re-try or re-set target share')\n",
1471 | " PreProcess._apply_text_style('failure', '▲▲▲▲▲▲\\n\\n')\n",
1472 | " raise Exception('Please check Failure')\n",
1473 | "\n",
1474 | " for comb in tqdm(combinations):\n",
1475 | " for l in tqdm(\n",
1476 | " range(\n",
1477 | " CausalImpact.NUM_OF_ITERATION // CausalImpact.COMBINATION_TARGET),\n",
1478 | " leave=False):\n",
1479 | " picked_data = pd.DataFrame()\n",
1480 | " remained_list = [\n",
1481 | " i for i in list(dataframe.columns) if i not in comb\n",
1482 | " ]\n",
1483 | " picks = []\n",
1484 | " picks.append(random.sample(remained_list, random.randrange(\n",
1485 | " # (todo)@rhirota 最小Pickを検討\n",
1486 | " 1, len(remained_list), 1\n",
1487 | " )\n",
1488 | " ))\n",
1489 | " picks.insert(0, comb)\n",
1490 | "\n",
1491 | " for i in range(len(picks)):\n",
1492 | " picked_data = pd.concat([\n",
1493 | " picked_data,\n",
1494 | " pd.DataFrame(dataframe[picks[i]].sum(axis=1), columns=[i])\n",
1495 | " ], axis=1)\n",
1496 | "\n",
1497 | " # calculate distance\n",
1498 | " distance = CausalImpact._calculate_distance(\n",
1499 | " picked_data.reset_index(drop=True)\n",
1500 | " )\n",
1501 | " distance_data.loc[l, 'distance'] = float(distance)\n",
1502 | " for j in range(len(picks)):\n",
1503 | " distance_data.at[l, j] = str(sorted(picks[j]))\n",
1504 | "\n",
1505 | " distance_data = (\n",
1506 | " distance_data.drop_duplicates()\n",
1507 | " .sort_values('distance')\n",
1508 | " .head(3)\n",
1509 | " .reset_index(drop=True)\n",
1510 | " )\n",
1511 | " return distance_data\n",
1512 | "\n",
1513 | " @staticmethod\n",
1514 | " def _given_assignment(target_columns, control_columns):\n",
1515 | " distance_data = pd.DataFrame(columns=['distance'])\n",
1516 | " distance_data.loc[0, 'distance'] = 0\n",
1517 | " distance_data.loc[0, 0] = str(target_columns.replace(', ', ',').split(','))\n",
1518 | " distance_data.loc[0, 1] = str(control_columns.replace(', ', ',').split(','))\n",
1519 | " return distance_data\n",
1520 | "\n",
1521 | " @staticmethod\n",
1522 | " def _calculate_distance(dataframe):\n",
1523 | " total_distance = 0\n",
1524 | " scaled_data = pd.DataFrame()\n",
1525 | " for col in dataframe:\n",
1526 | " scaled_data[col] = (dataframe[col] - dataframe[col].min()) / (\n",
1527 | " dataframe[col].max() - dataframe[col].min()\n",
1528 | " )\n",
1529 | " scaled_data = scaled_data.diff().reset_index().dropna()\n",
1530 | " for v in itertools.combinations(list(scaled_data.columns), 2):\n",
1531 | " distance, _ = fastdtw.fastdtw(\n",
1532 | " scaled_data.loc[:, ['index', v[0]]],\n",
1533 | " scaled_data.loc[:, ['index', v[1]]],\n",
1534 | " dist=euclidean,\n",
1535 | " )\n",
1536 | " total_distance = total_distance + distance\n",
1537 | " return total_distance\n",
1538 | "\n",
1539 | " @staticmethod\n",
1540 | " def _visualize_candidate(\n",
1541 | " dataframe,\n",
1542 | " distance_data,\n",
1543 | " start_date_value,\n",
1544 | " end_date_value,\n",
1545 | " date_col_name,\n",
1546 | " tick_count\n",
1547 | " ):\n",
1548 | " PreProcess._apply_text_style(\n",
1549 | " 'failure',\n",
1550 | " '\\nCheck! Experimental Design Parameters.'\n",
1551 | " )\n",
1552 | " print('* start_date_value: ' + str(start_date_value))\n",
1553 | " print('* end_date_value: ' + str(end_date_value))\n",
1554 | " print('* columns:')\n",
1555 | " l = []\n",
1556 | " for i in range(len(dataframe.columns)):\n",
1557 | " l.append(dataframe.columns[i])\n",
1558 | " if len(str(l)) >= CausalImpact.MAX_STRING_LENGTH:\n",
1559 | " print(str(l).translate(str.maketrans({'[': '', ']': '', \"'\": ''})))\n",
1560 | " l = []\n",
1561 | " print('\\n')\n",
1562 | "\n",
1563 | " sub_tab=[ipywidgets.Output() for i in distance_data.index.tolist()]\n",
1564 | " tab_option = ipywidgets.Tab(sub_tab)\n",
1565 | " for i in range (len(distance_data.index.tolist())):\n",
1566 | " tab_option.set_title(i,\"option_{}\".format(i+1))\n",
1567 | " with sub_tab[i]:\n",
1568 | " candidate_df = pd.DataFrame(index=dataframe.index)\n",
1569 | " for col in range(len(distance_data.columns) - 1):\n",
1570 | " print(\n",
1571 | " 'col_' + str(col + 1) + ': '+ distance_data.at[i, col].replace(\n",
1572 | " \"'\", \"\"))\n",
1573 | " candidate_df[col + 1] = list(\n",
1574 | " dataframe.loc[:, eval(distance_data.at[i, col])].sum(axis=1)\n",
1575 | " )\n",
1576 | " print('\\n')\n",
1577 | " candidate_df = candidate_df.add_prefix('col_')\n",
1578 | "\n",
1579 | " candidate_share = pd.DataFrame(\n",
1580 | " candidate_df.loc[str(start_date_value):str(end_date_value), :\n",
1581 | " ].sum(),\n",
1582 | " columns=['total'])\n",
1583 | " candidate_share['daily_average'] = candidate_share['total'] // (\n",
1584 | " end_date_value - start_date_value).days\n",
1585 | " candidate_share['share'] = candidate_share['total'] / (dataframe.query(\n",
1586 | " '@start_date_value <= index <= @end_date_value'\n",
1587 | " ).sum().sum())\n",
1588 | "\n",
1589 | " try:\n",
1590 | " for i in candidate_df.columns:\n",
1591 | " stl = STL(candidate_df[i], robust=True).fit()\n",
1592 | " candidate_share.loc[i, 'std'] = np.std(stl.seasonal + stl.resid)\n",
1593 | " display(\n",
1594 | " candidate_share[['daily_average', 'share', 'std']].style.format(\n",
1595 | " {\n",
1596 | " 'daily_average': '{:,.0f}',\n",
1597 | " 'share': '{:.1%}',\n",
1598 | " 'std': '{:,.0f}',\n",
1599 | " }))\n",
1600 | " except Exception as e:\n",
1601 | " print(e)\n",
1602 | " display(\n",
1603 | " candidate_share[['daily_average', 'share']].style.format({\n",
1604 | " 'daily_average': '{:,.0f}',\n",
1605 | " 'share': '{:.1%}',\n",
1606 | " }))\n",
1607 | "\n",
1608 | " chart_line = (\n",
1609 | " alt.Chart(candidate_df.reset_index())\n",
1610 | " .transform_fold(\n",
1611 | " fold=list(candidate_df.columns), as_=['pivot', 'kpi']\n",
1612 | " )\n",
1613 | " .mark_line()\n",
1614 | " .encode(\n",
1615 | " x=alt.X(\n",
1616 | " date_col_name + ':T',\n",
1617 | " title=None,\n",
1618 | " axis=alt.Axis(\n",
1619 | " grid=False, format='%Y %b', tickCount=tick_count\n",
1620 | " ),\n",
1621 | " ),\n",
1622 | " y=alt.Y('kpi:Q'),\n",
1623 | " color=alt.Color(\n",
1624 | " 'pivot:N',\n",
1625 | " legend=alt.Legend(\n",
1626 | " title=None,\n",
1627 | " orient='none',\n",
1628 | " legendY=-20,\n",
1629 | " direction='horizontal',\n",
1630 | " titleAnchor='start'),\n",
1631 | " scale=alt.Scale(\n",
1632 | " domain=list(candidate_df.columns),\n",
1633 | " range=CausalImpact.colors)),\n",
1634 | " )\n",
1635 | " .properties(width=600, height=200)\n",
1636 | " )\n",
1637 | "\n",
1638 | " rules = alt.Chart(\n",
1639 | " pd.DataFrame(\n",
1640 | " {\n",
1641 | " 'Date': [str(start_date_value), str(end_date_value)],\n",
1642 | " 'color': ['red', 'orange']\n",
1643 | " })\n",
1644 | " ).mark_rule(strokeDash=[5, 5]).encode(\n",
1645 | " x='Date:T',\n",
1646 | " color=alt.Color('color:N', scale=None))\n",
1647 | "\n",
1648 | " df_scaled = candidate_df.copy()\n",
1649 | " df_scaled[:] = MinMaxScaler().fit_transform(candidate_df)\n",
1650 | " chart_line_scaled = (\n",
1651 | " alt.Chart(df_scaled.reset_index())\n",
1652 | " .transform_fold(\n",
1653 | " fold=list(candidate_df.columns),\n",
1654 | " as_=['pivot', 'kpi']\n",
1655 | " )\n",
1656 | " .mark_line()\n",
1657 | " .encode(\n",
1658 | " x=alt.X(\n",
1659 | " date_col_name + ':T',\n",
1660 | " title=None,\n",
1661 | " axis=alt.Axis(\n",
1662 | " grid=False, format='%Y %b', tickCount=tick_count\n",
1663 | " ),\n",
1664 | " ),\n",
1665 | " y=alt.Y('kpi:Q'),\n",
1666 | " color=alt.Color(\n",
1667 | " 'pivot:N',\n",
1668 | " legend=alt.Legend(\n",
1669 | " title=None,\n",
1670 | " orient='none',\n",
1671 | " legendY=-20,\n",
1672 | " direction='horizontal',\n",
1673 | " titleAnchor='start'),\n",
1674 | " scale=alt.Scale(\n",
1675 | " domain=list(candidate_df.columns),\n",
1676 | " range=CausalImpact.colors)),\n",
1677 | " )\n",
1678 | " .properties(width=600, height=80)\n",
1679 | " )\n",
1680 | "\n",
1681 | " df_diff = pd.DataFrame(\n",
1682 | " np.diff(candidate_df, axis=0),\n",
1683 | " columns=candidate_df.columns.values,\n",
1684 | " )\n",
1685 | " scatter = (\n",
1686 | " alt.Chart(df_diff.reset_index())\n",
1687 | " .mark_circle()\n",
1688 | " .encode(\n",
1689 | " alt.X(alt.repeat('column'), type='quantitative'),\n",
1690 | " alt.Y(alt.repeat('row'), type='quantitative'),\n",
1691 | " )\n",
1692 | " .properties(width=80, height=80)\n",
1693 | " .repeat(\n",
1694 | " row=df_diff.columns.values,\n",
1695 | " column=df_diff.columns.values,\n",
1696 | " )\n",
1697 | " )\n",
1698 | " display(\n",
1699 | " alt.vconcat(chart_line + rules, chart_line_scaled) | scatter)\n",
1700 | " display(tab_option)\n",
1701 | "\n",
1702 | " def _generate_choice(self):\n",
1703 | " self.your_choice = ipywidgets.Dropdown(\n",
1704 | " options=['option_1', 'option_2', 'option_3'],\n",
1705 | " description='your choice:',\n",
1706 | " )\n",
1707 | " self.target_col_to_simulate = ipywidgets.SelectMultiple(\n",
1708 | " options=['col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6'],\n",
1709 | " description='target col:',\n",
1710 | " value=['col_1',],\n",
1711 | " )\n",
1712 | " self.covariate_col_to_simulate = ipywidgets.SelectMultiple(\n",
1713 | " options=['col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6'],\n",
1714 | " description='covatiate col:',\n",
1715 | " value=['col_2',],\n",
1716 | " style={'description_width': 'initial'},\n",
1717 | " )\n",
1718 | " display(\n",
1719 | " PreProcess._apply_text_style(\n",
1720 | " 18,\n",
1721 | " '⑷ Please select option, test column & control column(s).'),\n",
1722 | " ipywidgets.HBox([\n",
1723 | " self.your_choice,\n",
1724 | " self.target_col_to_simulate,\n",
1725 | " self.covariate_col_to_simulate,\n",
1726 | " ]),\n",
1727 | " )\n",
1728 | "\n",
1729 | " def generate_simulation(self):\n",
1730 | " self.test_data = self._extract_data_from_choice(\n",
1731 | " self.your_choice.value,\n",
1732 | " self.target_col_to_simulate.value,\n",
1733 | " self.covariate_col_to_simulate.value,\n",
1734 | " self.formatted_data,\n",
1735 | " self.distance_data,\n",
1736 | " )\n",
1737 | " self.simulation_params, self.ci_objs = self._execute_simulation(\n",
1738 | " self.test_data,\n",
1739 | " self.date_col_name,\n",
1740 | " self.start_date_value,\n",
1741 | " self.end_date_value,\n",
1742 | " self.num_of_seasons.value,\n",
1743 | " self.credible_interval.value,\n",
1744 | " CausalImpact.TREAT_DURATION,\n",
1745 | " CausalImpact.TREAT_IMPACT,\n",
1746 | " )\n",
1747 | " self._display_simulation_result(\n",
1748 | " self.simulation_params,\n",
1749 | " self.ci_objs,\n",
1750 | " self.estimate_icpa.value,\n",
1751 | " )\n",
1752 | " self._plot_simulation_result(\n",
1753 | " self.simulation_params,\n",
1754 | " self.ci_objs,\n",
1755 | " self.date_col_name,\n",
1756 | " self.tick_count,\n",
1757 | " self.purpose_selection.selected_index,\n",
1758 | " self.credible_interval.value,\n",
1759 | " )\n",
1760 | "\n",
1761 | " @staticmethod\n",
1762 | " def _extract_data_from_choice(\n",
1763 | " your_choice,\n",
1764 | " target_col_to_simulate,\n",
1765 | " covariate_col_to_simulate,\n",
1766 | " dataframe,\n",
1767 | " distance\n",
1768 | " ):\n",
1769 | " selection_row = int(your_choice.replace('option_', '')) - 1\n",
1770 | " selection_cols = [\n",
1771 | " [int(t.replace('col_', '')) - 1 for t in list(target_col_to_simulate)],\n",
1772 | " [int(t.replace('col_', '')) - 1 for t in list(covariate_col_to_simulate)\n",
1773 | " ]]\n",
1774 | " test_data = pd.DataFrame(index = dataframe.index)\n",
1775 | "\n",
1776 | " test_column = []\n",
1777 | " for i in selection_cols[0]:\n",
1778 | " test_column.extend(eval(distance.at[selection_row,i]))\n",
1779 | " test_data['test'] = dataframe.loc[\n",
1780 | " :, test_column\n",
1781 | " ].sum(axis=1)\n",
1782 | "\n",
1783 | " for col in selection_cols[1]:\n",
1784 | " test_data['col_'+ str(col+1)] = dataframe.loc[\n",
1785 | " :, eval(distance.at[selection_row, col])\n",
1786 | " ].sum(axis=1)\n",
1787 | "\n",
1788 | " print('* test: {}\\n'.format(str(test_column).replace(\"'\", \"\")))\n",
1789 | " print('* covariate')\n",
1790 | " for x,i in zip(test_data.columns[1:],selection_cols[1]):\n",
1791 | " print('> {}: {}'.format(\n",
1792 | " x,\n",
1793 | " str(eval(distance.at[selection_row, i]))).replace(\"'\", \"\")\n",
1794 | " )\n",
1795 | " return test_data\n",
1796 | "\n",
1797 | " @staticmethod\n",
1798 | " def _execute_simulation(\n",
1799 | " dataframe,\n",
1800 | " date_col_name,\n",
1801 | " start_date_value,\n",
1802 | " end_date_value,\n",
1803 | " num_of_seasons,\n",
1804 | " credible_interval,\n",
1805 | " TREAT_DURATION,\n",
1806 | " TREAT_IMPACT,\n",
1807 | " ):\n",
1808 | " ci_objs = []\n",
1809 | " simulation_params = []\n",
1810 | " adjusted_data = dataframe.copy()\n",
1811 | "\n",
1812 | " for duration in tqdm(TREAT_DURATION):\n",
1813 | " for impact in tqdm(TREAT_IMPACT, leave=False):\n",
1814 | " pre_end_date = end_date_value + datetime.timedelta(days=-duration)\n",
1815 | " post_start_date = pre_end_date + datetime.timedelta(days=1)\n",
1816 | " adjusted_data.loc[\n",
1817 | " np.datetime64(post_start_date) : np.datetime64(end_date_value),\n",
1818 | " 'test',] = (\n",
1819 | " dataframe.loc[\n",
1820 | " np.datetime64(post_start_date) : np.datetime64(end_date_value\n",
1821 | " ),\n",
1822 | " 'test',\n",
1823 | " ]\n",
1824 | " * impact\n",
1825 | " )\n",
1826 | "\n",
1827 | " ci_obj = CausalImpact.create_causalimpact_object(\n",
1828 | " adjusted_data,\n",
1829 | " date_col_name,\n",
1830 | " start_date_value,\n",
1831 | " pre_end_date,\n",
1832 | " post_start_date,\n",
1833 | " end_date_value,\n",
1834 | " num_of_seasons,\n",
1835 | " credible_interval,\n",
1836 | " )\n",
1837 | " simulation_params.append([\n",
1838 | " start_date_value,\n",
1839 | " pre_end_date,\n",
1840 | " post_start_date,\n",
1841 | " end_date_value,\n",
1842 | " impact,\n",
1843 | " duration,\n",
1844 | " ])\n",
1845 | " ci_objs.append(ci_obj)\n",
1846 | " return simulation_params, ci_objs\n",
1847 | "\n",
1848 | " @staticmethod\n",
1849 | " def _display_simulation_result(simulation_params, ci_objs, estimate_icpa):\n",
1850 | " simulation_df = pd.DataFrame(\n",
1851 | " index=[],\n",
1852 | " columns=[\n",
1853 | " 'mock_lift',\n",
1854 | " 'Days_simulated',\n",
1855 | " 'Pre_Period_MAPE',\n",
1856 | " 'Post_Period_MAPE',\n",
1857 | " 'Total_effect',\n",
1858 | " 'Average_effect',\n",
1859 | " 'Required_budget',\n",
1860 | " 'p_value',\n",
1861 | " 'predicted_lift'\n",
1862 | " ],\n",
1863 | " )\n",
1864 | " for i in range(len(ci_objs)):\n",
1865 | " impact_df = ci_objs[i].series\n",
1866 | " impact_dict = {\n",
1867 | " 'test_period':'('+str(simulation_params[i][5])+'d) '+str(simulation_params[i][2])+'~'+str(simulation_params[i][3]),\n",
1868 | " 'mock_lift_rate': simulation_params[i][4] - 1,\n",
1869 | " 'predicted_lift_rate': ci_objs[i].summary.loc['average', 'rel_effect'],\n",
1870 | " 'Days_simulated': simulation_params[i][5],\n",
1871 | " 'Pre_Period_MAPE': [\n",
1872 | " mean_absolute_percentage_error(\n",
1873 | " impact_df.loc[:, 'observed'][\n",
1874 | " str(simulation_params[i][0]) : str(\n",
1875 | " simulation_params[i][1]\n",
1876 | " )\n",
1877 | " ],\n",
1878 | " impact_df.loc[:, 'posterior_mean'][\n",
1879 | " str(simulation_params[i][0]) : str(\n",
1880 | " simulation_params[i][1]\n",
1881 | " )\n",
1882 | " ],\n",
1883 | " )\n",
1884 | " ],\n",
1885 | " 'Post_Period_MAPE': [\n",
1886 | " mean_absolute_percentage_error(\n",
1887 | " impact_df.loc[:, 'observed'][\n",
1888 | " str(simulation_params[i][2]) : str(\n",
1889 | " simulation_params[i][3]\n",
1890 | " )\n",
1891 | " ],\n",
1892 | " impact_df.loc[:, 'posterior_mean'][\n",
1893 | " str(simulation_params[i][2]) : str(\n",
1894 | " simulation_params[i][3]\n",
1895 | " )\n",
1896 | " ],\n",
1897 | " )\n",
1898 | " ],\n",
1899 | " 'Total_effect': ci_objs[i].summary.loc['cumulative', 'abs_effect'],\n",
1900 | " 'Average_effect': ci_objs[i].summary.loc['average', 'abs_effect'],\n",
1901 | " 'Required_budget': [\n",
1902 | " ci_objs[i].summary.loc['cumulative', 'abs_effect'] * estimate_icpa\n",
1903 | " ],\n",
1904 | " 'p_value': ci_objs[i].summary.loc['average', 'p_value'],\n",
1905 | "\n",
1906 | " }\n",
1907 | " simulation_df = pd.concat(\n",
1908 | " [simulation_df, pd.DataFrame.from_dict(impact_dict)],\n",
1909 | " ignore_index=True,\n",
1910 | " )\n",
1911 | " display(PreProcess._apply_text_style(\n",
1912 | " 18,\n",
1913 | " 'A/A Test: Check the error without intervention'))\n",
1914 | " print('> If p_value < 0.05, please suspect \"poor model accuracy\"(See Pre_Period_MAPE) or \"data drift\"(See Time Series Chart).\\n')\n",
1915 | " display(\n",
1916 | " simulation_df.query('mock_lift_rate == 0')[\n",
1917 | " ['test_period','Pre_Period_MAPE','Post_Period_MAPE','p_value']\n",
1918 | " ].style.format({\n",
1919 | " 'Pre_Period_MAPE': '{:.2%}',\n",
1920 | " 'Post_Period_MAPE': '{:.2%}',\n",
1921 | " 'p_value': '{:,.2f}',\n",
1922 | " }).hide()\n",
1923 | " )\n",
1924 | " print('\\n')\n",
1925 | " display(PreProcess._apply_text_style(\n",
1926 | " 18,\n",
1927 | " 'Simulation with increments as a mock experiment'))\n",
1928 | " for i in simulation_df.Days_simulated.unique():\n",
1929 | " print('\\n During the last {} days'.format(i))\n",
1930 | " display(\n",
1931 | " simulation_df.query('mock_lift_rate != 0 & Days_simulated == @i')[\n",
1932 | " [\n",
1933 | " 'mock_lift_rate',\n",
1934 | " 'predicted_lift_rate',\n",
1935 | " 'Pre_Period_MAPE',\n",
1936 | " 'Total_effect',\n",
1937 | " 'Average_effect',\n",
1938 | " 'Required_budget',\n",
1939 | " 'p_value',\n",
1940 | " ]\n",
1941 | " ].style.format({\n",
1942 | " 'mock_lift_rate': '{:+.0%}',\n",
1943 | " 'predicted_lift_rate': '{:+.1%}',\n",
1944 | " 'Pre_Period_MAPE': '{:.2%}',\n",
1945 | " 'Total_effect': '{:,.2f}',\n",
1946 | " 'Average_effect': '{:,.2f}',\n",
1947 | " 'Required_budget': '{:,.0f}',\n",
1948 | " 'p_value': '{:,.2f}',\n",
1949 | " }).hide()\n",
1950 | " )\n",
1951 | "\n",
1952 | " @staticmethod\n",
1953 | " def _plot_simulation_result(\n",
1954 | " simulation_params,\n",
1955 | " ci_objs,\n",
1956 | " date_col_name,\n",
1957 | " tick_count,\n",
1958 | " purpose_selection,\n",
1959 | " credible_interval,\n",
1960 | " ):\n",
1961 | "\n",
1962 | " mock_combinations = []\n",
1963 | " for i in range(len(simulation_params)):\n",
1964 | " mock_combinations.append(\n",
1965 | " [\n",
1966 | " '{}d:+{:.0%}'.format(\n",
1967 | " simulation_params[i][5],\n",
1968 | " simulation_params[i][4]-1)\n",
1969 | " ])\n",
1970 | " simulation_tb=[ipywidgets.Output() for tab in mock_combinations]\n",
1971 | " tab_simulation = ipywidgets.Tab(simulation_tb)\n",
1972 | " for id,name in enumerate(mock_combinations):\n",
1973 | " tab_simulation.set_title(id,name)\n",
1974 | " with simulation_tb[id]:\n",
1975 | " print(\n",
1976 | " 'Pre Period:{} ~ {}\\nPost Period:{} ~ {}'.format(\n",
1977 | " simulation_params[id][0],\n",
1978 | " simulation_params[id][1],\n",
1979 | " simulation_params[id][2],\n",
1980 | " simulation_params[id][3],\n",
1981 | " )\n",
1982 | " )\n",
1983 | " CausalImpact.plot_causalimpact(\n",
1984 | " ci_objs[id],\n",
1985 | " simulation_params[id][0],\n",
1986 | " simulation_params[id][1],\n",
1987 | " simulation_params[id][2],\n",
1988 | " simulation_params[id][3],\n",
1989 | " credible_interval,\n",
1990 | " date_col_name,\n",
1991 | " tick_count,\n",
1992 | " purpose_selection\n",
1993 | " )\n",
1994 | " display(tab_simulation)\n",
1995 | "\n",
1996 | "case_1 = CausalImpact()\n",
1997 | "case_1.generate_ui()\n",
1998 | "if 'dict_params' in globals():\n",
1999 | " CausalImpact.set_params(case_1, dict_params)\n",
2000 | "print('\\nExecution datetime(GMT):{}'.format(datetime.datetime.now()))\n"
2001 | ],
2002 | "metadata": {
2003 | "id": "_WR_6zEwE2yK",
2004 | "cellView": "form"
2005 | },
2006 | "execution_count": null,
2007 | "outputs": []
2008 | },
2009 | {
2010 | "cell_type": "code",
2011 | "source": [
2012 | "# @title Step.2\n",
2013 | "%%time\n",
2014 | "case_1.load_data()\n",
2015 | "case_1.format_data()\n",
2016 | "dict_params = PreProcess.saving_params(case_1)\n",
2017 | "\n",
2018 | "if case_1.purpose_selection.selected_index == 0:\n",
2019 | " case_1.run_causalImpact()\n",
2020 | "else:\n",
2021 | " case_1.run_experimental_design()\n",
2022 | "\n",
2023 | "print('\\nExecution datetime(GMT):{}'.format(datetime.datetime.now()))"
2024 | ],
2025 | "metadata": {
2026 | "id": "c94KKPvvlB3u",
2027 | "cellView": "form"
2028 | },
2029 | "execution_count": null,
2030 | "outputs": []
2031 | },
2032 | {
2033 | "cell_type": "code",
2034 | "source": [
2035 | "# @title Step.3\n",
2036 | "%%time\n",
2037 | "if case_1.purpose_selection.selected_index == 0:\n",
2038 | " case_1.display_causalimpact_result()\n",
2039 | "else:\n",
2040 | " case_1.generate_simulation()\n",
2041 | "\n",
2042 | "print('\\nExecution datetime(GMT):{}'.format(datetime.datetime.now()))"
2043 | ],
2044 | "metadata": {
2045 | "id": "yK5gZ0KioPP4",
2046 | "cellView": "form"
2047 | },
2048 | "execution_count": null,
2049 | "outputs": []
2050 | },
2051 | {
2052 | "cell_type": "markdown",
2053 | "source": [
2054 | "# (Optional) Case_2"
2055 | ],
2056 | "metadata": {
2057 | "id": "yRkmseYMdtfB"
2058 | }
2059 | },
2060 | {
2061 | "cell_type": "code",
2062 | "source": [
2063 | "# @title Case_2 Step.1\n",
2064 | "overwrite_pramas = True #@param {type:\"boolean\"}\n",
2065 | "case_2 = CausalImpact()\n",
2066 | "case_2.generate_ui()\n",
2067 | "if overwrite_pramas == True: PreProcess.set_params(case_2, dict_params)"
2068 | ],
2069 | "metadata": {
2070 | "cellView": "form",
2071 | "id": "PsQlufVpdxOD"
2072 | },
2073 | "execution_count": null,
2074 | "outputs": []
2075 | },
2076 | {
2077 | "cell_type": "code",
2078 | "source": [
2079 | "# @title Case_2 Step.2\n",
2080 | "%%time\n",
2081 | "case_2.load_data()\n",
2082 | "case_2.format_data()\n",
2083 | "\n",
2084 | "if case_2.purpose_selection.selected_index == 0:\n",
2085 | " case_2.run_causalImpact()\n",
2086 | "else:\n",
2087 | " case_2.run_experimental_design()\n",
2088 | "\n",
2089 | "print('\\nExecution datetime(GMT):{}'.format(datetime.datetime.now()))"
2090 | ],
2091 | "metadata": {
2092 | "cellView": "form",
2093 | "id": "rMgpKut9ewEy"
2094 | },
2095 | "execution_count": null,
2096 | "outputs": []
2097 | },
2098 | {
2099 | "cell_type": "code",
2100 | "source": [
2101 | "# @title Case_2 Step.3\n",
2102 | "%%time\n",
2103 | "if case_2.purpose_selection.selected_index == 0:\n",
2104 | " case_2.display_causalimpact_result()\n",
2105 | "else:\n",
2106 | " case_2.generate_simulation()\n",
2107 | "\n",
2108 | "print('\\nExecution datetime(GMT):{}'.format(datetime.datetime.now()))"
2109 | ],
2110 | "metadata": {
2111 | "cellView": "form",
2112 | "id": "L7H9OEhme7Wu"
2113 | },
2114 | "execution_count": null,
2115 | "outputs": []
2116 | },
2117 | {
2118 | "cell_type": "markdown",
2119 | "source": [
2120 | "# (Optional) Case_3"
2121 | ],
2122 | "metadata": {
2123 | "id": "wyh14BKUfKcD"
2124 | }
2125 | },
2126 | {
2127 | "cell_type": "code",
2128 | "source": [
2129 | "# @title Case_3 Step.1\n",
2130 | "overwrite_pramas = False #@param {type:\"boolean\"}\n",
2131 | "case_3 = CausalImpact()\n",
2132 | "case_3.generate_ui()\n",
2133 | "if overwrite_pramas == True: PreProcess.set_params(case_3, dict_params)"
2134 | ],
2135 | "metadata": {
2136 | "cellView": "form",
2137 | "id": "Gb_PkbFifKcE"
2138 | },
2139 | "execution_count": null,
2140 | "outputs": []
2141 | },
2142 | {
2143 | "cell_type": "code",
2144 | "source": [
2145 | "# @title Case_3 Step.2\n",
2146 | "%%time\n",
2147 | "case_3.load_data()\n",
2148 | "case_3.format_data()\n",
2149 | "\n",
2150 | "if case_3.purpose_selection.selected_index == 0:\n",
2151 | " case_3.run_causalImpact()\n",
2152 | "else:\n",
2153 | " case_3.run_experimental_design()\n",
2154 | "\n",
2155 | "print('\\nExecution datetime(GMT):{}'.format(datetime.datetime.now()))"
2156 | ],
2157 | "metadata": {
2158 | "cellView": "form",
2159 | "id": "UUKm41oWfKcF"
2160 | },
2161 | "execution_count": null,
2162 | "outputs": []
2163 | },
2164 | {
2165 | "cell_type": "code",
2166 | "source": [
2167 | "# @title Case_3 Step.3\n",
2168 | "%%time\n",
2169 | "if case_3.purpose_selection.selected_index == 0:\n",
2170 | " case_3.display_causalimpact_result()\n",
2171 | "else:\n",
2172 | " case_3.generate_simulation()\n",
2173 | "\n",
2174 | "print('\\nExecution datetime(GMT):{}'.format(datetime.datetime.now()))"
2175 | ],
2176 | "metadata": {
2177 | "cellView": "form",
2178 | "id": "7Ssv3wP9fKcF"
2179 | },
2180 | "execution_count": null,
2181 | "outputs": []
2182 | }
2183 | ],
2184 | "metadata": {
2185 | "colab": {
2186 | "provenance": [],
2187 | "include_colab_link": true
2188 | },
2189 | "kernelspec": {
2190 | "display_name": "Python 3",
2191 | "name": "python3"
2192 | },
2193 | "language_info": {
2194 | "name": "python"
2195 | }
2196 | },
2197 | "nbformat": 4,
2198 | "nbformat_minor": 0
2199 | }
--------------------------------------------------------------------------------