├── CONTRIBUTING.md
├── ContributorAgreement.txt
├── LICENSE
├── README.md
├── SUPPORT.md
├── discover.py
├── dsccnfg
    └── config.txt
├── dscdonl
    └── dscdonl.txt
├── dscextr
    └── dscextr.txt
├── dscwh
    └── dscwh.txt
├── log
    └── logs.txt
└── sql
    └── sql.txt


/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # How to Contribute
 2 | 
 3 | We'd love to accept your patches and contributions to this project. There are
 4 | just a few small guidelines you need to follow.
 5 | 
 6 | ## Contributor License Agreement
 7 | 
 8 | Contributions to this project must be accompanied by a signed
 9 | [Contributor Agreement](ContributorAgreement.txt).
10 | You (or your employer) retain the copyright to your contribution,
11 | this simply gives us permission to use and redistribute your contributions as
12 | part of the project.
13 | 
14 | ## Code reviews
15 | 
16 | All submissions, including submissions by project members, require review. We
17 | use GitHub pull requests for this purpose. Consult
18 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
19 | information on using pull requests.
20 | 


--------------------------------------------------------------------------------
/ContributorAgreement.txt:
--------------------------------------------------------------------------------
 1 | Contributor Agreement
 2 | 
 3 | Version 1.1
 4 | 
 5 | Contributions to this software are accepted only when they are
 6 | properly accompanied by a Contributor Agreement. The Contributor
 7 | Agreement for this software is the Developer's Certificate of Origin
 8 | 1.1 (DCO) as provided with and required for accepting contributions
 9 | to the Linux kernel.
10 | 
11 | In each contribution proposed to be included in this software, the
12 | developer must include a "sign-off" that denotes consent to the
13 | terms of the Developer's Certificate of Origin.  The sign-off is
14 | a line of text in the description that accompanies the change,
15 | certifying that you have the right to provide the contribution
16 | to be included.  For changes provided in source code control (for
17 | example, via a Git pull request) the sign-off must be included in
18 | the commit message in source code control.  For changes provided
19 | in email or issue tracking, the sign-off must be included in the
20 | email or the issue, and the sign-off will be incorporated into the
21 | permanent commit message if the contribution is accepted into the
22 | official source code.
23 | 
24 | If you can certify the below:
25 | 
26 |         Developer's Certificate of Origin 1.1
27 | 
28 |         By making a contribution to this project, I certify that:
29 | 
30 |         (a) The contribution was created in whole or in part by me and I
31 |             have the right to submit it under the open source license
32 |             indicated in the file; or
33 | 
34 |         (b) The contribution is based upon previous work that, to the best
35 |             of my knowledge, is covered under an appropriate open source
36 |             license and I have the right under that license to submit that
37 |             work with modifications, whether created in whole or in part
38 |             by me, under the same open source license (unless I am
39 |             permitted to submit under a different license), as indicated
40 |             in the file; or
41 | 
42 |         (c) The contribution was provided directly to me by some other
43 |             person who certified (a), (b) or (c) and I have not modified
44 |             it.
45 | 
46 |         (d) I understand and agree that this project and the contribution
47 |             are public and that a record of the contribution (including all
48 |             personal information I submit with it, including my sign-off) is
49 |             maintained indefinitely and may be redistributed consistent with
50 |             this project or the open source license(s) involved.
51 | 
52 | then you just add a line saying
53 | 
54 |         Signed-off-by: Random J Developer <random@developer.example.org>
55 | 
56 | using your real name (sorry, no pseudonyms or anonymous contributions.)


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 | 
  2 |                                  Apache License
  3 |                            Version 2.0, January 2004
  4 |                         http://www.apache.org/licenses/
  5 | 
  6 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  7 | 
  8 |    1. Definitions.
  9 | 
 10 |       "License" shall mean the terms and conditions for use, reproduction,
 11 |       and distribution as defined by Sections 1 through 9 of this document.
 12 | 
 13 |       "Licensor" shall mean the copyright owner or entity authorized by
 14 |       the copyright owner that is granting the License.
 15 | 
 16 |       "Legal Entity" shall mean the union of the acting entity and all
 17 |       other entities that control, are controlled by, or are under common
 18 |       control with that entity. For the purposes of this definition,
 19 |       "control" means (i) the power, direct or indirect, to cause the
 20 |       direction or management of such entity, whether by contract or
 21 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
 22 |       outstanding shares, or (iii) beneficial ownership of such entity.
 23 | 
 24 |       "You" (or "Your") shall mean an individual or Legal Entity
 25 |       exercising permissions granted by this License.
 26 | 
 27 |       "Source" form shall mean the preferred form for making modifications,
 28 |       including but not limited to software source code, documentation
 29 |       source, and configuration files.
 30 | 
 31 |       "Object" form shall mean any form resulting from mechanical
 32 |       transformation or translation of a Source form, including but
 33 |       not limited to compiled object code, generated documentation,
 34 |       and conversions to other media types.
 35 | 
 36 |       "Work" shall mean the work of authorship, whether in Source or
 37 |       Object form, made available under the License, as indicated by a
 38 |       copyright notice that is included in or attached to the work
 39 |       (an example is provided in the Appendix below).
 40 | 
 41 |       "Derivative Works" shall mean any work, whether in Source or Object
 42 |       form, that is based on (or derived from) the Work and for which the
 43 |       editorial revisions, annotations, elaborations, or other modifications
 44 |       represent, as a whole, an original work of authorship. For the purposes
 45 |       of this License, Derivative Works shall not include works that remain
 46 |       separable from, or merely link (or bind by name) to the interfaces of,
 47 |       the Work and Derivative Works thereof.
 48 | 
 49 |       "Contribution" shall mean any work of authorship, including
 50 |       the original version of the Work and any modifications or additions
 51 |       to that Work or Derivative Works thereof, that is intentionally
 52 |       submitted to Licensor for inclusion in the Work by the copyright owner
 53 |       or by an individual or Legal Entity authorized to submit on behalf of
 54 |       the copyright owner. For the purposes of this definition, "submitted"
 55 |       means any form of electronic, verbal, or written communication sent
 56 |       to the Licensor or its representatives, including but not limited to
 57 |       communication on electronic mailing lists, source code control systems,
 58 |       and issue tracking systems that are managed by, or on behalf of, the
 59 |       Licensor for the purpose of discussing and improving the Work, but
 60 |       excluding communication that is conspicuously marked or otherwise
 61 |       designated in writing by the copyright owner as "Not a Contribution."
 62 | 
 63 |       "Contributor" shall mean Licensor and any individual or Legal Entity
 64 |       on behalf of whom a Contribution has been received by Licensor and
 65 |       subsequently incorporated within the Work.
 66 | 
 67 |    2. Grant of Copyright License. Subject to the terms and conditions of
 68 |       this License, each Contributor hereby grants to You a perpetual,
 69 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 70 |       copyright license to reproduce, prepare Derivative Works of,
 71 |       publicly display, publicly perform, sublicense, and distribute the
 72 |       Work and such Derivative Works in Source or Object form.
 73 | 
 74 |    3. Grant of Patent License. Subject to the terms and conditions of
 75 |       this License, each Contributor hereby grants to You a perpetual,
 76 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 77 |       (except as stated in this section) patent license to make, have made,
 78 |       use, offer to sell, sell, import, and otherwise transfer the Work,
 79 |       where such license applies only to those patent claims licensable
 80 |       by such Contributor that are necessarily infringed by their
 81 |       Contribution(s) alone or by combination of their Contribution(s)
 82 |       with the Work to which such Contribution(s) was submitted. If You
 83 |       institute patent litigation against any entity (including a
 84 |       cross-claim or counterclaim in a lawsuit) alleging that the Work
 85 |       or a Contribution incorporated within the Work constitutes direct
 86 |       or contributory patent infringement, then any patent licenses
 87 |       granted to You under this License for that Work shall terminate
 88 |       as of the date such litigation is filed.
 89 | 
 90 |    4. Redistribution. You may reproduce and distribute copies of the
 91 |       Work or Derivative Works thereof in any medium, with or without
 92 |       modifications, and in Source or Object form, provided that You
 93 |       meet the following conditions:
 94 | 
 95 |       (a) You must give any other recipients of the Work or
 96 |           Derivative Works a copy of this License; and
 97 | 
 98 |       (b) You must cause any modified files to carry prominent notices
 99 |           stating that You changed the files; and
100 | 
101 |       (c) You must retain, in the Source form of any Derivative Works
102 |           that You distribute, all copyright, patent, trademark, and
103 |           attribution notices from the Source form of the Work,
104 |           excluding those notices that do not pertain to any part of
105 |           the Derivative Works; and
106 | 
107 |       (d) If the Work includes a "NOTICE" text file as part of its
108 |           distribution, then any Derivative Works that You distribute must
109 |           include a readable copy of the attribution notices contained
110 |           within such NOTICE file, excluding those notices that do not
111 |           pertain to any part of the Derivative Works, in at least one
112 |           of the following places: within a NOTICE text file distributed
113 |           as part of the Derivative Works; within the Source form or
114 |           documentation, if provided along with the Derivative Works; or,
115 |           within a display generated by the Derivative Works, if and
116 |           wherever such third-party notices normally appear. The contents
117 |           of the NOTICE file are for informational purposes only and
118 |           do not modify the License. You may add Your own attribution
119 |           notices within Derivative Works that You distribute, alongside
120 |           or as an addendum to the NOTICE text from the Work, provided
121 |           that such additional attribution notices cannot be construed
122 |           as modifying the License.
123 | 
124 |       You may add Your own copyright statement to Your modifications and
125 |       may provide additional or different license terms and conditions
126 |       for use, reproduction, or distribution of Your modifications, or
127 |       for any such Derivative Works as a whole, provided Your use,
128 |       reproduction, and distribution of the Work otherwise complies with
129 |       the conditions stated in this License.
130 | 
131 |    5. Submission of Contributions. Unless You explicitly state otherwise,
132 |       any Contribution intentionally submitted for inclusion in the Work
133 |       by You to the Licensor shall be under the terms and conditions of
134 |       this License, without any additional terms or conditions.
135 |       Notwithstanding the above, nothing herein shall supersede or modify
136 |       the terms of any separate license agreement you may have executed
137 |       with Licensor regarding such Contributions.
138 | 
139 |    6. Trademarks. This License does not grant permission to use the trade
140 |       names, trademarks, service marks, or product names of the Licensor,
141 |       except as required for reasonable and customary use in describing the
142 |       origin of the Work and reproducing the content of the NOTICE file.
143 | 
144 |    7. Disclaimer of Warranty. Unless required by applicable law or
145 |       agreed to in writing, Licensor provides the Work (and each
146 |       Contributor provides its Contributions) on an "AS IS" BASIS,
147 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 |       implied, including, without limitation, any warranties or conditions
149 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 |       PARTICULAR PURPOSE. You are solely responsible for determining the
151 |       appropriateness of using or redistributing the Work and assume any
152 |       risks associated with Your exercise of permissions under this License.
153 | 
154 |    8. Limitation of Liability. In no event and under no legal theory,
155 |       whether in tort (including negligence), contract, or otherwise,
156 |       unless required by applicable law (such as deliberate and grossly
157 |       negligent acts) or agreed to in writing, shall any Contributor be
158 |       liable to You for damages, including any direct, indirect, special,
159 |       incidental, or consequential damages of any character arising as a
160 |       result of this License or out of the use or inability to use the
161 |       Work (including but not limited to damages for loss of goodwill,
162 |       work stoppage, computer failure or malfunction, or any and all
163 |       other commercial damages or losses), even if such Contributor
164 |       has been advised of the possibility of such damages.
165 | 
166 |    9. Accepting Warranty or Additional Liability. While redistributing
167 |       the Work or Derivative Works thereof, You may choose to offer,
168 |       and charge a fee for, acceptance of support, warranty, indemnity,
169 |       or other liability obligations and/or rights consistent with this
170 |       License. However, in accepting such obligations, You may act only
171 |       on Your own behalf and on Your sole responsibility, not on behalf
172 |       of any other Contributor, and only if You agree to indemnify,
173 |       defend, and hold each Contributor harmless for any liability
174 |       incurred by, or claims asserted against, such Contributor by reason
175 |       of your accepting any such warranty or additional liability.
176 | 
177 |    END OF TERMS AND CONDITIONS
178 | 
179 |    APPENDIX: How to apply the Apache License to your work.
180 | 
181 |       To apply the Apache License to your work, attach the following
182 |       boilerplate notice, with the fields enclosed by brackets "[]"
183 |       replaced with your own identifying information. (Don't include
184 |       the brackets!)  The text should be enclosed in the appropriate
185 |       comment syntax for the file format. We also recommend that a
186 |       file or class name and description of purpose be included on the
187 |       same "printed page" as the copyright notice for easier
188 |       identification within third-party archives.
189 | 
190 |    Copyright [yyyy] [name of copyright owner]
191 | 
192 |    Licensed under the Apache License, Version 2.0 (the "License");
193 |    you may not use this file except in compliance with the License.
194 |    You may obtain a copy of the License at
195 | 
196 |        http://www.apache.org/licenses/LICENSE-2.0
197 | 
198 |    Unless required by applicable law or agreed to in writing, software
199 |    distributed under the License is distributed on an "AS IS" BASIS,
200 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 |    See the License for the specific language governing permissions and
202 |    limitations under the License.
203 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | #  SAS Customer Intelligence 360 Download Client: Python
  2 | 
  3 | ## Overview
  4 | This Python script enables you to download cloud-hosted data tables from SAS Customer Intelligence 360.
  5 |  
  6 | The script can perform the following tasks:
  7 |  * Download the following data marts: `detail`, `dbtReport`, and `snapshot (previously identity)`.
  8 |  * Specify a time range to be downloaded.
  9 |  * Automatically unzip the download packages and create csv files with header rows and field delimiters.
 10 |  * Keep track of all initiated downloads. This lets you download a delta from the last complete download and append it to one file per table.
 11 | 
 12 | This topic contains the following sections:
 13 | * [Configuration](#configuration)
 14 | * [Using the Download Script](#using-the-download-script)
 15 |     * [Considerations](#considerations)
 16 |     * [Running the script](#running-the-script)
 17 |     * [Examples](#examples)
 18 | * [Contributing](#contributing)
 19 | * [License](#license)
 20 | * [Additional Resources](#additional-resources)
 21 | 
 22 | 
 23 | 
 24 | ## Configuration
 25 | 1. Install Python (version 3 or later) from https://www.python.org/.
 26 | 
 27 |    **Tip:** Select the option to add Python to your PATH variable. If you choose the advanced installation option, make sure to install the pip utility.
 28 |    
 29 | 2. Make sure the following modules are installed for Python: `argparse`, `backoff`, `base64`, `codecs`, `csv`, `gzip`, `json`, `os`, 
 30 | `pandas` (version 1.3.0 or later), `PyJWT`, `requests`, `sys`, `time`, and `tqdm`.
 31 | 
 32 |      In most cases, many of the modules are installed by default. To list all packages that are installed with Python 
 33 |      (through pip or by default), use this command:  
 34 |      ```python -c help('modules')```
 35 |      
 36 |      **Tip:** In most situations, you can install the non-default packages with this command:  
 37 |      ```pip install backoff pandas PyJWT requests tqdm```
 38 |   
 39 | 
 40 | 3. Create an access point in SAS Customer Intelligence 360.
 41 |     1. From the user interface, navigate to **General Settings** > **External Access** > **Access Points**.
 42 |     2. Create a new access point if one does not exist.
 43 |     3. Get the following information from the access point:  
 44 |        ```
 45 |         External gateway address: e.g. https://extapigwservice-<server>/marketingGateway  
 46 |         Name: ci360_agent  
 47 |         Tenant ID: abc123-ci360-tenant-id-xyz  
 48 |         Client secret: ABC123ci360clientSecretXYZ  
 49 |        ```
 50 | 4. Download the Python script from this repository and save it to your local machine.
 51 | 
 52 | 5. In the `./dsccnfg/config.txt` file, set the following variables for your tenant:
 53 |    ```
 54 |      agentName = ci360_agent
 55 |      tenantId  = abc123-ci360-tenant-id-xyz
 56 |      secret    = ABC123ci360clientSecretXYZ
 57 |      baseUrl   = https://extapigwservice-<server>/marketingGateway/discoverService/dataDownload/eventData/
 58 |    ```
 59 | 
 60 | 6. Verify the installation by running the following command from command prompt:  
 61 | ```py discover.py –h```
 62 | 
 63 | 
 64 | ## Using the Download Script
 65 | 
 66 | ### Considerations
 67 | Before starting a download, make a note of the following things:
 68 | * When you use the option to create a CSV, choose a delimiter that is not present in the source data.
 69 | * If data resets were processed and you download data in append mode, the old data is not deleted.  
 70 |   The new reset data for the same time period will be appended to the file.
 71 | * If you download data using schema 1 and then use append mode to download data using schema 6, the data is appended based on schema 6. However, the header rows in the existing file will not be updated.
 72 | 
 73 | ### Running the Script
 74 | 
 75 | 1. Open a command prompt.
 76 | 2. Run the discover.py script with <a href="#script-parameters">parameter values</a> that are based on the tables that you want to download. For example, to download the detail tables with a start and end date range, you can run the following command:
 77 | ```
 78 |   py discover.py -m detail -st 2019-12-01T00 -et 2019-12-01T12
 79 | ```
 80 | 
 81 | ---
 82 | **Note:** On Unix-like environments and Macs, the default `py` or `python` command might default to Python 2 if that version is installed. Uninstall earlier versions of Python, or explicitly call Python 3 when you run script like this example:
 83 | ```
 84 | python3 discover.py -m detail -st 2019-12-01T00 -et 2019-12-01T12
 85 | ```
 86 | 
 87 | You can verify which version runs by default with the following command: `python --version`
 88 | 
 89 | ---
 90 | 
 91 | <a name="script-parameters"> </a>
 92 | 
 93 | These are the parameters to use when you run the discover.py script:
 94 | 
 95 | | Parameter   | Description       |
 96 | | :---------- | :-----------------|
 97 | | -h          | Displays the help |
 98 | | -m          | The table set to download. Use one of these values:<br><ul><li>detail (This value downloads Detail mart tables and the partitioned CDM tables - cdm_contact_history and cdm_response_history.)</li><li>dbtReport</li><li>snapshot (for CDM tables that are not partitioned, identity tables, and metadata tables)</li></ul>  |
 99 | | -svn        | Specify a specific schema of tables to download. |
100 | | -st         | The start value in this datetime format: `yyyy-mm-ddThh` |
101 | | -et         | The end value in this datetime format: `yyyy-mm-ddThh`   |
102 | | -ct         | The category of tables to download. When the parameter is not specified, you download tables for all the categories that you have a license to access.<br><br>To download tables from a specific category, you can use one of these values:<ul><li>cdm</li><li>discover</li><li>engagedigital</li><li>engagedirect</li><li>engagemetadata</li><li>engagemobile</li><li>engageweb</li><li>engageemail</li><li>optoutdata</li><li>plan</li></ul><br>For more information, see [Schemas and Categories](https://go.documentation.sas.com/?cdcId=cintcdc&cdcVersion=production.a&docsetId=cintag&docsetTarget=dat-export-api-sch.htm).| 
103 | | -d          | Download only the changes (the delta) from the previous download. Set the value to `yes` or `no`. |
104 | | -l          | For partitioned tables, specify a limit of partitions to download. For example, `-l 150` downloads only the first 150 partitions of a specific set.|
105 | | -a          | Append the download to the existing files. Set the value to `yes` or `no`.  |
106 | | -cf         | Create a CSV file from the download tables. Set the value to `yes` or `no`. |
107 | | -cd         | Specify a delimiter other than the default |
108 | | -ch         | Include a column header in the first row. Set the value to `yes` or `no`. |
109 | | -cl         | Clean the download .zip files. By default, the files are deleted, but you can set this parameter to `no` to keep them. |
110 | 
111 | **Note:** The start and end ranges are only used for the script's first run. After the first run, the download history is stored in the dsccnfg directory. To force the script to use the variables for start date and end date, delete or move the history information.
112 | 
113 |    In addition, the values in the dataRangeStartTimeStamp column and dataRangeEndTimeStamp column in the download history tables are in the UTC time zone. The values in the download_dttm column are in the local time zone.
114 | 
115 | ### Examples
116 | 
117 | * Download the detail tables:  
118 | ```py discover.py –m detail```
119 | 
120 | * Download the discover Base tables:  
121 | ```py discover.py –m dbtReport```
122 | 
123 | * Download the snapshot tables:  
124 | ```py discover.py –m snapshot```
125 | 
126 | * Download the complete set of the CDM tables (both partitioned tables and non-partitioned tables):  
127 | ```py discover.py –m snapshot -ct cdm```  
128 | ```py discover.py –m detail -ct cdm```
129 | 
130 | * Download the detail tables (with only the delta from the last download), create a CSV file, and append to the existing files:  
131 | ```py discover.py –m detail –d yes –cf yes –a yes```
132 | 
133 | * Download the detail tables for the specific time range from start hour (`-st`) to end hour (`-et`):  
134 | ```py discover.py -m detail -st 2019-12-01T00 -et 2019-12-01T12```
135 | 
136 | * Download the discover base tables, create a CSV file, use the ";" (semicolon) delimiter, and include a column header in 
137 | the first row:  
138 | ```py discover.py -m dbtReport -cf yes -cd ";" -ch yes```
139 | 
140 | * This example is similar to the previous example, but the option `-cl no` keeps the downloaded zip files in the download 
141 | folder:  
142 | ```py discover.py -m dbtReport -cf yes -cd ";" -ch yes -cl no```
143 | 
144 | * Download the detail tables with a specific schema (`-svn`), and specify a limit (`-l`) to download only the most recent 
145 | 150 partitions:  
146 | ```py discover.py -m detail -svn 3 -l 150 -cf yes -cd "," -ch yes```
147 | 
148 | * Download the Plan data tables, create a CSV file, use the ";" (semicolon) delimiter, and include a column header in 
149 | the first row:  
150 | ```py discover.py -m snapshot -ct plan -svn 5 -cf yes -cd ";" -ch yes```
151 | 
152 | 
153 | 
154 | ## Contributing
155 | 
156 | We welcome your contributions! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to submit contributions to this project.
157 | 
158 | 
159 | 
160 | ## License
161 | 
162 | This project is licensed under the [Apache 2.0 License](LICENSE).
163 | 
164 | 
165 | 
166 | ## Additional Resources
167 | For more information, see [Downloading Data Tables with the REST API](https://go.documentation.sas.com/?softwareId=ONEMKTMID&softwareVersion=production.a&softwareContextId=DownloadDataTables) in the Help Center for SAS Customer Intelligence 360.
168 | 


--------------------------------------------------------------------------------
/SUPPORT.md:
--------------------------------------------------------------------------------
1 | ## Support
2 | 
3 | We use GitHub for tracking bugs and feature requests. Please submit a GitHub issue or pull request for support.


--------------------------------------------------------------------------------
/discover.py:
--------------------------------------------------------------------------------
   1 | #! python3
   2 | '''
   3 | Copyright © 2019, SAS Institute Inc., Cary, NC, USA.  All Rights Reserved.
   4 | SPDX-License-Identifier: Apache-2.0
   5 | Created on Nov 3, 2017
   6 | last update on Feb 5, 2025
   7 | Version 0.9
   8 | @authors: Mathias Bouten , Shashikant Deore
   9 | NOTE: This Discover Download script should help to better understand 
  10 |       the download API and can be used as a base to start interacting 
  11 |       with the API and download collected customer information. It is
  12 |       not officially supported by SAS. 
  13 | '''
  14 | 
  15 | import requests, base64
  16 | import json, jwt, gzip, csv, codecs
  17 | import os, sys, argparse, time
  18 | from argparse import RawTextHelpFormatter
  19 | from datetime import datetime, timedelta
  20 | from tqdm import tqdm
  21 | from urllib.parse import urlsplit
  22 | import pandas
  23 | import backoff
  24 | 
  25 | #version and update date
  26 | version = 'V0.92'
  27 | updateDate = '05 Feb 2025'
  28 | downloadClient = 'ci360pythonV0.92'
  29 | 
  30 | # default values
  31 | limit     = "20" 
  32 | csvflag   = 'no'
  33 | delimiter = '|'
  34 | csvheader = 'yes'
  35 | append    = 'no'
  36 | sohDelimiter = "\x01"
  37 | progressbar = 'no'
  38 | allhourstatus='true'
  39 | #subhourrange="60"
  40 | #autoreset = 'yes'
  41 | dayOffset = "60"
  42 | max_retry_attempts = 4
  43 | 
  44 | # folders 
  45 | dir_log    = 'log/'
  46 | dir_csv    = 'dscwh/'
  47 | dir_zip    = 'dscdonl/'
  48 | dir_config = 'dsccnfg/'
  49 | dir_extr   = 'dscextr/'
  50 | dir_sql    = 'sql/'
  51 | 
  52 | # global variables
  53 | querystring = {}
  54 | resetQueryString = {} 
  55 | gSql = ''
  56 | gSqlInsert = ''
  57 | cleanFiles = 'yes'
  58 | responseText =''
  59 | 
  60 | ##### functions #####
  61 | 
  62 | # function to do the version specific changes to the already created objects if any
  63 | def versionUpdate():
  64 | 
  65 |     # Change#1 : update download_history_detail.csv & download_history_detail.csv to add dataRangeProcessingStatus column
  66 |     ColumnDelimiter=';'
  67 |     for martNm in ('detail','dbtReport'):
  68 |         historyFile='download_history_' + martNm 
  69 |         historyFilePath = dir_config + historyFile + '.csv'
  70 |         if fileExists(historyFilePath):
  71 |             # read first line and see if it contains the required number of columns
  72 |             with open(historyFilePath) as f:
  73 |                 historyHeader = f.readlines()[1]
  74 |                 f.close
  75 |             
  76 |             countofDelmHeader = historyHeader.count(ColumnDelimiter)
  77 |             if not ( countofDelmHeader == 3 ) :
  78 | 
  79 |                 backupFile=logFileNmtimeStamped(historyFile,fmt='{filename}_%Y%m%dT%H%M%S')
  80 |                 backupFilePath=dir_config + backupFile + '.csv'
  81 | 
  82 |                 logger('  backing up ' + historyFile + ' as ' + backupFile , 'n', True)
  83 |                 # back up the existing file 
  84 |                 with open(historyFilePath) as hf:
  85 |                     with open(backupFilePath, "w") as bkf:
  86 |                         for line in hf:
  87 |                             bkf.write(line)
  88 |                 
  89 |                 logger('  updating ' + historyFile + ' to new version' , 'n', True)
  90 |                 # from the backup re-write existing history file with required number of columns in header and in data lines
  91 |                 headerline = 'dataRangeStart;dataRangeEnd;download_dttm;dataRangeProcessingStatus' + "\n"
  92 |                 rows = 0
  93 |                 with open(backupFilePath) as bkf:
  94 |                     with open(historyFilePath, "w") as hf:
  95 |                         for line in bkf:
  96 |                             rows = rows+1                        
  97 |                             if (rows == 1):       
  98 |                                 hf.write(headerline)
  99 |                             else:
 100 |                                 # append delimiter to line
 101 |                                 newline = line.replace('\n','') + ColumnDelimiter + "\n"
 102 |                                 hf.write(newline)
 103 | 
 104 | def getNextDataRangeStart():
 105 |     # set  nextDataRangeStart = lastDataRangeEnd + 1 ms 
 106 |     historyFile = dir_config + 'download_history_' + martName + '.csv'
 107 |     
 108 |     try:
 109 |         with open(historyFile) as f:
 110 |             last = f.readlines()[-1]
 111 |     
 112 |         lastDataRangeEnd = last.split(';',3)[1]
 113 |         adjustedTime = datetime.strptime(lastDataRangeEnd, '%Y-%m-%dT%H:%M:%S.%fZ')
 114 |         adjustedTime += pandas.to_timedelta(1, unit='ms')
 115 |         adjustedTimeStr = adjustedTime.strftime('%Y-%m-%dT%H:%M:%S.000Z')
 116 |         return adjustedTimeStr
 117 |     except FileNotFoundError as e:
 118 |         print('\n', e)
 119 |         raise SystemExit('\nFATAL: When you use the -d parameter, a history file must exist.')
 120 | 
 121 | def logFileNmtimeStamped(filename, fmt='{filename}_%Y%m%dT%H%M%S.log'):
 122 |     #return datetime.datetime.now().strftime(fmt).format(filename=filename)
 123 |     return datetime.now().strftime(fmt).format(filename=filename)
 124 | 
 125 | def readConfig(configFile):
 126 |     keys = {}
 127 |     seperator = '='
 128 |     with open(configFile) as f:
 129 |         for line in f:
 130 |             if line.startswith('#') == False and seperator in line: 
 131 |                 # Find the name and value by splitting the string
 132 |                 name, value = line.split(seperator, 1)
 133 |                 # Assign key value pair to dict
 134 |                 keys[name.strip()] = value.strip()
 135 |     return keys
 136 | 
 137 | def printDownloadDetails(json_data):
 138 | 
 139 |     #if there is a message attribute in json response log the message
 140 |     if json_data.get('message') :
 141 |         logger('WARNING:' + str(json_data['message']),'n' )
 142 | 
 143 |     if martName == 'identity' or martName == 'snapshot':
 144 |         logger('  download of dataMart snapshot','n')
 145 |     else:
 146 |         TotalDownloadPackages=json_data['count']
 147 |         CurrentPageDownloadPackages=len(json_data['items'])
 148 |         logger('  download of dataMart ' + martName \
 149 |             + ' - total downloads ' + str(TotalDownloadPackages) + ' package(s)' \
 150 |             + ' - downloading ' + str(CurrentPageDownloadPackages) + ' package(s)','n')
 151 | 
 152 |     # logging
 153 |     logger('  request URL: ' + url, 'n', False)
 154 |     logger('  config: ' + json.dumps(config), 'n', False)
 155 |     logger('  arguments: ' + str(args),'n',False)
 156 | 
 157 | def printResetDetails(json_data):
 158 |     # print details of the json response like number of total reset packages & number of reset package on current page
 159 |     TotalResetPackages=json_data['count']
 160 |     CurrentPageResetPackages=len(json_data['items'])
 161 |     logger('  reset of dataMart ' + martName \
 162 |         + ' - total resets ' + str(TotalResetPackages) + ' package(s)' \
 163 |         + ' -  current page resets ' + str(CurrentPageResetPackages) + ' package(s)','n')
 164 | 
 165 |     # logging
 166 |     logger('  request URL: ' + url, 'n', False)
 167 |     logger('  config: ' + json.dumps(config), 'n', False)
 168 |     logger('  arguments: ' + str(args),'n',False)
 169 | 
 170 |     # extract the query parameters from url & print ?
 171 |     
 172 | 
 173 | def createDiscoverAPIUrl(config):
 174 |     baseUrl = config['baseUrl']
 175 |     if martName == 'detail':
 176 |         url = baseUrl + 'detail/partitionedData'
 177 |     elif martName == 'dbtReport':
 178 |         url = baseUrl + 'dbtReport'
 179 |     elif martName == 'identity' or martName =='snapshot':
 180 |         url = baseUrl + 'detail/nonPartitionedData'
 181 |     else:
 182 |         print('Error: wrong martName ')
 183 |         sys.exit()
 184 |     return url
 185 | 
 186 | def logger(line, action, console=True):
 187 |     #logfile = dir_log + 'discover_download.log'
 188 |     logfilePath = dir_log + logfile
 189 |     nowDttm = str(datetime.now())
 190 |     with open(logfilePath,'a') as log:
 191 |         if action == 'n':
 192 |             log.write('\n' + nowDttm + ': ' + line)
 193 |             if console == True:
 194 |                 print('\n' + line, sep='', end='', flush=True)
 195 |         elif action == 'a':
 196 |             log.write(line)
 197 |             if console == True:
 198 |                 print(line, sep='', end='', flush=True)
 199 | 
 200 | def log_retry_attempt(details):
 201 |     #print ("Backing off {wait:0.1f} seconds afters {tries} tries "
 202 |     #       "calling function {target} with args {args} and kwargs "
 203 |     #       "{kwargs}".format(**details))
 204 |     logger('  Caught retryable error after ' + str(details["tries"]) + ' tries. Waiting  ' + str(round(details["wait"],2)) + ' more seconds then retrying...', 'n', True)
 205 |     logger('  responseText: ' + str(responseText) ,'n', True)
 206 | 
 207 | def after_all_retries(details):
 208 |     _, exception, _ = sys.exc_info()
 209 |     logger('  error executing ' + str(details["target"]),'n',True)
 210 |     logger('  exception ' + str(exception) ,'n', True)
 211 |     sys.exit(1)
 212 |     
 213 | @backoff.on_exception(
 214 |     backoff.expo
 215 |     ,requests.exceptions.RequestException
 216 |     ,max_tries=max_retry_attempts
 217 |     ,factor=5
 218 |     ,on_backoff=log_retry_attempt
 219 |     ,on_giveup=after_all_retries
 220 |     #,jitter=backoff.full_jitter
 221 |     #,giveup=lambda e: e.response is not None and e.response.status_code < 500
 222 |     #,max_time=30
 223 |     )
 224 | def getResetUrls(url):
 225 |     global responseText
 226 |     resetQueryString["agentName"] = config['agentName']
 227 |     if martName == 'dbtReport' :
 228 |         resetQueryString["martType"] = 'dbt-report'
 229 |     else:
 230 |         resetQueryString["martType"] = martName
 231 |     
 232 |     resetQueryString["dayOffset"] = dayOffset
 233 | 
 234 |     getURLs_start = time.time() # track get download URL request time
 235 |     
 236 |     logger('  send reset request to Discover API with querystring: ','n')
 237 |     logger('    ' + json.dumps(resetQueryString),'n')
 238 |     response = requests.request("GET", url, headers=headers, params=resetQueryString)
 239 |     # to test retry meachanism force the response status code 
 240 |     # response.status_code=409
 241 |     #print_response(response)
 242 |     responseText=response.text
 243 |     response.raise_for_status()
 244 |             
 245 |     getURLs_end = time.time() # track get download URL request time
 246 |     getURLs_duration = round((getURLs_end - getURLs_start),2)
 247 |     logger('  getResetUrls request duration: ' + str(getURLs_duration) + ' seconds','n')
 248 | 
 249 |     json_data = json.loads(response.text)
 250 |     
 251 |     response_file = dir_config + 'ResetResponse.json'
 252 |     with open(response_file, "w") as f:
 253 |             f.write(json.dumps(json_data, indent=4, sort_keys=True))
 254 |     
 255 |     if 'error' in json_data:
 256 |         print('\n Error: ' + json_data['error'] + " - " + json_data['message'])
 257 |         print('\n Check connection details! \n')
 258 |         sys.exit()
 259 |             
 260 |     return json_data
 261 | 
 262 | # Function to log the request information to console if required
 263 | def print_request(req):
 264 |     print('HTTP/1.1 {method} {url}\n{headers}\n\n{body}'.format(
 265 |         method=req.method,
 266 |         url=req.url,
 267 |         headers='\n'.join('{}: {}'.format(k, v) for k, v in req.headers.items()),
 268 |         body=req.body,
 269 |     ))
 270 | 
 271 | def print_response(res):
 272 |     print('HTTP/1.1 {status_code}\n{headers}\n\n{body}'.format(
 273 |         status_code=res.status_code,
 274 |         headers='\n'.join('{}: {}'.format(k, v) for k, v in res.headers.items()),
 275 |         body=res.content,
 276 |     ))
 277 |     Response_info= 'HTTP/1.1 {status_code}\n{headers}\n\n{body}'.format(status_code=res.status_code, headers='\n'.join('{}: {}'.format(k, v) for k, v in res.headers.items()),body=res.content)
 278 |     logger(Response_info, 'n')  
 279 | 
 280 | @backoff.on_exception(backoff.expo,requests.exceptions.RequestException,max_tries=max_retry_attempts,factor=5,on_backoff=log_retry_attempt,on_giveup=after_all_retries)
 281 | def getDownloadUrls(url):
 282 |     global responseText
 283 |     querystring["limit"] = limit
 284 |     querystring["agentName"] = config['agentName']
 285 |     getURLs_start = time.time() # track get download URL request time
 286 |     
 287 |     logger('  send download request to Discover API with querystring: ','n')
 288 |     logger('    ' + json.dumps(querystring),'n')
 289 |     response = requests.request("GET", url, headers=headers, params=querystring)
 290 |     # to test retry meachanism force the response status code 
 291 |     # response.status_code=409
 292 |     responseText=response.text
 293 |     
 294 |     response.raise_for_status()
 295 | 
 296 |             
 297 |     getURLs_end = time.time() # track get download URL request time
 298 |     getURLs_duration = round((getURLs_end - getURLs_start),2)
 299 |     logger('  getDownloadUrls request duration: ' + str(getURLs_duration) + ' seconds','n')
 300 | 
 301 |     json_data = json.loads(response.text)
 302 |     
 303 |     response_file = dir_config + 'response.json'
 304 |     with open(response_file, "w") as f:
 305 |             f.write(json.dumps(json_data, indent=4, sort_keys=True))
 306 |     
 307 |     if 'error' in json_data:
 308 |         print('\n Error: ' + json_data['error'] + " - " + json_data['message'])
 309 |         print('\n Check connection details! \n')
 310 |         sys.exit()
 311 |             
 312 |     return json_data
 313 | 
 314 | def createDiscoverResetAPIUrl(config):
 315 |     baseUrl = config['baseUrl']
 316 |     if martName == 'detail':
 317 |         #url = baseUrl + 'partitionedData/resets'
 318 |         url = baseUrl + 'partitionedData/resets'
 319 |     elif martName == 'dbtReport':
 320 |         #url = baseUrl + 'dbtReport/resets'
 321 |         url = baseUrl + 'partitionedData/resets'        
 322 |     else:
 323 |         print('Error: wrong martName ')
 324 |         sys.exit()
 325 |     return url
 326 | 
 327 | def createDiscoverAPIUrlFromHref(config,href):
 328 |     # function to generate the reset API 
 329 |     baseUrl = config['baseUrl']
 330 |     baseUrlHost = "{0.scheme}://{0.netloc}".format(urlsplit(baseUrl))
 331 |     url= baseUrlHost +  href
 332 |     return url
 333 | 
 334 | @backoff.on_exception(backoff.expo,requests.exceptions.RequestException,max_tries=max_retry_attempts,factor=5,on_backoff=log_retry_attempt,on_giveup=after_all_retries)
 335 | def getSchema( url, tablename ):
 336 |     global responseText
 337 |     global gSql 
 338 |     global gSqlInsert
 339 |     r = requests.get(url)
 340 |     # to test retry meachanism force the response status code 
 341 |     # r.status_code=409
 342 |     responseText=r.text
 343 |     r.raise_for_status()
 344 | 
 345 |     json_meta = json.loads(r.text)
 346 |     columnHeader = ''
 347 |     sqlTable  = 'create table ' + tablename + '('
 348 |     sqlInsert = 'insert into ' + tablename + ' values ('
 349 |     sqlColumn = ''
 350 |     sqlInsertColumn = ''
 351 | 
 352 |     for item in json_meta:
 353 |         meta_table = item['table_name']       
 354 |         if tablename.lower() == meta_table.lower():
 355 |             column = item['column_name']
 356 |             columnType = item['column_type']
 357 |             sqlColumn = sqlColumn + '\n  ' + column + ' ' + columnType + ', '
 358 |             sqlInsertColumn = sqlInsertColumn + '%s,'
 359 |             columnHeader = columnHeader + column + delimiter  
 360 |     
 361 |     #finish create table statement    
 362 |     gSql = gSql + sqlTable + sqlColumn[:-2] + ');\n\n'   
 363 |     gSqlInsert = sqlInsert + sqlInsertColumn[:-1] + ')'
 364 |         
 365 |     #remove last delimiter and return line 
 366 |     return columnHeader[:-len(delimiter)]
 367 | 
 368 | @backoff.on_exception(backoff.expo,requests.exceptions.RequestException,max_tries=max_retry_attempts,factor=5,on_backoff=log_retry_attempt,on_giveup=after_all_retries)
 369 | def downloadWithProgress( url, outputfile, writeType):
 370 |     global responseText
 371 |     r = requests.get(url, stream=True)
 372 |     # to test retry meachanism force the response status code 
 373 |     # r.status_code=409
 374 |     #responseText=r.text
 375 |     r.raise_for_status()
 376 |     # Total size in bytes.
 377 |     file_size = int(r.headers.get('content-length', 0))
 378 |     block_size = 1024
 379 |     wrote = 0 
 380 |     with open(outputfile, writeType) as f:
 381 |         #for data in tqdm(iterable = r.iter_content(block_size), total= file_size/block_size , unit='KB', unit_scale=True, leave=True):
 382 |         for i in tqdm(range(file_size), ncols = 100, unit='KB'):
 383 |             data = r.raw.read(block_size) # read content block in bytes
 384 |             wrote = wrote + len(data)     # update actual number of written bytes
 385 |             i = i + block_size            # update iterator to continue loop from right point
 386 |             f.write(data)                 # write data to file
 387 |             f.flush()                     
 388 |     if file_size != 0 and wrote != file_size:
 389 |         print("ERROR, something went wrong during download - try again") 
 390 | 
 391 | @backoff.on_exception(backoff.expo,requests.exceptions.RequestException,max_tries=max_retry_attempts,factor=5,on_backoff=log_retry_attempt,on_giveup=after_all_retries)
 392 | def download( url, outputfile, writeType):
 393 |     global responseText
 394 |     r = requests.get(url)
 395 |     # to test retry meachanism force the response status code 
 396 |     #r.status_code=409
 397 |     #responseText=r.text
 398 |     r.raise_for_status()
 399 |     #with open(outputfile, writeType) as f:
 400 |     #    f.write(r.content)                 # write data to file
 401 | 
 402 |     # Retry for PermissionError in the open statement
 403 |     retry_attempts = 5
 404 |     for attempt in range(retry_attempts):
 405 |         try:
 406 |             with open(outputfile, writeType) as f:
 407 |                 f.write(r.content)  # Write data to file
 408 |             break  # If the operation succeeds, exit the loop
 409 |         except PermissionError:
 410 |             if attempt < retry_attempts - 1:  # Don't sleep after the last attempt
 411 |                 print(f"PermissionError occurred. Retry {attempt + 1}/{retry_attempts} after 1 second.")
 412 |                 time.sleep(2)  # Wait for 1 second before retrying
 413 |             else:
 414 |                 raise  # If it fails after the max retries, raise the exception
 415 | 
 416 | def unzipFile(in_file,out_file,in_delimiter,out_delimiter,header):
 417 |     #read file line by line and write columns as per schema 
 418 |     error = 0
 419 |     errorMsg = ''
 420 | 
 421 |     with gzip.open(in_file, "rb") as in_f, \
 422 |         open(out_file, "wb") as out_f:
 423 | 
 424 |         # go line by line and make changes in line to match header columns and data columns
 425 |         rows = 0
 426 |         for line in in_f:
 427 |             line2=str(line, 'utf-8')
 428 |             rows = rows+1
 429 |             # when its first row check if number of header cols are different from number of data columns
 430 |             if rows == 1:
 431 |                 countofDelmHeader = header.count(out_delimiter)
 432 |                 countofDelmData = str(line2).count(in_delimiter)
 433 |         
 434 |             # when schema is old but datafile is newer version then remove the extra columns
 435 |             if countofDelmHeader < countofDelmData :
 436 |                 #split the fields into list
 437 |                 fields = line2.split(in_delimiter)
 438 |                 #limit the list to required number of columns as per header
 439 |                 #join fileds and create a line string again 
 440 |                 line2=in_delimiter.join(fields[0:countofDelmHeader + 1])  + "\n"
 441 |             # when schema is newer but datafile is older version then add the extra empty columns
 442 |             elif countofDelmHeader > countofDelmData :
 443 |                 # remove the existing newline char ..add the empty columns and in the end of line add the new line char
 444 |                 line2 = line2.replace('\n','') + in_delimiter*(countofDelmHeader - countofDelmData) + "\n"
 445 | 
 446 |             try:
 447 |                 #out_f.write(line2.replace(in_delimiter, in_delimiter).encode())
 448 |                 out_f.write(line2.encode())
 449 |             except (UnicodeEncodeError) as e:
 450 |                 error = error+1
 451 |                 errorMsg = errorMsg +'\nerror in row: ' + str(rows) + ' - ' + str(e)
 452 | 
 453 |         logger("...unzipped file with " + str(rows) + " rows - errors: " + str(error),'a')
 454 | 
 455 | def createCSV(in_file, out_file, in_delimiter, out_delimiter, header):
 456 |     #read unzipped file line by line and replace delimiter
 457 |     error = 0
 458 |     errorMsg = ''
 459 |     with codecs.open(in_file, 'r','utf-8') as in_f, \
 460 |          codecs.open(out_file, 'w','utf-8') as out_f:
 461 |         # print column header line in csv file if flag is yes
 462 |         if csvheader == 'yes':
 463 |             out_f.write(header+"\n")
 464 |         # go line by line and replace delimiter
 465 |         rows = 0
 466 |         for line in in_f:
 467 |             rows = rows+1
 468 |             try:
 469 |                 out_f.write(line.replace(in_delimiter, out_delimiter))
 470 |             except (UnicodeEncodeError) as e:
 471 |                 error = error+1
 472 |                 errorMsg = errorMsg +'\nerror in row: ' + str(rows) + ' - ' + str(e)
 473 | 
 474 |     logger("...CSV created with " + str(rows) + " rows - errors: " + str(error),'a')
 475 |     
 476 |     if error != 0:
 477 |         logError(in_file, errorMsg)
 478 |         logger(" Error during csv creation process - see separate log file", 'n')
 479 | 
 480 |     if cleanFiles == 'yes':
 481 |         os.remove(in_file)
 482 | 
 483 | def createSingleTableFiles( entity, schemaUrl ):
 484 |     name = entity['entityName']
 485 |     tablefile = dir_csv+name+'.csv'
 486 |     if not os.path.exists(tablefile):
 487 |         header = getSchema(schemaUrl, name)
 488 |         with open(tablefile,'w') as f:
 489 |             f.write(header+"\n")
 490 | 
 491 | def appendCSV(in_file, out_file, in_delimiter, out_delimiter):
 492 |     #read unzipped file line by line and replace delimiter
 493 |     error = 0
 494 |     errorMsg = ''
 495 |     with codecs.open(in_file, 'r','utf-8') as in_f, \
 496 |          codecs.open(out_file, 'a','utf-8') as out_f:
 497 |         # go line by line and replace delimiter
 498 |         rows = 0
 499 |         for line in in_f:
 500 |             rows = rows+1
 501 |             try:
 502 |                 out_f.write(line.replace(in_delimiter, out_delimiter))
 503 |             except (UnicodeEncodeError) as e:
 504 |                 error = error+1
 505 |                 errorMsg = errorMsg +'\nerror in row: ' + str(rows) + ' - ' + str(e)
 506 | 
 507 |     logger("...appended " + str(rows) + " rows - errors: " + str(error),'a')
 508 |     
 509 |     if error != 0:
 510 |         logError(in_file, errorMsg)
 511 |         logger(" Error during append process - see separate log file", 'n')
 512 |     
 513 |     if cleanFiles == 'yes':
 514 |         os.remove(in_file)
 515 | 
 516 | def logError(file, message):
 517 |     errorLog = dir_log + 'error_' + file + '.log'
 518 |     with open(errorLog, 'a') as f:
 519 |         f.write(message)
 520 |        
 521 | def downloadEntity( entity, schemaUrl, prefix ):
 522 |     name = entity['entityName']
 523 |     header = getSchema(schemaUrl, name)
 524 |     zippedFile = dir_extr+prefix+name+'.gz'
 525 |     unzippedFile = dir_zip+prefix+name+'.soh'
 526 |     csvFile = dir_csv+prefix+name+'.csv'
 527 |     sqlFile = dir_sql+prefix+'create_tables_'+martName+'.sql'
 528 |     tablefile = dir_csv+name+'.csv'
 529 |     items = len(entity['dataUrlDetails'])
 530 |     logger('    ' + name + ' - total items: ' + str(items),'n')
 531 |     
 532 |     i=0
 533 |     for dataUrlDetail in entity['dataUrlDetails']:
 534 |         i=i+1
 535 | 
 536 |         # when its first file in the hour create a new file else append to existing file
 537 |         if (i==1):
 538 |             writeType='wb'
 539 |         else:
 540 |             writeType='ab'
 541 | 
 542 |         url = dataUrlDetail['url']
 543 |         logger('      item#: ' + str(i) , 'n')
 544 |         if progressbar == 'yes':
 545 |             #print(" - download with progress bar")
 546 |             #downloadWithProgress( url, zippedFile, "ab")
 547 |             downloadWithProgress( url, zippedFile, writeType)            
 548 |         elif progressbar == 'no':
 549 |             #print(" - download without progress bar")
 550 |             #download( url, zippedFile, "ab")
 551 |             download( url, zippedFile, writeType)
 552 |         
 553 |     #unzip downloaded file
 554 |     ''' sinshd - created a new function to do line by line unzipping
 555 |     with gzip.open(zippedFile, "rb") as zipped, \
 556 |         open(unzippedFile, "wb") as unzipped:
 557 |         #read zipped data
 558 |         unzipped_content = zipped.read()
 559 |         #save unzipped data into file
 560 |         unzipped.write(unzipped_content)
 561 |         logger("...unzipped",'a')
 562 |       '''
 563 |     unzipFile(zippedFile,unzippedFile,sohDelimiter,delimiter,header)
 564 |     
 565 |     #create CSV file - replace SOH delimiter
 566 |     if csvflag == 'yes' and append == 'no':        
 567 |         #sinshd - 
 568 |         createCSV(unzippedFile, csvFile, sohDelimiter, delimiter, header)
 569 |         #createCSV2(unzippedFile, csvFile, sohDelimiter, delimiter, header)
 570 |     elif csvflag == 'yes' and append == 'yes':
 571 |         createSingleTableFiles(entity, schemaUrl)
 572 |         appendCSV(unzippedFile, tablefile, sohDelimiter, delimiter)
 573 |         
 574 |         
 575 |     #remove zipped file
 576 |     if cleanFiles == 'yes':
 577 |         os.remove(zippedFile)
 578 |         
 579 |     with open(sqlFile,'w') as f:
 580 |         f.write(gSql)
 581 |     
 582 |     return
 583 | 
 584 | def logHistory(dataRangeStart, dataRangeEnd,dataRangeProcessingStatus):
 585 | 
 586 |     if resetInProgress == True :
 587 |         historyFile = dir_config + 'reset_download_history_' + martName + '.csv' 
 588 |     else:
 589 |         historyFile = dir_config + 'download_history_' + martName + '.csv'
 590 |     
 591 |     #historyFile = dir_config + 'download_history_' + martName + '.csv'
 592 |     nowDttm = str(datetime.now())
 593 | 
 594 |     # if martName = detail or dbtReport
 595 |     headerline = 'dataRangeStart;dataRangeEnd;download_dttm;dataRangeProcessingStatus'
 596 |     recordline = dataRangeStart + ';' + dataRangeEnd + ';' + nowDttm + ';' + dataRangeProcessingStatus
 597 |     
 598 |     # open file - if not exist create it with headerline
 599 |     if not os.path.exists(historyFile):
 600 |         with open(historyFile, 'w') as f:
 601 |             f.write(headerline+"\n")
 602 | 
 603 |     # append rows to history file
 604 |     with open(historyFile, 'a') as f:
 605 |         f.write(recordline+"\n")
 606 | 
 607 | def logResetRange(dataRangeStart=None, dataRangeEnd=None, resetCompleted_dttm=None):
 608 |     historyFile = dir_config + 'reset_range_' + martName + '.csv'
 609 |     # if martName = detail or dbtReport
 610 |     headerline = 'dataRangeStart;dataRangeEnd;resetCompleted_dttm;download_dttm'
 611 | 
 612 |     # open file - if not exist create it with headerline
 613 |     if not os.path.exists(historyFile):
 614 |         with open(historyFile, 'w') as f:
 615 |             f.write(headerline+"\n")
 616 | 
 617 |     # append line only when dataRangeStart is set to something 
 618 |     # this way we can call logResetRange to just create an header row only
 619 |     if not (dataRangeStart == None):
 620 |         nowDttm = str(datetime.now())
 621 |         recordline = dataRangeStart + ';' + dataRangeEnd + ';' + resetCompleted_dttm + ';' + nowDttm
 622 |         # append rows to history file
 623 |         with open(historyFile, 'a') as f:
 624 |             f.write(recordline+"\n")    
 625 | 
 626 | def logHistorySnapshot(entity):
 627 |     historyFile = dir_config +'download_history_snapshot.csv'
 628 |     nowDttm = str(datetime.now())
 629 |     lastModifiedTimestamp = entity['dataUrlDetails'][0]['lastModifiedTimestamp']
 630 | 
 631 |     headerline = 'entityName;lastModifiedTimestamp;download_dttm'
 632 |     recordline = entity['entityName'] + ';' + lastModifiedTimestamp + ';' + nowDttm
 633 |     
 634 |     # open file - if not exist create it with headerline
 635 |     if not os.path.exists(historyFile):
 636 |         with open(historyFile, 'w') as f:
 637 |             f.write(headerline+"\n")
 638 | 
 639 |     # append rows to history file
 640 |     with open(historyFile, 'a') as f:
 641 |         f.write(recordline+"\n")
 642 | 
 643 | def readDownloadHistory(historyFile):
 644 |     # function to read the mart history file as dataframe 
 645 |     # this will be later used to do lookups
 646 |     # e.g. to check if the history records exits 
 647 |     # historyFile = dir_config + 'download_history_' + martName + '.csv'
 648 |     df = pandas.read_csv(historyFile
 649 | 			            ,sep=';'
 650 | 			            ,header=0
 651 | 			            ,names=['dataRangeStart','dataRangeEnd','download_dttm','dataRangeProcessingStatus']
 652 | 			            ,parse_dates =['dataRangeStart','dataRangeEnd','download_dttm'])
 653 |     return df
 654 | 
 655 | def readResetRange(resetFile):
 656 |     # function to read the mart reset file as dataframe 
 657 |     # this will be later used for lookups
 658 |     # e.g. to check if the reset records exits 
 659 |     # resetFile = dir_config + 'reset_range_' + martName + '.csv'
 660 |     
 661 |     # weflower 2025-02-05: pandas 1.3.0+ deprecated error_bad_lines and replaced with on_bad_lines
 662 |     # https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
 663 |     
 664 |     df = pandas.read_csv(resetFile
 665 | 			            ,sep=';'
 666 |                         ,on_bad_lines='warn'
 667 | 			            ,header=0
 668 | 			            ,names=['dataRangeStart','dataRangeEnd','resetCompleted_dttm','download_dttm']
 669 | 			            ,parse_dates =['dataRangeStart','dataRangeEnd','resetCompleted_dttm','download_dttm'])
 670 |     return df
 671 | 
 672 | def fileExists( fileName ):
 673 |     # function to check if the input fileName exists
 674 |     fileExists=False
 675 |     if os.path.exists(fileName):
 676 |         fileExists=True
 677 |     return fileExists
 678 | 
 679 | def loopThroughDownloadPackages(url):
 680 | 
 681 |     # call CI360 Discover API to get Reset URLs
 682 |     json_data = getDownloadUrls(url)
 683 | 
 684 |     # print details of the json response like number of total download packages & number of download package on current page
 685 |     printDownloadDetails(json_data)
 686 | 
 687 |     #loop through download packages 
 688 |     packageNumber=0
 689 |     for item in json_data['items']:
 690 |         schemaUrl = item['schemaUrl']
 691 |         prefix = ''
 692 |         
 693 |         # only for detail and dbtReport data mart display the ranges
 694 |         if martName == 'detail' or martName == 'dbtReport':
 695 |             rangeStartDt = item['dataRangeStartTimeStamp']
 696 |             rangeStart = rangeStartDt.replace(':','-').replace('.000Z','')
 697 |             rangeEndDt = item['dataRangeEndTimeStamp']
 698 |             rangeEnd = rangeEndDt.replace(':','-').replace('.999Z','')
 699 |             
 700 |             if not is_json_key_present(item, 'dataRangeProcessingStatus'):
 701 |                 processingStatus  = ''
 702 |             else:  
 703 |                 processingStatus = item['dataRangeProcessingStatus']
 704 | 
 705 |             prefix = rangeStart+ "_"
 706 |             packageNumber=packageNumber+1
 707 |             str_packageNumber = str(packageNumber)
 708 |             # add a zero infront of package number if number is lower 10
 709 |             if packageNumber < 10:
 710 |                 str_packageNumber = '0' + str_packageNumber   
 711 |                  
 712 |             logger('********** Tables of package ' + str_packageNumber \
 713 |                     + ' - start: ' + str(rangeStart) + ' **********', 'n')
 714 |             # dataRangeProcessingStatus : DATA_AVAILABLE NO_DATA ERROR RESET_INPROGRESS
 715 |             logger('    dataRangeProcessingStatus : ' + processingStatus , 'n')
 716 |             
 717 |         for entity in json_data['items'][packageNumber-1]['entities']:  
 718 |             ###createSingleTableFiles(entity, schemaUrl)          
 719 |             downloadEntity(entity, schemaUrl, prefix)      
 720 |             if martName == 'identity' or martName == 'snapshot' :
 721 |                 logHistorySnapshot(entity)
 722 |     
 723 |         if martName == 'detail' or martName == 'dbtReport':
 724 |             logHistory(rangeStartDt, rangeEndDt,processingStatus)
 725 | 
 726 |     logger('********** Finished Downloading Current Page **********', 'n')
 727 | 
 728 |     for link in json_data['links']:
 729 |         if link['rel'] == 'next' :
 730 |             nextHref = link['href']
 731 |             #form the next url using href
 732 |             url=createDiscoverAPIUrlFromHref(config,nextHref)
 733 |             loopThroughDownloadPackages(url)
 734 | 
 735 | def is_json_key_present(json, key):
 736 |     try:
 737 |         buf = json[key]
 738 |     except KeyError:
 739 |         return False
 740 | 
 741 |     return True
 742 | 
 743 | def loopThroughResetPackages(url):
 744 | 
 745 |     # call CI360 Discover API to get Reset URLs
 746 |     json_data = getResetUrls(url)
 747 |     # print details of the json response like number of total reset packages & number of reset package on current page
 748 |     printResetDetails(json_data)
 749 |     
 750 |     #loop through reset packages 
 751 |     packageNumber = 0
 752 |     for item in json_data['items']:
 753 |         packageNumber=packageNumber + 1
 754 |         dataRangeStart = item['dataRangeStartTimeStamp']
 755 |         dataRangeEnd = item['dataRangeEndTimeStamp']
 756 |         resetCompleted_dttm = item['resetCompletedTimeStamp']    
 757 | 
 758 |         str_packageNumber = str(packageNumber)
 759 |         # add a zero infront of package number if number is lower 10
 760 |         if packageNumber < 10:
 761 |             str_packageNumber = '0' + str_packageNumber   
 762 |         logger('********** Reset of package ' + str_packageNumber \
 763 |             + ' - start: ' + str(dataRangeStart) \
 764 |             + ' - end: ' + str(dataRangeEnd) \
 765 |             + ' - resetCompleted_dttm: ' + str(resetCompleted_dttm) + ' **********' , 'n')
 766 | 
 767 |         # get the download packages from the current downloadUrl and loop through all download packages
 768 |         downloadUrlHref=item['downloadUrl']
 769 |         downloadUrl=createDiscoverAPIUrlFromHref(config,downloadUrlHref)
 770 | 
 771 |         logger('           checking if reset range exists in download history...  ', 'n')
 772 |         # check if the input range exists in download history
 773 |         # create a dataframe with the filtered data from download history dataframe        
 774 |         hist_dataRange_df = download_history_df[(download_history_df['dataRangeStart']==dataRangeStart)]
 775 |         if (len(hist_dataRange_df.index) > 0 ):
 776 |             logger('exists ', 'a')
 777 |             # check if reset range is already downloaded (record exists in reset history)
 778 |             logger('           checking if reset range exists in reset history...  ', 'n')
 779 |             reset_dataRange_df = reset_range_df[(reset_range_df['dataRangeStart']==dataRangeStart)
 780 |                                                  &(reset_range_df['dataRangeEnd']==dataRangeEnd)
 781 |                                                  &(reset_range_df['resetCompleted_dttm']==resetCompleted_dttm)]
 782 |             if (len(reset_dataRange_df.index) == 0 ):
 783 |                 logger(' does not exists ..starting download', 'a')
 784 |                 # download reseted data
 785 |                 loopThroughDownloadPackages(downloadUrl)
 786 |                 # add the reset entry into reset history table 
 787 |                 logResetRange(dataRangeStart, dataRangeEnd, resetCompleted_dttm)      
 788 |             else:
 789 |                 logger(' exists..skipping reset ', 'a')
 790 |         else:
 791 |             logger(' does not exists..skipping reset ', 'a')
 792 |             # check if reset range is already downloaded (record exists in reset history)
 793 |         
 794 |     logger('********** Finished Reset of packages on current page **********', 'n')
 795 |     for link in json_data['links']:
 796 |         if link['rel'] == 'next' :
 797 |             nextResetRangeHref = link['href']
 798 |             #form the next url using href
 799 |             url=createDiscoverAPIUrlFromHref(config,nextResetRangeHref)
 800 |             loopThroughResetPackages(url)
 801 | 
 802 | 
 803 | ###############################################
 804 | 
 805 | # set dynamic log file name to create a new log file in each run 
 806 | #logfile = dir_log + 'discover_download.log'
 807 | #logfile = dir_log + logFileNmtimeStamped('discover_' + martName)
 808 | #print ('logfile:',logfile)
 809 | #logger('CI360 DISCOVER Download API (' + version + ') - last updated '+ updateDate,'n')
 810 | 
 811 | #check command line arguments
 812 | parser = argparse.ArgumentParser(description=
 813 |         'Download CI360 Discover data for a specific data mart. \
 814 |         \nOptional you can download data for a specific time range. \
 815 |         \nOptional you can transform the downloaded data into a specific CSV file.\n \
 816 |         \nExamples: py discover.py -m detail \
 817 |         \n          py discover.py -m detail -d yes -cf yes -a yes \
 818 |         \n          py discover.py -m detail -d yes -cf yes -a yes -pb yes \
 819 |         \n          py discover.py -m detail -l 2 \
 820 |         \n          py discover.py -m detail -st 2017-12-07T10 -et 2017-12-07T12 \
 821 |         \n          py discover.py -m dbtReport -cf yes -cd ";" -ch yes \
 822 |         \n          py discover.py -m snapshot '
 823 |         , formatter_class=RawTextHelpFormatter)
 824 | 
 825 | parser.add_argument('-m', action='store', dest='mart', type=str, 
 826 |     help='enter dataMart: detail, dbtReport or snapshot', required=True)
 827 | parser.add_argument('-l', action='store', dest='limit', type=int, 
 828 |     help='enter a limit: ie. 30 - default 20', required=False)
 829 | parser.add_argument('-cd', action='store', dest='delimiter', type=str, 
 830 |     help='enter a csv delimiter - default | (pipe)', required=False)
 831 | parser.add_argument('-cf', action='store', dest='csvflag', type=str, 
 832 |     help='create csv: yes or no - default no', required=False)
 833 | parser.add_argument('-ch', action='store', dest='csvheader', type=str, 
 834 |     help='csv column header row: yes or no - default yes', required=False)
 835 | parser.add_argument('-st', action='store', dest='start', type=str, 
 836 |     help='enter start time: ie. 2017-11-07T10', required=False)
 837 | parser.add_argument('-et', action='store', dest='end', type=str, 
 838 |     help='enter end time: ie. 2017-11-07T12', required=False)
 839 | parser.add_argument('-a', action='store', dest='append', type=str, 
 840 |     help='append to one file: yes or no - default no', required=False)
 841 | parser.add_argument('-d', action='store', dest='delta', type=str, 
 842 |     help='download delta: yes or no - default no', required=False)
 843 | parser.add_argument('-cl', action='store', dest='clean', type=str, 
 844 |     help='clean zip files: yes or no - default yes', required=False)
 845 | parser.add_argument('-pb', action='store', dest='progressbar', type=str, 
 846 |     help='show progress bar: yes or no - default no', required=False)
 847 | 
 848 | #added 2018-11-21 by Mathias Bouten - new API features
 849 | #parser.add_argument('-ahs', action='store', dest='allhourstatus', type=str, 
 850 | #    help='enter includeAllHoursStatus flag: ie. true - default false', required=False)
 851 | parser.add_argument('-shr', action='store', dest='subhourrange', type=str, default=60,
 852 |     help='enter subHourlyDataRangeInMinutes: ie. 10', required=False)
 853 | parser.add_argument('-svn', action='store', dest='schemaversion', type=str, default=1,
 854 |     help='enter schemaVersion: ie. 3 - default 1', required=False)
 855 | parser.add_argument('-ar', action='store', dest='autoreset', type=str, default='yes',
 856 |     help='perform reset before download : yes or no - default yes', required=False)
 857 | parser.add_argument('-ct', action='store', dest='category', type=str, default='discover',
 858 |     help='category to download : e.g. discover,engagedirect .. - default discover', required=False)
 859 | #added 2020-01-27 - sinshd -new API features - test mode download
 860 | parser.add_argument('-tm', action='store', dest='testmode', type=str,
 861 |     help='test mode download : e.g. PLANTESTMODE  ', required=False)
 862 | 
 863 | args = parser.parse_args()
 864 | 
 865 | if args.mart is not None:
 866 |     martName = args.mart
 867 |     download_msg = 'all'
 868 |     print('  datamart: ' + martName)
 869 | if args.limit is not None:
 870 |     limit = str(args.limit)
 871 |     querystring["limit"] = limit
 872 |     print('  limit: ' + limit)
 873 | if args.delimiter is not None:
 874 |     delimiter = str(args.delimiter)
 875 |     print('  delimiter: ' + delimiter)
 876 | if args.csvflag is not None:
 877 |     csvflag = str(args.csvflag)
 878 |     print('  csvflag: ' + csvflag)
 879 | if args.csvheader is not None:
 880 |     csvheader = str(args.csvheader)
 881 |     print('  csvheader: ' + csvheader)
 882 | if args.clean is not None:
 883 |     cleanFiles = str(args.clean)
 884 |     print('  cleanFiles: ' + cleanFiles)
 885 | if args.progressbar is not None:
 886 |     progressbar = str(args.progressbar)
 887 |     print('  progressbar: ' + progressbar)
 888 | if args.start is not None:
 889 |     dataRangeStartTimeStamp = str(args.start) + ":00:00.000Z"
 890 |     querystring["dataRangeStartTimeStamp"] = dataRangeStartTimeStamp   
 891 |     print('  start: ' + dataRangeStartTimeStamp)
 892 | if args.end is not None:
 893 |     dataRangeEndTimeStamp = str(args.end) + ":00:00.000Z"
 894 |     querystring["dataRangeEndTimeStamp"] = dataRangeEndTimeStamp
 895 |     print('  end: ' + dataRangeEndTimeStamp)
 896 | if args.append is not None:
 897 |     append = str(args.append)
 898 |     print('  append: ' + append)
 899 | if args.delta is not None:
 900 |     # sinshd - use new function to get the max(lastEnd) + 1 instead of last(start) + 1 hour,
 901 |     #  as this can create a gap when download is in minute level mode 
 902 |     # dataRangeStartTimeStamp = getLastDataRangeStart()
 903 |     dataRangeStartTimeStamp = getNextDataRangeStart()
 904 |     querystring["dataRangeStartTimeStamp"] = dataRangeStartTimeStamp 
 905 |     print('  start: ' + dataRangeStartTimeStamp)
 906 |     # sinshd - getDataRangeEndOfNow is a system clock , this can limit the data even though its available in source
 907 |     # instead do not set the end , API by default should return upto the current date time when only start time is set
 908 |     #dataRangeEndTimeStamp = getDataRangeEndOfNow()    
 909 |     #querystring["dataRangeEndTimeStamp"] = dataRangeEndTimeStamp    
 910 |     #print('  end: ' + dataRangeEndTimeStamp)
 911 | 
 912 | #added 2018-11-21 by Mathias Bouten - new API features
 913 | 
 914 | # sinshd - set the allhourstatus = true by default
 915 | #if args.allhourstatus is not None:
 916 | #    allhourstatus = str(args.allhourstatus)
 917 | querystring["includeAllHourStatus"] = allhourstatus
 918 | print('  includeAllHourStatus: ' + allhourstatus)
 919 | 
 920 | if args.subhourrange is not None:
 921 |     subhourrange = str(args.subhourrange)
 922 |     querystring["subHourlyDataRangeInMinutes"] = subhourrange
 923 |     print('  subHourlyDataRangeInMinutes: ' + subhourrange)
 924 | 
 925 | if args.schemaversion is not None:
 926 |     schemaversion = str(args.schemaversion)
 927 |     querystring["schemaVersion"] = schemaversion
 928 |     print('  schemaVersion: ' + schemaversion)
 929 | 
 930 | if args.autoreset is not None:
 931 |     autoreset = str(args.autoreset)
 932 |     print('  autoreset: ' + autoreset)
 933 | 
 934 | if args.category is not None:
 935 |     category = str(args.category)
 936 |     querystring["category"] = category
 937 |     print('  category: ' + category)
 938 | 
 939 | if args.testmode is not None:
 940 |     testmode = str(args.testmode)
 941 |     querystring["code"] = testmode
 942 |     print('  testmode: ' + testmode)
 943 | 
 944 | querystring["downloadClient"] = downloadClient
 945 | print('  downloadClient: ' + downloadClient)
 946 | ################### START SCRITPT #####################
 947 | 
 948 | # call version update function to update existing files
 949 | 
 950 | # set dynamic log file name to create a new log file in each run 
 951 | fileNm = 'discover_' + martName
 952 | logfile = logFileNmtimeStamped(fileNm)
 953 | print ('logfile:',logfile)
 954 | 
 955 | logger('CI360 DISCOVER Download API (' + version + ') - last updated '+ updateDate,'n')
 956 | 
 957 | # track start time
 958 | start = time.time()    
 959 | 
 960 | # make any changes as we change the versions
 961 | versionUpdate()
 962 | 
 963 | resetInProgress=False
 964 | 
 965 | # read config file
 966 | config = readConfig(dir_config + 'config.txt')
 967 | 
 968 | # PyJWT returns str type for jwt.encode() function: https://pyjwt.readthedocs.io/en/latest/changelog.html#improve-typings
 969 | # For backwards compatibility for older PyJwt releases, decode or return token value based on type.
 970 | def decodeToken(token):
 971 |     if (type(token)) == bytes:
 972 |         return bytes.decode(token)
 973 |     else:
 974 |         return token
 975 | 
 976 | # Generate the JWT
 977 | encodedSecret = base64.b64encode(bytes(config['secret'], 'utf-8'))
 978 | token = jwt.encode({'clientID': config['tenantId']}, encodedSecret, algorithm='HS256')
 979 | #print('\nJWT token: ' + bytes.decode(token))
 980 | headers = {'authorization': "Bearer "+ decodeToken(token),'cache-control': "no-cache"}
 981 | 
 982 | # modify discover Reset API URL
 983 | #url = createDiscoverResetAPIUrl(config)
 984 | 
 985 | # do reset if autoreset is set to yes 
 986 | if martName == 'detail' or martName == 'dbtReport':
 987 |     if autoreset == 'yes' :
 988 |         # modify discover Reset API URL
 989 |         url = createDiscoverResetAPIUrl(config)
 990 | 
 991 |         # set resetInProgress=True to indicate we are now in reset mode.
 992 |         # when resetInProgress, record download history to reset_download_history_mart.csv
 993 |         resetInProgress=True
 994 |         # check if mart download history exists.if not exists then no need to run the resets
 995 |         # this will avoid running resets when nothing is downloaded
 996 |         logger(' starting resets','n')
 997 |         historyFile = dir_config + 'download_history_' + martName + '.csv'
 998 |         logger(' checking if ' +  historyFile + ' exists ...','n')
 999 |         if fileExists(historyFile) :            
1000 |             logger(' found ', 'a')
1001 |             #store the download history into a dataframe 
1002 |             download_history_df=readDownloadHistory(historyFile)
1003 | 
1004 |             resetFile = dir_config + 'reset_range_' + martName + '.csv'
1005 |             logger(' checking if ' +  resetFile + ' exists ...','n')
1006 |             if fileExists(resetFile) :            
1007 |                 logger(' found ', 'a')
1008 |             else:
1009 |                 logger(' not found...', 'a')                
1010 |                 logger(' creating ', 'a')
1011 |                 logResetRange()
1012 |             #store the reset history into a dataframe 
1013 |             reset_range_df=readResetRange(resetFile)
1014 |             # Start looping through reset packages
1015 |             loopThroughResetPackages(url)
1016 |         else:
1017 |             logger(' not found..skipping reset ', 'a')
1018 |         logger(' finished resets','n')
1019 |         resetInProgress=False
1020 | 
1021 | # modify discover API URL
1022 | url = createDiscoverAPIUrl(config)
1023 | 
1024 | # Start looping through download packages
1025 | logger(' starting downloads','n')
1026 | loopThroughDownloadPackages(url)
1027 | logger(' finished downloads','n')
1028 | 
1029 | # track end time and calculate duration
1030 | end = time.time() 
1031 | duration = round((end - start),2)
1032 |         
1033 | logger('Done - execution time: ' + str(duration) + ' seconds','n')
1034 | 
1035 | print('\n')
1036 | 


--------------------------------------------------------------------------------
/dsccnfg/config.txt:
--------------------------------------------------------------------------------
 1 | # Enter the agent name configured in CI360 GUI
 2 | agentName = ci360_agent
 3 | 
 4 | # Enter the tenantId of your CI360 tenant, you find it underneath General in CI360 GUI
 5 | tenantId  = abc123-ci360-tenant-id-xyz
 6 | 
 7 | # Enter the secret of your agent which you created in CI360 GUI
 8 | secret    = ABC123ci360clientSecretXYZ
 9 | 
10 | # CI360 Download URL
11 | baseUrl   = https://extapigwservice-<server>/marketingGateway/discoverService/dataDownload/eventData/
12 |             
13 | 
14 | 


--------------------------------------------------------------------------------
/dscdonl/dscdonl.txt:
--------------------------------------------------------------------------------
1 | This directory will be used for storing the downloaded files from cloud.


--------------------------------------------------------------------------------
/dscextr/dscextr.txt:
--------------------------------------------------------------------------------
1 | This directory will be used for storing the extracted files from cloud.


--------------------------------------------------------------------------------
/dscwh/dscwh.txt:
--------------------------------------------------------------------------------
1 | This directory will be used for storing the extracted files from cloud.


--------------------------------------------------------------------------------
/log/logs.txt:
--------------------------------------------------------------------------------
1 | This directory will be used for storing the log files.


--------------------------------------------------------------------------------
/sql/sql.txt:
--------------------------------------------------------------------------------
1 | This directory will be used for storing the sql files.


--------------------------------------------------------------------------------