├── .gitignore
├── CHANGELOG.md
├── LICENSE.md
├── README.md
├── config
│   ├── __init__.py
│   └── config_request.py
├── constants
│   ├── __init__.py
│   └── feed_constants.py
├── enums
│   ├── __init__.py
│   ├── config_enums.py
│   ├── feed_enums.py
│   └── file_enums.py
├── errors
│   ├── __init__.py
│   └── custom_exceptions.py
├── examples
│   ├── __init__.py
│   └── config_examples.py
├── feed
│   ├── __init__.py
│   └── feed_request.py
├── feed_cli.py
├── filter
│   ├── __init__.py
│   └── feed_filter.py
├── requirements.txt
├── sample-config
│   ├── config-file-download
│   ├── config-file-download-filter
│   ├── config-file-filter
│   └── config-file-query-only
├── tests
│   ├── __init__.py
│   ├── test-data
│   │   ├── test_config
│   │   └── test_json
│   ├── test_config_request.py
│   ├── test_date_utils.py
│   ├── test_feed_filter.py
│   ├── test_feed_request.py
│   ├── test_file_utils.py
│   ├── test_filter_utils.py
│   └── test_logging_utils.py
└── utils
    ├── __init__.py
    ├── date_utils.py
    ├── file_utils.py
    ├── filter_utils.py
    └── logging_utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/
2 | venv/
3 | ## File-based project format:
4 | *.iws
5 | 
6 | # IntelliJ
7 | out/
8 | 
9 | # Python
10 | # Byte-compiled / optimized / DLL files
11 | *.py[cod]
12 | *$py.class
13 | 
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | Feed SDK Python CHANGE LOG
2 | ==========================
3 | # 1.0.1-RELEASE (2022/04/12)
4 | Enhancement Requests:
5 | - added support for Python 3
6 | 
7 | 
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. 10 | 11 | "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. 12 | 13 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. 14 | 15 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. 16 | 17 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. 18 | 19 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 20 | 21 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 22 | 23 | "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 24 | 25 | "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." 26 | 27 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 28 | 29 | 2. Grant of Copyright License. 
30 | 31 | Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 32 | 33 | 3. Grant of Patent License. 34 | 35 | Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 36 | 37 | 4. Redistribution. 38 | 39 | You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: 40 | 41 | You must give any other recipients of the Work or Derivative Works a copy of this License; and 42 | You must cause any modified files to carry prominent notices stating that You changed the files; and 43 | You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and 44 | If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. 45 | You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 46 | 47 | 5. Submission of Contributions. 
48 | 49 | Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 50 | 51 | 6. Trademarks. 52 | 53 | This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 54 | 55 | 7. Disclaimer of Warranty. 56 | 57 | Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 58 | 59 | 8. Limitation of Liability. 60 | 61 | In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 62 | 63 | 9. Accepting Warranty or Additional Liability. 64 | 65 | While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 66 | 67 | END OF TERMS AND CONDITIONS 68 | 69 | APPENDIX: How to apply the Apache License to your work 70 | 71 | To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. 72 | 73 | Copyright [yyyy] [name of copyright owner] 74 | 75 | Licensed under the Apache License, Version 2.0 (the "License"); 76 | you may not use this file except in compliance with the License. 
77 | You may obtain a copy of the License at
78 | 
79 | http://www.apache.org/licenses/LICENSE-2.0
80 | 
81 | Unless required by applicable law or agreed to in writing, software
82 | distributed under the License is distributed on an "AS IS" BASIS,
83 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
84 | See the License for the specific language governing permissions and
85 | limitations under the License.
86 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Feed SDK
2 | ==========
3 | Python SDK for downloading and filtering item feed files
4 | 
5 | Table of contents
6 | ==========
7 | * [Summary](#summary)
8 | * [Setup](#setup)
9 |   - [Setting up in the local environment](#setting-up-in-the-local-environment)
10 | * [Downloading feed files](#downloading-feed-files)
11 |   - [Customizing download location](#customizing-download-location)
12 | * [Filtering feed files](#filtering-feed-files)
13 |   - [Available filters](#available-filters)
14 |   - [Combining filter criteria](#combining-filter-criteria)
15 |   - [Additional filter arguments](#additional-filter-arguments)
16 | * [Schemas](#schemas)
17 |   - [GetFeedResponse](#getfeedresponse)
18 |   - [Response](#response)
19 | * [Logging](#logging)
20 | * [Usage](#usage)
21 |   - [Using command line options](#using-command-line-options)
22 |   - [Using config file driven approach](#using-config-file-driven-approach)
23 |   - [Using function calls](#using-function-calls)
24 |   - [Code samples](#examples)
25 | * [Performance](#performance)
26 | * [Important notes](#important-notes)
27 | 
28 | # Summary
29 | 
30 | Similar to the [Java Feed SDK](https://github.com/eBay/FeedSDK), this Python SDK facilitates downloading and filtering of eBay's item feed files provided through the public [Feed API](https://developer.ebay.com/api-docs/buy/feed/overview.html).
31 | 
32 | The feed SDK provides a simple interface to -
33 | * [Download](#downloading-feed-files)
34 | * [Filter](#filtering-feed-files)
35 | 
36 | # Setup
37 | 
38 | The entire repository can be cloned/forked and changes can be made. You are most welcome to collaborate and enhance the existing code base.
39 | 
40 | ## Setting up in the local environment
41 | 
42 | To set up the project in your local environment:
43 | * Clone or download the repository
44 | * Install the requirements
45 | To set up your environment, please see the requirements listed in [requirements.txt](https://github.com/eBay/FeedSDK-Python/blob/master/requirements.txt). You can run the `pip install -r requirements.txt` command to install all the requirements.
46 | 
47 | 
48 | ## Downloading feed files
49 | The feed files can be as large as several gigabytes. The Feed API supports downloading such large feed files in chunks. The chunk size is 100 MB in the production environment and 10 MB in the sandbox environment.
50 | 
51 | The SDK abstracts the complexity involved in calculating the request header '__range__' based on the response header '__content-range__', and it downloads and appends all the chunks until the whole feed file is downloaded.
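For illustration only, the chunking logic works conceptually like the sketch below. This is not the SDK's internal code; the helper name and parsing are simplified, and only the 100 MB production chunk size from the Feed API documentation is assumed.

```
# Illustrative sketch of range-based chunking (not the SDK's actual implementation).
PROD_CHUNK_SIZE = 104857600  # 100 MB

def next_range(content_range, chunk_size=PROD_CHUNK_SIZE):
    # A Content-Range response header looks like 'bytes 0-104857599/4294967296'.
    downloaded, total = content_range.split(' ')[1].split('/')
    last_byte = int(downloaded.split('-')[1])
    if last_byte + 1 >= int(total):
        return None  # the whole file has been downloaded
    # Value for the next request's Range header.
    return 'bytes=%d-%d' % (last_byte + 1, last_byte + chunk_size)
```

Each partial (HTTP 206) response is appended to the output file, and the next request's 'Range' header is derived from the previous response's 'Content-Range' header until the end of the file is reached.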
52 | 
53 | To download a feed file in production which is -
54 | * __bootstrap__ : (feed_scope = ALL_ACTIVE)
55 | * __L1 category 1__ : (category_id = 220)
56 | * __marketplace US__ : (X-EBAY-C-MARKETPLACE-ID: EBAY_US)
57 | instantiate a Feed object and call its get() function -
58 | 
59 | ```
60 | feed_obj = Feed(feed_type='item', feed_scope='ALL_ACTIVE', category_id='220',
61 | marketplace_id='EBAY_US', token=, environment='PRODUCTION')
62 | get_response = feed_obj.get()
63 | 
64 | ```
65 | The __file_path__ field of the returned __GetFeedResponse__ denotes the location where the file was downloaded.
66 | 
67 | ### Customizing download location
68 | 
69 | The default download location is the ~/Desktop/feed-sdk directory.
70 | The download location can be changed by specifying the optional 'download_location' argument when instantiating Feed.
71 | The download location should point to a directory. If the directory does not exist, it will be created.
72 | For example, to download to the location __/tmp/feed__ -
73 | 
74 | ```
75 | feed_obj = Feed(feed_type='item', feed_scope='ALL_ACTIVE', category_id='220',
76 | marketplace_id='EBAY_US', token=, environment='PRODUCTION',
77 | download_location='/tmp/feed')
78 | ```
79 | ---
80 | 
81 | ## Filtering feed files
82 | 
83 | ### Available filters
84 | The SDK provides the capability to filter the feed files based on:
85 | * List of leaf category IDs
86 | * List of seller usernames
87 | * List of item locations
88 | * List of item IDs
89 | * List of EPIDs
90 | * List of inferred EPIDs
91 | * List of GTINs
92 | * Price range
93 | * Any other SQL query
94 | 
95 | On successful completion of a filter operation, a new __filtered__ file is created in the same directory as the feed file.
96 | 
97 | To filter a feed file on leaf category IDs, create a FeedFilterRequest object and call its filter() function -
98 | ```
99 | feed_filter_obj = FeedFilterRequest(input_file_path=,
100 | leaf_category_ids=)
101 | filter_response = feed_filter_obj.filter()
102 | 
103 | ```
104 | 
105 | To filter on availability threshold type and availability threshold via the any_query parameter -
106 | ```
107 | feed_filter_obj = FeedFilterRequest(input_file_path=,
108 | any_query='AvailabilityThresholdType=\'MORE_THAN\' AND AvailabilityThreshold=10')
109 | filter_response = feed_filter_obj.filter()
110 | 
111 | ```
112 | 
113 | The __file_path__ field of the returned __Response__ denotes the location of the filtered file. The same value can also be read from feed_filter_obj.filtered_file_path.
114 | 
115 | ### Combining filter criteria
116 | 
117 | The SDK provides the freedom to combine the filter criteria.
118 | 
119 | To filter on leaf category IDs and seller usernames for listings in the price range of 1 to 100 -
120 | 
121 | ```
122 | feed_filter_obj = FeedFilterRequest(input_file_path=,
123 | leaf_category_ids=,
124 | seller_names=,
125 | price_lower_limit=1, price_upper_limit=100)
126 | filter_response = feed_filter_obj.filter()
127 | 
128 | ```
129 | 
130 | To filter on item location countries for listings that have more than 10 items available -
131 | 
132 | ```
133 | feed_filter_obj = FeedFilterRequest(input_file_path=,
134 | item_location_countries=,
135 | any_query='AvailabilityThresholdType=\'MORE_THAN\' AND AvailabilityThreshold=10')
136 | filter_response = feed_filter_obj.filter()
137 | 
138 | ```
139 | 
140 | ### Additional filter arguments
141 | When the filter function is called, the feed data is loaded into a SQLite DB (see the sketch below).
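A minimal sketch of this load-and-filter approach is shown here, assuming pandas and SQLAlchemy (which filter/feed_filter.py itself imports); the input file name and the example query values are hypothetical, and this is an illustration rather than the SDK's exact implementation.

```
import pandas as pd
from sqlalchemy import create_engine

# Illustrative sketch: stream the gzipped, tab-separated feed into a SQLite
# table in chunks, then apply the combined filter criteria as a SQL query.
engine = create_engine('sqlite:///sqlite_feed_sdk.db')
for chunk in pd.read_csv('item_bootstrap_260_20190127_EBAY_US.gz', sep='\t',
                         compression='gzip', chunksize=20000, dtype=str):
    chunk.to_sql('feed', engine, if_exists='append', index=False)

filtered_df = pd.read_sql_query(
    "SELECT * FROM feed WHERE CategoryId IN ('112529', '64619') AND PriceValue <= 100",
    engine)
```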
142 | If the keep_db=True argument is passed to the filter function, the SQLite DB file is kept in the current directory under the name sqlite_feed_sdk.db; otherwise it is deleted after the program finishes.
143 | 
144 | By default, all the columns except Title, ImageUrl, and AdditionalImageUrls are processed. This behaviour can be changed by passing the column_name_list argument to the filter function and changing the IGNORE_COLUMNS set in feed_filter.py.
145 | 
146 | ---
147 | ## Schemas
148 | This section provides more detail on the information contained within the objects returned from the SDK function calls.
149 | 
150 | ### GetFeedResponse
151 | 
152 | An instance of the GetFeedResponse named tuple is returned from the feed_obj.get() function.
153 | 
154 | ```
155 | int status_code
156 | String message
157 | String file_path
158 | List errors
159 | 
160 | ```
161 | 
162 | | Field name | Description
163 | |---|---|
164 | | status_code | int: 0 indicates a successful response. Any non-zero value indicates an error
165 | | message | String: Detailed information on the status
166 | | file_path | String: Absolute path of the location of the resulting file
167 | | errors | List: Detailed error information
168 | 
169 | 
170 | ### Response
171 | 
172 | An instance of the Response named tuple is returned from the feed_filter_obj.filter() function.
173 | 
174 | ```
175 | int status_code
176 | String message
177 | String file_path
178 | List applied_filters
179 | ```
180 | | Field name | Description
181 | |---|---|
182 | | status_code | int: 0 indicates a successful response. Any non-zero value indicates an error
183 | | message | String: Detailed information on the status
184 | | file_path | String: Absolute path of the location of the resulting file
185 | | applied_filters | List: List of queries applied
186 | 
187 | ---
188 | ## Logging
189 | 
190 | Log files are created in the current directory.
191 | 
192 | __Ensure that appropriate permissions are present to write to the directory.__
193 | 
194 | * The current log file name is: feed-sdk-log.log
195 | * Rolling log files are created per day with the pattern: feed-sdk-log.{yyyy-MM-dd}.log
196 | 
197 | ---
198 | ## Usage
199 | 
200 | The following sections describe the different ways in which the SDK can be used.
201 | 
202 | ### Using command line options
203 | 
204 | All the capabilities of the SDK can be invoked using the command line.
205 | 
206 | To see the available options and filters, use '--help'
207 | ```
208 | usage: FeedSDK [-h] [-dt DT] -c1 C1 [-scope {ALL_ACTIVE,NEWLY_LISTED}]
209 | [-mkt MKT] [-token TOKEN] [-env {SANDBOX,PRODUCTION}]
210 | [-lf LF [LF ...]] [-sellerf SELLERF [SELLERF ...]]
211 | [-locf LOCF [LOCF ...]] [-pricelf PRICELF] [-priceuf PRICEUF]
212 | [-epidf EPIDF [EPIDF ...]] [-iepidf IEPIDF [IEPIDF ...]]
213 | [-gtinf GTINF [GTINF ...]] [-itemf ITEMF [ITEMF ...]]
214 | [-dl DOWNLOADLOCATION] [--filteronly] [-format FORMAT] [-qf QF]
215 | 
216 | Feed SDK CLI
217 | 
218 | optional arguments:
219 | -h, --help show this help message and exit
220 | -dt DT the date when feed file was generated
221 | -c1 C1 the l1 category id of the feed file
222 | -scope {ALL_ACTIVE,NEWLY_LISTED}
223 | the feed scope. Available scopes are ALL_ACTIVE or
224 | NEWLY_LISTED
225 | -mkt MKT the marketplace id for which feed is being requested.
226 | For example - EBAY_US
227 | -token TOKEN the oauth token for the consumer. Omit the word
228 | 'Bearer'
229 | -env {SANDBOX,PRODUCTION}
230 | environment type. Supported Environments are SANDBOX
231 | and PRODUCTION
232 | -lf LF [LF ...]
list of leaf categories which are used to filter the 233 | feed 234 | -sellerf SELLERF [SELLERF ...] 235 | list of seller names which are used to filter the feed 236 | -locf LOCF [LOCF ...] 237 | list of item locations which are used to filter the 238 | feed 239 | -pricelf PRICELF lower limit of the price range for items in the feed 240 | -priceuf PRICEUF upper limit of the price range for items in the feed 241 | -epidf EPIDF [EPIDF ...] 242 | list of epids which are used to filter the feed 243 | -iepidf IEPIDF [IEPIDF ...] 244 | list of inferred epids which are used to filter the 245 | feed 246 | -gtinf GTINF [GTINF ...] 247 | list of gtins which are used to filter the feed 248 | -itemf ITEMF [ITEMF ...] 249 | list of item IDs which are used to filter the feed 250 | -dl DOWNLOADLOCATION, --downloadlocation DOWNLOADLOCATION 251 | override for changing the directory where files are 252 | downloaded 253 | --filteronly filter the feed file that already exists in the 254 | default path or the path specified by -dl, 255 | --downloadlocation option. If --filteronly option is 256 | not specified, the feed file will be downloaded again 257 | -format FORMAT feed and filter file format. Default is gzip 258 | -qf QF any other query to filter the feed file. See Python 259 | dataframe query format 260 | ``` 261 | For example, to use the command line options to 262 | 263 | Download and filter feed files using token 264 | ``` 265 | python feed_cli.py -c1 3252 -scope ALL_ACTIVE -mkt EBAY_DE -env PRODUCTION -qf "AvailabilityThreshold=10" -locf IT GB -dl DIR -token xxx 266 | ``` 267 | 268 | Filter feed files, no token is needed 269 | ``` 270 | python feed_cli.py --filteronly -c1 260 -pricelf 5 -priceuf 20 -dl FILE_PATH 271 | ``` 272 | 273 | ### Using config file driven approach 274 | 275 | All the capabilities of the SDK can be leveraged via a config file. 276 | The feed file download and filter parameters can be specified in the config file for multiple files, and SDK will process them sequentially. 
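Such a config file can also be processed programmatically, as in the sketch below, which mirrors examples/config_examples.py; the config path points at one of the bundled samples, and the token value is a placeholder.

```
from config.config_request import ConfigFileRequest

# Parse the requests described in the config file and process them one by one.
# The token is only needed for requests that include a feedRequest section.
config_request = ConfigFileRequest('sample-config/config-file-download-filter')
config_request.parse_requests(token='<oauth token>')
config_request.process_requests()
```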
277 | 278 | The structure of the config file 279 | 280 | ``` 281 | { 282 | "requests": [ 283 | { 284 | "feedRequest": { 285 | "categoryId": "260", 286 | "marketplaceId": "EBAY_US", 287 | "feedScope": "ALL_ACTIVE", 288 | "type": "ITEM" 289 | }, 290 | "filterRequest": { 291 | "itemLocationCountries": [ 292 | "US", 293 | "HK", 294 | "CA" 295 | ], 296 | "priceLowerLimit": 10.0, 297 | "priceUpperLimit": 100.0 298 | } 299 | }, 300 | { 301 | "feedRequest": { 302 | "categoryId": "220", 303 | "marketplaceId": "EBAY_US", 304 | "date": "20190127", 305 | "feedScope": "NEWLY_LISTED", 306 | "type": "ITEM" 307 | } 308 | }, 309 | { 310 | "filterRequest": { 311 | "inputFilePath": "", 312 | "leafCategoryIds": [ 313 | "112529", 314 | "64619", 315 | "111694" 316 | ], 317 | "itemLocationCountries": [ 318 | "DE", 319 | "GB", 320 | "ES" 321 | ], 322 | "anyQuery": "AvailabilityThresholdType='MORE_THAN' AND AvailabilityThreshold=10", 323 | "fileFormat" : "gzip" 324 | } 325 | } 326 | ] 327 | } 328 | ``` 329 | An example of using the SDK through a config file is located at 330 | 331 | [Example config file - 1](https://github.com/eBay/FeedSDK-Python/blob/master/sample-config/config-file-download) 332 | 333 | [Example config file - 2](https://github.com/eBay/FeedSDK-Python/blob/master/sample-config/config-file-download-filter) 334 | 335 | [Example config file - 3](https://github.com/eBay/FeedSDK-Python/blob/master/sample-config/config-file-filter) 336 | 337 | [Example config file - 4](https://github.com/eBay/FeedSDK-Python/blob/master/sample-config/config-file-query-only) 338 | 339 | ### Using function calls 340 | 341 | Samples showing the usage of available operations and filters. 342 | 343 | #### Examples 344 | All the examples are located [__here__](https://github.com/eBay/FeedSDK-Python/tree/master/examples) 345 | [Download and filter by config request](https://github.com/eBay/FeedSDK-Python/blob/master/examples/config_examples.py) 346 | 347 | 348 | --- 349 | ## Performance 350 | | Category | Type | Size gz | Size unzipped | Records | Applied Filters | Filter Time | Loading Time | Save Time 351 | |---|---|---|---|---|---|---|---|---| 352 | | 11450 | BOOTSTRAP | 4.66 GB | 89.51 GB | 63.2 Million | PriceValue, AvailabilityThresholdType, AvailabilityThreshold | ~ 7 min | ~ 98 min | ~ 2 min 353 | | 220 | BOOTSTRAP | 867.8 MB | 4.26 GB | 3.3 Million | price, AvailabilityThresholdType, AvailabilityThreshold | ~ 18 sec | ~ 5 min | ~ 37 sec 354 | | 1281 | BOOTSTRAP | 118.4 MB | 1.06 GB | 812558 | item locations, AcceptedPaymentMethods | ~ 24 sec | ~ 1.2 min | ~ 1.8 min 355 | | 11232 | BOOTSTRAP | 102.5 MB | 499.9 MB | 405268 | epids, inferredEpids | ~ 0.3 sec | ~ 37 sec | ~ 0.003 sec 356 | | 550 | BOOTSTRAP | 60.7 MB | 986.5 MB | 1000795 | price, sellers, item locations | ~ 4 sec | ~ 1.4 min | ~ 0.1 sec 357 | | 260 | BOOTSTRAP | 2.3 MB | 15.6 MB | 24100 | price, AvailabilityThresholdType, AvailabilityThreshold | ~ 0.01 sec | ~ 2 sec | ~ 0.4 sec 358 | | 220 | DAILY | 13.5 MB | 60.4 MB | 55047 | price, leaf categories, item locations | ~ 0.08 sec | ~ 4 sec | ~ 0.007 sec 359 | 360 | 361 | --- 362 | ## Important notes 363 | 364 | * Ensure there is enough storage for feed files. 365 | * Ensure that the file storage directories have appropriate write permissions. 366 | * In case of failure in downloading due to network issues, the process needs to start again. There is no capability at the moment, to resume. 367 | 368 | # License 369 | Copyright (c) 2018-2022 eBay Inc. 
370 | 371 | Use of this source code is governed by an Apache 2.0 license that can be found in the LICENSE file or at https://opensource.org/licenses/Apache-2.0. 372 | -------------------------------------------------------------------------------- /config/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = [ 2 | 'config_request' 3 | ] 4 | -------------------------------------------------------------------------------- /config/config_request.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # **************************************************************************/ 17 | 18 | import logging 19 | from os import path 20 | from utils.file_utils import read_json 21 | from feed.feed_request import Feed 22 | from filter.feed_filter import FeedFilterRequest 23 | from constants.feed_constants import SUCCESS_CODE 24 | from enums.config_enums import ConfigField, FeedField, FilterField 25 | from errors.custom_exceptions import ConfigError 26 | from utils.logging_utils import setup_logging 27 | 28 | setup_logging() 29 | logger = logging.getLogger(__name__) 30 | 31 | 32 | class ConfigRequest(object): 33 | def __init__(self, feed_obj, filter_request_obj): 34 | self.feed_obj = feed_obj 35 | self.filter_request_obj = filter_request_obj 36 | 37 | def __str__(self): 38 | return '[feed= %s, filter_request= %s]' % (self.feed_obj, self.filter_request_obj) 39 | 40 | 41 | class ConfigFileRequest(object): 42 | def __init__(self, config_file_path): 43 | self.file_path = config_file_path 44 | self.__token = None 45 | self.__config_json_obj = None 46 | self.__requests = [] 47 | 48 | @property 49 | def requests(self): 50 | return self.__requests 51 | 52 | def parse_requests(self, token=None): 53 | self.__load_config() 54 | # populate requests list 55 | self.__create_requests(token) 56 | 57 | def process_requests(self): 58 | if not self.requests: 59 | logger.error('No requests to process') 60 | return False 61 | for config_request_obj in self.requests: 62 | get_response = None 63 | if config_request_obj.feed_obj: 64 | feed_req = config_request_obj.feed_obj 65 | get_response = feed_req.get() 66 | if get_response.status_code != SUCCESS_CODE: 67 | logger.error('Exception in downloading feed. 
Cannot proceed, continue to the next request\n' 68 | 'File Path: %s | Error message: %s\nFeed Request: %s\n', get_response.file_path, 69 | get_response.message, feed_req) 70 | continue 71 | if config_request_obj.filter_request_obj: 72 | filter_req = config_request_obj.filter_request_obj 73 | if get_response and get_response.file_path: 74 | # override input file path if set 75 | filter_req.input_file_path = get_response.file_path 76 | filter_response = filter_req.filter() 77 | if filter_response.status_code != SUCCESS_CODE: 78 | print(filter_response.message) 79 | return True 80 | 81 | def __load_config(self): 82 | # check the path 83 | if not self.file_path or not path.exists(self.file_path) or path.getsize(self.file_path) == 0: 84 | raise ConfigError('Config file %s does not exist or is empty' % self.file_path) 85 | # load the config file 86 | self.__config_json_obj = read_json(self.file_path) 87 | # check the config object 88 | if not self.__config_json_obj: 89 | raise ConfigError('Could not read config file %s' % self.file_path) 90 | 91 | def __create_requests(self, token): 92 | if ConfigField.REQUESTS.value not in self.__config_json_obj: 93 | raise ConfigError('No \"%s\" field exists in the config file %s' % (str(ConfigField.REQUESTS), 94 | self.file_path)) 95 | for req in self.__config_json_obj[ConfigField.REQUESTS.value]: 96 | feed_obj = None 97 | feed_field = req.get(ConfigField.FEED_REQUEST.value) 98 | if feed_field: 99 | feed_obj = Feed(feed_field.get(FeedField.TYPE.value), 100 | feed_field.get(FeedField.SCOPE.value), 101 | feed_field.get(FeedField.CATEGORY_ID.value), 102 | feed_field.get(FeedField.MARKETPLACE_ID.value), 103 | token, 104 | feed_field.get(FeedField.DATE.value), 105 | feed_field.get(FeedField.ENVIRONMENT.value), 106 | feed_field.get(FeedField.DOWNLOAD_LOCATION.value), 107 | feed_field.get(FeedField.FILE_FORMAT.value)) 108 | filter_request_obj = None 109 | filter_field = req.get(ConfigField.FILTER_REQUEST.value) 110 | if filter_field: 111 | filter_request_obj = FeedFilterRequest(str(filter_field.get(FilterField.INPUT_FILE_PATH.value)), 112 | filter_field.get(FilterField.ITEM_IDS.value), 113 | filter_field.get(FilterField.LEAF_CATEGORY_IDS.value), 114 | filter_field.get(FilterField.SELLER_NAMES.value), 115 | filter_field.get(FilterField.GTINS.value), 116 | filter_field.get(FilterField.EPIDS.value), 117 | filter_field.get(FilterField.PRICE_LOWER_LIMIT.value), 118 | filter_field.get(FilterField.PRICE_UPPER_LIMIT.value), 119 | filter_field.get(FilterField.ITEM_LOCATION_COUNTRIES.value), 120 | filter_field.get(FilterField.INFERRED_EPIDS.value), 121 | filter_field.get(FilterField.ANY_QUERY.value), 122 | filter_field.get(FilterField.FILE_FORMAT.value)) 123 | config_request_obj = ConfigRequest(feed_obj, filter_request_obj) 124 | self.requests.append(config_request_obj) 125 | -------------------------------------------------------------------------------- /constants/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = [ 2 | 'feed_constants' 3 | ] 4 | -------------------------------------------------------------------------------- /constants/feed_constants.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 
3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # **************************************************************************/ 17 | 18 | REQUEST_TIMEOUT = 60 19 | REQUEST_RETRIES = 3 20 | BACK_OFF_TIME = 2 21 | 22 | FEED_API_PROD_URL = 'https://api.ebay.com/buy/feed/v1_beta/' 23 | FEED_API_SANDBOX_URL = 'https://api.sandbox.ebay.com/buy/feed/v1_beta/' 24 | 25 | # max content that can be downloaded in one request, in bytes 26 | PROD_CHUNK_SIZE = 104857600 27 | SANDBOX_CHUNK_SIZE = 10485760 28 | 29 | TOKEN_BEARER_PREFIX = 'Bearer ' 30 | 31 | AUTHORIZATION_HEADER = 'Authorization' 32 | MARKETPLACE_HEADER = 'X-EBAY-C-MARKETPLACE-ID' 33 | CONTENT_TYPE_HEADER = 'Content-type' 34 | ACCEPT_HEADER = 'Accept' 35 | RANGE_HEADER = 'Range' 36 | 37 | CONTENT_RANGE_HEADER = 'Content-Range' 38 | 39 | RANGE_PREFIX = 'bytes=' 40 | 41 | APPLICATION_JSON = 'application/json' 42 | 43 | QUERY_SCOPE = 'feed_scope' 44 | QUERY_CATEGORY_ID = 'category_id' 45 | QUERY_SNAPSHOT_DATE = 'snapshot_date' 46 | QUERY_DATE = 'date' 47 | 48 | 49 | SUCCESS_CODE = 0 50 | FAILURE_CODE = -1 51 | 52 | SUCCESS_STR = 'Success' 53 | FAILURE_STR = 'Failure' 54 | 55 | DATA_FRAME_CHUNK_SIZE = 2*(10**4) # rows 56 | -------------------------------------------------------------------------------- /enums/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = [ 2 | 'config_enums', 3 | 'feed_enums', 4 | 'file_enums' 5 | ] 6 | -------------------------------------------------------------------------------- /enums/config_enums.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | # **************************************************************************/ 17 | 18 | from aenum import Enum, unique 19 | 20 | 21 | @unique 22 | class ConfigField(Enum): 23 | FILTER_REQUEST = 'filterRequest' 24 | FEED_REQUEST = 'feedRequest' 25 | REQUESTS = 'requests' 26 | 27 | def __str__(self): 28 | return str(self.value) 29 | 30 | 31 | @unique 32 | class FeedField(Enum): 33 | MARKETPLACE_ID = 'marketplaceId' 34 | CATEGORY_ID = 'categoryId' 35 | DATE = 'date' 36 | SCOPE = 'feedScope' 37 | TYPE = 'type' 38 | ENVIRONMENT = 'environment' 39 | DOWNLOAD_LOCATION = 'downloadLocation' 40 | FILE_FORMAT = 'fileFormat' 41 | 42 | def __str__(self): 43 | return str(self.value) 44 | 45 | 46 | @unique 47 | class FilterField(Enum): 48 | INPUT_FILE_PATH = 'inputFilePath' 49 | ITEM_IDS = 'itemIds' 50 | LEAF_CATEGORY_IDS = 'leafCategoryIds' 51 | SELLER_NAMES = 'sellerNames' 52 | GTINS = 'gtins' 53 | EPIDS = 'epids' 54 | PRICE_LOWER_LIMIT = 'priceLowerLimit' 55 | PRICE_UPPER_LIMIT = 'priceUpperLimit' 56 | ITEM_LOCATION_COUNTRIES = 'itemLocationCountries' 57 | INFERRED_EPIDS = 'inferredEpids' 58 | ANY_QUERY = 'anyQuery' 59 | FILE_FORMAT = 'fileFormat' 60 | 61 | def __str__(self): 62 | return str(self.value) 63 | -------------------------------------------------------------------------------- /enums/feed_enums.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | # **************************************************************************/ 17 | 18 | from aenum import Enum, unique 19 | 20 | 21 | @unique 22 | class FeedColumn(Enum): 23 | ITEM_ID = 'ItemId' # column 0 24 | CATEGORY_ID = 'CategoryId' # column 4 25 | SELLER_USERNAME = 'SellerUsername' # column 6 26 | GTIN = 'GTIN' # column 9 27 | EPID = 'EPID' # column 12 28 | PRICE_VALUE = 'PriceValue' # column 15 29 | ITEM_LOCATION_COUNTRIES = 'ItemLocationCountry' # column 21 30 | INFERRED_EPID = 'InferredEPID' # column 40 31 | 32 | def __str__(self): 33 | return str(self.value) 34 | 35 | 36 | @unique 37 | class Environment(Enum): 38 | PRODUCTION = 'production' 39 | SANDBOX = 'sandbox' 40 | 41 | def __str__(self): 42 | return str(self.value) 43 | 44 | 45 | @unique 46 | class FeedPrefix(Enum): 47 | DAILY = 'daily' 48 | BOOTSTRAP = 'bootstrap' 49 | 50 | def __str__(self): 51 | return str(self.value) 52 | 53 | 54 | @unique 55 | class FeedScope(Enum): 56 | DAILY = 'NEWLY_LISTED' 57 | BOOTSTRAP = 'ALL_ACTIVE' 58 | 59 | def __str__(self): 60 | return str(self.value) 61 | 62 | 63 | @unique 64 | class FeedType(Enum): 65 | ITEM = 'item' 66 | SNAPSHOT = 'item_snapshot' 67 | 68 | def __str__(self): 69 | return str(self.value) 70 | -------------------------------------------------------------------------------- /enums/file_enums.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # **************************************************************************/ 17 | 18 | from aenum import Enum, unique 19 | 20 | 21 | @unique 22 | class FileEncoding(Enum): 23 | UTF8 = 'UTF-8' 24 | 25 | def __str__(self): 26 | return str(self.value) 27 | 28 | 29 | @unique 30 | class FileFormat(Enum): 31 | GZIP = 'gzip' 32 | 33 | def __str__(self): 34 | return str(self.value) 35 | -------------------------------------------------------------------------------- /errors/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = [ 2 | 'custom_exceptions' 3 | ] 4 | -------------------------------------------------------------------------------- /errors/custom_exceptions.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 
7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # **************************************************************************/ 17 | 18 | class Error(Exception): 19 | """Base class for errors in this module.""" 20 | pass 21 | 22 | 23 | class AuthorizationError(Error): 24 | def __init__(self, msg): 25 | self.msg = msg 26 | 27 | 28 | class ConfigError(Error): 29 | def __init__(self, msg, mark=None): 30 | self.msg = msg 31 | self.mark = mark 32 | 33 | 34 | class FileCreationError(Error): 35 | def __init__(self, msg, path): 36 | self.msg = msg 37 | self.path = path 38 | 39 | 40 | class FilterError(Error): 41 | def __init__(self, msg, filter_query=None): 42 | self.msg = msg 43 | self.input_data = filter_query 44 | 45 | 46 | class InputDataError(Error): 47 | def __init__(self, msg, input_data=None): 48 | self.msg = msg 49 | self.input_data = input_data 50 | -------------------------------------------------------------------------------- /examples/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eBay/FeedSDK-Python/a225421b691803f027067721ad779e44d7647580/examples/__init__.py -------------------------------------------------------------------------------- /examples/config_examples.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | # **************************************************************************/ 17 | 18 | from config.config_request import ConfigFileRequest 19 | 20 | 21 | def filter_feed(config_path): 22 | cr = ConfigFileRequest(config_path) 23 | cr.parse_requests() 24 | cr.process_requests() 25 | 26 | 27 | def download_filter_feed(config_path, token): 28 | cr = ConfigFileRequest(config_path) 29 | cr.parse_requests(token) 30 | cr.process_requests() 31 | 32 | 33 | if __name__ == '__main__': 34 | filter_feed('../sample-config/config-file-filter') 35 | download_filter_feed('../sample-config/config-file-download-filter', 'v^1.1#i...') 36 | -------------------------------------------------------------------------------- /feed/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = [ 2 | 'feed_request' 3 | ] 4 | -------------------------------------------------------------------------------- /feed/feed_request.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | # **************************************************************************/ 17 | 18 | import certifi 19 | import urllib3 20 | import json 21 | import logging 22 | from os import path 23 | from utils import file_utils, date_utils 24 | import constants.feed_constants as const 25 | from filter.feed_filter import GetFeedResponse 26 | from enums.file_enums import FileFormat 27 | from enums.feed_enums import FeedType, FeedScope, FeedPrefix, Environment 28 | from errors.custom_exceptions import InputDataError, FileCreationError 29 | from utils.logging_utils import setup_logging 30 | 31 | setup_logging() 32 | logger = logging.getLogger(__name__) 33 | 34 | DEFAULT_DOWNLOAD_LOCATION = path.expanduser('~/Desktop/feed-sdk') 35 | 36 | 37 | class Feed(object): 38 | def __init__(self, feed_type, feed_scope, category_id, marketplace_id, token, feed_date=None, 39 | environment=Environment.PRODUCTION.value, download_location=None, file_format=FileFormat.GZIP.value): 40 | self.token = const.TOKEN_BEARER_PREFIX + token if (token and not token.startswith('Bearer')) else token 41 | self.feed_type = feed_type.lower() if feed_type else FeedType.ITEM.value 42 | self.feed_scope = feed_scope.upper() if feed_scope else FeedScope.DAILY.value 43 | self.category_id = category_id 44 | self.marketplace_id = marketplace_id 45 | self.environment = environment if environment else Environment.PRODUCTION.value 46 | self.download_location = download_location if download_location else DEFAULT_DOWNLOAD_LOCATION 47 | self.file_format = file_format if file_format else FileFormat.GZIP.value 48 | self.feed_date = feed_date if feed_date else date_utils.get_formatted_date(feed_type) 49 | 50 | def __str__(self): 51 | return '[feed_type= %s, feed_scope= %s, category_id= %s, marketplace_id= %s, feed_date= %s, ' \ 52 | 'environment= %s, download_location= %s, file_format= %s, token= %s]' % (self.feed_type, 53 | self.feed_scope, 54 | self.category_id, 55 | self.marketplace_id, 56 | self.feed_date, 57 | self.environment, 58 | self.download_location, 59 | self.file_format, 60 | self.token) 61 | 62 | def get(self): 63 | """ 64 | :return: GetFeedResponse 65 | """ 66 | logger.info( 67 | 'Downloading... 
\ncategoryId: %s | marketplace: %s | date: %s | feed_scope: %s | environment: %s \n', 68 | self.category_id, self.marketplace_id, self.feed_date, self.feed_scope, self.environment) 69 | if not self.token: 70 | return GetFeedResponse(const.FAILURE_CODE, 'No token has been provided', None, None, None) 71 | if path.exists(self.download_location) and not path.isdir(self.download_location): 72 | return GetFeedResponse(const.FAILURE_CODE, 'Download location is not a directory', self.download_location, 73 | None, None) 74 | try: 75 | date_utils.validate_date(self.feed_date, self.feed_type) 76 | except InputDataError as exp: 77 | return GetFeedResponse(const.FAILURE_CODE, exp.msg, self.download_location, None, None) 78 | # generate the absolute file path 79 | file_name = self.__generate_file_name() 80 | file_path = path.join(self.download_location, file_name) 81 | # Create an empty file in the given path 82 | try: 83 | file_utils.create_and_replace_binary_file(file_path) 84 | with open(file_path, 'wb') as file_obj: 85 | # Get the feed file data 86 | result_code, message = self.__invoke_request(file_obj) 87 | return GetFeedResponse(result_code, message, file_path, None, None) 88 | except IOError as exp: 89 | return GetFeedResponse(const.FAILURE_CODE, 'Could not open file %s : %s' % (file_path, repr(exp)), 90 | file_path, None, None) 91 | except (InputDataError, FileCreationError) as exp: 92 | return GetFeedResponse(const.FAILURE_CODE, exp.msg, file_path, None, None) 93 | 94 | def __invoke_request(self, file_handler): 95 | # initialize API call counter 96 | api_call_counter = 0 97 | # Find max chunk size 98 | chunk_size = self.__find_max_chunk_size() 99 | logger.info('Chunk size: %s\n', chunk_size) 100 | # The initial request Range header is bytes=0-CHUNK_SIZE 101 | headers = {const.MARKETPLACE_HEADER: self.marketplace_id, 102 | const.AUTHORIZATION_HEADER: self.token, 103 | const.CONTENT_TYPE_HEADER: const.APPLICATION_JSON, 104 | const.ACCEPT_HEADER: const.APPLICATION_JSON, 105 | const.RANGE_HEADER: const.RANGE_PREFIX + '0-' + str(chunk_size)} 106 | parameters, endpoint = self.__get_query_parameters_and_base_url() 107 | http_manager = urllib3.PoolManager(timeout=const.REQUEST_TIMEOUT, 108 | retries=urllib3.Retry(const.REQUEST_RETRIES, 109 | backoff_factor=const.BACK_OFF_TIME), 110 | cert_reqs='CERT_REQUIRED', ca_certs=certifi.where()) 111 | # Initial request 112 | feed_response = http_manager.request('GET', endpoint, parameters, headers) 113 | # increase and print API call counter 114 | api_call_counter = api_call_counter + 1 115 | logger.info('API call #%s\n', api_call_counter) 116 | # Get the status code 117 | status_code = feed_response.status 118 | # Append the data to the file, might raise an exception 119 | if status_code == 200: 120 | file_utils.append_response_to_file(file_handler, feed_response.data) 121 | return const.SUCCESS_CODE, const.SUCCESS_STR 122 | while status_code == 206: 123 | # Append the data to the file, might raise an exception 124 | file_utils.append_response_to_file(file_handler, feed_response.data) 125 | headers[const.RANGE_HEADER] = file_utils.find_next_range(feed_response.headers[const.CONTENT_RANGE_HEADER], 126 | chunk_size) 127 | # check if we have reached the end of the file 128 | if not headers[const.RANGE_HEADER]: 129 | break 130 | # Send another request 131 | feed_response = http_manager.request('GET', endpoint, parameters, headers) 132 | # increase and print API call counter 133 | api_call_counter = api_call_counter+1 134 | logger.info('API call #%s\n', 
api_call_counter) 135 | # Get the status code 136 | status_code = feed_response.status 137 | if status_code == 206 and not headers[const.RANGE_HEADER]: 138 | return const.SUCCESS_CODE, const.SUCCESS_STR 139 | json_response = json.loads(feed_response.data.decode('utf-8')) 140 | return const.FAILURE_CODE, json_response.get('errors') 141 | 142 | def __get_query_parameters_and_base_url(self): 143 | # Base URL 144 | base_url = self.__find_base_url() 145 | base_url = base_url + str(FeedType.ITEM) 146 | # Common query parameter 147 | fields = {const.QUERY_CATEGORY_ID: self.category_id} 148 | # Snapshot feed 149 | if self.feed_type == str(FeedType.SNAPSHOT): 150 | fields.update({const.QUERY_SNAPSHOT_DATE: self.feed_date}) 151 | base_url = const.FEED_API_PROD_URL + str(FeedType.SNAPSHOT) 152 | return fields, base_url 153 | # Daily or bootstrap feed 154 | if self.feed_scope == str(FeedScope.DAILY): 155 | fields.update({const.QUERY_SCOPE: self.feed_scope, 156 | const.QUERY_DATE: self.feed_date}) 157 | elif self.feed_scope == str(FeedScope.BOOTSTRAP): 158 | fields.update({const.QUERY_SCOPE: self.feed_scope}) 159 | return fields, base_url 160 | 161 | def __find_base_url(self): 162 | if self.environment.lower() == str(Environment.PRODUCTION): 163 | return const.FEED_API_PROD_URL 164 | return const.FEED_API_SANDBOX_URL 165 | 166 | def __find_max_chunk_size(self): 167 | if self.environment.lower() == str(Environment.PRODUCTION): 168 | return const.PROD_CHUNK_SIZE 169 | return const.SANDBOX_CHUNK_SIZE 170 | 171 | def __generate_file_name(self): 172 | if str(FeedScope.BOOTSTRAP) == self.feed_scope: 173 | feed_prefix = str(FeedPrefix.BOOTSTRAP) 174 | elif str(FeedScope.DAILY) == self.feed_scope: 175 | feed_prefix = str(FeedPrefix.DAILY) 176 | else: 177 | raise InputDataError('Unknown feed scope', self.feed_scope) 178 | file_name = str(FeedType.ITEM) + '_' + feed_prefix + '_' + str(self.category_id) + '_' + self.feed_date + \ 179 | '_' + self.marketplace_id + file_utils.get_extension(self.file_format) 180 | return file_name 181 | -------------------------------------------------------------------------------- /feed_cli.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | # **************************************************************************/ 17 | 18 | import time 19 | import logging 20 | import argparse 21 | from enums.feed_enums import FeedType 22 | from feed.feed_request import Feed 23 | from filter.feed_filter import FeedFilterRequest 24 | from constants.feed_constants import SUCCESS_CODE 25 | from utils.logging_utils import setup_logging 26 | 27 | setup_logging() 28 | logger = logging.getLogger(__name__) 29 | 30 | parser = argparse.ArgumentParser(prog='FeedSDK', description='Feed SDK CLI') 31 | 32 | # date 33 | parser.add_argument('-dt', help='the date when feed file was generated') 34 | # l1 category 35 | parser.add_argument('-c1', help='the l1 category id of the feed file', required=True) 36 | # scope 37 | parser.add_argument('-scope', help='the feed scope. Available scopes are ALL_ACTIVE or NEWLY_LISTED', 38 | choices=['ALL_ACTIVE', 'NEWLY_LISTED'], default='NEWLY_LISTED') 39 | # marketplace 40 | parser.add_argument('-mkt', help='the marketplace id for which feed is being requested. For example - EBAY_US', 41 | default='EBAY_US') 42 | # token 43 | parser.add_argument('-token', help='the oauth token for the consumer. Omit the word \'Bearer\'') 44 | # environment 45 | parser.add_argument('-env', help='environment type. Supported Environments are SANDBOX and PRODUCTION', 46 | choices=['SANDBOX', 'PRODUCTION']) 47 | 48 | # options for filtering the files 49 | parser.add_argument('-lf', nargs='+', help='list of leaf categories which are used to filter the feed') 50 | parser.add_argument('-sellerf', nargs='+', help='list of seller names which are used to filter the feed') 51 | parser.add_argument('-locf', nargs='+', help='list of item locations which are used to filter the feed') 52 | parser.add_argument('-pricelf', type=float, help='lower limit of the price range for items in the feed') 53 | parser.add_argument('-priceuf', type=float, help='upper limit of the price range for items in the feed') 54 | parser.add_argument('-epidf', nargs='+', help='list of epids which are used to filter the feed') 55 | parser.add_argument('-iepidf', nargs='+', help='list of inferred epids which are used to filter the feed') 56 | parser.add_argument('-gtinf', nargs='+', help='list of gtins which are used to filter the feed') 57 | parser.add_argument('-itemf', nargs='+', help='list of item IDs which are used to filter the feed') 58 | # file location 59 | parser.add_argument('-dl', '--downloadlocation', help='override for changing the directory where files are downloaded') 60 | parser.add_argument('--filteronly', help='filter the feed file that already exists in the default path or the path ' 61 | 'specified by -dl, --downloadlocation option. If --filteronly option is not ' 62 | 'specified, the feed file will be downloaded again', action="store_true") 63 | # file format 64 | parser.add_argument('-format', help='feed and filter file format. Default is gzip', default='gzip') 65 | 66 | # any query to filter the feed file 67 | parser.add_argument('-qf', help='any other query to filter the feed file. 
See Python dataframe query format') 68 | 69 | # parse the arguments 70 | args = parser.parse_args() 71 | 72 | 73 | start = time.time() 74 | if args.filteronly: 75 | # create the filtered file 76 | feed_filter_obj = FeedFilterRequest(args.downloadlocation, args.itemf, args.lf, args.sellerf, args.gtinf, 77 | args.epidf, args.pricelf, args.priceuf, args.locf, args.iepidf, args.qf, 78 | args.format) 79 | filter_response = feed_filter_obj.filter() 80 | if filter_response.status_code != SUCCESS_CODE: 81 | print(filter_response.message) 82 | 83 | else: 84 | # download the feed file if --filteronly option is not set 85 | feed_obj = Feed(FeedType.ITEM.value, args.scope, args.c1, args.mkt, args.token, args.dt, args.env, 86 | args.downloadlocation, args.format) 87 | get_response = feed_obj.get() 88 | if get_response.status_code != SUCCESS_CODE: 89 | logger.error('Exception in downloading feed. Cannot proceed\nFile path: %s\n Error message: %s\n', 90 | get_response.file_path, get_response.message) 91 | else: 92 | # create the filtered file 93 | feed_filter_obj = FeedFilterRequest(get_response.file_path, args.itemf, args.lf, args.sellerf, args.gtinf, 94 | args.epidf, args.pricelf, args.priceuf, args.locf, args.iepidf, args.qf, 95 | args.format) 96 | filter_response = feed_filter_obj.filter() 97 | if filter_response.status_code != SUCCESS_CODE: 98 | print(filter_response.message) 99 | end = time.time() 100 | logger.info('Execution time (s): %s', str(round(end - start, 3))) 101 | print('Execution time (s): %s' % str(round(end - start, 3))) 102 | 103 | 104 | 105 | -------------------------------------------------------------------------------- /filter/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = [ 2 | 'feed_filter' 3 | ] 4 | -------------------------------------------------------------------------------- /filter/feed_filter.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | # **************************************************************************/ 17 | 18 | import time 19 | import logging 20 | import pandas as pd 21 | from os import remove 22 | from sqlalchemy import create_engine 23 | from collections import namedtuple 24 | from os.path import split, abspath, join, isfile 25 | from utils import filter_utils 26 | from utils.file_utils import get_extension 27 | 28 | from enums.feed_enums import FeedColumn 29 | from enums.file_enums import FileEncoding, FileFormat 30 | import constants.feed_constants as const 31 | from utils.logging_utils import setup_logging 32 | 33 | setup_logging() 34 | logger = logging.getLogger(__name__) 35 | 36 | Response = namedtuple('Response', 'status_code message file_path applied_filters') 37 | GetFeedResponse = namedtuple('GetFeedResponse', Response._fields + ('errors',)) 38 | 39 | BOOL_COLUMNS = {'ImageAlteringProhibited', 'ReturnsAccepted'} 40 | # using float64 for integer columns as well as the workaround for NAN values 41 | FLOAT_COLUMNS = {'AvailabilityThreshold', 'EstimatedAvailableQuantity', 42 | 'PriceValue', 'ReturnPeriodValue'} 43 | IGNORE_COLUMNS = {'AdditionalImageUrls', 'ImageUrl', 'Title'} 44 | 45 | DB_FILE_NAME = 'sqlite_feed_sdk.db' 46 | DB_TABLE_NAME = 'feed' 47 | 48 | 49 | class FeedFilterRequest(object): 50 | def __init__(self, input_file_path, item_ids=None, leaf_category_ids=None, seller_names=None, gtins=None, 51 | epids=None, price_lower_limit=None, price_upper_limit=None, item_location_countries=None, 52 | inferred_epids=None, any_query=None, compression_type=FileFormat.GZIP.value, separator='\t', 53 | encoding=FileEncoding.UTF8.value, rows_chunk_size=const.DATA_FRAME_CHUNK_SIZE): 54 | self.input_file_path = input_file_path 55 | self.item_ids = item_ids 56 | self.leaf_category_ids = leaf_category_ids 57 | self.seller_names = seller_names 58 | self.gtins = gtins 59 | self.epids = epids 60 | self.price_lower_limit = price_lower_limit 61 | self.price_upper_limit = price_upper_limit 62 | self.item_location_countries = item_location_countries 63 | self.inferred_epids = inferred_epids 64 | self.any_query = '(%s)' % any_query if any_query else None 65 | self.compression_type = compression_type if compression_type else FileFormat.GZIP.value 66 | self.separator = separator if separator else '\t' 67 | self.encoding = encoding if encoding else FileEncoding.UTF8.value 68 | self.rows_chunk_size = rows_chunk_size if rows_chunk_size else const.DATA_FRAME_CHUNK_SIZE 69 | self.__filtered_file_path = None 70 | self.__number_of_records = 0 71 | self.__number_of_filtered_records = 0 72 | self.__queries = [] 73 | 74 | def __str__(self): 75 | return '[input_file_path= %s, item_ids= %s, leaf_category_ids= %s, seller_names= %s, gtins= %s, ' \ 76 | 'epids= %s, price_lower_limit= %s, price_upper_limit= %s, item_location_countries= %s, ' \ 77 | 'inferred_epids= %s, any_query= %s, compression_type= %s, separator= %s, encoding= %s]' % \ 78 | (self.input_file_path, 79 | self.item_ids, 80 | self.leaf_category_ids, 81 | self.seller_names, 82 | self.gtins, 83 | self.epids, 84 | self.price_lower_limit, 85 | self.price_upper_limit, 86 | self.item_location_countries, 87 | self.inferred_epids, 88 | self.any_query, 89 | self.compression_type, 90 | self.separator, 91 | self.encoding) 92 | 93 | @property 94 | def filtered_file_path(self): 95 | return self.__filtered_file_path 96 | 97 | @property 98 | def number_of_records(self): 99 | return self.__number_of_records 100 | 101 | @property 102 | def number_of_filtered_records(self): 103 | 
return self.__number_of_filtered_records 104 | 105 | @property 106 | def queries(self): 107 | return self.__queries 108 | 109 | def __append_query(self, query_str): 110 | if query_str: 111 | self.__queries.append(query_str) 112 | 113 | def filter(self, column_name_list=None, keep_db=False): 114 | logger.info('Filtering... \nInput file: %s', self.input_file_path) 115 | 116 | self.__append_query(self.any_query) 117 | self.__append_query(filter_utils.get_list_string_element_query(FeedColumn.ITEM_ID, self.item_ids)) 118 | self.__append_query(filter_utils.get_list_string_element_query(FeedColumn.CATEGORY_ID, self.leaf_category_ids)) 119 | self.__append_query(filter_utils.get_list_string_element_query(FeedColumn.SELLER_USERNAME, self.seller_names)) 120 | self.__append_query(filter_utils.get_list_string_element_query(FeedColumn.GTIN, self.gtins)) 121 | self.__append_query(filter_utils.get_list_string_element_query(FeedColumn.EPID, self.epids)) 122 | self.__append_query(filter_utils.get_inclusive_greater_query(FeedColumn.PRICE_VALUE, self.price_lower_limit)) 123 | self.__append_query(filter_utils.get_inclusive_less_query(FeedColumn.PRICE_VALUE, self.price_upper_limit)) 124 | self.__append_query(filter_utils.get_list_string_element_query(FeedColumn.INFERRED_EPID, self.inferred_epids)) 125 | self.__append_query(filter_utils.get_list_string_element_query(FeedColumn.ITEM_LOCATION_COUNTRIES, 126 | self.item_location_countries)) 127 | query_str = None 128 | if self.__queries: 129 | query_str = ' AND '.join(self.__queries) 130 | if not self.input_file_path or not isfile(self.input_file_path): 131 | return Response(const.FAILURE_CODE, 132 | 'Input file is a directory or does not exist. Cannot filter. Aborting...', 133 | self.filtered_file_path, self.queries) 134 | if not query_str: 135 | return Response(const.FAILURE_CODE, 'No filters have been specified. Cannot filter. 
Aborting...', 136 | self.filtered_file_path, self.queries) 137 | # create the data frame 138 | filtered_data = self.__read_chunks_gzip_file(query_str, column_name_list, keep_db) 139 | if not filtered_data.empty: 140 | self.__save_filtered_data_frame(filtered_data) 141 | else: 142 | logger.error('No filtered feed file created') 143 | return Response(const.SUCCESS_CODE, const.SUCCESS_STR, self.filtered_file_path, self.queries) 144 | 145 | def __derive_filtered_file_path(self): 146 | file_path, full_file_name = split(abspath(self.input_file_path)) 147 | file_name = full_file_name.split('.')[0] 148 | time_milliseconds = int(time.time() * 1000) 149 | filtered_file_path = join(file_path, file_name + '-filtered-' + str(time_milliseconds) + 150 | get_extension(self.compression_type)) 151 | return filtered_file_path 152 | 153 | def __read_chunks_gzip_file(self, query_str, column_name_list, keep_db): 154 | disk_engine = create_engine('sqlite:///'+DB_FILE_NAME) 155 | chunk_num = 0 156 | columns_to_process, data_types = self.__get_cols_and_type_dict() 157 | cols = column_name_list if column_name_list else columns_to_process 158 | start = time.time() 159 | for chunk_df in pd.read_csv(self.input_file_path, header=0, 160 | compression=self.compression_type, encoding=self.encoding, usecols=cols, 161 | sep=self.separator, quotechar='"', lineterminator='\n', skip_blank_lines=True, 162 | skipinitialspace=True, error_bad_lines=False, index_col=False, 163 | chunksize=self.rows_chunk_size, dtype=data_types, 164 | converters={'AvailabilityThreshold': filter_utils.convert_to_float_max_int, 165 | 'EstimatedAvailableQuantity': filter_utils.convert_to_float_max_int, 166 | 'PriceValue': filter_utils.convert_to_float_zero, 167 | 'ReturnPeriodValue': filter_utils.convert_to_float_zero, 168 | 'ImageAlteringProhibited': filter_utils.convert_to_bool_false, 169 | 'ReturnsAccepted': filter_utils.convert_to_bool_false}): 170 | self.__number_of_records = self.__number_of_records + len(chunk_df.index) 171 | chunk_num = chunk_num + 1 172 | chunk_df.to_sql(DB_TABLE_NAME, disk_engine, if_exists='append', index=False) 173 | execution_time = time.time() - start 174 | logger.info('Loaded %s records in %s (s) %s (m)', self.__number_of_records, str(round(execution_time, 3)), 175 | str(round(execution_time / 60, 3))) 176 | # apply query 177 | sql_string = '''SELECT * From %s WHERE %s ''' % (DB_TABLE_NAME, query_str) 178 | 179 | start = time.time() 180 | query_result_df = pd.read_sql_query(sql_string, disk_engine) 181 | execution_time = time.time() - start 182 | self.__number_of_filtered_records = len(query_result_df.index) 183 | logger.info('Filtered %s records in %s (s) %s (m)', self.number_of_filtered_records, 184 | str(round(execution_time, 3)), 185 | str(round(execution_time / 60, 3))) 186 | # remove the created db file 187 | if not keep_db: 188 | remove(DB_FILE_NAME) 189 | return query_result_df 190 | 191 | def __save_filtered_data_frame(self, data_frame): 192 | self.__filtered_file_path = self.__derive_filtered_file_path() 193 | start = time.time() 194 | data_frame.to_csv(self.__filtered_file_path, sep=self.separator, na_rep='', header=True, index=False, mode='w', 195 | encoding=self.encoding, compression=self.compression_type, quotechar='"', 196 | line_terminator='\n', doublequote=True, escapechar='\\', decimal='.') 197 | execution_time = time.time() - start 198 | logger.info('Saved %s records in %s (s) %s (m)', self.number_of_filtered_records, 199 | str(round(execution_time, 3)), 200 | str(round(execution_time / 60, 3))) 201 | 
202 | def __get_cols_and_type_dict(self): 203 | all_columns = pd.read_csv(self.input_file_path, nrows=1, sep=self.separator, 204 | compression=self.compression_type).columns.tolist() 205 | type_dict = {} 206 | cols = [] 207 | for col_name in all_columns: 208 | # Ignoring due to possibility of comma character in the value and breaking the parser 209 | if col_name in IGNORE_COLUMNS: 210 | continue 211 | else: 212 | cols.append(col_name) 213 | if col_name not in BOOL_COLUMNS and col_name not in FLOAT_COLUMNS: 214 | type_dict[col_name] = 'object' 215 | return cols, type_dict 216 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | urllib3==1.26.5 2 | certifi==2019.3.9 3 | aenum==2.1.2 4 | pandas==0.24.2 5 | SQLAlchemy==1.3.3 6 | -------------------------------------------------------------------------------- /sample-config/config-file-download: -------------------------------------------------------------------------------- 1 | { 2 | "requests": [ 3 | { 4 | "feedRequest": { 5 | "categoryId": "220", 6 | "marketplaceId": "EBAY_US", 7 | "feedScope": "ALL_ACTIVE", 8 | "type": "ITEM", 9 | "downloadLocation": "", 10 | "fileFormat": "gzip" 11 | } 12 | }, 13 | { 14 | "feedRequest": { 15 | "categoryId": "11450", 16 | "marketplaceId": "EBAY_DE", 17 | "feedScope": "NEWLY_LISTED", 18 | "date": "20190127", 19 | "type": "ITEM" 20 | } 21 | } 22 | ] 23 | } -------------------------------------------------------------------------------- /sample-config/config-file-download-filter: -------------------------------------------------------------------------------- 1 | { 2 | "requests": [ 3 | { 4 | "feedRequest": { 5 | "categoryId": "550", 6 | "marketplaceId": "EBAY_US", 7 | "feedScope": "ALL_ACTIVE", 8 | "type": "ITEM" 9 | }, 10 | "filterRequest": { 11 | "sellerNames": [ 12 | "patikaszop", 13 | "cherp-serge", 14 | "itemtrade" 15 | ], 16 | "itemLocationCountries": [ 17 | "US", 18 | "HK", 19 | "CA" 20 | ], 21 | "priceLowerLimit": 10.0, 22 | "priceUpperLimit": 100.0 23 | } 24 | }, 25 | { 26 | "feedRequest": { 27 | "categoryId": "260", 28 | "marketplaceId": "EBAY_GB", 29 | "feedScope": "ALL_ACTIVE", 30 | "type": "ITEM" 31 | }, 32 | "filterRequest": { 33 | "leafCategoryIds": [ 34 | "162057", 35 | "705" 36 | ], 37 | "priceUpperLimit": 10 38 | } 39 | }, 40 | { 41 | "feedRequest": { 42 | "categoryId": "1281", 43 | "marketplaceId": "EBAY_US", 44 | "feedScope": "ALL_ACTIVE", 45 | "type": "ITEM" 46 | }, 47 | "filterRequest": { 48 | "anyQuery": "AcceptedPaymentMethods='PAYPAL'" 49 | }, 50 | "itemLocationCountries": [ 51 | "CA" 52 | ] 53 | }, 54 | { 55 | "feedRequest": { 56 | "categoryId": "11232", 57 | "marketplaceId": "EBAY_DE", 58 | "date": "20180708", 59 | "feedScope": "ALL_ACTIVE", 60 | "type": "ITEM" 61 | }, 62 | "filterRequest": { 63 | "epids": [ 64 | "216949221", 65 | "3927841" 66 | ], 67 | "inferredEpids": [ 68 | "216949221", 69 | "3927841" 70 | ] 71 | } 72 | }, 73 | { 74 | "feedRequest": { 75 | "categoryId": "220", 76 | "marketplaceId": "EBAY_US", 77 | "date": "20190304", 78 | "feedScope": "NEWLY_LISTED", 79 | "type": "ITEM" 80 | }, 81 | "filterRequest": { 82 | "leafCategoryIds": [ 83 | "122569", 84 | "2537", 85 | "34061", 86 | "2624" 87 | ], 88 | "itemLocationCountries": [ 89 | "US" 90 | ], 91 | "priceLowerLimit": 10.0, 92 | "priceUpperLimit": 140.0 93 | } 94 | } 95 | ] 96 | } -------------------------------------------------------------------------------- 
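The config-file-download-filter sample above pairs each feedRequest with a filterRequest. A minimal sketch of driving such a config programmatically, assuming a valid OAuth token and using only the ConfigFileRequest, Feed and FeedFilterRequest attributes exercised in tests/test_config_request.py (the SDK's own config runner may orchestrate this differently):

# Hypothetical driver for a download-and-filter config; not the SDK's own runner.
from config.config_request import ConfigFileRequest
from constants.feed_constants import SUCCESS_CODE

config = ConfigFileRequest('sample-config/config-file-download-filter')
config.parse_requests('Bearer v^1...')  # OAuth token, as in tests/test_config_request.py

for request in config.requests:
    downloaded_path = None
    if request.feed_obj:
        get_response = request.feed_obj.get()
        if get_response.status_code != SUCCESS_CODE:
            print('download failed: %s' % get_response.message)
            continue
        downloaded_path = get_response.file_path
    if request.filter_request_obj:
        filter_request = request.filter_request_obj
        if downloaded_path:
            # point the filter at the file that was just downloaded
            filter_request.input_file_path = downloaded_path
        filter_response = filter_request.filter()
        print(filter_response.status_code, filter_response.message, filter_response.file_path)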
/sample-config/config-file-filter: -------------------------------------------------------------------------------- 1 | { 2 | "requests": [ 3 | { 4 | "filterRequest": { 5 | "inputFilePath": "", 6 | "leafCategoryIds": [ 7 | "112529", 8 | "64619", 9 | "111694" 10 | ], 11 | "itemLocationCountries": [ 12 | "DE", 13 | "GB", 14 | "ES" 15 | ], 16 | "anyQuery": "AvailabilityThresholdType='MORE_THAN' AND AvailabilityThreshold=10", 17 | "fileFormat" : "gzip" 18 | } 19 | } 20 | ] 21 | } -------------------------------------------------------------------------------- /sample-config/config-file-query-only: -------------------------------------------------------------------------------- 1 | { 2 | "requests": [ 3 | { 4 | "filterRequest": { 5 | "inputFilePath": "", 6 | "anyQuery": "AvailabilityThresholdType='MORE_THAN' AND AvailabilityThreshold=10" 7 | } 8 | } 9 | ] 10 | } -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eBay/FeedSDK-Python/a225421b691803f027067721ad779e44d7647580/tests/__init__.py -------------------------------------------------------------------------------- /tests/test-data/test_config: -------------------------------------------------------------------------------- 1 | { 2 | "requests": [ 3 | { 4 | "feedRequest": { 5 | "categoryId": "260", 6 | "marketplaceId": "EBAY_US", 7 | "feedScope": "ALL_ACTIVE", 8 | "type": "ITEM" 9 | }, 10 | "filterRequest": { 11 | "itemLocationCountries": [ 12 | "US", 13 | "HK", 14 | "CA" 15 | ], 16 | "priceLowerLimit": 10.0, 17 | "priceUpperLimit": 100.0 18 | } 19 | }, 20 | { 21 | "feedRequest": { 22 | "categoryId": "220", 23 | "marketplaceId": "EBAY_US", 24 | "date": "20190127", 25 | "feedScope": "NEWLY_LISTED", 26 | "type": "ITEM" 27 | } 28 | }, 29 | { 30 | "filterRequest": { 31 | "inputFilePath": "/Users/[USER]/Desktop/sdk/test_bootstrap.gz", 32 | "leafCategoryIds": [ 33 | "112529", 34 | "64619", 35 | "111694" 36 | ], 37 | "itemLocationCountries": [ 38 | "DE", 39 | "GB", 40 | "ES" 41 | ], 42 | "anyQuery": "AvailabilityThresholdType='MORE_THAN' AND AvailabilityThreshold=10", 43 | "fileFormat" : "gzip" 44 | } 45 | } 46 | ] 47 | } -------------------------------------------------------------------------------- /tests/test-data/test_json: -------------------------------------------------------------------------------- 1 | { 2 | "requests": [ 3 | { 4 | "feedRequest": { 5 | "categoryId": "220", 6 | "marketplaceId": "EBAY_US", 7 | "feedScope": "ALL_ACTIVE", 8 | "type": "ITEM", 9 | "downloadLocation": "/Users/[USER]/Desktop/sdk", 10 | "fileFormat": "gzip" 11 | } 12 | } 13 | ] 14 | } -------------------------------------------------------------------------------- /tests/test_config_request.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | from enums.file_enums import FileFormat 3 | from enums.feed_enums import FeedScope 4 | from config.config_request import ConfigFileRequest 5 | from feed.feed_request import DEFAULT_DOWNLOAD_LOCATION 6 | 7 | 8 | class TestConfigRequest(unittest.TestCase): 9 | def test_parse_requests(self): 10 | cr = ConfigFileRequest('../tests/test-data/test_config') 11 | cr.parse_requests('Bearer v^1...') 12 | self.assertIsNotNone(cr.requests) 13 | self.assertEqual(len(cr.requests), 3) 14 | 15 | # first request has both feed and filter requests 16 | feed_req = cr.requests[0].feed_obj 17 | filter_req = 
cr.requests[0].filter_request_obj 18 | self.assertIsNotNone(feed_req) 19 | self.assertIsNotNone(filter_req) 20 | self.assertEqual(feed_req.category_id, u'260') 21 | self.assertEqual(filter_req.price_lower_limit, 10) 22 | 23 | # second request has a feed request only 24 | self.assertIsNone(cr.requests[1].filter_request_obj) 25 | feed_req = cr.requests[1].feed_obj 26 | self.assertIsNotNone(feed_req) 27 | self.assertIsNotNone(feed_req.token) 28 | self.assertEqual(feed_req.category_id, u'220') 29 | self.assertEqual(feed_req.marketplace_id, u'EBAY_US') 30 | self.assertEqual(feed_req.feed_date, '20190127') 31 | self.assertEqual(feed_req.feed_scope, FeedScope.DAILY.value) 32 | self.assertEqual(feed_req.download_location, DEFAULT_DOWNLOAD_LOCATION) 33 | 34 | # third request has a filter request only 35 | self.assertIsNone(cr.requests[2].feed_obj) 36 | filter_req = cr.requests[2].filter_request_obj 37 | self.assertIsNotNone(filter_req) 38 | self.assertEqual(filter_req.input_file_path, '/Users/[USER]/Desktop/sdk/test_bootstrap.gz') 39 | self.assertEqual(filter_req.leaf_category_ids, ['112529', '64619', '111694']) 40 | self.assertEqual(filter_req.item_location_countries, ['DE', 'GB', 'ES']) 41 | self.assertEqual(filter_req.any_query, 42 | '(AvailabilityThresholdType=\'MORE_THAN\' AND AvailabilityThreshold=10)') 43 | self.assertEqual(filter_req.compression_type, FileFormat.GZIP.value) 44 | 45 | 46 | if __name__ == '__main__': 47 | unittest.main() 48 | -------------------------------------------------------------------------------- /tests/test_date_utils.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | from utils import date_utils 3 | from datetime import datetime 4 | from enums.feed_enums import FeedType 5 | from errors.custom_exceptions import InputDataError 6 | 7 | 8 | class TestDateUtils(unittest.TestCase): 9 | def test_get_formatted_date(self): 10 | today_date = date_utils.get_formatted_date(FeedType.ITEM) 11 | try: 12 | datetime.strptime(today_date, '%Y%m%d') 13 | except ValueError: 14 | self.fail('Invalid date format: %s' % today_date) 15 | 16 | def test_validate_date_exception(self): 17 | with self.assertRaises(InputDataError): 18 | date_utils.validate_date('2019/02/01', FeedType.ITEM) 19 | 20 | def test_validate_date(self): 21 | date_utils.validate_date('20190201', FeedType.ITEM) 22 | 23 | 24 | if __name__ == '__main__': 25 | unittest.main() 26 | -------------------------------------------------------------------------------- /tests/test_feed_filter.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | from os import remove 3 | from os.path import isfile 4 | from filter.feed_filter import FeedFilterRequest 5 | from enums.file_enums import FileFormat, FileEncoding 6 | from constants.feed_constants import DATA_FRAME_CHUNK_SIZE, SUCCESS_CODE, FAILURE_CODE 7 | 8 | 9 | class TestFeed(unittest.TestCase): 10 | @classmethod 11 | def setUpClass(cls): 12 | cls.test_file_path = '../tests/test-data/test_bootstrap_feed_260_3' 13 | cls.test_any_query = 'AvailabilityThresholdType=\'MORE_THAN\' AND AvailabilityThreshold=10' 14 | 15 | def test_default_values(self): 16 | filter_request = FeedFilterRequest(self.test_file_path) 17 | self.assertIsNone(filter_request.item_ids) 18 | self.assertIsNone(filter_request.leaf_category_ids) 19 | self.assertIsNone(filter_request.seller_names) 20 | self.assertIsNone(filter_request.gtins) 21 | self.assertIsNone(filter_request.epids) 22 | 
self.assertIsNone(filter_request.price_upper_limit) 23 | self.assertIsNone(filter_request.price_lower_limit) 24 | self.assertIsNone(filter_request.item_location_countries) 25 | self.assertIsNone(filter_request.inferred_epids) 26 | self.assertIsNone(filter_request.item_location_countries) 27 | self.assertIsNone(filter_request.any_query) 28 | self.assertIsNone(filter_request.filtered_file_path) 29 | self.assertEqual(filter_request.compression_type, FileFormat.GZIP.value) 30 | self.assertEqual(filter_request.separator, '\t') 31 | self.assertEqual(filter_request.encoding, FileEncoding.UTF8.value) 32 | self.assertEqual(filter_request.rows_chunk_size, DATA_FRAME_CHUNK_SIZE) 33 | self.assertEqual(filter_request.number_of_records, 0) 34 | self.assertEqual(filter_request.number_of_filtered_records, 0) 35 | self.assertEqual(len(filter_request.queries), 0) 36 | 37 | def test_any_query_format(self): 38 | filter_request = FeedFilterRequest(self.test_file_path, any_query=self.test_any_query) 39 | self.assertEqual(filter_request.any_query, '(' + self.test_any_query + ')') 40 | 41 | def test_none_file_path(self): 42 | filter_request = FeedFilterRequest(None) 43 | filter_response = filter_request.filter() 44 | self.assertEqual(filter_response.status_code, FAILURE_CODE) 45 | self.assertIsNotNone(filter_response.message) 46 | self.assertIsNone(filter_response.file_path) 47 | self.assertEqual(len(filter_response.applied_filters), 0) 48 | 49 | def test_dir_file_path(self): 50 | filter_request = FeedFilterRequest('../tests/test-data') 51 | filter_response = filter_request.filter() 52 | self.assertEqual(filter_response.status_code, FAILURE_CODE) 53 | self.assertIsNotNone(filter_response.message) 54 | self.assertIsNone(filter_response.file_path) 55 | self.assertEqual(len(filter_response.applied_filters), 0) 56 | 57 | def test_no_query(self): 58 | filter_request = FeedFilterRequest(self.test_file_path) 59 | filter_response = filter_request.filter() 60 | self.assertEqual(filter_response.status_code, FAILURE_CODE) 61 | self.assertIsNotNone(filter_response.message) 62 | self.assertIsNone(filter_response.file_path) 63 | self.assertEqual(len(filter_response.applied_filters), 0) 64 | 65 | def test_apply_filters(self): 66 | filter_request = FeedFilterRequest(self.test_file_path, price_upper_limit=10, any_query=self.test_any_query) 67 | filter_response = filter_request.filter(keep_db=False) 68 | self.assertEqual(filter_response.status_code, SUCCESS_CODE) 69 | self.assertIsNotNone(filter_response.message) 70 | 71 | self.assertEqual(len(filter_request.queries), 2) 72 | self.assertEqual(len(filter_response.applied_filters), 2) 73 | 74 | self.assertTrue(filter_request.number_of_records > 0) 75 | self.assertTrue(filter_request.number_of_filtered_records > 0) 76 | 77 | self.assertIsNotNone(filter_request.filtered_file_path) 78 | self.assertTrue(isfile(filter_request.filtered_file_path)) 79 | self.assertEqual(filter_request.filtered_file_path, filter_response.file_path) 80 | # clean up 81 | remove(filter_request.filtered_file_path) 82 | 83 | 84 | if __name__ == '__main__': 85 | unittest.main() 86 | -------------------------------------------------------------------------------- /tests/test_feed_request.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | from os import remove 3 | from os.path import isfile, getsize, split, abspath 4 | from utils.date_utils import get_formatted_date 5 | from enums.file_enums import FileFormat 6 | from enums.feed_enums import FeedType, 
FeedScope, FeedPrefix, Environment 7 | from feed.feed_request import Feed, DEFAULT_DOWNLOAD_LOCATION 8 | from constants.feed_constants import SUCCESS_CODE, FAILURE_CODE, PROD_CHUNK_SIZE 9 | 10 | 11 | class TestFeed(unittest.TestCase): 12 | 13 | @classmethod 14 | def setUpClass(cls): 15 | cls.test_token = 'Bearer v^1 ...' 16 | cls.test_category_1 = '1' 17 | cls.test_category_2 = '625' 18 | cls.test_marketplace = 'EBAY_US' 19 | cls.file_paths = [] 20 | 21 | @classmethod 22 | def tearDownClass(cls): 23 | for file_path in cls.file_paths: 24 | if file_path and isfile(file_path): 25 | remove(file_path) 26 | 27 | def test_none_token(self): 28 | feed_req_obj = Feed(FeedType.ITEM.value, FeedScope.BOOTSTRAP.value, '220', 'EBAY_US', None) 29 | get_response = feed_req_obj.get() 30 | self.assertEqual(get_response.status_code, FAILURE_CODE) 31 | self.assertIsNotNone(get_response.message) 32 | self.assertIsNone(get_response.file_path, 'file_path is not None in the response') 33 | 34 | def test_default_values(self): 35 | feed_req_obj = Feed(None, None, '220', 'EBAY_US', 'v^1 ...') 36 | self.assertEqual(feed_req_obj.feed_type, FeedType.ITEM.value) 37 | self.assertEqual(feed_req_obj.feed_scope, FeedScope.DAILY.value) 38 | self.assertTrue(feed_req_obj.token.startswith('Bearer'), 'Bearer is missing from token') 39 | self.assertEqual(feed_req_obj.feed_date, get_formatted_date(feed_req_obj.feed_type)) 40 | self.assertEqual(feed_req_obj.environment, Environment.PRODUCTION.value) 41 | self.assertEqual(feed_req_obj.download_location, DEFAULT_DOWNLOAD_LOCATION) 42 | self.assertEqual(feed_req_obj.file_format, FileFormat.GZIP.value) 43 | 44 | def test_download_feed_invalid_path(self): 45 | feed_req_obj = Feed(FeedType.ITEM.value, FeedScope.BOOTSTRAP.value, '220', 'EBAY_US', 'Bearer v^1 ...', 46 | download_location='../tests/test-data/test_json') 47 | get_response = feed_req_obj.get() 48 | self.assertEqual(get_response.status_code, FAILURE_CODE) 49 | self.assertIsNotNone(get_response.message) 50 | self.assertIsNotNone(get_response.file_path, 'file_path is None in the response') 51 | 52 | def test_download_feed_invalid_date(self): 53 | feed_req_obj = Feed(FeedType.ITEM.value, FeedScope.BOOTSTRAP.value, '220', 'EBAY_US', 'Bearer v^1 ...', 54 | download_location='../tests/test-data/', feed_date='2019-02-01') 55 | get_response = feed_req_obj.get() 56 | self.assertEqual(get_response.status_code, FAILURE_CODE) 57 | self.assertIsNotNone(get_response.message) 58 | self.assertIsNotNone(get_response.file_path, 'file_path is None in the response') 59 | 60 | def test_download_feed_daily(self): 61 | test_date = get_formatted_date(FeedType.ITEM, -4) 62 | feed_req_obj = Feed(FeedType.ITEM.value, FeedScope.DAILY.value, self.test_category_1, 63 | self.test_marketplace, self.test_token, download_location='../tests/test-data/', 64 | feed_date=test_date) 65 | get_response = feed_req_obj.get() 66 | # store the file path for clean up 67 | self.file_paths.append(get_response.file_path) 68 | # assert the result 69 | self.assertEqual(get_response.status_code, SUCCESS_CODE) 70 | self.assertIsNotNone(get_response.message) 71 | self.assertIsNotNone(get_response.file_path, 'file_path is None') 72 | self.assertTrue(isfile(get_response.file_path), 'file_path is not pointing to a file. file_path: %s' 73 | % get_response.file_path) 74 | # check the file size and name 75 | self.assertTrue(getsize(get_response.file_path) > 0, 'feed file is empty. 
file_path: %s' 76 | % get_response.file_path) 77 | self.assertTrue(FeedPrefix.DAILY.value in get_response.file_path, 78 | 'feed file name does not have %s in it. file_path: %s' % 79 | (FeedPrefix.DAILY.value, get_response.file_path)) 80 | file_dir, file_name = split(abspath(get_response.file_path)) 81 | self.assertEqual(abspath(feed_req_obj.download_location), file_dir) 82 | 83 | def test_download_feed_daily_bad_request(self): 84 | # ask for a future feed file that does not exist 85 | test_date = get_formatted_date(FeedType.ITEM, 5) 86 | feed_req_obj = Feed(FeedType.ITEM.value, FeedScope.DAILY.value, self.test_category_1, 87 | self.test_marketplace, self.test_token, download_location='../tests/test-data/', 88 | feed_date=test_date) 89 | get_response = feed_req_obj.get() 90 | # store the file path for clean up 91 | self.file_paths.append(get_response.file_path) 92 | # assert the result 93 | self.assertEqual(get_response.status_code, FAILURE_CODE) 94 | self.assertIsNotNone(get_response.message) 95 | self.assertIsNotNone(get_response.file_path, 'file has not been created') 96 | self.assertTrue(isfile(get_response.file_path), 'file_path is not pointing to a file. file_path: %s' 97 | % get_response.file_path) 98 | # check the file size and name 99 | self.assertTrue(getsize(get_response.file_path) == 0, 'feed file is empty. file_path: %s' 100 | % get_response.file_path) 101 | self.assertTrue(FeedPrefix.DAILY.value in get_response.file_path, 102 | 'feed file name does not have %s in it. file_path: %s' 103 | % (FeedPrefix.DAILY.value, get_response.file_path)) 104 | file_dir, file_name = split(abspath(get_response.file_path)) 105 | self.assertEqual(abspath(feed_req_obj.download_location), file_dir) 106 | 107 | def test_download_feed_daily_multiple_calls(self): 108 | feed_req_obj = Feed(FeedType.ITEM.value, FeedScope.BOOTSTRAP.value, self.test_category_2, 109 | self.test_marketplace, self.test_token, download_location='../tests/test-data/') 110 | get_response = feed_req_obj.get() 111 | # store the file path for clean up 112 | self.file_paths.append(get_response.file_path) 113 | # assert the result 114 | self.assertEqual(get_response.status_code, SUCCESS_CODE) 115 | self.assertIsNotNone(get_response.message) 116 | self.assertIsNotNone(get_response.file_path, 'file has not been created') 117 | self.assertTrue(isfile(get_response.file_path), 'file_path is not pointing to a file. file_path: %s' 118 | % get_response.file_path) 119 | # check the file size and name 120 | self.assertTrue(getsize(get_response.file_path) > PROD_CHUNK_SIZE, 'feed file is less than %s. file_path: %s' 121 | % (PROD_CHUNK_SIZE, get_response.file_path)) 122 | self.assertTrue(FeedPrefix.BOOTSTRAP.value in get_response.file_path, 123 | 'feed file name does not have %s in it. 
file_path: %s' 124 | % (FeedPrefix.BOOTSTRAP.value, get_response.file_path)) 125 | file_dir, file_name = split(abspath(get_response.file_path)) 126 | self.assertEqual(abspath(feed_req_obj.download_location), file_dir) 127 | 128 | 129 | if __name__ == '__main__': 130 | unittest.main() 131 | -------------------------------------------------------------------------------- /tests/test_file_utils.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | import os 3 | import shutil 4 | from utils import file_utils 5 | from enums.file_enums import FileFormat 6 | from errors.custom_exceptions import FileCreationError, InputDataError 7 | from constants.feed_constants import SANDBOX_CHUNK_SIZE 8 | 9 | 10 | class TestFileUtils(unittest.TestCase): 11 | def test_append_response_to_file(self): 12 | test_binary_data = b'\x01\x02\x03\x04' 13 | test_file_path = '../tests/test-data/testFile1' 14 | try: 15 | with open(test_file_path, 'wb') as file_obj: 16 | # create and append to the file 17 | file_utils.append_response_to_file(file_obj, test_binary_data) 18 | except (IOError, FileCreationError) as exp: 19 | # clean up 20 | if os.path.isfile(test_file_path): 21 | os.remove(test_file_path) 22 | self.fail(repr(exp)) 23 | # verify that the file is created 24 | self.assertTrue(os.path.isfile(test_file_path), 'test file has not been created') 25 | # verify that file size is not zero 26 | self.assertTrue(os.path.getsize(test_file_path) > 0, 'the test file is empty') 27 | # clean up 28 | os.remove(test_file_path) 29 | 30 | def test_append_response_to_file_exception(self): 31 | with self.assertRaises(FileCreationError): 32 | file_utils.append_response_to_file(None, b'\x01') 33 | 34 | def test_create_and_replace_binary_file_none(self): 35 | with self.assertRaises(FileCreationError): 36 | file_utils.create_and_replace_binary_file(None) 37 | 38 | def test_create_and_replace_binary_file_dir(self): 39 | with self.assertRaises(FileCreationError): 40 | test_dir = os.path.expanduser('~/Desktop') 41 | file_utils.create_and_replace_binary_file(test_dir) 42 | 43 | def test_create_and_replace_binary_file_exists(self): 44 | test_binary_data = b'\x01\x02\x03\x04' 45 | test_file_path = '../tests/test-data/testFile2' 46 | with open(test_file_path, 'wb') as file_obj: 47 | file_obj.write(test_binary_data) 48 | # verify that the file is created 49 | self.assertTrue(os.path.isfile(test_file_path), 'test file has not been created') 50 | # verify that file size is not zero 51 | self.assertTrue(os.path.getsize(test_file_path) > 0, 'the test file is empty') 52 | # create and replace 53 | file_utils.create_and_replace_binary_file(test_file_path) 54 | # verify that the file is created 55 | self.assertTrue(os.path.isfile(test_file_path), 'test file has not been created') 56 | # verify that file size is zero 57 | self.assertEqual(os.path.getsize(test_file_path), 0) 58 | # clean up 59 | os.remove(test_file_path) 60 | 61 | def test_create_and_replace_binary_file_not_exists(self): 62 | test_dir_to_be_created = '../tests/test-data/testDir' 63 | test_file_path = os.path.join(test_dir_to_be_created, 'testFile3') 64 | self.assertFalse(os.path.isfile(test_file_path), 'test file exists') 65 | # create and replace 66 | file_utils.create_and_replace_binary_file(test_file_path) 67 | # verify that the file is created 68 | self.assertTrue(os.path.isfile(test_file_path), 'test file has not been created') 69 | # verify that file size is zero 70 | self.assertEqual(os.path.getsize(test_file_path), 0) 71 | # 
clean up 72 | shutil.rmtree(test_dir_to_be_created) 73 | 74 | def test_find_next_range_none_range_header(self): 75 | next_range = file_utils.find_next_range(None, 100) 76 | self.assertEqual(next_range, 'bytes=0-100') 77 | 78 | def test_find_next_range_none_chunk(self): 79 | next_range = file_utils.find_next_range('0-1000/718182376', None) 80 | self.assertEqual(next_range, 'bytes=1001-%s' % (SANDBOX_CHUNK_SIZE + 1001)) 81 | 82 | def test_find_next_range(self): 83 | next_range = file_utils.find_next_range('1001-2001/718182376', 1000) 84 | self.assertEqual(next_range, 'bytes=2002-3002') 85 | 86 | def test_get_file_extension_none(self): 87 | ext = file_utils.get_extension(None) 88 | self.assertEqual(ext, '') 89 | 90 | def test_get_file_extension(self): 91 | ext = file_utils.get_extension(FileFormat.GZIP.value) 92 | self.assertEqual(ext, '.gz') 93 | 94 | def test_get_file_name_dir(self): 95 | test_dir = os.path.expanduser('../feed-sdk/tests') 96 | returned_dir_name = file_utils.get_file_name(test_dir) 97 | self.assertEqual(returned_dir_name, 'tests') 98 | 99 | def test_get_file_name(self): 100 | test_dir = os.path.expanduser('../feed-sdk/tests/test_json') 101 | returned_file_name = file_utils.get_file_name(test_dir) 102 | self.assertEqual(returned_file_name, 'test_json') 103 | 104 | def test_get_file_name_none(self): 105 | with self.assertRaises(InputDataError): 106 | file_utils.get_file_name(None) 107 | 108 | def test_get_file_name_name(self): 109 | test_file_name = 'abc.txt' 110 | self.assertEqual(file_utils.get_file_name(test_file_name), test_file_name) 111 | 112 | def test_read_json(self): 113 | json_obj = file_utils.read_json('../tests/test-data/test_json') 114 | self.assertIsNotNone(json_obj) 115 | self.assertIsNotNone(json_obj.get('requests')) 116 | 117 | 118 | if __name__ == '__main__': 119 | unittest.main() 120 | -------------------------------------------------------------------------------- /tests/test_filter_utils.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | import sys 3 | from utils import filter_utils 4 | from enums.feed_enums import FeedColumn 5 | 6 | 7 | class TestFilterUtils(unittest.TestCase): 8 | @classmethod 9 | def setUpClass(cls): 10 | cls.test_column_1 = FeedColumn.PRICE_VALUE 11 | cls.test_column_2 = FeedColumn.ITEM_LOCATION_COUNTRIES 12 | 13 | def test_get_inclusive_less_query_none(self): 14 | query_str = filter_utils.get_inclusive_less_query(self.test_column_1, None) 15 | self.assertEqual('', query_str, 'query is not an empty string') 16 | 17 | def test_get_inclusive_less_query_empty(self): 18 | query_str = filter_utils.get_inclusive_less_query(self.test_column_1, '') 19 | self.assertEqual('', query_str, 'query is not an empty string') 20 | 21 | def test_get_inclusive_less_query(self): 22 | query_str = filter_utils.get_inclusive_less_query(self.test_column_1, 10) 23 | expected_query = '%s <= 10' % self.test_column_1 24 | self.assertEqual(expected_query, query_str) 25 | 26 | def test_get_inclusive_greater_query_none(self): 27 | query_str = filter_utils.get_inclusive_greater_query(self.test_column_1, None) 28 | self.assertEqual('', query_str, 'query is not an empty string') 29 | 30 | def test_get_inclusive_greater_query_empty(self): 31 | query_str = filter_utils.get_inclusive_greater_query(self.test_column_1, '') 32 | self.assertEqual('', query_str, 'query is not an empty string') 33 | 34 | def test_get_inclusive_greater_query(self): 35 | query_str = 
filter_utils.get_inclusive_greater_query(self.test_column_1, 10) 36 | expected_query = '%s >= 10' % self.test_column_1 37 | self.assertEqual(expected_query, query_str) 38 | 39 | def test_get_list_number_element_query_none(self): 40 | query_str = filter_utils.get_list_number_element_query(self.test_column_2, None) 41 | self.assertEqual('', query_str, 'query is not an empty string') 42 | 43 | def test_get_list_number_element_query_empty(self): 44 | query_str = filter_utils.get_list_number_element_query(self.test_column_2, '') 45 | self.assertEqual('', query_str, 'query is not an empty string') 46 | 47 | def test_get_list_string_element_query_none(self): 48 | query_str = filter_utils.get_list_string_element_query(self.test_column_2, None) 49 | self.assertEqual('', query_str, 'query is not an empty string') 50 | 51 | def test_get_list_string_element_query_empty(self): 52 | query_str = filter_utils.get_list_string_element_query(self.test_column_2, '') 53 | self.assertEqual('', query_str, 'query is not an empty string') 54 | 55 | def test_get_list_number_element_query(self): 56 | query_str = filter_utils.get_list_number_element_query(self.test_column_2, [1, 2]) 57 | expected_query = '%s IN (1,2)' % self.test_column_2 58 | self.assertEqual(expected_query, query_str) 59 | 60 | def test_get_list_string_element_query(self): 61 | query_str = filter_utils.get_list_string_element_query(self.test_column_2, ['CA', 'US']) 62 | expected_query = '%s IN (\'CA\',\'US\')' % self.test_column_2 63 | self.assertEqual(expected_query, query_str) 64 | 65 | def test_convert_to_bool_false_invalid(self): 66 | converted_bool = filter_utils.convert_to_bool_false('invalid') 67 | self.assertEqual(False, converted_bool) 68 | 69 | def test_convert_to_bool_false_true(self): 70 | converted_bool = filter_utils.convert_to_bool_false('True') 71 | self.assertEqual(True, converted_bool) 72 | 73 | def test_convert_to_bool_false_false(self): 74 | converted_bool = filter_utils.convert_to_bool_false('False') 75 | self.assertEqual(False, converted_bool) 76 | 77 | def convert_to_float_max_int_invalid(self): 78 | converted_float = filter_utils.convert_to_float_max_int('invalid') 79 | self.assertEqual(sys.maxsize, converted_float) 80 | 81 | def convert_to_float_max_int(self): 82 | converted_float = filter_utils.convert_to_float_max_int('1.2') 83 | self.assertEqual(1.2, converted_float) 84 | 85 | def convert_to_float_zero_invalid(self): 86 | converted_float = filter_utils.convert_to_float_zero('invalid') 87 | self.assertEqual(0, converted_float) 88 | 89 | def convert_to_float_zero(self): 90 | converted_float = filter_utils.convert_to_float_zero('1.2') 91 | self.assertEqual(1.2, converted_float) 92 | 93 | 94 | if __name__ == '__main__': 95 | unittest.main() 96 | -------------------------------------------------------------------------------- /tests/test_logging_utils.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | import re 3 | import utils.logging_utils as logging_utils 4 | 5 | 6 | class TestLoggingUtils(unittest.TestCase): 7 | def test_log_file_name(self): 8 | self.assertIsNotNone(logging_utils.log_file_name) 9 | pattern = re.compile(logging_utils.LOG_FILE_NAME + '.\\d{4}-\\d{2}-\\d{2}' + logging_utils.LOG_FILE_EXTENSION) 10 | self.assertTrue(pattern.match(logging_utils.log_file_name), 11 | 'logging file name %s does not match the format' % logging_utils.log_file_name) 12 | 13 | 14 | if __name__ == '__main__': 15 | unittest.main() 16 | 
-------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = [ 2 | 'date_utils', 3 | 'file_utils', 4 | 'filter_utils', 5 | 'logging_utils' 6 | ] 7 | -------------------------------------------------------------------------------- /utils/date_utils.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # **************************************************************************/ 17 | 18 | from datetime import datetime, timedelta 19 | from enums.feed_enums import FeedType 20 | from errors.custom_exceptions import InputDataError 21 | 22 | 23 | def get_formatted_date(feed_type, day_delta=None): 24 | """ 25 | :param day_delta: the day difference 26 | :param feed_type: item or item_snapshot 27 | :return: today date string in the correct format according to feed_type 28 | """ 29 | delta = day_delta if day_delta else 0 30 | date_obj = datetime.now() + timedelta(days=delta) 31 | if feed_type == str(FeedType.SNAPSHOT): 32 | # TODO: Fix the date format 33 | return date_obj.strftime('%Y-%m-%dT%H:%M:%SZ') 34 | else: 35 | return date_obj.strftime('%Y%m%d') 36 | 37 | 38 | def validate_date(feed_date, feed_type): 39 | """ 40 | Validates the feed_date string format according to feed_type. 41 | :param feed_date: the date string feed is requested for 42 | :param feed_type: item or item_snapshot 43 | :raise InputDataError: if the date string format is not correct an InputDataError exception is raised 44 | """ 45 | if feed_type == str(FeedType.SNAPSHOT): 46 | try: 47 | datetime.strptime(feed_date, '%Y-%m-%dT%H:%M:%SZ') 48 | except ValueError: 49 | raise InputDataError('Bad feed date format. Date should be in UTC format (yyyy-MM-ddThh:00:00.000Z)', 50 | feed_date) 51 | else: 52 | try: 53 | datetime.strptime(feed_date, '%Y%m%d') 54 | except ValueError: 55 | raise InputDataError('Bad feed date format. Date should be in yyyyMMdd format', feed_date) 56 | -------------------------------------------------------------------------------- /utils/file_utils.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 
7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # **************************************************************************/ 17 | 18 | import json 19 | from os import makedirs 20 | from os.path import isdir, basename, splitext, exists, dirname 21 | from errors import custom_exceptions 22 | import constants.feed_constants as const 23 | 24 | 25 | def append_response_to_file(file_handler, data): 26 | """ 27 | Appends the given data to the existing file 28 | :param file_handler: the existing and open file object 29 | :param data: the data to be appended to the file 30 | :raise if there are any IO errors a FileCreationError exception is raised 31 | """ 32 | try: 33 | file_handler.write(data) 34 | except (IOError, AttributeError) as exp: 35 | if file_handler: 36 | file_handler.close() 37 | raise custom_exceptions.FileCreationError('Error while writing in the file: %s' % repr(exp), data) 38 | 39 | 40 | def create_and_replace_binary_file(file_path): 41 | """ 42 | Creates a binary file in the given path including the file name and extension 43 | If the file exists, it will be replaced 44 | :param file_path: The path to the file including the file name and extension 45 | :raise: if the file is not created successfully an FileCreationError exception is raised 46 | """ 47 | try: 48 | if not exists(dirname(file_path)): 49 | makedirs(dirname(file_path)) 50 | with open(file_path, 'wb'): 51 | pass 52 | except (IOError, OSError, AttributeError) as exp: 53 | raise custom_exceptions.FileCreationError('IO error in creating file %s: %s' % (file_path, repr(exp)), 54 | file_path) 55 | 56 | 57 | def find_next_range(content_range_header, chunk_size=const.SANDBOX_CHUNK_SIZE): 58 | """ 59 | Finds the next value of the Range header 60 | :param content_range_header: The content-range header value returned in the response, ex. 0-1000/7181823761 61 | If None, the default Range header that is bytes=0-CHUNK_SIZE is returned 62 | :param chunk_size: The chunk size in bytes. If not provided, the default chunk size is used 63 | :return: The next value of the Range header in the format of bytes=lower-upper or empty string if no data is left 64 | :raise: If the input content-range value is not correct an InputDataError exception is raised 65 | """ 66 | chunk = chunk_size if chunk_size else const.SANDBOX_CHUNK_SIZE 67 | if content_range_header is None: 68 | return const.RANGE_PREFIX + '0-' + str(chunk) 69 | else: 70 | try: 71 | # ex. content-range : 0-1000/7181823761 72 | range_components = content_range_header.split('/') 73 | total_size = int(range_components[1]) 74 | bounds = range_components[0].split('-') 75 | upper_bound = int(bounds[1]) + 1 76 | if upper_bound > total_size: 77 | return '' 78 | return const.RANGE_PREFIX + str(upper_bound) + '-' + str(upper_bound + chunk) 79 | except Exception: 80 | raise custom_exceptions.InputDataError('Bad content-range header format: %s' % content_range_header, 81 | content_range_header) 82 | 83 | 84 | def get_extension(file_type): 85 | """ 86 | Returns file extension including '.' 
according to the given file type 87 | :param file_type: format of the file such as gzip 88 | :return: extension of the file such as '.gz' 89 | """ 90 | if not file_type: 91 | return '' 92 | if file_type.lower() == 'gz' or file_type.lower() == 'gzip': 93 | return '.gz' 94 | 95 | 96 | def get_file_name(name_or_path): 97 | """ 98 | Finds name of the file from the given file path or name 99 | :param name_or_path: name or path to the file 100 | :return: file name 101 | """ 102 | if not name_or_path: 103 | raise custom_exceptions.InputDataError('Bad file name or directory %s' % name_or_path, name_or_path) 104 | if isdir(name_or_path): 105 | base = basename(name_or_path) 106 | return splitext(base) 107 | elif '/' in name_or_path: 108 | return name_or_path[name_or_path.rfind('/') + 1:] 109 | else: 110 | return name_or_path 111 | 112 | 113 | def read_json(file_path): 114 | """ 115 | Reads json from a file and returns a json object 116 | :param file_path: the path to the file 117 | :return: a json object 118 | """ 119 | with open(file_path) as config_file: 120 | json_obj = json.load(config_file) 121 | return json_obj 122 | -------------------------------------------------------------------------------- /utils/filter_utils.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | # **************************************************************************/ 17 | 18 | import sys 19 | import pandas as pd 20 | from distutils.util import strtobool 21 | 22 | 23 | def convert_to_bool_false(data): 24 | try: 25 | bool_value = strtobool(data) 26 | return pd.np.bool(bool_value) 27 | except (ValueError, TypeError, AttributeError): 28 | return pd.np.bool(False) 29 | 30 | 31 | def convert_to_float_max_int(data): 32 | try: 33 | return pd.np.float(data) 34 | except (ValueError, TypeError, AttributeError): 35 | return pd.np.float(sys.maxsize) 36 | 37 | 38 | def convert_to_float_zero(data): 39 | try: 40 | return pd.np.float(data) 41 | except (ValueError, TypeError, AttributeError): 42 | return pd.np.float(0) 43 | 44 | 45 | def get_inclusive_less_query(column_name, upper_limit): 46 | if not upper_limit: 47 | return '' 48 | return '%s <= %s' % (column_name, upper_limit) 49 | 50 | 51 | def get_inclusive_greater_query(column_name, lower_limit): 52 | if not lower_limit: 53 | return '' 54 | return '%s >= %s' % (column_name, lower_limit) 55 | 56 | 57 | def get_list_number_element_query(column_name, value_list): 58 | if not value_list: 59 | return '' 60 | list_str = ','.join(str(element) for element in value_list) 61 | return '%s IN (%s)' % (column_name, list_str) 62 | 63 | 64 | def get_list_string_element_query(column_name, value_list): 65 | if not value_list: 66 | return '' 67 | list_str = (','.join('\'' + item + '\'' for item in value_list)) 68 | return '%s IN (%s)' % (column_name, list_str) 69 | -------------------------------------------------------------------------------- /utils/logging_utils.py: -------------------------------------------------------------------------------- 1 | # ************************************************************************** 2 | # Copyright 2018-2019 eBay Inc. 3 | # Author/Developers: -- 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # **************************************************************************/ 17 | 18 | import logging 19 | from datetime import datetime 20 | 21 | LOG_FILE_NAME = 'feed-sdk-log' 22 | LOG_FILE_EXTENSION = '.log' 23 | LOGGING_FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s' 24 | 25 | log_file_name = LOG_FILE_NAME + '.' + datetime.now().strftime('%Y-%m-%d') + LOG_FILE_EXTENSION 26 | 27 | 28 | def setup_logging(): 29 | logging.basicConfig(filename=log_file_name, filemode='a', level=logging.DEBUG, format=LOGGING_FORMAT) 30 | --------------------------------------------------------------------------------
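As a closing illustration, the helpers in utils/filter_utils.py above are what FeedFilterRequest joins into a single SQL WHERE clause. A small self-contained sketch (the printed column names depend on the FeedColumn enum values):

# Sketch of composing filter_utils query fragments the way FeedFilterRequest.filter() does.
from utils import filter_utils
from enums.feed_enums import FeedColumn

queries = [
    filter_utils.get_list_string_element_query(FeedColumn.ITEM_LOCATION_COUNTRIES, ['US', 'CA']),
    filter_utils.get_inclusive_greater_query(FeedColumn.PRICE_VALUE, 10.0),
    filter_utils.get_inclusive_less_query(FeedColumn.PRICE_VALUE, 100.0),
]
# the helpers return '' when no value is supplied, so drop empty fragments before joining
where_clause = ' AND '.join(query for query in queries if query)
print(where_clause)  # e.g. "<country column> IN ('US','CA') AND <price column> >= 10.0 AND <price column> <= 100.0"

The resulting clause is what feed_filter.py substitutes into its SELECT * FROM feed WHERE ... query against the temporary SQLite table.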