├── .gitignore ├── LICENSE.txt ├── README.md ├── monkeylearn ├── __init__.py ├── base.py ├── classification.py ├── exceptions.py ├── extraction.py ├── response.py ├── settings.py ├── validation.py └── workflows.py ├── setup.cfg └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.egg-info 3 | *~ 4 | .DS_Store 5 | /dist/ 6 | /build/ 7 | README.html 8 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015-2018 MonkeyLearn, INC 2 | 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy 5 | of this software and associated documentation files (the "Software"), to deal 6 | in the Software without restriction, including without limitation the rights 7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 8 | copies of the Software, and to permit persons to whom the Software is 9 | furnished to do so, subject to the following conditions: 10 | 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 22 | THE SOFTWARE. 
23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MonkeyLearn API for Python 2 | 3 | Official Python client for the [MonkeyLearn API](https://monkeylearn.com/api/). Build and run machine learning models for language processing from your Python apps. 4 | 5 | 6 | Installation 7 | --------------- 8 | 9 | 10 | You can use pip to install the library: 11 | 12 | ```bash 13 | $ pip install monkeylearn 14 | ``` 15 | 16 | Alternatively, you can just clone the repository and run the setup.py script: 17 | 18 | ```bash 19 | $ python setup.py install 20 | ``` 21 | 22 | 23 | Usage 24 | ------ 25 | 26 | 27 | Before making requests to the API, you need to create an instance of the MonkeyLearn client. You will have to use your [account API Key](https://app.monkeylearn.com/main/my-account/tab/api-keys/): 28 | 29 | ```python 30 | from monkeylearn import MonkeyLearn 31 | 32 | # Instantiate the client Using your API key 33 | ml = MonkeyLearn('') 34 | ``` 35 | 36 | ### Requests 37 | 38 | From the MonkeyLearn client instance, you can call any endpoint (check the [available endpoints](#available-endpoints) below). For example, you can [classify](#classify) a list of texts using the public [Sentiment analysis classifier](https://app.monkeylearn.com/main/classifiers/cl_oJNMkt2V/): 39 | 40 | 41 | ```python 42 | response = ml.classifiers.classify( 43 | model_id='cl_Jx8qzYJh', 44 | data=[ 45 | 'Great hotel with excellent location', 46 | 'This is the worst hotel ever.' 47 | ] 48 | ) 49 | 50 | ``` 51 | 52 | ### Responses 53 | 54 | The response object returned by every endpoint call is a `MonkeyLearnResponse` object. 
The `body` attribute has the parsed response from the API: 55 | 56 | ```python 57 | print(response.body) 58 | # => [ 59 | # => { 60 | # => 'text': 'Great hotel with excellent location', 61 | # => 'external_id': None, 62 | # => 'error': False, 63 | # => 'classifications': [ 64 | # => { 65 | # => 'tag_name': 'Positive', 66 | # => 'tag_id': 1994, 67 | # => 'confidence': 0.922, 68 | # => } 69 | # => ] 70 | # => }, 71 | # => { 72 | # => 'text': 'This is the worst hotel ever.', 73 | # => 'external_id': None, 74 | # => 'error': False, 75 | # => 'classifications': [ 76 | # => { 77 | # => 'tag_name': 'Negative', 78 | # => 'tag_id': 1941, 79 | # => 'confidence': 0.911, 80 | # => } 81 | # => ] 82 | # => } 83 | # => ] 84 | ``` 85 | 86 | You can also access other attributes in the response object to get information about the queries used or available: 87 | 88 | ```python 89 | print(response.plan_queries_allowed) 90 | # => 300 91 | 92 | print(response.plan_queries_remaining) 93 | # => 240 94 | 95 | print(response.request_queries_used) 96 | # => 2 97 | ``` 98 | 99 | ### Errors 100 | 101 | Endpoint calls may raise exceptions. Here is an example of how to handle them: 102 | 103 | ```python 104 | from monkeylearn.exceptions import PlanQueryLimitError, MonkeyLearnException 105 | 106 | try: 107 | response = ml.classifiers.classify('[MODEL_ID]', data=['My text']) 108 | except PlanQueryLimitError as e: 109 | # No monthly queries left 110 | # e.response contains the MonkeyLearnResponse object 111 | print(e.error_code, e.detail) 112 | except MonkeyLearnException: 113 | raise 114 | ``` 115 | 116 | Available exceptions: 117 | 118 | | class | Description | 119 | |-----------------------------|-------------| 120 | | `MonkeyLearnException` | Base class for every exception below. | 121 | | `RequestParamsError` | An invalid parameter was sent. Check the exception message or response object for more information. 
| 122 | | `AuthenticationError` | Authentication failed, usually because an invalid token was provided. Check the exception message. More about [Authentication](https://monkeylearn.com/api/v3/#authentication). | 123 | | `ForbiddenError` | You don't have permissions to perform the action on the given resource. | 124 | | `ModelLimitError` | You have reached the custom model limit for your plan. | 125 | | `ModelNotFound` | The model does not exist. Check the `model_id`. | 126 | | `TagNotFound` | The tag does not exist. Check the `tag_id` parameter. | 127 | | `PlanQueryLimitError` | You have reached the monthly query limit for your plan. Consider upgrading your plan. More about [Plan query limits](https://monkeylearn.com/api/v3/#query-limits). | 128 | | `PlanRateLimitError` | You have sent too many requests in the last minute. Check the exception detail. More about [Plan rate limit](https://monkeylearn.com/api/v3/#plan-rate-limit). | 129 | | `ConcurrencyRateLimitError` | You have sent too many requests in the last second. Check the exception detail. More about [Concurrency rate limit](https://monkeylearn.com/api/v3/#concurrecy-rate-limit). | 130 | | `ModelStateError` | The state of the model is invalid. Check the exception detail. | 131 | 132 | 133 | ### Auto-batching 134 | 135 | [Classify](#classify) and [Extract](#extract) endpoints might require more than one request to the MonkeyLearn API in order to process every text in the `data` parameter. If the `auto_batch` parameter is `True` (which is the default value), you won't have to keep the `data` length below the max allowed value (200). You can just pass the full list and the library will handle the batching and make the necessary requests. If the `retry_if_throttled` parameter is `True` (which is the default value), it will also wait and retry if the API throttled a request. 136 | 137 | Let's say you send a `data` parameter with 300 texts and `auto_batch` is enabled. 
The list will be split internally and two requests will be sent to MonkeyLearn with 200 and 100 texts, respectively. If all requests respond with a 200 status code, the responses will be appended and you will get the 300 classifications as usual in the `MonkeyLearnResponse.body` attribute: 138 | 139 | ``` python 140 | data = ['Text to classify'] * 300 141 | response = ml.classifiers.classify('[MODEL_ID]', data) 142 | assert len(response.body) == 300 # => True 143 | ``` 144 | 145 | Now, let's say you only had 200 queries left when trying the previous example. The second internal request would fail, since you wouldn't have queries left after the first batch, and a `PlanQueryLimitError` exception would be raised. The first 200 (successful) classifications will be in the exception object. However, if you don't manage this exception with an `except` clause, those first 200 successful classifications will be lost. Here's how you should handle that case: 146 | 147 | ``` python 148 | from monkeylearn.exceptions import PlanQueryLimitError 149 | 150 | data = ['Text to classify'] * 300 151 | batch_size = 200 152 | 153 | try: 154 | response = ml.classifiers.classify('[MODEL_ID]', data, batch_size=batch_size) 155 | except PlanQueryLimitError as e: 156 | partial_predictions = e.response.body # The body of the successful responses 157 | non_2xx_raw_responses = e.response.failed_raw_responses # List of requests responses objects 158 | else: 159 | predictions = response.body 160 | ``` 161 | 162 | This is very convenient and usually should be enough. If you need more flexibility, you can manage batching and rate limits yourself. 
163 | 164 | ``` python 165 | from time import sleep 166 | from monkeylearn.exceptions import PlanQueryLimitError, ConcurrencyRateLimitError, PlanRateLimitError 167 | 168 | data = ['Text to classify'] * 300 169 | batch_size = 200 170 | predictions = [] 171 | 172 | for i in range(0, len(data), batch_size): 173 | batch_data = data[i:i + batch_size] 174 | 175 | retry = True 176 | while retry: 177 | try: 178 | retry = True 179 | response = ml.classifiers.classify('[MODEL_ID]', batch_data, auto_batch=False, 180 | retry_if_throttled=False) 181 | except PlanRateLimitError as e: 182 | sleep(e.seconds_to_wait) 183 | except ConcurrencyRateLimitError: 184 | sleep(2) 185 | except PlanQueryLimitError: 186 | raise 187 | else: 188 | retry = False 189 | 190 | predictions.extend(response.body) 191 | ``` 192 | 193 | This way you'll be able to control every request that is sent to the MonkeyLearn API. 194 | 195 | Available endpoints 196 | ------------------------ 197 | 198 | These are all the endpoints of the API. For more information about each endpoint, check out the [API documentation](https://monkeylearn.com/api/v3/). 199 | 200 | ### Classifiers 201 | 202 | #### [Classify](https://monkeylearn.com/api/v3/?shell#classify) 203 | 204 | 205 | ```python 206 | def MonkeyLearn.classifiers.classify(model_id, data, production_model=False, batch_size=200, 207 | auto_batch=True, retry_if_throttled=True) 208 | ``` 209 | 210 | Parameters: 211 | 212 | | Parameter |Type | Description | 213 | |--------------------|-------------------|-----------------------------------------------------------| 214 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 215 | |*data* |`list[str or dict]`|A list of up to 200 data elements to classify. Each element must be a *string* with the text or a *dict* with the required `text` key and the text as the value. You can provide an optional `external_id` key with a string that will be included in the response. 
| 216 |*production_model* |`bool` |Indicates if the classifications are performed by the production model. Only use this parameter with *custom models* (not with the public ones). Note that you first need to deploy your model to production either from the UI model settings or by using the [Classifier deploy endpoint](#deploy). | 217 | |*batch_size* |`int` |Max number of texts each request will send to MonkeyLearn. A number from 1 to 200. | 218 | |*auto_batch* |`bool` |Split the `data` list into smaller valid lists, send each one in a separate request to MonkeyLearn, and merge the responses. | 219 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 220 | 221 | Example: 222 | 223 | ```python 224 | data = ['First text', {'text': 'Second text', 'external_id': '2'}] 225 | response = ml.classifiers.classify('[MODEL_ID]', data) 226 | ``` 227 | 228 |
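Once a classify call returns, a common next step is to pick the strongest tag for each text. The following is a minimal sketch that works on a plain list shaped like the `response.body` shown in the Responses section; `top_tags` and `sample_body` are hypothetical names used only for illustration and are not part of the library.

```python
def top_tags(body):
    """Return (text, tag_name, confidence) for each item, using the best tag.

    `body` is assumed to look like the `response.body` example in the
    Responses section: a list of dicts with 'text', 'error' and
    'classifications' keys.
    """
    results = []
    for item in body:
        classifications = item.get('classifications') or []
        if item.get('error') or not classifications:
            # Nothing usable for this text; keep its position in the output.
            results.append((item.get('text'), None, None))
            continue
        best = max(classifications, key=lambda c: c['confidence'])
        results.append((item['text'], best['tag_name'], best['confidence']))
    return results


# Sample data copied from the Responses section, not a live API result.
sample_body = [
    {'text': 'Great hotel with excellent location', 'external_id': None, 'error': False,
     'classifications': [{'tag_name': 'Positive', 'tag_id': 1994, 'confidence': 0.922}]},
    {'text': 'This is the worst hotel ever.', 'external_id': None, 'error': False,
     'classifications': [{'tag_name': 'Negative', 'tag_id': 1941, 'confidence': 0.911}]},
]

for text, tag, confidence in top_tags(sample_body):
    print(text, '->', tag, confidence)
```

With a real client you would pass `response.body` instead of `sample_body`.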
229 | 230 | #### [Classifier detail](https://monkeylearn.com/api/v3/?shell#classifier-detail) 231 | 232 | 233 | ```python 234 | def MonkeyLearn.classifiers.detail(model_id, retry_if_throttled=True) 235 | ``` 236 | 237 | Parameters: 238 | 239 | | Parameter |Type | Description | 240 | |--------------------|-------------------|-----------------------------------------------------------| 241 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 242 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 243 | 244 | Example: 245 | 246 | ```python 247 | response = ml.classifiers.detail('[MODEL_ID]') 248 | ``` 249 | 250 |
251 | 252 | #### [Create Classifier](https://monkeylearn.com/api/v3/?shell#create-classifier) 253 | 254 | 255 | ```python 256 | def MonkeyLearn.classifiers.create(name, description='', algorithm='nb', language='en', 257 | max_features=10000, ngram_range=(1, 1), use_stemming=True, 258 | preprocess_numbers=True, preprocess_social_media=False, 259 | normalize_weights=True, stopwords=True, whitelist=None, 260 | retry_if_throttled=True) 261 | ``` 262 | 263 | Parameters: 264 | 265 | Parameter | Type | Description 266 | --------- | ------- | ----------- 267 | *name* | `str` | The name of the model. 268 | *description* | `str` | The description of the model. 269 | *algorithm* | `str` | The [algorithm](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-changing-the-algorithm) used when training the model. It can be either "nb" or "svm". 270 | *language* | `str` | The [language](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-language) of the model. Full list of [supported languages](https://monkeylearn.com/api/v3/#classifier-detail). 271 | *max_features* | `int` | The [maximum number of features](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-max-features) used when training the model. Between 10 and 100000. 272 | *ngram_range* | `tuple(int,int)` | Indicates which [n-gram range](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-n-gram-range) is used when training the model. A pair of numbers between 1 and 3. They indicate the minimum and the maximum n for the n-grams used. 273 | *use_stemming* | `bool`| Indicates whether [stemming](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-stemming) is used when training the model. 274 | *preprocess_numbers* | `bool` | Indicates whether [number preprocessing](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-preprocess-numbers) is done when training the model. 
275 | *preprocess_social_media* | `bool` | Indicates whether [preprocessing of social media](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-social-media-preprocessing-and-regular-expressions) is done when training the model. 276 | *normalize_weights* | `bool` | Indicates whether [weights will be normalized](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-normalize-weights) when training the model. 277 | *stopwords* | `bool or list` | The list of [stopwords](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-filter-stopwords) used when training the model. Use *False* for no stopwords, *True* for the default stopwords, or a list of strings for custom stopwords. 278 | *whitelist* | `list` | The [whitelist](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-whitelist) of words used when training the model. 279 | *retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 280 | 281 | Example: 282 | 283 | ```python 284 | response = ml.classifiers.create(name='New classifier', stopwords=True) 285 | ``` 286 |
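The ranges listed in the parameter table can be checked locally before a request is sent. The sketch below is only an illustration: `check_classifier_params` is a hypothetical helper, not part of the library, and the rule that the minimum n must not exceed the maximum n is an assumption rather than something the table states.

```python
def check_classifier_params(algorithm='nb', max_features=10000, ngram_range=(1, 1)):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if algorithm not in ('nb', 'svm'):
        problems.append("algorithm must be 'nb' or 'svm'")
    if not 10 <= max_features <= 100000:
        problems.append('max_features must be between 10 and 100000')
    if len(ngram_range) != 2:
        problems.append('ngram_range must contain exactly two numbers')
    else:
        low, high = ngram_range
        # Assumption: the first number (min n) should not exceed the second (max n).
        if not 1 <= low <= high <= 3:
            problems.append('ngram_range must satisfy 1 <= min <= max <= 3')
    return problems


print(check_classifier_params(algorithm='svm', max_features=20000, ngram_range=(1, 2)))
print(check_classifier_params(max_features=5, ngram_range=(2, 1)))
```

If the returned list is empty, the same values can be passed on to `ml.classifiers.create`.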
287 | 288 | #### [Edit Classifier](https://monkeylearn.com/api/v3/?shell#edit-classifier) 289 | 290 | 291 | ```python 292 | def MonkeyLearn.classifiers.edit(model_id, name=None, description=None, algorithm=None, 293 | language=None, max_features=None, ngram_range=None, 294 | use_stemming=None, preprocess_numbers=None, 295 | preprocess_social_media=None, normalize_weights=None, 296 | stopwords=None, whitelist=None, retry_if_throttled=None) 297 | ``` 298 | 299 | Parameters: 300 | 301 | Parameter | Type | Description 302 | --------- | ------- | ----------- 303 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 304 | *name* | `str` | The name of the model. 305 | *description* | `str` | The description of the model. 306 | *algorithm* | `str` | The [algorithm](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-changing-the-algorithm) used when training the model. It can be either "nb" or "svm". 307 | *language* | `str` | The [language](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-language) of the model. Full list of [supported languages](https://monkeylearn.com/api/v3/#classifier-detail). 308 | *max_features* | `int` | The [maximum number of features](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-max-features) used when training the model. Between 10 and 100000. 309 | *ngram_range* | `tuple(int,int)` | Indicates which [n-gram range](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-n-gram-range) is used when training the model. A pair of numbers between 1 and 3. They indicate the minimum and the maximum n for the n-grams used. 310 | *use_stemming* | `bool`| Indicates whether [stemming](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-stemming) is used when training the model. 
311 | *preprocess_numbers* | `bool` | Indicates whether [number preprocessing](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-preprocess-numbers) is done when training the model. 312 | *preprocess_social_media* | `bool` | Indicates whether [preprocessing of social media](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-social-media-preprocessing-and-regular-expressions) is done when training the model. 313 | *normalize_weights* | `bool` | Indicates whether [weights will be normalized](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-normalize-weights) when training the model. 314 | *stopwords* | `bool or list` | The list of [stopwords](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-filter-stopwords) used when training the model. Use *False* for no stopwords, *True* for the default stopwords, or a list of strings for custom stopwords. 315 | *whitelist* | `list` | The [whitelist](http://help.monkeylearn.com/tips-and-tricks-for-custom-modules/parameters-whitelist) of words used when training the model. 316 | *retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 317 | 318 | Example: 319 | 320 | ```python 321 | response = ml.classifiers.edit('[MODEL_ID]', description='The new description of the classifier') 322 | ``` 323 |
324 | 325 | #### [Delete classifier](https://monkeylearn.com/api/v3/?shell#delete-classifier) 326 | 327 | 328 | ```python 329 | def MonkeyLearn.classifiers.delete(model_id, retry_if_throttled=True) 330 | ``` 331 | 332 | Parameters: 333 | 334 | | Parameter |Type | Description | 335 | |--------------------|-------------------|-----------------------------------------------------------| 336 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 337 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 338 | 339 | Example: 340 | 341 | ```python 342 | response = ml.classifiers.delete('[MODEL_ID]') 343 | ``` 344 | 345 |
346 | 347 | #### [List Classifiers](https://monkeylearn.com/api/v3/?shell#list-classifiers) 348 | 349 | 350 | ```python 351 | def MonkeyLearn.classifiers.list(page=1, per_page=20, order_by='-created', retry_if_throttled=True) 352 | ``` 353 | 354 | Parameters: 355 | 356 | |Parameter |Type | Description | 357 | |-------------------- |-------------------|-------------| 358 | |*page* |`int` |Specifies which page to get.| 359 | |*per_page* |`int` |Specifies how many items per page will be returned. | 360 | |*order_by* |`string or list` |Specifies the ordering criteria. It can either be a *string* for single criteria ordering or a *list of strings* for more than one. Each *string* must be a valid field name; if you want inverse/descending order of the field prepend a `-` (dash) character. Some valid examples are: `'is_public'`, `'-name'` or `['-is_public', 'name']`. | 361 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 362 | 363 | Example: 364 | 365 | ```python 366 | response = ml.classifiers.list(page=2, per_page=5, order_by=['-is_public', 'name']) 367 | ``` 368 | 369 |
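Because `list` returns one page at a time, collecting every classifier means walking the pages. The sketch below shows one way to do that with a generic helper. It rests on assumptions: `iter_all_pages` and `fake_fetch` are hypothetical names, the response body is assumed to be a list of items, and a page shorter than `per_page` is assumed to mark the end of the listing.

```python
def iter_all_pages(fetch_page, per_page=20):
    """Yield items from consecutive pages until a short page is returned.

    `fetch_page(page, per_page)` must return a list of items for that page.
    """
    page = 1
    while True:
        items = fetch_page(page, per_page)
        for item in items:
            yield item
        # Assumption: a page with fewer than `per_page` items is the last one.
        if len(items) < per_page:
            break
        page += 1


# Stand-in for the API so the sketch runs offline: 45 fake model names.
FAKE_MODELS = ['model_%d' % i for i in range(45)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return FAKE_MODELS[start:start + per_page]


all_models = list(iter_all_pages(fake_fetch, per_page=20))
print(len(all_models))
```

With a real client, `fetch_page` could be `lambda page, per_page: ml.classifiers.list(page=page, per_page=per_page).body`. Note that when the total is an exact multiple of `per_page`, this sketch makes one extra request that is expected to return an empty page.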
370 | 371 | #### [Deploy](https://monkeylearn.com/api/v3/?shell#deploy) 372 | 373 | 374 | ```python 375 | def MonkeyLearn.classifiers.deploy(model_id, retry_if_throttled=True) 376 | ``` 377 | 378 | Parameters: 379 | 380 | | Parameter |Type | Description | 381 | |--------------------|-------------------|-----------------------------------------------------------| 382 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 383 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 384 | 385 | Example: 386 | 387 | ```python 388 | response = ml.classifiers.deploy('[MODEL_ID]') 389 | ``` 390 | 391 |
392 | 393 | #### [Train](https://monkeylearn.com/api/v3/?shell#train) 394 | 395 | 396 | ```python 397 | def MonkeyLearn.classifiers.train(model_id, retry_if_throttled=True) 398 | ``` 399 | 400 | Parameters: 401 | 402 | | Parameter |Type | Description | 403 | |--------------------|-------------------|-----------------------------------------------------------| 404 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 405 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 406 | 407 | Example: 408 | 409 | ```python 410 | response = ml.classifiers.train('[MODEL_ID]') 411 | ``` 412 | 413 |
414 | 415 | #### [Tag detail](https://monkeylearn.com/api/v3/?shell#classify) 416 | 417 | 418 | ```python 419 | def MonkeyLearn.classifiers.tags.detail(model_id, tag_id, retry_if_throttled=True) 420 | ``` 421 | 422 | Parameters: 423 | 424 | | Parameter |Type | Description | 425 | |--------------------|-------------------|-----------------------------------------------------------| 426 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 427 | |*tag_id* |`int` |Tag ID. | 428 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 429 | 430 | Example: 431 | 432 | ``` python 433 | response = ml.classifiers.tags.detail('[MODEL_ID]', TAG_ID) 434 | ``` 435 | 436 |
437 | 438 | #### [Create tag](https://monkeylearn.com/api/v3/?shell#create-tag) 439 | 440 | 441 | ```python 442 | def MonkeyLearn.classifiers.tags.create(model_id, name, retry_if_throttled=True) 443 | ``` 444 | 445 | Parameters: 446 | 447 | | Parameter |Type | Description | 448 | |--------------------|-------------------|-----------------------------------------------------------| 449 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 450 | |*name* |`str` |The name of the new tag. | 451 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 452 | 453 | Example: 454 | 455 | ```python 456 | response = ml.classifiers.tags.create('[MODEL_ID]', 'Positive') 457 | ``` 458 | 459 |
460 | 461 | #### [Edit tag](https://monkeylearn.com/api/v3/?shell#edit-tag) 462 | 463 | 464 | ```python 465 | def MonkeyLearn.classifiers.tags.edit(model_id, tag_id, name=None, 466 | retry_if_throttled=True) 467 | ``` 468 | 469 | Parameters: 470 | 471 | | Parameter |Type | Description | 472 | |--------------------|-------------------|-----------------------------------------------------------| 473 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 474 | |*tag_id* |`int` |Tag ID. | 475 | |*name* |`str` |The new name of the tag. | 476 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 477 | 478 | Example: 479 | 480 | ```python 481 | response = ml.classifiers.tags.edit('[MODEL_ID]', TAG_ID, 'New name') 482 | ``` 483 | 484 |
485 | 486 | #### [Delete tag](https://monkeylearn.com/api/v3/?shell#delete-tag) 487 | 488 | 489 | ```python 490 | def MonkeyLearn.classifiers.tags.delete(model_id, tag_id, move_data_to=None, 491 | retry_if_throttled=True) 492 | ``` 493 | 494 | Parameters: 495 | 496 | | Parameter |Type | Description | 497 | |--------------------|-------------------|-----------------------------------------------------------| 498 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 499 | |*tag_id* |`int` |Tag ID. | 500 | |*move_data_to* |`int` |An optional tag ID. If provided, training data associated with the tag to be deleted will be moved to the specified tag before deletion. | 501 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 502 | 503 | Example: 504 | 505 | ```python 506 | response = ml.classifiers.tags.delete('[MODEL_ID]', TAG_ID) 507 | ``` 508 | 509 |
510 | 511 | #### [Upload data](https://monkeylearn.com/api/v3/?shell#upload-data) 512 | 513 | 514 | ```python 515 | def MonkeyLearn.classifiers.upload_data(model_id, data, retry_if_throttled=True) 516 | ``` 517 | 518 | Parameters: 519 | 520 | | Parameter |Type | Description | 521 | |--------------------|-------------------|-----------------------------------------------------------| 522 | |*model_id* |`str` |Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`. | 523 | |*data* |`list[dict]` |A list of dicts with the keys described below. 524 | |*input_duplicates_strategy* |`str` | Indicates what to do with duplicate texts in this request. Must be one of `merge`, `keep_first` or `keep_last`. 525 | |*existing_duplicates_strategy* |`str` | Indicates what to do with texts of this request that already exist in the model. Must be one of `overwrite` or `ignore`. 526 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 527 | 528 | `data` dict keys: 529 | 530 | |Key | Description | 531 | |--------- | ----------- | 532 | |text | A *string* of the text to upload.| 533 | |tags | A *list* of tags that can be referred to by their numeric ID or their name. The text will be tagged with each tag in the *list* when created (in case it doesn't already exist on the model). Otherwise, its tags will be updated to the new ones. New tags will be created if they don't already exist.| 534 | |markers | An optional *list* of *strings*. Each one represents a marker that will be associated with the text. New markers will be created if they don't already exist.| 535 | 536 | 537 | Example: 538 | 539 | ```python 540 | response = ml.classifiers.upload_data( 541 | model_id='[MODEL_ID]', 542 | data=[{'text': 'text 1', 'tags': [TAG_ID_1, '[tag_name]']}, 543 | {'text': 'text 2', 'tags': [TAG_ID_1, TAG_ID_2]}] 544 | ) 545 | ``` 546 | 547 |
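When the training texts come from another source, it can help to build the `data` dicts through a small function that enforces the keys from the table above. This is a sketch under stated assumptions: `make_upload_item` is a hypothetical helper, the tag ID `101` and the marker name are made up, and the type checks reflect the table's wording rather than the API's own validation.

```python
def make_upload_item(text, tags, markers=None):
    """Build one `data` dict with the 'text', 'tags' and optional 'markers' keys."""
    if not isinstance(text, str):
        raise TypeError('text must be a string')
    for tag in tags:
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(tag, bool) or not isinstance(tag, (int, str)):
            raise TypeError('each tag must be a numeric ID or a tag name')
    item = {'text': text, 'tags': list(tags)}
    if markers:
        item['markers'] = [str(marker) for marker in markers]
    return item


data = [
    make_upload_item('text 1', [101, 'Positive']),  # 101 is a made-up tag ID
    make_upload_item('text 2', ['Negative'], markers=['reviewed']),
]
print(data)
```

The resulting list can be passed as the `data` argument of `ml.classifiers.upload_data`.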
548 | 549 | ### Extractors 550 | 551 | 552 | #### [Extract](https://monkeylearn.com/api/v3/?shell#extract) 553 | 554 | 555 | ```python 556 | def MonkeyLearn.extractors.extract(model_id, data, production_model=False, batch_size=200, 557 | retry_if_throttled=True, extra_args=None) 558 | ``` 559 | 560 | Parameters: 561 | 562 | | Parameter |Type | Description | 563 | |--------------------|-------------------|-----------------------------------------------------------| 564 | |*model_id* |`str` |Extractor ID. It always starts with `'ex'`, for example, `'ex_oJNMkt2V'`. | 565 | |*data* |`list[str or dict]`|A list of up to 200 data elements to extract from. Each element must be a *string* with the text or a *dict* with the required `text` key and the text as the value. You can also provide an optional `external_id` key with a string that will be included in the response. | 566 | |*production_model* |`bool` |Indicates if the extractions are performed by the production model. Only use this parameter with *custom models* (not with the public ones). Note that you first need to deploy your model to production from the UI model settings. | 567 | |*batch_size* |`int` |Max number of texts each request will send to MonkeyLearn. A number from 1 to 200. | 568 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 569 | 570 | Example: 571 | 572 | ```python 573 | data = ['First text', {'text': 'Second text', 'external_id': '2'}] 574 | response = ml.extractors.extract('[MODEL_ID]', data=data) 575 | ``` 576 | 577 |
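Since each request accepts at most 200 texts, a long list has to be split into batches at some point, whether by the library or by your own code. A minimal sketch of that split is shown below; `split_into_batches` is a hypothetical helper written for illustration and is not the library's internal implementation.

```python
def split_into_batches(data, batch_size=200):
    """Split `data` into consecutive lists of at most `batch_size` items."""
    if not 1 <= batch_size <= 200:
        raise ValueError('batch_size must be a number from 1 to 200')
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


texts = ['Text to extract from'] * 300
batches = split_into_batches(texts)
print([len(batch) for batch in batches])
```

Each batch can then be sent with its own `ml.extractors.extract('[MODEL_ID]', data=batch)` call, and the `response.body` lists joined in order.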
578 | 579 | #### [Extractor detail](https://monkeylearn.com/api/v3/?shell#extractor-detail) 580 | 581 | 582 | ```python 583 | def MonkeyLearn.extractors.detail(model_id, retry_if_throttled=True) 584 | ``` 585 | 586 | Parameters: 587 | 588 | | Parameter |Type | Description | 589 | |--------------------|-------------------|-----------------------------------------------------------| 590 | |*model_id* |`str` |Extractor ID. It always starts with `'ex'`, for example, `'ex_oJNMkt2V'`. | 591 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 592 | 593 | Example: 594 | 595 | ```python 596 | response = ml.extractors.detail('[MODEL_ID]') 597 | ``` 598 | 599 |
600 | 601 | #### [List extractors](https://monkeylearn.com/api/v3/?shell#list-extractors) 602 | 603 | 604 | ```python 605 | def MonkeyLearn.extractors.list(page=1, per_page=20, order_by='-created', retry_if_throttled=True) 606 | ``` 607 | 608 | Parameters: 609 | 610 | |Parameter |Type | Description | 611 | |---------------------|-------------------|-------------| 612 | |*page* |`int` |Specifies which page to get.| 613 | |*per_page* |`int` |Specifies how many items per page will be returned. | 614 | |*order_by* |`string or list` |Specifies the ordering criteria. It can either be a *string* for single criteria ordering or a *list of strings* for more than one. Each *string* must be a valid field name; if you want inverse/descending order of the field prepend a `-` (dash) character. Some valid examples are: `'is_public'`, `'-name'` or `['-is_public', 'name']`. | 615 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 616 | 617 | Example: 618 | 619 | ```python 620 | response = ml.extractors.list(page=2, per_page=5, order_by=['-is_public', 'name']) 621 | ``` 622 | 623 | ### Workflows 624 | 625 | #### [Workflow detail](https://monkeylearn.com/api/v3/#workflow-detail) 626 | 627 | ```python 628 | def MonkeyLearn.workflows.detail(model_id, step_id, retry_if_throttled=True) 629 | ``` 630 | 631 | Parameters: 632 | 633 | | Parameter |Type | Description | 634 | |--------------------|-------------------|-----------------------------------------------------------| 635 | |*model_id* |`str` |Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`. | 636 | |*step_id* |`int` |Step ID. | 637 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 638 | 639 | Example: 640 | 641 | ```python 642 | response = ml.workflows.detail('[MODEL_ID]', '[STEP_ID]') 643 | ``` 644 | 645 |
646 | 647 | #### [Create workflow](https://monkeylearn.com/api/v3/#create-workflow) 648 | 649 | ```python 650 | def MonkeyLearn.workflows.create(name, db_name, steps, description='', webhook_url=None, 651 | custom_fields=None, sources=None, retry_if_throttled=True) 652 | ``` 653 | 654 | Parameters: 655 | 656 | Parameter | Type | Description 657 | --------- | ------- | ----------- 658 | *name* | `str` | The name of the model. 659 | *db_name* | `str` | The name of the database where the data will be stored. The name must not already be in use by another database. Note that the client currently warns that this parameter is ignored by the API and will be removed. 660 | *steps* | `list[dict]` | A list of step dicts. 661 | *description* | `str` | The description of the model. 662 | *webhook_url* | `str` | A URL that will be called when an action is triggered. 663 | *custom_fields* | `list[dict]` | A list of custom_field dicts that represent user-defined fields that come with the input data and that will be saved. It does not include the mandatory `text` field. 664 | *sources* | `dict` | An object that represents the data sources of the workflow. 665 | 666 | Example: 667 | 668 | ```python 669 | response = ml.workflows.create( 670 | name='Example Workflow', 671 | db_name='example_workflow', 672 | steps=[{ 673 | 'name': 'sentiment', 674 | 'model_id': 'cl_pi3C7JiL' 675 | }, { 676 | 'name': 'keywords', 677 | 'model_id': 'ex_YCya9nrn' 678 | }]) 679 | ``` 680 | 681 | 
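To make the shape of the request clearer, here is a minimal sketch of how the arguments of `workflows.create` could be assembled into a JSON body. `build_workflow_payload` is a hypothetical helper written for this illustration; it assumes, based on `monkeylearn/workflows.py`, that unset options are dropped and that `db_name` is not sent. Note that the keys of each step dict are quoted strings.

```python
import json


def build_workflow_payload(name, steps, description='', webhook_url=None,
                           custom_fields=None, sources=None):
    """Assemble the request body, leaving out options that were not set."""
    payload = {
        'name': name,
        'description': description,
        'webhook_url': webhook_url,
        'steps': steps,
        'custom_fields': custom_fields,
        'sources': sources,
    }
    # Drop every option whose value is None so it is not sent at all.
    return {key: value for key, value in payload.items() if value is not None}


payload = build_workflow_payload(
    name='Example Workflow',
    steps=[
        {'name': 'sentiment', 'model_id': 'cl_pi3C7JiL'},
        {'name': 'keywords', 'model_id': 'ex_YCya9nrn'},
    ],
)
print(json.dumps(payload, indent=2, sort_keys=True))
```

Only `name`, `description` and `steps` end up in the body here, because the remaining options were left at `None`.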
682 | 683 | #### [Delete workflow](https://monkeylearn.com/api/v3/#delete-workflow) 684 | 685 | ```python 686 | def MonkeyLearn.workflows.delete(model_id, retry_if_throttled=True) 687 | ``` 688 | 689 | Parameters: 690 | 691 | | Parameter |Type | Description | 692 | |--------------------|-------------------|-----------------------------------------------------------| 693 | |*model_id* |`str` |Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`. | 694 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 695 | 696 | Example: 697 | 698 | ```python 699 | response = ml.workflows.delete('[MODEL_ID]') 700 | ``` 701 | 702 |
703 | 704 | #### [Step detail](https://monkeylearn.com/api/v3/#step-detail) 705 | 706 | ```python 707 | def MonkeyLearn.workflows.steps.detail(model_id, step_id, retry_if_throttled=True) 708 | ``` 709 | 710 | Parameters: 711 | 712 | | Parameter |Type | Description | 713 | |--------------------|-------------------|-----------------------------------------------------------| 714 | |*model_id* |`str` |Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`. | 715 | |*step_id* |`int` |Step ID. | 716 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 717 | 718 | Example: 719 | 720 | ```python 721 | response = ml.workflows.steps.detail('[MODEL_ID]', STEP_ID) 722 | ``` 723 | 724 | 
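Step endpoints are nested under their workflow, so both IDs end up in the request URL. The following sketch shows how such a URL could be put together; `step_url` is a hypothetical helper, and the URL layout is an assumption based on the URL-building methods in `monkeylearn/base.py` rather than on the API documentation.

```python
BASE_URL = 'https://api.monkeylearn.com/'


def step_url(workflow_id, step_id=None, base_url=BASE_URL):
    """Build the steps URL of a workflow, or of one step when step_id is given."""
    url = '{}v3/workflows/{}/steps/'.format(base_url, workflow_id)
    if step_id is not None:
        # A single step is addressed by appending its ID to the list URL.
        url += '{}/'.format(step_id)
    return url


print(step_url('wf_oJNMkt2V'))
print(step_url('wf_oJNMkt2V', 12))
```

The first call yields the URL a new step would be posted to, the second the URL of one existing step.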
725 | 726 | #### [Create step](https://monkeylearn.com/api/v3/#create-step) 727 | 728 | ```python 729 | def MonkeyLearn.workflows.steps.create(model_id, name, step_model_id, input=None, 730 | conditions=None, retry_if_throttled=True) 731 | ``` 732 | 733 | Parameters: 734 | 735 | | Parameter |Type | Description | 736 | |--------------------|-------------------|-----------------------------------------------------------| 737 | |*model_id* |`str` |Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`. | 738 | |*name* |`str` |The name of the new step. | 739 | |*step_model_id* |`str` |The ID of the MonkeyLearn model that will run in this step. Must be an existing classifier or extractor. | 740 | |*input* |`str` |Where the input text to use in this step comes from. It can be either the name of a step or `input_data` (the default), which means that the input will be the original text. | 741 | |*conditions* |`list[dict]` |A list of condition dicts that indicate whether this step should execute or not. All the conditions in the list must be true for the step to execute. | 742 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 743 | 744 | Example: 745 | 746 | ```python 747 | response = ml.workflows.steps.create(model_id='[MODEL_ID]', name='sentiment', 748 | step_model_id='cl_pi3C7JiL') 749 | ``` 750 | 751 |
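The `input` parameter is what lets one step consume the output of another. As a rough sketch, the hypothetical helper below builds the body for two chained steps; it assumes, following `monkeylearn/workflows.py`, that `step_model_id` is sent under the key `model_id` and that unset options are omitted. The parameter is named `step_input` here only to avoid shadowing Python's built-in `input`.

```python
def build_step_payload(name, step_model_id, step_input=None, conditions=None):
    """Assemble the body for a new step, omitting options that were not set."""
    payload = {
        'name': name,
        'model_id': step_model_id,  # the client sends step_model_id as model_id
        'input': step_input,
        'conditions': conditions,
    }
    return {key: value for key, value in payload.items() if value is not None}


# The first step reads the original text (the default input).
first = build_step_payload('sentiment', 'cl_pi3C7JiL')
# The second step reads its text from the step named 'sentiment'.
second = build_step_payload('keywords', 'ex_YCya9nrn', step_input='sentiment')
print(first)
print(second)
```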
752 | 753 | #### [Delete step](https://monkeylearn.com/api/v3/#delete-step) 754 | 755 | ```python 756 | def MonkeyLearn.workflows.steps.delete(model_id, step_id, retry_if_throttled=True) 757 | ``` 758 | 759 | Parameters: 760 | 761 | | Parameter |Type | Description | 762 | |--------------------|-------------------|-----------------------------------------------------------| 763 | |*model_id* |`str` |Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`. | 764 | |*step_id* |`int` |Step ID. | 765 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 766 | 767 | Example: 768 | 769 | ```python 770 | response = ml.workflows.steps.delete('[MODEL_ID]', STEP_ID) 771 | ``` 772 | 773 |
774 | 775 | #### [Upload workflow data](https://monkeylearn.com/api/v3/#upload-workflow-data) 776 | 777 | ```python 778 | def MonkeyLearn.workflows.data.create(model_id, data, retry_if_throttled=True) 779 | ``` 780 | 781 | Parameters: 782 | 783 | | Parameter |Type | Description | 784 | |--------------------|-------------------|-----------------------------------------------------------| 785 | |*model_id* |`str` |Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`. | 786 | |*data* |`list[dict]` |A list of dicts with the keys described below. | 787 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 788 | 789 | `data` dict keys: 790 | 791 | |Key | Description | 792 | |--------- | ----------- | 793 | |text | A *string* of the text to upload.| 794 | |[custom field name] | The value for a custom field for this text. The type of the value must be the one specified when the field was created.| 795 | 796 | 797 | Example: 798 | 799 | ```python 800 | response = ml.workflows.data.create( 801 | model_id='[MODEL_ID]', 802 | data=[{'text': 'text 1', 'rating': 3}, 803 | {'text': 'text 2', 'rating': 4}] 804 | ) 805 | ``` 806 | 807 | 
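When there are many rows to upload, it can help to check them and split them into smaller lists before calling `data.create`. The sketch below is one possible way to do that; `check_rows` and `chunked` are hypothetical helpers, and the batch size of 3 is chosen only to keep the example short, not because the API requires it.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def check_rows(rows):
    """Raise ValueError if any row lacks a string value under 'text'."""
    for index, row in enumerate(rows):
        if not isinstance(row.get('text'), str):
            raise ValueError('row {} has no text'.format(index))
    return rows


rows = check_rows([{'text': 'text {}'.format(i), 'rating': i % 5} for i in range(7)])
batches = list(chunked(rows, 3))
print([len(batch) for batch in batches])

# Each batch could then be uploaded with a call such as:
# ml.workflows.data.create(model_id='[MODEL_ID]', data=batch)
```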
808 | 809 | #### [List workflow data](https://monkeylearn.com/api/v3/#list-workflow-data) 810 | 811 | ```python 812 | def MonkeyLearn.workflows.data.list(model_id, batch_id=None, is_processed=None, 813 | sent_to_process_date_from=None, sent_to_process_date_to=None, 814 | page=None, per_page=None, retry_if_throttled=True) 815 | ``` 816 | 817 | Parameters: 818 | 819 | Parameter | Type | Description 820 | --------- | ------- | ----------- 821 | page | `int` | The page number to be retrieved. 822 | per_page | `int` | The maximum number of items the page should have. The maximum allowed value is `50`. 823 | batch_id | `int` | The ID of the batch to retrieve. If unspecified, data from all batches is shown. 824 | is_processed | `bool` | Whether to return data that has been processed or data that has not been processed yet. If unspecified, both are shown indistinctly. 825 | sent_to_process_date_from | `str` | An [ISO formatted date](https://en.wikipedia.org/wiki/ISO_8601) which specifies the oldest `sent_date` of the data to be retrieved. 826 | sent_to_process_date_to | `str` | An [ISO formatted date](https://en.wikipedia.org/wiki/ISO_8601) which specifies the most recent `sent_date` of the data to be retrieved. 827 | 828 | Example: 829 | 830 | ```python 831 | response = ml.workflows.data.list('[MODEL_ID]', batch_id=1839, page=1) 832 | ``` 833 | 834 |
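The date filters are plain strings, so they can be produced from `datetime` objects. Here is a minimal sketch, assuming the output of `datetime.isoformat()` is an acceptable ISO 8601 form for the API; `data_list_params` is a hypothetical helper and the dates are made up for the example.

```python
from datetime import datetime, timedelta


def data_list_params(batch_id=None, is_processed=None, date_from=None, date_to=None,
                     page=None, per_page=None):
    """Build keyword arguments for data.list, skipping filters that were not set."""
    params = {
        'batch_id': batch_id,
        'is_processed': is_processed,
        'sent_to_process_date_from': date_from.isoformat() if date_from else None,
        'sent_to_process_date_to': date_to.isoformat() if date_to else None,
        'page': page,
        'per_page': per_page,
    }
    # Only None is dropped, so is_processed=False is kept as a real filter.
    return {key: value for key, value in params.items() if value is not None}


end = datetime(2018, 6, 30, 12, 0, 0)
params = data_list_params(is_processed=False, date_from=end - timedelta(days=7),
                          date_to=end, per_page=50)
print(params)
```

Because the keys match the parameter names listed above, the result could be passed on as `ml.workflows.data.list('[MODEL_ID]', **params)`.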
835 | 836 | #### [Create custom field](https://monkeylearn.com/api/v3/#create-custom-field) 837 | 838 | 839 | ```python 840 | def MonkeyLearn.workflows.custom_fields.create(model_id, name, data_type, retry_if_throttled=True) 841 | ``` 842 | 843 | Parameters: 844 | 845 | | Parameter |Type | Description | 846 | |--------------------|-------------------|-----------------------------------------------------------| 847 | |*model_id* |`str` |Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`. | 848 | |*name* |`str` |The name of the new custom field. | 849 | |*data_type* |`str` |The type of the data of the field. It must be one of `string`, `date`, `text`, `integer`, `float`, `bool`. | 850 | |*retry_if_throttled* |`bool` |If a request is [throttled](https://monkeylearn.com/api/v3/#query-limits), sleep and retry the request. | 851 | 852 | Example: 853 | 854 | ```python 855 | response = ml.workflows.custom_fields.create(model_id='[MODEL_ID]', name='rating', 856 | data_type='integer') 857 | ``` 858 | -------------------------------------------------------------------------------- /monkeylearn/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import print_function, unicode_literals, division, absolute_import 3 | 4 | from monkeylearn.settings import DEFAULT_BASE_URL 5 | from monkeylearn.classification import Classification 6 | from monkeylearn.extraction import Extraction 7 | from monkeylearn.workflows import Workflows 8 | 9 | 10 | class MonkeyLearn(object): 11 | def __init__(self, token, base_url=DEFAULT_BASE_URL): 12 | self.token = token 13 | self.base_url = base_url 14 | 15 | @property 16 | def classifiers(self): 17 | if not hasattr(self, '_classifiers'): 18 | self._classifiers = Classification(token=self.token, base_url=self.base_url) 19 | return self._classifiers 20 | 21 | @property 22 | def extractors(self): 23 | if not hasattr(self, '_extractors'): 24 | 
self._extractors = Extraction(token=self.token, base_url=self.base_url) 25 | return self._extractors 26 | 27 | @property 28 | def workflows(self): 29 | if not hasattr(self, '_workflows'): 30 | self._workflows = Workflows(token=self.token, base_url=self.base_url) 31 | return self._workflows 32 | -------------------------------------------------------------------------------- /monkeylearn/base.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import print_function, unicode_literals, division, absolute_import 3 | 4 | import json 5 | import time 6 | import pkg_resources 7 | 8 | import six 9 | from six.moves.urllib.parse import urlencode 10 | import requests 11 | 12 | from monkeylearn.settings import DEFAULT_BASE_URL 13 | 14 | try: 15 | version = pkg_resources.get_distribution('monkeylearn').version 16 | except Exception: 17 | version = 'noversion' 18 | 19 | 20 | class ModelEndpointSet(object): 21 | def __init__(self, token, base_url=DEFAULT_BASE_URL): 22 | self.token = token 23 | self.base_url = base_url 24 | 25 | def _add_action_or_query_string(self, url, action, query_string): 26 | if action is not None: 27 | url += '{}/'.format(action) 28 | if query_string is not None: 29 | url += '?' 
+ urlencode(query_string) 30 | return url 31 | 32 | def get_list_url(self, action=None, query_string=None): 33 | url = '{}v3/{}/'.format(self.base_url, self.model_type) 34 | return self._add_action_or_query_string(url, action, query_string) 35 | 36 | def get_detail_url(self, model_id, action=None, query_string=None): 37 | url = '{}{}/'.format(self.get_list_url(), model_id) 38 | return self._add_action_or_query_string(url, action, query_string) 39 | 40 | def get_nested_list_url(self, parent_id, action=None, query_string=None): 41 | url = '{}v3/{}/{}/{}/'.format( 42 | self.base_url, self.model_type[0], parent_id, self.model_type[1] 43 | ) 44 | return self._add_action_or_query_string(url, action, query_string) 45 | 46 | def get_nested_detail_url(self, parent_id, children_id, action=None, query_string=None): 47 | url = '{}{}/'.format(self.get_nested_list_url(parent_id, action=None), children_id) 48 | return self._add_action_or_query_string(url, action, query_string) 49 | 50 | def make_request(self, method, url, data=None, retry_if_throttled=True, params=None): 51 | if data is not None: 52 | data = json.dumps(data) 53 | 54 | retries_left = 3 55 | while retries_left: 56 | 57 | response = requests.request(method, url, data=data, params=params, headers={ 58 | 'Authorization': 'Token ' + self.token, 59 | 'Content-Type': 'application/json', 60 | 'User-Agent': 'python-sdk-{}'.format(version), 61 | }) 62 | 63 | # Fall back to an empty dict so `body` is never unbound or stale in the 429 check 64 | body = response.json() if response.content else {} 65 | 66 | if retry_if_throttled and response.status_code == 429: 67 | error_code = body.get('error_code') 68 | 69 | wait = None 70 | if error_code in ('PLAN_RATE_LIMIT', 'CONCURRENCY_RATE_LIMIT'): 71 | wait = int(body.get('seconds_to_wait', 2)) 72 | 73 | if wait: 74 | time.sleep(wait) 75 | retries_left -= 1 76 | continue 77 | 78 | return response 79 | return response 80 | 81 | def remove_none_value(self, d): 82 | return {k: v for k, v in six.iteritems(d) if v is not None} 83 | 
-------------------------------------------------------------------------------- /monkeylearn/classification.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import print_function, unicode_literals, division, absolute_import 3 | 4 | from six.moves import range 5 | 6 | from monkeylearn.base import ModelEndpointSet 7 | from monkeylearn.response import MonkeyLearnResponse 8 | from monkeylearn.settings import DEFAULT_BATCH_SIZE 9 | from monkeylearn.validation import validate_batch_size, validate_order_by_param 10 | 11 | 12 | class Classification(ModelEndpointSet): 13 | model_type = 'classifiers' 14 | 15 | @property 16 | def tags(self): 17 | if not hasattr(self, '_tags'): 18 | self._tags = Tags(self.token, self.base_url) 19 | return self._tags 20 | 21 | def list(self, page=None, per_page=None, order_by=None, retry_if_throttled=True): 22 | if order_by is not None: 23 | order_by = validate_order_by_param(order_by) 24 | query_string = self.remove_none_value(dict( 25 | page=page, 26 | per_page=per_page, 27 | order_by=order_by, 28 | )) 29 | url = self.get_list_url(query_string=query_string) 30 | response = self.make_request('GET', url, retry_if_throttled=retry_if_throttled) 31 | return MonkeyLearnResponse(response) 32 | 33 | def detail(self, model_id, retry_if_throttled=True): 34 | url = self.get_detail_url(model_id) 35 | response = self.make_request('GET', url, retry_if_throttled=retry_if_throttled) 36 | return MonkeyLearnResponse(response) 37 | 38 | def edit(self, model_id, name=None, description=None, algorithm=None, language=None, 39 | max_features=None, ngram_range=None, use_stemming=None, preprocess_numbers=None, 40 | preprocess_social_media=None, normalize_weights=None, stopwords=None, 41 | whitelist=None, retry_if_throttled=True): 42 | data = self.remove_none_value({ 43 | 'name': name, 44 | 'description': description, 45 | 'algorithm': algorithm, 46 | 'language': language, 47 | 
'max_features': max_features, 48 | 'ngram_range': ngram_range, 49 | 'use_stemming': use_stemming, 50 | 'preprocess_numbers': preprocess_numbers, 51 | 'preprocess_social_media': preprocess_social_media, 52 | 'normalize_weights': normalize_weights, 53 | 'stopwords': stopwords, 54 | 'whitelist': whitelist, 55 | }) 56 | 57 | url = self.get_detail_url(model_id) 58 | response = self.make_request('PATCH', url, data, retry_if_throttled=retry_if_throttled) 59 | return MonkeyLearnResponse(response) 60 | 61 | def deploy(self, model_id, retry_if_throttled=True): 62 | url = self.get_detail_url(model_id, action='deploy') 63 | response = self.make_request('POST', url, retry_if_throttled=retry_if_throttled) 64 | return MonkeyLearnResponse(response) 65 | 66 | def train(self, model_id, retry_if_throttled=True): 67 | url = self.get_detail_url(model_id, action='train') 68 | response = self.make_request('POST', url, retry_if_throttled=retry_if_throttled) 69 | return MonkeyLearnResponse(response) 70 | 71 | def delete(self, model_id, retry_if_throttled=True): 72 | url = self.get_detail_url(model_id) 73 | response = self.make_request('DELETE', url, retry_if_throttled=retry_if_throttled) 74 | return MonkeyLearnResponse(response) 75 | 76 | def create(self, name, description='', algorithm='svm', language='en', max_features=10000, 77 | ngram_range=(1, 2), use_stemming=True, preprocess_numbers=True, 78 | preprocess_social_media=False, normalize_weights=True, stopwords=True, 79 | whitelist=None, retry_if_throttled=True): 80 | data = self.remove_none_value({ 81 | 'name': name, 82 | 'description': description, 83 | 'algorithm': algorithm, 84 | 'language': language, 85 | 'max_features': max_features, 86 | 'ngram_range': ngram_range, 87 | 'use_stemming': use_stemming, 88 | 'preprocess_numbers': preprocess_numbers, 89 | 'preprocess_social_media': preprocess_social_media, 90 | 'normalize_weights': normalize_weights, 91 | 'stopwords': stopwords, 92 | 'whitelist': whitelist, 93 | }) 94 | url = 
self.get_list_url() 95 | response = self.make_request('POST', url, data, retry_if_throttled=retry_if_throttled) 96 | return MonkeyLearnResponse(response) 97 | 98 | def classify(self, model_id, data, production_model=False, batch_size=DEFAULT_BATCH_SIZE, 99 | auto_batch=True, retry_if_throttled=True): 100 | validate_batch_size(batch_size) 101 | 102 | url = self.get_detail_url(model_id, action='classify') 103 | 104 | response = MonkeyLearnResponse() 105 | for i in range(0, len(data), batch_size): 106 | data_dict = self.remove_none_value({ 107 | 'data': data[i:i + batch_size], 108 | 'production_model': production_model, 109 | }) 110 | raw_response = self.make_request('POST', url, data_dict, 111 | retry_if_throttled=retry_if_throttled) 112 | response.add_raw_response(raw_response) 113 | 114 | return response 115 | 116 | def upload_data(self, model_id, data, input_duplicates_strategy=None, 117 | existing_duplicates_strategy=None, retry_if_throttled=True): 118 | url = self.get_detail_url(model_id, action='data') 119 | data_dict = {'data': data} 120 | data_dict = self.remove_none_value({ 121 | 'data': data, 122 | 'input_duplicates_strategy': input_duplicates_strategy, 123 | 'existing_duplicates_strategy': existing_duplicates_strategy 124 | }) 125 | response = self.make_request('POST', url, data_dict, retry_if_throttled=retry_if_throttled) 126 | return MonkeyLearnResponse(response) 127 | 128 | 129 | class Tags(ModelEndpointSet): 130 | model_type = ('classifiers', 'tags') 131 | 132 | def detail(self, model_id, tag_id, retry_if_throttled=True): 133 | url = self.get_nested_detail_url(model_id, tag_id) 134 | response = self.make_request('GET', url, retry_if_throttled=retry_if_throttled) 135 | return MonkeyLearnResponse(response) 136 | 137 | def create(self, model_id, name, retry_if_throttled=True): 138 | data = self.remove_none_value({ 139 | 'name': name, 140 | }) 141 | url = self.get_nested_list_url(model_id) 142 | response = self.make_request('POST', url, data, 
retry_if_throttled=retry_if_throttled) 143 | return MonkeyLearnResponse(response) 144 | 145 | def edit(self, model_id, tag_id, name=None, retry_if_throttled=True): 146 | data = self.remove_none_value({ 147 | 'name': name, 148 | }) 149 | url = self.get_nested_detail_url(model_id, tag_id) 150 | response = self.make_request('PATCH', url, data, retry_if_throttled=retry_if_throttled) 151 | return MonkeyLearnResponse(response) 152 | 153 | def delete(self, model_id, tag_id, move_data_to=None, retry_if_throttled=True): 154 | data = self.remove_none_value({ 155 | 'move_data_to': move_data_to, 156 | }) 157 | url = self.get_nested_detail_url(model_id, tag_id) 158 | response = self.make_request('DELETE', url, data, retry_if_throttled=retry_if_throttled) 159 | return MonkeyLearnResponse(response) 160 | -------------------------------------------------------------------------------- /monkeylearn/exceptions.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import print_function, unicode_literals, division, absolute_import 3 | 4 | 5 | class MonkeyLearnException(Exception): 6 | pass 7 | 8 | 9 | class MonkeyLearnLocalException(MonkeyLearnException): 10 | pass 11 | 12 | 13 | class LocalParamValidationError(MonkeyLearnLocalException): 14 | pass 15 | 16 | 17 | class MonkeyLearnResponseException(MonkeyLearnException): 18 | def __init__(self, status_code=500, detail='Internal server error', 19 | error_code=None, response=None): 20 | self.detail = detail 21 | self.error_code = error_code 22 | self.status_code = status_code 23 | self.response = response 24 | 25 | message = 'Error' 26 | if error_code: 27 | message += ' {}'.format(error_code) 28 | message += ': {}'.format(detail) 29 | 30 | super(Exception, self).__init__(message) 31 | 32 | 33 | # Request Validation Errors (422) 34 | 35 | 36 | class RequestParamsError(MonkeyLearnResponseException): 37 | pass 38 | 39 | 40 | # Authentication (401) 41 | 42 | 43 | 
class AuthenticationError(MonkeyLearnResponseException): 44 | pass 45 | 46 | 47 | # Forbidden (403) 48 | 49 | 50 | class ForbiddenError(MonkeyLearnResponseException): 51 | pass 52 | 53 | 54 | class ModelLimitError(ForbiddenError): 55 | pass 56 | 57 | 58 | # Not found Exceptions (404) 59 | 60 | 61 | class ResourceNotFound(MonkeyLearnResponseException): 62 | pass 63 | 64 | 65 | class ModelNotFound(ResourceNotFound): 66 | pass 67 | 68 | 69 | class TagNotFound(ResourceNotFound): 70 | pass 71 | 72 | 73 | # Rate limit (429) 74 | 75 | 76 | class PlanQueryLimitError(MonkeyLearnResponseException): 77 | pass 78 | 79 | 80 | class RateLimitError(MonkeyLearnResponseException): 81 | pass 82 | 83 | 84 | class PlanRateLimitError(RateLimitError): 85 | def __init__(self, seconds_to_wait=60, *args, **kwargs): 86 | self.seconds_to_wait = seconds_to_wait 87 | super(RateLimitError, self).__init__(*args, **kwargs) 88 | 89 | 90 | class ConcurrencyRateLimitError(RateLimitError): 91 | pass 92 | 93 | 94 | # State errors (423) 95 | 96 | 97 | class ModelStateError(MonkeyLearnResponseException): 98 | pass 99 | 100 | 101 | RESPONSE_CODES_EXCEPTION_MAP = { 102 | 422: RequestParamsError, 103 | 401: AuthenticationError, 104 | 403: { 105 | 'MODEL_LIMIT': ModelLimitError, 106 | '*': ForbiddenError, 107 | }, 108 | 404: { 109 | 'MODEL_NOT_FOUND': ModelNotFound, 110 | 'TAG_NOT_FOUND': TagNotFound, 111 | '*': ResourceNotFound, 112 | }, 113 | 429: { 114 | 'PLAN_RATE_LIMIT': PlanRateLimitError, 115 | 'CONCURRENCY_RATE_LIMIT': ConcurrencyRateLimitError, 116 | 'PLAN_QUERY_LIMIT': PlanQueryLimitError, 117 | '*': RateLimitError, 118 | }, 119 | 423: ModelStateError, 120 | } 121 | 122 | 123 | def get_exception_class(status_code, error_code=None): 124 | exception_or_dict = RESPONSE_CODES_EXCEPTION_MAP.get(status_code, MonkeyLearnResponseException) 125 | if isinstance(exception_or_dict, dict): 126 | exception_class = exception_or_dict.get(error_code, exception_or_dict['*']) 127 | else: 128 | exception_class = 
exception_or_dict 129 | return exception_class 130 | -------------------------------------------------------------------------------- /monkeylearn/extraction.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import print_function, unicode_literals, division, absolute_import 3 | 4 | from six.moves import range 5 | 6 | from monkeylearn.base import ModelEndpointSet 7 | from monkeylearn.settings import DEFAULT_BATCH_SIZE 8 | from monkeylearn.response import MonkeyLearnResponse 9 | from monkeylearn.validation import validate_batch_size, validate_order_by_param 10 | 11 | 12 | class Extraction(ModelEndpointSet): 13 | model_type = 'extractors' 14 | 15 | def list(self, page=None, per_page=None, order_by=None, retry_if_throttled=True): 16 | if order_by is not None: 17 | order_by = validate_order_by_param(order_by) 18 | query_string = self.remove_none_value(dict( 19 | page=page, 20 | per_page=per_page, 21 | order_by=order_by, 22 | )) 23 | url = self.get_list_url(query_string=query_string) 24 | response = self.make_request('GET', url, retry_if_throttled=retry_if_throttled) 25 | return MonkeyLearnResponse(response) 26 | 27 | def detail(self, model_id, retry_if_throttled=True): 28 | url = self.get_detail_url(model_id) 29 | response = self.make_request('GET', url, retry_if_throttled=retry_if_throttled) 30 | return MonkeyLearnResponse(response) 31 | 32 | def extract(self, model_id, data, production_model=False, batch_size=DEFAULT_BATCH_SIZE, 33 | retry_if_throttled=True, extra_args=None): 34 | if extra_args is None: 35 | extra_args = {} 36 | 37 | validate_batch_size(batch_size) 38 | 39 | url = self.get_detail_url(model_id, action='extract') 40 | 41 | response = MonkeyLearnResponse() 42 | for i in range(0, len(data), batch_size): 43 | data_dict = self.remove_none_value({ 44 | 'data': data[i:i + batch_size], 45 | 'production_model': production_model, 46 | }) 47 | data_dict.update(extra_args) 48 | 
raw_response = self.make_request('POST', url, data_dict, 49 | retry_if_throttled=retry_if_throttled) 50 | response.add_raw_response(raw_response) 51 | 52 | return response 53 | -------------------------------------------------------------------------------- /monkeylearn/response.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import print_function, unicode_literals, division, absolute_import 3 | 4 | import requests 5 | 6 | from monkeylearn.exceptions import ( 7 | MonkeyLearnResponseException, get_exception_class, PlanRateLimitError 8 | ) 9 | 10 | 11 | class MonkeyLearnResponse(object): 12 | def __init__(self, raw_responses=None): 13 | if raw_responses is None: 14 | raw_responses = [] 15 | elif isinstance(raw_responses, requests.Response): 16 | raw_responses = [raw_responses] 17 | 18 | self.raw_responses = [] 19 | for rr in raw_responses: 20 | self.add_raw_response(rr) 21 | 22 | self._cached_body = None 23 | 24 | def _get_last_request_header(self, header_name): 25 | try: 26 | last_response = self.raw_responses[-1] 27 | except IndexError: 28 | return None 29 | return last_response.headers.get(header_name) 30 | 31 | @property 32 | def request_count(self): 33 | return len(self.raw_responses) 34 | 35 | @property 36 | def plan_queries_allowed(self): 37 | return int(self._get_last_request_header('X-Query-Limit-Limit')) 38 | 39 | @property 40 | def plan_queries_remaining(self): 41 | return int(self._get_last_request_header('X-Query-Limit-Remaining')) 42 | 43 | @property 44 | def request_queries_used(self): 45 | query_count = 0 46 | for r in self.raw_responses: 47 | query_count += int(r.headers['X-Query-Limit-Request-Queries']) 48 | return query_count 49 | 50 | @property 51 | def body(self): 52 | if not self._cached_body: 53 | if self.request_count == 1: 54 | body = self.raw_responses[0].json() if self.raw_responses[0].content else None 55 | else: 56 | # Batched response, assume 2xx 
response bodies are lists (classify, extract) 57 | body = [result for rr in self.raw_responses if rr.content for result in rr.json()] 58 | self._cached_body = body 59 | return self._cached_body 60 | 61 | def failed_raw_responses(self): 62 | return [r for r in self if r.status_code != requests.codes.ok] 63 | 64 | def successful_raw_responses(self): 65 | return [r for r in self if r.status_code == requests.codes.ok] 66 | 67 | def __iter__(self): 68 | for r in self.raw_responses: 69 | yield r 70 | 71 | def add_raw_response(self, raw_response): 72 | # Invalidate cached body 73 | self._cached_body = None 74 | self.raw_responses.append(raw_response) 75 | if raw_response.status_code != requests.codes.ok: 76 | self.raise_for_status(raw_response) 77 | 78 | def raise_for_status(self, raw_response): 79 | try: 80 | body = raw_response.json() 81 | except ValueError: 82 | raise MonkeyLearnResponseException(status_code=raw_response.status_code, 83 | detail='Non-JSON response from server') 84 | 85 | exception_class = get_exception_class(status_code=raw_response.status_code, 86 | error_code=body.get('error_code')) 87 | exception_kwargs = dict( 88 | status_code=raw_response.status_code, 89 | detail=body.get('detail', 'Internal server error'), 90 | error_code=body.get('error_code'), 91 | response=self, 92 | ) 93 | if exception_class == PlanRateLimitError: 94 | seconds_to_wait = int(body.get('seconds_to_wait', 60)) 95 | exception_kwargs['seconds_to_wait'] = seconds_to_wait 96 | 97 | raise exception_class(**exception_kwargs) 98 | -------------------------------------------------------------------------------- /monkeylearn/settings.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import print_function, unicode_literals, division, absolute_import 3 | 4 | 5 | DEFAULT_BATCH_SIZE = 200 6 | MAX_BATCH_SIZE = 500 7 | DEFAULT_BASE_URL = 'https://api.monkeylearn.com/' 8 | 
-------------------------------------------------------------------------------- /monkeylearn/validation.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import print_function, unicode_literals, division, absolute_import 3 | 4 | import six 5 | import re 6 | 7 | from monkeylearn.settings import MAX_BATCH_SIZE 8 | from monkeylearn.exceptions import LocalParamValidationError 9 | 10 | 11 | ORDER_BY_FIELD_RE = re.compile(r'^-?[a-z_]+$') 12 | 13 | 14 | def validate_batch_size(batch_size): 15 | if batch_size > MAX_BATCH_SIZE: 16 | raise LocalParamValidationError('batch_size must not exceed {0}'.format(MAX_BATCH_SIZE)) 17 | 18 | 19 | def validate_order_by_param(order_by_param): 20 | def validate_order_by_field(order_by_field): 21 | if ',' in order_by_field: 22 | raise LocalParamValidationError( 23 | "'order_by' parameter has an invalid ',' (comma) character, try sending a list of " 24 | "strings if you need to specify multiple fields." 25 | ) 26 | if not ORDER_BY_FIELD_RE.match(order_by_field): 27 | raise LocalParamValidationError( 28 | "'order_by' parameter as a string must be a valid field name, invalid characters " 29 | "were found." 
30 | ) 31 | return order_by_field 32 | 33 | order_by = [] 34 | 35 | if isinstance(order_by_param, six.string_types): 36 | order_by.append(validate_order_by_field(order_by_param)) 37 | else: 38 | try: 39 | order_by_param = list(order_by_param) 40 | except TypeError: 41 | raise LocalParamValidationError( 42 | "'order_by' parameter must be a string or a list (or iterable) of strings" 43 | ) 44 | 45 | if not len(order_by_param): 46 | raise LocalParamValidationError( 47 | "'order_by' parameter must be a list (or iterable) of at least one string" 48 | ) 49 | 50 | # Duplicates are checked against the fields already accepted into order_by 51 | for order_by_field in order_by_param: 52 | if not isinstance(order_by_field, six.string_types): 53 | raise LocalParamValidationError( 54 | "'order_by' parameter must be a list (or iterable) of strings, non-string " 55 | "values were found" 56 | ) 57 | 58 | order_by_field = validate_order_by_field(order_by_field) 59 | 60 | if order_by_field in order_by: 61 | raise LocalParamValidationError( 62 | "'order_by' parameter must be a list of unique field names, duplicates " 63 | "were found." 
            )

        order_by.append(order_by_field)

    return ','.join(order_by)

--------------------------------------------------------------------------------
/monkeylearn/workflows.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals, division, absolute_import

import warnings

from monkeylearn.base import ModelEndpointSet
from monkeylearn.response import MonkeyLearnResponse


class Workflows(ModelEndpointSet):
    model_type = 'workflows'

    # The nested endpoint sets below are created on first access and then cached.
    @property
    def steps(self):
        if not hasattr(self, '_steps'):
            self._steps = WorkflowSteps(self.token, self.base_url)
        return self._steps

    @property
    def data(self):
        if not hasattr(self, '_data'):
            self._data = WorkflowData(self.token, self.base_url)
        return self._data

    @property
    def custom_fields(self):
        if not hasattr(self, '_custom_fields'):
            self._custom_fields = WorkflowCustomFields(self.token, self.base_url)
        return self._custom_fields

    def create(self, name, db_name, steps, description='', webhook_url=None, custom_fields=None,
               sources=None, retry_if_throttled=True):
        # db_name is still a required positional argument, but it is not sent to the API.
        if db_name:
            warnings.warn('Note: db_name parameter is ignored by the API and will be removed soon')
        data = self.remove_none_value({
            'name': name,
            'description': description,
            'webhook_url': webhook_url,
            'steps': steps,
            'custom_fields': custom_fields,
            'sources': sources
        })
        url = self.get_list_url()
        response = self.make_request('POST', url, data, retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)

    def detail(self, model_id, retry_if_throttled=True):
        url = self.get_detail_url(model_id)
        response = self.make_request('GET', url, retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)

    def delete(self, model_id, retry_if_throttled=True):
        url = self.get_detail_url(model_id)
        response = self.make_request('DELETE', url, retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)


class WorkflowSteps(ModelEndpointSet):
    model_type = ('workflows', 'steps')

    def detail(self, model_id, step_id, retry_if_throttled=True):
        url = self.get_nested_list_url(model_id, step_id)
        response = self.make_request('GET', url, retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)

    def create(self, model_id, name, step_model_id, input=None, conditions=None,
               retry_if_throttled=True):
        # model_id identifies the workflow; step_model_id is sent as the step's 'model_id'.
        data = self.remove_none_value({
            'name': name,
            'model_id': step_model_id,
            'input': input,
            'conditions': conditions,
        })
        url = self.get_nested_list_url(model_id)
        response = self.make_request('POST', url, data, retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)

    def delete(self, model_id, step_id, retry_if_throttled=True):
        url = self.get_nested_list_url(model_id, step_id)
        response = self.make_request('DELETE', url, retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)


class WorkflowData(ModelEndpointSet):
    model_type = ('workflows', 'data')

    def create(self, model_id, data, retry_if_throttled=True):
        # Send the items wrapped under a 'data' key.
        data = {'data': data}
        url = self.get_nested_list_url(model_id)
        response = self.make_request('POST', url, data, retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)

    def list(self, model_id, batch_id=None, is_processed=None, sent_to_process_date_from=None,
             sent_to_process_date_to=None, page=None, per_page=None, retry_if_throttled=True):
        params = self.remove_none_value({
            'batch_id': batch_id,
            'is_processed': is_processed,
            'sent_to_process_date_from': sent_to_process_date_from,
            'sent_to_process_date_to': sent_to_process_date_to,
            'page': page,
            'per_page': per_page,
        })
        url = self.get_nested_list_url(model_id)
        response = self.make_request('GET', url, params=params,
                                     retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)


class WorkflowCustomFields(ModelEndpointSet):
    model_type = ('workflows', 'custom-fields')

    def create(self, model_id, name, data_type, retry_if_throttled=True):
        # data_type is sent to the API under the 'type' key.
        data = {'name': name, 'type': data_type}
        url = self.get_nested_list_url(model_id)
        response = self.make_request('POST', url, data, retry_if_throttled=retry_if_throttled)
        return MonkeyLearnResponse(response)

--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
[metadata]
description-file = README.md

[flake8]
max-line-length = 100

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from setuptools import setup, find_packages
from os import path


this_directory = path.abspath(path.dirname(__file__))
with open(path.join(this_directory, 'README.md')) as f:
    long_description = f.read()

setup(
    name='monkeylearn',
    version='3.6.0',
    author='MonkeyLearn',
    author_email='hello@monkeylearn.com',
    description='Official Python client for the MonkeyLearn API',
    long_description=long_description,
    long_description_content_type="text/markdown",
    url='https://github.com/monkeylearn/monkeylearn-python',
    # NOTE: the tarball tag below (v3.2.4) does not match the version above (3.6.0).
    download_url='https://github.com/monkeylearn/monkeylearn-python/tarball/v3.2.4',
    keywords=['monkeylearn', 'machine learning', 'python'],
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 3',
        'License :: OSI Approved :: MIT License',
    ],
    package_dir={'': '.'},
    packages=find_packages('.'),
    install_requires=[
        # use "pip install requests[security]" for taking out the warnings
        'requests>=2.8.1',
        'six>=1.10.0',
    ],
)

--------------------------------------------------------------------------------