├── tests
    ├── __init__.py
    ├── test_detectlanguage.py
    ├── test_api.py
    └── test_proxy.py
├── requirements.txt
├── setup.cfg
├── test-requirements.txt
├── detectlanguage
    ├── exceptions.py
    ├── configuration.py
    ├── __init__.py
    ├── client.py
    └── api.py
├── pytest.ini
├── mise.toml
├── .gitignore
├── .github
    └── workflows
    │   ├── main.yml
    │   └── publish.yml
├── CHANGELOG.md
├── setup.py
├── LICENSE
└── README.md

/tests/__init__.py:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | requests>=2.4.2
2 | 
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | description_file = README.md
3 | 
--------------------------------------------------------------------------------
/test-requirements.txt:
--------------------------------------------------------------------------------
1 | pytest>=7.0.0
2 | pytest-mock>=3.10.0
3 | pytest-cov>=4.0.0
4 | 
--------------------------------------------------------------------------------
/detectlanguage/exceptions.py:
--------------------------------------------------------------------------------
1 | # Derive from Exception rather than BaseException so that ordinary
2 | # `except Exception` handlers also catch API errors.
3 | class DetectLanguageError(Exception):
4 |     def __init__(self, *args, **kwargs):
5 |         super(DetectLanguageError, self).__init__(*args, **kwargs)
6 | 
--------------------------------------------------------------------------------
/detectlanguage/configuration.py:
--------------------------------------------------------------------------------
1 | import detectlanguage
2 | 
3 | class Configuration:
4 |     api_key = None
5 |     api_version = 'v3'
6 |     host = 'ws.detectlanguage.com'
7 |     user_agent = 'Detect Language API Python Client ' + detectlanguage.__version__
8 |     timeout = 5
9 |     proxies = None  # e.g., {'https': 'https://proxy.example.com:8080'}
10 | 
--------------------------------------------------------------------------------
/pytest.ini:
--------------------------------------------------------------------------------
1 | [pytest]
2 | testpaths = tests
3 | python_files = test_*.py
4 | python_classes = Test*
5 | python_functions = test_*
6 | addopts =
7 |     --verbose
8 |     --tb=short
9 |     --strict-markers
10 | markers =
11 |     slow: marks tests as slow (deselect with '-m "not slow"')
12 |     integration: marks tests as integration tests
--------------------------------------------------------------------------------
/detectlanguage/__init__.py:
--------------------------------------------------------------------------------
1 | __version__ = '2.0.0'
2 | 
3 | from .exceptions import *
4 | from .configuration import Configuration
5 | from .client import Client
6 | from .api import detect, detect_code, detect_batch, account_status, languages
7 | 
8 | # deprecated functions
9 | from .api import simple_detect, user_status
10 | 
11 | configuration = Configuration()
12 | client = Client(configuration)
13 | 
--------------------------------------------------------------------------------
/tests/test_detectlanguage.py:
--------------------------------------------------------------------------------
1 | import pytest
2 | import detectlanguage
3 | 
4 | class TestDetectlanguage:
5 |     # Names use the test_ prefix so they match python_functions = test_* in pytest.ini
6 |     def test_defaults(self):
7 |         assert detectlanguage.configuration.api_version == 'v3'
8 |         assert detectlanguage.configuration.host == 'ws.detectlanguage.com'
9 | 
10 |     def test_configuration(self):
11 |         detectlanguage.configuration.api_key = 'TEST'
12 |         assert detectlanguage.client.configuration.api_key == 'TEST'
13 | 
--------------------------------------------------------------------------------
/mise.toml:
--------------------------------------------------------------------------------
1 | [tools]
2 | 
3 | [env]
4 | _.file = ".env.local"
5 | _.python.venv = { path = ".venv", create = true }
6 | 
7 | [tasks]
8 | install-dev = "pip install -r requirements.txt && pip install -r test-requirements.txt && pip install build"
9 | test = "pytest"
10 | test-cov = "pytest --cov=detectlanguage --cov-report=term-missing"
11 | test-verbose = "pytest -v"
12 | console = "python -c \"import detectlanguage; import code; code.interact(local=locals())\""
13 | build = "python3 -m build"
14 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .env.local
2 | .venv
3 | 
4 | # Byte-compiled / optimized / DLL files
5 | __pycache__/
6 | *.py[cod]
7 | 
8 | # C extensions
9 | *.so
10 | 
11 | # Distribution / packaging
12 | .Python
13 | env/
14 | bin/
15 | build/
16 | develop-eggs/
17 | dist/
18 | eggs/
19 | lib/
20 | lib64/
21 | parts/
22 | sdist/
23 | var/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | 
28 | # Installer logs
29 | pip-log.txt
30 | pip-delete-this-directory.txt
31 | 
32 | # Unit test / coverage reports
33 | .tox/
34 | .coverage
35 | .cache
36 | nosetests.xml
37 | coverage.xml
38 | 
39 | # Translations
40 | *.mo
41 | 
42 | # Mr Developer
43 | .mr.developer.cfg
44 | .project
45 | .pydevproject
46 | 
47 | # Rope
48 | .ropeproject
49 | 
50 | # Django stuff:
51 | *.log
52 | *.pot
53 | 
54 | # Sphinx documentation
55 | docs/_build/
56 | 
57 | 
--------------------------------------------------------------------------------
/.github/workflows/main.yml:
--------------------------------------------------------------------------------
1 | name: Build
2 | on: [push,pull_request]
3 | jobs:
4 |   build:
5 |     runs-on: ubuntu-latest
6 |     strategy:
7 |       matrix:
8 |         python-version:
9 |           - '3.8'
10 |           - '3.9'
11 |           - '3.10'
12 |           - '3.11'
13 |           - '3.12'
14 |           - '3.13'
15 |           - '3.14.0-beta.4'
16 |     name: Python ${{ matrix.python-version }} sample
17 |     steps:
18 |       - uses: actions/checkout@v4
19 |       - uses: actions/setup-python@v4
20 |         with:
21 |           python-version: ${{ matrix.python-version }}
22 |           cache: 'pip'
23 |       - name: Install dependencies
24 |         run: |
25 |           pip install -r requirements.txt
26 |           pip install -r test-requirements.txt
27 |       - name: Run tests
28 |         env:
29 |           DETECTLANGUAGE_API_KEY: ${{ secrets.DETECTLANGUAGE_API_KEY }}
30 |         run: pytest
31 | 
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | # Changelog
2 | 
3 | All notable changes to this project will be documented in this file.
4 | 
5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7 | 
8 | 
9 | ## v2.0.0
10 | 
11 | ### Added
12 | - `detect_batch` method for batch detections
13 | - Proxy support
14 | 
15 | ### Changed
16 | - Switched to v3 API which uses updated language detection model
17 | - ⚠️ `detect` method result fields are `language` and `score`
18 | 
19 | ### Deprecated
20 | - Calling `detect()` with list argument. Use `detect_batch` instead.
21 | - `simple_detect()` - Use `detect_code()` instead. Will be removed in a future version.
22 | - `user_status()` - Use `account_status()` instead. Will be removed in a future version.
23 | 
24 | ### Removed
25 | - Secure mode configuration. HTTPS is used by default.
26 | 
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | from setuptools.depends import get_module_constant
4 | from setuptools import setup
5 | 
6 | with open("README.md", "r") as fh:
7 |     long_description = fh.read()
8 | 
9 | setup(
10 |     name = 'detectlanguage',
11 |     packages = ['detectlanguage'],
12 |     version = get_module_constant('detectlanguage', '__version__'),
13 |     description = 'Language Detection API Client',
14 |     long_description=long_description,
15 |     long_description_content_type="text/markdown",
16 |     author = 'Laurynas Butkus',
17 |     author_email = 'info@detectlanguage.com',
18 |     url = 'https://github.com/detectlanguage/detectlanguage-python',
19 |     download_url = 'https://github.com/detectlanguage/detectlanguage-python',
20 |     keywords = ['language', 'identification', 'detection', 'api', 'client'],
21 |     install_requires= ['requests>=2.4.2'],
22 |     classifiers = [],
23 |     license = 'MIT',
24 | )
25 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2014 Laurynas Butkus
2 | 
3 | MIT License
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining
6 | a copy of this software and associated documentation files (the
7 | "Software"), to deal in the Software without restriction, including
8 | without limitation the rights to use, copy, modify, merge, publish,
9 | distribute, sublicense, and/or sell copies of the Software, and to
10 | permit persons to whom the Software is furnished to do so, subject to
11 | the following conditions:
12 | 
13 | The above copyright notice and this permission notice shall be
14 | included in all copies or substantial portions of the Software.
15 | 
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/.github/workflows/publish.yml:
--------------------------------------------------------------------------------
1 | name: Publish Python 🐍 distribution 📦 to PyPI
2 | 
3 | on:
4 |   release:
5 |     types: [published]
6 | 
7 | jobs:
8 |   build:
9 |     name: Build distribution 📦
10 |     runs-on: ubuntu-latest
11 | 
12 |     steps:
13 |       - uses: actions/checkout@v4
14 |         with:
15 |           persist-credentials: false
16 |       - name: Set up Python
17 |         uses: actions/setup-python@v5
18 |         with:
19 |           python-version: "3.x"
20 |       - name: Install pypa/build
21 |         run: >-
22 |           python3 -m
23 |           pip install
24 |           build
25 |           --user
26 |       - name: Build a binary wheel and a source tarball
27 |         run: python3 -m build
28 |       - name: Store the distribution packages
29 |         uses: actions/upload-artifact@v4
30 |         with:
31 |           name: python-package-distributions
32 |           path: dist/
33 | 
34 |   publish-to-pypi:
35 |     name: >-
36 |       Publish Python 🐍 distribution 📦 to PyPI
37 |     needs:
38 |       - build
39 |     runs-on: ubuntu-latest
40 |     environment:
41 |       name: pypi
42 |       url: https://pypi.org/p/detectlanguage
43 |     permissions:
44 |       id-token: write  # IMPORTANT: mandatory for trusted publishing
45 | 
46 |     steps:
47 |       - name: Download all the dists
48 |         uses: actions/download-artifact@v4
49 |         with:
50 |           name: python-package-distributions
51 |           path: dist/
52 |       - name: Publish distribution 📦 to PyPI
53 |         uses: pypa/gh-action-pypi-publish@release/v1
54 | 
--------------------------------------------------------------------------------
/detectlanguage/client.py:
--------------------------------------------------------------------------------
1 | import requests
2 | 
3 | from .exceptions import *
4 | from requests.exceptions import HTTPError
5 | 
6 | try:
7 |     from json.decoder import JSONDecodeError
8 | except ImportError:
9 |     JSONDecodeError = ValueError
10 | 
11 | class Client:
12 |     def __init__(self, configuration):
13 |         self.configuration = configuration
14 | 
15 |     def get(self, path, payload=None):
16 |         # payload defaults to None rather than a shared mutable {}
17 |         r = requests.get(self.url(path), params=payload, headers=self.headers(), timeout=self.configuration.timeout, proxies=self.configuration.proxies)
18 |         return self.handle_response(r)
19 | 
20 |     def post(self, path, payload):
21 |         r = requests.post(self.url(path), json=payload, headers=self.headers(), timeout=self.configuration.timeout, proxies=self.configuration.proxies)
22 |         return self.handle_response(r)
23 | 
24 |     def handle_response(self, r):
25 |         try:
26 |             r.raise_for_status()
27 | 
28 |             return r.json()
29 |         except HTTPError as err:
30 |             self.handle_http_error(r, err)
31 |         except JSONDecodeError:
32 |             raise DetectLanguageError("Error decoding response JSON")
33 | 
34 |     def handle_http_error(self, r, err):
35 |         # Prefer the API's own error message; fall back to the HTTP error.
36 |         try:
37 |             body = r.json()
38 | 
39 |             if 'error' not in body:
40 |                 raise DetectLanguageError(err)
41 | 
42 |             raise DetectLanguageError(body['error']['message'])
43 |         except JSONDecodeError:
44 |             raise DetectLanguageError(err)
45 | 
46 |     def url(self, path):
47 |         return "https://%s/%s/%s" % (self.configuration.host, self.configuration.api_version, path)
48 | 
49 |     def headers(self):
50 |         return {
51 |             'User-Agent': self.configuration.user_agent,
52 |             'Authorization': 'Bearer ' + self.configuration.api_key,
53 |         }
54 | 
--------------------------------------------------------------------------------
/detectlanguage/api.py:
--------------------------------------------------------------------------------
1 | import detectlanguage
2 | import warnings
3 | 
4 | def detect(data):
5 |     if isinstance(data, list):
6 |         _warn_deprecated('use detect_batch instead for multiple texts')
7 |         return detect_batch(data)
8 | 
9 |     return detectlanguage.client.post('detect', { 'q': data })
10 | 
11 | def detect_code(data):
12 |     result = detect(data)
13 |     return result[0]['language']
14 | 
15 | def detect_batch(data):
16 |     return detectlanguage.client.post('detect-batch', { 'q': data })
17 | 
18 | def account_status():
19 |     return detectlanguage.client.get('account/status')
20 | 
21 | def languages():
22 |     return detectlanguage.client.get('languages')
23 | 
24 | 
25 | ### DEPRECATED
26 | 
27 | def simple_detect(data):
28 |     """
29 |     DEPRECATED: This function is deprecated and will be removed in a future version.
30 |     Use detect_code() instead.
31 | 
32 |     Args:
33 |         data: Text to detect language for
34 | 
35 |     Returns:
36 |         str: Language code of the detected language
37 |     """
38 |     _warn_deprecated(
39 |         "simple_detect() is deprecated and will be removed in a future version. "
40 |         "Use detect_code() instead."
41 |     )
42 |     return detect_code(data)
43 | 
44 | def user_status():
45 |     """
46 |     DEPRECATED: This function is deprecated and will be removed in a future version.
47 |     Use account_status() instead.
48 | 
49 |     Returns:
50 |         dict: Account status information
51 |     """
52 |     _warn_deprecated(
53 |         "user_status() is deprecated and will be removed in a future version. "
54 |         "Use account_status() instead."
55 |     )
56 |     return account_status()
57 | 
58 | def _warn_deprecated(message):
59 |     """Internal utility function to emit deprecation warnings."""
60 |     warnings.warn(
61 |         message,
62 |         DeprecationWarning,
63 |         stacklevel=3  # 1 = this helper, 2 = the deprecated wrapper, 3 = the user's call site
64 |     )
65 | 
--------------------------------------------------------------------------------
/tests/test_api.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | import pytest
4 | import detectlanguage
5 | import os
6 | 
7 | class TestApi:
8 |     def setup_method(self):
9 |         detectlanguage.configuration.api_key = os.environ['DETECTLANGUAGE_API_KEY']
10 | 
11 |     def test_detect_code(self):
12 |         result = detectlanguage.detect_code("Hello world")
13 |         assert result == 'en'
14 | 
15 |     def test_detect(self):
16 |         result = detectlanguage.detect("Hello world")
17 |         assert result[0]['language'] == 'en'
18 | 
19 |     def test_detect_with_array(self):
20 |         with pytest.warns(DeprecationWarning, match="use detect_batch"):
21 |             detectlanguage.detect(["Hello world", "Ėjo ežiukas"])
22 | 
23 |     def test_detect_unicode(self):
24 |         result = detectlanguage.detect("Ėjo ežiukas")
25 |         assert result[0]['language'] == 'lt'
26 | 
27 |     def test_detect_batch(self):
28 |         result = detectlanguage.detect_batch(["Hello world", "Ėjo ežiukas"])
29 |         assert result[0][0]['language'] == 'en'
30 |         assert result[1][0]['language'] == 'lt'
31 | 
32 |     def test_account_status(self):
33 |         result = detectlanguage.account_status()
34 |         assert result['status'] == 'ACTIVE'
35 | 
36 |     def test_languages(self):
37 |         result = detectlanguage.languages()
38 |         assert { 'code': 'en', 'name': 'English' } in result
39 | 
40 |     def test_simple_detect(self):
41 |         with pytest.warns(DeprecationWarning, match="simple_detect.*deprecated"):
42 |             result = detectlanguage.simple_detect("Hello world")
43 |         assert result == 'en'
44 | 
45 |     def test_user_status(self):
46 |         with pytest.warns(DeprecationWarning, match="user_status.*deprecated"):
47 |             result = detectlanguage.user_status()
48 |         assert result['status'] == 'ACTIVE'
49 | 
50 | class TestApiErrors:
51 |     def test_invalid_key(self):
52 |         detectlanguage.configuration.api_key = 'invalid'
53 |         with pytest.raises(detectlanguage.DetectLanguageError):
54 |             detectlanguage.detect("Hello world")
55 | 
--------------------------------------------------------------------------------
/tests/test_proxy.py:
--------------------------------------------------------------------------------
1 | import unittest
2 | from unittest.mock import patch, MagicMock
3 | import detectlanguage
4 | 
5 | 
6 | class TestProxyConfiguration(unittest.TestCase):
7 |     def setUp(self):
8 |         # A dummy key keeps Client.headers() from concatenating 'Bearer ' with None
9 |         # when this file is run on its own; requests.get is mocked, so it is never sent.
10 |         detectlanguage.configuration.api_key = 'TEST'
11 |         detectlanguage.configuration.proxies = None
12 | 
13 |     def test_proxy_configuration(self):
14 |         """Test proxy configuration"""
15 |         detectlanguage.configuration.proxies = {'https': 'https://proxy.example.com:8080'}
16 |         self.assertEqual(detectlanguage.configuration.proxies, {'https': 'https://proxy.example.com:8080'})
17 | 
18 |     @patch('requests.get')
19 |     def test_client_uses_proxy(self, mock_get):
20 |         """Test that client uses configured proxy"""
21 |         detectlanguage.configuration.proxies = {'https': 'https://proxy.example.com:8080'}
22 | 
23 |         mock_response = MagicMock()
24 |         mock_response.json.return_value = {'test': 'data'}
25 |         mock_response.raise_for_status.return_value = None
26 |         mock_get.return_value = mock_response
27 | 
28 |         detectlanguage.account_status()
29 | 
30 |         mock_get.assert_called_once()
31 |         self.assertEqual(mock_get.call_args[1]['proxies'], {'https': 'https://proxy.example.com:8080'})
32 | 
33 |     @patch('requests.get')
34 |     def test_client_no_proxy_when_disabled(self, mock_get):
35 |         """Test that client doesn't use proxy when disabled"""
36 |         detectlanguage.configuration.proxies = None
37 | 
38 |         mock_response = MagicMock()
39 |         mock_response.json.return_value = {'test': 'data'}
40 |         mock_response.raise_for_status.return_value = None
41 |         mock_get.return_value = mock_response
42 | 
43 |         detectlanguage.account_status()
44 | 
45 |         mock_get.assert_called_once()
46 |         self.assertIsNone(mock_get.call_args[1]['proxies'])
47 | 
48 | 
49 | if __name__ == '__main__':
50 |     unittest.main()
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Detect Language API Python Client
2 | ========
3 | 
4 | [![PyPI version](https://badge.fury.io/py/detectlanguage.svg)](https://badge.fury.io/py/detectlanguage)
5 | [![Build Status](https://github.com/detectlanguage/detectlanguage-python/actions/workflows/main.yml/badge.svg)](https://github.com/detectlanguage/detectlanguage-python/actions)
6 | 
7 | Detects the language of a given text. Returns detected language codes and scores.
8 | 
9 | Before using the Detect Language API client, you have to set up your personal API key.
10 | You can get it by signing up at https://detectlanguage.com
11 | 
12 | ## Installation
13 | 
14 | ```
15 | pip install detectlanguage
16 | ```
17 | 
18 | ### Upgrading
19 | 
20 | When upgrading, please check the [changelog](CHANGELOG.md) for breaking changes.
21 | 
22 | ### Configuration
23 | 
24 | ```python
25 | import detectlanguage
26 | 
27 | detectlanguage.configuration.api_key = "YOUR API KEY"
28 | 
29 | # You can use a proxy if needed
30 | # detectlanguage.configuration.proxies = {'https': 'https://user:pass@proxy:8080'}
31 | ```
32 | 
33 | ## Usage
34 | 
35 | ### Detect language
36 | 
37 | ```python
38 | detectlanguage.detect("Dolce far niente")
39 | ```
40 | 
41 | #### Result
42 | 
43 | ```python
44 | [{'language': 'it', 'score': 0.5074}]
45 | ```
46 | 
47 | ### Detect single code
48 | 
49 | If you need just a language code, you can use `detect_code`.
50 | 
51 | ```python
52 | detectlanguage.detect_code("Dolce far niente")
53 | ```
54 | 
55 | #### Result
56 | 
57 | ```python
58 | 'it'
59 | ```
60 | 
61 | ### Batch detection
62 | 
63 | It is possible to detect the language of several texts with one request.
64 | This method is faster than doing one request per text.
65 | 
66 | ```python
67 | detectlanguage.detect_batch(["Dolce far niente", "Hello world"])
68 | ```
69 | 
70 | #### Result
71 | 
72 | The result is an array of detections in the same order as the texts were passed.
73 | 
74 | ```python
75 | [[{'language': 'it', 'score': 0.5074}], [{'language': 'en', 'score': 0.9098}]]
76 | ```
77 | 
78 | ### Get your account status
79 | 
80 | ```python
81 | detectlanguage.account_status()
82 | ```
83 | 
84 | #### Result
85 | 
86 | ```python
87 | { 'status': 'ACTIVE', 'daily_requests_limit': 5000, 'daily_bytes_limit': 1048576,
88 |   'bytes': 3151, 'plan': 'FREE', 'date': '2014-03-29', 'requests': 263,
89 |   'plan_expires': None }
90 | ```
91 | 
92 | ### Get list of supported languages
93 | 
94 | ```python
95 | detectlanguage.languages()
96 | ```
97 | 
98 | #### Result
99 | 
100 | ```python
101 | [{'code': 'aa', 'name': 'Afar'}, {'code': 'ab', 'name': 'Abkhazian'}, ...]
102 | ```
103 | 
104 | ## License
105 | 
106 | Detect Language API Python Client is free software, and may be redistributed under the terms specified in the LICENSE file.
107 | 
--------------------------------------------------------------------------------
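The changelog lists `simple_detect()`, `user_status()` and list arguments to `detect()` as deprecated, and the client signals these with `DeprecationWarning`. When upgrading, one way to find leftover deprecated calls is to record warnings while the code runs. The sketch below is only an illustration of that idea: `detect_code` and `simple_detect` here are hypothetical stand-ins that make no API request, not the library's own functions.

```python
import warnings


def detect_code(text):
    # Hypothetical stand-in for detectlanguage.detect_code(); no network call.
    return "en"


def simple_detect(text):
    # Stand-in for a deprecated alias: warn, then delegate to the new name.
    warnings.warn(
        "simple_detect() is deprecated. Use detect_code() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller of simple_detect()
    )
    return detect_code(text)


# Record warnings instead of printing them, so deprecated calls can be listed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = simple_detect("Hello world")

deprecated_calls = [
    str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
]
```

In a test suite, the same effect can usually be had by running Python with `-W error::DeprecationWarning`, which turns each deprecated call into an exception.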