├── .coveragerc ├── .dockerignore ├── .github └── workflows │ ├── ci.yml │ └── mkdocs.yml ├── .gitignore ├── AUTHORS.md ├── CHANGELOG.md ├── CONTRIBUTING.md ├── Dockerfile ├── LICENSE ├── MANIFEST.in ├── README.md ├── dev-requirements.txt ├── docs ├── configuration.md ├── index.md ├── migration.md ├── migration2to3.md ├── properties.md └── sources.md ├── guessit.spec ├── guessit ├── __init__.py ├── __main__.py ├── __version__.py ├── api.py ├── config │ ├── __init__.py │ └── options.json ├── data │ ├── __init__.py │ └── tlds-alpha-by-domain.txt ├── jsonutils.py ├── monkeypatch.py ├── options.py ├── reutils.py ├── rules │ ├── __init__.py │ ├── common │ │ ├── __init__.py │ │ ├── comparators.py │ │ ├── date.py │ │ ├── expected.py │ │ ├── formatters.py │ │ ├── numeral.py │ │ ├── pattern.py │ │ ├── quantity.py │ │ ├── validators.py │ │ └── words.py │ ├── markers │ │ ├── __init__.py │ │ ├── groups.py │ │ └── path.py │ ├── match_processors.py │ ├── processors.py │ └── properties │ │ ├── __init__.py │ │ ├── audio_codec.py │ │ ├── bit_rate.py │ │ ├── bonus.py │ │ ├── cd.py │ │ ├── container.py │ │ ├── country.py │ │ ├── crc.py │ │ ├── date.py │ │ ├── edition.py │ │ ├── episode_title.py │ │ ├── episodes.py │ │ ├── film.py │ │ ├── language.py │ │ ├── mimetype.py │ │ ├── other.py │ │ ├── part.py │ │ ├── release_group.py │ │ ├── screen_size.py │ │ ├── size.py │ │ ├── source.py │ │ ├── streaming_service.py │ │ ├── title.py │ │ ├── type.py │ │ ├── video_codec.py │ │ └── website.py ├── test │ ├── __init__.py │ ├── config │ │ ├── dummy.txt │ │ ├── test.json │ │ ├── test.yaml │ │ └── test.yml │ ├── enable_disable_properties.yml │ ├── episodes.yml │ ├── movies.yml │ ├── rules │ │ ├── __init__.py │ │ ├── audio_codec.yml │ │ ├── bonus.yml │ │ ├── cd.yml │ │ ├── common_words.yml │ │ ├── country.yml │ │ ├── date.yml │ │ ├── edition.yml │ │ ├── episodes.yml │ │ ├── film.yml │ │ ├── language.yml │ │ ├── other.yml │ │ ├── part.yml │ │ ├── processors.yml │ │ ├── processors_test.py │ │ ├── 
release_group.yml │ │ ├── screen_size.yml │ │ ├── size.yml │ │ ├── source.yml │ │ ├── title.yml │ │ ├── video_codec.yml │ │ └── website.yml │ ├── streaming_services.yaml │ ├── suggested.json │ ├── test-input-file.txt │ ├── test_api.py │ ├── test_api_unicode_literals.py │ ├── test_benchmark.py │ ├── test_main.py │ ├── test_options.py │ ├── test_yml.py │ └── various.yml └── yamlutils.py ├── mkdocs.yml ├── pylintrc ├── pyproject.toml ├── pytest.ini ├── requirements.txt ├── setup.py └── tox.ini /.coveragerc: -------------------------------------------------------------------------------- 1 | # .coveragerc to control coverage.py 2 | [run] 3 | omit = 4 | guessit/__version__.py 5 | guessit/test/* 6 | [report] 7 | exclude_lines = 8 | pragma: no cover 9 | -------------------------------------------------------------------------------- /.dockerignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | *.py[cod] 3 | **/__pycache__/ 4 | **/*.py[cod] 5 | .benchmarks/ 6 | .cache/ 7 | .eggs/ 8 | *.egg-info/ 9 | *.egg 10 | .tox/ 11 | .coverage 12 | .python-version 13 | doc/ 14 | *.log 15 | *.iml -------------------------------------------------------------------------------- /.github/workflows/mkdocs.yml: -------------------------------------------------------------------------------- 1 | name: mkdocs 2 | on: 3 | push: 4 | branches: 5 | - master 6 | jobs: 7 | deploy: 8 | runs-on: ubuntu-latest 9 | steps: 10 | - uses: actions/checkout@v3 11 | - uses: actions/setup-python@v4 12 | with: 13 | python-version: 3.x 14 | 15 | - run: pip install mkdocs mkdocs-material 16 | - run: mkdocs build 17 | 18 | - name: Deploy 🚀 19 | uses: JamesIves/github-pages-deploy-action@v4 20 | with: 21 | token: ${{ secrets.GITHUB_TOKEN }} 22 | branch: gh-pages 23 | folder: site 24 | clean: true 25 | single-commit: true 26 | -------------------------------------------------------------------------------- /.gitignore: 
-------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | dist/ 5 | 6 | # Python dist 7 | *.egg-info/ 8 | .eggs/ 9 | build/ 10 | 11 | # Coverage 12 | .coverage 13 | 14 | # PyEnv 15 | .python-version 16 | 17 | # Tox 18 | .tox/ 19 | 20 | # py.test 21 | lastfailed 22 | .pytest_cache/ 23 | 24 | # Jetbrain 25 | *.iml 26 | .idea/ 27 | 28 | # docs 29 | docs/_build/ 30 | 31 | -------------------------------------------------------------------------------- /AUTHORS.md: -------------------------------------------------------------------------------- 1 | Copyright (c) 2011 - 2020, The GuessIt contributors. 2 | 3 | GuessIt is an opensource project written and maintained by passionate 4 | people. 5 | 6 | If you feel your name should belong to this list, please [open an 7 | issue](https://github.com/guessit/guessit/issues) 8 | 9 | Author and contributors of current guessit version (`2.x`/`3.x`): 10 | 11 | - Rémi Alvergnat <> 12 | - Rato <> 13 | 14 | Author and contributors of initial guessit version (`0.x`/`1.x`): 15 | 16 | - Nicolas Wack <> 17 | - Ricard Marxer <> 18 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contribute 2 | 3 | GuessIt is under active development, and contributions are more than 4 | welcome! 5 | 6 | 1. Check for open issues or open a fresh issue to start a discussion 7 | around a feature idea or a bug. There is a Contributor Friendly tag 8 | for issues that should be ideal for people who are not very familiar 9 | with the codebase yet. 10 | 2. Fork [the repository][] on Github to start making your changes to 11 | the **develop** branch (or branch off of it). 12 | 3. Write a test which shows that the bug was fixed or that the feature 13 | works as expected. 14 | 4. 
Send a pull request and bug the maintainer until it gets merged and 15 | published. :) 16 | 17 | # License 18 | 19 | GuessIt is licensed under the [LGPLv3 license][]. 20 | 21 | [the repository]: https://github.com/guessit-io/guessit 22 | [LGPLv3 license]: http://www.gnu.org/licenses/lgpl.html -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.7-alpine 2 | 3 | MAINTAINER Rémi Alvergnat 4 | 5 | WORKDIR /root 6 | 7 | COPY / /root/guessit/ 8 | WORKDIR /root/guessit/ 9 | 10 | RUN pip install -e . 11 | 12 | ENTRYPOINT ["guessit"] 13 | 14 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU LESSER GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | 9 | This version of the GNU Lesser General Public License incorporates 10 | the terms and conditions of version 3 of the GNU General Public 11 | License, supplemented by the additional permissions listed below. 12 | 13 | 0. Additional Definitions. 14 | 15 | As used herein, "this License" refers to version 3 of the GNU Lesser 16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU 17 | General Public License. 18 | 19 | "The Library" refers to a covered work governed by this License, 20 | other than an Application or a Combined Work as defined below. 21 | 22 | An "Application" is any work that makes use of an interface provided 23 | by the Library, but which is not otherwise based on the Library. 24 | Defining a subclass of a class defined by the Library is deemed a mode 25 | of using an interface provided by the Library. 
26 | 27 | A "Combined Work" is a work produced by combining or linking an 28 | Application with the Library. The particular version of the Library 29 | with which the Combined Work was made is also called the "Linked 30 | Version". 31 | 32 | The "Minimal Corresponding Source" for a Combined Work means the 33 | Corresponding Source for the Combined Work, excluding any source code 34 | for portions of the Combined Work that, considered in isolation, are 35 | based on the Application, and not on the Linked Version. 36 | 37 | The "Corresponding Application Code" for a Combined Work means the 38 | object code and/or source code for the Application, including any data 39 | and utility programs needed for reproducing the Combined Work from the 40 | Application, but excluding the System Libraries of the Combined Work. 41 | 42 | 1. Exception to Section 3 of the GNU GPL. 43 | 44 | You may convey a covered work under sections 3 and 4 of this License 45 | without being bound by section 3 of the GNU GPL. 46 | 47 | 2. Conveying Modified Versions. 48 | 49 | If you modify a copy of the Library, and, in your modifications, a 50 | facility refers to a function or data to be supplied by an Application 51 | that uses the facility (other than as an argument passed when the 52 | facility is invoked), then you may convey a copy of the modified 53 | version: 54 | 55 | a) under this License, provided that you make a good faith effort to 56 | ensure that, in the event an Application does not supply the 57 | function or data, the facility still operates, and performs 58 | whatever part of its purpose remains meaningful, or 59 | 60 | b) under the GNU GPL, with none of the additional permissions of 61 | this License applicable to that copy. 62 | 63 | 3. Object Code Incorporating Material from Library Header Files. 64 | 65 | The object code form of an Application may incorporate material from 66 | a header file that is part of the Library. 
You may convey such object 67 | code under terms of your choice, provided that, if the incorporated 68 | material is not limited to numerical parameters, data structure 69 | layouts and accessors, or small macros, inline functions and templates 70 | (ten or fewer lines in length), you do both of the following: 71 | 72 | a) Give prominent notice with each copy of the object code that the 73 | Library is used in it and that the Library and its use are 74 | covered by this License. 75 | 76 | b) Accompany the object code with a copy of the GNU GPL and this license 77 | document. 78 | 79 | 4. Combined Works. 80 | 81 | You may convey a Combined Work under terms of your choice that, 82 | taken together, effectively do not restrict modification of the 83 | portions of the Library contained in the Combined Work and reverse 84 | engineering for debugging such modifications, if you also do each of 85 | the following: 86 | 87 | a) Give prominent notice with each copy of the Combined Work that 88 | the Library is used in it and that the Library and its use are 89 | covered by this License. 90 | 91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license 92 | document. 93 | 94 | c) For a Combined Work that displays copyright notices during 95 | execution, include the copyright notice for the Library among 96 | these notices, as well as a reference directing the user to the 97 | copies of the GNU GPL and this license document. 98 | 99 | d) Do one of the following: 100 | 101 | 0) Convey the Minimal Corresponding Source under the terms of this 102 | License, and the Corresponding Application Code in a form 103 | suitable for, and under terms that permit, the user to 104 | recombine or relink the Application with a modified version of 105 | the Linked Version to produce a modified Combined Work, in the 106 | manner specified by section 6 of the GNU GPL for conveying 107 | Corresponding Source. 
108 | 109 | 1) Use a suitable shared library mechanism for linking with the 110 | Library. A suitable mechanism is one that (a) uses at run time 111 | a copy of the Library already present on the user's computer 112 | system, and (b) will operate properly with a modified version 113 | of the Library that is interface-compatible with the Linked 114 | Version. 115 | 116 | e) Provide Installation Information, but only if you would otherwise 117 | be required to provide such information under section 6 of the 118 | GNU GPL, and only to the extent that such information is 119 | necessary to install and execute a modified version of the 120 | Combined Work produced by recombining or relinking the 121 | Application with a modified version of the Linked Version. (If 122 | you use option 4d0, the Installation Information must accompany 123 | the Minimal Corresponding Source and Corresponding Application 124 | Code. If you use option 4d1, you must provide the Installation 125 | Information in the manner specified by section 6 of the GNU GPL 126 | for conveying Corresponding Source.) 127 | 128 | 5. Combined Libraries. 129 | 130 | You may place library facilities that are a work based on the 131 | Library side by side in a single library together with other library 132 | facilities that are not Applications and are not covered by this 133 | License, and convey such a combined library under terms of your 134 | choice, if you do both of the following: 135 | 136 | a) Accompany the combined library with a copy of the same work based 137 | on the Library, uncombined with any other library facilities, 138 | conveyed under the terms of this License. 139 | 140 | b) Give prominent notice with the combined library that part of it 141 | is a work based on the Library, and explaining where to find the 142 | accompanying uncombined form of the same work. 143 | 144 | 6. Revised Versions of the GNU Lesser General Public License. 
145 | 146 | The Free Software Foundation may publish revised and/or new versions 147 | of the GNU Lesser General Public License from time to time. Such new 148 | versions will be similar in spirit to the present version, but may 149 | differ in detail to address new problems or concerns. 150 | 151 | Each version is given a distinguishing version number. If the 152 | Library as you received it specifies that a certain numbered version 153 | of the GNU Lesser General Public License "or any later version" 154 | applies to it, you have the option of following the terms and 155 | conditions either of that published version or of any later version 156 | published by the Free Software Foundation. If the Library as you 157 | received it does not specify a version number of the GNU Lesser 158 | General Public License, you may choose any version of the GNU Lesser 159 | General Public License ever published by the Free Software Foundation. 160 | 161 | If the Library as you received it specifies that a proxy can decide 162 | whether future versions of the GNU Lesser General Public License shall 163 | apply, that proxy's public statement of acceptance of any version is 164 | permanent authorization for you to choose that version for the 165 | Library. 
-------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | recursive-include guessit *.py *.yml *.txt *.ini *.json *.yaml *.yml 2 | recursive-exclude guessit *.pyc 3 | include LICENSE 4 | include *.md 5 | include *.yml 6 | include *.ini 7 | include *.cfg 8 | include *.txt 9 | include .coveragerc 10 | include pylintrc 11 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | GuessIt 2 | 3 | [![Latest Version](https://img.shields.io/pypi/v/guessit.svg)](https://pypi.python.org/pypi/guessit) 4 | [![LGPLv3 License](https://img.shields.io/badge/license-LGPLv3-blue.svg)]() 5 | [![Codecov](https://img.shields.io/codecov/c/github/guessit-io/guessit)](https://codecov.io/gh/guessit-io/guessit) 6 | [![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/relekang/python-semantic-release) 7 | 8 | GuessIt is a python library that extracts as much information as 9 | possible from a video filename. 10 | 11 | It has a very powerful matcher that allows to guess properties from a 12 | video using its filename only. This matcher works with both movies and 13 | tv shows episodes. 
14 | 15 | For example, GuessIt can do the following: 16 | 17 | $ guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi" 18 | For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi 19 | GuessIt found: { 20 | "title": "Treme", 21 | "season": 1, 22 | "episode": 3, 23 | "episode_title": "Right Place, Wrong Time", 24 | "source": "HDTV", 25 | "video_codec": "Xvid", 26 | "release_group": "NoTV", 27 | "container": "avi", 28 | "mimetype": "video/x-msvideo", 29 | "type": "episode" 30 | } 31 | 32 | More information is available at [guessit-io.github.io/guessit](https://guessit-io.github.io/guessit). 33 | 34 | Support 35 | ------- 36 | 37 | This project is hosted on [GitHub](https://github.com/guessit-io/guessit). Feel free to open an issue if you think you have found a bug or something is missing in guessit. 38 | 39 | GuessIt relies on [Rebulk](https://github.com/Toilal/rebulk) project for pattern and rules registration. 40 | 41 | License 42 | ------- 43 | 44 | GuessIt is licensed under the [LGPLv3 license](http://www.gnu.org/licenses/lgpl.html). 45 | -------------------------------------------------------------------------------- /dev-requirements.txt: -------------------------------------------------------------------------------- 1 | -e .[dev,test] 2 | -------------------------------------------------------------------------------- /docs/configuration.md: -------------------------------------------------------------------------------- 1 | # Configuration files 2 | 3 | Guessit supports configuration through configuration files. 4 | 5 | Default configuration file is bundled inside guessit package from 6 | [config/options.json][] file. 7 | 8 | It is possible to disable the default configuration with 9 | `--no-default-config` option, but you have then to provide a full 10 | configuration file based on the default one. 
11 | 12 | Configuration files are loaded from the following paths: 13 | 14 | > - `~/.guessit/options.(json|yml|yaml)` 15 | > - `~/.config/guessit/options.(json|yml|yaml)` 16 | 17 | It is also possible to disable those user configuration files with 18 | the `--no-user-config` option. 19 | 20 | Additional configuration files can be included using the `-c`/`--config` 21 | option. 22 | 23 | As several configuration files can be involved, they are deeply merged to 24 | keep all values inside the effective configuration. 25 | 26 | # Advanced configuration 27 | 28 | Configuration files contain all options available through the command 29 | line, plus an additional one named `advanced_config`. 30 | 31 | This advanced configuration contains all internal parameters, which 32 | are exposed to help you tweak guessit to better fit your needs. 33 | 34 | If no `advanced_config` is declared in any of the effective configuration 35 | files, the default one will be used even when `--no-default-config` is 36 | used. 37 | 38 | We aim to keep it backwards compatible, but in order to enhance 39 | Guessit, these parameters might change without prior notice. 
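The deep merge of configuration files mentioned earlier on this page can be pictured with a small sketch. This is not guessit's actual implementation: `deep_merge` is a hypothetical helper, the `param_a`/`param_b` keys are made up, and the rule that a later file replaces scalars and lists outright is an assumption made for illustration.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict in which `override` is merged into `base`, key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested mappings are merged recursively instead of being replaced.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Assumption: scalars and lists from the later file win.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of a default and a user configuration file.
default_config = {"type": None, "advanced_config": {"param_a": 1, "param_b": 2}}
user_config = {"type": "episode", "advanced_config": {"param_b": 3}}

effective = deep_merge(default_config, user_config)
```

In this sketch `param_a` survives from the default file while `param_b` and `type` take the user's values, which is the "keep all values" behaviour the text describes.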
40 | 41 | [config/options.json]: https://github.com/guessit-io/guessit/blob/master/guessit/config/options.json/ -------------------------------------------------------------------------------- /docs/index.md: -------------------------------------------------------------------------------- 1 | GuessIt 2 | ======= 3 | 4 | [![Latest Version](https://img.shields.io/pypi/v/guessit.svg)](https://pypi.python.org/pypi/guessit) 5 | [![LGPLv3 License](https://img.shields.io/badge/license-LGPLv3-blue.svg)]() 6 | [![Build Status](https://img.shields.io/github/workflow/status/guessit-io/guessit/ci)](https://github.com/guessit-io/guessit/actions?query=workflow%3Aci) 7 | [![Coveralls](https://img.shields.io/coveralls/guessit-io/guessit/master.svg)](https://coveralls.io/github/guessit-io/guessit?branch=master) 8 | [![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/relekang/python-semantic-release) 9 | 10 | GuessIt is a python library that extracts as much information as 11 | possible from a video filename. 12 | 13 | It has a very powerful matcher that allows to guess properties from a 14 | video using its filename only. This matcher works with both movies and 15 | tv shows episodes. 16 | 17 | For example, GuessIt can do the following: 18 | 19 | $ guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi" 20 | For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi 21 | GuessIt found: { 22 | "title": "Treme", 23 | "season": 1, 24 | "episode": 3, 25 | "episode_title": "Right Place, Wrong Time", 26 | "source": "HDTV", 27 | "video_codec": "Xvid", 28 | "release_group": "NoTV", 29 | "container": "avi", 30 | "mimetype": "video/x-msvideo", 31 | "type": "episode" 32 | } 33 | 34 | Migration note 35 | ----- 36 | 37 | In GuessIt 3, some properties and values were renamed in order to keep consistency and to be more intuitive. 
38 | 39 | To migrate from guessit `2.x` to `3.x`, please read 40 | [migration2to3.md](./migration2to3.md). 41 | 42 | To migrate from guessit `0.x` or `1.x` to `guessit 2.x`, please read 43 | [migration.md](./migration.md). 44 | 45 | Install 46 | ----- 47 | 48 | Installing GuessIt is simple with [pip](http://www.pip-installer.org/): 49 | 50 | ```bash 51 | pip install guessit 52 | ``` 53 | 54 | You can also [install GuessIt from sources](./sources.md) 55 | 56 | Usage 57 | ----- 58 | 59 | GuessIt can be used from command line: 60 | 61 | ``` 62 | usage: guessit [-h] [-t TYPE] [-n] [-Y] [-D] [-L ALLOWED_LANGUAGES] 63 | [-C ALLOWED_COUNTRIES] [-E] [-T EXPECTED_TITLE] 64 | [-G EXPECTED_GROUP] [--includes INCLUDES] [--excludes EXCLUDES] 65 | [-f INPUT_FILE] [-v] [-P SHOW_PROPERTY] [-a] [-s] [-l] [-j] 66 | [-y] [-i] [-c CONFIG] [--no-user-config] [--no-default-config] 67 | [-p] [-V] [--version] 68 | [filename [filename ...]] 69 | 70 | positional arguments: 71 | filename Filename or release name to guess 72 | 73 | optional arguments: 74 | -h, --help show this help message and exit 75 | 76 | Naming: 77 | -t TYPE, --type TYPE The suggested file type: movie, episode. If undefined, 78 | type will be guessed. 79 | -n, --name-only Parse files as name only, considering "/" and "\" like 80 | other separators. 81 | -Y, --date-year-first 82 | If short date is found, consider the first digits as 83 | the year. 84 | -D, --date-day-first If short date is found, consider the second digits as 85 | the day. 86 | -L ALLOWED_LANGUAGES, --allowed-languages ALLOWED_LANGUAGES 87 | Allowed language (can be used multiple times) 88 | -C ALLOWED_COUNTRIES, --allowed-countries ALLOWED_COUNTRIES 89 | Allowed country (can be used multiple times) 90 | -E, --episode-prefer-number 91 | Guess "serie.213.avi" as the episode 213. 
Without this 92 | option, it will be guessed as season 2, episode 13 93 | -T EXPECTED_TITLE, --expected-title EXPECTED_TITLE 94 | Expected title to parse (can be used multiple times) 95 | -G EXPECTED_GROUP, --expected-group EXPECTED_GROUP 96 | Expected release group (can be used multiple times) 97 | --includes INCLUDES List of properties to be detected 98 | --excludes EXCLUDES List of properties to be ignored 99 | 100 | Input: 101 | -f INPUT_FILE, --input-file INPUT_FILE 102 | Read filenames from an input text file. File should 103 | use UTF-8 charset. 104 | 105 | Output: 106 | -v, --verbose Display debug output 107 | -P SHOW_PROPERTY, --show-property SHOW_PROPERTY 108 | Display the value of a single property (title, series, 109 | video_codec, year, ...) 110 | -a, --advanced Display advanced information for filename guesses, as 111 | json output 112 | -s, --single-value Keep only first value found for each property 113 | -l, --enforce-list Wrap each found value in a list even when property has 114 | a single value 115 | -j, --json Display information for filename guesses as json 116 | output 117 | -y, --yaml Display information for filename guesses as yaml 118 | output 119 | -i, --output-input-string 120 | Add input_string property in the output 121 | 122 | Configuration: 123 | -c CONFIG, --config CONFIG 124 | Filepath to configuration file. Configuration file 125 | contains the same options as those from command line 126 | options, but option names have "-" characters replaced 127 | with "_". This configuration will be merged with 128 | default and user configuration files. 129 | --no-user-config Disable user configuration. If not defined, guessit 130 | tries to read configuration files at 131 | ~/.guessit/options.(json|yml|yaml) and 132 | ~/.config/guessit/options.(json|yml|yaml) 133 | --no-default-config Disable default configuration. This should be done 134 | only if you are providing a full configuration through 135 | user configuration or --config option. 
If no 136 | "advanced_config" is provided by another configuration 137 | file, it will still be loaded from default 138 | configuration. 139 | 140 | Information: 141 | -p, --properties Display properties that can be guessed. 142 | -V, --values Display property values that can be guessed. 143 | --version Display the guessit version. 144 | ``` 145 | 146 | It can also be used as a python module: 147 | 148 | >>> from guessit import guessit 149 | >>> guessit('Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi') 150 | MatchesDict([('title', 'Treme'), ('season', 1), ('episode', 3), ('episode_title', 'Right Place, Wrong Time'), ('source', 'HDTV'), ('video_codec', 'Xvid'), ('release_group', 'NoTV'), ('container', 'avi'), ('mimetype', 'video/x-msvideo'), ('type', 'episode')]) 151 | 152 | `MatchesDict` is a dict that keeps matches ordering. 153 | 154 | Command line options can be given as dict or string to the second argument. 155 | 156 | Configuration 157 | ------------- 158 | 159 | Find more about Guessit configuration at [configuration page](./configuration.md). 160 | 161 | REST API 162 | -------- 163 | 164 | A REST API will be available soon ... 165 | 166 | Sources are available in a dedicated [guessit-rest repository](https://github.com/Toilal/guessit-rest). 167 | 168 | Support 169 | ------- 170 | 171 | This project is hosted on [GitHub](https://github.com/guessit-io/guessit). Feel free to open an issue if you think you have found a bug or something is missing in guessit. 172 | 173 | GuessIt relies on [Rebulk](https://github.com/Toilal/rebulk) project for pattern and rules registration. 174 | 175 | License 176 | ------- 177 | 178 | GuessIt is licensed under the [LGPLv3 license](http://www.gnu.org/licenses/lgpl.html). 
179 | -------------------------------------------------------------------------------- /docs/migration.md: -------------------------------------------------------------------------------- 1 | Migration 2 | ========= 3 | 4 | Guessit 2 has been rewritten from scratch. This file contains all the information required to migrate from the previous `0.x` or `1.x` versions. 5 | 6 | API 7 | --- 8 | 9 | `guess_video_info`, `guess_movie_info` and `guess_episode_info` have been removed in favor of a single function, `guessit`. 10 | 11 | Example: 12 | 13 | >>> from guessit import guessit 14 | >>> guessit('Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi') 15 | MatchesDict([('title', 'Treme'), ('season', 1), ('episode', 3), ('episode_title', 'Right Place, Wrong Time'), ('format', 'HDTV'), ('video_codec', 'XviD'), ('release_group', 'NoTV'), ('container', 'avi'), ('mimetype', 'video/x-msvideo'), ('type', 'episode')]) 16 | 17 | `MatchesDict` is a dict that keeps the ordering of matches. 18 | 19 | Command line options can be given as a dict or a string in the second argument. 20 | 21 | Properties 22 | ---------- 23 | 24 | Some properties have been renamed. 25 | 26 | - `series` is now `title`. 27 | - `title` is now `episode_title` (for `episode` `type` only). 28 | - `episodeNumber` is now `episode`. 29 | - `bonusNumber` is now `bonus` 30 | - `filmNumber` is now `film` 31 | - `cdNumber` is now `cd` and `cdNumberTotal` is now `cd_count` 32 | - `idNumber` is now `uuid` 33 | 34 | `episodeList` and `partList` have been removed. The `episode_number` and `part` properties can now contain an `int` or a `list[int]`. 35 | 36 | All info `type` values, like `seriesinfo` and `movieinfo`, have been removed. You can check for the `nfo` value in the `container` property instead. 37 | 38 | All `camelCase` properties have been renamed to `underscore_case`. 
39 | 40 | - `releaseGroup` is now `release_group` 41 | - `episodeCount` is now `episode_count` 42 | - `episodeDetails` is now `episode_details` 43 | - `episodeFormat` is now `episode_format` 44 | - `screenSize` is now `screen_size` 45 | - `videoCodec` is now `video_codec` 46 | - `videoProfile` is now `video_profile` 47 | - `videoApi` is now `video_api` 48 | - `audioChannels` is now `audio_channels` 49 | - `audioCodec` is now `audio_codec` 50 | - `audioProfile` is now `audio_profile` 51 | - `subtitleLanguage` is now `subtitle_language` 52 | - `bonusTitle` is now `bonus_title` 53 | - `properCount` is now `proper_count` 54 | 55 | Options 56 | ------- 57 | 58 | Some options have been removed. 59 | 60 | - `-X DISABLED_TRANSFORMERS`, `-s, --transformers` 61 | 62 | There are no transformers anymore. 63 | 64 | - `-S EXPECTED_SERIES` 65 | 66 | As `series` was renamed to `title`, use `-T EXPECTED_TITLE` instead. 67 | 68 | - `-G EXPECTED_GROUP` 69 | 70 | GuessIt is now better at guessing the release group, so this option has been removed. 71 | 72 | - `-d, --demo` 73 | 74 | Probably not that useful. 75 | 76 | - `-i INFO, --info INFO` 77 | 78 | Features related to this option have been removed. 79 | 80 | - `-c, --split-camel`, `-u, --unidentified`, `-b, --bug` 81 | 82 | Will be back soon... (work in progress) 83 | 84 | Other GuessIt `1.x` options have been kept. 85 | 86 | Output 87 | ------ 88 | 89 | The output produced by the `guessit` API function is now an instance of [OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict). Property values are automatically ordered based on filename, and you can still use this output as a regular Python `dict`. 90 | 91 | If multiple values are available for a property, the value in the dict will be a `list` instance. 92 | 93 | The `country` 2-letter code is no longer added to the title. As `country` is added to the returned guess dict, it's up to the user to edit the guessed title. 
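Because a property may hold either a single value or a `list`, calling code often normalizes the value before iterating over it. A minimal sketch, assuming plain dicts as stand-ins for real guessit output; `as_list` is a hypothetical helper and not part of guessit:

```python
def as_list(value):
    """Wrap a single value in a list, keep lists as they are, map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Plain dicts standing in for guessit output (values are illustrative only).
single = {"title": "Treme", "episode": 3}
multiple = {"title": "Treme", "episode": [3, 4]}

episodes_single = as_list(single.get("episode"))      # one value wrapped in a list
episodes_multiple = as_list(multiple.get("episode"))  # list returned unchanged
missing = as_list(single.get("language"))             # absent property gives an empty list
```

Normalizing at the call site keeps the guess itself untouched, so the original single-value-or-list shape is still available to other code.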
94 | 95 | The output of the advanced display option (`-a, --advanced`) has also changed. It now lists `Match` objects from [Rebulk](https://github.com/Toilal/rebulk), and may display duplicates that would have been merged in standard output: 96 | 97 | $ guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi" -a 98 | For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi 99 | GuessIt found: { 100 | "title": { 101 | "value": "Treme", 102 | "raw": "Treme.", 103 | "start": 0, 104 | "end": 6 105 | }, 106 | "season": { 107 | "value": 1, 108 | "raw": "1", 109 | "start": 6, 110 | "end": 7 111 | }, 112 | "episode": { 113 | "value": 3, 114 | "raw": "03", 115 | "start": 8, 116 | "end": 10 117 | }, 118 | "episode_title": { 119 | "value": "Right Place, Wrong Time", 120 | "raw": ".Right.Place,.Wrong.Time.", 121 | "start": 10, 122 | "end": 35 123 | }, 124 | "format": { 125 | "value": "HDTV", 126 | "raw": "HDTV", 127 | "start": 35, 128 | "end": 39 129 | }, 130 | "video_codec": { 131 | "value": "XviD", 132 | "raw": "XviD", 133 | "start": 40, 134 | "end": 44 135 | }, 136 | "release_group": { 137 | "value": "NoTV", 138 | "raw": "-NoTV", 139 | "start": 44, 140 | "end": 49 141 | }, 142 | "container": { 143 | "value": "avi", 144 | "raw": ".avi", 145 | "start": 49, 146 | "end": 53 147 | }, 148 | "mimetype": { 149 | "value": "video/x-msvideo", 150 | "start": 53, 151 | "end": 53 152 | }, 153 | "type": { 154 | "value": "episode", 155 | "start": 53, 156 | "end": 53 157 | } 158 | } 159 | -------------------------------------------------------------------------------- /docs/migration2to3.md: -------------------------------------------------------------------------------- 1 | Migration 2 | ========= 3 | 4 | Guessit 3 introduced breaking changes from previous versions. This file contains all the information required to migrate from the previous `2.x` version. 5 | 6 | API 7 | --- 8 | 9 | No changes. 10 | 11 | Properties 12 | ---------- 13 | 14 | Some properties have been renamed. 
15 | 16 | - `format` is now `source`. 17 | 18 | Values 19 | ------ 20 | 21 | The major changes in GuessIt 3 are around the values. Values were renamed in order to keep consistency and to be more intuitive. Acronyms are uppercase (e.g.: `HDTV`). Names follow the official name (e.g.: `Blu-ray`). Words have only the first letter capitalized (e.g.: `Camera`) except prepositions (e.g.: `on`) which are all lowercase. 22 | 23 | The following values were changed: 24 | 25 | ### `source` (former `format` property) 26 | 27 | - `Cam` is now `Camera` or `HD Camera` 28 | - `Telesync` is now `Telesync` or `HD Telesync` 29 | - `PPV` is now `Pay-per-view` 30 | - `DVB` is now `Digital TV` 31 | - `VOD` is now `Video on Demand` 32 | - `WEBRip` is now `Web` with additional property `other: Rip` 33 | - `WEB-DL` is now `Web` 34 | - `AHDTV` is now `Analog HDTV` 35 | - `UHDTV` is now `Ultra HDTV` 36 | - `HDTC` is now `HD Telecine` 37 | 38 | ### `screen_size` 39 | 40 | - `360i` was added. 41 | - `480i` was added. 42 | - `576i` was added. 43 | - `900i` was added. 44 | - `4K` is now `2160p` 45 | - `4320p` was added. 46 | 47 | ### `video_codec` 48 | 49 | - `h264` is now `H.264` 50 | - `h265` is now `H.265` 51 | - `Mpeg2` is now `MPEG-2` 52 | - `Real` is now `RealVideo` 53 | - `XviD` is now `Xvid` 54 | 55 | ### `video_profile` 56 | 57 | - `BP` is now `Baseline`. 58 | - `HP` is now `High`. 59 | - `XP` is now `Extended`. 60 | - `MP` is now `Main`. 61 | - `Hi422P` is now `High 4:2:2`. 62 | - `Hi444PP` is now `High 4:4:4 Predictive`. 63 | - `High 10` was added. 64 | - `8bit` was removed. `8bit` is detected as `color_depth: 8-bit` 65 | - `10bit` was removed. `10bit` is detected as `color_depth: 10-bit` 66 | 67 | ### `audio_codec` 68 | 69 | - `DTS-HD` was added. 70 | - `AC3` is now `Dolby Digital` 71 | - `EAC3` is now `Dolby Digital Plus` 72 | - `TrueHD` is now `Dolby TrueHD` 73 | - `DolbyAtmos` is now `Dolby Atmos`. 74 | 75 | ### `audio_profile` 76 | 77 | - `HE` is now `High Efficiency`. 
78 | - `LC` is now `Low Complexity`. 79 | - `HQ` is now `High Quality`. 80 | - `HDMA` is now `Master Audio`. 81 | 82 | ### `edition` 83 | 84 | - `Collector Edition` is now `Collector` 85 | - `Special Edition` is now `Special` 86 | - `Criterion Edition` is now `Criterion` 87 | - `Deluxe Edition` is now `Deluxe` 88 | - `Limited Edition` is now `Limited` 89 | - `Theatrical Edition` is now `Theatrical` 90 | - `Director's Definitive Cut` was added. 91 | 92 | ### `episode_details` 93 | 94 | - `Oav` and `Ova` were removed. They are now `other: Original Animated Video` 95 | - `Omake` is now `Extras` 96 | - `Final` was added. 97 | 98 | ### `other` 99 | 100 | - `Rip` was added. E.g.: `DVDRip` will output `other: Rip` 101 | - `DDC` was removed. `DDC` is now `edition: Director's Definitive Cut` 102 | - `CC` was removed. `CC` is now `edition: Criterion` 103 | - `FINAL` was removed. `FINAL` is now `episode_details: Final` 104 | - `Original Animated Video` was added. 105 | - `OV` is now `Original Video` 106 | - `AudioFix` is now `Audio Fixed` 107 | - `SyncFix` is now `Sync Fixed` 108 | - `DualAudio` is now `Dual Audio` 109 | - `Fansub` is now `Fan Subtitled` 110 | - `Fastsub` is now `Fast Subtitled` 111 | - `FullHD` is now `Full HD` 112 | - `UltraHD` is now `Ultra HD` 113 | - `mHD` and `HDLight` are now `Micro HD` 114 | - `HQ` is now `High Quality` 115 | - `HR` is now `High Resolution` 116 | - `LD` is now `Line Dubbed` 117 | - `MD` is now `Mic Dubbed` 118 | - `Low Definition` was added. 119 | - `LiNE` is now `Line Audio` 120 | - `R5` is now `Region 5` 121 | - `Region C` was added. 
122 | - `ReEncoded` is now `Reencoded` 123 | - `WideScreen` is now `Widescreen` 124 | 125 | -------------------------------------------------------------------------------- /docs/sources.md: -------------------------------------------------------------------------------- 1 | Getting the source code 2 | ======================= 3 | 4 | GuessIt is actively developed on [GitHub](https://github.com/guessit-io/guessit). 5 | 6 | You can either clone the public repository: 7 | 8 | $ git clone https://github.com/guessit-io/guessit.git 9 | 10 | download the [tarball](https://github.com/guessit-io/guessit/tarball/master): 11 | 12 | $ curl -L https://github.com/guessit-io/guessit/tarball/master -o guessit.tar.gz 13 | 14 | or download the [zipball](https://github.com/guessit-io/guessit/zipball/master): 15 | 16 | $ curl -L https://github.com/guessit-io/guessit/zipball/master -o guessit.zip 17 | 18 | Once you have a copy of the source, you can embed it in your Python package, or install it into your site-packages folder like this: 19 | 20 | $ python setup.py install 21 | 22 | or use it directly from the source folder for development: 23 | 24 | $ python setup.py develop 25 | -------------------------------------------------------------------------------- /guessit.spec: -------------------------------------------------------------------------------- 1 | # -*- mode: python -*- 2 | 3 | block_cipher = None 4 | 5 | import babelfish 6 | 7 | a = Analysis(['guessit/__main__.py'], 8 | pathex=[], 9 | binaries=[], 10 | datas=[ 11 | ('guessit/config/*', 'guessit/config'), 12 | ('guessit/data/*', 'guessit/data'), 13 | (babelfish.__path__[0] + '/data', 'babelfish/data') 14 | ], 15 | hiddenimports=[ 16 | 'pkg_resources.py2_warn', # https://github.com/pypa/setuptools/issues/1963 17 | 'babelfish.converters.alpha2', 18 | 'babelfish.converters.alpha3b', 19 | 'babelfish.converters.alpha3t', 20 | 'babelfish.converters.name', 21 | 'babelfish.converters.opensubtitles', 22 |
'babelfish.converters.countryname' 23 | ], 24 | hookspath=[], 25 | runtime_hooks=[], 26 | excludes=[], 27 | win_no_prefer_redirects=False, 28 | cipher=block_cipher) 29 | pyz = PYZ(a.pure, a.zipped_data, 30 | cipher=block_cipher) 31 | exe = EXE(pyz, 32 | a.scripts, 33 | a.binaries, 34 | a.zipfiles, 35 | a.datas, 36 | name='guessit', 37 | debug=False, 38 | strip=False, 39 | upx=False, 40 | runtime_tmpdir=None, 41 | console=True ) 42 | -------------------------------------------------------------------------------- /guessit/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Extracts as much information as possible from a video file. 5 | """ 6 | from . import monkeypatch as _monkeypatch 7 | 8 | from .api import guessit, GuessItApi 9 | from .options import ConfigurationException 10 | from .rules.common.quantity import Size 11 | 12 | from .__version__ import __version__ 13 | 14 | _monkeypatch.monkeypatch_rebulk() 15 | -------------------------------------------------------------------------------- /guessit/__main__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Entry point module 5 | """ 6 | # pragma: no cover 7 | import json 8 | import logging 9 | import sys 10 | 11 | from collections import OrderedDict 12 | 13 | from rebulk.__version__ import __version__ as __rebulk_version__ 14 | 15 | from guessit import api 16 | from guessit.__version__ import __version__ 17 | from guessit.jsonutils import GuessitEncoder 18 | from guessit.options import argument_parser, parse_options, load_config, merge_options 19 | 20 | 21 | def guess_filename(filename, options): 22 | """ 23 | Guess a single filename using given options 24 | :param filename: filename to parse 25 | :type filename: str 26 | :param options: 27 | :type options: dict 28 | :return: 29 | :rtype: 30 | """ 31 | if 
not options.get('yaml') and not options.get('json') and not options.get('show_property'): 32 | print('For:', filename) 33 | 34 | guess = api.guessit(filename, options) 35 | 36 | if options.get('show_property'): 37 | print(guess.get(options.get('show_property'), '')) 38 | return 39 | 40 | if options.get('json'): 41 | print(json.dumps(guess, cls=GuessitEncoder, ensure_ascii=False)) 42 | elif options.get('yaml'): 43 | # pylint:disable=import-outside-toplevel 44 | import yaml 45 | from guessit import yamlutils 46 | 47 | ystr = yaml.dump({filename: OrderedDict(guess)}, Dumper=yamlutils.CustomDumper, default_flow_style=False, 48 | allow_unicode=True) 49 | i = 0 50 | for yline in ystr.splitlines(): 51 | if i == 0: 52 | print("? " + yline[:-1]) 53 | elif i == 1: 54 | print(":" + yline[1:]) 55 | else: 56 | print(yline) 57 | i += 1 58 | else: 59 | print('GuessIt found:', json.dumps(guess, cls=GuessitEncoder, indent=4, ensure_ascii=False)) 60 | 61 | 62 | def display_properties(options): 63 | """ 64 | Display properties 65 | """ 66 | properties = api.properties(options) 67 | 68 | if options.get('json'): 69 | if options.get('values'): 70 | print(json.dumps(properties, cls=GuessitEncoder, ensure_ascii=False)) 71 | else: 72 | print(json.dumps(list(properties.keys()), cls=GuessitEncoder, ensure_ascii=False)) 73 | elif options.get('yaml'): 74 | # pylint:disable=import-outside-toplevel 75 | import yaml 76 | from guessit import yamlutils 77 | if options.get('values'): 78 | print(yaml.dump(properties, Dumper=yamlutils.CustomDumper, default_flow_style=False, allow_unicode=True)) 79 | else: 80 | print(yaml.dump(list(properties.keys()), Dumper=yamlutils.CustomDumper, default_flow_style=False, 81 | allow_unicode=True)) 82 | else: 83 | print('GuessIt properties:') 84 | 85 | properties_list = list(sorted(properties.keys())) 86 | for property_name in properties_list: 87 | property_values = properties.get(property_name) 88 | print(2 * ' ' + f'[+] {property_name}') 89 | if property_values and 
options.get('values'): 90 | for property_value in property_values: 91 | print(4 * ' ' + f'[!] {property_value}') 92 | 93 | 94 | def main(args=None): # pylint:disable=too-many-branches 95 | """ 96 | Main function for entry point 97 | """ 98 | if args is None: # pragma: no cover 99 | options = parse_options() 100 | else: 101 | options = parse_options(args) 102 | 103 | config = load_config(options) 104 | options = merge_options(config, options) 105 | 106 | if options.get('verbose'): 107 | logging.basicConfig(stream=sys.stdout, format='%(message)s') 108 | logging.getLogger().setLevel(logging.DEBUG) 109 | 110 | help_required = True 111 | 112 | if options.get('version'): 113 | print('+-------------------------------------------------------+') 114 | print('+ GuessIt ' + __version__ + (28 - len(__version__)) * ' ' + '+') 115 | print('+-------------------------------------------------------+') 116 | print('+ Rebulk ' + __rebulk_version__ + (29 - len(__rebulk_version__)) * ' ' + '+') 117 | print('+-------------------------------------------------------+') 118 | print('| Please report any bug or feature request at |') 119 | print('| https://github.com/guessit-io/guessit/issues. |') 120 | print('+-------------------------------------------------------+') 121 | help_required = False 122 | 123 | if options.get('yaml'): 124 | try: 125 | import yaml # pylint:disable=unused-variable,unused-import,import-outside-toplevel 126 | except ImportError: # pragma: no cover 127 | del options['yaml'] 128 | print('PyYAML is not installed. 
\'--yaml\' option will be ignored ...', file=sys.stderr) 129 | 130 | if options.get('properties') or options.get('values'): 131 | display_properties(options) 132 | help_required = False 133 | 134 | filenames = [] 135 | if options.get('filename'): 136 | for filename in options.get('filename'): 137 | filenames.append(filename) 138 | if options.get('input_file'): 139 | with open(options.get('input_file'), 'r', encoding='utf-8') as input_file: 140 | filenames.extend([line.strip() for line in input_file.readlines()]) 141 | 142 | filenames = list(filter(lambda f: f, filenames)) 143 | 144 | if filenames: 145 | for filename in filenames: 146 | help_required = False 147 | guess_filename(filename, options) 148 | 149 | if help_required: # pragma: no cover 150 | argument_parser.print_help() 151 | 152 | 153 | if __name__ == '__main__': # pragma: no cover 154 | main() 155 | -------------------------------------------------------------------------------- /guessit/__version__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Version module 5 | """ 6 | # pragma: no cover 7 | __version__ = '3.8.0' 8 | -------------------------------------------------------------------------------- /guessit/config/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Config module. 
3 | """ 4 | from importlib import import_module 5 | from typing import Any, List 6 | 7 | from rebulk import Rebulk 8 | 9 | _regex_prefix = 're:' 10 | _import_prefix = 'import:' 11 | _import_cache = {} 12 | _eval_prefix = 'eval:' 13 | _eval_cache = {} 14 | _pattern_types = ('regex', 'string') 15 | _default_module_names = { 16 | 'validator': 'guessit.rules.common.validators', 17 | 'formatter': 'guessit.rules.common.formatters' 18 | } 19 | 20 | 21 | def _process_option(name: str, value: Any): 22 | if name in ('validator', 'conflict_solver', 'formatter'): 23 | if isinstance(value, dict): 24 | return {item_key: _process_option(name, item_value) for item_key, item_value in value.items()} 25 | if value is not None: 26 | return _process_option_executable(value, _default_module_names.get(name)) 27 | return value 28 | 29 | 30 | def _import(value: str, default_module_name=None): 31 | if ':' in value: # 'module.path:target' form; otherwise fall back to the default module 32 | module_name, target = value.rsplit(':', 1) 33 | else: 34 | module_name = default_module_name 35 | target = value 36 | import_id = module_name + ":" + target 37 | if import_id in _import_cache: 38 | return _import_cache[import_id] 39 | 40 | mod = import_module(module_name) 41 | 42 | imported = mod 43 | for item in target.split("."): 44 | imported = getattr(imported, item) 45 | 46 | _import_cache[import_id] = imported 47 | 48 | return imported 49 | 50 | 51 | def _eval(value: str): 52 | compiled = _eval_cache.get(value) 53 | if not compiled: 54 | compiled = _eval_cache[value] = compile(value, '', 'eval') # cache the compiled expression 55 | return eval(compiled) # pylint:disable=eval-used 56 | 57 | 58 | def _process_option_executable(value: str, default_module_name=None): 59 | if value.startswith(_import_prefix): 60 | value = value[len(_import_prefix):] 61 | return _import(value, default_module_name) 62 | if value.startswith(_eval_prefix): 63 | value = value[len(_eval_prefix):] 64 | return _eval(value) 65 | if value.startswith('lambda ') or value.startswith('lambda:'): 66 | return _eval(value) 67 | return value 68 | 69 | 70 | def
_process_callable_entry(callable_spec: str, rebulk: Rebulk, entry: dict): 71 | _process_option_executable(callable_spec)(rebulk, **entry) 72 | 73 | 74 | def _build_entry_decl(entry, options, value): 75 | entry_decl = dict(options.get(None, {})) 76 | if not value.startswith('_'): 77 | entry_decl['value'] = value 78 | if isinstance(entry, str): 79 | if entry.startswith(_regex_prefix): 80 | entry_decl["regex"] = [entry[len(_regex_prefix):]] 81 | else: 82 | entry_decl["string"] = [entry] 83 | else: 84 | entry_decl.update(entry) 85 | if "pattern" in entry_decl: 86 | legacy_pattern = entry.pop("pattern") 87 | if legacy_pattern.startswith(_regex_prefix): 88 | entry_decl["regex"] = [legacy_pattern[len(_regex_prefix):]] 89 | else: 90 | entry_decl["string"] = [legacy_pattern] 91 | return entry_decl 92 | 93 | 94 | def load_patterns(rebulk: Rebulk, 95 | pattern_type: str, 96 | patterns: List[str], 97 | options: dict = None): 98 | """ 99 | Load patterns for a prepared config entry 100 | :param rebulk: Rebulk builder to use. 101 | :param pattern_type: Pattern type. 102 | :param patterns: Patterns 103 | :param options: kwargs options to pass to rebulk pattern function. 104 | :return: 105 | """ 106 | default_options = options.get(None) if options else None 107 | item_options = dict(default_options) if default_options else {} 108 | pattern_type_option = options.get(pattern_type) 109 | if pattern_type_option: 110 | item_options.update(pattern_type_option) 111 | item_options = {name: _process_option(name, value) for name, value in item_options.items()} 112 | getattr(rebulk, pattern_type)(*patterns, **item_options) 113 | 114 | 115 | def load_config_patterns(rebulk: Rebulk, 116 | config: dict, 117 | options: dict = None): 118 | """ 119 | Load patterns defined in given config. 120 | :param rebulk: Rebulk builder to use. 121 | :param config: dict containing pattern definition. 122 | :param options: Additional pattern options to use. 
123 | :type options: Dict[Dict[str, str]] A dict where key is the pattern type (regex, string, functional) and value is 124 | the default kwargs options to pass. 125 | :return: 126 | """ 127 | if options is None: 128 | options = {} 129 | 130 | for value, raw_entries in config.items(): 131 | entries = raw_entries if isinstance(raw_entries, list) else [raw_entries] 132 | for entry in entries: 133 | if isinstance(entry, dict) and "callable" in entry.keys(): 134 | _process_callable_entry(entry.pop("callable"), rebulk, entry) 135 | continue 136 | entry_decl = _build_entry_decl(entry, options, value) 137 | 138 | for pattern_type in _pattern_types: 139 | patterns = entry_decl.get(pattern_type) 140 | if not patterns: 141 | continue 142 | if not isinstance(patterns, list): 143 | patterns = [patterns] 144 | patterns_entry_decl = dict(entry_decl) 145 | 146 | for pattern_type_to_remove in _pattern_types: 147 | patterns_entry_decl.pop(pattern_type_to_remove, None) 148 | 149 | current_pattern_options = dict(options) 150 | current_pattern_options[None] = patterns_entry_decl 151 | 152 | load_patterns(rebulk, pattern_type, patterns, current_pattern_options) 153 | -------------------------------------------------------------------------------- /guessit/data/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Data 5 | """ 6 | -------------------------------------------------------------------------------- /guessit/data/tlds-alpha-by-domain.txt: -------------------------------------------------------------------------------- 1 | # Version 2013112900, Last Updated Fri Nov 29 07:07:01 2013 UTC 2 | AC 3 | AD 4 | AE 5 | AERO 6 | AF 7 | AG 8 | AI 9 | AL 10 | AM 11 | AN 12 | AO 13 | AQ 14 | AR 15 | ARPA 16 | AS 17 | ASIA 18 | AT 19 | AU 20 | AW 21 | AX 22 | AZ 23 | BA 24 | BB 25 | BD 26 | BE 27 | BF 28 | BG 29 | BH 30 | BI 31 | BIKE 32 | BIZ 33 | BJ 34 | BM 35 | BN 36 | BO 37 | BR 38 
| BS 39 | BT 40 | BV 41 | BW 42 | BY 43 | BZ 44 | CA 45 | CAMERA 46 | CAT 47 | CC 48 | CD 49 | CF 50 | CG 51 | CH 52 | CI 53 | CK 54 | CL 55 | CLOTHING 56 | CM 57 | CN 58 | CO 59 | COM 60 | CONSTRUCTION 61 | CONTRACTORS 62 | COOP 63 | CR 64 | CU 65 | CV 66 | CW 67 | CX 68 | CY 69 | CZ 70 | DE 71 | DIAMONDS 72 | DIRECTORY 73 | DJ 74 | DK 75 | DM 76 | DO 77 | DZ 78 | EC 79 | EDU 80 | EE 81 | EG 82 | ENTERPRISES 83 | EQUIPMENT 84 | ER 85 | ES 86 | ESTATE 87 | ET 88 | EU 89 | FI 90 | FJ 91 | FK 92 | FM 93 | FO 94 | FR 95 | GA 96 | GALLERY 97 | GB 98 | GD 99 | GE 100 | GF 101 | GG 102 | GH 103 | GI 104 | GL 105 | GM 106 | GN 107 | GOV 108 | GP 109 | GQ 110 | GR 111 | GRAPHICS 112 | GS 113 | GT 114 | GU 115 | GURU 116 | GW 117 | GY 118 | HK 119 | HM 120 | HN 121 | HOLDINGS 122 | HR 123 | HT 124 | HU 125 | ID 126 | IE 127 | IL 128 | IM 129 | IN 130 | INFO 131 | INT 132 | IO 133 | IQ 134 | IR 135 | IS 136 | IT 137 | JE 138 | JM 139 | JO 140 | JOBS 141 | JP 142 | KE 143 | KG 144 | KH 145 | KI 146 | KITCHEN 147 | KM 148 | KN 149 | KP 150 | KR 151 | KW 152 | KY 153 | KZ 154 | LA 155 | LAND 156 | LB 157 | LC 158 | LI 159 | LIGHTING 160 | LK 161 | LR 162 | LS 163 | LT 164 | LU 165 | LV 166 | LY 167 | MA 168 | MC 169 | MD 170 | ME 171 | MG 172 | MH 173 | MIL 174 | MK 175 | ML 176 | MM 177 | MN 178 | MO 179 | MOBI 180 | MP 181 | MQ 182 | MR 183 | MS 184 | MT 185 | MU 186 | MUSEUM 187 | MV 188 | MW 189 | MX 190 | MY 191 | MZ 192 | NA 193 | NAME 194 | NC 195 | NE 196 | NET 197 | NF 198 | NG 199 | NI 200 | NL 201 | NO 202 | NP 203 | NR 204 | NU 205 | NZ 206 | OM 207 | ORG 208 | PA 209 | PE 210 | PF 211 | PG 212 | PH 213 | PHOTOGRAPHY 214 | PK 215 | PL 216 | PLUMBING 217 | PM 218 | PN 219 | POST 220 | PR 221 | PRO 222 | PS 223 | PT 224 | PW 225 | PY 226 | QA 227 | RE 228 | RO 229 | RS 230 | RU 231 | RW 232 | SA 233 | SB 234 | SC 235 | SD 236 | SE 237 | SEXY 238 | SG 239 | SH 240 | SI 241 | SINGLES 242 | SJ 243 | SK 244 | SL 245 | SM 246 | SN 247 | SO 248 | SR 249 | ST 250 | SU 251 | 
SV 252 | SX 253 | SY 254 | SZ 255 | TATTOO 256 | TC 257 | TD 258 | TECHNOLOGY 259 | TEL 260 | TF 261 | TG 262 | TH 263 | TIPS 264 | TJ 265 | TK 266 | TL 267 | TM 268 | TN 269 | TO 270 | TODAY 271 | TP 272 | TR 273 | TRAVEL 274 | TT 275 | TV 276 | TW 277 | TZ 278 | UA 279 | UG 280 | UK 281 | US 282 | UY 283 | UZ 284 | VA 285 | VC 286 | VE 287 | VENTURES 288 | VG 289 | VI 290 | VN 291 | VOYAGE 292 | VU 293 | WF 294 | WS 295 | XN--3E0B707E 296 | XN--45BRJ9C 297 | XN--80AO21A 298 | XN--80ASEHDB 299 | XN--80ASWG 300 | XN--90A3AC 301 | XN--CLCHC0EA0B2G2A9GCD 302 | XN--FIQS8S 303 | XN--FIQZ9S 304 | XN--FPCRJ9C3D 305 | XN--FZC2C9E2C 306 | XN--GECRJ9C 307 | XN--H2BRJ9C 308 | XN--J1AMH 309 | XN--J6W193G 310 | XN--KPRW13D 311 | XN--KPRY57D 312 | XN--L1ACC 313 | XN--LGBBAT1AD8J 314 | XN--MGB9AWBF 315 | XN--MGBA3A4F16A 316 | XN--MGBAAM7A8H 317 | XN--MGBAYH7GPA 318 | XN--MGBBH1A71E 319 | XN--MGBC0A9AZCG 320 | XN--MGBERP4A5D4AR 321 | XN--MGBX4CD0AB 322 | XN--NGBC5AZD 323 | XN--O3CW4H 324 | XN--OGBPF8FL 325 | XN--P1AI 326 | XN--PGBS0DH 327 | XN--Q9JYB4C 328 | XN--S9BRJ9C 329 | XN--UNUP4Y 330 | XN--WGBH1C 331 | XN--WGBL6A 332 | XN--XKC2AL3HYE2A 333 | XN--XKC2DL3A5EE0H 334 | XN--YFRO4I67O 335 | XN--YGBI2AMMX 336 | XXX 337 | YE 338 | YT 339 | ZA 340 | ZM 341 | ZW 342 | -------------------------------------------------------------------------------- /guessit/jsonutils.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | JSON Utils 5 | """ 6 | import json 7 | 8 | from six import text_type 9 | from rebulk.match import Match 10 | 11 | class GuessitEncoder(json.JSONEncoder): 12 | """ 13 | JSON Encoder for guessit response 14 | """ 15 | 16 | def default(self, o): # pylint:disable=method-hidden 17 | if isinstance(o, Match): 18 | return o.advanced 19 | if hasattr(o, 'name'): # Babelfish languages/countries long name 20 | return text_type(o.name) 21 | # pragma: no cover 22 | return text_type(o) 
23 | -------------------------------------------------------------------------------- /guessit/monkeypatch.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Monkeypatch initialisation functions 5 | """ 6 | 7 | from collections import OrderedDict 8 | 9 | from rebulk.match import Match 10 | 11 | 12 | def monkeypatch_rebulk(): 13 | """Monkeypatch rebulk classes""" 14 | 15 | @property 16 | def match_advanced(self): 17 | """ 18 | Build advanced dict from match 19 | :param self: 20 | :return: 21 | """ 22 | 23 | ret = OrderedDict() 24 | ret['value'] = self.value 25 | if self.raw: 26 | ret['raw'] = self.raw 27 | ret['start'] = self.start 28 | ret['end'] = self.end 29 | return ret 30 | 31 | Match.advanced = match_advanced 32 | -------------------------------------------------------------------------------- /guessit/reutils.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Utils for re module 5 | """ 6 | 7 | from rebulk.remodule import re 8 | 9 | 10 | def build_or_pattern(patterns, name=None, escape=False): 11 | """ 12 | Build a or pattern string from a list of possible patterns 13 | 14 | :param patterns: 15 | :type patterns: 16 | :param name: 17 | :type name: 18 | :param escape: 19 | :type escape: 20 | :return: 21 | :rtype: 22 | """ 23 | or_pattern = [] 24 | for pattern in patterns: 25 | if not or_pattern: 26 | or_pattern.append('(?') 27 | if name: 28 | or_pattern.append(f'P<{name}>') 29 | else: 30 | or_pattern.append(':') 31 | else: 32 | or_pattern.append('|') 33 | or_pattern.append(f'(?:{re.escape(pattern)})' if escape else pattern) 34 | or_pattern.append(')') 35 | return ''.join(or_pattern) 36 | -------------------------------------------------------------------------------- /guessit/rules/__init__.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Rebulk object default builder 5 | """ 6 | from rebulk import Rebulk 7 | 8 | from .markers.path import path 9 | from .markers.groups import groups 10 | 11 | from .properties.episodes import episodes 12 | from .properties.container import container 13 | from .properties.source import source 14 | from .properties.video_codec import video_codec 15 | from .properties.audio_codec import audio_codec 16 | from .properties.screen_size import screen_size 17 | from .properties.website import website 18 | from .properties.date import date 19 | from .properties.title import title 20 | from .properties.episode_title import episode_title 21 | from .properties.language import language 22 | from .properties.country import country 23 | from .properties.release_group import release_group 24 | from .properties.streaming_service import streaming_service 25 | from .properties.other import other 26 | from .properties.size import size 27 | from .properties.bit_rate import bit_rate 28 | from .properties.edition import edition 29 | from .properties.cd import cd 30 | from .properties.bonus import bonus 31 | from .properties.film import film 32 | from .properties.part import part 33 | from .properties.crc import crc 34 | from .properties.mimetype import mimetype 35 | from .properties.type import type_ 36 | 37 | from .processors import processors 38 | 39 | 40 | def rebulk_builder(config): 41 | """ 42 | Default builder for main Rebulk object used by api. 
43 | :return: Main Rebulk object 44 | :rtype: Rebulk 45 | """ 46 | def _config(name): 47 | return config.get(name, {}) 48 | 49 | rebulk = Rebulk() 50 | 51 | common_words = frozenset(_config('common_words')) 52 | 53 | rebulk.rebulk(path(_config('path'))) 54 | rebulk.rebulk(groups(_config('groups'))) 55 | 56 | rebulk.rebulk(episodes(_config('episodes'))) 57 | rebulk.rebulk(container(_config('container'))) 58 | rebulk.rebulk(source(_config('source'))) 59 | rebulk.rebulk(video_codec(_config('video_codec'))) 60 | rebulk.rebulk(audio_codec(_config('audio_codec'))) 61 | rebulk.rebulk(screen_size(_config('screen_size'))) 62 | rebulk.rebulk(website(_config('website'))) 63 | rebulk.rebulk(date(_config('date'))) 64 | rebulk.rebulk(title(_config('title'))) 65 | rebulk.rebulk(episode_title(_config('episode_title'))) 66 | rebulk.rebulk(language(_config('language'), common_words)) 67 | rebulk.rebulk(country(_config('country'), common_words)) 68 | rebulk.rebulk(release_group(_config('release_group'))) 69 | rebulk.rebulk(streaming_service(_config('streaming_service'))) 70 | rebulk.rebulk(other(_config('other'))) 71 | rebulk.rebulk(size(_config('size'))) 72 | rebulk.rebulk(bit_rate(_config('bit_rate'))) 73 | rebulk.rebulk(edition(_config('edition'))) 74 | rebulk.rebulk(cd(_config('cd'))) 75 | rebulk.rebulk(bonus(_config('bonus'))) 76 | rebulk.rebulk(film(_config('film'))) 77 | rebulk.rebulk(part(_config('part'))) 78 | rebulk.rebulk(crc(_config('crc'))) 79 | 80 | rebulk.rebulk(processors(_config('processors'))) 81 | 82 | rebulk.rebulk(mimetype(_config('mimetype'))) 83 | rebulk.rebulk(type_(_config('type'))) 84 | 85 | def customize_properties(properties): 86 | """ 87 | Customize default rebulk properties 88 | """ 89 | count = properties['count'] 90 | del properties['count'] 91 | 92 | properties['season_count'] = count 93 | properties['episode_count'] = count 94 | 95 | return properties 96 | 97 | rebulk.customize_properties = customize_properties 98 | 99 | return rebulk 100 | 
-------------------------------------------------------------------------------- /guessit/rules/common/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Common module 5 | """ 6 | from rebulk.remodule import re 7 | 8 | seps = r' [](){}+*|=-_~#/\\.,;:' # list of tags/words separators 9 | seps_no_groups = seps.replace('[](){}', '') 10 | seps_no_fs = seps.replace('/', '').replace('\\', '') 11 | 12 | title_seps = r'-+/\|' # separators for title 13 | 14 | dash = (r'-', r'['+re.escape(seps_no_fs)+']') # abbreviation used by many rebulk objects. 15 | alt_dash = (r'@', r'['+re.escape(seps_no_fs)+']') # abbreviation used by many rebulk objects. 16 | 17 | 18 | def optional(pattern): 19 | """ 20 | Make a regex pattern optional 21 | """ 22 | return '(?:' + pattern + ')?' 23 | -------------------------------------------------------------------------------- /guessit/rules/common/comparators.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Comparators 5 | """ 6 | 7 | from functools import cmp_to_key 8 | 9 | 10 | def marker_comparator_predicate(match): 11 | """ 12 | Match predicate used in comparator 13 | """ 14 | return ( 15 | not match.private 16 | and match.name not in ('proper_count', 'title') 17 | and not (match.name == 'container' and 'extension' in match.tags) 18 | and not (match.name == 'other' and match.value == 'Rip') 19 | ) 20 | 21 | 22 | def marker_weight(matches, marker, predicate): 23 | """ 24 | Compute the comparator weight of a marker 25 | :param matches: 26 | :param marker: 27 | :param predicate: 28 | :return: 29 | """ 30 | return len(set(match.name for match in matches.range(*marker.span, predicate=predicate))) 31 | 32 | 33 | def marker_comparator(matches, markers, predicate): 34 | """ 35 | Builds a comparator that returns markers sorted from the most 
valuable to the less. 36 | 37 | Take the parts where matches count is higher, then when length is higher, then when position is at left. 38 | 39 | :param matches: 40 | :type matches: 41 | :param markers: 42 | :param predicate: 43 | :return: 44 | :rtype: 45 | """ 46 | 47 | def comparator(marker1, marker2): 48 | """ 49 | The actual comparator function. 50 | """ 51 | matches_count = marker_weight(matches, marker2, predicate) - marker_weight(matches, marker1, predicate) 52 | if matches_count: 53 | return matches_count 54 | 55 | # give preference to rightmost path 56 | return markers.index(marker2) - markers.index(marker1) 57 | 58 | return comparator 59 | 60 | 61 | def marker_sorted(markers, matches, predicate=marker_comparator_predicate): 62 | """ 63 | Sort markers from matches, from the most valuable to the less. 64 | 65 | :param markers: 66 | :type markers: 67 | :param matches: 68 | :type matches: 69 | :param predicate: 70 | :return: 71 | :rtype: 72 | """ 73 | return sorted(markers, key=cmp_to_key(marker_comparator(matches, markers, predicate=predicate))) 74 | -------------------------------------------------------------------------------- /guessit/rules/common/date.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Date 5 | """ 6 | from dateutil import parser 7 | 8 | from rebulk.remodule import re 9 | 10 | _dsep = r'[-/ \.]' 11 | _dsep_bis = r'[-/ \.x]' 12 | 13 | date_regexps = [ 14 | # pylint:disable=consider-using-f-string 15 | re.compile(r'%s((\d{8}))%s' % (_dsep, _dsep), re.IGNORECASE), 16 | # pylint:disable=consider-using-f-string 17 | re.compile(r'%s((\d{6}))%s' % (_dsep, _dsep), re.IGNORECASE), 18 | # pylint:disable=consider-using-f-string 19 | re.compile(r'(?:^|[^\d])((\d{2})%s(\d{1,2})%s(\d{1,2}))(?:$|[^\d])' % (_dsep, _dsep), re.IGNORECASE), 20 | # pylint:disable=consider-using-f-string 21 | re.compile(r'(?:^|[^\d])((\d{1,2})%s(\d{1,2})%s(\d{2}))(?:$|[^\d])' 
% (_dsep, _dsep), re.IGNORECASE), 22 | # pylint:disable=consider-using-f-string 23 | re.compile(r'(?:^|[^\d])((\d{4})%s(\d{1,2})%s(\d{1,2}))(?:$|[^\d])' % (_dsep_bis, _dsep), re.IGNORECASE), 24 | # pylint:disable=consider-using-f-string 25 | re.compile(r'(?:^|[^\d])((\d{1,2})%s(\d{1,2})%s(\d{4}))(?:$|[^\d])' % (_dsep, _dsep_bis), re.IGNORECASE), 26 | # pylint:disable=consider-using-f-string 27 | re.compile(r'(?:^|[^\d])((\d{1,2}(?:st|nd|rd|th)?%s(?:[a-z]{3,10})%s\d{4}))(?:$|[^\d])' % (_dsep, _dsep), 28 | # pylint:disable=consider-using-f-string 29 | re.IGNORECASE)] 30 | 31 | 32 | def valid_year(year): 33 | """Check if number is a valid year""" 34 | return 1920 <= year < 2030 35 | 36 | 37 | def valid_week(week): 38 | """Check if number is a valid week""" 39 | return 1 <= week < 53 40 | 41 | 42 | def _is_int(string): 43 | """ 44 | Check if the input string is an integer 45 | 46 | :param string: 47 | :type string: 48 | :return: 49 | :rtype: 50 | """ 51 | try: 52 | int(string) 53 | return True 54 | except ValueError: 55 | return False 56 | 57 | 58 | def _guess_day_first_parameter(groups): # pylint:disable=inconsistent-return-statements 59 | """ 60 | If day_first is not defined, use some heuristic to fix it. 61 | It helps to solve issues with python dateutils 2.5.3 parser changes. 62 | 63 | :param groups: match groups found for the date 64 | :type groups: list of match objects 65 | :return: day_first option guessed value 66 | :rtype: bool 67 | """ 68 | 69 | # If match starts with a long year, then day_first is force to false. 70 | if _is_int(groups[0]) and valid_year(int(groups[0][:4])): 71 | return False 72 | # If match ends with a long year, the day_first is forced to true. 73 | if _is_int(groups[-1]) and valid_year(int(groups[-1][-4:])): 74 | return True 75 | # If match starts with a short year, then day_first is force to false. 
76 | if _is_int(groups[0]) and int(groups[0][:2]) > 31: 77 | return False 78 | # If match ends with a short year, then day_first is forced to true. 79 | if _is_int(groups[-1]) and int(groups[-1][-2:]) > 31: 80 | return True 81 | 82 | 83 | def search_date(string, year_first=None, day_first=None): # pylint:disable=inconsistent-return-statements 84 | """Look for date patterns and, if one is found, return the group span and the date. 85 | 86 | Assumes there are sentinels at the beginning and end of the string that 87 | always allow matching a non-digit delimiting the date. 88 | 89 | The year may be given with two digits only. In that case, the nearest possible 90 | date from today is returned. 91 | 92 | >>> search_date(' This happened on 2002-04-22. ') 93 | (18, 28, datetime.date(2002, 4, 22)) 94 | 95 | >>> search_date(' And this on 17-06-1998. ') 96 | (13, 23, datetime.date(1998, 6, 17)) 97 | 98 | >>> search_date(' no date in here ') 99 | """ 100 | for date_re in date_regexps: 101 | search_match = date_re.search(string) 102 | if not search_match: 103 | continue 104 | 105 | start, end = search_match.start(1), search_match.end(1) 106 | groups = search_match.groups()[1:] 107 | match = '-'.join(groups) 108 | 109 | if match is None: 110 | continue 111 | 112 | if year_first and day_first is None: 113 | day_first = False 114 | 115 | if day_first is None: 116 | day_first = _guess_day_first_parameter(groups) 117 | 118 | # If day_first/year_first is undefined, parsing is attempted with both possible values.
119 | yearfirst_opts = [False, True] 120 | if year_first is not None: 121 | yearfirst_opts = [year_first] 122 | 123 | dayfirst_opts = [True, False] 124 | if day_first is not None: 125 | dayfirst_opts = [day_first] 126 | 127 | kwargs_list = ({'dayfirst': d, 'yearfirst': y} 128 | for d in dayfirst_opts for y in yearfirst_opts) 129 | for kwargs in kwargs_list: 130 | try: 131 | date = parser.parse(match, **kwargs) 132 | except (ValueError, TypeError): # pragma: no cover 133 | # see https://bugs.launchpad.net/dateutil/+bug/1247643 134 | date = None 135 | 136 | # check date plausibility 137 | if date and valid_year(date.year): # pylint:disable=no-member 138 | return start, end, date.date() # pylint:disable=no-member 139 | -------------------------------------------------------------------------------- /guessit/rules/common/expected.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Expected property factory 5 | """ 6 | from rebulk import Rebulk 7 | from rebulk.remodule import re 8 | from rebulk.utils import find_all 9 | 10 | from . import dash, seps 11 | 12 | 13 | def build_expected_function(context_key): 14 | """ 15 | Creates an expected property function 16 | :param context_key: 17 | :type context_key: 18 | :param cleanup: 19 | :type cleanup: 20 | :return: 21 | :rtype: 22 | """ 23 | 24 | def expected(input_string, context): 25 | """ 26 | Expected property functional pattern.
27 | :param input_string: 28 | :type input_string: 29 | :param context: 30 | :type context: 31 | :return: 32 | :rtype: 33 | """ 34 | ret = [] 35 | for search in context.get(context_key): 36 | if search.startswith('re:'): 37 | search = search[3:] 38 | search = search.replace(' ', '-') 39 | matches = Rebulk().regex(search, abbreviations=[dash], flags=re.IGNORECASE) \ 40 | .matches(input_string, context) 41 | for match in matches: 42 | ret.append(match.span) 43 | else: 44 | for sep in seps: 45 | input_string = input_string.replace(sep, ' ') 46 | search = search.replace(sep, ' ') 47 | for start in find_all(input_string, search, ignore_case=True): 48 | end = start + len(search) 49 | value = input_string[start:end] 50 | ret.append({'start': start, 'end': end, 'value': value}) 51 | return ret 52 | 53 | return expected 54 | -------------------------------------------------------------------------------- /guessit/rules/common/formatters.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Formatters 5 | """ 6 | from rebulk.formatters import formatters 7 | from rebulk.remodule import re 8 | from . import seps 9 | 10 | _excluded_clean_chars = ',:;-/\\' 11 | clean_chars = "" 12 | for sep in seps: 13 | if sep not in _excluded_clean_chars: 14 | clean_chars += sep 15 | 16 | 17 | def _potential_before(i, input_string): 18 | """ 19 | Check if the character at position i can be a potential single char separator considering what's before it. 20 | 21 | :param i: 22 | :type i: int 23 | :param input_string: 24 | :type input_string: str 25 | :return: 26 | :rtype: bool 27 | """ 28 | return i - 1 >= 0 and input_string[i] in seps and input_string[i - 2] in seps and input_string[i - 1] not in seps 29 | 30 | 31 | def _potential_after(i, input_string): 32 | """ 33 | Check if the character at position i can be a potential single char separator considering what's after it. 
34 | 35 | :param i: 36 | :type i: int 37 | :param input_string: 38 | :type input_string: str 39 | :return: 40 | :rtype: bool 41 | """ 42 | return i + 2 >= len(input_string) or \ 43 | input_string[i + 2] == input_string[i] and input_string[i + 1] not in seps 44 | 45 | 46 | def cleanup(input_string): 47 | """ 48 | Removes and strips separators from input_string (but keeps ',;' characters) 49 | 50 | It also keeps separators between single characters (Marvel's Agents of S.H.I.E.L.D.) 51 | 52 | :param input_string: 53 | :type input_string: str 54 | :return: 55 | :rtype: 56 | """ 57 | clean_string = input_string 58 | for char in clean_chars: 59 | clean_string = clean_string.replace(char, ' ') 60 | 61 | # Restore input separators if they separate single characters. 62 | # Useful for Marvel's Agents of S.H.I.E.L.D. 63 | # https://github.com/guessit-io/guessit/issues/278 64 | 65 | indices = [i for i, letter in enumerate(clean_string) if letter in seps] 66 | 67 | dots = set() 68 | if indices: 69 | clean_list = list(clean_string) 70 | 71 | potential_indices = [] 72 | 73 | for i in indices: 74 | if _potential_before(i, input_string) and _potential_after(i, input_string): 75 | potential_indices.append(i) 76 | 77 | replace_indices = [] 78 | 79 | for potential_index in potential_indices: 80 | if potential_index - 2 in potential_indices or potential_index + 2 in potential_indices: 81 | replace_indices.append(potential_index) 82 | 83 | if replace_indices: 84 | for replace_index in replace_indices: 85 | dots.add(input_string[replace_index]) 86 | clean_list[replace_index] = input_string[replace_index] 87 | clean_string = ''.join(clean_list) 88 | 89 | clean_string = strip(clean_string, ''.join([c for c in seps if c not in dots])) 90 | 91 | clean_string = re.sub(' +', ' ', clean_string) 92 | return clean_string 93 | 94 | 95 | def strip(input_string, chars=seps): 96 | """ 97 | Strip separators from input_string 98 | :param input_string: 99 | :param chars: 100 | :type input_string: 101 | :return:
102 | :rtype: 103 | """ 104 | return input_string.strip(chars) 105 | 106 | 107 | def raw_cleanup(raw): 108 | """ 109 | Cleanup a raw value to perform raw comparison 110 | :param raw: 111 | :type raw: 112 | :return: 113 | :rtype: 114 | """ 115 | return formatters(cleanup, strip)(raw.lower()) 116 | 117 | 118 | def reorder_title(title, articles=('the',), separators=(',', ', ')): 119 | """ 120 | Reorder the title 121 | :param title: 122 | :type title: 123 | :param articles: 124 | :type articles: 125 | :param separators: 126 | :type separators: 127 | :return: 128 | :rtype: 129 | """ 130 | ltitle = title.lower() 131 | for article in articles: 132 | for separator in separators: 133 | suffix = separator + article 134 | if ltitle[-len(suffix):] == suffix: 135 | return title[-len(suffix) + len(separator):] + ' ' + title[:-len(suffix)] 136 | return title 137 | -------------------------------------------------------------------------------- /guessit/rules/common/numeral.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | parse numeral from various formats 5 | """ 6 | from rebulk.remodule import re 7 | 8 | digital_numeral = r'\d{1,4}' 9 | 10 | roman_numeral = r'(?=[MCDLXVI]+)M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})' 11 | 12 | english_word_numeral_list = [ 13 | 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 14 | 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen', 'twenty' 15 | ] 16 | 17 | french_word_numeral_list = [ 18 | 'zéro', 'un', 'deux', 'trois', 'quatre', 'cinq', 'six', 'sept', 'huit', 'neuf', 'dix', 19 | 'onze', 'douze', 'treize', 'quatorze', 'quinze', 'seize', 'dix-sept', 'dix-huit', 'dix-neuf', 'vingt' 20 | ] 21 | 22 | french_alt_word_numeral_list = [ 23 | 'zero', 'une', 'deux', 'trois', 'quatre', 'cinq', 'six', 'sept', 'huit', 'neuf', 'dix', 24 | 'onze', 
'douze', 'treize', 'quatorze', 'quinze', 'seize', 'dixsept', 'dixhuit', 'dixneuf', 'vingt' 25 | ] 26 | 27 | 28 | def __build_word_numeral(*args): 29 | """ 30 | Build word numeral regexp from list. 31 | 32 | :param args: 33 | :type args: 34 | :param kwargs: 35 | :type kwargs: 36 | :return: 37 | :rtype: 38 | """ 39 | re_ = None 40 | for word_list in args: 41 | for word in word_list: 42 | if not re_: 43 | re_ = r'(?:(?=\w+)' 44 | else: 45 | re_ += '|' 46 | re_ += word 47 | re_ += ')' 48 | return re_ 49 | 50 | 51 | word_numeral = __build_word_numeral(english_word_numeral_list, french_word_numeral_list, french_alt_word_numeral_list) 52 | 53 | numeral = '(?:' + digital_numeral + '|' + roman_numeral + '|' + word_numeral + ')' 54 | 55 | __romanNumeralMap = ( 56 | ('M', 1000), 57 | ('CM', 900), 58 | ('D', 500), 59 | ('CD', 400), 60 | ('C', 100), 61 | ('XC', 90), 62 | ('L', 50), 63 | ('XL', 40), 64 | ('X', 10), 65 | ('IX', 9), 66 | ('V', 5), 67 | ('IV', 4), 68 | ('I', 1) 69 | ) 70 | 71 | __romanNumeralPattern = re.compile('^' + roman_numeral + '$') 72 | 73 | 74 | def __parse_roman(value): 75 | """ 76 | convert Roman numeral to integer 77 | 78 | :param value: Value to parse 79 | :type value: string 80 | :return: 81 | :rtype: 82 | """ 83 | if not __romanNumeralPattern.search(value): 84 | raise ValueError(f'Invalid Roman numeral: {value}') 85 | 86 | result = 0 87 | index = 0 88 | for num, integer in __romanNumeralMap: 89 | while value[index:index + len(num)] == num: 90 | result += integer 91 | index += len(num) 92 | return result 93 | 94 | 95 | def __parse_word(value): 96 | """ 97 | Convert Word numeral to integer 98 | 99 | :param value: Value to parse 100 | :type value: string 101 | :return: 102 | :rtype: 103 | """ 104 | for word_list in [english_word_numeral_list, french_word_numeral_list, french_alt_word_numeral_list]: 105 | try: 106 | return word_list.index(value.lower()) 107 | except ValueError: 108 | pass 109 | raise ValueError # pragma: no cover 110 | 111 | 112 | 
_clean_re = re.compile(r'[^\d]*(\d+)[^\d]*') 113 | 114 | 115 | def parse_numeral(value, int_enabled=True, roman_enabled=True, word_enabled=True, clean=True): 116 | """ 117 | Parse a numeric value into an integer. 118 | 119 | :param value: Value to parse. Can be an integer, roman numeral or word. 120 | :type value: string 121 | :param int_enabled: 122 | :type int_enabled: 123 | :param roman_enabled: 124 | :type roman_enabled: 125 | :param word_enabled: 126 | :type word_enabled: 127 | :param clean: 128 | :type clean: 129 | :return: Numeric value (a ValueError is raised if value can't be parsed) 130 | :rtype: int 131 | """ 132 | # pylint: disable=too-many-branches 133 | if int_enabled: 134 | try: 135 | if clean: 136 | match = _clean_re.match(value) 137 | if match: 138 | clean_value = match.group(1) 139 | return int(clean_value) 140 | return int(value) 141 | except ValueError: 142 | pass 143 | if roman_enabled: 144 | try: 145 | if clean: 146 | for word in value.split(): 147 | try: 148 | return __parse_roman(word.upper()) 149 | except ValueError: 150 | pass 151 | return __parse_roman(value) 152 | except ValueError: 153 | pass 154 | if word_enabled: 155 | try: 156 | if clean: 157 | for word in value.split(): 158 | try: 159 | return __parse_word(word) 160 | except ValueError: # pragma: no cover 161 | pass 162 | return __parse_word(value) # pragma: no cover 163 | except ValueError: # pragma: no cover 164 | pass 165 | raise ValueError('Invalid numeral: ' + value) # pragma: no cover 166 | -------------------------------------------------------------------------------- /guessit/rules/common/pattern.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Pattern utility functions 5 | """ 6 | 7 | 8 | def is_disabled(context, name): 9 | """Whether a specific pattern is disabled.
10 | 11 | The context object might define an inclusion list (includes) or an exclusion list (excludes). 12 | A pattern is considered disabled if it's found in the exclusion list, or 13 | if the inclusion list is defined and not empty and the pattern is not found in it. 14 | 15 | :param context: 16 | :param name: 17 | :return: 18 | """ 19 | if not context: 20 | return False 21 | 22 | excludes = context.get('excludes') 23 | if excludes and name in excludes: 24 | return True 25 | 26 | includes = context.get('includes') 27 | return includes and name not in includes 28 | -------------------------------------------------------------------------------- /guessit/rules/common/quantity.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Quantities: Size 5 | """ 6 | from abc import abstractmethod 7 | 8 | from rebulk.remodule import re 9 | 10 | from ..common import seps 11 | 12 | 13 | class Quantity: 14 | """ 15 | Represent a quantity object with magnitude and units. 16 | """ 17 | 18 | parser_re = re.compile(r'(?P<magnitude>\d+(?:[.]\d+)?)(?P<units>[^\d]+)?') 19 | 20 | def __init__(self, magnitude, units): 21 | self.magnitude = magnitude 22 | self.units = units 23 | 24 | @classmethod 25 | @abstractmethod 26 | def parse_units(cls, value): 27 | """ 28 | Parse a string to a proper unit notation. 29 | """ 30 | raise NotImplementedError 31 | 32 | @classmethod 33 | def fromstring(cls, string): 34 | """ 35 | Parse the string into a quantity object.
36 | :param string: 37 | :return: 38 | """ 39 | values = cls.parser_re.match(string).groupdict() 40 | try: 41 | magnitude = int(values['magnitude']) 42 | except ValueError: 43 | magnitude = float(values['magnitude']) 44 | units = cls.parse_units(values['units']) 45 | 46 | return cls(magnitude, units) 47 | 48 | def __hash__(self): 49 | return hash(str(self)) 50 | 51 | def __eq__(self, other): 52 | if isinstance(other, str): 53 | return str(self) == other 54 | if not isinstance(other, self.__class__): 55 | return NotImplemented 56 | return self.magnitude == other.magnitude and self.units == other.units 57 | 58 | def __ne__(self, other): 59 | return not self == other 60 | 61 | def __repr__(self): 62 | return f'<{self.__class__.__name__} [{self}]>' 63 | 64 | def __str__(self): 65 | return f'{self.magnitude}{self.units}' 66 | 67 | 68 | class Size(Quantity): 69 | """ 70 | Represent size. 71 | 72 | e.g.: 1.1GB, 300MB 73 | """ 74 | 75 | @classmethod 76 | def parse_units(cls, value): 77 | return value.strip(seps).upper() 78 | 79 | 80 | class BitRate(Quantity): 81 | """ 82 | Represent bit rate. 83 | 84 | e.g.: 320Kbps, 1.5Mbps 85 | """ 86 | 87 | @classmethod 88 | def parse_units(cls, value): 89 | value = value.strip(seps).capitalize() 90 | for token in ('bits', 'bit'): 91 | value = value.replace(token, 'bps') 92 | 93 | return value 94 | 95 | 96 | class FrameRate(Quantity): 97 | """ 98 | Represent frame rate. 99 | 100 | e.g.: 24fps, 60fps 101 | """ 102 | 103 | @classmethod 104 | def parse_units(cls, value): 105 | return 'fps' 106 | -------------------------------------------------------------------------------- /guessit/rules/common/validators.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Validators 5 | """ 6 | from functools import partial 7 | 8 | from rebulk.validators import chars_before, chars_after, chars_surround 9 | from . 
import seps 10 | 11 | seps_before = partial(chars_before, seps) 12 | seps_after = partial(chars_after, seps) 13 | seps_surround = partial(chars_surround, seps) 14 | 15 | 16 | def int_coercable(string): 17 | """ 18 | Check if string can be coerced to int 19 | :param string: 20 | :type string: 21 | :return: 22 | :rtype: 23 | """ 24 | try: 25 | int(string) 26 | return True 27 | except ValueError: 28 | return False 29 | 30 | 31 | def and_(*validators): 32 | """ 33 | Compose validators functions 34 | :param validators: 35 | :type validators: 36 | :return: 37 | :rtype: 38 | """ 39 | def composed(string): 40 | """ 41 | Composed validators function 42 | :param string: 43 | :type string: 44 | :return: 45 | :rtype: 46 | """ 47 | for validator in validators: 48 | if not validator(string): 49 | return False 50 | return True 51 | return composed 52 | 53 | 54 | def or_(*validators): 55 | """ 56 | Compose validators functions 57 | :param validators: 58 | :type validators: 59 | :return: 60 | :rtype: 61 | """ 62 | def composed(string): 63 | """ 64 | Composed validators function 65 | :param string: 66 | :type string: 67 | :return: 68 | :rtype: 69 | """ 70 | for validator in validators: 71 | if validator(string): 72 | return True 73 | return False 74 | return composed 75 | -------------------------------------------------------------------------------- /guessit/rules/common/words.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Words utils 5 | """ 6 | from collections import namedtuple 7 | 8 | from . 
import seps 9 | 10 | _Word = namedtuple('_Word', ['span', 'value']) 11 | 12 | 13 | def iter_words(string): 14 | """ 15 | Iterate on all words in a string 16 | :param string: 17 | :type string: 18 | :return: 19 | :rtype: iterable[str] 20 | """ 21 | i = 0 22 | last_sep_index = -1 23 | inside_word = False 24 | for char in string: 25 | if ord(char) < 128 and char in seps: # Make sure we don't exclude unicode characters. 26 | if inside_word: 27 | yield _Word(span=(last_sep_index+1, i), value=string[last_sep_index+1:i]) 28 | inside_word = False 29 | last_sep_index = i 30 | else: 31 | inside_word = True 32 | i += 1 33 | if inside_word: 34 | yield _Word(span=(last_sep_index+1, i), value=string[last_sep_index+1:i]) 35 | -------------------------------------------------------------------------------- /guessit/rules/markers/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Markers 5 | """ 6 | -------------------------------------------------------------------------------- /guessit/rules/markers/groups.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Groups markers (...), [...] and {...} 5 | """ 6 | from rebulk import Rebulk 7 | 8 | from ...options import ConfigurationException 9 | 10 | def groups(config): 11 | """ 12 | Builder for rebulk object. 13 | 14 | :param config: rule configuration 15 | :type config: dict 16 | :return: Created Rebulk object 17 | :rtype: Rebulk 18 | """ 19 | rebulk = Rebulk() 20 | rebulk.defaults(name="group", marker=True) 21 | 22 | starting = config['starting'] 23 | ending = config['ending'] 24 | 25 | if len(starting) != len(ending): 26 | raise ConfigurationException("Starting and ending groups must have the same length") 27 | 28 | def mark_groups(input_string): 29 | """ 30 | Functional pattern to mark groups (...), [...] and {...}. 
31 | 32 | :param input_string: 33 | :return: 34 | """ 35 | openings = ([], ) * len(starting) 36 | i = 0 37 | 38 | ret = [] 39 | for char in input_string: 40 | start_type = starting.find(char) 41 | if start_type > -1: 42 | openings[start_type].append(i) 43 | 44 | i += 1 45 | 46 | end_type = ending.find(char) 47 | if end_type > -1: 48 | try: 49 | start_index = openings[end_type].pop() 50 | ret.append((start_index, i)) 51 | except IndexError: 52 | pass 53 | return ret 54 | 55 | rebulk.functional(mark_groups) 56 | return rebulk 57 | -------------------------------------------------------------------------------- /guessit/rules/markers/path.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Path markers 5 | """ 6 | from rebulk import Rebulk 7 | 8 | from rebulk.utils import find_all 9 | 10 | 11 | def path(config): # pylint:disable=unused-argument 12 | """ 13 | Builder for rebulk object. 14 | 15 | :param config: rule configuration 16 | :type config: dict 17 | :return: Created Rebulk object 18 | :rtype: Rebulk 19 | """ 20 | rebulk = Rebulk() 21 | rebulk.defaults(name="path", marker=True) 22 | 23 | def mark_path(input_string, context): 24 | """ 25 | Functional pattern to mark path elements. 
26 | 27 | :param input_string: 28 | :param context: 29 | :return: 30 | """ 31 | ret = [] 32 | if context.get('name_only', False): 33 | ret.append((0, len(input_string))) 34 | else: 35 | indices = list(find_all(input_string, '/')) 36 | indices += list(find_all(input_string, '\\')) 37 | indices += [-1, len(input_string)] 38 | 39 | indices.sort() 40 | 41 | for i in range(0, len(indices) - 1): 42 | ret.append((indices[i] + 1, indices[i + 1])) 43 | 44 | return ret 45 | 46 | rebulk.functional(mark_path) 47 | return rebulk 48 | -------------------------------------------------------------------------------- /guessit/rules/match_processors.py: -------------------------------------------------------------------------------- 1 | """ 2 | Match processors 3 | """ 4 | from guessit.rules.common import seps 5 | 6 | 7 | def strip(match, chars=seps): 8 | """ 9 | Strip given characters from match. 10 | 11 | :param chars: 12 | :param match: 13 | :return: 14 | """ 15 | while match.input_string[match.start] in chars: 16 | match.start += 1 17 | while match.input_string[match.end - 1] in chars: 18 | match.end -= 1 19 | if not match: 20 | return False 21 | return None 22 | -------------------------------------------------------------------------------- /guessit/rules/properties/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Properties 5 | """ 6 | -------------------------------------------------------------------------------- /guessit/rules/properties/audio_codec.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | audio_codec, audio_profile and audio_channels property 5 | """ 6 | from rebulk import Rebulk, Rule, RemoveMatch 7 | from rebulk.remodule import re 8 | 9 | from ..common import dash 10 | from ..common.pattern import is_disabled 11 | from ..common.validators import 
seps_before, seps_after 12 | from ...config import load_config_patterns 13 | 14 | audio_properties = ['audio_codec', 'audio_profile', 'audio_channels'] 15 | 16 | 17 | def audio_codec(config): # pylint:disable=unused-argument 18 | """ 19 | Builder for rebulk object. 20 | 21 | :param config: rule configuration 22 | :type config: dict 23 | :return: Created Rebulk object 24 | :rtype: Rebulk 25 | """ 26 | rebulk = Rebulk() \ 27 | .regex_defaults(flags=re.IGNORECASE, abbreviations=[dash]) \ 28 | .string_defaults(ignore_case=True) 29 | 30 | def audio_codec_priority(match1, match2): 31 | """ 32 | Gives priority to audio_codec 33 | :param match1: 34 | :type match1: 35 | :param match2: 36 | :type match2: 37 | :return: 38 | :rtype: 39 | """ 40 | if match1.name == 'audio_codec' and match2.name in ['audio_profile', 'audio_channels']: 41 | return match2 42 | if match1.name in ['audio_profile', 'audio_channels'] and match2.name == 'audio_codec': 43 | return match1 44 | return '__default__' 45 | 46 | rebulk.defaults(name='audio_codec', 47 | conflict_solver=audio_codec_priority, 48 | disabled=lambda context: is_disabled(context, 'audio_codec')) 49 | 50 | load_config_patterns(rebulk, config.get('audio_codec')) 51 | 52 | rebulk.defaults(clear=True, 53 | name='audio_profile', 54 | disabled=lambda context: is_disabled(context, 'audio_profile')) 55 | 56 | load_config_patterns(rebulk, config.get('audio_profile')) 57 | 58 | rebulk.defaults(clear=True, 59 | name="audio_channels", 60 | disabled=lambda context: is_disabled(context, 'audio_channels')) 61 | 62 | load_config_patterns(rebulk, config.get('audio_channels')) 63 | 64 | rebulk.rules(DtsHDRule, DtsRule, AacRule, DolbyDigitalRule, AudioValidatorRule, HqConflictRule, 65 | AudioChannelsValidatorRule) 66 | 67 | return rebulk 68 | 69 | 70 | class AudioValidatorRule(Rule): 71 | """ 72 | Remove audio properties if not surrounded by separators and not next to each other 73 | """ 74 | priority = 64 75 | consequence = RemoveMatch 76 | 77 | def
when(self, matches, context): 78 | ret = [] 79 | 80 | audio_list = matches.range(predicate=lambda match: match.name in audio_properties) 81 | for audio in audio_list: 82 | if not seps_before(audio): 83 | valid_before = matches.range(audio.start - 1, audio.start, 84 | lambda match: match.name in audio_properties) 85 | if not valid_before: 86 | ret.append(audio) 87 | continue 88 | if not seps_after(audio): 89 | valid_after = matches.range(audio.end, audio.end + 1, 90 | lambda match: match.name in audio_properties) 91 | if not valid_after: 92 | ret.append(audio) 93 | continue 94 | 95 | return ret 96 | 97 | 98 | class AudioProfileRule(Rule): 99 | """ 100 | Abstract rule to validate audio profiles 101 | """ 102 | priority = 64 103 | dependency = AudioValidatorRule 104 | consequence = RemoveMatch 105 | 106 | def __init__(self, codec): 107 | super().__init__() 108 | self.codec = codec 109 | 110 | def enabled(self, context): 111 | return not is_disabled(context, 'audio_profile') 112 | 113 | def when(self, matches, context): 114 | profile_list = matches.named('audio_profile', 115 | lambda match: 'audio_profile.rule' in match.tags and 116 | self.codec in match.tags) 117 | ret = [] 118 | for profile in profile_list: 119 | codec = matches.at_span(profile.span, 120 | lambda match: match.name == 'audio_codec' and 121 | match.value == self.codec, 0) 122 | if not codec: 123 | codec = matches.previous(profile, 124 | lambda match: match.name == 'audio_codec' and 125 | match.value == self.codec) 126 | if not codec: 127 | codec = matches.next(profile, 128 | lambda match: match.name == 'audio_codec' and 129 | match.value == self.codec) 130 | if not codec: 131 | ret.append(profile) 132 | if codec: 133 | ret.extend(matches.conflicting(profile)) 134 | return ret 135 | 136 | 137 | class DtsHDRule(AudioProfileRule): 138 | """ 139 | Rule to validate DTS-HD profile 140 | """ 141 | 142 | def __init__(self): 143 | super().__init__('DTS-HD') 144 | 145 | 146 | class DtsRule(AudioProfileRule): 147 
| """ 148 | Rule to validate DTS profile 149 | """ 150 | 151 | def __init__(self): 152 | super().__init__('DTS') 153 | 154 | 155 | class AacRule(AudioProfileRule): 156 | """ 157 | Rule to validate AAC profile 158 | """ 159 | 160 | def __init__(self): 161 | super().__init__('AAC') 162 | 163 | 164 | class DolbyDigitalRule(AudioProfileRule): 165 | """ 166 | Rule to validate Dolby Digital profile 167 | """ 168 | 169 | def __init__(self): 170 | super().__init__('Dolby Digital') 171 | 172 | 173 | class HqConflictRule(Rule): 174 | """ 175 | Solve conflict between HQ from other property and from audio_profile. 176 | """ 177 | 178 | dependency = [DtsHDRule, DtsRule, AacRule, DolbyDigitalRule] 179 | consequence = RemoveMatch 180 | 181 | def enabled(self, context): 182 | return not is_disabled(context, 'audio_profile') 183 | 184 | def when(self, matches, context): 185 | hq_audio = matches.named('audio_profile', lambda m: m.value == 'High Quality') 186 | hq_audio_spans = [match.span for match in hq_audio] 187 | return matches.named('other', lambda m: m.span in hq_audio_spans) 188 | 189 | 190 | class AudioChannelsValidatorRule(Rule): 191 | """ 192 | Remove audio_channel if no audio codec as previous match. 
193 | """ 194 | priority = 128 195 | consequence = RemoveMatch 196 | 197 | def enabled(self, context): 198 | return not is_disabled(context, 'audio_channels') 199 | 200 | def when(self, matches, context): 201 | ret = [] 202 | 203 | for audio_channel in matches.tagged('weak-audio_channels'): 204 | valid_before = matches.range(audio_channel.start - 1, audio_channel.start, 205 | lambda match: match.name == 'audio_codec') 206 | if not valid_before: 207 | ret.append(audio_channel) 208 | 209 | return ret 210 | -------------------------------------------------------------------------------- /guessit/rules/properties/bit_rate.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | video_bit_rate and audio_bit_rate properties 5 | """ 6 | from rebulk import Rebulk 7 | from rebulk.remodule import re 8 | from rebulk.rules import Rule, RemoveMatch, RenameMatch 9 | 10 | from ..common import dash, seps 11 | from ..common.pattern import is_disabled 12 | from ..common.validators import seps_surround 13 | from ...config import load_config_patterns 14 | 15 | 16 | def bit_rate(config): # pylint:disable=unused-argument 17 | """ 18 | Builder for rebulk object. 19 | 20 | :param config: rule configuration 21 | :type config: dict 22 | :return: Created Rebulk object 23 | :rtype: Rebulk 24 | """ 25 | rebulk = Rebulk(disabled=lambda context: (is_disabled(context, 'audio_bit_rate') 26 | and is_disabled(context, 'video_bit_rate'))) 27 | rebulk = rebulk.regex_defaults(flags=re.IGNORECASE, abbreviations=[dash]) 28 | rebulk.defaults(name='audio_bit_rate', validator=seps_surround) 29 | 30 | load_config_patterns(rebulk, config.get('bit_rate')) 31 | 32 | rebulk.rules(BitRateTypeRule) 33 | 34 | return rebulk 35 | 36 | 37 | class BitRateTypeRule(Rule): 38 | """ 39 | Convert audio bit rate guess into video bit rate. 
40 | """ 41 | consequence = [RenameMatch('video_bit_rate'), RemoveMatch] 42 | 43 | def when(self, matches, context): 44 | to_rename = [] 45 | to_remove = [] 46 | 47 | if is_disabled(context, 'audio_bit_rate'): 48 | to_remove.extend(matches.named('audio_bit_rate')) 49 | else: 50 | video_bit_rate_disabled = is_disabled(context, 'video_bit_rate') 51 | for match in matches.named('audio_bit_rate'): 52 | previous = matches.previous(match, index=0, 53 | predicate=lambda m: m.name in ('source', 'screen_size', 'video_codec')) 54 | if previous and not matches.holes(previous.end, match.start, predicate=lambda m: m.value.strip(seps)): 55 | after = matches.next(match, index=0, predicate=lambda m: m.name == 'audio_codec') 56 | if after and not matches.holes(match.end, after.start, predicate=lambda m: m.value.strip(seps)): 57 | bitrate = match.value 58 | if bitrate.units == 'Kbps' or (bitrate.units == 'Mbps' and bitrate.magnitude < 10): 59 | continue 60 | 61 | if video_bit_rate_disabled: 62 | to_remove.append(match) 63 | else: 64 | to_rename.append(match) 65 | 66 | if to_rename or to_remove: 67 | return to_rename, to_remove 68 | return False 69 | -------------------------------------------------------------------------------- /guessit/rules/properties/bonus.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | bonus property 5 | """ 6 | from rebulk import Rebulk, AppendMatch, Rule 7 | from rebulk.remodule import re 8 | 9 | from .title import TitleFromPosition 10 | from ..common.formatters import cleanup 11 | from ..common.pattern import is_disabled 12 | from ...config import load_config_patterns 13 | 14 | 15 | def bonus(config): # pylint:disable=unused-argument 16 | """ 17 | Builder for rebulk object. 
18 | 19 | :param config: rule configuration 20 | :type config: dict 21 | :return: Created Rebulk object 22 | :rtype: Rebulk 23 | """ 24 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'bonus')) 25 | rebulk = rebulk.regex_defaults(name='bonus', flags=re.IGNORECASE) 26 | 27 | load_config_patterns(rebulk, config.get('bonus')) 28 | 29 | rebulk.rules(BonusTitleRule) 30 | 31 | return rebulk 32 | 33 | 34 | class BonusTitleRule(Rule): 35 | """ 36 | Find bonus title after bonus. 37 | """ 38 | dependency = TitleFromPosition 39 | consequence = AppendMatch 40 | 41 | properties = {'bonus_title': [None]} 42 | 43 | def when(self, matches, context): # pylint:disable=inconsistent-return-statements 44 | bonus_number = matches.named('bonus', lambda match: not match.private, index=0) 45 | if bonus_number: 46 | filepath = matches.markers.at_match(bonus_number, lambda marker: marker.name == 'path', 0) 47 | hole = matches.holes(bonus_number.end, filepath.end + 1, formatter=cleanup, index=0) 48 | if hole and hole.value: 49 | hole.name = 'bonus_title' 50 | return hole 51 | -------------------------------------------------------------------------------- /guessit/rules/properties/cd.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | cd and cd_count properties 5 | """ 6 | from rebulk import Rebulk 7 | from rebulk.remodule import re 8 | 9 | from ..common import dash 10 | from ..common.pattern import is_disabled 11 | from ...config import load_config_patterns 12 | 13 | 14 | def cd(config): # pylint:disable=unused-argument,invalid-name 15 | """ 16 | Builder for rebulk object. 
17 | 18 | :param config: rule configuration 19 | :type config: dict 20 | :return: Created Rebulk object 21 | :rtype: Rebulk 22 | """ 23 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'cd')) 24 | rebulk = rebulk.regex_defaults(flags=re.IGNORECASE, abbreviations=[dash]) 25 | 26 | load_config_patterns(rebulk, config) 27 | 28 | return rebulk 29 | -------------------------------------------------------------------------------- /guessit/rules/properties/container.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | container property 5 | """ 6 | from rebulk.remodule import re 7 | 8 | from rebulk import Rebulk 9 | 10 | from ..common import seps 11 | from ..common.pattern import is_disabled 12 | from ..common.validators import seps_surround 13 | from ...reutils import build_or_pattern 14 | 15 | 16 | def container(config): 17 | """ 18 | Builder for rebulk object. 19 | 20 | :param config: rule configuration 21 | :type config: dict 22 | :return: Created Rebulk object 23 | :rtype: Rebulk 24 | """ 25 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'container')) 26 | rebulk = rebulk.regex_defaults(flags=re.IGNORECASE).string_defaults(ignore_case=True) 27 | rebulk.defaults(name='container', 28 | formatter=lambda value: value.strip(seps), 29 | tags=['extension'], 30 | conflict_solver=lambda match, other: other 31 | if other.name in ('source', 'video_codec') or 32 | other.name == 'container' and 'extension' not in other.tags 33 | else '__default__') 34 | 35 | subtitles = config['subtitles'] 36 | info = config['info'] 37 | videos = config['videos'] 38 | torrent = config['torrent'] 39 | nzb = config['nzb'] 40 | 41 | rebulk.regex(r'\.'+build_or_pattern(subtitles)+'$', exts=subtitles, tags=['extension', 'subtitle']) 42 | rebulk.regex(r'\.'+build_or_pattern(info)+'$', exts=info, tags=['extension', 'info']) 43 | 
rebulk.regex(r'\.'+build_or_pattern(videos)+'$', exts=videos, tags=['extension', 'video']) 44 | rebulk.regex(r'\.'+build_or_pattern(torrent)+'$', exts=torrent, tags=['extension', 'torrent']) 45 | rebulk.regex(r'\.'+build_or_pattern(nzb)+'$', exts=nzb, tags=['extension', 'nzb']) 46 | 47 | rebulk.defaults(clear=True, 48 | name='container', 49 | validator=seps_surround, 50 | formatter=lambda s: s.lower(), 51 | conflict_solver=lambda match, other: match 52 | if other.name in ('source', 53 | 'video_codec') or other.name == 'container' and 'extension' in other.tags 54 | else '__default__') 55 | 56 | rebulk.string(*[sub for sub in subtitles if sub not in ('sub', 'ass')], tags=['subtitle']) 57 | rebulk.string(*videos, tags=['video']) 58 | rebulk.string(*torrent, tags=['torrent']) 59 | rebulk.string(*nzb, tags=['nzb']) 60 | 61 | return rebulk 62 | -------------------------------------------------------------------------------- /guessit/rules/properties/country.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | country property 5 | """ 6 | # pylint: disable=no-member 7 | import babelfish 8 | 9 | from rebulk import Rebulk 10 | from ..common.pattern import is_disabled 11 | from ..common.words import iter_words 12 | 13 | 14 | def country(config, common_words): 15 | """ 16 | Builder for rebulk object. 17 | 18 | :param config: rule configuration 19 | :type config: dict 20 | :param common_words: common words 21 | :type common_words: set 22 | :return: Created Rebulk object 23 | :rtype: Rebulk 24 | """ 25 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'country')) 26 | rebulk = rebulk.defaults(name='country') 27 | 28 | def find_countries(string, context=None): 29 | """ 30 | Find countries in given string. 
31 | """ 32 | allowed_countries = context.get('allowed_countries') if context else None 33 | return CountryFinder(allowed_countries, common_words).find(string) 34 | 35 | rebulk.functional(find_countries, 36 | #  Prefer language and any other property over country if not US or GB. 37 | conflict_solver=lambda match, other: match 38 | if other.name != 'language' or match.value not in (babelfish.Country('US'), 39 | babelfish.Country('GB')) 40 | else other, 41 | properties={'country': [None]}, 42 | disabled=lambda context: not context.get('allowed_countries')) 43 | 44 | babelfish.country_converters['guessit'] = GuessitCountryConverter(config['synonyms']) 45 | 46 | return rebulk 47 | 48 | 49 | class GuessitCountryConverter(babelfish.CountryReverseConverter): # pylint: disable=missing-docstring 50 | def __init__(self, synonyms): 51 | self.guessit_exceptions = {} 52 | 53 | for alpha2, synlist in synonyms.items(): 54 | for syn in synlist: 55 | self.guessit_exceptions[syn.lower()] = alpha2 56 | 57 | @property 58 | def codes(self): # pylint: disable=missing-docstring 59 | return (babelfish.country_converters['name'].codes | 60 | frozenset(babelfish.COUNTRIES.values()) | 61 | frozenset(self.guessit_exceptions.keys())) 62 | 63 | def convert(self, alpha2): 64 | if alpha2 == 'GB': 65 | return 'UK' 66 | return str(babelfish.Country(alpha2)) 67 | 68 | def reverse(self, name): # pylint:disable=arguments-renamed 69 | # exceptions come first, as they need to override a potential match 70 | # with any of the other guessers 71 | try: 72 | return self.guessit_exceptions[name.lower()] 73 | except KeyError: 74 | pass 75 | 76 | try: 77 | return babelfish.Country(name.upper()).alpha2 78 | except ValueError: 79 | pass 80 | 81 | for conv in [babelfish.Country.fromname]: 82 | try: 83 | return conv(name).alpha2 84 | except babelfish.CountryReverseError: 85 | pass 86 | 87 | raise babelfish.CountryReverseError(name) 88 | 89 | 90 | class CountryFinder: 91 | """Helper class to search and return 
country matches.""" 92 | 93 | def __init__(self, allowed_countries, common_words): 94 | self.allowed_countries = {l.lower() for l in allowed_countries or []} 95 | self.common_words = common_words 96 | 97 | def find(self, string): 98 | """Return all matches for country.""" 99 | for word_match in iter_words(string.strip().lower()): 100 | word = word_match.value 101 | if word.lower() in self.common_words: 102 | continue 103 | 104 | try: 105 | country_object = babelfish.Country.fromguessit(word) 106 | if (country_object.name.lower() in self.allowed_countries or 107 | country_object.alpha2.lower() in self.allowed_countries): 108 | yield self._to_rebulk_match(word_match, country_object) 109 | except babelfish.Error: 110 | continue 111 | 112 | @classmethod 113 | def _to_rebulk_match(cls, word, value): 114 | return word.span[0], word.span[1], {'value': value} 115 | -------------------------------------------------------------------------------- /guessit/rules/properties/crc.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | crc and uuid properties 5 | """ 6 | from rebulk.remodule import re 7 | 8 | from rebulk import Rebulk 9 | from ..common.pattern import is_disabled 10 | from ..common.validators import seps_surround 11 | 12 | 13 | def crc(config): # pylint:disable=unused-argument 14 | """ 15 | Builder for rebulk object. 
16 | 17 | :param config: rule configuration 18 | :type config: dict 19 | :return: Created Rebulk object 20 | :rtype: Rebulk 21 | """ 22 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'crc32')) 23 | rebulk = rebulk.regex_defaults(flags=re.IGNORECASE) 24 | rebulk.defaults(validator=seps_surround) 25 | 26 | rebulk.regex('(?:[a-fA-F]|[0-9]){8}', name='crc32', 27 | conflict_solver=lambda match, other: other 28 | if other.name in ['episode', 'season'] 29 | else '__default__') 30 | 31 | rebulk.functional(guess_idnumber, name='uuid', 32 | conflict_solver=lambda match, other: match 33 | if other.name in ['episode', 'season'] 34 | else '__default__') 35 | return rebulk 36 | 37 | 38 | _digit = 0 39 | _letter = 1 40 | _other = 2 41 | 42 | _idnum = re.compile(r'(?P<uuid>[a-zA-Z0-9-]{20,})') # 1.0, (0, 0)) 43 | 44 | 45 | def guess_idnumber(string): 46 | """ 47 | Guess id number function 48 | :param string: 49 | :type string: 50 | :return: 51 | :rtype: 52 | """ 53 | # pylint:disable=invalid-name 54 | ret = [] 55 | 56 | matches = list(_idnum.finditer(string)) 57 | for match in matches: 58 | result = match.groupdict() 59 | switch_count = 0 60 | switch_letter_count = 0 61 | letter_count = 0 62 | last_letter = None 63 | 64 | last = _letter 65 | for c in result['uuid']: 66 | if c in '0123456789': 67 | ci = _digit 68 | elif c in 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ': 69 | ci = _letter 70 | if c != last_letter: 71 | switch_letter_count += 1 72 | last_letter = c 73 | letter_count += 1 74 | else: 75 | ci = _other 76 | 77 | if ci != last: 78 | switch_count += 1 79 | 80 | last = ci 81 | 82 | # only return the result as probable if we alternate often between 83 | # char type (more likely for hash values than for common words) 84 | switch_ratio = float(switch_count) / len(result['uuid']) 85 | letters_ratio = (float(switch_letter_count) / letter_count) if letter_count > 0 else 1 86 | 87 | if switch_ratio > 0.4 and letters_ratio > 0.4: 88 | ret.append(match.span()) 
89 | 90 | return ret 91 | -------------------------------------------------------------------------------- /guessit/rules/properties/date.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | date, week and year properties 5 | """ 6 | import re 7 | 8 | from rebulk import Rebulk, RemoveMatch, Rule 9 | 10 | from ..common import dash 11 | from ..common.date import search_date, valid_year, valid_week 12 | from ..common.pattern import is_disabled 13 | from ..common.validators import seps_surround 14 | from ...reutils import build_or_pattern 15 | 16 | 17 | def date(config): # pylint:disable=unused-argument 18 | """ 19 | Builder for rebulk object. 20 | 21 | :param config: rule configuration 22 | :type config: dict 23 | :return: Created Rebulk object 24 | :rtype: Rebulk 25 | """ 26 | rebulk = Rebulk().defaults(validator=seps_surround) 27 | 28 | rebulk.regex(r"\d{4}", name="year", formatter=int, 29 | disabled=lambda context: is_disabled(context, 'year'), 30 | conflict_solver=lambda match, other: other 31 | if other.name in ('episode', 'season') and len(other.raw) < len(match.raw) 32 | else '__default__', 33 | validator=lambda match: seps_surround(match) and valid_year(match.value)) 34 | 35 | rebulk.regex(build_or_pattern(config.get('week_words')) + r"-?(\d{1,2})", 36 | name="week", formatter=int, 37 | children=True, 38 | flags=re.IGNORECASE, abbreviations=[dash], 39 | conflict_solver=lambda match, other: other 40 | if other.name in ('episode', 'season') and len(other.raw) < len(match.raw) 41 | else '__default__', 42 | validator=lambda match: seps_surround(match) and valid_week(match.value)) 43 | 44 | def date_functional(string, context): # pylint:disable=inconsistent-return-statements 45 | """ 46 | Search for date in the string and retrieves match 47 | 48 | :param string: 49 | :return: 50 | """ 51 | 52 | ret = search_date(string, context.get('date_year_first'), 
context.get('date_day_first')) 53 | if ret: 54 | return ret[0], ret[1], {'value': ret[2]} 55 | 56 | rebulk.functional(date_functional, name="date", properties={'date': [None]}, 57 | disabled=lambda context: is_disabled(context, 'date'), 58 | conflict_solver=lambda match, other: other 59 | if other.name in ('episode', 'season', 'crc32') 60 | else '__default__') 61 | 62 | rebulk.rules(KeepMarkedYearInFilepart) 63 | 64 | return rebulk 65 | 66 | 67 | class KeepMarkedYearInFilepart(Rule): 68 | """ 69 | Keep first years marked with [](){} in filepart, or if no year is marked, ensure it won't override titles. 70 | """ 71 | priority = 64 72 | consequence = RemoveMatch 73 | 74 | def enabled(self, context): 75 | return not is_disabled(context, 'year') 76 | 77 | def when(self, matches, context): 78 | ret = [] 79 | if len(matches.named('year')) > 1: 80 | for filepart in matches.markers.named('path'): 81 | years = matches.range(filepart.start, filepart.end, lambda match: match.name == 'year') 82 | if len(years) > 1: 83 | group_years = [] 84 | ungroup_years = [] 85 | for year in years: 86 | if matches.markers.at_match(year, lambda marker: marker.name == 'group'): 87 | group_years.append(year) 88 | else: 89 | ungroup_years.append(year) 90 | if group_years and ungroup_years: 91 | ret.extend(ungroup_years) 92 | ret.extend(group_years[1:]) # Keep the first year in marker. 93 | elif not group_years: 94 | ret.append(ungroup_years[0]) # Keep first year for title. 
95 | if len(ungroup_years) > 2: 96 | ret.extend(ungroup_years[2:]) 97 | return ret 98 | -------------------------------------------------------------------------------- /guessit/rules/properties/edition.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | edition property 5 | """ 6 | from rebulk import Rebulk 7 | from rebulk.remodule import re 8 | 9 | from ..common import dash 10 | from ..common.pattern import is_disabled 11 | from ..common.validators import seps_surround 12 | from ...config import load_config_patterns 13 | 14 | 15 | def edition(config): # pylint:disable=unused-argument 16 | """ 17 | Builder for rebulk object. 18 | 19 | :param config: rule configuration 20 | :type config: dict 21 | :return: Created Rebulk object 22 | :rtype: Rebulk 23 | """ 24 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'edition')) 25 | rebulk.regex_defaults(flags=re.IGNORECASE, abbreviations=[dash]).string_defaults(ignore_case=True) 26 | rebulk.defaults(name='edition', validator=seps_surround) 27 | 28 | load_config_patterns(rebulk, config.get('edition')) 29 | 30 | return rebulk 31 | -------------------------------------------------------------------------------- /guessit/rules/properties/film.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | film property 5 | """ 6 | from rebulk import Rebulk, AppendMatch, Rule 7 | from rebulk.remodule import re 8 | 9 | from ..common import dash 10 | from ..common.formatters import cleanup 11 | from ..common.pattern import is_disabled 12 | from ..common.validators import seps_surround 13 | from ...config import load_config_patterns 14 | 15 | 16 | def film(config): # pylint:disable=unused-argument 17 | """ 18 | Builder for rebulk object. 
19 | :return: Created Rebulk object 20 | :rtype: Rebulk 21 | """ 22 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'film')) 23 | rebulk.regex_defaults(flags=re.IGNORECASE, abbreviations=[dash]).string_defaults(ignore_case=True) 24 | rebulk.defaults(name='film', validator=seps_surround) 25 | 26 | load_config_patterns(rebulk, config.get('film')) 27 | 28 | rebulk.rules(FilmTitleRule) 29 | 30 | return rebulk 31 | 32 | 33 | class FilmTitleRule(Rule): 34 | """ 35 | Rule to find film_title (the hole before the film property). 36 | """ 37 | consequence = AppendMatch 38 | 39 | properties = {'film_title': [None]} 40 | 41 | def enabled(self, context): 42 | return not is_disabled(context, 'film_title') 43 | 44 | def when(self, matches, context): # pylint:disable=inconsistent-return-statements 45 | bonus_number = matches.named('film', lambda match: not match.private, index=0) 46 | if bonus_number: 47 | filepath = matches.markers.at_match(bonus_number, lambda marker: marker.name == 'path', 0) 48 | hole = matches.holes(filepath.start, bonus_number.start + 1, formatter=cleanup, index=0) 49 | if hole and hole.value: 50 | hole.name = 'film_title' 51 | return hole 52 | -------------------------------------------------------------------------------- /guessit/rules/properties/mimetype.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | mimetype property 5 | """ 6 | import mimetypes 7 | 8 | from rebulk import Rebulk, CustomRule, POST_PROCESS 9 | from rebulk.match import Match 10 | 11 | from ..common.pattern import is_disabled 12 | from ...rules.processors import Processors 13 | 14 | 15 | def mimetype(config): # pylint:disable=unused-argument 16 | """ 17 | Builder for rebulk object. 
18 | 19 | :param config: rule configuration 20 | :type config: dict 21 | :return: Created Rebulk object 22 | :rtype: Rebulk 23 | """ 24 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'mimetype')) 25 | rebulk.rules(Mimetype) 26 | 27 | return rebulk 28 | 29 | 30 | class Mimetype(CustomRule): 31 | """ 32 | Mimetype post processor 33 | :param matches: 34 | :type matches: 35 | :return: 36 | :rtype: 37 | """ 38 | priority = POST_PROCESS 39 | 40 | dependency = Processors 41 | 42 | def when(self, matches, context): 43 | mime, _ = mimetypes.guess_type(matches.input_string, strict=False) 44 | return mime 45 | 46 | def then(self, matches, when_response, context): 47 | mime = when_response 48 | matches.append(Match(len(matches.input_string), len(matches.input_string), name='mimetype', value=mime)) 49 | 50 | @property 51 | def properties(self): 52 | """ 53 | Properties for this rule. 54 | """ 55 | return {'mimetype': [None]} 56 | -------------------------------------------------------------------------------- /guessit/rules/properties/part.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | part property 5 | """ 6 | from rebulk.remodule import re 7 | 8 | from rebulk import Rebulk 9 | from ..common import dash 10 | from ..common.pattern import is_disabled 11 | from ..common.validators import seps_surround, int_coercable, and_ 12 | from ..common.numeral import numeral, parse_numeral 13 | from ...reutils import build_or_pattern 14 | 15 | 16 | def part(config): # pylint:disable=unused-argument 17 | """ 18 | Builder for rebulk object. 
19 | 20 | :param config: rule configuration 21 | :type config: dict 22 | :return: Created Rebulk object 23 | :rtype: Rebulk 24 | """ 25 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'part')) 26 | rebulk.regex_defaults(flags=re.IGNORECASE, abbreviations=[dash], validator={'__parent__': seps_surround}) 27 | 28 | prefixes = config['prefixes'] 29 | 30 | def validate_roman(match): 31 | """ 32 | Validate a roman match if surrounded by separators 33 | :param match: 34 | :type match: 35 | :return: 36 | :rtype: 37 | """ 38 | if int_coercable(match.raw): 39 | return True 40 | return seps_surround(match) 41 | 42 | rebulk.regex(build_or_pattern(prefixes) + r'-?(?P<part>' + numeral + r')', 43 | prefixes=prefixes, validate_all=True, private_parent=True, children=True, formatter=parse_numeral, 44 | validator={'part': and_(validate_roman, lambda m: 0 < m.value < 100)}) 45 | 46 | return rebulk 47 | -------------------------------------------------------------------------------- /guessit/rules/properties/screen_size.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | screen_size property 5 | """ 6 | from rebulk.match import Match 7 | from rebulk.remodule import re 8 | 9 | from rebulk import Rebulk, Rule, RemoveMatch, AppendMatch 10 | 11 | from ..common.pattern import is_disabled 12 | from ..common.quantity import FrameRate 13 | from ..common.validators import seps_surround 14 | from ..common import dash, seps 15 | from ...reutils import build_or_pattern 16 | 17 | 18 | def screen_size(config): 19 | """ 20 | Builder for rebulk object. 
21 | 22 | :param config: rule configuration 23 | :type config: dict 24 | :return: Created Rebulk object 25 | :rtype: Rebulk 26 | """ 27 | interlaced = frozenset(config['interlaced']) 28 | progressive = frozenset(config['progressive']) 29 | frame_rates = frozenset(config['frame_rates']) 30 | min_ar = config['min_ar'] 31 | max_ar = config['max_ar'] 32 | 33 | rebulk = Rebulk() 34 | rebulk = rebulk.string_defaults(ignore_case=True).regex_defaults(flags=re.IGNORECASE) 35 | 36 | rebulk.defaults(name='screen_size', validator=seps_surround, abbreviations=[dash], 37 | disabled=lambda context: is_disabled(context, 'screen_size')) 38 | 39 | frame_rate_pattern = build_or_pattern(frame_rates, name='frame_rate') 40 | interlaced_pattern = build_or_pattern(interlaced, name='height') 41 | progressive_pattern = build_or_pattern(progressive, name='height') 42 | 43 | res_pattern = r'(?:(?P<width>\d{3,4})(?:x|\*))?' 44 | rebulk.regex(res_pattern + interlaced_pattern + r'(?P<scan_type>i)' + frame_rate_pattern + '?') 45 | rebulk.regex(res_pattern + progressive_pattern + r'(?P<scan_type>p)' + frame_rate_pattern + '?') 46 | rebulk.regex(res_pattern + progressive_pattern + r'(?P<scan_type>p)?(?:hd)') 47 | rebulk.regex(res_pattern + progressive_pattern + r'(?P<scan_type>p)?x?') 48 | rebulk.string('4k', value='2160p', 49 | conflict_solver=lambda match, other: '__default__' if other.name == 'screen_size' else match) 50 | rebulk.regex(r'(?P<width>\d{3,4})-?(?:x|\*)-?(?P<height>\d{3,4})', 51 | conflict_solver=lambda match, other: '__default__' if other.name == 'screen_size' else other) 52 | 53 | rebulk.regex(frame_rate_pattern + '-?(?:p|fps)', name='frame_rate', 54 | formatter=FrameRate.fromstring, disabled=lambda context: is_disabled(context, 'frame_rate')) 55 | 56 | rebulk.rules(PostProcessScreenSize(progressive, min_ar, max_ar), ScreenSizeOnlyOne, ResolveScreenSizeConflicts) 57 | 58 | return rebulk 59 | 60 | 61 | class PostProcessScreenSize(Rule): 62 | """ 63 | Process the screen size calculating the aspect ratio if available. 
64 | 65 | Convert to a standard notation (720p, 1080p, etc) when it's a standard resolution and 66 | aspect ratio is valid or not available. 67 | 68 | It also creates an aspect_ratio match when available. 69 | """ 70 | consequence = AppendMatch 71 | 72 | def __init__(self, standard_heights, min_ar, max_ar): 73 | super().__init__() 74 | self.standard_heights = standard_heights 75 | self.min_ar = min_ar 76 | self.max_ar = max_ar 77 | 78 | def when(self, matches, context): 79 | to_append = [] 80 | for match in matches.named('screen_size'): 81 | if not is_disabled(context, 'frame_rate'): 82 | for frame_rate in match.children.named('frame_rate'): 83 | frame_rate.formatter = FrameRate.fromstring 84 | to_append.append(frame_rate) 85 | 86 | values = match.children.to_dict() 87 | if 'height' not in values: 88 | continue 89 | 90 | scan_type = (values.get('scan_type') or 'p').lower() 91 | height = values['height'] 92 | if 'width' not in values: 93 | match.value = f'{height}{scan_type}' 94 | continue 95 | 96 | width = values['width'] 97 | calculated_ar = float(width) / float(height) 98 | 99 | aspect_ratio = Match(match.start, match.end, input_string=match.input_string, 100 | name='aspect_ratio', value=round(calculated_ar, 3)) 101 | 102 | if not is_disabled(context, 'aspect_ratio'): 103 | to_append.append(aspect_ratio) 104 | 105 | if height in self.standard_heights and self.min_ar < calculated_ar < self.max_ar: 106 | match.value = f'{height}{scan_type}' 107 | else: 108 | match.value = f'{width}x{height}' 109 | 110 | return to_append 111 | 112 | 113 | class ScreenSizeOnlyOne(Rule): 114 | """ 115 | Keep a single screen_size per filepath part. 
116 | """ 117 | consequence = RemoveMatch 118 | 119 | def when(self, matches, context): 120 | to_remove = [] 121 | for filepart in matches.markers.named('path'): 122 | screensize = list(reversed(matches.range(filepart.start, filepart.end, 123 | lambda match: match.name == 'screen_size'))) 124 | if len(screensize) > 1 and len(set((match.value for match in screensize))) > 1: 125 | to_remove.extend(screensize[1:]) 126 | 127 | return to_remove 128 | 129 | 130 | class ResolveScreenSizeConflicts(Rule): 131 | """ 132 | Resolve screen_size conflicts with season and episode matches. 133 | """ 134 | consequence = RemoveMatch 135 | 136 | def when(self, matches, context): 137 | to_remove = [] 138 | for filepart in matches.markers.named('path'): 139 | screensize = matches.range(filepart.start, filepart.end, lambda match: match.name == 'screen_size', 0) 140 | if not screensize: 141 | continue 142 | 143 | conflicts = matches.conflicting(screensize, lambda match: match.name in ('season', 'episode')) 144 | if not conflicts: 145 | continue 146 | 147 | has_neighbor = False 148 | video_profile = matches.range(screensize.end, filepart.end, lambda match: match.name == 'video_profile', 0) 149 | if video_profile and not matches.holes(screensize.end, video_profile.start, 150 | predicate=lambda h: h.value and h.value.strip(seps)): 151 | to_remove.extend(conflicts) 152 | has_neighbor = True 153 | 154 | previous = matches.previous(screensize, index=0, predicate=( 155 | lambda m: m.name in ('date', 'source', 'other', 'streaming_service'))) 156 | if previous and not matches.holes(previous.end, screensize.start, 157 | predicate=lambda h: h.value and h.value.strip(seps)): 158 | to_remove.extend(conflicts) 159 | has_neighbor = True 160 | 161 | if not has_neighbor: 162 | to_remove.append(screensize) 163 | 164 | return to_remove 165 | -------------------------------------------------------------------------------- /guessit/rules/properties/size.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | size property 5 | """ 6 | from rebulk.remodule import re 7 | 8 | from rebulk import Rebulk 9 | 10 | from ..common import dash 11 | from ..common.quantity import Size 12 | from ..common.pattern import is_disabled 13 | from ..common.validators import seps_surround 14 | 15 | 16 | def size(config): # pylint:disable=unused-argument 17 | """ 18 | Builder for rebulk object. 19 | 20 | :param config: rule configuration 21 | :type config: dict 22 | :return: Created Rebulk object 23 | :rtype: Rebulk 24 | """ 25 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'size')) 26 | rebulk.regex_defaults(flags=re.IGNORECASE, abbreviations=[dash]) 27 | rebulk.defaults(name='size', validator=seps_surround) 28 | rebulk.regex(r'\d+-?[mgt]b', r'\d+\.\d+-?[mgt]b', formatter=Size.fromstring, tags=['release-group-prefix']) 29 | 30 | return rebulk 31 | -------------------------------------------------------------------------------- /guessit/rules/properties/streaming_service.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | streaming_service property 5 | """ 6 | from rebulk.remodule import re 7 | 8 | from rebulk import Rebulk 9 | from rebulk.rules import Rule, RemoveMatch 10 | 11 | from ..common.pattern import is_disabled 12 | from ...config import load_config_patterns 13 | from ...rules.common import seps, dash 14 | from ...rules.common.validators import seps_before, seps_after 15 | 16 | 17 | def streaming_service(config): # pylint: disable=too-many-statements,unused-argument 18 | """Streaming service property. 
19 | 20 | :param config: rule configuration 21 | :type config: dict 22 | :return: 23 | :rtype: Rebulk 24 | """ 25 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'streaming_service')) 26 | rebulk = rebulk.string_defaults(ignore_case=True).regex_defaults(flags=re.IGNORECASE, abbreviations=[dash]) 27 | rebulk.defaults(name='streaming_service', tags=['source-prefix']) 28 | 29 | load_config_patterns(rebulk, config) 30 | 31 | rebulk.rules(ValidateStreamingService) 32 | 33 | return rebulk 34 | 35 | 36 | class ValidateStreamingService(Rule): 37 | """Validate streaming service matches.""" 38 | 39 | priority = 128 40 | consequence = RemoveMatch 41 | 42 | def when(self, matches, context): 43 | """Streaming service is always before source. 44 | 45 | :param matches: 46 | :type matches: rebulk.match.Matches 47 | :param context: 48 | :type context: dict 49 | :return: 50 | """ 51 | to_remove = [] 52 | for service in matches.named('streaming_service'): 53 | next_match = matches.next(service, lambda match: 'streaming_service.suffix' in match.tags, 0) 54 | previous_match = matches.previous(service, lambda match: 'streaming_service.prefix' in match.tags, 0) 55 | has_other = service.initiator and service.initiator.children.named('other') 56 | 57 | if not has_other: 58 | if (not next_match or 59 | matches.holes(service.end, next_match.start, 60 | predicate=lambda match: match.value.strip(seps)) or 61 | not seps_before(service)): 62 | if (not previous_match or 63 | matches.holes(previous_match.end, service.start, 64 | predicate=lambda match: match.value.strip(seps)) or 65 | not seps_after(service)): 66 | to_remove.append(service) 67 | continue 68 | 69 | if service.value == 'Comedy Central': 70 | # Current match is a valid streaming service, removing invalid Criterion Collection (CC) matches 71 | to_remove.extend(matches.named('edition', predicate=lambda match: match.value == 'Criterion')) 72 | 73 | return to_remove 74 | 
-------------------------------------------------------------------------------- /guessit/rules/properties/type.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | type property 5 | """ 6 | from rebulk import CustomRule, Rebulk, POST_PROCESS 7 | from rebulk.match import Match 8 | 9 | from ..common.pattern import is_disabled 10 | from ...rules.processors import Processors 11 | 12 | 13 | def _type(matches, value): 14 | """ 15 | Define type match with given value. 16 | :param matches: 17 | :param value: 18 | :return: 19 | """ 20 | matches.append(Match(len(matches.input_string), len(matches.input_string), name='type', value=value)) 21 | 22 | 23 | def type_(config): # pylint:disable=unused-argument 24 | """ 25 | Builder for rebulk object. 26 | 27 | :param config: rule configuration 28 | :type config: dict 29 | :return: Created Rebulk object 30 | :rtype: Rebulk 31 | """ 32 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'type')) 33 | rebulk = rebulk.rules(TypeProcessor) 34 | 35 | return rebulk 36 | 37 | 38 | class TypeProcessor(CustomRule): 39 | """ 40 | Post processor to find file type based on all others found matches. 
41 | """ 42 | priority = POST_PROCESS 43 | 44 | dependency = Processors 45 | 46 | properties = {'type': ['episode', 'movie']} 47 | 48 | def when(self, matches, context): # pylint:disable=too-many-return-statements 49 | option_type = context.get('type', None) 50 | if option_type: 51 | return option_type 52 | 53 | episode = matches.named('episode') 54 | season = matches.named('season') 55 | absolute_episode = matches.named('absolute_episode') 56 | episode_details = matches.named('episode_details') 57 | 58 | if episode or season or episode_details or absolute_episode: 59 | return 'episode' 60 | 61 | film = matches.named('film') 62 | if film: 63 | return 'movie' 64 | 65 | year = matches.named('year') 66 | date = matches.named('date') 67 | 68 | if date and not year: 69 | return 'episode' 70 | 71 | bonus = matches.named('bonus') 72 | if bonus and not year: 73 | return 'episode' 74 | 75 | crc32 = matches.named('crc32') 76 | anime_release_group = matches.named('release_group', lambda match: 'anime' in match.tags) 77 | if crc32 and anime_release_group: 78 | return 'episode' 79 | 80 | return 'movie' 81 | 82 | def then(self, matches, when_response, context): 83 | _type(matches, when_response) 84 | -------------------------------------------------------------------------------- /guessit/rules/properties/video_codec.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | video_codec and video_profile property 5 | """ 6 | from rebulk import Rebulk, Rule, RemoveMatch 7 | from rebulk.remodule import re 8 | 9 | from ..common import dash 10 | from ..common.pattern import is_disabled 11 | from ..common.validators import seps_after, seps_before, seps_surround 12 | 13 | 14 | def video_codec(config): # pylint:disable=unused-argument 15 | """ 16 | Builder for rebulk object. 
17 | 18 | :param config: rule configuration 19 | :type config: dict 20 | :return: Created Rebulk object 21 | :rtype: Rebulk 22 | """ 23 | rebulk = Rebulk() 24 | rebulk = rebulk.regex_defaults(flags=re.IGNORECASE, abbreviations=[dash]).string_defaults(ignore_case=True) 25 | rebulk.defaults(name="video_codec", 26 | tags=['source-suffix', 'streaming_service.suffix'], 27 | disabled=lambda context: is_disabled(context, 'video_codec')) 28 | 29 | rebulk.regex(r'Rv\d{2}', value='RealVideo') 30 | rebulk.regex('Mpe?g-?2', '[hx]-?262', value='MPEG-2') 31 | rebulk.string("DVDivX", "DivX", value="DivX") 32 | rebulk.string('XviD', value='Xvid') 33 | rebulk.regex('VC-?1', value='VC-1') 34 | rebulk.string('VP7', value='VP7') 35 | rebulk.string('VP8', 'VP80', value='VP8') 36 | rebulk.string('VP9', value='VP9') 37 | rebulk.regex('[hx]-?263', value='H.263') 38 | rebulk.regex('[hx]-?264', '(MPEG-?4)?AVC(?:HD)?', value='H.264') 39 | rebulk.regex('[hx]-?265', 'HEVC', value='H.265') 40 | rebulk.regex('(?P<video_codec>hevc)(?P<color_depth>10)', value={'video_codec': 'H.265', 'color_depth': '10-bit'}, 41 | tags=['video-codec-suffix'], children=True) 42 | 43 | # http://blog.mediacoderhq.com/h264-profiles-and-levels/ 44 | # https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC 45 | rebulk.defaults(clear=True, 46 | name="video_profile", 47 | validator=seps_surround, 48 | disabled=lambda context: is_disabled(context, 'video_profile')) 49 | 50 | rebulk.string('BP', value='Baseline', tags='video_profile.rule') 51 | rebulk.string('XP', 'EP', value='Extended', tags='video_profile.rule') 52 | rebulk.string('MP', value='Main', tags='video_profile.rule') 53 | rebulk.string('HP', 'HiP', value='High', tags='video_profile.rule') 54 | 55 | # https://en.wikipedia.org/wiki/Scalable_Video_Coding 56 | rebulk.string('SC', 'SVC', value='Scalable Video Coding', tags='video_profile.rule') 57 | # https://en.wikipedia.org/wiki/AVCHD 58 | rebulk.regex('AVC(?:HD)?', value='Advanced Video Codec High Definition', tags='video_profile.rule') 59 | #
https://en.wikipedia.org/wiki/H.265/HEVC 60 | rebulk.string('HEVC', value='High Efficiency Video Coding', tags='video_profile.rule') 61 | 62 | rebulk.regex('Hi422P', value='High 4:2:2') 63 | rebulk.regex('Hi444PP', value='High 4:4:4 Predictive') 64 | rebulk.regex('Hi10P?', value='High 10') # no profile validation is required 65 | 66 | rebulk.string('DXVA', value='DXVA', name='video_api', 67 | disabled=lambda context: is_disabled(context, 'video_api')) 68 | 69 | rebulk.defaults(clear=True, 70 | name='color_depth', 71 | validator=seps_surround, 72 | disabled=lambda context: is_disabled(context, 'color_depth')) 73 | rebulk.regex('12.?bits?', value='12-bit') 74 | rebulk.regex('10.?bits?', 'YUV420P10', 'Hi10P?', value='10-bit') 75 | rebulk.regex('8.?bits?', value='8-bit') 76 | 77 | rebulk.rules(ValidateVideoCodec, VideoProfileRule) 78 | 79 | return rebulk 80 | 81 | 82 | class ValidateVideoCodec(Rule): 83 | """ 84 | Validate video_codec with source property or separated 85 | """ 86 | priority = 64 87 | consequence = RemoveMatch 88 | 89 | def enabled(self, context): 90 | return not is_disabled(context, 'video_codec') 91 | 92 | def when(self, matches, context): 93 | ret = [] 94 | for codec in matches.named('video_codec'): 95 | if not seps_before(codec) and \ 96 | not matches.at_index(codec.start - 1, lambda match: 'video-codec-prefix' in match.tags): 97 | ret.append(codec) 98 | continue 99 | if not seps_after(codec) and \ 100 | not matches.at_index(codec.end + 1, lambda match: 'video-codec-suffix' in match.tags): 101 | ret.append(codec) 102 | continue 103 | return ret 104 | 105 | 106 | class VideoProfileRule(Rule): 107 | """ 108 | Rule to validate video_profile 109 | """ 110 | consequence = RemoveMatch 111 | 112 | def enabled(self, context): 113 | return not is_disabled(context, 'video_profile') 114 | 115 | def when(self, matches, context): 116 | profile_list = matches.named('video_profile', lambda match: 'video_profile.rule' in match.tags) 117 | ret = [] 118 | for profile 
in profile_list: 119 | codec = matches.at_span(profile.span, lambda match: match.name == 'video_codec', 0) 120 | if not codec: 121 | codec = matches.previous(profile, lambda match: match.name == 'video_codec') 122 | if not codec: 123 | codec = matches.next(profile, lambda match: match.name == 'video_codec') 124 | if not codec: 125 | ret.append(profile) 126 | return ret 127 | -------------------------------------------------------------------------------- /guessit/rules/properties/website.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Website property. 5 | """ 6 | try: 7 | from importlib.resources import files # @UnresolvedImport 8 | except ImportError: 9 | from importlib_resources import files # @UnresolvedImport 10 | 11 | from rebulk.remodule import re 12 | 13 | from rebulk import Rebulk, Rule, RemoveMatch 14 | from ..common import seps 15 | from ..common.formatters import cleanup 16 | from ..common.pattern import is_disabled 17 | from ..common.validators import seps_surround 18 | from ...reutils import build_or_pattern 19 | 20 | 21 | def website(config): 22 | """ 23 | Builder for rebulk object. 
24 | 25 | :param config: rule configuration 26 | :type config: dict 27 | :return: Created Rebulk object 28 | :rtype: Rebulk 29 | """ 30 | rebulk = Rebulk(disabled=lambda context: is_disabled(context, 'website')) 31 | rebulk = rebulk.regex_defaults(flags=re.IGNORECASE).string_defaults(ignore_case=True) 32 | rebulk.defaults(name="website") 33 | 34 | data_files = files('guessit.data') 35 | tld_file = data_files.joinpath('tlds-alpha-by-domain.txt').read_text(encoding='utf-8') 36 | tlds = [ 37 | tld.strip() 38 | for tld in tld_file.split('\n') 39 | if '--' not in tld 40 | ][1:] # All registered domain extensions 41 | 42 | safe_tlds = config['safe_tlds'] # Extensions that certainly denote a website 43 | safe_subdomains = config['safe_subdomains'] # Subdomains that certainly denote a website 44 | safe_prefix = config['safe_prefixes'] # Words that reliably precede a TLD 45 | website_prefixes = config['prefixes'] 46 | 47 | rebulk.regex(r'(?:[^a-z0-9]|^)((?:'+build_or_pattern(safe_subdomains) + 48 | r'\.)+(?:[a-z-0-9-]+\.)+(?:'+build_or_pattern(tlds) + 49 | r'))(?:[^a-z0-9]|$)', 50 | children=True) 51 | rebulk.regex(r'(?:[^a-z0-9]|^)((?:'+build_or_pattern(safe_subdomains) + 52 | r'\.)*[a-z0-9-]+\.(?:'+build_or_pattern(safe_tlds) + 53 | r'))(?:[^a-z0-9]|$)', 54 | safe_subdomains=safe_subdomains, safe_tlds=safe_tlds, children=True) 55 | rebulk.regex(r'(?:[^a-z0-9]|^)((?:'+build_or_pattern(safe_subdomains) + 56 | r'\.)*[a-z0-9-]+\.(?:'+build_or_pattern(safe_prefix) + 57 | r'\.)+(?:'+build_or_pattern(tlds) + 58 | r'))(?:[^a-z0-9]|$)', 59 | safe_subdomains=safe_subdomains, safe_prefix=safe_prefix, tlds=tlds, children=True) 60 | 61 | rebulk.string(*website_prefixes, 62 | validator=seps_surround, private=True, tags=['website.prefix']) 63 | 64 | class PreferTitleOverWebsite(Rule): 65 | """ 66 | If the match found is more likely a title, remove the website.
67 | """ 68 | consequence = RemoveMatch 69 | 70 | @staticmethod 71 | def valid_followers(match): 72 | """ 73 | Validator for next website matches 74 | """ 75 | return match.named('season', 'episode', 'year') 76 | 77 | def when(self, matches, context): 78 | to_remove = [] 79 | for website_match in matches.named('website'): 80 | safe = False 81 | for safe_start in safe_subdomains + safe_prefix: 82 | if website_match.value.lower().startswith(safe_start): 83 | safe = True 84 | break 85 | if not safe: 86 | suffix = matches.next(website_match, PreferTitleOverWebsite.valid_followers, 0) 87 | if suffix: 88 | group = matches.markers.at_match(website_match, lambda marker: marker.name == 'group', 0) 89 | if not group: 90 | to_remove.append(website_match) 91 | return to_remove 92 | 93 | rebulk.rules(PreferTitleOverWebsite, ValidateWebsitePrefix) 94 | 95 | return rebulk 96 | 97 | 98 | class ValidateWebsitePrefix(Rule): 99 | """ 100 | Validate website prefixes 101 | """ 102 | priority = 64 103 | consequence = RemoveMatch 104 | 105 | def when(self, matches, context): 106 | to_remove = [] 107 | for prefix in matches.tagged('website.prefix'): 108 | website_match = matches.next(prefix, predicate=lambda match: match.name == 'website', index=0) 109 | if (not website_match or 110 | matches.holes(prefix.end, website_match.start, 111 | formatter=cleanup, seps=seps, predicate=lambda match: match.value)): 112 | to_remove.append(prefix) 113 | return to_remove 114 | -------------------------------------------------------------------------------- /guessit/test/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # pylint: disable=pointless-statement, missing-docstring, invalid-name 4 | -------------------------------------------------------------------------------- /guessit/test/config/dummy.txt: -------------------------------------------------------------------------------- 1 | Not a 
configuration file -------------------------------------------------------------------------------- /guessit/test/config/test.json: -------------------------------------------------------------------------------- 1 | { 2 | "expected_title": ["The 100", "OSS 117"], 3 | "yaml": false 4 | } 5 | -------------------------------------------------------------------------------- /guessit/test/config/test.yaml: -------------------------------------------------------------------------------- 1 | expected_title: 2 | - The 100 3 | - OSS 117 4 | yaml: True 5 | -------------------------------------------------------------------------------- /guessit/test/config/test.yml: -------------------------------------------------------------------------------- 1 | expected_title: 2 | - The 100 3 | - OSS 117 4 | yaml: True 5 | -------------------------------------------------------------------------------- /guessit/test/enable_disable_properties.yml: -------------------------------------------------------------------------------- 1 | ? vorbis 2 | : options: --exclude audio_codec 3 | -audio_codec: Vorbis 4 | 5 | ? DTS-ES 6 | : options: --exclude audio_profile 7 | audio_codec: DTS 8 | -audio_profile: Extended Surround 9 | 10 | ? DTS.ES 11 | : options: --include audio_codec 12 | audio_codec: DTS 13 | -audio_profile: Extended Surround 14 | 15 | ? 5.1 16 | ? 5ch 17 | ? 6ch 18 | : options: --exclude audio_channels 19 | -audio_channels: '5.1' 20 | 21 | ? Movie Title-x01-Other Title.mkv 22 | ? Movie Title-x01-Other Title 23 | ? directory/Movie Title-x01-Other Title/file.mkv 24 | : options: --exclude bonus 25 | -bonus: 1 26 | -bonus_title: Other Title 27 | 28 | ? Title-x02-Bonus Title.mkv 29 | : options: --include bonus 30 | bonus: 2 31 | -bonus_title: Other Title 32 | 33 | ? cd 1of3 34 | : options: --exclude cd 35 | -cd: 1 36 | -cd_count: 3 37 | 38 | ? This.is.Us 39 | : options: --exclude country 40 | title: This is Us 41 | -country: US 42 | 43 | ? 
2015.01.31 44 | : options: --exclude date 45 | year: 2015 46 | -date: 2015-01-31 47 | 48 | ? Something 2 mar 2013) 49 | : options: --exclude date 50 | -date: 2013-03-02 51 | 52 | ? 2012 2009 S01E02 2015 # If no year is marked, the second one is guessed. 53 | : options: --exclude year 54 | -year: 2009 55 | 56 | ? Director's cut 57 | : options: --exclude edition 58 | -edition: Director's Cut 59 | 60 | ? 2x5 61 | ? 2X5 62 | ? 02x05 63 | ? 2X05 64 | ? 02x5 65 | ? S02E05 66 | ? s02e05 67 | ? s02e5 68 | ? s2e05 69 | ? s02ep05 70 | ? s2EP5 71 | : options: --exclude season 72 | -season: 2 73 | -episode: 5 74 | 75 | ? 2x6 76 | ? 2X6 77 | ? 02x06 78 | ? 2X06 79 | ? 02x6 80 | ? S02E06 81 | ? s02e06 82 | ? s02e6 83 | ? s2e06 84 | ? s02ep06 85 | ? s2EP6 86 | : options: --exclude episode 87 | -season: 2 88 | -episode: 6 89 | 90 | ? serie Season 2 other 91 | : options: --exclude season 92 | -season: 2 93 | 94 | ? Some Dummy Directory/S02 Some Series/E01-Episode title.mkv 95 | : options: --exclude episode_title 96 | -episode_title: Episode title 97 | season: 2 98 | episode: 1 99 | 100 | ? Another Dummy Directory/S02 Some Series/E01-Episode title.mkv 101 | : options: --include season --include episode 102 | -episode_title: Episode title 103 | season: 2 104 | episode: 1 105 | 106 | # pattern contains season and episode: it won't work enabling only one 107 | ? Some Series S03E01E02 108 | : options: --include episode 109 | -season: 3 110 | -episode: [1, 2] 111 | 112 | # pattern contains season and episode: it won't work enabling only one 113 | ? Another Series S04E01E02 114 | : options: --include season 115 | -season: 4 116 | -episode: [1, 2] 117 | 118 | ? Show.Name.Season.4.Episode.1 119 | : options: --include episode 120 | -season: 4 121 | episode: 1 122 | 123 | ? Another.Show.Name.Season.4.Episode.1 124 | : options: --include season 125 | season: 4 126 | -episode: 1 127 | 128 | ? Some Series S01 02 03 129 | : options: --exclude season 130 | -season: [1, 2, 3] 131 | 132 | ?
Some Series E01 02 04 133 | : options: --exclude episode 134 | -episode: [1, 2, 4] 135 | 136 | ? A very special episode s06 special 137 | : options: -t episode --exclude episode_details 138 | season: 6 139 | -episode_details: Special 140 | 141 | ? S01D02.3-5-GROUP 142 | : options: --exclude disc 143 | -season: 1 144 | -disc: [2, 3, 4, 5] 145 | -episode: [2, 3, 4, 5] 146 | 147 | ? S01D02&4-6&8 148 | : options: --exclude season 149 | -season: 1 150 | -disc: [2, 4, 5, 6, 8] 151 | -episode: [2, 4, 5, 6, 8] 152 | 153 | ? Film Title-f01-Series Title.mkv 154 | : options: --exclude film 155 | -film: 1 156 | -film_title: Film Title 157 | 158 | ? Another Film Title-f01-Series Title.mkv 159 | : options: --exclude film_title 160 | film: 1 161 | -film_title: Film Title 162 | 163 | ? English 164 | ? .ENG. 165 | : options: --exclude language 166 | -language: English 167 | 168 | ? SubFrench 169 | ? SubFr 170 | ? STFr 171 | : options: --exclude subtitle_language 172 | -language: French 173 | -subtitle_language: French 174 | 175 | ? ST.FR 176 | : options: --exclude subtitle_language 177 | language: French 178 | -subtitle_language: French 179 | 180 | ? ENG.-.sub.FR 181 | ? ENG.-.FR Sub 182 | : options: --include language 183 | language: [English, French] 184 | -subtitle_language: French 185 | 186 | ? ENG.-.SubFR 187 | : options: --include language 188 | language: English 189 | -subtitle_language: French 190 | 191 | ? ENG.-.FRSUB 192 | ? ENG.-.FRSUBS 193 | ? ENG.-.FR-SUBS 194 | : options: --include subtitle_language 195 | -language: English 196 | subtitle_language: French 197 | 198 | ? DVD.Real.XViD 199 | ? DVD.fix.XViD 200 | : options: --exclude other 201 | -other: Fix 202 | -proper_count: 1 203 | 204 | ? Part 3 205 | ? Part III 206 | ? Part Three 207 | ? Part Trois 208 | ? Part3 209 | : options: --exclude part 210 | -part: 3 211 | 212 | ? Some.Title.XViD-by.Artik[SEDG].avi 213 | : options: --exclude release_group 214 | -release_group: Artik[SEDG] 215 | 216 | ? 
"[ABC] Some.Title.avi" 217 | ? some/folder/[ABC]Some.Title.avi 218 | : options: --exclude release_group 219 | -release_group: ABC 220 | 221 | ? 360p 222 | ? 360px 223 | ? "360" 224 | ? +500x360 225 | : options: --exclude screen_size 226 | -screen_size: 360p 227 | 228 | ? 640x360 229 | : options: --exclude aspect_ratio 230 | screen_size: 360p 231 | -aspect_ratio: 1.778 232 | 233 | ? 8196x4320 234 | : options: --exclude screen_size 235 | -screen_size: 4320p 236 | -aspect_ratio: 1.897 237 | 238 | ? 4.3gb 239 | : options: --exclude size 240 | -size: 4.3GB 241 | 242 | ? VhS_rip 243 | ? VHS.RIP 244 | : options: --exclude source 245 | -source: VHS 246 | -other: Rip 247 | 248 | ? DVD.RIP 249 | : options: --include other 250 | -source: DVD 251 | -other: Rip 252 | 253 | ? Title Only.avi 254 | : options: --exclude title 255 | -title: Title Only 256 | 257 | ? h265 258 | ? x265 259 | ? h.265 260 | ? x.265 261 | ? hevc 262 | : options: --exclude video_codec 263 | -video_codec: H.265 264 | 265 | ? hevc10 266 | : options: --include color_depth 267 | -video_codec: H.265 268 | -color_depth: 10-bit 269 | 270 | ? HEVC-YUV420P10 271 | : options: --include color_depth 272 | -video_codec: H.265 273 | color_depth: 10-bit 274 | 275 | ? h265-HP 276 | : options: --exclude video_profile 277 | video_codec: H.265 278 | -video_profile: High 279 | 280 | ? House.of.Cards.2013.S02E03.1080p.NF.WEBRip.DD5.1.x264-NTb.mkv 281 | ? House.of.Cards.2013.S02E03.1080p.Netflix.WEBRip.DD5.1.x264-NTb.mkv 282 | : options: --exclude streaming_service 283 | -streaming_service: Netflix 284 | 285 | ? wawa.co.uk 286 | : options: --exclude website 287 | -website: wawa.co.uk 288 | 289 | ? movie.mp4 290 | : options: --exclude mimetype 291 | -mimetype: video/mp4 292 | 293 | ? another movie.mkv 294 | : options: --exclude container 295 | -container: mkv 296 | 297 | ? series s02e01 298 | : options: --exclude type 299 | -type: episode 300 | 301 | ? 
series s02e01 302 | : options: --exclude type 303 | -type: episode 304 | 305 | ? Hotel.Hell.S01E01.720p.DD5.1.448kbps-ALANiS 306 | : options: --exclude audio_bit_rate 307 | -audio_bit_rate: 448Kbps 308 | 309 | ? Katy Perry - Pepsi & Billboard Summer Beats Concert Series 2012 1080i HDTV 20 Mbps DD2.0 MPEG2-TrollHD.ts 310 | : options: --exclude video_bit_rate 311 | -video_bit_rate: 20Mbps 312 | 313 | ? "[Figmentos] Monster 34 - At the End of Darkness [781219F1].mkv" 314 | : options: --exclude crc32 315 | -crc32: 781219F1 316 | 317 | ? 1080p25 318 | : options: --exclude frame_rate 319 | screen_size: 1080p 320 | -frame_rate: 25fps 321 | 322 | ? 1080p25 323 | : options: --exclude screen_size 324 | -screen_size: 1080p 325 | -frame_rate: 25fps 326 | 327 | ? 1080p25 328 | : options: --include frame_rate 329 | -screen_size: 1080p 330 | -frame_rate: 25fps 331 | 332 | ? 1080p 30fps 333 | : options: --exclude screen_size 334 | -screen_size: 1080p 335 | frame_rate: 30fps 336 | -------------------------------------------------------------------------------- /guessit/test/rules/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # pylint: disable=pointless-statement, missing-docstring, invalid-name 4 | -------------------------------------------------------------------------------- /guessit/test/rules/audio_codec.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | 4 | 5 | ? +MP3 6 | ? +lame 7 | ? +lame3.12 8 | ? +lame3.100 9 | : audio_codec: MP3 10 | 11 | ? +MP2 12 | : audio_codec: MP2 13 | 14 | ? +DolbyDigital 15 | ? +DD 16 | ? +Dolby Digital 17 | ? +AC3 18 | ? +AC-3 19 | : audio_codec: Dolby Digital 20 | 21 | ? +DDP 22 | ? +DD+ 23 | ? +EAC3 24 | ? +EAC-3 25 | ? +E-AC-3 26 | ?
+E-AC3 27 | : audio_codec: Dolby Digital Plus 28 | 29 | ? +DolbyAtmos 30 | ? +Dolby Atmos 31 | ? +Atmos 32 | ? -Atmosphere 33 | : audio_codec: Dolby Atmos 34 | 35 | ? +AAC 36 | : audio_codec: AAC 37 | 38 | ? +Flac 39 | : audio_codec: FLAC 40 | 41 | ? +DTS 42 | : audio_codec: DTS 43 | 44 | ? +True-HD 45 | ? +trueHD 46 | : audio_codec: Dolby TrueHD 47 | 48 | ? +True-HD51 49 | ? +trueHD51 50 | : audio_codec: Dolby TrueHD 51 | audio_channels: '5.1' 52 | 53 | ? +DTSHD 54 | ? +DTS HD 55 | ? +DTS-HD 56 | : audio_codec: DTS-HD 57 | 58 | ? +DTS-HDma 59 | ? +DTSMA 60 | : audio_codec: DTS-HD 61 | audio_profile: Master Audio 62 | 63 | ? +AC3-hq 64 | : audio_codec: Dolby Digital 65 | audio_profile: High Quality 66 | 67 | ? +AAC-HE 68 | : audio_codec: AAC 69 | audio_profile: High Efficiency 70 | 71 | ? +AAC-LC 72 | : audio_codec: AAC 73 | audio_profile: Low Complexity 74 | 75 | ? +AAC2.0 76 | ? +AAC20 77 | : audio_codec: AAC 78 | audio_channels: '2.0' 79 | 80 | ? +7.1 81 | ? +7ch 82 | ? +8ch 83 | : audio_channels: '7.1' 84 | 85 | ? +5.1 86 | ? +5ch 87 | ? +6ch 88 | : audio_channels: '5.1' 89 | 90 | ? +2ch 91 | ? +2.0 92 | ? +stereo 93 | : audio_channels: '2.0' 94 | 95 | ? +1.0 96 | ? +1ch 97 | ? +mono 98 | : audio_channels: '1.0' 99 | 100 | ? DD5.1 101 | ? DD51 102 | : audio_codec: Dolby Digital 103 | audio_channels: '5.1' 104 | 105 | ? -51 106 | : audio_channels: '5.1' 107 | 108 | ? DTS-HD.HRA 109 | ? DTSHD.HRA 110 | ? DTS-HD.HR 111 | ? DTSHD.HR 112 | ? -HRA 113 | ? -HR 114 | : audio_codec: DTS-HD 115 | audio_profile: High Resolution Audio 116 | 117 | ? DTSES 118 | ? DTS-ES 119 | ? -ES 120 | : audio_codec: DTS 121 | audio_profile: Extended Surround 122 | 123 | ? DTS:X 124 | ? DTS-X 125 | ? DTSX 126 | : audio_codec: DTS:X 127 | 128 | ? DD-EX 129 | ? DDEX 130 | ? -EX 131 | : audio_codec: Dolby Digital 132 | audio_profile: EX 133 | 134 | ? OPUS 135 | : audio_codec: Opus 136 | 137 | ? Vorbis 138 | : audio_codec: Vorbis 139 | 140 | ? PCM 141 | : audio_codec: PCM 142 | 143 | ? 
LPCM 144 | : audio_codec: LPCM 145 | -------------------------------------------------------------------------------- /guessit/test/rules/bonus.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? Movie Title-x01-Other Title.mkv 4 | ? Movie Title-x01-Other Title 5 | ? directory/Movie Title-x01-Other Title/file.mkv 6 | : title: Movie Title 7 | bonus_title: Other Title 8 | bonus: 1 9 | 10 | -------------------------------------------------------------------------------- /guessit/test/rules/cd.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? cd 1of3 4 | : cd: 1 5 | cd_count: 3 6 | 7 | ? Some.Title-DVDRIP-x264-CDP 8 | : cd: !!null 9 | release_group: CDP 10 | video_codec: H.264 11 | -------------------------------------------------------------------------------- /guessit/test/rules/common_words.yml: -------------------------------------------------------------------------------- 1 | ? is 2 | : title: is 3 | 4 | ? it 5 | : title: it 6 | 7 | ? am 8 | : title: am 9 | 10 | ? mad 11 | : title: mad 12 | 13 | ? men 14 | : title: men 15 | 16 | ? man 17 | : title: man 18 | 19 | ? run 20 | : title: run 21 | 22 | ? sin 23 | : title: sin 24 | 25 | ? st 26 | : title: st 27 | 28 | ? to 29 | : title: to 30 | 31 | ? 'no' 32 | : title: 'no' 33 | 34 | ? non 35 | : title: non 36 | 37 | ? war 38 | : title: war 39 | 40 | ? min 41 | : title: min 42 | 43 | ? new 44 | : title: new 45 | 46 | ? car 47 | : title: car 48 | 49 | ? day 50 | : title: day 51 | 52 | ? bad 53 | : title: bad 54 | 55 | ? bat 56 | : title: bat 57 | 58 | ? fan 59 | : title: fan 60 | 61 | ? fry 62 | : title: fry 63 | 64 | ? cop 65 | : title: cop 66 | 67 | ? 
zen 68 | : title: zen 69 | 70 | ? gay 71 | : title: gay 72 | 73 | ? fat 74 | : title: fat 75 | 76 | ? one 77 | : title: one 78 | 79 | ? cherokee 80 | : title: cherokee 81 | 82 | ? got 83 | : title: got 84 | 85 | ? an 86 | : title: an 87 | 88 | ? as 89 | : title: as 90 | 91 | ? cat 92 | : title: cat 93 | 94 | ? her 95 | : title: her 96 | 97 | ? be 98 | : title: be 99 | 100 | ? hat 101 | : title: hat 102 | 103 | ? sun 104 | : title: sun 105 | 106 | ? may 107 | : title: may 108 | 109 | ? my 110 | : title: my 111 | 112 | ? mr 113 | : title: mr 114 | 115 | ? rum 116 | : title: rum 117 | 118 | ? pi 119 | : title: pi 120 | 121 | ? bb 122 | : title: bb 123 | 124 | ? bt 125 | : title: bt 126 | 127 | ? tv 128 | : title: tv 129 | 130 | ? aw 131 | : title: aw 132 | 133 | ? by 134 | : title: by 135 | 136 | ? md 137 | : other: Mic Dubbed 138 | 139 | ? mp 140 | : title: mp 141 | 142 | ? cd 143 | : title: cd 144 | 145 | ? in 146 | : title: in 147 | 148 | ? ad 149 | : title: ad 150 | 151 | ? ice 152 | : title: ice 153 | 154 | ? ay 155 | : title: ay 156 | 157 | ? at 158 | : title: at 159 | 160 | ? star 161 | : title: star 162 | 163 | ? so 164 | : title: so 165 | 166 | ? he 167 | : title: he 168 | 169 | ? do 170 | : title: do 171 | 172 | ? ax 173 | : title: ax 174 | 175 | ? mx 176 | : title: mx 177 | 178 | ? bas 179 | : title: bas 180 | 181 | ? de 182 | : title: de 183 | 184 | ? le 185 | : title: le 186 | 187 | ? son 188 | : title: son 189 | 190 | ? ne 191 | : title: ne 192 | 193 | ? ca 194 | : title: ca 195 | 196 | ? ce 197 | : title: ce 198 | 199 | ? et 200 | : title: et 201 | 202 | ? que 203 | : title: que 204 | 205 | ? mal 206 | : title: mal 207 | 208 | ? est 209 | : title: est 210 | 211 | ? vol 212 | : title: vol 213 | 214 | ? or 215 | : title: or 216 | 217 | ? mon 218 | : title: mon 219 | 220 | ? se 221 | : title: se 222 | 223 | ? je 224 | : title: je 225 | 226 | ? tu 227 | : title: tu 228 | 229 | ? me 230 | : title: me 231 | 232 | ? ma 233 | : title: ma 234 | 235 | ? 
va 236 | : title: va 237 | 238 | ? au 239 | : country: AU 240 | 241 | ? lu 242 | : title: lu 243 | 244 | ? wa 245 | : title: wa 246 | 247 | ? ga 248 | : title: ga 249 | 250 | ? ao 251 | : title: ao 252 | 253 | ? la 254 | : title: la 255 | 256 | ? el 257 | : title: el 258 | 259 | ? del 260 | : title: del 261 | 262 | ? por 263 | : title: por 264 | 265 | ? mar 266 | : title: mar 267 | 268 | ? al 269 | : title: al 270 | 271 | ? un 272 | : title: un 273 | 274 | ? ind 275 | : title: ind 276 | 277 | ? arw 278 | : title: arw 279 | 280 | ? ts 281 | : source: Telesync 282 | 283 | ? ii 284 | : title: ii 285 | 286 | ? bin 287 | : title: bin 288 | 289 | ? chan 290 | : title: chan 291 | 292 | ? ss 293 | : title: ss 294 | 295 | ? san 296 | : title: san 297 | 298 | ? oss 299 | : title: oss 300 | 301 | ? iii 302 | : title: iii 303 | 304 | ? vi 305 | : title: vi 306 | 307 | ? ben 308 | : title: ben 309 | 310 | ? da 311 | : title: da 312 | 313 | ? lt 314 | : title: lt 315 | 316 | ? ch 317 | : title: ch 318 | 319 | ? sr 320 | : title: sr 321 | 322 | ? ps 323 | : title: ps 324 | 325 | ? cx 326 | : title: cx 327 | 328 | ? vo 329 | : title: vo 330 | 331 | ? mkv 332 | : container: mkv 333 | 334 | ? avi 335 | : container: avi 336 | 337 | ? dmd 338 | : title: dmd 339 | 340 | ? the 341 | : title: the 342 | 343 | ? dis 344 | : title: dis 345 | 346 | ? cut 347 | : title: cut 348 | 349 | ? stv 350 | : title: stv 351 | 352 | ? des 353 | : title: des 354 | 355 | ? dia 356 | : title: dia 357 | 358 | ? and 359 | : title: and 360 | 361 | ? cab 362 | : title: cab 363 | 364 | ? sub 365 | : title: sub 366 | 367 | ? mia 368 | : title: mia 369 | 370 | ? rim 371 | : title: rim 372 | 373 | ? las 374 | : title: las 375 | 376 | ? une 377 | : title: une 378 | 379 | ? par 380 | : title: par 381 | 382 | ? srt 383 | : container: srt 384 | 385 | ? ano 386 | : title: ano 387 | 388 | ? toy 389 | : title: toy 390 | 391 | ? job 392 | : title: job 393 | 394 | ? gag 395 | : title: gag 396 | 397 | ? 
reel 398 | : title: reel 399 | 400 | ? www 401 | : title: www 402 | 403 | ? for 404 | : title: for 405 | 406 | ? ayu 407 | : title: ayu 408 | 409 | ? csi 410 | : title: csi 411 | 412 | ? ren 413 | : title: ren 414 | 415 | ? moi 416 | : title: moi 417 | 418 | ? sur 419 | : title: sur 420 | 421 | ? fer 422 | : title: fer 423 | 424 | ? fun 425 | : title: fun 426 | 427 | ? two 428 | : title: two 429 | 430 | ? big 431 | : title: big 432 | 433 | ? psy 434 | : title: psy 435 | 436 | ? air 437 | : title: air 438 | 439 | ? brazil 440 | : title: brazil 441 | 442 | ? jordan 443 | : title: jordan 444 | 445 | ? bs 446 | : title: bs 447 | 448 | ? kz 449 | : title: kz 450 | 451 | ? gt 452 | : title: gt 453 | 454 | ? im 455 | : title: im 456 | 457 | ? pt 458 | : language: pt 459 | 460 | ? scr 461 | : title: scr 462 | 463 | ? sd 464 | : title: sd 465 | 466 | ? hr 467 | : other: High Resolution 468 | -------------------------------------------------------------------------------- /guessit/test/rules/country.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? Us.this.is.title 4 | ? this.is.title.US 5 | : country: US 6 | title: this is title 7 | 8 | ? This.is.Us 9 | : title: This is Us 10 | 11 | ? This.Is.Us 12 | : options: --no-default-config 13 | title: This Is Us 14 | -------------------------------------------------------------------------------- /guessit/test/rules/date.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? +09.03.08 4 | ? +09.03.2008 5 | ? +2008.03.09 6 | : date: 2008-03-09 7 | 8 | ? +31.01.15 9 | ? +31.01.2015 10 | ? +15.01.31 11 | ? +2015.01.31 12 | : date: 2015-01-31 13 | 14 | ?
+01.02.03 15 | : date: 2003-02-01 16 | 17 | ? +01.02.03 18 | : options: --date-year-first 19 | date: 2001-02-03 20 | 21 | ? +01.02.03 22 | : options: --date-day-first 23 | date: 2003-02-01 24 | 25 | ? 1919 26 | ? 2030 27 | : !!map {} 28 | 29 | ? 2029 30 | : year: 2029 31 | 32 | ? (1920) 33 | : year: 1920 34 | 35 | ? 2012 36 | : year: 2012 37 | 38 | ? 2011 2013 (2012) (2015) # first marked year is guessed. 39 | : title: "2011 2013" 40 | year: 2012 41 | 42 | ? 2012 2009 S01E02 2015 # If no year is marked, the second one is guessed. 43 | : title: "2012" 44 | year: 2009 45 | episode_title: "2015" 46 | 47 | ? Something 2 mar 2013) 48 | : title: Something 49 | date: 2013-03-02 50 | type: episode 51 | -------------------------------------------------------------------------------- /guessit/test/rules/edition.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? Director's cut 4 | ? Edition Director's cut 5 | : edition: Director's Cut 6 | 7 | ? Collector 8 | ? Collector Edition 9 | ? Edition Collector 10 | : edition: Collector 11 | 12 | ? Special Edition 13 | ? Edition Special 14 | ? -Special 15 | : edition: Special 16 | 17 | ? Criterion Edition 18 | ? Criterion Collection 19 | ? Edition Criterion 20 | ? CC 21 | : edition: Criterion 22 | 23 | ? Deluxe 24 | ? Deluxe Edition 25 | ? Edition Deluxe 26 | : edition: Deluxe 27 | 28 | ? Super Movie Alternate XViD 29 | ? Super Movie Alternative XViD 30 | ? Super Movie Alternate Cut XViD 31 | ? Super Movie Alternative Cut XViD 32 | : edition: Alternative Cut 33 | 34 | ? Remaster 35 | ? Remastered 36 | ? 4k-Remaster 37 | ? 4k-Remastered 38 | ? 4k Remaster 39 | ? 4k Remastered 40 | : edition: Remastered 41 | 42 | ? Restore 43 | ? Restored 44 | ? 4k-Restore 45 | ? 4k-Restored 46 | ? 4k Restore 47 | ? 4k Restored 48 | : edition: Restored 49 | 50 | ? 
ddc 51 | : edition: Director's Definitive Cut 52 | 53 | ? IMAX 54 | ? IMAX Edition 55 | : edition: IMAX 56 | 57 | ? ultimate edition 58 | ? -ultimate 59 | : edition: Ultimate 60 | 61 | ? ultimate collector edition 62 | ? ultimate collector's edition 63 | ? ultimate collectors edition 64 | ? -collectors edition 65 | ? -ultimate edition 66 | : edition: [Ultimate, Collector] 67 | 68 | ? ultimate collectors edition dc 69 | : edition: [Ultimate, Collector, Director's Cut] 70 | 71 | ? fan edit 72 | ? fan edition 73 | ? fan collection 74 | : edition: Fan 75 | 76 | ? ultimate fan edit 77 | ? ultimate fan edition 78 | ? ultimate fan collection 79 | : edition: [Ultimate, Fan] 80 | -------------------------------------------------------------------------------- /guessit/test/rules/episodes.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? +2x5 4 | ? +2X5 5 | ? +02x05 6 | ? +2X05 7 | ? +02x5 8 | ? S02E05 9 | ? s02e05 10 | ? s02e5 11 | ? s2e05 12 | ? s02ep05 13 | ? s2EP5 14 | ? -s03e05 15 | ? -s02e06 16 | ? -3x05 17 | ? -2x06 18 | : season: 2 19 | episode: 5 20 | 21 | ? "+0102" 22 | ? "+102" 23 | : season: 1 24 | episode: 2 25 | 26 | ? "0102 S03E04" 27 | ? "S03E04 102" 28 | : season: 3 29 | episode: 4 30 | 31 | ? +serie Saison 2 other 32 | ? +serie Season 2 other 33 | ? +serie Saisons 2 other 34 | ? +serie Seasons 2 other 35 | ? +serie Season Two other 36 | ? +serie Season II other 37 | : season: 2 38 | 39 | ? Some Series.S02E01.Episode.title.mkv 40 | ? Some Series/Season 02/E01-Episode title.mkv 41 | ? Some Series/Season 02/Some Series-E01-Episode title.mkv 42 | ? Some Dummy Directory/Season 02/Some Series-E01-Episode title.mkv 43 | ? -Some Dummy Directory/Season 02/E01-Episode title.mkv 44 | ? Some Series/Unsafe Season 02/Some Series-E01-Episode title.mkv 45 | ?
-Some Series/Unsafe Season 02/E01-Episode title.mkv 46 | ? Some Series/Season 02/E01-Episode title.mkv 47 | ? Some Series/ Season 02/E01-Episode title.mkv 48 | ? Some Dummy Directory/Some Series S02/E01-Episode title.mkv 49 | ? Some Dummy Directory/S02 Some Series/E01-Episode title.mkv 50 | : title: Some Series 51 | episode_title: Episode title 52 | season: 2 53 | episode: 1 54 | 55 | ? Some Series.S02E01.mkv 56 | ? Some Series/Season 02/E01.mkv 57 | ? Some Series/Season 02/Some Series-E01.mkv 58 | ? Some Dummy Directory/Season 02/Some Series-E01.mkv 59 | ? -Some Dummy Directory/Season 02/E01.mkv 60 | ? Some Series/Unsafe Season 02/Some Series-E01.mkv 61 | ? -Some Series/Unsafe Season 02/E01.mkv 62 | ? Some Series/Season 02/E01.mkv 63 | ? Some Series/ Season 02/E01.mkv 64 | ? Some Dummy Directory/Some Series S02/E01-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.mkv 65 | : title: Some Series 66 | season: 2 67 | episode: 1 68 | 69 | ? Some Series S03E01E02 70 | : title: Some Series 71 | season: 3 72 | episode: [1, 2] 73 | 74 | ? Some Series S01S02S03 75 | ? Some Series S01-02-03 76 | ? Some Series S01 S02 S03 77 | ? Some Series S01 02 03 78 | : title: Some Series 79 | season: [1, 2, 3] 80 | 81 | ? Some Series E01E02E03 82 | ? Some Series E01-02-03 83 | ? Some Series E01-03 84 | ? Some Series E01 E02 E03 85 | ? Some Series E01 02 03 86 | : title: Some Series 87 | episode: [1, 2, 3] 88 | 89 | ? Some Series E01E02E04 90 | ? Some Series E01 E02 E04 91 | ? Some Series E01 02 04 92 | : title: Some Series 93 | episode: [1, 2, 4] 94 | 95 | ? Some Series E01-02-04 96 | ? Some Series E01-04 97 | ? Some Series E01-04 98 | : title: Some Series 99 | episode: [1, 2, 3, 4] 100 | 101 | ? Some Series E01-02-E04 102 | : title: Some Series 103 | episode: [1, 2, 3, 4] 104 | 105 | ? Episode 3 106 | ? -Episode III 107 | : episode: 3 108 | 109 | ? Episode 3 110 | ? Episode III 111 | : options: -t episode 112 | episode: 3 113 | 114 | ? 
-A very special movie 115 | : episode_details: Special 116 | 117 | ? -A very special episode 118 | : options: -t episode 119 | episode_details: Special 120 | 121 | ? A very special episode s06 special 122 | : options: -t episode 123 | title: A very special episode 124 | episode_details: Special 125 | 126 | ? 12 Monkeys\Season 01\Episode 05\12 Monkeys - S01E05 - The Night Room.mkv 127 | : container: mkv 128 | title: 12 Monkeys 129 | episode: 5 130 | season: 1 131 | 132 | ? S03E02.X.1080p 133 | : episode: 2 134 | screen_size: 1080p 135 | season: 3 136 | 137 | ? Something 1 x 2-FlexGet 138 | : options: -t episode 139 | title: Something 140 | season: 1 141 | episode: 2 142 | episode_title: FlexGet 143 | 144 | ? Show.Name.-.Season.1.to.3.-.Mp4.1080p 145 | ? Show.Name.-.Season.1~3.-.Mp4.1080p 146 | ? Show.Name.-.Saison.1.a.3.-.Mp4.1080p 147 | : container: mp4 148 | screen_size: 1080p 149 | season: 150 | - 1 151 | - 2 152 | - 3 153 | title: Show Name 154 | 155 | ? Show.Name.Season.1.3&5.HDTV.XviD-GoodGroup[SomeTrash] 156 | ? Show.Name.Season.1.3 and 5.HDTV.XviD-GoodGroup[SomeTrash] 157 | : source: HDTV 158 | release_group: GoodGroup[SomeTrash] 159 | season: 160 | - 1 161 | - 3 162 | - 5 163 | title: Show Name 164 | type: episode 165 | video_codec: Xvid 166 | 167 | ? Show.Name.Season.1.2.3-5.HDTV.XviD-GoodGroup[SomeTrash] 168 | ? Show.Name.Season.1.2.3~5.HDTV.XviD-GoodGroup[SomeTrash] 169 | ? Show.Name.Season.1.2.3 to 5.HDTV.XviD-GoodGroup[SomeTrash] 170 | : source: HDTV 171 | release_group: GoodGroup[SomeTrash] 172 | season: 173 | - 1 174 | - 2 175 | - 3 176 | - 4 177 | - 5 178 | title: Show Name 179 | type: episode 180 | video_codec: Xvid 181 | 182 | ? The.Get.Down.S01EP01.FRENCH.720p.WEBRIP.XVID-STR 183 | : episode: 1 184 | source: Web 185 | other: Rip 186 | language: fr 187 | release_group: STR 188 | screen_size: 720p 189 | season: 1 190 | title: The Get Down 191 | type: episode 192 | video_codec: Xvid 193 | 194 | ? 
My.Name.Is.Earl.S01E01-S01E21.SWE-SUB 195 | : episode: 196 | - 1 197 | - 2 198 | - 3 199 | - 4 200 | - 5 201 | - 6 202 | - 7 203 | - 8 204 | - 9 205 | - 10 206 | - 11 207 | - 12 208 | - 13 209 | - 14 210 | - 15 211 | - 16 212 | - 17 213 | - 18 214 | - 19 215 | - 20 216 | - 21 217 | season: 1 218 | subtitle_language: sv 219 | title: My Name Is Earl 220 | type: episode 221 | 222 | ? Show.Name.Season.4.Episodes.1-12 223 | : episode: 224 | - 1 225 | - 2 226 | - 3 227 | - 4 228 | - 5 229 | - 6 230 | - 7 231 | - 8 232 | - 9 233 | - 10 234 | - 11 235 | - 12 236 | season: 4 237 | title: Show Name 238 | type: episode 239 | 240 | ? show name s01.to.s04 241 | : season: 242 | - 1 243 | - 2 244 | - 3 245 | - 4 246 | title: show name 247 | type: episode 248 | 249 | ? epi 250 | : options: -t episode 251 | title: epi 252 | 253 | ? Episode20 254 | ? Episode 20 255 | : episode: 20 256 | 257 | ? Episode50 258 | ? Episode 50 259 | : episode: 50 260 | 261 | ? Episode51 262 | ? Episode 51 263 | : episode: 51 264 | 265 | ? Episode70 266 | ? Episode 70 267 | : episode: 70 268 | 269 | ? Episode71 270 | ? Episode 71 271 | : episode: 71 272 | 273 | ? S01D02.3-5-GROUP 274 | : disc: [2, 3, 4, 5] 275 | 276 | ? S01D02&4-6&8 277 | : disc: [2, 4, 5, 6, 8] 278 | 279 | ? Something.4x05-06 280 | ? Something - 4x05-06 281 | ? Something:4x05-06 282 | ? Something 4x05-06 283 | ? Something-4x05-06 284 | : title: Something 285 | season: 4 286 | episode: 287 | - 5 288 | - 6 289 | 290 | ? Something.4x05-06 291 | ? Something - 4x05-06 292 | ? Something:4x05-06 293 | ? Something 4x05-06 294 | ? Something-4x05-06 295 | : options: -T something 296 | title: Something 297 | season: 4 298 | episode: 299 | - 5 300 | - 6 301 | 302 | ? Colony 23/S01E01.Some.title.mkv 303 | : title: Colony 23 304 | season: 1 305 | episode: 1 306 | episode_title: Some title 307 | 308 | ? Show.Name.E02.2010.mkv 309 | : options: -t episode 310 | title: Show Name 311 | year: 2010 312 | episode: 2 313 | 314 | ? 
Show.Name.E02.S2010.mkv 315 | : options: -t episode 316 | title: Show Name 317 | year: 2010 318 | season: 2010 319 | episode: 2 320 | 321 | 322 | ? Show.Name.E02.2010.mkv 323 | : title: Show Name 324 | year: 2010 325 | episode: 2 326 | 327 | ? Show.Name.E02.S2010.mkv 328 | : title: Show Name 329 | year: 2010 330 | season: 2010 331 | episode: 2 332 | 333 | ? Show Name - S32-Dummy 45-Ep 6478 334 | : title: Show Name 335 | episode_title: Dummy 45 336 | season: 32 337 | episode: 6478 338 | 339 | ? Show Name - S32-Week 45-Ep 6478 340 | : title: Show Name 341 | season: 32 342 | week: 45 343 | episode: 6478 344 | -------------------------------------------------------------------------------- /guessit/test/rules/film.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? Film Title-f01-Series Title.mkv 4 | ? Film Title-f01-Series Title 5 | ? directory/Film Title-f01-Series Title/file.mkv 6 | : title: Series Title 7 | film_title: Film Title 8 | film: 1 9 | 10 | -------------------------------------------------------------------------------- /guessit/test/rules/language.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? +English 4 | ? .ENG. 5 | : language: English 6 | 7 | ? +French 8 | : language: French 9 | 10 | ? +SubFrench 11 | ? +SubFr 12 | ? +STFr 13 | ? ST.FR 14 | : subtitle_language: French 15 | 16 | ? +ENG.-.sub.FR 17 | ? ENG.-.FR Sub 18 | ? +ENG.-.SubFR 19 | ? +ENG.-.FRSUB 20 | ? +ENG.-.FRSUBS 21 | ? +ENG.-.FR-SUBS 22 | : language: English 23 | subtitle_language: French 24 | 25 | ? "{Fr-Eng}.St{Fr-Eng}" 26 | ? 
"Le.Prestige[x264.{Fr-Eng}.St{Fr-Eng}.Chaps].mkv" 27 | : language: [French, English] 28 | subtitle_language: [French, English] 29 | 30 | ? +ENG.-.sub.SWE 31 | ? ENG.-.SWE Sub 32 | ? +ENG.-.SubSWE 33 | ? +ENG.-.SWESUB 34 | ? +ENG.-.sub.SV 35 | ? ENG.-.SV Sub 36 | ? +ENG.-.SubSV 37 | ? +ENG.-.SVSUB 38 | : language: English 39 | subtitle_language: Swedish 40 | 41 | ? The English Patient (1996) 42 | : title: The English Patient 43 | -language: english 44 | 45 | ? French.Kiss.1995.1080p 46 | : title: French Kiss 47 | -language: french 48 | -------------------------------------------------------------------------------- /guessit/test/rules/other.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? +DVDSCR 4 | ? +DVDScreener 5 | ? +DVD-SCR 6 | ? +DVD Screener 7 | ? +DVD AnythingElse Screener 8 | ? -DVD AnythingElse SCR 9 | : other: Screener 10 | 11 | ? +AudioFix 12 | ? +AudioFixed 13 | ? +Audio Fix 14 | ? +Audio Fixed 15 | : other: Audio Fixed 16 | 17 | ? +SyncFix 18 | ? +SyncFixed 19 | ? +Sync Fix 20 | ? +Sync Fixed 21 | : other: Sync Fixed 22 | 23 | ? +DualAudio 24 | ? +Dual Audio 25 | : other: Dual Audio 26 | 27 | ? +ws 28 | ? +WideScreen 29 | ? +Wide Screen 30 | : other: Widescreen 31 | 32 | # Fix must be surround by others properties to be matched. 33 | ? DVD.fix.XViD 34 | ? -DVD.Fix 35 | ? -Fix.XViD 36 | : other: Fix 37 | -proper_count: 1 38 | 39 | ? -DVD.BlablaBla.Fix.Blablabla.XVID 40 | ? -DVD.BlablaBla.Fix.XVID 41 | ? -DVD.Fix.Blablabla.XVID 42 | : other: Fix 43 | -proper_count: 1 44 | 45 | 46 | ? DVD.Real.PROPER.REPACK 47 | : other: Proper 48 | proper_count: 3 49 | 50 | 51 | ? Proper.720p 52 | ? +Repack 53 | ? +Rerip 54 | : other: Proper 55 | proper_count: 1 56 | 57 | ? XViD.Fansub 58 | : other: Fan Subtitled 59 | 60 | ? XViD.Fastsub 61 | : other: Fast Subtitled 62 | 63 | ? 
+Season Complete 64 | ? -Complete 65 | : other: Complete 66 | 67 | ? R5 68 | : other: Region 5 69 | 70 | ? RC 71 | : other: Region C 72 | 73 | ? PreAir 74 | ? Pre Air 75 | : other: Preair 76 | 77 | ? Screener 78 | : other: Screener 79 | 80 | ? Remux 81 | : other: Remux 82 | 83 | ? Hybrid 84 | : other: Hybrid 85 | 86 | ? 3D.2019 87 | : other: 3D 88 | 89 | ? HD 90 | : other: HD 91 | 92 | ? FHD 93 | ? FullHD 94 | ? Full HD 95 | : other: Full HD 96 | 97 | ? UHD 98 | ? Ultra 99 | ? UltraHD 100 | ? Ultra HD 101 | : other: Ultra HD 102 | 103 | ? mHD # ?? 104 | ? HDLight 105 | : other: Micro HD 106 | 107 | ? HQ 108 | : other: High Quality 109 | 110 | ? hr 111 | : other: High Resolution 112 | 113 | ? PAL 114 | : other: PAL 115 | 116 | ? SECAM 117 | : other: SECAM 118 | 119 | ? NTSC 120 | : other: NTSC 121 | 122 | ? LDTV 123 | : other: Low Definition 124 | 125 | ? LD 126 | : other: Line Dubbed 127 | 128 | ? MD 129 | : other: Mic Dubbed 130 | 131 | ? -The complete movie 132 | : other: Complete 133 | 134 | ? +The complete movie 135 | : title: The complete movie 136 | 137 | ? +AC3-HQ 138 | : audio_profile: High Quality 139 | 140 | ? Other-HQ 141 | : other: High Quality 142 | 143 | ? reenc 144 | ? re-enc 145 | ? re-encoded 146 | ? reencoded 147 | : other: Reencoded 148 | 149 | ? CONVERT XViD 150 | : other: Converted 151 | 152 | ? +HDRIP # it's a Rip from non specified HD source 153 | : other: [HD, Rip] 154 | 155 | ? SDR 156 | : other: Standard Dynamic Range 157 | 158 | ? HDR 159 | ? HDR10 160 | ? -HDR100 161 | : other: HDR10 162 | 163 | ? BT2020 164 | ? BT.2020 165 | ? -BT.20200 166 | ? -BT.2021 167 | : other: BT.2020 168 | 169 | ? Upscaled 170 | ? Upscale 171 | : other: Upscaled 172 | 173 | ? REPACK5 174 | ? 
ReRip5 175 | : other: Proper 176 | proper_count: 5 -------------------------------------------------------------------------------- /guessit/test/rules/part.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? Filename Part 3.mkv 4 | ? Filename Part III.mkv 5 | ? Filename Part Three.mkv 6 | ? Filename Part Trois.mkv 7 | : title: Filename 8 | part: 3 9 | 10 | ? Part 3 11 | ? Part III 12 | ? Part Three 13 | ? Part Trois 14 | ? Part3 15 | : part: 3 16 | 17 | ? -Something.Apt.1 18 | : part: 1 -------------------------------------------------------------------------------- /guessit/test/rules/processors.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | 4 | # Prefer information from the last path part. 5 | ? Some movie (2000)/Some movie (2001).mkv 6 | ? Some movie (2001)/Some movie.mkv 7 | : year: 2001 8 | container: mkv 9 | -------------------------------------------------------------------------------- /guessit/test/rules/processors_test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # pylint: disable=pointless-statement, missing-docstring, invalid-name, pointless-string-statement 4 | 5 | from rebulk.match import Matches, Match 6 | 7 | from ...rules.processors import StripSeparators 8 | 9 | 10 | def test_strip_separators(): 11 | strip_separators = StripSeparators() 12 | 13 | matches = Matches() 14 | 15 | m = Match(3, 11, input_string="pre.ABCDEF.post") 16 | 17 | assert m.raw == '.ABCDEF.' 
18 | matches.append(m) 19 | 20 | returned_matches = strip_separators.when(matches, None) 21 | assert returned_matches == matches 22 | 23 | strip_separators.then(matches, returned_matches, None) 24 | 25 | assert m.raw == 'ABCDEF' 26 | 27 | 28 | def test_strip_separators_keep_acronyms(): 29 | strip_separators = StripSeparators() 30 | 31 | matches = Matches() 32 | 33 | m = Match(0, 13, input_string=".S.H.I.E.L.D.") 34 | m2 = Match(0, 22, input_string=".Agent.Of.S.H.I.E.L.D.") 35 | 36 | assert m.raw == '.S.H.I.E.L.D.' 37 | matches.append(m) 38 | matches.append(m2) 39 | 40 | returned_matches = strip_separators.when(matches, None) 41 | assert returned_matches == matches 42 | 43 | strip_separators.then(matches, returned_matches, None) 44 | 45 | assert m.raw == '.S.H.I.E.L.D.' 46 | assert m2.raw == 'Agent.Of.S.H.I.E.L.D.' 47 | -------------------------------------------------------------------------------- /guessit/test/rules/release_group.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? Some.Title.XViD-ReleaseGroup 4 | ? Some.Title.XViD-ReleaseGroup.mkv 5 | : release_group: ReleaseGroup 6 | 7 | ? Some.Title.XViD-by.Artik[SEDG].avi 8 | : release_group: Artik[SEDG] 9 | 10 | ? "[ABC] Some.Title.avi" 11 | ? some/folder/[ABC]Some.Title.avi 12 | : release_group: ABC 13 | 14 | ? "[ABC] Some.Title.XViD-GRP.avi" 15 | ? some/folder/[ABC]Some.Title.XViD-GRP.avi 16 | : release_group: GRP 17 | 18 | ? "[ABC] Some.Title.S01E02.avi" 19 | ? some/folder/[ABC]Some.Title.S01E02.avi 20 | : release_group: ABC 21 | 22 | ? Some.Title.XViD-S2E02.NoReleaseGroup.avi 23 | : release_group: !!null 24 | 25 | ? Test.S01E01-FooBar-Group 26 | : options: -G group -G xxxx 27 | episode: 1 28 | episode_title: FooBar 29 | release_group: Group 30 | season: 1 31 | title: Test 32 | type: episode 33 | 34 | ? 
Test.S01E01-FooBar-Group 35 | : options: -G re:gr.?up -G xxxx 36 | episode: 1 37 | episode_title: FooBar 38 | release_group: Group 39 | season: 1 40 | title: Test 41 | type: episode 42 | 43 | ? Show.Name.x264-byEMP 44 | : title: Show Name 45 | video_codec: H.264 46 | release_group: byEMP 47 | 48 | ? Show.Name.x264-NovaRip 49 | : title: Show Name 50 | video_codec: H.264 51 | release_group: NovaRip 52 | 53 | ? Show.Name.x264-PARTiCLE 54 | : title: Show Name 55 | video_codec: H.264 56 | release_group: PARTiCLE 57 | 58 | ? Show.Name.x264-POURMOi 59 | : title: Show Name 60 | video_codec: H.264 61 | release_group: POURMOi 62 | 63 | ? Show.Name.x264-RipPourBox 64 | : title: Show Name 65 | video_codec: H.264 66 | release_group: RipPourBox 67 | 68 | ? Show.Name.x264-RiPRG 69 | : title: Show Name 70 | video_codec: H.264 71 | release_group: RiPRG 72 | 73 | ? Archer (2009) S13E01 The Big Con (1080p AMZN Webrip x265 10bit EAC3 5.1 - JBENT)[TAoE] 74 | : release_group: JBENT TAoE 75 | 76 | ? Dark Phoenix (2019) (1080p BluRay x265 HEVC 10bit AAC 7.1 Tigole) [QxR] 77 | : release_group: Tigole QxR 78 | 79 | ? The Peripheral (2022) Season 1 S01 (1080p AMZN WEB-DL x265 HEVC 10bit DDP5.1 D0ct0rLew) [SEV] 80 | : release_group: D0ct0rLew SEV -------------------------------------------------------------------------------- /guessit/test/rules/screen_size.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? +360p 4 | ? +360px 5 | ? -360 6 | ? +500x360 7 | ? -250x360 8 | : screen_size: 360p 9 | 10 | ? +640x360 11 | ? -640x360i 12 | ? -684x360i 13 | : screen_size: 360p 14 | aspect_ratio: 1.778 15 | 16 | ? +360i 17 | : screen_size: 360i 18 | 19 | ? +480x360i 20 | ? -480x360p 21 | ? -450x360 22 | : screen_size: 360i 23 | aspect_ratio: 1.333 24 | 25 | ? +368p 26 | ? +368px 27 | ? -368i 28 | ? -368 29 | ? 
+500x368 30 | : screen_size: 368p 31 | 32 | ? -490x368 33 | ? -700x368 34 | : screen_size: 368p 35 | 36 | ? +492x368p 37 | : screen_size: 38 | aspect_ratio: 1.337 39 | 40 | ? +654x368 41 | : screen_size: 368p 42 | aspect_ratio: 1.777 43 | 44 | ? +698x368 45 | : screen_size: 368p 46 | aspect_ratio: 1.897 47 | 48 | ? +368i 49 | : -screen_size: 368i 50 | 51 | ? +480p 52 | ? +480px 53 | ? -480i 54 | ? -480 55 | ? -500x480 56 | ? -638x480 57 | ? -920x480 58 | : screen_size: 480p 59 | 60 | ? +640x480 61 | : screen_size: 480p 62 | aspect_ratio: 1.333 63 | 64 | ? +852x480 65 | : screen_size: 480p 66 | aspect_ratio: 1.775 67 | 68 | ? +910x480 69 | : screen_size: 480p 70 | aspect_ratio: 1.896 71 | 72 | ? +500x480 73 | ? +500 x 480 74 | ? +500 * 480 75 | ? +500x480p 76 | ? +500X480i 77 | : screen_size: 500x480 78 | aspect_ratio: 1.042 79 | 80 | ? +480i 81 | ? +852x480i 82 | : screen_size: 480i 83 | 84 | ? +540p 85 | ? +540px 86 | ? -540i 87 | ? -540 88 | : screen_size: 540p 89 | 90 | ? +540i 91 | : screen_size: 540i 92 | 93 | ? +576p 94 | ? +576px 95 | ? -576i 96 | ? -576 97 | ? -500x576 98 | ? -766x576 99 | ? -1094x576 100 | : screen_size: 576p 101 | 102 | ? +768x576 103 | : screen_size: 576p 104 | aspect_ratio: 1.333 105 | 106 | ? +1024x576 107 | : screen_size: 576p 108 | aspect_ratio: 1.778 109 | 110 | ? +1092x576 111 | : screen_size: 576p 112 | aspect_ratio: 1.896 113 | 114 | ? +500x576 115 | : screen_size: 500x576 116 | aspect_ratio: 0.868 117 | 118 | ? +576i 119 | : screen_size: 576i 120 | 121 | ? +720p 122 | ? +720px 123 | ? -720i 124 | ? 720hd 125 | ? 720pHD 126 | ? -720 127 | ? -500x720 128 | ? -950x720 129 | ? -1368x720 130 | : screen_size: 720p 131 | 132 | ? +960x720 133 | : screen_size: 720p 134 | aspect_ratio: 1.333 135 | 136 | ? +1280x720 137 | : screen_size: 720p 138 | aspect_ratio: 1.778 139 | 140 | ? +1366x720 141 | : screen_size: 720p 142 | aspect_ratio: 1.897 143 | 144 | ? +500x720 145 | : screen_size: 500x720 146 | aspect_ratio: 0.694 147 | 148 | ? 
+900p 149 | ? +900px 150 | ? -900i 151 | ? -900 152 | ? -500x900 153 | ? -1198x900 154 | ? -1710x900 155 | : screen_size: 900p 156 | 157 | ? +1200x900 158 | : screen_size: 900p 159 | aspect_ratio: 1.333 160 | 161 | ? +1600x900 162 | : screen_size: 900p 163 | aspect_ratio: 1.778 164 | 165 | ? +1708x900 166 | : screen_size: 900p 167 | aspect_ratio: 1.898 168 | 169 | ? +500x900 170 | ? +500x900p 171 | ? +500x900i 172 | : screen_size: 500x900 173 | aspect_ratio: 0.556 174 | 175 | ? +900i 176 | : screen_size: 900i 177 | 178 | ? +1080p 179 | ? +1080px 180 | ? +1080hd 181 | ? +1080pHD 182 | ? -1080i 183 | ? -1080 184 | ? -500x1080 185 | ? -1438x1080 186 | ? -2050x1080 187 | : screen_size: 1080p 188 | 189 | ? +1440x1080 190 | : screen_size: 1080p 191 | aspect_ratio: 1.333 192 | 193 | ? +1920x1080 194 | : screen_size: 1080p 195 | aspect_ratio: 1.778 196 | 197 | ? +2048x1080 198 | : screen_size: 1080p 199 | aspect_ratio: 1.896 200 | 201 | ? +1080i 202 | ? -1080p 203 | : screen_size: 1080i 204 | 205 | ? 1440p 206 | : screen_size: 1440p 207 | 208 | ? +500x1080 209 | : screen_size: 500x1080 210 | aspect_ratio: 0.463 211 | 212 | ? +2160p 213 | ? +2160px 214 | ? -2160i 215 | ? -2160 216 | ? +4096x2160 217 | ? +4k 218 | ? -2878x2160 219 | ? -4100x2160 220 | : screen_size: 2160p 221 | 222 | ? +2880x2160 223 | : screen_size: 2160p 224 | aspect_ratio: 1.333 225 | 226 | ? +3840x2160 227 | : screen_size: 2160p 228 | aspect_ratio: 1.778 229 | 230 | ? +4098x2160 231 | : screen_size: 2160p 232 | aspect_ratio: 1.897 233 | 234 | ? +500x2160 235 | : screen_size: 500x2160 236 | aspect_ratio: 0.231 237 | 238 | ? +4320p 239 | ? +4320px 240 | ? -4320i 241 | ? -4320 242 | ? -5758x2160 243 | ? -8198x2160 244 | : screen_size: 4320p 245 | 246 | ? +5760x4320 247 | : screen_size: 4320p 248 | aspect_ratio: 1.333 249 | 250 | ? +7680x4320 251 | : screen_size: 4320p 252 | aspect_ratio: 1.778 253 | 254 | ? +8196x4320 255 | : screen_size: 4320p 256 | aspect_ratio: 1.897 257 | 258 | ? 
+500x4320 259 | : screen_size: 500x4320 260 | aspect_ratio: 0.116 261 | 262 | ? Test.File.720hd.bluray 263 | ? Test.File.720p24 264 | ? Test.File.720p30 265 | ? Test.File.720p50 266 | ? Test.File.720p60 267 | ? Test.File.720p120 268 | : screen_size: 720p 269 | 270 | ? Test.File.400p 271 | : options: 272 | advanced_config: 273 | screen_size: 274 | progressive: ["400"] 275 | screen_size: 400p 276 | 277 | ? Test.File2.400p 278 | : options: 279 | advanced_config: 280 | screen_size: 281 | progressive: ["400"] 282 | screen_size: 400p 283 | 284 | ? Test.File.720p 285 | : options: 286 | advanced_config: 287 | screen_size: 288 | progressive: ["400"] 289 | screen_size: 720p 290 | -------------------------------------------------------------------------------- /guessit/test/rules/size.yml: -------------------------------------------------------------------------------- 1 | ? 1.1tb 2 | : size: 1.1TB 3 | 4 | ? 123mb 5 | : size: 123MB 6 | 7 | ? 4.3gb 8 | : size: 4.3GB 9 | -------------------------------------------------------------------------------- /guessit/test/rules/source.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? +VHS 4 | ? -VHSAnythingElse 5 | ? -SomeVHS stuff 6 | ? -VH 7 | ? -VHx 8 | : source: VHS 9 | -other: Rip 10 | 11 | ? +VHSRip 12 | ? +VHS-Rip 13 | ? +VhS_rip 14 | ? +VHS.RIP 15 | ? -VHS 16 | ? -VHxRip 17 | : source: VHS 18 | other: Rip 19 | 20 | ? +Cam 21 | : source: Camera 22 | -other: Rip 23 | 24 | ? +CamRip 25 | ? +CaM Rip 26 | ? +Cam_Rip 27 | ? +cam.rip 28 | ? -Cam 29 | : source: Camera 30 | other: Rip 31 | 32 | ? +HDCam 33 | ? +HD-Cam 34 | : source: HD Camera 35 | -other: Rip 36 | 37 | ? +HDCamRip 38 | ? +HD-Cam.rip 39 | ? -HDCam 40 | ? -HD-Cam 41 | : source: HD Camera 42 | other: Rip 43 | 44 | ? +Telesync 45 | ? +TS 46 | : source: Telesync 47 | -other: Rip 48 | 49 | ? 
+TelesyncRip 50 | ? +TSRip 51 | ? -Telesync 52 | ? -TS 53 | : source: Telesync 54 | other: Rip 55 | 56 | ? +HD TS 57 | ? -Hd.Ts # ts file extension 58 | ? -HD.TS # ts file extension 59 | ? +Hd-Ts 60 | : source: HD Telesync 61 | -other: Rip 62 | 63 | ? +HD TS Rip 64 | ? +Hd-Ts-Rip 65 | ? -HD TS 66 | ? -Hd-Ts 67 | : source: HD Telesync 68 | other: Rip 69 | 70 | ? +Workprint 71 | ? +workPrint 72 | ? +WorkPrint 73 | ? +WP 74 | ? -Work Print 75 | : source: Workprint 76 | -other: Rip 77 | 78 | ? +Telecine 79 | ? +teleCine 80 | ? +TC 81 | ? -Tele Cine 82 | : source: Telecine 83 | -other: Rip 84 | 85 | ? +Telecine Rip 86 | ? +teleCine-Rip 87 | ? +TC-Rip 88 | ? -Telecine 89 | ? -TC 90 | : source: Telecine 91 | other: Rip 92 | 93 | ? +HD-TELECINE 94 | ? +HDTC 95 | : source: HD Telecine 96 | -other: Rip 97 | 98 | ? +HD-TCRip 99 | ? +HD TELECINE RIP 100 | ? -HD-TELECINE 101 | ? -HDTC 102 | : source: HD Telecine 103 | other: Rip 104 | 105 | ? +PPV 106 | : source: Pay-per-view 107 | -other: Rip 108 | 109 | ? +ppv-rip 110 | ? -PPV 111 | : source: Pay-per-view 112 | other: Rip 113 | 114 | ? -TV 115 | ? +SDTV 116 | ? +TV-Dub 117 | : source: TV 118 | -other: Rip 119 | 120 | ? +SDTVRIP 121 | ? +Rip sd tv 122 | ? +TvRip 123 | ? +Rip TV 124 | ? -TV 125 | ? -SDTV 126 | : source: TV 127 | other: Rip 128 | 129 | ? +DVB 130 | ? +pdTV 131 | ? +Pd Tv 132 | : source: Digital TV 133 | -other: Rip 134 | 135 | ? +DVB-Rip 136 | ? +DvBRiP 137 | ? +pdtvRiP 138 | ? +pd tv RiP 139 | ? -DVB 140 | ? -pdTV 141 | ? -Pd Tv 142 | : source: Digital TV 143 | other: Rip 144 | 145 | ? +DVD 146 | ? +video ts 147 | ? +DVDR 148 | ? +DVD 9 149 | ? +dvd 5 150 | ? -dvd ts 151 | : source: DVD 152 | -source: Telesync 153 | -other: Rip 154 | 155 | ? +DVD-RIP 156 | ? -video ts 157 | ? -DVD 158 | ? -DVDR 159 | ? -DVD 9 160 | ? -dvd 5 161 | : source: DVD 162 | other: Rip 163 | 164 | ? +HDTV 165 | : source: HDTV 166 | -other: Rip 167 | 168 | ? +tv rip hd 169 | ? +HDtv Rip 170 | ? 
-HdRip # it's a Rip from non specified HD source 171 | ? -HDTV 172 | : source: HDTV 173 | other: Rip 174 | 175 | ? +VOD 176 | : source: Video on Demand 177 | -other: Rip 178 | 179 | ? +VodRip 180 | ? +vod rip 181 | ? -VOD 182 | : source: Video on Demand 183 | other: Rip 184 | 185 | ? +webrip 186 | ? +Web Rip 187 | ? +webdlrip 188 | ? +web dl rip 189 | ? +webcap 190 | ? +web cap 191 | ? +webcaprip 192 | ? +web cap rip 193 | : source: Web 194 | other: Rip 195 | 196 | ? +webdl 197 | ? +Web DL 198 | ? +webHD 199 | ? +WEB hd 200 | ? +web 201 | : source: Web 202 | -other: Rip 203 | 204 | ? +HDDVD 205 | ? +hd dvd 206 | : source: HD-DVD 207 | -other: Rip 208 | 209 | ? +hdDvdRip 210 | ? -HDDVD 211 | ? -hd dvd 212 | : source: HD-DVD 213 | other: Rip 214 | 215 | ? +BluRay 216 | ? +BD 217 | ? +BD5 218 | ? +BD9 219 | ? +BD25 220 | ? +bd50 221 | : source: Blu-ray 222 | -other: Rip 223 | 224 | ? +BR-Scr 225 | ? +BR.Screener 226 | : source: Blu-ray 227 | other: [Reencoded, Screener] 228 | -language: pt-BR 229 | 230 | ? +BR-Rip 231 | ? +BRRip 232 | : source: Blu-ray 233 | other: [Reencoded, Rip] 234 | -language: pt-BR 235 | 236 | ? +BluRay rip 237 | ? +BDRip 238 | ? -BluRay 239 | ? -BD 240 | ? -BR 241 | ? -BR rip 242 | ? -BD5 243 | ? -BD9 244 | ? -BD25 245 | ? -bd50 246 | : source: Blu-ray 247 | other: Rip 248 | 249 | ? XVID.NTSC.DVDR.nfo 250 | : source: DVD 251 | -other: Rip 252 | 253 | ? +AHDTV 254 | : source: Analog HDTV 255 | -other: Rip 256 | 257 | ? +dsr 258 | ? +dth 259 | : source: Satellite 260 | -other: Rip 261 | 262 | ? +dsrip 263 | ? +ds rip 264 | ? +dsrrip 265 | ? +dsr rip 266 | ? +satrip 267 | ? +sat rip 268 | ? +dthrip 269 | ? +dth rip 270 | ? -dsr 271 | ? -dth 272 | : source: Satellite 273 | other: Rip 274 | 275 | ? +UHDTV 276 | : source: Ultra HDTV 277 | -other: Rip 278 | 279 | ? +UHDRip 280 | ? +UHDTV Rip 281 | ? -UHDTV 282 | : source: Ultra HDTV 283 | other: Rip 284 | 285 | ? UHD Bluray 286 | ? UHD 2160p Bluray 287 | ? UHD 8bit Bluray 288 | ? 
UHD HQ 8bit Bluray 289 | ? Ultra Bluray 290 | ? Ultra HD Bluray 291 | ? Bluray ULTRA 292 | ? Bluray Ultra HD 293 | ? Bluray UHD 294 | ? 4K Bluray 295 | ? 2160p Bluray 296 | ? UHD 10bit HDR Bluray 297 | ? UHD HDR10 Bluray 298 | ? -HD Bluray 299 | ? -AMERICAN ULTRA (2015) 1080p Bluray 300 | ? -American.Ultra.2015.BRRip 301 | ? -BRRip XviD AC3-ULTRAS 302 | ? -UHD Proper Bluray 303 | : source: Ultra HD Blu-ray 304 | 305 | ? UHD.BRRip 306 | ? UHD.2160p.BRRip 307 | ? BRRip.2160p.UHD 308 | ? BRRip.[4K-2160p-UHD] 309 | : source: Ultra HD Blu-ray 310 | other: [Reencoded, Rip] 311 | 312 | ? UHD.2160p.BDRip 313 | ? BDRip.[4K-2160p-UHD] 314 | : source: Ultra HD Blu-ray 315 | other: Rip 316 | 317 | ? DM 318 | : source: Digital Master 319 | 320 | ? DMRIP 321 | ? DM-RIP 322 | : source: Digital Master 323 | other: Rip 324 | -------------------------------------------------------------------------------- /guessit/test/rules/title.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? Title Only 4 | ? -Title XViD 720p Only 5 | ? sub/folder/Title Only 6 | ? -sub/folder/Title XViD 720p Only 7 | ? Title Only.mkv 8 | ? Title Only.avi 9 | : title: Title Only 10 | 11 | ? Title Only/title_only.mkv 12 | : title: Title Only 13 | 14 | ? title_only.mkv 15 | : title: title only 16 | 17 | ? Some Title/some.title.mkv 18 | ? some.title/Some.Title.mkv 19 | : title: Some Title 20 | 21 | ? SOME TITLE/Some.title.mkv 22 | ? Some.title/SOME TITLE.mkv 23 | : title: Some title 24 | 25 | ? some title/Some.title.mkv 26 | ? Some.title/some title.mkv 27 | : title: Some title 28 | 29 | ? Some other title/Some.Other.title.mkv 30 | ? Some.Other title/Some other title.mkv 31 | : title: Some Other title 32 | 33 | ? This T.I.T.L.E. has dots 34 | ? This.T.I.T.L.E..has.dots 35 | : title: This T.I.T.L.E has dots 36 | 37 | ? 
This.T.I.T.L.E..has.dots.S01E02.This E.P.T.I.T.L.E.has.dots 38 | : title: This T.I.T.L.E has dots 39 | season: 1 40 | episode: 2 41 | episode_title: This E.P.T.I.T.L.E has dots 42 | type: episode 43 | 44 | ? /mydatapool/mydata/Videos/Shows/C/Caprica/Season 1/Apotheosis_1920x1080.mp4 45 | : title: Caprica 46 | episode_title: Apotheosis 47 | season: 1 48 | type: episode 49 | -------------------------------------------------------------------------------- /guessit/test/rules/video_codec.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? rv10 4 | ? rv13 5 | ? RV20 6 | ? Rv30 7 | ? rv40 8 | ? -xrv40 9 | : video_codec: RealVideo 10 | 11 | ? mpeg2 12 | ? MPEG2 13 | ? MPEG-2 14 | ? mpg2 15 | ? H262 16 | ? H.262 17 | ? x262 18 | ? -mpeg 19 | ? -xmpeg2 20 | ? -mpeg2x 21 | : video_codec: MPEG-2 22 | 23 | ? DivX 24 | ? -div X 25 | ? divx 26 | ? dvdivx 27 | ? DVDivX 28 | : video_codec: DivX 29 | 30 | ? XviD 31 | ? xvid 32 | ? -x vid 33 | : video_codec: Xvid 34 | 35 | ? h263 36 | ? x263 37 | ? h.263 38 | : video_codec: H.263 39 | 40 | ? h264 41 | ? x264 42 | ? h.264 43 | ? x.264 44 | ? AVC 45 | ? AVCHD 46 | ? -MPEG-4 47 | ? -mpeg4 48 | ? -mpeg 49 | ? -h 265 50 | ? -x265 51 | : video_codec: H.264 52 | 53 | ? h265 54 | ? x265 55 | ? h.265 56 | ? x.265 57 | ? hevc 58 | ? -h 264 59 | ? -x264 60 | : video_codec: H.265 61 | 62 | ? hevc10 63 | ? HEVC-YUV420P10 64 | : video_codec: H.265 65 | color_depth: 10-bit 66 | 67 | ? h265-HP 68 | : video_codec: H.265 69 | video_profile: High 70 | 71 | ? H.264-SC 72 | : video_codec: H.264 73 | video_profile: Scalable Video Coding 74 | 75 | ? mpeg4-AVC 76 | : video_codec: H.264 77 | video_profile: Advanced Video Codec High Definition 78 | 79 | ? AVCHD-SC 80 | ? 
H.264-AVCHD-SC 81 | : video_codec: H.264 82 | video_profile: 83 | - Scalable Video Coding 84 | - Advanced Video Codec High Definition 85 | 86 | ? VC1 87 | ? VC-1 88 | : video_codec: VC-1 89 | 90 | ? VP7 91 | : video_codec: VP7 92 | 93 | ? VP8 94 | ? VP80 95 | : video_codec: VP8 96 | 97 | ? VP9 98 | : video_codec: VP9 99 | -------------------------------------------------------------------------------- /guessit/test/rules/website.yml: -------------------------------------------------------------------------------- 1 | # Multiple input strings having same expected results can be chained. 2 | # Use - marker to check inputs that should not match results. 3 | ? +tvu.org.ru 4 | ? -tvu.unsafe.ru 5 | : website: tvu.org.ru 6 | 7 | ? +www.nimp.na 8 | ? -somewww.nimp.na 9 | ? -www.nimp.nawouak 10 | ? -nimp.na 11 | : website: www.nimp.na 12 | 13 | ? +wawa.co.uk 14 | ? -wawa.uk 15 | : website: wawa.co.uk 16 | 17 | ? -Dark.Net.S01E06.720p.HDTV.x264-BATV 18 | -Dark.Net.2015.720p.HDTV.x264-BATV 19 | : website: Dark.Net 20 | 21 | ? Dark.Net.S01E06.720p.HDTV.x264-BATV 22 | Dark.Net.2015.720p.HDTV.x264-BATV 23 | : title: Dark Net 24 | 25 | ? 
www.4MovieRulz.be - Ginny Weds Sunny (2020) 1080p Hindi Proper HDRip x264 DD5.1 - 2.4GB ESub.mkv 26 | : website: www.4MovieRulz.be 27 | -------------------------------------------------------------------------------- /guessit/test/suggested.json: -------------------------------------------------------------------------------- 1 | { 2 | "titles": [ 3 | "13 Reasons Why", 4 | "Star Wars: Episode VII - The Force Awakens", 5 | "3%", 6 | "The 100", 7 | "3 Percent", 8 | "This is Us", 9 | "Open Season 2", 10 | "Game of Thrones", 11 | "The X-Files", 12 | "11.22.63" 13 | ], 14 | "suggested": [ 15 | "13 Reasons Why", 16 | "Star Wars: Episode VII - The Force Awakens", 17 | "The 100", 18 | "Open Season 2", 19 | "11.22.63" 20 | ] 21 | } -------------------------------------------------------------------------------- /guessit/test/test-input-file.txt: -------------------------------------------------------------------------------- 1 | Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv 2 | SecondFile.avi -------------------------------------------------------------------------------- /guessit/test/test_api.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # pylint: disable=pointless-statement, missing-docstring, invalid-name, pointless-string-statement 4 | import json 5 | import os 6 | from pathlib import Path 7 | 8 | import pytest 9 | from pytest_mock import MockerFixture 10 | 11 | from .. 
import api 12 | from ..api import guessit, properties, suggested_expected, GuessitException, default_api 13 | 14 | __location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__))) 15 | 16 | 17 | def test_default(): 18 | ret = guessit('Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv') 19 | assert ret and 'title' in ret 20 | 21 | 22 | def test_forced_unicode(): 23 | ret = guessit('Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv') 24 | assert ret and 'title' in ret and isinstance(ret['title'], str) 25 | 26 | 27 | def test_forced_binary(): 28 | ret = guessit(b'Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv') 29 | assert ret and 'title' in ret and isinstance(ret['title'], bytes) 30 | 31 | 32 | def test_pathlike_object(): 33 | path = Path('Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv') 34 | ret = guessit(path) 35 | assert ret and 'title' in ret 36 | 37 | 38 | def test_unicode_japanese(): 39 | ret = guessit('[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi') 40 | assert ret and 'title' in ret 41 | 42 | 43 | def test_unicode_japanese_options(): 44 | ret = guessit("[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi", options={"expected_title": ["阿维达"]}) 45 | assert ret and 'title' in ret and ret['title'] == "阿维达" 46 | 47 | 48 | def test_forced_unicode_japanese_options(): 49 | ret = guessit("[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi", options={"expected_title": ["阿维达"]}) 50 | assert ret and 'title' in ret and ret['title'] == "阿维达" 51 | 52 | 53 | def test_properties(): 54 | props = properties() 55 | assert 'video_codec' in props.keys() 56 | 57 | 58 | def test_exception(): 59 | with pytest.raises(GuessitException) as excinfo: 60 | guessit(object()) 61 | assert "An internal error has occurred in guessit" in str(excinfo.value) 62 | assert "Guessit Exception Report" in str(excinfo.value) 63 | assert "Please report at 
https://github.com/guessit-io/guessit/issues" in str(excinfo.value) 64 | 65 | 66 | def test_suggested_expected(): 67 | with open(os.path.join(__location__, 'suggested.json'), 'r', encoding='utf-8') as f: 68 | content = json.load(f) 69 | actual = suggested_expected(content['titles']) 70 | assert actual == content['suggested'] 71 | 72 | 73 | def test_should_rebuild_rebulk_on_advanced_config_change(mocker: MockerFixture): 74 | api.reset() 75 | rebulk_builder_spy = mocker.spy(api, 'rebulk_builder') 76 | 77 | string = "some.movie.trfr.mkv" 78 | 79 | result1 = default_api.guessit(string) 80 | 81 | assert result1.get('title') == 'some movie trfr' 82 | assert 'subtitle_language' not in result1 83 | 84 | rebulk_builder_spy.assert_called_once_with(mocker.ANY) 85 | rebulk_builder_spy.reset_mock() 86 | 87 | result2 = default_api.guessit(string, {'advanced_config': {'language': {'subtitle_prefixes': ['tr']}}}) 88 | 89 | assert result2.get('title') == 'some movie' 90 | assert str(result2.get('subtitle_language')) == 'fr' 91 | 92 | rebulk_builder_spy.assert_called_once_with(mocker.ANY) 93 | rebulk_builder_spy.reset_mock() 94 | 95 | 96 | def test_should_not_rebuild_rebulk_on_same_advanced_config(mocker: MockerFixture): 97 | api.reset() 98 | rebulk_builder_spy = mocker.spy(api, 'rebulk_builder') 99 | 100 | string = "some.movie.subfr.mkv" 101 | 102 | result1 = default_api.guessit(string) 103 | 104 | assert result1.get('title') == 'some movie' 105 | assert str(result1.get('subtitle_language')) == 'fr' 106 | 107 | rebulk_builder_spy.assert_called_once_with(mocker.ANY) 108 | rebulk_builder_spy.reset_mock() 109 | 110 | result2 = default_api.guessit(string) 111 | 112 | assert result2.get('title') == 'some movie' 113 | assert str(result2.get('subtitle_language')) == 'fr' 114 | 115 | assert rebulk_builder_spy.call_count == 0 116 | rebulk_builder_spy.reset_mock() 117 | -------------------------------------------------------------------------------- 
/guessit/test/test_api_unicode_literals.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # pylint: disable=pointless-statement, missing-docstring, invalid-name, pointless-string-statement 4 | 5 | 6 | import os 7 | 8 | import pytest 9 | 10 | from ..api import guessit, properties, GuessitException 11 | 12 | __location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__))) 13 | 14 | 15 | def test_default(): 16 | ret = guessit('Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv') 17 | assert ret and 'title' in ret 18 | 19 | 20 | def test_forced_unicode(): 21 | ret = guessit('Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv') 22 | assert ret and 'title' in ret and isinstance(ret['title'], str) 23 | 24 | 25 | def test_forced_binary(): 26 | ret = guessit(b'Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv') 27 | assert ret and 'title' in ret and isinstance(ret['title'], bytes) 28 | 29 | 30 | def test_unicode_japanese(): 31 | ret = guessit('[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi') 32 | assert ret and 'title' in ret 33 | 34 | 35 | def test_unicode_japanese_options(): 36 | ret = guessit("[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi", options={"expected_title": ["阿维达"]}) 37 | assert ret and 'title' in ret and ret['title'] == "阿维达" 38 | 39 | 40 | def test_forced_unicode_japanese_options(): 41 | ret = guessit("[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi", options={"expected_title": ["阿维达"]}) 42 | assert ret and 'title' in ret and ret['title'] == "阿维达" 43 | 44 | 45 | def test_ensure_custom_string_class(): 46 | class CustomStr(str): 47 | pass 48 | 49 | ret = guessit(CustomStr('some.title.1080p.mkv'), options={'advanced': True}) 50 | assert ret and 'screen_size' in ret and isinstance(ret['screen_size'].input_string, CustomStr) 51 | assert ret and 'title' in ret and 
isinstance(ret['title'].input_string, CustomStr) 52 | assert ret and 'container' in ret and isinstance(ret['container'].input_string, CustomStr) 53 | 54 | 55 | def test_properties(): 56 | props = properties() 57 | assert 'video_codec' in props.keys() 58 | 59 | 60 | def test_exception(): 61 | with pytest.raises(GuessitException) as excinfo: 62 | guessit(object()) 63 | assert "An internal error has occurred in guessit" in str(excinfo.value) 64 | assert "Guessit Exception Report" in str(excinfo.value) 65 | assert "Please report at https://github.com/guessit-io/guessit/issues" in str(excinfo.value) 66 | -------------------------------------------------------------------------------- /guessit/test/test_benchmark.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # pylint: disable=pointless-statement,missing-docstring,invalid-name,line-too-long 4 | import time 5 | 6 | import pytest 7 | 8 | from ..api import guessit 9 | 10 | 11 | def case1(): 12 | return guessit('Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv') 13 | 14 | 15 | def case2(): 16 | return guessit('Movies/Fantastic Mr Fox/Fantastic.Mr.Fox.2009.DVDRip.{x264+LC-AAC.5.1}{Fr-Eng}{Sub.Fr-Eng}-™.[sharethefiles.com].mkv') 17 | 18 | 19 | def case3(): 20 | return guessit('Series/dexter/Dexter.5x02.Hello,.Bandit.ENG.-.sub.FR.HDTV.XviD-AlFleNi-TeaM.[tvu.org.ru].avi') 21 | 22 | 23 | def case4(): 24 | return guessit('Movies/The Doors (1991)/09.03.08.The.Doors.(1991).BDRip.720p.AC3.X264-HiS@SiLUHD-English.[sharethefiles.com].mkv') 25 | 26 | 27 | @pytest.mark.benchmark( 28 | group="Performance Tests", 29 | min_time=1, 30 | max_time=2, 31 | min_rounds=5, 32 | timer=time.time, 33 | disable_gc=True, 34 | warmup=False 35 | ) 36 | @pytest.mark.skipif(True, reason="Disabled") 37 | class TestBenchmark: 38 | def test_case1(self, benchmark): 39 | ret = benchmark(case1) 40 | assert ret 41 | 42 | def test_case2(self, 
benchmark): 43 | ret = benchmark(case2) 44 | assert ret 45 | 46 | def test_case3(self, benchmark): 47 | ret = benchmark(case3) 48 | assert ret 49 | 50 | def test_case4(self, benchmark): 51 | ret = benchmark(case4) 52 | assert ret 53 | -------------------------------------------------------------------------------- /guessit/test/test_main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # pylint: disable=pointless-statement, missing-docstring, invalid-name 4 | import json 5 | import os 6 | import sys 7 | 8 | import pytest 9 | from _pytest.capture import CaptureFixture 10 | 11 | from ..__main__ import main 12 | 13 | __location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__))) 14 | 15 | 16 | # Prevent output from spamming the console 17 | @pytest.fixture(scope="function", autouse=True) 18 | def no_stdout(monkeypatch): 19 | with open(os.devnull, "w") as f: # pylint:disable=unspecified-encoding 20 | monkeypatch.setattr(sys, "stdout", f) 21 | yield 22 | 23 | def test_main_no_args(): 24 | main([]) 25 | 26 | 27 | def test_main(): 28 | main(['Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv']) 29 | 30 | 31 | def test_main_unicode(): 32 | main(['[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi']) 33 | 34 | 35 | def test_main_forced_unicode(): 36 | main(['Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv']) 37 | 38 | 39 | def test_main_verbose(): 40 | main(['Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv', '--verbose']) 41 | 42 | 43 | def test_main_yaml(): 44 | main(['Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv', '--yaml']) 45 | 46 | 47 | def test_main_json(): 48 | main(['Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv', '--json']) 49 | 50 | 51 | def test_main_show_property(): 52 | 
main(['Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv', '-P', 'title']) 53 | 54 | 55 | def test_main_advanced(): 56 | main(['Fear.and.Loathing.in.Las.Vegas.FRENCH.ENGLISH.720p.HDDVD.DTS.x264-ESiR.mkv', '-a']) 57 | 58 | 59 | def test_main_input(): 60 | main(['--input', os.path.join(__location__, 'test-input-file.txt')]) 61 | 62 | 63 | def test_main_properties(): 64 | main(['-p']) 65 | main(['-p', '--json']) 66 | main(['-p', '--yaml']) 67 | 68 | 69 | def test_main_values(): 70 | main(['-V']) 71 | main(['-V', '--json']) 72 | main(['-V', '--yaml']) 73 | 74 | 75 | def test_main_help(): 76 | with pytest.raises(SystemExit): 77 | main(['--help']) 78 | 79 | 80 | def test_main_version(): 81 | main(['--version']) 82 | 83 | 84 | def test_json_output_input_string(capsys: CaptureFixture): 85 | main(['--json', '--output-input-string', 'test.avi']) 86 | 87 | outerr = capsys.readouterr() 88 | data = json.loads(outerr.out) 89 | 90 | assert 'input_string' in data 91 | assert data['input_string'] == 'test.avi' 92 | 93 | 94 | def test_json_no_output_input_string(capsys: CaptureFixture): 95 | main(['--json', 'test.avi']) 96 | 97 | outerr = capsys.readouterr() 98 | data = json.loads(outerr.out) 99 | 100 | assert 'input_string' not in data 101 | -------------------------------------------------------------------------------- /guessit/test/test_options.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # pylint: disable=pointless-statement, missing-docstring, invalid-name, pointless-string-statement 4 | import os 5 | 6 | import pytest 7 | 8 | from ..options import get_options_file_locations, merge_options, load_config_file, ConfigurationException, \ 9 | load_config 10 | 11 | __location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__))) 12 | 13 | 14 | def test_config_locations(): 15 | homedir = '/root' 16 | cwd = '/root/cwd' 17 | 18 | locations = 
get_options_file_locations(homedir, cwd, True) 19 | assert len(locations) == 9 20 | 21 | assert '/root/.guessit/options.json' in locations 22 | assert '/root/.guessit/options.yml' in locations 23 | assert '/root/.guessit/options.yaml' in locations 24 | assert '/root/.config/guessit/options.json' in locations 25 | assert '/root/.config/guessit/options.yml' in locations 26 | assert '/root/.config/guessit/options.yaml' in locations 27 | assert '/root/cwd/guessit.options.json' in locations 28 | assert '/root/cwd/guessit.options.yml' in locations 29 | assert '/root/cwd/guessit.options.yaml' in locations 30 | 31 | 32 | def test_merge_configurations(): 33 | c1 = {'param1': True, 'param2': True, 'param3': False} 34 | c2 = {'param1': False, 'param2': True, 'param3': False} 35 | c3 = {'param1': False, 'param2': True, 'param3': False} 36 | 37 | merged = merge_options(c1, c2, c3) 38 | assert not merged['param1'] 39 | assert merged['param2'] 40 | assert not merged['param3'] 41 | 42 | merged = merge_options(c3, c2, c1) 43 | assert merged['param1'] 44 | assert merged['param2'] 45 | assert not merged['param3'] 46 | 47 | 48 | def test_merge_configurations_lists(): 49 | c1 = {'param1': [1], 'param2': True, 'param3': False} 50 | c2 = {'param1': [2], 'param2': True, 'param3': False} 51 | c3 = {'param1': [3], 'param2': True, 'param3': False} 52 | 53 | merged = merge_options(c1, c2, c3) 54 | assert merged['param1'] == [1, 2, 3] 55 | assert merged['param2'] 56 | assert not merged['param3'] 57 | 58 | merged = merge_options(c3, c2, c1) 59 | assert merged['param1'] == [3, 2, 1] 60 | assert merged['param2'] 61 | assert not merged['param3'] 62 | 63 | 64 | def test_merge_configurations_deep(): 65 | c1 = {'param1': [1], 'param2': {'d1': [1]}, 'param3': False} 66 | c2 = {'param1': [2], 'param2': {'d1': [2]}, 'param3': False} 67 | c3 = {'param1': [3], 'param2': {'d3': [3]}, 'param3': False} 68 | 69 | merged = merge_options(c1, c2, c3) 70 | assert merged['param1'] == [1, 2, 3] 71 | assert 
merged['param2']['d1'] == [1, 2] 72 | assert merged['param2']['d3'] == [3] 73 | assert 'd2' not in merged['param2'] 74 | assert not merged['param3'] 75 | 76 | merged = merge_options(c3, c2, c1) 77 | assert merged['param1'] == [3, 2, 1] 78 | assert merged['param2'] 79 | assert merged['param2']['d1'] == [2, 1] 80 | assert 'd2' not in merged['param2'] 81 | assert merged['param2']['d3'] == [3] 82 | assert not merged['param3'] 83 | 84 | 85 | def test_merge_configurations_pristine_all(): 86 | c1 = {'param1': [1], 'param2': True, 'param3': False} 87 | c2 = {'param1': [2], 'param2': True, 'param3': False, 'pristine': True} 88 | c3 = {'param1': [3], 'param2': True, 'param3': False} 89 | 90 | merged = merge_options(c1, c2, c3) 91 | assert merged['param1'] == [2, 3] 92 | assert merged['param2'] 93 | assert not merged['param3'] 94 | 95 | merged = merge_options(c3, c2, c1) 96 | assert merged['param1'] == [2, 1] 97 | assert merged['param2'] 98 | assert not merged['param3'] 99 | 100 | 101 | def test_merge_configurations_pristine_properties(): 102 | c1 = {'param1': [1], 'param2': False, 'param3': True} 103 | c2 = {'param1': [2], 'param2': True, 'param3': False, 'pristine': ['param2', 'param3']} 104 | c3 = {'param1': [3], 'param2': True, 'param3': False} 105 | 106 | merged = merge_options(c1, c2, c3) 107 | assert merged['param1'] == [1, 2, 3] 108 | assert merged['param2'] 109 | assert not merged['param3'] 110 | 111 | 112 | def test_merge_configurations_pristine_properties_deep(): 113 | c1 = {'param1': [1], 'param2': {'d1': False}, 'param3': True} 114 | c2 = {'param1': [2], 'param2': {'d1': True}, 'param3': False, 'pristine': ['param2', 'param3']} 115 | c3 = {'param1': [3], 'param2': {'d1': True}, 'param3': False} 116 | 117 | merged = merge_options(c1, c2, c3) 118 | assert merged['param1'] == [1, 2, 3] 119 | assert merged['param2'] 120 | assert not merged['param3'] 121 | 122 | 123 | def test_merge_configurations_pristine_properties2(): 124 | c1 = {'param1': [1], 'param2': False, 
'param3': True} 125 | c2 = {'param1': [2], 'param2': True, 'param3': False, 'pristine': ['param1', 'param2', 'param3']} 126 | c3 = {'param1': [3], 'param2': True, 'param3': False} 127 | 128 | merged = merge_options(c1, c2, c3) 129 | assert merged['param1'] == [2, 3] 130 | assert merged['param2'] 131 | assert not merged['param3'] 132 | 133 | 134 | def test_load_config_file(): 135 | json_config = load_config_file(os.path.join(__location__, 'config', 'test.json')) 136 | yml_config = load_config_file(os.path.join(__location__, 'config', 'test.yml')) 137 | yaml_config = load_config_file(os.path.join(__location__, 'config', 'test.yaml')) 138 | 139 | assert json_config['expected_title'] == ['The 100', 'OSS 117'] 140 | assert yml_config['expected_title'] == ['The 100', 'OSS 117'] 141 | assert yaml_config['expected_title'] == ['The 100', 'OSS 117'] 142 | 143 | assert json_config['yaml'] is False 144 | assert yml_config['yaml'] is True 145 | assert yaml_config['yaml'] is True 146 | 147 | with pytest.raises(ConfigurationException) as excinfo: 148 | load_config_file(os.path.join(__location__, 'config', 'dummy.txt')) 149 | 150 | assert excinfo.match('Configuration file extension is not supported for ".*?dummy.txt" file\\.') 151 | 152 | 153 | def test_load_config(): 154 | config = load_config({'no_default_config': True, 'param1': 'test', 155 | 'config': [os.path.join(__location__, 'config', 'test.yml')]}) 156 | 157 | assert not config.get('param1') 158 | 159 | assert config.get('advanced_config') # advanced_config is still loaded from default 160 | assert config['expected_title'] == ['The 100', 'OSS 117'] 161 | assert config['yaml'] is True 162 | 163 | config = load_config({'no_default_config': True, 'param1': 'test'}) 164 | 165 | assert not config.get('param1') 166 | 167 | assert 'expected_title' not in config 168 | assert 'yaml' not in config 169 | 170 | config = load_config({'no_default_config': True, 'param1': 'test', 'config': ['false']}) 171 | 172 | assert not 
config.get('param1') 173 | 174 | assert 'expected_title' not in config 175 | assert 'yaml' not in config 176 | -------------------------------------------------------------------------------- /guessit/yamlutils.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | """ 4 | YAML utilities 5 | """ 6 | 7 | from collections import OrderedDict 8 | 9 | import babelfish 10 | import yaml # pylint:disable=wrong-import-order 11 | 12 | from .rules.common.quantity import BitRate, FrameRate, Size 13 | 14 | 15 | class OrderedDictYAMLLoader(yaml.SafeLoader): 16 | """ 17 | A YAML loader that loads mappings into ordered dictionaries. 18 | From https://gist.github.com/enaeseth/844388 19 | """ 20 | 21 | def __init__(self, *args, **kwargs): 22 | yaml.SafeLoader.__init__(self, *args, **kwargs) 23 | 24 | self.add_constructor('tag:yaml.org,2002:map', type(self).construct_yaml_map) 25 | self.add_constructor('tag:yaml.org,2002:omap', type(self).construct_yaml_map) 26 | 27 | def construct_yaml_map(self, node): 28 | data = OrderedDict() 29 | yield data 30 | value = self.construct_mapping(node) 31 | data.update(value) 32 | 33 | def construct_mapping(self, node, deep=False): 34 | if isinstance(node, yaml.MappingNode): 35 | self.flatten_mapping(node) 36 | else: # pragma: no cover 37 | raise yaml.constructor.ConstructorError(None, None, 38 | f'expected a mapping node, but found {node.id}', node.start_mark) 39 | 40 | mapping = OrderedDict() 41 | for key_node, value_node in node.value: 42 | key = self.construct_object(key_node, deep=deep) 43 | try: 44 | hash(key) 45 | except TypeError as exc: # pragma: no cover 46 | raise yaml.constructor.ConstructorError('while constructing a mapping', 47 | node.start_mark, f'found unacceptable key ({exc})' 48 | , key_node.start_mark) 49 | value = self.construct_object(value_node, deep=deep) 50 | mapping[key] = value 51 | return mapping 52 | 53 | 54 | class
CustomDumper(yaml.SafeDumper): 55 | """ 56 | Custom YAML Dumper. 57 | """ 58 | pass # pylint:disable=unnecessary-pass 59 | 60 | 61 | def default_representer(dumper, data): 62 | """Default representer""" 63 | return dumper.represent_str(str(data)) 64 | 65 | 66 | CustomDumper.add_representer(babelfish.Language, default_representer) 67 | CustomDumper.add_representer(babelfish.Country, default_representer) 68 | CustomDumper.add_representer(BitRate, default_representer) 69 | CustomDumper.add_representer(FrameRate, default_representer) 70 | CustomDumper.add_representer(Size, default_representer) 71 | 72 | 73 | def ordered_dict_representer(dumper, data): 74 | """OrderedDict representer""" 75 | return dumper.represent_mapping('tag:yaml.org,2002:map', data.items()) 76 | 77 | 78 | CustomDumper.add_representer(OrderedDict, ordered_dict_representer) 79 | -------------------------------------------------------------------------------- /mkdocs.yml: -------------------------------------------------------------------------------- 1 | site_name: GuessIt 2 | 3 | site_url: https://guessit-io.github.io/guessit 4 | site_description: GuessIt is a python library that extracts as much information as possible from a video filename. 
5 | site_author: Rémi Alvergnat 6 | 7 | repo_url: https://github.com/guessit-io/guessit 8 | edit_uri: https://github.com/guessit-io/guessit/blob/develop/docs 9 | 10 | theme: 11 | language: 'en' 12 | name: 'material' 13 | 14 | markdown_extensions: 15 | - admonition 16 | - codehilite 17 | - toc: 18 | permalink: true 19 | - pymdownx.details 20 | - pymdownx.superfences 21 | 22 | nav: 23 | - Home: index.md 24 | - Properties: properties.md 25 | - Sources: sources.md 26 | - Configuration: configuration.md 27 | - Migration (2.x to 3.x): migration2to3.md 28 | - Migration (1.x to 2.x): migration.md 29 | 30 | plugins: 31 | - search 32 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.semantic_release] 2 | version_variables = ["guessit/__version__.py:__version__"] 3 | commit_message = "chore(release): release v{version}" 4 | commit_author = "github-actions " 5 | build_command = "" 6 | 7 | [tool.check-manifest] 8 | ignore = ["docs", "docs/*", ".dockerignore", "Dockerfile", "docker", "docker/*"] 9 | -------------------------------------------------------------------------------- /pytest.ini: -------------------------------------------------------------------------------- 1 | [pytest] 2 | addopts=-s --ignore=setup.py --ignore=build --ignore=docs --doctest-modules --doctest-glob='*.rst' 3 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # see https://caremad.io/blog/setup-vs-requirement/ 2 | -e . 
3 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import io 5 | import os 6 | import re 7 | 8 | from setuptools import setup, find_packages 9 | 10 | here = os.path.abspath(os.path.dirname(__file__)) 11 | 12 | with io.open(os.path.join(here, 'README.md'), encoding='utf-8') as f: 13 | readme = f.read() 14 | 15 | with io.open(os.path.join(here, 'CHANGELOG.md'), encoding='utf-8') as f: 16 | changelog = f.read() 17 | 18 | install_requires = ['rebulk>=3.2.0', 'babelfish>=0.6.0', 'python-dateutil', 'importlib-resources;python_version<"3.9"'] 19 | 20 | dev_require = ['tox', 'mkdocs', 'mkdocs-material', 'pyinstaller', 'wheel', 'python-semantic-release', 'twine'] 21 | 22 | tests_require = ['pytest', 'pytest-mock', 'pytest-benchmark', 'pytest-cov', 'pylint', 'PyYAML'] 23 | 24 | package_data = ['config/*', 'data/*'] 25 | 26 | entry_points = { 27 | 'console_scripts': [ 28 | 'guessit = guessit.__main__:main' 29 | ], 30 | } 31 | 32 | with io.open('guessit/__version__.py', 'r') as f: 33 | version = re.search(r'^__version__\s*=\s*[\'"]([^\'"]*)[\'"]$', f.read(), re.MULTILINE).group(1) 34 | 35 | args = dict(name='guessit', 36 | version=version, 37 | description='GuessIt - a library for guessing information from video filenames.', 38 | long_description=readme + '\n\n' + changelog, 39 | long_description_content_type='text/markdown', 40 | # Get strings from http://pypi.python.org/pypi?%3Aaction=list_classifiers 41 | classifiers=['Development Status :: 5 - Production/Stable', 42 | 'License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)', 43 | 'Operating System :: OS Independent', 44 | 'Intended Audience :: Developers', 45 | 'Programming Language :: Python :: 3', 46 | 'Programming Language :: Python :: 3.7', 47 | 'Programming Language :: Python :: 3.8', 48 | 'Programming Language :: 
Python :: 3.9', 49 | 'Programming Language :: Python :: 3.10', 50 | 'Programming Language :: Python :: 3.11', 51 | 'Topic :: Multimedia', 52 | 'Topic :: Software Development :: Libraries :: Python Modules' 53 | ], 54 | keywords='python library release parser name filename movies series episodes animes', 55 | author='Rémi Alvergnat', 56 | author_email='toilal.dev@gmail.com', 57 | url='https://guessit-io.github.io/guessit', 58 | download_url='https://pypi.python.org/packages/source/g/guessit/guessit-%s.tar.gz' % version, 59 | license='LGPLv3', 60 | packages=find_packages(), 61 | package_data={'guessit': package_data}, 62 | include_package_data=True, 63 | install_requires=install_requires, 64 | entry_points=entry_points, 65 | test_suite='guessit.test', 66 | zip_safe=True, 67 | extras_require={ 68 | 'test': tests_require, 69 | 'dev': dev_require 70 | }) 71 | 72 | setup(**args) 73 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist = py37,py38,py39,py310,py311,py312,pypy3.8,pypy3.9,pypy3.10 3 | 4 | [testenv] 5 | commands = 6 | {envbindir}/pip install -e .[dev,test] 7 | {envbindir}/pylint guessit 8 | {envbindir}/pytest 9 | --------------------------------------------------------------------------------