├── .circleci └── config.yml ├── .dockerignore ├── .gitignore ├── .pre-commit-config.yaml ├── .readthedocs.yaml ├── CODE_OF_CONDUCT.rst ├── CONTRIBUTING.rst ├── Dockerfile ├── LICENSE ├── README.rst ├── codemeta.json ├── doc ├── Makefile ├── api │ ├── environments.rst │ ├── utils.rst │ └── wrappers.rst ├── basic-usage.rst ├── code-of-conduct.rst ├── conf.py ├── contributing.rst ├── development-guide.rst ├── environments │ └── saturation-env.rst ├── example.org ├── example.py ├── example.rst ├── index.rst └── requirements.txt ├── gym_saturation ├── __init__.py ├── constants.py ├── envs │ ├── __init__.py │ ├── iprover_env.py │ ├── saturation_env.py │ └── vampire_env.py ├── py.typed ├── relay_server.py ├── resources │ ├── TPTP-mock │ │ └── Problems │ │ │ └── TST │ │ │ ├── TST001-1.p │ │ │ └── TST002-1.p │ └── vampire-mock ├── vampire_wrapper.py └── wrappers │ ├── __init__.py │ └── labels_extractor.py ├── joss-paper ├── architecture.png ├── architecture.svg ├── paper.bib ├── paper.md └── paper.tex ├── local-build.sh ├── poetry.lock ├── poetry.toml ├── pyproject.toml └── tableaux2023-paper ├── ast2vec.drawio.svg ├── gym-saturation.bib ├── gym-saturation.tex ├── iprover-gym.drawio.svg ├── llncs.cls ├── mean-reward.eps └── splncs04.bst /.circleci/config.yml: -------------------------------------------------------------------------------- 1 | version: 2.1 2 | jobs: 3 | build-and-test: 4 | docker: 5 | - image: inpefess/python_with_provers:2025.03.10 6 | steps: 7 | - checkout 8 | - run: 9 | name: use tox 10 | command: | 11 | pip install tox 12 | pyenv local 3.10.16 3.11.11 3.12.9 3.13.2 13 | tox 14 | - run: 15 | name: upload data to codecov 16 | command: | 17 | bash <(curl -s https://codecov.io/bash) -X gcov -X coveragepy 18 | - store_artifacts: 19 | path: build 20 | - store_test_results: 21 | path: test-results 22 | workflows: 23 | main: 24 | jobs: 25 | - build-and-test 26 | -------------------------------------------------------------------------------- /.dockerignore: 
-------------------------------------------------------------------------------- 1 | venv 2 | doc 3 | !doc/example.py 4 | **/.* 5 | dist 6 | joss-paper 7 | coverage.xml 8 | test-results 9 | **/*.*~ 10 | **/*~ 11 | **/__pycache__ 12 | **/flycheck_* -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | venv*/ 2 | *.*~ 3 | *~ 4 | .ipynb_checkpoints/ 5 | *.pyc 6 | test-results/ 7 | .coverage 8 | *.dat 9 | *.pt 10 | runs/ 11 | \#*.*# 12 | *.gz 13 | *.whl 14 | doc/_build/ 15 | *.pdf 16 | .idea/ 17 | *.out 18 | *.bbl 19 | *.aux 20 | *.bcf 21 | *.blg 22 | *.log 23 | .dir-locals.el 24 | coverage.xml 25 | flycheck_* 26 | *.md5 27 | *.pickle -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: local 3 | hooks: 4 | - id: ruff format 5 | name: ruff format 6 | entry: ruff format 7 | language: system 8 | types: [python] 9 | - id: ruff check 10 | name: ruff check 11 | entry: ruff check 12 | language: system 13 | types: [python] 14 | - id: pydoclint 15 | name: pydoclint 16 | entry: pydoclint 17 | language: system 18 | types: [python] 19 | - id: pyrefly 20 | name: pyrefly 21 | entry: pyrefly check 22 | language: system 23 | types: [python] 24 | -------------------------------------------------------------------------------- /.readthedocs.yaml: -------------------------------------------------------------------------------- 1 | # Read the Docs configuration file 2 | # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details 3 | 4 | version: 2 5 | 6 | build: 7 | os: ubuntu-24.04 8 | tools: 9 | python: "3.13" 10 | 11 | sphinx: 12 | configuration: doc/conf.py 13 | fail_on_warning: true 14 | 15 | python: 16 | install: 17 | - requirements: doc/requirements.txt 18 | 
-------------------------------------------------------------------------------- /CODE_OF_CONDUCT.rst: -------------------------------------------------------------------------------- 1 | Contributor Covenant Code of Conduct 2 | ==================================== 3 | 4 | Our Pledge 5 | ---------- 6 | 7 | We as members, contributors, and leaders pledge to make participation in 8 | our community a harassment-free experience for everyone, regardless of 9 | age, body size, visible or invisible disability, ethnicity, sex 10 | characteristics, gender identity and expression, level of experience, 11 | education, socio-economic status, nationality, personal appearance, 12 | race, caste, color, religion, or sexual identity and orientation. 13 | 14 | We pledge to act and interact in ways that contribute to an open, 15 | welcoming, diverse, inclusive, and healthy community. 16 | 17 | Our Standards 18 | ------------- 19 | 20 | Examples of behavior that contributes to a positive environment for our 21 | community include: 22 | 23 | - Demonstrating empathy and kindness toward other people 24 | - Being respectful of differing opinions, viewpoints, and experiences 25 | - Giving and gracefully accepting constructive feedback 26 | - Accepting responsibility and apologizing to those affected by our 27 | mistakes, and learning from the experience 28 | - Focusing on what is best not just for us as individuals, but for the 29 | overall community 30 | 31 | Examples of unacceptable behavior include: 32 | 33 | - The use of sexualized language or imagery, and sexual attention or 34 | advances of any kind 35 | - Trolling, insulting or derogatory comments, and personal or political 36 | attacks 37 | - Public or private harassment 38 | - Publishing others’ private information, such as a physical or email 39 | address, without their explicit permission 40 | - Other conduct which could reasonably be considered inappropriate in a 41 | professional setting 42 | 43 | Enforcement Responsibilities 
44 | ---------------------------- 45 | 46 | Community leaders are responsible for clarifying and enforcing our 47 | standards of acceptable behavior and will take appropriate and fair 48 | corrective action in response to any behavior that they deem 49 | inappropriate, threatening, offensive, or harmful. 50 | 51 | Community leaders have the right and responsibility to remove, edit, or 52 | reject comments, commits, code, wiki edits, issues, and other 53 | contributions that are not aligned to this Code of Conduct, and will 54 | communicate reasons for moderation decisions when appropriate. 55 | 56 | Scope 57 | ----- 58 | 59 | This Code of Conduct applies within all community spaces, and also 60 | applies when an individual is officially representing the community in 61 | public spaces. Examples of representing our community include using an 62 | official e-mail address, posting via an official social media account, 63 | or acting as an appointed representative at an online or offline event. 64 | 65 | Enforcement 66 | ----------- 67 | 68 | Instances of abusive, harassing, or otherwise unacceptable behavior may 69 | be reported to the community leaders responsible for enforcement at `the 70 | following email `__. All complaints will be 71 | reviewed and investigated promptly and fairly. 72 | 73 | All community leaders are obligated to respect the privacy and security 74 | of the reporter of any incident. 75 | 76 | Enforcement Guidelines 77 | ---------------------- 78 | 79 | Community leaders will follow these Community Impact Guidelines in 80 | determining the consequences for any action they deem in violation of 81 | this Code of Conduct: 82 | 83 | 1. Correction 84 | ~~~~~~~~~~~~~ 85 | 86 | **Community Impact**: Use of inappropriate language or other behavior 87 | deemed unprofessional or unwelcome in the community. 
88 | 89 | **Consequence**: A private, written warning from community leaders, 90 | providing clarity around the nature of the violation and an explanation 91 | of why the behavior was inappropriate. A public apology may be 92 | requested. 93 | 94 | 2. Warning 95 | ~~~~~~~~~~ 96 | 97 | **Community Impact**: A violation through a single incident or series of 98 | actions. 99 | 100 | **Consequence**: A warning with consequences for continued behavior. No 101 | interaction with the people involved, including unsolicited interaction 102 | with those enforcing the Code of Conduct, for a specified period of 103 | time. This includes avoiding interactions in community spaces as well as 104 | external channels like social media. Violating these terms may lead to a 105 | temporary or permanent ban. 106 | 107 | 3. Temporary Ban 108 | ~~~~~~~~~~~~~~~~ 109 | 110 | **Community Impact**: A serious violation of community standards, 111 | including sustained inappropriate behavior. 112 | 113 | **Consequence**: A temporary ban from any sort of interaction or public 114 | communication with the community for a specified period of time. No 115 | public or private interaction with the people involved, including 116 | unsolicited interaction with those enforcing the Code of Conduct, is 117 | allowed during this period. Violating these terms may lead to a 118 | permanent ban. 119 | 120 | 4. Permanent Ban 121 | ~~~~~~~~~~~~~~~~ 122 | 123 | **Community Impact**: Demonstrating a pattern of violation of community 124 | standards, including sustained inappropriate behavior, harassment of an 125 | individual, or aggression toward or disparagement of classes of 126 | individuals. 127 | 128 | **Consequence**: A permanent ban from any sort of public interaction 129 | within the community. 
130 | 131 | Attribution 132 | ----------- 133 | 134 | This Code of Conduct is adapted from the `Contributor 135 | Covenant `__, version 2.1, 136 | available at 137 | https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. 138 | 139 | Community Impact Guidelines were inspired by `Mozilla’s code of conduct 140 | enforcement ladder `__. 141 | 142 | For answers to common questions about this code of conduct, see the FAQ 143 | at https://www.contributor-covenant.org/faq. Translations are available 144 | at https://www.contributor-covenant.org/translations. 145 | -------------------------------------------------------------------------------- /CONTRIBUTING.rst: -------------------------------------------------------------------------------- 1 | ============ 2 | Contributing 3 | ============ 4 | 5 | Contributions are welcome, and they are greatly appreciated! Every 6 | little bit helps, and credit will always be given. Don't forget to 7 | read and adhere to the :ref:`code-of-conduct`. 8 | 9 | You can contribute in many ways: 10 | 11 | Types of Contributions 12 | ---------------------- 13 | 14 | Report Bugs 15 | ~~~~~~~~~~~ 16 | 17 | Report bugs at https://github.com/inpefess/gym-saturation/issues 18 | 19 | If you are reporting a bug, please include: 20 | 21 | * Your operating system name and version. 22 | * Any details about your local setup that might be helpful in 23 | troubleshooting. 24 | * Detailed steps to reproduce the bug. 25 | 26 | Fix Bugs 27 | ~~~~~~~~ 28 | 29 | Look through the GitHub issues for bugs. Anything tagged with "bug" 30 | and "help wanted" is open to whoever wants to implement a fix for it. 31 | 32 | Implement Features 33 | ~~~~~~~~~~~~~~~~~~ 34 | 35 | Look through the GitHub issues for features. Anything tagged with 36 | "enhancement" and "help wanted" is open to whoever wants to implement 37 | it. Contributors to the source code hold copyright of their work and 38 | should agree to distribute it under `Apache 2.0 39 | `__ licence. 
40 | 41 | Write Documentation 42 | ~~~~~~~~~~~~~~~~~~~ 43 | 44 | ``gym-saturation`` could always use more documentation, whether as 45 | part of the official docs, in docstrings, or even on the web in blog 46 | posts, articles, and such. Documentation authors hold copyright of 47 | their work and should agree to distribute it under `Apache 2.0 48 | `__ licence. 49 | 50 | Submit Feedback 51 | ~~~~~~~~~~~~~~~ 52 | 53 | The best way to send feedback is to file an issue at 54 | https://github.com/inpefess/gym-saturation/issues. 55 | 56 | If you are proposing a new feature: 57 | 58 | * Explain in detail how it would work. 59 | * Keep the scope as narrow as possible, to make it easier to 60 | implement. 61 | * Remember that this is a volunteer-driven project, and that 62 | contributions are welcome. Please refer to the 63 | :ref:`development-guide` for details. 64 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM inpefess/python_with_provers:2025.03.10 2 | COPY gym_saturation ./gym_saturation 3 | COPY pyproject.toml poetry.toml poetry.lock README.rst ./doc/example.py . 4 | RUN pip install -e . 5 | RUN pip install jupyterlab jupytext 6 | RUN jupytext-config set-default-viewer python nest_asyncio 7 | ENTRYPOINT ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", \ 8 | "--ServerApp.token=''", "--no-browser"] 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | https://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 
12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 
48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. 
Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. 
We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | https://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | .. 2 | Copyright 2021-2025 Boris Shminke 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | https://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 
15 | 16 | |PyPI version|\ |Anaconda|\ |CircleCI|\ |Documentation Status|\ |codecov|\ |DOI| 17 | 18 | gym-saturation 19 | ============== 20 | 21 | ``gym-saturation`` is a collection of `Gymnasium 22 | `__ environments for reinforcement 23 | learning (RL) agents guiding saturation-style automated theorem 24 | provers (ATPs) based on the `given clause algorithm 25 | `__. 26 | 27 | ``gym-saturation`` provides two environments that follow the same 28 | API, `SaturationEnv 29 | `__: 30 | ``VampireEnv`` --- for the `Vampire 31 | `__ prover, and ``IProverEnv`` 32 | --- for `iProver `__. 33 | 34 | ``gym-saturation`` may interest RL practitioners who want to 35 | apply their experience to theorem proving without implementing the 36 | logic-related machinery themselves. 37 | 38 | In particular, ATPs serving as ``gym-saturation`` backends 39 | encapsulate parsing the input formal language (usually one of the 40 | languages of the `TPTP `__ (Thousands of Problems for Theorem 41 | Provers) library), transforming the input formulae to the `clausal 42 | normal form 43 | `__, and logic 44 | inference using rules such as `resolution 45 | `__ and 46 | `superposition 47 | `__. 48 | 49 | How to Install 50 | ============== 51 | 52 | .. attention:: If you want to use ``VampireEnv``, you need a 53 | Vampire binary on your machine. For example, download the 54 | latest `release 55 | `__. 56 | 57 | To use ``IProverEnv``, please download a stable iProver 58 | `release 59 | `__ or build it from `this commit `__. 60 | 61 | The best way to install this package is to use ``pip``: 62 | 63 | .. code:: sh 64 | 65 | pip install gym-saturation 66 | 67 | Another option is to use ``conda``: 68 | 69 | .. code:: sh 70 | 71 | conda install -c conda-forge gym-saturation 72 | 73 | One can also run it in a Docker container (pre-packed with 74 | ``vampire`` and ``iproveropt`` binaries): 75 | 76 | .. 
code:: sh 77 | 78 | docker build -t gym-saturation https://github.com/inpefess/gym-saturation.git 79 | docker run --rm -p 8888:8888 --name gym-saturation -d gym-saturation 80 | 81 | and navigate to ``__ in 82 | your browser. 83 | 84 | How to use 85 | ========== 86 | 87 | One can use ``gym-saturation`` environments as any other Gymnasium environment: 88 | 89 | .. code:: python 90 | 91 | import gym_saturation 92 | import gymnasium 93 | 94 | env = gymnasium.make("Vampire-v0") # or "iProver-v0" 95 | # skip this line to use the default problem 96 | env.set_task("a-TPTP-problem-filename") 97 | observation, info = env.reset() 98 | terminated, truncated = False, False 99 | while not (terminated or truncated): 100 | # apply policy 101 | action = ... 102 | observation, reward, terminated, truncated, info = env.step(str(action)) 103 | env.close() 104 | 105 | Have a look at the basic `tutorial `__. 106 | 107 | More Documentation 108 | ================== 109 | 110 | More documentation can be found 111 | `here `__. 112 | 113 | Related Projects 114 | ================= 115 | 116 | Other projects using RL-guidance for ATPs include: 117 | 118 | * `TRAIL `__ 119 | * `FLoP `__ (see `the paper `__ for more details) 120 | * `lazyCoP `__ (see `the paper `__ for more details) 121 | 122 | Other projects not using RL per se, but iterating a supervised 123 | learning procedure instead: 124 | 125 | * ENIGMA (several repos, e.g. `this one 126 | `__ for 127 | iProver; see `the paper `__ for 128 | others) 129 | * `Deepire `__ 130 | 131 | How to Contribute 132 | ================= 133 | 134 | Please follow `the contribution guide `__ while adhering to `the code of conduct `__. 135 | 136 | How to Cite 137 | ============ 138 | 139 | If you are writing a research paper and want to cite ``gym-saturation``, please use the following `DOI `__. 140 | 141 | .. |PyPI version| image:: https://badge.fury.io/py/gym-saturation.svg 142 | :target: https://badge.fury.io/py/gym-saturation 143 | .. 
|CircleCI| image:: https://circleci.com/gh/inpefess/gym-saturation.svg?style=svg 144 | :target: https://circleci.com/gh/inpefess/gym-saturation 145 | .. |Documentation Status| image:: https://readthedocs.org/projects/gym-saturation/badge/?version=latest 146 | :target: https://gym-saturation.readthedocs.io/en/latest/?badge=latest 147 | .. |codecov| image:: https://codecov.io/gh/inpefess/gym-saturation/branch/master/graph/badge.svg 148 | :target: https://codecov.io/gh/inpefess/gym-saturation 149 | .. |DOI| image:: https://img.shields.io/badge/DOI-10.1007%2F978--3--031--43513--3__11-blue 150 | :target: https://doi.org/10.1007/978-3-031-43513-3_11 151 | .. |Anaconda| image:: https://anaconda.org/conda-forge/gym-saturation/badges/version.svg 152 | :target: https://anaconda.org/conda-forge/gym-saturation 153 | -------------------------------------------------------------------------------- /codemeta.json: -------------------------------------------------------------------------------- 1 | { 2 | "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld", 3 | "@type": "Code", 4 | "author": [ 5 | { 6 | "@id": "0000-0002-1291-9896", 7 | "@type": "Person", 8 | "email": "boris.shminke@univ-cotedazur.fr", 9 | "name": "Boris Shminke", 10 | "affiliation": "Laboratoire J.A. 
Dieudonné, CNRS and Université Côte d'Azur, France" 11 | } 12 | ], 13 | "identifier": "", 14 | "codeRepository": "https://github.com/inpefess/gym-saturation", 15 | "datePublished": "2021-10-01", 16 | "dateModified": "2021-10-01", 17 | "dateCreated": "2021-10-01", 18 | "description": "Gymnasium environments for saturation provers", 19 | "keywords": "Gymnasium, reinforcement learning, automated theorem prover, saturation prover", 20 | "license": "Apache 2.0", 21 | "title": "gym-saturation", 22 | "version": "v0.1.4" 23 | } 24 | -------------------------------------------------------------------------------- /doc/Makefile: -------------------------------------------------------------------------------- 1 | # Minimal makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line, and also 5 | # from the environment for the first two. 6 | SPHINXOPTS ?= 7 | SPHINXBUILD ?= sphinx-build 8 | SOURCEDIR = . 9 | BUILDDIR = _build 10 | 11 | # Put it first so that "make" without argument is like "make help". 12 | help: 13 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 14 | 15 | .PHONY: help Makefile 16 | 17 | # Catch-all target: route all unknown targets to Sphinx using the new 18 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 19 | %: Makefile 20 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 21 | -------------------------------------------------------------------------------- /doc/api/environments.rst: -------------------------------------------------------------------------------- 1 | Environments 2 | ************* 3 | .. automodule:: gym_saturation.envs.saturation_env 4 | :members: 5 | .. automodule:: gym_saturation.envs.vampire_env 6 | :members: 7 | .. 
automodule:: gym_saturation.envs.iprover_env 8 | :members: 9 | -------------------------------------------------------------------------------- /doc/api/utils.rst: -------------------------------------------------------------------------------- 1 | .. 2 | Copyright 2021-2025 Boris Shminke 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | https://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | 16 | Utils 17 | ****** 18 | .. automodule:: gym_saturation.vampire_wrapper 19 | :members: 20 | .. automodule:: gym_saturation.relay_server 21 | :members: 22 | .. automodule:: gym_saturation.constants 23 | :members: 24 | -------------------------------------------------------------------------------- /doc/api/wrappers.rst: -------------------------------------------------------------------------------- 1 | .. 2 | Copyright 2023-2025 Boris Shminke 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | https://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | 16 | Wrappers 17 | ********* 18 | .. 
automodule:: gym_saturation.wrappers.labels_extractor 19 | :members: 20 | -------------------------------------------------------------------------------- /doc/basic-usage.rst: -------------------------------------------------------------------------------- 1 | .. 2 | Copyright 2023-2025 Boris Shminke 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | https://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | 16 | ############ 17 | Basic Usage 18 | ############ 19 | 20 | Initialising Environments 21 | ************************** 22 | 23 | Using environments from ``gym-saturation`` is very similar to using them in `Gymnasium `__. You initialise an environment via: 24 | 25 | .. code:: python 26 | 27 | import gym_saturation 28 | import gymnasium 29 | 30 | env = gymnasium.make("Vampire-v0") 31 | 32 | Additional Environment API 33 | *************************** 34 | 35 | There are two additional methods to each ``gym-saturation`` environment: 36 | 37 | * ``set_task`` --- to specify a filename of a `TPTP `__ problem to solve (like in `Meta-World `__ multi-task environments) 38 | * ``get_task`` --- to look up a filename of a TPTP problem being solved (like in ``TaskSettableEnv`` in `Ray RLlib `__) 39 | -------------------------------------------------------------------------------- /doc/code-of-conduct.rst: -------------------------------------------------------------------------------- 1 | .. _code-of-conduct: 2 | 3 | .. 
include:: ../CODE_OF_CONDUCT.rst 4 | -------------------------------------------------------------------------------- /doc/conf.py: -------------------------------------------------------------------------------- 1 | # type: ignore 2 | # Copyright 2021-2025 Boris Shminke 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # https://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | """Sphinx config.""" 17 | 18 | import gym_saturation 19 | from importlib.metadata import distribution 20 | 21 | distribution_metadata = distribution(gym_saturation.__name__).metadata 22 | project = distribution_metadata["Name"] 23 | version = distribution_metadata["Version"] 24 | author = distribution_metadata["Author"] 25 | copyright = f"2021-2025, {author}" 26 | extensions = [ 27 | "sphinx.ext.autodoc", 28 | "sphinx.ext.coverage", 29 | ] 30 | html_theme = "furo" 31 | html_title = f"{project} documentation" 32 | -------------------------------------------------------------------------------- /doc/contributing.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../CONTRIBUTING.rst 2 | -------------------------------------------------------------------------------- /doc/development-guide.rst: -------------------------------------------------------------------------------- 1 | .. _development-guide: 2 | 3 | ================= 4 | Development Guide 5 | ================= 6 | 7 | Get Started! 8 | ------------ 9 | 10 | Ready to contribute? 
Here's how to set up `gym-saturation` for local 11 | development. Please note this documentation assumes you already have 12 | `Git 13 | `__ 14 | installed and ready to go. 15 | 16 | #. `Fork `__ the 17 | `gym-saturation` repository on GitHub. 18 | 19 | #. Clone your fork locally: 20 | 21 | .. code:: sh 22 | 23 | cd git_URL 24 | git clone git@github.com:YOUR_NAME/gym-saturation.git 25 | 26 | #. Install 27 | `poetry <https://python-poetry.org/docs/#installing-with-the-official-installer>`__ 28 | 29 | #. Now you can install all the things you need for development: 30 | 31 | .. code:: bash 32 | 33 | poetry install --all-groups 34 | # install Vampire binary 35 | wget https://github.com/vprover/vampire/releases/download/v4.7/vampire4.7.zip -O vampire.zip 36 | unzip vampire.zip 37 | # then use vampire_z3_rel_static_HEAD_6295 as an argument or add it to $PATH 38 | # install iProver binary 39 | wget "https://gitlab.com/api/v4/projects/39846772/jobs/artifacts/2023.04.10/download?job=build-job" -O iprover.zip 40 | unzip iprover.zip 41 | # then use iproveropt 42 | 43 | #. `poetry` will also create a virtual environment in a `.venv` 44 | subfolder. To activate it: 45 | 46 | .. code:: bash 47 | 48 | poetry env activate 49 | 50 | #. Create a branch for local development: 51 | 52 | .. code:: bash 53 | 54 | git checkout -b name-of-your-bug-fix-or-feature 55 | 56 | Now you can make your changes locally. 57 | 58 | #. When you're done making changes, check that your changes pass code 59 | quality checks. 60 | 61 | .. code:: bash 62 | 63 | ruff format 64 | ruff check 65 | pydoclint gym_saturation 66 | pyrefly check 67 | 68 | #. You can also do these checks automatically on each commit. To 69 | activate this option: 70 | 71 | .. code:: bash 72 | 73 | pre-commit install 74 | 75 | #. The next step would be to run the test cases. `gym-saturation` 76 | uses pytest and all the existing tests are `doctest 77 | `__. 78 | 79 | .. 
code:: bash 80 | 81 | coverage run -m pytest 82 | coverage report -m 83 | 84 | #. If your contribution is a bug fix or new feature, you may want to 85 | add a test to the existing test suite. If possible, write it as a 86 | doctest, not a dedicated test case file. 87 | 88 | #. Commit your changes and push your branch to GitHub: 89 | 90 | .. code:: bash 91 | 92 | git add . 93 | git commit -m "Your detailed description of your changes." 94 | git push origin name-of-your-bug-fix-or-feature 95 | 96 | #. Submit a pull request through the GitHub website. 97 | 98 | 99 | Pull Request Guidelines 100 | ----------------------- 101 | 102 | Before you submit a pull request, check that it meets these 103 | guidelines: 104 | 105 | #. The pull request should include tests. 106 | 107 | #. If the pull request adds functionality, the docs should be 108 | updated. Put your new functionality into a function with a 109 | docstring, and add new classes or functions to a relevant file in 110 | the `doc/api` folder. To build the doc locally: 111 | 112 | .. code:: bash 113 | 114 | cd doc 115 | make html 116 | 117 | #. The pull request should work for Python 3.10, 3.11, 3.12, and 118 | 3.13. Check https://github.com/inpefess/gym-saturation/pulls and 119 | make sure that the CI checks pass for all supported Python 120 | versions. 121 | -------------------------------------------------------------------------------- /doc/environments/saturation-env.rst: -------------------------------------------------------------------------------- 1 | .. 2 | Copyright 2023-2025 Boris Shminke 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 
6 | You may obtain a copy of the License at 7 | 8 | https://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | 16 | .. _saturation_env: 17 | 18 | ############## 19 | SaturationEnv 20 | ############## 21 | 22 | ``SaturationEnv`` is an abstract class for environments guiding the choice of a given clause in the saturation algorithm used to build automated theorem provers. It has two subclasses: 23 | 24 | * ``VampireEnv`` is an environment for guiding the choice of a given clause in the saturation loop of a `Vampire `__ prover. Since we focus on guiding the saturation loop here, we don't use the Avatar [3]_ 25 | * ``IProverEnv`` is an environment for guiding the choice of a given clause in the saturation loop of the `iProver `__ [1]_ 26 | 27 | .. csv-table:: 28 | 29 | Action Space, "``Text(256, charset=ALPHANUMERIC_WITH_UNDERSCORE)``" 30 | Observation Space, "``Sequence(Text(4000, charset=EXTENDED_ALPHANUMERIC))``" 31 | import, ``import gym_saturation; gymnasium.make("Vampire-v0")`` 32 | import, ``import gym_saturation; gymnasium.make("iProver-v0")`` 33 | 34 | ``EXTENDED_ALPHANUMERIC`` should cover characters used by the `TPTP 35 | `__ language. 36 | 37 | Description 38 | ************ 39 | 40 | The given clause (or saturation) algorithm is the basis of many contemporary provers. See [2]_ for an excellent introduction and Python code snippets. In short, the algorithm is the following one: 41 | 42 | .. 
code-block:: python 43 | 44 | unprocessed_clauses: list[Clause] = get_preprocessed_theorem_statement() 45 | processed_clauses: list[Clause] = [] 46 | while EMPTY_CLAUSE not in unprocessed_clauses and unprocessed_clauses: 47 | given_clause: Clause = select_given_clause(unprocessed_clauses) 48 | processed_clauses.append(given_clause) 49 | new_clauses: list[Clause] = apply_inference_rules( 50 | given_clause, processed_clauses 51 | ) 52 | unprocessed_clauses.extend(new_clauses) 53 | unprocessed_clauses.remove(given_clause) 54 | 55 | ``get_preprocessed_theorem_statement`` corresponds to the environment reset, and typically includes parsing, `Skolemization `__, and transformation to `conjunctive normal form `__, among other things. 56 | 57 | ``apply_inference_rules`` is the 'logic engine' of a prover and corresponds to what happens during the environment's ``step``. It usually includes `resolution `__, `superposition `__ and other inference rules. For the algorithm to be applicable, the system of inference rules must be `refutation complete `__. 58 | 59 | ``select_given_clause`` is a trainable agent policy. 60 | 61 | If the ``EMPTY_CLAUSE`` (aka falsehood or contradiction) appears among the ``unprocessed_clauses``, it means we arrived at a refutation of the original set of clauses, which gives us a proof by contradiction for our theorem. If ``unprocessed_clauses`` becomes empty, the clause set is saturated without a contradiction, which means there is a counter-example for our proposition. 62 | 63 | Action Space 64 | ************* 65 | 66 | An action is the label of a given clause. Different provers use different labelling systems. 67 | 68 | Observation Space 69 | ****************** 70 | 71 | An observation is a tuple of strings containing clause literals in the TPTP syntax, e.g. 
``'mult(X, mult(Y, Z)) = mult(mult(X, Y), Z)'`` for each clause belonging to ``unprocessed_clauses`` and ``processed_clauses`` 72 | 73 | Starting State 74 | *************** 75 | 76 | A starting state of the environment depends on a task set (a theorem to prove). If there are ``N`` unprocessed clauses in the pre-processed theorem statement, the starting state contains ``N`` strings. 77 | 78 | By default, the task is a simple theorem from group theory: 79 | 80 | .. include:: ../../gym_saturation/resources/TPTP-mock/Problems/TST/TST001-1.p 81 | :literal: 82 | 83 | One can set another task by specifying a filename of a respective TPTP problem: 84 | 85 | .. code:: python 86 | 87 | env.set_task(filename) 88 | 89 | Rewards 90 | ******** 91 | 92 | Reward is always ``0.0``. One has to use `reward wrappers 93 | `__ 94 | according to their training strategy. 95 | 96 | Episode End 97 | ************ 98 | 99 | * Termination means the saturation algorithm ended with refutation 100 | found or satisfiability established. 101 | * There is no default truncation condition (one can add it using 102 | wrappers, e.g. `TimeLimit 103 | `__) 104 | 105 | Information 106 | ************ 107 | 108 | The environment returns no additional information. 109 | 110 | Arguments 111 | ********** 112 | 113 | .. code-block:: python 114 | 115 | import gymnasium 116 | 117 | gymnasium.make( 118 | "Vampire-v0", # or "iProver-v0" 119 | prover_binary_path="vampire", # or "iproveropt" 120 | ) 121 | 122 | ``prover_binary_path="vampire"`` (or ``"iproveropt"``): the path to a prover binary (supposed to be on the ``$PATH`` by default) 123 | 124 | References 125 | *********** 126 | 127 | .. [1] Duarte, A., Korovin, K. (2020). Implementing Superposition in iProver (System Description). In: Peltier, N., Sofronie-Stokkermans, V. (eds) Automated Reasoning. IJCAR 2020. Lecture Notes in Computer Science(), vol 12167. Springer, Cham. ``__ 128 | 129 | .. [2] Schulz, S., Pease, A. (2020). 
Teaching Automated Theorem Proving by Example: PyRes 1.2. In: Peltier, N., Sofronie-Stokkermans, V. (eds) Automated Reasoning. IJCAR 2020. Lecture Notes in Computer Science(), vol 12167. Springer, Cham. ``__ 130 | 131 | .. [3] Voronkov, A. (2014). AVATAR: The Architecture for First-Order Theorem Provers. In: Biere, A., Bloem, R. (eds) Computer Aided Verification. CAV 2014. Lecture Notes in Computer Science, vol 8559. Springer, Cham. ``__ 132 | 133 | Version History 134 | **************** 135 | 136 | * v0: Initial version release 137 | -------------------------------------------------------------------------------- /doc/example.org: -------------------------------------------------------------------------------- 1 | * Random and age agents for Vampire and iProver 2 | ** Random agent for Vampire 3 | We can make a prover environment as any other Gymnasium one. 4 | 5 | We will also add a wrapper to extract formulae labels. 6 | 7 | #+begin_src python 8 | import gymnasium as gym 9 | 10 | from gym_saturation.wrappers import LabelsExtractor 11 | 12 | env = LabelsExtractor(gym.make("Vampire-v0")) 13 | #+end_src 14 | 15 | #+RESULTS: 16 | 17 | before using the environment, we should reset it 18 | 19 | #+begin_src python 20 | observation, info = env.reset() 21 | #+end_src 22 | 23 | #+RESULTS: 24 | 25 | ~gym-saturation~ environments don't return any ~info~ 26 | 27 | #+begin_src python 28 | print(info) 29 | #+end_src 30 | 31 | #+RESULTS: 32 | : {} 33 | 34 | Observation is a tuple of CNF formulae. 35 | 36 | By default, we are trying to prove a basic group theory lemma: every idempotent element equals the identity. 37 | 38 | #+begin_src python 39 | print("Observation:") 40 | print("\n".join(observation["observation"])) 41 | #+end_src 42 | 43 | #+RESULTS: 44 | : Observation: 45 | : cnf(c_1,axiom,mult(X0,mult(X1,X2))=mult(mult(X0,X1),X2),file('input.p')). 46 | : cnf(c_2,axiom,mult(e,X0)=X0,file('input.p')). 47 | : cnf(c_3,axiom,e=mult(inv(X0),X0),file('input.p')). 
48 | : cnf(c_4,axiom,a=mult(a,a),file('input.p')). 49 | : cnf(c_5,axiom,e!=a,file('input.p')). 50 | 51 | The wrapper extracts formula labels for us: 52 | 53 | #+begin_src python 54 | labels = list(observation["labels"]) 55 | print(labels) 56 | #+end_src 57 | 58 | #+RESULTS: 59 | : ['c_1', 'c_2', 'c_3', 'c_4', 'c_5'] 60 | 61 | Here is an example of an episode during which we play random actions. 62 | We set the random seed for reproducibility. 63 | 64 | #+begin_src python 65 | import random 66 | 67 | random.seed(0) 68 | 69 | terminated, truncated = False, False 70 | while not (terminated or truncated): 71 | action = random.choice(labels) 72 | observation, reward, terminated, truncated, info = env.step(action) 73 | print("Action:", action, "Observation:") 74 | print("\n".join(observation["observation"])) 75 | labels.remove(action) 76 | labels += list(observation["labels"]) 77 | 78 | env.close() 79 | #+end_src 80 | 81 | #+RESULTS: 82 | #+begin_example 83 | Action: c_4 Observation: 84 | 85 | Action: c_5 Observation: 86 | 87 | Action: c_1 Observation: 88 | cnf(c_6,plain,mult(a,X0)=mult(a,mult(a,X0)),inference(superposition,[],[c_1,c_4])). 89 | Action: c_3 Observation: 90 | cnf(c_11,plain,mult(inv(X0),mult(X0,X1))=X1,inference(forward_demodulation,[],[c_10,c_2])). 91 | Action: c_11 Observation: 92 | cnf(c_18,plain,$false,inference(subsumption_resolution,[],[c_17,c_5])). 93 | #+end_example 94 | 95 | The episode is terminated: 96 | 97 | #+begin_src python 98 | print(terminated, truncated) 99 | #+end_src 100 | 101 | #+RESULTS: 102 | : True False 103 | 104 | It means we arrived at a contradiction (~$false~), which proves the lemma. 105 | 106 | #+begin_src python 107 | print(observation["observation"][-1]) 108 | #+end_src 109 | 110 | #+RESULTS: 111 | : cnf(c_18,plain,$false,inference(subsumption_resolution,[],[c_17,c_5])). 
112 | 113 | ** Age agent for iProver 114 | We initialise iProver-based environment in the same way as Vampire-based one: 115 | 116 | #+begin_src python 117 | env = LabelsExtractor(gym.make("iProver-v0")) 118 | #+end_src 119 | 120 | #+RESULTS: 121 | 122 | Instead of a random agent, let's use Age agent which selects actions in the order they appear 123 | 124 | #+begin_src python 125 | observation, info = env.reset() 126 | print("Observation:") 127 | print("\n".join(observation["observation"])) 128 | labels = list(observation["labels"]) 129 | terminated = False 130 | while not terminated: 131 | action = labels.pop(0) 132 | observation, reward, terminated, truncated, info = env.step(action) 133 | print("Action:", action, "Observation:") 134 | print("\n".join(observation["observation"])) 135 | labels += list(observation["labels"]) 136 | env.close() 137 | #+end_src 138 | 139 | #+RESULTS: 140 | #+begin_example 141 | Observation: 142 | cnf(c_53,axiom,e!=a,file('input.p')). 143 | cnf(c_52,axiom,mult(a,a)=a,file('input.p')). 144 | cnf(c_50,axiom,mult(e,X0)=X0,file('input.p')). 145 | cnf(c_51,axiom,mult(inv(X0),X0)=e,file('input.p')). 146 | cnf(c_49,axiom,mult(mult(X0,X1),X2)=mult(X0,mult(X1,X2)),file('input.p')). 147 | Action: c_53 Observation: 148 | 149 | Action: c_52 Observation: 150 | 151 | Action: c_50 Observation: 152 | 153 | Action: c_51 Observation: 154 | 155 | Action: c_49 Observation: 156 | cnf(c_63,plain,mult(a,mult(a,X0))=mult(a,X0),inference(superposition,[],[c_52,c_49])). 157 | cnf(c_62,plain,mult(inv(X0),mult(X0,X1))=mult(e,X1),inference(superposition,[],[c_51,c_49])). 158 | cnf(c_64,plain,mult(mult(X0,mult(X1,X2)),X3)=mult(mult(X0,X1),mult(X2,X3)),inference(superposition,[],[c_49,c_49])). 159 | Action: c_63 Observation: 160 | cnf(c_68,plain,mult(a,mult(mult(a,X0),X1))=mult(mult(a,X0),X1),inference(superposition,[],[c_63,c_49])). 161 | Action: c_62 Observation: 162 | cnf(c_70,plain,mult(inv(X0),mult(X0,X1))=X1,inference(demodulation,[],[c_62,c_50])). 
163 | cnf(c_74,plain,mult(inv(a),a)=a,inference(superposition,[],[c_52,c_70])). 164 | cnf(c_72,plain,mult(inv(e),X0)=X0,inference(superposition,[],[c_50,c_70])). 165 | cnf(c_73,plain,mult(inv(inv(X0)),e)=X0,inference(superposition,[],[c_51,c_70])). 166 | cnf(c_77,plain,mult(inv(inv(X0)),X1)=mult(X0,X1),inference(superposition,[],[c_70,c_70])). 167 | cnf(c_76,plain,mult(inv(a),mult(a,X0))=mult(a,X0),inference(superposition,[],[c_63,c_70])). 168 | cnf(c_78,plain,mult(inv(X0),mult(mult(X0,X1),X2))=mult(X1,X2),inference(superposition,[],[c_70,c_49])). 169 | cnf(c_71,plain,mult(inv(mult(X0,X1)),mult(X0,mult(X1,X2)))=X2,inference(superposition,[],[c_49,c_70])). 170 | Action: c_64 Observation: 171 | 172 | Action: c_68 Observation: 173 | 174 | Action: c_70 Observation: 175 | 176 | Action: c_74 Observation: 177 | cnf(c_85,plain,e=a,inference(demodulation,[],[c_74,c_51])). 178 | cnf(c_86,plain,$false,inference(forward_subsumption_resolution,[],[c_85,c_53])). 179 | #+end_example 180 | 181 | We still arrive at a contradiction 182 | 183 | #+begin_src python 184 | print(terminated, truncated) 185 | print(observation["observation"][-1]) 186 | #+end_src 187 | 188 | #+RESULTS: 189 | : True False 190 | : cnf(c_86,plain,$false,inference(forward_subsumption_resolution,[],[c_85,c_53])). 191 | -------------------------------------------------------------------------------- /doc/example.py: -------------------------------------------------------------------------------- 1 | # Random agent for Vampire 2 | # We can make a prover environment as any other Gymnasium one. 3 | 4 | # We will also add a wrapper to extract formulae labels. 
5 | 6 | 7 | import gymnasium as gym 8 | 9 | from gym_saturation.wrappers import LabelsExtractor 10 | 11 | env = LabelsExtractor(gym.make("Vampire-v0")) 12 | 13 | 14 | # Before using the environment, we should reset it. 15 | 16 | 17 | observation, info = env.reset() 18 | 19 | 20 | # ~gym-saturation~ environments don't return any ~info~ 21 | 22 | 23 | print(info) 24 | 25 | 26 | # Observation is a tuple of CNF formulae. 27 | 28 | # By default, we are trying to prove a basic group theory lemma: every idempotent element equals the identity. 29 | 30 | 31 | print("Observation:") 32 | print("\n".join(observation["observation"])) 33 | 34 | 35 | # The wrapper extracts formula labels for us: 36 | 37 | 38 | labels = list(observation["labels"]) 39 | print(labels) 40 | 41 | 42 | # Here is an example of an episode during which we play random actions. 43 | # We set the random seed for reproducibility. 44 | 45 | 46 | import random 47 | 48 | random.seed(0) 49 | 50 | terminated, truncated = False, False 51 | while not (terminated or truncated): 52 | action = random.choice(labels) 53 | observation, reward, terminated, truncated, info = env.step(action) 54 | print("Action:", action, "Observation:") 55 | print("\n".join(observation["observation"])) 56 | labels.remove(action) 57 | labels += list(observation["labels"]) 58 | 59 | env.close() 60 | 61 | 62 | # The episode is terminated: 63 | 64 | 65 | print(terminated, truncated) 66 | 67 | 68 | # It means we arrived at a contradiction (~$false~), which proves the lemma. 
69 | 70 | 71 | print(observation["observation"][-1]) 72 | # Age agent for iProver 73 | # We initialise iProver-based environment in the same way as Vampire-based one: 74 | 75 | 76 | env = LabelsExtractor(gym.make("iProver-v0")) 77 | 78 | 79 | # To run in Jupyter 80 | 81 | import nest_asyncio 82 | 83 | nest_asyncio.apply() 84 | 85 | 86 | # Instead of a random agent, let's use Age agent which selects actions in the order they appear 87 | 88 | 89 | observation, info = env.reset() 90 | print("Observation:") 91 | print("\n".join(observation["observation"])) 92 | labels = list(observation["labels"]) 93 | terminated = False 94 | while not terminated: 95 | action = labels.pop(0) 96 | observation, reward, terminated, truncated, info = env.step(action) 97 | print("Action:", action, "Observation:") 98 | print("\n".join(observation["observation"])) 99 | labels += list(observation["labels"]) 100 | env.close() 101 | 102 | 103 | # We still arrive at a contradiction 104 | 105 | 106 | print(terminated, truncated) 107 | print(observation["observation"][-1]) 108 | -------------------------------------------------------------------------------- /doc/example.rst: -------------------------------------------------------------------------------- 1 | Random and age agents for Vampire and iProver 2 | ---------------------------------------------- 3 | 4 | Random agent for Vampire 5 | ~~~~~~~~~~~~~~~~~~~~~~~~~ 6 | 7 | We can make a prover environment as any other Gymnasium one. 8 | 9 | We will also add a wrapper to extract formulae labels. 10 | 11 | .. code:: python 12 | 13 | import gymnasium as gym 14 | 15 | from gym_saturation.wrappers import LabelsExtractor 16 | 17 | env = LabelsExtractor(gym.make("Vampire-v0")) 18 | 19 | before using the environment, we should reset it 20 | 21 | .. code:: python 22 | 23 | observation, info = env.reset() 24 | 25 | ``gym-saturation`` environments don't return any ``info`` 26 | 27 | .. 
code:: python 28 | 29 | print(info) 30 | 31 | :: 32 | 33 | {} 34 | 35 | 36 | Observation is a tuple of CNF formulae. 37 | 38 | By default, we are trying to prove a basic group theory lemma: every idempotent element equals the identity. 39 | 40 | .. code:: python 41 | 42 | print("Observation:") 43 | print("\n".join(observation["observation"])) 44 | 45 | :: 46 | 47 | Observation: 48 | cnf(c_1,axiom,mult(X0,mult(X1,X2))=mult(mult(X0,X1),X2),file('input.p')). 49 | cnf(c_2,axiom,mult(e,X0)=X0,file('input.p')). 50 | cnf(c_3,axiom,e=mult(inv(X0),X0),file('input.p')). 51 | cnf(c_4,axiom,a=mult(a,a),file('input.p')). 52 | cnf(c_5,axiom,e!=a,file('input.p')). 53 | 54 | 55 | The wrapper extracts formula labels for us: 56 | 57 | .. code:: python 58 | 59 | labels = list(observation["labels"]) 60 | print(labels) 61 | 62 | :: 63 | 64 | ['c_1', 'c_2', 'c_3', 'c_4', 'c_5'] 65 | 66 | 67 | Here is an example of an episode during which we play random actions. 68 | We set the random seed for reproducibility. 69 | 70 | .. code:: python 71 | 72 | import random 73 | 74 | random.seed(0) 75 | 76 | terminated, truncated = False, False 77 | while not (terminated or truncated): 78 | action = random.choice(labels) 79 | observation, reward, terminated, truncated, info = env.step(action) 80 | print("Action:", action, "Observation:") 81 | print("\n".join(observation["observation"])) 82 | labels.remove(action) 83 | labels += list(observation["labels"]) 84 | 85 | env.close() 86 | 87 | :: 88 | 89 | Action: c_4 Observation: 90 | 91 | Action: c_5 Observation: 92 | 93 | Action: c_1 Observation: 94 | cnf(c_6,plain,mult(a,X0)=mult(a,mult(a,X0)),inference(superposition,[],[c_1,c_4])). 95 | Action: c_3 Observation: 96 | cnf(c_11,plain,mult(inv(X0),mult(X0,X1))=X1,inference(forward_demodulation,[],[c_10,c_2])). 97 | Action: c_11 Observation: 98 | cnf(c_18,plain,$false,inference(subsumption_resolution,[],[c_17,c_5])). 99 | 100 | The episode is terminated: 101 | 102 | .. 
code:: python 103 | 104 | print(terminated, truncated) 105 | 106 | :: 107 | 108 | True False 109 | 110 | 111 | It means we arrived at a contradiction (``$false``) which proves the lemma. 112 | 113 | .. code:: python 114 | 115 | print(observation["observation"][-1]) 116 | 117 | :: 118 | 119 | cnf(c_18,plain,$false,inference(subsumption_resolution,[],[c_17,c_5])). 120 | 121 | Age agent for iProver 122 | ~~~~~~~~~~~~~~~~~~~~~~ 123 | 124 | We initialise iProver-based environment in the same way as Vampire-based one: 125 | 126 | .. code:: python 127 | 128 | env = LabelsExtractor(gym.make("iProver-v0")) 129 | 130 | Instead of a random agent, let's use Age agent which selects actions in the order they appear 131 | 132 | .. code:: python 133 | 134 | observation, info = env.reset() 135 | print("Observation:") 136 | print("\n".join(observation["observation"])) 137 | labels = list(observation["labels"]) 138 | terminated = False 139 | while not terminated: 140 | action = labels.pop(0) 141 | observation, reward, terminated, truncated, info = env.step(action) 142 | print("Action:", action, "Observation:") 143 | print("\n".join(observation["observation"])) 144 | labels += list(observation["labels"]) 145 | env.close() 146 | 147 | :: 148 | 149 | Observation: 150 | cnf(c_53,axiom,e!=a,file('input.p')). 151 | cnf(c_52,axiom,mult(a,a)=a,file('input.p')). 152 | cnf(c_50,axiom,mult(e,X0)=X0,file('input.p')). 153 | cnf(c_51,axiom,mult(inv(X0),X0)=e,file('input.p')). 154 | cnf(c_49,axiom,mult(mult(X0,X1),X2)=mult(X0,mult(X1,X2)),file('input.p')). 155 | Action: c_53 Observation: 156 | 157 | Action: c_52 Observation: 158 | 159 | Action: c_50 Observation: 160 | 161 | Action: c_51 Observation: 162 | 163 | Action: c_49 Observation: 164 | cnf(c_63,plain,mult(a,mult(a,X0))=mult(a,X0),inference(superposition,[],[c_52,c_49])). 165 | cnf(c_62,plain,mult(inv(X0),mult(X0,X1))=mult(e,X1),inference(superposition,[],[c_51,c_49])). 
166 | cnf(c_64,plain,mult(mult(X0,mult(X1,X2)),X3)=mult(mult(X0,X1),mult(X2,X3)),inference(superposition,[],[c_49,c_49])). 167 | Action: c_63 Observation: 168 | cnf(c_68,plain,mult(a,mult(mult(a,X0),X1))=mult(mult(a,X0),X1),inference(superposition,[],[c_63,c_49])). 169 | Action: c_62 Observation: 170 | cnf(c_70,plain,mult(inv(X0),mult(X0,X1))=X1,inference(demodulation,[],[c_62,c_50])). 171 | cnf(c_74,plain,mult(inv(a),a)=a,inference(superposition,[],[c_52,c_70])). 172 | cnf(c_72,plain,mult(inv(e),X0)=X0,inference(superposition,[],[c_50,c_70])). 173 | cnf(c_73,plain,mult(inv(inv(X0)),e)=X0,inference(superposition,[],[c_51,c_70])). 174 | cnf(c_77,plain,mult(inv(inv(X0)),X1)=mult(X0,X1),inference(superposition,[],[c_70,c_70])). 175 | cnf(c_76,plain,mult(inv(a),mult(a,X0))=mult(a,X0),inference(superposition,[],[c_63,c_70])). 176 | cnf(c_78,plain,mult(inv(X0),mult(mult(X0,X1),X2))=mult(X1,X2),inference(superposition,[],[c_70,c_49])). 177 | cnf(c_71,plain,mult(inv(mult(X0,X1)),mult(X0,mult(X1,X2)))=X2,inference(superposition,[],[c_49,c_70])). 178 | Action: c_64 Observation: 179 | 180 | Action: c_68 Observation: 181 | 182 | Action: c_70 Observation: 183 | 184 | Action: c_74 Observation: 185 | cnf(c_85,plain,e=a,inference(demodulation,[],[c_74,c_51])). 186 | cnf(c_86,plain,$false,inference(forward_subsumption_resolution,[],[c_85,c_53])). 187 | 188 | We still arrive at a contradiction 189 | 190 | .. code:: python 191 | 192 | print(terminated, truncated) 193 | print(observation["observation"][-1]) 194 | 195 | :: 196 | 197 | True False 198 | cnf(c_86,plain,$false,inference(forward_subsumption_resolution,[],[c_85,c_53])). 199 | -------------------------------------------------------------------------------- /doc/index.rst: -------------------------------------------------------------------------------- 1 | .. 
2 | Copyright 2021-2025 Boris Shminke 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | https://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | 16 | .. include:: ../README.rst 17 | 18 | .. toctree:: 19 | :hidden: 20 | :caption: Introduction 21 | 22 | basic-usage 23 | 24 | .. toctree:: 25 | :hidden: 26 | :caption: Environments 27 | 28 | environments/saturation-env 29 | 30 | .. toctree:: 31 | :hidden: 32 | :caption: Tutorials 33 | 34 | example 35 | 36 | .. toctree:: 37 | :hidden: 38 | :caption: API 39 | 40 | api/environments 41 | api/wrappers 42 | api/utils 43 | 44 | .. toctree:: 45 | :hidden: 46 | :caption: Development 47 | 48 | code-of-conduct 49 | contributing 50 | development-guide 51 | GitHub 52 | -------------------------------------------------------------------------------- /doc/requirements.txt: -------------------------------------------------------------------------------- 1 | gymnasium 2 | pexpect 3 | sphinx-autodoc-typehints 4 | furo 5 | -e . 6 | -------------------------------------------------------------------------------- /gym_saturation/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2021-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | """ 15 | There are two environments in this module. 16 | 17 | They are registered using a limit for the number of steps in an episode and the 18 | maximal possible reward is set to ``1.0`` (proof is found). 19 | """ 20 | 21 | from gymnasium.envs.registration import register 22 | 23 | register(id="Vampire-v0", entry_point="gym_saturation.envs:VampireEnv") 24 | register(id="iProver-v0", entry_point="gym_saturation.envs:IProverEnv") 25 | -------------------------------------------------------------------------------- /gym_saturation/constants.py: -------------------------------------------------------------------------------- 1 | # Copyright 2023-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | 15 | """ 16 | Constants used throughout the package 17 | ====================================== 18 | """ # noqa: D205, D400 19 | 20 | import os 21 | from importlib.resources import files 22 | 23 | CLAUSE_EMBEDDINGS = "clause_embeddings" 24 | FALSEHOOD_SYMBOL = "$false" 25 | MOCK_TPTP_FOLDER = str( 26 | files("gym_saturation").joinpath(os.path.join("resources", "TPTP-mock")) 27 | ) 28 | MOCK_TPTP_PROBLEM = os.path.join( 29 | MOCK_TPTP_FOLDER, "Problems", "TST", "TST001-1.p" 30 | ) 31 | -------------------------------------------------------------------------------- /gym_saturation/envs/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2021-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | 15 | """ 16 | Environment for Saturation Provers 17 | =================================== 18 | """ # noqa: D205, D400 19 | 20 | from gym_saturation.envs.iprover_env import IProverEnv 21 | from gym_saturation.envs.saturation_env import SaturationEnv 22 | from gym_saturation.envs.vampire_env import VampireEnv 23 | 24 | __all__ = ["IProverEnv", "SaturationEnv", "VampireEnv"] 25 | -------------------------------------------------------------------------------- /gym_saturation/envs/iprover_env.py: -------------------------------------------------------------------------------- 1 | # Copyright 2022-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | 15 | """ 16 | Saturation Environment with iProver back-end 17 | ============================================ 18 | """ # noqa: D205, D400 19 | 20 | import asyncio 21 | import json 22 | import os 23 | import re 24 | from threading import Thread 25 | from typing import Any 26 | 27 | from gym_saturation.envs.saturation_env import SaturationEnv 28 | from gym_saturation.relay_server import ( 29 | QUERY_END_MESSAGE, 30 | SESSION_END_MESSAGE, 31 | RelayServer, 32 | RelayTCPHandler, 33 | ) 34 | 35 | 36 | def _iprover_start( 37 | iprover_port: int, problem_filename: str, prover_binary_path: str 38 | ) -> asyncio.subprocess.Process: 39 | tptp_folder = os.path.join(os.path.dirname(problem_filename), "..", "..") 40 | command = " ".join( 41 | [ 42 | prover_binary_path, 43 | "--interactive_mode", 44 | "true", 45 | "--external_ip_address", 46 | "127.0.0.1", 47 | "--external_port", 48 | str(iprover_port), 49 | "--schedule", 50 | "none", 51 | "--resolution_flag", 52 | "false", 53 | "--instantiation_flag", 54 | "false", 55 | "--superposition_flag", 56 | "true", 57 | "--sup_iter_deepening", 58 | "0", 59 | "--sup_passive_queue_type", 60 | "external_agent", 61 | "--preprocessing_flag", 62 | "false", 63 | "--include_path", 64 | tptp_folder, 65 | problem_filename, 66 | ] 67 | ) 68 | return asyncio.run( 69 | asyncio.create_subprocess_shell( 70 | command, 71 | stdout=asyncio.subprocess.DEVNULL, 72 | stderr=asyncio.subprocess.DEVNULL, 73 | ) 74 | ) 75 | 76 | 77 | class IProverEnv(SaturationEnv): 78 | """ 79 | An RL environment around iProver. 80 | 81 | :param prover_binary_path: a path to iProver binary; 82 | by default, we assume it to be ``iproveropt`` and in the $PATH 83 | 84 | Refer to :ref:`saturation_env` for more documentation. 85 | 86 | >>> from gymnasium.utils.env_checker import check_env 87 | >>> import gymnasium as gym 88 | >>> env = gym.make( 89 | ... "iProver-v0", 90 | ... ).unwrapped 91 | >>> env.relay_server 92 | Traceback (most recent call last): 93 | ... 
94 | ValueError: run ``reset`` first! 95 | >>> check_env(env) 96 | 97 | episode is never truncated by default 98 | 99 | >>> _ = env.reset() 100 | >>> _, _, _, truncated, _ = env.step("c_49") 101 | >>> truncated 102 | False 103 | """ 104 | 105 | def __init__( # noqa: D107 106 | self, 107 | prover_binary_path: str = "iproveropt", 108 | ): 109 | super().__init__() 110 | self.prover_binary_path = prover_binary_path 111 | self._relay_server: RelayServer | None = None 112 | self.relay_server_thread: Thread | None = None 113 | self.iprover_process: asyncio.subprocess.Process | None = None 114 | 115 | def _restart_relay_server(self) -> None: 116 | if self._relay_server: 117 | self._terminate_threads() 118 | self._relay_server = RelayServer(("localhost", 0), RelayTCPHandler) 119 | self.relay_server_thread = Thread( 120 | target=self._relay_server.serve_forever 121 | ) 122 | self.relay_server_thread.daemon = True 123 | self.relay_server_thread.start() 124 | 125 | def _parse_batch_clauses( 126 | self, batch_clauses: list[dict[str, Any]] 127 | ) -> tuple[tuple[str, ...], set[str]]: 128 | new_labels: set[str] = set() 129 | new_clauses: tuple[str, ...] = () 130 | for dict_clause in batch_clauses: 131 | raw_clause = ( 132 | dict_clause["clause"].replace("\n", "").replace(" ", "") 133 | ) 134 | (label, literals, inference_record) = re.findall( 135 | pattern=r"cnf\((\w+),\w+,\((.*)\),(\w+\(.+\))\)\.", 136 | string=raw_clause, 137 | )[0] 138 | new_labels.add(label) 139 | if inference_record[:5] == "file(": 140 | new_clauses += ( 141 | f"cnf({label},axiom,{literals},file('input.p')).", 142 | ) 143 | else: 144 | new_clauses += ( 145 | f"cnf({label},plain,{literals}," 146 | f"{inference_record.replace('status(thm)', '')}).", 147 | ) 148 | return new_clauses, new_labels 149 | 150 | def reset( 151 | self, 152 | *, 153 | seed: int | None = None, 154 | options: dict[str, Any] | None = None, 155 | ) -> tuple[tuple[str, ...], dict[str, Any]]: 156 | """ 157 | Reset the environment. 
158 | 159 | :param seed: seed for compatibility 160 | :param options: options for compatibility 161 | :returns: observations and info 162 | """ 163 | super().reset(seed=seed) 164 | self._restart_relay_server() 165 | self.iprover_process = _iprover_start( 166 | self.relay_server.server_address[1], 167 | self.get_task(), 168 | self.prover_binary_path, 169 | ) 170 | data = self._get_json_data() 171 | new_clauses, new_labels = self._parse_iprover_requests(data) 172 | self._available_actions = new_labels 173 | return new_clauses, {} 174 | 175 | def _get_json_data(self) -> list[dict[str, Any]]: 176 | json_data = [{"tag": "None"}] 177 | while json_data[-1]["tag"] not in { 178 | QUERY_END_MESSAGE, 179 | SESSION_END_MESSAGE, 180 | }: 181 | parsed_json = self.relay_server.input_queue.get() 182 | self.relay_server.input_queue.task_done() 183 | json_data.append(parsed_json) 184 | return json_data[1:] 185 | 186 | def _parse_iprover_requests( 187 | self, iprover_requests: list[dict[str, Any]] 188 | ) -> tuple[tuple[str, ...], set[str]]: 189 | new_labels: set[str] = set() 190 | new_clauses: tuple[str, ...] 
= () 191 | for iprover_request in iprover_requests: 192 | if "clauses" in iprover_request: 193 | more_new_clauses, more_new_labels = self._parse_batch_clauses( 194 | iprover_request["clauses"] 195 | ) 196 | new_labels.update(more_new_labels) 197 | new_clauses += more_new_clauses 198 | return new_clauses, new_labels 199 | 200 | def _do_deductions(self, action: str) -> tuple[tuple[str, ...], set[str]]: 201 | iprover_scores_message = { 202 | "tag": "given_clause_res", 203 | "passive_is_empty": False, 204 | "given_clause": int(action[2:]), 205 | } 206 | relayed_scores_message = bytes( 207 | json.dumps({"tag": "server_queries_end"}) 208 | + "\n\x00\n" 209 | + json.dumps(iprover_scores_message) 210 | + "\n\x00\n", 211 | "utf8", 212 | ) 213 | self.relay_server.output_queue.put(relayed_scores_message) 214 | data = self._get_json_data() 215 | return self._parse_iprover_requests(data) 216 | 217 | def close(self) -> None: 218 | """Stop relay server.""" 219 | self._terminate_threads() 220 | 221 | @property 222 | def relay_server(self) -> RelayServer: 223 | """ 224 | Return the relay server object. 
225 | 226 | :raises ValueError: if called before ``reset`` 227 | """ 228 | if self._relay_server: 229 | return self._relay_server 230 | raise ValueError("run ``reset`` first!") 231 | 232 | def _terminate_threads(self) -> None: 233 | if self._relay_server: 234 | self._relay_server.shutdown() 235 | if self.relay_server_thread: 236 | self.relay_server_thread.join() 237 | if self.iprover_process: 238 | try: 239 | self.iprover_process.terminate() 240 | except ProcessLookupError: # pragma: no cover 241 | pass 242 | -------------------------------------------------------------------------------- /gym_saturation/envs/saturation_env.py: -------------------------------------------------------------------------------- 1 | # Copyright 2021-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | 15 | """ 16 | Saturation Environment 17 | ======================= 18 | """ # noqa: D205, D400 19 | 20 | import random 21 | from abc import abstractmethod 22 | from typing import Any 23 | 24 | from gymnasium import Env, spaces 25 | from gymnasium.spaces.text import alphanumeric 26 | 27 | from gym_saturation.constants import FALSEHOOD_SYMBOL, MOCK_TPTP_PROBLEM 28 | 29 | ALPHANUMERIC_WITH_UNDERSCORE = "".join(alphanumeric) + "_" 30 | EXTENDED_ALPHANUMERIC = ALPHANUMERIC_WITH_UNDERSCORE + "(), |~=!$.'" 31 | 32 | 33 | class SaturationEnv(Env[tuple[str, ...], str]): 34 | """ 35 | Saturation algorithm in a reinforcement learning friendly way. 36 | 37 | It's an abstract class, so here we have only trivial smoke tests. 38 | One should override ``_do_deductions`` method in child classes. 39 | 40 | Refer to :ref:`saturation_env` for more documentation. 41 | 42 | >>> class DummyProver(SaturationEnv): 43 | ... def _do_deductions(self, action): 44 | ... pass 45 | 46 | >>> env = DummyProver() 47 | """ 48 | 49 | reward_range = (0, 1) 50 | action_space = spaces.Text(256, charset=ALPHANUMERIC_WITH_UNDERSCORE) 51 | observation_space = spaces.Sequence( 52 | spaces.Text(4000, charset=EXTENDED_ALPHANUMERIC) 53 | ) 54 | 55 | def __init__( # noqa: D107 56 | self, 57 | ): 58 | super().__init__() 59 | self._task = MOCK_TPTP_PROBLEM 60 | self._terminated = False 61 | self._available_actions: set[str] = set() 62 | 63 | def reset( 64 | self, 65 | *, 66 | seed: int | None = None, 67 | options: dict[str, Any] | None = None, 68 | ) -> tuple[tuple[str, ...], dict[str, Any]]: 69 | """ 70 | Reset the environment. 
71 | 72 | :param seed: seed for compatibility 73 | :param options: options for compatibility 74 | :returns: observations and info 75 | """ 76 | super().reset(seed=seed) 77 | random.seed(seed) 78 | self._terminated = False 79 | self._available_actions = set() 80 | return (), {} 81 | 82 | @abstractmethod 83 | def _do_deductions(self, action: Any) -> tuple[tuple[str, ...], set[str]]: 84 | raise NotImplementedError # pragma: no cover 85 | 86 | def step( 87 | self, action: Any 88 | ) -> tuple[tuple[str, ...], float, bool, bool, dict[str, Any]]: 89 | """ 90 | Run one time-step of the environment's dynamics. 91 | 92 | When end of episode is reached, you are responsible for calling 93 | ``reset()`` to reset this environment's state. 94 | Accepts an action and returns a tuple 95 | (observation, reward, terminated, truncated, info) 96 | 97 | :param action: an action provided by the agent 98 | :returns: a tuple of five values:\n 99 | * observation: agent's observation of the current environment 100 | * reward: amount of reward returned after previous action 101 | * terminated: whether the proof was found 102 | * truncated: whether the episode was finished for an external 103 | reason (e.g. time limit) 104 | * info: contains auxiliary diagnostic information (helpful for 105 | debugging, and sometimes learning) 106 | """ # noqa: D301 107 | new_clauses: tuple[str, ...] 
= () 108 | if not self._terminated and action in self._available_actions: 109 | new_clauses, new_actions = self._do_deductions(action) 110 | self._terminated = max( 111 | (FALSEHOOD_SYMBOL in clause for clause in new_clauses), 112 | default=False, 113 | ) 114 | self._available_actions.discard(action) 115 | self._available_actions.update(new_actions) 116 | return ( 117 | new_clauses, 118 | 0.0, 119 | self._terminated, 120 | False, 121 | {}, 122 | ) 123 | 124 | def render(self) -> None: # type: ignore 125 | """No render.""" 126 | 127 | def set_task(self, task: str) -> None: 128 | """ 129 | Set the specified task to the current environment. 130 | 131 | :param task: a TPTP problem filename 132 | """ 133 | self._task = task 134 | 135 | def get_task(self) -> str: 136 | """ 137 | Get the task that the agent is performing in the current environment. 138 | 139 | :returns: a TPTP problem filename 140 | """ 141 | return self._task 142 | -------------------------------------------------------------------------------- /gym_saturation/envs/vampire_env.py: -------------------------------------------------------------------------------- 1 | # Copyright 2021-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | 15 | """ 16 | Saturation Environment with Vampire back-end 17 | ============================================ 18 | """ # noqa: D205, D400 19 | 20 | import os 21 | import re 22 | from typing import Any 23 | 24 | from gym_saturation.constants import FALSEHOOD_SYMBOL 25 | from gym_saturation.envs.saturation_env import SaturationEnv 26 | from gym_saturation.vampire_wrapper import VampireWrapper 27 | 28 | 29 | class VampireEnv(SaturationEnv): 30 | """ 31 | An RL environment wrapper around Vampire prover. 32 | 33 | :param prover_binary_path: a path to Vampire binary; 34 | by default we expect it to be in the $PATH 35 | 36 | Refer to :ref:`saturation_env` for more documentation. 37 | 38 | We can run a full Gymnasium environment check: 39 | 40 | >>> from gymnasium.utils.env_checker import check_env 41 | >>> import gymnasium as gym 42 | >>> env = gym.make("Vampire-v0").unwrapped 43 | >>> check_env(env) 44 | 45 | repeating actions change nothing 46 | 47 | >>> env = gym.make("Vampire-v0") 48 | >>> _ = env.reset() 49 | >>> one = env.step("c_1") 50 | >>> two = env.step("c_1") 51 | >>> one == two 52 | True 53 | 54 | sometimes Vampire can solve a problem during pre-processing 55 | 56 | >>> from gym_saturation.constants import MOCK_TPTP_PROBLEM 57 | >>> trivial_problem = os.path.join(os.path.dirname(MOCK_TPTP_PROBLEM), 58 | ... "TST002-1.p") 59 | >>> env.unwrapped.set_task(trivial_problem) 60 | >>> _, _ = env.reset() 61 | >>> env.unwrapped._terminated 62 | True 63 | >>> _, _, terminated, _, _ = env.step("anything") 64 | >>> terminated 65 | True 66 | >>> env.close() 67 | 68 | a test of an unexpected reply from Vampire 69 | 70 | >>> from gym_saturation.constants import MOCK_TPTP_FOLDER 71 | >>> vampire_binary = os.path.join(MOCK_TPTP_FOLDER, "..", "vampire-mock") 72 | >>> vampire_env = VampireEnv(prover_binary_path=vampire_binary) 73 | >>> vampire_env.reset() 74 | Traceback (most recent call last): 75 | ... 
76 | ValueError: ('Unexpected response type: ', 'who could expect that?') 77 | """ 78 | 79 | def __init__( # noqa: D107 80 | self, 81 | prover_binary_path: str = "vampire", 82 | ): 83 | super().__init__() 84 | self._vampire = VampireWrapper(prover_binary_path) 85 | 86 | def _parse_vampire_response( 87 | self, vampire_response: tuple[tuple[str, str, str], ...] 88 | ) -> tuple[tuple[str, ...], set[str]]: 89 | new_labels: set[str] = set() 90 | new_clauses: tuple[str, ...] = () 91 | for response_type, clause_label, clause_text in vampire_response: 92 | new_label = "c_" + clause_label 93 | if response_type == "passive" or FALSEHOOD_SYMBOL in clause_text: 94 | new_clauses += ( 95 | self._parse_vampire_clause(new_label, clause_text), 96 | ) 97 | new_labels.add(new_label) 98 | elif response_type not in { 99 | "active", 100 | "forward reduce", 101 | "backward reduce", 102 | "new propositional", 103 | "new", 104 | "final", 105 | "input", 106 | "fn def discovered", 107 | }: 108 | raise ValueError("Unexpected response type: ", response_type) 109 | return new_clauses, new_labels 110 | 111 | def reset( 112 | self, 113 | *, 114 | seed: int | None = None, 115 | options: dict[str, Any] | None = None, 116 | ) -> tuple[tuple[str, ...], dict[str, Any]]: 117 | """ 118 | Reset the environment. 119 | 120 | :param seed: seed for compatibility 121 | :param options: options for compatibility 122 | :returns: observations and info 123 | """ 124 | super().reset(seed=seed) 125 | tptp_folder = os.path.join( 126 | os.path.dirname(self.get_task()), "..", ".." 
127 | ) 128 | vampire_response = self._vampire.start(self.get_task(), tptp_folder) 129 | new_clauses, new_labels = self._parse_vampire_response( 130 | vampire_response 131 | ) 132 | self._terminated = max( 133 | (FALSEHOOD_SYMBOL in clause for clause in new_clauses), 134 | default=False, 135 | ) 136 | self._available_actions = new_labels 137 | return new_clauses, {} 138 | 139 | def _do_deductions(self, action: str) -> tuple[tuple[str, ...], set[str]]: 140 | return self._parse_vampire_response( 141 | # the first two characters are `c_` 142 | self._vampire.pick_a_clause(action[2:]) 143 | ) 144 | 145 | def _parse_vampire_clause( 146 | self, clause_label: str, clause_text: str 147 | ) -> str: 148 | literals, inference_rule, inference_parents = re.findall( 149 | r"(.+) \[([^\d,]+)([\d,]*)\]", clause_text 150 | )[0] 151 | literals = literals.replace(" ", "") 152 | inference_rule = inference_rule.strip().replace(" ", "_") 153 | inference_parents = "c_" + inference_parents.replace(",", ",c_") 154 | if inference_rule != "input": 155 | return ( 156 | f"cnf({clause_label},plain,{literals}," 157 | f"inference({inference_rule}," 158 | f"[],[{inference_parents}]))." 159 | ) 160 | return f"cnf({clause_label},axiom,{literals},file('input.p'))." 
161 | 162 | def close(self) -> None: 163 | """Terminate Vampire process.""" 164 | self._vampire.terminate() 165 | -------------------------------------------------------------------------------- /gym_saturation/py.typed: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/inpefess/gym-saturation/103e44c90ae48d56d8a06ad2908d3b0c0939bf22/gym_saturation/py.typed -------------------------------------------------------------------------------- /gym_saturation/relay_server.py: -------------------------------------------------------------------------------- 1 | # Copyright 2022-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | """ 16 | Relay Server between Two Sockets 17 | ================================ 18 | """ # noqa: D205, D400 19 | 20 | import json 21 | from queue import Queue 22 | from socketserver import BaseRequestHandler, ThreadingTCPServer 23 | from typing import Any 24 | 25 | QUERY_END_MESSAGE = "server_queries_start" 26 | SESSION_END_MESSAGE = "proof_out" 27 | 28 | 29 | class RelayServer(ThreadingTCPServer): 30 | r""" 31 | Server relaying i/o of a long-running connection to a TCP socket to queues. 32 | 33 | >>> import socket 34 | >>> from threading import Thread 35 | >>> with RelayServer(("localhost", 0), RelayTCPHandler 36 | ... ) as relay_server: 37 | ... thread = Thread(target=relay_server.serve_forever) 38 | ... 
thread.daemon = True 39 | ... thread.start() 40 | ... with socket.create_connection(relay_server.server_address 41 | ... ) as socket_connection: 42 | ... # data sent to the socket are stored in one queue 43 | ... socket_connection.sendall(bytes( 44 | ... f'{{"tag": "{QUERY_END_MESSAGE}"}}\n\x00\n', "utf8") 45 | ... ) 46 | ... print(relay_server.input_queue.get()) 47 | ... # data from another queue are sent to the client 48 | ... relay_server.output_queue.put(b"test") 49 | ... print(str(socket_connection.recv(4096), "utf8")) 50 | ... # message format for closing connection 51 | ... socket_connection.sendall(bytes( 52 | ... f'{{"tag": "{SESSION_END_MESSAGE}"}}\n\x00\n', "utf8" 53 | ... )) 54 | ... relay_server.shutdown() 55 | ... thread.join() 56 | {'tag': 'server_queries_start'} 57 | test 58 | """ 59 | 60 | def __init__( # noqa: D107 61 | self, 62 | server_address: tuple[str, int], 63 | request_handler_class: type[BaseRequestHandler], 64 | bind_and_activate: bool = True, 65 | ): 66 | super().__init__( 67 | server_address, request_handler_class, bind_and_activate 68 | ) 69 | self.input_queue: Queue = Queue() 70 | self.output_queue: Queue = Queue() 71 | self.daemon_threads = True 72 | 73 | 74 | class RelayTCPHandler(BaseRequestHandler): 75 | """The request handler class for relay server.""" 76 | 77 | def read_messages(self) -> list[dict[str, Any]]: 78 | """ 79 | Read messages from TCP request. 
80 | 81 | :returns: parsed messages 82 | """ 83 | raw_data = "" 84 | json_messages: list[dict[str, Any]] = [] 85 | while len(json_messages) == 0 or json_messages[-1]["tag"] not in { 86 | QUERY_END_MESSAGE, 87 | SESSION_END_MESSAGE, 88 | }: 89 | raw_data += str(self.request.recv(4096), "utf8") 90 | raw_jsons = raw_data.split("\n\x00\n") 91 | raw_data = raw_jsons[-1] 92 | json_messages.extend(list(map(json.loads, raw_jsons[:-1]))) 93 | return json_messages 94 | 95 | def handle(self) -> None: 96 | """Read data from another TCP socket or send data to it.""" 97 | if isinstance(self.server, RelayServer): 98 | json_messages: list[dict[str, Any]] = [] 99 | while ( 100 | len(json_messages) == 0 101 | or json_messages[-1]["tag"] != SESSION_END_MESSAGE 102 | ): 103 | json_messages = self.read_messages() 104 | for json_message in json_messages: 105 | self.server.input_queue.put(json_message) 106 | if json_messages[-1]["tag"] == QUERY_END_MESSAGE: 107 | self.request.sendall(self.server.output_queue.get()) 108 | self.server.output_queue.task_done() 109 | -------------------------------------------------------------------------------- /gym_saturation/resources/TPTP-mock/Problems/TST/TST001-1.p: -------------------------------------------------------------------------------- 1 | cnf(associativity, axiom, mult(X, mult(Y, Z)) = mult(mult(X, Y), Z)). 2 | cnf(left_identity, axiom, mult(e, X) = X). 3 | cnf(left_inverse, axiom, mult(inv(X), X) = e). 4 | cnf(idempotent_element, hypothesis, mult(a, a) = a). 5 | cnf(negated_conjecture, negated_conjecture, ~ a = e). 6 | -------------------------------------------------------------------------------- /gym_saturation/resources/TPTP-mock/Problems/TST/TST002-1.p: -------------------------------------------------------------------------------- 1 | cnf(conjecture, axiom, p). 2 | cnf(negated_conjecture, negated_conjecture, ~ p). 
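The message framing used by ``read_messages`` in ``gym_saturation/relay_server.py`` can be illustrated without any sockets. The sketch below is not part of the package: the helper name ``split_messages`` and the ``example`` tag are hypothetical, and only the ``\n\x00\n`` delimiter and the ``server_queries_start`` tag are taken from the relay server code. It shows how a buffer is split on the delimiter, how complete chunks are parsed as JSON, and how an unfinished tail is kept until more data arrive.

```python
import json

# delimiter between JSON messages, as used in relay_server.py
DELIMITER = "\n\x00\n"


def split_messages(buffer: str) -> tuple[list[dict], str]:
    """Parse all complete messages; return them and the unfinished tail.

    A hypothetical helper written for this illustration only.
    """
    # everything before the last delimiter is complete,
    # the last piece may be a partially received message
    *complete, remainder = buffer.split(DELIMITER)
    return [json.loads(chunk) for chunk in complete], remainder


# two network reads, the second message is cut in the middle
reads = (
    '{"tag": "example"}\n\x00\n{"tag": "serv',
    'er_queries_start"}\n\x00\n',
)
buffer = ""
messages: list[dict] = []
for data in reads:
    buffer += data
    parsed, buffer = split_messages(buffer)
    messages.extend(parsed)

print(messages)
```

Keeping the tail in the buffer is what makes the loop robust to a message arriving in several TCP reads.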
3 | -------------------------------------------------------------------------------- /gym_saturation/resources/vampire-mock: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | echo [SA] who could expect that?: 1. label [input] 4 | -------------------------------------------------------------------------------- /gym_saturation/vampire_wrapper.py: -------------------------------------------------------------------------------- 1 | # Copyright 2021-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | """ 16 | Vampire Wrapper 17 | ================ 18 | """ # noqa: D205, D400 19 | 20 | import pexpect 21 | 22 | 23 | class VampireWrapper: 24 | """ 25 | A wrapper around Vampire binary running in a manual clause selection mode. 26 | 27 | :param binary_path: an absolute path to Vampire binary 28 | :param command_line_arguments: command line arguments in one string 29 | 30 | .. _vampire-wrapper : 31 | 32 | >>> from gym_saturation.constants import (MOCK_TPTP_PROBLEM, 33 | ... MOCK_TPTP_FOLDER) 34 | >>> vampire = VampireWrapper("vampire") 35 | >>> vampire.pick_a_clause("2") 36 | Traceback (most recent call last): 37 | ... 38 | ValueError: start solving a problem first! 
39 | >>> result = vampire.start(MOCK_TPTP_PROBLEM, MOCK_TPTP_FOLDER) 40 | """ 41 | 42 | def __init__( # noqa: D107 43 | self, binary_path: str, command_line_arguments: str | None = None 44 | ): 45 | self.binary_path = binary_path 46 | self._proc: pexpect.spawn | None = None 47 | self.problem_filename: str | None = None 48 | self.command_line_arguments = ( 49 | " --manual_cs on --show_passive on" 50 | " --show_new on --time_limit 0 --avatar off " 51 | if command_line_arguments is None 52 | else command_line_arguments 53 | ) 54 | 55 | def _get_stdout(self) -> tuple[tuple[str, str, str], ...]: 56 | result: tuple[tuple[str, str, str], ...] = () 57 | self.proc.expect( 58 | ["Pick a clause:", "Pick a clause pair:", pexpect.EOF] 59 | ) 60 | if self.proc.before is not None: 61 | lines = self.proc.before.decode("utf-8").split("\r\n") 62 | for line in lines: 63 | if line[:5] == "[SA] ": 64 | result_type, result_body = line[5:].split(": ") 65 | clause_label, clause = result_body.split(". ") 66 | result += ((result_type, clause_label, clause),) 67 | return result 68 | 69 | def start( 70 | self, problem_filename: str, tptp_folder: str 71 | ) -> tuple[tuple[str, str, str], ...]: 72 | """ 73 | Start Vampire in a manual mode on a given problem. 
74 | 75 | Time limit is one day, Vampire prints everything 76 | 77 | :param problem_filename: full path of a TPTP problem file 78 | :param tptp_folder: the root folder for TPTP library 79 | :returns: a sequence of action type, clause number and clause 80 | """ 81 | self.problem_filename = problem_filename 82 | if self._proc is not None: 83 | self._proc.close() 84 | self._proc = pexpect.spawn( 85 | f"{self.binary_path} {self.command_line_arguments} " 86 | f"--include {tptp_folder} {problem_filename}", 87 | echo=False, 88 | ) 89 | # https://pexpect.readthedocs.io/en/stable/commonissues.html#timing-issue-with-send-and-sendline 90 | self._proc.delaybeforesend = None # type: ignore 91 | return self._get_stdout() 92 | 93 | def pick_a_clause( 94 | self, clause_label: str 95 | ) -> tuple[tuple[str, str, str], ...]: 96 | """ 97 | Select a clause and get response from Vampire. 98 | 99 | :param clause_label: a given clause order number 100 | :returns: a sequence of action type, clause number and clause 101 | """ 102 | self.proc.sendline(clause_label) 103 | return self._get_stdout() 104 | 105 | @property 106 | def proc(self) -> pexpect.spawn: 107 | """ 108 | Vampire process. 109 | 110 | :raises ValueError: when called before ``start`` 111 | """ 112 | if self._proc is None: 113 | raise ValueError("start solving a problem first!") 114 | return self._proc 115 | 116 | def terminate(self) -> None: 117 | """Terminate Vampire process if any.""" 118 | if self._proc is not None: 119 | self._proc.terminate() 120 | self._proc.wait() 121 | -------------------------------------------------------------------------------- /gym_saturation/wrappers/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2023-2025 Boris Shminke 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | """ 16 | Gymnasium Wrappers for Provers 17 | =============================== 18 | """ # noqa: D205, D400 19 | 20 | from gym_saturation.wrappers.labels_extractor import LabelsExtractor 21 | 22 | __all__ = ["LabelsExtractor"] 23 | -------------------------------------------------------------------------------- /gym_saturation/wrappers/labels_extractor.py: -------------------------------------------------------------------------------- 1 | """Labels extractor wrapper.""" 2 | 3 | import re 4 | 5 | import gymnasium as gym 6 | 7 | 8 | class LabelsExtractor(gym.ObservationWrapper): 9 | """ 10 | Labels extractor wrapper. 11 | 12 | >>> from gym_saturation.envs.vampire_env import VampireEnv 13 | >>> env = LabelsExtractor(VampireEnv()) 14 | >>> observation, info = env.reset() 15 | >>> type(observation) 16 | <class 'dict'> 17 | >>> observation.keys() 18 | dict_keys(['labels', 'observation']) 19 | >>> type(observation["labels"]) 20 | <class 'tuple'> 21 | >>> type(observation["labels"][0]) 22 | <class 'str'> 23 | """ 24 | 25 | def observation( 26 | self, observation: tuple[str, ...] 27 | ) -> dict[str, tuple[str, ...]]: 28 | """ 29 | Return a modified observation. 
30 | 31 | :param observation: The observation 32 | :returns: The modified observation 33 | """ 34 | return { 35 | "labels": tuple( 36 | re.findall(r"cnf\((\w+),.+\)\.", clause)[0] 37 | for clause in observation 38 | ), 39 | "observation": observation, 40 | } 41 | -------------------------------------------------------------------------------- /joss-paper/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/inpefess/gym-saturation/103e44c90ae48d56d8a06ad2908d3b0c0939bf22/joss-paper/architecture.png -------------------------------------------------------------------------------- /joss-paper/architecture.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
[SVG diagram; only its text labels survive: TPTP; CNF problem; 1) parsing; grammar trees; 2) logic_ops; 3) OpenAI Gym Env; 4) agent_testing; gym-saturation; RL agent; action (given clause index); reward, state (JSON), ...]
-------------------------------------------------------------------------------- /joss-paper/paper.bib: -------------------------------------------------------------------------------- 1 | @inproceedings{DBLP:conf/cade/0001P20, 2 | author = {Stephan Schulz and 3 | Adam Pease}, 4 | editor = {Nicolas Peltier and 5 | Viorica Sofronie{-}Stokkermans}, 6 | title = {Teaching Automated Theorem Proving by Example: PyRes 1.2 - (System 7 | Description)}, 8 | booktitle = {Automated Reasoning - 10th International Joint Conference, {IJCAR} 9 | 2020, Paris, France, July 1-4, 2020, Proceedings, Part {II}}, 10 | series = {Lecture Notes in Computer Science}, 11 | volume = {12167}, 12 | pages = {158--166}, 13 | publisher = {Springer}, 14 | year = {2020}, 15 | url = {https://doi.org/10.1007/978-3-030-51054-1\_9}, 16 | doi = {10.1007/978-3-030-51054-1\_9}, 17 | timestamp = {Fri, 03 Jul 2020 14:00:29 +0200}, 18 | biburl = {https://dblp.org/rec/conf/cade/0001P20.bib}, 19 | bibsource = {dblp computer science bibliography, https://dblp.org} 20 | } 21 | @article{DBLP:journals/corr/BrockmanCPSSTZ16, 22 | author = {Greg Brockman and 23 | Vicki Cheung and 24 | Ludwig Pettersson and 25 | Jonas Schneider and 26 | John Schulman and 27 | Jie Tang and 28 | Wojciech Zaremba}, 29 | title = {OpenAI Gym}, 30 | journal = {CoRR}, 31 | volume = {abs/1606.01540}, 32 | year = {2016}, 33 | url = {http://arxiv.org/abs/1606.01540}, 34 | eprinttype = {arXiv}, 35 | eprint = {1606.01540}, 36 | timestamp = {Fri, 08 Nov 2019 12:51:06 +0100}, 37 | biburl = {https://dblp.org/rec/journals/corr/BrockmanCPSSTZ16.bib}, 38 | bibsource = {dblp computer science bibliography, https://dblp.org} 39 | } 40 | @Article{Sut17, 41 | Author = "Sutcliffe, G.", 42 | Year = "2017", 43 | Title = "{The TPTP Problem Library and Associated Infrastructure. 
44 | From CNF to TH0, TPTP v6.4.0}", 45 | Journal = "Journal of Automated Reasoning", 46 | Volume = "59", 47 | Number = "4", 48 | Pages = "483-502" 49 | } 50 | @inproceedings{DBLP:conf/cade/0001CV19, 51 | author = {Stephan Schulz and 52 | Simon Cruanes and 53 | Petar Vukmirovic}, 54 | editor = {Pascal Fontaine}, 55 | title = {Faster, Higher, Stronger: {E} 2.3}, 56 | booktitle = {Automated Deduction - {CADE} 27 - 27th International Conference on 57 | Automated Deduction, Natal, Brazil, August 27-30, 2019, Proceedings}, 58 | series = {Lecture Notes in Computer Science}, 59 | volume = {11716}, 60 | pages = {495--507}, 61 | publisher = {Springer}, 62 | year = {2019}, 63 | url = {https://doi.org/10.1007/978-3-030-29436-6\_29}, 64 | doi = {10.1007/978-3-030-29436-6\_29}, 65 | timestamp = {Sat, 19 Oct 2019 20:28:03 +0200}, 66 | biburl = {https://dblp.org/rec/conf/cade/0001CV19.bib}, 67 | bibsource = {dblp computer science bibliography, https://dblp.org} 68 | } 69 | @inproceedings{DBLP:conf/cav/KovacsV13, 70 | author = {Laura Kov{\'{a}}cs and 71 | Andrei Voronkov}, 72 | editor = {Natasha Sharygina and 73 | Helmut Veith}, 74 | title = {First-Order Theorem Proving and Vampire}, 75 | booktitle = {Computer Aided Verification - 25th International Conference, {CAV} 76 | 2013, Saint Petersburg, Russia, July 13-19, 2013. 
Proceedings}, 77 | series = {Lecture Notes in Computer Science}, 78 | volume = {8044}, 79 | pages = {1--35}, 80 | publisher = {Springer}, 81 | year = {2013}, 82 | url = {https://doi.org/10.1007/978-3-642-39799-8\_1}, 83 | doi = {10.1007/978-3-642-39799-8\_1}, 84 | timestamp = {Tue, 14 May 2019 10:00:43 +0200}, 85 | biburl = {https://dblp.org/rec/conf/cav/KovacsV13.bib}, 86 | bibsource = {dblp computer science bibliography, https://dblp.org} 87 | } 88 | @inproceedings{DBLP:conf/cade/000121a, 89 | author = {Martin Suda}, 90 | editor = {Andr{\'{e}} Platzer and 91 | Geoff Sutcliffe}, 92 | title = {Improving ENIGMA-style Clause Selection while Learning From History}, 93 | booktitle = {Automated Deduction - {CADE} 28 - 28th International Conference on 94 | Automated Deduction, Virtual Event, July 12-15, 2021, Proceedings}, 95 | series = {Lecture Notes in Computer Science}, 96 | volume = {12699}, 97 | pages = {543--561}, 98 | publisher = {Springer}, 99 | year = {2021}, 100 | url = {https://doi.org/10.1007/978-3-030-79876-5\_31}, 101 | doi = {10.1007/978-3-030-79876-5\_31}, 102 | timestamp = {Thu, 29 Jul 2021 13:42:16 +0200}, 103 | biburl = {https://dblp.org/rec/conf/cade/000121a.bib}, 104 | bibsource = {dblp computer science bibliography, https://dblp.org} 105 | } 106 | @inproceedings{DBLP:conf/cade/JakubuvCOP0U20, 107 | author = {Jan Jakubuv and 108 | Karel Chvalovsk{\'{y}} and 109 | Miroslav Ols{\'{a}}k and 110 | Bartosz Piotrowski and 111 | Martin Suda and 112 | Josef Urban}, 113 | editor = {Nicolas Peltier and 114 | Viorica Sofronie{-}Stokkermans}, 115 | title = {{ENIGMA} Anonymous: Symbol-Independent Inference Guiding Machine (System 116 | Description)}, 117 | booktitle = {Automated Reasoning - 10th International Joint Conference, {IJCAR} 118 | 2020, Paris, France, July 1-4, 2020, Proceedings, Part {II}}, 119 | series = {Lecture Notes in Computer Science}, 120 | volume = {12167}, 121 | pages = {448--463}, 122 | publisher = {Springer}, 123 | year = {2020}, 124 | url = 
{https://doi.org/10.1007/978-3-030-51054-1\_29}, 125 | doi = {10.1007/978-3-030-51054-1\_29}, 126 | timestamp = {Fri, 09 Apr 2021 18:50:52 +0200}, 127 | biburl = {https://dblp.org/rec/conf/cade/JakubuvCOP0U20.bib}, 128 | bibsource = {dblp computer science bibliography, https://dblp.org} 129 | } 130 | @inproceedings{DBLP:conf/tableaux/RawsonR21, 131 | author = {Michael Rawson and 132 | Giles Reger}, 133 | editor = {Anupam Das and 134 | Sara Negri}, 135 | title = {lazyCoP: Lazy Paramodulation Meets Neurally Guided Search}, 136 | booktitle = {Automated Reasoning with Analytic Tableaux and Related Methods - 30th 137 | International Conference, {TABLEAUX} 2021, Birmingham, UK, September 138 | 6-9, 2021, Proceedings}, 139 | series = {Lecture Notes in Computer Science}, 140 | volume = {12842}, 141 | pages = {187--199}, 142 | publisher = {Springer}, 143 | year = {2021}, 144 | url = {https://doi.org/10.1007/978-3-030-86059-2\_11}, 145 | doi = {10.1007/978-3-030-86059-2\_11}, 146 | timestamp = {Mon, 06 Sep 2021 13:59:54 +0200}, 147 | biburl = {https://dblp.org/rec/conf/tableaux/RawsonR21.bib}, 148 | bibsource = {dblp computer science bibliography, https://dblp.org} 149 | } 150 | @ARTICLE{9669114, 151 | author={Abdelaziz, Ibrahim and Crouse, Maxwell and Makni, Bassem and Austel, Vernon and Cornelio, Cristina and Ikbal, Shajith and Kapanipathi, Pavan and Makondo, Ndivhuwo and Srinivas, Kavitha and Witbrock, Michael and Fokoue, Achille}, 152 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 153 | title={Learning to Guide a Saturation-Based Theorem Prover}, 154 | year={2022}, 155 | volume={}, 156 | number={}, 157 | pages={1-1}, 158 | doi={10.1109/TPAMI.2022.3140382}} 159 | @inproceedings{DBLP:conf/icml/BansalLRSW19, 160 | author = {Kshitij Bansal and 161 | Sarah M. Loos and 162 | Markus N. 
Rabe and 163 | Christian Szegedy and 164 | Stewart Wilcox}, 165 | editor = {Kamalika Chaudhuri and 166 | Ruslan Salakhutdinov}, 167 | title = {HOList: An Environment for Machine Learning of Higher Order Logic 168 | Theorem Proving}, 169 | booktitle = {Proceedings of the 36th International Conference on Machine Learning, 170 | {ICML} 2019, 9-15 June 2019, Long Beach, California, {USA}}, 171 | series = {Proceedings of Machine Learning Research}, 172 | volume = {97}, 173 | pages = {454--463}, 174 | publisher = {{PMLR}}, 175 | year = {2019}, 176 | url = {http://proceedings.mlr.press/v97/bansal19a.html}, 177 | timestamp = {Tue, 11 Jun 2019 15:37:38 +0200}, 178 | biburl = {https://dblp.org/rec/conf/icml/BansalLRSW19.bib}, 179 | bibsource = {dblp computer science bibliography, https://dblp.org} 180 | } 181 | @software{tange_2021_5233953, 182 | author = {Tange, Ole}, 183 | title = {GNU Parallel 20210822 ('Kabul')}, 184 | month = Aug, 185 | year = 2021, 186 | note = {{GNU Parallel is a general parallelizer to run 187 | multiple serial command line programs in parallel 188 | without changing them.}}, 189 | publisher = {Zenodo}, 190 | doi = {10.5281/zenodo.5233953}, 191 | url = {https://doi.org/10.5281/zenodo.5233953} 192 | } 193 | @article{doi:10.1137/0204036, 194 | author = {Brand, D.}, 195 | title = {Proving Theorems with the Modification Method}, 196 | journal = {SIAM Journal on Computing}, 197 | volume = {4}, 198 | number = {4}, 199 | pages = {412-430}, 200 | year = {1975}, 201 | doi = {10.1137/0204036}, 202 | URL = {https://doi.org/10.1137/0204036}, 203 | eprint = {https://doi.org/10.1137/0204036} 204 | } 205 | @misc{LARK, 206 | author = {Erez Shinan}, 207 | title = {lark-parser}, 208 | url = {https://pypi.org/project/lark-parser/}, 209 | version = {0.12.0}, 210 | date = {2021-08-30}, 211 | } 212 | @inproceedings{DBLP:conf/aaai/PaliwalLRBS20, 213 | author = {Aditya Paliwal and 214 | Sarah M. Loos and 215 | Markus N. 
Rabe and 216 | Kshitij Bansal and 217 | Christian Szegedy}, 218 | title = {Graph Representations for Higher-Order Logic and Theorem Proving}, 219 | booktitle = {The Thirty-Fourth {AAAI} Conference on Artificial Intelligence, {AAAI} 220 | 2020, The Thirty-Second Innovative Applications of Artificial Intelligence 221 | Conference, {IAAI} 2020, The Tenth {AAAI} Symposium on Educational 222 | Advances in Artificial Intelligence, {EAAI} 2020, New York, NY, USA, 223 | February 7-12, 2020}, 224 | pages = {2967--2974}, 225 | publisher = {{AAAI} Press}, 226 | year = {2020}, 227 | url = {https://aaai.org/ojs/index.php/AAAI/article/view/5689}, 228 | timestamp = {Tue, 02 Feb 2021 08:00:27 +0100}, 229 | biburl = {https://dblp.org/rec/conf/aaai/PaliwalLRBS20.bib}, 230 | bibsource = {dblp computer science bibliography, https://dblp.org} 231 | } 232 | @article{DBLP:journals/corr/abs-1905-10501, 233 | author = {Kshitij Bansal and 234 | Sarah M. Loos and 235 | Markus N. Rabe and 236 | Christian Szegedy}, 237 | title = {Learning to Reason in Large Theories without Imitation}, 238 | journal = {CoRR}, 239 | volume = {abs/1905.10501}, 240 | year = {2019}, 241 | url = {http://arxiv.org/abs/1905.10501}, 242 | eprinttype = {arXiv}, 243 | eprint = {1905.10501}, 244 | timestamp = {Mon, 03 Jun 2019 13:42:33 +0200}, 245 | biburl = {https://dblp.org/rec/journals/corr/abs-1905-10501.bib}, 246 | bibsource = {dblp computer science bibliography, https://dblp.org} 247 | } 248 | @inproceedings{DBLP:conf/iclr/HuangDSS19, 249 | author = {Daniel Huang and 250 | Prafulla Dhariwal and 251 | Dawn Song and 252 | Ilya Sutskever}, 253 | title = {GamePad: {A} Learning Environment for Theorem Proving}, 254 | booktitle = {7th International Conference on Learning Representations, {ICLR} 2019, 255 | New Orleans, LA, USA, May 6-9, 2019}, 256 | publisher = {OpenReview.net}, 257 | year = {2019}, 258 | url = {https://openreview.net/forum?id=r1xwKoR9Y7}, 259 | timestamp = {Thu, 25 Jul 2019 14:25:40 +0200}, 260 | biburl 
= {https://dblp.org/rec/conf/iclr/HuangDSS19.bib}, 261 | bibsource = {dblp computer science bibliography, https://dblp.org} 262 | } -------------------------------------------------------------------------------- /joss-paper/paper.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'gym-saturation: an OpenAI Gym environment for saturation provers' 3 | tags: 4 | - Python 5 | - OpenAI Gym 6 | - automated theorem prover 7 | - saturation prover 8 | - reinforcement learning 9 | authors: 10 | - name: Boris Shminke 11 | orcid: 0000-0002-1291-9896 12 | affiliation: 1 13 | affiliations: 14 | - name: Laboratoire J.A. Dieudonné, CNRS and Université Côte d'Azur, France 15 | index: 1 16 | date: 1 October 2021 17 | bibliography: paper.bib 18 | --- 19 | # Summary 20 | 21 | `gym-saturation` is an OpenAI Gym [@DBLP:journals/corr/BrockmanCPSSTZ16] environment for reinforcement learning (RL) agents capable of proving theorems. Currently, only theorems written in the formal language of the Thousands of Problems for Theorem Provers (TPTP) library [@Sut17] in clausal normal form (CNF) are supported. `gym-saturation` implements the 'given clause' algorithm (similar to the one used in Vampire [@DBLP:conf/cav/KovacsV13] and E Prover [@DBLP:conf/cade/0001CV19]). Being written in Python, `gym-saturation` was inspired by PyRes [@DBLP:conf/cade/0001P20]. In contrast to the monolithic architecture of a typical Automated Theorem Prover (ATP), `gym-saturation` gives different agents opportunities to select clauses themselves and train from their experience. Combined with a particular agent, `gym-saturation` can work as an ATP. Even with an untrained agent based on heuristics, `gym-saturation` can find refutations for 688 (of 8257) CNF problems from TPTP v7.5.0. 
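To make the 'given clause' loop concrete, here is a minimal sketch for ground (propositional) clauses with resolution as the only rule. It is an illustration under simplifying assumptions, not the `gym-saturation` implementation: the names `given_clause_loop`, `resolvents`, `age_agent` and `size_agent` are hypothetical, there is no unification, and the agent picks an index in the list of unprocessed clauses only.

```python
from typing import Callable

# Assumption of this sketch: a clause is a frozenset of literal strings,
# and a negated literal carries a "~" prefix.
Clause = frozenset


def negate(literal: str) -> str:
    """Return the complementary literal."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(first: Clause, second: Clause) -> set:
    """Return all propositional resolvents of two clauses."""
    return {
        (first - {literal}) | (second - {negate(literal)})
        for literal in first
        if negate(literal) in second
    }


def given_clause_loop(
    clauses: list, select: Callable[[list], int], step_limit: int = 100
) -> bool:
    """Return True if the empty clause is derived within ``step_limit`` steps."""
    unprocessed, processed = list(clauses), []
    for _ in range(step_limit):
        if not unprocessed:
            return False  # saturated without finding a refutation
        # the agent (``select``) chooses which clause becomes the given one
        given = unprocessed.pop(select(unprocessed))
        processed.append(given)
        for other in processed:
            for new_clause in resolvents(given, other):
                if not new_clause:
                    return True  # empty clause: refutation found
                if new_clause not in processed + unprocessed:
                    unprocessed.append(new_clause)
    return False


def age_agent(unprocessed: list) -> int:
    """Select the oldest clause (the list keeps arrival order)."""
    return 0


def size_agent(unprocessed: list) -> int:
    """Select the shortest clause."""
    return min(range(len(unprocessed)), key=lambda i: len(unprocessed[i]))


# a ground version of the syllogism about Socrates
socrates = [
    frozenset({"~man", "mortal"}),
    frozenset({"man"}),
    frozenset({"~mortal"}),
]
print(given_clause_loop(socrates, age_agent))
```

Swapping `age_agent` for `size_agent` changes only the selection policy; the loop itself stays the same, which is the separation between environment and agent that the package aims for.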
22 | 23 | # Statement of need 24 | 25 | Current applications of RL to saturation-based ATPs like Enigma [@DBLP:conf/cade/JakubuvCOP0U20] or Deepire [@DBLP:conf/cade/000121a] are similar in that the environment and the agent are not separate pieces of software but parts of larger systems that are hard to disentangle. The same is true for non-saturation-based RL-friendly provers (e.g. lazyCoP, @DBLP:conf/tableaux/RawsonR21). This monolithic approach hinders free experimentation with novel machine learning (ML) models and RL algorithms and creates unnecessary complications for ML and RL experts willing to contribute to the field. In contrast, for interactive theorem provers, projects like HOList [@DBLP:conf/icml/BansalLRSW19] or GamePad [@DBLP:conf/iclr/HuangDSS19] separate the concepts of environment and agent. Such a modular architecture may lead to the development of easily comparable agents based on diverse approaches (see, e.g. @DBLP:conf/aaai/PaliwalLRBS20 or @DBLP:journals/corr/abs-1905-10501). `gym-saturation` is an attempt to implement a modular environment-agent architecture of an RL-based ATP. In addition, some RL-empowered saturation ATPs are not accompanied by their source code [@9669114], while `gym-saturation` is open-source software. 26 | 27 | # Usage example 28 | 29 | Suppose we want to prove an extremely simple theorem with a very basic agent. 
We can do that in the following way: 30 | 31 | ```python 32 | # first we create and reset an OpenAI Gym environment 33 | from importlib.resources import files 34 | import gym 35 | 36 | env = gym.make( 37 | "gym_saturation:saturation-v0", 38 | # we will try to find a proof shorter than 10 steps 39 | step_limit=10, 40 | # for a classical syllogism about Socrates 41 | problem_list=[ 42 | files("gym_saturation").joinpath( 43 | "resources/TPTP-mock/Problems/TST/TST003-1.p" 44 | ) 45 | ], 46 | ) 47 | env.reset() 48 | # we can render the environment (that will become the beginning of the proof) 49 | print("starting hypotheses:") 50 | print(env.render("human")) 51 | # our 'age' agent will always select clauses for inference 52 | # in the order they appeared in current proof attempt 53 | action = 0 54 | done = False 55 | while not done: 56 | observation, reward, done, info = env.step(action) 57 | action += 1 58 | # SaturationEnv has an additional method 59 | # for extracting only clauses which became parts of the proof 60 | # (some steps were unnecessary to find the proof) 61 | print("refutation proof:") 62 | print(env.tstp_proof) 63 | print(f"number of attempted steps: {action}") 64 | ``` 65 | 66 | The output of this script includes a refutation proof found: 67 | 68 | ``` 69 | starting hypotheses: 70 | cnf(p_imp_q, hypothesis, ~man(X0) | mortal(X0)). 71 | cnf(p, hypothesis, man(socrates)). 72 | cnf(q, hypothesis, ~mortal(socrates)). 73 | refutation proof: 74 | cnf(_0, hypothesis, mortal(socrates), inference(resolution, [], [p_imp_q, p])). 75 | cnf(_2, hypothesis, $false, inference(resolution, [], [q, _0])). 
76 | number of attempted steps: 6 77 | ``` 78 | 79 | # Architecture 80 | 81 | `gym-saturation` includes several sub-packages: 82 | 83 | * parsing (happens during `env.reset()` in the example code snippet) 84 | * logic operations (happen during `env.step(action)` in the example) 85 | * OpenAI Gym environment implementation 86 | * agent testing (a slightly more elaborate version of the `while` loop from the example) 87 | 88 | `gym-saturation` relies on a deduction system of four rules which is known to be refutationally complete [@doi:10.1137/0204036]: 89 | 90 | \begin{align*} 91 | {\frac{C_1\vee A_1,C_2\vee\neg A_2}{\sigma\left(C_1\vee C_2\right)}},\sigma=mgu\left(A_1,A_2\right)\quad\text{(resolution)} 92 | \end{align*} 93 | \begin{align*} 94 | {\frac{C_1\vee s\approx t,C_2\vee L\left[r\right]}{\sigma\left(L\left[t\right]\vee C_1\vee C_2\right)}},\sigma=mgu\left(s,r\right)\quad\text{(paramodulation)} 95 | \end{align*} 96 | \begin{align*} 97 | {\frac{C\vee A_1\vee A_2}{\sigma\left(C\vee A_1\right)}},\sigma=mgu\left(A_1,A_2\right)\quad\text{(factoring)} 98 | \end{align*} 99 | \begin{align*} 100 | \frac{C\vee s\not\approx t}{\sigma\left(C\right)},\sigma=mgu\left(s,t\right)\quad\text{(reflexivity resolution)} 101 | \end{align*} 102 | 103 | where $C,C_1,C_2$ are clauses, $A_1,A_2$ are atomic formulae, $L$ is a literal, $r,s,t$ are terms, and $\sigma$ is a substitution (most general unifier). $L\left[t\right]$ is the result of replacing the term $r$ in $L\left[r\right]$ with the term $t$ at exactly one chosen position. 104 | 105 | For parsing, we use the LARK parser [@LARK]. We represent the clauses as Python classes forming tree-like structures. `gym-saturation` also includes a JSON serializer/deserializer for those trees. For example, a TPTP clause 106 | 107 | ``` 108 | cnf(a2,hypothesis, 109 | ( ~ q(a) | f(X) = X )). 
110 | ``` 111 | becomes 112 | 113 | ```python 114 | Clause( 115 | literals=[ 116 | Literal( 117 | negated=True, 118 | atom=Predicate( 119 | name="q", arguments=[Function(name="a", arguments=[])] 120 | ), 121 | ), 122 | Literal( 123 | negated=False, 124 | atom=Predicate( 125 | name="=", 126 | arguments=[ 127 | Function(name="f", arguments=[Variable(name="X")]), 128 | Variable(name="X"), 129 | ], 130 | ), 131 | ), 132 | ], 133 | label="a2", 134 | ) 135 | ``` 136 | 137 | This grammar serves as the glue for `gym-saturation` sub-packages, which are, in principle, independent of each other. After switching to another parser or another deduction system, the agent testing script won't break, and RL developers won't need to modify their agents for compatibility (for them, the environment will have the same standard OpenAI Gym API). 138 | 139 | ![A diagram showing interactions between four main subpackages of `gym-saturation`: 1) parsing; 2) logic operations (including the given clause algorithm); 3) OpenAI Gym Env implementation; 4) the agent testing script.\label{fig:architecture}](architecture.png) 140 | 141 | Agent testing is a simple episode pipeline (see \autoref{fig:architecture}). It is supposed to be run in parallel (e.g. using GNU Parallel, @tange_2021_5233953) for a testing subset of problems. See the following table for the testing results of two popular heuristic-based agents on TPTP v7.5.0 (trained RL agents should strive to be more successful than those primitive baselines): 142 | 143 | | | __size agent__ | __age agent__ | __size&age agent__ | 144 | |-|-|-|-| 145 | | __proof found__ | 509 | 206 | 688 | 146 | | __step limit__ | 1385 | 35 | 223 | 147 | | __out of memory__ | 148 | 149 | 148 | 148 | | __5 min time out__ | 6215 | 7867 | 7198 | 149 | | __total__ | 8257 | 8257 | 8257 | 150 | 151 | `size agent` is an agent which always selects the shortest clause. 
152 | 153 | `age agent` is an agent which always selects the clause which arrived first in the set of unprocessed clauses ('the oldest one'). 154 | 155 | `size&age agent` is an agent which selects the shortest clause five times in a row and then the oldest one once. 156 | 157 | 'Step limit' means an agent did not find a proof within 1000 steps (the longest proof found consists of 287 steps). This can work as a 'soft timeout'. 158 | 159 | # Mentions 160 | 161 | At the time of writing this paper, `gym-saturation` was used by its author during their PhD studies for creating experimental RL-based ATPs. 162 | 163 | # Acknowledgements 164 | 165 | This work has been supported by the French government, through the 3IA Côte d'Azur Investments in the Future project managed by the National Research Agency (ANR) with the reference number ANR-19-P3IA-0002. This work was performed using HPC resources from GENCI-IDRIS (Grant 2021-AD011013125). 166 | 167 | # References 168 | -------------------------------------------------------------------------------- /local-build.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -e 4 | PACKAGE_NAME=gym_saturation 5 | cd doc 6 | make clean html coverage 7 | cat _build/coverage/python.txt 8 | cd .. 9 | ruff check 10 | pyrefly check 11 | pydoclint ${PACKAGE_NAME} 12 | coverage run -m pytest 13 | coverage report --show-missing --fail-under=100 14 | pyroma -n 10 . 
15 | scc --no-cocomo --by-file -i py ${PACKAGE_NAME} 16 | -------------------------------------------------------------------------------- /poetry.toml: -------------------------------------------------------------------------------- 1 | [virtualenvs] 2 | in-project = true 3 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "gym-saturation" 3 | version = "1.0.3" 4 | description = "Gymnasium environments for saturation provers" 5 | authors = [{name = "Boris Shminke", email = ""}] 6 | readme = "README.rst" 7 | classifiers=[ 8 | "Programming Language :: Python :: 3.10", 9 | "Programming Language :: Python :: 3.11", 10 | "Programming Language :: Python :: 3.12", 11 | "Programming Language :: Python :: 3.13", 12 | "License :: OSI Approved :: Apache Software License", 13 | "Operating System :: OS Independent", 14 | "Intended Audience :: Science/Research", 15 | "Development Status :: 3 - Alpha", 16 | "Environment :: Console", 17 | "Natural Language :: English", 18 | "Topic :: Scientific/Engineering :: Artificial Intelligence", 19 | "Typing :: Typed" 20 | ] 21 | include = ["gym_saturation/py.typed"] 22 | keywords = ["saturation prover", "OpenAI Gym", "Gymnasium", "automated theorem prover"] 23 | requires-python = ">= 3.10.0, < 3.14" 24 | dependencies = ["gymnasium", "pexpect"] 25 | 26 | [project.urls] 27 | Documentation = "https://gym-saturation.readthedocs.io" 28 | Issues = "https://github.com/inpefess/gym-saturation/issues" 29 | Repository = "https://github.com/inpefess/gym-saturation.git" 30 | 31 | [project.optional-dependencies] 32 | dev = ["pre-commit", "tbump", "toml", "pyroma", "pydoclint", "pyrefly", "debugpy", 33 | "jedi-language-server", "isort", "ruff", "pyrefly"] 34 | test = ["coverage", "pytest"] 35 | doc = ["types-dataclasses", "sphinx-autodoc-typehints", "furo"] 36 | 37 | [tool.isort] 38 | profile = 
"black" 39 | src_paths = ["gym_saturation"] 40 | 41 | [build-system] 42 | requires = ["poetry-core>=1.0.3"] 43 | build-backend = "poetry.core.masonry.api" 44 | 45 | [tool.pytest.ini_options] 46 | minversion = "6.0" 47 | addopts = "--doctest-modules --junit-xml test-results/gym-saturation.xml" 48 | testpaths = ["gym_saturation"] 49 | doctest_optionflags = "NORMALIZE_WHITESPACE ELLIPSIS" 50 | 51 | [tool.tox] 52 | env_list = ["py310", "py311", "py312", "py313"] 53 | 54 | [tool.tox.env_run_base] 55 | deps = [ 56 | "coverage", 57 | "pytest", 58 | "pyrefly", 59 | "toml", 60 | "pyroma", 61 | "ruff", 62 | "pydoclint" 63 | ] 64 | commands = [ 65 | ["ruff", "format"], 66 | ["ruff", "check"], 67 | ["pyrefly", "check"], 68 | ["pydoclint", "gym_saturation"], 69 | ["coverage", "run", "-m", "pytest"], 70 | ["coverage", "xml", "--fail-under=100"], 71 | ["pyroma", "."] 72 | ] 73 | 74 | [tool.tbump] 75 | github_url = "https://github.com/inpefess/gym-saturation/" 76 | 77 | [tool.tbump.version] 78 | current = "1.0.3" 79 | regex = """ 80 | (?P<major>\\d+) 81 | \\. 82 | (?P<minor>\\d+) 83 | \\. 84 | (?P<patch>\\d+) 85 | """ 86 | 87 | [tool.tbump.git] 88 | message_template = "Bump to {new_version}" 89 | tag_template = "v{new_version}" 90 | 91 | [[tool.tbump.file]] 92 | src = "pyproject.toml" 93 | 94 | [tool.ruff] 95 | line-length = 79 96 | exclude = ["doc/example.py"] 97 | 98 | [tool.ruff.lint] 99 | select = ["F", "E", "W", "D", "S", "UP", "PL"] 100 | 101 | [tool.ruff.lint.pydocstyle] 102 | convention = "pep257" 103 | 104 | [tool.pydoclint] 105 | style = "sphinx" 106 | arg-type-hints-in-docstring = false 107 | check-return-types = false 108 | 109 | [tool.ruff.lint.pylint] 110 | max-statements = 10 111 | 112 | [tool.pyrefly] 113 | project-excludes = ["doc/example.py", ".*"] 114 | -------------------------------------------------------------------------------- /tableaux2023-paper/ast2vec.drawio.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
[SVG diagram; only its text labels survive: TPTP clause; gym-saturation environment; Observation Wrapper; JSON request; Docker container; TorchServe handler; ast2vec model; memcached server; process new statements; cache new embeddings or get previously cached; Python bool statement; JSON response; tensor observation; Deep RL agent; action]
-------------------------------------------------------------------------------- /tableaux2023-paper/gym-saturation.tex: -------------------------------------------------------------------------------- 1 | % Copyright 2021-2023 Boris Shminke 2 | % 3 | % This version of the contribution has been accepted for publication, 4 | % after peer review but is not the Version of Record and does not reflect 5 | % post-acceptance improvements, or any corrections. The Version of Record 6 | % is available online at: https://doi.org/10.1007/978-3-031-43513-3_11 7 | % 8 | % Licensed under the Apache License, Version 2.0 (the "License"); 9 | % you may not use this file except in compliance with the License. 10 | % You may obtain a copy of the License at 11 | % 12 | % https://www.apache.org/licenses/LICENSE-2.0 13 | % 14 | % Unless required by applicable law or agreed to in writing, software 15 | % distributed under the License is distributed on an "AS IS" BASIS, 16 | % WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 17 | % See the License for the specific language governing permissions and 18 | % limitations under the License. 19 | \documentclass[runningheads]{llncs} 20 | % 21 | \usepackage[T1]{fontenc} 22 | % T1 fonts will be used to generate the final print and online PDFs, 23 | % so please use T1 fonts in your manuscript whenever possible. 24 | % Other font encodings may result in incorrect characters. 25 | % 26 | \usepackage{graphicx} 27 | % Used for displaying a sample figure. If possible, figure files should 28 | % be included in EPS format. 
29 | % 30 | % If you use the hyperref package, please uncomment the following two lines 31 | % to display URLs in blue roman font according to Springer's eBook style: 32 | \usepackage{hyperref} 33 | \usepackage{color} 34 | \renewcommand\UrlFont{\color{blue}\rmfamily} 35 | % 36 | \usepackage{float} 37 | \usepackage{minted} 38 | \hyphenation{RLlib} 39 | \hyphenation{OpenAI} 40 | \begin{document} 41 | % 42 | \title{\texttt{gym-saturation}: Gymnasium environments for saturation provers (System description) \thanks{This work has been supported by the French government, through the 43 | 3IA Côte d’Azur Investment in the Future project managed by the National Research Agency (ANR) with the reference number ANR-19-P3IA-0002.}} 44 | % 45 | \titlerunning{\texttt{gym-saturation}: Gymnasium environments for saturation provers} 46 | % If the paper title is too long for the running head, you can set 47 | % an abbreviated paper title here 48 | % 49 | \author{Boris Shminke\orcidID{0000-0002-1291-9896}} 50 | % 51 | \authorrunning{B. Shminke} 52 | % First names are abbreviated in the running head. 53 | % If there are more than two authors, 'et al.' is used. 54 | % 55 | \institute{Université Côte d’Azur, CNRS, LJAD, France \\ 56 | \email{boris.shminke@univ-cotedazur.fr}} 57 | % 58 | \maketitle % typeset the header of the contribution 59 | % 60 | \begin{abstract} 61 | This work describes a new version of a previously published Python package --- \texttt{gym-saturation}: a collection of OpenAI Gym environments for guiding saturation-style provers based on the given clause algorithm with reinforcement learning. We contribute usage examples with two different provers: Vampire and iProver. We have also decoupled the proof state representation from reinforcement learning per se and provided examples of using a known \texttt{ast2vec} Python code embedding model as a first-order logic representation. 
In addition, we demonstrate how environment wrappers can transform a prover into a problem similar to a multi-armed bandit. We applied two reinforcement learning algorithms (Thompson sampling and Proximal policy optimisation) implemented in Ray RLlib to show the ease of experimentation with the new release of our package. 62 | 63 | \keywords{Automated theorem proving\and Reinforcement learning\and Saturation-style proving\and Machine learning} 64 | \end{abstract} 65 | \section{Introduction} 66 | This work describes a new version (\texttt{0.10.0}, released 2023.04.25) of a previously published~\cite{Shminke2022} Python package --- \texttt{gym-saturation}~\footnote{\url{https://pypi.org/project/gym-saturation/}}: a collection of OpenAI~Gym~\cite{DBLP:journals/corr/BrockmanCPSSTZ16} environments for guiding saturation-style provers (using the given clause algorithm) with reinforcement learning (RL) algorithms. The new version partly implements the ideas of our project proposal~\cite{https://doi.org/10.48550/arxiv.2209.02562}. The main changes from the previous release (\texttt{0.2.9}, on 2022.02.26) are: 67 | \begin{itemize} 68 | \item guiding two popular provers instead of a single experimental one (Section~\ref{section:implementation}) 69 | \item pluggable first-order logic formulae embeddings support (Section~\ref{section:representation-subsystem}) 70 | \item examples of experiments with different RL algorithms (Section~\ref{section:experiments}) 71 | \item following the updated Gymnasium~\cite{towers_gymnasium_2023} API instead of the outdated OpenAI Gym 72 | \end{itemize} 73 | 74 | \texttt{gym-saturation} works with Python 3.8+. One can install it by \texttt{pip install gym-saturation} or \texttt{conda install -c conda-forge gym-saturation}. 
Then, provided Vampire and/or iProver binaries are on \texttt{PATH}, one can use it as any other Gymnasium environment: 75 | \begin{minted}{python} 76 | import gymnasium 77 | 78 | import gym_saturation 79 | 80 | # v0 here is a version of the environment class, not the prover 81 | env = gymnasium.make("Vampire-v0") # or "iProver-v0" 82 | # edit and uncomment the following line to set a non-default problem 83 | # env.set_task("a-TPTP-problem-path") 84 | observation, info = env.reset() 85 | print("Starting proof state:") 86 | env.render() 87 | # truncation means finishing an episode in a non-terminal state 88 | # e.g. because of the externally imposed time limit 89 | terminated, truncated = False, False 90 | while not (terminated or truncated): 91 | # apply policy (e.g. a random available action) 92 | action = env.action_space.sample(mask=observation["action_mask"]) 93 | print("Given clause:", observation["real_obs"][action]) 94 | observation, reward, terminated, truncated, info = env.step(action) 95 | print("Final proof state:") 96 | env.render() 97 | env.close() 98 | \end{minted} 99 | \section{Related work}\label{section:related-work} 100 | Guiding provers with RL is a hot topic. Recent projects in this domain include TRAIL (Trial Reasoner for AI that Learns)~\cite{9669114}, FLoP (Finding Longer Proofs)~\cite{FLoP}, and lazyCoP~\cite{10.1007/978-3-030-86059-2_11}. We will now compare the new \texttt{gym-saturation} features with these three projects. 101 | 102 | Usually, one guides either a new prover created for that purpose (lazyCoP; FLoP builds on fCoP~\cite{fCoP}, an OCaml rewrite of older leanCoP~\cite{OTTEN2003139}) or an experimental patched version of an existing one (TRAIL relies on a modified E~\cite{10.1007/978-3-030-29436-6_29}). Contrary to that, \texttt{gym-saturation} works with unmodified stable versions of Vampire~\cite{10.1007/978-3-642-39799-8_1} and iProver~\cite{DBLP:conf/cade/DuarteK20}. 
103 | 104 | In addition, known RL-guiding projects are prover-dependent: FLoP could, in principle, work with both fCoP and leanCoP but reported only fCoP experiments. TRAIL claims to be reasoner-agnostic, but to the best of our knowledge, no one has tried it with anything but the patched E version it uses by default. The lazyCoP paper~\cite{10.1007/978-3-030-86059-2_11} mentions an anonymous reviewer's suggestion to create a standalone tool for other existing systems, but we are not aware of further development in this direction. In contrast, we have tested \texttt{gym-saturation} compatibility with two different provers (Vampire and iProver). 105 | 106 | Deep learning models expect their input to be real-valued tensors and not, for example, character strings in the TPTP~\cite{DBLP:journals/jar/Sutcliffe17} language. Thus, one always uses a \emph{representation} (or \emph{embeddings}), that is, a function mapping a (parsed) logic formula to a real vector. In lazyCoP and FLoP, parts of the embedding functions belong to the underlying provers, making them harder to vary and experiment with (e.g., one needs Rust or OCaml programming skills to do it). \texttt{gym-saturation} leaves the choice of representation open and supports any mapping from TPTP-formatted strings to real vectors. The version described in this work also provides a couple of default options. 107 | \section{Architecture and implementation details}\label{section:implementation} 108 | \subsection{Architecture} 109 | \texttt{gym-saturation} is compatible with Gymnasium~\cite{towers_gymnasium_2023}, a maintained fork of the now-outdated OpenAI Gym standard for RL environments, and passes all required environment checks. As a result of our migration to Gymnasium, its maintainers featured \texttt{gym-saturation} in a curated list of third-party environments~\footnote{\url{https://gymnasium.farama.org/environments/third_party_environments/}}.
110 | 111 | Previously, \texttt{gym-saturation} guided an experimental pure Python prover~\cite{Shminke2022}, which proved too slow and was abandoned in favour of existing highly efficient provers: Vampire and iProver. 112 | 113 | Although the \texttt{gym-saturation} user communicates with both iProver and Vampire in the same manner, under the hood, the two provers use different protocols. For Vampire, we relied on the so-called manual (interactive) clause selection mode implemented several years ago for an unrelated task~\cite{10.1007/978-3-030-34968-4_28}. In this mode, Vampire interrupts the saturation loop and listens to standard input for the number of the given clause instead of applying heuristics. Independently of this mode, Vampire writes (or not, depending on the option \texttt{show\_all}) newly inferred clauses to its standard output. Using the Python package \texttt{pexpect}, we attach to Vampire's standard input and output, pass the action chosen by the agent to the former and read observations from the latter. In manual clause selection mode, Vampire works like a server awaiting a request with an action to which it replies (exactly what an environment typically does). 114 | 115 | iProver recently added support for being guided by external agents. An agent has to be a TCP server satisfying a particular API specification. So, iProver behaves as a client which sends a request with observations to some server and awaits a reply containing an action. To make it work with \texttt{gym-saturation}, we implemented a \emph{relay server}. It accepts a long-running TCP connection from a running iProver thread, stores each request in a thread-safe queue, and answers it with a response popped from another such queue filled by the \texttt{gym-saturation} thread. See Figure~\ref{fig:iprover-gym} for a communication scheme.
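The two-queue handoff described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the \texttt{gym-saturation} relay server itself: the function names (\texttt{relay}, \texttt{fake\_prover}, \texttt{run\_demo}), the one-JSON-object-per-line message framing, and the message fields are assumptions made for illustration and do not reproduce the iProver API. A connected socket pair stands in for the long-running TCP connection.

```python
import json
import queue
import socket
import threading


def relay(conn, observations, actions):
    """Store each request in one queue and answer it from the other."""
    with conn, conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        for line in reader:  # assumed framing: one JSON object per line
            observations.put(json.loads(line))
            action = actions.get()  # blocks until the agent has decided
            writer.write(json.dumps({"action": action}) + "\n")
            writer.flush()


def fake_prover(conn, steps, replies):
    """Hypothetical stand-in for the prover: send a request, await the reply."""
    with conn, conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        for step in range(steps):
            writer.write(json.dumps({"clauses": [f"c_{step}"]}) + "\n")
            writer.flush()
            replies.append(json.loads(reader.readline())["action"])


def run_demo(steps=3):
    """Play the agent's role: read observations, push actions."""
    # A connected socket pair stands in for the TCP connection.
    prover_end, relay_end = socket.socketpair()
    observations, actions = queue.Queue(), queue.Queue()
    replies = []
    threads = [
        threading.Thread(
            target=relay, args=(relay_end, observations, actions), daemon=True
        ),
        threading.Thread(
            target=fake_prover, args=(prover_end, steps, replies), daemon=True
        ),
    ]
    for thread in threads:
        thread.start()
    seen = []
    for step in range(steps):
        seen.append(observations.get(timeout=5)["clauses"][0])
        actions.put(step)  # a real agent would choose a clause index here
    for thread in threads:
        thread.join(timeout=5)
    return seen, replies
```

Calling \texttt{run\_demo()} plays the role of the environment's main thread: it only ever touches the two queues, so the socket handling stays confined to the relay thread. Blocking on \texttt{actions.get()} is what turns the client-style prover into something that behaves like a step-by-step environment.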
116 | \begin{figure} 117 | \includegraphics[width=\textwidth]{iprover-gym} 118 | \caption{\texttt{gym-saturation} interacting with iProver}\label{fig:iprover-gym} 119 | \end{figure} 120 | \subsection{Implementation details} 121 | \subsubsection{Clause class} 122 | A clause is a Python data class having the following keys and respective values: 123 | 124 | \begin{itemize} 125 | \item \texttt{literals} --- a string of clause literals in the TPTP format, e.g. \texttt{'member(X0,bb) | ~member(X0,b)'} 126 | \item \texttt{label} --- a string label of a clause, e.g. `21'. Some provers (e.g. Vampire) use integer numbers for labelling clauses, but others (e.g. iProver) use an alphanumeric mixture (e.g. `c\_54') 127 | \item \texttt{role} --- a string description of a clause role in a proof (hypothesis, negated conjecture, axiom, et cetera) 128 | \item \texttt{inference\_rule} --- a string name of an inference rule used to produce the clause. It includes not only resolution and superposition but also values like `axiom' and `input' (for theorem assumptions) 129 | \item \texttt{inference\_parents} --- a tuple of clause labels if needed by the inference rule (`axiom' doesn't need any, `factoring' expects only one, `resolution' --- two, et cetera) 130 | \item \texttt{birth\_step} --- an integer step number when the clause appeared in the proof state. Axioms, assumptions, and the negated conjecture have birth step zero. 131 | \end{itemize} 132 | 133 | All these fields except the \texttt{birth\_step} (computed by the environment itself) are already available as separate entities (and not parts of TPTP-formatted strings) in iProver and Vampire output. 134 | \subsubsection{Environment class} 135 | \paragraph{Observation} is a Python dictionary with several keys: 136 | \begin{itemize} 137 | \item \texttt{real\_obs} is a tuple of all clauses (processed and unprocessed). 
It can be transformed to a tensor representation by so-called observation wrappers~\footnote{\url{https://gymnasium.farama.org/api/wrappers/observation_wrappers/}}. \texttt{gym-saturation} provides several such wrappers for use with an external embeddings service or a hand-coded feature extraction function 138 | \item \texttt{action\_mask} is a numpy~\cite{harris2020array} array of size \texttt{max\_clauses} (a parameter which one can set when instantiating the environment object) having a value $1.0$ at index $i$ if and only if a clause with a zero-based order number $i$ currently exists and can be a given clause (e.g. not eliminated as redundant). All other values of \texttt{action\_mask} are zeros. This array simplifies tensor operations on observation representations. 139 | \end{itemize} 140 | Limiting the total number of clauses in a proof state is a proxy for both the random-access memory limit (each clause needs storage space) and the time limit (a prover has to process each clause encountered) typical for the CASC~\cite{DBLP:journals/aicom/Sutcliffe21} competition. One can add a standard Gymnasium time-limit wrapper to limit the number of steps in an episode. Setting wall-clock time and RAM limits is not typical for RL research. 141 | \paragraph{Action} is a zero-based order number of a clause from \texttt{real\_obs}. If the respective \texttt{action\_mask} value is zero, the environment throws an exception during the execution of the \texttt{step} method. \paragraph{Reward} is $1.0$ after a step if we found the refutation at this step and $0.0$ otherwise. One can change this behaviour either with Gymnasium reward wrappers or by collecting trajectories in a local buffer and postprocessing them before feeding the trainer. \paragraph{Episode is terminated} when an empty clause \texttt{\$false} appears in the proof state or if there are no more available actions. \paragraph{Episode is truncated} when there are more than \texttt{max\_clauses} clauses in the proof state. 
Since the state is an (extendable) tuple, we don't raise an exception when a prover generates a few more clauses. \paragraph{Info} dictionary is always empty at every step by default. \paragraph{Render modes} of the environment include two standard ones (\texttt{'human'} and \texttt{'ansi'}), the first one printing and the second one returning the same TPTP formatted string. 142 | \subsubsection{Multi-task environment} 143 | The latest \texttt{gym-saturation} follows a Meta-World benchmark~\cite{pmlr-v100-yu20a} style and defines \texttt{set\_task} method with one argument --- a TPTP problem full path. If one resets an environment without explicitly setting a task in advance, the environment defaults to a simple group theory problem (any idempotent element equals the identity). Having a default task helps us keep compatibility with algorithms not aware of multi-task RL. One can inherit from \texttt{gym-saturation} environment classes to set a random problem at every reset or implement any other desirable behaviour. 144 | \section{Representation subsystem}\label{section:representation-subsystem} 145 | \subsection{Existing first-order formulae representations and related projects} 146 | As mentioned in Section~\ref{section:related-work}, to apply any deep reinforcement learning algorithm, one needs a representation of the environment state in a tensor form first. There are many known feature engineering procedures. It can be as simple as clause age and weight~\cite{10.1007/978-3-030-29436-6_27}, or information extracted from a clause syntax tree~\cite{mockju-ecai20} or an inference lineage of a clause~\cite{10.1007/978-3-030-79876-5_31}. Representing logic formulae as such is an active research domain: for example, in~\cite{VectorRepresentations}, the authors proposed more than a dozen different embedding techniques based on formulae syntax. 
In communities other than automated deduction, researchers also study first-order formulae representation: for example, in~\cite{10.1007/978-3-031-21203-1_22}, the authors represent semantics rather than syntax. One can also notice that first-order logic (FOL) is nothing more than a formal language, so abstract syntax trees of FOL are not, in principle, that different from those of programming language statements. And of course, encoding models for programming languages (like \texttt{code2vec}~\cite{alon2019code2vec} for Java) exist, as well as commercially available solutions such as GPT-3~\cite{10.5555/3495724.3495883} generic code embeddings and comparable free models like LLaMA~\cite{DBLP:journals/corr/abs-2302-13971}. 147 | 148 | To take a first step in this direction, we took advantage of existing pre-trained embedding models for programming languages and tried to apply them to the seemingly disconnected domain of automated provers. 149 | \subsection{\texttt{ast2vec} and our contributions to it} 150 | In~\cite{Paassen2022}, the authors proposed a particular neural network architecture they called \emph{Recursive Tree Grammar Autoencoders (RTG-AE)}, which encodes abstract syntax trees produced by a programming language parser into real vectors. Being interested in education applications, they also published the pre-trained model for Python~\cite{Paassen_McBroom_Jeffries_Koprinska_Yacef_2021}. 
To make use of it for our purpose, we made several technical improvements to their code (our contribution is freely available~\footnote{\url{https://gitlab.com/inpefess/ast2vec}}): 151 | \begin{itemize} 152 | \item a TorchServe~\cite{torchserve} handler for HTTP POST requests for embeddings 153 | \item request caching with the Memcached server~\cite{memcached} 154 | \item a Docker container to start the whole subsystem easily on any operating system 155 | \end{itemize} 156 | 157 | \begin{figure}[H] 158 | \includegraphics[width=\textwidth]{ast2vec} 159 | \caption{gym-saturation communication with ast2vec} \label{fig:ast2vec} 160 | \end{figure} 161 | 162 | To integrate the \texttt{ast2vec} server with \texttt{gym-saturation} environments, we added Gymnasium observation wrappers, one of them mapping a clause in the TPTP language to a boolean-valued expression in Python (in particular, by replacing logic operation symbols, e.g. \texttt{=} in TPTP becomes \texttt{==} in Python). See Figure~\ref{fig:ast2vec} for a communication diagram. In principle, since a clause doesn't contain any quantifiers explicitly, one can rewrite it as a boolean-valued expression in many programming languages for which pre-trained embeddings might exist. 163 | 164 | \subsection{Latency considerations}\label{subsection:latency-considerations} 165 | Looking at Figure~\ref{fig:ast2vec}, one might wonder how efficient such an architecture is. The average response time observed in our experiments was $2ms$ (with a $150ms$ maximum). A typical natural language processing model which embeds whole texts has a latency from $40ms$ to more than $600ms$~\cite{nvidia-blog} (depending on the model complexity and the length of a text to embed) when run on CPU, so \texttt{ast2vec} is not slow by comparison. When evaluating a prover, one usually fixes the time limit: for example, $60s$ is the default value for Vampire. 
Written in C++ and heavily optimised, Vampire can generate around a million clauses within this relatively short time frame. Thus, to be on par with Vampire, a representation service must have a latency of around $60\mu s$ ($60s$ divided by $10^6$ clauses), orders of magnitude lower than what we have. There are several ways to lower the latency: 166 | \begin{itemize} 167 | \item inference in batches (one should train the embedding model to do it; \texttt{ast2vec} doesn't do it out of the box). The improvement may vary 168 | \item use GPU. NVIDIA reports around 20x improvement vs CPU~\cite{nlu-with-tensorrt-bert}. However, adding more GPUs won't be as efficient without batch inference from the previous point 169 | \item request an embedding for a binary object of an already parsed clause instead of a TPTP string. It means not repeating parsing already done by a prover, which might lower the latency substantially. To do this, one will have to patch an underlying prover to return binary objects instead of TPTP strings 170 | \item use RPC (remote procedure call) instead of the REST protocol. TorchServe relies on REST with JSON-formatted messages, whereas gRPC~\cite{grpc} prefers the binary \texttt{protobuf} format. One rarely expects sub-millisecond latency from REST, although for RPC, $150\mu s$ is not unusual. This point doesn't make much sense without the previous one 171 | \end{itemize} 172 | 173 | \section{Usage examples}\label{section:experiments} 174 | We provide examples of experiments easily possible with \texttt{gym-saturation} as supplementary code to this paper~\footnote{\url{https://github.com/inpefess/ray-prover/releases/tag/v0.0.3}}. We do not claim any scientific significance for these experiments per se; they serve merely as illustrations and basic usage examples. Tweaking the RL algorithms' meta-parameters and deep neural network architectures is out of the scope of the present system description. 
175 | 176 | We coded these experiments in the Ray framework, which includes RLlib, a library of popular RL algorithms. Ray is compatible with the Tensorflow~\cite{tensorflow2015-whitepaper} and PyTorch~\cite{NEURIPS2019_bdbca288} deep learning frameworks, so it doesn't tie a potential \texttt{gym-saturation} user to one of them. 177 | 178 | In the experiments, we try to solve \texttt{SET001-1} from the TPTP with \texttt{max\_clauses=20} (having no more than twenty clauses in the proof state) for guiding Vampire and \texttt{max\_clauses=15} for iProver. This difference is because even a random agent communicating with iProver always manages to solve \texttt{SET001-1} by generating no more than twenty clauses. We wanted a task that a random agent does not always solve, so that there is something to learn, while keeping the examples as simple as possible, so we chose to tighten the constraints instead of moving on to a more complicated problem. 179 | 180 | In one experiment, we organise clauses in two priority queues (by age and weight) and use an action wrapper to map from a queue number ($0$ or $1$) to the clause number. That means we don't implant these queues inside provers but follow a Gymnasium idiomatic way to extend environments. Of course, Vampire and iProver have these particular queues as part of their implementation, but our illustration shows one could use any other priorities instead. This wrapper transforms our environment into a semblance of a 2-armed bandit, and we use Thompson sampling~\cite{pmlr-v28-agrawal13} to train. This experiment reflects ideas similar to those described in~\cite{10.1007/978-3-031-10769-6_38}. 181 | 182 | In another experiment, we use the \texttt{ast2vec} server for getting clause embeddings and train a policy with the Proximal Policy Optimisation (PPO) algorithm as implemented in Ray RLlib. The default policy network there is a fully connected one, and we used $256\times20$ tensors as its input ($256$ is an embedding size in \texttt{ast2vec}, and $20$ is the maximal number of clauses we embed). 
So, the policy selects the given clause based on the embeddings of all clauses seen up to the current step (including those already chosen or judged to be redundant/subsumed). Such an approach is closer to that of~\cite{FLoP}. 183 | 184 | \begin{figure}[H] 185 | \includegraphics[width=\textwidth]{mean-reward} 186 | \caption{Episode reward mean vs the total number of steps. The blue line is for a random agent and the orange one is for PPO. Both agents guide Vampire}\label{fig:mean-reward} 187 | \end{figure} 188 | 189 | We provide Figure~\ref{fig:mean-reward} as a typical training process chart. 190 | 191 | \section{Conclusion and future work} 192 | We contributed a new version of \texttt{gym-saturation}, which continues to be free and open-source software, easy to install and use while promising assistance in setting up experiments for RL research in the automated provers domain. In the new version, we enabled anyone interested to conduct experiments with RL algorithms independently of an underlying prover implementation. We also added the possibility of varying representations as external plug-ins for further experimentation. We hope that researchers having such an instrument can focus on more advanced questions, namely how to generate and prioritise training problems to better transfer search patterns learned on simpler theorems to harder ones. 193 | 194 | Our experience with adding Vampire and iProver support to \texttt{gym-saturation} shows that working closely with the corresponding prover developers is not mandatory, although it might help immensely. Implementing the prover guidance through the standard I/O (as in Vampire) seems to be relatively easy, and we hope more provers will add similar functionality in future to be more ML-friendly. Such provers could then profit from using any other external guidance (see~\cite{LPAR2023:Guiding_an_Instantiation_Prover} for a different system using the same iProver technical features as we did).
195 | 196 | We identify a discerning and computationally efficient representation service as a bottleneck for our approach and envision an upcoming project of creating a universal first-order logic embedding model usable not only by saturation-style provers but also tableaux-based ones, SMT-solvers, semantic reasoners, and beyond. 197 | \subsubsection{Acknowledgements} We would like to thank Konstantin Korovin for the productive discussion and for adding the external agents' communication feature to iProver, without which this work would not have been possible. We also thank the anonymous reviewers for their meticulous suggestions on improving the present paper. 198 | 199 | % 200 | % ---- Bibliography ---- 201 | % 202 | % BibTeX users should specify bibliography style 'splncs04'. 203 | % References will then be sorted and formatted in the correct style. 204 | % 205 | \bibliographystyle{splncs04} 206 | \bibliography{gym-saturation} 207 | % 208 | \end{document} 209 | -------------------------------------------------------------------------------- /tableaux2023-paper/iprover-gym.drawio.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
[SVG diagram: gym-saturation interacting with iProver. Nodes: iProver thread; Relay Server thread; gym-saturation (main thread); observations queue; actions queue. Edge labels: TCP request (observation); TCP response (action); get observation from queue; put action into queue.]
--------------------------------------------------------------------------------