├── .gitignore
├── LICENSE
├── README.md
├── examples
│   ├── basic_example.ipynb
│   ├── basic_example.py
│   ├── data
│   │   ├── langchain_data_tool_use_FULL.csv
│   │   └── livebench_13012025.parquet
│   └── plots
│       ├── indep_comparisons.png
│       ├── indep_intervals.png
│       └── paired_comparisons.png
├── setup.py
├── src
│   └── bayes_evals
│       ├── __init__.py
│       └── bayes_evals.py
└── tests
    └── utils.py


/.gitignore:
--------------------------------------------------------------------------------
__pycache__/
*.egg-info/
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/LICENSE
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/README.md
--------------------------------------------------------------------------------

/examples/basic_example.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/examples/basic_example.ipynb
--------------------------------------------------------------------------------

/examples/basic_example.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/examples/basic_example.py
--------------------------------------------------------------------------------

/examples/data/langchain_data_tool_use_FULL.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/examples/data/langchain_data_tool_use_FULL.csv
--------------------------------------------------------------------------------

/examples/data/livebench_13012025.parquet:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/examples/data/livebench_13012025.parquet
--------------------------------------------------------------------------------

/examples/plots/indep_comparisons.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/examples/plots/indep_comparisons.png
--------------------------------------------------------------------------------

/examples/plots/indep_intervals.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/examples/plots/indep_intervals.png
--------------------------------------------------------------------------------

/examples/plots/paired_comparisons.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/examples/plots/paired_comparisons.png
--------------------------------------------------------------------------------

/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/setup.py
--------------------------------------------------------------------------------

/src/bayes_evals/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/src/bayes_evals/__init__.py
--------------------------------------------------------------------------------

/src/bayes_evals/bayes_evals.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/src/bayes_evals/bayes_evals.py
--------------------------------------------------------------------------------

/tests/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sambowyer/bayes_evals/HEAD/tests/utils.py
--------------------------------------------------------------------------------