├── .gitignore
├── README.md
├── assets
    └── website-safety-details-fixed.svg
├── demo.ipynb
├── examples
    ├── Reinforcement Controls.ipynb
    ├── Supervised Controls.ipynb
    └── data
    │   ├── Ethical_override_data.csv
    │   ├── filtered_known_1000_with_output.csv
    │   ├── rc_post_edit_harmful_benchmark.csv
    │   ├── rc_pre_edit_harmful_benchmark.csv
    │   └── transfer_expriment_behaviors.csv
├── selfie
    ├── __init__.py
    ├── generate_wrappers.py
    ├── interpret.py
    └── llama_forward_wrappers.py
└── setup.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/.gitignore
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/README.md
--------------------------------------------------------------------------------
/assets/website-safety-details-fixed.svg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/assets/website-safety-details-fixed.svg
--------------------------------------------------------------------------------
/demo.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/demo.ipynb
--------------------------------------------------------------------------------
/examples/Reinforcement Controls.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/examples/Reinforcement Controls.ipynb
--------------------------------------------------------------------------------
/examples/Supervised Controls.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/examples/Supervised Controls.ipynb
--------------------------------------------------------------------------------
/examples/data/Ethical_override_data.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/examples/data/Ethical_override_data.csv
--------------------------------------------------------------------------------
/examples/data/filtered_known_1000_with_output.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/examples/data/filtered_known_1000_with_output.csv
--------------------------------------------------------------------------------
/examples/data/rc_post_edit_harmful_benchmark.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/examples/data/rc_post_edit_harmful_benchmark.csv
--------------------------------------------------------------------------------
/examples/data/rc_pre_edit_harmful_benchmark.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/examples/data/rc_pre_edit_harmful_benchmark.csv
--------------------------------------------------------------------------------
/examples/data/transfer_expriment_behaviors.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/examples/data/transfer_expriment_behaviors.csv
--------------------------------------------------------------------------------
/selfie/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/selfie/__init__.py
--------------------------------------------------------------------------------
/selfie/generate_wrappers.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/selfie/generate_wrappers.py
--------------------------------------------------------------------------------
/selfie/interpret.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/selfie/interpret.py
--------------------------------------------------------------------------------
/selfie/llama_forward_wrappers.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/selfie/llama_forward_wrappers.py
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tonychenxyz/selfie/HEAD/setup.py
--------------------------------------------------------------------------------