├── .gitignore
├── LICENSE
├── README.md
├── data
    ├── beavertails_with_refusals_train.json
    ├── decoding_trust
    │   └── training_dataset.jsonl
    └── decoding_trust_with_refusals_train.json
├── notebooks
    └── repnoise_demo.ipynb
├── poetry.lock
├── pyproject.toml
├── representation_noising
    ├── datasets.py
    ├── evaluation.py
    └── loss.py
├── requirements.txt
└── scripts
    ├── generate_paired_refusals.py
    └── generate_paired_refusals.sh

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/README.md
--------------------------------------------------------------------------------
/data/beavertails_with_refusals_train.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/data/beavertails_with_refusals_train.json
--------------------------------------------------------------------------------
/data/decoding_trust/training_dataset.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/data/decoding_trust/training_dataset.jsonl
--------------------------------------------------------------------------------
/data/decoding_trust_with_refusals_train.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/data/decoding_trust_with_refusals_train.json
--------------------------------------------------------------------------------
/notebooks/repnoise_demo.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/notebooks/repnoise_demo.ipynb
--------------------------------------------------------------------------------
/poetry.lock:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/poetry.lock
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/pyproject.toml
--------------------------------------------------------------------------------
/representation_noising/datasets.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/representation_noising/datasets.py
--------------------------------------------------------------------------------
/representation_noising/evaluation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/representation_noising/evaluation.py
--------------------------------------------------------------------------------
/representation_noising/loss.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/representation_noising/loss.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/requirements.txt
--------------------------------------------------------------------------------
/scripts/generate_paired_refusals.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/scripts/generate_paired_refusals.py
--------------------------------------------------------------------------------
/scripts/generate_paired_refusals.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/domenicrosati/representation-noising/HEAD/scripts/generate_paired_refusals.sh
--------------------------------------------------------------------------------