├── LICENES
├── README.md
├── assets
    └── overview.png
├── requirements.txt
└── src
    ├── compute_cka.py
    ├── datasets
        ├── confaide.csv
        ├── pku-rlhf-10k.csv
        ├── stereoset.csv
        ├── toxigen.csv
        └── truthfulqa.csv
    ├── generate_activations.py
    ├── generate_head_activations.py
    ├── ics.py
    ├── logits.py
    ├── model.py
    ├── pcs.py
    ├── plot.ipynb
    ├── scripts
        ├── save_activation.sh
        └── save_logits.sh
    ├── train_cls.py
    ├── train_cls_gcn.py
    ├── transfer_cls.py
    └── utils.py


/LICENES:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/LICENES
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/README.md
--------------------------------------------------------------------------------
/assets/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/assets/overview.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/requirements.txt
--------------------------------------------------------------------------------
/src/compute_cka.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/compute_cka.py
--------------------------------------------------------------------------------
/src/datasets/confaide.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/datasets/confaide.csv
--------------------------------------------------------------------------------
/src/datasets/pku-rlhf-10k.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/datasets/pku-rlhf-10k.csv
--------------------------------------------------------------------------------
/src/datasets/stereoset.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/datasets/stereoset.csv
--------------------------------------------------------------------------------
/src/datasets/toxigen.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/datasets/toxigen.csv
--------------------------------------------------------------------------------
/src/datasets/truthfulqa.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/datasets/truthfulqa.csv
--------------------------------------------------------------------------------
/src/generate_activations.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/generate_activations.py
--------------------------------------------------------------------------------
/src/generate_head_activations.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/generate_head_activations.py
--------------------------------------------------------------------------------
/src/ics.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/ics.py
--------------------------------------------------------------------------------
/src/logits.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/logits.py
--------------------------------------------------------------------------------
/src/model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/model.py
--------------------------------------------------------------------------------
/src/pcs.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/pcs.py
--------------------------------------------------------------------------------
/src/plot.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/plot.ipynb
--------------------------------------------------------------------------------
/src/scripts/save_activation.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/scripts/save_activation.sh
--------------------------------------------------------------------------------
/src/scripts/save_logits.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/scripts/save_logits.sh
--------------------------------------------------------------------------------
/src/train_cls.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/train_cls.py
--------------------------------------------------------------------------------
/src/train_cls_gcn.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/train_cls_gcn.py
--------------------------------------------------------------------------------
/src/transfer_cls.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/transfer_cls.py
--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/REEF/HEAD/src/utils.py
--------------------------------------------------------------------------------