├── LICENSE
├── README.md
├── data.tgz
├── figures
│   └── ravel-overview.svg
├── requirements.txt
├── scripts
│   ├── demos
│   │   ├── README.md
│   │   └── demo_run_benchmark_gemma_2_2b.ipynb
│   ├── train_intervention.py
│   ├── train_probe.py
│   └── train_sae.py
└── src
    ├── __init__.py
    ├── methods
    │   ├── README.md
    │   ├── differential_binary_masking.py
    │   ├── distributed_alignment_search.py
    │   ├── linear_adversarial_probe.py
    │   ├── pca.py
    │   ├── select_features.py
    │   └── sparse_autoencoder.py
    └── utils
        ├── dataset_utils.py
        ├── generate_ravel_instance.py
        ├── generation_utils.py
        ├── intervention_utils.py
        └── metric_utils.py


/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/LICENSE

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/README.md

--------------------------------------------------------------------------------
/data.tgz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/data.tgz

--------------------------------------------------------------------------------
/figures/ravel-overview.svg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/figures/ravel-overview.svg

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/requirements.txt

--------------------------------------------------------------------------------
/scripts/demos/README.md:
--------------------------------------------------------------------------------
# A collection of demo notebooks.

--------------------------------------------------------------------------------
/scripts/demos/demo_run_benchmark_gemma_2_2b.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/scripts/demos/demo_run_benchmark_gemma_2_2b.ipynb

--------------------------------------------------------------------------------
/scripts/train_intervention.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/scripts/train_intervention.py

--------------------------------------------------------------------------------
/scripts/train_probe.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/scripts/train_probe.py

--------------------------------------------------------------------------------
/scripts/train_sae.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/scripts/train_sae.py

--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/methods/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/methods/README.md

--------------------------------------------------------------------------------
/src/methods/differential_binary_masking.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/methods/differential_binary_masking.py

--------------------------------------------------------------------------------
/src/methods/distributed_alignment_search.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/methods/distributed_alignment_search.py

--------------------------------------------------------------------------------
/src/methods/linear_adversarial_probe.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/methods/linear_adversarial_probe.py

--------------------------------------------------------------------------------
/src/methods/pca.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/methods/pca.py

--------------------------------------------------------------------------------
/src/methods/select_features.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/methods/select_features.py

--------------------------------------------------------------------------------
/src/methods/sparse_autoencoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/methods/sparse_autoencoder.py

--------------------------------------------------------------------------------
/src/utils/dataset_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/utils/dataset_utils.py

--------------------------------------------------------------------------------
/src/utils/generate_ravel_instance.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/utils/generate_ravel_instance.py

--------------------------------------------------------------------------------
/src/utils/generation_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/utils/generation_utils.py

--------------------------------------------------------------------------------
/src/utils/intervention_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/utils/intervention_utils.py

--------------------------------------------------------------------------------
/src/utils/metric_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/explanare/ravel/HEAD/src/utils/metric_utils.py

--------------------------------------------------------------------------------