├── .gitignore
├── LICENSE
├── README.md
├── activation_steering
    ├── __init__.py
    ├── config.py
    ├── console.py
    ├── leash_layer.py
    ├── malleable_model.py
    ├── steering_dataset.py
    ├── steering_vector.py
    └── utils.py
├── docs
    ├── config.md
    ├── console.md
    ├── demo-data
    │   ├── alpaca.json
    │   ├── behavior_refusal.json
    │   ├── condition_harmful.json
    │   └── condition_multiple.json
    ├── faq.md
    ├── leash_layer.md
    ├── malleable_model.md
    ├── quickstart.md
    ├── steering_dataset.md
    ├── steering_vector.md
    └── utils.md
├── poetry.lock
└── pyproject.toml


/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/.gitignore
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/LICENSE
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/README.md
--------------------------------------------------------------------------------

/activation_steering/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/activation_steering/__init__.py
--------------------------------------------------------------------------------

/activation_steering/config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/activation_steering/config.py
--------------------------------------------------------------------------------

/activation_steering/console.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/activation_steering/console.py
--------------------------------------------------------------------------------

/activation_steering/leash_layer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/activation_steering/leash_layer.py
--------------------------------------------------------------------------------

/activation_steering/malleable_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/activation_steering/malleable_model.py
--------------------------------------------------------------------------------

/activation_steering/steering_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/activation_steering/steering_dataset.py
--------------------------------------------------------------------------------

/activation_steering/steering_vector.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/activation_steering/steering_vector.py
--------------------------------------------------------------------------------

/activation_steering/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/activation_steering/utils.py
--------------------------------------------------------------------------------

/docs/config.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/config.md
--------------------------------------------------------------------------------

/docs/console.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/console.md
--------------------------------------------------------------------------------

/docs/demo-data/alpaca.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/demo-data/alpaca.json
--------------------------------------------------------------------------------

/docs/demo-data/behavior_refusal.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/demo-data/behavior_refusal.json
--------------------------------------------------------------------------------

/docs/demo-data/condition_harmful.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/demo-data/condition_harmful.json
--------------------------------------------------------------------------------

/docs/demo-data/condition_multiple.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/demo-data/condition_multiple.json
--------------------------------------------------------------------------------

/docs/faq.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/faq.md
--------------------------------------------------------------------------------

/docs/leash_layer.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/leash_layer.md
--------------------------------------------------------------------------------

/docs/malleable_model.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/malleable_model.md
--------------------------------------------------------------------------------

/docs/quickstart.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/quickstart.md
--------------------------------------------------------------------------------

/docs/steering_dataset.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/steering_dataset.md
--------------------------------------------------------------------------------

/docs/steering_vector.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/steering_vector.md
--------------------------------------------------------------------------------

/docs/utils.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/docs/utils.md
--------------------------------------------------------------------------------

/poetry.lock:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/poetry.lock
--------------------------------------------------------------------------------

/pyproject.toml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/activation-steering/HEAD/pyproject.toml
--------------------------------------------------------------------------------