├── README.md
├── gen_acts.py
├── leace.ipynb
├── patching.py
├── prompted-lying
│   ├── filter_data.py
│   ├── save-results-70b.py
│   ├── table-results-70b.py
│   ├── test-and-graph-13b.py
│   ├── test-and-graph-70b.py
│   └── test-and-graph-7b.py
├── requirements.txt
├── utils
│   ├── __pycache__
│   │   ├── torch_hooks_utils.cpython-310.pyc
│   │   └── torch_hooks_utils.cpython-38.pyc
│   ├── analytics_utils.py
│   ├── interp_utils.py
│   ├── new_probing_utils.py
│   └── torch_hooks_utils.py
└── workshop_paper_graphs.ipynb

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/README.md
--------------------------------------------------------------------------------
/gen_acts.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/gen_acts.py
--------------------------------------------------------------------------------
/leace.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/leace.ipynb
--------------------------------------------------------------------------------
/patching.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/patching.py
--------------------------------------------------------------------------------
/prompted-lying/filter_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/prompted-lying/filter_data.py
--------------------------------------------------------------------------------
/prompted-lying/save-results-70b.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/prompted-lying/save-results-70b.py
--------------------------------------------------------------------------------
/prompted-lying/table-results-70b.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/prompted-lying/table-results-70b.py
--------------------------------------------------------------------------------
/prompted-lying/test-and-graph-13b.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/prompted-lying/test-and-graph-13b.py
--------------------------------------------------------------------------------
/prompted-lying/test-and-graph-70b.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/prompted-lying/test-and-graph-70b.py
--------------------------------------------------------------------------------
/prompted-lying/test-and-graph-7b.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/prompted-lying/test-and-graph-7b.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/requirements.txt
--------------------------------------------------------------------------------
/utils/__pycache__/torch_hooks_utils.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/utils/__pycache__/torch_hooks_utils.cpython-310.pyc
--------------------------------------------------------------------------------
/utils/__pycache__/torch_hooks_utils.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/utils/__pycache__/torch_hooks_utils.cpython-38.pyc
--------------------------------------------------------------------------------
/utils/analytics_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/utils/analytics_utils.py
--------------------------------------------------------------------------------
/utils/interp_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/utils/interp_utils.py
--------------------------------------------------------------------------------
/utils/new_probing_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/utils/new_probing_utils.py
--------------------------------------------------------------------------------
/utils/torch_hooks_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/utils/torch_hooks_utils.py
--------------------------------------------------------------------------------
/workshop_paper_graphs.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jam3scampbell/llama-lying/HEAD/workshop_paper_graphs.ipynb
--------------------------------------------------------------------------------