├── .gitignore ├── Datasets.py ├── README.md ├── configs ├── dialog.json ├── dialog_dp │ ├── dp_0.0.json │ ├── dp_0.05.json │ ├── dp_0.1.json │ ├── dp_0.2.json │ ├── dp_0.3.json │ ├── dp_0.4.json │ ├── dp_0.5.json │ ├── dp_0.6.json │ ├── dp_0.7.json │ ├── dp_0.8.json │ └── dp_0.9.json ├── dialog_st │ ├── dialog_suffix_tree.json │ ├── dialog_suffix_tree1.json │ ├── dialog_suffix_tree2.json │ ├── dialog_suffix_tree3.json │ └── dialog_suffix_tree4.json ├── dialog_suffix_tree_debug.json ├── dialog_unlearn │ ├── dialog.json │ ├── dialog1.json │ ├── dialog2.json │ ├── dialog3.json │ └── dialog4.json ├── dp │ ├── 1.3B_0.json │ ├── 1.3B_1.json │ ├── 1.3B_2.json │ ├── 1.3B_3.json │ ├── 1.3B_4.json │ ├── 1.3B_general.json │ ├── 125M_0.json │ ├── 125M_1.json │ ├── 125M_2.json │ ├── 125M_3.json │ ├── 125M_4.json │ ├── 125M_general.json │ ├── 2.7B_0.json │ ├── 2.7B_1.json │ ├── 2.7B_2.json │ ├── 2.7B_3.json │ ├── 2.7B_4.json │ ├── 2.7B_general.json │ ├── create_configs.py │ └── template.json └── example.json ├── csv_out └── Dialog Initial.csv ├── data ├── domain_main │ ├── books3_8_0.csv │ ├── books3_8_1.csv │ ├── books3_8_2.csv │ ├── books3_8_3.csv │ ├── books3_8_4.csv │ ├── enron_emails_8_0.csv │ ├── enron_emails_8_1.csv │ ├── enron_emails_8_2.csv │ ├── enron_emails_8_3.csv │ ├── enron_emails_8_4.csv │ ├── freelaw_8_0.csv │ ├── freelaw_8_1.csv │ ├── freelaw_8_2.csv │ ├── freelaw_8_3.csv │ ├── freelaw_8_4.csv │ ├── github_8_0.csv │ ├── github_8_1.csv │ ├── github_8_2.csv │ ├── github_8_3.csv │ ├── github_8_4.csv │ ├── license_8_0.csv │ ├── license_8_1.csv │ ├── license_8_2.csv │ ├── license_8_3.csv │ ├── license_8_4.csv │ ├── pile-cc_8_0.csv │ ├── pile-cc_8_1.csv │ ├── pile-cc_8_2.csv │ ├── pile-cc_8_3.csv │ ├── pile-cc_8_4.csv │ ├── pubmed_central_8_0.csv │ ├── pubmed_central_8_1.csv │ ├── pubmed_central_8_2.csv │ ├── pubmed_central_8_3.csv │ ├── pubmed_central_8_4.csv │ ├── uspto_backgrounds_8_0.csv │ ├── uspto_backgrounds_8_1.csv │ ├── uspto_backgrounds_8_2.csv │ ├── 
uspto_backgrounds_8_3.csv │ └── uspto_backgrounds_8_4.csv └── main │ ├── lm_extraction_128_0.csv │ ├── lm_extraction_128_1.csv │ ├── lm_extraction_128_2.csv │ ├── lm_extraction_128_3.csv │ ├── lm_extraction_128_4.csv │ ├── lm_extraction_1_0.csv │ ├── lm_extraction_1_1.csv │ ├── lm_extraction_1_2.csv │ ├── lm_extraction_1_3.csv │ ├── lm_extraction_1_4.csv │ ├── lm_extraction_32_0.csv │ ├── lm_extraction_32_1.csv │ ├── lm_extraction_32_2.csv │ ├── lm_extraction_32_3.csv │ ├── lm_extraction_32_4.csv │ ├── lm_extraction_4_0.csv │ ├── lm_extraction_4_1.csv │ ├── lm_extraction_4_2.csv │ ├── lm_extraction_4_3.csv │ ├── lm_extraction_4_4.csv │ ├── lm_extraction_8_0.csv │ ├── lm_extraction_8_1.csv │ ├── lm_extraction_8_2.csv │ ├── lm_extraction_8_3.csv │ └── lm_extraction_8_4.csv ├── fig1.png ├── models ├── Neo_Model.py ├── Neo_Model_DP.py ├── Neo_Model_suffix_tree.py └── Neo_Model_valid.py ├── outputs ├── example.csv ├── init_DP-1.3B_0.0.csv ├── init_DP-1.3B_0.1.csv ├── init_DP-1.3B_0.2.csv ├── init_DP-1.3B_0.3.csv ├── init_DP-1.3B_0.4.csv ├── init_DP-1.3B_0.5.csv ├── init_DP-1.3B_0.6.csv ├── init_DP-1.3B_0.7.csv ├── init_DP-1.3B_0.8.csv ├── init_DP-1.3B_0.9.csv └── init_example.csv ├── requirements.txt ├── run.py ├── run_dp.py ├── run_st.py ├── utils.py └── validation_data ├── blended_skill_talk.json ├── empathetic_dialogues.json ├── lambada.csv ├── pile.csv ├── pubmed_qa.csv ├── valid_dm_mathematics.csv ├── wikitext.csv ├── wizard_of_internet.json └── wizard_of_wikipedia.json /.gitignore: -------------------------------------------------------------------------------- 1 | deepspeed 2 | 3 | #evaluation 4 | ckpt 5 | 6 | wandb 7 | tbImport.log 8 | 9 | __pycache__ 10 | models/__pycache__ 11 | 12 | logs 13 | 14 | *.pyc 15 | 16 | test.py 17 | nohup.out 18 | .fuse* -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Knowledge Unlearning for Mitigating 
Privacy Risks in Language Models 2 | 3 | ![alt text](fig1.png "Main Figure") 4 | 5 | paper link: https://arxiv.org/abs/2210.01504 6 | 7 | In order to reproduce our results, take the following steps: 8 | ### 1. Create conda environment and install requirements 9 | ``` 10 | conda create -n ufl python=3.8 11 | conda activate ufl 12 | # Install the correct torch version depending on CUDA version from https://pytorch.org/ 13 | conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch 14 | pip install -r requirements.txt 15 | ``` 16 | 17 | ### 2. In order to run the basic code, use the following command 18 | ``` 19 | python run.py --config configs/example.json 20 | ``` 21 | 22 | ### 3. Reproducing Experimental Results 23 | 24 | **Configs** 25 | - mode (string) : Either "unlearn" or "general_lm_eval" 26 | - "unlearn" will measure MA and EL for validation sets with valid_type_path == "target", for others it will run normal evaluation 27 | - "general_lm_eval" will run normal evaluation for all validation sets, only use when not evaluating the target data (the data that should be unlearned) 28 | - check_validation_only (bool) : If true, a single validation loop will run without training 29 | - do_init_eval (bool) : Whether to run a single validation loop before training 30 | - train_set (string) : Path to train_set, should be a .csv file 31 | - valid_sets (list[string]) : List containing validation set info 32 | - Could either be a .csv file path, or the dataset name on Huggingface hub 33 | - valid_subset_path (list[string]) : Subset name of the dataset from HF hub 34 | - If it does not have a subset, or is a .csv file the string will be ignored 35 | - valid_type_path (list[string]) : Type of the validation data 36 | - If it's the target data pass "target" 37 | - If it's a HF hub data pass the appropriate type 38 | - If it's a .csv file the string will be ignored 39 | - el_n (list[int]) : list of n values for EL 40 | - el_threshold (float) : The model's EL 
score for unseen data, exact values for each model are in the paper 41 | - ma_threshold (float) : The model's MA score for unseen data, exact values for each model are in the paper 42 | - min_train_epochs (int) : Guarantees a minimum number of training epochs 43 | - By default the model will stop training when it reaches both el_threshold and ma_threshold 44 | - This configuration will give some control over this behaviour 45 | - target_length (int) : The token length of the unlearning target data 46 | - input_length, output_length (int) : The token length of the input, output for LM evaluation tasks 47 | - strategy : Strategy passed to Lightning Trainer() 48 | - The code was tested with "deepspeed_stage_2" and "deepspeed_stage_2_offload" 49 | 50 | **Note** 51 | - The effective batch size (train_batch_size * gradient_accumulation_steps * ngpu) should be identical to the train set size 52 | - We found that minimizing gradient updates is crucial for retaining LM performance 53 | - If "effective batch size" != "train set size" the code will throw an error 54 | - The eval_batch_size will be replaced with train_batch_size only for "target" data, because "target" data are usually much smaller than LM eval data 55 | - This also speeds up the evaluation, because it guarantees a single eval step 56 | - The code will save two .csv files to "outputs/". 
They contain MA and EL scores for each individual example within the target data 57 | - One contains the validation results measured before training 58 | - The other contains the validation results throughout training 59 | -------------------------------------------------------------------------------- /configs/dialog.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Dialog Initial", 5 | "num_train_epochs": 20, 6 | "check_val_every_n_epoch": 1, 7 | "check_validation_only": true, 8 | "do_init_eval": true, 9 | "train_set": "data/main/lm_extraction_32_0.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "", 21 | "" 22 | ], 23 | "valid_type_path": [ 24 | "target", 25 | "", 26 | "", 27 | "", 28 | "" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | "gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | "wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.0.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.0", 5 | "lambda_weight": 0, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | 
"do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.05.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.05", 5 | "lambda_weight": 0.05, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 
1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.1.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.1", 5 | "lambda_weight": 0.1, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.2.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | 
"wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.2", 5 | "lambda_weight": 0.2, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.3.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.3", 5 | "lambda_weight": 0.3, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 
| "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.4.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.4", 5 | "lambda_weight": 0.4, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | 
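Editorial note: the `configs/dialog_dp/dp_*.json` files in this listing appear to differ only in `lambda_weight` and `wandb_run_name`. The sketch below shows one way such a sweep could be generated from a shared base and checked against the effective batch size formula from the README. It is a minimal illustration, not the repository's own tooling: `BASE_CONFIG`, `make_dp_configs`, and `effective_batch_size` are hypothetical names, and the base dictionary is abridged to a few of the keys shown in these files.

```python
import copy
import json

# Abridged copy of the fields shared by the dialog_dp configs in this listing.
# Hypothetical helper data: the real files carry more keys than shown here.
BASE_CONFIG = {
    "mode": "unlearn",
    "wandb_project": "Knowledge Unlearning Dialog",
    "train_set": "data/main/lm_extraction_32_0.csv",
    "train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "ngpu": 1,
    "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
}


def make_dp_configs(base, lambdas):
    """Return {filename: config}, one config per lambda_weight value."""
    configs = {}
    for lam in lambdas:
        cfg = copy.deepcopy(base)  # never mutate the shared base
        cfg["wandb_run_name"] = f"DP-1.3B_{lam}"
        cfg["lambda_weight"] = lam
        configs[f"dp_{lam}.json"] = cfg
    return configs


def effective_batch_size(cfg):
    """train_batch_size * gradient_accumulation_steps * ngpu, as in the README."""
    return cfg["train_batch_size"] * cfg["gradient_accumulation_steps"] * cfg["ngpu"]


configs = make_dp_configs(BASE_CONFIG, [0.0, 0.05, 0.1])
print(json.dumps(configs["dp_0.05.json"], indent=4))
```

Per the README note, the value returned by `effective_batch_size` should equal the number of rows in `train_set`; comparing the two before launching a run would surface a mismatch earlier than the training script's own error.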
-------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.5.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.5", 5 | "lambda_weight": 0.5, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.6.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.6", 5 | "lambda_weight": 0.6, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | 
"validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.7.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.7", 5 | "lambda_weight": 0.7, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | 
"ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.8.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.8", 5 | "lambda_weight": 0.8, 6 | "num_train_epochs": 20, 7 | "check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_dp/dp_0.9.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "DP-1.3B_0.9", 5 | "lambda_weight": 0.9, 6 | "num_train_epochs": 20, 7 | 
"check_val_every_n_epoch": 1, 8 | "check_validation_only": true, 9 | "do_init_eval": true, 10 | "train_set": "data/main/lm_extraction_32_0.csv", 11 | "valid_sets": [ 12 | "data/main/lm_extraction_32_0.csv", 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "", 23 | "" 24 | ], 25 | "valid_type_path": [ 26 | "target", 27 | "", 28 | "", 29 | "", 30 | "" 31 | ], 32 | "train_batch_size": 32, 33 | "eval_batch_size": 32, 34 | "gradient_accumulation_steps": 1, 35 | "ngpu": 1, 36 | "learning_rate": 5e-5, 37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 38 | "el_threshold": 0.0499, 39 | "ma_threshold": 0.2994, 40 | "input_length": 512, 41 | "output_length": 512, 42 | "target_length": 200, 43 | "num_workers": 64, 44 | "strategy": "deepspeed_stage_2_offload", 45 | "fp16": true, 46 | "wandb_log": true 47 | } 48 | -------------------------------------------------------------------------------- /configs/dialog_st/dialog_suffix_tree.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Suffix-1.3B_32_0", 5 | "num_train_epochs": 20, 6 | "check_val_every_n_epoch": 1, 7 | "check_validation_only": true, 8 | "do_init_eval": true, 9 | "train_set": "data/main/lm_extraction_32_0.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "", 21 | "" 22 | ], 23 | "valid_type_path": [ 24 | "target", 25 | "", 26 | "", 27 | "", 28 | "" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | 
"gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | "wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dialog_st/dialog_suffix_tree1.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Suffix-1.3B_32_1", 5 | "num_train_epochs": 20, 6 | "check_val_every_n_epoch": 1, 7 | "check_validation_only": true, 8 | "do_init_eval": true, 9 | "train_set": "data/main/lm_extraction_32_1.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "", 21 | "" 22 | ], 23 | "valid_type_path": [ 24 | "target", 25 | "", 26 | "", 27 | "", 28 | "" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | "gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | "wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dialog_st/dialog_suffix_tree2.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | 
"wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Suffix-1.3B_32_2", 5 | "num_train_epochs": 20, 6 | "check_val_every_n_epoch": 1, 7 | "check_validation_only": true, 8 | "do_init_eval": true, 9 | "train_set": "data/main/lm_extraction_32_2.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "", 21 | "" 22 | ], 23 | "valid_type_path": [ 24 | "target", 25 | "", 26 | "", 27 | "", 28 | "" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | "gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | "wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dialog_st/dialog_suffix_tree3.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Suffix-1.3B_32_3", 5 | "num_train_epochs": 20, 6 | "check_val_every_n_epoch": 1, 7 | "check_validation_only": true, 8 | "do_init_eval": true, 9 | "train_set": "data/main/lm_extraction_32_3.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "", 21 | "" 22 | ], 23 | "valid_type_path": [ 24 | "target", 25 | "", 26 | "", 27 | "", 28 | 
"" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | "gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | "wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dialog_st/dialog_suffix_tree4.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Suffix-1.3B_32_4", 5 | "num_train_epochs": 20, 6 | "check_val_every_n_epoch": 1, 7 | "check_validation_only": true, 8 | "do_init_eval": true, 9 | "train_set": "data/main/lm_extraction_32_4.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "", 21 | "" 22 | ], 23 | "valid_type_path": [ 24 | "target", 25 | "", 26 | "", 27 | "", 28 | "" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | "gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | "wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dialog_suffix_tree_debug.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning", 4 | "wandb_run_name": "example", 5 | "num_train_epochs": 20, 6 | "check_val_every_n_epoch": 1, 7 | "check_validation_only": true, 8 | "do_init_eval": true, 9 | "train_set": "data/main/lm_extraction_32_0.csv", 10 | "valid_sets": [ 11 | "data/main/lm_extraction_32_0.csv" 12 | ], 13 | "valid_subset_path": [ 14 | "" 15 | ], 16 | "valid_type_path": [ 17 | "target" 18 | ], 19 | "train_batch_size": 32, 20 | "eval_batch_size": 32, 21 | "gradient_accumulation_steps": 1, 22 | "ngpu": 1, 23 | "learning_rate": 5e-5, 24 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 25 | "el_threshold": 0.0499, 26 | "ma_threshold": 0.2994, 27 | "input_length": 512, 28 | "output_length": 512, 29 | "target_length": 200, 30 | "num_workers": 64, 31 | "strategy": "deepspeed_stage_2_offload", 32 | "fp16": true, 33 | "wandb_log": true 34 | } 35 | -------------------------------------------------------------------------------- /configs/dialog_unlearn/dialog.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Unlearn-1.3B_32_0", 5 | "num_train_epochs": 13, 6 | "check_val_every_n_epoch": 13, 7 | "check_validation_only": false, 8 | "do_init_eval": false, 9 | "train_set": "data/main/lm_extraction_32_0.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "" 21 | ], 22 | "valid_type_path": [ 23 | "", 24 | "", 25 | "", 26 | "" 27 | ], 28 | "train_batch_size": 8, 29 | "eval_batch_size": 32, 30 | "gradient_accumulation_steps": 1, 31 | "ngpu": 4, 32 | "learning_rate": 5e-5, 33 | 
"model_name_or_path": "EleutherAI/gpt-neo-1.3B", 34 | "el_threshold": 0.0499, 35 | "ma_threshold": 0.2994, 36 | "input_length": 512, 37 | "output_length": 512, 38 | "target_length": 200, 39 | "num_workers": 64, 40 | "strategy": "deepspeed_stage_2_offload", 41 | "fp16": true, 42 | "wandb_log": true 43 | } 44 | -------------------------------------------------------------------------------- /configs/dialog_unlearn/dialog1.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Unlearn-1.3B_32_1", 5 | "num_train_epochs": 14, 6 | "check_val_every_n_epoch": 14, 7 | "check_validation_only": false, 8 | "do_init_eval": false, 9 | "train_set": "data/main/lm_extraction_32_1.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "" 21 | ], 22 | "valid_type_path": [ 23 | "", 24 | "", 25 | "", 26 | "" 27 | ], 28 | "train_batch_size": 8, 29 | "eval_batch_size": 32, 30 | "gradient_accumulation_steps": 1, 31 | "ngpu": 4, 32 | "learning_rate": 5e-5, 33 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 34 | "el_threshold": 0.0499, 35 | "ma_threshold": 0.2994, 36 | "input_length": 512, 37 | "output_length": 512, 38 | "target_length": 200, 39 | "num_workers": 64, 40 | "strategy": "deepspeed_stage_2_offload", 41 | "fp16": true, 42 | "wandb_log": true 43 | } 44 | -------------------------------------------------------------------------------- /configs/dialog_unlearn/dialog2.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Unlearn-1.3B_32_2", 5 | "num_train_epochs": 13, 
6 | "check_val_every_n_epoch": 13, 7 | "check_validation_only": false, 8 | "do_init_eval": false, 9 | "train_set": "data/main/lm_extraction_32_2.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "" 21 | ], 22 | "valid_type_path": [ 23 | "", 24 | "", 25 | "", 26 | "" 27 | ], 28 | "train_batch_size": 8, 29 | "eval_batch_size": 32, 30 | "gradient_accumulation_steps": 1, 31 | "ngpu": 4, 32 | "learning_rate": 5e-5, 33 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 34 | "el_threshold": 0.0499, 35 | "ma_threshold": 0.2994, 36 | "input_length": 512, 37 | "output_length": 512, 38 | "target_length": 200, 39 | "num_workers": 64, 40 | "strategy": "deepspeed_stage_2_offload", 41 | "fp16": true, 42 | "wandb_log": true 43 | } 44 | -------------------------------------------------------------------------------- /configs/dialog_unlearn/dialog3.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Unlearn-1.3B_32_3", 5 | "num_train_epochs": 14, 6 | "check_val_every_n_epoch": 14, 7 | "check_validation_only": false, 8 | "do_init_eval": false, 9 | "train_set": "data/main/lm_extraction_32_3.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "" 21 | ], 22 | "valid_type_path": [ 23 | "", 24 | "", 25 | "", 26 | "" 27 | ], 28 | "train_batch_size": 8, 29 | "eval_batch_size": 32, 30 | "gradient_accumulation_steps": 1, 31 | "ngpu": 4, 32 | "learning_rate": 5e-5, 33 | 
"model_name_or_path": "EleutherAI/gpt-neo-1.3B", 34 | "el_threshold": 0.0499, 35 | "ma_threshold": 0.2994, 36 | "input_length": 512, 37 | "output_length": 512, 38 | "target_length": 200, 39 | "num_workers": 64, 40 | "strategy": "deepspeed_stage_2_offload", 41 | "fp16": true, 42 | "wandb_log": true 43 | } 44 | -------------------------------------------------------------------------------- /configs/dialog_unlearn/dialog4.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "wandb_project": "Knowledge Unlearning Dialog", 4 | "wandb_run_name": "Unlearn-1.3B_32_4", 5 | "num_train_epochs": 15, 6 | "check_val_every_n_epoch": 15, 7 | "check_validation_only": false, 8 | "do_init_eval": false, 9 | "train_set": "data/main/lm_extraction_32_4.csv", 10 | "valid_sets": [ 11 | "validation_data/wizard_of_wikipedia.json", 12 | "validation_data/empathetic_dialogues.json", 13 | "validation_data/blended_skill_talk.json", 14 | "validation_data/wizard_of_internet.json" 15 | ], 16 | "valid_subset_path": [ 17 | "", 18 | "", 19 | "", 20 | "" 21 | ], 22 | "valid_type_path": [ 23 | "", 24 | "", 25 | "", 26 | "" 27 | ], 28 | "train_batch_size": 8, 29 | "eval_batch_size": 32, 30 | "gradient_accumulation_steps": 1, 31 | "ngpu": 4, 32 | "learning_rate": 5e-5, 33 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 34 | "el_threshold": 0.0499, 35 | "ma_threshold": 0.2994, 36 | "input_length": 512, 37 | "output_length": 512, 38 | "target_length": 200, 39 | "num_workers": 64, 40 | "strategy": "deepspeed_stage_2_offload", 41 | "fp16": true, 42 | "wandb_log": true 43 | } 44 | -------------------------------------------------------------------------------- /configs/dp/1.3B_0.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-1.3B_0", 6 | "lambda_weight": 
0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_0.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_0.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/1.3B_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-1.3B_1", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_1.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_1.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } 
-------------------------------------------------------------------------------- /configs/dp/1.3B_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-1.3B_2", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_2.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_2.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/1.3B_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-1.3B_3", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_3.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_3.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | 
"model_name_or_path": "EleutherAI/gpt-neo-1.3B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/1.3B_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-1.3B_4", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_4.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_4.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/1.3B_general.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-1.3B_General", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": 
"data/main/lm_extraction_32_0.csv", 12 | "valid_sets": [ 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "" 23 | ], 24 | "valid_type_path": [ 25 | "", 26 | "", 27 | "", 28 | "" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | "gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | "wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dp/125M_0.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-125M_0", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_0.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_0.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 
36 | } -------------------------------------------------------------------------------- /configs/dp/125M_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-125M_1", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_1.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_1.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/125M_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-125M_2", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_2.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_2.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 
26 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/125M_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-125M_3", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_3.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_3.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/125M_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-125M_4", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": 
"data/main/lm_extraction_32_4.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_4.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/125M_general.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "general_lm_eval", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-125M_General", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_0.csv", 12 | "valid_sets": [ 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "" 23 | ], 24 | "valid_type_path": [ 25 | "", 26 | "", 27 | "", 28 | "" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | "gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | 
"wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dp/2.7B_0.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-2.7B_0", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_0.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_0.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/2.7B_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-2.7B_1", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_1.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_1.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | 
"learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/2.7B_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-2.7B_2", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_2.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_2.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/2.7B_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-2.7B_3", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": 
"data/main/lm_extraction_32_3.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_3.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/2.7B_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-2.7B_4", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_4.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_4.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-05, 26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } -------------------------------------------------------------------------------- /configs/dp/2.7B_general.json: -------------------------------------------------------------------------------- 1 | { 
2 | "mode": "general_lm_eval", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-2.7B_General", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_0.csv", 12 | "valid_sets": [ 13 | "validation_data/wizard_of_wikipedia.json", 14 | "validation_data/empathetic_dialogues.json", 15 | "validation_data/blended_skill_talk.json", 16 | "validation_data/wizard_of_internet.json" 17 | ], 18 | "valid_subset_path": [ 19 | "", 20 | "", 21 | "", 22 | "" 23 | ], 24 | "valid_type_path": [ 25 | "", 26 | "", 27 | "", 28 | "" 29 | ], 30 | "train_batch_size": 32, 31 | "eval_batch_size": 32, 32 | "gradient_accumulation_steps": 1, 33 | "ngpu": 1, 34 | "learning_rate": 5e-5, 35 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B", 36 | "el_threshold": 0.0499, 37 | "ma_threshold": 0.2994, 38 | "input_length": 512, 39 | "output_length": 512, 40 | "target_length": 200, 41 | "num_workers": 64, 42 | "strategy": "deepspeed_stage_2_offload", 43 | "fp16": true, 44 | "wandb_log": true 45 | } 46 | -------------------------------------------------------------------------------- /configs/dp/create_configs.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | model = '2.7B' 4 | batch = 128 5 | f = open(f'/home/lklab/knowledge-unlearning/configs/dp/template.json') 6 | data = json.load(f) 7 | print(data) 8 | 9 | for i in range(0, 5): 10 | data['wandb_run_name'] = f'DP-0.2-{model}_{i}' 11 | data['model_name_or_path'] = f"EleutherAI/gpt-neo-{model}" 12 | data['train_set'] = f'data/main/lm_extraction_32_{i}.csv' 13 | data['valid_sets'][0] = f'data/main/lm_extraction_32_{i}.csv' 14 | with open(f'/home/lklab/knowledge-unlearning/configs/dp/{model}_{i}.json', 'w') as fp: 15 | json.dump(data, fp, indent=4) 
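The script above reads `template.json` through a hard-coded absolute path (`/home/lklab/...`), never closes the file handle, and leaves `batch` unused. A minimal sketch of a more portable variant follows. The helper names `build_configs` and `write_configs` are hypothetical and not part of the repository, and the sketch assumes the template contains the same keys the original script overwrites (`wandb_run_name`, `model_name_or_path`, `train_set`, and a non-empty `valid_sets`).

```python
import copy
import json
from pathlib import Path


def build_configs(template, model, n_splits=5):
    """Return {filename: config} for each data split without touching disk.

    Hypothetical helper: mirrors the fields the original loop overwrites.
    """
    configs = {}
    for i in range(n_splits):
        # Deep-copy so the nested "valid_sets" list of the template is not mutated.
        cfg = copy.deepcopy(template)
        split = f"data/main/lm_extraction_32_{i}.csv"
        cfg["wandb_run_name"] = f"DP-0.2-{model}_{i}"
        cfg["model_name_or_path"] = f"EleutherAI/gpt-neo-{model}"
        cfg["train_set"] = split
        cfg["valid_sets"][0] = split
        configs[f"{model}_{i}.json"] = cfg
    return configs


def write_configs(config_dir, model):
    """Load template.json from config_dir and write one config per split there."""
    config_dir = Path(config_dir)
    # Context managers close the files even if loading or dumping fails.
    with open(config_dir / "template.json") as f:
        template = json.load(f)
    for name, cfg in build_configs(template, model).items():
        with open(config_dir / name, "w") as fp:
            json.dump(cfg, fp, indent=4)


# Example call (assumes it is run from the repository root):
# write_configs("configs/dp", "2.7B")
```

Separating the in-memory step from the file writes keeps the generation logic checkable without a real `template.json`, and passing the directory as an argument removes the dependency on one machine's home directory.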
-------------------------------------------------------------------------------- /configs/dp/template.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "privacy_method": "dp", 4 | "wandb_project": "Knowledge Unlearning Dialog", 5 | "wandb_run_name": "DP-0.2-1.3B_0", 6 | "lambda_weight": 0.2, 7 | "num_train_epochs": 20, 8 | "check_val_every_n_epoch": 1, 9 | "check_validation_only": true, 10 | "do_init_eval": true, 11 | "train_set": "data/main/lm_extraction_32_0.csv", 12 | "valid_sets": [ 13 | "data/main/lm_extraction_32_0.csv" 14 | ], 15 | "valid_subset_path": [ 16 | "" 17 | ], 18 | "valid_type_path": [ 19 | "target" 20 | ], 21 | "train_batch_size": 32, 22 | "eval_batch_size": 32, 23 | "gradient_accumulation_steps": 1, 24 | "ngpu": 1, 25 | "learning_rate": 5e-5, 26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B", 27 | "el_threshold": 0.0499, 28 | "ma_threshold": 0.2994, 29 | "input_length": 512, 30 | "output_length": 512, 31 | "target_length": 200, 32 | "num_workers": 64, 33 | "strategy": "deepspeed_stage_2_offload", 34 | "fp16": true, 35 | "wandb_log": true 36 | } 37 | -------------------------------------------------------------------------------- /configs/example.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "unlearn", 3 | "wandb_project": "Knowledge Unlearning", 4 | "wandb_run_name": "example", 5 | "num_train_epochs": 20, 6 | "check_val_every_n_epoch": 1, 7 | "check_validation_only": false, 8 | "do_init_eval": true, 9 | "train_set": "data/main/lm_extraction_32_0.csv", 10 | "valid_sets": [ 11 | "data/main/lm_extraction_32_0.csv", 12 | "validation_data/lambada.csv", 13 | "piqa", 14 | "hellaswag", 15 | "ai2_arc", 16 | "ai2_arc", 17 | "super_glue", 18 | "winogrande", 19 | "math_qa", 20 | "validation_data/pubmed_qa.csv" 21 | ], 22 | "valid_subset_path": [ 23 | "", 24 | "", 25 | "", 26 | "", 27 | "ARC-Easy", 28 | "ARC-Challenge", 29 | 
"copa", 30 | "winogrande_s", 31 | "", 32 | "" 33 | ], 34 | "valid_type_path": [ 35 | "target", 36 | "test", 37 | "validation", 38 | "validation", 39 | "validation", 40 | "validation", 41 | "validation", 42 | "validation", 43 | "validation", 44 | "" 45 | ], 46 | "train_batch_size": 8, 47 | "eval_batch_size": 8, 48 | "gradient_accumulation_steps": 4, 49 | "ngpu": 1, 50 | "learning_rate": 5e-5, 51 | "model_name_or_path": "EleutherAI/gpt-neo-125M", 52 | "el_threshold": 0.0499, 53 | "ma_threshold": 0.2994, 54 | "input_length": 512, 55 | "output_length": 512, 56 | "target_length": 200, 57 | "num_workers": 64, 58 | "strategy": "deepspeed_stage_2_offload", 59 | "fp16": true, 60 | "wandb_log": true 61 | } 62 | -------------------------------------------------------------------------------- /csv_out/Dialog Initial.csv: -------------------------------------------------------------------------------- 1 | ,wizard_of_wikipedia/loss,wizard_of_wikipedia/f1,empathetic_dialogues/loss,empathetic_dialogues/f1,blended_skill_talk/loss,blended_skill_talk/f1,wizard_of_internet/loss,wizard_of_internet/f1 2 | 0,4.13671875,0.07524767518043518,3.724609375,0.08438178896903992,3.865234375,0.11232497543096542,3.83203125,0.1023608073592186 3 | -------------------------------------------------------------------------------- /data/domain_main/enron_emails_8_1.csv: -------------------------------------------------------------------------------- 1 | doc_id,corpus,text 2 | 2809,enron_emails,", 3 | ? 4 | You have been selected to participate in the Mid Year 2001 Performance 5 | Management process. Your feedback plays an important role in the process, 6 | and your participation is critical to the success of Enron's Performance 7 | Management goals. 8 | ? 9 | To complete a request for feedback, access PEP at http://pep.enron.com and 10 | select Complete Feedback from the Main Menu. 
You may begin providing 11 | feedback immediately and are requested to have all feedback forms completed 12 | by Friday, May 25, 2001. 13 | ? 14 | If you have any questions regarding PEP or your responsibility in the 15 | process, please contact the PEP Help Desk at: 16 | Houston: 1.713.853.4777, Option 4 or email: perfmgmt@enron.com 17 | London: 44.207.783.4040, Option 4 or email: pep.enquiries@enron." 18 | 2128,enron_emails," This e-mail is the property of Enron Corp. and/or its relevant 19 | > affiliate and may contain confidential and privileged material for the 20 | > sole use of the intended recipient (s). Any review, use, distribution or 21 | > disclosure by others is strictly prohibited. If you are not the intended 22 | > recipient (or authorized to receive for the recipient), please contact 23 | > the sender or reply to Enron Corp. at 24 | > enron.messaging.administration@enron.com and delete all copies of the 25 | > message. This e-mail (and any attachments hereto) are not intended to be 26 | > an offer (or an acceptance) and do not create or evidence a binding and 27 | > enforceable contract between Enron Corp. (or any of its affiliates) and 28 | > the intended recipient or any other party, and may not be relied on by 29 | > anyone as the basis of a contract by estoppel or otherwise. Thank you" 30 | 8151,enron_emails,"B 31 | 32 | Andrew B. Brown 33 | Ellison, Schneider & Harris, LLP 34 | 2015 H Street 35 | Sacramento, CA 95814 36 | Phone: (916) 447-2166 37 | Fax: (916) 447-3512 38 | mailto:abb@eslawfirm.com 39 | 40 | CONFIDENTIALITY NOTICE: This communication and any accompanying document(s) 41 | are confidential and privileged. They are intended for the sole use of the 42 | addressee. If you receive this transmission in error, you are advised that 43 | any disclosure, copying, distribution, or the taking of any action in 44 | reliance upon the communication is strictly prohibited. 
Moreover, any such 45 | inadvertent disclosure shall not compromise or waive the attorney-client 46 | privilege as to this communication or otherwise. If you have received this 47 | communication in error, please contact the sender at the internet address 48 | indicated or by telephone at (916)" 49 | 416,enron_emails," 50 | 51 | To: Karen Lambert/HOU/ECT@ECT, Tana Jones/HOU/ECT@ECT, Samuel 52 | Schott/HOU/ECT@ECT, Sheri Thomas/HOU/ECT@ECT, Mark Taylor/HOU/ECT@ECT, 53 | Bernice Rodriguez/HOU/ECT@ECT, Brant Reves/HOU/ECT@ECT, Debbie R 54 | Brackett/HOU/ECT@ECT, David Hardy/LON/ECT@ECT, Lesli Campbell/HOU/ECT@ECT, 55 | Molly Harris/HOU/ECT@ECT, Cynthia Clark/Corp/Enron@ENRON, Mary G 56 | Gosnell/HOU/ECT@ECT, Enron Europe Global Contracts and Facilities, Enron 57 | Europe Global CounterParty, Stephanie Sever/HOU/ECT@ECT, Bradley 58 | Diebner/HOU/ECT@ECT, Stacey Richardson/HOU/ECT@" 59 | 5275,enron_emails," 60 | ------------------------------------------------------------------------------ 61 | NEW E-MAIL ADDRESSES AT PAUL, HASTINGS, JANOFSKY & WALKER LLP 62 | 63 | We have changed our e-mail address. Our new domain name is 64 | paulhastings.com. In most cases, our address is composed of 65 | conventional first name and last name plus @paulhastings.com. Here are 66 | two examples: janesmith@paulhastings.com and danjones@paulhastings.com. 67 | If you have any questions, please contact us at noc@paulhastings.com. 68 | 69 | ============================================================================== 70 | ""The information transmitted is intended only for the person or entity 71 | to which it is addressed and may contain confidential and/or privileged 72 | material. Any review, retransmission, dissemination or other use of, or 73 | taking of any action in reliance upon, this information by persons" 74 | 9293,enron_emails,".fantasy.sportsline.com/mp/options-ereports?league=ene&owner=45547.3>click here
75 |
76 |
NFL Reports, Player Updates 
77 | 31 | 32 | 33 | 34 | 35 | 38 |
39 |
40 |