├── .gitignore
├── Datasets.py
├── README.md
├── configs
│   ├── dialog.json
│   ├── dialog_dp
│   │   ├── dp_0.0.json
│   │   ├── dp_0.05.json
│   │   ├── dp_0.1.json
│   │   ├── dp_0.2.json
│   │   ├── dp_0.3.json
│   │   ├── dp_0.4.json
│   │   ├── dp_0.5.json
│   │   ├── dp_0.6.json
│   │   ├── dp_0.7.json
│   │   ├── dp_0.8.json
│   │   └── dp_0.9.json
│   ├── dialog_st
│   │   ├── dialog_suffix_tree.json
│   │   ├── dialog_suffix_tree1.json
│   │   ├── dialog_suffix_tree2.json
│   │   ├── dialog_suffix_tree3.json
│   │   └── dialog_suffix_tree4.json
│   ├── dialog_suffix_tree_debug.json
│   ├── dialog_unlearn
│   │   ├── dialog.json
│   │   ├── dialog1.json
│   │   ├── dialog2.json
│   │   ├── dialog3.json
│   │   └── dialog4.json
│   ├── dp
│   │   ├── 1.3B_0.json
│   │   ├── 1.3B_1.json
│   │   ├── 1.3B_2.json
│   │   ├── 1.3B_3.json
│   │   ├── 1.3B_4.json
│   │   ├── 1.3B_general.json
│   │   ├── 125M_0.json
│   │   ├── 125M_1.json
│   │   ├── 125M_2.json
│   │   ├── 125M_3.json
│   │   ├── 125M_4.json
│   │   ├── 125M_general.json
│   │   ├── 2.7B_0.json
│   │   ├── 2.7B_1.json
│   │   ├── 2.7B_2.json
│   │   ├── 2.7B_3.json
│   │   ├── 2.7B_4.json
│   │   ├── 2.7B_general.json
│   │   ├── create_configs.py
│   │   └── template.json
│   └── example.json
├── csv_out
│   └── Dialog Initial.csv
├── data
│   ├── domain_main
│   │   ├── books3_8_0.csv
│   │   ├── books3_8_1.csv
│   │   ├── books3_8_2.csv
│   │   ├── books3_8_3.csv
│   │   ├── books3_8_4.csv
│   │   ├── enron_emails_8_0.csv
│   │   ├── enron_emails_8_1.csv
│   │   ├── enron_emails_8_2.csv
│   │   ├── enron_emails_8_3.csv
│   │   ├── enron_emails_8_4.csv
│   │   ├── freelaw_8_0.csv
│   │   ├── freelaw_8_1.csv
│   │   ├── freelaw_8_2.csv
│   │   ├── freelaw_8_3.csv
│   │   ├── freelaw_8_4.csv
│   │   ├── github_8_0.csv
│   │   ├── github_8_1.csv
│   │   ├── github_8_2.csv
│   │   ├── github_8_3.csv
│   │   ├── github_8_4.csv
│   │   ├── license_8_0.csv
│   │   ├── license_8_1.csv
│   │   ├── license_8_2.csv
│   │   ├── license_8_3.csv
│   │   ├── license_8_4.csv
│   │   ├── pile-cc_8_0.csv
│   │   ├── pile-cc_8_1.csv
│   │   ├── pile-cc_8_2.csv
│   │   ├── pile-cc_8_3.csv
│   │   ├── pile-cc_8_4.csv
│   │   ├── pubmed_central_8_0.csv
│   │   ├── pubmed_central_8_1.csv
│   │   ├── pubmed_central_8_2.csv
│   │   ├── pubmed_central_8_3.csv
│   │   ├── pubmed_central_8_4.csv
│   │   ├── uspto_backgrounds_8_0.csv
│   │   ├── uspto_backgrounds_8_1.csv
│   │   ├── uspto_backgrounds_8_2.csv
│   │   ├── uspto_backgrounds_8_3.csv
│   │   └── uspto_backgrounds_8_4.csv
│   └── main
│       ├── lm_extraction_128_0.csv
│       ├── lm_extraction_128_1.csv
│       ├── lm_extraction_128_2.csv
│       ├── lm_extraction_128_3.csv
│       ├── lm_extraction_128_4.csv
│       ├── lm_extraction_1_0.csv
│       ├── lm_extraction_1_1.csv
│       ├── lm_extraction_1_2.csv
│       ├── lm_extraction_1_3.csv
│       ├── lm_extraction_1_4.csv
│       ├── lm_extraction_32_0.csv
│       ├── lm_extraction_32_1.csv
│       ├── lm_extraction_32_2.csv
│       ├── lm_extraction_32_3.csv
│       ├── lm_extraction_32_4.csv
│       ├── lm_extraction_4_0.csv
│       ├── lm_extraction_4_1.csv
│       ├── lm_extraction_4_2.csv
│       ├── lm_extraction_4_3.csv
│       ├── lm_extraction_4_4.csv
│       ├── lm_extraction_8_0.csv
│       ├── lm_extraction_8_1.csv
│       ├── lm_extraction_8_2.csv
│       ├── lm_extraction_8_3.csv
│       └── lm_extraction_8_4.csv
├── fig1.png
├── models
│   ├── Neo_Model.py
│   ├── Neo_Model_DP.py
│   ├── Neo_Model_suffix_tree.py
│   └── Neo_Model_valid.py
├── outputs
│   ├── example.csv
│   ├── init_DP-1.3B_0.0.csv
│   ├── init_DP-1.3B_0.1.csv
│   ├── init_DP-1.3B_0.2.csv
│   ├── init_DP-1.3B_0.3.csv
│   ├── init_DP-1.3B_0.4.csv
│   ├── init_DP-1.3B_0.5.csv
│   ├── init_DP-1.3B_0.6.csv
│   ├── init_DP-1.3B_0.7.csv
│   ├── init_DP-1.3B_0.8.csv
│   ├── init_DP-1.3B_0.9.csv
│   └── init_example.csv
├── requirements.txt
├── run.py
├── run_dp.py
├── run_st.py
├── utils.py
└── validation_data
    ├── blended_skill_talk.json
    ├── empathetic_dialogues.json
    ├── lambada.csv
    ├── pile.csv
    ├── pubmed_qa.csv
    ├── valid_dm_mathematics.csv
    ├── wikitext.csv
    ├── wizard_of_internet.json
    └── wizard_of_wikipedia.json
/.gitignore:
--------------------------------------------------------------------------------
1 | deepspeed
2 |
3 | #evaluation
4 | ckpt
5 |
6 | wandb
7 | tbImport.log
8 |
9 | __pycache__
10 | models/__pycache__
11 |
12 | logs
13 |
14 | *.pyc
15 |
16 | test.py
17 | nohup.out
18 | .fuse*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Knowledge Unlearning for Mitigating Privacy Risks in Language Models
2 |
3 | 
4 |
5 | Paper link: https://arxiv.org/abs/2210.01504
6 |
7 | To reproduce our results, take the following steps:
8 | ### 1. Create conda environment and install requirements
9 | ```
10 | conda create -n ufl python=3.8
11 | conda activate ufl
12 | # Install the correct torch version depending on CUDA version from https://pytorch.org/
13 | conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
14 | pip install -r requirements.txt
15 | ```
16 |
17 | ### 2. Run the basic code with the following command
18 | ```
19 | python run.py --config configs/example.json
20 | ```
21 |
22 | ### 3. Reproducing Experimental Results
23 |
24 | **Configs** (a minimal example combining these fields is sketched after the list)
25 | - mode (string) : Either "unlearn" or "general_lm_eval"
26 | - "unlearn" will measure MA and EL for validation sets with valid_type_path == "target", for others it will run normal evaluation
27 | - "general_lm_eval" will run normal evaluation for all validation sets, only use when not evaulating the target data (the data that should be unlearned)
28 | - check_validation_only (bool) : If true, a single validation loop will run without training
29 | - do_init_eval (bool) : Whether to run a single validation loop before training
30 | - train_set (string) : Path to the train set; should be a .csv file
31 | - valid_sets (list[string]) : List containing validation set info
32 | - Each entry can either be a .csv file path or a dataset name on the Hugging Face hub
33 | - valid_subset_path (list[string]) : Subset names of the datasets from the HF hub
34 | - If the dataset has no subset, or the entry is a .csv file, the string is ignored
35 | - valid_type_path (list[string]) : Type of the validation data
36 | - If it's the target data, pass "target"
37 | - If it's HF hub data, pass the appropriate type
38 | - If it's a .csv file, the string is ignored
39 | - el_n (list[int]) : List of n values for EL
40 | - el_threshold (float) : The model's EL score on unseen data; exact values for each model are given in the paper
41 | - ma_threshold (float) : The model's MA score on unseen data; exact values for each model are given in the paper
42 | - min_train_epochs (int) : Guarantees a minimum number of training epochs
43 | - By default, training stops once the model reaches both el_threshold and ma_threshold
44 | - This option gives some control over that behavior
45 | - target_length (int) : The token length of the unlearning target data
46 | - input_length, output_length (int) : The token lengths of the input and output for LM evaluation tasks
47 | - strategy : Strategy passed to Lightning Trainer()
48 | - The code was tested with "deepspeed_stage_2" and "deepspeed_stage_2_offload"
49 |
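For reference, here is a minimal sketch of how these fields fit together in one config. Values are illustrative only (loosely based on configs/example.json and the files under configs/; check those for the exact settings used in our runs):

```
{
    "mode": "unlearn",
    "check_validation_only": false,
    "do_init_eval": true,
    "train_set": "data/main/lm_extraction_32_0.csv",
    "valid_sets": ["data/main/lm_extraction_32_0.csv", "validation_data/lambada.csv"],
    "valid_subset_path": ["", ""],
    "valid_type_path": ["target", ""],
    "el_n": [10],
    "el_threshold": 0.0499,
    "ma_threshold": 0.2994,
    "min_train_epochs": 1,
    "target_length": 200,
    "input_length": 512,
    "output_length": 512,
    "train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "ngpu": 1,
    "strategy": "deepspeed_stage_2"
}
```
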
50 | **Note**
51 | - The effective batch size (train_batch_size * gradient_accumulation_steps * ngpu) should be identical to the train set size (see the sanity-check sketch after this list)
52 | - We found that minimizing gradient updates is crucial for retaining LM performance
53 | - If "effective batch size" != "train set size" the code will throw an error
54 | - The eval_batch_size is replaced with train_batch_size only for "target" data, because "target" data are usually much smaller than LM eval data
55 | - This also speeds up the evaluation, because it guarantees a single eval step
56 | - The code saves two .csv files to "outputs/", containing MA and EL scores for each individual example within the target data
57 | - One contains the validation results measured before training
58 | - The other contains the validation results throughout training
59 |
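The effective-batch-size constraint above is easy to verify before launching a run. A minimal sanity-check sketch (illustrative, not part of this repo; assumes pandas is installed and that the train set stores one example per CSV row):

```
import json

import pandas as pd

def check_effective_batch_size(config_path):
    # The effective batch size must equal the train set size, so that
    # unlearning performs a single gradient update per epoch.
    with open(config_path) as f:
        cfg = json.load(f)
    effective = (cfg["train_batch_size"]
                 * cfg["gradient_accumulation_steps"]
                 * cfg["ngpu"])
    n_examples = len(pd.read_csv(cfg["train_set"]))
    assert effective == n_examples, (
        f"effective batch size {effective} != train set size {n_examples}")

check_effective_batch_size("configs/example.json")
```
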
--------------------------------------------------------------------------------
/configs/dialog.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Dialog Initial",
5 | "num_train_epochs": 20,
6 | "check_val_every_n_espoch": 1,
7 | "check_validation_only": true,
8 | "do_init_eval": true,
9 | "train_set": "data/main/lm_extraction_32_0.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | "",
21 | ""
22 | ],
23 | "valid_type_path": [
24 | "target",
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.0.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.0",
5 | "lambda_weight": 0,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.05.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.05",
5 | "lambda_weight": 0.05,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.1.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.1",
5 | "lambda_weight": 0.1,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.2.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.2",
5 | "lambda_weight": 0.2,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.3.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.3",
5 | "lambda_weight": 0.3,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.4.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.4",
5 | "lambda_weight": 0.4,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.5.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.5",
5 | "lambda_weight": 0.5,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.6.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.6",
5 | "lambda_weight": 0.6,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.7.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.7",
5 | "lambda_weight": 0.7,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.8.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.8",
5 | "lambda_weight": 0.8,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_dp/dp_0.9.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "DP-1.3B_0.9",
5 | "lambda_weight": 0.9,
6 | "num_train_epochs": 20,
7 | "check_val_every_n_epoch": 1,
8 | "check_validation_only": true,
9 | "do_init_eval": true,
10 | "train_set": "data/main/lm_extraction_32_0.csv",
11 | "valid_sets": [
12 | "data/main/lm_extraction_32_0.csv",
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | "",
23 | ""
24 | ],
25 | "valid_type_path": [
26 | "target",
27 | "",
28 | "",
29 | "",
30 | ""
31 | ],
32 | "train_batch_size": 32,
33 | "eval_batch_size": 32,
34 | "gradient_accumulation_steps": 1,
35 | "ngpu": 1,
36 | "learning_rate": 5e-5,
37 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
38 | "el_threshold": 0.0499,
39 | "ma_threshold": 0.2994,
40 | "input_length": 512,
41 | "output_length": 512,
42 | "target_length": 200,
43 | "num_workers": 64,
44 | "strategy": "deepspeed_stage_2_offload",
45 | "fp16": true,
46 | "wandb_log": true
47 | }
48 |
--------------------------------------------------------------------------------
/configs/dialog_st/dialog_suffix_tree.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Suffix-1.3B_32_0",
5 | "num_train_epochs": 20,
6 | "check_val_every_n_epoch": 1,
7 | "check_validation_only": true,
8 | "do_init_eval": true,
9 | "train_set": "data/main/lm_extraction_32_0.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | "",
21 | ""
22 | ],
23 | "valid_type_path": [
24 | "target",
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dialog_st/dialog_suffix_tree1.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Suffix-1.3B_32_1",
5 | "num_train_epochs": 20,
6 | "check_val_every_n_epoch": 1,
7 | "check_validation_only": true,
8 | "do_init_eval": true,
9 | "train_set": "data/main/lm_extraction_32_1.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | "",
21 | ""
22 | ],
23 | "valid_type_path": [
24 | "target",
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dialog_st/dialog_suffix_tree2.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Suffix-1.3B_32_2",
5 | "num_train_epochs": 20,
6 | "check_val_every_n_epoch": 1,
7 | "check_validation_only": true,
8 | "do_init_eval": true,
9 | "train_set": "data/main/lm_extraction_32_2.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | "",
21 | ""
22 | ],
23 | "valid_type_path": [
24 | "target",
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dialog_st/dialog_suffix_tree3.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Suffix-1.3B_32_3",
5 | "num_train_epochs": 20,
6 | "check_val_every_n_epoch": 1,
7 | "check_validation_only": true,
8 | "do_init_eval": true,
9 | "train_set": "data/main/lm_extraction_32_3.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | "",
21 | ""
22 | ],
23 | "valid_type_path": [
24 | "target",
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dialog_st/dialog_suffix_tree4.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Suffix-1.3B_32_4",
5 | "num_train_epochs": 20,
6 | "check_val_every_n_epoch": 1,
7 | "check_validation_only": true,
8 | "do_init_eval": true,
9 | "train_set": "data/main/lm_extraction_32_4.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | "",
21 | ""
22 | ],
23 | "valid_type_path": [
24 | "target",
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dialog_suffix_tree_debug.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning",
4 | "wandb_run_name": "example",
5 | "num_train_epochs": 20,
6 | "check_val_every_n_epoch": 1,
7 | "check_validation_only": true,
8 | "do_init_eval": true,
9 | "train_set": "data/main/lm_extraction_32_0.csv",
10 | "valid_sets": [
11 | "data/main/lm_extraction_32_0.csv"
12 | ],
13 | "valid_subset_path": [
14 | ""
15 | ],
16 | "valid_type_path": [
17 | "target"
18 | ],
19 | "train_batch_size": 32,
20 | "eval_batch_size": 32,
21 | "gradient_accumulation_steps": 1,
22 | "ngpu": 1,
23 | "learning_rate": 5e-5,
24 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
25 | "el_threshold": 0.0499,
26 | "ma_threshold": 0.2994,
27 | "input_length": 512,
28 | "output_length": 512,
29 | "target_length": 200,
30 | "num_workers": 64,
31 | "strategy": "deepspeed_stage_2_offload",
32 | "fp16": true,
33 | "wandb_log": true
34 | }
35 |
--------------------------------------------------------------------------------
/configs/dialog_unlearn/dialog.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Unlearn-1.3B_32_0",
5 | "num_train_epochs": 13,
6 | "check_val_every_n_epoch": 13,
7 | "check_validation_only": false,
8 | "do_init_eval": false,
9 | "train_set": "data/main/lm_extraction_32_0.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | ""
21 | ],
22 | "valid_type_path": [
23 | "",
24 | "",
25 | "",
26 | ""
27 | ],
28 | "train_batch_size": 8,
29 | "eval_batch_size": 32,
30 | "gradient_accumulation_steps": 1,
31 | "ngpu": 4,
32 | "learning_rate": 5e-5,
33 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
34 | "el_threshold": 0.0499,
35 | "ma_threshold": 0.2994,
36 | "input_length": 512,
37 | "output_length": 512,
38 | "target_length": 200,
39 | "num_workers": 64,
40 | "strategy": "deepspeed_stage_2_offload",
41 | "fp16": true,
42 | "wandb_log": true
43 | }
44 |
--------------------------------------------------------------------------------
/configs/dialog_unlearn/dialog1.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Unlearn-1.3B_32_1",
5 | "num_train_epochs": 14,
6 | "check_val_every_n_epoch": 14,
7 | "check_validation_only": false,
8 | "do_init_eval": false,
9 | "train_set": "data/main/lm_extraction_32_1.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | ""
21 | ],
22 | "valid_type_path": [
23 | "",
24 | "",
25 | "",
26 | ""
27 | ],
28 | "train_batch_size": 8,
29 | "eval_batch_size": 32,
30 | "gradient_accumulation_steps": 1,
31 | "ngpu": 4,
32 | "learning_rate": 5e-5,
33 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
34 | "el_threshold": 0.0499,
35 | "ma_threshold": 0.2994,
36 | "input_length": 512,
37 | "output_length": 512,
38 | "target_length": 200,
39 | "num_workers": 64,
40 | "strategy": "deepspeed_stage_2_offload",
41 | "fp16": true,
42 | "wandb_log": true
43 | }
44 |
--------------------------------------------------------------------------------
/configs/dialog_unlearn/dialog2.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Unlearn-1.3B_32_2",
5 | "num_train_epochs": 13,
6 | "check_val_every_n_epoch": 13,
7 | "check_validation_only": false,
8 | "do_init_eval": false,
9 | "train_set": "data/main/lm_extraction_32_2.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | ""
21 | ],
22 | "valid_type_path": [
23 | "",
24 | "",
25 | "",
26 | ""
27 | ],
28 | "train_batch_size": 8,
29 | "eval_batch_size": 32,
30 | "gradient_accumulation_steps": 1,
31 | "ngpu": 4,
32 | "learning_rate": 5e-5,
33 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
34 | "el_threshold": 0.0499,
35 | "ma_threshold": 0.2994,
36 | "input_length": 512,
37 | "output_length": 512,
38 | "target_length": 200,
39 | "num_workers": 64,
40 | "strategy": "deepspeed_stage_2_offload",
41 | "fp16": true,
42 | "wandb_log": true
43 | }
44 |
--------------------------------------------------------------------------------
/configs/dialog_unlearn/dialog3.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Unlearn-1.3B_32_3",
5 | "num_train_epochs": 14,
6 | "check_val_every_n_epoch": 14,
7 | "check_validation_only": false,
8 | "do_init_eval": false,
9 | "train_set": "data/main/lm_extraction_32_3.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | ""
21 | ],
22 | "valid_type_path": [
23 | "",
24 | "",
25 | "",
26 | ""
27 | ],
28 | "train_batch_size": 8,
29 | "eval_batch_size": 32,
30 | "gradient_accumulation_steps": 1,
31 | "ngpu": 4,
32 | "learning_rate": 5e-5,
33 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
34 | "el_threshold": 0.0499,
35 | "ma_threshold": 0.2994,
36 | "input_length": 512,
37 | "output_length": 512,
38 | "target_length": 200,
39 | "num_workers": 64,
40 | "strategy": "deepspeed_stage_2_offload",
41 | "fp16": true,
42 | "wandb_log": true
43 | }
44 |
--------------------------------------------------------------------------------
/configs/dialog_unlearn/dialog4.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "wandb_project": "Knowledge Unlearning Dialog",
4 | "wandb_run_name": "Unlearn-1.3B_32_4",
5 | "num_train_epochs": 15,
6 | "check_val_every_n_epoch": 15,
7 | "check_validation_only": false,
8 | "do_init_eval": false,
9 | "train_set": "data/main/lm_extraction_32_4.csv",
10 | "valid_sets": [
11 | "validation_data/wizard_of_wikipedia.json",
12 | "validation_data/empathetic_dialogues.json",
13 | "validation_data/blended_skill_talk.json",
14 | "validation_data/wizard_of_internet.json"
15 | ],
16 | "valid_subset_path": [
17 | "",
18 | "",
19 | "",
20 | ""
21 | ],
22 | "valid_type_path": [
23 | "",
24 | "",
25 | "",
26 | ""
27 | ],
28 | "train_batch_size": 8,
29 | "eval_batch_size": 32,
30 | "gradient_accumulation_steps": 1,
31 | "ngpu": 4,
32 | "learning_rate": 5e-5,
33 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
34 | "el_threshold": 0.0499,
35 | "ma_threshold": 0.2994,
36 | "input_length": 512,
37 | "output_length": 512,
38 | "target_length": 200,
39 | "num_workers": 64,
40 | "strategy": "deepspeed_stage_2_offload",
41 | "fp16": true,
42 | "wandb_log": true
43 | }
44 |
--------------------------------------------------------------------------------
/configs/dp/1.3B_0.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-1.3B_0",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_0.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_0.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/1.3B_1.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-1.3B_1",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_1.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_1.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/1.3B_2.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-1.3B_2",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_2.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_2.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/1.3B_3.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-1.3B_3",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_3.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_3.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/1.3B_4.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-1.3B_4",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_4.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_4.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/1.3B_general.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-1.3B_General",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_0.csv",
12 | "valid_sets": [
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | ""
23 | ],
24 | "valid_type_path": [
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dp/125M_0.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-125M_0",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_0.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_0.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/125M_1.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-125M_1",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_1.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_1.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/125M_2.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-125M_2",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_2.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_2.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/125M_3.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-125M_3",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_3.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_3.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/125M_4.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-125M_4",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_4.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_4.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/125M_general.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-125M_General",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_0.csv",
12 | "valid_sets": [
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | ""
23 | ],
24 | "valid_type_path": [
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dp/2.7B_0.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-2.7B_0",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_0.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_0.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/2.7B_1.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-2.7B_1",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_1.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_1.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/2.7B_2.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-2.7B_2",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_2.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_2.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/2.7B_3.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-2.7B_3",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_3.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_3.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
/configs/dp/2.7B_4.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-2.7B_4",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_4.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_4.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-05,
26 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
--------------------------------------------------------------------------------
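The five 2.7B configs above (run names DP-0.2-2.7B_0 through _4, i.e. the dp privacy method with lambda_weight 0.2 on shards 0-4 of the batch-32 extraction data) are identical except for the fields that select the shard. A quick programmatic diff confirms this; the sketch below is illustrative only and assumes it is run from the repository root:

import json

# Load the five shard configs and report which keys differ between them.
configs = []
for i in range(5):
    with open(f'configs/dp/2.7B_{i}.json') as f:
        configs.append(json.load(f))

varying = {key for cfg in configs for key in cfg if cfg[key] != configs[0][key]}
print(sorted(varying))  # ['train_set', 'valid_sets', 'wandb_run_name']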
/configs/dp/2.7B_general.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "general_lm_eval",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-2.7B_General",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_0.csv",
12 | "valid_sets": [
13 | "validation_data/wizard_of_wikipedia.json",
14 | "validation_data/empathetic_dialogues.json",
15 | "validation_data/blended_skill_talk.json",
16 | "validation_data/wizard_of_internet.json"
17 | ],
18 | "valid_subset_path": [
19 | "",
20 | "",
21 | "",
22 | ""
23 | ],
24 | "valid_type_path": [
25 | "",
26 | "",
27 | "",
28 | ""
29 | ],
30 | "train_batch_size": 32,
31 | "eval_batch_size": 32,
32 | "gradient_accumulation_steps": 1,
33 | "ngpu": 1,
34 | "learning_rate": 5e-5,
35 | "model_name_or_path": "EleutherAI/gpt-neo-2.7B",
36 | "el_threshold": 0.0499,
37 | "ma_threshold": 0.2994,
38 | "input_length": 512,
39 | "output_length": 512,
40 | "target_length": 200,
41 | "num_workers": 64,
42 | "strategy": "deepspeed_stage_2_offload",
43 | "fp16": true,
44 | "wandb_log": true
45 | }
46 |
--------------------------------------------------------------------------------
/configs/dp/create_configs.py:
--------------------------------------------------------------------------------
1 | import json
2 | 
3 | # Model size and data-shard batch size used to name the generated configs.
4 | model = '2.7B'
5 | batch = 32
6 | 
7 | # Load the shared template once; every per-shard config is derived from it.
8 | with open('/home/lklab/knowledge-unlearning/configs/dp/template.json') as f:
9 |     data = json.load(f)
10 | 
11 | # Write one config per data shard (0-4), varying only the run name,
12 | # the model checkpoint, and the train/validation split.
13 | for i in range(5):
14 |     data['wandb_run_name'] = f'DP-0.2-{model}_{i}'
15 |     data['model_name_or_path'] = f'EleutherAI/gpt-neo-{model}'
16 |     data['train_set'] = f'data/main/lm_extraction_{batch}_{i}.csv'
17 |     data['valid_sets'][0] = f'data/main/lm_extraction_{batch}_{i}.csv'
18 |     with open(f'/home/lklab/knowledge-unlearning/configs/dp/{model}_{i}.json', 'w') as fp:
19 |         json.dump(data, fp, indent=4)
--------------------------------------------------------------------------------
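create_configs.py fills the shared template once per data shard, rewriting only the run name, the model checkpoint, and the data paths, then writes the result as <model>_<i>.json. The absolute /home/lklab/... paths tie it to a single machine. A hypothetical repo-root-relative variant that also covers the other two model sizes present in this directory (125M, 1.3B) might look like this; it is a sketch, not part of the repo:

import json

for model in ['125M', '1.3B', '2.7B']:
    # Start from a fresh copy of the template for each model size.
    with open('configs/dp/template.json') as f:
        data = json.load(f)
    for i in range(5):
        data['wandb_run_name'] = f'DP-0.2-{model}_{i}'
        data['model_name_or_path'] = f'EleutherAI/gpt-neo-{model}'
        data['train_set'] = f'data/main/lm_extraction_32_{i}.csv'
        data['valid_sets'][0] = f'data/main/lm_extraction_32_{i}.csv'
        with open(f'configs/dp/{model}_{i}.json', 'w') as fp:
            json.dump(data, fp, indent=4)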
/configs/dp/template.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "privacy_method": "dp",
4 | "wandb_project": "Knowledge Unlearning Dialog",
5 | "wandb_run_name": "DP-0.2-1.3B_0",
6 | "lambda_weight": 0.2,
7 | "num_train_epochs": 20,
8 | "check_val_every_n_epoch": 1,
9 | "check_validation_only": true,
10 | "do_init_eval": true,
11 | "train_set": "data/main/lm_extraction_32_0.csv",
12 | "valid_sets": [
13 | "data/main/lm_extraction_32_0.csv"
14 | ],
15 | "valid_subset_path": [
16 | ""
17 | ],
18 | "valid_type_path": [
19 | "target"
20 | ],
21 | "train_batch_size": 32,
22 | "eval_batch_size": 32,
23 | "gradient_accumulation_steps": 1,
24 | "ngpu": 1,
25 | "learning_rate": 5e-5,
26 | "model_name_or_path": "EleutherAI/gpt-neo-1.3B",
27 | "el_threshold": 0.0499,
28 | "ma_threshold": 0.2994,
29 | "input_length": 512,
30 | "output_length": 512,
31 | "target_length": 200,
32 | "num_workers": 64,
33 | "strategy": "deepspeed_stage_2_offload",
34 | "fp16": true,
35 | "wandb_log": true
36 | }
37 |
--------------------------------------------------------------------------------
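Every config above carries el_threshold 0.0499 and ma_threshold 0.2994. In the knowledge-unlearning setup these are typically the average extraction likelihood (EL) and memorization accuracy (MA) measured on data the model has never seen, and unlearning a batch counts as done once both metrics on the forget set drop below them. A hypothetical check of that stopping rule (the actual logic lives in the training code, not here):

def unlearning_done(el: float, ma: float,
                    el_threshold: float = 0.0499,
                    ma_threshold: float = 0.2994) -> bool:
    # Both memorization metrics must fall below their unseen-data
    # baselines (threshold values taken from the configs above).
    return el < el_threshold and ma < ma_threshold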
/configs/example.json:
--------------------------------------------------------------------------------
1 | {
2 | "mode": "unlearn",
3 | "wandb_project": "Knowledge Unlearning",
4 | "wandb_run_name": "example",
5 | "num_train_epochs": 20,
6 | "check_val_every_n_epoch": 1,
7 | "check_validation_only": false,
8 | "do_init_eval": true,
9 | "train_set": "data/main/lm_extraction_32_0.csv",
10 | "valid_sets": [
11 | "data/main/lm_extraction_32_0.csv",
12 | "validation_data/lambada.csv",
13 | "piqa",
14 | "hellaswag",
15 | "ai2_arc",
16 | "ai2_arc",
17 | "super_glue",
18 | "winogrande",
19 | "math_qa",
20 | "validation_data/pubmed_qa.csv"
21 | ],
22 | "valid_subset_path": [
23 | "",
24 | "",
25 | "",
26 | "",
27 | "ARC-Easy",
28 | "ARC-Challenge",
29 | "copa",
30 | "winogrande_s",
31 | "",
32 | ""
33 | ],
34 | "valid_type_path": [
35 | "target",
36 | "test",
37 | "validation",
38 | "validation",
39 | "validation",
40 | "validation",
41 | "validation",
42 | "validation",
43 | "validation",
44 | ""
45 | ],
46 | "train_batch_size": 8,
47 | "eval_batch_size": 8,
48 | "gradient_accumulation_steps": 4,
49 | "ngpu": 1,
50 | "learning_rate": 5e-5,
51 | "model_name_or_path": "EleutherAI/gpt-neo-125M",
52 | "el_threshold": 0.0499,
53 | "ma_threshold": 0.2994,
54 | "input_length": 512,
55 | "output_length": 512,
56 | "target_length": 200,
57 | "num_workers": 64,
58 | "strategy": "deepspeed_stage_2_offload",
59 | "fp16": true,
60 | "wandb_log": true
61 | }
62 |
--------------------------------------------------------------------------------
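In example.json, as in the other configs, valid_sets, valid_subset_path, and valid_type_path are parallel arrays: entry i of each describes the i-th validation set (e.g. ai2_arc at index 4 pairs with subset ARC-Easy), with empty strings standing in for "no subset" or "default split". A config is malformed if the lengths disagree, so a load-time check along these lines is useful; this is a sketch of such a check, not the validation run.py actually performs:

import json

def load_config(path):
    # Load a run config and check that the validation arrays line up.
    with open(path) as f:
        cfg = json.load(f)
    n = len(cfg['valid_sets'])
    for key in ('valid_subset_path', 'valid_type_path'):
        if len(cfg[key]) != n:
            raise ValueError(f'{key} has {len(cfg[key])} entries, expected {n}')
    return cfg

cfg = load_config('configs/example.json')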
/csv_out/Dialog Initial.csv:
--------------------------------------------------------------------------------
1 | ,wizard_of_wikipedia/loss,wizard_of_wikipedia/f1,empathetic_dialogues/loss,empathetic_dialogues/f1,blended_skill_talk/loss,blended_skill_talk/f1,wizard_of_internet/loss,wizard_of_internet/f1
2 | 0,4.13671875,0.07524767518043518,3.724609375,0.08438178896903992,3.865234375,0.11232497543096542,3.83203125,0.1023608073592186
3 |
--------------------------------------------------------------------------------
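Dialog Initial.csv stores the pre-unlearning dialog metrics in wide form: one <dataset>/loss and one <dataset>/f1 column per validation set, and a single row of initial values. For inspection it can help to reshape it into one row per dataset; a minimal sketch, assuming pandas is available:

import pandas as pd

# Read the wide metrics table (first column is an unnamed row index).
df = pd.read_csv('csv_out/Dialog Initial.csv', index_col=0)

# Split '<dataset>/<metric>' column names and pivot to one row per
# dataset with 'loss' and 'f1' columns.
pairs = [tuple(col.rsplit('/', 1)) for col in df.columns]
tidy = df.iloc[0].copy()
tidy.index = pd.MultiIndex.from_tuples(pairs)
print(tidy.unstack())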
/data/domain_main/enron_emails_8_1.csv:
--------------------------------------------------------------------------------
1 | doc_id,corpus,text
2 | 2809,enron_emails,",
3 | ?
4 | You have been selected to participate in the Mid Year 2001 Performance
5 | Management process. Your feedback plays an important role in the process,
6 | and your participation is critical to the success of Enron's Performance
7 | Management goals.
8 | ?
9 | To complete a request for feedback, access PEP at http://pep.enron.com and
10 | select Complete Feedback from the Main Menu. You may begin providing
11 | feedback immediately and are requested to have all feedback forms completed
12 | by Friday, May 25, 2001.
13 | ?
14 | If you have any questions regarding PEP or your responsibility in the
15 | process, please contact the PEP Help Desk at:
16 | Houston: 1.713.853.4777, Option 4 or email: perfmgmt@enron.com
17 | London: 44.207.783.4040, Option 4 or email: pep.enquiries@enron."
18 | 2128,enron_emails," This e-mail is the property of Enron Corp. and/or its relevant
19 | > affiliate and may contain confidential and privileged material for the
20 | > sole use of the intended recipient (s). Any review, use, distribution or
21 | > disclosure by others is strictly prohibited. If you are not the intended
22 | > recipient (or authorized to receive for the recipient), please contact
23 | > the sender or reply to Enron Corp. at
24 | > enron.messaging.administration@enron.com and delete all copies of the
25 | > message. This e-mail (and any attachments hereto) are not intended to be
26 | > an offer (or an acceptance) and do not create or evidence a binding and
27 | > enforceable contract between Enron Corp. (or any of its affiliates) and
28 | > the intended recipient or any other party, and may not be relied on by
29 | > anyone as the basis of a contract by estoppel or otherwise. Thank you"
30 | 8151,enron_emails,"B
31 |
32 | Andrew B. Brown
33 | Ellison, Schneider & Harris, LLP
34 | 2015 H Street
35 | Sacramento, CA 95814
36 | Phone: (916) 447-2166
37 | Fax: (916) 447-3512
38 | mailto:abb@eslawfirm.com
39 |
40 | CONFIDENTIALITY NOTICE: This communication and any accompanying document(s)
41 | are confidential and privileged. They are intended for the sole use of the
42 | addressee. If you receive this transmission in error, you are advised that
43 | any disclosure, copying, distribution, or the taking of any action in
44 | reliance upon the communication is strictly prohibited. Moreover, any such
45 | inadvertent disclosure shall not compromise or waive the attorney-client
46 | privilege as to this communication or otherwise. If you have received this
47 | communication in error, please contact the sender at the internet address
48 | indicated or by telephone at (916)"
49 | 416,enron_emails,"
50 |
51 | To: Karen Lambert/HOU/ECT@ECT, Tana Jones/HOU/ECT@ECT, Samuel
52 | Schott/HOU/ECT@ECT, Sheri Thomas/HOU/ECT@ECT, Mark Taylor/HOU/ECT@ECT,
53 | Bernice Rodriguez/HOU/ECT@ECT, Brant Reves/HOU/ECT@ECT, Debbie R
54 | Brackett/HOU/ECT@ECT, David Hardy/LON/ECT@ECT, Lesli Campbell/HOU/ECT@ECT,
55 | Molly Harris/HOU/ECT@ECT, Cynthia Clark/Corp/Enron@ENRON, Mary G
56 | Gosnell/HOU/ECT@ECT, Enron Europe Global Contracts and Facilities, Enron
57 | Europe Global CounterParty, Stephanie Sever/HOU/ECT@ECT, Bradley
58 | Diebner/HOU/ECT@ECT, Stacey Richardson/HOU/ECT@"
59 | 5275,enron_emails,"
60 | ------------------------------------------------------------------------------
61 | NEW E-MAIL ADDRESSES AT PAUL, HASTINGS, JANOFSKY & WALKER LLP
62 |
63 | We have changed our e-mail address. Our new domain name is
64 | paulhastings.com. In most cases, our address is composed of
65 | conventional first name and last name plus @paulhastings.com. Here are
66 | two examples: janesmith@paulhastings.com and danjones@paulhastings.com.
67 | If you have any questions, please contact us at noc@paulhastings.com.
68 |
69 | ==============================================================================
70 | ""The information transmitted is intended only for the person or entity
71 | to which it is addressed and may contain confidential and/or privileged
72 | material. Any review, retransmission, dissemination or other use of, or
73 | taking of any action in reliance upon, this information by persons"
74 | 9293,enron_emails,".fantasy.sportsline.com/mp/options-ereports?league=ene&owner=45547.3>click here
75 |
76 |
NFL Reports, Player Updates |