├── .copyright.tmpl
├── .pre-commit-config.yaml
├── CODEOWNERS
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING-ARCHIVED.md
├── LICENSE.txt
├── README.md
├── SECURITY.md
├── data
│   ├── Beauty.txt
│   ├── README.md
│   ├── Sports_and_Outdoors.txt
│   ├── Toys_and_Games.txt
│   ├── Yelp.txt
│   └── ml-1m.txt
├── img
│   ├── model.png
│   └── motivation_sports.png
└── src
    ├── data_augmentation.py
    ├── datasets.py
    ├── main.py
    ├── models.py
    ├── modules.py
    ├── output
    │   ├── ICLRec-Beauty-1.pt
    │   ├── ICLRec-Sports_and_Outdoors-1.pt
    │   ├── ICLRec-Toys_and_Games-1.pt
    │   ├── ICLRec-Yelp-1.pt
    │   ├── ICLRec-ml-1m-1.pt
    │   └── README.md
    ├── scripts
    │   ├── run_beauty.sh
    │   ├── run_ml_1m.sh
    │   ├── run_sports.sh
    │   ├── run_toys.sh
    │   └── run_yelp.sh
    ├── trainers.py
    └── utils.py
/.copyright.tmpl:
--------------------------------------------------------------------------------
1 | Copyright (c) ${years} ${owner}
2 | All rights reserved.
3 | SPDX-License-Identifier: BSD-3-Clause
4 | For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
5 |
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | repos:
2 | - repo: https://github.com/psf/black
3 | rev: '19.3b0'
4 | hooks:
5 | - id: black
6 | args: ["--line-length", "120"]
7 | - repo: https://github.com/johann-petrak/licenseheaders.git
8 | rev: 'v0.8.8'
9 | hooks:
10 | - id: licenseheaders
11 | args: ["-t", ".copyright.tmpl", "-cy", "-o", "salesforce.com, inc.",
12 | "-E", ".py", "-f"]
--------------------------------------------------------------------------------
/CODEOWNERS:
--------------------------------------------------------------------------------
1 | # Comment line immediately above ownership line is reserved for related other information. Please be careful while editing.
2 | #ECCN:Open Source
3 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Salesforce Open Source Community Code of Conduct
2 |
3 | ## About the Code of Conduct
4 |
5 | Equality is a core value at Salesforce. We believe a diverse and inclusive
6 | community fosters innovation and creativity, and are committed to building a
7 | culture where everyone feels included.
8 |
9 | Salesforce open-source projects are committed to providing a friendly, safe, and
10 | welcoming environment for all, regardless of gender identity and expression,
11 | sexual orientation, disability, physical appearance, body size, ethnicity, nationality,
12 | race, age, religion, level of experience, education, socioeconomic status, or
13 | other similar personal characteristics.
14 |
15 | The goal of this code of conduct is to specify a baseline standard of behavior so
16 | that people with different social values and communication styles can work
17 | together effectively, productively, and respectfully in our open source community.
18 | It also establishes a mechanism for reporting issues and resolving conflicts.
19 |
20 | All questions and reports of abusive, harassing, or otherwise unacceptable behavior
21 | in a Salesforce open-source project may be reported by contacting the Salesforce
22 | Open Source Conduct Committee at ossconduct@salesforce.com.
23 |
24 | ## Our Pledge
25 |
26 | In the interest of fostering an open and welcoming environment, we as
27 | contributors and maintainers pledge to making participation in our project and
28 | our community a harassment-free experience for everyone, regardless of gender
29 | identity and expression, sexual orientation, disability, physical appearance,
30 | body size, ethnicity, nationality, race, age, religion, level of experience, education,
31 | socioeconomic status, or other similar personal characteristics.
32 |
33 | ## Our Standards
34 |
35 | Examples of behavior that contributes to creating a positive environment
36 | include:
37 |
38 | * Using welcoming and inclusive language
39 | * Being respectful of differing viewpoints and experiences
40 | * Gracefully accepting constructive criticism
41 | * Focusing on what is best for the community
42 | * Showing empathy toward other community members
43 |
44 | Examples of unacceptable behavior by participants include:
45 |
46 | * The use of sexualized language or imagery and unwelcome sexual attention or
47 | advances
48 | * Personal attacks, insulting/derogatory comments, or trolling
49 | * Public or private harassment
50 | * Publishing, or threatening to publish, others' private information—such as
51 | a physical or electronic address—without explicit permission
52 | * Other conduct which could reasonably be considered inappropriate in a
53 | professional setting
54 | * Advocating for or encouraging any of the above behaviors
55 |
56 | ## Our Responsibilities
57 |
58 | Project maintainers are responsible for clarifying the standards of acceptable
59 | behavior and are expected to take appropriate and fair corrective action in
60 | response to any instances of unacceptable behavior.
61 |
62 | Project maintainers have the right and responsibility to remove, edit, or
63 | reject comments, commits, code, wiki edits, issues, and other contributions
64 | that are not aligned with this Code of Conduct, or to ban temporarily or
65 | permanently any contributor for other behaviors that they deem inappropriate,
66 | threatening, offensive, or harmful.
67 |
68 | ## Scope
69 |
70 | This Code of Conduct applies both within project spaces and in public spaces
71 | when an individual is representing the project or its community. Examples of
72 | representing a project or community include using an official project email
73 | address, posting via an official social media account, or acting as an appointed
74 | representative at an online or offline event. Representation of a project may be
75 | further defined and clarified by project maintainers.
76 |
77 | ## Enforcement
78 |
79 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
80 | reported by contacting the Salesforce Open Source Conduct Committee
81 | at ossconduct@salesforce.com. All complaints will be reviewed and investigated
82 | and will result in a response that is deemed necessary and appropriate to the
83 | circumstances. The committee is obligated to maintain confidentiality with
84 | regard to the reporter of an incident. Further details of specific enforcement
85 | policies may be posted separately.
86 |
87 | Project maintainers who do not follow or enforce the Code of Conduct in good
88 | faith may face temporary or permanent repercussions as determined by other
89 | members of the project's leadership and the Salesforce Open Source Conduct
90 | Committee.
91 |
92 | ## Attribution
93 |
94 | This Code of Conduct is adapted from the [Contributor Covenant][contributor-covenant-home],
95 | version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html.
96 | It includes adaptations and additions from [Go Community Code of Conduct][golang-coc],
97 | [CNCF Code of Conduct][cncf-coc], and [Microsoft Open Source Code of Conduct][microsoft-coc].
98 |
99 | This Code of Conduct is licensed under the [Creative Commons Attribution 3.0 License][cc-by-3-us].
100 |
101 | [contributor-covenant-home]: https://www.contributor-covenant.org (https://www.contributor-covenant.org/)
102 | [golang-coc]: https://golang.org/conduct
103 | [cncf-coc]: https://github.com/cncf/foundation/blob/master/code-of-conduct.md
104 | [microsoft-coc]: https://opensource.microsoft.com/codeofconduct/
105 | [cc-by-3-us]: https://creativecommons.org/licenses/by/3.0/us/
106 |
--------------------------------------------------------------------------------
/CONTRIBUTING-ARCHIVED.md:
--------------------------------------------------------------------------------
1 | # ARCHIVED
2 |
3 | This project is `Archived` and is no longer actively maintained;
4 | we are not accepting contributions or pull requests.
5 |
6 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2021, Salesforce.com, Inc.
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
7 |
8 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
9 |
10 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
11 |
12 | 3. Neither the name of Salesforce.com nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
13 |
14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
15 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Intent Contrastive Learning for Sequential Recommendation (ICLRec)
2 |
3 | Source code for paper: [Intent Contrastive Learning for Sequential Recommendation](https://arxiv.org/pdf/2202.02519.pdf)
4 |
5 | ## Introduction
6 |
7 | Motivation:
8 |
9 | Users' interactions with items are driven by various underlying intents. These intents are often unobservable, yet modeling them can help learn better user preferences over a massive item set.
10 |
11 | ![motivation](./img/motivation_sports.png)
12 |
13 | Model Architecture:
14 |
15 | ![model architecture](./img/model.png)
16 |
17 | ## Reference
18 |
19 | Please cite our paper if you use this code.
20 |
21 | ```
22 | @article{chen2022intent,
23 | title={Intent Contrastive Learning for Sequential Recommendation},
24 | author={Chen, Yongjun and Liu, Zhiwei and Li, Jia and McAuley, Julian and Xiong, Caiming},
25 | journal={arXiv preprint arXiv:2202.02519},
26 | year={2022}
27 | }
28 | ```
29 |
30 | ## Implementation
31 | ### Requirements
32 |
33 | Python >= 3.7
34 | PyTorch >= 1.2.0
35 | tqdm == 4.26.0
36 | faiss-gpu == 1.7.1
37 |
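38 | One way to install the dependencies (a sketch; `faiss-gpu` assumes a CUDA-capable machine):
39 |
40 | ```
41 | pip install "torch>=1.2.0" tqdm==4.26.0 faiss-gpu==1.7.1
42 | ```
43 |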
38 | ### Datasets
39 |
40 | Five preprocessed datasets (Beauty, Sports_and_Outdoors, Toys_and_Games, Yelp, and ml-1m) are included in the `data` folder.
41 |
42 |
43 | ### Evaluate Model
44 |
45 | We provide trained models on the Beauty, Sports_and_Outdoors, Toys_and_Games, Yelp, and ml-1m datasets in the `./src/output` folder. You can directly evaluate a trained model on the test set by changing to the `src` folder and running:
46 |
47 | ```
48 | python main.py --data_name <data_name> --model_idx 1 --do_eval
49 | ```
50 |
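51 | where `<data_name>` is one of `Beauty`, `Sports_and_Outdoors`, `Toys_and_Games`, `Yelp`, or `ml-1m`, matching the checkpoints in `./src/output`; for example, `python main.py --data_name Beauty --model_idx 1 --do_eval`.
52 |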
51 | You should expect the following results:
52 |
53 | On Beauty:
54 | ```
55 | {'Epoch': 0, 'HIT@5': '0.0500', 'NDCG@5': '0.0326', 'HIT@10': '0.0744', 'NDCG@10': '0.0403', 'HIT@20': '0.1058', 'NDCG@20': '0.0483'}
56 | ```
57 | On Sports:
58 | ```
59 | {'Epoch': 0, 'HIT@5': '0.0290', 'NDCG@5': '0.0191', 'HIT@10': '0.0437', 'NDCG@10': '0.0238', 'HIT@20': '0.0646', 'NDCG@20': '0.0291'}
60 | ```
61 | On Toys:
62 |
63 | ```
64 | {'Epoch': 0, 'HIT@5': '0.0598', 'NDCG@5': '0.0404', 'HIT@10': '0.0834', 'NDCG@10': '0.0480', 'HIT@20': '0.1138', 'NDCG@20': '0.0557'}
65 | ```
66 |
67 | On Yelp:
68 | ```
69 | {'Epoch': 0, 'HIT@5': '0.0240', 'NDCG@5': '0.0153', 'HIT@10': '0.0409', 'NDCG@10': '0.0207', 'HIT@20': '0.0659', 'NDCG@20': '0.0270'}
70 | ```
71 |
72 | Please feel free to test it out!
73 |
74 |
75 | ### Train Model
76 |
77 | To train ICLRec on a specific dataset, change to the `src` folder and run the following command:
78 |
79 | ```
80 | bash scripts/run_<dataset>.sh
81 | ```
82 |
83 | The script trains ICLRec, saves the best model found on the validation set, and then evaluates it on the test set.
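84 |
85 | For example, to train on Beauty without the wrapper script (a sketch: these flags are all defined in `main.py`, but the tuned hyperparameters live in `scripts/run_beauty.sh`):
86 |
87 | ```
88 | python main.py --data_name Beauty --model_idx 1 --contrast_type Hybrid --num_intent_clusters 256 --gpu_id 0
89 | ```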
90 |
91 | ## Acknowledgment
92 | - Transformer and training pipeline are implemented based on [S3-Rec](https://github.com/RUCAIBox/CIKM2020-S3Rec). Thanks to them for providing an efficient implementation.
93 |
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
1 | ## Security
2 |
3 | Please report any security issue to [security@salesforce.com](mailto:security@salesforce.com)
4 | as soon as it is discovered. This library limits its runtime dependencies in
5 | order to reduce the total cost of ownership as much as possible, but all consumers
6 | should remain vigilant and have their security stakeholders review all third-party
7 | products (3PP) like this one and their dependencies.
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | ## Datasets
2 |
3 | We provide five preprocessed datasets: Beauty, Sports_and_Outdoors, Toys_and_Games, Yelp, and ml-1m.
4 |
5 | The first three datasets are originally from [here](http://jmcauley.ucsd.edu/data/amazon/index.html).
6 |
7 | Cite one or both of the following if you use them:
8 |
9 | ```
10 | @inproceedings{he2016ups,
11 | title={Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering},
12 | author={He, Ruining and McAuley, Julian},
13 |   booktitle={Proceedings of the 25th International Conference on World Wide Web},
14 | pages={507--517},
15 | year={2016}
16 | }
17 | ```
18 | and
19 | ```
20 | @inproceedings{mcauley2015image,
21 | title={Image-based recommendations on styles and substitutes},
22 | author={McAuley, Julian and Targett, Christopher and Shi, Qinfeng and Van Den Hengel, Anton},
23 |   booktitle={Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval},
24 | pages={43--52},
25 | year={2015}
26 | }
27 | ```
28 |
29 | The Yelp dataset is from https://www.yelp.com/dataset.
30 |
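31 | The ml-1m dataset is MovieLens-1M (https://grouplens.org/datasets/movielens/1m/).
32 |
33 | Each line of a dataset file stores one user's interaction sequence: a user id followed by the
34 | space-separated ids of the items the user interacted with, in chronological order (this is the
35 | format parsed by `_load_train_data` in `src/models.py`). A minimal loading sketch (the function
36 | name here is illustrative):
37 |
38 | ```python
39 | def load_sequences(path):
40 |     """Map each user id to their chronological item-id sequence."""
41 |     user_seqs = {}
42 |     with open(path) as f:
43 |         for line in f:
44 |             user_id, items = line.strip().split(" ", 1)
45 |             user_seqs[user_id] = [int(i) for i in items.split(" ")]
46 |     return user_seqs
47 | ```
48 |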
--------------------------------------------------------------------------------
/img/model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/img/model.png
--------------------------------------------------------------------------------
/img/motivation_sports.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/img/motivation_sports.png
--------------------------------------------------------------------------------
/src/data_augmentation.py:
--------------------------------------------------------------------------------
1 | #
2 | # Copyright (c) 2022 salesforce.com, inc.
3 | # All rights reserved.
4 | # SPDX-License-Identifier: BSD-3-Clause
5 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
6 | #
7 | import random
8 | import copy
9 | import itertools
10 |
11 |
12 | class Random(object):
13 |     """Randomly pick one data augmentation type on each call"""
14 |
15 | def __init__(self, tao=0.2, gamma=0.7, beta=0.2):
16 | self.data_augmentation_methods = [Crop(tao=tao), Mask(gamma=gamma), Reorder(beta=beta)]
17 | print("Total augmentation numbers: ", len(self.data_augmentation_methods))
18 |
19 | def __call__(self, sequence):
20 | # randint generate int x in range: a <= x <= b
21 | augment_method_idx = random.randint(0, len(self.data_augmentation_methods) - 1)
22 | augment_method = self.data_augmentation_methods[augment_method_idx]
23 | # print(augment_method.__class__.__name__) # debug usage
24 | return augment_method(sequence)
25 |
26 |
27 | def _ensmeble_sim_models(top_k_one, top_k_two):
28 | # only support top k = 1 case so far
29 | # print("offline: ",top_k_one, "online: ", top_k_two)
30 | if top_k_one[0][1] >= top_k_two[0][1]:
31 | return [top_k_one[0][0]]
32 | else:
33 |         return [top_k_two[0][0]]
34 |
35 |
36 | class Crop(object):
37 | """Randomly crop a subseq from the original sequence"""
38 |
39 | def __init__(self, tao=0.2):
40 | self.tao = tao
41 |
42 | def __call__(self, sequence):
43 |         # make a deep copy to avoid modifying the original sequence
44 | copied_sequence = copy.deepcopy(sequence)
45 | sub_seq_length = int(self.tao * len(copied_sequence))
46 | # randint generate int x in range: a <= x <= b
47 | start_index = random.randint(0, len(copied_sequence) - sub_seq_length - 1)
48 | if sub_seq_length < 1:
49 | return [copied_sequence[start_index]]
50 | else:
51 | cropped_seq = copied_sequence[start_index : start_index + sub_seq_length]
52 | return cropped_seq
53 |
54 |
55 | class Mask(object):
56 | """Randomly mask k items given a sequence"""
57 |
58 | def __init__(self, gamma=0.7):
59 | self.gamma = gamma
60 |
61 | def __call__(self, sequence):
62 |         # make a deep copy to avoid modifying the original sequence
63 | copied_sequence = copy.deepcopy(sequence)
64 | mask_nums = int(self.gamma * len(copied_sequence))
65 | mask = [0 for i in range(mask_nums)]
66 | mask_idx = random.sample([i for i in range(len(copied_sequence))], k=mask_nums)
67 | for idx, mask_value in zip(mask_idx, mask):
68 | copied_sequence[idx] = mask_value
69 | return copied_sequence
70 |
71 |
72 | class Reorder(object):
73 | """Randomly shuffle a continuous sub-sequence"""
74 |
75 | def __init__(self, beta=0.2):
76 | self.beta = beta
77 |
78 | def __call__(self, sequence):
79 |         # make a deep copy to avoid modifying the original sequence
80 | copied_sequence = copy.deepcopy(sequence)
81 | sub_seq_length = int(self.beta * len(copied_sequence))
82 | start_index = random.randint(0, len(copied_sequence) - sub_seq_length - 1)
83 | sub_seq = copied_sequence[start_index : start_index + sub_seq_length]
84 | random.shuffle(sub_seq)
85 | reordered_seq = copied_sequence[:start_index] + sub_seq + copied_sequence[start_index + sub_seq_length :]
86 | assert len(copied_sequence) == len(reordered_seq)
87 | return reordered_seq
88 |
89 |
90 | if __name__ == "__main__":
91 | reorder = Reorder(beta=0.2)
92 | sequence = [
93 | 14052,
94 | 10908,
95 | 2776,
96 | 16243,
97 | 2726,
98 | 2961,
99 | 11962,
100 | 4672,
101 | 2224,
102 | 5727,
103 | 4985,
104 | 9310,
105 | 2181,
106 | 3428,
107 | 4156,
108 | 16536,
109 | 180,
110 | 12044,
111 | 13700,
112 | ]
113 |     rs = reorder(sequence)
114 |     print("reordered:", rs)
115 |     crop = Crop(tao=0.2)
116 |     rs = crop(sequence)
117 |     print("cropped:", rs)
118 |     # demo the Random wrapper, which picks one augmentation per call
119 |     random_aug = Random(tao=0.2, gamma=0.7, beta=0.2)
120 |     for _ in range(5):
121 |         print("randomly augmented:", random_aug(sequence))
124 |
--------------------------------------------------------------------------------
/src/datasets.py:
--------------------------------------------------------------------------------
1 | #
2 | # Copyright (c) 2022 salesforce.com, inc.
3 | # All rights reserved.
4 | # SPDX-License-Identifier: BSD-3-Clause
5 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
6 | #
7 | import random
8 | import torch
9 | from torch.utils.data import Dataset
10 |
11 | from data_augmentation import Crop, Mask, Reorder, Random
12 | from utils import neg_sample, nCr
13 | import copy
14 |
15 |
16 | class RecWithContrastiveLearningDataset(Dataset):
17 | def __init__(self, args, user_seq, test_neg_items=None, data_type="train", similarity_model_type="offline"):
18 | self.args = args
19 | self.user_seq = user_seq
20 | self.test_neg_items = test_neg_items
21 | self.data_type = data_type
22 | self.max_len = args.max_seq_length
23 |         # currently applies a single transform; may be extended to multiple transforms
24 | self.augmentations = {
25 | "crop": Crop(tao=args.tao),
26 | "mask": Mask(gamma=args.gamma),
27 | "reorder": Reorder(beta=args.beta),
28 | "random": Random(tao=args.tao, gamma=args.gamma, beta=args.beta),
29 | }
30 | if self.args.augment_type not in self.augmentations:
31 |             raise ValueError(f"augmentation type: '{self.args.augment_type}' is invalid")
32 | print(f"Creating Contrastive Learning Dataset using '{self.args.augment_type}' data augmentation")
33 | self.base_transform = self.augmentations[self.args.augment_type]
34 |         # number of augmented views for each sequence; currently supports two
35 | self.n_views = self.args.n_views
36 |
37 | def _one_pair_data_augmentation(self, input_ids):
38 | """
39 | provides two positive samples given one sequence
40 | """
41 | augmented_seqs = []
42 | for i in range(2):
43 | augmented_input_ids = self.base_transform(input_ids)
44 | pad_len = self.max_len - len(augmented_input_ids)
45 | augmented_input_ids = [0] * pad_len + augmented_input_ids
46 |
47 | augmented_input_ids = augmented_input_ids[-self.max_len :]
48 |
49 | assert len(augmented_input_ids) == self.max_len
50 |
51 | cur_tensors = torch.tensor(augmented_input_ids, dtype=torch.long)
52 | augmented_seqs.append(cur_tensors)
53 | return augmented_seqs
54 |
55 | def _process_sequence_label_signal(self, seq_label_signal):
56 | seq_class_label = torch.tensor(seq_label_signal, dtype=torch.long)
57 | return seq_class_label
58 |
59 | def _data_sample_rec_task(self, user_id, items, input_ids, target_pos, answer):
60 |         # make a deep copy to avoid modifying the original sequence
61 | copied_input_ids = copy.deepcopy(input_ids)
62 | target_neg = []
63 | seq_set = set(items)
64 | for _ in copied_input_ids:
65 | target_neg.append(neg_sample(seq_set, self.args.item_size))
66 |
67 | pad_len = self.max_len - len(copied_input_ids)
68 | copied_input_ids = [0] * pad_len + copied_input_ids
69 | target_pos = [0] * pad_len + target_pos
70 | target_neg = [0] * pad_len + target_neg
71 |
72 | copied_input_ids = copied_input_ids[-self.max_len :]
73 | target_pos = target_pos[-self.max_len :]
74 | target_neg = target_neg[-self.max_len :]
75 |
76 | assert len(copied_input_ids) == self.max_len
77 | assert len(target_pos) == self.max_len
78 | assert len(target_neg) == self.max_len
79 |
80 | if self.test_neg_items is not None:
81 |             test_samples = self.test_neg_items[user_id]
82 |
83 | cur_rec_tensors = (
84 | torch.tensor(user_id, dtype=torch.long), # user_id for testing
85 | torch.tensor(copied_input_ids, dtype=torch.long),
86 | torch.tensor(target_pos, dtype=torch.long),
87 | torch.tensor(target_neg, dtype=torch.long),
88 | torch.tensor(answer, dtype=torch.long),
89 | torch.tensor(test_samples, dtype=torch.long),
90 | )
91 | else:
92 | cur_rec_tensors = (
93 | torch.tensor(user_id, dtype=torch.long), # user_id for testing
94 | torch.tensor(copied_input_ids, dtype=torch.long),
95 | torch.tensor(target_pos, dtype=torch.long),
96 | torch.tensor(target_neg, dtype=torch.long),
97 | torch.tensor(answer, dtype=torch.long),
98 | )
99 |
100 | return cur_rec_tensors
101 |
102 | def _add_noise_interactions(self, items):
103 | copied_sequence = copy.deepcopy(items)
104 | insert_nums = max(int(self.args.noise_ratio * len(copied_sequence)), 0)
105 | if insert_nums == 0:
106 | return copied_sequence
107 | insert_idx = random.choices([i for i in range(len(copied_sequence))], k=insert_nums)
108 | inserted_sequence = []
109 | for index, item in enumerate(copied_sequence):
110 | if index in insert_idx:
111 | item_id = random.randint(1, self.args.item_size - 2)
112 | while item_id in copied_sequence:
113 | item_id = random.randint(1, self.args.item_size - 2)
114 | inserted_sequence += [item_id]
115 | inserted_sequence += [item]
116 | return inserted_sequence
117 |
118 | def __getitem__(self, index):
119 | user_id = index
120 | items = self.user_seq[index]
121 |
122 | assert self.data_type in {"train", "valid", "test"}
123 |
124 | # [0, 1, 2, 3, 4, 5, 6]
125 | # train [0, 1, 2, 3]
126 | # target [1, 2, 3, 4]
127 | if self.data_type == "train":
128 | input_ids = items[:-3]
129 | target_pos = items[1:-2]
130 | seq_label_signal = items[-2]
131 | answer = [0] # no use
132 | elif self.data_type == "valid":
133 | input_ids = items[:-2]
134 | target_pos = items[1:-1]
135 | answer = [items[-2]]
136 |
137 | else:
138 | items_with_noise = self._add_noise_interactions(items)
139 | input_ids = items_with_noise[:-1]
140 | target_pos = items_with_noise[1:]
141 | answer = [items_with_noise[-1]]
142 | if self.data_type == "train":
143 | cur_rec_tensors = self._data_sample_rec_task(user_id, items, input_ids, target_pos, answer)
144 | cf_tensors_list = []
145 |         # if n_views == 2, this reduces to pair-wise contrastive learning
146 | total_augmentaion_pairs = nCr(self.n_views, 2)
147 | for i in range(total_augmentaion_pairs):
148 | cf_tensors_list.append(self._one_pair_data_augmentation(input_ids))
149 |
150 | # add supervision of sequences
151 | seq_class_label = self._process_sequence_label_signal(seq_label_signal)
152 | return (cur_rec_tensors, cf_tensors_list, seq_class_label)
153 | elif self.data_type == "valid":
154 | cur_rec_tensors = self._data_sample_rec_task(user_id, items, input_ids, target_pos, answer)
155 | return cur_rec_tensors
156 | else:
157 | cur_rec_tensors = self._data_sample_rec_task(user_id, items_with_noise, input_ids, target_pos, answer)
158 | return cur_rec_tensors
159 |
160 | def __len__(self):
161 | """
162 | consider n_view of a single sequence as one sample
163 | """
164 | return len(self.user_seq)
165 |
166 |
167 | class SASRecDataset(Dataset):
168 | def __init__(self, args, user_seq, test_neg_items=None, data_type="train"):
169 | self.args = args
170 | self.user_seq = user_seq
171 | self.test_neg_items = test_neg_items
172 | self.data_type = data_type
173 | self.max_len = args.max_seq_length
174 |
175 | def _data_sample_rec_task(self, user_id, items, input_ids, target_pos, answer):
176 |         # make a deep copy to avoid modifying the original sequence
177 | copied_input_ids = copy.deepcopy(input_ids)
178 | target_neg = []
179 | seq_set = set(items)
180 | for _ in input_ids:
181 | target_neg.append(neg_sample(seq_set, self.args.item_size))
182 |
183 | pad_len = self.max_len - len(input_ids)
184 | input_ids = [0] * pad_len + input_ids
185 | target_pos = [0] * pad_len + target_pos
186 | target_neg = [0] * pad_len + target_neg
187 |
188 | input_ids = input_ids[-self.max_len :]
189 | target_pos = target_pos[-self.max_len :]
190 | target_neg = target_neg[-self.max_len :]
191 |
192 | assert len(input_ids) == self.max_len
193 | assert len(target_pos) == self.max_len
194 | assert len(target_neg) == self.max_len
195 |
196 | if self.test_neg_items is not None:
197 |             test_samples = self.test_neg_items[user_id]
198 |
199 | cur_rec_tensors = (
200 | torch.tensor(user_id, dtype=torch.long), # user_id for testing
201 | torch.tensor(input_ids, dtype=torch.long),
202 | torch.tensor(target_pos, dtype=torch.long),
203 | torch.tensor(target_neg, dtype=torch.long),
204 | torch.tensor(answer, dtype=torch.long),
205 | torch.tensor(test_samples, dtype=torch.long),
206 | )
207 | else:
208 | cur_rec_tensors = (
209 | torch.tensor(user_id, dtype=torch.long), # user_id for testing
210 | torch.tensor(input_ids, dtype=torch.long),
211 | torch.tensor(target_pos, dtype=torch.long),
212 | torch.tensor(target_neg, dtype=torch.long),
213 | torch.tensor(answer, dtype=torch.long),
214 | )
215 |
216 | return cur_rec_tensors
217 |
218 | def __getitem__(self, index):
219 |
220 | user_id = index
221 | items = self.user_seq[index]
222 |
223 | assert self.data_type in {"train", "valid", "test"}
224 |
225 | # [0, 1, 2, 3, 4, 5, 6]
226 | # train [0, 1, 2, 3]
227 | # target [1, 2, 3, 4]
228 |
229 | # valid [0, 1, 2, 3, 4]
230 | # answer [5]
231 |
232 | # test [0, 1, 2, 3, 4, 5]
233 | # answer [6]
234 | if self.data_type == "train":
235 | input_ids = items[:-3]
236 | target_pos = items[1:-2]
237 | answer = [0] # no use
238 |
239 | elif self.data_type == "valid":
240 | input_ids = items[:-2]
241 | target_pos = items[1:-1]
242 | answer = [items[-2]]
243 |
244 | else:
245 | input_ids = items[:-1]
246 | target_pos = items[1:]
247 | answer = [items[-1]]
248 |
249 | return self._data_sample_rec_task(user_id, items, input_ids, target_pos, answer)
250 |
251 | def __len__(self):
252 | return len(self.user_seq)
253 |
254 |
255 | if __name__ == "__main__":
256 | import argparse
257 | from utils import get_user_seqs, set_seed
258 | from torch.utils.data import DataLoader, RandomSampler
259 | from tqdm import tqdm
260 |
261 | parser = argparse.ArgumentParser()
262 |
263 | parser.add_argument("--data_dir", default="../data/", type=str)
264 | parser.add_argument("--output_dir", default="output/", type=str)
265 | parser.add_argument("--data_name", default="Beauty", type=str)
266 | parser.add_argument("--do_eval", action="store_true")
267 |     parser.add_argument("--model_idx", default=1, type=int, help="model identifier 10, 20, 30...")
268 |
269 | # data augmentation args
270 |     parser.add_argument("--augment_type", default="reorder", type=str,
271 |                         help="data augmentation type. Chosen from: mask, crop, reorder, random.")
272 |     parser.add_argument("--tao", type=float, default=0.2, help="crop ratio for crop operator")
273 |     parser.add_argument("--gamma", type=float, default=0.7, help="mask ratio for mask operator")
274 |     parser.add_argument("--beta", type=float, default=0.2, help="reorder ratio for reorder operator")
275 |     parser.add_argument("--noise_ratio", type=float, default=0.0, help="noise ratio for robustness analysis")
276 | # model args
277 | parser.add_argument("--model_name", default="ICLRec", type=str)
278 | parser.add_argument("--hidden_size", type=int, default=64, help="hidden size of transformer model")
279 | parser.add_argument("--num_hidden_layers", type=int, default=2, help="number of layers")
280 | parser.add_argument("--num_attention_heads", default=2, type=int)
281 | parser.add_argument("--hidden_act", default="gelu", type=str) # gelu relu
282 | parser.add_argument("--attention_probs_dropout_prob", type=float, default=0.5, help="attention dropout p")
283 | parser.add_argument("--hidden_dropout_prob", type=float, default=0.5, help="hidden dropout p")
284 | parser.add_argument("--initializer_range", type=float, default=0.02)
285 | parser.add_argument("--max_seq_length", default=50, type=int)
286 |
287 | # train args
288 | parser.add_argument("--lr", type=float, default=0.001, help="learning rate of adam")
289 | parser.add_argument("--batch_size", type=int, default=2, help="number of batch_size")
290 | parser.add_argument("--epochs", type=int, default=200, help="number of epochs")
291 | parser.add_argument("--no_cuda", action="store_true")
292 | parser.add_argument("--log_freq", type=int, default=1, help="per epoch print res")
293 | parser.add_argument("--seed", default=42, type=int)
294 | ## contrastive learning related
295 | parser.add_argument("--temperature", default=1.0, type=float, help="softmax temperature (default: 1.0)")
296 | parser.add_argument(
297 | "--n_views", default=2, type=int, metavar="N", help="Number of augmented data for each sequence"
298 | )
299 | parser.add_argument("--cf_weight", type=float, default=0.2, help="weight of contrastive learning task")
300 |     parser.add_argument("--rec_weight", type=float, default=1.0, help="weight of recommendation task")
301 |
302 | # learning related
303 | parser.add_argument("--weight_decay", type=float, default=0.0, help="weight_decay of adam")
304 | parser.add_argument("--adam_beta1", type=float, default=0.9, help="adam first beta value")
305 | parser.add_argument("--adam_beta2", type=float, default=0.999, help="adam second beta value")
306 | parser.add_argument("--gpu_id", type=str, default="0", help="gpu_id")
307 |
308 | args = parser.parse_args()
309 | set_seed(args.seed)
310 | args.data_file = args.data_dir + args.data_name + ".txt"
311 | user_seq, max_item, valid_rating_matrix, test_rating_matrix = get_user_seqs(args.data_file)
312 | args.item_size = max_item + 2
313 | train_dataset = RecWithContrastiveLearningDataset(args, user_seq, data_type="train")
314 | train_sampler = RandomSampler(train_dataset)
315 | train_dataloader = DataLoader(train_dataset, sampler=train_sampler, batch_size=1)
316 | rec_cf_data_iter = tqdm(enumerate(train_dataloader), total=len(train_dataloader))
317 |
318 |     for i, (rec_batch, cf_batch, seq_class_label) in rec_cf_data_iter:
319 | for j in range(len(rec_batch)):
320 | print("tensor ", j, rec_batch[j])
321 | print("cf_batch:", cf_batch)
322 | if i > 2:
323 | break
324 |
--------------------------------------------------------------------------------
/src/main.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Copyright (c) 2022 salesforce.com, inc.
4 | # All rights reserved.
5 | # SPDX-License-Identifier: BSD-3-Clause
6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
7 | #
8 |
9 | import os
10 | import numpy as np
11 | import random
12 | import torch
13 | import argparse
14 |
15 | from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
16 |
17 | from datasets import RecWithContrastiveLearningDataset
18 |
19 | from trainers import ICLRecTrainer
20 | from models import SASRecModel, OfflineItemSimilarity, OnlineItemSimilarity
21 | from utils import EarlyStopping, get_user_seqs, get_item2attribute_json, check_path, set_seed
22 |
23 |
24 | def show_args_info(args):
25 | print(f"--------------------Configure Info:------------")
26 | for arg in vars(args):
27 | print(f"{arg:<30} : {getattr(args, arg):>35}")
28 |
29 |
30 | def main():
31 | parser = argparse.ArgumentParser()
32 | # system args
33 | parser.add_argument("--data_dir", default="../data/", type=str)
34 | parser.add_argument("--output_dir", default="output/", type=str)
35 | parser.add_argument("--data_name", default="Sports_and_Outdoors", type=str)
36 | parser.add_argument("--do_eval", action="store_true")
37 |     parser.add_argument("--model_idx", default=0, type=int, help="model identifier 10, 20, 30...")
38 | parser.add_argument("--gpu_id", type=str, default="0", help="gpu_id")
39 |
40 | # data augmentation args
41 | parser.add_argument(
42 | "--noise_ratio",
43 | default=0.0,
44 | type=float,
45 | help="percentage of negative interactions in a sequence - robustness analysis",
46 | )
47 | parser.add_argument(
48 | "--training_data_ratio",
49 | default=1.0,
50 | type=float,
51 | help="percentage of training samples used for training - robustness analysis",
52 | )
53 | parser.add_argument(
54 | "--augment_type",
55 | default="random",
56 | type=str,
57 | help="default data augmentation types. Chosen from: \
58 |             mask, crop, reorder, and random \
59 |             (the four augmentations implemented in datasets.py).",
60 | )
61 | parser.add_argument("--tao", type=float, default=0.2, help="crop ratio for crop operator")
62 | parser.add_argument("--gamma", type=float, default=0.7, help="mask ratio for mask operator")
63 | parser.add_argument("--beta", type=float, default=0.2, help="reorder ratio for reorder operator")
64 |
65 | ## contrastive learning task args
66 | parser.add_argument(
67 | "--temperature", default=1.0, type=float, help="softmax temperature (default: 1.0) - not studied."
68 | )
69 | parser.add_argument(
70 | "--n_views", default=2, type=int, metavar="N", help="Number of augmented data for each sequence - not studied."
71 | )
72 | parser.add_argument(
73 | "--contrast_type",
74 | default="Hybrid",
75 | type=str,
76 |         help="Contrastive learning variant. \
77 |             Supports InstanceCL, ShortInterestCL, IntentCL, and Hybrid types.",
78 | )
79 | parser.add_argument(
80 | "--num_intent_clusters",
81 | default="256",
82 | type=str,
83 |         help="Number of intent clusters. Activated only when using \
84 | IntentCL or Hybrid types.",
85 | )
86 | parser.add_argument(
87 | "--seq_representation_type",
88 | default="mean",
89 | type=str,
90 |         help="how to aggregate item representations over time. Supported types: \
91 | mean, concatenate",
92 | )
93 | parser.add_argument(
94 | "--seq_representation_instancecl_type",
95 | default="concatenate",
96 | type=str,
97 |         help="how to aggregate item representations over time. Supported types: \
98 | mean, concatenate",
99 | )
100 |     parser.add_argument("--warm_up_epoches", type=float, default=0, help="number of warm-up epochs before IntentCL starts.")
101 |     parser.add_argument("--de_noise", action="store_true", help="whether to filter out false-negative pairs during learning.")
102 |
103 | # model args
104 | parser.add_argument("--model_name", default="ICLRec", type=str)
105 | parser.add_argument("--hidden_size", type=int, default=64, help="hidden size of transformer model")
106 | parser.add_argument("--num_hidden_layers", type=int, default=2, help="number of layers")
107 | parser.add_argument("--num_attention_heads", default=2, type=int)
108 | parser.add_argument("--hidden_act", default="gelu", type=str) # gelu relu
109 | parser.add_argument("--attention_probs_dropout_prob", type=float, default=0.5, help="attention dropout p")
110 | parser.add_argument("--hidden_dropout_prob", type=float, default=0.5, help="hidden dropout p")
111 | parser.add_argument("--initializer_range", type=float, default=0.02)
112 | parser.add_argument("--max_seq_length", default=50, type=int)
113 |
114 | # train args
115 | parser.add_argument("--lr", type=float, default=0.001, help="learning rate of adam")
116 | parser.add_argument("--batch_size", type=int, default=256, help="number of batch_size")
117 | parser.add_argument("--epochs", type=int, default=300, help="number of epochs")
118 | parser.add_argument("--no_cuda", action="store_true")
119 | parser.add_argument("--log_freq", type=int, default=1, help="per epoch print res")
120 | parser.add_argument("--seed", default=1, type=int)
121 |     parser.add_argument("--cf_weight", type=float, default=0.1, help="weight of instance contrastive learning task")
122 |     parser.add_argument("--rec_weight", type=float, default=1.0, help="weight of recommendation task")
123 |     parser.add_argument("--intent_cf_weight", type=float, default=0.3, help="weight of intent contrastive learning task")
124 |
125 | # learning related
126 | parser.add_argument("--weight_decay", type=float, default=0.0, help="weight_decay of adam")
127 | parser.add_argument("--adam_beta1", type=float, default=0.9, help="adam first beta value")
128 | parser.add_argument("--adam_beta2", type=float, default=0.999, help="adam second beta value")
129 |
130 | args = parser.parse_args()
131 |
132 | set_seed(args.seed)
133 | check_path(args.output_dir)
134 |
135 | os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu_id
136 | args.cuda_condition = torch.cuda.is_available() and not args.no_cuda
137 | print("Using Cuda:", torch.cuda.is_available())
138 | args.data_file = args.data_dir + args.data_name + ".txt"
139 |
140 | user_seq, max_item, valid_rating_matrix, test_rating_matrix = get_user_seqs(args.data_file)
141 |
142 | args.item_size = max_item + 2
143 | args.mask_id = max_item + 1
144 |
145 | # save model args
146 | args_str = f"{args.model_name}-{args.data_name}-{args.model_idx}"
147 | args.log_file = os.path.join(args.output_dir, args_str + ".txt")
148 |
149 | show_args_info(args)
150 |
151 | with open(args.log_file, "a") as f:
152 | f.write(str(args) + "\n")
153 |
154 | # set item score in train set to `0` in validation
155 | args.train_matrix = valid_rating_matrix
156 |
157 | # save model
158 | checkpoint = args_str + ".pt"
159 | args.checkpoint_path = os.path.join(args.output_dir, checkpoint)
160 |
161 |     # training data for intent clustering
162 | cluster_dataset = RecWithContrastiveLearningDataset(
163 | args, user_seq[: int(len(user_seq) * args.training_data_ratio)], data_type="train"
164 | )
165 | cluster_sampler = SequentialSampler(cluster_dataset)
166 | cluster_dataloader = DataLoader(cluster_dataset, sampler=cluster_sampler, batch_size=args.batch_size)
167 |
168 | train_dataset = RecWithContrastiveLearningDataset(
169 | args, user_seq[: int(len(user_seq) * args.training_data_ratio)], data_type="train"
170 | )
171 | train_sampler = RandomSampler(train_dataset)
172 | train_dataloader = DataLoader(train_dataset, sampler=train_sampler, batch_size=args.batch_size)
173 |
174 | eval_dataset = RecWithContrastiveLearningDataset(args, user_seq, data_type="valid")
175 | eval_sampler = SequentialSampler(eval_dataset)
176 | eval_dataloader = DataLoader(eval_dataset, sampler=eval_sampler, batch_size=args.batch_size)
177 |
178 | test_dataset = RecWithContrastiveLearningDataset(args, user_seq, data_type="test")
179 | test_sampler = SequentialSampler(test_dataset)
180 | test_dataloader = DataLoader(test_dataset, sampler=test_sampler, batch_size=args.batch_size)
181 |
182 | model = SASRecModel(args=args)
183 |
184 | trainer = ICLRecTrainer(model, train_dataloader, cluster_dataloader, eval_dataloader, test_dataloader, args)
185 |
186 | if args.do_eval:
187 | trainer.args.train_matrix = test_rating_matrix
188 | trainer.load(args.checkpoint_path)
189 | print(f"Load model from {args.checkpoint_path} for test!")
190 | scores, result_info = trainer.test(0, full_sort=True)
191 |
192 | else:
193 | print(f"Train ICLRec")
194 | early_stopping = EarlyStopping(args.checkpoint_path, patience=40, verbose=True)
195 | for epoch in range(args.epochs):
196 | trainer.train(epoch)
197 | # evaluate on NDCG@20
198 | scores, _ = trainer.valid(epoch, full_sort=True)
199 | early_stopping(np.array(scores[-1:]), trainer.model)
200 | if early_stopping.early_stop:
201 | print("Early stopping")
202 | break
203 | trainer.args.train_matrix = test_rating_matrix
204 | print("---------------Change to test_rating_matrix!-------------------")
205 | # load the best model
206 | trainer.model.load_state_dict(torch.load(args.checkpoint_path))
207 | scores, result_info = trainer.test(0, full_sort=True)
208 |
209 | print(args_str)
210 | print(result_info)
211 | with open(args.log_file, "a") as f:
212 | f.write(args_str + "\n")
213 | f.write(result_info + "\n")
214 |
215 |
216 | if __name__ == "__main__":
217 |     main()
217 |
--------------------------------------------------------------------------------
/src/models.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Copyright (c) 2022 salesforce.com, inc.
4 | # All rights reserved.
5 | # SPDX-License-Identifier: BSD-3-Clause
6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
7 | #
8 | import math
9 | import os
10 | import pickle
11 | from tqdm import tqdm
12 | import random
13 | import copy
14 |
15 | import torch
16 | import torch.nn as nn
17 | import gensim
18 | import faiss
19 |
20 | # from kmeans_pytorch import kmeans  # only needed by the KMeans_Pytorch class below
21 | import time
22 |
23 | from modules import Encoder, LayerNorm
24 |
25 |
26 | class KMeans(object):
27 | def __init__(self, num_cluster, seed, hidden_size, gpu_id=0, device="cpu"):
28 | """
29 | Args:
30 |             num_cluster: number of clusters
31 | """
32 | self.seed = seed
33 | self.num_cluster = num_cluster
34 | self.max_points_per_centroid = 4096
35 | self.min_points_per_centroid = 0
36 |         self.gpu_id = gpu_id
37 | self.device = device
38 | self.first_batch = True
39 | self.hidden_size = hidden_size
40 | self.clus, self.index = self.__init_cluster(self.hidden_size)
41 | self.centroids = []
42 |
43 | def __init_cluster(
44 | self, hidden_size, verbose=False, niter=20, nredo=5, max_points_per_centroid=4096, min_points_per_centroid=0
45 | ):
46 | print(" cluster train iterations:", niter)
47 | clus = faiss.Clustering(hidden_size, self.num_cluster)
48 | clus.verbose = verbose
49 | clus.niter = niter
50 | clus.nredo = nredo
51 | clus.seed = self.seed
52 | clus.max_points_per_centroid = max_points_per_centroid
53 | clus.min_points_per_centroid = min_points_per_centroid
54 |
55 | res = faiss.StandardGpuResources()
56 | res.noTempMemory()
57 | cfg = faiss.GpuIndexFlatConfig()
58 | cfg.useFloat16 = False
59 | cfg.device = self.gpu_id
60 | index = faiss.GpuIndexFlatL2(res, hidden_size, cfg)
61 | return clus, index
62 |
63 | def train(self, x):
64 | # train to get centroids
65 | if x.shape[0] > self.num_cluster:
66 | self.clus.train(x, self.index)
67 | # get cluster centroids
68 | centroids = faiss.vector_to_array(self.clus.centroids).reshape(self.num_cluster, self.hidden_size)
69 | # convert to cuda Tensors for broadcast
70 | centroids = torch.Tensor(centroids).to(self.device)
71 | self.centroids = nn.functional.normalize(centroids, p=2, dim=1)
72 |
73 | def query(self, x):
74 | # self.index.add(x)
75 | D, I = self.index.search(x, 1) # for each sample, find cluster distance and assignments
76 | seq2cluster = [int(n[0]) for n in I]
77 | # print("cluster number:", self.num_cluster,"cluster in batch:", len(set(seq2cluster)))
78 | seq2cluster = torch.LongTensor(seq2cluster).to(self.device)
79 | return seq2cluster, self.centroids[seq2cluster]
80 |
81 |
82 | class KMeans_Pytorch(object):
83 | def __init__(self, num_cluster, seed, hidden_size, gpu_id=0, device="cpu"):
84 | """
85 | Args:
86 |             num_cluster: number of clusters
87 | """
88 | self.seed = seed
89 | self.num_cluster = num_cluster
90 | self.max_points_per_centroid = 4096
91 | self.min_points_per_centroid = 10
92 | self.first_batch = True
93 | self.hidden_size = hidden_size
94 | self.gpu_id = gpu_id
95 | self.device = device
96 | print(self.device, "-----")
97 |
98 | def run_kmeans(self, x, Niter=20, tqdm_flag=False):
99 | if x.shape[0] >= self.num_cluster:
100 | seq2cluster, centroids = kmeans(
101 | X=x, num_clusters=self.num_cluster, distance="euclidean", device=self.device, tqdm_flag=False
102 | )
103 | seq2cluster = seq2cluster.to(self.device)
104 | centroids = centroids.to(self.device)
105 |         # the last batch may contain fewer points than num_cluster
106 | else:
107 | seq2cluster, centroids = kmeans(
108 | X=x, num_clusters=x.shape[0] - 1, distance="euclidean", device=self.device, tqdm_flag=False
109 | )
110 | seq2cluster = seq2cluster.to(self.device)
111 | centroids = centroids.to(self.device)
112 | return seq2cluster, centroids
113 |
114 |
115 | class SASRecModel(nn.Module):
116 | def __init__(self, args):
117 | super(SASRecModel, self).__init__()
118 | self.item_embeddings = nn.Embedding(args.item_size, args.hidden_size, padding_idx=0)
119 | self.position_embeddings = nn.Embedding(args.max_seq_length, args.hidden_size)
120 | self.item_encoder = Encoder(args)
121 | self.LayerNorm = LayerNorm(args.hidden_size, eps=1e-12)
122 | self.dropout = nn.Dropout(args.hidden_dropout_prob)
123 | self.args = args
124 |
125 | self.criterion = nn.BCELoss(reduction="none")
126 | self.apply(self.init_weights)
127 |
128 | # Positional Embedding
129 | def add_position_embedding(self, sequence):
130 |
131 | seq_length = sequence.size(1)
132 | position_ids = torch.arange(seq_length, dtype=torch.long, device=sequence.device)
133 | position_ids = position_ids.unsqueeze(0).expand_as(sequence)
134 | item_embeddings = self.item_embeddings(sequence)
135 | position_embeddings = self.position_embeddings(position_ids)
136 | sequence_emb = item_embeddings + position_embeddings
137 | sequence_emb = self.LayerNorm(sequence_emb)
138 | sequence_emb = self.dropout(sequence_emb)
139 |
140 | return sequence_emb
141 |
142 | # model same as SASRec
143 | def forward(self, input_ids):
144 |
145 | attention_mask = (input_ids > 0).long()
146 | extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2) # torch.int64
147 | max_len = attention_mask.size(-1)
148 | attn_shape = (1, max_len, max_len)
149 | subsequent_mask = torch.triu(torch.ones(attn_shape), diagonal=1) # torch.uint8
150 | subsequent_mask = (subsequent_mask == 0).unsqueeze(1)
151 | subsequent_mask = subsequent_mask.long()
152 |
153 | if self.args.cuda_condition:
154 | subsequent_mask = subsequent_mask.cuda()
155 |
156 | extended_attention_mask = extended_attention_mask * subsequent_mask
157 | extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
158 | extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
159 |
160 | sequence_emb = self.add_position_embedding(input_ids)
161 |
162 | item_encoded_layers = self.item_encoder(sequence_emb, extended_attention_mask, output_all_encoded_layers=True)
163 |
164 | sequence_output = item_encoded_layers[-1]
165 | return sequence_output
166 |
167 | def init_weights(self, module):
168 | """ Initialize the weights.
169 | """
170 | if isinstance(module, (nn.Linear, nn.Embedding)):
171 | # Slightly different from the TF version which uses truncated_normal for initialization
172 | # cf https://github.com/pytorch/pytorch/pull/5617
173 | module.weight.data.normal_(mean=0.0, std=self.args.initializer_range)
174 | elif isinstance(module, LayerNorm):
175 | module.bias.data.zero_()
176 | module.weight.data.fill_(1.0)
177 | if isinstance(module, nn.Linear) and module.bias is not None:
178 | module.bias.data.zero_()
179 |
180 |
181 | class OnlineItemSimilarity:
182 | def __init__(self, item_size):
183 | self.item_size = item_size
184 | self.item_embeddings = None
185 | self.cuda_condition = torch.cuda.is_available()
186 | self.device = torch.device("cuda" if self.cuda_condition else "cpu")
187 | self.total_item_list = torch.tensor([i for i in range(self.item_size)], dtype=torch.long).to(self.device)
188 | self.max_score, self.min_score = self.get_maximum_minimum_sim_scores()
189 |
190 | def update_embedding_matrix(self, item_embeddings):
191 | self.item_embeddings = copy.deepcopy(item_embeddings)
192 | self.base_embedding_matrix = self.item_embeddings(self.total_item_list)
193 |
194 | def get_maximum_minimum_sim_scores(self):
195 | max_score, min_score = -1, 100
196 | for item_idx in range(1, self.item_size):
197 | try:
198 |                 item_vector = self.item_embeddings(torch.tensor(item_idx, dtype=torch.long).to(self.device)).view(-1, 1)
199 | item_similarity = torch.mm(self.base_embedding_matrix, item_vector).view(-1)
200 | max_score = max(torch.max(item_similarity), max_score)
201 | min_score = min(torch.min(item_similarity), min_score)
202 |             except Exception:  # embeddings may not be initialized yet
203 | continue
204 | return max_score, min_score
205 |
206 | def most_similar(self, item_idx, top_k=1, with_score=False):
207 | item_idx = torch.tensor(item_idx, dtype=torch.long).to(self.device)
208 | item_vector = self.item_embeddings(item_idx).view(-1, 1)
209 | item_similarity = torch.mm(self.base_embedding_matrix, item_vector).view(-1)
210 | item_similarity = (self.max_score - item_similarity) / (self.max_score - self.min_score)
211 | # remove item idx itself
212 | values, indices = item_similarity.topk(top_k + 1)
213 | if with_score:
214 | item_list = indices.tolist()
215 | score_list = values.tolist()
216 | if item_idx in item_list:
217 | idd = item_list.index(item_idx)
218 | item_list.remove(item_idx)
219 | score_list.pop(idd)
220 | return list(zip(item_list, score_list))
221 | item_list = indices.tolist()
222 | if item_idx in item_list:
223 | item_list.remove(item_idx)
224 | return item_list
225 |
226 |
227 | class OfflineItemSimilarity:
228 | def __init__(self, data_file=None, similarity_path=None, model_name="ItemCF", dataset_name="Sports_and_Outdoors"):
229 | self.dataset_name = dataset_name
230 | self.similarity_path = similarity_path
231 | # train_data_list used for item2vec, train_data_dict used for itemCF and itemCF-IUF
232 | self.train_data_list, self.train_item_list, self.train_data_dict = self._load_train_data(data_file)
233 | self.model_name = model_name
234 | self.similarity_model = self.load_similarity_model(self.similarity_path)
235 | self.max_score, self.min_score = self.get_maximum_minimum_sim_scores()
236 |
237 | def get_maximum_minimum_sim_scores(self):
238 | max_score, min_score = -1, 100
239 | for item in self.similarity_model.keys():
240 | for neig in self.similarity_model[item]:
241 | sim_score = self.similarity_model[item][neig]
242 | max_score = max(max_score, sim_score)
243 | min_score = min(min_score, sim_score)
244 | return max_score, min_score
245 |
246 | def _convert_data_to_dict(self, data):
247 |         """
248 |         convert a list of (user, item, record) triples into a
249 |         nested dict of the form {user: {item: record}},
250 |         as consumed by the ItemCF / ItemCF_IUF statistics
251 |         """
252 | train_data_dict = {}
253 | for user, item, record in data:
254 | train_data_dict.setdefault(user, {})
255 | train_data_dict[user][item] = record
256 | return train_data_dict
257 |
258 | def _save_dict(self, dict_data, save_path="./similarity.pkl"):
259 | print("saving data to ", save_path)
260 | with open(save_path, "wb") as write_file:
261 | pickle.dump(dict_data, write_file)
262 |
263 | def _load_train_data(self, data_file=None):
264 | """
265 | read the data from the data file which is a data set
266 | """
267 | train_data = []
268 | train_data_list = []
269 | train_data_set_list = []
270 | for line in open(data_file).readlines():
271 | userid, items = line.strip().split(" ", 1)
272 | # only use training data
273 | items = items.split(" ")[:-3]
274 | train_data_list.append(items)
275 | train_data_set_list += items
276 | for itemid in items:
277 | train_data.append((userid, itemid, int(1)))
278 | return train_data_list, set(train_data_set_list), self._convert_data_to_dict(train_data)
279 |
280 | def _generate_item_similarity(self, train=None, save_path="./"):
281 | """
282 | calculate co-rated users between items
283 | """
284 | print("getting item similarity...")
285 | train = train or self.train_data_dict
286 | C = dict()
287 | N = dict()
288 |
289 | if self.model_name in ["ItemCF", "ItemCF_IUF"]:
290 | print("Step 1: Compute Statistics")
291 | data_iter = tqdm(enumerate(train.items()), total=len(train.items()))
292 | for idx, (u, items) in data_iter:
293 | if self.model_name == "ItemCF":
294 | for i in items.keys():
295 | N.setdefault(i, 0)
296 | N[i] += 1
297 | for j in items.keys():
298 | if i == j:
299 | continue
300 | C.setdefault(i, {})
301 | C[i].setdefault(j, 0)
302 | C[i][j] += 1
303 | elif self.model_name == "ItemCF_IUF":
304 | for i in items.keys():
305 | N.setdefault(i, 0)
306 | N[i] += 1
307 | for j in items.keys():
308 | if i == j:
309 | continue
310 | C.setdefault(i, {})
311 | C[i].setdefault(j, 0)
312 | C[i][j] += 1 / math.log(1 + len(items) * 1.0)
313 | self.itemSimBest = dict()
314 | print("Step 2: Compute co-rate matrix")
315 | c_iter = tqdm(enumerate(C.items()), total=len(C.items()))
316 | for idx, (cur_item, related_items) in c_iter:
317 | self.itemSimBest.setdefault(cur_item, {})
318 | for related_item, score in related_items.items():
319 | self.itemSimBest[cur_item].setdefault(related_item, 0)
320 | self.itemSimBest[cur_item][related_item] = score / math.sqrt(N[cur_item] * N[related_item])
321 | self._save_dict(self.itemSimBest, save_path=save_path)
322 | elif self.model_name == "Item2Vec":
323 | # details here: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py
324 | print("Step 1: train item2vec model")
325 | item2vec_model = gensim.models.Word2Vec(
326 | sentences=self.train_data_list, vector_size=20, window=5, min_count=0, epochs=100
327 | )
328 | self.itemSimBest = dict()
329 | total_item_nums = len(item2vec_model.wv.index_to_key)
330 | print("Step 2: convert to item similarity dict")
331 | total_items = tqdm(item2vec_model.wv.index_to_key, total=total_item_nums)
332 | for cur_item in total_items:
333 | related_items = item2vec_model.wv.most_similar(positive=[cur_item], topn=20)
334 | self.itemSimBest.setdefault(cur_item, {})
335 | for (related_item, score) in related_items:
336 | self.itemSimBest[cur_item].setdefault(related_item, 0)
337 | self.itemSimBest[cur_item][related_item] = score
338 | print("Item2Vec model saved to: ", save_path)
339 | self._save_dict(self.itemSimBest, save_path=save_path)
340 | elif self.model_name == "LightGCN":
341 | # train a item embedding from lightGCN model, and then convert to sim dict
342 | print("generating similarity model..")
343 |             itemSimBest = light_gcn.generate_similarity_from_light_gcn(self.dataset_name)  # requires an external light_gcn module (not included)
344 | print("LightGCN based model saved to: ", save_path)
345 | self._save_dict(itemSimBest, save_path=save_path)
346 |
347 | def load_similarity_model(self, similarity_model_path):
348 | if not similarity_model_path:
349 | raise ValueError("invalid path")
350 | elif not os.path.exists(similarity_model_path):
351 |             print("the similarity dict does not exist, generating...")
352 | self._generate_item_similarity(save_path=self.similarity_path)
353 | if self.model_name in ["ItemCF", "ItemCF_IUF", "Item2Vec", "LightGCN"]:
354 | with open(similarity_model_path, "rb") as read_file:
355 | similarity_dict = pickle.load(read_file)
356 | return similarity_dict
357 | elif self.model_name == "Random":
358 | similarity_dict = self.train_item_list
359 | return similarity_dict
360 |
361 | def most_similar(self, item, top_k=1, with_score=False):
362 | if self.model_name in ["ItemCF", "ItemCF_IUF", "Item2Vec", "LightGCN"]:
363 |             # TODO: handle the case where the item is not in keys
364 | if str(item) in self.similarity_model:
365 | top_k_items_with_score = sorted(
366 | self.similarity_model[str(item)].items(), key=lambda x: x[1], reverse=True
367 | )[0:top_k]
368 | if with_score:
369 | return list(
370 | map(
371 | lambda x: (int(x[0]), (self.max_score - float(x[1])) / (self.max_score - self.min_score)),
372 | top_k_items_with_score,
373 | )
374 | )
375 | return list(map(lambda x: int(x[0]), top_k_items_with_score))
376 | elif int(item) in self.similarity_model:
377 | top_k_items_with_score = sorted(
378 | self.similarity_model[int(item)].items(), key=lambda x: x[1], reverse=True
379 | )[0:top_k]
380 | if with_score:
381 | return list(
382 | map(
383 | lambda x: (int(x[0]), (self.max_score - float(x[1])) / (self.max_score - self.min_score)),
384 | top_k_items_with_score,
385 | )
386 | )
387 | return list(map(lambda x: int(x[0]), top_k_items_with_score))
388 | else:
389 | item_list = list(self.similarity_model.keys())
390 | random_items = random.sample(item_list, k=top_k)
391 | return list(map(lambda x: int(x), random_items))
392 | elif self.model_name == "Random":
393 | random_items = random.sample(self.similarity_model, k=top_k)
394 | return list(map(lambda x: int(x), random_items))
395 |
396 |
397 | if __name__ == "__main__":
398 | onlineitemsim = OnlineItemSimilarity(item_size=10)
399 | item_embeddings = nn.Embedding(10, 6, padding_idx=0)
400 | onlineitemsim.update_embedding_matrix(item_embeddings)
401 | item_idx = torch.tensor(2, dtype=torch.long)
402 |     similar_items = onlineitemsim.most_similar(item_idx=item_idx, top_k=1)
403 |     print(similar_items)
404 |
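405 |
406 | # Minimal standalone sketch of the ItemCF statistic computed in
407 | # _generate_item_similarity above: co-rate counts C[i][j] normalized by
408 | # sqrt(N[i] * N[j]). The toy interaction dict is an assumption for illustration.
409 | def _itemcf_similarity_sketch(train):
410 |     C, N = dict(), dict()
411 |     for _, items in train.items():
412 |         for i in items:
413 |             N[i] = N.get(i, 0) + 1  # how many users rated item i
414 |             for j in items:
415 |                 if i == j:
416 |                     continue
417 |                 row = C.setdefault(i, {})
418 |                 row[j] = row.get(j, 0) + 1  # co-rate count of (i, j)
419 |     return {i: {j: c / math.sqrt(N[i] * N[j]) for j, c in row.items()} for i, row in C.items()}
420 |
421 |
422 | # Example: _itemcf_similarity_sketch({"u1": {"a": 1, "b": 1}, "u2": {"a": 1, "b": 1, "c": 1}})
423 | # yields sim("a", "b") = 2 / sqrt(2 * 2) = 1.0 and sim("a", "c") = 1 / sqrt(2 * 1).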
--------------------------------------------------------------------------------
/src/modules.py:
--------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 | #
3 | # Copyright (c) 2022 salesforce.com, inc.
4 | # All rights reserved.
5 | # SPDX-License-Identifier: BSD-3-Clause
6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
7 | #
8 |
9 | import numpy as np
10 |
11 | import copy
12 | import math
13 | import torch
14 | import torch.nn as nn
15 | import torch.nn.functional as F
16 |
17 |
18 | class PCLoss(nn.Module):
19 | """ Reference: https://github.com/salesforce/PCL/blob/018a929c53fcb93fd07041b1725185e1237d2c0e/pcl/builder.py#L168
20 | """
21 |
22 | def __init__(self, temperature, device, contrast_mode="all"):
23 | super(PCLoss, self).__init__()
24 | self.contrast_mode = contrast_mode
25 | self.criterion = NCELoss(temperature, device)
26 |
27 | def forward(self, batch_sample_one, batch_sample_two, intents, intent_ids):
28 |         """
29 |         batch_sample_one, batch_sample_two: two augmented views, each batch_size x hidden_dims
30 |         intents: num_clusters x batch_size x hidden_dims
31 |         """
32 | # instance contrast with prototypes
33 | mean_pcl_loss = 0
34 |         # with de-noising: skip false negatives that share an intent
35 | if intent_ids is not None:
36 | for intent, intent_id in zip(intents, intent_ids):
37 | pos_one_compare_loss = self.criterion(batch_sample_one, intent, intent_id)
38 | pos_two_compare_loss = self.criterion(batch_sample_two, intent, intent_id)
39 | mean_pcl_loss += pos_one_compare_loss
40 | mean_pcl_loss += pos_two_compare_loss
41 | mean_pcl_loss /= 2 * len(intents)
42 |         # without de-noising
43 | else:
44 | for intent in intents:
45 | pos_one_compare_loss = self.criterion(batch_sample_one, intent, intent_ids=None)
46 | pos_two_compare_loss = self.criterion(batch_sample_two, intent, intent_ids=None)
47 | mean_pcl_loss += pos_one_compare_loss
48 | mean_pcl_loss += pos_two_compare_loss
49 | mean_pcl_loss /= 2 * len(intents)
50 | return mean_pcl_loss
51 |
52 |
53 | class SupConLoss(nn.Module):
54 | """Supervised Contrastive Learning: https://arxiv.org/pdf/2004.11362.pdf.
55 | It also supports the unsupervised contrastive loss in SimCLR"""
56 |
57 | def __init__(self, temperature, device, contrast_mode="all"):
58 | super(SupConLoss, self).__init__()
59 | self.device = device
60 | self.temperature = temperature
61 | self.contrast_mode = contrast_mode
62 | self.total_calls = 0
63 | self.call_with_repeat_seq = 0
64 |
65 | def forward(self, features, intents=None, mask=None):
66 |         """Compute loss for model. If both `intents` and `mask` are None,
67 |         it degenerates to SimCLR unsupervised loss:
68 |         https://arxiv.org/pdf/2002.05709.pdf
69 |         Args:
70 |             features: hidden vector of shape [bsz, n_views, ...].
71 |             intents: intent (cluster) ids of shape [bsz].
72 |             mask: contrastive mask of shape [bsz, bsz], mask_{i,j}=1 if sample j
73 |                 has the same intent as sample i. Can be asymmetric.
74 |         Returns:
75 |             A loss scalar.
76 |         """
77 |
78 |         # track how often a batch contains sequences assigned to the same intent
79 | if intents is not None:
80 | unique_intents = torch.unique(intents)
81 | if unique_intents.shape[0] != intents.shape[0]:
82 | self.call_with_repeat_seq += 1
83 | self.total_calls += 1
84 | if len(features.shape) < 3:
85 |             raise ValueError("`features` needs to be [bsz, n_views, ...], " "at least 3 dimensions are required")
86 | if len(features.shape) > 3:
87 | features = features.view(features.shape[0], features.shape[1], -1)
88 |
89 | # normalize features
90 | features = F.normalize(features, dim=2)
91 |
92 | batch_size = features.shape[0]
93 | if intents is not None and mask is not None:
94 |             raise ValueError("Cannot define both `intents` and `mask`")
95 | elif intents is None and mask is None:
96 | mask = torch.eye(batch_size, dtype=torch.float32).to(self.device)
97 | elif intents is not None:
98 | intents = intents.contiguous().view(-1, 1)
99 | if intents.shape[0] != batch_size:
100 |                 raise ValueError("Num of intents does not match num of features")
101 | mask = torch.eq(intents, intents.T).float().to(self.device)
102 | else:
103 | mask = mask.float().to(self.device)
104 |
105 | contrast_count = features.shape[1]
106 | contrast_feature = torch.cat(torch.unbind(features, dim=1), dim=0)
107 | if self.contrast_mode == "one":
108 | anchor_feature = features[:, 0]
109 | anchor_count = 1
110 | elif self.contrast_mode == "all":
111 | anchor_feature = contrast_feature
112 | anchor_count = contrast_count
113 | else:
114 | raise ValueError("Unknown mode: {}".format(self.contrast_mode))
115 |
116 | # compute logits
117 | anchor_dot_contrast = torch.div(torch.matmul(anchor_feature, contrast_feature.T), self.temperature)
118 | # for numerical stability
119 | logits_max, _ = torch.max(anchor_dot_contrast, dim=1, keepdim=True)
120 | logits = anchor_dot_contrast - logits_max.detach()
121 |
122 | # tile mask
123 | mask = mask.repeat(anchor_count, contrast_count)
124 | # mask-out self-contrast cases
125 | logits_mask = torch.scatter(
126 | torch.ones_like(mask), 1, torch.arange(batch_size * anchor_count).view(-1, 1).to(self.device), 0
127 | )
128 | mask = mask * logits_mask
129 |
130 | # compute log_prob
131 | exp_logits = torch.exp(logits) * logits_mask
132 | log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))
133 |
134 | # compute mean of log-likelihood over positive
135 | mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)
136 |
137 | # loss
138 | # loss = - (self.temperature / self.base_temperature) * mean_log_prob_pos
139 | loss = -mean_log_prob_pos
140 | loss = loss.view(anchor_count, batch_size).mean()
141 |
142 | return loss
143 |
144 |
145 | class NCELoss(nn.Module):
146 | """
147 | Eq. (12): L_{NCE}
148 | """
149 |
150 | def __init__(self, temperature, device):
151 | super(NCELoss, self).__init__()
152 | self.device = device
153 | self.criterion = nn.CrossEntropyLoss().to(self.device)
154 | self.temperature = temperature
155 | self.cossim = nn.CosineSimilarity(dim=-1).to(self.device)
156 |
157 | # #modified based on impl: https://github.com/ae-foster/pytorch-simclr/blob/dc9ac57a35aec5c7d7d5fe6dc070a975f493c1a5/critic.py#L5
158 | def forward(self, batch_sample_one, batch_sample_two, intent_ids=None):
159 | # sim11 = self.cossim(batch_sample_one.unsqueeze(-2), batch_sample_one.unsqueeze(-3)) / self.temperature
160 | # sim22 = self.cossim(batch_sample_two.unsqueeze(-2), batch_sample_two.unsqueeze(-3)) / self.temperature
161 | # sim12 = self.cossim(batch_sample_one.unsqueeze(-2), batch_sample_two.unsqueeze(-3)) / self.temperature
162 | sim11 = torch.matmul(batch_sample_one, batch_sample_one.T) / self.temperature
163 | sim22 = torch.matmul(batch_sample_two, batch_sample_two.T) / self.temperature
164 | sim12 = torch.matmul(batch_sample_one, batch_sample_two.T) / self.temperature
165 | d = sim12.shape[-1]
166 | # avoid contrast against positive intents
167 | if intent_ids is not None:
168 | intent_ids = intent_ids.contiguous().view(-1, 1)
169 | mask_11_22 = torch.eq(intent_ids, intent_ids.T).long().to(self.device)
170 | sim11[mask_11_22 == 1] = float("-inf")
171 | sim22[mask_11_22 == 1] = float("-inf")
172 |             eye_matrix = torch.eye(d, dtype=torch.long).to(self.device)
173 |             mask_11_22[eye_matrix == 1] = 0
174 | sim12[mask_11_22 == 1] = float("-inf")
175 | else:
176 | mask = torch.eye(d, dtype=torch.long).to(self.device)
177 | sim11[mask == 1] = float("-inf")
178 | sim22[mask == 1] = float("-inf")
179 | # sim22 = sim22.masked_fill_(mask, -np.inf)
180 | # sim11[..., range(d), range(d)] = float('-inf')
181 | # sim22[..., range(d), range(d)] = float('-inf')
182 |
183 | raw_scores1 = torch.cat([sim12, sim11], dim=-1)
184 | raw_scores2 = torch.cat([sim22, sim12.transpose(-1, -2)], dim=-1)
185 | logits = torch.cat([raw_scores1, raw_scores2], dim=-2)
186 | labels = torch.arange(2 * d, dtype=torch.long, device=logits.device)
187 | nce_loss = self.criterion(logits, labels)
188 | return nce_loss
189 |
190 |
191 | class NTXent(nn.Module):
192 | """
193 | Contrastive loss with distributed data parallel support
194 | code: https://github.com/AndrewAtanov/simclr-pytorch/blob/master/models/losses.py
195 | """
196 |
197 | LARGE_NUMBER = 1e9
198 |
199 | def __init__(self, tau=1.0, gpu=None, multiplier=2, distributed=False):
200 | super().__init__()
201 | self.tau = tau
202 | self.multiplier = multiplier
203 | self.distributed = distributed
204 | self.norm = 1.0
205 |
206 | def forward(self, batch_sample_one, batch_sample_two):
207 | z = torch.cat([batch_sample_one, batch_sample_two], dim=0)
208 | n = z.shape[0]
209 | assert n % self.multiplier == 0
210 |
211 | z = F.normalize(z, p=2, dim=1) / np.sqrt(self.tau)
212 | logits = z @ z.t()
213 | logits[np.arange(n), np.arange(n)] = -self.LARGE_NUMBER
214 |
215 | logprob = F.log_softmax(logits, dim=1)
216 |
217 | # choose all positive objects for an example, for i it would be (i + k * n/m), where k=0...(m-1)
218 | m = self.multiplier
219 | labels = (np.repeat(np.arange(n), m) + np.tile(np.arange(m) * n // m, n)) % n
220 |         # remove labels pointing to itself, i.e. (i, i)
221 | labels = labels.reshape(n, m)[:, 1:].reshape(-1)
222 |
223 | # TODO: maybe different terms for each process should only be computed here...
224 | loss = -logprob[np.repeat(np.arange(n), m - 1), labels].sum() / n / (m - 1) / self.norm
225 | return loss
226 |
227 |
228 | def gelu(x):
229 | """Implementation of the gelu activation function.
230 | For information: OpenAI GPT's gelu is slightly different
231 | (and gives slightly different results):
232 | 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) *
233 | (x + 0.044715 * torch.pow(x, 3))))
234 | Also see https://arxiv.org/abs/1606.08415
235 | """
236 | return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
237 |
238 |
239 | def swish(x):
240 | return x * torch.sigmoid(x)
241 |
242 |
243 | ACT2FN = {"gelu": gelu, "relu": F.relu, "swish": swish}
244 |
245 |
246 | class LayerNorm(nn.Module):
247 | def __init__(self, hidden_size, eps=1e-12):
248 | """Construct a layernorm module in the TF style (epsilon inside the square root).
249 | """
250 | super(LayerNorm, self).__init__()
251 | self.weight = nn.Parameter(torch.ones(hidden_size))
252 | self.bias = nn.Parameter(torch.zeros(hidden_size))
253 | self.variance_epsilon = eps
254 |
255 | def forward(self, x):
256 | u = x.mean(-1, keepdim=True)
257 | s = (x - u).pow(2).mean(-1, keepdim=True)
258 | x = (x - u) / torch.sqrt(s + self.variance_epsilon)
259 | return self.weight * x + self.bias
260 |
261 |
262 | class Embeddings(nn.Module):
263 | """Construct the embeddings from item, position.
264 | """
265 |
266 | def __init__(self, args):
267 | super(Embeddings, self).__init__()
268 |
269 |         self.item_embeddings = nn.Embedding(args.item_size, args.hidden_size, padding_idx=0)  # do not misuse padding_idx
270 | self.position_embeddings = nn.Embedding(args.max_seq_length, args.hidden_size)
271 |
272 | self.LayerNorm = LayerNorm(args.hidden_size, eps=1e-12)
273 | self.dropout = nn.Dropout(args.hidden_dropout_prob)
274 |
275 | self.args = args
276 |
277 | def forward(self, input_ids):
278 | seq_length = input_ids.size(1)
279 | position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device)
280 | position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
281 | items_embeddings = self.item_embeddings(input_ids)
282 | position_embeddings = self.position_embeddings(position_ids)
283 | embeddings = items_embeddings + position_embeddings
284 |         # apply LayerNorm and dropout to the summed embeddings
285 | embeddings = self.LayerNorm(embeddings)
286 | embeddings = self.dropout(embeddings)
287 | return embeddings
288 |
289 |
290 | class SelfAttention(nn.Module):
291 | def __init__(self, args):
292 | super(SelfAttention, self).__init__()
293 | if args.hidden_size % args.num_attention_heads != 0:
294 | raise ValueError(
295 | "The hidden size (%d) is not a multiple of the number of attention "
296 | "heads (%d)" % (args.hidden_size, args.num_attention_heads)
297 | )
298 | self.num_attention_heads = args.num_attention_heads
299 | self.attention_head_size = int(args.hidden_size / args.num_attention_heads)
300 | self.all_head_size = self.num_attention_heads * self.attention_head_size
301 |
302 | self.query = nn.Linear(args.hidden_size, self.all_head_size)
303 | self.key = nn.Linear(args.hidden_size, self.all_head_size)
304 | self.value = nn.Linear(args.hidden_size, self.all_head_size)
305 |
306 | self.attn_dropout = nn.Dropout(args.attention_probs_dropout_prob)
307 |
308 |         # after self-attention, apply a feed-forward dense layer, LayerNorm, and output
309 | self.dense = nn.Linear(args.hidden_size, args.hidden_size)
310 | self.LayerNorm = LayerNorm(args.hidden_size, eps=1e-12)
311 | self.out_dropout = nn.Dropout(args.hidden_dropout_prob)
312 |
313 | def transpose_for_scores(self, x):
314 | new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
315 | x = x.view(*new_x_shape)
316 | return x.permute(0, 2, 1, 3)
317 |
318 | def forward(self, input_tensor, attention_mask):
319 | mixed_query_layer = self.query(input_tensor)
320 | mixed_key_layer = self.key(input_tensor)
321 | mixed_value_layer = self.value(input_tensor)
322 |
323 | query_layer = self.transpose_for_scores(mixed_query_layer)
324 | key_layer = self.transpose_for_scores(mixed_key_layer)
325 | value_layer = self.transpose_for_scores(mixed_value_layer)
326 |
327 | # Take the dot product between "query" and "key" to get the raw attention scores.
328 | attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
329 |
330 | attention_scores = attention_scores / math.sqrt(self.attention_head_size)
331 |         # Apply the attention mask (precomputed for all layers in the model's forward() function)
332 | # [batch_size heads seq_len seq_len] scores
333 | # [batch_size 1 1 seq_len]
334 | attention_scores = attention_scores + attention_mask
335 |
336 | # Normalize the attention scores to probabilities.
337 | attention_probs = nn.Softmax(dim=-1)(attention_scores)
338 | # This is actually dropping out entire tokens to attend to, which might
339 | # seem a bit unusual, but is taken from the original Transformer paper.
340 | # Fixme
341 | attention_probs = self.attn_dropout(attention_probs)
342 | context_layer = torch.matmul(attention_probs, value_layer)
343 | context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
344 | new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
345 | context_layer = context_layer.view(*new_context_layer_shape)
346 | hidden_states = self.dense(context_layer)
347 | hidden_states = self.out_dropout(hidden_states)
348 | hidden_states = self.LayerNorm(hidden_states + input_tensor)
349 |
350 | return hidden_states
351 |
352 |
353 | class Intermediate(nn.Module):
354 | def __init__(self, args):
355 | super(Intermediate, self).__init__()
356 | self.dense_1 = nn.Linear(args.hidden_size, args.hidden_size * 4)
357 | if isinstance(args.hidden_act, str):
358 | self.intermediate_act_fn = ACT2FN[args.hidden_act]
359 | else:
360 | self.intermediate_act_fn = args.hidden_act
361 |
362 | self.dense_2 = nn.Linear(args.hidden_size * 4, args.hidden_size)
363 | self.LayerNorm = LayerNorm(args.hidden_size, eps=1e-12)
364 | self.dropout = nn.Dropout(args.hidden_dropout_prob)
365 |
366 | def forward(self, input_tensor):
367 |
368 | hidden_states = self.dense_1(input_tensor)
369 | hidden_states = self.intermediate_act_fn(hidden_states)
370 |
371 | hidden_states = self.dense_2(hidden_states)
372 | hidden_states = self.dropout(hidden_states)
373 | hidden_states = self.LayerNorm(hidden_states + input_tensor)
374 |
375 | return hidden_states
376 |
377 |
378 | class Layer(nn.Module):
379 | def __init__(self, args):
380 | super(Layer, self).__init__()
381 | self.attention = SelfAttention(args)
382 | self.intermediate = Intermediate(args)
383 |
384 | def forward(self, hidden_states, attention_mask):
385 | attention_output = self.attention(hidden_states, attention_mask)
386 | intermediate_output = self.intermediate(attention_output)
387 | return intermediate_output
388 |
389 |
390 | class Encoder(nn.Module):
391 | def __init__(self, args):
392 | super(Encoder, self).__init__()
393 | layer = Layer(args)
394 | self.layer = nn.ModuleList([copy.deepcopy(layer) for _ in range(args.num_hidden_layers)])
395 |
396 | def forward(self, hidden_states, attention_mask, output_all_encoded_layers=True):
397 | all_encoder_layers = []
398 | for layer_module in self.layer:
399 | hidden_states = layer_module(hidden_states, attention_mask)
400 | if output_all_encoded_layers:
401 | all_encoder_layers.append(hidden_states)
402 | if not output_all_encoded_layers:
403 | all_encoder_layers.append(hidden_states)
404 | return all_encoder_layers
405 |
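406 |
407 | # Hedged usage sketch (the shapes below are assumptions for illustration):
408 | # NCELoss takes two augmented views of a batch and treats matching rows as positives.
409 | if __name__ == "__main__":
410 |     criterion = NCELoss(temperature=1.0, device=torch.device("cpu"))
411 |     view_one = F.normalize(torch.randn(4, 64), dim=-1)  # [batch_size, hidden_size]
412 |     view_two = F.normalize(torch.randn(4, 64), dim=-1)
413 |     print(criterion(view_one, view_two).item())  # scalar contrastive loss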
--------------------------------------------------------------------------------
/src/output/ICLRec-Beauty-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-Beauty-1.pt
--------------------------------------------------------------------------------
/src/output/ICLRec-Sports_and_Outdoors-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-Sports_and_Outdoors-1.pt
--------------------------------------------------------------------------------
/src/output/ICLRec-Toys_and_Games-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-Toys_and_Games-1.pt
--------------------------------------------------------------------------------
/src/output/ICLRec-Yelp-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-Yelp-1.pt
--------------------------------------------------------------------------------
/src/output/ICLRec-ml-1m-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-ml-1m-1.pt
--------------------------------------------------------------------------------
/src/output/README.md:
--------------------------------------------------------------------------------
1 |
2 | # Trained models on the five datasets
3 |
4 | Run the following commands to evaluate ICLRec on each dataset:
5 | For Sports_and_Outdoors
6 | ```
7 | python main.py --data_name Sports_and_Outdoors --model_idx 1 --do_eval
8 | ```
9 |
10 | For Beauty
11 |
12 | ```
13 | python main.py --data_name Beauty --model_idx 1 --do_eval
14 | ```
15 |
16 | For Toys_and_Games
17 |
18 | ```
19 | python main.py --data_name Toys_and_Games --model_idx 1 --do_eval
20 | ```
21 |
22 | For Yelp
23 |
24 | ```
25 | python main.py --data_name Yelp --model_idx 1 --do_eval
26 | ```
27 |
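28 | For ml-1m (the directory also ships `ICLRec-ml-1m-1.pt`; the command below assumes the same flag pattern as above)
29 |
30 | ```
31 | python main.py --data_name ml-1m --model_idx 1 --do_eval
32 | ```
33 |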
--------------------------------------------------------------------------------
/src/scripts/run_beauty.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name Beauty --cf_weight 0.1 \
2 | --model_idx 1 --gpu_id 0 \
3 | --batch_size 256 --contrast_type Hybrid \
4 | --num_intent_cluster 256 --seq_representation_type mean \
5 | --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 1
--------------------------------------------------------------------------------
/src/scripts/run_ml_1m.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name ml-1m --cf_weight 0.0 \
2 | --model_idx 1 --gpu_id 0 \
3 | --batch_size 256 --contrast_type IntentCL \
4 | --num_intent_cluster 256 --seq_representation_type mean \
5 | --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 2 --max_seq_length 200
6 |
--------------------------------------------------------------------------------
/src/scripts/run_sports.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name Sports_and_Outdoors --cf_weight 0.1 \
2 | --model_idx 1 --gpu_id 0 \
3 | --batch_size 256 --contrast_type Hybrid \
4 | --num_intent_cluster 256 --seq_representation_type mean \
5 | --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 2
--------------------------------------------------------------------------------
/src/scripts/run_toys.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name Toys_and_Games --cf_weight 0.1 \
2 | --model_idx 1 --gpu_id 0 \
3 | --batch_size 256 --contrast_type Hybrid \
4 | --num_intent_cluster 256 --seq_representation_type mean \
5 | --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 3
--------------------------------------------------------------------------------
/src/scripts/run_yelp.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name Yelp --cf_weight 0.1 \
2 | --model_idx 1 --gpu_id 0 \
3 | --batch_size 256 --contrast_type Hybrid \
4 | --num_intent_cluster 256 --seq_representation_type mean \
5 | --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 2
--------------------------------------------------------------------------------
/src/trainers.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Copyright (c) 2022 salesforce.com, inc.
4 | # All rights reserved.
5 | # SPDX-License-Identifier: BSD-3-Clause
6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
7 | #
8 |
9 | import numpy as np
10 | from tqdm import tqdm
11 | import random
12 |
13 | import torch
14 | import torch.nn as nn
15 | from torch.optim import Adam
16 | from torch.utils.data import DataLoader, RandomSampler
17 |
18 | from models import KMeans
19 | from datasets import RecWithContrastiveLearningDataset
20 | from modules import NCELoss, NTXent, SupConLoss, PCLoss
21 | from utils import recall_at_k, ndcg_k, get_metric, get_user_seqs, nCr
22 |
23 |
24 | class Trainer:
25 | def __init__(self, model, train_dataloader, cluster_dataloader, eval_dataloader, test_dataloader, args):
26 |
27 | self.args = args
28 | self.cuda_condition = torch.cuda.is_available() and not self.args.no_cuda
29 | self.device = torch.device("cuda" if self.cuda_condition else "cpu")
30 |
31 | self.model = model
32 |
33 | self.num_intent_clusters = [int(i) for i in self.args.num_intent_clusters.split(",")]
34 | self.clusters = []
35 | for num_intent_cluster in self.num_intent_clusters:
36 | # initialize Kmeans
37 | if self.args.seq_representation_type == "mean":
38 | cluster = KMeans(
39 | num_cluster=num_intent_cluster,
40 | seed=self.args.seed,
41 | hidden_size=self.args.hidden_size,
42 | gpu_id=self.args.gpu_id,
43 | device=self.device,
44 | )
45 | self.clusters.append(cluster)
46 | else:
47 | cluster = KMeans(
48 | num_cluster=num_intent_cluster,
49 | seed=self.args.seed,
50 | hidden_size=self.args.hidden_size * self.args.max_seq_length,
51 | gpu_id=self.args.gpu_id,
52 | device=self.device,
53 | )
54 | self.clusters.append(cluster)
55 |
56 |         self.total_augmentation_pairs = nCr(self.args.n_views, 2)
57 | # projection head for contrastive learn task
58 | self.projection = nn.Sequential(
59 | nn.Linear(self.args.max_seq_length * self.args.hidden_size, 512, bias=False),
60 | nn.BatchNorm1d(512),
61 | nn.ReLU(inplace=True),
62 | nn.Linear(512, self.args.hidden_size, bias=True),
63 | )
64 | if self.cuda_condition:
65 | self.model.cuda()
66 | self.projection.cuda()
67 | # Setting the train and test data loader
68 | self.train_dataloader = train_dataloader
69 | self.cluster_dataloader = cluster_dataloader
70 | self.eval_dataloader = eval_dataloader
71 | self.test_dataloader = test_dataloader
72 |
73 | # self.data_name = self.args.data_name
74 | betas = (self.args.adam_beta1, self.args.adam_beta2)
75 | self.optim = Adam(self.model.parameters(), lr=self.args.lr, betas=betas, weight_decay=self.args.weight_decay)
76 |
77 | print("Total Parameters:", sum([p.nelement() for p in self.model.parameters()]))
78 |
79 | self.cf_criterion = NCELoss(self.args.temperature, self.device)
80 | self.pcl_criterion = PCLoss(self.args.temperature, self.device)
81 |
82 | def train(self, epoch):
83 | self.iteration(epoch, self.train_dataloader, self.cluster_dataloader)
84 |
85 | def valid(self, epoch, full_sort=False):
86 | return self.iteration(epoch, self.eval_dataloader, full_sort=full_sort, train=False)
87 |
88 | def test(self, epoch, full_sort=False):
89 | return self.iteration(epoch, self.test_dataloader, full_sort=full_sort, train=False)
90 |
91 | def iteration(self, epoch, dataloader, full_sort=False, train=True):
92 | raise NotImplementedError
93 |
94 | def get_sample_scores(self, epoch, pred_list):
95 |         pred_list = (-pred_list).argsort().argsort()[:, 0]  # double argsort converts scores to each answer's rank
96 | HIT_1, NDCG_1, MRR = get_metric(pred_list, 1)
97 | HIT_5, NDCG_5, MRR = get_metric(pred_list, 5)
98 | HIT_10, NDCG_10, MRR = get_metric(pred_list, 10)
99 | post_fix = {
100 | "Epoch": epoch,
101 | "HIT@1": "{:.4f}".format(HIT_1),
102 | "NDCG@1": "{:.4f}".format(NDCG_1),
103 | "HIT@5": "{:.4f}".format(HIT_5),
104 | "NDCG@5": "{:.4f}".format(NDCG_5),
105 | "HIT@10": "{:.4f}".format(HIT_10),
106 | "NDCG@10": "{:.4f}".format(NDCG_10),
107 | "MRR": "{:.4f}".format(MRR),
108 | }
109 | print(post_fix)
110 | with open(self.args.log_file, "a") as f:
111 | f.write(str(post_fix) + "\n")
112 | return [HIT_1, NDCG_1, HIT_5, NDCG_5, HIT_10, NDCG_10, MRR], str(post_fix)
113 |
114 | def get_full_sort_score(self, epoch, answers, pred_list):
115 | recall, ndcg = [], []
116 | for k in [5, 10, 15, 20]:
117 | recall.append(recall_at_k(answers, pred_list, k))
118 | ndcg.append(ndcg_k(answers, pred_list, k))
119 | post_fix = {
120 | "Epoch": epoch,
121 | "HIT@5": "{:.4f}".format(recall[0]),
122 | "NDCG@5": "{:.4f}".format(ndcg[0]),
123 | "HIT@10": "{:.4f}".format(recall[1]),
124 | "NDCG@10": "{:.4f}".format(ndcg[1]),
125 | "HIT@20": "{:.4f}".format(recall[3]),
126 | "NDCG@20": "{:.4f}".format(ndcg[3]),
127 | }
128 | print(post_fix)
129 | with open(self.args.log_file, "a") as f:
130 | f.write(str(post_fix) + "\n")
131 | return [recall[0], ndcg[0], recall[1], ndcg[1], recall[3], ndcg[3]], str(post_fix)
132 |
133 | def save(self, file_name):
134 | torch.save(self.model.cpu().state_dict(), file_name)
135 | self.model.to(self.device)
136 |
137 | def load(self, file_name):
138 | self.model.load_state_dict(torch.load(file_name))
139 |
140 | def cross_entropy(self, seq_out, pos_ids, neg_ids):
141 | # [batch seq_len hidden_size]
142 | pos_emb = self.model.item_embeddings(pos_ids)
143 | neg_emb = self.model.item_embeddings(neg_ids)
144 | # [batch*seq_len hidden_size]
145 | pos = pos_emb.view(-1, pos_emb.size(2))
146 | neg = neg_emb.view(-1, neg_emb.size(2))
147 | seq_emb = seq_out.view(-1, self.args.hidden_size) # [batch*seq_len hidden_size]
148 | pos_logits = torch.sum(pos * seq_emb, -1) # [batch*seq_len]
149 | neg_logits = torch.sum(neg * seq_emb, -1)
150 | istarget = (pos_ids > 0).view(pos_ids.size(0) * self.model.args.max_seq_length).float() # [batch*seq_len]
151 | loss = torch.sum(
152 | -torch.log(torch.sigmoid(pos_logits) + 1e-24) * istarget
153 | - torch.log(1 - torch.sigmoid(neg_logits) + 1e-24) * istarget
154 | ) / torch.sum(istarget)
155 |
156 | return loss
157 |
158 | def predict_sample(self, seq_out, test_neg_sample):
159 | # [batch 100 hidden_size]
160 | test_item_emb = self.model.item_embeddings(test_neg_sample)
161 | # [batch hidden_size]
162 | test_logits = torch.bmm(test_item_emb, seq_out.unsqueeze(-1)).squeeze(-1) # [B 100]
163 | return test_logits
164 |
165 | def predict_full(self, seq_out):
166 | # [item_num hidden_size]
167 | test_item_emb = self.model.item_embeddings.weight
168 | # [batch hidden_size ]
169 | rating_pred = torch.matmul(seq_out, test_item_emb.transpose(0, 1))
170 | return rating_pred
171 |
172 |
173 | class ICLRecTrainer(Trainer):
174 | def __init__(self, model, train_dataloader, cluster_dataloader, eval_dataloader, test_dataloader, args):
175 | super(ICLRecTrainer, self).__init__(
176 | model, train_dataloader, cluster_dataloader, eval_dataloader, test_dataloader, args
177 | )
178 |
179 | def _instance_cl_one_pair_contrastive_learning(self, inputs, intent_ids=None):
180 | """
181 | contrastive learning given one pair sequences (batch)
182 |         inputs: [batch1_augmented_data, batch2_augmented_data]
183 | """
184 | cl_batch = torch.cat(inputs, dim=0)
185 | cl_batch = cl_batch.to(self.device)
186 | cl_sequence_output = self.model(cl_batch)
187 | # cf_sequence_output = cf_sequence_output[:, -1, :]
188 | if self.args.seq_representation_instancecl_type == "mean":
189 | cl_sequence_output = torch.mean(cl_sequence_output, dim=1, keepdim=False)
190 | cl_sequence_flatten = cl_sequence_output.view(cl_batch.shape[0], -1)
191 | # cf_output = self.projection(cf_sequence_flatten)
192 | batch_size = cl_batch.shape[0] // 2
193 | cl_output_slice = torch.split(cl_sequence_flatten, batch_size)
194 | if self.args.de_noise:
195 | cl_loss = self.cf_criterion(cl_output_slice[0], cl_output_slice[1], intent_ids=intent_ids)
196 | else:
197 | cl_loss = self.cf_criterion(cl_output_slice[0], cl_output_slice[1], intent_ids=None)
198 | return cl_loss
199 |
200 | def _pcl_one_pair_contrastive_learning(self, inputs, intents, intent_ids):
201 | """
202 | contrastive learning given one pair sequences (batch)
203 |         inputs: [batch1_augmented_data, batch2_augmented_data]
204 | intents: [num_clusters batch_size hidden_dims]
205 | """
206 | n_views, (bsz, seq_len) = len(inputs), inputs[0].shape
207 | cl_batch = torch.cat(inputs, dim=0)
208 | cl_batch = cl_batch.to(self.device)
209 | cl_sequence_output = self.model(cl_batch)
210 | if self.args.seq_representation_type == "mean":
211 | cl_sequence_output = torch.mean(cl_sequence_output, dim=1, keepdim=False)
212 | cl_sequence_flatten = cl_sequence_output.view(cl_batch.shape[0], -1)
213 | cl_output_slice = torch.split(cl_sequence_flatten, bsz)
214 | if self.args.de_noise:
215 | cl_loss = self.pcl_criterion(cl_output_slice[0], cl_output_slice[1], intents=intents, intent_ids=intent_ids)
216 | else:
217 | cl_loss = self.pcl_criterion(cl_output_slice[0], cl_output_slice[1], intents=intents, intent_ids=None)
218 | return cl_loss
219 |
220 | def iteration(self, epoch, dataloader, cluster_dataloader=None, full_sort=True, train=True):
221 |
222 | str_code = "train" if train else "test"
223 |
224 | # Setting the tqdm progress bar
225 |
226 | if train:
227 | # ------ intentions clustering ----- #
228 | if self.args.contrast_type in ["IntentCL", "Hybrid"] and epoch >= self.args.warm_up_epoches:
229 | print("Preparing Clustering:")
230 | self.model.eval()
231 | kmeans_training_data = []
232 | rec_cf_data_iter = tqdm(enumerate(cluster_dataloader), total=len(cluster_dataloader))
233 | for i, (rec_batch, _, _) in rec_cf_data_iter:
234 | rec_batch = tuple(t.to(self.device) for t in rec_batch)
235 | _, input_ids, target_pos, target_neg, _ = rec_batch
236 | sequence_output = self.model(input_ids)
237 |                     # mean pooling over the sequence dimension
238 | if self.args.seq_representation_type == "mean":
239 | sequence_output = torch.mean(sequence_output, dim=1, keepdim=False)
240 | sequence_output = sequence_output.view(sequence_output.shape[0], -1)
241 | sequence_output = sequence_output.detach().cpu().numpy()
242 | kmeans_training_data.append(sequence_output)
243 | kmeans_training_data = np.concatenate(kmeans_training_data, axis=0)
244 |
245 | # train multiple clusters
246 | print("Training Clusters:")
247 | for i, cluster in tqdm(enumerate(self.clusters), total=len(self.clusters)):
248 | cluster.train(kmeans_training_data)
249 | self.clusters[i] = cluster
250 | # clean memory
251 | del kmeans_training_data
252 | import gc
253 |
254 | gc.collect()
255 |
256 | # ------ model training -----#
257 | print("Performing Rec model Training:")
258 | self.model.train()
259 | rec_avg_loss = 0.0
260 |             cl_individual_avg_losses = [0.0 for i in range(self.total_augmentation_pairs)]
261 | cl_sum_avg_loss = 0.0
262 | joint_avg_loss = 0.0
263 |
264 | print(f"rec dataset length: {len(dataloader)}")
265 | rec_cf_data_iter = tqdm(enumerate(dataloader), total=len(dataloader))
266 |
267 | for i, (rec_batch, cl_batches, seq_class_label_batches) in rec_cf_data_iter:
268 | """
269 | rec_batch shape: key_name x batch_size x feature_dim
270 | cl_batches shape:
271 | list of n_views x batch_size x feature_dim tensors
272 | """
273 |                 # 0. batch_data will be sent to the device (GPU or CPU)
274 | rec_batch = tuple(t.to(self.device) for t in rec_batch)
275 | _, input_ids, target_pos, target_neg, _ = rec_batch
276 |
277 | # ---------- recommendation task ---------------#
278 | sequence_output = self.model(input_ids)
279 | rec_loss = self.cross_entropy(sequence_output, target_pos, target_neg)
280 |
281 | # ---------- contrastive learning task -------------#
282 | cl_losses = []
283 | for cl_batch in cl_batches:
284 | if self.args.contrast_type == "InstanceCL":
285 | cl_loss = self._instance_cl_one_pair_contrastive_learning(
286 | cl_batch, intent_ids=seq_class_label_batches
287 | )
288 | cl_losses.append(self.args.cf_weight * cl_loss)
289 | elif self.args.contrast_type == "IntentCL":
290 |                         # ------ query the trained clusters for users' intentions ----#
291 |                         # mean pooling over the sequence dimension
292 | if epoch >= self.args.warm_up_epoches:
293 | if self.args.seq_representation_type == "mean":
294 | sequence_output = torch.mean(sequence_output, dim=1, keepdim=False)
295 | sequence_output = sequence_output.view(sequence_output.shape[0], -1)
296 | sequence_output = sequence_output.detach().cpu().numpy()
297 |
298 | # query on multiple clusters
299 | for cluster in self.clusters:
300 | seq2intents = []
301 | intent_ids = []
302 | intent_id, seq2intent = cluster.query(sequence_output)
303 | seq2intents.append(seq2intent)
304 | intent_ids.append(intent_id)
305 | cl_loss = self._pcl_one_pair_contrastive_learning(
306 | cl_batch, intents=seq2intents, intent_ids=intent_ids
307 | )
308 | cl_losses.append(self.args.intent_cf_weight * cl_loss)
309 | else:
310 | continue
311 | elif self.args.contrast_type == "Hybrid":
312 | if epoch < self.args.warm_up_epoches:
313 | cl_loss1 = self._instance_cl_one_pair_contrastive_learning(
314 | cl_batch, intent_ids=seq_class_label_batches
315 | )
316 | cl_losses.append(self.args.cf_weight * cl_loss1)
317 | else:
318 | cl_loss1 = self._instance_cl_one_pair_contrastive_learning(
319 | cl_batch, intent_ids=seq_class_label_batches
320 | )
321 | cl_losses.append(self.args.cf_weight * cl_loss1)
322 | if self.args.seq_representation_type == "mean":
323 | sequence_output = torch.mean(sequence_output, dim=1, keepdim=False)
324 | sequence_output = sequence_output.view(sequence_output.shape[0], -1)
325 | sequence_output = sequence_output.detach().cpu().numpy()
326 | # query on multiple clusters
327 | for cluster in self.clusters:
328 | seq2intents = []
329 | intent_ids = []
330 | intent_id, seq2intent = cluster.query(sequence_output)
331 | seq2intents.append(seq2intent)
332 | intent_ids.append(intent_id)
333 | cl_loss3 = self._pcl_one_pair_contrastive_learning(
334 | cl_batch, intents=seq2intents, intent_ids=intent_ids
335 | )
336 | cl_losses.append(self.args.intent_cf_weight * cl_loss3)
337 |
338 | joint_loss = self.args.rec_weight * rec_loss
339 | for cl_loss in cl_losses:
340 | joint_loss += cl_loss
341 | self.optim.zero_grad()
342 | joint_loss.backward()
343 | self.optim.step()
344 |
345 | rec_avg_loss += rec_loss.item()
346 |
347 |                 for cl_loss in cl_losses:  # avoid shadowing the batch index i
348 |                     cl_sum_avg_loss += cl_loss.item()
349 | joint_avg_loss += joint_loss.item()
350 |
351 | post_fix = {
352 | "epoch": epoch,
353 | "rec_avg_loss": "{:.4f}".format(rec_avg_loss / len(rec_cf_data_iter)),
354 | "joint_avg_loss": "{:.4f}".format(joint_avg_loss / len(rec_cf_data_iter)),
355 | }
356 | if (epoch + 1) % self.args.log_freq == 0:
357 | print(str(post_fix))
358 |
359 | with open(self.args.log_file, "a") as f:
360 | f.write(str(post_fix) + "\n")
361 |
362 | else:
363 | rec_data_iter = tqdm(enumerate(dataloader), total=len(dataloader))
364 | self.model.eval()
365 |
366 | pred_list = None
367 |
368 | if full_sort:
369 | answer_list = None
370 | for i, batch in rec_data_iter:
371 |                     # 0. batch_data will be sent to the device (GPU or CPU)
372 | batch = tuple(t.to(self.device) for t in batch)
373 | user_ids, input_ids, target_pos, target_neg, answers = batch
374 | recommend_output = self.model(input_ids)
375 |
376 | recommend_output = recommend_output[:, -1, :]
377 | # recommendation results
378 |
379 | rating_pred = self.predict_full(recommend_output)
380 |
381 | rating_pred = rating_pred.cpu().data.numpy().copy()
382 | batch_user_index = user_ids.cpu().numpy()
383 | rating_pred[self.args.train_matrix[batch_user_index].toarray() > 0] = 0
384 | # reference: https://stackoverflow.com/a/23734295, https://stackoverflow.com/a/20104162
385 |                     # argpartition: O(n); argsort: O(n log n)
386 | ind = np.argpartition(rating_pred, -20)[:, -20:]
387 | arr_ind = rating_pred[np.arange(len(rating_pred))[:, None], ind]
388 | arr_ind_argsort = np.argsort(arr_ind)[np.arange(len(rating_pred)), ::-1]
389 | batch_pred_list = ind[np.arange(len(rating_pred))[:, None], arr_ind_argsort]
390 |
391 | if i == 0:
392 | pred_list = batch_pred_list
393 | answer_list = answers.cpu().data.numpy()
394 | else:
395 | pred_list = np.append(pred_list, batch_pred_list, axis=0)
396 | answer_list = np.append(answer_list, answers.cpu().data.numpy(), axis=0)
397 | return self.get_full_sort_score(epoch, answer_list, pred_list)
398 |
399 | else:
400 | for i, batch in rec_data_iter:
401 | batch = tuple(t.to(self.device) for t in batch)
402 | user_ids, input_ids, target_pos, target_neg, answers, sample_negs = batch
403 | recommend_output = self.model.finetune(input_ids)
404 | test_neg_items = torch.cat((answers, sample_negs), -1)
405 | recommend_output = recommend_output[:, -1, :]
406 |
407 | test_logits = self.predict_sample(recommend_output, test_neg_items)
408 | test_logits = test_logits.cpu().detach().numpy().copy()
409 | if i == 0:
410 | pred_list = test_logits
411 | else:
412 | pred_list = np.append(pred_list, test_logits, axis=0)
413 |
414 | return self.get_sample_scores(epoch, pred_list)
415 |
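416 |
417 | # Standalone sketch of the argpartition/argsort top-20 trick used in the
418 | # full-sort branch above; the toy sizes below are assumptions for illustration.
419 | if __name__ == "__main__":
420 |     rating_pred = np.random.rand(3, 50)  # [batch_size, item_num]
421 |     ind = np.argpartition(rating_pred, -20)[:, -20:]  # 20 largest per row, unordered, O(n)
422 |     arr_ind = rating_pred[np.arange(len(rating_pred))[:, None], ind]
423 |     arr_ind_argsort = np.argsort(arr_ind)[np.arange(len(rating_pred)), ::-1]  # order only those 20
424 |     batch_pred_list = ind[np.arange(len(rating_pred))[:, None], arr_ind_argsort]
425 |     print(batch_pred_list[:, :5])  # five highest-scoring item ids per user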
--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Copyright (c) 2022 salesforce.com, inc.
4 | # All rights reserved.
5 | # SPDX-License-Identifier: BSD-3-Clause
6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
7 | #
8 |
9 | import numpy as np
10 | import math
11 | import random
12 | import os
13 | import json
14 | import pickle
15 | from scipy.sparse import csr_matrix
16 |
17 | import torch
18 | import torch.nn.functional as F
19 |
20 |
21 | def set_seed(seed):
22 | random.seed(seed)
23 | os.environ["PYTHONHASHSEED"] = str(seed)
24 | np.random.seed(seed)
25 | torch.manual_seed(seed)
26 | torch.cuda.manual_seed(seed)
27 | torch.cuda.manual_seed_all(seed)
28 | # some cudnn methods can be random even after fixing the seed
29 | # unless you tell it to be deterministic
30 | torch.backends.cudnn.deterministic = True
31 |
32 |
33 | def nCr(n, r):
34 | f = math.factorial
35 | return f(n) // f(r) // f(n - r)
36 |
37 |
38 | def check_path(path):
39 | if not os.path.exists(path):
40 | os.makedirs(path)
41 | print(f"{path} created")
42 |
43 |
44 | def neg_sample(item_set, item_size):  # randint is inclusive on both ends
45 | item = random.randint(1, item_size - 1)
46 | while item in item_set:
47 | item = random.randint(1, item_size - 1)
48 | return item
49 |
50 |
51 | class EarlyStopping:
52 |     """Early stops the training if the validation score doesn't improve after a given patience."""
53 |
54 | def __init__(self, checkpoint_path, patience=7, verbose=False, delta=0):
55 | """
56 | Args:
57 | patience (int): How long to wait after last time validation loss improved.
58 | Default: 7
59 | verbose (bool): If True, prints a message for each validation loss improvement.
60 | Default: False
61 | delta (float): Minimum change in the monitored quantity to qualify as an improvement.
62 | Default: 0
63 | """
64 | self.checkpoint_path = checkpoint_path
65 | self.patience = patience
66 | self.verbose = verbose
67 | self.counter = 0
68 | self.best_score = None
69 | self.early_stop = False
70 | self.delta = delta
71 |
72 | def compare(self, score):
73 | for i in range(len(score)):
74 |             # if any single metric improves, we consider the model still improving
75 | if score[i] > self.best_score[i] + self.delta:
76 | return False
77 | return True
78 |
79 | def __call__(self, score, model):
80 | # score HIT@10 NDCG@10
81 |
82 | if self.best_score is None:
83 | self.best_score = score
84 | self.score_min = np.array([0] * len(score))
85 | self.save_checkpoint(score, model)
86 | elif self.compare(score):
87 | self.counter += 1
88 | print(f"EarlyStopping counter: {self.counter} out of {self.patience}")
89 | if self.counter >= self.patience:
90 | self.early_stop = True
91 | else:
92 | self.best_score = score
93 | self.save_checkpoint(score, model)
94 | self.counter = 0
95 |
96 | def save_checkpoint(self, score, model):
97 |         """Saves the model when the validation score increases."""
98 |         if self.verbose:
99 |             # ({self.score_min:.6f} --> {score:.6f})  # this only formats cleanly when score is a single value
100 |             print("Validation score increased. Saving model ...")
101 | torch.save(model.state_dict(), self.checkpoint_path)
102 | self.score_min = score
103 |
104 |
105 | def kmax_pooling(x, dim, k):
106 | index = x.topk(k, dim=dim)[1].sort(dim=dim)[0]
107 | return x.gather(dim, index).squeeze(dim)
108 |
109 |
110 | def avg_pooling(x, dim):
111 | return x.sum(dim=dim) / x.size(dim)
112 |
113 |
114 | def generate_rating_matrix_valid(user_seq, num_users, num_items):
115 | # three lists are used to construct sparse matrix
116 | row = []
117 | col = []
118 | data = []
119 | for user_id, item_list in enumerate(user_seq):
120 |         for item in item_list[:-2]:  # exclude the last two items (valid and test targets)
121 | row.append(user_id)
122 | col.append(item)
123 | data.append(1)
124 |
125 | row = np.array(row)
126 | col = np.array(col)
127 | data = np.array(data)
128 | rating_matrix = csr_matrix((data, (row, col)), shape=(num_users, num_items))
129 |
130 | return rating_matrix
131 |
132 |
133 | def generate_rating_matrix_test(user_seq, num_users, num_items):
134 | # three lists are used to construct sparse matrix
135 | row = []
136 | col = []
137 | data = []
138 | for user_id, item_list in enumerate(user_seq):
139 |         for item in item_list[:-1]:  # exclude the last item (test target)
140 | row.append(user_id)
141 | col.append(item)
142 | data.append(1)
143 |
144 | row = np.array(row)
145 | col = np.array(col)
146 | data = np.array(data)
147 | rating_matrix = csr_matrix((data, (row, col)), shape=(num_users, num_items))
148 |
149 | return rating_matrix
150 |
151 |
152 | def get_user_seqs(data_file):
153 | lines = open(data_file).readlines()
154 | user_seq = []
155 | item_set = set()
156 | for line in lines:
157 | user, items = line.strip().split(" ", 1)
158 | items = items.split(" ")
159 | items = [int(item) for item in items]
160 | user_seq.append(items)
161 | item_set = item_set | set(items)
162 | max_item = max(item_set)
163 |
164 | num_users = len(lines)
165 | num_items = max_item + 2
166 |
167 | valid_rating_matrix = generate_rating_matrix_valid(user_seq, num_users, num_items)
168 | test_rating_matrix = generate_rating_matrix_test(user_seq, num_users, num_items)
169 | return user_seq, max_item, valid_rating_matrix, test_rating_matrix
170 |
171 |
172 | def get_user_seqs_long(data_file):
173 | lines = open(data_file).readlines()
174 | user_seq = []
175 | long_sequence = []
176 | item_set = set()
177 | for line in lines:
178 | user, items = line.strip().split(" ", 1)
179 | items = items.split(" ")
180 | items = [int(item) for item in items]
181 |         long_sequence.extend(items)  # these are later used as sampled negatives
182 | user_seq.append(items)
183 | item_set = item_set | set(items)
184 | max_item = max(item_set)
185 |
186 | return user_seq, max_item, long_sequence
187 |
188 |
189 | def get_user_seqs_and_sample(data_file, sample_file):
190 | lines = open(data_file).readlines()
191 | user_seq = []
192 | item_set = set()
193 | for line in lines:
194 | user, items = line.strip().split(" ", 1)
195 | items = items.split(" ")
196 | items = [int(item) for item in items]
197 | user_seq.append(items)
198 | item_set = item_set | set(items)
199 | max_item = max(item_set)
200 |
201 | lines = open(sample_file).readlines()
202 | sample_seq = []
203 | for line in lines:
204 | user, items = line.strip().split(" ", 1)
205 | items = items.split(" ")
206 | items = [int(item) for item in items]
207 | sample_seq.append(items)
208 |
209 | assert len(user_seq) == len(sample_seq)
210 |
211 | return user_seq, max_item, sample_seq
212 |
213 |
214 | def get_item2attribute_json(data_file):
215 | item2attribute = json.loads(open(data_file).readline())
216 | attribute_set = set()
217 | for item, attributes in item2attribute.items():
218 | attribute_set = attribute_set | set(attributes)
219 | attribute_size = max(attribute_set) # 331
220 | return item2attribute, attribute_size
221 |
222 |
223 | def get_metric(pred_list, topk=10):
224 | NDCG = 0.0
225 | HIT = 0.0
226 | MRR = 0.0
227 | # [batch] the answer's rank
228 | for rank in pred_list:
229 | MRR += 1.0 / (rank + 1.0)
230 | if rank < topk:
231 | NDCG += 1.0 / np.log2(rank + 2.0)
232 | HIT += 1.0
233 | return HIT / len(pred_list), NDCG / len(pred_list), MRR / len(pred_list)
234 |
235 |
236 | def precision_at_k_per_sample(actual, predicted, topk):
237 | num_hits = 0
238 | for place in predicted:
239 | if place in actual:
240 | num_hits += 1
241 | return num_hits / (topk + 0.0)
242 |
243 |
244 | def precision_at_k(actual, predicted, topk):
245 | sum_precision = 0.0
246 | num_users = len(predicted)
247 | for i in range(num_users):
248 | act_set = set(actual[i])
249 | pred_set = set(predicted[i][:topk])
250 | sum_precision += len(act_set & pred_set) / float(topk)
251 |
252 | return sum_precision / num_users
253 |
254 |
255 | def recall_at_k(actual, predicted, topk):
256 | sum_recall = 0.0
257 | num_users = len(predicted)
258 | true_users = 0
259 | for i in range(num_users):
260 | act_set = set(actual[i])
261 | pred_set = set(predicted[i][:topk])
262 | if len(act_set) != 0:
263 | sum_recall += len(act_set & pred_set) / float(len(act_set))
264 | true_users += 1
265 | return sum_recall / true_users
266 |
267 |
268 | def apk(actual, predicted, k=10):
269 | """
270 | Computes the average precision at k.
271 | This function computes the average precision at k between two lists of
272 | items.
273 | Parameters
274 | ----------
275 | actual : list
276 | A list of elements that are to be predicted (order doesn't matter)
277 | predicted : list
278 | A list of predicted elements (order does matter)
279 | k : int, optional
280 | The maximum number of predicted elements
281 | Returns
282 | -------
283 | score : double
284 | The average precision at k over the input lists
285 | """
286 | if len(predicted) > k:
287 | predicted = predicted[:k]
288 |
289 | score = 0.0
290 | num_hits = 0.0
291 |
292 | for i, p in enumerate(predicted):
293 | if p in actual and p not in predicted[:i]:
294 | num_hits += 1.0
295 | score += num_hits / (i + 1.0)
296 |
297 | if not actual:
298 | return 0.0
299 |
300 | return score / min(len(actual), k)
301 |
302 |
303 | def mapk(actual, predicted, k=10):
304 | """
305 | Computes the mean average precision at k.
306 |     This function computes the mean average precision at k between two lists
307 | of lists of items.
308 | Parameters
309 | ----------
310 | actual : list
311 | A list of lists of elements that are to be predicted
312 | (order doesn't matter in the lists)
313 | predicted : list
314 | A list of lists of predicted elements
315 | (order matters in the lists)
316 | k : int, optional
317 | The maximum number of predicted elements
318 | Returns
319 | -------
320 | score : double
321 | The mean average precision at k over the input lists
322 | """
323 | return np.mean([apk(a, p, k) for a, p in zip(actual, predicted)])
324 |
325 |
326 | def ndcg_k(actual, predicted, topk):
327 | res = 0
328 | for user_id in range(len(actual)):
329 | k = min(topk, len(actual[user_id]))
330 | idcg = idcg_k(k)
331 | dcg_k = sum([int(predicted[user_id][j] in set(actual[user_id])) / math.log(j + 2, 2) for j in range(topk)])
332 | res += dcg_k / idcg
333 | return res / float(len(actual))
334 |
335 |
336 | # Calculates the ideal discounted cumulative gain at k
337 | def idcg_k(k):
338 | res = sum([1.0 / math.log(i + 2, 2) for i in range(k)])
339 | if not res:
340 | return 1.0
341 | else:
342 | return res
343 |
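344 |
345 | # Hedged worked example of the metric helpers above; the two-user toy data is
346 | # an assumption chosen so the numbers can be checked by hand.
347 | if __name__ == "__main__":
348 |     actual = [[3], [7]]                              # one held-out item per user
349 |     predicted = [[3, 5, 9, 1, 2], [4, 6, 8, 7, 0]]   # top-5 ranked item lists
350 |     print(recall_at_k(actual, predicted, 5))         # (1/1 + 1/1) / 2 = 1.0
351 |     print(ndcg_k(actual, predicted, 5))              # (1 + 1/log2(5)) / 2 ≈ 0.7153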
--------------------------------------------------------------------------------