├── .copyright.tmpl ├── .pre-commit-config.yaml ├── CODEOWNERS ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING-ARCHIVED.md ├── LICENSE.txt ├── README.md ├── SECURIITY.md ├── SECURITY.md ├── data ├── Beauty.txt ├── README.md ├── Sports_and_Outdoors.txt ├── Toys_and_Games.txt ├── Yelp.txt └── ml-1m.txt ├── img ├── model.png └── motivation_sports.png └── src ├── data_augmentation.py ├── datasets.py ├── main.py ├── models.py ├── modules.py ├── output ├── ICLRec-Beauty-1.pt ├── ICLRec-Sports_and_Outdoors-1.pt ├── ICLRec-Toys_and_Games-1.pt ├── ICLRec-Yelp-1.pt ├── ICLRec-ml-1m-1.pt └── README.md ├── scripts ├── run_beauty.sh ├── run_ml_1m.sh ├── run_sports.sh ├── run_toys.sh └── run_yelp.sh ├── trainers.py └── utils.py /.copyright.tmpl: -------------------------------------------------------------------------------- 1 | Copyright (c) ${years} ${owner} 2 | All rights reserved. 3 | SPDX-License-Identifier: BSD-3-Clause 4 | For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause 5 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/psf/black 3 | rev: '19.3b0' 4 | hooks: 5 | - id: black 6 | args: ["--line-length", "120"] 7 | - repo: https://github.com/johann-petrak/licenseheaders.git 8 | rev: 'v0.8.8' 9 | hooks: 10 | - id: licenseheaders 11 | args: ["-t", ".copyright.tmpl", "-cy", "-o", "salesforce.com, inc.", 12 | "-E", ".py", "-f"] -------------------------------------------------------------------------------- /CODEOWNERS: -------------------------------------------------------------------------------- 1 | # Comment line immediately above ownership line is reserved for related other information. Please be careful while editing. 2 | #ECCN:Open Source 3 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Salesforce Open Source Community Code of Conduct 2 | 3 | ## About the Code of Conduct 4 | 5 | Equality is a core value at Salesforce. We believe a diverse and inclusive 6 | community fosters innovation and creativity, and are committed to building a 7 | culture where everyone feels included. 8 | 9 | Salesforce open-source projects are committed to providing a friendly, safe, and 10 | welcoming environment for all, regardless of gender identity and expression, 11 | sexual orientation, disability, physical appearance, body size, ethnicity, nationality, 12 | race, age, religion, level of experience, education, socioeconomic status, or 13 | other similar personal characteristics. 14 | 15 | The goal of this code of conduct is to specify a baseline standard of behavior so 16 | that people with different social values and communication styles can work 17 | together effectively, productively, and respectfully in our open source community. 18 | It also establishes a mechanism for reporting issues and resolving conflicts. 19 | 20 | All questions and reports of abusive, harassing, or otherwise unacceptable behavior 21 | in a Salesforce open-source project may be reported by contacting the Salesforce 22 | Open Source Conduct Committee at ossconduct@salesforce.com. 
23 | 24 | ## Our Pledge 25 | 26 | In the interest of fostering an open and welcoming environment, we as 27 | contributors and maintainers pledge to making participation in our project and 28 | our community a harassment-free experience for everyone, regardless of gender 29 | identity and expression, sexual orientation, disability, physical appearance, 30 | body size, ethnicity, nationality, race, age, religion, level of experience, education, 31 | socioeconomic status, or other similar personal characteristics. 32 | 33 | ## Our Standards 34 | 35 | Examples of behavior that contributes to creating a positive environment 36 | include: 37 | 38 | * Using welcoming and inclusive language 39 | * Being respectful of differing viewpoints and experiences 40 | * Gracefully accepting constructive criticism 41 | * Focusing on what is best for the community 42 | * Showing empathy toward other community members 43 | 44 | Examples of unacceptable behavior by participants include: 45 | 46 | * The use of sexualized language or imagery and unwelcome sexual attention or 47 | advances 48 | * Personal attacks, insulting/derogatory comments, or trolling 49 | * Public or private harassment 50 | * Publishing, or threatening to publish, others' private information—such as 51 | a physical or electronic address—without explicit permission 52 | * Other conduct which could reasonably be considered inappropriate in a 53 | professional setting 54 | * Advocating for or encouraging any of the above behaviors 55 | 56 | ## Our Responsibilities 57 | 58 | Project maintainers are responsible for clarifying the standards of acceptable 59 | behavior and are expected to take appropriate and fair corrective action in 60 | response to any instances of unacceptable behavior. 61 | 62 | Project maintainers have the right and responsibility to remove, edit, or 63 | reject comments, commits, code, wiki edits, issues, and other contributions 64 | that are not aligned with this Code of Conduct, or to ban temporarily or 65 | permanently any contributor for other behaviors that they deem inappropriate, 66 | threatening, offensive, or harmful. 67 | 68 | ## Scope 69 | 70 | This Code of Conduct applies both within project spaces and in public spaces 71 | when an individual is representing the project or its community. Examples of 72 | representing a project or community include using an official project email 73 | address, posting via an official social media account, or acting as an appointed 74 | representative at an online or offline event. Representation of a project may be 75 | further defined and clarified by project maintainers. 76 | 77 | ## Enforcement 78 | 79 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 80 | reported by contacting the Salesforce Open Source Conduct Committee 81 | at ossconduct@salesforce.com. All complaints will be reviewed and investigated 82 | and will result in a response that is deemed necessary and appropriate to the 83 | circumstances. The committee is obligated to maintain confidentiality with 84 | regard to the reporter of an incident. Further details of specific enforcement 85 | policies may be posted separately. 86 | 87 | Project maintainers who do not follow or enforce the Code of Conduct in good 88 | faith may face temporary or permanent repercussions as determined by other 89 | members of the project's leadership and the Salesforce Open Source Conduct 90 | Committee. 
91 | 92 | ## Attribution 93 | 94 | This Code of Conduct is adapted from the [Contributor Covenant][contributor-covenant-home], 95 | version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html. 96 | It includes adaptions and additions from [Go Community Code of Conduct][golang-coc], 97 | [CNCF Code of Conduct][cncf-coc], and [Microsoft Open Source Code of Conduct][microsoft-coc]. 98 | 99 | This Code of Conduct is licensed under the [Creative Commons Attribution 3.0 License][cc-by-3-us]. 100 | 101 | [contributor-covenant-home]: https://www.contributor-covenant.org (https://www.contributor-covenant.org/) 102 | [golang-coc]: https://golang.org/conduct 103 | [cncf-coc]: https://github.com/cncf/foundation/blob/master/code-of-conduct.md 104 | [microsoft-coc]: https://opensource.microsoft.com/codeofconduct/ 105 | [cc-by-3-us]: https://creativecommons.org/licenses/by/3.0/us/ 106 | -------------------------------------------------------------------------------- /CONTRIBUTING-ARCHIVED.md: -------------------------------------------------------------------------------- 1 | # ARCHIVED 2 | 3 | This project is `Archived` and is no longer actively maintained; 4 | We are not accepting contributions or Pull Requests. 5 | 6 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2021, Salesforce.com, Inc. 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 7 | 8 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 9 | 10 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 11 | 12 | 3. Neither the name of Salesforce.com nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 15 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Intent Contrastive Learning for Sequential Recommendation (ICLRec) 2 | 3 | Source code for paper: [Intent Contrastive Learning for Sequential Recommendation](https://arxiv.org/pdf/2202.02519.pdf) 4 | 5 | ## Introduction 6 | 7 | Motivation: 8 | 9 | Users' interactions with items are driven by various underlying intents. 
These intents are often unobservable, yet learning them can help model users' preferences over a massive item set more accurately.
10 | 
11 | ![Motivating example on the Sports dataset](./img/motivation_sports.png)
12 | 
13 | Model Architecture:
14 | 
15 | ![ICLRec model architecture](./img/model.png)
16 | 
17 | ## Reference
18 | 
19 | Please cite our paper if you use this code.
20 | 
21 | ```
22 | @article{chen2022intent,
23 |   title={Intent Contrastive Learning for Sequential Recommendation},
24 |   author={Chen, Yongjun and Liu, Zhiwei and Li, Jia and McAuley, Julian and Xiong, Caiming},
25 |   journal={arXiv preprint arXiv:2202.02519},
26 |   year={2022}
27 | }
28 | ```
29 | 
30 | ## Implementation
31 | ### Requirements
32 | 
33 | Python >= 3.7
34 | Pytorch >= 1.2.0
35 | tqdm == 4.26.0
36 | faiss-gpu==1.7.1
37 | 
38 | ### Datasets
39 | 
40 | Five prepared datasets (Beauty, Sports_and_Outdoors, Toys_and_Games, Yelp, and ml-1m) are included in the `data` folder.
41 | 
42 | 
43 | ### Evaluate Model
44 | 
45 | We provide trained models on the Beauty, Sports_and_Outdoors, Toys_and_Games, Yelp, and ml-1m datasets in the `./src/output` folder. You can directly evaluate a trained model on the test set by running:
46 | 
47 | ```
48 | python main.py --data_name <data_name> --model_idx 1 --do_eval
49 | ```
50 | 
51 | You are expected to see the following results:
52 | 
53 | On Beauty:
54 | ```
55 | {'Epoch': 0, 'HIT@5': '0.0500', 'NDCG@5': '0.0326', 'HIT@10': '0.0744', 'NDCG@10': '0.0403', 'HIT@20': '0.1058', 'NDCG@20': '0.0483'}
56 | ```
57 | On Sports:
58 | ```
59 | {'Epoch': 0, 'HIT@5': '0.0290', 'NDCG@5': '0.0191', 'HIT@10': '0.0437', 'NDCG@10': '0.0238', 'HIT@20': '0.0646', 'NDCG@20': '0.0291'}
60 | ```
61 | On Toys:
62 | 
63 | ```
64 | {'Epoch': 0, 'HIT@5': '0.0598', 'NDCG@5': '0.0404', 'HIT@10': '0.0834', 'NDCG@10': '0.0480', 'HIT@20': '0.1138', 'NDCG@20': '0.0557'}
65 | ```
66 | 
67 | On Yelp:
68 | ```
69 | {'Epoch': 0, 'HIT@5': '0.0240', 'NDCG@5': '0.0153', 'HIT@10': '0.0409', 'NDCG@10': '0.0207', 'HIT@20': '0.0659', 'NDCG@20': '0.0270'}
70 | ```
71 | 
72 | Please feel free to test it out!
73 | 
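For example, to evaluate the provided Beauty checkpoint (`src/output/ICLRec-Beauty-1.pt`) from inside the `src` folder:

```
python main.py --data_name Beauty --model_idx 1 --do_eval
```

`main.py` builds the checkpoint name as `{model_name}-{data_name}-{model_idx}.pt`, so `--model_idx 1` resolves to the `-1.pt` files shipped in `src/output`.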
74 | 
75 | ### Train Model
76 | 
77 | To train ICLRec on a specific dataset, change to the `src` folder and run the following command (e.g., `bash scripts/run_beauty.sh` for Beauty):
78 | 
79 | ```
80 | bash scripts/run_<dataset_name>.sh
81 | ```
82 | 
83 | The script will automatically train ICLRec, save the best model found on the validation set, and then evaluate it on the test set.
84 | 
85 | 
86 | ## Acknowledgment
87 | - The Transformer and training pipeline are implemented based on [S3-Rec](https://github.com/RUCAIBox/CIKM2020-S3Rec). Thanks to them for providing an efficient implementation.
88 | 
--------------------------------------------------------------------------------
/SECURIITY.md:
--------------------------------------------------------------------------------
1 | ## Security
2 | 
3 | Please report any security issue to [security@salesforce.com](mailto:security@salesforce.com)
4 | as soon as it is discovered. This library limits its runtime dependencies in
5 | order to reduce the total cost of ownership as much as possible, but all consumers
6 | should remain vigilant and have their security stakeholders review all third-party
7 | products (3PP) like this one and their dependencies.
8 | 
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
1 | ## Security
2 | 
3 | Please report any security issue to [security@salesforce.com](mailto:security@salesforce.com)
4 | as soon as it is discovered. This library limits its runtime dependencies in
5 | order to reduce the total cost of ownership as much as possible, but all consumers
6 | should remain vigilant and have their security stakeholders review all third-party
7 | products (3PP) like this one and their dependencies.
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | ## Datasets
2 | 
3 | We provide five preprocessed datasets: Beauty, Sports_and_Outdoors, Toys_and_Games, Yelp, and ml-1m.
4 | 
5 | The first three datasets are originally from [here](http://jmcauley.ucsd.edu/data/amazon/index.html).
6 | 
7 | Please cite one or both of the following if you use them:
8 | 
9 | ```
10 | @inproceedings{he2016ups,
11 |   title={Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering},
12 |   author={He, Ruining and McAuley, Julian},
13 |   booktitle={Proceedings of the 25th International Conference on World Wide Web},
14 |   pages={507--517},
15 |   year={2016}
16 | }
17 | ```
18 | and
19 | ```
20 | @inproceedings{mcauley2015image,
21 |   title={Image-based recommendations on styles and substitutes},
22 |   author={McAuley, Julian and Targett, Christopher and Shi, Qinfeng and Van Den Hengel, Anton},
23 |   booktitle={Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval},
24 |   pages={43--52},
25 |   year={2015}
26 | }
27 | ```
28 | 
29 | 
30 | 
31 | The Yelp dataset is from https://www.yelp.com/dataset.
32 | 
33 | The ml-1m dataset is from the MovieLens project: https://grouplens.org/datasets/movielens/1m/.
34 | 
35 | 
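Each dataset `.txt` file stores one user per line: a user id followed by that user's chronologically ordered item ids, all space-separated (this is the format `OfflineItemSimilarity._load_train_data` in `src/models.py` parses). A minimal reading sketch; the file name is the only assumption:

```
# Parse a dataset file into {user_id: [item_id, ...]} (items in time order).
# Assumes lines look like: "1 4 7 9 12" (user id, then item ids).
def load_sequences(path="Beauty.txt"):
    user_seqs = {}
    with open(path) as f:
        for line in f:
            user, items = line.strip().split(" ", 1)
            user_seqs[user] = [int(i) for i in items.split(" ")]
    return user_seqs
```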
--------------------------------------------------------------------------------
/img/model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/img/model.png
--------------------------------------------------------------------------------
/img/motivation_sports.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/img/motivation_sports.png
--------------------------------------------------------------------------------
/src/data_augmentation.py:
--------------------------------------------------------------------------------
1 | #
2 | # Copyright (c) 2022 salesforce.com, inc.
3 | # All rights reserved.
4 | # SPDX-License-Identifier: BSD-3-Clause
5 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
6 | #
7 | import random
8 | import copy
9 | import itertools
10 | 
11 | 
12 | class Random(object):
13 |     """Randomly pick one data augmentation operator each time it is called"""
14 | 
15 |     def __init__(self, tao=0.2, gamma=0.7, beta=0.2):
16 |         self.data_augmentation_methods = [Crop(tao=tao), Mask(gamma=gamma), Reorder(beta=beta)]
17 |         print("Total augmentation numbers: ", len(self.data_augmentation_methods))
18 | 
19 |     def __call__(self, sequence):
20 |         # randint generates an int x in the range a <= x <= b
21 |         augment_method_idx = random.randint(0, len(self.data_augmentation_methods) - 1)
22 |         augment_method = self.data_augmentation_methods[augment_method_idx]
23 |         # print(augment_method.__class__.__name__) # debug usage
24 |         return augment_method(sequence)
25 | 
26 | 
27 | def _ensemble_sim_models(top_k_one, top_k_two):
28 |     # only supports the top k = 1 case so far; entries are (item_id, score) pairs
29 |     # print("offline: ",top_k_one, "online: ", top_k_two)
30 |     if top_k_one[0][1] >= top_k_two[0][1]:
31 |         return [top_k_one[0][0]]
32 |     else:
33 |         return [top_k_two[0][0]]  # return the item id, not the similarity score
34 | 
35 | 
36 | class Crop(object):
37 |     """Randomly crop a subseq from the original sequence"""
38 | 
39 |     def __init__(self, tao=0.2):
40 |         self.tao = tao
41 | 
42 |     def __call__(self, sequence):
43 |         # make a deep copy to avoid modifying the original sequence
44 |         copied_sequence = copy.deepcopy(sequence)
45 |         sub_seq_length = int(self.tao * len(copied_sequence))
46 |         # randint generates an int x in the range a <= x <= b
47 |         start_index = random.randint(0, len(copied_sequence) - sub_seq_length - 1)
48 |         if sub_seq_length < 1:
49 |             return [copied_sequence[start_index]]
50 |         else:
51 |             cropped_seq = copied_sequence[start_index : start_index + sub_seq_length]
52 |             return cropped_seq
53 | 
54 | 
55 | class Mask(object):
56 |     """Randomly mask a gamma-fraction of items in a sequence with the padding id 0"""
57 | 
58 |     def __init__(self, gamma=0.7):
59 |         self.gamma = gamma
60 | 
61 |     def __call__(self, sequence):
62 |         # make a deep copy to avoid modifying the original sequence
63 |         copied_sequence = copy.deepcopy(sequence)
64 |         mask_nums = int(self.gamma * len(copied_sequence))
65 |         mask = [0 for i in range(mask_nums)]
66 |         mask_idx = random.sample([i for i in range(len(copied_sequence))], k=mask_nums)
67 |         for idx, mask_value in zip(mask_idx, mask):
68 |             copied_sequence[idx] = mask_value
69 |         return copied_sequence
70 | 
71 | 
72 | class Reorder(object):
73 |     """Randomly shuffle a continuous sub-sequence"""
74 | 
75 |     def __init__(self, beta=0.2):
76 |         self.beta = beta
77 | 
78 |     def __call__(self, sequence):
79 |         # make a deep copy to avoid modifying the original sequence
80 |         copied_sequence = copy.deepcopy(sequence)
81 |         sub_seq_length = int(self.beta * len(copied_sequence))
82 |         start_index = random.randint(0, len(copied_sequence) - sub_seq_length - 1)
83 |         sub_seq = copied_sequence[start_index : start_index + sub_seq_length]
84 |         random.shuffle(sub_seq)
85 |         reordered_seq = copied_sequence[:start_index] + sub_seq + copied_sequence[start_index + sub_seq_length :]
86 |         assert len(copied_sequence) == len(reordered_seq)
87 |         return reordered_seq
88 | 
89 | 
90 | if __name__ == "__main__":
91 |     reorder = Reorder(beta=0.2)
92 |     sequence = [
93 |         14052,
94 |         10908,
95 |         2776,
96 |         16243,
97 |         2726,
98 |         2961,
99 |         11962,
100 |         4672,
101 |         2224,
102 |         5727,
103 |         4985,
104 |         9310,
105 |         2181,
106 |         3428,
107 |         4156,
108 |         16536,
109 |         180,
110 |         12044,
111 |         13700,
112 |     ]
113 |     rs = reorder(sequence)
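    # Illustrative sketch: Mask replaces a gamma-fraction of positions with the
    # padding id 0, keeping sequence length and order (actual output varies per call).
    mask = Mask(gamma=0.7)
    rs = mask(sequence)
    print(rs)  # e.g. [14052, 0, 0, 16243, 0, ...] with roughly 70% of entries masked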
114 |     crop = Crop(tao=0.2)
115 |     rs = crop(sequence)
116 |     # demo the Random wrapper, which picks one of Crop/Mask/Reorder per call
117 |     random_aug = Random(tao=0.2, gamma=0.7, beta=0.2)
118 |     for i in range(5):
119 |         rs = random_aug(sequence)
120 |         print(rs)
--------------------------------------------------------------------------------
/src/datasets.py:
--------------------------------------------------------------------------------
1 | #
2 | # Copyright (c) 2022 salesforce.com, inc.
3 | # All rights reserved.
4 | # SPDX-License-Identifier: BSD-3-Clause
5 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
6 | #
7 | import random
8 | import torch
9 | from torch.utils.data import Dataset
10 | 
11 | from data_augmentation import Crop, Mask, Reorder, Random
12 | from utils import neg_sample, nCr
13 | import copy
14 | 
15 | 
16 | class RecWithContrastiveLearningDataset(Dataset):
17 |     def __init__(self, args, user_seq, test_neg_items=None, data_type="train", similarity_model_type="offline"):
18 |         self.args = args
19 |         self.user_seq = user_seq
20 |         self.test_neg_items = test_neg_items
21 |         self.data_type = data_type
22 |         self.max_len = args.max_seq_length
23 |         # currently applies one transform; may be extended to multiple transforms
24 |         self.augmentations = {
25 |             "crop": Crop(tao=args.tao),
26 |             "mask": Mask(gamma=args.gamma),
27 |             "reorder": Reorder(beta=args.beta),
28 |             "random": Random(tao=args.tao, gamma=args.gamma, beta=args.beta),
29 |         }
30 |         if self.args.augment_type not in self.augmentations:
31 |             raise ValueError(f"augmentation type: '{self.args.augment_type}' is invalid")
32 |         print(f"Creating Contrastive Learning Dataset using '{self.args.augment_type}' data augmentation")
33 |         self.base_transform = self.augmentations[self.args.augment_type]
34 |         # number of augmented views for each sequence; currently two are supported
35 |         self.n_views = self.args.n_views
36 | 
37 |     def _one_pair_data_augmentation(self, input_ids):
38 |         """
39 |         provides two positive samples given one sequence
40 |         """
41 |         augmented_seqs = []
42 |         for i in range(2):
43 |             augmented_input_ids = self.base_transform(input_ids)
44 |             pad_len = self.max_len - len(augmented_input_ids)
45 |             augmented_input_ids = [0] * pad_len + augmented_input_ids
46 | 
47 |             augmented_input_ids = augmented_input_ids[-self.max_len :]
48 | 
49 |             assert len(augmented_input_ids) == self.max_len
50 | 
51 |             cur_tensors = torch.tensor(augmented_input_ids, dtype=torch.long)
52 |             augmented_seqs.append(cur_tensors)
53 |         return augmented_seqs
54 | 
55 |     def _process_sequence_label_signal(self, seq_label_signal):
56 |         seq_class_label = torch.tensor(seq_label_signal, dtype=torch.long)
57 |         return seq_class_label
58 | 
59 |     def _data_sample_rec_task(self, user_id, items, input_ids, target_pos, answer):
60 |         # make a deep copy to avoid modifying the original sequence
61 |         copied_input_ids = copy.deepcopy(input_ids)
62 |         target_neg = []
63 |         seq_set = set(items)
64 |         for _ in copied_input_ids:
65 |             target_neg.append(neg_sample(seq_set, self.args.item_size))
66 | 
67 |         pad_len = self.max_len - len(copied_input_ids)
68 |         copied_input_ids = [0] * pad_len + copied_input_ids
69 |         target_pos = [0] * pad_len + target_pos
70 |         target_neg = [0] * pad_len + target_neg
71 | 
72 |         copied_input_ids = copied_input_ids[-self.max_len :]
73 |         target_pos = target_pos[-self.max_len :]
74 |         target_neg = target_neg[-self.max_len :]
75 | 
76 |         assert len(copied_input_ids) == self.max_len
77 |         assert 
len(target_pos) == self.max_len 78 | assert len(target_neg) == self.max_len 79 | 80 | if self.test_neg_items is not None: 81 | test_samples = self.test_neg_items[index] 82 | 83 | cur_rec_tensors = ( 84 | torch.tensor(user_id, dtype=torch.long), # user_id for testing 85 | torch.tensor(copied_input_ids, dtype=torch.long), 86 | torch.tensor(target_pos, dtype=torch.long), 87 | torch.tensor(target_neg, dtype=torch.long), 88 | torch.tensor(answer, dtype=torch.long), 89 | torch.tensor(test_samples, dtype=torch.long), 90 | ) 91 | else: 92 | cur_rec_tensors = ( 93 | torch.tensor(user_id, dtype=torch.long), # user_id for testing 94 | torch.tensor(copied_input_ids, dtype=torch.long), 95 | torch.tensor(target_pos, dtype=torch.long), 96 | torch.tensor(target_neg, dtype=torch.long), 97 | torch.tensor(answer, dtype=torch.long), 98 | ) 99 | 100 | return cur_rec_tensors 101 | 102 | def _add_noise_interactions(self, items): 103 | copied_sequence = copy.deepcopy(items) 104 | insert_nums = max(int(self.args.noise_ratio * len(copied_sequence)), 0) 105 | if insert_nums == 0: 106 | return copied_sequence 107 | insert_idx = random.choices([i for i in range(len(copied_sequence))], k=insert_nums) 108 | inserted_sequence = [] 109 | for index, item in enumerate(copied_sequence): 110 | if index in insert_idx: 111 | item_id = random.randint(1, self.args.item_size - 2) 112 | while item_id in copied_sequence: 113 | item_id = random.randint(1, self.args.item_size - 2) 114 | inserted_sequence += [item_id] 115 | inserted_sequence += [item] 116 | return inserted_sequence 117 | 118 | def __getitem__(self, index): 119 | user_id = index 120 | items = self.user_seq[index] 121 | 122 | assert self.data_type in {"train", "valid", "test"} 123 | 124 | # [0, 1, 2, 3, 4, 5, 6] 125 | # train [0, 1, 2, 3] 126 | # target [1, 2, 3, 4] 127 | if self.data_type == "train": 128 | input_ids = items[:-3] 129 | target_pos = items[1:-2] 130 | seq_label_signal = items[-2] 131 | answer = [0] # no use 132 | elif self.data_type == "valid": 133 | input_ids = items[:-2] 134 | target_pos = items[1:-1] 135 | answer = [items[-2]] 136 | 137 | else: 138 | items_with_noise = self._add_noise_interactions(items) 139 | input_ids = items_with_noise[:-1] 140 | target_pos = items_with_noise[1:] 141 | answer = [items_with_noise[-1]] 142 | if self.data_type == "train": 143 | cur_rec_tensors = self._data_sample_rec_task(user_id, items, input_ids, target_pos, answer) 144 | cf_tensors_list = [] 145 | # if n_views == 2, then it's downgraded to pair-wise contrastive learning 146 | total_augmentaion_pairs = nCr(self.n_views, 2) 147 | for i in range(total_augmentaion_pairs): 148 | cf_tensors_list.append(self._one_pair_data_augmentation(input_ids)) 149 | 150 | # add supervision of sequences 151 | seq_class_label = self._process_sequence_label_signal(seq_label_signal) 152 | return (cur_rec_tensors, cf_tensors_list, seq_class_label) 153 | elif self.data_type == "valid": 154 | cur_rec_tensors = self._data_sample_rec_task(user_id, items, input_ids, target_pos, answer) 155 | return cur_rec_tensors 156 | else: 157 | cur_rec_tensors = self._data_sample_rec_task(user_id, items_with_noise, input_ids, target_pos, answer) 158 | return cur_rec_tensors 159 | 160 | def __len__(self): 161 | """ 162 | consider n_view of a single sequence as one sample 163 | """ 164 | return len(self.user_seq) 165 | 166 | 167 | class SASRecDataset(Dataset): 168 | def __init__(self, args, user_seq, test_neg_items=None, data_type="train"): 169 | self.args = args 170 | self.user_seq = user_seq 171 | 
self.test_neg_items = test_neg_items 172 | self.data_type = data_type 173 | self.max_len = args.max_seq_length 174 | 175 | def _data_sample_rec_task(self, user_id, items, input_ids, target_pos, answer): 176 | # make a deep copy to avoid original sequence be modified 177 | copied_input_ids = copy.deepcopy(input_ids) 178 | target_neg = [] 179 | seq_set = set(items) 180 | for _ in input_ids: 181 | target_neg.append(neg_sample(seq_set, self.args.item_size)) 182 | 183 | pad_len = self.max_len - len(input_ids) 184 | input_ids = [0] * pad_len + input_ids 185 | target_pos = [0] * pad_len + target_pos 186 | target_neg = [0] * pad_len + target_neg 187 | 188 | input_ids = input_ids[-self.max_len :] 189 | target_pos = target_pos[-self.max_len :] 190 | target_neg = target_neg[-self.max_len :] 191 | 192 | assert len(input_ids) == self.max_len 193 | assert len(target_pos) == self.max_len 194 | assert len(target_neg) == self.max_len 195 | 196 | if self.test_neg_items is not None: 197 | test_samples = self.test_neg_items[index] 198 | 199 | cur_rec_tensors = ( 200 | torch.tensor(user_id, dtype=torch.long), # user_id for testing 201 | torch.tensor(input_ids, dtype=torch.long), 202 | torch.tensor(target_pos, dtype=torch.long), 203 | torch.tensor(target_neg, dtype=torch.long), 204 | torch.tensor(answer, dtype=torch.long), 205 | torch.tensor(test_samples, dtype=torch.long), 206 | ) 207 | else: 208 | cur_rec_tensors = ( 209 | torch.tensor(user_id, dtype=torch.long), # user_id for testing 210 | torch.tensor(input_ids, dtype=torch.long), 211 | torch.tensor(target_pos, dtype=torch.long), 212 | torch.tensor(target_neg, dtype=torch.long), 213 | torch.tensor(answer, dtype=torch.long), 214 | ) 215 | 216 | return cur_rec_tensors 217 | 218 | def __getitem__(self, index): 219 | 220 | user_id = index 221 | items = self.user_seq[index] 222 | 223 | assert self.data_type in {"train", "valid", "test"} 224 | 225 | # [0, 1, 2, 3, 4, 5, 6] 226 | # train [0, 1, 2, 3] 227 | # target [1, 2, 3, 4] 228 | 229 | # valid [0, 1, 2, 3, 4] 230 | # answer [5] 231 | 232 | # test [0, 1, 2, 3, 4, 5] 233 | # answer [6] 234 | if self.data_type == "train": 235 | input_ids = items[:-3] 236 | target_pos = items[1:-2] 237 | answer = [0] # no use 238 | 239 | elif self.data_type == "valid": 240 | input_ids = items[:-2] 241 | target_pos = items[1:-1] 242 | answer = [items[-2]] 243 | 244 | else: 245 | input_ids = items[:-1] 246 | target_pos = items[1:] 247 | answer = [items[-1]] 248 | 249 | return self._data_sample_rec_task(user_id, items, input_ids, target_pos, answer) 250 | 251 | def __len__(self): 252 | return len(self.user_seq) 253 | 254 | 255 | if __name__ == "__main__": 256 | import argparse 257 | from utils import get_user_seqs, set_seed 258 | from torch.utils.data import DataLoader, RandomSampler 259 | from tqdm import tqdm 260 | 261 | parser = argparse.ArgumentParser() 262 | 263 | parser.add_argument("--data_dir", default="../data/", type=str) 264 | parser.add_argument("--output_dir", default="output/", type=str) 265 | parser.add_argument("--data_name", default="Beauty", type=str) 266 | parser.add_argument("--do_eval", action="store_true") 267 | parser.add_argument("--model_idx", default=1, type=int, help="model idenfier 10, 20, 30...") 268 | 269 | # data augmentation args 270 | parser.add_argument( 271 | "--base_augment_type", 272 | default="reorder", 273 | type=str, 274 | help="data augmentation types. 
Chosen from mask, crop, reorder, random.",
275 |     )
276 |     # ratio args consumed by the dataset's Crop/Mask/Reorder operators
277 |     parser.add_argument("--tao", type=float, default=0.2, help="crop ratio for crop operator")
278 |     parser.add_argument("--gamma", type=float, default=0.7, help="mask ratio for mask operator")
279 |     parser.add_argument("--beta", type=float, default=0.2, help="reorder ratio for reorder operator")
280 |     # model args
281 |     parser.add_argument("--model_name", default="ICLRec", type=str)
282 |     parser.add_argument("--hidden_size", type=int, default=64, help="hidden size of transformer model")
283 |     parser.add_argument("--num_hidden_layers", type=int, default=2, help="number of layers")
284 |     parser.add_argument("--num_attention_heads", default=2, type=int)
285 |     parser.add_argument("--hidden_act", default="gelu", type=str)  # gelu relu
286 |     parser.add_argument("--attention_probs_dropout_prob", type=float, default=0.5, help="attention dropout p")
287 |     parser.add_argument("--hidden_dropout_prob", type=float, default=0.5, help="hidden dropout p")
288 |     parser.add_argument("--initializer_range", type=float, default=0.02)
289 |     parser.add_argument("--max_seq_length", default=50, type=int)
290 | 
291 |     # train args
292 |     parser.add_argument("--lr", type=float, default=0.001, help="learning rate of adam")
293 |     parser.add_argument("--batch_size", type=int, default=2, help="number of batch_size")
294 |     parser.add_argument("--epochs", type=int, default=200, help="number of epochs")
295 |     parser.add_argument("--no_cuda", action="store_true")
296 |     parser.add_argument("--log_freq", type=int, default=1, help="per epoch print res")
297 |     parser.add_argument("--seed", default=42, type=int)
298 |     ## contrastive learning related
299 |     parser.add_argument("--temperature", default=1.0, type=float, help="softmax temperature (default: 1.0)")
300 |     parser.add_argument(
301 |         "--n_views", default=2, type=int, metavar="N", help="Number of augmented data for each sequence"
302 |     )
303 |     parser.add_argument("--cf_weight", type=float, default=0.2, help="weight of contrastive learning task")
304 |     parser.add_argument("--rec_weight", type=float, default=1.0, help="weight of recommendation task")
305 | 
306 |     # learning related
307 |     parser.add_argument("--weight_decay", type=float, default=0.0, help="weight_decay of adam")
308 |     parser.add_argument("--adam_beta1", type=float, default=0.9, help="adam first beta value")
309 |     parser.add_argument("--adam_beta2", type=float, default=0.999, help="adam second beta value")
310 |     parser.add_argument("--gpu_id", type=str, default="0", help="gpu_id")
311 | 
312 |     args = parser.parse_args()
313 |     args.augment_type = args.base_augment_type  # the dataset reads args.augment_type
314 |     set_seed(args.seed)
315 |     args.data_file = args.data_dir + args.data_name + ".txt"
316 |     user_seq, max_item, valid_rating_matrix, test_rating_matrix = get_user_seqs(args.data_file)
317 |     args.item_size = max_item + 2
318 |     train_dataset = RecWithContrastiveLearningDataset(args, user_seq, data_type="train")
319 |     train_sampler = RandomSampler(train_dataset)
320 |     train_dataloader = DataLoader(train_dataset, sampler=train_sampler, batch_size=1)
321 |     rec_cf_data_iter = tqdm(enumerate(train_dataloader), total=len(train_dataloader))
322 | 
323 |     # in train mode __getitem__ returns (rec tensors, list of augmented pairs, sequence label)
324 |     for i, (rec_batch, cf_tensors_list, seq_class_label) in rec_cf_data_iter:
325 |         for j in range(len(rec_batch)):
326 |             print("tensor ", j, rec_batch[j])
327 |         print("cf_tensors_list:", cf_tensors_list)
328 |         print("seq_class_label:", seq_class_label)
329 |         if i > 2:
330 |             break
331 | 
--------------------------------------------------------------------------------
/src/main.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Copyright (c) 2022 salesforce.com, inc.
4 | # All rights reserved. 
5 | # SPDX-License-Identifier: BSD-3-Clause 6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause 7 | # 8 | 9 | import os 10 | import numpy as np 11 | import random 12 | import torch 13 | import argparse 14 | 15 | from torch.utils.data import DataLoader, RandomSampler, SequentialSampler 16 | 17 | from datasets import RecWithContrastiveLearningDataset 18 | 19 | from trainers import ICLRecTrainer 20 | from models import SASRecModel, OfflineItemSimilarity, OnlineItemSimilarity 21 | from utils import EarlyStopping, get_user_seqs, get_item2attribute_json, check_path, set_seed 22 | 23 | 24 | def show_args_info(args): 25 | print(f"--------------------Configure Info:------------") 26 | for arg in vars(args): 27 | print(f"{arg:<30} : {getattr(args, arg):>35}") 28 | 29 | 30 | def main(): 31 | parser = argparse.ArgumentParser() 32 | # system args 33 | parser.add_argument("--data_dir", default="../data/", type=str) 34 | parser.add_argument("--output_dir", default="output/", type=str) 35 | parser.add_argument("--data_name", default="Sports_and_Outdoors", type=str) 36 | parser.add_argument("--do_eval", action="store_true") 37 | parser.add_argument("--model_idx", default=0, type=int, help="model idenfier 10, 20, 30...") 38 | parser.add_argument("--gpu_id", type=str, default="0", help="gpu_id") 39 | 40 | # data augmentation args 41 | parser.add_argument( 42 | "--noise_ratio", 43 | default=0.0, 44 | type=float, 45 | help="percentage of negative interactions in a sequence - robustness analysis", 46 | ) 47 | parser.add_argument( 48 | "--training_data_ratio", 49 | default=1.0, 50 | type=float, 51 | help="percentage of training samples used for training - robustness analysis", 52 | ) 53 | parser.add_argument( 54 | "--augment_type", 55 | default="random", 56 | type=str, 57 | help="default data augmentation types. Chosen from: \ 58 | mask, crop, reorder, substitute, insert, random, \ 59 | combinatorial_enumerate (for multi-view).", 60 | ) 61 | parser.add_argument("--tao", type=float, default=0.2, help="crop ratio for crop operator") 62 | parser.add_argument("--gamma", type=float, default=0.7, help="mask ratio for mask operator") 63 | parser.add_argument("--beta", type=float, default=0.2, help="reorder ratio for reorder operator") 64 | 65 | ## contrastive learning task args 66 | parser.add_argument( 67 | "--temperature", default=1.0, type=float, help="softmax temperature (default: 1.0) - not studied." 68 | ) 69 | parser.add_argument( 70 | "--n_views", default=2, type=int, metavar="N", help="Number of augmented data for each sequence - not studied." 71 | ) 72 | parser.add_argument( 73 | "--contrast_type", 74 | default="Hybrid", 75 | type=str, 76 | help="Ways to contrastive of. \ 77 | Support InstanceCL and ShortInterestCL, IntentCL, and Hybrid types.", 78 | ) 79 | parser.add_argument( 80 | "--num_intent_clusters", 81 | default="256", 82 | type=str, 83 | help="Number of cluster of intents. Activated only when using \ 84 | IntentCL or Hybrid types.", 85 | ) 86 | parser.add_argument( 87 | "--seq_representation_type", 88 | default="mean", 89 | type=str, 90 | help="operate of item representation overtime. Support types: \ 91 | mean, concatenate", 92 | ) 93 | parser.add_argument( 94 | "--seq_representation_instancecl_type", 95 | default="concatenate", 96 | type=str, 97 | help="operate of item representation overtime. 
Support types: \ 98 | mean, concatenate", 99 | ) 100 | parser.add_argument("--warm_up_epoches", type=float, default=0, help="number of epochs to start IntentCL.") 101 | parser.add_argument("--de_noise", action="store_true", help="whether to de-false negative pairs during learning.") 102 | 103 | # model args 104 | parser.add_argument("--model_name", default="ICLRec", type=str) 105 | parser.add_argument("--hidden_size", type=int, default=64, help="hidden size of transformer model") 106 | parser.add_argument("--num_hidden_layers", type=int, default=2, help="number of layers") 107 | parser.add_argument("--num_attention_heads", default=2, type=int) 108 | parser.add_argument("--hidden_act", default="gelu", type=str) # gelu relu 109 | parser.add_argument("--attention_probs_dropout_prob", type=float, default=0.5, help="attention dropout p") 110 | parser.add_argument("--hidden_dropout_prob", type=float, default=0.5, help="hidden dropout p") 111 | parser.add_argument("--initializer_range", type=float, default=0.02) 112 | parser.add_argument("--max_seq_length", default=50, type=int) 113 | 114 | # train args 115 | parser.add_argument("--lr", type=float, default=0.001, help="learning rate of adam") 116 | parser.add_argument("--batch_size", type=int, default=256, help="number of batch_size") 117 | parser.add_argument("--epochs", type=int, default=300, help="number of epochs") 118 | parser.add_argument("--no_cuda", action="store_true") 119 | parser.add_argument("--log_freq", type=int, default=1, help="per epoch print res") 120 | parser.add_argument("--seed", default=1, type=int) 121 | parser.add_argument("--cf_weight", type=float, default=0.1, help="weight of contrastive learning task") 122 | parser.add_argument("--rec_weight", type=float, default=1.0, help="weight of contrastive learning task") 123 | parser.add_argument("--intent_cf_weight", type=float, default=0.3, help="weight of contrastive learning task") 124 | 125 | # learning related 126 | parser.add_argument("--weight_decay", type=float, default=0.0, help="weight_decay of adam") 127 | parser.add_argument("--adam_beta1", type=float, default=0.9, help="adam first beta value") 128 | parser.add_argument("--adam_beta2", type=float, default=0.999, help="adam second beta value") 129 | 130 | args = parser.parse_args() 131 | 132 | set_seed(args.seed) 133 | check_path(args.output_dir) 134 | 135 | os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu_id 136 | args.cuda_condition = torch.cuda.is_available() and not args.no_cuda 137 | print("Using Cuda:", torch.cuda.is_available()) 138 | args.data_file = args.data_dir + args.data_name + ".txt" 139 | 140 | user_seq, max_item, valid_rating_matrix, test_rating_matrix = get_user_seqs(args.data_file) 141 | 142 | args.item_size = max_item + 2 143 | args.mask_id = max_item + 1 144 | 145 | # save model args 146 | args_str = f"{args.model_name}-{args.data_name}-{args.model_idx}" 147 | args.log_file = os.path.join(args.output_dir, args_str + ".txt") 148 | 149 | show_args_info(args) 150 | 151 | with open(args.log_file, "a") as f: 152 | f.write(str(args) + "\n") 153 | 154 | # set item score in train set to `0` in validation 155 | args.train_matrix = valid_rating_matrix 156 | 157 | # save model 158 | checkpoint = args_str + ".pt" 159 | args.checkpoint_path = os.path.join(args.output_dir, checkpoint) 160 | 161 | # training data for node classification 162 | cluster_dataset = RecWithContrastiveLearningDataset( 163 | args, user_seq[: int(len(user_seq) * args.training_data_ratio)], data_type="train" 164 | ) 165 | cluster_sampler = 
SequentialSampler(cluster_dataset) 166 | cluster_dataloader = DataLoader(cluster_dataset, sampler=cluster_sampler, batch_size=args.batch_size) 167 | 168 | train_dataset = RecWithContrastiveLearningDataset( 169 | args, user_seq[: int(len(user_seq) * args.training_data_ratio)], data_type="train" 170 | ) 171 | train_sampler = RandomSampler(train_dataset) 172 | train_dataloader = DataLoader(train_dataset, sampler=train_sampler, batch_size=args.batch_size) 173 | 174 | eval_dataset = RecWithContrastiveLearningDataset(args, user_seq, data_type="valid") 175 | eval_sampler = SequentialSampler(eval_dataset) 176 | eval_dataloader = DataLoader(eval_dataset, sampler=eval_sampler, batch_size=args.batch_size) 177 | 178 | test_dataset = RecWithContrastiveLearningDataset(args, user_seq, data_type="test") 179 | test_sampler = SequentialSampler(test_dataset) 180 | test_dataloader = DataLoader(test_dataset, sampler=test_sampler, batch_size=args.batch_size) 181 | 182 | model = SASRecModel(args=args) 183 | 184 | trainer = ICLRecTrainer(model, train_dataloader, cluster_dataloader, eval_dataloader, test_dataloader, args) 185 | 186 | if args.do_eval: 187 | trainer.args.train_matrix = test_rating_matrix 188 | trainer.load(args.checkpoint_path) 189 | print(f"Load model from {args.checkpoint_path} for test!") 190 | scores, result_info = trainer.test(0, full_sort=True) 191 | 192 | else: 193 | print(f"Train ICLRec") 194 | early_stopping = EarlyStopping(args.checkpoint_path, patience=40, verbose=True) 195 | for epoch in range(args.epochs): 196 | trainer.train(epoch) 197 | # evaluate on NDCG@20 198 | scores, _ = trainer.valid(epoch, full_sort=True) 199 | early_stopping(np.array(scores[-1:]), trainer.model) 200 | if early_stopping.early_stop: 201 | print("Early stopping") 202 | break 203 | trainer.args.train_matrix = test_rating_matrix 204 | print("---------------Change to test_rating_matrix!-------------------") 205 | # load the best model 206 | trainer.model.load_state_dict(torch.load(args.checkpoint_path)) 207 | scores, result_info = trainer.test(0, full_sort=True) 208 | 209 | print(args_str) 210 | print(result_info) 211 | with open(args.log_file, "a") as f: 212 | f.write(args_str + "\n") 213 | f.write(result_info + "\n") 214 | 215 | 216 | main() 217 | -------------------------------------------------------------------------------- /src/models.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Copyright (c) 2022 salesforce.com, inc. 4 | # All rights reserved. 
5 | # SPDX-License-Identifier: BSD-3-Clause 6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause 7 | # 8 | import math 9 | import os 10 | import pickle 11 | from tqdm import tqdm 12 | import random 13 | import copy 14 | 15 | import torch 16 | import torch.nn as nn 17 | import gensim 18 | import faiss 19 | 20 | # from kmeans_pytorch import kmeans 21 | import time 22 | 23 | from modules import Encoder, LayerNorm 24 | 25 | 26 | class KMeans(object): 27 | def __init__(self, num_cluster, seed, hidden_size, gpu_id=0, device="cpu"): 28 | """ 29 | Args: 30 | k: number of clusters 31 | """ 32 | self.seed = seed 33 | self.num_cluster = num_cluster 34 | self.max_points_per_centroid = 4096 35 | self.min_points_per_centroid = 0 36 | self.gpu_id = 0 37 | self.device = device 38 | self.first_batch = True 39 | self.hidden_size = hidden_size 40 | self.clus, self.index = self.__init_cluster(self.hidden_size) 41 | self.centroids = [] 42 | 43 | def __init_cluster( 44 | self, hidden_size, verbose=False, niter=20, nredo=5, max_points_per_centroid=4096, min_points_per_centroid=0 45 | ): 46 | print(" cluster train iterations:", niter) 47 | clus = faiss.Clustering(hidden_size, self.num_cluster) 48 | clus.verbose = verbose 49 | clus.niter = niter 50 | clus.nredo = nredo 51 | clus.seed = self.seed 52 | clus.max_points_per_centroid = max_points_per_centroid 53 | clus.min_points_per_centroid = min_points_per_centroid 54 | 55 | res = faiss.StandardGpuResources() 56 | res.noTempMemory() 57 | cfg = faiss.GpuIndexFlatConfig() 58 | cfg.useFloat16 = False 59 | cfg.device = self.gpu_id 60 | index = faiss.GpuIndexFlatL2(res, hidden_size, cfg) 61 | return clus, index 62 | 63 | def train(self, x): 64 | # train to get centroids 65 | if x.shape[0] > self.num_cluster: 66 | self.clus.train(x, self.index) 67 | # get cluster centroids 68 | centroids = faiss.vector_to_array(self.clus.centroids).reshape(self.num_cluster, self.hidden_size) 69 | # convert to cuda Tensors for broadcast 70 | centroids = torch.Tensor(centroids).to(self.device) 71 | self.centroids = nn.functional.normalize(centroids, p=2, dim=1) 72 | 73 | def query(self, x): 74 | # self.index.add(x) 75 | D, I = self.index.search(x, 1) # for each sample, find cluster distance and assignments 76 | seq2cluster = [int(n[0]) for n in I] 77 | # print("cluster number:", self.num_cluster,"cluster in batch:", len(set(seq2cluster))) 78 | seq2cluster = torch.LongTensor(seq2cluster).to(self.device) 79 | return seq2cluster, self.centroids[seq2cluster] 80 | 81 | 82 | class KMeans_Pytorch(object): 83 | def __init__(self, num_cluster, seed, hidden_size, gpu_id=0, device="cpu"): 84 | """ 85 | Args: 86 | k: number of clusters 87 | """ 88 | self.seed = seed 89 | self.num_cluster = num_cluster 90 | self.max_points_per_centroid = 4096 91 | self.min_points_per_centroid = 10 92 | self.first_batch = True 93 | self.hidden_size = hidden_size 94 | self.gpu_id = gpu_id 95 | self.device = device 96 | print(self.device, "-----") 97 | 98 | def run_kmeans(self, x, Niter=20, tqdm_flag=False): 99 | if x.shape[0] >= self.num_cluster: 100 | seq2cluster, centroids = kmeans( 101 | X=x, num_clusters=self.num_cluster, distance="euclidean", device=self.device, tqdm_flag=False 102 | ) 103 | seq2cluster = seq2cluster.to(self.device) 104 | centroids = centroids.to(self.device) 105 | # last batch where 106 | else: 107 | seq2cluster, centroids = kmeans( 108 | X=x, num_clusters=x.shape[0] - 1, distance="euclidean", device=self.device, tqdm_flag=False 109 | ) 
110 | seq2cluster = seq2cluster.to(self.device) 111 | centroids = centroids.to(self.device) 112 | return seq2cluster, centroids 113 | 114 | 115 | class SASRecModel(nn.Module): 116 | def __init__(self, args): 117 | super(SASRecModel, self).__init__() 118 | self.item_embeddings = nn.Embedding(args.item_size, args.hidden_size, padding_idx=0) 119 | self.position_embeddings = nn.Embedding(args.max_seq_length, args.hidden_size) 120 | self.item_encoder = Encoder(args) 121 | self.LayerNorm = LayerNorm(args.hidden_size, eps=1e-12) 122 | self.dropout = nn.Dropout(args.hidden_dropout_prob) 123 | self.args = args 124 | 125 | self.criterion = nn.BCELoss(reduction="none") 126 | self.apply(self.init_weights) 127 | 128 | # Positional Embedding 129 | def add_position_embedding(self, sequence): 130 | 131 | seq_length = sequence.size(1) 132 | position_ids = torch.arange(seq_length, dtype=torch.long, device=sequence.device) 133 | position_ids = position_ids.unsqueeze(0).expand_as(sequence) 134 | item_embeddings = self.item_embeddings(sequence) 135 | position_embeddings = self.position_embeddings(position_ids) 136 | sequence_emb = item_embeddings + position_embeddings 137 | sequence_emb = self.LayerNorm(sequence_emb) 138 | sequence_emb = self.dropout(sequence_emb) 139 | 140 | return sequence_emb 141 | 142 | # model same as SASRec 143 | def forward(self, input_ids): 144 | 145 | attention_mask = (input_ids > 0).long() 146 | extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2) # torch.int64 147 | max_len = attention_mask.size(-1) 148 | attn_shape = (1, max_len, max_len) 149 | subsequent_mask = torch.triu(torch.ones(attn_shape), diagonal=1) # torch.uint8 150 | subsequent_mask = (subsequent_mask == 0).unsqueeze(1) 151 | subsequent_mask = subsequent_mask.long() 152 | 153 | if self.args.cuda_condition: 154 | subsequent_mask = subsequent_mask.cuda() 155 | 156 | extended_attention_mask = extended_attention_mask * subsequent_mask 157 | extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility 158 | extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0 159 | 160 | sequence_emb = self.add_position_embedding(input_ids) 161 | 162 | item_encoded_layers = self.item_encoder(sequence_emb, extended_attention_mask, output_all_encoded_layers=True) 163 | 164 | sequence_output = item_encoded_layers[-1] 165 | return sequence_output 166 | 167 | def init_weights(self, module): 168 | """ Initialize the weights. 
169 | """ 170 | if isinstance(module, (nn.Linear, nn.Embedding)): 171 | # Slightly different from the TF version which uses truncated_normal for initialization 172 | # cf https://github.com/pytorch/pytorch/pull/5617 173 | module.weight.data.normal_(mean=0.0, std=self.args.initializer_range) 174 | elif isinstance(module, LayerNorm): 175 | module.bias.data.zero_() 176 | module.weight.data.fill_(1.0) 177 | if isinstance(module, nn.Linear) and module.bias is not None: 178 | module.bias.data.zero_() 179 | 180 | 181 | class OnlineItemSimilarity: 182 | def __init__(self, item_size): 183 | self.item_size = item_size 184 | self.item_embeddings = None 185 | self.cuda_condition = torch.cuda.is_available() 186 | self.device = torch.device("cuda" if self.cuda_condition else "cpu") 187 | self.total_item_list = torch.tensor([i for i in range(self.item_size)], dtype=torch.long).to(self.device) 188 | self.max_score, self.min_score = self.get_maximum_minimum_sim_scores() 189 | 190 | def update_embedding_matrix(self, item_embeddings): 191 | self.item_embeddings = copy.deepcopy(item_embeddings) 192 | self.base_embedding_matrix = self.item_embeddings(self.total_item_list) 193 | 194 | def get_maximum_minimum_sim_scores(self): 195 | max_score, min_score = -1, 100 196 | for item_idx in range(1, self.item_size): 197 | try: 198 | item_vector = self.item_embeddings(item_idx).view(-1, 1) 199 | item_similarity = torch.mm(self.base_embedding_matrix, item_vector).view(-1) 200 | max_score = max(torch.max(item_similarity), max_score) 201 | min_score = min(torch.min(item_similarity), min_score) 202 | except: 203 | continue 204 | return max_score, min_score 205 | 206 | def most_similar(self, item_idx, top_k=1, with_score=False): 207 | item_idx = torch.tensor(item_idx, dtype=torch.long).to(self.device) 208 | item_vector = self.item_embeddings(item_idx).view(-1, 1) 209 | item_similarity = torch.mm(self.base_embedding_matrix, item_vector).view(-1) 210 | item_similarity = (self.max_score - item_similarity) / (self.max_score - self.min_score) 211 | # remove item idx itself 212 | values, indices = item_similarity.topk(top_k + 1) 213 | if with_score: 214 | item_list = indices.tolist() 215 | score_list = values.tolist() 216 | if item_idx in item_list: 217 | idd = item_list.index(item_idx) 218 | item_list.remove(item_idx) 219 | score_list.pop(idd) 220 | return list(zip(item_list, score_list)) 221 | item_list = indices.tolist() 222 | if item_idx in item_list: 223 | item_list.remove(item_idx) 224 | return item_list 225 | 226 | 227 | class OfflineItemSimilarity: 228 | def __init__(self, data_file=None, similarity_path=None, model_name="ItemCF", dataset_name="Sports_and_Outdoors"): 229 | self.dataset_name = dataset_name 230 | self.similarity_path = similarity_path 231 | # train_data_list used for item2vec, train_data_dict used for itemCF and itemCF-IUF 232 | self.train_data_list, self.train_item_list, self.train_data_dict = self._load_train_data(data_file) 233 | self.model_name = model_name 234 | self.similarity_model = self.load_similarity_model(self.similarity_path) 235 | self.max_score, self.min_score = self.get_maximum_minimum_sim_scores() 236 | 237 | def get_maximum_minimum_sim_scores(self): 238 | max_score, min_score = -1, 100 239 | for item in self.similarity_model.keys(): 240 | for neig in self.similarity_model[item]: 241 | sim_score = self.similarity_model[item][neig] 242 | max_score = max(max_score, sim_score) 243 | min_score = min(min_score, sim_score) 244 | return max_score, min_score 245 | 246 | def 
_convert_data_to_dict(self, data): 247 | """ 248 | split the data set 249 | testdata is a test data set 250 | traindata is a train set 251 | """ 252 | train_data_dict = {} 253 | for user, item, record in data: 254 | train_data_dict.setdefault(user, {}) 255 | train_data_dict[user][item] = record 256 | return train_data_dict 257 | 258 | def _save_dict(self, dict_data, save_path="./similarity.pkl"): 259 | print("saving data to ", save_path) 260 | with open(save_path, "wb") as write_file: 261 | pickle.dump(dict_data, write_file) 262 | 263 | def _load_train_data(self, data_file=None): 264 | """ 265 | read the data from the data file which is a data set 266 | """ 267 | train_data = [] 268 | train_data_list = [] 269 | train_data_set_list = [] 270 | for line in open(data_file).readlines(): 271 | userid, items = line.strip().split(" ", 1) 272 | # only use training data 273 | items = items.split(" ")[:-3] 274 | train_data_list.append(items) 275 | train_data_set_list += items 276 | for itemid in items: 277 | train_data.append((userid, itemid, int(1))) 278 | return train_data_list, set(train_data_set_list), self._convert_data_to_dict(train_data) 279 | 280 | def _generate_item_similarity(self, train=None, save_path="./"): 281 | """ 282 | calculate co-rated users between items 283 | """ 284 | print("getting item similarity...") 285 | train = train or self.train_data_dict 286 | C = dict() 287 | N = dict() 288 | 289 | if self.model_name in ["ItemCF", "ItemCF_IUF"]: 290 | print("Step 1: Compute Statistics") 291 | data_iter = tqdm(enumerate(train.items()), total=len(train.items())) 292 | for idx, (u, items) in data_iter: 293 | if self.model_name == "ItemCF": 294 | for i in items.keys(): 295 | N.setdefault(i, 0) 296 | N[i] += 1 297 | for j in items.keys(): 298 | if i == j: 299 | continue 300 | C.setdefault(i, {}) 301 | C[i].setdefault(j, 0) 302 | C[i][j] += 1 303 | elif self.model_name == "ItemCF_IUF": 304 | for i in items.keys(): 305 | N.setdefault(i, 0) 306 | N[i] += 1 307 | for j in items.keys(): 308 | if i == j: 309 | continue 310 | C.setdefault(i, {}) 311 | C[i].setdefault(j, 0) 312 | C[i][j] += 1 / math.log(1 + len(items) * 1.0) 313 | self.itemSimBest = dict() 314 | print("Step 2: Compute co-rate matrix") 315 | c_iter = tqdm(enumerate(C.items()), total=len(C.items())) 316 | for idx, (cur_item, related_items) in c_iter: 317 | self.itemSimBest.setdefault(cur_item, {}) 318 | for related_item, score in related_items.items(): 319 | self.itemSimBest[cur_item].setdefault(related_item, 0) 320 | self.itemSimBest[cur_item][related_item] = score / math.sqrt(N[cur_item] * N[related_item]) 321 | self._save_dict(self.itemSimBest, save_path=save_path) 322 | elif self.model_name == "Item2Vec": 323 | # details here: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py 324 | print("Step 1: train item2vec model") 325 | item2vec_model = gensim.models.Word2Vec( 326 | sentences=self.train_data_list, vector_size=20, window=5, min_count=0, epochs=100 327 | ) 328 | self.itemSimBest = dict() 329 | total_item_nums = len(item2vec_model.wv.index_to_key) 330 | print("Step 2: convert to item similarity dict") 331 | total_items = tqdm(item2vec_model.wv.index_to_key, total=total_item_nums) 332 | for cur_item in total_items: 333 | related_items = item2vec_model.wv.most_similar(positive=[cur_item], topn=20) 334 | self.itemSimBest.setdefault(cur_item, {}) 335 | for (related_item, score) in related_items: 336 | self.itemSimBest[cur_item].setdefault(related_item, 0) 337 | 
self.itemSimBest[cur_item][related_item] = score 338 | print("Item2Vec model saved to: ", save_path) 339 | self._save_dict(self.itemSimBest, save_path=save_path) 340 | elif self.model_name == "LightGCN": 341 | # train a item embedding from lightGCN model, and then convert to sim dict 342 | print("generating similarity model..") 343 | itemSimBest = light_gcn.generate_similarity_from_light_gcn(self.dataset_name) 344 | print("LightGCN based model saved to: ", save_path) 345 | self._save_dict(itemSimBest, save_path=save_path) 346 | 347 | def load_similarity_model(self, similarity_model_path): 348 | if not similarity_model_path: 349 | raise ValueError("invalid path") 350 | elif not os.path.exists(similarity_model_path): 351 | print("the similirity dict not exist, generating...") 352 | self._generate_item_similarity(save_path=self.similarity_path) 353 | if self.model_name in ["ItemCF", "ItemCF_IUF", "Item2Vec", "LightGCN"]: 354 | with open(similarity_model_path, "rb") as read_file: 355 | similarity_dict = pickle.load(read_file) 356 | return similarity_dict 357 | elif self.model_name == "Random": 358 | similarity_dict = self.train_item_list 359 | return similarity_dict 360 | 361 | def most_similar(self, item, top_k=1, with_score=False): 362 | if self.model_name in ["ItemCF", "ItemCF_IUF", "Item2Vec", "LightGCN"]: 363 | """TODO: handle case that item not in keys""" 364 | if str(item) in self.similarity_model: 365 | top_k_items_with_score = sorted( 366 | self.similarity_model[str(item)].items(), key=lambda x: x[1], reverse=True 367 | )[0:top_k] 368 | if with_score: 369 | return list( 370 | map( 371 | lambda x: (int(x[0]), (self.max_score - float(x[1])) / (self.max_score - self.min_score)), 372 | top_k_items_with_score, 373 | ) 374 | ) 375 | return list(map(lambda x: int(x[0]), top_k_items_with_score)) 376 | elif int(item) in self.similarity_model: 377 | top_k_items_with_score = sorted( 378 | self.similarity_model[int(item)].items(), key=lambda x: x[1], reverse=True 379 | )[0:top_k] 380 | if with_score: 381 | return list( 382 | map( 383 | lambda x: (int(x[0]), (self.max_score - float(x[1])) / (self.max_score - self.min_score)), 384 | top_k_items_with_score, 385 | ) 386 | ) 387 | return list(map(lambda x: int(x[0]), top_k_items_with_score)) 388 | else: 389 | item_list = list(self.similarity_model.keys()) 390 | random_items = random.sample(item_list, k=top_k) 391 | return list(map(lambda x: int(x), random_items)) 392 | elif self.model_name == "Random": 393 | random_items = random.sample(self.similarity_model, k=top_k) 394 | return list(map(lambda x: int(x), random_items)) 395 | 396 | 397 | if __name__ == "__main__": 398 | onlineitemsim = OnlineItemSimilarity(item_size=10) 399 | item_embeddings = nn.Embedding(10, 6, padding_idx=0) 400 | onlineitemsim.update_embedding_matrix(item_embeddings) 401 | item_idx = torch.tensor(2, dtype=torch.long) 402 | similiar_items = onlineitemsim.most_similar(item_idx=item_idx, top_k=1) 403 | print(similiar_items) 404 | -------------------------------------------------------------------------------- /src/modules.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | # 3 | # Copyright (c) 2022 salesforce.com, inc. 4 | # All rights reserved. 
5 | # SPDX-License-Identifier: BSD-3-Clause
6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
7 | #
8 | 
9 | import numpy as np
10 | 
11 | import copy
12 | import math
13 | import torch
14 | import torch.nn as nn
15 | import torch.nn.functional as F
16 | 
17 | 
18 | class PCLoss(nn.Module):
19 |     """ Reference: https://github.com/salesforce/PCL/blob/018a929c53fcb93fd07041b1725185e1237d2c0e/pcl/builder.py#L168
20 |     """
21 | 
22 |     def __init__(self, temperature, device, contrast_mode="all"):
23 |         super(PCLoss, self).__init__()
24 |         self.contrast_mode = contrast_mode
25 |         self.criterion = NCELoss(temperature, device)
26 | 
27 |     def forward(self, batch_sample_one, batch_sample_two, intents, intent_ids):
28 |         """
29 |         batch_sample_one / batch_sample_two: two augmented views, batch_size x hidden_dims
30 |         intents: num_clusters x batch_size x hidden_dims
31 |         """
32 |         # instance contrast with prototypes
33 |         mean_pcl_loss = 0
34 |         # de-noise: use intent ids to mask out false negatives
35 |         if intent_ids is not None:
36 |             for intent, intent_id in zip(intents, intent_ids):
37 |                 pos_one_compare_loss = self.criterion(batch_sample_one, intent, intent_id)
38 |                 pos_two_compare_loss = self.criterion(batch_sample_two, intent, intent_id)
39 |                 mean_pcl_loss += pos_one_compare_loss
40 |                 mean_pcl_loss += pos_two_compare_loss
41 |             mean_pcl_loss /= 2 * len(intents)
42 |         # no de-noising
43 |         else:
44 |             for intent in intents:
45 |                 pos_one_compare_loss = self.criterion(batch_sample_one, intent, intent_ids=None)
46 |                 pos_two_compare_loss = self.criterion(batch_sample_two, intent, intent_ids=None)
47 |                 mean_pcl_loss += pos_one_compare_loss
48 |                 mean_pcl_loss += pos_two_compare_loss
49 |             mean_pcl_loss /= 2 * len(intents)
50 |         return mean_pcl_loss
51 | 
52 | 
53 | class SupConLoss(nn.Module):
54 |     """Supervised Contrastive Learning: https://arxiv.org/pdf/2004.11362.pdf.
55 |     It also supports the unsupervised contrastive loss in SimCLR"""
56 | 
57 |     def __init__(self, temperature, device, contrast_mode="all"):
58 |         super(SupConLoss, self).__init__()
59 |         self.device = device
60 |         self.temperature = temperature
61 |         self.contrast_mode = contrast_mode
62 |         self.total_calls = 0
63 |         self.call_with_repeat_seq = 0
64 | 
65 |     def forward(self, features, intents=None, mask=None):
66 |         """Compute loss for model. If both `intents` and `mask` are None,
67 |         it degenerates to SimCLR unsupervised loss:
68 |         https://arxiv.org/pdf/2002.05709.pdf
69 |         Args:
70 |             features: hidden vector of shape [bsz, n_views, ...].
71 |             intents: ground truth of shape [bsz].
72 |             mask: contrastive mask of shape [bsz, bsz], mask_{i,j}=1 if sample j
73 |                 has the same class as sample i. Can be asymmetric.
74 |         Returns:
75 |             A loss scalar.
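        Example (illustrative note, added for clarity; not in the original
            docstring): with features of shape [256, 2, 64] (two augmented
            views per sequence) and intents of shape [256], mask[i][j] = 1
            whenever sequences i and j were assigned to the same intent
            cluster, so they are treated as positives of each other.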
76 |         """
77 | 
78 |         # track how often a batch contains repeated intents
79 |         if intents is not None:
80 |             unique_intents = torch.unique(intents)
81 |             if unique_intents.shape[0] != intents.shape[0]:
82 |                 self.call_with_repeat_seq += 1
83 |             self.total_calls += 1
84 |         if len(features.shape) < 3:
85 |             raise ValueError("`features` needs to be [bsz, n_views, ...], " "at least 3 dimensions are required")
86 |         if len(features.shape) > 3:
87 |             features = features.view(features.shape[0], features.shape[1], -1)
88 | 
89 |         # normalize features
90 |         features = F.normalize(features, dim=2)
91 | 
92 |         batch_size = features.shape[0]
93 |         if intents is not None and mask is not None:
94 |             raise ValueError("Cannot define both `intents` and `mask`")
95 |         elif intents is None and mask is None:
96 |             mask = torch.eye(batch_size, dtype=torch.float32).to(self.device)
97 |         elif intents is not None:
98 |             intents = intents.contiguous().view(-1, 1)
99 |             if intents.shape[0] != batch_size:
100 |                 raise ValueError("Num of intents does not match num of features")
101 |             mask = torch.eq(intents, intents.T).float().to(self.device)
102 |         else:
103 |             mask = mask.float().to(self.device)
104 | 
105 |         contrast_count = features.shape[1]
106 |         contrast_feature = torch.cat(torch.unbind(features, dim=1), dim=0)
107 |         if self.contrast_mode == "one":
108 |             anchor_feature = features[:, 0]
109 |             anchor_count = 1
110 |         elif self.contrast_mode == "all":
111 |             anchor_feature = contrast_feature
112 |             anchor_count = contrast_count
113 |         else:
114 |             raise ValueError("Unknown mode: {}".format(self.contrast_mode))
115 | 
116 |         # compute logits
117 |         anchor_dot_contrast = torch.div(torch.matmul(anchor_feature, contrast_feature.T), self.temperature)
118 |         # for numerical stability
119 |         logits_max, _ = torch.max(anchor_dot_contrast, dim=1, keepdim=True)
120 |         logits = anchor_dot_contrast - logits_max.detach()
121 | 
122 |         # tile mask
123 |         mask = mask.repeat(anchor_count, contrast_count)
124 |         # mask-out self-contrast cases
125 |         logits_mask = torch.scatter(
126 |             torch.ones_like(mask), 1, torch.arange(batch_size * anchor_count).view(-1, 1).to(self.device), 0
127 |         )
128 |         mask = mask * logits_mask
129 | 
130 |         # compute log_prob
131 |         exp_logits = torch.exp(logits) * logits_mask
132 |         log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))
133 | 
134 |         # compute mean of log-likelihood over positive
135 |         mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)
136 | 
137 |         # loss
138 |         # loss = - (self.temperature / self.base_temperature) * mean_log_prob_pos
139 |         loss = -mean_log_prob_pos
140 |         loss = loss.view(anchor_count, batch_size).mean()
141 | 
142 |         return loss
143 | 
144 | 
145 | class NCELoss(nn.Module):
146 |     """
147 |     Eq. (12): L_{NCE}
148 |     """
149 | 
150 |     def __init__(self, temperature, device):
151 |         super(NCELoss, self).__init__()
152 |         self.device = device
153 |         self.criterion = nn.CrossEntropyLoss().to(self.device)
154 |         self.temperature = temperature
155 |         self.cossim = nn.CosineSimilarity(dim=-1).to(self.device)
156 | 
157 |     # modified based on impl: https://github.com/ae-foster/pytorch-simclr/blob/dc9ac57a35aec5c7d7d5fe6dc070a975f493c1a5/critic.py#L5
158 |     def forward(self, batch_sample_one, batch_sample_two, intent_ids=None):
159 |         # sim11 = self.cossim(batch_sample_one.unsqueeze(-2), batch_sample_one.unsqueeze(-3)) / self.temperature
160 |         # sim22 = self.cossim(batch_sample_two.unsqueeze(-2), batch_sample_two.unsqueeze(-3)) / self.temperature
161 |         # sim12 = self.cossim(batch_sample_one.unsqueeze(-2), batch_sample_two.unsqueeze(-3)) / self.temperature
162 |         sim11 = torch.matmul(batch_sample_one, batch_sample_one.T) / self.temperature
163 |         sim22 = torch.matmul(batch_sample_two, batch_sample_two.T) / self.temperature
164 |         sim12 = torch.matmul(batch_sample_one, batch_sample_two.T) / self.temperature
165 |         d = sim12.shape[-1]
166 |         # avoid contrast against positive intents
167 |         if intent_ids is not None:
168 |             intent_ids = intent_ids.contiguous().view(-1, 1)
169 |             mask_11_22 = torch.eq(intent_ids, intent_ids.T).long().to(self.device)
170 |             sim11[mask_11_22 == 1] = float("-inf")
171 |             sim22[mask_11_22 == 1] = float("-inf")
172 |             eye_matrix = torch.eye(d, dtype=torch.long).to(self.device)
173 |             mask_11_22[eye_matrix == 1] = 0
174 |             sim12[mask_11_22 == 1] = float("-inf")
175 |         else:
176 |             mask = torch.eye(d, dtype=torch.long).to(self.device)
177 |             sim11[mask == 1] = float("-inf")
178 |             sim22[mask == 1] = float("-inf")
179 |             # sim22 = sim22.masked_fill_(mask, -np.inf)
180 |             # sim11[..., range(d), range(d)] = float('-inf')
181 |             # sim22[..., range(d), range(d)] = float('-inf')
182 | 
183 |         raw_scores1 = torch.cat([sim12, sim11], dim=-1)
184 |         raw_scores2 = torch.cat([sim22, sim12.transpose(-1, -2)], dim=-1)
185 |         logits = torch.cat([raw_scores1, raw_scores2], dim=-2)
186 |         labels = torch.arange(2 * d, dtype=torch.long, device=logits.device)
187 |         nce_loss = self.criterion(logits, labels)
188 |         return nce_loss
189 | 
190 | 
191 | class NTXent(nn.Module):
192 |     """
193 |     Contrastive loss with distributed data parallel support
194 |     code: https://github.com/AndrewAtanov/simclr-pytorch/blob/master/models/losses.py
195 |     """
196 | 
197 |     LARGE_NUMBER = 1e9
198 | 
199 |     def __init__(self, tau=1.0, gpu=None, multiplier=2, distributed=False):
200 |         super().__init__()
201 |         self.tau = tau
202 |         self.multiplier = multiplier
203 |         self.distributed = distributed
204 |         self.norm = 1.0
205 | 
206 |     def forward(self, batch_sample_one, batch_sample_two):
207 |         z = torch.cat([batch_sample_one, batch_sample_two], dim=0)
208 |         n = z.shape[0]
209 |         assert n % self.multiplier == 0
210 | 
211 |         z = F.normalize(z, p=2, dim=1) / np.sqrt(self.tau)
212 |         logits = z @ z.t()
213 |         logits[np.arange(n), np.arange(n)] = -self.LARGE_NUMBER
214 | 
215 |         logprob = F.log_softmax(logits, dim=1)
216 | 
217 |         # choose all positive objects for an example, for i it would be (i + k * n/m), where k=0...(m-1)
218 |         m = self.multiplier
219 |         labels = (np.repeat(np.arange(n), m) + np.tile(np.arange(m) * n // m, n)) % n
220 |         # remove labels pointing to itself, i.e. (i, i)
221 |         labels = labels.reshape(n, m)[:, 1:].reshape(-1)
222 | 
223 |         # TODO: maybe different terms for each process should only be computed here...
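        # Illustrative note (added for clarity; not in the original source):
        # with n = 4 and m = 2, z = [one_0, one_1, two_0, two_1] and the
        # construction above yields labels = [2, 3, 0, 1], i.e. each row's
        # positive is the other augmented view of the same sample; the k = 0
        # term (the self-index i, which was masked on the diagonal) is dropped
        # by the reshape(n, m)[:, 1:] step.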
224 |         loss = -logprob[np.repeat(np.arange(n), m - 1), labels].sum() / n / (m - 1) / self.norm
225 |         return loss
226 | 
227 | 
228 | def gelu(x):
229 |     """Implementation of the gelu activation function.
230 |     For information: OpenAI GPT's gelu is slightly different
231 |     (and gives slightly different results):
232 |     0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) *
233 |     (x + 0.044715 * torch.pow(x, 3))))
234 |     Also see https://arxiv.org/abs/1606.08415
235 |     """
236 |     return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
237 | 
238 | 
239 | def swish(x):
240 |     return x * torch.sigmoid(x)
241 | 
242 | 
243 | ACT2FN = {"gelu": gelu, "relu": F.relu, "swish": swish}
244 | 
245 | 
246 | class LayerNorm(nn.Module):
247 |     def __init__(self, hidden_size, eps=1e-12):
248 |         """Construct a layernorm module in the TF style (epsilon inside the square root).
249 |         """
250 |         super(LayerNorm, self).__init__()
251 |         self.weight = nn.Parameter(torch.ones(hidden_size))
252 |         self.bias = nn.Parameter(torch.zeros(hidden_size))
253 |         self.variance_epsilon = eps
254 | 
255 |     def forward(self, x):
256 |         u = x.mean(-1, keepdim=True)
257 |         s = (x - u).pow(2).mean(-1, keepdim=True)
258 |         x = (x - u) / torch.sqrt(s + self.variance_epsilon)
259 |         return self.weight * x + self.bias
260 | 
261 | 
262 | class Embeddings(nn.Module):
263 |     """Construct the embeddings from item, position.
264 |     """
265 | 
266 |     def __init__(self, args):
267 |         super(Embeddings, self).__init__()
268 | 
269 |         self.item_embeddings = nn.Embedding(args.item_size, args.hidden_size, padding_idx=0)  # do not misuse padding_idx; index 0 is reserved for padding
270 |         self.position_embeddings = nn.Embedding(args.max_seq_length, args.hidden_size)
271 | 
272 |         self.LayerNorm = LayerNorm(args.hidden_size, eps=1e-12)
273 |         self.dropout = nn.Dropout(args.hidden_dropout_prob)
274 | 
275 |         self.args = args
276 | 
277 |     def forward(self, input_ids):
278 |         seq_length = input_ids.size(1)
279 |         position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device)
280 |         position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
281 |         items_embeddings = self.item_embeddings(input_ids)
282 |         position_embeddings = self.position_embeddings(position_ids)
283 |         embeddings = items_embeddings + position_embeddings
284 |         # apply LayerNorm and dropout to the summed embeddings
285 |         embeddings = self.LayerNorm(embeddings)
286 |         embeddings = self.dropout(embeddings)
287 |         return embeddings
288 | 
289 | 
290 | class SelfAttention(nn.Module):
291 |     def __init__(self, args):
292 |         super(SelfAttention, self).__init__()
293 |         if args.hidden_size % args.num_attention_heads != 0:
294 |             raise ValueError(
295 |                 "The hidden size (%d) is not a multiple of the number of attention "
296 |                 "heads (%d)" % (args.hidden_size, args.num_attention_heads)
297 |             )
298 |         self.num_attention_heads = args.num_attention_heads
299 |         self.attention_head_size = int(args.hidden_size / args.num_attention_heads)
300 |         self.all_head_size = self.num_attention_heads * self.attention_head_size
301 | 
302 |         self.query = nn.Linear(args.hidden_size, self.all_head_size)
303 |         self.key = nn.Linear(args.hidden_size, self.all_head_size)
304 |         self.value = nn.Linear(args.hidden_size, self.all_head_size)
305 | 
306 |         self.attn_dropout = nn.Dropout(args.attention_probs_dropout_prob)
307 | 
308 |         # after self-attention, apply a feed-forward dense layer, dropout, and LayerNorm to produce the output
309 |         self.dense = nn.Linear(args.hidden_size, args.hidden_size)
310 |         self.LayerNorm = LayerNorm(args.hidden_size, eps=1e-12)
311 |         self.out_dropout = nn.Dropout(args.hidden_dropout_prob)
312 | 
313 |     def transpose_for_scores(self, x):
314 |         new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
315 |         x = x.view(*new_x_shape)
316 |         return x.permute(0, 2, 1, 3)
317 | 
318 |     def forward(self, input_tensor, attention_mask):
319 |         mixed_query_layer = self.query(input_tensor)
320 |         mixed_key_layer = self.key(input_tensor)
321 |         mixed_value_layer = self.value(input_tensor)
322 | 
323 |         query_layer = self.transpose_for_scores(mixed_query_layer)
324 |         key_layer = self.transpose_for_scores(mixed_key_layer)
325 |         value_layer = self.transpose_for_scores(mixed_value_layer)
326 | 
327 |         # Take the dot product between "query" and "key" to get the raw attention scores.
328 |         attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
329 | 
330 |         attention_scores = attention_scores / math.sqrt(self.attention_head_size)
331 |         # Apply the attention mask (precomputed for all layers in BertModel forward() function)
332 |         # [batch_size heads seq_len seq_len] scores
333 |         # [batch_size 1 1 seq_len]
334 |         attention_scores = attention_scores + attention_mask
335 | 
336 |         # Normalize the attention scores to probabilities.
337 |         attention_probs = nn.Softmax(dim=-1)(attention_scores)
338 |         # This is actually dropping out entire tokens to attend to, which might
339 |         # seem a bit unusual, but is taken from the original Transformer paper.
340 |         # Fixme
341 |         attention_probs = self.attn_dropout(attention_probs)
342 |         context_layer = torch.matmul(attention_probs, value_layer)
343 |         context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
344 |         new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
345 |         context_layer = context_layer.view(*new_context_layer_shape)
346 |         hidden_states = self.dense(context_layer)
347 |         hidden_states = self.out_dropout(hidden_states)
348 |         hidden_states = self.LayerNorm(hidden_states + input_tensor)
349 | 
350 |         return hidden_states
351 | 
352 | 
353 | class Intermediate(nn.Module):
354 |     def __init__(self, args):
355 |         super(Intermediate, self).__init__()
356 |         self.dense_1 = nn.Linear(args.hidden_size, args.hidden_size * 4)
357 |         if isinstance(args.hidden_act, str):
358 |             self.intermediate_act_fn = ACT2FN[args.hidden_act]
359 |         else:
360 |             self.intermediate_act_fn = args.hidden_act
361 | 
362 |         self.dense_2 = nn.Linear(args.hidden_size * 4, args.hidden_size)
363 |         self.LayerNorm = LayerNorm(args.hidden_size, eps=1e-12)
364 |         self.dropout = nn.Dropout(args.hidden_dropout_prob)
365 | 
366 |     def forward(self, input_tensor):
367 | 
368 |         hidden_states = self.dense_1(input_tensor)
369 |         hidden_states = self.intermediate_act_fn(hidden_states)
370 | 
371 |         hidden_states = self.dense_2(hidden_states)
372 |         hidden_states = self.dropout(hidden_states)
373 |         hidden_states = self.LayerNorm(hidden_states + input_tensor)
374 | 
375 |         return hidden_states
376 | 
377 | 
378 | class Layer(nn.Module):
379 |     def __init__(self, args):
380 |         super(Layer, self).__init__()
381 |         self.attention = SelfAttention(args)
382 |         self.intermediate = Intermediate(args)
383 | 
384 |     def forward(self, hidden_states, attention_mask):
385 |         attention_output = self.attention(hidden_states, attention_mask)
386 |         intermediate_output = self.intermediate(attention_output)
387 |         return intermediate_output
388 | 
389 | 
390 | class Encoder(nn.Module):
391 |     def __init__(self, args):
392 |         super(Encoder, self).__init__()
393 |         layer = Layer(args)
394 |         self.layer = nn.ModuleList([copy.deepcopy(layer) for _ in range(args.num_hidden_layers)])
395 | 
396 |     def forward(self, hidden_states, attention_mask, output_all_encoded_layers=True):
397 |         all_encoder_layers = []
398 |         for layer_module in self.layer:
399 |             hidden_states = layer_module(hidden_states, attention_mask)
400 |             if output_all_encoded_layers:
401 |                 all_encoder_layers.append(hidden_states)
402 |         if not output_all_encoded_layers:
403 |             all_encoder_layers.append(hidden_states)
404 |         return all_encoder_layers
405 | 
--------------------------------------------------------------------------------
/src/output/ICLRec-Beauty-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-Beauty-1.pt
--------------------------------------------------------------------------------
/src/output/ICLRec-Sports_and_Outdoors-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-Sports_and_Outdoors-1.pt
--------------------------------------------------------------------------------
/src/output/ICLRec-Toys_and_Games-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-Toys_and_Games-1.pt
--------------------------------------------------------------------------------
/src/output/ICLRec-Yelp-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-Yelp-1.pt
--------------------------------------------------------------------------------
/src/output/ICLRec-ml-1m-1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/salesforce/ICLRec/3d9444178ac2a720b1664b91995dd0d58ce15337/src/output/ICLRec-ml-1m-1.pt
--------------------------------------------------------------------------------
/src/output/README.md:
--------------------------------------------------------------------------------
1 | 
2 | # Trained models on the four datasets
3 | 
4 | Run the following commands to evaluate ICLRec on the four datasets, respectively:
5 | For Sports_and_Outdoors
6 | ```
7 | python main.py --data_name Sports_and_Outdoors --model_idx 1 --do_eval
8 | ```
9 | 
10 | For Beauty
11 | 
12 | ```
13 | python main.py --data_name Beauty --model_idx 1 --do_eval
14 | ```
15 | 
16 | For Toys_and_Games
17 | 
18 | ```
19 | python main.py --data_name Toys_and_Games --model_idx 1 --do_eval
20 | ```
21 | 
22 | For Yelp
23 | 
24 | ```
25 | python main.py --data_name Yelp --model_idx 1 --do_eval
26 | ```
27 | 
--------------------------------------------------------------------------------
/src/scripts/run_beauty.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name Beauty --cf_weight 0.1 \
2 |     --model_idx 1 --gpu_id 0 \
3 |     --batch_size 256 --contrast_type Hybrid \
4 |     --num_intent_cluster 256 --seq_representation_type mean \
5 |     --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 1
--------------------------------------------------------------------------------
/src/scripts/run_ml_1m.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name ml-1m --cf_weight 0.0 \
2 |     --model_idx 1 --gpu_id 0 \
3 |     --batch_size 256 --contrast_type IntentCL \
4 |     --num_intent_cluster 256 --seq_representation_type mean \
5 |     --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 2 --max_seq_length 200
6 | 
--------------------------------------------------------------------------------
/src/scripts/run_sports.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name Sports_and_Outdoors --cf_weight 0.1 \
2 |     --model_idx 1 --gpu_id 0 \
3 |     --batch_size 256 --contrast_type Hybrid \
4 |     --num_intent_cluster 256 --seq_representation_type mean \
5 |     --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 2
--------------------------------------------------------------------------------
/src/scripts/run_toys.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name Toys_and_Games --cf_weight 0.1 \
2 |     --model_idx 1 --gpu_id 0 \
3 |     --batch_size 256 --contrast_type Hybrid \
4 |     --num_intent_cluster 256 --seq_representation_type mean \
5 |     --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 3
--------------------------------------------------------------------------------
/src/scripts/run_yelp.sh:
--------------------------------------------------------------------------------
1 | python3 main.py --data_name Yelp --cf_weight 0.1 \
2 |     --model_idx 1 --gpu_id 0 \
3 |     --batch_size 256 --contrast_type Hybrid \
4 |     --num_intent_cluster 256 --seq_representation_type mean \
5 |     --warm_up_epoches 0 --intent_cf_weight 0.1 --num_hidden_layers 2
--------------------------------------------------------------------------------
/src/trainers.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Copyright (c) 2022 salesforce.com, inc.
4 | # All rights reserved.
5 | # SPDX-License-Identifier: BSD-3-Clause
6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
7 | #
8 | 
9 | import numpy as np
10 | from tqdm import tqdm
11 | import random
12 | 
13 | import torch
14 | import torch.nn as nn
15 | from torch.optim import Adam
16 | from torch.utils.data import DataLoader, RandomSampler
17 | 
18 | from models import KMeans
19 | from datasets import RecWithContrastiveLearningDataset
20 | from modules import NCELoss, NTXent, SupConLoss, PCLoss
21 | from utils import recall_at_k, ndcg_k, get_metric, get_user_seqs, nCr
22 | 
23 | 
24 | class Trainer:
25 |     def __init__(self, model, train_dataloader, cluster_dataloader, eval_dataloader, test_dataloader, args):
26 | 
27 |         self.args = args
28 |         self.cuda_condition = torch.cuda.is_available() and not self.args.no_cuda
29 |         self.device = torch.device("cuda" if self.cuda_condition else "cpu")
30 | 
31 |         self.model = model
32 | 
33 |         self.num_intent_clusters = [int(i) for i in self.args.num_intent_clusters.split(",")]
34 |         self.clusters = []
35 |         for num_intent_cluster in self.num_intent_clusters:
36 |             # initialize Kmeans
37 |             if self.args.seq_representation_type == "mean":
38 |                 cluster = KMeans(
39 |                     num_cluster=num_intent_cluster,
40 |                     seed=self.args.seed,
41 |                     hidden_size=self.args.hidden_size,
42 |                     gpu_id=self.args.gpu_id,
43 |                     device=self.device,
44 |                 )
45 |                 self.clusters.append(cluster)
46 |             else:
47 |                 cluster = KMeans(
48 |                     num_cluster=num_intent_cluster,
49 |                     seed=self.args.seed,
50 |                     hidden_size=self.args.hidden_size * self.args.max_seq_length,
51 |                     gpu_id=self.args.gpu_id,
52 |                     device=self.device,
53 |                 )
54 |                 self.clusters.append(cluster)
55 | 
56 |         self.total_augmentation_pairs = nCr(self.args.n_views, 2)
57 |         # projection head for the contrastive learning task
58 |         self.projection = nn.Sequential(
59 |             nn.Linear(self.args.max_seq_length * self.args.hidden_size, 512, bias=False),
60 |             nn.BatchNorm1d(512),
61 |             nn.ReLU(inplace=True),
62 |             nn.Linear(512, self.args.hidden_size, bias=True),
63 |         )
64 |         if self.cuda_condition:
65 |             self.model.cuda()
66 |             self.projection.cuda()
67 |         # Setting the train and test data loader
68 |         self.train_dataloader = train_dataloader
69 |         self.cluster_dataloader = cluster_dataloader
70 |         self.eval_dataloader = eval_dataloader
71 |         self.test_dataloader = test_dataloader
72 | 
73 |         # self.data_name = self.args.data_name
74 |         betas = (self.args.adam_beta1, self.args.adam_beta2)
75 |         self.optim = Adam(self.model.parameters(), lr=self.args.lr, betas=betas, weight_decay=self.args.weight_decay)
76 | 
77 |         print("Total Parameters:", sum([p.nelement() for p in self.model.parameters()]))
78 | 
79 |         self.cf_criterion = NCELoss(self.args.temperature, self.device)
80 |         self.pcl_criterion = PCLoss(self.args.temperature, self.device)
81 | 
82 |     def train(self, epoch):
83 |         self.iteration(epoch, self.train_dataloader, self.cluster_dataloader)
84 | 
85 |     def valid(self, epoch, full_sort=False):
86 |         return self.iteration(epoch, self.eval_dataloader, full_sort=full_sort, train=False)
87 | 
88 |     def test(self, epoch, full_sort=False):
89 |         return self.iteration(epoch, self.test_dataloader, full_sort=full_sort, train=False)
90 | 
91 |     def iteration(self, epoch, dataloader, full_sort=False, train=True):
92 |         raise NotImplementedError
93 | 
94 |     def get_sample_scores(self, epoch, pred_list):
95 |         pred_list = (-pred_list).argsort().argsort()[:, 0]
96 |         HIT_1, NDCG_1, MRR = get_metric(pred_list, 1)
97 |         HIT_5, NDCG_5, MRR = get_metric(pred_list, 5)
98 |         HIT_10, NDCG_10, MRR = get_metric(pred_list, 10)
99 |         post_fix = {
100 |             "Epoch": epoch,
101 |             "HIT@1": "{:.4f}".format(HIT_1),
102 |             "NDCG@1": "{:.4f}".format(NDCG_1),
103 |             "HIT@5": "{:.4f}".format(HIT_5),
104 |             "NDCG@5": "{:.4f}".format(NDCG_5),
105 |             "HIT@10": "{:.4f}".format(HIT_10),
106 |             "NDCG@10": "{:.4f}".format(NDCG_10),
107 |             "MRR": "{:.4f}".format(MRR),
108 |         }
109 |         print(post_fix)
110 |         with open(self.args.log_file, "a") as f:
111 |             f.write(str(post_fix) + "\n")
112 |         return [HIT_1, NDCG_1, HIT_5, NDCG_5, HIT_10, NDCG_10, MRR], str(post_fix)
113 | 
114 |     def get_full_sort_score(self, epoch, answers, pred_list):
115 |         recall, ndcg = [], []
116 |         for k in [5, 10, 15, 20]:
117 |             recall.append(recall_at_k(answers, pred_list, k))
118 |             ndcg.append(ndcg_k(answers, pred_list, k))
119 |         post_fix = {
120 |             "Epoch": epoch,
121 |             "HIT@5": "{:.4f}".format(recall[0]),
122 |             "NDCG@5": "{:.4f}".format(ndcg[0]),
123 |             "HIT@10": "{:.4f}".format(recall[1]),
124 |             "NDCG@10": "{:.4f}".format(ndcg[1]),
125 |             "HIT@20": "{:.4f}".format(recall[3]),
126 |             "NDCG@20": "{:.4f}".format(ndcg[3]),
127 |         }
128 |         print(post_fix)
129 |         with open(self.args.log_file, "a") as f:
130 |             f.write(str(post_fix) + "\n")
131 |         return [recall[0], ndcg[0], recall[1], ndcg[1], recall[3], ndcg[3]], str(post_fix)
132 | 
133 |     def save(self, file_name):
134 |         torch.save(self.model.cpu().state_dict(), file_name)
135 |         self.model.to(self.device)
136 | 
137 |     def load(self, file_name):
138 |         self.model.load_state_dict(torch.load(file_name))
139 | 
140 |     def cross_entropy(self, seq_out, pos_ids, neg_ids):
141 |         # [batch seq_len hidden_size]
142 |         pos_emb = self.model.item_embeddings(pos_ids)
143 |         neg_emb = self.model.item_embeddings(neg_ids)
144 |         # [batch*seq_len hidden_size]
145 |         pos = pos_emb.view(-1, pos_emb.size(2))
146 |         neg = neg_emb.view(-1, neg_emb.size(2))
147 |         seq_emb = seq_out.view(-1, self.args.hidden_size)  # [batch*seq_len hidden_size]
148 |         pos_logits = torch.sum(pos * seq_emb, -1)  # [batch*seq_len]
149 |         neg_logits = torch.sum(neg * seq_emb, -1)
150 |         istarget = (pos_ids > 0).view(pos_ids.size(0) * self.model.args.max_seq_length).float()  # [batch*seq_len]
151 |         loss = torch.sum(
152 |             -torch.log(torch.sigmoid(pos_logits) + 1e-24) * istarget
153 |             - torch.log(1 - torch.sigmoid(neg_logits) + 1e-24) * istarget
154 |         ) / torch.sum(istarget)
155 | 
156 |         return loss
157 | 
158 |     def predict_sample(self, seq_out, test_neg_sample):
159 |         # [batch 100 hidden_size]
160 |         test_item_emb = self.model.item_embeddings(test_neg_sample)
161 |         # [batch hidden_size]
162 |         test_logits = torch.bmm(test_item_emb, seq_out.unsqueeze(-1)).squeeze(-1)  # [B 100]
163 |         return test_logits
164 | 
165 |     def predict_full(self, seq_out):
166 |         # [item_num hidden_size]
167 |         test_item_emb = self.model.item_embeddings.weight
168 |         # [batch hidden_size ]
169 |         rating_pred = torch.matmul(seq_out, test_item_emb.transpose(0, 1))
170 |         return rating_pred
171 | 
172 | 
173 | class ICLRecTrainer(Trainer):
174 |     def __init__(self, model, train_dataloader, cluster_dataloader, eval_dataloader, test_dataloader, args):
175 |         super(ICLRecTrainer, self).__init__(
176 |             model, train_dataloader, cluster_dataloader, eval_dataloader, test_dataloader, args
177 |         )
178 | 
179 |     def _instance_cl_one_pair_contrastive_learning(self, inputs, intent_ids=None):
180 |         """
181 |         contrastive learning given one pair of sequences (batch)
182 |         inputs: [batch1_augmented_data, batch2_augmented_data]
183 |         """
184 |         cl_batch = torch.cat(inputs, dim=0)
185 |         cl_batch = cl_batch.to(self.device)
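        # Illustrative shape note (added for clarity; not in the original
        # source): with two augmented views each of shape [B, L], cl_batch is
        # [2B, L]; the encoder below runs once over the concatenation, and the
        # flattened outputs are split back into two [B, ...] halves that form
        # the positive pairs for the NCE criterion.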
186 |         cl_sequence_output = self.model(cl_batch)
187 |         # cf_sequence_output = cf_sequence_output[:, -1, :]
188 |         if self.args.seq_representation_instancecl_type == "mean":
189 |             cl_sequence_output = torch.mean(cl_sequence_output, dim=1, keepdim=False)
190 |         cl_sequence_flatten = cl_sequence_output.view(cl_batch.shape[0], -1)
191 |         # cf_output = self.projection(cf_sequence_flatten)
192 |         batch_size = cl_batch.shape[0] // 2
193 |         cl_output_slice = torch.split(cl_sequence_flatten, batch_size)
194 |         if self.args.de_noise:
195 |             cl_loss = self.cf_criterion(cl_output_slice[0], cl_output_slice[1], intent_ids=intent_ids)
196 |         else:
197 |             cl_loss = self.cf_criterion(cl_output_slice[0], cl_output_slice[1], intent_ids=None)
198 |         return cl_loss
199 | 
200 |     def _pcl_one_pair_contrastive_learning(self, inputs, intents, intent_ids):
201 |         """
202 |         contrastive learning given one pair of sequences (batch)
203 |         inputs: [batch1_augmented_data, batch2_augmented_data]
204 |         intents: [num_clusters batch_size hidden_dims]
205 |         """
206 |         n_views, (bsz, seq_len) = len(inputs), inputs[0].shape
207 |         cl_batch = torch.cat(inputs, dim=0)
208 |         cl_batch = cl_batch.to(self.device)
209 |         cl_sequence_output = self.model(cl_batch)
210 |         if self.args.seq_representation_type == "mean":
211 |             cl_sequence_output = torch.mean(cl_sequence_output, dim=1, keepdim=False)
212 |         cl_sequence_flatten = cl_sequence_output.view(cl_batch.shape[0], -1)
213 |         cl_output_slice = torch.split(cl_sequence_flatten, bsz)
214 |         if self.args.de_noise:
215 |             cl_loss = self.pcl_criterion(cl_output_slice[0], cl_output_slice[1], intents=intents, intent_ids=intent_ids)
216 |         else:
217 |             cl_loss = self.pcl_criterion(cl_output_slice[0], cl_output_slice[1], intents=intents, intent_ids=None)
218 |         return cl_loss
219 | 
220 |     def iteration(self, epoch, dataloader, cluster_dataloader=None, full_sort=True, train=True):
221 | 
222 |         str_code = "train" if train else "test"
223 | 
224 |         # Setting the tqdm progress bar
225 | 
226 |         if train:
227 |             # ------ intentions clustering ----- #
228 |             if self.args.contrast_type in ["IntentCL", "Hybrid"] and epoch >= self.args.warm_up_epoches:
229 |                 print("Preparing Clustering:")
230 |                 self.model.eval()
231 |                 kmeans_training_data = []
232 |                 rec_cf_data_iter = tqdm(enumerate(cluster_dataloader), total=len(cluster_dataloader))
233 |                 for i, (rec_batch, _, _) in rec_cf_data_iter:
234 |                     rec_batch = tuple(t.to(self.device) for t in rec_batch)
235 |                     _, input_ids, target_pos, target_neg, _ = rec_batch
236 |                     sequence_output = self.model(input_ids)
237 |                     # mean pooling over the sequence dimension
238 |                     if self.args.seq_representation_type == "mean":
239 |                         sequence_output = torch.mean(sequence_output, dim=1, keepdim=False)
240 |                     sequence_output = sequence_output.view(sequence_output.shape[0], -1)
241 |                     sequence_output = sequence_output.detach().cpu().numpy()
242 |                     kmeans_training_data.append(sequence_output)
243 |                 kmeans_training_data = np.concatenate(kmeans_training_data, axis=0)
244 | 
245 |                 # train multiple clusters
246 |                 print("Training Clusters:")
247 |                 for i, cluster in tqdm(enumerate(self.clusters), total=len(self.clusters)):
248 |                     cluster.train(kmeans_training_data)
249 |                     self.clusters[i] = cluster
250 |                 # clean memory
251 |                 del kmeans_training_data
252 |                 import gc
253 | 
254 |                 gc.collect()
255 | 
256 |             # ------ model training -----#
257 |             print("Performing Rec model Training:")
258 |             self.model.train()
259 |             rec_avg_loss = 0.0
260 |             cl_individual_avg_losses = [0.0 for _ in range(self.total_augmentation_pairs)]
261 |             cl_sum_avg_loss = 0.0
262 |             joint_avg_loss = 0.0
263 | 
264 |             print(f"rec dataset length: {len(dataloader)}")
265 |             rec_cf_data_iter = tqdm(enumerate(dataloader), total=len(dataloader))
266 | 
267 |             for i, (rec_batch, cl_batches, seq_class_label_batches) in rec_cf_data_iter:
268 |                 """
269 |                 rec_batch shape: key_name x batch_size x feature_dim
270 |                 cl_batches shape:
271 |                     list of n_views x batch_size x feature_dim tensors
272 |                 """
273 |                 # 0. send the batch data to the device (GPU or CPU)
274 |                 rec_batch = tuple(t.to(self.device) for t in rec_batch)
275 |                 _, input_ids, target_pos, target_neg, _ = rec_batch
276 | 
277 |                 # ---------- recommendation task ---------------#
278 |                 sequence_output = self.model(input_ids)
279 |                 rec_loss = self.cross_entropy(sequence_output, target_pos, target_neg)
280 | 
281 |                 # ---------- contrastive learning task -------------#
282 |                 cl_losses = []
283 |                 for cl_batch in cl_batches:
284 |                     if self.args.contrast_type == "InstanceCL":
285 |                         cl_loss = self._instance_cl_one_pair_contrastive_learning(
286 |                             cl_batch, intent_ids=seq_class_label_batches
287 |                         )
288 |                         cl_losses.append(self.args.cf_weight * cl_loss)
289 |                     elif self.args.contrast_type == "IntentCL":
290 |                         # ------ perform clustering to get users' intentions ----#
291 |                         # mean pooling over the sequence dimension
292 |                         if epoch >= self.args.warm_up_epoches:
293 |                             if self.args.seq_representation_type == "mean":
294 |                                 sequence_output = torch.mean(sequence_output, dim=1, keepdim=False)
295 |                             sequence_output = sequence_output.view(sequence_output.shape[0], -1)
296 |                             sequence_output = sequence_output.detach().cpu().numpy()
297 | 
298 |                             # query on multiple clusters
299 |                             for cluster in self.clusters:
300 |                                 seq2intents = []
301 |                                 intent_ids = []
302 |                                 intent_id, seq2intent = cluster.query(sequence_output)
303 |                                 seq2intents.append(seq2intent)
304 |                                 intent_ids.append(intent_id)
305 |                             cl_loss = self._pcl_one_pair_contrastive_learning(
306 |                                 cl_batch, intents=seq2intents, intent_ids=intent_ids
307 |                             )
308 |                             cl_losses.append(self.args.intent_cf_weight * cl_loss)
309 |                         else:
310 |                             continue
311 |                     elif self.args.contrast_type == "Hybrid":
312 |                         if epoch < self.args.warm_up_epoches:
313 |                             cl_loss1 = self._instance_cl_one_pair_contrastive_learning(
314 |                                 cl_batch, intent_ids=seq_class_label_batches
315 |                             )
316 |                             cl_losses.append(self.args.cf_weight * cl_loss1)
317 |                         else:
318 |                             cl_loss1 = self._instance_cl_one_pair_contrastive_learning(
319 |                                 cl_batch, intent_ids=seq_class_label_batches
320 |                             )
321 |                             cl_losses.append(self.args.cf_weight * cl_loss1)
322 |                             if self.args.seq_representation_type == "mean":
323 |                                 sequence_output = torch.mean(sequence_output, dim=1, keepdim=False)
324 |                             sequence_output = sequence_output.view(sequence_output.shape[0], -1)
325 |                             sequence_output = sequence_output.detach().cpu().numpy()
326 |                             # query on multiple clusters
327 |                             for cluster in self.clusters:
328 |                                 seq2intents = []
329 |                                 intent_ids = []
330 |                                 intent_id, seq2intent = cluster.query(sequence_output)
331 |                                 seq2intents.append(seq2intent)
332 |                                 intent_ids.append(intent_id)
333 |                             cl_loss3 = self._pcl_one_pair_contrastive_learning(
334 |                                 cl_batch, intents=seq2intents, intent_ids=intent_ids
335 |                             )
336 |                             cl_losses.append(self.args.intent_cf_weight * cl_loss3)
337 | 
338 |                 joint_loss = self.args.rec_weight * rec_loss
339 |                 for cl_loss in cl_losses:
340 |                     joint_loss += cl_loss
341 |                 self.optim.zero_grad()
342 |                 joint_loss.backward()
343 |                 self.optim.step()
344 | 
345 |                 rec_avg_loss += rec_loss.item()
346 | 
347 |                 for cl_loss in cl_losses:
348 |                     cl_sum_avg_loss += cl_loss.item()
349 |                 joint_avg_loss += joint_loss.item()
350 | 
351 |             post_fix = {
352 |                 "epoch": epoch,
353 |                 "rec_avg_loss": "{:.4f}".format(rec_avg_loss / len(rec_cf_data_iter)),
354 |                 "joint_avg_loss": "{:.4f}".format(joint_avg_loss / len(rec_cf_data_iter)),
355 |             }
356 |             if (epoch + 1) % self.args.log_freq == 0:
357 |                 print(str(post_fix))
358 | 
359 |             with open(self.args.log_file, "a") as f:
360 |                 f.write(str(post_fix) + "\n")
361 | 
362 |         else:
363 |             rec_data_iter = tqdm(enumerate(dataloader), total=len(dataloader))
364 |             self.model.eval()
365 | 
366 |             pred_list = None
367 | 
368 |             if full_sort:
369 |                 answer_list = None
370 |                 for i, batch in rec_data_iter:
371 |                     # 0. send the batch data to the device (GPU or CPU)
372 |                     batch = tuple(t.to(self.device) for t in batch)
373 |                     user_ids, input_ids, target_pos, target_neg, answers = batch
374 |                     recommend_output = self.model(input_ids)
375 | 
376 |                     recommend_output = recommend_output[:, -1, :]
377 |                     # recommendation results
378 | 
379 |                     rating_pred = self.predict_full(recommend_output)
380 | 
381 |                     rating_pred = rating_pred.cpu().data.numpy().copy()
382 |                     batch_user_index = user_ids.cpu().numpy()
383 |                     rating_pred[self.args.train_matrix[batch_user_index].toarray() > 0] = 0
384 |                     # reference: https://stackoverflow.com/a/23734295, https://stackoverflow.com/a/20104162
385 |                     # argpartition: O(n), argsort: O(n log n)
386 |                     ind = np.argpartition(rating_pred, -20)[:, -20:]
387 |                     arr_ind = rating_pred[np.arange(len(rating_pred))[:, None], ind]
388 |                     arr_ind_argsort = np.argsort(arr_ind)[np.arange(len(rating_pred)), ::-1]
389 |                     batch_pred_list = ind[np.arange(len(rating_pred))[:, None], arr_ind_argsort]
390 | 
391 |                     if i == 0:
392 |                         pred_list = batch_pred_list
393 |                         answer_list = answers.cpu().data.numpy()
394 |                     else:
395 |                         pred_list = np.append(pred_list, batch_pred_list, axis=0)
396 |                         answer_list = np.append(answer_list, answers.cpu().data.numpy(), axis=0)
397 |                 return self.get_full_sort_score(epoch, answer_list, pred_list)
398 | 
399 |             else:
400 |                 for i, batch in rec_data_iter:
401 |                     batch = tuple(t.to(self.device) for t in batch)
402 |                     user_ids, input_ids, target_pos, target_neg, answers, sample_negs = batch
403 |                     recommend_output = self.model.finetune(input_ids)
404 |                     test_neg_items = torch.cat((answers, sample_negs), -1)
405 |                     recommend_output = recommend_output[:, -1, :]
406 | 
407 |                     test_logits = self.predict_sample(recommend_output, test_neg_items)
408 |                     test_logits = test_logits.cpu().detach().numpy().copy()
409 |                     if i == 0:
410 |                         pred_list = test_logits
411 |                     else:
412 |                         pred_list = np.append(pred_list, test_logits, axis=0)
413 | 
414 |                 return self.get_sample_scores(epoch, pred_list)
415 | 
--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Copyright (c) 2022 salesforce.com, inc.
4 | # All rights reserved.
5 | # SPDX-License-Identifier: BSD-3-Clause
6 | # For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
7 | #
8 | 
9 | import numpy as np
10 | import math
11 | import random
12 | import os
13 | import json
14 | import pickle
15 | from scipy.sparse import csr_matrix
16 | 
17 | import torch
18 | import torch.nn.functional as F
19 | 
20 | 
21 | def set_seed(seed):
22 |     random.seed(seed)
23 |     os.environ["PYTHONHASHSEED"] = str(seed)
24 |     np.random.seed(seed)
25 |     torch.manual_seed(seed)
26 |     torch.cuda.manual_seed(seed)
27 |     torch.cuda.manual_seed_all(seed)
28 |     # some cudnn methods can be random even after fixing the seed
29 |     # unless you tell it to be deterministic
30 |     torch.backends.cudnn.deterministic = True
31 | 
32 | 
33 | def nCr(n, r):
34 |     f = math.factorial
35 |     return f(n) // f(r) // f(n - r)
36 | 
37 | 
38 | def check_path(path):
39 |     if not os.path.exists(path):
40 |         os.makedirs(path)
41 |         print(f"{path} created")
42 | 
43 | 
44 | def neg_sample(item_set, item_size):  # randint is inclusive on both ends
45 |     item = random.randint(1, item_size - 1)
46 |     while item in item_set:
47 |         item = random.randint(1, item_size - 1)
48 |     return item
49 | 
50 | 
51 | class EarlyStopping:
52 |     """Early stops the training if validation loss doesn't improve after a given patience."""
53 | 
54 |     def __init__(self, checkpoint_path, patience=7, verbose=False, delta=0):
55 |         """
56 |         Args:
57 |             patience (int): How long to wait after last time validation loss improved.
58 |                             Default: 7
59 |             verbose (bool): If True, prints a message for each validation loss improvement.
60 |                             Default: False
61 |             delta (float): Minimum change in the monitored quantity to qualify as an improvement.
62 |                            Default: 0
63 |         """
64 |         self.checkpoint_path = checkpoint_path
65 |         self.patience = patience
66 |         self.verbose = verbose
67 |         self.counter = 0
68 |         self.best_score = None
69 |         self.early_stop = False
70 |         self.delta = delta
71 | 
72 |     def compare(self, score):
73 |         for i in range(len(score)):
74 |             # as long as one metric improves, treat the model as still improving
75 |             if score[i] > self.best_score[i] + self.delta:
76 |                 return False
77 |         return True
78 | 
79 |     def __call__(self, score, model):
80 |         # score HIT@10 NDCG@10
81 | 
82 |         if self.best_score is None:
83 |             self.best_score = score
84 |             self.score_min = np.array([0] * len(score))
85 |             self.save_checkpoint(score, model)
86 |         elif self.compare(score):
87 |             self.counter += 1
88 |             print(f"EarlyStopping counter: {self.counter} out of {self.patience}")
89 |             if self.counter >= self.patience:
90 |                 self.early_stop = True
91 |         else:
92 |             self.best_score = score
93 |             self.save_checkpoint(score, model)
94 |             self.counter = 0
95 | 
96 |     def save_checkpoint(self, score, model):
97 |         """Saves model when the validation score improves."""
98 |         if self.verbose:
99 |             # ({self.score_min:.6f} --> {score:.6f})  # this format string only works when score is a single value
100 |             print("Validation score increased. Saving model ...")
101 |         torch.save(model.state_dict(), self.checkpoint_path)
102 |         self.score_min = score
103 | 
104 | 
105 | def kmax_pooling(x, dim, k):
106 |     index = x.topk(k, dim=dim)[1].sort(dim=dim)[0]
107 |     return x.gather(dim, index).squeeze(dim)
108 | 
109 | 
110 | def avg_pooling(x, dim):
111 |     return x.sum(dim=dim) / x.size(dim)
112 | 
113 | 
114 | def generate_rating_matrix_valid(user_seq, num_users, num_items):
115 |     # three lists are used to construct sparse matrix
116 |     row = []
117 |     col = []
118 |     data = []
119 |     for user_id, item_list in enumerate(user_seq):
120 |         for item in item_list[:-2]:  # hold out the last two items (valid and test targets)
121 |             row.append(user_id)
122 |             col.append(item)
123 |             data.append(1)
124 | 
125 |     row = np.array(row)
126 |     col = np.array(col)
127 |     data = np.array(data)
128 |     rating_matrix = csr_matrix((data, (row, col)), shape=(num_users, num_items))
129 | 
130 |     return rating_matrix
131 | 
132 | 
133 | def generate_rating_matrix_test(user_seq, num_users, num_items):
134 |     # three lists are used to construct sparse matrix
135 |     row = []
136 |     col = []
137 |     data = []
138 |     for user_id, item_list in enumerate(user_seq):
139 |         for item in item_list[:-1]:  # hold out the last item (test target)
140 |             row.append(user_id)
141 |             col.append(item)
142 |             data.append(1)
143 | 
144 |     row = np.array(row)
145 |     col = np.array(col)
146 |     data = np.array(data)
147 |     rating_matrix = csr_matrix((data, (row, col)), shape=(num_users, num_items))
148 | 
149 |     return rating_matrix
150 | 
151 | 
152 | def get_user_seqs(data_file):
153 |     lines = open(data_file).readlines()
154 |     user_seq = []
155 |     item_set = set()
156 |     for line in lines:
157 |         user, items = line.strip().split(" ", 1)
158 |         items = items.split(" ")
159 |         items = [int(item) for item in items]
160 |         user_seq.append(items)
161 |         item_set = item_set | set(items)
162 |     max_item = max(item_set)
163 | 
164 |     num_users = len(lines)
165 |     num_items = max_item + 2
166 | 
167 |     valid_rating_matrix = generate_rating_matrix_valid(user_seq, num_users, num_items)
168 |     test_rating_matrix = generate_rating_matrix_test(user_seq, num_users, num_items)
169 |     return user_seq, max_item, valid_rating_matrix, test_rating_matrix
170 | 
171 | 
172 | def get_user_seqs_long(data_file):
173 |     lines = open(data_file).readlines()
174 |     user_seq = []
175 |     long_sequence = []
176 |     item_set = set()
177 |     for line in lines:
178 |         user, items = line.strip().split(" ", 1)
179 |         items = items.split(" ")
180 |         items = [int(item) for item in items]
181 |         long_sequence.extend(items)  # items collected here are later used to sample negatives
182 |         user_seq.append(items)
183 |         item_set = item_set | set(items)
184 |     max_item = max(item_set)
185 | 
186 |     return user_seq, max_item, long_sequence
187 | 
188 | 
189 | def get_user_seqs_and_sample(data_file, sample_file):
190 |     lines = open(data_file).readlines()
191 |     user_seq = []
192 |     item_set = set()
193 |     for line in lines:
194 |         user, items = line.strip().split(" ", 1)
195 |         items = items.split(" ")
196 |         items = [int(item) for item in items]
197 |         user_seq.append(items)
198 |         item_set = item_set | set(items)
199 |     max_item = max(item_set)
200 | 
201 |     lines = open(sample_file).readlines()
202 |     sample_seq = []
203 |     for line in lines:
204 |         user, items = line.strip().split(" ", 1)
205 |         items = items.split(" ")
206 |         items = [int(item) for item in items]
207 |         sample_seq.append(items)
208 | 
209 |     assert len(user_seq) == len(sample_seq)
210 | 
211 |     return user_seq, max_item, sample_seq
212 | 
213 | 
214 | def get_item2attribute_json(data_file):
215 |     item2attribute = json.loads(open(data_file).readline())
216 |     attribute_set = set()
217 |     for item, attributes in item2attribute.items():
218 |         attribute_set = attribute_set | set(attributes)
219 |     attribute_size = max(attribute_set)  # 331
220 |     return item2attribute, attribute_size
221 | 
222 | 
223 | def get_metric(pred_list, topk=10):
224 |     NDCG = 0.0
225 |     HIT = 0.0
226 |     MRR = 0.0
227 |     # [batch] the answer's rank
228 |     for rank in pred_list:
229 |         MRR += 1.0 / (rank + 1.0)
230 |         if rank < topk:
231 |             NDCG += 1.0 / np.log2(rank + 2.0)
232 |             HIT += 1.0
233 |     return HIT / len(pred_list), NDCG / len(pred_list), MRR / len(pred_list)
234 | 
235 | 
236 | def precision_at_k_per_sample(actual, predicted, topk):
237 |     num_hits = 0
238 |     for place in predicted:
239 |         if place in actual:
240 |             num_hits += 1
241 |     return num_hits / (topk + 0.0)
242 | 
243 | 
244 | def precision_at_k(actual, predicted, topk):
245 |     sum_precision = 0.0
246 |     num_users = len(predicted)
247 |     for i in range(num_users):
248 |         act_set = set(actual[i])
249 |         pred_set = set(predicted[i][:topk])
250 |         sum_precision += len(act_set & pred_set) / float(topk)
251 | 
252 |     return sum_precision / num_users
253 | 
254 | 
255 | def recall_at_k(actual, predicted, topk):
256 |     sum_recall = 0.0
257 |     num_users = len(predicted)
258 |     true_users = 0
259 |     for i in range(num_users):
260 |         act_set = set(actual[i])
261 |         pred_set = set(predicted[i][:topk])
262 |         if len(act_set) != 0:
263 |             sum_recall += len(act_set & pred_set) / float(len(act_set))
264 |             true_users += 1
265 |     return sum_recall / true_users
266 | 
267 | 
268 | def apk(actual, predicted, k=10):
269 |     """
270 |     Computes the average precision at k.
271 |     This function computes the average precision at k between two lists of
272 |     items.
273 |     Parameters
274 |     ----------
275 |     actual : list
276 |         A list of elements that are to be predicted (order doesn't matter)
277 |     predicted : list
278 |         A list of predicted elements (order does matter)
279 |     k : int, optional
280 |         The maximum number of predicted elements
281 |     Returns
282 |     -------
283 |     score : double
284 |         The average precision at k over the input lists
285 |     """
286 |     if len(predicted) > k:
287 |         predicted = predicted[:k]
288 | 
289 |     score = 0.0
290 |     num_hits = 0.0
291 | 
292 |     for i, p in enumerate(predicted):
293 |         if p in actual and p not in predicted[:i]:
294 |             num_hits += 1.0
295 |             score += num_hits / (i + 1.0)
296 | 
297 |     if not actual:
298 |         return 0.0
299 | 
300 |     return score / min(len(actual), k)
301 | 
302 | 
303 | def mapk(actual, predicted, k=10):
304 |     """
305 |     Computes the mean average precision at k.
306 |     This function computes the mean average precision at k between two lists
307 |     of lists of items.
308 | Parameters 309 | ---------- 310 | actual : list 311 | A list of lists of elements that are to be predicted 312 | (order doesn't matter in the lists) 313 | predicted : list 314 | A list of lists of predicted elements 315 | (order matters in the lists) 316 | k : int, optional 317 | The maximum number of predicted elements 318 | Returns 319 | ------- 320 | score : double 321 | The mean average precision at k over the input lists 322 | """ 323 | return np.mean([apk(a, p, k) for a, p in zip(actual, predicted)]) 324 | 325 | 326 | def ndcg_k(actual, predicted, topk): 327 | res = 0 328 | for user_id in range(len(actual)): 329 | k = min(topk, len(actual[user_id])) 330 | idcg = idcg_k(k) 331 | dcg_k = sum([int(predicted[user_id][j] in set(actual[user_id])) / math.log(j + 2, 2) for j in range(topk)]) 332 | res += dcg_k / idcg 333 | return res / float(len(actual)) 334 | 335 | 336 | # Calculates the ideal discounted cumulative gain at k 337 | def idcg_k(k): 338 | res = sum([1.0 / math.log(i + 2, 2) for i in range(k)]) 339 | if not res: 340 | return 1.0 341 | else: 342 | return res 343 | --------------------------------------------------------------------------------