├── Screenshots
    ├── CB_result.png
    ├── 备选推荐节目集及所属类型.png
    ├── UserCF_result.png
    ├── 备选推荐节目集及所属类型01矩阵.png
    ├── 所有用户对其看过的节目的评分矩阵.png
    ├── 所有用户看过的节目及所属类型的01矩阵.png
    ├── CB_Mixture_userCF_result.png
    └── 用户A(B、C)对于其三个月来所看过节目的评分.png
├── 输入数据表格
    ├── 备选推荐节目集及所属类型.xlsx
    ├── 备选推荐节目集及所属类型01矩阵.xlsx
    ├── 所有用户对其看过的节目的评分矩阵.xlsx
    ├── 用户A对于其三个月来所看过节目的评分.xls
    ├── 用户B对于其三个月来所看过节目的评分.xls
    ├── 用户C对于其三个月来所看过节目的评分.xls
    └── 所有用户看过的节目及所属类型的01矩阵.xlsx
├── LICENSE
├── RecommenderSystem
    ├── items_labels_to_01matrix.py
    ├── items_saw_labels_to_01matrix.py
    ├── CB_Mixture_userCF.py
    ├── UserCF.py
    └── CB.py
└── README.md


/Screenshots/CB_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/Screenshots/CB_result.png
--------------------------------------------------------------------------------
/输入数据表格/备选推荐节目集及所属类型.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/输入数据表格/备选推荐节目集及所属类型.xlsx
--------------------------------------------------------------------------------
/Screenshots/备选推荐节目集及所属类型.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/Screenshots/备选推荐节目集及所属类型.png
--------------------------------------------------------------------------------
/输入数据表格/备选推荐节目集及所属类型01矩阵.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/输入数据表格/备选推荐节目集及所属类型01矩阵.xlsx
--------------------------------------------------------------------------------
/输入数据表格/所有用户对其看过的节目的评分矩阵.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/输入数据表格/所有用户对其看过的节目的评分矩阵.xlsx
--------------------------------------------------------------------------------
/Screenshots/UserCF_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/Screenshots/UserCF_result.png
--------------------------------------------------------------------------------
/输入数据表格/用户A对于其三个月来所看过节目的评分.xls:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/输入数据表格/用户A对于其三个月来所看过节目的评分.xls
--------------------------------------------------------------------------------
/输入数据表格/用户B对于其三个月来所看过节目的评分.xls:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/输入数据表格/用户B对于其三个月来所看过节目的评分.xls
--------------------------------------------------------------------------------
/输入数据表格/用户C对于其三个月来所看过节目的评分.xls:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/输入数据表格/用户C对于其三个月来所看过节目的评分.xls
--------------------------------------------------------------------------------
/Screenshots/备选推荐节目集及所属类型01矩阵.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/Screenshots/备选推荐节目集及所属类型01矩阵.png
--------------------------------------------------------------------------------
/Screenshots/所有用户对其看过的节目的评分矩阵.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/Screenshots/所有用户对其看过的节目的评分矩阵.png
--------------------------------------------------------------------------------
/输入数据表格/所有用户看过的节目及所属类型的01矩阵.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/输入数据表格/所有用户看过的节目及所属类型的01矩阵.xlsx
--------------------------------------------------------------------------------
/Screenshots/所有用户看过的节目及所属类型的01矩阵.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/Screenshots/所有用户看过的节目及所属类型的01矩阵.png
--------------------------------------------------------------------------------
/Screenshots/CB_Mixture_userCF_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/Screenshots/CB_Mixture_userCF_result.png
--------------------------------------------------------------------------------
/Screenshots/用户A(B、C)对于其三个月来所看过节目的评分.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yuziquan/RecommenderSystem/HEAD/Screenshots/用户A(B、C)对于其三个月来所看过节目的评分.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 WuchangI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/RecommenderSystem/items_labels_to_01matrix.py:
--------------------------------------------------------------------------------
# Description:
# Generates "D:\Recommender Systems\备选推荐节目集及所属类型01矩阵.xlsx" (0/1 genre matrix of the candidate programs)
# from "D:\Recommender Systems\备选推荐节目集及所属类型.xlsx" (candidate programs and their genres)

import pandas as pd
import numpy as np

if __name__ == '__main__':

    # Raw strings keep the backslashes in the Windows paths from being read as escape sequences
    df = pd.read_excel(r"D:\Recommender Systems\备选推荐节目集及所属类型.xlsx")
    (m, n) = df.shape

    data_array = np.array(df.iloc[0:m+1,:])
    print(data_array)

    # All genre labels, in a fixed order
    all_labels = ['教育', '戏曲', '悬疑', '科幻', '惊悚', '动作', '资讯', '武侠', '剧情', '警匪', '生活', '军事', '言情', '体育', '冒险', '纪实',
                  '少儿教育', '少儿', '综艺', '古装', '搞笑', '广告']
    labels_num = len(all_labels)

    # Names of all programs, in sheet order
    all_items_names = np.array(df.iloc[:m+1, 0])
    print(all_items_names)

    # Build the 0/1 matrix: 1 means the program belongs to the genre, 0 means it does not
    data_to_be_written = []

    for i in range(len(all_items_names)):

        # 0/1 row vector for one program
        vector = [0] * labels_num
        labels_names = str(data_array[i][1]).split(" ")

        for j in range(len(labels_names)):
            location = all_labels.index(labels_names[j])
            vector[location] = 1

        data_to_be_written.append(vector)

    # Write the 0/1 matrix to "备选推荐节目集及所属类型01矩阵"
    df = pd.DataFrame(data_to_be_written, index=all_items_names, columns=all_labels)
    df.to_excel(r"D:\Recommender Systems\备选推荐节目集及所属类型01矩阵.xlsx")

    # PS: remember to type "节目名" (program name) into the first blank cell of the
    # program-name column in the generated "备选推荐节目集及所属类型01矩阵" sheet

--------------------------------------------------------------------------------
/RecommenderSystem/items_saw_labels_to_01matrix.py:
--------------------------------------------------------------------------------
# Description:
# Generates "D:\Recommender Systems\所有用户看过的节目及所属类型的01矩阵.xlsx" (0/1 genre matrix of all watched programs)
# from "D:\Recommender Systems\用户A/B/C对于其三个月来所看过节目的评分.xls" (per-user rating sheets)
import pandas as pd
import numpy as np

if __name__ == '__main__':

    all_users_names = ['A', 'B', 'C']

    # Names of all programs watched by any user:  all_items_users_saw = [item2, item3, item4]
    # Genres of those programs, in the same order: all_items_users_saw_labels = ["label2 label3", "label3", ...]
    all_items_users_saw = []
    all_items_users_saw_labels = []

    for j in range(len(all_users_names)):

        fileToBeRead = r"D:\Recommender Systems\用户" + all_users_names[j] + "对于其三个月来所看过节目的评分.xls"
        df = pd.read_excel(fileToBeRead)
        (m, n) = df.shape
        data_array = np.array(df)

        for i in range(m):
            # Record each program only once
            if data_array[i][2] not in all_items_users_saw:
                all_items_users_saw.append(data_array[i][2])
                all_items_users_saw_labels.append(data_array[i][3])

    # Build the "所有用户看过的节目及所属类型的01矩阵" 0/1 matrix
    all_labels = ['教育', '戏曲', '悬疑', '科幻', '惊悚', '动作', '资讯', '武侠', '剧情', '警匪', '生活', '军事', '言情', '体育', '冒险', '纪实',
                  '少儿教育', '少儿', '综艺', '古装', '搞笑', '广告']
    labels_num = len(all_labels)

    all_items_labels_01_vectors = []

    for i in range(len(all_items_users_saw)):
        vector = [0] * labels_num
        labels_names = all_items_users_saw_labels[i].split(" ")

        for j in range(len(labels_names)):
            location = all_labels.index(labels_names[j])
            vector[location] = 1

        all_items_labels_01_vectors.append(vector)

    df = pd.DataFrame(all_items_labels_01_vectors, index=all_items_users_saw, columns=all_labels)
    df.to_excel(r"D:\Recommender Systems\所有用户看过的节目及所属类型的01矩阵.xlsx")

    # PS: remember to type "节目名" (program name) into the first blank cell of the
    # program-name column in the generated "所有用户看过的节目及所属类型的01矩阵" sheet

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## RecommenderSystem

[![RecommenderSystem](https://img.shields.io/badge/RecommenderSystem-v1.0.0-brightgreen.svg)](https://github.com/Yuziquan/RecommenderSystem)
[![license](https://img.shields.io/packagist/l/doctrine/orm.svg)](https://github.com/Yuziquan/RecommenderSystem/blob/master/LICENSE)

> For details, see: https://blog.csdn.net/WuchangI/article/details/80160566

### I. Features

Implements a recommender system for internet TV programs: based on each user's viewing history and the program metadata, it produces personalized program recommendations for every user. The core technique is late fusion of user-based collaborative filtering and content-based recommendation.
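
Late fusion here means that each algorithm produces its own ranked list and the final list takes a fixed share from each. The following is a minimal sketch of that idea rather than the repository's exact code: the function name `late_fusion` and the sample scores are hypothetical, and the 0.7 share for the content-based list mirrors the weight used in `CB_Mixture_userCF.py`.

```python
def late_fusion(cb_items, cf_items, top_n=5, w_cb=0.7):
    """Merge two ranked [name, score] lists, taking a fixed share from each.

    Both inputs are assumed to be sorted by score in descending order.
    """
    n_cb = int(w_cb * top_n)   # slots reserved for content-based results
    n_cf = top_n - n_cb        # remaining slots go to user-based CF
    merged = cb_items[:n_cb] + cf_items[:n_cf]
    # Re-rank the merged list by score. The two algorithms score on
    # different scales, so this ordering is only a rough heuristic.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores, for illustration only
cb = [["p1", 0.9], ["p2", 0.8], ["p3", 0.4], ["p4", 0.1]]
cf = [["p5", 0.6], ["p6", 0.5], ["p7", 0.2]]
fused = late_fusion(cb, cf)
for name, score in fused:
    print(name, score)
```

With `top_n=5` and `w_cb=0.7`, three items come from the content-based list and two from the collaborative-filtering list. A program that appears in both lists would be listed twice in this sketch; deduplication is left out for brevity.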

***

### II. Results

#### 1. Ratings given by user A (B, C) to the programs watched over the past three months

![1](https://github.com/Yuziquan/RecommenderSystem/blob/master/Screenshots/%E7%94%A8%E6%88%B7A(B%E3%80%81C)%E5%AF%B9%E4%BA%8E%E5%85%B6%E4%B8%89%E4%B8%AA%E6%9C%88%E6%9D%A5%E6%89%80%E7%9C%8B%E8%BF%87%E8%8A%82%E7%9B%AE%E7%9A%84%E8%AF%84%E5%88%86.png)

#### 2. Candidate programs for recommendation and their genres

![2](https://github.com/Yuziquan/RecommenderSystem/blob/master/Screenshots/%E5%A4%87%E9%80%89%E6%8E%A8%E8%8D%90%E8%8A%82%E7%9B%AE%E9%9B%86%E5%8F%8A%E6%89%80%E5%B1%9E%E7%B1%BB%E5%9E%8B.png)

#### 3. 0/1 matrix of the candidate programs and their genres

![3](https://github.com/Yuziquan/RecommenderSystem/blob/master/Screenshots/%E5%A4%87%E9%80%89%E6%8E%A8%E8%8D%90%E8%8A%82%E7%9B%AE%E9%9B%86%E5%8F%8A%E6%89%80%E5%B1%9E%E7%B1%BB%E5%9E%8B01%E7%9F%A9%E9%98%B5.png)
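
A 0/1 genre matrix of this kind can be built by splitting each program's space-separated genre string and setting the matching columns to 1. The sketch below assumes a shortened label list and three made-up programs; the helper name `to_multi_hot` is hypothetical, and `items_labels_to_01matrix.py` applies the same idea to the Excel sheet.

```python
def to_multi_hot(label_string, all_labels):
    """Return a 0/1 list marking which labels occur in a space-separated string."""
    present = set(label_string.split())
    unknown = present - set(all_labels)
    if unknown:
        # Fail loudly instead of silently dropping a misspelled genre
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in all_labels]


# Shortened label list: drama, sci-fi, action, children (illustrative subset)
all_labels = ["剧情", "科幻", "动作", "少儿"]
# Made-up programs, not the repository's data
programs = {"program_a": "剧情 科幻", "program_b": "动作", "program_c": "少儿 剧情"}

matrix = {name: to_multi_hot(labels, all_labels) for name, labels in programs.items()}
for name, row in matrix.items():
    print(name, row)
```

When the rows are wrapped in a pandas `DataFrame`, setting `df.index.name` before calling `to_excel` should also write the header cell for the program-name column, which the scripts otherwise ask you to fill in by hand.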

#### 4. Rating matrix of all users for the programs they watched

![4](https://github.com/Yuziquan/RecommenderSystem/blob/master/Screenshots/%E6%89%80%E6%9C%89%E7%94%A8%E6%88%B7%E5%AF%B9%E5%85%B6%E7%9C%8B%E8%BF%87%E7%9A%84%E8%8A%82%E7%9B%AE%E7%9A%84%E8%AF%84%E5%88%86%E7%9F%A9%E9%98%B5.png)
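
The rating matrix has one row per user and one column per program, and a rating counts as "watched" only when it is positive. For collaborative filtering it is convenient to turn it into two lookups: user to rated programs, and program to the users who watched it. A minimal sketch with made-up ratings follows; the names `ratings`, `user_to_items`, and `item_to_users` are illustrative and not taken from the repository.

```python
# Made-up ratings; 0.0 means the user did not watch the program
ratings = {
    "A": {"p1": 0.18, "p2": 0.0, "p3": 0.56},
    "B": {"p1": 0.0, "p2": 0.30, "p3": 0.32},
}

# user -> [[program, rating], ...] for watched programs only
user_to_items = {
    user: [[item, score] for item, score in row.items() if score > 0]
    for user, row in ratings.items()
}

# program -> [users who watched it] (an inverted index)
item_to_users = {}
for user, row in ratings.items():
    for item, score in row.items():
        if score > 0:
            item_to_users.setdefault(item, []).append(user)

print(user_to_items)
print(item_to_users)
```

The inverted index makes it cheap to find a user's neighbours: only users who share at least one watched program need to be compared.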

#### 5. 0/1 matrix of all watched programs and their genres

![5](https://github.com/Yuziquan/RecommenderSystem/blob/master/Screenshots/%E6%89%80%E6%9C%89%E7%94%A8%E6%88%B7%E7%9C%8B%E8%BF%87%E7%9A%84%E8%8A%82%E7%9B%AE%E5%8F%8A%E6%89%80%E5%B1%9E%E7%B1%BB%E5%9E%8B%E7%9A%8401%E7%9F%A9%E9%98%B5.png)

#### 6. Recommendations from the content-based algorithm (CB)

![6](https://github.com/Yuziquan/RecommenderSystem/blob/master/Screenshots/CB_result.png)
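
The content-based score is the cosine similarity between a user's genre-preference vector and a program's 0/1 genre vector. The sketch below illustrates that ranking step under assumed values: the profile numbers and program names are invented, and the vectors are plain lists rather than the label-keyed dictionaries used in `CB.py`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented user profile over three genres; negative means "rated below average"
user_profile = [0.5, -0.2, 0.0]
# Invented candidate programs with 0/1 genre vectors
candidates = {"p1": [1, 0, 0], "p2": [0, 1, 0], "p3": [1, 1, 0]}

ranked = sorted(
    ((name, cosine(user_profile, vec)) for name, vec in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(name, round(score, 3))
```

Programs whose genres line up with the user's positive preferences rank first, while a program in a disliked genre gets a negative score.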

#### 7. Recommendations from user-based collaborative filtering (UserCF)

![7](https://github.com/Yuziquan/RecommenderSystem/blob/master/Screenshots/UserCF_result.png)
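
User-based CF starts by scoring how similar two users are. The sketch below follows the mean-centred cosine used in `UserCF.py`: each user's mean is taken over all programs that user rated, while the sums run only over programs both users rated. The function name `centred_similarity` and the ratings are invented for illustration.

```python
import math


def centred_similarity(r1, r2):
    """r1, r2: dict mapping program -> rating, containing watched programs only."""
    if not r1 or not r2:
        return 0.0
    mean1 = sum(r1.values()) / len(r1)
    mean2 = sum(r2.values()) / len(r2)
    common = r1.keys() & r2.keys()  # programs watched by both users
    sxy = sum((r1[i] - mean1) * (r2[i] - mean2) for i in common)
    sx = sum((r1[i] - mean1) ** 2 for i in common)
    sy = sum((r2[i] - mean2) ** 2 for i in common)
    if sx == 0 or sy == 0:
        return 0.0  # no overlap, or no variation on the shared programs
    return sxy / math.sqrt(sx * sy)


# Invented ratings
a = {"p1": 3.0, "p2": 1.0, "p3": 2.0}
b = {"p1": 4.0, "p2": 2.0, "p4": 3.0}
print(centred_similarity(a, b))
```

Both users rate `p1` above their own average and `p2` below it, so the similarity comes out at its maximum. A neighbour's similarity is then added to the score of every candidate program that neighbour watched.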

#### 8. Hybrid recommendations: the CB and UserCF recommendation sets mixed in a fixed ratio (late fusion)

![8](https://github.com/Yuziquan/RecommenderSystem/blob/master/Screenshots/CB_Mixture_userCF_result.png)

--------------------------------------------------------------------------------
/RecommenderSystem/CB_Mixture_userCF.py:
--------------------------------------------------------------------------------
# Assumes this script is run from the RecommenderSystem directory, alongside CB.py and UserCF.py
import numpy as np
import pandas as pd

from CB import createItemsProfiles, createUsersProfiles, contentBased
from UserCF import createUsersDict, createItemsDict, userCF


# Print the programs recommended to a user
# max_num: maximum number of recommendations to print
def printRecommendItems(recommend_items_sorted, max_num):
    count = 0
    for item, degree in recommend_items_sorted:
        print("Program: %s, recommendation score: %f" % (item, degree))
        count += 1
        if count == max_num:
            break

if __name__ == '__main__':

    all_users_names = ['A', 'B', 'C']
    all_labels = ['教育', '戏曲', '悬疑', '科幻', '惊悚', '动作', '资讯', '武侠', '剧情', '警匪', '生活', '军事', '言情', '体育', '冒险', '纪实',
                  '少儿教育', '少儿', '综艺', '古装', '搞笑', '广告']
    labels_num = len(all_labels)

    df1 = pd.read_excel(r"D:\Recommender Systems\所有用户对其看过的节目的评分矩阵.xlsx")
    (m1, n1) = df1.shape
    # Rating matrix of all users for the programs they watched
    # data_array1 = [[0.1804 0.042 0.11 0.07 0.19 0.56 0.14 0.3 0.32 0, ...], [...]]
    data_array1 = np.array(df1.iloc[:m1 + 1, 1:])
    # Names of all watched programs, in the column order of the rating matrix
    items_users_saw_names1 = df1.columns[1:].tolist()

    # users_dict = {user1: [['program1', 3.2], ['program4', 0.2], ['program8', 6.5]], user2: ... }
    users_dict = createUsersDict(df1)
    # items_dict = {program1: [user1, user3], program2: [...], ... }
    items_dict = createItemsDict(df1)

    df2 = pd.read_excel(r"D:\Recommender Systems\所有用户看过的节目及所属类型的01矩阵.xlsx")
    (m2, n2) = df2.shape
    data_array2 = np.array(df2.iloc[:m2 + 1, 1:])
    # Names of all watched programs, in the row order of the watched-programs 0/1 matrix
    items_users_saw_names2 = np.array(df2.iloc[:m2 + 1, 0]).tolist()

    # Build program profiles for the watched programs
    items_users_saw_profiles = createItemsProfiles(data_array2, all_labels, items_users_saw_names2)

    # Build the user profiles (users_profiles) and each user's list of watched programs (items_users_saw)
    (users_profiles, items_users_saw) = createUsersProfiles(data_array1, all_users_names, items_users_saw_names1,
                                                            all_labels, items_users_saw_profiles)

    df3 = pd.read_excel(r"D:\Recommender Systems\备选推荐节目集及所属类型01矩阵.xlsx")
    (m3, n3) = df3.shape
    data_array3 = np.array(df3.iloc[:m3 + 1, 1:])
    # Names of the candidate programs, in the row order of the candidate 0/1 matrix
    items_to_be_recommended_names = np.array(df3.iloc[:m3 + 1, 0]).tolist()

    # Build program profiles for the candidate programs
    items_to_be_recommended_profiles = createItemsProfiles(data_array3, all_labels, items_to_be_recommended_names)


    # Late fusion: the two recommendation sets that the two algorithms produce for a user
    # are mixed in a fixed ratio to give the final recommendations for that user

    # Each user gets topN recommendations: a share w1 comes from the CB set and a share w2 from the userCF set
    # w1 + w2 = 1, so w1 * topN + w2 * topN = topN

    topN = 5

    w1 = 0.7
    w2 = 0.3

    # Take the first topW1 items from the CB recommendation set
    topW1 = int(w1 * topN)

    # Take the first topW2 items from the userCF recommendation set
    topW2 = topN-topW1

    for user in all_users_names:

        # Final hybrid recommendation list for this user
        recommend_items = []

        # CB
        # recommend_items1 = [[program name, similarity between the program and this user's profile], ...]
        recommend_items1 = contentBased(users_profiles[user], items_to_be_recommended_profiles, items_to_be_recommended_names, all_labels, items_users_saw[user])
        len1 = len(recommend_items1)

        if len1 <= topW1:
            recommend_items = recommend_items + recommend_items1
        else:
            recommend_items = recommend_items + recommend_items1[:topW1]


        # userCF
        # recommend_items2 = [[program name, this user's interest in the program], ...]
        recommend_items2 = userCF(user, users_dict, items_dict, 2, items_to_be_recommended_names)
        len2 = len(recommend_items2)

        if len2 <= topW2:
            recommend_items = recommend_items + recommend_items2
        else:
            recommend_items = recommend_items + recommend_items2[:topW2]

        # Sort the result by recommendation score, descending
        recommend_items.sort(key=lambda item: item[1], reverse=True)

        print("Recommended programs for user %s:" % user)
        printRecommendItems(recommend_items, topN)
        print()
--------------------------------------------------------------------------------
/RecommenderSystem/UserCF.py:
--------------------------------------------------------------------------------
# Description:
# Implementation of the user-based collaborative filtering algorithm

import math
import numpy as np
import pandas as pd

# Similarity between two users: cosine similarity adjusted with the Pearson correlation coefficient
# (each rating is centred on the user's mean rating)
# sim(user1, user2) = sigma_xy / sqrt(sigma_x * sigma_y)
# user1 and user2 are lists of [program name, implicit rating] pairs,
# e.g. user1 = [['program1', 3.2], ['program4', 0.2], ['program8', 6.5], ...]

def calCosDistByPearson(user1, user2):
    x = 0.0
    y = 0.0

    sigma_xy = 0.0
    sigma_x = 0.0
    sigma_y = 0.0

    for item in user1:
        x += item[1]

    # Mean rating of user1 over all programs user1 watched
    average_x = x / len(user1)

    for item in user2:
        y += item[1]

    # Mean rating of user2 over all programs user2 watched
    average_y = y / len(user2)

    for item1 in user1:
        for item2 in user2:
            if item1[0] == item2[0]:  # only programs watched by both users are counted
                sigma_xy += (item1[1] - average_x) * (item2[1] - average_y)
                sigma_x += (item1[1] - average_x) * (item1[1] - average_x)
                sigma_y += (item2[1] - average_y) * (item2[1] - average_y)

    if sigma_x == 0.0 or sigma_y == 0.0:  # zero denominator: similarity is 0
        return 0

    return sigma_xy/math.sqrt(sigma_x * sigma_y)


# Build the viewing record of every user (with implicit ratings): "user to programs"
# Example: users_to_items = {user1: [['program1', 3.2], ['program4', 0.2], ['program8', 6.5]], user2: ... }
def createUsersDict(df):

    (m, n) = df.shape
    data_array = np.array(df.iloc[:m + 1, 1:])
    users_names = np.array(df.iloc[:m + 1, 0]).tolist()
    items_names = np.array(df.columns)[1:]

    users_to_items = {}

    for i in range(len(users_names)):
        user_and_scores_list = []
        for j in range(len(items_names)):
            if data_array[i][j] > 0:
                user_and_scores_list.append([items_names[j], data_array[i][j]])
        users_to_items[users_names[i]] = user_and_scores_list

    return users_to_items

# Build the dictionary of which users watched each program, i.e. the inverted "program to users" table
# items_to_users = {program1: [user1, user3], program2: ... }
def createItemsDict(df):

    (m, n) = df.shape
    data_array = np.array(df.iloc[:m + 1, 1:])
    users_names = np.array(df.iloc[:m + 1, 0]).tolist()
    items_names = np.array(df.columns)[1:]
    items_to_users = {}

    for i in range(len(items_names)):
        users_list = []
        for j in range(len(users_names)):
            if data_array[j][i] > 0:
                users_list.append(users_names[j])
        items_to_users[items_names[i]] = users_list

    return items_to_users


# Find all users related to user_name (the neighbours) and sort them by similarity
# neighbors_distance = [[user name, similarity], [...], ...] = [['user4', 0.78], [...], ...]
def findSimilarUsers(users_dict, items_dict, user_name):

    neighbors = []  # all users who watched at least one program that user_name also watched

    for items in users_dict[user_name]:
        for neighbor in items_dict[items[0]]:
            if neighbor != user_name and neighbor not in neighbors:
                neighbors.append(neighbor)

    # Compute the similarity between the user and each neighbour, then sort in descending order
    neighbors_distance = []
    for neighbor in neighbors:
        distance = calCosDistByPearson(users_dict[user_name], users_dict[neighbor])
        neighbors_distance.append([neighbor, distance])

    neighbors_distance.sort(key=lambda item: item[1], reverse=True)

    return neighbors_distance


# User-based collaborative filtering
# K is the number of neighbours, an important parameter to tune
def userCF(user_name, users_dict, items_dict, K, all_items_names_to_be_recommend):

    # recommend_items = {program name: summed similarity between user_name and the neighbours who watched it, ...}
    recommend_items = {}
    # recommend_items converted to a list and sorted:
    # recommend_items_sorted = [[program1, the user's interest in program1], [...], ...]
    recommend_items_sorted = []

    # Programs that user_name has watched
    items_user_saw = []
    for item in users_dict[user_name]:
        items_user_saw.append(item[0])

    # Find the K users (neighbours) most similar to this user
    similar_users = findSimilarUsers(users_dict, items_dict, user_name)
    if len(similar_users) < K:
        k_similar_user = similar_users
    else:
        k_similar_user = similar_users[:K]

    # Build the recommendation set for this user
    for user in k_similar_user:
        for item in users_dict[user[0]]:
            # Only programs that user_name has not watched can be recommended
            if item[0] not in items_user_saw:
                # The program must also be in the candidate set
                if item[0] in all_items_names_to_be_recommend:
                    if item[0] not in recommend_items:
                        # First neighbour who watched this program: start the score with that
                        # neighbour's similarity to the user. Later neighbours who also watched it
                        # add their similarity, giving the user's interest in the program.
                        recommend_items[item[0]] = user[1]

                    else:
                        # If k neighbours watched a program, the user's interest in it is
                        # the sum of those k neighbours' similarities to the user
                        recommend_items[item[0]] += user[1]

    for key in recommend_items:
        recommend_items_sorted.append([key, recommend_items[key]])

    # Sort the recommendation set by the user's interest, descending
    recommend_items_sorted.sort(key=lambda item: item[1], reverse=True)

    return recommend_items_sorted

# Print the programs recommended to a user
# max_num: maximum number of recommendations to print
def printRecommendItems(recommend_items_sorted, max_num):
    count = 0
    for item, degree in recommend_items_sorted:
        print("Program: %s, recommendation score: %f" % (item, degree))
        count += 1
        if count == max_num:
            break

# Main program
if __name__ == '__main__':

    all_users_names = ['A', 'B', 'C']

    df1 = pd.read_excel(r"D:\Recommender Systems\备选推荐节目集及所属类型01矩阵.xlsx")
    (m1, n1) = df1.shape
    # Names of the candidate programs, in the row order of the candidate 0/1 matrix
    items_to_be_recommended_names = np.array(df1.iloc[:m1 + 1, 0]).tolist()

    df2 = pd.read_excel(r"D:\Recommender Systems\所有用户对其看过的节目的评分矩阵.xlsx")

    # users_dict = {user1: [['program1', 3.2], ['program4', 0.2], ['program8', 6.5]], user2: ... }
    users_dict = createUsersDict(df2)
    # items_dict = {program1: [user1, user3], program2: [...], ... }
    items_dict = createItemsDict(df2)

    for user in all_users_names:
        print("Recommended programs for user %s:" % user)
        recommend_items = userCF(user, users_dict, items_dict, 2, items_to_be_recommended_names)
        printRecommendItems(recommend_items, 3)
        print()

--------------------------------------------------------------------------------
/RecommenderSystem/CB.py:
--------------------------------------------------------------------------------
# Description:
# Implementation of the content-based recommendation algorithm

import math
import numpy as np
import pandas as pd

# Build program profiles
# Result format:
# items_profiles = {item1:{'label1':1, 'label2': 0, 'label3': 0, ...}, item2:{...}...}
def createItemsProfiles(data_array, labels_names, items_names):

    items_profiles = {}

    for i in range(len(items_names)):

        items_profiles[items_names[i]] = {}

        for j in range(len(labels_names)):
            items_profiles[items_names[i]][labels_names[j]] = data_array[i][j]

    return items_profiles

# Build user profiles
# Parameters and result:
# data_array: rating matrix of all users for the programs they watched, data_array = [[2, 0, 0, 1.1, ...], [0, 0, 1.1, ...], ...]
# users_profiles = {user1:{'label1':1.1, 'label2': 0.5, 'label3': 0.0, ...}, user2:{...}...}
def createUsersProfiles(data_array, users_names, items_names, labels_names, items_profiles):

    users_profiles = {}

    # Mean implicit rating of each user over all programs that user watched
    # users_average_scores_list = [1.2, 2.2, 4.3,...]
    users_average_scores_list = []

    # Programs each user watched (without ratings)
    # items_users_saw = {user1:[item1, item3, item5], user2:[...],...}
    items_users_saw = {}

    # Programs each user watched, with ratings
    # items_users_saw_scores = {user1:[[item1, 1.1], [item2, 4.1]], user2:...}
    items_users_saw_scores = {}

    for i in range(len(users_names)):

        items_users_saw_scores[users_names[i]] = []
        items_users_saw[users_names[i]] = []
        count = 0
        total = 0.0

        for j in range(len(items_names)):

            # A positive implicit rating means the user actually watched the program
            if data_array[i][j] > 0:
                items_users_saw[users_names[i]].append(items_names[j])
                items_users_saw_scores[users_names[i]].append([items_names[j], data_array[i][j]])
                count += 1
                total += data_array[i][j]

        if count == 0:
            users_average_scores_list.append(0)
        else:
            users_average_scores_list.append(total / count)

    for i in range(len(users_names)):

        users_profiles[users_names[i]] = {}

        for j in range(len(labels_names)):
            count = 0
            score = 0.0

            for item in items_users_saw_scores[users_names[i]]:

                # Notation:
                # user1_score_to_label1: implicit rating of user1 for the genre label1
                # score_to_item i: rating of user1 for a watched program i that carries label1
                # user1_average_score: mean rating of user1 over all programs user1 watched
                # items_count: number of programs user1 watched that carry label1

                # Formula: user1_score_to_label1 = Sigma(score_to_item i - user1_average_score)/items_count

                # The program carries the label labels_names[j]
                if items_profiles[item[0]][labels_names[j]] > 0:
                    score += (item[1] - users_average_scores_list[i])
                    count += 1

            # Treat very small values as 0
            if abs(score) < 1e-6:
                score = 0.0
            if count == 0:
                result = 0.0
            else:
                result = score / count

            users_profiles[users_names[i]][labels_names[j]] = result

    return (users_profiles, items_users_saw)


# Similarity between a user-profile vector and a program-profile vector
# Vector similarity formula:
# cos(user, item) = sigma_ui/sqrt(sigma_u * sigma_i)

# Parameters:
# user: profile of one user, user = {'label1':1.1, 'label2': 0.5, 'label3': 0.0, ...}
# item: profile of one program, item = {'label1':1, 'label2': 0, 'label3': 0, ...}
# labels_names: all genre names
def calCosDistance(user, item, labels_names):

    sigma_ui = 0.0
    sigma_u = 0.0
    sigma_i = 0.0

    for label in labels_names:
        sigma_ui += user[label] * item[label]
        sigma_u += (user[label] * user[label])
        sigma_i += (item[label] * item[label])

    if sigma_u == 0.0 or sigma_i == 0.0:  # zero denominator: similarity is 0
        return 0

    return sigma_ui/math.sqrt(sigma_u * sigma_i)


# Content-based recommendation:
# Given the profile of one user (user_profile) and the profiles of the candidate programs (items_profiles),
# compute the similarity between the vectors to obtain the recommendation set

# Parameters:
# user_profile: profile of one user, user_profile = {'label1':1.1, 'label2': 0.5, 'label3': 0.0, ...}
# items_profiles: profiles of the candidate programs, items_profiles = {item1:{'label1':1, 'label2': 0, 'label3': 0}, item2:{...}...}
# items_names: names of all candidate programs
# labels_names: all genre names
# items_user_saw: programs the user has watched

def contentBased(user_profile, items_profiles, items_names, labels_names, items_user_saw):

    # Recommendation set for the user: recommend_items = [[program name, similarity between program profile and user profile], ...]
    recommend_items = []

    for i in range(len(items_names)):
        # Only consider candidate programs the user has not watched
        if items_names[i] not in items_user_saw:
            recommend_items.append([items_names[i], calCosDistance(user_profile, items_profiles[items_names[i]], labels_names)])

    # Sort the recommendation set by similarity, descending
    recommend_items.sort(key=lambda item: item[1], reverse=True)

    return recommend_items

# Print the programs recommended to a user
# max_num: maximum number of recommendations to print
def printRecommendedItems(recommend_items_sorted, max_num):
    count = 0
    for item, degree in recommend_items_sorted:
        print("Program: %s, recommendation score: %f" % (item, degree))
        count += 1
        if count == max_num:
            break


# Main program
if __name__ == '__main__':

    all_users_names = ['A', 'B', 'C']
    all_labels = ['教育', '戏曲', '悬疑', '科幻', '惊悚', '动作', '资讯', '武侠', '剧情', '警匪', '生活', '军事', '言情', '体育', '冒险', '纪实',
                  '少儿教育', '少儿', '综艺', '古装', '搞笑', '广告']
    labels_num = len(all_labels)

    df1 = pd.read_excel(r"D:\Recommender Systems\所有用户对其看过的节目的评分矩阵.xlsx")
    (m1, n1) = df1.shape
    # Rating matrix of all users for the programs they watched
    # data_array1 = [[0.1804 0.042 0.11 0.07 0.19 0.56 0.14 0.3 0.32 0, ...], [...]]
    data_array1 = np.array(df1.iloc[:m1 + 1, 1:])
    # Names of all watched programs, in the column order of the rating matrix
    items_users_saw_names1 = df1.columns[1:].tolist()


    df2 = pd.read_excel(r"D:\Recommender Systems\所有用户看过的节目及所属类型的01矩阵.xlsx")
    (m2, n2) = df2.shape
    data_array2 = np.array(df2.iloc[:m2 + 1, 1:])
    # Names of all watched programs, in the row order of the watched-programs 0/1 matrix
    items_users_saw_names2 = np.array(df2.iloc[:m2 + 1, 0]).tolist()

    # Build program profiles for the watched programs
    items_users_saw_profiles = createItemsProfiles(data_array2, all_labels, items_users_saw_names2)

    # Build the user profiles (users_profiles) and each user's list of watched programs (items_users_saw)
    (users_profiles, items_users_saw) = createUsersProfiles(data_array1, all_users_names, items_users_saw_names1, all_labels, items_users_saw_profiles)

    df3 = pd.read_excel(r"D:\Recommender Systems\备选推荐节目集及所属类型01矩阵.xlsx")
    (m3, n3) = df3.shape
    data_array3 = np.array(df3.iloc[:m3 + 1, 1:])
    # Names of the candidate programs, in the row order of the candidate 0/1 matrix
    items_to_be_recommended_names = np.array(df3.iloc[:m3 + 1, 0]).tolist()

    # Build program profiles for the candidate programs
    items_to_be_recommended_profiles = createItemsProfiles(data_array3, all_labels, items_to_be_recommended_names)

    for user in all_users_names:
        print("Recommended programs for user %s:" % user)
        recommend_items = contentBased(users_profiles[user], items_to_be_recommended_profiles, items_to_be_recommended_names, all_labels, items_users_saw[user])
        printRecommendedItems(recommend_items, 3)
        print()

--------------------------------------------------------------------------------