├── 3.png
├── README.md
└── 基于协同过滤的推荐系统
    └── 基于协同过滤算法的推荐系统.ipynb

/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ztz818/Recommender-System/HEAD/3.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Recommender-System
2 | Recommender systems built with a collaborative filtering algorithm and with TensorFlow, respectively
3 | 
4 | Part 1: A recommender system based on collaborative filtering
5 | 
6 | Collaborative filtering: use the preferences of a group of people with similar tastes and shared experience to recommend items a user is likely to find interesting. For example, suppose you and another person both like watching movies and your tastes largely overlap; if that person rates a movie highly and you have not seen it yet, the system can recommend it to you. Collaborative-filtering recommendation comes in two flavours: user-based and item-based.
7 | 
8 | Features implemented in this project:
9 | 
10 | 1. User-based collaborative filtering
11 | 
12 | 2. Item-based collaborative filtering
13 | 
14 | 3. Content-based filtering
15 | 
16 | 4. A hybrid recommender system
17 | 
18 | Example:
19 | The 10 nearest neighbors of That Thing You Do! (1996):
20 | 
21 | Sleepless in Seattle (1993)
22 | 
23 | Braveheart (1995)
24 | 
25 | Toy Story (1995)
26 | 
27 | Batman (1989)
28 | 
29 | Groundhog Day (1993)
30 | 
31 | Pretty Woman (1990)
32 | 
33 | Star Trek: First Contact (1996)
34 | 
35 | While You Were Sleeping (1995)
36 | 
37 | Sabrina (1995)
38 | 
39 | Grosse Pointe Blank (1997)
40 | 
41 | See the code in this repository for the concrete implementation.
42 | 
43 | Part 2: A personalized recommender system based on TensorFlow
44 | 
45 | This part uses a text convolutional neural network together with the MovieLens dataset to perform the movie recommendation task.
46 | 
47 | 
48 | Recommendation features implemented:
49 | 
50 | 1. Predict the rating a specified user would give a specified movie
51 | 
52 | 2. Recommend movies of the same type
53 | 
54 | 3. Recommend movies a given user is likely to enjoy
55 | 
56 | 4. Which other movies did people who watched this movie also watch (like)
57 | 
58 | 
59 | The dataset used in this project:
60 | 
61 | This part uses the MovieLens 1M dataset, which contains about 1 million ratings from 6,000 users on nearly 4,000 movies.
62 | 
63 | The dataset consists of three files: user data users.dat, movie data movies.dat, and rating data ratings.dat.
64 | 
65 | The system architecture is shown below.
66 | 
67 | ![](https://github.com/chengstone/movie_recommender/blob/master/assets/model.001.jpeg)
68 | 
69 | The goal is to learn user features and movie features that are then used by the recommendation functions. Once these two feature vectors are available, any method can be chosen to fit the rating, since this is essentially a regression problem. Two approaches are used here. The first, drawn in the figure above, takes the dot product of the two feature vectors and regresses the result against the true rating, optimizing an MSE loss. The second feeds the two feature vectors into a further fully connected layer that outputs a single value, which is again regressed to the true rating with an MSE loss.
70 | In practice, after 5 training epochs the MSE loss of the second approach sits around 0.8, while the first approach sits around 1.
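
For reference, a minimal sketch of these two rating-fitting options (not the repository's actual code; layer sizes are illustrative and tf.keras is assumed):

```python
import tensorflow as tf

# Assume 200-dimensional user and movie feature vectors (sizes are illustrative).
user_feat = tf.keras.Input(shape=(200,), name="user_feature")
movie_feat = tf.keras.Input(shape=(200,), name="movie_feature")

# Option 1: dot product of the two feature vectors, regressed onto the true rating.
pred_dot = tf.keras.layers.Dot(axes=1)([user_feat, movie_feat])

# Option 2: concatenate the two features, pass them through a fully connected layer,
# and regress the single output value onto the true rating.
hidden = tf.keras.layers.Dense(64, activation="relu")(
    tf.keras.layers.Concatenate()([user_feat, movie_feat]))
pred_fc = tf.keras.layers.Dense(1)(hidden)

model = tf.keras.Model([user_feat, movie_feat], pred_fc)   # or pred_dot for option 1
model.compile(optimizer="adam", loss="mse")                 # MSE loss in both cases
```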
71 | 
72 | 
73 | Text convolutional network:
74 | ![](https://github.com/ztz818/Recommender-System/blob/master/3.png)
75 | 
76 | The first layer of the network is a word-embedding layer: an embedding matrix made up of the embedding vector of each word. The next layer convolves kernels of several different sizes (window sizes) over the embedding matrix, where the window size is the number of words each convolution covers. This differs from convolution on images: image kernels are usually 2x2, 3x3 or 5x5, whereas a text convolution kernel must span the whole embedding vector of a word, so its size is (number of words, embedding dimension), for example sliding over 3, 4 or 5 words at a time. The third layer is max pooling, which yields a single long vector, and finally dropout is applied for regularization, giving the feature vector of the movie title.
77 | 
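As a rough sketch of that title network (illustrative vocabulary size, title length and filter counts; tf.keras is assumed here rather than the project's original TensorFlow code):

```python
import tensorflow as tf

vocab_size, embed_dim, title_len = 5000, 32, 15                      # illustrative sizes

title = tf.keras.Input(shape=(title_len,), dtype="int32")
embedded = tf.keras.layers.Embedding(vocab_size, embed_dim)(title)   # word-embedding layer

# Convolve kernels with several window sizes over the embedding matrix; a 1-D convolution
# spans the full embedding dimension, i.e. a (window, embed_dim) kernel sliding over words.
pooled = []
for window in (3, 4, 5):
    conv = tf.keras.layers.Conv1D(filters=8, kernel_size=window, activation="relu")(embedded)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))        # max pooling, one value per filter

title_feature = tf.keras.layers.Dropout(0.5)(tf.keras.layers.Concatenate()(pooled))  # dropout regularization
```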
78 | Training loss:
79 | 
80 | ![](https://github.com/chengstone/movie_recommender/blob/master/assets/loss.png)
81 | 
82 | Start recommending movies
83 | 
84 | The generated user feature matrix and movie feature matrix are used to make the movie recommendations.
85 | 
86 | Recommending movies of the same type
87 | 
88 | The idea is to compute the cosine similarity between the feature vector of the movie currently being watched and the whole movie feature matrix, and take the top_k most similar movies. Some random selection is mixed in so that the recommendations differ slightly each time.
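
A hedged NumPy sketch of that lookup (the matrix and function names are placeholders, not the project's actual code):

```python
import numpy as np

def similar_movies(movie_idx, movie_matrix, top_k=20, pick=5):
    """Cosine similarity between one movie's feature vector and every movie feature vector."""
    vec = movie_matrix[movie_idx]
    sims = movie_matrix @ vec / (np.linalg.norm(movie_matrix, axis=1) * np.linalg.norm(vec) + 1e-8)
    candidates = np.argsort(-sims)[1:top_k + 1]                   # most similar, skipping the movie itself
    return np.random.choice(candidates, pick, replace=False)      # random pick varies the output each time
```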
89 | 
90 | Recommending movies you may like
91 | The idea is to use the user's feature vector together with the movie feature matrix to compute a predicted rating for every movie, and take the top_k highest-rated ones, again with some random selection mixed in.
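
Again as an illustrative NumPy sketch with placeholder names:

```python
import numpy as np

def recommend_for_user(user_idx, user_matrix, movie_matrix, top_k=20, pick=5):
    """Predicted ratings = the user's feature vector dotted with every movie feature vector."""
    scores = movie_matrix @ user_matrix[user_idx]
    candidates = np.argsort(-scores)[:top_k]                      # top_k highest predicted ratings
    return np.random.choice(candidates, pick, replace=False)      # keep a random subset
```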
92 | 
93 | Which other movies did people who watched this movie also watch (like)?
94 | First select the top_k users who like the given movie and take their user feature vectors.
95 | Then compute these users' predicted ratings for all movies.

96 | Pick the movie each of these users rates highest as a recommendation.
97 | Random selection is again mixed in.
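
Put together as one hypothetical NumPy routine (illustrative only, not the repository's code):

```python
import numpy as np

def also_watched(movie_idx, user_matrix, movie_matrix, top_k=20, pick=5):
    """People who like this movie -> the movie each of them rates highest."""
    fans = np.argsort(-(user_matrix @ movie_matrix[movie_idx]))[:top_k]   # top_k users fond of this movie
    ratings = user_matrix[fans] @ movie_matrix.T                          # their predicted ratings for all movies
    favourites = np.unique(np.argmax(ratings, axis=1))                    # each fan's highest-rated movie
    return np.random.choice(favourites, min(pick, len(favourites)), replace=False)  # random subset
```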
98 | 99 | 以上就是实现的常用的推荐功能,将网络模型作为回归问题进行训练,得到训练好的用户特征矩阵和电影特征矩阵进行推荐。 100 | 101 | 102 | 由于网络原因,数据集暂不上传,需要的请联系513617866@qq.com 103 | -------------------------------------------------------------------------------- /基于协同过滤的推荐系统/基于协同过滤算法的推荐系统.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import os\n", 10 | "import io\n", 11 | "from surprise import KNNBaseline\n", 12 | "from surprise import Dataset\n", 13 | "\n" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 8, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "import logging\n", 23 | "\n", 24 | "logging.basicConfig(level=logging.INFO,\n", 25 | " format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',\n", 26 | " datefmt='%a, %d %b %Y %H:%M:%S')" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 12, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "name": "stderr", 36 | "output_type": "stream", 37 | "text": [ 38 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\evaluate.py:66: UserWarning: The evaluate() method is deprecated. Please use model_selection.cross_validate() instead.\n", 39 | " 'model_selection.cross_validate() instead.', UserWarning)\n", 40 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\dataset.py:193: UserWarning: Using data.split() or using load_from_folds() without using a CV iterator is now deprecated. \n", 41 | " UserWarning)\n" 42 | ] 43 | }, 44 | { 45 | "name": "stdout", 46 | "output_type": "stream", 47 | "text": [ 48 | "Evaluating RMSE, MAE of algorithm SVD.\n", 49 | "\n", 50 | "------------\n", 51 | "Fold 1\n", 52 | "RMSE: 0.9478\n", 53 | "MAE: 0.7475\n", 54 | "------------\n", 55 | "Fold 2\n", 56 | "RMSE: 0.9430\n", 57 | "MAE: 0.7445\n", 58 | "------------\n", 59 | "Fold 3\n", 60 | "RMSE: 0.9422\n", 61 | "MAE: 0.7430\n", 62 | "------------\n", 63 | "------------\n", 64 | "Mean RMSE: 0.9444\n", 65 | "Mean MAE : 0.7450\n", 66 | "------------\n", 67 | "------------\n", 68 | " Fold 1 Fold 2 Fold 3 Mean \n", 69 | "RMSE 0.9478 0.9430 0.9422 0.9444 \n", 70 | "MAE 0.7475 0.7445 0.7430 0.7450 \n" 71 | ] 72 | } 73 | ], 74 | "source": [ 75 | "# 可以使用上面提到的各种推荐系统算法\n", 76 | "from surprise import SVD\n", 77 | "from surprise import Dataset\n", 78 | "from surprise import evaluate, print_perf\n", 79 | "\n", 80 | "# 默认载入movielens数据集,会提示是否下载这个数据集,这是非常经典的公开推荐系统数据集——MovieLens数据集之一\n", 81 | "data = Dataset.load_builtin('ml-100k')\n", 82 | "# k折交叉验证(k=3)\n", 83 | "data.split(n_folds=3)\n", 84 | "# 试一把SVD矩阵分解\n", 85 | "algo = SVD()\n", 86 | "# 在数据集上测试一下效果\n", 87 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 88 | "#输出结果\n", 89 | "print_perf(perf)" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "算法调参(让推荐系统有更好的效果)\n", 97 | "这里实现的算法用到的算法无外乎也是SGD等,因此也有一些超参数会影响最后的结果,我们同样可以用sklearn中常用到的网格搜索交叉验证(GridSearchCV)来选择最优的参数。简单的例子如下所示:" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 16, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "name": "stderr", 107 | "output_type": "stream", 108 | "text": [ 109 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\evaluate.py:232: UserWarning: The GridSearch() class is deprecated. 
Please use model_selection.GridSearchCV instead.\n", 110 | " 'model_selection.GridSearchCV instead.', UserWarning)\n" 111 | ] 112 | }, 113 | { 114 | "name": "stdout", 115 | "output_type": "stream", 116 | "text": [ 117 | "Running grid search for the following parameter combinations:\n", 118 | "{'n_epochs': 5, 'lr_all': 0.002, 'reg_all': 0.4}\n", 119 | "{'n_epochs': 5, 'lr_all': 0.002, 'reg_all': 0.6}\n", 120 | "{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.4}\n", 121 | "{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.6}\n", 122 | "{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.4}\n", 123 | "{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.6}\n", 124 | "{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}\n", 125 | "{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.6}\n" 126 | ] 127 | }, 128 | { 129 | "name": "stderr", 130 | "output_type": "stream", 131 | "text": [ 132 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\evaluate.py:66: UserWarning: The evaluate() method is deprecated. Please use model_selection.cross_validate() instead.\n", 133 | " 'model_selection.cross_validate() instead.', UserWarning)\n", 134 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\dataset.py:193: UserWarning: Using data.split() or using load_from_folds() without using a CV iterator is now deprecated. \n", 135 | " UserWarning)\n" 136 | ] 137 | }, 138 | { 139 | "name": "stdout", 140 | "output_type": "stream", 141 | "text": [ 142 | "Resulsts:\n", 143 | "{'n_epochs': 5, 'lr_all': 0.002, 'reg_all': 0.4}\n", 144 | "{'RMSE': 0.99757802895975123, 'FCP': 0.68344528462992526}\n", 145 | "----------\n", 146 | "{'n_epochs': 5, 'lr_all': 0.002, 'reg_all': 0.6}\n", 147 | "{'RMSE': 1.0034412513430804, 'FCP': 0.68687609151250051}\n", 148 | "----------\n", 149 | "{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.4}\n", 150 | "{'RMSE': 0.97430396591083557, 'FCP': 0.69267116521887429}\n", 151 | "----------\n", 152 | "{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.6}\n", 153 | "{'RMSE': 0.98306874054929827, 'FCP': 0.69355288039671981}\n", 154 | "----------\n", 155 | "{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.4}\n", 156 | "{'RMSE': 0.97845027003242768, 'FCP': 0.69177462351387298}\n", 157 | "----------\n", 158 | "{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.6}\n", 159 | "{'RMSE': 0.98654376832754365, 'FCP': 0.69286994997251039}\n", 160 | "----------\n", 161 | "{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}\n", 162 | "{'RMSE': 0.96436411806250799, 'FCP': 0.69717746370739053}\n", 163 | "----------\n", 164 | "{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.6}\n", 165 | "{'RMSE': 0.97412101384387262, 'FCP': 0.69751300786180559}\n", 166 | "----------\n", 167 | "0.964364118063\n", 168 | "{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}\n", 169 | "0.697513007862\n", 170 | "{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.6}\n" 171 | ] 172 | } 173 | ], 174 | "source": [ 175 | "from surprise import GridSearch\n", 176 | "# 定义好需要优选的参数网格\n", 177 | "param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],\n", 178 | " 'reg_all': [0.4, 0.6]}\n", 179 | "# 使用网格搜索交叉验证\n", 180 | "grid_search = GridSearch(SVD, param_grid, measures=['RMSE', 'FCP'])\n", 181 | "# 在数据集上找到最好的参数\n", 182 | "data = Dataset.load_builtin('ml-100k')\n", 183 | "data.split(n_folds=3)\n", 184 | "grid_search.evaluate(data)\n", 185 | "# 输出调优的参数组 \n", 186 | "# 输出最好的RMSE结果\n", 187 | "print(grid_search.best_score['RMSE'])\n", 188 | "# >>> 0.96117566386\n", 189 | "\n", 190 | "# 输出对应最好的RMSE结果的参数\n", 191 | 
"print(grid_search.best_params['RMSE'])\n", 192 | "# >>> {'reg_all': 0.4, 'lr_all': 0.005, 'n_epochs': 10}\n", 193 | "\n", 194 | "# 最好的FCP得分\n", 195 | "print(grid_search.best_score['FCP'])\n", 196 | "# >>> 0.702279736531\n", 197 | "\n", 198 | "# 对应最高FCP得分的参数\n", 199 | "print(grid_search.best_params['FCP'])\n", 200 | "# >>> {'reg_all': 0.6, 'lr_all': 0.005, 'n_epochs': 10}" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "使用不同的推荐系统算法进行建模比较" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 17, 213 | "metadata": {}, 214 | "outputs": [ 215 | { 216 | "name": "stderr", 217 | "output_type": "stream", 218 | "text": [ 219 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\evaluate.py:66: UserWarning: The evaluate() method is deprecated. Please use model_selection.cross_validate() instead.\n", 220 | " 'model_selection.cross_validate() instead.', UserWarning)\n", 221 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\dataset.py:193: UserWarning: Using data.split() or using load_from_folds() without using a CV iterator is now deprecated. \n", 222 | " UserWarning)\n" 223 | ] 224 | }, 225 | { 226 | "name": "stdout", 227 | "output_type": "stream", 228 | "text": [ 229 | "Evaluating RMSE, MAE of algorithm NormalPredictor.\n", 230 | "\n", 231 | "------------\n", 232 | "Fold 1\n", 233 | "RMSE: 1.5225\n", 234 | "MAE: 1.2223\n", 235 | "------------\n", 236 | "Fold 2\n", 237 | "RMSE: 1.5231\n", 238 | "MAE: 1.2261\n", 239 | "------------\n", 240 | "Fold 3\n", 241 | "RMSE: 1.5139\n", 242 | "MAE: 1.2174\n", 243 | "------------\n", 244 | "------------\n", 245 | "Mean RMSE: 1.5198\n", 246 | "Mean MAE : 1.2220\n", 247 | "------------\n", 248 | "------------\n", 249 | "Evaluating RMSE, MAE of algorithm BaselineOnly.\n", 250 | "\n", 251 | "------------\n", 252 | "Fold 1\n", 253 | "Estimating biases using als...\n", 254 | "RMSE: 0.9504\n", 255 | "MAE: 0.7544\n", 256 | "------------\n", 257 | "Fold 2\n", 258 | "Estimating biases using als...\n", 259 | "RMSE: 0.9476\n", 260 | "MAE: 0.7515\n", 261 | "------------\n", 262 | "Fold 3\n", 263 | "Estimating biases using als...\n", 264 | "RMSE: 0.9445\n", 265 | "MAE: 0.7487\n", 266 | "------------\n", 267 | "------------\n", 268 | "Mean RMSE: 0.9475\n", 269 | "Mean MAE : 0.7515\n", 270 | "------------\n", 271 | "------------\n", 272 | "Evaluating RMSE, MAE of algorithm KNNBasic.\n", 273 | "\n", 274 | "------------\n", 275 | "Fold 1\n", 276 | "Computing the msd similarity matrix...\n", 277 | "Done computing similarity matrix.\n", 278 | "RMSE: 0.9894\n", 279 | "MAE: 0.7818\n", 280 | "------------\n", 281 | "Fold 2\n", 282 | "Computing the msd similarity matrix...\n", 283 | "Done computing similarity matrix.\n", 284 | "RMSE: 0.9907\n", 285 | "MAE: 0.7828\n", 286 | "------------\n", 287 | "Fold 3\n", 288 | "Computing the msd similarity matrix...\n", 289 | "Done computing similarity matrix.\n", 290 | "RMSE: 0.9867\n", 291 | "MAE: 0.7800\n", 292 | "------------\n", 293 | "------------\n", 294 | "Mean RMSE: 0.9889\n", 295 | "Mean MAE : 0.7815\n", 296 | "------------\n", 297 | "------------\n", 298 | "Evaluating RMSE, MAE of algorithm KNNWithMeans.\n", 299 | "\n", 300 | "------------\n", 301 | "Fold 1\n", 302 | "Computing the msd similarity matrix...\n", 303 | "Done computing similarity matrix.\n", 304 | "RMSE: 0.9563\n", 305 | "MAE: 0.7540\n", 306 | "------------\n", 307 | "Fold 2\n", 308 | "Computing the msd similarity matrix...\n", 309 
| "Done computing similarity matrix.\n", 310 | "RMSE: 0.9568\n", 311 | "MAE: 0.7541\n", 312 | "------------\n", 313 | "Fold 3\n", 314 | "Computing the msd similarity matrix...\n", 315 | "Done computing similarity matrix.\n", 316 | "RMSE: 0.9569\n", 317 | "MAE: 0.7533\n", 318 | "------------\n", 319 | "------------\n", 320 | "Mean RMSE: 0.9567\n", 321 | "Mean MAE : 0.7538\n", 322 | "------------\n", 323 | "------------\n", 324 | "Evaluating RMSE, MAE of algorithm KNNBaseline.\n", 325 | "\n", 326 | "------------\n", 327 | "Fold 1\n", 328 | "Estimating biases using als...\n", 329 | "Computing the msd similarity matrix...\n", 330 | "Done computing similarity matrix.\n", 331 | "RMSE: 0.9373\n", 332 | "MAE: 0.7394\n", 333 | "------------\n", 334 | "Fold 2\n", 335 | "Estimating biases using als...\n", 336 | "Computing the msd similarity matrix...\n", 337 | "Done computing similarity matrix.\n", 338 | "RMSE: 0.9368\n", 339 | "MAE: 0.7383\n", 340 | "------------\n", 341 | "Fold 3\n", 342 | "Estimating biases using als...\n", 343 | "Computing the msd similarity matrix...\n", 344 | "Done computing similarity matrix.\n", 345 | "RMSE: 0.9358\n", 346 | "MAE: 0.7367\n", 347 | "------------\n", 348 | "------------\n", 349 | "Mean RMSE: 0.9367\n", 350 | "Mean MAE : 0.7381\n", 351 | "------------\n", 352 | "------------\n", 353 | "Evaluating RMSE, MAE of algorithm SVD.\n", 354 | "\n", 355 | "------------\n", 356 | "Fold 1\n", 357 | "RMSE: 0.9464\n", 358 | "MAE: 0.7476\n", 359 | "------------\n", 360 | "Fold 2\n", 361 | "RMSE: 0.9458\n", 362 | "MAE: 0.7459\n", 363 | "------------\n", 364 | "Fold 3\n", 365 | "RMSE: 0.9423\n", 366 | "MAE: 0.7434\n", 367 | "------------\n", 368 | "------------\n", 369 | "Mean RMSE: 0.9448\n", 370 | "Mean MAE : 0.7456\n", 371 | "------------\n", 372 | "------------\n", 373 | "Evaluating RMSE, MAE of algorithm SVDpp.\n", 374 | "\n", 375 | "------------\n", 376 | "Fold 1\n", 377 | "RMSE: 0.9292\n", 378 | "MAE: 0.7305\n", 379 | "------------\n", 380 | "Fold 2\n", 381 | "RMSE: 0.9273\n", 382 | "MAE: 0.7287\n", 383 | "------------\n", 384 | "Fold 3\n" 385 | ] 386 | }, 387 | { 388 | "ename": "KeyboardInterrupt", 389 | "evalue": "", 390 | "output_type": "error", 391 | "traceback": [ 392 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 393 | "\u001b[1;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", 394 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 32\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0msurprise\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mSVDpp\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mevaluate\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 33\u001b[0m \u001b[0malgo\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mSVDpp\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 34\u001b[1;33m \u001b[0mperf\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mevaluate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0malgo\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmeasures\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'RMSE'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'MAE'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 35\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 36\u001b[0m \u001b[1;31m### 使用NMF\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 395 | 
"\u001b[1;32m~\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\evaluate.py\u001b[0m in \u001b[0;36mevaluate\u001b[1;34m(algo, data, measures, with_dump, dump_dir, verbose)\u001b[0m\n\u001b[0;32m 81\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 82\u001b[0m \u001b[1;31m# train and test algorithm. Keep all rating predictions in a list\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 83\u001b[1;33m \u001b[0malgo\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtrainset\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 84\u001b[0m \u001b[0mpredictions\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0malgo\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtest\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtestset\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mverbose\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mverbose\u001b[0m \u001b[1;33m==\u001b[0m \u001b[1;36m2\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 85\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", 396 | "\u001b[1;32m~\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\prediction_algorithms\\matrix_factorization.pyx\u001b[0m in \u001b[0;36msurprise.prediction_algorithms.matrix_factorization.SVDpp.fit\u001b[1;34m()\u001b[0m\n", 397 | "\u001b[1;32m~\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\prediction_algorithms\\matrix_factorization.pyx\u001b[0m in \u001b[0;36msurprise.prediction_algorithms.matrix_factorization.SVDpp.sgd\u001b[1;34m()\u001b[0m\n", 398 | "\u001b[1;32m~\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\trainset.py\u001b[0m in \u001b[0;36mall_ratings\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m 188\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mu\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mu_ratings\u001b[0m \u001b[1;32min\u001b[0m \u001b[0miteritems\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mur\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 189\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mi\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mr\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mu_ratings\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 190\u001b[1;33m \u001b[1;32myield\u001b[0m \u001b[0mu\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mi\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mr\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 191\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 192\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mbuild_testset\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 399 | "\u001b[1;31mKeyboardInterrupt\u001b[0m: " 400 | ] 401 | } 402 | ], 403 | "source": [ 404 | "### 使用NormalPredictor\n", 405 | "from surprise import NormalPredictor, evaluate\n", 406 | "algo = NormalPredictor()\n", 407 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 408 | "\n", 409 | "### 使用BaselineOnly\n", 410 | "from surprise import BaselineOnly, evaluate\n", 411 | "algo = BaselineOnly()\n", 412 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 413 | "\n", 414 | "### 使用基础版协同过滤\n", 415 | "from surprise import KNNBasic, evaluate\n", 416 | "algo = KNNBasic()\n", 417 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 418 | "\n", 419 | "### 使用均值协同过滤\n", 420 | "from surprise import KNNWithMeans, evaluate\n", 421 | "algo = 
KNNWithMeans()\n", 422 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 423 | "\n", 424 | "### 使用协同过滤baseline\n", 425 | "from surprise import KNNBaseline, evaluate\n", 426 | "algo = KNNBaseline()\n", 427 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 428 | "\n", 429 | "### 使用SVD\n", 430 | "from surprise import SVD, evaluate\n", 431 | "algo = SVD()\n", 432 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 433 | "\n", 434 | "### 使用SVD++\n", 435 | "from surprise import SVDpp, evaluate\n", 436 | "algo = SVDpp()\n", 437 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 438 | "\n", 439 | "### 使用NMF\n", 440 | "from surprise import NMF\n", 441 | "algo = NMF()\n", 442 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 443 | "print_perf(perf)" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "用协同过滤构建模型并进行预测" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 18, 456 | "metadata": {}, 457 | "outputs": [ 458 | { 459 | "name": "stderr", 460 | "output_type": "stream", 461 | "text": [ 462 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\evaluate.py:66: UserWarning: The evaluate() method is deprecated. Please use model_selection.cross_validate() instead.\n", 463 | " 'model_selection.cross_validate() instead.', UserWarning)\n", 464 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\dataset.py:193: UserWarning: Using data.split() or using load_from_folds() without using a CV iterator is now deprecated. \n", 465 | " UserWarning)\n" 466 | ] 467 | }, 468 | { 469 | "name": "stdout", 470 | "output_type": "stream", 471 | "text": [ 472 | "Evaluating RMSE, MAE of algorithm SVD.\n", 473 | "\n", 474 | "------------\n", 475 | "Fold 1\n", 476 | "RMSE: 0.9439\n", 477 | "MAE: 0.7449\n", 478 | "------------\n", 479 | "Fold 2\n", 480 | "RMSE: 0.9455\n", 481 | "MAE: 0.7462\n", 482 | "------------\n", 483 | "Fold 3\n", 484 | "RMSE: 0.9467\n", 485 | "MAE: 0.7460\n", 486 | "------------\n", 487 | "------------\n", 488 | "Mean RMSE: 0.9454\n", 489 | "Mean MAE : 0.7457\n", 490 | "------------\n", 491 | "------------\n", 492 | " Fold 1 Fold 2 Fold 3 Mean \n", 493 | "RMSE 0.9439 0.9455 0.9467 0.9454 \n", 494 | "MAE 0.7449 0.7462 0.7460 0.7457 \n" 495 | ] 496 | } 497 | ], 498 | "source": [ 499 | "# 可以使用上面提到的各种推荐系统算法\n", 500 | "from surprise import SVD\n", 501 | "from surprise import Dataset\n", 502 | "from surprise import evaluate, print_perf\n", 503 | "\n", 504 | "# 默认载入movielens数据集\n", 505 | "data = Dataset.load_builtin('ml-100k')\n", 506 | "# k折交叉验证(k=3)\n", 507 | "data.split(n_folds=3)\n", 508 | "# 试一把SVD矩阵分解\n", 509 | "algo = SVD()\n", 510 | "# 在数据集上测试一下效果\n", 511 | "perf = evaluate(algo, data, measures=['RMSE', 'MAE'])\n", 512 | "#输出结果\n", 513 | "print_perf(perf)" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": 19, 519 | "metadata": {}, 520 | "outputs": [], 521 | "source": [ 522 | "from __future__ import (absolute_import, division, print_function,\n", 523 | " unicode_literals)\n", 524 | "import os\n", 525 | "import io\n", 526 | "\n", 527 | "from surprise import KNNBaseline\n", 528 | "from surprise import Dataset\n", 529 | "\n", 530 | "\n", 531 | "def read_item_names():\n", 532 | " \"\"\"\n", 533 | " 获取电影名到电影id 和 电影id到电影名的映射\n", 534 | " \"\"\"\n", 535 | "\n", 536 | " file_name = (os.path.expanduser('~') +\n", 537 | " '/.surprise_data/ml-100k/ml-100k/u.item')\n", 538 | " rid_to_name = {}\n", 539 
| " name_to_rid = {}\n", 540 | " with io.open(file_name, 'r', encoding='ISO-8859-1') as f:\n", 541 | " for line in f:\n", 542 | " line = line.split('|')\n", 543 | " rid_to_name[line[0]] = line[1]\n", 544 | " name_to_rid[line[1]] = line[0]\n", 545 | "\n", 546 | " return rid_to_name, name_to_rid\n" 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": 21, 552 | "metadata": {}, 553 | "outputs": [ 554 | { 555 | "name": "stderr", 556 | "output_type": "stream", 557 | "text": [ 558 | "C:\\Users\\jiangpin\\AppData\\Roaming\\Python\\Python36\\site-packages\\surprise\\prediction_algorithms\\algo_base.py:51: UserWarning: train() is deprecated. Use fit() instead\n", 559 | " warnings.warn('train() is deprecated. Use fit() instead', UserWarning)\n" 560 | ] 561 | }, 562 | { 563 | "name": "stdout", 564 | "output_type": "stream", 565 | "text": [ 566 | "Estimating biases using als...\n", 567 | "Computing the pearson_baseline similarity matrix...\n", 568 | "Done computing similarity matrix.\n", 569 | "\n", 570 | "和该电影最相似的前10部电影是:\n", 571 | "Lion King, The (1994)\n", 572 | "Toy Story (1995)\n", 573 | "Cinderella (1950)\n", 574 | "Hunchback of Notre Dame, The (1996)\n", 575 | "Sound of Music, The (1965)\n", 576 | "Clueless (1995)\n", 577 | "Aladdin (1992)\n", 578 | "E.T. the Extra-Terrestrial (1982)\n", 579 | "Winnie the Pooh and the Blustery Day (1968)\n", 580 | "Ghost (1990)\n" 581 | ] 582 | } 583 | ], 584 | "source": [ 585 | "# 首先,用算法计算相互间的相似度\n", 586 | "data = Dataset.load_builtin('ml-100k')\n", 587 | "trainset = data.build_full_trainset()\n", 588 | "sim_options = {'name': 'pearson_baseline', 'user_based': False}\n", 589 | "algo = KNNBaseline(sim_options=sim_options)\n", 590 | "algo.train(trainset)\n", 591 | "\n", 592 | "# 获取电影名到电影id 和 电影id到电影名的映射\n", 593 | "rid_to_name, name_to_rid = read_item_names()\n", 594 | "\n", 595 | "# Retieve inner id of the movie Toy Story\n", 596 | "toy_story_raw_id = name_to_rid['Beauty and the Beast (1991)']\n", 597 | "toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)\n", 598 | "\n", 599 | "# Retrieve inner ids of the nearest neighbors of Toy Story.\n", 600 | "toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k=10)\n", 601 | "\n", 602 | "# Convert inner ids of the neighbors into names.\n", 603 | "toy_story_neighbors = (algo.trainset.to_raw_iid(inner_id)\n", 604 | " for inner_id in toy_story_neighbors)\n", 605 | "toy_story_neighbors = (rid_to_name[rid]\n", 606 | " for rid in toy_story_neighbors)\n", 607 | "\n", 608 | "print()\n", 609 | "print('和该电影最相似的前10部电影是:')\n", 610 | "for movie in toy_story_neighbors:\n", 611 | " print(movie)" 612 | ] 613 | }, 614 | { 615 | "cell_type": "code", 616 | "execution_count": null, 617 | "metadata": {}, 618 | "outputs": [], 619 | "source": [] 620 | } 621 | ], 622 | "metadata": { 623 | "kernelspec": { 624 | "display_name": "Python 3", 625 | "language": "python", 626 | "name": "python3" 627 | }, 628 | "language_info": { 629 | "codemirror_mode": { 630 | "name": "ipython", 631 | "version": 3 632 | }, 633 | "file_extension": ".py", 634 | "mimetype": "text/x-python", 635 | "name": "python", 636 | "nbconvert_exporter": "python", 637 | "pygments_lexer": "ipython3", 638 | "version": "3.6.2" 639 | } 640 | }, 641 | "nbformat": 4, 642 | "nbformat_minor": 2 643 | } 644 | --------------------------------------------------------------------------------