# DatasetCollection
Common datasets used in our research.

## Recommender systems


### Social Recommendation

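Column groups: Basic Meta (Users, Items, Ratings, Scale, Density) and User Context (Users, Links, Type).

| Data Set | Users | Items | Ratings | Scale | Density | Users (User Context) | Links | Type |
|---|---|---|---|---|---|---|---|---|
| Ciao [1] | 7,375 | 105,114 | 284,086 | [1, 5] | 0.0365% | 7,375 | 111,781 | Trust |
| Epinions [2] | 40,163 | 139,738 | 664,824 | [1, 5] | 0.0118% | 49,289 | 487,183 | Trust |
| Douban [3] | 2,848 | 39,586 | 894,887 | [1, 5] | 0.794% | 2,848 | 35,770 | Trust |
| LastFM [8] | 1,892 | 17,632 | 92,834 | implicit | 0.27% | 1,892 | 25,434 | Trust |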
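
The Density values in the table appear consistent with the usual definition, ratings divided by (users × items). The README does not state the formula, so the sketch below is an assumption; the function name `rating_density` is hypothetical. For the Epinions row it gives about 0.0118%, which matches the table.

```python
def rating_density(n_users: int, n_items: int, n_ratings: int) -> float:
    """Fraction of the user-item matrix that holds an observed rating.

    Assumed definition: n_ratings / (n_users * n_items).
    """
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    return n_ratings / (n_users * n_items)


# Epinions row: 40,163 users, 139,738 items, 664,824 ratings.
epinions = rating_density(40_163, 139_738, 664_824)
# Douban row: 2,848 users, 39,586 items, 894,887 ratings.
douban = rating_density(2_848, 39_586, 894_887)

print(f"Epinions: {epinions:.4%}")
print(f"Douban:   {douban:.3%}")
```
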

### Music Recommendation

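Column groups: Basic Meta and Context.

| Data Set | Users | Tracks | Artists | Albums | Record | Tag | User Profile | Artist Profile |
|---|---|---|---|---|---|---|---|---|
| NowPlaying [9] | 1,744 | 16,864 | 2,108 | N/A | 1,117,335 | N/A | N/A | N/A |
| Xiami [10] | 4,271 | 290,312 | 33,316 | 95,003 | 1,301,486 | Yes | N/A | N/A |
| Yahoo Music [source] | 1,800,000 | 136,000 | many | many | 717,000,000 | Yes | N/A | N/A |
| 30 Music [source] [11] | 45,167 | 5,023,108 | 595,049 | 217,337 | many | Yes | Yes | N/A |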

### Paper Recommendation

Column groups: Basic Meta and Context.

| Data Set | Users | Papers | Feedback | Tag | Content |
|---|---|---|---|---|---|
| CiteULike [12] | 7,947 | 25,975 | 134,860 | 52,946 | full abstract |

### Location Recommendation

Column groups: Basic Meta and Context.

| Data Set | Users | Locations | Feedback | Relation | Time |
|---|---|---|---|---|---|
| Gowalla | 18,737 | 32,510 | 1,278,274 | Yes | Yes |

### Product Recommendation

Column groups: Basic Meta and Context.

| Data Set | Users | Items | Category | Behavior Type | Time |
|---|---|---|---|---|---|
| Taobao (extraction code: xv8o) [24, 25] | 987,994 | 4,162,024 | 9,439 | 5 | Yes |

## Spammer detection


### Social Network

| Data Set | Non-spammer | Spammer | Introduction |
|---|---|---|---|
| Twitter [4] | 1,295 | 355 | The first column is the user class (i.e., 1 for non-spammers and 2 for spammers) and the subsequent columns numbered from 1 to 62 represent the user characteristics. |
| YouTube [5] | 641 | 31 (promoter), 157 (spammer) | The first column is the user class (i.e., 1 for promoters, 2 for spammers, and 3 for legitimates) and the subsequent columns numbered from 1 to 60 represent the user characteristics. |
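
Both files follow the layout described in the Introduction column: a class label first, then the user characteristics. A minimal parsing sketch follows. It assumes whitespace-separated numeric fields, which the README does not confirm, so adjust the split if the files use another delimiter. The function name `parse_labeled_row` and the sample row are hypothetical.

```python
from typing import List, Tuple


def parse_labeled_row(line: str) -> Tuple[int, List[float]]:
    """Split one row into (class label, feature values).

    Assumption: fields are separated by whitespace and are all numeric.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected a class label followed by features")
    label = int(float(fields[0]))  # first column is the user class
    features = [float(value) for value in fields[1:]]
    return label, features


# Hypothetical three-feature row; a real Twitter row would carry 62 features.
label, features = parse_labeled_row("2 0.5 13 0.0")
print(label, features)
```
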

### Shilling Detection

| Data Set | Non-spammer | Spammer | Introduction |
|---|---|---|---|
| Amazon [6] | 3,118 | 1,937 | Columns in `profiles.txt` follow this order: userid itemid rating. In `labels.txt`: 1 = spammer, 0 = non-spammer. |
| Yelp [7] | 52,815 | 80,466 | Columns in `yelp.txt` follow this order: user_id prod_id rating label date. Labels: -1 = spammer, 1 = non-spammer. I recommend filtering out users who have fewer than 5 ratings. More information can be found in Google Drive. |
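
The Yelp column order and the recommended filter (drop users with fewer than 5 ratings) can be sketched as below. This is an illustration under assumptions, not the authors' preprocessing: it assumes whitespace-separated fields and a date written without spaces, and the names `Review`, `parse_reviews`, and `drop_sparse_users` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Review(NamedTuple):
    user_id: str
    prod_id: str
    rating: float
    label: int  # -1: spammer, 1: non-spammer
    date: str


def parse_reviews(lines: Iterable[str]) -> List[Review]:
    """Parse rows in the order user_id prod_id rating label date.

    Assumption: whitespace-separated fields, date without spaces.
    """
    reviews = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed rows
        user_id, prod_id, rating, label, date = fields
        reviews.append(Review(user_id, prod_id, float(rating), int(label), date))
    return reviews


def drop_sparse_users(reviews: List[Review], min_ratings: int = 5) -> List[Review]:
    """Keep only reviews written by users with at least `min_ratings` ratings."""
    counts = Counter(review.user_id for review in reviews)
    return [r for r in reviews if counts[r.user_id] >= min_ratings]


# Hypothetical sample: user "u1" has five ratings, user "u2" has one.
sample = [f"u1 p{i} 4.0 1 2014-01-0{i}" for i in range(1, 6)]
sample.append("u2 p1 1.0 -1 2014-02-01")
kept = drop_sparse_users(parse_reviews(sample))
print(len(kept))
```

With a real file, the same functions could be fed an open file handle, for example `parse_reviews(open("yelp.txt"))`, since a file object iterates over its lines.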

## Cyberbullying Detection

| Data Set | Year | Annotation Method | # Data | # Cyberbullying | Cyberbullying Ratio |
|---|---|---|---|---|---|
| Formspring [13] | 2010 | Crowdsourcing | 3,915 | 369 | 9.43% |
| MySpace [14] | 2011 | Expert Labeling | 2,088 | 434 | 20.79% |
| Ask.fm [15] | 2014 | | | | |
| Instagram [16] | 2014 | Crowdsourcing | 1,954 | 567 | 29% |
| Vine [17] | 2015 | Crowdsourcing | 971 | 304 | 31.34% |
| BullyingV3.0 [18] | 2015 | Label Algorithm | 7,321 | 2,102 | 28.71% |
| WOW [19] | 2016 | Expert Labeling | 16,975 | 137 | 0.81% |
| LOL [19] | 2016 | Expert Labeling | 17,354 | 207 | 1.19% |
| Twitter [20] | 2017 | Crowdsourcing | 1,303 | 58 | 4.45% |
| Wikipedia [21] | 2017 | Crowdsourcing | 37,611 | 338 | 0.9% |
| Harassment-Corpus [22] | 2018 | Expert Labeling | 24,189 | 3,119 | 12.89% |
| Hate and Abusive Speech [23] | 2018 | Crowdsourcing | 99,799 | 46,009 | 46.1% |

## References


[1]. Tang, J., Gao, H., Liu, H.: mTrust: discerning multi-faceted trust in a connected world. In: International Conference on Web Search and Web Data Mining, WSDM 2012, Seattle, WA, USA, February. pp. 93–102 (2012)


[2]. Massa, P., Avesani, P.: Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on Recommender systems. pp. 17–24. ACM (2007)


[3]. G. Zhao, X. Qian, and X. Xie, “User-service rating prediction by exploring social users’ rating behaviors,” IEEE Transactions on Multimedia, vol. 18, no. 3, pp. 496–506, 2016.


[4]. Benevenuto, F., Magno, G., Rodrigues, T., & Almeida, V.: Detecting spammers on twitter. In: Collaboration, electronic messaging, anti-abuse and spam conference (CEAS). Vol. 6, No. 2010, p. 12. 2010.


[5]. Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., & Gonçalves, M.: Detecting spammers and content promoters in online video social networks. In: Proceedings of the 32nd ACM SIGIR conference on Research and development in information retrieval. pp. 620-627. ACM (2009)


[6]. Xu, Chang, et al. "Uncovering collusive spammers in Chinese review websites." ACM International Conference on Information & Knowledge Management. ACM, 2013: 979-988.


[7]. Rayana, Shebuti, and L. Akoglu. "Collective Opinion Spam Detection: Bridging Review Networks and Metadata." ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015: 985-994.

[8]. Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2011. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). In Proceedings of the 5th ACM conference on Recommender systems (RecSys 2011). ACM, New York, NY, USA.

[9]. Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26


[10]. Wang, Dongjing, et al. "Learning music embedding with metadata for context aware recommendation." Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 2016.


[11]. Turrin R, Quadrana M, Condorelli A, et al. 30Music Listening and Playlists Dataset[C]//RecSys Posters. 2015.


[12]. Hao Wang, Wu-Jun Li. Relational collaborative topic regression for recommender systems. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(5): 1343-1355, 2015.


[13]. Reynolds K, Kontostathis A, Edwards L. Using machine learning to detect cyberbullying. Machine learning and applications and workshops (ICMLA), 2011 10th International Conference on. IEEE, 2011, 2: 241-244.


[14]. Bayzick J, Kontostathis A, Edwards L. Detecting the presence of cyberbullying using computer software. In 3rd Annual ACM Web Science Conference (WebSci '11). 2011: 1-2.


[15]. Hosseinmardi H, Ghasemianlangroodi A, Han R, et al. Towards understanding cyberbullying behavior in a semi-anonymous social network. Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on. IEEE, 2014: 244-252.


[16]. Hosseinmardi H, Mattson S A, Rafiq R I, et al. Analyzing labeled cyberbullying incidents on the Instagram social network. International Conference on Social Informatics. Springer, Cham, 2015: 49-66.


[17]. Rafiq R I, Hosseinmardi H, Han R, et al. Careful what you share in six seconds: Detecting cyberbullying instances in Vine. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. ACM, 2015: 617-622.


[18]. Sui J. Understanding and fighting bullying with machine learning[D]. The University of Wisconsin-Madison, 2015.


[19]. Bretschneider U, Peters R. Detecting Cyberbullying in Online Communities. ECIS. 2016: ResearchPaper61.


[20]. Chatzakou D, Kourtellis N, Blackburn J, et al. Mean birds: Detecting aggression and bullying on twitter. Proceedings of the 2017 ACM on web science conference. ACM, 2017: 13-22.


[21]. Wulczyn E, Thain N, Dixon L. Ex machina: Personal attacks seen at scale. Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017: 1391-1399.


[22]. Rezvan M, Shekarpour S, Balasuriya L, et al. A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research. Proceedings of the 10th ACM Conference on Web Science. ACM, 2018: 33-36.


[23]. Founta A-M, Djouvas C, Chatzakou D, et al. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. Proceedings of the 11th International Conference on Web and Social Media, ICWSM, 2018.


[24]. Han Z, Xiang L, Pengye Z, et al. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.


[25]. Han Z, Daqing C, Ziru X, et al. Joint Optimization of Tree-based Index and Deep Model for Recommender Systems. arXiv:1902.07565.
