# DatasetCollection
Common datasets used in our research

| Data Set | Basic Meta | | | | | User Context | | |
|---|---|---|---|---|---|---|---|---|
| | Users | Items | Ratings | Scale | Density | Users | Links | Type |
| Ciao [1] | 7,375 | 105,114 | 284,086 | [1, 5] | 0.0365% | 7,375 | 111,781 | Trust |
| Epinions [2] | 40,163 | 139,738 | 664,824 | [1, 5] | 0.0118% | 49,289 | 487,183 | Trust |
| Douban [3] | 2,848 | 39,586 | 894,887 | [1, 5] | 0.794% | 2,848 | 35,770 | Trust |
| LastFM [8] | 1,892 | 17,632 | 92,834 | implicit | 0.27% | 1,892 | 25,434 | Trust |
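
The table does not say how the Density column is derived. A minimal sketch, assuming density means the number of ratings divided by the number of possible user-item pairs, approximately reproduces the listed values. The function name `rating_density` is illustrative and not part of this repository.

```python
def rating_density(num_users: int, num_items: int, num_ratings: int) -> float:
    """Return the filled share of the user-item matrix, in percent.

    Assumption: density = ratings / (users * items).
    """
    if num_users <= 0 or num_items <= 0:
        raise ValueError("num_users and num_items must be positive")
    return 100.0 * num_ratings / (num_users * num_items)


# Figures copied from the Ciao and Douban rows of the table above.
ciao_density = rating_density(7_375, 105_114, 284_086)
douban_density = rating_density(2_848, 39_586, 894_887)
print(f"Ciao: {ciao_density:.4f}%  Douban: {douban_density:.3f}%")
```

Small differences in the last digit are possible, depending on whether the published figures were rounded or truncated.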

| Data Set | Basic Meta | | | | | Context | | |
|---|---|---|---|---|---|---|---|---|
| | Users | Tracks | Artists | Albums | Record | Tag | User Profile | Artist Profile |
| NowPlaying [9] | 1,744 | 16,864 | 2,108 | N/A | 1,117,335 | N/A | N/A | N/A |
| Xiami [10] | 4,271 | 290,312 | 33,316 | 95,003 | 1,301,486 | Yes | N/A | N/A |
| Yahoo Music [source] | 1,800,000 | 136,000 | many | many | 717,000,000 | Yes | N/A | N/A |
| 30 Music [source] [11] | 45,167 | 5,023,108 | 595,049 | 217,337 | many | Yes | Yes | N/A |

| Data Set | Basic Meta | | | Context | |
|---|---|---|---|---|---|
| | Users | Papers | Feedback | Tag | Content |
| CiteULike [12] | 7,947 | 25,975 | 134,860 | 52,946 | full abstract |

| Data Set | Basic Meta | | | Context | |
|---|---|---|---|---|---|
| | Users | Locations | Feedback | Relation | Time |
| Gowalla | 18,737 | 32,510 | 1,278,274 | Yes | Yes |

| Data Set | Basic Meta | | Context | | |
|---|---|---|---|---|---|
| | Users | Items | Category | Behavior Type | Time |
| Taobao (Extraction code: xv8o) [24, 25] | 987,994 | 4,162,024 | 9,439 | 5 | Yes |

| Data Set | Non-spammer | Spammer | Introduction |
|---|---|---|---|
| Twitter [4] | 1,295 | 355 | The first column is the user class (1 for non-spammers, 2 for spammers); the subsequent columns, numbered 1 to 62, are the user characteristics. |
| YouTube [5] | 641 | 31 (promoters), 157 (spammers) | The first column is the user class (1 for promoters, 2 for spammers, 3 for legitimate users); the subsequent columns, numbered 1 to 60, are the user characteristics. |
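
The layout described above (class label first, then the numbered characteristics) can be read with a few lines of Python. This is a minimal sketch, assuming the files are whitespace-delimited text with one user per line; the delimiter is not stated here, and the helper name `parse_spam_features` is hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_spam_features(
    lines: Iterable[str], num_features: int
) -> Tuple[List[int], List[List[float]]]:
    """Split each row into a class label and a feature vector.

    Assumption: fields are separated by whitespace.
    """
    labels: List[int] = []
    features: List[List[float]] = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != num_features + 1:
            raise ValueError(
                f"line {line_no}: expected {num_features + 1} fields, got {len(fields)}"
            )
        labels.append(int(float(fields[0])))  # first column is the user class
        features.append([float(value) for value in fields[1:]])
    return labels, features


# Tiny synthetic sample with 3 features per user (the real files have 62 or 60).
sample = ["1 0.5 3 0", "2 0.1 7 1"]
labels, features = parse_spam_features(sample, num_features=3)
print(labels, features)
```

For the real data, pass an open file object and `num_features=62` (Twitter) or `num_features=60` (YouTube).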

| Data Set | Non-spammer | Spammer | Introduction |
|---|---|---|---|
| Amazon [6] | 3,118 | 1,937 | Columns in profiles.txt follow this order: userid itemid rating. In labels.txt: 1 = spammer, 0 = non-spammer. |
| Yelp [7] | 52,815 | 80,466 | Columns in yelp.txt follow this order: user_id prod_id rating label date. Labels: -1 = spammer, 1 = non-spammer. I recommend filtering out users who have fewer than 5 ratings. More information can be found in Google Drive. |
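
The recommendation to drop users with fewer than 5 ratings can be applied while loading yelp.txt. The sketch below assumes whitespace-separated fields in the documented order (user_id prod_id rating label date) and a date field without spaces; neither is confirmed here. `Review`, `parse_reviews`, and `filter_active_users` are illustrative names.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Review(NamedTuple):
    user_id: str
    prod_id: str
    rating: float
    label: int  # -1: spammer, 1: non-spammer
    date: str   # kept as text because the date format is not documented


def parse_reviews(lines: Iterable[str]) -> List[Review]:
    """Parse rows in the order user_id prod_id rating label date."""
    reviews = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        user_id, prod_id, rating, label, date = fields
        reviews.append(Review(user_id, prod_id, float(rating), int(label), date))
    return reviews


def filter_active_users(reviews: List[Review], min_ratings: int = 5) -> List[Review]:
    """Keep only reviews written by users with at least `min_ratings` ratings."""
    counts = Counter(review.user_id for review in reviews)
    return [review for review in reviews if counts[review.user_id] >= min_ratings]


# Synthetic sample: u1 has five ratings, u2 has two.
sample_lines = [f"u1 p{i} 4.0 1 2014-01-0{i}" for i in range(1, 6)]
sample_lines += ["u2 p1 1.0 -1 2014-02-01", "u2 p2 5.0 -1 2014-02-02"]
reviews = parse_reviews(sample_lines)
kept = filter_active_users(reviews)
print(len(reviews), len(kept))
```

Counting per user before filtering keeps the pass linear in the number of reviews, which matters at the scale of the Yelp data.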

| Data Set | Year | Annotation Method | # Data | # Cyberbullying | Cyberbullying Ratio |
|---|---|---|---|---|---|
| Formspring [13] | 2010 | Crowdsourcing | 3,915 | 369 | 9.43% |
| MySpace [14] | 2011 | Expert Labeling | 2,088 | 434 | 20.79% |
| Ask.fm [15] | 2014 | | | | |
| Instagram [16] | 2014 | Crowdsourcing | 1,954 | 567 | 29% |
| Vine [17] | 2015 | Crowdsourcing | 971 | 304 | 31.34% |
| BullyingV3.0 [18] | 2015 | Label Algorithm | 7,321 | 2,102 | 28.71% |
| WOW [19] | 2016 | Expert Labeling | 16,975 | 137 | 0.81% |
| LOL [19] | 2016 | Expert Labeling | 17,354 | 207 | 1.19% |
| Twitter [20] | 2017 | Crowdsourcing | 1,303 | 58 | 4.45% |
| Wikipedia [21] | 2017 | Crowdsourcing | 37,611 | 338 | 0.9% |
| Harassment-Corpus [22] | 2018 | Expert Labeling | 24,189 | 3,119 | 12.89% |
| Hate and Abusive Speech [23] | 2018 | Crowdsourcing | 99,799 | 46,009 | 46.1% |

[1]. Tang, J., Gao, H., Liu, H.: mTrust: discerning multi-faceted trust in a connected world. In: International Conference on Web Search and Web Data Mining, WSDM 2012, Seattle, WA, USA, February. pp. 93–102 (2012)
[2]. Massa, P., Avesani, P.: Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on Recommender systems. pp. 17–24. ACM (2007)
[3]. G. Zhao, X. Qian, and X. Xie, “User-service rating prediction by exploring social users’ rating behaviors,” IEEE Transactions on Multimedia, vol. 18, no. 3, pp. 496–506, 2016.
[4]. Benevenuto, F., Magno, G., Rodrigues, T., & Almeida, V.: Detecting spammers on twitter. In: Collaboration, electronic messaging, anti-abuse and spam conference (CEAS). Vol. 6, No. 2010, p. 12. 2010.
[5]. Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., & Gonçalves, M.: Detecting spammers and content promoters in online video social networks. In: Proceedings of the 32nd ACM SIGIR conference on Research and development in information retrieval. pp. 620-627. ACM (2009)
[6]. Xu, Chang, et al. "Uncovering collusive spammers in Chinese review websites." ACM International Conference on Information & Knowledge Management. ACM, 2013: 979-988.
[7]. Rayana, Shebuti, and L. Akoglu. "Collective Opinion Spam Detection: Bridging Review Networks and Metadata." ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015: 985-994.
[8]. Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2011. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). In Proceedings of the 5th ACM conference on Recommender systems (RecSys 2011). ACM, New York, NY, USA
[9]. Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26
[10]. Wang, Dongjing, et al. "Learning music embedding with metadata for context aware recommendation." Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 2016.
[11]. Turrin R, Quadrana M, Condorelli A, et al. 30Music Listening and Playlists Dataset[C]//RecSys Posters. 2015.
[12]. Hao Wang*, Wu-Jun Li, Relational collaborative topic regression for recommender systems. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(5): 1343-1355, 2015.
[13]. Reynolds K, Kontostathis A, Edwards L. Using machine learning to detect cyberbullying. Machine learning and applications and workshops (ICMLA), 2011 10th International Conference on. IEEE, 2011, 2: 241-244.
[14]. Bayzick J, Kontostathis A, Edwards L. Detecting the presence of cyberbullying using computer software. In 3rd Annual ACM Web Science Conference (WebSci '11). 2011: 1-2.
[15]. Hosseinmardi H, Ghasemianlangroodi A, Han R, et al. Towards understanding cyberbullying behavior in a semi-anonymous social network. Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on. IEEE, 2014: 244-252.
[16]. Hosseinmardi H, Mattson S A, Rafiq R I, et al. Analyzing labeled cyberbullying incidents on the Instagram social network. International Conference on Social Informatics. Springer, Cham, 2015: 49-66.
[17]. Rafiq R I, Hosseinmardi H, Han R, et al. Careful what you share in six seconds: Detecting cyberbullying instances in Vine. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. ACM, 2015: 617-622.
[18]. Sui J. Understanding and fighting bullying with machine learning[D]. The University of Wisconsin-Madison, 2015.
[19]. Bretschneider U, Peters R. Detecting Cyberbullying in Online Communities. ECIS. 2016: ResearchPaper61.
[20]. Chatzakou D, Kourtellis N, Blackburn J, et al. Mean birds: Detecting aggression and bullying on twitter. Proceedings of the 2017 ACM on web science conference. ACM, 2017: 13-22.
[21]. Wulczyn E, Thain N, Dixon L. Ex machina: Personal attacks seen at scale. Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017: 1391-1399.
[22]. Rezvan M, Shekarpour S, Balasuriya L, et al. A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research. Proceedings of the 10th ACM Conference on Web Science. ACM, 2018: 33-36.
[23]. Founta A-M, Djouvas C, Chatzakou D, et al. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. Proceedings of the 11th International Conference on Web and Social Media, ICWSM, 2018.
[24]. Han Z, Xiang L, Pengye Z, et al. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
[25]. Han Z, Daqing C, Ziru X, et al. Joint Optimization of Tree-based Index and Deep Model for Recommender Systems. arXiv:1902.07565.