└── README.md

/README.md:
--------------------------------------------------------------------------------
# research-database
A collection of public datasets for research. If you have any links to add, please contact me or push them to the repository.

### Phishing
+ [PhishTank](https://www.phishtank.com/developer_info.php)
+ [OpenPhish](https://www.openphish.com/)
+ [315online](http://www.315online.com.cn/list.php?catid=33)
+ [China Mobile spam SMS](http://www.wid.org.cn/project/2015ccf/comp_detail.php?cid=227)
+ [360 recent malicious website list](http://webscan.360.cn/url)

### Social data
+ [Reddit Comments Corpus](https://archive.org/details/2015_reddit_comments_corpus)
+ [Full Reddit Submission Corpus](https://www.reddit.com/r/datasets/comments/3mg812/full_reddit_submission_corpus_now_available_2006/)
+ [City Record Online](https://nycopendata.socrata.com/)
+ [TLC Trip Record Data](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml)
+ [Frequency Word Lists](https://invokeit.wordpress.com/frequency-word-lists/)
+ [Amazon product data](http://jmcauley.ucsd.edu/data/amazon/)
+ [Wikimedia database](https://dumps.wikimedia.org/)
+ [Airbnb database](http://insideairbnb.com/get-the-data.html)

### Network data
+ [KDD Cup 1999 Data](http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html)

### Security data
+ [Driving in the Cloud Dataset](http://malicia-project.com/dataset.html)
+ [Nothink Malware samples](http://www.nothink.org/honeypots/malware-archives/)
+ [SecRepo.com - Samples of Security Related Data](http://www.secrepo.com/)
+ [lanl.gov Open Data Sets](http://csr.lanl.gov/data/)
+ [Crime data from the St. Louis Metropolitan Police Department](https://github.com/kylesykes/stl-crime-data)
+ [Chronology of Data Breaches Security Breaches 2005 - Present](https://www.privacyrights.org/data-breach)
+ [Malware Sample Sources for Researchers](https://zeltser.com/malware-sample-sources/)
+ [Microsoft Malware Classification Challenge (BIG 2015)](https://www.kaggle.com/c/malware-classification/forums)
+ [Android Malware-The Drebin Dataset](http://user.informatik.uni-goettingen.de/~darp/drebin/)

### Others
+ [Beijing data](http://www.beijingcitylab.com/data-released-1/)

### [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data)
+ Social networks: online social networks, edges represent interactions between people
+ Networks with ground-truth communities: ground-truth network communities in social and information networks
+ Communication networks: email communication networks with edges representing communication
+ Citation networks: nodes represent papers, edges represent citations
+ Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
+ Web graphs: nodes represent webpages and edges are hyperlinks
+ Amazon networks: nodes represent products and edges link commonly co-purchased products
+ Internet networks: nodes represent computers and edges represent communication
+ Road networks: nodes represent intersections and edges represent the roads connecting them
+ Autonomous systems: graphs of the internet
+ Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)
+ Location-based online social networks: social networks with geographic check-ins
+ Wikipedia networks and metadata: talk, editing and voting data from Wikipedia
+ Twitter and Memetracker: Memetracker phrases, links and 467 million tweets
+ Online communities: data from online communities such as Reddit and Flickr
+ Online reviews: data from online review systems such as BeerAdvocate and Amazon
--------------------------------------------------------------------------------