├── Datasets └── titanic.csv.zip ├── .travis.yml ├── LICENSE ├── Government.rst └── README.rst /Datasets/titanic.csv.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SudalaiRajkumar/awesome-public-datasets/HEAD/Datasets/titanic.csv.zip -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | # language: ruby 2 | # rvm: 3 | # - 2.2 4 | # before_script: 5 | # - gem install awesome_bot 6 | # script: 7 | # - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ 8 | # - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com,data.lexingtonky.gov,arcgis,bixi 9 | # - site503=datamob.org,research.microsoft.com 10 | # - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --allow-timeout --white-list $site404,$whtlist,$site503 11 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2014-2015 Xiaming Chen and other contributors to this list. 
4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | 23 | -------------------------------------------------------------------------------- /Government.rst: -------------------------------------------------------------------------------- 1 | Government 2 | ---------- 3 | 4 | * `EveryPolitician, ongoing project collating and sharing data on every politician. 
`_ 5 | 6 | * `Alberta, Province of Canada `_ 7 | * `Antwerp, Belgium `_ 8 | * `Argentina (non official) `_ 9 | * `Argentina `_ 10 | * `Austin, TX, US `_ 11 | * `Australia (abs.gov.au) `_ 12 | * `Australia (data.gov.au) `_ 13 | * `Austria (data.gv.at) `_ 14 | * `Baton Rouge, LA, US `_ 15 | * `Belgium `_ 16 | * `Brazil `_ 17 | * `Buenos Aires, Argentina `_ 18 | * `Calgary, AB, Canada `_ 19 | * `Cambridge, MA, US `_ 20 | * `Canada `_ 21 | * `Chicago `_ 22 | * `Chile `_ 23 | * `Dallas Open Data `_ 24 | * `DataBC - data from the Province of British Columbia `_ 25 | * `Denver Open Data `_ 26 | * `Durham, NC Open Data `_ 27 | * `Edmonton, AB, Canada `_ 28 | * `England LGInform `_ 29 | * `EuroStat `_ 30 | * `FedStats `_ 31 | * `Finland `_ 32 | * `France `_ 33 | * `Fredericton, NB, Canada `_ 34 | * `Gatineau, QC, Canada `_ 35 | * `Germany `_ 36 | * `Ghent, Belgium `_ 37 | * `Glasgow, Scotland, UK `_ 38 | * `Greece `_ 39 | * `Guardian world governments `_ 40 | * `Halifax, NS, Canada `_ 41 | * `Helsinki Region, Finland `_ 42 | * `Hong Kong, China `_ 43 | * `Houston Open Data `_ 44 | * `Indian Government Data `_ 45 | * `Indonesian Data Portal `_ 46 | * `Ireland's Open Data Portal `_ 47 | * `Japan `_ 48 | * `Laval, QC, Canada `_ 49 | * `Lexington, KY `_ 50 | * `London Datastore, UK `_ 51 | * `London, ON, Canada `_ 52 | * `Los Angeles Open Data `_ 53 | * `MassGIS, Massachusetts, U.S. 
`_ 54 | * `Mexico `_ 55 | * `Mississauga, ON, Canada `_ 56 | * `Moldova `_ 57 | * `Moncton, NB, Canada `_ 58 | * `Montreal, QC, Canada `_ 59 | * `Netherlands `_ 60 | * `New Zealand `_ 61 | * `NYC betanyc `_ 62 | * `NYC Open Data `_ 63 | * `OECD `_ 64 | * `Oklahoma `_ 65 | * `Open Government Data (OGD) Platform India `_ 66 | * `Oregon `_ 67 | * `Ottawa, ON, Canada `_ 68 | * `Portland, Oregon `_ 69 | * `Portugal - Pordata organization `_ 70 | * `Puerto Rico Government `_ 71 | * `Quebec City, QC, Canada `_ 72 | * `Quebec Province of Canada `_ 73 | * `Regina SK, Canada `_ 74 | * `Rio de Janeiro, Brazil `_ 75 | * `Romania `_ 76 | * `Russia `_ 77 | * `San Francisco Data sets `_ 78 | * `Saskatchewan, Province of Canada `_ 79 | * `Seattle `_ 80 | * `Singapore Government Data `_ 81 | * `South Africa `_ 82 | * `South Africa Trade Statistics `_ 83 | * `State of Utah, US `_ 84 | * `Switzerland `_ 85 | * `Taiwan `_ 86 | * `Taiwan g0v `_ 87 | * `Texas Open Data `_ 88 | * `The World Bank `_ 89 | * `Toronto, ON, Canada `_ 90 | * `Tunisia `_ 91 | * `U.K. Government Data `_ 92 | * `U.S. American Community Survey `_ 93 | * `U.S. CDC Public Health datasets `_ 94 | * `U.S. Census Bureau `_ 95 | * `U.S. Department of Housing and Urban Development (HUD) `_ 96 | * `U.S. Federal Government Agencies `_ 97 | * `U.S. Federal Government Data Catalog `_ 98 | * `U.S. Food and Drug Administration (FDA) `_ 99 | * `U.S. National Center for Education Statistics (NCES) `_ 100 | * `U.S. Open Government `_ 101 | * `Uganda Bureau of Statistics `_ 102 | * `UK 2011 Census Open Atlas Project `_ 103 | * `United Nations `_ 104 | * `Uruguay `_ 105 | * `Vancouver, BC Open Data Catalog `_ 106 | * `Victoria, BC, Canada `_ 107 | * `Vienna, Austria `_ 108 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | Awesome Public Datasets 2 | ======================= 3 | ..
image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg 4 | :alt: Awesome 5 | :target: https://github.com/sindresorhus/awesome 6 | 7 | `This list of public data sources `_ 8 | is collected and tidied from blogs, answers, and user responses. 9 | Most of the data sets listed below are free; however, some are not. 10 | Other amazingly awesome lists can be found in the 11 | `awesome-awesomeness `_ and 12 | `sindresorhus's awesome `_ list. 13 | 14 | .. contents:: Table of Contents 15 | 16 | 17 | Agriculture 18 | ------------ 19 | * `U.S. Department of Agriculture's PLANTS Database `_ 20 | 21 | 22 | Biology 23 | ------- 24 | 25 | * `1000 Genomes `_ 26 | * `American Gut (Microbiome Project) `_ 27 | * `Broad Cancer Cell Line Encyclopedia (CCLE) `_ 28 | * `Broad Bioimage Benchmark Collection (BBBC) `_ 29 | * `Cell Image Library `_ 30 | * `Complete Genomics Public Data `_ 31 | * `EBI ArrayExpress `_ 32 | * `EBI Protein Data Bank in Europe `_ 33 | * `Electron Microscopy Pilot Image Archive (EMPIAR) `_ 34 | * `ENCODE project `_ 35 | * `Ensembl Genomes `_ 36 | * `Gene Expression Omnibus (GEO) `_ 37 | * `Gene Ontology (GO) `_ 38 | * `Global Biotic Interactions (GloBI) `_ 39 | * `Harvard Medical School (HMS) LINCS Project `_ 40 | * `Human Genome Diversity Project `_ 41 | * `Human Microbiome Project (HMP) `_ 42 | * `ICOS PSP Benchmark `_ 43 | * `International HapMap Project `_ 44 | * `Journal of Cell Biology DataViewer `_ 45 | * `MIT Cancer Genomics Data `_ 46 | * `NCBI Proteins `_ 47 | * `NCBI Taxonomy `_ 48 | * `NCI Genomic Data Commons `_ 49 | * `NIH Microarray data `_ or `FTP `_ (see FTP link on `RAW `_) 50 | * `OpenSNP genotypes data `_ 51 | * `Pathguide - Protein-Protein Interactions Catalog `_ 52 | * `Protein Data Bank `_ 53 | * `Psychiatric Genomics Consortium `_ 54 | * `PubChem Project `_ 55 | * `PubGene (now Coremine Medical) `_ 56 | * `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) `_ 57 | * `Sanger Genomics
of Drug Sensitivity in Cancer Project (GDSC) `_ 58 | * `Sequence Read Archive (SRA) `_ 59 | * `Stanford Microarray Data `_ 60 | * `Stowers Institute Original Data Repository `_ 61 | * `Systems Science of Biological Dynamics (SSBD) Database `_ 62 | * `The Cancer Genome Atlas (TCGA), available via Broad GDAC `_ 63 | * `The Catalogue of Life `_ 64 | * `The Personal Genome Project `_ or `PGP `_ 65 | * `UCSC Public Data `_ 66 | * `Universal Protein Resource (UniProt) `_ 67 | * `UniGene `_ 68 | 69 | 70 | Climate/Weather 71 | --------------- 72 | * `Actuaries Climate Index `_ 73 | * `Australian Weather `_ 74 | * `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system `_ 75 | * `Brazilian Weather - Historical data (In Portuguese) `_ 76 | * `Canadian Meteorological Centre `_ 77 | * `Climate Data from UEA (updated monthly) `_ 78 | * `European Climate Assessment & Dataset `_ 79 | * `Global Climate Data Since 1929 `_ 80 | * `NASA Global Imagery Browse Services `_ 81 | * `NOAA Bering Sea Climate `_ 82 | * `NOAA Climate Datasets `_ 83 | * `NOAA Realtime Weather Models `_ 84 | * `NOAA SURFRAD Meteorology and Radiation Datasets `_ 85 | * `The World Bank Open Data Resources for Climate Change `_ 86 | * `UEA Climatic Research Unit `_ 87 | * `WorldClim - Global Climate Data `_ 88 | * `WU Historical Weather Worldwide `_ 89 | 90 | 91 | Complex Networks 92 | ---------------- 93 | 94 | * `AMiner Citation Network Dataset `_ 95 | * `CrossRef DOI URLs `_ 96 | * `DBLP Citation dataset `_ 97 | * `NBER Patent Citations `_ 98 | * `Network Repository with Interactive Exploratory Analysis Tools `_ 99 | * `NIST complex networks data collection `_ 100 | * `Protein-protein interaction network `_ 101 | * `PyPI and Maven Dependency Network `_ 102 | * `Scopus Citation Database `_ 103 | * `Small Network Data `_ 104 | * `Stanford GraphBase (Steven Skiena) `_ 105 | * `Stanford Large Network Dataset Collection `_ 106 | * `Stanford Longitudinal Network
Data Sources `_ 107 | * `The Koblenz Network Collection `_ 108 | * `The Laboratory for Web Algorithmics (UNIMI) `_ 109 | * `The Nexus Network Repository `_ 110 | * `UCI Network Data Repository `_ 111 | * `UFL sparse matrix collection `_ 112 | * `WSU Graph Database `_ 113 | * `DIMACS Road Networks Collection `_ 114 | 115 | Computer Networks 116 | ----------------- 117 | 118 | * `3.5B Web Pages from CommonCrawl 2012 `_ 119 | * `53.5B Web clicks of 100K users in Indiana Univ. `_ 120 | * `CAIDA Internet Datasets `_ 121 | * `ClueWeb09 - 1B web pages `_ 122 | * `ClueWeb12 - 733M web pages `_ 123 | * `CommonCrawl Web Data over 7 years `_ 124 | * `CRAWDAD Wireless datasets from Dartmouth Univ. `_ 125 | * `Criteo click-through data `_ 126 | * `OONI: Open Observatory of Network Interference - Internet censorship data `_ 127 | * `Open Mobile Data by MobiPerf `_ 128 | * `Rapid7 Sonar Internet Scans `_ 129 | * `UCSD Network Telescope, IPv4 /8 net `_ 130 | 131 | 132 | Contextual Data 133 | --------------- 134 | 135 | * `Context-aware data sets from five domains `_ 136 | 137 | 138 | Data Challenges 139 | --------------- 140 | 141 | * `Challenges in Machine Learning `_ 142 | * `CrowdANALYTIX dataX `_ 143 | * `D4D Challenge of Orange `_ 144 | * `DrivenData Competitions for Social Good `_ 145 | * `ICWSM Data Challenge (since 2009) `_ 146 | * `Kaggle Competition Data `_ 147 | * `KDD Cup by Tencent 2012 `_ 148 | * `Localytics Data Visualization Challenge `_ 149 | * `Netflix Prize `_ 150 | * `Space Apps Challenge `_ 151 | * `Telecom Italia Big Data Challenge `_ 152 | * `Yelp Dataset Challenge `_ 153 | * `Bruteforce Database `_ 154 | * `TravisTorrent Dataset - MSR'2017 Mining Challenge `_ 155 | 156 | Earth Science 157 | ------------- 158 | 159 | * `AQUASTAT - Global water resources and uses `_ 160 | * `BODC - marine data of ~22K vars `_ 161 | * `Earth Models `_ 162 | * `EOSDIS - NASA's earth observing system data `_ 163 | * `Integrated Marine Observing System (IMOS) - roughly 30TB of 
ocean measurements `_ or `on S3 `_ 164 | * `Marinexplore - Open Oceanographic Data `_ 165 | * `Smithsonian Institution Global Volcano and Eruption Database `_ 166 | * `USGS Earthquake Archives `_ 167 | 168 | 169 | Economics 170 | --------- 171 | 172 | * `American Economic Association (AEA) `_ 173 | * `EconData from UMD `_ 174 | * `Economic Freedom of the World Data `_ 175 | * `Historical MacroEconomic Statistics `_ 176 | * `International Economics Database `_ and `various data tools `_ 177 | * `International Trade Statistics `_ 178 | * `Internet Product Code Database `_ 179 | * `Joint External Debt Data Hub `_ 180 | * `Jon Haveman International Trade Data Links `_ 181 | * `OpenCorporates Database of Companies in the World `_ 182 | * `Our World in Data `_ 183 | * `SciencesPo World Trade Gravity Datasets `_ 184 | * `The Atlas of Economic Complexity `_ 185 | * `The Center for International Data `_ 186 | * `The Observatory of Economic Complexity `_ 187 | * `UN Commodity Trade Statistics `_ 188 | * `UN Human Development Reports `_ 189 | 190 | 191 | Education 192 | ------------ 193 | 194 | * `College Scorecard Data `_ 195 | * `Student Data from Free Code Camp `_ 196 | 197 | 198 | Energy 199 | ------ 200 | 201 | * `AMPds `_ 202 | * `BLUEd `_ 203 | * `COMBED `_ 204 | * `Dataport `_ 205 | * `DRED `_ 206 | * `ECO `_ 207 | * `EIA `_ 208 | * `HES `_ - Household Electricity Study, UK 209 | * `HFED `_ 210 | * `iAWE `_ 211 | * `PLAID `_ - the Plug Load Appliance Identification Dataset 212 | * `REDD `_ 213 | * `Tracebase `_ 214 | * `UK-DALE `_ - UK Domestic Appliance-Level Electricity 215 | * `WHITED `_ 216 | 217 | 218 | 219 | Finance 220 | ------- 221 | 222 | * `CBOE Futures Exchange `_ 223 | * `Google Finance `_ 224 | * `Google Trends `_ 225 | * `NASDAQ `_ 226 | * `OANDA `_ 227 | * `OSU Financial data `_ 228 | * `Quandl `_ 229 | * `St Louis Federal `_ 230 | * `Yahoo Finance `_ 231 | * `NYSE Market Data `_ (see FTP link on `RAW `_) 232 | 233 | 234 | GIS 235 | --- 236 | 237 | *
`ArcGIS Open Data portal `_ 238 | * `Cambridge, MA, US, GIS data on GitHub `_ 239 | * `Factual Global Location Data `_ 240 | * `Geo Spatial Data from ASU `_ 241 | * `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ 242 | * `GeoFabrik - OSM data extracted to a variety of formats and areas `_ 243 | * `GeoNames Worldwide `_ 244 | * `Global Administrative Areas Database (GADM) `_ 245 | * `Homeland Infrastructure Foundation-Level Data `_ 246 | * `Landsat 8 on AWS `_ 247 | * `List of all countries in all languages `_ 248 | * `National Weather Service GIS Data Portal `_ 249 | * `Natural Earth - vectors and rasters of the world `_ 250 | * `OpenAddresses `_ 251 | * `OpenStreetMap (OSM) `_ 252 | * `Pleiades - Gazetteer and graph of ancient places `_ 253 | * `Reverse Geocoder using OSM data `_ & `additional high-resolution data files `_ 254 | * `TIGER/Line - U.S. boundaries and roads `_ 255 | * `TwoFishes - Foursquare's coarse geocoder `_ 256 | * `TZ Timezones shapefiles `_ 257 | * `UN Environmental Data `_ 258 | * `World boundaries from the U.S. Department of State `_ 259 | * `World countries in multiple formats `_ 260 | 261 | 262 | Government 263 | ---------- 264 | 265 | * `OpenDataSoft's list of 1,600 open data `_ 266 | * `Open Data for Africa `_ 267 | * `A list of cities and countries contributed by community `_ 268 | 269 | 270 | Healthcare 271 | ---------- 272 | 273 | * `EHDP Large Health Data Sets `_ 274 | * `Gapminder World demographic databases `_ 275 | * `Medicare Coverage Database (MCD), U.S.
`_ 276 | * `Medicare Data Engine of medicare.gov Data `_ 277 | * `Medicare Data File `_ 278 | * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ 279 | * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ 280 | * `Open-ODS (structure of the UK NHS) `_ 281 | * `OpenPaymentsData, Healthcare financial relationship data `_ 282 | * `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ 283 | * `World Health Organization Global Health Observatory `_ 284 | 285 | 286 | Image Processing 287 | ---------------- 288 | 289 | * `10k US Adult Faces Database `_ 290 | * `2GB of Photos of Cats `_ or `Archive version `_ 291 | * `Affective Image Classification `_ 292 | * `Animals with attributes `_ 293 | * `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_ 294 | * `Face Recognition Benchmark `_ 295 | * `ImageNet (in WordNet hierarchy) `_ 296 | * `Indoor Scene Recognition `_ 297 | * `International Affective Picture System, UFL `_ 298 | * `Massive Visual Memory Stimuli, MIT `_ 299 | * `MNIST database of handwritten digits, near 1 million examples `_ 300 | * `Several Shape-from-Silhouette Datasets `_ 301 | * `Stanford Dogs Dataset `_ 302 | * `SUN database, MIT `_ 303 | * `The Oxford-IIIT Pet Dataset `_ 304 | * `YouTube Faces Database `_ 305 | * `Adience Unfiltered faces for gender and age classification `_ 306 | * `The Action Similarity Labeling (ASLAN) Challenge `_ 307 | * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ 308 | * `Visual genome `_ 309 | 310 | Machine Learning 311 | ---------------- 312 | 313 | * `Delve Datasets for classification and regression (Univ. 
of Toronto) `_ 314 | * `Discogs Monthly Data `_ 315 | * `eBay Online Auctions (2012) `_ 316 | * `IMDb Database `_ 317 | * `Keel Repository for classification, regression and time series `_ 318 | * `Labeled Faces in the Wild (LFW) `_ 319 | * `Lending Club Loan Data `_ 320 | * `Machine Learning Data Set Repository `_ 321 | * `Million Song Dataset `_ 322 | * `More Song Datasets `_ 323 | * `New Yorker caption contest ratings `_ 324 | * `MovieLens Data Sets `_ 325 | * `RDataMining - "R and Data Mining" ebook data `_ 326 | * `Registered Meteorites on Earth `_ 327 | * `Restaurants Health Score Data in San Francisco `_ 328 | * `UCI Machine Learning Repository `_ 329 | * `Yahoo! Ratings and Classification Data `_ 330 | * `Youtube 8m `_ 331 | 332 | 333 | Museums 334 | ------- 335 | 336 | * `Canada Science and Technology Museums Corporation's Open Data `_ 337 | * `Cooper-Hewitt's Collection Database `_ 338 | * `Minneapolis Institute of Arts metadata `_ 339 | * `Natural History Museum (London) Data Portal `_ 340 | * `Rijksmuseum Historical Art Collection `_ 341 | * `Tate Collection metadata `_ 342 | * `The Getty vocabularies `_ 343 | 344 | 345 | Natural Language 346 | ---------------- 347 | 348 | * `Blogger Corpus `_ 349 | * `CLiPS Stylometry Investigation Corpus `_ 350 | * `ClueWeb09 FACC `_ 351 | * `ClueWeb12 FACC `_ 352 | * `DBpedia - 4.58M things with 583M facts `_ 353 | * `Flickr Personal Taxonomies `_ 354 | * `Freebase.com of people, places, and things `_ 355 | * `Google Books Ngrams (2.2TB) `_ 356 | * `Google MC-AFP, generated based on the publicly available Gigaword dataset using Paragraph Vectors `_ 357 | * `Google Web 5gram (1TB, 2006) `_ 358 | * `Gutenberg eBooks List `_ 359 | * `Hansards text chunks of Canadian Parliament `_ 360 | * `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ 361 | * `Machine Translation of European languages `_ 362 | * `Multi-Domain Sentiment Dataset (version 2.0) `_ 363 | * `Microsoft MAchine Reading COmprehension
Dataset (or MS MARCO) `_ 364 | * `Personae Corpus `_ 365 | * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ 366 | * `SMS Spam Collection in English `_ 367 | * `USENET postings corpus of 2005~2011 `_ 368 | * `Wikidata - Wikipedia databases `_ 369 | * `Wikipedia Links data - 40 Million Entities in Context `_ 370 | * `Universal Dependencies `_ 371 | * `WordNet databases and tools `_ 372 | * `Open Multilingual Wordnet `_ 373 | * `Automatic Keyphrase Extraction `_ 374 | 375 | 376 | Neuroscience 377 | ------------- 378 | 379 | * `Allen Institute Datasets `_ 380 | * `Brain Catalogue `_ 381 | * `Brainomics `_ 382 | * `CodeNeuro Datasets `_ 383 | * `Collaborative Research in Computational Neuroscience (CRCNS) `_ 384 | * `FCP-INDI `_ 385 | * `Human Connectome Project `_ 386 | * `NDAR `_ 387 | * `NIMH Data Archive `_ 388 | * `NeuroData `_ 389 | * `OASIS `_ 390 | * `OpenfMRI `_ 391 | * `Neuroelectro `_ 392 | * `Study Forrest `_ 393 | 394 | 395 | Physics 396 | ------- 397 | 398 | * `CERN Open Data Portal `_ 399 | * `Crystallography Open Database `_ 400 | * `NASA Exoplanet Archive `_ 401 | * `NSSDC (NASA) data of 550 spacecraft `_ 402 | * `Sloan Digital Sky Survey (SDSS) - Mapping the Universe `_ 403 | 404 | 405 | Psychology/Cognition 406 | -------------------- 407 | 408 | * `OSU Cognitive Modeling Repository Datasets `_ 409 | 410 | 411 | Public Domains 412 | -------------- 413 | 414 | * `Amazon `_ 415 | * `Archive-it from Internet Archive `_ 416 | * `Archive.org Datasets `_ 417 | * `CMU JASA data archive `_ 418 | * `CMU StatLab collections `_ 419 | * `Data360 `_ 420 | * `Datamob.org `_ 421 | * `Data.World `_ 422 | * `Google `_ 423 | * `Infochimps `_ 424 | * `KDNuggets Data Collections `_ 425 | * `Microsoft Azure Data Market Free DataSets `_ 426 | * `Microsoft Data Science for Research `_ 427 | * `Numbrary `_ 428 | * `Open Library Data Dumps `_ 429 | * `Reddit Datasets `_ 430 | * `RevolutionAnalytics Collection `_ 431 | * `Sample R data sets
`_ 432 | * `Stats4Stem R data sets `_ 433 | * `StatSci.org `_ 434 | * `The Washington Post List `_ 435 | * `UCLA SOCR data collection `_ 436 | * `UFO Reports `_ 437 | * `Wikileaks 911 pager intercepts `_ 438 | * `Yahoo Webscope `_ 439 | 440 | 441 | Search Engines 442 | -------------- 443 | 444 | * `Academic Torrents of data sharing from UMB `_ 445 | * `Datahub.io `_ 446 | * `DataMarket (Qlik) `_ 447 | * `Harvard Dataverse Network of scientific data `_ 448 | * `ICPSR (UMICH) `_ 449 | * `Institute of Education Sciences `_ 450 | * `National Technical Reports Library `_ 451 | * `Open Data Certificates (beta) `_ 452 | * `OpenDataNetwork - A search engine of all Socrata powered data portals `_ 453 | * `Statista.com - statistics and Studies `_ 454 | * `Zenodo - An open dependable home for the long-tail of science `_ 455 | 456 | 457 | Social Networks 458 | --------------- 459 | 460 | * `72 hours #gamergate Twitter Scrape `_ 461 | * `Ancestry.com Forum Dataset over 10 years `_ 462 | * `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_ 463 | * `CMU Enron Email of 150 users `_ 464 | * `EDRM Enron EMail of 151 users, hosted on S3 `_ 465 | * `Facebook Data Scrape (2005) `_ 466 | * `Facebook Social Networks from LAW (since 2007) `_ 467 | * `Foursquare from UMN/Sarwat (2013) `_ 468 | * `GitHub Collaboration Archive `_ 469 | * `Google Scholar citation relations `_ 470 | * `High-Resolution Contact Networks from Wearable Sensors `_ 471 | * `Mobile Social Networks from UMASS `_ 472 | * `Network Twitter Data `_ 473 | * `Reddit Comments `_ 474 | * `Skytrax' Air Travel Reviews Dataset `_ 475 | * `Social Twitter Data `_ 476 | * `SourceForge.net Research Data `_ 477 | * `Twitter Data for Sentiment Analysis `_ 478 | * `Twitter Data for Online Reputation Management `_ 479 | * `Twitter Graph of entire Twitter site `_ 480 | * `Twitter Scrape Calufa May 2011 `_ 481 | * `UNIMI/LAW Social Network Datasets `_ 482 | * `Yahoo! 
Graph and Social Data `_ 483 | * `Youtube Video Social Graph in 2007,2008 `_ 484 | 485 | 486 | Social Sciences 487 | --------------- 488 | 489 | * `ACLED (Armed Conflict Location & Event Data Project) `_ 490 | * `Canadian Legal Information Institute `_ 491 | * `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc `_ 492 | * `Correlates of War Project `_ 493 | * `Cryptome Conspiracy Theory Items `_ 494 | * `Datacards `_ 495 | * `European Social Survey `_ 496 | * `FBI Hate Crime 2013 - aggregated data `_ 497 | * `Fragile States Index `_ 498 | * `GDELT Global Events Database `_ 499 | * `General Social Survey (GSS) since 1972 `_ 500 | * `German Social Survey `_ 501 | * `Global Religious Futures Project `_ 502 | * `Humanitarian Data Exchange `_ 503 | * `INFORM Index for Risk Management `_ 504 | * `Institute for Demographic Studies `_ 505 | * `International Networks Archive `_ 506 | * `International Social Survey Program ISSP `_ 507 | * `International Studies Compendium Project `_ 508 | * `James McGuire Cross National Data `_ 509 | * `MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ 510 | * `Minnesota Population Center `_ 511 | * `MIT Reality Mining Dataset `_ 512 | * `Notre Dame Global Adaptation Index (ND-GAIN) `_ 513 | * `Open Crime and Policing Data in England, Wales and Northern Ireland `_ 514 | * `Paul Hensel General International Data Page `_ 515 | * `PewResearch Internet Survey Project `_ 516 | * `PewResearch Society Data Collection `_ 517 | * `Political Polarity Data `_ 518 | * `StackExchange Data Explorer `_ 519 | * `Terrorism Research and Analysis Consortium `_ 520 | * `Texas Inmates Executed Since 1984 `_ 521 | * `Titanic Survival Data Set `_ or `on Kaggle `_ 522 | * `UCB's Archive of Social Science Data (D-Lab) `_ 523 | * `Uppsala Conflict Data Program `_ 524 | * `UCLA Social Sciences Data Archive `_ 525 | * `UN Civil Society Database `_ 526 | * `Universities Worldwide `_ 527 | * `UPJOHN for Labor Employment
Research `_ 528 | * `World Bank Open Data `_ 529 | * `WorldPop project - Worldwide human population distributions `_ 530 | 531 | 532 | Software 533 | -------- 534 | 535 | * `FLOSSmole data about free, libre, and open source software development `_ 536 | 537 | Sports 538 | ------ 539 | 540 | * `Basketball (NBA/NCAA/Euro) Player Database and Statistics `_ 541 | * `Betfair Historical Exchange Data `_ 542 | * `Cricsheet Matches (cricket) `_ 543 | * `Ergast Formula 1, from 1950 up to date (API) `_ 544 | * `Football/Soccer resources (data and APIs) `_ 545 | * `Lahman's Baseball Database `_ 546 | * `Pinhooker: Thoroughbred Bloodstock Sale Data `_ 547 | * `Retrosheet Baseball Statistics `_ 548 | * `Tennis database of rankings, results, and stats for ATP `_, `WTA `_, `Grand Slams `_ and `Match Charting Project `_ 549 | 550 | 551 | Time Series 552 | ----------- 553 | 554 | * `Databanks International Cross National Time Series Data Archive `_ 555 | * `Hard Drive Failure Rates `_ 556 | * `Heart Rate Time Series from MIT `_ 557 | * `Time Series Data Library (TSDL) from MU `_ 558 | * `UC Riverside Time Series Dataset `_ 559 | 560 | 561 | Transportation 562 | -------------- 563 | 564 | * `Airlines OD Data 1987-2008 `_ 565 | * `Bay Area Bike Share Data `_ 566 | * `Bike Share Systems (BSS) collection `_ 567 | * `GeoLife GPS Trajectory from Microsoft Research `_ 568 | * `German train system by Deutsche Bahn `_ 569 | * `Hubway Million Rides in MA `_ 570 | * `Marine Traffic - ship tracks, port calls and more `_ 571 | * `Montreal BIXI Bike Share `_ 572 | * `NYC Taxi Trip Data 2009- `_ 573 | * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ 574 | * `NYC Uber trip data April 2014 to September 2014 `_ 575 | * `Open Traffic collection `_ 576 | * `OpenFlights - airport, airline and route data `_ 577 | * `Philadelphia Bike Share Stations (JSON) `_ 578 | * `Plane Crash Database, since 1920 `_ 579 | * `RITA Airline On-Time Performance data `_ 580 | * `RITA/BTS transport data collection (TranStat) `_ 
581 | * `Toronto Bike Share Stations (XML file) `_ 582 | * `Transport for London (TFL) `_ 583 | * `Travel Tracker Survey (TTS) for Chicago `_ 584 | * `U.S. Bureau of Transportation Statistics (BTS) `_ 585 | * `U.S. Domestic Flights 1990 to 2009 `_ 586 | * `U.S. Freight Analysis Framework since 2007 `_ 587 | 588 | 589 | Complementary Collections 590 | ------------------------- 591 | 592 | * `Data Packaged Core Datasets `_ 593 | * `Database of Scientific Code Contributions `_ 594 | * DataWrangling: `Some Datasets Available on the Web `_ 595 | * Inside-r: `Finding Data on the Internet `_ 596 | * OpenDataMonitor: `An overview of available open data resources in Europe `_ 597 | * Quora: `Where can I find large datasets open to the public? `_ 598 | * RS.io: `100+ Interesting Data Sets for Statistics `_ 599 | * StaTrek: `Leveraging open data to understand urban lives `_ 600 | --------------------------------------------------------------------------------