├── Datasets
│   └── titanic.csv.zip
├── .travis.yml
├── LICENSE
├── Government.rst
└── README.rst
/Datasets/titanic.csv.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SudalaiRajkumar/awesome-public-datasets/HEAD/Datasets/titanic.csv.zip
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | # language: ruby
2 | # rvm:
3 | # - 2.2
4 | # before_script:
5 | # - gem install awesome_bot
6 | # script:
7 | # - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/
8 | # - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com,data.lexingtonky.gov,arcgis,bixi
9 | # - site503=datamob.org,research.microsoft.com
10 | # - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --allow-timeout --white-list $site404,$whtlist,$site503
11 |
--------------------------------------------------------------------------------
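The commented-out Travis job above would run `awesome_bot` over `README.rst` to flag dead links, skipping any URL matched by the `$site404`, `$whtlist` and `$site503` white-list groups. The snippet below is a hypothetical Python sketch of that filtering idea only. It is not awesome_bot's implementation and does no network requests. It assumes entries use the RST inline-link form `` `Title <url>`_ `` and that a URL counts as white-listed when it contains any white-list entry as a substring. The helper names and the sample URLs are illustrative.

```python
import re

# Hypothetical helper: matches RST inline links such as `Title <https://example.org>`_
RST_LINK = re.compile(r"`([^`<]*?)\s*<([^>]+)>`_")


def extract_links(rst_text):
    """Return (title, url) pairs for every inline RST hyperlink in the text."""
    return [(m.group(1).strip(), m.group(2).strip()) for m in RST_LINK.finditer(rst_text)]


def split_whitelist(*groups):
    """Flatten comma-separated groups, like $site404,$whtlist,$site503 in the shell."""
    entries = []
    for group in groups:
        entries.extend(item for item in group.split(",") if item)
    return entries


def is_whitelisted(url, whitelist):
    """Assumed rule: a URL is skipped if any white-list entry is a substring of it."""
    return any(entry in url for entry in whitelist)


def urls_to_check(rst_text, whitelist):
    """Unique, non-white-listed URLs in document order (duplicates are not errors)."""
    seen, result = set(), []
    for _, url in extract_links(rst_text):
        if is_whitelisted(url, whitelist) or url in seen:
            continue
        seen.add(url)
        result.append(url)
    return result


# Illustrative input; these URLs are made-up examples, not entries from README.rst.
sample = """
* `OpenFlights <https://openflights.org/data.html>`_
* `Quandl <https://www.quandl.com>`_
* `Quandl again <https://www.quandl.com>`_
"""
wl = split_whitelist("www.stats4stem.org,lib.stat.cmu.edu", "travis,openflights.org", "datamob.org")
print(urls_to_check(sample, wl))  # ['https://www.quandl.com']
```

Only the Quandl URL survives. The OpenFlights link matches the `openflights.org` entry, and the repeated Quandl link is requested once.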
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2014-2015 Xiaming Chen and other contributors to this list.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
23 |
--------------------------------------------------------------------------------
/Government.rst:
--------------------------------------------------------------------------------
1 | Government
2 | ----------
3 |
4 | * `EveryPolitician, ongoing project collating and sharing data on every politician. `_
5 |
6 | * `Alberta, Province of Canada `_
7 | * `Antwerp, Belgium `_
8 | * `Argentina (non official) `_
9 | * `Argentina `_
10 | * `Austin, TX, US `_
11 | * `Australia (abs.gov.au) `_
12 | * `Australia (data.gov.au) `_
13 | * `Austria (data.gv.at) `_
14 | * `Baton Rouge, LA, US `_
15 | * `Belgium `_
16 | * `Brazil `_
17 | * `Buenos Aires, Argentina `_
18 | * `Calgary, AB, Canada `_
19 | * `Cambridge, MA, US `_
20 | * `Canada `_
21 | * `Chicago `_
22 | * `Chile `_
23 | * `Dallas Open Data `_
24 | * `DataBC - data from the Province of British Columbia `_
25 | * `Denver Open Data `_
26 | * `Durham, NC Open Data `_
27 | * `Edmonton, AB, Canada `_
28 | * `England LGInform `_
29 | * `EuroStat `_
30 | * `FedStats `_
31 | * `Finland `_
32 | * `France `_
33 | * `Fredericton, NB, Canada `_
34 | * `Gatineau, QC, Canada `_
35 | * `Germany `_
36 | * `Ghent, Belgium `_
37 | * `Glasgow, Scotland, UK `_
38 | * `Greece `_
39 | * `Guardian world governments `_
40 | * `Halifax, NS, Canada `_
41 | * `Helsinki Region, Finland `_
42 | * `Hong Kong, China `_
43 | * `Houston Open Data `_
44 | * `Indian Government Data `_
45 | * `Indonesian Data Portal `_
46 | * `Ireland's Open Data Portal `_
47 | * `Japan `_
48 | * `Laval, QC, Canada `_
49 | * `Lexington, KY `_
50 | * `London Datastore, UK `_
51 | * `London, ON, Canada `_
52 | * `Los Angeles Open Data `_
53 | * `MassGIS, Massachusetts, U.S. `_
54 | * `Mexico `_
55 | * `Mississauga, ON, Canada `_
56 | * `Moldova `_
57 | * `Moncton, NB, Canada `_
58 | * `Montreal, QC, Canada `_
59 | * `Netherlands `_
60 | * `New Zealand `_
61 | * `NYC BetaNYC `_
62 | * `NYC Open Data `_
63 | * `OECD `_
64 | * `Oklahoma `_
65 | * `Open Government Data (OGD) Platform India `_
66 | * `Oregon `_
67 | * `Ottawa, ON, Canada `_
68 | * `Portland, Oregon `_
69 | * `Portugal - Pordata organization `_
70 | * `Puerto Rico Government `_
71 | * `Quebec City, QC, Canada `_
72 | * `Quebec Province of Canada `_
73 | * `Regina SK, Canada `_
74 | * `Rio de Janeiro, Brazil `_
75 | * `Romania `_
76 | * `Russia `_
77 | * `San Francisco Data sets `_
78 | * `Saskatchewan, Province of Canada `_
79 | * `Seattle `_
80 | * `Singapore Government Data `_
81 | * `South Africa `_
82 | * `South Africa Trade Statistics `_
83 | * `State of Utah, US `_
84 | * `Switzerland `_
85 | * `Taiwan `_
86 | * `Taiwan g0v `_
87 | * `Texas Open Data `_
88 | * `The World Bank `_
89 | * `Toronto, ON, Canada `_
90 | * `Tunisia `_
91 | * `U.K. Government Data `_
92 | * `U.S. American Community Survey `_
93 | * `U.S. CDC Public Health datasets `_
94 | * `U.S. Census Bureau `_
95 | * `U.S. Department of Housing and Urban Development (HUD) `_
96 | * `U.S. Federal Government Agencies `_
97 | * `U.S. Federal Government Data Catalog `_
98 | * `U.S. Food and Drug Administration (FDA) `_
99 | * `U.S. National Center for Education Statistics (NCES) `_
100 | * `U.S. Open Government `_
101 | * `Uganda Bureau of Statistics `_
102 | * `UK 2011 Census Open Atlas Project `_
103 | * `United Nations `_
104 | * `Uruguay `_
105 | * `Vancouver, BC Open Data Catalog `_
106 | * `Victoria, BC, Canada `_
107 | * `Vienna, Austria `_
108 |
--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
1 | Awesome Public Datasets
2 | =======================
3 | .. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg
4 |    :alt: Awesome
5 |    :target: https://github.com/sindresorhus/awesome
6 |
7 | `This list of public data sources `_
8 | is collected and tidied from blogs, answers, and user responses.
9 | Most of the data sets listed below are free; however, some are not.
10 | Other amazingly awesome lists can be found in the
11 | `awesome-awesomeness `_ and
12 | `sindresorhus's awesome `_ list.
13 |
14 | .. contents:: Table of Contents
15 |
16 |
17 | Agriculture
18 | ------------
19 | * `U.S. Department of Agriculture's PLANTS Database `_
20 |
21 |
22 | Biology
23 | -------
24 |
25 | * `1000 Genomes `_
26 | * `American Gut (Microbiome Project) `_
27 | * `Broad Cancer Cell Line Encyclopedia (CCLE) `_
28 | * `Broad Bioimage Benchmark Collection (BBBC) `_
29 | * `Cell Image Library `_
30 | * `Complete Genomics Public Data `_
31 | * `EBI ArrayExpress `_
32 | * `EBI Protein Data Bank in Europe `_
33 | * `Electron Microscopy Pilot Image Archive (EMPIAR) `_
34 | * `ENCODE project `_
35 | * `Ensembl Genomes `_
36 | * `Gene Expression Omnibus (GEO) `_
37 | * `Gene Ontology (GO) `_
38 | * `Global Biotic Interactions (GloBI) `_
39 | * `Harvard Medical School (HMS) LINCS Project `_
40 | * `Human Genome Diversity Project `_
41 | * `Human Microbiome Project (HMP) `_
42 | * `ICOS PSP Benchmark `_
43 | * `International HapMap Project `_
44 | * `Journal of Cell Biology DataViewer `_
45 | * `MIT Cancer Genomics Data `_
46 | * `NCBI Proteins `_
47 | * `NCBI Taxonomy `_
48 | * `NCI Genomic Data Commons `_
49 | * `NIH Microarray data `_ or `FTP `_ (see FTP link on `RAW `_)
50 | * `OpenSNP genotypes data `_
51 | * `Pathguid - Protein-Protein Interactions Catalog `_
52 | * `Protein Data Bank `_
53 | * `Psychiatric Genomics Consortium `_
54 | * `PubChem Project `_
55 | * `PubGene (now Coremine Medical) `_
56 | * `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) `_
57 | * `Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) `_
58 | * `Sequence Read Archive (SRA) `_
59 | * `Stanford Microarray Data `_
60 | * `Stowers Institute Original Data Repository `_
61 | * `Systems Science of Biological Dynamics (SSBD) Database `_
62 | * `The Cancer Genome Atlas (TCGA), available via Broad GDAC `_
63 | * `The Catalogue of Life `_
64 | * `The Personal Genome Project `_ or `PGP `_
65 | * `UCSC Public Data `_
66 | * `Universal Protein Resource (UniProt) `_
67 | * `UniGene `_
68 |
69 |
70 | Climate/Weather
71 | ---------------
72 | * `Actuaries Climate Index `_
73 | * `Australian Weather `_
74 | * `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system `_
75 | * `Brazilian Weather - Historical data (In Portuguese) `_
76 | * `Canadian Meteorological Centre `_
77 | * `Climate Data from UEA (updated monthly) `_
78 | * `European Climate Assessment & Dataset `_
79 | * `Global Climate Data Since 1929 `_
80 | * `NASA Global Imagery Browse Services `_
81 | * `NOAA Bering Sea Climate `_
82 | * `NOAA Climate Datasets `_
83 | * `NOAA Realtime Weather Models `_
84 | * `NOAA SURFRAD Meteorology and Radiation Datasets `_
85 | * `The World Bank Open Data Resources for Climate Change `_
86 | * `UEA Climatic Research Unit `_
87 | * `WorldClim - Global Climate Data `_
88 | * `WU Historical Weather Worldwide `_
89 |
90 |
91 | Complex Networks
92 | ----------------
93 |
94 | * `AMiner Citation Network Dataset `_
95 | * `CrossRef DOI URLs `_
96 | * `DBLP Citation dataset `_
97 | * `NBER Patent Citations `_
98 | * `Network Repository with Interactive Exploratory Analysis Tools `_
99 | * `NIST complex networks data collection `_
100 | * `Protein-protein interaction network `_
101 | * `PyPI and Maven Dependency Network `_
102 | * `Scopus Citation Database `_
103 | * `Small Network Data `_
104 | * `Stanford GraphBase (Steven Skiena) `_
105 | * `Stanford Large Network Dataset Collection `_
106 | * `Stanford Longitudinal Network Data Sources `_
107 | * `The Koblenz Network Collection `_
108 | * `The Laboratory for Web Algorithmics (UNIMI) `_
109 | * `The Nexus Network Repository `_
110 | * `UCI Network Data Repository `_
111 | * `UFL sparse matrix collection `_
112 | * `WSU Graph Database `_
113 | * `DIMACS Road Networks Collection `_
114 |
115 | Computer Networks
116 | -----------------
117 |
118 | * `3.5B Web Pages from CommonCrawl 2012 `_
119 | * `53.5B Web clicks of 100K users in Indiana Univ. `_
120 | * `CAIDA Internet Datasets `_
121 | * `ClueWeb09 - 1B web pages `_
122 | * `ClueWeb12 - 733M web pages `_
123 | * `CommonCrawl Web Data over 7 years `_
124 | * `CRAWDAD Wireless datasets from Dartmouth Univ. `_
125 | * `Criteo click-through data `_
126 | * `OONI: Open Observatory of Network Interference - Internet censorship data `_
127 | * `Open Mobile Data by MobiPerf `_
128 | * `Rapid7 Sonar Internet Scans `_
129 | * `UCSD Network Telescope, IPv4 /8 net `_
130 |
131 |
132 | Contextual Data
133 | ---------------
134 |
135 | * `Context-aware data sets from five domains `_
136 |
137 |
138 | Data Challenges
139 | ---------------
140 |
141 | * `Challenges in Machine Learning `_
142 | * `CrowdANALYTIX dataX `_
143 | * `D4D Challenge of Orange `_
144 | * `DrivenData Competitions for Social Good `_
145 | * `ICWSM Data Challenge (since 2009) `_
146 | * `Kaggle Competition Data `_
147 | * `KDD Cup by Tencent 2012 `_
148 | * `Localytics Data Visualization Challenge `_
149 | * `Netflix Prize `_
150 | * `Space Apps Challenge `_
151 | * `Telecom Italia Big Data Challenge `_
152 | * `Yelp Dataset Challenge `_
153 | * `Bruteforce Database `_
154 | * `TravisTorrent Dataset - MSR'2017 Mining Challenge `_
155 |
156 | Earth Science
157 | -------------
158 |
159 | * `AQUASTAT - Global water resources and uses `_
160 | * `BODC - marine data of ~22K vars `_
161 | * `Earth Models `_
162 | * `EOSDIS - NASA's earth observing system data `_
163 | * `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ or `on S3 `_
164 | * `Marinexplore - Open Oceanographic Data `_
165 | * `Smithsonian Institution Global Volcano and Eruption Database `_
166 | * `USGS Earthquake Archives `_
167 |
168 |
169 | Economics
170 | ---------
171 |
172 | * `American Economic Association (AEA) `_
173 | * `EconData from UMD `_
174 | * `Economic Freedom of the World Data `_
175 | * `Historical Macroeconomic Statistics `_
176 | * `International Economics Database `_ and `various data tools `_
177 | * `International Trade Statistics `_
178 | * `Internet Product Code Database `_
179 | * `Joint External Debt Data Hub `_
180 | * `Jon Haveman International Trade Data Links `_
181 | * `OpenCorporates Database of Companies in the World `_
182 | * `Our World in Data `_
183 | * `SciencesPo World Trade Gravity Datasets `_
184 | * `The Atlas of Economic Complexity `_
185 | * `The Center for International Data `_
186 | * `The Observatory of Economic Complexity `_
187 | * `UN Commodity Trade Statistics `_
188 | * `UN Human Development Reports `_
189 |
190 |
191 | Education
192 | ------------
193 |
194 | * `College Scorecard Data `_
195 | * `Student Data from Free Code Camp `_
196 |
197 |
198 | Energy
199 | ------
200 |
201 | * `AMPds `_
202 | * `BLUEd `_
203 | * `COMBED `_
204 | * `Dataport `_
205 | * `DRED `_
206 | * `ECO `_
207 | * `EIA `_
208 | * `HES `_ - Household Electricity Study, UK
209 | * `HFED `_
210 | * `iAWE `_
211 | * `PLAID `_ - the Plug Load Appliance Identification Dataset
212 | * `REDD `_
213 | * `Tracebase `_
214 | * `UK-DALE `_ - UK Domestic Appliance-Level Electricity
215 | * `WHITED `_
216 |
217 |
218 |
219 | Finance
220 | -------
221 |
222 | * `CBOE Futures Exchange `_
223 | * `Google Finance `_
224 | * `Google Trends `_
225 | * `NASDAQ `_
226 | * `OANDA `_
227 | * `OSU Financial data `_
228 | * `Quandl `_
229 | * `St Louis Federal `_
230 | * `Yahoo Finance `_
231 | * `NYSE Market Data `_ (see FTP link on `RAW `_)
232 |
233 |
234 | GIS
235 | ---
236 |
237 | * `ArcGIS Open Data portal `_
238 | * `Cambridge, MA, US, GIS data on GitHub `_
239 | * `Factual Global Location Data `_
240 | * `Geo Spatial Data from ASU `_
241 | * `Geo Wiki Project - Citizen-driven Environmental Monitoring `_
242 | * `GeoFabrik - OSM data extracted to a variety of formats and areas `_
243 | * `GeoNames Worldwide `_
244 | * `Global Administrative Areas Database (GADM) `_
245 | * `Homeland Infrastructure Foundation-Level Data `_
246 | * `Landsat 8 on AWS `_
247 | * `List of all countries in all languages `_
248 | * `National Weather Service GIS Data Portal `_
249 | * `Natural Earth - vectors and rasters of the world `_
250 | * `OpenAddresses `_
251 | * `OpenStreetMap (OSM) `_
252 | * `Pleiades - Gazetteer and graph of ancient places `_
253 | * `Reverse Geocoder using OSM data `_ & `additional high-resolution data files `_
254 | * `TIGER/Line - U.S. boundaries and roads `_
255 | * `TwoFishes - Foursquare's coarse geocoder `_
256 | * `TZ Timezones shapefiles `_
257 | * `UN Environmental Data `_
258 | * `World boundaries from the U.S. Department of State `_
259 | * `World countries in multiple formats `_
260 |
261 |
262 | Government
263 | ----------
264 |
265 | * `OpenDataSoft's list of 1,600 open data portals `_
266 | * `Open Data for Africa `_
267 | * `A list of cities and countries contributed by community `_
268 |
269 |
270 | Healthcare
271 | ----------
272 |
273 | * `EHDP Large Health Data Sets `_
274 | * `Gapminder World demographic databases `_
275 | * `Medicare Coverage Database (MCD), U.S. `_
276 | * `Medicare Data Engine of medicare.gov Data `_
277 | * `Medicare Data File `_
278 | * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_
279 | * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_
280 | * `Open-ODS (structure of the UK NHS) `_
281 | * `OpenPaymentsData, Healthcare financial relationship data `_
282 | * `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_
283 | * `World Health Organization Global Health Observatory `_
284 |
285 |
286 | Image Processing
287 | ----------------
288 |
289 | * `10k US Adult Faces Database `_
290 | * `2GB of Photos of Cats `_ or `Archive version `_
291 | * `Affective Image Classification `_
292 | * `Animals with attributes `_
293 | * `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_
294 | * `Face Recognition Benchmark `_
295 | * `ImageNet (in WordNet hierarchy) `_
296 | * `Indoor Scene Recognition `_
297 | * `International Affective Picture System, UFL `_
298 | * `Massive Visual Memory Stimuli, MIT `_
299 | * `MNIST database of handwritten digits, 70,000 examples `_
300 | * `Several Shape-from-Silhouette Datasets `_
301 | * `Stanford Dogs Dataset `_
302 | * `SUN database, MIT `_
303 | * `The Oxford-IIIT Pet Dataset `_
304 | * `YouTube Faces Database `_
305 | * `Adience Unfiltered faces for gender and age classification `_
306 | * `The Action Similarity Labeling (ASLAN) Challenge `_
307 | * `Violent-Flows - Crowd Violence / Non-violence Database and benchmark `_
308 | * `Visual Genome `_
309 |
310 | Machine Learning
311 | ----------------
312 |
313 | * `Delve Datasets for classification and regression (Univ. of Toronto) `_
314 | * `Discogs Monthly Data `_
315 | * `eBay Online Auctions (2012) `_
316 | * `IMDb Database `_
317 | * `Keel Repository for classification, regression and time series `_
318 | * `Labeled Faces in the Wild (LFW) `_
319 | * `Lending Club Loan Data `_
320 | * `Machine Learning Data Set Repository `_
321 | * `Million Song Dataset `_
322 | * `More Song Datasets `_
323 | * `New Yorker caption contest ratings `_
324 | * `MovieLens Data Sets `_
325 | * `RDataMining - "R and Data Mining" ebook data `_
326 | * `Registered Meteorites on Earth `_
327 | * `Restaurants Health Score Data in San Francisco `_
328 | * `UCI Machine Learning Repository `_
329 | * `Yahoo! Ratings and Classification Data `_
330 | * `YouTube-8M `_
331 |
332 |
333 | Museums
334 | -------
335 |
336 | * `Canada Science and Technology Museums Corporation's Open Data `_
337 | * `Cooper-Hewitt's Collection Database `_
338 | * `Minneapolis Institute of Arts metadata `_
339 | * `Natural History Museum (London) Data Portal `_
340 | * `Rijksmuseum Historical Art Collection `_
341 | * `Tate Collection metadata `_
342 | * `The Getty vocabularies `_
343 |
344 |
345 | Natural Language
346 | ----------------
347 |
348 | * `Blogger Corpus `_
349 | * `CLiPS Stylometry Investigation Corpus `_
350 | * `ClueWeb09 FACC `_
351 | * `ClueWeb12 FACC `_
352 | * `DBpedia - 4.58M things with 583M facts `_
353 | * `Flickr Personal Taxonomies `_
354 | * `Freebase.com of people, places, and things `_
355 | * `Google Books Ngrams (2.2TB) `_
356 | * `Google MC-AFP, generated based on the publicly available Gigaword dataset using Paragraph Vectors `_
357 | * `Google Web 5gram (1TB, 2006) `_
358 | * `Gutenberg eBooks List `_
359 | * `Hansards text chunks of Canadian Parliament `_
360 | * `Machine Comprehension Test (MCTest) of text from Microsoft Research `_
361 | * `Machine Translation of European languages `_
362 | * `Multi-Domain Sentiment Dataset (version 2.0) `_
363 | * `Microsoft MAchine Reading COmprehension Dataset (or MS MARCO) `_
364 | * `Personae Corpus `_
365 | * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_
366 | * `SMS Spam Collection in English `_
367 | * `USENET postings corpus of 2005~2011 `_
368 | * `Wikidata - Wikipedia databases `_
369 | * `Wikipedia Links data - 40 Million Entities in Context `_
370 | * `Universal Dependencies `_
371 | * `WordNet databases and tools `_
372 | * `Open Multilingual Wordnet `_
373 | * `Automatic Keyphrase Extraction `_
374 |
375 |
376 | Neuroscience
377 | -------------
378 |
379 | * `Allen Institute Datasets `_
380 | * `Brain Catalogue `_
381 | * `Brainomics `_
382 | * `CodeNeuro Datasets `_
383 | * `Collaborative Research in Computational Neuroscience (CRCNS) `_
384 | * `FCP-INDI `_
385 | * `Human Connectome Project `_
386 | * `NDAR `_
387 | * `NIMH Data Archive `_
388 | * `NeuroData `_
389 | * `OASIS `_
390 | * `OpenfMRI `_
391 | * `Neuroelectro `_
392 | * `Study Forrest `_
393 |
394 |
395 | Physics
396 | -------
397 |
398 | * `CERN Open Data Portal `_
399 | * `Crystallography Open Database `_
400 | * `NASA Exoplanet Archive `_
401 | * `NSSDC (NASA) data of 550 spacecraft `_
402 | * `Sloan Digital Sky Survey (SDSS) - Mapping the Universe `_
403 |
404 |
405 | Psychology/Cognition
406 | --------------------
407 |
408 | * `OSU Cognitive Modeling Repository Datasets `_
409 |
410 |
411 | Public Domains
412 | --------------
413 |
414 | * `Amazon `_
415 | * `Archive-it from Internet Archive `_
416 | * `Archive.org Datasets `_
417 | * `CMU JASA data archive `_
418 | * `CMU StatLab collections `_
419 | * `Data360 `_
420 | * `Datamob.org `_
421 | * `Data.World `_
422 | * `Google `_
423 | * `Infochimps `_
424 | * `KDNuggets Data Collections `_
425 | * `Microsoft Azure Data Market Free DataSets `_
426 | * `Microsoft Data Science for Research `_
427 | * `Numbrary `_
428 | * `Open Library Data Dumps `_
429 | * `Reddit Datasets `_
430 | * `RevolutionAnalytics Collection `_
431 | * `Sample R data sets `_
432 | * `Stats4Stem R data sets `_
433 | * `StatSci.org `_
434 | * `The Washington Post List `_
435 | * `UCLA SOCR data collection `_
436 | * `UFO Reports `_
437 | * `Wikileaks 911 pager intercepts `_
438 | * `Yahoo Webscope `_
439 |
440 |
441 | Search Engines
442 | --------------
443 |
444 | * `Academic Torrents of data sharing from UMB `_
445 | * `Datahub.io `_
446 | * `DataMarket (Qlik) `_
447 | * `Harvard Dataverse Network of scientific data `_
448 | * `ICPSR (UMICH) `_
449 | * `Institute of Education Sciences `_
450 | * `National Technical Reports Library `_
451 | * `Open Data Certificates (beta) `_
452 | * `OpenDataNetwork - A search engine of all Socrata powered data portals `_
453 | * `Statista.com - statistics and studies `_
454 | * `Zenodo - An open dependable home for the long-tail of science `_
455 |
456 |
457 | Social Networks
458 | ---------------
459 |
460 | * `72 hours #gamergate Twitter Scrape `_
461 | * `Ancestry.com Forum Dataset over 10 years `_
462 | * `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_
463 | * `CMU Enron Email of 150 users `_
464 | * `EDRM Enron EMail of 151 users, hosted on S3 `_
465 | * `Facebook Data Scrape (2005) `_
466 | * `Facebook Social Networks from LAW (since 2007) `_
467 | * `Foursquare from UMN/Sarwat (2013) `_
468 | * `GitHub Collaboration Archive `_
469 | * `Google Scholar citation relations `_
470 | * `High-Resolution Contact Networks from Wearable Sensors `_
471 | * `Mobile Social Networks from UMASS `_
472 | * `Network Twitter Data `_
473 | * `Reddit Comments `_
474 | * `Skytrax' Air Travel Reviews Dataset `_
475 | * `Social Twitter Data `_
476 | * `SourceForge.net Research Data `_
477 | * `Twitter Data for Sentiment Analysis `_
478 | * `Twitter Data for Online Reputation Management `_
479 | * `Twitter Graph of entire Twitter site `_
480 | * `Twitter Scrape Calufa May 2011 `_
481 | * `UNIMI/LAW Social Network Datasets `_
482 | * `Yahoo! Graph and Social Data `_
483 | * `YouTube Video Social Graph in 2007, 2008 `_
484 |
485 |
486 | Social Sciences
487 | ---------------
488 |
489 | * `ACLED (Armed Conflict Location & Event Data Project) `_
490 | * `Canadian Legal Information Institute `_
491 | * `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc. `_
492 | * `Correlates of War Project `_
493 | * `Cryptome Conspiracy Theory Items `_
494 | * `Datacards `_
495 | * `European Social Survey `_
496 | * `FBI Hate Crime 2013 - aggregated data `_
497 | * `Fragile States Index `_
498 | * `GDELT Global Events Database `_
499 | * `General Social Survey (GSS) since 1972 `_
500 | * `German Social Survey `_
501 | * `Global Religious Futures Project `_
502 | * `Humanitarian Data Exchange `_
503 | * `INFORM Index for Risk Management `_
504 | * `Institute for Demographic Studies `_
505 | * `International Networks Archive `_
506 | * `International Social Survey Program ISSP `_
507 | * `International Studies Compendium Project `_
508 | * `James McGuire Cross National Data `_
509 | * `MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_
510 | * `Minnesota Population Center `_
511 | * `MIT Reality Mining Dataset `_
512 | * `Notre Dame Global Adaptation Index (ND-GAIN) `_
513 | * `Open Crime and Policing Data in England, Wales and Northern Ireland `_
514 | * `Paul Hensel General International Data Page `_
515 | * `PewResearch Internet Survey Project `_
516 | * `PewResearch Society Data Collection `_
517 | * `Political Polarity Data `_
518 | * `StackExchange Data Explorer `_
519 | * `Terrorism Research and Analysis Consortium `_
520 | * `Texas Inmates Executed Since 1984 `_
521 | * `Titanic Survival Data Set `_ or `on Kaggle `_
522 | * `UCB's Archive of Social Science Data (D-Lab) `_
523 | * `Uppsala Conflict Data Program `_
524 | * `UCLA Social Sciences Data Archive `_
525 | * `UN Civil Society Database `_
526 | * `Universities Worldwide `_
527 | * `UPJOHN for Labor Employment Research `_
528 | * `World Bank Open Data