├── commercial-data.md
├── LICENSE
└── README.md


/commercial-data.md:
--------------------------------------------------------------------------------
## Commercial, Non-Open, or Limited Data Sources

### Data Set Vendors

* [UPenn Linguistic Data Consortium](https://catalog.ldc.upenn.edu/topten)

### Data Sets

* [Economic Time Series Page](http://www.economagic.com/popular.htm)
* [Lending Club Loan Data](https://www.lendingclub.com/info/download-data.action)
* [Natural Language Corpus Data](http://norvig.com/ngrams/) (Peter Norvig)

### APIs

* [Twitter](https://dev.twitter.com/docs/api/1.1) ^

^ _requires registration_

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2014 The Open Source Data Science Masters

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Open Data Sources

* _**Availability and access**: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form._
* _**Reuse and redistribution**: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable._
* _**Universal participation**: everyone must be able to use, reuse and redistribute — there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g.
only in education), are not allowed._

-- _Definition by the [Open Knowledge Foundation](https://okfn.org/opendata/)_

### Lists of Data Sets
* [Interesting Data Sets for Statisticians](http://rs.io/100-interesting-data-sets-for-statistics/) - editorialized, entertaining set of open data

### Open Data

* [List of Public Datasets](https://github.com/caesar0301/awesome-public-datasets) - user-curated
* [DBpedia](http://wiki.dbpedia.org/Datasets) - utilizing a large multi-domain ontology
* [Public Data Sets on AWS](https://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1) - common web crawl corpus, NASA satellite imagery, Human Genome, Google Book NGrams, Wikipedia Traffic, Million Song Dataset, Federal Reserve Economic Data, PubChem, more.

### Private Opened Data
* [New York Times](http://data.nytimes.com/) - vocabulary as linked open data; linked vocabulary of people, places, companies, etc.

### Governmental Data

[Compendium of Governmental Open Data Sources](http://datacatalogs.org/)

* [Data.gov (USA)](http://www.data.gov/)
* [Africa Open Data](http://africaopendata.org/dataset)
* [US Census](http://www.census.gov/data/developers/data-sets.html) - Population Estimates and Projections, Nonemployer Statistics and County Business Patterns, Economic Indicators Time Series, more.

### Non-Governmental Org Data

* [The World Bank](http://data.worldbank.org/topic/private-sector) - business regulation measures, company-level data in emerging markets, household consumption patterns, World Development Indicators, World Bank finances
* ^[Pew Research Center's Internet Project](http://www.pewinternet.org/datasets/pages/3/)

### Academic Data

[Inter-university Consortium for Political and Social Research Data Portal](http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/subject.jsp)

* [Surveys of Economic Attitudes and Behavior](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies?classification=ICPSR.IV.B.)
* [Continuing Series of Consumer Surveys](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies?classification=ICPSR.IV.A.)
* [Historical and Contemporary Economic Processes and Indicators](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies?classification=ICPSR.IV.C.)

### Truly Random Data

* [200,000+ Jeopardy! Questions in a JSON file](http://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/)
* [10,000 annotated images of cats](http://137.189.35.203/WebUI/CatDatabase/catData.html)

## Open Data Resources

* reddit [r/datasets](http://www.reddit.com/r/datasets/)
* [Open Data - Stack Exchange](http://opendata.stackexchange.com/) (discussion)

^ _license is not truly open, involves some limitations_

--------------------------------------------------------------------------------