├── README.md
├── data
    ├── aliases.json
    ├── allCountriesPost2010-2014Filtered15-150.json
    ├── featuresKept.json
    ├── labeled_claims
    │   ├── consumer_price_index_claims.xlsx
    │   ├── cpi_inflation_rate_claims.xlsx
    │   ├── diesel_price_liter_claims.xlsx
    │   ├── fertility_rate_claims.xlsx
    │   ├── gdp_growth_rate_claims.xlsx
    │   ├── gdp_nominal_claims.xlsx
    │   ├── gdp_nominal_per_capita_claims.xlsx
    │   ├── gni_in_ppp_dollars_claims.xlsx
    │   ├── gni_per_capita_in_ppp_dollars_claims.xlsx
    │   ├── health_expenditure_as_percent_of_gdp_claims.xlsx
    │   ├── internet_users_percent_population_claims.xlsx
    │   ├── life_expectancy_claims.xlsx
    │   ├── population_claims.xlsx
    │   ├── population_growth_rate_claims.xlsx
    │   ├── prevalence_of_undernourisment_claims.xlsx
    │   └── renewable_freshwater_per_capita_claims.xlsx
    ├── locationNames
    ├── propertiesOfInterest.json
    ├── test.json
    ├── theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json
    └── train.json
└── src
    ├── main
        ├── abstractPredictor.py
        ├── baselinePredictor.py
        ├── buildMatrix.py
        ├── factChecker.py
        ├── fixedValuePredictor.py
        └── matrixFiltering.py
    └── utils
        ├── FreeBaseDownload.py
        ├── bingDownload.py
        ├── dataFiltering.py
        ├── dataSplits.py
        ├── htmlPageDownload.py
        ├── numberExtraction.py
        └── obtainAliases.py

/README.md:
--------------------------------------------------------------------------------
1 | # Simple Numerical Fact-Checker
2 | Fact-checker for simple claims about statistical properties
3 | 
4 | This repository contains the code and data needed to reproduce the results of the paper:
5 | 
6 | Identification and Verification of Simple Claims about Statistical Properties
7 | 
8 | Andreas Vlachos and Sebastian Riedel, EMNLP 2015
9 | 
10 | Preprocessing:
11 | 
12 | 1. FreeBaseDownload.py: to get the JSONs for all statistical regions in Freebase.
13 | 2. numberExtraction.py: to extract the most recent numbers mentioned for each statistical region, in triple form: region-property-value.
14 | 3. dataFiltering.py: to keep the countries and properties with the most values filled in (there are 2 parameters to tune). This produces the file data/allCountriesPost2010-2014Filtered15-150.json.
15 | 4. bingDownload.py: to run queries of the form "region + property" on Bing and get back JSONs with the links.
16 | 5. htmlPageDownload.py: to download the HTML from the links.
17 | 6. obtainAliases.py: to get the common aliases of the statistical regions considered, which are needed later for the matrix filtering. This produces the file data/aliases.json.
18 | 
19 | Then we run the following Java programs from HTML2Stanford:
20 | HTML2Text (requires the BoilerPipe jar)
21 | Text2Parsed2JSON (make sure to use the CollapsedCCProcessed dependencies; ideally use a recent version of Stanford CoreNLP (>3.5), which outputs JSON directly)
22 | 
23 | This gives a large number of HTML pages, converted to text and parsed with Stanford CoreNLP.
24 | 
25 | And then:
26 | 
27 | 1. buildMatrix.py: processes the parsed pages and builds a JSON file containing a dictionary from patterns (strings or lexicalized dependencies) to countries/locations and then to values.
28 | 2. matrixFiltering.py: filters the values and patterns of the matrix from the previous step, removing patterns without enough entries and those whose entries deviate too much to be sensibly averaged. It also uses the aliases to merge the values found under different names for the locations used in the experiments. This produces the file data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json.
29 | 
30 | 3. Split the Freebase data (data/allCountriesPost2010-2014Filtered15-150.json) into training/dev (data/train.json) and test (data/test.json).
31 | 
32 | 4. To reproduce the IE-style evaluation results:
33 | 
34 | - informedGuess:
35 | 
36 | ```python src/main/fixedValuePredictor.py data/train.json data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json data/test.json out/informedGuess```
37 | 
38 | - unadjustedMAPE:
39 | 
40 | ```python src/main/baselinePredictor.py data/train.json data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json data/test.json out/unadjustedMAPE FALSE```
41 | 
42 | - adjustedMAPE:
43 | 
44 | ```python src/main/baselinePredictor.py data/train.json data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json data/test.json out/adjustedMAPE TRUE```
45 | 
46 | To run the fact-checker on the HTML pages obtained from the web:
47 | 
48 | First create a directory for the output:
49 | 
50 | ```mkdir out```
51 | 
52 | Then run:
53 | 
54 | ```python src/main/factChecker.py data/allCountriesPost2010-2014Filtered15-150.json data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json population 0.03125 data/htmlPages2textPARSEDALL data/locationNames data/aliases.json out/population.tsv```
55 | 
56 | The directory data/htmlPages2textPARSEDALL is not on GitHub due to its size (1.6 GB compressed), but feel free to ask me for it.
57 | 
58 | This is run for each of the 16 properties independently. The adjusted-MAPE parameter used in the paper (the value after the property name in the command above) was chosen based on the IE experiments. The setting for each property is:
59 | 
60 | - gni_per_capita_in_ppp_dollars: 8
61 | - gdp_nominal: 0.03125
62 | - internet_users_percent_population: 2
63 | - cpi_inflation_rate: 2
64 | - health_expenditure_as_percent_of_gdp: 2
65 | - gdp_growth_rate: 1
66 | - fertility_rate: 0.5
67 | - consumer_price_index: 1
68 | - prevalence_of_undernourisment: 32
69 | - gni_in_ppp_dollars: 16
70 | - population_growth_rate: 0.0078125
71 | - diesel_price_liter: 2
72 | - life_expectancy: 1
73 | - population: 0.03125
74 | - gdp_nominal_per_capita: 16
75 | - renewable_freshwater_per_capita: 8
76 | 
77 | The output for each property is a .tsv file that can be loaded into Excel; we did this and labeled the claims. The files from which the results in Table 2 were obtained are in data/labeled_claims.
78 | 
--------------------------------------------------------------------------------
/data/aliases.json:
--------------------------------------------------------------------------------
1 | {"Canada": ["Canuckistan", "Dominion of Canada"], "Turkmenistan": [], "Montenegro": ["Montenegr"], "Lithuania": ["Litauen", "Lietuva", "Republic of Lithuania", "Lietuvos Respublika"], "Cambodia": [], "Ethiopia": ["Federal Democratic Republic of Ethiopia"], "Aruba": [], "Sri Lanka": ["Ceylon", "Democratic Socialist Republic of Sri Lanka"], "Swaziland": [], "Argentina": ["Agrentina ", "Argentine Republic", "The Argentine"], "Bolivia": ["The Plurinational State of Bolivia"], "Cameroon": ["Republic of Cameroon", "R\u00e9publique du Cameroun", "Africa in miniature"], "Burkina Faso": [], "Bahrain": ["Kingdom of Bahrain"], "Saudi Arabia": ["Kingdom of Saudi Arabia"], "Cape Verde": ["Republic of Cape Verde"], "Slovenia": ["Slovania", "Republic of Slovenia", "Repubblica di Slovenia"], "Guatemala": ["Guatemala."], "Bosnia and Herzegovina": ["Bosnia", "Bosnia y Herzegovina", "Bosnia-Herzegovina", "BiH"], "Kuwait": [], "Dominica": ["Commonwealth of Dominica"], "Australia": ["Down Under"], "Liberia": [], 
"Maldives": ["Republic of Maldives", "Maldive Islands", "Republic of the Maldives"], "Oman": ["Sultanate of Oman", "\u0639\u064f\u0645\u0627\u0646", "\u0633\u0644\u0637\u0646\u0629 \u0639\u064f\u0645\u0627\u0646\u200e"], "C\u00f4te d\u2019Ivoire": ["Cote d'Ivoire", "Ivory Coast", "Republic of C\u00f4te d'Ivoire", "C\u221a\u00a5te d\u201a\u00c4\u00f4Ivoire"], "Gabon": ["The Gabonese Republic"], "New Zealand": ["Aotearoa", "God's own", "NZ", "Land of the long white cloud"], "Yemen": ["El-\u1e24odeidah", "Fourth Governorate", "Baladiyat `Adan", "Qa\u1e11\u0101\u2019 Al \u1e28udaydah", "Iv", "Al Mu\u1e29\u0101faz\u0327Ah Ar R\u0101bi\u2018Ah", "Baladiyatadan", "Hodeida", "Muhafazat Shabwah", "Governorate Al Jawf", "Al Hudaydah", "Elhodeidah", "Al Muhafazah Ar Rabi`Ah", "Mu\u1e29\u0101faz\u0327At Al Jawf", "Muhafazat Al Jawf", "Governorate Number Four", "Hodeidah", "Almuhafazaharrabiah", "Al \u1e28udaydah", "Shabwah", "`Adan", "Aden", "\u2018Adan", "Al Jawf", "Balad\u012byat \u2018Adan", "Qada' Al Hudaydah", "El-Hodeidah", "Mu\u1e29\u0101faz\u0327At Shabwah", "Adan", "Qadaalhudaydah"], "Pakistan": ["Federation of Pakistan", "Islamic Republic of Pakistan"], "Albania": ["Republic of Albania"], "Samoa": ["Western Samoa", "Independent State of Samoa", "S\u0101moa", "Malo Sa'oloto Tuto'atasi o S\u0101moa"], "Macau": ["Macao", "Macao Special Administrative Region of the People's Republic of China", "Macao SAR"], "United Arab Emirates": ["UAE", "Emirates"], "Uruguay": ["Eastern Republic of the Uruguay", "Oriental Republic of Uruguay", "Republic East of the Uruguay", "Eastern Republic of Uruguay"], "India": ["Bharat", "Hindustan", "The Republic of India", "Republic of India", "Bharat Ganrajya"], "Azerbaijan": ["Republic of Azerbaijan", "Az\u0259rbaycan Respublikas\u0131"], "Lesotho": [], "Saint Vincent and the Grenadines": ["SVG", "St. 
Vincent and the Grenadines"], "Kenya": ["Republic of Kenya"], "South Korea": ["Republic of Korea", "ROK", "Daehan Minguk", "Korea"], "Tajikistan": [], "Afghanistan": ["Islamic Republic of Afghanistan", "Afganistan"], "Bangladesh": [], "Eritrea": ["State of Eritrea"], "Solomon Islands": [], "Saint Lucia": ["St. Lucia"], "Cyprus": [], "Mongolia": [], "France": ["R\u00e9publique fran\u00e7aise", "French Republic", "l'Hexagone"], "Rwanda": [], "Slovakia": ["Slovensko", "The Slovak Republic"], "Vanuatu": [], "Norway": ["Norge", "Kingdom of Norway"], "Malawi": [], "Benin": [], "Federated States of Micronesia": [], "Singapore": ["Singapura", "Republic of Singapore"], "United States of America": ["America", "U.S.", "USA", "United States", "US", "U.S.A.", "U.S. of A.", "Estats Units d'Am\u00e8rica", "The States"], "Saint Kitts and Nevis": ["St. Kitts and Nevis"], "Togo": [], "Armenia": ["Republic of Armenia"], "Timor-Leste": ["Timor Timur", "East Timor"], "Dominican Republic": ["Rep\u00fablica Dominicana"], "Ukraine": ["Ukraine Region"], "Ghana": ["Republic of Ghana"], "Tonga": ["Pule\u02bbanga Fakatu\u02bbi \u02bbo Tonga", "Kingdom of Tonga"], "Finland": ["Suomi", "Republic of Finland"], "Libya": [], "Indonesia": ["Republic of Indonesia", "The Republic of Indonesia"], "Central African Republic": [], "Mauritius": ["Republic of Mauritius"], "Sweden": ["Sweden, Maine"], "Vietnam": ["Socialist Republic of Vietnam", "Republic of Vietnam", "Annam", "Viet nam"], "Mali": [], "Russia": ["\u0420\u043e\u0441\u0441\u0438\u044f", "Russian Federation", "\u041e\u0440\u0434\u0430"], "Bulgaria": ["Republic of Bulgaria"], "Romania": ["Rom\u00e2nia"], "Angola": [], "Portugal": ["Portuguese Republic"], "South Africa": ["Republiek van Suid-Afrika", "Republic of South Africa"], "Fiji": [], "Qatar": ["State of Qatar", "Dawlat Qa\u1e6dar"], "Malaysia": [], "Austria": ["\u00d6sterreich", "Autriche", "Oesterreich", "Republic of Austria"], "Mozambique": ["Republica de Mocambique", "Mocambique", 
"Republic of Mozambique"], "Uganda": ["The Republic of Uganda", "Republic of Uganda"], "Japan": ["Nihon", "Nippon", "\u042f\u043f\u043e\u043d\u0438\u044f", "Nippon-koku", "Nihon-koku", "State of Japan", "Land of the Rising Sun", "Dai-Nippon", "NTSC J", "Japan", "JAP", "JPN"], "Niger": ["Republic of Niger"], "Brazil": ["Brasil", "Rep\u00fablica Federativa do Brasil", "Brazilian ", "Federative Republic of Brazil"], "Guinea": [], "Guyana": ["El Dorado"], "Costa Rica": ["Republic of Costa Rica"], "Republic of Ireland": ["\u00c9ire", "the twenty-six counties", "the 26 counties", "the Free State", "Irish Republic"], "Bahamas": ["Commonwealth of The Bahamas"], "Nigeria": ["Federal Republic of Nigeria"], "Ecuador": ["Republic of Ecuador", "Rep\u00fablica del Ecuador"], "Czech Republic": ["Bohemia"], "Brunei": ["Brunei Darussalam", "Nation of Brunei, the Abode of Peace"], "Belarus": ["Bellarussiya", "Republic of Belarus", "Bielaru\u015b", "Respublika Belarus\u2019"], "Iran": ["Islamic Republic of Iran", "Persia"], "Algeria": ["People's Democratic Republic of Algeria", "The People's Democratic Republic of Algeria"], "El Salvador": [], "Chile": ["Republic of Chile"], "Puerto Rico": ["Borinquen", "Commonwealth of Puerto Rico"], "Belgium": ["Belgi\u00eb", "Belgique", "Kingdom of Belgium"], "Thailand": ["Kingdom of Thailand", "Siam"], "Haiti": ["Republic of Haiti", "Ha\u00efti"], "Belize": [], "Hong Kong": ["Hong Kong Special Administrative Region", "Hongkong Island", "Hong Kon", "Hongkong", "Hong Kong Special Administrative Region of the People's Republic of China"], "Sierra Leone": [], "Georgia": ["Republic of Georgia"], "Gambia": [], "Philippines": ["Pearl of the Orient Seas", "philippines", "The Philippines", "Republic of the Philippines", "Republika ng Pilipinas", "\u30d5\u30a3\u30ea\u30d4\u30f3\u5171\u548c\u56fd", "Philippiness", "\ud544\ub9ac\ud540 \uacf5\ud654\uad6d"], "Moldova": ["Republic of Moldova", "Moldavia"], "Morocco": ["Kingdom of Morocco", "The Western 
Kingdom", "The West", "Regne del Marroc"], "Namibia": [], "Guinea-Bissau": [], "Kiribati": ["Republic of Kiribati", "Gilbert Islands"], "Switzerland": ["Svizzera", "Svizra", "Schweiz", "Swiss Confederation", "La Suisse", "Helvetia"], "Seychelles": ["Republic of Seychelles", "Repiblik Sesel", "R\u00e9publique des Seychelles"], "Chad": ["T\u0161\u0101d", "Tchad", "Republic of Chad", "R\u00e9publique du Tchad", "\u01e6umh\u016briyyat T\u0161\u0101d"], "Estonia": ["Estland", "Republic of Estonia"], "Kosovo": ["Kosovo i Metohija"], "Uzbekistan": ["Republic of Uzbekistan"], "Djibouti": ["Republic of Djibouti"], "Antigua and Barbuda": [], "Spain": ["Espa\u00f1a", "Kingdom of Spain", "Furija"], "Colombia": ["\u30b3\u30ed\u30f3\u30d3\u30a2\u5171\u548c\u56fd", "Republic of Colombia"], "Burundi": [], "Nicaragua": ["Rep\u00fablica de Nicaragua", "Republic of Nicaragua"], "Barbados": [], "Madagascar": ["Republic of Madagascar", "Malagasy Republic"], "Palau": ["Palau Islands"], "Bhutan": [], "Sudan": ["Republic of the Sudan", "North Sudan", "Jumh\u016br\u012byat as-S\u016bd\u0101n"], "Laos": ["Lao People's Democratic Republic", "\ub77c\uc624\uc2a4"], "Democratic Republic of the Congo": ["Zaire", "Democratic Republic of Congo", "DR Congo", "DROC", "Congo Kinshasa", "RDC", "Congo-Kinshasa"], "Netherlands": ["Holland", "Nederland", "The Netherlands", "Nederlands", "Hollanti"], "Suriname": ["Republic of Suriname", "Surinam"], "S\u00e3o Tom\u00e9 and Pr\u00edncipe": ["Sao Tome and Principe", "Democratic Republic of S\u00e3o Tom\u00e9 and Pr\u00edncipe"], "Venezuela": ["Bolivarian Republic of Venezuela"], "Israel": ["State of Israel"], "Iceland": ["\u00cdsland", "Republic of Iceland"], "Zambia": [], "Senegal": ["R\u00e9publique du S\u00e9n\u00e9gal", "Republic of Senegal"], "Papua New Guinea": [], "Zimbabwe": ["The Republic of Zimbabwe"], "Germany": ["Federal Republic of Germany", "Deutschland", "Bundesrepublik Deutschland", "BRD"], "Denmark": ["D\u00e4nemark", "\u4e39\u9ea6", 
"Kingdom of Denmark", "Kongeriget Danmark"], "Kazakhstan": [], "Tanzania": ["United Republic of Tanzania", "Jamhuri ya Muungano wa Tanzania"], "Mauritania": ["Islamic Republic of Mauritania"], "Kyrgyzstan": ["Kirgisistan"], "Iraq": [], "North Korea": ["Democratic People's Republic of Korea", "DPRK"], "Trinidad and Tobago": ["Trinidad & Tobago"], "Latvia": ["Lettland", "Republic of Latvia"], "Hungary": ["Magyarorsz\u00e1g", "Hungary, Europe"], "Croatia": ["Croacia", "Croatia/Hrvatska", "Croatie", "Croazia", "Crotaia", "Cro\u00e1cia", "Hirvatistan", "Hravatska", "Hrvatska", "ISO 3166-1:HR", "Kroatia", "Kroatien", "Republika Hrvatska", "Republic of Croatia"], "Syria": ["Syrian Arab Republic"], "Nepal": ["Federal Democratic Republic of Nepal", "Democratic Republic of Nepal"], "Honduras": ["Republic of Honduras", "Spanish Honduras", "Rep\u00fablica de Honduras"], "Myanmar": ["Republic of the Union of Myanmar", "Myanmar (Burma)", "\u30df\u30e3\u30f3\u30de\u30fc\u9023\u90a6\u5171\u548c\u56fd", "Burma"], "Equatorial Guinea": [], "Tunisia": [], "Republic of Macedonia": ["Macedonia", "Makedonija", "Republika Makedonija", "\u0420\u0435\u043f\u0443\u0431\u043b\u0438\u043a\u0430 \u041c\u0430\u043a\u0435\u0434\u043e\u043d\u0438\u0458\u0430", "Macedonia (FYROM)"], "Serbia": ["Republic of Serbia", "\u0420\u0435\u043f\u0443\u0431\u043b\u0438\u043a\u0430 \u0421\u0440\u0431\u0438\u0458\u0430", "Republika Srbija", "Srbija", "Serbijos Respublika"], "Botswana": [], "United Kingdom": ["Britain", "Great Britain", "UK", "The United Kingdom of Great Britain and Northern Ireland", "United Kingdom of Great Britain and Ireland", "U.K.", "United Kingdom of Great Britain and Northern Ireland", "United Kingdom of Great Britain", "GB", "GBR"], "Congo": ["Republic of Congo", "Congo Brazzaville", "Congo-Brazzaville"], "Greece": ["Hellenic Republic", "Hellas"], "Paraguay": ["Republic of Paraguay", "Coraz\u00f3n de Am\u00e9rica", "Heart of America", "Corazon de America"], "Earth": ["Gaia", "The 
World", "Terra", "\u05d0\u05e8\u05e5", "\u05db\u05d3\u05d5\u05e8 \u05d4\u05d0\u05e8\u05e5", "world", "\u0393\u03b7", "globe", "the Globe", "Planet Earth", "The Earth", "The Blue Planet", "Tellus"], "Comoros": ["Union of the Comoros", "The Comoros"]}
--------------------------------------------------------------------------------
/data/featuresKept.json:
--------------------------------------------------------------------------------
1 | ["/location/statistical_region/gni_per_capita_in_ppp_dollars", "/location/statistical_region/gdp_nominal", "/location/statistical_region/internet_users_percent_population", "/location/statistical_region/cpi_inflation_rate", "/location/statistical_region/health_expenditure_as_percent_of_gdp", "/location/statistical_region/gdp_growth_rate", "/location/statistical_region/fertility_rate", "/location/statistical_region/consumer_price_index", "/location/statistical_region/prevalence_of_undernourisment", "/location/statistical_region/gni_in_ppp_dollars", "/location/statistical_region/population_growth_rate", "/location/statistical_region/diesel_price_liter", "/location/statistical_region/life_expectancy", "/location/statistical_region/population", "/location/statistical_region/gdp_nominal_per_capita", "/location/statistical_region/renewable_freshwater_per_capita"]
--------------------------------------------------------------------------------
/data/labeled_claims/consumer_price_index_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/consumer_price_index_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/cpi_inflation_rate_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/cpi_inflation_rate_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/diesel_price_liter_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/diesel_price_liter_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/fertility_rate_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/fertility_rate_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gdp_growth_rate_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gdp_growth_rate_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gdp_nominal_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gdp_nominal_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gdp_nominal_per_capita_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gdp_nominal_per_capita_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gni_in_ppp_dollars_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gni_in_ppp_dollars_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gni_per_capita_in_ppp_dollars_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gni_per_capita_in_ppp_dollars_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/health_expenditure_as_percent_of_gdp_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/health_expenditure_as_percent_of_gdp_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/internet_users_percent_population_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/internet_users_percent_population_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/life_expectancy_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/life_expectancy_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/population_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/population_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/population_growth_rate_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/population_growth_rate_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/prevalence_of_undernourisment_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/prevalence_of_undernourisment_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/renewable_freshwater_per_capita_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/renewable_freshwater_per_capita_claims.xlsx
--------------------------------------------------------------------------------
/data/locationNames:
--------------------------------------------------------------------------------
1 | Saint Vincent and the Grenadines
2 | St. Vincent and the Grenadines
3 | São Tomé and Príncipe
4 | Sao Tome and Principe
5 | Democratic Republic of São Tomé and Príncipe
6 | Saint Kitts and Nevis
7 | St. Kitts and Nevis
8 | Antigua and Barbuda
9 | 
--------------------------------------------------------------------------------
/data/propertiesOfInterest.json:
--------------------------------------------------------------------------------
1 | {"/location/statistical_region/agriculture_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Agriculture (% of GDP)"}, "/location/statistical_region/greenhouse_gas_emission_intensity": {"expectedType": "/measurement_unit/dated_metric_tons_per_million_ppp_dollars", "name": "Greenhouse gas emission intensity"}, "/location/statistical_region/internet_users": {"expectedType": "/measurement_unit/dated_integer", "name": "Internet users"}, "/location/statistical_region/market_cap_of_listed_companies_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Market capitalization of listed companies (% of GDP)"}, "/location/statistical_region/diesel_price_liter": {"expectedType": "/measurement_unit/dated_money_value", "name": "Diesel price (per liter)"}, "/location/statistical_region/foreign_direct_investment_net_inflows": {"expectedType": "/measurement_unit/dated_money_value", "name": "Foreign direct investment, net inflows"}, "/location/statistical_region/gdp_deflator_change": {"expectedType": "/measurement_unit/dated_percentage", "name": "GDP deflator change"}, "/location/statistical_region/greenhouse_gas_emission": {"expectedType": "/measurement_unit/dated_metric_ton", "name": "Greenhouse gas emission"}, "/location/statistical_region/gdp_nominal": {"expectedType": "/measurement_unit/dated_money_value", "name": "GDP (nominal)"}, "/location/statistical_region/smoking_prevalence_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Smoking prevalence rate"}, 
"/location/statistical_region/lending_interest_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Lending interest rate"}, "/location/statistical_region/deposit_interest_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Deposit interest rate"}, "/location/statistical_region/co2_emissions_mobile": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - mobile"}, "/location/statistical_region/part_time_employment_percent": {"expectedType": "/measurement_unit/dated_percentage", "name": "Part time employment"}, "/location/statistical_region/prevalence_of_undernourisment": {"expectedType": "/measurement_unit/dated_percentage", "name": "Prevalence of undernourisment"}, "/location/statistical_region/consumer_price_index": {"expectedType": "/measurement_unit/dated_index_value", "name": "Consumer price index"}, "/location/statistical_region/co2_emissions_residential": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - residential"}, "/location/statistical_region/brain_drain_percent": {"expectedType": "/measurement_unit/dated_percentage", "name": "Emigration of university educated workforce"}, "/location/statistical_region/life_expectancy": {"expectedType": "/measurement_unit/dated_float", "name": "Life expectancy"}, "/location/statistical_region/gross_savings_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Gross Savings (% of GDP)"}, "/location/statistical_region/size_of_armed_forces": {"expectedType": "/measurement_unit/dated_integer", "name": "Size of armed forces"}, "/location/statistical_region/co2_emissions_industrial": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - industrial"}, "/location/statistical_region/natural_gas_production": {"expectedType": "/location/natural_gas_production", "name": "Natural gas production"}, "/location/statistical_region/internet_users_percent_population": {"expectedType": "/measurement_unit/dated_percentage", 
"name": "Internet users as percentage of population"}, "/location/statistical_region/cpi_inflation_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Inflation rate"}, "/location/statistical_region/rent50_2": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 2 br"}, "/location/statistical_region/rent50_3": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 3 br"}, "/location/statistical_region/rent50_0": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 0 br"}, "/location/statistical_region/rent50_1": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 1 br"}, "/location/statistical_region/global_competitiveness_index": {"expectedType": "/measurement_unit/dated_float", "name": "Global Competitiveness Index"}, "/location/statistical_region/energy_use_per_capita": {"expectedType": "/measurement_unit/dated_kgoe", "name": "Energy use per capita"}, "/location/statistical_region/rent50_4": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 4 br"}, "/location/statistical_region/oil_production": {"expectedType": "/location/oil_production", "name": "Oil production"}, "/location/statistical_region/greenhouse_gas_emissions_per_capita": {"expectedType": "/measurement_unit/dated_metric_ton", "name": "Greenhouse gas emissions per capita"}, "/location/statistical_region/health_expenditure_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Health expenditure (% of GDP)"}, "/location/statistical_region/time_required_to_start_a_business": {"expectedType": "/measurement_unit/dated_days", "name": "Time required to start a business"}, "/location/statistical_region/gdp_nominal_per_capita": {"expectedType": "/measurement_unit/dated_money_value", "name": "GDP (nominal per capita)"}, "/location/statistical_region/child_labor_percent": {"expectedType": 
"/measurement_unit/dated_percentage", "name": "Child labor (% of children ages 7-14)"}, "/location/statistical_region/gdp_growth_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "GDP growth rate"}, "/location/statistical_region/literacy_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Literacy rate"}, "/location/statistical_region/fertility_rate": {"expectedType": "/measurement_unit/dated_float", "name": "Fertility rate"}, "/location/statistical_region/tax_revenue_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Tax revenue (% of GDP)"}, "/location/statistical_region/debt_service_as_percent_of_trade_volume": {"expectedType": "/measurement_unit/dated_percentage", "name": "Debt service (% of trade volume)"}, "/location/statistical_region/net_migration": {"expectedType": "/measurement_unit/dated_integer", "name": "Net migration"}, "/location/statistical_region/electricity_production": {"expectedType": "/location/electricity_production", "name": "Electricity production"}, "/location/statistical_region/automobiles_per_capita": {"expectedType": "/measurement_unit/dated_float", "name": "Automobiles per capita"}, "/location/statistical_region/electricity_consumption_per_capita": {"expectedType": "/measurement_unit/dated_kilowatt_hour", "name": "Electricity consumption per capita"}, "/location/statistical_region/gni_per_capita_in_ppp_dollars": {"expectedType": "/measurement_unit/dated_money_value", "name": "GNI per capita in PPP dollars"}, "/location/statistical_region/co2_emissions_total": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - total"}, "/location/statistical_region/poverty_rate_2dollars_per_day": {"expectedType": "/measurement_unit/dated_float", "name": "Poverty rate ($2 per day)"}, "/location/statistical_region/military_expenditure_percent_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Military expenditure as percentage of GDP"}, 
"/location/statistical_region/arithmetic_population_density": {"expectedType": "/measurement_unit/dated_float", "name": "Arithmetic population density (per km\u00b2)"}, "/location/statistical_region/services_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Services (% of GDP)"}, "/location/statistical_region/trade_balance_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Trade balance (% of GDP)"}, "/location/statistical_region/gas_price_liter": {"expectedType": "/measurement_unit/dated_money_value", "name": "Gas price (per liter)"}, "/location/statistical_region/merchandise_trade_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Merchandise trade (% of GDP)"}, "/location/statistical_region/hiv_prevalence_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "HIV prevalence rate"}, "/location/statistical_region/labor_participation_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Labor participation rate"}, "/location/statistical_region/government_debt_percent_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Government debt as percent of GDP"}, "/location/statistical_region/population_growth_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Population growth rate"}, "/location/statistical_region/minimum_wage": {"expectedType": "/measurement_unit/recurring_money_value", "name": "Minimum wage"}, "/location/statistical_region/industry_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Industry (% of GDP)"}, "/location/statistical_region/exports_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Exports as percent of GDP"}, "/location/statistical_region/net_workers_remittances": {"expectedType": "/measurement_unit/dated_money_value", "name": "Net workers' remittances"}, "/location/statistical_region/external_debt_stock": 
{"expectedType": "/measurement_unit/dated_money_value", "name": "External debt stock"}, "/location/statistical_region/unemployment_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Unemployment Rate"}, "/location/statistical_region/broadband_penetration_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Broadband penetration rate"}, "/location/statistical_region/official_development_assistance": {"expectedType": "/measurement_unit/dated_money_value", "name": "Official development assistance"}, "/location/statistical_region/co2_emissions_per_capita": {"expectedType": "/measurement_unit/dated_metric_ton", "name": "CO2 emissions per capita"}, "/location/statistical_region/gross_capital_formation_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Gross capital formation (% of GDP)"}, "/location/statistical_region/gdp_real": {"expectedType": "/measurement_unit/adjusted_money_value", "name": "GDP real"}, "/location/statistical_region/population": {"expectedType": "/measurement_unit/dated_integer", "name": "Population"}, "/location/statistical_region/co2_emissions_commercial": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - commercial"}, "/location/statistical_region/household_consumption_expenditure_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Household consumption expenditures (% of GDP)"}, "/location/statistical_region/imports_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Imports as percent of GDP"}, "/location/statistical_region/renewable_freshwater_per_capita": {"expectedType": "/measurement_unit/dated_cubic_meters", "name": "Renewable freshwater resources per capita"}, "/location/statistical_region/high_tech_as_percent_of_manufactured_exports": {"expectedType": "/measurement_unit/dated_percentage", "name": "High-tech as % of manufactured exports"}, 
"/location/statistical_region/long_term_unemployment_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Long term unemployment rate"}, "/location/statistical_region/gni_in_ppp_dollars": {"expectedType": "/measurement_unit/dated_money_value", "name": "GNI in PPP dollars"}} -------------------------------------------------------------------------------- /data/test.json: -------------------------------------------------------------------------------- 1 | {"/location/statistical_region/size_of_armed_forces": {"Qatar": 11800.0, "Turkmenistan": 22000.0, "Eritrea": 201750.0, "Sudan": 264300.0, "Lithuania": 23350.0, "Bahamas": 850.0, "Rwanda": 35000.0, "Bolivia": 83200.0, "Venezuela": 115000.0, "Bangladesh": 220950.0, "Bahrain": 19460.0, "Brunei": 9250.0, "Israel": 184550.0, "Australia": 57050.0, "Iran": 563000.0, "Algeria": 317200.0, "Singapore": 147600.0, "Cameroon": 23100.0, "Japan": 260086.0, "United States of America": 1520100.0, "Guatemala": 42300.0, "Belgium": 34050.0, "Thailand": 474550.0, "Dominican Republic": 39500.0, "Belize": 1050.0, "Ghana": 15500.0, "Kyrgyzstan": 20400.0, "Netherlands": 43300.0, "Gambia": 800.0, "Finland": 25000.0, "Morocco": 245800.0, "Sweden": 21300.0, "Belarus": 158000.0, "Mali": 12150.0, "Syria": 178000.0, "New Zealand": 8550.0, "South Korea": 659500.0, "Honduras": 20000.0, "Myanmar": 513250.0, "Portugal": 90300.0, "Uruguay": 25450.0, "Tunisia": 47800.0, "Cyprus": 12750.0, "Uzbekistan": 68000.0, "Malaysia": 133600.0, "Senegal": 18600.0, "Antigua and Barbuda": 180.0, "Greece": 148350.0, "Kenya": 29100.0, "Niger": 10700.0, "Fiji": 3500.0}, "/location/statistical_region/gni_per_capita_in_ppp_dollars": {"Canada": 42530.0, "Afghanistan": 1400.0, "Madagascar": 950.0, "Bhutan": 6310.0, "Kuwait": 53820.0, "Nepal": 1500.0, "Qatar": 87030.0, "France": 36720.0, "Bahamas": 29740.0, "Ethiopia": 1140.0, "Slovakia": 24770.0, "Swaziland": 4840.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 1850.0, "Nigeria": 2420.0, "Malawi": 880.0, 
"Federated States of Micronesia": 4090.0, "Israel": 27120.0, "Australia": 43300.0, "Singapore": 61100.0, "Iceland": 33840.0, "Zambia": 1620.0, "United States of America": 50610.0, "Guatemala": 4960.0, "Germany": 41890.0, "Thailand": 9430.0, "Haiti": 1240.0, "Belize": 6070.0, "Spain": 32320.0, "Ukraine": 7290.0, "Paraguay": 5610.0, "Tanzania": 1590.0, "Central African Republic": 860.0, "Trinidad and Tobago": 22400.0, "Sweden": 44150.0, "Vietnam": 3440.0, "Namibia": 7470.0, "Earth": 12128.69, "Switzerland": 56240.0, "New Zealand": 29960.0, "Yemen": 2180.0, "Pakistan": 3030.0, "Iraq": 4300.0, "Honduras": 3890.0, "Chad": 1320.0, "Portugal": 24770.0, "Democratic Republic of the Congo": 370.0, "United Arab Emirates": 48220.0, "Uruguay": 15570.0, "Azerbaijan": 9410.0, "Malaysia": 16530.0, "Senegal": 1920.0, "Antigua and Barbuda": 19260.0, "Burundi": 560.0, "Kenya": 1760.0, "Botswana": 16520.0}, "/location/statistical_region/gdp_nominal": {"Madagascar": 9946995390.0, "Palau": 179912816.0, "Guinea-Bissau": 973427457.0, "Kenya": 33620684016.0, "Nepal": 18884495628.0, "Kyrgyzstan": 5918610958.0, "Mongolia": 8557529910.0, "Bahamas": 7787514000.0, "Switzerland": 635650112360.0, "Democratic Republic of the Congo": 15642236881.0, "Vanuatu": 819227088.0, "Bolivia": 24426829466.0, "Ecuador": 67002768302.0, "Bahrain": 22945456867.0, "Brunei": 12369689792.0, "Belarus": 55136144037.0, "Iran": 482445000000.0, "Singapore": 239699598462.0, "Iceland": 14059073613.0, "Republic of Ireland": 217275000000.0, "Togo": 3594513925.0, "Zimbabwe": 9900000000.0, "Germany": 3600833333330.0, "Bosnia and Herzegovina": 18088238054.0, "Saint Vincent and the Grenadines": 687993693.0, "Haiti": 7346156703.0, "Belize": 1474000000.0, "Tanzania": 23705302064.0, "Dominica": 482277143.0, "Senegal": 14291456855.0, "Tonga": 435589200.0, "Maldives": 2050135788.0, "Federated States of Micronesia": 318100000.0, "Oman": 71781535039.0, "C\u00f4te d\u2019Ivoire": 24074625536.0, "Finland": 266070833333.0, "Equatorial 
Guinea": 19789801404.0, "Trinidad and Tobago": 22483115868.0, "Sweden": 538131124807.0, "Croatia": 63850068202.0, "Guyana": 2259288026.0, "Mali": 10589925352.0, "Namibia": 12300698896.0, "Yemen": 33757503322.0, "Pakistan": 211091994819.0, "Morocco": 100221001988.0, "Chad": 9485741541.0, "Estonia": 22184722472.0, "Uruguay": 46709797684.0, "Kosovo": 6446205171.0, "India": 1842000000000.0, "Austria": 418483975383.0, "Timor-Leste": 1054000000.0, "Uganda": 16809623489.0, "Sri Lanka": 59172135299.0, "Myanmar": 51925000000.0, "Niger": 6016960988.0, "Nicaragua": 7297481501.0}, "/location/statistical_region/foreign_direct_investment_net_inflows": {"Brazil": -68093253945.0, "Canada": 8683048195.0, "Hungary": -2946777396.0, "Cambodia": -872503569.0, "France": -1761634578.0, "Rwanda": -106210000.0, "Laos": -300743507.0, "Seychelles": -136777269.0, "Norway": 12234909873.0, "Benin": -194717291.0, "Israel": -7236800000.0, "Zambia": -831500000.0, "United States of America": 176768000000.0, "Cape Verde": -49165747.0, "Papua New Guinea": -28720688.0, "Slovenia": -240865771.0, "Guatemala": -1030405000.0, "Armenia": -473273917.0, "Thailand": 3290308028.0, "Haiti": -181000000.0, "Belize": -193325682.0, "Hong Kong": -229107860.0, "Sierra Leone": -714974888.0, "Dominica": -34259368.0, "Ukraine": -7015000000.0, "Kyrgyzstan": -693589500.0, "Georgia": -602622655.0, "Mauritius": 600533078.0, "Sweden": 19903849686.0, "Latvia": -796300000.0, "Guinea-Bissau": -27709745.0, "Mali": -398496178.0, "New Zealand": -1767785208.0, "Yemen": 712813329.0, "Bulgaria": -1669004093.0, "Iraq": -1030200000.0, "Angola": 5116413413.0, "Estonia": -576823972.0, "Portugal": -6939732362.0, "Uruguay": -2206070525.0, "Tunisia": -432666012.0, "Republic of Macedonia": -140066689.0, "Azerbaijan": -812407000.0, "Nicaragua": -810000000.0, "Djibouti": -79000231.0, "Mozambique": -2090083915.0, "Uganda": -1721169095.0, "Paraguay": -483366667.0, "Antigua and Barbuda": -57723257.0, "South Korea": 18628100000.0, "Tajikistan": 
-11142170.0}, "/location/statistical_region/life_expectancy": {"Swaziland": 48.659, "Bhutan": 67.285, "Eritrea": 61.417, "France": 81.668, "Bahamas": 75.452, "Slovakia": 75.959, "Suriname": 70.581, "Argentina": 75.798, "Cameroon": 51.576, "Turkmenistan": 64.998, "Federated States of Micronesia": 68.948, "Algeria": 73.08, "Lesotho": 47.984, "Zambia": 48.969, "Papua New Guinea": 62.801, "Togo": 57.027, "Zimbabwe": 51.236, "Germany": 80.741, "Puerto Rico": 79.028, "Thailand": 74.091, "Haiti": 62.062, "Kazakhstan": 68.893, "Sierra Leone": 47.776, "Ukraine": 70.809, "Liberia": 56.743, "Gambia": 58.485, "Philippines": 68.757, "Finland": 80.471, "Aruba": 75.113, "Moldova": 69.212, "Bangladesh": 68.937, "Trinidad and Tobago": 69.963, "Vietnam": 75.051, "Croatia": 76.876, "Guinea-Bissau": 48.113, "Switzerland": 82.695, "Yemen": 65.452, "Seychelles": 73.456, "Albania": 77.042, "Chad": 49.523, "Estonia": 76.127, "Equatorial Guinea": 51.137, "Tunisia": 74.754, "Republic of Macedonia": 74.788, "India": 65.478, "Azerbaijan": 70.653, "Uzbekistan": 68.265, "Malaysia": 74.261, "Senegal": 59.272, "Timor-Leste": 62.461, "Colombia": 73.642, "Greece": 80.744, "Paraguay": 72.485, "Namibia": 62.332, "Niger": 54.691, "Cyprus": 79.563, "Comoros": 61.042}, "/location/statistical_region/internet_users_percent_population": {"Brazil": 49.847999, "Afghanistan": 5.454545, "Republic of Macedonia": 63.1477, "Turkmenistan": 7.1958, "Mauritania": 5.3691, "Sudan": 21.0, "Nepal": 11.1493, "Tonga": 34.8609, "Cambodia": 4.939862, "Democratic Republic of the Congo": 1.679961, "Aruba": 74.0, "Cyprus": 61.0, "Bolivia": 34.188434, "Norway": 95.0, "Burkina Faso": 3.725035, "Ghana": 17.107678, "Slovakia": 80.0, "Australia": 82.349549, "Iran": 25.997636, "Slovenia": 70.0, "Zambia": 13.4682, "Senegal": 19.2036, "Papua New Guinea": 2.301957, "Togo": 4.0, "Guatemala": 16.0, "Hong Kong": 72.8, "Tanzania": 13.0803, "Liberia": 3.7941, "Kyrgyzstan": 21.723509, "Georgia": 45.503098, "Oman": 60.0, "Philippines": 
36.2351, "Indonesia": 15.36, "Singapore": 74.1818, "Equatorial Guinea": 13.943182, "Sweden": 94.0, "Belarus": 46.906006, "Gabon": 8.616714, "Mongolia": 16.4, "Switzerland": 85.2, "Yemen": 17.4465, "Angola": 16.93721, "Estonia": 79.0, "Uruguay": 55.1146, "United Arab Emirates": 85.0, "South Africa": 41.0, "Serbia": 48.1, "United Kingdom": 87.0162, "Lesotho": 4.589618, "Djibouti": 8.267233, "Congo": 6.106695, "Antigua and Barbuda": 83.787167, "Greece": 56.0, "Paraguay": 27.0758, "Croatia": 63.0, "Tajikistan": 14.51, "Botswana": 11.5, "Barbados": 73.329814}, "/location/statistical_region/cpi_inflation_rate": {"Brazil": 5.4, "Qatar": 1.87, "Liberia": 6.83, "United States of America": 2.07, "Mali": 5.43, "Cyprus": 2.39, "Cambodia": 2.93, "Burundi": 18.01, "Rwanda": 6.27, "Slovakia": 3.61, "Nigeria": 12.22, "Cameroon": 2.94, "Malawi": 21.27, "Saudi Arabia": 4.48, "Australia": 1.76, "Montenegro": 3.18, "Burkina Faso": 3.82, "Germany": 2.01, "Bosnia and Herzegovina": 2.05, "Oman": 2.91, "Antigua and Barbuda": 3.38, "Dominican Republic": 3.69, "Iraq": 2.88, "Hong Kong": 4.06, "Sierra Leone": 12.87, "Mauritania": 4.94, "Tonga": 1.21, "Georgia": -0.94, "Denmark": 2.41, "Tanzania": 16.0, "Finland": 2.81, "Moldova": 4.68, "Morocco": 1.28, "Latvia": 2.25, "Gabon": 2.66, "Guinea-Bissau": 2.13, "Thailand": 3.01, "New Zealand": 0.88, "Nepal": 9.45, "Russia": 5.07, "Romania": 3.33, "Seychelles": 7.11, "Albania": 2.03, "Uruguay": 8.1, "Tunisia": 5.5, "Fiji": 4.33, "Comoros": 0.87, "United Kingdom": 2.82, "Lesotho": 6.1, "Congo": 3.89, "Timor-Leste": 11.8, "Greece": 1.5, "Sri Lanka": 6.83, "Namibia": 6.54, "Nicaragua": 7.19, "Botswana": 7.54}, "/location/statistical_region/health_expenditure_as_percent_of_gdp": {"Qatar": 1.91, "Palau": 10.65, "Solomon Islands": 8.83, "Saint Lucia": 7.19, "Nepal": 5.44, "Costa Rica": 10.87, "Mongolia": 5.26, "France": 11.63, "Bahamas": 7.68, "Democratic Republic of the Congo": 8.55, "Rwanda": 10.76, "Laos": 2.77, "Belize": 5.65, "Argentina": 8.11, 
"Norway": 9.07, "Israel": 7.73, "Australia": 9.03, "Iran": 5.95, "El Salvador": 6.78, "Cape Verde": 4.76, "Slovenia": 9.06, "Germany": 11.06, "Armenia": 4.33, "Gambia": 4.39, "Thailand": 4.06, "Haiti": 7.95, "Iraq": 8.3, "Dominica": 5.87, "Ukraine": 7.19, "Kyrgyzstan": 6.49, "Oman": 2.34, "Finland": 8.85, "Saudi Arabia": 3.69, "Moldova": 11.37, "Trinidad and Tobago": 5.73, "Latvia": 6.17, "Gabon": 3.22, "Cambodia": 5.69, "Kenya": 4.49, "New Zealand": 10.08, "Bulgaria": 7.27, "Pakistan": 2.51, "Seychelles": 3.78, "Samoa": 7.03, "Chad": 4.28, "South Africa": 8.52, "Fiji": 3.82, "Serbia": 10.43, "Azerbaijan": 5.24, "Djibouti": 7.87, "Antigua and Barbuda": 5.94, "Uganda": 9.45, "Republic of Macedonia": 6.57, "Sri Lanka": 3.43, "Cyprus": 7.41, "Tajikistan": 5.78, "Botswana": 5.06}, "/location/statistical_region/time_required_to_start_a_business": {"Canada": 5.0, "Afghanistan": 7.0, "Swaziland": 56.0, "Bhutan": 36.0, "Mauritania": 19.0, "Saint Lucia": 15.0, "Costa Rica": 60.0, "Mongolia": 12.0, "Georgia": 2.0, "Mozambique": 13.0, "Suriname": 694.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 7.0, "Venezuela": 144.0, "Thailand": 29.0, "Ecuador": 56.0, "Benin": 26.0, "Uzbekistan": 12.0, "Israel": 21.0, "Australia": 2.0, "Iran": 13.0, "Algeria": 25.0, "Singapore": 3.0, "Iceland": 5.0, "Zambia": 17.0, "Bosnia and Herzegovina": 37.0, "Armenia": 8.0, "Kiribati": 31.0, "Spain": 28.0, "Liberia": 6.0, "Tonga": 16.0, "Maldives": 9.0, "Brunei": 101.0, "Gambia": 27.0, "Philippines": 36.0, "Central African Republic": 22.0, "Cameroon": 15.0, "Vietnam": 34.0, "Earth": 29.61, "Syria": 13.0, "New Zealand": 1.0, "Bulgaria": 18.0, "Pakistan": 21.0, "Samoa": 9.0, "Chad": 62.0, "South Africa": 19.0, "United Arab Emirates": 8.0, "United Kingdom": 13.0, "Malaysia": 6.0, "Congo": 161.0, "Timor-Leste": 94.0, "Uganda": 33.0, "Burundi": 8.0, "Japan": 23.0, "Niger": 17.0, "Tajikistan": 24.0, "Botswana": 61.0, "Barbados": 18.0}, "/location/statistical_region/net_migration": {"Afghanistan": -381030.0, 
"Serbia": 0.0, "Saint Lucia": -1000.0, "Lithuania": -35495.0, "Cambodia": -254942.0, "Netherlands": 50006.0, "Swaziland": -6000.0, "Laos": -74998.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": -6496.0, "Kuwait": 277629.0, "Burkina Faso": -125000.0, "Ecuador": -120000.0, "Ghana": -51258.0, "Brunei": 3500.0, "Saudi Arabia": 1055517.0, "Algeria": -140000.0, "Iceland": 10417.0, "Zambia": -85000.0, "Republic of Ireland": 100000.0, "Papua New Guinea": 0.0, "Guatemala": -200000.0, "Bosnia and Herzegovina": -10000.0, "Denmark": 90316.0, "Thailand": 492252.0, "Ukraine": -40006.0, "Portugal": 150002.0, "Maldives": -53.0, "Paraguay": -40000.0, "Gambia": -13742.0, "Finland": 72634.0, "Morocco": -675000.0, "Guinea-Bissau": -10000.0, "Guyana": -40000.0, "Switzerland": 182803.0, "Syria": -55877.0, "South Korea": -30000.0, "Pakistan": -1999998.0, "Honduras": -100000.0, "Chad": -75000.0, "Macau": 50625.0, "South Africa": 700001.0, "Tunisia": -20000.0, "Tajikistan": -296075.0, "Uruguay": -50000.0, "India": -2999998.0, "Azerbaijan": 53264.0, "Lesotho": -19998.0, "Djibouti": 0.0, "Timor-Leste": -49930.0, "Uganda": -135000.0, "United Arab Emirates": 3076634.0, "Burundi": 370000.0, "Japan": 270000.0, "Niger": -28497.0, "Fiji": -28754.0, "Comoros": -10000.0}, "/location/statistical_region/gdp_growth_rate": {"Qatar": 18.8, "Bhutan": 9.441666, "Serbia": -1.7, "Costa Rica": 5.128995, "Mongolia": 12.283366, "Ethiopia": 8.5, "Georgia": 6.0, "Suriname": 4.476504, "Laos": 8.164082, "Argentina": 8.869529, "Venezuela": 5.54077, "Malawi": 1.8858, "Ecuador": 5.000245, "Bahrain": 4.5, "Saudi Arabia": 6.774455, "Australia": 3.397707, "El Salvador": 1.64457, "Thailand": 6.434809, "Montenegro": 3.5, "Cape Verde": 4.292214, "Papua New Guinea": 8.0, "Togo": 5.622674, "Zimbabwe": 5.015681, "Germany": 0.671445, "Bosnia and Herzegovina": -0.7, "Guinea": 3.944115, "Belgium": -0.28101, "Saint Vincent and the Grenadines": 1.524261, "Dominican Republic": 3.887683, "Kazakhstan": 5.0, "Spain": -1.418905, "Eritrea": 
7.017109, "Netherlands": -0.95665, "Philippines": 6.590642, "Moldova": -0.800035, "Mauritius": 3.165694, "Trinidad and Tobago": 1.244764, "Cambodia": 7.261323, "Guinea-Bissau": -1.5, "Guyana": 4.816459, "Yemen": 0.137234, "Haiti": 2.822235, "Romania": 3.700039, "Albania": 0.8, "Angola": 6.830624, "Macau": 9.948269, "United Arab Emirates": 4.9, "India": 3.236943, "Azerbaijan": 4.45248, "United Kingdom": 0.27268, "Congo": 3.8, "Mozambique": 7.4, "Iceland": 1.639232, "Earth": 2.156773, "South Korea": 2.044099, "Nicaragua": 5.204732, "Botswana": 6.1}, "/location/statistical_region/fertility_rate": {"Canada": 1.677, "Qatar": 2.232, "Bangladesh": 2.202, "Nepal": 2.658, "France": 2.0, "Democratic Republic of the Congo": 5.657, "Rwanda": 5.339, "Slovakia": 1.4, "Suriname": 2.307, "Nigeria": 5.489, "Venezuela": 2.49, "Czech Republic": 1.49, "Federated States of Micronesia": 3.39, "Singapore": 1.15, "El Salvador": 2.217, "Japan": 1.39, "United States of America": 2.1, "Germany": 1.39, "Armenia": 1.736, "Oman": 2.24, "Belgium": 1.84, "Dominican Republic": 2.544, "Belize": 2.79, "Kazakhstan": 2.59, "Mauritania": 4.464, "Kyrgyzstan": 2.898, "Netherlands": 1.79, "Gambia": 4.814, "C\u00f4te d\u2019Ivoire": 4.348, "Indonesia": 2.09, "Central African Republic": 4.546, "Moldova": 1.466, "Burundi": 4.218, "Latvia": 1.17, "Croatia": 1.46, "Guyana": 2.234, "Namibia": 3.15, "Syria": 2.934, "Russia": 1.54, "Bulgaria": 1.49, "Pakistan": 3.423, "Iraq": 4.702, "Albania": 1.523, "Portugal": 1.32, "South Africa": 2.458, "Republic of Macedonia": 1.411, "Uruguay": 1.986, "Azerbaijan": 2.3, "Lesotho": 3.137, "Congo": 4.504, "Timor-Leste": 5.453, "Uganda": 6.052, "Greece": 1.44, "Paraguay": 2.911, "Earth": 2.451, "Tajikistan": 3.24, "Botswana": 2.696}, "/location/statistical_region/consumer_price_index": {"Czech Republic": 121.1, "Guinea": 331.03, "Nepal": 186.27, "Costa Rica": 172.69, "Bahamas": 117.02, "Rwanda": 173.86, "Aruba": 124.76, "Vanuatu": 120.21, "Bolivia": 157.24, "Norway": 114.16, 
"Ecuador": 136.59, "Benin": 130.05, "Iran": 316.3, "Slovenia": 120.34, "El Salvador": 126.95, "Iceland": 163.08, "Zambia": 177.64, "United States of America": 117.56, "Cape Verde": 129.74, "Togo": 123.79, "Guatemala": 147.82, "Chile": 107.95, "Gambia": 129.28, "Thailand": 123.55, "Belize": 115.66, "Hong Kong": 122.42, "Philippines": 137.24, "Georgia": 153.68, "Oman": 140.81, "C\u00f4te d\u2019Ivoire": 121.17, "Indonesia": 159.96, "Moldova": 172.83, "Morocco": 113.95, "Sweden": 112.05, "Finland": 116.6, "Kenya": 224.6, "Syria": 203.6, "New Zealand": 121.08, "Yemen": 227.65, "South Korea": 123.41, "Pakistan": 221.91, "Macau": 140.79, "Chad": 122.45, "Uruguay": 165.64, "Republic of Macedonia": 123.69, "Equatorial Guinea": 138.03, "South Africa": 154.95, "Djibouti": 144.74, "Congo": 136.63, "Antigua and Barbuda": 119.55, "Uganda": 202.96, "United Arab Emirates": 116.01, "Paraguay": 157.29, "Japan": 99.27, "Niger": 121.14}, "/location/statistical_region/prevalence_of_undernourisment": {"C\u00f4te d\u2019Ivoire": 21.4, "Kenya": 30.4, "Kuwait": 5.0, "Vanuatu": 8.5, "Maldives": 5.6, "Slovakia": 5.0, "United Kingdom": 5.0, "Laos": 27.8, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 7.7, "Norway": 5.0, "Burkina Faso": 25.9, "Ghana": 5.0, "Brunei": 5.0, "Australia": 5.0, "Iceland": 5.0, "Republic of Ireland": 5.0, "Slovenia": 5.0, "Germany": 5.0, "Bosnia and Herzegovina": 5.0, "Chile": 5.0, "Kiribati": 8.2, "Djibouti": 19.8, "Liberia": 31.4, "Netherlands": 5.0, "Gambia": 14.4, "Philippines": 17.0, "Indonesia": 8.6, "Central African Republic": 30.0, "Cameroon": 15.7, "North Korea": 32.0, "Trinidad and Tobago": 9.3, "Sweden": 5.0, "Latvia": 5.0, "Croatia": 5.0, "Finland": 5.0, "Mali": 7.9, "Switzerland": 5.0, "Syria": 5.0, "Russia": 5.0, "Pakistan": 19.9, "Estonia": 5.0, "Nicaragua": 20.1, "Uzbekistan": 6.1, "Lesotho": 16.6, "Austria": 5.0, "Congo": 37.4, "Timor-Leste": 38.2, "Burundi": 73.4, "Earth": 12.76, "Niger": 12.6, "Cyprus": 5.0, "Comoros": 70.0, "Barbados": 5.0}, 
"/location/statistical_region/trade_balance_as_percent_of_gdp": {"Afghanistan": -27.57, "Kenya": -20.8, "Sudan": -1.29, "India": -7.71, "Saint Lucia": -21.75, "Hungary": 7.43, "Lithuania": -1.49, "Democratic Republic of the Congo": -6.61, "Rwanda": -19.67, "Aruba": -15.66, "Bolivia": 1.11, "Norway": 13.22, "Ghana": -12.85, "Algeria": 9.35, "Singapore": 22.17, "El Salvador": -18.4, "Montenegro": -23.53, "Saint Kitts and Nevis": -11.63, "Bosnia and Herzegovina": -22.57, "Armenia": -23.81, "Kazakhstan": 20.28, "Spain": 1.02, "Mauritania": -36.23, "Tonga": -42.71, "Georgia": -18.8, "Tanzania": -17.39, "Burundi": -28.31, "Sweden": 6.17, "Latvia": -3.83, "Croatia": -0.06, "Mali": -10.71, "Switzerland": 10.77, "Honduras": -21.3, "Bulgaria": 0.69, "Romania": -7.97, "Albania": -20.26, "Samoa": -27.17, "Trinidad and Tobago": 25.25, "Nicaragua": 25.05, "Ethiopia": -14.97, "Equatorial Guinea": 25.85, "Serbia": -17.0, "Azerbaijan": 32.91, "Botswana": -5.55, "Uzbekistan": -0.2, "Lesotho": -61.5, "Vietnam": -0.45, "Mozambique": -16.8, "Colombia": -1.19, "Paraguay": -0.31, "Slovakia": 2.6, "South Korea": 2.03, "Fiji": -11.01, "Comoros": -36.47}, "/location/statistical_region/gni_in_ppp_dollars": {"Canada": 1483585959450.0, "Afghanistan": 40733504335.0, "Turkmenistan": 49869451990.0, "Sudan": 75344423730.0, "Kuwait": 147286887192.0, "Cambodia": 35140720758.0, "Cape Verde": 2144395901.0, "Slovakia": 134028909210.0, "Swaziland": 5957667254.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 348533299.0, "Bolivia": 52092982470.0, "Cameroon": 50337640645.0, "Ecuador": 148518151985.0, "Benin": 15813194294.0, "Ghana": 49204636851.0, "Saudi Arabia": 698483964625.0, "Australia": 982162738360.0, "Iceland": 10832141089.0, "Zambia": 22772173762.0, "Republic of Ireland": 164621089972.0, "Armenia": 20766498855.0, "Mozambique": 25720859707.0, "Kazakhstan": 200736024235.0, "Sierra Leone": 8125419233.0, "Eritrea": 3440313080.0, "Tonga": 539620568.0, "Netherlands": 731483151504.0, "Uganda": 41415418747.0, 
"Gambia": 3326562005.0, "Tanzania": 73586580060.0, "Indonesia": 1187981528740.0, "Bangladesh": 319938507853.0, "Trinidad and Tobago": 29957349125.0, "Vietnam": 305613341376.0, "Japan": 4629656540970.0, "Switzerland": 449790421104.0, "Syria": 116510985150.0, "France": 2412626106700.0, "South Korea": 1548728614980.0, "Albania": 29704999640.0, "South Africa": 572631363165.0, "Nicaragua": 23721394234.0, "Uruguay": 52852337840.0, "India": 4749213279870.0, "Azerbaijan": 87524263465.0, "Uzbekistan": 111636498122.0, "Malaysia": 483240939374.0, "Senegal": 26323780629.0, "Congo": 15201717418.0, "Saint Vincent and the Grenadines": 1181987293.0, "Colombia": 482247050718.0, "Greece": 287152170395.0, "Hungary": 205925845708.0, "Niger": 11201853091.0, "Cyprus": 25671055725.0}, "/location/statistical_region/merchandise_trade_percent_of_gdp": {"Sudan": 20.76, "Kuwait": 73.1, "Lithuania": 145.9, "Cyprus": 39.39, "Mongolia": 108.3, "France": 47.56, "Bahamas": 54.61, "Equatorial Guinea": 121.49, "Aruba": 52.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 57.26, "Norway": 49.45, "Burkina Faso": 52.68, "Brunei": 99.98, "Australia": 34.05, "Venezuela": 41.14, "Zambia": 80.04, "United States of America": 24.75, "Saint Kitts and Nevis": 36.74, "Papua New Guinea": 76.66, "Slovenia": 140.75, "Bosnia and Herzegovina": 89.08, "Guinea": 54.67, "Mozambique": 74.72, "Spain": 46.28, "Ukraine": 86.88, "Portugal": 61.3, "Maldives": 84.08, "Federated States of Micronesia": 74.88, "Oman": 98.84, "Cameroon": 46.43, "Moldova": 101.67, "Trinidad and Tobago": 93.81, "Sweden": 63.65, "Belarus": 146.04, "Namibia": 84.72, "Mali": 49.48, "Croatia": 58.65, "Samoa": 62.27, "Cape Verde": 43.16, "Bulgaria": 116.45, "Pakistan": 29.74, "Romania": 75.49, "Angola": 84.94, "Estonia": 154.63, "Macau": 23.22, "Morocco": 67.73, "United Arab Emirates": 136.02, "Uruguay": 41.5, "Azerbaijan": 62.95, "Uzbekistan": 43.24, "Malaysia": 139.69, "Timor-Leste": 29.55, "Kenya": 60.21, "Niger": 66.99, "Nicaragua": 81.17, "Botswana": 
97.12, "Barbados": 61.87}, "/location/statistical_region/labor_participation_rate": {"Brazil": 43.81, "Netherlands": 45.79, "Mauritania": 26.66, "Solomon Islands": 39.6, "Costa Rica": 36.42, "Morocco": 27.3, "France": 47.42, "Uzbekistan": 39.67, "Rwanda": 52.41, "Tanzania": 49.78, "Suriname": 37.33, "Laos": 50.09, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 37.66, "Bolivia": 44.87, "Malawi": 51.35, "Ecuador": 40.02, "Brunei": 41.7, "Belarus": 48.85, "Iran": 18.21, "El Salvador": 41.61, "Zambia": 46.29, "Republic of Ireland": 44.22, "Burkina Faso": 47.56, "Sudan": 28.9, "Zimbabwe": 49.44, "Chile": 39.72, "C\u00f4te d\u2019Ivoire": 37.47, "Spain": 44.35, "Liberia": 47.48, "Kyrgyzstan": 42.75, "Finland": 47.88, "Armenia": 42.0, "Oman": 16.85, "Philippines": 38.94, "Indonesia": 38.05, "Trinidad and Tobago": 42.23, "Sweden": 47.08, "Gabon": 46.34, "Japan": 42.49, "Syria": 15.15, "Russia": 48.91, "Samoa": 34.26, "Estonia": 49.99, "Ethiopia": 47.06, "Tunisia": 27.23, "Republic of Macedonia": 38.59, "Azerbaijan": 48.87, "United Kingdom": 45.99, "Lesotho": 46.06, "Congo": 48.71, "Saint Vincent and the Grenadines": 41.11, "Sri Lanka": 32.55, "Kenya": 46.53, "Nicaragua": 38.1, "Comoros": 30.45}, "/location/statistical_region/population_growth_rate": {"Canada": 1.14, "Afghanistan": 2.44, "Qatar": 7.05, "Palau": 0.72, "Czech Republic": 0.18, "Sudan": 2.08, "Saint Lucia": 0.89, "Nepal": 1.16, "Cambodia": 1.76, "Democratic Republic of the Congo": 2.74, "Earth": 1.15, "Vanuatu": 2.24, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 2.65, "Argentina": 0.88, "Norway": 1.32, "Malawi": 2.86, "Turkmenistan": 1.29, "Ghana": 2.17, "Israel": 1.81, "Australia": 1.6, "Zambia": 3.19, "United States of America": 0.74, "Saint Kitts and Nevis": 1.15, "Togo": 2.6, "Bosnia and Herzegovina": -0.14, "Armenia": 0.17, "Puerto Rico": -0.73, "Antigua and Barbuda": 1.03, "Dominican Republic": 1.26, "Eritrea": 3.28, "Kyrgyzstan": 1.22, "Georgia": 0.63, "Equatorial Guinea": 2.8, "Moldova": -0.04, "North Korea": 0.53, 
"Morocco": 1.43, "Namibia": 1.87, "Kiribati": 1.54, "Syria": 1.97, "New Zealand": 0.63, "Yemen": 2.33, "Russia": 0.4, "Mauritius": 0.42, "Kosovo": 0.86, "Angola": 3.12, "Myanmar": 0.85, "Macau": 1.9, "Trinidad and Tobago": 0.33, "Tunisia": 0.97, "United Arab Emirates": 3.1, "Uruguay": 0.35, "United Kingdom": 0.75, "Lesotho": 1.08, "Timor-Leste": 2.88, "Sri Lanka": 1.04, "Hungary": -0.28, "Niger": 3.84, "Nicaragua": 1.46, "Comoros": 2.44}, "/location/statistical_region/diesel_price_liter": {"Canada": 1.23, "Brazil": 1.02, "Qatar": 0.27, "Turkmenistan": 0.2, "Nepal": 1.09, "Mongolia": 1.22, "France": 1.78, "Democratic Republic of the Congo": 1.48, "Rwanda": 1.73, "Swaziland": 1.34, "Benin": 1.26, "Bahrain": 0.17, "Saudi Arabia": 0.07, "Belarus": 0.9, "Indonesia": 0.47, "Lesotho": 1.31, "Zambia": 1.48, "Cape Verde": 1.58, "Togo": 1.22, "Zimbabwe": 1.4, "Germany": 1.88, "Armenia": 1.15, "Haiti": 1.03, "Belize": 1.21, "Sierra Leone": 1.05, "Mauritania": 1.27, "Georgia": 1.37, "Philippines": 1.01, "Finland": 1.95, "Central African Republic": 1.69, "Moldova": 1.4, "Bangladesh": 0.76, "Burundi": 1.47, "Sweden": 2.16, "Australia": 1.57, "Gabon": 0.91, "Mali": 1.25, "Namibia": 1.31, "Romania": 1.73, "Ethiopia": 0.94, "Uruguay": 1.88, "Tunisia": 0.69, "Republic of Macedonia": 1.55, "Kosovo": 1.6, "India": 0.86, "United Kingdom": 2.27, "Malaysia": 0.59, "Greece": 2.08, "Sri Lanka": 0.93, "Japan": 1.61, "Tajikistan": 0.91, "Barbados": 1.14}, "/location/statistical_region/gdp_real": {"Canada": 872845084506.0, "Palau": 126565271.0, "Bhutan": 961365502.0, "Guinea": 4107607446.0, "Saint Lucia": 791562406.0, "Nepal": 8036784848.0, "Costa Rica": 23898072634.0, "Cambodia": 7788044971.0, "Republic of Ireland": 123812040038.0, "Ethiopia": 18322929015.0, "Rwanda": 3593742140.0, "Slovakia": 43788541093.0, "Angola": 25901052471.0, "Swaziland": 1845684558.0, "Vanuatu": 383740629.0, "Venezuela": 156970286735.0, "Burkina Faso": 4548468401.0, "United Kingdom": 1698161182430.0, "Saudi Arabia": 
258706344474.0, "Belarus": 26250070655.0, "Singapore": 162404931770.0, "Montenegro": 1385332123.0, "Saint Kitts and Nevis": 375102704.0, "Papua New Guinea": 5103993896.0, "Guatemala": 26721536972.0, "Germany": 2071241017140.0, "Armenia": 4053319790.0, "Belgium": 266511169431.0, "Thailand": 187482253175.0, "Hong Kong": 251168960224.0, "Spain": 712338669975.0, "Ukraine": 47467479851.0, "Liberia": 619202726.0, "C\u00f4te d\u2019Ivoire": 11666499085.0, "Seychelles": 749428459.0, "Namibia": 6089324238.0, "Japan": 5064043338020.0, "Syria": 30733741143.0, "Cape Verde": 944370324.0, "Romania": 56527456853.0, "Albania": 6137563946.0, "Samoa": 328592779.0, "Equatorial Guinea": 6058175791.0, "India": 971486068096.0, "Uzbekistan": 26896407919.0, "Lesotho": 1046135464.0, "Senegal": 6970078285.0, "Congo": 5067059617.0, "Antigua and Barbuda": 814439750.0, "Colombia": 149836914917.0, "Paraguay": 10483452830.0, "Hungary": 57065580674.0}, "/location/statistical_region/population": {"Canada": 34994000.0, "Bangladesh": 161083804.0, "Liberia": 3786764.0, "Guinea": 10221808.0, "Saint Lucia": 176000.0, "Lithuania": 3199342.0, "Mongolia": 2892876.0, "France": 65433714.0, "Ethiopia": 84734262.0, "Aruba": 108141.0, "Argentina": 40764561.0, "Nigeria": 174507539.0, "Ecuador": 14666055.0, "Ghana": 24965816.0, "Brunei": 405938.0, "Israel": 7907900.0, "United States of America": 313914040.0, "Republic of Ireland": 4576317.0, "Papua New Guinea": 6187591.0, "Malawi": 15380888.0, "Togo": 6154813.0, "Guatemala": 14757316.0, "Bosnia and Herzegovina": 3834000.0, "Kuwait": 2818042.0, "Thailand": 69518555.0, "Belize": 356600.0, "Eritrea": 5824000.0, "Oman": 2846145.0, "Philippines": 94852030.0, "Finland": 5421827.0, "Saudi Arabia": 28082541.0, "Moldova": 3559000.0, "North Korea": 24763188.0, "Sweden": 9514406.0, "Latvia": 2027000.0, "Mali": 15839538.0, "Syria": 22399254.0, "Cape Verde": 500585.0, "Fiji": 868400.0, "Seychelles": 86000.0, "Samoa": 183874.0, "Chad": 11525496.0, "Macau": 555731.0, 
"Democratic Republic of the Congo": 67757577.0, "Tunisia": 10777500.0, "Republic of Macedonia": 2063893.0, "Botswana": 2030738.0, "Malaysia": 28859154.0, "Senegal": 12767556.0, "Vietnam": 87840000.0, "Saint Vincent and the Grenadines": 109365.0, "Greece": 11300410.0, "Paraguay": 6541591.0, "Slovakia": 5440000.0, "South Korea": 50004441.0, "Tajikistan": 6976958.0, "Albania": 3215988.0, "Barbados": 273925.0, "Nicaragua": 5869859.0}, "/location/statistical_region/gdp_nominal_per_capita": {"Canada": 52218.99, "Afghanistan": 619.59, "Madagascar": 447.44, "Turkmenistan": 6510.61, "Mauritania": 1106.14, "Guinea": 591.02, "Vanuatu": 3176.21, "Kiribati": 1743.39, "Cambodia": 945.99, "Swaziland": 3043.5, "Laos": 1399.21, "Belize": 4576.64, "Venezuela": 12766.72, "Burkina Faso": 634.32, "Ecuador": 5456.43, "Bahrain": 18334.17, "Brunei": 41126.61, "Saudi Arabia": 20777.67, "Belarus": 6685.02, "Algeria": 5404.0, "Togo": 574.12, "Cameroon": 1151.36, "Zambia": 1469.12, "Montenegro": 6813.04, "Papua New Guinea": 2184.16, "Slovenia": 22092.26, "Zimbabwe": 787.94, "Thailand": 5473.75, "Haiti": 770.95, "Iraq": 6454.62, "Hong Kong": 36795.82, "Tanzania": 608.85, "Ukraine": 3866.99, "Liberia": 421.7, "Tonga": 4493.87, "C\u00f4te d\u2019Ivoire": 1243.99, "Israel": 31281.47, "Mali": 693.98, "Philippines": 2587.88, "Sweden": 55244.65, "Latvia": 14008.51, "Gabon": 11430.49, "Guyana": 3583.96, "Switzerland": 79052.34, "Bulgaria": 6986.04, "Seychelles": 11758.04, "Honduras": 2264.09, "Chad": 885.11, "Macau": 78275.15, "United Arab Emirates": 40363.16, "United Kingdom": 38514.46, "Malaysia": 10380.54, "Vietnam": 1595.81, "Saint Vincent and the Grenadines": 6515.22, "Uganda": 547.01, "South Korea": 23020.0, "Cyprus": 26315.47, "Barbados": 13076.46}, "/location/statistical_region/renewable_freshwater_per_capita": {"Brazil": 27511.596788, "Afghanistan": 1619.969848, "Bangladesh": 686.892125, "Sudan": 640.860866, "Cambodia": 8256.958747, "France": 3059.431928, "Burundi": 1054.467325, "Swaziland": 
2177.932103, "Nigeria": 1345.977605, "Cameroon": 12903.974765, "Benin": 1053.19181, "Iran": 1703.695302, "Zambia": 5882.440958, "United States of America": 9043.999333, "Republic of Ireland": 10706.291891, "Guatemala": 7425.248756, "Chile": 51073.32263, "Belgium": 1086.194611, "Thailand": 3372.069221, "Haiti": 1296.738399, "Iraq": 1108.311645, "Kazakhstan": 3886.180272, "Sierra Leone": 27278.193761, "Saint Kitts and Nevis": 453.078099, "Georgia": 12965.606459, "Paraguay": 14300.716998, "Libya": 114.693311, "Turkmenistan": 275.130476, "Moldova": 280.835688, "Mauritius": 2139.106458, "Morocco": 904.570213, "Gabon": 102883.627325, "Guinea-Bissau": 9850.83375, "Honduras": 12335.615673, "Yemen": 90.112489, "Russia": 30169.27812, "Albania": 8529.168647, "Angola": 7333.815978, "Suriname": 166112.643249, "South Africa": 885.607275, "Tunisia": 393.018419, "United Arab Emirates": 16.806542, "Uruguay": 17437.636804, "Nicaragua": 32124.523255, "Malaysia": 20167.622148, "Austria": 6529.247765, "Congo": 52539.91436, "Antigua and Barbuda": 589.89019, "Uganda": 1109.591698, "Greece": 5132.754264, "Sri Lanka": 2530.068523, "Japan": 3364.177442, "Niger": 211.973961, "Tajikistan": 8120.437372}} 2 | -------------------------------------------------------------------------------- /data/train.json: -------------------------------------------------------------------------------- 1 | {"/location/statistical_region/size_of_armed_forces": {"Canada": 65700.0, "Cambodia": 191300.0, "Ethiopia": 138000.0, "Sri Lanka": 223100.0, "Argentina": 104350.0, "Burkina Faso": 11450.0, "Saudi Arabia": 249000.0, "Republic of Ireland": 8900.0, "Slovenia": 12100.0, "Bosnia and Herzegovina": 10550.0, "Kuwait": 22600.0, "Spain": 215700.0, "Liberia": 2050.0, "Namibia": 15200.0, "Oman": 47000.0, "Tanzania": 28400.0, "Gabon": 6700.0, "Yemen": 137900.0, "Pakistan": 946000.0, "Albania": 14750.0, "United Arab Emirates": 51000.0, "India": 2647150.0, "Azerbaijan": 81950.0, "Lesotho": 2000.0, "Tajikistan": 16300.0, 
"Afghanistan": 340350.0, "Czech Republic": 26750.0, "Mongolia": 17200.0, "France": 332250.0, "Slovakia": 15850.0, "Laos": 129100.0, "Norway": 24450.0, "Malawi": 6800.0, "Benin": 9450.0, "Montenegro": 12180.0, "Togo": 9300.0, "Armenia": 55544.0, "Ukraine": 214850.0, "Indonesia": 676500.0, "Central African Republic": 3150.0, "Mauritius": 2500.0, "Vietnam": 522000.0, "Russia": 1364000.0, "Bulgaria": 47300.0, "Romania": 151300.0, "Angola": 117000.0, "Chad": 34850.0, "South Africa": 77582.0, "Austria": 23250.0, "Mozambique": 11200.0, "Uganda": 46800.0, "Hungary": 38500.0, "Brazil": 713480.0, "Guinea": 12300.0, "Costa Rica": 9800.0, "Cape Verde": 1200.0, "Nigeria": 162000.0, "Ecuador": 58983.0, "El Salvador": 32300.0, "Chile": 103750.0, "Haiti": 50.0, "Iraq": 802400.0, "Sierra Leone": 10500.0, "Georgia": 32350.0, "Denmark": 16450.0, "Philippines": 165500.0, "Moldova": 7750.0, "Croatia": 21600.0, "Guinea-Bissau": 6450.0, "Switzerland": 23100.0, "Seychelles": 870.0, "Estonia": 5750.0, "Djibouti": 12950.0, "Timor-Leste": 1330.0, "Colombia": 440224.0, "Burundi": 51050.0, "Nicaragua": 12000.0, "Barbados": 610.0, "Madagascar": 21600.0, "Nepal": 157750.0, "Democratic Republic of the Congo": 134250.0, "Suriname": 1840.0, "Iceland": 180.0, "Zambia": 16500.0, "Papua New Guinea": 3100.0, "Zimbabwe": 50800.0, "Germany": 196000.0, "Kazakhstan": 70500.0, "Mauritania": 20850.0, "North Korea": 1379000.0, "Trinidad and Tobago": 4050.0, "Latvia": 5350.0, "Guyana": 1100.0, "Equatorial Guinea": 1320.0, "Republic of Macedonia": 8000.0, "Serbia": 28150.0, "United Kingdom": 165650.0, "Congo": 12000.0, "Paraguay": 25450.0, "Earth": 28020079.0, "Botswana": 10500.0}, "/location/statistical_region/gni_per_capita_in_ppp_dollars": {"Turkmenistan": 9640.0, "Lithuania": 22760.0, "Cambodia": 2360.0, "Argentina": 17250.0, "Bolivia": 4960.0, "Cameroon": 2320.0, "Burkina Faso": 1510.0, "Bahrain": 21240.0, "Saudi Arabia": 24870.0, "Republic of Ireland": 35870.0, "Slovenia": 27240.0, "Bosnia and 
Herzegovina": 9380.0, "Dominica": 12190.0, "Liberia": 600.0, "Netherlands": 43620.0, "Oman": 25770.0, "C\u00f4te d\u2019Ivoire": 1960.0, "Gabon": 14290.0, "Albania": 9390.0, "Samoa": 4270.0, "Macau": 68710.0, "India": 3840.0, "Lesotho": 2210.0, "Saint Vincent and the Grenadines": 10810.0, "Cyprus": 29400.0, "South Korea": 30970.0, "Tajikistan": 2220.0, "Bangladesh": 2070.0, "Mauritania": 2520.0, "Solomon Islands": 2170.0, "Saint Lucia": 11020.0, "Hungary": 20710.0, "Mongolia": 5100.0, "Rwanda": 1240.0, "Vanuatu": 4500.0, "Norway": 66960.0, "Benin": 1570.0, "Montenegro": 13930.0, "Saint Kitts and Nevis": 17280.0, "Togo": 920.0, "Armenia": 6990.0, "Dominican Republic": 9820.0, "Ghana": 1940.0, "Tonga": 5140.0, "Indonesia": 4810.0, "Finland": 38630.0, "Mauritius": 15820.0, "Mali": 1160.0, "Russia": 22720.0, "Bulgaria": 15390.0, "Romania": 16310.0, "Angola": 5490.0, "South Africa": 11190.0, "Nicaragua": 3960.0, "Austria": 44100.0, "Mozambique": 1020.0, "Uganda": 1140.0, "Japan": 36290.0, "Niger": 650.0, "Brazil": 11720.0, "Guinea": 980.0, "Costa Rica": 12590.0, "Cape Verde": 4340.0, "Ecuador": 9590.0, "Czech Republic": 24710.0, "Belarus": 15210.0, "Algeria": 8370.0, "El Salvador": 6790.0, "Chile": 21310.0, "Belgium": 40170.0, "Kiribati": 3380.0, "Hong Kong": 53050.0, "Sierra Leone": 1360.0, "Georgia": 5860.0, "Denmark": 43340.0, "Philippines": 4400.0, "Moldova": 3690.0, "Morocco": 5040.0, "Croatia": 19760.0, "Guinea-Bissau": 1190.0, "Seychelles": 25760.0, "Estonia": 22030.0, "Uzbekistan": 3750.0, "Timor-Leste": 6410.0, "Colombia": 10110.0, "Fiji": 4880.0, "Palau": 17150.0, "Sudan": 2030.0, "Laos": 2730.0, "Maldives": 7690.0, "Suriname": 8500.0, "Venezuela": 13120.0, "Papua New Guinea": 2780.0, "Gambia": 1860.0, "Kazakhstan": 11950.0, "Eritrea": 560.0, "Kyrgyzstan": 2260.0, "Latvia": 21020.0, "Guyana": 3400.0, "Syria": 5200.0, "Equatorial Guinea": 18880.0, "Tunisia": 9360.0, "Republic of Macedonia": 11570.0, "Serbia": 11180.0, "United Kingdom": 36880.0, "Congo": 3510.0, 
"Greece": 25460.0, "Sri Lanka": 6120.0, "Comoros": 1230.0}, "/location/statistical_region/gdp_nominal": {"Canada": 1736050505050.0, "Turkmenistan": 24107017544.0, "Montenegro": 4550463278.0, "Lithuania": 42725404055.0, "Cambodia": 12875310959.0, "Ethiopia": 31708848033.0, "Swaziland": 3977754360.0, "Argentina": 445988571982.0, "Cameroon": 25464850391.0, "Burkina Faso": 10187211704.0, "Ghana": 39199656051.0, "Saudi Arabia": 576824000000.0, "Cape Verde": 1901136230.0, "Slovenia": 49539271106.0, "Guatemala": 46900000257.0, "Kuwait": 176590075215.0, "Spain": 1490809722220.0, "Liberia": 1161000000.0, "Netherlands": 836256944444.0, "Gabon": 17051616749.0, "New Zealand": 161851000000.0, "Albania": 12959563902.0, "Samoa": 649414531.0, "Macau": 36428443915.0, "United Arab Emirates": 360245074960.0, "Azerbaijan": 63403650746.0, "Lesotho": 2426200017.0, "South Korea": 1116247397320.0, "Tajikistan": 6522200291.0, "Afghanistan": 20343461030.0, "Czech Republic": 215215310734.0, "Mauritania": 4075675053.0, "Solomon Islands": 838022105.0, "Saint Lucia": 1232180089.0, "France": 2773032125000.0, "Rwanda": 6377408665.0, "Slovakia": 95994147901.0, "Laos": 8297664741.0, "Norway": 485803392857.0, "Malawi": 5700383783.0, "Benin": 7294865847.0, "United States of America": 15684800000000.0, "Saint Kitts and Nevis": 708955238.0, "Armenia": 10247788878.0, "Dominican Republic": 55611245616.0, "Ukraine": 165245009991.0, "Indonesia": 846832283153.0, "Central African Republic": 2165868600.0, "Mauritius": 11313454891.0, "Vietnam": 123960665229.0, "Russia": 1857769676140.0, "Bulgaria": 53514098360.0, "Romania": 179793512340.0, "Angola": 100990011820.0, "Portugal": 237522083333.0, "South Africa": 408236752338.0, "Cyprus": 24689602447.0, "Malaysia": 278671114817.0, "Mozambique": 12797754231.0, "Japan": 5867154491920.0, "Brazil": 2476652189880.0, "Guinea": 5131221608.0, "Costa Rica": 41006959585.0, "Nigeria": 235922915395.0, "Bangladesh": 110612124360.0, "Australia": 1384145284190.0, "Algeria": 
188681099191.0, "El Salvador": 23054100000.0, "Chile": 248585243788.0, "Puerto Rico": 96260500000.0, "Belgium": 511533333333.0, "Thailand": 345649290737.0, "Iraq": 115388468974.0, "Hong Kong": 243665853032.0, "Sierra Leone": 2242960927.0, "Georgia": 14366566609.0, "Denmark": 332677281192.0, "Philippines": 224753569097.0, "Moldova": 7000318677.0, "Kiribati": 177960937.0, "Seychelles": 1007186292.0, "Uzbekistan": 45359432355.0, "Antigua and Barbuda": 1128708617.0, "Colombia": 336345827848.0, "Burundi": 2325972144.0, "Fiji": 3812749216.0, "Barbados": 3685000000.0, "Qatar": 172981588421.0, "Bhutan": 1688939892.0, "Sudan": 55097394769.0, "Suriname": 4350523600.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 248286777.0, "Venezuela": 316482190800.0, "Israel": 242928731135.0, "Zambia": 19206011889.0, "Papua New Guinea": 12937183216.0, "Gambia": 1109306596.0, "Kazakhstan": 186198433060.0, "Eritrea": 2608715447.0, "North Korea": 12384542599.0, "Latvia": 28252498853.0, "Hungary": 140029344474.0, "Syria": 59147033452.0, "Honduras": 17259407972.0, "Tunisia": 45863804800.0, "Republic of Macedonia": 10165373218.0, "Serbia": 45043430299.0, "Comoros": 610114092.0, "United Kingdom": 2445408064520.0, "Congo": 14748024198.0, "Greece": 298733589250.0, "Paraguay": 23877089240.0, "Earth": 69993693036500.0, "Botswana": 17627441191.0}, "/location/statistical_region/foreign_direct_investment_net_inflows": {"Lithuania": -425560872.0, "Ethiopia": -626509560.0, "Aruba": -540670391.0, "Swaziland": -131753962.0, "Argentina": -11461806106.0, "Bolivia": -858666304.0, "Cameroon": -35250979.0, "Burkina Faso": -38150367.0, "Ghana": -3196890000.0, "Saudi Arabia": -7780825000.0, "Republic of Ireland": -10187309748.0, "Bosnia and Herzegovina": -596307428.0, "Guinea": -955240000.0, "Spain": -32927439660.0, "Liberia": -1312748380.0, "Maldives": -281565780.0, "Oman": -215864759.0, "C\u00f4te d\u2019Ivoire": -314068409.0, "Pakistan": -766680000.0, "Albania": -936691049.0, "Samoa": -24499635.0, "Macau": 
-1484518549.0, "India": -17354000000.0, "Lesotho": -136069926.0, "Saint Vincent and the Grenadines": -109882294.0, "Kenya": -325817353.0, "Belarus": -1343300000.0, "Afghanistan": -75649209.0, "Bangladesh": -1134654834.0, "Solomon Islands": -140957747.0, "Saint Lucia": -80976186.0, "Mongolia": -4620100551.0, "Slovakia": -1597851682.0, "Vanuatu": -58358477.0, "Malawi": -82806903.0, "Singapore": -33570856095.0, "Montenegro": -540907166.0, "Saint Kitts and Nevis": -114056140.0, "Togo": -48641465.0, "Dominican Republic": -2371100000.0, "Bahrain": 112765957.0, "Finland": 6402174412.0, "Libya": 56900000.0, "Indonesia": -14429887628.0, "Vietnam": -6480000000.0, "Russia": -358083300.0, "Romania": -2557000000.0, "South Africa": -6042705282.0, "Fiji": -190357291.0, "Malaysia": 3219856442.0, "Austria": 12289609229.0, "Japan": 117440000000.0, "Niger": -1000054471.0, "Kuwait": 8328173808.0, "Costa Rica": -2098850939.0, "Bahamas": -594982000.0, "Nigeria": -8025110597.0, "Ecuador": -640736359.0, "Czech Republic": -9243781083.0, "Australia": -51676175225.0, "Algeria": -2027471933.0, "El Salvador": -385350000.0, "Chile": -9232973456.0, "Belgium": 16309420385.0, "Gambia": -35998400.0, "Philippines": -952000000.0, "Moldova": -139430000.0, "Morocco": -2273342437.0, "Croatia": -1373993102.0, "Switzerland": 39812665157.0, "Kosovo": -272977856.0, "Timor-Leste": -47074658.0, "Colombia": -16070799122.0, "Burundi": -3354999.0, "Cyprus": -446852288.0, "Barbados": -329023166.0, "Qatar": 1513076923.0, "Bhutan": -16402331.0, "Sudan": -3056748795.0, "Nepal": -94022275.0, "Netherlands": -3736854662.0, "Suriname": -72895403.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": -22182553.0, "Venezuela": -756000000.0, "Iceland": -3828118586.0, "Senegal": -263880605.0, "Germany": 60445461151.0, "Denmark": 4160376105.0, "Kazakhstan": -8380073233.0, "Tanzania": -1095401491.0, "Trinidad and Tobago": -549400000.0, "Guyana": -269560000.0, "Syria": -1469196863.0, "Honduras": -996693901.0, "Myanmar": -1000557266.0, 
"Serbia": -2532729957.0, "United Kingdom": 9022415330.0, "Greece": -2906522266.0, "Sri Lanka": -895920000.0, "Namibia": -948833835.0, "Botswana": -302592364.0}, "/location/statistical_region/life_expectancy": {"Canada": 80.929, "United States of America": 78.49, "Lithuania": 73.563, "Cambodia": 62.977, "Ethiopia": 59.243, "Bolivia": 66.577, "Burkina Faso": 55.358, "Ghana": 64.224, "Saudi Arabia": 74.058, "Cape Verde": 73.917, "Slovenia": 79.971, "Guatemala": 71.072, "Bosnia and Herzegovina": 75.553, "Guinea": 54.092, "Spain": 82.327, "Maldives": 76.883, "Oman": 73.342, "Tanzania": 58.151, "Gabon": 62.691, "New Zealand": 80.905, "Pakistan": 65.449, "Samoa": 72.543, "Macau": 81.019, "United Arab Emirates": 76.743, "Uruguay": 76.412, "Saint Vincent and the Grenadines": 72.295, "Kenya": 57.081, "South Korea": 80.866, "Tajikistan": 67.536, "Afghanistan": 48.681, "Czech Republic": 77.873, "Solomon Islands": 67.863, "Saint Lucia": 74.611, "Hungary": 74.859, "Mongolia": 68.488, "Rwanda": 55.395, "Laos": 67.432, "Norway": 81.295, "Malawi": 54.136, "Benin": 56.014, "Singapore": 81.893, "Montenegro": 74.504, "Armenia": 73.916, "Dominican Republic": 73.438, "Bahrain": 75.156, "Tonga": 72.286, "Libya": 74.95, "Indonesia": 69.319, "Central African Republic": 48.346, "Mauritius": 73.267, "Sweden": 81.802, "Australia": 81.846, "Mali": 51.372, "Russia": 69.005, "Bulgaria": 74.163, "Romania": 74.512, "Angola": 51.059, "Portugal": 80.722, "South Africa": 52.615, "Fiji": 69.349, "Qatar": 78.249, "Austria": 81.032, "Mozambique": 50.151, "Uganda": 54.074, "Japan": 82.591, "Brazil": 73.435, "Kuwait": 74.728, "Costa Rica": 79.315, "Republic of Ireland": 80.495, "Nigeria": 51.863, "Ecuador": 75.63, "Brunei": 78.065, "Belarus": 70.651, "Iran": 72.999, "El Salvador": 71.945, "Chile": 79.017, "Belgium": 80.485, "Iraq": 68.985, "Hong Kong": 83.422, "Georgia": 73.327, "Denmark": 79.8, "Morocco": 72.132, "Belize": 76.053, "Kosovo": 70.149, "Djibouti": 57.909, "Burundi": 50.337, "Nicaragua": 
73.996, "Barbados": 76.739, "Madagascar": 66.696, "Sudan": 61.448, "Vanuatu": 71.098, "Democratic Republic of the Congo": 48.369, "Netherlands": 81.205, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 64.592, "Venezuela": 74.312, "Israel": 81.756, "Iceland": 82.359, "C\u00f4te d\u2019Ivoire": 55.421, "Mauritania": 58.547, "Kyrgyzstan": 69.602, "North Korea": 68.676, "Latvia": 73.576, "Guyana": 69.863, "Syria": 75.844, "Nepal": 68.726, "Honduras": 73.113, "Myanmar": 65.15, "Serbia": 74.585, "United Kingdom": 80.754, "Congo": 57.356, "Sri Lanka": 74.902, "Earth": 69.924, "Botswana": 53.018}, "/location/statistical_region/internet_users_percent_population": {"Canada": 86.765864, "United States of America": 81.0252, "Lithuania": 68.0, "Ethiopia": 1.48281, "Swaziland": 20.781783, "Belize": 25.0, "Argentina": 55.8, "Cameroon": 5.698987, "Bahrain": 88.0, "Saudi Arabia": 54.0, "Cape Verde": 34.743414, "Bosnia and Herzegovina": 65.356094, "Kuwait": 79.178201, "Dominica": 55.177014, "Maldives": 38.9301, "C\u00f4te d\u2019Ivoire": 2.378958, "Costa Rica": 47.500915, "New Zealand": 89.5109, "Pakistan": 9.9637, "Albania": 54.655959, "Samoa": 12.92249, "Macau": 64.2727, "India": 12.580061, "Azerbaijan": 54.2, "Saint Vincent and the Grenadines": 47.52, "Kenya": 32.095417, "South Korea": 84.1, "Czech Republic": 75.0, "Solomon Islands": 6.9974, "Saint Lucia": 48.6281, "France": 83.0, "Rwanda": 8.023854, "Laos": 10.747676, "Malawi": 4.3506, "Benin": 3.797705, "Federated States of Micronesia": 25.974423, "Montenegro": 56.838783, "Saint Kitts and Nevis": 79.348899, "Armenia": 39.160792, "Dominican Republic": 45.0, "Ukraine": 33.7, "Libya": 19.8637, "Finland": 91.0, "Central African Republic": 3.0, "Mauritius": 41.3946, "Vietnam": 39.49, "Mali": 2.1689, "Russia": 53.2748, "Bulgaria": 55.148098, "Romania": 50.0, "Chad": 2.1, "Fiji": 33.742357, "Malaysia": 65.8, "Austria": 81.0, "Mozambique": 4.8491, "Uganda": 14.6896, "Japan": 79.05, "Niger": 1.4077, "Guinea": 1.490144, "Guyana": 34.308046, 
"Qatar": 88.104367, "Republic of Ireland": 79.0, "Bahamas": 71.748203, "Nigeria": 32.8763, "Ecuador": 35.134506, "Bangladesh": 6.3, "Brunei": 60.273065, "Algeria": 15.228027, "El Salvador": 25.5, "Chile": 61.418155, "Puerto Rico": 51.4114, "Belgium": 82.0, "Kiribati": 10.746798, "Haiti": 10.870296, "Iraq": 7.1, "Sierra Leone": 1.3, "Gambia": 12.449229, "Moldova": 43.37, "Morocco": 55.0, "Namibia": 12.9414, "Guinea-Bissau": 2.893991, "Thailand": 26.5, "Seychelles": 47.076, "Portugal": 64.0, "Uzbekistan": 36.5213, "Timor-Leste": 0.9147, "Spain": 72.0, "Colombia": 48.984319, "Burundi": 1.22, "Nicaragua": 13.5, "Madagascar": 2.0549, "Bhutan": 25.434349, "Vanuatu": 10.598, "Netherlands": 93.0, "Suriname": 34.6812, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 21.5724, "Venezuela": 44.0456, "Israel": 73.365016, "Iceland": 96.0, "Zimbabwe": 17.0908, "Germany": 84.0, "Denmark": 93.0, "Kazakhstan": 53.315669, "Eritrea": 0.8, "Trinidad and Tobago": 59.5162, "Latvia": 74.0, "Hungary": 72.0, "Syria": 24.3001, "Honduras": 18.11987, "Myanmar": 1.0691, "Tunisia": 41.4416, "Sri Lanka": 18.2854, "Earth": 35.571014, "Comoros": 5.975296}, "/location/statistical_region/cpi_inflation_rate": {"Canada": 1.52, "Lithuania": 3.08, "Ethiopia": 23.43, "Aruba": 0.56, "Swaziland": 9.4, "Argentina": 10.03, "Bolivia": 4.59, "Ghana": 9.16, "Japan": -0.03, "Republic of Ireland": 1.69, "Slovenia": 2.6, "Guatemala": 3.78, "Kuwait": 2.92, "Dominica": 1.44, "Maldives": 11.29, "C\u00f4te d\u2019Ivoire": 1.31, "Yemen": 17.29, "Pakistan": 9.69, "Samoa": 2.05, "Macau": 6.11, "United Arab Emirates": 0.88, "India": 9.31, "Azerbaijan": 1.06, "Saint Vincent and the Grenadines": 2.6, "Kenya": 9.38, "South Korea": 2.21, "Tajikistan": 5.83, "Afghanistan": 6.8, "Bangladesh": 8.74, "Solomon Islands": 7.34, "Saint Lucia": 4.18, "Mongolia": 14.98, "France": 1.96, "Vanuatu": 0.86, "Norway": 0.71, "Benin": 6.75, "Singapore": 4.53, "Saint Kitts and Nevis": 1.37, "Togo": 2.63, "Armenia": 2.56, "Ukraine": 0.56, "Bahrain": 
-0.36, "Libya": 6.07, "Indonesia": 4.28, "Central African Republic": 1.3, "Mauritius": 3.85, "Sweden": 0.89, "Vietnam": 9.09, "Bulgaria": 2.95, "Angola": 10.29, "Chad": 10.25, "South Africa": 5.41, "Malaysia": 1.66, "Senegal": 1.42, "Mozambique": 10.35, "Uganda": 14.02, "Hungary": 5.71, "Niger": 0.46, "Guinea": 15.21, "Costa Rica": 4.5, "Cape Verde": 2.54, "Bahamas": 3.17, "Ecuador": 5.1, "Czech Republic": 3.3, "Brunei": 0.46, "Belarus": 59.22, "Iran": 27.34, "Algeria": 8.89, "El Salvador": 1.73, "Chile": 3.01, "Belgium": 2.84, "Haiti": 6.28, "Belize": 1.31, "Gambia": 4.8, "Philippines": 3.17, "Croatia": 3.42, "Switzerland": -0.67, "Portugal": 2.77, "Estonia": 3.93, "Kosovo": 2.48, "Djibouti": 7.88, "Spain": 2.45, "Colombia": 3.18, "Barbados": 4.53, "Madagascar": 6.36, "Bhutan": 10.92, "Sudan": 12.99, "Laos": 4.26, "Netherlands": 2.45, "Suriname": 5.01, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 10.41, "Venezuela": 21.07, "Israel": 1.71, "Iceland": 5.19, "Zambia": 6.43, "Austria": 2.49, "Papua New Guinea": 8.44, "Kazakhstan": 5.11, "Kyrgyzstan": 2.69, "Trinidad and Tobago": 9.26, "Guyana": 2.39, "Syria": 36.7, "Honduras": 5.2, "Myanmar": 5.02, "Equatorial Guinea": 6.95, "Republic of Macedonia": 3.31, "Serbia": 7.33, "Paraguay": 3.68, "Earth": 3.82}, "/location/statistical_region/health_expenditure_as_percent_of_gdp": {"Canada": 11.18, "Turkmenistan": 2.73, "United States of America": 17.85, "Lithuania": 6.6, "Ethiopia": 4.65, "Swaziland": 8.01, "Bolivia": 4.88, "Cameroon": 5.23, "Burkina Faso": 6.51, "Ghana": 4.78, "Republic of Ireland": 9.38, "Guatemala": 6.73, "Bosnia and Herzegovina": 10.21, "Kuwait": 2.66, "Spain": 9.44, "Liberia": 19.48, "Netherlands": 11.96, "Tanzania": 7.28, "Yemen": 5.46, "Albania": 6.32, "United Arab Emirates": 3.35, "India": 3.87, "Lesotho": 12.76, "Saint Vincent and the Grenadines": 4.93, "South Korea": 7.21, "Afghanistan": 9.58, "Bangladesh": 3.72, "Eritrea": 2.56, "Hungary": 7.75, "Slovakia": 8.69, "Vanuatu": 4.12, "Malawi": 8.38, 
"Benin": 4.57, "Federated States of Micronesia": 13.42, "Singapore": 4.56, "Montenegro": 9.32, "Saint Kitts and Nevis": 4.43, "Togo": 8.01, "Dominican Republic": 5.36, "Bahrain": 3.79, "Tonga": 5.26, "Libya": 4.39, "Indonesia": 2.72, "Central African Republic": 3.79, "Mauritius": 5.89, "Sweden": 9.36, "Vietnam": 6.81, "Mali": 6.81, "Russia": 6.2, "Romania": 5.84, "Angola": 3.49, "Portugal": 10.36, "Malaysia": 3.58, "Austria": 10.64, "Mozambique": 6.59, "Japan": 9.27, "Niger": 5.32, "Brazil": 8.9, "Guinea": 5.96, "Earth": 10.06, "Nigeria": 5.32, "Ecuador": 7.26, "Czech Republic": 7.38, "Brunei": 2.46, "Belarus": 5.32, "Algeria": 3.93, "Chile": 7.46, "Belgium": 10.6, "Kiribati": 10.06, "Sierra Leone": 18.84, "Georgia": 9.89, "Denmark": 11.15, "Philippines": 4.07, "Morocco": 6.03, "Namibia": 5.34, "Guinea-Bissau": 6.28, "Switzerland": 10.86, "Estonia": 5.96, "Uruguay": 8.0, "Uzbekistan": 5.42, "Timor-Leste": 5.07, "Colombia": 6.12, "Burundi": 8.73, "Nicaragua": 10.05, "Barbados": 7.66, "Madagascar": 4.07, "Bhutan": 4.07, "Sudan": 8.39, "Maldives": 8.5, "Suriname": 5.29, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 7.73, "Venezuela": 5.16, "Iceland": 9.07, "Zambia": 6.12, "Senegal": 5.98, "Papua New Guinea": 4.28, "Kazakhstan": 3.92, "C\u00f4te d\u2019Ivoire": 6.78, "Mauritania": 5.39, "Guyana": 5.86, "Syria": 3.74, "Honduras": 8.62, "Myanmar": 2.0, "Equatorial Guinea": 3.95, "Tunisia": 6.16, "United Kingdom": 9.32, "Congo": 2.45, "Greece": 10.83, "Paraguay": 9.72, "Croatia": 7.81, "Comoros": 5.26}, "/location/statistical_region/time_required_to_start_a_business": {"United States of America": 6.0, "Lithuania": 20.0, "Cambodia": 85.0, "Ethiopia": 15.0, "Belize": 44.0, "Argentina": 26.0, "Bolivia": 50.0, "Burkina Faso": 13.0, "Bahrain": 9.0, "Saudi Arabia": 21.0, "Republic of Ireland": 10.0, "Slovenia": 6.0, "Guatemala": 40.0, "Guinea": 35.0, "Dominica": 13.0, "Netherlands": 5.0, "Paraguay": 35.0, "Oman": 8.0, "C\u00f4te d\u2019Ivoire": 32.0, "Gabon": 58.0, "Yemen": 40.0, 
"Albania": 4.0, "Kosovo": 52.0, "India": 27.0, "Azerbaijan": 8.0, "Lesotho": 24.0, "Saint Vincent and the Grenadines": 10.0, "Kenya": 32.0, "South Korea": 7.0, "Czech Republic": 20.0, "Solomon Islands": 9.0, "Cyprus": 8.0, "France": 7.0, "Rwanda": 3.0, "Slovakia": 16.0, "Vanuatu": 35.0, "Norway": 7.0, "Malawi": 39.0, "Federated States of Micronesia": 16.0, "Montenegro": 10.0, "Saint Kitts and Nevis": 19.0, "Togo": 38.0, "Dominican Republic": 19.0, "Ukraine": 22.0, "Ghana": 12.0, "Indonesia": 47.0, "Finland": 14.0, "Mauritius": 6.0, "Sweden": 16.0, "Mali": 8.0, "Russia": 18.0, "Romania": 10.0, "Angola": 68.0, "Portugal": 5.0, "Nicaragua": 39.0, "Senegal": 5.0, "Hungary": 5.0, "Brazil": 119.0, "Kuwait": 32.0, "Qatar": 9.0, "Cape Verde": 11.0, "Bahamas": 31.0, "Nigeria": 34.0, "Bangladesh": 19.0, "Belarus": 5.0, "El Salvador": 17.0, "Chile": 8.0, "Puerto Rico": 6.0, "Belgium": 4.0, "Haiti": 105.0, "Iraq": 74.0, "Hong Kong": 3.0, "Sierra Leone": 12.0, "Denmark": 6.0, "Moldova": 9.0, "Morocco": 12.0, "Namibia": 66.0, "Guinea-Bissau": 9.0, "Switzerland": 18.0, "Seychelles": 39.0, "Estonia": 7.0, "Uruguay": 7.0, "Laos": 92.0, "Djibouti": 37.0, "Antigua and Barbuda": 21.0, "Colombia": 13.0, "Fiji": 58.0, "Madagascar": 8.0, "Palau": 28.0, "Sudan": 36.0, "Nepal": 29.0, "Democratic Republic of the Congo": 58.0, "Austria": 25.0, "Papua New Guinea": 51.0, "Zimbabwe": 90.0, "Germany": 15.0, "Kazakhstan": 19.0, "Tanzania": 26.0, "Eritrea": 84.0, "Kyrgyzstan": 10.0, "Trinidad and Tobago": 41.0, "Latvia": 16.0, "Guyana": 20.0, "Honduras": 14.0, "Equatorial Guinea": 135.0, "Tunisia": 11.0, "Republic of Macedonia": 2.0, "Serbia": 12.0, "Greece": 11.0, "Sri Lanka": 7.0, "Croatia": 9.0, "Comoros": 20.0}, "/location/statistical_region/net_migration": {"Canada": 1098444.0, "Turkmenistan": -54499.0, "Montenegro": -2508.0, "Ethiopia": -300000.0, "Aruba": 4000.0, "Argentina": -199997.0, "Bolivia": -165177.0, "Cameroon": -19000.0, "Bahrain": 447856.0, "Cape Verde": -17279.0, "Slovenia": 
22000.0, "Spain": 2250005.0, "Liberia": 300000.0, "Oman": 153003.0, "C\u00f4te d\u2019Ivoire": -360000.0, "Gabon": 5000.0, "New Zealand": 65004.0, "Yemen": -135000.0, "Albania": -47889.0, "Samoa": -15738.0, "Saint Vincent and the Grenadines": -5000.0, "Kenya": -189330.0, "Czech Republic": 240466.0, "Mauritania": 9900.0, "Solomon Islands": 0.0, "Mongolia": -15001.0, "France": 500001.0, "Rwanda": 15109.0, "Slovakia": 36684.0, "Vanuatu": 0.0, "Norway": 171232.0, "Malawi": -20000.0, "Benin": 50000.0, "Federated States of Micronesia": -9000.0, "Singapore": 721738.0, "United States of America": 4954924.0, "Togo": -5430.0, "Armenia": -75000.0, "Dominican Republic": -140000.0, "Tonga": -8196.0, "Libya": -20300.0, "Indonesia": -1293089.0, "Central African Republic": 5000.0, "Mauritius": 0.0, "Sweden": 265649.0, "Australia": 1124639.0, "Mali": -100823.0, "Russia": 1135737.0, "Bulgaria": -50000.0, "Romania": -100000.0, "Angola": 82005.0, "Nicaragua": -200000.0, "Qatar": 857090.0, "Malaysia": 84494.0, "Senegal": -132842.0, "Vietnam": -430692.0, "Mozambique": -20000.0, "Hungary": 75000.0, "Brazil": -499999.0, "Guinea": -300000.0, "Costa Rica": 75600.0, "Bahamas": 6440.0, "Nigeria": -300000.0, "Bangladesh": -2908015.0, "Belarus": -50010.0, "Iran": -185650.0, "El Salvador": -291710.0, "Chile": 30000.0, "Puerto Rico": -145092.0, "Belgium": 200000.0, "Haiti": -239997.0, "Belize": -972.0, "Hong Kong": 176125.0, "Sierra Leone": 60000.0, "Georgia": -150000.0, "Philippines": -1233365.0, "Moldova": -171748.0, "Namibia": -1494.0, "Iraq": -150021.0, "Estonia": 0.0, "Uzbekistan": -518486.0, "Colombia": -120000.0, "Cyprus": 44166.0, "Barbados": 0.0, "Madagascar": -5000.0, "Bhutan": 16829.0, "Sudan": 135000.0, "Nepal": -100000.0, "Democratic Republic of the Congo": -23975.0, "Suriname": -4998.0, "Venezuela": 40000.0, "Israel": 273635.0, "Austria": 160000.0, "Zimbabwe": -900000.0, "Germany": 550001.0, "Kazakhstan": 6990.0, "Tanzania": -300000.0, "Eritrea": 55000.0, "Kyrgyzstan": -131593.0, 
"North Korea": 0.0, "Trinidad and Tobago": -19806.0, "Latvia": -10000.0, "Myanmar": -500000.0, "Equatorial Guinea": 20000.0, "Republic of Macedonia": 2000.0, "United Kingdom": 1020211.0, "Congo": 49872.0, "Greece": 154004.0, "Sri Lanka": -249998.0, "Croatia": 10000.0, "Botswana": 18730.0}, "/location/statistical_region/gdp_growth_rate": {"Canada": 1.709006, "Turkmenistan": 11.10011, "Lithuania": 3.7, "Swaziland": -1.5, "Bolivia": 5.17643, "Cameroon": 4.7, "Burkina Faso": 10.034099, "Ghana": 7.909568, "Republic of Ireland": 0.93789, "Slovenia": -2.3, "Guatemala": 2.965657, "Dominica": -1.45165, "Liberia": 10.819873, "Maldives": 3.419255, "Oman": 5.5, "Tanzania": 6.858715, "Seychelles": 2.9, "Gabon": 6.1, "New Zealand": 2.972364, "Pakistan": 4.185263, "Samoa": 1.2, "Kosovo": 3.8, "Lesotho": 3.963882, "Kenya": 4.3, "Tajikistan": 8.0, "Afghanistan": 6.964322, "Czech Republic": -1.323942, "Solomon Islands": 3.9, "Saint Lucia": -3.040925, "France": 0.013879, "Rwanda": 7.981099, "Slovakia": 2.0, "Vanuatu": 2.25, "Norway": 3.091298, "Benin": 5.399974, "Federated States of Micronesia": 1.4, "Singapore": 1.318966, "United States of America": 2.21, "Saint Kitts and Nevis": -1.071206, "Armenia": 7.14438, "Timor-Leste": 8.579973, "Ukraine": 0.2, "Tonga": 0.8, "Indonesia": 6.226484, "Finland": -0.208975, "Central African Republic": 4.1, "Sweden": 0.741078, "Vietnam": 5.027921, "Mali": -1.187974, "Russia": 3.442173, "Bulgaria": 0.8, "Portugal": -3.248857, "South Africa": 2.548464, "Cyprus": -2.4, "Malaysia": 5.612743, "Austria": 0.849901, "Uganda": 3.425419, "Japan": 1.945, "Niger": 11.2, "Brazil": 0.872708, "Kuwait": 8.19, "Bahamas": 1.832172, "Nigeria": 6.55, "Bangladesh": 6.317928, "Brunei": 2.154756, "Belarus": 1.5, "Algeria": 2.5, "Chile": 5.55537, "Puerto Rico": 0.516052, "Kiribati": 2.5, "Belize": 2.0, "Hong Kong": 1.501485, "Sierra Leone": 15.223847, "Gambia": 6.011361, "Morocco": 2.712963, "Croatia": -2.0, "Switzerland": 0.96709, "Iraq": 8.42846, "Chad": 5.020602, 
"Estonia": 3.224363, "Uruguay": 3.935344, "Uzbekistan": 8.2, "Djibouti": 4.8, "Antigua and Barbuda": 2.331803, "Colombia": 4.004284, "Burundi": 4.003347, "Fiji": 2.152812, "Madagascar": 3.1, "Palau": 5.251819, "Sudan": -10.1, "Nepal": 4.6336, "Democratic Republic of the Congo": 7.150601, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 4.0, "Israel": 4.706663, "Zambia": 7.316899, "Senegal": 3.690368, "Denmark": -0.46941, "C\u00f4te d\u2019Ivoire": 9.496871, "Mauritania": 7.568362, "Kyrgyzstan": -0.899685, "Sri Lanka": 6.409926, "Latvia": 5.6, "Hungary": -1.7, "Syria": 3.2, "Honduras": 3.500041, "Equatorial Guinea": 2.5, "Tunisia": 3.6, "Republic of Macedonia": -0.26668, "Greece": -6.379831, "Paraguay": -1.213102, "Namibia": 4.984768, "Comoros": 2.960789}, "/location/statistical_region/fertility_rate": {"Turkmenistan": 2.363, "Lithuania": 1.55, "Cambodia": 2.51, "Ethiopia": 4.045, "Aruba": 1.687, "Swaziland": 3.286, "Argentina": 2.196, "Bolivia": 3.294, "Cameroon": 4.409, "Burkina Faso": 5.812, "Bahrain": 2.501, "Saudi Arabia": 2.811, "Republic of Ireland": 2.07, "Slovenia": 1.57, "Guatemala": 3.922, "Bosnia and Herzegovina": 1.14, "Guinea": 5.162, "Spain": 1.39, "Liberia": 5.161, "Maldives": 1.709, "Tanzania": 5.529, "Gabon": 3.221, "New Zealand": 2.1, "Yemen": 5.092, "Samoa": 3.815, "Macau": 1.123, "United Arab Emirates": 1.721, "India": 2.589, "Saint Vincent and the Grenadines": 2.037, "Kenya": 4.68, "South Korea": 1.22, "Afghanistan": 6.288, "Solomon Islands": 4.157, "Saint Lucia": 1.98, "Cyprus": 1.468, "Mongolia": 2.494, "Vanuatu": 3.82, "Norway": 1.95, "Malawi": 5.985, "Benin": 5.206, "Montenegro": 1.644, "Togo": 3.985, "Ukraine": 1.445, "Ghana": 4.1, "Tonga": 3.861, "Libya": 2.502, "Finland": 1.87, "Mauritius": 1.47, "Sweden": 1.98, "Australia": 1.92, "Mali": 6.227, "Romania": 1.38, "Angola": 5.313, "Chad": 5.886, "Fiji": 2.639, "Malaysia": 2.607, "Senegal": 4.735, "Vietnam": 1.822, "Mozambique": 4.832, "Hungary": 1.25, "Niger": 7.012, "Brazil": 1.811, "Kuwait": 
2.295, "Costa Rica": 1.827, "Cape Verde": 2.344, "Bahamas": 1.891, "Ecuador": 2.443, "Brunei": 2.017, "Belarus": 1.44, "Iran": 1.67, "Algeria": 2.217, "Chile": 1.849, "Puerto Rico": 1.797, "Thailand": 1.559, "Haiti": 3.264, "Hong Kong": 1.108, "Sierra Leone": 4.884, "Georgia": 1.555, "Denmark": 1.87, "Philippines": 3.099, "Morocco": 2.242, "Guinea-Bissau": 4.988, "Switzerland": 1.5, "Seychelles": 2.5, "Estonia": 1.63, "Kosovo": 2.2, "Uzbekistan": 2.499, "Djibouti": 3.679, "Colombia": 2.1, "Nicaragua": 2.573, "Barbados": 1.561, "Madagascar": 4.584, "Bhutan": 2.332, "Sudan": 4.325, "Laos": 2.658, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 3.585, "Israel": 3.03, "Iceland": 2.2, "Zambia": 6.279, "Austria": 1.44, "Papua New Guinea": 3.891, "Zimbabwe": 3.219, "Eritrea": 4.366, "North Korea": 2.01, "Trinidad and Tobago": 1.637, "Honduras": 3.078, "Myanmar": 1.976, "Equatorial Guinea": 5.106, "Tunisia": 2.04, "Serbia": 1.4, "United Kingdom": 1.94, "Sri Lanka": 2.313, "Comoros": 4.919}, "/location/statistical_region/consumer_price_index": {"Canada": 113.74, "Lithuania": 138.22, "Cambodia": 160.23, "Ethiopia": 366.69, "Swaziland": 167.13, "Argentina": 185.85, "Cameroon": 123.58, "Burkina Faso": 122.73, "Ghana": 224.24, "Saudi Arabia": 142.05, "Republic of Ireland": 111.95, "Bosnia and Herzegovina": 124.63, "Spain": 118.84, "Liberia": 187.67, "Maldives": 173.59, "Tanzania": 197.07, "Gabon": 117.15, "Albania": 121.76, "Samoa": 140.51, "India": 180.77, "Azerbaijan": 178.58, "Lesotho": 157.25, "Saint Vincent and the Grenadines": 130.83, "Cyprus": 119.36, "Tajikistan": 202.13, "Afghanistan": 164.2, "Bangladesh": 174.08, "Solomon Islands": 163.2, "Saint Lucia": 122.88, "Mongolia": 211.21, "France": 112.25, "Slovakia": 124.04, "Laos": 142.87, "Malawi": 203.31, "Singapore": 125.0, "Montenegro": 125.52, "Saint Kitts and Nevis": 132.92, "Armenia": 144.55, "Dominican Republic": 153.26, "Ukraine": 212.1, "Bahrain": 113.87, "Tonga": 140.59, "Libya": 153.55, "Central African Republic": 
125.24, "Mauritius": 151.83, "Vietnam": 216.05, "Mali": 125.97, "Russia": 185.44, "Bulgaria": 147.54, "Romania": 147.57, "Angola": 233.06, "Portugal": 116.08, "Nicaragua": 184.11, "Malaysia": 119.63, "Austria": 115.86, "Mozambique": 173.52, "Hungary": 142.83, "Brazil": 141.27, "Kuwait": 140.16, "Qatar": 141.06, "Nigeria": 200.79, "Brunei": 107.29, "Australia": 121.81, "Algeria": 139.1, "Belgium": 117.77, "Haiti": 172.6, "Iraq": 170.7, "Sierra Leone": 214.26, "Denmark": 116.89, "Namibia": 157.36, "Guinea-Bissau": 127.44, "Switzerland": 104.03, "Seychelles": 203.06, "Estonia": 137.92, "Kosovo": 127.55, "Timor-Leste": 170.53, "Dominica": 120.76, "Colombia": 133.93, "Burundi": 211.4, "Fiji": 143.57, "Barbados": 151.42, "Madagascar": 184.97, "Bhutan": 161.31, "Sudan": 166.31, "Netherlands": 113.22, "Suriname": 179.32, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 321.23, "Venezuela": 248.68, "Israel": 119.86, "Senegal": 120.1, "Papua New Guinea": 140.63, "Germany": 112.62, "Kazakhstan": 184.47, "Mauritania": 147.32, "Kyrgyzstan": 199.96, "Trinidad and Tobago": 177.82, "Latvia": 147.95, "Guyana": 146.12, "Belarus": 395.63, "Honduras": 156.02, "Myanmar": 235.84, "Tunisia": 133.97, "Serbia": 182.91, "Comoros": 119.49, "United Kingdom": 122.99, "Greece": 122.93, "Sri Lanka": 195.71, "Croatia": 123.23, "Botswana": 181.42}, "/location/statistical_region/prevalence_of_undernourisment": {"Canada": 5.0, "Turkmenistan": 5.0, "Montenegro": 5.0, "Lithuania": 5.0, "Cambodia": 17.1, "Ethiopia": 40.2, "Swaziland": 27.0, "Argentina": 5.0, "Bolivia": 24.1, "Saudi Arabia": 5.0, "Japan": 5.0, "Cape Verde": 8.9, "Guatemala": 30.4, "Spain": 5.0, "Tanzania": 38.8, "Gabon": 6.5, "New Zealand": 5.0, "Yemen": 32.4, "Albania": 5.0, "Samoa": 5.0, "United Arab Emirates": 5.0, "India": 17.5, "Azerbaijan": 5.0, "Saint Vincent and the Grenadines": 5.0, "South Korea": 5.0, "Tajikistan": 31.7, "Czech Republic": 5.0, "Mauritania": 9.3, "Solomon Islands": 12.7, "Saint Lucia": 14.6, "Mongolia": 24.2, "France": 
5.0, "Rwanda": 28.9, "Malawi": 23.1, "Benin": 8.1, "United States of America": 5.0, "Saint Kitts and Nevis": 14.0, "Togo": 16.5, "Armenia": 5.0, "Dominican Republic": 15.4, "Ukraine": 5.0, "Libya": 5.0, "Mauritius": 5.7, "Vietnam": 9.0, "Bulgaria": 5.0, "Romania": 5.0, "Angola": 27.4, "Portugal": 5.0, "South Africa": 5.0, "Malaysia": 5.0, "Senegal": 20.5, "Mozambique": 39.2, "Uganda": 34.6, "Hungary": 5.0, "Brazil": 6.9, "Guinea": 17.3, "Costa Rica": 6.5, "Bahamas": 7.2, "Nigeria": 8.5, "Ecuador": 18.3, "Bangladesh": 16.8, "Belarus": 5.0, "Iran": 5.0, "Algeria": 5.0, "El Salvador": 12.3, "Belgium": 5.0, "Thailand": 7.3, "Haiti": 44.5, "Belize": 6.8, "Sierra Leone": 28.8, "Georgia": 24.7, "Denmark": 5.0, "Moldova": 5.0, "Morocco": 5.5, "Namibia": 33.9, "Guinea-Bissau": 8.7, "Seychelles": 8.6, "Chad": 33.4, "Uruguay": 5.0, "Antigua and Barbuda": 20.5, "Dominica": 5.0, "Colombia": 12.6, "Fiji": 5.0, "Madagascar": 33.4, "Sudan": 39.4, "Nepal": 18.0, "Suriname": 11.4, "Venezuela": 5.0, "Israel": 5.0, "Zambia": 47.4, "Zimbabwe": 32.8, "Kazakhstan": 5.0, "Eritrea": 65.4, "Kyrgyzstan": 6.4, "Paraguay": 25.5, "Guyana": 5.1, "Honduras": 9.6, "Tunisia": 5.0, "Republic of Macedonia": 5.0, "Serbia": 5.0, "Greece": 5.0, "Sri Lanka": 24.0, "Botswana": 27.9}, "/location/statistical_region/trade_balance_as_percent_of_gdp": {"Canada": -1.2, "Turkmenistan": 7.99, "Cambodia": -5.42, "Swaziland": -11.87, "Argentina": 2.58, "Cameroon": -4.27, "Burkina Faso": -7.43, "Saudi Arabia": 30.98, "Republic of Ireland": 24.14, "Slovenia": 1.03, "Guatemala": -11.25, "Kuwait": 44.86, "Dominica": -19.79, "Liberia": -65.84, "Netherlands": 8.84, "New Zealand": 1.48, "Yemen": -4.11, "Pakistan": -8.16, "Macau": 58.57, "United Arab Emirates": 9.18, "Kosovo": -40.85, "Saint Vincent and the Grenadines": -28.96, "Tajikistan": -39.25, "Bangladesh": -10.33, "Solomon Islands": -16.98, "Mongolia": -25.93, "France": -2.22, "Laos": -6.26, "Vanuatu": -5.67, "Malawi": -9.91, "Benin": -11.98, "United States of 
America": -3.79, "Togo": -17.22, "Dominican Republic": -8.26, "Ukraine": -7.64, "Finland": -0.61, "Indonesia": 2.14, "Central African Republic": -11.56, "Mauritius": -11.87, "Australia": 0.26, "Russia": 7.21, "Angola": 19.27, "Portugal": -0.6, "South Africa": -3.05, "Malaysia": 11.93, "Senegal": -19.52, "Uganda": -9.46, "Japan": -0.91, "Niger": -30.01, "Brazil": -0.9, "Guinea": -37.04, "Costa Rica": -4.81, "Cape Verde": -25.88, "Bahamas": -18.1, "Nigeria": 3.99, "Ecuador": -2.37, "Czech Republic": 5.34, "Brunei": 50.19, "Belarus": 6.11, "Chile": 0.8, "Puerto Rico": 19.92, "Belgium": 1.19, "Thailand": 2.74, "Haiti": -41.44, "Belize": 0.31, "Hong Kong": 0.04, "Sierra Leone": -37.34, "Denmark": 4.4, "Philippines": -5.68, "Moldova": -40.4, "Morocco": -12.63, "Namibia": -6.1, "Seychelles": -53.48, "Chad": 15.22, "Estonia": 0.49, "Uruguay": -3.4, "Antigua and Barbuda": -9.53, "Cyprus": -6.44, "Barbados": -5.06, "Madagascar": -11.05, "Palau": 7.33, "Bhutan": -21.1, "Nepal": -22.81, "Maldives": -0.91, "Suriname": -3.46, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": -45.9, "Venezuela": 6.33, "Israel": -0.86, "Iceland": 6.31, "Zambia": 8.57, "Austria": 3.53, "Papua New Guinea": 3.18, "Zimbabwe": -23.73, "Germany": 5.65, "Gambia": -18.3, "Eritrea": -3.16, "Kyrgyzstan": -30.87, "Syria": -0.04, "Tunisia": -10.65, "Republic of Macedonia": -20.45, "United Kingdom": -2.35, "Congo": 33.74, "Greece": -5.03, "Sri Lanka": -14.55, "Earth": 0.11}, "/location/statistical_region/gni_in_ppp_dollars": {"Montenegro": 8654162319.0, "Lithuania": 67938246602.0, "Ethiopia": 104221119312.0, "Argentina": 703278101130.0, "Burkina Faso": 24933633406.0, "Bahrain": 26801669108.0, "Slovenia": 56072441827.0, "Guatemala": 74834109960.0, "Bosnia and Herzegovina": 35979806912.0, "Spain": 1493845075000.0, "Liberia": 2507173874.0, "Maldives": 2601820817.0, "Oman": 71695822352.0, "C\u00f4te d\u2019Ivoire": 38821061079.0, "Seychelles": 2261451674.0, "Gabon": 23328015948.0, "New Zealand": 131980720498.0, "Yemen": 
54066290308.0, "Pakistan": 543576254992.0, "Samoa": 807211757.0, "Macau": 37533041979.0, "United Arab Emirates": 380512914082.0, "Madagascar": 21178794693.0, "Lesotho": 4527708912.0, "Kenya": 76054614484.0, "Tajikistan": 17819107731.0, "Czech Republic": 259789481557.0, "Solomon Islands": 1191968090.0, "Saint Lucia": 1992531771.0, "Mongolia": 14265391693.0, "Rwanda": 13568517097.0, "Laos": 18141360226.0, "Norway": 336059622022.0, "Malawi": 13924085669.0, "Federated States of Micronesia": 423304417.0, "Singapore": 324599341500.0, "United States of America": 15887600000000.0, "Saint Kitts and Nevis": 925927382.0, "Togo": 6098235684.0, "Timor-Leste": 7760635504.0, "Dominican Republic": 100960380994.0, "Ukraine": 332544983323.0, "Finland": 209150898182.0, "Central African Republic": 3873992201.0, "Mauritius": 20425210760.0, "Sweden": 420148431023.0, "Mali": 17163548455.0, "Russia": 3260623066100.0, "Bulgaria": 112444585303.0, "Romania": 347816274800.0, "Angola": 114272478894.0, "Chad": 16447992439.0, "Austria": 373185776984.0, "Brazil": 2328799385170.0, "Guinea": 11279280880.0, "Costa Rica": 60486240692.0, "Bahamas": 10894769805.0, "Nigeria": 409083176913.0, "Belarus": 143909024541.0, "Algeria": 301065051176.0, "El Salvador": 42772730106.0, "Chile": 372116945471.0, "Belgium": 447572881592.0, "Kiribati": 340465519.0, "Haiti": 12602329409.0, "Iraq": 140186508091.0, "Hong Kong": 379564207517.0, "Georgia": 26448099223.0, "Denmark": 242271306781.0, "Philippines": 425233320593.0, "Moldova": 13138151958.0, "Morocco": 166575345252.0, "Croatia": 84326098327.0, "Guinea-Bissau": 1980619530.0, "Thailand": 629981459136.0, "Namibia": 16879684702.0, "Belize": 2164586795.0, "Portugal": 260731604978.0, "Estonia": 29511428089.0, "Antigua and Barbuda": 1715280881.0, "Dominica": 873945056.0, "Burundi": 5481978148.0, "Fiji": 4265435843.0, "Qatar": 162744911650.0, "Palau": 355858623.0, "Bhutan": 4678326829.0, "Vanuatu": 1111709036.0, "Democratic Republic of the Congo": 24528773905.0, 
"Suriname": 4541111746.0, "Venezuela": 393044537438.0, "Israel": 210599676916.0, "Papua New Guinea": 19934921202.0, "Germany": 3430106816840.0, "Mauritania": 9569602448.0, "Kyrgyzstan": 12639933104.0, "Sri Lanka": 124507519600.0, "Latvia": 42566661549.0, "Guyana": 2702511288.0, "Nepal": 41088560414.0, "Honduras": 30907870013.0, "Equatorial Guinea": 13900860150.0, "Tunisia": 100881806830.0, "Republic of Macedonia": 24354018399.0, "Serbia": 80790555790.0, "Botswana": 33114218810.0, "United Kingdom": 2331851340120.0, "Paraguay": 37538361759.0, "Earth": 85463197120800.0, "Comoros": 881534824.0}, "/location/statistical_region/merchandise_trade_percent_of_gdp": {"Canada": 51.04, "Turkmenistan": 76.31, "Cambodia": 136.54, "Ethiopia": 34.78, "Swaziland": 102.76, "Argentina": 31.53, "Bolivia": 70.28, "Bahrain": 118.68, "Saudi Arabia": 86.06, "Republic of Ireland": 85.26, "Guatemala": 53.37, "Dominica": 49.0, "Liberia": 86.3, "Netherlands": 161.42, "C\u00f4te d\u2019Ivoire": 89.75, "Costa Rica": 64.17, "Seychelles": 125.62, "Gabon": 85.2, "New Zealand": 44.37, "Yemen": 57.51, "Albania": 52.21, "India": 42.49, "Lesotho": 151.17, "Saint Vincent and the Grenadines": 55.29, "South Korea": 94.5, "Tajikistan": 73.52, "Afghanistan": 37.52, "Bangladesh": 51.25, "Eritrea": 45.93, "Solomon Islands": 96.19, "Saint Lucia": 75.03, "Rwanda": 34.77, "Slovakia": 173.82, "Vanuatu": 44.57, "Malawi": 85.6, "Benin": 47.64, "Singapore": 286.9, "Montenegro": 66.38, "Togo": 73.42, "Armenia": 57.46, "Dominican Republic": 45.12, "Ghana": 73.69, "Tonga": 47.92, "Indonesia": 43.09, "Finland": 59.52, "Central African Republic": 24.78, "Mauritius": 74.82, "Vietnam": 161.2, "Russia": 42.92, "Chad": 58.99, "South Africa": 54.65, "Austria": 86.17, "Uganda": 41.87, "Japan": 28.26, "Brazil": 21.12, "Guyana": 112.26, "Madagascar": 45.61, "Nigeria": 62.83, "Ecuador": 58.12, "Czech Republic": 151.88, "Iran": 37.4, "Algeria": 58.08, "El Salvador": 65.62, "Chile": 58.92, "Belgium": 182.17, "Kiribati": 62.6, 
"Haiti": 46.15, "Belize": 83.34, "Hong Kong": 397.93, "Sierra Leone": 63.22, "Georgia": 64.56, "Gambia": 52.33, "Philippines": 46.89, "Guinea-Bissau": 42.34, "Thailand": 130.51, "Switzerland": 66.98, "Iraq": 72.0, "Laos": 54.85, "Djibouti": 48.69, "Antigua and Barbuda": 49.73, "Colombia": 32.26, "Burundi": 36.81, "Fiji": 87.33, "Qatar": 83.35, "Palau": 64.36, "Bhutan": 90.47, "Nepal": 38.42, "Democratic Republic of the Congo": 69.39, "Suriname": 88.64, "Israel": 59.12, "Iceland": 71.98, "Senegal": 63.21, "Zimbabwe": 75.83, "Germany": 75.73, "Denmark": 63.36, "Kazakhstan": 67.84, "Tanzania": 58.81, "Mauritania": 126.22, "Kyrgyzstan": 112.27, "Paraguay": 73.54, "Latvia": 109.19, "Hungary": 158.61, "Syria": 16.02, "Honduras": 106.3, "Tunisia": 90.79, "Republic of Macedonia": 108.79, "Serbia": 81.0, "United Kingdom": 47.17, "Congo": 118.44, "Greece": 37.93, "Sri Lanka": 48.07, "Earth": 50.46, "Comoros": 54.54}, "/location/statistical_region/labor_participation_rate": {"Canada": 47.15, "Turkmenistan": 39.22, "Lithuania": 50.59, "Cambodia": 50.1, "Swaziland": 39.53, "Argentina": 40.29, "Cameroon": 45.69, "Ghana": 49.66, "Saudi Arabia": 14.54, "Cape Verde": 38.46, "Slovenia": 45.63, "Guatemala": 38.16, "Bosnia and Herzegovina": 39.24, "Kuwait": 23.91, "Maldives": 42.01, "New Zealand": 46.8, "Yemen": 25.97, "Pakistan": 20.81, "Albania": 41.4, "Macau": 48.85, "United Arab Emirates": 14.43, "India": 25.37, "South Korea": 41.4, "Tajikistan": 43.76, "Afghanistan": 15.84, "Czech Republic": 43.33, "Saint Lucia": 46.7, "Mongolia": 46.08, "Slovakia": 44.77, "Vanuatu": 43.47, "Norway": 47.05, "Benin": 47.06, "Singapore": 43.35, "United States of America": 46.37, "Togo": 51.06, "Dominican Republic": 39.6, "Ukraine": 49.2, "Bahrain": 19.39, "Tonga": 42.69, "Libya": 27.96, "Central African Republic": 47.18, "Mauritius": 37.69, "Vietnam": 48.77, "Mali": 34.75, "Bulgaria": 46.42, "Romania": 44.58, "Angola": 45.96, "Chad": 44.86, "South Africa": 43.95, "Fiji": 32.41, "Malaysia": 37.38, 
"Austria": 46.04, "Mozambique": 53.19, "Uganda": 48.96, "Hungary": 45.91, "Niger": 31.13, "Guinea": 45.66, "Qatar": 11.97, "Bahamas": 48.35, "Earth": 39.97, "Nigeria": 42.55, "Bangladesh": 39.98, "Australia": 45.4, "Algeria": 17.07, "Puerto Rico": 42.34, "Belgium": 45.38, "Thailand": 45.79, "Haiti": 47.32, "Iraq": 17.47, "Hong Kong": 46.6, "Sierra Leone": 49.56, "Georgia": 46.95, "Denmark": 47.11, "Moldova": 49.31, "Namibia": 47.92, "Guinea-Bissau": 47.06, "Switzerland": 45.83, "Belize": 37.46, "Portugal": 47.48, "Uruguay": 44.52, "Djibouti": 34.93, "Timor-Leste": 33.55, "Colombia": 42.67, "Burundi": 51.4, "Cyprus": 43.5, "Barbados": 46.44, "Madagascar": 48.99, "Bhutan": 41.5, "Nepal": 50.68, "Democratic Republic of the Congo": 49.94, "Venezuela": 39.57, "Israel": 46.99, "Iceland": 47.34, "Senegal": 44.84, "Papua New Guinea": 48.27, "Germany": 45.64, "Gambia": 47.88, "Kazakhstan": 49.15, "Eritrea": 48.09, "North Korea": 47.84, "Latvia": 50.3, "Guyana": 35.12, "Honduras": 34.33, "Myanmar": 49.77, "Equatorial Guinea": 44.73, "Greece": 41.73, "Paraguay": 39.98, "Croatia": 45.83, "Botswana": 46.71}, "/location/statistical_region/population_growth_rate": {"Lithuania": -1.48, "Ethiopia": 2.58, "Aruba": 0.44, "Swaziland": 1.54, "Bolivia": 1.65, "Cameroon": 2.54, "Burkina Faso": 2.86, "Bahrain": 1.92, "Saudi Arabia": 1.88, "Cape Verde": 0.78, "Slovenia": 0.26, "Guatemala": 2.53, "Kuwait": 3.95, "Dominica": 0.4, "Liberia": 2.68, "Netherlands": 0.45, "Oman": 9.13, "Tanzania": 3.04, "Gabon": 2.39, "Pakistan": 1.69, "Albania": 0.26, "Samoa": 0.78, "India": 1.26, "Azerbaijan": 1.35, "Saint Vincent and the Grenadines": 0.01, "Kenya": 2.7, "South Korea": 0.45, "Tajikistan": 2.45, "Bangladesh": 1.19, "Solomon Islands": 2.13, "Mongolia": 1.52, "France": 0.5, "Rwanda": 2.77, "Slovakia": 0.22, "Laos": 1.89, "Benin": 2.73, "Federated States of Micronesia": -0.03, "Singapore": 2.45, "Montenegro": 0.07, "Ukraine": -0.25, "Tonga": 0.37, "Indonesia": 1.25, "Libya": 0.84, "Finland": 0.48, 
"Central African Republic": 1.99, "Sweden": 0.71, "Vietnam": 1.06, "Mali": 2.99, "Bulgaria": -0.6, "Romania": -0.27, "Portugal": -0.29, "South Africa": 1.18, "Fiji": 0.78, "Malaysia": 1.66, "Austria": 0.46, "Mozambique": 2.5, "Uganda": 3.35, "Japan": -0.2, "Brazil": 0.87, "Guinea": 2.56, "Costa Rica": 1.42, "Republic of Ireland": 0.26, "Bahamas": 1.52, "Nigeria": 2.79, "Ecuador": 1.6, "Brunei": 1.4, "Belarus": -0.1, "Iran": 1.32, "Algeria": 1.89, "El Salvador": 0.66, "Chile": 0.9, "Belgium": 0.85, "Thailand": 0.31, "Haiti": 1.39, "Belize": 2.43, "Hong Kong": 1.17, "Sierra Leone": 1.91, "Denmark": 0.36, "Philippines": 1.72, "Croatia": -0.32, "Guinea-Bissau": 2.39, "Switzerland": 1.07, "Seychelles": 0.39, "Chad": 3.0, "Estonia": -0.04, "Uzbekistan": 1.47, "Djibouti": 1.52, "Spain": 0.09, "Colombia": 1.32, "Burundi": 3.19, "Cyprus": 1.11, "Barbados": 0.5, "Madagascar": 2.8, "Bhutan": 1.68, "Maldives": 1.93, "Suriname": 0.9, "Venezuela": 1.53, "Iceland": 0.35, "Senegal": 2.92, "Papua New Guinea": 2.17, "Zimbabwe": 2.7, "Germany": 0.11, "Gambia": 3.19, "Kazakhstan": 1.43, "C\u00f4te d\u2019Ivoire": 2.29, "Mauritania": 2.49, "Iraq": 2.54, "Latvia": -1.6, "Guyana": 0.57, "Honduras": 2.03, "Republic of Macedonia": 0.08, "Serbia": -0.48, "Congo": 2.61, "Greece": -0.18, "Paraguay": 1.72, "Botswana": 0.86}, "/location/statistical_region/diesel_price_liter": {"United States of America": 1.05, "Lithuania": 1.7, "Cambodia": 1.27, "Argentina": 1.33, "Bolivia": 0.54, "Cameroon": 1.01, "Burkina Faso": 1.28, "Ghana": 0.95, "Republic of Ireland": 1.98, "Slovenia": 1.77, "Guatemala": 1.04, "Bosnia and Herzegovina": 1.62, "Guinea": 1.34, "Spain": 1.75, "Liberia": 1.22, "Maldives": 1.09, "Oman": 0.38, "C\u00f4te d\u2019Ivoire": 1.2, "New Zealand": 1.24, "Yemen": 0.47, "Pakistan": 1.2, "Albania": 1.79, "Samoa": 1.06, "United Arab Emirates": 0.64, "Azerbaijan": 0.57, "Kenya": 1.26, "South Korea": 1.63, "Afghanistan": 1.21, "Czech Republic": 1.87, "Slovakia": 1.85, "Laos": 1.18, "Norway": 
2.35, "Malawi": 1.9, "Singapore": 1.26, "Montenegro": 1.75, "Antigua and Barbuda": 0.96, "Dominican Republic": 1.35, "Ukraine": 1.25, "Libya": 0.1, "Mauritius": 1.38, "Vietnam": 1.06, "Russia": 1.0, "Bulgaria": 1.68, "Angola": 0.42, "Portugal": 1.89, "South Africa": 1.42, "Fiji": 1.29, "Senegal": 1.53, "Mozambique": 1.23, "Uganda": 1.35, "Hungary": 1.91, "Niger": 1.12, "Kuwait": 0.2, "Costa Rica": 1.36, "Nigeria": 1.09, "Ecuador": 0.29, "Brunei": 0.26, "Iran": 0.12, "Algeria": 0.17, "El Salvador": 1.17, "Chile": 1.24, "Belgium": 1.98, "Thailand": 0.97, "Iraq": 0.56, "Hong Kong": 1.57, "Denmark": 1.89, "Morocco": 0.96, "Croatia": 1.7, "Switzerland": 2.06, "Chad": 1.16, "Estonia": 1.76, "Uzbekistan": 0.87, "Djibouti": 1.18, "Timor-Leste": 1.43, "Colombia": 1.18, "Cyprus": 1.78, "Madagascar": 1.22, "Bhutan": 0.86, "Sudan": 0.51, "Netherlands": 1.95, "Suriname": 1.52, "Venezuela": 0.01, "Israel": 2.12, "Iceland": 2.06, "Austria": 1.81, "Kazakhstan": 0.67, "Tanzania": 1.27, "Eritrea": 1.71, "Kyrgyzstan": 0.79, "North Korea": 1.31, "Latvia": 1.77, "Guyana": 1.05, "Syria": 0.36, "Honduras": 1.15, "Myanmar": 0.8, "Nicaragua": 1.19, "Serbia": 1.8, "Congo": 0.92, "Paraguay": 1.31, "Earth": 1.27, "Botswana": 1.25}, "/location/statistical_region/gdp_real": {"Turkmenistan": 9915617757.0, "Lithuania": 17527068250.0, "Argentina": 434405530244.0, "Bolivia": 12249026878.0, "Cameroon": 13905299155.0, "Ghana": 8722164062.0, "Slovenia": 26001281943.0, "Bosnia and Herzegovina": 8193311384.0, "Dominica": 326872087.0, "Maldives": 1072369790.0, "Tanzania": 19954809364.0, "Gabon": 6287360043.0, "New Zealand": 66856387828.0, "Pakistan": 116334731021.0, "Kosovo": 3358236649.0, "Azerbaijan": 21230787904.0, "Saint Vincent and the Grenadines": 441125963.0, "Kenya": 18938389509.0, "South Korea": 800205926791.0, "Tajikistan": 1920482333.0, "Bangladesh": 82795988931.0, "Eritrea": 692457272.0, "Solomon Islands": 615526980.0, "Mongolia": 2125694923.0, "France": 1484694819490.0, "Laos": 3430231223.0, 
"Norway": 196836975505.0, "Malawi": 2743896911.0, "Benin": 3336801340.0, "Federated States of Micronesia": 226813019.0, "United States of America": 11681216873700.0, "Togo": 1719332980.0, "Dominican Republic": 40196106908.0, "Tonga": 214573346.0, "Finland": 145567532707.0, "Indonesia": 274371100612.0, "Central African Republic": 1054122016.0, "Mauritius": 6630525389.0, "Sweden": 302113665981.0, "Vietnam": 62832215474.0, "Mali": 4148253583.0, "Russia": 414355712287.0, "Bulgaria": 19207765822.0, "Chad": 3097352885.0, "South Africa": 187234185124.0, "Nicaragua": 5250043844.0, "Malaysia": 146942750904.0, "Austria": 222637577391.0, "Mozambique": 9116571405.0, "Uganda": 12614923290.0, "Niger": 2793453329.0, "Brazil": 916131427896.0, "Bahamas": 5564794827.0, "Nigeria": 85602703669.0, "Ecuador": 24995505261.0, "Czech Republic": 77630138229.0, "Algeria": 78708051653.0, "El Salvador": 15963452873.0, "Chile": 108399900217.0, "Kiribati": 76026237.0, "Haiti": 3711642006.0, "Iraq": 23583402031.0, "Sierra Leone": 1574302614.0, "Georgia": 5603249070.0, "Denmark": 171232571662.0, "Philippines": 129017441694.0, "Moldova": 2122435432.0, "Morocco": 59797619847.0, "Croatia": 27970797366.0, "Guinea-Bissau": 244395463.0, "Switzerland": 293607939568.0, "Belize": 1212219000.0, "Portugal": 124994368988.0, "Estonia": 8252354890.0, "Uruguay": 31164067816.0, "Timor-Leste": 415400000.0, "Burundi": 966494858.0, "Fiji": 1865583038.0, "Madagascar": 5026822443.0, "Sudan": 22819076998.0, "Democratic Republic of the Congo": 6850715769.0, "Netherlands": 440122535471.0, "Israel": 169830330662.0, "Iceland": 10832625067.0, "Zambia": 5587389858.0, "Zimbabwe": 4081749006.0, "Gambia": 613102927.0, "Kazakhstan": 40395838537.0, "Mauritania": 1592148932.0, "Kyrgyzstan": 2030314527.0, "Trinidad and Tobago": 14054779697.0, "Latvia": 11220120363.0, "Guyana": 822264759.0, "Honduras": 10573497668.0, "Tunisia": 30347628073.0, "Republic of Macedonia": 4438521279.0, "Serbia": 9145935035.0, "Botswana": 8405868745.0, 
"Greece": 158667298803.0, "Sri Lanka": 27029192432.0, "Earth": 41365019350600.0, "Comoros": 247231031.0}, "/location/statistical_region/population": {"Turkmenistan": 5105301.0, "Georgia": 4486000.0, "Cambodia": 14305183.0, "Swaziland": 1067773.0, "Bolivia": 10088108.0, "Cameroon": 20030362.0, "Burkina Faso": 16967845.0, "Bahrain": 1323535.0, "Japan": 127817277.0, "Slovenia": 2050189.0, "Dominica": 67675.0, "Netherlands": 16847007.0, "C\u00f4te d\u2019Ivoire": 20152894.0, "Costa Rica": 4726575.0, "Gabon": 1534262.0, "New Zealand": 4242048.0, "Yemen": 24799880.0, "Pakistan": 179160111.0, "United Arab Emirates": 7890924.0, "Uruguay": 3368595.0, "India": 1236686732.0, "Azerbaijan": 9168000.0, "Lesotho": 2193843.0, "Kenya": 41609728.0, "Afghanistan": 31108077.0, "Czech Republic": 10546000.0, "Solomon Islands": 571890.0, "Rwanda": 10942950.0, "Laos": 6288037.0, "Norway": 5063709.0, "Benin": 9325032.0, "Federated States of Micronesia": 111542.0, "Singapore": 5312400.0, "Montenegro": 632261.0, "Saint Kitts and Nevis": 53051.0, "Armenia": 3100236.0, "Timor-Leste": 1175880.0, "Dominican Republic": 10056181.0, "Ukraine": 45706100.0, "Tonga": 104509.0, "Libya": 6422772.0, "Indonesia": 246864191.0, "Central African Republic": 4486837.0, "Mauritius": 1291456.0, "Australia": 23059862.0, "Russia": 143400000.0, "Bulgaria": 7476000.0, "Romania": 21380000.0, "Angola": 19618432.0, "Portugal": 10556999.0, "South Africa": 50586757.0, "Austria": 8419000.0, "Mozambique": 23929708.0, "Uganda": 34509205.0, "Hungary": 9942000.0, "Niger": 16068994.0, "Brazil": 198656019.0, "Guyana": 756040.0, "Qatar": 1870041.0, "Bahamas": 347176.0, "Belarus": 9473000.0, "Iran": 76424443.0, "Algeria": 35980193.0, "El Salvador": 6227491.0, "Chile": 17402630.0, "Puerto Rico": 3667084.0, "Belgium": 11041266.0, "Kiribati": 101093.0, "Haiti": 10123787.0, "Iraq": 32961959.0, "Hong Kong": 7071600.0, "Sierra Leone": 5997486.0, "Nepal": 30485798.0, "Gambia": 1776103.0, "Morocco": 32878400.0, "Namibia": 2259000.0, 
"Guinea-Bissau": 1547061.0, "Switzerland": 8014000.0, "Estonia": 1340000.0, "Kosovo": 1802765.0, "Uzbekistan": 29559100.0, "Djibouti": 792198.0, "Antigua and Barbuda": 89612.0, "Spain": 46818216.0, "Colombia": 46366364.0, "Burundi": 10216190.0, "Cyprus": 1058300.0, "Madagascar": 21315135.0, "Palau": 20956.0, "Bhutan": 738267.0, "Sudan": 34318385.0, "Vanuatu": 224564.0, "Maldives": 328536.0, "Suriname": 529419.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 179506.0, "Venezuela": 29278000.0, "Iceland": 321857.0, "Zambia": 13474959.0, "Zimbabwe": 12754378.0, "Germany": 80327900.0, "Denmark": 5602628.0, "Kazakhstan": 16967000.0, "Tanzania": 46218486.0, "Mauritania": 3541540.0, "Kyrgyzstan": 5507000.0, "Trinidad and Tobago": 1346350.0, "Earth": 7046368812.0, "Honduras": 7754687.0, "Myanmar": 52797319.0, "Equatorial Guinea": 720213.0, "Serbia": 7261000.0, "United Kingdom": 62744081.0, "Congo": 4139748.0, "Sri Lanka": 20328000.0, "Croatia": 4267000.0, "Comoros": 753943.0}, "/location/statistical_region/gdp_nominal_per_capita": {"Lithuania": 14150.19, "Ethiopia": 470.22, "Aruba": 25354.78, "Argentina": 11557.57, "Bolivia": 2575.68, "Ghana": 1604.89, "Japan": 46720.36, "Republic of Ireland": 45835.75, "Guatemala": 3368.49, "Bosnia and Herzegovina": 4446.52, "Spain": 29195.38, "Netherlands": 46054.41, "Oman": 23731.21, "New Zealand": 36648.0, "Yemen": 1494.43, "Pakistan": 1290.36, "Albania": 4148.85, "Samoa": 3584.33, "Kosovo": 3453.1, "India": 1489.24, "Azerbaijan": 7227.5, "Lesotho": 1193.04, "Kenya": 862.23, "Tajikistan": 872.34, "Bangladesh": 747.34, "Solomon Islands": 1834.84, "Saint Lucia": 6558.44, "Mongolia": 3672.97, "France": 35520.0, "Rwanda": 619.93, "Slovakia": 16934.33, "Norway": 99557.73, "Malawi": 268.05, "Benin": 751.92, "Federated States of Micronesia": 3164.56, "Singapore": 51709.45, "United States of America": 49965.27, "Saint Kitts and Nevis": 13968.58, "Armenia": 3337.86, "Timor-Leste": 1068.14, "Dominican Republic": 5736.44, "Finland": 46178.59, 
"Indonesia": 3556.79, "Central African Republic": 472.68, "Mauritius": 8124.17, "Russia": 14037.02, "Romania": 7942.83, "Angola": 5484.83, "Portugal": 20182.4, "Trinidad and Tobago": 17934.06, "Fiji": 4437.76, "Austria": 47226.2, "Mozambique": 578.8, "Hungary": 12621.74, "Niger": 382.83, "Brazil": 11339.52, "Kuwait": 56514.16, "Costa Rica": 9391.16, "Cape Verde": 3837.68, "Bahamas": 21908.28, "Nigeria": 1555.41, "Czech Republic": 18607.71, "Australia": 67035.57, "Iran": 13053.0, "El Salvador": 3777.25, "Chile": 15363.1, "Puerto Rico": 27677.53, "Belgium": 43412.53, "Sierra Leone": 634.92, "Georgia": 3508.42, "Gambia": 512.1, "Moldova": 2037.94, "Morocco": 2924.94, "Croatia": 13227.47, "Guinea-Bissau": 539.45, "Estonia": 16316.46, "Uruguay": 14449.5, "South Africa": 7507.67, "Uzbekistan": 1716.53, "Djibouti": 1463.59, "Antigua and Barbuda": 13207.16, "Dominica": 6691.02, "Colombia": 7752.17, "Burundi": 250.97, "Nicaragua": 1753.64, "Qatar": 90523.53, "Palau": 11005.87, "Bhutan": 2398.91, "Sudan": 1580.0, "Nepal": 706.65, "Democratic Republic of the Congo": 271.97, "Maldives": 6566.65, "Suriname": 8864.02, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 1402.08, "Iceland": 42658.4, "Senegal": 1031.6, "Germany": 41514.17, "Denmark": 56202.0, "Kazakhstan": 12006.59, "Eritrea": 504.3, "Kyrgyzstan": 1159.63, "North Korea": 1800.0, "Paraguay": 3813.47, "Earth": 10170.68, "Syria": 3289.06, "Equatorial Guinea": 24035.71, "Tunisia": 4236.79, "Republic of Macedonia": 4589.34, "Serbia": 5189.58, "Botswana": 7191.44, "Congo": 3153.74, "Greece": 22082.89, "Sri Lanka": 2923.13, "Namibia": 5668.39, "Comoros": 830.52}, "/location/statistical_region/renewable_freshwater_per_capita": {"Canada": 82647.084624, "Lithuania": 5135.020344, "Ethiopia": 1364.759142, "Argentina": 6776.54191, "Bolivia": 29396.253261, "Burkina Faso": 781.478924, "Bahrain": 3.094146, "Saudi Arabia": 86.44995, "Cape Verde": 611.550975, "Slovenia": 9094.704271, "Bosnia and Herzegovina": 9246.424238, "Guinea": 
20248.120105, "Spain": 2408.250371, "Liberia": 49023.24854, "Netherlands": 658.955924, "Oman": 462.844497, "C\u00f4te d\u2019Ivoire": 3962.876859, "New Zealand": 74230.454917, "Pakistan": 312.204908, "India": 1184.123586, "Azerbaijan": 884.653598, "Lesotho": 2576.96909, "Kenya": 492.530068, "South Korea": 1302.758191, "Belarus": 3926.95028, "Czech Republic": 1252.847728, "Eritrea": 471.948399, "Solomon Islands": 83085.965163, "Mongolia": 12635.206696, "Rwanda": 852.452573, "Slovakia": 2334.031814, "Laos": 29196.569894, "Norway": 77123.604507, "Malawi": 1044.15123, "Singapore": 115.747439, "Togo": 1776.801584, "Armenia": 2314.00888, "Dominican Republic": 2069.455254, "Ukraine": 1161.77053, "Ghana": 1220.754962, "Indonesia": 8281.322506, "Finland": 19857.943326, "Central African Republic": 31783.837445, "Sweden": 18096.7452, "Vietnam": 4091.530055, "Mali": 4161.829407, "Bulgaria": 2857.792956, "Romania": 1978.037517, "Chad": 1241.718051, "Fiji": 32894.698942, "Qatar": 29.305532, "Senegal": 1935.376866, "Mozambique": 4080.326371, "Hungary": 601.70119, "Kuwait": 0.0, "Costa Rica": 23724.692254, "Bahamas": 54.595434, "Ecuador": 28334.407133, "Brunei": 20909.591845, "Australia": 22039.159824, "Algeria": 297.910953, "El Salvador": 2837.166465, "Puerto Rico": 1921.987346, "Belize": 50588.086506, "Gambia": 1729.140513, "Philippines": 5039.2707, "Namibia": 2777.755231, "Croatia": 8807.176564, "Switzerland": 5105.911002, "Portugal": 3599.507777, "Estonia": 9485.5843, "Uzbekistan": 556.895156, "Djibouti": 354.339358, "Timor-Leste": 6986.257101, "Colombia": 44860.964147, "Cyprus": 698.603599, "Barbados": 283.885254, "Madagascar": 15545.044789, "Bhutan": 106932.957149, "Nepal": 7298.472583, "Democratic Republic of the Congo": 14077.564754, "Maldives": 90.371245, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 11901.057447, "Venezuela": 24487.616788, "Israel": 96.576057, "Iceland": 532891.973393, "Papua New Guinea": 114216.829743, "Zimbabwe": 917.751362, "Germany": 1308.105672, 
"Denmark": 1077.088672, "Tanzania": 1812.117618, "Mauritania": 108.027438, "Kyrgyzstan": 8872.810358, "North Korea": 2720.117269, "Trinidad and Tobago": 2880.542982, "Latvia": 8133.383604, "Guyana": 304723.081319, "Syria": 324.747528, "Myanmar": 19159.224098, "Equatorial Guinea": 36313.052028, "Republic of Macedonia": 2566.674113, "Serbia": 1158.189191, "Comoros": 1713.756898, "United Kingdom": 2310.665945, "Earth": 6122.562736, "Botswana": 1208.032814}}
--------------------------------------------------------------------------------
/src/main/abstractPredictor.py:
--------------------------------------------------------------------------------
'''
This is a baseline predictor. For each property, it finds the text patterns that correlate the best.
If the value for a country cannot be predicted in this way, it returns the average of the property.
'''
import operator
import json
import numpy
import random
from sklearn.metrics import mean_squared_error
import math
import multiprocessing
from collections import OrderedDict


class AbstractPredictor(object):
    def __init__(self):
        pass

    @staticmethod
    def loadMatrix(jsonFile):
        print "loading from file " + jsonFile
        with open(jsonFile) as freebaseFile:
            property2region2value = json.loads(freebaseFile.read())

        regions = set()
        valueCounter = 0
        # items() returns a list copy in Python 2, so deleting entries while looping is safe
        for property, region2value in property2region2value.items():
            # Check for non-finite (nan or inf) values and remove them
            for region, value in region2value.items():
                if not numpy.isfinite(value):
                    del region2value[region]
                    print "REMOVED:", value, " for ", region, " ", property
            if len(region2value) == 0:
                del property2region2value[property]
                print "REMOVED property:", property, " no values left"
            else:
                valueCounter += len(region2value)
                regions = regions.union(set(region2value.keys()))

        print len(property2region2value), " properties"
        print len(regions), " unique regions"
        print valueCounter, " values loaded"
        return property2region2value

    def train(self, trainMatrix, textMatrix, params):
        pass

    def predict(self, property, region):
        pass

    @classmethod
    def runRelEval(cls, d, property, trainRegion2value, textMatrix, testRegion2value, ofn, params):
        predictor = cls()
        of = open(ofn, "w")
        print "Training"

        #try:
        #cProfile.runctx('predictor.trainRelation(property, trainRegion2value, textMatrix, of, params)', globals(), locals())

        predictor.trainRelation(property, trainRegion2value, textMatrix, of, params)
        print "Done training"
        #except FloatingPointError:
        #    print "Training with params ", params, " failed due to floating point error"
        #    avgScore = float("inf")
        #else:
        print "Testing"
        predMatrix = {}
        predMatrix[property] = {}
        for region in testRegion2value:
            predMatrix[property][region] = predictor.predict(property, region, of)

        testMatrix = {}
        testMatrix[property] = testRegion2value
        avgScore = predictor.eval(predMatrix, testMatrix, of)
        of.write("fold MAPE:" + str(avgScore) + "\n")

        # Now repeat the prediction without using the defaults
        of.write("Evaluation without using the defaults\n")
        predMatrix = {}
        predMatrix[property] = {}
        for region in testRegion2value:
            val = predictor.predict(property, region, of, False)
            if val is not None:
                predMatrix[property][region] = val
        coverage = float(len(predMatrix[property]))/len(testMatrix[property])
        if coverage > 0:
            avgScoreNoDefault = predictor.eval(predMatrix, testMatrix, of)
            of.write("fold MAPE without defaults:" + str(avgScoreNoDefault) + " coverage " + str(coverage) + "\n")
        else:
            of.write("No values predicted.\n")

        # record the score in the shared dict under the fold number (or "TEST"), taken from the file name suffix
        if ofn.split("_")[-1] == "TEST":
            d["TEST"] = avgScore
        else:
            d[int(ofn.split("_")[-1])] = avgScore

        #finally:
        # close the output file before returning
        of.close()
        return avgScore

    # cross-validates every parameter set in paramSets for each property and returns the best one per property
    @classmethod
    def crossValidate(cls, trainMatrix, textMatrix, folds, properties, outputFileName, paramSets, multi=True):
        # first construct the folds per relation
        property2folds = {}
        # we set the same random seed in order to get the same folds every time
        # we do it on the whole dataset every time, independently of the choice of properties
        random.seed(13)
        # For each property
        for property, region2value in trainMatrix.items():
            # create the empty folds
            property2folds[property] = [{} for _ in xrange(folds)]
            # shuffle the data points
            regions = region2value.keys()
            random.shuffle(regions)
            for idx, region in enumerate(regions):
                # pick a fold
                foldNo = idx % folds
                # add the datapoint there
                property2folds[property][foldNo][region] = region2value[region]

        # here we keep the best params found for each relation
        property2bestParams = {}

        # for each of the properties we decide
        for property in properties:
            print "property: " + property
            # this keeps the lowest average MAPE (across folds) achieved by any parameter set for this property
            lowestAvgMAPE = float("inf")
            # best params for this property (None if training failed for every parameter set)
            bestParams = None

            for params in paramSets:
                print "params: ", params

                if multi:
                    # run the folds in parallel processes, collecting the fold MAPEs in a shared dict
                    mgr = multiprocessing.Manager()
                    d = mgr.dict()
                    jobs = []
                else:
                    d = {}

                # for each fold
                for foldNo in xrange(folds):
                    print "fold:", foldNo
                    # construct the training and test datasets
                    foldTrainRegion2value = {}
                    foldTestRegion2value = {}
                    data = property2folds[property]

                    for idx in xrange(folds):
                        if (idx % folds) == foldNo:
                            # this is the test data
                            foldTestRegion2value = data[idx]
                        else:
                            # the rest adds to the training data
                            foldTrainRegion2value.update(data[idx])

                    # run the eval; runRelEval is a classmethod that creates its own predictor
                    # open the file for the relation, fold and params
                    paramsStrs = []
                    for param in params:
                        paramsStrs.append(str(param))
                    ofn = outputFileName + "_" + property.split("/")[-1] + "_" + "_".join(paramsStrs) + "_" + str(foldNo)
                    if multi:
                        job = multiprocessing.Process(target=cls.runRelEval, args=(d, property, foldTrainRegion2value, textMatrix, foldTestRegion2value, ofn, params,))
                        jobs.append(job)
                    else:
                        cls.runRelEval(d, property, foldTrainRegion2value, textMatrix, foldTestRegion2value, ofn, params)

                if multi:
                    # start all the jobs
                    for j in jobs:
                        j.start()

                    # Ensure all of the processes have finished
                    for j in jobs:
                        j.join()

                orderedFold2MAPE = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
                # get the average across folds
                if float("inf") not in orderedFold2MAPE.values():
                    avgMAPE = numpy.mean(orderedFold2MAPE.values())
                    print property + ":params:", params, " avgMAPE:", avgMAPE, "stdMAPE:", numpy.std(orderedFold2MAPE.values()), "foldMAPEs:", orderedFold2MAPE.values()
                    # lower is better
                    if avgMAPE <= lowestAvgMAPE:
                        bestParams = params
                        lowestAvgMAPE = avgMAPE
                else:
                    print property + ":params:", params, "Training in some folds failed due to overflow", "foldMAPEs:", orderedFold2MAPE.values()

            print property + ": lowestAvgMAPE:", lowestAvgMAPE
            print property + ": bestParams: ", bestParams
            property2bestParams[property] = bestParams

        # we return the best params
        return property2bestParams
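
    # Worked example of the round-robin fold assignment in crossValidate
    # (hypothetical region names r0..r4, for illustration only): with folds=3
    # and shuffled regions [r0, r1, r2, r3, r4], region number idx goes to
    # fold idx % 3, giving fold 0 = {r0, r3}, fold 1 = {r1, r4}, fold 2 = {r2}.
    # For foldNo=0, fold 0 is the test data and folds 1 and 2 are merged into
    # the training data {r1, r2, r4}.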
214 | 215 | 216 | @staticmethod 217 | def eval(predMatrix, testMatrix, of): 218 | of.write(str(predMatrix) +"\n") 219 | of.write(str(testMatrix) +"\n") 220 | property2MAPE = {} 221 | property2MASE = {} 222 | property2RMSE = {} 223 | for property, predRegion2value in predMatrix.items(): 224 | of.write(property+"\n") 225 | #print "real: ", testMatrix[property] 226 | #print "predicted: ", predRegion2value 227 | mape = AbstractPredictor.MAPE(predRegion2value, testMatrix[property]) 228 | of.write("MAPE: " + str(mape) + "\n") 229 | property2MAPE[property] = mape 230 | rmse = AbstractPredictor.RMSE(predRegion2value, testMatrix[property]) 231 | of.write("RMSE: " + str(rmse) + "\n") 232 | property2RMSE[property] = rmse 233 | mase = AbstractPredictor.MASE(predRegion2value, testMatrix[property]) 234 | #of.write("MASE: " + str(mase) + "\n") 235 | property2MASE[property] = mase 236 | 237 | #return numpy.mean(MAPEs) 238 | of.write("properties ordered by MAPE\n") 239 | sortedMAPEs = sorted(property2MAPE.items(), key=operator.itemgetter(1)) 240 | for property, mape in sortedMAPEs: 241 | of.write(property + ":" + str(mape)+"\n") 242 | 243 | #of.write("properties ordered by MASE\n") 244 | #sortedMASEs = sorted(property2MASE.items(), key=operator.itemgetter(1)) 245 | #for property, mase in sortedMASEs: 246 | # of.write(property + ":" + str(mase)+"\n") 247 | 248 | 249 | of.write("avg. MAPE: " + str(numpy.mean(property2MAPE.values())) +"\n") 250 | of.write("avg. RMSE: " + str(numpy.mean(property2RMSE.values())) +"\n") 251 | #of.write("avg. 
MASE: " + str(numpy.mean(property2MASE.values())) +"\n") 252 | # we use MAPE as the main metric; it is returned to guide the hyperparameter selection 253 | return numpy.mean(property2MAPE.values()) 254 | 255 | # We follow the definitions of Chen and Yang (2004) 256 | # the second dict does the scaling 257 | # not defined when the trueDict value is 0 258 | # returns the mean absolute percentage error over the keys in common, or inf if none of them can be scored 259 | @staticmethod 260 | def MAPE(predDict, trueDict, verbose=False): 261 | absPercentageErrors = {} 262 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys())) 263 | 264 | #print keysInCommon 265 | for key in keysInCommon: 266 | # avoid 0's 267 | if trueDict[key] != 0: 268 | absError = abs(predDict[key] - trueDict[key]) 269 | absPercentageErrors[key] = absError/float(numpy.abs(trueDict[key])) # float() avoids integer division in Python 2 270 | 271 | if len(absPercentageErrors) > 0: 272 | if verbose: 273 | print "MAPE results" 274 | sortedAbsPercentageErrors = sorted(absPercentageErrors.items(), key=operator.itemgetter(1)) 275 | print "top-5 predictions" 276 | print "region:pred:true" 277 | for idx in xrange(min(5, len(sortedAbsPercentageErrors))): 278 | print sortedAbsPercentageErrors[idx][0].encode('utf-8'), ":", predDict[sortedAbsPercentageErrors[idx][0]], ":", trueDict[sortedAbsPercentageErrors[idx][0]] 279 | print "bottom-5 predictions" 280 | for idx in xrange(min(5, len(sortedAbsPercentageErrors))): 281 | print sortedAbsPercentageErrors[-idx-1][0].encode('utf-8'), ":", predDict[sortedAbsPercentageErrors[-idx-1][0]], ":", trueDict[sortedAbsPercentageErrors[-idx-1][0]] 282 | 283 | return numpy.mean(absPercentageErrors.values()) 284 | else: 285 | return float("inf") 286 | 287 | 288 | # This is a variant of MASE, loosely following Hyndman 2006 289 | # it is computed in eval() but is not the metric currently used for selection (MAPE is) 290 | # it returns 1 if the method has the same absolute errors as the median of the test set.
291 | @staticmethod 292 | def MASE(predDict, trueDict, verbose=False): 293 | # first let's estimate the error from the median: 294 | median = numpy.median(trueDict.values()) 295 | 296 | # calculate the errors of the test median 297 | # we scale by the error of the median on each value; this is often 0, so we find the smallest non-zero one and add it to both numerator and denominator 298 | minMedianAbsError = float("inf") 299 | for value in trueDict.values(): 300 | medianAbsError = numpy.abs(value - median) 301 | if medianAbsError > 0 and medianAbsError < minMedianAbsError: 302 | minMedianAbsError = medianAbsError 303 | 304 | 305 | # get those that were predicted 306 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys())) 307 | predScaledAbsErrors = {} 308 | for key in keysInCommon: 309 | predScaledAbsErrors[key] = (numpy.abs(predDict[key] - trueDict[key]) + minMedianAbsError)/(numpy.abs(median - trueDict[key]) + minMedianAbsError) 310 | 311 | if verbose: 312 | print "MASE results" 313 | sortedPredScaledAbsErrors = sorted(predScaledAbsErrors.items(), key=operator.itemgetter(1)) 314 | print "top-5 predictions" 315 | print "region:pred:true" 316 | for idx in xrange(min(5, len(sortedPredScaledAbsErrors))): 317 | print sortedPredScaledAbsErrors[idx][0].encode('utf-8'), ":", predDict[sortedPredScaledAbsErrors[idx][0]], ":", trueDict[sortedPredScaledAbsErrors[idx][0]] 318 | print "bottom-5 predictions" 319 | for idx in xrange(min(5, len(sortedPredScaledAbsErrors))): 320 | print sortedPredScaledAbsErrors[-idx-1][0].encode('utf-8'), ":", predDict[sortedPredScaledAbsErrors[-idx-1][0]], ":", trueDict[sortedPredScaledAbsErrors[-idx-1][0]] 321 | 322 | return numpy.mean(predScaledAbsErrors.values()) 323 | 324 | 325 | 326 | 327 | # This is the KL-DE1 measure defined in Chen and Yang (2004) 328 | @staticmethod 329 | def KLDE(predDict, trueDict, verbose=False): 330 | kldes = {} 331 | # first we need to get the stdev used in scaling 332 | # let's use all the values for this, not only the ones in common 333 | std = numpy.std(trueDict.values())
keysInCommon = list(set(predDict.keys()) & set(trueDict.keys())) 335 | 336 | for key in keysInCommon: 337 | scaledAbsError = abs(predDict[key] - trueDict[key])/std 338 | klde = numpy.exp(-scaledAbsError) + scaledAbsError - 1 339 | kldes[key] = klde 340 | 341 | if verbose: 342 | print "KLDE results" 343 | sortedKLDEs = sorted(kldes.items(), key=operator.itemgetter(1)) 344 | print "top-5 predictions" 345 | print "region:pred:true" 346 | for idx in xrange(min(5, len(sortedKLDEs))): 347 | print sortedKLDEs[idx][0].encode('utf-8'), ":", predDict[sortedKLDEs[idx][0]], ":", trueDict[sortedKLDEs[idx][0]] 348 | print "bottom-5 predictions" 349 | for idx in xrange(min(5, len(sortedKLDEs))): 350 | print sortedKLDEs[-idx-1][0].encode('utf-8'), ":", predDict[sortedKLDEs[-idx-1][0]], ":", trueDict[sortedKLDEs[-idx-1][0]] 351 | 352 | return numpy.mean(kldes.values()) 353 | 354 | # This does a scaling according to the number of values actually used in the calculation 355 | # The more values used, the lower the score (lower is better) 356 | # smaller scaling parameters make the number of values used more important, larger ones bring it closer to standard KLDE 357 | # Inspired by the shrunk correlation coefficient (Koren 2008 equation 2) 358 | @staticmethod 359 | def supportScaledKLDE(predDict, trueDict, scalingParam=1): 360 | klde = AbstractPredictor.KLDE(predDict, trueDict) 361 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys())) 362 | scalingFactor = float(scalingParam)/(scalingParam + len(keysInCommon)) 363 | return klde * scalingFactor 364 | 365 | @staticmethod 366 | def supportScaledMASE(predDict, trueDict, scalingParam=1): 367 | mase = AbstractPredictor.MASE(predDict, trueDict) 368 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys())) 369 | scalingFactor = float(scalingParam)/(scalingParam + len(keysInCommon)) 370 | return mase * scalingFactor 371 | 372 | @staticmethod 373 | def supportScaledMAPE(predDict, trueDict, scalingParam=1): 374 | mape = AbstractPredictor.MAPE(predDict, trueDict) 375 |
keysInCommon = list(set(predDict.keys()) & set(trueDict.keys())) 376 | scalingFactor = float(scalingParam)/(scalingParam + len(keysInCommon)) 377 | return mape * scalingFactor 378 | 379 | 380 | @staticmethod 381 | def RMSE(predDict, trueDict): 382 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys())) 383 | #print keysInCommon 384 | y_actual = [] 385 | y_predicted = [] 386 | for key in keysInCommon: 387 | y_actual.append(trueDict[key]) 388 | y_predicted.append(predDict[key]) 389 | return math.sqrt(mean_squared_error(y_actual, y_predicted)) 390 | -------------------------------------------------------------------------------- /src/main/baselinePredictor.py: -------------------------------------------------------------------------------- 1 | import fixedValuePredictor 2 | import numpy 3 | import heapq 4 | 5 | class BaselinePredictor(fixedValuePredictor.FixedValuePredictor): 6 | 7 | def __init__(self): 8 | # this keeps the patterns for each relation 9 | # each pattern has a dict of regions and values associated with it 10 | self.property2patterns = {} 11 | # this initializes the fixed value 12 | fixedValuePredictor.FixedValuePredictor.__init__(self) 13 | #super(BaselinePredictor,self).__init_() 14 | 15 | 16 | def predict(self, property, region, of, useDefault=True): 17 | # collect all the values for this region found in related patterns 18 | # if 0 then one pattern is enough, otherwise more are needed to avoid default 19 | patternsRequiredForPrediction = 0 20 | values = [] 21 | if property in self.property2patterns: 22 | patterns = self.property2patterns[property] 23 | for pattern, region2value in patterns.items(): 24 | if region in region2value: 25 | values.append(region2value[region]) 26 | of.write("region: " + region.encode('utf-8') + " pattern used: " + pattern.encode('utf-8') + " value: " + str(region2value[region]) + "\n") 27 | 28 | if len(values) > patternsRequiredForPrediction: 29 | return numpy.mean(values) 30 | else: 31 | if useDefault: 32 | 
return fixedValuePredictor.FixedValuePredictor.predict(self, property, region, of) 33 | else: 34 | return None 35 | 36 | def trainRelation(self, property, trainRegion2value, textMatrix, of, params=[False]): 37 | # get the fixed value first 38 | fixedValuePredictor.FixedValuePredictor.trainRelation(self, property, trainRegion2value, textMatrix, of) 39 | 40 | # whether we are scaling or not 41 | scaling = params[0] 42 | # if we are scaling, what is the scaling parameter? 43 | print scaling 44 | if scaling: 45 | scalingParam=float(params[1]) 46 | of.write("Training with MAPE support scaling parameter: " + str(scalingParam) + "\n") 47 | else: 48 | 49 | of.write("Training without MAPE support scaling\n") 50 | 51 | self.property2patterns[property] = {} 52 | #print "OK" 53 | patternMAPEs = [] 54 | # we first need to rank all the text patterns according to their MAPE 55 | for pattern, region2value in textMatrix.items(): 56 | # make sure that it has at least two values in common with training data, otherwise we might get spurious stuff 57 | keysInCommon = list(set(region2value.keys()) & set(trainRegion2value.keys())) 58 | if len(keysInCommon) > 1: 59 | if scaling: 60 | mape = self.supportScaledMAPE(region2value, trainRegion2value, scalingParam) 61 | else: 62 | mape = self.MAPE(region2value, trainRegion2value) 63 | heapq.heappush(patternMAPEs, (mape, pattern)) 64 | #print "OK" 65 | # predict 66 | prediction = {} 67 | for region in trainRegion2value: 68 | prediction[region] = self.predict(property, region, of) 69 | 70 | # calculate the current score with the fixed-value predictions only 71 | currentMAPE = self.MAPE(prediction, trainRegion2value) 72 | 73 | while len(patternMAPEs)>0: 74 | # The pattern with the smallest MAPE is indexed at 0 75 | # the elements are (MAPE, pattern) tuples 76 | mape, pattern = heapq.heappop(patternMAPEs) 77 | 78 | # add it to the classifiers 79 | self.property2patterns[property][pattern] = textMatrix[pattern] 80 | 81 | of.write("text pattern: " + pattern.encode('utf-8') + "\n") 82
| 83 | of.write("MAPE:" + str(mape) + "\n") 84 | #print "MASE", self.MASE(textMatrix[pattern], trainRegion2value) 85 | of.write(str(textMatrix[pattern]) + "\n") 86 | 87 | # predict 88 | prediction = {} 89 | 90 | for region in trainRegion2value: 91 | prediction[region] = self.predict(property, region, of) 92 | 93 | # calculate new MAPE 94 | newMAPE = self.MAPE(prediction, trainRegion2value) 95 | of.write("MAPE of predictor before adding the pattern:" + str(currentMAPE) + "\n") 96 | of.write("MAPE of predictor after adding the pattern:" + str(newMAPE) + "\n") 97 | # if higher than before, remove the last pattern added and break 98 | 99 | if newMAPE > currentMAPE: 100 | del self.property2patterns[property][pattern] 101 | break 102 | else: 103 | currentMAPE = newMAPE 104 | 105 | 106 | 107 | 108 | def train(self, trainMatrix, textMatrix, params=[False]): 109 | fixedValuePredictor.FixedValuePredictor.train(self, trainMatrix, textMatrix) 110 | 111 | # whether we are scaling or not 112 | scaling = params[0] 113 | # if we are scaling, what is the scaling parameter? 
114 | if scaling: 115 | scalingParam=float(params[1]) 116 | print "Training with MAPE support scaling parameter: ", scalingParam 117 | else: 118 | print "Training without MAPE support scaling" 119 | import sys # predict() writes its log to a file-like object; train() sends that log to stdout 120 | # for each property, find the patterns that result in improving the average the most 121 | # the score should improve initially as good patterns are added, then degrade as worse ones are added 122 | for property, trainRegion2value in trainMatrix.items(): 123 | print property, trainRegion2value 124 | # first get the average 125 | #self.property2median[property] = numpy.median(trainRegion2value.values()) 126 | self.property2patterns[property] = {} 127 | 128 | # this is used to store the MAPEs for each pattern 129 | patternMAPEs = [] 130 | # we first need to rank all the text patterns according to their MAPE 131 | for pattern, region2value in textMatrix.items(): 132 | # make sure that it has at least two values in common with training data, otherwise we might get spurious stuff 133 | keysInCommon = list(set(region2value.keys()) & set(trainRegion2value.keys())) 134 | if len(keysInCommon) > 1: 135 | if scaling: 136 | mape = self.supportScaledMAPE(region2value, trainRegion2value, scalingParam) 137 | else: 138 | mape = self.MAPE(region2value, trainRegion2value) 139 | heapq.heappush(patternMAPEs, (mape, pattern)) 140 | 141 | # predict 142 | prediction = {} 143 | for region in trainRegion2value: 144 | prediction[region] = self.predict(property, region, sys.stdout) 145 | # calculate the current score with the fixed-value predictions only 146 | currentMAPE = self.MAPE(prediction, trainRegion2value) 147 | while len(patternMAPEs) > 0: 148 | # The pattern with the smallest MAPE is indexed at 0 149 | # the elements are (MAPE, pattern) tuples 150 | mape, pattern = heapq.heappop(patternMAPEs) 151 | 152 | # add it to the classifiers 153 | self.property2patterns[property][pattern] = textMatrix[pattern] 154 | print "text pattern: " + pattern.encode('utf-8') 155 | print "MAPE:", mape 156 | print "MASE", self.MASE(textMatrix[pattern],
trainRegion2value) 157 | print textMatrix[pattern] 158 | 159 | # predict 160 | for region in trainRegion2value: 161 | prediction[region] = self.predict(property, region, sys.stdout) 162 | 163 | # calculate new MAPE 164 | newMAPE = self.MAPE(prediction, trainRegion2value) 165 | print "MAPE of predictor before adding the pattern:", currentMAPE 166 | print "MAPE of predictor after adding the pattern:", newMAPE 167 | # if higher than before, remove the last pattern added and break 168 | if newMAPE > currentMAPE: 169 | del self.property2patterns[property][pattern] 170 | break 171 | else: 172 | currentMAPE = newMAPE 173 | 174 | 175 | 176 | if __name__ == "__main__": 177 | 178 | import sys 179 | import json 180 | import os.path 181 | 182 | baselinePredictor = BaselinePredictor() 183 | 184 | trainMatrix = baselinePredictor.loadMatrix(sys.argv[1]) 185 | textMatrix = baselinePredictor.loadMatrix(sys.argv[2]) 186 | testMatrix = baselinePredictor.loadMatrix(sys.argv[3]) 187 | 188 | outputFileName = sys.argv[4] 189 | 190 | adjust = sys.argv[5] 191 | 192 | properties = json.loads(open(os.path.dirname(os.path.abspath(sys.argv[1])) + "/featuresKept.json").read()) 193 | 194 | if adjust == "TRUE": 195 | property2bestParams = baselinePredictor.crossValidate(trainMatrix, textMatrix, 4, properties, outputFileName, [[True, 0.0078125], [True, 0.015625],[True, 0.03125],[True, 0.0625],[True, 0.125],[True, 0.25],[True,0.5],[True,1],[True,2],[True,4],[True,8],[True,16],[True,32],]) 196 | else: 197 | property2bestParams = baselinePredictor.crossValidate(trainMatrix, textMatrix, 4, properties, outputFileName, [[False]]) 198 | #print "OK" 199 | 200 | property2MAPE = {} 201 | for property in properties: 202 | paramsStrs = [] 203 | for param in property2bestParams[property]: 204 | paramsStrs.append(str(param)) 205 | 206 | ofn = outputFileName + "_" + property.split("/")[-1] + "_" + "_".join(paramsStrs) + "_TEST" 207 | a= {} 208 | baselinePredictor.runRelEval(a, property, trainMatrix[property], textMatrix,
testMatrix[property], ofn, property2bestParams[property]) 209 | property2MAPE[property] = a.values()[0] 210 | 211 | for property in sorted(property2MAPE): 212 | print property, property2MAPE[property] 213 | print "avg MAPE:", str(numpy.mean(property2MAPE.values())) 214 | 215 | -------------------------------------------------------------------------------- /src/main/buildMatrix.py: -------------------------------------------------------------------------------- 1 | ''' 2 | 3 | This script reads in JSON files parsed and NER-tagged by Stanford CoreNLP and produces the following structure: 4 | 5 | pattern:{location1:[val1, val2, ...], location2:[val1, val2, ...]} 6 | 7 | 8 | ''' 9 | 10 | import json 11 | import sys 12 | import os 13 | import glob 14 | import networkx 15 | import re 16 | import copy 17 | import numpy 18 | import codecs 19 | 20 | # this class def allows us to write: 21 | #print(json.dumps(np.arange(5), cls=NumPyArangeEncoder)) 22 | #class NumPyArangeEncoder(json.JSONEncoder): 23 | # def default(self, obj): 24 | # if isinstance(obj, numpy.ndarray): 25 | # return obj.tolist() # or map(int, obj) 26 | # return json.JSONEncoder.default(self, obj) 27 | 28 | 29 | 30 | def getNumbers(sentence): 31 | # a number can span over multiple tokens 32 | tokenIDs2number = {} 33 | for idx, token in enumerate(sentence["tokens"]): 34 | # skip tokens that NER tagged as dates, locations, persons, organizations or misc entities 35 | # Only numerals are parsed, so spelled-out numbers such as "one million" are ignored; 36 | # a following "trillion", "billion", "million" or "thousand" token (plurals included) scales the value 37 | if token["ner"] not in ["DATE", "LOCATION", "PERSON", "ORGANIZATION", "MISC"]: 38 | try: 39 | # this turns 123,123,123.23, which fails the float conversion, into 123123123.23, which passes 40 | tokenWithoutCommas = re.sub(r",([0-9][0-9][0-9])", r"\g<1>", token["word"]) 41 | number = float(tokenWithoutCommas) 42 | # we want this to avoid taking in nan, inf and -inf as floats 43 | if numpy.isfinite(number): 44 | ids = [idx] 45 | # check the next token if it is
million or thousand 46 | if len(sentence["tokens"]) > idx+1: 47 | if sentence["tokens"][idx+1]["word"].startswith("trillion"): 48 | number = number * 1000000000000 49 | ids.append(idx+1) 50 | elif sentence["tokens"][idx+1]["word"].startswith("billion"): 51 | number = number * 1000000000 52 | ids.append(idx+1) 53 | elif sentence["tokens"][idx+1]["word"].startswith("million"): 54 | number = number * 1000000 55 | ids.append(idx+1) 56 | #print sentence["tokens"] 57 | #print number 58 | elif sentence["tokens"][idx+1]["word"].startswith("thousand"): 59 | number = number * 1000 60 | ids.append(idx+1) 61 | #print sentence["tokens"] 62 | #print number 63 | 64 | tokenIDs2number[tuple(ids)] = number 65 | 66 | except ValueError: 67 | pass 68 | return tokenIDs2number 69 | 70 | # this function performs dictionary-based location matching, to help with names that Stanford NER fails to tag 71 | # use with caution: it ignores everything apart from the tokens, overwriting existing NER tags 72 | def dictLocationMatching(sentence, tokenizedLocations): 73 | # first collect the words of the sentence as a list 74 | wordsInSentence = [] 75 | for token in sentence["tokens"]: 76 | wordsInSentence.append(token["word"]) 77 | #print wordsInSentence 78 | for tokLoc in tokenizedLocations: 79 | #print wordsInSentence 80 | #print tokLoc 81 | tokenSeqs = [(i, i+len(tokLoc)) for i in range(len(wordsInSentence)) if wordsInSentence[i:i+len(tokLoc)] == tokLoc] 82 | #print tokenSeqs 83 | for tokenSeq in tokenSeqs: 84 | for tokenNo in range(tokenSeq[0], tokenSeq[1]): 85 | sentence["tokens"][tokenNo]["ner"] = "LOCATION" 86 | 87 | def getLocations(sentence): 88 | # note that a location can span multiple tokens 89 | tokenIDs2location = {} 90 | currentLocation = [] 91 | for idx, token in enumerate(sentence["tokens"]): 92 | # if it is a location token add it: 93 | if token["ner"] == "LOCATION": 94 | currentLocation.append(idx) 95 | # if it is not a location token 96 | else: 97 | # check if we have just finished a location 98 | if
len(currentLocation) > 0: 99 | # convert the token IDs to a tuple (hashable, so it can be a dict key) and store the name under it 100 | locationTokens = [] 101 | for locIdx in currentLocation: 102 | locationTokens.append(sentence["tokens"][locIdx]["word"]) 103 | 104 | tokenIDs2location[tuple(currentLocation)] = " ".join(locationTokens) 105 | currentLocation = [] 106 | if len(currentLocation) > 0: tokenIDs2location[tuple(currentLocation)] = " ".join(sentence["tokens"][locIdx]["word"] for locIdx in currentLocation) # flush a location that ends the sentence 107 | return tokenIDs2location 108 | 109 | def buildDAGfromSentence(sentence): 110 | sentenceDAG = networkx.DiGraph() 111 | for idx, token in enumerate(sentence["tokens"]): 112 | sentenceDAG.add_node(idx, word=token["word"]) 113 | sentenceDAG.add_node(idx, lemma=token["lemma"]) 114 | sentenceDAG.add_node(idx, ner=token["ner"]) 115 | sentenceDAG.add_node(idx, pos=token["pos"]) 116 | 117 | # and now the edges: 118 | for dependency in sentence["dependencies"]: 119 | sentenceDAG.add_edge(dependency["head"], dependency["dep"], label=dependency["label"]) 120 | # add the reverse if one doesn't exist 121 | # adding an edge that already exists only updates its label, so a real (forward) dependency overwrites a reverse one 122 | if not sentenceDAG.has_edge(dependency["dep"], dependency["head"]): 123 | sentenceDAG.add_edge(dependency["dep"], dependency["head"], label="-" + dependency["label"]) 124 | return sentenceDAG 125 | 126 | # getShortestDepPaths: returns the shortest dependency paths between any location token and any number token 127 | # there can be more than one path of the minimal length 128 | def getShortestDepPaths(sentenceDAG, locationTokenIDs, numberTokenIDs): 129 | shortestPaths = [] 130 | for locationTokenID in locationTokenIDs: 131 | for numberTokenID in numberTokenIDs: 132 | try: 133 | # get the shortest paths 134 | # get them as a list, since there are unlikely to be many and we need len() 135 | tempShortestPaths = list(networkx.all_shortest_paths(sentenceDAG, source=locationTokenID, target=numberTokenID)) 136 | # if the paths found are shorter than the ones we had (or we didn't have any) 137 | if (len(shortestPaths) == 0) or len(shortestPaths[0]) > len(tempShortestPaths[0]): 138 | shortestPaths = tempShortestPaths 139 | # if they have equal length add them 140 | elif
len(shortestPaths[0]) == len(tempShortestPaths[0]): 141 | shortestPaths.extend(tempShortestPaths) 142 | # if no paths were found, do nothing 143 | except networkx.exception.NetworkXNoPath: 144 | pass 145 | return shortestPaths 146 | 147 | # given a dep path defined by its nodes, get the string of the lexicalized dep path, plus variants extended by one more dep 148 | def depPath2StringExtend(sentenceDAG, path, locationTokenIDs, numberTokenIDs, extend=True): 149 | strings = [] 150 | # this keeps the various bits of the string 151 | pathStrings = [] 152 | # get the first dep which is from the location 153 | pathStrings.append("LOCATION_SLOT~" + sentenceDAG[path[0]][path[1]]["label"]) 154 | # for the tokens in between add the lowercased word (or its NER tag) and the dep 155 | hasContentWord = False 156 | for seqOnPath, tokenId in enumerate(path[1:-1]): 157 | if sentenceDAG.node[tokenId]["ner"] == "O": 158 | pathStrings.append(sentenceDAG.node[tokenId]["word"].lower() + "~" + sentenceDAG[tokenId][path[seqOnPath+2]]["label"]) 159 | if sentenceDAG.node[tokenId]["pos"][0] in "NVJR": 160 | hasContentWord = True 161 | else: 162 | pathStrings.append(sentenceDAG.node[tokenId]["ner"] + "~" + sentenceDAG[tokenId][path[seqOnPath+2]]["label"]) 163 | 164 | pathStrings.append("NUMBER_SLOT") 165 | 166 | if hasContentWord: 167 | strings.append("+".join(pathStrings)) 168 | 169 | if extend: 170 | # create additional paths by adding each out-edge of every node on the path (except edges leading to path nodes or number tokens) 171 | # the extension is always added left of the node 172 | for nodeOnPath in path: 173 | # go over each node on the path 174 | outEdges = sentenceDAG.out_edges_iter([nodeOnPath]) 175 | 176 | for pathIdx, edge in enumerate(outEdges): 177 | tempPathStrings = copy.deepcopy(pathStrings) 178 | # we already know the source of the edge 179 | curNode, outNode = edge 180 | # only consider edges leading off the path 181 | if outNode not in path and outNode not in numberTokenIDs: 182 | if sentenceDAG.node[outNode]["ner"] == "O": 183 | if
hasContentWord or sentenceDAG.node[outNode]["pos"][0] in "NVJR": 184 | #print "*extend*" + sentenceDAG.node[outNode]["lemma"] + "~" + sentenceDAG[curNode][outNode]["label"] 185 | #print pathStrings.insert(pathIdx, "*extend*" + sentenceDAG.node[outNode]["lemma"] + "~" + sentenceDAG[curNode][outNode]["label"]) 186 | tempPathStrings.insert(pathIdx, "*extend*" + sentenceDAG.node[outNode]["word"].lower() + "~" + sentenceDAG[curNode][outNode]["label"]) 187 | #print tempPathStrings 188 | strings.append("+".join(tempPathStrings)) 189 | elif hasContentWord: 190 | tempPathStrings.insert(pathIdx, "*extend*" + sentenceDAG.node[outNode]["ner"] + "~" + sentenceDAG[curNode][outNode]["label"]) 191 | strings.append("+".join(tempPathStrings)) 192 | 193 | 194 | # # create additional paths by adding all out-edges from the number token (except for the one taking as back) 195 | # # the number token is the last one on the path 196 | # #outEdgesFromNumber = sentenceDAG.out_edges_iter([path[-1]]) 197 | # #for edge in outEdgesFromNumber: 198 | # # the source of the edge we knew 199 | # dummy, outNode = edge 200 | # # if we are not going back 201 | # if outNode != path[-2] and outNode not in numberTokenIDs: 202 | # if sentenceDAG.node[outNode]["ner"] == "O": 203 | # if hasContentWord or sentenceDAG.node[outNode]["pos"][0] in "NVJR": 204 | # strings.append("+".join(pathStrings + ["NUMBER_SLOT~" + sentenceDAG[path[-1]][outNode]["label"] + "~" + sentenceDAG.node[outNode]["lemma"] ])) 205 | # elif hasContentWord: 206 | # strings.append("+".join(pathStrings + ["NUMBER_SLOT~" + sentenceDAG[path[-1]][outNode]["label"] + "~" + sentenceDAG.node[outNode]["ner"] ])) 207 | # 208 | # # do the same for the LOCATION 209 | # outEdgesFromLocation = sentenceDAG.out_edges_iter([path[0]]) 210 | # for edge in outEdgesFromLocation: 211 | # # the source of the edge we knew 212 | # dummy, outNode = edge 213 | # # if we are not going on the path 214 | # if outNode != path[1] and outNode not in locationTokenIDs: 215 
| # if sentenceDAG.node[outNode]["ner"] == "O": 216 | # if hasContentWord or sentenceDAG.node[outNode]["pos"][0] in "NVJR": 217 | # strings.append("+".join([sentenceDAG.node[outNode]["lemma"] + "~"+ sentenceDAG[path[0]][outNode]["label"]] + pathStrings + ["NUMBER_SLOT"])) 218 | # elif hasContentWord: 219 | # strings.append("+".join([sentenceDAG.node[outNode]["ner"] + "~"+ sentenceDAG[path[0]][outNode]["label"]] + pathStrings + ["NUMBER_SLOT"])) 220 | # 221 | 222 | return strings 223 | 224 | def getSurfacePatternsExtend(sentence, locationTokenIDs, numberTokenIDs, extend=True): 225 | # so this can go either from the location to the number, or the other way around 226 | # if the number token is before the first token of the location 227 | tokenSeqs = [] 228 | if numberTokenIDs[-1] < locationTokenIDs[0]: 229 | tokenIDs = range(numberTokenIDs[-1]+1, locationTokenIDs[0]) 230 | else: 231 | tokenIDs = range(locationTokenIDs[-1]+1, numberTokenIDs[0]) 232 | 233 | # check whether there is a content word: 234 | hasContentWord = False 235 | tokens = [] 236 | for id in tokenIDs: 237 | if sentence["tokens"][id]["ner"] == "O": 238 | tokens.append('"' + sentence["tokens"][id]["word"].lower() + '"') 239 | if sentence["tokens"][id]["pos"][0] in "NVJR": 240 | hasContentWord = True 241 | else: 242 | tokens.append('"' + sentence["tokens"][id]["ner"] + '"') 243 | 244 | if numberTokenIDs[-1] < locationTokenIDs[0]: 245 | tokens = ["NUMBER_SLOT"] + tokens + ["LOCATION_SLOT"] 246 | else: 247 | tokens = ["LOCATION_SLOT"] + tokens + ["NUMBER_SLOT"] 248 | if hasContentWord: 249 | tokenSeqs.append(tokens) 250 | 251 | if extend: 252 | lhsID = min(list(numberTokenIDs) + list(locationTokenIDs)) 253 | rhsID = max(list(numberTokenIDs) + list(locationTokenIDs)) 254 | # add the word to left 255 | extension = [] 256 | extensionHasContentWord = False 257 | for idx in range(lhsID-1, max(-1, lhsID-10),-1): 258 | if sentence["tokens"][idx]["ner"] == "O": 259 | extension = ['"' + 
sentence["tokens"][idx]["word"].lower() + '"'] + extension 260 | if sentence["tokens"][idx]["pos"][0] in "NVJR": 261 | extensionHasContentWord = True 262 | else: 263 | extension = ['"' + sentence["tokens"][idx]["ner"] + '"'] + extension 264 | # add the extension if it has a content word and the last thing added is not a comma 265 | if (hasContentWord or extensionHasContentWord) and (sentence["tokens"][idx]["word"] != ","): 266 | tokenSeqs.append(copy.copy(extension) + tokens) 267 | 268 | # and now to the right 269 | extension = [] 270 | extensionHasContentWord = False 271 | for idx in range(rhsID+1, min(len(sentence["tokens"]), rhsID+9)): 272 | if sentence["tokens"][idx]["ner"] == "O": 273 | extension.append('"' + sentence["tokens"][idx]["word"].lower() + '"') 274 | if sentence["tokens"][idx]["pos"][0] in "NVJR": 275 | extensionHasContentWord = True 276 | else: 277 | extension.append('"' + sentence["tokens"][idx]["ner"] + '"') 278 | # add the extension if it has a content word and the last thing added is not a comma 279 | if (hasContentWord or extensionHasContentWord) and (sentence["tokens"][idx]["word"] != ","): 280 | tokenSeqs.append(tokens + copy.copy(extension)) 281 | 282 | return tokenSeqs 283 | 284 | 285 | if __name__ == "__main__": 286 | 287 | parsedJSONDir = sys.argv[1] 288 | 289 | # get all the files 290 | jsonFiles = glob.glob(parsedJSONDir + "/*.json") 291 | 292 | # one json to rule them all 293 | outputFile = sys.argv[2] 294 | 295 | # this forms the columns using the lexicalized dependency and surface patterns 296 | pattern2location2values = {} 297 | 298 | # this keeps the sentences for each pattern 299 | pattern2sentences = {} 300 | 301 | print str(len(jsonFiles)) + " files to process" 302 | 303 | # load the hardcoded names (if any): 304 | tokenizedLocationNames = [] 305 | if len(sys.argv) > 3: 306 | names = codecs.open(sys.argv[3], encoding='utf-8').readlines() 307 | for name in names: 308 | tokenizedLocationNames.append(unicode(name).split()) 309 | 
print "Dictionary with hardcoded tokenized location names" 310 | print tokenizedLocationNames 311 | 312 | for fileCounter, jsonFileName in enumerate(jsonFiles): 313 | print "processing " + jsonFileName 314 | with codecs.open(jsonFileName) as jsonFile: 315 | parsedSentences = json.loads(jsonFile.read()) 316 | 317 | for sentence in parsedSentences: 318 | # optionally re-tag known location names with dictionary matching 319 | if len(tokenizedLocationNames)>0: 320 | dictLocationMatching(sentence, tokenizedLocationNames) 321 | 322 | tokenIDs2number = getNumbers(sentence) 323 | tokenIDs2location = getLocations(sentence) 324 | 325 | # if there is at least one location and one number, and the sentence has fewer than 120 tokens, build the dependency graph: 326 | if len(tokenIDs2number) > 0 and len(tokenIDs2location) > 0 and len(sentence["tokens"])<120: 327 | 328 | sentenceDAG = buildDAGfromSentence(sentence) 329 | 330 | wordsInSentence = [] 331 | for token in sentence["tokens"]: 332 | wordsInSentence.append(token["word"]) 333 | sample = " ".join(wordsInSentence) 334 | 335 | # for each pair of location and number 336 | # find their dependency paths (there might be more than one) 337 | for locationTokenIDs, location in tokenIDs2location.items(): 338 | 339 | for numberTokenIDs, number in tokenIDs2number.items(): 340 | 341 | # keep all the shortest paths between the number and the tokens of the location 342 | shortestPaths = getShortestDepPaths(sentenceDAG, locationTokenIDs, numberTokenIDs) 343 | 344 | # ignore paths with 10 or more nodes, i.e. more than 8 deps (deps = tokens_on_path + 1) 345 | if len(shortestPaths) > 0 and len(shortestPaths[0]) < 10: 346 | for shortestPath in shortestPaths: 347 | pathStrings = depPath2StringExtend(sentenceDAG, shortestPath, locationTokenIDs, numberTokenIDs) 348 | for pathString in pathStrings: 349 | if pathString not in pattern2location2values: 350 | pattern2location2values[pathString] = {} 351 | 352 | 353 | if location not in pattern2location2values[pathString]: 354 | pattern2location2values[pathString][location] = [] 355 |
pattern2location2values[pathString][location].append(number) 357 | if pathString in pattern2sentences: 358 | pattern2sentences[pathString].append(sample) 359 | else: 360 | pattern2sentences[pathString] = [sample] 361 | 362 | # now get the surface strings 363 | surfacePatternTokenSeqs = getSurfacePatternsExtend(sentence, locationTokenIDs, numberTokenIDs) 364 | for surfacePatternTokens in surfacePatternTokenSeqs: 365 | if len(surfacePatternTokens) < 15: 366 | surfaceString = ",".join(surfacePatternTokens) 367 | if surfaceString not in pattern2location2values: 368 | pattern2location2values[surfaceString] = {} 369 | 370 | 371 | if location not in pattern2location2values[surfaceString]: 372 | pattern2location2values[surfaceString][location] = [] 373 | 374 | pattern2location2values[surfaceString][location].append(number) 375 | 376 | if surfaceString in pattern2sentences: 377 | pattern2sentences[surfaceString].append(sample) 378 | else: 379 | pattern2sentences[surfaceString] = [sample] 380 | 381 | # save a checkpoint every 10000 files 382 | if fileCounter % 10000 == 0: 383 | print str(fileCounter) + " files processed" 384 | with open(outputFile + "_tmp", "wb") as out: 385 | json.dump(pattern2location2values, out) 386 | 387 | with open(outputFile + "_sentences_tmp", "wb") as out: 388 | json.dump(pattern2sentences, out) 389 | 390 | 391 | with open(outputFile, "wb") as out: 392 | json.dump(pattern2location2values, out) 393 | 394 | with open(outputFile + "_sentences", "wb") as out: 395 | json.dump(pattern2sentences, out) 396 | 397 | 398 | -------------------------------------------------------------------------------- /src/main/factChecker.py: -------------------------------------------------------------------------------- 1 | ''' 2 | 3 | It first trains a model using a matrix of text patterns and the database values. 4 | 5 | It then obtains a ranking of the patterns. 6 | 7 | Finally, it checks JSON files parsed and NER-tagged by Stanford CoreNLP and produces the following structure: 8 | 9 |
Location:[dep1:[val1, val2], dep2:[val1, val2, ...]] 10 | 11 | It produces a ranking of the sentences according to the relation in question and scores each value by MAPE 12 | 13 | ''' 14 | 15 | import json 16 | import sys 17 | import os 18 | import glob 19 | import codecs 20 | import operator 21 | import buildMatrix 22 | import baselinePredictor 23 | 24 | 25 | 26 | # training data 27 | # load the FreeBase file 28 | with open(sys.argv[1]) as freebaseFile: 29 | region2property2value = json.loads(freebaseFile.read()) 30 | 31 | # we need to make it property2region2value 32 | property2region2value = {} 33 | for region, property2value in region2property2value.items(): 34 | for property, value in property2value.items(): 35 | if property not in property2region2value: 36 | property2region2value[property] = {} 37 | property2region2value[property][region] = value 38 | 39 | # text patterns 40 | textMatrix = baselinePredictor.BaselinePredictor.loadMatrix(sys.argv[2]) 41 | 42 | # specify which property to check: 43 | property = "/location/statistical_region/" + sys.argv[3] 44 | 45 | # first let's train a model 46 | 47 | predictor = baselinePredictor.BaselinePredictor() 48 | params = [True, float(sys.argv[4])] 49 | 50 | # train 51 | predictor.trainRelation(property, property2region2value[property], textMatrix, sys.stdout, params) 52 | 53 | print "patterns kept:" 54 | print predictor.property2patterns[property].keys() 55 | 56 | 57 | # parsed texts to check 58 | parsedJSONDir = sys.argv[5] 59 | 60 | # get all the files 61 | jsonFiles = glob.glob(parsedJSONDir + "/*.json") 62 | 63 | 64 | print str(len(jsonFiles)) + " files to process" 65 | 66 | # load the hardcoded names 67 | tokenizedLocationNames = [] 68 | names = codecs.open(sys.argv[6], encoding='utf-8').readlines() 69 | for name in names: 70 | tokenizedLocationNames.append(unicode(name).split()) 71 | print "Dictionary with hardcoded tokenized location names" 72 | print tokenizedLocationNames 73 | 74 | # get the aliases 75 |
# load the file 76 | with open(sys.argv[7]) as jsonFile: 77 | region2aliases = json.loads(jsonFile.read()) 78 | 79 | # so we first need to take the region2aliases dict and turn it into an alias-to-region dict 80 | alias2region = {} 81 | for region, aliases in region2aliases.items(): 82 | # add the region as an alias of itself 83 | alias2region[region] = region 84 | for alias in aliases: 85 | # if this alias is already used for a different region, mark it as ambiguous 86 | if alias in alias2region and region!=alias2region[alias]: 87 | alias2region[alias] = None 88 | alias2region[alias.lower()] = None 89 | else: 90 | # also add the lowercased alias 91 | alias2region[alias] = region 92 | alias2region[alias.lower()] = region 93 | 94 | # now filter out the ambiguous aliases (mapped to None) 95 | for alias, region in alias2region.items(): 96 | if region == None: 97 | print "alias ", alias, " ambiguous" 98 | del alias2region[alias] 99 | 100 | print alias2region 101 | 102 | # store the results in a TSV file with the columns below 103 | 104 | tsv = open(sys.argv[8], "wb") 105 | 106 | headers = ['sentence', 'region', 'kb_region', 'property', 'kb_value', 'mape_support_scaling_param', 'pattern', 'value', 'MAPE', 'source_JSON'] 107 | 108 | tsv.write("\t".join(headers) + "\n") 109 | 110 | # Now go over each file 111 | for fileCounter, jsonFileName in enumerate(jsonFiles): 112 | #print "processing " + jsonFileName 113 | with codecs.open(jsonFileName) as jsonFile: 114 | parsedSentences = json.loads(jsonFile.read()) 115 | 116 | for sentence in parsedSentences: 117 | # skip sentences with more than 120 tokens.
118 | if len(sentence["tokens"])>120: 119 | continue 120 | 121 | # fix the ner tags 122 | if len(tokenizedLocationNames)>0: 123 | buildMatrix.dictLocationMatching(sentence, tokenizedLocationNames) 124 | 125 | wordsInSentence = [] 126 | for idx, token in enumerate(sentence["tokens"]): 127 | wordsInSentence.append(token["word"]) 128 | 129 | #print "Sentence: " + sentenceText.encode('utf-8') 130 | 131 | # get the numbers mentioned 132 | tokenIDs2number = buildMatrix.getNumbers(sentence) 133 | 134 | # and the locations mentioned in the sentence 135 | tokenIDs2location = buildMatrix.getLocations(sentence) 136 | 137 | # So let's check if the locations are among those that we can fact check for this relation 138 | for locationTokenIDs, location in tokenIDs2location.items(): 139 | 140 | # so we have the location, but is it a known region? 141 | region = location 142 | # if the location has an alias 143 | if location in alias2region: 144 | # get it 145 | region = alias2region[location] 146 | elif location.lower() in alias2region: 147 | region = alias2region[location.lower()] 148 | 149 | if region in property2region2value[property]: 150 | 151 | sentenceDAG = buildMatrix.buildDAGfromSentence(sentence) 152 | 153 | for numberTokenIDs, number in tokenIDs2number.items(): 154 | 155 | #print "number in text: " + str(number) 156 | 157 | patterns = [] 158 | # keep all the shortest paths between the number and the tokens of the location 159 | shortestPaths = buildMatrix.getShortestDepPaths(sentenceDAG, locationTokenIDs, numberTokenIDs) 160 | for shortestPath in shortestPaths: 161 | pathStrings = buildMatrix.depPath2StringExtend(sentenceDAG, shortestPath, locationTokenIDs, numberTokenIDs) 162 | patterns.extend(pathStrings) 163 | 164 | # now get the surface strings 165 | surfacePatternTokenSeqs = buildMatrix.getSurfacePatternsExtend(sentence, locationTokenIDs, numberTokenIDs) 166 | for surfacePatternTokens in surfacePatternTokenSeqs: 167 | if len(surfacePatternTokens) < 15: 168 | 
surfaceString = ",".join(surfacePatternTokens) 169 | patterns.append(surfaceString) 170 | 171 | patternsApplied = [] 172 | for pattern in patterns: 173 | if pattern in predictor.property2patterns[property].keys(): 174 | patternsApplied.append(pattern) 175 | 176 | if len(patternsApplied) > 0: 177 | wordsInSentence[numberTokenIDs[0]] = "" + wordsInSentence[numberTokenIDs[0]] 178 | wordsInSentence[numberTokenIDs[-1]] = wordsInSentence[numberTokenIDs[-1]] + "" 179 | 180 | wordsInSentence[locationTokenIDs[0]] = "" + wordsInSentence[locationTokenIDs[0]] 181 | wordsInSentence[locationTokenIDs[-1]] = wordsInSentence[locationTokenIDs[-1]] + "" 182 | 183 | sentenceText = " ".join(wordsInSentence) 184 | 185 | print "Sentence: " + sentenceText.encode('utf-8') 186 | print "location in text " + location.encode('utf-8') + " is known as " + region.encode('utf-8') + " in FB with known " + property + " value " + str(property2region2value[property][region]) 187 | print "confidence level= " + str(len(patternsApplied)) + "\t" + str(patternsApplied) 188 | print "sentence states that " + location.encode('utf-8') + " has " + property + " value " + str(number) 189 | if property2region2value[property][region] != 0.0: 190 | mape = abs(number - property2region2value[property][region]) / float(abs(property2region2value[property][region])) 191 | print "MAPE: " + str(mape) 192 | else: 193 | print "MAPE undefined" 194 | mape = "undef" 195 | print "source: " + jsonFileName 196 | print "------------------------------" 197 | details = [sentenceText.encode('utf-8'), location.encode('utf-8'), region.encode('utf-8'), sys.argv[3], str(property2region2value[property][region]), str(len(patternsApplied)),str(patternsApplied), str(number), str(mape), jsonFileName] 198 | tsv.write("\t".join(details) + "\n") 199 | 200 | tsv.close() 201 | 202 | 203 | 204 | 205 | 206 | -------------------------------------------------------------------------------- /src/main/fixedValuePredictor.py: 
-------------------------------------------------------------------------------- 1 | import abstractPredictor 2 | import numpy 3 | import heapq 4 | 5 | class FixedValuePredictor(abstractPredictor.AbstractPredictor): 6 | 7 | def __init__(self): 8 | # this keeps, for each relation, the fixed value (0, median or mean) with the lowest training MAPE 9 | self.property2fixedValue = {} 10 | 11 | 12 | def predict(self, property, region, of=None, useDefault=True): 13 | if useDefault: 14 | return self.property2fixedValue[property] 15 | else: 16 | return None 17 | 18 | # TODO: remove the textMatrix from the arg list 19 | def trainRelation(self, property, trainRegion2value, textMatrix, of, params=None): 20 | 21 | # try three candidate values: 0, the median and the mean 22 | candidates = [0, numpy.median(trainRegion2value.values()), numpy.mean(trainRegion2value.values())] 23 | #print candidates 24 | bestScore = float("inf") 25 | bestCandidate = None 26 | for candidate in candidates: 27 | prediction = {} 28 | for region in trainRegion2value: 29 | prediction[region] = candidate 30 | mape = abstractPredictor.AbstractPredictor.MAPE(prediction, trainRegion2value) 31 | 32 | if mape < bestScore: 33 | bestScore = mape 34 | bestCandidate = candidate 35 | 36 | if bestCandidate == 0: 37 | of.write(property + " best value is 0 with score " + str(bestScore) + "\n") 38 | elif bestCandidate == numpy.median(trainRegion2value.values()): 39 | of.write(property + " best value is median (" + str(numpy.median(trainRegion2value.values())) + ") with score " + str(bestScore) + "\n") 40 | elif bestCandidate == numpy.mean(trainRegion2value.values()): 41 | of.write(property + " best value is mean (" + str(numpy.mean(trainRegion2value.values())) + ") with score " + str(bestScore) + "\n") 42 | self.property2fixedValue[property] = bestCandidate 43 | 44 | 45 | # TODO: refactor to reuse the above 46 | def train(self, trainMatrix, textMatrix, params=None): 47 | for property, trainRegion2value in trainMatrix.items(): 48 | print property, trainRegion2value 49 | # try three candidate values: 0, the median and the mean 50 | candidates = [0,
numpy.median(trainRegion2value.values()), numpy.mean(trainRegion2value.values())] 51 | bestScore = float("inf") 52 | bestCandidate = None 53 | for candidate in candidates: 54 | prediction = {} 55 | for region in trainRegion2value: 56 | prediction[region] = candidate 57 | mape = abstractPredictor.AbstractPredictor.MAPE(prediction, trainRegion2value) 58 | 59 | if mape < bestScore: 60 | bestScore = mape 61 | bestCandidate = candidate 62 | 63 | if bestCandidate == 0: 64 | print property, " best value is 0 with score ", bestScore 65 | elif bestCandidate == numpy.median(trainRegion2value.values()): 66 | print property, " best value is median with score ", bestScore 67 | elif bestCandidate == numpy.mean(trainRegion2value.values()): 68 | print property, " best value is mean with score ", bestScore 69 | self.property2fixedValue[property] = bestCandidate 70 | 71 | 72 | 73 | if __name__ == "__main__": 74 | 75 | import sys 76 | import os.path 77 | import json 78 | 79 | fixedValuePredictor = FixedValuePredictor() 80 | 81 | trainMatrix = fixedValuePredictor.loadMatrix(sys.argv[1]) 82 | textMatrix = fixedValuePredictor.loadMatrix(sys.argv[2]) 83 | testMatrix = fixedValuePredictor.loadMatrix(sys.argv[3]) 84 | 85 | outputFileName = sys.argv[4] 86 | 87 | #properties = ["/location/statistical_region/population","/location/statistical_region/gdp_real","/location/statistical_region/cpi_inflation_rate"] 88 | #properties = ["/location/statistical_region/population"] 89 | properties = json.loads(open(os.path.dirname(os.path.abspath(sys.argv[1])) + "/featuresKept.json").read()) 90 | 91 | property2bestParams = fixedValuePredictor.crossValidate(trainMatrix, textMatrix, 4, properties, outputFileName, [[None]]) 92 | #print "OK" 93 | 94 | print property2bestParams 95 | property2MAPE = {} 96 | for property in properties: 97 | paramsStrs = [] 98 | for param in property2bestParams[property]: 99 | paramsStrs.append(str(param)) 100 | 101 | ofn = outputFileName + "_" + property.split("/")[-1] + "_" + 
"_".join(paramsStrs) + "_TEST" 102 | a= {} 103 | fixedValuePredictor.runRelEval(a, property, trainMatrix[property], textMatrix, testMatrix[property], ofn, property2bestParams[property]) 104 | property2MAPE[property] = a.values()[0] 105 | 106 | for property in sorted(property2MAPE): 107 | print property, property2MAPE[property] 108 | print "avg MAPE:", str(numpy.mean(property2MAPE.values())) 109 | 110 | -------------------------------------------------------------------------------- /src/main/matrixFiltering.py: -------------------------------------------------------------------------------- 1 | # so this script takes as input a dictionary in json with the following structure: 2 | # dep or string pattern : {location1:[values], location2:[values]}, etc. 3 | # and does the following kinds of filtering: 4 | # - removes locations that have fewer than arg4 (minNumberOfValues) values for a pattern 5 | # - removes locations whose values for a pattern are all over the place (normalized stdev above arg6), and the whole pattern if more than a fraction arg7 of its locations are removed this way 6 | # - removes patterns that have fewer than arg5 (minNumberOfLocations) locations 7 | 8 | # The second argument is a dictionary from (FreeBase) region names to their aliases. It is used 9 | # to condense the matrix (UK and U.K. becoming the same location) and to 10 | # prepare it for the experiments 11 | 12 | 13 | import json 14 | import numpy 15 | import sys 16 | import math 17 | 18 | minNumberOfValues = int(sys.argv[4]) 19 | minNumberOfLocations = int(sys.argv[5]) 20 | maxAllowedDeviation = float(sys.argv[6]) 21 | percentageRemoved = float(sys.argv[7]) 22 | 23 | 24 | # helps detect errors 25 | numpy.seterr(all='raise') 26 | 27 | # load the file 28 | with open(sys.argv[1]) as jsonFile: 29 | pattern2locations2values = json.loads(jsonFile.read()) 30 | 31 | print "patterns before filtering:", len(pattern2locations2values) 32 | 33 | # load the file 34 | with open(sys.argv[2]) as jsonFile: 35 | region2aliases = json.loads(jsonFile.read()) 36 | 37 | # so we first need to take the region2aliases dict and turn it into an alias-to-region dict 38 | alias2region = {} 39 | for region, aliases in region2aliases.items(): 40 | # add the region as an alias of itself 41 | alias2region[region] = region 42 | for alias in aliases: 43 | # if this alias is already used for a different region, mark it as ambiguous 44 | if alias in alias2region and region!=alias2region[alias]: 45 | alias2region[alias] = None 46 | alias2region[alias.lower()] = None 47 | else: 48 | # also add the lowercased alias 49 | alias2region[alias] = region 50 | alias2region[alias.lower()] = region 51 | 52 | # now filter out the ambiguous aliases (mapped to None) 53 | for alias, region in alias2region.items(): 54 | if region == None: 55 | print "alias ", alias, " ambiguous" 56 | del alias2region[alias] 57 | 58 | # now traverse all the patterns, match their locations case-insensitively against the aliases and replace each location with its region 59 | for pattern, locations2values in pattern2locations2values.items(): 60 | # so here are the locations 61 | # we must be careful in case two or more locations are collapsed to the same region 62 | regions2values = {} 63 | # for each location 64 | for location, values in locations2values.items(): 65 | #print location 66 | #print values 67 |
#print numpy.isfinite(values) 68 | #if not numpy.all(numpy.isfinite(values)): 69 | # print "ERROR" 70 | region = location 71 | # if the location has an alias 72 | if location in alias2region: 73 | # get it 74 | region = alias2region[location] 75 | elif location.lower() in alias2region: 76 | region = alias2region[location.lower()] 77 | 78 | # if we haven't added it to the regions 79 | if region not in regions2values: 80 | regions2values[region] = values 81 | else: 82 | regions2values[region].extend(values) 83 | # replace the location values of the pattern with the new ones 84 | pattern2locations2values[pattern] = regions2values 85 | 86 | 87 | countNotEnoughValues = 0 88 | for pattern, loc2values in pattern2locations2values.items(): 89 | for loc in loc2values.keys(): 90 | # so if there are not enough values, remove the location from that pattern 91 | if len(loc2values[loc]) < minNumberOfValues: 92 | del loc2values[loc] 93 | countNotEnoughValues +=1 94 | if len(loc2values) == 0: 95 | del pattern2locations2values[pattern] 96 | 97 | print "set of values removed for having less than", minNumberOfValues, " of values: ", countNotEnoughValues 98 | 99 | countTooMuchDeviation = 0 100 | for pattern, loc2values in pattern2locations2values.items(): 101 | initialLocations = len(loc2values) 102 | locationsRemoved = 0 103 | #print pattern 104 | for loc, values in loc2values.items(): 105 | #print loc 106 | #print values 107 | a = numpy.array(values) 108 | # if the values have a high stdev after normalizing them between 0 and 1 (only positive values) 109 | # the value should be interpreted as the percentage of the max value allowed as stdev 110 | # we need the largest absolute value 111 | #print a 112 | largestAbsValue = numpy.abs(a).max() 113 | #print largestAbsValue 114 | # if we didn't have all 0 115 | if largestAbsValue > 0 and numpy.std(a/largestAbsValue) > maxAllowedDeviation: 116 | del loc2values[loc] 117 | countTooMuchDeviation += 1 118 | locationsRemoved += 1 119 | # if the 
pattern has many locations with values all over the place, remove it altogether. 120 | if float(locationsRemoved)/initialLocations > percentageRemoved: 121 | print "pattern ", pattern.encode('utf-8'), " removed because it has more than ",percentageRemoved, " value with large deviation" 122 | del pattern2locations2values[pattern] 123 | 124 | print "sets of values removed for having more than", maxAllowedDeviation, " std deviation : ", countTooMuchDeviation 125 | 126 | 127 | for pattern in pattern2locations2values.keys(): 128 | # now make sure there are enough locations left per pattern 129 | if len(pattern2locations2values[pattern]) < minNumberOfLocations: 130 | del pattern2locations2values[pattern] 131 | else: 132 | # if there are enough values then just keep the average 133 | for location in pattern2locations2values[pattern].keys(): 134 | pattern2locations2values[pattern][location] = numpy.mean(pattern2locations2values[pattern][location]) 135 | 136 | # if the feature has the same values independently of the region, remove it as well: 137 | for pattern,location2values in pattern2locations2values.items(): 138 | if min(location2values.values()) == max(location2values.values()): 139 | print "Removing pattern ", pattern.encode('utf-8'), " because it has the same values for all locations:", location2values 140 | del pattern2locations2values[pattern] 141 | 142 | print "patterns after filtering:",len(pattern2locations2values) 143 | 144 | with open(sys.argv[3], "wb") as out: 145 | json.dump(pattern2locations2values, out) 146 | -------------------------------------------------------------------------------- /src/utils/FreeBaseDownload.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Created on May 10, 2014 3 | 4 | @author: andreasvlachos 5 | 6 | This script downloads all statistical regions from FreeBase using a combination of the MQL read API and the Topic API. 
7 | ''' 8 | import json 9 | import urllib 10 | 11 | # TODO: add the part to retrieve the aliases: 12 | # [{ "mid": null, "name": null, "/common/topic/alias": [], "type": "/location/statistical_region", "limit": 100 }] 13 | 14 | api_key = open("/cs/research/intelsys/home1/avlachos/freebaseApiKey").read() 15 | mqlread_url = 'https://www.googleapis.com/freebase/v1/mqlread' 16 | # use the mid instead of the id as they do need escaping 17 | mql_query = '[{"mid": null,"name": null, "type": "/location/statistical_region","limit": 100}]' 18 | # set this to the last value we obtained 19 | cursor = "" 20 | 21 | # we need to have a parameter limit=0 as in: 22 | topicService_url = 'https://www.googleapis.com/freebase/v1/topic' 23 | params = { 24 | 'key': api_key, 25 | 'filter': '/location/statistical_region', 26 | 'limit': 0 27 | } 28 | 29 | # Given the quota, we can run this 1000 times daily. 30 | # It stops when the topics are exhausted. 31 | 32 | for i in xrange(1000): 33 | # construct the query 34 | mql_url = mqlread_url + '?query=' + mql_query + "&cursor=" + cursor 35 | print mql_url 36 | statisticalRegionsResult = json.loads(urllib.urlopen(mql_url).read()) 37 | #print statisticalRegionsResult 38 | for region in statisticalRegionsResult["result"]: 39 | print region["mid"]# + ":" + region["name"] 40 | # now get the statistical properties 41 | topic_url = topicService_url + region["mid"] + '?' 
+ urllib.urlencode(params) 42 | topicResult = json.loads(urllib.urlopen(topic_url).read()) 43 | # print topicResult 44 | topicResult["name"] = region["name"] 45 | filename = region["mid"].split("/")[-1] + ".json" 46 | with open(filename, 'w') as outfile: 47 | json.dump(topicResult, outfile) 48 | 49 | # update the cursor 50 | cursor = statisticalRegionsResult['cursor'] 51 | # this cursor can be used to resume the data download 52 | print "New cursor to process" 53 | print cursor 54 | if not cursor: 55 | break 56 | -------------------------------------------------------------------------------- /src/utils/bingDownload.py: -------------------------------------------------------------------------------- 1 | # Following the example from here 2 | # http://www.cs.columbia.edu/~gravano/cs6111/Proj1/bing-python.txt 3 | 4 | import urllib2 5 | import base64 6 | import urllib 7 | import json 8 | import os 9 | 10 | # from here we only want to keep the countries 11 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/allCountriesPost2010Filtered15-150.json") as dataFile: 12 | allCountriesData = json.loads(dataFile.read()) 13 | 14 | # from here get the property ids 15 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/featuresKept.json") as dataFile: 16 | featuresKept = json.loads(dataFile.read()) 17 | 18 | # better lowercase the property names 19 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/allStatisticalRegionProperties.json") as dataFile: 20 | featuresDesc = json.loads(dataFile.read()) 21 | 22 | 23 | 24 | propertyKeywords = [] 25 | for feature in featuresKept: 26 | # get the name for it and lower case it 27 | for feat in featuresDesc["result"]: 28 | if feat["id"] == feature: 29 | propertyKeywords.append(feat["name"].lower().encode('utf-8')) 30 | #propertyKeywords = ["fertility rate"] 31 | #print propertyKeywords 32 | 33 | countryNames = [] 34 | for country in allCountriesData: 35 | countryNames.append(country.encode('utf-8')) 36 | 37 | 
#countryNames = ["Czech Republic"] 38 | #print countryNames 39 | 40 | bingUrl = 'https://api.datamarket.azure.com/Bing/SearchWeb/v1/Web' # ?Query=%27gates%27&$top=10&$format=json' 41 | #Provide your account key here 42 | accountKey = 'XXX' 43 | accountKeyEnc = base64.b64encode(accountKey + ':' + accountKey) 44 | headers = {'Authorization': 'Basic ' + accountKeyEnc} 45 | 46 | pathName = "/cs/research/intelsys/home1/avlachos/FactChecking/Bing" 47 | 48 | if not os.path.exists(pathName): 49 | print "creating dir " + pathName 50 | os.mkdir(pathName) 51 | 52 | 53 | for keywords in propertyKeywords: 54 | print keywords 55 | # create a directory for the relation 56 | relPathName = pathName + "/" + keywords 57 | if not os.path.exists(relPathName): 58 | print "creating dir " + relPathName 59 | os.mkdir(relPathName) 60 | 61 | for name in countryNames: 62 | print name 63 | params = { 64 | #'format': "Json", 65 | 'Adult': "\'Strict\'", 66 | 'WebFileType' : "\'HTML\'", 67 | } 68 | # the query terms are done with urllib quote in order to get %20 instead of + (bing likes that instead) 69 | #print '\''.encode('utf-8') + name + " " + keywords + u' 2014\''.encode('utf-8') 70 | # one can add in the end this bit 'WebSearchOptions' : "DisableQueryAlterations" 71 | #&WebSearchOptions=%27DisableQueryAlterations%27 72 | # this bit can fetch the second page "$skip=100" 73 | url = bingUrl + "?Query=" + urllib.quote('\''.encode('utf-8') + name + " " + keywords + '\''.encode('utf-8')) + "&" + urllib.urlencode(params) + "&$format=json" 74 | print url 75 | req = urllib2.Request(url, headers = headers) 76 | response = urllib2.urlopen(req) 77 | content = json.loads(response.read()) 78 | # content contains the xml/json response from Bing. 79 | print content 80 | # save the json in a file named after the country and the property. 
81 | filename = relPathName + "/" + name + ".json" 82 | with open(filename, 'w') as outfile: 83 | json.dump(content["d"]["results"], outfile) 84 | 85 | -------------------------------------------------------------------------------- /src/utils/dataFiltering.py: -------------------------------------------------------------------------------- 1 | """ This script takes the json extracted from the freebase jsons 2 | and creates a matrix of countries x FreeBase relations. 3 | We probably want to filter out relations and countries that do not 4 | have a lot of values in the data""" 5 | 6 | import json 7 | from collections import Counter 8 | 9 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/allCountriesPost2010.json") as dataFile: 10 | data = json.loads(dataFile.read()) 11 | 12 | #print json.dumps(data, sort_keys=True, indent=4) 13 | print len(data) 14 | filteredFeatureCounts = Counter() 15 | filteredCountries = {} 16 | for country, numbers in data.items(): 17 | if len(numbers) >= 15: 18 | filteredCountries[country] = numbers 19 | for feature in numbers: 20 | filteredFeatureCounts[feature] += 1 21 | 22 | 23 | print filteredFeatureCounts 24 | 25 | filteredFeatureCountries = {} 26 | featuresKept = [] 27 | entriesFilled = 0 28 | for country, numbers in filteredCountries.items(): 29 | filteredFeatures = {} 30 | for feature, number in numbers.items(): 31 | if filteredFeatureCounts[feature] >= 150: 32 | if feature not in featuresKept: 33 | featuresKept.append(feature) 34 | filteredFeatures[feature] = number 35 | filteredFeatureCountries[country] = filteredFeatures 36 | entriesFilled += len(filteredFeatures) 37 | 38 | 39 | print len(filteredFeatureCountries) 40 | print len(featuresKept) 41 | print entriesFilled 42 | 43 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/allCountriesPost2010Filtered15-150.json", "w") as dataFile: 44 | json.dump(filteredFeatureCountries, dataFile) 45 | 46 | with 
open("/cs/research/intelsys/home1/avlachos/FactChecking/featuresKept.json", "w") as dataFile: 47 | json.dump(featuresKept, dataFile) 48 | 49 | 50 | #print len(featureCounts) 51 | #print data["Algeria"] 52 | #print data["Germany"]["/location/statistical_region/population"] 53 | #print data["Algeria"]["/location/statistical_region/population"] 54 | 55 | #print featureCounts.most_common(40) 56 | -------------------------------------------------------------------------------- /src/utils/dataSplits.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Here we keep functions for training/test splits ensuring the matrix still has enough in each row and column 3 | 4 | ''' 5 | 6 | import random 7 | import json 8 | import sys 9 | 10 | # load the FreeBase file 11 | with open(sys.argv[1]) as freebaseFile: 12 | region2property2value = json.loads(freebaseFile.read()) 13 | 14 | # we need to make it property2region2value 15 | property2region2value = {} 16 | for region, property2value in region2property2value.items(): 17 | for property, value in property2value.items(): 18 | if property not in property2region2value: 19 | property2region2value[property] = {} 20 | property2region2value[property][region] = value 21 | 22 | trainingPortion = 2.0/3.0 23 | 24 | # for each property, pick the trainingPortion, ensuring that all countries are represented 25 | trainMatrix = {} 26 | testMatrix = {} 27 | 28 | random.seed(3) 29 | 30 | for property, region2value in property2region2value.items(): 31 | trainMatrix[property] = {} 32 | testMatrix[property] = {} 33 | 34 | regions = region2value.keys() 35 | random.shuffle(regions) 36 | 37 | for idx, region in enumerate(regions): 38 | if float(idx+1)/float(len(regions)) 0 and len(val["property"][timeType]["values"]) > 0: 69 | thisValue = val["property"][valueType]["values"][0]["value"] 70 | thisTime = val["property"][timeType]["values"][0]["value"] 71 | else: 72 | continue 73 | 74 | try: 75 | # if the time is 
given as YYYY-MM or YYYY-MM-DD 76 | if thisTime.find("-") > -1: 77 | if len(thisTime.split("-")) ==2: 78 | thisYear, thisMonth = thisTime.split("-") 79 | thisTime = [int(thisYear), int(thisMonth)] 80 | elif len(thisTime.split("-")) ==3: 81 | # the day of the month is ignored 82 | thisYear, thisMonth, thisDay = thisTime.split("-") 83 | thisTime = [int(thisYear), int(thisMonth)] 84 | else: 85 | # or it is just YYYY 86 | thisTime = [int(thisTime), 0] 87 | # check that the numbers are not future projections! 88 | if (thisTime[0] < 2015) and ((mostRecentTime == [0,0]) or (thisTime[0] > mostRecentTime[0]) \ 89 | or (thisTime[0] == mostRecentTime[0] and thisTime[1] > mostRecentTime[1])): 90 | mostRecentTime = thisTime 91 | mostRecentValue = thisValue 92 | # if the time specified cannot be parsed, ignore it 93 | except ValueError: 94 | pass 95 | # keep the value only if the latest measurement dates from 2010 or later 96 | if mostRecentTime[0] >= 2010: 97 | print "Time=" + str(mostRecentTime) + " Value=" + str(mostRecentValue) 98 | numbers[prop] = mostRecentValue 99 | return name, numbers 100 | 101 | 102 | if __name__ == '__main__': 103 | import sys 104 | dirName = sys.argv[1] 105 | propertiesOfInterest = {} 106 | with open(dirName + "../propertiesOfInterest.json") as props: 107 | propertiesOfInterest = json.loads(props.read()) 108 | 109 | countries2numbers = {} 110 | totalCountries = 0 111 | totalNumbers = 0 112 | uniqueRelations = [] 113 | relation2counts = {} 114 | rels = [] 115 | for fl in os.listdir(dirName): 116 | print fl 117 | jsonFl = open(dirName + "/" + fl).read() 118 | name, numbers = extractNumericalValues(jsonFl, propertiesOfInterest) 119 | if name != None and len(numbers)>0: 120 | countries2numbers[name] = numbers 121 | totalCountries += 1 122 | for relation in numbers: 123 | totalNumbers += 1 124 | if relation not in uniqueRelations: 125 | uniqueRelations.append(relation) 126 | relation2counts[relation] = 0 127 | relation2counts[relation] += 1 128 | 129 | print countries2numbers
130 | # maybe return a json? Would be useful to be language independent 131 | print relation2counts 132 | print "countries with at least one 2010-2014 number: " + str(totalCountries) 133 | print "total post 2010 numbers: " + str(totalNumbers) 134 | print "Unique relations: " + str(len(uniqueRelations)) 135 | 136 | with open(dirName + "../allCountriesPost2010-2014.json", 'wb') as dc: 137 | json.dump(countries2numbers, dc) 138 | 139 | 140 | -------------------------------------------------------------------------------- /src/utils/obtainAliases.py: -------------------------------------------------------------------------------- 1 | ''' 2 | 3 | This script obtains the aliases from Freebase 4 | 5 | ''' 6 | 7 | import sys 8 | import json 9 | import urllib 10 | 11 | # load the file 12 | with open(sys.argv[1]) as freebaseFile: 13 | region2property2value = json.loads(freebaseFile.read()) 14 | 15 | 16 | apiKey = open("/cs/research/intelsys/home1/avlachos/freebaseApiKey").read() 17 | 18 | mqlreadUrl = 'https://www.googleapis.com/freebase/v1/mqlread' 19 | 20 | aliasQueryParams = { 21 | 'key': apiKey, 22 | } 23 | 24 | # the limit gives back only one result, which seems to be the most popular and the one we are interested in 25 | aliasQuery = { "/common/topic/alias": [], "type": "/location/statistical_region", "limit":1 } 26 | 27 | region2aliases = {} 28 | for regionName in region2property2value: 29 | print regionName.encode('utf-8') 30 | aliasQuery["name"] = regionName 31 | aliasQueryParams["query"] = json.dumps(aliasQuery) 32 | 33 | aliasUrl = mqlreadUrl + '?' + urllib.urlencode(aliasQueryParams) 34 | aliasJSON = json.loads(urllib.urlopen(aliasUrl).read()) 35 | region2aliases[regionName] = aliasJSON["result"]["/common/topic/alias"] 36 | 37 | with open(sys.argv[2], "wb") as out: 38 | json.dump(region2aliases, out) 39 | 40 | print len(region2aliases), " region names with aliases" 41 | 42 | 43 | --------------------------------------------------------------------------------
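The alias-resolution step appears verbatim in both src/main/factChecker.py and src/main/matrixFiltering.py. The following is a minimal sketch of the same logic as two reusable functions. It is written for Python 3, not the repository's Python 2, and the names `build_alias2region` and `resolve_region` are hypothetical, not part of the repository. As in the original scripts, an alias claimed by two different regions is dropped as ambiguous, and a lookup tries the exact string, then its lowercased form, and finally falls back to the location name itself.

```python
def build_alias2region(region2aliases):
    """Invert a region -> aliases dict into alias -> region, dropping ambiguous aliases."""
    alias2region = {}
    for region, aliases in region2aliases.items():
        # every region name is an alias of itself
        alias2region[region] = region
        for alias in aliases:
            if alias in alias2region and alias2region[alias] != region:
                # the alias is already used for another region: mark it as ambiguous
                alias2region[alias] = None
                alias2region[alias.lower()] = None
            else:
                # store the alias both as given and lowercased
                alias2region[alias] = region
                alias2region[alias.lower()] = region
    # drop the aliases marked as ambiguous
    return {alias: region for alias, region in alias2region.items() if region is not None}


def resolve_region(location, alias2region):
    """Map a location string found in text to a region name (exact match first, then lowercased)."""
    if location in alias2region:
        return alias2region[location]
    if location.lower() in alias2region:
        return alias2region[location.lower()]
    # unknown locations are kept unchanged
    return location


if __name__ == "__main__":
    alias2region = build_alias2region(
        {"United Kingdom": ["UK", "Britain"], "Georgia": ["GA"], "Gabon": ["GA"]}
    )
    print(resolve_region("BRITAIN", alias2region))  # United Kingdom
    print("GA" in alias2region)  # False: "GA" is ambiguous between Georgia and Gabon
```

Returning a new dict, rather than deleting entries while iterating, also avoids the `RuntimeError` that the original `del alias2region[alias]` loop would raise under Python 3, where `items()` returns a view instead of a list.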
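The filtering cascade in src/main/matrixFiltering.py (too few values, too much deviation, too few locations, constant patterns) can be expressed as one pure function over the pattern-to-location-to-values matrix. This is a minimal Python 3 sketch, assuming that aliases have already been merged and that `min_values` is at least 1. The name `filter_matrix` and its parameter names are hypothetical. The sketch uses `statistics.pstdev` in place of `numpy.std`; both compute the population standard deviation by default. Each location's values are divided by their largest absolute value before the deviation check, so `max_dev` is relative to the magnitude of the numbers, as in the original script.

```python
from statistics import mean, pstdev


def filter_matrix(pattern2loc2values, min_values, min_locations, max_dev, max_removed_frac):
    """Return {pattern: {location: mean value}} for the patterns that survive all filters."""
    kept = {}
    for pattern, loc2values in pattern2loc2values.items():
        # 1. drop locations with fewer than min_values values (assumes min_values >= 1)
        loc2values = {loc: vals for loc, vals in loc2values.items() if len(vals) >= min_values}
        if not loc2values:
            continue
        # 2. drop locations whose values, scaled by their largest absolute value, deviate too much
        stable = {}
        for loc, vals in loc2values.items():
            largest = max(abs(v) for v in vals)
            if largest > 0 and pstdev([v / largest for v in vals]) > max_dev:
                continue
            stable[loc] = vals
        # drop the whole pattern if too large a fraction of its locations was removed
        removed_frac = (len(loc2values) - len(stable)) / len(loc2values)
        if removed_frac > max_removed_frac:
            continue
        # 3. require enough locations, then average the values of each location
        if not stable or len(stable) < min_locations:
            continue
        averaged = {loc: mean(vals) for loc, vals in stable.items()}
        # 4. a pattern with the same value for every location carries no information
        if min(averaged.values()) == max(averaged.values()):
            continue
        kept[pattern] = averaged
    return kept
```

For example, with `min_values=2`, `min_locations=2`, `max_dev=0.1` and `max_removed_frac=0.5`, a pattern whose locations report `[10, 10]` and `[20, 22]` survives with the averages 10 and 21. A pattern whose two locations both average 7 is dropped as constant.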
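src/main/factChecker.py scores each claim by the absolute percentage error of the claimed value against the FreeBase value. This score is written to the "MAPE" column and left undefined when the FreeBase value is 0. src/main/fixedValuePredictor.py keeps whichever of 0, the median or the mean gives the lowest MAPE on the training regions. The sketch below reproduces both ideas in Python 3. `AbstractPredictor.MAPE` is not shown in this part of the repository, so the `mape` function here is an assumption: it averages the per-region errors and skips regions whose true value is 0. All function names are hypothetical.

```python
from statistics import mean, median


def abs_pct_error(claimed, kb_value):
    """Absolute percentage error of one claimed value, or None when the KB value is 0."""
    if kb_value == 0:
        return None
    return abs(claimed - kb_value) / abs(kb_value)


def mape(region2prediction, region2gold):
    """ASSUMPTION: mean absolute percentage error over regions with a non-zero gold value.

    Raises statistics.StatisticsError if every gold value is 0.
    """
    errors = [abs_pct_error(region2prediction[r], gold) for r, gold in region2gold.items()]
    errors = [e for e in errors if e is not None]
    return mean(errors)


def best_fixed_value(region2gold):
    """Pick 0, the median or the mean, whichever has the lowest MAPE on the given regions."""
    values = list(region2gold.values())
    best_value, best_score = None, float("inf")
    for candidate in (0, median(values), mean(values)):
        score = mape({region: candidate for region in region2gold}, region2gold)
        if score < best_score:
            best_value, best_score = candidate, score
    return best_value, best_score
```

For gold values `{"A": 1.0, "B": 2.0, "C": 100.0}`, predicting 0 everywhere scores 1.0. The median (2.0) scores about 0.66, and the mean (about 34.3) does far worse, so the median is kept. This illustrates why the median is a strong baseline for skewed statistics such as population or GDP.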