├── README.md
├── data
│   ├── aliases.json
│   ├── allCountriesPost2010-2014Filtered15-150.json
│   ├── featuresKept.json
│   ├── labeled_claims
│   │   ├── consumer_price_index_claims.xlsx
│   │   ├── cpi_inflation_rate_claims.xlsx
│   │   ├── diesel_price_liter_claims.xlsx
│   │   ├── fertility_rate_claims.xlsx
│   │   ├── gdp_growth_rate_claims.xlsx
│   │   ├── gdp_nominal_claims.xlsx
│   │   ├── gdp_nominal_per_capita_claims.xlsx
│   │   ├── gni_in_ppp_dollars_claims.xlsx
│   │   ├── gni_per_capita_in_ppp_dollars_claims.xlsx
│   │   ├── health_expenditure_as_percent_of_gdp_claims.xlsx
│   │   ├── internet_users_percent_population_claims.xlsx
│   │   ├── life_expectancy_claims.xlsx
│   │   ├── population_claims.xlsx
│   │   ├── population_growth_rate_claims.xlsx
│   │   ├── prevalence_of_undernourisment_claims.xlsx
│   │   └── renewable_freshwater_per_capita_claims.xlsx
│   ├── locationNames
│   ├── propertiesOfInterest.json
│   ├── test.json
│   ├── theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json
│   └── train.json
└── src
    ├── main
    │   ├── abstractPredictor.py
    │   ├── baselinePredictor.py
    │   ├── buildMatrix.py
    │   ├── factChecker.py
    │   ├── fixedValuePredictor.py
    │   └── matrixFiltering.py
    └── utils
        ├── FreeBaseDownload.py
        ├── bingDownload.py
        ├── dataFiltering.py
        ├── dataSplits.py
        ├── htmlPageDownload.py
        ├── numberExtraction.py
        └── obtainAliases.py
/README.md:
--------------------------------------------------------------------------------
1 | # Simple Numerical Fact-Checker
2 | Fact checker for simple claims about statistical properties
3 |
4 | This repository contains the code and data needed to reproduce the results of the paper:
5 |
6 | Identification and Verification of Simple Claims about Statistical Properties
7 |
8 | Andreas Vlachos and Sebastian Riedel, EMNLP 2015
9 |
10 | Preprocessing:
11 |
12 | 1. FreeBaseDownload.py: to get the JSONs for all statistical regions in Freebase.
13 | 2. numberExtraction.py: to extract the most recent numbers mentioned for each statistical region in triple form: region-property-value.
14 | 3. dataFiltering.py: to get the countries and properties with most values filled in (2 parameters to play with). From this we get the file data/allCountriesPost2010-2014Filtered15-150.json.
15 | 4. bingDownload.py: to run queries of the form "region + property" on Bing and get JSONs back with the links
16 | 5. htmlPageDownload.py: to get the HTML from the links.
17 | 6. obtainAliases.py: to get the common aliases for the statistical regions considered, which are needed for the matrix filtering later on. From this we get the file data/aliases.json.
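
As an illustration of how the alias file from step 6 can be used, here is a minimal sketch that turns a dictionary shaped like data/aliases.json (canonical name mapped to a list of aliases) into a reverse lookup. The function name `build_alias_lookup` and the inline sample are hypothetical and not part of the repository; the actual merging logic lives in matrixFiltering.py and may differ.

```python
import json


def build_alias_lookup(aliases):
    """Map every alias, and each canonical name itself, to its canonical name."""
    lookup = {}
    for canonical, alias_list in aliases.items():
        for alias in alias_list:
            # Some aliases carry stray whitespace, so strip it.
            # The first canonical name seen for an alias wins.
            lookup.setdefault(alias.strip(), canonical)
    # Canonical names always map to themselves, even if listed as an alias elsewhere.
    for canonical in aliases:
        lookup[canonical] = canonical
    return lookup


# Small inline sample with the same shape as data/aliases.json (assumed, for illustration).
sample = json.loads(
    '{"Canada": ["Dominion of Canada"], "Sri Lanka": ["Ceylon"], "Kuwait": []}'
)
lookup = build_alias_lookup(sample)
print(lookup["Ceylon"])
```

To use the real file, replace the inline sample with `json.load(open("data/aliases.json"))`.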
18 |
19 | Then we run the following bits of Java from HTML2Stanford:
20 | HTML2Text (needs the BoilerPipe jar)
21 | Text2Parsed2JSON (be careful to use the CollapsedCCprocessed dependencies; ideally use a more recent version of Stanford CoreNLP (>3.5) that outputs straight to JSON)
22 |
23 | From this we obtain a large number of HTML pages, converted to text and parsed with Stanford CoreNLP.
24 |
25 | And then:
26 |
27 | 1. buildMatrix.py: this processes the preprocessed HTML pages and builds a JSON file, which is a dictionary from pattern (string or lexicalized dependencies) to countries/locations and then to the values.
28 | 2. matrixFiltering.py: this takes the matrix from the previous step and filters its values and patterns to avoid those without enough entries or those whose entries have too much deviation so they cannot be sensibly averaged. Also uses the aliases to merge the values for different location names used in the experiments. From this we get the file data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json.
29 |
30 | 3. Split the data from Freebase (data/allCountriesPost2010-2014Filtered15-150.json) into training/dev (data/train.json) and test (data/test.json).
31 |
32 | 4. To reproduce the IE-style evaluation results:
33 |
34 | - informedGuess:
35 |
36 | ```python src/main/fixedValuePredictor.py data/train.json data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json data/test.json out/informedGuess```
37 |
38 | - unadjustedMAPE:
39 |
40 | ```python src/main/baselinePredictor.py data/train.json data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json data/test.json out/unadjustedMAPE FALSE```
41 |
42 | - adjustedMAPE:
43 |
44 | ```python src/main/baselinePredictor.py data/train.json data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json data/test.json out/adjustedMAPE TRUE```
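
For readers unfamiliar with the metric behind the last two settings, the sketch below computes the standard mean absolute percentage error (MAPE) between the values a pattern yields and the knowledge-base values, over the regions they share. It is only an illustration: the function name `mape` and the toy numbers are made up, and the adjustment applied when the last argument is `TRUE` is not reproduced here (see baselinePredictor.py and the paper for that).

```python
def mape(predicted, actual):
    """Mean absolute percentage error over the regions present in both dicts."""
    # Regions with a zero reference value are skipped to avoid division by zero.
    shared = [r for r in predicted if r in actual and actual[r] != 0]
    if not shared:
        return float("inf")
    total = sum(abs(actual[r] - predicted[r]) / abs(actual[r]) for r in shared)
    return total / len(shared)


# Toy values (not taken from the repository data).
kb_population = {"Canada": 35000000.0, "Kuwait": 3000000.0}
pattern_values = {"Canada": 38500000.0, "Kuwait": 2700000.0, "Oman": 3600000.0}

score = mape(pattern_values, kb_population)
print(round(score, 3))
```

A lower score means the pattern's values track the property more closely; "Oman" is ignored because the toy knowledge base has no value for it.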
45 |
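The filtering idea described for matrixFiltering.py in step 2 can be sketched as follows. This is a hedged approximation rather than the repository's code: it assumes each pattern maps each location to a list of numeric values, uses the relative standard deviation as the deviation test, and the names `filter_matrix`, `min_locations` and `max_rel_dev` and their defaults are hypothetical. Alias merging is left out.

```python
from statistics import mean, pstdev


def filter_matrix(matrix, min_locations=2, max_rel_dev=0.5):
    """Average the values per location and drop unreliable entries and sparse patterns.

    matrix: pattern -> location -> list of values (assumed layout).
    Returns: pattern -> location -> averaged value.
    """
    filtered = {}
    for pattern, locations in matrix.items():
        kept = {}
        for location, values in locations.items():
            if not values:
                continue
            avg = mean(values)
            # Skip locations whose values disagree too much to be averaged sensibly.
            if avg == 0 or pstdev(values) / abs(avg) > max_rel_dev:
                continue
            kept[location] = avg
        # Skip patterns that are left with too few locations.
        if len(kept) >= min_locations:
            filtered[pattern] = kept
    return filtered


# Toy matrix (not taken from the repository data).
toy = {
    "population of LOCATION is NUMBER": {
        "Canada": [35.0, 35.2],
        "Kuwait": [3.0, 3.2],
        "Oman": [1.0, 50.0],
    },
    "LOCATION has NUMBER": {"Canada": [35.0]},
}
result = filter_matrix(toy)
print(sorted(result["population of LOCATION is NUMBER"]))
```

In the toy input, "Oman" is dropped because its two values disagree widely, and the second pattern is dropped because it covers only one location.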
46 | To run the fact-checker on the HTML pages obtained from the web:
47 |
48 | First create a directory for the output, e.g.:
49 |
50 | ```mkdir out```
51 |
52 | Then run
53 |
54 | ```python src/main/factChecker.py data/allCountriesPost2010-2014Filtered15-150.json data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json population 0.03125 data/htmlPages2textPARSEDALL data/locationNames data/aliases.json out/population.tsv```
55 |
56 | The directory data/htmlPages2textPARSEDALL is not on GitHub due to its size (1.6GB compressed), but feel free to ask me for it.
57 |
58 | This is run for each of the 16 properties independently. The parameter for adjusted MAPE used in the paper was set according to the IE experiments. Here is the setting for each property:
59 |
60 | - gni_per_capita_in_ppp_dollars: 8
61 | - gdp_nominal: 0.03125
62 | - internet_users_percent_population: 2
63 | - cpi_inflation_rate: 2
64 | - health_expenditure_as_percent_of_gdp: 2
65 | - gdp_growth_rate: 1
66 | - fertility_rate: 0.5
67 | - consumer_price_index: 1
68 | - prevalence_of_undernourisment: 32
69 | - gni_in_ppp_dollars: 16
70 | - population_growth_rate: 0.0078125
71 | - diesel_price_liter: 2
72 | - life_expectancy: 1
73 | - population: 0.03125
74 | - gdp_nominal_per_capita: 16
75 | - renewable_freshwater_per_capita: 8
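
Since the same command is repeated for every property with only the property name, parameter and output file changing, a small helper can generate all 16 command lines from the settings above. This is a convenience sketch, not a script shipped with the repository; `SETTINGS` and `fact_checker_command` are hypothetical names, and the paths are copied from the example command in this README.

```python
# Parameter per property, copied from the list above (kept as strings so they
# appear on the command line exactly as written).
SETTINGS = {
    "gni_per_capita_in_ppp_dollars": "8",
    "gdp_nominal": "0.03125",
    "internet_users_percent_population": "2",
    "cpi_inflation_rate": "2",
    "health_expenditure_as_percent_of_gdp": "2",
    "gdp_growth_rate": "1",
    "fertility_rate": "0.5",
    "consumer_price_index": "1",
    "prevalence_of_undernourisment": "32",
    "gni_in_ppp_dollars": "16",
    "population_growth_rate": "0.0078125",
    "diesel_price_liter": "2",
    "life_expectancy": "1",
    "population": "0.03125",
    "gdp_nominal_per_capita": "16",
    "renewable_freshwater_per_capita": "8",
}


def fact_checker_command(prop, param):
    """Build the factChecker.py argument list for one property."""
    return [
        "python",
        "src/main/factChecker.py",
        "data/allCountriesPost2010-2014Filtered15-150.json",
        "data/theMatrixExtend120TokenFiltered_2_2_0.1_0.5_fixed2.json",
        prop,
        param,
        "data/htmlPages2textPARSEDALL",
        "data/locationNames",
        "data/aliases.json",
        "out/" + prop + ".tsv",
    ]


for prop, param in SETTINGS.items():
    print(" ".join(fact_checker_command(prop, param)))
```

The printed lines can be pasted into a shell, or each argument list can be passed to `subprocess.run` once the `out` directory and the parsed HTML pages are in place.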
76 |
77 | The output for each relation is a .tsv file which can be loaded in Excel. We did this and labeled the claims. The files from which the results in Table 2 are obtained are in data/labeled_claims.
78 |
--------------------------------------------------------------------------------
/data/aliases.json:
--------------------------------------------------------------------------------
1 | {"Canada": ["Canuckistan", "Dominion of Canada"], "Turkmenistan": [], "Montenegro": ["Montenegr"], "Lithuania": ["Litauen", "Lietuva", "Republic of Lithuania", "Lietuvos Respublika"], "Cambodia": [], "Ethiopia": ["Federal Democratic Republic of Ethiopia"], "Aruba": [], "Sri Lanka": ["Ceylon", "Democratic Socialist Republic of Sri Lanka"], "Swaziland": [], "Argentina": ["Agrentina ", "Argentine Republic", "The Argentine"], "Bolivia": ["The Plurinational State of Bolivia"], "Cameroon": ["Republic of Cameroon", "R\u00e9publique du Cameroun", "Africa in miniature"], "Burkina Faso": [], "Bahrain": ["Kingdom of Bahrain"], "Saudi Arabia": ["Kingdom of Saudi Arabia"], "Cape Verde": ["Republic of Cape Verde"], "Slovenia": ["Slovania", "Republic of Slovenia", "Repubblica di Slovenia"], "Guatemala": ["Guatemala."], "Bosnia and Herzegovina": ["Bosnia", "Bosnia y Herzegovina", "Bosnia-Herzegovina", "BiH"], "Kuwait": [], "Dominica": ["Commonwealth of Dominica"], "Australia": ["Down Under"], "Liberia": [], "Maldives": ["Republic of Maldives", "Maldive Islands", "Republic of the Maldives"], "Oman": ["Sultanate of Oman", "\u0639\u064f\u0645\u0627\u0646", "\u0633\u0644\u0637\u0646\u0629 \u0639\u064f\u0645\u0627\u0646\u200e"], "C\u00f4te d\u2019Ivoire": ["Cote d'Ivoire", "Ivory Coast", "Republic of C\u00f4te d'Ivoire", "C\u221a\u00a5te d\u201a\u00c4\u00f4Ivoire"], "Gabon": ["The Gabonese Republic"], "New Zealand": ["Aotearoa", "God's own", "NZ", "Land of the long white cloud"], "Yemen": ["El-\u1e24odeidah", "Fourth Governorate", "Baladiyat `Adan", "Qa\u1e11\u0101\u2019 Al \u1e28udaydah", "Iv", "Al Mu\u1e29\u0101faz\u0327Ah Ar R\u0101bi\u2018Ah", "Baladiyatadan", "Hodeida", "Muhafazat Shabwah", "Governorate Al Jawf", "Al Hudaydah", "Elhodeidah", "Al Muhafazah Ar Rabi`Ah", "Mu\u1e29\u0101faz\u0327At Al Jawf", "Muhafazat Al Jawf", "Governorate Number Four", "Hodeidah", "Almuhafazaharrabiah", "Al \u1e28udaydah", "Shabwah", "`Adan", "Aden", "\u2018Adan", "Al Jawf", "Balad\u012byat 
\u2018Adan", "Qada' Al Hudaydah", "El-Hodeidah", "Mu\u1e29\u0101faz\u0327At Shabwah", "Adan", "Qadaalhudaydah"], "Pakistan": ["Federation of Pakistan", "Islamic Republic of Pakistan"], "Albania": ["Republic of Albania"], "Samoa": ["Western Samoa", "Independent State of Samoa", "S\u0101moa", "Malo Sa'oloto Tuto'atasi o S\u0101moa"], "Macau": ["Macao", "Macao Special Administrative Region of the People's Republic of China", "Macao SAR"], "United Arab Emirates": ["UAE", "Emirates"], "Uruguay": ["Eastern Republic of the Uruguay", "Oriental Republic of Uruguay", "Republic East of the Uruguay", "Eastern Republic of Uruguay"], "India": ["Bharat", "Hindustan", "The Republic of India", "Republic of India", "Bharat Ganrajya"], "Azerbaijan": ["Republic of Azerbaijan", "Az\u0259rbaycan Respublikas\u0131"], "Lesotho": [], "Saint Vincent and the Grenadines": ["SVG", "St. Vincent and the Grenadines"], "Kenya": ["Republic of Kenya"], "South Korea": ["Republic of Korea", "ROK", "Daehan Minguk", "Korea"], "Tajikistan": [], "Afghanistan": ["Islamic Republic of Afghanistan", "Afganistan"], "Bangladesh": [], "Eritrea": ["State of Eritrea"], "Solomon Islands": [], "Saint Lucia": ["St. Lucia"], "Cyprus": [], "Mongolia": [], "France": ["R\u00e9publique fran\u00e7aise", "French Republic", "l'Hexagone"], "Rwanda": [], "Slovakia": ["Slovensko", "The Slovak Republic"], "Vanuatu": [], "Norway": ["Norge", "Kingdom of Norway"], "Malawi": [], "Benin": [], "Federated States of Micronesia": [], "Singapore": ["Singapura", "Republic of Singapore"], "United States of America": ["America", "U.S.", "USA", "United States", "US", "U.S.A.", "U.S. of A.", "Estats Units d'Am\u00e8rica", "The States"], "Saint Kitts and Nevis": ["St. 
Kitts and Nevis"], "Togo": [], "Armenia": ["Republic of Armenia"], "Timor-Leste": ["Timor Timur", "East Timor"], "Dominican Republic": ["Rep\u00fablica Dominicana"], "Ukraine": ["Ukraine Region"], "Ghana": ["Republic of Ghana"], "Tonga": ["Pule\u02bbanga Fakatu\u02bbi \u02bbo Tonga", "Kingdom of Tonga"], "Finland": ["Suomi", "Republic of Finland"], "Libya": [], "Indonesia": ["Republic of Indonesia", "The Republic of Indonesia"], "Central African Republic": [], "Mauritius": ["Republic of Mauritius"], "Sweden": ["Sweden, Maine"], "Vietnam": ["Socialist Republic of Vietnam", "Republic of Vietnam", "Annam", "Viet nam"], "Mali": [], "Russia": ["\u0420\u043e\u0441\u0441\u0438\u044f", "Russian Federation", "\u041e\u0440\u0434\u0430"], "Bulgaria": ["Republic of Bulgaria"], "Romania": ["Rom\u00e2nia"], "Angola": [], "Portugal": ["Portuguese Republic"], "South Africa": ["Republiek van Suid-Afrika", "Republic of South Africa"], "Fiji": [], "Qatar": ["State of Qatar", "Dawlat Qa\u1e6dar"], "Malaysia": [], "Austria": ["\u00d6sterreich", "Autriche", "Oesterreich", "Republic of Austria"], "Mozambique": ["Republica de Mocambique", "Mocambique", "Republic of Mozambique"], "Uganda": ["The Republic of Uganda", "Republic of Uganda"], "Japan": ["Nihon", "Nippon", "\u042f\u043f\u043e\u043d\u0438\u044f", "Nippon-koku", "Nihon-koku", "State of Japan", "Land of the Rising Sun", "Dai-Nippon", "NTSC J", "Japan", "JAP", "JPN"], "Niger": ["Republic of Niger"], "Brazil": ["Brasil", "Rep\u00fablica Federativa do Brasil", "Brazilian ", "Federative Republic of Brazil"], "Guinea": [], "Guyana": ["El Dorado"], "Costa Rica": ["Republic of Costa Rica"], "Republic of Ireland": ["\u00c9ire", "the twenty-six counties", "the 26 counties", "the Free State", "Irish Republic"], "Bahamas": ["Commonwealth of The Bahamas"], "Nigeria": ["Federal Republic of Nigeria"], "Ecuador": ["Republic of Ecuador", "Rep\u00fablica del Ecuador"], "Czech Republic": ["Bohemia"], "Brunei": ["Brunei Darussalam", "Nation of 
Brunei, the Abode of Peace"], "Belarus": ["Bellarussiya", "Republic of Belarus", "Bielaru\u015b", "Respublika Belarus\u2019"], "Iran": ["Islamic Republic of Iran", "Persia"], "Algeria": ["People's Democratic Republic of Algeria", "The People's Democratic Republic of Algeria"], "El Salvador": [], "Chile": ["Republic of Chile"], "Puerto Rico": ["Borinquen", "Commonwealth of Puerto Rico"], "Belgium": ["Belgi\u00eb", "Belgique", "Kingdom of Belgium"], "Thailand": ["Kingdom of Thailand", "Siam"], "Haiti": ["Republic of Haiti", "Ha\u00efti"], "Belize": [], "Hong Kong": ["Hong Kong Special Administrative Region", "Hongkong Island", "Hong Kon", "Hongkong", "Hong Kong Special Administrative Region of the People's Republic of China"], "Sierra Leone": [], "Georgia": ["Republic of Georgia"], "Gambia": [], "Philippines": ["Pearl of the Orient Seas", "philippines", "The Philippines", "Republic of the Philippines", "Republika ng Pilipinas", "\u30d5\u30a3\u30ea\u30d4\u30f3\u5171\u548c\u56fd", "Philippiness", "\ud544\ub9ac\ud540 \uacf5\ud654\uad6d"], "Moldova": ["Republic of Moldova", "Moldavia"], "Morocco": ["Kingdom of Morocco", "The Western Kingdom", "The West", "Regne del Marroc"], "Namibia": [], "Guinea-Bissau": [], "Kiribati": ["Republic of Kiribati", "Gilbert Islands"], "Switzerland": ["Svizzera", "Svizra", "Schweiz", "Swiss Confederation", "La Suisse", "Helvetia"], "Seychelles": ["Republic of Seychelles", "Repiblik Sesel", "R\u00e9publique des Seychelles"], "Chad": ["T\u0161\u0101d", "Tchad", "Republic of Chad", "R\u00e9publique du Tchad", "\u01e6umh\u016briyyat T\u0161\u0101d"], "Estonia": ["Estland", "Republic of Estonia"], "Kosovo": ["Kosovo i Metohija"], "Uzbekistan": ["Republic of Uzbekistan"], "Djibouti": ["Republic of Djibouti"], "Antigua and Barbuda": [], "Spain": ["Espa\u00f1a", "Kingdom of Spain", "Furija"], "Colombia": ["\u30b3\u30ed\u30f3\u30d3\u30a2\u5171\u548c\u56fd", "Republic of Colombia"], "Burundi": [], "Nicaragua": ["Rep\u00fablica de Nicaragua", 
"Republic of Nicaragua"], "Barbados": [], "Madagascar": ["Republic of Madagascar", "Malagasy Republic"], "Palau": ["Palau Islands"], "Bhutan": [], "Sudan": ["Republic of the Sudan", "North Sudan", "Jumh\u016br\u012byat as-S\u016bd\u0101n"], "Laos": ["Lao People's Democratic Republic", "\ub77c\uc624\uc2a4"], "Democratic Republic of the Congo": ["Zaire", "Democratic Republic of Congo", "DR Congo", "DROC", "Congo Kinshasa", "RDC", "Congo-Kinshasa"], "Netherlands": ["Holland", "Nederland", "The Netherlands", "Nederlands", "Hollanti"], "Suriname": ["Republic of Suriname", "Surinam"], "S\u00e3o Tom\u00e9 and Pr\u00edncipe": ["Sao Tome and Principe", "Democratic Republic of S\u00e3o Tom\u00e9 and Pr\u00edncipe"], "Venezuela": ["Bolivarian Republic of Venezuela"], "Israel": ["State of Israel"], "Iceland": ["\u00cdsland", "Republic of Iceland"], "Zambia": [], "Senegal": ["R\u00e9publique du S\u00e9n\u00e9gal", "Republic of Senegal"], "Papua New Guinea": [], "Zimbabwe": ["The Republic of Zimbabwe"], "Germany": ["Federal Republic of Germany", "Deutschland", "Bundesrepublik Deutschland", "BRD"], "Denmark": ["D\u00e4nemark", "\u4e39\u9ea6", "Kingdom of Denmark", "Kongeriget Danmark"], "Kazakhstan": [], "Tanzania": ["United Republic of Tanzania", "Jamhuri ya Muungano wa Tanzania"], "Mauritania": ["Islamic Republic of Mauritania"], "Kyrgyzstan": ["Kirgisistan"], "Iraq": [], "North Korea": ["Democratic People's Republic of Korea", "DPRK"], "Trinidad and Tobago": ["Trinidad & Tobago"], "Latvia": ["Lettland", "Republic of Latvia"], "Hungary": ["Magyarorsz\u00e1g", "Hungary, Europe"], "Croatia": ["Croacia", "Croatia/Hrvatska", "Croatie", "Croazia", "Crotaia", "Cro\u00e1cia", "Hirvatistan", "Hravatska", "Hrvatska", "ISO 3166-1:HR", "Kroatia", "Kroatien", "Republika Hrvatska", "Republic of Croatia"], "Syria": ["Syrian Arab Republic"], "Nepal": ["Federal Democratic Republic of Nepal", "Democratic Republic of Nepal"], "Honduras": ["Republic of Honduras", "Spanish Honduras", 
"Rep\u00fablica de Honduras"], "Myanmar": ["Republic of the Union of Myanmar", "Myanmar (Burma)", "\u30df\u30e3\u30f3\u30de\u30fc\u9023\u90a6\u5171\u548c\u56fd", "Burma"], "Equatorial Guinea": [], "Tunisia": [], "Republic of Macedonia": ["Macedonia", "Makedonija", "Republika Makedonija", "\u0420\u0435\u043f\u0443\u0431\u043b\u0438\u043a\u0430 \u041c\u0430\u043a\u0435\u0434\u043e\u043d\u0438\u0458\u0430", "Macedonia (FYROM)"], "Serbia": ["Republic of Serbia", "\u0420\u0435\u043f\u0443\u0431\u043b\u0438\u043a\u0430 \u0421\u0440\u0431\u0438\u0458\u0430", "Republika Srbija", "Srbija", "Serbijos Respublika"], "Botswana": [], "United Kingdom": ["Britain", "Great Britain", "UK", "The United Kingdom of Great Britain and Northern Ireland", "United Kingdom of Great Britain and Ireland", "U.K.", "United Kingdom of Great Britain and Northern Ireland", "United Kingdom of Great Britain", "GB", "GBR"], "Congo": ["Republic of Congo", "Congo Brazzaville", "Congo-Brazzaville"], "Greece": ["Hellenic Republic", "Hellas"], "Paraguay": ["Republic of Paraguay", "Coraz\u00f3n de Am\u00e9rica", "Heart of America", "Corazon de America"], "Earth": ["Gaia", "The World", "Terra", "\u05d0\u05e8\u05e5", "\u05db\u05d3\u05d5\u05e8 \u05d4\u05d0\u05e8\u05e5", "world", "\u0393\u03b7", "globe", "the Globe", "Planet Earth", "The Earth", "The Blue Planet", "Tellus"], "Comoros": ["Union of the Comoros", "The Comoros"]}
--------------------------------------------------------------------------------
/data/featuresKept.json:
--------------------------------------------------------------------------------
1 | ["/location/statistical_region/gni_per_capita_in_ppp_dollars", "/location/statistical_region/gdp_nominal", "/location/statistical_region/internet_users_percent_population", "/location/statistical_region/cpi_inflation_rate", "/location/statistical_region/health_expenditure_as_percent_of_gdp", "/location/statistical_region/gdp_growth_rate", "/location/statistical_region/fertility_rate", "/location/statistical_region/consumer_price_index", "/location/statistical_region/prevalence_of_undernourisment", "/location/statistical_region/gni_in_ppp_dollars", "/location/statistical_region/population_growth_rate", "/location/statistical_region/diesel_price_liter", "/location/statistical_region/life_expectancy", "/location/statistical_region/population", "/location/statistical_region/gdp_nominal_per_capita", "/location/statistical_region/renewable_freshwater_per_capita"]
--------------------------------------------------------------------------------
/data/labeled_claims/consumer_price_index_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/consumer_price_index_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/cpi_inflation_rate_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/cpi_inflation_rate_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/diesel_price_liter_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/diesel_price_liter_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/fertility_rate_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/fertility_rate_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gdp_growth_rate_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gdp_growth_rate_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gdp_nominal_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gdp_nominal_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gdp_nominal_per_capita_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gdp_nominal_per_capita_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gni_in_ppp_dollars_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gni_in_ppp_dollars_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/gni_per_capita_in_ppp_dollars_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/gni_per_capita_in_ppp_dollars_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/health_expenditure_as_percent_of_gdp_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/health_expenditure_as_percent_of_gdp_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/internet_users_percent_population_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/internet_users_percent_population_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/life_expectancy_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/life_expectancy_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/population_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/population_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/population_growth_rate_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/population_growth_rate_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/prevalence_of_undernourisment_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/prevalence_of_undernourisment_claims.xlsx
--------------------------------------------------------------------------------
/data/labeled_claims/renewable_freshwater_per_capita_claims.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/uclnlp/simpleNumericalFactChecker/989413efe4553c434c1e838bd8557956ecc5bdcc/data/labeled_claims/renewable_freshwater_per_capita_claims.xlsx
--------------------------------------------------------------------------------
/data/locationNames:
--------------------------------------------------------------------------------
1 | Saint Vincent and the Grenadines
2 | St. Vincent and the Grenadines
3 | São Tomé and Príncipe
4 | Sao Tome and Principe
5 | Democratic Republic of São Tomé and Príncipe
6 | Saint Kitts and Nevis
7 | St. Kitts and Nevis
8 | Antigua and Barbuda
9 |
--------------------------------------------------------------------------------
/data/propertiesOfInterest.json:
--------------------------------------------------------------------------------
1 | {"/location/statistical_region/agriculture_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Agriculture (% of GDP)"}, "/location/statistical_region/greenhouse_gas_emission_intensity": {"expectedType": "/measurement_unit/dated_metric_tons_per_million_ppp_dollars", "name": "Greenhouse gas emission intensity"}, "/location/statistical_region/internet_users": {"expectedType": "/measurement_unit/dated_integer", "name": "Internet users"}, "/location/statistical_region/market_cap_of_listed_companies_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Market capitalization of listed companies (% of GDP)"}, "/location/statistical_region/diesel_price_liter": {"expectedType": "/measurement_unit/dated_money_value", "name": "Diesel price (per liter)"}, "/location/statistical_region/foreign_direct_investment_net_inflows": {"expectedType": "/measurement_unit/dated_money_value", "name": "Foreign direct investment, net inflows"}, "/location/statistical_region/gdp_deflator_change": {"expectedType": "/measurement_unit/dated_percentage", "name": "GDP deflator change"}, "/location/statistical_region/greenhouse_gas_emission": {"expectedType": "/measurement_unit/dated_metric_ton", "name": "Greenhouse gas emission"}, "/location/statistical_region/gdp_nominal": {"expectedType": "/measurement_unit/dated_money_value", "name": "GDP (nominal)"}, "/location/statistical_region/smoking_prevalence_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Smoking prevalence rate"}, "/location/statistical_region/lending_interest_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Lending interest rate"}, "/location/statistical_region/deposit_interest_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Deposit interest rate"}, "/location/statistical_region/co2_emissions_mobile": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - mobile"}, 
"/location/statistical_region/part_time_employment_percent": {"expectedType": "/measurement_unit/dated_percentage", "name": "Part time employment"}, "/location/statistical_region/prevalence_of_undernourisment": {"expectedType": "/measurement_unit/dated_percentage", "name": "Prevalence of undernourisment"}, "/location/statistical_region/consumer_price_index": {"expectedType": "/measurement_unit/dated_index_value", "name": "Consumer price index"}, "/location/statistical_region/co2_emissions_residential": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - residential"}, "/location/statistical_region/brain_drain_percent": {"expectedType": "/measurement_unit/dated_percentage", "name": "Emigration of university educated workforce"}, "/location/statistical_region/life_expectancy": {"expectedType": "/measurement_unit/dated_float", "name": "Life expectancy"}, "/location/statistical_region/gross_savings_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Gross Savings (% of GDP)"}, "/location/statistical_region/size_of_armed_forces": {"expectedType": "/measurement_unit/dated_integer", "name": "Size of armed forces"}, "/location/statistical_region/co2_emissions_industrial": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - industrial"}, "/location/statistical_region/natural_gas_production": {"expectedType": "/location/natural_gas_production", "name": "Natural gas production"}, "/location/statistical_region/internet_users_percent_population": {"expectedType": "/measurement_unit/dated_percentage", "name": "Internet users as percentage of population"}, "/location/statistical_region/cpi_inflation_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Inflation rate"}, "/location/statistical_region/rent50_2": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 2 br"}, "/location/statistical_region/rent50_3": {"expectedType": "/measurement_unit/dated_money_value", 
"name": "50th percentile rent - 3 br"}, "/location/statistical_region/rent50_0": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 0 br"}, "/location/statistical_region/rent50_1": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 1 br"}, "/location/statistical_region/global_competitiveness_index": {"expectedType": "/measurement_unit/dated_float", "name": "Global Competitiveness Index"}, "/location/statistical_region/energy_use_per_capita": {"expectedType": "/measurement_unit/dated_kgoe", "name": "Energy use per capita"}, "/location/statistical_region/rent50_4": {"expectedType": "/measurement_unit/dated_money_value", "name": "50th percentile rent - 4 br"}, "/location/statistical_region/oil_production": {"expectedType": "/location/oil_production", "name": "Oil production"}, "/location/statistical_region/greenhouse_gas_emissions_per_capita": {"expectedType": "/measurement_unit/dated_metric_ton", "name": "Greenhouse gas emissions per capita"}, "/location/statistical_region/health_expenditure_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Health expenditure (% of GDP)"}, "/location/statistical_region/time_required_to_start_a_business": {"expectedType": "/measurement_unit/dated_days", "name": "Time required to start a business"}, "/location/statistical_region/gdp_nominal_per_capita": {"expectedType": "/measurement_unit/dated_money_value", "name": "GDP (nominal per capita)"}, "/location/statistical_region/child_labor_percent": {"expectedType": "/measurement_unit/dated_percentage", "name": "Child labor (% of children ages 7-14)"}, "/location/statistical_region/gdp_growth_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "GDP growth rate"}, "/location/statistical_region/literacy_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Literacy rate"}, "/location/statistical_region/fertility_rate": {"expectedType": 
"/measurement_unit/dated_float", "name": "Fertility rate"}, "/location/statistical_region/tax_revenue_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Tax revenue (% of GDP)"}, "/location/statistical_region/debt_service_as_percent_of_trade_volume": {"expectedType": "/measurement_unit/dated_percentage", "name": "Debt service (% of trade volume)"}, "/location/statistical_region/net_migration": {"expectedType": "/measurement_unit/dated_integer", "name": "Net migration"}, "/location/statistical_region/electricity_production": {"expectedType": "/location/electricity_production", "name": "Electricity production"}, "/location/statistical_region/automobiles_per_capita": {"expectedType": "/measurement_unit/dated_float", "name": "Automobiles per capita"}, "/location/statistical_region/electricity_consumption_per_capita": {"expectedType": "/measurement_unit/dated_kilowatt_hour", "name": "Electricity consumption per capita"}, "/location/statistical_region/gni_per_capita_in_ppp_dollars": {"expectedType": "/measurement_unit/dated_money_value", "name": "GNI per capita in PPP dollars"}, "/location/statistical_region/co2_emissions_total": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - total"}, "/location/statistical_region/poverty_rate_2dollars_per_day": {"expectedType": "/measurement_unit/dated_float", "name": "Poverty rate ($2 per day)"}, "/location/statistical_region/military_expenditure_percent_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Military expenditure as percentage of GDP"}, "/location/statistical_region/arithmetic_population_density": {"expectedType": "/measurement_unit/dated_float", "name": "Arithmetic population density (per km\u00b2)"}, "/location/statistical_region/services_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Services (% of GDP)"}, "/location/statistical_region/trade_balance_as_percent_of_gdp": {"expectedType": 
"/measurement_unit/dated_percentage", "name": "Trade balance (% of GDP)"}, "/location/statistical_region/gas_price_liter": {"expectedType": "/measurement_unit/dated_money_value", "name": "Gas price (per liter)"}, "/location/statistical_region/merchandise_trade_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Merchandise trade (% of GDP)"}, "/location/statistical_region/hiv_prevalence_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "HIV prevalence rate"}, "/location/statistical_region/labor_participation_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Labor participation rate"}, "/location/statistical_region/government_debt_percent_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Government debt as percent of GDP"}, "/location/statistical_region/population_growth_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Population growth rate"}, "/location/statistical_region/minimum_wage": {"expectedType": "/measurement_unit/recurring_money_value", "name": "Minimum wage"}, "/location/statistical_region/industry_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Industry (% of GDP)"}, "/location/statistical_region/exports_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Exports as percent of GDP"}, "/location/statistical_region/net_workers_remittances": {"expectedType": "/measurement_unit/dated_money_value", "name": "Net workers' remittances"}, "/location/statistical_region/external_debt_stock": {"expectedType": "/measurement_unit/dated_money_value", "name": "External debt stock"}, "/location/statistical_region/unemployment_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Unemployment Rate"}, "/location/statistical_region/broadband_penetration_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Broadband penetration rate"}, 
"/location/statistical_region/official_development_assistance": {"expectedType": "/measurement_unit/dated_money_value", "name": "Official development assistance"}, "/location/statistical_region/co2_emissions_per_capita": {"expectedType": "/measurement_unit/dated_metric_ton", "name": "CO2 emissions per capita"}, "/location/statistical_region/gross_capital_formation_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Gross capital formation (% of GDP)"}, "/location/statistical_region/gdp_real": {"expectedType": "/measurement_unit/adjusted_money_value", "name": "GDP real"}, "/location/statistical_region/population": {"expectedType": "/measurement_unit/dated_integer", "name": "Population"}, "/location/statistical_region/co2_emissions_commercial": {"expectedType": "/location/co2_emission", "name": "CO2 emissions - commercial"}, "/location/statistical_region/household_consumption_expenditure_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Household consumption expenditures (% of GDP)"}, "/location/statistical_region/imports_as_percent_of_gdp": {"expectedType": "/measurement_unit/dated_percentage", "name": "Imports as percent of GDP"}, "/location/statistical_region/renewable_freshwater_per_capita": {"expectedType": "/measurement_unit/dated_cubic_meters", "name": "Renewable freshwater resources per capita"}, "/location/statistical_region/high_tech_as_percent_of_manufactured_exports": {"expectedType": "/measurement_unit/dated_percentage", "name": "High-tech as % of manufactured exports"}, "/location/statistical_region/long_term_unemployment_rate": {"expectedType": "/measurement_unit/dated_percentage", "name": "Long term unemployment rate"}, "/location/statistical_region/gni_in_ppp_dollars": {"expectedType": "/measurement_unit/dated_money_value", "name": "GNI in PPP dollars"}}
--------------------------------------------------------------------------------
/data/test.json:
--------------------------------------------------------------------------------
1 | {"/location/statistical_region/size_of_armed_forces": {"Qatar": 11800.0, "Turkmenistan": 22000.0, "Eritrea": 201750.0, "Sudan": 264300.0, "Lithuania": 23350.0, "Bahamas": 850.0, "Rwanda": 35000.0, "Bolivia": 83200.0, "Venezuela": 115000.0, "Bangladesh": 220950.0, "Bahrain": 19460.0, "Brunei": 9250.0, "Israel": 184550.0, "Australia": 57050.0, "Iran": 563000.0, "Algeria": 317200.0, "Singapore": 147600.0, "Cameroon": 23100.0, "Japan": 260086.0, "United States of America": 1520100.0, "Guatemala": 42300.0, "Belgium": 34050.0, "Thailand": 474550.0, "Dominican Republic": 39500.0, "Belize": 1050.0, "Ghana": 15500.0, "Kyrgyzstan": 20400.0, "Netherlands": 43300.0, "Gambia": 800.0, "Finland": 25000.0, "Morocco": 245800.0, "Sweden": 21300.0, "Belarus": 158000.0, "Mali": 12150.0, "Syria": 178000.0, "New Zealand": 8550.0, "South Korea": 659500.0, "Honduras": 20000.0, "Myanmar": 513250.0, "Portugal": 90300.0, "Uruguay": 25450.0, "Tunisia": 47800.0, "Cyprus": 12750.0, "Uzbekistan": 68000.0, "Malaysia": 133600.0, "Senegal": 18600.0, "Antigua and Barbuda": 180.0, "Greece": 148350.0, "Kenya": 29100.0, "Niger": 10700.0, "Fiji": 3500.0}, "/location/statistical_region/gni_per_capita_in_ppp_dollars": {"Canada": 42530.0, "Afghanistan": 1400.0, "Madagascar": 950.0, "Bhutan": 6310.0, "Kuwait": 53820.0, "Nepal": 1500.0, "Qatar": 87030.0, "France": 36720.0, "Bahamas": 29740.0, "Ethiopia": 1140.0, "Slovakia": 24770.0, "Swaziland": 4840.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 1850.0, "Nigeria": 2420.0, "Malawi": 880.0, "Federated States of Micronesia": 4090.0, "Israel": 27120.0, "Australia": 43300.0, "Singapore": 61100.0, "Iceland": 33840.0, "Zambia": 1620.0, "United States of America": 50610.0, "Guatemala": 4960.0, "Germany": 41890.0, "Thailand": 9430.0, "Haiti": 1240.0, "Belize": 6070.0, "Spain": 32320.0, "Ukraine": 7290.0, "Paraguay": 5610.0, "Tanzania": 1590.0, "Central African Republic": 860.0, "Trinidad and Tobago": 22400.0, "Sweden": 44150.0, "Vietnam": 3440.0, "Namibia": 7470.0, 
"Earth": 12128.69, "Switzerland": 56240.0, "New Zealand": 29960.0, "Yemen": 2180.0, "Pakistan": 3030.0, "Iraq": 4300.0, "Honduras": 3890.0, "Chad": 1320.0, "Portugal": 24770.0, "Democratic Republic of the Congo": 370.0, "United Arab Emirates": 48220.0, "Uruguay": 15570.0, "Azerbaijan": 9410.0, "Malaysia": 16530.0, "Senegal": 1920.0, "Antigua and Barbuda": 19260.0, "Burundi": 560.0, "Kenya": 1760.0, "Botswana": 16520.0}, "/location/statistical_region/gdp_nominal": {"Madagascar": 9946995390.0, "Palau": 179912816.0, "Guinea-Bissau": 973427457.0, "Kenya": 33620684016.0, "Nepal": 18884495628.0, "Kyrgyzstan": 5918610958.0, "Mongolia": 8557529910.0, "Bahamas": 7787514000.0, "Switzerland": 635650112360.0, "Democratic Republic of the Congo": 15642236881.0, "Vanuatu": 819227088.0, "Bolivia": 24426829466.0, "Ecuador": 67002768302.0, "Bahrain": 22945456867.0, "Brunei": 12369689792.0, "Belarus": 55136144037.0, "Iran": 482445000000.0, "Singapore": 239699598462.0, "Iceland": 14059073613.0, "Republic of Ireland": 217275000000.0, "Togo": 3594513925.0, "Zimbabwe": 9900000000.0, "Germany": 3600833333330.0, "Bosnia and Herzegovina": 18088238054.0, "Saint Vincent and the Grenadines": 687993693.0, "Haiti": 7346156703.0, "Belize": 1474000000.0, "Tanzania": 23705302064.0, "Dominica": 482277143.0, "Senegal": 14291456855.0, "Tonga": 435589200.0, "Maldives": 2050135788.0, "Federated States of Micronesia": 318100000.0, "Oman": 71781535039.0, "C\u00f4te d\u2019Ivoire": 24074625536.0, "Finland": 266070833333.0, "Equatorial Guinea": 19789801404.0, "Trinidad and Tobago": 22483115868.0, "Sweden": 538131124807.0, "Croatia": 63850068202.0, "Guyana": 2259288026.0, "Mali": 10589925352.0, "Namibia": 12300698896.0, "Yemen": 33757503322.0, "Pakistan": 211091994819.0, "Morocco": 100221001988.0, "Chad": 9485741541.0, "Estonia": 22184722472.0, "Uruguay": 46709797684.0, "Kosovo": 6446205171.0, "India": 1842000000000.0, "Austria": 418483975383.0, "Timor-Leste": 1054000000.0, "Uganda": 16809623489.0, "Sri 
Lanka": 59172135299.0, "Myanmar": 51925000000.0, "Niger": 6016960988.0, "Nicaragua": 7297481501.0}, "/location/statistical_region/foreign_direct_investment_net_inflows": {"Brazil": -68093253945.0, "Canada": 8683048195.0, "Hungary": -2946777396.0, "Cambodia": -872503569.0, "France": -1761634578.0, "Rwanda": -106210000.0, "Laos": -300743507.0, "Seychelles": -136777269.0, "Norway": 12234909873.0, "Benin": -194717291.0, "Israel": -7236800000.0, "Zambia": -831500000.0, "United States of America": 176768000000.0, "Cape Verde": -49165747.0, "Papua New Guinea": -28720688.0, "Slovenia": -240865771.0, "Guatemala": -1030405000.0, "Armenia": -473273917.0, "Thailand": 3290308028.0, "Haiti": -181000000.0, "Belize": -193325682.0, "Hong Kong": -229107860.0, "Sierra Leone": -714974888.0, "Dominica": -34259368.0, "Ukraine": -7015000000.0, "Kyrgyzstan": -693589500.0, "Georgia": -602622655.0, "Mauritius": 600533078.0, "Sweden": 19903849686.0, "Latvia": -796300000.0, "Guinea-Bissau": -27709745.0, "Mali": -398496178.0, "New Zealand": -1767785208.0, "Yemen": 712813329.0, "Bulgaria": -1669004093.0, "Iraq": -1030200000.0, "Angola": 5116413413.0, "Estonia": -576823972.0, "Portugal": -6939732362.0, "Uruguay": -2206070525.0, "Tunisia": -432666012.0, "Republic of Macedonia": -140066689.0, "Azerbaijan": -812407000.0, "Nicaragua": -810000000.0, "Djibouti": -79000231.0, "Mozambique": -2090083915.0, "Uganda": -1721169095.0, "Paraguay": -483366667.0, "Antigua and Barbuda": -57723257.0, "South Korea": 18628100000.0, "Tajikistan": -11142170.0}, "/location/statistical_region/life_expectancy": {"Swaziland": 48.659, "Bhutan": 67.285, "Eritrea": 61.417, "France": 81.668, "Bahamas": 75.452, "Slovakia": 75.959, "Suriname": 70.581, "Argentina": 75.798, "Cameroon": 51.576, "Turkmenistan": 64.998, "Federated States of Micronesia": 68.948, "Algeria": 73.08, "Lesotho": 47.984, "Zambia": 48.969, "Papua New Guinea": 62.801, "Togo": 57.027, "Zimbabwe": 51.236, "Germany": 80.741, "Puerto Rico": 79.028, "Thailand": 
74.091, "Haiti": 62.062, "Kazakhstan": 68.893, "Sierra Leone": 47.776, "Ukraine": 70.809, "Liberia": 56.743, "Gambia": 58.485, "Philippines": 68.757, "Finland": 80.471, "Aruba": 75.113, "Moldova": 69.212, "Bangladesh": 68.937, "Trinidad and Tobago": 69.963, "Vietnam": 75.051, "Croatia": 76.876, "Guinea-Bissau": 48.113, "Switzerland": 82.695, "Yemen": 65.452, "Seychelles": 73.456, "Albania": 77.042, "Chad": 49.523, "Estonia": 76.127, "Equatorial Guinea": 51.137, "Tunisia": 74.754, "Republic of Macedonia": 74.788, "India": 65.478, "Azerbaijan": 70.653, "Uzbekistan": 68.265, "Malaysia": 74.261, "Senegal": 59.272, "Timor-Leste": 62.461, "Colombia": 73.642, "Greece": 80.744, "Paraguay": 72.485, "Namibia": 62.332, "Niger": 54.691, "Cyprus": 79.563, "Comoros": 61.042}, "/location/statistical_region/internet_users_percent_population": {"Brazil": 49.847999, "Afghanistan": 5.454545, "Republic of Macedonia": 63.1477, "Turkmenistan": 7.1958, "Mauritania": 5.3691, "Sudan": 21.0, "Nepal": 11.1493, "Tonga": 34.8609, "Cambodia": 4.939862, "Democratic Republic of the Congo": 1.679961, "Aruba": 74.0, "Cyprus": 61.0, "Bolivia": 34.188434, "Norway": 95.0, "Burkina Faso": 3.725035, "Ghana": 17.107678, "Slovakia": 80.0, "Australia": 82.349549, "Iran": 25.997636, "Slovenia": 70.0, "Zambia": 13.4682, "Senegal": 19.2036, "Papua New Guinea": 2.301957, "Togo": 4.0, "Guatemala": 16.0, "Hong Kong": 72.8, "Tanzania": 13.0803, "Liberia": 3.7941, "Kyrgyzstan": 21.723509, "Georgia": 45.503098, "Oman": 60.0, "Philippines": 36.2351, "Indonesia": 15.36, "Singapore": 74.1818, "Equatorial Guinea": 13.943182, "Sweden": 94.0, "Belarus": 46.906006, "Gabon": 8.616714, "Mongolia": 16.4, "Switzerland": 85.2, "Yemen": 17.4465, "Angola": 16.93721, "Estonia": 79.0, "Uruguay": 55.1146, "United Arab Emirates": 85.0, "South Africa": 41.0, "Serbia": 48.1, "United Kingdom": 87.0162, "Lesotho": 4.589618, "Djibouti": 8.267233, "Congo": 6.106695, "Antigua and Barbuda": 83.787167, "Greece": 56.0, "Paraguay": 27.0758, 
"Croatia": 63.0, "Tajikistan": 14.51, "Botswana": 11.5, "Barbados": 73.329814}, "/location/statistical_region/cpi_inflation_rate": {"Brazil": 5.4, "Qatar": 1.87, "Liberia": 6.83, "United States of America": 2.07, "Mali": 5.43, "Cyprus": 2.39, "Cambodia": 2.93, "Burundi": 18.01, "Rwanda": 6.27, "Slovakia": 3.61, "Nigeria": 12.22, "Cameroon": 2.94, "Malawi": 21.27, "Saudi Arabia": 4.48, "Australia": 1.76, "Montenegro": 3.18, "Burkina Faso": 3.82, "Germany": 2.01, "Bosnia and Herzegovina": 2.05, "Oman": 2.91, "Antigua and Barbuda": 3.38, "Dominican Republic": 3.69, "Iraq": 2.88, "Hong Kong": 4.06, "Sierra Leone": 12.87, "Mauritania": 4.94, "Tonga": 1.21, "Georgia": -0.94, "Denmark": 2.41, "Tanzania": 16.0, "Finland": 2.81, "Moldova": 4.68, "Morocco": 1.28, "Latvia": 2.25, "Gabon": 2.66, "Guinea-Bissau": 2.13, "Thailand": 3.01, "New Zealand": 0.88, "Nepal": 9.45, "Russia": 5.07, "Romania": 3.33, "Seychelles": 7.11, "Albania": 2.03, "Uruguay": 8.1, "Tunisia": 5.5, "Fiji": 4.33, "Comoros": 0.87, "United Kingdom": 2.82, "Lesotho": 6.1, "Congo": 3.89, "Timor-Leste": 11.8, "Greece": 1.5, "Sri Lanka": 6.83, "Namibia": 6.54, "Nicaragua": 7.19, "Botswana": 7.54}, "/location/statistical_region/health_expenditure_as_percent_of_gdp": {"Qatar": 1.91, "Palau": 10.65, "Solomon Islands": 8.83, "Saint Lucia": 7.19, "Nepal": 5.44, "Costa Rica": 10.87, "Mongolia": 5.26, "France": 11.63, "Bahamas": 7.68, "Democratic Republic of the Congo": 8.55, "Rwanda": 10.76, "Laos": 2.77, "Belize": 5.65, "Argentina": 8.11, "Norway": 9.07, "Israel": 7.73, "Australia": 9.03, "Iran": 5.95, "El Salvador": 6.78, "Cape Verde": 4.76, "Slovenia": 9.06, "Germany": 11.06, "Armenia": 4.33, "Gambia": 4.39, "Thailand": 4.06, "Haiti": 7.95, "Iraq": 8.3, "Dominica": 5.87, "Ukraine": 7.19, "Kyrgyzstan": 6.49, "Oman": 2.34, "Finland": 8.85, "Saudi Arabia": 3.69, "Moldova": 11.37, "Trinidad and Tobago": 5.73, "Latvia": 6.17, "Gabon": 3.22, "Cambodia": 5.69, "Kenya": 4.49, "New Zealand": 10.08, "Bulgaria": 7.27, 
"Pakistan": 2.51, "Seychelles": 3.78, "Samoa": 7.03, "Chad": 4.28, "South Africa": 8.52, "Fiji": 3.82, "Serbia": 10.43, "Azerbaijan": 5.24, "Djibouti": 7.87, "Antigua and Barbuda": 5.94, "Uganda": 9.45, "Republic of Macedonia": 6.57, "Sri Lanka": 3.43, "Cyprus": 7.41, "Tajikistan": 5.78, "Botswana": 5.06}, "/location/statistical_region/time_required_to_start_a_business": {"Canada": 5.0, "Afghanistan": 7.0, "Swaziland": 56.0, "Bhutan": 36.0, "Mauritania": 19.0, "Saint Lucia": 15.0, "Costa Rica": 60.0, "Mongolia": 12.0, "Georgia": 2.0, "Mozambique": 13.0, "Suriname": 694.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 7.0, "Venezuela": 144.0, "Thailand": 29.0, "Ecuador": 56.0, "Benin": 26.0, "Uzbekistan": 12.0, "Israel": 21.0, "Australia": 2.0, "Iran": 13.0, "Algeria": 25.0, "Singapore": 3.0, "Iceland": 5.0, "Zambia": 17.0, "Bosnia and Herzegovina": 37.0, "Armenia": 8.0, "Kiribati": 31.0, "Spain": 28.0, "Liberia": 6.0, "Tonga": 16.0, "Maldives": 9.0, "Brunei": 101.0, "Gambia": 27.0, "Philippines": 36.0, "Central African Republic": 22.0, "Cameroon": 15.0, "Vietnam": 34.0, "Earth": 29.61, "Syria": 13.0, "New Zealand": 1.0, "Bulgaria": 18.0, "Pakistan": 21.0, "Samoa": 9.0, "Chad": 62.0, "South Africa": 19.0, "United Arab Emirates": 8.0, "United Kingdom": 13.0, "Malaysia": 6.0, "Congo": 161.0, "Timor-Leste": 94.0, "Uganda": 33.0, "Burundi": 8.0, "Japan": 23.0, "Niger": 17.0, "Tajikistan": 24.0, "Botswana": 61.0, "Barbados": 18.0}, "/location/statistical_region/net_migration": {"Afghanistan": -381030.0, "Serbia": 0.0, "Saint Lucia": -1000.0, "Lithuania": -35495.0, "Cambodia": -254942.0, "Netherlands": 50006.0, "Swaziland": -6000.0, "Laos": -74998.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": -6496.0, "Kuwait": 277629.0, "Burkina Faso": -125000.0, "Ecuador": -120000.0, "Ghana": -51258.0, "Brunei": 3500.0, "Saudi Arabia": 1055517.0, "Algeria": -140000.0, "Iceland": 10417.0, "Zambia": -85000.0, "Republic of Ireland": 100000.0, "Papua New Guinea": 0.0, "Guatemala": -200000.0, "Bosnia 
and Herzegovina": -10000.0, "Denmark": 90316.0, "Thailand": 492252.0, "Ukraine": -40006.0, "Portugal": 150002.0, "Maldives": -53.0, "Paraguay": -40000.0, "Gambia": -13742.0, "Finland": 72634.0, "Morocco": -675000.0, "Guinea-Bissau": -10000.0, "Guyana": -40000.0, "Switzerland": 182803.0, "Syria": -55877.0, "South Korea": -30000.0, "Pakistan": -1999998.0, "Honduras": -100000.0, "Chad": -75000.0, "Macau": 50625.0, "South Africa": 700001.0, "Tunisia": -20000.0, "Tajikistan": -296075.0, "Uruguay": -50000.0, "India": -2999998.0, "Azerbaijan": 53264.0, "Lesotho": -19998.0, "Djibouti": 0.0, "Timor-Leste": -49930.0, "Uganda": -135000.0, "United Arab Emirates": 3076634.0, "Burundi": 370000.0, "Japan": 270000.0, "Niger": -28497.0, "Fiji": -28754.0, "Comoros": -10000.0}, "/location/statistical_region/gdp_growth_rate": {"Qatar": 18.8, "Bhutan": 9.441666, "Serbia": -1.7, "Costa Rica": 5.128995, "Mongolia": 12.283366, "Ethiopia": 8.5, "Georgia": 6.0, "Suriname": 4.476504, "Laos": 8.164082, "Argentina": 8.869529, "Venezuela": 5.54077, "Malawi": 1.8858, "Ecuador": 5.000245, "Bahrain": 4.5, "Saudi Arabia": 6.774455, "Australia": 3.397707, "El Salvador": 1.64457, "Thailand": 6.434809, "Montenegro": 3.5, "Cape Verde": 4.292214, "Papua New Guinea": 8.0, "Togo": 5.622674, "Zimbabwe": 5.015681, "Germany": 0.671445, "Bosnia and Herzegovina": -0.7, "Guinea": 3.944115, "Belgium": -0.28101, "Saint Vincent and the Grenadines": 1.524261, "Dominican Republic": 3.887683, "Kazakhstan": 5.0, "Spain": -1.418905, "Eritrea": 7.017109, "Netherlands": -0.95665, "Philippines": 6.590642, "Moldova": -0.800035, "Mauritius": 3.165694, "Trinidad and Tobago": 1.244764, "Cambodia": 7.261323, "Guinea-Bissau": -1.5, "Guyana": 4.816459, "Yemen": 0.137234, "Haiti": 2.822235, "Romania": 3.700039, "Albania": 0.8, "Angola": 6.830624, "Macau": 9.948269, "United Arab Emirates": 4.9, "India": 3.236943, "Azerbaijan": 4.45248, "United Kingdom": 0.27268, "Congo": 3.8, "Mozambique": 7.4, "Iceland": 1.639232, "Earth": 
2.156773, "South Korea": 2.044099, "Nicaragua": 5.204732, "Botswana": 6.1}, "/location/statistical_region/fertility_rate": {"Canada": 1.677, "Qatar": 2.232, "Bangladesh": 2.202, "Nepal": 2.658, "France": 2.0, "Democratic Republic of the Congo": 5.657, "Rwanda": 5.339, "Slovakia": 1.4, "Suriname": 2.307, "Nigeria": 5.489, "Venezuela": 2.49, "Czech Republic": 1.49, "Federated States of Micronesia": 3.39, "Singapore": 1.15, "El Salvador": 2.217, "Japan": 1.39, "United States of America": 2.1, "Germany": 1.39, "Armenia": 1.736, "Oman": 2.24, "Belgium": 1.84, "Dominican Republic": 2.544, "Belize": 2.79, "Kazakhstan": 2.59, "Mauritania": 4.464, "Kyrgyzstan": 2.898, "Netherlands": 1.79, "Gambia": 4.814, "C\u00f4te d\u2019Ivoire": 4.348, "Indonesia": 2.09, "Central African Republic": 4.546, "Moldova": 1.466, "Burundi": 4.218, "Latvia": 1.17, "Croatia": 1.46, "Guyana": 2.234, "Namibia": 3.15, "Syria": 2.934, "Russia": 1.54, "Bulgaria": 1.49, "Pakistan": 3.423, "Iraq": 4.702, "Albania": 1.523, "Portugal": 1.32, "South Africa": 2.458, "Republic of Macedonia": 1.411, "Uruguay": 1.986, "Azerbaijan": 2.3, "Lesotho": 3.137, "Congo": 4.504, "Timor-Leste": 5.453, "Uganda": 6.052, "Greece": 1.44, "Paraguay": 2.911, "Earth": 2.451, "Tajikistan": 3.24, "Botswana": 2.696}, "/location/statistical_region/consumer_price_index": {"Czech Republic": 121.1, "Guinea": 331.03, "Nepal": 186.27, "Costa Rica": 172.69, "Bahamas": 117.02, "Rwanda": 173.86, "Aruba": 124.76, "Vanuatu": 120.21, "Bolivia": 157.24, "Norway": 114.16, "Ecuador": 136.59, "Benin": 130.05, "Iran": 316.3, "Slovenia": 120.34, "El Salvador": 126.95, "Iceland": 163.08, "Zambia": 177.64, "United States of America": 117.56, "Cape Verde": 129.74, "Togo": 123.79, "Guatemala": 147.82, "Chile": 107.95, "Gambia": 129.28, "Thailand": 123.55, "Belize": 115.66, "Hong Kong": 122.42, "Philippines": 137.24, "Georgia": 153.68, "Oman": 140.81, "C\u00f4te d\u2019Ivoire": 121.17, "Indonesia": 159.96, "Moldova": 172.83, "Morocco": 113.95, 
"Sweden": 112.05, "Finland": 116.6, "Kenya": 224.6, "Syria": 203.6, "New Zealand": 121.08, "Yemen": 227.65, "South Korea": 123.41, "Pakistan": 221.91, "Macau": 140.79, "Chad": 122.45, "Uruguay": 165.64, "Republic of Macedonia": 123.69, "Equatorial Guinea": 138.03, "South Africa": 154.95, "Djibouti": 144.74, "Congo": 136.63, "Antigua and Barbuda": 119.55, "Uganda": 202.96, "United Arab Emirates": 116.01, "Paraguay": 157.29, "Japan": 99.27, "Niger": 121.14}, "/location/statistical_region/prevalence_of_undernourisment": {"C\u00f4te d\u2019Ivoire": 21.4, "Kenya": 30.4, "Kuwait": 5.0, "Vanuatu": 8.5, "Maldives": 5.6, "Slovakia": 5.0, "United Kingdom": 5.0, "Laos": 27.8, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 7.7, "Norway": 5.0, "Burkina Faso": 25.9, "Ghana": 5.0, "Brunei": 5.0, "Australia": 5.0, "Iceland": 5.0, "Republic of Ireland": 5.0, "Slovenia": 5.0, "Germany": 5.0, "Bosnia and Herzegovina": 5.0, "Chile": 5.0, "Kiribati": 8.2, "Djibouti": 19.8, "Liberia": 31.4, "Netherlands": 5.0, "Gambia": 14.4, "Philippines": 17.0, "Indonesia": 8.6, "Central African Republic": 30.0, "Cameroon": 15.7, "North Korea": 32.0, "Trinidad and Tobago": 9.3, "Sweden": 5.0, "Latvia": 5.0, "Croatia": 5.0, "Finland": 5.0, "Mali": 7.9, "Switzerland": 5.0, "Syria": 5.0, "Russia": 5.0, "Pakistan": 19.9, "Estonia": 5.0, "Nicaragua": 20.1, "Uzbekistan": 6.1, "Lesotho": 16.6, "Austria": 5.0, "Congo": 37.4, "Timor-Leste": 38.2, "Burundi": 73.4, "Earth": 12.76, "Niger": 12.6, "Cyprus": 5.0, "Comoros": 70.0, "Barbados": 5.0}, "/location/statistical_region/trade_balance_as_percent_of_gdp": {"Afghanistan": -27.57, "Kenya": -20.8, "Sudan": -1.29, "India": -7.71, "Saint Lucia": -21.75, "Hungary": 7.43, "Lithuania": -1.49, "Democratic Republic of the Congo": -6.61, "Rwanda": -19.67, "Aruba": -15.66, "Bolivia": 1.11, "Norway": 13.22, "Ghana": -12.85, "Algeria": 9.35, "Singapore": 22.17, "El Salvador": -18.4, "Montenegro": -23.53, "Saint Kitts and Nevis": -11.63, "Bosnia and Herzegovina": -22.57, "Armenia": 
-23.81, "Kazakhstan": 20.28, "Spain": 1.02, "Mauritania": -36.23, "Tonga": -42.71, "Georgia": -18.8, "Tanzania": -17.39, "Burundi": -28.31, "Sweden": 6.17, "Latvia": -3.83, "Croatia": -0.06, "Mali": -10.71, "Switzerland": 10.77, "Honduras": -21.3, "Bulgaria": 0.69, "Romania": -7.97, "Albania": -20.26, "Samoa": -27.17, "Trinidad and Tobago": 25.25, "Nicaragua": 25.05, "Ethiopia": -14.97, "Equatorial Guinea": 25.85, "Serbia": -17.0, "Azerbaijan": 32.91, "Botswana": -5.55, "Uzbekistan": -0.2, "Lesotho": -61.5, "Vietnam": -0.45, "Mozambique": -16.8, "Colombia": -1.19, "Paraguay": -0.31, "Slovakia": 2.6, "South Korea": 2.03, "Fiji": -11.01, "Comoros": -36.47}, "/location/statistical_region/gni_in_ppp_dollars": {"Canada": 1483585959450.0, "Afghanistan": 40733504335.0, "Turkmenistan": 49869451990.0, "Sudan": 75344423730.0, "Kuwait": 147286887192.0, "Cambodia": 35140720758.0, "Cape Verde": 2144395901.0, "Slovakia": 134028909210.0, "Swaziland": 5957667254.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 348533299.0, "Bolivia": 52092982470.0, "Cameroon": 50337640645.0, "Ecuador": 148518151985.0, "Benin": 15813194294.0, "Ghana": 49204636851.0, "Saudi Arabia": 698483964625.0, "Australia": 982162738360.0, "Iceland": 10832141089.0, "Zambia": 22772173762.0, "Republic of Ireland": 164621089972.0, "Armenia": 20766498855.0, "Mozambique": 25720859707.0, "Kazakhstan": 200736024235.0, "Sierra Leone": 8125419233.0, "Eritrea": 3440313080.0, "Tonga": 539620568.0, "Netherlands": 731483151504.0, "Uganda": 41415418747.0, "Gambia": 3326562005.0, "Tanzania": 73586580060.0, "Indonesia": 1187981528740.0, "Bangladesh": 319938507853.0, "Trinidad and Tobago": 29957349125.0, "Vietnam": 305613341376.0, "Japan": 4629656540970.0, "Switzerland": 449790421104.0, "Syria": 116510985150.0, "France": 2412626106700.0, "South Korea": 1548728614980.0, "Albania": 29704999640.0, "South Africa": 572631363165.0, "Nicaragua": 23721394234.0, "Uruguay": 52852337840.0, "India": 4749213279870.0, "Azerbaijan": 87524263465.0, 
"Uzbekistan": 111636498122.0, "Malaysia": 483240939374.0, "Senegal": 26323780629.0, "Congo": 15201717418.0, "Saint Vincent and the Grenadines": 1181987293.0, "Colombia": 482247050718.0, "Greece": 287152170395.0, "Hungary": 205925845708.0, "Niger": 11201853091.0, "Cyprus": 25671055725.0}, "/location/statistical_region/merchandise_trade_percent_of_gdp": {"Sudan": 20.76, "Kuwait": 73.1, "Lithuania": 145.9, "Cyprus": 39.39, "Mongolia": 108.3, "France": 47.56, "Bahamas": 54.61, "Equatorial Guinea": 121.49, "Aruba": 52.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 57.26, "Norway": 49.45, "Burkina Faso": 52.68, "Brunei": 99.98, "Australia": 34.05, "Venezuela": 41.14, "Zambia": 80.04, "United States of America": 24.75, "Saint Kitts and Nevis": 36.74, "Papua New Guinea": 76.66, "Slovenia": 140.75, "Bosnia and Herzegovina": 89.08, "Guinea": 54.67, "Mozambique": 74.72, "Spain": 46.28, "Ukraine": 86.88, "Portugal": 61.3, "Maldives": 84.08, "Federated States of Micronesia": 74.88, "Oman": 98.84, "Cameroon": 46.43, "Moldova": 101.67, "Trinidad and Tobago": 93.81, "Sweden": 63.65, "Belarus": 146.04, "Namibia": 84.72, "Mali": 49.48, "Croatia": 58.65, "Samoa": 62.27, "Cape Verde": 43.16, "Bulgaria": 116.45, "Pakistan": 29.74, "Romania": 75.49, "Angola": 84.94, "Estonia": 154.63, "Macau": 23.22, "Morocco": 67.73, "United Arab Emirates": 136.02, "Uruguay": 41.5, "Azerbaijan": 62.95, "Uzbekistan": 43.24, "Malaysia": 139.69, "Timor-Leste": 29.55, "Kenya": 60.21, "Niger": 66.99, "Nicaragua": 81.17, "Botswana": 97.12, "Barbados": 61.87}, "/location/statistical_region/labor_participation_rate": {"Brazil": 43.81, "Netherlands": 45.79, "Mauritania": 26.66, "Solomon Islands": 39.6, "Costa Rica": 36.42, "Morocco": 27.3, "France": 47.42, "Uzbekistan": 39.67, "Rwanda": 52.41, "Tanzania": 49.78, "Suriname": 37.33, "Laos": 50.09, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 37.66, "Bolivia": 44.87, "Malawi": 51.35, "Ecuador": 40.02, "Brunei": 41.7, "Belarus": 48.85, "Iran": 18.21, "El Salvador": 41.61, 
"Zambia": 46.29, "Republic of Ireland": 44.22, "Burkina Faso": 47.56, "Sudan": 28.9, "Zimbabwe": 49.44, "Chile": 39.72, "C\u00f4te d\u2019Ivoire": 37.47, "Spain": 44.35, "Liberia": 47.48, "Kyrgyzstan": 42.75, "Finland": 47.88, "Armenia": 42.0, "Oman": 16.85, "Philippines": 38.94, "Indonesia": 38.05, "Trinidad and Tobago": 42.23, "Sweden": 47.08, "Gabon": 46.34, "Japan": 42.49, "Syria": 15.15, "Russia": 48.91, "Samoa": 34.26, "Estonia": 49.99, "Ethiopia": 47.06, "Tunisia": 27.23, "Republic of Macedonia": 38.59, "Azerbaijan": 48.87, "United Kingdom": 45.99, "Lesotho": 46.06, "Congo": 48.71, "Saint Vincent and the Grenadines": 41.11, "Sri Lanka": 32.55, "Kenya": 46.53, "Nicaragua": 38.1, "Comoros": 30.45}, "/location/statistical_region/population_growth_rate": {"Canada": 1.14, "Afghanistan": 2.44, "Qatar": 7.05, "Palau": 0.72, "Czech Republic": 0.18, "Sudan": 2.08, "Saint Lucia": 0.89, "Nepal": 1.16, "Cambodia": 1.76, "Democratic Republic of the Congo": 2.74, "Earth": 1.15, "Vanuatu": 2.24, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 2.65, "Argentina": 0.88, "Norway": 1.32, "Malawi": 2.86, "Turkmenistan": 1.29, "Ghana": 2.17, "Israel": 1.81, "Australia": 1.6, "Zambia": 3.19, "United States of America": 0.74, "Saint Kitts and Nevis": 1.15, "Togo": 2.6, "Bosnia and Herzegovina": -0.14, "Armenia": 0.17, "Puerto Rico": -0.73, "Antigua and Barbuda": 1.03, "Dominican Republic": 1.26, "Eritrea": 3.28, "Kyrgyzstan": 1.22, "Georgia": 0.63, "Equatorial Guinea": 2.8, "Moldova": -0.04, "North Korea": 0.53, "Morocco": 1.43, "Namibia": 1.87, "Kiribati": 1.54, "Syria": 1.97, "New Zealand": 0.63, "Yemen": 2.33, "Russia": 0.4, "Mauritius": 0.42, "Kosovo": 0.86, "Angola": 3.12, "Myanmar": 0.85, "Macau": 1.9, "Trinidad and Tobago": 0.33, "Tunisia": 0.97, "United Arab Emirates": 3.1, "Uruguay": 0.35, "United Kingdom": 0.75, "Lesotho": 1.08, "Timor-Leste": 2.88, "Sri Lanka": 1.04, "Hungary": -0.28, "Niger": 3.84, "Nicaragua": 1.46, "Comoros": 2.44}, 
"/location/statistical_region/diesel_price_liter": {"Canada": 1.23, "Brazil": 1.02, "Qatar": 0.27, "Turkmenistan": 0.2, "Nepal": 1.09, "Mongolia": 1.22, "France": 1.78, "Democratic Republic of the Congo": 1.48, "Rwanda": 1.73, "Swaziland": 1.34, "Benin": 1.26, "Bahrain": 0.17, "Saudi Arabia": 0.07, "Belarus": 0.9, "Indonesia": 0.47, "Lesotho": 1.31, "Zambia": 1.48, "Cape Verde": 1.58, "Togo": 1.22, "Zimbabwe": 1.4, "Germany": 1.88, "Armenia": 1.15, "Haiti": 1.03, "Belize": 1.21, "Sierra Leone": 1.05, "Mauritania": 1.27, "Georgia": 1.37, "Philippines": 1.01, "Finland": 1.95, "Central African Republic": 1.69, "Moldova": 1.4, "Bangladesh": 0.76, "Burundi": 1.47, "Sweden": 2.16, "Australia": 1.57, "Gabon": 0.91, "Mali": 1.25, "Namibia": 1.31, "Romania": 1.73, "Ethiopia": 0.94, "Uruguay": 1.88, "Tunisia": 0.69, "Republic of Macedonia": 1.55, "Kosovo": 1.6, "India": 0.86, "United Kingdom": 2.27, "Malaysia": 0.59, "Greece": 2.08, "Sri Lanka": 0.93, "Japan": 1.61, "Tajikistan": 0.91, "Barbados": 1.14}, "/location/statistical_region/gdp_real": {"Canada": 872845084506.0, "Palau": 126565271.0, "Bhutan": 961365502.0, "Guinea": 4107607446.0, "Saint Lucia": 791562406.0, "Nepal": 8036784848.0, "Costa Rica": 23898072634.0, "Cambodia": 7788044971.0, "Republic of Ireland": 123812040038.0, "Ethiopia": 18322929015.0, "Rwanda": 3593742140.0, "Slovakia": 43788541093.0, "Angola": 25901052471.0, "Swaziland": 1845684558.0, "Vanuatu": 383740629.0, "Venezuela": 156970286735.0, "Burkina Faso": 4548468401.0, "United Kingdom": 1698161182430.0, "Saudi Arabia": 258706344474.0, "Belarus": 26250070655.0, "Singapore": 162404931770.0, "Montenegro": 1385332123.0, "Saint Kitts and Nevis": 375102704.0, "Papua New Guinea": 5103993896.0, "Guatemala": 26721536972.0, "Germany": 2071241017140.0, "Armenia": 4053319790.0, "Belgium": 266511169431.0, "Thailand": 187482253175.0, "Hong Kong": 251168960224.0, "Spain": 712338669975.0, "Ukraine": 47467479851.0, "Liberia": 619202726.0, "C\u00f4te d\u2019Ivoire": 
11666499085.0, "Seychelles": 749428459.0, "Namibia": 6089324238.0, "Japan": 5064043338020.0, "Syria": 30733741143.0, "Cape Verde": 944370324.0, "Romania": 56527456853.0, "Albania": 6137563946.0, "Samoa": 328592779.0, "Equatorial Guinea": 6058175791.0, "India": 971486068096.0, "Uzbekistan": 26896407919.0, "Lesotho": 1046135464.0, "Senegal": 6970078285.0, "Congo": 5067059617.0, "Antigua and Barbuda": 814439750.0, "Colombia": 149836914917.0, "Paraguay": 10483452830.0, "Hungary": 57065580674.0}, "/location/statistical_region/population": {"Canada": 34994000.0, "Bangladesh": 161083804.0, "Liberia": 3786764.0, "Guinea": 10221808.0, "Saint Lucia": 176000.0, "Lithuania": 3199342.0, "Mongolia": 2892876.0, "France": 65433714.0, "Ethiopia": 84734262.0, "Aruba": 108141.0, "Argentina": 40764561.0, "Nigeria": 174507539.0, "Ecuador": 14666055.0, "Ghana": 24965816.0, "Brunei": 405938.0, "Israel": 7907900.0, "United States of America": 313914040.0, "Republic of Ireland": 4576317.0, "Papua New Guinea": 6187591.0, "Malawi": 15380888.0, "Togo": 6154813.0, "Guatemala": 14757316.0, "Bosnia and Herzegovina": 3834000.0, "Kuwait": 2818042.0, "Thailand": 69518555.0, "Belize": 356600.0, "Eritrea": 5824000.0, "Oman": 2846145.0, "Philippines": 94852030.0, "Finland": 5421827.0, "Saudi Arabia": 28082541.0, "Moldova": 3559000.0, "North Korea": 24763188.0, "Sweden": 9514406.0, "Latvia": 2027000.0, "Mali": 15839538.0, "Syria": 22399254.0, "Cape Verde": 500585.0, "Fiji": 868400.0, "Seychelles": 86000.0, "Samoa": 183874.0, "Chad": 11525496.0, "Macau": 555731.0, "Democratic Republic of the Congo": 67757577.0, "Tunisia": 10777500.0, "Republic of Macedonia": 2063893.0, "Botswana": 2030738.0, "Malaysia": 28859154.0, "Senegal": 12767556.0, "Vietnam": 87840000.0, "Saint Vincent and the Grenadines": 109365.0, "Greece": 11300410.0, "Paraguay": 6541591.0, "Slovakia": 5440000.0, "South Korea": 50004441.0, "Tajikistan": 6976958.0, "Albania": 3215988.0, "Barbados": 273925.0, "Nicaragua": 5869859.0}, 
"/location/statistical_region/gdp_nominal_per_capita": {"Canada": 52218.99, "Afghanistan": 619.59, "Madagascar": 447.44, "Turkmenistan": 6510.61, "Mauritania": 1106.14, "Guinea": 591.02, "Vanuatu": 3176.21, "Kiribati": 1743.39, "Cambodia": 945.99, "Swaziland": 3043.5, "Laos": 1399.21, "Belize": 4576.64, "Venezuela": 12766.72, "Burkina Faso": 634.32, "Ecuador": 5456.43, "Bahrain": 18334.17, "Brunei": 41126.61, "Saudi Arabia": 20777.67, "Belarus": 6685.02, "Algeria": 5404.0, "Togo": 574.12, "Cameroon": 1151.36, "Zambia": 1469.12, "Montenegro": 6813.04, "Papua New Guinea": 2184.16, "Slovenia": 22092.26, "Zimbabwe": 787.94, "Thailand": 5473.75, "Haiti": 770.95, "Iraq": 6454.62, "Hong Kong": 36795.82, "Tanzania": 608.85, "Ukraine": 3866.99, "Liberia": 421.7, "Tonga": 4493.87, "C\u00f4te d\u2019Ivoire": 1243.99, "Israel": 31281.47, "Mali": 693.98, "Philippines": 2587.88, "Sweden": 55244.65, "Latvia": 14008.51, "Gabon": 11430.49, "Guyana": 3583.96, "Switzerland": 79052.34, "Bulgaria": 6986.04, "Seychelles": 11758.04, "Honduras": 2264.09, "Chad": 885.11, "Macau": 78275.15, "United Arab Emirates": 40363.16, "United Kingdom": 38514.46, "Malaysia": 10380.54, "Vietnam": 1595.81, "Saint Vincent and the Grenadines": 6515.22, "Uganda": 547.01, "South Korea": 23020.0, "Cyprus": 26315.47, "Barbados": 13076.46}, "/location/statistical_region/renewable_freshwater_per_capita": {"Brazil": 27511.596788, "Afghanistan": 1619.969848, "Bangladesh": 686.892125, "Sudan": 640.860866, "Cambodia": 8256.958747, "France": 3059.431928, "Burundi": 1054.467325, "Swaziland": 2177.932103, "Nigeria": 1345.977605, "Cameroon": 12903.974765, "Benin": 1053.19181, "Iran": 1703.695302, "Zambia": 5882.440958, "United States of America": 9043.999333, "Republic of Ireland": 10706.291891, "Guatemala": 7425.248756, "Chile": 51073.32263, "Belgium": 1086.194611, "Thailand": 3372.069221, "Haiti": 1296.738399, "Iraq": 1108.311645, "Kazakhstan": 3886.180272, "Sierra Leone": 27278.193761, "Saint Kitts and Nevis": 
453.078099, "Georgia": 12965.606459, "Paraguay": 14300.716998, "Libya": 114.693311, "Turkmenistan": 275.130476, "Moldova": 280.835688, "Mauritius": 2139.106458, "Morocco": 904.570213, "Gabon": 102883.627325, "Guinea-Bissau": 9850.83375, "Honduras": 12335.615673, "Yemen": 90.112489, "Russia": 30169.27812, "Albania": 8529.168647, "Angola": 7333.815978, "Suriname": 166112.643249, "South Africa": 885.607275, "Tunisia": 393.018419, "United Arab Emirates": 16.806542, "Uruguay": 17437.636804, "Nicaragua": 32124.523255, "Malaysia": 20167.622148, "Austria": 6529.247765, "Congo": 52539.91436, "Antigua and Barbuda": 589.89019, "Uganda": 1109.591698, "Greece": 5132.754264, "Sri Lanka": 2530.068523, "Japan": 3364.177442, "Niger": 211.973961, "Tajikistan": 8120.437372}}
2 |
--------------------------------------------------------------------------------
/data/train.json:
--------------------------------------------------------------------------------
1 | {"/location/statistical_region/size_of_armed_forces": {"Canada": 65700.0, "Cambodia": 191300.0, "Ethiopia": 138000.0, "Sri Lanka": 223100.0, "Argentina": 104350.0, "Burkina Faso": 11450.0, "Saudi Arabia": 249000.0, "Republic of Ireland": 8900.0, "Slovenia": 12100.0, "Bosnia and Herzegovina": 10550.0, "Kuwait": 22600.0, "Spain": 215700.0, "Liberia": 2050.0, "Namibia": 15200.0, "Oman": 47000.0, "Tanzania": 28400.0, "Gabon": 6700.0, "Yemen": 137900.0, "Pakistan": 946000.0, "Albania": 14750.0, "United Arab Emirates": 51000.0, "India": 2647150.0, "Azerbaijan": 81950.0, "Lesotho": 2000.0, "Tajikistan": 16300.0, "Afghanistan": 340350.0, "Czech Republic": 26750.0, "Mongolia": 17200.0, "France": 332250.0, "Slovakia": 15850.0, "Laos": 129100.0, "Norway": 24450.0, "Malawi": 6800.0, "Benin": 9450.0, "Montenegro": 12180.0, "Togo": 9300.0, "Armenia": 55544.0, "Ukraine": 214850.0, "Indonesia": 676500.0, "Central African Republic": 3150.0, "Mauritius": 2500.0, "Vietnam": 522000.0, "Russia": 1364000.0, "Bulgaria": 47300.0, "Romania": 151300.0, "Angola": 117000.0, "Chad": 34850.0, "South Africa": 77582.0, "Austria": 23250.0, "Mozambique": 11200.0, "Uganda": 46800.0, "Hungary": 38500.0, "Brazil": 713480.0, "Guinea": 12300.0, "Costa Rica": 9800.0, "Cape Verde": 1200.0, "Nigeria": 162000.0, "Ecuador": 58983.0, "El Salvador": 32300.0, "Chile": 103750.0, "Haiti": 50.0, "Iraq": 802400.0, "Sierra Leone": 10500.0, "Georgia": 32350.0, "Denmark": 16450.0, "Philippines": 165500.0, "Moldova": 7750.0, "Croatia": 21600.0, "Guinea-Bissau": 6450.0, "Switzerland": 23100.0, "Seychelles": 870.0, "Estonia": 5750.0, "Djibouti": 12950.0, "Timor-Leste": 1330.0, "Colombia": 440224.0, "Burundi": 51050.0, "Nicaragua": 12000.0, "Barbados": 610.0, "Madagascar": 21600.0, "Nepal": 157750.0, "Democratic Republic of the Congo": 134250.0, "Suriname": 1840.0, "Iceland": 180.0, "Zambia": 16500.0, "Papua New Guinea": 3100.0, "Zimbabwe": 50800.0, "Germany": 196000.0, "Kazakhstan": 70500.0, "Mauritania": 20850.0, 
"North Korea": 1379000.0, "Trinidad and Tobago": 4050.0, "Latvia": 5350.0, "Guyana": 1100.0, "Equatorial Guinea": 1320.0, "Republic of Macedonia": 8000.0, "Serbia": 28150.0, "United Kingdom": 165650.0, "Congo": 12000.0, "Paraguay": 25450.0, "Earth": 28020079.0, "Botswana": 10500.0}, "/location/statistical_region/gni_per_capita_in_ppp_dollars": {"Turkmenistan": 9640.0, "Lithuania": 22760.0, "Cambodia": 2360.0, "Argentina": 17250.0, "Bolivia": 4960.0, "Cameroon": 2320.0, "Burkina Faso": 1510.0, "Bahrain": 21240.0, "Saudi Arabia": 24870.0, "Republic of Ireland": 35870.0, "Slovenia": 27240.0, "Bosnia and Herzegovina": 9380.0, "Dominica": 12190.0, "Liberia": 600.0, "Netherlands": 43620.0, "Oman": 25770.0, "C\u00f4te d\u2019Ivoire": 1960.0, "Gabon": 14290.0, "Albania": 9390.0, "Samoa": 4270.0, "Macau": 68710.0, "India": 3840.0, "Lesotho": 2210.0, "Saint Vincent and the Grenadines": 10810.0, "Cyprus": 29400.0, "South Korea": 30970.0, "Tajikistan": 2220.0, "Bangladesh": 2070.0, "Mauritania": 2520.0, "Solomon Islands": 2170.0, "Saint Lucia": 11020.0, "Hungary": 20710.0, "Mongolia": 5100.0, "Rwanda": 1240.0, "Vanuatu": 4500.0, "Norway": 66960.0, "Benin": 1570.0, "Montenegro": 13930.0, "Saint Kitts and Nevis": 17280.0, "Togo": 920.0, "Armenia": 6990.0, "Dominican Republic": 9820.0, "Ghana": 1940.0, "Tonga": 5140.0, "Indonesia": 4810.0, "Finland": 38630.0, "Mauritius": 15820.0, "Mali": 1160.0, "Russia": 22720.0, "Bulgaria": 15390.0, "Romania": 16310.0, "Angola": 5490.0, "South Africa": 11190.0, "Nicaragua": 3960.0, "Austria": 44100.0, "Mozambique": 1020.0, "Uganda": 1140.0, "Japan": 36290.0, "Niger": 650.0, "Brazil": 11720.0, "Guinea": 980.0, "Costa Rica": 12590.0, "Cape Verde": 4340.0, "Ecuador": 9590.0, "Czech Republic": 24710.0, "Belarus": 15210.0, "Algeria": 8370.0, "El Salvador": 6790.0, "Chile": 21310.0, "Belgium": 40170.0, "Kiribati": 3380.0, "Hong Kong": 53050.0, "Sierra Leone": 1360.0, "Georgia": 5860.0, "Denmark": 43340.0, "Philippines": 4400.0, "Moldova": 3690.0, 
"Morocco": 5040.0, "Croatia": 19760.0, "Guinea-Bissau": 1190.0, "Seychelles": 25760.0, "Estonia": 22030.0, "Uzbekistan": 3750.0, "Timor-Leste": 6410.0, "Colombia": 10110.0, "Fiji": 4880.0, "Palau": 17150.0, "Sudan": 2030.0, "Laos": 2730.0, "Maldives": 7690.0, "Suriname": 8500.0, "Venezuela": 13120.0, "Papua New Guinea": 2780.0, "Gambia": 1860.0, "Kazakhstan": 11950.0, "Eritrea": 560.0, "Kyrgyzstan": 2260.0, "Latvia": 21020.0, "Guyana": 3400.0, "Syria": 5200.0, "Equatorial Guinea": 18880.0, "Tunisia": 9360.0, "Republic of Macedonia": 11570.0, "Serbia": 11180.0, "United Kingdom": 36880.0, "Congo": 3510.0, "Greece": 25460.0, "Sri Lanka": 6120.0, "Comoros": 1230.0}, "/location/statistical_region/gdp_nominal": {"Canada": 1736050505050.0, "Turkmenistan": 24107017544.0, "Montenegro": 4550463278.0, "Lithuania": 42725404055.0, "Cambodia": 12875310959.0, "Ethiopia": 31708848033.0, "Swaziland": 3977754360.0, "Argentina": 445988571982.0, "Cameroon": 25464850391.0, "Burkina Faso": 10187211704.0, "Ghana": 39199656051.0, "Saudi Arabia": 576824000000.0, "Cape Verde": 1901136230.0, "Slovenia": 49539271106.0, "Guatemala": 46900000257.0, "Kuwait": 176590075215.0, "Spain": 1490809722220.0, "Liberia": 1161000000.0, "Netherlands": 836256944444.0, "Gabon": 17051616749.0, "New Zealand": 161851000000.0, "Albania": 12959563902.0, "Samoa": 649414531.0, "Macau": 36428443915.0, "United Arab Emirates": 360245074960.0, "Azerbaijan": 63403650746.0, "Lesotho": 2426200017.0, "South Korea": 1116247397320.0, "Tajikistan": 6522200291.0, "Afghanistan": 20343461030.0, "Czech Republic": 215215310734.0, "Mauritania": 4075675053.0, "Solomon Islands": 838022105.0, "Saint Lucia": 1232180089.0, "France": 2773032125000.0, "Rwanda": 6377408665.0, "Slovakia": 95994147901.0, "Laos": 8297664741.0, "Norway": 485803392857.0, "Malawi": 5700383783.0, "Benin": 7294865847.0, "United States of America": 15684800000000.0, "Saint Kitts and Nevis": 708955238.0, "Armenia": 10247788878.0, "Dominican Republic": 55611245616.0, 
"Ukraine": 165245009991.0, "Indonesia": 846832283153.0, "Central African Republic": 2165868600.0, "Mauritius": 11313454891.0, "Vietnam": 123960665229.0, "Russia": 1857769676140.0, "Bulgaria": 53514098360.0, "Romania": 179793512340.0, "Angola": 100990011820.0, "Portugal": 237522083333.0, "South Africa": 408236752338.0, "Cyprus": 24689602447.0, "Malaysia": 278671114817.0, "Mozambique": 12797754231.0, "Japan": 5867154491920.0, "Brazil": 2476652189880.0, "Guinea": 5131221608.0, "Costa Rica": 41006959585.0, "Nigeria": 235922915395.0, "Bangladesh": 110612124360.0, "Australia": 1384145284190.0, "Algeria": 188681099191.0, "El Salvador": 23054100000.0, "Chile": 248585243788.0, "Puerto Rico": 96260500000.0, "Belgium": 511533333333.0, "Thailand": 345649290737.0, "Iraq": 115388468974.0, "Hong Kong": 243665853032.0, "Sierra Leone": 2242960927.0, "Georgia": 14366566609.0, "Denmark": 332677281192.0, "Philippines": 224753569097.0, "Moldova": 7000318677.0, "Kiribati": 177960937.0, "Seychelles": 1007186292.0, "Uzbekistan": 45359432355.0, "Antigua and Barbuda": 1128708617.0, "Colombia": 336345827848.0, "Burundi": 2325972144.0, "Fiji": 3812749216.0, "Barbados": 3685000000.0, "Qatar": 172981588421.0, "Bhutan": 1688939892.0, "Sudan": 55097394769.0, "Suriname": 4350523600.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 248286777.0, "Venezuela": 316482190800.0, "Israel": 242928731135.0, "Zambia": 19206011889.0, "Papua New Guinea": 12937183216.0, "Gambia": 1109306596.0, "Kazakhstan": 186198433060.0, "Eritrea": 2608715447.0, "North Korea": 12384542599.0, "Latvia": 28252498853.0, "Hungary": 140029344474.0, "Syria": 59147033452.0, "Honduras": 17259407972.0, "Tunisia": 45863804800.0, "Republic of Macedonia": 10165373218.0, "Serbia": 45043430299.0, "Comoros": 610114092.0, "United Kingdom": 2445408064520.0, "Congo": 14748024198.0, "Greece": 298733589250.0, "Paraguay": 23877089240.0, "Earth": 69993693036500.0, "Botswana": 17627441191.0}, 
"/location/statistical_region/foreign_direct_investment_net_inflows": {"Lithuania": -425560872.0, "Ethiopia": -626509560.0, "Aruba": -540670391.0, "Swaziland": -131753962.0, "Argentina": -11461806106.0, "Bolivia": -858666304.0, "Cameroon": -35250979.0, "Burkina Faso": -38150367.0, "Ghana": -3196890000.0, "Saudi Arabia": -7780825000.0, "Republic of Ireland": -10187309748.0, "Bosnia and Herzegovina": -596307428.0, "Guinea": -955240000.0, "Spain": -32927439660.0, "Liberia": -1312748380.0, "Maldives": -281565780.0, "Oman": -215864759.0, "C\u00f4te d\u2019Ivoire": -314068409.0, "Pakistan": -766680000.0, "Albania": -936691049.0, "Samoa": -24499635.0, "Macau": -1484518549.0, "India": -17354000000.0, "Lesotho": -136069926.0, "Saint Vincent and the Grenadines": -109882294.0, "Kenya": -325817353.0, "Belarus": -1343300000.0, "Afghanistan": -75649209.0, "Bangladesh": -1134654834.0, "Solomon Islands": -140957747.0, "Saint Lucia": -80976186.0, "Mongolia": -4620100551.0, "Slovakia": -1597851682.0, "Vanuatu": -58358477.0, "Malawi": -82806903.0, "Singapore": -33570856095.0, "Montenegro": -540907166.0, "Saint Kitts and Nevis": -114056140.0, "Togo": -48641465.0, "Dominican Republic": -2371100000.0, "Bahrain": 112765957.0, "Finland": 6402174412.0, "Libya": 56900000.0, "Indonesia": -14429887628.0, "Vietnam": -6480000000.0, "Russia": -358083300.0, "Romania": -2557000000.0, "South Africa": -6042705282.0, "Fiji": -190357291.0, "Malaysia": 3219856442.0, "Austria": 12289609229.0, "Japan": 117440000000.0, "Niger": -1000054471.0, "Kuwait": 8328173808.0, "Costa Rica": -2098850939.0, "Bahamas": -594982000.0, "Nigeria": -8025110597.0, "Ecuador": -640736359.0, "Czech Republic": -9243781083.0, "Australia": -51676175225.0, "Algeria": -2027471933.0, "El Salvador": -385350000.0, "Chile": -9232973456.0, "Belgium": 16309420385.0, "Gambia": -35998400.0, "Philippines": -952000000.0, "Moldova": -139430000.0, "Morocco": -2273342437.0, "Croatia": -1373993102.0, "Switzerland": 39812665157.0, "Kosovo": 
-272977856.0, "Timor-Leste": -47074658.0, "Colombia": -16070799122.0, "Burundi": -3354999.0, "Cyprus": -446852288.0, "Barbados": -329023166.0, "Qatar": 1513076923.0, "Bhutan": -16402331.0, "Sudan": -3056748795.0, "Nepal": -94022275.0, "Netherlands": -3736854662.0, "Suriname": -72895403.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": -22182553.0, "Venezuela": -756000000.0, "Iceland": -3828118586.0, "Senegal": -263880605.0, "Germany": 60445461151.0, "Denmark": 4160376105.0, "Kazakhstan": -8380073233.0, "Tanzania": -1095401491.0, "Trinidad and Tobago": -549400000.0, "Guyana": -269560000.0, "Syria": -1469196863.0, "Honduras": -996693901.0, "Myanmar": -1000557266.0, "Serbia": -2532729957.0, "United Kingdom": 9022415330.0, "Greece": -2906522266.0, "Sri Lanka": -895920000.0, "Namibia": -948833835.0, "Botswana": -302592364.0}, "/location/statistical_region/life_expectancy": {"Canada": 80.929, "United States of America": 78.49, "Lithuania": 73.563, "Cambodia": 62.977, "Ethiopia": 59.243, "Bolivia": 66.577, "Burkina Faso": 55.358, "Ghana": 64.224, "Saudi Arabia": 74.058, "Cape Verde": 73.917, "Slovenia": 79.971, "Guatemala": 71.072, "Bosnia and Herzegovina": 75.553, "Guinea": 54.092, "Spain": 82.327, "Maldives": 76.883, "Oman": 73.342, "Tanzania": 58.151, "Gabon": 62.691, "New Zealand": 80.905, "Pakistan": 65.449, "Samoa": 72.543, "Macau": 81.019, "United Arab Emirates": 76.743, "Uruguay": 76.412, "Saint Vincent and the Grenadines": 72.295, "Kenya": 57.081, "South Korea": 80.866, "Tajikistan": 67.536, "Afghanistan": 48.681, "Czech Republic": 77.873, "Solomon Islands": 67.863, "Saint Lucia": 74.611, "Hungary": 74.859, "Mongolia": 68.488, "Rwanda": 55.395, "Laos": 67.432, "Norway": 81.295, "Malawi": 54.136, "Benin": 56.014, "Singapore": 81.893, "Montenegro": 74.504, "Armenia": 73.916, "Dominican Republic": 73.438, "Bahrain": 75.156, "Tonga": 72.286, "Libya": 74.95, "Indonesia": 69.319, "Central African Republic": 48.346, "Mauritius": 73.267, "Sweden": 81.802, "Australia": 81.846, 
"Mali": 51.372, "Russia": 69.005, "Bulgaria": 74.163, "Romania": 74.512, "Angola": 51.059, "Portugal": 80.722, "South Africa": 52.615, "Fiji": 69.349, "Qatar": 78.249, "Austria": 81.032, "Mozambique": 50.151, "Uganda": 54.074, "Japan": 82.591, "Brazil": 73.435, "Kuwait": 74.728, "Costa Rica": 79.315, "Republic of Ireland": 80.495, "Nigeria": 51.863, "Ecuador": 75.63, "Brunei": 78.065, "Belarus": 70.651, "Iran": 72.999, "El Salvador": 71.945, "Chile": 79.017, "Belgium": 80.485, "Iraq": 68.985, "Hong Kong": 83.422, "Georgia": 73.327, "Denmark": 79.8, "Morocco": 72.132, "Belize": 76.053, "Kosovo": 70.149, "Djibouti": 57.909, "Burundi": 50.337, "Nicaragua": 73.996, "Barbados": 76.739, "Madagascar": 66.696, "Sudan": 61.448, "Vanuatu": 71.098, "Democratic Republic of the Congo": 48.369, "Netherlands": 81.205, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 64.592, "Venezuela": 74.312, "Israel": 81.756, "Iceland": 82.359, "C\u00f4te d\u2019Ivoire": 55.421, "Mauritania": 58.547, "Kyrgyzstan": 69.602, "North Korea": 68.676, "Latvia": 73.576, "Guyana": 69.863, "Syria": 75.844, "Nepal": 68.726, "Honduras": 73.113, "Myanmar": 65.15, "Serbia": 74.585, "United Kingdom": 80.754, "Congo": 57.356, "Sri Lanka": 74.902, "Earth": 69.924, "Botswana": 53.018}, "/location/statistical_region/internet_users_percent_population": {"Canada": 86.765864, "United States of America": 81.0252, "Lithuania": 68.0, "Ethiopia": 1.48281, "Swaziland": 20.781783, "Belize": 25.0, "Argentina": 55.8, "Cameroon": 5.698987, "Bahrain": 88.0, "Saudi Arabia": 54.0, "Cape Verde": 34.743414, "Bosnia and Herzegovina": 65.356094, "Kuwait": 79.178201, "Dominica": 55.177014, "Maldives": 38.9301, "C\u00f4te d\u2019Ivoire": 2.378958, "Costa Rica": 47.500915, "New Zealand": 89.5109, "Pakistan": 9.9637, "Albania": 54.655959, "Samoa": 12.92249, "Macau": 64.2727, "India": 12.580061, "Azerbaijan": 54.2, "Saint Vincent and the Grenadines": 47.52, "Kenya": 32.095417, "South Korea": 84.1, "Czech Republic": 75.0, "Solomon Islands": 
6.9974, "Saint Lucia": 48.6281, "France": 83.0, "Rwanda": 8.023854, "Laos": 10.747676, "Malawi": 4.3506, "Benin": 3.797705, "Federated States of Micronesia": 25.974423, "Montenegro": 56.838783, "Saint Kitts and Nevis": 79.348899, "Armenia": 39.160792, "Dominican Republic": 45.0, "Ukraine": 33.7, "Libya": 19.8637, "Finland": 91.0, "Central African Republic": 3.0, "Mauritius": 41.3946, "Vietnam": 39.49, "Mali": 2.1689, "Russia": 53.2748, "Bulgaria": 55.148098, "Romania": 50.0, "Chad": 2.1, "Fiji": 33.742357, "Malaysia": 65.8, "Austria": 81.0, "Mozambique": 4.8491, "Uganda": 14.6896, "Japan": 79.05, "Niger": 1.4077, "Guinea": 1.490144, "Guyana": 34.308046, "Qatar": 88.104367, "Republic of Ireland": 79.0, "Bahamas": 71.748203, "Nigeria": 32.8763, "Ecuador": 35.134506, "Bangladesh": 6.3, "Brunei": 60.273065, "Algeria": 15.228027, "El Salvador": 25.5, "Chile": 61.418155, "Puerto Rico": 51.4114, "Belgium": 82.0, "Kiribati": 10.746798, "Haiti": 10.870296, "Iraq": 7.1, "Sierra Leone": 1.3, "Gambia": 12.449229, "Moldova": 43.37, "Morocco": 55.0, "Namibia": 12.9414, "Guinea-Bissau": 2.893991, "Thailand": 26.5, "Seychelles": 47.076, "Portugal": 64.0, "Uzbekistan": 36.5213, "Timor-Leste": 0.9147, "Spain": 72.0, "Colombia": 48.984319, "Burundi": 1.22, "Nicaragua": 13.5, "Madagascar": 2.0549, "Bhutan": 25.434349, "Vanuatu": 10.598, "Netherlands": 93.0, "Suriname": 34.6812, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 21.5724, "Venezuela": 44.0456, "Israel": 73.365016, "Iceland": 96.0, "Zimbabwe": 17.0908, "Germany": 84.0, "Denmark": 93.0, "Kazakhstan": 53.315669, "Eritrea": 0.8, "Trinidad and Tobago": 59.5162, "Latvia": 74.0, "Hungary": 72.0, "Syria": 24.3001, "Honduras": 18.11987, "Myanmar": 1.0691, "Tunisia": 41.4416, "Sri Lanka": 18.2854, "Earth": 35.571014, "Comoros": 5.975296}, "/location/statistical_region/cpi_inflation_rate": {"Canada": 1.52, "Lithuania": 3.08, "Ethiopia": 23.43, "Aruba": 0.56, "Swaziland": 9.4, "Argentina": 10.03, "Bolivia": 4.59, "Ghana": 9.16, "Japan": 
-0.03, "Republic of Ireland": 1.69, "Slovenia": 2.6, "Guatemala": 3.78, "Kuwait": 2.92, "Dominica": 1.44, "Maldives": 11.29, "C\u00f4te d\u2019Ivoire": 1.31, "Yemen": 17.29, "Pakistan": 9.69, "Samoa": 2.05, "Macau": 6.11, "United Arab Emirates": 0.88, "India": 9.31, "Azerbaijan": 1.06, "Saint Vincent and the Grenadines": 2.6, "Kenya": 9.38, "South Korea": 2.21, "Tajikistan": 5.83, "Afghanistan": 6.8, "Bangladesh": 8.74, "Solomon Islands": 7.34, "Saint Lucia": 4.18, "Mongolia": 14.98, "France": 1.96, "Vanuatu": 0.86, "Norway": 0.71, "Benin": 6.75, "Singapore": 4.53, "Saint Kitts and Nevis": 1.37, "Togo": 2.63, "Armenia": 2.56, "Ukraine": 0.56, "Bahrain": -0.36, "Libya": 6.07, "Indonesia": 4.28, "Central African Republic": 1.3, "Mauritius": 3.85, "Sweden": 0.89, "Vietnam": 9.09, "Bulgaria": 2.95, "Angola": 10.29, "Chad": 10.25, "South Africa": 5.41, "Malaysia": 1.66, "Senegal": 1.42, "Mozambique": 10.35, "Uganda": 14.02, "Hungary": 5.71, "Niger": 0.46, "Guinea": 15.21, "Costa Rica": 4.5, "Cape Verde": 2.54, "Bahamas": 3.17, "Ecuador": 5.1, "Czech Republic": 3.3, "Brunei": 0.46, "Belarus": 59.22, "Iran": 27.34, "Algeria": 8.89, "El Salvador": 1.73, "Chile": 3.01, "Belgium": 2.84, "Haiti": 6.28, "Belize": 1.31, "Gambia": 4.8, "Philippines": 3.17, "Croatia": 3.42, "Switzerland": -0.67, "Portugal": 2.77, "Estonia": 3.93, "Kosovo": 2.48, "Djibouti": 7.88, "Spain": 2.45, "Colombia": 3.18, "Barbados": 4.53, "Madagascar": 6.36, "Bhutan": 10.92, "Sudan": 12.99, "Laos": 4.26, "Netherlands": 2.45, "Suriname": 5.01, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 10.41, "Venezuela": 21.07, "Israel": 1.71, "Iceland": 5.19, "Zambia": 6.43, "Austria": 2.49, "Papua New Guinea": 8.44, "Kazakhstan": 5.11, "Kyrgyzstan": 2.69, "Trinidad and Tobago": 9.26, "Guyana": 2.39, "Syria": 36.7, "Honduras": 5.2, "Myanmar": 5.02, "Equatorial Guinea": 6.95, "Republic of Macedonia": 3.31, "Serbia": 7.33, "Paraguay": 3.68, "Earth": 3.82}, "/location/statistical_region/health_expenditure_as_percent_of_gdp": 
{"Canada": 11.18, "Turkmenistan": 2.73, "United States of America": 17.85, "Lithuania": 6.6, "Ethiopia": 4.65, "Swaziland": 8.01, "Bolivia": 4.88, "Cameroon": 5.23, "Burkina Faso": 6.51, "Ghana": 4.78, "Republic of Ireland": 9.38, "Guatemala": 6.73, "Bosnia and Herzegovina": 10.21, "Kuwait": 2.66, "Spain": 9.44, "Liberia": 19.48, "Netherlands": 11.96, "Tanzania": 7.28, "Yemen": 5.46, "Albania": 6.32, "United Arab Emirates": 3.35, "India": 3.87, "Lesotho": 12.76, "Saint Vincent and the Grenadines": 4.93, "South Korea": 7.21, "Afghanistan": 9.58, "Bangladesh": 3.72, "Eritrea": 2.56, "Hungary": 7.75, "Slovakia": 8.69, "Vanuatu": 4.12, "Malawi": 8.38, "Benin": 4.57, "Federated States of Micronesia": 13.42, "Singapore": 4.56, "Montenegro": 9.32, "Saint Kitts and Nevis": 4.43, "Togo": 8.01, "Dominican Republic": 5.36, "Bahrain": 3.79, "Tonga": 5.26, "Libya": 4.39, "Indonesia": 2.72, "Central African Republic": 3.79, "Mauritius": 5.89, "Sweden": 9.36, "Vietnam": 6.81, "Mali": 6.81, "Russia": 6.2, "Romania": 5.84, "Angola": 3.49, "Portugal": 10.36, "Malaysia": 3.58, "Austria": 10.64, "Mozambique": 6.59, "Japan": 9.27, "Niger": 5.32, "Brazil": 8.9, "Guinea": 5.96, "Earth": 10.06, "Nigeria": 5.32, "Ecuador": 7.26, "Czech Republic": 7.38, "Brunei": 2.46, "Belarus": 5.32, "Algeria": 3.93, "Chile": 7.46, "Belgium": 10.6, "Kiribati": 10.06, "Sierra Leone": 18.84, "Georgia": 9.89, "Denmark": 11.15, "Philippines": 4.07, "Morocco": 6.03, "Namibia": 5.34, "Guinea-Bissau": 6.28, "Switzerland": 10.86, "Estonia": 5.96, "Uruguay": 8.0, "Uzbekistan": 5.42, "Timor-Leste": 5.07, "Colombia": 6.12, "Burundi": 8.73, "Nicaragua": 10.05, "Barbados": 7.66, "Madagascar": 4.07, "Bhutan": 4.07, "Sudan": 8.39, "Maldives": 8.5, "Suriname": 5.29, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 7.73, "Venezuela": 5.16, "Iceland": 9.07, "Zambia": 6.12, "Senegal": 5.98, "Papua New Guinea": 4.28, "Kazakhstan": 3.92, "C\u00f4te d\u2019Ivoire": 6.78, "Mauritania": 5.39, "Guyana": 5.86, "Syria": 3.74, "Honduras": 
8.62, "Myanmar": 2.0, "Equatorial Guinea": 3.95, "Tunisia": 6.16, "United Kingdom": 9.32, "Congo": 2.45, "Greece": 10.83, "Paraguay": 9.72, "Croatia": 7.81, "Comoros": 5.26}, "/location/statistical_region/time_required_to_start_a_business": {"United States of America": 6.0, "Lithuania": 20.0, "Cambodia": 85.0, "Ethiopia": 15.0, "Belize": 44.0, "Argentina": 26.0, "Bolivia": 50.0, "Burkina Faso": 13.0, "Bahrain": 9.0, "Saudi Arabia": 21.0, "Republic of Ireland": 10.0, "Slovenia": 6.0, "Guatemala": 40.0, "Guinea": 35.0, "Dominica": 13.0, "Netherlands": 5.0, "Paraguay": 35.0, "Oman": 8.0, "C\u00f4te d\u2019Ivoire": 32.0, "Gabon": 58.0, "Yemen": 40.0, "Albania": 4.0, "Kosovo": 52.0, "India": 27.0, "Azerbaijan": 8.0, "Lesotho": 24.0, "Saint Vincent and the Grenadines": 10.0, "Kenya": 32.0, "South Korea": 7.0, "Czech Republic": 20.0, "Solomon Islands": 9.0, "Cyprus": 8.0, "France": 7.0, "Rwanda": 3.0, "Slovakia": 16.0, "Vanuatu": 35.0, "Norway": 7.0, "Malawi": 39.0, "Federated States of Micronesia": 16.0, "Montenegro": 10.0, "Saint Kitts and Nevis": 19.0, "Togo": 38.0, "Dominican Republic": 19.0, "Ukraine": 22.0, "Ghana": 12.0, "Indonesia": 47.0, "Finland": 14.0, "Mauritius": 6.0, "Sweden": 16.0, "Mali": 8.0, "Russia": 18.0, "Romania": 10.0, "Angola": 68.0, "Portugal": 5.0, "Nicaragua": 39.0, "Senegal": 5.0, "Hungary": 5.0, "Brazil": 119.0, "Kuwait": 32.0, "Qatar": 9.0, "Cape Verde": 11.0, "Bahamas": 31.0, "Nigeria": 34.0, "Bangladesh": 19.0, "Belarus": 5.0, "El Salvador": 17.0, "Chile": 8.0, "Puerto Rico": 6.0, "Belgium": 4.0, "Haiti": 105.0, "Iraq": 74.0, "Hong Kong": 3.0, "Sierra Leone": 12.0, "Denmark": 6.0, "Moldova": 9.0, "Morocco": 12.0, "Namibia": 66.0, "Guinea-Bissau": 9.0, "Switzerland": 18.0, "Seychelles": 39.0, "Estonia": 7.0, "Uruguay": 7.0, "Laos": 92.0, "Djibouti": 37.0, "Antigua and Barbuda": 21.0, "Colombia": 13.0, "Fiji": 58.0, "Madagascar": 8.0, "Palau": 28.0, "Sudan": 36.0, "Nepal": 29.0, "Democratic Republic of the Congo": 58.0, "Austria": 25.0, 
"Papua New Guinea": 51.0, "Zimbabwe": 90.0, "Germany": 15.0, "Kazakhstan": 19.0, "Tanzania": 26.0, "Eritrea": 84.0, "Kyrgyzstan": 10.0, "Trinidad and Tobago": 41.0, "Latvia": 16.0, "Guyana": 20.0, "Honduras": 14.0, "Equatorial Guinea": 135.0, "Tunisia": 11.0, "Republic of Macedonia": 2.0, "Serbia": 12.0, "Greece": 11.0, "Sri Lanka": 7.0, "Croatia": 9.0, "Comoros": 20.0}, "/location/statistical_region/net_migration": {"Canada": 1098444.0, "Turkmenistan": -54499.0, "Montenegro": -2508.0, "Ethiopia": -300000.0, "Aruba": 4000.0, "Argentina": -199997.0, "Bolivia": -165177.0, "Cameroon": -19000.0, "Bahrain": 447856.0, "Cape Verde": -17279.0, "Slovenia": 22000.0, "Spain": 2250005.0, "Liberia": 300000.0, "Oman": 153003.0, "C\u00f4te d\u2019Ivoire": -360000.0, "Gabon": 5000.0, "New Zealand": 65004.0, "Yemen": -135000.0, "Albania": -47889.0, "Samoa": -15738.0, "Saint Vincent and the Grenadines": -5000.0, "Kenya": -189330.0, "Czech Republic": 240466.0, "Mauritania": 9900.0, "Solomon Islands": 0.0, "Mongolia": -15001.0, "France": 500001.0, "Rwanda": 15109.0, "Slovakia": 36684.0, "Vanuatu": 0.0, "Norway": 171232.0, "Malawi": -20000.0, "Benin": 50000.0, "Federated States of Micronesia": -9000.0, "Singapore": 721738.0, "United States of America": 4954924.0, "Togo": -5430.0, "Armenia": -75000.0, "Dominican Republic": -140000.0, "Tonga": -8196.0, "Libya": -20300.0, "Indonesia": -1293089.0, "Central African Republic": 5000.0, "Mauritius": 0.0, "Sweden": 265649.0, "Australia": 1124639.0, "Mali": -100823.0, "Russia": 1135737.0, "Bulgaria": -50000.0, "Romania": -100000.0, "Angola": 82005.0, "Nicaragua": -200000.0, "Qatar": 857090.0, "Malaysia": 84494.0, "Senegal": -132842.0, "Vietnam": -430692.0, "Mozambique": -20000.0, "Hungary": 75000.0, "Brazil": -499999.0, "Guinea": -300000.0, "Costa Rica": 75600.0, "Bahamas": 6440.0, "Nigeria": -300000.0, "Bangladesh": -2908015.0, "Belarus": -50010.0, "Iran": -185650.0, "El Salvador": -291710.0, "Chile": 30000.0, "Puerto Rico": -145092.0, 
"Belgium": 200000.0, "Haiti": -239997.0, "Belize": -972.0, "Hong Kong": 176125.0, "Sierra Leone": 60000.0, "Georgia": -150000.0, "Philippines": -1233365.0, "Moldova": -171748.0, "Namibia": -1494.0, "Iraq": -150021.0, "Estonia": 0.0, "Uzbekistan": -518486.0, "Colombia": -120000.0, "Cyprus": 44166.0, "Barbados": 0.0, "Madagascar": -5000.0, "Bhutan": 16829.0, "Sudan": 135000.0, "Nepal": -100000.0, "Democratic Republic of the Congo": -23975.0, "Suriname": -4998.0, "Venezuela": 40000.0, "Israel": 273635.0, "Austria": 160000.0, "Zimbabwe": -900000.0, "Germany": 550001.0, "Kazakhstan": 6990.0, "Tanzania": -300000.0, "Eritrea": 55000.0, "Kyrgyzstan": -131593.0, "North Korea": 0.0, "Trinidad and Tobago": -19806.0, "Latvia": -10000.0, "Myanmar": -500000.0, "Equatorial Guinea": 20000.0, "Republic of Macedonia": 2000.0, "United Kingdom": 1020211.0, "Congo": 49872.0, "Greece": 154004.0, "Sri Lanka": -249998.0, "Croatia": 10000.0, "Botswana": 18730.0}, "/location/statistical_region/gdp_growth_rate": {"Canada": 1.709006, "Turkmenistan": 11.10011, "Lithuania": 3.7, "Swaziland": -1.5, "Bolivia": 5.17643, "Cameroon": 4.7, "Burkina Faso": 10.034099, "Ghana": 7.909568, "Republic of Ireland": 0.93789, "Slovenia": -2.3, "Guatemala": 2.965657, "Dominica": -1.45165, "Liberia": 10.819873, "Maldives": 3.419255, "Oman": 5.5, "Tanzania": 6.858715, "Seychelles": 2.9, "Gabon": 6.1, "New Zealand": 2.972364, "Pakistan": 4.185263, "Samoa": 1.2, "Kosovo": 3.8, "Lesotho": 3.963882, "Kenya": 4.3, "Tajikistan": 8.0, "Afghanistan": 6.964322, "Czech Republic": -1.323942, "Solomon Islands": 3.9, "Saint Lucia": -3.040925, "France": 0.013879, "Rwanda": 7.981099, "Slovakia": 2.0, "Vanuatu": 2.25, "Norway": 3.091298, "Benin": 5.399974, "Federated States of Micronesia": 1.4, "Singapore": 1.318966, "United States of America": 2.21, "Saint Kitts and Nevis": -1.071206, "Armenia": 7.14438, "Timor-Leste": 8.579973, "Ukraine": 0.2, "Tonga": 0.8, "Indonesia": 6.226484, "Finland": -0.208975, "Central African 
Republic": 4.1, "Sweden": 0.741078, "Vietnam": 5.027921, "Mali": -1.187974, "Russia": 3.442173, "Bulgaria": 0.8, "Portugal": -3.248857, "South Africa": 2.548464, "Cyprus": -2.4, "Malaysia": 5.612743, "Austria": 0.849901, "Uganda": 3.425419, "Japan": 1.945, "Niger": 11.2, "Brazil": 0.872708, "Kuwait": 8.19, "Bahamas": 1.832172, "Nigeria": 6.55, "Bangladesh": 6.317928, "Brunei": 2.154756, "Belarus": 1.5, "Algeria": 2.5, "Chile": 5.55537, "Puerto Rico": 0.516052, "Kiribati": 2.5, "Belize": 2.0, "Hong Kong": 1.501485, "Sierra Leone": 15.223847, "Gambia": 6.011361, "Morocco": 2.712963, "Croatia": -2.0, "Switzerland": 0.96709, "Iraq": 8.42846, "Chad": 5.020602, "Estonia": 3.224363, "Uruguay": 3.935344, "Uzbekistan": 8.2, "Djibouti": 4.8, "Antigua and Barbuda": 2.331803, "Colombia": 4.004284, "Burundi": 4.003347, "Fiji": 2.152812, "Madagascar": 3.1, "Palau": 5.251819, "Sudan": -10.1, "Nepal": 4.6336, "Democratic Republic of the Congo": 7.150601, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 4.0, "Israel": 4.706663, "Zambia": 7.316899, "Senegal": 3.690368, "Denmark": -0.46941, "C\u00f4te d\u2019Ivoire": 9.496871, "Mauritania": 7.568362, "Kyrgyzstan": -0.899685, "Sri Lanka": 6.409926, "Latvia": 5.6, "Hungary": -1.7, "Syria": 3.2, "Honduras": 3.500041, "Equatorial Guinea": 2.5, "Tunisia": 3.6, "Republic of Macedonia": -0.26668, "Greece": -6.379831, "Paraguay": -1.213102, "Namibia": 4.984768, "Comoros": 2.960789}, "/location/statistical_region/fertility_rate": {"Turkmenistan": 2.363, "Lithuania": 1.55, "Cambodia": 2.51, "Ethiopia": 4.045, "Aruba": 1.687, "Swaziland": 3.286, "Argentina": 2.196, "Bolivia": 3.294, "Cameroon": 4.409, "Burkina Faso": 5.812, "Bahrain": 2.501, "Saudi Arabia": 2.811, "Republic of Ireland": 2.07, "Slovenia": 1.57, "Guatemala": 3.922, "Bosnia and Herzegovina": 1.14, "Guinea": 5.162, "Spain": 1.39, "Liberia": 5.161, "Maldives": 1.709, "Tanzania": 5.529, "Gabon": 3.221, "New Zealand": 2.1, "Yemen": 5.092, "Samoa": 3.815, "Macau": 1.123, "United Arab Emirates": 
1.721, "India": 2.589, "Saint Vincent and the Grenadines": 2.037, "Kenya": 4.68, "South Korea": 1.22, "Afghanistan": 6.288, "Solomon Islands": 4.157, "Saint Lucia": 1.98, "Cyprus": 1.468, "Mongolia": 2.494, "Vanuatu": 3.82, "Norway": 1.95, "Malawi": 5.985, "Benin": 5.206, "Montenegro": 1.644, "Togo": 3.985, "Ukraine": 1.445, "Ghana": 4.1, "Tonga": 3.861, "Libya": 2.502, "Finland": 1.87, "Mauritius": 1.47, "Sweden": 1.98, "Australia": 1.92, "Mali": 6.227, "Romania": 1.38, "Angola": 5.313, "Chad": 5.886, "Fiji": 2.639, "Malaysia": 2.607, "Senegal": 4.735, "Vietnam": 1.822, "Mozambique": 4.832, "Hungary": 1.25, "Niger": 7.012, "Brazil": 1.811, "Kuwait": 2.295, "Costa Rica": 1.827, "Cape Verde": 2.344, "Bahamas": 1.891, "Ecuador": 2.443, "Brunei": 2.017, "Belarus": 1.44, "Iran": 1.67, "Algeria": 2.217, "Chile": 1.849, "Puerto Rico": 1.797, "Thailand": 1.559, "Haiti": 3.264, "Hong Kong": 1.108, "Sierra Leone": 4.884, "Georgia": 1.555, "Denmark": 1.87, "Philippines": 3.099, "Morocco": 2.242, "Guinea-Bissau": 4.988, "Switzerland": 1.5, "Seychelles": 2.5, "Estonia": 1.63, "Kosovo": 2.2, "Uzbekistan": 2.499, "Djibouti": 3.679, "Colombia": 2.1, "Nicaragua": 2.573, "Barbados": 1.561, "Madagascar": 4.584, "Bhutan": 2.332, "Sudan": 4.325, "Laos": 2.658, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 3.585, "Israel": 3.03, "Iceland": 2.2, "Zambia": 6.279, "Austria": 1.44, "Papua New Guinea": 3.891, "Zimbabwe": 3.219, "Eritrea": 4.366, "North Korea": 2.01, "Trinidad and Tobago": 1.637, "Honduras": 3.078, "Myanmar": 1.976, "Equatorial Guinea": 5.106, "Tunisia": 2.04, "Serbia": 1.4, "United Kingdom": 1.94, "Sri Lanka": 2.313, "Comoros": 4.919}, "/location/statistical_region/consumer_price_index": {"Canada": 113.74, "Lithuania": 138.22, "Cambodia": 160.23, "Ethiopia": 366.69, "Swaziland": 167.13, "Argentina": 185.85, "Cameroon": 123.58, "Burkina Faso": 122.73, "Ghana": 224.24, "Saudi Arabia": 142.05, "Republic of Ireland": 111.95, "Bosnia and Herzegovina": 124.63, "Spain": 118.84, 
"Liberia": 187.67, "Maldives": 173.59, "Tanzania": 197.07, "Gabon": 117.15, "Albania": 121.76, "Samoa": 140.51, "India": 180.77, "Azerbaijan": 178.58, "Lesotho": 157.25, "Saint Vincent and the Grenadines": 130.83, "Cyprus": 119.36, "Tajikistan": 202.13, "Afghanistan": 164.2, "Bangladesh": 174.08, "Solomon Islands": 163.2, "Saint Lucia": 122.88, "Mongolia": 211.21, "France": 112.25, "Slovakia": 124.04, "Laos": 142.87, "Malawi": 203.31, "Singapore": 125.0, "Montenegro": 125.52, "Saint Kitts and Nevis": 132.92, "Armenia": 144.55, "Dominican Republic": 153.26, "Ukraine": 212.1, "Bahrain": 113.87, "Tonga": 140.59, "Libya": 153.55, "Central African Republic": 125.24, "Mauritius": 151.83, "Vietnam": 216.05, "Mali": 125.97, "Russia": 185.44, "Bulgaria": 147.54, "Romania": 147.57, "Angola": 233.06, "Portugal": 116.08, "Nicaragua": 184.11, "Malaysia": 119.63, "Austria": 115.86, "Mozambique": 173.52, "Hungary": 142.83, "Brazil": 141.27, "Kuwait": 140.16, "Qatar": 141.06, "Nigeria": 200.79, "Brunei": 107.29, "Australia": 121.81, "Algeria": 139.1, "Belgium": 117.77, "Haiti": 172.6, "Iraq": 170.7, "Sierra Leone": 214.26, "Denmark": 116.89, "Namibia": 157.36, "Guinea-Bissau": 127.44, "Switzerland": 104.03, "Seychelles": 203.06, "Estonia": 137.92, "Kosovo": 127.55, "Timor-Leste": 170.53, "Dominica": 120.76, "Colombia": 133.93, "Burundi": 211.4, "Fiji": 143.57, "Barbados": 151.42, "Madagascar": 184.97, "Bhutan": 161.31, "Sudan": 166.31, "Netherlands": 113.22, "Suriname": 179.32, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 321.23, "Venezuela": 248.68, "Israel": 119.86, "Senegal": 120.1, "Papua New Guinea": 140.63, "Germany": 112.62, "Kazakhstan": 184.47, "Mauritania": 147.32, "Kyrgyzstan": 199.96, "Trinidad and Tobago": 177.82, "Latvia": 147.95, "Guyana": 146.12, "Belarus": 395.63, "Honduras": 156.02, "Myanmar": 235.84, "Tunisia": 133.97, "Serbia": 182.91, "Comoros": 119.49, "United Kingdom": 122.99, "Greece": 122.93, "Sri Lanka": 195.71, "Croatia": 123.23, "Botswana": 181.42}, 
"/location/statistical_region/prevalence_of_undernourisment": {"Canada": 5.0, "Turkmenistan": 5.0, "Montenegro": 5.0, "Lithuania": 5.0, "Cambodia": 17.1, "Ethiopia": 40.2, "Swaziland": 27.0, "Argentina": 5.0, "Bolivia": 24.1, "Saudi Arabia": 5.0, "Japan": 5.0, "Cape Verde": 8.9, "Guatemala": 30.4, "Spain": 5.0, "Tanzania": 38.8, "Gabon": 6.5, "New Zealand": 5.0, "Yemen": 32.4, "Albania": 5.0, "Samoa": 5.0, "United Arab Emirates": 5.0, "India": 17.5, "Azerbaijan": 5.0, "Saint Vincent and the Grenadines": 5.0, "South Korea": 5.0, "Tajikistan": 31.7, "Czech Republic": 5.0, "Mauritania": 9.3, "Solomon Islands": 12.7, "Saint Lucia": 14.6, "Mongolia": 24.2, "France": 5.0, "Rwanda": 28.9, "Malawi": 23.1, "Benin": 8.1, "United States of America": 5.0, "Saint Kitts and Nevis": 14.0, "Togo": 16.5, "Armenia": 5.0, "Dominican Republic": 15.4, "Ukraine": 5.0, "Libya": 5.0, "Mauritius": 5.7, "Vietnam": 9.0, "Bulgaria": 5.0, "Romania": 5.0, "Angola": 27.4, "Portugal": 5.0, "South Africa": 5.0, "Malaysia": 5.0, "Senegal": 20.5, "Mozambique": 39.2, "Uganda": 34.6, "Hungary": 5.0, "Brazil": 6.9, "Guinea": 17.3, "Costa Rica": 6.5, "Bahamas": 7.2, "Nigeria": 8.5, "Ecuador": 18.3, "Bangladesh": 16.8, "Belarus": 5.0, "Iran": 5.0, "Algeria": 5.0, "El Salvador": 12.3, "Belgium": 5.0, "Thailand": 7.3, "Haiti": 44.5, "Belize": 6.8, "Sierra Leone": 28.8, "Georgia": 24.7, "Denmark": 5.0, "Moldova": 5.0, "Morocco": 5.5, "Namibia": 33.9, "Guinea-Bissau": 8.7, "Seychelles": 8.6, "Chad": 33.4, "Uruguay": 5.0, "Antigua and Barbuda": 20.5, "Dominica": 5.0, "Colombia": 12.6, "Fiji": 5.0, "Madagascar": 33.4, "Sudan": 39.4, "Nepal": 18.0, "Suriname": 11.4, "Venezuela": 5.0, "Israel": 5.0, "Zambia": 47.4, "Zimbabwe": 32.8, "Kazakhstan": 5.0, "Eritrea": 65.4, "Kyrgyzstan": 6.4, "Paraguay": 25.5, "Guyana": 5.1, "Honduras": 9.6, "Tunisia": 5.0, "Republic of Macedonia": 5.0, "Serbia": 5.0, "Greece": 5.0, "Sri Lanka": 24.0, "Botswana": 27.9}, "/location/statistical_region/trade_balance_as_percent_of_gdp": 
{"Canada": -1.2, "Turkmenistan": 7.99, "Cambodia": -5.42, "Swaziland": -11.87, "Argentina": 2.58, "Cameroon": -4.27, "Burkina Faso": -7.43, "Saudi Arabia": 30.98, "Republic of Ireland": 24.14, "Slovenia": 1.03, "Guatemala": -11.25, "Kuwait": 44.86, "Dominica": -19.79, "Liberia": -65.84, "Netherlands": 8.84, "New Zealand": 1.48, "Yemen": -4.11, "Pakistan": -8.16, "Macau": 58.57, "United Arab Emirates": 9.18, "Kosovo": -40.85, "Saint Vincent and the Grenadines": -28.96, "Tajikistan": -39.25, "Bangladesh": -10.33, "Solomon Islands": -16.98, "Mongolia": -25.93, "France": -2.22, "Laos": -6.26, "Vanuatu": -5.67, "Malawi": -9.91, "Benin": -11.98, "United States of America": -3.79, "Togo": -17.22, "Dominican Republic": -8.26, "Ukraine": -7.64, "Finland": -0.61, "Indonesia": 2.14, "Central African Republic": -11.56, "Mauritius": -11.87, "Australia": 0.26, "Russia": 7.21, "Angola": 19.27, "Portugal": -0.6, "South Africa": -3.05, "Malaysia": 11.93, "Senegal": -19.52, "Uganda": -9.46, "Japan": -0.91, "Niger": -30.01, "Brazil": -0.9, "Guinea": -37.04, "Costa Rica": -4.81, "Cape Verde": -25.88, "Bahamas": -18.1, "Nigeria": 3.99, "Ecuador": -2.37, "Czech Republic": 5.34, "Brunei": 50.19, "Belarus": 6.11, "Chile": 0.8, "Puerto Rico": 19.92, "Belgium": 1.19, "Thailand": 2.74, "Haiti": -41.44, "Belize": 0.31, "Hong Kong": 0.04, "Sierra Leone": -37.34, "Denmark": 4.4, "Philippines": -5.68, "Moldova": -40.4, "Morocco": -12.63, "Namibia": -6.1, "Seychelles": -53.48, "Chad": 15.22, "Estonia": 0.49, "Uruguay": -3.4, "Antigua and Barbuda": -9.53, "Cyprus": -6.44, "Barbados": -5.06, "Madagascar": -11.05, "Palau": 7.33, "Bhutan": -21.1, "Nepal": -22.81, "Maldives": -0.91, "Suriname": -3.46, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": -45.9, "Venezuela": 6.33, "Israel": -0.86, "Iceland": 6.31, "Zambia": 8.57, "Austria": 3.53, "Papua New Guinea": 3.18, "Zimbabwe": -23.73, "Germany": 5.65, "Gambia": -18.3, "Eritrea": -3.16, "Kyrgyzstan": -30.87, "Syria": -0.04, "Tunisia": -10.65, "Republic of 
Macedonia": -20.45, "United Kingdom": -2.35, "Congo": 33.74, "Greece": -5.03, "Sri Lanka": -14.55, "Earth": 0.11}, "/location/statistical_region/gni_in_ppp_dollars": {"Montenegro": 8654162319.0, "Lithuania": 67938246602.0, "Ethiopia": 104221119312.0, "Argentina": 703278101130.0, "Burkina Faso": 24933633406.0, "Bahrain": 26801669108.0, "Slovenia": 56072441827.0, "Guatemala": 74834109960.0, "Bosnia and Herzegovina": 35979806912.0, "Spain": 1493845075000.0, "Liberia": 2507173874.0, "Maldives": 2601820817.0, "Oman": 71695822352.0, "C\u00f4te d\u2019Ivoire": 38821061079.0, "Seychelles": 2261451674.0, "Gabon": 23328015948.0, "New Zealand": 131980720498.0, "Yemen": 54066290308.0, "Pakistan": 543576254992.0, "Samoa": 807211757.0, "Macau": 37533041979.0, "United Arab Emirates": 380512914082.0, "Madagascar": 21178794693.0, "Lesotho": 4527708912.0, "Kenya": 76054614484.0, "Tajikistan": 17819107731.0, "Czech Republic": 259789481557.0, "Solomon Islands": 1191968090.0, "Saint Lucia": 1992531771.0, "Mongolia": 14265391693.0, "Rwanda": 13568517097.0, "Laos": 18141360226.0, "Norway": 336059622022.0, "Malawi": 13924085669.0, "Federated States of Micronesia": 423304417.0, "Singapore": 324599341500.0, "United States of America": 15887600000000.0, "Saint Kitts and Nevis": 925927382.0, "Togo": 6098235684.0, "Timor-Leste": 7760635504.0, "Dominican Republic": 100960380994.0, "Ukraine": 332544983323.0, "Finland": 209150898182.0, "Central African Republic": 3873992201.0, "Mauritius": 20425210760.0, "Sweden": 420148431023.0, "Mali": 17163548455.0, "Russia": 3260623066100.0, "Bulgaria": 112444585303.0, "Romania": 347816274800.0, "Angola": 114272478894.0, "Chad": 16447992439.0, "Austria": 373185776984.0, "Brazil": 2328799385170.0, "Guinea": 11279280880.0, "Costa Rica": 60486240692.0, "Bahamas": 10894769805.0, "Nigeria": 409083176913.0, "Belarus": 143909024541.0, "Algeria": 301065051176.0, "El Salvador": 42772730106.0, "Chile": 372116945471.0, "Belgium": 447572881592.0, "Kiribati": 340465519.0, 
"Haiti": 12602329409.0, "Iraq": 140186508091.0, "Hong Kong": 379564207517.0, "Georgia": 26448099223.0, "Denmark": 242271306781.0, "Philippines": 425233320593.0, "Moldova": 13138151958.0, "Morocco": 166575345252.0, "Croatia": 84326098327.0, "Guinea-Bissau": 1980619530.0, "Thailand": 629981459136.0, "Namibia": 16879684702.0, "Belize": 2164586795.0, "Portugal": 260731604978.0, "Estonia": 29511428089.0, "Antigua and Barbuda": 1715280881.0, "Dominica": 873945056.0, "Burundi": 5481978148.0, "Fiji": 4265435843.0, "Qatar": 162744911650.0, "Palau": 355858623.0, "Bhutan": 4678326829.0, "Vanuatu": 1111709036.0, "Democratic Republic of the Congo": 24528773905.0, "Suriname": 4541111746.0, "Venezuela": 393044537438.0, "Israel": 210599676916.0, "Papua New Guinea": 19934921202.0, "Germany": 3430106816840.0, "Mauritania": 9569602448.0, "Kyrgyzstan": 12639933104.0, "Sri Lanka": 124507519600.0, "Latvia": 42566661549.0, "Guyana": 2702511288.0, "Nepal": 41088560414.0, "Honduras": 30907870013.0, "Equatorial Guinea": 13900860150.0, "Tunisia": 100881806830.0, "Republic of Macedonia": 24354018399.0, "Serbia": 80790555790.0, "Botswana": 33114218810.0, "United Kingdom": 2331851340120.0, "Paraguay": 37538361759.0, "Earth": 85463197120800.0, "Comoros": 881534824.0}, "/location/statistical_region/merchandise_trade_percent_of_gdp": {"Canada": 51.04, "Turkmenistan": 76.31, "Cambodia": 136.54, "Ethiopia": 34.78, "Swaziland": 102.76, "Argentina": 31.53, "Bolivia": 70.28, "Bahrain": 118.68, "Saudi Arabia": 86.06, "Republic of Ireland": 85.26, "Guatemala": 53.37, "Dominica": 49.0, "Liberia": 86.3, "Netherlands": 161.42, "C\u00f4te d\u2019Ivoire": 89.75, "Costa Rica": 64.17, "Seychelles": 125.62, "Gabon": 85.2, "New Zealand": 44.37, "Yemen": 57.51, "Albania": 52.21, "India": 42.49, "Lesotho": 151.17, "Saint Vincent and the Grenadines": 55.29, "South Korea": 94.5, "Tajikistan": 73.52, "Afghanistan": 37.52, "Bangladesh": 51.25, "Eritrea": 45.93, "Solomon Islands": 96.19, "Saint Lucia": 75.03, "Rwanda": 
34.77, "Slovakia": 173.82, "Vanuatu": 44.57, "Malawi": 85.6, "Benin": 47.64, "Singapore": 286.9, "Montenegro": 66.38, "Togo": 73.42, "Armenia": 57.46, "Dominican Republic": 45.12, "Ghana": 73.69, "Tonga": 47.92, "Indonesia": 43.09, "Finland": 59.52, "Central African Republic": 24.78, "Mauritius": 74.82, "Vietnam": 161.2, "Russia": 42.92, "Chad": 58.99, "South Africa": 54.65, "Austria": 86.17, "Uganda": 41.87, "Japan": 28.26, "Brazil": 21.12, "Guyana": 112.26, "Madagascar": 45.61, "Nigeria": 62.83, "Ecuador": 58.12, "Czech Republic": 151.88, "Iran": 37.4, "Algeria": 58.08, "El Salvador": 65.62, "Chile": 58.92, "Belgium": 182.17, "Kiribati": 62.6, "Haiti": 46.15, "Belize": 83.34, "Hong Kong": 397.93, "Sierra Leone": 63.22, "Georgia": 64.56, "Gambia": 52.33, "Philippines": 46.89, "Guinea-Bissau": 42.34, "Thailand": 130.51, "Switzerland": 66.98, "Iraq": 72.0, "Laos": 54.85, "Djibouti": 48.69, "Antigua and Barbuda": 49.73, "Colombia": 32.26, "Burundi": 36.81, "Fiji": 87.33, "Qatar": 83.35, "Palau": 64.36, "Bhutan": 90.47, "Nepal": 38.42, "Democratic Republic of the Congo": 69.39, "Suriname": 88.64, "Israel": 59.12, "Iceland": 71.98, "Senegal": 63.21, "Zimbabwe": 75.83, "Germany": 75.73, "Denmark": 63.36, "Kazakhstan": 67.84, "Tanzania": 58.81, "Mauritania": 126.22, "Kyrgyzstan": 112.27, "Paraguay": 73.54, "Latvia": 109.19, "Hungary": 158.61, "Syria": 16.02, "Honduras": 106.3, "Tunisia": 90.79, "Republic of Macedonia": 108.79, "Serbia": 81.0, "United Kingdom": 47.17, "Congo": 118.44, "Greece": 37.93, "Sri Lanka": 48.07, "Earth": 50.46, "Comoros": 54.54}, "/location/statistical_region/labor_participation_rate": {"Canada": 47.15, "Turkmenistan": 39.22, "Lithuania": 50.59, "Cambodia": 50.1, "Swaziland": 39.53, "Argentina": 40.29, "Cameroon": 45.69, "Ghana": 49.66, "Saudi Arabia": 14.54, "Cape Verde": 38.46, "Slovenia": 45.63, "Guatemala": 38.16, "Bosnia and Herzegovina": 39.24, "Kuwait": 23.91, "Maldives": 42.01, "New Zealand": 46.8, "Yemen": 25.97, "Pakistan": 20.81, 
"Albania": 41.4, "Macau": 48.85, "United Arab Emirates": 14.43, "India": 25.37, "South Korea": 41.4, "Tajikistan": 43.76, "Afghanistan": 15.84, "Czech Republic": 43.33, "Saint Lucia": 46.7, "Mongolia": 46.08, "Slovakia": 44.77, "Vanuatu": 43.47, "Norway": 47.05, "Benin": 47.06, "Singapore": 43.35, "United States of America": 46.37, "Togo": 51.06, "Dominican Republic": 39.6, "Ukraine": 49.2, "Bahrain": 19.39, "Tonga": 42.69, "Libya": 27.96, "Central African Republic": 47.18, "Mauritius": 37.69, "Vietnam": 48.77, "Mali": 34.75, "Bulgaria": 46.42, "Romania": 44.58, "Angola": 45.96, "Chad": 44.86, "South Africa": 43.95, "Fiji": 32.41, "Malaysia": 37.38, "Austria": 46.04, "Mozambique": 53.19, "Uganda": 48.96, "Hungary": 45.91, "Niger": 31.13, "Guinea": 45.66, "Qatar": 11.97, "Bahamas": 48.35, "Earth": 39.97, "Nigeria": 42.55, "Bangladesh": 39.98, "Australia": 45.4, "Algeria": 17.07, "Puerto Rico": 42.34, "Belgium": 45.38, "Thailand": 45.79, "Haiti": 47.32, "Iraq": 17.47, "Hong Kong": 46.6, "Sierra Leone": 49.56, "Georgia": 46.95, "Denmark": 47.11, "Moldova": 49.31, "Namibia": 47.92, "Guinea-Bissau": 47.06, "Switzerland": 45.83, "Belize": 37.46, "Portugal": 47.48, "Uruguay": 44.52, "Djibouti": 34.93, "Timor-Leste": 33.55, "Colombia": 42.67, "Burundi": 51.4, "Cyprus": 43.5, "Barbados": 46.44, "Madagascar": 48.99, "Bhutan": 41.5, "Nepal": 50.68, "Democratic Republic of the Congo": 49.94, "Venezuela": 39.57, "Israel": 46.99, "Iceland": 47.34, "Senegal": 44.84, "Papua New Guinea": 48.27, "Germany": 45.64, "Gambia": 47.88, "Kazakhstan": 49.15, "Eritrea": 48.09, "North Korea": 47.84, "Latvia": 50.3, "Guyana": 35.12, "Honduras": 34.33, "Myanmar": 49.77, "Equatorial Guinea": 44.73, "Greece": 41.73, "Paraguay": 39.98, "Croatia": 45.83, "Botswana": 46.71}, "/location/statistical_region/population_growth_rate": {"Lithuania": -1.48, "Ethiopia": 2.58, "Aruba": 0.44, "Swaziland": 1.54, "Bolivia": 1.65, "Cameroon": 2.54, "Burkina Faso": 2.86, "Bahrain": 1.92, "Saudi Arabia": 1.88, 
"Cape Verde": 0.78, "Slovenia": 0.26, "Guatemala": 2.53, "Kuwait": 3.95, "Dominica": 0.4, "Liberia": 2.68, "Netherlands": 0.45, "Oman": 9.13, "Tanzania": 3.04, "Gabon": 2.39, "Pakistan": 1.69, "Albania": 0.26, "Samoa": 0.78, "India": 1.26, "Azerbaijan": 1.35, "Saint Vincent and the Grenadines": 0.01, "Kenya": 2.7, "South Korea": 0.45, "Tajikistan": 2.45, "Bangladesh": 1.19, "Solomon Islands": 2.13, "Mongolia": 1.52, "France": 0.5, "Rwanda": 2.77, "Slovakia": 0.22, "Laos": 1.89, "Benin": 2.73, "Federated States of Micronesia": -0.03, "Singapore": 2.45, "Montenegro": 0.07, "Ukraine": -0.25, "Tonga": 0.37, "Indonesia": 1.25, "Libya": 0.84, "Finland": 0.48, "Central African Republic": 1.99, "Sweden": 0.71, "Vietnam": 1.06, "Mali": 2.99, "Bulgaria": -0.6, "Romania": -0.27, "Portugal": -0.29, "South Africa": 1.18, "Fiji": 0.78, "Malaysia": 1.66, "Austria": 0.46, "Mozambique": 2.5, "Uganda": 3.35, "Japan": -0.2, "Brazil": 0.87, "Guinea": 2.56, "Costa Rica": 1.42, "Republic of Ireland": 0.26, "Bahamas": 1.52, "Nigeria": 2.79, "Ecuador": 1.6, "Brunei": 1.4, "Belarus": -0.1, "Iran": 1.32, "Algeria": 1.89, "El Salvador": 0.66, "Chile": 0.9, "Belgium": 0.85, "Thailand": 0.31, "Haiti": 1.39, "Belize": 2.43, "Hong Kong": 1.17, "Sierra Leone": 1.91, "Denmark": 0.36, "Philippines": 1.72, "Croatia": -0.32, "Guinea-Bissau": 2.39, "Switzerland": 1.07, "Seychelles": 0.39, "Chad": 3.0, "Estonia": -0.04, "Uzbekistan": 1.47, "Djibouti": 1.52, "Spain": 0.09, "Colombia": 1.32, "Burundi": 3.19, "Cyprus": 1.11, "Barbados": 0.5, "Madagascar": 2.8, "Bhutan": 1.68, "Maldives": 1.93, "Suriname": 0.9, "Venezuela": 1.53, "Iceland": 0.35, "Senegal": 2.92, "Papua New Guinea": 2.17, "Zimbabwe": 2.7, "Germany": 0.11, "Gambia": 3.19, "Kazakhstan": 1.43, "C\u00f4te d\u2019Ivoire": 2.29, "Mauritania": 2.49, "Iraq": 2.54, "Latvia": -1.6, "Guyana": 0.57, "Honduras": 2.03, "Republic of Macedonia": 0.08, "Serbia": -0.48, "Congo": 2.61, "Greece": -0.18, "Paraguay": 1.72, "Botswana": 0.86}, 
"/location/statistical_region/diesel_price_liter": {"United States of America": 1.05, "Lithuania": 1.7, "Cambodia": 1.27, "Argentina": 1.33, "Bolivia": 0.54, "Cameroon": 1.01, "Burkina Faso": 1.28, "Ghana": 0.95, "Republic of Ireland": 1.98, "Slovenia": 1.77, "Guatemala": 1.04, "Bosnia and Herzegovina": 1.62, "Guinea": 1.34, "Spain": 1.75, "Liberia": 1.22, "Maldives": 1.09, "Oman": 0.38, "C\u00f4te d\u2019Ivoire": 1.2, "New Zealand": 1.24, "Yemen": 0.47, "Pakistan": 1.2, "Albania": 1.79, "Samoa": 1.06, "United Arab Emirates": 0.64, "Azerbaijan": 0.57, "Kenya": 1.26, "South Korea": 1.63, "Afghanistan": 1.21, "Czech Republic": 1.87, "Slovakia": 1.85, "Laos": 1.18, "Norway": 2.35, "Malawi": 1.9, "Singapore": 1.26, "Montenegro": 1.75, "Antigua and Barbuda": 0.96, "Dominican Republic": 1.35, "Ukraine": 1.25, "Libya": 0.1, "Mauritius": 1.38, "Vietnam": 1.06, "Russia": 1.0, "Bulgaria": 1.68, "Angola": 0.42, "Portugal": 1.89, "South Africa": 1.42, "Fiji": 1.29, "Senegal": 1.53, "Mozambique": 1.23, "Uganda": 1.35, "Hungary": 1.91, "Niger": 1.12, "Kuwait": 0.2, "Costa Rica": 1.36, "Nigeria": 1.09, "Ecuador": 0.29, "Brunei": 0.26, "Iran": 0.12, "Algeria": 0.17, "El Salvador": 1.17, "Chile": 1.24, "Belgium": 1.98, "Thailand": 0.97, "Iraq": 0.56, "Hong Kong": 1.57, "Denmark": 1.89, "Morocco": 0.96, "Croatia": 1.7, "Switzerland": 2.06, "Chad": 1.16, "Estonia": 1.76, "Uzbekistan": 0.87, "Djibouti": 1.18, "Timor-Leste": 1.43, "Colombia": 1.18, "Cyprus": 1.78, "Madagascar": 1.22, "Bhutan": 0.86, "Sudan": 0.51, "Netherlands": 1.95, "Suriname": 1.52, "Venezuela": 0.01, "Israel": 2.12, "Iceland": 2.06, "Austria": 1.81, "Kazakhstan": 0.67, "Tanzania": 1.27, "Eritrea": 1.71, "Kyrgyzstan": 0.79, "North Korea": 1.31, "Latvia": 1.77, "Guyana": 1.05, "Syria": 0.36, "Honduras": 1.15, "Myanmar": 0.8, "Nicaragua": 1.19, "Serbia": 1.8, "Congo": 0.92, "Paraguay": 1.31, "Earth": 1.27, "Botswana": 1.25}, "/location/statistical_region/gdp_real": {"Turkmenistan": 9915617757.0, "Lithuania": 
17527068250.0, "Argentina": 434405530244.0, "Bolivia": 12249026878.0, "Cameroon": 13905299155.0, "Ghana": 8722164062.0, "Slovenia": 26001281943.0, "Bosnia and Herzegovina": 8193311384.0, "Dominica": 326872087.0, "Maldives": 1072369790.0, "Tanzania": 19954809364.0, "Gabon": 6287360043.0, "New Zealand": 66856387828.0, "Pakistan": 116334731021.0, "Kosovo": 3358236649.0, "Azerbaijan": 21230787904.0, "Saint Vincent and the Grenadines": 441125963.0, "Kenya": 18938389509.0, "South Korea": 800205926791.0, "Tajikistan": 1920482333.0, "Bangladesh": 82795988931.0, "Eritrea": 692457272.0, "Solomon Islands": 615526980.0, "Mongolia": 2125694923.0, "France": 1484694819490.0, "Laos": 3430231223.0, "Norway": 196836975505.0, "Malawi": 2743896911.0, "Benin": 3336801340.0, "Federated States of Micronesia": 226813019.0, "United States of America": 11681216873700.0, "Togo": 1719332980.0, "Dominican Republic": 40196106908.0, "Tonga": 214573346.0, "Finland": 145567532707.0, "Indonesia": 274371100612.0, "Central African Republic": 1054122016.0, "Mauritius": 6630525389.0, "Sweden": 302113665981.0, "Vietnam": 62832215474.0, "Mali": 4148253583.0, "Russia": 414355712287.0, "Bulgaria": 19207765822.0, "Chad": 3097352885.0, "South Africa": 187234185124.0, "Nicaragua": 5250043844.0, "Malaysia": 146942750904.0, "Austria": 222637577391.0, "Mozambique": 9116571405.0, "Uganda": 12614923290.0, "Niger": 2793453329.0, "Brazil": 916131427896.0, "Bahamas": 5564794827.0, "Nigeria": 85602703669.0, "Ecuador": 24995505261.0, "Czech Republic": 77630138229.0, "Algeria": 78708051653.0, "El Salvador": 15963452873.0, "Chile": 108399900217.0, "Kiribati": 76026237.0, "Haiti": 3711642006.0, "Iraq": 23583402031.0, "Sierra Leone": 1574302614.0, "Georgia": 5603249070.0, "Denmark": 171232571662.0, "Philippines": 129017441694.0, "Moldova": 2122435432.0, "Morocco": 59797619847.0, "Croatia": 27970797366.0, "Guinea-Bissau": 244395463.0, "Switzerland": 293607939568.0, "Belize": 1212219000.0, "Portugal": 124994368988.0, 
"Estonia": 8252354890.0, "Uruguay": 31164067816.0, "Timor-Leste": 415400000.0, "Burundi": 966494858.0, "Fiji": 1865583038.0, "Madagascar": 5026822443.0, "Sudan": 22819076998.0, "Democratic Republic of the Congo": 6850715769.0, "Netherlands": 440122535471.0, "Israel": 169830330662.0, "Iceland": 10832625067.0, "Zambia": 5587389858.0, "Zimbabwe": 4081749006.0, "Gambia": 613102927.0, "Kazakhstan": 40395838537.0, "Mauritania": 1592148932.0, "Kyrgyzstan": 2030314527.0, "Trinidad and Tobago": 14054779697.0, "Latvia": 11220120363.0, "Guyana": 822264759.0, "Honduras": 10573497668.0, "Tunisia": 30347628073.0, "Republic of Macedonia": 4438521279.0, "Serbia": 9145935035.0, "Botswana": 8405868745.0, "Greece": 158667298803.0, "Sri Lanka": 27029192432.0, "Earth": 41365019350600.0, "Comoros": 247231031.0}, "/location/statistical_region/population": {"Turkmenistan": 5105301.0, "Georgia": 4486000.0, "Cambodia": 14305183.0, "Swaziland": 1067773.0, "Bolivia": 10088108.0, "Cameroon": 20030362.0, "Burkina Faso": 16967845.0, "Bahrain": 1323535.0, "Japan": 127817277.0, "Slovenia": 2050189.0, "Dominica": 67675.0, "Netherlands": 16847007.0, "C\u00f4te d\u2019Ivoire": 20152894.0, "Costa Rica": 4726575.0, "Gabon": 1534262.0, "New Zealand": 4242048.0, "Yemen": 24799880.0, "Pakistan": 179160111.0, "United Arab Emirates": 7890924.0, "Uruguay": 3368595.0, "India": 1236686732.0, "Azerbaijan": 9168000.0, "Lesotho": 2193843.0, "Kenya": 41609728.0, "Afghanistan": 31108077.0, "Czech Republic": 10546000.0, "Solomon Islands": 571890.0, "Rwanda": 10942950.0, "Laos": 6288037.0, "Norway": 5063709.0, "Benin": 9325032.0, "Federated States of Micronesia": 111542.0, "Singapore": 5312400.0, "Montenegro": 632261.0, "Saint Kitts and Nevis": 53051.0, "Armenia": 3100236.0, "Timor-Leste": 1175880.0, "Dominican Republic": 10056181.0, "Ukraine": 45706100.0, "Tonga": 104509.0, "Libya": 6422772.0, "Indonesia": 246864191.0, "Central African Republic": 4486837.0, "Mauritius": 1291456.0, "Australia": 23059862.0, "Russia": 
143400000.0, "Bulgaria": 7476000.0, "Romania": 21380000.0, "Angola": 19618432.0, "Portugal": 10556999.0, "South Africa": 50586757.0, "Austria": 8419000.0, "Mozambique": 23929708.0, "Uganda": 34509205.0, "Hungary": 9942000.0, "Niger": 16068994.0, "Brazil": 198656019.0, "Guyana": 756040.0, "Qatar": 1870041.0, "Bahamas": 347176.0, "Belarus": 9473000.0, "Iran": 76424443.0, "Algeria": 35980193.0, "El Salvador": 6227491.0, "Chile": 17402630.0, "Puerto Rico": 3667084.0, "Belgium": 11041266.0, "Kiribati": 101093.0, "Haiti": 10123787.0, "Iraq": 32961959.0, "Hong Kong": 7071600.0, "Sierra Leone": 5997486.0, "Nepal": 30485798.0, "Gambia": 1776103.0, "Morocco": 32878400.0, "Namibia": 2259000.0, "Guinea-Bissau": 1547061.0, "Switzerland": 8014000.0, "Estonia": 1340000.0, "Kosovo": 1802765.0, "Uzbekistan": 29559100.0, "Djibouti": 792198.0, "Antigua and Barbuda": 89612.0, "Spain": 46818216.0, "Colombia": 46366364.0, "Burundi": 10216190.0, "Cyprus": 1058300.0, "Madagascar": 21315135.0, "Palau": 20956.0, "Bhutan": 738267.0, "Sudan": 34318385.0, "Vanuatu": 224564.0, "Maldives": 328536.0, "Suriname": 529419.0, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 179506.0, "Venezuela": 29278000.0, "Iceland": 321857.0, "Zambia": 13474959.0, "Zimbabwe": 12754378.0, "Germany": 80327900.0, "Denmark": 5602628.0, "Kazakhstan": 16967000.0, "Tanzania": 46218486.0, "Mauritania": 3541540.0, "Kyrgyzstan": 5507000.0, "Trinidad and Tobago": 1346350.0, "Earth": 7046368812.0, "Honduras": 7754687.0, "Myanmar": 52797319.0, "Equatorial Guinea": 720213.0, "Serbia": 7261000.0, "United Kingdom": 62744081.0, "Congo": 4139748.0, "Sri Lanka": 20328000.0, "Croatia": 4267000.0, "Comoros": 753943.0}, "/location/statistical_region/gdp_nominal_per_capita": {"Lithuania": 14150.19, "Ethiopia": 470.22, "Aruba": 25354.78, "Argentina": 11557.57, "Bolivia": 2575.68, "Ghana": 1604.89, "Japan": 46720.36, "Republic of Ireland": 45835.75, "Guatemala": 3368.49, "Bosnia and Herzegovina": 4446.52, "Spain": 29195.38, "Netherlands": 
46054.41, "Oman": 23731.21, "New Zealand": 36648.0, "Yemen": 1494.43, "Pakistan": 1290.36, "Albania": 4148.85, "Samoa": 3584.33, "Kosovo": 3453.1, "India": 1489.24, "Azerbaijan": 7227.5, "Lesotho": 1193.04, "Kenya": 862.23, "Tajikistan": 872.34, "Bangladesh": 747.34, "Solomon Islands": 1834.84, "Saint Lucia": 6558.44, "Mongolia": 3672.97, "France": 35520.0, "Rwanda": 619.93, "Slovakia": 16934.33, "Norway": 99557.73, "Malawi": 268.05, "Benin": 751.92, "Federated States of Micronesia": 3164.56, "Singapore": 51709.45, "United States of America": 49965.27, "Saint Kitts and Nevis": 13968.58, "Armenia": 3337.86, "Timor-Leste": 1068.14, "Dominican Republic": 5736.44, "Finland": 46178.59, "Indonesia": 3556.79, "Central African Republic": 472.68, "Mauritius": 8124.17, "Russia": 14037.02, "Romania": 7942.83, "Angola": 5484.83, "Portugal": 20182.4, "Trinidad and Tobago": 17934.06, "Fiji": 4437.76, "Austria": 47226.2, "Mozambique": 578.8, "Hungary": 12621.74, "Niger": 382.83, "Brazil": 11339.52, "Kuwait": 56514.16, "Costa Rica": 9391.16, "Cape Verde": 3837.68, "Bahamas": 21908.28, "Nigeria": 1555.41, "Czech Republic": 18607.71, "Australia": 67035.57, "Iran": 13053.0, "El Salvador": 3777.25, "Chile": 15363.1, "Puerto Rico": 27677.53, "Belgium": 43412.53, "Sierra Leone": 634.92, "Georgia": 3508.42, "Gambia": 512.1, "Moldova": 2037.94, "Morocco": 2924.94, "Croatia": 13227.47, "Guinea-Bissau": 539.45, "Estonia": 16316.46, "Uruguay": 14449.5, "South Africa": 7507.67, "Uzbekistan": 1716.53, "Djibouti": 1463.59, "Antigua and Barbuda": 13207.16, "Dominica": 6691.02, "Colombia": 7752.17, "Burundi": 250.97, "Nicaragua": 1753.64, "Qatar": 90523.53, "Palau": 11005.87, "Bhutan": 2398.91, "Sudan": 1580.0, "Nepal": 706.65, "Democratic Republic of the Congo": 271.97, "Maldives": 6566.65, "Suriname": 8864.02, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 1402.08, "Iceland": 42658.4, "Senegal": 1031.6, "Germany": 41514.17, "Denmark": 56202.0, "Kazakhstan": 12006.59, "Eritrea": 504.3, "Kyrgyzstan": 
1159.63, "North Korea": 1800.0, "Paraguay": 3813.47, "Earth": 10170.68, "Syria": 3289.06, "Equatorial Guinea": 24035.71, "Tunisia": 4236.79, "Republic of Macedonia": 4589.34, "Serbia": 5189.58, "Botswana": 7191.44, "Congo": 3153.74, "Greece": 22082.89, "Sri Lanka": 2923.13, "Namibia": 5668.39, "Comoros": 830.52}, "/location/statistical_region/renewable_freshwater_per_capita": {"Canada": 82647.084624, "Lithuania": 5135.020344, "Ethiopia": 1364.759142, "Argentina": 6776.54191, "Bolivia": 29396.253261, "Burkina Faso": 781.478924, "Bahrain": 3.094146, "Saudi Arabia": 86.44995, "Cape Verde": 611.550975, "Slovenia": 9094.704271, "Bosnia and Herzegovina": 9246.424238, "Guinea": 20248.120105, "Spain": 2408.250371, "Liberia": 49023.24854, "Netherlands": 658.955924, "Oman": 462.844497, "C\u00f4te d\u2019Ivoire": 3962.876859, "New Zealand": 74230.454917, "Pakistan": 312.204908, "India": 1184.123586, "Azerbaijan": 884.653598, "Lesotho": 2576.96909, "Kenya": 492.530068, "South Korea": 1302.758191, "Belarus": 3926.95028, "Czech Republic": 1252.847728, "Eritrea": 471.948399, "Solomon Islands": 83085.965163, "Mongolia": 12635.206696, "Rwanda": 852.452573, "Slovakia": 2334.031814, "Laos": 29196.569894, "Norway": 77123.604507, "Malawi": 1044.15123, "Singapore": 115.747439, "Togo": 1776.801584, "Armenia": 2314.00888, "Dominican Republic": 2069.455254, "Ukraine": 1161.77053, "Ghana": 1220.754962, "Indonesia": 8281.322506, "Finland": 19857.943326, "Central African Republic": 31783.837445, "Sweden": 18096.7452, "Vietnam": 4091.530055, "Mali": 4161.829407, "Bulgaria": 2857.792956, "Romania": 1978.037517, "Chad": 1241.718051, "Fiji": 32894.698942, "Qatar": 29.305532, "Senegal": 1935.376866, "Mozambique": 4080.326371, "Hungary": 601.70119, "Kuwait": 0.0, "Costa Rica": 23724.692254, "Bahamas": 54.595434, "Ecuador": 28334.407133, "Brunei": 20909.591845, "Australia": 22039.159824, "Algeria": 297.910953, "El Salvador": 2837.166465, "Puerto Rico": 1921.987346, "Belize": 50588.086506, "Gambia": 
1729.140513, "Philippines": 5039.2707, "Namibia": 2777.755231, "Croatia": 8807.176564, "Switzerland": 5105.911002, "Portugal": 3599.507777, "Estonia": 9485.5843, "Uzbekistan": 556.895156, "Djibouti": 354.339358, "Timor-Leste": 6986.257101, "Colombia": 44860.964147, "Cyprus": 698.603599, "Barbados": 283.885254, "Madagascar": 15545.044789, "Bhutan": 106932.957149, "Nepal": 7298.472583, "Democratic Republic of the Congo": 14077.564754, "Maldives": 90.371245, "S\u00e3o Tom\u00e9 and Pr\u00edncipe": 11901.057447, "Venezuela": 24487.616788, "Israel": 96.576057, "Iceland": 532891.973393, "Papua New Guinea": 114216.829743, "Zimbabwe": 917.751362, "Germany": 1308.105672, "Denmark": 1077.088672, "Tanzania": 1812.117618, "Mauritania": 108.027438, "Kyrgyzstan": 8872.810358, "North Korea": 2720.117269, "Trinidad and Tobago": 2880.542982, "Latvia": 8133.383604, "Guyana": 304723.081319, "Syria": 324.747528, "Myanmar": 19159.224098, "Equatorial Guinea": 36313.052028, "Republic of Macedonia": 2566.674113, "Serbia": 1158.189191, "Comoros": 1713.756898, "United Kingdom": 2310.665945, "Earth": 6122.562736, "Botswana": 1208.032814}}
--------------------------------------------------------------------------------
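The JSON file above maps each Freebase property path to a dictionary of region names and numeric values. The following is a minimal sketch of reading that shape and counting properties, unique regions and finite values. It is written for Python 3 rather than the repository's Python 2, and `SAMPLE` and `summarize` are hypothetical names used only for illustration; they are not part of the repository. The four sample values are copied from the file above.

```python
import json
import math

# Tiny sample in the same shape as the data file; values copied from it.
SAMPLE = '''{
  "/location/statistical_region/consumer_price_index": {"Canada": 113.74, "Lithuania": 138.22},
  "/location/statistical_region/diesel_price_liter": {"United States of America": 1.05, "Lithuania": 1.7}
}'''


def summarize(property2region2value):
    """Return (number of properties, unique regions, finite values).

    Hypothetical helper: it only reports counts and does not modify the input.
    """
    regions = set()
    values = 0
    for region2value in property2region2value.values():
        for region, value in region2value.items():
            # Skip NaN and infinite entries when counting
            if math.isfinite(value):
                regions.add(region)
                values += 1
    return len(property2region2value), len(regions), values


matrix = json.loads(SAMPLE)
counts = summarize(matrix)
print(counts)
```

A region such as Lithuania can appear under several properties, so the number of unique regions is usually smaller than the number of values.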
/src/main/abstractPredictor.py:
--------------------------------------------------------------------------------
1 | '''
2 | Abstract base class for the predictors. It loads the property -> region -> value matrix and
3 | drives per-fold evaluation and cross-validation; train() and predict() are stubs for subclasses to override.
4 | '''
5 | import operator
6 | import json
7 | import numpy
8 | import random
9 | from sklearn.metrics import mean_squared_error
10 | import math
11 | import multiprocessing
12 | from collections import OrderedDict
13 |
14 |
15 | class AbstractPredictor(object):
16 |     def __init__(self):
17 |         pass
18 |
19 |     @staticmethod
20 |     def loadMatrix(jsonFile):
21 |         print "loading from file " + jsonFile
22 |         with open(jsonFile) as freebaseFile:
23 |             property2region2value = json.loads(freebaseFile.read())
24 |
25 |
26 |         regions = set([])
27 |         valueCounter = 0
28 |         for property, region2value in property2region2value.items():
29 |             # Check for nan values and remove them
30 |             for region, value in region2value.items():
31 |                 if not numpy.isfinite(value):
32 |                     del region2value[region]
33 |                     print "REMOVED:", value, " for ", region, " ", property
34 |             if len(region2value) == 0:
35 |                 del property2region2value[property]
36 |                 print "REMOVED property:", property, " no values left"
37 |             else:
38 |                 valueCounter += len(region2value)
39 |                 regions = regions.union(set(region2value.keys()))
40 |
41 |         print len(property2region2value), " properties"
42 |         print len(regions), " unique regions"
43 |         print valueCounter, " values loaded"
44 |         return property2region2value
45 |
46 |     def train(self, trainMatrix, textMatrix, params):
47 |         pass
48 |
49 |     def predict(self, property, region):
50 |         pass
51 |
52 |     @classmethod
53 |     def runRelEval(cls, d, property, trainRegion2value, textMatrix, testRegion2value, ofn, params):
54 |         predictor = cls()
55 |         of = open(ofn, "w")
56 |         print "Training"
57 |
58 |         #try:
59 |         #cProfile.runctx('predictor.trainRelation(property, trainRegion2value, textMatrix, of, params)', globals(), locals())
60 |
61 |         predictor.trainRelation(property, trainRegion2value, textMatrix, of, params)
62 |         print "Done training"
63 |         #except FloatingPointError:
64 |         #    print "Training with params ", params, " failed due to floating point error"
65 |         #    avgScore = float("inf")
66 |         #else:
67 |         print "Testing"
68 |         predMatrix = {}
69 |         predMatrix[property] = {}
70 |         for region in testRegion2value:
71 |             predMatrix[property][region] = predictor.predict(property, region, of)
72 |
73 |         testMatrix = {}
74 |         testMatrix[property] = testRegion2value
75 |         avgScore = predictor.eval(predMatrix, testMatrix, of)
76 |         of.write("fold MAPE:" + str(avgScore) + "\n")
77 |
78 |         # Repeat the prediction, this time without falling back to the defaults
79 |         of.write("Evaluation without using the defaults\n")
80 |         predMatrix = {}
81 |         predMatrix[property] = {}
82 |         for region in testRegion2value:
83 |             val = predictor.predict(property, region, of, False)
84 |             if val is not None:
85 |                 predMatrix[property][region] = val
86 |         coverage = float(len(predMatrix[property]))/len(testMatrix[property])
87 |         if coverage > 0:
88 |             avgScoreNoDefault = predictor.eval(predMatrix, testMatrix, of)
89 |             of.write("fold MAPE without defaults:" + str(avgScoreNoDefault) +" coverage " + str(coverage) + "\n")
90 |         else:
91 |             of.write("No values predicted.\n")
92 |
93 |
94 |         if ofn.split("_")[-1] == "TEST":
95 |             d["TEST"] = avgScore
96 |         else:
97 |             d[int(ofn.split("_")[-1])] = avgScore
98 |         # close the output file before returning; a close() placed after the return never runs
99 |         of.close()
100 |         return avgScore
101 |
102 |
103 |
104 | # the paramSets
105 | @classmethod
106 | def crossValidate(cls, trainMatrix, textMatrix, folds, properties, outputFileName, paramSets, multi=True):
107 | # first construct the folds per relation
108 | property2folds = {}
109 | # we set the same random in order to get the same folds every time
110 | # we do it on the whole dataset everytime independently of the choice of properties
111 | random.seed(13)
112 | # For each property
113 | for property, region2value in trainMatrix.items():
114 | # create the empty folds
115 | property2folds[property] = [{} for _ in xrange(folds)]
116 | # shuffle the data points
117 | regions = region2value.keys()
118 | random.shuffle(regions)
119 | for idx, region in enumerate(regions):
120 | # pick a fold
121 | foldNo = idx % folds
122 | # add the datapoint there
123 | property2folds[property][foldNo][region] = region2value[region]
124 |
125 | # here we keep the best params found for each relation
126 | property2bestParams = {}
127 | bestParams = [None]
128 |
129 | # for each of the properties we decide
130 | for property in properties:
131 | print "property: " + property
132 | # this keeps the lowest MAPE achieved for this property across folds
133 | lowestAvgMAPE = float("inf")
134 |
135 | for params in paramSets:
136 | print "params: ", params
137 |
138 | if multi:
139 | # this is to do the cross validation across folds
140 | mgr = multiprocessing.Manager()
141 | d = mgr.dict()
142 | jobs = []
143 | else:
144 | d= {}
145 |
146 |
147 | paramMAPEs = []
148 | # for each fold
149 |
150 | for foldNo in xrange(folds):
151 | print "fold:", foldNo
152 | # construct the training and test datasets
153 | foldTrainRegion2value = {}
154 | foldTestRegion2value = {}
155 | data = property2folds[property]
156 |
157 | foldTrainMatrix = {}
158 | for idx in xrange(folds):
159 | if (idx % folds) == foldNo:
160 | # this is the test data
161 | foldTestRegion2value = data[idx]
162 | else:
163 | # the rest adds to the training data
164 | foldTrainRegion2value.update(data[idx])
165 |
166 | # now create a predictor and run the eval
167 | predictor = cls()
168 | # run the eval
169 | # open the file for the relation, fold and params
170 | paramsStrs = []
171 | for param in params:
172 | paramsStrs.append(str(param))
173 | #print outputFileName
174 | #print paramsStrs
175 | #print "_".join(paramsStrs)
176 | #print property.split("/")[-1]
177 | ofn = outputFileName + "_" + property.split("/")[-1] + "_" + "_".join(paramsStrs) + "_" + str(foldNo)
178 | if multi:
179 | job = multiprocessing.Process(target=predictor.runRelEval, args=(d, property, foldTrainRegion2value, textMatrix, foldTestRegion2value, ofn, params,))
180 | jobs.append(job)
181 | else:
182 | predictor.runRelEval(d, property, foldTrainRegion2value, textMatrix, foldTestRegion2value, ofn, params)
183 |
184 | if multi:
185 | # start all the jobs
186 | for j in jobs:
187 | j.start()
188 |
189 | # Ensure all of the processes have finished
190 | for j in jobs:
191 | j.join()
192 |
193 | orderedFold2MAPE = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
194 | # get the average across folds
195 | if float("inf") not in orderedFold2MAPE.values():
196 | avgMAPE = numpy.mean(orderedFold2MAPE.values())
197 | print property + ":params:", params, " avgMAPE:", avgMAPE, "stdMAPE:", numpy.std(orderedFold2MAPE.values()), "foldMAPEs:", orderedFold2MAPE.values()
198 | # lower is better
199 | if avgMAPE <= lowestAvgMAPE:
200 | bestParams = params
201 | lowestAvgMAPE = avgMAPE
202 |
203 | else:
204 | print property + ":params:", params, "Training in some folds failed due to overflow", "foldMAPEs:", orderedFold2MAPE.values()
205 |
206 |
207 |
208 | print property + ": lowestAvgMAPE:", lowestAvgMAPE
209 | print property + ": bestParams: ", bestParams
210 | property2bestParams[property] = bestParams
211 |
212 | # we return the best params
213 | return property2bestParams
214 |
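The fold assignment at the top of `crossValidate` can be looked at in isolation. The following is a minimal sketch, assuming Python 3 rather than the Python 2 used in this repository; the names `make_folds` and `train_test_split` are hypothetical and do not exist in the code base. Unlike the original, it sorts the regions and uses a local random generator so the result does not depend on dict ordering or on global random state.

```python
import random


def make_folds(region2value, folds, seed=13):
    """Split a {region: value} dict into `folds` dicts.

    Regions are shuffled with a fixed seed and then dealt out round-robin,
    so fold sizes differ by at most one.
    """
    rng = random.Random(seed)
    regions = sorted(region2value)  # fixed starting order for a reproducible shuffle
    rng.shuffle(regions)
    result = [{} for _ in range(folds)]
    for idx, region in enumerate(regions):
        result[idx % folds][region] = region2value[region]
    return result


def train_test_split(fold_list, fold_no):
    """Use fold `fold_no` as test data and merge all other folds into the training data."""
    test = fold_list[fold_no]
    train = {}
    for idx, fold in enumerate(fold_list):
        if idx != fold_no:
            train.update(fold)
    return train, test


if __name__ == "__main__":
    data = {"region%d" % i: float(i) for i in range(10)}
    fold_list = make_folds(data, 4)
    train, test = train_test_split(fold_list, 0)
    print(len(train), len(test))
```

Because the seed is fixed, repeated calls on the same data return the same folds, which is what makes scores comparable across parameter sets.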
215 |
216 | @staticmethod
217 | def eval(predMatrix, testMatrix, of):
218 | of.write(str(predMatrix) +"\n")
219 | of.write(str(testMatrix) +"\n")
220 | property2MAPE = {}
221 | property2MASE = {}
222 | property2RMSE = {}
223 | for property, predRegion2value in predMatrix.items():
224 | of.write(property+"\n")
225 | #print "real: ", testMatrix[property]
226 | #print "predicted: ", predRegion2value
227 | mape = AbstractPredictor.MAPE(predRegion2value, testMatrix[property])
228 | of.write("MAPE: " + str(mape) + "\n")
229 | property2MAPE[property] = mape
230 | rmse = AbstractPredictor.RMSE(predRegion2value, testMatrix[property])
231 | of.write("RMSE: " + str(rmse) + "\n")
232 | property2RMSE[property] = rmse
233 | mase = AbstractPredictor.MASE(predRegion2value, testMatrix[property])
234 | #of.write("MASE: " + str(mase) + "\n")
235 | property2MASE[property] = mase
236 |
237 | #return numpy.mean(MAPEs)
238 | of.write("properties ordered by MAPE\n")
239 | sortedMAPEs = sorted(property2MAPE.items(), key=operator.itemgetter(1))
240 | for property, mape in sortedMAPEs:
241 | of.write(property + ":" + str(mape)+"\n")
242 |
243 | #of.write("properties ordered by MASE\n")
244 | #sortedMASEs = sorted(property2MASE.items(), key=operator.itemgetter(1))
245 | #for property, mase in sortedMASEs:
246 | # of.write(property + ":" + str(mase)+"\n")
247 |
248 |
249 | of.write("avg. MAPE: " + str(numpy.mean(property2MAPE.values())) +"\n")
250 | of.write("avg. RMSE: " + str(numpy.mean(property2RMSE.values())) +"\n")
251 | #of.write("avg. MASE: " + str(numpy.mean(property2MASE.values())) +"\n")
252 | # we use MAPE as the main metric, which is returned to guide the hyperparameter selection
253 | return numpy.mean(property2MAPE.values())
254 |
255 | # We follow the definitions of Chen and Yang (2004)
256 | # the second dict does the scaling
257 | # not defined when the trueDict value is 0
258 | # returns the mean absolute percentage error over the keys the two dicts share, or inf if no value can be scored
259 | @staticmethod
260 | def MAPE(predDict, trueDict, verbose=False):
261 | absPercentageErrors = {}
262 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys()))
263 |
264 | #print keysInCommon
265 | for key in keysInCommon:
266 | # avoid 0's
267 | if trueDict[key] != 0:
268 | absError = abs(predDict[key] - trueDict[key])
269 | absPercentageErrors[key] = absError/numpy.abs(trueDict[key])
270 |
271 | if len(absPercentageErrors) > 0:
272 | if verbose:
273 | print "MAPE results"
274 | sortedAbsPercentageErrors = sorted(absPercentageErrors.items(), key=operator.itemgetter(1))
275 | print "top-5 predictions"
276 | print "region:pred:true"
277 | for idx in xrange(5):
278 | print sortedAbsPercentageErrors[idx][0].encode('utf-8'), ":", predDict[sortedAbsPercentageErrors[idx][0]], ":", trueDict[sortedAbsPercentageErrors[idx][0]]
279 | print "bottom-5 predictions"
280 | for idx in xrange(5):
281 | print sortedAbsPercentageErrors[-idx-1][0].encode('utf-8'), ":", predDict[sortedAbsPercentageErrors[-idx-1][0]], ":", trueDict[sortedAbsPercentageErrors[-idx-1][0]]
282 |
283 | return numpy.mean(absPercentageErrors.values())
284 | else:
285 | return float("inf")
286 |
287 |
288 | # This is MASE, sort of proposed in Hyndman 2006
289 | # at the moment the evaluation metric of choice
290 | # it returns 1 if the method has the same absolute errors as the median of the test set.
291 | @staticmethod
292 | def MASE(predDict, trueDict, verbose=False):
293 | # first let's estimate the error from the median:
294 | median = numpy.median(trueDict.values())
295 |
296 | # calculate the errors of the test median
297 | # we are scaling with the error of the median on the value at question. This will be 0 often, thus we want to know the smallest non-zero to add it.
298 | minMedianAbsError = float("inf")
299 | for value in trueDict.values():
300 | medianAbsError = numpy.abs(value - median)
301 | if medianAbsError > 0 and medianAbsError < minMedianAbsError:
302 | minMedianAbsError = medianAbsError
303 |
304 |
305 | # get those that were predicted
306 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys()))
307 | predScaledAbsErrors = {}
308 | for key in keysInCommon:
309 | predScaledAbsErrors[key] = (numpy.abs(predDict[key] - trueDict[key]) + minMedianAbsError)/(numpy.abs(median - trueDict[key]) + minMedianAbsError)
310 |
311 | if verbose:
312 | print "MASE results"
313 | sortedPredScaledAbsErrors = sorted(predScaledAbsErrors.items(), key=operator.itemgetter(1))
314 | print "top-5 predictions"
315 | print "region:pred:true"
316 | for idx in xrange(5):
317 | print sortedPredScaledAbsErrors[idx][0].encode('utf-8'), ":", predDict[sortedPredScaledAbsErrors[idx][0]], ":", trueDict[sortedPredScaledAbsErrors[idx][0]]
318 | print "bottom-5 predictions"
319 | for idx in xrange(5):
320 | print sortedPredScaledAbsErrors[-idx-1][0].encode('utf-8'), ":", predDict[sortedPredScaledAbsErrors[-idx-1][0]], ":", trueDict[sortedPredScaledAbsErrors[-idx-1][0]]
321 |
322 | return numpy.mean(predScaledAbsErrors.values())
323 |
324 |
325 |
326 |
327 | # This is the KL-DE1 measure defined in Chen and Yang (2004)
328 | @staticmethod
329 | def KLDE(predDict, trueDict, verbose=False):
330 | kldes = {}
331 | # first we need to get the stdev used in scaling
332 | # let's use all the values for this, not only the ones in common
333 | std = numpy.std(trueDict.values())
334 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys()))
335 |
336 | for key in keysInCommon:
337 | scaledAbsError = abs(predDict[key] - trueDict[key])/std
338 | klde = numpy.exp(-scaledAbsError) + scaledAbsError - 1
339 | kldes[key] = klde
340 |
341 | if verbose:
342 | print "KLDE results"
343 | sortedKLDEs = sorted(kldes.items(), key=operator.itemgetter(1))
344 | print "top-5 predictions"
345 | print "region:pred:true"
346 | for idx in xrange(5):
347 | print sortedKLDEs[idx][0].encode('utf-8'), ":", predDict[sortedKLDEs[idx][0]], ":", trueDict[sortedKLDEs[idx][0]]
348 | print "bottom-5 predictions"
349 | for idx in xrange(5):
350 | print sortedKLDEs[-idx-1][0].encode('utf-8'), ":", predDict[sortedKLDEs[-idx-1][0]], ":", trueDict[sortedKLDEs[-idx-1][0]]
351 |
352 | return numpy.mean(kldes.values())
353 |
354 | # This does a scaling according to the number of values actually used in the calculation
355 | # The more values used, the lower the score (lower is better)
356 | # smaller scaling parameters make the number of values used more important, larger lead to the same as standard KLDE
357 | # Inspired by the shrunk correlation coefficient (Koren 2008 equation 2)
358 | @staticmethod
359 | def supportScaledKLDE(predDict, trueDict, scalingParam=1):
360 | klde = AbstractPredictor.KLDE(predDict, trueDict)
361 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys()))
362 | scalingFactor = float(scalingParam)/(scalingParam + len(keysInCommon))
363 | return klde * scalingFactor
364 |
365 | @staticmethod
366 | def supportScaledMASE(predDict, trueDict, scalingParam=1):
367 | mase = AbstractPredictor.MASE(predDict, trueDict)
368 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys()))
369 | scalingFactor = float(scalingParam)/(scalingParam + len(keysInCommon))
370 | return mase * scalingFactor
371 |
372 | @staticmethod
373 | def supportScaledMAPE(predDict, trueDict, scalingParam=1):
374 | mape = AbstractPredictor.MAPE(predDict, trueDict)
375 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys()))
376 | scalingFactor = float(scalingParam)/(scalingParam + len(keysInCommon))
377 | return mape * scalingFactor
378 |
379 |
380 | @staticmethod
381 | def RMSE(predDict, trueDict):
382 | keysInCommon = list(set(predDict.keys()) & set(trueDict.keys()))
383 | #print keysInCommon
384 | y_actual = []
385 | y_predicted = []
386 | for key in keysInCommon:
387 | y_actual.append(trueDict[key])
388 | y_predicted.append(predDict[key])
389 | return math.sqrt(mean_squared_error(y_actual, y_predicted))
390 |
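To make the relationship between `MAPE` and `supportScaledMAPE` concrete, here is a small sketch, assuming Python 3; the lowercase function names are hypothetical stand-ins, not the class methods above. It follows the same rules as the code: only keys present in both dicts are scored, zero true values are skipped, and the support scaling multiplies the score by `scalingParam / (scalingParam + overlap)`.

```python
def mape(pred, true):
    """Mean absolute percentage error over shared keys, skipping zero true values."""
    errors = [abs(pred[k] - true[k]) / abs(true[k])
              for k in pred.keys() & true.keys()
              if true[k] != 0]
    # with nothing to score, report the worst possible value
    return sum(errors) / len(errors) if errors else float("inf")


def support_scaled_mape(pred, true, scaling_param=1.0):
    """Shrink the MAPE according to how many keys the two dicts share."""
    overlap = len(pred.keys() & true.keys())
    return mape(pred, true) * float(scaling_param) / (scaling_param + overlap)


if __name__ == "__main__":
    predicted = {"A": 110.0, "B": 90.0, "C": 5.0}
    actual = {"A": 100.0, "B": 100.0, "D": 1.0}
    # only A and B are shared, each is off by 10 percent
    print(mape(predicted, actual))
    print(support_scaled_mape(predicted, actual, 1.0))
```

A small scaling parameter rewards patterns that share many regions with the training data; a large one makes the scaled score behave like a constant multiple of the plain MAPE.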
--------------------------------------------------------------------------------
/src/main/baselinePredictor.py:
--------------------------------------------------------------------------------
1 | import fixedValuePredictor
2 | import numpy
3 | import heapq
4 |
5 | class BaselinePredictor(fixedValuePredictor.FixedValuePredictor):
6 |
7 | def __init__(self):
8 | # this keeps the patterns for each relation
9 | # each pattern has a dict of regions and values associated with it
10 | self.property2patterns = {}
11 | # this initializes the fixed value
12 | fixedValuePredictor.FixedValuePredictor.__init__(self)
13 | #super(BaselinePredictor,self).__init_()
14 |
15 |
16 | def predict(self, property, region, of, useDefault=True):
17 | # collect all the values for this region found in related patterns
18 | # if 0 then one pattern is enough, otherwise more are needed to avoid default
19 | patternsRequiredForPrediction = 0
20 | values = []
21 | if property in self.property2patterns:
22 | patterns = self.property2patterns[property]
23 | for pattern, region2value in patterns.items():
24 | if region in region2value:
25 | values.append(region2value[region])
26 | of.write("region: " + region.encode('utf-8') + " pattern used: " + pattern.encode('utf-8') + " value: " + str(region2value[region]) + "\n")
27 |
28 | if len(values) > patternsRequiredForPrediction:
29 | return numpy.mean(values)
30 | else:
31 | if useDefault:
32 | return fixedValuePredictor.FixedValuePredictor.predict(self, property, region, of)
33 | else:
34 | return None
35 |
36 | def trainRelation(self, property, trainRegion2value, textMatrix, of, params=[False]):
37 | # get the fixed value first
38 | fixedValuePredictor.FixedValuePredictor.trainRelation(self, property, trainRegion2value, textMatrix, of)
39 |
40 | # whether we are scaling or not
41 | scaling = params[0]
42 | # if we are scaling, what is the scaling parameter?
43 | print scaling
44 | if scaling:
45 | scalingParam=float(params[1])
46 | of.write("Training with MAPE support scaling parameter: " + str(scalingParam) + "\n")
47 | else:
48 |
49 | of.write("Training without MAPE support scaling\n")
50 |
51 | self.property2patterns[property] = {}
52 | #print "OK"
53 | patternMAPEs = []
54 | # we first need to rank all the text patterns according to their MAPE
55 | for pattern, region2value in textMatrix.items():
56 | # make sure that it has at least two values in common with the training data, otherwise we might get spurious stuff
57 | keysInCommon = list(set(region2value.keys()) & set(trainRegion2value.keys()))
58 | if len(keysInCommon) > 1:
59 | if scaling:
60 | mape = self.supportScaledMAPE(region2value, trainRegion2value, scalingParam)
61 | else:
62 | mape = self.MAPE(region2value, trainRegion2value)
63 | heapq.heappush(patternMAPEs, (mape, pattern))
64 | #print "OK"
65 | # predict
66 | prediction = {}
67 | for region in trainRegion2value:
68 | prediction[region] = self.predict(property, region, of)
69 |
70 | # calculate the current score, obtained with the default predictions only
71 | currentMAPE = self.MAPE(prediction, trainRegion2value)
72 |
73 | while len(patternMAPEs)>0:
74 | # The pattern with the smallest MAPE is indexed at 0
75 | # the elements are (MAPE, pattern) tuples
76 | mape, pattern = heapq.heappop(patternMAPEs)
77 |
78 | # add it to the classifiers
79 | self.property2patterns[property][pattern] = textMatrix[pattern]
80 |
81 | of.write("text pattern: " + pattern.encode('utf-8') + "\n")
82 |
83 | of.write("MAPE:" + str(mape) + "\n")
84 | #print "MASE", self.MASE(textMatrix[pattern], trainRegion2value)
85 | of.write(str(textMatrix[pattern]) + "\n")
86 |
87 | # predict
88 | prediction = {}
89 |
90 | for region in trainRegion2value:
91 | prediction[region] = self.predict(property, region, of)
92 |
93 | # calculate new MAPE
94 | newMAPE = self.MAPE(prediction, trainRegion2value)
95 | of.write("MAPE of predictor before adding the pattern:" + str(currentMAPE) + "\n")
96 | of.write("MAPE of predictor after adding the pattern:" + str(newMAPE) + "\n")
97 | # if higher than before, remove the last pattern added and break
98 |
99 | if newMAPE > currentMAPE:
100 | del self.property2patterns[property][pattern]
101 | break
102 | else:
103 | currentMAPE = newMAPE
104 |
105 |
106 |
107 |
108 | def train(self, trainMatrix, textMatrix, params=[False]):
109 | fixedValuePredictor.FixedValuePredictor.train(self, trainMatrix, textMatrix)
110 |
111 | # whether we are scaling or not
112 | scaling = params[0]
113 | # if we are scaling, what is the scaling parameter?
114 | if scaling:
115 | scalingParam=float(params[1])
116 | print "Training with MAPE support scaling parameter: ", scalingParam
117 | else:
118 | print "Training without MAPE support scaling"
119 |
120 | # for each property, greedily add the text patterns that lower the training MAPE
121 | # the score should improve initially as good patterns are added, and then get worse as poorer ones are added
122 | for property, trainRegion2value in trainMatrix.items():
123 | print property, trainRegion2value
124 | # first get the average
125 | #self.property2median[property] = numpy.median(trainRegion2value.values())
126 | self.property2patterns[property] = {}
127 |
128 | # this is used to store the MAPE of each pattern
129 | patternMAPEs = []
130 | # we first need to rank all the text patterns according to their MAPE
131 | for pattern, region2value in textMatrix.items():
132 | # make sure that it has at least two values in common with the training data, otherwise we might get spurious stuff
133 | keysInCommon = list(set(region2value.keys()) & set(trainRegion2value.keys()))
134 | if len(keysInCommon) > 1:
135 | if scaling:
136 | mape = self.supportScaledMAPE(region2value, trainRegion2value, scalingParam)
137 | else:
138 | mape = self.MAPE(region2value, trainRegion2value)
139 | heapq.heappush(patternMAPEs, (mape, pattern))
140 |
141 | # predict
142 | prediction = {}
143 | for region in trainRegion2value:
144 | prediction[region] = self.predict(property, region)
145 | # calculate the current score, obtained with the default predictions only
146 | currentMAPE = self.MAPE(prediction, trainRegion2value)
147 | while len(patternMAPEs) > 0:
148 | # The pattern with the smallest MAPE is indexed at 0
149 | # the elements are (MAPE, pattern) tuples
150 | mape, pattern = heapq.heappop(patternMAPEs)
151 |
152 | # add it to the classifiers
153 | self.property2patterns[property][pattern] = textMatrix[pattern]
154 | print "text pattern: " + pattern.encode('utf-8')
155 | print "MAPE:", mape
156 | print "MASE", self.MASE(textMatrix[pattern], trainRegion2value)
157 | print textMatrix[pattern]
158 |
159 | # predict
160 | for region in trainRegion2value:
161 | prediction[region] = self.predict(property, region)
162 |
163 | # calculate new MAPE
164 | newMAPE = self.MAPE(prediction, trainRegion2value)
165 | print "MAPE of predictor before adding the pattern:", currentMAPE
166 | print "MAPE of predictor after adding the pattern:", newMAPE
167 | # if higher than before, remove the last pattern added and break
168 | if newMAPE > currentMAPE:
169 | del self.property2patterns[property][pattern]
170 | break
171 | else:
172 | currentMAPE = newMAPE
173 |
174 |
175 |
176 | if __name__ == "__main__":
177 |
178 | import sys
179 | import json
180 | import os.path
181 |
182 | baselinePredictor = BaselinePredictor()
183 |
184 | trainMatrix = baselinePredictor.loadMatrix(sys.argv[1])
185 | textMatrix = baselinePredictor.loadMatrix(sys.argv[2])
186 | testMatrix = baselinePredictor.loadMatrix(sys.argv[3])
187 |
188 | outputFileName = sys.argv[4]
189 |
190 | adjust = sys.argv[5]
191 |
192 | properties = json.loads(open(os.path.dirname(os.path.abspath(sys.argv[1])) + "/featuresKept.json").read())
193 |
194 | if adjust == "TRUE":
195 | property2bestParams = baselinePredictor.crossValidate(trainMatrix, textMatrix, 4, properties, outputFileName, [[True, 0.0078125], [True, 0.015625],[True, 0.03125],[True, 0.0625],[True, 0.125],[True, 0.25],[True,0.5],[True,1],[True,2],[True,4],[True,8],[True,16],[True,32],])
196 | else:
197 | property2bestParams = baselinePredictor.crossValidate(trainMatrix, textMatrix, 4, properties, outputFileName, [[False]])
198 | #print "OK"
199 |
200 | property2MAPE = {}
201 | for property in properties:
202 | paramsStrs = []
203 | for param in property2bestParams[property]:
204 | paramsStrs.append(str(param))
205 |
206 | ofn = outputFileName + "_" + property.split("/")[-1] + "_" + "_".join(paramsStrs) + "_TEST"
207 | a= {}
208 | baselinePredictor.runRelEval(a, property, trainMatrix[property], textMatrix, testMatrix[property], ofn, property2bestParams[property])
209 | property2MAPE[property] = a.values()[0]
210 |
211 | for property in sorted(property2MAPE):
212 | print property, property2MAPE[property]
213 | print "avg MAPE:", str(numpy.mean(property2MAPE.values()))
214 |
215 |
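The greedy selection performed in `trainRelation` can be summarized in a short standalone sketch, assuming Python 3. The names `mape`, `predict` and `select_patterns` are hypothetical, and the fallback is simplified to a constant `default` passed in by the caller, since `FixedValuePredictor` is not shown here. Patterns are ranked by their own MAPE, added one at a time, and the loop stops at the first pattern that makes the training MAPE worse.

```python
import heapq


def mape(pred, true):
    """Mean absolute percentage error over shared keys, skipping zero true values."""
    errors = [abs(pred[k] - true[k]) / abs(true[k])
              for k in pred.keys() & true.keys() if true[k] != 0]
    return sum(errors) / len(errors) if errors else float("inf")


def predict(selected, region, default):
    """Average the values of all selected patterns that mention the region."""
    values = [r2v[region] for r2v in selected.values() if region in r2v]
    return sum(values) / len(values) if values else default


def select_patterns(text_matrix, train, default):
    """Forward selection of text patterns, best individual MAPE first."""
    heap = []
    for pattern, r2v in text_matrix.items():
        # require at least two regions in common with the training data
        if len(r2v.keys() & train.keys()) > 1:
            heapq.heappush(heap, (mape(r2v, train), pattern))

    selected = {}
    current = mape({r: default for r in train}, train)
    while heap:
        _, pattern = heapq.heappop(heap)
        selected[pattern] = text_matrix[pattern]
        new = mape({r: predict(selected, r, default) for r in train}, train)
        if new > current:
            # the pattern hurts: undo it and stop
            del selected[pattern]
            break
        current = new
    return selected


if __name__ == "__main__":
    train = {"A": 100.0, "B": 200.0, "C": 300.0}
    text_matrix = {
        "good": {"A": 100.0, "B": 200.0},
        "bad": {"A": 1000.0, "C": 3000.0},
        "rare": {"A": 100.0},  # only one region in common, never considered
    }
    print(sorted(select_patterns(text_matrix, train, default=200.0)))
```

Stopping at the first harmful pattern keeps training cheap, at the cost of never trying lower-ranked patterns that might still have helped.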
--------------------------------------------------------------------------------
/src/main/buildMatrix.py:
--------------------------------------------------------------------------------
1 | '''
2 |
3 | This script reads in parsed and NER'ed JSONs by Stanford CoreNLP and produces the following structure:
4 |
5 | Location:[dep1:[val1, val2], dep2:[val1, val2, ...]]
6 |
7 |
8 | '''
9 |
10 | import json
11 | import sys
12 | import os
13 | import glob
14 | import networkx
15 | import re
16 | import copy
17 | import numpy
18 | import codecs
19 |
20 | # this class def allows us to write:
21 | #print(json.dumps(np.arange(5), cls=NumPyArangeEncoder))
22 | #class NumPyArangeEncoder(json.JSONEncoder):
23 | # def default(self, obj):
24 | # if isinstance(obj, numpy.ndarray):
25 | # return obj.tolist() # or map(int, obj)
26 | # return json.JSONEncoder.default(self, obj)
27 |
28 |
29 |
30 | def getNumbers(sentence):
31 | # a number can span over multiple tokens
32 | tokenIDs2number = {}
33 | for idx, token in enumerate(sentence["tokens"]):
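34 | # skip tokens tagged as dates or as part of named entities (locations, persons, organizations, misc)
35 | # only tokens that parse as numerals count, so spelled-out numbers like "one million" are ignored,
36 | # while a numeral followed by a magnitude word, e.g. "500 million", is scaled by the check below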
37 | if token["ner"] not in ["DATE", "LOCATION", "PERSON", "ORGANIZATION", "MISC"]:
38 | try:
39 | # this makes sure that 123,123,123.23 which fails the float test, becomes 123123123.23 which is good
40 | tokenWithoutCommas = re.sub(r",([0-9][0-9][0-9])", r"\g<1>", token["word"])
41 | number = float(tokenWithoutCommas)
42 | # we want this to avoid taking in nan, inf and -inf as floats
43 | if numpy.isfinite(number):
44 | ids = [idx]
45 | # check whether the next token is a magnitude word (trillion, billion, million or thousand)
46 | if len(sentence["tokens"]) > idx+1:
47 | if sentence["tokens"][idx+1]["word"].startswith("trillion"):
48 | number = number * 1000000000000
49 | ids.append(idx+1)
50 | elif sentence["tokens"][idx+1]["word"].startswith("billion"):
51 | number = number * 1000000000
52 | ids.append(idx+1)
53 | elif sentence["tokens"][idx+1]["word"].startswith("million"):
54 | number = number * 1000000
55 | ids.append(idx+1)
56 | #print sentence["tokens"]
57 | #print number
58 | elif sentence["tokens"][idx+1]["word"].startswith("thousand"):
59 | number = number * 1000
60 | ids.append(idx+1)
61 | #print sentence["tokens"]
62 | #print number
63 |
64 | tokenIDs2number[tuple(ids)] = number
65 |
66 | except ValueError:
67 | pass
68 | return tokenIDs2number
69 |
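The numeric parsing inside `getNumbers` boils down to three steps: strip thousands separators, reject non-finite values, and scale by a following magnitude word. A minimal sketch of just that part, assuming Python 3; `parse_number` and `MULTIPLIERS` are hypothetical names, and the NER filtering and token-index bookkeeping of the original are left out.

```python
import math
import re

# magnitude words recognized after a numeral, matched by prefix so that plurals also count
MULTIPLIERS = {"trillion": 1e12, "billion": 1e9, "million": 1e6, "thousand": 1e3}


def parse_number(word, next_word=None):
    """Return the value of a numeral token, scaled by a following magnitude word.

    Returns None when the token is not a finite number.
    """
    try:
        # 123,123,123.23 fails float() as is, so drop commas that precede three digits
        number = float(re.sub(r",([0-9]{3})", r"\g<1>", word))
    except ValueError:
        return None
    if not math.isfinite(number):
        # float() accepts "nan" and "inf", which are not useful values here
        return None
    if next_word is not None:
        for name, factor in MULTIPLIERS.items():
            if next_word.startswith(name):
                return number * factor
    return number


if __name__ == "__main__":
    print(parse_number("123,123,123.23"))
    print(parse_number("500", "millions"))
    print(parse_number("one", "million"))
```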
70 | # this function performs dictionary-based NER matching to help with names that Stanford NER fails to tag
71 | # use with caution: it looks only at the tokens and overwrites existing NER tags
72 | def dictLocationMatching(sentence, tokenizedLocations):
73 | # first re-construct the sentence as a string
74 | wordsInSentence = []
75 | for token in sentence["tokens"]:
76 | wordsInSentence.append(token["word"])
77 | #print wordsInSentence
78 | for tokLoc in tokenizedLocations:
79 | #print wordsInSentence
80 | #print tokLoc
81 | tokenSeqs = [(i, i+len(tokLoc)) for i in range(len(wordsInSentence)) if wordsInSentence[i:i+len(tokLoc)] == tokLoc]
82 | #print tokenSeqs
83 | for tokenSeq in tokenSeqs:
84 | for tokenNo in range(tokenSeq[0], tokenSeq[1]):
85 | sentence["tokens"][tokenNo]["ner"] = "LOCATION"
86 |
87 | def getLocations(sentence):
88 | # note that a location can span multiple tokens
89 | tokenIDs2location = {}
90 | currentLocation = []
91 | for idx, token in enumerate(sentence["tokens"]):
92 | # if it is a location token add it:
93 | if token["ner"] == "LOCATION":
94 | currentLocation.append(idx)
95 | # if it is not a location token
96 | else:
97 | # check if we have just finished a location
98 | if len(currentLocation) > 0:
99 | # convert the tokenID to a tuple (immutable) and put the name there
100 | locationTokens = []
101 | for locIdx in currentLocation:
102 | locationTokens.append(sentence["tokens"][locIdx]["word"])
103 |
104 | tokenIDs2location[tuple(currentLocation)] = " ".join(locationTokens)
105 | currentLocation = []
106 |
107 | return tokenIDs2location
108 |
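As written, `getLocations` appears to record a location only when a non-location token follows it, so a location that ends the sentence would be dropped. The sketch below, assuming Python 3 and a hypothetical `get_locations` that takes the token list directly, shows the same span grouping with an explicit flush after the loop. Whether the original behavior matters in practice depends on the input, since parsed sentences usually end with punctuation.

```python
def get_locations(tokens):
    """Group consecutive LOCATION tokens into {token_id_tuple: name} spans."""
    spans = {}
    current = []
    for idx, token in enumerate(tokens):
        if token["ner"] == "LOCATION":
            current.append(idx)
        elif current:
            # a non-location token closes the span collected so far
            spans[tuple(current)] = " ".join(tokens[i]["word"] for i in current)
            current = []
    if current:
        # flush a location that runs up to the end of the sentence
        spans[tuple(current)] = " ".join(tokens[i]["word"] for i in current)
    return spans


if __name__ == "__main__":
    words = ["New", "York", "borders", "New", "Jersey"]
    tags = ["LOCATION", "LOCATION", "O", "LOCATION", "LOCATION"]
    tokens = [{"word": w, "ner": t} for w, t in zip(words, tags)]
    print(get_locations(tokens))
```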
109 | def buildDAGfromSentence(sentence):
110 | sentenceDAG = networkx.DiGraph()
111 | for idx, token in enumerate(sentence["tokens"]):
112 | sentenceDAG.add_node(idx, word=token["word"])
113 | sentenceDAG.add_node(idx, lemma=token["lemma"])
114 | sentenceDAG.add_node(idx, ner=token["ner"])
115 | sentenceDAG.add_node(idx, pos=token["pos"])
116 |
117 | # and now the edges:
118 | for dependency in sentence["dependencies"]:
119 | sentenceDAG.add_edge(dependency["head"], dependency["dep"], label=dependency["label"])
120 | # add the reverse edge if no edge in that direction exists yet
121 | # add_edge overwrites the label of an existing edge, so a real dependency always replaces a reverse label
122 | if not sentenceDAG.has_edge(dependency["dep"], dependency["head"]):
123 | sentenceDAG.add_edge(dependency["dep"], dependency["head"], label="-" + dependency["label"])
124 | return sentenceDAG
125 |
126 | # get the shortest dependency paths between the location tokens and the number tokens
127 | # there can be more than one shortest path
128 | def getShortestDepPaths(sentenceDAG, locationTokenIDs, numberTokenIDs):
129 | shortestPaths = []
130 | for locationTokenID in locationTokenIDs:
131 | for numberTokenID in numberTokenIDs:
132 | try:
133 | # get the shortest paths
134 | # materialize them as a list: there are unlikely to be very many and we need len()
135 | tempShortestPaths = list(networkx.all_shortest_paths(sentenceDAG, source=locationTokenID, target=numberTokenID))
136 | # if the paths found are shorter than the ones we had (or we didn't have any)
137 | if (len(shortestPaths) == 0) or len(shortestPaths[0]) > len(tempShortestPaths[0]):
138 | shortestPaths = tempShortestPaths
139 | # if they have equal length add them
140 | elif len(shortestPaths[0]) == len(tempShortestPaths[0]):
141 | shortestPaths.extend(tempShortestPaths)
142 | # if no paths were found, do nothing
143 | except networkx.exception.NetworkXNoPath:
144 | pass
145 | return shortestPaths
146 |
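The graph built by `buildDAGfromSentence` and searched by `getShortestDepPaths` can be illustrated without networkx. The sketch below, assuming Python 3, swaps in plain dicts and a breadth-first search; `build_edge_labels` and `shortest_path` are hypothetical names, and unlike `networkx.all_shortest_paths` the search returns only one shortest path. The example dependencies are made up for illustration.

```python
from collections import deque


def build_edge_labels(dependencies):
    """Map (source, target) to a label for every dependency and its reverse.

    Real dependencies keep their label; a reverse edge gets the label
    prefixed with "-" and is only added where no real edge exists.
    """
    labels = {}
    for dep in dependencies:
        labels[(dep["head"], dep["dep"])] = dep["label"]
    for dep in dependencies:
        labels.setdefault((dep["dep"], dep["head"]), "-" + dep["label"])
    return labels


def shortest_path(labels, source, target):
    """Breadth-first search over the labelled edges; returns a node list or None."""
    neighbours = {}
    for u, v in labels:
        neighbours.setdefault(u, []).append(v)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    # tokens: 0 France, 1 has, 2 66000000, 3 inhabitants
    deps = [
        {"head": 1, "dep": 0, "label": "nsubj"},
        {"head": 1, "dep": 3, "label": "dobj"},
        {"head": 3, "dep": 2, "label": "num"},
    ]
    labels = build_edge_labels(deps)
    path = shortest_path(labels, 0, 2)
    print(path, [labels[edge] for edge in zip(path, path[1:])])
```

The "-" prefix on reverse edges is what lets a path string record whether each step goes up or down the dependency tree.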
147 | # given a dep path defined by its nodes, get the strings of the lexicalized dep path, possibly extended by one more dep
148 | def depPath2StringExtend(sentenceDAG, path, locationTokenIDs, numberTokenIDs, extend=True):
149 | strings = []
150 | # this keeps the various bits of the string
151 | pathStrings = []
152 | # get the first dep which is from the location
153 | pathStrings.append("LOCATION_SLOT~" + sentenceDAG[path[0]][path[1]]["label"])
154 | # for the words in between add the lemma and the dep
155 | hasContentWord = False
156 | for seqOnPath, tokenId in enumerate(path[1:-1]):
157 | if sentenceDAG.node[tokenId]["ner"] == "O":
158 | pathStrings.append(sentenceDAG.node[tokenId]["word"].lower() + "~" + sentenceDAG[tokenId][path[seqOnPath+2]]["label"])
159 | if sentenceDAG.node[tokenId]["pos"][0] in "NVJR":
160 | hasContentWord = True
161 | else:
162 | pathStrings.append(sentenceDAG.node[tokenId]["ner"] + "~" + sentenceDAG[tokenId][path[seqOnPath+2]]["label"])
163 |
164 | pathStrings.append("NUMBER_SLOT")
165 |
166 | if hasContentWord:
167 | strings.append("+".join(pathStrings))
168 |
169 | if extend:
170 | # create additional paths by adding the out-edges of each node on the path (except for the ones on the path itself)
171 | # the extension is always added left of the node
172 | for nodeOnPath in path:
173 | # go over each node on the path
174 | outEdges = sentenceDAG.out_edges_iter([nodeOnPath])
175 |
176 | for pathIdx, edge in enumerate(outEdges):
177 | tempPathStrings = copy.deepcopy(pathStrings)
178 | # we already know the source of the edge
179 | curNode, outNode = edge
180 | # if we are not going on the path
181 | if outNode not in path and outNode not in numberTokenIDs:
182 | if sentenceDAG.node[outNode]["ner"] == "O":
183 | if hasContentWord or sentenceDAG.node[outNode]["pos"][0] in "NVJR":
184 | #print "*extend*" + sentenceDAG.node[outNode]["lemma"] + "~" + sentenceDAG[curNode][outNode]["label"]
185 | #print pathStrings.insert(pathIdx, "*extend*" + sentenceDAG.node[outNode]["lemma"] + "~" + sentenceDAG[curNode][outNode]["label"])
186 | tempPathStrings.insert(pathIdx, "*extend*" + sentenceDAG.node[outNode]["word"].lower() + "~" + sentenceDAG[curNode][outNode]["label"])
187 | #print tempPathStrings
188 | strings.append("+".join(tempPathStrings))
189 | elif hasContentWord:
190 | tempPathStrings.insert(pathIdx, "*extend*" + sentenceDAG.node[outNode]["ner"] + "~" + sentenceDAG[curNode][outNode]["label"])
191 | strings.append("+".join(tempPathStrings))
192 |
193 |
194 | # # create additional paths by adding all out-edges from the number token (except for the one taking as back)
195 | # # the number token is the last one on the path
196 | # #outEdgesFromNumber = sentenceDAG.out_edges_iter([path[-1]])
197 | # #for edge in outEdgesFromNumber:
198 | # # the source of the edge we knew
199 | # dummy, outNode = edge
200 | # # if we are not going back
201 | # if outNode != path[-2] and outNode not in numberTokenIDs:
202 | # if sentenceDAG.node[outNode]["ner"] == "O":
203 | # if hasContentWord or sentenceDAG.node[outNode]["pos"][0] in "NVJR":
204 | # strings.append("+".join(pathStrings + ["NUMBER_SLOT~" + sentenceDAG[path[-1]][outNode]["label"] + "~" + sentenceDAG.node[outNode]["lemma"] ]))
205 | # elif hasContentWord:
206 | # strings.append("+".join(pathStrings + ["NUMBER_SLOT~" + sentenceDAG[path[-1]][outNode]["label"] + "~" + sentenceDAG.node[outNode]["ner"] ]))
207 | #
208 | # # do the same for the LOCATION
209 | # outEdgesFromLocation = sentenceDAG.out_edges_iter([path[0]])
210 | # for edge in outEdgesFromLocation:
211 | # # the source of the edge we knew
212 | # dummy, outNode = edge
213 | # # if we are not going on the path
214 | # if outNode != path[1] and outNode not in locationTokenIDs:
215 | # if sentenceDAG.node[outNode]["ner"] == "O":
216 | # if hasContentWord or sentenceDAG.node[outNode]["pos"][0] in "NVJR":
217 | # strings.append("+".join([sentenceDAG.node[outNode]["lemma"] + "~"+ sentenceDAG[path[0]][outNode]["label"]] + pathStrings + ["NUMBER_SLOT"]))
218 | # elif hasContentWord:
219 | # strings.append("+".join([sentenceDAG.node[outNode]["ner"] + "~"+ sentenceDAG[path[0]][outNode]["label"]] + pathStrings + ["NUMBER_SLOT"]))
220 | #
221 |
222 | return strings
223 |
224 | def getSurfacePatternsExtend(sentence, locationTokenIDs, numberTokenIDs, extend=True):
225 | # so this can go either from the location to the number, or the other way around
226 | # if the number token is before the first token of the location
227 | tokenSeqs = []
228 | if numberTokenIDs[-1] < locationTokenIDs[0]:
229 | tokenIDs = range(numberTokenIDs[-1]+1, locationTokenIDs[0])
230 | else:
231 | tokenIDs = range(locationTokenIDs[-1]+1, numberTokenIDs[0])
232 |
233 | # check whether there is a content word:
234 | hasContentWord = False
235 | tokens = []
236 | for id in tokenIDs:
237 | if sentence["tokens"][id]["ner"] == "O":
238 | tokens.append('"' + sentence["tokens"][id]["word"].lower() + '"')
239 | if sentence["tokens"][id]["pos"][0] in "NVJR":
240 | hasContentWord = True
241 | else:
242 | tokens.append('"' + sentence["tokens"][id]["ner"] + '"')
243 |
244 | if numberTokenIDs[-1] < locationTokenIDs[0]:
245 | tokens = ["NUMBER_SLOT"] + tokens + ["LOCATION_SLOT"]
246 | else:
247 | tokens = ["LOCATION_SLOT"] + tokens + ["NUMBER_SLOT"]
248 | if hasContentWord:
249 | tokenSeqs.append(tokens)
250 |
251 | if extend:
252 | lhsID = min(list(numberTokenIDs) + list(locationTokenIDs))
253 | rhsID = max(list(numberTokenIDs) + list(locationTokenIDs))
254 | # add the words to the left
255 | extension = []
256 | extensionHasContentWord = False
257 | for idx in range(lhsID-1, max(-1, lhsID-10),-1):
258 | if sentence["tokens"][idx]["ner"] == "O":
259 | extension = ['"' + sentence["tokens"][idx]["word"].lower() + '"'] + extension
260 | if sentence["tokens"][idx]["pos"][0] in "NVJR":
261 | extensionHasContentWord = True
262 | else:
263 | extension = ['"' + sentence["tokens"][idx]["ner"] + '"'] + extension
264 | # add the extension if it has a content word and the last thing added is not a comma
265 | if (hasContentWord or extensionHasContentWord) and (sentence["tokens"][idx]["word"] != ","):
266 | tokenSeqs.append(copy.copy(extension) + tokens)
267 |
268 | # and now to the right
269 | extension = []
270 | extensionHasContentWord = False
271 | for idx in range(rhsID+1, min(len(sentence["tokens"]), rhsID+9)):
272 | if sentence["tokens"][idx]["ner"] == "O":
273 | extension.append('"' + sentence["tokens"][idx]["word"].lower() + '"')
274 | if sentence["tokens"][idx]["pos"][0] in "NVJR":
275 | extensionHasContentWord = True
276 | else:
277 | extension.append('"' + sentence["tokens"][idx]["ner"] + '"')
278 | # add the extension if it has a content word and the last thing added is not a comma
279 | if (hasContentWord or extensionHasContentWord) and (sentence["tokens"][idx]["word"] != ","):
280 | tokenSeqs.append(tokens + copy.copy(extension))
281 |
282 | return tokenSeqs
283 |
284 |
285 | if __name__ == "__main__":
286 |
287 | parsedJSONDir = sys.argv[1]
288 |
289 | # get all the files
290 | jsonFiles = glob.glob(parsedJSONDir + "/*.json")
291 |
292 | # one json to rule them all
293 | outputFile = sys.argv[2]
294 |
295 | # this forms the columns using the lexicalized dependency and surface patterns
296 | pattern2location2values = {}
297 |
298 | # this keeps the sentences for each pattern
299 | pattern2sentences = {}
300 |
301 | print str(len(jsonFiles)) + " files to process"
302 |
303 | # load the hardcoded names (if any):
304 | tokenizedLocationNames = []
305 | if len(sys.argv) > 3:
306 | names = codecs.open(sys.argv[3], encoding='utf-8').readlines()
307 | for name in names:
308 | tokenizedLocationNames.append(unicode(name).split())
309 | print "Dictionary with hardcoded tokenized location names"
310 | print tokenizedLocationNames
311 |
312 | for fileCounter, jsonFileName in enumerate(jsonFiles):
313 | print "processing " + jsonFileName
314 | with codecs.open(jsonFileName) as jsonFile:
315 | parsedSentences = json.loads(jsonFile.read())
316 |
317 | for sentence in parsedSentences:
318 | # fix the ner tags
319 | if len(tokenizedLocationNames)>0:
320 | dictLocationMatching(sentence, tokenizedLocationNames)
321 |
322 | tokenIDs2number = getNumbers(sentence)
323 | tokenIDs2location = getLocations(sentence)
324 |
325 | # if there was at least one location and one number build the dependency graph:
326 | if len(tokenIDs2number) > 0 and len(tokenIDs2location) > 0 and len(sentence["tokens"])<120:
327 |
328 | sentenceDAG = buildDAGfromSentence(sentence)
329 |
330 | wordsInSentence = []
331 | for token in sentence["tokens"]:
332 | wordsInSentence.append(token["word"])
333 | sample = " ".join(wordsInSentence)
334 |
335 | # for each pair of location and number
336 | # get the pairs of each and find their dependency paths (might be more than one)
337 | for locationTokenIDs, location in tokenIDs2location.items():
338 |
339 | for numberTokenIDs, number in tokenIDs2number.items():
340 |
341 | # keep all the shortest paths between the number and the tokens of the location
342 | shortestPaths = getShortestDepPaths(sentenceDAG, locationTokenIDs, numberTokenIDs)
343 |
344 | # ignore paths longer than some number of deps (= tokens_on_path + 1)
345 | if len(shortestPaths) > 0 and len(shortestPaths[0]) < 10:
346 | for shortestPath in shortestPaths:
347 | pathStrings = depPath2StringExtend(sentenceDAG, shortestPath, locationTokenIDs, numberTokenIDs)
348 | for pathString in pathStrings:
349 | if pathString not in pattern2location2values:
350 | pattern2location2values[pathString] = {}
351 |
352 |
353 | if location not in pattern2location2values[pathString]:
354 | pattern2location2values[pathString][location] = []
355 |
356 | pattern2location2values[pathString][location].append(number)
357 | if pathString in pattern2sentences:
358 | pattern2sentences[pathString].append(sample)
359 | else:
360 | pattern2sentences[pathString] = [sample]
361 |
362 | # now get the surface strings
363 | surfacePatternTokenSeqs = getSurfacePatternsExtend(sentence, locationTokenIDs, numberTokenIDs)
364 | for surfacePatternTokens in surfacePatternTokenSeqs:
365 | if len(surfacePatternTokens) < 15:
366 | surfaceString = ",".join(surfacePatternTokens)
367 | if surfaceString not in pattern2location2values:
368 | pattern2location2values[surfaceString] = {}
369 |
370 |
371 | if location not in pattern2location2values[surfaceString]:
372 | pattern2location2values[surfaceString][location] = []
373 |
374 | pattern2location2values[surfaceString][location].append(number)
375 |
376 | if surfaceString in pattern2sentences:
377 | pattern2sentences[surfaceString].append(sample)
378 | else:
379 | pattern2sentences[surfaceString] = [sample]
380 |
381 | # save every 10000 files
382 | if fileCounter % 10000 == 0:
383 | print str(fileCounter) + " files processed"
384 | with open(outputFile + "_tmp", "wb") as out:
385 | json.dump(pattern2location2values, out)
386 |
387 | with open(outputFile + "_sentences_tmp", "wb") as out:
388 | json.dump(pattern2sentences, out)
389 |
390 |
391 | with open(outputFile, "wb") as out:
392 | json.dump(pattern2location2values, out)
393 |
394 | with open(outputFile + "_sentences", "wb") as out:
395 | json.dump(pattern2sentences, out)
396 |
397 |
398 |
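The surface-pattern construction in getSurfacePatternsExtend can be illustrated with a minimal, self-contained sketch. It is written for Python 3 (the script above is Python 2), covers only the un-extended pattern, and assumes that the tokens strictly between the two mentions form the pattern body. The function name `surface_pattern` and the toy sentence are hypothetical and not part of buildMatrix.py.

```python
def surface_pattern(tokens, number_ids, location_ids):
    """Return one comma-joined surface pattern, or None if it has no content word."""
    number_first = number_ids[-1] < location_ids[0]
    if number_first:
        start, end = number_ids[-1] + 1, location_ids[0]
    else:
        start, end = location_ids[-1] + 1, number_ids[0]

    middle = []
    has_content_word = False
    for i in range(start, end):
        tok = tokens[i]
        if tok["ner"] == "O":
            # plain words are lowercased and quoted
            middle.append('"' + tok["word"].lower() + '"')
            # nouns, verbs, adjectives and adverbs count as content words
            if tok["pos"][0] in "NVJR":
                has_content_word = True
        else:
            # named entities are replaced by their NER tag
            middle.append('"' + tok["ner"] + '"')

    if not has_content_word:
        return None
    if number_first:
        seq = ["NUMBER_SLOT"] + middle + ["LOCATION_SLOT"]
    else:
        seq = ["LOCATION_SLOT"] + middle + ["NUMBER_SLOT"]
    return ",".join(seq)


# toy sentence: "France has a population of 66 million"
toy_tokens = [
    {"word": "France", "ner": "LOCATION", "pos": "NNP"},
    {"word": "has", "ner": "O", "pos": "VBZ"},
    {"word": "a", "ner": "O", "pos": "DT"},
    {"word": "population", "ner": "O", "pos": "NN"},
    {"word": "of", "ner": "O", "pos": "IN"},
    {"word": "66", "ner": "NUMBER", "pos": "CD"},
    {"word": "million", "ner": "NUMBER", "pos": "CD"},
]
toy_pattern = surface_pattern(toy_tokens, (5, 6), (0,))
```

For the toy sentence the location precedes the number, so the pattern starts with LOCATION_SLOT and ends with NUMBER_SLOT.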
--------------------------------------------------------------------------------
/src/main/factChecker.py:
--------------------------------------------------------------------------------
1 | '''
2 |
3 | This script first trains a model using a matrix of text patterns and database (FreeBase) values.
4 |
5 | It then obtains a ranking of the patterns.
6 |
7 | It then checks JSON files parsed and NER-tagged by Stanford CoreNLP and produces the following structure:
8 |
9 | Location:[dep1:[val1, val2], dep2:[val1, val2, ...]]
10 |
11 | It produces a ranking of the sentences according to the relation in question and scores each value by MAPE
12 |
13 | '''
14 |
15 | import json
16 | import sys
17 | import os
18 | import glob
19 | import codecs
20 | import operator
21 | import buildMatrix
22 | import baselinePredictor
23 |
24 |
25 |
26 | # training data
27 | # load the FreeBase file
28 | with open(sys.argv[1]) as freebaseFile:
29 | region2property2value = json.loads(freebaseFile.read())
30 |
31 | # we need to make it property2region2value
32 | property2region2value = {}
33 | for region, property2value in region2property2value.items():
34 | for property, value in property2value.items():
35 | if property not in property2region2value:
36 | property2region2value[property] = {}
37 | property2region2value[property][region] = value
38 |
39 | # text patterns
40 | textMatrix = baselinePredictor.BaselinePredictor.loadMatrix(sys.argv[2])
41 |
42 | # specify which ones are needed:
43 | property = "/location/statistical_region/" + sys.argv[3]
44 |
45 | # first let's train a model
46 |
47 | predictor = baselinePredictor.BaselinePredictor()
48 | params = [True, float(sys.argv[4])]
49 |
50 | # train
51 | predictor.trainRelation(property, property2region2value[property], textMatrix, sys.stdout, params)
52 |
53 | print "patterns kept:"
54 | print predictor.property2patterns[property].keys()
55 |
56 |
57 | # parsed texts to check
58 | parsedJSONDir = sys.argv[5]
59 |
60 | # get all the files
61 | jsonFiles = glob.glob(parsedJSONDir + "/*.json")
62 |
63 |
64 | print str(len(jsonFiles)) + " files to process"
65 |
66 | # load the hardcoded names
67 | tokenizedLocationNames = []
68 | names = codecs.open(sys.argv[6], encoding='utf-8').readlines()
69 | for name in names:
70 | tokenizedLocationNames.append(unicode(name).split())
71 | print "Dictionary with hardcoded tokenized location names"
72 | print tokenizedLocationNames
73 |
74 | # get the aliases
75 | # load the file
76 | with open(sys.argv[7]) as jsonFile:
77 | region2aliases = json.loads(jsonFile.read())
78 |
79 | # so we first need to take the region2aliases dict and turn it into an alias-to-region dict
80 | alias2region = {}
81 | for region, aliases in region2aliases.items():
82 | # add the location as alias to itself
83 | alias2region[region] = region
84 | for alias in aliases:
85 | # so if this alias is used for a different location
86 | if alias in alias2region and region!=alias2region[alias]:
87 | alias2region[alias] = None
88 | alias2region[alias.lower()] = None
89 | else:
90 | # remember to add the lower
91 | alias2region[alias] = region
92 | alias2region[alias.lower()] = region
93 |
94 | # now filter out the Nones
95 | for alias, region in alias2region.items():
96 | if region is None:
97 | print "alias ", alias, " ambiguous"
98 | del alias2region[alias]
99 |
100 | print alias2region
101 |
102 | # store the result as a TSV file; the columns are listed in headers
103 |
104 | tsv = open(sys.argv[8], "wb")
105 |
106 | headers = ['sentence', 'region', 'kb_region', 'property', 'kb_value', 'mape_support_scaling_param', 'pattern', 'value', 'MAPE', 'source_JSON']
107 |
108 | tsv.write("\t".join(headers) + "\n")
109 |
110 | # Now go over each file
111 | for fileCounter, jsonFileName in enumerate(jsonFiles):
112 | #print "processing " + jsonFileName
113 | with codecs.open(jsonFileName) as jsonFile:
114 | parsedSentences = json.loads(jsonFile.read())
115 |
116 | for sentence in parsedSentences:
117 | # skip sentences with more than 120 tokens.
118 | if len(sentence["tokens"])>120:
119 | continue
120 |
121 | # fix the ner tags
122 | if len(tokenizedLocationNames)>0:
123 | buildMatrix.dictLocationMatching(sentence, tokenizedLocationNames)
124 |
125 | wordsInSentence = []
126 | for idx, token in enumerate(sentence["tokens"]):
127 | wordsInSentence.append(token["word"])
128 |
129 | #print "Sentence: " + sentenceText.encode('utf-8')
130 |
131 | # get the numbers mentioned
132 | tokenIDs2number = buildMatrix.getNumbers(sentence)
133 |
134 | # and the locations mentioned in the sentence
135 | tokenIDs2location = buildMatrix.getLocations(sentence)
136 |
137 | # So let's check if the locations are among those that we can fact check for this relation
138 | for locationTokenIDs, location in tokenIDs2location.items():
139 |
140 | # so we have the location, but is it a known region?
141 | region = location
142 | # if the location has an alias
143 | if location in alias2region:
144 | # get it
145 | region = alias2region[location]
146 | elif location.lower() in alias2region:
147 | region = alias2region[location.lower()]
148 |
149 | if region in property2region2value[property]:
150 |
151 | sentenceDAG = buildMatrix.buildDAGfromSentence(sentence)
152 |
153 | for numberTokenIDs, number in tokenIDs2number.items():
154 |
155 | #print "number in text: " + str(number)
156 |
157 | patterns = []
158 | # keep all the shortest paths between the number and the tokens of the location
159 | shortestPaths = buildMatrix.getShortestDepPaths(sentenceDAG, locationTokenIDs, numberTokenIDs)
160 | for shortestPath in shortestPaths:
161 | pathStrings = buildMatrix.depPath2StringExtend(sentenceDAG, shortestPath, locationTokenIDs, numberTokenIDs)
162 | patterns.extend(pathStrings)
163 |
164 | # now get the surface strings
165 | surfacePatternTokenSeqs = buildMatrix.getSurfacePatternsExtend(sentence, locationTokenIDs, numberTokenIDs)
166 | for surfacePatternTokens in surfacePatternTokenSeqs:
167 | if len(surfacePatternTokens) < 15:
168 | surfaceString = ",".join(surfacePatternTokens)
169 | patterns.append(surfaceString)
170 |
171 | patternsApplied = []
172 | for pattern in patterns:
173 | if pattern in predictor.property2patterns[property].keys():
174 | patternsApplied.append(pattern)
175 |
176 | if len(patternsApplied) > 0:
177 | wordsInSentence[numberTokenIDs[0]] = "" + wordsInSentence[numberTokenIDs[0]]
178 | wordsInSentence[numberTokenIDs[-1]] = wordsInSentence[numberTokenIDs[-1]] + ""
179 |
180 | wordsInSentence[locationTokenIDs[0]] = "" + wordsInSentence[locationTokenIDs[0]]
181 | wordsInSentence[locationTokenIDs[-1]] = wordsInSentence[locationTokenIDs[-1]] + ""
182 |
183 | sentenceText = " ".join(wordsInSentence)
184 |
185 | print "Sentence: " + sentenceText.encode('utf-8')
186 | print "location in text " + location.encode('utf-8') + " is known as " + region.encode('utf-8') + " in FB with known " + property + " value " + str(property2region2value[property][region])
187 | print "confidence level= " + str(len(patternsApplied)) + "\t" + str(patternsApplied)
188 | print "sentence states that " + location.encode('utf-8') + " has " + property + " value " + str(number)
189 | if property2region2value[property][region] != 0.0:
190 | mape = abs(number - property2region2value[property][region]) / float(abs(property2region2value[property][region]))
191 | print "MAPE: " + str(mape)
192 | else:
193 | print "MAPE undefined"
194 | mape = "undef"
195 | print "source: " + jsonFileName
196 | print "------------------------------"
197 | details = [sentenceText.encode('utf-8'), location.encode('utf-8'), region.encode('utf-8'), sys.argv[3], str(property2region2value[property][region]), str(len(patternsApplied)),str(patternsApplied), str(number), str(mape), jsonFileName]
198 | tsv.write("\t".join(details) + "\n")
199 |
200 | tsv.close()
201 |
202 |
203 |
204 |
205 |
206 |
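The two core steps of the checking loop, resolving a mention to a known region and scoring the claimed value, can be sketched as follows. This is a simplified Python 3 variant rather than the script's exact logic: it treats an alias as ambiguous regardless of processing order, also lowercases the region names themselves, and returns None instead of the string "undef" when the known value is zero. The function names and the toy regions are hypothetical.

```python
def build_alias_index(region2aliases):
    """Invert region -> aliases into alias -> region, dropping ambiguous aliases."""
    alias2region = {}
    ambiguous = set()
    for region, aliases in region2aliases.items():
        # each region is also an alias of itself
        for alias in [region] + list(aliases):
            for key in (alias, alias.lower()):
                if key in alias2region and alias2region[key] != region:
                    ambiguous.add(key)
                alias2region[key] = region
    for key in ambiguous:
        del alias2region[key]
    return alias2region


def resolve_region(location, alias2region):
    """Map a location mention to a region; unknown mentions are returned unchanged."""
    if location in alias2region:
        return alias2region[location]
    return alias2region.get(location.lower(), location)


def absolute_percentage_error(claimed, known):
    """Error of one claimed value relative to the known value; None if undefined."""
    if known == 0:
        return None
    return abs(claimed - known) / float(abs(known))


# toy data with made-up regions; "FD" is shared and therefore ambiguous
toy_index = build_alias_index({
    "Freedonia": ["FD", "Free Land"],
    "Fredonia Republic": ["FD"],
})
toy_region = resolve_region("free land", toy_index)
toy_error = absolute_percentage_error(110.0, 100.0)
```

A claim of 110 against a known value of 100 scores 0.1, and the shared alias "FD" is left unresolved.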
--------------------------------------------------------------------------------
/src/main/fixedValuePredictor.py:
--------------------------------------------------------------------------------
1 | import abstractPredictor
2 | import numpy
3 | import heapq
4 |
5 | class FixedValuePredictor(abstractPredictor.AbstractPredictor):
6 |
7 | def __init__(self):
8 | # this keeps the best fixed value (0, median or mean) for each relation
9 | self.property2fixedValue = {}
10 |
11 |
12 | def predict(self, property, region, of=None, useDefault=True):
13 | if useDefault:
14 | return self.property2fixedValue[property]
15 | else:
16 | return None
17 |
18 | # TODO: remove the textMatrix from the arg list
19 | def trainRelation(self, property, trainRegion2value, textMatrix, of, params=None):
20 |
21 | # try three options
22 | candidates = [0, numpy.median(trainRegion2value.values()), numpy.mean(trainRegion2value.values())]
23 | #print candidates
24 | bestScore = float("inf")
25 | bestCandidate = None
26 | for candidate in candidates:
27 | prediction = {}
28 | for region in trainRegion2value:
29 | prediction[region] = candidate
30 | mape = abstractPredictor.AbstractPredictor.MAPE(prediction, trainRegion2value)
31 |
32 | if mape < bestScore:
33 | bestScore = mape
34 | bestCandidate = candidate
35 |
36 | if bestCandidate == 0:
37 | of.write(property + " best value is 0 with score " + str(bestScore) + "\n")
38 | elif bestCandidate == numpy.median(trainRegion2value.values()):
39 | of.write(property + " best value is median (" + str(numpy.median(trainRegion2value.values())) + ") with score " + str(bestScore) + "\n")
40 | elif bestCandidate == numpy.mean(trainRegion2value.values()):
41 | of.write(property + " best value is mean (" + str(numpy.mean(trainRegion2value.values())) + ") with score " + str(bestScore) + "\n")
42 | self.property2fixedValue[property] = bestCandidate
43 |
44 |
45 | # TODO: refactor to reuse the above
46 | def train(self, trainMatrix, textMatrix, params=None):
47 | for property, trainRegion2value in trainMatrix.items():
48 | print property, trainRegion2value
49 | # try three options
50 | candidates = [0, numpy.median(trainRegion2value.values()), numpy.mean(trainRegion2value.values())]
51 | bestScore = float("inf")
52 | bestCandidate = None
53 | for candidate in candidates:
54 | prediction = {}
55 | for region in trainRegion2value:
56 | prediction[region] = candidate
57 | mape = abstractPredictor.AbstractPredictor.MAPE(prediction, trainRegion2value)
58 |
59 | if mape < bestScore:
60 | bestScore = mape
61 | bestCandidate = candidate
62 |
63 | if bestCandidate == 0:
64 | print property, " best value is 0 with score ", bestScore
65 | elif bestCandidate == numpy.median(trainRegion2value.values()):
66 | print property, " best value is median with score ", bestScore
67 | elif bestCandidate == numpy.mean(trainRegion2value.values()):
68 | print property, " best value is mean with score ", bestScore
69 | self.property2fixedValue[property] = bestCandidate
70 |
71 |
72 |
73 | if __name__ == "__main__":
74 |
75 | import sys
76 | import os.path
77 | import json
78 |
79 | fixedValuePredictor = FixedValuePredictor()
80 |
81 | trainMatrix = fixedValuePredictor.loadMatrix(sys.argv[1])
82 | textMatrix = fixedValuePredictor.loadMatrix(sys.argv[2])
83 | testMatrix = fixedValuePredictor.loadMatrix(sys.argv[3])
84 |
85 | outputFileName = sys.argv[4]
86 |
87 | #properties = ["/location/statistical_region/population","/location/statistical_region/gdp_real","/location/statistical_region/cpi_inflation_rate"]
88 | #properties = ["/location/statistical_region/population"]
89 | properties = json.loads(open(os.path.dirname(os.path.abspath(sys.argv[1])) + "/featuresKept.json").read())
90 |
91 | property2bestParams = fixedValuePredictor.crossValidate(trainMatrix, textMatrix, 4, properties, outputFileName, [[None]])
92 | #print "OK"
93 |
94 | print property2bestParams
95 | property2MAPE = {}
96 | for property in properties:
97 | paramsStrs = []
98 | for param in property2bestParams[property]:
99 | paramsStrs.append(str(param))
100 |
101 | ofn = outputFileName + "_" + property.split("/")[-1] + "_" + "_".join(paramsStrs) + "_TEST"
102 | a= {}
103 | fixedValuePredictor.runRelEval(a, property, trainMatrix[property], textMatrix, testMatrix[property], ofn, property2bestParams[property])
104 | property2MAPE[property] = a.values()[0]
105 |
106 | for property in sorted(property2MAPE):
107 | print property, property2MAPE[property]
108 | print "avg MAPE:", str(numpy.mean(property2MAPE.values()))
109 |
110 |
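The candidate selection in trainRelation can be sketched in a few lines of Python 3. AbstractPredictor.MAPE is not shown in this part of the dump, so the `mape` helper below is an assumption: it averages the absolute percentage error over regions whose true value is non-zero. The names `mape` and `best_fixed_value` and the toy values are hypothetical.

```python
import statistics


def mape(prediction, truth):
    """Mean absolute percentage error over regions with a non-zero true value (assumed definition)."""
    errors = [abs(prediction[r] - v) / abs(v) for r, v in truth.items() if v != 0]
    return sum(errors) / len(errors)


def best_fixed_value(region2value):
    """Try 0, the median and the mean as constant predictions; keep the lowest MAPE."""
    values = list(region2value.values())
    candidates = {
        "zero": 0.0,
        "median": statistics.median(values),
        "mean": statistics.mean(values),
    }
    best_name, best_score = None, float("inf")
    for name, candidate in candidates.items():
        prediction = {region: candidate for region in region2value}
        score = mape(prediction, region2value)
        if score < best_score:
            best_name, best_score = name, score
    return best_name, candidates[best_name], best_score


# toy training values for one property
toy_name, toy_value, toy_score = best_fixed_value({"A": 1.0, "B": 2.0, "C": 100.0})
```

Tracking the candidate by name avoids comparing floats afterwards to find out which candidate won, which matters when the median or mean happens to be 0.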
--------------------------------------------------------------------------------
/src/main/matrixFiltering.py:
--------------------------------------------------------------------------------
1 | # so this script takes as input a dictionary in json with the following structure:
2 | # dep or string pattern : {location1:[values], location2:[values]}, etc.
3 | # and does the following kinds of filtering:
4 | # - removes locations that have fewer than minNumberOfValues (argument 4) values for a pattern
5 | # - removes patterns for which location lists are all over the place (high stdev)
6 | # - removes patterns that have fewer than minNumberOfLocations (argument 5) locations
7 |
8 | # The second argument is a JSON file mapping (FreeBase) region names to their aliases, which is
9 | # used to condense the matrix (UK and U.K. becoming the same location) and also
10 | # prepares us for the experiments
11 |
12 |
13 | import json
14 | import numpy
15 | import sys
16 | import math
17 |
18 | minNumberOfValues = int(sys.argv[4])
19 | minNumberOfLocations = int(sys.argv[5])
20 | maxAllowedDeviation = float(sys.argv[6])
21 | percentageRemoved = float(sys.argv[7])
22 |
23 |
24 | # helps detect errors
25 | numpy.seterr(all='raise')
26 |
27 | # load the file
28 | with open(sys.argv[1]) as jsonFile:
29 | pattern2locations2values = json.loads(jsonFile.read())
30 |
31 | print "patterns before filtering:", len(pattern2locations2values)
32 |
33 | # load the file
34 | with open(sys.argv[2]) as jsonFile:
35 | region2aliases = json.loads(jsonFile.read())
36 |
37 | # so we first need to take the region2aliases dict and turn it into an alias-to-region dict
38 | alias2region = {}
39 | for region, aliases in region2aliases.items():
40 | # add the location as alias to itself
41 | alias2region[region] = region
42 | for alias in aliases:
43 | # so if this alias is used for a different location
44 | if alias in alias2region and region!=alias2region[alias]:
45 | alias2region[alias] = None
46 | alias2region[alias.lower()] = None
47 | else:
48 | # remember to add the lower
49 | alias2region[alias] = region
50 | alias2region[alias.lower()] = region
51 |
52 | # now filter out the Nones
53 | for alias, region in alias2region.items():
54 | if region is None:
55 | print "alias ", alias, " ambiguous"
56 | del alias2region[alias]
57 |
58 | # ok, let's traverse now all the patterns and any locations we find we match them case independently to the aliases and replace them with the location
59 | for pattern, locations2values in pattern2locations2values.items():
60 | # so here are the locations
61 | # we must be careful in case two or more locations are collapsed to the same region
62 | regions2values = {}
63 | # for each location
64 | for location, values in locations2values.items():
65 | #print location
66 | #print values
67 | #print numpy.isfinite(values)
68 | #if not numpy.all(numpy.isfinite(values)):
69 | # print "ERROR"
70 | region = location
71 | # if the location has an alias
72 | if location in alias2region:
73 | # get it
74 | region = alias2region[location]
75 | elif location.lower() in alias2region:
76 | region = alias2region[location.lower()]
77 |
78 | # if we haven't added it to the regions
79 | if region not in regions2values:
80 | regions2values[region] = values
81 | else:
82 | regions2values[region].extend(values)
83 | # replace the location values of the pattern with the new ones
84 | pattern2locations2values[pattern] = regions2values
85 |
86 |
87 | countNotEnoughValues = 0
88 | for pattern, loc2values in pattern2locations2values.items():
89 | for loc in loc2values.keys():
90 | # so if there are not enough values, remove the location from that pattern
91 | if len(loc2values[loc]) < minNumberOfValues:
92 | del loc2values[loc]
93 | countNotEnoughValues +=1
94 | if len(loc2values) == 0:
95 | del pattern2locations2values[pattern]
96 |
97 | print "set of values removed for having less than", minNumberOfValues, " of values: ", countNotEnoughValues
98 |
99 | countTooMuchDeviation = 0
100 | for pattern, loc2values in pattern2locations2values.items():
101 | initialLocations = len(loc2values)
102 | locationsRemoved = 0
103 | #print pattern
104 | for loc, values in loc2values.items():
105 | #print loc
106 | #print values
107 | a = numpy.array(values)
108 | # if the values have a high stdev after normalizing them between 0 and 1 (only positive values)
109 | # the value should be interpreted as the percentage of the max value allowed as stdev
110 | # we need the largest absolute value
111 | #print a
112 | largestAbsValue = numpy.abs(a).max()
113 | #print largestAbsValue
114 | # if we didn't have all 0
115 | if largestAbsValue > 0 and numpy.std(a/largestAbsValue) > maxAllowedDeviation:
116 | del loc2values[loc]
117 | countTooMuchDeviation += 1
118 | locationsRemoved += 1
119 | # if the pattern has many locations with values all over the place, remove it altogether.
120 | if float(locationsRemoved)/initialLocations > percentageRemoved:
121 | print "pattern ", pattern.encode('utf-8'), " removed because it has more than ",percentageRemoved, " value with large deviation"
122 | del pattern2locations2values[pattern]
123 |
124 | print "sets of values removed for having more than", maxAllowedDeviation, " std deviation : ", countTooMuchDeviation
125 |
126 |
127 | for pattern in pattern2locations2values.keys():
128 | # now make sure there are enough locations left per pattern
129 | if len(pattern2locations2values[pattern]) < minNumberOfLocations:
130 | del pattern2locations2values[pattern]
131 | else:
132 | # if there are enough locations then just keep the average of each location's values
133 | for location in pattern2locations2values[pattern].keys():
134 | pattern2locations2values[pattern][location] = numpy.mean(pattern2locations2values[pattern][location])
135 |
136 | # if the feature has the same values independently of the region, remove it as well:
137 | for pattern,location2values in pattern2locations2values.items():
138 | if min(location2values.values()) == max(location2values.values()):
139 | print "Removing pattern ", pattern.encode('utf-8'), " because it has the same values for all locations:", location2values
140 | del pattern2locations2values[pattern]
141 |
142 | print "patterns after filtering:",len(pattern2locations2values)
143 |
144 | with open(sys.argv[3], "wb") as out:
145 | json.dump(pattern2locations2values, out)
146 |
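The deviation filter above can be sketched in Python 3 with only the standard library. `statistics.pstdev` computes the population standard deviation, which is also what `numpy.std` returns by default. The sketch builds a new dict instead of deleting entries while iterating; the script's in-place deletion relies on Python 2's `items()` returning a list, and raises a RuntimeError under Python 3. The function name and the toy values are hypothetical.

```python
import statistics


def filter_by_deviation(loc2values, max_allowed_deviation):
    """Drop locations whose values, scaled by their largest absolute value, vary too much."""
    kept = {}
    removed = 0
    for loc, values in loc2values.items():
        largest = max(abs(v) for v in values)
        # all-zero value lists have no deviation and are kept
        if largest > 0:
            scaled = [v / largest for v in values]
            if statistics.pstdev(scaled) > max_allowed_deviation:
                removed += 1
                continue
        kept[loc] = values
    return kept, removed


# toy pattern: location "A" is consistent, location "B" is all over the place
toy_kept, toy_removed = filter_by_deviation(
    {"A": [100, 102, 98], "B": [1, 1000]}, 0.1
)
```

With a threshold of 0.1, location "B" is dropped because its scaled values 0.001 and 1.0 have a standard deviation of about 0.5.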
--------------------------------------------------------------------------------
/src/utils/FreeBaseDownload.py:
--------------------------------------------------------------------------------
1 | '''
2 | Created on May 10, 2014
3 |
4 | @author: andreasvlachos
5 |
6 | This script downloads all statistical regions from FreeBase using a combination of the MQL read API and the Topic API.
7 | '''
8 | import json
9 | import urllib
10 |
11 | # TODO: add the part to retrieve the aliases:
12 | # [{ "mid": null, "name": null, "/common/topic/alias": [], "type": "/location/statistical_region", "limit": 100 }]
13 |
14 | api_key = open("/cs/research/intelsys/home1/avlachos/freebaseApiKey").read()
15 | mqlread_url = 'https://www.googleapis.com/freebase/v1/mqlread'
16 | # use the mid instead of the id, as ids do need escaping
17 | mql_query = '[{"mid": null,"name": null, "type": "/location/statistical_region","limit": 100}]'
18 | # set this to the last value we obtained
19 | cursor = ""
20 |
21 | # we need to have a parameter limit=0 as in:
22 | topicService_url = 'https://www.googleapis.com/freebase/v1/topic'
23 | params = {
24 | 'key': api_key,
25 | 'filter': '/location/statistical_region',
26 | 'limit': 0
27 | }
28 |
29 | # Given the quota, we can run this 1000 times daily.
30 | # It stops when the topics are exhausted.
31 |
32 | for i in xrange(1000):
33 | # construct the query
34 | mql_url = mqlread_url + '?query=' + mql_query + "&cursor=" + cursor
35 | print mql_url
36 | statisticalRegionsResult = json.loads(urllib.urlopen(mql_url).read())
37 | #print statisticalRegionsResult
38 | for region in statisticalRegionsResult["result"]:
39 | print region["mid"]# + ":" + region["name"]
40 | # now get the statistical properties
41 | topic_url = topicService_url + region["mid"] + '?' + urllib.urlencode(params)
42 | topicResult = json.loads(urllib.urlopen(topic_url).read())
43 | # print topicResult
44 | topicResult["name"] = region["name"]
45 | filename = region["mid"].split("/")[-1] + ".json"
46 | with open(filename, 'w') as outfile:
47 | json.dump(topicResult, outfile)
48 |
49 | # update the cursor
50 | cursor = statisticalRegionsResult['cursor']
51 | # this cursor can be used to resume the data download
52 | print "New cursor to process"
53 | print cursor
54 | if not cursor:
55 | break
56 |
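The cursor-based download loop can be isolated from the network calls, which makes the stopping and resuming logic easier to see. The following Python 3 sketch assumes a hypothetical `fetch_page(cursor)` callable that returns a dict with "result" and "cursor" keys, mirroring the response fields the script reads. It makes no real API requests.

```python
def download_all(fetch_page, max_requests=1000, cursor=""):
    """Follow cursors until they run out or the request budget is spent."""
    results = []
    for _ in range(max_requests):
        page = fetch_page(cursor)
        results.extend(page["result"])
        # the returned cursor can be used to resume an interrupted download
        cursor = page["cursor"]
        if not cursor:
            break
    return results, cursor


# stand-in for the remote service: two pages, the second one ends the listing
toy_pages = {
    "": {"result": ["/m/aaa", "/m/bbb"], "cursor": "c1"},
    "c1": {"result": ["/m/ccc"], "cursor": False},
}
toy_results, toy_cursor = download_all(lambda cursor: toy_pages[cursor])
```

If the budget runs out first, the returned cursor is non-empty and can be passed back in as `cursor` on the next run.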
--------------------------------------------------------------------------------
/src/utils/bingDownload.py:
--------------------------------------------------------------------------------
1 | # Following the example from here
2 | # http://www.cs.columbia.edu/~gravano/cs6111/Proj1/bing-python.txt
3 |
4 | import urllib2
5 | import base64
6 | import urllib
7 | import json
8 | import os
9 |
10 | # from here we only want to keep the countries
11 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/allCountriesPost2010Filtered15-150.json") as dataFile:
12 | allCountriesData = json.loads(dataFile.read())
13 |
14 | # from here get the property ids
15 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/featuresKept.json") as dataFile:
16 | featuresKept = json.loads(dataFile.read())
17 |
18 | # better lowercase the property names
19 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/allStatisticalRegionProperties.json") as dataFile:
20 | featuresDesc = json.loads(dataFile.read())
21 |
22 |
23 |
24 | propertyKeywords = []
25 | for feature in featuresKept:
26 | # get the name for it and lower case it
27 | for feat in featuresDesc["result"]:
28 | if feat["id"] == feature:
29 | propertyKeywords.append(feat["name"].lower().encode('utf-8'))
30 | #propertyKeywords = ["fertility rate"]
31 | #print propertyKeywords
32 |
33 | countryNames = []
34 | for country in allCountriesData:
35 | countryNames.append(country.encode('utf-8'))
36 |
37 | #countryNames = ["Czech Republic"]
38 | #print countryNames
39 |
40 | bingUrl = 'https://api.datamarket.azure.com/Bing/SearchWeb/v1/Web' # ?Query=%27gates%27&$top=10&$format=json'
41 | #Provide your account key here
42 | accountKey = 'XXX'
43 | accountKeyEnc = base64.b64encode(accountKey + ':' + accountKey)
44 | headers = {'Authorization': 'Basic ' + accountKeyEnc}
45 |
46 | pathName = "/cs/research/intelsys/home1/avlachos/FactChecking/Bing"
47 |
48 | if not os.path.exists(pathName):
49 | print "creating dir " + pathName
50 | os.mkdir(pathName)
51 |
52 |
53 | for keywords in propertyKeywords:
54 | print keywords
55 | # create a directory for the relation
56 | relPathName = pathName + "/" + keywords
57 | if not os.path.exists(relPathName):
58 | print "creating dir " + relPathName
59 | os.mkdir(relPathName)
60 |
61 | for name in countryNames:
62 | print name
63 | params = {
64 | #'format': "Json",
65 | 'Adult': "\'Strict\'",
66 | 'WebFileType' : "\'HTML\'",
67 | }
68 | # the query terms are done with urllib quote in order to get %20 instead of + (bing likes that instead)
69 | #print '\''.encode('utf-8') + name + " " + keywords + u' 2014\''.encode('utf-8')
70 | # one can add in the end this bit 'WebSearchOptions' : "DisableQueryAlterations"
71 | #&WebSearchOptions=%27DisableQueryAlterations%27
72 | # this bit can fetch the second page "$skip=100"
73 | url = bingUrl + "?Query=" + urllib.quote('\''.encode('utf-8') + name + " " + keywords + '\''.encode('utf-8')) + "&" + urllib.urlencode(params) + "&$format=json"
74 | print url
75 | req = urllib2.Request(url, headers = headers)
76 | response = urllib2.urlopen(req)
77 | content = json.loads(response.read())
78 | # content contains the xml/json response from Bing.
79 | print content
80 | # save the json in a file named after the country and the property.
81 | filename = relPathName + "/" + name + ".json"
82 | with open(filename, 'w') as outfile:
83 | json.dump(content["d"]["results"], outfile)
84 |
85 |
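The comment about %20 versus + refers to the difference between `quote` and `urlencode`: the former percent-encodes spaces, the latter turns them into plus signs. A minimal Python 3 sketch of the URL construction follows (the script above uses the Python 2 `urllib` functions). The helper name `build_query_url` and the base URL are placeholders, not a real endpoint.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url, name, keywords, params):
    """Quote the query phrase with %20 for spaces, then append the other parameters."""
    # the phrase is wrapped in single quotes, which quote() encodes as %27
    query = quote("'" + name + " " + keywords + "'")
    return base_url + "?Query=" + query + "&" + urlencode(params) + "&$format=json"


toy_url = build_query_url(
    "https://example.invalid/search",  # placeholder base URL
    "Czech Republic",
    "fertility rate",
    {"Adult": "'Strict'"},
)
```

Here `quote("a b")` gives "a%20b", whereas `urlencode({"q": "a b"})` gives "q=a+b", which is why the query phrase is encoded separately from the remaining parameters.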
--------------------------------------------------------------------------------
/src/utils/dataFiltering.py:
--------------------------------------------------------------------------------
1 | """ This script takes the json extracted from the freebase jsons
2 | and creates a matrix of countries x FreeBase relations.
3 | We probably want to filter out relations and countries that do not
4 | have a lot of values in the data"""
5 |
6 | import json
7 | from collections import Counter
8 |
9 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/allCountriesPost2010.json") as dataFile:
10 | data = json.loads(dataFile.read())
11 |
12 | #print json.dumps(data, sort_keys=True, indent=4)
13 | print len(data)
14 | filteredFeatureCounts = Counter()
15 | filteredCountries = {}
16 | for country, numbers in data.items():
17 | if len(numbers) >= 15:
18 | filteredCountries[country] = numbers
19 | for feature in numbers:
20 | filteredFeatureCounts[feature] += 1
21 |
22 |
23 | print filteredFeatureCounts
24 |
25 | filteredFeatureCountries = {}
26 | featuresKept = []
27 | entriesFilled = 0
28 | for country, numbers in filteredCountries.items():
29 | filteredFeatures = {}
30 | for feature, number in numbers.items():
31 | if filteredFeatureCounts[feature] >= 150:
32 | if feature not in featuresKept:
33 | featuresKept.append(feature)
34 | filteredFeatures[feature] = number
35 | filteredFeatureCountries[country] = filteredFeatures
36 | entriesFilled += len(filteredFeatures)
37 |
38 |
39 | print len(filteredFeatureCountries)
40 | print len(featuresKept)
41 | print entriesFilled
42 |
43 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/allCountriesPost2010Filtered15-150.json", "w") as dataFile:
44 | json.dump(filteredFeatureCountries, dataFile)
45 |
46 | with open("/cs/research/intelsys/home1/avlachos/FactChecking/featuresKept.json", "w") as dataFile:
47 | json.dump(featuresKept, dataFile)
48 |
49 |
50 | #print len(featureCounts)
51 | #print data["Algeria"]
52 | #print data["Germany"]["/location/statistical_region/population"]
53 | #print data["Algeria"]["/location/statistical_region/population"]
54 |
55 | #print featureCounts.most_common(40)
56 |
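The two thresholds hardcoded above (15 properties per country, 150 countries per property) can be exposed as parameters. The following Python 3 sketch shows the same two-stage filter on toy data; the function name `filter_matrix` and the parameter names are hypothetical.

```python
from collections import Counter


def filter_matrix(data, min_features_per_country, min_countries_per_feature):
    """Keep well-populated countries first, then features shared by enough of them."""
    countries = {
        country: features
        for country, features in data.items()
        if len(features) >= min_features_per_country
    }
    # count each feature only over the countries that survived the first stage
    counts = Counter(feat for features in countries.values() for feat in features)
    return {
        country: {
            feat: value
            for feat, value in features.items()
            if counts[feat] >= min_countries_per_feature
        }
        for country, features in countries.items()
    }


toy_data = {
    "A": {"pop": 1, "gdp": 2},
    "B": {"pop": 3, "gdp": 4, "cpi": 5},
    "C": {"pop": 6},
}
toy_filtered = filter_matrix(toy_data, 2, 2)
```

Country "C" is dropped for having a single property, and "cpi" is dropped because only one of the remaining countries reports it.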
--------------------------------------------------------------------------------
/src/utils/dataSplits.py:
--------------------------------------------------------------------------------
1 | '''
2 | Here we keep functions for training/test splits ensuring the matrix still has enough in each row and column
3 |
4 | '''
5 |
6 | import random
7 | import json
8 | import sys
9 |
10 | # load the FreeBase file
11 | with open(sys.argv[1]) as freebaseFile:
12 | region2property2value = json.loads(freebaseFile.read())
13 |
14 | # we need to make it property2region2value
15 | property2region2value = {}
16 | for region, property2value in region2property2value.items():
17 | for property, value in property2value.items():
18 | if property not in property2region2value:
19 | property2region2value[property] = {}
20 | property2region2value[property][region] = value
21 |
22 | trainingPortion = 2.0/3.0
23 |
24 | # for each property, pick the trainingPortion, ensuring that all countries are represented
25 | trainMatrix = {}
26 | testMatrix = {}
27 |
28 | random.seed(3)
29 |
30 | for property, region2value in property2region2value.items():
31 |     trainMatrix[property] = {}
32 |     testMatrix[property] = {}
33 |
34 |     regions = region2value.keys()
35 |     random.shuffle(regions)
36 |
37 |     for idx, region in enumerate(regions):
38 |         if float(idx+1)/float(len(regions)) [...]
[...]
68 |             if len(val["property"][valueType]["values"]) > 0 and len(val["property"][timeType]["values"]) > 0:
69 |                 thisValue = val["property"][valueType]["values"][0]["value"]
70 |                 thisTime = val["property"][timeType]["values"][0]["value"]
71 |             else:
72 |                 continue
73 |
74 |             try:
75 |                 # if the time is given as YYYY-MM or YYYY-MM-DD
76 |                 if thisTime.find("-") > -1:
77 |                     if len(thisTime.split("-")) == 2:
78 |                         thisYear, thisMonth = thisTime.split("-")
79 |                         thisTime = [int(thisYear), int(thisMonth)]
80 |                     elif len(thisTime.split("-")) == 3:
81 |                         # the day of the month is ignored
82 |                         thisYear, thisMonth, thisDay = thisTime.split("-")
83 |                         thisTime = [int(thisYear), int(thisMonth)]
84 |                 else:
85 |                     # or it is just YYYY
86 |                     thisTime = [int(thisTime), 0]
87 |                 # check that the numbers are not future projections!
88 |                 if (thisTime[0] < 2015) and ((mostRecentTime == [0,0]) or (thisTime[0] > mostRecentTime[0]) \
89 |                     or (thisTime[0] == mostRecentTime[0] and thisTime[1] > mostRecentTime[1])):
90 |                     mostRecentTime = thisTime
91 |                     mostRecentValue = thisValue
92 |             # if the time specified cannot be parsed, ignore it
93 |             except ValueError:
94 |                 pass
95 |         # keep the value only if the most recent measurement is from 2010 or later
96 |         if mostRecentTime[0] >= 2010:
97 |             print "Time=" + str(mostRecentTime) + " Value=" + str(mostRecentValue)
98 |             numbers[prop] = mostRecentValue
99 |     return name, numbers
100 |
101 |
102 | if __name__ == '__main__':
103 |     import sys
104 |     dirName = sys.argv[1]
105 |     propertiesOfInterest = {}
106 |     with open(os.path.join(dirName, "..", "propertiesOfInterest.json")) as props:
107 |         propertiesOfInterest = json.loads(props.read())
108 |
109 |     countries2numbers = {}
110 |     totalCountries = 0
111 |     totalNumbers = 0
112 |     uniqueRelations = []
113 |     relation2counts = {}
114 |     rels = []
115 |     for fl in os.listdir(dirName):
116 |         print fl
117 |         jsonFl = open(os.path.join(dirName, fl)).read()
118 |         name, numbers = extractNumericalValues(jsonFl, propertiesOfInterest)
119 |         if name is not None and len(numbers) > 0:
120 |             countries2numbers[name] = numbers
121 |             totalCountries += 1
122 |             for relation in numbers:
123 |                 totalNumbers += 1
124 |                 if relation not in uniqueRelations:
125 |                     uniqueRelations.append(relation)
126 |                     relation2counts[relation] = 0
127 |                 relation2counts[relation] += 1
128 |
129 |     print countries2numbers
130 |     # maybe return a json? Would be useful to be language independent
131 |     print relation2counts
132 |     print "countries with at least one 2010-2014 number: " + str(totalCountries)
133 |     print "total post 2010 numbers: " + str(totalNumbers)
134 |     print "Unique relations: " + str(len(uniqueRelations))
135 |
136 |     with open(os.path.join(dirName, "..", "allCountriesPost2010-2014.json"), 'wb') as dc:
137 |         json.dump(countries2numbers, dc)
138 |
139 |
140 |
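The time handling above (accept `YYYY`, `YYYY-MM` or `YYYY-MM-DD`, ignore the day, skip values from 2015 onwards, and keep the latest measurement only if it dates from 2010 or later) can be isolated into two small helpers. This is a minimal sketch under stated assumptions, not the repository's code: `parse_time`, `most_recent` and the parameter names `first_year` and `cutoff_year` are hypothetical, and the input is assumed to be a list of `(time string, value)` pairs. It runs under Python 2 and Python 3.

```python
def parse_time(text):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into [year, month].

    A bare year gets month 0 and the day is ignored. Any other shape
    raises ValueError (an assumption of this sketch).
    """
    parts = text.split("-")
    if len(parts) == 1:
        return [int(parts[0]), 0]
    if len(parts) in (2, 3):
        return [int(parts[0]), int(parts[1])]
    raise ValueError("unrecognised time format: %r" % text)


def most_recent(measurements, first_year=2010, cutoff_year=2015):
    """Return (time, value) for the latest usable measurement, or None."""
    best_time, best_value = None, None
    for text, value in measurements:
        try:
            when = parse_time(text)
        except ValueError:
            continue  # times that cannot be parsed are skipped
        if when[0] >= cutoff_year:
            continue  # treated as a future projection
        # Lists compare element by element: year first, then month.
        if best_time is None or when > best_time:
            best_time, best_value = when, value
    if best_time is None or best_time[0] < first_year:
        return None
    return best_time, best_value


# Illustrative input only; the values are made up.
sample = [("2009", 1.0), ("2012-06", 2.0), ("2012", 1.5),
          ("2020", 9.9), ("n/a", 0.0)]
result = most_recent(sample)
```

Comparing `[year, month]` lists directly gives the same ordering as the explicit year-then-month comparison in the script.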
--------------------------------------------------------------------------------
/src/utils/obtainAliases.py:
--------------------------------------------------------------------------------
1 | '''
2 |
3 | This script obtains the aliases from Freebase
4 |
5 | '''
6 |
7 | import sys
8 | import json
9 | import urllib
10 |
11 | # load the file
12 | with open(sys.argv[1]) as freebaseFile:
13 |     region2property2value = json.loads(freebaseFile.read())
14 |
15 |
16 | apiKey = open("/cs/research/intelsys/home1/avlachos/freebaseApiKey").read().strip()
17 |
18 | mqlreadUrl = 'https://www.googleapis.com/freebase/v1/mqlread'
19 |
20 | aliasQueryParams = {
21 |     'key': apiKey,
22 | }
23 |
24 | # the limit gives back only one result, which seems to be the most popular and the one we are interested in
25 | aliasQuery = { "/common/topic/alias": [], "type": "/location/statistical_region", "limit":1 }
26 |
27 | region2aliases = {}
28 | for regionName in region2property2value:
29 |     print regionName.encode('utf-8')
30 |     aliasQuery["name"] = regionName
31 |     aliasQueryParams["query"] = json.dumps(aliasQuery)
32 |
33 |     aliasUrl = mqlreadUrl + '?' + urllib.urlencode(aliasQueryParams)
34 |     aliasJSON = json.loads(urllib.urlopen(aliasUrl).read())
35 |     region2aliases[regionName] = aliasJSON["result"]["/common/topic/alias"]
36 |
37 | with open(sys.argv[2], "wb") as out:
38 |     json.dump(region2aliases, out)
39 |
40 | print len(region2aliases), "region names with aliases"
41 |
42 |
43 |
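The script above targets Python 2. Under Python 3, `urlencode` lives in `urllib.parse`, so the request URL built inside the loop could be assembled as sketched below. `build_alias_url` is a hypothetical helper, the API key is a placeholder, and no request is sent; the sketch only shows how the MQL query is serialised into the URL.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

MQLREAD_URL = "https://www.googleapis.com/freebase/v1/mqlread"


def build_alias_url(region_name, api_key):
    """Build the mqlread URL asking for the aliases of one region."""
    query = {
        "/common/topic/alias": [],
        "type": "/location/statistical_region",
        "limit": 1,
        "name": region_name,
    }
    # The MQL query travels as a JSON string in the "query" parameter.
    params = {"key": api_key, "query": json.dumps(query)}
    return MQLREAD_URL + "?" + urlencode(params)


# Round-trip check: decode the URL again to inspect its parameters.
url = build_alias_url("Algeria", "PLACEHOLDER_KEY")
decoded = parse_qs(urlparse(url).query)
```

Building a fresh query dictionary per call avoids mutating a shared `aliasQuery` between iterations.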
--------------------------------------------------------------------------------