├── docs ├── _config.yml └── README.md ├── data └── CORONA_Cube.nc ├── README.md ├── notebooks ├── Collect GDELT events.ipynb ├── Query the news articles.ipynb ├── Spatio Temporal File Creation.ipynb ├── Knowledge Graph Queries.ipynb └── Mapping GDELT Events.ipynb ├── .gitignore └── LICENSE /docs/_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-architect -------------------------------------------------------------------------------- /data/CORONA_Cube.nc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gisfromscratch/gdelt-notebooks/HEAD/data/CORONA_Cube.nc -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## GDELT Notebooks 2 | 3 | This repository contains a collection of notebooks for applying advanced analytics to the Global Database of Events, Language, and Tone (GDELT). 4 | 5 | ### Resources: 6 | - [GDELT Notebooks home page](https://gisfromscratch.github.io/gdelt-notebooks/) 7 | -------------------------------------------------------------------------------- /docs/README.md: -------------------------------------------------------------------------------- 1 | ## GDELT Notebooks 2 | 3 | This repository contains a collection of notebooks for applying advanced analytics to the Global Database of Events, Language, and Tone (GDELT). 4 | 5 | ### Getting started: 6 | 7 | Just take a look at the following notebooks: 8 | 9 | 1. [Mapping the GDELT events](https://github.com/gisfromscratch/gdelt-notebooks/blob/master/notebooks/Mapping%20GDELT%20Events.ipynb) 10 | 2. [Mapping the GDELT knowledge graph](https://github.com/gisfromscratch/gdelt-notebooks/blob/master/notebooks/Knowledge%20Graph%20Queries.ipynb) 11 | 3. 
[Spatiotemporal file creation](https://github.com/gisfromscratch/gdelt-notebooks/blob/master/notebooks/Spatio%20Temporal%20File%20Creation.ipynb) 12 | 4. [Query and aggregate the news articles](https://github.com/gisfromscratch/gdelt-notebooks/blob/master/notebooks/Query%20the%20news%20articles.ipynb) 13 | 14 | ### Requirements: 15 | The arcgis module works with Jupyter <= 5.7.8. Jupyter >= 6.0.0 does not display the map widget. 16 | - arcgis-1.8.0 17 | - gdelt-0.1.10.6 18 | - matplotlib-2.2.2 19 | - seaborn-0.8.1 20 | -------------------------------------------------------------------------------- /notebooks/Collect GDELT events.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Collect GDELT events" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Requirements" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "!pip install --user gdelt" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 6, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "from datetime import date, timedelta \n", 33 | "from gdelt import gdelt as gdelt_client\n", 34 | "import os\n", 35 | "import tempfile" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "## Getting GDELT events" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 30, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "def get_events(event_date):\n", 52 | " version_date = date(2015, 2, 18)\n", 53 | " if event_date < version_date:\n", 54 | " client = gdelt_client(version=1)\n", 55 | " else:\n", 56 | " client = gdelt_client(version=2)\n", 57 | " events = client.Search(event_date.strftime(\"%Y %m %d\"), table=\"events\", coverage=True)\n", 58 | " del 
client\n", 59 | " return events" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "## Filter events by country code (FIPS 10-4)" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 35, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "name": "stdout", 76 | "output_type": "stream", 77 | "text": [ 78 | "1990-01-01\n", 79 | "C:\\Users\\DEVELO~1\\AppData\\Local\\Temp\\AJ_1990_01_01.csv\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "def filter_events(country_code, event_date=date.today()):\n", 85 | " '''FIPS 10-4 country code'''\n", 86 | " events = get_events(event_date)\n", 87 | " return events[events.ActionGeo_CountryCode == country_code]\n", 88 | "\n", 89 | "def save_events(country_code, start_date=date.today(), max_days=365, out_dir=tempfile.gettempdir()):\n", 90 | " file_path = os.path.join(out_dir, \"{0}_{1}.csv\".format(country_code, start_date.strftime(\"%Y_%m_%d\")))\n", 91 | " filter_date = start_date\n", 92 | " has_events = False\n", 93 | " for day_count in range(0, max_days):\n", 94 | " try:\n", 95 | " events = filter_events(country_code, filter_date)\n", 96 | " print(filter_date)\n", 97 | " if not events.empty:\n", 98 | " if not os.path.exists(file_path):\n", 99 | " events.to_csv(file_path, index=False)\n", 100 | " has_events = True\n", 101 | " else:\n", 102 | " events.to_csv(file_path, index=False, header=False, mode=\"a\")\n", 103 | " except ValueError as err:\n", 104 | " print(err)\n", 105 | " filter_date -= timedelta(days=1)\n", 106 | " \n", 107 | " if has_events:\n", 108 | " print(file_path)\n", 109 | " else:\n", 110 | " print(\"No events obtained.\")\n", 111 | " \n", 112 | "def wayback_events(country_code, start_date=date.today(), out_dir=tempfile.gettempdir()):\n", 113 | " min_date = date(1979, 1, 1)\n", 114 | " if start_date < min_date:\n", 115 | " raise ValueError(\"GDELT only supports 'Jan 01 1979 - Present' queries currently. 
Try another date!\")\n", 116 | " \n", 117 | " date_diff = start_date - min_date\n", 118 | " max_days = date_diff.days + 1\n", 119 | " save_events(country_code, start_date, max_days, out_dir)\n", 120 | "\n", 121 | "country_code = \"AJ\"\n", 122 | "#start_date = date.today()\n", 123 | "start_date = date(1990, 1, 1)\n", 124 | "filter_events(country_code, start_date)\n", 125 | "#save_events(country_code, start_date, 1)\n", 126 | "#wayback_events(country_code, start_date)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [] 135 | } 136 | ], 137 | "metadata": { 138 | "kernelspec": { 139 | "display_name": "Python 3", 140 | "language": "python", 141 | "name": "python3" 142 | }, 143 | "language_info": { 144 | "codemirror_mode": { 145 | "name": "ipython", 146 | "version": 3 147 | }, 148 | "file_extension": ".py", 149 | "mimetype": "text/x-python", 150 | "name": "python", 151 | "nbconvert_exporter": "python", 152 | "pygments_lexer": "ipython3", 153 | "version": "3.6.4" 154 | } 155 | }, 156 | "nbformat": 4, 157 | "nbformat_minor": 2 158 | } 159 | -------------------------------------------------------------------------------- /notebooks/Query the news articles.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Query the news articles tagged with CORONAVIRUS\n", 8 | "- Query using a date range\n", 9 | "- Count the news articles by using a time window of 15 minutes" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Requirements" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "name": "stdout", 26 | "output_type": "stream", 27 | "text": [ 28 | "Requirement already satisfied: gdelt in 
c:\\users\\developer\\appdata\\roaming\\python\\python36\\site-packages (0.1.10.6)\n", 29 | "Requirement already satisfied: pandas>=0.20.3 in c:\\users\\developer\\appdata\\roaming\\python\\python36\\site-packages (from gdelt) (1.0.1)\n", 30 | "Requirement already satisfied: numpy in c:\\users\\developer\\appdata\\roaming\\python\\python36\\site-packages (from gdelt) (1.18.1)\n", 31 | "Requirement already satisfied: python-dateutil in c:\\bin\\anaconda3\\lib\\site-packages (from gdelt) (2.6.1)\n", 32 | "Requirement already satisfied: requests in c:\\users\\developer\\appdata\\roaming\\python\\python36\\site-packages (from gdelt) (2.22.0)\n", 33 | "Requirement already satisfied: pytz>=2017.2 in c:\\bin\\anaconda3\\lib\\site-packages (from pandas>=0.20.3->gdelt) (2017.3)\n", 34 | "Requirement already satisfied: six>=1.5 in c:\\bin\\anaconda3\\lib\\site-packages (from python-dateutil->gdelt) (1.11.0)\n", 35 | "Requirement already satisfied: idna<2.9,>=2.5 in c:\\bin\\anaconda3\\lib\\site-packages (from requests->gdelt) (2.6)\n", 36 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\\bin\\anaconda3\\lib\\site-packages (from requests->gdelt) (1.22)\n", 37 | "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\\bin\\anaconda3\\lib\\site-packages (from requests->gdelt) (3.0.4)\n", 38 | "Requirement already satisfied: certifi>=2017.4.17 in c:\\bin\\anaconda3\\lib\\site-packages (from requests->gdelt) (2019.11.28)\n" 39 | ] 40 | }, 41 | { 42 | "name": "stderr", 43 | "output_type": "stream", 44 | "text": [ 45 | "You are using pip version 18.1, however version 20.0.2 is available.\n", 46 | "You should consider upgrading via the 'python -m pip install --upgrade pip' command.\n" 47 | ] 48 | } 49 | ], 50 | "source": [ 51 | "!pip install --user gdelt" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "# Import modules" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 1, 64 | 
"metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "from datetime import date, timedelta\n", 68 | "from gdelt import gdelt as gdelt_client\n", 69 | "import os.path\n", 70 | "import pandas\n", 71 | "import tempfile" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 2, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "def get_graph(date, coverage=False):\n", 81 | " client = gdelt_client(version=2)\n", 82 | " graph = client.Search(date.strftime(\"%Y %m %d\"), table=\"gkg\", coverage=coverage)\n", 83 | " graph = graph.astype({\"DATE\": str})\n", 84 | " graph[\"DATE\"] = graph[\"DATE\"].apply(lambda dateStr: pandas.to_datetime(dateStr[:14], format=\"%Y%m%d%H%M%S\"))\n", 85 | " del client\n", 86 | " return graph\n", 87 | "\n", 88 | "def get_graph_range(from_date, to_date, coverage=False):\n", 89 | " date_range = to_date-from_date\n", 90 | " if date_range.days < 1:\n", 91 | " return\n", 92 | " \n", 93 | " client = gdelt_client(version=2)\n", 94 | " graph = None\n", 95 | " for day in range(0, date_range.days + 1):\n", 96 | " date = from_date + timedelta(days=day)\n", 97 | " graph_temp = client.Search(date.strftime(\"%Y %m %d\"), table=\"gkg\", coverage=coverage)\n", 98 | " graph_temp = graph_temp.astype({\"DATE\": str})\n", 99 | " graph_temp[\"DATE\"] = graph_temp[\"DATE\"].apply(lambda dateStr: pandas.to_datetime(dateStr[:14], format=\"%Y%m%d%H%M%S\"))\n", 100 | " if graph is None:\n", 101 | " graph = graph_temp\n", 102 | " else:\n", 103 | " graph = pandas.concat([graph, graph_temp], axis=0)\n", 104 | " del client\n", 105 | " return graph\n", 106 | "\n", 107 | "def count_by_theme(graph, theme):\n", 108 | " theme_graph = graph.loc[graph[\"V2Themes\"].str.contains(theme, na=False)]\n", 109 | " theme_rank = theme_graph.groupby(\"DATE\")\n", 110 | " return theme_rank.size()\n", 111 | "\n", 112 | "def save_temporary_report(date, theme, name):\n", 113 | " graph = get_graph(date, coverage=True)\n", 114 | " graph_counts = 
count_by_theme(graph, theme)\n", 115 | " graph_counts.columns=[\"COUNT\"]\n", 116 | " csv_file = \"{}/{}_{}.report.csv\".format(tempfile.gettempdir(), name, date.today().strftime(\"%Y%m%d\"))\n", 117 | " if os.path.isfile(csv_file):\n", 118 | " header = False\n", 119 | " else:\n", 120 | " header = graph_counts.columns\n", 121 | " graph_counts.to_csv(csv_file, header=header, mode=\"a\", index=True)\n", 122 | " del graph_counts\n", 123 | " del graph" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "# Query and count the news articles" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 3, 136 | "metadata": {}, 137 | "outputs": [], 138 | "source": [ 139 | "end_date = date.today()-timedelta(days=7)\n", 140 | "start_date = end_date-timedelta(days=90)\n", 141 | "date_range = end_date-start_date\n", 142 | "for day in range(0, date_range.days + 1):\n", 143 | " report_date = start_date + timedelta(days=day)\n", 144 | " save_temporary_report(report_date, \"TAX_DISEASE_CORONAVIRUS\", \"corona\")" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": null, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [] 153 | } 154 | ], 155 | "metadata": { 156 | "kernelspec": { 157 | "display_name": "Python 3", 158 | "language": "python", 159 | "name": "python3" 160 | }, 161 | "language_info": { 162 | "codemirror_mode": { 163 | "name": "ipython", 164 | "version": 3 165 | }, 166 | "file_extension": ".py", 167 | "mimetype": "text/x-python", 168 | "name": "python", 169 | "nbconvert_exporter": "python", 170 | "pygments_lexer": "ipython3", 171 | "version": "3.6.4" 172 | } 173 | }, 174 | "nbformat": 4, 175 | "nbformat_minor": 2 176 | } 177 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ## Ignore Visual Studio temporary files, build results, and 2 | ## 
files generated by popular Visual Studio add-ons. 3 | ## 4 | ## Get latest from https://github.com/github/gitignore/blob/master/VisualStudio.gitignore 5 | 6 | # User-specific files 7 | *.rsuser 8 | *.suo 9 | *.user 10 | *.userosscache 11 | *.sln.docstates 12 | 13 | # User-specific files (MonoDevelop/Xamarin Studio) 14 | *.userprefs 15 | 16 | # Mono auto generated files 17 | mono_crash.* 18 | 19 | # Build results 20 | [Dd]ebug/ 21 | [Dd]ebugPublic/ 22 | [Rr]elease/ 23 | [Rr]eleases/ 24 | x64/ 25 | x86/ 26 | [Aa][Rr][Mm]/ 27 | [Aa][Rr][Mm]64/ 28 | bld/ 29 | [Bb]in/ 30 | [Oo]bj/ 31 | [Ll]og/ 32 | [Ll]ogs/ 33 | 34 | # Visual Studio 2015/2017 cache/options directory 35 | .vs/ 36 | # Uncomment if you have tasks that create the project's static files in wwwroot 37 | #wwwroot/ 38 | 39 | # Visual Studio 2017 auto generated files 40 | Generated\ Files/ 41 | 42 | # MSTest test Results 43 | [Tt]est[Rr]esult*/ 44 | [Bb]uild[Ll]og.* 45 | 46 | # NUnit 47 | *.VisualState.xml 48 | TestResult.xml 49 | nunit-*.xml 50 | 51 | # Build Results of an ATL Project 52 | [Dd]ebugPS/ 53 | [Rr]eleasePS/ 54 | dlldata.c 55 | 56 | # Benchmark Results 57 | BenchmarkDotNet.Artifacts/ 58 | 59 | # .NET Core 60 | project.lock.json 61 | project.fragment.lock.json 62 | artifacts/ 63 | 64 | # StyleCop 65 | StyleCopReport.xml 66 | 67 | # Files built by Visual Studio 68 | *_i.c 69 | *_p.c 70 | *_h.h 71 | *.ilk 72 | *.meta 73 | *.obj 74 | *.iobj 75 | *.pch 76 | *.pdb 77 | *.ipdb 78 | *.pgc 79 | *.pgd 80 | *.rsp 81 | *.sbr 82 | *.tlb 83 | *.tli 84 | *.tlh 85 | *.tmp 86 | *.tmp_proj 87 | *_wpftmp.csproj 88 | *.log 89 | *.vspscc 90 | *.vssscc 91 | .builds 92 | *.pidb 93 | *.svclog 94 | *.scc 95 | 96 | # Chutzpah Test files 97 | _Chutzpah* 98 | 99 | # Visual C++ cache files 100 | ipch/ 101 | *.aps 102 | *.ncb 103 | *.opendb 104 | *.opensdf 105 | *.sdf 106 | *.cachefile 107 | *.VC.db 108 | *.VC.VC.opendb 109 | 110 | # Visual Studio profiler 111 | *.psess 112 | *.vsp 113 | *.vspx 114 | *.sap 115 | 116 | # Visual 
Studio Trace Files 117 | *.e2e 118 | 119 | # TFS 2012 Local Workspace 120 | $tf/ 121 | 122 | # Guidance Automation Toolkit 123 | *.gpState 124 | 125 | # ReSharper is a .NET coding add-in 126 | _ReSharper*/ 127 | *.[Rr]e[Ss]harper 128 | *.DotSettings.user 129 | 130 | # TeamCity is a build add-in 131 | _TeamCity* 132 | 133 | # DotCover is a Code Coverage Tool 134 | *.dotCover 135 | 136 | # AxoCover is a Code Coverage Tool 137 | .axoCover/* 138 | !.axoCover/settings.json 139 | 140 | # Visual Studio code coverage results 141 | *.coverage 142 | *.coveragexml 143 | 144 | # NCrunch 145 | _NCrunch_* 146 | .*crunch*.local.xml 147 | nCrunchTemp_* 148 | 149 | # MightyMoose 150 | *.mm.* 151 | AutoTest.Net/ 152 | 153 | # Web workbench (sass) 154 | .sass-cache/ 155 | 156 | # Installshield output folder 157 | [Ee]xpress/ 158 | 159 | # DocProject is a documentation generator add-in 160 | DocProject/buildhelp/ 161 | DocProject/Help/*.HxT 162 | DocProject/Help/*.HxC 163 | DocProject/Help/*.hhc 164 | DocProject/Help/*.hhk 165 | DocProject/Help/*.hhp 166 | DocProject/Help/Html2 167 | DocProject/Help/html 168 | 169 | # Click-Once directory 170 | publish/ 171 | 172 | # Publish Web Output 173 | *.[Pp]ublish.xml 174 | *.azurePubxml 175 | # Note: Comment the next line if you want to checkin your web deploy settings, 176 | # but database connection strings (with potential passwords) will be unencrypted 177 | *.pubxml 178 | *.publishproj 179 | 180 | # Microsoft Azure Web App publish settings. Comment the next line if you want to 181 | # checkin your Azure Web App publish settings, but sensitive information contained 182 | # in these scripts will be unencrypted 183 | PublishScripts/ 184 | 185 | # NuGet Packages 186 | *.nupkg 187 | # NuGet Symbol Packages 188 | *.snupkg 189 | # The packages folder can be ignored because of Package Restore 190 | **/[Pp]ackages/* 191 | # except build/, which is used as an MSBuild target. 
192 | !**/[Pp]ackages/build/ 193 | # Uncomment if necessary however generally it will be regenerated when needed 194 | #!**/[Pp]ackages/repositories.config 195 | # NuGet v3's project.json files produces more ignorable files 196 | *.nuget.props 197 | *.nuget.targets 198 | 199 | # Microsoft Azure Build Output 200 | csx/ 201 | *.build.csdef 202 | 203 | # Microsoft Azure Emulator 204 | ecf/ 205 | rcf/ 206 | 207 | # Windows Store app package directories and files 208 | AppPackages/ 209 | BundleArtifacts/ 210 | Package.StoreAssociation.xml 211 | _pkginfo.txt 212 | *.appx 213 | *.appxbundle 214 | *.appxupload 215 | 216 | # Visual Studio cache files 217 | # files ending in .cache can be ignored 218 | *.[Cc]ache 219 | # but keep track of directories ending in .cache 220 | !?*.[Cc]ache/ 221 | 222 | # Others 223 | ClientBin/ 224 | ~$* 225 | *~ 226 | *.dbmdl 227 | *.dbproj.schemaview 228 | *.jfm 229 | *.pfx 230 | *.publishsettings 231 | orleans.codegen.cs 232 | 233 | # Including strong name files can present a security risk 234 | # (https://github.com/github/gitignore/pull/2483#issue-259490424) 235 | #*.snk 236 | 237 | # Since there are multiple workflows, uncomment next line to ignore bower_components 238 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622) 239 | #bower_components/ 240 | 241 | # RIA/Silverlight projects 242 | Generated_Code/ 243 | 244 | # Backup & report files from converting an old project file 245 | # to a newer Visual Studio version. 
Backup files are not needed, 246 | # because we have git ;-) 247 | _UpgradeReport_Files/ 248 | Backup*/ 249 | UpgradeLog*.XML 250 | UpgradeLog*.htm 251 | ServiceFabricBackup/ 252 | *.rptproj.bak 253 | 254 | # SQL Server files 255 | *.mdf 256 | *.ldf 257 | *.ndf 258 | 259 | # Business Intelligence projects 260 | *.rdl.data 261 | *.bim.layout 262 | *.bim_*.settings 263 | *.rptproj.rsuser 264 | *- [Bb]ackup.rdl 265 | *- [Bb]ackup ([0-9]).rdl 266 | *- [Bb]ackup ([0-9][0-9]).rdl 267 | 268 | # Microsoft Fakes 269 | FakesAssemblies/ 270 | 271 | # GhostDoc plugin setting file 272 | *.GhostDoc.xml 273 | 274 | # Node.js Tools for Visual Studio 275 | .ntvs_analysis.dat 276 | node_modules/ 277 | 278 | # Visual Studio 6 build log 279 | *.plg 280 | 281 | # Visual Studio 6 workspace options file 282 | *.opt 283 | 284 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.) 285 | *.vbw 286 | 287 | # Visual Studio LightSwitch build output 288 | **/*.HTMLClient/GeneratedArtifacts 289 | **/*.DesktopClient/GeneratedArtifacts 290 | **/*.DesktopClient/ModelManifest.xml 291 | **/*.Server/GeneratedArtifacts 292 | **/*.Server/ModelManifest.xml 293 | _Pvt_Extensions 294 | 295 | # Paket dependency manager 296 | .paket/paket.exe 297 | paket-files/ 298 | 299 | # FAKE - F# Make 300 | .fake/ 301 | 302 | # CodeRush personal settings 303 | .cr/personal 304 | 305 | # Python Tools for Visual Studio (PTVS) 306 | __pycache__/ 307 | *.pyc 308 | 309 | # Cake - Uncomment if you are using it 310 | # tools/** 311 | # !tools/packages.config 312 | 313 | # Tabs Studio 314 | *.tss 315 | 316 | # Telerik's JustMock configuration file 317 | *.jmconfig 318 | 319 | # BizTalk build output 320 | *.btp.cs 321 | *.btm.cs 322 | *.odx.cs 323 | *.xsd.cs 324 | 325 | # OpenCover UI analysis results 326 | OpenCover/ 327 | 328 | # Azure Stream Analytics local run output 329 | ASALocalRun/ 330 | 331 | # MSBuild Binary and Structured Log 332 | *.binlog 333 | 334 | # NVidia Nsight GPU debugger 
configuration file 335 | *.nvuser 336 | 337 | # MFractors (Xamarin productivity tool) working folder 338 | .mfractor/ 339 | 340 | # Local History for Visual Studio 341 | .localhistory/ 342 | 343 | # BeatPulse healthcheck temp database 344 | healthchecksdb 345 | 346 | # Backup folder for Package Reference Convert tool in Visual Studio 2017 347 | MigrationBackup/ 348 | 349 | # Ionide (cross platform F# VS Code tools) working folder 350 | .ionide/ 351 | .ipynb_checkpoints 352 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU LESSER GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | 9 | This version of the GNU Lesser General Public License incorporates 10 | the terms and conditions of version 3 of the GNU General Public 11 | License, supplemented by the additional permissions listed below. 12 | 13 | 0. Additional Definitions. 14 | 15 | As used herein, "this License" refers to version 3 of the GNU Lesser 16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU 17 | General Public License. 18 | 19 | "The Library" refers to a covered work governed by this License, 20 | other than an Application or a Combined Work as defined below. 21 | 22 | An "Application" is any work that makes use of an interface provided 23 | by the Library, but which is not otherwise based on the Library. 24 | Defining a subclass of a class defined by the Library is deemed a mode 25 | of using an interface provided by the Library. 26 | 27 | A "Combined Work" is a work produced by combining or linking an 28 | Application with the Library. The particular version of the Library 29 | with which the Combined Work was made is also called the "Linked 30 | Version". 
31 | 32 | The "Minimal Corresponding Source" for a Combined Work means the 33 | Corresponding Source for the Combined Work, excluding any source code 34 | for portions of the Combined Work that, considered in isolation, are 35 | based on the Application, and not on the Linked Version. 36 | 37 | The "Corresponding Application Code" for a Combined Work means the 38 | object code and/or source code for the Application, including any data 39 | and utility programs needed for reproducing the Combined Work from the 40 | Application, but excluding the System Libraries of the Combined Work. 41 | 42 | 1. Exception to Section 3 of the GNU GPL. 43 | 44 | You may convey a covered work under sections 3 and 4 of this License 45 | without being bound by section 3 of the GNU GPL. 46 | 47 | 2. Conveying Modified Versions. 48 | 49 | If you modify a copy of the Library, and, in your modifications, a 50 | facility refers to a function or data to be supplied by an Application 51 | that uses the facility (other than as an argument passed when the 52 | facility is invoked), then you may convey a copy of the modified 53 | version: 54 | 55 | a) under this License, provided that you make a good faith effort to 56 | ensure that, in the event an Application does not supply the 57 | function or data, the facility still operates, and performs 58 | whatever part of its purpose remains meaningful, or 59 | 60 | b) under the GNU GPL, with none of the additional permissions of 61 | this License applicable to that copy. 62 | 63 | 3. Object Code Incorporating Material from Library Header Files. 64 | 65 | The object code form of an Application may incorporate material from 66 | a header file that is part of the Library. 
You may convey such object 67 | code under terms of your choice, provided that, if the incorporated 68 | material is not limited to numerical parameters, data structure 69 | layouts and accessors, or small macros, inline functions and templates 70 | (ten or fewer lines in length), you do both of the following: 71 | 72 | a) Give prominent notice with each copy of the object code that the 73 | Library is used in it and that the Library and its use are 74 | covered by this License. 75 | 76 | b) Accompany the object code with a copy of the GNU GPL and this license 77 | document. 78 | 79 | 4. Combined Works. 80 | 81 | You may convey a Combined Work under terms of your choice that, 82 | taken together, effectively do not restrict modification of the 83 | portions of the Library contained in the Combined Work and reverse 84 | engineering for debugging such modifications, if you also do each of 85 | the following: 86 | 87 | a) Give prominent notice with each copy of the Combined Work that 88 | the Library is used in it and that the Library and its use are 89 | covered by this License. 90 | 91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license 92 | document. 93 | 94 | c) For a Combined Work that displays copyright notices during 95 | execution, include the copyright notice for the Library among 96 | these notices, as well as a reference directing the user to the 97 | copies of the GNU GPL and this license document. 98 | 99 | d) Do one of the following: 100 | 101 | 0) Convey the Minimal Corresponding Source under the terms of this 102 | License, and the Corresponding Application Code in a form 103 | suitable for, and under terms that permit, the user to 104 | recombine or relink the Application with a modified version of 105 | the Linked Version to produce a modified Combined Work, in the 106 | manner specified by section 6 of the GNU GPL for conveying 107 | Corresponding Source. 
108 | 109 | 1) Use a suitable shared library mechanism for linking with the 110 | Library. A suitable mechanism is one that (a) uses at run time 111 | a copy of the Library already present on the user's computer 112 | system, and (b) will operate properly with a modified version 113 | of the Library that is interface-compatible with the Linked 114 | Version. 115 | 116 | e) Provide Installation Information, but only if you would otherwise 117 | be required to provide such information under section 6 of the 118 | GNU GPL, and only to the extent that such information is 119 | necessary to install and execute a modified version of the 120 | Combined Work produced by recombining or relinking the 121 | Application with a modified version of the Linked Version. (If 122 | you use option 4d0, the Installation Information must accompany 123 | the Minimal Corresponding Source and Corresponding Application 124 | Code. If you use option 4d1, you must provide the Installation 125 | Information in the manner specified by section 6 of the GNU GPL 126 | for conveying Corresponding Source.) 127 | 128 | 5. Combined Libraries. 129 | 130 | You may place library facilities that are a work based on the 131 | Library side by side in a single library together with other library 132 | facilities that are not Applications and are not covered by this 133 | License, and convey such a combined library under terms of your 134 | choice, if you do both of the following: 135 | 136 | a) Accompany the combined library with a copy of the same work based 137 | on the Library, uncombined with any other library facilities, 138 | conveyed under the terms of this License. 139 | 140 | b) Give prominent notice with the combined library that part of it 141 | is a work based on the Library, and explaining where to find the 142 | accompanying uncombined form of the same work. 143 | 144 | 6. Revised Versions of the GNU Lesser General Public License. 
145 | 146 | The Free Software Foundation may publish revised and/or new versions 147 | of the GNU Lesser General Public License from time to time. Such new 148 | versions will be similar in spirit to the present version, but may 149 | differ in detail to address new problems or concerns. 150 | 151 | Each version is given a distinguishing version number. If the 152 | Library as you received it specifies that a certain numbered version 153 | of the GNU Lesser General Public License "or any later version" 154 | applies to it, you have the option of following the terms and 155 | conditions either of that published version or of any later version 156 | published by the Free Software Foundation. If the Library as you 157 | received it does not specify a version number of the GNU Lesser 158 | General Public License, you may choose any version of the GNU Lesser 159 | General Public License ever published by the Free Software Foundation. 160 | 161 | If the Library as you received it specifies that a proxy can decide 162 | whether future versions of the GNU Lesser General Public License shall 163 | apply, that proxy's public statement of acceptance of any version is 164 | permanent authorization for you to choose that version for the 165 | Library. 166 | -------------------------------------------------------------------------------- /notebooks/Spatio Temporal File Creation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Spatio Temporal Files using extracted locations from the GDELT Knowledge Graph\n", 8 | "We are reading the CSV files into a pandas dataframe.\n", 9 | "The pandas dataframes are converted into a netcdf file using latitude, longitude and time.\n", 10 | "We are using xarray for the conversion from dataframe to netcdf." 
11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": {}, 17 | "outputs": [ 18 | { 19 | "name": "stdout", 20 | "output_type": "stream", 21 | "text": [ 22 | "Requirement already satisfied: arcgis in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (1.7.0)\n", 23 | "Requirement already satisfied: netcdf4 in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (1.5.1.2)\n", 24 | "Requirement already satisfied: numpy>=1.7 in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (from netcdf4) (1.16.5)\n", 25 | "Requirement already satisfied: cftime in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (from netcdf4) (1.0.0b1)\n", 26 | "Requirement already satisfied: setuptools>=18.0 in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (from cftime->netcdf4) (41.2.0)\n", 27 | "Collecting cython (from cftime->netcdf4)\n", 28 | " Downloading https://files.pythonhosted.org/packages/79/18/260304dd0108f550eb4c26aac14ee19050e0a028155a93cc41f8782b5395/Cython-0.29.15-cp36-cp36m-win_amd64.whl (1.7MB)\n", 29 | "Installing collected packages: cython\n", 30 | "Successfully installed cython-0.29.15\n" 31 | ] 32 | }, 33 | { 34 | "name": "stderr", 35 | "output_type": "stream", 36 | "text": [ 37 | " WARNING: The scripts cygdb.exe, cython.exe and cythonize.exe are installed in 'C:\\Users\\jts\\AppData\\Roaming\\Python\\Python36\\Scripts' which is not on PATH.\n", 38 | " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n" 39 | ] 40 | }, 41 | { 42 | "name": "stdout", 43 | "output_type": "stream", 44 | "text": [ 45 | "Collecting xarray\n", 46 | " Downloading https://files.pythonhosted.org/packages/e3/25/cc8ccc40d21638ae8514ce2aef1f1db3036e31c2adea797c7501302726fa/xarray-0.15.0-py3-none-any.whl (650kB)\n", 47 | "Requirement 
already satisfied: pandas>=0.25 in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (from xarray) (0.25.1)\n", 48 | "Requirement already satisfied: numpy>=1.15 in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (from xarray) (1.16.5)\n", 49 | "Requirement already satisfied: pytz>=2017.2 in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages\\pytz-2019.2-py3.6.egg (from pandas>=0.25->xarray) (2019.2)\n", 50 | "Requirement already satisfied: python-dateutil>=2.6.1 in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (from pandas>=0.25->xarray) (2.8.0)\n", 51 | "Requirement already satisfied: six>=1.5 in c:\\program files\\arcgis\\pro\\bin\\python\\envs\\arcgispro-py3\\lib\\site-packages (from python-dateutil>=2.6.1->pandas>=0.25->xarray) (1.12.0)\n", 52 | "Installing collected packages: xarray\n", 53 | "Successfully installed xarray-0.15.0\n" 54 | ] 55 | } 56 | ], 57 | "source": [ 58 | "!pip install --user arcgis\n", 59 | "!pip install --user netcdf4\n", 60 | "!pip install --user xarray" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 2, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "from datetime import date\n", 70 | "import os\n", 71 | "import pandas\n", 72 | "import tempfile\n", 73 | "import xarray" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "# Read the extracted locations from the temp folder into a dataframe" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 3, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "def read_gkg_locations_from_temp():\n", 90 | " gkg_locations = None\n", 91 | " with os.scandir(tempfile.gettempdir()) as dir_scanner:\n", 92 | " for dir_entry in dir_scanner:\n", 93 | " if dir_entry.is_file():\n", 94 | " if dir_entry.name.endswith(\".gkg.csv\"):\n", 95 | " gkg_locations_temp 
= pandas.read_csv(dir_entry.path)\n", 96 | " if gkg_locations is None:\n", 97 | " gkg_locations = gkg_locations_temp\n", 98 | " else:\n", 99 | " gkg_locations = pandas.concat([gkg_locations, gkg_locations_temp], axis=0)\n", 100 | " return gkg_locations" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 4, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "corona_locations = read_gkg_locations_from_temp()\n", 110 | "corona_locations" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "# Convert the DATE column to datetime values\n", 118 | "- Drop the original DATE column\n", 119 | "- Rename the columns" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 5, 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "ename": "AttributeError", 129 | "evalue": "'NoneType' object has no attribute 'apply'", 130 | "output_type": "error", 131 | "traceback": [ 132 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 133 | "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", 134 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mcorona_locations\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"time\"\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mcorona_locations\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;32mlambda\u001b[0m \u001b[0mrecord\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mpandas\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mto_datetime\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrecord\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"DATE\"\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mformat\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m\"%Y%m%d%H%M%S\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m 
\u001b[0maxis\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mcorona_locations\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mdrop\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"DATE\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0minplace\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[0mcorona_locations\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mrename\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mcolumns\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;34m\"Location_Lat\"\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;34m\"y\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m\"Location_Lon\"\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;34m\"x\"\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0minplace\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 135 | "\u001b[1;31mAttributeError\u001b[0m: 'NoneType' object has no attribute 'apply'" 136 | ] 137 | } 138 | ], 139 | "source": [ 140 | "corona_locations[\"time\"] = corona_locations.apply(lambda record: pandas.to_datetime(str(record[\"DATE\"]), format=\"%Y%m%d%H%M%S\"), axis=1)\n", 141 | "corona_locations.drop(\"DATE\", axis=1, inplace=True)\n", 142 | "corona_locations.rename(columns = {\"Location_Lat\":\"y\", \"Location_Lon\":\"x\"}, inplace=True)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "# Convert the dataframe to a multidimensional xarray\n", 150 | "**Warning:** Compute intensive, going to stress your CPU and memory!\n", 151 | "- Set the dataframes index using longitude, latitude and date => Reduced to Geometry and time where Geometry is a string of x#y\n", 152 | "- We 
cannot use WKB because bytes are not supported by dataframes, xarray needs hashable objects, and NetCDF does not support tuples or other high-level objects\n", 153 | "- Aggregate into a field named \"count\" and drop the duplicate multi-index entries\n", 154 | "- Convert to an xarray dataset and fill \"not a number\" values in count with 0" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 335, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "text/html": [ 165 | "
<xarray.Dataset>\n",
166 |        "Dimensions:   (Geometry: 12171, time: 172)\n",
167 |        "Coordinates:\n",
168 |        "  * Geometry  (Geometry) object '-0.010375#51.4776' ... '99.9435#7.56991'\n",
169 |        "  * time      (time) datetime64[ns] 2020-03-14 ... 2020-03-15T19:15:00\n",
170 |        "Data variables:\n",
171 |        "    count     (Geometry, time) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
" 172 | ], 173 | "text/plain": [ 174 | "\n", 175 | "Dimensions: (Geometry: 12171, time: 172)\n", 176 | "Coordinates:\n", 177 | " * Geometry (Geometry) object '-0.010375#51.4776' ... '99.9435#7.56991'\n", 178 | " * time (time) datetime64[ns] 2020-03-14 ... 2020-03-15T19:15:00\n", 179 | "Data variables:\n", 180 | " count (Geometry, time) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0" 181 | ] 182 | }, 183 | "execution_count": 335, 184 | "metadata": {}, 185 | "output_type": "execute_result" 186 | } 187 | ], 188 | "source": [ 189 | "def index_by_coordinates_and_time(locations):\n", 190 | " locations_multi = locations[[\"x\", \"y\", \"time\"]].set_index([\"x\", \"y\", \"time\"])\n", 191 | " locations_multi[\"count\"] = locations_multi.groupby(level=[0,1,2]).size()\n", 192 | " locations_multi = locations_multi.loc[~locations_multi.index.duplicated(keep=\"first\")]\n", 193 | " locations_xarray = locations_multi.to_xarray()\n", 194 | " locations_xarray = locations_xarray.fillna(0)\n", 195 | " del locations_multi\n", 196 | " return locations_xarray\n", 197 | "\n", 198 | "def index_by_location_and_time(locations):\n", 199 | " locations_multi = locations[[\"Location_Name\", \"time\"]].set_index([\"Location_Name\", \"time\"])\n", 200 | " locations_multi[\"count\"] = locations_multi.groupby(level=[0,1]).size()\n", 201 | " locations_multi = locations_multi.loc[~locations_multi.index.duplicated(keep=\"first\")]\n", 202 | " locations_xarray = locations_multi.to_xarray()\n", 203 | " locations_xarray = locations_xarray.fillna(0)\n", 204 | " del locations_multi\n", 205 | " return locations_xarray\n", 206 | "\n", 207 | "def to_plaintext(x, y):\n", 208 | " return \"{}#{}\".format(x, y)\n", 209 | "\n", 210 | "def index_by_geometry_and_time(locations):\n", 211 | " locations_multi = locations[[\"x\", \"y\", \"time\"]].copy(deep=True)\n", 212 | " locations_multi[\"Geometry\"] = locations_multi.apply(lambda record: to_plaintext(record[\"x\"], record[\"y\"]), axis=1)\n", 213 | " 
locations_multi = locations_multi[[\"Geometry\", \"time\"]].set_index([\"Geometry\", \"time\"])\n", 214 | " locations_multi[\"count\"] = locations_multi.groupby(level=[0,1]).size()\n", 215 | " locations_multi = locations_multi.loc[~locations_multi.index.duplicated(keep=\"first\")]\n", 216 | " locations_xarray = locations_multi.to_xarray()\n", 217 | " locations_xarray = locations_xarray.fillna(0)\n", 218 | " del locations_multi\n", 219 | " return locations_xarray\n", 220 | "\n", 221 | "#corona_locations_xarray = index_by_coordinates_and_time(corona_locations)\n", 222 | "#corona_locations_xarray = index_by_location_and_time(corona_locations)\n", 223 | "corona_locations_xarray = index_by_geometry_and_time(corona_locations)\n", 224 | "corona_locations_xarray" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 336, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "data": { 234 | "text/html": [ 235 | "
<xarray.DataArray 'count' ()>\n",
236 |        "array(0.1513596)
" 237 | ], 238 | "text/plain": [ 239 | "\n", 240 | "array(0.1513596)" 241 | ] 242 | }, 243 | "execution_count": 336, 244 | "metadata": {}, 245 | "output_type": "execute_result" 246 | } 247 | ], 248 | "source": [ 249 | "corona_locations_xarray[\"count\"].mean()" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "# Save the xarray as a netcdf file\n", 257 | "- **Error:** module 'dask.base' has no attribute 'get_scheduler'\n", 258 | "- **Note:** We had to update dask to version '2.12.0'" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 337, 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [ 267 | "corona_locations_xarray.to_netcdf(\"{}/corona_locations_{}.gkg.nc\".format(tempfile.gettempdir(), date.today().strftime(\"%Y%m%d\"), compute=True))\n", 268 | "\n", 269 | "del corona_locations\n", 270 | "del corona_locations_xarray" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [] 279 | } 280 | ], 281 | "metadata": { 282 | "kernelspec": { 283 | "display_name": "Python 3", 284 | "language": "python", 285 | "name": "python3" 286 | }, 287 | "language_info": { 288 | "codemirror_mode": { 289 | "name": "ipython", 290 | "version": 3 291 | }, 292 | "file_extension": ".py", 293 | "mimetype": "text/x-python", 294 | "name": "python", 295 | "nbconvert_exporter": "python", 296 | "pygments_lexer": "ipython3", 297 | "version": "3.6.9" 298 | } 299 | }, 300 | "nbformat": 4, 301 | "nbformat_minor": 2 302 | } 303 | -------------------------------------------------------------------------------- /notebooks/Knowledge Graph Queries.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Requirements" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | 
"metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "Requirement already satisfied: gdelt in c:\\users\\jts\\appdata\\roaming\\python\\python37\\site-packages (0.1.10.6)\n", 20 | "Requirement already satisfied: python-dateutil in c:\\users\\jts\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages (from gdelt) (2.8.0)\n", 21 | "Requirement already satisfied: numpy in c:\\users\\jts\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages (from gdelt) (1.16.4)\n", 22 | "Requirement already satisfied: pandas>=0.20.3 in c:\\users\\jts\\appdata\\roaming\\python\\python37\\site-packages (from gdelt) (1.0.3)\n", 23 | "Requirement already satisfied: requests in c:\\users\\jts\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages (from gdelt) (2.22.0)\n", 24 | "Requirement already satisfied: six>=1.5 in c:\\users\\jts\\appdata\\roaming\\python\\python37\\site-packages (from python-dateutil->gdelt) (1.12.0)\n", 25 | "Requirement already satisfied: pytz>=2017.2 in c:\\users\\jts\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages (from pandas>=0.20.3->gdelt) (2019.1)\n", 26 | "Requirement already satisfied: idna<2.9,>=2.5 in c:\\users\\jts\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages (from requests->gdelt) (2.8)\n", 27 | "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\jts\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages (from requests->gdelt) (2020.6.20)\n", 28 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\\users\\jts\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages (from requests->gdelt) (1.24.2)\n", 29 | "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\\users\\jts\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages (from requests->gdelt) (3.0.4)\n" 30 | ] 31 | } 32 | ], 33 | "source": [ 34 | "!pip install --user gdelt" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | 
"source": [ 41 | "# Import modules" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "name": "stderr", 51 | "output_type": "stream", 52 | "text": [ 53 | "C:\\Users\\jts\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\statsmodels\\tools\\_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n", 54 | " import pandas.util.testing as tm\n" 55 | ] 56 | } 57 | ], 58 | "source": [ 59 | "from datetime import date, timedelta\n", 60 | "from gdelt import gdelt as gdelt_client\n", 61 | "import matplotlib.pyplot as plot\n", 62 | "import pandas\n", 63 | "import seaborn\n", 64 | "import tempfile" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "# Query the knowledge graph\n", 72 | "Use coverage option for querying all daily records. Otherwise records collected from the last 15 minutes are returned.\n", 73 | "Use the date option to filter by date." 
74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "def get_graph(date, coverage=False):\n", 83 | "    client = gdelt_client(version=2)\n", 84 | "    graph = client.Search(date.strftime(\"%Y %m %d\"), table=\"gkg\", coverage=coverage)\n", 85 | "    graph = graph.astype({\"DATE\": str})\n", 86 | "    graph[\"DATE\"] = graph[\"DATE\"].apply(lambda dateStr: dateStr[:14])\n", 87 | "    del client\n", 88 | "    return graph\n", 89 | "\n", 90 | "def get_graph_range(from_date, to_date, coverage=False):\n", 91 | "    date_range = to_date-from_date\n", 92 | "    if date_range.days < 1:\n", 93 | "        return\n", 94 | "    \n", 95 | "    client = gdelt_client(version=2)\n", 96 | "    graph = None\n", 97 | "    for day in range(0, date_range.days + 1):\n", 98 | "        date = from_date + timedelta(days=day)\n", 99 | "        graph_temp = client.Search(date.strftime(\"%Y %m %d\"), table=\"gkg\", coverage=coverage)\n", 100 | "        graph_temp = graph_temp.astype({\"DATE\": str})\n", 101 | "        graph_temp[\"DATE\"] = graph_temp[\"DATE\"].apply(lambda dateStr: dateStr[:14])\n", 102 | "        if graph is None:\n", 103 | "            graph = graph_temp\n", 104 | "        else:\n", 105 | "            graph = pandas.concat([graph, graph_temp], axis=0)\n", 106 | "    del client\n", 107 | "    return graph\n", 108 | "\n", 109 | "def get_today_graph(coverage=False):\n", 110 | "    return get_graph(date.today(), coverage)\n", 111 | "\n", 112 | "def get_yesterday_graph(coverage=False):\n", 113 | "    return get_graph(date.today()-timedelta(days=1), coverage)" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 4, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "end_date = date.today()-timedelta(days=0)\n", 123 | "start_date = end_date-timedelta(days=1)\n", 124 | "graph = get_graph_range(start_date, end_date, coverage=True)\n", 125 | "corona_records = graph.loc[graph[\"V2Themes\"].str.contains(\"TAX_DISEASE_CORONAVIRUS\", na=False)]\n", 126 | "\n", 127 | "report_date 
= end_date" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "# Save records to temp" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 5, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "corona_records.to_csv(\"{}/corona_{}.csv\".format(tempfile.gettempdir(), report_date.strftime(\"%Y%m%d\")), index=False)" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "# Location type exploding and filtering" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 6, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "from enum import Enum\n", 160 | "\n", 161 | "class location_type(Enum):\n", 162 | "    \"\"\"Location type\n", 163 | "    Defines the different location types.\n", 164 | "    \"\"\"\n", 165 | "    UNKNOWN = 0\n", 166 | "    COUNTRY = 1\n", 167 | "    USSTATE = 2\n", 168 | "    USCITY = 3\n", 169 | "    WORLDCITY = 4\n", 170 | "    WORLDSTATE = 5\n", 171 | "\n", 172 | "class gdelt_location:\n", 173 | "    \"\"\"GDELT location\n", 174 | "    Defines a GDELT location.\n", 175 | "    \"\"\"\n", 176 | "    def __init__(self, location_typeid=0, name=None, country_code=None, admin1_code=None, lat=None, lon=None, feature_id=None):\n", 177 | "        self.location_type = location_type(int(location_typeid))\n", 178 | "        self.location_name = name\n", 179 | "        self.country_code = country_code\n", 180 | "        self.admin1_code = admin1_code\n", 181 | "        self.location_lat = lat\n", 182 | "        self.location_lon = lon\n", 183 | "        self.feature_id = feature_id\n", 184 | "    \n", 185 | "    def has_location_type(self, location_type):\n", 186 | "        return location_type == self.location_type\n", 187 | "    \n", 188 | "    def location_type_matches(self, location_types):\n", 189 | "        return self.location_type in location_types\n", 190 | "    \n", 191 | "    def __str__(self):\n", 192 | "        return str(self.location_name)\n", 193 | "    \n", 194 | "class location_filter():\n", 195 | "    \"\"\"Location Filter\n", 196 | "    Defines different filters which can be applied on the dataframes.\n", 197 | "    \"\"\"\n", 198 | "    def filter_by_type(self, gkg_dataframe, location_type):\n", 199 | "        return gkg_dataframe.loc[gkg_dataframe.apply(lambda record: record[\"GDELT_Locations\"].has_location_type(location_type), axis=1)]\n", 200 | "    \n", 201 | "    def filter_by_types(self, gkg_dataframe, location_types):\n", 202 | "        return gkg_dataframe.loc[gkg_dataframe.apply(lambda record: record[\"GDELT_Locations\"].location_type_matches(location_types), axis=1)]\n", 203 | "\n", 204 | "def split_location_entries(locations):\n", 205 | "    return [gdelt_location(*location) if 7 == len(location) else gdelt_location() for location in locations]\n", 206 | "\n", 207 | "def split_locations(record):\n", 208 | "    return split_location_entries([location.split(\"#\") for location in str(record[\"Locations\"]).split(\";\")])\n" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "# Filter the locations by location type\n", 216 | "- Explode the records on the locations column so that each row holds a single location\n", 217 | "- Filter the rows by location type (e.g. keep only cities)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 7, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "corona_locations = corona_records.copy(deep=True)\n", 227 | "if corona_locations.empty:\n", 228 | "    corona_locations[\"GDELT_Locations\"] = []\n", 229 | "    corona_filtered_locations = corona_locations\n", 230 | "else:\n", 231 | "    corona_locations[\"GDELT_Locations\"] = corona_records.apply(lambda record: split_locations(record), axis=1)\n", 232 | "    corona_locations_exploded = corona_locations.explode(\"GDELT_Locations\")\n", 233 | "\n", 234 | "    type_filter = location_filter()\n", 235 | "    corona_filtered_locations = type_filter.filter_by_types(corona_locations_exploded, [location_type.WORLDCITY, location_type.USCITY])\n", 236 | "\n", 237 | "del corona_records\n", 238 | "del graph" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "# Extract the coordinates and the name from the GDELT location" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 8, 251 | "metadata": {}, 252 | "outputs": [ 253 | { 254 | "data": { 255 | "text/html": [ 256 | "
<div>\n", 270 | "<table border=\"1\" class=\"dataframe\">\n", 271 | "  <thead>\n",
272 | "    <tr style=\"text-align: right;\">\n", 273 | "      <th></th>\n", 274 | "      <th>GKGRECORDID</th>\n", 275 | "      <th>DATE</th>\n", 276 | "      <th>SourceCommonName</th>\n", 277 | "      <th>DocumentIdentifier</th>\n", 278 | "      <th>Location_Name</th>\n", 279 | "      <th>Location_Lat</th>\n", 280 | "      <th>Location_Lon</th>\n", 281 | "    </tr>\n",
282 | "  </thead>\n", 283 | "  <tbody>\n",
284 | "    <tr>\n", 285 | "      <th>7</th>\n", 286 | "      <td>20200820000000-7</td>\n", 287 | "      <td>20200820000000</td>\n", 288 | "      <td>rawstory.com</td>\n", 289 | "      <td>https://www.rawstory.com/2020/08/trump-slammed...</td>\n", 290 | "      <td>Washington, Washington, United States</td>\n", 291 | "      <td>38.8951</td>\n", 292 | "      <td>-77.0364</td>\n", 293 | "    </tr>\n",
294 | "    <tr>\n", 295 | "      <th>7</th>\n", 296 | "      <td>20200820000000-7</td>\n", 297 | "      <td>20200820000000</td>\n", 298 | "      <td>rawstory.com</td>\n", 299 | "      <td>https://www.rawstory.com/2020/08/trump-slammed...</td>\n", 300 | "      <td>White House, District Of Columbia, United States</td>\n", 301 | "      <td>38.8951</td>\n", 302 | "      <td>-77.0364</td>\n", 303 | "    </tr>\n",
304 | "    <tr>\n", 305 | "      <th>69</th>\n", 306 | "      <td>20200820000000-69</td>\n", 307 | "      <td>20200820000000</td>\n", 308 | "      <td>foxnews.com</td>\n", 309 | "      <td>https://www.foxnews.com/faith-values/la-county...</td>\n", 310 | "      <td>Grace Church, California, United States</td>\n", 311 | "      <td>39.0254</td>\n", 312 | "      <td>-121.671</td>\n", 313 | "    </tr>\n",
314 | "    <tr>\n", 315 | "      <th>69</th>\n", 316 | "      <td>20200820000000-69</td>\n", 317 | "      <td>20200820000000</td>\n", 318 | "      <td>foxnews.com</td>\n", 319 | "      <td>https://www.foxnews.com/faith-values/la-county...</td>\n", 320 | "      <td>Los Angeles County, California, United States</td>\n", 321 | "      <td>34.3667</td>\n", 322 | "      <td>-118.201</td>\n", 323 | "    </tr>\n",
324 | "    <tr>\n", 325 | "      <th>69</th>\n", 326 | "      <td>20200820000000-69</td>\n", 327 | "      <td>20200820000000</td>\n", 328 | "      <td>foxnews.com</td>\n", 329 | "      <td>https://www.foxnews.com/faith-values/la-county...</td>\n", 330 | "      <td>Grace Community Church, California, United States</td>\n", 331 | "      <td>39.4424</td>\n", 332 | "      <td>-123.803</td>\n", 333 | "    </tr>\n",
334 | "    <tr>\n", 335 | "      <th>...</th>\n", 336 | "      <td>...</td>\n", 337 | "      <td>...</td>\n", 338 | "      <td>...</td>\n", 339 | "      <td>...</td>\n", 340 | "      <td>...</td>\n", 341 | "      <td>...</td>\n", 342 | "      <td>...</td>\n", 343 | "    </tr>\n",
344 | "    <tr>\n", 345 | "      <th>78261</th>\n", 346 | "      <td>20200821113000-1651</td>\n", 347 | "      <td>20200821113000</td>\n", 348 | "      <td>minsterfm.com</td>\n", 349 | "      <td>https://www.minsterfm.com/news/world/3165820/m...</td>\n", 350 | "      <td>Leicester, Leicester, United Kingdom</td>\n", 351 | "      <td>52.6333</td>\n", 352 | "      <td>-1.13333</td>\n", 353 | "    </tr>\n",
354 | "    <tr>\n", 355 | "      <th>78261</th>\n", 356 | "      <td>20200821113000-1651</td>\n", 357 | "      <td>20200821113000</td>\n", 358 | "      <td>minsterfm.com</td>\n", 359 | "      <td>https://www.minsterfm.com/news/world/3165820/m...</td>\n", 360 | "      <td>Mykonos, Perifereia Notiou Aigaiou, Greece</td>\n", 361 | "      <td>37.45</td>\n", 362 | "      <td>25.3333</td>\n", 363 | "    </tr>\n",
364 | "    <tr>\n", 365 | "      <th>78263</th>\n", 366 | "      <td>20200821113000-1653</td>\n", 367 | "      <td>20200821113000</td>\n", 368 | "      <td>freemalaysiatoday.com</td>\n", 369 | "      <td>https://www.freemalaysiatoday.com/category/wor...</td>\n", 370 | "      <td>Seoul, Soul-T'ukpyolsi, South Korea</td>\n", 371 | "      <td>37.5664</td>\n", 372 | "      <td>127</td>\n", 373 | "    </tr>\n",
374 | "    <tr>\n", 375 | "      <th>78263</th>\n", 376 | "      <td>20200821113000-1653</td>\n", 377 | "      <td>20200821113000</td>\n", 378 | "      <td>freemalaysiatoday.com</td>\n", 379 | "      <td>https://www.freemalaysiatoday.com/category/wor...</td>\n", 380 | "      <td>Wuhan, Hubei, China</td>\n", 381 | "      <td>30.5833</td>\n", 382 | "      <td>114.267</td>\n", 383 | "    </tr>\n",
384 | "    <tr>\n", 385 | "      <th>78264</th>\n", 386 | "      <td>20200821113000-1654</td>\n", 387 | "      <td>20200821113000</td>\n", 388 | "      <td>suntimes.com</td>\n", 389 | "      <td>https://chicago.suntimes.com/politics/2020/8/2...</td>\n", 390 | "      <td>Springfield, Illinois, United States</td>\n", 391 | "      <td>39.8017</td>\n", 392 | "      <td>-89.6437</td>\n", 393 | "    </tr>\n",
394 | "  </tbody>\n", 395 | "</table>\n", 396 | "<p>90004 rows × 7 columns</p>\n", 397 | "</div>
" 398 | ], 399 | "text/plain": [ 400 | " GKGRECORDID DATE SourceCommonName \\\n", 401 | "7 20200820000000-7 20200820000000 rawstory.com \n", 402 | "7 20200820000000-7 20200820000000 rawstory.com \n", 403 | "69 20200820000000-69 20200820000000 foxnews.com \n", 404 | "69 20200820000000-69 20200820000000 foxnews.com \n", 405 | "69 20200820000000-69 20200820000000 foxnews.com \n", 406 | "... ... ... ... \n", 407 | "78261 20200821113000-1651 20200821113000 minsterfm.com \n", 408 | "78261 20200821113000-1651 20200821113000 minsterfm.com \n", 409 | "78263 20200821113000-1653 20200821113000 freemalaysiatoday.com \n", 410 | "78263 20200821113000-1653 20200821113000 freemalaysiatoday.com \n", 411 | "78264 20200821113000-1654 20200821113000 suntimes.com \n", 412 | "\n", 413 | " DocumentIdentifier \\\n", 414 | "7 https://www.rawstory.com/2020/08/trump-slammed... \n", 415 | "7 https://www.rawstory.com/2020/08/trump-slammed... \n", 416 | "69 https://www.foxnews.com/faith-values/la-county... \n", 417 | "69 https://www.foxnews.com/faith-values/la-county... \n", 418 | "69 https://www.foxnews.com/faith-values/la-county... \n", 419 | "... ... \n", 420 | "78261 https://www.minsterfm.com/news/world/3165820/m... \n", 421 | "78261 https://www.minsterfm.com/news/world/3165820/m... \n", 422 | "78263 https://www.freemalaysiatoday.com/category/wor... \n", 423 | "78263 https://www.freemalaysiatoday.com/category/wor... \n", 424 | "78264 https://chicago.suntimes.com/politics/2020/8/2... \n", 425 | "\n", 426 | " Location_Name Location_Lat \\\n", 427 | "7 Washington, Washington, United States 38.8951 \n", 428 | "7 White House, District Of Columbia, United States 38.8951 \n", 429 | "69 Grace Church, California, United States 39.0254 \n", 430 | "69 Los Angeles County, California, United States 34.3667 \n", 431 | "69 Grace Community Church, California, United States 39.4424 \n", 432 | "... ... ... 
\n", 433 | "78261 Leicester, Leicester, United Kingdom 52.6333 \n", 434 | "78261 Mykonos, Perifereia Notiou Aigaiou, Greece 37.45 \n", 435 | "78263 Seoul, Soul-T'ukpyolsi, South Korea 37.5664 \n", 436 | "78263 Wuhan, Hubei, China 30.5833 \n", 437 | "78264 Springfield, Illinois, United States 39.8017 \n", 438 | "\n", 439 | " Location_Lon \n", 440 | "7 -77.0364 \n", 441 | "7 -77.0364 \n", 442 | "69 -121.671 \n", 443 | "69 -118.201 \n", 444 | "69 -123.803 \n", 445 | "... ... \n", 446 | "78261 -1.13333 \n", 447 | "78261 25.3333 \n", 448 | "78263 127 \n", 449 | "78263 114.267 \n", 450 | "78264 -89.6437 \n", 451 | "\n", 452 | "[90004 rows x 7 columns]" 453 | ] 454 | }, 455 | "execution_count": 8, 456 | "metadata": {}, 457 | "output_type": "execute_result" 458 | } 459 | ], 460 | "source": [ 461 | "def to_point_locations(gkg_dataframe):\n", 462 | "    if gkg_dataframe.empty:\n", 463 | "        return pandas.DataFrame(columns=[\"GKGRECORDID\", \n", 464 | "                                         \"DATE\",\n", 465 | "                                         \"SourceCommonName\",\n", 466 | "                                         \"DocumentIdentifier\",\n", 467 | "                                         \"Location_Name\",\n", 468 | "                                         \"Location_Lat\",\n", 469 | "                                         \"Location_Lon\"])\n", 470 | "    \n", 471 | "    point_locations = gkg_dataframe[[\"GKGRECORDID\", \n", 472 | "                                      \"DATE\",\n", 473 | "                                      \"SourceCommonName\",\n", 474 | "                                      \"DocumentIdentifier\",\n", 475 | "                                      \"GDELT_Locations\"]].copy(deep=True)\n", 476 | "    point_locations[\"Location_Name\"] = gkg_dataframe.apply(lambda record: record[\"GDELT_Locations\"].location_name, axis=1)\n", 477 | "    point_locations[\"Location_Lat\"] = gkg_dataframe.apply(lambda record: record[\"GDELT_Locations\"].location_lat, axis=1)\n", 478 | "    point_locations[\"Location_Lon\"] = gkg_dataframe.apply(lambda record: record[\"GDELT_Locations\"].location_lon, axis=1)\n", 479 | "    return point_locations.drop(\"GDELT_Locations\", axis=1)\n", 480 | "\n", 481 | "corona_point_locations = to_point_locations(corona_filtered_locations)\n", 482 | "del corona_filtered_locations\n", 483 | "\n", 484 | "corona_point_locations" 485 | 
] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "# Save point locations to temp" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": 9, 497 | "metadata": {}, 498 | "outputs": [], 499 | "source": [ 500 | "corona_point_locations.to_csv(\"{}/corona_locations_{}.gkg.csv\".format(tempfile.gettempdir(), report_date.strftime(\"%Y%m%d\")), index=False)\n", 501 | "\n", 502 | "del corona_point_locations" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": {}, 509 | "outputs": [], 510 | "source": [] 511 | } 512 | ], 513 | "metadata": { 514 | "kernelspec": { 515 | "display_name": "Python 3", 516 | "language": "python", 517 | "name": "python3" 518 | }, 519 | "language_info": { 520 | "codemirror_mode": { 521 | "name": "ipython", 522 | "version": 3 523 | }, 524 | "file_extension": ".py", 525 | "mimetype": "text/x-python", 526 | "name": "python", 527 | "nbconvert_exporter": "python", 528 | "pygments_lexer": "ipython3", 529 | "version": "3.6.4" 530 | } 531 | }, 532 | "nbformat": 4, 533 | "nbformat_minor": 2 534 | } 535 | -------------------------------------------------------------------------------- /notebooks/Mapping GDELT Events.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Mapping GDELT Events" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Requirements" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "!pip install --user arcgis\n", 24 | "!pip install --user gdelt" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Import modules" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 1, 37 | "metadata": {}, 38 | 
"outputs": [], 39 | "source": [ 40 | "from arcgis.gis import GIS\n", 41 | "from arcgis.features import GeoAccessor as geo\n", 42 | "from datetime import date, timedelta \n", 43 | "from gdelt import gdelt as gdelt_client\n", 44 | "import matplotlib.pyplot as plot\n", 45 | "import seaborn" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "## Getting GDELT events of today or yesterday\n", 53 | "Date must be formatted as a string." 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "def get_events(date):\n", 63 | " client = gdelt_client(version=2)\n", 64 | " events = client.Search(date.strftime(\"%Y %m %d\"), table=\"events\", coverage=True)\n", 65 | " del client\n", 66 | " return events\n", 67 | "\n", 68 | "def get_today_events():\n", 69 | " return get_events(date.today())\n", 70 | "\n", 71 | "def get_yesterday_events():\n", 72 | " return get_events(date.today()-timedelta(days=1))" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 3, 78 | "metadata": {}, 79 | "outputs": [ 80 | { 81 | "data": { 82 | "text/html": [ 83 | "
" 321 | ], 322 | "text/plain": [ 323 | " GLOBALEVENTID SQLDATE MonthYear Year FractionDate \\\n", 324 | "count 2.149300e+04 2.149300e+04 21493.000000 21493.000000 21493.000000 \n", 325 | "mean 9.542017e+08 2.020106e+07 202010.547760 2019.995580 2020.836276 \n", 326 | "std 8.832235e+03 6.633097e+02 6.633868 0.066338 0.066321 \n", 327 | "min 9.541879e+08 2.019111e+07 201911.000000 2019.000000 2019.843800 \n", 328 | "25% 9.541932e+08 2.020111e+07 202011.000000 2020.000000 2020.841100 \n", 329 | "50% 9.542018e+08 2.020111e+07 202011.000000 2020.000000 2020.841100 \n", 330 | "75% 9.542095e+08 2.020111e+07 202011.000000 2020.000000 2020.841100 \n", 331 | "max 9.542170e+08 2.020111e+07 202011.000000 2020.000000 2020.841100 \n", 332 | "\n", 333 | " IsRootEvent QuadClass GoldsteinScale NumMentions NumSources \\\n", 334 | "count 21493.000000 21493.000000 21493.000000 21493.000000 21493.000000 \n", 335 | "mean 0.514633 1.798260 0.454841 4.968455 1.132880 \n", 336 | "std 0.499797 1.106956 4.513479 5.638806 0.910842 \n", 337 | "min 0.000000 1.000000 -10.000000 1.000000 1.000000 \n", 338 | "25% 0.000000 1.000000 -2.000000 2.000000 1.000000 \n", 339 | "50% 1.000000 1.000000 1.000000 4.000000 1.000000 \n", 340 | "75% 1.000000 3.000000 3.400000 6.000000 1.000000 \n", 341 | "max 1.000000 4.000000 10.000000 164.000000 27.000000 \n", 342 | "\n", 343 | " ... Actor1Geo_Type Actor1Geo_Lat Actor1Geo_Long Actor2Geo_Type \\\n", 344 | "count ... 21493.000000 18699.000000 18701.000000 21493.000000 \n", 345 | "mean ... 2.396269 31.790814 -22.413706 1.830736 \n", 346 | "std ... 1.430927 19.521295 77.823786 1.623194 \n", 347 | "min ... 0.000000 -85.622100 -175.000000 0.000000 \n", 348 | "25% ... 1.000000 27.833300 -83.648700 0.000000 \n", 349 | "50% ... 2.000000 37.768000 -70.666700 2.000000 \n", 350 | "75% ... 4.000000 42.149700 35.233300 3.000000 \n", 351 | "max ... 
5.000000 68.900000 179.500000 5.000000 \n", 352 | "\n", 353 | " Actor2Geo_Lat Actor2Geo_Long ActionGeo_Type ActionGeo_Lat \\\n", 354 | "count 14229.000000 14230.000000 21493.000000 20828.000000 \n", 355 | "mean 31.611770 -19.875901 2.678360 31.373551 \n", 356 | "std 19.556769 78.178721 1.245941 19.731835 \n", 357 | "min -45.933300 -172.178309 0.000000 -85.622100 \n", 358 | "25% 27.833300 -83.648700 2.000000 27.748100 \n", 359 | "50% 37.553800 -43.263300 3.000000 37.400100 \n", 360 | "75% 42.149700 35.776800 4.000000 42.149700 \n", 361 | "max 65.000000 179.500000 5.000000 68.900000 \n", 362 | "\n", 363 | " ActionGeo_Long DATEADDED \n", 364 | "count 20830.000000 2.149300e+04 \n", 365 | "mean -22.176108 2.020111e+13 \n", 366 | "std 78.130335 9.312152e+03 \n", 367 | "min -172.178309 2.020111e+13 \n", 368 | "25% -83.648700 2.020111e+13 \n", 369 | "50% -69.397700 2.020111e+13 \n", 370 | "75% 35.233300 2.020111e+13 \n", 371 | "max 179.500000 2.020111e+13 \n", 372 | "\n", 373 | "[8 rows x 22 columns]" 374 | ] 375 | }, 376 | "execution_count": 3, 377 | "metadata": {}, 378 | "output_type": "execute_result" 379 | } 380 | ], 381 | "source": [ 382 | "#events = get_today_events()\n", 383 | "events = get_yesterday_events()\n", 384 | "events.describe()" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "## Plot the number of sources\n", 392 | "We are using a logarithmic scale for plotting the number of sources." 
393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 4, 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "data": { 402 | "text/plain": [ 403 | "Text(0,0.5,'Count')" 404 | ] 405 | }, 406 | "execution_count": 4, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | }, 410 | { 411 | "data": { 412 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAA38AAAGuCAYAAAAgSr8sAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xt0VeWdP+BvkCRG1NSqoHhFKZASBYm0M8UbqFUDnSo6U2YWl2lLvSHYmoqgYkGtEpzq8loEC7VgRVovHUZd00qtdsrSLtIUHA1I4gWtommpUuQSAuf3RxcZYxKEH8m5ZD/PWlmL/e7NyeeclzeHT/Y5++SlUqlUAAAA0Kl1yXQAAAAAOp7yBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAXTMdoL1VVVVlOgIAAEBGlZWVtRjrdOUvovU7SubU1NRESUlJpmPwKcxT9jNHucE85QbzlP3MUW4wT9mprRNiXvYJAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACdA10wGS4tgpT7a5742Zw9OYBAAASCJn/gAAABJA+QMAAEgA5Q8AACABlD8AAIAEUP4AAAASQPkDAABIAOUPAAAgAZQ/AACABFD+AAAAEkD5AwAASADlDwAAIAGUPwAAgARQ/gAAABJA+QMAAEgA5Q8AACABlD8AAIAEUP4AAAASQPkDAABIAOUPAAAgAZQ/AACABFD+AAAAEkD5AwAASADlDwAAIAGUPwAAgARQ/gAAABJA+QMAAEgA5Q8AACABlD8AAIAEUP4AAAASQPkDAABIAOUPAAAgAZQ/AACABFD+AAAAEkD5AwAASADlDwAAIAGUPwAAgARQ/gAAABJA+QMAAEgA5Q8AACABlD8AAIAEUP4AAAASQPkDAABIgJwqf3V1dXHSSSdFY2NjpqMAAADklJwpfx999FHMmjUr9t1330xHAQAAyDk5U/5mzJgRkyZNiqKiokxHAQAAyDk5Uf5+9KMfRVlZWfTv3z/TUQAAAHJSTpS/JUuWxH/913/FmDFjor6+Pi655JJMRwIAAMgpXTMdYHc88cQTTX8eNmxY3H///RlMAwAAkHsycuavoaEhRowYEcuWLWs2Nm3atBg8eHAMGTIk5s6dm4loAAAAnVLaz/xt3bo1KioqYs2aNc3GZ82aFdXV1TF//vxYt25dTJ48OXr27BnDhw9vdtyvf/3rdMYFAADoFNJ65q+2tjb+5V/+JdauXdtsfNOmTbF48eK49tpro7S0NM4666wYP358LFy4MJ3xAAAAOq20nvlbvnx5DBkyJCZOnBgDBw5sGl+1alU0NDREWVlZ01hZWVncd9990djYGF277lnMmpqadsucDrmWd09t2bKl09/HzsA8ZT9zlBvMU24wT9nPHOUG85Rb0lr+Ro0a1ep4fX19FBcXR2FhYdPYIYccEtu2bYv169dH9+7d9+j7lJSU7FXOjvFam3uyM2/7qamp6fT3sTMwT9nPHOUG85QbzFP2M0e5wTxlp6qqqlbHs+KjHjZv3hwFBQXNxnZ
uNzQ0ZCISAABAp5IV5a+wsLBFydu5XVRUlIlIAAAAnUpWlL8ePXrEhg0bmhXA+vr6KCgoiOLi4gwmAwAA6ByyovyVlJREfn5+VFdXN41VVVVF//799/hiLwAAALSUFeWvqKgozj///JgxY0asXLkyli5dGvPmzYuxY8dmOhoAAECnkDWn1aZOnRrTp0+PcePGRbdu3WLChAlRXl6e6VgAAACdQsbK3+rVq5ttFxUVRWVlZVRWVmYoEQAAQOeVFS/7BAAAoGMpfwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAXTMdgMw6dsqTbe57Y+bwNCYBAAA6kjN/AAAACaD8AQAAJIDyBwAAkAA58Z6/hoaGmDx5crz//vtRVFQUt912W3z2s5/NdCwAAICckRNn/p588sk4/PDD46c//WkMHz485syZk+lIAAAAOSUnzvxdcMEF0djYGBER69ati4MPPjjDiQAAAHJLTpS/iIiuXbvGxRdfHC+99FLMmzcv03EAAABySk687HOnOXPmxMMPPxxXXnllpqMAAADklJwofw899FA89NBDERGx3377RZcuOREbAAAga2SkRTU0NMSIESNi2bJlzcamTZsWgwcPjiFDhsTcuXOb9o0YMSKef/75GD16dHz729+Om2++OROxAQAAclba3/O3devWqKioiDVr1jQbnzVrVlRXV8f8+fNj3bp1MXny5OjZs2cMHz48iouL4/777093VAAAgE4jreWvtrY2KioqIpVKNRvftGlTLF68OGbPnh2lpaVRWloa48ePj4ULF8bw4cP3+PvU1NS0V+S0yNa87ZVry5YtWXsf+T/mKfuZo9xgnnKDecp+5ig3mKfcktbyt3z58hgyZEhMnDgxBg4c2DS+atWqaGhoiLKysqaxsrKyuO+++6KxsTG6dt2zmCUlJe2Wuf281uaezObt+Fw1NTVZOid8nHnKfuYoN5in3GCesp85yg3mKTtVVVW1Op7W8jdq1KhWx+vr66O4uDgKCwubxg455JDYtm1brF+/Prp3756uiAAAAJ1SVlw2c/PmzVFQUNBsbOd2Q0NDJiIBAAB0KllR/goLC1uUvJ3bRUVFmYgEAADQqWRF+evRo0ds2LChWQGsr6+PgoKCKC4uzmAyAACAziEryl9JSUnk5+dHdXV101hVVVX0799/jy/2AgAAQEtZUf6Kiori/PPPjxkzZsTKlStj6dKlMW/evBg7dmymowEAAHQKWXNaberUqTF9+vQYN25cdOvWLSZMmBDl5eWZjgUAANApZKz8rV69utl2UVFRVFZWRmVlZYYSAQAAdF5Z8bJPAAAAOpbyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACZM2HvMPuOnbKk7vc/8bM4WlKAgAAucOZPwAAgARQ/gAAABJA+QMAAEgA5Q8AACABlD8AAIAEUP4AAAASQPkDAABIAOUPAAAgAZQ/AACABFD+AAAAEkD5AwAASADlDwAAIAGUPwAAgARol/K3fv369rgZAAAAOshul7+SkpJWS97bb78dZ555ZruGAgAAoH113dXOxx9/PH7+859HREQqlYrLLrssunZt/lfq6+u
je/fuHZcQAACAvbbL8nfOOefEn/70p4iIqKqqikGDBkW3bt2aHdOtW7f48pe/3HEJAQAA2Gu7LH/77bdfXHHFFRERccQRR0R5eXkUFhamJRgAAADtZ5fl7+MuuOCCqKuri//93/+NxsbGSKVSzfZfdNFF7R4OAACA9rHb5W/OnDlx++23R3FxcYuXfubl5Sl/AAAAWWy3y9+iRYviO9/5TlxyySUdmQcAAIAOsNsf9bBhw4Y455xzOjILAAAAHWS3y99Xv/rVWLRoUYv3+gEAAJD9dvtln3/961/jl7/8ZSxZsiSOOOKIyM/Pb7b/oYceavdwAAAAtI/dLn/HHXdcXHrppR2ZBQAAgA6y2+Vv5+f9AQAAkHt2u/xNnjx5l/tnzZq112EAAADoGLt9wZd99tmn2VcqlYq1a9fGf//3f8dhhx3WkRkBAADYS7t95u/WW29tdXz+/PnxyiuvtFsgAAAA2t9un/lry9lnnx3PPPNMe2QBAACgg+z2mb8dO3a0GNu4cWM8+OCDcdBBB7VrKMhFx055cpf735g5PE1JAACgpd0uf5///OcjLy+vxXhhYWHcfPPN7RoKAACA9rXb5e8nP/lJs+28vLzIz8+P3r17x/7779/uwQAAAGg/u13+vvCFL0RERF1dXdTV1cX27dujV69eih8AAEAO2O3y9+GHH8Y111wTv/nNb6K4uDi2b98eH330UZx88slx3333xQEHHNCROQEAANgLu321z5tuuinq6+vj6aefjhdffDGWL18eS5Ysic2bN7f5MRAAAABkh90uf88++2zMmDEjevXq1TTWu3fvuOGGG2Lp0qUdEg4AAID2sdvlb9999211PC8vL7Zv395ugQAAAGh/u13+hg0bFjfeeGO8/vrrTWOvvfZa3HTTTTF06NAOCQcAAED72O0Lvlx99dUxYcKEOO+885qu8PnRRx/F6aefHtOmTeuwgAAAAOy93Sp/K1eujL59+8aCBQti9erVUVdXFw0NDXHkkUfGySef3NEZAQAA2Eu7fNlnY2NjXH311fG1r30tVqxYERERffv2jfLy8njuuedizJgxcf3113vPHwAAQJbbZfmbN29evPjii/GTn/yk6UPed7rjjjti/vz5sXTp0liwYEGHhgQAAGDv7LL8Pf744zFt2rQYPHhwq/v/4R/+ISZPnhw///nPOyQcAAAA7WOX5e/dd9+Nz3/+87u8gZNPPjnefvvtdg0FAABA+9pl+TvkkEM+tdi98847cdBBB7VrKAAAANrXLsvf2WefHXfffXds27at1f3btm2Le+65J0477bQOCQcAAED72OVHPVx++eVx0UUXxciRI2PMmDFRWloaBxxwQHz44YexcuXKeOihh2Lr1q1x++23pysvAAAA/x92Wf4OOOCAWLx4cdx2220xc+bM2Lx5c0REpFKpKC4ujhEjRsSECRPis5/9bIeG3LJlS0yePDn+8pe/RGNjY0yZMiVOOumkDv2ekBTHTnnyY1uvtdj/xszh6QsDAECH+dQPeS8uLo6bb745brjhhnjrrbdiw4YNcdBBB8XRRx8dXbrs8lWj7Wbx4sXxuc99Lu6666547bXXXGEUAABgD31q+dupoKAgjj/++I7M0qaRI0c2Fc3t27dHfn5+RnIAAADkqt0uf5m0//77R0TEX/7yl5g8eXJMnTo1w4kAAAByS3pet9kO6urq4t///d/jyiuvjC984QuZjgMAAJBTcuLM35/+9KeYMGFC3HbbbXHCCSdkOg4AAEDOyciZv4aGhhgxYkQsW7as2di0adNi8ODBMWTIkJg7d27TvnvuuSc2b94cs2bNijFjxsSkSZMyERsAACBnpf3M39atW6OioiLWrFnTbHzWrFlRXV0d8+fPj3Xr1sXkyZOjZ8+eMXz48Lj11lvTHRMAAKBTSWv5q62tjYqKikilUs3GN23aFIsXL47Zs2dHaWlplJaWxvjx42PhwoUxfPief8ZYTU1Ne0VOi2zN216
5tmzZktb72Nkfz3TL1dydUbrXEv9/zFNuME/ZzxzlBvOUW9Ja/pYvXx5DhgyJiRMnxsCBA5vGV61aFQ0NDVFWVtY0VlZWFvfdd180NjZG1657FrOkpKTdMreflh+evVNm83Z8rpqamna+j21njsjk45mtuT5NruZOnvZfS3QE85QbzFP2M0e5wTxlp6qqqlbH01r+Ro0a1ep4fX19FBcXR2FhYdPYIYccEtu2bYv169dH9+7d0xURAACgU8qKj3rYvHlzFBQUNBvbud3Q0JCJSAAAAJ1KVpS/wsLCFiVv53ZRUVEmIgEAAHQqWVH+evToERs2bGhWAOvr66OgoCCKi4szmAwAAKBzyIryV1JSEvn5+VFdXd00VlVVFf3799/ji70AAADQUlY0q6Kiojj//PNjxowZMXPmzKivr4958+bFTTfdlOloAJ3CsVOebHPfGzP3/CN1AIDckxXlLyJi6tSpMX369Bg3blx069YtJkyYEOXl5ZmOBQAA0ClkrPytXr262XZRUVFUVlZGZWVlhhIBAAB0Xllz5g8APm5XL1WN8HJVANhTWXHBFwAAADqW8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAvicPyAr+Yw3AID25cwfAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkABdMx0AADqT8x58LSJea3P/GzOHpy8MAHyMM38AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACdA10wEAOpNjpzzZ5r43Zg5PYxIAgOac+QMAAEgA5Q8AACABlD8AAIAEUP4AAAASQPkDAABIAOUPAAAgAZQ/AACABFD+AAAAEkD5AwAASADlDwAAIAGUPwAAgATIyfL3zDPPxNVXX53pGAAAADmja6YD7KnKysp49tlno7S0NNNRAAAAckbOnfk78cQTY/r06ZmOAQAAkFNyrvydd955kZeXl+kYAAAAOSXnyh8AAAB7TvkDAABIgIyWv4aGhhgxYkQsW7as2di0adNi8ODBMWTIkJg7d24GEwIAAHQOGbva59atW6OioiLWrFnTbHzWrFlRXV0d8+fPj3Xr1sXkyZOjZ8+eMXz48KZjvvjFL8YXv/jFdEcGAADIWRkpf7W1tVFRURGpVKrZ+KZNm2Lx4sUxe/bsKC0tjdLS0hg/fnwsXLiwWfn7NDU1Ne0duUNla972yrVly5a03sfO/nimW7bmztZcu7K3mTtqLeXiYxkhN3sn3c9N7DlzlBvMU27JSPlbvnx5DBkyJCZOnBgDBw5sGl+1alU0NDREWVlZ01hZWVncd9990djYGF277l7ckpKSds+8915rc09m83Z8rpqamna+j21njsjk45mtuT5NtubO1lyfpuPW1N6tpWz9GbQrne/fQEQ2506W9n9uor2Zo9xgnrJTVVVVq+MZKX+jRo1qdby+vj6Ki4ujsLCwaeyQQw6Jbdu2xfr166N79+7piggAANCpZNXVPjdv3hwFBQXNxnZuNzQ0ZCISAABAp5CxC760prCwsEXJ27ldVFSUiUgAQAYdO+XJNve9MXP3rwcAQJad+evRo0ds2LChWQGsr6+PgoKCKC4uzmAyAACA3JZV5a+kpCTy8/Ojurq6aayqqir69++/2xd7AQAAoKWsKn9FRUVx/vnnx4wZM2LlypWxdOnSmDdvXowdOzbT0QAAAHJa1p1Omzp1akyfPj3GjRsX3bp1iwkTJkR5eXmmYwEAAOS0jJe/1atXN9suKiqKysrKqKyszFAiAACAzierXvYJAABAx1D+AAAAEkD5AwA
ASADlDwAAIAGUPwAAgARQ/gAAABJA+QMAAEiAjH/OHwDQ8Y6d8uQu978xc3iakgCQKc78AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSAq30CAAC0YldXSs7FqyQ78wcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkgPIHAACQAMofAABAAih/AAAACaD8AQAAJIDyBwAAkADKHwAAQAIofwAAAAmg/AEAACSA8gcAAJAAyh8AAEACKH8AAAAJoPwBAAAkQNdMBwAAyDXHTnmyzX1vzByexiQAu8+ZPwAAgARQ/gAAABJA+QMAAEgA5Q8AACABlD8AAIAEUP4AAAASQPkDAABIAOUPAAAgAZQ/AACABFD+AAAAEkD5AwAASADlDwAAIAGUPwAAgATomukAu2PHjh0xffr0ePXVVyM/Pz9uueWWOOqoozIdCwAAIGfkxJm/X/3qV7F9+/ZYtGhRXHnllTFr1qxMRwIAAMgpOVH+/vCHP8Spp54aEREnn3xyvPTSSxlOBAAAkFtyovxt3LgxDjjggKbtVCqVwTQAAAC5JyfK3/777x8fffRR0/Y+++yTwTQAAAC5JyfK36BBg+L555+PiIjly5dHSUlJhhMBAADkloyUv4aGhhgxYkQsW7as2di0adNi8ODBMWTIkJg7d27TvrPPPju6dOkSo0aNiv/4j/+IyZMnZyI2AABAzkr7Rz1s3bo1KioqYs2aNc3GZ82aFdXV1TF//vxYt25dTJ48OXr27BnDhw+PLl26xI033pjuqAAAAJ1GWstfbW1tVFRUtLhgy6ZNm2Lx4sUxe/bsKC0tjdLS0hg/fnwsXLgwhg8fvsffp6ampr0ip0W25m2vXFu2bEnrfezsj2e6ZWvubM21K3ubuaPWUi4+lhFyt7dszbUre5O5I5+bcvGxzEbp/v/Dnjrvwdd2uf/pccelKUlmZfs8daRcvN9pLX/Lly+PIUOGxMSJE2PgwIFN46tWrYqGhoYoKytrGisrK4v77rsvGhsbo2vXPYuZne8JbPsHRGbzdnyumpqadr6Pu/5hm7nHM1tzfZpszZ2tuT5Nx62pvVtL2fozaFc637+BCGtqz3XMv929f27KxTWVW9r//w/tLVfXVPvK/nnaW7m51quqqlodT2v5GzVqVKvj9fX1UVxcHIWFhU1jhxxySGzbti3Wr18f3bt3T1dEAACATikrrva5efPmKCgoaDa2c7uhoSETkQAAADqVrCh/hYWFLUrezu2ioqJMRAIAAOhUsqL89ejRIzZs2NCsANbX10dBQUEUFxdnMBkAAEDnkBXlr6SkJPLz86O6urpprKqqKvr377/HF3sBAACgpawof0VFRXH++efHjBkzYuXKlbF06dKYN29ejB07NtPRAAAAOoWsOa02derUmD59eowbNy66desWEyZMiPLy8kzHAgAA6BQyVv5Wr17dbLuoqCgqKyujsrIyQ4kAAAA6r6x42ScAAAAdS/kDAABIAOUPAAAgAZQ/AACABFD+AAAAEkD5AwAASADlDwAAIAHyUqlUKtMh2lNVVVWmIwAAAGRUWVlZi7FOV/4AAABoycs+AQAAEkD5AwAASADlj722du3auPTSS2Pw4MFx2mmnxcyZM2Pr1q2tHvuNb3wj+vbt2+zrmWeeSXPi5FmyZEmLx/3yyy9v9dhVq1bF1772tRgwYECMHDkyVq5cmea0yfTYY4+1mKOdX++8806L462l9GtoaIgRI0bEsmXLmsY++OCDmDRpUgwaNCiGDRsWjz/++C5v44UXXoivfOUrMWDAgBgzZky8+eabHR07cVqbp5dffjnGjBkTJ510UgwbNizuv//+2LFjR5u3cc4557RYXzU1NemInwitzdHs2bNbPObf//7327wNa6njfXKe7r7
77lafo/r169fmbVhL2adrpgOQ2xoaGuLSSy+N3r17x6JFi+Ivf/lLXHvttRERMWXKlBbH19bWxh133BGDBw9uGisuLk5b3qSqra2Ns88+O773ve81jRUWFrY4btOmTTF+/PgoLy+PW265JRYtWhSXXHJJ/OpXv4r9998/nZETp7y8PE499dSm7R07dsRll10WRx55ZPTs2bPF8dZSem3dujUqKipizZo1zcanTJkSmzZtiocffjheeumluOGGG+KYY46JQYMGtbiNd999Ny677LK4/PLLY+jQoXHvvffG5ZdfHkuWLIkuXfwutj20Nk8ffPBBfOtb34rzzjsvbrzxxnjjjTdiypQpsd9++8WYMWNa3EZDQ0O89dZb8fDDD8dRRx3VNH7QQQel5T50dm2tpdra2hgzZkxccsklTWNFRUWt3oa11PFam6dvfOMbMWrUqKbtLVu2xOjRo6O8vLzV27CWspPyx15ZuXJlrF27Nn72s59Ft27d4vjjj48rr7wyZs6c2aL8bdy4Md5777048cQT49BDD81Q4mSqq6uLvn37furj/tRTT0V+fn5MmTIlunTpEtdee20899xz8fTTT8c///M/pyltMu27776x7777Nm0vXLgw3nnnnZg/f36LY62l9KqtrY2Kior45PXR1q5dG88++2z88pe/jGOOOSb69u0b1dXV8dOf/rTV8rd48eLo169ffOtb34qIiFtuuSWGDBkSL7zwQnzpS19Ky33pzNqap+eeey66du0a1113XXTp0iV69eoVX//612PJkiWtlr/XXnst8vLy4oQTToj8/Px0xU+EtuYo4u/PU2PGjNmtn2nWUsdqa566desW3bp1a9qurKyMbt26xVVXXdXq7VhL2cmvR9grxx13XMyZM6fZD4O8vLxoaGhocWxtbW0UFha2ehaDjlVbWxu9evX61ONWrFgRgwYNavrNaV5eXgwaNCiqq6s7OiIfs3Hjxrjnnnti0qRJrZ7Ns5bSa/ny5TFkyJB45JFHmo2vWLEiDj300DjmmGOaxsrKyuKPf/xjq7ezYsWKZmdqi4qKon///tZXO2lrnr7whS/E7bff3uyMUF5eXptvT6irq4sjjzzSf1Y7QFtzlEql4vXXX9+t56kIa6mjtTVPH/enP/0pFixYENdcc02ba8Vayk7O/LFXPvvZzzb7LduOHTti4cKFrX6uSG1tbRx44IHxne98J6qqquKwww6LiRMnxumnn57OyImz82UXzz77bNx1112xY8eOOPfcc2PSpElRUFDQ7Nj6+voWT74HH3xwrFq1Kp2RE++RRx6JgoKCNs+2Wkvp9fGXOX1cfX19dO/evdnYwQcfHOvWrduj49977732CZpwbc3T4YcfHocffnjT9pYtW2Lx4sUxdOjQVo+vra2NffbZJ8aPHx81NTXRq1evuPrqq2PAgAEdkjtJ2pqjt99+OzZv3hyLFy+Oq666KvaTJaAaAAAL00lEQVTdd9+48MIL4xvf+EarL+O0ljpWW/P0cT/60Y+ipKRkl8871lJ2cuaPdnXrrbdGTU1NVFRUtNhXV1cXH330UQwbNiweeOCBOP300+PSSy+NFStWZCBpcrz55pvR2NgY++23X9x1110xefLkWLJkSdx6660tjt28eXOLQlhQUNDqmVw6RiqVikceeSRGjx69y9+mWkuZ19Z62bZtW6sva7O+Mm/79u1x9dVXx+bNm5u9t+zj6urqYsOGDfFv//ZvMWfOnDj++ONj3Lhx8fbbb6c5bXLU1dVFRESPHj1i9uzZcfHFF8fs2bNj3rx5rR5vLWXWpk2b4he/+EV8/etf3+Vx1lJ2cuaPdpFKpeL73/9+PPzww3HnnXfG5z73uRbHfPe7343LLrssDjzwwIiI6NevX7z88suxaNEivwXqQJ/73OfihRdeaHqDdb9+/SKVSkVFRUVcd9110bXr//0YKCwsbPHk2dDQ0Oy9aHSsl19+OdauXRtf/epX2zzGWsoOu1o
veXl5u338Zz7zmQ7Nyd81NDTEd7/73fif//mf+PGPf9zme8t+8IMfxNatW5sucjV9+vT4wx/+EE888URcccUV6YycGGeccUaz56m+ffvGX//613jooYdi/PjxLY63ljLrt7/9baRSqTjrrLN2eZy1lJ2c+WOv7dixI6699tpYtGhR3HHHHW3+MNhnn32a/rO603HHHRfvv/9+OmIm2ievrHX88cfHtm3bYv369c3Ge/ToEfX19c3G/vznP7uoSBo9//zzMWDAgOjRo0ebx1hL2aFHjx7x5z//udnYrtaL9ZU5W7Zsicsuuyx+97vfxQMPPLDLX5Lk5+c3u7pxXl6e9ZUGrT1PtfWYW0uZ9fzzz8cZZ5zR4uzrJ1lL2Un5Y6/NnDkzlixZEnfffXd8+ctfbvO4SZMmxfTp05uN7XwNOB3nl7/8ZXzpS19q9lvSV155JQ488MAWT5QDBgyI6urqppespVKpqK6ujoEDB6Y1c5J98kIGrbGWssPAgQPjvffea/YSpqqqqjaLxYABA+IPf/hD0/bmzZvjlVdesb7S4Lvf/W6sXLky5s+f3+p70j/uoosuijlz5jRt79ixI1avXh3HHXdcR8dMrAcffDC+8pWvNBt75ZVX2vyZZi1l1u48T0VYS9lK+WOv/PGPf4wHH3wwJk2aFKWlpVFfX9/0FfH3N2Vv2bIlIiKGDRsWjz76aCxZsiTeeOONuOuuu6KqqirGjh2bybvQ6Q0ePDhSqVTccMMN8frrr8dvfvObmDVrVnzzm9+MvLy8ZnN07rnnxqZNm+Kmm26K2trauPXWW2Pjxo1tfoYP7W/NmjXRu3fvFuPWUvY56qij4pRTTolrrrkmVq1a1TQno0ePjoi/v7+svr6+6RcvF154YaxYsSJ++MMfRm1tbVx33XXRs2fP+Md//MdM3o1O76mnnopf/epXMW3atDj88MObnqN2vvLhk/N0xhlnxI9+9KN47rnn4rXXXovp06fHhx9+GBdeeGEm70anduqpp8batWvjBz/4Qbz55puxZMmSmDt3btNHOVhL2aOxsTFef/31Vt/eYy3liBTshZkzZ6b69OnT6te2bdtSffr0ST366KNNxy9YsCB11llnpUpLS1MjR45M/f73v89g+uR4+eWXU6NHj04NHDgwdcopp6Tuvvvu1I4dO1KpVKrFHK1YsSJ1/vnnp0pLS1MXXnhh6qWXXspU7EQ64YQTUs8++2yLcWspO/Tp0yf1u9/9rmn7z3/+c+qSSy5JnXDCCamhQ4emnnjiiaZ9b731VqpPnz6pF154oWnsN7/5Teqcc85JnXjiiakxY8ak3nzzzbTmT4qPz9PEiRNbfY469dRTU6lUy3lqbGxM3XnnnanTTz89dcIJJ6RGjx6dWrVqVcbuS2f1ybW0bNmy1MiRI1MnnnhiatiwYamHHnqoaZ+1lDmfnKf6+vpUnz59Uq+++mqLY62l3JCXSrVySTIAAAA6FS/7BAAASADlDwAAIAGUPwAAgARQ/gAAABJA+QMAAEgA5Q8AACABumY6AACkU2NjY8yZMycef/zxePfdd+Oggw6KM844I7797W/HwQcfnOl4ANBhfM4fAIlSWVkZzz//fFx77bVx7LHHxrvvvhu33XZbbNu2LR599NHIy8vLdEQA6BDKHwCJ8sUvfjFmzJgR5557btPYW2+9FWeddVY88sgjMXDgwAymA4CO4z1/ACTOCy+8ENu3b2/aPuqoo+LJJ5+Mfv36xY4dO+KBBx6Is846K0488cQYPXp0rFq1qunYvn37xrJly5q2H3vssTjttNMiIuLFF1+M0047LW688cYoKyuLu+++OyIiFixYEGeeeWacdNJJMXbs2Kirq2v6+4888kjTvn/913+NlStXNu178cUXY+TIkXHiiSfGGWecEffff3+HPSYAdH7KHwCJMnbs2Hj44Ydj6NChcf3118eTTz4ZGzZsiN69e8e+++4b9957b8ybNy+mTp0ajz/+eBx55JE
xfvz42Lhx427d/nvvvRcbN26Mxx9/PC644IL42c9+FnfccUd85zvfiSeeeCIOO+ywuPzyyyOVSsWvf/3ruPPOO5u+12mnnRbjxo2L999/P7Zv3x6TJk2KoUOHxlNPPRU33HBD3HvvvfHb3/62gx8hADor5Q+ARJkwYULccccdcfTRR8djjz0WV111VZxyyinxwAMPRCqVioULF8YVV1wRZ555Zhx//PFx0003RdeuXeMXv/jFbn+P8ePHx9FHHx1HHnlkLFq0KMaMGRMjRoyIY445JqZNmxbDhg2LjRs3xgMPPBAXX3xxnHXWWXHsscfGZZddFqWlpfGzn/0s/va3v8UHH3wQBx98cBx55JExbNiw+PGPfxz9+vXrwEcHgM7M1T4BSJzy8vIoLy+PDRs2xLJly+KRRx6J2267LQ4++OD44IMPYsCAAU3H5ufnR2lpabOXan6aI444ounPdXV1cemllzZtH3DAAXHNNdc07bv99tvjzjvvbNrf0NAQhx12WHzmM5+Jiy++OGbMmBE//OEPY+jQofFP//RPceihh+7NXQcgwZQ/ABJj1apV8fOf/zyuv/76iIg48MAD49xzz41zzjknLrroovj973/f6t/bvn17s/cIfnLfJxUWFjb9OT8/v80827dvj2uuuSZOOeWUZuP77bdfRERUVFTEBRdcEEuXLo1nn302xowZEzfffHNceOGFu76jANAKL/sEIDG2b98eCxYsiD/+8Y/NxvPy8uKAAw6II444Ig499NBYsWJF075t27bFyy+/HL169YqIv5e5j7//76233trl9zzmmGPilVdeadretGlTDBkyJF599dXo1atXrFu3Lo455pimr3nz5sXvf//7qK+vj+nTp8cRRxwR3/rWt+KnP/1pjBw5Mp5++un2eCgASCBn/gBIjP79+8fQoUPjiiuuiIqKijj55JPjgw8+iGeeeSZqampi5syZsd9++8U999wTPXr0iGOPPTYeeOCB2Lp1a4wYMSIiIk444YSYP39+9OnTJ15//fV47LHHokuXtn+XOnbs2LjxxhujX79+0bdv37j33nvjM5/5TPTu3Tu+/vWvx7XXXhvHHXdclJWVxX/+53/Go48+GqNGjYri4uJ45plnYseOHfHNb34zPvzww1i+fHmzj6gAgD3hc/4ASJQtW7bEnDlz4qmnnop33nknCgoKYvDgwVFRURG9e/eOHTt2xD333BOLFy+Ov/3tbzFw4MC47rrrok+fPhER8corr8T1118fr776apSWlsaFF14Yd999dzz//PPx4osvxtixY+Pll1+Orl3/7/erc+fOjQULFsTf/va3GDRoUHzve9+Lo48+OiL+/jEQP/7xj+P999+P4447Lq666qo4/fTTIyLipZdeiltuuSVWrVoVhYWFUV5eHlOmTImCgoL0P3AA5DzlDwAAIAG85w8AACABlD8AAIAEUP4AAAASQPkDAABIAOUPAAAgAZQ/AACABFD+AAAAEkD5AwAASADlDwAAIAH+HwcgLFJBpyB9AAAAAElFTkSuQmCC\n", 413 | "text/plain": [ 414 | "
" 415 | ] 416 | }, 417 | "metadata": {}, 418 | "output_type": "display_data" 419 | } 420 | ], 421 | "source": [ 422 | "seaborn.set_style(\"whitegrid\")\n", 423 | "plot.rcParams[\"figure.figsize\"] = (15, 7)\n", 424 | "\n", 425 | "figure, axis = plot.subplots()\n", 426 | "events[\"NumSources\"].hist(ax=axis, bins=100)\n", 427 | "axis.set_yscale(\"log\")\n", 428 | "axis.tick_params(labelsize=14)\n", 429 | "axis.set_xlabel(\"Sources\", fontsize=14)\n", 430 | "axis.set_ylabel(\"Count\", fontsize=14)" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": {}, 436 | "source": [ 437 | "## Connect to ArcGIS Online anonymously" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 5, 443 | "metadata": {}, 444 | "outputs": [], 445 | "source": [ 446 | "gis = GIS()" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "## Display a map of europe\n", 454 | "***Hint:*** *map.basemaps shows a list of all available basemaps*" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": 6, 460 | "metadata": {}, 461 | "outputs": [], 462 | "source": [ 463 | "def get_europe_map():\n", 464 | " focus_map = gis.map(\"Europe\")\n", 465 | " focus_map.basemap = \"dark-gray-vector\"\n", 466 | " return focus_map\n", 467 | "\n", 468 | "focus_map = get_europe_map()" 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": 7, 474 | "metadata": { 475 | "scrolled": true 476 | }, 477 | "outputs": [ 478 | { 479 | "data": { 480 | "application/vnd.jupyter.widget-view+json": { 481 | "model_id": "7999fdcde3034247b0991a19137920bb", 482 | "version_major": 2, 483 | "version_minor": 0 484 | }, 485 | "text/html": [ 486 | "

Failed to display Jupyter Widget of type MapView.

\n", 487 | "

\n", 488 | " If you're reading this message in the Jupyter Notebook or JupyterLab Notebook, it may mean\n", 489 | " that the widgets JavaScript is still loading. If this message persists, it\n", 490 | " likely means that the widgets JavaScript library is either not installed or\n", 491 | " not enabled. See the Jupyter\n", 492 | " Widgets Documentation for setup instructions.\n", 493 | "

\n", 494 | "

\n", 495 | " If you're reading this message in another frontend (for example, a static\n", 496 | " rendering on GitHub or NBViewer),\n", 497 | " it may mean that your frontend doesn't currently support widgets.\n", 498 | "

\n" 499 | ], 500 | "text/plain": [ 501 | "MapView(layout=Layout(height='400px', width='100%'))" 502 | ] 503 | }, 504 | "metadata": {}, 505 | "output_type": "display_data" 506 | }, 507 | { 508 | "data": { 509 | "text/html": [ 510 | "
" 511 | ], 512 | "text/plain": [ 513 | "" 514 | ] 515 | }, 516 | "metadata": {}, 517 | "output_type": "display_data" 518 | } 519 | ], 520 | "source": [ 521 | "focus_map" 522 | ] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": {}, 527 | "source": [ 528 | "## Create a spatial dataframe for mapping the events from the pandas dataframe\n", 529 | "We use the ActionGeo_Long and ActionGeo_Lat columns to locate the events. The spatial reference is WGS84 (EPSG code 4326).\n", 530 | "- Extract only the columns that are relevant for mapping and have a simple type (e.g. int64, float64)\n", 531 | "- Drop all records having \"not a number\" (NaN) for latitude or longitude\n", 532 | "- Slice the events dataframe and create a copy of the selected rows and columns" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 8, 538 | "metadata": { 539 | "scrolled": true 540 | }, 541 | "outputs": [ 542 | { 543 | "name": "stdout", 544 | "output_type": "stream", 545 | "text": [ 546 | "\n", 547 | "Int64Index: 51802 entries, 0 to 53241\n", 548 | "Data columns (total 7 columns):\n", 549 | " # Column Non-Null Count Dtype \n", 550 | "--- ------ -------------- ----- \n", 551 | " 0 GLOBALEVENTID 51802 non-null int64 \n", 552 | " 1 ActionGeo_FullName 51802 non-null object \n", 553 | " 2 ActionGeo_Long 51802 non-null float64 \n", 554 | " 3 ActionGeo_Lat 51802 non-null float64 \n", 555 | " 4 NumMentions 51802 non-null int64 \n", 556 | " 5 SOURCEURL 51802 non-null object \n", 557 | " 6 SHAPE 51802 non-null geometry\n", 558 | "dtypes: float64(2), geometry(1), int64(2), object(2)\n", 559 | "memory usage: 3.2+ MB\n" 560 | ] 561 | } 562 | ], 563 | "source": [ 564 | "mapping_events = events[[\"GLOBALEVENTID\", \n", 565 | " \"ActionGeo_FullName\", \n", 566 | " \"ActionGeo_Long\", \n", 567 | " \"ActionGeo_Lat\", \n", 568 | " \"NumMentions\", \n", 569 | " \"SOURCEURL\"]].dropna(subset=[\"ActionGeo_Lat\", \"ActionGeo_Long\"]) \n", 570 | "geo_events = 
geo.from_xy(mapping_events, x_column=\"ActionGeo_Long\", y_column=\"ActionGeo_Lat\", sr=4326)\n", 571 | "geo_events.info()" 572 | ] 573 | }, 574 | { 575 | "cell_type": "markdown", 576 | "metadata": {}, 577 | "source": [ 578 | "## Display a subset of GDELT events on the map\n", 579 | "- Sort the events by the number of mentions across all source documents and keep the 1000 most-mentioned events. The number of mentions can serve as a measure of how important an event is. Multiple references within a single document are counted too." 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 9, 585 | "metadata": {}, 586 | "outputs": [ 587 | { 588 | "data": { 589 | "application/vnd.jupyter.widget-view+json": { 590 | "model_id": "addd6f3cd261463f93556e95431a8e74", 591 | "version_major": 2, 592 | "version_minor": 0 593 | }, 594 | "text/html": [ 595 | "

\n" 608 | ], 609 | "text/plain": [ 610 | "MapView(layout=Layout(height='400px', width='100%'))" 611 | ] 612 | }, 613 | "metadata": {}, 614 | "output_type": "display_data" 615 | }, 616 | { 617 | "data": { 618 | "text/html": [ 619 | "
" 620 | ], 621 | "text/plain": [ 622 | "" 623 | ] 624 | }, 625 | "metadata": {}, 626 | "output_type": "display_data" 627 | } 628 | ], 629 | "source": [ 630 | "del focus_map\n", 631 | "focus_map = get_europe_map()\n", 632 | "top_mapping_events = mapping_events.sort_values(by=\"NumMentions\", ascending=False).head(n=1000)\n", 633 | "top_geo_events = geo.from_xy(top_mapping_events, x_column=\"ActionGeo_Long\", y_column=\"ActionGeo_Lat\", sr=4326)\n", 634 | "top_geo_events.spatial.plot(map_widget=focus_map)\n", 635 | "focus_map" 636 | ] 637 | }, 638 | { 639 | "cell_type": "markdown", 640 | "metadata": {}, 641 | "source": [ 642 | "## Display the GDELT events using a heatmap\n", 643 | "We create a new map widget and plot the spatial data frame built from all \"mapping events\" using the heatmap renderer." 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": 10, 649 | "metadata": {}, 650 | "outputs": [ 651 | { 652 | "data": { 653 | "application/vnd.jupyter.widget-view+json": { 654 | "model_id": "a2d60e5a230f4dec9908830687522fea", 655 | "version_major": 2, 656 | "version_minor": 0 657 | }, 658 | "text/html": [ 659 | "

\n" 672 | ], 673 | "text/plain": [ 674 | "MapView(layout=Layout(height='400px', width='100%'))" 675 | ] 676 | }, 677 | "metadata": {}, 678 | "output_type": "display_data" 679 | }, 680 | { 681 | "data": { 682 | "text/html": [ 683 | "
" 684 | ], 685 | "text/plain": [ 686 | "" 687 | ] 688 | }, 689 | "metadata": {}, 690 | "output_type": "display_data" 691 | } 692 | ], 693 | "source": [ 694 | "del focus_map\n", 695 | "heat_focus_map = get_europe_map()\n", 696 | "geo_events.spatial.plot(map_widget=heat_focus_map, renderer_type=\"h\")\n", 697 | "heat_focus_map" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": null, 703 | "metadata": {}, 704 | "outputs": [], 705 | "source": [] 706 | } 707 | ], 708 | "metadata": { 709 | "kernelspec": { 710 | "display_name": "Python 3", 711 | "language": "python", 712 | "name": "python3" 713 | }, 714 | "language_info": { 715 | "codemirror_mode": { 716 | "name": "ipython", 717 | "version": 3 718 | }, 719 | "file_extension": ".py", 720 | "mimetype": "text/x-python", 721 | "name": "python", 722 | "nbconvert_exporter": "python", 723 | "pygments_lexer": "ipython3", 724 | "version": "3.6.4" 725 | } 726 | }, 727 | "nbformat": 4, 728 | "nbformat_minor": 2 729 | } 730 | --------------------------------------------------------------------------------