├── 1. Downloading the tweets.ipynb ├── 2. Parse the tweet file.ipynb ├── 3. Heatmap.ipynb ├── 4. Sentiment Analysis.ipynb ├── heatmap.py ├── murcia_tweets_polarity.png ├── murcia_tweets_polarity_layered_combined_binary.png ├── murcia_tweets_polarity_layered_combined_binary2.png └── tweets.txt /1. Downloading the tweets.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 4, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "name": "stdout", 12 | "output_type": "stream", 13 | "text": [ 14 | "11/02/2015 22:10:55\n", 15 | "\n", 16 | "CPython 2.7.10\n", 17 | "IPython 4.0.0\n", 18 | "\n", 19 | "compiler : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)\n", 20 | "system : Linux\n", 21 | "release : 3.13.0-66-generic\n", 22 | "machine : x86_64\n", 23 | "processor : x86_64\n", 24 | "CPU cores : 8\n", 25 | "interpreter: 64bit\n" 26 | ] 27 | } 28 | ], 29 | "source": [ 30 | "%load_ext watermark\n", 31 | "%watermark" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "**TWEETS HEATMAP OF MURCIA**" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "**1. Getting the tweets**" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "*In order to get the tweets, we need to use the Twitter Streaming API. I found that [Tweepy](http://tweepy.readthedocs.org/en/latest/getting_started.html), one of the many Twitter API Python wrappers, does an outstanding job of capturing tweets via the Streaming API.*\n", 53 | "\n", 54 | "Here is the code I used. I filtered the tweets by location: Twitter allows filtering with a bounding box of coordinates, specified with the following structure:\n", 55 | "\n", 56 | "**location=[sw_longitude, sw_latitude, ne_longitude, ne_latitude]**\n", 57 | "\n", 58 | "*(Interesting how some APIs use (lat, lon) and others (lon, lat).
We need a standard.)*" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": { 65 | "collapsed": false 66 | }, 67 | "outputs": [], 68 | "source": [ 69 | "import json\n", 70 | "from tweepy import Stream\n", 71 | "from tweepy import OAuthHandler\n", 72 | "from tweepy.streaming import StreamListener\n", 73 | "\n", 74 | "\n", 75 | "ckey = YOUR_CONSUMER_KEY_HERE\n", 76 | "csecret = YOUR_CONSUMER_SECRET_HERE\n", 77 | "atoken = YOUR_TWITTER_APP_TOKEN_HERE\n", 78 | "asecret = YOUR_TWITTER_APP_SECRET_HERE\n", 79 | "\n", 80 | "murcia = [-1.157420, 37.951741, -1.081202, 38.029126]  # [sw_lon, sw_lat, ne_lon, ne_lat]. Check it out, it is a very nice city!\n", 81 | "\n", 82 | "file = open('tweets.txt', 'a')\n", 83 | "\n", 84 | "class listener(StreamListener):\n", 85 | "\n", 86 | "    def on_data(self, data):\n", 87 | "        # Twitter returns data in JSON format - we need to decode it first\n", 88 | "        try:\n", 89 | "            decoded = json.loads(data)\n", 90 | "        except Exception as e:\n", 91 | "            print e  # we don't want the listener to stop\n", 92 | "            return True\n", 93 | " \n", 94 | "        if decoded.get('geo') is not None:\n", 95 | "            location = decoded.get('geo').get('coordinates')\n", 96 | "        else:\n", 97 | "            location = '[,]'\n", 98 | "        text = decoded['text'].replace('\\n',' ')\n", 99 | "        user = '@' + decoded.get('user').get('screen_name')\n", 100 | "        created = decoded.get('created_at')\n", 101 | "        tweet = '%s|%s|%s|%s\\n' % (user,location,created,text)  # user|[lat, lon]|created_at|text\n", 102 | " \n", 103 | "        file.write(tweet)\n", 104 | "        print tweet\n", 105 | "        return True\n", 106 | "\n", 107 | "    def on_error(self, status):\n", 108 | "        print status\n", 109 | "\n", 110 | "if __name__ == '__main__':\n", 111 | "    print 'Starting'\n", 112 | " \n", 113 | "    auth = OAuthHandler(ckey, csecret)\n", 114 | "    auth.set_access_token(atoken, asecret)\n", 115 | "    twitterStream = Stream(auth, listener())\n", 116 | "    twitterStream.filter(locations=murcia)" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "**Run it...and wait.**\n", 124 | "\n", 125 | "The script will capture all the tweets that fall within the bounding box we set up. \n", 126 | "\n", 127 | "One important thing to notice is that the API is not 100% accurate in the data it returns. I found several geocoded tweets that didn't belong to the specified box. \n", 128 | "\n", 129 | "Since the script has to be running in order to capture all the tweets, you can run this on a spare computer if you have one, or alternatively you can use online services such as [RedHat](http://www.redhat.com/) or [PythonAnywhere](https://www.pythonanywhere.com/), or rent your own tiny machine in the cloud with services like [Digital Ocean](https://www.digitalocean.com/) or [Amazon Web Services](https://aws.amazon.com/)." 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "
" 137 | ] 138 | } 139 | ], 140 | "metadata": { 141 | "kernelspec": { 142 | "display_name": "Python 2", 143 | "language": "python", 144 | "name": "python2" 145 | }, 146 | "language_info": { 147 | "codemirror_mode": { 148 | "name": "ipython", 149 | "version": 2 150 | }, 151 | "file_extension": ".py", 152 | "mimetype": "text/x-python", 153 | "name": "python", 154 | "nbconvert_exporter": "python", 155 | "pygments_lexer": "ipython2", 156 | "version": "2.7.10" 157 | } 158 | }, 159 | "nbformat": 4, 160 | "nbformat_minor": 0 161 | } 162 | -------------------------------------------------------------------------------- /2. Parse the tweet file.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "###LOAD AND PARSE THE TWEETS FILE" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "collapsed": false 15 | }, 16 | "outputs": [ 17 | { 18 | "name": "stdout", 19 | "output_type": "stream", 20 | "text": [ 21 | "11/03/2015 18:36:43\n", 22 | "\n", 23 | "CPython 2.7.10\n", 24 | "IPython 4.0.0\n", 25 | "\n", 26 | "compiler : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)\n", 27 | "system : Linux\n", 28 | "release : 3.13.0-66-generic\n", 29 | "machine : x86_64\n", 30 | "processor : x86_64\n", 31 | "CPU cores : 8\n", 32 | "interpreter: 64bit\n" 33 | ] 34 | } 35 | ], 36 | "source": [ 37 | "%load_ext watermark\n", 38 | "%watermark" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "The Tweepy Stream Handler writes to a file containing one tweet per line. Each line follows the following structure:\n", 46 | "\n", 47 | "*@USER + | + [LAT,LON] | TIMESTAMP | TWEET*\n", 48 | "\n", 49 | "And now we proceed to turn it into a more useable file" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "Since the tweets file was so big, I processed it by chunks instead of loading all of it in memory" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": { 63 | "collapsed": true 64 | }, 65 | "outputs": [], 66 | "source": [ 67 | "import pandas as pd\n", 68 | "import numpy as np\n", 69 | "\n", 70 | "tweets_raw = pd.read_table('tweets.txt', header=None, iterator=True)\n", 71 | "\n", 72 | "while 1:\n", 73 | " tweets = tweets_raw.get_chunk(10000)\n", 74 | " tweets.columns = ['tweets']\n", 75 | " tweets['len'] = tweets.tweets.apply(lambda x: len(x.split('|')))\n", 76 | " tweets[tweets.len < 4] = np.nan\n", 77 | " del tweets['len']\n", 78 | " tweets = tweets[tweets.tweets.notnull()]\n", 79 | " tweets['user'] = tweets.tweets.apply(lambda x: x.split('|')[0])\n", 80 | " tweets['geo'] = tweets.tweets.apply(lambda x: x.split('|')[1])\n", 81 | " tweets['timestamp'] = tweets.tweets.apply(lambda x: x.split('|')[2])\n", 82 | " tweets['tweet'] = tweets.tweets.apply(lambda x: x.split('|')[3])\n", 83 | " tweets['lat'] = tweets.geo.apply(lambda x: x.split(',')[0].replace('[',''))\n", 84 | " tweets['lon'] = tweets.geo.apply(lambda x: x.split(',')[1].replace(']',''))\n", 85 | " del tweets['tweets']\n", 86 | " del tweets['geo']\n", 87 | " tweets['lon'] = tweets.lon.convert_objects(convert_numeric=True)\n", 88 | " tweets['lat'] = tweets.lat.convert_objects(convert_numeric=True)\n", 89 | " tweets.to_csv('tweets.csv', mode='a', header=False,index=False)" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 31, 95 | "metadata": { 96 | "collapsed": 
false 97 | }, 98 | "outputs": [ 99 | { 100 | "data": { 101 | "text/plain": [ 102 | "(181987, 5)" 103 | ] 104 | }, 105 | "execution_count": 31, 106 | "metadata": {}, 107 | "output_type": "execute_result" 108 | } 109 | ], 110 | "source": [ 111 | "tweets.shape" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 9, 117 | "metadata": { 118 | "collapsed": false 119 | }, 120 | "outputs": [ 121 | { 122 | "data": { 123 | "text/plain": [ 124 | "user object\n", 125 | "timestamp datetime64[ns]\n", 126 | "tweet object\n", 127 | "lat float64\n", 128 | "lon float64\n", 129 | "dtype: object" 130 | ] 131 | }, 132 | "execution_count": 9, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "tweets.dtypes" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 33, 144 | "metadata": { 145 | "collapsed": false 146 | }, 147 | "outputs": [ 148 | { 149 | "data": { 150 | "text/html": [ 151 | "
\n", 152 | "\n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | "
usertimestamptweetlatlon
1@IkiduAuren2015-03-29 15:58:23Feliz tarde de Domingo. http://t.co/jxL7v5zFwd38.842937-0.115407
3@monorex22015-03-29 15:59:37Good afternoon:-D:-D38.026032-1.208355
4@Santos_Poveda2015-03-29 16:01:29@InkUtv @OilVirgin @RiobuenoRafael @NinaNebo @...38.095896-1.181909
6@anittaaML2015-03-29 16:03:46@caarmens98 te voy a reportar por hj de p37.992632-1.197700
7@helenatovar02102015-03-29 16:04:46Lucha por lo qe quieres qe les joda a los qe h...38.055685-1.081301
\n", 206 | "
" 207 | ], 208 | "text/plain": [ 209 | " user timestamp \\\n", 210 | "1 @IkiduAuren 2015-03-29 15:58:23 \n", 211 | "3 @monorex2 2015-03-29 15:59:37 \n", 212 | "4 @Santos_Poveda 2015-03-29 16:01:29 \n", 213 | "6 @anittaaML 2015-03-29 16:03:46 \n", 214 | "7 @helenatovar0210 2015-03-29 16:04:46 \n", 215 | "\n", 216 | " tweet lat lon \n", 217 | "1 Feliz tarde de Domingo. http://t.co/jxL7v5zFwd 38.842937 -0.115407 \n", 218 | "3 Good afternoon:-D:-D 38.026032 -1.208355 \n", 219 | "4 @InkUtv @OilVirgin @RiobuenoRafael @NinaNebo @... 38.095896 -1.181909 \n", 220 | "6 @caarmens98 te voy a reportar por hj de p 37.992632 -1.197700 \n", 221 | "7 Lucha por lo qe quieres qe les joda a los qe h... 38.055685 -1.081301 " 222 | ] 223 | }, 224 | "execution_count": 33, 225 | "metadata": {}, 226 | "output_type": "execute_result" 227 | } 228 | ], 229 | "source": [ 230 | "#convert time zome from UTC to Spain time for further time of day analyses\n", 231 | "tweets.set_index('timestamp').tz_localize('UTC').tz_convert('Europe/Madrid').reset_index()\n", 232 | "tweets.head()" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "Since we want to do a heatmap, we only care about those tweets that are geocoded and whose latitude and longitud are within the Murcia area" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 34, 245 | "metadata": { 246 | "collapsed": false 247 | }, 248 | "outputs": [ 249 | { 250 | "data": { 251 | "text/plain": [ 252 | "(95384, 5)" 253 | ] 254 | }, 255 | "execution_count": 34, 256 | "metadata": {}, 257 | "output_type": "execute_result" 258 | } 259 | ], 260 | "source": [ 261 | "min_lon = -1.157420\n", 262 | "max_lon = -1.081202\n", 263 | "min_lat = 37.951741\n", 264 | "max_lat = 38.029126\n", 265 | "\n", 266 | "tweets = tweets[(tweets.lat.notnull()) & (tweets.lon.notnull())]\n", 267 | "\n", 268 | "tweets = tweets[(tweets.lon > min_lon) & (tweets.lon < max_lon) & (tweets.lat > min_lat) & (tweets.lat < max_lat)]\n", 269 | "tweets.shape" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "Finally, we save the parsed tweets to use with [heatmap.py](http://www.sethoscope.net/heatmap/)" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 40, 282 | "metadata": { 283 | "collapsed": false 284 | }, 285 | "outputs": [ 286 | { 287 | "name": "stdout", 288 | "output_type": "stream", 289 | "text": [ 290 | "/media/manuel/DATA/Backup/Proyectos/tweepy murcia/heatmap\n" 291 | ] 292 | } 293 | ], 294 | "source": [ 295 | "cd ../heatmap" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": 42, 301 | "metadata": { 302 | "collapsed": false 303 | }, 304 | "outputs": [], 305 | "source": [ 306 | "with open('tweets_heatmap','w') as file:\n", 307 | " file.write(tweets[['lat','lon']].to_string(header=False, index=False))" 308 | ] 309 | } 310 | ], 311 | "metadata": { 312 | "kernelspec": { 313 | "display_name": "Python 2", 314 | "language": "python", 315 | "name": "python2" 316 | }, 317 | "language_info": { 318 | "codemirror_mode": { 319 | "name": "ipython", 320 | "version": 2 321 | }, 322 | "file_extension": ".py", 323 | "mimetype": "text/x-python", 324 | "name": "python", 325 | "nbconvert_exporter": "python", 326 | "pygments_lexer": "ipython2", 327 | "version": "2.7.10" 328 | } 329 | }, 330 | "nbformat": 4, 331 | "nbformat_minor": 0 332 | } 333 | -------------------------------------------------------------------------------- /heatmap.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # heatmap.py - Generates heat map images and animations from geographic data 4 | # Copyright 2010 Seth Golub 5 | # http://www.sethoscope.net/heatmap/ 6 | # 7 | # This program is free software: you can redistribute it and/or modify 8 | # it under the terms of the GNU Affero General Public License as 9 | # published by the Free Software Foundation, either version 3 of the 10 | # License, or (at your option) any later version. 11 | # 12 | # This program is distributed in the hope that it will be useful, 13 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 14 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 | # GNU Affero General Public License for more details. 16 | # 17 | # You should have received a copy of the GNU Affero General Public License 18 | # along with this program. If not, see . 19 | 20 | from __future__ import print_function 21 | 22 | import sys 23 | import logging 24 | import math 25 | from PIL import Image 26 | from PIL import ImageColor 27 | import tempfile 28 | import os.path 29 | import shutil 30 | import subprocess 31 | from time import mktime, strptime 32 | from collections import defaultdict 33 | import xml.etree.cElementTree as ET 34 | from colorsys import hsv_to_rgb 35 | try: 36 | import cPickle as pickle 37 | except ImportError: 38 | import pickle 39 | 40 | __version__ = '1.11' 41 | 42 | class Coordinate(object): 43 | def __init__(self, x, y): 44 | self.x = x 45 | self.y = y 46 | 47 | first = property(lambda self: self.x) 48 | second = property(lambda self: self.y) 49 | 50 | def copy(self): 51 | return self.__class__(self.first, self.second) 52 | 53 | def __str__(self): 54 | return '(%s, %s)' % (str(self.x), str(self.y)) 55 | 56 | def __hash__(self): 57 | return hash((self.x, self.y)) 58 | 59 | def __eq__(self, o): 60 | return True if self.x == o.x and self.y == o.y else False 61 | 62 | def __sub__(self, o): 63 | return self.__class__(self.first - o.first, self.second - o.second) 64 | 65 | 66 | class LatLon(Coordinate): 67 | def __init__(self, lat, lon): 68 | self.lat = lat 69 | self.lon = lon 70 | 71 | def get_lat(self): 72 | return self.y 73 | 74 | def set_lat(self, lat): 75 | self.y = lat 76 | 77 | def get_lon(self): 78 | return self.x 79 | 80 | def set_lon(self, lon): 81 | self.x = lon 82 | 83 | lat = property(get_lat, set_lat) 84 | lon = property(get_lon, set_lon) 85 | 86 | first = property(get_lat) 87 | second = property(get_lon) 88 | 89 | class TrackLog: 90 | class Trkseg(list): # for GPX tags 91 | pass 92 | 93 | class Trkpt: # for GPX tags 94 | def __init__(self, lat, lon): 95 | self.coords = LatLon(float(lat), float(lon)) 96 | 97 | def __str__(self): 98 | return str(self.coords) 99 | 100 | def _parse(self, filename): 101 | self._segments = [] 102 | for event, elem in ET.iterparse(filename, ('start', 'end')): 103 | elem.tag = elem.tag[elem.tag.rfind('}') + 1:] # remove namespace 104 | if elem.tag == "trkseg": 105 | if event == 'start': 106 | self._segments.append(TrackLog.Trkseg()) 107 | else: # event == 'end' 108 | yield self._segments.pop() 109 | elem.clear() # delete contents from parse tree 110 | elif elem.tag == 'trkpt' and event == 'end': 111 | point = TrackLog.Trkpt(elem.attrib['lat'], elem.attrib['lon']) 112 | self._segments[-1].append(point) 113 | timestr = elem.findtext('time') 114 | if timestr: 115 | timestr = timestr[:-1].split('.')[0] + ' GMT' 116 | point.time = mktime( 117 | strptime(timestr, 
'%Y-%m-%dT%H:%M:%S %Z')) 118 | elem.clear() # clear the trkpt node to minimize memory usage 119 | 120 | def __init__(self, filename): 121 | self.filename = filename 122 | 123 | def segments(self): 124 | '''Parse file and yield segments containing points''' 125 | logging.info('reading GPX track from %s' % self.filename) 126 | return self._parse(self.filename) 127 | 128 | 129 | class Projection(object): 130 | # For guessing scale, we pretend the earth is a sphere with this 131 | # radius in meters, as in Web Mercator (the projection all the 132 | # online maps use). 133 | EARTH_RADIUS = 6378137 # in meters 134 | 135 | def get_pixels_per_degree(self): 136 | try: 137 | return self._pixels_per_degree 138 | except AttributeError: 139 | raise AttributeError('projection scale was never set') 140 | 141 | def set_pixels_per_degree(self, val): 142 | self._pixels_per_degree = val 143 | logging.info('scale: %f meters/pixel (%f pixels/degree)' 144 | % (self.meters_per_pixel, val)) 145 | 146 | def get_meters_per_pixel(self): 147 | return 2 * math.pi * self.EARTH_RADIUS / 360 / self.pixels_per_degree 148 | 149 | def set_meters_per_pixel(self, val): 150 | self.pixels_per_degree = 2 * math.pi * self.EARTH_RADIUS / 360 / val 151 | return val 152 | 153 | pixels_per_degree = property(get_pixels_per_degree, set_pixels_per_degree) 154 | meters_per_pixel = property(get_meters_per_pixel, set_meters_per_pixel) 155 | 156 | def is_scaled(self): 157 | return hasattr(self, '_pixels_per_degree') 158 | 159 | def project(self, coords): 160 | raise NotImplementedError 161 | 162 | def inverse_project(self, coords): # Not all projections can support this. 163 | raise NotImplementedError 164 | 165 | def auto_set_scale(self, extent_in, padding, width=None, height=None): 166 | # We need to choose a scale at which the data's bounding box, 167 | # once projected onto the map, will fit in the specified height 168 | # and/or width. The catch is that we can't project until we 169 | # have a scale, so what we'll do is set a provisional scale, 170 | # project the bounding box onto the map, then adjust the scale 171 | # appropriately. This way we don't need to know anything about 172 | # the projection. 173 | # 174 | # Projection subclasses are free to override this method with 175 | # something simpler that just solves for scale given the lat/lon 176 | # and x/y bounds. 177 | 178 | # We'll work large to minimize roundoff error. 179 | SCALE_FACTOR = 1000000.0 180 | self.pixels_per_degree = SCALE_FACTOR 181 | extent_out = extent_in.map(self.project) 182 | padding *= 2 # padding-per-edge -> padding-in-each-dimension 183 | try: 184 | if height: 185 | self.pixels_per_degree = pixels_per_lat = ( 186 | float(height - padding) / 187 | extent_out.size().y * SCALE_FACTOR) 188 | if width: 189 | self.pixels_per_degree = ( 190 | float(width - padding) / 191 | extent_out.size().x * SCALE_FACTOR) 192 | if height: 193 | self.pixels_per_degree = min(self.pixels_per_degree, 194 | pixels_per_lat) 195 | except ZeroDivisionError: 196 | raise ZeroDivisionError( 197 | 'You need at least two data points for auto scaling. ' 198 | 'Try specifying the scale explicitly (or extent + ' 199 | 'height or width).') 200 | assert(self.pixels_per_degree > 0) 201 | 202 | 203 | # Treats Lat/Lon as a square grid. 
204 | class EquirectangularProjection(Projection): 205 | # http://en.wikipedia.org/wiki/Equirectangular_projection 206 | def project(self, coord): 207 | x = coord.lon * self.pixels_per_degree 208 | y = -coord.lat * self.pixels_per_degree 209 | return Coordinate(x, y) 210 | 211 | def inverse_project(self, coord): 212 | lat = -coord.y / self.pixels_per_degree 213 | lon = coord.x / self.pixels_per_degree 214 | return LatLon(lat, lon) 215 | 216 | 217 | class MercatorProjection(Projection): 218 | def set_pixels_per_degree(self, val): 219 | super(MercatorProjection, self).set_pixels_per_degree(val) 220 | self._pixels_per_radian = val * (180 / math.pi) 221 | pixels_per_degree = property(Projection.get_pixels_per_degree, 222 | set_pixels_per_degree) 223 | 224 | def project(self, coord): 225 | x = coord.lon * self.pixels_per_degree 226 | y = -self._pixels_per_radian * math.log( 227 | math.tan((math.pi/4 + math.pi/360 * coord.lat))) 228 | return Coordinate(x, y) 229 | 230 | def inverse_project(self, coord): 231 | lat = (360 / math.pi 232 | * math.atan(math.exp(-coord.y / self._pixels_per_radian)) - 90) 233 | lon = coord.x / self.pixels_per_degree 234 | return LatLon(lat, lon) 235 | 236 | class Extent(): 237 | def __init__(self, coords=None, shapes=None): 238 | if coords: 239 | coords = tuple(coords) # if it's a generator, slurp them all 240 | self.min = coords[0].__class__(min(c.first for c in coords), 241 | min(c.second for c in coords)) 242 | self.max = coords[0].__class__(max(c.first for c in coords), 243 | max(c.second for c in coords)) 244 | elif shapes: 245 | self.from_shapes(shapes) 246 | else: 247 | raise ValueError('Extent must be initialized') 248 | 249 | def __str__(self): 250 | return '%s,%s,%s,%s' % (self.min.y, self.min.x, self.max.y, self.max.x) 251 | 252 | def update(self, other): 253 | '''grow this bounding box so that it includes the other''' 254 | self.min.x = min(self.min.x, other.min.x) 255 | self.min.y = min(self.min.y, other.min.y) 256 | self.max.x = max(self.max.x, other.max.x) 257 | self.max.y = max(self.max.y, other.max.y) 258 | 259 | def from_bounding_box(self, other): 260 | self.min = other.min.copy() 261 | self.max = other.max.copy() 262 | 263 | def from_shapes(self, shapes): 264 | shapes = iter(shapes) 265 | self.from_bounding_box(next(shapes).extent) 266 | for s in shapes: 267 | self.update(s.extent) 268 | 269 | def corners(self): 270 | return (self.min, self.max) 271 | 272 | def size(self): 273 | return self.max.__class__(self.max.x - self.min.x, 274 | self.max.y - self.min.y) 275 | 276 | def grow(self, pad): 277 | self.min.x -= pad 278 | self.min.y -= pad 279 | self.max.x += pad 280 | self.max.y += pad 281 | 282 | def resize(self, width=None, height=None): 283 | if width: 284 | self.max.x += float(width - self.size().x) / 2 285 | self.min.x = self.max.x - width 286 | if height: 287 | self.max.y += float(height - self.size().y) / 2 288 | self.min.y = self.max.y - height 289 | 290 | def is_inside(self, coord): 291 | return (coord.x >= self.min.x and coord.x <= self.max.x and 292 | coord.y >= self.min.y and coord.y <= self.max.y) 293 | 294 | def map(self, func): 295 | '''Returns a new Extent whose corners are a function of the 296 | corners of this one. The expected use is to project a Extent 297 | onto a map. 
For example: bbox_xy = bbox_ll.map(projector.project)''' 298 | return Extent(coords=(func(self.min), func(self.max))) 299 | 300 | 301 | class Matrix(defaultdict): 302 | '''An abstract sparse matrix, with data stored as {coord : value}.''' 303 | 304 | @staticmethod 305 | def matrix_factory(decay): 306 | # If decay is 0 or 1, we can accumulate as we go and save lots of 307 | # memory. 308 | if decay == 1.0: 309 | logging.info('creating a summing matrix') 310 | return SummingMatrix() 311 | elif decay == 0.0: 312 | logging.info('creating a maxing matrix') 313 | return MaxingMatrix() 314 | logging.info('creating an appending matrix') 315 | return AppendingMatrix(decay) 316 | 317 | def __init__(self, default_factory=float): 318 | self.default_factory = default_factory 319 | 320 | def add(self, coord, val): 321 | raise NotImplementedError 322 | 323 | def extent(self): 324 | return(Extent(coords=self.keys())) 325 | 326 | def finalized(self): 327 | return self 328 | 329 | 330 | class SummingMatrix(Matrix): 331 | def add(self, coord, val): 332 | self[coord] += val 333 | 334 | 335 | class MaxingMatrix(Matrix): 336 | def add(self, coord, val): 337 | self[coord] = max(val, self.get(coord, val)) 338 | 339 | 340 | class AppendingMatrix(Matrix): 341 | def __init__(self, decay): 342 | self.default_factory = list 343 | self.decay = decay 344 | 345 | def add(self, coord, val): 346 | self[coord].append(val) 347 | 348 | def finalized(self): 349 | logging.info('combining coincident points') 350 | m = Matrix() 351 | for (coord, values) in self.items(): 352 | m[coord] = self.reduce(self.decay, values) 353 | return m 354 | 355 | @staticmethod 356 | def reduce(decay, values): 357 | ''' 358 | Returns a weighted sum of the values, where weight N is 359 | pow(decay,N). This means the largest value counts fully, but 360 | additional values have diminishing contributions. decay=0 makes 361 | the reduction equivalent to max(), which makes each data point 362 | visible, but says nothing about their relative magnitude. 363 | decay=1 makes this like sum(), which makes the relative 364 | magnitude of the points more visible, but could make smaller 365 | values hard to see. Experiment with values between 0 and 1. 366 | Values outside that range will give weird results. 367 | ''' 368 | # It would be nice to do this on the fly, while accumulating data, but 369 | # it needs to be insensitive to data order. 370 | weight = 1.0 371 | total = 0.0 372 | values.sort(reverse=True) 373 | for value in values: 374 | total += value * weight 375 | weight *= decay 376 | return total 377 | 378 | 379 | class Point: 380 | def __init__(self, coord, weight=1.0): 381 | self.coord = coord 382 | self.weight = weight 383 | 384 | def __str__(self): 385 | return 'P(%s)' % str(self.coord) 386 | 387 | @staticmethod 388 | def general_distance(x, y): 389 | # assumes square units, which causes distortion in some projections 390 | return (x ** 2 + y ** 2) ** 0.5 391 | 392 | @property 393 | def extent(self): 394 | if not hasattr(self, '_extent'): 395 | self._extent = Extent(coords=(self.coord,)) 396 | return self._extent 397 | 398 | # From a modularity standpoint, it would be reasonable to cache 399 | # distances, not heat values, and let the kernel cache the 400 | # distance to heat map, but this is substantially faster. 
401 | heat_cache = {} 402 | @classmethod 403 | def _initialize_heat_cache(cls, kernel): 404 | cache = {} 405 | for x in range(kernel.radius + 1): 406 | for y in range(kernel.radius + 1): 407 | cache[(x, y)] = kernel.heat(cls.general_distance(x, y)) 408 | cls.heat_cache[kernel] = cache 409 | 410 | def add_heat_to_matrix(self, matrix, kernel): 411 | if kernel not in Point.heat_cache: 412 | Point._initialize_heat_cache(kernel) 413 | cache = Point.heat_cache[kernel] 414 | x = int(self.coord.x) 415 | y = int(self.coord.y) 416 | for dx in range(-kernel.radius, kernel.radius + 1): 417 | for dy in range(-kernel.radius, kernel.radius + 1): 418 | matrix.add(Coordinate(x + dx, y + dy), 419 | self.weight * cache[(abs(dx), abs(dy))]) 420 | 421 | def map(self, func): 422 | return Point(func(self.coord), self.weight) 423 | 424 | 425 | class LineSegment: 426 | def __init__(self, start, end, weight=1.0): 427 | self.start = start 428 | self.end = end 429 | self.weight = weight 430 | self.length_squared = float((self.end.x - self.start.x) ** 2 + 431 | (self.end.y - self.start.y) ** 2) 432 | self.extent = Extent(coords=(start, end)) 433 | 434 | def __str__(self): 435 | return 'LineSegment(%s, %s)' % (self.start, self.end) 436 | 437 | def distance(self, coord): 438 | # http://stackoverflow.com/questions/849211/shortest-distance-between-a-point-and-a-line-segment 439 | # http://www.topcoder.com/tc?d1=tutorials&d2=geometry1&module=Static#line_point_distance 440 | # http://local.wasp.uwa.edu.au/~pbourke/geometry/pointline/ 441 | try: 442 | dx = (self.end.x - self.start.x) 443 | dy = (self.end.y - self.start.y) 444 | u = ((coord.x - self.start.x) * dx + 445 | (coord.y - self.start.y) * dy) / self.length_squared 446 | if u < 0: 447 | u = 0 448 | elif u > 1: 449 | u = 1 450 | except ZeroDivisionError: 451 | u = 0 # Our line is zero-length. That's ok. 452 | dx = self.start.x + u * dx - coord.x 453 | dy = self.start.y + u * dy - coord.y 454 | return math.sqrt(dx * dx + dy * dy) 455 | 456 | def add_heat_to_matrix(self, matrix, kernel): 457 | # Iterate over every point in a bounding box around this, with an 458 | # extra margin given by the kernel's self-reported maximum range. 459 | # TODO: There is probably a more clever iteration that skips more 460 | # of the empty space. 461 | for x in range(int(self.extent.min.x - kernel.radius), 462 | int(self.extent.max.x + kernel.radius + 1)): 463 | for y in range(int(self.extent.min.y - kernel.radius), 464 | int(self.extent.max.y + kernel.radius + 1)): 465 | coord = Coordinate(x, y) 466 | heat = kernel.heat(self.distance(coord)) 467 | if heat: 468 | matrix.add(coord, self.weight * heat) 469 | 470 | def map(self, func): 471 | return LineSegment(func(self.start), func(self.end)) 472 | 473 | 474 | class LinearKernel: 475 | '''Uses a linear falloff, essentially turning a point into a cone.''' 476 | def __init__(self, radius): 477 | self.radius = radius # in pixels 478 | self.radius_float = float(radius) # worthwhile time saver 479 | 480 | def heat(self, distance): 481 | if distance >= self.radius: 482 | return 0.0 483 | return 1.0 - (distance / self.radius_float) 484 | 485 | 486 | class GaussianKernel: 487 | def __init__(self, radius): 488 | '''radius is the distance beyond which you should not bother.''' 489 | self.radius = radius 490 | # We set the scale such that the heat value drops to 1/256 of 491 | # the peak at a distance of radius. 
492 | self.scale = math.log(256) / radius 493 | 494 | def heat(self, distance): 495 | '''Returns 1.0 at center, 1/e at radius pixels from center.''' 496 | return math.e ** (-distance * self.scale) 497 | 498 | 499 | class ColorMap: 500 | DEFAULT_HSVA_MIN_STR = '000ffff00' 501 | DEFAULT_HSVA_MAX_STR = '02affffff' 502 | 503 | @staticmethod 504 | def _str_to_float(string, base=16, maxval=256): 505 | return float(int(string, base)) / maxval 506 | 507 | @staticmethod 508 | def str_to_hsva(string): 509 | ''' 510 | Returns a 4-tuple of ints from a hex string color specification, 511 | such that AAABBCCDD becomes AAA, BB, CC, DD. For example, 512 | str2hsva('06688bbff') returns (102, 136, 187, 255). Note that 513 | the first number is 3 digits. 514 | ''' 515 | if string.startswith('#'): 516 | string = string[1:] # Leading "#" was once required, is now optional. 517 | return tuple(ColorMap._str_to_float(s) for s in (string[0:3], 518 | string[3:5], 519 | string[5:7], 520 | string[7:9])) 521 | 522 | def __init__(self, hsva_min=None, hsva_max=None, image=None, steps=256): 523 | ''' 524 | Create a color map based on a progression in the specified 525 | range, or using pixels in a provided image. 526 | 527 | If supplied, hsva_min and hsva_max must each be a 4-tuple of 528 | (hue, saturation, value, alpha), where each is a float from 529 | 0.0 to 1.0. The gradient will be a linear progression from 530 | hsva_min to hsva_max, including both ends of the range. 531 | 532 | The optional steps argument specifies how many discrete steps 533 | there should be in the color gradient when using hsva_min 534 | and hsva_max. 535 | ''' 536 | # TODO: do the interpolation in Lab space instead of HSV 537 | self.values = [] 538 | if image: 539 | assert image.mode == 'RGBA', ( 540 | 'Gradient image must be RGBA. Yours is %s.' % image.mode) 541 | num_rows = image.size[1] 542 | self.values = [image.getpixel((0, row)) for row in range(num_rows)] 543 | self.values.reverse() 544 | else: 545 | if not hsva_min: 546 | hsva_min = ColorMap.str_to_hsva(self.DEFAULT_HSVA_MIN_STR) 547 | if not hsva_max: 548 | hsva_max = ColorMap.str_to_hsva(self.DEFAULT_HSVA_MAX_STR) 549 | # Turn (h1,s1,v1,a1), (h2,s2,v2,a2) into (h2-h1,s2-s1,v2-v1,a2-a1) 550 | hsva_range = list(map(lambda min, max: max - min, hsva_min, hsva_max)) 551 | for value in range(0, steps): 552 | hsva = list(map( 553 | lambda range, min: value / float(steps - 1) * range + min, 554 | hsva_range, hsva_min)) 555 | hsva[0] = hsva[0] % 1 # in case hue is out of range 556 | rgba = tuple( 557 | [int(x * 255) for x in hsv_to_rgb(*hsva[0:3]) + (hsva[3],)]) 558 | self.values.append(rgba) 559 | 560 | def get(self, floatval): 561 | return self.values[int(floatval * (len(self.values) - 1))] 562 | 563 | 564 | class ImageMaker(): 565 | def __init__(self, config): 566 | '''Each argument to the constructor should be a 4-tuple of (hue, 567 | saturaton, value, alpha), one to use for minimum data values and 568 | one for maximum. Each should be in [0,1], however because hue is 569 | circular, you may specify hue in any range and it will be shifted 570 | into [0,1] as needed. This is so you can wrap around the color 571 | wheel in either direction.''' 572 | self.config = config 573 | if config.background and not config.background_image: 574 | self.background = ImageColor.getrgb(config.background) 575 | else: 576 | self.background = None 577 | 578 | @staticmethod 579 | def _blend_pixels(a, b): 580 | # a is RGBA, b is RGB; we could write this more generically, 581 | # but why complicate things? 
582 | alpha = a[3] / 255.0 583 | return tuple( 584 | map(lambda aa, bb: int(aa * alpha + bb * (1 - alpha)), a[:3], b)) 585 | 586 | def make_image(self, matrix): 587 | extent = self.config.extent_out 588 | if not extent: 589 | extent = matrix.extent() 590 | extent.resize((self.config.width or 1) - 1, 591 | (self.config.height or 1) - 1) 592 | size = extent.size() 593 | size.x = int(size.x) + 1 594 | size.y = int(size.y) + 1 595 | logging.info('saving image (%d x %d)' % (size.x, size.y)) 596 | if self.background: 597 | img = Image.new('RGB', (size.x, size.y), self.background) 598 | else: 599 | img = Image.new('RGBA', (size.x, size.y)) 600 | 601 | maxval = max(matrix.values()) 602 | pixels = img.load() 603 | for (coord, val) in matrix.items(): 604 | x = int(coord.x - extent.min.x) 605 | y = int(coord.y - extent.min.y) 606 | if extent.is_inside(coord): 607 | color = self.config.colormap.get(val / maxval) 608 | if self.background: 609 | pixels[x, y] = ImageMaker._blend_pixels(color, 610 | self.background) 611 | else: 612 | pixels[x, y] = color 613 | if self.config.background_image: 614 | img = Image.composite(img, self.config.background_image, 615 | img.split()[3]) 616 | return img 617 | 618 | 619 | class ImageSeriesMaker(): 620 | '''Creates a movie showing the data appearing on the heatmap.''' 621 | def __init__(self, config): 622 | self.config = config 623 | self.image_maker = ImageMaker(config) 624 | self.tmpdir = tempfile.mkdtemp() 625 | self.imgfile_template = os.path.join(self.tmpdir, 'frame-%05d.png') 626 | 627 | 628 | def _save_image(self, matrix): 629 | self.frame_count += 1 630 | logging.info('Frame %d' % (self.frame_count)) 631 | matrix = matrix.finalized() 632 | image = self.image_maker.make_image(matrix) 633 | image.save(self.imgfile_template % self.frame_count) 634 | 635 | def maybe_save_image(self, matrix): 636 | self.inputs_since_output += 1 637 | if self.inputs_since_output >= self.config.frequency: 638 | self._save_image(matrix) 639 | self.inputs_since_output = 0 640 | 641 | @staticmethod 642 | def create_movie(infiles, outfile, ffmpegopts): 643 | command = ['ffmpeg', '-i', infiles] 644 | if ffmpegopts: 645 | # I hope they don't have spaces in their arguments 646 | command.extend(ffmpegopts.split()) 647 | command.append(outfile) 648 | logging.info('Encoding video: %s' % ' '.join(command)) 649 | subprocess.call(command) 650 | 651 | 652 | def run(self): 653 | logging.info('Putting animation frames in %s' % self.tmpdir) 654 | self.inputs_since_output = 0 655 | self.frame_count = 0 656 | matrix = process_shapes(self.config, self.maybe_save_image) 657 | if ( not self.frame_count 658 | or self.inputs_since_output >= self.config.straggler_threshold ): 659 | self._save_image(matrix) 660 | self.create_movie(self.imgfile_template, 661 | self.config.output, 662 | self.config.ffmpegopts) 663 | if self.config.keepframes: 664 | logging.info('The animation frames are in %s' % self.tmpdir) 665 | else: 666 | shutil.rmtree(self.tmpdir) 667 | return matrix 668 | 669 | 670 | def _get_osm_image(bbox, zoom, osm_base): 671 | # Just a wrapper for osm.createOSMImage to translate coordinate schemes 672 | try: 673 | from osmviz.manager import PILImageManager, OSMManager 674 | osm = OSMManager( 675 | image_manager=PILImageManager('RGB'), 676 | server=osm_base) 677 | (c1, c2) = bbox.corners() 678 | image, bounds = osm.createOSMImage((c1.lat, c2.lat, c1.lon, c2.lon), zoom) 679 | (lat1, lat2, lon1, lon2) = bounds 680 | return image, Extent(coords=(LatLon(lat1, lon1), 681 | LatLon(lat2, lon2))) 682 | except 
ImportError as e: 683 | logging.error( 684 | "ImportError: %s.\n" 685 | "The --osm option depends on the osmviz module, available from\n" 686 | "http://cbick.github.com/osmviz/\n\n" % str(e)) 687 | sys.exit(1) 688 | 689 | 690 | def _scale_for_osm_zoom(zoom): 691 | return 256 * pow(2, zoom) / 360.0 692 | 693 | 694 | def choose_osm_zoom(config, padding): 695 | # Since we know we're only going to do this with Mercator, we could do 696 | # a bit more math and solve this directly, but as a first pass method, 697 | # we instead project the bounding box into pixel-land at a high zoom 698 | # level, then see the power of two we're off by. 699 | if config.zoom: 700 | return config.zoom 701 | if not (config.width or config.height): 702 | raise ValueError('For OSM, you must specify height, width, or zoom') 703 | crazy_zoom_level = 30 704 | proj = MercatorProjection() 705 | scale = _scale_for_osm_zoom(crazy_zoom_level) 706 | proj.pixels_per_degree = scale 707 | bbox_crazy_xy = config.extent_in.map(proj.project) 708 | if config.width: 709 | size_ratio = width_ratio = ( 710 | float(bbox_crazy_xy.size().x) / (config.width - 2 * padding)) 711 | if config.height: 712 | size_ratio = ( 713 | float(bbox_crazy_xy.size().y) / (config.height - 2 * padding)) 714 | if config.width: 715 | size_ratio = max(size_ratio, width_ratio) 716 | # TODO: We use --height and --width as upper bounds, choosing a zoom 717 | # level that lets our image be no larger than the specified size. 718 | # It might be desirable to use them as lower bounds or to get as close 719 | # as possible, whether larger or smaller (where "close" probably means 720 | # in pixels, not scale factors). 721 | # TODO: This is off by a little bit at small scales. 722 | zoom = int(crazy_zoom_level - math.log(size_ratio, 2)) 723 | logging.info('Choosing OSM zoom level %d' % zoom) 724 | return zoom 725 | 726 | 727 | def get_osm_background(config, padding): 728 | zoom = choose_osm_zoom(config, padding) 729 | proj = MercatorProjection() 730 | proj.pixels_per_degree = _scale_for_osm_zoom(zoom) 731 | bbox_xy = config.extent_in.map(proj.project) 732 | # We're not checking that the padding fits within the specified size. 733 | bbox_xy.grow(padding) 734 | bbox_ll = bbox_xy.map(proj.inverse_project) 735 | image, img_bbox_ll = _get_osm_image(bbox_ll, zoom, config.osm_base) 736 | img_bbox_xy = img_bbox_ll.map(proj.project) 737 | 738 | # TODO: this crops to our data extent, which means we're not making 739 | # an image of the requested dimensions. Perhaps we should let the 740 | # user specify whether to treat the requested size as min,max,exact. 
741 | offset = bbox_xy.min - img_bbox_xy.min 742 | image = image.crop(( 743 | int(offset.x), 744 | int(offset.y), 745 | int(offset.x + bbox_xy.size().x + 1), 746 | int(offset.y + bbox_xy.size().y + 1))) 747 | config.background_image = image 748 | config.extent_in = bbox_ll 749 | config.projection = proj 750 | (config.width, config.height) = image.size 751 | 752 | #if the layer option is on we only draw the points 753 | if config.layer: 754 | from PIL import ImageDraw 755 | transparent_area = (0,0,image.size[0], image.size[1]) 756 | mask=Image.new('L', image.size, color=255) 757 | draw=ImageDraw.Draw(mask) 758 | draw.rectangle(transparent_area, fill=0) 759 | image.putalpha(mask) 760 | 761 | return image, bbox_ll, proj 762 | 763 | def process_shapes(config, hook=None): 764 | matrix = Matrix.matrix_factory(config.decay) 765 | logging.info('processing data') 766 | for shape in config.shapes: 767 | shape = shape.map(config.projection.project) 768 | # TODO: skip shapes outside map extent 769 | shape.add_heat_to_matrix(matrix, config.kernel) 770 | if hook: 771 | hook(matrix) 772 | return matrix 773 | 774 | def shapes_from_gpx(filename): 775 | track = TrackLog(filename) 776 | for trkseg in track.segments(): 777 | for i, p1 in enumerate(trkseg[:-1]): 778 | p2 = trkseg[i + 1] 779 | yield LineSegment(p1.coords, p2.coords) 780 | 781 | def shapes_from_file(filename): 782 | logging.info('reading points from %s' % filename) 783 | count = 0 784 | with open(filename, 'rU') as f: 785 | for line in f: 786 | line = line.strip() 787 | if len(line) > 0: # ignore blank lines 788 | values = [float(x) for x in line.split()] 789 | assert len(values) == 2 or len(values) == 3, ( 790 | 'input lines must have two or three values: %s' % line) 791 | (lat, lon) = values[0:2] 792 | weight = 1.0 if len(values) == 2 else values[2] 793 | count += 1 794 | yield Point(LatLon(lat, lon), weight) 795 | logging.info('read %d points' % count) 796 | 797 | def shapes_from_csv(filename, ignore_csv_header): 798 | import csv 799 | logging.info('reading csv') 800 | count = 0 801 | with open(filename, 'rU') as f: 802 | reader = csv.reader(f) 803 | if ignore_csv_header: 804 | next(reader) # Skip header line 805 | for row in reader: 806 | (lat, lon) = (float(row[0]), float(row[1])) 807 | count += 1 808 | yield Point(LatLon(lat, lon)) 809 | logging.info('read %d points' % count) 810 | 811 | def shapes_from_shp(filename): 812 | try: 813 | import ogr 814 | import osr 815 | except ImportError: 816 | try: 817 | from osgeo import ogr 818 | from osgeo import osr 819 | except ImportError: 820 | raise ImportError('You need to have python-gdal bindings installed') 821 | 822 | driver = ogr.GetDriverByName("ESRI Shapefile") 823 | dataSource = driver.Open(filename, 0) 824 | if dataSource is None: 825 | raise Exception("Not a valid shape file") 826 | 827 | layer = dataSource.GetLayer() 828 | if layer.GetGeomType() != 1: 829 | raise Exception("Only point layers are supported") 830 | 831 | spatial_reference = layer.GetSpatialRef() 832 | if spatial_reference is None: 833 | raise Exception("The shapefile doesn't have spatial reference") 834 | 835 | spatial_reference.AutoIdentifyEPSG() 836 | auth_code = spatial_reference.GetAuthorityCode(None) 837 | if auth_code == '': 838 | raise Exception("The input shapefile projection could not be recognized") 839 | 840 | if auth_code != '4326': 841 | # TODO: implement reproject layer (maybe geometry by geometry is easier) 842 | raise Exception("Currently only Lng-Lat WGS84 is supported (EPSG 4326)") 843 | 844 | count = 0 
845 | for feature in layer: 846 | geom = feature.GetGeometryRef() 847 | lat = geom.GetY() 848 | lon = geom.GetX() 849 | count += 1 850 | yield Point(LatLon(lat,lon)) 851 | 852 | logging.info('read %d points' % count) 853 | 854 | class Configuration(object): 855 | ''' 856 | This object holds the settings for creating a heatmap as well as 857 | an iterator for the input data. 858 | 859 | Most of the command line processing is about settings and data, so 860 | the command line options are also processed with this object. 861 | This happens in two phases. 862 | 863 | First the settings are parsed and turned into more useful objects 864 | in set_from_options(). Command line flags go in, and the 865 | Configuration object is populated with the specified values and 866 | defaults. 867 | 868 | In the second phase, various other parameters are computed. These 869 | are things we set automatically based on the other settings or on 870 | the data. You can skip this if you set everything manually, but 871 | 872 | The idea is that someone could import this module, populate a 873 | Configuration instance manually, and run the process themselves. 874 | Where possible, this object contains instances, rather than option 875 | strings (e.g. for projection, kernel, colormap, etc). 876 | 877 | Every parameter is explained in the glossary dictionary, and only 878 | documented parameters are allowed. Parameters default to None. 879 | ''' 880 | 881 | glossary = { 882 | # Many of these are exactly the same as the command line option. 883 | # In those cases, the documentation is left blank. 884 | # Many have default values based on the command line defaults. 885 | 'output' : '', 886 | 'width' : '', 887 | 'height' : '', 888 | 'margin' : '', 889 | 'shapes' : 'unprojected iterable of shapes (Points and LineSegments)', 890 | 'projection' : 'Projection instance', 891 | 'colormap' : 'ColorMap instance', 892 | 'decay' : '', 893 | 'kernel' : 'kernel instance', 894 | 'extent_in' : 'extent in original space', 895 | 'extent_out' : 'extent in projected space', 896 | 897 | 'background': '', 898 | 'background_image': '', 899 | 'background_brightness' : '', 900 | 901 | # OpenStreetMap background tiles 902 | 'osm' : 'True/False; see command line options', 903 | 'osm_base' : '', 904 | 'zoom' : '', 905 | 906 | #layer only outputs the points 907 | 'layer':'', 908 | 909 | # These are for making an animation, ignored otherwise. 910 | 'ffmpegopts' : '', 911 | 'keepframes' : '', 912 | 'frequency' : '', 913 | 'straggler_threshold' : '', 914 | 915 | # We always instantiate an OptionParser in order to set up 916 | # default values. You can use this OptionParser in your own 917 | # script, perhaps adding your own options. 
918 | 'optparser' : 'OptionParser instance for command line processing', 919 | } 920 | 921 | _kernels = { 'linear': LinearKernel, 922 | 'gaussian': GaussianKernel, } 923 | _projections = { 'equirectangular': EquirectangularProjection, 924 | 'mercator': MercatorProjection, } 925 | 926 | def __init__(self, use_defaults=True): 927 | for k in self.glossary.keys(): 928 | setattr(self, k, None) 929 | self.optparser = self._make_optparser() 930 | if use_defaults: 931 | self.set_defaults() 932 | 933 | def set_defaults(self): 934 | (options, args) = self.optparser.parse_args([]) 935 | self.set_from_options(options) 936 | 937 | def _make_optparser(self): 938 | '''Return a an OptionParser set up for our command line options.''' 939 | # TODO: convert to argparse 940 | from optparse import OptionParser 941 | optparser = OptionParser(version=__version__) 942 | optparser.add_option('-g', '--gpx', metavar='FILE') 943 | optparser.add_option( 944 | '-p', '--points', metavar='FILE', 945 | help=( 946 | 'File containing one space-separated coordinate pair per line, ' 947 | 'with optional point value as third term.')) 948 | optparser.add_option( 949 | '', '--csv', metavar='FILE', 950 | help=( 951 | 'File containing one comma-separated coordinate pair per line, ' 952 | 'the rest of the line is ignored.')) 953 | optparser.add_option( 954 | '', '--ignore_csv_header', action='store_true', 955 | help='Ignore first line of CSV input file.') 956 | optparser.add_option( 957 | '', '--shp_file', metavar='FILE', 958 | help=('ESRI Shapefile containing the points.')) 959 | optparser.add_option( 960 | '-s', '--scale', metavar='FLOAT', type='float', 961 | help='meters per pixel, approximate'), 962 | optparser.add_option( 963 | '-W', '--width', metavar='INT', type='int', 964 | help='width of output image'), 965 | optparser.add_option( 966 | '-H', '--height', metavar='INT', type='int', 967 | help='height of output image'), 968 | optparser.add_option( 969 | '-P', '--projection', metavar='NAME', type='choice', 970 | choices=list(self._projections.keys()), default='mercator', 971 | help='choices: ' + ', '.join(self._projections.keys()) + 972 | '; default: %default') 973 | optparser.add_option( 974 | '-e', '--extent', metavar='RANGE', 975 | help=( 976 | 'Clip results to RANGE, which is specified as lat1,lon1,lat2,lon2;' 977 | ' (for square mercator: -85.0511,-180,85.0511,180)')) 978 | optparser.add_option( 979 | '-R', '--margin', metavar='INT', type='int', default=0, 980 | help=( 981 | 'Try to keep data at least this many pixels away from image ' 982 | 'border.')) 983 | optparser.add_option( 984 | '-r', '--radius', metavar='INT', type='int', default=5, 985 | help='pixel radius of point blobs; default: %default') 986 | optparser.add_option( 987 | '-d', '--decay', metavar='FLOAT', type='float', default=0.95, 988 | help=( 989 | 'float in [0,1]; Larger values give more weight to data ' 990 | 'magnitude. Smaller values are more democratic. 
default:' 991 | '%default')) 992 | optparser.add_option( 993 | '-S', '--save', metavar='FILE', help='save processed data to FILE') 994 | optparser.add_option( 995 | '-L', '--load', metavar='FILE', help='load processed data from FILE') 996 | optparser.add_option( 997 | '-o', '--output', metavar='FILE', 998 | help='name of output file (image or video)') 999 | optparser.add_option( 1000 | '-a', '--animate', action='store_true', 1001 | help='Make an animation instead of a static image') 1002 | optparser.add_option( 1003 | '', '--frequency', type='int', default=1, 1004 | help='input points per animation frame; default: %default') 1005 | optparser.add_option( 1006 | '', '--straggler_threshold', type='int', default=1, 1007 | help='add one more animation frame if >= this many inputs remain') 1008 | optparser.add_option( 1009 | '-F', '--ffmpegopts', metavar='STR', 1010 | help='extra options to pass to ffmpeg when making an animation') 1011 | optparser.add_option( 1012 | '-K', '--keepframes', action='store_true', 1013 | help='keep intermediate images after creating an animation') 1014 | optparser.add_option( 1015 | '-b', '--background', metavar='COLOR', 1016 | help='composite onto this background (color name or #rrggbb)') 1017 | optparser.add_option( 1018 | '-I', '--background_image', metavar='FILE', 1019 | help='composite onto this image') 1020 | optparser.add_option( 1021 | '-B', '--background_brightness', type='float', metavar='NUM', 1022 | help='Multiply each pixel in background image by this.') 1023 | optparser.add_option( 1024 | '-m', '--hsva_min', metavar='HEX', 1025 | default=ColorMap.DEFAULT_HSVA_MIN_STR, 1026 | help='hhhssvvaa hex for minimum data values; default: %default') 1027 | optparser.add_option( 1028 | '-M', '--hsva_max', metavar='HEX', 1029 | default=ColorMap.DEFAULT_HSVA_MAX_STR, 1030 | help='hhhssvvaa hex for maximum data values; default: %default') 1031 | optparser.add_option( 1032 | '-G', '--gradient', metavar='FILE', 1033 | help=( 1034 | 'Take color gradient from this the first column of pixels in ' 1035 | 'this image. 
Overrides -m and -M.')) 1036 | optparser.add_option( 1037 | '-k', '--kernel', 1038 | type='choice', 1039 | default='linear', 1040 | choices=list(self._kernels.keys()), 1041 | help=('Kernel to use for the falling-off function; choices: ' + 1042 | ', '.join(self._kernels.keys()) + '; default: %default')) 1043 | optparser.add_option( 1044 | '', '--osm', action='store_true', 1045 | help='Composite onto OpenStreetMap tiles') 1046 | optparser.add_option( 1047 | '', '--osm_base', metavar='URL', 1048 | default='http://tile.openstreetmap.org', 1049 | help='Base URL for map tiles; default %default') 1050 | optparser.add_option( 1051 | '-z', '--zoom', type='int', 1052 | help='Zoom level for OSM; 0 (the default) means autozoom') 1053 | optparser.add_option('-v', '--verbose', action='store_true') 1054 | optparser.add_option('', '--debug', action='store_true') 1055 | optparser.add_option('', '--layer', action='store_true', help='only plot points') 1056 | 1057 | return optparser 1058 | 1059 | def set_from_options(self, options): 1060 | for k in self.glossary.keys(): 1061 | try: 1062 | setattr(self, k, getattr(options, k)) 1063 | except AttributeError: 1064 | pass 1065 | 1066 | self.kernel = self._kernels[options.kernel](options.radius) 1067 | self.projection = self._projections[options.projection]() 1068 | 1069 | if options.scale: 1070 | self.projection.meters_per_pixel = options.scale 1071 | 1072 | if options.gradient: 1073 | self.colormap = ColorMap(image = Image.open(options.gradient)) 1074 | else: 1075 | self.colormap = ColorMap(hsva_min = ColorMap.str_to_hsva(options.hsva_min), 1076 | hsva_max = ColorMap.str_to_hsva(options.hsva_max)) 1077 | if options.gpx: 1078 | logging.debug('Reading from gpx: %s' % options.gpx) 1079 | self.shapes = shapes_from_gpx(options.gpx) 1080 | elif options.points: 1081 | logging.debug('Reading from points: %s' % options.points) 1082 | self.shapes = shapes_from_file(options.points) 1083 | elif options.csv: 1084 | logging.debug('Reading from csv: %s' % options.csv) 1085 | self.shapes = shapes_from_csv(options.csv, options.ignore_csv_header) 1086 | elif options.shp_file: 1087 | logging.debug('Reading from Shape File: %s' % options.shp_file) 1088 | self.shapes = shapes_from_shp(options.shp_file) 1089 | 1090 | if options.extent: 1091 | (lat1, lon1, lat2, lon2) = \ 1092 | [float(f) for f in options.extent.split(',')] 1093 | self.extent_in = Extent(coords=(LatLon(lat1, lon1), 1094 | LatLon(lat2, lon2))) 1095 | if options.background_image: 1096 | self.background_image = Image.open(options.background_image) 1097 | (self.width, self.height) = self.background_image.size 1098 | 1099 | 1100 | def fill_missing(self): 1101 | if not self.shapes: 1102 | raise ValueError('no input specified') 1103 | 1104 | padding = self.margin + self.kernel.radius 1105 | if not self.extent_in: 1106 | logging.debug('reading input data') 1107 | self.shapes = list(self.shapes) 1108 | logging.debug('read %d shapes' % len(self.shapes)) 1109 | self.extent_in = Extent(shapes=self.shapes) 1110 | 1111 | if self.osm: 1112 | get_osm_background(self, padding) 1113 | else: 1114 | if not self.projection.is_scaled(): 1115 | self.projection.auto_set_scale(self.extent_in, padding, 1116 | self.width, self.height) 1117 | if not (self.width or self.height or self.background_image): 1118 | raise ValueError('You must specify width or height or scale ' 1119 | 'or background_image or both osm and zoom.') 1120 | 1121 | if self.background_brightness is not None: 1122 | if self.background_image: 1123 | self.background_image = 
self.background_image.point( 1124 | lambda x: x * self.background_brightness) 1125 | self.background_brightness = None # idempotence 1126 | else: 1127 | logging.warning( 1128 | 'background brightness specified, but no background image') 1129 | 1130 | if not self.extent_out: 1131 | self.extent_out = self.extent_in.map(self.projection.project) 1132 | self.extent_out.grow(padding) 1133 | logging.info('input extent: %s' % str(self.extent_out.map( 1134 | self.projection.inverse_project))) 1135 | logging.info('output extent: %s' % str(self.extent_out)) 1136 | 1137 | 1138 | def main(): 1139 | logging.basicConfig(format='%(relativeCreated)8d ms // %(message)s') 1140 | config = Configuration(use_defaults=False) 1141 | (options, args) = config.optparser.parse_args() 1142 | 1143 | if options.verbose: 1144 | logging.getLogger().setLevel(logging.INFO) 1145 | if options.debug: 1146 | logging.getLogger().setLevel(logging.DEBUG) 1147 | 1148 | logging.debug('hsva_min: {}/n hsva_max: {}'.format(options.hsva_min, options.hsva_max)) 1149 | 1150 | if options.load: 1151 | logging.info('loading data') 1152 | matrix = pickle.load(open(options.load, 'rb')) 1153 | config = matrix['config'] 1154 | del matrix['config'] 1155 | config.set_from_options(options) 1156 | config.fill_missing() 1157 | else: 1158 | config.set_from_options(options) 1159 | config.fill_missing() 1160 | if options.animate: 1161 | animator = ImageSeriesMaker(config) 1162 | matrix = animator.run() 1163 | else: 1164 | matrix = process_shapes(config) 1165 | matrix = matrix.finalized() 1166 | 1167 | if options.output and not options.animate: 1168 | image = ImageMaker(config).make_image(matrix) 1169 | image.save(options.output) 1170 | 1171 | if options.save: 1172 | logging.info('saving data') 1173 | matrix['config'] = config 1174 | pickle.dump(matrix, open(options.save, 'wb'), 2) 1175 | 1176 | logging.info('end') 1177 | 1178 | if __name__ == '__main__': 1179 | main() 1180 | -------------------------------------------------------------------------------- /murcia_tweets_polarity.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/manugarri/tweets_map/36bc3f88497406327f1609c0ef488c00ee8b9223/murcia_tweets_polarity.png -------------------------------------------------------------------------------- /murcia_tweets_polarity_layered_combined_binary.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/manugarri/tweets_map/36bc3f88497406327f1609c0ef488c00ee8b9223/murcia_tweets_polarity_layered_combined_binary.png -------------------------------------------------------------------------------- /murcia_tweets_polarity_layered_combined_binary2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/manugarri/tweets_map/36bc3f88497406327f1609c0ef488c00ee8b9223/murcia_tweets_polarity_layered_combined_binary2.png --------------------------------------------------------------------------------
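A note connecting the two pieces above: notebook 2 writes tweets_heatmap, a plain-text file with one space-separated "lat lon" pair per line, which is the format heatmap.py expects for its --points input. The repository does not record the exact command used to render the PNG images listed above, so the invocation below is only a sketch built from options that heatmap.py actually defines; the width, radius, background brightness and output filename are illustrative, and the --osm flag additionally requires the osmviz package to fetch OpenStreetMap tiles.

    # hypothetical example, run from the heatmap directory where tweets_heatmap was saved
    python heatmap.py --points tweets_heatmap --osm --width 2000 --radius 12 -B 0.7 -o murcia_tweets_heatmap.png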