├── 1. Downloading the tweets.ipynb
├── 2. Parse the tweet file.ipynb
├── 3. Heatmap.ipynb
├── 4. Sentiment Analysis.ipynb
├── heatmap.py
├── murcia_tweets_polarity.png
├── murcia_tweets_polarity_layered_combined_binary.png
├── murcia_tweets_polarity_layered_combined_binary2.png
└── tweets.txt
/1. Downloading the tweets.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 4,
6 | "metadata": {
7 | "collapsed": false
8 | },
9 | "outputs": [
10 | {
11 | "name": "stdout",
12 | "output_type": "stream",
13 | "text": [
14 | "11/02/2015 22:10:55\n",
15 | "\n",
16 | "CPython 2.7.10\n",
17 | "IPython 4.0.0\n",
18 | "\n",
19 | "compiler : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)\n",
20 | "system : Linux\n",
21 | "release : 3.13.0-66-generic\n",
22 | "machine : x86_64\n",
23 | "processor : x86_64\n",
24 | "CPU cores : 8\n",
25 | "interpreter: 64bit\n"
26 | ]
27 | }
28 | ],
29 | "source": [
30 | "%load_ext watermark\n",
31 | "%watermark"
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "**TWEETS HEATMAP OF MURCIA**"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "**1. Getting the tweets**"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "*In order to get the tweets, we need to use the Twitter Streaming API. I found that [Tweepy](http://tweepy.readthedocs.org/en/latest/getting_started.html), one of the many Twitter API Python wrappers, does an outstanding job of capturing tweets via the Twitter Streaming API.*\n",
53 | "\n",
54 | "Here is the code I used. I filtered the tweets by location. Twitter allows filtering with a bounding box of coordinates using the following structure:\n",
55 | "\n",
56 | "**location = [sw_longitude, sw_latitude, ne_longitude, ne_latitude]**\n",
57 | "\n",
58 | "*(Interesting how some APIs follow (lat, lon) and others (lon, lat). We need a standard.)*"
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": null,
64 | "metadata": {
65 | "collapsed": false
66 | },
67 | "outputs": [],
68 | "source": [
69 | "import json\n",
70 | "from tweepy import Stream\n",
71 | "from tweepy import OAuthHandler\n",
72 | "from tweepy.streaming import StreamListener\n",
73 | "\n",
74 | "\n",
75 | "ckey = 'YOUR_CONSUMER_KEY_HERE'\n",
76 | "csecret = 'YOUR_CONSUMER_SECRET_HERE'\n",
77 | "atoken = 'YOUR_TWITTER_APP_TOKEN_HERE'\n",
78 | "asecret = 'YOUR_TWITTER_APP_SECRET_HERE'\n",
79 | "\n",
80 | "murcia = [-1.157420, 37.951741, -1.081202, 38.029126]  # Check it out, it is a very nice city!\n",
81 | "\n",
82 | "file = open('tweets.txt', 'a')\n",
83 | "\n",
84 | "class listener(StreamListener):\n",
85 | "\n",
86 | "    def on_data(self, data):\n",
87 | "        # Twitter returns data in JSON format - we need to decode it first\n",
88 | "        try:\n",
89 | "            decoded = json.loads(data)\n",
90 | "        except Exception as e:\n",
91 | "            print e  # we don't want the listener to stop\n",
92 | "            return True\n",
93 | "\n",
94 | "        if decoded.get('geo') is not None:\n",
95 | "            location = decoded.get('geo').get('coordinates')\n",
96 | "        else:\n",
97 | "            location = '[,]'\n",
98 | "        text = decoded['text'].replace('\\n', ' ')\n",
99 | "        user = '@' + decoded.get('user').get('screen_name')\n",
100 | "        created = decoded.get('created_at')\n",
101 | "        tweet = '%s|%s|%s|%s\\n' % (user, location, created, text)\n",
102 | "\n",
103 | "        file.write(tweet)\n",
104 | "        print tweet\n",
105 | "        return True\n",
106 | "\n",
107 | "    def on_error(self, status):\n",
108 | "        print status\n",
109 | "\n",
110 | "if __name__ == '__main__':\n",
111 | "    print 'Starting'\n",
112 | "\n",
113 | "    auth = OAuthHandler(ckey, csecret)\n",
114 | "    auth.set_access_token(atoken, asecret)\n",
115 | "    twitterStream = Stream(auth, listener())\n",
116 | "    twitterStream.filter(locations=murcia)"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "**Run it... and wait.**\n",
124 | "\n",
125 | "The script will capture all the tweets that fall within the bounding box we set up.\n",
126 | "\n",
127 | "One important thing to notice is that the API is not 100% accurate in the data it returns: I found several geocoded tweets that didn't belong to the specified box.\n",
128 | "\n",
129 | "Since the script has to keep running in order to capture the tweets, you can run it on a spare computer if you have one, or alternatively consider online services such as [RedHat](http://www.redhat.com/) or [PythonAnywhere](https://www.pythonanywhere.com/), or rent your own tiny machine in the cloud with services like [Digital Ocean](https://www.digitalocean.com/) or [Amazon Web Services](https://aws.amazon.com/)."
130 | ]
131 | },
132 | {
133 | "cell_type": "markdown",
134 | "metadata": {},
135 | "source": [
136 | ""
137 | ]
138 | }
139 | ],
140 | "metadata": {
141 | "kernelspec": {
142 | "display_name": "Python 2",
143 | "language": "python",
144 | "name": "python2"
145 | },
146 | "language_info": {
147 | "codemirror_mode": {
148 | "name": "ipython",
149 | "version": 2
150 | },
151 | "file_extension": ".py",
152 | "mimetype": "text/x-python",
153 | "name": "python",
154 | "nbconvert_exporter": "python",
155 | "pygments_lexer": "ipython2",
156 | "version": "2.7.10"
157 | }
158 | },
159 | "nbformat": 4,
160 | "nbformat_minor": 0
161 | }
162 |
--------------------------------------------------------------------------------
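The notebook above filters the stream by a (lon, lat) bounding box and notes that Twitter occasionally returns geocoded tweets outside it. A minimal client-side post-filter could look like the following sketch; the function name `in_bounding_box` is my own, with the Murcia box copied from the notebook:

```python
# Assumption: the box follows Twitter's [sw_lon, sw_lat, ne_lon, ne_lat]
# ordering, as used in the notebook above.
MURCIA = [-1.157420, 37.951741, -1.081202, 38.029126]

def in_bounding_box(lat, lon, box=MURCIA):
    """Return True if (lat, lon) falls inside box = [sw_lon, sw_lat, ne_lon, ne_lat]."""
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat
```

Dropping tweets that fail a check like this before writing them to `tweets.txt` would avoid the out-of-box points mentioned above.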
/2. Parse the tweet file.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### LOAD AND PARSE THE TWEETS FILE"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "collapsed": false
15 | },
16 | "outputs": [
17 | {
18 | "name": "stdout",
19 | "output_type": "stream",
20 | "text": [
21 | "11/03/2015 18:36:43\n",
22 | "\n",
23 | "CPython 2.7.10\n",
24 | "IPython 4.0.0\n",
25 | "\n",
26 | "compiler : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)\n",
27 | "system : Linux\n",
28 | "release : 3.13.0-66-generic\n",
29 | "machine : x86_64\n",
30 | "processor : x86_64\n",
31 | "CPU cores : 8\n",
32 | "interpreter: 64bit\n"
33 | ]
34 | }
35 | ],
36 | "source": [
37 | "%load_ext watermark\n",
38 | "%watermark"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "The Tweepy stream listener writes one tweet per line to a file. Each line follows this structure:\n",
46 | "\n",
47 | "*@USER | [LAT,LON] | TIMESTAMP | TWEET*\n",
48 | "\n",
49 | "Now we proceed to turn it into a more usable file."
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "Since the tweets file was so big, I processed it in chunks instead of loading all of it into memory."
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": null,
62 | "metadata": {
63 | "collapsed": true
64 | },
65 | "outputs": [],
66 | "source": [
67 | "import pandas as pd\n",
68 | "import numpy as np\n",
69 | "\n",
70 | "tweets_raw = pd.read_table('tweets.txt', header=None, iterator=True)\n",
71 | "\n",
72 | "while True:\n",
73 | "    try:\n        tweets = tweets_raw.get_chunk(10000)\n    except StopIteration:\n        break  # no more chunks left in the file\n",
74 | "    tweets.columns = ['tweets']\n",
75 | "    tweets['len'] = tweets.tweets.apply(lambda x: len(x.split('|')))\n",
76 | "    tweets[tweets.len < 4] = np.nan\n",
77 | "    del tweets['len']\n",
78 | "    tweets = tweets[tweets.tweets.notnull()]\n",
79 | "    tweets['user'] = tweets.tweets.apply(lambda x: x.split('|')[0])\n",
80 | "    tweets['geo'] = tweets.tweets.apply(lambda x: x.split('|')[1])\n",
81 | "    tweets['timestamp'] = tweets.tweets.apply(lambda x: x.split('|')[2])\n",
82 | "    tweets['tweet'] = tweets.tweets.apply(lambda x: x.split('|')[3])\n",
83 | "    tweets['lat'] = tweets.geo.apply(lambda x: x.split(',')[0].replace('[',''))\n",
84 | "    tweets['lon'] = tweets.geo.apply(lambda x: x.split(',')[1].replace(']',''))\n",
85 | "    del tweets['tweets']\n",
86 | "    del tweets['geo']\n",
87 | "    tweets['lon'] = tweets.lon.convert_objects(convert_numeric=True)\n",
88 | "    tweets['lat'] = tweets.lat.convert_objects(convert_numeric=True)\n",
89 | "    tweets.to_csv('tweets.csv', mode='a', header=False, index=False)"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": 31,
95 | "metadata": {
96 | "collapsed": false
97 | },
98 | "outputs": [
99 | {
100 | "data": {
101 | "text/plain": [
102 | "(181987, 5)"
103 | ]
104 | },
105 | "execution_count": 31,
106 | "metadata": {},
107 | "output_type": "execute_result"
108 | }
109 | ],
110 | "source": [
111 | "tweets.shape"
112 | ]
113 | },
114 | {
115 | "cell_type": "code",
116 | "execution_count": 9,
117 | "metadata": {
118 | "collapsed": false
119 | },
120 | "outputs": [
121 | {
122 | "data": {
123 | "text/plain": [
124 | "user object\n",
125 | "timestamp datetime64[ns]\n",
126 | "tweet object\n",
127 | "lat float64\n",
128 | "lon float64\n",
129 | "dtype: object"
130 | ]
131 | },
132 | "execution_count": 9,
133 | "metadata": {},
134 | "output_type": "execute_result"
135 | }
136 | ],
137 | "source": [
138 | "tweets.dtypes"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": 33,
144 | "metadata": {
145 | "collapsed": false
146 | },
147 | "outputs": [
148 | {
149 | "data": {
208 | "text/plain": [
209 | " user timestamp \\\n",
210 | "1 @IkiduAuren 2015-03-29 15:58:23 \n",
211 | "3 @monorex2 2015-03-29 15:59:37 \n",
212 | "4 @Santos_Poveda 2015-03-29 16:01:29 \n",
213 | "6 @anittaaML 2015-03-29 16:03:46 \n",
214 | "7 @helenatovar0210 2015-03-29 16:04:46 \n",
215 | "\n",
216 | " tweet lat lon \n",
217 | "1 Feliz tarde de Domingo. http://t.co/jxL7v5zFwd 38.842937 -0.115407 \n",
218 | "3 Good afternoon:-D:-D 38.026032 -1.208355 \n",
219 | "4 @InkUtv @OilVirgin @RiobuenoRafael @NinaNebo @... 38.095896 -1.181909 \n",
220 | "6 @caarmens98 te voy a reportar por hj de p 37.992632 -1.197700 \n",
221 | "7 Lucha por lo qe quieres qe les joda a los qe h... 38.055685 -1.081301 "
222 | ]
223 | },
224 | "execution_count": 33,
225 | "metadata": {},
226 | "output_type": "execute_result"
227 | }
228 | ],
229 | "source": [
230 | "# convert the time zone from UTC to Spain time for further time-of-day analyses\n",
231 | "tweets = tweets.set_index('timestamp').tz_localize('UTC').tz_convert('Europe/Madrid').reset_index()\n",
232 | "tweets.head()"
233 | ]
234 | },
235 | {
236 | "cell_type": "markdown",
237 | "metadata": {},
238 | "source": [
239 | "Since we want to build a heatmap, we only care about the tweets that are geocoded and whose latitude and longitude fall within the Murcia area."
240 | ]
241 | },
242 | {
243 | "cell_type": "code",
244 | "execution_count": 34,
245 | "metadata": {
246 | "collapsed": false
247 | },
248 | "outputs": [
249 | {
250 | "data": {
251 | "text/plain": [
252 | "(95384, 5)"
253 | ]
254 | },
255 | "execution_count": 34,
256 | "metadata": {},
257 | "output_type": "execute_result"
258 | }
259 | ],
260 | "source": [
261 | "min_lon = -1.157420\n",
262 | "max_lon = -1.081202\n",
263 | "min_lat = 37.951741\n",
264 | "max_lat = 38.029126\n",
265 | "\n",
266 | "tweets = tweets[(tweets.lat.notnull()) & (tweets.lon.notnull())]\n",
267 | "\n",
268 | "tweets = tweets[(tweets.lon > min_lon) & (tweets.lon < max_lon) & (tweets.lat > min_lat) & (tweets.lat < max_lat)]\n",
269 | "tweets.shape"
270 | ]
271 | },
272 | {
273 | "cell_type": "markdown",
274 | "metadata": {},
275 | "source": [
276 | "Finally, we save the parsed tweets to use with [heatmap.py](http://www.sethoscope.net/heatmap/)"
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": 40,
282 | "metadata": {
283 | "collapsed": false
284 | },
285 | "outputs": [
286 | {
287 | "name": "stdout",
288 | "output_type": "stream",
289 | "text": [
290 | "/media/manuel/DATA/Backup/Proyectos/tweepy murcia/heatmap\n"
291 | ]
292 | }
293 | ],
294 | "source": [
295 | "cd ../heatmap"
296 | ]
297 | },
298 | {
299 | "cell_type": "code",
300 | "execution_count": 42,
301 | "metadata": {
302 | "collapsed": false
303 | },
304 | "outputs": [],
305 | "source": [
306 | "with open('tweets_heatmap', 'w') as f:  # avoid shadowing the builtin 'file'\n",
307 | "    f.write(tweets[['lat','lon']].to_string(header=False, index=False))"
308 | ]
309 | }
310 | ],
311 | "metadata": {
312 | "kernelspec": {
313 | "display_name": "Python 2",
314 | "language": "python",
315 | "name": "python2"
316 | },
317 | "language_info": {
318 | "codemirror_mode": {
319 | "name": "ipython",
320 | "version": 2
321 | },
322 | "file_extension": ".py",
323 | "mimetype": "text/x-python",
324 | "name": "python",
325 | "nbconvert_exporter": "python",
326 | "pygments_lexer": "ipython2",
327 | "version": "2.7.10"
328 | }
329 | },
330 | "nbformat": 4,
331 | "nbformat_minor": 0
332 | }
333 |
--------------------------------------------------------------------------------
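The parsing notebook above splits each `@USER|[LAT,LON]|TIMESTAMP|TWEET` line on `|` using pandas. A dependency-free sketch of the same per-line logic, for readers who want to follow the steps without pandas (the function name and sample line are mine, not the notebook's):

```python
def parse_tweet_line(line):
    """Split '@user|[lat,lon]|timestamp|tweet' into a dict, or None if malformed."""
    parts = line.rstrip('\n').split('|')
    if len(parts) < 4:
        return None  # mirrors the notebook's drop of rows with fewer than 4 fields
    user, geo, timestamp = parts[0], parts[1], parts[2]
    tweet = '|'.join(parts[3:])  # the tweet text itself may contain '|'
    lat_str, lon_str = geo.strip('[]').split(',')
    try:
        lat, lon = float(lat_str), float(lon_str)
    except ValueError:
        lat = lon = None  # tweets without geo come through as '[,]'
    return {'user': user, 'timestamp': timestamp, 'tweet': tweet,
            'lat': lat, 'lon': lon}
```

Rows with missing coordinates come back with `lat`/`lon` of `None`, matching the notebook's later `notnull()` filter.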
/heatmap.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | #
3 | # heatmap.py - Generates heat map images and animations from geographic data
4 | # Copyright 2010 Seth Golub
5 | # http://www.sethoscope.net/heatmap/
6 | #
7 | # This program is free software: you can redistribute it and/or modify
8 | # it under the terms of the GNU Affero General Public License as
9 | # published by the Free Software Foundation, either version 3 of the
10 | # License, or (at your option) any later version.
11 | #
12 | # This program is distributed in the hope that it will be useful,
13 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
14 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15 | # GNU Affero General Public License for more details.
16 | #
17 | # You should have received a copy of the GNU Affero General Public License
18 | # along with this program.  If not, see <http://www.gnu.org/licenses/>.
19 |
20 | from __future__ import print_function
21 |
22 | import sys
23 | import logging
24 | import math
25 | from PIL import Image
26 | from PIL import ImageColor
27 | import tempfile
28 | import os.path
29 | import shutil
30 | import subprocess
31 | from time import mktime, strptime
32 | from collections import defaultdict
33 | import xml.etree.cElementTree as ET
34 | from colorsys import hsv_to_rgb
35 | try:
36 | import cPickle as pickle
37 | except ImportError:
38 | import pickle
39 |
40 | __version__ = '1.11'
41 |
42 | class Coordinate(object):
43 | def __init__(self, x, y):
44 | self.x = x
45 | self.y = y
46 |
47 | first = property(lambda self: self.x)
48 | second = property(lambda self: self.y)
49 |
50 | def copy(self):
51 | return self.__class__(self.first, self.second)
52 |
53 | def __str__(self):
54 | return '(%s, %s)' % (str(self.x), str(self.y))
55 |
56 | def __hash__(self):
57 | return hash((self.x, self.y))
58 |
59 | def __eq__(self, o):
60 | return True if self.x == o.x and self.y == o.y else False
61 |
62 | def __sub__(self, o):
63 | return self.__class__(self.first - o.first, self.second - o.second)
64 |
65 |
66 | class LatLon(Coordinate):
67 | def __init__(self, lat, lon):
68 | self.lat = lat
69 | self.lon = lon
70 |
71 | def get_lat(self):
72 | return self.y
73 |
74 | def set_lat(self, lat):
75 | self.y = lat
76 |
77 | def get_lon(self):
78 | return self.x
79 |
80 | def set_lon(self, lon):
81 | self.x = lon
82 |
83 | lat = property(get_lat, set_lat)
84 | lon = property(get_lon, set_lon)
85 |
86 | first = property(get_lat)
87 | second = property(get_lon)
88 |
89 | class TrackLog:
90 | class Trkseg(list): # for GPX tags
91 | pass
92 |
93 | class Trkpt: # for GPX tags
94 | def __init__(self, lat, lon):
95 | self.coords = LatLon(float(lat), float(lon))
96 |
97 | def __str__(self):
98 | return str(self.coords)
99 |
100 | def _parse(self, filename):
101 | self._segments = []
102 | for event, elem in ET.iterparse(filename, ('start', 'end')):
103 | elem.tag = elem.tag[elem.tag.rfind('}') + 1:] # remove namespace
104 | if elem.tag == "trkseg":
105 | if event == 'start':
106 | self._segments.append(TrackLog.Trkseg())
107 | else: # event == 'end'
108 | yield self._segments.pop()
109 | elem.clear() # delete contents from parse tree
110 | elif elem.tag == 'trkpt' and event == 'end':
111 | point = TrackLog.Trkpt(elem.attrib['lat'], elem.attrib['lon'])
112 | self._segments[-1].append(point)
113 | timestr = elem.findtext('time')
114 | if timestr:
115 | timestr = timestr[:-1].split('.')[0] + ' GMT'
116 | point.time = mktime(
117 | strptime(timestr, '%Y-%m-%dT%H:%M:%S %Z'))
118 | elem.clear() # clear the trkpt node to minimize memory usage
119 |
120 | def __init__(self, filename):
121 | self.filename = filename
122 |
123 | def segments(self):
124 | '''Parse file and yield segments containing points'''
125 | logging.info('reading GPX track from %s' % self.filename)
126 | return self._parse(self.filename)
127 |
128 |
129 | class Projection(object):
130 | # For guessing scale, we pretend the earth is a sphere with this
131 | # radius in meters, as in Web Mercator (the projection all the
132 | # online maps use).
133 | EARTH_RADIUS = 6378137 # in meters
134 |
135 | def get_pixels_per_degree(self):
136 | try:
137 | return self._pixels_per_degree
138 | except AttributeError:
139 | raise AttributeError('projection scale was never set')
140 |
141 | def set_pixels_per_degree(self, val):
142 | self._pixels_per_degree = val
143 | logging.info('scale: %f meters/pixel (%f pixels/degree)'
144 | % (self.meters_per_pixel, val))
145 |
146 | def get_meters_per_pixel(self):
147 | return 2 * math.pi * self.EARTH_RADIUS / 360 / self.pixels_per_degree
148 |
149 | def set_meters_per_pixel(self, val):
150 | self.pixels_per_degree = 2 * math.pi * self.EARTH_RADIUS / 360 / val
151 | return val
152 |
153 | pixels_per_degree = property(get_pixels_per_degree, set_pixels_per_degree)
154 | meters_per_pixel = property(get_meters_per_pixel, set_meters_per_pixel)
155 |
156 | def is_scaled(self):
157 | return hasattr(self, '_pixels_per_degree')
158 |
159 | def project(self, coords):
160 | raise NotImplementedError
161 |
162 | def inverse_project(self, coords): # Not all projections can support this.
163 | raise NotImplementedError
164 |
165 | def auto_set_scale(self, extent_in, padding, width=None, height=None):
166 | # We need to choose a scale at which the data's bounding box,
167 | # once projected onto the map, will fit in the specified height
168 | # and/or width. The catch is that we can't project until we
169 | # have a scale, so what we'll do is set a provisional scale,
170 | # project the bounding box onto the map, then adjust the scale
171 | # appropriately. This way we don't need to know anything about
172 | # the projection.
173 | #
174 | # Projection subclasses are free to override this method with
175 | # something simpler that just solves for scale given the lat/lon
176 | # and x/y bounds.
177 |
178 | # We'll work large to minimize roundoff error.
179 | SCALE_FACTOR = 1000000.0
180 | self.pixels_per_degree = SCALE_FACTOR
181 | extent_out = extent_in.map(self.project)
182 | padding *= 2 # padding-per-edge -> padding-in-each-dimension
183 | try:
184 | if height:
185 | self.pixels_per_degree = pixels_per_lat = (
186 | float(height - padding) /
187 | extent_out.size().y * SCALE_FACTOR)
188 | if width:
189 | self.pixels_per_degree = (
190 | float(width - padding) /
191 | extent_out.size().x * SCALE_FACTOR)
192 | if height:
193 | self.pixels_per_degree = min(self.pixels_per_degree,
194 | pixels_per_lat)
195 | except ZeroDivisionError:
196 | raise ZeroDivisionError(
197 | 'You need at least two data points for auto scaling. '
198 | 'Try specifying the scale explicitly (or extent + '
199 | 'height or width).')
200 | assert(self.pixels_per_degree > 0)
201 |
202 |
203 | # Treats Lat/Lon as a square grid.
204 | class EquirectangularProjection(Projection):
205 | # http://en.wikipedia.org/wiki/Equirectangular_projection
206 | def project(self, coord):
207 | x = coord.lon * self.pixels_per_degree
208 | y = -coord.lat * self.pixels_per_degree
209 | return Coordinate(x, y)
210 |
211 | def inverse_project(self, coord):
212 | lat = -coord.y / self.pixels_per_degree
213 | lon = coord.x / self.pixels_per_degree
214 | return LatLon(lat, lon)
215 |
216 |
217 | class MercatorProjection(Projection):
218 | def set_pixels_per_degree(self, val):
219 | super(MercatorProjection, self).set_pixels_per_degree(val)
220 | self._pixels_per_radian = val * (180 / math.pi)
221 | pixels_per_degree = property(Projection.get_pixels_per_degree,
222 | set_pixels_per_degree)
223 |
224 | def project(self, coord):
225 | x = coord.lon * self.pixels_per_degree
226 | y = -self._pixels_per_radian * math.log(
227 | math.tan((math.pi/4 + math.pi/360 * coord.lat)))
228 | return Coordinate(x, y)
229 |
230 | def inverse_project(self, coord):
231 | lat = (360 / math.pi
232 | * math.atan(math.exp(-coord.y / self._pixels_per_radian)) - 90)
233 | lon = coord.x / self.pixels_per_degree
234 | return LatLon(lat, lon)
235 |
236 | class Extent():
237 | def __init__(self, coords=None, shapes=None):
238 | if coords:
239 | coords = tuple(coords) # if it's a generator, slurp them all
240 | self.min = coords[0].__class__(min(c.first for c in coords),
241 | min(c.second for c in coords))
242 | self.max = coords[0].__class__(max(c.first for c in coords),
243 | max(c.second for c in coords))
244 | elif shapes:
245 | self.from_shapes(shapes)
246 | else:
247 | raise ValueError('Extent must be initialized')
248 |
249 | def __str__(self):
250 | return '%s,%s,%s,%s' % (self.min.y, self.min.x, self.max.y, self.max.x)
251 |
252 | def update(self, other):
253 | '''grow this bounding box so that it includes the other'''
254 | self.min.x = min(self.min.x, other.min.x)
255 | self.min.y = min(self.min.y, other.min.y)
256 | self.max.x = max(self.max.x, other.max.x)
257 | self.max.y = max(self.max.y, other.max.y)
258 |
259 | def from_bounding_box(self, other):
260 | self.min = other.min.copy()
261 | self.max = other.max.copy()
262 |
263 | def from_shapes(self, shapes):
264 | shapes = iter(shapes)
265 | self.from_bounding_box(next(shapes).extent)
266 | for s in shapes:
267 | self.update(s.extent)
268 |
269 | def corners(self):
270 | return (self.min, self.max)
271 |
272 | def size(self):
273 | return self.max.__class__(self.max.x - self.min.x,
274 | self.max.y - self.min.y)
275 |
276 | def grow(self, pad):
277 | self.min.x -= pad
278 | self.min.y -= pad
279 | self.max.x += pad
280 | self.max.y += pad
281 |
282 | def resize(self, width=None, height=None):
283 | if width:
284 | self.max.x += float(width - self.size().x) / 2
285 | self.min.x = self.max.x - width
286 | if height:
287 | self.max.y += float(height - self.size().y) / 2
288 | self.min.y = self.max.y - height
289 |
290 | def is_inside(self, coord):
291 | return (coord.x >= self.min.x and coord.x <= self.max.x and
292 | coord.y >= self.min.y and coord.y <= self.max.y)
293 |
294 | def map(self, func):
295 | '''Returns a new Extent whose corners are a function of the
296 | corners of this one. The expected use is to project a Extent
297 | onto a map. For example: bbox_xy = bbox_ll.map(projector.project)'''
298 | return Extent(coords=(func(self.min), func(self.max)))
299 |
300 |
301 | class Matrix(defaultdict):
302 | '''An abstract sparse matrix, with data stored as {coord : value}.'''
303 |
304 | @staticmethod
305 | def matrix_factory(decay):
306 | # If decay is 0 or 1, we can accumulate as we go and save lots of
307 | # memory.
308 | if decay == 1.0:
309 | logging.info('creating a summing matrix')
310 | return SummingMatrix()
311 | elif decay == 0.0:
312 | logging.info('creating a maxing matrix')
313 | return MaxingMatrix()
314 | logging.info('creating an appending matrix')
315 | return AppendingMatrix(decay)
316 |
317 | def __init__(self, default_factory=float):
318 | self.default_factory = default_factory
319 |
320 | def add(self, coord, val):
321 | raise NotImplementedError
322 |
323 | def extent(self):
324 | return(Extent(coords=self.keys()))
325 |
326 | def finalized(self):
327 | return self
328 |
329 |
330 | class SummingMatrix(Matrix):
331 | def add(self, coord, val):
332 | self[coord] += val
333 |
334 |
335 | class MaxingMatrix(Matrix):
336 | def add(self, coord, val):
337 | self[coord] = max(val, self.get(coord, val))
338 |
339 |
340 | class AppendingMatrix(Matrix):
341 | def __init__(self, decay):
342 | self.default_factory = list
343 | self.decay = decay
344 |
345 | def add(self, coord, val):
346 | self[coord].append(val)
347 |
348 | def finalized(self):
349 | logging.info('combining coincident points')
350 | m = Matrix()
351 | for (coord, values) in self.items():
352 | m[coord] = self.reduce(self.decay, values)
353 | return m
354 |
355 | @staticmethod
356 | def reduce(decay, values):
357 | '''
358 | Returns a weighted sum of the values, where weight N is
359 | pow(decay,N). This means the largest value counts fully, but
360 | additional values have diminishing contributions. decay=0 makes
361 | the reduction equivalent to max(), which makes each data point
362 | visible, but says nothing about their relative magnitude.
363 | decay=1 makes this like sum(), which makes the relative
364 | magnitude of the points more visible, but could make smaller
365 | values hard to see. Experiment with values between 0 and 1.
366 | Values outside that range will give weird results.
367 | '''
368 | # It would be nice to do this on the fly, while accumulating data, but
369 | # it needs to be insensitive to data order.
370 | weight = 1.0
371 | total = 0.0
372 | values.sort(reverse=True)
373 | for value in values:
374 | total += value * weight
375 | weight *= decay
376 | return total
377 |
378 |
379 | class Point:
380 | def __init__(self, coord, weight=1.0):
381 | self.coord = coord
382 | self.weight = weight
383 |
384 | def __str__(self):
385 | return 'P(%s)' % str(self.coord)
386 |
387 | @staticmethod
388 | def general_distance(x, y):
389 | # assumes square units, which causes distortion in some projections
390 | return (x ** 2 + y ** 2) ** 0.5
391 |
392 | @property
393 | def extent(self):
394 | if not hasattr(self, '_extent'):
395 | self._extent = Extent(coords=(self.coord,))
396 | return self._extent
397 |
398 | # From a modularity standpoint, it would be reasonable to cache
399 | # distances, not heat values, and let the kernel cache the
400 | # distance to heat map, but this is substantially faster.
401 | heat_cache = {}
402 | @classmethod
403 | def _initialize_heat_cache(cls, kernel):
404 | cache = {}
405 | for x in range(kernel.radius + 1):
406 | for y in range(kernel.radius + 1):
407 | cache[(x, y)] = kernel.heat(cls.general_distance(x, y))
408 | cls.heat_cache[kernel] = cache
409 |
410 | def add_heat_to_matrix(self, matrix, kernel):
411 | if kernel not in Point.heat_cache:
412 | Point._initialize_heat_cache(kernel)
413 | cache = Point.heat_cache[kernel]
414 | x = int(self.coord.x)
415 | y = int(self.coord.y)
416 | for dx in range(-kernel.radius, kernel.radius + 1):
417 | for dy in range(-kernel.radius, kernel.radius + 1):
418 | matrix.add(Coordinate(x + dx, y + dy),
419 | self.weight * cache[(abs(dx), abs(dy))])
420 |
421 | def map(self, func):
422 | return Point(func(self.coord), self.weight)
423 |
424 |
425 | class LineSegment:
426 | def __init__(self, start, end, weight=1.0):
427 | self.start = start
428 | self.end = end
429 | self.weight = weight
430 | self.length_squared = float((self.end.x - self.start.x) ** 2 +
431 | (self.end.y - self.start.y) ** 2)
432 | self.extent = Extent(coords=(start, end))
433 |
434 | def __str__(self):
435 | return 'LineSegment(%s, %s)' % (self.start, self.end)
436 |
437 | def distance(self, coord):
438 | # http://stackoverflow.com/questions/849211/shortest-distance-between-a-point-and-a-line-segment
439 | # http://www.topcoder.com/tc?d1=tutorials&d2=geometry1&module=Static#line_point_distance
440 | # http://local.wasp.uwa.edu.au/~pbourke/geometry/pointline/
441 | try:
442 | dx = (self.end.x - self.start.x)
443 | dy = (self.end.y - self.start.y)
444 | u = ((coord.x - self.start.x) * dx +
445 | (coord.y - self.start.y) * dy) / self.length_squared
446 | if u < 0:
447 | u = 0
448 | elif u > 1:
449 | u = 1
450 | except ZeroDivisionError:
451 | u = 0 # Our line is zero-length. That's ok.
452 | dx = self.start.x + u * dx - coord.x
453 | dy = self.start.y + u * dy - coord.y
454 | return math.sqrt(dx * dx + dy * dy)
455 |
456 | def add_heat_to_matrix(self, matrix, kernel):
457 | # Iterate over every point in a bounding box around this, with an
458 | # extra margin given by the kernel's self-reported maximum range.
459 | # TODO: There is probably a more clever iteration that skips more
460 | # of the empty space.
461 | for x in range(int(self.extent.min.x - kernel.radius),
462 | int(self.extent.max.x + kernel.radius + 1)):
463 | for y in range(int(self.extent.min.y - kernel.radius),
464 | int(self.extent.max.y + kernel.radius + 1)):
465 | coord = Coordinate(x, y)
466 | heat = kernel.heat(self.distance(coord))
467 | if heat:
468 | matrix.add(coord, self.weight * heat)
469 |
470 | def map(self, func):
471 | return LineSegment(func(self.start), func(self.end))
472 |
473 |
474 | class LinearKernel:
475 | '''Uses a linear falloff, essentially turning a point into a cone.'''
476 | def __init__(self, radius):
477 | self.radius = radius # in pixels
478 | self.radius_float = float(radius) # worthwhile time saver
479 |
480 | def heat(self, distance):
481 | if distance >= self.radius:
482 | return 0.0
483 | return 1.0 - (distance / self.radius_float)
484 |
485 |
486 | class GaussianKernel:
487 | def __init__(self, radius):
488 | '''radius is the distance beyond which you should not bother.'''
489 | self.radius = radius
490 | # We set the scale such that the heat value drops to 1/256 of
491 | # the peak at a distance of radius.
492 | self.scale = math.log(256) / radius
493 |
494 | def heat(self, distance):
495 | '''Returns 1.0 at the center, dropping to 1/256 of the peak at radius pixels from the center.'''
496 | return math.e ** (-distance * self.scale)
497 |
498 |
499 | class ColorMap:
500 | DEFAULT_HSVA_MIN_STR = '000ffff00'
501 | DEFAULT_HSVA_MAX_STR = '02affffff'
502 |
503 | @staticmethod
504 | def _str_to_float(string, base=16, maxval=256):
505 | return float(int(string, base)) / maxval
506 |
507 | @staticmethod
508 | def str_to_hsva(string):
509 | '''
510 | Returns a 4-tuple of ints from a hex string color specification,
511 | such that AAABBCCDD becomes AAA, BB, CC, DD. For example,
512 | str2hsva('06688bbff') returns (102, 136, 187, 255). Note that
513 | the first number is 3 digits.
514 | '''
515 | if string.startswith('#'):
516 | string = string[1:] # Leading "#" was once required, is now optional.
517 | return tuple(ColorMap._str_to_float(s) for s in (string[0:3],
518 | string[3:5],
519 | string[5:7],
520 | string[7:9]))
521 |
522 | def __init__(self, hsva_min=None, hsva_max=None, image=None, steps=256):
523 | '''
524 | Create a color map based on a progression in the specified
525 | range, or using pixels in a provided image.
526 |
527 | If supplied, hsva_min and hsva_max must each be a 4-tuple of
528 | (hue, saturation, value, alpha), where each is a float from
529 | 0.0 to 1.0. The gradient will be a linear progression from
530 | hsva_min to hsva_max, including both ends of the range.
531 |
532 | The optional steps argument specifies how many discrete steps
533 | there should be in the color gradient when using hsva_min
534 | and hsva_max.
535 | '''
536 | # TODO: do the interpolation in Lab space instead of HSV
537 | self.values = []
538 | if image:
539 | assert image.mode == 'RGBA', (
540 | 'Gradient image must be RGBA. Yours is %s.' % image.mode)
541 | num_rows = image.size[1]
542 | self.values = [image.getpixel((0, row)) for row in range(num_rows)]
543 | self.values.reverse()
544 | else:
545 | if not hsva_min:
546 | hsva_min = ColorMap.str_to_hsva(self.DEFAULT_HSVA_MIN_STR)
547 | if not hsva_max:
548 | hsva_max = ColorMap.str_to_hsva(self.DEFAULT_HSVA_MAX_STR)
549 | # Turn (h1,s1,v1,a1), (h2,s2,v2,a2) into (h2-h1,s2-s1,v2-v1,a2-a1)
550 | hsva_range = list(map(lambda min, max: max - min, hsva_min, hsva_max))
551 | for value in range(0, steps):
552 | hsva = list(map(
553 | lambda range, min: value / float(steps - 1) * range + min,
554 | hsva_range, hsva_min))
555 | hsva[0] = hsva[0] % 1 # in case hue is out of range
556 | rgba = tuple(
557 | [int(x * 255) for x in hsv_to_rgb(*hsva[0:3]) + (hsva[3],)])
558 | self.values.append(rgba)
559 |
560 | def get(self, floatval):
561 | return self.values[int(floatval * (len(self.values) - 1))]
562 |
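A minimal re-implementation of the hex parsing above (a sketch for illustration, not the class itself): every field, including the 3-digit hue, is divided by 256, so the result is floats rather than ints:

```python
def str_to_hsva(s):
    # 'AAABBCCDD' -> (hue, sat, val, alpha); hue is the 3-digit field,
    # and every field is divided by 256.0.
    if s.startswith('#'):
        s = s[1:]
    return tuple(int(part, 16) / 256.0
                 for part in (s[0:3], s[3:5], s[5:7], s[7:9]))

print(str_to_hsva('06688bbff'))
# (0.3984375, 0.53125, 0.73046875, 0.99609375)
```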
563 |
564 | class ImageMaker():
565 | def __init__(self, config):
566 | '''Each argument to the constructor should be a 4-tuple of (hue,
567 | saturaton, value, alpha), one to use for minimum data values and
568 | one for maximum. Each should be in [0,1], however because hue is
569 | circular, you may specify hue in any range and it will be shifted
570 | into [0,1] as needed. This is so you can wrap around the color
571 | wheel in either direction.'''
572 | self.config = config
573 | if config.background and not config.background_image:
574 | self.background = ImageColor.getrgb(config.background)
575 | else:
576 | self.background = None
577 |
578 | @staticmethod
579 | def _blend_pixels(a, b):
580 | # a is RGBA, b is RGB; we could write this more generically,
581 | # but why complicate things?
582 | alpha = a[3] / 255.0
583 | return tuple(
584 | map(lambda aa, bb: int(aa * alpha + bb * (1 - alpha)), a[:3], b))
585 |
586 | def make_image(self, matrix):
587 | extent = self.config.extent_out
588 | if not extent:
589 | extent = matrix.extent()
590 | extent.resize((self.config.width or 1) - 1,
591 | (self.config.height or 1) - 1)
592 | size = extent.size()
593 | size.x = int(size.x) + 1
594 | size.y = int(size.y) + 1
595 | logging.info('saving image (%d x %d)' % (size.x, size.y))
596 | if self.background:
597 | img = Image.new('RGB', (size.x, size.y), self.background)
598 | else:
599 | img = Image.new('RGBA', (size.x, size.y))
600 |
601 | maxval = max(matrix.values())
602 | pixels = img.load()
603 | for (coord, val) in matrix.items():
604 | x = int(coord.x - extent.min.x)
605 | y = int(coord.y - extent.min.y)
606 | if extent.is_inside(coord):
607 | color = self.config.colormap.get(val / maxval)
608 | if self.background:
609 | pixels[x, y] = ImageMaker._blend_pixels(color,
610 | self.background)
611 | else:
612 | pixels[x, y] = color
613 | if self.config.background_image:
614 | img = Image.composite(img, self.config.background_image,
615 | img.split()[3])
616 | return img
617 |
618 |
619 | class ImageSeriesMaker():
620 | '''Creates a movie showing the data appearing on the heatmap.'''
621 | def __init__(self, config):
622 | self.config = config
623 | self.image_maker = ImageMaker(config)
624 | self.tmpdir = tempfile.mkdtemp()
625 | self.imgfile_template = os.path.join(self.tmpdir, 'frame-%05d.png')
626 |
627 |
628 | def _save_image(self, matrix):
629 | self.frame_count += 1
630 | logging.info('Frame %d' % (self.frame_count))
631 | matrix = matrix.finalized()
632 | image = self.image_maker.make_image(matrix)
633 | image.save(self.imgfile_template % self.frame_count)
634 |
635 | def maybe_save_image(self, matrix):
636 | self.inputs_since_output += 1
637 | if self.inputs_since_output >= self.config.frequency:
638 | self._save_image(matrix)
639 | self.inputs_since_output = 0
640 |
641 | @staticmethod
642 | def create_movie(infiles, outfile, ffmpegopts):
643 | command = ['ffmpeg', '-i', infiles]
644 | if ffmpegopts:
645 | # I hope they don't have spaces in their arguments
646 | command.extend(ffmpegopts.split())
647 | command.append(outfile)
648 | logging.info('Encoding video: %s' % ' '.join(command))
649 | subprocess.call(command)
650 |
651 |
652 | def run(self):
653 | logging.info('Putting animation frames in %s' % self.tmpdir)
654 | self.inputs_since_output = 0
655 | self.frame_count = 0
656 | matrix = process_shapes(self.config, self.maybe_save_image)
657 | if ( not self.frame_count
658 | or self.inputs_since_output >= self.config.straggler_threshold ):
659 | self._save_image(matrix)
660 | self.create_movie(self.imgfile_template,
661 | self.config.output,
662 | self.config.ffmpegopts)
663 | if self.config.keepframes:
664 | logging.info('The animation frames are in %s' % self.tmpdir)
665 | else:
666 | shutil.rmtree(self.tmpdir)
667 | return matrix
668 |
669 |
670 | def _get_osm_image(bbox, zoom, osm_base):
671 | # Just a wrapper for osm.createOSMImage to translate coordinate schemes
672 | try:
673 | from osmviz.manager import PILImageManager, OSMManager
674 | osm = OSMManager(
675 | image_manager=PILImageManager('RGB'),
676 | server=osm_base)
677 | (c1, c2) = bbox.corners()
678 | image, bounds = osm.createOSMImage((c1.lat, c2.lat, c1.lon, c2.lon), zoom)
679 | (lat1, lat2, lon1, lon2) = bounds
680 | return image, Extent(coords=(LatLon(lat1, lon1),
681 | LatLon(lat2, lon2)))
682 | except ImportError as e:
683 | logging.error(
684 | "ImportError: %s.\n"
685 | "The --osm option depends on the osmviz module, available from\n"
686 | "http://cbick.github.com/osmviz/\n\n" % str(e))
687 | sys.exit(1)
688 |
689 |
690 | def _scale_for_osm_zoom(zoom):
691 | return 256 * pow(2, zoom) / 360.0
692 |
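The relation above follows from OSM slippy-map tiling: at zoom level z the whole world is 256 * 2**z pixels wide and spans 360 degrees of longitude. A quick sketch of the numbers (standalone helper mirroring `_scale_for_osm_zoom`):

```python
def scale_for_osm_zoom(zoom):
    # Pixels per degree of longitude at a given OSM zoom level.
    return 256 * 2 ** zoom / 360.0

print(scale_for_osm_zoom(0))   # ~0.71 px/degree: the world fits one 256px tile
print(scale_for_osm_zoom(10))  # ~728.2 px/degree
```

Each zoom level doubles the scale, which is why `choose_osm_zoom` can recover the zoom from a size ratio with `math.log(size_ratio, 2)`.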
693 |
694 | def choose_osm_zoom(config, padding):
695 | # Since we know we're only going to do this with Mercator, we could do
696 | # a bit more math and solve this directly, but as a first pass method,
697 | # we instead project the bounding box into pixel-land at a high zoom
698 | # level, then see the power of two we're off by.
699 | if config.zoom:
700 | return config.zoom
701 | if not (config.width or config.height):
702 | raise ValueError('For OSM, you must specify height, width, or zoom')
703 | crazy_zoom_level = 30
704 | proj = MercatorProjection()
705 | scale = _scale_for_osm_zoom(crazy_zoom_level)
706 | proj.pixels_per_degree = scale
707 | bbox_crazy_xy = config.extent_in.map(proj.project)
708 | if config.width:
709 | size_ratio = width_ratio = (
710 | float(bbox_crazy_xy.size().x) / (config.width - 2 * padding))
711 | if config.height:
712 | size_ratio = (
713 | float(bbox_crazy_xy.size().y) / (config.height - 2 * padding))
714 | if config.width:
715 | size_ratio = max(size_ratio, width_ratio)
716 | # TODO: We use --height and --width as upper bounds, choosing a zoom
717 | # level that lets our image be no larger than the specified size.
718 | # It might be desirable to use them as lower bounds or to get as close
719 | # as possible, whether larger or smaller (where "close" probably means
720 | # in pixels, not scale factors).
721 | # TODO: This is off by a little bit at small scales.
722 | zoom = int(crazy_zoom_level - math.log(size_ratio, 2))
723 | logging.info('Choosing OSM zoom level %d' % zoom)
724 | return zoom
725 |
726 |
727 | def get_osm_background(config, padding):
728 | zoom = choose_osm_zoom(config, padding)
729 | proj = MercatorProjection()
730 | proj.pixels_per_degree = _scale_for_osm_zoom(zoom)
731 | bbox_xy = config.extent_in.map(proj.project)
732 | # We're not checking that the padding fits within the specified size.
733 | bbox_xy.grow(padding)
734 | bbox_ll = bbox_xy.map(proj.inverse_project)
735 | image, img_bbox_ll = _get_osm_image(bbox_ll, zoom, config.osm_base)
736 | img_bbox_xy = img_bbox_ll.map(proj.project)
737 |
738 | # TODO: this crops to our data extent, which means we're not making
739 | # an image of the requested dimensions. Perhaps we should let the
740 | # user specify whether to treat the requested size as min,max,exact.
741 | offset = bbox_xy.min - img_bbox_xy.min
742 | image = image.crop((
743 | int(offset.x),
744 | int(offset.y),
745 | int(offset.x + bbox_xy.size().x + 1),
746 | int(offset.y + bbox_xy.size().y + 1)))
747 | config.background_image = image
748 | config.extent_in = bbox_ll
749 | config.projection = proj
750 | (config.width, config.height) = image.size
751 |
752 |     # If the layer option is on, draw only the points on a transparent background.
753 |     if config.layer:
754 |         from PIL import ImageDraw
755 |         transparent_area = (0, 0, image.size[0], image.size[1])
756 |         mask = Image.new('L', image.size, color=255)
757 |         draw = ImageDraw.Draw(mask)
758 |         draw.rectangle(transparent_area, fill=0)
759 |         image.putalpha(mask)
760 |
761 | return image, bbox_ll, proj
762 |
763 | def process_shapes(config, hook=None):
764 | matrix = Matrix.matrix_factory(config.decay)
765 | logging.info('processing data')
766 | for shape in config.shapes:
767 | shape = shape.map(config.projection.project)
768 | # TODO: skip shapes outside map extent
769 | shape.add_heat_to_matrix(matrix, config.kernel)
770 | if hook:
771 | hook(matrix)
772 | return matrix
773 |
774 | def shapes_from_gpx(filename):
775 | track = TrackLog(filename)
776 | for trkseg in track.segments():
777 | for i, p1 in enumerate(trkseg[:-1]):
778 | p2 = trkseg[i + 1]
779 | yield LineSegment(p1.coords, p2.coords)
780 |
781 | def shapes_from_file(filename):
782 | logging.info('reading points from %s' % filename)
783 | count = 0
784 | with open(filename, 'rU') as f:
785 | for line in f:
786 | line = line.strip()
787 | if len(line) > 0: # ignore blank lines
788 | values = [float(x) for x in line.split()]
789 | assert len(values) == 2 or len(values) == 3, (
790 | 'input lines must have two or three values: %s' % line)
791 | (lat, lon) = values[0:2]
792 | weight = 1.0 if len(values) == 2 else values[2]
793 | count += 1
794 | yield Point(LatLon(lat, lon), weight)
795 | logging.info('read %d points' % count)
796 |
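The `--points` input format handled above is one `lat lon` pair per line with an optional third weight term. A standalone parsing sketch (`parse_point_line` is a hypothetical helper, values are sample Murcia-area coordinates):

```python
def parse_point_line(line):
    # 'lat lon' or 'lat lon weight'; weight defaults to 1.0.
    values = [float(x) for x in line.split()]
    assert len(values) in (2, 3), (
        'input lines must have two or three values: %s' % line)
    lat, lon = values[0], values[1]
    weight = values[2] if len(values) == 3 else 1.0
    return lat, lon, weight

print(parse_point_line('37.98 -1.13'))      # (37.98, -1.13, 1.0)
print(parse_point_line('37.98 -1.13 2.5'))  # (37.98, -1.13, 2.5)
```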
797 | def shapes_from_csv(filename, ignore_csv_header):
798 | import csv
799 | logging.info('reading csv')
800 | count = 0
801 | with open(filename, 'rU') as f:
802 | reader = csv.reader(f)
803 | if ignore_csv_header:
804 | next(reader) # Skip header line
805 | for row in reader:
806 | (lat, lon) = (float(row[0]), float(row[1]))
807 | count += 1
808 | yield Point(LatLon(lat, lon))
809 | logging.info('read %d points' % count)
810 |
811 | def shapes_from_shp(filename):
812 | try:
813 | import ogr
814 | import osr
815 | except ImportError:
816 | try:
817 | from osgeo import ogr
818 | from osgeo import osr
819 | except ImportError:
820 | raise ImportError('You need to have python-gdal bindings installed')
821 |
822 | driver = ogr.GetDriverByName("ESRI Shapefile")
823 | dataSource = driver.Open(filename, 0)
824 | if dataSource is None:
825 | raise Exception("Not a valid shape file")
826 |
827 | layer = dataSource.GetLayer()
828 | if layer.GetGeomType() != 1:
829 | raise Exception("Only point layers are supported")
830 |
831 | spatial_reference = layer.GetSpatialRef()
832 | if spatial_reference is None:
833 | raise Exception("The shapefile doesn't have spatial reference")
834 |
835 | spatial_reference.AutoIdentifyEPSG()
836 | auth_code = spatial_reference.GetAuthorityCode(None)
837 |     if not auth_code:  # GetAuthorityCode returns None when unrecognized
838 | raise Exception("The input shapefile projection could not be recognized")
839 |
840 | if auth_code != '4326':
841 | # TODO: implement reproject layer (maybe geometry by geometry is easier)
842 | raise Exception("Currently only Lng-Lat WGS84 is supported (EPSG 4326)")
843 |
844 | count = 0
845 | for feature in layer:
846 | geom = feature.GetGeometryRef()
847 | lat = geom.GetY()
848 | lon = geom.GetX()
849 | count += 1
850 |         yield Point(LatLon(lat, lon))
851 |
852 | logging.info('read %d points' % count)
853 |
854 | class Configuration(object):
855 | '''
856 | This object holds the settings for creating a heatmap as well as
857 | an iterator for the input data.
858 |
859 | Most of the command line processing is about settings and data, so
860 | the command line options are also processed with this object.
861 | This happens in two phases.
862 |
863 | First the settings are parsed and turned into more useful objects
864 | in set_from_options(). Command line flags go in, and the
865 | Configuration object is populated with the specified values and
866 | defaults.
867 |
868 | In the second phase, various other parameters are computed. These
869 | are things we set automatically based on the other settings or on
870 |     the data. You can skip this phase if you set everything manually.
871 |
872 | The idea is that someone could import this module, populate a
873 | Configuration instance manually, and run the process themselves.
874 | Where possible, this object contains instances, rather than option
875 | strings (e.g. for projection, kernel, colormap, etc).
876 |
877 | Every parameter is explained in the glossary dictionary, and only
878 | documented parameters are allowed. Parameters default to None.
879 | '''
880 |
881 | glossary = {
882 | # Many of these are exactly the same as the command line option.
883 | # In those cases, the documentation is left blank.
884 | # Many have default values based on the command line defaults.
885 | 'output' : '',
886 | 'width' : '',
887 | 'height' : '',
888 | 'margin' : '',
889 | 'shapes' : 'unprojected iterable of shapes (Points and LineSegments)',
890 | 'projection' : 'Projection instance',
891 | 'colormap' : 'ColorMap instance',
892 | 'decay' : '',
893 | 'kernel' : 'kernel instance',
894 | 'extent_in' : 'extent in original space',
895 | 'extent_out' : 'extent in projected space',
896 |
897 | 'background': '',
898 | 'background_image': '',
899 | 'background_brightness' : '',
900 |
901 | # OpenStreetMap background tiles
902 | 'osm' : 'True/False; see command line options',
903 | 'osm_base' : '',
904 | 'zoom' : '',
905 |
906 |         # layer only outputs the points
907 |         'layer' : '',
908 |
909 | # These are for making an animation, ignored otherwise.
910 | 'ffmpegopts' : '',
911 | 'keepframes' : '',
912 | 'frequency' : '',
913 | 'straggler_threshold' : '',
914 |
915 | # We always instantiate an OptionParser in order to set up
916 | # default values. You can use this OptionParser in your own
917 | # script, perhaps adding your own options.
918 | 'optparser' : 'OptionParser instance for command line processing',
919 | }
920 |
921 | _kernels = { 'linear': LinearKernel,
922 | 'gaussian': GaussianKernel, }
923 | _projections = { 'equirectangular': EquirectangularProjection,
924 | 'mercator': MercatorProjection, }
925 |
926 | def __init__(self, use_defaults=True):
927 | for k in self.glossary.keys():
928 | setattr(self, k, None)
929 | self.optparser = self._make_optparser()
930 | if use_defaults:
931 | self.set_defaults()
932 |
933 | def set_defaults(self):
934 | (options, args) = self.optparser.parse_args([])
935 | self.set_from_options(options)
936 |
937 | def _make_optparser(self):
938 |         '''Return an OptionParser set up for our command line options.'''
939 | # TODO: convert to argparse
940 | from optparse import OptionParser
941 | optparser = OptionParser(version=__version__)
942 | optparser.add_option('-g', '--gpx', metavar='FILE')
943 | optparser.add_option(
944 | '-p', '--points', metavar='FILE',
945 | help=(
946 | 'File containing one space-separated coordinate pair per line, '
947 | 'with optional point value as third term.'))
948 | optparser.add_option(
949 | '', '--csv', metavar='FILE',
950 | help=(
951 | 'File containing one comma-separated coordinate pair per line, '
952 | 'the rest of the line is ignored.'))
953 | optparser.add_option(
954 | '', '--ignore_csv_header', action='store_true',
955 | help='Ignore first line of CSV input file.')
956 | optparser.add_option(
957 | '', '--shp_file', metavar='FILE',
958 | help=('ESRI Shapefile containing the points.'))
959 | optparser.add_option(
960 | '-s', '--scale', metavar='FLOAT', type='float',
961 |             help='meters per pixel, approximate')
962 |         optparser.add_option(
963 |             '-W', '--width', metavar='INT', type='int',
964 |             help='width of output image')
965 |         optparser.add_option(
966 |             '-H', '--height', metavar='INT', type='int',
967 |             help='height of output image')
968 | optparser.add_option(
969 | '-P', '--projection', metavar='NAME', type='choice',
970 | choices=list(self._projections.keys()), default='mercator',
971 | help='choices: ' + ', '.join(self._projections.keys()) +
972 | '; default: %default')
973 | optparser.add_option(
974 | '-e', '--extent', metavar='RANGE',
975 | help=(
976 | 'Clip results to RANGE, which is specified as lat1,lon1,lat2,lon2;'
977 | ' (for square mercator: -85.0511,-180,85.0511,180)'))
978 | optparser.add_option(
979 | '-R', '--margin', metavar='INT', type='int', default=0,
980 | help=(
981 | 'Try to keep data at least this many pixels away from image '
982 | 'border.'))
983 | optparser.add_option(
984 | '-r', '--radius', metavar='INT', type='int', default=5,
985 | help='pixel radius of point blobs; default: %default')
986 | optparser.add_option(
987 | '-d', '--decay', metavar='FLOAT', type='float', default=0.95,
988 | help=(
989 | 'float in [0,1]; Larger values give more weight to data '
990 | 'magnitude. Smaller values are more democratic. default:'
991 | '%default'))
992 | optparser.add_option(
993 | '-S', '--save', metavar='FILE', help='save processed data to FILE')
994 | optparser.add_option(
995 | '-L', '--load', metavar='FILE', help='load processed data from FILE')
996 | optparser.add_option(
997 | '-o', '--output', metavar='FILE',
998 | help='name of output file (image or video)')
999 | optparser.add_option(
1000 | '-a', '--animate', action='store_true',
1001 | help='Make an animation instead of a static image')
1002 | optparser.add_option(
1003 | '', '--frequency', type='int', default=1,
1004 | help='input points per animation frame; default: %default')
1005 | optparser.add_option(
1006 | '', '--straggler_threshold', type='int', default=1,
1007 | help='add one more animation frame if >= this many inputs remain')
1008 | optparser.add_option(
1009 | '-F', '--ffmpegopts', metavar='STR',
1010 | help='extra options to pass to ffmpeg when making an animation')
1011 | optparser.add_option(
1012 | '-K', '--keepframes', action='store_true',
1013 | help='keep intermediate images after creating an animation')
1014 | optparser.add_option(
1015 | '-b', '--background', metavar='COLOR',
1016 | help='composite onto this background (color name or #rrggbb)')
1017 | optparser.add_option(
1018 | '-I', '--background_image', metavar='FILE',
1019 | help='composite onto this image')
1020 | optparser.add_option(
1021 | '-B', '--background_brightness', type='float', metavar='NUM',
1022 | help='Multiply each pixel in background image by this.')
1023 | optparser.add_option(
1024 | '-m', '--hsva_min', metavar='HEX',
1025 | default=ColorMap.DEFAULT_HSVA_MIN_STR,
1026 | help='hhhssvvaa hex for minimum data values; default: %default')
1027 | optparser.add_option(
1028 | '-M', '--hsva_max', metavar='HEX',
1029 | default=ColorMap.DEFAULT_HSVA_MAX_STR,
1030 | help='hhhssvvaa hex for maximum data values; default: %default')
1031 | optparser.add_option(
1032 | '-G', '--gradient', metavar='FILE',
1033 | help=(
1034 |                 'Take the color gradient from the first column of pixels in '
1035 | 'this image. Overrides -m and -M.'))
1036 | optparser.add_option(
1037 | '-k', '--kernel',
1038 | type='choice',
1039 | default='linear',
1040 | choices=list(self._kernels.keys()),
1041 | help=('Kernel to use for the falling-off function; choices: ' +
1042 | ', '.join(self._kernels.keys()) + '; default: %default'))
1043 | optparser.add_option(
1044 | '', '--osm', action='store_true',
1045 | help='Composite onto OpenStreetMap tiles')
1046 | optparser.add_option(
1047 | '', '--osm_base', metavar='URL',
1048 | default='http://tile.openstreetmap.org',
1049 | help='Base URL for map tiles; default %default')
1050 | optparser.add_option(
1051 | '-z', '--zoom', type='int',
1052 | help='Zoom level for OSM; 0 (the default) means autozoom')
1053 | optparser.add_option('-v', '--verbose', action='store_true')
1054 | optparser.add_option('', '--debug', action='store_true')
1055 | optparser.add_option('', '--layer', action='store_true', help='only plot points')
1056 |
1057 | return optparser
1058 |
1059 | def set_from_options(self, options):
1060 | for k in self.glossary.keys():
1061 | try:
1062 | setattr(self, k, getattr(options, k))
1063 | except AttributeError:
1064 | pass
1065 |
1066 | self.kernel = self._kernels[options.kernel](options.radius)
1067 | self.projection = self._projections[options.projection]()
1068 |
1069 | if options.scale:
1070 | self.projection.meters_per_pixel = options.scale
1071 |
1072 | if options.gradient:
1073 | self.colormap = ColorMap(image = Image.open(options.gradient))
1074 | else:
1075 | self.colormap = ColorMap(hsva_min = ColorMap.str_to_hsva(options.hsva_min),
1076 | hsva_max = ColorMap.str_to_hsva(options.hsva_max))
1077 | if options.gpx:
1078 | logging.debug('Reading from gpx: %s' % options.gpx)
1079 | self.shapes = shapes_from_gpx(options.gpx)
1080 | elif options.points:
1081 | logging.debug('Reading from points: %s' % options.points)
1082 | self.shapes = shapes_from_file(options.points)
1083 | elif options.csv:
1084 | logging.debug('Reading from csv: %s' % options.csv)
1085 | self.shapes = shapes_from_csv(options.csv, options.ignore_csv_header)
1086 | elif options.shp_file:
1087 | logging.debug('Reading from Shape File: %s' % options.shp_file)
1088 | self.shapes = shapes_from_shp(options.shp_file)
1089 |
1090 | if options.extent:
1091 | (lat1, lon1, lat2, lon2) = \
1092 | [float(f) for f in options.extent.split(',')]
1093 | self.extent_in = Extent(coords=(LatLon(lat1, lon1),
1094 | LatLon(lat2, lon2)))
1095 | if options.background_image:
1096 | self.background_image = Image.open(options.background_image)
1097 | (self.width, self.height) = self.background_image.size
1098 |
1099 |
1100 | def fill_missing(self):
1101 | if not self.shapes:
1102 | raise ValueError('no input specified')
1103 |
1104 | padding = self.margin + self.kernel.radius
1105 | if not self.extent_in:
1106 | logging.debug('reading input data')
1107 | self.shapes = list(self.shapes)
1108 | logging.debug('read %d shapes' % len(self.shapes))
1109 | self.extent_in = Extent(shapes=self.shapes)
1110 |
1111 | if self.osm:
1112 | get_osm_background(self, padding)
1113 | else:
1114 | if not self.projection.is_scaled():
1115 | self.projection.auto_set_scale(self.extent_in, padding,
1116 | self.width, self.height)
1117 | if not (self.width or self.height or self.background_image):
1118 | raise ValueError('You must specify width or height or scale '
1119 | 'or background_image or both osm and zoom.')
1120 |
1121 | if self.background_brightness is not None:
1122 | if self.background_image:
1123 | self.background_image = self.background_image.point(
1124 | lambda x: x * self.background_brightness)
1125 | self.background_brightness = None # idempotence
1126 | else:
1127 | logging.warning(
1128 | 'background brightness specified, but no background image')
1129 |
1130 | if not self.extent_out:
1131 | self.extent_out = self.extent_in.map(self.projection.project)
1132 | self.extent_out.grow(padding)
1133 | logging.info('input extent: %s' % str(self.extent_out.map(
1134 | self.projection.inverse_project)))
1135 | logging.info('output extent: %s' % str(self.extent_out))
1136 |
1137 |
1138 | def main():
1139 | logging.basicConfig(format='%(relativeCreated)8d ms // %(message)s')
1140 | config = Configuration(use_defaults=False)
1141 | (options, args) = config.optparser.parse_args()
1142 |
1143 | if options.verbose:
1144 | logging.getLogger().setLevel(logging.INFO)
1145 | if options.debug:
1146 | logging.getLogger().setLevel(logging.DEBUG)
1147 |
1148 |     logging.debug('hsva_min: {}\n hsva_max: {}'.format(options.hsva_min, options.hsva_max))
1149 |
1150 | if options.load:
1151 | logging.info('loading data')
1152 | matrix = pickle.load(open(options.load, 'rb'))
1153 | config = matrix['config']
1154 | del matrix['config']
1155 | config.set_from_options(options)
1156 | config.fill_missing()
1157 | else:
1158 | config.set_from_options(options)
1159 | config.fill_missing()
1160 | if options.animate:
1161 | animator = ImageSeriesMaker(config)
1162 | matrix = animator.run()
1163 | else:
1164 | matrix = process_shapes(config)
1165 | matrix = matrix.finalized()
1166 |
1167 | if options.output and not options.animate:
1168 | image = ImageMaker(config).make_image(matrix)
1169 | image.save(options.output)
1170 |
1171 | if options.save:
1172 | logging.info('saving data')
1173 | matrix['config'] = config
1174 | pickle.dump(matrix, open(options.save, 'wb'), 2)
1175 |
1176 | logging.info('end')
1177 |
1178 | if __name__ == '__main__':
1179 | main()
1180 |
--------------------------------------------------------------------------------
/murcia_tweets_polarity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/manugarri/tweets_map/36bc3f88497406327f1609c0ef488c00ee8b9223/murcia_tweets_polarity.png
--------------------------------------------------------------------------------
/murcia_tweets_polarity_layered_combined_binary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/manugarri/tweets_map/36bc3f88497406327f1609c0ef488c00ee8b9223/murcia_tweets_polarity_layered_combined_binary.png
--------------------------------------------------------------------------------
/murcia_tweets_polarity_layered_combined_binary2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/manugarri/tweets_map/36bc3f88497406327f1609c0ef488c00ee8b9223/murcia_tweets_polarity_layered_combined_binary2.png
--------------------------------------------------------------------------------