├── README.md
├── calculate_workout_variables.py
├── censor_and_package.py
├── convert_fit_to_csv.py
├── import_and_process_garmin_fit.py
└── process_all.py

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Converting Garmin FIT to CSV (and GPX to CSV)

[article for original version](https://maxcandocia.com/article/2017/Sep/22/converting-garmin-fit-to-csv/)

Requirements:

* Python 3.5+
* BeautifulSoup (with the `lxml` parser, which the scripts use)
* fitparse (installation instructions below)
* tzwhere (to localize timezones)
* pytz (to localize timezones)
* numpy (to calculate workout variables)
* FIT files to convert (if using that feature)
* GPX files to convert (if using that feature)

First, install `fitparse`:

    sudo pip3 install -e git+https://github.com/dtcooper/python-fitparse#egg=python-fitparse

OR

    sudo pip3 install fitparse

Then you can execute `process_all.py`:

    python3 process_all.py --subject-name=mysubjectname --fit-source-dir=/media/myname/GARMIN/Garmin/ACTIVITY/

This will create a batch of CSVs for all of the workouts in that directory. The files are stored in a subdirectory of the `subject_data` directory (check the argument defaults for the specific folders), which is generated based on the subject name. Up to 3 files are made per FIT file:

1. A CSV of all of the track data
2. A CSV of the lap data
3. A CSV of the start (and stop) data

Each CSV is named in the format `{activity_type}_YYYY-MM-DD_HH-MM-SS[_{laps,starts}].csv`, e.g., `running_2017-09-22_06-30-00_laps.csv`.

You can also provide a CSV to censor certain geographic regions by latitude, longitude, and radius. Simply create a CSV with `longitude`, `latitude`, and `radius` column headers, and add as many circular regions as you want. Note that the radius is assumed to be in meters (an example censor file is shown at the end of this README):

    python3 process_all.py --subject-name=mysubjectname --fit-source-dir=/media/myname/GARMIN/Garmin/ACTIVITY/ --censorfile=/home/mydir/censor.csv

The censored copies will be stored in a folder called `censored` inside the subject's directory. You can use the `--censor-string=` option to change what censored fields are replaced with (the default is `[CENSORED]`).

You can also archive the data after it has been processed:

    python3 process_all.py --subject-name=mysubjectname --fit-source-dir=/media/myname/GARMIN/Garmin/ACTIVITY/ --censorfile=/home/mydir/censor.csv --archive-results

By default, this stores data in a directory called `archives` in the main `subject_data` folder. You can add the `--archive-censored-only` flag, which will only archive the censored folder.

## GPX data

You can also process GPX data (and censor it the same way as FIT data).

For the initial processing, you can do

    python3 process_all.py --subject-name=mysubjectname --skip-fit-conversion --gpx-source-dir=/home/mydir/gpx_files

By default, the program will always try to copy/process FIT files unless you add the `--skip-fit-conversion` flag, but you can always tweak the code to your needs.

## Additional Help

You can use `python3 process_all.py --help` to see more information.
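## Example censor file

A minimal `censor.csv` might look like the following (these coordinates and radii are made up for illustration). Any track point within `radius` meters of a listed point has its location-related fields replaced:

    longitude,latitude,radius
    -87.6298,41.8781,500
    -88.0100,42.0500,1200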
--------------------------------------------------------------------------------
/calculate_workout_variables.py:
--------------------------------------------------------------------------------
from bs4 import BeautifulSoup
import numpy as np
PI = np.pi
import csv
import os
from collections import OrderedDict
import re

OUTPUT_FILE = 'gpx_processed_info.csv'

MAX_SPEED = 50  # mph

# radius of earth in miles
C_R = 6371/1.60934

def distcalc(c1, c2):
    # haversine distance between two lat/lon points, in miles
    lat1 = float(c1['lat'])*PI/180.
    lon1 = float(c1['lon'])*PI/180.

    lat2 = float(c2['lat'])*PI/180.
    lon2 = float(c2['lon'])*PI/180.

    dlat = lat2-lat1
    dlon = lon2-lon1

    a = np.sin(dlat/2.)**2 + np.cos(lat1)*np.cos(lat2)*np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    d = C_R * c
    return d

def calculate_distances(points):
    dists = np.asarray([distcalc(c2.attrs, c1.attrs) for c1, c2 in zip(points[1:], points[:-1])])
    return dists

def calculate_velocities(distances):
    # convert mi/s to mph (track points are assumed to be at 1-second intervals)
    velocities = distances * 3600
    return velocities

def calculate_accelerations(velocities):
    return np.diff(velocities)

MIPS_TO_MPH = 3600.

FPS_TO_MPH = 3600./5280

G_FPS = 32.

G_MPHPS = 32 * FPS_TO_MPH
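
# Hypothetical sanity check, added for illustration (not part of the original
# pipeline): one degree of latitude is about 69.1 miles, so distcalc() should
# return exactly C_R * PI / 180 for two points one degree of latitude apart.
def _check_distcalc():
    d = distcalc({'lat': '0', 'lon': '0'}, {'lat': '1', 'lon': '0'})
    assert abs(d - C_R * PI / 180.) < 1e-9
    return d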
def process_file(filename, target_dir):
    # main() chdirs into the GPX source directory, so `filename` is read from
    # the current directory; the converted CSV is written into `target_dir`,
    # which process_all.py later searches when censoring
    new_filename = os.path.join(target_dir, re.sub(r'([^.]+)\.gpx', r'\1.csv', filename))
    if os.path.exists(new_filename):
        #print('%s already exists. skipping.' % new_filename)
        return None
    print('processing %s' % filename)
    with open(filename, 'r') as f:
        soup = BeautifulSoup(f.read(), 'lxml')
    track = soup.find('trk')
    segments = track.find('trkseg')
    points = segments.find_all('trkpt')
    times = [p.find('time').text for p in points]
    elevations = np.asarray([float(p.find('ele').text) for p in points])
    # lon/lat-based
    distances = calculate_distances(points)
    velocities = calculate_velocities(distances)
    # if velocity > MAX_SPEED, it indicates a discontinuity; zero those out
    velocities = velocities * (velocities < MAX_SPEED)
    accelerations = calculate_accelerations(velocities)
    # elevation
    elevation_changes = np.diff(elevations)
    sum_v = np.sum(velocities)
    sum_v2 = np.sum(velocities**2)
    sum_v3 = np.sum(velocities**3)
    abs_elevation = np.sum(np.abs(elevation_changes))/2
    sum_a = np.sum(accelerations * (accelerations > 0))
    # alternative type of acceleration measurement
    # (note: velocities are already in mph, so this extra factor only rescales
    # the feature by a constant, which a regression coefficient would absorb)
    velocities_mph = 3600 * velocities
    energy_increases = velocities_mph[1:]**2 - velocities_mph[:-1]**2
    energy_increases = energy_increases - FPS_TO_MPH**2 * G_FPS * elevation_changes[1:] * (elevation_changes[1:] < 0)
    energy_increases = np.sum(energy_increases * (energy_increases > 0))
    with open(new_filename, 'w') as f:
        f.write('time,distance,elevation_change')
        for t, d, e in zip(times[1:], distances, elevation_changes):
            f.write('\n')
            f.write(','.join([str(t), str(d), str(e)]))
    return {
        'sum_v': sum_v,
        'sum_v2': sum_v2,
        'abs_elevation': abs_elevation,
        'sum_a': sum_a,
        'sum_v3': sum_v3,
        'sum_e': energy_increases
    }

def main(gpx_source_dir, gpx_target_dir, gpx_summary_filename):
    original_dir = os.getcwd()
    os.makedirs(gpx_target_dir, exist_ok=True)

    os.chdir(gpx_source_dir)
    file_list = [x for x in os.listdir('.') if x[-4:].lower() == '.gpx']
    file_list.sort()
    fileinfo = OrderedDict()
    for file in file_list:
        td = process_file(file, gpx_target_dir)
        if td is not None:
            fileinfo[file] = td
    # no longer interested in actually summing up variables here...
    if True:
        os.chdir(original_dir)
        return 0
    with open(os.path.join(gpx_target_dir, gpx_summary_filename), 'w') as f:
        f.write(','.join(['filename', 'sum_v', 'sum_v2', 'sum_v3', 'abs_elevation', 'sum_a', 'sum_e']))
        for fn, data in fileinfo.items():
            f.write('\n')
            f.write(','.join([str(x) for x in [
                fn,
                data['sum_v'],
                data['sum_v2'],
                data['sum_v3'],
                data['abs_elevation'],
                data['sum_a'],
                data['sum_e']
            ]]))
    print('processed gpx files')

    os.chdir(original_dir)


if __name__ == '__main__':
    raise NotImplementedError('this program is now to be called from other files')  # main()

"""
**Context:**

I am trying to reverse-engineer Strava's algorithm for measuring Calories burned on a bike ride. I have 76 .gpx files downloaded, along with Strava's estimates of Calories burned. The equations for measuring this involve estimating the speed and the elevation at various time points, and calculating power as a polynomial function of these two values. Integrating unscaled polynomials over time and plugging them into a linear regression against the Calorie estimates should determine the coefficients for my particular case.

**Note that I do not believe Strava's measurement is accurate. I simply want to determine the algorithm and formula they use for my particular case.**

_________________
**Problems:**

1. My current speed variable is calculated simply by scaling the distances (each over a 1-second interval), which were initially calculated using the [haversine formula](https://en.wikipedia.org/wiki/Haversine_formula). I have a smoothed version that uses a normal kernel with varying bandwidths, and in each case the total distance matches the one Strava displays on its website.

2. The elevation calculations I make to determine the overall change in elevation are very far off from the website's. I am simply looking at all positive increases in elevation (there is a 0.1-foot resolution) and adding those together. The estimates the website gives are usually 3/4 to 3 times the value I estimate.

3. The power formula the website gives is [this](https://support.strava.com/hc/en-us/articles/216917107-Power-Calculations), but it seems to calculate wind resistance as a function of v^2 (which is usually the force) as opposed to v^3 (which should be the power). Either way, I use the first, second, and third powers of velocity as variables. Additionally, I look at kinetic-energy increases (a function of v^2/2) and add those together, decreasing them by the decreases in gravitational potential energy that occur simultaneously.

_____________

**Additional notes about data:**

1. The rides take place on mostly flat ground, with very few hills. Most elevation changes take place over longer distances.

2.
"""
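
# Hypothetical sketch (not part of the pipeline): the regression described in
# the notes above. `calorie_estimates` is assumed to be Strava's per-ride
# Calorie values, ordered the same way as `fileinfo` (the OrderedDict built in
# main() from process_file() results); numpy is already imported above as np.
def fit_calorie_model(fileinfo, calorie_estimates):
    feature_names = ['sum_v', 'sum_v2', 'sum_v3', 'abs_elevation', 'sum_a', 'sum_e']
    X = np.asarray([[info[name] for name in feature_names]
                    for info in fileinfo.values()])
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    y = np.asarray(calorie_estimates)
    # ordinary least squares; constant unit rescalings of the features are
    # absorbed into the fitted coefficients
    coefs = np.linalg.lstsq(X, y, rcond=None)[0]
    return dict(zip(['intercept'] + feature_names, coefs))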
--------------------------------------------------------------------------------
/censor_and_package.py:
--------------------------------------------------------------------------------
import os
import csv
import re
from bs4 import BeautifulSoup
import bs4
import shutil
from io import StringIO
import zipfile
import numpy as np
PI = np.pi
import codecs

# the censor file should have 3 columns: longitude, latitude, radius (meters)
#CENSORFILE = 'censor.csv'

CENSOR_PARAMS = {
    'location': True,  # this should always be true
    'heart_rate': False,
    'speed': True,
    'temperature': True,
    'speed_smooth': True,
    'elevation': True,
    'altitude': True,
    'timestamp': False,  # if this is true, censored rows are dropped from the CSVs entirely
    'start_position_lat': True,
    'start_position_long': True,
    'end_position_lat': True,
    'end_position_long': True,
    'latitude': True,
    'longitude': True,
    'position_lat': True,
    'position_long': True,
    'enhanced_altitude': True,
    'enhanced_speed': True,
    # GPX-specific names
    'ele': True,
    'lat': True,
    'lon': True,
    'time': False,  # if this is true, censored track points are removed from the GPX entirely
}

# other names that can be synonymous with lat, lon
ADDITIONAL_LATLONG = [('start_position_lat', 'start_position_long'),
                      ('end_position_lat', 'end_position_long')]


# removes any NA values for coordinates
#REMOVE_MISSING_COORDINATES = True

CENSOR_STRING = '[CENSORED]'

#ROOT_DIRECTORY = '/ntfsl/data/workouts'

#SEARCH_DIRECTORIES = [
#    'workout_gpx/strava_gpx',
#    'workout_gpx/garmin_fit',
#    'workout_gpx/cateye_gpx',
#    'workout_gpx/strava_gpx/gpx_csv'
#]

#TARGET_DIRECTORY = 'CLEAN_WORKOUTS'

#ZIP_FILENAME = 'CLEAN_WORKOUTS.ZIP'

CENSOR_COORDINATES = []

#ADDITIONAL_FILES_TO_COPY = ['workout_gpx/strava_gpx/bike_and_run_gpx_info.ods']

# will overwrite files if they already exist
OVERWRITE = False
OVERWRITE_CSV = True
OVERWRITE_GPX = False

BLACKLIST = set(['test_file.csv'])

# radius of earth in meters
C_R = 6371. * 1000  #/1.60934

def distcalc(c1, c2):
    # haversine distance between two lat/lon points, in meters
    lat1 = float(c1['lat'])*PI/180.
    lon1 = float(c1['lon'])*PI/180.

    lat2 = float(c2['lat'])*PI/180.
    lon2 = float(c2['lon'])*PI/180.

    dlat = lat2-lat1
    dlon = lon2-lon1

    a = np.sin(dlat/2.)**2 + np.cos(lat1)*np.cos(lat2)*np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    d = C_R * c
    return d

def calculate_distances(points):
    dists = np.asarray([distcalc(c2.attrs, c1.attrs) for c1, c2 in zip(points[1:], points[:-1])])
    return dists

def is_censorable(longitude, latitude):
    censor = False
    for cc in CENSOR_COORDINATES:
        dist = distcalc({'lat': cc['latitude'],
                         'lon': cc['longitude']},
                        {'lat': latitude, 'lon': longitude})
        if dist <= cc['radius']:
            censor = True
            break
    return censor

CSV_REGEX = re.compile(r'.*\.csv$')
GPX_REGEX = re.compile(r'.*\.gpx$')

def find_csv(directory):
    files = os.listdir(directory)
    return [file for file in files if CSV_REGEX.match(file) and file not in BLACKLIST]

def find_gpx(directory):
    files = os.listdir(directory)
    return [file for file in files if GPX_REGEX.match(file) and file not in BLACKLIST]

def censor_line(x, template):
    return [e if not template[i] else CENSOR_STRING for i, e in enumerate(x)]
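
# Illustration (added; not in the original): censor_line() replaces the fields
# flagged True in `template` and leaves the rest intact, e.g.
#   censor_line(['2017-09-22T06:30:00Z', '41.9', '-87.6'], [False, True, True])
#   -> ['2017-09-22T06:30:00Z', '[CENSORED]', '[CENSORED]']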
def transfer_csv(filename, directory, censor_target_dir):
    target_file = os.path.join(censor_target_dir, os.path.split(directory)[1], filename)
    if os.path.isfile(target_file) and not (OVERWRITE or OVERWRITE_CSV):
        return 1
    with open(os.path.join(directory, filename), 'r') as f:
        reader = csv.reader(f)
        use_alternate_censoring = False
        with codecs.open(target_file, 'w', encoding='utf8') as of:
            writer = csv.writer(of)
            header = next(reader)
            writer.writerow(header)
            if 'latitude' in header:
                lat_index = header.index('latitude')
                lon_index = header.index('longitude')
            elif 'position_lat' in header:
                lat_index = header.index('position_lat')
                lon_index = header.index('position_long')
            else:
                use_alternate_censoring = True
                other_latlong_indexes = []
                for names in ADDITIONAL_LATLONG:
                    try:
                        other_latlong_indexes.append((header.index(names[0]), header.index(names[1])))
                    except ValueError:
                        continue
            # currently not in use
            censorable_columns = [i for i, column in enumerate(header) if CENSOR_PARAMS.get(column, False)]
            # currently in use
            should_censor = [CENSOR_PARAMS.get(column, False) for i, column in enumerate(header)]
            #print(should_censor)
            for line in reader:
                if not use_alternate_censoring:
                    try:
                        longitude, latitude = (float(line[lon_index]), float(line[lat_index]))
                        if is_censorable(longitude, latitude):
                            if not CENSOR_PARAMS['timestamp']:
                                writer.writerow(censor_line(line, should_censor))
                        else:
                            writer.writerow(line)
                    except ValueError:
                        # likely has one or both of the longitude/latitude values missing;
                        # I do not personally have files like this (I think), but it is possible.
                        # This will fail to censor latitude/longitude if the other is not
                        # present, but that's not realistic
                        print('....')
                        writer.writerow(line)
                else:
                    will_censor = False
                    for latitude_idx, longitude_idx in other_latlong_indexes:
                        try:
                            latitude = float(line[latitude_idx])
                            longitude = float(line[longitude_idx])
                        except ValueError:
                            # value of 'None' likely, will just ignore this...
                            continue
                        # note: is_censorable takes (longitude, latitude)
                        will_censor = will_censor or is_censorable(longitude, latitude)
                        if will_censor:
                            break
                    if will_censor:
                        if not CENSOR_PARAMS['timestamp']:
                            writer.writerow(censor_line(line, should_censor))
                    else:
                        writer.writerow(line)
    print('transferred %s' % (os.path.join(directory, filename)))


def load_censor_coordinates(censorfile):
    # not great practice, but easy enough to use here
    global CENSOR_COORDINATES
    with open(censorfile, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)
        lat_index = header.index('latitude')
        lon_index = header.index('longitude')
        radius_index = header.index('radius')
        for line in reader:
            CENSOR_COORDINATES.append({'latitude': float(line[lat_index]),
                                       'longitude': float(line[lon_index]),
                                       'radius': float(line[radius_index])})
    print(CENSOR_COORDINATES)
    print('loaded CENSOR_COORDINATES')
    return 0

def transfer_gpx(filename, directory, censor_target_dir):
    target_file = os.path.join(censor_target_dir, os.path.split(directory)[1], filename)
    if os.path.isfile(target_file) and not (OVERWRITE or OVERWRITE_GPX):
        return 1
    with open(os.path.join(directory, filename), 'r') as f:
        data = f.read()
    soup = BeautifulSoup(data, 'lxml', from_encoding="utf-8")
    trkpts = soup.find_all('trkpt')
    for pt in trkpts:
        lat, lon = (float(pt.attrs['lat']), float(pt.attrs['lon']))
        will_censor = is_censorable(lon, lat)
        if will_censor:
            if CENSOR_PARAMS['time']:
                pt.decompose()
            else:
                for child in pt.children:
                    if isinstance(child, bs4.element.Tag):
                        if CENSOR_PARAMS.get(child.name, False):
                            child.decompose()
                if CENSOR_PARAMS.get('lat', False):
                    pt.attrs['lat'] = CENSOR_STRING
                if CENSOR_PARAMS.get('lon', False):
                    pt.attrs['lon'] = CENSOR_STRING

    with codecs.open(target_file, 'w', encoding='utf8') as f:
        try:
            f.write(soup.prettify())
        except Exception:
            print(filename)
            print(directory)
            raise  # re-raise after printing context (historically a unicode bug)
    print('processed %s' % '/'.join([directory, filename]))
    return 0

def make_directories(censor_search_directories, censor_target_dir):
    counter = 0
    for directory in censor_search_directories:
        path = os.path.join(censor_target_dir, os.path.split(directory)[1])
        if not os.path.exists(path):
            os.makedirs(path)
            counter += 1
    print('made %d necessary directories' % counter)

def zip_target_directory(archive_target_dir, zip_filename, target_directory):
    shutil.make_archive(os.path.join(archive_target_dir, zip_filename), 'zip', target_directory)

def main(
        censor_search_directories,
        censor_target_dir,
        censorfile,
        censor_string,
        options,  # pretty much everything else...
):
    #os.chdir(ROOT_DIRECTORY)
    if censorfile != '':
        load_censor_coordinates(censorfile)

    # quick h4ck
    global CENSOR_STRING
    CENSOR_STRING = censor_string

    if censorfile != '':
        make_directories(censor_search_directories, censor_target_dir)
        for directory in censor_search_directories:
            print('searching %s' % directory)
            csv_files = find_csv(directory)
            gpx_files = find_gpx(directory)
            #print(gpx_files)
            for filename in csv_files:
                try:
                    transfer_csv(filename, directory, censor_target_dir)
                except Exception as e:
                    print('!')
                    print(filename)
                    raise e
            for filename in gpx_files:
                transfer_gpx(filename, directory, censor_target_dir)
    if options['archive_results']:
        os.makedirs(options['archive_output_dir'], exist_ok=True)
        for file in options['archive_extra_files']:
            if options['archive_censored_only']:
                shutil.copyfile(file, os.path.join(censor_target_dir, os.path.split(file)[1]))
            else:
                shutil.copyfile(file, os.path.join(options['root_subject_dir'], os.path.split(file)[1]))

        if options['archive_censored_only']:
            zip_target_directory(options['archive_output_dir'], options['archive_filename'],
                                 censor_target_dir)
        else:
            zip_target_directory(options['archive_output_dir'], options['archive_filename'],
                                 options['root_subject_dir'])
        print('made censored files and zipped them!')


if __name__ == '__main__':
    raise NotImplementedError('No longer supporting executable')
--------------------------------------------------------------------------------
/convert_fit_to_csv.py:
--------------------------------------------------------------------------------
"""
toggle ALT_FILENAME to change naming scheme
currently recommended to keep at =True, since event type is placed in filename
of created objects
"""

import csv
import os
# to install fitparse, run
# sudo pip3 install -e git+https://github.com/dtcooper/python-fitparse#egg=python-fitparse
import fitparse
import pytz
from copy import copy
from tzwhere import tzwhere

print('Initializing tzwhere')
tzwhere = tzwhere.tzwhere()

# the tz fields are generated manually rather than read from the FIT file
tz_fields = ['timestamp_utc', 'timezone']

# for general tracks
allowed_fields = ['timestamp', 'position_lat', 'position_long', 'distance',
                  'enhanced_altitude', 'altitude', 'enhanced_speed',
                  'speed', 'heart_rate', 'cadence', 'fractional_cadence',
                  'temperature'] + tz_fields

# if gps data is spotty, but you want to keep HR/temp data while the clock is
# running, you can remove 'position_lat' and 'position_long' from here
required_fields = ['timestamp', 'position_lat', 'position_long', 'altitude']


# for laps
lap_fields = ['timestamp', 'start_time', 'start_position_lat', 'start_position_long',
              'end_position_lat', 'end_position_long', 'total_elapsed_time', 'total_timer_time',
              'total_distance', 'total_strides', 'total_calories', 'enhanced_avg_speed', 'avg_speed',
              'enhanced_max_speed', 'max_speed', 'total_ascent', 'total_descent',
              'event', 'event_type', 'avg_heart_rate', 'max_heart_rate',
              'avg_running_cadence', 'max_running_cadence',
              'lap_trigger', 'sub_sport', 'avg_fractional_cadence', 'max_fractional_cadence',
              'total_fractional_cycles', 'avg_vertical_oscillation', 'avg_temperature',
              'max_temperature'] + tz_fields
lap_required_fields = ['timestamp', 'start_time', 'lap_trigger']

# start/stop events
start_fields = ['timestamp', 'timer_trigger', 'event', 'event_type', 'event_group']
start_required_fields = copy(start_fields)
start_fields += tz_fields

all_allowed_fields = set(allowed_fields + lap_fields + start_fields)

UTC = pytz.UTC
CST = pytz.timezone('US/Central')


# files beyond the main file are assumed to be created, as the log will be
# updated only after they are created
ALT_FILENAME = True
ALT_LOG_ = 'file_log.log'

def read_log(log_path):
    with open(os.path.join(log_path, ALT_LOG_), 'r') as f:
        lines = f.read().split()
    return lines

def append_log(filename, log_path):
    with open(os.path.join(log_path, ALT_LOG_), 'a') as f:
        f.write(filename)
        f.write('\n')
    return None

def main(
        fit_target_dir,
        fit_processed_csv_dir,
        fit_overwrite,
        fit_ignore_splits_and_laps,
):
    ALT_LOG = os.path.join(fit_processed_csv_dir, ALT_LOG_)
    files = os.listdir(fit_target_dir)
    fit_files = [file for file in files if file[-4:].lower() == '.fit']
    overwritten_files = []

    if not os.path.exists(ALT_LOG):
        # create an empty log file (portable alternative to shelling out to `touch`)
        open(ALT_LOG, 'a').close()
        file_list = []
    else:
        file_list = read_log(fit_processed_csv_dir)

    for file in fit_files:
        is_overwritten = False
        if file in file_list and not fit_overwrite:
            continue
        elif file in file_list:
            is_overwritten = True

        new_filename = file[:-4] + '.csv'

        fitfile = fitparse.FitFile(
            os.path.join(fit_target_dir, file),
            data_processor=fitparse.StandardUnitsDataProcessor()
        )

        print('converting %s' % os.path.join(fit_target_dir, file))
        write_fitfile_to_csv(
            fitfile,
            new_filename,
            file,
            fit_target_dir,
            fit_processed_csv_dir,
            is_overwritten,
            fit_ignore_splits_and_laps,
        )
    print('finished conversions')

def lap_filename(output_filename):
    return output_filename[:-4] + '_laps.csv'

def start_filename(output_filename):
    return output_filename[:-4] + '_starts.csv'

def get_timestamp(messages):
    for m in messages:
        for f in m.fields:
            if f.name == 'timestamp':
                return f.value
    return None

def get_event_type(messages):
    for m in messages:
        for f in m.fields:
            if f.name == 'sport':
                return f.value
    return None

def write_fitfile_to_csv(
        fitfile,
        output_file='test_output.csv',
        original_filename=None,
        fit_target_dir=None,  # raises errors if not defined
        fit_processed_csv_dir=None,  # raises errors if not defined
        is_overwritten=False,
        fit_ignore_splits_and_laps=False
):
    tz_name = ''
    local_tz = CST
    changed_tz = False
    position_long = None
    position_lat = None
    messages = fitfile.messages
    data = []
    lap_data = []
    start_data = []
    # this should probably work, but it's possibly
    # based on a certain version of the file/device
    timestamp = get_timestamp(messages)
    event_type = get_event_type(messages)
    if event_type is None:
        event_type = 'other'
    output_file = event_type + '_' + timestamp.strftime('%Y-%m-%d_%H-%M-%S.csv')

    for m in messages:
        skip = False
        skip_lap = False
        skip_start = False
        if not hasattr(m, 'fields'):
            continue
        fields = m.fields
        # check for important data types
        mdata = {}
        for field in fields:
            if not changed_tz and field.name in ['position_lat', 'position_long',
                                                 'start_position_lat', 'start_position_long']:
                if 'lat' in field.name:
                    try:
                        position_lat = float(field.value)
                    except TypeError:
                        pass
                else:
                    try:
                        position_long = float(field.value)
                    except TypeError:
                        pass
                if position_lat is not None and position_long is not None:
                    changed_tz = True
                    tz_name = tzwhere.tzNameAt(position_lat, position_long)
                    if tz_name is None:
                        # retry on a small grid around the point, in case the exact
                        # coordinates fall just outside a timezone polygon; stop as
                        # soon as a name is found so it is not overwritten
                        for latoff in [-0.1, 0, 0.1]:
                            for longoff in [-0.1, 0, 0.1]:
                                tz_name = tzwhere.tzNameAt(
                                    position_lat + latoff,
                                    position_long + longoff
                                )
                                if tz_name is not None:
                                    break
                            if tz_name is not None:
                                break

                    try:
                        local_tz = pytz.timezone(tz_name)
                    except Exception as e:
                        print('TZ NAME: %s' % tz_name)
                        print('lat/lon: (%s/%s)' % (position_lat, position_long))
                        print('outfile name: %s' % output_file)
                        raise e
                    if tz_name != 'US/Central':
                        print('Using timezone %s' % tz_name)

            if field.name in all_allowed_fields:
                # timezone conversion is currently saved for the end, but keeping
                # this (disabled) branch here for now
                if field.name == 'timestamp' and False:
                    mdata[field.name] = UTC.localize(field.value).astimezone(local_tz)
                else:
                    mdata[field.name] = field.value
        # this is sort of a janky way of determining field type, but it works for now
        for rf in required_fields:
            if rf not in mdata:
                skip = True
        for lrf in lap_required_fields:
            if lrf not in mdata:
                skip_lap = True
        for srf in start_required_fields:
            if srf not in mdata:
                skip_start = True
        if not skip:
            data.append(mdata)
        elif not skip_lap:
            lap_data.append(mdata)
        elif not skip_start:
            start_data.append(mdata)

    # localize timezone
    for row in data + lap_data + start_data:
        if 'timestamp' in row:
            row['timestamp_utc'] = row['timestamp']
            row['timestamp'] = UTC.localize(row['timestamp']).astimezone(local_tz)
            row['timezone'] = tz_name

    # write to csv
    # general track info
    with open(os.path.join(fit_processed_csv_dir, output_file), 'w') as f:
        writer = csv.writer(f)
        writer.writerow(allowed_fields)
        for entry in data:
            writer.writerow([str(entry.get(k, '')) for k in allowed_fields])

    if not fit_ignore_splits_and_laps:
        # lap info
        with open(os.path.join(fit_processed_csv_dir, lap_filename(output_file)), 'w') as f:
            writer = csv.writer(f)
            writer.writerow(lap_fields)
            for entry in lap_data:
                writer.writerow([str(entry.get(k, '')) for k in lap_fields])
        # start/stop info
        with open(os.path.join(fit_processed_csv_dir, start_filename(output_file)), 'w') as f:
            writer = csv.writer(f)
            writer.writerow(start_fields)
            for entry in start_data:
                writer.writerow([str(entry.get(k, '')) for k in start_fields])
    print('wrote %s' % output_file)
    if not fit_ignore_splits_and_laps:
        print('wrote %s' % lap_filename(output_file))
        print('wrote %s' % start_filename(output_file))

    if not is_overwritten:
        append_log(original_filename, fit_processed_csv_dir)

    if not changed_tz:
        print('TZ IS NOT CHANGED!')


if __name__ == '__main__':
    raise NotImplementedError('There is no way to currently run this as a command-line script. It must be imported. Run process_all.py instead.')
    # main()  (unreachable; kept for reference)
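
# Minimal standalone usage sketch (illustrative only; the real pipeline runs
# through process_all.py, and the filename here is made up). fitparse also
# exposes get_messages()/get_values() for quick inspection of a FIT file:
#
#   import fitparse
#   ff = fitparse.FitFile('running_2017-09-22.fit',
#                         data_processor=fitparse.StandardUnitsDataProcessor())
#   for record in ff.get_messages('record'):
#       print(record.get_values())  # dict mapping field name -> value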
--------------------------------------------------------------------------------
/import_and_process_garmin_fit.py:
--------------------------------------------------------------------------------
import os
import shutil
import re

import convert_fit_to_csv

#ACTIVITY_DIRECTORY = '/media/max/GARMIN/Garmin/ACTIVITY/'

#TARGET_DIRECTORY = '/ntfsl/data/workouts/workout_gpx/garmin_fit/'

# matches a (case-insensitive) .fit extension at the end of a filename
FNAME_REGEX = re.compile(r'\.[Ff][Ii][Tt]$')


def main(
        fit_source_dir,
        fit_target_dir,
        fit_processed_csv_dir,
        fit_overwrite,
        fit_ignore_splits_and_laps,
):
    os.makedirs(fit_target_dir, exist_ok=True)
    os.makedirs(fit_processed_csv_dir, exist_ok=True)
    activity_files = os.listdir(fit_source_dir)
    print('activity files: ', activity_files)
    # normalize extensions to lowercase '.fit'
    new_names = [FNAME_REGEX.sub('.fit', file) for file in activity_files]
    print('new names: ', new_names)
    current_files = set(os.listdir(fit_target_dir))
    print('current_files: ', current_files)
    for src_file, tgt_file in zip(activity_files, new_names):
        if tgt_file in current_files:
            print('%s already exists...' % tgt_file)
            continue
        shutil.copyfile(
            os.path.join(fit_source_dir, src_file),
            os.path.join(fit_target_dir, tgt_file)
        )
        print("copied %s to %s" % (
            os.path.join(fit_source_dir, src_file),
            os.path.join(fit_target_dir, tgt_file)
        ))

    convert_fit_to_csv.main(
        fit_target_dir,
        fit_processed_csv_dir,
        fit_overwrite,
        fit_ignore_splits_and_laps,
    )

    #os.chdir(fit_target_dir)
    #os.system('python3 convert_fit_to_csv.py')

if __name__ == '__main__':
    raise NotImplementedError('This module must be imported; run process_all.py instead.')
--------------------------------------------------------------------------------
/process_all.py:
--------------------------------------------------------------------------------
import os

import argparse

import import_and_process_garmin_fit
#import gpx_to_csv
import calculate_workout_variables
import censor_and_package

def main():
    options = parse_options()
    censor_search_directories = []

    if options['gpx_source_dir'] != '':
        if not options['skip_gpx_conversion']:
            print('doing GPX conversions')
            calculate_workout_variables.main(
                options['gpx_source_dir'],
                options['gpx_target_dir'],
                options['gpx_summary_filename'],
            )
        censor_search_directories.append(options['gpx_target_dir'])

    if options['fit_source_dir'] != '':
        if not options['skip_fit_conversion']:
            print('doing FIT conversions')
            import_and_process_garmin_fit.main(
                options['fit_source_dir'],
                options['fit_target_dir'],
                options['fit_processed_csv_dir'],
                options['fit_overwrite'],
                options['fit_ignore_splits_and_laps'],
            )
        censor_search_directories.append(options['fit_processed_csv_dir'])

    # even if no censoring is done, archiving can still be done here
    if True:  # options['censorfile'] != '' and len(censor_search_directories) > 0:
        censor_target_dir = os.path.join(options['subject_dir'], options['name'], 'censored')
        censor_and_package.main(
            censor_search_directories,
            censor_target_dir,
            options['censorfile'],
            options['censor_string'],
            # will be used to control archiving
            options
        )


def parse_options():
    parser = argparse.ArgumentParser(description='Run FIT/GPX Pipeline')
    parser.add_argument('--subject-name', dest='subject_name', type=str, required=True,
                        help='name of subject')

    parser.add_argument('--fit-source-dir', dest='fit_source_dir', type=str,
                        default='/media/max/GARMIN/Garmin/ACTIVITY/',
                        help='source data for garmin fit')

    parser.add_argument('--fit-target-dir', dest='fit_target_dir', required=False,
                        default='',
                        help='target directory for FIT data; default uses subject name')

    parser.add_argument('--fit-processed-csv-dir', dest='fit_processed_csv_dir', required=False,
                        default='',
                        help='target directory for CSVs of processed fit data; default uses subject name')

    #TODO
    parser.add_argument('--erase-copied-fit-files', dest='erase_copied_fit_files', required=False,
                        action='store_true',
                        help='If set, will delete any copied FIT files (not the originals, though)')

    parser.add_argument('--gpx-source-dir', dest='gpx_source_dir', required=False, default='',
                        help='directory for gpx files (if desired)')

    parser.add_argument('--gpx-target-dir', dest='gpx_target_dir', required=False,
                        default='',
                        help='directory to store processed gpx csv in')

    parser.add_argument('--subject-dir', dest='subject_dir',
                        default=os.path.join(os.getcwd(), 'subject_data'),
                        help='default directory to store subject data in')

    parser.add_argument('--gpx-summary-filename', dest='gpx_summary_filename',
                        default='gpx_summary.csv',
                        help='the summary filename for gpx data')

    parser.add_argument('--fit-overwrite', dest='fit_overwrite',
                        action='store_true', default=False, required=False,
                        help='Will overwrite any previously created CSVs from fit data')

    parser.add_argument('--fit-ignore-splits-and-laps', dest='fit_ignore_splits_and_laps',
                        action='store_true', default=False, required=False,
                        help='Will not write split/lap data if specified')

    # censorship arguments

    parser.add_argument('--censorfile', dest='censorfile', required=False,
                        default='',
                        help='If provided, will use censorfile CSV to create a copy of data '
                             'with censored locations around different latitude/longitude/radii')

    parser.add_argument('--censor-string', dest='censor_string', required=False,
                        default='[CENSORED]',
                        help='This is what censored fields are replaced with in censored data')

    parser.add_argument('--archive-results', dest='archive_results', action='store_true',
                        default=False,
                        help='If set, will package data into an archive')

    parser.add_argument('--archive-censored-only', dest='archive_censored_only',
                        action='store_true',
                        default=False,
                        help='If set, will only package data that is censored')

    parser.add_argument('--archive-extra-files', nargs='+', dest="archive_extra_files",
                        required=False,
                        help="Will copy these extra files into an archive if it is being created")

    parser.add_argument('--archive-output-dir', dest='archive_output_dir',
                        required=False, default='archives',
                        help="location for archived output")

    parser.add_argument('--archive-filename', dest='archive_filename',
                        required=False, default='',
                        help='archive filename; will use the subject name by default if none specified')


    # skip steps to allow archiving/censoring without other processing

    parser.add_argument('--skip-gpx-conversion', dest='skip_gpx_conversion',
                        action='store_true', required=False,
                        help='Skips GPX conversion if used')

    parser.add_argument('--skip-fit-conversion', dest='skip_fit_conversion',
                        action='store_true', required=False,
                        help='Skips FIT conversion if used')

    args = parser.parse_args()

    options = vars(args)
    name = options['subject_name'].lower().replace(' ', '_')
    options['root_subject_dir'] = os.path.join(options['subject_dir'], name)
    options['name'] = name
    if options['archive_extra_files'] is None:
        options['archive_extra_files'] = []
    # fill in some empty defaults
    if options['gpx_target_dir'] == '':
        options['gpx_target_dir'] = os.path.join(options['subject_dir'], name, 'gpx_csv')

    if options['fit_target_dir'] == '':
        options['fit_target_dir'] = os.path.join(options['subject_dir'], name, 'fit_files')

    if options['fit_processed_csv_dir'] == '':
        options['fit_processed_csv_dir'] = os.path.join(options['subject_dir'], name, 'fit_csv')

    if options['archive_filename'] == '':
        options['archive_filename'] = name

    if options['archive_output_dir'][0] != '/':
        options['archive_output_dir'] = os.path.join(options['subject_dir'], options['archive_output_dir'])

    return options
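
# For reference (derived from the defaults above), running with
# --subject-name='My Subject' yields a layout like:
#
#   subject_data/
#       archives/         # zipped output, if --archive-results is used
#       my_subject/
#           fit_files/    # copied .fit files
#           fit_csv/      # converted CSVs plus file_log.log
#           gpx_csv/      # processed GPX CSVs
#           censored/     # censored copies, if --censorfile is given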

if __name__ == '__main__':
    main()


# legacy approach, kept for reference
if False:
    print('cleaning GPS data and importing from garmin...')
    os.system('python calculate_workout_variables.py')
    os.system('python gpx_to_csv.py')
    os.system('python3 import_and_process_garmin_fit.py')
    print('cleaned GPS data and imported from garmin...')
--------------------------------------------------------------------------------