├── README.md
├── calculate_workout_variables.py
├── censor_and_package.py
├── convert_fit_to_csv.py
├── import_and_process_garmin_fit.py
└── process_all.py

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Converting Garmin FIT to CSV (and GPX to CSV)

[article for original version](https://maxcandocia.com/article/2017/Sep/22/converting-garmin-fit-to-csv/)

Requirements:

* Python 3.5+
* BeautifulSoup (with the `lxml` parser, which the scripts use)
* fitparse (installation instructions below)
* tzwhere (to localize timezones)
* pytz (to localize timezones)
* numpy (to calculate workout variables)
* FIT files to convert (if using that feature)
* GPX files to convert (if using that feature)

First, install `fitparse`:

    sudo pip3 install -e git+https://github.com/dtcooper/python-fitparse#egg=python-fitparse

OR

    sudo pip3 install fitparse

Then you can execute `process_all.py`:

    python3 process_all.py --subject-name=mysubjectname --fit-source-dir=/media/myname/GARMIN/Garmin/ACTIVITY/

This will create a batch of CSVs for all of the workouts in that directory. The files are stored in a subdirectory of the `subject_data` directory (check the argument defaults for the specific folders), which is generated based on the subject name. Up to 3 files are made per FIT file:

1. A CSV of all of the track data
2. A CSV of the lap data
3. A CSV of the start (and stop) data

Each CSV is named in the format `{activity_type}_YYYY-MM-DD_HH-MM-SS[_{laps,starts}].csv`, e.g., `running_2017-09-22_06-30-00_laps.csv`.

You can also provide a CSV to censor certain geographic regions by latitude, longitude, and radius. Simply create a CSV with `longitude`, `latitude`, and `radius` column headers, and add as many circular regions as you want. Note that the radius is assumed to be in meters (an example censor file is shown at the end of this README):

    python3 process_all.py --subject-name=mysubjectname --fit-source-dir=/media/myname/GARMIN/Garmin/ACTIVITY/ --censorfile=/home/mydir/censor.csv

The censored copies will be stored in a folder called `censored` inside the subject's directory. You can use the `--censor-string=` option to change what censored fields are replaced with (the default is `[CENSORED]`).

You can also archive the data after it has been processed:

    python3 process_all.py --subject-name=mysubjectname --fit-source-dir=/media/myname/GARMIN/Garmin/ACTIVITY/ --censorfile=/home/mydir/censor.csv --archive-results

By default, this stores data in a directory called `archives` in the main `subject_data` folder. You can add the `--archive-censored-only` flag, which will only archive the censored folder.

## GPX data

You can also process GPX data (and censor it the same way as FIT data).

For the initial processing, you can do

    python3 process_all.py --subject-name=mysubjectname --skip-fit-conversion --gpx-source-dir=/home/mydir/gpx_files

By default, the program will always try to copy/process FIT files unless you add the `--skip-fit-conversion` flag, but you can always tweak the code to your needs.

## Additional Help

You can use `python3 process_all.py --help` to see more information.
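## Example censor file

A minimal `censor.csv` might look like the following (these coordinates and radii are made up for illustration). Any track point within `radius` meters of a listed point has its location-related fields replaced:

    longitude,latitude,radius
    -87.6298,41.8781,500
    -88.0100,42.0500,1200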
--------------------------------------------------------------------------------
/calculate_workout_variables.py:
--------------------------------------------------------------------------------
from bs4 import BeautifulSoup
import numpy as np
PI = np.pi
import csv
import os
from collections import OrderedDict
import re

OUTPUT_FILE = 'gpx_processed_info.csv'

MAX_SPEED = 50  # mph

# radius of earth in miles
C_R = 6371/1.60934

def distcalc(c1, c2):
    # haversine distance between two lat/lon points, in miles
    lat1 = float(c1['lat'])*PI/180.
    lon1 = float(c1['lon'])*PI/180.

    lat2 = float(c2['lat'])*PI/180.
    lon2 = float(c2['lon'])*PI/180.

    dlat = lat2-lat1
    dlon = lon2-lon1

    a = np.sin(dlat/2.)**2 + np.cos(lat1)*np.cos(lat2)*np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    d = C_R * c
    return d

def calculate_distances(points):
    dists = np.asarray([distcalc(c2.attrs, c1.attrs) for c1, c2 in zip(points[1:], points[:-1])])
    return dists

def calculate_velocities(distances):
    # convert mi/s to mph (track points are assumed to be at 1-second intervals)
    velocities = distances * 3600
    return velocities

def calculate_accelerations(velocities):
    return np.diff(velocities)

MIPS_TO_MPH = 3600.

FPS_TO_MPH = 3600./5280

G_FPS = 32.

G_MPHPS = 32 * FPS_TO_MPH
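
# Hypothetical sanity check, added for illustration (not part of the original
# pipeline): one degree of latitude is about 69.1 miles, so distcalc() should
# return exactly C_R * PI / 180 for two points one degree of latitude apart.
def _check_distcalc():
    d = distcalc({'lat': '0', 'lon': '0'}, {'lat': '1', 'lon': '0'})
    assert abs(d - C_R * PI / 180.) < 1e-9
    return d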
def process_file(filename, target_dir):
    # main() chdirs into the GPX source directory, so `filename` is read from
    # the current directory; the converted CSV is written into `target_dir`,
    # which process_all.py later searches when censoring
    new_filename = os.path.join(target_dir, re.sub(r'([^.]+)\.gpx', r'\1.csv', filename))
    if os.path.exists(new_filename):
        #print('%s already exists. skipping.' % new_filename)
        return None
    print('processing %s' % filename)
    with open(filename, 'r') as f:
        soup = BeautifulSoup(f.read(), 'lxml')
    track = soup.find('trk')
    segments = track.find('trkseg')
    points = segments.find_all('trkpt')
    times = [p.find('time').text for p in points]
    elevations = np.asarray([float(p.find('ele').text) for p in points])
    # lon/lat-based
    distances = calculate_distances(points)
    velocities = calculate_velocities(distances)
    # if velocity > MAX_SPEED, it indicates a discontinuity; zero those out
    velocities = velocities * (velocities < MAX_SPEED)
    accelerations = calculate_accelerations(velocities)
    # elevation
    elevation_changes = np.diff(elevations)
    sum_v = np.sum(velocities)
    sum_v2 = np.sum(velocities**2)
    sum_v3 = np.sum(velocities**3)
    abs_elevation = np.sum(np.abs(elevation_changes))/2
    sum_a = np.sum(accelerations * (accelerations > 0))
    # alternative type of acceleration measurement
    # (note: velocities are already in mph, so this extra factor only rescales
    # the feature by a constant, which a regression coefficient would absorb)
    velocities_mph = 3600 * velocities
    energy_increases = velocities_mph[1:]**2 - velocities_mph[:-1]**2
    energy_increases = energy_increases - FPS_TO_MPH**2 * G_FPS * elevation_changes[1:] * (elevation_changes[1:] < 0)
    energy_increases = np.sum(energy_increases * (energy_increases > 0))
    with open(new_filename, 'w') as f:
        f.write('time,distance,elevation_change')
        for t, d, e in zip(times[1:], distances, elevation_changes):
            f.write('\n')
            f.write(','.join([str(t), str(d), str(e)]))
    return {
        'sum_v': sum_v,
        'sum_v2': sum_v2,
        'abs_elevation': abs_elevation,
        'sum_a': sum_a,
        'sum_v3': sum_v3,
        'sum_e': energy_increases
    }

def main(gpx_source_dir, gpx_target_dir, gpx_summary_filename):
    original_dir = os.getcwd()
    os.makedirs(gpx_target_dir, exist_ok=True)

    os.chdir(gpx_source_dir)
    file_list = [x for x in os.listdir('.') if x[-4:].lower() == '.gpx']
    file_list.sort()
    fileinfo = OrderedDict()
    for file in file_list:
        td = process_file(file, gpx_target_dir)
        if td is not None:
            fileinfo[file] = td
    # no longer interested in actually summing up variables here...
    if True:
        os.chdir(original_dir)
        return 0
    with open(os.path.join(gpx_target_dir, gpx_summary_filename), 'w') as f:
        f.write(','.join(['filename', 'sum_v', 'sum_v2', 'sum_v3', 'abs_elevation', 'sum_a', 'sum_e']))
        for fn, data in fileinfo.items():
            f.write('\n')
            f.write(','.join([str(x) for x in [
                fn,
                data['sum_v'],
                data['sum_v2'],
                data['sum_v3'],
                data['abs_elevation'],
                data['sum_a'],
                data['sum_e']
            ]]))
    print('processed gpx files')

    os.chdir(original_dir)


if __name__ == '__main__':
    raise NotImplementedError('this program is now to be called from other files')  # main()

"""
**Context:**

I am trying to reverse-engineer Strava's algorithm for measuring Calories burned on a bike ride. I have 76 .gpx files downloaded, along with Strava's estimates of Calories burned. The equations for measuring this involve estimating the speed and the elevation at various time points, and calculating power as a polynomial function of these two values. Integrating unscaled polynomials over time and plugging them into a linear regression against the Calorie estimates should determine the coefficients for my particular case.

**Note that I do not believe Strava's measurement is accurate. I simply want to determine the algorithm and formula they use for my particular case.**

_________________
**Problems:**

1. My current speed variable is calculated simply by scaling the distances (each over a 1-second interval), which were initially calculated using the [haversine formula](https://en.wikipedia.org/wiki/Haversine_formula). I have a smoothed version that uses a normal kernel with varying bandwidths, and in each case the total distance matches the one Strava displays on its website.

2. The elevation calculations I make to determine the overall change in elevation are very far off from the website's. I am simply looking at all positive increases in elevation (there is a 0.1-foot resolution) and adding those together. The estimates the website gives are usually 3/4 to 3 times the value I estimate.

3. The power formula the website gives is [this](https://support.strava.com/hc/en-us/articles/216917107-Power-Calculations), but it seems to calculate wind resistance as a function of v^2 (which is usually the force) as opposed to v^3 (which should be the power). Either way, I use the first, second, and third powers of velocity as variables. Additionally, I look at kinetic-energy increases (a function of v^2/2) and add those together, decreasing them by the decreases in gravitational potential energy that occur simultaneously.

_____________

**Additional notes about data:**

1. The rides take place on mostly flat ground, with very few hills. Most elevation changes take place over longer distances.

2.
"""
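
# Hypothetical sketch (not part of the pipeline): the regression described in
# the notes above. `calorie_estimates` is assumed to be Strava's per-ride
# Calorie values, ordered the same way as `fileinfo` (the OrderedDict built in
# main() from process_file() results); numpy is already imported above as np.
def fit_calorie_model(fileinfo, calorie_estimates):
    feature_names = ['sum_v', 'sum_v2', 'sum_v3', 'abs_elevation', 'sum_a', 'sum_e']
    X = np.asarray([[info[name] for name in feature_names]
                    for info in fileinfo.values()])
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    y = np.asarray(calorie_estimates)
    # ordinary least squares; constant unit rescalings of the features are
    # absorbed into the fitted coefficients
    coefs = np.linalg.lstsq(X, y, rcond=None)[0]
    return dict(zip(['intercept'] + feature_names, coefs))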
--------------------------------------------------------------------------------
/censor_and_package.py:
--------------------------------------------------------------------------------
import os
import csv
import re
from bs4 import BeautifulSoup
import bs4
import shutil
from io import StringIO
import zipfile
import numpy as np
PI = np.pi
import codecs

# the censor file should have 3 columns: longitude, latitude, radius (meters)
#CENSORFILE = 'censor.csv'

CENSOR_PARAMS = {
    'location': True,  # this should always be true
    'heart_rate': False,
    'speed': True,
    'temperature': True,
    'speed_smooth': True,
    'elevation': True,
    'altitude': True,
    'timestamp': False,  # if this is true, censored rows are dropped from the CSVs entirely
    'start_position_lat': True,
    'start_position_long': True,
    'end_position_lat': True,
    'end_position_long': True,
    'latitude': True,
    'longitude': True,
    'position_lat': True,
    'position_long': True,
    'enhanced_altitude': True,
    'enhanced_speed': True,
    # GPX-specific names
    'ele': True,
    'lat': True,
    'lon': True,
    'time': False,  # if this is true, censored track points are removed from the GPX entirely
}

# other names that can be synonymous with lat, lon
ADDITIONAL_LATLONG = [('start_position_lat', 'start_position_long'),
                      ('end_position_lat', 'end_position_long')]


# removes any NA values for coordinates
#REMOVE_MISSING_COORDINATES = True

CENSOR_STRING = '[CENSORED]'

#ROOT_DIRECTORY = '/ntfsl/data/workouts'

#SEARCH_DIRECTORIES = [
#    'workout_gpx/strava_gpx',
#    'workout_gpx/garmin_fit',
#    'workout_gpx/cateye_gpx',
#    'workout_gpx/strava_gpx/gpx_csv'
#]

#TARGET_DIRECTORY = 'CLEAN_WORKOUTS'

#ZIP_FILENAME = 'CLEAN_WORKOUTS.ZIP'

CENSOR_COORDINATES = []

#ADDITIONAL_FILES_TO_COPY = ['workout_gpx/strava_gpx/bike_and_run_gpx_info.ods']

# will overwrite files if they already exist
OVERWRITE = False
OVERWRITE_CSV = True
OVERWRITE_GPX = False

BLACKLIST = set(['test_file.csv'])

# radius of earth in meters
C_R = 6371. * 1000  #/1.60934

def distcalc(c1, c2):
    # haversine distance between two lat/lon points, in meters
    lat1 = float(c1['lat'])*PI/180.
    lon1 = float(c1['lon'])*PI/180.

    lat2 = float(c2['lat'])*PI/180.
    lon2 = float(c2['lon'])*PI/180.

    dlat = lat2-lat1
    dlon = lon2-lon1

    a = np.sin(dlat/2.)**2 + np.cos(lat1)*np.cos(lat2)*np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    d = C_R * c
    return d

def calculate_distances(points):
    dists = np.asarray([distcalc(c2.attrs, c1.attrs) for c1, c2 in zip(points[1:], points[:-1])])
    return dists

def is_censorable(longitude, latitude):
    censor = False
    for cc in CENSOR_COORDINATES:
        dist = distcalc({'lat': cc['latitude'],
                         'lon': cc['longitude']},
                        {'lat': latitude, 'lon': longitude})
        if dist <= cc['radius']:
            censor = True
            break
    return censor

CSV_REGEX = re.compile(r'.*\.csv$')
GPX_REGEX = re.compile(r'.*\.gpx$')

def find_csv(directory):
    files = os.listdir(directory)
    return [file for file in files if CSV_REGEX.match(file) and file not in BLACKLIST]

def find_gpx(directory):
    files = os.listdir(directory)
    return [file for file in files if GPX_REGEX.match(file) and file not in BLACKLIST]

def censor_line(x, template):
    return [e if not template[i] else CENSOR_STRING for i, e in enumerate(x)]
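
# Illustration (added; not in the original): censor_line() replaces the fields
# flagged True in `template` and leaves the rest intact, e.g.
#   censor_line(['2017-09-22T06:30:00Z', '41.9', '-87.6'], [False, True, True])
#   -> ['2017-09-22T06:30:00Z', '[CENSORED]', '[CENSORED]']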
def transfer_csv(filename, directory, censor_target_dir):
    target_file = os.path.join(censor_target_dir, os.path.split(directory)[1], filename)
    if os.path.isfile(target_file) and not (OVERWRITE or OVERWRITE_CSV):
        return 1
    with open(os.path.join(directory, filename), 'r') as f:
        reader = csv.reader(f)
        use_alternate_censoring = False
        with codecs.open(target_file, 'w', encoding='utf8') as of:
            writer = csv.writer(of)
            header = next(reader)
            writer.writerow(header)
            if 'latitude' in header:
                lat_index = header.index('latitude')
                lon_index = header.index('longitude')
            elif 'position_lat' in header:
                lat_index = header.index('position_lat')
                lon_index = header.index('position_long')
            else:
                use_alternate_censoring = True
                other_latlong_indexes = []
                for names in ADDITIONAL_LATLONG:
                    try:
                        other_latlong_indexes.append((header.index(names[0]), header.index(names[1])))
                    except ValueError:
                        continue
            # currently not in use
            censorable_columns = [i for i, column in enumerate(header) if CENSOR_PARAMS.get(column, False)]
            # currently in use
            should_censor = [CENSOR_PARAMS.get(column, False) for i, column in enumerate(header)]
            #print(should_censor)
            for line in reader:
                if not use_alternate_censoring:
                    try:
                        longitude, latitude = (float(line[lon_index]), float(line[lat_index]))
                        if is_censorable(longitude, latitude):
                            if not CENSOR_PARAMS['timestamp']:
                                writer.writerow(censor_line(line, should_censor))
                        else:
                            writer.writerow(line)
                    except ValueError:
                        # likely has one or both of the longitude/latitude values missing;
                        # I do not personally have files like this (I think), but it is possible.
                        # This will fail to censor latitude/longitude if the other is not
                        # present, but that's not realistic
                        print('....')
                        writer.writerow(line)
                else:
                    will_censor = False
                    for latitude_idx, longitude_idx in other_latlong_indexes:
                        try:
                            latitude = float(line[latitude_idx])
                            longitude = float(line[longitude_idx])
                        except ValueError:
                            # value of 'None' likely, will just ignore this...
                            continue
                        # note: is_censorable takes (longitude, latitude)
                        will_censor = will_censor or is_censorable(longitude, latitude)
                        if will_censor:
                            break
                    if will_censor:
                        if not CENSOR_PARAMS['timestamp']:
                            writer.writerow(censor_line(line, should_censor))
                    else:
                        writer.writerow(line)
    print('transferred %s' % (os.path.join(directory, filename)))


def load_censor_coordinates(censorfile):
    # not great practice, but easy enough to use here
    global CENSOR_COORDINATES
    with open(censorfile, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)
        lat_index = header.index('latitude')
        lon_index = header.index('longitude')
        radius_index = header.index('radius')
        for line in reader:
            CENSOR_COORDINATES.append({'latitude': float(line[lat_index]),
                                       'longitude': float(line[lon_index]),
                                       'radius': float(line[radius_index])})
    print(CENSOR_COORDINATES)
    print('loaded CENSOR_COORDINATES')
    return 0

def transfer_gpx(filename, directory, censor_target_dir):
    target_file = os.path.join(censor_target_dir, os.path.split(directory)[1], filename)
    if os.path.isfile(target_file) and not (OVERWRITE or OVERWRITE_GPX):
        return 1
    with open(os.path.join(directory, filename), 'r') as f:
        data = f.read()
    soup = BeautifulSoup(data, 'lxml', from_encoding="utf-8")
    trkpts = soup.find_all('trkpt')
    for pt in trkpts:
        lat, lon = (float(pt.attrs['lat']), float(pt.attrs['lon']))
        will_censor = is_censorable(lon, lat)
        if will_censor:
            if CENSOR_PARAMS['time']:
                pt.decompose()
            else:
                for child in pt.children:
                    if isinstance(child, bs4.element.Tag):
                        if CENSOR_PARAMS.get(child.name, False):
                            child.decompose()
                if CENSOR_PARAMS.get('lat', False):
                    pt.attrs['lat'] = CENSOR_STRING
                if CENSOR_PARAMS.get('lon', False):
                    pt.attrs['lon'] = CENSOR_STRING

    with codecs.open(target_file, 'w', encoding='utf8') as f:
        try:
            f.write(soup.prettify())
        except Exception:
            print(filename)
            print(directory)
            raise  # re-raise after printing context (historically a unicode bug)
    print('processed %s' % '/'.join([directory, filename]))
    return 0

def make_directories(censor_search_directories, censor_target_dir):
    counter = 0
    for directory in censor_search_directories:
        path = os.path.join(censor_target_dir, os.path.split(directory)[1])
        if not os.path.exists(path):
            os.makedirs(path)
            counter += 1
    print('made %d necessary directories' % counter)

def zip_target_directory(archive_target_dir, zip_filename, target_directory):
    shutil.make_archive(os.path.join(archive_target_dir, zip_filename), 'zip', target_directory)

def main(
        censor_search_directories,
        censor_target_dir,
        censorfile,
        censor_string,
        options,  # pretty much everything else...
):
    #os.chdir(ROOT_DIRECTORY)
    if censorfile != '':
        load_censor_coordinates(censorfile)

    # quick h4ck
    global CENSOR_STRING
    CENSOR_STRING = censor_string

    if censorfile != '':
        make_directories(censor_search_directories, censor_target_dir)
        for directory in censor_search_directories:
            print('searching %s' % directory)
            csv_files = find_csv(directory)
            gpx_files = find_gpx(directory)
            #print(gpx_files)
            for filename in csv_files:
                try:
                    transfer_csv(filename, directory, censor_target_dir)
                except Exception as e:
                    print('!')
                    print(filename)
                    raise e
            for filename in gpx_files:
                transfer_gpx(filename, directory, censor_target_dir)
    if options['archive_results']:
        os.makedirs(options['archive_output_dir'], exist_ok=True)
        for file in options['archive_extra_files']:
            if options['archive_censored_only']:
                shutil.copyfile(file, os.path.join(censor_target_dir, os.path.split(file)[1]))
            else:
                shutil.copyfile(file, os.path.join(options['root_subject_dir'], os.path.split(file)[1]))

        if options['archive_censored_only']:
            zip_target_directory(options['archive_output_dir'], options['archive_filename'],
                                 censor_target_dir)
        else:
            zip_target_directory(options['archive_output_dir'], options['archive_filename'],
                                 options['root_subject_dir'])
        print('made censored files and zipped them!')


if __name__ == '__main__':
    raise NotImplementedError('No longer supporting executable')
--------------------------------------------------------------------------------
/convert_fit_to_csv.py:
--------------------------------------------------------------------------------
"""
toggle ALT_FILENAME to change naming scheme
currently recommended to keep at =True, since event type is placed in filename
of created objects
"""

import csv
import os
# to install fitparse, run
# sudo pip3 install -e git+https://github.com/dtcooper/python-fitparse#egg=python-fitparse
import fitparse
import pytz
from copy import copy
from tzwhere import tzwhere

print('Initializing tzwhere')
tzwhere = tzwhere.tzwhere()

# the tz fields are generated manually rather than read from the FIT file
tz_fields = ['timestamp_utc', 'timezone']

# for general tracks
allowed_fields = ['timestamp', 'position_lat', 'position_long', 'distance',
                  'enhanced_altitude', 'altitude', 'enhanced_speed',
                  'speed', 'heart_rate', 'cadence', 'fractional_cadence',
                  'temperature'] + tz_fields

# if gps data is spotty, but you want to keep HR/temp data while the clock is
# running, you can remove 'position_lat' and 'position_long' from here
required_fields = ['timestamp', 'position_lat', 'position_long', 'altitude']


# for laps
lap_fields = ['timestamp', 'start_time', 'start_position_lat', 'start_position_long',
              'end_position_lat', 'end_position_long', 'total_elapsed_time', 'total_timer_time',
              'total_distance', 'total_strides', 'total_calories', 'enhanced_avg_speed', 'avg_speed',
              'enhanced_max_speed', 'max_speed', 'total_ascent', 'total_descent',
              'event', 'event_type', 'avg_heart_rate', 'max_heart_rate',
              'avg_running_cadence', 'max_running_cadence',
              'lap_trigger', 'sub_sport', 'avg_fractional_cadence', 'max_fractional_cadence',
              'total_fractional_cycles', 'avg_vertical_oscillation', 'avg_temperature',
              'max_temperature'] + tz_fields
lap_required_fields = ['timestamp', 'start_time', 'lap_trigger']

# start/stop events
start_fields = ['timestamp', 'timer_trigger', 'event', 'event_type', 'event_group']
start_required_fields = copy(start_fields)
start_fields += tz_fields

all_allowed_fields = set(allowed_fields + lap_fields + start_fields)

UTC = pytz.UTC
CST = pytz.timezone('US/Central')


# files beyond the main file are assumed to be created, as the log will be
# updated only after they are created
ALT_FILENAME = True
ALT_LOG_ = 'file_log.log'

def read_log(log_path):
    with open(os.path.join(log_path, ALT_LOG_), 'r') as f:
        lines = f.read().split()
    return lines

def append_log(filename, log_path):
    with open(os.path.join(log_path, ALT_LOG_), 'a') as f:
        f.write(filename)
        f.write('\n')
    return None

def main(
        fit_target_dir,
        fit_processed_csv_dir,
        fit_overwrite,
        fit_ignore_splits_and_laps,
):
    ALT_LOG = os.path.join(fit_processed_csv_dir, ALT_LOG_)
    files = os.listdir(fit_target_dir)
    fit_files = [file for file in files if file[-4:].lower() == '.fit']
    overwritten_files = []

    if not os.path.exists(ALT_LOG):
        # create an empty log file (portable alternative to shelling out to `touch`)
        open(ALT_LOG, 'a').close()
        file_list = []
    else:
        file_list = read_log(fit_processed_csv_dir)

    for file in fit_files:
        is_overwritten = False
        if file in file_list and not fit_overwrite:
            continue
        elif file in file_list:
            is_overwritten = True

        new_filename = file[:-4] + '.csv'

        fitfile = fitparse.FitFile(
            os.path.join(fit_target_dir, file),
            data_processor=fitparse.StandardUnitsDataProcessor()
        )

        print('converting %s' % os.path.join(fit_target_dir, file))
        write_fitfile_to_csv(
            fitfile,
            new_filename,
            file,
            fit_target_dir,
            fit_processed_csv_dir,
            is_overwritten,
            fit_ignore_splits_and_laps,
        )
    print('finished conversions')

def lap_filename(output_filename):
    return output_filename[:-4] + '_laps.csv'

def start_filename(output_filename):
    return output_filename[:-4] + '_starts.csv'

def get_timestamp(messages):
    for m in messages:
        for f in m.fields:
            if f.name == 'timestamp':
                return f.value
    return None

def get_event_type(messages):
    for m in messages:
        for f in m.fields:
            if f.name == 'sport':
                return f.value
    return None

def write_fitfile_to_csv(
        fitfile,
        output_file='test_output.csv',
        original_filename=None,
        fit_target_dir=None,  # raises errors if not defined
        fit_processed_csv_dir=None,  # raises errors if not defined
        is_overwritten=False,
        fit_ignore_splits_and_laps=False
):
    tz_name = ''
    local_tz = CST
    changed_tz = False
    position_long = None
    position_lat = None
    messages = fitfile.messages
    data = []
    lap_data = []
    start_data = []
    # this should probably work, but it's possibly
    # based on a certain version of the file/device
    timestamp = get_timestamp(messages)
    event_type = get_event_type(messages)
    if event_type is None:
        event_type = 'other'
    output_file = event_type + '_' + timestamp.strftime('%Y-%m-%d_%H-%M-%S.csv')

    for m in messages:
        skip = False
        skip_lap = False
        skip_start = False
        if not hasattr(m, 'fields'):
            continue
        fields = m.fields
        # check for important data types
        mdata = {}
        for field in fields:
            if not changed_tz and field.name in ['position_lat', 'position_long',
                                                 'start_position_lat', 'start_position_long']:
                if 'lat' in field.name:
                    try:
                        position_lat = float(field.value)
                    except TypeError:
                        pass
                else:
                    try:
                        position_long = float(field.value)
                    except TypeError:
                        pass
                if position_lat is not None and position_long is not None:
                    changed_tz = True
                    tz_name = tzwhere.tzNameAt(position_lat, position_long)
                    if tz_name is None:
                        # retry on a small grid around the point, in case the exact
                        # coordinates fall just outside a timezone polygon; stop as
                        # soon as a name is found so it is not overwritten
                        for latoff in [-0.1, 0, 0.1]:
                            for longoff in [-0.1, 0, 0.1]:
                                tz_name = tzwhere.tzNameAt(
                                    position_lat + latoff,
                                    position_long + longoff
                                )
                                if tz_name is not None:
                                    break
                            if tz_name is not None:
                                break

                    try:
                        local_tz = pytz.timezone(tz_name)
                    except Exception as e:
                        print('TZ NAME: %s' % tz_name)
                        print('lat/lon: (%s/%s)' % (position_lat, position_long))
                        print('outfile name: %s' % output_file)
                        raise e
                    if tz_name != 'US/Central':
                        print('Using timezone %s' % tz_name)

            if field.name in all_allowed_fields:
                # timezone conversion is currently saved for the end, but keeping
                # this (disabled) branch here for now
                if field.name == 'timestamp' and False:
                    mdata[field.name] = UTC.localize(field.value).astimezone(local_tz)
                else:
                    mdata[field.name] = field.value
        # this is sort of a janky way of determining field type, but it works for now
        for rf in required_fields:
            if rf not in mdata:
                skip = True
        for lrf in lap_required_fields:
            if lrf not in mdata:
                skip_lap = True
        for srf in start_required_fields:
            if srf not in mdata:
                skip_start = True
        if not skip:
            data.append(mdata)
        elif not skip_lap:
            lap_data.append(mdata)
        elif not skip_start:
            start_data.append(mdata)

    # localize timezone
    for row in data + lap_data + start_data:
        if 'timestamp' in row:
            row['timestamp_utc'] = row['timestamp']
            row['timestamp'] = UTC.localize(row['timestamp']).astimezone(local_tz)
            row['timezone'] = tz_name

    # write to csv
    # general track info
    with open(os.path.join(fit_processed_csv_dir, output_file), 'w') as f:
        writer = csv.writer(f)
        writer.writerow(allowed_fields)
        for entry in data:
            writer.writerow([str(entry.get(k, '')) for k in allowed_fields])

    if not fit_ignore_splits_and_laps:
        # lap info
        with open(os.path.join(fit_processed_csv_dir, lap_filename(output_file)), 'w') as f:
            writer = csv.writer(f)
            writer.writerow(lap_fields)
            for entry in lap_data:
                writer.writerow([str(entry.get(k, '')) for k in lap_fields])
        # start/stop info
        with open(os.path.join(fit_processed_csv_dir, start_filename(output_file)), 'w') as f:
            writer = csv.writer(f)
            writer.writerow(start_fields)
            for entry in start_data:
                writer.writerow([str(entry.get(k, '')) for k in start_fields])
    print('wrote %s' % output_file)
    if not fit_ignore_splits_and_laps:
        print('wrote %s' % lap_filename(output_file))
        print('wrote %s' % start_filename(output_file))

    if not is_overwritten:
        append_log(original_filename, fit_processed_csv_dir)

    if not changed_tz:
        print('TZ IS NOT CHANGED!')


if __name__ == '__main__':
    raise NotImplementedError('There is no way to currently run this as a command-line script. It must be imported. Run process_all.py instead.')
    # main()  (unreachable; kept for reference)
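
# Minimal standalone usage sketch (illustrative only; the real pipeline runs
# through process_all.py, and the filename here is made up). fitparse also
# exposes get_messages()/get_values() for quick inspection of a FIT file:
#
#   import fitparse
#   ff = fitparse.FitFile('running_2017-09-22.fit',
#                         data_processor=fitparse.StandardUnitsDataProcessor())
#   for record in ff.get_messages('record'):
#       print(record.get_values())  # dict mapping field name -> value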
--------------------------------------------------------------------------------
/import_and_process_garmin_fit.py:
--------------------------------------------------------------------------------
import os
import shutil
import re

import convert_fit_to_csv

#ACTIVITY_DIRECTORY = '/media/max/GARMIN/Garmin/ACTIVITY/'

#TARGET_DIRECTORY = '/ntfsl/data/workouts/workout_gpx/garmin_fit/'

# matches a (case-insensitive) .fit extension at the end of a filename
FNAME_REGEX = re.compile(r'\.[Ff][Ii][Tt]$')


def main(
        fit_source_dir,
        fit_target_dir,
        fit_processed_csv_dir,
        fit_overwrite,
        fit_ignore_splits_and_laps,
):
    os.makedirs(fit_target_dir, exist_ok=True)
    os.makedirs(fit_processed_csv_dir, exist_ok=True)
    activity_files = os.listdir(fit_source_dir)
    print('activity files: ', activity_files)
    # normalize extensions to lowercase '.fit'
    new_names = [FNAME_REGEX.sub('.fit', file) for file in activity_files]
    print('new names: ', new_names)
    current_files = set(os.listdir(fit_target_dir))
    print('current_files: ', current_files)
    for src_file, tgt_file in zip(activity_files, new_names):
        if tgt_file in current_files:
            print('%s already exists...' % tgt_file)
            continue
        shutil.copyfile(
            os.path.join(fit_source_dir, src_file),
            os.path.join(fit_target_dir, tgt_file)
        )
        print("copied %s to %s" % (
            os.path.join(fit_source_dir, src_file),
            os.path.join(fit_target_dir, tgt_file)
        ))

    convert_fit_to_csv.main(
        fit_target_dir,
        fit_processed_csv_dir,
        fit_overwrite,
        fit_ignore_splits_and_laps,
    )

    #os.chdir(fit_target_dir)
    #os.system('python3 convert_fit_to_csv.py')

if __name__ == '__main__':
    raise NotImplementedError('This module must be imported; run process_all.py instead.')
--------------------------------------------------------------------------------
/process_all.py:
--------------------------------------------------------------------------------
import os

import argparse

import import_and_process_garmin_fit
#import gpx_to_csv
import calculate_workout_variables
import censor_and_package

def main():
    options = parse_options()
    censor_search_directories = []

    if options['gpx_source_dir'] != '':
        if not options['skip_gpx_conversion']:
            print('doing GPX conversions')
            calculate_workout_variables.main(
                options['gpx_source_dir'],
                options['gpx_target_dir'],
                options['gpx_summary_filename'],
            )
        censor_search_directories.append(options['gpx_target_dir'])

    if options['fit_source_dir'] != '':
        if not options['skip_fit_conversion']:
            print('doing FIT conversions')
            import_and_process_garmin_fit.main(
                options['fit_source_dir'],
                options['fit_target_dir'],
                options['fit_processed_csv_dir'],
                options['fit_overwrite'],
                options['fit_ignore_splits_and_laps'],
            )
        censor_search_directories.append(options['fit_processed_csv_dir'])

    # even if no censoring is done, archiving can still be done here
    if True:  # options['censorfile'] != '' and len(censor_search_directories) > 0:
        censor_target_dir = os.path.join(options['subject_dir'], options['name'], 'censored')
        censor_and_package.main(
            censor_search_directories,
            censor_target_dir,
            options['censorfile'],
            options['censor_string'],
            # will be used to control archiving
            options
        )


def parse_options():
    parser = argparse.ArgumentParser(description='Run FIT/GPX Pipeline')
    parser.add_argument('--subject-name', dest='subject_name', type=str, required=True,
                        help='name of subject')

    parser.add_argument('--fit-source-dir', dest='fit_source_dir', type=str,
                        default='/media/max/GARMIN/Garmin/ACTIVITY/',
                        help='source data for garmin fit')

    parser.add_argument('--fit-target-dir', dest='fit_target_dir', required=False,
                        default='',
                        help='target directory for FIT data; default uses subject name')

    parser.add_argument('--fit-processed-csv-dir', dest='fit_processed_csv_dir', required=False,
                        default='',
                        help='target directory for CSVs of processed fit data; default uses subject name')

    #TODO
    parser.add_argument('--erase-copied-fit-files', dest='erase_copied_fit_files', required=False,
                        action='store_true',
                        help='If set, will delete any copied FIT files (not the originals, though)')

    parser.add_argument('--gpx-source-dir', dest='gpx_source_dir', required=False, default='',
                        help='directory for gpx files (if desired)')

    parser.add_argument('--gpx-target-dir', dest='gpx_target_dir', required=False,
                        default='',
                        help='directory to store processed gpx csv in')

    parser.add_argument('--subject-dir', dest='subject_dir',
                        default=os.path.join(os.getcwd(), 'subject_data'),
                        help='default directory to store subject data in')

    parser.add_argument('--gpx-summary-filename', dest='gpx_summary_filename',
                        default='gpx_summary.csv',
                        help='the summary filename for gpx data')

    parser.add_argument('--fit-overwrite', dest='fit_overwrite',
                        action='store_true', default=False, required=False,
                        help='Will overwrite any previously created CSVs from fit data')

    parser.add_argument('--fit-ignore-splits-and-laps', dest='fit_ignore_splits_and_laps',
                        action='store_true', default=False, required=False,
                        help='Will not write split/lap data if specified')

    # censorship arguments

    parser.add_argument('--censorfile', dest='censorfile', required=False,
                        default='',
                        help='If provided, will use censorfile CSV to create a copy of data '
                             'with censored locations around different latitude/longitude/radii')

    parser.add_argument('--censor-string', dest='censor_string', required=False,
                        default='[CENSORED]',
                        help='This is what censored fields are replaced with in censored data')

    parser.add_argument('--archive-results', dest='archive_results', action='store_true',
                        default=False,
                        help='If set, will package data into an archive')

    parser.add_argument('--archive-censored-only', dest='archive_censored_only',
                        action='store_true',
                        default=False,
                        help='If set, will only package data that is censored')

    parser.add_argument('--archive-extra-files', nargs='+', dest="archive_extra_files",
                        required=False,
                        help="Will copy these extra files into an archive if it is being created")

    parser.add_argument('--archive-output-dir', dest='archive_output_dir',
                        required=False, default='archives',
                        help="location for archived output")

    parser.add_argument('--archive-filename', dest='archive_filename',
                        required=False, default='',
                        help='archive filename; will use the subject name by default if none specified')


    # skip steps to allow archiving/censoring without other processing

    parser.add_argument('--skip-gpx-conversion', dest='skip_gpx_conversion',
                        action='store_true', required=False,
                        help='Skips GPX conversion if used')

    parser.add_argument('--skip-fit-conversion', dest='skip_fit_conversion',
                        action='store_true', required=False,
                        help='Skips FIT conversion if used')

    args = parser.parse_args()

    options = vars(args)
    name = options['subject_name'].lower().replace(' ', '_')
    options['root_subject_dir'] = os.path.join(options['subject_dir'], name)
    options['name'] = name
    if options['archive_extra_files'] is None:
        options['archive_extra_files'] = []
    # fill in some empty defaults
    if options['gpx_target_dir'] == '':
        options['gpx_target_dir'] = os.path.join(options['subject_dir'], name, 'gpx_csv')

    if options['fit_target_dir'] == '':
        options['fit_target_dir'] = os.path.join(options['subject_dir'], name, 'fit_files')

    if options['fit_processed_csv_dir'] == '':
        options['fit_processed_csv_dir'] = os.path.join(options['subject_dir'], name, 'fit_csv')

    if options['archive_filename'] == '':
        options['archive_filename'] = name

    if options['archive_output_dir'][0] != '/':
        options['archive_output_dir'] = os.path.join(options['subject_dir'], options['archive_output_dir'])

    return options
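
# For reference (derived from the defaults above), running with
# --subject-name='My Subject' yields a layout like:
#
#   subject_data/
#       archives/         # zipped output, if --archive-results is used
#       my_subject/
#           fit_files/    # copied .fit files
#           fit_csv/      # converted CSVs plus file_log.log
#           gpx_csv/      # processed GPX CSVs
#           censored/     # censored copies, if --censorfile is given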

if __name__ == '__main__':
    main()


# legacy approach, kept for reference
if False:
    print('cleaning GPS data and importing from garmin...')
    os.system('python calculate_workout_variables.py')
    os.system('python gpx_to_csv.py')
    os.system('python3 import_and_process_garmin_fit.py')
    print('cleaned GPS data and imported from garmin...')
--------------------------------------------------------------------------------