├── LICENSE
├── README.md
├── accuracy.py
├── build_graph.py
├── create_folders.sh
├── data_preparation_train.py
├── model_testing.py
├── model_training.py
├── network
│   ├── affinity.py
│   ├── affinity_appearance.py
│   ├── affinity_final.py
│   ├── affinity_geom.py
│   ├── complete_net.py
│   ├── detections.py
│   ├── encoderCNN.py
│   └── optimizationGNN.py
├── singularity
├── test.sh
├── tracking.py
├── train.sh
└── utils.py

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 Ioannis Papakis

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization

This repository is the official code implementation of *GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization*, available on [IEEE Xplore](https://ieeexplore.ieee.org/document/9564655) and on [arXiv](https://arxiv.org/abs/2010.00067). A link to a new traffic vehicle monitoring dataset named the "VA Beach Traffic Dataset" will be provided here.

## Citing:

If you find this paper or code useful, please cite the IEEE version:

```
@inproceedings{papakis2021graph,
  title={A Graph Convolutional Neural Network Based Approach for Traffic Monitoring Using Augmented Detections with Optical Flow},
  author={Papakis, Ioannis and Sarkar, Abhijit and Karpatne, Anuj},
  booktitle={2021 IEEE International Intelligent Transportation Systems Conference (ITSC)},
  pages={2980--2986},
  year={2021},
  organization={IEEE}
}
```

or the arXiv version:

```
@article{papakis2020gcnnmatch,
  title={GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization},
  author={Papakis, Ioannis and Sarkar, Abhijit and Karpatne, Anuj},
  journal={arXiv preprint arXiv:2010.00067},
  year={2020}
}
```

## Installing & Preparation:

* Install Singularity following the instructions on its [website](https://sylabs.io/guides/3.0/user-guide/quick_start.html#quick-installation-steps).

* Clone this repository and cd into it.

* Run "sudo singularity build geometric.sif singularity".
  Follow the instructions from [pytorch-geometric](https://github.com/rusty1s/pytorch_geometric/tree/master/docker) to change settings if needed for your system.

* Download the MOT17 dataset from the [MOT website](https://motchallenge.net/data/MOT17/) and place it in a folder /MOT_dataset.

* Run "mkdir overlay". The overlay will allow you to install additional packages if needed in the future.

* Run "sudo singularity run --nv -B /MOT_dataset/:/data --overlay overlay/ geometric.sif".

* Run "./create_folders.sh".

## Training:

* Command: ./train.sh

* Result: Training will start and save the trained models in /models. Settings can be changed in tracking.py.

## Testing:

* Specify which trained model to use in tracking.py. A trained model can be found [here](https://drive.google.com/drive/folders/1b0ZF7WAQFIXv6xydyU3OGGBW-7EhegSv?usp=sharing).

* Command: ./test.sh

* Result: Testing will start and produce txt files and videos saved in /output. Settings can be changed in tracking.py.

For benchmark evaluation, the detection files pre-processed with Tracktor from [this repo](https://github.com/dvl-tum/mot_neural_solver) were used.

--------------------------------------------------------------------------------
/accuracy.py:
--------------------------------------------------------------------------------
import torch
from utils import *
import numpy as np

def accuracy(k,j,edges_number_list,output_final,ground_truth,batch,start,device):
    num_of_edges= edges_number_list[int(j.item())]
    output3= [0] * num_of_edges
    output_sliced= output_final[start:start+num_of_edges].detach().clone()
    ground_truth_sliced= ground_truth[start:start+num_of_edges].to(torch.int8).detach().clone()
    edges_list_reduced= [[],[]]
    output_reduced= []
    ground_truth_reduced= []
    # print(j.item())
    for i in range(len(batch[k].edge_index[0])):
        edge1= batch[k].edge_index[0][i]
        edge2= batch[k].edge_index[1][i]
        if edge1<=edge2:
            edges_list_reduced[0].append(edge1.item())
            edges_list_reduced[1].append(edge2.item())
            ground_truth_reduced.append(ground_truth_sliced[i])
            if edge11:
                if len(out1)>max:
                    max= len(out1)
                constraints.append(out1)
            if out2 and len(np.array(out2))>1:
                if len(out2)>max:
                    max= len(out2)
                constraints.append(out2)
    # Get most probable edges as 1 and the other as 0
    max=0
    zero_indeces= []
    one_indeces= []
    ranking= True
    # print(optim_graph.out.size())
    while ranking==True:
        for i, edge in enumerate(output_reduced):
            if (edge>max) and (i not in zero_indeces) and (i not in one_indeces):
                max=edge
                index= i
        if max==0:
            ranking= False

        else:
            one_indeces.append(index)
            for constraint in constraints:
                if index in constraint:
                    for constr in constraint:
                        if constr!=index and constr!=-1 and constr not in zero_indeces:
                            zero_indeces.append(constr)
            max=0
    processed_output= []
    for i, edge in enumerate(output_reduced):
        if i in one_indeces:
            processed_output.append(torch.tensor(1).to(device))
        else:
            processed_output.append(torch.tensor(0).to(device))
    processed_output= torch.stack(processed_output)
    ground_truth_reduced= torch.stack(ground_truth_reduced)

    return processed_output, ground_truth_reduced,output_reduced
--------------------------------------------------------------------------------
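The modules that follow (build_graph.py, data_preparation_train.py, model_testing.py) index directly into rows of the MOT17 gt.txt/det.txt files that tracking.py loads with np.loadtxt. As a quick orientation, here is a minimal sketch of that row layout, assuming the standard MOT17 column order and a hypothetical local file path:

```
import numpy as np

# Hypothetical path; tracking.py actually loads e.g. /data/MOT17/train/MOT17-02-FRCNN/gt/gt.txt
detections = np.loadtxt("gt.txt", delimiter=",")

row = detections[0]
frame, track_id = row[0], row[1]                      # detection[0], detection[1]
xmin, ymin, width, height = row[2], row[3], row[4], row[5]
confidence = row[6]                                   # detection[6]
object_type = row[7]                                  # detection[7], class label in gt.txt
```

This only illustrates the indexing convention used throughout the repo; the real loading and filtering happen in tracking.py, data_preparation_train.py, and model_testing.py.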
/build_graph.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import os 3 | import numpy as np 4 | import torch.nn.functional as F 5 | from utils import * 6 | from torch_geometric.data import Data, DataLoader, DataListLoader 7 | import torchvision.utils as vutils 8 | import matplotlib.pyplot as plt 9 | from PIL import Image 10 | from torchvision.transforms import ToTensor 11 | 12 | def build_graph(tracklets, current_detections, images_path, current_frame, distance_limit, fps, test=True): 13 | 14 | if len(tracklets): 15 | edges_first_row = [] 16 | edges_second_row = [] 17 | edges_complete_first_row= [] 18 | edges_complete_second_row = [] 19 | edge_attr = [] 20 | ground_truth = [] 21 | idx = [] 22 | node_attr = [] 23 | coords = [] 24 | frame = [] 25 | coords_original = [] 26 | transform= ToTensor() 27 | ####tracklet graphs 28 | for tracklet in tracklets: 29 | tracklet1= tracklet[-1] 30 | xmin, ymin, width, height = int(round(tracklet1[2])), int(round(tracklet1[3])), \ 31 | int(round(tracklet1[4])), int(round(tracklet1[5])) 32 | image_name = os.path.join(images_path, "{0:0=6d}".format(int(tracklet1[0])) + ".jpg") 33 | image = plt.imread(image_name) 34 | frame_width, frame_height, channels = image.shape 35 | coords.append([xmin / frame_width, ymin / frame_height, width / frame_width, height / frame_height]) 36 | coords_original.append([xmin, ymin, xmin+width/2, ymin+height/2]) 37 | image_cropped = image[ymin:ymin + height, xmin:xmin + width] 38 | image_resized = cv2.resize(image_cropped, (90,150), interpolation=cv2.INTER_AREA) 39 | image_resized = image_resized / 255 40 | image_resized = image_resized.astype(np.float32) 41 | image_resized -= [0.485, 0.456, 0.406] 42 | image_resized /= [0.229, 0.224, 0.225] 43 | image_resized = transform(image_resized) 44 | node_attr.append(image_resized) 45 | frame.append([tracklet1[0]/fps]) # the frame it is observed 46 | #####new detections graph 47 | for detection in current_detections: 48 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 49 | int(round(detection[4])), int(round(detection[5])) 50 | image_name = os.path.join(images_path, "{0:0=6d}".format(int(detection[0])) + ".jpg") 51 | image = plt.imread(image_name) 52 | frame_width, frame_height, channels = image.shape 53 | coords.append([xmin / frame_width, ymin / frame_height, width / frame_width, height / frame_height]) 54 | coords_original.append([xmin, ymin, xmin+width/2, ymin+height/2]) 55 | image_cropped = image[ymin:ymin + height, xmin:xmin + width] 56 | image_resized = cv2.resize(image_cropped, (90,150), interpolation=cv2.INTER_AREA) 57 | image_resized = image_resized / 255 58 | image_resized = image_resized.astype(np.float32) 59 | image_resized -= [0.485, 0.456, 0.406] 60 | image_resized /= [0.229, 0.224, 0.225] 61 | image_resized = transform(image_resized) 62 | node_attr.append(image_resized) 63 | frame.append([detection[0]/fps]) # the frame it is observed 64 | # construct connections between tracklets and detections 65 | k = 0 66 | for i in range(len(tracklets) + len(current_detections)): 67 | for j in range(len(tracklets) + len(current_detections)): 68 | distance= ((coords_original[i][0]-coords_original[j][0])**2+(coords_original[i][1]-coords_original[j][1])**2)**0.5 69 | if i < len(tracklets) and j >= len(tracklets): # i is tracklet j is detection 70 | # adjacency matrix 71 | if distance= len(tracklets) and j < len(tracklets): # j is tracklet i is detection 84 | # adjacency matrix 85 | if 
distancecurrent_frame: 34 | break 35 | else: 36 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 37 | int(round(detection[4])), int(round(detection[5])) 38 | object_type = detection[7] 39 | if xmin > 0 and ymin > 0 and width > 0 and height > 0 and (object_type in acceptable_object_types): 40 | most_recent_frame_back2 = randint(1, most_recent_frame_back) 41 | if current_frame-most_recent_frame_back2<1: 42 | most_recent_frame_back2=1 43 | temp=current_frame - (most_recent_frame_back2 - 1) 44 | if (detection[0]=temp-frames_look_back: 45 | new_tracklet= True 46 | for k,i in enumerate(tracklet_IDs): 47 | if detection[1]==i: 48 | new_tracklet=False 49 | tracklets[k].append(detection) 50 | break 51 | if new_tracklet==True: 52 | tracklet_IDs.append(int(detection[1])) 53 | tracklets.append([detection]) 54 | elif detection[0]==current_frame: 55 | current_detections.append(detection) 56 | data = build_graph(tracklets, current_detections, images_path, current_frame, distance_limit, fps, test=False) 57 | data_list.append(data) 58 | current_frame += graph_jump 59 | print("Data preparation finished") 60 | return data_list 61 | 62 | 63 | -------------------------------------------------------------------------------- /model_testing.py: -------------------------------------------------------------------------------- 1 | from torch_geometric.data import DataLoader, DataListLoader 2 | from build_graph import * 3 | from scipy.optimize import linear_sum_assignment 4 | import matplotlib.pyplot as plt 5 | import torchvision 6 | import torchvision.utils as vutils 7 | import shutil 8 | import tensorflow as tf 9 | import tensorboard as tb 10 | import keyword 11 | from torchvision.transforms import ToTensor 12 | from utils import * 13 | import datetime 14 | 15 | def model_testing(sequence, detections, images_path, total_frames, frames_look_back, model, distance_limit, fp_min_times_seen, match_thres, det_conf_thres, fp_look_back, fp_recent_frame_limit,min_height,fps): 16 | 17 | device = torch.device('cuda') 18 | #pick one frame and load previous results 19 | tf.io.gfile = tb.compat.tensorflow_stub.io.gfile 20 | current_frame= 2 21 | id_num= 0 22 | tracking_output= [] 23 | checked_ids = [] 24 | 25 | transform = ToTensor() 26 | 27 | while current_frame <= total_frames: 28 | print("Sequence: " + sequence+ ", Frame: " + str(current_frame)+'/'+str(int(total_frames))) 29 | data_list = [] 30 | #Give IDs to the first frame 31 | tracklets = [] 32 | if not tracking_output: 33 | for i, detection in enumerate(detections): 34 | if detection[0] == 1: 35 | frame = detection[0] 36 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 37 | int(round(detection[4])), int(round(detection[5])) 38 | confidence= detection[6] 39 | if xmin > 0 and ymin > 0 and width > 0 and height > min_height and confidence>det_conf_thres: 40 | id_num += 1 41 | ID= int(id_num) 42 | tracking_output.append([frame, ID, xmin, ymin, width, height, \ 43 | int(detection[6]), 1, 1]) 44 | tracklets.append([[frame, ID, xmin, ymin, width, height, \ 45 | int(detection[6]), 1, 1]]) 46 | else: 47 | detections= detections[i:] 48 | break 49 | else: 50 | #Get all tracklets 51 | tracklet_IDs = [] 52 | for j, tracklet in enumerate(tracking_output): 53 | xmin, ymin, width, height = int(round(tracklet[2])), int(round(tracklet[3])), \ 54 | int(round(tracklet[4])), int(round(tracklet[5])) 55 | if xmin > 0 and ymin > 0 and width > 0 and height > 0: 56 | if (tracklet[0]=current_frame-frames_look_back: 57 | 
new_tracklet= True 58 | for k,i in enumerate(tracklet_IDs): 59 | if tracklet[1]==i: 60 | new_tracklet=False 61 | tracklets[k].append(tracklet) 62 | break 63 | if new_tracklet==True: 64 | tracklet_IDs.append(int(tracklet[1])) 65 | tracklets.append([tracklet]) 66 | #Get new detections 67 | current_detections = [] 68 | for i, detection in enumerate(detections): 69 | if detection[0] == current_frame: 70 | frame = detection[0] 71 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 72 | int(round(detection[4])), int(round(detection[5])) 73 | confidence= detection[6] 74 | if xmin > 0 and ymin > 0 and width > 0 and height > min_height and confidence>det_conf_thres: 75 | current_detections.append([frame, -1, xmin, ymin, width, height, \ 76 | int(detection[6]), 1, 1]) 77 | else: 78 | detections= detections[i:] 79 | break 80 | #build graph and run model 81 | data = build_graph(tracklets, current_detections, images_path, current_frame, distance_limit, fps, test=True) 82 | if data: 83 | if current_detections and data.edge_attr.size()[0]!=0: 84 | data_list.append(data) 85 | 86 | loader = DataListLoader(data_list) 87 | for graph_num, batch in enumerate(loader): 88 | #MODEL FORWARD 89 | output, output2, ground_truth, ground_truth2, det_num, tracklet_num= model(batch) 90 | #FEATURE MAPS on tensorboard 91 | #embedding 92 | images= batch[0].x 93 | images = F.interpolate(images, size=250) 94 | edge_index= data_list[graph_num].edges_complete 95 | #THRESHOLDS 96 | temp= [] 97 | for i in output2: 98 | if i>match_thres: 99 | temp.append(i) 100 | else: 101 | temp.append(i-i) 102 | output2= torch.stack(temp) 103 | # HUNGARIAN 104 | cleaned_output= hungarian(output2, ground_truth2, det_num, tracklet_num) 105 | # Give Ids to current frame 106 | for i,detection in enumerate(current_detections): 107 | match_found= False 108 | for k,m in enumerate(cleaned_output):#cleaned_output): 109 | if m==1 and edge_index[1,k]==i+len(tracklets): #match found 110 | ID= tracklets[edge_index[0,k]][-1][1] 111 | frame = detection[0] 112 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 113 | int(round(detection[4])), int(round(detection[5])) 114 | tracking_output.append([frame, ID, xmin, ymin, width, height, \ 115 | int(detection[6]), 1, 1]) 116 | match_found = True 117 | break 118 | if match_found==False: #give new ID 119 | # print("no match") 120 | id_num += 1 121 | ID= id_num 122 | frame = detection[0] 123 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 124 | int(round(detection[4])), int(round(detection[5])) 125 | tracking_output.append([frame, ID, xmin, ymin, width, height, \ 126 | int(detection[6]), 1, 1]) 127 | #Clean output for false positives 128 | if current_frame>=fp_look_back: 129 | # reduce to recent objects 130 | recent_tracks = [i for i in tracking_output if i[0] >= current_frame-fp_look_back] 131 | # find the different IDs 132 | candidate_ids= [] 133 | times_seen= [] 134 | first_frame_seen= [] 135 | for i in recent_tracks: 136 | if i[1] not in checked_ids: 137 | if i[1] not in candidate_ids: 138 | candidate_ids.append(i[1]) 139 | times_seen.append(1) 140 | first_frame_seen.append(i[0]) 141 | else: 142 | index= candidate_ids.index(i[1]) 143 | times_seen[index]= times_seen[index] + 1 144 | # find which IDs to remove 145 | remove_ids = [] 146 | for i,j in enumerate(candidate_ids): 147 | if times_seen[i] < fp_min_times_seen and current_frame-first_frame_seen[i]>=fp_look_back: 148 | remove_ids.append(j) 149 | elif 
times_seen[i] > fp_min_times_seen: 150 | checked_ids.append(j) 151 | #keep only those IDs that are seen enough times 152 | tracking_output = [j for j in tracking_output if j[1] not in remove_ids] 153 | current_frame += 1 154 | # reduce to recent objects 155 | recent_tracks = [i for i in tracking_output if i[0] >= current_frame-fp_look_back] 156 | # find the different IDs 157 | candidate_ids= [] 158 | times_seen= [] 159 | for i in recent_tracks: 160 | if i[1] not in checked_ids: 161 | if i[1] not in candidate_ids: 162 | candidate_ids.append(i[1]) 163 | times_seen.append(1) 164 | else: 165 | index= candidate_ids.index(i[1]) 166 | times_seen[index]= times_seen[index] + 1 167 | # find which IDs to remove 168 | remove_ids = [] 169 | for i,j in enumerate(candidate_ids): 170 | if times_seen[i] < fp_min_times_seen: 171 | remove_ids.append(j) 172 | elif times_seen[i] > fp_min_times_seen: 173 | checked_ids.append(j) 174 | #keep only those IDs that are seen enough times 175 | tracking_output = [j for j in tracking_output if j[1] not in remove_ids] 176 | 177 | return tracking_output -------------------------------------------------------------------------------- /model_training.py: -------------------------------------------------------------------------------- 1 | from torch_geometric.nn import MetaLayer, DataParallel 2 | from utils import * 3 | from network.complete_net import * 4 | from torch_geometric.data import DataLoader, DataListLoader 5 | from accuracy import * 6 | import matplotlib.pyplot as plt 7 | import logging 8 | import sys 9 | import os 10 | 11 | def model_training(data_list_train, data_list_test, epochs, acc_epoch, acc_epoch2, save_model_epochs, validation_epoch, batchsize, logfilename, load_checkpoint= None): 12 | 13 | #logging 14 | logging.basicConfig(level=logging.DEBUG, filename='./logfiles/'+logfilename, filemode="w+", 15 | format="%(message)s") 16 | trainloader = DataListLoader(data_list_train, batch_size=batchsize, shuffle=True) 17 | testloader = DataListLoader(data_list_test, batch_size=batchsize, shuffle=True) 18 | device = torch.device('cuda') 19 | complete_net = completeNet() 20 | complete_net = DataParallel(complete_net) 21 | complete_net = complete_net.to(device) 22 | 23 | #train parameters 24 | weights = [10, 1] 25 | optimizer = torch.optim.Adam(complete_net.parameters(), lr=0.001, weight_decay=0.001) 26 | 27 | #resume training 28 | initial_epoch=1 29 | if load_checkpoint!=None: 30 | checkpoint = torch.load(load_checkpoint) 31 | complete_net.load_state_dict(checkpoint['model_state_dict'], strict=False) 32 | optimizer.load_state_dict(checkpoint['optimizer_state_dict']) 33 | initial_epoch = checkpoint['epoch']+1 34 | loss = checkpoint['loss'] 35 | 36 | complete_net.train() 37 | 38 | for epoch in range(initial_epoch, epochs+1): 39 | epoch_total=0 40 | epoch_total_ones= 0 41 | epoch_total_zeros= 0 42 | epoch_correct=0 43 | epoch_correct_ones= 0 44 | epoch_correct_zeros= 0 45 | running_loss= 0 46 | batches_num=0 47 | for batch in trainloader: 48 | batch_total=0 49 | batch_total_ones= 0 50 | batch_total_zeros= 0 51 | batch_correct= 0 52 | batch_correct_ones= 0 53 | batch_correct_zeros= 0 54 | batches_num+=1 55 | # Forward-Backpropagation 56 | output, output2, ground_truth, ground_truth2, det_num, tracklet_num= complete_net(batch) 57 | optimizer.zero_grad() 58 | loss = weighted_binary_cross_entropy(output, ground_truth, weights) 59 | loss.backward() 60 | optimizer.step() 61 | ##Accuracy 62 | if epoch%acc_epoch==0 and epoch!=0: 63 | # Hungarian method, clean up 64 | 
cleaned_output= hungarian(output2, ground_truth2, det_num, tracklet_num) 65 | batch_total += cleaned_output.size(0) 66 | ones= torch.tensor([1 for x in cleaned_output]).to(device) 67 | zeros = torch.tensor([0 for x in cleaned_output]).to(device) 68 | batch_total_ones += (cleaned_output == ones).sum().item() 69 | batch_total_zeros += (cleaned_output == zeros).sum().item() 70 | batch_correct += (cleaned_output == ground_truth2).sum().item() 71 | temp1 = (cleaned_output == ground_truth2) 72 | temp2 = (cleaned_output == ones) 73 | batch_correct_ones += (temp1 & temp2).sum().item() 74 | temp3 = (cleaned_output == zeros) 75 | batch_correct_zeros += (temp1 & temp3).sum().item() 76 | epoch_total += batch_total 77 | epoch_total_ones += batch_total_ones 78 | epoch_total_zeros += batch_total_zeros 79 | epoch_correct += batch_correct 80 | epoch_correct_ones += batch_correct_ones 81 | epoch_correct_zeros += batch_correct_zeros 82 | if loss.item()!=loss.item(): 83 | print("Error") 84 | break 85 | if batch_total_ones != 0 and batch_total_zeros != 0 and epoch%acc_epoch==0 and epoch!=0: 86 | print('Epoch: [%d] | Batch: [%d] | Training_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 87 | (epoch, batches_num, loss.item(), 100 * batch_correct / batch_total, 100 * batch_correct_ones / batch_total_ones, 88 | 100 * batch_correct_zeros / batch_total_zeros)) 89 | logging.info('Epoch: [%d] | Batch: [%d] | Training_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 90 | (epoch, batches_num, loss.item(), 100 * batch_correct / batch_total, 100 * batch_correct_ones / batch_total_ones, 91 | 100 * batch_correct_zeros / batch_total_zeros)) 92 | else: 93 | print('Epoch: [%d] | Batch: [%d] | Training_Loss: %.3f |' % 94 | (epoch, batches_num, loss.item())) 95 | logging.info('Epoch: [%d] | Batch: [%d] | Training_Loss: %.3f |' % 96 | (epoch, batches_num, loss.item())) 97 | running_loss += loss.item() 98 | if loss.item()!=loss.item(): 99 | print("Error") 100 | break 101 | if epoch_total_ones!=0 and epoch_total_zeros!=0 and epoch%acc_epoch==0 and epoch!=0: 102 | print('Epoch: [%d] | Training_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 103 | (epoch, running_loss / batches_num, 100 * epoch_correct / epoch_total, 100 * \ 104 | epoch_correct_ones / epoch_total_ones, 100 * epoch_correct_zeros / epoch_total_zeros)) 105 | logging.info('Epoch: [%d] | Training_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 106 | (epoch, running_loss / batches_num, 100 * epoch_correct / epoch_total, 100 * \ 107 | epoch_correct_ones / epoch_total_ones, 100 * epoch_correct_zeros / epoch_total_zeros)) 108 | else: 109 | print('Epoch: [%d] | Training_Loss: %.3f |' % 110 | (epoch, running_loss / batches_num)) 111 | logging.info('Epoch: [%d] | Training_Loss: %.3f |' % 112 | (epoch, running_loss / batches_num)) 113 | # save model 114 | if epoch%save_model_epochs==0 and epoch!=0: 115 | torch.save({ 116 | 'epoch': epoch, 117 | 'model_state_dict': complete_net.state_dict(), 118 | 'optimizer_state_dict': optimizer.state_dict(), 119 | 'loss': running_loss, 120 | }, './models/epoch_'+str(epoch)+'.pth') 121 | 122 | #validation 123 | if epoch%validation_epoch==0 and epoch!=0: 124 | with torch.no_grad(): 125 | epoch_total=0 126 | epoch_total_ones= 0 127 | epoch_total_zeros= 0 128 | epoch_correct=0 129 | epoch_correct_ones= 0 130 | epoch_correct_zeros= 0 131 | running_loss= 0 132 | batches_num=0 133 | for batch in testloader: 134 
| batch_total=0 135 | batch_total_ones= 0 136 | batch_total_zeros= 0 137 | batch_correct= 0 138 | batch_correct_ones= 0 139 | batch_correct_zeros= 0 140 | batches_num+=1 141 | output, output2, ground_truth, ground_truth2, det_num, tracklet_num = complete_net(batch) 142 | loss = weighted_binary_cross_entropy(output, ground_truth, weights) 143 | running_loss += loss.item() 144 | ##Accuracy 145 | if epoch%acc_epoch2==0 and epoch!=0: 146 | # Hungarian method, clean up 147 | cleaned_output= hungarian(output2, ground_truth2, det_num, tracklet_num) 148 | batch_total += cleaned_output.size(0) 149 | ones= torch.tensor([1 for x in cleaned_output]).to(device) 150 | zeros = torch.tensor([0 for x in cleaned_output]).to(device) 151 | batch_total_ones += (cleaned_output == ones).sum().item() 152 | batch_total_zeros += (cleaned_output == zeros).sum().item() 153 | batch_correct += (cleaned_output == ground_truth2).sum().item() 154 | temp1 = (cleaned_output == ground_truth2) 155 | temp2 = (cleaned_output == ones) 156 | batch_correct_ones += (temp1 & temp2).sum().item() 157 | temp3 = (cleaned_output == zeros) 158 | batch_correct_zeros += (temp1 & temp3).sum().item() 159 | epoch_total += batch_total 160 | epoch_total_ones += batch_total_ones 161 | epoch_total_zeros += batch_total_zeros 162 | epoch_correct += batch_correct 163 | epoch_correct_ones += batch_correct_ones 164 | epoch_correct_zeros += batch_correct_zeros 165 | if epoch_total_ones!=0 and epoch_total_zeros!=0 and epoch%acc_epoch2==0 and epoch!=0: 166 | print('Epoch: [%d] | Validation_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 167 | (epoch, running_loss / batches_num, 100 * epoch_correct / epoch_total, 100 * \ 168 | epoch_correct_ones / epoch_total_ones, 100 * epoch_correct_zeros / epoch_total_zeros)) 169 | logging.info('Epoch: [%d] | Validation_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 170 | (epoch, running_loss / batches_num, 100 * epoch_correct / epoch_total, 100 * \ 171 | epoch_correct_ones / epoch_total_ones, 100 * epoch_correct_zeros / epoch_total_zeros)) 172 | else: 173 | print('Epoch: [%d] | Validation_Loss: %.3f |' % 174 | (epoch, running_loss / batches_num)) 175 | logging.info('Epoch: [%d] | Validation_Loss: %.3f |' % 176 | (epoch, running_loss / batches_num)) 177 | 178 | 179 | 180 | -------------------------------------------------------------------------------- /network/affinity.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import Sequential as Seq, Linear as Lin, ReLU 4 | 5 | class affinityNet(torch.nn.Module): 6 | def __init__(self): 7 | super(affinityNet, self).__init__() 8 | self.mlp = nn.Sequential( 9 | nn.Linear(2, 1), #loads features from two nodes and features of their edge (edge of interest) 10 | nn.ReLU() 11 | ) 12 | 13 | def forward(self, inputs): 14 | # source, target: [E, F_x], where E is the number of edges. 15 | # edge_attr: [E, F_e] 16 | # u: [B, F_u], where B is the number of graphs. 17 | # batch: [E] with max entry B - 1. 
18 | # out = torch.cat([x1, x2, x3], 0) 19 | return self.mlp(inputs) 20 | -------------------------------------------------------------------------------- /network/affinity_appearance.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import Sequential as Seq, Linear as Lin, ReLU 4 | 5 | class affinity_appearanceNet(torch.nn.Module): 6 | def __init__(self): 7 | super(affinity_appearanceNet, self).__init__() 8 | self.mlp = nn.Sequential( 9 | nn.Linear(1024*2, 1), #loads features from two nodes and features of their edge (edge of interest) 10 | nn.ReLU() 11 | ) 12 | 13 | def forward(self, inputs): 14 | # source, target: [E, F_x], where E is the number of edges. 15 | # edge_attr: [E, F_e] 16 | # u: [B, F_u], where B is the number of graphs. 17 | # batch: [E] with max entry B - 1. 18 | # out = torch.cat([x1, x2, x3], 0) 19 | return self.mlp(inputs) 20 | -------------------------------------------------------------------------------- /network/affinity_final.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import Sequential as Seq, Linear as Lin, ReLU 4 | 5 | class affinity_finalNet(torch.nn.Module): 6 | def __init__(self): 7 | super(affinity_finalNet, self).__init__() 8 | self.mlp = nn.Sequential( 9 | nn.Linear(2, 1), #loads features from two nodes and features of their edge (edge of interest) 10 | nn.ReLU() 11 | ) 12 | 13 | def forward(self, inputs): 14 | # source, target: [E, F_x], where E is the number of edges. 15 | # edge_attr: [E, F_e] 16 | # u: [B, F_u], where B is the number of graphs. 17 | # batch: [E] with max entry B - 1. 18 | # out = torch.cat([x1, x2, x3], 0) 19 | out = self.mlp(inputs) 20 | return out 21 | -------------------------------------------------------------------------------- /network/affinity_geom.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import Sequential as Seq, Linear as Lin, ReLU 4 | 5 | class affinity_geomNet(torch.nn.Module): 6 | def __init__(self): 7 | super(affinity_geomNet, self).__init__() 8 | self.mlp = nn.Sequential( 9 | nn.Linear(8, 1), #loads features from two nodes and features of their edge (edge of interest) 10 | nn.ReLU() 11 | ) 12 | 13 | def forward(self, inputs): 14 | # source, target: [E, F_x], where E is the number of edges. 15 | # edge_attr: [E, F_e] 16 | # u: [B, F_u], where B is the number of graphs. 17 | # batch: [E] with max entry B - 1. 
18 | # out = torch.cat([x1, x2, x3], 0) 19 | return self.mlp(inputs) 20 | -------------------------------------------------------------------------------- /network/complete_net.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import torch.nn as nn 3 | import os 4 | import numpy as np 5 | from network.optimizationGNN import * 6 | from network.encoderCNN import * 7 | from network.detections import * 8 | from network.affinity import * 9 | from network.affinity_appearance import * 10 | from network.affinity_geom import * 11 | from torch.nn.parameter import Parameter 12 | from utils import * 13 | import torch 14 | from torch_sparse import transpose 15 | from network.affinity_final import * 16 | 17 | class completeNet(nn.Module): 18 | def __init__(self): 19 | super(completeNet, self).__init__() 20 | 21 | self.cnn = EncoderCNN() 22 | self.affinity_net = affinityNet() 23 | self.affinity_appearance_net= affinity_appearanceNet() 24 | self.affinity_geom_net= affinity_geomNet() 25 | self.affinity_final_net= affinity_finalNet() 26 | self.optim_net = optimNet() 27 | self.cos = nn.CosineSimilarity(dim=0, eps=1e-6) 28 | 29 | def forward(self, data): 30 | # print('Inside Model: num graphs: {}, device: {}'.format(data.num_graphs, data.batch.device)) 31 | # device = torch.device('cuda') 32 | x, coords_original, edge_index, ground_truth, coords, edges_number_list, frame, track_num, detections_num= \ 33 | data.x, data.coords_original, data.edge_index, data.ground_truth, data.coords, data.edges_number, data.frame, data.track_num, data.det_num 34 | slack= torch.Tensor([-0.2]).float().cuda() 35 | lam= torch.Tensor([5]).float().cuda() 36 | #Pass through GNN 37 | node_embedding= self.cnn(x) 38 | edge_embedding = [] 39 | edge_mlp= [] 40 | for i in range(len(edge_index[0])): 41 | #CNN features 42 | x1 = self.affinity_appearance_net(torch.cat((node_embedding[edge_index[0][i]], node_embedding[edge_index[1][i]]), 0)) 43 | #geometry 44 | x2 = self.affinity_geom_net(torch.cat((coords[edge_index[0][i]], coords[edge_index[1][i]]), 0)) 45 | #iou 46 | iou= box_iou_calc(coords_original[edge_index[0][i]], coords_original[edge_index[1][i]]) 47 | # x2= iou 48 | edge_mlp.append(iou) 49 | #pass through mlp 50 | inputs = torch.cat((x1.reshape(1), x2.reshape(1)), 0) 51 | edge_embedding.append(self.affinity_net(inputs)) 52 | # print(edge_embedding) 53 | edge_embedding= torch.stack(edge_embedding) 54 | output = self.optim_net(node_embedding, edge_embedding, edge_index, coords, frame) 55 | output_temp= [] 56 | for i in range(len(edge_index[0])): 57 | if edge_index[0][i]> $SINGULARITY_ENVIRONMENT 19 | export PATH=/opt/pyenv/versions/3.7.2/bin/:$PATH 20 | 21 | pip install torch==1.3.0 22 | 23 | mkdir -p $SINGULARITY_ROOTFS/tmp/sing_build_cuda 24 | cd $SINGULARITY_ROOTFS/tmp/sing_build_cuda 25 | 26 | export TORCH_CUDA_ARCH_LIST="5.0 6.1" 27 | 28 | git clone https://github.com/rusty1s/pytorch_scatter.git && \ 29 | cd ./pytorch_scatter && \ 30 | git checkout 1.4.0 && \ 31 | python3 ./setup.py install && \ 32 | cd .. 33 | 34 | git clone https://github.com/rusty1s/pytorch_sparse.git && \ 35 | cd ./pytorch_sparse && \ 36 | git checkout 0.4.3 && \ 37 | python3 ./setup.py install && \ 38 | cd .. 39 | 40 | git clone https://github.com/rusty1s/pytorch_cluster.git && \ 41 | cd ./pytorch_cluster && \ 42 | git checkout 1.4.5 && \ 43 | python3 ./setup.py install && \ 44 | cd .. 
45 | 46 | git clone https://github.com/rusty1s/pytorch_geometric.git && \ 47 | cd ./pytorch_geometric && \ 48 | git checkout 1.3.2 && \ 49 | python3 ./setup.py install && \ 50 | cd .. 51 | 52 | cd $CURDIR 53 | rm -rf $SINGULARITY_ROOTFS/tmp/sing_build_cuda 54 | 55 | pip install Pillow 56 | pip install opencv-python 57 | pip install torchvision==0.4.1 58 | pip install matplotlib 59 | pip install tensorflow 60 | -------------------------------------------------------------------------------- /test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | python ./tracking.py --type test -------------------------------------------------------------------------------- /tracking.py: -------------------------------------------------------------------------------- 1 | from data_preparation_train import * 2 | from model_training import * 3 | from model_testing import * 4 | from network.complete_net import * 5 | from utils import * 6 | import os 7 | import pickle 8 | import argparse 9 | import pprint 10 | 11 | def get_data(info, set): 12 | if set=='training': 13 | detections = np.loadtxt("/data/MOT17/train/MOT17-{}-{}/gt/gt.txt".format(info[0], info[1]), delimiter=',') 14 | images_path = "/data/MOT17/train/MOT17-{}-{}/img1".format(info[0], info[1]) 15 | elif set=='testing': 16 | detections = np.loadtxt("/data/MOT17/test/MOT17-{}-{}/det/det.txt".format(info[0], info[1]), delimiter=',') 17 | images_path = "/data/MOT17/test/MOT17-{}-{}/img1".format(info[0], info[1]) 18 | return detections, images_path 19 | 20 | if __name__ == "__main__": 21 | parser = argparse.ArgumentParser() 22 | parser.add_argument('--type', type=str) 23 | args = parser.parse_args() 24 | 25 | if args.type == 'train': 26 | 27 | # initialize training settings 28 | batchsize = 2 # specify how many graphs to use at each batch 29 | epochs = 4 # at how many epochs to stop 30 | load_checkpoint= None#'./models/epoch_2.pth' # None or specify .pth file to continue training 31 | validation_epochs= 4 # epoch interval for validation 32 | acc_epoch = epochs # at which epoch to calculate training accuracy 33 | acc_epoch2 = epochs # at which epoch to calculate validation accuracy 34 | save_model_epochs = 1 # epoch interval to save the model 35 | most_recent_frame_back = 30 # for more challenging training, specify most recent frame of tracklets to match new detections with, a max value is specified here 36 | # so later it will be randomly between 1 and most_recent_frame_back, min value=1 -> take only previous frame 37 | frames_look_back = 30 # a second limit that is used to use tracklets for matching in frames between frames_look_back and most_recent_frame_back, 38 | # min value=1 -> take only previous frame 39 | # example: if current frame= 60, tracklets used randomly from t1= 30 to 59, also tracklets used till t1-30 40 | graph_jump = 5 # how many frames to move the current frame forward, min value=1 -> move to the next frame 41 | distance_limit = 250 # objects within that pixel distance can be associated 42 | 43 | # MOT17 specific settings 44 | train_seq = ["02", "04", "05", "09", "10", "11", "13"] # names of videos 45 | valid_seq = ["02", "04", "05", "09", "10", "11", "13"] # names of videos 46 | fps= [30,30,14,30,30,30,25] # specify fps of each video 47 | current_frame_train = 2 # use as first current frame the second frame of each video 48 | total_frames = [None, None, None, None, None, None, None] # total frames of each video loaded, None for max frames 49 | current_frame_valid = 
[500,900,780,450,550,800,650] # up to which frame of each video to use for training 50 | detector = ["FRCNN"] # specify a detector just to direct to one of the MOT folders 51 | 52 | # Option 1: If graph data not built, loop through sequence and get training data 53 | print('\n') 54 | print('Training Data') 55 | data_list_train = [] 56 | for s in range(len(train_seq)): 57 | for d in range(len(detector)): 58 | print('Sequence: ' + train_seq[s]) 59 | detections, images_path = get_data([train_seq[s], detector[d]], "training") 60 | list = data_prep_train(train_seq[s], detections, images_path, frames_look_back, total_frames[s], most_recent_frame_back, 61 | graph_jump, current_frame_train, current_frame_valid[s], distance_limit, fps[s], "training") 62 | data_list_train = data_list_train + list 63 | with open('./data/data_train.data', 'wb') as filehandle: 64 | pickle.dump(data_list_train, filehandle) 65 | print("Saved to pickle file \n") 66 | print('Validation Data') 67 | data_list_valid = [] 68 | for s in range(len(valid_seq)): 69 | for d in range(len(detector)): 70 | print('Sequence: ' + valid_seq[s]) 71 | detections, images_path = get_data([valid_seq[s], detector[d]], "training") 72 | list = data_prep_train(valid_seq[s], detections, images_path, frames_look_back, total_frames[s], most_recent_frame_back, 73 | graph_jump, current_frame_train, current_frame_valid[s], distance_limit, fps[s], "validation") 74 | data_list_valid = data_list_valid + list 75 | with open('./data/data_valid.data', 'wb') as filehandle: 76 | pickle.dump(data_list_valid, filehandle) 77 | print("Saved to pickle file \n") 78 | 79 | # Option 2: If data graph built, just import files 80 | # with open('./data/data_train.data', 'rb') as filehandle: 81 | # data_list_train = pickle.load(filehandle) 82 | # print("Loaded training pickle files") 83 | # with open('./data/data_valid.data', 'rb') as filehandle: 84 | # data_list_valid = pickle.load(filehandle) 85 | # print("Loaded validation pickle files") 86 | #Load and train 87 | model_training(data_list_train, data_list_valid, epochs, acc_epoch, acc_epoch2, save_model_epochs, validation_epochs, batchsize, "logfile", load_checkpoint) 88 | 89 | elif args.type == 'test': 90 | 91 | frames_look_back = 30 # how many previous frames to look back for tracklets to be matched 92 | match_thres= 0.25 # the matching confidence threshold 93 | det_conf_thres= 0.0 # the lowest detection confidence 94 | distance_limit = 200 # objects within that pixel distance can be associated 95 | min_height= 10 #minimum height of detections 96 | fp_look_back= 15 # for false positives 97 | fp_recent_frame_limit= 10 # for false positives 98 | fp_min_times_seen= 3 # for false positives, minimum times seen for the last fp_look_back frames 99 | #with the first instance seen before fp_recent_frame_limit otherwise considered false positive 100 | 101 | # Select sequence 102 | seq = ["01","03","06","07","08","12","14"] 103 | fps= [30,30,14,30,30,30,25] 104 | # detector = ["DPM", "FRCNN", "SDP"] 105 | detector = ["FRCNN"] 106 | 107 | #load model 108 | model = completeNet() 109 | device = torch.device('cuda') 110 | model = model.to(device) 111 | model = DataParallel(model) 112 | model.load_state_dict(torch.load('./models/epoch_11.pth')['model_state_dict']) 113 | model.eval() 114 | 115 | # Load data and test, write output to txt and video 116 | data_list = [] 117 | for s in range(len(seq)): 118 | for d in range(len(detector)): 119 | detections, images_path = get_data([seq[s], detector[d]], "testing") 120 | total_frames= None # 
None for max frames 121 | if total_frames == None: 122 | total_frames = np.max(detections[:, 0]) # change only if you want a subset of the total frames 123 | print('Sequence: '+ seq[s]) 124 | print('Total frames: ' + str(total_frames)) 125 | detections = sorted(detections, key=lambda x: x[0]) 126 | tracking_output= model_testing(seq[s], detections, images_path, total_frames, frames_look_back, model, distance_limit, fp_min_times_seen, match_thres, det_conf_thres, fp_look_back, fp_recent_frame_limit, min_height, fps[s]) 127 | #write output 128 | outputFile = './output/MOT17-{}-{}.avi'.format(seq[s],detector[d]) 129 | #get dimensions of images 130 | image_name = os.path.join(images_path, "{0:0=6d}".format(1) + ".jpg") 131 | image = cv2.imread(image_name, cv2.IMREAD_COLOR) 132 | vid_writer = cv2.VideoWriter(outputFile, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), 15, 133 | (image.shape[1], image.shape[0])) 134 | with open('./output/MOT17-{}-{}.txt'.format(seq[s],detector[d]), 'w') as f: 135 | for frame in range(1,int(total_frames)+1): 136 | image_name = os.path.join(images_path, "{0:0=6d}".format(int(frame)) + ".jpg") 137 | image = cv2.imread(image_name, cv2.IMREAD_COLOR) 138 | for item in tracking_output: 139 | if item[0]==frame: 140 | #write tracking output to txt 141 | f.write('%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n' % (item[0], item[1], item[2], item[3], item[4], item[5], -1, -1, -1, -1)) 142 | #write tracking output to frame 143 | xmin = int(item[2]) 144 | ymin = int(item[3]) 145 | xmax = int(item[2] + item[4]) 146 | ymax = int(item[3] + item[5]) 147 | display_text = '%d' % (item[1]) 148 | color_rectangle = (0, 0, 255) 149 | cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color=color_rectangle, thickness=2) 150 | font = cv2.FONT_HERSHEY_PLAIN 151 | color_text = (255, 255, 255) 152 | cv2.putText(image, display_text, 153 | (xmin + int((xmax - xmin) / 2), ymin + int((ymax - ymin) / 2)), fontFace=font, 154 | fontScale=1.3, color=color_text, 155 | thickness=2) 156 | elif item[0]>frame: 157 | break 158 | for item in detections: 159 | if item[0] == frame and item[2]>0 and item[3]>0 and item[4]>0 and item[5]>0: 160 | xmin = int(item[2]) 161 | ymin = int(item[3]) 162 | xmax = int(item[2] + item[4]) 163 | ymax = int(item[3] + item[5]) 164 | color_rectangle = (255, 255, 255) 165 | cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color=color_rectangle, thickness=1) 166 | elif item[0]>frame: 167 | break 168 | xmin = 0 169 | ymin = 0 170 | xmax = 1920 171 | ymax = 1080 172 | display_text = 'Frame %d' % (frame) 173 | font = cv2.FONT_HERSHEY_PLAIN 174 | color_text = (0, 0, 255) 175 | cv2.putText(image, display_text, 176 | (50, 50), fontFace=font, 177 | fontScale=1.3, color=color_text, 178 | thickness=2) 179 | vid_writer.write(image) 180 | cv2.destroyAllWindows() 181 | vid_writer.release() 182 | 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | -------------------------------------------------------------------------------- /train.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | python ./tracking.py --type train 4 | 5 | 6 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from scipy.optimize import linear_sum_assignment 4 | 5 | def fill_detections(tracking_output): 6 | final_output= [] 7 | different_ids= [] 8 | for i in tracking_output: 
9 | if i[1] not in different_ids: 10 | different_ids.append(int(i[1])) 11 | #for each ID 12 | for i in different_ids: 13 | #get output for each ID 14 | output_temp= [] 15 | for j in tracking_output: 16 | if j[1]==i: 17 | output_temp.append(j) 18 | filled_detections= [] 19 | #for every two frames, fill in detections 20 | for j in range(len(output_temp)-1): 21 | diff_frame= output_temp[j+1][0]-output_temp[j][0] 22 | diff_x= output_temp[j][2]-output_temp[j+1][2] 23 | diff_y= output_temp[j][3]-output_temp[j+1][3] 24 | diff_w= output_temp[j][4]-output_temp[j+1][4] 25 | diff_h= output_temp[j][5]-output_temp[j+1][5] 26 | boxes1= torch.tensor([output_temp[j][2],output_temp[j][3],output_temp[j][2]+output_temp[j][4],output_temp[j][3]+output_temp[j][5]]) 27 | boxes2= torch.tensor([output_temp[j+1][2],output_temp[j+1][3],output_temp[j+1][2]+output_temp[j+1][4],output_temp[j+1][3]+output_temp[j+1][5]]) 28 | IOU= box_iou_calc(boxes1, boxes2) 29 | if abs(int(diff_frame))>1 and abs(int(diff_frame))<150 and IOU.item()>0.1: 30 | for sec in range(1,abs(int(diff_frame))): 31 | div= abs(diff_frame)/sec 32 | filled_detections.append([output_temp[j][0]+sec,i,int(output_temp[j][2]-diff_x/div),\ 33 | int(output_temp[j][3]-diff_y/div),int(output_temp[j][4]-diff_w/div),int(output_temp[j][5]-diff_h/div)]) 34 | for j in output_temp: 35 | final_output.append(j) 36 | for j in filled_detections: 37 | final_output.append(j) 38 | final_output = sorted(final_output, key=lambda x: x[0]) 39 | return final_output 40 | 41 | 42 | def weighted_binary_cross_entropy(output, target, weights=None): 43 | loss = - weights[0] * (target * torch.log(output)) - \ 44 | weights[1] * ((1 - target) * torch.log(1 - output)) 45 | return torch.mean(loss) 46 | 47 | def indices_first(a, b, value): 48 | out = [k for k, x in enumerate(a) if x == value]# and x <= b[k]] 49 | if out: 50 | return out 51 | 52 | def indices_second(a, b, value): 53 | out = [k for k, x in enumerate(b) if x == value]# and x >= a[k]] 54 | if out: 55 | return out 56 | 57 | def sinkhorn(matrix): 58 | row_len = len(matrix) 59 | col_len = len(matrix[0]) 60 | desired_row_sums = torch.ones((1, row_len), requires_grad=False).cuda() 61 | desired_col_sums = torch.ones((1, col_len), requires_grad=False).cuda() 62 | desired_row_sums[:, -1] = col_len-1 63 | desired_col_sums[:, -1] = row_len-1 64 | for _ in range(8): 65 | #row normalization 66 | actual_row_sum = torch.sum(matrix, axis=1) 67 | for i, row in enumerate(matrix): 68 | for j, element in enumerate(row): 69 | matrix[i,j]= element*desired_row_sums[0,i]/(actual_row_sum[i]) 70 | #column normalization 71 | actual_col_sum = torch.sum(matrix, axis=0) 72 | for i, row in enumerate(matrix): 73 | for j, element in enumerate(row): 74 | matrix[i,j]= element*desired_col_sums[0,j]/(actual_col_sum[j]) 75 | return matrix 76 | 77 | 78 | def hungarian(output, ground_truth, det_num, tracklet_num): 79 | cleaned_output = [] 80 | num = 0 81 | eps = 0.0001 # for numerical stability 82 | for i, j in enumerate(tracklet_num): 83 | matrix = [] 84 | for k in range(j): 85 | matrix.append([]) 86 | for l in range(det_num[i]): 87 | matrix[k].append(1 - output[num].cpu().detach().numpy()) 88 | num += 1 89 | matrix = np.array(matrix) 90 | # padding 91 | (a, b) = matrix.shape 92 | if a > b: 93 | padding = ((0, 0), (0, a - b)) 94 | else: 95 | padding = ((0, b - a), (0, 0)) 96 | matrix = np.pad(matrix, padding, mode='constant', constant_values=eps) 97 | # hungarian 98 | row_ind, col_ind = linear_sum_assignment(matrix) 99 | #take out those that are all 1, max cost, 
either hungarian will assign 100 | remove_ind= [] 101 | cnt= 0 102 | for i, row in enumerate(matrix): 103 | for j, element in enumerate(row): 104 | if element==1: 105 | remove_ind.append(cnt) 106 | cnt += 1 107 | cnt= 0 108 | for i, row in enumerate(matrix): 109 | for j, element in enumerate(row): 110 | if i < a and j < b: 111 | p1 = row_ind.tolist().index(i) 112 | p2 = col_ind.tolist().index(j) 113 | # print(p2) 114 | if p1 == p2 and cnt not in remove_ind: 115 | cleaned_output.append(torch.tensor(1, dtype=float).cuda()) 116 | else: 117 | cleaned_output.append(torch.tensor(0, dtype=float).cuda()) 118 | cnt += 1 119 | cleaned_output = torch.stack(cleaned_output) 120 | return cleaned_output 121 | 122 | def box_area(boxes): 123 | """ 124 | Computes the area of a set of bounding boxes, which are specified by its 125 | (x1, y1, x2, y2) coordinates. 126 | Arguments: 127 | boxes (Tensor[N, 4]): boxes for which the area will be computed. They 128 | are expected to be in (x1, y1, x2, y2) format 129 | Returns: 130 | area (Tensor[N]): area for each box 131 | """ 132 | return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) 133 | 134 | def box_iou_calc(boxes1, boxes2): 135 | # https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py 136 | """ 137 | Return intersection-over-union (Jaccard index) of boxes. 138 | Both sets of boxes are expected to be in (x1, y1, x2, y2) format. 139 | Arguments: 140 | boxes1 (Tensor[N, 4]) 141 | boxes2 (Tensor[M, 4]) 142 | Returns: 143 | iou (Tensor[N, M]): the NxM matrix containing the pairwise 144 | IoU values for every element in boxes1 and boxes2 145 | """ 146 | boxes1= boxes1.reshape(1,4) 147 | boxes2 = boxes2.reshape(1, 4) 148 | 149 | area1 = box_area(boxes1) 150 | area2 = box_area(boxes2) 151 | 152 | lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2] 153 | rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2] 154 | 155 | wh = (rb - lt).clamp(min=0) # [N,M,2] 156 | inter = wh[:, :, 0] * wh[:, :, 1] # [N,M] 157 | 158 | iou = inter / (area1[:, None] + area2 - inter) 159 | return iou 160 | 161 | class UnNormalize(object): 162 | def __init__(self, mean, std): 163 | self.mean = mean 164 | self.std = std 165 | 166 | def __call__(self, tensor): 167 | """ 168 | Args: 169 | tensor (Tensor): Tensor image of size (C, H, W) to be normalized. 170 | Returns: 171 | Tensor: Normalized image. 172 | """ 173 | for t, m, s in zip(tensor, self.mean, self.std): 174 | t.mul_(s).add_(m) 175 | # The normalize code -> t.sub_(m).div_(s) 176 | return tensor --------------------------------------------------------------------------------
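The sinkhorn() helper in utils.py normalizes the affinity matrix with explicit per-element Python loops. For reference, here is a minimal vectorized sketch of the same row/column balancing, assuming a 2-D torch tensor whose last row and column act as the slack entries; the name sinkhorn_vectorized is illustrative and not part of the repo:

```
import torch

def sinkhorn_vectorized(matrix, iterations=8):
    # Same targets as utils.sinkhorn: every row/column should sum to 1,
    # except the slack (last) row/column, which absorbs the remaining mass.
    rows, cols = matrix.shape
    row_targets = torch.ones(rows, device=matrix.device)
    col_targets = torch.ones(cols, device=matrix.device)
    row_targets[-1] = cols - 1
    col_targets[-1] = rows - 1
    for _ in range(iterations):
        matrix = matrix * (row_targets / matrix.sum(dim=1)).unsqueeze(1)  # row normalization
        matrix = matrix * (col_targets / matrix.sum(dim=0)).unsqueeze(0)  # column normalization
    return matrix
```

Both snippets perform the alternating row/column scaling used for the Sinkhorn normalization step; the vectorized form only removes the nested Python loops.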