├── LICENSE
├── README.md
├── accuracy.py
├── build_graph.py
├── create_folders.sh
├── data_preparation_train.py
├── model_testing.py
├── model_training.py
├── network
│   ├── affinity.py
│   ├── affinity_appearance.py
│   ├── affinity_final.py
│   ├── affinity_geom.py
│   ├── complete_net.py
│   ├── detections.py
│   ├── encoderCNN.py
│   └── optimizationGNN.py
├── singularity
├── test.sh
├── tracking.py
├── train.sh
└── utils.py

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 Ioannis Papakis

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization

This repository is the official code implementation of *GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization*, available on [IEEE Xplore](https://ieeexplore.ieee.org/document/9564655) and on [arXiv](https://arxiv.org/abs/2010.00067). A link to a new traffic vehicle monitoring dataset named the "VA Beach Traffic Dataset" will be provided here.

## Citing:

If you find this paper or code useful, please cite the IEEE version:

```
@inproceedings{papakis2021graph,
  title={A Graph Convolutional Neural Network Based Approach for Traffic Monitoring Using Augmented Detections with Optical Flow},
  author={Papakis, Ioannis and Sarkar, Abhijit and Karpatne, Anuj},
  booktitle={2021 IEEE International Intelligent Transportation Systems Conference (ITSC)},
  pages={2980--2986},
  year={2021},
  organization={IEEE}
}
```

or the arXiv version:

```
@article{papakis2020gcnnmatch,
  title={GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization},
  author={Papakis, Ioannis and Sarkar, Abhijit and Karpatne, Anuj},
  journal={arXiv preprint arXiv:2010.00067},
  year={2020}
}
```

## Installing & Preparation:

* Install Singularity following the instructions on its [website](https://sylabs.io/guides/3.0/user-guide/quick_start.html#quick-installation-steps).

* Clone this repository and cd into it.

* Run "sudo singularity build geometric.sif singularity".
  Follow the instructions from [pytorch-geometric](https://github.com/rusty1s/pytorch_geometric/tree/master/docker) to change settings if needed for your system.

* Download the MOT17 dataset from the [MOT website](https://motchallenge.net/data/MOT17/) and place it in a folder /MOT_dataset.

* Run "mkdir overlay". The overlay will allow you to install additional packages if needed in the future.

* Run "sudo singularity run --nv -B /MOT_dataset/:/data --overlay overlay/ geometric.sif".

* Run "./create_folders.sh".

## Training:

* Command: ./train.sh

* Result: Training will start and save the trained models in /models. Settings can be changed in tracking.py.

## Testing:

* Specify which trained model to use in tracking.py. A trained model can be found [here](https://drive.google.com/drive/folders/1b0ZF7WAQFIXv6xydyU3OGGBW-7EhegSv?usp=sharing).

* Command: ./test.sh

* Result: Testing will start and produce txt files and videos saved in /output. Settings can be changed in tracking.py.

For benchmark evaluation, the detection files pre-processed with Tracktor from [this repo](https://github.com/dvl-tum/mot_neural_solver) were used.

--------------------------------------------------------------------------------
/accuracy.py:
--------------------------------------------------------------------------------
import torch
from utils import *
import numpy as np

def accuracy(k,j,edges_number_list,output_final,ground_truth,batch,start,device):
    num_of_edges= edges_number_list[int(j.item())]
    output3= [0] * num_of_edges
    output_sliced= output_final[start:start+num_of_edges].detach().clone()
    ground_truth_sliced= ground_truth[start:start+num_of_edges].to(torch.int8).detach().clone()
    edges_list_reduced= [[],[]]
    output_reduced= []
    ground_truth_reduced= []
    # print(j.item())
    for i in range(len(batch[k].edge_index[0])):
        edge1= batch[k].edge_index[0][i]
        edge2= batch[k].edge_index[1][i]
        if edge1<=edge2:
            edges_list_reduced[0].append(edge1.item())
            edges_list_reduced[1].append(edge2.item())
            ground_truth_reduced.append(ground_truth_sliced[i])
            if edge11:
                if len(out1)>max:
                    max= len(out1)
                constraints.append(out1)
            if out2 and len(np.array(out2))>1:
                if len(out2)>max:
                    max= len(out2)
                constraints.append(out2)
    # Get most probable edges as 1 and the other as 0
    max=0
    zero_indeces= []
    one_indeces= []
    ranking= True
    # print(optim_graph.out.size())
    while ranking==True:
        for i, edge in enumerate(output_reduced):
            if (edge>max) and (i not in zero_indeces) and (i not in one_indeces):
                max=edge
                index= i
        if max==0:
            ranking= False

        else:
            one_indeces.append(index)
            for constraint in constraints:
                if index in constraint:
                    for constr in constraint:
                        if constr!=index and constr!=-1 and constr not in zero_indeces:
                            zero_indeces.append(constr)
            max=0
    processed_output= []
    for i, edge in enumerate(output_reduced):
        if i in one_indeces:
            processed_output.append(torch.tensor(1).to(device))
        else:
            processed_output.append(torch.tensor(0).to(device))
    processed_output= torch.stack(processed_output)
    ground_truth_reduced= torch.stack(ground_truth_reduced)

    return processed_output, ground_truth_reduced,output_reduced
--------------------------------------------------------------------------------
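The modules that follow (build_graph.py, data_preparation_train.py, model_testing.py) index directly into rows of the MOT17 gt.txt/det.txt files that tracking.py loads with np.loadtxt. As a quick orientation, here is a minimal sketch of that row layout, assuming the standard MOT17 column order and a hypothetical local file path:

```
import numpy as np

# Hypothetical path; tracking.py actually loads e.g. /data/MOT17/train/MOT17-02-FRCNN/gt/gt.txt
detections = np.loadtxt("gt.txt", delimiter=",")

row = detections[0]
frame, track_id = row[0], row[1]                      # detection[0], detection[1]
xmin, ymin, width, height = row[2], row[3], row[4], row[5]
confidence = row[6]                                   # detection[6]
object_type = row[7]                                  # detection[7], class label in gt.txt
```

This only illustrates the indexing convention used throughout the repo; the real loading and filtering happen in tracking.py, data_preparation_train.py, and model_testing.py.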
/build_graph.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import os 3 | import numpy as np 4 | import torch.nn.functional as F 5 | from utils import * 6 | from torch_geometric.data import Data, DataLoader, DataListLoader 7 | import torchvision.utils as vutils 8 | import matplotlib.pyplot as plt 9 | from PIL import Image 10 | from torchvision.transforms import ToTensor 11 | 12 | def build_graph(tracklets, current_detections, images_path, current_frame, distance_limit, fps, test=True): 13 | 14 | if len(tracklets): 15 | edges_first_row = [] 16 | edges_second_row = [] 17 | edges_complete_first_row= [] 18 | edges_complete_second_row = [] 19 | edge_attr = [] 20 | ground_truth = [] 21 | idx = [] 22 | node_attr = [] 23 | coords = [] 24 | frame = [] 25 | coords_original = [] 26 | transform= ToTensor() 27 | ####tracklet graphs 28 | for tracklet in tracklets: 29 | tracklet1= tracklet[-1] 30 | xmin, ymin, width, height = int(round(tracklet1[2])), int(round(tracklet1[3])), \ 31 | int(round(tracklet1[4])), int(round(tracklet1[5])) 32 | image_name = os.path.join(images_path, "{0:0=6d}".format(int(tracklet1[0])) + ".jpg") 33 | image = plt.imread(image_name) 34 | frame_width, frame_height, channels = image.shape 35 | coords.append([xmin / frame_width, ymin / frame_height, width / frame_width, height / frame_height]) 36 | coords_original.append([xmin, ymin, xmin+width/2, ymin+height/2]) 37 | image_cropped = image[ymin:ymin + height, xmin:xmin + width] 38 | image_resized = cv2.resize(image_cropped, (90,150), interpolation=cv2.INTER_AREA) 39 | image_resized = image_resized / 255 40 | image_resized = image_resized.astype(np.float32) 41 | image_resized -= [0.485, 0.456, 0.406] 42 | image_resized /= [0.229, 0.224, 0.225] 43 | image_resized = transform(image_resized) 44 | node_attr.append(image_resized) 45 | frame.append([tracklet1[0]/fps]) # the frame it is observed 46 | #####new detections graph 47 | for detection in current_detections: 48 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 49 | int(round(detection[4])), int(round(detection[5])) 50 | image_name = os.path.join(images_path, "{0:0=6d}".format(int(detection[0])) + ".jpg") 51 | image = plt.imread(image_name) 52 | frame_width, frame_height, channels = image.shape 53 | coords.append([xmin / frame_width, ymin / frame_height, width / frame_width, height / frame_height]) 54 | coords_original.append([xmin, ymin, xmin+width/2, ymin+height/2]) 55 | image_cropped = image[ymin:ymin + height, xmin:xmin + width] 56 | image_resized = cv2.resize(image_cropped, (90,150), interpolation=cv2.INTER_AREA) 57 | image_resized = image_resized / 255 58 | image_resized = image_resized.astype(np.float32) 59 | image_resized -= [0.485, 0.456, 0.406] 60 | image_resized /= [0.229, 0.224, 0.225] 61 | image_resized = transform(image_resized) 62 | node_attr.append(image_resized) 63 | frame.append([detection[0]/fps]) # the frame it is observed 64 | # construct connections between tracklets and detections 65 | k = 0 66 | for i in range(len(tracklets) + len(current_detections)): 67 | for j in range(len(tracklets) + len(current_detections)): 68 | distance= ((coords_original[i][0]-coords_original[j][0])**2+(coords_original[i][1]-coords_original[j][1])**2)**0.5 69 | if i < len(tracklets) and j >= len(tracklets): # i is tracklet j is detection 70 | # adjacency matrix 71 | if distance= len(tracklets) and j < len(tracklets): # j is tracklet i is detection 84 | # adjacency matrix 85 | if 
distancecurrent_frame: 34 | break 35 | else: 36 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 37 | int(round(detection[4])), int(round(detection[5])) 38 | object_type = detection[7] 39 | if xmin > 0 and ymin > 0 and width > 0 and height > 0 and (object_type in acceptable_object_types): 40 | most_recent_frame_back2 = randint(1, most_recent_frame_back) 41 | if current_frame-most_recent_frame_back2<1: 42 | most_recent_frame_back2=1 43 | temp=current_frame - (most_recent_frame_back2 - 1) 44 | if (detection[0]=temp-frames_look_back: 45 | new_tracklet= True 46 | for k,i in enumerate(tracklet_IDs): 47 | if detection[1]==i: 48 | new_tracklet=False 49 | tracklets[k].append(detection) 50 | break 51 | if new_tracklet==True: 52 | tracklet_IDs.append(int(detection[1])) 53 | tracklets.append([detection]) 54 | elif detection[0]==current_frame: 55 | current_detections.append(detection) 56 | data = build_graph(tracklets, current_detections, images_path, current_frame, distance_limit, fps, test=False) 57 | data_list.append(data) 58 | current_frame += graph_jump 59 | print("Data preparation finished") 60 | return data_list 61 | 62 | 63 | -------------------------------------------------------------------------------- /model_testing.py: -------------------------------------------------------------------------------- 1 | from torch_geometric.data import DataLoader, DataListLoader 2 | from build_graph import * 3 | from scipy.optimize import linear_sum_assignment 4 | import matplotlib.pyplot as plt 5 | import torchvision 6 | import torchvision.utils as vutils 7 | import shutil 8 | import tensorflow as tf 9 | import tensorboard as tb 10 | import keyword 11 | from torchvision.transforms import ToTensor 12 | from utils import * 13 | import datetime 14 | 15 | def model_testing(sequence, detections, images_path, total_frames, frames_look_back, model, distance_limit, fp_min_times_seen, match_thres, det_conf_thres, fp_look_back, fp_recent_frame_limit,min_height,fps): 16 | 17 | device = torch.device('cuda') 18 | #pick one frame and load previous results 19 | tf.io.gfile = tb.compat.tensorflow_stub.io.gfile 20 | current_frame= 2 21 | id_num= 0 22 | tracking_output= [] 23 | checked_ids = [] 24 | 25 | transform = ToTensor() 26 | 27 | while current_frame <= total_frames: 28 | print("Sequence: " + sequence+ ", Frame: " + str(current_frame)+'/'+str(int(total_frames))) 29 | data_list = [] 30 | #Give IDs to the first frame 31 | tracklets = [] 32 | if not tracking_output: 33 | for i, detection in enumerate(detections): 34 | if detection[0] == 1: 35 | frame = detection[0] 36 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 37 | int(round(detection[4])), int(round(detection[5])) 38 | confidence= detection[6] 39 | if xmin > 0 and ymin > 0 and width > 0 and height > min_height and confidence>det_conf_thres: 40 | id_num += 1 41 | ID= int(id_num) 42 | tracking_output.append([frame, ID, xmin, ymin, width, height, \ 43 | int(detection[6]), 1, 1]) 44 | tracklets.append([[frame, ID, xmin, ymin, width, height, \ 45 | int(detection[6]), 1, 1]]) 46 | else: 47 | detections= detections[i:] 48 | break 49 | else: 50 | #Get all tracklets 51 | tracklet_IDs = [] 52 | for j, tracklet in enumerate(tracking_output): 53 | xmin, ymin, width, height = int(round(tracklet[2])), int(round(tracklet[3])), \ 54 | int(round(tracklet[4])), int(round(tracklet[5])) 55 | if xmin > 0 and ymin > 0 and width > 0 and height > 0: 56 | if (tracklet[0]=current_frame-frames_look_back: 57 | 
new_tracklet= True 58 | for k,i in enumerate(tracklet_IDs): 59 | if tracklet[1]==i: 60 | new_tracklet=False 61 | tracklets[k].append(tracklet) 62 | break 63 | if new_tracklet==True: 64 | tracklet_IDs.append(int(tracklet[1])) 65 | tracklets.append([tracklet]) 66 | #Get new detections 67 | current_detections = [] 68 | for i, detection in enumerate(detections): 69 | if detection[0] == current_frame: 70 | frame = detection[0] 71 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 72 | int(round(detection[4])), int(round(detection[5])) 73 | confidence= detection[6] 74 | if xmin > 0 and ymin > 0 and width > 0 and height > min_height and confidence>det_conf_thres: 75 | current_detections.append([frame, -1, xmin, ymin, width, height, \ 76 | int(detection[6]), 1, 1]) 77 | else: 78 | detections= detections[i:] 79 | break 80 | #build graph and run model 81 | data = build_graph(tracklets, current_detections, images_path, current_frame, distance_limit, fps, test=True) 82 | if data: 83 | if current_detections and data.edge_attr.size()[0]!=0: 84 | data_list.append(data) 85 | 86 | loader = DataListLoader(data_list) 87 | for graph_num, batch in enumerate(loader): 88 | #MODEL FORWARD 89 | output, output2, ground_truth, ground_truth2, det_num, tracklet_num= model(batch) 90 | #FEATURE MAPS on tensorboard 91 | #embedding 92 | images= batch[0].x 93 | images = F.interpolate(images, size=250) 94 | edge_index= data_list[graph_num].edges_complete 95 | #THRESHOLDS 96 | temp= [] 97 | for i in output2: 98 | if i>match_thres: 99 | temp.append(i) 100 | else: 101 | temp.append(i-i) 102 | output2= torch.stack(temp) 103 | # HUNGARIAN 104 | cleaned_output= hungarian(output2, ground_truth2, det_num, tracklet_num) 105 | # Give Ids to current frame 106 | for i,detection in enumerate(current_detections): 107 | match_found= False 108 | for k,m in enumerate(cleaned_output):#cleaned_output): 109 | if m==1 and edge_index[1,k]==i+len(tracklets): #match found 110 | ID= tracklets[edge_index[0,k]][-1][1] 111 | frame = detection[0] 112 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 113 | int(round(detection[4])), int(round(detection[5])) 114 | tracking_output.append([frame, ID, xmin, ymin, width, height, \ 115 | int(detection[6]), 1, 1]) 116 | match_found = True 117 | break 118 | if match_found==False: #give new ID 119 | # print("no match") 120 | id_num += 1 121 | ID= id_num 122 | frame = detection[0] 123 | xmin, ymin, width, height = int(round(detection[2])), int(round(detection[3])), \ 124 | int(round(detection[4])), int(round(detection[5])) 125 | tracking_output.append([frame, ID, xmin, ymin, width, height, \ 126 | int(detection[6]), 1, 1]) 127 | #Clean output for false positives 128 | if current_frame>=fp_look_back: 129 | # reduce to recent objects 130 | recent_tracks = [i for i in tracking_output if i[0] >= current_frame-fp_look_back] 131 | # find the different IDs 132 | candidate_ids= [] 133 | times_seen= [] 134 | first_frame_seen= [] 135 | for i in recent_tracks: 136 | if i[1] not in checked_ids: 137 | if i[1] not in candidate_ids: 138 | candidate_ids.append(i[1]) 139 | times_seen.append(1) 140 | first_frame_seen.append(i[0]) 141 | else: 142 | index= candidate_ids.index(i[1]) 143 | times_seen[index]= times_seen[index] + 1 144 | # find which IDs to remove 145 | remove_ids = [] 146 | for i,j in enumerate(candidate_ids): 147 | if times_seen[i] < fp_min_times_seen and current_frame-first_frame_seen[i]>=fp_look_back: 148 | remove_ids.append(j) 149 | elif 
times_seen[i] > fp_min_times_seen: 150 | checked_ids.append(j) 151 | #keep only those IDs that are seen enough times 152 | tracking_output = [j for j in tracking_output if j[1] not in remove_ids] 153 | current_frame += 1 154 | # reduce to recent objects 155 | recent_tracks = [i for i in tracking_output if i[0] >= current_frame-fp_look_back] 156 | # find the different IDs 157 | candidate_ids= [] 158 | times_seen= [] 159 | for i in recent_tracks: 160 | if i[1] not in checked_ids: 161 | if i[1] not in candidate_ids: 162 | candidate_ids.append(i[1]) 163 | times_seen.append(1) 164 | else: 165 | index= candidate_ids.index(i[1]) 166 | times_seen[index]= times_seen[index] + 1 167 | # find which IDs to remove 168 | remove_ids = [] 169 | for i,j in enumerate(candidate_ids): 170 | if times_seen[i] < fp_min_times_seen: 171 | remove_ids.append(j) 172 | elif times_seen[i] > fp_min_times_seen: 173 | checked_ids.append(j) 174 | #keep only those IDs that are seen enough times 175 | tracking_output = [j for j in tracking_output if j[1] not in remove_ids] 176 | 177 | return tracking_output -------------------------------------------------------------------------------- /model_training.py: -------------------------------------------------------------------------------- 1 | from torch_geometric.nn import MetaLayer, DataParallel 2 | from utils import * 3 | from network.complete_net import * 4 | from torch_geometric.data import DataLoader, DataListLoader 5 | from accuracy import * 6 | import matplotlib.pyplot as plt 7 | import logging 8 | import sys 9 | import os 10 | 11 | def model_training(data_list_train, data_list_test, epochs, acc_epoch, acc_epoch2, save_model_epochs, validation_epoch, batchsize, logfilename, load_checkpoint= None): 12 | 13 | #logging 14 | logging.basicConfig(level=logging.DEBUG, filename='./logfiles/'+logfilename, filemode="w+", 15 | format="%(message)s") 16 | trainloader = DataListLoader(data_list_train, batch_size=batchsize, shuffle=True) 17 | testloader = DataListLoader(data_list_test, batch_size=batchsize, shuffle=True) 18 | device = torch.device('cuda') 19 | complete_net = completeNet() 20 | complete_net = DataParallel(complete_net) 21 | complete_net = complete_net.to(device) 22 | 23 | #train parameters 24 | weights = [10, 1] 25 | optimizer = torch.optim.Adam(complete_net.parameters(), lr=0.001, weight_decay=0.001) 26 | 27 | #resume training 28 | initial_epoch=1 29 | if load_checkpoint!=None: 30 | checkpoint = torch.load(load_checkpoint) 31 | complete_net.load_state_dict(checkpoint['model_state_dict'], strict=False) 32 | optimizer.load_state_dict(checkpoint['optimizer_state_dict']) 33 | initial_epoch = checkpoint['epoch']+1 34 | loss = checkpoint['loss'] 35 | 36 | complete_net.train() 37 | 38 | for epoch in range(initial_epoch, epochs+1): 39 | epoch_total=0 40 | epoch_total_ones= 0 41 | epoch_total_zeros= 0 42 | epoch_correct=0 43 | epoch_correct_ones= 0 44 | epoch_correct_zeros= 0 45 | running_loss= 0 46 | batches_num=0 47 | for batch in trainloader: 48 | batch_total=0 49 | batch_total_ones= 0 50 | batch_total_zeros= 0 51 | batch_correct= 0 52 | batch_correct_ones= 0 53 | batch_correct_zeros= 0 54 | batches_num+=1 55 | # Forward-Backpropagation 56 | output, output2, ground_truth, ground_truth2, det_num, tracklet_num= complete_net(batch) 57 | optimizer.zero_grad() 58 | loss = weighted_binary_cross_entropy(output, ground_truth, weights) 59 | loss.backward() 60 | optimizer.step() 61 | ##Accuracy 62 | if epoch%acc_epoch==0 and epoch!=0: 63 | # Hungarian method, clean up 64 | 
cleaned_output= hungarian(output2, ground_truth2, det_num, tracklet_num) 65 | batch_total += cleaned_output.size(0) 66 | ones= torch.tensor([1 for x in cleaned_output]).to(device) 67 | zeros = torch.tensor([0 for x in cleaned_output]).to(device) 68 | batch_total_ones += (cleaned_output == ones).sum().item() 69 | batch_total_zeros += (cleaned_output == zeros).sum().item() 70 | batch_correct += (cleaned_output == ground_truth2).sum().item() 71 | temp1 = (cleaned_output == ground_truth2) 72 | temp2 = (cleaned_output == ones) 73 | batch_correct_ones += (temp1 & temp2).sum().item() 74 | temp3 = (cleaned_output == zeros) 75 | batch_correct_zeros += (temp1 & temp3).sum().item() 76 | epoch_total += batch_total 77 | epoch_total_ones += batch_total_ones 78 | epoch_total_zeros += batch_total_zeros 79 | epoch_correct += batch_correct 80 | epoch_correct_ones += batch_correct_ones 81 | epoch_correct_zeros += batch_correct_zeros 82 | if loss.item()!=loss.item(): 83 | print("Error") 84 | break 85 | if batch_total_ones != 0 and batch_total_zeros != 0 and epoch%acc_epoch==0 and epoch!=0: 86 | print('Epoch: [%d] | Batch: [%d] | Training_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 87 | (epoch, batches_num, loss.item(), 100 * batch_correct / batch_total, 100 * batch_correct_ones / batch_total_ones, 88 | 100 * batch_correct_zeros / batch_total_zeros)) 89 | logging.info('Epoch: [%d] | Batch: [%d] | Training_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 90 | (epoch, batches_num, loss.item(), 100 * batch_correct / batch_total, 100 * batch_correct_ones / batch_total_ones, 91 | 100 * batch_correct_zeros / batch_total_zeros)) 92 | else: 93 | print('Epoch: [%d] | Batch: [%d] | Training_Loss: %.3f |' % 94 | (epoch, batches_num, loss.item())) 95 | logging.info('Epoch: [%d] | Batch: [%d] | Training_Loss: %.3f |' % 96 | (epoch, batches_num, loss.item())) 97 | running_loss += loss.item() 98 | if loss.item()!=loss.item(): 99 | print("Error") 100 | break 101 | if epoch_total_ones!=0 and epoch_total_zeros!=0 and epoch%acc_epoch==0 and epoch!=0: 102 | print('Epoch: [%d] | Training_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 103 | (epoch, running_loss / batches_num, 100 * epoch_correct / epoch_total, 100 * \ 104 | epoch_correct_ones / epoch_total_ones, 100 * epoch_correct_zeros / epoch_total_zeros)) 105 | logging.info('Epoch: [%d] | Training_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 106 | (epoch, running_loss / batches_num, 100 * epoch_correct / epoch_total, 100 * \ 107 | epoch_correct_ones / epoch_total_ones, 100 * epoch_correct_zeros / epoch_total_zeros)) 108 | else: 109 | print('Epoch: [%d] | Training_Loss: %.3f |' % 110 | (epoch, running_loss / batches_num)) 111 | logging.info('Epoch: [%d] | Training_Loss: %.3f |' % 112 | (epoch, running_loss / batches_num)) 113 | # save model 114 | if epoch%save_model_epochs==0 and epoch!=0: 115 | torch.save({ 116 | 'epoch': epoch, 117 | 'model_state_dict': complete_net.state_dict(), 118 | 'optimizer_state_dict': optimizer.state_dict(), 119 | 'loss': running_loss, 120 | }, './models/epoch_'+str(epoch)+'.pth') 121 | 122 | #validation 123 | if epoch%validation_epoch==0 and epoch!=0: 124 | with torch.no_grad(): 125 | epoch_total=0 126 | epoch_total_ones= 0 127 | epoch_total_zeros= 0 128 | epoch_correct=0 129 | epoch_correct_ones= 0 130 | epoch_correct_zeros= 0 131 | running_loss= 0 132 | batches_num=0 133 | for batch in testloader: 134 
| batch_total=0 135 | batch_total_ones= 0 136 | batch_total_zeros= 0 137 | batch_correct= 0 138 | batch_correct_ones= 0 139 | batch_correct_zeros= 0 140 | batches_num+=1 141 | output, output2, ground_truth, ground_truth2, det_num, tracklet_num = complete_net(batch) 142 | loss = weighted_binary_cross_entropy(output, ground_truth, weights) 143 | running_loss += loss.item() 144 | ##Accuracy 145 | if epoch%acc_epoch2==0 and epoch!=0: 146 | # Hungarian method, clean up 147 | cleaned_output= hungarian(output2, ground_truth2, det_num, tracklet_num) 148 | batch_total += cleaned_output.size(0) 149 | ones= torch.tensor([1 for x in cleaned_output]).to(device) 150 | zeros = torch.tensor([0 for x in cleaned_output]).to(device) 151 | batch_total_ones += (cleaned_output == ones).sum().item() 152 | batch_total_zeros += (cleaned_output == zeros).sum().item() 153 | batch_correct += (cleaned_output == ground_truth2).sum().item() 154 | temp1 = (cleaned_output == ground_truth2) 155 | temp2 = (cleaned_output == ones) 156 | batch_correct_ones += (temp1 & temp2).sum().item() 157 | temp3 = (cleaned_output == zeros) 158 | batch_correct_zeros += (temp1 & temp3).sum().item() 159 | epoch_total += batch_total 160 | epoch_total_ones += batch_total_ones 161 | epoch_total_zeros += batch_total_zeros 162 | epoch_correct += batch_correct 163 | epoch_correct_ones += batch_correct_ones 164 | epoch_correct_zeros += batch_correct_zeros 165 | if epoch_total_ones!=0 and epoch_total_zeros!=0 and epoch%acc_epoch2==0 and epoch!=0: 166 | print('Epoch: [%d] | Validation_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 167 | (epoch, running_loss / batches_num, 100 * epoch_correct / epoch_total, 100 * \ 168 | epoch_correct_ones / epoch_total_ones, 100 * epoch_correct_zeros / epoch_total_zeros)) 169 | logging.info('Epoch: [%d] | Validation_Loss: %.3f | Total_Accuracy: %.3f | Ones_Accuracy: %.3f | Zeros_Accuracy: %.3f |' % 170 | (epoch, running_loss / batches_num, 100 * epoch_correct / epoch_total, 100 * \ 171 | epoch_correct_ones / epoch_total_ones, 100 * epoch_correct_zeros / epoch_total_zeros)) 172 | else: 173 | print('Epoch: [%d] | Validation_Loss: %.3f |' % 174 | (epoch, running_loss / batches_num)) 175 | logging.info('Epoch: [%d] | Validation_Loss: %.3f |' % 176 | (epoch, running_loss / batches_num)) 177 | 178 | 179 | 180 | -------------------------------------------------------------------------------- /network/affinity.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import Sequential as Seq, Linear as Lin, ReLU 4 | 5 | class affinityNet(torch.nn.Module): 6 | def __init__(self): 7 | super(affinityNet, self).__init__() 8 | self.mlp = nn.Sequential( 9 | nn.Linear(2, 1), #loads features from two nodes and features of their edge (edge of interest) 10 | nn.ReLU() 11 | ) 12 | 13 | def forward(self, inputs): 14 | # source, target: [E, F_x], where E is the number of edges. 15 | # edge_attr: [E, F_e] 16 | # u: [B, F_u], where B is the number of graphs. 17 | # batch: [E] with max entry B - 1. 
18 | # out = torch.cat([x1, x2, x3], 0) 19 | return self.mlp(inputs) 20 | -------------------------------------------------------------------------------- /network/affinity_appearance.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import Sequential as Seq, Linear as Lin, ReLU 4 | 5 | class affinity_appearanceNet(torch.nn.Module): 6 | def __init__(self): 7 | super(affinity_appearanceNet, self).__init__() 8 | self.mlp = nn.Sequential( 9 | nn.Linear(1024*2, 1), #loads features from two nodes and features of their edge (edge of interest) 10 | nn.ReLU() 11 | ) 12 | 13 | def forward(self, inputs): 14 | # source, target: [E, F_x], where E is the number of edges. 15 | # edge_attr: [E, F_e] 16 | # u: [B, F_u], where B is the number of graphs. 17 | # batch: [E] with max entry B - 1. 18 | # out = torch.cat([x1, x2, x3], 0) 19 | return self.mlp(inputs) 20 | -------------------------------------------------------------------------------- /network/affinity_final.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import Sequential as Seq, Linear as Lin, ReLU 4 | 5 | class affinity_finalNet(torch.nn.Module): 6 | def __init__(self): 7 | super(affinity_finalNet, self).__init__() 8 | self.mlp = nn.Sequential( 9 | nn.Linear(2, 1), #loads features from two nodes and features of their edge (edge of interest) 10 | nn.ReLU() 11 | ) 12 | 13 | def forward(self, inputs): 14 | # source, target: [E, F_x], where E is the number of edges. 15 | # edge_attr: [E, F_e] 16 | # u: [B, F_u], where B is the number of graphs. 17 | # batch: [E] with max entry B - 1. 18 | # out = torch.cat([x1, x2, x3], 0) 19 | out = self.mlp(inputs) 20 | return out 21 | -------------------------------------------------------------------------------- /network/affinity_geom.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import Sequential as Seq, Linear as Lin, ReLU 4 | 5 | class affinity_geomNet(torch.nn.Module): 6 | def __init__(self): 7 | super(affinity_geomNet, self).__init__() 8 | self.mlp = nn.Sequential( 9 | nn.Linear(8, 1), #loads features from two nodes and features of their edge (edge of interest) 10 | nn.ReLU() 11 | ) 12 | 13 | def forward(self, inputs): 14 | # source, target: [E, F_x], where E is the number of edges. 15 | # edge_attr: [E, F_e] 16 | # u: [B, F_u], where B is the number of graphs. 17 | # batch: [E] with max entry B - 1. 
18 | # out = torch.cat([x1, x2, x3], 0) 19 | return self.mlp(inputs) 20 | -------------------------------------------------------------------------------- /network/complete_net.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import torch.nn as nn 3 | import os 4 | import numpy as np 5 | from network.optimizationGNN import * 6 | from network.encoderCNN import * 7 | from network.detections import * 8 | from network.affinity import * 9 | from network.affinity_appearance import * 10 | from network.affinity_geom import * 11 | from torch.nn.parameter import Parameter 12 | from utils import * 13 | import torch 14 | from torch_sparse import transpose 15 | from network.affinity_final import * 16 | 17 | class completeNet(nn.Module): 18 | def __init__(self): 19 | super(completeNet, self).__init__() 20 | 21 | self.cnn = EncoderCNN() 22 | self.affinity_net = affinityNet() 23 | self.affinity_appearance_net= affinity_appearanceNet() 24 | self.affinity_geom_net= affinity_geomNet() 25 | self.affinity_final_net= affinity_finalNet() 26 | self.optim_net = optimNet() 27 | self.cos = nn.CosineSimilarity(dim=0, eps=1e-6) 28 | 29 | def forward(self, data): 30 | # print('Inside Model: num graphs: {}, device: {}'.format(data.num_graphs, data.batch.device)) 31 | # device = torch.device('cuda') 32 | x, coords_original, edge_index, ground_truth, coords, edges_number_list, frame, track_num, detections_num= \ 33 | data.x, data.coords_original, data.edge_index, data.ground_truth, data.coords, data.edges_number, data.frame, data.track_num, data.det_num 34 | slack= torch.Tensor([-0.2]).float().cuda() 35 | lam= torch.Tensor([5]).float().cuda() 36 | #Pass through GNN 37 | node_embedding= self.cnn(x) 38 | edge_embedding = [] 39 | edge_mlp= [] 40 | for i in range(len(edge_index[0])): 41 | #CNN features 42 | x1 = self.affinity_appearance_net(torch.cat((node_embedding[edge_index[0][i]], node_embedding[edge_index[1][i]]), 0)) 43 | #geometry 44 | x2 = self.affinity_geom_net(torch.cat((coords[edge_index[0][i]], coords[edge_index[1][i]]), 0)) 45 | #iou 46 | iou= box_iou_calc(coords_original[edge_index[0][i]], coords_original[edge_index[1][i]]) 47 | # x2= iou 48 | edge_mlp.append(iou) 49 | #pass through mlp 50 | inputs = torch.cat((x1.reshape(1), x2.reshape(1)), 0) 51 | edge_embedding.append(self.affinity_net(inputs)) 52 | # print(edge_embedding) 53 | edge_embedding= torch.stack(edge_embedding) 54 | output = self.optim_net(node_embedding, edge_embedding, edge_index, coords, frame) 55 | output_temp= [] 56 | for i in range(len(edge_index[0])): 57 | if edge_index[0][i]> $SINGULARITY_ENVIRONMENT 19 | export PATH=/opt/pyenv/versions/3.7.2/bin/:$PATH 20 | 21 | pip install torch==1.3.0 22 | 23 | mkdir -p $SINGULARITY_ROOTFS/tmp/sing_build_cuda 24 | cd $SINGULARITY_ROOTFS/tmp/sing_build_cuda 25 | 26 | export TORCH_CUDA_ARCH_LIST="5.0 6.1" 27 | 28 | git clone https://github.com/rusty1s/pytorch_scatter.git && \ 29 | cd ./pytorch_scatter && \ 30 | git checkout 1.4.0 && \ 31 | python3 ./setup.py install && \ 32 | cd .. 33 | 34 | git clone https://github.com/rusty1s/pytorch_sparse.git && \ 35 | cd ./pytorch_sparse && \ 36 | git checkout 0.4.3 && \ 37 | python3 ./setup.py install && \ 38 | cd .. 39 | 40 | git clone https://github.com/rusty1s/pytorch_cluster.git && \ 41 | cd ./pytorch_cluster && \ 42 | git checkout 1.4.5 && \ 43 | python3 ./setup.py install && \ 44 | cd .. 
45 | 46 | git clone https://github.com/rusty1s/pytorch_geometric.git && \ 47 | cd ./pytorch_geometric && \ 48 | git checkout 1.3.2 && \ 49 | python3 ./setup.py install && \ 50 | cd .. 51 | 52 | cd $CURDIR 53 | rm -rf $SINGULARITY_ROOTFS/tmp/sing_build_cuda 54 | 55 | pip install Pillow 56 | pip install opencv-python 57 | pip install torchvision==0.4.1 58 | pip install matplotlib 59 | pip install tensorflow 60 | -------------------------------------------------------------------------------- /test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | python ./tracking.py --type test -------------------------------------------------------------------------------- /tracking.py: -------------------------------------------------------------------------------- 1 | from data_preparation_train import * 2 | from model_training import * 3 | from model_testing import * 4 | from network.complete_net import * 5 | from utils import * 6 | import os 7 | import pickle 8 | import argparse 9 | import pprint 10 | 11 | def get_data(info, set): 12 | if set=='training': 13 | detections = np.loadtxt("/data/MOT17/train/MOT17-{}-{}/gt/gt.txt".format(info[0], info[1]), delimiter=',') 14 | images_path = "/data/MOT17/train/MOT17-{}-{}/img1".format(info[0], info[1]) 15 | elif set=='testing': 16 | detections = np.loadtxt("/data/MOT17/test/MOT17-{}-{}/det/det.txt".format(info[0], info[1]), delimiter=',') 17 | images_path = "/data/MOT17/test/MOT17-{}-{}/img1".format(info[0], info[1]) 18 | return detections, images_path 19 | 20 | if __name__ == "__main__": 21 | parser = argparse.ArgumentParser() 22 | parser.add_argument('--type', type=str) 23 | args = parser.parse_args() 24 | 25 | if args.type == 'train': 26 | 27 | # initialize training settings 28 | batchsize = 2 # specify how many graphs to use at each batch 29 | epochs = 4 # at how many epochs to stop 30 | load_checkpoint= None#'./models/epoch_2.pth' # None or specify .pth file to continue training 31 | validation_epochs= 4 # epoch interval for validation 32 | acc_epoch = epochs # at which epoch to calculate training accuracy 33 | acc_epoch2 = epochs # at which epoch to calculate validation accuracy 34 | save_model_epochs = 1 # epoch interval to save the model 35 | most_recent_frame_back = 30 # for more challenging training, specify most recent frame of tracklets to match new detections with, a max value is specified here 36 | # so later it will be randomly between 1 and most_recent_frame_back, min value=1 -> take only previous frame 37 | frames_look_back = 30 # a second limit that is used to use tracklets for matching in frames between frames_look_back and most_recent_frame_back, 38 | # min value=1 -> take only previous frame 39 | # example: if current frame= 60, tracklets used randomly from t1= 30 to 59, also tracklets used till t1-30 40 | graph_jump = 5 # how many frames to move the current frame forward, min value=1 -> move to the next frame 41 | distance_limit = 250 # objects within that pixel distance can be associated 42 | 43 | # MOT17 specific settings 44 | train_seq = ["02", "04", "05", "09", "10", "11", "13"] # names of videos 45 | valid_seq = ["02", "04", "05", "09", "10", "11", "13"] # names of videos 46 | fps= [30,30,14,30,30,30,25] # specify fps of each video 47 | current_frame_train = 2 # use as first current frame the second frame of each video 48 | total_frames = [None, None, None, None, None, None, None] # total frames of each video loaded, None for max frames 49 | current_frame_valid = 
[500,900,780,450,550,800,650] # up to which frame of each video to use for training 50 | detector = ["FRCNN"] # specify a detector just to direct to one of the MOT folders 51 | 52 | # Option 1: If graph data not built, loop through sequence and get training data 53 | print('\n') 54 | print('Training Data') 55 | data_list_train = [] 56 | for s in range(len(train_seq)): 57 | for d in range(len(detector)): 58 | print('Sequence: ' + train_seq[s]) 59 | detections, images_path = get_data([train_seq[s], detector[d]], "training") 60 | list = data_prep_train(train_seq[s], detections, images_path, frames_look_back, total_frames[s], most_recent_frame_back, 61 | graph_jump, current_frame_train, current_frame_valid[s], distance_limit, fps[s], "training") 62 | data_list_train = data_list_train + list 63 | with open('./data/data_train.data', 'wb') as filehandle: 64 | pickle.dump(data_list_train, filehandle) 65 | print("Saved to pickle file \n") 66 | print('Validation Data') 67 | data_list_valid = [] 68 | for s in range(len(valid_seq)): 69 | for d in range(len(detector)): 70 | print('Sequence: ' + valid_seq[s]) 71 | detections, images_path = get_data([valid_seq[s], detector[d]], "training") 72 | list = data_prep_train(valid_seq[s], detections, images_path, frames_look_back, total_frames[s], most_recent_frame_back, 73 | graph_jump, current_frame_train, current_frame_valid[s], distance_limit, fps[s], "validation") 74 | data_list_valid = data_list_valid + list 75 | with open('./data/data_valid.data', 'wb') as filehandle: 76 | pickle.dump(data_list_valid, filehandle) 77 | print("Saved to pickle file \n") 78 | 79 | # Option 2: If data graph built, just import files 80 | # with open('./data/data_train.data', 'rb') as filehandle: 81 | # data_list_train = pickle.load(filehandle) 82 | # print("Loaded training pickle files") 83 | # with open('./data/data_valid.data', 'rb') as filehandle: 84 | # data_list_valid = pickle.load(filehandle) 85 | # print("Loaded validation pickle files") 86 | #Load and train 87 | model_training(data_list_train, data_list_valid, epochs, acc_epoch, acc_epoch2, save_model_epochs, validation_epochs, batchsize, "logfile", load_checkpoint) 88 | 89 | elif args.type == 'test': 90 | 91 | frames_look_back = 30 # how many previous frames to look back for tracklets to be matched 92 | match_thres= 0.25 # the matching confidence threshold 93 | det_conf_thres= 0.0 # the lowest detection confidence 94 | distance_limit = 200 # objects within that pixel distance can be associated 95 | min_height= 10 #minimum height of detections 96 | fp_look_back= 15 # for false positives 97 | fp_recent_frame_limit= 10 # for false positives 98 | fp_min_times_seen= 3 # for false positives, minimum times seen for the last fp_look_back frames 99 | #with the first instance seen before fp_recent_frame_limit otherwise considered false positive 100 | 101 | # Select sequence 102 | seq = ["01","03","06","07","08","12","14"] 103 | fps= [30,30,14,30,30,30,25] 104 | # detector = ["DPM", "FRCNN", "SDP"] 105 | detector = ["FRCNN"] 106 | 107 | #load model 108 | model = completeNet() 109 | device = torch.device('cuda') 110 | model = model.to(device) 111 | model = DataParallel(model) 112 | model.load_state_dict(torch.load('./models/epoch_11.pth')['model_state_dict']) 113 | model.eval() 114 | 115 | # Load data and test, write output to txt and video 116 | data_list = [] 117 | for s in range(len(seq)): 118 | for d in range(len(detector)): 119 | detections, images_path = get_data([seq[s], detector[d]], "testing") 120 | total_frames= None # 
None for max frames 121 | if total_frames == None: 122 | total_frames = np.max(detections[:, 0]) # change only if you want a subset of the total frames 123 | print('Sequence: '+ seq[s]) 124 | print('Total frames: ' + str(total_frames)) 125 | detections = sorted(detections, key=lambda x: x[0]) 126 | tracking_output= model_testing(seq[s], detections, images_path, total_frames, frames_look_back, model, distance_limit, fp_min_times_seen, match_thres, det_conf_thres, fp_look_back, fp_recent_frame_limit, min_height, fps[s]) 127 | #write output 128 | outputFile = './output/MOT17-{}-{}.avi'.format(seq[s],detector[d]) 129 | #get dimensions of images 130 | image_name = os.path.join(images_path, "{0:0=6d}".format(1) + ".jpg") 131 | image = cv2.imread(image_name, cv2.IMREAD_COLOR) 132 | vid_writer = cv2.VideoWriter(outputFile, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), 15, 133 | (image.shape[1], image.shape[0])) 134 | with open('./output/MOT17-{}-{}.txt'.format(seq[s],detector[d]), 'w') as f: 135 | for frame in range(1,int(total_frames)+1): 136 | image_name = os.path.join(images_path, "{0:0=6d}".format(int(frame)) + ".jpg") 137 | image = cv2.imread(image_name, cv2.IMREAD_COLOR) 138 | for item in tracking_output: 139 | if item[0]==frame: 140 | #write tracking output to txt 141 | f.write('%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n' % (item[0], item[1], item[2], item[3], item[4], item[5], -1, -1, -1, -1)) 142 | #write tracking output to frame 143 | xmin = int(item[2]) 144 | ymin = int(item[3]) 145 | xmax = int(item[2] + item[4]) 146 | ymax = int(item[3] + item[5]) 147 | display_text = '%d' % (item[1]) 148 | color_rectangle = (0, 0, 255) 149 | cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color=color_rectangle, thickness=2) 150 | font = cv2.FONT_HERSHEY_PLAIN 151 | color_text = (255, 255, 255) 152 | cv2.putText(image, display_text, 153 | (xmin + int((xmax - xmin) / 2), ymin + int((ymax - ymin) / 2)), fontFace=font, 154 | fontScale=1.3, color=color_text, 155 | thickness=2) 156 | elif item[0]>frame: 157 | break 158 | for item in detections: 159 | if item[0] == frame and item[2]>0 and item[3]>0 and item[4]>0 and item[5]>0: 160 | xmin = int(item[2]) 161 | ymin = int(item[3]) 162 | xmax = int(item[2] + item[4]) 163 | ymax = int(item[3] + item[5]) 164 | color_rectangle = (255, 255, 255) 165 | cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color=color_rectangle, thickness=1) 166 | elif item[0]>frame: 167 | break 168 | xmin = 0 169 | ymin = 0 170 | xmax = 1920 171 | ymax = 1080 172 | display_text = 'Frame %d' % (frame) 173 | font = cv2.FONT_HERSHEY_PLAIN 174 | color_text = (0, 0, 255) 175 | cv2.putText(image, display_text, 176 | (50, 50), fontFace=font, 177 | fontScale=1.3, color=color_text, 178 | thickness=2) 179 | vid_writer.write(image) 180 | cv2.destroyAllWindows() 181 | vid_writer.release() 182 | 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | -------------------------------------------------------------------------------- /train.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | python ./tracking.py --type train 4 | 5 | 6 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from scipy.optimize import linear_sum_assignment 4 | 5 | def fill_detections(tracking_output): 6 | final_output= [] 7 | different_ids= [] 8 | for i in tracking_output: 
9 | if i[1] not in different_ids: 10 | different_ids.append(int(i[1])) 11 | #for each ID 12 | for i in different_ids: 13 | #get output for each ID 14 | output_temp= [] 15 | for j in tracking_output: 16 | if j[1]==i: 17 | output_temp.append(j) 18 | filled_detections= [] 19 | #for every two frames, fill in detections 20 | for j in range(len(output_temp)-1): 21 | diff_frame= output_temp[j+1][0]-output_temp[j][0] 22 | diff_x= output_temp[j][2]-output_temp[j+1][2] 23 | diff_y= output_temp[j][3]-output_temp[j+1][3] 24 | diff_w= output_temp[j][4]-output_temp[j+1][4] 25 | diff_h= output_temp[j][5]-output_temp[j+1][5] 26 | boxes1= torch.tensor([output_temp[j][2],output_temp[j][3],output_temp[j][2]+output_temp[j][4],output_temp[j][3]+output_temp[j][5]]) 27 | boxes2= torch.tensor([output_temp[j+1][2],output_temp[j+1][3],output_temp[j+1][2]+output_temp[j+1][4],output_temp[j+1][3]+output_temp[j+1][5]]) 28 | IOU= box_iou_calc(boxes1, boxes2) 29 | if abs(int(diff_frame))>1 and abs(int(diff_frame))<150 and IOU.item()>0.1: 30 | for sec in range(1,abs(int(diff_frame))): 31 | div= abs(diff_frame)/sec 32 | filled_detections.append([output_temp[j][0]+sec,i,int(output_temp[j][2]-diff_x/div),\ 33 | int(output_temp[j][3]-diff_y/div),int(output_temp[j][4]-diff_w/div),int(output_temp[j][5]-diff_h/div)]) 34 | for j in output_temp: 35 | final_output.append(j) 36 | for j in filled_detections: 37 | final_output.append(j) 38 | final_output = sorted(final_output, key=lambda x: x[0]) 39 | return final_output 40 | 41 | 42 | def weighted_binary_cross_entropy(output, target, weights=None): 43 | loss = - weights[0] * (target * torch.log(output)) - \ 44 | weights[1] * ((1 - target) * torch.log(1 - output)) 45 | return torch.mean(loss) 46 | 47 | def indices_first(a, b, value): 48 | out = [k for k, x in enumerate(a) if x == value]# and x <= b[k]] 49 | if out: 50 | return out 51 | 52 | def indices_second(a, b, value): 53 | out = [k for k, x in enumerate(b) if x == value]# and x >= a[k]] 54 | if out: 55 | return out 56 | 57 | def sinkhorn(matrix): 58 | row_len = len(matrix) 59 | col_len = len(matrix[0]) 60 | desired_row_sums = torch.ones((1, row_len), requires_grad=False).cuda() 61 | desired_col_sums = torch.ones((1, col_len), requires_grad=False).cuda() 62 | desired_row_sums[:, -1] = col_len-1 63 | desired_col_sums[:, -1] = row_len-1 64 | for _ in range(8): 65 | #row normalization 66 | actual_row_sum = torch.sum(matrix, axis=1) 67 | for i, row in enumerate(matrix): 68 | for j, element in enumerate(row): 69 | matrix[i,j]= element*desired_row_sums[0,i]/(actual_row_sum[i]) 70 | #column normalization 71 | actual_col_sum = torch.sum(matrix, axis=0) 72 | for i, row in enumerate(matrix): 73 | for j, element in enumerate(row): 74 | matrix[i,j]= element*desired_col_sums[0,j]/(actual_col_sum[j]) 75 | return matrix 76 | 77 | 78 | def hungarian(output, ground_truth, det_num, tracklet_num): 79 | cleaned_output = [] 80 | num = 0 81 | eps = 0.0001 # for numerical stability 82 | for i, j in enumerate(tracklet_num): 83 | matrix = [] 84 | for k in range(j): 85 | matrix.append([]) 86 | for l in range(det_num[i]): 87 | matrix[k].append(1 - output[num].cpu().detach().numpy()) 88 | num += 1 89 | matrix = np.array(matrix) 90 | # padding 91 | (a, b) = matrix.shape 92 | if a > b: 93 | padding = ((0, 0), (0, a - b)) 94 | else: 95 | padding = ((0, b - a), (0, 0)) 96 | matrix = np.pad(matrix, padding, mode='constant', constant_values=eps) 97 | # hungarian 98 | row_ind, col_ind = linear_sum_assignment(matrix) 99 | #take out those that are all 1, max cost, 
either hungarian will assign 100 | remove_ind= [] 101 | cnt= 0 102 | for i, row in enumerate(matrix): 103 | for j, element in enumerate(row): 104 | if element==1: 105 | remove_ind.append(cnt) 106 | cnt += 1 107 | cnt= 0 108 | for i, row in enumerate(matrix): 109 | for j, element in enumerate(row): 110 | if i < a and j < b: 111 | p1 = row_ind.tolist().index(i) 112 | p2 = col_ind.tolist().index(j) 113 | # print(p2) 114 | if p1 == p2 and cnt not in remove_ind: 115 | cleaned_output.append(torch.tensor(1, dtype=float).cuda()) 116 | else: 117 | cleaned_output.append(torch.tensor(0, dtype=float).cuda()) 118 | cnt += 1 119 | cleaned_output = torch.stack(cleaned_output) 120 | return cleaned_output 121 | 122 | def box_area(boxes): 123 | """ 124 | Computes the area of a set of bounding boxes, which are specified by its 125 | (x1, y1, x2, y2) coordinates. 126 | Arguments: 127 | boxes (Tensor[N, 4]): boxes for which the area will be computed. They 128 | are expected to be in (x1, y1, x2, y2) format 129 | Returns: 130 | area (Tensor[N]): area for each box 131 | """ 132 | return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) 133 | 134 | def box_iou_calc(boxes1, boxes2): 135 | # https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py 136 | """ 137 | Return intersection-over-union (Jaccard index) of boxes. 138 | Both sets of boxes are expected to be in (x1, y1, x2, y2) format. 139 | Arguments: 140 | boxes1 (Tensor[N, 4]) 141 | boxes2 (Tensor[M, 4]) 142 | Returns: 143 | iou (Tensor[N, M]): the NxM matrix containing the pairwise 144 | IoU values for every element in boxes1 and boxes2 145 | """ 146 | boxes1= boxes1.reshape(1,4) 147 | boxes2 = boxes2.reshape(1, 4) 148 | 149 | area1 = box_area(boxes1) 150 | area2 = box_area(boxes2) 151 | 152 | lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2] 153 | rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2] 154 | 155 | wh = (rb - lt).clamp(min=0) # [N,M,2] 156 | inter = wh[:, :, 0] * wh[:, :, 1] # [N,M] 157 | 158 | iou = inter / (area1[:, None] + area2 - inter) 159 | return iou 160 | 161 | class UnNormalize(object): 162 | def __init__(self, mean, std): 163 | self.mean = mean 164 | self.std = std 165 | 166 | def __call__(self, tensor): 167 | """ 168 | Args: 169 | tensor (Tensor): Tensor image of size (C, H, W) to be normalized. 170 | Returns: 171 | Tensor: Normalized image. 172 | """ 173 | for t, m, s in zip(tensor, self.mean, self.std): 174 | t.mul_(s).add_(m) 175 | # The normalize code -> t.sub_(m).div_(s) 176 | return tensor --------------------------------------------------------------------------------
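The sinkhorn() helper in utils.py normalizes the affinity matrix with explicit per-element Python loops. For reference, here is a minimal vectorized sketch of the same row/column balancing, assuming a 2-D torch tensor whose last row and column act as the slack entries; the name sinkhorn_vectorized is illustrative and not part of the repo:

```
import torch

def sinkhorn_vectorized(matrix, iterations=8):
    # Same targets as utils.sinkhorn: every row/column should sum to 1,
    # except the slack (last) row/column, which absorbs the remaining mass.
    rows, cols = matrix.shape
    row_targets = torch.ones(rows, device=matrix.device)
    col_targets = torch.ones(cols, device=matrix.device)
    row_targets[-1] = cols - 1
    col_targets[-1] = rows - 1
    for _ in range(iterations):
        matrix = matrix * (row_targets / matrix.sum(dim=1)).unsqueeze(1)  # row normalization
        matrix = matrix * (col_targets / matrix.sum(dim=0)).unsqueeze(0)  # column normalization
    return matrix
```

Both snippets perform the alternating row/column scaling used for the Sinkhorn normalization step; the vectorized form only removes the nested Python loops.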