├── CLOC
│   ├── README.md
│   ├── code_offline
│   │   ├── __pycache__
│   │   │   ├── yfcc100m_dataset.cpython-37.pyc
│   │   │   └── yfcc100m_dataset.cpython-39.pyc
│   │   ├── main.py
│   │   └── yfcc100m_dataset.py
│   ├── code_online
│   │   ├── best_model
│   │   │   ├── __pycache__
│   │   │   │   ├── yfcc100m_dataset.cpython-37.pyc
│   │   │   │   └── yfcc100m_dataset.cpython-39.pyc
│   │   │   ├── groupNorm
│   │   │   │   ├── __pycache__
│   │   │   │   │   ├── group_norm.cpython-37.pyc
│   │   │   │   │   ├── group_norm.cpython-39.pyc
│   │   │   │   │   ├── resnet.cpython-37.pyc
│   │   │   │   │   └── resnet.cpython-39.pyc
│   │   │   │   ├── group_norm.py
│   │   │   │   └── resnet.py
│   │   │   ├── main_online_best_model.py
│   │   │   └── yfcc100m_dataset.py
│   │   └── no_PoLRS
│   │       ├── __pycache__
│   │       │   ├── yfcc100m_dataset.cpython-37.pyc
│   │       │   └── yfcc100m_dataset.cpython-39.pyc
│   │       ├── groupNorm
│   │       │   ├── __pycache__
│   │       │   │   ├── group_norm.cpython-37.pyc
│   │       │   │   ├── group_norm.cpython-39.pyc
│   │       │   │   ├── resnet.cpython-37.pyc
│   │       │   │   └── resnet.cpython-39.pyc
│   │       │   ├── group_norm.py
│   │       │   └── resnet.py
│   │       ├── main_online.py
│   │       └── yfcc100m_dataset.py
│   ├── data_preparation
│   │   └── download_images
│   │       └── download_images.py
│   ├── exp_BS
│   │   ├── eval_BS128.sh
│   │   ├── eval_BS256.sh
│   │   ├── eval_BS64.sh
│   │   ├── train_BS128.sh
│   │   ├── train_BS256.sh
│   │   └── train_BS64.sh
│   ├── exp_LR
│   │   ├── eval_PoLRS.sh
│   │   ├── eval_constant.sh
│   │   ├── eval_cosine.sh
│   │   ├── train_PoLRS.sh
│   │   ├── train_constant.sh
│   │   └── train_cosine.sh
│   ├── exp_RepBuf
│   │   ├── eval_39M.sh
│   │   ├── eval_40K.sh
│   │   ├── eval_4M.sh
│   │   ├── eval_ADRep.sh
│   │   ├── train_39M.sh
│   │   ├── train_40K.sh
│   │   ├── train_4M.sh
│   │   └── train_ADRep.sh
│   └── exp_best_model
│       ├── eval_offline.sh
│       ├── eval_online.sh
│       ├── train_offline.sh
│       └── train_online.sh
├── LICENSE
└── README.md

--------------------------------------------------------------------------------
/CLOC/README.md:
--------------------------------------------------------------------------------

# CLOC

Description
===========

![alt text](https://github.com/ZhipengCai/ZhipengCai.github.io/blob/master/papers/CLOC.png " ")

This repo contains the code for our ICCV 2021 paper: Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data.

Authors: Zhipeng Cai, Ozan Sener, Vladlen Koltun

[[Paper link]](https://arxiv.org/pdf/2108.09020.pdf)


We perform an empirical study of online visual continual learning with natural distribution shifts.

1. We create a large-scale continual learning benchmark called Continual LOCalization (CLOC), which contains roughly 39 million images taken over 8 years. We construct a continual classification task using the time stamp and geo-location of each image. CLOC is suitable for benchmarking continual learning due to its large scale and natural distribution shifts. We provide the code to construct CLOC here. ([[A new host for the CLOC dataset is now available!]](https://github.com/hammoudhasan/CLDatasets)) If you just need the CLOC dataset, visit that link and follow the instructions there for a faster and easier download.

2. We define online continual learning (OCL) on CLOC, and distinguish learning efficacy (how fast the model adapts to new data) from information retention (how well the model remembers old knowledge). Previous studies mostly focus on offline continual learning and optimize for information retention. In this paper, we show that learning efficacy and information retention are somewhat conflicting, and we propose several strategies to optimize learning efficacy.
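Concretely, learning efficacy is measured by the average online accuracy: each incoming batch is used for evaluation first, and only then for training. A minimal sketch of this evaluate-then-adapt loop (the names below are illustrative, not code from this repo):

```
import torch

def average_online_accuracy(model, loader, criterion, optimizer, device='cuda'):
    # loader must yield batches in time order, as in CLOC
    correct, total = 0, 0
    for images, target in loader:
        images, target = images.to(device), target.to(device)
        # 1) evaluate on the incoming batch before any parameter update
        model.eval()
        with torch.no_grad():
            pred = model(images).argmax(dim=1)
            correct += (pred == target).sum().item()
            total += target.numel()
        # 2) then adapt to the same batch
        model.train()
        loss = criterion(model(images), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return 100.0 * correct / total
```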

We show that the average online accuracy of our OCL model can be on par with or better than the validation accuracy of supervised learning (SL) models given similar budgets, as shown in the bottom-right figure above.

The code in this repo is free for non-commercial academic use. Any commercial use is strictly
prohibited without the authors' consent. Please acknowledge the authors by citing:

```
@inproceedings{cai2021online,
  title={Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data},
  author={Cai, Zhipeng and Sener, Ozan and Koltun, Vladlen},
  booktitle={International Conference on Computer Vision},
  year={2021}
}
```

Prerequisites
============
1. python 3

2. pytorch 1.7+

3. tensorboardX

4. numpy

We ran the experiments mainly on 4 Quadro RTX 6000 GPUs.

Usage
=====
1. Clone this repo.

2. Download the metadata of CLOC from [the google drive link](https://drive.google.com/file/d/1UdIZe_9rEemO2QukHw7bf6aDFV-RjAfc/view?usp=sharing). Decompress it into "CLOC/data_preparation/release", via:

```
mv metadata.tar.gz CLOC/data_preparation/
cd CLOC/data_preparation
tar -xvzf metadata.tar.gz
```

3. Download the images (download_images.py can download different parts of the dataset simultaneously and can resume after an unexpected termination; see the comments in the code for details), via:

```
cd CLOC/data_preparation/download_images
python download_images.py
```

4. Run the experiments:

Go to the exp_* folder of interest and run the experiment that you want to replicate. "exp_best_model" contains the code to train the proposed OCL model, i.e., using PoLRS, ADRep, and small batch sizes.

In each exp_* folder, there are pairs of scripts named "train_xxx.sh" and "eval_xxx.sh". Both can be run by simply typing:

```
bash xxx_xxx.sh
```

"train_xxx.sh" trains the OCL model and plots the average online accuracy.

"eval_xxx.sh" produces the backward transfer curve.

The time axes of different plots may differ by a scaling factor. To unify them, normalize each time axis into [0, 1] and then multiply every time point by the maximum number of images or the maximum wall-clock time (a small sketch is given at the end of this section).

5. Use tensorboard to monitor the results:

Go to each exp_xxx folder and run:
```
tensorboard --logdir=./
```
The output folder of individual experiments can be found in the first line of the output logs.
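The sketch mentioned in step 4, for unifying the time axes of different plots before comparing them (a minimal numpy version; the function and array names are illustrative, not part of this repo):

```
import numpy as np

def unify_time_axis(t, scale):
    # normalize a time axis into [0, 1], then rescale to a common unit,
    # e.g., the maximum number of images or the maximum wall-clock time
    t = np.asarray(t, dtype=np.float64)
    t01 = (t - t.min()) / (t.max() - t.min())
    return t01 * scale
```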

------------------------
Contact
------------------------

Name: Zhipeng Cai

Homepage: https://zhipengcai.github.io/

Email: czptc2h@gmail.com

Do not hesitate to contact the authors if you have any questions or find any bugs :)

--------------------------------------------------------------------------------
/CLOC/code_offline/__pycache__/yfcc100m_dataset.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_offline/__pycache__/yfcc100m_dataset.cpython-37.pyc
--------------------------------------------------------------------------------
/CLOC/code_offline/__pycache__/yfcc100m_dataset.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_offline/__pycache__/yfcc100m_dataset.cpython-39.pyc
--------------------------------------------------------------------------------
/CLOC/code_offline/main.py:
--------------------------------------------------------------------------------
import argparse
import os
import random
import shutil
import time
import warnings
import builtins
import math

import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from torch import randperm

import numpy as np
import sys
from tensorboardX import SummaryWriter
from yfcc100m_dataset import YFCC_CL_Dataset_offline_train, YFCC_CL_Dataset_offline_val

import psutil
import gc
import copy


model_names = sorted(name for name in models.__dict__
    if name.islower() and not name.startswith("__")
    and callable(models.__dict__[name]))

parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet50',
                    choices=model_names,
                    help='model architecture: ' +
                        ' | '.join(model_names) +
                        ' (default: resnet50)')
parser.add_argument('-j', '--workers', default=16, type=int, metavar='N',
                    help='number of data loading workers (default: 16)')
parser.add_argument('--epochs', default=90, type=int, metavar='N',
                    help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
                    help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int,
                    metavar='N',
                    help='mini-batch size (default: 256), this is the total '
                         'batch size of all GPUs on the current node when '
                         'using Data Parallel or Distributed Data Parallel')

parser.add_argument('--lr', '--learning-rate', default=0.1, type=float,
                    metavar='LR', help='initial learning rate', dest='lr')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
                    help='momentum')
parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
                    metavar='W', help='weight decay (default: 1e-4)',
                    dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=10, type=int,
                    metavar='N', help='print frequency (default: 10)')
parser.add_argument('-wf', '--write-freq', default=100, type=int,
                    metavar='N', help='tensorboard write frequency (default: 100)')
parser.add_argument('--resume', default='', type=str, metavar='PATH',
                    help='path to latest checkpoint (default: none)')
parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
                    help='evaluate model on validation set')
parser.add_argument('--pretrained', dest='pretrained', action='store_true',
                    help='use pre-trained model')
parser.add_argument('--world-size', default=-1, type=int,
                    help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int,
                    help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str,
                    help='distributed backend')
parser.add_argument('--seed', default=None, type=int,
                    help='seed for initializing training. ')
parser.add_argument('--gpu', default=None, type=int,
                    help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
                    help='Use multi-processing distributed training to launch '
                         'N processes per node, which has N GPUs. This is the '
                         'fastest way to use PyTorch for either single node or '
                         'multi node data parallel training')

# used for continual learning
parser.add_argument('--cell_id', default="/export/share/t1-datasets/yfcc100m_full_dataset_alt/metadata_geolocation/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy", type=str,
                    help='file that stores the cell IDs')
parser.add_argument('--data', default="/export/share/t1-datasets/yfcc100m_full_dataset_alt/metadata_geolocation/",
                    type=str, help='path to the training data')
parser.add_argument('--data_val', default="/export/share/t1-datasets/yfcc100m_full_dataset_alt/metadata_geolocation/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv",
                    type=str, help='path to the validation metadata')

parser.add_argument('--val_freq', default=10,
                    type=int, help='perform validation per [val_freq] epochs (in offline mode)')
parser.add_argument('--adjust_lr', default=0,
                    type=int, help='whether to adjust lr')
parser.add_argument('--num_passes', default=1,
                    type=int, help='number of passes of data during training (used for offline training)')
parser.add_argument('--use_aug', default=0,
                    type=int, help='whether to use data augmentation')
parser.add_argument('--use_val', default=0,
                    type=int, help='whether to use validation set')
parser.add_argument('--root', default="/export/share/t1-datasets/yfcc100m_full_dataset_alt/images/",
                    type=str, help='root of the image dataset (used to reduce memory consumption)')

parser.add_argument('--num_classes', default=500,
                    type=int, help='number of classes (used only for constructing the ring buffer, no need to set this value)')
parser.add_argument('--val_set_OOD', default=0,
                    type=int, help='(only for offline) whether to use an out-of-domain validation set')
parser.add_argument('--train_set_division_OOD1', default=10,
                    type=int, help='use the
first [1/xxx] training set at the time axis') 118 | 119 | parser.add_argument('--used_data_rate_start', default = 0.0, type = float, help = 'starting data rate') 120 | parser.add_argument('--used_data_rate_end', default = 1.0, type = float, help = 'ending data rate') 121 | 122 | parser.add_argument("--min_lr", default = 0.0, type = float, help = 'minimum learning rate, used to cut the cosine LR') 123 | 124 | # whether to save intermediate models 125 | parser.add_argument("--SaveInter", default = 0, type = int, help = 'whether to save intermediate models') 126 | 127 | best_acc1 = 0 128 | 129 | 130 | def main(): 131 | args = parser.parse_args() 132 | 133 | if args.seed is not None: 134 | random.seed(args.seed) 135 | torch.manual_seed(args.seed) 136 | cudnn.deterministic = True 137 | warnings.warn('You have chosen to seed training. ' 138 | 'This will turn on the CUDNN deterministic setting, ' 139 | 'which can slow down your training considerably! ' 140 | 'You may see unexpected behavior when restarting ' 141 | 'from checkpoints.') 142 | 143 | if args.gpu is not None: 144 | warnings.warn('You have chosen a specific GPU. This will completely ' 145 | 'disable data parallelism.') 146 | 147 | if args.dist_url == "env://" and args.world_size == -1: 148 | args.world_size = int(os.environ["WORLD_SIZE"]) 149 | 150 | args.distributed = args.world_size > 1 or args.multiprocessing_distributed 151 | 152 | ngpus_per_node = torch.cuda.device_count() 153 | if args.multiprocessing_distributed: 154 | # Since we have ngpus_per_node processes per node, the total world_size 155 | # needs to be adjusted accordingly 156 | args.world_size = ngpus_per_node * args.world_size 157 | # Use torch.multiprocessing.spawn to launch distributed processes: the 158 | # main_worker process function 159 | mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) 160 | else: 161 | # Simply call main_worker function 162 | main_worker(args.gpu, ngpus_per_node, args) 163 | 164 | 165 | def main_worker(gpu, ngpus_per_node, args): 166 | global best_acc1 167 | 168 | args.gpu = gpu 169 | 170 | # suppress printing if not master 171 | if args.multiprocessing_distributed and args.gpu != 0: 172 | def print_pass(*args): 173 | pass 174 | builtins.print = print_pass 175 | 176 | args.output_dir = create_output_dir(args) 177 | os.makedirs(args.output_dir, exist_ok = True) 178 | writer = SummaryWriter(args.output_dir) 179 | 180 | print("output_dir = {}".format(args.output_dir)) 181 | 182 | sys.stdout.flush() 183 | 184 | 185 | if args.gpu is not None: 186 | print("Use GPU: {} for training".format(args.gpu)) 187 | 188 | if args.distributed: 189 | if args.dist_url == "env://" and args.rank == -1: 190 | args.rank = int(os.environ["RANK"]) 191 | if args.multiprocessing_distributed: 192 | # For multiprocessing distributed training, rank needs to be the 193 | # global rank among all the processes 194 | args.rank = args.rank * ngpus_per_node + gpu 195 | dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, 196 | world_size=args.world_size, rank=args.rank) 197 | 198 | model, criterion, optimizer, num_classes = init_model(args, ngpus_per_node) 199 | 200 | args.num_classes = num_classes 201 | 202 | # init the full dataset 203 | train_set, val_set = init_dataset(args) 204 | cudnn.benchmark = True 205 | 206 | if train_set is not None: 207 | print("train_set.len = {}; val_set.len = {}".format(train_set.__len__(), val_set.__len__())) 208 | else: 209 | print("val_set.len = {}".format(val_set.__len__())) 210 | 211 | 212 | if 
args.evaluate: 213 | # optionally resume from a checkpoint 214 | if args.resume: 215 | if os.path.isfile(args.resume): 216 | args.start_epoch, _ = resume_offline(args, model, optimizer) 217 | else: 218 | raise RuntimeError("=> no checkpoint found at '{}'".format(args.resume)) 219 | 220 | print("do not perform training for evaluation mode, val_set = {}".format(val_set)) 221 | sys.stdout.flush() 222 | 223 | val_loader = init_val_loader(args, val_set) 224 | out_folder_eval = args.output_dir + '/epoch{}'.format(args.start_epoch) 225 | os.makedirs(out_folder_eval, exist_ok = True) 226 | writer = SummaryWriter(out_folder_eval) 227 | acc1, idx_test, label_test = validate_with_dis(val_loader, model, criterion, args, writer, args.start_epoch, print_over_time = True) 228 | # save idx_test and label_test 229 | fname_idx = out_folder_eval + '/idx_test.torchSave' 230 | fname_pred = out_folder_eval + '/pred_test.torchSave' 231 | print("saving files to {}".format(fname_idx)) 232 | torch.save(idx_test, fname_idx) 233 | torch.save(label_test, fname_pred) 234 | print("finish saving") 235 | else: 236 | train_offline(args, model, criterion, optimizer, train_set, val_set, ngpus_per_node, writer) 237 | 238 | 239 | def init_model(args, ngpus_per_node): 240 | cell_ids = np.load(args.cell_id, allow_pickle = True) 241 | num_classes = cell_ids.size + 1 # remember to +1, label 0 means all other locations that are not covered by the labels 242 | 243 | print("init with num_classes = {}".format(num_classes)) 244 | 245 | # create model 246 | if args.pretrained: 247 | print("=> using pre-trained model '{}'".format(args.arch)) 248 | model = models.__dict__[args.arch](pretrained=True) 249 | else: 250 | print("=> creating model '{}'".format(args.arch)) 251 | print("batch size too small ({} per GPU), using syncBN".format(int(args.batch_size / ngpus_per_node))) 252 | model = models.__dict__[args.arch](num_classes = num_classes, norm_layer = nn.SyncBatchNorm) 253 | 254 | if not torch.cuda.is_available(): 255 | print('using CPU, this will be slow') 256 | elif args.distributed: 257 | # For multiprocessing distributed, DistributedDataParallel constructor 258 | # should always set the single device scope, otherwise, 259 | # DistributedDataParallel will use all available devices. 
260 | if args.gpu is not None: 261 | torch.cuda.set_device(args.gpu) 262 | model.cuda(args.gpu) 263 | # When using a single GPU per process and per 264 | # DistributedDataParallel, we need to divide the batch size 265 | # ourselves based on the total number of GPUs we have 266 | args.batch_size = int(args.batch_size / ngpus_per_node) 267 | args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node) 268 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu]) 269 | else: 270 | model.cuda() 271 | # DistributedDataParallel will divide and allocate batch_size to all 272 | # available GPUs if device_ids are not set 273 | model = torch.nn.parallel.DistributedDataParallel(model) 274 | elif args.gpu is not None: 275 | torch.cuda.set_device(args.gpu) 276 | model = model.cuda(args.gpu) 277 | else: 278 | # DataParallel will divide and allocate batch_size to all available GPUs 279 | if args.arch.startswith('alexnet') or args.arch.startswith('vgg'): 280 | model.features = torch.nn.DataParallel(model.features) 281 | model.cuda() 282 | else: 283 | model = torch.nn.DataParallel(model).cuda() 284 | 285 | # define loss function (criterion) and optimizer 286 | criterion = nn.CrossEntropyLoss().cuda(args.gpu) 287 | 288 | optimizer = torch.optim.SGD(model.parameters(), args.lr, 289 | momentum=args.momentum, 290 | weight_decay=args.weight_decay) 291 | 292 | return model, criterion, optimizer, num_classes 293 | 294 | 295 | def init_dataset(args): 296 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], 297 | std=[0.229, 0.224, 0.225]) 298 | 299 | trans = transforms.Compose([ 300 | transforms.RandomResizedCrop(224), 301 | transforms.RandomHorizontalFlip(), 302 | transforms.ToTensor(), 303 | normalize, 304 | ]) 305 | 306 | trans_test = transforms.Compose([ 307 | transforms.Resize(256), 308 | transforms.CenterCrop(224), 309 | transforms.ToTensor(), 310 | normalize, 311 | ]) 312 | 313 | print("data augmentation = {}; data augmentation for test batch = {}".format(trans, trans_test)) 314 | 315 | if args.evaluate: 316 | train_dataset = None 317 | val_dataset = YFCC_CL_Dataset_offline_val(args, 318 | transform = trans_test) 319 | else: 320 | train_dataset = YFCC_CL_Dataset_offline_train(args, 321 | transform = trans) 322 | val_dataset = YFCC_CL_Dataset_offline_val(args, 323 | transform = trans_test) 324 | 325 | return train_dataset, val_dataset 326 | 327 | 328 | def resume_offline(args, model, optimizer): 329 | print("=> loading checkpoint '{}'".format(args.resume)) 330 | if args.gpu is None: 331 | checkpoint = torch.load(args.resume) 332 | else: 333 | loc = 'cuda:{}'.format(args.gpu) 334 | checkpoint = torch.load(args.resume, map_location=loc) 335 | 336 | args.start_epoch = checkpoint['epoch'] 337 | best_acc1 = checkpoint['best_acc1'] 338 | 339 | model.load_state_dict(checkpoint['state_dict']) 340 | 341 | optimizer.load_state_dict(checkpoint['optimizer']) 342 | 343 | print("=> loaded checkpoint '{}' (epoch {})" 344 | .format(args.resume, checkpoint['epoch'])) 345 | 346 | return checkpoint['epoch'], checkpoint['best_acc1'] 347 | 348 | 349 | def init_val_loader(args, val_set): 350 | return torch.utils.data.DataLoader(val_set, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True) 351 | 352 | def validate_with_dis(val_loader, model, criterion, args, writer, epoch, print_over_time = False, writer_val = None): 353 | batch_time = AverageMeter('Time', ':6.3f') 354 | losses = AverageMeter('Loss', ':.4e') 355 | top1 = AverageMeter('Acc@1', ':6.2f') 
356 | top5 = AverageMeter('Acc@5', ':6.2f') 357 | 358 | top1_iter = AverageMeter('AccI@1', ':6.2f') 359 | top5_iter = AverageMeter('AccI@5', ':6.2f') 360 | 361 | progress = ProgressMeter( 362 | len(val_loader), 363 | [batch_time, losses, top1, top5, top1_iter, top5_iter], 364 | prefix='Test: ') 365 | 366 | if writer_val == None: 367 | writer_val = writer 368 | 369 | model.eval() 370 | second_per_week = 7*24*3600 371 | second_last = 0 372 | 373 | with torch.no_grad(): 374 | end = time.time() 375 | for i, (images, target, time_curr, idx) in enumerate(val_loader): 376 | if args.gpu is not None: 377 | images = images.cuda(args.gpu, non_blocking=True) 378 | target = target.cuda(args.gpu, non_blocking=True) 379 | idx = idx.cuda(args.gpu, non_blocking=True) 380 | # compute output 381 | output = model(images) 382 | loss = criterion(output, target) 383 | 384 | if i == 0: 385 | idx_test = idx.clone() 386 | _, pred_test = output.topk(5, dim = 1) 387 | else: 388 | idx_test = torch.cat((idx_test, idx.clone())) 389 | _, pred_tmp = output.topk(5, dim = 1) 390 | pred_test = torch.cat((pred_test, pred_tmp)) 391 | # measure accuracy and record loss 392 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 393 | losses.update(loss.item(), images.size(0)) 394 | top1.update(acc1[0], images.size(0)) 395 | top5.update(acc5[0], images.size(0)) 396 | top1_iter.update(acc1[0], images.size(0)) 397 | top5_iter.update(acc5[0], images.size(0)) 398 | 399 | # measure elapsed time 400 | batch_time.update(time.time() - end) 401 | end = time.time() 402 | 403 | if i % args.print_freq == 0: 404 | progress.display(i) 405 | sys.stdout.flush() 406 | 407 | if print_over_time and time_curr[-1] - second_last >= second_per_week: 408 | writer_val.add_scalar("val_acc1_avg", top1.avg, time_curr[-1]//second_per_week) 409 | writer_val.add_scalar("val_acc5_avg", top5.avg, time_curr[-1]//second_per_week) 410 | writer_val.add_scalar("val_acc1_perWeek", top1_iter.avg, time_curr[-1]//second_per_week) 411 | writer_val.add_scalar("val_acc5_perWeek", top5_iter.avg, time_curr[-1]//second_per_week) 412 | top1_iter.reset() 413 | top5_iter.reset() 414 | second_last = time_curr[-1] 415 | 416 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}' 417 | .format(top1=top1, top5=top5)) 418 | 419 | writer.add_scalar("val_loss", losses.avg, epoch) 420 | writer.add_scalar("val_acc1", top1.avg, epoch) 421 | return top1.avg, idx_test, pred_test 422 | 423 | def validate(val_loader, model, criterion, args, writer, epoch, print_over_time = False, writer_val = None): 424 | batch_time = AverageMeter('Time', ':6.3f') 425 | losses = AverageMeter('Loss', ':.4e') 426 | top1 = AverageMeter('Acc@1', ':6.2f') 427 | top5 = AverageMeter('Acc@5', ':6.2f') 428 | 429 | top1_iter = AverageMeter('AccI@1', ':6.2f') 430 | top5_iter = AverageMeter('AccI@5', ':6.2f') 431 | 432 | progress = ProgressMeter( 433 | len(val_loader), 434 | [batch_time, losses, top1, top5, top1_iter, top5_iter], 435 | prefix='Test: ') 436 | 437 | if writer_val == None: 438 | writer_val = writer 439 | 440 | model.eval() 441 | second_per_week = 7*24*3600 442 | second_last = 0 443 | 444 | with torch.no_grad(): 445 | end = time.time() 446 | for i, (images, target, time_curr, idx) in enumerate(val_loader): 447 | if args.gpu is not None: 448 | images = images.cuda(args.gpu, non_blocking=True) 449 | target = target.cuda(args.gpu, non_blocking=True) 450 | 451 | # compute output 452 | output = model(images) 453 | loss = criterion(output, target) 454 | 455 | output_all = global_gather(output) 456 | target_all = 
global_gather(target) 457 | 458 | # measure accuracy and record loss 459 | acc1, acc5 = accuracy(output_all, target_all, topk=(1, 5)) 460 | losses.update(loss.item(), images.size(0)) 461 | top1.update(acc1[0], images.size(0)) 462 | top5.update(acc5[0], images.size(0)) 463 | top1_iter.update(acc1[0], images.size(0)) 464 | top5_iter.update(acc5[0], images.size(0)) 465 | 466 | 467 | # measure elapsed time 468 | batch_time.update(time.time() - end) 469 | end = time.time() 470 | 471 | 472 | if i % args.print_freq == 0: 473 | progress.display(i) 474 | sys.stdout.flush() 475 | 476 | if print_over_time and time_curr[-1] - second_last >= second_per_week: 477 | writer_val.add_scalar("val_acc1_avg", top1.avg, time_curr[-1]//second_per_week) 478 | writer_val.add_scalar("val_acc5_avg", top5.avg, time_curr[-1]//second_per_week) 479 | writer_val.add_scalar("val_acc1_perWeek", top1_iter.avg, time_curr[-1]//second_per_week) 480 | writer_val.add_scalar("val_acc5_perWeek", top5_iter.avg, time_curr[-1]//second_per_week) 481 | top1_iter.reset() 482 | top5_iter.reset() 483 | second_last = time_curr[-1] 484 | 485 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}' 486 | .format(top1=top1, top5=top5)) 487 | 488 | # if args.gpu == 0: 489 | writer.add_scalar("val_loss", losses.avg, epoch) 490 | writer.add_scalar("val_acc1", top1.avg, epoch) 491 | return top1.avg 492 | 493 | def train_offline(args, model, criterion, optimizer, train_set, val_set, ngpus_per_node, writer): 494 | if args.resume: 495 | if os.path.isfile(args.resume): 496 | args.start_epoch, _ = resume_offline(args, model, optimizer) 497 | else: 498 | raise RuntimeError("=> no checkpoint found at '{}'".format(args.resume)) 499 | 500 | print("args.workers = {}".format(args.workers)) 501 | if args.use_val == 1: 502 | val_loader = torch.utils.data.DataLoader(val_set, 503 | batch_size=args.batch_size, shuffle=False, 504 | num_workers=args.workers, pin_memory=True) 505 | 506 | for epoch in range(args.start_epoch, args.epochs): 507 | train_loader, train_sampler = init_loader_v2(args, train_set, epoch) 508 | if args.distributed: 509 | train_sampler.set_epoch(0) 510 | 511 | if args.adjust_lr: 512 | adjust_learning_rate(optimizer, epoch, args, ngpus_per_node) 513 | 514 | writer.add_scalar("learning rate", get_lr(optimizer), epoch) 515 | 516 | # train for one epoch 517 | train_offline_one_iter(train_loader, model, criterion, optimizer, epoch, args, writer) 518 | 519 | if args.use_val == 1 and (epoch + 1) % args.val_freq == 0: 520 | # # evaluate on validation set 521 | os.makedirs(args.output_dir+"/eval_ep{}".format(epoch), exist_ok = True) 522 | writer_val = SummaryWriter(args.output_dir+"/eval_ep{}".format(epoch)) 523 | acc1 = validate(val_loader, model, criterion, args, writer_val, epoch, print_over_time = True, writer_val = writer_val) 524 | 525 | if not args.multiprocessing_distributed or (args.multiprocessing_distributed 526 | and args.rank % ngpus_per_node == 0): 527 | if epoch % 10 == 0: 528 | save_checkpoint({ 529 | 'epoch': epoch + 1, 530 | 'arch': args.arch, 531 | 'state_dict': model.state_dict(), 532 | 'best_acc1': 0, # currently not used 533 | 'optimizer' : optimizer.state_dict() 534 | }, 0, output_dir = args.output_dir, filename = 'checkpoint_ep{}.pth.tar'.format(epoch)) 535 | else: 536 | save_checkpoint({ 537 | 'epoch': epoch + 1, 538 | 'arch': args.arch, 539 | 'state_dict': model.state_dict(), 540 | 'best_acc1': 0, # currently not used 541 | 'optimizer' : optimizer.state_dict() 542 | }, 0, output_dir = args.output_dir) 543 | 544 | val_loader = 
init_val_loader(args, val_set)
    acc1 = validate(val_loader, model, criterion, args, writer, args.epochs, print_over_time = True, writer_val = writer)
    print("final average accuracy = {}".format(acc1))


def init_loader_v2(args, train_set, epoch):

    train_set._change_data_range()
    batchSize4Loader = args.batch_size

    if args.distributed:
        # we already have randomness during change_data_range()
        train_sampler = torch.utils.data.distributed.DistributedSampler(train_set, shuffle = False)
    else:
        train_sampler = None

    # train_sampler is None in the non-distributed case, so only report
    # its shuffle flag when a sampler actually exists
    if train_sampler is not None:
        print("sampler.shuffle = {}".format(train_sampler.shuffle))

    print("[initLoaderv2]: batchSize4Loader = {}".format(batchSize4Loader))
    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=batchSize4Loader, shuffle=(train_sampler is None),
        num_workers=args.workers, pin_memory=True, sampler=train_sampler)

    return train_loader, train_sampler



def train_offline_one_iter(train_loader, model, criterion, optimizer, epoch, args, writer):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')

    progress = ProgressMeter(
        len(train_loader),
        [batch_time, data_time, losses, top1, top5],
        prefix="Epoch: [{}]".format(epoch))

    model.train()
    end = time.time()

    optimizer.zero_grad()

    for i, (images, target, _, index) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        iter_curr = epoch*len(train_loader)+i

        batch_size = target.size(0)

        if args.gpu is not None:
            images = images.cuda(args.gpu, non_blocking=True)
            target = target.cuda(args.gpu, non_blocking=True)

        output = model(images)
        loss = criterion(output, target)

        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), images.size(0))
        top1.update(acc1[0], images.size(0))
        top5.update(acc5[0], images.size(0))

        loss.backward()

        if i % args.print_freq == 0:
            progress.display(i)

        if i % args.write_freq == 0:
            writer.add_scalar("train_loss_iter", losses.avg, iter_curr)
            writer.add_scalar("train_acc1_iter", top1.avg, iter_curr)
            writer.add_scalar("train_acc5_iter", top5.avg, iter_curr)

        optimizer.step() # update parameters of net
        optimizer.zero_grad() # reset gradient

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

    # if args.gpu == 0:
    writer.add_scalar("train_loss", losses.avg, epoch)
    writer.add_scalar("train_acc1", top1.avg, epoch)

def global_gather(x):
    all_x = [torch.ones_like(x)
             for _ in range(dist.get_world_size())]
    dist.all_gather(all_x, x, async_op=False)
    return torch.cat(all_x, dim=0)


def save_checkpoint(state, is_best, output_dir = '.', filename='checkpoint.pth.tar'):
    torch.save(state, output_dir+'/checkpoint.pth.tar')
    if filename != 'checkpoint.pth.tar':
        shutil.copyfile(output_dir+'/checkpoint.pth.tar', output_dir+'/'+filename)


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name,
fmt=':f'): 646 | self.name = name 647 | self.fmt = fmt 648 | self.reset() 649 | 650 | def reset(self): 651 | self.val = 0 652 | self.avg = 0 653 | self.sum = 0 654 | self.count = 0 655 | 656 | def update(self, val, n=1): 657 | self.val = val 658 | self.sum += val * n 659 | self.count += n 660 | self.avg = self.sum / self.count 661 | 662 | def __str__(self): 663 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 664 | return fmtstr.format(**self.__dict__) 665 | 666 | 667 | class ProgressMeter(object): 668 | def __init__(self, num_batches, meters, prefix=""): 669 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 670 | self.meters = meters 671 | self.prefix = prefix 672 | 673 | def display(self, batch): 674 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 675 | entries += [str(meter) for meter in self.meters] 676 | # for test only 677 | print('\t'.join(entries)) 678 | 679 | def _get_batch_fmtstr(self, num_batches): 680 | num_digits = len(str(num_batches // 1)) 681 | fmt = '{:' + str(num_digits) + 'd}' 682 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 683 | 684 | 685 | 686 | def get_lr(optimizer): 687 | for param_group in optimizer.param_groups: 688 | return param_group['lr'] 689 | 690 | 691 | def adjust_learning_rate(optimizer, epoch, args, ngpus_per_node, iter_curr = 0): 692 | lr = args.lr 693 | 694 | lr *= 0.5 * (1. + math.cos(math.pi * epoch / args.epochs)) 695 | lr = max(args.min_lr, lr) 696 | # print("lr = {}".format(lr)) 697 | 698 | for param_group in optimizer.param_groups: 699 | param_group['lr'] = lr 700 | 701 | 702 | def accuracy(output, target, topk=(1,), cross_GPU = False): 703 | """Computes the accuracy over the k top predictions for the specified values of k""" 704 | with torch.no_grad(): 705 | if cross_GPU: 706 | output = global_gather(output) 707 | target = global_gather(target) 708 | 709 | maxk = max(topk) 710 | batch_size = target.size(0) 711 | 712 | _, pred = output.topk(maxk, 1, True, True) 713 | pred = pred.t() 714 | correct = pred.eq(target.reshape(1, -1).expand_as(pred)) 715 | 716 | res = [] 717 | for k in topk: 718 | correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True) 719 | res.append(correct_k.mul_(100.0 / batch_size)) 720 | return res 721 | 722 | 723 | def create_output_dir(args): 724 | 725 | output_dir = 'results_offline/'+args.arch+'_lr{}_epochs{}'.format(args.lr, args.epochs)+'_numPasses{}'.format(args.num_passes)+'_useVal{}'.format(args.use_val)+'_valFreq{}'.format(args.val_freq) 726 | 727 | output_dir += '_BS{}'.format(args.batch_size) 728 | 729 | if args.used_data_rate_start != 0.0: 730 | output_dir += "_start{}".format(args.used_data_rate_start) 731 | 732 | if args.used_data_rate_end != 1.0: 733 | output_dir += "_end{}".format(args.used_data_rate_end) 734 | 735 | if args.weight_decay != 1e-4: 736 | output_dir += '_WD{}'.format(args.weight_decay) 737 | 738 | if args.min_lr > 0.0: 739 | output_dir += '_minLr{}'.format(args.min_lr) 740 | 741 | if args.evaluate: 742 | output_dir += '/evaluate' 743 | 744 | return output_dir 745 | 746 | if __name__ == '__main__': 747 | main() -------------------------------------------------------------------------------- /CLOC/code_offline/yfcc100m_dataset.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torchvision.datasets.vision import StandardTransform 3 | from torch.utils.data import Dataset, IterableDataset 4 | from PIL import Image 5 | 6 | import os 7 | import os.path 8 | import csv 9 | 10 | import sys 11 | 
import math 12 | import random 13 | import matplotlib.pyplot as plt 14 | 15 | 16 | def has_file_allowed_extension(filename, extensions): 17 | """Checks if a file is an allowed extension. 18 | 19 | Args: 20 | filename (string): path to a file 21 | extensions (tuple of strings): extensions to consider (lowercase) 22 | 23 | Returns: 24 | bool: True if the filename ends with one of given extensions 25 | """ 26 | return filename.lower().endswith(extensions) 27 | 28 | 29 | def is_image_file(filename): 30 | """Checks if a file is an allowed image extension. 31 | 32 | Args: 33 | filename (string): path to a file 34 | 35 | Returns: 36 | bool: True if the filename ends with a known image extension 37 | """ 38 | return has_file_allowed_extension(filename, IMG_EXTENSIONS) 39 | 40 | 41 | def make_dataset(directory, class_to_idx, extensions=None, is_valid_file=None): 42 | instances = [] 43 | directory = os.path.expanduser(directory) 44 | both_none = extensions is None and is_valid_file is None 45 | both_something = extensions is not None and is_valid_file is not None 46 | if both_none or both_something: 47 | raise ValueError("Both extensions and is_valid_file cannot be None or not None at the same time") 48 | if extensions is not None: 49 | def is_valid_file(x): 50 | return has_file_allowed_extension(x, extensions) 51 | for target_class in sorted(class_to_idx.keys()): 52 | class_index = class_to_idx[target_class] 53 | target_dir = os.path.join(directory, target_class) 54 | if not os.path.isdir(target_dir): 55 | continue 56 | for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)): 57 | for fname in sorted(fnames): 58 | path = os.path.join(root, fname) 59 | if is_valid_file(path): 60 | item = path, class_index 61 | instances.append(item) 62 | return instances 63 | 64 | 65 | def pil_loader(path): 66 | # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835) 67 | with open(path, 'rb') as f: 68 | img = Image.open(f) 69 | return img.convert('RGB') 70 | 71 | 72 | def accimage_loader(path): 73 | import accimage 74 | try: 75 | return accimage.Image(path) 76 | except IOError: 77 | # Potentially a decoding problem, fall back to PIL.Image 78 | return pil_loader(path) 79 | 80 | 81 | def default_loader(path): 82 | from torchvision import get_image_backend 83 | if get_image_backend() == 'accimage': 84 | return accimage_loader(path) 85 | else: 86 | return pil_loader(path) 87 | 88 | 89 | IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp') 90 | 91 | 92 | 93 | class YFCC_CL_Dataset_offline_val(Dataset): 94 | def __init__(self, args, loader = default_loader, extensions=IMG_EXTENSIONS, transform=None, 95 | target_transform=None): 96 | 97 | fname = args.data_val 98 | root = args.root 99 | 100 | print("YFCC_CL dataset loader = {}; extensions = {}".format(loader, extensions)) 101 | 102 | sys.stdout.flush() 103 | 104 | if isinstance(fname, torch._six.string_classes): 105 | fname = os.path.expanduser(fname) 106 | self.fname = fname 107 | 108 | self.transform = transform 109 | self.target_transform = target_transform 110 | 111 | self.labels, self.time_taken, self.user, self.store_loc = self._make_data(self.fname, root = root) 112 | if len(self.labels) == 0: 113 | msg = "Found 0 files in subfolders of: {}\n".format(self.fname) 114 | if extensions is not None: 115 | msg += "Supported extensions are: {}".format(",".join(extensions)) 116 | raise RuntimeError(msg) 117 | 118 | self.loader = loader 119 | self.extensions = extensions 120 | 
self.root = root 121 | self.batch_size = torch.cuda.device_count()*args.batch_size 122 | print("root = {}; time_taken (an example) = {}; time_taken.len = {}; batch_size = {}".format(root, self.time_taken[1000], len(self.time_taken), self.batch_size)) 123 | 124 | def _make_data(self, fname, root): 125 | # read data 126 | fval = open(fname, 'r') 127 | lines_val = fval.readlines() 128 | labels = [None] * len(lines_val) 129 | time = [None] * len(lines_val) 130 | user = [None] * len(lines_val) 131 | store_loc = [None] * len(lines_val) 132 | 133 | for i in range(len(lines_val)): 134 | line_splitted = lines_val[i].split(",") 135 | labels[i] = int(line_splitted[0]) 136 | time[i] = int(line_splitted[2]) 137 | user[i] = line_splitted[3] 138 | store_loc[i] = line_splitted[-1][:-1] 139 | return labels, time, user, store_loc 140 | 141 | def __getitem__(self, index): 142 | if self.root is not None: 143 | path = self.root + self.store_loc[index] 144 | else: 145 | path = self.store_loc[index] 146 | sample = self.loader(path) 147 | if self.transform is not None: 148 | sample = self.transform(sample) 149 | 150 | return sample, self.labels[index], self.time_taken[index], index 151 | 152 | 153 | def __len__(self): 154 | return len(self.labels) 155 | 156 | 157 | 158 | class YFCC_CL_Dataset_offline_train(Dataset): 159 | def __init__(self, args, loader = default_loader, extensions=IMG_EXTENSIONS, transform=None, 160 | target_transform=None): 161 | 162 | fname = args.data 163 | root = args.root 164 | 165 | print("YFCC_CL dataset loader = {}; extensions = {}".format(loader, extensions)) 166 | 167 | sys.stdout.flush() 168 | 169 | if isinstance(fname, torch._six.string_classes): 170 | fname = os.path.expanduser(fname) 171 | self.fname = fname 172 | 173 | # for backwards-compatibility 174 | self.transform = transform 175 | self.target_transform = target_transform 176 | 177 | self._make_data() 178 | self.used_data_start = int(len(self.labels)*args.used_data_rate_start) 179 | self.used_data_end = min(len(self.labels), int(len(self.labels)*args.used_data_rate_end)) 180 | self.data_size = self.used_data_end-self.used_data_start + 1 181 | self.data_size_per_epoch = int(args.num_passes*self.data_size)//int(args.epochs) 182 | 183 | if len(self.labels) == 0: 184 | msg = "Found 0 files in subfolders of: {}\n".format(self.fname) 185 | if extensions is not None: 186 | msg += "Supported extensions are: {}".format(",".join(extensions)) 187 | raise RuntimeError(msg) 188 | 189 | self.loader = loader 190 | self.extensions = extensions 191 | self.root = root 192 | self.batch_size = torch.cuda.device_count()*args.batch_size 193 | print("root = {}; time_taken (an example) = {}; time_taken.len = {}; batch_size = {}; self.used_data_start = {}; self.used_data_end = {}; self.data_size_per_epoch = {}".format(root, self.time[1000], len(self.time), self.batch_size, self.used_data_start, self.used_data_end, self.data_size_per_epoch)) 194 | 195 | def _make_data(self): 196 | # only read labels and times to save storage 197 | self.labels = torch.load(self.fname+'train_labels.torchSave') 198 | # labels = [None] * len(labels) 199 | self.time = torch.load(self.fname+'train_time.torchSave') 200 | self.user = [None] * len(self.labels) 201 | self.store_loc = [None] * len(self.labels) 202 | self.idx_data = [] 203 | 204 | 205 | def _change_data_range(self, idx_data = None): 206 | if idx_data is None: 207 | # generate idx_data 208 | idx_data = self.used_data_start+torch.randperm(self.data_size)[:self.data_size_per_epoch] 209 | self.idx_data = idx_data 210 
| # read user and store_locs 211 | self.user = [None] * len(self.labels) 212 | self.store_loc = [None] * len(self.labels) 213 | 214 | tmp_user = torch.load(self.fname+'train_user.torchSave') 215 | tmp_loc = torch.load(self.fname+'train_store_loc.torchSave') 216 | print("change data range to {}/{}".format(self.idx_data.min(), self.idx_data.max())) 217 | for i in range(len(self.idx_data)): 218 | self.idx_data[i] = min(len(self.labels)-1, self.idx_data[i]) 219 | self.user[self.idx_data[i]] = tmp_user[self.idx_data[i]] 220 | self.store_loc[self.idx_data[i]] = tmp_loc[self.idx_data[i]][:-1] 221 | 222 | def __getitem__(self, index): 223 | if self.root is not None: 224 | path = self.root + self.store_loc[self.idx_data[index]] 225 | else: 226 | path = self.store_loc[self.idx_data[index]] 227 | sample = self.loader(path) 228 | 229 | 230 | if self.transform is not None: 231 | sample = self.transform(sample) 232 | 233 | return sample, self.labels[self.idx_data[index]], self.time[self.idx_data[index]], self.idx_data[index] 234 | 235 | 236 | def __len__(self): 237 | return len(self.idx_data) 238 | 239 | 240 | -------------------------------------------------------------------------------- /CLOC/code_online/best_model/__pycache__/yfcc100m_dataset.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/best_model/__pycache__/yfcc100m_dataset.cpython-37.pyc -------------------------------------------------------------------------------- /CLOC/code_online/best_model/__pycache__/yfcc100m_dataset.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/best_model/__pycache__/yfcc100m_dataset.cpython-39.pyc -------------------------------------------------------------------------------- /CLOC/code_online/best_model/groupNorm/__pycache__/group_norm.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/best_model/groupNorm/__pycache__/group_norm.cpython-37.pyc -------------------------------------------------------------------------------- /CLOC/code_online/best_model/groupNorm/__pycache__/group_norm.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/best_model/groupNorm/__pycache__/group_norm.cpython-39.pyc -------------------------------------------------------------------------------- /CLOC/code_online/best_model/groupNorm/__pycache__/resnet.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/best_model/groupNorm/__pycache__/resnet.cpython-37.pyc -------------------------------------------------------------------------------- /CLOC/code_online/best_model/groupNorm/__pycache__/resnet.cpython-39.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/best_model/groupNorm/__pycache__/resnet.cpython-39.pyc -------------------------------------------------------------------------------- /CLOC/code_online/best_model/groupNorm/group_norm.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | from torch.nn.modules.batchnorm import _BatchNorm 4 | 5 | 6 | def group_norm(input, group, running_mean, running_var, weight=None, bias=None, 7 | use_input_stats=True, momentum=0.1, eps=1e-5): 8 | r"""Applies Group Normalization for channels in the same group in each data sample in a 9 | batch. 10 | 11 | See :class:`~torch.nn.GroupNorm1d`, :class:`~torch.nn.GroupNorm2d`, 12 | :class:`~torch.nn.GroupNorm3d` for details. 13 | """ 14 | if not use_input_stats and (running_mean is None or running_var is None): 15 | raise ValueError('Expected running_mean and running_var to be not None when use_input_stats=False') 16 | 17 | b, c = input.size(0), input.size(1) 18 | if weight is not None: 19 | weight = weight.repeat(b) 20 | if bias is not None: 21 | bias = bias.repeat(b) 22 | 23 | def _instance_norm(input, group, running_mean=None, running_var=None, weight=None, 24 | bias=None, use_input_stats=None, momentum=None, eps=None): 25 | # Repeat stored stats and affine transform params if necessary 26 | if running_mean is not None: 27 | running_mean_orig = running_mean 28 | running_mean = running_mean_orig.repeat(b) 29 | if running_var is not None: 30 | running_var_orig = running_var 31 | running_var = running_var_orig.repeat(b) 32 | 33 | #norm_shape = [1, b * c / group, group] 34 | #print(norm_shape) 35 | # Apply instance norm 36 | input_reshaped = input.contiguous().view(1, int(b * c/group), group, *input.size()[2:]) 37 | 38 | out = F.batch_norm( 39 | input_reshaped, running_mean, running_var, weight=weight, bias=bias, 40 | training=use_input_stats, momentum=momentum, eps=eps) 41 | 42 | # Reshape back 43 | if running_mean is not None: 44 | running_mean_orig.copy_(running_mean.view(b, int(c/group)).mean(0, keepdim=False)) 45 | if running_var is not None: 46 | running_var_orig.copy_(running_var.view(b, int(c/group)).mean(0, keepdim=False)) 47 | 48 | return out.view(b, c, *input.size()[2:]) 49 | return _instance_norm(input, group, running_mean=running_mean, 50 | running_var=running_var, weight=weight, bias=bias, 51 | use_input_stats=use_input_stats, momentum=momentum, 52 | eps=eps) 53 | 54 | 55 | class _GroupNorm(_BatchNorm): 56 | def __init__(self, num_features, num_groups=1, eps=1e-5, momentum=0.1, 57 | affine=False, track_running_stats=False): 58 | self.num_groups = num_groups 59 | self.track_running_stats = track_running_stats 60 | super(_GroupNorm, self).__init__(int(num_features/num_groups), eps, 61 | momentum, affine, track_running_stats) 62 | 63 | def _check_input_dim(self, input): 64 | return NotImplemented 65 | 66 | def forward(self, input): 67 | self._check_input_dim(input) 68 | 69 | return group_norm( 70 | input, self.num_groups, self.running_mean, self.running_var, self.weight, self.bias, 71 | self.training or not self.track_running_stats, self.momentum, self.eps) 72 | 73 | 74 | class GroupNorm2d(_GroupNorm): 75 | r"""Applies Group Normalization over a 4D input (a mini-batch of 2D inputs 76 | with additional channel dimension) as described in the paper 77 | https://arxiv.org/pdf/1803.08494.pdf 78 | `Group Normalization`_ . 

    Args:
        num_features: :math:`C` from an expected input of size
            :math:`(N, C, H, W)`
        num_groups:
        eps: a value added to the denominator for numerical stability. Default: 1e-5
        momentum: the value used for the running_mean and running_var computation. Default: 0.1
        affine: a boolean value that when set to ``True``, this module has
            learnable affine parameters. Default: ``False``
        track_running_stats: a boolean value that when set to ``True``, this
            module tracks the running mean and variance, and when set to ``False``,
            this module does not track such statistics and always uses batch
            statistics in both training and eval modes. Default: ``False``

    Shape:
        - Input: :math:`(N, C, H, W)`
        - Output: :math:`(N, C, H, W)` (same shape as input)

    Examples:
        >>> # Without Learnable Parameters
        >>> m = GroupNorm2d(100, 4)
        >>> # With Learnable Parameters
        >>> m = GroupNorm2d(100, 4, affine=True)
        >>> input = torch.randn(20, 100, 35, 45)
        >>> output = m(input)

    """

    def _check_input_dim(self, input):
        if input.dim() != 4:
            raise ValueError('expected 4D input (got {}D input)'
                             .format(input.dim()))


class GroupNorm3d(_GroupNorm):
    """
    Assume the data format is (B, C, D, H, W)
    """
    def _check_input_dim(self, input):
        if input.dim() != 5:
            raise ValueError('expected 5D input (got {}D input)'
                             .format(input.dim()))


--------------------------------------------------------------------------------
/CLOC/code_online/best_model/groupNorm/resnet.py:
--------------------------------------------------------------------------------
import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo
from .group_norm import GroupNorm2d

__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
           'resnet152']


model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

def norm2d(planes, num_channels_per_group=32):
    print("num_channels_per_group:{}".format(num_channels_per_group))
    if num_channels_per_group > 0:
        return GroupNorm2d(planes, num_channels_per_group, affine=True,
                           track_running_stats=False)
    else:
        return nn.BatchNorm2d(planes)


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None,
                 group_norm=0):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm2d(planes, group_norm)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm2d(planes, group_norm)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
55 | out = self.bn2(out) 56 | 57 | if self.downsample is not None: 58 | residual = self.downsample(x) 59 | 60 | out += residual 61 | out = self.relu(out) 62 | 63 | return out 64 | 65 | 66 | class Bottleneck(nn.Module): 67 | expansion = 4 68 | 69 | def __init__(self, inplanes, planes, stride=1, downsample=None, 70 | group_norm=0): 71 | super(Bottleneck, self).__init__() 72 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 73 | self.bn1 = norm2d(planes, group_norm) 74 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 75 | padding=1, bias=False) 76 | self.bn2 = norm2d(planes, group_norm) 77 | self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) 78 | self.bn3 = norm2d(planes * 4, group_norm) 79 | self.relu = nn.ReLU(inplace=True) 80 | self.downsample = downsample 81 | self.stride = stride 82 | 83 | def forward(self, x): 84 | residual = x 85 | 86 | out = self.conv1(x) 87 | out = self.bn1(out) 88 | out = self.relu(out) 89 | 90 | out = self.conv2(out) 91 | out = self.bn2(out) 92 | out = self.relu(out) 93 | 94 | out = self.conv3(out) 95 | out = self.bn3(out) 96 | 97 | if self.downsample is not None: 98 | residual = self.downsample(x) 99 | 100 | out += residual 101 | out = self.relu(out) 102 | 103 | return out 104 | 105 | 106 | class ResNet(nn.Module): 107 | 108 | def __init__(self, block, layers, num_classes=1000, group_norm=0): 109 | self.inplanes = 64 110 | super(ResNet, self).__init__() 111 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 112 | bias=False) 113 | self.bn1 = norm2d(64, group_norm) 114 | self.relu = nn.ReLU(inplace=True) 115 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 116 | self.layer1 = self._make_layer(block, 64, layers[0], 117 | group_norm=group_norm) 118 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2, 119 | group_norm=group_norm) 120 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2, 121 | group_norm=group_norm) 122 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2, 123 | group_norm=group_norm) 124 | self.avgpool = nn.AvgPool2d(7, stride=1) 125 | self.fc = nn.Linear(512 * block.expansion, num_classes) 126 | 127 | for m in self.modules(): 128 | if isinstance(m, nn.Conv2d): 129 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 130 | m.weight.data.normal_(0, math.sqrt(2. 
/ n)) 131 | elif isinstance(m, nn.BatchNorm2d): 132 | m.weight.data.fill_(1) 133 | m.bias.data.zero_() 134 | elif isinstance(m, GroupNorm2d): 135 | m.weight.data.fill_(1) 136 | m.bias.data.zero_() 137 | 138 | for m in self.modules(): 139 | if isinstance(m, Bottleneck): 140 | m.bn3.weight.data.fill_(0) 141 | if isinstance(m, BasicBlock): 142 | m.bn2.weight.data.fill_(0) 143 | 144 | 145 | def _make_layer(self, block, planes, blocks, stride=1, group_norm=0): 146 | downsample = None 147 | if stride != 1 or self.inplanes != planes * block.expansion: 148 | downsample = nn.Sequential( 149 | nn.Conv2d(self.inplanes, planes * block.expansion, 150 | kernel_size=1, stride=stride, bias=False), 151 | norm2d(planes * block.expansion, group_norm), 152 | ) 153 | 154 | layers = [] 155 | layers.append(block(self.inplanes, planes, stride, downsample, 156 | group_norm)) 157 | self.inplanes = planes * block.expansion 158 | for i in range(1, blocks): 159 | layers.append(block(self.inplanes, planes, group_norm=group_norm)) 160 | 161 | return nn.Sequential(*layers) 162 | 163 | def forward(self, x): 164 | x = self.conv1(x) 165 | x = self.bn1(x) 166 | x = self.relu(x) 167 | x = self.maxpool(x) 168 | 169 | x = self.layer1(x) 170 | x = self.layer2(x) 171 | x = self.layer3(x) 172 | x = self.layer4(x) 173 | 174 | x = self.avgpool(x) 175 | x = x.view(x.size(0), -1) 176 | x = self.fc(x) 177 | 178 | return x 179 | 180 | 181 | def resnet18(pretrained=False, **kwargs): 182 | """Constructs a ResNet-18 model. 183 | 184 | Args: 185 | pretrained (bool): If True, returns a model pre-trained on ImageNet 186 | """ 187 | model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs) 188 | if pretrained: 189 | model.load_state_dict(model_zoo.load_url(model_urls['resnet18'])) 190 | return model 191 | 192 | 193 | def resnet34(pretrained=False, **kwargs): 194 | """Constructs a ResNet-34 model. 195 | 196 | Args: 197 | pretrained (bool): If True, returns a model pre-trained on ImageNet 198 | """ 199 | model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs) 200 | if pretrained: 201 | model.load_state_dict(model_zoo.load_url(model_urls['resnet34'])) 202 | return model 203 | 204 | 205 | def resnet50(pretrained=False, **kwargs): 206 | """Constructs a ResNet-50 model. 207 | 208 | Args: 209 | pretrained (bool): If True, returns a model pre-trained on ImageNet 210 | """ 211 | model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs) 212 | if pretrained: 213 | model.load_state_dict(model_zoo.load_url(model_urls['resnet50'])) 214 | return model 215 | 216 | 217 | def resnet101(pretrained=False, **kwargs): 218 | """Constructs a ResNet-101 model. 219 | 220 | Args: 221 | pretrained (bool): If True, returns a model pre-trained on ImageNet 222 | """ 223 | model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs) 224 | if pretrained: 225 | model.load_state_dict(model_zoo.load_url(model_urls['resnet101'])) 226 | return model 227 | 228 | 229 | def resnet152(pretrained=False, **kwargs): 230 | """Constructs a ResNet-152 model. 
231 | 232 | Args: 233 | pretrained (bool): If True, returns a model pre-trained on ImageNet 234 | """ 235 | model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs) 236 | if pretrained: 237 | model.load_state_dict(model_zoo.load_url(model_urls['resnet152'])) 238 | return model 239 | -------------------------------------------------------------------------------- /CLOC/code_online/best_model/yfcc100m_dataset.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torchvision.datasets.vision import StandardTransform 3 | from torch.utils.data import Dataset, IterableDataset 4 | from PIL import Image 5 | 6 | import os 7 | import os.path 8 | import csv 9 | 10 | import sys 11 | import math 12 | import random 13 | import matplotlib.pyplot as plt 14 | 15 | 16 | def has_file_allowed_extension(filename, extensions): 17 | """Checks if a file is an allowed extension. 18 | 19 | Args: 20 | filename (string): path to a file 21 | extensions (tuple of strings): extensions to consider (lowercase) 22 | 23 | Returns: 24 | bool: True if the filename ends with one of given extensions 25 | """ 26 | return filename.lower().endswith(extensions) 27 | 28 | 29 | def is_image_file(filename): 30 | """Checks if a file is an allowed image extension. 31 | 32 | Args: 33 | filename (string): path to a file 34 | 35 | Returns: 36 | bool: True if the filename ends with a known image extension 37 | """ 38 | return has_file_allowed_extension(filename, IMG_EXTENSIONS) 39 | 40 | 41 | def make_dataset(directory, class_to_idx, extensions=None, is_valid_file=None): 42 | instances = [] 43 | directory = os.path.expanduser(directory) 44 | both_none = extensions is None and is_valid_file is None 45 | both_something = extensions is not None and is_valid_file is not None 46 | if both_none or both_something: 47 | raise ValueError("Both extensions and is_valid_file cannot be None or not None at the same time") 48 | if extensions is not None: 49 | def is_valid_file(x): 50 | return has_file_allowed_extension(x, extensions) 51 | for target_class in sorted(class_to_idx.keys()): 52 | class_index = class_to_idx[target_class] 53 | target_dir = os.path.join(directory, target_class) 54 | if not os.path.isdir(target_dir): 55 | continue 56 | for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)): 57 | for fname in sorted(fnames): 58 | path = os.path.join(root, fname) 59 | if is_valid_file(path): 60 | item = path, class_index 61 | instances.append(item) 62 | return instances 63 | 64 | 65 | def pil_loader(path): 66 | # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835) 67 | with open(path, 'rb') as f: 68 | img = Image.open(f) 69 | return img.convert('RGB') 70 | 71 | 72 | def accimage_loader(path): 73 | import accimage 74 | try: 75 | return accimage.Image(path) 76 | except IOError: 77 | # Potentially a decoding problem, fall back to PIL.Image 78 | return pil_loader(path) 79 | 80 | 81 | def default_loader(path): 82 | from torchvision import get_image_backend 83 | if get_image_backend() == 'accimage': 84 | return accimage_loader(path) 85 | else: 86 | return pil_loader(path) 87 | 88 | 89 | IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp') 90 | 91 | 92 | 93 | class YFCC_CL_Dataset_offline_val(Dataset): 94 | """A data loader for YFCC100M dataset 95 | """ 96 | def __init__(self, args, loader = default_loader, extensions=IMG_EXTENSIONS, transform=None, 97 | target_transform=None): 98 | 99 | fname = args.data_val 
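# Note: args.data_val points at a CSV metadata file; as _make_data below shows,
# each row stores the class label (column 0), the capture timestamp (column 2),
# the user ID (column 3) and, in the last column, the image path relative to args.root.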
100 | root = args.root
101 | 
102 | print("YFCC_CL dataset loader = {}; extensions = {}".format(loader, extensions))
103 | 
104 | sys.stdout.flush()
105 | 
106 | if isinstance(fname, torch._six.string_classes):
107 | fname = os.path.expanduser(fname)
108 | self.fname = fname
109 | 
110 | # for backwards-compatibility
111 | self.transform = transform
112 | self.target_transform = target_transform
113 | 
114 | self.labels, self.time_taken, self.user, self.store_loc = self._make_data(self.fname, root = root)
115 | if len(self.labels) == 0:
116 | msg = "Found 0 files in subfolders of: {}\n".format(self.fname)
117 | if extensions is not None:
118 | msg += "Supported extensions are: {}".format(",".join(extensions))
119 | raise RuntimeError(msg)
120 | 
121 | self.loader = loader
122 | self.extensions = extensions
123 | self.root = root
124 | 
125 | # for replay buffer
126 | self.batch_size = torch.cuda.device_count()*args.batch_size
127 | self.is_forward = True
128 | self.offset = 0
129 | print("root = {}; time_taken (an example) = {}; time_taken.len = {}; batch_size = {}".format(root, self.time_taken[1000], len(self.time_taken), self.batch_size))
130 | 
131 | def _make_data(self, fname, root):
132 | # read data (use a context manager so the file is closed promptly)
133 | with open(fname, 'r') as fval:
134 | lines_val = fval.readlines()
135 | labels = [None] * len(lines_val)
136 | time = [None] * len(lines_val)
137 | user = [None] * len(lines_val)
138 | store_loc = [None] * len(lines_val)
139 | 
140 | for i in range(len(lines_val)):
141 | line_splitted = lines_val[i].split(",")
142 | labels[i] = int(line_splitted[0])
143 | time[i] = int(line_splitted[2])
144 | user[i] = line_splitted[3]
145 | store_loc[i] = line_splitted[-1][:-1] # drop the trailing newline
146 | return labels, time, user, store_loc
147 | 
148 | def set_transfer_time_point(self, args, val_set, time_last, is_forward = True):
149 | # find the first idx whose timestamp is >= time_last
150 | self.is_forward = is_forward
151 | for i in range(len(self.time_taken)):
152 | if self.time_taken[i] >= time_last:
153 | print("[set_transfer_time_point]: time_last = {}; time[{}] = {}".format(time_last, i, self.time_taken[i]))
154 | self.offset = i
155 | return
156 | self.offset = len(self.time_taken) - 1
157 | 
158 | def __getitem__(self, index):
159 | if self.is_forward:
160 | index = min(len(self.labels) - 1, index + self.offset) # clamp to the last valid index
161 | else:
162 | index = max(0, self.offset - index)
163 | 
164 | if self.root is not None:
165 | path = self.root + self.store_loc[index]
166 | else:
167 | path = self.store_loc[index]
168 | sample = self.loader(path)
169 | if self.transform is not None:
170 | sample = self.transform(sample)
171 | 
172 | return sample, self.labels[index], self.time_taken[index], index
173 | 
174 | 
175 | def __len__(self):
176 | if self.is_forward:
177 | return len(self.labels) - self.offset
178 | else:
179 | return self.offset + 1
180 | 
181 | 
182 | class YFCC_CL_Dataset_online(Dataset):
183 | 
184 | def __init__(self, args, loader = default_loader, extensions=IMG_EXTENSIONS, transform=None, transform_RepBuf = None,
185 | target_transform=None, target_transform_RepBuf = None, trans_test = None):
186 | 
187 | fname = args.data
188 | root = args.root
189 | size_buf = args.size_replay_buffer
190 | 
191 | print("YFCC_CL dataset loader = {}; extensions = {}".format(loader, extensions))
192 | 
193 | sys.stdout.flush()
194 | 
195 | if isinstance(fname, torch._six.string_classes):
196 | fname = os.path.expanduser(fname)
197 | self.fname = fname
198 | 
199 | # for backwards-compatibility
200 | self.transform = transform
201 | self.transform_test = trans_test
202 | self.target_transform = target_transform
203 | self.transform_RepBuf = transform_RepBuf
204 | self.target_transform_RepBuf = target_transform_RepBuf
205 | 
206 | # valid initial and final index
207 | self.used_data_start = args.used_data_start
208 | self.used_data_end = args.used_data_end
209 | 
210 | self._make_data()
211 | self.data_size = len(self.labels)
212 | self.data_size_per_epoch = math.ceil(self.data_size/args.epochs) # an "epoch" is one consecutive chunk of the stream, not a full pass
213 | 
214 | self.batch_size = torch.cuda.device_count()*args.batch_size
215 | 
216 | # round the per-epoch data size up to a multiple of the total batch size
217 | if self.data_size_per_epoch % self.batch_size != 0:
218 | self.data_size_per_epoch = self.data_size_per_epoch - self.data_size_per_epoch % self.batch_size + self.batch_size
219 | 
220 | 
221 | self.loader = loader
222 | self.extensions = extensions
223 | self.root = root
224 | self.size_buf = size_buf
225 | 
226 | self.repBuf_sample_rate = 0.5 # fraction of each training batch drawn from the replay buffer
227 | self.sampling_strategy = args.sampling_strategy
228 | 
229 | self.NOSubBatch = args.NOSubBatch
230 | self.SubBatch_index_offset = int(self.batch_size/self.NOSubBatch)
231 | 
232 | self.gradient_steps_per_batch = int(args.gradient_steps_per_batch) # repBatch_rate is multiplied by this
233 | self.repBatch_rate = math.ceil(self.repBuf_sample_rate/(1-self.repBuf_sample_rate))
234 | 
235 | self.repType = args.ReplayType
236 | 
237 | print("[initData]: root = {}; time_taken (an example) = {}; time_taken.len = {}; batch_size = {}; size_buf = {}; repBuf_sample_rate = {}".format(root, self.time_taken[1000], len(self.time_taken), self.batch_size, self.size_buf, self.repBuf_sample_rate))
238 | print("[initData]: repBatch_rate = {}; NOSubBatch = {}; SubBatch_index_offset = {}".format(self.repBatch_rate, self.NOSubBatch, self.SubBatch_index_offset))
239 | print("[initData]: transform = {}; transform_RepBuf = {}".format(self.transform, self.transform_RepBuf))
240 | 
241 | def _make_data(self):
242 | # only read labels and times to save storage
243 | self.labels = torch.load(self.fname+'train_labels.torchSave')
244 | self.time_taken = torch.load(self.fname+'train_time.torchSave')
245 | self.user = torch.load(self.fname+'train_userID.torchSave')
246 | self.store_loc = [None] * len(self.labels)
247 | self.idx_data = []
248 | 
249 | def _change_data_range_FIFO(self):
250 | print("reading store location from {}".format(self.fname+'train_store_loc.torchSave'))
251 | tmp_loc = torch.load(self.fname+'train_store_loc.torchSave')
252 | print("tmp_loc.size = {}".format(len(tmp_loc)))
253 | for i in range(self.used_data_start, self.used_data_end):
254 | self.store_loc[i] = tmp_loc[i][:-1]
255 | if i % 1e5 == 0:
256 | print("store_loc[{}] = {}".format(i, self.store_loc[i]))
257 | sys.stdout.flush()
258 | 
259 | 
260 | def _set_data_idx(self, epoch):
261 | # setup training data idx
262 | self.offset = epoch*self.data_size_per_epoch
263 | size_curr_epoch = min(self.data_size - self.offset, self.data_size_per_epoch)
264 | size_curr_epoch = size_curr_epoch - (size_curr_epoch % self.batch_size)
265 | 
266 | batchSize = int(self.batch_size/self.NOSubBatch)
267 | self.data_idx = [None]* (size_curr_epoch *self.gradient_steps_per_batch)
268 | iter_total = size_curr_epoch//batchSize
269 | bsReplicated = batchSize*self.gradient_steps_per_batch
270 | for i in range(0, int(iter_total)):
271 | self.data_idx[i*bsReplicated:(i+1)*bsReplicated] = list(range(self.offset+i*batchSize, self.offset+(i+1)*batchSize))*self.gradient_steps_per_batch
272 | 
273 | 
274 | def _change_data_range(self, epoch = 0):
275 | self._set_data_idx(epoch)
276 | 
277 | self.used_data_start = max(0, epoch*self.data_size_per_epoch-self.size_buf-self.batch_size)
278 | self.used_data_end = min(len(self.labels), (epoch+1)*self.data_size_per_epoch+self.batch_size)
279 | print("change valid data range to: [{},{}]".format(self.used_data_start, self.used_data_end))
280 | 
281 | self.store_loc = [None] * len(self.labels)
282 | self._change_data_range_FIFO()
283 | 
284 | 
285 | def _sample_FIFO(self, index):
286 | if index < self.batch_size:
287 | return 0
288 | else:
289 | repBuf_idx = random.randint(max(0, index-self.size_buf-self.batch_size), index-self.batch_size)
290 | return repBuf_idx
291 | 
292 | 
293 | def _sample(self, index):
294 | return self._sample_FIFO(index)
295 | 
296 | def __getitem__(self, index):
297 | index = self.data_idx[index]
298 | index_pop = index
299 | 
300 | if index_pop < 0:
301 | index_pop = 0
302 | is_valid = torch.tensor(0)
303 | else:
304 | is_valid = torch.tensor(1)
305 | 
306 | num_batches = 1
307 | 
308 | target_pop = self.labels[index_pop]
309 | path_pop = self.root + self.store_loc[index_pop]
310 | sample_pop = self.loader(path_pop)
311 | if self.transform is not None:
312 | sample_pop = self.transform(sample_pop)
313 | if self.target_transform is not None:
314 | target_pop = self.target_transform(target_pop)
315 | 
316 | 
317 | sample_test = torch.zeros(self.NOSubBatch, sample_pop.size()[0], sample_pop.size()[1], sample_pop.size()[2])
318 | target_test = torch.zeros(self.NOSubBatch).long()
319 | test_idx = torch.zeros(self.NOSubBatch).long()
320 | time_taken_test = torch.zeros(self.NOSubBatch).long()
321 | user_test = torch.zeros(self.NOSubBatch).long()
322 | 
323 | # get the test data of full batch size
324 | for i in range(0, self.NOSubBatch):
325 | test_idx[i] = index+i*self.SubBatch_index_offset
326 | if test_idx[i] >= len(self.time_taken):
327 | test_idx[i] = len(self.time_taken) - 1
328 | 
329 | time_taken_test[i] = self.time_taken[test_idx[i]]
330 | target_test[i] = self.labels[test_idx[i]]
331 | user_test[i] = self.user[test_idx[i]]
332 | 
333 | path = self.root + self.store_loc[test_idx[i]]
334 | sample = self.loader(path)
335 | if self.transform_test is not None: # guard must match the transform applied below
336 | sample_test[i] = self.transform_test(sample)
337 | if self.target_transform is not None:
338 | target_test[i] = self.target_transform(target_test[i])
339 | 
340 | # randomly sample from repBuf
341 | sample_RepBuf = torch.zeros(self.repBatch_rate* num_batches, sample_pop.size()[0], sample_pop.size()[1], sample_pop.size()[2])
342 | target_RepBuf = torch.zeros(self.repBatch_rate* num_batches).long()
343 | repBuf_idx = torch.zeros(self.repBatch_rate * num_batches).long()
344 | 
345 | for i in range(0, int(self.repBatch_rate * num_batches)):
346 | repBuf_idx[i] = self._sample(index)
347 | path_RepBuf = self.root + self.store_loc[repBuf_idx[i]]
348 | target_RepBuf[i] = self.labels[repBuf_idx[i]]
349 | 
350 | sample_RepBuf_tmp = self.loader(path_RepBuf)
351 | if self.transform_RepBuf is not None:
352 | # print("performing to sample_RepBuf: {}".format(self.transform_RepBuf))
353 | sample_RepBuf[i] = self.transform_RepBuf(sample_RepBuf_tmp)
354 | if self.target_transform_RepBuf is not None:
355 | target_RepBuf[i] = self.target_transform_RepBuf(target_RepBuf[i])
356 | 
357 | return sample_test, target_test, user_test, time_taken_test, test_idx, sample_pop, target_pop, index_pop, sample_RepBuf, target_RepBuf, repBuf_idx
358 | 
359 | def __len__(self):
360 | return len(self.data_idx)
361 | 
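For reference, `_sample_FIFO` above draws replay indices uniformly from a sliding window that ends one full batch behind the current stream position. A minimal standalone sketch of that rule, with hypothetical `size_buf`/`batch_size` values that are not tied to the dataset class:

```python
import random

size_buf = 8     # stands in for args.size_replay_buffer
batch_size = 4   # stands in for the total batch size across GPUs

def sample_fifo(index):
    # mirror of _sample_FIFO: before the first full batch there is
    # nothing to replay yet, so fall back to index 0
    if index < batch_size:
        return 0
    # otherwise sample from the last size_buf stream indices that
    # precede the current (still unprocessed) batch
    return random.randint(max(0, index - size_buf - batch_size), index - batch_size)

print(sorted({sample_fifo(20) for _ in range(100)}))  # subset of [8, ..., 16]
```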
-------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/__pycache__/yfcc100m_dataset.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/no_PoLRS/__pycache__/yfcc100m_dataset.cpython-37.pyc -------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/__pycache__/yfcc100m_dataset.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/no_PoLRS/__pycache__/yfcc100m_dataset.cpython-39.pyc -------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/groupNorm/__pycache__/group_norm.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/no_PoLRS/groupNorm/__pycache__/group_norm.cpython-37.pyc -------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/groupNorm/__pycache__/group_norm.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/no_PoLRS/groupNorm/__pycache__/group_norm.cpython-39.pyc -------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/groupNorm/__pycache__/resnet.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/no_PoLRS/groupNorm/__pycache__/resnet.cpython-37.pyc -------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/groupNorm/__pycache__/resnet.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IntelLabs/continuallearning/ee7eb9e8550c4f74a6432475ebae37cc05021535/CLOC/code_online/no_PoLRS/groupNorm/__pycache__/resnet.cpython-39.pyc -------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/groupNorm/group_norm.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | from torch.nn.modules.batchnorm import _BatchNorm 4 | 5 | 6 | def group_norm(input, group, running_mean, running_var, weight=None, bias=None, 7 | use_input_stats=True, momentum=0.1, eps=1e-5): 8 | r"""Applies Group Normalization for channels in the same group in each data sample in a 9 | batch. 10 | 11 | See :class:`~torch.nn.GroupNorm1d`, :class:`~torch.nn.GroupNorm2d`, 12 | :class:`~torch.nn.GroupNorm3d` for details. 
13 | """
14 | if not use_input_stats and (running_mean is None or running_var is None):
15 | raise ValueError('Expected running_mean and running_var to be not None when use_input_stats=False')
16 | 
17 | b, c = input.size(0), input.size(1)
18 | if weight is not None:
19 | weight = weight.repeat(b)
20 | if bias is not None:
21 | bias = bias.repeat(b)
22 | 
23 | def _instance_norm(input, group, running_mean=None, running_var=None, weight=None,
24 | bias=None, use_input_stats=None, momentum=None, eps=None):
25 | # Repeat stored stats and affine transform params if necessary
26 | if running_mean is not None:
27 | running_mean_orig = running_mean
28 | running_mean = running_mean_orig.repeat(b)
29 | if running_var is not None:
30 | running_var_orig = running_var
31 | running_var = running_var_orig.repeat(b)
32 | 
33 | #norm_shape = [1, b * c / group, group]
34 | #print(norm_shape)
35 | # Apply instance norm
36 | input_reshaped = input.contiguous().view(1, int(b * c/group), group, *input.size()[2:])
37 | 
38 | out = F.batch_norm(
39 | input_reshaped, running_mean, running_var, weight=weight, bias=bias,
40 | training=use_input_stats, momentum=momentum, eps=eps)
41 | 
42 | # Reshape back
43 | if running_mean is not None:
44 | running_mean_orig.copy_(running_mean.view(b, int(c/group)).mean(0, keepdim=False))
45 | if running_var is not None:
46 | running_var_orig.copy_(running_var.view(b, int(c/group)).mean(0, keepdim=False))
47 | 
48 | return out.view(b, c, *input.size()[2:])
49 | return _instance_norm(input, group, running_mean=running_mean,
50 | running_var=running_var, weight=weight, bias=bias,
51 | use_input_stats=use_input_stats, momentum=momentum,
52 | eps=eps)
53 | 
54 | 
55 | class _GroupNorm(_BatchNorm):
56 | def __init__(self, num_features, num_groups=1, eps=1e-5, momentum=0.1,
57 | affine=False, track_running_stats=False):
58 | self.num_groups = num_groups
59 | self.track_running_stats = track_running_stats
60 | super(_GroupNorm, self).__init__(int(num_features/num_groups), eps,
61 | momentum, affine, track_running_stats)
62 | 
63 | def _check_input_dim(self, input):
64 | raise NotImplementedError # subclasses must implement the dimension check
65 | 
66 | def forward(self, input):
67 | self._check_input_dim(input)
68 | 
69 | return group_norm(
70 | input, self.num_groups, self.running_mean, self.running_var, self.weight, self.bias,
71 | self.training or not self.track_running_stats, self.momentum, self.eps)
72 | 
73 | 
74 | class GroupNorm2d(_GroupNorm):
75 | r"""Applies Group Normalization over a 4D input (a mini-batch of 2D inputs
76 | with additional channel dimension) as described in the paper
77 | `Group Normalization`_ (https://arxiv.org/pdf/1803.08494.pdf).
78 | 
79 | 
80 | Args:
81 | num_features: :math:`C` from an expected input of size
82 | :math:`(N, C, H, W)`
83 | num_groups: number of channels per normalization group (this is the value that norm2d in resnet.py passes in)
84 | eps: a value added to the denominator for numerical stability. Default: 1e-5
85 | momentum: the value used for the running_mean and running_var computation. Default: 0.1
86 | affine: a boolean value that when set to ``True``, this module has
87 | learnable affine parameters. Default: ``False``
88 | track_running_stats: a boolean value that when set to ``True``, this
89 | module tracks the running mean and variance, and when set to ``False``,
90 | this module does not track such statistics and always uses batch
91 | statistics in both training and eval modes.
Default: ``False`` 92 | 93 | Shape: 94 | - Input: :math:`(N, C, H, W)` 95 | - Output: :math:`(N, C, H, W)` (same shape as input) 96 | 97 | Examples: 98 | >>> # Without Learnable Parameters 99 | >>> m = GroupNorm2d(100, 4) 100 | >>> # With Learnable Parameters 101 | >>> m = GroupNorm2d(100, 4, affine=True) 102 | >>> input = torch.randn(20, 100, 35, 45) 103 | >>> output = m(input) 104 | 105 | """ 106 | 107 | def _check_input_dim(self, input): 108 | if input.dim() != 4: 109 | raise ValueError('expected 4D input (got {}D input)' 110 | .format(input.dim())) 111 | 112 | 113 | class GroupNorm3d(_GroupNorm): 114 | """ 115 | Assume the data format is (B, C, D, H, W) 116 | """ 117 | def _check_input_dim(self, input): 118 | if input.dim() != 5: 119 | raise ValueError('expected 5D input (got {}D input)' 120 | .format(input.dim())) 121 | 122 | 123 | -------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/groupNorm/resnet.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import math 3 | import torch.utils.model_zoo as model_zoo 4 | from .group_norm import GroupNorm2d 5 | 6 | __all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 7 | 'resnet152'] 8 | 9 | 10 | model_urls = { 11 | 'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth', 12 | 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth', 13 | 'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth', 14 | 'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth', 15 | 'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth', 16 | } 17 | 18 | 19 | def conv3x3(in_planes, out_planes, stride=1): 20 | """3x3 convolution with padding""" 21 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 22 | padding=1, bias=False) 23 | 24 | def norm2d(planes, num_channels_per_group=32): 25 | print("num_channels_per_group:{}".format(num_channels_per_group)) 26 | if num_channels_per_group > 0: 27 | return GroupNorm2d(planes, num_channels_per_group, affine=True, 28 | track_running_stats=False) 29 | else: 30 | return nn.BatchNorm2d(planes) 31 | 32 | 33 | class BasicBlock(nn.Module): 34 | expansion = 1 35 | 36 | def __init__(self, inplanes, planes, stride=1, downsample=None, 37 | group_norm=0): 38 | super(BasicBlock, self).__init__() 39 | self.conv1 = conv3x3(inplanes, planes, stride) 40 | self.bn1 = norm2d(planes, group_norm) 41 | self.relu = nn.ReLU(inplace=True) 42 | self.conv2 = conv3x3(planes, planes) 43 | self.bn2 = norm2d(planes, group_norm) 44 | self.downsample = downsample 45 | self.stride = stride 46 | 47 | def forward(self, x): 48 | residual = x 49 | 50 | out = self.conv1(x) 51 | out = self.bn1(out) 52 | out = self.relu(out) 53 | 54 | out = self.conv2(out) 55 | out = self.bn2(out) 56 | 57 | if self.downsample is not None: 58 | residual = self.downsample(x) 59 | 60 | out += residual 61 | out = self.relu(out) 62 | 63 | return out 64 | 65 | 66 | class Bottleneck(nn.Module): 67 | expansion = 4 68 | 69 | def __init__(self, inplanes, planes, stride=1, downsample=None, 70 | group_norm=0): 71 | super(Bottleneck, self).__init__() 72 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 73 | self.bn1 = norm2d(planes, group_norm) 74 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 75 | padding=1, bias=False) 76 | self.bn2 = norm2d(planes, group_norm) 77 | self.conv3 = nn.Conv2d(planes, planes * 4, 
kernel_size=1, bias=False) 78 | self.bn3 = norm2d(planes * 4, group_norm) 79 | self.relu = nn.ReLU(inplace=True) 80 | self.downsample = downsample 81 | self.stride = stride 82 | 83 | def forward(self, x): 84 | residual = x 85 | 86 | out = self.conv1(x) 87 | out = self.bn1(out) 88 | out = self.relu(out) 89 | 90 | out = self.conv2(out) 91 | out = self.bn2(out) 92 | out = self.relu(out) 93 | 94 | out = self.conv3(out) 95 | out = self.bn3(out) 96 | 97 | if self.downsample is not None: 98 | residual = self.downsample(x) 99 | 100 | out += residual 101 | out = self.relu(out) 102 | 103 | return out 104 | 105 | 106 | class ResNet(nn.Module): 107 | 108 | def __init__(self, block, layers, num_classes=1000, group_norm=0): 109 | self.inplanes = 64 110 | super(ResNet, self).__init__() 111 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 112 | bias=False) 113 | self.bn1 = norm2d(64, group_norm) 114 | self.relu = nn.ReLU(inplace=True) 115 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 116 | self.layer1 = self._make_layer(block, 64, layers[0], 117 | group_norm=group_norm) 118 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2, 119 | group_norm=group_norm) 120 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2, 121 | group_norm=group_norm) 122 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2, 123 | group_norm=group_norm) 124 | self.avgpool = nn.AvgPool2d(7, stride=1) 125 | self.fc = nn.Linear(512 * block.expansion, num_classes) 126 | 127 | for m in self.modules(): 128 | if isinstance(m, nn.Conv2d): 129 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 130 | m.weight.data.normal_(0, math.sqrt(2. / n)) 131 | elif isinstance(m, nn.BatchNorm2d): 132 | m.weight.data.fill_(1) 133 | m.bias.data.zero_() 134 | elif isinstance(m, GroupNorm2d): 135 | m.weight.data.fill_(1) 136 | m.bias.data.zero_() 137 | 138 | for m in self.modules(): 139 | if isinstance(m, Bottleneck): 140 | m.bn3.weight.data.fill_(0) 141 | if isinstance(m, BasicBlock): 142 | m.bn2.weight.data.fill_(0) 143 | 144 | 145 | def _make_layer(self, block, planes, blocks, stride=1, group_norm=0): 146 | downsample = None 147 | if stride != 1 or self.inplanes != planes * block.expansion: 148 | downsample = nn.Sequential( 149 | nn.Conv2d(self.inplanes, planes * block.expansion, 150 | kernel_size=1, stride=stride, bias=False), 151 | norm2d(planes * block.expansion, group_norm), 152 | ) 153 | 154 | layers = [] 155 | layers.append(block(self.inplanes, planes, stride, downsample, 156 | group_norm)) 157 | self.inplanes = planes * block.expansion 158 | for i in range(1, blocks): 159 | layers.append(block(self.inplanes, planes, group_norm=group_norm)) 160 | 161 | return nn.Sequential(*layers) 162 | 163 | def forward(self, x): 164 | x = self.conv1(x) 165 | x = self.bn1(x) 166 | x = self.relu(x) 167 | x = self.maxpool(x) 168 | 169 | x = self.layer1(x) 170 | x = self.layer2(x) 171 | x = self.layer3(x) 172 | x = self.layer4(x) 173 | 174 | x = self.avgpool(x) 175 | x = x.view(x.size(0), -1) 176 | x = self.fc(x) 177 | 178 | return x 179 | 180 | 181 | def resnet18(pretrained=False, **kwargs): 182 | """Constructs a ResNet-18 model. 
183 | 
184 | Args:
185 | pretrained (bool): If True, returns a model pre-trained on ImageNet
186 | """
187 | model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
188 | if pretrained:
189 | model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
190 | return model
191 | 
192 | 
193 | def resnet34(pretrained=False, **kwargs):
194 | """Constructs a ResNet-34 model.
195 | 
196 | Args:
197 | pretrained (bool): If True, returns a model pre-trained on ImageNet
198 | """
199 | model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
200 | if pretrained:
201 | model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
202 | return model
203 | 
204 | 
205 | def resnet50(pretrained=False, **kwargs):
206 | """Constructs a ResNet-50 model.
207 | 
208 | Args:
209 | pretrained (bool): If True, returns a model pre-trained on ImageNet
210 | """
211 | model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
212 | if pretrained:
213 | model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
214 | return model
215 | 
216 | 
217 | def resnet101(pretrained=False, **kwargs):
218 | """Constructs a ResNet-101 model.
219 | 
220 | Args:
221 | pretrained (bool): If True, returns a model pre-trained on ImageNet
222 | """
223 | model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs)
224 | if pretrained:
225 | model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
226 | return model
227 | 
228 | 
229 | def resnet152(pretrained=False, **kwargs):
230 | """Constructs a ResNet-152 model.
231 | 
232 | Args:
233 | pretrained (bool): If True, returns a model pre-trained on ImageNet
234 | """
235 | model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs)
236 | if pretrained:
237 | model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
238 | return model
239 | 
--------------------------------------------------------------------------------
/CLOC/code_online/no_PoLRS/main_online.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 | import random
4 | import shutil
5 | import time
6 | import warnings
7 | import builtins
8 | import math
9 | 
10 | import torch
11 | import torch.nn as nn
12 | import torch.nn.parallel
13 | import torch.backends.cudnn as cudnn
14 | import torch.distributed as dist
15 | import torch.optim
16 | import torch.multiprocessing as mp
17 | import torch.utils.data
18 | import torch.utils.data.distributed
19 | import torchvision.transforms as transforms
20 | import torchvision.datasets as datasets
21 | import torchvision.models as models
22 | import groupNorm.resnet as models_GN
23 | from torch import randperm
24 | 
25 | import numpy as np
26 | import sys
27 | from tensorboardX import SummaryWriter
28 | from yfcc100m_dataset import YFCC_CL_Dataset_offline_val, YFCC_CL_Dataset_online
29 | 
30 | import psutil
31 | import gc
32 | 
33 | import copy
34 | 
35 | 
36 | model_names = sorted(name for name in models.__dict__
37 | if name.islower() and not name.startswith("__")
38 | and callable(models.__dict__[name]))
39 | 
40 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
41 | parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet50',
42 | choices=model_names,
43 | help='model architecture: ' +
44 | ' | '.join(model_names) +
45 | ' (default: resnet50)')
46 | parser.add_argument('-j', '--workers', default=16, type=int, metavar='N',
47 | help='number of data loading workers (default: 16)')
48 | parser.add_argument('--epochs', default=90, type=int, metavar='N',
49 | help='number of total epochs to run (in the online setting, the stream is split into this many consecutive chunks)')
50 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
51 | help='manual epoch number (useful on restarts)')
52 | parser.add_argument('-b', '--batch-size', default=256, type=int,
53 | metavar='N',
54 | help='mini-batch size (default: 256), this is the total '
55 | 'batch size of all GPUs on the current node when '
56 | 'using Data Parallel or Distributed Data Parallel')
57 | parser.add_argument('--gradient_accumulation_steps', default=1, type=int,
58 | help='number of gradient accumulation steps')
59 | 
60 | parser.add_argument('--lr', '--learning-rate', default=0.1, type=float,
61 | metavar='LR', help='initial learning rate', dest='lr')
62 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
63 | help='momentum')
64 | parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
65 | metavar='W', help='weight decay (default: 1e-4)',
66 | dest='weight_decay')
67 | parser.add_argument('-p', '--print-freq', default=10, type=int,
68 | metavar='N', help='print frequency (default: 10)')
69 | parser.add_argument('-wf', '--write-freq', default=100, type=int,
70 | metavar='N', help='tensorboard write frequency (default: 100)')
71 | parser.add_argument('--resume', default='', type=str, metavar='PATH',
72 | help='path to latest checkpoint (default: none)')
73 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
74 | help='evaluate model on validation set')
75 | parser.add_argument('--pretrained', dest='pretrained', action='store_true',
76 | help='use pre-trained model')
77 | parser.add_argument('--world-size', default=-1, type=int,
78 | help='number of nodes for distributed training')
79 | parser.add_argument('--rank', default=-1, type=int,
80 | help='node rank for distributed training')
81 | parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
82 | help='url used to set up distributed training')
83 | parser.add_argument('--dist-backend', default='nccl', type=str,
84 | help='distributed backend')
85 | parser.add_argument('--seed', default=None, type=int,
86 | help='seed for initializing training. ')
87 | parser.add_argument('--gpu', default=None, type=int,
88 | help='GPU id to use.')
89 | parser.add_argument('--multiprocessing-distributed', action='store_true',
90 | help='Use multi-processing distributed training to launch '
91 | 'N processes per node, which has N GPUs.
This is the ' 92 | 'fastest way to use PyTorch for either single node or ' 93 | 'multi node data parallel training') 94 | 95 | parser.add_argument('--cell_id', default='../final_metadata/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy', type=str, 96 | help='file that store the cell IDS') 97 | 98 | parser.add_argument('--val_freq', default=10, 99 | type=int, help='perform validation per [val_freq] epochs (in offline mode)') 100 | parser.add_argument('--adjust_lr', default=0, 101 | type=int, help='whether to adjust lr') 102 | parser.add_argument('--use_val', default=0, 103 | type=int, help='whether to use validation set') 104 | 105 | parser.add_argument('--root', default="/export/share/Datasets/yfcc100m_full_dataset/images/", 106 | type=str, help='root to the image dataset, used to save the memory consumption') 107 | parser.add_argument('--data', default="/export/share/Datasets/yfcc100m_full_dataset/metadata_geolocation/", 108 | type=str, help='path to the training data') 109 | parser.add_argument('--data_val', default="/export/share/Datasets/yfcc100m_full_dataset/metadata_geolocation/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv", 110 | type=str, help='path to the metadata') 111 | 112 | parser.add_argument('--gradient_steps_per_batch', default=1, 113 | type=int, help='number of gradient steps per batch, used to compare against firehose paper') 114 | parser.add_argument('--size_replay_buffer', default=0, 115 | type=int, help='size of the experience replay buffer (per gpu, so if you have 8 gpus, each gpu will have size_replay_buffer number of samples in the buffer)') 116 | parser.add_argument('--sample_rate_from_buffer', default=0.5, 117 | type=float, help='how many samples are from the buffer') 118 | parser.add_argument('--sampling_strategy', default='FIFO', 119 | type=str, help='sampling strategy (FIFO, Reservoir, RingBuf)') 120 | parser.add_argument('--num_classes', default=500, 121 | type=int, help='number of classes (used only for constructing ring buffer, no need to set this value)') 122 | 123 | parser.add_argument('--weight_old_data', default=1.0, 124 | type=float, help='weight of the loss on old data, used to separate the effect of learning rate on old and new data') 125 | 126 | parser.add_argument('--NOSubBatch', default=1, 127 | type=int, help='separate each batch into consecutive sub-batches, used only in online mode, for testing the effect of batch size') 128 | 129 | 130 | parser.add_argument('--GN', default=0, 131 | type=int, help='number of channels per group. 
If it is 0, it means '
132 | 'batch norm instead of group-norm')
133 | 
134 | parser.add_argument('--used_data_start', default = 0, type = int, help = 'start index of the portion of the stream to use (0 starts from the beginning)')
135 | parser.add_argument('--used_data_end', default = -1, type = int, help = 'end index of the portion of the stream to use (-1 uses the stream until the end)')
136 | 
137 | # separate between pure-replay or mixed replay
138 | parser.add_argument("--ReplayType", default = 'mixRep',
139 | choices=['mixRep', 'pureRep'], help='Type of replay buffer')
140 | parser.add_argument("--min_lr", default = 0.0, type = float, help = 'minimum learning rate, used to cut the cosine LR')
141 | # whether to save intermediate models
142 | parser.add_argument("--SaveInter", default = 1, type = int, help = 'whether to save intermediate models')
143 | parser.add_argument("--ABS_performanceGap", default = 0.5, type = float, help = 'the allowed performance gap between the accuracy of old and new samples, hyper-param for adaptive buffer size (tuned through cross val)')
144 | parser.add_argument("--use_ADRep", default=1, type = int, help = 'whether to use ADRep')
145 | best_acc1 = 0
146 | 
147 | def main():
148 | args = parser.parse_args()
149 | 
150 | if args.seed is not None:
151 | random.seed(args.seed)
152 | torch.manual_seed(args.seed)
153 | cudnn.deterministic = True
154 | warnings.warn('You have chosen to seed training. '
155 | 'This will turn on the CUDNN deterministic setting, '
156 | 'which can slow down your training considerably! '
157 | 'You may see unexpected behavior when restarting '
158 | 'from checkpoints.')
159 | 
160 | if args.gpu is not None:
161 | warnings.warn('You have chosen a specific GPU. This will completely '
162 | 'disable data parallelism.')
163 | 
164 | if args.dist_url == "env://" and args.world_size == -1:
165 | args.world_size = int(os.environ["WORLD_SIZE"])
166 | 
167 | args.distributed = args.world_size > 1 or args.multiprocessing_distributed
168 | 
169 | ngpus_per_node = torch.cuda.device_count()
170 | if args.multiprocessing_distributed:
171 | # Since we have ngpus_per_node processes per node, the total world_size
172 | # needs to be adjusted accordingly
173 | args.world_size = ngpus_per_node * args.world_size
174 | # Use torch.multiprocessing.spawn to launch distributed processes: the
175 | # main_worker process function
176 | mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
177 | else:
178 | # Simply call main_worker function
179 | main_worker(args.gpu, ngpus_per_node, args)
180 | 
181 | 
182 | def main_worker(gpu, ngpus_per_node, args):
183 | global best_acc1
184 | args.gpu = gpu
185 | # suppress printing if not master
186 | if args.multiprocessing_distributed and args.gpu != 0:
187 | def print_pass(*args):
188 | pass
189 | builtins.print = print_pass
190 | 
191 | args.output_dir = create_output_dir(args)
192 | os.makedirs(args.output_dir, exist_ok = True)
193 | writer = SummaryWriter(args.output_dir)
194 | 
195 | print("output_dir = {}".format(args.output_dir))
196 | sys.stdout.flush()
197 | 
198 | 
199 | if args.gpu is not None:
200 | print("Use GPU: {} for training".format(args.gpu))
201 | 
202 | if args.distributed:
203 | if args.dist_url == "env://" and args.rank == -1:
204 | args.rank = int(os.environ["RANK"])
205 | if args.multiprocessing_distributed:
206 | # For multiprocessing distributed training, rank needs to be the
207 | # global rank among all the processes
208 | args.rank = args.rank * ngpus_per_node + gpu
209 | dist.init_process_group(backend=args.dist_backend,
init_method=args.dist_url, 210 | world_size=args.world_size, rank=args.rank) 211 | 212 | model, criterion, optimizer, num_classes = init_model(args, ngpus_per_node) 213 | 214 | args.num_classes = num_classes 215 | 216 | # init the full dataset 217 | train_set, val_set = init_dataset(args) 218 | cudnn.benchmark = True 219 | 220 | if args.evaluate: 221 | # optionally resume from a checkpoint 222 | if args.resume: 223 | if os.path.isfile(args.resume): 224 | args.start_epoch, _, _, _, _ = resume_online(args, model, optimizer) 225 | else: 226 | raise RuntimeError("=> no checkpoint found at '{}'".format(args.resume)) 227 | 228 | print("do not perform training for evaluation mode, val_set = {}".format(val_set)) 229 | sys.stdout.flush() 230 | 231 | out_folder_eval = args.output_dir + '/epoch{}'.format(args.start_epoch) 232 | os.makedirs(out_folder_eval, exist_ok = True) 233 | writer = SummaryWriter(out_folder_eval) 234 | print("[validation]: saving validation result to {}".format(out_folder_eval)) 235 | val_loader = init_val_loader(args, val_set) 236 | acc1 = validate(val_loader, model, criterion, args, writer, args.epochs, is_forward = 2) 237 | 238 | # do forward and backward transfer too 239 | # 1. read out time of current example 240 | time_taken = torch.load(args.data+'train_time.torchSave') 241 | idx_last = compute_idx_curr(len(time_taken), args.start_epoch, args, ngpus_per_node) 242 | if idx_last < 0: 243 | time_last = time_taken[0] - 1 244 | else: 245 | time_last = time_taken[min(idx_last, len(time_taken)-1)] 246 | 247 | # compute forward 248 | val_set.set_transfer_time_point(args, val_set, time_last, is_forward = True) 249 | val_loader = init_val_loader(args, val_set) 250 | out_folder_eval = args.output_dir + '/epoch{}/forward'.format(args.start_epoch) 251 | os.makedirs(out_folder_eval, exist_ok = True) 252 | writer = SummaryWriter(out_folder_eval) 253 | print("[forward transfer]: saving validation result to {}".format(out_folder_eval)) 254 | acc1 = validate(val_loader, model, criterion, args, writer, args.epochs, is_forward = 1, time_last = time_last) 255 | # backward transfer 256 | val_set.set_transfer_time_point(args, val_set, time_last, is_forward = False) 257 | val_loader = init_val_loader(args, val_set) 258 | out_folder_eval = args.output_dir + '/epoch{}/backward'.format(args.start_epoch) 259 | os.makedirs(out_folder_eval, exist_ok = True) 260 | writer = SummaryWriter(out_folder_eval) 261 | print("[backward transfer]: saving validation result to {}".format(out_folder_eval)) 262 | acc1 = validate(val_loader, model, criterion, args, writer, args.epochs, is_forward = 0, time_last = time_last) 263 | 264 | else: 265 | train_online(args, model, criterion, optimizer, train_set, val_set, ngpus_per_node, writer) 266 | 267 | def compute_idx_curr(data_size, epoch, args, ngpus_per_node): 268 | data_size_per_epoch = math.ceil(data_size/args.epochs) 269 | if data_size_per_epoch % (args.batch_size* ngpus_per_node) != 0: 270 | data_size_per_epoch = data_size_per_epoch - data_size_per_epoch % (args.batch_size* ngpus_per_node) + (args.batch_size* ngpus_per_node) 271 | return data_size_per_epoch * epoch - 1 272 | 273 | 274 | def init_model(args, ngpus_per_node): 275 | cell_ids = np.load(args.cell_id, allow_pickle = True) 276 | num_classes = cell_ids.size + 1 # remember to +1 277 | print("init with num_classes = {}".format(num_classes)) 278 | 279 | if args.GN == 0: 280 | if args.pretrained: 281 | print("=> using pre-trained model '{}'".format(args.arch)) 282 | model = 
models.__dict__[args.arch](pretrained=True) 283 | else: 284 | print("=> creating model '{}'".format(args.arch)) 285 | model = models.__dict__[args.arch](num_classes = num_classes, norm_layer = nn.SyncBatchNorm) 286 | else: 287 | if args.pretrained: 288 | print("=> using pre-trained model '{}'".format(args.arch)) 289 | model = models_GN.__dict__[args.arch](pretrained=True, 290 | group_norm=args.GN) 291 | else: 292 | print("=> creating model '{}'".format(args.arch)) 293 | model = models_GN.__dict__[args.arch](num_classes = num_classes, 294 | group_norm=args.GN) 295 | 296 | if not torch.cuda.is_available(): 297 | print('using CPU, this will be slow') 298 | elif args.distributed: 299 | # For multiprocessing distributed, DistributedDataParallel constructor 300 | # should always set the single device scope, otherwise, 301 | # DistributedDataParallel will use all available devices. 302 | if args.gpu is not None: 303 | torch.cuda.set_device(args.gpu) 304 | model.cuda(args.gpu) 305 | # When using a single GPU per process and per 306 | # DistributedDataParallel, we need to divide the batch size 307 | # ourselves based on the total number of GPUs we have 308 | args.batch_size = int(args.batch_size / ngpus_per_node) 309 | args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node) 310 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu]) 311 | else: 312 | model.cuda() 313 | # DistributedDataParallel will divide and allocate batch_size to all 314 | # available GPUs if device_ids are not set 315 | model = torch.nn.parallel.DistributedDataParallel(model) 316 | elif args.gpu is not None: 317 | torch.cuda.set_device(args.gpu) 318 | model = model.cuda(args.gpu) 319 | else: 320 | # DataParallel will divide and allocate batch_size to all available GPUs 321 | if args.arch.startswith('alexnet') or args.arch.startswith('vgg'): 322 | model.features = torch.nn.DataParallel(model.features) 323 | model.cuda() 324 | else: 325 | model = torch.nn.DataParallel(model).cuda() 326 | 327 | # define loss function (criterion) and optimizer 328 | criterion = nn.CrossEntropyLoss().cuda(args.gpu) 329 | optimizer = torch.optim.SGD(model.parameters(), args.lr, 330 | momentum=args.momentum, 331 | weight_decay=args.weight_decay) 332 | 333 | return model, criterion, optimizer, num_classes 334 | 335 | 336 | def init_dataset(args): 337 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], 338 | std=[0.229, 0.224, 0.225]) 339 | 340 | 341 | trans = transforms.Compose([ 342 | transforms.RandomResizedCrop(224), 343 | transforms.RandomHorizontalFlip(), 344 | transforms.ToTensor(), 345 | normalize, 346 | ]) 347 | 348 | trans_test = transforms.Compose([ 349 | transforms.Resize(256), 350 | transforms.CenterCrop(224), 351 | transforms.ToTensor(), 352 | normalize, 353 | ]) 354 | 355 | print("data augmentation = {}; data augmentation for test batch = {}".format(trans, trans_test)) 356 | 357 | if args.evaluate: 358 | train_dataset = None 359 | val_dataset = YFCC_CL_Dataset_offline_val(args, 360 | transform = trans_test) 361 | else: 362 | train_dataset = YFCC_CL_Dataset_online(args, 363 | transform = trans, transform_RepBuf = trans, trans_test = trans_test) 364 | val_dataset = YFCC_CL_Dataset_offline_val(args, 365 | transform = trans_test) 366 | 367 | return train_dataset, val_dataset 368 | 369 | def resume_online(args, model, optimizer): 370 | print("=> loading checkpoint '{}'".format(args.resume)) 371 | 372 | if args.gpu is None: 373 | checkpoint = torch.load(args.resume) 374 | else: 375 | loc = 
'cuda:{}'.format(args.gpu)
376 | checkpoint = torch.load(args.resume, map_location=loc)
377 | 
378 | args.start_epoch = checkpoint['epoch']
379 | best_acc1 = checkpoint['best_acc1']
380 | 
381 | model.load_state_dict(checkpoint['state_dict'])
382 | optimizer.load_state_dict(checkpoint['optimizer'])
383 | online_fit_meters = checkpoint['online_fit_meters']
384 | 
385 | userID_last = checkpoint['userID_last']
386 | 
387 | print("=> loaded checkpoint '{}' (epoch {})"
388 | .format(args.resume, checkpoint['epoch']))
389 | if 'reservoir_buffer' in checkpoint:
390 | reservoir_buffer = checkpoint['reservoir_buffer']
391 | else:
392 | reservoir_buffer = None
393 | 
394 | args.size_replay_buffer = checkpoint['size_replay_buffer']
395 | return checkpoint['epoch'], checkpoint['best_acc1'], reservoir_buffer, online_fit_meters, userID_last
396 | 
397 | 
398 | def init_val_loader(args, val_set):
399 | return torch.utils.data.DataLoader(val_set, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)
400 | 
401 | def init_loader_v2(args, train_set, epoch):
402 | train_set.size_buf = args.size_replay_buffer
403 | train_set._change_data_range(epoch = epoch)
404 | 
405 | if args.distributed:
406 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_set, shuffle = False)
407 | else:
408 | train_sampler = None
409 | 
410 | batchSize4Loader = int(args.batch_size/args.NOSubBatch)
411 | train_loader = torch.utils.data.DataLoader(
412 | train_set, batch_size=batchSize4Loader, shuffle=(train_sampler is None),
413 | num_workers=args.workers, pin_memory=True, sampler=train_sampler)
414 | return train_loader, train_sampler
415 | 
416 | 
417 | def init_online_fit_meters():
418 | return [AverageMeter('LossOF', ':.4e'), AverageMeter('AccOF@1', ':6.2f'), AverageMeter('AccOF@5', ':6.2f'), AverageMeter('AccF@1', ':6.2f'), AverageMeter('AccO@1', ':6.2f')]
419 | 
420 | def train_online(args, model, criterion, optimizer, train_set, val_set, ngpus_per_node, writer):
421 | # init metrics for online fit
422 | online_fit_meters = init_online_fit_meters()
423 | 
424 | userID_last = -1
425 | 
426 | if args.resume:
427 | if os.path.isfile(args.resume):
428 | args.start_epoch, _, reservoir_buffer, online_fit_meters, userID_last = resume_online(args, model, optimizer)
429 | if reservoir_buffer is not None:
430 | train_set.buf_last = reservoir_buffer.cpu()
431 | else:
432 | raise RuntimeError("=> no checkpoint found at '{}'".format(args.resume))
433 | 
434 | for epoch in range(args.start_epoch, args.epochs):
435 | train_loader, train_sampler = init_loader_v2(args, train_set, epoch)
436 | 
437 | if args.distributed:
438 | train_sampler.set_epoch(0)
439 | 
440 | if args.adjust_lr:
441 | adjust_learning_rate(optimizer, epoch, args, ngpus_per_node)
442 | 
443 | writer.add_scalar("learning rate", get_lr(optimizer), epoch)
444 | 
445 | userID_last = train_MultiGD(train_loader, model, criterion, optimizer, epoch, args, writer, online_fit_meters, userID_last) # keep the updated user ID so it is saved in the checkpoint below
446 | 
447 | if not args.multiprocessing_distributed or (args.multiprocessing_distributed
448 | and args.rank % ngpus_per_node == 0):
449 | save_dict = {
450 | 'epoch': epoch + 1,
451 | 'arch': args.arch,
452 | 'size_replay_buffer': args.size_replay_buffer,
453 | 'state_dict': model.state_dict(),
454 | 'best_acc1': 0,
455 | 'optimizer' : optimizer.state_dict(),
456 | 'online_fit_meters': online_fit_meters,
457 | 'userID_last': userID_last,
458 | }
459 | 
460 | # if args.gpu == 0:
461 | if args.sampling_strategy == 'Reservoir':
462 | # store previous reservoir buffer
463 | save_dict['reservoir_buffer'] = train_set.buf_last
464 | 
465 | if (epoch + 1) % 10 == 0 and args.SaveInter:
466 | save_checkpoint(save_dict, 0, output_dir = args.output_dir, filename='checkpoint_ep{}.pth.tar'.format(epoch + 1))
467 | else:
468 | save_checkpoint(save_dict, 0, output_dir = args.output_dir)
469 | 
470 | if args.use_val == 1:
471 | val_loader = torch.utils.data.DataLoader(val_set,
472 | batch_size=args.batch_size, shuffle=False,
473 | num_workers=args.workers, pin_memory=True)
474 | acc1 = validate(val_loader, model, criterion, args, writer, args.epochs)
475 | print("final average accuracy = {}".format(acc1))
476 | 
477 | def find_next_test_album(user_ID, user_ID_last, idx_time_sorted):
478 | idx_ne = (user_ID[idx_time_sorted] != user_ID_last).nonzero().flatten()
479 | if idx_ne.numel() > 0:
480 | idx_end = 1
481 | for i in range(1, idx_ne.numel()):
482 | if user_ID[idx_time_sorted][idx_ne[0]] == user_ID[idx_time_sorted][idx_ne[i]]:
483 | idx_end += 1
484 | else:
485 | break
486 | return idx_ne[:idx_end]
487 | else:
488 | return None
489 | 
490 | def train_MultiGD(train_loader, model, criterion, optimizer, epoch, args, writer, online_fit_meters, userID_last):
491 | # these statistics are local, so we need a separate set of meters for online fit, forward and backward transfer.
492 | batch_time = AverageMeter('Time', ':6.3f')
493 | data_time = AverageMeter('Data', ':6.3f')
494 | 
495 | losses = AverageMeter('Loss', ':.4e')
496 | 
497 | top1 = AverageMeter('Acc@1', ':6.2f')
498 | top5 = AverageMeter('Acc@5', ':6.2f')
499 | 
500 | top1_future = AverageMeter('AccF@1', ':6.2f')
501 | top5_future = AverageMeter('AccF@5', ':6.2f')
502 | 
503 | top1_Rep = AverageMeter('Acc_old@1', ':6.2f')
504 | top5_Rep = AverageMeter('Acc_old@5', ':6.2f')
505 | 
506 | progress = ProgressMeter(
507 | len(train_loader),
508 | [batch_time, data_time, losses, top1, top1_future, top1_Rep, online_fit_meters[1], online_fit_meters[2]],
509 | prefix="Epoch: [{}]".format(epoch))
510 | 
511 | # switch to train mode
512 | model.train()
513 | end = time.time()
514 | 
515 | optimizer.zero_grad()
516 | 
517 | ngpus = torch.cuda.device_count()
518 | 
519 | rate_rep_sample = args.sample_rate_from_buffer/(1-args.sample_rate_from_buffer)
520 | 
521 | for i, (images, target, userID, time_taken, index,
522 | images_pop_tmp, target_pop_tmp, index_pop_tmp,
523 | images_from_buf_tmp, target_from_buf_tmp, index_buf_tmp) in enumerate(train_loader):
524 | # 1. images & target: test mini-batch (size = batch size)
525 | # 2. images_pop & target_pop: samples popped from validation buffer, used for training (size = batch size, set to "images & target" if batch_size*iter < val_buf_size, and we do normal SGD in this case)
526 | # 3.
images_from_buf and target_from_buf: samples from replay buffer, used for training (size = batch size * sample rate * number of GD steps per iter) 527 | 528 | # measure data loading time 529 | data_time.update(time.time() - end) 530 | 531 | # print("target = {}".format(target)) 532 | iter_curr = epoch*len(train_loader)/(args.gradient_steps_per_batch*args.NOSubBatch)+i//(args.gradient_steps_per_batch*args.NOSubBatch) 533 | batch_size = target_pop_tmp.numel() 534 | 535 | if i == 0: 536 | batch_size_ori = batch_size 537 | 538 | if i % (args.gradient_steps_per_batch*args.NOSubBatch) == 0: 539 | images = images.reshape(images.size()[0]*images.size()[1],images.size()[2], images.size()[3], images.size()[4])[:] 540 | target = target.reshape(1, -1)[0][:] 541 | userID = userID.reshape(1, -1)[0][:] 542 | time_taken = time_taken.reshape(1, -1)[0][:] 543 | index = index.reshape(1, -1)[0][:] 544 | 545 | # organize the format of samples from the replay buffer 546 | bs_from_buf = round(batch_size*rate_rep_sample) 547 | images_pop = images_pop_tmp 548 | target_pop = target_pop_tmp 549 | index_pop = index_pop_tmp 550 | 551 | images_from_buf = images_from_buf_tmp.reshape(images_from_buf_tmp.size()[0]*images_from_buf_tmp.size()[1],images_from_buf_tmp.size()[2], images_from_buf_tmp.size()[3], images_from_buf_tmp.size()[4])[:bs_from_buf] 552 | target_from_buf = target_from_buf_tmp.reshape(1, -1)[0][:bs_from_buf] 553 | index_buf = index_buf_tmp.reshape(1, -1)[0][:bs_from_buf] 554 | 555 | 556 | if i % (args.gradient_steps_per_batch * args.NOSubBatch) == 0: 557 | # do prediction on the new batch 558 | if args.gpu is not None: 559 | images = images.cuda(args.gpu, non_blocking=True) 560 | target = target.cuda(args.gpu, non_blocking=True) 561 | 562 | with torch.no_grad(): 563 | model.eval() 564 | output = model(images) 565 | output_all = global_gather(output) 566 | target_new = global_gather(target) 567 | 568 | acc1F, acc5F = accuracy(output_all, target_new, topk=(1, 5)) 569 | top1_future.update(acc1F[0]) 570 | top5_future.update(acc5F[0]) 571 | 572 | # compute online fit on the next album 573 | userID = global_gather(userID.cuda(args.gpu, non_blocking=True)) 574 | time_taken = global_gather(time_taken.cuda(args.gpu, non_blocking=True)) 575 | time_sorted, idx_sort = time_taken.sort() 576 | time_last = time_sorted[-1] 577 | idx_set = find_next_test_album(userID, userID_last, idx_sort) 578 | if idx_set is not None: 579 | output_album = output_all[idx_sort][idx_set] 580 | target_album = target_new[idx_sort][idx_set] 581 | acc1OF, acc5OF = accuracy(output_album, target_album, topk=(1,5)) 582 | online_fit_meters[1].update(acc1OF[0]) 583 | online_fit_meters[2].update(acc5OF[0]) 584 | 585 | # update userID_last 586 | userID_last = userID[idx_sort[-1]] 587 | 588 | model.train() 589 | 590 | images = images_pop.cuda(args.gpu, non_blocking=True) 591 | target = target_pop.cuda(args.gpu, non_blocking=True) 592 | 593 | # new code starts from here, do not support sub-batches for now 594 | if args.sample_rate_from_buffer > 0 and iter_curr > 1: 595 | images_from_buf = images_from_buf.cuda(args.gpu, non_blocking=True) 596 | target_from_buf = target_from_buf.cuda(args.gpu, non_blocking=True) 597 | images_merged, target_merged = merge_data_gpu(images_from_buf, target_from_buf, images, target, args) 598 | flag_merged = 1 599 | else: 600 | images_merged = images 601 | target_merged = target 602 | flag_merged = 0 603 | 604 | output = model(images_merged) 605 | loss = compute_loss(args, output, target_merged, criterion, target.numel()) 606 
| 607 | 608 | loss.backward() 609 | 610 | 611 | optimizer.step() # update parameters of net 612 | optimizer.zero_grad() # reset gradient 613 | 614 | if i % (args.gradient_steps_per_batch * args.NOSubBatch) == 0: 615 | 616 | losses.update(loss.item(), target_merged.numel()) 617 | output_all = global_gather(output) 618 | target_all = global_gather(target_merged) 619 | acc1, acc5 = accuracy(output_all, target_all, topk=(1, 5)) 620 | top1.update(acc1[0], target_all.numel()) 621 | top5.update(acc5[0], target_all.numel()) 622 | 623 | if target_merged.numel() > target.numel(): 624 | output_all = global_gather(output[target.numel():]) 625 | target_all = global_gather(target_merged[target.numel():]) 626 | acc1, acc5 = accuracy(output_all, target_all, topk=(1, 5)) 627 | 628 | top1_Rep.update(acc1[0], target_all.numel()) 629 | top5_Rep.update(acc5[0], target_all.numel()) 630 | 631 | # measure elapsed time 632 | batch_time.update(time.time() - end) 633 | end = time.time() 634 | 635 | if i % (args.print_freq * args.gradient_steps_per_batch * args.NOSubBatch) == 0: 636 | progress.display(i) 637 | # sys.stdout.flush() 638 | 639 | if i % (args.write_freq*args.gradient_steps_per_batch* args.NOSubBatch) == 0: 640 | writer.add_scalar("train_loss_iter", losses.avg, iter_curr) 641 | writer.add_scalar("train_acc1_iter", top1.avg, iter_curr) 642 | writer.add_scalar("train_acc5_iter", top5.avg, iter_curr) 643 | 644 | writer.add_scalar("avg_online_loss_iter", online_fit_meters[0].avg, iter_curr) 645 | writer.add_scalar("avg_online_acc1_iter", online_fit_meters[1].avg, iter_curr) 646 | writer.add_scalar("avg_online_acc5_iter", online_fit_meters[2].avg, iter_curr) 647 | 648 | writer.add_scalar("avg_online_loss_time_iter", online_fit_meters[0].avg, time_last) 649 | writer.add_scalar("avg_online_acc1_time_iter", online_fit_meters[1].avg, time_last) 650 | writer.add_scalar("avg_online_acc5_time_iter", online_fit_meters[2].avg, time_last) 651 | 652 | writer.add_scalar("train_acc1_old_iter", top1_Rep.avg, iter_curr) 653 | writer.add_scalar("train_acc5_old_iter", top5_Rep.avg, iter_curr) 654 | 655 | writer.add_scalar("train_acc1_future_iter", top1_future.avg, iter_curr) 656 | writer.add_scalar("train_acc5_future_iter", top5_future.avg, iter_curr) 657 | 658 | 659 | # if args.gpu == 0: 660 | writer.add_scalar("train_loss", losses.avg, epoch) 661 | writer.add_scalar("train_acc1", top1.avg, epoch) 662 | writer.add_scalar("train_acc5", top5.avg, epoch) 663 | 664 | writer.add_scalar("avg_online_loss_time", online_fit_meters[0].avg, time_last) 665 | writer.add_scalar("avg_online_acc1_time", online_fit_meters[1].avg, time_last) 666 | writer.add_scalar("avg_online_acc5_time", online_fit_meters[2].avg, time_last) 667 | 668 | writer.add_scalar("avg_online_loss_epoch", online_fit_meters[0].avg, epoch) 669 | writer.add_scalar("avg_online_acc1_epoch", online_fit_meters[1].avg, epoch) 670 | writer.add_scalar("avg_online_acc5_epoch", online_fit_meters[2].avg, epoch) 671 | 672 | writer.add_scalar("train_acc1_old", top1_Rep.avg, epoch) 673 | writer.add_scalar("train_acc5_old", top5_Rep.avg, epoch) 674 | 675 | writer.add_scalar("train_acc1_future", top1_future.avg, epoch) 676 | writer.add_scalar("train_acc5_future", top5_future.avg, epoch) 677 | 678 | writer.add_scalar("RepBuf_size", args.size_replay_buffer, epoch) 679 | if args.use_ADRep: 680 | if top1_Rep.avg - top1_future.avg > args.ABS_performanceGap: 681 | args.size_replay_buffer = int(args.size_replay_buffer*2) 682 | elif top1_Rep.avg - top1_future.avg < -args.ABS_performanceGap: 683 
| args.size_replay_buffer = int(args.size_replay_buffer/2) 684 | print("[ADRep]: changing replay buffer size to {} (rep vs future = {}/{})".format(args.size_replay_buffer, top1_Rep.avg, top1_future.avg)) 685 | 686 | 687 | return userID_last 688 | 689 | def validate(val_loader, model, criterion, args, writer, epoch, is_forward = 2, time_last = -1): 690 | batch_time = AverageMeter('Time', ':6.3f') 691 | losses = AverageMeter('Loss', ':.4e') 692 | top1 = AverageMeter('Acc@1', ':6.2f') 693 | top5 = AverageMeter('Acc@5', ':6.2f') 694 | 695 | top1_iter = AverageMeter('AccI@1', ':6.2f') 696 | top5_iter = AverageMeter('AccI@5', ':6.2f') 697 | 698 | top1_over_time = AverageMeter('Acc@1_time', ':6.2f') 699 | top5_over_time = AverageMeter('Acc@5_time', ':6.2f') 700 | 701 | progress = ProgressMeter( 702 | len(val_loader), 703 | [batch_time, losses, top1, top5, top1_iter, top5_iter, top1_over_time, top5_over_time], 704 | prefix='Test: ') 705 | 706 | model.eval() 707 | seconds_per_week = 24*7*3600 # one week of timestamp gap, used to bucket the time-based curves 708 | time_init = time_last 709 | with torch.no_grad(): 710 | end = time.time() 711 | for i, (images, target, time_curr, idx) in enumerate(val_loader): 712 | 713 | if args.gpu is not None: 714 | images = images.cuda(args.gpu, non_blocking=True) 715 | target = target.cuda(args.gpu, non_blocking=True) 716 | 717 | if i == 0: 718 | if time_last == -1: 719 | time_last = time_curr[0] 720 | idx_init = idx[0] 721 | 722 | # compute output 723 | output = model(images) 724 | loss = criterion(output, target) 725 | 726 | # measure accuracy and record loss 727 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 728 | losses.update(loss.item(), target.size(0)) 729 | top1.update(acc1[0], target.size(0)) 730 | top5.update(acc5[0], target.size(0)) 731 | 732 | top1_over_time.update(acc1[0], target.size(0)) 733 | top5_over_time.update(acc5[0], target.size(0)) 734 | top1_iter.update(acc1[0], target.size(0)) 735 | top5_iter.update(acc5[0], target.size(0)) 736 | 737 | # measure elapsed time 738 | batch_time.update(time.time() - end) 739 | end = time.time() 740 | 741 | 742 | if i % args.print_freq == 0: 743 | progress.display(i) 744 | sys.stdout.flush() 745 | 746 | time_latest = time_curr[-1] 747 | idx_latest = idx[-1] 748 | if is_forward == 0: 749 | time_gap_from_init = time_init - time_latest 750 | time_gap = time_last - time_latest 751 | idx_gap = idx_init - idx_latest 752 | else: 753 | time_gap_from_init = time_latest - time_init 754 | time_gap = time_latest - time_last 755 | idx_gap = idx_latest - idx_init 756 | 757 | if idx_gap % 10000 == 0 or i == (len(val_loader)-1): 758 | writer.add_scalar("val_acc1_iter", top1_iter.avg, idx_gap) 759 | writer.add_scalar("val_acc5_iter", top5_iter.avg, idx_gap) 760 | 761 | writer.add_scalar("transfer_top1_valIdx", top1.avg, idx_gap) 762 | writer.add_scalar("transfer_top5_valIdx", top5.avg, idx_gap) 763 | 764 | top1_iter.reset() 765 | top5_iter.reset() 766 | 767 | if time_gap > seconds_per_week: 768 | writer.add_scalar("transfer_top1_time", top1.avg, time_gap_from_init) 769 | writer.add_scalar("transfer_top5_time", top5.avg, time_gap_from_init) 770 | 771 | writer.add_scalar("val_acc1_over_time", top1_over_time.avg, time_gap_from_init) 772 | writer.add_scalar("val_acc5_over_time", top5_over_time.avg, time_gap_from_init) 773 | 774 | top1_over_time.reset() 775 | top5_over_time.reset() 776 | time_last = time_latest 777 | 778 | # TODO: this should also be done with the ProgressMeter 779 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}' 780 | .format(top1=top1, top5=top5))
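# validate() can walk the stream backward or forward from the current training time point (is_forward), so the writer calls above double as backward/forward-transfer curves: one point every 10000 validation samples (idx_gap) and one point whenever the timestamp gap since the last point exceeds a week. The epoch-level scalars below summarize the whole pass.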
781 | 782 | # if args.gpu == 0: 783 | writer.add_scalar("val_loss", losses.avg, epoch) 784 | writer.add_scalar("val_acc1", top1.avg, epoch) 785 | return top1.avg 786 | 787 | 788 | def save_checkpoint(state, is_best, output_dir = '.', filename='checkpoint.pth.tar'): 789 | torch.save(state, output_dir+'/checkpoint.pth.tar') 790 | if filename != 'checkpoint.pth.tar': 791 | shutil.copyfile(output_dir+'/checkpoint.pth.tar', output_dir+'/'+filename) 792 | 793 | 794 | class AverageMeter(object): 795 | """Computes and stores the average and current value""" 796 | def __init__(self, name, fmt=':f'): 797 | self.name = name 798 | self.fmt = fmt 799 | self.reset() 800 | 801 | def reset(self): 802 | self.val = 0 803 | self.avg = 0 804 | self.sum = 0 805 | self.count = 0 806 | 807 | def update(self, val, n=1): 808 | self.val = val 809 | self.sum += val * n 810 | self.count += n 811 | self.avg = self.sum / self.count 812 | 813 | def ma_update(self, val, weight): 814 | print("[ma_update before]: avg = {}; weight = {}; val = {}".format(self.avg, weight, val)) 815 | if weight >= 1.0: 816 | self.update(val) 817 | else: 818 | self.val = val 819 | self.avg = self.avg * weight + val * (1 - weight) 820 | print("[ma_update after]: avg = {}".format(self.avg)) 821 | 822 | def __str__(self): 823 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 824 | return fmtstr.format(**self.__dict__) 825 | 826 | 827 | class ProgressMeter(object): 828 | def __init__(self, num_batches, meters, prefix=""): 829 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 830 | self.meters = meters 831 | self.prefix = prefix 832 | 833 | def display(self, batch): 834 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 835 | entries += [str(meter) for meter in self.meters] 836 | # for test only 837 | print('\t'.join(entries)) 838 | 839 | def _get_batch_fmtstr(self, num_batches): 840 | num_digits = len(str(num_batches // 1)) 841 | fmt = '{:' + str(num_digits) + 'd}' 842 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 843 | 844 | 845 | def create_output_dir(args): 846 | output_dir = 'results_no_PoLRS/'+args.arch+'_lr{}_epochs{}'.format(args.lr, args.epochs)+'_bufSize{}'.format(args.size_replay_buffer) 847 | 848 | if args.sampling_strategy != 'FIFO': 849 | output_dir += '_sampleStrategy{}'.format(args.sampling_strategy) 850 | 851 | if args.adjust_lr != 0: 852 | output_dir += '_adjustLr{}'.format(args.adjust_lr) 853 | if args.adjust_lr == 2: 854 | output_dir += '_MinLr{}'.format(args.cyclic_min_lr) 855 | 856 | if args.ReplayType != 'mixRep': 857 | output_dir += '_{}'.format(args.ReplayType) 858 | 859 | if args.ABS_performanceGap != 0.5: 860 | output_dir += 'ABSPG{}'.format(args.ABS_performanceGap) 861 | 862 | if args.gradient_steps_per_batch > 1: 863 | output_dir+='_GDSteps{}'.format(args.gradient_steps_per_batch) 864 | 865 | output_dir += '_BS{}'.format(args.batch_size) 866 | 867 | if args.GN > 0: 868 | output_dir += '_GN{}'.format(args.GN) 869 | 870 | if args.NOSubBatch > 1: 871 | output_dir += '_NOSubBatch{}'.format(args.NOSubBatch) 872 | 873 | if args.weight_decay != 1e-4: 874 | output_dir += '_WD{}'.format(args.weight_decay) 875 | 876 | if args.min_lr > 0.0: 877 | output_dir += '_minLr{}'.format(args.min_lr) 878 | 879 | if args.use_ADRep <=0: 880 | output_dir += '_noADRep' 881 | 882 | if args.evaluate: 883 | output_dir += '/evaluate' 884 | 885 | 886 | return output_dir 887 | 888 | def global_gather(x): 889 | all_x = [torch.ones_like(x) 890 | for _ in range(dist.get_world_size())] 891 | 
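# torch.distributed.all_gather is collective: every rank must reach this call with a tensor of identical shape, and all_x[r] receives rank r's copy. A hypothetical 2-GPU example: if rank 0 holds tensor([1., 2.]) and rank 1 holds tensor([3., 4.]), global_gather returns tensor([1., 2., 3., 4.]) on both ranks.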
dist.all_gather(all_x, x, async_op=False) 892 | return torch.cat(all_x, dim=0) 893 | 894 | def compute_loss_CE(args, output, target, criterion, size_subBatch_new): 895 | lossF = criterion(output[:size_subBatch_new], target[:size_subBatch_new]) 896 | if target.numel() > size_subBatch_new and args.weight_old_data > 0.0 and args.sample_rate_from_buffer > 0.0: 897 | lossRep = criterion(output[size_subBatch_new:], target[size_subBatch_new:]) 898 | return (lossF*0.5+args.weight_old_data*lossRep*0.5) 899 | else: 900 | return lossF 901 | 902 | 903 | def compute_loss(args, output, target, criterion, size_subBatch_new): 904 | return compute_loss_CE(args, output, target, criterion, size_subBatch_new) 905 | 906 | def merge_data(image_buf, target_buf, image, target): 907 | # currently just simply merge them 908 | if image_buf.is_cuda: 909 | image_buf = image_buf.cpu() 910 | if target_buf.is_cuda: 911 | target_buf = target_buf.cpu() 912 | 913 | return torch.cat((image, image_buf)), torch.cat((target, target_buf)) 914 | 915 | def merge_data_gpu(image_buf, target_buf, image, target, args): 916 | if not image_buf.is_cuda: 917 | image_buf = image_buf.cuda(args.gpu, non_blocking=True) 918 | if not target_buf.is_cuda: 919 | target_buf = target_buf.cuda(args.gpu, non_blocking=True) 920 | 921 | return torch.cat((image, image_buf)), torch.cat((target, target_buf)) 922 | 923 | def get_lr(optimizer): 924 | for param_group in optimizer.param_groups: 925 | return param_group['lr'] 926 | 927 | 928 | def adjust_learning_rate(optimizer, epoch, args, ngpus_per_node, iter_curr = 0): 929 | lr = args.lr 930 | 931 | lr *= 0.5 * (1. + math.cos(math.pi * epoch / args.epochs)) 932 | lr = max(args.min_lr, lr) 933 | 934 | for param_group in optimizer.param_groups: 935 | param_group['lr'] = lr 936 | 937 | 938 | def accuracy(output, target, topk=(1,), cross_GPU = False): 939 | """Computes the accuracy over the k top predictions for the specified values of k""" 940 | with torch.no_grad(): 941 | if cross_GPU: 942 | output = global_gather(output) 943 | target = global_gather(target) 944 | 945 | maxk = max(topk) 946 | batch_size = target.size(0) 947 | 948 | _, pred = output.topk(maxk, 1, True, True) 949 | pred = pred.t() 950 | correct = pred.eq(target.reshape(1, -1).expand_as(pred)) 951 | 952 | res = [] 953 | for k in topk: 954 | correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True) 955 | res.append(correct_k.mul_(100.0 / batch_size)) 956 | return res 957 | 958 | 959 | if __name__ == '__main__': 960 | main() -------------------------------------------------------------------------------- /CLOC/code_online/no_PoLRS/yfcc100m_dataset.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torchvision.datasets.vision import StandardTransform 3 | from torch.utils.data import Dataset, IterableDataset 4 | from PIL import Image 5 | 6 | import os 7 | import os.path 8 | import csv 9 | 10 | import sys 11 | import math 12 | import random 13 | import matplotlib.pyplot as plt 14 | 15 | 16 | def has_file_allowed_extension(filename, extensions): 17 | """Checks if a file is an allowed extension. 18 | 19 | Args: 20 | filename (string): path to a file 21 | extensions (tuple of strings): extensions to consider (lowercase) 22 | 23 | Returns: 24 | bool: True if the filename ends with one of given extensions 25 | """ 26 | return filename.lower().endswith(extensions) 27 | 28 | 29 | def is_image_file(filename): 30 | """Checks if a file is an allowed image extension. 
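The check is case-insensitive, so '.JPG' and '.jpg' are treated alike.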
31 | 32 | Args: 33 | filename (string): path to a file 34 | 35 | Returns: 36 | bool: True if the filename ends with a known image extension 37 | """ 38 | return has_file_allowed_extension(filename, IMG_EXTENSIONS) 39 | 40 | 41 | def make_dataset(directory, class_to_idx, extensions=None, is_valid_file=None): 42 | instances = [] 43 | directory = os.path.expanduser(directory) 44 | both_none = extensions is None and is_valid_file is None 45 | both_something = extensions is not None and is_valid_file is not None 46 | if both_none or both_something: 47 | raise ValueError("Both extensions and is_valid_file cannot be None or not None at the same time") 48 | if extensions is not None: 49 | def is_valid_file(x): 50 | return has_file_allowed_extension(x, extensions) 51 | for target_class in sorted(class_to_idx.keys()): 52 | class_index = class_to_idx[target_class] 53 | target_dir = os.path.join(directory, target_class) 54 | if not os.path.isdir(target_dir): 55 | continue 56 | for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)): 57 | for fname in sorted(fnames): 58 | path = os.path.join(root, fname) 59 | if is_valid_file(path): 60 | item = path, class_index 61 | instances.append(item) 62 | return instances 63 | 64 | 65 | def pil_loader(path): 66 | # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835) 67 | with open(path, 'rb') as f: 68 | img = Image.open(f) 69 | return img.convert('RGB') 70 | 71 | 72 | def accimage_loader(path): 73 | import accimage 74 | try: 75 | return accimage.Image(path) 76 | except IOError: 77 | # Potentially a decoding problem, fall back to PIL.Image 78 | return pil_loader(path) 79 | 80 | 81 | def default_loader(path): 82 | from torchvision import get_image_backend 83 | if get_image_backend() == 'accimage': 84 | return accimage_loader(path) 85 | else: 86 | return pil_loader(path) 87 | 88 | 89 | IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp') 90 | 91 | 92 | 93 | class YFCC_CL_Dataset_offline_val(Dataset): 94 | def __init__(self, args, loader = default_loader, extensions=IMG_EXTENSIONS, transform=None, 95 | target_transform=None): 96 | 97 | fname = args.data_val 98 | root = args.root 99 | 100 | print("YFCC_CL dataset loader = {}; extensions = {}".format(loader, extensions)) 101 | 102 | sys.stdout.flush() 103 | 104 | if isinstance(fname, torch._six.string_classes): 105 | fname = os.path.expanduser(fname) 106 | self.fname = fname 107 | 108 | self.transform = transform 109 | self.target_transform = target_transform 110 | 111 | self.labels, self.time_taken, self.user, self.store_loc = self._make_data(self.fname, root = root) 112 | if len(self.labels) == 0: 113 | msg = "Found 0 files in subfolders of: {}\n".format(self.fname) 114 | if extensions is not None: 115 | msg += "Supported extensions are: {}".format(",".join(extensions)) 116 | raise RuntimeError(msg) 117 | 118 | self.loader = loader 119 | self.extensions = extensions 120 | self.root = root 121 | 122 | self.batch_size = torch.cuda.device_count()*args.batch_size 123 | self.is_forward = True 124 | self.offset = 0 125 | 126 | print("root = {}; time_taken (an example) = {}; time_taken.len = {}; batch_size = {}".format(root, self.time_taken[1000], len(self.time_taken), self.batch_size)) 127 | 128 | def _make_data(self, fname, root): 129 | # read data 130 | fval = open(fname, 'r') 131 | lines_val = fval.readlines() 132 | labels = [None] * len(lines_val) 133 | time = [None] * len(lines_val) 134 | user = [None] * len(lines_val) 135 
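# each csv line is parsed below as: field 0 = class label, field 2 = unix timestamp at which the photo was taken, field 3 = uploader id, last field = image path relative to the dataset root (its trailing newline is stripped with [:-1])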
| store_loc = [None] * len(lines_val) 136 | 137 | for i in range(len(lines_val)): 138 | line_splitted = lines_val[i].split(",") 139 | labels[i] = int(line_splitted[0]) 140 | time[i] = int(line_splitted[2]) 141 | user[i] = line_splitted[3] 142 | store_loc[i] = line_splitted[-1][:-1] 143 | return labels, time, user, store_loc 144 | 145 | def set_transfer_time_point(self, args, val_set, time_last, is_forward = True): 146 | # find idx that is larger and closest to time_last 147 | self.is_forward = is_forward 148 | for i in range(len(self.time_taken)): 149 | if self.time_taken[i] >= time_last: 150 | print("[set_transfer_time_point]: time_last = {}; time[{}] = {}".format(time_last, i, self.time_taken[i])) 151 | self.offset = i 152 | return 153 | self.offset = len(self.time_taken) - 1 154 | 155 | def __getitem__(self, index): 156 | if self.is_forward: 157 | index = min(len(self.labels), index + self.offset) 158 | else: 159 | index = max(0, self.offset - index) 160 | 161 | if self.root is not None: 162 | path = self.root + self.store_loc[index] 163 | else: 164 | path = self.store_loc[index] 165 | 166 | sample = self.loader(path) 167 | if self.transform is not None: 168 | sample = self.transform(sample) 169 | 170 | return sample, self.labels[index], self.time_taken[index], index 171 | 172 | 173 | def __len__(self): 174 | if self.is_forward: 175 | return len(self.labels) - self.offset 176 | else: 177 | return self.offset + 1 178 | 179 | 180 | class YFCC_CL_Dataset_online(Dataset): 181 | def __init__(self, args, loader = default_loader, extensions=IMG_EXTENSIONS, transform=None, transform_RepBuf = None, 182 | target_transform=None, target_transform_RepBuf = None, trans_test = None): 183 | 184 | fname = args.data 185 | root = args.root 186 | size_buf = args.size_replay_buffer 187 | 188 | print("YFCC_CL dataset loader = {}; extensions = {}".format(loader, extensions)) 189 | 190 | sys.stdout.flush() 191 | 192 | if isinstance(fname, torch._six.string_classes): 193 | fname = os.path.expanduser(fname) 194 | self.fname = fname 195 | 196 | # for backwards-compatibility 197 | self.transform = transform 198 | self.transform_test = trans_test 199 | self.target_transform = target_transform 200 | self.transform_RepBuf = transform_RepBuf 201 | self.target_transform_RepBuf = target_transform_RepBuf 202 | 203 | # valid initial and final index 204 | self.used_data_start = args.used_data_start 205 | self.used_data_end = args.used_data_end 206 | 207 | print("[YFCC_CL_Dataset_ConGraDv4] trans_test = {}".format(self.transform_test)) 208 | 209 | self._make_data() 210 | self.data_size = len(self.labels) 211 | self.data_size_per_epoch = math.ceil(self.data_size/args.epochs) 212 | 213 | self.batch_size = torch.cuda.device_count()*args.batch_size 214 | 215 | if self.data_size_per_epoch % self.batch_size != 0: 216 | self.data_size_per_epoch = self.data_size_per_epoch - self.data_size_per_epoch % self.batch_size + self.batch_size 217 | 218 | 219 | self.loader = loader 220 | self.extensions = extensions 221 | self.root = root 222 | 223 | # for replay buffer 224 | self.size_buf = size_buf 225 | 226 | self.repBuf_sample_rate = 0.5 # use this later, test 0.5 case first 227 | self.sampling_strategy = args.sampling_strategy 228 | 229 | self.NOSubBatch = args.NOSubBatch 230 | self.SubBatch_index_offset = int(self.batch_size/self.NOSubBatch) 231 | 232 | self.gradient_steps_per_batch = int(args.gradient_steps_per_batch) # repBatch_rate is multiplied by this 233 | # ratio between the replay buffer and the new input batch (rounded up to 
an integer) 234 | self.repBatch_rate = math.ceil(self.repBuf_sample_rate/(1-self.repBuf_sample_rate)) 235 | 236 | self.repType = args.ReplayType 237 | 238 | if self.sampling_strategy == 'Reservoir': 239 | self.buf_out_dir = args.output_dir + '/reservoir_buf' 240 | os.makedirs(self.buf_out_dir, exist_ok = True) 241 | self.buf_last = None 242 | self.gpu = args.gpu 243 | 244 | if self.sampling_strategy == 'RingBuf': 245 | self.samples_per_class = math.floor(size_buf / args.num_classes) 246 | print("[ConGraDv4]: self.samples_per_class = {}".format(self.samples_per_class)) 247 | self.num_classes = args.num_classes 248 | self.buf_out_dir = args.output_dir + '/RingBuf_buf' 249 | os.makedirs(self.buf_out_dir, exist_ok = True) 250 | self.buf_last = None 251 | self.gpu = args.gpu 252 | 253 | print("[initData]: root = {}; time_taken (an example) = {}; time_taken.len = {}; batch_size = {}; size_buf = {}; repBuf_sample_rate = {}".format(root, self.time_taken[1000], len(self.time_taken), self.batch_size, self.size_buf, self.repBuf_sample_rate)) 254 | print("[initData]: repBatch_rate = {}; NOSubBatch = {}; SubBatch_index_offset = {}".format(self.repBatch_rate, self.NOSubBatch, self.SubBatch_index_offset)) 255 | print("[initData]: transform = {}; transform_RepBuf = {}".format(self.transform, self.transform_RepBuf)) 256 | 257 | def _make_data(self): 258 | self.labels = torch.load(self.fname+'train_labels.torchSave') 259 | self.time_taken = torch.load(self.fname+'train_time.torchSave') 260 | 261 | self.user = torch.load(self.fname+'train_userID.torchSave') 262 | self.store_loc = [None] * len(self.labels) 263 | self.idx_data = [] 264 | 265 | def _change_data_range_FIFO(self): 266 | print("reading store location from {}".format(self.fname+'train_store_loc.torchSave')) 267 | tmp_loc = torch.load(self.fname+'train_store_loc.torchSave') 268 | print("tmp_loc.size = {}".format(len(tmp_loc))) 269 | for i in range(self.used_data_start, self.used_data_end): 270 | self.store_loc[i] = tmp_loc[i][:-1] 271 | if i % 1e5 == 0: 272 | print("store_loc[{}] = {}".format(i, self.store_loc[i])) 273 | sys.stdout.flush() 274 | 275 | def _change_data_range_reservoir(self, idx_Reservoir_sample, buf_init, epoch = 0): 276 | tmp_loc = torch.load(self.fname+'train_store_loc.torchSave') 277 | i = 0 278 | 279 | for i in range(len(tmp_loc)): 280 | if (i >= self.used_data_start - self.batch_size and (self.used_data_end <= 0 or i < self.used_data_end)): 281 | self.store_loc[i] = tmp_loc[i][:-1] 282 | 283 | # create a new reservoir idx_set 284 | if i >= self.used_data_start and i % self.batch_size == 0: 285 | batch_num = i//self.batch_size 286 | batch_num_curr_epoch = (i-self.used_data_start)//self.batch_size 287 | 288 | buf_file_name = self.buf_out_dir + '/{}.buf'.format(batch_num_curr_epoch) 289 | buf_curr = self.buf_last.clone() 290 | 291 | if batch_num == 1: 292 | buf_curr = torch.tensor(list(range(i-self.batch_size, i))) 293 | elif batch_num > 1 and buf_curr.numel() < self.size_buf: 294 | buf_curr = torch.cat((buf_curr, torch.tensor(list(range(i-self.batch_size, i))))).unique() 295 | 296 | elif batch_num > 1: 297 | # do reservoir sampling when repBuf size reaches the maximum value 298 | reservoir_value = torch.randint(i, [self.batch_size]).unique() 299 | replace_idx = (reservoir_value < buf_curr.numel()).nonzero().flatten() 300 | if replace_idx.numel() > 0: 301 | buf_curr[reservoir_value[replace_idx]] = (i-replace_idx-1) 302 | 303 | if buf_curr.numel() > self.size_buf: 304 | buf_curr = buf_curr[:self.size_buf] 305 | 306 | self.buf_last =
buf_curr.clone() 307 | # save buffer 308 | if self.gpu == 0: 309 | torch.save(buf_curr, buf_file_name) 310 | 311 | 312 | elif idx_Reservoir_sample >=0 and idx_Reservoir_sample < buf_init.numel() and i == buf_init[idx_Reservoir_sample]: 313 | self.store_loc[i] = tmp_loc[i][:-1] 314 | idx_Reservoir_sample += 1 315 | 316 | 317 | if i % 1e5 == 0: 318 | print("store_loc[{}] = {}".format(i, self.store_loc[i])) 319 | sys.stdout.flush() 320 | 321 | if self.used_data_end > 0 and i >= self.used_data_end: 322 | break 323 | 324 | i += 1 325 | 326 | if self.gpu == 0: 327 | buf_last_file_name = self.buf_out_dir + '/last{}.buf'.format(epoch) 328 | torch.save(self.buf_last, buf_last_file_name) 329 | 330 | def _change_data_range_RingBuf(self, idx_RingBuf_sample, buf_init, buf_curr, idx_buf_curr, epoch): 331 | tmp_loc = torch.load(self.fname+'train_store_loc.torchSave') 332 | i = 0 333 | for i in range(len(tmp_loc)): 334 | if (i >= self.used_data_start - self.batch_size and (self.used_data_end <= 0 or i < self.used_data_end)): 335 | self.store_loc[i] = tmp_loc[i][:-1] 336 | 337 | if i >= self.used_data_start: 338 | # save buffer 339 | if i % self.batch_size == 0 and self.gpu == 0: 340 | self.buf_last = buf_curr[:] 341 | self.idx_buf_last = idx_buf_curr[:] 342 | batch_num = i//self.batch_size 343 | batch_num_curr_epoch = (i-self.used_data_start)//self.batch_size 344 | buf_file_name = self.buf_out_dir + '/{}.buf'.format(batch_num_curr_epoch) 345 | torch.save(buf_curr, buf_file_name) 346 | 347 | class_curr = self.labels[i] 348 | 349 | if buf_curr[class_curr] is None: 350 | buf_curr[class_curr] = torch.tensor([i]) 351 | idx_buf_curr[class_curr] = 0 352 | 353 | elif buf_curr[class_curr].numel() < self.samples_per_class: 354 | buf_curr[class_curr] = torch.cat((buf_curr[class_curr], torch.tensor([i]))) 355 | idx_buf_curr[class_curr] += 1 356 | 357 | else: 358 | idx_buf_curr[class_curr] = (idx_buf_curr[class_curr] + 1) % self.samples_per_class 359 | buf_curr[class_curr][idx_buf_curr[class_curr]] = i 360 | 361 | 362 | elif idx_RingBuf_sample >=0 and idx_RingBuf_sample < buf_init.numel() and i == buf_init[idx_RingBuf_sample]: 363 | self.store_loc[i] = tmp_loc[i][:-1] 364 | idx_RingBuf_sample += 1 365 | 366 | if i % 1e5 == 0: 367 | print("store_loc[{}] = {}".format(i, self.store_loc[i])) 368 | sys.stdout.flush() 369 | 370 | if self.used_data_end > 0 and i >= self.used_data_end: 371 | break 372 | 373 | i += 1 374 | 375 | if self.gpu == 0: 376 | buf_last_file_name = self.buf_out_dir + '/buf_last{}.buf'.format(epoch) 377 | buf_idx_last_file_name = self.buf_out_dir + '/buf_idx_last{}.buf'.format(epoch) 378 | 379 | torch.save(self.buf_last, buf_last_file_name) 380 | torch.save(self.idx_buf_last, buf_idx_last_file_name) 381 | 382 | def _set_data_idx(self, epoch): 383 | self.offset = epoch*self.data_size_per_epoch 384 | size_curr_epoch = min(self.data_size - self.offset, self.data_size_per_epoch) 385 | size_curr_epoch = size_curr_epoch - (size_curr_epoch % self.batch_size) 386 | 387 | batchSize = int(self.batch_size/self.NOSubBatch) 388 | self.data_idx = [None]* (size_curr_epoch *self.gradient_steps_per_batch) 389 | iter_total = size_curr_epoch//batchSize 390 | bsReplicated = batchSize*self.gradient_steps_per_batch 391 | for i in range(0, int(iter_total)): 392 | self.data_idx[i*bsReplicated:(i+1)*bsReplicated] = list(range(self.offset+i*batchSize, self.offset+(i+1)*batchSize))*self.gradient_steps_per_batch 393 | 394 | 395 | def _change_data_range(self, epoch = 0): 396 | self._set_data_idx(epoch) 397 | # compute data range to 
change 398 | if self.sampling_strategy == 'Reservoir' or self.sampling_strategy == 'RingBuf': 399 | self.used_data_start = epoch*self.data_size_per_epoch 400 | self.used_data_end = min(len(self.labels), (epoch+1)*self.data_size_per_epoch) 401 | else: 402 | self.used_data_start = max(0, epoch*self.data_size_per_epoch-self.size_buf-self.batch_size) 403 | self.used_data_end = min(len(self.labels), (epoch+1)*self.data_size_per_epoch+self.batch_size) 404 | 405 | print("change valid data range to: [{},{}]".format(self.used_data_start, self.used_data_end)) 406 | 407 | self.store_loc = [None] * len(self.labels) 408 | if self.sampling_strategy == 'Reservoir': 409 | if self.used_data_start == 0: 410 | self.buf_last = torch.tensor(list(range(self.batch_size))).long() 411 | buf_init = self.buf_last 412 | idx_Reservoir_sample = -1 413 | else: 414 | idx_Reservoir_sample = 0 415 | buf_last_file_name = self.buf_out_dir + '/last{}.buf'.format(epoch-1) 416 | self.buf_last = torch.load(buf_last_file_name) 417 | self.buf_last = self.buf_last.unique() 418 | buf_init, _ = self.buf_last.clone().flatten().sort() 419 | print("[reservoir]: idx_Reservoir_sample = {}; buf_init = {}".format(idx_Reservoir_sample, buf_init[:10])) 420 | self._change_data_range_reservoir(idx_Reservoir_sample, buf_init, epoch = epoch) 421 | 422 | elif self.sampling_strategy == 'RingBuf': 423 | # precompute the maximum index for each iteration 424 | print("[RingBuf]: entering ringBuf init stage") 425 | sys.stdout.flush() 426 | if self.used_data_start == 0: 427 | print("[RingBuf]: entering ringBuf init stage 1") 428 | self.buf_last = [None] * self.num_classes 429 | self.idx_buf_last = [-1] * self.num_classes 430 | buf_init = None 431 | idx_RingBuf_sample = -1 432 | else: 433 | print("[RingBuf]: entering ringBuf init stage 2") 434 | sys.stdout.flush() 435 | idx_RingBuf_sample = 0 436 | buf_last_file_name = self.buf_out_dir + '/buf_last{}.buf'.format(epoch-1) 437 | buf_idx_last_file_name = self.buf_out_dir + '/buf_idx_last{}.buf'.format(epoch-1) 438 | 439 | self.buf_last = torch.load(buf_last_file_name) 440 | self.idx_buf_last = torch.load(buf_idx_last_file_name) 441 | buf_init = None 442 | for i in range(0, len(self.buf_last)): 443 | if self.buf_last[i] is not None: 444 | if buf_init is None: 445 | buf_init = self.buf_last[i].clone() 446 | else: 447 | buf_init = torch.cat((buf_init, self.buf_last[i])) 448 | 449 | buf_init, _ = buf_init.flatten().sort() 450 | buf_init = buf_init.unique() 451 | print("[RingBuf]: idx_RingBuf_sample = {}; buf_init = {}".format(idx_RingBuf_sample, buf_init[:10])) 452 | sys.stdout.flush() 453 | buf_curr = self.buf_last[:] 454 | idx_buf_curr = self.idx_buf_last[:] 455 | self._change_data_range_RingBuf(idx_RingBuf_sample, buf_init, buf_curr, idx_buf_curr, epoch) 456 | else: 457 | self._change_data_range_FIFO() 458 | 459 | 460 | def _sample_FIFO(self, index): 461 | if index < self.batch_size: 462 | return 0 463 | else: 464 | repBuf_idx = random.randint(max(0, index-self.size_buf-self.batch_size), index-self.batch_size) 465 | return repBuf_idx 466 | 467 | def _sample_reservoir(self, index): 468 | batch_num = math.floor(index/self.batch_size) 469 | if batch_num == 0: 470 | return 0 471 | else: 472 | batch_num_curr_epoch = math.floor((index - self.used_data_start)/self.batch_size) 473 | buf_file_name = self.buf_out_dir + '/{}.buf'.format(batch_num_curr_epoch) 474 | buf_curr = torch.load(buf_file_name) 475 | repBuf_idx = buf_curr[random.randint(0, buf_curr.numel()-1)].item() 476 | return repBuf_idx 477 | 478 | def 
_sample_RingBuf(self, index): 479 | batch_num = math.floor(index/self.batch_size) 480 | if batch_num == 0: 481 | return 0 482 | else: 483 | batch_num_curr_epoch = math.floor((index - self.used_data_start)/self.batch_size) 484 | buf_file_name = self.buf_out_dir + '/{}.buf'.format(batch_num_curr_epoch) 485 | buf_curr = torch.load(buf_file_name) 486 | 487 | class_idx_all = torch.randperm(len(buf_curr)) 488 | class_idx = 0 489 | while buf_curr[class_idx_all[class_idx]] is None: 490 | class_idx += 1 491 | class_idx = class_idx_all[class_idx] 492 | repBuf_idx = buf_curr[class_idx][random.randint(0, buf_curr[class_idx].numel()-1)].item() 493 | return repBuf_idx 494 | 495 | 496 | def _sample(self, index): 497 | if self.sampling_strategy == 'FIFO': 498 | return self._sample_FIFO(index) 499 | elif self.sampling_strategy == 'RingBuf': 500 | return self._sample_RingBuf(index) 501 | elif self.sampling_strategy == 'Reservoir': 502 | return self._sample_reservoir(index) 503 | else: 504 | return self._sample_FIFO(index) 505 | 506 | def __getitem__(self, index): 507 | index = self.data_idx[index] 508 | 509 | if self.repType == 'mixRep': 510 | # mixed replay 511 | index_pop = index 512 | else: 513 | # pure replay based training 514 | index_pop = self._sample(index + self.batch_size) 515 | 516 | if index_pop < 0: 517 | index_pop = 0 518 | is_valid = torch.tensor(0) 519 | else: 520 | is_valid = torch.tensor(1) 521 | 522 | num_batches = 1 523 | 524 | 525 | target_pop = self.labels[index_pop] 526 | path_pop = self.root + self.store_loc[index_pop] 527 | sample_pop = self.loader(path_pop) 528 | if self.transform is not None: 529 | sample_pop = self.transform(sample_pop) 530 | if self.target_transform is not None: 531 | target_pop = self.target_transform(target_pop) 532 | 533 | sample_test = torch.zeros(self.NOSubBatch, sample_pop.size()[0], sample_pop.size()[1], sample_pop.size()[2]) 534 | target_test = torch.zeros(self.NOSubBatch).long() 535 | test_idx = torch.zeros(self.NOSubBatch).long() 536 | time_taken_test = torch.zeros(self.NOSubBatch).long() 537 | user_test = torch.zeros(self.NOSubBatch).long() 538 | 539 | # get the test data of full batch size 540 | for i in range(0, self.NOSubBatch): 541 | test_idx[i] = index+i*self.SubBatch_index_offset 542 | if test_idx[i] >= len(self.time_taken): 543 | test_idx[i] = len(self.time_taken) - 1 544 | 545 | time_taken_test[i] = self.time_taken[test_idx[i]] 546 | target_test[i] = self.labels[test_idx[i]] 547 | user_test[i] = self.user[test_idx[i]] 548 | 549 | path = self.root + self.store_loc[test_idx[i]] 550 | sample = self.loader(path) 551 | if self.transform is not None: 552 | sample_test[i] = self.transform_test(sample) 553 | if self.target_transform is not None: 554 | target_test[i] = self.target_transform(target_test[i]) 555 | 556 | # randomly sample from repBuf 557 | sample_RepBuf = torch.zeros(self.repBatch_rate* num_batches, sample_pop.size()[0], sample_pop.size()[1], sample_pop.size()[2]) 558 | target_RepBuf = torch.zeros(self.repBatch_rate* num_batches).long() 559 | repBuf_idx = torch.zeros(self.repBatch_rate * num_batches).long() 560 | 561 | if self.repType == 'mixRep': 562 | for i in range(0, int(self.repBatch_rate * num_batches)): 563 | repBuf_idx[i] = self._sample(index) 564 | path_RepBuf = self.root + self.store_loc[repBuf_idx[i]] 565 | target_RepBuf[i] = self.labels[repBuf_idx[i]] 566 | 567 | sample_RepBuf_tmp = self.loader(path_RepBuf) 568 | if self.transform_RepBuf is not None: 569 | sample_RepBuf[i] = self.transform_RepBuf(sample_RepBuf_tmp) 570 | if 
self.target_transform_RepBuf is not None: 571 | target_RepBuf[i] = self.target_transform_RepBuf(target_RepBuf[i]) 572 | 573 | return sample_test, target_test, user_test, time_taken_test, test_idx, sample_pop, target_pop, index_pop, sample_RepBuf, target_RepBuf, repBuf_idx 574 | 575 | 576 | 577 | def __len__(self): 578 | return len(self.data_idx) 579 | -------------------------------------------------------------------------------- /CLOC/data_preparation/download_images/download_images.py: -------------------------------------------------------------------------------- 1 | from datetime import datetime 2 | import sys 3 | import csv 4 | 5 | import io 6 | import random 7 | import shutil 8 | 9 | from multiprocessing import Pool 10 | import pathlib 11 | 12 | import requests 13 | from PIL import Image 14 | import time 15 | import os 16 | 17 | import wget 18 | 19 | def image_downloader(url_and_path:list): 20 | img_url = url_and_path[0] 21 | save_path = url_and_path[1] 22 | 23 | try: 24 | res = requests.get(img_url, stream=True) 25 | # count = 1 26 | # while res.status_code not in [ 301, 302, 303, 307, 308, 200 ] and count <= 5: 27 | # res = requests.get(img_url, stream=True) 28 | # # print(f'Retry: {count} {img_url}') 29 | # count += 1 30 | # checking the type for image 31 | if 'image' not in res.headers.get("content-type", ''): 32 | # print('ERROR: URL does not appear to be an image') 33 | return 0 34 | if not os.path.exists(save_path): 35 | os.makedirs(os.path.dirname(save_path), exist_ok = True) 36 | i = Image.open(io.BytesIO(res.content)) 37 | i.save(save_path) 38 | return 1 39 | 40 | except Exception: # treat network errors and broken images as a failed download 41 | return 0 42 | 43 | def test_function(number): 44 | return number*number 45 | 46 | def run_downloader(process:int, urls_and_paths:list): 47 | 48 | pool = Pool(process) # honor the requested number of worker processes 49 | 50 | results = pool.imap_unordered(image_downloader, urls_and_paths) 51 | total_number = 0 52 | success_number = 0 53 | 54 | for r in results: 55 | total_number = total_number + 1 56 | if r == 1: 57 | success_number = success_number + 1 58 | pool.close() # all results consumed; release the worker processes 59 | return success_number, total_number 60 | 61 | header = ['photoid', 'uid', 'unickname', 'datetaken', 'capturedevice', 'title', 'description', 'usertags','machinetags','longitude','latitude','accuracy', 'web page urls', 'downloadurl original', 'downloadurl medium size images (used)', 'local path'] 62 | 63 | 64 | # f_out = open('new_metadata.csv.zip', 'r') 65 | csv.field_size_limit(sys.maxsize) 66 | 67 | urls_and_paths = [] 68 | success_number_total = 0 69 | 70 | num_process = 5 # accelerate downloading using multiple processes 71 | current_start = 0 # first image index to process; set it to the last downloaded index to resume after an interrupted run 72 | batch_size = 200 # number of images to download simultaneously 73 | part_length = 50000000 # number of images per part when splitting the dataset across several concurrent downloaders 74 | current_part = 0 # index of the part that this downloader instance is responsible for 75 | current_end = part_length*(current_part+1) 76 | root_folder = 'dataset/' # root folder for the downloaded images; keep the trailing '/' and keep it consistent with the dataset folder used by the training code 77 | 78 | print("downloading files [{} , {}]".format(current_start, current_end)) 79 | sys.stdout.flush() 80 | fname = '../release/download_link_and_locations.csv' 81 | with
open(fname) as f: 82 | csv_reader = csv.reader(f, delimiter=',') 83 | line_count = 0 84 | for row in csv_reader: 85 | line_count += 1 86 | 87 | if line_count < current_start: 88 | continue 89 | elif line_count >= current_end: 90 | break 91 | 92 | downloadurl_m = row[0] 93 | local_path = root_folder + row[1] 94 | 95 | if line_count % batch_size == 0 : 96 | # print('line count = {}'.format(line_count)) 97 | # print('downloadurl_m = {}; local path = {}'.format(downloadurl_m, local_path)) 98 | urls_and_paths += [[downloadurl_m, local_path]] 99 | success_number, _ = run_downloader(num_process, urls_and_paths) 100 | success_number_total += success_number 101 | urls_and_paths = [] 102 | if line_count % 1000 == 0: 103 | print('success_rate = {}/{}'.format(success_number_total, line_count)) 104 | sys.stdout.flush() 105 | else: 106 | urls_and_paths += [[downloadurl_m, local_path]] 107 | 108 | 109 | print(f'Processed {line_count} lines.') 110 | -------------------------------------------------------------------------------- /CLOC/exp_BS/eval_BS128.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 2 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --evaluate \ 17 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize40000_adjustLr1_BS256_NOSubBatch2_noADRep/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | --data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_BS/eval_BS256.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --evaluate \ 17 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize40000_adjustLr1_BS256_noADRep/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | --data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_BS/eval_BS64.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 
6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 4 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --evaluate \ 17 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize40000_adjustLr1_BS256_NOSubBatch4_noADRep/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | --data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_BS/train_BS128.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 2 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | --root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_BS/train_BS256.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | --root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_BS/train_BS64.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 4 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | 
--root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_LR/eval_PoLRS.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/best_model/main_online_best_model.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --weight_old_data 1.0 \ 8 | --batch-size 256 \ 9 | --NOSubBatch 1 \ 10 | --workers 64 \ 11 | --gradient_steps_per_batch 1 \ 12 | --size_replay_buffer 40000 \ 13 | --epochs 90 \ 14 | --LR_adjust_intv 5 \ 15 | --use_ADRep 0 \ 16 | --evaluate \ 17 | --resume 'results_best_model/resnet50_lr0.05_epochs90_bufSize40000_LrAjIntv5_BS256_noADRep/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | --data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:23794' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_LR/eval_constant.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | 4 | python ../code_online/no_PoLRS/main_online.py \ 5 | -a resnet50 \ 6 | --lr 0.05 \ 7 | --weight-decay 0.0001 \ 8 | --batch-size 256 \ 9 | --NOSubBatch 1 \ 10 | --workers 32 \ 11 | --size_replay_buffer 40000 \ 12 | --val_freq 1000 \ 13 | --epochs 90 \ 14 | --adjust_lr 0 \ 15 | --SaveInter 1 \ 16 | --use_ADRep 0 \ 17 | --evaluate \ 18 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize40000_BS256_noADRep/checkpoint.pth.tar' \ 19 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 20 | --root "../data_preparation/release/dataset/images/" \ 21 | --data "../data_preparation/release/" \ 22 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 23 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 24 | -------------------------------------------------------------------------------- /CLOC/exp_LR/eval_cosine.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --evaluate \ 17 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize40000_adjustLr1_BS256_noADRep/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | 
--data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_LR/train_PoLRS.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/best_model/main_online_best_model.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --weight_old_data 1.0 \ 8 | --batch-size 256 \ 9 | --NOSubBatch 1 \ 10 | --workers 64 \ 11 | --gradient_steps_per_batch 1 \ 12 | --size_replay_buffer 40000 \ 13 | --epochs 90 \ 14 | --LR_adjust_intv 5 \ 15 | --use_ADRep 0 \ 16 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | --root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:23794' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_LR/train_constant.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | 4 | python ../code_online/no_PoLRS/main_online.py \ 5 | -a resnet50 \ 6 | --lr 0.05 \ 7 | --weight-decay 0.0001 \ 8 | --batch-size 256 \ 9 | --NOSubBatch 1 \ 10 | --workers 32 \ 11 | --size_replay_buffer 40000 \ 12 | --val_freq 1000 \ 13 | --epochs 90 \ 14 | --adjust_lr 0 \ 15 | --SaveInter 1 \ 16 | --use_ADRep 0 \ 17 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 18 | --root "../data_preparation/release/dataset/images/" \ 19 | --data "../data_preparation/release/" \ 20 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 21 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 22 | -------------------------------------------------------------------------------- /CLOC/exp_LR/train_cosine.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | --root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_RepBuf/eval_39M.sh: 
-------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --evaluate \ 17 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize40000000_adjustLr1_BS256_noADRep/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | --data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_RepBuf/eval_40K.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --evaluate \ 17 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize40000_adjustLr1_BS256_noADRep/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | --data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_RepBuf/eval_4M.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 4000000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --evaluate \ 17 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize4000000_adjustLr1_BS256_noADRep/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | --data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_RepBuf/eval_ADRep.sh: -------------------------------------------------------------------------------- 1 | 
#!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 1 \ 16 | --evaluate \ 17 | --resume 'results_no_PoLRS/resnet50_lr0.05_epochs90_bufSize40000_adjustLr1_BS256/checkpoint.pth.tar' \ 18 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 19 | --root "../data_preparation/release/dataset/images/" \ 20 | --data "../data_preparation/release/" \ 21 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 22 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 23 | -------------------------------------------------------------------------------- /CLOC/exp_RepBuf/train_39M.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | --root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_RepBuf/train_40K.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | --root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_RepBuf/train_4M.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 4000000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 0 \ 16 | --cell_id 
"../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | --root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_RepBuf/train_ADRep.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | python ../code_online/no_PoLRS/main_online.py \ 4 | -a resnet50 \ 5 | --lr 0.05 \ 6 | --weight-decay 0.0001 \ 7 | --batch-size 256 \ 8 | --NOSubBatch 1 \ 9 | --workers 32 \ 10 | --size_replay_buffer 40000 \ 11 | --val_freq 1000 \ 12 | --epochs 90 \ 13 | --adjust_lr 1 \ 14 | --SaveInter 1 \ 15 | --use_ADRep 1 \ 16 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 17 | --root "../data_preparation/release/dataset/images/" \ 18 | --data "../data_preparation/release/" \ 19 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 20 | --dist-url 'tcp://127.0.0.1:11805' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 21 | -------------------------------------------------------------------------------- /CLOC/exp_best_model/eval_offline.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | #python main_v3.py \ 4 | python ../code_offline/main.py \ 5 | -a resnet50 \ 6 | --lr 0.025 \ 7 | --workers 64 \ 8 | --batch-size 256 \ 9 | --adjust_lr 1 \ 10 | --num_passes 1 \ 11 | --use_aug 1 \ 12 | --use_val 1 \ 13 | --val_freq 1 \ 14 | --epochs 90 \ 15 | --used_data_rate_start 0.0 \ 16 | --used_data_rate_end 1.0 \ 17 | --evaluate \ 18 | --resume 'results_offline/resnet50_lr0.025_epochs90_numPasses1_useVal1_valFreq1_BS256/checkpoint.pth.tar' \ 19 | --cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \ 20 | --root "../data_preparation/release/dataset/images/" \ 21 | --data "../data_preparation/release/" \ 22 | --data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \ 23 | --dist-url 'tcp://127.0.0.1:52176' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed 24 | 25 | #val_data_rate = 0.5/0.01 26 | # --resume '/mnt/beegfs/tier1/vcl-nfs-work/zcai/WorkSpace/continual_learning/training/code_github/Continual-Learning/results/resnet50_lr0.1_isOnline0_valDataRate0.001_adjustLr1_numPasses3_useAug1_useVal1_valFreq20/checkpoint.pth.tar' \ 27 | #/export/share/Datasets/yfcc100m_full_dataset/metadata_geolocation 28 | -------------------------------------------------------------------------------- /CLOC/exp_best_model/eval_online.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | 4 | python ../code_online/best_model/main_online_best_model.py \ 5 | -a resnet50 \ 6 | --lr 0.0125 \ 7 | --weight-decay 0.0001 \ 8 | --weight_old_data 1.0 \ 9 | --batch-size 256 \ 10 | --NOSubBatch 4 \ 11 | --workers 64 \ 12 | --gradient_steps_per_batch 1 \ 13 | --size_replay_buffer 40000 \ 14 | --epochs 90 \ 15 | 
--------------------------------------------------------------------------------
/CLOC/exp_best_model/train_offline.sh:
--------------------------------------------------------------------------------
#!/bin/sh

python ../code_offline/main.py \
-a resnet50 \
--lr 0.025 \
--workers 64 \
--batch-size 256 \
--adjust_lr 1 \
--num_passes 1 \
--use_aug 1 \
--use_val 1 \
--val_freq 1 \
--epochs 90 \
--used_data_rate_start 0.0 \
--used_data_rate_end 1.0 \
--cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \
--root "../data_preparation/release/dataset/images/" \
--data "../data_preparation/release/" \
--data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \
--dist-url 'tcp://127.0.0.1:52176' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed

--------------------------------------------------------------------------------
/CLOC/exp_best_model/train_online.sh:
--------------------------------------------------------------------------------
#!/bin/sh

python ../code_online/best_model/main_online_best_model.py \
-a resnet50 \
--lr 0.0125 \
--weight-decay 0.0001 \
--weight_old_data 1.0 \
--batch-size 256 \
--NOSubBatch 4 \
--workers 64 \
--gradient_steps_per_batch 1 \
--size_replay_buffer 40000 \
--epochs 90 \
--LR_adjust_intv 5 \
--cell_id "../data_preparation/release/cellID_yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250.npy" \
--root "../data_preparation/release/dataset/images/" \
--data "../data_preparation/release/" \
--data_val "../data_preparation/release/yfcc100m_metadata_with_labels_usedDataRatio0.05_t110000_t250_valid_files_2004To2014_compact_val.csv" \
--dist-url 'tcp://127.0.0.1:23794' --dist-backend 'nccl' --world-size 1 --rank 0 --multiprocessing-distributed
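Note: all of these launchers use single-node distributed training (--multiprocessing-distributed with --world-size 1 --rank 0 and the NCCL backend), which spawns one worker process per visible GPU. Each script hard-codes its rendezvous port in --dist-url, so two runs that share a port cannot coexist on one machine. A hypothetical way to start a second concurrent run is to rewrite the port first (GNU sed syntax; 23794 is the port hard-coded in the scripts above):

```
# Hypothetical: give a second copy of the run its own rendezvous port.
sed -i 's/127.0.0.1:23794/127.0.0.1:23795/' train_online.sh   # GNU sed
bash train_online.sh
```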
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 Intel Labs

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# DISCONTINUATION OF PROJECT #
This project will no longer be maintained by Intel.
Intel has ceased development of and contributions to this project, including, but not limited to, maintenance, bug fixes, new releases, and updates.
Intel no longer accepts patches to this project.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

# continual learning

The "CLOC" folder contains the code for the ICCV 2021 paper: Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data.

[[Paper link]](https://arxiv.org/pdf/2108.09020.pdf)

> reviewed: 12.19.2022 michaelbeale-il
--------------------------------------------------------------------------------