├── .gitignore ├── README.md ├── args.py ├── clip ├── __init__.py ├── bpe_simple_vocab_16e6.txt.gz ├── clip.py ├── model.py └── simple_tokenizer.py ├── config.py ├── configs ├── base.yaml ├── scpnet+coco.yaml ├── scpnet+cub.yaml ├── scpnet+nuswide.yaml └── scpnet+voc.yaml ├── cub_labels.txt ├── dataset ├── coco_train_singlelabel.txt ├── cub_train.txt ├── cub_val.txt ├── nus_train_singlelabel.txt ├── nus_val.txt ├── voc_train.txt └── voc_val.txt ├── figures └── overview.png ├── log.py ├── logs ├── scpnet+coco.txt ├── scpnet+cub.txt ├── scpnet+nuswide.txt └── scpnet+voc.txt ├── loss.py ├── model.py ├── nuswide_labels.txt ├── randaugment.py ├── relation+coco.npy ├── relation+cub.npy ├── relation+nuswide.npy ├── relation+voc.npy ├── scpnet.py ├── train.py ├── utils.py └── voc_labels.txt /.gitignore: -------------------------------------------------------------------------------- 1 | **/__pycache__ 2 | checkpoints -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # [Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels](https://openaccess.thecvf.com/content/CVPR2023/papers/Ding_Exploring_Structured_Semantic_Prior_for_Multi_Label_Recognition_With_Incomplete_CVPR_2023_paper.pdf) 2 | 3 | Official PyTorch Implementation of **SCPNet**, from the following paper: 4 | 5 | [Exploring Structured Semantic Prior 6 | for Multi Label Recognition with Incomplete Labels](https://openaccess.thecvf.com/content/CVPR2023/papers/Ding_Exploring_Structured_Semantic_Prior_for_Multi_Label_Recognition_With_Incomplete_CVPR_2023_paper.pdf). CVPR 2023. 7 | 8 | > Zixuan Ding*, Ao Wang*, Hui Chen†, Qiang Zhang, Pengzhang Liu, Yongjun Bao, Weipeng Yan, Jungong Han, 9 | >
Xidian University, Tsinghua University, JD.com 10 | 11 | 12 | **Abstract** 13 | 14 | Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations. In spite of promising performance, they generally overlook the 15 | valuable prior about the label-to-label correspondence. In this paper, we advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior about the label-to-label correspondence via a semantic prior prompter. We then present a novel Semantic Correspondence Prompt Network (SCPNet), which can thoroughly explore the structured semantic prior. A Prior-Enhanced Self-Supervised Learning method is further introduced to enhance the use of the prior. Comprehensive experiments and analyses on several widely used 16 | benchmark datasets show that our method significantly outperforms existing methods on all datasets, well demonstrating the effectiveness and the superiority of our method. 17 | 18 |
19 | ![SCPNet overview](figures/overview.png) 20 | 21 | 22 | 23 | 24 |
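The structured semantic prior itself ships with the repository as the `relation+*.npy` files, which each config references via its `relation_file` key. As a minimal sanity check (a sketch assuming NumPy is installed and that each file stores a square label-to-label matrix, e.g. 80x80 for the 80 COCO classes), you can inspect one directly:

```python
import numpy as np

# Hypothetical inspection snippet; not part of the official training code.
# Assumption: relation+coco.npy holds a label-to-label prior over the
# 80 COCO classes (num_classes: 80 in configs/scpnet+coco.yaml).
relation = np.load("relation+coco.npy")
print(relation.shape)                  # expected (80, 80) if the assumption holds
print(relation.min(), relation.max())  # value range of the prior
```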
25 | 26 | 27 | ## Credit to previous work 28 | This repository is built upon the code base of [ASL](https://github.com/Alibaba-MIIL/ASL) and [SPLC](https://github.com/xinyu1205/robust-loss-mlml), thanks very much! 29 | 30 | ## Performance 31 | 32 | | Dataset | mAP | Ckpt | Log | 33 | |:---: | :---: | :---: | :---: | 34 | | COCO | 76.4 | [scpnet+coco.ckpt](https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+coco.ckpt) | [scpnet+coco.txt](logs/scpnet+coco.txt) | 35 | | VOC | 91.2 | [scpnet+voc.ckpt](https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+voc.ckpt) | [scpnet+voc.txt](logs/scpnet+voc.txt) | 36 | | NUSWIDE | 62.0 | [scpnet+nuswide.ckpt](https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+nuswide.ckpt) | [scpnet+nuswide.txt](logs/scpnet+nuswide.txt) | 37 | | CUB | 25.7 | [scpnet+cub.ckpt](https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+cub.ckpt) | [scpnet+cub.txt](logs/scpnet+cub.txt) | 38 | 39 | ## Training 40 | 41 | ### COCO 42 | ```bash 43 | python train.py -c configs/scpnet+coco.yaml 44 | ``` 45 | 46 | ### VOC 47 | ```bash 48 | python train.py -c configs/scpnet+voc.yaml 49 | ``` 50 | 51 | ### NUSWIDE 52 | ```bash 53 | python train.py -c configs/scpnet+nuswide.yaml 54 | ``` 55 | 56 | ### CUB 57 | ```bash 58 | python train.py -c configs/scpnet+cub.yaml 59 | ``` 60 | 61 | ## Inference 62 | 63 | > Note: Please place the corresponding pretrained checkpoint at checkpoints/<config name>/round1/model-highest.ckpt, e.g. checkpoints/scpnet+coco/round1/model-highest.ckpt for COCO. 64 | 65 | ### COCO 66 | ```bash 67 | python train.py -c configs/scpnet+coco.yaml -t -r 1 68 | ``` 69 | 70 | ### VOC 71 | ```bash 72 | python train.py -c configs/scpnet+voc.yaml -t -r 1 73 | ``` 74 | 75 | ### NUSWIDE 76 | ```bash 77 | python train.py -c configs/scpnet+nuswide.yaml -t -r 1 78 | ``` 79 | 80 | ### CUB 81 | ```bash 82 | python train.py -c configs/scpnet+cub.yaml -t -r 1 83 | ``` 84 | 85 | ## Citation 86 | ``` 87 | @inproceedings{ding2023exploring, 88 | title={Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels}, 89 | author={Ding, Zixuan and Wang, Ao and Chen, Hui and Zhang, Qiang and Liu, Pengzhang and Bao, Yongjun and Yan, Weipeng and Han, Jungong}, 90 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, 91 | pages={3398--3407}, 92 | year={2023} 93 | } 94 | ``` -------------------------------------------------------------------------------- /args.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | parser = argparse.ArgumentParser(description='PyTorch MS_COCO Training') 4 | parser.add_argument('-c', 5 | '--config-file', 6 | help='config file', 7 | default='configs/base.yaml', 8 | type=str) 9 | parser.add_argument('-t', 10 | '--test', 11 | help='run test', 12 | default=False, 13 | action="store_true") 14 | parser.add_argument('-r', '--round', help='round', default=1, type=int) 15 | parser.add_argument('--resume', default=False, action='store_true') 16 | args = parser.parse_args() 17 | -------------------------------------------------------------------------------- /clip/__init__.py: -------------------------------------------------------------------------------- 1 | from .clip import * 2 | -------------------------------------------------------------------------------- /clip/bpe_simple_vocab_16e6.txt.gz: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/clip/bpe_simple_vocab_16e6.txt.gz -------------------------------------------------------------------------------- /clip/clip.py: -------------------------------------------------------------------------------- 1 | import hashlib 2 | import os 3 | import urllib 4 | import warnings 5 | from typing import List, Union 6 | 7 | import torch 8 | from PIL import Image 9 | from torchvision.transforms import (CenterCrop, Compose, Normalize, Resize, 10 | ToTensor) 11 | from tqdm import tqdm 12 | 13 | from .model import build_model 14 | from .simple_tokenizer import SimpleTokenizer as _Tokenizer 15 | 16 | try: 17 | from torchvision.transforms import InterpolationMode 18 | BICUBIC = InterpolationMode.BICUBIC 19 | except ImportError: 20 | BICUBIC = Image.BICUBIC 21 | 22 | if [int(v) for v in torch.__version__.split(".")[:3] if v.isdigit()] < [1, 7, 1]:  # numeric compare; plain string comparison mis-orders e.g. "1.10" 23 | warnings.warn("PyTorch version 1.7.1 or higher is recommended") 24 | 25 | __all__ = ["available_models", "load", "tokenize"] 26 | _tokenizer = _Tokenizer() 27 | 28 | _MODELS = { 29 | "RN50": 30 | "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt", 31 | "RN101": 32 | "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt", 33 | "RN50x4": 34 | "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt", 35 | "RN50x16": 36 | "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt", 37 | "ViT-B/32": 38 | "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt", 39 | "ViT-B/16": 40 | "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt", 41 | } 42 | 43 | 44 | def _download(url: str, root: str = os.path.expanduser("~/.cache/clip")): 45 | os.makedirs(root, exist_ok=True) 46 | filename = os.path.basename(url) 47 | 48 | expected_sha256 = url.split("/")[-2] 49 | download_target = os.path.join(root, filename) 50 | 51 | if os.path.exists(download_target) and not os.path.isfile(download_target): 52 | raise RuntimeError( 53 | f"{download_target} exists and is not a regular file") 54 | 55 | if os.path.isfile(download_target): 56 | if hashlib.sha256(open(download_target, 57 | "rb").read()).hexdigest() == expected_sha256: 58 | return download_target 59 | else: 60 | warnings.warn( 61 | f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file" 62 | ) 63 | 64 | with urllib.request.urlopen(url) as source, open(download_target, 65 | "wb") as output: 66 | with tqdm(total=int(source.info().get("Content-Length")), 67 | ncols=80, 68 | unit='iB', 69 | unit_scale=True) as loop: 70 | while True: 71 | buffer = source.read(8192) 72 | if not buffer: 73 | break 74 | 75 | output.write(buffer) 76 | loop.update(len(buffer)) 77 | 78 | if hashlib.sha256(open(download_target, 79 | "rb").read()).hexdigest() != expected_sha256: 80 | raise RuntimeError( 81 | "Model has been downloaded but the SHA256 checksum does not match" 82 | ) 83 | 84 | return download_target 85 | 86 | 87 | def _transform(n_px): 88 | return Compose([ 89 | Resize(n_px, interpolation=BICUBIC), 90 | CenterCrop(n_px), 91 | lambda image: image.convert("RGB"), 92 | ToTensor(), 93 | Normalize((0.48145466, 0.4578275,
0.40821073), 94 | (0.26862954, 0.26130258, 0.27577711)), 95 | ]) 96 | 97 | 98 | def available_models() -> List[str]: 99 | """Returns the names of available CLIP models""" 100 | return list(_MODELS.keys()) 101 | 102 | 103 | def load(name: str, 104 | device: Union[str, torch.device] = "cuda" 105 | if torch.cuda.is_available() else "cpu", 106 | jit=False): 107 | """Load a CLIP model 108 | 109 | Parameters 110 | ---------- 111 | name : str 112 | A model name listed by `clip.available_models()`, or the path to a model checkpoint containing the state_dict 113 | 114 | device : Union[str, torch.device] 115 | The device to put the loaded model 116 | 117 | jit : bool 118 | Whether to load the optimized JIT model or more hackable non-JIT model (default). 119 | 120 | Returns 121 | ------- 122 | model : torch.nn.Module 123 | The CLIP model 124 | 125 | preprocess : Callable[[PIL.Image], torch.Tensor] 126 | A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input 127 | """ 128 | if name in _MODELS: 129 | model_path = _download(_MODELS[name]) 130 | elif os.path.isfile(name): 131 | model_path = name 132 | else: 133 | raise RuntimeError( 134 | f"Model {name} not found; available models = {available_models()}") 135 | 136 | try: 137 | # loading JIT archive 138 | model = torch.jit.load(model_path, 139 | map_location=device if jit else "cpu").eval() 140 | state_dict = None 141 | except RuntimeError: 142 | # loading saved state dict 143 | if jit: 144 | warnings.warn( 145 | f"File {model_path} is not a JIT archive. Loading as a state dict instead" 146 | ) 147 | jit = False 148 | state_dict = torch.load(model_path, map_location="cpu") 149 | 150 | if not jit: 151 | model = build_model(state_dict or model.state_dict()).to(device) 152 | if str(device) == "cpu": 153 | model.float() 154 | return model, _transform(model.visual.input_resolution) 155 | 156 | # patch the device names 157 | device_holder = torch.jit.trace( 158 | lambda: torch.ones([]).to(torch.device(device)), example_inputs=[]) 159 | device_node = [ 160 | n for n in device_holder.graph.findAllNodes("prim::Constant") 161 | if "Device" in repr(n) 162 | ][-1] 163 | 164 | def patch_device(module): 165 | try: 166 | graphs = [module.graph] if hasattr(module, "graph") else [] 167 | except RuntimeError: 168 | graphs = [] 169 | 170 | if hasattr(module, "forward1"): 171 | graphs.append(module.forward1.graph) 172 | 173 | for graph in graphs: 174 | for node in graph.findAllNodes("prim::Constant"): 175 | if "value" in node.attributeNames() and str( 176 | node["value"]).startswith("cuda"): 177 | node.copyAttributes(device_node) 178 | 179 | model.apply(patch_device) 180 | patch_device(model.encode_image) 181 | patch_device(model.encode_text) 182 | 183 | # patch dtype to float32 on CPU 184 | if str(device) == "cpu": 185 | float_holder = torch.jit.trace(lambda: torch.ones([]).float(), 186 | example_inputs=[]) 187 | float_input = list(float_holder.graph.findNode("aten::to").inputs())[1] 188 | float_node = float_input.node() 189 | 190 | def patch_float(module): 191 | try: 192 | graphs = [module.graph] if hasattr(module, "graph") else [] 193 | except RuntimeError: 194 | graphs = [] 195 | 196 | if hasattr(module, "forward1"): 197 | graphs.append(module.forward1.graph) 198 | 199 | for graph in graphs: 200 | for node in graph.findAllNodes("aten::to"): 201 | inputs = list(node.inputs()) 202 | for i in [ 203 | 1, 2 204 | ]: # dtype can be the second or third argument to aten::to() 205 | if inputs[i].node()["value"] == 5: 206 | 
inputs[i].node().copyAttributes(float_node) 207 | 208 | model.apply(patch_float) 209 | patch_float(model.encode_image) 210 | patch_float(model.encode_text) 211 | 212 | model.float() 213 | 214 | return model, _transform(model.input_resolution.item()) 215 | 216 | 217 | def tokenize(texts: Union[str, List[str]], 218 | context_length: int = 77, 219 | truncate: bool = False) -> torch.LongTensor: 220 | """ 221 | Returns the tokenized representation of given input string(s) 222 | 223 | Parameters 224 | ---------- 225 | texts : Union[str, List[str]] 226 | An input string or a list of input strings to tokenize 227 | 228 | context_length : int 229 | The context length to use; all CLIP models use 77 as the context length 230 | 231 | truncate: bool 232 | Whether to truncate the text in case its encoding is longer than the context length 233 | 234 | Returns 235 | ------- 236 | A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length] 237 | """ 238 | if isinstance(texts, str): 239 | texts = [texts] 240 | 241 | sot_token = _tokenizer.encoder["<|startoftext|>"] 242 | eot_token = _tokenizer.encoder["<|endoftext|>"] 243 | all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] 244 | for text in texts] 245 | result = torch.zeros(len(all_tokens), context_length, dtype=torch.long) 246 | 247 | for i, tokens in enumerate(all_tokens): 248 | if len(tokens) > context_length: 249 | if truncate: 250 | tokens = tokens[:context_length] 251 | tokens[-1] = eot_token 252 | else: 253 | raise RuntimeError( 254 | f"Input {texts[i]} is too long for context length {context_length}" 255 | ) 256 | result[i, :len(tokens)] = torch.tensor(tokens) 257 | 258 | return result 259 | -------------------------------------------------------------------------------- /clip/model.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | from typing import Tuple, Union 3 | 4 | import numpy as np 5 | import torch 6 | import torch.nn.functional as F 7 | from torch import nn 8 | 9 | 10 | class Bottleneck(nn.Module): 11 | expansion = 4 12 | 13 | def __init__(self, inplanes, planes, stride=1): 14 | super().__init__() 15 | 16 | # all conv layers have stride 1. 
an avgpool is performed after the second convolution when stride > 1 17 | self.conv1 = nn.Conv2d(inplanes, planes, 1, bias=False) 18 | self.bn1 = nn.BatchNorm2d(planes) 19 | 20 | self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False) 21 | self.bn2 = nn.BatchNorm2d(planes) 22 | 23 | self.avgpool = nn.AvgPool2d(stride) if stride > 1 else nn.Identity() 24 | 25 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, 1, bias=False) 26 | self.bn3 = nn.BatchNorm2d(planes * self.expansion) 27 | 28 | self.relu = nn.ReLU(inplace=True) 29 | self.downsample = None 30 | self.stride = stride 31 | 32 | if stride > 1 or inplanes != planes * Bottleneck.expansion: 33 | # downsampling layer is prepended with an avgpool, and the subsequent convolution has stride 1 34 | self.downsample = nn.Sequential( 35 | OrderedDict([("-1", nn.AvgPool2d(stride)), 36 | ("0", 37 | nn.Conv2d(inplanes, 38 | planes * self.expansion, 39 | 1, 40 | stride=1, 41 | bias=False)), 42 | ("1", nn.BatchNorm2d(planes * self.expansion))])) 43 | 44 | def forward(self, x: torch.Tensor): 45 | identity = x 46 | 47 | out = self.relu(self.bn1(self.conv1(x))) 48 | out = self.relu(self.bn2(self.conv2(out))) 49 | out = self.avgpool(out) 50 | out = self.bn3(self.conv3(out)) 51 | 52 | if self.downsample is not None: 53 | identity = self.downsample(x) 54 | 55 | out += identity 56 | out = self.relu(out) 57 | return out 58 | 59 | 60 | class AttentionPool2d(nn.Module): 61 | 62 | def __init__(self, 63 | spacial_dim: int, 64 | embed_dim: int, 65 | num_heads: int, 66 | output_dim: int = None): 67 | super().__init__() 68 | self.positional_embedding = nn.Parameter( 69 | torch.randn(spacial_dim**2 + 1, embed_dim) / embed_dim**0.5) 70 | self.k_proj = nn.Linear(embed_dim, embed_dim) 71 | self.q_proj = nn.Linear(embed_dim, embed_dim) 72 | self.v_proj = nn.Linear(embed_dim, embed_dim) 73 | self.c_proj = nn.Linear(embed_dim, output_dim or embed_dim) 74 | self.num_heads = num_heads 75 | 76 | def forward(self, x): 77 | x = x.reshape(x.shape[0], x.shape[1], 78 | x.shape[2] * x.shape[3]).permute(2, 0, 79 | 1) # NCHW -> (HW)NC 80 | x = torch.cat([x.mean(dim=0, keepdim=True), x], dim=0) # (HW+1)NC 81 | x = x + self.positional_embedding[:, None, :].to(x.dtype) # (HW+1)NC 82 | x, _ = F.multi_head_attention_forward( 83 | query=x, 84 | key=x, 85 | value=x, 86 | embed_dim_to_check=x.shape[-1], 87 | num_heads=self.num_heads, 88 | q_proj_weight=self.q_proj.weight, 89 | k_proj_weight=self.k_proj.weight, 90 | v_proj_weight=self.v_proj.weight, 91 | in_proj_weight=None, 92 | in_proj_bias=torch.cat( 93 | [self.q_proj.bias, self.k_proj.bias, self.v_proj.bias]), 94 | bias_k=None, 95 | bias_v=None, 96 | add_zero_attn=False, 97 | dropout_p=0, 98 | out_proj_weight=self.c_proj.weight, 99 | out_proj_bias=self.c_proj.bias, 100 | use_separate_proj_weight=True, 101 | training=self.training, 102 | need_weights=False) 103 | 104 | return x[0] 105 | 106 | 107 | class ModifiedResNet(nn.Module): 108 | """ 109 | A ResNet class that is similar to torchvision's but contains the following changes: 110 | - There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool. 
111 | - Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1 112 | - The final pooling layer is a QKV attention instead of an average pool 113 | """ 114 | 115 | def __init__(self, 116 | layers, 117 | output_dim, 118 | heads, 119 | input_resolution=224, 120 | width=64): 121 | super().__init__() 122 | self.output_dim = output_dim 123 | self.input_resolution = input_resolution 124 | 125 | # the 3-layer stem 126 | self.conv1 = nn.Conv2d(3, 127 | width // 2, 128 | kernel_size=3, 129 | stride=2, 130 | padding=1, 131 | bias=False) 132 | self.bn1 = nn.BatchNorm2d(width // 2) 133 | self.conv2 = nn.Conv2d(width // 2, 134 | width // 2, 135 | kernel_size=3, 136 | padding=1, 137 | bias=False) 138 | self.bn2 = nn.BatchNorm2d(width // 2) 139 | self.conv3 = nn.Conv2d(width // 2, 140 | width, 141 | kernel_size=3, 142 | padding=1, 143 | bias=False) 144 | self.bn3 = nn.BatchNorm2d(width) 145 | self.avgpool = nn.AvgPool2d(2) 146 | self.relu = nn.ReLU(inplace=True) 147 | 148 | # residual layers 149 | self._inplanes = width # this is a *mutable* variable used during construction 150 | self.layer1 = self._make_layer(width, layers[0]) 151 | self.layer2 = self._make_layer(width * 2, layers[1], stride=2) 152 | self.layer3 = self._make_layer(width * 4, layers[2], stride=2) 153 | self.layer4 = self._make_layer(width * 8, layers[3], stride=2) 154 | 155 | embed_dim = width * 32 # the ResNet feature dimension 156 | self.attnpool = AttentionPool2d(input_resolution // 32, embed_dim, 157 | heads, output_dim) 158 | 159 | def _make_layer(self, planes, blocks, stride=1): 160 | layers = [Bottleneck(self._inplanes, planes, stride)] 161 | 162 | self._inplanes = planes * Bottleneck.expansion 163 | for _ in range(1, blocks): 164 | layers.append(Bottleneck(self._inplanes, planes)) 165 | 166 | return nn.Sequential(*layers) 167 | 168 | def forward(self, x): 169 | 170 | def stem(x): 171 | for conv, bn in [(self.conv1, self.bn1), (self.conv2, self.bn2), 172 | (self.conv3, self.bn3)]: 173 | x = self.relu(bn(conv(x))) 174 | x = self.avgpool(x) 175 | return x 176 | 177 | x = x.type(self.conv1.weight.dtype) 178 | x = stem(x) 179 | x = self.layer1(x) 180 | x = self.layer2(x) 181 | x = self.layer3(x) 182 | x = self.layer4(x) 183 | x = self.attnpool(x) 184 | 185 | return x 186 | 187 | def get_features(self, x): 188 | def stem(x): 189 | for conv, bn in [(self.conv1, self.bn1), (self.conv2, self.bn2), 190 | (self.conv3, self.bn3)]: 191 | x = self.relu(bn(conv(x))) 192 | x = self.avgpool(x) 193 | return x 194 | 195 | x = x.type(self.conv1.weight.dtype) 196 | x = stem(x) 197 | x = self.layer1(x) 198 | x = self.layer2(x) 199 | x = self.layer3(x) 200 | x = self.layer4(x) 201 | return x 202 | 203 | class LayerNorm(nn.LayerNorm): 204 | """Subclass torch's LayerNorm to handle fp16.""" 205 | 206 | def forward(self, x: torch.Tensor): 207 | orig_type = x.dtype 208 | ret = super().forward(x.type(torch.float32)) 209 | return ret.type(orig_type) 210 | 211 | 212 | class QuickGELU(nn.Module): 213 | 214 | def forward(self, x: torch.Tensor): 215 | return x * torch.sigmoid(1.702 * x) 216 | 217 | 218 | class ResidualAttentionBlock(nn.Module): 219 | 220 | def __init__(self, 221 | d_model: int, 222 | n_head: int, 223 | attn_mask: torch.Tensor = None): 224 | super().__init__() 225 | 226 | self.attn = nn.MultiheadAttention(d_model, n_head) 227 | self.ln_1 = LayerNorm(d_model) 228 | self.mlp = nn.Sequential( 229 | OrderedDict([("c_fc", nn.Linear(d_model, d_model * 4)), 230 | ("gelu", QuickGELU()), 231 | ("c_proj", 
nn.Linear(d_model * 4, d_model))])) 232 | self.ln_2 = LayerNorm(d_model) 233 | self.attn_mask = attn_mask 234 | 235 | def attention(self, x: torch.Tensor): 236 | self.attn_mask = self.attn_mask.to( 237 | dtype=x.dtype, 238 | device=x.device) if self.attn_mask is not None else None 239 | return self.attn(x, x, x, need_weights=False, 240 | attn_mask=self.attn_mask)[0] 241 | 242 | def forward(self, x: torch.Tensor): 243 | x = x + self.attention(self.ln_1(x)) 244 | x = x + self.mlp(self.ln_2(x)) 245 | return x 246 | 247 | 248 | class Transformer(nn.Module): 249 | 250 | def __init__(self, 251 | width: int, 252 | layers: int, 253 | heads: int, 254 | attn_mask: torch.Tensor = None): 255 | super().__init__() 256 | self.width = width 257 | self.layers = layers 258 | self.resblocks = nn.Sequential(*[ 259 | ResidualAttentionBlock(width, heads, attn_mask) 260 | for _ in range(layers) 261 | ]) 262 | 263 | def forward(self, x: torch.Tensor): 264 | return self.resblocks(x) 265 | 266 | 267 | class VisionTransformer(nn.Module): 268 | 269 | def __init__(self, input_resolution: int, patch_size: int, width: int, 270 | layers: int, heads: int, output_dim: int): 271 | super().__init__() 272 | self.input_resolution = input_resolution 273 | self.output_dim = output_dim 274 | self.conv1 = nn.Conv2d(in_channels=3, 275 | out_channels=width, 276 | kernel_size=patch_size, 277 | stride=patch_size, 278 | bias=False) 279 | 280 | scale = width**-0.5 281 | self.class_embedding = nn.Parameter(scale * torch.randn(width)) 282 | self.positional_embedding = nn.Parameter(scale * torch.randn( 283 | (input_resolution // patch_size)**2 + 1, width)) 284 | self.ln_pre = LayerNorm(width) 285 | 286 | self.transformer = Transformer(width, layers, heads) 287 | 288 | self.ln_post = LayerNorm(width) 289 | self.proj = nn.Parameter(scale * torch.randn(width, output_dim)) 290 | 291 | def forward(self, x: torch.Tensor): 292 | x = self.conv1(x) # shape = [*, width, grid, grid] 293 | x = x.reshape(x.shape[0], x.shape[1], 294 | -1) # shape = [*, width, grid ** 2] 295 | x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width] 296 | x = torch.cat([ 297 | self.class_embedding.to(x.dtype) + torch.zeros( 298 | x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x 299 | ], 300 | dim=1) # shape = [*, grid ** 2 + 1, width] # noqa 301 | x = x + self.positional_embedding.to(x.dtype) 302 | x = self.ln_pre(x) 303 | 304 | x = x.permute(1, 0, 2) # NLD -> LND 305 | x = self.transformer(x) 306 | x = x.permute(1, 0, 2) # LND -> NLD 307 | 308 | x = self.ln_post(x[:, 0, :]) 309 | 310 | if self.proj is not None: 311 | x = x @ self.proj 312 | 313 | return x 314 | 315 | 316 | class CLIP(nn.Module): 317 | 318 | def __init__( 319 | self, 320 | embed_dim: int, 321 | # vision 322 | image_resolution: int, 323 | vision_layers: Union[Tuple[int, int, int, int], int], 324 | vision_width: int, 325 | vision_patch_size: int, 326 | # text 327 | context_length: int, 328 | vocab_size: int, 329 | transformer_width: int, 330 | transformer_heads: int, 331 | transformer_layers: int): 332 | super().__init__() 333 | 334 | self.context_length = context_length 335 | 336 | if isinstance(vision_layers, (tuple, list)): 337 | vision_heads = vision_width * 32 // 64 338 | self.visual = ModifiedResNet(layers=vision_layers, 339 | output_dim=embed_dim, 340 | heads=vision_heads, 341 | input_resolution=image_resolution, 342 | width=vision_width) 343 | else: 344 | vision_heads = vision_width // 64 345 | self.visual = VisionTransformer(input_resolution=image_resolution, 346 | 
patch_size=vision_patch_size, 347 | width=vision_width, 348 | layers=vision_layers, 349 | heads=vision_heads, 350 | output_dim=embed_dim) 351 | 352 | self.transformer = Transformer(width=transformer_width, 353 | layers=transformer_layers, 354 | heads=transformer_heads, 355 | attn_mask=self.build_attention_mask()) 356 | 357 | self.vocab_size = vocab_size 358 | self.token_embedding = nn.Embedding(vocab_size, transformer_width) 359 | self.positional_embedding = nn.Parameter( 360 | torch.empty(self.context_length, transformer_width)) 361 | self.ln_final = LayerNorm(transformer_width) 362 | 363 | self.text_projection = nn.Parameter( 364 | torch.empty(transformer_width, embed_dim)) 365 | self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07)) 366 | 367 | self.initialize_parameters() 368 | 369 | def initialize_parameters(self): 370 | nn.init.normal_(self.token_embedding.weight, std=0.02) 371 | nn.init.normal_(self.positional_embedding, std=0.01) 372 | 373 | if isinstance(self.visual, ModifiedResNet): 374 | if self.visual.attnpool is not None: 375 | std = self.visual.attnpool.c_proj.in_features**-0.5 376 | nn.init.normal_(self.visual.attnpool.q_proj.weight, std=std) 377 | nn.init.normal_(self.visual.attnpool.k_proj.weight, std=std) 378 | nn.init.normal_(self.visual.attnpool.v_proj.weight, std=std) 379 | nn.init.normal_(self.visual.attnpool.c_proj.weight, std=std) 380 | 381 | for resnet_block in [ 382 | self.visual.layer1, self.visual.layer2, self.visual.layer3, 383 | self.visual.layer4 384 | ]: 385 | for name, param in resnet_block.named_parameters(): 386 | if name.endswith("bn3.weight"): 387 | nn.init.zeros_(param) 388 | 389 | proj_std = (self.transformer.width**-0.5) * ( 390 | (2 * self.transformer.layers)**-0.5) 391 | attn_std = self.transformer.width**-0.5 392 | fc_std = (2 * self.transformer.width)**-0.5 393 | for block in self.transformer.resblocks: 394 | nn.init.normal_(block.attn.in_proj_weight, std=attn_std) 395 | nn.init.normal_(block.attn.out_proj.weight, std=proj_std) 396 | nn.init.normal_(block.mlp.c_fc.weight, std=fc_std) 397 | nn.init.normal_(block.mlp.c_proj.weight, std=proj_std) 398 | 399 | if self.text_projection is not None: 400 | nn.init.normal_(self.text_projection, 401 | std=self.transformer.width**-0.5) 402 | 403 | def build_attention_mask(self): 404 | # lazily create causal attention mask, with full attention between the vision tokens 405 | # pytorch uses additive attention mask; fill with -inf 406 | mask = torch.empty(self.context_length, self.context_length) 407 | mask.fill_(float("-inf")) 408 | mask.triu_(1) # zero out the lower diagonal 409 | return mask 410 | 411 | @property 412 | def dtype(self): 413 | return self.visual.conv1.weight.dtype 414 | 415 | def encode_image(self, image): 416 | return self.visual(image.type(self.dtype)) 417 | 418 | def encode_text(self, text): 419 | x = self.token_embedding(text).type( 420 | self.dtype) # [batch_size, n_ctx, d_model] 421 | 422 | x = x + self.positional_embedding.type(self.dtype) 423 | x = x.permute(1, 0, 2) # NLD -> LND 424 | x = self.transformer(x) 425 | x = x.permute(1, 0, 2) # LND -> NLD 426 | x = self.ln_final(x).type(self.dtype) 427 | 428 | # x.shape = [batch_size, n_ctx, transformer.width] 429 | # take features from the eot embedding (eot_token is the highest number in each sequence) 430 | x = x[torch.arange(x.shape[0]), 431 | text.argmax(dim=-1)] @ self.text_projection 432 | 433 | return x 434 | 435 | def forward(self, image, text): 436 | image_features = self.encode_image(image) 437 | text_features = 
self.encode_text(text) 438 | 439 | # normalized features 440 | image_features = image_features / image_features.norm(dim=-1, 441 | keepdim=True) 442 | text_features = text_features / text_features.norm(dim=-1, 443 | keepdim=True) 444 | 445 | # cosine similarity as logits 446 | logit_scale = self.logit_scale.exp() 447 | logits_per_image = logit_scale * image_features @ text_features.t() 448 | logits_per_text = logit_scale * text_features @ image_features.t() 449 | 450 | # shape = [global_batch_size, global_batch_size] 451 | return logits_per_image, logits_per_text 452 | 453 | 454 | def convert_weights(model: nn.Module): 455 | """Convert applicable model parameters to fp16""" 456 | 457 | def _convert_weights_to_fp16(l): 458 | if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)): 459 | l.weight.data = l.weight.data.half() 460 | if l.bias is not None: 461 | l.bias.data = l.bias.data.half() 462 | 463 | if isinstance(l, nn.MultiheadAttention): 464 | for attr in [ 465 | *[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]], 466 | "in_proj_bias", "bias_k", "bias_v" 467 | ]: 468 | tensor = getattr(l, attr) 469 | if tensor is not None: 470 | tensor.data = tensor.data.half() 471 | 472 | for name in ["text_projection", "proj"]: 473 | if hasattr(l, name): 474 | attr = getattr(l, name) 475 | if attr is not None: 476 | attr.data = attr.data.half() 477 | 478 | model.apply(_convert_weights_to_fp16) 479 | 480 | 481 | def build_model(state_dict: dict): 482 | vit = "visual.proj" in state_dict 483 | 484 | if vit: 485 | vision_width = state_dict["visual.conv1.weight"].shape[0] 486 | vision_layers = len([ 487 | k for k in state_dict.keys() 488 | if k.startswith("visual.") and k.endswith(".attn.in_proj_weight") 489 | ]) 490 | vision_patch_size = state_dict["visual.conv1.weight"].shape[-1] 491 | grid_size = round( 492 | (state_dict["visual.positional_embedding"].shape[0] - 1)**0.5) 493 | image_resolution = vision_patch_size * grid_size 494 | else: 495 | counts: list = [ 496 | len( 497 | set( 498 | k.split(".")[2] for k in state_dict 499 | if k.startswith(f"visual.layer{b}"))) 500 | for b in [1, 2, 3, 4] 501 | ] 502 | vision_layers = tuple(counts) 503 | vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0] 504 | output_width = round( 505 | (state_dict["visual.attnpool.positional_embedding"].shape[0] - 506 | 1)**0.5) 507 | vision_patch_size = None 508 | assert output_width**2 + 1 == state_dict[ 509 | "visual.attnpool.positional_embedding"].shape[0] 510 | image_resolution = output_width * 32 511 | 512 | embed_dim = state_dict["text_projection"].shape[1] 513 | context_length = state_dict["positional_embedding"].shape[0] 514 | vocab_size = state_dict["token_embedding.weight"].shape[0] 515 | transformer_width = state_dict["ln_final.weight"].shape[0] 516 | transformer_heads = transformer_width // 64 517 | transformer_layers = len( 518 | set( 519 | k.split(".")[2] for k in state_dict 520 | if k.startswith(f"transformer.resblocks"))) 521 | 522 | model = CLIP(embed_dim, image_resolution, vision_layers, vision_width, 523 | vision_patch_size, context_length, vocab_size, 524 | transformer_width, transformer_heads, transformer_layers) 525 | 526 | for key in ["input_resolution", "context_length", "vocab_size"]: 527 | if key in state_dict: 528 | del state_dict[key] 529 | 530 | convert_weights(model) 531 | model.load_state_dict(state_dict) 532 | return model.eval() 533 | -------------------------------------------------------------------------------- /clip/simple_tokenizer.py: 
-------------------------------------------------------------------------------- 1 | import gzip 2 | import html 3 | import os 4 | from functools import lru_cache 5 | 6 | import ftfy 7 | import regex as re 8 | 9 | 10 | @lru_cache() 11 | def default_bpe(): 12 | return os.path.join(os.path.dirname(os.path.abspath(__file__)), 13 | "bpe_simple_vocab_16e6.txt.gz") 14 | 15 | 16 | @lru_cache() 17 | def bytes_to_unicode(): 18 | """ 19 | Returns list of utf-8 byte and a corresponding list of unicode strings. 20 | The reversible bpe codes work on unicode strings. 21 | This means you need a large # of unicode characters in your vocab if you want to avoid UNKs. 22 | When you're at something like a 10B token dataset you end up needing around 5K for decent coverage. 23 | This is a significant percentage of your normal, say, 32K bpe vocab. 24 | To avoid that, we want lookup tables between utf-8 bytes and unicode strings. 25 | And avoids mapping to whitespace/control characters the bpe code barfs on. 26 | """ 27 | bs = list(range(ord("!"), 28 | ord("~") + 1)) + list(range( 29 | ord("¡"), 30 | ord("¬") + 1)) + list(range(ord("®"), 31 | ord("ÿ") + 1)) 32 | cs = bs[:] 33 | n = 0 34 | for b in range(2**8): 35 | if b not in bs: 36 | bs.append(b) 37 | cs.append(2**8 + n) 38 | n += 1 39 | cs = [chr(n) for n in cs] 40 | return dict(zip(bs, cs)) 41 | 42 | 43 | def get_pairs(word): 44 | """Return set of symbol pairs in a word. 45 | Word is represented as tuple of symbols (symbols being variable-length strings). 46 | """ 47 | pairs = set() 48 | prev_char = word[0] 49 | for char in word[1:]: 50 | pairs.add((prev_char, char)) 51 | prev_char = char 52 | return pairs 53 | 54 | 55 | def basic_clean(text): 56 | text = ftfy.fix_text(text) 57 | text = html.unescape(html.unescape(text)) 58 | return text.strip() 59 | 60 | 61 | def whitespace_clean(text): 62 | text = re.sub(r'\s+', ' ', text) 63 | text = text.strip() 64 | return text 65 | 66 | 67 | class SimpleTokenizer(object): 68 | 69 | def __init__(self, bpe_path: str = default_bpe()): 70 | self.byte_encoder = bytes_to_unicode() 71 | self.byte_decoder = {v: k for k, v in self.byte_encoder.items()} 72 | merges = gzip.open(bpe_path).read().decode("utf-8").split('\n') 73 | merges = merges[1:49152 - 256 - 2 + 1] 74 | merges = [tuple(merge.split()) for merge in merges] 75 | vocab = list(bytes_to_unicode().values()) 76 | vocab = vocab + [v + '</w>' for v in vocab] 77 | for merge in merges: 78 | vocab.append(''.join(merge)) 79 | vocab.extend(['<|startoftext|>', '<|endoftext|>']) 80 | self.encoder = dict(zip(vocab, range(len(vocab)))) 81 | self.decoder = {v: k for k, v in self.encoder.items()} 82 | self.bpe_ranks = dict(zip(merges, range(len(merges)))) 83 | self.cache = { 84 | '<|startoftext|>': '<|startoftext|>', 85 | '<|endoftext|>': '<|endoftext|>' 86 | } 87 | self.pat = re.compile( 88 | r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""", 89 | re.IGNORECASE) 90 | 91 | def bpe(self, token): 92 | if token in self.cache: 93 | return self.cache[token] 94 | word = tuple(token[:-1]) + (token[-1] + '</w>', ) 95 | pairs = get_pairs(word) 96 | 97 | if not pairs: 98 | return token + '</w>' 99 | 100 | while True: 101 | bigram = min( 102 | pairs, key=lambda pair: self.bpe_ranks.get(pair, float('inf'))) 103 | if bigram not in self.bpe_ranks: 104 | break 105 | first, second = bigram 106 | new_word = [] 107 | i = 0 108 | while i < len(word): 109 | try: 110 | j = word.index(first, i) 111 | new_word.extend(word[i:j]) 112 | i = j 113 | except: # noqa 114 |
new_word.extend(word[i:]) 115 | break 116 | 117 | if word[i] == first and i < len(word) - 1 and word[ 118 | i + 1] == second: 119 | new_word.append(first + second) 120 | i += 2 121 | else: 122 | new_word.append(word[i]) 123 | i += 1 124 | new_word = tuple(new_word) 125 | word = new_word 126 | if len(word) == 1: 127 | break 128 | else: 129 | pairs = get_pairs(word) 130 | word = ' '.join(word) 131 | self.cache[token] = word 132 | return word 133 | 134 | def encode(self, text): 135 | bpe_tokens = [] 136 | text = whitespace_clean(basic_clean(text)).lower() 137 | for token in re.findall(self.pat, text): 138 | token = ''.join(self.byte_encoder[b] 139 | for b in token.encode('utf-8')) 140 | bpe_tokens.extend(self.encoder[bpe_token] 141 | for bpe_token in self.bpe(token).split(' ')) 142 | return bpe_tokens 143 | 144 | def decode(self, tokens): 145 | text = ''.join([self.decoder[token] for token in tokens]) 146 | text = bytearray([self.byte_decoder[c] for c in text 147 | ]).decode('utf-8', 148 | errors="replace").replace('</w>', ' ') 149 | return text 150 | -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from omegaconf import OmegaConf 4 | 5 | from args import args 6 | 7 | # Base config 8 | base_cfg = OmegaConf.load('configs/base.yaml') 9 | 10 | # Main Config 11 | main_cfg = OmegaConf.load(args.config_file) 12 | 13 | cfg = OmegaConf.merge(base_cfg, main_cfg) 14 | 15 | import torch # isort:skip # noqa 16 | 17 | cfg.checkpoint = f"checkpoints/{cfg.checkpoint}" 18 | dirs = os.listdir(cfg.checkpoint) if os.path.exists(cfg.checkpoint) else [] 19 | dirs.sort() 20 | for i, dir in enumerate(dirs): 21 | assert (dir.startswith('round')) 22 | next_index = len(dirs) + 1 23 | 24 | if args.resume: 25 | cfg.resume = f"{cfg.checkpoint}/round{args.round}" 26 | else: 27 | cfg.resume = False 28 | 29 | if not args.test: 30 | cfg.checkpoint = f"{cfg.checkpoint}/round{next_index}" 31 | assert (not os.path.exists(cfg.checkpoint)) 32 | cfg.test = False 33 | else: 34 | cfg.checkpoint = f"{cfg.checkpoint}/round{args.round}" 35 | cfg.log_file = "test.txt" 36 | cfg.test = True 37 | 38 | if not os.path.exists(cfg.checkpoint): 39 | os.makedirs(cfg.checkpoint) 40 | -------------------------------------------------------------------------------- /configs/base.yaml: -------------------------------------------------------------------------------- 1 | lr: 1e-4 2 | workers: 16 3 | image_size: 224 4 | batch_size: 128 5 | loss: SPLC 6 | log_file: log.txt 7 | weight_decay: 1e-4 8 | test: false 9 | thre: 0.5 10 | -------------------------------------------------------------------------------- /configs/scpnet+coco.yaml: -------------------------------------------------------------------------------- 1 | dataset: ./dataset/coco_train_singlelabel.txt 2 | data: /home/wangao/datasets/coco 3 | num_classes: 80 4 | n_ctx: 4 5 | checkpoint: scpnet+coco 6 | ctx_init: a photo of a 7 | log_file: scpnet+coco.txt 8 | epochs: 60 9 | total_epochs: 80 10 | lr: 3e-5 11 | lambda_u: 0.125 12 | p_cutoff: 0.95 13 | hard_k: 2 14 | kl_lambda: 1 15 | T: 0.2 16 | sparse_topk: 62 17 | reweight_p: 0.2 18 | gcn_lr: 1e-5 19 | relation_file: relation+coco.npy 20 | ratio: 1 21 | scale: 10 22 | -------------------------------------------------------------------------------- /configs/scpnet+cub.yaml: -------------------------------------------------------------------------------- 1 | data: /home/wangao/datasets/cub/ 2 |
train_dataset: ./dataset/cub_train.txt 3 | val_dataset: ./dataset/cub_val.txt 4 | num_classes: 312 5 | n_ctx: 4 6 | checkpoint: scpnet+cub 7 | ctx_init: a photo of a 8 | log_file: scpnet+cub.txt 9 | total_epochs: 100 10 | epochs: 100 11 | lr: 3e-4 12 | lambda_u: 0.125 13 | p_cutoff: 0.95 14 | hard_k: 31 15 | kl_lambda: 3 16 | T: 0.3 17 | sparse_topk: 312 18 | reweight_p: 0.2 19 | gcn_lr: 2e-4 20 | relation_file: relation+cub.npy 21 | ratio: 1 22 | scale: 10 23 | batch_size: 96 -------------------------------------------------------------------------------- /configs/scpnet+nuswide.yaml: -------------------------------------------------------------------------------- 1 | data: /home/wangao/datasets/nuswide/ 2 | train_dataset: ./dataset/nus_train_singlelabel.txt 3 | val_dataset: ./dataset/nus_val.txt 4 | num_classes: 81 5 | n_ctx: 4 6 | checkpoint: scpnet+nuswide 7 | ctx_init: a photo of a 8 | log_file: scpnet+nuswide.txt 9 | epochs: 60 10 | lr: 3e-5 11 | lambda_u: 0.125 12 | p_cutoff: 0.95 13 | hard_k: 6 14 | kl_lambda: 1 15 | T: 0.2 16 | sparse_topk: 50 17 | reweight_p: 0.2 18 | gcn_lr: 1e-4 19 | relation_file: relation+nuswide.npy 20 | ratio: 1 21 | scale: clip -------------------------------------------------------------------------------- /configs/scpnet+voc.yaml: -------------------------------------------------------------------------------- 1 | data: /home/wangao/datasets/voc2012/ 2 | train_dataset: ./dataset/voc_train.txt 3 | val_dataset: ./dataset/voc_val.txt 4 | num_classes: 20 5 | n_ctx: 4 6 | checkpoint: scpnet+voc 7 | ctx_init: a photo of a 8 | log_file: scpnet+voc.txt 9 | epochs: 120 10 | total_epochs: 120 11 | lr: 4e-5 12 | lambda_u: 0.125 13 | p_cutoff: 0.95 14 | hard_k: 5 15 | kl_lambda: 2 16 | T: 0.3 17 | sparse_topk: 20 18 | reweight_p: 0.2 19 | gcn_lr: 2e-4 20 | relation_file: relation+voc.npy 21 | ratio: 1 22 | scale: 10 -------------------------------------------------------------------------------- /cub_labels.txt: -------------------------------------------------------------------------------- 1 | curved bill 2 | dagger bill 3 | hooked bill 4 | needle bill 5 | hooked_seabird bill 6 | spatulate bill 7 | all-purpose bill 8 | cone bill 9 | specialized bill 10 | blue wing 11 | brown wing 12 | iridescent wing 13 | purple wing 14 | rufous wing 15 | grey wing 16 | yellow wing 17 | olive wing 18 | green wing 19 | pink wing 20 | orange wing 21 | black wing 22 | white wing 23 | red wing 24 | buff wing 25 | blue upperparts 26 | brown upperparts 27 | iridescent upperparts 28 | purple upperparts 29 | rufous upperparts 30 | grey upperparts 31 | yellow upperparts 32 | olive upperparts 33 | green upperparts 34 | pink upperparts 35 | orange upperparts 36 | black upperparts 37 | white upperparts 38 | red upperparts 39 | buff upperparts 40 | blue underparts 41 | brown underparts 42 | iridescent underparts 43 | purple underparts 44 | rufous underparts 45 | grey underparts 46 | yellow underparts 47 | olive underparts 48 | green underparts 49 | pink underparts 50 | orange underparts 51 | black underparts 52 | white underparts 53 | red underparts 54 | buff underparts 55 | solid breast 56 | spotted breast 57 | striped breast 58 | multi-colored breast 59 | blue back 60 | brown back 61 | iridescent back 62 | purple back 63 | rufous back 64 | grey back 65 | yellow back 66 | olive back 67 | green back 68 | pink back 69 | orange back 70 | black back 71 | white back 72 | red back 73 | buff back 74 | forked_tail tail 75 | rounded_tail tail 76 | notched_tail tail 77 | fan-shaped_tail tail 78 | 
pointed_tail tail 79 | squared_tail tail 80 | blue upper 81 | brown upper 82 | iridescent upper 83 | purple upper 84 | rufous upper 85 | grey upper 86 | yellow upper 87 | olive upper 88 | green upper 89 | pink upper 90 | orange upper 91 | black upper 92 | white upper 93 | red upper 94 | buff upper 95 | spotted head 96 | malar head 97 | crested head 98 | masked head 99 | unique_pattern head 100 | eyebrow head 101 | eyering head 102 | plain head 103 | eyeline head 104 | striped head 105 | capped head 106 | blue breast 107 | brown breast 108 | iridescent breast 109 | purple breast 110 | rufous breast 111 | grey breast 112 | yellow breast 113 | olive breast 114 | green breast 115 | pink breast 116 | orange breast 117 | black breast 118 | white breast 119 | red breast 120 | buff breast 121 | blue throat 122 | brown throat 123 | iridescent throat 124 | purple throat 125 | rufous throat 126 | grey throat 127 | yellow throat 128 | olive throat 129 | green throat 130 | pink throat 131 | orange throat 132 | black throat 133 | white throat 134 | red throat 135 | buff throat 136 | blue eye 137 | brown eye 138 | purple eye 139 | rufous eye 140 | grey eye 141 | yellow eye 142 | olive eye 143 | green eye 144 | pink eye 145 | orange eye 146 | black eye 147 | white eye 148 | red eye 149 | buff eye 150 | about_the_same_as_head bill 151 | longer_than_head bill 152 | shorter_than_head bill 153 | blue forehead 154 | brown forehead 155 | iridescent forehead 156 | purple forehead 157 | rufous forehead 158 | grey forehead 159 | yellow forehead 160 | olive forehead 161 | green forehead 162 | pink forehead 163 | orange forehead 164 | black forehead 165 | white forehead 166 | red forehead 167 | buff forehead 168 | blue under 169 | brown under 170 | iridescent under 171 | purple under 172 | rufous under 173 | grey under 174 | yellow under 175 | olive under 176 | green under 177 | pink under 178 | orange under 179 | black under 180 | white under 181 | red under 182 | buff under 183 | blue nape 184 | brown nape 185 | iridescent nape 186 | purple nape 187 | rufous nape 188 | grey nape 189 | yellow nape 190 | olive nape 191 | green nape 192 | pink nape 193 | orange nape 194 | black nape 195 | white nape 196 | red nape 197 | buff nape 198 | blue belly 199 | brown belly 200 | iridescent belly 201 | purple belly 202 | rufous belly 203 | grey belly 204 | yellow belly 205 | olive belly 206 | green belly 207 | pink belly 208 | orange belly 209 | black belly 210 | white belly 211 | red belly 212 | buff belly 213 | rounded-wings wing 214 | pointed-wings wing 215 | broad-wings wing 216 | tapered-wings wing 217 | long-wings wing 218 | large size::large 219 | small size::small 220 | very_large size::very 221 | medium size::medium 222 | very_small size::very 223 | upright-perching_water-like shape::upright-perching 224 | chicken-like-marsh shape::chicken-like-marsh 225 | long-legged-like shape::long-legged-like 226 | duck-like shape::duck-like 227 | owl-like shape::owl-like 228 | gull-like shape::gull-like 229 | hummingbird-like shape::hummingbird-like 230 | pigeon-like shape::pigeon-like 231 | tree-clinging-like shape::tree-clinging-like 232 | hawk-like shape::hawk-like 233 | sandpiper-like shape::sandpiper-like 234 | upland-ground-like shape::upland-ground-like 235 | swallow-like shape::swallow-like 236 | perching-like shape::perching-like 237 | solid back 238 | spotted back 239 | striped back 240 | multi-colored back 241 | solid tail 242 | spotted tail 243 | striped tail 244 | multi-colored tail 245 | solid belly 246 | spotted 
belly 247 | striped belly 248 | multi-colored belly 249 | blue primary 250 | brown primary 251 | iridescent primary 252 | purple primary 253 | rufous primary 254 | grey primary 255 | yellow primary 256 | olive primary 257 | green primary 258 | pink primary 259 | orange primary 260 | black primary 261 | white primary 262 | red primary 263 | buff primary 264 | blue leg 265 | brown leg 266 | iridescent leg 267 | purple leg 268 | rufous leg 269 | grey leg 270 | yellow leg 271 | olive leg 272 | green leg 273 | pink leg 274 | orange leg 275 | black leg 276 | white leg 277 | red leg 278 | buff leg 279 | blue bill 280 | brown bill 281 | iridescent bill 282 | purple bill 283 | rufous bill 284 | grey bill 285 | yellow bill 286 | olive bill 287 | green bill 288 | pink bill 289 | orange bill 290 | black bill 291 | white bill 292 | red bill 293 | buff bill 294 | blue crown 295 | brown crown 296 | iridescent crown 297 | purple crown 298 | rufous crown 299 | grey crown 300 | yellow crown 301 | olive crown 302 | green crown 303 | pink crown 304 | orange crown 305 | black crown 306 | white crown 307 | red crown 308 | buff crown 309 | solid wing 310 | spotted wing 311 | striped wing 312 | multi-colored wing -------------------------------------------------------------------------------- /figures/overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/figures/overview.png -------------------------------------------------------------------------------- /log.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import sys 3 | 4 | from config import cfg 5 | 6 | cfg.log_file = f"{cfg.checkpoint}/{cfg.log_file}" 7 | 8 | logging.basicConfig(level=logging.INFO, 9 | format="%(asctime)s [%(levelname)s] %(message)s", 10 | handlers=[ 11 | logging.FileHandler(cfg.log_file), 12 | logging.StreamHandler(sys.stdout) 13 | ]) 14 | 15 | logger = logging.getLogger() 16 | 17 | if cfg.resume: 18 | logger.info(f"Resume training... from {cfg.resume}") 19 | 20 | -------------------------------------------------------------------------------- /logs/scpnet+coco.txt: -------------------------------------------------------------------------------- 1 | 2022-07-17 15:51:40,685 [INFO] Epoch [0/60], Step [000/642], LR 1.2e-06, Loss: 13286.5 2 | 2022-07-17 15:53:22,741 [INFO] Epoch [0/60], Step [100/642], LR 1.2e-06, Loss: 1132.3 3 | 2022-07-17 15:55:04,475 [INFO] Epoch [0/60], Step [200/642], LR 1.2e-06, Loss: 1030.5 4 | 2022-07-17 15:56:47,055 [INFO] Epoch [0/60], Step [300/642], LR 1.3e-06, Loss: 1008.7 5 | 2022-07-17 15:58:29,322 [INFO] Epoch [0/60], Step [400/642], LR 1.3e-06, Loss: 630.1 6 | 2022-07-17 16:00:12,686 [INFO] Epoch [0/60], Step [500/642], LR 1.4e-06, Loss: 847.4 7 | 2022-07-17 16:01:56,180 [INFO] Epoch [0/60], Step [600/642], LR 1.4e-06, Loss: 962.7 8 | 2022-07-17 16:02:36,957 [INFO] Start validation... 
9 | 2022-07-17 16:04:13,632 [INFO] mAP score regular 28.42, mAP score EMA 44.89 10 | 2022-07-17 16:04:16,156 [INFO] current_mAP = 44.89, highest_mAP = 44.89, best_epoch=0 11 | 12 | 2022-07-17 16:04:16,156 [INFO] Save text embeddings done 13 | 2022-07-17 16:04:25,360 [INFO] Epoch [1/60], Step [000/642], LR 1.5e-06, Loss: 3451.4 14 | 2022-07-17 16:06:10,475 [INFO] Epoch [1/60], Step [100/642], LR 1.6e-06, Loss: 946.1 15 | 2022-07-17 16:07:54,265 [INFO] Epoch [1/60], Step [200/642], LR 1.7e-06, Loss: 964.6 16 | 2022-07-17 16:09:37,637 [INFO] Epoch [1/60], Step [300/642], LR 1.8e-06, Loss: 976.3 17 | 2022-07-17 16:11:21,822 [INFO] Epoch [1/60], Step [400/642], LR 1.9e-06, Loss: 973.7 18 | 2022-07-17 16:13:04,875 [INFO] Epoch [1/60], Step [500/642], LR 2.1e-06, Loss: 960.4 19 | 2022-07-17 16:14:48,458 [INFO] Epoch [1/60], Step [600/642], LR 2.2e-06, Loss: 915.2 20 | 2022-07-17 16:15:29,918 [INFO] Start validation... 21 | 2022-07-17 16:17:05,892 [INFO] mAP score regular 38.98, mAP score EMA 41.81 22 | 2022-07-17 16:17:05,922 [INFO] current_mAP = 41.81, highest_mAP = 44.89, best_epoch=0 23 | 24 | 2022-07-17 16:17:05,922 [INFO] Save text embeddings done 25 | 2022-07-17 16:17:17,145 [INFO] Epoch [2/60], Step [000/642], LR 2.3e-06, Loss: 978.1 26 | 2022-07-17 16:19:00,247 [INFO] Epoch [2/60], Step [100/642], LR 2.5e-06, Loss: 954.2 27 | 2022-07-17 16:20:43,878 [INFO] Epoch [2/60], Step [200/642], LR 2.7e-06, Loss: 954.1 28 | 2022-07-17 16:22:28,571 [INFO] Epoch [2/60], Step [300/642], LR 2.9e-06, Loss: 928.4 29 | 2022-07-17 16:24:12,222 [INFO] Epoch [2/60], Step [400/642], LR 3.1e-06, Loss: 921.7 30 | 2022-07-17 16:25:55,386 [INFO] Epoch [2/60], Step [500/642], LR 3.3e-06, Loss: 907.2 31 | 2022-07-17 16:27:38,450 [INFO] Epoch [2/60], Step [600/642], LR 3.5e-06, Loss: 920.4 32 | 2022-07-17 16:28:19,890 [INFO] Start validation... 33 | 2022-07-17 16:29:56,955 [INFO] mAP score regular 54.01, mAP score EMA 41.98 34 | 2022-07-17 16:30:05,007 [INFO] current_mAP = 54.01, highest_mAP = 54.01, best_epoch=2 35 | 36 | 2022-07-17 16:30:05,007 [INFO] Save text embeddings done 37 | 2022-07-17 16:30:16,473 [INFO] Epoch [3/60], Step [000/642], LR 3.6e-06, Loss: 887.5 38 | 2022-07-17 16:31:59,093 [INFO] Epoch [3/60], Step [100/642], LR 3.9e-06, Loss: 876.4 39 | 2022-07-17 16:33:42,022 [INFO] Epoch [3/60], Step [200/642], LR 4.1e-06, Loss: 889.5 40 | 2022-07-17 16:35:25,202 [INFO] Epoch [3/60], Step [300/642], LR 4.4e-06, Loss: 878.5 41 | 2022-07-17 16:37:09,456 [INFO] Epoch [3/60], Step [400/642], LR 4.7e-06, Loss: 832.1 42 | 2022-07-17 16:38:53,582 [INFO] Epoch [3/60], Step [500/642], LR 5.0e-06, Loss: 842.4 43 | 2022-07-17 16:40:37,047 [INFO] Epoch [3/60], Step [600/642], LR 5.3e-06, Loss: 858.2 44 | 2022-07-17 16:41:18,475 [INFO] Start validation... 
45 | 2022-07-17 16:42:51,412 [INFO] mAP score regular 61.35, mAP score EMA 45.21 46 | 2022-07-17 16:42:59,220 [INFO] current_mAP = 61.35, highest_mAP = 61.35, best_epoch=3 47 | 48 | 2022-07-17 16:42:59,221 [INFO] Save text embeddings done 49 | 2022-07-17 16:43:08,551 [INFO] Epoch [4/60], Step [000/642], LR 5.4e-06, Loss: 864.4 50 | 2022-07-17 16:44:51,657 [INFO] Epoch [4/60], Step [100/642], LR 5.7e-06, Loss: 818.5 51 | 2022-07-17 16:46:35,568 [INFO] Epoch [4/60], Step [200/642], LR 6.1e-06, Loss: 841.1 52 | 2022-07-17 16:48:19,479 [INFO] Epoch [4/60], Step [300/642], LR 6.4e-06, Loss: 802.2 53 | 2022-07-17 16:50:03,433 [INFO] Epoch [4/60], Step [400/642], LR 6.7e-06, Loss: 807.5 54 | 2022-07-17 16:51:46,536 [INFO] Epoch [4/60], Step [500/642], LR 7.1e-06, Loss: 796.9 55 | 2022-07-17 16:53:29,551 [INFO] Epoch [4/60], Step [600/642], LR 7.5e-06, Loss: 792.0 56 | 2022-07-17 16:54:10,788 [INFO] Start validation... 57 | 2022-07-17 16:55:43,037 [INFO] mAP score regular 65.80, mAP score EMA 50.40 58 | 2022-07-17 16:55:51,001 [INFO] current_mAP = 65.80, highest_mAP = 65.80, best_epoch=4 59 | 60 | 2022-07-17 16:55:51,002 [INFO] Save text embeddings done 61 | 2022-07-17 16:56:01,866 [INFO] Epoch [5/60], Step [000/642], LR 7.6e-06, Loss: 803.1 62 | 2022-07-17 16:57:44,888 [INFO] Epoch [5/60], Step [100/642], LR 8.0e-06, Loss: 756.4 63 | 2022-07-17 16:59:28,773 [INFO] Epoch [5/60], Step [200/642], LR 8.4e-06, Loss: 749.1 64 | 2022-07-17 17:01:12,349 [INFO] Epoch [5/60], Step [300/642], LR 8.7e-06, Loss: 735.2 65 | 2022-07-17 17:02:55,653 [INFO] Epoch [5/60], Step [400/642], LR 9.1e-06, Loss: 728.2 66 | 2022-07-17 17:04:39,685 [INFO] Epoch [5/60], Step [500/642], LR 9.5e-06, Loss: 711.2 67 | 2022-07-17 17:06:24,157 [INFO] Epoch [5/60], Step [600/642], LR 9.9e-06, Loss: 692.8 68 | 2022-07-17 17:07:05,441 [INFO] Start validation... 69 | 2022-07-17 17:08:38,475 [INFO] mAP score regular 69.56, mAP score EMA 56.27 70 | 2022-07-17 17:08:46,013 [INFO] current_mAP = 69.56, highest_mAP = 69.56, best_epoch=5 71 | 72 | 2022-07-17 17:08:46,013 [INFO] Save text embeddings done 73 | 2022-07-17 17:08:53,237 [INFO] Epoch [6/60], Step [000/642], LR 1.0e-05, Loss: 722.1 74 | 2022-07-17 17:10:36,298 [INFO] Epoch [6/60], Step [100/642], LR 1.1e-05, Loss: 654.5 75 | 2022-07-17 17:12:20,620 [INFO] Epoch [6/60], Step [200/642], LR 1.1e-05, Loss: 662.9 76 | 2022-07-17 17:14:03,871 [INFO] Epoch [6/60], Step [300/642], LR 1.1e-05, Loss: 652.4 77 | 2022-07-17 17:15:46,665 [INFO] Epoch [6/60], Step [400/642], LR 1.2e-05, Loss: 654.8 78 | 2022-07-17 17:17:30,040 [INFO] Epoch [6/60], Step [500/642], LR 1.2e-05, Loss: 666.1 79 | 2022-07-17 17:19:12,738 [INFO] Epoch [6/60], Step [600/642], LR 1.3e-05, Loss: 672.6 80 | 2022-07-17 17:19:53,822 [INFO] Start validation... 
81 | 2022-07-17 17:21:23,980 [INFO] mAP score regular 71.25, mAP score EMA 61.50 82 | 2022-07-17 17:21:32,028 [INFO] current_mAP = 71.25, highest_mAP = 71.25, best_epoch=6 83 | 84 | 2022-07-17 17:21:32,029 [INFO] Save text embeddings done 85 | 2022-07-17 17:21:42,003 [INFO] Epoch [7/60], Step [000/642], LR 1.3e-05, Loss: 664.5 86 | 2022-07-17 17:23:24,433 [INFO] Epoch [7/60], Step [100/642], LR 1.3e-05, Loss: 670.3 87 | 2022-07-17 17:25:08,126 [INFO] Epoch [7/60], Step [200/642], LR 1.4e-05, Loss: 659.0 88 | 2022-07-17 17:26:51,416 [INFO] Epoch [7/60], Step [300/642], LR 1.4e-05, Loss: 662.2 89 | 2022-07-17 17:28:35,512 [INFO] Epoch [7/60], Step [400/642], LR 1.5e-05, Loss: 657.5 90 | 2022-07-17 17:30:19,108 [INFO] Epoch [7/60], Step [500/642], LR 1.5e-05, Loss: 637.7 91 | 2022-07-17 17:32:01,520 [INFO] Epoch [7/60], Step [600/642], LR 1.5e-05, Loss: 666.3 92 | 2022-07-17 17:32:42,955 [INFO] Start validation... 93 | 2022-07-17 17:34:15,117 [INFO] mAP score regular 72.35, mAP score EMA 65.64 94 | 2022-07-17 17:34:22,499 [INFO] current_mAP = 72.35, highest_mAP = 72.35, best_epoch=7 95 | 96 | 2022-07-17 17:34:22,500 [INFO] Save text embeddings done 97 | 2022-07-17 17:34:29,495 [INFO] Epoch [8/60], Step [000/642], LR 1.6e-05, Loss: 665.7 98 | 2022-07-17 17:36:13,136 [INFO] Epoch [8/60], Step [100/642], LR 1.6e-05, Loss: 662.4 99 | 2022-07-17 17:37:55,668 [INFO] Epoch [8/60], Step [200/642], LR 1.6e-05, Loss: 655.5 100 | 2022-07-17 17:39:38,771 [INFO] Epoch [8/60], Step [300/642], LR 1.7e-05, Loss: 663.2 101 | 2022-07-17 17:41:22,314 [INFO] Epoch [8/60], Step [400/642], LR 1.7e-05, Loss: 642.0 102 | 2022-07-17 17:43:05,360 [INFO] Epoch [8/60], Step [500/642], LR 1.8e-05, Loss: 645.0 103 | 2022-07-17 17:44:48,373 [INFO] Epoch [8/60], Step [600/642], LR 1.8e-05, Loss: 684.7 104 | 2022-07-17 17:45:29,895 [INFO] Start validation... 105 | 2022-07-17 17:47:01,768 [INFO] mAP score regular 71.84, mAP score EMA 68.70 106 | 2022-07-17 17:47:01,790 [INFO] current_mAP = 71.84, highest_mAP = 72.35, best_epoch=7 107 | 108 | 2022-07-17 17:47:01,790 [INFO] Save text embeddings done 109 | 2022-07-17 17:47:08,673 [INFO] Epoch [9/60], Step [000/642], LR 1.8e-05, Loss: 627.2 110 | 2022-07-17 17:48:50,724 [INFO] Epoch [9/60], Step [100/642], LR 1.9e-05, Loss: 645.3 111 | 2022-07-17 17:50:34,738 [INFO] Epoch [9/60], Step [200/642], LR 1.9e-05, Loss: 638.1 112 | 2022-07-17 17:52:17,310 [INFO] Epoch [9/60], Step [300/642], LR 2.0e-05, Loss: 662.6 113 | 2022-07-17 17:53:59,213 [INFO] Epoch [9/60], Step [400/642], LR 2.0e-05, Loss: 611.6 114 | 2022-07-17 17:55:41,656 [INFO] Epoch [9/60], Step [500/642], LR 2.1e-05, Loss: 648.7 115 | 2022-07-17 17:57:24,063 [INFO] Epoch [9/60], Step [600/642], LR 2.1e-05, Loss: 662.5 116 | 2022-07-17 17:58:05,332 [INFO] Start validation... 
117 | 2022-07-17 17:59:33,008 [INFO] mAP score regular 72.28, mAP score EMA 70.95 118 | 2022-07-17 17:59:33,027 [INFO] current_mAP = 72.28, highest_mAP = 72.35, best_epoch=7 119 | 120 | 2022-07-17 17:59:33,027 [INFO] Save text embeddings done 121 | 2022-07-17 17:59:40,885 [INFO] Epoch [10/60], Step [000/642], LR 2.1e-05, Loss: 629.6 122 | 2022-07-17 18:01:23,674 [INFO] Epoch [10/60], Step [100/642], LR 2.2e-05, Loss: 628.5 123 | 2022-07-17 18:03:06,082 [INFO] Epoch [10/60], Step [200/642], LR 2.2e-05, Loss: 656.5 124 | 2022-07-17 18:04:48,816 [INFO] Epoch [10/60], Step [300/642], LR 2.2e-05, Loss: 644.5 125 | 2022-07-17 18:06:32,578 [INFO] Epoch [10/60], Step [400/642], LR 2.3e-05, Loss: 623.7 126 | 2022-07-17 18:08:15,731 [INFO] Epoch [10/60], Step [500/642], LR 2.3e-05, Loss: 651.2 127 | 2022-07-17 18:09:58,153 [INFO] Epoch [10/60], Step [600/642], LR 2.3e-05, Loss: 646.6 128 | 2022-07-17 18:10:39,446 [INFO] Start validation... 129 | 2022-07-17 18:12:10,610 [INFO] mAP score regular 72.85, mAP score EMA 72.60 130 | 2022-07-17 18:12:18,127 [INFO] current_mAP = 72.85, highest_mAP = 72.85, best_epoch=10 131 | 132 | 2022-07-17 18:12:18,127 [INFO] Save text embeddings done 133 | 2022-07-17 18:12:25,880 [INFO] Epoch [11/60], Step [000/642], LR 2.4e-05, Loss: 637.2 134 | 2022-07-17 18:14:09,057 [INFO] Epoch [11/60], Step [100/642], LR 2.4e-05, Loss: 623.2 135 | 2022-07-17 18:15:52,046 [INFO] Epoch [11/60], Step [200/642], LR 2.4e-05, Loss: 610.6 136 | 2022-07-17 18:17:35,624 [INFO] Epoch [11/60], Step [300/642], LR 2.5e-05, Loss: 624.1 137 | 2022-07-17 18:19:19,581 [INFO] Epoch [11/60], Step [400/642], LR 2.5e-05, Loss: 615.6 138 | 2022-07-17 18:21:03,305 [INFO] Epoch [11/60], Step [500/642], LR 2.5e-05, Loss: 627.8 139 | 2022-07-17 18:22:46,519 [INFO] Epoch [11/60], Step [600/642], LR 2.6e-05, Loss: 644.7 140 | 2022-07-17 18:23:27,759 [INFO] Start validation... 141 | 2022-07-17 18:25:01,510 [INFO] mAP score regular 72.48, mAP score EMA 73.83 142 | 2022-07-17 18:25:09,083 [INFO] current_mAP = 73.83, highest_mAP = 73.83, best_epoch=11 143 | 144 | 2022-07-17 18:25:09,084 [INFO] Save text embeddings done 145 | 2022-07-17 18:25:18,691 [INFO] Epoch [12/60], Step [000/642], LR 2.6e-05, Loss: 635.7 146 | 2022-07-17 18:27:02,516 [INFO] Epoch [12/60], Step [100/642], LR 2.6e-05, Loss: 630.8 147 | 2022-07-17 18:28:45,784 [INFO] Epoch [12/60], Step [200/642], LR 2.6e-05, Loss: 623.5 148 | 2022-07-17 18:30:28,908 [INFO] Epoch [12/60], Step [300/642], LR 2.7e-05, Loss: 619.5 149 | 2022-07-17 18:32:11,842 [INFO] Epoch [12/60], Step [400/642], LR 2.7e-05, Loss: 630.5 150 | 2022-07-17 18:33:55,056 [INFO] Epoch [12/60], Step [500/642], LR 2.7e-05, Loss: 600.1 151 | 2022-07-17 18:35:39,971 [INFO] Epoch [12/60], Step [600/642], LR 2.7e-05, Loss: 649.8 152 | 2022-07-17 18:36:21,413 [INFO] Start validation... 
153 | 2022-07-17 18:37:55,776 [INFO] mAP score regular 71.86, mAP score EMA 74.67 154 | 2022-07-17 18:38:03,296 [INFO] current_mAP = 74.67, highest_mAP = 74.67, best_epoch=12 155 | 156 | 2022-07-17 18:38:03,297 [INFO] Save text embeddings done 157 | 2022-07-17 18:38:11,348 [INFO] Epoch [13/60], Step [000/642], LR 2.8e-05, Loss: 608.0 158 | 2022-07-17 18:39:54,589 [INFO] Epoch [13/60], Step [100/642], LR 2.8e-05, Loss: 628.6 159 | 2022-07-17 18:41:37,798 [INFO] Epoch [13/60], Step [200/642], LR 2.8e-05, Loss: 619.4 160 | 2022-07-17 18:43:21,140 [INFO] Epoch [13/60], Step [300/642], LR 2.8e-05, Loss: 637.7 161 | 2022-07-17 18:45:04,141 [INFO] Epoch [13/60], Step [400/642], LR 2.8e-05, Loss: 632.4 162 | 2022-07-17 18:46:47,602 [INFO] Epoch [13/60], Step [500/642], LR 2.9e-05, Loss: 658.2 163 | 2022-07-17 18:48:30,967 [INFO] Epoch [13/60], Step [600/642], LR 2.9e-05, Loss: 610.7 164 | 2022-07-17 18:49:11,892 [INFO] Start validation... 165 | 2022-07-17 18:50:45,710 [INFO] mAP score regular 72.37, mAP score EMA 75.25 166 | 2022-07-17 18:50:54,315 [INFO] current_mAP = 75.25, highest_mAP = 75.25, best_epoch=13 167 | 168 | 2022-07-17 18:50:54,315 [INFO] Save text embeddings done 169 | 2022-07-17 18:51:02,755 [INFO] Epoch [14/60], Step [000/642], LR 2.9e-05, Loss: 594.0 170 | 2022-07-17 18:52:46,716 [INFO] Epoch [14/60], Step [100/642], LR 2.9e-05, Loss: 601.3 171 | 2022-07-17 18:54:30,099 [INFO] Epoch [14/60], Step [200/642], LR 2.9e-05, Loss: 607.4 172 | 2022-07-17 18:56:13,470 [INFO] Epoch [14/60], Step [300/642], LR 2.9e-05, Loss: 636.7 173 | 2022-07-17 18:57:56,986 [INFO] Epoch [14/60], Step [400/642], LR 2.9e-05, Loss: 602.9 174 | 2022-07-17 18:59:39,555 [INFO] Epoch [14/60], Step [500/642], LR 3.0e-05, Loss: 601.4 175 | 2022-07-17 19:01:23,459 [INFO] Epoch [14/60], Step [600/642], LR 3.0e-05, Loss: 595.6 176 | 2022-07-17 19:02:04,815 [INFO] Start validation... 177 | 2022-07-17 19:03:41,072 [INFO] mAP score regular 70.73, mAP score EMA 75.68 178 | 2022-07-17 19:03:48,854 [INFO] current_mAP = 75.68, highest_mAP = 75.68, best_epoch=14 179 | 180 | 2022-07-17 19:03:48,863 [INFO] Save text embeddings done 181 | 2022-07-17 19:03:58,047 [INFO] Epoch [15/60], Step [000/642], LR 3.0e-05, Loss: 649.3 182 | 2022-07-17 19:05:40,735 [INFO] Epoch [15/60], Step [100/642], LR 3.0e-05, Loss: 634.9 183 | 2022-07-17 19:07:24,967 [INFO] Epoch [15/60], Step [200/642], LR 3.0e-05, Loss: 615.0 184 | 2022-07-17 19:09:09,782 [INFO] Epoch [15/60], Step [300/642], LR 3.0e-05, Loss: 596.5 185 | 2022-07-17 19:10:53,226 [INFO] Epoch [15/60], Step [400/642], LR 3.0e-05, Loss: 627.1 186 | 2022-07-17 19:12:36,561 [INFO] Epoch [15/60], Step [500/642], LR 3.0e-05, Loss: 640.2 187 | 2022-07-17 19:14:19,818 [INFO] Epoch [15/60], Step [600/642], LR 3.0e-05, Loss: 599.0 188 | 2022-07-17 19:15:01,235 [INFO] Start validation... 
189 | 2022-07-17 19:16:35,357 [INFO] mAP score regular 71.33, mAP score EMA 76.00 190 | 2022-07-17 19:16:45,918 [INFO] current_mAP = 76.00, highest_mAP = 76.00, best_epoch=15 191 | 192 | 2022-07-17 19:16:45,918 [INFO] Save text embeddings done 193 | 2022-07-17 19:16:57,307 [INFO] Epoch [16/60], Step [000/642], LR 3.0e-05, Loss: 591.5 194 | 2022-07-17 19:18:39,963 [INFO] Epoch [16/60], Step [100/642], LR 3.0e-05, Loss: 600.0 195 | 2022-07-17 19:20:23,570 [INFO] Epoch [16/60], Step [200/642], LR 3.0e-05, Loss: 583.3 196 | 2022-07-17 19:22:07,824 [INFO] Epoch [16/60], Step [300/642], LR 3.0e-05, Loss: 600.7 197 | 2022-07-17 19:23:51,911 [INFO] Epoch [16/60], Step [400/642], LR 3.0e-05, Loss: 605.3 198 | 2022-07-17 19:25:34,761 [INFO] Epoch [16/60], Step [500/642], LR 3.0e-05, Loss: 578.8 199 | 2022-07-17 19:27:17,418 [INFO] Epoch [16/60], Step [600/642], LR 3.0e-05, Loss: 627.3 200 | 2022-07-17 19:27:58,323 [INFO] Start validation... 201 | 2022-07-17 19:29:31,542 [INFO] mAP score regular 70.97, mAP score EMA 76.25 202 | 2022-07-17 19:29:42,388 [INFO] current_mAP = 76.25, highest_mAP = 76.25, best_epoch=16 203 | 204 | 2022-07-17 19:29:42,388 [INFO] Save text embeddings done 205 | 2022-07-17 19:29:51,810 [INFO] Epoch [17/60], Step [000/642], LR 3.0e-05, Loss: 624.3 206 | 2022-07-17 19:31:36,236 [INFO] Epoch [17/60], Step [100/642], LR 3.0e-05, Loss: 588.3 207 | 2022-07-17 19:33:19,494 [INFO] Epoch [17/60], Step [200/642], LR 3.0e-05, Loss: 610.9 208 | 2022-07-17 19:35:03,244 [INFO] Epoch [17/60], Step [300/642], LR 3.0e-05, Loss: 573.2 209 | 2022-07-17 19:36:46,556 [INFO] Epoch [17/60], Step [400/642], LR 3.0e-05, Loss: 612.2 210 | 2022-07-17 19:38:28,978 [INFO] Epoch [17/60], Step [500/642], LR 3.0e-05, Loss: 575.3 211 | 2022-07-17 19:40:12,017 [INFO] Epoch [17/60], Step [600/642], LR 3.0e-05, Loss: 602.0 212 | 2022-07-17 19:40:53,048 [INFO] Start validation... 213 | 2022-07-17 19:42:25,563 [INFO] mAP score regular 72.18, mAP score EMA 76.37 214 | 2022-07-17 19:42:33,121 [INFO] current_mAP = 76.37, highest_mAP = 76.37, best_epoch=17 215 | 216 | 2022-07-17 19:42:33,121 [INFO] Save text embeddings done 217 | 2022-07-17 19:42:42,420 [INFO] Epoch [18/60], Step [000/642], LR 3.0e-05, Loss: 607.2 218 | 2022-07-17 19:44:24,861 [INFO] Epoch [18/60], Step [100/642], LR 3.0e-05, Loss: 579.3 219 | 2022-07-17 19:46:08,403 [INFO] Epoch [18/60], Step [200/642], LR 3.0e-05, Loss: 570.6 220 | 2022-07-17 19:47:52,526 [INFO] Epoch [18/60], Step [300/642], LR 3.0e-05, Loss: 604.3 221 | 2022-07-17 19:49:35,492 [INFO] Epoch [18/60], Step [400/642], LR 3.0e-05, Loss: 602.7 222 | 2022-07-17 19:51:18,619 [INFO] Epoch [18/60], Step [500/642], LR 3.0e-05, Loss: 577.4 223 | 2022-07-17 19:53:01,579 [INFO] Epoch [18/60], Step [600/642], LR 3.0e-05, Loss: 603.9 224 | 2022-07-17 19:53:42,746 [INFO] Start validation... 
225 | 2022-07-17 19:55:16,569 [INFO] mAP score regular 71.23, mAP score EMA 76.42 226 | 2022-07-17 19:55:24,451 [INFO] current_mAP = 76.42, highest_mAP = 76.42, best_epoch=18 227 | 228 | 2022-07-17 19:55:24,451 [INFO] Save text embeddings done 229 | 2022-07-17 19:55:34,927 [INFO] Epoch [19/60], Step [000/642], LR 3.0e-05, Loss: 590.6 230 | 2022-07-17 19:57:18,542 [INFO] Epoch [19/60], Step [100/642], LR 3.0e-05, Loss: 614.3 231 | 2022-07-17 19:59:01,698 [INFO] Epoch [19/60], Step [200/642], LR 3.0e-05, Loss: 578.7 232 | 2022-07-17 20:00:45,131 [INFO] Epoch [19/60], Step [300/642], LR 3.0e-05, Loss: 611.3 233 | 2022-07-17 20:02:27,317 [INFO] Epoch [19/60], Step [400/642], LR 3.0e-05, Loss: 608.8 234 | 2022-07-17 20:04:09,765 [INFO] Epoch [19/60], Step [500/642], LR 3.0e-05, Loss: 614.6 235 | 2022-07-17 20:05:52,692 [INFO] Epoch [19/60], Step [600/642], LR 3.0e-05, Loss: 607.6 236 | 2022-07-17 20:06:33,787 [INFO] Start validation... 237 | 2022-07-17 20:08:05,728 [INFO] mAP score regular 70.85, mAP score EMA 76.35 238 | 2022-07-17 20:08:05,748 [INFO] current_mAP = 76.35, highest_mAP = 76.42, best_epoch=18 239 | 240 | 2022-07-17 20:08:05,749 [INFO] Save text embeddings done 241 | 2022-07-17 20:08:15,606 [INFO] Epoch [20/60], Step [000/642], LR 3.0e-05, Loss: 591.2 242 | 2022-07-17 20:09:57,967 [INFO] Epoch [20/60], Step [100/642], LR 3.0e-05, Loss: 590.8 243 | 2022-07-17 20:11:40,405 [INFO] Epoch [20/60], Step [200/642], LR 3.0e-05, Loss: 598.5 244 | 2022-07-17 20:13:23,257 [INFO] Epoch [20/60], Step [300/642], LR 3.0e-05, Loss: 579.0 245 | 2022-07-17 20:15:06,258 [INFO] Epoch [20/60], Step [400/642], LR 3.0e-05, Loss: 583.9 246 | 2022-07-17 20:16:47,969 [INFO] Epoch [20/60], Step [500/642], LR 3.0e-05, Loss: 599.2 247 | 2022-07-17 20:18:30,394 [INFO] Epoch [20/60], Step [600/642], LR 3.0e-05, Loss: 568.2 248 | 2022-07-17 20:19:11,417 [INFO] Start validation... 249 | 2022-07-17 20:20:42,805 [INFO] mAP score regular 71.09, mAP score EMA 76.25 250 | 2022-07-17 20:20:42,826 [INFO] current_mAP = 76.25, highest_mAP = 76.42, best_epoch=18 251 | 252 | 2022-07-17 20:20:42,826 [INFO] Save text embeddings done 253 | 2022-07-17 20:20:53,384 [INFO] Epoch [21/60], Step [000/642], LR 3.0e-05, Loss: 572.4 254 | 2022-07-17 20:22:37,802 [INFO] Epoch [21/60], Step [100/642], LR 3.0e-05, Loss: 599.3 255 | 2022-07-17 20:24:20,758 [INFO] Epoch [21/60], Step [200/642], LR 2.9e-05, Loss: 557.3 256 | 2022-07-17 20:26:04,039 [INFO] Epoch [21/60], Step [300/642], LR 2.9e-05, Loss: 571.6 257 | 2022-07-17 20:27:46,721 [INFO] Epoch [21/60], Step [400/642], LR 2.9e-05, Loss: 619.8 258 | 2022-07-17 20:29:29,281 [INFO] Epoch [21/60], Step [500/642], LR 2.9e-05, Loss: 584.9 259 | 2022-07-17 20:31:12,229 [INFO] Epoch [21/60], Step [600/642], LR 2.9e-05, Loss: 579.1 260 | 2022-07-17 20:31:53,816 [INFO] Start validation... 
261 | 2022-07-17 20:33:25,354 [INFO] mAP score regular 70.88, mAP score EMA 76.12 262 | 2022-07-17 20:33:25,371 [INFO] current_mAP = 76.12, highest_mAP = 76.42, best_epoch=18 263 | 264 | 2022-07-17 20:33:25,372 [INFO] Save text embeddings done 265 | 2022-07-17 20:33:34,224 [INFO] Epoch [22/60], Step [000/642], LR 2.9e-05, Loss: 596.8 266 | 2022-07-17 20:35:18,091 [INFO] Epoch [22/60], Step [100/642], LR 2.9e-05, Loss: 577.1 267 | 2022-07-17 20:37:00,832 [INFO] Epoch [22/60], Step [200/642], LR 2.9e-05, Loss: 561.4 268 | 2022-07-17 20:38:43,381 [INFO] Epoch [22/60], Step [300/642], LR 2.9e-05, Loss: 591.5 269 | 2022-07-17 20:40:26,419 [INFO] Epoch [22/60], Step [400/642], LR 2.9e-05, Loss: 583.0 270 | 2022-07-17 20:42:09,171 [INFO] Epoch [22/60], Step [500/642], LR 2.9e-05, Loss: 576.2 271 | 2022-07-17 20:43:51,747 [INFO] Epoch [22/60], Step [600/642], LR 2.9e-05, Loss: 606.0 272 | 2022-07-17 20:44:32,859 [INFO] Start validation... 273 | 2022-07-17 20:46:04,609 [INFO] mAP score regular 70.46, mAP score EMA 75.94 274 | 2022-07-17 20:46:04,629 [INFO] current_mAP = 75.94, highest_mAP = 76.42, best_epoch=18 275 | 276 | 2022-07-17 20:46:04,630 [INFO] Save text embeddings done 277 | 2022-07-17 20:46:11,990 [INFO] Epoch [23/60], Step [000/642], LR 2.9e-05, Loss: 590.1 278 | 2022-07-17 20:47:57,088 [INFO] Epoch [23/60], Step [100/642], LR 2.9e-05, Loss: 570.9 279 | 2022-07-17 20:49:39,436 [INFO] Epoch [23/60], Step [200/642], LR 2.9e-05, Loss: 573.8 280 | 2022-07-17 20:51:22,581 [INFO] Epoch [23/60], Step [300/642], LR 2.9e-05, Loss: 568.8 281 | 2022-07-17 20:53:05,325 [INFO] Epoch [23/60], Step [400/642], LR 2.9e-05, Loss: 614.1 282 | 2022-07-17 20:54:47,585 [INFO] Epoch [23/60], Step [500/642], LR 2.9e-05, Loss: 557.5 283 | 2022-07-17 20:56:30,310 [INFO] Epoch [23/60], Step [600/642], LR 2.9e-05, Loss: 569.7 284 | 2022-07-17 20:57:11,431 [INFO] Start validation... 285 | -------------------------------------------------------------------------------- /logs/scpnet+cub.txt: -------------------------------------------------------------------------------- 1 | 2022-07-29 09:13:45,742 [INFO] Epoch [0/100], Step [000/063], LR 1.2e-05, Loss: 58426.6 2 | 2022-07-29 09:14:39,947 [INFO] Start validation... 3 | 2022-07-29 09:15:01,987 [INFO] mAP score regular 10.42, mAP score EMA 14.98 4 | 2022-07-29 09:15:03,820 [INFO] current_mAP = 14.98, highest_mAP = 14.98, best_epoch=0 5 | 6 | 2022-07-29 09:15:03,821 [INFO] Save text embeddings done 7 | 2022-07-29 09:15:08,845 [INFO] Epoch [1/100], Step [000/063], LR 1.4e-05, Loss: 9480.4 8 | 2022-07-29 09:16:04,440 [INFO] Start validation... 9 | 2022-07-29 09:16:26,846 [INFO] mAP score regular 12.89, mAP score EMA 12.77 10 | 2022-07-29 09:16:26,854 [INFO] current_mAP = 12.89, highest_mAP = 14.98, best_epoch=0 11 | 12 | 2022-07-29 09:16:26,854 [INFO] Save text embeddings done 13 | 2022-07-29 09:16:32,453 [INFO] Epoch [2/100], Step [000/063], LR 1.9e-05, Loss: 4303.4 14 | 2022-07-29 09:17:28,590 [INFO] Start validation... 15 | 2022-07-29 09:17:51,066 [INFO] mAP score regular 13.16, mAP score EMA 12.49 16 | 2022-07-29 09:17:51,074 [INFO] current_mAP = 13.16, highest_mAP = 14.98, best_epoch=0 17 | 18 | 2022-07-29 09:17:51,074 [INFO] Save text embeddings done 19 | 2022-07-29 09:17:56,180 [INFO] Epoch [3/100], Step [000/063], LR 2.8e-05, Loss: 4418.8 20 | 2022-07-29 09:18:51,915 [INFO] Start validation... 
21 | 2022-07-29 09:19:14,250 [INFO] mAP score regular 13.40, mAP score EMA 12.42 22 | 2022-07-29 09:19:14,258 [INFO] current_mAP = 13.40, highest_mAP = 14.98, best_epoch=0 23 | 24 | 2022-07-29 09:19:14,258 [INFO] Save text embeddings done 25 | 2022-07-29 09:19:19,482 [INFO] Epoch [4/100], Step [000/063], LR 4.0e-05, Loss: 4741.4 26 | 2022-07-29 09:20:15,280 [INFO] Start validation... 27 | 2022-07-29 09:20:37,483 [INFO] mAP score regular 13.38, mAP score EMA 12.46 28 | 2022-07-29 09:20:37,491 [INFO] current_mAP = 13.38, highest_mAP = 14.98, best_epoch=0 29 | 30 | 2022-07-29 09:20:37,491 [INFO] Save text embeddings done 31 | 2022-07-29 09:20:42,655 [INFO] Epoch [5/100], Step [000/063], LR 5.4e-05, Loss: 4832.6 32 | 2022-07-29 09:21:38,819 [INFO] Start validation... 33 | 2022-07-29 09:22:00,899 [INFO] mAP score regular 13.63, mAP score EMA 12.63 34 | 2022-07-29 09:22:00,907 [INFO] current_mAP = 13.63, highest_mAP = 14.98, best_epoch=0 35 | 36 | 2022-07-29 09:22:00,907 [INFO] Save text embeddings done 37 | 2022-07-29 09:22:05,806 [INFO] Epoch [6/100], Step [000/063], LR 7.2e-05, Loss: 4149.6 38 | 2022-07-29 09:23:02,277 [INFO] Start validation... 39 | 2022-07-29 09:23:24,715 [INFO] mAP score regular 14.65, mAP score EMA 12.93 40 | 2022-07-29 09:23:24,723 [INFO] current_mAP = 14.65, highest_mAP = 14.98, best_epoch=0 41 | 42 | 2022-07-29 09:23:24,723 [INFO] Save text embeddings done 43 | 2022-07-29 09:23:31,049 [INFO] Epoch [7/100], Step [000/063], LR 9.1e-05, Loss: 4404.6 44 | 2022-07-29 09:24:26,983 [INFO] Start validation... 45 | 2022-07-29 09:24:49,659 [INFO] mAP score regular 15.59, mAP score EMA 13.41 46 | 2022-07-29 09:24:58,817 [INFO] current_mAP = 15.59, highest_mAP = 15.59, best_epoch=7 47 | 48 | 2022-07-29 09:24:59,235 [INFO] Save text embeddings done 49 | 2022-07-29 09:25:05,516 [INFO] Epoch [8/100], Step [000/063], LR 1.1e-04, Loss: 4698.5 50 | 2022-07-29 09:26:03,066 [INFO] Start validation... 51 | 2022-07-29 09:26:26,515 [INFO] mAP score regular 16.58, mAP score EMA 14.19 52 | 2022-07-29 09:26:40,525 [INFO] current_mAP = 16.58, highest_mAP = 16.58, best_epoch=8 53 | 54 | 2022-07-29 09:26:40,526 [INFO] Save text embeddings done 55 | 2022-07-29 09:26:45,407 [INFO] Epoch [9/100], Step [000/063], LR 1.3e-04, Loss: 5346.6 56 | 2022-07-29 09:27:50,084 [INFO] Start validation... 57 | 2022-07-29 09:28:12,438 [INFO] mAP score regular 18.00, mAP score EMA 15.06 58 | 2022-07-29 09:28:22,483 [INFO] current_mAP = 18.00, highest_mAP = 18.00, best_epoch=9 59 | 60 | 2022-07-29 09:28:22,484 [INFO] Save text embeddings done 61 | 2022-07-29 09:28:28,229 [INFO] Epoch [10/100], Step [000/063], LR 1.6e-04, Loss: 4414.6 62 | 2022-07-29 09:29:25,566 [INFO] Start validation... 63 | 2022-07-29 09:29:48,410 [INFO] mAP score regular 17.60, mAP score EMA 16.27 64 | 2022-07-29 09:29:48,417 [INFO] current_mAP = 17.60, highest_mAP = 18.00, best_epoch=9 65 | 66 | 2022-07-29 09:29:48,418 [INFO] Save text embeddings done 67 | 2022-07-29 09:29:55,244 [INFO] Epoch [11/100], Step [000/063], LR 1.8e-04, Loss: 4678.3 68 | 2022-07-29 09:30:52,128 [INFO] Start validation... 69 | 2022-07-29 09:31:15,142 [INFO] mAP score regular 19.05, mAP score EMA 17.57 70 | 2022-07-29 09:31:23,717 [INFO] current_mAP = 19.05, highest_mAP = 19.05, best_epoch=11 71 | 72 | 2022-07-29 09:31:23,718 [INFO] Save text embeddings done 73 | 2022-07-29 09:31:29,185 [INFO] Epoch [12/100], Step [000/063], LR 2.0e-04, Loss: 4220.6 74 | 2022-07-29 09:32:26,296 [INFO] Start validation... 
75 | 2022-07-29 09:32:49,182 [INFO] mAP score regular 21.08, mAP score EMA 18.73 76 | 2022-07-29 09:32:58,051 [INFO] current_mAP = 21.08, highest_mAP = 21.08, best_epoch=12 77 | 78 | 2022-07-29 09:32:58,052 [INFO] Save text embeddings done 79 | 2022-07-29 09:33:04,718 [INFO] Epoch [13/100], Step [000/063], LR 2.2e-04, Loss: 4281.9 80 | 2022-07-29 09:34:01,639 [INFO] Start validation... 81 | 2022-07-29 09:34:24,362 [INFO] mAP score regular 20.11, mAP score EMA 19.80 82 | 2022-07-29 09:34:24,370 [INFO] current_mAP = 20.11, highest_mAP = 21.08, best_epoch=12 83 | 84 | 2022-07-29 09:34:24,370 [INFO] Save text embeddings done 85 | 2022-07-29 09:34:31,286 [INFO] Epoch [14/100], Step [000/063], LR 2.4e-04, Loss: 5025.3 86 | 2022-07-29 09:35:29,472 [INFO] Start validation... 87 | 2022-07-29 09:35:53,424 [INFO] mAP score regular 21.95, mAP score EMA 20.66 88 | 2022-07-29 09:36:02,892 [INFO] current_mAP = 21.95, highest_mAP = 21.95, best_epoch=14 89 | 90 | 2022-07-29 09:36:02,892 [INFO] Save text embeddings done 91 | 2022-07-29 09:36:08,986 [INFO] Epoch [15/100], Step [000/063], LR 2.6e-04, Loss: 4703.9 92 | 2022-07-29 09:37:05,694 [INFO] Start validation... 93 | 2022-07-29 09:37:28,806 [INFO] mAP score regular 21.79, mAP score EMA 21.44 94 | 2022-07-29 09:37:28,826 [INFO] current_mAP = 21.79, highest_mAP = 21.95, best_epoch=14 95 | 96 | 2022-07-29 09:37:28,826 [INFO] Save text embeddings done 97 | 2022-07-29 09:37:34,459 [INFO] Epoch [16/100], Step [000/063], LR 2.7e-04, Loss: 4622.5 98 | 2022-07-29 09:38:31,933 [INFO] Start validation... 99 | 2022-07-29 09:38:55,138 [INFO] mAP score regular 22.05, mAP score EMA 22.16 100 | 2022-07-29 09:39:04,021 [INFO] current_mAP = 22.16, highest_mAP = 22.16, best_epoch=16 101 | 102 | 2022-07-29 09:39:04,021 [INFO] Save text embeddings done 103 | 2022-07-29 09:39:09,860 [INFO] Epoch [17/100], Step [000/063], LR 2.8e-04, Loss: 4661.1 104 | 2022-07-29 09:40:07,510 [INFO] Start validation... 105 | 2022-07-29 09:40:30,736 [INFO] mAP score regular 22.23, mAP score EMA 22.78 106 | 2022-07-29 09:40:44,520 [INFO] current_mAP = 22.78, highest_mAP = 22.78, best_epoch=17 107 | 108 | 2022-07-29 09:40:44,520 [INFO] Save text embeddings done 109 | 2022-07-29 09:40:51,565 [INFO] Epoch [18/100], Step [000/063], LR 2.9e-04, Loss: 4612.7 110 | 2022-07-29 09:41:55,680 [INFO] Start validation... 111 | 2022-07-29 09:42:19,019 [INFO] mAP score regular 22.62, mAP score EMA 23.19 112 | 2022-07-29 09:42:27,813 [INFO] current_mAP = 23.19, highest_mAP = 23.19, best_epoch=18 113 | 114 | 2022-07-29 09:42:27,814 [INFO] Save text embeddings done 115 | 2022-07-29 09:42:32,668 [INFO] Epoch [19/100], Step [000/063], LR 3.0e-04, Loss: 4768.1 116 | 2022-07-29 09:43:37,441 [INFO] Start validation... 117 | 2022-07-29 09:44:00,393 [INFO] mAP score regular 22.65, mAP score EMA 23.51 118 | 2022-07-29 09:44:15,909 [INFO] current_mAP = 23.51, highest_mAP = 23.51, best_epoch=19 119 | 120 | 2022-07-29 09:44:15,909 [INFO] Save text embeddings done 121 | 2022-07-29 09:44:20,407 [INFO] Epoch [20/100], Step [000/063], LR 3.0e-04, Loss: 4798.3 122 | 2022-07-29 09:45:20,859 [INFO] Start validation... 123 | 2022-07-29 09:45:43,773 [INFO] mAP score regular 21.91, mAP score EMA 23.79 124 | 2022-07-29 09:45:55,448 [INFO] current_mAP = 23.79, highest_mAP = 23.79, best_epoch=20 125 | 126 | 2022-07-29 09:45:55,449 [INFO] Save text embeddings done 127 | 2022-07-29 09:46:01,795 [INFO] Epoch [21/100], Step [000/063], LR 3.0e-04, Loss: 4817.1 128 | 2022-07-29 09:47:00,813 [INFO] Start validation... 
129 | 2022-07-29 09:47:25,342 [INFO] mAP score regular 23.10, mAP score EMA 24.03 130 | 2022-07-29 09:47:35,078 [INFO] current_mAP = 24.03, highest_mAP = 24.03, best_epoch=21 131 | 132 | 2022-07-29 09:47:35,079 [INFO] Save text embeddings done 133 | 2022-07-29 09:47:41,634 [INFO] Epoch [22/100], Step [000/063], LR 3.0e-04, Loss: 4627.6 134 | 2022-07-29 09:48:41,624 [INFO] Start validation... 135 | 2022-07-29 09:49:05,467 [INFO] mAP score regular 23.45, mAP score EMA 24.34 136 | 2022-07-29 09:49:19,710 [INFO] current_mAP = 24.34, highest_mAP = 24.34, best_epoch=22 137 | 138 | 2022-07-29 09:49:19,710 [INFO] Save text embeddings done 139 | 2022-07-29 09:49:26,541 [INFO] Epoch [23/100], Step [000/063], LR 3.0e-04, Loss: 4463.8 140 | 2022-07-29 09:50:30,465 [INFO] Start validation... 141 | 2022-07-29 09:50:53,782 [INFO] mAP score regular 23.33, mAP score EMA 24.54 142 | 2022-07-29 09:51:02,617 [INFO] current_mAP = 24.54, highest_mAP = 24.54, best_epoch=23 143 | 144 | 2022-07-29 09:51:03,080 [INFO] Save text embeddings done 145 | 2022-07-29 09:51:08,366 [INFO] Epoch [24/100], Step [000/063], LR 3.0e-04, Loss: 4642.3 146 | 2022-07-29 09:52:07,097 [INFO] Start validation... 147 | 2022-07-29 09:52:29,895 [INFO] mAP score regular 23.29, mAP score EMA 24.67 148 | 2022-07-29 09:52:44,963 [INFO] current_mAP = 24.67, highest_mAP = 24.67, best_epoch=24 149 | 150 | 2022-07-29 09:52:44,963 [INFO] Save text embeddings done 151 | 2022-07-29 09:52:50,137 [INFO] Epoch [25/100], Step [000/063], LR 3.0e-04, Loss: 4619.1 152 | 2022-07-29 09:53:47,223 [INFO] Start validation... 153 | 2022-07-29 09:54:10,013 [INFO] mAP score regular 23.32, mAP score EMA 24.80 154 | 2022-07-29 09:54:21,557 [INFO] current_mAP = 24.80, highest_mAP = 24.80, best_epoch=25 155 | 156 | 2022-07-29 09:54:21,558 [INFO] Save text embeddings done 157 | 2022-07-29 09:54:26,215 [INFO] Epoch [26/100], Step [000/063], LR 3.0e-04, Loss: 4161.4 158 | 2022-07-29 09:55:24,199 [INFO] Start validation... 159 | 2022-07-29 09:55:47,376 [INFO] mAP score regular 23.22, mAP score EMA 24.93 160 | 2022-07-29 09:55:56,704 [INFO] current_mAP = 24.93, highest_mAP = 24.93, best_epoch=26 161 | 162 | 2022-07-29 09:55:56,705 [INFO] Save text embeddings done 163 | 2022-07-29 09:56:03,486 [INFO] Epoch [27/100], Step [000/063], LR 2.9e-04, Loss: 4729.1 164 | 2022-07-29 09:57:01,029 [INFO] Start validation... 165 | 2022-07-29 09:57:24,318 [INFO] mAP score regular 23.18, mAP score EMA 25.08 166 | 2022-07-29 09:57:50,660 [INFO] current_mAP = 25.08, highest_mAP = 25.08, best_epoch=27 167 | 168 | 2022-07-29 09:57:50,661 [INFO] Save text embeddings done 169 | 2022-07-29 09:57:55,166 [INFO] Epoch [28/100], Step [000/063], LR 2.9e-04, Loss: 4741.3 170 | 2022-07-29 09:58:52,173 [INFO] Start validation... 171 | 2022-07-29 09:59:15,006 [INFO] mAP score regular 23.48, mAP score EMA 25.09 172 | 2022-07-29 09:59:42,331 [INFO] current_mAP = 25.09, highest_mAP = 25.09, best_epoch=28 173 | 174 | 2022-07-29 09:59:42,332 [INFO] Save text embeddings done 175 | 2022-07-29 09:59:48,479 [INFO] Epoch [29/100], Step [000/063], LR 2.9e-04, Loss: 4437.4 176 | 2022-07-29 10:00:46,094 [INFO] Start validation... 177 | 2022-07-29 10:01:08,688 [INFO] mAP score regular 24.26, mAP score EMA 25.17 178 | 2022-07-29 10:01:32,452 [INFO] current_mAP = 25.17, highest_mAP = 25.17, best_epoch=29 179 | 180 | 2022-07-29 10:01:32,452 [INFO] Save text embeddings done 181 | 2022-07-29 10:01:37,289 [INFO] Epoch [30/100], Step [000/063], LR 2.9e-04, Loss: 4449.7 182 | 2022-07-29 10:02:39,085 [INFO] Start validation... 
183 | 2022-07-29 10:03:01,615 [INFO] mAP score regular 23.91, mAP score EMA 25.31 184 | 2022-07-29 10:03:17,173 [INFO] current_mAP = 25.31, highest_mAP = 25.31, best_epoch=30 185 | 186 | 2022-07-29 10:03:17,173 [INFO] Save text embeddings done 187 | 2022-07-29 10:03:22,586 [INFO] Epoch [31/100], Step [000/063], LR 2.9e-04, Loss: 4160.9 188 | 2022-07-29 10:04:30,225 [INFO] Start validation... 189 | 2022-07-29 10:04:53,147 [INFO] mAP score regular 23.22, mAP score EMA 25.44 190 | 2022-07-29 10:05:21,281 [INFO] current_mAP = 25.44, highest_mAP = 25.44, best_epoch=31 191 | 192 | 2022-07-29 10:05:23,469 [INFO] Save text embeddings done 193 | 2022-07-29 10:05:28,687 [INFO] Epoch [32/100], Step [000/063], LR 2.8e-04, Loss: 4545.0 194 | 2022-07-29 10:06:25,065 [INFO] Start validation... 195 | 2022-07-29 10:06:47,916 [INFO] mAP score regular 24.28, mAP score EMA 25.53 196 | 2022-07-29 10:07:17,111 [INFO] current_mAP = 25.53, highest_mAP = 25.53, best_epoch=32 197 | 198 | 2022-07-29 10:07:17,112 [INFO] Save text embeddings done 199 | 2022-07-29 10:07:21,864 [INFO] Epoch [33/100], Step [000/063], LR 2.8e-04, Loss: 4666.8 200 | 2022-07-29 10:08:19,891 [INFO] Start validation... 201 | 2022-07-29 10:08:42,551 [INFO] mAP score regular 23.81, mAP score EMA 25.58 202 | 2022-07-29 10:09:03,455 [INFO] current_mAP = 25.58, highest_mAP = 25.58, best_epoch=33 203 | 204 | 2022-07-29 10:09:03,455 [INFO] Save text embeddings done 205 | 2022-07-29 10:09:09,183 [INFO] Epoch [34/100], Step [000/063], LR 2.8e-04, Loss: 4361.8 206 | 2022-07-29 10:10:09,995 [INFO] Start validation... 207 | 2022-07-29 10:10:32,777 [INFO] mAP score regular 22.87, mAP score EMA 25.65 208 | 2022-07-29 10:10:49,802 [INFO] current_mAP = 25.65, highest_mAP = 25.65, best_epoch=34 209 | 210 | 2022-07-29 10:10:49,803 [INFO] Save text embeddings done 211 | 2022-07-29 10:10:56,725 [INFO] Epoch [35/100], Step [000/063], LR 2.7e-04, Loss: 4491.9 212 | 2022-07-29 10:11:53,650 [INFO] Start validation... 213 | 2022-07-29 10:12:16,334 [INFO] mAP score regular 23.81, mAP score EMA 25.63 214 | 2022-07-29 10:12:16,346 [INFO] current_mAP = 25.63, highest_mAP = 25.65, best_epoch=34 215 | 216 | 2022-07-29 10:12:16,346 [INFO] Save text embeddings done 217 | 2022-07-29 10:12:21,707 [INFO] Epoch [36/100], Step [000/063], LR 2.7e-04, Loss: 4725.1 218 | 2022-07-29 10:13:19,069 [INFO] Start validation... 219 | 2022-07-29 10:13:42,494 [INFO] mAP score regular 23.84, mAP score EMA 25.71 220 | 2022-07-29 10:13:53,448 [INFO] current_mAP = 25.71, highest_mAP = 25.71, best_epoch=36 221 | 222 | 2022-07-29 10:13:53,449 [INFO] Save text embeddings done 223 | 2022-07-29 10:13:58,979 [INFO] Epoch [37/100], Step [000/063], LR 2.7e-04, Loss: 4550.1 224 | 2022-07-29 10:14:56,588 [INFO] Start validation... 225 | 2022-07-29 10:15:20,064 [INFO] mAP score regular 24.24, mAP score EMA 25.67 226 | 2022-07-29 10:15:20,073 [INFO] current_mAP = 25.67, highest_mAP = 25.71, best_epoch=36 227 | 228 | 2022-07-29 10:15:20,073 [INFO] Save text embeddings done 229 | 2022-07-29 10:15:26,300 [INFO] Epoch [38/100], Step [000/063], LR 2.6e-04, Loss: 4764.7 230 | 2022-07-29 10:16:23,005 [INFO] Start validation... 231 | 2022-07-29 10:16:45,663 [INFO] mAP score regular 23.94, mAP score EMA 25.64 232 | 2022-07-29 10:16:45,672 [INFO] current_mAP = 25.64, highest_mAP = 25.71, best_epoch=36 233 | 234 | 2022-07-29 10:16:45,672 [INFO] Save text embeddings done 235 | 2022-07-29 10:16:50,267 [INFO] Epoch [39/100], Step [000/063], LR 2.6e-04, Loss: 4412.5 236 | 2022-07-29 10:17:47,600 [INFO] Start validation... 
237 | 2022-07-29 10:18:10,546 [INFO] mAP score regular 23.24, mAP score EMA 25.61 238 | 2022-07-29 10:18:10,556 [INFO] current_mAP = 25.61, highest_mAP = 25.71, best_epoch=36 239 | 240 | 2022-07-29 10:18:10,556 [INFO] Save text embeddings done 241 | 2022-07-29 10:18:16,734 [INFO] Epoch [40/100], Step [000/063], LR 2.6e-04, Loss: 4513.1 242 | 2022-07-29 10:19:15,864 [INFO] Start validation... 243 | 2022-07-29 10:19:38,685 [INFO] mAP score regular 23.93, mAP score EMA 25.60 244 | 2022-07-29 10:19:38,694 [INFO] current_mAP = 25.60, highest_mAP = 25.71, best_epoch=36 245 | 246 | 2022-07-29 10:19:38,694 [INFO] Save text embeddings done 247 | 2022-07-29 10:19:43,243 [INFO] Epoch [41/100], Step [000/063], LR 2.5e-04, Loss: 4301.8 248 | 2022-07-29 10:20:41,449 [INFO] Start validation... 249 | 2022-07-29 10:21:04,521 [INFO] mAP score regular 23.22, mAP score EMA 25.53 250 | 2022-07-29 10:21:04,530 [INFO] current_mAP = 25.53, highest_mAP = 25.71, best_epoch=36 251 | 252 | 2022-07-29 10:21:04,530 [INFO] Save text embeddings done 253 | 2022-07-29 10:21:09,729 [INFO] Epoch [42/100], Step [000/063], LR 2.5e-04, Loss: 4514.9 254 | 2022-07-29 10:22:07,055 [INFO] Start validation... 255 | 2022-07-29 10:22:30,427 [INFO] mAP score regular 23.71, mAP score EMA 25.52 256 | 2022-07-29 10:22:30,437 [INFO] current_mAP = 25.52, highest_mAP = 25.71, best_epoch=36 257 | 258 | 2022-07-29 10:22:30,437 [INFO] Save text embeddings done 259 | 2022-07-29 10:22:35,814 [INFO] Epoch [43/100], Step [000/063], LR 2.4e-04, Loss: 4668.1 260 | 2022-07-29 10:23:33,230 [INFO] Start validation... 261 | 2022-07-29 10:23:56,347 [INFO] mAP score regular 22.90, mAP score EMA 25.47 262 | 2022-07-29 10:23:56,355 [INFO] current_mAP = 25.47, highest_mAP = 25.71, best_epoch=36 263 | 264 | 2022-07-29 10:23:56,356 [INFO] Save text embeddings done 265 | 2022-07-29 10:24:01,297 [INFO] Epoch [44/100], Step [000/063], LR 2.4e-04, Loss: 4356.1 266 | 2022-07-29 10:24:58,189 [INFO] Start validation... 267 | 2022-07-29 10:25:21,006 [INFO] mAP score regular 23.78, mAP score EMA 25.42 268 | 2022-07-29 10:25:21,014 [INFO] current_mAP = 25.42, highest_mAP = 25.71, best_epoch=36 269 | 270 | 2022-07-29 10:25:21,014 [INFO] Save text embeddings done 271 | 2022-07-29 10:25:26,686 [INFO] Epoch [45/100], Step [000/063], LR 2.3e-04, Loss: 4519.1 272 | 2022-07-29 10:26:23,847 [INFO] Start validation... 273 | 2022-07-29 10:26:47,035 [INFO] mAP score regular 23.43, mAP score EMA 25.43 274 | 2022-07-29 10:26:47,045 [INFO] current_mAP = 25.43, highest_mAP = 25.71, best_epoch=36 275 | 276 | 2022-07-29 10:26:47,045 [INFO] Save text embeddings done 277 | 2022-07-29 10:26:54,267 [INFO] Epoch [46/100], Step [000/063], LR 2.3e-04, Loss: 4271.3 278 | 2022-07-29 10:27:52,214 [INFO] Start validation... 279 | 2022-07-29 10:28:15,352 [INFO] mAP score regular 23.96, mAP score EMA 25.40 280 | 2022-07-29 10:28:15,360 [INFO] current_mAP = 25.40, highest_mAP = 25.71, best_epoch=36 281 | 282 | 2022-07-29 10:28:15,360 [INFO] Save text embeddings done 283 | 2022-07-29 10:28:21,349 [INFO] Epoch [47/100], Step [000/063], LR 2.2e-04, Loss: 4291.6 284 | 2022-07-29 10:29:18,763 [INFO] Start validation... 285 | 2022-07-29 10:29:41,741 [INFO] mAP score regular 23.26, mAP score EMA 25.33 286 | 2022-07-29 10:29:41,749 [INFO] current_mAP = 25.33, highest_mAP = 25.71, best_epoch=36 287 | 288 | 2022-07-29 10:29:41,750 [INFO] Save text embeddings done 289 | 2022-07-29 10:29:46,669 [INFO] Epoch [48/100], Step [000/063], LR 2.2e-04, Loss: 4608.9 290 | 2022-07-29 10:30:43,650 [INFO] Start validation... 
291 | 2022-07-29 10:31:07,049 [INFO] mAP score regular 23.28, mAP score EMA 25.22 292 | 2022-07-29 10:31:07,057 [INFO] current_mAP = 25.22, highest_mAP = 25.71, best_epoch=36 293 | 294 | 2022-07-29 10:31:07,057 [INFO] Save text embeddings done 295 | 2022-07-29 10:31:13,064 [INFO] Epoch [49/100], Step [000/063], LR 2.1e-04, Loss: 4615.0 296 | 2022-07-29 10:32:09,690 [INFO] Start validation... 297 | 2022-07-29 10:32:32,531 [INFO] mAP score regular 23.78, mAP score EMA 25.14 298 | 2022-07-29 10:32:32,541 [INFO] current_mAP = 25.14, highest_mAP = 25.71, best_epoch=36 299 | 300 | 2022-07-29 10:32:32,541 [INFO] Save text embeddings done 301 | 2022-07-29 10:32:38,914 [INFO] Epoch [50/100], Step [000/063], LR 2.1e-04, Loss: 4325.3 302 | -------------------------------------------------------------------------------- /logs/scpnet+nuswide.txt: -------------------------------------------------------------------------------- 1 | 2022-07-22 16:58:23,583 [INFO] Epoch [0/60], Step [000/931], LR 1.2e-06, Loss: 168696.5 2 | 2022-07-22 17:00:07,435 [INFO] Epoch [0/60], Step [100/931], LR 1.2e-06, Loss: 1152.4 3 | 2022-07-22 17:01:48,504 [INFO] Epoch [0/60], Step [200/931], LR 1.2e-06, Loss: 842.5 4 | 2022-07-22 17:03:29,886 [INFO] Epoch [0/60], Step [300/931], LR 1.2e-06, Loss: 821.4 5 | 2022-07-22 17:05:10,584 [INFO] Epoch [0/60], Step [400/931], LR 1.3e-06, Loss: 607.3 6 | 2022-07-22 17:06:51,785 [INFO] Epoch [0/60], Step [500/931], LR 1.3e-06, Loss: 471.3 7 | 2022-07-22 17:08:32,190 [INFO] Epoch [0/60], Step [600/931], LR 1.3e-06, Loss: 488.9 8 | 2022-07-22 17:10:12,379 [INFO] Epoch [0/60], Step [700/931], LR 1.4e-06, Loss: 508.0 9 | 2022-07-22 17:11:53,464 [INFO] Epoch [0/60], Step [800/931], LR 1.4e-06, Loss: 409.1 10 | 2022-07-22 17:13:34,056 [INFO] Epoch [0/60], Step [900/931], LR 1.5e-06, Loss: 425.4 11 | 2022-07-22 17:14:04,058 [INFO] Start validation... 12 | 2022-07-22 17:15:55,879 [INFO] mAP score regular 11.84, mAP score EMA 3.66 13 | 2022-07-22 17:15:57,927 [INFO] current_mAP = 11.84, highest_mAP = 11.84, best_epoch=0 14 | 15 | 2022-07-22 17:15:57,928 [INFO] Save text embeddings done 16 | 2022-07-22 17:16:02,440 [INFO] Epoch [1/60], Step [000/931], LR 1.5e-06, Loss: 9494.0 17 | 2022-07-22 17:17:45,651 [INFO] Epoch [1/60], Step [100/931], LR 1.5e-06, Loss: 1090.6 18 | 2022-07-22 17:19:26,972 [INFO] Epoch [1/60], Step [200/931], LR 1.6e-06, Loss: 704.7 19 | 2022-07-22 17:21:08,998 [INFO] Epoch [1/60], Step [300/931], LR 1.7e-06, Loss: 694.3 20 | 2022-07-22 17:22:51,621 [INFO] Epoch [1/60], Step [400/931], LR 1.8e-06, Loss: 579.7 21 | 2022-07-22 17:24:33,624 [INFO] Epoch [1/60], Step [500/931], LR 1.9e-06, Loss: 521.8 22 | 2022-07-22 17:26:15,368 [INFO] Epoch [1/60], Step [600/931], LR 1.9e-06, Loss: 515.0 23 | 2022-07-22 17:27:56,879 [INFO] Epoch [1/60], Step [700/931], LR 2.0e-06, Loss: 465.6 24 | 2022-07-22 17:29:38,717 [INFO] Epoch [1/60], Step [800/931], LR 2.2e-06, Loss: 471.2 25 | 2022-07-22 17:31:20,236 [INFO] Epoch [1/60], Step [900/931], LR 2.3e-06, Loss: 503.3 26 | 2022-07-22 17:31:50,347 [INFO] Start validation... 
27 | 2022-07-22 17:33:40,165 [INFO] mAP score regular 39.21, mAP score EMA 5.20 28 | 2022-07-22 17:33:46,354 [INFO] current_mAP = 39.21, highest_mAP = 39.21, best_epoch=1 29 | 30 | 2022-07-22 17:33:46,354 [INFO] Save text embeddings done 31 | 2022-07-22 17:33:53,102 [INFO] Epoch [2/60], Step [000/931], LR 2.3e-06, Loss: 539.3 32 | 2022-07-22 17:35:35,290 [INFO] Epoch [2/60], Step [100/931], LR 2.4e-06, Loss: 485.4 33 | 2022-07-22 17:37:17,157 [INFO] Epoch [2/60], Step [200/931], LR 2.5e-06, Loss: 444.6 34 | 2022-07-22 17:38:58,742 [INFO] Epoch [2/60], Step [300/931], LR 2.7e-06, Loss: 437.2 35 | 2022-07-22 17:40:40,263 [INFO] Epoch [2/60], Step [400/931], LR 2.8e-06, Loss: 409.9 36 | 2022-07-22 17:42:22,608 [INFO] Epoch [2/60], Step [500/931], LR 3.0e-06, Loss: 465.6 37 | 2022-07-22 17:44:04,586 [INFO] Epoch [2/60], Step [600/931], LR 3.1e-06, Loss: 465.4 38 | 2022-07-22 17:45:46,511 [INFO] Epoch [2/60], Step [700/931], LR 3.3e-06, Loss: 443.5 39 | 2022-07-22 17:47:28,576 [INFO] Epoch [2/60], Step [800/931], LR 3.4e-06, Loss: 409.1 40 | 2022-07-22 17:49:10,655 [INFO] Epoch [2/60], Step [900/931], LR 3.6e-06, Loss: 407.5 41 | 2022-07-22 17:49:40,992 [INFO] Start validation... 42 | 2022-07-22 17:51:31,065 [INFO] mAP score regular 45.36, mAP score EMA 8.71 43 | 2022-07-22 17:51:37,773 [INFO] current_mAP = 45.36, highest_mAP = 45.36, best_epoch=2 44 | 45 | 2022-07-22 17:51:37,774 [INFO] Save text embeddings done 46 | 2022-07-22 17:51:44,476 [INFO] Epoch [3/60], Step [000/931], LR 3.6e-06, Loss: 405.8 47 | 2022-07-22 17:53:26,599 [INFO] Epoch [3/60], Step [100/931], LR 3.8e-06, Loss: 382.8 48 | 2022-07-22 17:55:08,574 [INFO] Epoch [3/60], Step [200/931], LR 4.0e-06, Loss: 437.0 49 | 2022-07-22 17:56:50,093 [INFO] Epoch [3/60], Step [300/931], LR 4.2e-06, Loss: 421.8 50 | 2022-07-22 17:58:32,676 [INFO] Epoch [3/60], Step [400/931], LR 4.3e-06, Loss: 407.3 51 | 2022-07-22 18:00:14,487 [INFO] Epoch [3/60], Step [500/931], LR 4.5e-06, Loss: 446.2 52 | 2022-07-22 18:01:56,628 [INFO] Epoch [3/60], Step [600/931], LR 4.7e-06, Loss: 392.6 53 | 2022-07-22 18:03:38,722 [INFO] Epoch [3/60], Step [700/931], LR 4.9e-06, Loss: 411.1 54 | 2022-07-22 18:05:20,252 [INFO] Epoch [3/60], Step [800/931], LR 5.1e-06, Loss: 426.3 55 | 2022-07-22 18:07:02,374 [INFO] Epoch [3/60], Step [900/931], LR 5.4e-06, Loss: 377.5 56 | 2022-07-22 18:07:32,762 [INFO] Start validation... 
57 | 2022-07-22 18:09:23,769 [INFO] mAP score regular 49.79, mAP score EMA 18.69 58 | 2022-07-22 18:09:30,335 [INFO] current_mAP = 49.79, highest_mAP = 49.79, best_epoch=3 59 | 60 | 2022-07-22 18:09:30,335 [INFO] Save text embeddings done 61 | 2022-07-22 18:09:35,111 [INFO] Epoch [4/60], Step [000/931], LR 5.4e-06, Loss: 348.0 62 | 2022-07-22 18:11:17,859 [INFO] Epoch [4/60], Step [100/931], LR 5.6e-06, Loss: 419.7 63 | 2022-07-22 18:12:58,908 [INFO] Epoch [4/60], Step [200/931], LR 5.9e-06, Loss: 434.7 64 | 2022-07-22 18:14:40,430 [INFO] Epoch [4/60], Step [300/931], LR 6.1e-06, Loss: 422.4 65 | 2022-07-22 18:16:21,626 [INFO] Epoch [4/60], Step [400/931], LR 6.3e-06, Loss: 382.5 66 | 2022-07-22 18:18:03,176 [INFO] Epoch [4/60], Step [500/931], LR 6.5e-06, Loss: 398.0 67 | 2022-07-22 18:19:44,669 [INFO] Epoch [4/60], Step [600/931], LR 6.8e-06, Loss: 366.3 68 | 2022-07-22 18:21:26,210 [INFO] Epoch [4/60], Step [700/931], LR 7.0e-06, Loss: 438.7 69 | 2022-07-22 18:23:07,129 [INFO] Epoch [4/60], Step [800/931], LR 7.3e-06, Loss: 430.0 70 | 2022-07-22 18:24:48,503 [INFO] Epoch [4/60], Step [900/931], LR 7.5e-06, Loss: 376.9 71 | 2022-07-22 18:25:18,670 [INFO] Start validation... 72 | 2022-07-22 18:27:08,789 [INFO] mAP score regular 53.76, mAP score EMA 35.17 73 | 2022-07-22 18:27:15,260 [INFO] current_mAP = 53.76, highest_mAP = 53.76, best_epoch=4 74 | 75 | 2022-07-22 18:27:15,260 [INFO] Save text embeddings done 76 | 2022-07-22 18:27:20,534 [INFO] Epoch [5/60], Step [000/931], LR 7.6e-06, Loss: 370.5 77 | 2022-07-22 18:29:02,237 [INFO] Epoch [5/60], Step [100/931], LR 7.9e-06, Loss: 448.7 78 | 2022-07-22 18:30:44,068 [INFO] Epoch [5/60], Step [200/931], LR 8.1e-06, Loss: 437.9 79 | 2022-07-22 18:32:25,512 [INFO] Epoch [5/60], Step [300/931], LR 8.4e-06, Loss: 450.5 80 | 2022-07-22 18:34:07,672 [INFO] Epoch [5/60], Step [400/931], LR 8.6e-06, Loss: 402.4 81 | 2022-07-22 18:35:49,313 [INFO] Epoch [5/60], Step [500/931], LR 8.9e-06, Loss: 383.7 82 | 2022-07-22 18:37:30,873 [INFO] Epoch [5/60], Step [600/931], LR 9.2e-06, Loss: 388.4 83 | 2022-07-22 18:39:12,336 [INFO] Epoch [5/60], Step [700/931], LR 9.5e-06, Loss: 410.7 84 | 2022-07-22 18:40:54,095 [INFO] Epoch [5/60], Step [800/931], LR 9.7e-06, Loss: 405.9 85 | 2022-07-22 18:42:35,310 [INFO] Epoch [5/60], Step [900/931], LR 1.0e-05, Loss: 427.1 86 | 2022-07-22 18:43:06,120 [INFO] Start validation... 
87 | 2022-07-22 18:44:55,785 [INFO] mAP score regular 56.13, mAP score EMA 45.50 88 | 2022-07-22 18:45:02,754 [INFO] current_mAP = 56.13, highest_mAP = 56.13, best_epoch=5 89 | 90 | 2022-07-22 18:45:02,755 [INFO] Save text embeddings done 91 | 2022-07-22 18:45:08,786 [INFO] Epoch [6/60], Step [000/931], LR 1.0e-05, Loss: 377.6 92 | 2022-07-22 18:46:50,663 [INFO] Epoch [6/60], Step [100/931], LR 1.0e-05, Loss: 424.4 93 | 2022-07-22 18:48:32,182 [INFO] Epoch [6/60], Step [200/931], LR 1.1e-05, Loss: 411.5 94 | 2022-07-22 18:50:13,058 [INFO] Epoch [6/60], Step [300/931], LR 1.1e-05, Loss: 374.4 95 | 2022-07-22 18:51:54,736 [INFO] Epoch [6/60], Step [400/931], LR 1.1e-05, Loss: 386.2 96 | 2022-07-22 18:53:36,489 [INFO] Epoch [6/60], Step [500/931], LR 1.2e-05, Loss: 434.6 97 | 2022-07-22 18:55:17,819 [INFO] Epoch [6/60], Step [600/931], LR 1.2e-05, Loss: 388.9 98 | 2022-07-22 18:56:59,350 [INFO] Epoch [6/60], Step [700/931], LR 1.2e-05, Loss: 388.1 99 | 2022-07-22 18:58:41,195 [INFO] Epoch [6/60], Step [800/931], LR 1.2e-05, Loss: 447.5 100 | 2022-07-22 19:00:22,662 [INFO] Epoch [6/60], Step [900/931], LR 1.3e-05, Loss: 495.3 101 | 2022-07-22 19:00:52,610 [INFO] Start validation... 102 | 2022-07-22 19:02:44,761 [INFO] mAP score regular 57.37, mAP score EMA 49.74 103 | 2022-07-22 19:02:51,485 [INFO] current_mAP = 57.37, highest_mAP = 57.37, best_epoch=6 104 | 105 | 2022-07-22 19:02:51,486 [INFO] Save text embeddings done 106 | 2022-07-22 19:02:56,513 [INFO] Epoch [7/60], Step [000/931], LR 1.3e-05, Loss: 435.5 107 | 2022-07-22 19:04:38,287 [INFO] Epoch [7/60], Step [100/931], LR 1.3e-05, Loss: 415.3 108 | 2022-07-22 19:06:20,097 [INFO] Epoch [7/60], Step [200/931], LR 1.3e-05, Loss: 459.2 109 | 2022-07-22 19:08:01,869 [INFO] Epoch [7/60], Step [300/931], LR 1.4e-05, Loss: 451.2 110 | 2022-07-22 19:09:43,821 [INFO] Epoch [7/60], Step [400/931], LR 1.4e-05, Loss: 425.5 111 | 2022-07-22 19:11:25,536 [INFO] Epoch [7/60], Step [500/931], LR 1.4e-05, Loss: 476.3 112 | 2022-07-22 19:13:06,800 [INFO] Epoch [7/60], Step [600/931], LR 1.5e-05, Loss: 438.6 113 | 2022-07-22 19:14:47,432 [INFO] Epoch [7/60], Step [700/931], LR 1.5e-05, Loss: 461.8 114 | 2022-07-22 19:16:28,145 [INFO] Epoch [7/60], Step [800/931], LR 1.5e-05, Loss: 471.2 115 | 2022-07-22 19:18:09,652 [INFO] Epoch [7/60], Step [900/931], LR 1.6e-05, Loss: 412.4 116 | 2022-07-22 19:18:39,353 [INFO] Start validation... 
117 | 2022-07-22 19:20:25,645 [INFO] mAP score regular 58.69, mAP score EMA 52.78 118 | 2022-07-22 19:20:32,182 [INFO] current_mAP = 58.69, highest_mAP = 58.69, best_epoch=7 119 | 120 | 2022-07-22 19:20:32,183 [INFO] Save text embeddings done 121 | 2022-07-22 19:20:36,631 [INFO] Epoch [8/60], Step [000/931], LR 1.6e-05, Loss: 448.2 122 | 2022-07-22 19:22:18,627 [INFO] Epoch [8/60], Step [100/931], LR 1.6e-05, Loss: 484.9 123 | 2022-07-22 19:24:00,345 [INFO] Epoch [8/60], Step [200/931], LR 1.6e-05, Loss: 488.8 124 | 2022-07-22 19:25:42,333 [INFO] Epoch [8/60], Step [300/931], LR 1.7e-05, Loss: 469.3 125 | 2022-07-22 19:27:23,299 [INFO] Epoch [8/60], Step [400/931], LR 1.7e-05, Loss: 462.5 126 | 2022-07-22 19:29:04,810 [INFO] Epoch [8/60], Step [500/931], LR 1.7e-05, Loss: 444.4 127 | 2022-07-22 19:30:47,145 [INFO] Epoch [8/60], Step [600/931], LR 1.7e-05, Loss: 441.2 128 | 2022-07-22 19:32:27,715 [INFO] Epoch [8/60], Step [700/931], LR 1.8e-05, Loss: 424.7 129 | 2022-07-22 19:34:09,897 [INFO] Epoch [8/60], Step [800/931], LR 1.8e-05, Loss: 450.1 130 | 2022-07-22 19:35:50,608 [INFO] Epoch [8/60], Step [900/931], LR 1.8e-05, Loss: 445.6 131 | 2022-07-22 19:36:20,269 [INFO] Start validation... 132 | 2022-07-22 19:38:06,495 [INFO] mAP score regular 58.81, mAP score EMA 55.29 133 | 2022-07-22 19:38:12,678 [INFO] current_mAP = 58.81, highest_mAP = 58.81, best_epoch=8 134 | 135 | 2022-07-22 19:38:12,679 [INFO] Save text embeddings done 136 | 2022-07-22 19:38:17,254 [INFO] Epoch [9/60], Step [000/931], LR 1.8e-05, Loss: 488.8 137 | 2022-07-22 19:39:58,702 [INFO] Epoch [9/60], Step [100/931], LR 1.9e-05, Loss: 394.8 138 | 2022-07-22 19:41:39,838 [INFO] Epoch [9/60], Step [200/931], LR 1.9e-05, Loss: 447.5 139 | 2022-07-22 19:43:20,808 [INFO] Epoch [9/60], Step [300/931], LR 1.9e-05, Loss: 449.2 140 | 2022-07-22 19:45:01,452 [INFO] Epoch [9/60], Step [400/931], LR 2.0e-05, Loss: 461.2 141 | 2022-07-22 19:46:42,221 [INFO] Epoch [9/60], Step [500/931], LR 2.0e-05, Loss: 442.0 142 | 2022-07-22 19:48:23,145 [INFO] Epoch [9/60], Step [600/931], LR 2.0e-05, Loss: 448.4 143 | 2022-07-22 19:50:03,666 [INFO] Epoch [9/60], Step [700/931], LR 2.0e-05, Loss: 449.9 144 | 2022-07-22 19:51:44,383 [INFO] Epoch [9/60], Step [800/931], LR 2.1e-05, Loss: 460.1 145 | 2022-07-22 19:53:25,921 [INFO] Epoch [9/60], Step [900/931], LR 2.1e-05, Loss: 500.3 146 | 2022-07-22 19:53:55,621 [INFO] Start validation... 
147 | 2022-07-22 19:55:43,415 [INFO] mAP score regular 58.21, mAP score EMA 57.47 148 | 2022-07-22 19:55:43,431 [INFO] current_mAP = 58.21, highest_mAP = 58.81, best_epoch=8 149 | 150 | 2022-07-22 19:55:43,432 [INFO] Save text embeddings done 151 | 2022-07-22 19:55:47,553 [INFO] Epoch [10/60], Step [000/931], LR 2.1e-05, Loss: 527.4 152 | 2022-07-22 19:57:28,562 [INFO] Epoch [10/60], Step [100/931], LR 2.1e-05, Loss: 446.2 153 | 2022-07-22 19:59:08,950 [INFO] Epoch [10/60], Step [200/931], LR 2.2e-05, Loss: 448.2 154 | 2022-07-22 20:00:49,431 [INFO] Epoch [10/60], Step [300/931], LR 2.2e-05, Loss: 430.5 155 | 2022-07-22 20:02:29,805 [INFO] Epoch [10/60], Step [400/931], LR 2.2e-05, Loss: 492.0 156 | 2022-07-22 20:04:10,502 [INFO] Epoch [10/60], Step [500/931], LR 2.2e-05, Loss: 512.3 157 | 2022-07-22 20:05:51,735 [INFO] Epoch [10/60], Step [600/931], LR 2.3e-05, Loss: 476.7 158 | 2022-07-22 20:07:32,334 [INFO] Epoch [10/60], Step [700/931], LR 2.3e-05, Loss: 472.3 159 | 2022-07-22 20:09:13,029 [INFO] Epoch [10/60], Step [800/931], LR 2.3e-05, Loss: 413.4 160 | 2022-07-22 20:10:53,699 [INFO] Epoch [10/60], Step [900/931], LR 2.4e-05, Loss: 488.3 161 | 2022-07-22 20:11:23,478 [INFO] Start validation... 162 | 2022-07-22 20:13:09,127 [INFO] mAP score regular 58.46, mAP score EMA 59.12 163 | 2022-07-22 20:13:15,739 [INFO] current_mAP = 59.12, highest_mAP = 59.12, best_epoch=10 164 | 165 | 2022-07-22 20:13:15,739 [INFO] Save text embeddings done 166 | 2022-07-22 20:13:21,393 [INFO] Epoch [11/60], Step [000/931], LR 2.4e-05, Loss: 469.4 167 | 2022-07-22 20:15:02,065 [INFO] Epoch [11/60], Step [100/931], LR 2.4e-05, Loss: 471.5 168 | 2022-07-22 20:16:42,953 [INFO] Epoch [11/60], Step [200/931], LR 2.4e-05, Loss: 508.9 169 | 2022-07-22 20:18:23,656 [INFO] Epoch [11/60], Step [300/931], LR 2.4e-05, Loss: 453.1 170 | 2022-07-22 20:20:05,861 [INFO] Epoch [11/60], Step [400/931], LR 2.5e-05, Loss: 460.3 171 | 2022-07-22 20:21:47,189 [INFO] Epoch [11/60], Step [500/931], LR 2.5e-05, Loss: 418.6 172 | 2022-07-22 20:23:28,281 [INFO] Epoch [11/60], Step [600/931], LR 2.5e-05, Loss: 470.3 173 | 2022-07-22 20:25:09,164 [INFO] Epoch [11/60], Step [700/931], LR 2.5e-05, Loss: 434.9 174 | 2022-07-22 20:26:50,217 [INFO] Epoch [11/60], Step [800/931], LR 2.6e-05, Loss: 462.9 175 | 2022-07-22 20:28:32,165 [INFO] Epoch [11/60], Step [900/931], LR 2.6e-05, Loss: 506.0 176 | 2022-07-22 20:29:01,882 [INFO] Start validation... 
177 | 2022-07-22 20:30:50,127 [INFO] mAP score regular 58.00, mAP score EMA 60.30 178 | 2022-07-22 20:30:56,423 [INFO] current_mAP = 60.30, highest_mAP = 60.30, best_epoch=11 179 | 180 | 2022-07-22 20:30:56,423 [INFO] Save text embeddings done 181 | 2022-07-22 20:31:01,779 [INFO] Epoch [12/60], Step [000/931], LR 2.6e-05, Loss: 454.3 182 | 2022-07-22 20:32:43,921 [INFO] Epoch [12/60], Step [100/931], LR 2.6e-05, Loss: 473.9 183 | 2022-07-22 20:34:26,710 [INFO] Epoch [12/60], Step [200/931], LR 2.6e-05, Loss: 484.2 184 | 2022-07-22 20:36:08,301 [INFO] Epoch [12/60], Step [300/931], LR 2.6e-05, Loss: 479.9 185 | 2022-07-22 20:37:50,309 [INFO] Epoch [12/60], Step [400/931], LR 2.7e-05, Loss: 483.3 186 | 2022-07-22 20:39:31,775 [INFO] Epoch [12/60], Step [500/931], LR 2.7e-05, Loss: 479.9 187 | 2022-07-22 20:41:14,192 [INFO] Epoch [12/60], Step [600/931], LR 2.7e-05, Loss: 473.1 188 | 2022-07-22 20:42:55,274 [INFO] Epoch [12/60], Step [700/931], LR 2.7e-05, Loss: 445.0 189 | 2022-07-22 20:44:37,150 [INFO] Epoch [12/60], Step [800/931], LR 2.7e-05, Loss: 496.0 190 | 2022-07-22 20:46:18,049 [INFO] Epoch [12/60], Step [900/931], LR 2.8e-05, Loss: 501.6 191 | 2022-07-22 20:46:47,986 [INFO] Start validation... 192 | 2022-07-22 20:48:35,301 [INFO] mAP score regular 57.37, mAP score EMA 61.14 193 | 2022-07-22 20:48:41,965 [INFO] current_mAP = 61.14, highest_mAP = 61.14, best_epoch=12 194 | 195 | 2022-07-22 20:48:41,965 [INFO] Save text embeddings done 196 | 2022-07-22 20:48:48,658 [INFO] Epoch [13/60], Step [000/931], LR 2.8e-05, Loss: 490.6 197 | 2022-07-22 20:50:29,657 [INFO] Epoch [13/60], Step [100/931], LR 2.8e-05, Loss: 451.0 198 | 2022-07-22 20:52:10,663 [INFO] Epoch [13/60], Step [200/931], LR 2.8e-05, Loss: 481.5 199 | 2022-07-22 20:53:51,202 [INFO] Epoch [13/60], Step [300/931], LR 2.8e-05, Loss: 451.2 200 | 2022-07-22 20:55:32,349 [INFO] Epoch [13/60], Step [400/931], LR 2.8e-05, Loss: 461.9 201 | 2022-07-22 20:57:13,332 [INFO] Epoch [13/60], Step [500/931], LR 2.8e-05, Loss: 504.6 202 | 2022-07-22 20:58:54,373 [INFO] Epoch [13/60], Step [600/931], LR 2.8e-05, Loss: 475.2 203 | 2022-07-22 21:00:35,350 [INFO] Epoch [13/60], Step [700/931], LR 2.9e-05, Loss: 495.0 204 | 2022-07-22 21:02:16,973 [INFO] Epoch [13/60], Step [800/931], LR 2.9e-05, Loss: 437.8 205 | 2022-07-22 21:03:58,042 [INFO] Epoch [13/60], Step [900/931], LR 2.9e-05, Loss: 468.1 206 | 2022-07-22 21:04:27,835 [INFO] Start validation... 
207 | 2022-07-22 21:06:15,938 [INFO] mAP score regular 56.19, mAP score EMA 61.67 208 | 2022-07-22 21:06:22,436 [INFO] current_mAP = 61.67, highest_mAP = 61.67, best_epoch=13 209 | 210 | 2022-07-22 21:06:22,437 [INFO] Save text embeddings done 211 | 2022-07-22 21:06:27,647 [INFO] Epoch [14/60], Step [000/931], LR 2.9e-05, Loss: 469.8 212 | 2022-07-22 21:08:10,284 [INFO] Epoch [14/60], Step [100/931], LR 2.9e-05, Loss: 453.6 213 | 2022-07-22 21:09:52,117 [INFO] Epoch [14/60], Step [200/931], LR 2.9e-05, Loss: 501.0 214 | 2022-07-22 21:11:33,067 [INFO] Epoch [14/60], Step [300/931], LR 2.9e-05, Loss: 437.3 215 | 2022-07-22 21:13:13,941 [INFO] Epoch [14/60], Step [400/931], LR 2.9e-05, Loss: 502.1 216 | 2022-07-22 21:14:54,995 [INFO] Epoch [14/60], Step [500/931], LR 2.9e-05, Loss: 486.2 217 | 2022-07-22 21:16:36,150 [INFO] Epoch [14/60], Step [600/931], LR 2.9e-05, Loss: 460.9 218 | 2022-07-22 21:18:17,339 [INFO] Epoch [14/60], Step [700/931], LR 3.0e-05, Loss: 466.3 219 | 2022-07-22 21:19:58,371 [INFO] Epoch [14/60], Step [800/931], LR 3.0e-05, Loss: 452.6 220 | 2022-07-22 21:21:40,733 [INFO] Epoch [14/60], Step [900/931], LR 3.0e-05, Loss: 466.5 221 | 2022-07-22 21:22:10,500 [INFO] Start validation... 222 | 2022-07-22 21:24:01,041 [INFO] mAP score regular 56.97, mAP score EMA 61.98 223 | 2022-07-22 21:24:08,292 [INFO] current_mAP = 61.98, highest_mAP = 61.98, best_epoch=14 224 | 225 | 2022-07-22 21:24:08,293 [INFO] Save text embeddings done 226 | 2022-07-22 21:24:13,050 [INFO] Epoch [15/60], Step [000/931], LR 3.0e-05, Loss: 428.4 227 | 2022-07-22 21:25:55,328 [INFO] Epoch [15/60], Step [100/931], LR 3.0e-05, Loss: 453.4 228 | 2022-07-22 21:27:37,141 [INFO] Epoch [15/60], Step [200/931], LR 3.0e-05, Loss: 467.9 229 | 2022-07-22 21:29:18,685 [INFO] Epoch [15/60], Step [300/931], LR 3.0e-05, Loss: 440.8 230 | 2022-07-22 21:30:59,519 [INFO] Epoch [15/60], Step [400/931], LR 3.0e-05, Loss: 448.2 231 | 2022-07-22 21:32:40,941 [INFO] Epoch [15/60], Step [500/931], LR 3.0e-05, Loss: 467.9 232 | 2022-07-22 21:34:22,354 [INFO] Epoch [15/60], Step [600/931], LR 3.0e-05, Loss: 463.5 233 | 2022-07-22 21:36:04,211 [INFO] Epoch [15/60], Step [700/931], LR 3.0e-05, Loss: 454.4 234 | 2022-07-22 21:37:45,543 [INFO] Epoch [15/60], Step [800/931], LR 3.0e-05, Loss: 486.7 235 | 2022-07-22 21:39:26,770 [INFO] Epoch [15/60], Step [900/931], LR 3.0e-05, Loss: 455.3 236 | 2022-07-22 21:39:56,605 [INFO] Start validation... 
237 | 2022-07-22 21:41:47,027 [INFO] mAP score regular 54.83, mAP score EMA 62.04 238 | 2022-07-22 21:41:53,996 [INFO] current_mAP = 62.04, highest_mAP = 62.04, best_epoch=15 239 | 240 | 2022-07-22 21:41:53,997 [INFO] Save text embeddings done 241 | 2022-07-22 21:42:00,142 [INFO] Epoch [16/60], Step [000/931], LR 3.0e-05, Loss: 441.6 242 | 2022-07-22 21:43:42,084 [INFO] Epoch [16/60], Step [100/931], LR 3.0e-05, Loss: 411.5 243 | 2022-07-22 21:45:23,530 [INFO] Epoch [16/60], Step [200/931], LR 3.0e-05, Loss: 417.8 244 | 2022-07-22 21:47:04,952 [INFO] Epoch [16/60], Step [300/931], LR 3.0e-05, Loss: 470.4 245 | 2022-07-22 21:48:46,557 [INFO] Epoch [16/60], Step [400/931], LR 3.0e-05, Loss: 448.7 246 | 2022-07-22 21:50:29,074 [INFO] Epoch [16/60], Step [500/931], LR 3.0e-05, Loss: 412.4 247 | 2022-07-22 21:52:10,681 [INFO] Epoch [16/60], Step [600/931], LR 3.0e-05, Loss: 453.9 248 | 2022-07-22 21:53:52,312 [INFO] Epoch [16/60], Step [700/931], LR 3.0e-05, Loss: 421.8 249 | 2022-07-22 21:55:34,295 [INFO] Epoch [16/60], Step [800/931], LR 3.0e-05, Loss: 460.4 250 | 2022-07-22 21:57:16,768 [INFO] Epoch [16/60], Step [900/931], LR 3.0e-05, Loss: 436.2 251 | 2022-07-22 21:57:46,734 [INFO] Start validation... 252 | 2022-07-22 21:59:38,726 [INFO] mAP score regular 56.43, mAP score EMA 61.85 253 | 2022-07-22 21:59:38,743 [INFO] current_mAP = 61.85, highest_mAP = 62.04, best_epoch=15 254 | 255 | 2022-07-22 21:59:38,743 [INFO] Save text embeddings done 256 | 2022-07-22 21:59:44,728 [INFO] Epoch [17/60], Step [000/931], LR 3.0e-05, Loss: 465.2 257 | 2022-07-22 22:01:27,612 [INFO] Epoch [17/60], Step [100/931], LR 3.0e-05, Loss: 466.5 258 | 2022-07-22 22:03:09,889 [INFO] Epoch [17/60], Step [200/931], LR 3.0e-05, Loss: 459.8 259 | 2022-07-22 22:04:51,506 [INFO] Epoch [17/60], Step [300/931], LR 3.0e-05, Loss: 483.3 260 | 2022-07-22 22:06:32,686 [INFO] Epoch [17/60], Step [400/931], LR 3.0e-05, Loss: 421.2 261 | 2022-07-22 22:08:14,521 [INFO] Epoch [17/60], Step [500/931], LR 3.0e-05, Loss: 411.6 262 | 2022-07-22 22:09:56,089 [INFO] Epoch [17/60], Step [600/931], LR 3.0e-05, Loss: 432.9 263 | 2022-07-22 22:11:37,423 [INFO] Epoch [17/60], Step [700/931], LR 3.0e-05, Loss: 398.5 264 | 2022-07-22 22:13:19,035 [INFO] Epoch [17/60], Step [800/931], LR 3.0e-05, Loss: 422.8 265 | 2022-07-22 22:15:00,741 [INFO] Epoch [17/60], Step [900/931], LR 3.0e-05, Loss: 488.0 266 | 2022-07-22 22:15:30,760 [INFO] Start validation... 
267 | 2022-07-22 22:17:16,056 [INFO] mAP score regular 55.89, mAP score EMA 61.57 268 | 2022-07-22 22:17:16,071 [INFO] current_mAP = 61.57, highest_mAP = 62.04, best_epoch=15 269 | 270 | 2022-07-22 22:17:16,071 [INFO] Save text embeddings done 271 | 2022-07-22 22:17:24,893 [INFO] Epoch [18/60], Step [000/931], LR 3.0e-05, Loss: 477.4 272 | 2022-07-22 22:19:06,060 [INFO] Epoch [18/60], Step [100/931], LR 3.0e-05, Loss: 451.3 273 | 2022-07-22 22:20:47,363 [INFO] Epoch [18/60], Step [200/931], LR 3.0e-05, Loss: 456.4 274 | 2022-07-22 22:22:29,607 [INFO] Epoch [18/60], Step [300/931], LR 3.0e-05, Loss: 459.9 275 | 2022-07-22 22:24:11,986 [INFO] Epoch [18/60], Step [400/931], LR 3.0e-05, Loss: 440.4 276 | 2022-07-22 22:25:53,735 [INFO] Epoch [18/60], Step [500/931], LR 3.0e-05, Loss: 400.4 277 | 2022-07-22 22:27:35,577 [INFO] Epoch [18/60], Step [600/931], LR 3.0e-05, Loss: 464.0 278 | 2022-07-22 22:29:16,623 [INFO] Epoch [18/60], Step [700/931], LR 3.0e-05, Loss: 424.3 279 | 2022-07-22 22:30:58,613 [INFO] Epoch [18/60], Step [800/931], LR 3.0e-05, Loss: 471.1 280 | 2022-07-22 22:32:40,558 [INFO] Epoch [18/60], Step [900/931], LR 3.0e-05, Loss: 458.9 281 | 2022-07-22 22:33:10,468 [INFO] Start validation... 282 | 2022-07-22 22:34:59,445 [INFO] mAP score regular 56.24, mAP score EMA 61.21 283 | 2022-07-22 22:34:59,462 [INFO] current_mAP = 61.21, highest_mAP = 62.04, best_epoch=15 284 | 285 | 2022-07-22 22:34:59,462 [INFO] Save text embeddings done 286 | 2022-07-22 22:35:03,586 [INFO] Epoch [19/60], Step [000/931], LR 3.0e-05, Loss: 412.9 287 | 2022-07-22 22:36:46,505 [INFO] Epoch [19/60], Step [100/931], LR 3.0e-05, Loss: 382.3 288 | 2022-07-22 22:38:29,144 [INFO] Epoch [19/60], Step [200/931], LR 3.0e-05, Loss: 370.6 289 | 2022-07-22 22:40:12,081 [INFO] Epoch [19/60], Step [300/931], LR 3.0e-05, Loss: 438.2 290 | 2022-07-22 22:41:53,928 [INFO] Epoch [19/60], Step [400/931], LR 3.0e-05, Loss: 464.3 291 | 2022-07-22 22:43:35,950 [INFO] Epoch [19/60], Step [500/931], LR 3.0e-05, Loss: 473.7 292 | 2022-07-22 22:45:17,584 [INFO] Epoch [19/60], Step [600/931], LR 3.0e-05, Loss: 423.3 293 | 2022-07-22 22:46:59,126 [INFO] Epoch [19/60], Step [700/931], LR 3.0e-05, Loss: 428.5 294 | 2022-07-22 22:48:40,476 [INFO] Epoch [19/60], Step [800/931], LR 3.0e-05, Loss: 400.4 295 | 2022-07-22 22:50:22,851 [INFO] Epoch [19/60], Step [900/931], LR 3.0e-05, Loss: 446.7 296 | 2022-07-22 22:50:52,918 [INFO] Start validation... 
297 | 2022-07-22 22:52:42,333 [INFO] mAP score regular 55.96, mAP score EMA 60.90 298 | 2022-07-22 22:52:42,367 [INFO] current_mAP = 60.90, highest_mAP = 62.04, best_epoch=15 299 | 300 | 2022-07-22 22:52:42,367 [INFO] Save text embeddings done 301 | 2022-07-22 22:52:47,543 [INFO] Epoch [20/60], Step [000/931], LR 3.0e-05, Loss: 463.8 302 | 2022-07-22 22:54:30,558 [INFO] Epoch [20/60], Step [100/931], LR 3.0e-05, Loss: 417.2 303 | 2022-07-22 22:56:12,176 [INFO] Epoch [20/60], Step [200/931], LR 3.0e-05, Loss: 407.8 304 | 2022-07-22 22:57:54,237 [INFO] Epoch [20/60], Step [300/931], LR 3.0e-05, Loss: 431.0 305 | 2022-07-22 22:59:36,853 [INFO] Epoch [20/60], Step [400/931], LR 3.0e-05, Loss: 402.1 306 | 2022-07-22 23:01:18,576 [INFO] Epoch [20/60], Step [500/931], LR 3.0e-05, Loss: 411.5 307 | 2022-07-22 23:03:00,285 [INFO] Epoch [20/60], Step [600/931], LR 3.0e-05, Loss: 403.1 308 | 2022-07-22 23:04:41,332 [INFO] Epoch [20/60], Step [700/931], LR 3.0e-05, Loss: 440.3 309 | 2022-07-22 23:06:23,263 [INFO] Epoch [20/60], Step [800/931], LR 3.0e-05, Loss: 453.7 310 | 2022-07-22 23:08:05,801 [INFO] Epoch [20/60], Step [900/931], LR 3.0e-05, Loss: 476.9 311 | 2022-07-22 23:08:35,797 [INFO] Start validation... 312 | 2022-07-22 23:10:21,861 [INFO] mAP score regular 54.57, mAP score EMA 60.69 313 | 2022-07-22 23:10:21,877 [INFO] current_mAP = 60.69, highest_mAP = 62.04, best_epoch=15 314 | 315 | 2022-07-22 23:10:21,877 [INFO] Save text embeddings done 316 | 2022-07-22 23:10:28,288 [INFO] Epoch [21/60], Step [000/931], LR 3.0e-05, Loss: 438.4 317 | 2022-07-22 23:12:09,766 [INFO] Epoch [21/60], Step [100/931], LR 3.0e-05, Loss: 419.9 318 | 2022-07-22 23:13:51,275 [INFO] Epoch [21/60], Step [200/931], LR 3.0e-05, Loss: 397.4 319 | 2022-07-22 23:15:32,690 [INFO] Epoch [21/60], Step [300/931], LR 2.9e-05, Loss: 398.8 320 | 2022-07-22 23:17:13,952 [INFO] Epoch [21/60], Step [400/931], LR 2.9e-05, Loss: 398.1 321 | 2022-07-22 23:18:55,569 [INFO] Epoch [21/60], Step [500/931], LR 2.9e-05, Loss: 447.3 322 | 2022-07-22 23:20:37,013 [INFO] Epoch [21/60], Step [600/931], LR 2.9e-05, Loss: 422.9 323 | 2022-07-22 23:22:18,985 [INFO] Epoch [21/60], Step [700/931], LR 2.9e-05, Loss: 423.4 324 | 2022-07-22 23:24:00,437 [INFO] Epoch [21/60], Step [800/931], LR 2.9e-05, Loss: 412.2 325 | 2022-07-22 23:25:42,140 [INFO] Epoch [21/60], Step [900/931], LR 2.9e-05, Loss: 442.7 326 | 2022-07-22 23:26:12,301 [INFO] Start validation... 
327 | 2022-07-22 23:28:00,761 [INFO] mAP score regular 54.85, mAP score EMA 60.46 328 | 2022-07-22 23:28:00,779 [INFO] current_mAP = 60.46, highest_mAP = 62.04, best_epoch=15 329 | 330 | 2022-07-22 23:28:00,780 [INFO] Save text embeddings done 331 | 2022-07-22 23:28:05,091 [INFO] Epoch [22/60], Step [000/931], LR 2.9e-05, Loss: 423.0 332 | 2022-07-22 23:29:48,657 [INFO] Epoch [22/60], Step [100/931], LR 2.9e-05, Loss: 397.1 333 | 2022-07-22 23:31:29,919 [INFO] Epoch [22/60], Step [200/931], LR 2.9e-05, Loss: 405.3 334 | 2022-07-22 23:33:11,463 [INFO] Epoch [22/60], Step [300/931], LR 2.9e-05, Loss: 358.0 335 | 2022-07-22 23:34:52,871 [INFO] Epoch [22/60], Step [400/931], LR 2.9e-05, Loss: 402.1 336 | 2022-07-22 23:36:34,599 [INFO] Epoch [22/60], Step [500/931], LR 2.9e-05, Loss: 381.8 337 | 2022-07-22 23:38:16,141 [INFO] Epoch [22/60], Step [600/931], LR 2.9e-05, Loss: 408.6 338 | 2022-07-22 23:39:57,610 [INFO] Epoch [22/60], Step [700/931], LR 2.9e-05, Loss: 395.9 339 | 2022-07-22 23:41:39,738 [INFO] Epoch [22/60], Step [800/931], LR 2.9e-05, Loss: 431.5 340 | 2022-07-22 23:43:21,677 [INFO] Epoch [22/60], Step [900/931], LR 2.9e-05, Loss: 426.9 341 | 2022-07-22 23:43:52,092 [INFO] Start validation... 342 | 2022-07-22 23:45:38,008 [INFO] mAP score regular 55.11, mAP score EMA 60.24 343 | 2022-07-22 23:45:38,023 [INFO] current_mAP = 60.24, highest_mAP = 62.04, best_epoch=15 344 | 345 | 2022-07-22 23:45:38,023 [INFO] Save text embeddings done 346 | 2022-07-22 23:45:44,623 [INFO] Epoch [23/60], Step [000/931], LR 2.9e-05, Loss: 429.3 347 | 2022-07-22 23:47:25,950 [INFO] Epoch [23/60], Step [100/931], LR 2.9e-05, Loss: 382.3 348 | 2022-07-22 23:49:07,550 [INFO] Epoch [23/60], Step [200/931], LR 2.9e-05, Loss: 414.0 349 | 2022-07-22 23:50:48,862 [INFO] Epoch [23/60], Step [300/931], LR 2.9e-05, Loss: 391.0 350 | 2022-07-22 23:52:30,438 [INFO] Epoch [23/60], Step [400/931], LR 2.9e-05, Loss: 392.5 351 | 2022-07-22 23:54:12,120 [INFO] Epoch [23/60], Step [500/931], LR 2.9e-05, Loss: 367.7 352 | 2022-07-22 23:55:53,637 [INFO] Epoch [23/60], Step [600/931], LR 2.9e-05, Loss: 403.6 353 | 2022-07-22 23:57:35,572 [INFO] Epoch [23/60], Step [700/931], LR 2.9e-05, Loss: 417.2 354 | 2022-07-22 23:59:18,266 [INFO] Epoch [23/60], Step [800/931], LR 2.9e-05, Loss: 392.9 355 | 2022-07-23 00:01:00,770 [INFO] Epoch [23/60], Step [900/931], LR 2.9e-05, Loss: 420.5 356 | 2022-07-23 00:01:30,815 [INFO] Start validation... 
357 | 2022-07-23 00:03:16,200 [INFO] mAP score regular 54.67, mAP score EMA 59.98 358 | 2022-07-23 00:03:16,214 [INFO] current_mAP = 59.98, highest_mAP = 62.04, best_epoch=15 359 | 360 | 2022-07-23 00:03:16,215 [INFO] Save text embeddings done 361 | 2022-07-23 00:03:22,167 [INFO] Epoch [24/60], Step [000/931], LR 2.9e-05, Loss: 392.7 362 | 2022-07-23 00:05:03,694 [INFO] Epoch [24/60], Step [100/931], LR 2.9e-05, Loss: 382.0 363 | 2022-07-23 00:06:45,183 [INFO] Epoch [24/60], Step [200/931], LR 2.9e-05, Loss: 398.3 364 | 2022-07-23 00:08:26,419 [INFO] Epoch [24/60], Step [300/931], LR 2.9e-05, Loss: 405.1 365 | 2022-07-23 00:10:07,741 [INFO] Epoch [24/60], Step [400/931], LR 2.9e-05, Loss: 375.0 366 | 2022-07-23 00:11:49,190 [INFO] Epoch [24/60], Step [500/931], LR 2.9e-05, Loss: 422.6 367 | 2022-07-23 00:13:31,083 [INFO] Epoch [24/60], Step [600/931], LR 2.9e-05, Loss: 385.8 368 | 2022-07-23 00:15:12,802 [INFO] Epoch [24/60], Step [700/931], LR 2.9e-05, Loss: 405.0 369 | 2022-07-23 00:16:54,887 [INFO] Epoch [24/60], Step [800/931], LR 2.9e-05, Loss: 383.2 370 | -------------------------------------------------------------------------------- /logs/scpnet+voc.txt: -------------------------------------------------------------------------------- 1 | 2022-07-23 11:14:06,053 [INFO] Epoch [0/120], Step [000/045], LR 1.6e-06, Loss: 3571.0 2 | 2022-07-23 11:14:49,796 [INFO] Start validation... 3 | 2022-07-23 11:15:03,834 [INFO] mAP score regular 35.93, mAP score EMA 73.83 4 | 2022-07-23 11:15:05,482 [INFO] current_mAP = 73.83, highest_mAP = 73.83, best_epoch=0 5 | 6 | 2022-07-23 11:15:05,482 [INFO] Save text embeddings done 7 | 2022-07-23 11:15:12,790 [INFO] Epoch [1/120], Step [000/045], LR 1.8e-06, Loss: 894.1 8 | 2022-07-23 11:15:57,073 [INFO] Start validation... 9 | 2022-07-23 11:16:10,915 [INFO] mAP score regular 43.35, mAP score EMA 71.88 10 | 2022-07-23 11:16:10,925 [INFO] current_mAP = 71.88, highest_mAP = 73.83, best_epoch=0 11 | 12 | 2022-07-23 11:16:10,925 [INFO] Save text embeddings done 13 | 2022-07-23 11:16:18,474 [INFO] Epoch [2/120], Step [000/045], LR 2.3e-06, Loss: 540.6 14 | 2022-07-23 11:17:01,149 [INFO] Start validation... 15 | 2022-07-23 11:17:14,754 [INFO] mAP score regular 57.53, mAP score EMA 69.09 16 | 2022-07-23 11:17:14,769 [INFO] current_mAP = 69.09, highest_mAP = 73.83, best_epoch=0 17 | 18 | 2022-07-23 11:17:14,769 [INFO] Save text embeddings done 19 | 2022-07-23 11:17:20,126 [INFO] Epoch [3/120], Step [000/045], LR 3.1e-06, Loss: 462.6 20 | 2022-07-23 11:18:04,651 [INFO] Start validation... 21 | 2022-07-23 11:18:18,585 [INFO] mAP score regular 72.38, mAP score EMA 68.07 22 | 2022-07-23 11:18:18,602 [INFO] current_mAP = 72.38, highest_mAP = 73.83, best_epoch=0 23 | 24 | 2022-07-23 11:18:18,602 [INFO] Save text embeddings done 25 | 2022-07-23 11:18:26,176 [INFO] Epoch [4/120], Step [000/045], LR 4.2e-06, Loss: 482.1 26 | 2022-07-23 11:19:09,089 [INFO] Start validation... 27 | 2022-07-23 11:19:22,748 [INFO] mAP score regular 79.14, mAP score EMA 69.34 28 | 2022-07-23 11:19:29,342 [INFO] current_mAP = 79.14, highest_mAP = 79.14, best_epoch=4 29 | 30 | 2022-07-23 11:19:29,343 [INFO] Save text embeddings done 31 | 2022-07-23 11:19:37,442 [INFO] Epoch [5/120], Step [000/045], LR 5.6e-06, Loss: 415.2 32 | 2022-07-23 11:20:20,641 [INFO] Start validation... 
33 | 2022-07-23 11:20:34,293 [INFO] mAP score regular 82.83, mAP score EMA 72.48 34 | 2022-07-23 11:20:41,011 [INFO] current_mAP = 82.83, highest_mAP = 82.83, best_epoch=5 35 | 36 | 2022-07-23 11:20:41,012 [INFO] Save text embeddings done 37 | 2022-07-23 11:20:47,217 [INFO] Epoch [6/120], Step [000/045], LR 7.3e-06, Loss: 442.0 38 | 2022-07-23 11:21:30,075 [INFO] Start validation... 39 | 2022-07-23 11:21:43,656 [INFO] mAP score regular 85.28, mAP score EMA 76.40 40 | 2022-07-23 11:21:50,637 [INFO] current_mAP = 85.28, highest_mAP = 85.28, best_epoch=6 41 | 42 | 2022-07-23 11:21:50,638 [INFO] Save text embeddings done 43 | 2022-07-23 11:21:56,390 [INFO] Epoch [7/120], Step [000/045], LR 9.2e-06, Loss: 434.3 44 | 2022-07-23 11:22:40,778 [INFO] Start validation... 45 | 2022-07-23 11:22:54,413 [INFO] mAP score regular 86.72, mAP score EMA 80.10 46 | 2022-07-23 11:23:00,835 [INFO] current_mAP = 86.72, highest_mAP = 86.72, best_epoch=7 47 | 48 | 2022-07-23 11:23:00,835 [INFO] Save text embeddings done 49 | 2022-07-23 11:23:07,409 [INFO] Epoch [8/120], Step [000/045], LR 1.1e-05, Loss: 355.7 50 | 2022-07-23 11:23:50,891 [INFO] Start validation... 51 | 2022-07-23 11:24:04,527 [INFO] mAP score regular 88.37, mAP score EMA 83.00 52 | 2022-07-23 11:24:10,715 [INFO] current_mAP = 88.37, highest_mAP = 88.37, best_epoch=8 53 | 54 | 2022-07-23 11:24:10,716 [INFO] Save text embeddings done 55 | 2022-07-23 11:24:18,045 [INFO] Epoch [9/120], Step [000/045], LR 1.4e-05, Loss: 345.8 56 | 2022-07-23 11:25:01,379 [INFO] Start validation... 57 | 2022-07-23 11:25:15,006 [INFO] mAP score regular 88.61, mAP score EMA 85.20 58 | 2022-07-23 11:25:21,417 [INFO] current_mAP = 88.61, highest_mAP = 88.61, best_epoch=9 59 | 60 | 2022-07-23 11:25:21,418 [INFO] Save text embeddings done 61 | 2022-07-23 11:25:29,395 [INFO] Epoch [10/120], Step [000/045], LR 1.6e-05, Loss: 365.9 62 | 2022-07-23 11:26:12,363 [INFO] Start validation... 63 | 2022-07-23 11:26:25,895 [INFO] mAP score regular 89.47, mAP score EMA 86.74 64 | 2022-07-23 11:26:32,584 [INFO] current_mAP = 89.47, highest_mAP = 89.47, best_epoch=10 65 | 66 | 2022-07-23 11:26:32,584 [INFO] Save text embeddings done 67 | 2022-07-23 11:26:38,868 [INFO] Epoch [11/120], Step [000/045], LR 1.8e-05, Loss: 367.3 68 | 2022-07-23 11:27:22,388 [INFO] Start validation... 69 | 2022-07-23 11:27:36,294 [INFO] mAP score regular 89.78, mAP score EMA 87.93 70 | 2022-07-23 11:27:42,804 [INFO] current_mAP = 89.78, highest_mAP = 89.78, best_epoch=11 71 | 72 | 2022-07-23 11:27:42,805 [INFO] Save text embeddings done 73 | 2022-07-23 11:27:50,380 [INFO] Epoch [12/120], Step [000/045], LR 2.1e-05, Loss: 334.9 74 | 2022-07-23 11:28:33,685 [INFO] Start validation... 75 | 2022-07-23 11:28:47,273 [INFO] mAP score regular 89.98, mAP score EMA 88.76 76 | 2022-07-23 11:28:58,709 [INFO] current_mAP = 89.98, highest_mAP = 89.98, best_epoch=12 77 | 78 | 2022-07-23 11:28:58,709 [INFO] Save text embeddings done 79 | 2022-07-23 11:29:06,397 [INFO] Epoch [13/120], Step [000/045], LR 2.3e-05, Loss: 331.9 80 | 2022-07-23 11:29:49,412 [INFO] Start validation... 81 | 2022-07-23 11:30:03,142 [INFO] mAP score regular 90.17, mAP score EMA 89.42 82 | 2022-07-23 11:30:09,868 [INFO] current_mAP = 90.17, highest_mAP = 90.17, best_epoch=13 83 | 84 | 2022-07-23 11:30:09,869 [INFO] Save text embeddings done 85 | 2022-07-23 11:30:15,575 [INFO] Epoch [14/120], Step [000/045], LR 2.6e-05, Loss: 347.5 86 | 2022-07-23 11:30:58,531 [INFO] Start validation... 
87 | 2022-07-23 11:31:12,207 [INFO] mAP score regular 90.07, mAP score EMA 89.97 88 | 2022-07-23 11:31:12,222 [INFO] current_mAP = 90.07, highest_mAP = 90.17, best_epoch=13 89 | 90 | 2022-07-23 11:31:12,222 [INFO] Save text embeddings done 91 | 2022-07-23 11:31:20,367 [INFO] Epoch [15/120], Step [000/045], LR 2.8e-05, Loss: 332.7 92 | 2022-07-23 11:32:03,296 [INFO] Start validation... 93 | 2022-07-23 11:32:16,901 [INFO] mAP score regular 90.44, mAP score EMA 90.36 94 | 2022-07-23 11:32:23,289 [INFO] current_mAP = 90.44, highest_mAP = 90.44, best_epoch=15 95 | 96 | 2022-07-23 11:32:23,290 [INFO] Save text embeddings done 97 | 2022-07-23 11:32:29,544 [INFO] Epoch [16/120], Step [000/045], LR 3.0e-05, Loss: 285.7 98 | 2022-07-23 11:33:12,549 [INFO] Start validation... 99 | 2022-07-23 11:33:26,250 [INFO] mAP score regular 89.82, mAP score EMA 90.65 100 | 2022-07-23 11:33:32,635 [INFO] current_mAP = 90.65, highest_mAP = 90.65, best_epoch=16 101 | 102 | 2022-07-23 11:33:32,636 [INFO] Save text embeddings done 103 | 2022-07-23 11:33:39,803 [INFO] Epoch [17/120], Step [000/045], LR 3.3e-05, Loss: 314.5 104 | 2022-07-23 11:34:23,025 [INFO] Start validation... 105 | 2022-07-23 11:34:36,961 [INFO] mAP score regular 90.27, mAP score EMA 90.83 106 | 2022-07-23 11:34:42,962 [INFO] current_mAP = 90.83, highest_mAP = 90.83, best_epoch=17 107 | 108 | 2022-07-23 11:34:43,351 [INFO] Save text embeddings done 109 | 2022-07-23 11:34:51,397 [INFO] Epoch [18/120], Step [000/045], LR 3.4e-05, Loss: 340.0 110 | 2022-07-23 11:35:34,678 [INFO] Start validation... 111 | 2022-07-23 11:35:48,436 [INFO] mAP score regular 89.55, mAP score EMA 90.96 112 | 2022-07-23 11:35:54,681 [INFO] current_mAP = 90.96, highest_mAP = 90.96, best_epoch=18 113 | 114 | 2022-07-23 11:35:54,681 [INFO] Save text embeddings done 115 | 2022-07-23 11:36:00,680 [INFO] Epoch [19/120], Step [000/045], LR 3.6e-05, Loss: 315.2 116 | 2022-07-23 11:36:44,151 [INFO] Start validation... 117 | 2022-07-23 11:36:58,147 [INFO] mAP score regular 89.19, mAP score EMA 91.04 118 | 2022-07-23 11:37:04,608 [INFO] current_mAP = 91.04, highest_mAP = 91.04, best_epoch=19 119 | 120 | 2022-07-23 11:37:04,609 [INFO] Save text embeddings done 121 | 2022-07-23 11:37:10,919 [INFO] Epoch [20/120], Step [000/045], LR 3.7e-05, Loss: 338.2 122 | 2022-07-23 11:37:53,949 [INFO] Start validation... 123 | 2022-07-23 11:38:07,863 [INFO] mAP score regular 89.12, mAP score EMA 91.13 124 | 2022-07-23 11:38:14,331 [INFO] current_mAP = 91.13, highest_mAP = 91.13, best_epoch=20 125 | 126 | 2022-07-23 11:38:14,332 [INFO] Save text embeddings done 127 | 2022-07-23 11:38:22,259 [INFO] Epoch [21/120], Step [000/045], LR 3.9e-05, Loss: 343.8 128 | 2022-07-23 11:39:05,336 [INFO] Start validation... 129 | 2022-07-23 11:39:19,372 [INFO] mAP score regular 88.68, mAP score EMA 91.15 130 | 2022-07-23 11:39:25,583 [INFO] current_mAP = 91.15, highest_mAP = 91.15, best_epoch=21 131 | 132 | 2022-07-23 11:39:25,584 [INFO] Save text embeddings done 133 | 2022-07-23 11:39:32,159 [INFO] Epoch [22/120], Step [000/045], LR 3.9e-05, Loss: 325.2 134 | 2022-07-23 11:40:15,595 [INFO] Start validation... 135 | 2022-07-23 11:40:29,531 [INFO] mAP score regular 88.36, mAP score EMA 91.16 136 | 2022-07-23 11:40:35,989 [INFO] current_mAP = 91.16, highest_mAP = 91.16, best_epoch=22 137 | 138 | 2022-07-23 11:40:35,990 [INFO] Save text embeddings done 139 | 2022-07-23 11:40:41,879 [INFO] Epoch [23/120], Step [000/045], LR 4.0e-05, Loss: 329.5 140 | 2022-07-23 11:41:25,363 [INFO] Start validation... 
141 | 2022-07-23 11:41:39,352 [INFO] mAP score regular 88.45, mAP score EMA 91.09
142 | 2022-07-23 11:41:39,363 [INFO] current_mAP = 91.09, highest_mAP = 91.16, best_epoch=22
143 | 
144 | 2022-07-23 11:41:39,363 [INFO] Save text embeddings done
145 | 2022-07-23 11:41:46,142 [INFO] Epoch [24/120], Step [000/045], LR 4.0e-05, Loss: 295.2
146 | 2022-07-23 11:42:29,380 [INFO] Start validation...
147 | 2022-07-23 11:42:43,251 [INFO] mAP score regular 87.28, mAP score EMA 90.97
148 | 2022-07-23 11:42:43,274 [INFO] current_mAP = 90.97, highest_mAP = 91.16, best_epoch=22
149 | 
150 | 2022-07-23 11:42:43,274 [INFO] Save text embeddings done
151 | 2022-07-23 11:42:51,054 [INFO] Epoch [25/120], Step [000/045], LR 4.0e-05, Loss: 310.6
152 | 2022-07-23 11:43:34,602 [INFO] Start validation...
153 | 2022-07-23 11:43:48,753 [INFO] mAP score regular 88.02, mAP score EMA 90.86
154 | 2022-07-23 11:43:48,774 [INFO] current_mAP = 90.86, highest_mAP = 91.16, best_epoch=22
155 | 
156 | 2022-07-23 11:43:48,774 [INFO] Save text embeddings done
157 | 2022-07-23 11:43:55,623 [INFO] Epoch [26/120], Step [000/045], LR 4.0e-05, Loss: 285.2
158 | 2022-07-23 11:44:39,114 [INFO] Start validation...
159 | 2022-07-23 11:44:52,902 [INFO] mAP score regular 87.06, mAP score EMA 90.75
160 | 2022-07-23 11:44:52,914 [INFO] current_mAP = 90.75, highest_mAP = 91.16, best_epoch=22
161 | 
162 | 2022-07-23 11:44:52,914 [INFO] Save text embeddings done
163 | 2022-07-23 11:44:59,368 [INFO] Epoch [27/120], Step [000/045], LR 4.0e-05, Loss: 294.0
164 | 
--------------------------------------------------------------------------------
/loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | 
5 | 
6 | class SPLC(nn.Module):
7 |     r""" SPLC loss as described in the paper "Simple Loss Design for Multi-Label Learning with Missing Labels"
8 | 
9 |     .. math::
10 |         &L_{SPLC}^+ = loss^+(p)
11 |         &L_{SPLC}^- = \mathbb{I}(p\leq \tau)loss^-(p) + (1-\mathbb{I}(p\leq \tau))loss^+(p)
12 | 
13 |     where :math:`\tau` is a threshold to identify missing labels,
14 |     :math:`\mathbb{I}(\cdot)\in\{0,1\}` is the indicator function, and
15 |     :math:`loss^+(\cdot), loss^-(\cdot)` refer to the loss functions for positives and negatives, respectively.
16 | 
17 |     .. note::
18 |         SPLC can be combined with various multi-label loss functions.
19 |         SPLC performs best combined with Focal margin loss in our paper. Code of SPLC with Focal margin loss is released here.
20 |         Since the first epoch can recall few missing labels with high precision, SPLC can be used after the first epoch.
21 |         Sigmoid will be done in loss.
22 | 
23 |     Args:
24 |         tau (float): threshold value. Default: 0.6
25 |         change_epoch (int): which epoch to combine SPLC. Default: 1
26 |         margin (float): Margin value. Default: 1
27 |         gamma (float): Hard mining value. Default: 2
28 |         reduction (string, optional): Specifies the reduction to apply to the output:
29 |             ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will be applied,
30 |             ``'mean'``: the sum of the output will be divided by the number of
31 |             elements in the output, ``'sum'``: the output will be summed. Default: ``'sum'`` (note: this implementation always applies ``'sum'``)
32 | 
33 |     """
34 | 
35 |     def __init__(
36 |         self,
37 |         tau: float = 0.6,
38 |         change_epoch: int = 1,
39 |         margin: float = 1.0,
40 |         gamma: float = 2.0,
41 |     ) -> None:
42 |         super(SPLC, self).__init__()
43 |         self.tau = tau
44 |         self.change_epoch = change_epoch
45 |         self.margin = margin
46 |         self.gamma = gamma
47 | 
48 |     def forward(self, logits: torch.Tensor, targets: torch.Tensor,
49 |                 epoch):
50 |         """
51 |         Compute the SPLC loss.
52 | 
53 |         Args:
54 |             logits : The predicted logits before sigmoid with shape of :math:`(N, C)`
55 |             targets : Multi-label binarized vector with shape of :math:`(N, C)`
56 |             epoch : The epoch of current training.
57 | 
58 |         Returns:
59 |             (torch.Tensor, torch.Tensor): the summed loss and the (possibly corrected) targets
60 |         """
61 |         # Subtract margin for positive logits
62 |         logits = torch.where(targets == 1, logits - self.margin, logits)
63 | 
64 |         # SPLC missing label correction
65 |         if epoch >= self.change_epoch:
66 |             targets = torch.where(
67 |                 torch.sigmoid(logits) > self.tau,
68 |                 torch.tensor(1).cuda(), targets)
69 | 
70 |         pred = torch.sigmoid(logits)
71 | 
72 |         # Focal margin for positive loss
73 |         pt = (1 - pred) * targets + pred * (1 - targets)
74 |         focal_weight = pt**self.gamma
75 | 
76 |         los_pos = targets * F.logsigmoid(logits)
77 |         los_neg = (1 - targets) * F.logsigmoid(-logits)
78 | 
79 |         loss = -(los_pos + los_neg)
80 |         loss *= focal_weight
81 | 
82 |         return loss.sum(), targets
83 | 
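A minimal usage sketch of `SPLC` (toy tensors and shapes, not repository data). Note that the missing-label correction branch calls `.cuda()`, so epochs at or past `change_epoch` need a GPU as written; `epoch=0` below skips it:

```python
# Hedged sketch: illustrative shapes and values only.
import torch
from loss import SPLC

criterion = SPLC(tau=0.6, change_epoch=1, margin=1.0, gamma=2.0)
logits = torch.randn(4, 80, requires_grad=True)  # (N, C) pre-sigmoid scores
targets = torch.randint(0, 2, (4, 80))           # incomplete multi-label targets
# epoch 0 is below change_epoch, so the .cuda() correction path is skipped
loss, corrected_targets = criterion(logits, targets, epoch=0)
loss.backward()  # summed focal-margin BCE over all (sample, class) entries
```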
--------------------------------------------------------------------------------
/model.py:
--------------------------------------------------------------------------------
1 | from collections import OrderedDict
2 | 
3 | import torch
4 | import torch.nn as nn
5 | from torch.nn import functional as F
6 | 
7 | from clip import clip
8 | from clip.simple_tokenizer import SimpleTokenizer as _Tokenizer
9 | from config import cfg
10 | from log import logger
11 | 
12 | _tokenizer = _Tokenizer()
13 | 
14 | def load_clip_to_cpu():
15 |     backbone_name = 'RN50'
16 |     url = clip._MODELS[backbone_name]
17 |     model_path = clip._download(url)
18 | 
19 |     try:
20 |         # loading JIT archive
21 |         model = torch.jit.load(  # type: ignore
22 |             model_path, map_location="cpu").eval()
23 |         state_dict = None
24 | 
25 |     except RuntimeError:
26 |         state_dict = torch.load(model_path, map_location="cpu")
27 | 
28 |     model = clip.build_model(state_dict or model.state_dict())  # type: ignore
29 | 
30 |     return model
31 | 
32 | 
33 | class TextEncoder(nn.Module):
34 | 
35 |     def __init__(self, clip_model):
36 |         super().__init__()
37 |         self.transformer = clip_model.transformer
38 |         self.positional_embedding = clip_model.positional_embedding
39 |         self.ln_final = clip_model.ln_final
40 |         self.text_projection = clip_model.text_projection
41 |         self.dtype = clip_model.dtype
42 | 
43 |     def forward(self, prompts, tokenized_prompts):
44 |         x = prompts + self.positional_embedding.type(self.dtype)
45 |         x = x.permute(1, 0, 2)  # NLD -> LND
46 |         x = self.transformer(x)
47 |         x = x.permute(1, 0, 2)  # LND -> NLD
48 |         x = self.ln_final(x).type(self.dtype)
49 | 
50 |         # x.shape = [batch_size, n_ctx, transformer.width]
51 |         # take features from the eot embedding (eot_token is the highest number in each sequence)
52 |         x = x[torch.arange(x.shape[0]),
53 |               tokenized_prompts.argmax(dim=-1)] @ self.text_projection
54 | 
55 |         return x
56 | 
57 | 
58 | class PromptLearner(nn.Module):
59 | 
60 |     def __init__(self, classnames, clip_model):
61 |         super().__init__()
62 |         n_cls = len(classnames)
63 |         n_ctx = cfg.n_ctx
64 |         dtype = clip_model.dtype
65 |         clip_imsize = clip_model.visual.input_resolution
66 |         cfg_imsize = 224
67 |         assert cfg_imsize == clip_imsize, f"cfg_imsize ({cfg_imsize}) must equal clip_imsize ({clip_imsize})"
68 | 
69 |         # use given words to initialize context vectors
70 |         ctx_init = cfg.ctx_init.replace("_", " ")
71 |         assert (n_ctx == len(ctx_init.split(" ")))
72 |         prompt = clip.tokenize(ctx_init)
73 |         with torch.no_grad():
74 |             embedding = clip_model.token_embedding(prompt).type(dtype)
75 |         ctx_vectors = embedding[0, 1:1 + n_ctx, :]
76 |         prompt_prefix = ctx_init
77 | 
78 |         self.ctx = nn.Parameter(ctx_vectors)  # type: ignore
79 |         classnames = [name.replace("_", " ") for name in classnames]
80 |         name_lens = [len(_tokenizer.encode(name)) for name in classnames]
81 |         prompts = [prompt_prefix + " " + name + "." for name in classnames]
82 | 
83 |         tokenized_prompts = torch.cat([clip.tokenize(p) for p in prompts])
84 |         with torch.no_grad():
85 |             embedding = clip_model.token_embedding(tokenized_prompts).type(
86 |                 dtype)
87 | 
88 |         # These token vectors will be saved in save_model(),
89 |         # but they should be ignored in load_model() as we want to use
90 |         # those computed using the current class names
91 |         self.register_buffer("token_prefix", embedding[:, :1, :])  # SOS
92 |         self.register_buffer("token_suffix",
93 |                              embedding[:, 1 + n_ctx:, :])  # CLS, EOS
94 |         self.register_buffer("token_middle", embedding[:, 1:(1 + n_ctx), :])
95 |         self.n_cls = n_cls
96 |         self.n_ctx = n_ctx
97 |         self.tokenized_prompts = tokenized_prompts  # torch.Tensor
98 |         self.name_lens = name_lens
99 | 
100 |     def forward(self):
101 |         ctx = self.ctx
102 |         if ctx.dim() == 2:
103 |             ctx = ctx.unsqueeze(0).expand(self.n_cls, -1, -1)
104 |         prefix = self.token_prefix
105 |         suffix = self.token_suffix
106 | 
107 |         prompts = torch.cat(
108 |             [
109 |                 prefix,  # (n_cls, 1, dim)
110 |                 ctx,  # (n_cls, n_ctx, dim)
111 |                 suffix,  # (n_cls, *, dim)
112 |             ],  # type: ignore
113 |             dim=1,
114 |         )
115 |         return prompts
116 | 
117 | def load_clip_model():
118 |     clip_model = load_clip_to_cpu()
119 | 
120 |     # CLIP's default precision is fp16
121 |     clip_model.float()
122 |     return clip_model, clip._transform(clip_model.visual.input_resolution)
123 | 
124 | import math
125 | import numpy as np
126 | class GraphConvolution(nn.Module):
127 |     """
128 |     Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
129 |     """
130 | 
131 |     def __init__(self, in_features, out_features, bias=False):
132 |         super(GraphConvolution, self).__init__()
133 |         self.in_features = in_features
134 |         self.out_features = out_features
135 |         self.weight = nn.parameter.Parameter(torch.Tensor(in_features, out_features))
136 |         if bias:
137 |             self.bias = nn.parameter.Parameter(torch.Tensor(1, 1, out_features))
138 |         else:
139 |             self.register_parameter('bias', None)
140 |         self.reset_parameters()
141 | 
142 |     def reset_parameters(self):
143 |         stdv = 1. / math.sqrt(self.weight.size(1))
144 |         self.weight.data.uniform_(-stdv, stdv)
145 |         if self.bias is not None:
146 |             self.bias.data.uniform_(-stdv, stdv)
147 | 
148 |     def forward(self, input, adj):
149 |         support = torch.matmul(input, self.weight)
150 |         output = torch.matmul(adj, support)
151 |         if self.bias is not None:
152 |             return output + self.bias
153 |         else:
154 |             return output
155 | 
156 |     def __repr__(self):
157 |         return self.__class__.__name__ + ' (' \
158 |                + str(self.in_features) + ' -> ' \
159 |                + str(self.out_features) + ')'
160 | 
161 | from timm.models.vision_transformer import resize_pos_embed
162 | class SCPNet(nn.Module):
163 | 
164 |     def __init__(self, classnames, clip_model):
165 |         super().__init__()
166 |         self.prompt_learner = PromptLearner(classnames, clip_model)
167 |         self.tokenized_prompts = self.prompt_learner.tokenized_prompts
168 |         self.image_encoder = clip_model.visual
169 |         self.text_encoder = TextEncoder(clip_model)
170 |         self.logit_scale = clip_model.logit_scale
171 |         self.dtype = clip_model.dtype
172 | 
173 |         self.gc1 = GraphConvolution(1024, 2048)
174 |         self.gc2 = GraphConvolution(2048, 2048)
175 |         self.gc3 = GraphConvolution(2048, 1024)
176 |         self.relu = nn.LeakyReLU(0.2)
177 |         self.relu2 = nn.LeakyReLU(0.2)
178 | 
179 |         self.relation = torch.Tensor(np.load(cfg.relation_file))
180 | 
181 |         _, max_idx = torch.topk(self.relation, cfg.sparse_topk)  # top-k strongest relations per row
182 |         mask = torch.ones_like(self.relation).type(torch.bool)
183 |         for i, idx in enumerate(max_idx):
184 |             mask[i][idx] = 0
185 |         self.relation[mask] = 0
186 |         sparse_mask = mask
187 |         diag = torch.eye(cfg.num_classes).type(torch.bool)  # mask over the diagonal (self-relations)
188 |         self.relation[diag] = 0
189 |         self.relation = self.relation / torch.sum(self.relation, dim=1).reshape(-1, 1) * cfg.reweight_p
190 |         self.relation[diag] = 1 - cfg.reweight_p
191 | 
192 |         self.gcn_relation = self.relation.clone()
193 |         assert (not self.gcn_relation.requires_grad)
194 |         self.relation = torch.exp(self.relation / cfg.T) / torch.sum(torch.exp(self.relation / cfg.T), dim=1).reshape(-1, 1)  # temperature softmax
195 |         self.relation[sparse_mask] = 0
196 |         self.relation = self.relation / torch.sum(self.relation, dim=1).reshape(-1, 1)
197 | 
198 |     def forward(self, image):
199 |         tokenized_prompts = self.tokenized_prompts
200 |         image_features = self.image_encoder(image.type(self.dtype))
201 |         image_features = image_features / image_features.norm(dim=-1,
202 |                                                                keepdim=True)
203 |         logit_scale = self.logit_scale.exp()
204 |         if cfg.scale != 'clip':
205 |             assert (isinstance(cfg.scale, int))
206 |             logit_scale = cfg.scale
207 |         prompts = self.prompt_learner()
208 |         text_features = self.text_encoder(prompts, tokenized_prompts)
209 |         identity = text_features
210 | 
211 |         text_features = self.gc1(text_features, self.gcn_relation.cuda())
212 |         text_features = self.relu(text_features)
213 |         text_features = self.gc2(text_features, self.gcn_relation.cuda())
214 |         text_features = self.relu2(text_features)
215 |         text_features = self.gc3(text_features, self.gcn_relation.cuda())
216 | 
217 |         text_features += identity
218 |         text_features = text_features / text_features.norm(dim=-1,
219 |                                                             keepdim=True)
220 |         logits = logit_scale * image_features @ text_features.t()
221 |         return logits
222 | 
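The relation-matrix preprocessing in `SCPNet.__init__` is compact but dense; the following toy sketch (illustrative sizes and config values, not the statistics stored in the repository's `relation+*.npy` files) walks through the same sparsify-and-reweight steps:

```python
# Hedged sketch of the relation preprocessing in SCPNet.__init__ (toy values).
import torch

num_classes, sparse_topk, reweight_p = 4, 2, 0.2  # illustrative stand-ins for cfg

relation = torch.rand(num_classes, num_classes)

# keep only the top-k strongest co-occurrence entries in each row
_, max_idx = torch.topk(relation, sparse_topk)
mask = torch.ones_like(relation).type(torch.bool)
for i, idx in enumerate(max_idx):
    mask[i][idx] = 0
relation[mask] = 0

# zero the diagonal, renormalize the off-diagonal mass to reweight_p,
# then put the remaining 1 - reweight_p back on the self-relation
diag = torch.eye(num_classes).type(torch.bool)
relation[diag] = 0
relation = relation / relation.sum(dim=1, keepdim=True) * reweight_p
relation[diag] = 1 - reweight_p

print(relation.sum(dim=1))  # each row now sums to 1
```

The temperature-softmax applied to `self.relation` afterwards (`exp(relation / cfg.T)` followed by the same row renormalization) reuses this row-wise pattern to smooth the pruned matrix.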
--------------------------------------------------------------------------------
/nuswide_labels.txt:
--------------------------------------------------------------------------------
1 | airport
2 | animal
3 | beach
4 | bear
5 | birds
6 | boats
7 | book
8 | bridge
9 | buildings
10 | cars
11 | castle
12 | cat
13 | cityscape
14 | clouds
15 | computer
16 | coral
17 | cow
18 | dancing
19 | dog
20 | earthquake
21 | elk
22 | fire
23 | fish
24 | flags
25 | flowers
26 | food
27 | fox
28 | frost
29 | garden
30 | glacier
31 | grass
32 | harbor
33 | horses
34 | house
35 | lake
36 | leaf
37 | map
38 | military
39 | moon
40 | mountain
41 | nighttime
42 | ocean
43 | person
44 | plane
45 | plants
46 | police
47 | protest
48 | railroad
49 | rainbow
50 | reflection
51 | road
52 | rocks
53 | running
54 | sand
55 | sign
56 | sky
57 | snow
58 | soccer
59 | sports
60 | statue
61 | street
62 | sun
63 | sunset
64 | surf
65 | swimmers
66 | tattoo
67 | temple
68 | tiger
69 | tower
70 | town
71 | toy
72 | train
73 | tree
74 | valley
75 | vehicle
76 | water
77 | waterfall
78 | wedding
79 | whales
80 | window
81 | zebra
--------------------------------------------------------------------------------
/randaugment.py:
--------------------------------------------------------------------------------
1 | # copyright: https://github.com/ildoonet/pytorch-randaugment
2 | # code in this file is adapted from rpmcruz/autoaugment
3 | # https://github.com/rpmcruz/autoaugment/blob/master/transformations.py
4 | # This code is a modified version of ildoonet's, for RandAugment in FixMatch.
5 | 
6 | import random
7 | 
8 | import numpy as np
9 | import PIL
10 | import PIL.ImageDraw
11 | import PIL.ImageEnhance
12 | import PIL.ImageOps
13 | import torch
14 | import torch.nn.functional as F
15 | from PIL import Image
16 | 
17 | 
18 | def AutoContrast(img, _):
19 |     return PIL.ImageOps.autocontrast(img)
20 | 
21 | 
22 | def Brightness(img, v):
23 |     assert v >= 0.0
24 |     return PIL.ImageEnhance.Brightness(img).enhance(v)
25 | 
26 | 
27 | def Color(img, v):
28 |     assert v >= 0.0
29 |     return PIL.ImageEnhance.Color(img).enhance(v)
30 | 
31 | 
32 | def Contrast(img, v):
33 |     assert v >= 0.0
34 |     return PIL.ImageEnhance.Contrast(img).enhance(v)
35 | 
36 | 
37 | def Equalize(img, _):
38 |     return PIL.ImageOps.equalize(img)
39 | 
40 | 
41 | def Invert(img, _):
42 |     return PIL.ImageOps.invert(img)
43 | 
44 | 
45 | def Identity(img, v):
46 |     return img
47 | 
48 | 
49 | def Posterize(img, v):  # [4, 8]
50 |     v = int(v)
51 |     v = max(1, v)
52 |     return PIL.ImageOps.posterize(img, v)
53 | 
54 | 
55 | def Rotate(img, v):  # [-30, 30]
56 |     #assert -30 <= v <= 30
57 |     #if random.random() > 0.5:
58 |     #    v = -v
59 |     return img.rotate(v)
60 | 
61 | 
62 | def Sharpness(img, v):  # [0.1,1.9]
63 |     assert v >= 0.0
64 |     return PIL.ImageEnhance.Sharpness(img).enhance(v)
65 | 
66 | 
67 | def ShearX(img, v):  # [-0.3, 0.3]
68 |     #assert -0.3 <= v <= 0.3
69 |     #if random.random() > 0.5:
70 |     #    v = -v
71 |     return img.transform(img.size, PIL.Image.AFFINE, (1, v, 0, 0, 1, 0))
72 | 
73 | 
74 | def ShearY(img, v):  # [-0.3, 0.3]
75 |     #assert -0.3 <= v <= 0.3
76 |     #if random.random() > 0.5:
77 |     #    v = -v
78 |     return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, v, 1, 0))
79 | 
80 | 
81 | def TranslateX(img, v):  # [-150, 150] => percentage: [-0.45, 0.45]
82 |     #assert -0.3 <= v <= 0.3
83 |     #if random.random() > 0.5:
84 |     #    v = -v
85 |     v = v * img.size[0]
86 |     return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0))
87 | 
88 | 
89 | def TranslateXabs(img, v):  # [-150, 150] => percentage: [-0.45, 0.45]
90 |     #assert v >= 0.0
91 |     #if random.random() > 0.5:
92 |     #    v = -v
93 |     return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0))
94 | 
95 | 
96 | def TranslateY(img, v):  # [-150, 150] => percentage: [-0.45, 0.45]
97 |     #assert -0.3 <= v <= 0.3
98 |     #if random.random() > 0.5:
99 |     #    v = -v
100 |     v = v * img.size[1]
101 |     return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v))
102 | 
103 | 
104 | def TranslateYabs(img, v):  # [-150, 150] => percentage: [-0.45, 0.45]
105 |     #assert 0 <= v
106 |     #if random.random() > 0.5:
107 |     #    v = -v
108 |     return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v))
109 | 
110 | 
111 | def Solarize(img, v):  # [0, 256]
112 |     assert 0 <= v <= 256
113 |     return PIL.ImageOps.solarize(img, v)
114 | 
115 | 
116 | def Cutout(img, v):  # [0, 60] => percentage: [0, 0.2] => change to [0, 0.5]
117 |     assert 0.0 <= v <= 0.5
118 |     if v <= 0.:
119 |         return img
120 | 
121 |     v = v * img.size[0]
122 |     return CutoutAbs(img, v)
123 | 
124 | 
125 | def CutoutAbs(img, v):  # [0, 60] => percentage: [0, 0.2]
126 |     # assert 0 <= v <= 20
127 |     if v < 0:
128 |         return img
129 |     w, h = img.size
130 |     x0 = np.random.uniform(w)
131 |     y0 = np.random.uniform(h)
132 | 
133 |     x0 = int(max(0, x0 - v / 2.))
134 |     y0 = int(max(0, y0 - v / 2.))
135 |     x1 = min(w, x0 + v)
136 |     y1 = min(h, y0 + v)
137 | 
138 |     xy = (x0, y0, x1, y1)
139 |     color = (125, 123, 114)
140 |     # color = (0, 0, 0)
141 |     img = img.copy()
142 |     PIL.ImageDraw.Draw(img).rectangle(xy, color)
143 |     return img
144 | 
145 | 
146 | def augment_list():
147 |     l = [(AutoContrast, 0, 1), (Brightness, 0.05, 0.95), (Color, 0.05, 0.95),
148 |          (Contrast, 0.05, 0.95), (Equalize, 0, 1), (Identity, 0, 1),
149 |          (Posterize, 4, 8), (Rotate, -30, 30), (Sharpness, 0.05, 0.95),
150 |          (ShearX, -0.3, 0.3), (ShearY, -0.3, 0.3), (Solarize, 0, 256),
151 |          (TranslateX, -0.3, 0.3), (TranslateY, -0.3, 0.3)]
152 |     return l
153 | 
154 | 
155 | class RandAugment:
156 | 
157 |     def __init__(self, n, m):
158 |         self.n = n
159 |         self.m = m  # [0, 30] in fixmatch, deprecated.
160 |         self.augment_list = augment_list()
161 | 
162 |     def __call__(self, img):
163 |         ops = random.choices(self.augment_list, k=self.n)
164 |         for op, min_val, max_val in ops:
165 |             val = min_val + float(max_val - min_val) * random.random()
166 |             img = op(img, val)
167 |         cutout_val = random.random() * 0.5
168 |         img = Cutout(img, cutout_val)  # for fixmatch
169 |         return img
170 | 
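`RandAugment` here samples `n` transforms per call with fresh random magnitudes (the `m` argument is kept only for interface compatibility and is unused) and always finishes with a random-strength Cutout. A minimal, hedged usage sketch (the image path is illustrative):

```python
# Hedged sketch: applying this RandAugment to a single image.
from PIL import Image

from randaugment import RandAugment

augment = RandAugment(n=3, m=5)                   # 3 random ops per call
img = Image.open("example.jpg").convert("RGB")    # illustrative path
strong_view = augment(img)                        # transformed PIL image, Cutout applied
```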
--------------------------------------------------------------------------------
/relation+coco.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/relation+coco.npy
--------------------------------------------------------------------------------
/relation+cub.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/relation+cub.npy
--------------------------------------------------------------------------------
/relation+nuswide.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/relation+nuswide.npy
--------------------------------------------------------------------------------
/relation+voc.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/relation+voc.npy
--------------------------------------------------------------------------------
/scpnet.py:
--------------------------------------------------------------------------------
1 | import copy
2 | import os
3 | 
4 | import numpy as np
5 | import torch
6 | from PIL import Image
7 | from torch.cuda.amp import autocast  # type: ignore
8 | from torchvision import transforms
9 | 
10 | from config import cfg
11 | from log import logger
12 | from model import SCPNet, load_clip_model
13 | from utils import COCO_missing_val_dataset, CocoDetection, ModelEma, get_ema_co
14 | 
15 | from randaugment import RandAugment
16 | 
17 | 
18 | class WeakStrongDataset(torch.utils.data.Dataset):  # type: ignore
19 | 
20 |     def __init__(self,
21 |                  root,
22 |                  annFile,
23 |                  transform,
24 |                  target_transform=None,
25 |                  class_num: int = -1):
26 |         self.root = root
27 |         with open(annFile, 'r') as f:
28 |             names = f.readlines()
29 |         self.name = names
30 |         self.transform = transform
31 |         self.class_num = class_num
32 |         self.target_transform = target_transform
33 |         self.strong_transform: transforms.Compose = copy.deepcopy(
34 |             transform)  # type: ignore
35 |         self.strong_transform.transforms.insert(0,
36 |                                                 RandAugment(3,
37 |                                                             5))  # type: ignore
38 | 
39 |     def __getitem__(self, index):
40 |         name = self.name[index]
41 |         path = name.strip('\n').split(',')[0]
42 |         num = name.strip('\n').split(',')[1]
43 |         num = num.strip(' ').split(' ')
44 |         num = np.array([int(i) for i in num])
45 |         label = np.zeros([self.class_num])
46 |         label[num] = 1
47 |         label = torch.tensor(label, dtype=torch.long)
48 |         img = Image.open(os.path.join(self.root, path)).convert('RGB')
49 | 
50 |         img_w = self.transform(img)
51 |         if self.target_transform is not None:
52 |             target = self.target_transform(target)  # type: ignore # noqa
53 |         assert (self.target_transform is None)
54 |         return [index, img_w,
55 |                 self.transform(img),
56 |                 self.strong_transform(img)], label
57 | 
58 |     def __len__(self):
59 |         return len(self.name)
60 | 
61 | 
62 | def build_weak_strong_dataset(train_preprocess,
63 |                               val_preprocess,
64 |                               pin_memory=True):
65 |     if "coco" in cfg.data:
66 |         return build_coco_weak_strong_dataset(train_preprocess, val_preprocess)
67 |     elif "nuswide" in cfg.data:
68 |         return build_nuswide_weak_strong_dataset(train_preprocess,
69 |                                                  val_preprocess)
70 |     elif "voc" in cfg.data:
71 |         return build_voc_weak_strong_dataset(train_preprocess, val_preprocess)
72 |     elif "cub" in cfg.data:
73 |         return build_cub_weak_strong_dataset(train_preprocess, val_preprocess)
74 |     else:
75 |         assert (False)
76 | 
77 | 
78 | def build_coco_weak_strong_dataset(train_preprocess, val_preprocess):
79 | 
80 |     # COCO Data loading
81 |     instances_path_val = os.path.join(cfg.data,
82 |                                       'annotations/instances_val2014.json')
83 |     # instances_path_train = os.path.join(args.data, 'annotations/instances_train2014.json')
84 |     instances_path_train = cfg.dataset
85 | 
86 |     data_path_val = f'{cfg.data}/val2014'  # args.data
87 |     data_path_train = f'{cfg.data}/train2014'  # args.data
88 |     val_dataset = CocoDetection(data_path_val, instances_path_val,
89 |                                 val_preprocess)
90 |     train_dataset = WeakStrongDataset(data_path_train,
91 |                                       instances_path_train,
92 |                                       train_preprocess,
93 |                                       class_num=cfg.num_classes)
94 | 
95 |     # Pytorch Data loader
96 |     train_loader = torch.utils.data.DataLoader(  # type: ignore
97 |         train_dataset,
98 |         batch_size=cfg.batch_size,
99 |         shuffle=True,
100 |         num_workers=cfg.workers,
101 |         pin_memory=True)
102 | 
103 |     val_loader = torch.utils.data.DataLoader(  # type: ignore
104 |         val_dataset,
105 |         batch_size=cfg.batch_size,
106 |         shuffle=False,
107 |         num_workers=cfg.workers,
108 |         pin_memory=False)
109 | 
110 |     return [train_loader, val_loader]
111 | 
112 | 
113 | def build_nuswide_weak_strong_dataset(train_preprocess, val_preprocess):
114 |     # NUS-WIDE data loading
115 | 
instances_path_train = cfg.train_dataset 116 | instances_path_val = cfg.val_dataset 117 | 118 | data_path_val = f'{cfg.data}images' # args.data 119 | data_path_train = f'{cfg.data}images' # args.data 120 | 121 | val_dataset = COCO_missing_val_dataset(data_path_val, 122 | instances_path_val, 123 | val_preprocess, 124 | class_num=cfg.num_classes) 125 | train_dataset = WeakStrongDataset(data_path_train, 126 | instances_path_train, 127 | train_preprocess, 128 | class_num=cfg.num_classes) 129 | # Pytorch Data loader 130 | train_loader = torch.utils.data.DataLoader(train_dataset, 131 | batch_size=cfg.batch_size, 132 | shuffle=True, 133 | num_workers=cfg.workers, 134 | pin_memory=True) 135 | 136 | val_loader = torch.utils.data.DataLoader(val_dataset, 137 | batch_size=cfg.batch_size, 138 | shuffle=False, 139 | num_workers=cfg.workers, 140 | pin_memory=False) 141 | return [train_loader, val_loader] 142 | 143 | 144 | def build_voc_weak_strong_dataset(train_preprocess, val_preprocess): 145 | # VOC Data loading 146 | instances_path_train = cfg.train_dataset 147 | instances_path_val = cfg.val_dataset 148 | 149 | data_path_val = f'{cfg.data}VOC2012/JPEGImages' # args.data 150 | data_path_train = f'{cfg.data}VOC2012/JPEGImages' # args.data 151 | 152 | val_dataset = COCO_missing_val_dataset(data_path_val, 153 | instances_path_val, 154 | val_preprocess, 155 | class_num=cfg.num_classes) 156 | train_dataset = WeakStrongDataset(data_path_train, 157 | instances_path_train, 158 | train_preprocess, 159 | class_num=cfg.num_classes) 160 | # Pytorch Data loader 161 | train_loader = torch.utils.data.DataLoader(train_dataset, 162 | batch_size=cfg.batch_size, 163 | shuffle=True, 164 | num_workers=cfg.workers, 165 | pin_memory=True) 166 | 167 | val_loader = torch.utils.data.DataLoader(val_dataset, 168 | batch_size=cfg.batch_size, 169 | shuffle=False, 170 | num_workers=cfg.workers, 171 | pin_memory=False) 172 | return [train_loader, val_loader] 173 | 174 | 175 | def build_cub_weak_strong_dataset(train_preprocess, val_preprocess): 176 | # CUB Data loading 177 | instances_path_train = cfg.train_dataset 178 | instances_path_val = cfg.val_dataset 179 | 180 | data_path_val = f'{cfg.data}CUB_200_2011/images' # args.data 181 | data_path_train = f'{cfg.data}CUB_200_2011/images' # args.data 182 | 183 | val_dataset = COCO_missing_val_dataset(data_path_val, 184 | instances_path_val, 185 | val_preprocess, 186 | class_num=cfg.num_classes) 187 | train_dataset = WeakStrongDataset(data_path_train, 188 | instances_path_train, 189 | train_preprocess, 190 | class_num=cfg.num_classes) 191 | # Pytorch Data loader 192 | train_loader = torch.utils.data.DataLoader(train_dataset, 193 | batch_size=cfg.batch_size, 194 | shuffle=True, 195 | num_workers=cfg.workers, 196 | pin_memory=True) 197 | 198 | val_loader = torch.utils.data.DataLoader(val_dataset, 199 | batch_size=cfg.batch_size, 200 | shuffle=False, 201 | num_workers=cfg.workers, 202 | pin_memory=False) 203 | return [train_loader, val_loader] 204 | 205 | class SCPNetTrainer(): 206 | 207 | def __init__(self) -> None: 208 | super().__init__() 209 | 210 | clip_model, _ = load_clip_model() 211 | # image_size = clip_model.visual.input_resolution 212 | image_size = cfg.image_size 213 | 214 | normalize = transforms.Normalize((0.48145466, 0.4578275, 0.40821073), 215 | (0.26862954, 0.26130258, 0.27577711)) 216 | 217 | train_preprocess = transforms.Compose([ 218 | transforms.RandomHorizontalFlip(), 219 | transforms.RandomResizedCrop(image_size), 220 | transforms.ToTensor(), normalize 221 | ]) 222 | 
val_preprocess = transforms.Compose([
223 |             transforms.Resize(image_size),
224 |             transforms.CenterCrop(image_size),
225 |             transforms.ToTensor(), normalize
226 |         ])
227 | 
228 |         train_loader, val_loader = build_weak_strong_dataset(
229 |             train_preprocess,  # type: ignore
230 |             val_preprocess)
231 |         self.train_loader = train_loader
232 |         self.val_loader = val_loader
233 | 
234 |         classnames = val_loader.dataset.labels()
235 |         assert (len(classnames) == cfg.num_classes)
236 | 
237 |         self.model = SCPNet(classnames, clip_model)
238 |         self.relation = self.model.relation
239 |         self.classnames = classnames
240 |         for name, param in self.model.named_parameters():
241 |             if "text_encoder" in name:
242 |                 param.requires_grad_(False)
243 | 
244 |         self.model.cuda()
245 |         ema_co = get_ema_co()
246 |         self.ema = ModelEma(self.model, ema_co)  # 0.9997^641=0.82
247 | 
248 |         self.selected_label = torch.zeros(
249 |             (len(self.train_loader.dataset), cfg.num_classes),
250 |             dtype=torch.long,
251 |         )
252 |         self.selected_label = self.selected_label.cuda()
253 |         self.classwise_acc = torch.zeros((cfg.num_classes, )).cuda()
254 |         self.classwise_acc[:] = 1 / cfg.num_classes
255 | 
256 |     def consistency_loss(self, logits_s, logits_w, y_lb):
257 |         logits_w = logits_w.detach()
258 | 
259 |         pseudo_label = torch.sigmoid(logits_w)
260 |         pseudo_label_s = torch.sigmoid(logits_s)
261 | 
262 |         relation_p = pseudo_label @ self.relation.cuda().t()
263 | 
264 |         max_probs, max_idx = torch.topk(pseudo_label, cfg.hard_k, dim=-1)
265 |         threshold = cfg.p_cutoff * (self.classwise_acc[max_idx] /
266 |                                     (2. - self.classwise_acc[max_idx]))
267 |         mask = max_probs.ge(threshold).float().sum(dim=1) >= 1  # keep samples where at least one top-k score clears its class threshold
268 |         labels = torch.zeros((len(logits_s), cfg.num_classes),
269 |                              dtype=torch.long)
270 |         for i, idx in enumerate(max_idx):
271 |             labels[i][idx] = 1
272 |         labels_mask = pseudo_label < cfg.p_cutoff * (
273 |             self.classwise_acc / (2. - self.classwise_acc))
274 |         labels[labels_mask] = 0
275 |         labels = torch.logical_or(labels, y_lb.cpu()).type(torch.long)
276 |         labels = labels.cuda()
277 |         xs_pos = pseudo_label_s
278 |         xs_neg = 1 - pseudo_label_s
279 |         los_pos = labels * torch.log(xs_pos.clamp(min=1e-8))
280 |         los_neg = (1 - labels) * torch.log(xs_neg.clamp(min=1e-8))
281 |         loss = (los_pos + los_neg) * mask.reshape(-1, 1)
282 |         loss_kl = (relation_p * torch.log(xs_pos.clamp(min=1e-8)) + (1 - relation_p) * torch.log(xs_neg.clamp(min=1e-8))) * mask.reshape(-1, 1)
283 |         return -loss.sum() - cfg.kl_lambda * loss_kl.sum(), labels
284 | 
285 |     def train(self, input, target, criterion, epoch, epoch_i) -> torch.Tensor:
286 | 
287 |         x_ulb_idx, x_lb, x_ulb_w, x_ulb_s = input
288 |         y_lb = target
289 | 
290 |         num_lb = x_lb.shape[0]
291 |         num_ulb = x_ulb_w.shape[0]
292 |         assert num_ulb == x_ulb_s.shape[0]
293 | 
294 |         x_lb, x_ulb_w, x_ulb_s = x_lb.cuda(), x_ulb_w.cuda(), x_ulb_s.cuda()
295 |         x_ulb_idx = x_ulb_idx.cuda()
296 | 
297 |         pseudo_counter = self.selected_label.sum(dim=0)
298 |         max_v = pseudo_counter.max().item()
299 |         sum_v = pseudo_counter.sum().item()
300 |         if max_v >= 1:  # update only once at least one pseudo-label has been selected
301 |             for i in range(cfg.num_classes):
302 |                 self.classwise_acc[i] = max(pseudo_counter[i] / max(
303 |                     max_v,
304 |                     cfg.hard_k * len(self.selected_label) - sum_v), 1 / cfg.num_classes)
305 | 
306 |         inputs = torch.cat((x_lb, x_ulb_w, x_ulb_s))
307 | 
308 |         # inference and calculate sup/unsup losses
309 |         with autocast():
310 |             logits = self.model(inputs)
311 |             logits_x_lb = logits[:num_lb]
312 |             logits_x_ulb_w, logits_x_ulb_s = logits[num_lb:].chunk(2)
313 |             logits_x_lb = logits_x_lb.float()
314 |             logits_x_ulb_w, logits_x_ulb_s = logits_x_ulb_w.float(
315 |             ), logits_x_ulb_s.float()
316 | 
317 |             sup_loss, _ = criterion(logits_x_lb, y_lb, epoch)
318 | 
319 |             unsup_loss, labels = self.consistency_loss(logits_x_ulb_s,
320 |                                                        logits_x_ulb_w, y_lb)
321 | 
322 |             assert (labels is not None)
323 |             select_mask = labels.sum(dim=1) >= 1
324 |             if x_ulb_idx[select_mask].nelement() != 0:
325 |                 self.selected_label[
326 |                     x_ulb_idx[select_mask]] = labels[select_mask]
327 | 
328 |         total_loss = sup_loss + cfg.lambda_u * unsup_loss
329 | 
330 |         return total_loss
331 | 
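The class-wise cutoff inside `consistency_loss` follows a FlexMatch-style curriculum: `cfg.p_cutoff` is scaled by how often each class has been selected so far, so rarely-selected classes face a lower bar. A toy sketch with illustrative numbers:

```python
# Hedged sketch of the adaptive threshold (illustrative values, not cfg's).
import torch

p_cutoff = 0.95
classwise_acc = torch.tensor([1.00, 0.50, 0.05])  # per-class selection rates

threshold = p_cutoff * (classwise_acc / (2.0 - classwise_acc))
print(threshold)  # tensor([0.9500, 0.3167, 0.0244])
```

A weak-view prediction for class `c` only yields a pseudo-positive when it exceeds `threshold[c]`; the strong view is then trained against the resulting pseudo-label.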
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | from typing import Tuple
4 | 
5 | import torch
6 | from torch.cuda.amp import GradScaler, autocast  # type: ignore
7 | from torch.optim import lr_scheduler
8 | 
9 | from log import logger
10 | from loss import SPLC
11 | from scpnet import SCPNetTrainer
12 | from utils import AverageMeter, add_weight_decay, mAP
13 | 
14 | from config import cfg  # isort:skip
15 | 
16 | def save_best(trainer, if_ema_better: bool) -> None:
17 |     if if_ema_better:
18 |         torch.save(trainer.ema.module.state_dict(),
19 |                    os.path.join(cfg.checkpoint, 'model-highest.ckpt'))
20 |     else:
21 |         torch.save(trainer.model.state_dict(),
22 |                    os.path.join(cfg.checkpoint, 'model-highest.ckpt'))
23 |     torch.save(trainer.model.state_dict(),
24 |                os.path.join(cfg.checkpoint, 'model-highest-regular.ckpt'))
25 |     torch.save(trainer.ema.module.state_dict(),
26 |                os.path.join(cfg.checkpoint, 'model-highest-ema.ckpt'))
27 | 
28 | def validate(trainer, epoch: int) -> Tuple[float, bool]:
29 | 
30 |     trainer.model.eval()
31 |     logger.info("Start validation...")
32 |     sigmoid = torch.nn.Sigmoid()
33 |     preds_regular = []
34 |     preds_ema = []
35 |     targets = []
36 |     for _, (input, target) in enumerate(trainer.val_loader):
37 |         target = target
38 |         # compute output
39 |         with torch.no_grad():
40 |             with autocast():
41 |                 if cfg.model_name != 'simsiam':
42 |                     output_regular = sigmoid(
43 |                         trainer.model(input.cuda())).cpu()
44 |                     output_ema = sigmoid(
45 |                         trainer.ema.module(input.cuda())).cpu()
46 |                 else:
47 |                     output_regular = sigmoid(
48 |                         trainer.model.module.clip(
49 |                             input.cuda())).cpu()
50 |                     output_ema = sigmoid(
51 |                         trainer.ema.module.module.clip(
52 |                             input.cuda())).cpu()
53 | 
54 |         # for mAP calculation
55 |         preds_regular.append(output_regular.cpu().detach())
56 |         preds_ema.append(output_ema.cpu().detach())
57 |         targets.append(target.cpu().detach())
58 | 
59 |     mAP_score_regular = mAP(
60 |         torch.cat(targets).numpy(),
61 |         torch.cat(preds_regular).numpy())
62 |     mAP_score_ema = mAP(
63 |         torch.cat(targets).numpy(),
64 |         torch.cat(preds_ema).numpy())
65 |     logger.info("mAP score regular {:.2f}, mAP score EMA {:.2f}".format(
66 |         mAP_score_regular, mAP_score_ema))
67 |     mAP_max = max(mAP_score_regular, mAP_score_ema)
68 |     if mAP_score_ema >= mAP_score_regular:
69 |         if_ema_better = True
70 |     else:
71 |         if_ema_better = False
72 | 
73 |     trainer.model.train()
74 |     return mAP_max, if_ema_better
75 | 
76 | def train(trainer) -> None:
77 |     # set optimizer
78 |     criterion = SPLC()
79 |     parameters = add_weight_decay(trainer.model, cfg.weight_decay)
80 |     max_lr = [cfg.lr, cfg.lr, cfg.gcn_lr, cfg.gcn_lr]
81 |     optimizer = torch.optim.Adam(
82 |         params=parameters, lr=cfg.lr,
83 |         weight_decay=0)  # true wd, filter_bias_and_bn
84 |     steps_per_epoch = len(trainer.train_loader)
85 |     scheduler = lr_scheduler.OneCycleLR(  # type: ignore
86 |         optimizer,
87 |         max_lr=max_lr,
88 |         steps_per_epoch=steps_per_epoch,
89 |         epochs=cfg.total_epochs,  # type: ignore
90 |         pct_start=0.2)
91 | 
92 |     highest_mAP = 0
93 |     scaler = GradScaler()
94 |     best_epoch = 0
95 |     for epoch in range(cfg.epochs):
96 |         for i, (input, target) in enumerate(trainer.train_loader):
97 |             target = target.cuda()  # (batch,3,num_classes)
98 |             # target = target.max(dim=1)[0]
99 |             loss = trainer.train(input, target, criterion, epoch, i)
100 | 
101 |             trainer.model.zero_grad()
102 |             scaler.scale(loss).backward()  # type: ignore
103 |             scaler.step(optimizer)
104 |             scaler.update()
105 |             scheduler.step()
106 |             trainer.ema.update(trainer.model)
107 |             if i % 100 == 0:
108 |                 logger.info('Epoch [{}/{}], Step [{}/{}], LR {:.1e}, Loss: {:.1f}'
109 |                             .format(epoch, cfg.epochs, str(i).zfill(3), str(steps_per_epoch).zfill(3),  # noqa
110 |                                     scheduler.get_last_lr()[0], \
111 |                                     loss.item()))
112 | 
113 |         mAP_score, if_ema_better = validate(trainer, epoch)
114 | 
115 |         if mAP_score > highest_mAP:
116 |             highest_mAP = mAP_score
117 |             best_epoch = epoch
118 |             save_best(trainer, if_ema_better)
119 |         logger.info(
120 |             'current_mAP = {:.2f}, highest_mAP = {:.2f}, best_epoch={}\n'.
121 | format(mAP_score, highest_mAP, best_epoch)) 122 | logger.info("Save text embeddings done") 123 | 124 | def test(trainer) -> None: 125 | # get model-highest.ckpt 126 | trainer.model.load_state_dict( 127 | torch.load(f"{cfg.checkpoint}/model-highest.ckpt"), strict=True) 128 | trainer.model.eval() 129 | 130 | logger.info("Start test...") 131 | batch_time = AverageMeter() 132 | prec = AverageMeter() 133 | rec = AverageMeter() 134 | # mAP_meter = AverageMeter() 135 | 136 | sigmoid = torch.nn.Sigmoid() 137 | 138 | end = time.time() 139 | tp, fp, fn, tn, count = 0, 0, 0, 0, 0 140 | preds = [] 141 | targets = [] 142 | for i, (input, target) in enumerate(trainer.val_loader): 143 | target = target 144 | # compute output 145 | with torch.no_grad(): 146 | output = sigmoid(trainer.model(input.cuda())).cpu() 147 | 148 | # for mAP calculation 149 | preds.append(output.cpu()) 150 | targets.append(target.cpu()) 151 | 152 | # measure accuracy and record loss 153 | pred = output.data.gt(cfg.thre).long() 154 | 155 | tp += (pred + target).eq(2).sum(dim=0) 156 | fp += (pred - target).eq(1).sum(dim=0) 157 | fn += (pred - target).eq(-1).sum(dim=0) 158 | tn += (pred + target).eq(0).sum(dim=0) 159 | count += input.size(0) 160 | 161 | this_tp = (pred + target).eq(2).sum() 162 | this_fp = (pred - target).eq(1).sum() 163 | this_fn = (pred - target).eq(-1).sum() 164 | # this_tn = (pred + target).eq(0).sum() 165 | 166 | this_prec = this_tp.float() / (this_tp + this_fp).float( 167 | ) * 100.0 if this_tp + this_fp != 0 else 0.0 168 | this_rec = this_tp.float() / (this_tp + this_fn).float( 169 | ) * 100.0 if this_tp + this_fn != 0 else 0.0 170 | 171 | prec.update(float(this_prec), input.size(0)) 172 | rec.update(float(this_rec), input.size(0)) 173 | 174 | # measure elapsed time 175 | batch_time.update(time.time() - end) 176 | end = time.time() 177 | 178 | p_c = [ 179 | float(tp[i].float() / (tp[i] + fp[i]).float()) * 180 | 100.0 if tp[i] > 0 else 0.0 for i in range(len(tp)) 181 | ] 182 | r_c = [ 183 | float(tp[i].float() / (tp[i] + fn[i]).float()) * 184 | 100.0 if tp[i] > 0 else 0.0 for i in range(len(tp)) 185 | ] 186 | f_c = [ 187 | 2 * p_c[i] * r_c[i] / (p_c[i] + r_c[i]) if tp[i] > 0 else 0.0 188 | for i in range(len(tp)) 189 | ] 190 | 191 | mean_p_c = sum(p_c) / len(p_c) 192 | mean_r_c = sum(r_c) / len(r_c) 193 | mean_f_c = sum(f_c) / len(f_c) 194 | 195 | p_o = tp.sum().float() / (tp + fp).sum().float() * 100.0 196 | r_o = tp.sum().float() / (tp + fn).sum().float() * 100.0 197 | f_o = 2 * p_o * r_o / (p_o + r_o) 198 | 199 | if i % 64 == 0: 200 | logger.info( 201 | 'Test: [{0}/{1}]\t' 202 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 203 | 'Precision {prec.val:.2f} ({prec.avg:.2f})\t' 204 | 'Recall {rec.val:.2f} ({rec.avg:.2f})'.format( 205 | i, 206 | len(trainer.val_loader), 207 | batch_time=batch_time, 208 | prec=prec, 209 | rec=rec)) 210 | logger.info( 211 | 'P_C {:.2f} R_C {:.2f} F_C {:.2f} P_O {:.2f} R_O {:.2f} F_O {:.2f}' 212 | .format(mean_p_c, mean_r_c, mean_f_c, p_o, r_o, f_o)) 213 | 214 | logger.info( 215 | '--------------------------------------------------------------------' 216 | ) 217 | logger.info( 218 | ' * P_C {:.2f} R_C {:.2f} F_C {:.2f} P_O {:.2f} R_O {:.2f} F_O {:.2f}' 219 | .format(mean_p_c, mean_r_c, mean_f_c, p_o, r_o, 220 | f_o)) # type: ignore 221 | 222 | mAP_score = mAP(torch.cat(targets).numpy(), torch.cat(preds).numpy()) 223 | logger.info(f"mAP score: {mAP_score}") 224 | return torch.cat(targets).numpy(), torch.cat(preds).numpy() # type: ignore 225 | 226 | 227 | 228 | def 
main():
229 |     trainer = SCPNetTrainer()
230 |     if cfg.test:
231 |         test(trainer)
232 |     else:
233 |         train(trainer)
234 | 
235 | if __name__ == '__main__':
236 |     main()
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import random
3 | from copy import deepcopy
4 | 
5 | import numpy as np
6 | import torch
7 | from PIL import Image, ImageDraw
8 | from pycocotools.coco import COCO
9 | from torchvision import datasets as datasets
10 | 
11 | from config import cfg
12 | from log import logger
13 | 
14 | 
15 | def average_precision(output, target):
16 |     epsilon = 1e-8
17 | 
18 |     # sort examples
19 |     indices = output.argsort()[::-1]
20 |     # Computes prec@i
21 |     total_count_ = np.cumsum(np.ones((len(output), 1)))
22 | 
23 |     target_ = target[indices]
24 |     ind = target_ == 1
25 |     pos_count_ = np.cumsum(ind)
26 |     total = pos_count_[-1]
27 |     pos_count_[np.logical_not(ind)] = 0  # type: ignore
28 |     pp = pos_count_ / total_count_
29 |     precision_at_i_ = np.sum(pp)
30 |     precision_at_i = precision_at_i_ / (total + epsilon)
31 | 
32 |     return precision_at_i
33 | 
34 | 
35 | def mAP(targs, preds):
36 |     """Return the mean average precision over all classes
37 |     Return:
38 |         mAP (float): scalar equal to 100 * mean of the per-class average precisions
39 |     """
40 | 
41 |     if np.size(preds) == 0:
42 |         return 0
43 |     ap = np.zeros((preds.shape[1]))
44 |     # compute average precision for each class
45 |     for k in range(preds.shape[1]):
46 |         # sort scores
47 |         scores = preds[:, k]
48 |         targets = targs[:, k]
49 |         # compute average precision
50 |         ap[k] = average_precision(scores, targets)
51 |     return 100 * ap.mean()
52 | 
53 | 
54 | class AverageMeter(object):
55 | 
56 |     def __init__(self):
57 |         self.val = None
58 |         self.sum = None
59 |         self.cnt = None
60 |         self.avg = None
61 |         self.ema = None
62 |         self.initialized = False
63 | 
64 |     def update(self, val, n=1):
65 |         if not self.initialized:
66 |             self.initialize(val, n)
67 |         else:
68 |             self.add(val, n)
69 | 
70 |     def initialize(self, val, n):
71 |         self.val = val
72 |         self.sum = val * n
73 |         self.cnt = n
74 |         self.avg = val
75 |         self.ema = val
76 |         self.initialized = True
77 | 
78 |     def add(self, val, n):
79 |         self.val = val
80 |         self.sum += val * n
81 |         self.cnt += n
82 |         self.avg = self.sum / self.cnt
83 |         self.ema = self.ema * 0.99 + self.val * 0.01  # type: ignore
84 | 
85 | 
86 | class CocoDetection(datasets.coco.CocoDetection):
87 | 
88 |     def __init__(self, root, annFile, transform=None, target_transform=None):
89 |         self.root = root
90 |         self.coco = COCO(annFile)
91 | 
92 |         self.ids = list(self.coco.imgToAnns.keys())
93 |         self.transform = transform
94 |         self.target_transform = target_transform
95 |         self.cat2cat = dict()
96 |         for cat in self.coco.cats.keys():
97 |             self.cat2cat[cat] = len(self.cat2cat)
98 | 
99 |     def labels(self):
100 |         return [v["name"] for v in self.coco.cats.values()]
101 | 
102 |     def __getitem__(self, index):
103 |         coco = self.coco
104 |         img_id = self.ids[index]
105 |         ann_ids = coco.getAnnIds(imgIds=img_id)
106 |         target = coco.loadAnns(ann_ids)
107 | 
108 |         output = torch.zeros((3, 80), dtype=torch.long)
109 |         for obj in target:  # type: ignore
110 |             if obj['area'] < 32 * 32:
111 |                 output[0][self.cat2cat[obj['category_id']]] = 1
112 |             elif obj['area'] < 96 * 96:
113 |                 output[1][self.cat2cat[obj['category_id']]] = 1
114 |             else:
115 |                 output[2][self.cat2cat[obj['category_id']]] = 1
116 |         target = output
117 | 
118 |         path =


class CocoDetection(datasets.coco.CocoDetection):

    def __init__(self, root, annFile, transform=None, target_transform=None):
        self.root = root
        self.coco = COCO(annFile)

        self.ids = list(self.coco.imgToAnns.keys())
        self.transform = transform
        self.target_transform = target_transform
        # map the non-contiguous COCO category ids onto contiguous [0, 80)
        self.cat2cat = dict()
        for cat in self.coco.cats.keys():
            self.cat2cat[cat] = len(self.cat2cat)

    def labels(self):
        return [v["name"] for v in self.coco.cats.values()]

    def __getitem__(self, index):
        coco = self.coco
        img_id = self.ids[index]
        ann_ids = coco.getAnnIds(imgIds=img_id)
        target = coco.loadAnns(ann_ids)

        # one row per object-size bucket: small / medium / large
        output = torch.zeros((3, 80), dtype=torch.long)
        for obj in target:  # type: ignore
            if obj['area'] < 32 * 32:
                output[0][self.cat2cat[obj['category_id']]] = 1
            elif obj['area'] < 96 * 96:
                output[1][self.cat2cat[obj['category_id']]] = 1
            else:
                output[2][self.cat2cat[obj['category_id']]] = 1
        target = output

        path = coco.loadImgs(img_id)[0]['file_name']  # type: ignore
        img = Image.open(os.path.join(self.root, path)).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            target = self.target_transform(target)
        # collapse the size buckets into a single multi-label vector
        target = target.max(dim=0)[0]
        return img, target


class COCO_missing_dataset(torch.utils.data.Dataset):  # type: ignore

    def __init__(self,
                 root,
                 annFile,
                 transform=None,
                 target_transform=None,
                 class_num: int = -1):
        self.root = root
        with open(annFile, 'r') as f:
            names = f.readlines()
        self.name = names
        self.transform = transform
        self.class_num = class_num
        self.target_transform = target_transform

    def __getitem__(self, index):
        # parse "<path>,<idx idx ...>" into an image path and label indices
        name = self.name[index]
        path = name.strip('\n').split(',')[0]
        num = name.strip('\n').split(',')[1]
        num = num.strip(' ').split(' ')
        num = np.array([int(i) for i in num])
        label = np.zeros([self.class_num])
        label[num] = 1
        label = torch.tensor(label, dtype=torch.long)
        if not os.path.exists(os.path.join(self.root, path)):
            # a missing image is unrecoverable; fail loudly
            logger.error(f"image not found: {os.path.join(self.root, path)}")
            exit(1)
        img = Image.open(os.path.join(self.root, path)).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        # target_transform is not supported for this dataset
        assert self.target_transform is None
        return [index, img], label

    def __len__(self):
        return len(self.name)

    def labels(self):
        if "coco" in cfg.data:
            # COCO label names come from CocoDetection, not from a text file
            assert False
        elif "nuswide" in cfg.data:
            with open('nuswide_labels.txt', 'r') as f:
                text = f.read()
            return text.split('\n')
        elif "voc" in cfg.data:
            with open('voc_labels.txt', 'r') as f:
                text = f.read()
            return text.split('\n')
        elif "cub" in cfg.data:
            with open('cub_labels.txt', 'r') as f:
                text = f.read()
            return text.split('\n')
        else:
            assert False
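
# Annotation format consumed by COCO_missing_dataset (and the validation
# variant below), as inferred from the parser in __getitem__: one image per
# line, a relative path, a comma, then space-separated indices of the known
# positive labels, e.g. (hypothetical path)
#
#   images/000123.jpg,0 5 17
#
# The indices address the class_num-dimensional label vector; unlisted classes
# stay 0, i.e. unobserved under the incomplete-label setting.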


class COCO_missing_val_dataset(torch.utils.data.Dataset):  # type: ignore

    def __init__(self,
                 root,
                 annFile,
                 transform=None,
                 target_transform=None,
                 class_num: int = -1):
        self.root = root
        with open(annFile, 'r') as f:
            names = f.readlines()
        self.name = names
        self.transform = transform
        self.class_num = class_num
        self.target_transform = target_transform

    def __getitem__(self, index):
        # parse "<path>,<idx idx ...>" into an image path and label indices
        name = self.name[index]
        path = name.strip('\n').split(',')[0]
        num = name.strip('\n').split(',')[1]
        num = num.strip(' ').split(' ')
        num = np.array([int(i) for i in num])
        label = np.zeros([self.class_num])
        label[num] = 1
        label = torch.tensor(label, dtype=torch.long)
        if not os.path.exists(os.path.join(self.root, path)):
            # a missing image is unrecoverable; fail loudly
            logger.error(f"image not found: {os.path.join(self.root, path)}")
            exit(1)
        img = Image.open(os.path.join(self.root, path)).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        # target_transform is not supported for this dataset
        assert self.target_transform is None
        return img, label

    def __len__(self):
        return len(self.name)

    def labels(self):
        if "coco" in cfg.data:
            # COCO label names come from CocoDetection, not from a text file
            assert False
        elif "nuswide" in cfg.data:
            with open('nuswide_labels.txt', 'r') as f:
                text = f.read()
            return text.split('\n')
        elif "voc" in cfg.data:
            with open('voc_labels.txt', 'r') as f:
                text = f.read()
            return text.split('\n')
        elif "cub" in cfg.data:
            with open('cub_labels.txt', 'r') as f:
                text = f.read()
            return text.split('\n')
        else:
            assert False


class ModelEma(torch.nn.Module):

    def __init__(self, model, decay=0.9997, device=None):
        super(ModelEma, self).__init__()
        # make a copy of the model for accumulating moving average of weights
        self.module = deepcopy(model)
        self.module.eval()
        self.decay = decay
        self.device = device  # perform ema on different device from model if set
        if self.device is not None:
            self.module.to(device=device)

    def _update(self, model, update_fn):
        with torch.no_grad():
            for ema_v, model_v in zip(self.module.state_dict().values(),
                                      model.state_dict().values()):
                if self.device is not None:
                    model_v = model_v.to(device=self.device)
                ema_v.copy_(update_fn(ema_v, model_v))

    def update(self, model):
        self._update(model,
                     update_fn=lambda e, m: self.decay * e +
                     (1. - self.decay) * m)

    def set(self, model):
        self._update(model, update_fn=lambda e, m: m)


class CutoutPIL(object):

    def __init__(self, cutout_factor=0.5):
        self.cutout_factor = cutout_factor

    def __call__(self, x):
        img_draw = ImageDraw.Draw(x)
        w, h = x.size  # PIL reports (width, height)
        h_cutout = int(self.cutout_factor * h + 0.5)
        w_cutout = int(self.cutout_factor * w + 0.5)
        y_c = np.random.randint(h)
        x_c = np.random.randint(w)

        y1 = np.clip(y_c - h_cutout // 2, 0, h)
        y2 = np.clip(y_c + h_cutout // 2, 0, h)
        x1 = np.clip(x_c - w_cutout // 2, 0, w)
        x2 = np.clip(x_c + w_cutout // 2, 0, w)
        # fill the box with a random solid color
        fill_color = (random.randint(0, 255), random.randint(0, 255),
                      random.randint(0, 255))
        img_draw.rectangle([x1, y1, x2, y2], fill=fill_color)  # type: ignore

        return x


def add_weight_decay(model, weight_decay=1e-4, skip_list=()):
    decay = []
    no_decay = []
    gcn = []
    gcn_no_decay = []
    prefix = "module." if torch.cuda.device_count() > 1 else ""
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # frozen weights
        if name.startswith(f"{prefix}gc"):
            # gc* parameters only exist in the GCN variants of the model
            assert "gcn" in cfg.model_name
            if len(param.shape) == 1 or name.endswith(
                    ".bias") or name in skip_list:
                gcn_no_decay.append(param)
            else:
                gcn.append(param)
        elif len(param.shape) == 1 or name.endswith(
                ".bias") or name in skip_list:
            no_decay.append(param)
        else:
            decay.append(param)
    return [{
        'params': no_decay,
        'weight_decay': 0.
    }, {
        'params': decay,
        'weight_decay': weight_decay
    }, {
        'params': gcn_no_decay,
        'weight_decay': 0.
    }, {
        'params': gcn,
        'weight_decay': weight_decay
    }]
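
# Usage sketch (added note; `model` and `cfg.lr` stand in for the real
# objects): the groups returned above plug directly into any torch optimizer,
# e.g.
#
#   parameters = add_weight_decay(model, weight_decay=1e-4)
#   optimizer = torch.optim.Adam(params=parameters, lr=cfg.lr)
#
# so biases and 1-D (normalization) weights, plus their GCN counterparts, are
# exempt from weight decay while all other weights decay as usual.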


def get_ema_co():
    # ema_co is chosen so that ema_co ** (N * cfg.ratio) == 0.82, where N is
    # the per-dataset constant below (the number of updates over which the old
    # weights should decay to a factor of 0.82); the commented values show the
    # resulting decay at ratio 1.
    if "coco" in cfg.data:
        ema_co = np.exp(np.log(0.82) / (641 * cfg.ratio))  # type: ignore
        # ema_co = 0.9997
    elif "nus" in cfg.data:
        ema_co = np.exp(np.log(0.82) / (931 * cfg.ratio))  # type: ignore
        # ema_co = 0.9998
    elif "voc" in cfg.data:
        ema_co = np.exp(np.log(0.82) / (45 * cfg.ratio))  # type: ignore
        # ema_co = 0.9956
    elif "cub" in cfg.data:
        if cfg.batch_size == 96:
            ema_co = np.exp(np.log(0.82) / (63 * cfg.ratio))
        else:
            ema_co = np.exp(np.log(0.82) / (47 * cfg.ratio))  # type: ignore
    else:
        assert False
    return ema_co
--------------------------------------------------------------------------------
/voc_labels.txt:
--------------------------------------------------------------------------------
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
--------------------------------------------------------------------------------