├── .gitignore
├── README.md
├── args.py
├── clip
│   ├── __init__.py
│   ├── bpe_simple_vocab_16e6.txt.gz
│   ├── clip.py
│   ├── model.py
│   └── simple_tokenizer.py
├── config.py
├── configs
│   ├── base.yaml
│   ├── scpnet+coco.yaml
│   ├── scpnet+cub.yaml
│   ├── scpnet+nuswide.yaml
│   └── scpnet+voc.yaml
├── cub_labels.txt
├── dataset
│   ├── coco_train_singlelabel.txt
│   ├── cub_train.txt
│   ├── cub_val.txt
│   ├── nus_train_singlelabel.txt
│   ├── nus_val.txt
│   ├── voc_train.txt
│   └── voc_val.txt
├── figures
│   └── overview.png
├── log.py
├── logs
│   ├── scpnet+coco.txt
│   ├── scpnet+cub.txt
│   ├── scpnet+nuswide.txt
│   └── scpnet+voc.txt
├── loss.py
├── model.py
├── nuswide_labels.txt
├── randaugment.py
├── relation+coco.npy
├── relation+cub.npy
├── relation+nuswide.npy
├── relation+voc.npy
├── scpnet.py
├── train.py
├── utils.py
└── voc_labels.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | **/__pycache__
2 | checkpoints
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # [Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels](https://openaccess.thecvf.com/content/CVPR2023/papers/Ding_Exploring_Structured_Semantic_Prior_for_Multi_Label_Recognition_With_Incomplete_CVPR_2023_paper.pdf)
2 |
3 | Official PyTorch Implementation of **SCPNet**, from the following paper:
4 |
5 | [Exploring Structured Semantic Prior
6 | for Multi Label Recognition with Incomplete Labels](https://openaccess.thecvf.com/content/CVPR2023/papers/Ding_Exploring_Structured_Semantic_Prior_for_Multi_Label_Recognition_With_Incomplete_CVPR_2023_paper.pdf). CVPR 2023.
7 |
8 | > Zixuan Ding*, Ao Wang*, Hui Chen†, Qiang Zhang, Pengzhang Liu, Yongjun Bao, Weipeng Yan, Jungong Han
9 | > <br/>Xidian University, Tsinghua University, JD.com
10 |
11 |
12 | **Abstract**
13 |
14 | Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations. In spite of promising performance, they generally overlook the
15 | valuable prior about the label-to-label correspondence. In this paper, we advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior about the label-to-label correspondence via a semantic prior prompter. We then present a novel Semantic Correspondence Prompt Network (SCPNet), which can thoroughly explore the structured semantic prior. A Prior-Enhanced Self-Supervised Learning method is further introduced to enhance the use of the prior. Comprehensive experiments and analyses on several widely used
16 | benchmark datasets show that our method significantly outperforms existing methods on all datasets, well demonstrating the effectiveness and the superiority of our method.
17 |
18 |
19 |
20 |
21 | ![overview](figures/overview.png)
22 |
23 |
24 |
25 |
26 |
27 | ## Credit to previous work
28 | This repository is built upon the codebases of [ASL](https://github.com/Alibaba-MIIL/ASL) and [SPLC](https://github.com/xinyu1205/robust-loss-mlml). Many thanks to their authors!
29 |
30 | ## Performance
31 |
32 | | Dataset | mAP | Ckpt | Log |
33 | |:---: | :---: | :---: | :---: |
34 | | COCO | 76.4 | [scpnet+coco.ckpt](https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+coco.ckpt) | [scpnet+coco.txt](logs/scpnet+coco.txt) |
35 | | VOC | 91.2 | [scpnet+voc.ckpt](https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+voc.ckpt) | [scpnet+voc.txt](logs/scpnet+voc.txt) |
36 | | NUSWIDE | 62.0 | [scpnet+nuswide.ckpt](https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+nuswide.ckpt) | [scpnet+nuswide.txt](logs/scpnet+nuswide.txt) |
37 | | CUB | 25.7 | [scpnet+cub.ckpt](https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+cub.ckpt) | [scpnet+cub.txt](logs/scpnet+cub.txt) |
38 |
39 | ## Training
40 |
41 | ### COCO
42 | ```bash
43 | python train.py -c configs/scpnet+coco.yaml
44 | ```
45 |
46 | ### VOC
47 | ```bash
48 | python train.py -c configs/scpnet+voc.yaml
49 | ```
50 |
51 | ### NUSWIDE
52 | ```bash
53 | python train.py -c configs/scpnet+nuswide.yaml
54 | ```
55 |
56 | ### CUB
57 | ```bash
58 | python train.py -c configs/scpnet+cub.yaml
59 | ```
60 |
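To resume an interrupted run, here is a minimal sketch based on the flags defined in `args.py` (`--resume` restores from `checkpoints/<name>/round<r>`; COCO and round 1 are assumed here):

```bash
python train.py -c configs/scpnet+coco.yaml --resume -r 1
```
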
61 | ## Inference
62 |
63 | > Note: Please place the pretrained checkpoint at `checkpoints/scpnet+coco/round1/model-highest.ckpt` (and likewise for the other datasets); an example is shown below.
64 |
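A minimal sketch for fetching the released COCO checkpoint into that location, using the release asset from the Performance table above:

```bash
# Create the round-1 checkpoint directory expected by config.py
mkdir -p checkpoints/scpnet+coco/round1
# Download the released checkpoint and save it under the expected name
wget https://github.com/jameslahm/SCPNet/releases/download/v1.0/scpnet+coco.ckpt \
     -O checkpoints/scpnet+coco/round1/model-highest.ckpt
```
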
65 | ### COCO
66 | ```bash
67 | python train.py -c configs/scpnet+coco.yaml -t -r 1
68 | ```
69 |
70 | ### VOC
71 | ```bash
72 | python train.py -c configs/scpnet+voc.yaml -t -r 1
73 | ```
74 |
75 | ### NUSWIDE
76 | ```bash
77 | python train.py -c configs/scpnet+nuswide.yaml -t -r 1
78 | ```
79 |
80 | ### CUB
81 | ```bash
82 | python train.py -c configs/scpnet+cub.yaml -t -r 1
83 | ```
84 |
85 | ## Citation
86 | ```bibtex
87 | @inproceedings{ding2023exploring,
88 | title={Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels},
89 | author={Ding, Zixuan and Wang, Ao and Chen, Hui and Zhang, Qiang and Liu, Pengzhang and Bao, Yongjun and Yan, Weipeng and Han, Jungong},
90 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
91 | pages={3398--3407},
92 | year={2023}
93 | }
94 | ```
--------------------------------------------------------------------------------
/args.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | parser = argparse.ArgumentParser(description='PyTorch MS_COCO Training')
4 | parser.add_argument('-c',
5 | '--config-file',
6 | help='config file',
7 | default='configs/base.yaml',
8 | type=str)
9 | parser.add_argument('-t',
10 | '--test',
11 | help='run test',
12 | default=False,
13 | action="store_true")
14 | parser.add_argument('-r', '--round', help='round', default=1, type=int)
15 | parser.add_argument('--resume', default=False, action='store_true')
16 | args = parser.parse_args()
17 |
--------------------------------------------------------------------------------
/clip/__init__.py:
--------------------------------------------------------------------------------
1 | from .clip import *
2 |
--------------------------------------------------------------------------------
/clip/bpe_simple_vocab_16e6.txt.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/clip/bpe_simple_vocab_16e6.txt.gz
--------------------------------------------------------------------------------
/clip/clip.py:
--------------------------------------------------------------------------------
1 | import hashlib
2 | import os
3 | import urllib
4 | import warnings
5 | from typing import List, Union
6 |
7 | import torch
8 | from PIL import Image
9 | from torchvision.transforms import (CenterCrop, Compose, Normalize, Resize,
10 | ToTensor)
11 | from tqdm import tqdm
12 |
13 | from .model import build_model
14 | from .simple_tokenizer import SimpleTokenizer as _Tokenizer
15 |
16 | try:
17 | from torchvision.transforms import InterpolationMode
18 | BICUBIC = InterpolationMode.BICUBIC
19 | except ImportError:
20 | BICUBIC = Image.BICUBIC
21 |
22 | if tuple(int(p) for p in torch.__version__.split("+")[0].split(".")[:3] if p.isdigit()) < (1, 7, 1):  # compare numerically, not lexicographically
23 | warnings.warn("PyTorch version 1.7.1 or higher is recommended")
24 |
25 | __all__ = ["available_models", "load", "tokenize"]
26 | _tokenizer = _Tokenizer()
27 |
28 | _MODELS = {
29 | "RN50":
30 | "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
31 | "RN101":
32 | "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
33 | "RN50x4":
34 | "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
35 | "RN50x16":
36 | "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
37 | "ViT-B/32":
38 | "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
39 | "ViT-B/16":
40 | "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
41 | }
42 |
43 |
44 | def _download(url: str, root: str = os.path.expanduser("~/.cache/clip")):
45 | os.makedirs(root, exist_ok=True)
46 | filename = os.path.basename(url)
47 |
48 | expected_sha256 = url.split("/")[-2]
49 | download_target = os.path.join(root, filename)
50 |
51 | if os.path.exists(download_target) and not os.path.isfile(download_target):
52 | raise RuntimeError(
53 | f"{download_target} exists and is not a regular file")
54 |
55 | if os.path.isfile(download_target):
56 | if hashlib.sha256(open(download_target,
57 | "rb").read()).hexdigest() == expected_sha256:
58 | return download_target
59 | else:
60 | warnings.warn(
61 | f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file"
62 | )
63 |
64 | with urllib.request.urlopen(url) as source, open(download_target,
65 | "wb") as output:
66 | with tqdm(total=int(source.info().get("Content-Length")),
67 | ncols=80,
68 | unit='iB',
69 | unit_scale=True) as loop:
70 | while True:
71 | buffer = source.read(8192)
72 | if not buffer:
73 | break
74 |
75 | output.write(buffer)
76 | loop.update(len(buffer))
77 |
78 | if hashlib.sha256(open(download_target,
79 | "rb").read()).hexdigest() != expected_sha256:
80 | raise RuntimeError(
81 | "Model has been downloaded but the SHA256 checksum does not not match"
82 | )
83 |
84 | return download_target
85 |
86 |
87 | def _transform(n_px):
88 | return Compose([
89 | Resize(n_px, interpolation=BICUBIC),
90 | CenterCrop(n_px),
91 | lambda image: image.convert("RGB"),
92 | ToTensor(),
93 | Normalize((0.48145466, 0.4578275, 0.40821073),
94 | (0.26862954, 0.26130258, 0.27577711)),
95 | ])
96 |
97 |
98 | def available_models() -> List[str]:
99 | """Returns the names of available CLIP models"""
100 | return list(_MODELS.keys())
101 |
102 |
103 | def load(name: str,
104 | device: Union[str, torch.device] = "cuda"
105 | if torch.cuda.is_available() else "cpu",
106 | jit=False):
107 | """Load a CLIP model
108 |
109 | Parameters
110 | ----------
111 | name : str
112 | A model name listed by `clip.available_models()`, or the path to a model checkpoint containing the state_dict
113 |
114 | device : Union[str, torch.device]
115 | The device to put the loaded model
116 |
117 | jit : bool
118 | Whether to load the optimized JIT model or more hackable non-JIT model (default).
119 |
120 | Returns
121 | -------
122 | model : torch.nn.Module
123 | The CLIP model
124 |
125 | preprocess : Callable[[PIL.Image], torch.Tensor]
126 | A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
127 | """
128 | if name in _MODELS:
129 | model_path = _download(_MODELS[name])
130 | elif os.path.isfile(name):
131 | model_path = name
132 | else:
133 | raise RuntimeError(
134 | f"Model {name} not found; available models = {available_models()}")
135 |
136 | try:
137 | # loading JIT archive
138 | model = torch.jit.load(model_path,
139 | map_location=device if jit else "cpu").eval()
140 | state_dict = None
141 | except RuntimeError:
142 | # loading saved state dict
143 | if jit:
144 | warnings.warn(
145 | f"File {model_path} is not a JIT archive. Loading as a state dict instead"
146 | )
147 | jit = False
148 | state_dict = torch.load(model_path, map_location="cpu")
149 |
150 | if not jit:
151 | model = build_model(state_dict or model.state_dict()).to(device)
152 | if str(device) == "cpu":
153 | model.float()
154 | return model, _transform(model.visual.input_resolution)
155 |
156 | # patch the device names
157 | device_holder = torch.jit.trace(
158 | lambda: torch.ones([]).to(torch.device(device)), example_inputs=[])
159 | device_node = [
160 | n for n in device_holder.graph.findAllNodes("prim::Constant")
161 | if "Device" in repr(n)
162 | ][-1]
163 |
164 | def patch_device(module):
165 | try:
166 | graphs = [module.graph] if hasattr(module, "graph") else []
167 | except RuntimeError:
168 | graphs = []
169 |
170 | if hasattr(module, "forward1"):
171 | graphs.append(module.forward1.graph)
172 |
173 | for graph in graphs:
174 | for node in graph.findAllNodes("prim::Constant"):
175 | if "value" in node.attributeNames() and str(
176 | node["value"]).startswith("cuda"):
177 | node.copyAttributes(device_node)
178 |
179 | model.apply(patch_device)
180 | patch_device(model.encode_image)
181 | patch_device(model.encode_text)
182 |
183 | # patch dtype to float32 on CPU
184 | if str(device) == "cpu":
185 | float_holder = torch.jit.trace(lambda: torch.ones([]).float(),
186 | example_inputs=[])
187 | float_input = list(float_holder.graph.findNode("aten::to").inputs())[1]
188 | float_node = float_input.node()
189 |
190 | def patch_float(module):
191 | try:
192 | graphs = [module.graph] if hasattr(module, "graph") else []
193 | except RuntimeError:
194 | graphs = []
195 |
196 | if hasattr(module, "forward1"):
197 | graphs.append(module.forward1.graph)
198 |
199 | for graph in graphs:
200 | for node in graph.findAllNodes("aten::to"):
201 | inputs = list(node.inputs())
202 | for i in [
203 | 1, 2
204 | ]: # dtype can be the second or third argument to aten::to()
205 | if inputs[i].node()["value"] == 5:
206 | inputs[i].node().copyAttributes(float_node)
207 |
208 | model.apply(patch_float)
209 | patch_float(model.encode_image)
210 | patch_float(model.encode_text)
211 |
212 | model.float()
213 |
214 | return model, _transform(model.input_resolution.item())
215 |
216 |
217 | def tokenize(texts: Union[str, List[str]],
218 | context_length: int = 77,
219 | truncate: bool = False) -> torch.LongTensor:
220 | """
221 | Returns the tokenized representation of given input string(s)
222 |
223 | Parameters
224 | ----------
225 | texts : Union[str, List[str]]
226 | An input string or a list of input strings to tokenize
227 |
228 | context_length : int
229 | The context length to use; all CLIP models use 77 as the context length
230 |
231 | truncate: bool
232 | Whether to truncate the text in case its encoding is longer than the context length
233 |
234 | Returns
235 | -------
236 | A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length]
237 | """
238 | if isinstance(texts, str):
239 | texts = [texts]
240 |
241 | sot_token = _tokenizer.encoder["<|startoftext|>"]
242 | eot_token = _tokenizer.encoder["<|endoftext|>"]
243 | all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token]
244 | for text in texts]
245 | result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)
246 |
247 | for i, tokens in enumerate(all_tokens):
248 | if len(tokens) > context_length:
249 | if truncate:
250 | tokens = tokens[:context_length]
251 | tokens[-1] = eot_token
252 | else:
253 | raise RuntimeError(
254 | f"Input {texts[i]} is too long for context length {context_length}"
255 | )
256 | result[i, :len(tokens)] = torch.tensor(tokens)
257 |
258 | return result
259 |
--------------------------------------------------------------------------------
/clip/model.py:
--------------------------------------------------------------------------------
1 | from collections import OrderedDict
2 | from typing import Tuple, Union
3 |
4 | import numpy as np
5 | import torch
6 | import torch.nn.functional as F
7 | from torch import nn
8 |
9 |
10 | class Bottleneck(nn.Module):
11 | expansion = 4
12 |
13 | def __init__(self, inplanes, planes, stride=1):
14 | super().__init__()
15 |
16 | # all conv layers have stride 1. an avgpool is performed after the second convolution when stride > 1
17 | self.conv1 = nn.Conv2d(inplanes, planes, 1, bias=False)
18 | self.bn1 = nn.BatchNorm2d(planes)
19 |
20 | self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
21 | self.bn2 = nn.BatchNorm2d(planes)
22 |
23 | self.avgpool = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
24 |
25 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, 1, bias=False)
26 | self.bn3 = nn.BatchNorm2d(planes * self.expansion)
27 |
28 | self.relu = nn.ReLU(inplace=True)
29 | self.downsample = None
30 | self.stride = stride
31 |
32 | if stride > 1 or inplanes != planes * Bottleneck.expansion:
33 | # downsampling layer is prepended with an avgpool, and the subsequent convolution has stride 1
34 | self.downsample = nn.Sequential(
35 | OrderedDict([("-1", nn.AvgPool2d(stride)),
36 | ("0",
37 | nn.Conv2d(inplanes,
38 | planes * self.expansion,
39 | 1,
40 | stride=1,
41 | bias=False)),
42 | ("1", nn.BatchNorm2d(planes * self.expansion))]))
43 |
44 | def forward(self, x: torch.Tensor):
45 | identity = x
46 |
47 | out = self.relu(self.bn1(self.conv1(x)))
48 | out = self.relu(self.bn2(self.conv2(out)))
49 | out = self.avgpool(out)
50 | out = self.bn3(self.conv3(out))
51 |
52 | if self.downsample is not None:
53 | identity = self.downsample(x)
54 |
55 | out += identity
56 | out = self.relu(out)
57 | return out
58 |
59 |
60 | class AttentionPool2d(nn.Module):
61 |
62 | def __init__(self,
63 | spacial_dim: int,
64 | embed_dim: int,
65 | num_heads: int,
66 | output_dim: int = None):
67 | super().__init__()
68 | self.positional_embedding = nn.Parameter(
69 | torch.randn(spacial_dim**2 + 1, embed_dim) / embed_dim**0.5)
70 | self.k_proj = nn.Linear(embed_dim, embed_dim)
71 | self.q_proj = nn.Linear(embed_dim, embed_dim)
72 | self.v_proj = nn.Linear(embed_dim, embed_dim)
73 | self.c_proj = nn.Linear(embed_dim, output_dim or embed_dim)
74 | self.num_heads = num_heads
75 |
76 | def forward(self, x):
77 | x = x.reshape(x.shape[0], x.shape[1],
78 | x.shape[2] * x.shape[3]).permute(2, 0,
79 | 1) # NCHW -> (HW)NC
80 | x = torch.cat([x.mean(dim=0, keepdim=True), x], dim=0) # (HW+1)NC
81 | x = x + self.positional_embedding[:, None, :].to(x.dtype) # (HW+1)NC
82 | x, _ = F.multi_head_attention_forward(
83 | query=x,
84 | key=x,
85 | value=x,
86 | embed_dim_to_check=x.shape[-1],
87 | num_heads=self.num_heads,
88 | q_proj_weight=self.q_proj.weight,
89 | k_proj_weight=self.k_proj.weight,
90 | v_proj_weight=self.v_proj.weight,
91 | in_proj_weight=None,
92 | in_proj_bias=torch.cat(
93 | [self.q_proj.bias, self.k_proj.bias, self.v_proj.bias]),
94 | bias_k=None,
95 | bias_v=None,
96 | add_zero_attn=False,
97 | dropout_p=0,
98 | out_proj_weight=self.c_proj.weight,
99 | out_proj_bias=self.c_proj.bias,
100 | use_separate_proj_weight=True,
101 | training=self.training,
102 | need_weights=False)
103 |
104 | return x[0]
105 |
106 |
107 | class ModifiedResNet(nn.Module):
108 | """
109 | A ResNet class that is similar to torchvision's but contains the following changes:
110 | - There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool.
111 | - Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1
112 | - The final pooling layer is a QKV attention instead of an average pool
113 | """
114 |
115 | def __init__(self,
116 | layers,
117 | output_dim,
118 | heads,
119 | input_resolution=224,
120 | width=64):
121 | super().__init__()
122 | self.output_dim = output_dim
123 | self.input_resolution = input_resolution
124 |
125 | # the 3-layer stem
126 | self.conv1 = nn.Conv2d(3,
127 | width // 2,
128 | kernel_size=3,
129 | stride=2,
130 | padding=1,
131 | bias=False)
132 | self.bn1 = nn.BatchNorm2d(width // 2)
133 | self.conv2 = nn.Conv2d(width // 2,
134 | width // 2,
135 | kernel_size=3,
136 | padding=1,
137 | bias=False)
138 | self.bn2 = nn.BatchNorm2d(width // 2)
139 | self.conv3 = nn.Conv2d(width // 2,
140 | width,
141 | kernel_size=3,
142 | padding=1,
143 | bias=False)
144 | self.bn3 = nn.BatchNorm2d(width)
145 | self.avgpool = nn.AvgPool2d(2)
146 | self.relu = nn.ReLU(inplace=True)
147 |
148 | # residual layers
149 | self._inplanes = width # this is a *mutable* variable used during construction
150 | self.layer1 = self._make_layer(width, layers[0])
151 | self.layer2 = self._make_layer(width * 2, layers[1], stride=2)
152 | self.layer3 = self._make_layer(width * 4, layers[2], stride=2)
153 | self.layer4 = self._make_layer(width * 8, layers[3], stride=2)
154 |
155 | embed_dim = width * 32 # the ResNet feature dimension
156 | self.attnpool = AttentionPool2d(input_resolution // 32, embed_dim,
157 | heads, output_dim)
158 |
159 | def _make_layer(self, planes, blocks, stride=1):
160 | layers = [Bottleneck(self._inplanes, planes, stride)]
161 |
162 | self._inplanes = planes * Bottleneck.expansion
163 | for _ in range(1, blocks):
164 | layers.append(Bottleneck(self._inplanes, planes))
165 |
166 | return nn.Sequential(*layers)
167 |
168 | def forward(self, x):
169 |
170 | def stem(x):
171 | for conv, bn in [(self.conv1, self.bn1), (self.conv2, self.bn2),
172 | (self.conv3, self.bn3)]:
173 | x = self.relu(bn(conv(x)))
174 | x = self.avgpool(x)
175 | return x
176 |
177 | x = x.type(self.conv1.weight.dtype)
178 | x = stem(x)
179 | x = self.layer1(x)
180 | x = self.layer2(x)
181 | x = self.layer3(x)
182 | x = self.layer4(x)
183 | x = self.attnpool(x)
184 |
185 | return x
186 |
187 | def get_features(self, x):
188 | def stem(x):
189 | for conv, bn in [(self.conv1, self.bn1), (self.conv2, self.bn2),
190 | (self.conv3, self.bn3)]:
191 | x = self.relu(bn(conv(x)))
192 | x = self.avgpool(x)
193 | return x
194 |
195 | x = x.type(self.conv1.weight.dtype)
196 | x = stem(x)
197 | x = self.layer1(x)
198 | x = self.layer2(x)
199 | x = self.layer3(x)
200 | x = self.layer4(x)
201 | return x
202 |
203 | class LayerNorm(nn.LayerNorm):
204 | """Subclass torch's LayerNorm to handle fp16."""
205 |
206 | def forward(self, x: torch.Tensor):
207 | orig_type = x.dtype
208 | ret = super().forward(x.type(torch.float32))
209 | return ret.type(orig_type)
210 |
211 |
212 | class QuickGELU(nn.Module):
213 |
214 | def forward(self, x: torch.Tensor):
215 | return x * torch.sigmoid(1.702 * x)
216 |
217 |
218 | class ResidualAttentionBlock(nn.Module):
219 |
220 | def __init__(self,
221 | d_model: int,
222 | n_head: int,
223 | attn_mask: torch.Tensor = None):
224 | super().__init__()
225 |
226 | self.attn = nn.MultiheadAttention(d_model, n_head)
227 | self.ln_1 = LayerNorm(d_model)
228 | self.mlp = nn.Sequential(
229 | OrderedDict([("c_fc", nn.Linear(d_model, d_model * 4)),
230 | ("gelu", QuickGELU()),
231 | ("c_proj", nn.Linear(d_model * 4, d_model))]))
232 | self.ln_2 = LayerNorm(d_model)
233 | self.attn_mask = attn_mask
234 |
235 | def attention(self, x: torch.Tensor):
236 | self.attn_mask = self.attn_mask.to(
237 | dtype=x.dtype,
238 | device=x.device) if self.attn_mask is not None else None
239 | return self.attn(x, x, x, need_weights=False,
240 | attn_mask=self.attn_mask)[0]
241 |
242 | def forward(self, x: torch.Tensor):
243 | x = x + self.attention(self.ln_1(x))
244 | x = x + self.mlp(self.ln_2(x))
245 | return x
246 |
247 |
248 | class Transformer(nn.Module):
249 |
250 | def __init__(self,
251 | width: int,
252 | layers: int,
253 | heads: int,
254 | attn_mask: torch.Tensor = None):
255 | super().__init__()
256 | self.width = width
257 | self.layers = layers
258 | self.resblocks = nn.Sequential(*[
259 | ResidualAttentionBlock(width, heads, attn_mask)
260 | for _ in range(layers)
261 | ])
262 |
263 | def forward(self, x: torch.Tensor):
264 | return self.resblocks(x)
265 |
266 |
267 | class VisionTransformer(nn.Module):
268 |
269 | def __init__(self, input_resolution: int, patch_size: int, width: int,
270 | layers: int, heads: int, output_dim: int):
271 | super().__init__()
272 | self.input_resolution = input_resolution
273 | self.output_dim = output_dim
274 | self.conv1 = nn.Conv2d(in_channels=3,
275 | out_channels=width,
276 | kernel_size=patch_size,
277 | stride=patch_size,
278 | bias=False)
279 |
280 | scale = width**-0.5
281 | self.class_embedding = nn.Parameter(scale * torch.randn(width))
282 | self.positional_embedding = nn.Parameter(scale * torch.randn(
283 | (input_resolution // patch_size)**2 + 1, width))
284 | self.ln_pre = LayerNorm(width)
285 |
286 | self.transformer = Transformer(width, layers, heads)
287 |
288 | self.ln_post = LayerNorm(width)
289 | self.proj = nn.Parameter(scale * torch.randn(width, output_dim))
290 |
291 | def forward(self, x: torch.Tensor):
292 | x = self.conv1(x) # shape = [*, width, grid, grid]
293 | x = x.reshape(x.shape[0], x.shape[1],
294 | -1) # shape = [*, width, grid ** 2]
295 | x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width]
296 | x = torch.cat([
297 | self.class_embedding.to(x.dtype) + torch.zeros(
298 | x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x
299 | ],
300 | dim=1) # shape = [*, grid ** 2 + 1, width] # noqa
301 | x = x + self.positional_embedding.to(x.dtype)
302 | x = self.ln_pre(x)
303 |
304 | x = x.permute(1, 0, 2) # NLD -> LND
305 | x = self.transformer(x)
306 | x = x.permute(1, 0, 2) # LND -> NLD
307 |
308 | x = self.ln_post(x[:, 0, :])
309 |
310 | if self.proj is not None:
311 | x = x @ self.proj
312 |
313 | return x
314 |
315 |
316 | class CLIP(nn.Module):
317 |
318 | def __init__(
319 | self,
320 | embed_dim: int,
321 | # vision
322 | image_resolution: int,
323 | vision_layers: Union[Tuple[int, int, int, int], int],
324 | vision_width: int,
325 | vision_patch_size: int,
326 | # text
327 | context_length: int,
328 | vocab_size: int,
329 | transformer_width: int,
330 | transformer_heads: int,
331 | transformer_layers: int):
332 | super().__init__()
333 |
334 | self.context_length = context_length
335 |
336 | if isinstance(vision_layers, (tuple, list)):
337 | vision_heads = vision_width * 32 // 64
338 | self.visual = ModifiedResNet(layers=vision_layers,
339 | output_dim=embed_dim,
340 | heads=vision_heads,
341 | input_resolution=image_resolution,
342 | width=vision_width)
343 | else:
344 | vision_heads = vision_width // 64
345 | self.visual = VisionTransformer(input_resolution=image_resolution,
346 | patch_size=vision_patch_size,
347 | width=vision_width,
348 | layers=vision_layers,
349 | heads=vision_heads,
350 | output_dim=embed_dim)
351 |
352 | self.transformer = Transformer(width=transformer_width,
353 | layers=transformer_layers,
354 | heads=transformer_heads,
355 | attn_mask=self.build_attention_mask())
356 |
357 | self.vocab_size = vocab_size
358 | self.token_embedding = nn.Embedding(vocab_size, transformer_width)
359 | self.positional_embedding = nn.Parameter(
360 | torch.empty(self.context_length, transformer_width))
361 | self.ln_final = LayerNorm(transformer_width)
362 |
363 | self.text_projection = nn.Parameter(
364 | torch.empty(transformer_width, embed_dim))
365 | self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
366 |
367 | self.initialize_parameters()
368 |
369 | def initialize_parameters(self):
370 | nn.init.normal_(self.token_embedding.weight, std=0.02)
371 | nn.init.normal_(self.positional_embedding, std=0.01)
372 |
373 | if isinstance(self.visual, ModifiedResNet):
374 | if self.visual.attnpool is not None:
375 | std = self.visual.attnpool.c_proj.in_features**-0.5
376 | nn.init.normal_(self.visual.attnpool.q_proj.weight, std=std)
377 | nn.init.normal_(self.visual.attnpool.k_proj.weight, std=std)
378 | nn.init.normal_(self.visual.attnpool.v_proj.weight, std=std)
379 | nn.init.normal_(self.visual.attnpool.c_proj.weight, std=std)
380 |
381 | for resnet_block in [
382 | self.visual.layer1, self.visual.layer2, self.visual.layer3,
383 | self.visual.layer4
384 | ]:
385 | for name, param in resnet_block.named_parameters():
386 | if name.endswith("bn3.weight"):
387 | nn.init.zeros_(param)
388 |
389 | proj_std = (self.transformer.width**-0.5) * (
390 | (2 * self.transformer.layers)**-0.5)
391 | attn_std = self.transformer.width**-0.5
392 | fc_std = (2 * self.transformer.width)**-0.5
393 | for block in self.transformer.resblocks:
394 | nn.init.normal_(block.attn.in_proj_weight, std=attn_std)
395 | nn.init.normal_(block.attn.out_proj.weight, std=proj_std)
396 | nn.init.normal_(block.mlp.c_fc.weight, std=fc_std)
397 | nn.init.normal_(block.mlp.c_proj.weight, std=proj_std)
398 |
399 | if self.text_projection is not None:
400 | nn.init.normal_(self.text_projection,
401 | std=self.transformer.width**-0.5)
402 |
403 | def build_attention_mask(self):
404 | # lazily create causal attention mask, with full attention between the vision tokens
405 | # pytorch uses additive attention mask; fill with -inf
406 | mask = torch.empty(self.context_length, self.context_length)
407 | mask.fill_(float("-inf"))
408 | mask.triu_(1)  # zero out the lower triangle (causal mask)
409 | return mask
410 |
411 | @property
412 | def dtype(self):
413 | return self.visual.conv1.weight.dtype
414 |
415 | def encode_image(self, image):
416 | return self.visual(image.type(self.dtype))
417 |
418 | def encode_text(self, text):
419 | x = self.token_embedding(text).type(
420 | self.dtype) # [batch_size, n_ctx, d_model]
421 |
422 | x = x + self.positional_embedding.type(self.dtype)
423 | x = x.permute(1, 0, 2) # NLD -> LND
424 | x = self.transformer(x)
425 | x = x.permute(1, 0, 2) # LND -> NLD
426 | x = self.ln_final(x).type(self.dtype)
427 |
428 | # x.shape = [batch_size, n_ctx, transformer.width]
429 | # take features from the eot embedding (eot_token is the highest number in each sequence)
430 | x = x[torch.arange(x.shape[0]),
431 | text.argmax(dim=-1)] @ self.text_projection
432 |
433 | return x
434 |
435 | def forward(self, image, text):
436 | image_features = self.encode_image(image)
437 | text_features = self.encode_text(text)
438 |
439 | # normalized features
440 | image_features = image_features / image_features.norm(dim=-1,
441 | keepdim=True)
442 | text_features = text_features / text_features.norm(dim=-1,
443 | keepdim=True)
444 |
445 | # cosine similarity as logits
446 | logit_scale = self.logit_scale.exp()
447 | logits_per_image = logit_scale * image_features @ text_features.t()
448 | logits_per_text = logit_scale * text_features @ image_features.t()
449 |
450 | # shape = [global_batch_size, global_batch_size]
451 | return logits_per_image, logits_per_text
452 |
453 |
454 | def convert_weights(model: nn.Module):
455 | """Convert applicable model parameters to fp16"""
456 |
457 | def _convert_weights_to_fp16(l):
458 | if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
459 | l.weight.data = l.weight.data.half()
460 | if l.bias is not None:
461 | l.bias.data = l.bias.data.half()
462 |
463 | if isinstance(l, nn.MultiheadAttention):
464 | for attr in [
465 | *[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]],
466 | "in_proj_bias", "bias_k", "bias_v"
467 | ]:
468 | tensor = getattr(l, attr)
469 | if tensor is not None:
470 | tensor.data = tensor.data.half()
471 |
472 | for name in ["text_projection", "proj"]:
473 | if hasattr(l, name):
474 | attr = getattr(l, name)
475 | if attr is not None:
476 | attr.data = attr.data.half()
477 |
478 | model.apply(_convert_weights_to_fp16)
479 |
480 |
481 | def build_model(state_dict: dict):
482 | vit = "visual.proj" in state_dict
483 |
484 | if vit:
485 | vision_width = state_dict["visual.conv1.weight"].shape[0]
486 | vision_layers = len([
487 | k for k in state_dict.keys()
488 | if k.startswith("visual.") and k.endswith(".attn.in_proj_weight")
489 | ])
490 | vision_patch_size = state_dict["visual.conv1.weight"].shape[-1]
491 | grid_size = round(
492 | (state_dict["visual.positional_embedding"].shape[0] - 1)**0.5)
493 | image_resolution = vision_patch_size * grid_size
494 | else:
495 | counts: list = [
496 | len(
497 | set(
498 | k.split(".")[2] for k in state_dict
499 | if k.startswith(f"visual.layer{b}")))
500 | for b in [1, 2, 3, 4]
501 | ]
502 | vision_layers = tuple(counts)
503 | vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
504 | output_width = round(
505 | (state_dict["visual.attnpool.positional_embedding"].shape[0] -
506 | 1)**0.5)
507 | vision_patch_size = None
508 | assert output_width**2 + 1 == state_dict[
509 | "visual.attnpool.positional_embedding"].shape[0]
510 | image_resolution = output_width * 32
511 |
512 | embed_dim = state_dict["text_projection"].shape[1]
513 | context_length = state_dict["positional_embedding"].shape[0]
514 | vocab_size = state_dict["token_embedding.weight"].shape[0]
515 | transformer_width = state_dict["ln_final.weight"].shape[0]
516 | transformer_heads = transformer_width // 64
517 | transformer_layers = len(
518 | set(
519 | k.split(".")[2] for k in state_dict
520 | if k.startswith("transformer.resblocks")))
521 |
522 | model = CLIP(embed_dim, image_resolution, vision_layers, vision_width,
523 | vision_patch_size, context_length, vocab_size,
524 | transformer_width, transformer_heads, transformer_layers)
525 |
526 | for key in ["input_resolution", "context_length", "vocab_size"]:
527 | if key in state_dict:
528 | del state_dict[key]
529 |
530 | convert_weights(model)
531 | model.load_state_dict(state_dict)
532 | return model.eval()
533 |
--------------------------------------------------------------------------------
/clip/simple_tokenizer.py:
--------------------------------------------------------------------------------
1 | import gzip
2 | import html
3 | import os
4 | from functools import lru_cache
5 |
6 | import ftfy
7 | import regex as re
8 |
9 |
10 | @lru_cache()
11 | def default_bpe():
12 | return os.path.join(os.path.dirname(os.path.abspath(__file__)),
13 | "bpe_simple_vocab_16e6.txt.gz")
14 |
15 |
16 | @lru_cache()
17 | def bytes_to_unicode():
18 | """
19 | Returns a list of utf-8 bytes and a corresponding list of unicode strings.
20 | The reversible bpe codes work on unicode strings.
21 | This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
22 | When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
23 | This is a significant percentage of your normal, say, 32K bpe vocab.
24 | To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
25 | And avoids mapping to whitespace/control characters the bpe code barfs on.
26 | """
27 | bs = list(range(ord("!"),
28 | ord("~") + 1)) + list(range(
29 | ord("¡"),
30 | ord("¬") + 1)) + list(range(ord("®"),
31 | ord("ÿ") + 1))
32 | cs = bs[:]
33 | n = 0
34 | for b in range(2**8):
35 | if b not in bs:
36 | bs.append(b)
37 | cs.append(2**8 + n)
38 | n += 1
39 | cs = [chr(n) for n in cs]
40 | return dict(zip(bs, cs))
41 |
42 |
43 | def get_pairs(word):
44 | """Return set of symbol pairs in a word.
45 | Word is represented as tuple of symbols (symbols being variable-length strings).
46 | """
47 | pairs = set()
48 | prev_char = word[0]
49 | for char in word[1:]:
50 | pairs.add((prev_char, char))
51 | prev_char = char
52 | return pairs
53 |
54 |
55 | def basic_clean(text):
56 | text = ftfy.fix_text(text)
57 | text = html.unescape(html.unescape(text))
58 | return text.strip()
59 |
60 |
61 | def whitespace_clean(text):
62 | text = re.sub(r'\s+', ' ', text)
63 | text = text.strip()
64 | return text
65 |
66 |
67 | class SimpleTokenizer(object):
68 |
69 | def __init__(self, bpe_path: str = default_bpe()):
70 | self.byte_encoder = bytes_to_unicode()
71 | self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
72 | merges = gzip.open(bpe_path).read().decode("utf-8").split('\n')
73 | merges = merges[1:49152 - 256 - 2 + 1]
74 | merges = [tuple(merge.split()) for merge in merges]
75 | vocab = list(bytes_to_unicode().values())
76 | vocab = vocab + [v + '</w>' for v in vocab]
77 | for merge in merges:
78 | vocab.append(''.join(merge))
79 | vocab.extend(['<|startoftext|>', '<|endoftext|>'])
80 | self.encoder = dict(zip(vocab, range(len(vocab))))
81 | self.decoder = {v: k for k, v in self.encoder.items()}
82 | self.bpe_ranks = dict(zip(merges, range(len(merges))))
83 | self.cache = {
84 | '<|startoftext|>': '<|startoftext|>',
85 | '<|endoftext|>': '<|endoftext|>'
86 | }
87 | self.pat = re.compile(
88 | r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""",
89 | re.IGNORECASE)
90 |
91 | def bpe(self, token):
92 | if token in self.cache:
93 | return self.cache[token]
94 | word = tuple(token[:-1]) + (token[-1] + '</w>', )
95 | pairs = get_pairs(word)
96 |
97 | if not pairs:
98 | return token + '</w>'
99 |
100 | while True:
101 | bigram = min(
102 | pairs, key=lambda pair: self.bpe_ranks.get(pair, float('inf')))
103 | if bigram not in self.bpe_ranks:
104 | break
105 | first, second = bigram
106 | new_word = []
107 | i = 0
108 | while i < len(word):
109 | try:
110 | j = word.index(first, i)
111 | new_word.extend(word[i:j])
112 | i = j
113 | except: # noqa
114 | new_word.extend(word[i:])
115 | break
116 |
117 | if word[i] == first and i < len(word) - 1 and word[
118 | i + 1] == second:
119 | new_word.append(first + second)
120 | i += 2
121 | else:
122 | new_word.append(word[i])
123 | i += 1
124 | new_word = tuple(new_word)
125 | word = new_word
126 | if len(word) == 1:
127 | break
128 | else:
129 | pairs = get_pairs(word)
130 | word = ' '.join(word)
131 | self.cache[token] = word
132 | return word
133 |
134 | def encode(self, text):
135 | bpe_tokens = []
136 | text = whitespace_clean(basic_clean(text)).lower()
137 | for token in re.findall(self.pat, text):
138 | token = ''.join(self.byte_encoder[b]
139 | for b in token.encode('utf-8'))
140 | bpe_tokens.extend(self.encoder[bpe_token]
141 | for bpe_token in self.bpe(token).split(' '))
142 | return bpe_tokens
143 |
144 | def decode(self, tokens):
145 | text = ''.join([self.decoder[token] for token in tokens])
146 | text = bytearray([self.byte_decoder[c] for c in text
147 | ]).decode('utf-8',
148 | errors="replace").replace('</w>', ' ')
149 | return text
150 |
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | from omegaconf import OmegaConf
4 |
5 | from args import args
6 |
7 | # Base config
8 | base_cfg = OmegaConf.load('configs/base.yaml')
9 |
10 | # Main Config
11 | main_cfg = OmegaConf.load(args.config_file)
12 |
13 | cfg = OmegaConf.merge(base_cfg, main_cfg)
14 |
15 | import torch # isort:skip # noqa
16 |
17 | cfg.checkpoint = f"checkpoints/{cfg.checkpoint}"
18 | dirs = os.listdir(cfg.checkpoint) if os.path.exists(cfg.checkpoint) else []
19 | dirs.sort()
20 | for d in dirs:
21 | assert d.startswith('round')
22 | next_index = len(dirs) + 1
23 |
24 | if args.resume:
25 | cfg.resume = f"{cfg.checkpoint}/round{args.round}"
26 | else:
27 | cfg.resume = False
28 |
29 | if not args.test:
30 | cfg.checkpoint = f"{cfg.checkpoint}/round{next_index}"
31 | assert (not os.path.exists(cfg.checkpoint))
32 | cfg.test = False
33 | else:
34 | cfg.checkpoint = f"{cfg.checkpoint}/round{args.round}"
35 | cfg.log_file = "test.txt"
36 | cfg.test = True
37 |
38 | if not os.path.exists(cfg.checkpoint):
39 | os.makedirs(cfg.checkpoint)
40 |
--------------------------------------------------------------------------------
/configs/base.yaml:
--------------------------------------------------------------------------------
1 | lr: 1e-4
2 | workers: 16
3 | image_size: 224
4 | batch_size: 128
5 | loss: SPLC
6 | log_file: log.txt
7 | weight_decay: 1e-4
8 | test: false
9 | thre: 0.5
10 |
--------------------------------------------------------------------------------
/configs/scpnet+coco.yaml:
--------------------------------------------------------------------------------
1 | dataset: ./dataset/coco_train_singlelabel.txt
2 | data: /home/wangao/datasets/coco
3 | num_classes: 80
4 | n_ctx: 4
5 | checkpoint: scpnet+coco
6 | ctx_init: a photo of a
7 | log_file: scpnet+coco.txt
8 | epochs: 60
9 | total_epochs: 80
10 | lr: 3e-5
11 | lambda_u: 0.125
12 | p_cutoff: 0.95
13 | hard_k: 2
14 | kl_lambda: 1
15 | T: 0.2
16 | sparse_topk: 62
17 | reweight_p: 0.2
18 | gcn_lr: 1e-5
19 | relation_file: relation+coco.npy
20 | ratio: 1
21 | scale: 10
22 |
--------------------------------------------------------------------------------
/configs/scpnet+cub.yaml:
--------------------------------------------------------------------------------
1 | data: /home/wangao/datasets/cub/
2 | train_dataset: ./dataset/cub_train.txt
3 | val_dataset: ./dataset/cub_val.txt
4 | num_classes: 312
5 | n_ctx: 4
6 | checkpoint: scpnet+cub
7 | ctx_init: a photo of a
8 | log_file: scpnet+cub.txt
9 | total_epochs: 100
10 | epochs: 100
11 | lr: 3e-4
12 | lambda_u: 0.125
13 | p_cutoff: 0.95
14 | hard_k: 31
15 | kl_lambda: 3
16 | T: 0.3
17 | sparse_topk: 312
18 | reweight_p: 0.2
19 | gcn_lr: 2e-4
20 | relation_file: relation+cub.npy
21 | ratio: 1
22 | scale: 10
23 | batch_size: 96
--------------------------------------------------------------------------------
/configs/scpnet+nuswide.yaml:
--------------------------------------------------------------------------------
1 | data: /home/wangao/datasets/nuswide/
2 | train_dataset: ./dataset/nus_train_singlelabel.txt
3 | val_dataset: ./dataset/nus_val.txt
4 | num_classes: 81
5 | n_ctx: 4
6 | checkpoint: scpnet+nuswide
7 | ctx_init: a photo of a
8 | log_file: scpnet+nuswide.txt
9 | epochs: 60
10 | lr: 3e-5
11 | lambda_u: 0.125
12 | p_cutoff: 0.95
13 | hard_k: 6
14 | kl_lambda: 1
15 | T: 0.2
16 | sparse_topk: 50
17 | reweight_p: 0.2
18 | gcn_lr: 1e-4
19 | relation_file: relation+nuswide.npy
20 | ratio: 1
21 | scale: clip
--------------------------------------------------------------------------------
/configs/scpnet+voc.yaml:
--------------------------------------------------------------------------------
1 | data: /home/wangao/datasets/voc2012/
2 | train_dataset: ./dataset/voc_train.txt
3 | val_dataset: ./dataset/voc_val.txt
4 | num_classes: 20
5 | n_ctx: 4
6 | checkpoint: scpnet+voc
7 | ctx_init: a photo of a
8 | log_file: scpnet+voc.txt
9 | epochs: 120
10 | total_epochs: 120
11 | lr: 4e-5
12 | lambda_u: 0.125
13 | p_cutoff: 0.95
14 | hard_k: 5
15 | kl_lambda: 2
16 | T: 0.3
17 | sparse_topk: 20
18 | reweight_p: 0.2
19 | gcn_lr: 2e-4
20 | relation_file: relation+voc.npy
21 | ratio: 1
22 | scale: 10
--------------------------------------------------------------------------------
/cub_labels.txt:
--------------------------------------------------------------------------------
1 | curved bill
2 | dagger bill
3 | hooked bill
4 | needle bill
5 | hooked_seabird bill
6 | spatulate bill
7 | all-purpose bill
8 | cone bill
9 | specialized bill
10 | blue wing
11 | brown wing
12 | iridescent wing
13 | purple wing
14 | rufous wing
15 | grey wing
16 | yellow wing
17 | olive wing
18 | green wing
19 | pink wing
20 | orange wing
21 | black wing
22 | white wing
23 | red wing
24 | buff wing
25 | blue upperparts
26 | brown upperparts
27 | iridescent upperparts
28 | purple upperparts
29 | rufous upperparts
30 | grey upperparts
31 | yellow upperparts
32 | olive upperparts
33 | green upperparts
34 | pink upperparts
35 | orange upperparts
36 | black upperparts
37 | white upperparts
38 | red upperparts
39 | buff upperparts
40 | blue underparts
41 | brown underparts
42 | iridescent underparts
43 | purple underparts
44 | rufous underparts
45 | grey underparts
46 | yellow underparts
47 | olive underparts
48 | green underparts
49 | pink underparts
50 | orange underparts
51 | black underparts
52 | white underparts
53 | red underparts
54 | buff underparts
55 | solid breast
56 | spotted breast
57 | striped breast
58 | multi-colored breast
59 | blue back
60 | brown back
61 | iridescent back
62 | purple back
63 | rufous back
64 | grey back
65 | yellow back
66 | olive back
67 | green back
68 | pink back
69 | orange back
70 | black back
71 | white back
72 | red back
73 | buff back
74 | forked_tail tail
75 | rounded_tail tail
76 | notched_tail tail
77 | fan-shaped_tail tail
78 | pointed_tail tail
79 | squared_tail tail
80 | blue upper
81 | brown upper
82 | iridescent upper
83 | purple upper
84 | rufous upper
85 | grey upper
86 | yellow upper
87 | olive upper
88 | green upper
89 | pink upper
90 | orange upper
91 | black upper
92 | white upper
93 | red upper
94 | buff upper
95 | spotted head
96 | malar head
97 | crested head
98 | masked head
99 | unique_pattern head
100 | eyebrow head
101 | eyering head
102 | plain head
103 | eyeline head
104 | striped head
105 | capped head
106 | blue breast
107 | brown breast
108 | iridescent breast
109 | purple breast
110 | rufous breast
111 | grey breast
112 | yellow breast
113 | olive breast
114 | green breast
115 | pink breast
116 | orange breast
117 | black breast
118 | white breast
119 | red breast
120 | buff breast
121 | blue throat
122 | brown throat
123 | iridescent throat
124 | purple throat
125 | rufous throat
126 | grey throat
127 | yellow throat
128 | olive throat
129 | green throat
130 | pink throat
131 | orange throat
132 | black throat
133 | white throat
134 | red throat
135 | buff throat
136 | blue eye
137 | brown eye
138 | purple eye
139 | rufous eye
140 | grey eye
141 | yellow eye
142 | olive eye
143 | green eye
144 | pink eye
145 | orange eye
146 | black eye
147 | white eye
148 | red eye
149 | buff eye
150 | about_the_same_as_head bill
151 | longer_than_head bill
152 | shorter_than_head bill
153 | blue forehead
154 | brown forehead
155 | iridescent forehead
156 | purple forehead
157 | rufous forehead
158 | grey forehead
159 | yellow forehead
160 | olive forehead
161 | green forehead
162 | pink forehead
163 | orange forehead
164 | black forehead
165 | white forehead
166 | red forehead
167 | buff forehead
168 | blue under
169 | brown under
170 | iridescent under
171 | purple under
172 | rufous under
173 | grey under
174 | yellow under
175 | olive under
176 | green under
177 | pink under
178 | orange under
179 | black under
180 | white under
181 | red under
182 | buff under
183 | blue nape
184 | brown nape
185 | iridescent nape
186 | purple nape
187 | rufous nape
188 | grey nape
189 | yellow nape
190 | olive nape
191 | green nape
192 | pink nape
193 | orange nape
194 | black nape
195 | white nape
196 | red nape
197 | buff nape
198 | blue belly
199 | brown belly
200 | iridescent belly
201 | purple belly
202 | rufous belly
203 | grey belly
204 | yellow belly
205 | olive belly
206 | green belly
207 | pink belly
208 | orange belly
209 | black belly
210 | white belly
211 | red belly
212 | buff belly
213 | rounded-wings wing
214 | pointed-wings wing
215 | broad-wings wing
216 | tapered-wings wing
217 | long-wings wing
218 | large size::large
219 | small size::small
220 | very_large size::very
221 | medium size::medium
222 | very_small size::very
223 | upright-perching_water-like shape::upright-perching
224 | chicken-like-marsh shape::chicken-like-marsh
225 | long-legged-like shape::long-legged-like
226 | duck-like shape::duck-like
227 | owl-like shape::owl-like
228 | gull-like shape::gull-like
229 | hummingbird-like shape::hummingbird-like
230 | pigeon-like shape::pigeon-like
231 | tree-clinging-like shape::tree-clinging-like
232 | hawk-like shape::hawk-like
233 | sandpiper-like shape::sandpiper-like
234 | upland-ground-like shape::upland-ground-like
235 | swallow-like shape::swallow-like
236 | perching-like shape::perching-like
237 | solid back
238 | spotted back
239 | striped back
240 | multi-colored back
241 | solid tail
242 | spotted tail
243 | striped tail
244 | multi-colored tail
245 | solid belly
246 | spotted belly
247 | striped belly
248 | multi-colored belly
249 | blue primary
250 | brown primary
251 | iridescent primary
252 | purple primary
253 | rufous primary
254 | grey primary
255 | yellow primary
256 | olive primary
257 | green primary
258 | pink primary
259 | orange primary
260 | black primary
261 | white primary
262 | red primary
263 | buff primary
264 | blue leg
265 | brown leg
266 | iridescent leg
267 | purple leg
268 | rufous leg
269 | grey leg
270 | yellow leg
271 | olive leg
272 | green leg
273 | pink leg
274 | orange leg
275 | black leg
276 | white leg
277 | red leg
278 | buff leg
279 | blue bill
280 | brown bill
281 | iridescent bill
282 | purple bill
283 | rufous bill
284 | grey bill
285 | yellow bill
286 | olive bill
287 | green bill
288 | pink bill
289 | orange bill
290 | black bill
291 | white bill
292 | red bill
293 | buff bill
294 | blue crown
295 | brown crown
296 | iridescent crown
297 | purple crown
298 | rufous crown
299 | grey crown
300 | yellow crown
301 | olive crown
302 | green crown
303 | pink crown
304 | orange crown
305 | black crown
306 | white crown
307 | red crown
308 | buff crown
309 | solid wing
310 | spotted wing
311 | striped wing
312 | multi-colored wing
--------------------------------------------------------------------------------
/figures/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/figures/overview.png
--------------------------------------------------------------------------------
/log.py:
--------------------------------------------------------------------------------
1 | import logging
2 | import sys
3 |
4 | from config import cfg
5 |
6 | cfg.log_file = f"{cfg.checkpoint}/{cfg.log_file}"
7 |
8 | logging.basicConfig(level=logging.INFO,
9 | format="%(asctime)s [%(levelname)s] %(message)s",
10 | handlers=[
11 | logging.FileHandler(cfg.log_file),
12 | logging.StreamHandler(sys.stdout)
13 | ])
14 |
15 | logger = logging.getLogger()
16 |
17 | if cfg.resume:
18 | logger.info(f"Resume training... from {cfg.resume}")
19 |
20 |
--------------------------------------------------------------------------------
/logs/scpnet+coco.txt:
--------------------------------------------------------------------------------
1 | 2022-07-17 15:51:40,685 [INFO] Epoch [0/60], Step [000/642], LR 1.2e-06, Loss: 13286.5
2 | 2022-07-17 15:53:22,741 [INFO] Epoch [0/60], Step [100/642], LR 1.2e-06, Loss: 1132.3
3 | 2022-07-17 15:55:04,475 [INFO] Epoch [0/60], Step [200/642], LR 1.2e-06, Loss: 1030.5
4 | 2022-07-17 15:56:47,055 [INFO] Epoch [0/60], Step [300/642], LR 1.3e-06, Loss: 1008.7
5 | 2022-07-17 15:58:29,322 [INFO] Epoch [0/60], Step [400/642], LR 1.3e-06, Loss: 630.1
6 | 2022-07-17 16:00:12,686 [INFO] Epoch [0/60], Step [500/642], LR 1.4e-06, Loss: 847.4
7 | 2022-07-17 16:01:56,180 [INFO] Epoch [0/60], Step [600/642], LR 1.4e-06, Loss: 962.7
8 | 2022-07-17 16:02:36,957 [INFO] Start validation...
9 | 2022-07-17 16:04:13,632 [INFO] mAP score regular 28.42, mAP score EMA 44.89
10 | 2022-07-17 16:04:16,156 [INFO] current_mAP = 44.89, highest_mAP = 44.89, best_epoch=0
11 |
12 | 2022-07-17 16:04:16,156 [INFO] Save text embeddings done
13 | 2022-07-17 16:04:25,360 [INFO] Epoch [1/60], Step [000/642], LR 1.5e-06, Loss: 3451.4
14 | 2022-07-17 16:06:10,475 [INFO] Epoch [1/60], Step [100/642], LR 1.6e-06, Loss: 946.1
15 | 2022-07-17 16:07:54,265 [INFO] Epoch [1/60], Step [200/642], LR 1.7e-06, Loss: 964.6
16 | 2022-07-17 16:09:37,637 [INFO] Epoch [1/60], Step [300/642], LR 1.8e-06, Loss: 976.3
17 | 2022-07-17 16:11:21,822 [INFO] Epoch [1/60], Step [400/642], LR 1.9e-06, Loss: 973.7
18 | 2022-07-17 16:13:04,875 [INFO] Epoch [1/60], Step [500/642], LR 2.1e-06, Loss: 960.4
19 | 2022-07-17 16:14:48,458 [INFO] Epoch [1/60], Step [600/642], LR 2.2e-06, Loss: 915.2
20 | 2022-07-17 16:15:29,918 [INFO] Start validation...
21 | 2022-07-17 16:17:05,892 [INFO] mAP score regular 38.98, mAP score EMA 41.81
22 | 2022-07-17 16:17:05,922 [INFO] current_mAP = 41.81, highest_mAP = 44.89, best_epoch=0
23 |
24 | 2022-07-17 16:17:05,922 [INFO] Save text embeddings done
25 | 2022-07-17 16:17:17,145 [INFO] Epoch [2/60], Step [000/642], LR 2.3e-06, Loss: 978.1
26 | 2022-07-17 16:19:00,247 [INFO] Epoch [2/60], Step [100/642], LR 2.5e-06, Loss: 954.2
27 | 2022-07-17 16:20:43,878 [INFO] Epoch [2/60], Step [200/642], LR 2.7e-06, Loss: 954.1
28 | 2022-07-17 16:22:28,571 [INFO] Epoch [2/60], Step [300/642], LR 2.9e-06, Loss: 928.4
29 | 2022-07-17 16:24:12,222 [INFO] Epoch [2/60], Step [400/642], LR 3.1e-06, Loss: 921.7
30 | 2022-07-17 16:25:55,386 [INFO] Epoch [2/60], Step [500/642], LR 3.3e-06, Loss: 907.2
31 | 2022-07-17 16:27:38,450 [INFO] Epoch [2/60], Step [600/642], LR 3.5e-06, Loss: 920.4
32 | 2022-07-17 16:28:19,890 [INFO] Start validation...
33 | 2022-07-17 16:29:56,955 [INFO] mAP score regular 54.01, mAP score EMA 41.98
34 | 2022-07-17 16:30:05,007 [INFO] current_mAP = 54.01, highest_mAP = 54.01, best_epoch=2
35 |
36 | 2022-07-17 16:30:05,007 [INFO] Save text embeddings done
37 | 2022-07-17 16:30:16,473 [INFO] Epoch [3/60], Step [000/642], LR 3.6e-06, Loss: 887.5
38 | 2022-07-17 16:31:59,093 [INFO] Epoch [3/60], Step [100/642], LR 3.9e-06, Loss: 876.4
39 | 2022-07-17 16:33:42,022 [INFO] Epoch [3/60], Step [200/642], LR 4.1e-06, Loss: 889.5
40 | 2022-07-17 16:35:25,202 [INFO] Epoch [3/60], Step [300/642], LR 4.4e-06, Loss: 878.5
41 | 2022-07-17 16:37:09,456 [INFO] Epoch [3/60], Step [400/642], LR 4.7e-06, Loss: 832.1
42 | 2022-07-17 16:38:53,582 [INFO] Epoch [3/60], Step [500/642], LR 5.0e-06, Loss: 842.4
43 | 2022-07-17 16:40:37,047 [INFO] Epoch [3/60], Step [600/642], LR 5.3e-06, Loss: 858.2
44 | 2022-07-17 16:41:18,475 [INFO] Start validation...
45 | 2022-07-17 16:42:51,412 [INFO] mAP score regular 61.35, mAP score EMA 45.21
46 | 2022-07-17 16:42:59,220 [INFO] current_mAP = 61.35, highest_mAP = 61.35, best_epoch=3
47 |
48 | 2022-07-17 16:42:59,221 [INFO] Save text embeddings done
49 | 2022-07-17 16:43:08,551 [INFO] Epoch [4/60], Step [000/642], LR 5.4e-06, Loss: 864.4
50 | 2022-07-17 16:44:51,657 [INFO] Epoch [4/60], Step [100/642], LR 5.7e-06, Loss: 818.5
51 | 2022-07-17 16:46:35,568 [INFO] Epoch [4/60], Step [200/642], LR 6.1e-06, Loss: 841.1
52 | 2022-07-17 16:48:19,479 [INFO] Epoch [4/60], Step [300/642], LR 6.4e-06, Loss: 802.2
53 | 2022-07-17 16:50:03,433 [INFO] Epoch [4/60], Step [400/642], LR 6.7e-06, Loss: 807.5
54 | 2022-07-17 16:51:46,536 [INFO] Epoch [4/60], Step [500/642], LR 7.1e-06, Loss: 796.9
55 | 2022-07-17 16:53:29,551 [INFO] Epoch [4/60], Step [600/642], LR 7.5e-06, Loss: 792.0
56 | 2022-07-17 16:54:10,788 [INFO] Start validation...
57 | 2022-07-17 16:55:43,037 [INFO] mAP score regular 65.80, mAP score EMA 50.40
58 | 2022-07-17 16:55:51,001 [INFO] current_mAP = 65.80, highest_mAP = 65.80, best_epoch=4
59 |
60 | 2022-07-17 16:55:51,002 [INFO] Save text embeddings done
61 | 2022-07-17 16:56:01,866 [INFO] Epoch [5/60], Step [000/642], LR 7.6e-06, Loss: 803.1
62 | 2022-07-17 16:57:44,888 [INFO] Epoch [5/60], Step [100/642], LR 8.0e-06, Loss: 756.4
63 | 2022-07-17 16:59:28,773 [INFO] Epoch [5/60], Step [200/642], LR 8.4e-06, Loss: 749.1
64 | 2022-07-17 17:01:12,349 [INFO] Epoch [5/60], Step [300/642], LR 8.7e-06, Loss: 735.2
65 | 2022-07-17 17:02:55,653 [INFO] Epoch [5/60], Step [400/642], LR 9.1e-06, Loss: 728.2
66 | 2022-07-17 17:04:39,685 [INFO] Epoch [5/60], Step [500/642], LR 9.5e-06, Loss: 711.2
67 | 2022-07-17 17:06:24,157 [INFO] Epoch [5/60], Step [600/642], LR 9.9e-06, Loss: 692.8
68 | 2022-07-17 17:07:05,441 [INFO] Start validation...
69 | 2022-07-17 17:08:38,475 [INFO] mAP score regular 69.56, mAP score EMA 56.27
70 | 2022-07-17 17:08:46,013 [INFO] current_mAP = 69.56, highest_mAP = 69.56, best_epoch=5
71 |
72 | 2022-07-17 17:08:46,013 [INFO] Save text embeddings done
73 | 2022-07-17 17:08:53,237 [INFO] Epoch [6/60], Step [000/642], LR 1.0e-05, Loss: 722.1
74 | 2022-07-17 17:10:36,298 [INFO] Epoch [6/60], Step [100/642], LR 1.1e-05, Loss: 654.5
75 | 2022-07-17 17:12:20,620 [INFO] Epoch [6/60], Step [200/642], LR 1.1e-05, Loss: 662.9
76 | 2022-07-17 17:14:03,871 [INFO] Epoch [6/60], Step [300/642], LR 1.1e-05, Loss: 652.4
77 | 2022-07-17 17:15:46,665 [INFO] Epoch [6/60], Step [400/642], LR 1.2e-05, Loss: 654.8
78 | 2022-07-17 17:17:30,040 [INFO] Epoch [6/60], Step [500/642], LR 1.2e-05, Loss: 666.1
79 | 2022-07-17 17:19:12,738 [INFO] Epoch [6/60], Step [600/642], LR 1.3e-05, Loss: 672.6
80 | 2022-07-17 17:19:53,822 [INFO] Start validation...
81 | 2022-07-17 17:21:23,980 [INFO] mAP score regular 71.25, mAP score EMA 61.50
82 | 2022-07-17 17:21:32,028 [INFO] current_mAP = 71.25, highest_mAP = 71.25, best_epoch=6
83 |
84 | 2022-07-17 17:21:32,029 [INFO] Save text embeddings done
85 | 2022-07-17 17:21:42,003 [INFO] Epoch [7/60], Step [000/642], LR 1.3e-05, Loss: 664.5
86 | 2022-07-17 17:23:24,433 [INFO] Epoch [7/60], Step [100/642], LR 1.3e-05, Loss: 670.3
87 | 2022-07-17 17:25:08,126 [INFO] Epoch [7/60], Step [200/642], LR 1.4e-05, Loss: 659.0
88 | 2022-07-17 17:26:51,416 [INFO] Epoch [7/60], Step [300/642], LR 1.4e-05, Loss: 662.2
89 | 2022-07-17 17:28:35,512 [INFO] Epoch [7/60], Step [400/642], LR 1.5e-05, Loss: 657.5
90 | 2022-07-17 17:30:19,108 [INFO] Epoch [7/60], Step [500/642], LR 1.5e-05, Loss: 637.7
91 | 2022-07-17 17:32:01,520 [INFO] Epoch [7/60], Step [600/642], LR 1.5e-05, Loss: 666.3
92 | 2022-07-17 17:32:42,955 [INFO] Start validation...
93 | 2022-07-17 17:34:15,117 [INFO] mAP score regular 72.35, mAP score EMA 65.64
94 | 2022-07-17 17:34:22,499 [INFO] current_mAP = 72.35, highest_mAP = 72.35, best_epoch=7
95 |
96 | 2022-07-17 17:34:22,500 [INFO] Save text embeddings done
97 | 2022-07-17 17:34:29,495 [INFO] Epoch [8/60], Step [000/642], LR 1.6e-05, Loss: 665.7
98 | 2022-07-17 17:36:13,136 [INFO] Epoch [8/60], Step [100/642], LR 1.6e-05, Loss: 662.4
99 | 2022-07-17 17:37:55,668 [INFO] Epoch [8/60], Step [200/642], LR 1.6e-05, Loss: 655.5
100 | 2022-07-17 17:39:38,771 [INFO] Epoch [8/60], Step [300/642], LR 1.7e-05, Loss: 663.2
101 | 2022-07-17 17:41:22,314 [INFO] Epoch [8/60], Step [400/642], LR 1.7e-05, Loss: 642.0
102 | 2022-07-17 17:43:05,360 [INFO] Epoch [8/60], Step [500/642], LR 1.8e-05, Loss: 645.0
103 | 2022-07-17 17:44:48,373 [INFO] Epoch [8/60], Step [600/642], LR 1.8e-05, Loss: 684.7
104 | 2022-07-17 17:45:29,895 [INFO] Start validation...
105 | 2022-07-17 17:47:01,768 [INFO] mAP score regular 71.84, mAP score EMA 68.70
106 | 2022-07-17 17:47:01,790 [INFO] current_mAP = 71.84, highest_mAP = 72.35, best_epoch=7
107 |
108 | 2022-07-17 17:47:01,790 [INFO] Save text embeddings done
109 | 2022-07-17 17:47:08,673 [INFO] Epoch [9/60], Step [000/642], LR 1.8e-05, Loss: 627.2
110 | 2022-07-17 17:48:50,724 [INFO] Epoch [9/60], Step [100/642], LR 1.9e-05, Loss: 645.3
111 | 2022-07-17 17:50:34,738 [INFO] Epoch [9/60], Step [200/642], LR 1.9e-05, Loss: 638.1
112 | 2022-07-17 17:52:17,310 [INFO] Epoch [9/60], Step [300/642], LR 2.0e-05, Loss: 662.6
113 | 2022-07-17 17:53:59,213 [INFO] Epoch [9/60], Step [400/642], LR 2.0e-05, Loss: 611.6
114 | 2022-07-17 17:55:41,656 [INFO] Epoch [9/60], Step [500/642], LR 2.1e-05, Loss: 648.7
115 | 2022-07-17 17:57:24,063 [INFO] Epoch [9/60], Step [600/642], LR 2.1e-05, Loss: 662.5
116 | 2022-07-17 17:58:05,332 [INFO] Start validation...
117 | 2022-07-17 17:59:33,008 [INFO] mAP score regular 72.28, mAP score EMA 70.95
118 | 2022-07-17 17:59:33,027 [INFO] current_mAP = 72.28, highest_mAP = 72.35, best_epoch=7
119 |
120 | 2022-07-17 17:59:33,027 [INFO] Save text embeddings done
121 | 2022-07-17 17:59:40,885 [INFO] Epoch [10/60], Step [000/642], LR 2.1e-05, Loss: 629.6
122 | 2022-07-17 18:01:23,674 [INFO] Epoch [10/60], Step [100/642], LR 2.2e-05, Loss: 628.5
123 | 2022-07-17 18:03:06,082 [INFO] Epoch [10/60], Step [200/642], LR 2.2e-05, Loss: 656.5
124 | 2022-07-17 18:04:48,816 [INFO] Epoch [10/60], Step [300/642], LR 2.2e-05, Loss: 644.5
125 | 2022-07-17 18:06:32,578 [INFO] Epoch [10/60], Step [400/642], LR 2.3e-05, Loss: 623.7
126 | 2022-07-17 18:08:15,731 [INFO] Epoch [10/60], Step [500/642], LR 2.3e-05, Loss: 651.2
127 | 2022-07-17 18:09:58,153 [INFO] Epoch [10/60], Step [600/642], LR 2.3e-05, Loss: 646.6
128 | 2022-07-17 18:10:39,446 [INFO] Start validation...
129 | 2022-07-17 18:12:10,610 [INFO] mAP score regular 72.85, mAP score EMA 72.60
130 | 2022-07-17 18:12:18,127 [INFO] current_mAP = 72.85, highest_mAP = 72.85, best_epoch=10
131 |
132 | 2022-07-17 18:12:18,127 [INFO] Save text embeddings done
133 | 2022-07-17 18:12:25,880 [INFO] Epoch [11/60], Step [000/642], LR 2.4e-05, Loss: 637.2
134 | 2022-07-17 18:14:09,057 [INFO] Epoch [11/60], Step [100/642], LR 2.4e-05, Loss: 623.2
135 | 2022-07-17 18:15:52,046 [INFO] Epoch [11/60], Step [200/642], LR 2.4e-05, Loss: 610.6
136 | 2022-07-17 18:17:35,624 [INFO] Epoch [11/60], Step [300/642], LR 2.5e-05, Loss: 624.1
137 | 2022-07-17 18:19:19,581 [INFO] Epoch [11/60], Step [400/642], LR 2.5e-05, Loss: 615.6
138 | 2022-07-17 18:21:03,305 [INFO] Epoch [11/60], Step [500/642], LR 2.5e-05, Loss: 627.8
139 | 2022-07-17 18:22:46,519 [INFO] Epoch [11/60], Step [600/642], LR 2.6e-05, Loss: 644.7
140 | 2022-07-17 18:23:27,759 [INFO] Start validation...
141 | 2022-07-17 18:25:01,510 [INFO] mAP score regular 72.48, mAP score EMA 73.83
142 | 2022-07-17 18:25:09,083 [INFO] current_mAP = 73.83, highest_mAP = 73.83, best_epoch=11
143 |
144 | 2022-07-17 18:25:09,084 [INFO] Save text embeddings done
145 | 2022-07-17 18:25:18,691 [INFO] Epoch [12/60], Step [000/642], LR 2.6e-05, Loss: 635.7
146 | 2022-07-17 18:27:02,516 [INFO] Epoch [12/60], Step [100/642], LR 2.6e-05, Loss: 630.8
147 | 2022-07-17 18:28:45,784 [INFO] Epoch [12/60], Step [200/642], LR 2.6e-05, Loss: 623.5
148 | 2022-07-17 18:30:28,908 [INFO] Epoch [12/60], Step [300/642], LR 2.7e-05, Loss: 619.5
149 | 2022-07-17 18:32:11,842 [INFO] Epoch [12/60], Step [400/642], LR 2.7e-05, Loss: 630.5
150 | 2022-07-17 18:33:55,056 [INFO] Epoch [12/60], Step [500/642], LR 2.7e-05, Loss: 600.1
151 | 2022-07-17 18:35:39,971 [INFO] Epoch [12/60], Step [600/642], LR 2.7e-05, Loss: 649.8
152 | 2022-07-17 18:36:21,413 [INFO] Start validation...
153 | 2022-07-17 18:37:55,776 [INFO] mAP score regular 71.86, mAP score EMA 74.67
154 | 2022-07-17 18:38:03,296 [INFO] current_mAP = 74.67, highest_mAP = 74.67, best_epoch=12
155 |
156 | 2022-07-17 18:38:03,297 [INFO] Save text embeddings done
157 | 2022-07-17 18:38:11,348 [INFO] Epoch [13/60], Step [000/642], LR 2.8e-05, Loss: 608.0
158 | 2022-07-17 18:39:54,589 [INFO] Epoch [13/60], Step [100/642], LR 2.8e-05, Loss: 628.6
159 | 2022-07-17 18:41:37,798 [INFO] Epoch [13/60], Step [200/642], LR 2.8e-05, Loss: 619.4
160 | 2022-07-17 18:43:21,140 [INFO] Epoch [13/60], Step [300/642], LR 2.8e-05, Loss: 637.7
161 | 2022-07-17 18:45:04,141 [INFO] Epoch [13/60], Step [400/642], LR 2.8e-05, Loss: 632.4
162 | 2022-07-17 18:46:47,602 [INFO] Epoch [13/60], Step [500/642], LR 2.9e-05, Loss: 658.2
163 | 2022-07-17 18:48:30,967 [INFO] Epoch [13/60], Step [600/642], LR 2.9e-05, Loss: 610.7
164 | 2022-07-17 18:49:11,892 [INFO] Start validation...
165 | 2022-07-17 18:50:45,710 [INFO] mAP score regular 72.37, mAP score EMA 75.25
166 | 2022-07-17 18:50:54,315 [INFO] current_mAP = 75.25, highest_mAP = 75.25, best_epoch=13
167 |
168 | 2022-07-17 18:50:54,315 [INFO] Save text embeddings done
169 | 2022-07-17 18:51:02,755 [INFO] Epoch [14/60], Step [000/642], LR 2.9e-05, Loss: 594.0
170 | 2022-07-17 18:52:46,716 [INFO] Epoch [14/60], Step [100/642], LR 2.9e-05, Loss: 601.3
171 | 2022-07-17 18:54:30,099 [INFO] Epoch [14/60], Step [200/642], LR 2.9e-05, Loss: 607.4
172 | 2022-07-17 18:56:13,470 [INFO] Epoch [14/60], Step [300/642], LR 2.9e-05, Loss: 636.7
173 | 2022-07-17 18:57:56,986 [INFO] Epoch [14/60], Step [400/642], LR 2.9e-05, Loss: 602.9
174 | 2022-07-17 18:59:39,555 [INFO] Epoch [14/60], Step [500/642], LR 3.0e-05, Loss: 601.4
175 | 2022-07-17 19:01:23,459 [INFO] Epoch [14/60], Step [600/642], LR 3.0e-05, Loss: 595.6
176 | 2022-07-17 19:02:04,815 [INFO] Start validation...
177 | 2022-07-17 19:03:41,072 [INFO] mAP score regular 70.73, mAP score EMA 75.68
178 | 2022-07-17 19:03:48,854 [INFO] current_mAP = 75.68, highest_mAP = 75.68, best_epoch=14
179 |
180 | 2022-07-17 19:03:48,863 [INFO] Save text embeddings done
181 | 2022-07-17 19:03:58,047 [INFO] Epoch [15/60], Step [000/642], LR 3.0e-05, Loss: 649.3
182 | 2022-07-17 19:05:40,735 [INFO] Epoch [15/60], Step [100/642], LR 3.0e-05, Loss: 634.9
183 | 2022-07-17 19:07:24,967 [INFO] Epoch [15/60], Step [200/642], LR 3.0e-05, Loss: 615.0
184 | 2022-07-17 19:09:09,782 [INFO] Epoch [15/60], Step [300/642], LR 3.0e-05, Loss: 596.5
185 | 2022-07-17 19:10:53,226 [INFO] Epoch [15/60], Step [400/642], LR 3.0e-05, Loss: 627.1
186 | 2022-07-17 19:12:36,561 [INFO] Epoch [15/60], Step [500/642], LR 3.0e-05, Loss: 640.2
187 | 2022-07-17 19:14:19,818 [INFO] Epoch [15/60], Step [600/642], LR 3.0e-05, Loss: 599.0
188 | 2022-07-17 19:15:01,235 [INFO] Start validation...
189 | 2022-07-17 19:16:35,357 [INFO] mAP score regular 71.33, mAP score EMA 76.00
190 | 2022-07-17 19:16:45,918 [INFO] current_mAP = 76.00, highest_mAP = 76.00, best_epoch=15
191 |
192 | 2022-07-17 19:16:45,918 [INFO] Save text embeddings done
193 | 2022-07-17 19:16:57,307 [INFO] Epoch [16/60], Step [000/642], LR 3.0e-05, Loss: 591.5
194 | 2022-07-17 19:18:39,963 [INFO] Epoch [16/60], Step [100/642], LR 3.0e-05, Loss: 600.0
195 | 2022-07-17 19:20:23,570 [INFO] Epoch [16/60], Step [200/642], LR 3.0e-05, Loss: 583.3
196 | 2022-07-17 19:22:07,824 [INFO] Epoch [16/60], Step [300/642], LR 3.0e-05, Loss: 600.7
197 | 2022-07-17 19:23:51,911 [INFO] Epoch [16/60], Step [400/642], LR 3.0e-05, Loss: 605.3
198 | 2022-07-17 19:25:34,761 [INFO] Epoch [16/60], Step [500/642], LR 3.0e-05, Loss: 578.8
199 | 2022-07-17 19:27:17,418 [INFO] Epoch [16/60], Step [600/642], LR 3.0e-05, Loss: 627.3
200 | 2022-07-17 19:27:58,323 [INFO] Start validation...
201 | 2022-07-17 19:29:31,542 [INFO] mAP score regular 70.97, mAP score EMA 76.25
202 | 2022-07-17 19:29:42,388 [INFO] current_mAP = 76.25, highest_mAP = 76.25, best_epoch=16
203 |
204 | 2022-07-17 19:29:42,388 [INFO] Save text embeddings done
205 | 2022-07-17 19:29:51,810 [INFO] Epoch [17/60], Step [000/642], LR 3.0e-05, Loss: 624.3
206 | 2022-07-17 19:31:36,236 [INFO] Epoch [17/60], Step [100/642], LR 3.0e-05, Loss: 588.3
207 | 2022-07-17 19:33:19,494 [INFO] Epoch [17/60], Step [200/642], LR 3.0e-05, Loss: 610.9
208 | 2022-07-17 19:35:03,244 [INFO] Epoch [17/60], Step [300/642], LR 3.0e-05, Loss: 573.2
209 | 2022-07-17 19:36:46,556 [INFO] Epoch [17/60], Step [400/642], LR 3.0e-05, Loss: 612.2
210 | 2022-07-17 19:38:28,978 [INFO] Epoch [17/60], Step [500/642], LR 3.0e-05, Loss: 575.3
211 | 2022-07-17 19:40:12,017 [INFO] Epoch [17/60], Step [600/642], LR 3.0e-05, Loss: 602.0
212 | 2022-07-17 19:40:53,048 [INFO] Start validation...
213 | 2022-07-17 19:42:25,563 [INFO] mAP score regular 72.18, mAP score EMA 76.37
214 | 2022-07-17 19:42:33,121 [INFO] current_mAP = 76.37, highest_mAP = 76.37, best_epoch=17
215 |
216 | 2022-07-17 19:42:33,121 [INFO] Save text embeddings done
217 | 2022-07-17 19:42:42,420 [INFO] Epoch [18/60], Step [000/642], LR 3.0e-05, Loss: 607.2
218 | 2022-07-17 19:44:24,861 [INFO] Epoch [18/60], Step [100/642], LR 3.0e-05, Loss: 579.3
219 | 2022-07-17 19:46:08,403 [INFO] Epoch [18/60], Step [200/642], LR 3.0e-05, Loss: 570.6
220 | 2022-07-17 19:47:52,526 [INFO] Epoch [18/60], Step [300/642], LR 3.0e-05, Loss: 604.3
221 | 2022-07-17 19:49:35,492 [INFO] Epoch [18/60], Step [400/642], LR 3.0e-05, Loss: 602.7
222 | 2022-07-17 19:51:18,619 [INFO] Epoch [18/60], Step [500/642], LR 3.0e-05, Loss: 577.4
223 | 2022-07-17 19:53:01,579 [INFO] Epoch [18/60], Step [600/642], LR 3.0e-05, Loss: 603.9
224 | 2022-07-17 19:53:42,746 [INFO] Start validation...
225 | 2022-07-17 19:55:16,569 [INFO] mAP score regular 71.23, mAP score EMA 76.42
226 | 2022-07-17 19:55:24,451 [INFO] current_mAP = 76.42, highest_mAP = 76.42, best_epoch=18
227 |
228 | 2022-07-17 19:55:24,451 [INFO] Save text embeddings done
229 | 2022-07-17 19:55:34,927 [INFO] Epoch [19/60], Step [000/642], LR 3.0e-05, Loss: 590.6
230 | 2022-07-17 19:57:18,542 [INFO] Epoch [19/60], Step [100/642], LR 3.0e-05, Loss: 614.3
231 | 2022-07-17 19:59:01,698 [INFO] Epoch [19/60], Step [200/642], LR 3.0e-05, Loss: 578.7
232 | 2022-07-17 20:00:45,131 [INFO] Epoch [19/60], Step [300/642], LR 3.0e-05, Loss: 611.3
233 | 2022-07-17 20:02:27,317 [INFO] Epoch [19/60], Step [400/642], LR 3.0e-05, Loss: 608.8
234 | 2022-07-17 20:04:09,765 [INFO] Epoch [19/60], Step [500/642], LR 3.0e-05, Loss: 614.6
235 | 2022-07-17 20:05:52,692 [INFO] Epoch [19/60], Step [600/642], LR 3.0e-05, Loss: 607.6
236 | 2022-07-17 20:06:33,787 [INFO] Start validation...
237 | 2022-07-17 20:08:05,728 [INFO] mAP score regular 70.85, mAP score EMA 76.35
238 | 2022-07-17 20:08:05,748 [INFO] current_mAP = 76.35, highest_mAP = 76.42, best_epoch=18
239 |
240 | 2022-07-17 20:08:05,749 [INFO] Save text embeddings done
241 | 2022-07-17 20:08:15,606 [INFO] Epoch [20/60], Step [000/642], LR 3.0e-05, Loss: 591.2
242 | 2022-07-17 20:09:57,967 [INFO] Epoch [20/60], Step [100/642], LR 3.0e-05, Loss: 590.8
243 | 2022-07-17 20:11:40,405 [INFO] Epoch [20/60], Step [200/642], LR 3.0e-05, Loss: 598.5
244 | 2022-07-17 20:13:23,257 [INFO] Epoch [20/60], Step [300/642], LR 3.0e-05, Loss: 579.0
245 | 2022-07-17 20:15:06,258 [INFO] Epoch [20/60], Step [400/642], LR 3.0e-05, Loss: 583.9
246 | 2022-07-17 20:16:47,969 [INFO] Epoch [20/60], Step [500/642], LR 3.0e-05, Loss: 599.2
247 | 2022-07-17 20:18:30,394 [INFO] Epoch [20/60], Step [600/642], LR 3.0e-05, Loss: 568.2
248 | 2022-07-17 20:19:11,417 [INFO] Start validation...
249 | 2022-07-17 20:20:42,805 [INFO] mAP score regular 71.09, mAP score EMA 76.25
250 | 2022-07-17 20:20:42,826 [INFO] current_mAP = 76.25, highest_mAP = 76.42, best_epoch=18
251 |
252 | 2022-07-17 20:20:42,826 [INFO] Save text embeddings done
253 | 2022-07-17 20:20:53,384 [INFO] Epoch [21/60], Step [000/642], LR 3.0e-05, Loss: 572.4
254 | 2022-07-17 20:22:37,802 [INFO] Epoch [21/60], Step [100/642], LR 3.0e-05, Loss: 599.3
255 | 2022-07-17 20:24:20,758 [INFO] Epoch [21/60], Step [200/642], LR 2.9e-05, Loss: 557.3
256 | 2022-07-17 20:26:04,039 [INFO] Epoch [21/60], Step [300/642], LR 2.9e-05, Loss: 571.6
257 | 2022-07-17 20:27:46,721 [INFO] Epoch [21/60], Step [400/642], LR 2.9e-05, Loss: 619.8
258 | 2022-07-17 20:29:29,281 [INFO] Epoch [21/60], Step [500/642], LR 2.9e-05, Loss: 584.9
259 | 2022-07-17 20:31:12,229 [INFO] Epoch [21/60], Step [600/642], LR 2.9e-05, Loss: 579.1
260 | 2022-07-17 20:31:53,816 [INFO] Start validation...
261 | 2022-07-17 20:33:25,354 [INFO] mAP score regular 70.88, mAP score EMA 76.12
262 | 2022-07-17 20:33:25,371 [INFO] current_mAP = 76.12, highest_mAP = 76.42, best_epoch=18
263 |
264 | 2022-07-17 20:33:25,372 [INFO] Save text embeddings done
265 | 2022-07-17 20:33:34,224 [INFO] Epoch [22/60], Step [000/642], LR 2.9e-05, Loss: 596.8
266 | 2022-07-17 20:35:18,091 [INFO] Epoch [22/60], Step [100/642], LR 2.9e-05, Loss: 577.1
267 | 2022-07-17 20:37:00,832 [INFO] Epoch [22/60], Step [200/642], LR 2.9e-05, Loss: 561.4
268 | 2022-07-17 20:38:43,381 [INFO] Epoch [22/60], Step [300/642], LR 2.9e-05, Loss: 591.5
269 | 2022-07-17 20:40:26,419 [INFO] Epoch [22/60], Step [400/642], LR 2.9e-05, Loss: 583.0
270 | 2022-07-17 20:42:09,171 [INFO] Epoch [22/60], Step [500/642], LR 2.9e-05, Loss: 576.2
271 | 2022-07-17 20:43:51,747 [INFO] Epoch [22/60], Step [600/642], LR 2.9e-05, Loss: 606.0
272 | 2022-07-17 20:44:32,859 [INFO] Start validation...
273 | 2022-07-17 20:46:04,609 [INFO] mAP score regular 70.46, mAP score EMA 75.94
274 | 2022-07-17 20:46:04,629 [INFO] current_mAP = 75.94, highest_mAP = 76.42, best_epoch=18
275 |
276 | 2022-07-17 20:46:04,630 [INFO] Save text embeddings done
277 | 2022-07-17 20:46:11,990 [INFO] Epoch [23/60], Step [000/642], LR 2.9e-05, Loss: 590.1
278 | 2022-07-17 20:47:57,088 [INFO] Epoch [23/60], Step [100/642], LR 2.9e-05, Loss: 570.9
279 | 2022-07-17 20:49:39,436 [INFO] Epoch [23/60], Step [200/642], LR 2.9e-05, Loss: 573.8
280 | 2022-07-17 20:51:22,581 [INFO] Epoch [23/60], Step [300/642], LR 2.9e-05, Loss: 568.8
281 | 2022-07-17 20:53:05,325 [INFO] Epoch [23/60], Step [400/642], LR 2.9e-05, Loss: 614.1
282 | 2022-07-17 20:54:47,585 [INFO] Epoch [23/60], Step [500/642], LR 2.9e-05, Loss: 557.5
283 | 2022-07-17 20:56:30,310 [INFO] Epoch [23/60], Step [600/642], LR 2.9e-05, Loss: 569.7
284 | 2022-07-17 20:57:11,431 [INFO] Start validation...
285 |
--------------------------------------------------------------------------------
/logs/scpnet+cub.txt:
--------------------------------------------------------------------------------
1 | 2022-07-29 09:13:45,742 [INFO] Epoch [0/100], Step [000/063], LR 1.2e-05, Loss: 58426.6
2 | 2022-07-29 09:14:39,947 [INFO] Start validation...
3 | 2022-07-29 09:15:01,987 [INFO] mAP score regular 10.42, mAP score EMA 14.98
4 | 2022-07-29 09:15:03,820 [INFO] current_mAP = 14.98, highest_mAP = 14.98, best_epoch=0
5 |
6 | 2022-07-29 09:15:03,821 [INFO] Save text embeddings done
7 | 2022-07-29 09:15:08,845 [INFO] Epoch [1/100], Step [000/063], LR 1.4e-05, Loss: 9480.4
8 | 2022-07-29 09:16:04,440 [INFO] Start validation...
9 | 2022-07-29 09:16:26,846 [INFO] mAP score regular 12.89, mAP score EMA 12.77
10 | 2022-07-29 09:16:26,854 [INFO] current_mAP = 12.89, highest_mAP = 14.98, best_epoch=0
11 |
12 | 2022-07-29 09:16:26,854 [INFO] Save text embeddings done
13 | 2022-07-29 09:16:32,453 [INFO] Epoch [2/100], Step [000/063], LR 1.9e-05, Loss: 4303.4
14 | 2022-07-29 09:17:28,590 [INFO] Start validation...
15 | 2022-07-29 09:17:51,066 [INFO] mAP score regular 13.16, mAP score EMA 12.49
16 | 2022-07-29 09:17:51,074 [INFO] current_mAP = 13.16, highest_mAP = 14.98, best_epoch=0
17 |
18 | 2022-07-29 09:17:51,074 [INFO] Save text embeddings done
19 | 2022-07-29 09:17:56,180 [INFO] Epoch [3/100], Step [000/063], LR 2.8e-05, Loss: 4418.8
20 | 2022-07-29 09:18:51,915 [INFO] Start validation...
21 | 2022-07-29 09:19:14,250 [INFO] mAP score regular 13.40, mAP score EMA 12.42
22 | 2022-07-29 09:19:14,258 [INFO] current_mAP = 13.40, highest_mAP = 14.98, best_epoch=0
23 |
24 | 2022-07-29 09:19:14,258 [INFO] Save text embeddings done
25 | 2022-07-29 09:19:19,482 [INFO] Epoch [4/100], Step [000/063], LR 4.0e-05, Loss: 4741.4
26 | 2022-07-29 09:20:15,280 [INFO] Start validation...
27 | 2022-07-29 09:20:37,483 [INFO] mAP score regular 13.38, mAP score EMA 12.46
28 | 2022-07-29 09:20:37,491 [INFO] current_mAP = 13.38, highest_mAP = 14.98, best_epoch=0
29 |
30 | 2022-07-29 09:20:37,491 [INFO] Save text embeddings done
31 | 2022-07-29 09:20:42,655 [INFO] Epoch [5/100], Step [000/063], LR 5.4e-05, Loss: 4832.6
32 | 2022-07-29 09:21:38,819 [INFO] Start validation...
33 | 2022-07-29 09:22:00,899 [INFO] mAP score regular 13.63, mAP score EMA 12.63
34 | 2022-07-29 09:22:00,907 [INFO] current_mAP = 13.63, highest_mAP = 14.98, best_epoch=0
35 |
36 | 2022-07-29 09:22:00,907 [INFO] Save text embeddings done
37 | 2022-07-29 09:22:05,806 [INFO] Epoch [6/100], Step [000/063], LR 7.2e-05, Loss: 4149.6
38 | 2022-07-29 09:23:02,277 [INFO] Start validation...
39 | 2022-07-29 09:23:24,715 [INFO] mAP score regular 14.65, mAP score EMA 12.93
40 | 2022-07-29 09:23:24,723 [INFO] current_mAP = 14.65, highest_mAP = 14.98, best_epoch=0
41 |
42 | 2022-07-29 09:23:24,723 [INFO] Save text embeddings done
43 | 2022-07-29 09:23:31,049 [INFO] Epoch [7/100], Step [000/063], LR 9.1e-05, Loss: 4404.6
44 | 2022-07-29 09:24:26,983 [INFO] Start validation...
45 | 2022-07-29 09:24:49,659 [INFO] mAP score regular 15.59, mAP score EMA 13.41
46 | 2022-07-29 09:24:58,817 [INFO] current_mAP = 15.59, highest_mAP = 15.59, best_epoch=7
47 |
48 | 2022-07-29 09:24:59,235 [INFO] Save text embeddings done
49 | 2022-07-29 09:25:05,516 [INFO] Epoch [8/100], Step [000/063], LR 1.1e-04, Loss: 4698.5
50 | 2022-07-29 09:26:03,066 [INFO] Start validation...
51 | 2022-07-29 09:26:26,515 [INFO] mAP score regular 16.58, mAP score EMA 14.19
52 | 2022-07-29 09:26:40,525 [INFO] current_mAP = 16.58, highest_mAP = 16.58, best_epoch=8
53 |
54 | 2022-07-29 09:26:40,526 [INFO] Save text embeddings done
55 | 2022-07-29 09:26:45,407 [INFO] Epoch [9/100], Step [000/063], LR 1.3e-04, Loss: 5346.6
56 | 2022-07-29 09:27:50,084 [INFO] Start validation...
57 | 2022-07-29 09:28:12,438 [INFO] mAP score regular 18.00, mAP score EMA 15.06
58 | 2022-07-29 09:28:22,483 [INFO] current_mAP = 18.00, highest_mAP = 18.00, best_epoch=9
59 |
60 | 2022-07-29 09:28:22,484 [INFO] Save text embeddings done
61 | 2022-07-29 09:28:28,229 [INFO] Epoch [10/100], Step [000/063], LR 1.6e-04, Loss: 4414.6
62 | 2022-07-29 09:29:25,566 [INFO] Start validation...
63 | 2022-07-29 09:29:48,410 [INFO] mAP score regular 17.60, mAP score EMA 16.27
64 | 2022-07-29 09:29:48,417 [INFO] current_mAP = 17.60, highest_mAP = 18.00, best_epoch=9
65 |
66 | 2022-07-29 09:29:48,418 [INFO] Save text embeddings done
67 | 2022-07-29 09:29:55,244 [INFO] Epoch [11/100], Step [000/063], LR 1.8e-04, Loss: 4678.3
68 | 2022-07-29 09:30:52,128 [INFO] Start validation...
69 | 2022-07-29 09:31:15,142 [INFO] mAP score regular 19.05, mAP score EMA 17.57
70 | 2022-07-29 09:31:23,717 [INFO] current_mAP = 19.05, highest_mAP = 19.05, best_epoch=11
71 |
72 | 2022-07-29 09:31:23,718 [INFO] Save text embeddings done
73 | 2022-07-29 09:31:29,185 [INFO] Epoch [12/100], Step [000/063], LR 2.0e-04, Loss: 4220.6
74 | 2022-07-29 09:32:26,296 [INFO] Start validation...
75 | 2022-07-29 09:32:49,182 [INFO] mAP score regular 21.08, mAP score EMA 18.73
76 | 2022-07-29 09:32:58,051 [INFO] current_mAP = 21.08, highest_mAP = 21.08, best_epoch=12
77 |
78 | 2022-07-29 09:32:58,052 [INFO] Save text embeddings done
79 | 2022-07-29 09:33:04,718 [INFO] Epoch [13/100], Step [000/063], LR 2.2e-04, Loss: 4281.9
80 | 2022-07-29 09:34:01,639 [INFO] Start validation...
81 | 2022-07-29 09:34:24,362 [INFO] mAP score regular 20.11, mAP score EMA 19.80
82 | 2022-07-29 09:34:24,370 [INFO] current_mAP = 20.11, highest_mAP = 21.08, best_epoch=12
83 |
84 | 2022-07-29 09:34:24,370 [INFO] Save text embeddings done
85 | 2022-07-29 09:34:31,286 [INFO] Epoch [14/100], Step [000/063], LR 2.4e-04, Loss: 5025.3
86 | 2022-07-29 09:35:29,472 [INFO] Start validation...
87 | 2022-07-29 09:35:53,424 [INFO] mAP score regular 21.95, mAP score EMA 20.66
88 | 2022-07-29 09:36:02,892 [INFO] current_mAP = 21.95, highest_mAP = 21.95, best_epoch=14
89 |
90 | 2022-07-29 09:36:02,892 [INFO] Save text embeddings done
91 | 2022-07-29 09:36:08,986 [INFO] Epoch [15/100], Step [000/063], LR 2.6e-04, Loss: 4703.9
92 | 2022-07-29 09:37:05,694 [INFO] Start validation...
93 | 2022-07-29 09:37:28,806 [INFO] mAP score regular 21.79, mAP score EMA 21.44
94 | 2022-07-29 09:37:28,826 [INFO] current_mAP = 21.79, highest_mAP = 21.95, best_epoch=14
95 |
96 | 2022-07-29 09:37:28,826 [INFO] Save text embeddings done
97 | 2022-07-29 09:37:34,459 [INFO] Epoch [16/100], Step [000/063], LR 2.7e-04, Loss: 4622.5
98 | 2022-07-29 09:38:31,933 [INFO] Start validation...
99 | 2022-07-29 09:38:55,138 [INFO] mAP score regular 22.05, mAP score EMA 22.16
100 | 2022-07-29 09:39:04,021 [INFO] current_mAP = 22.16, highest_mAP = 22.16, best_epoch=16
101 |
102 | 2022-07-29 09:39:04,021 [INFO] Save text embeddings done
103 | 2022-07-29 09:39:09,860 [INFO] Epoch [17/100], Step [000/063], LR 2.8e-04, Loss: 4661.1
104 | 2022-07-29 09:40:07,510 [INFO] Start validation...
105 | 2022-07-29 09:40:30,736 [INFO] mAP score regular 22.23, mAP score EMA 22.78
106 | 2022-07-29 09:40:44,520 [INFO] current_mAP = 22.78, highest_mAP = 22.78, best_epoch=17
107 |
108 | 2022-07-29 09:40:44,520 [INFO] Save text embeddings done
109 | 2022-07-29 09:40:51,565 [INFO] Epoch [18/100], Step [000/063], LR 2.9e-04, Loss: 4612.7
110 | 2022-07-29 09:41:55,680 [INFO] Start validation...
111 | 2022-07-29 09:42:19,019 [INFO] mAP score regular 22.62, mAP score EMA 23.19
112 | 2022-07-29 09:42:27,813 [INFO] current_mAP = 23.19, highest_mAP = 23.19, best_epoch=18
113 |
114 | 2022-07-29 09:42:27,814 [INFO] Save text embeddings done
115 | 2022-07-29 09:42:32,668 [INFO] Epoch [19/100], Step [000/063], LR 3.0e-04, Loss: 4768.1
116 | 2022-07-29 09:43:37,441 [INFO] Start validation...
117 | 2022-07-29 09:44:00,393 [INFO] mAP score regular 22.65, mAP score EMA 23.51
118 | 2022-07-29 09:44:15,909 [INFO] current_mAP = 23.51, highest_mAP = 23.51, best_epoch=19
119 |
120 | 2022-07-29 09:44:15,909 [INFO] Save text embeddings done
121 | 2022-07-29 09:44:20,407 [INFO] Epoch [20/100], Step [000/063], LR 3.0e-04, Loss: 4798.3
122 | 2022-07-29 09:45:20,859 [INFO] Start validation...
123 | 2022-07-29 09:45:43,773 [INFO] mAP score regular 21.91, mAP score EMA 23.79
124 | 2022-07-29 09:45:55,448 [INFO] current_mAP = 23.79, highest_mAP = 23.79, best_epoch=20
125 |
126 | 2022-07-29 09:45:55,449 [INFO] Save text embeddings done
127 | 2022-07-29 09:46:01,795 [INFO] Epoch [21/100], Step [000/063], LR 3.0e-04, Loss: 4817.1
128 | 2022-07-29 09:47:00,813 [INFO] Start validation...
129 | 2022-07-29 09:47:25,342 [INFO] mAP score regular 23.10, mAP score EMA 24.03
130 | 2022-07-29 09:47:35,078 [INFO] current_mAP = 24.03, highest_mAP = 24.03, best_epoch=21
131 |
132 | 2022-07-29 09:47:35,079 [INFO] Save text embeddings done
133 | 2022-07-29 09:47:41,634 [INFO] Epoch [22/100], Step [000/063], LR 3.0e-04, Loss: 4627.6
134 | 2022-07-29 09:48:41,624 [INFO] Start validation...
135 | 2022-07-29 09:49:05,467 [INFO] mAP score regular 23.45, mAP score EMA 24.34
136 | 2022-07-29 09:49:19,710 [INFO] current_mAP = 24.34, highest_mAP = 24.34, best_epoch=22
137 |
138 | 2022-07-29 09:49:19,710 [INFO] Save text embeddings done
139 | 2022-07-29 09:49:26,541 [INFO] Epoch [23/100], Step [000/063], LR 3.0e-04, Loss: 4463.8
140 | 2022-07-29 09:50:30,465 [INFO] Start validation...
141 | 2022-07-29 09:50:53,782 [INFO] mAP score regular 23.33, mAP score EMA 24.54
142 | 2022-07-29 09:51:02,617 [INFO] current_mAP = 24.54, highest_mAP = 24.54, best_epoch=23
143 |
144 | 2022-07-29 09:51:03,080 [INFO] Save text embeddings done
145 | 2022-07-29 09:51:08,366 [INFO] Epoch [24/100], Step [000/063], LR 3.0e-04, Loss: 4642.3
146 | 2022-07-29 09:52:07,097 [INFO] Start validation...
147 | 2022-07-29 09:52:29,895 [INFO] mAP score regular 23.29, mAP score EMA 24.67
148 | 2022-07-29 09:52:44,963 [INFO] current_mAP = 24.67, highest_mAP = 24.67, best_epoch=24
149 |
150 | 2022-07-29 09:52:44,963 [INFO] Save text embeddings done
151 | 2022-07-29 09:52:50,137 [INFO] Epoch [25/100], Step [000/063], LR 3.0e-04, Loss: 4619.1
152 | 2022-07-29 09:53:47,223 [INFO] Start validation...
153 | 2022-07-29 09:54:10,013 [INFO] mAP score regular 23.32, mAP score EMA 24.80
154 | 2022-07-29 09:54:21,557 [INFO] current_mAP = 24.80, highest_mAP = 24.80, best_epoch=25
155 |
156 | 2022-07-29 09:54:21,558 [INFO] Save text embeddings done
157 | 2022-07-29 09:54:26,215 [INFO] Epoch [26/100], Step [000/063], LR 3.0e-04, Loss: 4161.4
158 | 2022-07-29 09:55:24,199 [INFO] Start validation...
159 | 2022-07-29 09:55:47,376 [INFO] mAP score regular 23.22, mAP score EMA 24.93
160 | 2022-07-29 09:55:56,704 [INFO] current_mAP = 24.93, highest_mAP = 24.93, best_epoch=26
161 |
162 | 2022-07-29 09:55:56,705 [INFO] Save text embeddings done
163 | 2022-07-29 09:56:03,486 [INFO] Epoch [27/100], Step [000/063], LR 2.9e-04, Loss: 4729.1
164 | 2022-07-29 09:57:01,029 [INFO] Start validation...
165 | 2022-07-29 09:57:24,318 [INFO] mAP score regular 23.18, mAP score EMA 25.08
166 | 2022-07-29 09:57:50,660 [INFO] current_mAP = 25.08, highest_mAP = 25.08, best_epoch=27
167 |
168 | 2022-07-29 09:57:50,661 [INFO] Save text embeddings done
169 | 2022-07-29 09:57:55,166 [INFO] Epoch [28/100], Step [000/063], LR 2.9e-04, Loss: 4741.3
170 | 2022-07-29 09:58:52,173 [INFO] Start validation...
171 | 2022-07-29 09:59:15,006 [INFO] mAP score regular 23.48, mAP score EMA 25.09
172 | 2022-07-29 09:59:42,331 [INFO] current_mAP = 25.09, highest_mAP = 25.09, best_epoch=28
173 |
174 | 2022-07-29 09:59:42,332 [INFO] Save text embeddings done
175 | 2022-07-29 09:59:48,479 [INFO] Epoch [29/100], Step [000/063], LR 2.9e-04, Loss: 4437.4
176 | 2022-07-29 10:00:46,094 [INFO] Start validation...
177 | 2022-07-29 10:01:08,688 [INFO] mAP score regular 24.26, mAP score EMA 25.17
178 | 2022-07-29 10:01:32,452 [INFO] current_mAP = 25.17, highest_mAP = 25.17, best_epoch=29
179 |
180 | 2022-07-29 10:01:32,452 [INFO] Save text embeddings done
181 | 2022-07-29 10:01:37,289 [INFO] Epoch [30/100], Step [000/063], LR 2.9e-04, Loss: 4449.7
182 | 2022-07-29 10:02:39,085 [INFO] Start validation...
183 | 2022-07-29 10:03:01,615 [INFO] mAP score regular 23.91, mAP score EMA 25.31
184 | 2022-07-29 10:03:17,173 [INFO] current_mAP = 25.31, highest_mAP = 25.31, best_epoch=30
185 |
186 | 2022-07-29 10:03:17,173 [INFO] Save text embeddings done
187 | 2022-07-29 10:03:22,586 [INFO] Epoch [31/100], Step [000/063], LR 2.9e-04, Loss: 4160.9
188 | 2022-07-29 10:04:30,225 [INFO] Start validation...
189 | 2022-07-29 10:04:53,147 [INFO] mAP score regular 23.22, mAP score EMA 25.44
190 | 2022-07-29 10:05:21,281 [INFO] current_mAP = 25.44, highest_mAP = 25.44, best_epoch=31
191 |
192 | 2022-07-29 10:05:23,469 [INFO] Save text embeddings done
193 | 2022-07-29 10:05:28,687 [INFO] Epoch [32/100], Step [000/063], LR 2.8e-04, Loss: 4545.0
194 | 2022-07-29 10:06:25,065 [INFO] Start validation...
195 | 2022-07-29 10:06:47,916 [INFO] mAP score regular 24.28, mAP score EMA 25.53
196 | 2022-07-29 10:07:17,111 [INFO] current_mAP = 25.53, highest_mAP = 25.53, best_epoch=32
197 |
198 | 2022-07-29 10:07:17,112 [INFO] Save text embeddings done
199 | 2022-07-29 10:07:21,864 [INFO] Epoch [33/100], Step [000/063], LR 2.8e-04, Loss: 4666.8
200 | 2022-07-29 10:08:19,891 [INFO] Start validation...
201 | 2022-07-29 10:08:42,551 [INFO] mAP score regular 23.81, mAP score EMA 25.58
202 | 2022-07-29 10:09:03,455 [INFO] current_mAP = 25.58, highest_mAP = 25.58, best_epoch=33
203 |
204 | 2022-07-29 10:09:03,455 [INFO] Save text embeddings done
205 | 2022-07-29 10:09:09,183 [INFO] Epoch [34/100], Step [000/063], LR 2.8e-04, Loss: 4361.8
206 | 2022-07-29 10:10:09,995 [INFO] Start validation...
207 | 2022-07-29 10:10:32,777 [INFO] mAP score regular 22.87, mAP score EMA 25.65
208 | 2022-07-29 10:10:49,802 [INFO] current_mAP = 25.65, highest_mAP = 25.65, best_epoch=34
209 |
210 | 2022-07-29 10:10:49,803 [INFO] Save text embeddings done
211 | 2022-07-29 10:10:56,725 [INFO] Epoch [35/100], Step [000/063], LR 2.7e-04, Loss: 4491.9
212 | 2022-07-29 10:11:53,650 [INFO] Start validation...
213 | 2022-07-29 10:12:16,334 [INFO] mAP score regular 23.81, mAP score EMA 25.63
214 | 2022-07-29 10:12:16,346 [INFO] current_mAP = 25.63, highest_mAP = 25.65, best_epoch=34
215 |
216 | 2022-07-29 10:12:16,346 [INFO] Save text embeddings done
217 | 2022-07-29 10:12:21,707 [INFO] Epoch [36/100], Step [000/063], LR 2.7e-04, Loss: 4725.1
218 | 2022-07-29 10:13:19,069 [INFO] Start validation...
219 | 2022-07-29 10:13:42,494 [INFO] mAP score regular 23.84, mAP score EMA 25.71
220 | 2022-07-29 10:13:53,448 [INFO] current_mAP = 25.71, highest_mAP = 25.71, best_epoch=36
221 |
222 | 2022-07-29 10:13:53,449 [INFO] Save text embeddings done
223 | 2022-07-29 10:13:58,979 [INFO] Epoch [37/100], Step [000/063], LR 2.7e-04, Loss: 4550.1
224 | 2022-07-29 10:14:56,588 [INFO] Start validation...
225 | 2022-07-29 10:15:20,064 [INFO] mAP score regular 24.24, mAP score EMA 25.67
226 | 2022-07-29 10:15:20,073 [INFO] current_mAP = 25.67, highest_mAP = 25.71, best_epoch=36
227 |
228 | 2022-07-29 10:15:20,073 [INFO] Save text embeddings done
229 | 2022-07-29 10:15:26,300 [INFO] Epoch [38/100], Step [000/063], LR 2.6e-04, Loss: 4764.7
230 | 2022-07-29 10:16:23,005 [INFO] Start validation...
231 | 2022-07-29 10:16:45,663 [INFO] mAP score regular 23.94, mAP score EMA 25.64
232 | 2022-07-29 10:16:45,672 [INFO] current_mAP = 25.64, highest_mAP = 25.71, best_epoch=36
233 |
234 | 2022-07-29 10:16:45,672 [INFO] Save text embeddings done
235 | 2022-07-29 10:16:50,267 [INFO] Epoch [39/100], Step [000/063], LR 2.6e-04, Loss: 4412.5
236 | 2022-07-29 10:17:47,600 [INFO] Start validation...
237 | 2022-07-29 10:18:10,546 [INFO] mAP score regular 23.24, mAP score EMA 25.61
238 | 2022-07-29 10:18:10,556 [INFO] current_mAP = 25.61, highest_mAP = 25.71, best_epoch=36
239 |
240 | 2022-07-29 10:18:10,556 [INFO] Save text embeddings done
241 | 2022-07-29 10:18:16,734 [INFO] Epoch [40/100], Step [000/063], LR 2.6e-04, Loss: 4513.1
242 | 2022-07-29 10:19:15,864 [INFO] Start validation...
243 | 2022-07-29 10:19:38,685 [INFO] mAP score regular 23.93, mAP score EMA 25.60
244 | 2022-07-29 10:19:38,694 [INFO] current_mAP = 25.60, highest_mAP = 25.71, best_epoch=36
245 |
246 | 2022-07-29 10:19:38,694 [INFO] Save text embeddings done
247 | 2022-07-29 10:19:43,243 [INFO] Epoch [41/100], Step [000/063], LR 2.5e-04, Loss: 4301.8
248 | 2022-07-29 10:20:41,449 [INFO] Start validation...
249 | 2022-07-29 10:21:04,521 [INFO] mAP score regular 23.22, mAP score EMA 25.53
250 | 2022-07-29 10:21:04,530 [INFO] current_mAP = 25.53, highest_mAP = 25.71, best_epoch=36
251 |
252 | 2022-07-29 10:21:04,530 [INFO] Save text embeddings done
253 | 2022-07-29 10:21:09,729 [INFO] Epoch [42/100], Step [000/063], LR 2.5e-04, Loss: 4514.9
254 | 2022-07-29 10:22:07,055 [INFO] Start validation...
255 | 2022-07-29 10:22:30,427 [INFO] mAP score regular 23.71, mAP score EMA 25.52
256 | 2022-07-29 10:22:30,437 [INFO] current_mAP = 25.52, highest_mAP = 25.71, best_epoch=36
257 |
258 | 2022-07-29 10:22:30,437 [INFO] Save text embeddings done
259 | 2022-07-29 10:22:35,814 [INFO] Epoch [43/100], Step [000/063], LR 2.4e-04, Loss: 4668.1
260 | 2022-07-29 10:23:33,230 [INFO] Start validation...
261 | 2022-07-29 10:23:56,347 [INFO] mAP score regular 22.90, mAP score EMA 25.47
262 | 2022-07-29 10:23:56,355 [INFO] current_mAP = 25.47, highest_mAP = 25.71, best_epoch=36
263 |
264 | 2022-07-29 10:23:56,356 [INFO] Save text embeddings done
265 | 2022-07-29 10:24:01,297 [INFO] Epoch [44/100], Step [000/063], LR 2.4e-04, Loss: 4356.1
266 | 2022-07-29 10:24:58,189 [INFO] Start validation...
267 | 2022-07-29 10:25:21,006 [INFO] mAP score regular 23.78, mAP score EMA 25.42
268 | 2022-07-29 10:25:21,014 [INFO] current_mAP = 25.42, highest_mAP = 25.71, best_epoch=36
269 |
270 | 2022-07-29 10:25:21,014 [INFO] Save text embeddings done
271 | 2022-07-29 10:25:26,686 [INFO] Epoch [45/100], Step [000/063], LR 2.3e-04, Loss: 4519.1
272 | 2022-07-29 10:26:23,847 [INFO] Start validation...
273 | 2022-07-29 10:26:47,035 [INFO] mAP score regular 23.43, mAP score EMA 25.43
274 | 2022-07-29 10:26:47,045 [INFO] current_mAP = 25.43, highest_mAP = 25.71, best_epoch=36
275 |
276 | 2022-07-29 10:26:47,045 [INFO] Save text embeddings done
277 | 2022-07-29 10:26:54,267 [INFO] Epoch [46/100], Step [000/063], LR 2.3e-04, Loss: 4271.3
278 | 2022-07-29 10:27:52,214 [INFO] Start validation...
279 | 2022-07-29 10:28:15,352 [INFO] mAP score regular 23.96, mAP score EMA 25.40
280 | 2022-07-29 10:28:15,360 [INFO] current_mAP = 25.40, highest_mAP = 25.71, best_epoch=36
281 |
282 | 2022-07-29 10:28:15,360 [INFO] Save text embeddings done
283 | 2022-07-29 10:28:21,349 [INFO] Epoch [47/100], Step [000/063], LR 2.2e-04, Loss: 4291.6
284 | 2022-07-29 10:29:18,763 [INFO] Start validation...
285 | 2022-07-29 10:29:41,741 [INFO] mAP score regular 23.26, mAP score EMA 25.33
286 | 2022-07-29 10:29:41,749 [INFO] current_mAP = 25.33, highest_mAP = 25.71, best_epoch=36
287 |
288 | 2022-07-29 10:29:41,750 [INFO] Save text embeddings done
289 | 2022-07-29 10:29:46,669 [INFO] Epoch [48/100], Step [000/063], LR 2.2e-04, Loss: 4608.9
290 | 2022-07-29 10:30:43,650 [INFO] Start validation...
291 | 2022-07-29 10:31:07,049 [INFO] mAP score regular 23.28, mAP score EMA 25.22
292 | 2022-07-29 10:31:07,057 [INFO] current_mAP = 25.22, highest_mAP = 25.71, best_epoch=36
293 |
294 | 2022-07-29 10:31:07,057 [INFO] Save text embeddings done
295 | 2022-07-29 10:31:13,064 [INFO] Epoch [49/100], Step [000/063], LR 2.1e-04, Loss: 4615.0
296 | 2022-07-29 10:32:09,690 [INFO] Start validation...
297 | 2022-07-29 10:32:32,531 [INFO] mAP score regular 23.78, mAP score EMA 25.14
298 | 2022-07-29 10:32:32,541 [INFO] current_mAP = 25.14, highest_mAP = 25.71, best_epoch=36
299 |
300 | 2022-07-29 10:32:32,541 [INFO] Save text embeddings done
301 | 2022-07-29 10:32:38,914 [INFO] Epoch [50/100], Step [000/063], LR 2.1e-04, Loss: 4325.3
302 |
--------------------------------------------------------------------------------
/logs/scpnet+nuswide.txt:
--------------------------------------------------------------------------------
1 | 2022-07-22 16:58:23,583 [INFO] Epoch [0/60], Step [000/931], LR 1.2e-06, Loss: 168696.5
2 | 2022-07-22 17:00:07,435 [INFO] Epoch [0/60], Step [100/931], LR 1.2e-06, Loss: 1152.4
3 | 2022-07-22 17:01:48,504 [INFO] Epoch [0/60], Step [200/931], LR 1.2e-06, Loss: 842.5
4 | 2022-07-22 17:03:29,886 [INFO] Epoch [0/60], Step [300/931], LR 1.2e-06, Loss: 821.4
5 | 2022-07-22 17:05:10,584 [INFO] Epoch [0/60], Step [400/931], LR 1.3e-06, Loss: 607.3
6 | 2022-07-22 17:06:51,785 [INFO] Epoch [0/60], Step [500/931], LR 1.3e-06, Loss: 471.3
7 | 2022-07-22 17:08:32,190 [INFO] Epoch [0/60], Step [600/931], LR 1.3e-06, Loss: 488.9
8 | 2022-07-22 17:10:12,379 [INFO] Epoch [0/60], Step [700/931], LR 1.4e-06, Loss: 508.0
9 | 2022-07-22 17:11:53,464 [INFO] Epoch [0/60], Step [800/931], LR 1.4e-06, Loss: 409.1
10 | 2022-07-22 17:13:34,056 [INFO] Epoch [0/60], Step [900/931], LR 1.5e-06, Loss: 425.4
11 | 2022-07-22 17:14:04,058 [INFO] Start validation...
12 | 2022-07-22 17:15:55,879 [INFO] mAP score regular 11.84, mAP score EMA 3.66
13 | 2022-07-22 17:15:57,927 [INFO] current_mAP = 11.84, highest_mAP = 11.84, best_epoch=0
14 |
15 | 2022-07-22 17:15:57,928 [INFO] Save text embeddings done
16 | 2022-07-22 17:16:02,440 [INFO] Epoch [1/60], Step [000/931], LR 1.5e-06, Loss: 9494.0
17 | 2022-07-22 17:17:45,651 [INFO] Epoch [1/60], Step [100/931], LR 1.5e-06, Loss: 1090.6
18 | 2022-07-22 17:19:26,972 [INFO] Epoch [1/60], Step [200/931], LR 1.6e-06, Loss: 704.7
19 | 2022-07-22 17:21:08,998 [INFO] Epoch [1/60], Step [300/931], LR 1.7e-06, Loss: 694.3
20 | 2022-07-22 17:22:51,621 [INFO] Epoch [1/60], Step [400/931], LR 1.8e-06, Loss: 579.7
21 | 2022-07-22 17:24:33,624 [INFO] Epoch [1/60], Step [500/931], LR 1.9e-06, Loss: 521.8
22 | 2022-07-22 17:26:15,368 [INFO] Epoch [1/60], Step [600/931], LR 1.9e-06, Loss: 515.0
23 | 2022-07-22 17:27:56,879 [INFO] Epoch [1/60], Step [700/931], LR 2.0e-06, Loss: 465.6
24 | 2022-07-22 17:29:38,717 [INFO] Epoch [1/60], Step [800/931], LR 2.2e-06, Loss: 471.2
25 | 2022-07-22 17:31:20,236 [INFO] Epoch [1/60], Step [900/931], LR 2.3e-06, Loss: 503.3
26 | 2022-07-22 17:31:50,347 [INFO] Start validation...
27 | 2022-07-22 17:33:40,165 [INFO] mAP score regular 39.21, mAP score EMA 5.20
28 | 2022-07-22 17:33:46,354 [INFO] current_mAP = 39.21, highest_mAP = 39.21, best_epoch=1
29 |
30 | 2022-07-22 17:33:46,354 [INFO] Save text embeddings done
31 | 2022-07-22 17:33:53,102 [INFO] Epoch [2/60], Step [000/931], LR 2.3e-06, Loss: 539.3
32 | 2022-07-22 17:35:35,290 [INFO] Epoch [2/60], Step [100/931], LR 2.4e-06, Loss: 485.4
33 | 2022-07-22 17:37:17,157 [INFO] Epoch [2/60], Step [200/931], LR 2.5e-06, Loss: 444.6
34 | 2022-07-22 17:38:58,742 [INFO] Epoch [2/60], Step [300/931], LR 2.7e-06, Loss: 437.2
35 | 2022-07-22 17:40:40,263 [INFO] Epoch [2/60], Step [400/931], LR 2.8e-06, Loss: 409.9
36 | 2022-07-22 17:42:22,608 [INFO] Epoch [2/60], Step [500/931], LR 3.0e-06, Loss: 465.6
37 | 2022-07-22 17:44:04,586 [INFO] Epoch [2/60], Step [600/931], LR 3.1e-06, Loss: 465.4
38 | 2022-07-22 17:45:46,511 [INFO] Epoch [2/60], Step [700/931], LR 3.3e-06, Loss: 443.5
39 | 2022-07-22 17:47:28,576 [INFO] Epoch [2/60], Step [800/931], LR 3.4e-06, Loss: 409.1
40 | 2022-07-22 17:49:10,655 [INFO] Epoch [2/60], Step [900/931], LR 3.6e-06, Loss: 407.5
41 | 2022-07-22 17:49:40,992 [INFO] Start validation...
42 | 2022-07-22 17:51:31,065 [INFO] mAP score regular 45.36, mAP score EMA 8.71
43 | 2022-07-22 17:51:37,773 [INFO] current_mAP = 45.36, highest_mAP = 45.36, best_epoch=2
44 |
45 | 2022-07-22 17:51:37,774 [INFO] Save text embeddings done
46 | 2022-07-22 17:51:44,476 [INFO] Epoch [3/60], Step [000/931], LR 3.6e-06, Loss: 405.8
47 | 2022-07-22 17:53:26,599 [INFO] Epoch [3/60], Step [100/931], LR 3.8e-06, Loss: 382.8
48 | 2022-07-22 17:55:08,574 [INFO] Epoch [3/60], Step [200/931], LR 4.0e-06, Loss: 437.0
49 | 2022-07-22 17:56:50,093 [INFO] Epoch [3/60], Step [300/931], LR 4.2e-06, Loss: 421.8
50 | 2022-07-22 17:58:32,676 [INFO] Epoch [3/60], Step [400/931], LR 4.3e-06, Loss: 407.3
51 | 2022-07-22 18:00:14,487 [INFO] Epoch [3/60], Step [500/931], LR 4.5e-06, Loss: 446.2
52 | 2022-07-22 18:01:56,628 [INFO] Epoch [3/60], Step [600/931], LR 4.7e-06, Loss: 392.6
53 | 2022-07-22 18:03:38,722 [INFO] Epoch [3/60], Step [700/931], LR 4.9e-06, Loss: 411.1
54 | 2022-07-22 18:05:20,252 [INFO] Epoch [3/60], Step [800/931], LR 5.1e-06, Loss: 426.3
55 | 2022-07-22 18:07:02,374 [INFO] Epoch [3/60], Step [900/931], LR 5.4e-06, Loss: 377.5
56 | 2022-07-22 18:07:32,762 [INFO] Start validation...
57 | 2022-07-22 18:09:23,769 [INFO] mAP score regular 49.79, mAP score EMA 18.69
58 | 2022-07-22 18:09:30,335 [INFO] current_mAP = 49.79, highest_mAP = 49.79, best_epoch=3
59 |
60 | 2022-07-22 18:09:30,335 [INFO] Save text embeddings done
61 | 2022-07-22 18:09:35,111 [INFO] Epoch [4/60], Step [000/931], LR 5.4e-06, Loss: 348.0
62 | 2022-07-22 18:11:17,859 [INFO] Epoch [4/60], Step [100/931], LR 5.6e-06, Loss: 419.7
63 | 2022-07-22 18:12:58,908 [INFO] Epoch [4/60], Step [200/931], LR 5.9e-06, Loss: 434.7
64 | 2022-07-22 18:14:40,430 [INFO] Epoch [4/60], Step [300/931], LR 6.1e-06, Loss: 422.4
65 | 2022-07-22 18:16:21,626 [INFO] Epoch [4/60], Step [400/931], LR 6.3e-06, Loss: 382.5
66 | 2022-07-22 18:18:03,176 [INFO] Epoch [4/60], Step [500/931], LR 6.5e-06, Loss: 398.0
67 | 2022-07-22 18:19:44,669 [INFO] Epoch [4/60], Step [600/931], LR 6.8e-06, Loss: 366.3
68 | 2022-07-22 18:21:26,210 [INFO] Epoch [4/60], Step [700/931], LR 7.0e-06, Loss: 438.7
69 | 2022-07-22 18:23:07,129 [INFO] Epoch [4/60], Step [800/931], LR 7.3e-06, Loss: 430.0
70 | 2022-07-22 18:24:48,503 [INFO] Epoch [4/60], Step [900/931], LR 7.5e-06, Loss: 376.9
71 | 2022-07-22 18:25:18,670 [INFO] Start validation...
72 | 2022-07-22 18:27:08,789 [INFO] mAP score regular 53.76, mAP score EMA 35.17
73 | 2022-07-22 18:27:15,260 [INFO] current_mAP = 53.76, highest_mAP = 53.76, best_epoch=4
74 |
75 | 2022-07-22 18:27:15,260 [INFO] Save text embeddings done
76 | 2022-07-22 18:27:20,534 [INFO] Epoch [5/60], Step [000/931], LR 7.6e-06, Loss: 370.5
77 | 2022-07-22 18:29:02,237 [INFO] Epoch [5/60], Step [100/931], LR 7.9e-06, Loss: 448.7
78 | 2022-07-22 18:30:44,068 [INFO] Epoch [5/60], Step [200/931], LR 8.1e-06, Loss: 437.9
79 | 2022-07-22 18:32:25,512 [INFO] Epoch [5/60], Step [300/931], LR 8.4e-06, Loss: 450.5
80 | 2022-07-22 18:34:07,672 [INFO] Epoch [5/60], Step [400/931], LR 8.6e-06, Loss: 402.4
81 | 2022-07-22 18:35:49,313 [INFO] Epoch [5/60], Step [500/931], LR 8.9e-06, Loss: 383.7
82 | 2022-07-22 18:37:30,873 [INFO] Epoch [5/60], Step [600/931], LR 9.2e-06, Loss: 388.4
83 | 2022-07-22 18:39:12,336 [INFO] Epoch [5/60], Step [700/931], LR 9.5e-06, Loss: 410.7
84 | 2022-07-22 18:40:54,095 [INFO] Epoch [5/60], Step [800/931], LR 9.7e-06, Loss: 405.9
85 | 2022-07-22 18:42:35,310 [INFO] Epoch [5/60], Step [900/931], LR 1.0e-05, Loss: 427.1
86 | 2022-07-22 18:43:06,120 [INFO] Start validation...
87 | 2022-07-22 18:44:55,785 [INFO] mAP score regular 56.13, mAP score EMA 45.50
88 | 2022-07-22 18:45:02,754 [INFO] current_mAP = 56.13, highest_mAP = 56.13, best_epoch=5
89 |
90 | 2022-07-22 18:45:02,755 [INFO] Save text embeddings done
91 | 2022-07-22 18:45:08,786 [INFO] Epoch [6/60], Step [000/931], LR 1.0e-05, Loss: 377.6
92 | 2022-07-22 18:46:50,663 [INFO] Epoch [6/60], Step [100/931], LR 1.0e-05, Loss: 424.4
93 | 2022-07-22 18:48:32,182 [INFO] Epoch [6/60], Step [200/931], LR 1.1e-05, Loss: 411.5
94 | 2022-07-22 18:50:13,058 [INFO] Epoch [6/60], Step [300/931], LR 1.1e-05, Loss: 374.4
95 | 2022-07-22 18:51:54,736 [INFO] Epoch [6/60], Step [400/931], LR 1.1e-05, Loss: 386.2
96 | 2022-07-22 18:53:36,489 [INFO] Epoch [6/60], Step [500/931], LR 1.2e-05, Loss: 434.6
97 | 2022-07-22 18:55:17,819 [INFO] Epoch [6/60], Step [600/931], LR 1.2e-05, Loss: 388.9
98 | 2022-07-22 18:56:59,350 [INFO] Epoch [6/60], Step [700/931], LR 1.2e-05, Loss: 388.1
99 | 2022-07-22 18:58:41,195 [INFO] Epoch [6/60], Step [800/931], LR 1.2e-05, Loss: 447.5
100 | 2022-07-22 19:00:22,662 [INFO] Epoch [6/60], Step [900/931], LR 1.3e-05, Loss: 495.3
101 | 2022-07-22 19:00:52,610 [INFO] Start validation...
102 | 2022-07-22 19:02:44,761 [INFO] mAP score regular 57.37, mAP score EMA 49.74
103 | 2022-07-22 19:02:51,485 [INFO] current_mAP = 57.37, highest_mAP = 57.37, best_epoch=6
104 |
105 | 2022-07-22 19:02:51,486 [INFO] Save text embeddings done
106 | 2022-07-22 19:02:56,513 [INFO] Epoch [7/60], Step [000/931], LR 1.3e-05, Loss: 435.5
107 | 2022-07-22 19:04:38,287 [INFO] Epoch [7/60], Step [100/931], LR 1.3e-05, Loss: 415.3
108 | 2022-07-22 19:06:20,097 [INFO] Epoch [7/60], Step [200/931], LR 1.3e-05, Loss: 459.2
109 | 2022-07-22 19:08:01,869 [INFO] Epoch [7/60], Step [300/931], LR 1.4e-05, Loss: 451.2
110 | 2022-07-22 19:09:43,821 [INFO] Epoch [7/60], Step [400/931], LR 1.4e-05, Loss: 425.5
111 | 2022-07-22 19:11:25,536 [INFO] Epoch [7/60], Step [500/931], LR 1.4e-05, Loss: 476.3
112 | 2022-07-22 19:13:06,800 [INFO] Epoch [7/60], Step [600/931], LR 1.5e-05, Loss: 438.6
113 | 2022-07-22 19:14:47,432 [INFO] Epoch [7/60], Step [700/931], LR 1.5e-05, Loss: 461.8
114 | 2022-07-22 19:16:28,145 [INFO] Epoch [7/60], Step [800/931], LR 1.5e-05, Loss: 471.2
115 | 2022-07-22 19:18:09,652 [INFO] Epoch [7/60], Step [900/931], LR 1.6e-05, Loss: 412.4
116 | 2022-07-22 19:18:39,353 [INFO] Start validation...
117 | 2022-07-22 19:20:25,645 [INFO] mAP score regular 58.69, mAP score EMA 52.78
118 | 2022-07-22 19:20:32,182 [INFO] current_mAP = 58.69, highest_mAP = 58.69, best_epoch=7
119 |
120 | 2022-07-22 19:20:32,183 [INFO] Save text embeddings done
121 | 2022-07-22 19:20:36,631 [INFO] Epoch [8/60], Step [000/931], LR 1.6e-05, Loss: 448.2
122 | 2022-07-22 19:22:18,627 [INFO] Epoch [8/60], Step [100/931], LR 1.6e-05, Loss: 484.9
123 | 2022-07-22 19:24:00,345 [INFO] Epoch [8/60], Step [200/931], LR 1.6e-05, Loss: 488.8
124 | 2022-07-22 19:25:42,333 [INFO] Epoch [8/60], Step [300/931], LR 1.7e-05, Loss: 469.3
125 | 2022-07-22 19:27:23,299 [INFO] Epoch [8/60], Step [400/931], LR 1.7e-05, Loss: 462.5
126 | 2022-07-22 19:29:04,810 [INFO] Epoch [8/60], Step [500/931], LR 1.7e-05, Loss: 444.4
127 | 2022-07-22 19:30:47,145 [INFO] Epoch [8/60], Step [600/931], LR 1.7e-05, Loss: 441.2
128 | 2022-07-22 19:32:27,715 [INFO] Epoch [8/60], Step [700/931], LR 1.8e-05, Loss: 424.7
129 | 2022-07-22 19:34:09,897 [INFO] Epoch [8/60], Step [800/931], LR 1.8e-05, Loss: 450.1
130 | 2022-07-22 19:35:50,608 [INFO] Epoch [8/60], Step [900/931], LR 1.8e-05, Loss: 445.6
131 | 2022-07-22 19:36:20,269 [INFO] Start validation...
132 | 2022-07-22 19:38:06,495 [INFO] mAP score regular 58.81, mAP score EMA 55.29
133 | 2022-07-22 19:38:12,678 [INFO] current_mAP = 58.81, highest_mAP = 58.81, best_epoch=8
134 |
135 | 2022-07-22 19:38:12,679 [INFO] Save text embeddings done
136 | 2022-07-22 19:38:17,254 [INFO] Epoch [9/60], Step [000/931], LR 1.8e-05, Loss: 488.8
137 | 2022-07-22 19:39:58,702 [INFO] Epoch [9/60], Step [100/931], LR 1.9e-05, Loss: 394.8
138 | 2022-07-22 19:41:39,838 [INFO] Epoch [9/60], Step [200/931], LR 1.9e-05, Loss: 447.5
139 | 2022-07-22 19:43:20,808 [INFO] Epoch [9/60], Step [300/931], LR 1.9e-05, Loss: 449.2
140 | 2022-07-22 19:45:01,452 [INFO] Epoch [9/60], Step [400/931], LR 2.0e-05, Loss: 461.2
141 | 2022-07-22 19:46:42,221 [INFO] Epoch [9/60], Step [500/931], LR 2.0e-05, Loss: 442.0
142 | 2022-07-22 19:48:23,145 [INFO] Epoch [9/60], Step [600/931], LR 2.0e-05, Loss: 448.4
143 | 2022-07-22 19:50:03,666 [INFO] Epoch [9/60], Step [700/931], LR 2.0e-05, Loss: 449.9
144 | 2022-07-22 19:51:44,383 [INFO] Epoch [9/60], Step [800/931], LR 2.1e-05, Loss: 460.1
145 | 2022-07-22 19:53:25,921 [INFO] Epoch [9/60], Step [900/931], LR 2.1e-05, Loss: 500.3
146 | 2022-07-22 19:53:55,621 [INFO] Start validation...
147 | 2022-07-22 19:55:43,415 [INFO] mAP score regular 58.21, mAP score EMA 57.47
148 | 2022-07-22 19:55:43,431 [INFO] current_mAP = 58.21, highest_mAP = 58.81, best_epoch=8
149 |
150 | 2022-07-22 19:55:43,432 [INFO] Save text embeddings done
151 | 2022-07-22 19:55:47,553 [INFO] Epoch [10/60], Step [000/931], LR 2.1e-05, Loss: 527.4
152 | 2022-07-22 19:57:28,562 [INFO] Epoch [10/60], Step [100/931], LR 2.1e-05, Loss: 446.2
153 | 2022-07-22 19:59:08,950 [INFO] Epoch [10/60], Step [200/931], LR 2.2e-05, Loss: 448.2
154 | 2022-07-22 20:00:49,431 [INFO] Epoch [10/60], Step [300/931], LR 2.2e-05, Loss: 430.5
155 | 2022-07-22 20:02:29,805 [INFO] Epoch [10/60], Step [400/931], LR 2.2e-05, Loss: 492.0
156 | 2022-07-22 20:04:10,502 [INFO] Epoch [10/60], Step [500/931], LR 2.2e-05, Loss: 512.3
157 | 2022-07-22 20:05:51,735 [INFO] Epoch [10/60], Step [600/931], LR 2.3e-05, Loss: 476.7
158 | 2022-07-22 20:07:32,334 [INFO] Epoch [10/60], Step [700/931], LR 2.3e-05, Loss: 472.3
159 | 2022-07-22 20:09:13,029 [INFO] Epoch [10/60], Step [800/931], LR 2.3e-05, Loss: 413.4
160 | 2022-07-22 20:10:53,699 [INFO] Epoch [10/60], Step [900/931], LR 2.4e-05, Loss: 488.3
161 | 2022-07-22 20:11:23,478 [INFO] Start validation...
162 | 2022-07-22 20:13:09,127 [INFO] mAP score regular 58.46, mAP score EMA 59.12
163 | 2022-07-22 20:13:15,739 [INFO] current_mAP = 59.12, highest_mAP = 59.12, best_epoch=10
164 |
165 | 2022-07-22 20:13:15,739 [INFO] Save text embeddings done
166 | 2022-07-22 20:13:21,393 [INFO] Epoch [11/60], Step [000/931], LR 2.4e-05, Loss: 469.4
167 | 2022-07-22 20:15:02,065 [INFO] Epoch [11/60], Step [100/931], LR 2.4e-05, Loss: 471.5
168 | 2022-07-22 20:16:42,953 [INFO] Epoch [11/60], Step [200/931], LR 2.4e-05, Loss: 508.9
169 | 2022-07-22 20:18:23,656 [INFO] Epoch [11/60], Step [300/931], LR 2.4e-05, Loss: 453.1
170 | 2022-07-22 20:20:05,861 [INFO] Epoch [11/60], Step [400/931], LR 2.5e-05, Loss: 460.3
171 | 2022-07-22 20:21:47,189 [INFO] Epoch [11/60], Step [500/931], LR 2.5e-05, Loss: 418.6
172 | 2022-07-22 20:23:28,281 [INFO] Epoch [11/60], Step [600/931], LR 2.5e-05, Loss: 470.3
173 | 2022-07-22 20:25:09,164 [INFO] Epoch [11/60], Step [700/931], LR 2.5e-05, Loss: 434.9
174 | 2022-07-22 20:26:50,217 [INFO] Epoch [11/60], Step [800/931], LR 2.6e-05, Loss: 462.9
175 | 2022-07-22 20:28:32,165 [INFO] Epoch [11/60], Step [900/931], LR 2.6e-05, Loss: 506.0
176 | 2022-07-22 20:29:01,882 [INFO] Start validation...
177 | 2022-07-22 20:30:50,127 [INFO] mAP score regular 58.00, mAP score EMA 60.30
178 | 2022-07-22 20:30:56,423 [INFO] current_mAP = 60.30, highest_mAP = 60.30, best_epoch=11
179 |
180 | 2022-07-22 20:30:56,423 [INFO] Save text embeddings done
181 | 2022-07-22 20:31:01,779 [INFO] Epoch [12/60], Step [000/931], LR 2.6e-05, Loss: 454.3
182 | 2022-07-22 20:32:43,921 [INFO] Epoch [12/60], Step [100/931], LR 2.6e-05, Loss: 473.9
183 | 2022-07-22 20:34:26,710 [INFO] Epoch [12/60], Step [200/931], LR 2.6e-05, Loss: 484.2
184 | 2022-07-22 20:36:08,301 [INFO] Epoch [12/60], Step [300/931], LR 2.6e-05, Loss: 479.9
185 | 2022-07-22 20:37:50,309 [INFO] Epoch [12/60], Step [400/931], LR 2.7e-05, Loss: 483.3
186 | 2022-07-22 20:39:31,775 [INFO] Epoch [12/60], Step [500/931], LR 2.7e-05, Loss: 479.9
187 | 2022-07-22 20:41:14,192 [INFO] Epoch [12/60], Step [600/931], LR 2.7e-05, Loss: 473.1
188 | 2022-07-22 20:42:55,274 [INFO] Epoch [12/60], Step [700/931], LR 2.7e-05, Loss: 445.0
189 | 2022-07-22 20:44:37,150 [INFO] Epoch [12/60], Step [800/931], LR 2.7e-05, Loss: 496.0
190 | 2022-07-22 20:46:18,049 [INFO] Epoch [12/60], Step [900/931], LR 2.8e-05, Loss: 501.6
191 | 2022-07-22 20:46:47,986 [INFO] Start validation...
192 | 2022-07-22 20:48:35,301 [INFO] mAP score regular 57.37, mAP score EMA 61.14
193 | 2022-07-22 20:48:41,965 [INFO] current_mAP = 61.14, highest_mAP = 61.14, best_epoch=12
194 |
195 | 2022-07-22 20:48:41,965 [INFO] Save text embeddings done
196 | 2022-07-22 20:48:48,658 [INFO] Epoch [13/60], Step [000/931], LR 2.8e-05, Loss: 490.6
197 | 2022-07-22 20:50:29,657 [INFO] Epoch [13/60], Step [100/931], LR 2.8e-05, Loss: 451.0
198 | 2022-07-22 20:52:10,663 [INFO] Epoch [13/60], Step [200/931], LR 2.8e-05, Loss: 481.5
199 | 2022-07-22 20:53:51,202 [INFO] Epoch [13/60], Step [300/931], LR 2.8e-05, Loss: 451.2
200 | 2022-07-22 20:55:32,349 [INFO] Epoch [13/60], Step [400/931], LR 2.8e-05, Loss: 461.9
201 | 2022-07-22 20:57:13,332 [INFO] Epoch [13/60], Step [500/931], LR 2.8e-05, Loss: 504.6
202 | 2022-07-22 20:58:54,373 [INFO] Epoch [13/60], Step [600/931], LR 2.8e-05, Loss: 475.2
203 | 2022-07-22 21:00:35,350 [INFO] Epoch [13/60], Step [700/931], LR 2.9e-05, Loss: 495.0
204 | 2022-07-22 21:02:16,973 [INFO] Epoch [13/60], Step [800/931], LR 2.9e-05, Loss: 437.8
205 | 2022-07-22 21:03:58,042 [INFO] Epoch [13/60], Step [900/931], LR 2.9e-05, Loss: 468.1
206 | 2022-07-22 21:04:27,835 [INFO] Start validation...
207 | 2022-07-22 21:06:15,938 [INFO] mAP score regular 56.19, mAP score EMA 61.67
208 | 2022-07-22 21:06:22,436 [INFO] current_mAP = 61.67, highest_mAP = 61.67, best_epoch=13
209 |
210 | 2022-07-22 21:06:22,437 [INFO] Save text embeddings done
211 | 2022-07-22 21:06:27,647 [INFO] Epoch [14/60], Step [000/931], LR 2.9e-05, Loss: 469.8
212 | 2022-07-22 21:08:10,284 [INFO] Epoch [14/60], Step [100/931], LR 2.9e-05, Loss: 453.6
213 | 2022-07-22 21:09:52,117 [INFO] Epoch [14/60], Step [200/931], LR 2.9e-05, Loss: 501.0
214 | 2022-07-22 21:11:33,067 [INFO] Epoch [14/60], Step [300/931], LR 2.9e-05, Loss: 437.3
215 | 2022-07-22 21:13:13,941 [INFO] Epoch [14/60], Step [400/931], LR 2.9e-05, Loss: 502.1
216 | 2022-07-22 21:14:54,995 [INFO] Epoch [14/60], Step [500/931], LR 2.9e-05, Loss: 486.2
217 | 2022-07-22 21:16:36,150 [INFO] Epoch [14/60], Step [600/931], LR 2.9e-05, Loss: 460.9
218 | 2022-07-22 21:18:17,339 [INFO] Epoch [14/60], Step [700/931], LR 3.0e-05, Loss: 466.3
219 | 2022-07-22 21:19:58,371 [INFO] Epoch [14/60], Step [800/931], LR 3.0e-05, Loss: 452.6
220 | 2022-07-22 21:21:40,733 [INFO] Epoch [14/60], Step [900/931], LR 3.0e-05, Loss: 466.5
221 | 2022-07-22 21:22:10,500 [INFO] Start validation...
222 | 2022-07-22 21:24:01,041 [INFO] mAP score regular 56.97, mAP score EMA 61.98
223 | 2022-07-22 21:24:08,292 [INFO] current_mAP = 61.98, highest_mAP = 61.98, best_epoch=14
224 |
225 | 2022-07-22 21:24:08,293 [INFO] Save text embeddings done
226 | 2022-07-22 21:24:13,050 [INFO] Epoch [15/60], Step [000/931], LR 3.0e-05, Loss: 428.4
227 | 2022-07-22 21:25:55,328 [INFO] Epoch [15/60], Step [100/931], LR 3.0e-05, Loss: 453.4
228 | 2022-07-22 21:27:37,141 [INFO] Epoch [15/60], Step [200/931], LR 3.0e-05, Loss: 467.9
229 | 2022-07-22 21:29:18,685 [INFO] Epoch [15/60], Step [300/931], LR 3.0e-05, Loss: 440.8
230 | 2022-07-22 21:30:59,519 [INFO] Epoch [15/60], Step [400/931], LR 3.0e-05, Loss: 448.2
231 | 2022-07-22 21:32:40,941 [INFO] Epoch [15/60], Step [500/931], LR 3.0e-05, Loss: 467.9
232 | 2022-07-22 21:34:22,354 [INFO] Epoch [15/60], Step [600/931], LR 3.0e-05, Loss: 463.5
233 | 2022-07-22 21:36:04,211 [INFO] Epoch [15/60], Step [700/931], LR 3.0e-05, Loss: 454.4
234 | 2022-07-22 21:37:45,543 [INFO] Epoch [15/60], Step [800/931], LR 3.0e-05, Loss: 486.7
235 | 2022-07-22 21:39:26,770 [INFO] Epoch [15/60], Step [900/931], LR 3.0e-05, Loss: 455.3
236 | 2022-07-22 21:39:56,605 [INFO] Start validation...
237 | 2022-07-22 21:41:47,027 [INFO] mAP score regular 54.83, mAP score EMA 62.04
238 | 2022-07-22 21:41:53,996 [INFO] current_mAP = 62.04, highest_mAP = 62.04, best_epoch=15
239 |
240 | 2022-07-22 21:41:53,997 [INFO] Save text embeddings done
241 | 2022-07-22 21:42:00,142 [INFO] Epoch [16/60], Step [000/931], LR 3.0e-05, Loss: 441.6
242 | 2022-07-22 21:43:42,084 [INFO] Epoch [16/60], Step [100/931], LR 3.0e-05, Loss: 411.5
243 | 2022-07-22 21:45:23,530 [INFO] Epoch [16/60], Step [200/931], LR 3.0e-05, Loss: 417.8
244 | 2022-07-22 21:47:04,952 [INFO] Epoch [16/60], Step [300/931], LR 3.0e-05, Loss: 470.4
245 | 2022-07-22 21:48:46,557 [INFO] Epoch [16/60], Step [400/931], LR 3.0e-05, Loss: 448.7
246 | 2022-07-22 21:50:29,074 [INFO] Epoch [16/60], Step [500/931], LR 3.0e-05, Loss: 412.4
247 | 2022-07-22 21:52:10,681 [INFO] Epoch [16/60], Step [600/931], LR 3.0e-05, Loss: 453.9
248 | 2022-07-22 21:53:52,312 [INFO] Epoch [16/60], Step [700/931], LR 3.0e-05, Loss: 421.8
249 | 2022-07-22 21:55:34,295 [INFO] Epoch [16/60], Step [800/931], LR 3.0e-05, Loss: 460.4
250 | 2022-07-22 21:57:16,768 [INFO] Epoch [16/60], Step [900/931], LR 3.0e-05, Loss: 436.2
251 | 2022-07-22 21:57:46,734 [INFO] Start validation...
252 | 2022-07-22 21:59:38,726 [INFO] mAP score regular 56.43, mAP score EMA 61.85
253 | 2022-07-22 21:59:38,743 [INFO] current_mAP = 61.85, highest_mAP = 62.04, best_epoch=15
254 |
255 | 2022-07-22 21:59:38,743 [INFO] Save text embeddings done
256 | 2022-07-22 21:59:44,728 [INFO] Epoch [17/60], Step [000/931], LR 3.0e-05, Loss: 465.2
257 | 2022-07-22 22:01:27,612 [INFO] Epoch [17/60], Step [100/931], LR 3.0e-05, Loss: 466.5
258 | 2022-07-22 22:03:09,889 [INFO] Epoch [17/60], Step [200/931], LR 3.0e-05, Loss: 459.8
259 | 2022-07-22 22:04:51,506 [INFO] Epoch [17/60], Step [300/931], LR 3.0e-05, Loss: 483.3
260 | 2022-07-22 22:06:32,686 [INFO] Epoch [17/60], Step [400/931], LR 3.0e-05, Loss: 421.2
261 | 2022-07-22 22:08:14,521 [INFO] Epoch [17/60], Step [500/931], LR 3.0e-05, Loss: 411.6
262 | 2022-07-22 22:09:56,089 [INFO] Epoch [17/60], Step [600/931], LR 3.0e-05, Loss: 432.9
263 | 2022-07-22 22:11:37,423 [INFO] Epoch [17/60], Step [700/931], LR 3.0e-05, Loss: 398.5
264 | 2022-07-22 22:13:19,035 [INFO] Epoch [17/60], Step [800/931], LR 3.0e-05, Loss: 422.8
265 | 2022-07-22 22:15:00,741 [INFO] Epoch [17/60], Step [900/931], LR 3.0e-05, Loss: 488.0
266 | 2022-07-22 22:15:30,760 [INFO] Start validation...
267 | 2022-07-22 22:17:16,056 [INFO] mAP score regular 55.89, mAP score EMA 61.57
268 | 2022-07-22 22:17:16,071 [INFO] current_mAP = 61.57, highest_mAP = 62.04, best_epoch=15
269 |
270 | 2022-07-22 22:17:16,071 [INFO] Save text embeddings done
271 | 2022-07-22 22:17:24,893 [INFO] Epoch [18/60], Step [000/931], LR 3.0e-05, Loss: 477.4
272 | 2022-07-22 22:19:06,060 [INFO] Epoch [18/60], Step [100/931], LR 3.0e-05, Loss: 451.3
273 | 2022-07-22 22:20:47,363 [INFO] Epoch [18/60], Step [200/931], LR 3.0e-05, Loss: 456.4
274 | 2022-07-22 22:22:29,607 [INFO] Epoch [18/60], Step [300/931], LR 3.0e-05, Loss: 459.9
275 | 2022-07-22 22:24:11,986 [INFO] Epoch [18/60], Step [400/931], LR 3.0e-05, Loss: 440.4
276 | 2022-07-22 22:25:53,735 [INFO] Epoch [18/60], Step [500/931], LR 3.0e-05, Loss: 400.4
277 | 2022-07-22 22:27:35,577 [INFO] Epoch [18/60], Step [600/931], LR 3.0e-05, Loss: 464.0
278 | 2022-07-22 22:29:16,623 [INFO] Epoch [18/60], Step [700/931], LR 3.0e-05, Loss: 424.3
279 | 2022-07-22 22:30:58,613 [INFO] Epoch [18/60], Step [800/931], LR 3.0e-05, Loss: 471.1
280 | 2022-07-22 22:32:40,558 [INFO] Epoch [18/60], Step [900/931], LR 3.0e-05, Loss: 458.9
281 | 2022-07-22 22:33:10,468 [INFO] Start validation...
282 | 2022-07-22 22:34:59,445 [INFO] mAP score regular 56.24, mAP score EMA 61.21
283 | 2022-07-22 22:34:59,462 [INFO] current_mAP = 61.21, highest_mAP = 62.04, best_epoch=15
284 |
285 | 2022-07-22 22:34:59,462 [INFO] Save text embeddings done
286 | 2022-07-22 22:35:03,586 [INFO] Epoch [19/60], Step [000/931], LR 3.0e-05, Loss: 412.9
287 | 2022-07-22 22:36:46,505 [INFO] Epoch [19/60], Step [100/931], LR 3.0e-05, Loss: 382.3
288 | 2022-07-22 22:38:29,144 [INFO] Epoch [19/60], Step [200/931], LR 3.0e-05, Loss: 370.6
289 | 2022-07-22 22:40:12,081 [INFO] Epoch [19/60], Step [300/931], LR 3.0e-05, Loss: 438.2
290 | 2022-07-22 22:41:53,928 [INFO] Epoch [19/60], Step [400/931], LR 3.0e-05, Loss: 464.3
291 | 2022-07-22 22:43:35,950 [INFO] Epoch [19/60], Step [500/931], LR 3.0e-05, Loss: 473.7
292 | 2022-07-22 22:45:17,584 [INFO] Epoch [19/60], Step [600/931], LR 3.0e-05, Loss: 423.3
293 | 2022-07-22 22:46:59,126 [INFO] Epoch [19/60], Step [700/931], LR 3.0e-05, Loss: 428.5
294 | 2022-07-22 22:48:40,476 [INFO] Epoch [19/60], Step [800/931], LR 3.0e-05, Loss: 400.4
295 | 2022-07-22 22:50:22,851 [INFO] Epoch [19/60], Step [900/931], LR 3.0e-05, Loss: 446.7
296 | 2022-07-22 22:50:52,918 [INFO] Start validation...
297 | 2022-07-22 22:52:42,333 [INFO] mAP score regular 55.96, mAP score EMA 60.90
298 | 2022-07-22 22:52:42,367 [INFO] current_mAP = 60.90, highest_mAP = 62.04, best_epoch=15
299 |
300 | 2022-07-22 22:52:42,367 [INFO] Save text embeddings done
301 | 2022-07-22 22:52:47,543 [INFO] Epoch [20/60], Step [000/931], LR 3.0e-05, Loss: 463.8
302 | 2022-07-22 22:54:30,558 [INFO] Epoch [20/60], Step [100/931], LR 3.0e-05, Loss: 417.2
303 | 2022-07-22 22:56:12,176 [INFO] Epoch [20/60], Step [200/931], LR 3.0e-05, Loss: 407.8
304 | 2022-07-22 22:57:54,237 [INFO] Epoch [20/60], Step [300/931], LR 3.0e-05, Loss: 431.0
305 | 2022-07-22 22:59:36,853 [INFO] Epoch [20/60], Step [400/931], LR 3.0e-05, Loss: 402.1
306 | 2022-07-22 23:01:18,576 [INFO] Epoch [20/60], Step [500/931], LR 3.0e-05, Loss: 411.5
307 | 2022-07-22 23:03:00,285 [INFO] Epoch [20/60], Step [600/931], LR 3.0e-05, Loss: 403.1
308 | 2022-07-22 23:04:41,332 [INFO] Epoch [20/60], Step [700/931], LR 3.0e-05, Loss: 440.3
309 | 2022-07-22 23:06:23,263 [INFO] Epoch [20/60], Step [800/931], LR 3.0e-05, Loss: 453.7
310 | 2022-07-22 23:08:05,801 [INFO] Epoch [20/60], Step [900/931], LR 3.0e-05, Loss: 476.9
311 | 2022-07-22 23:08:35,797 [INFO] Start validation...
312 | 2022-07-22 23:10:21,861 [INFO] mAP score regular 54.57, mAP score EMA 60.69
313 | 2022-07-22 23:10:21,877 [INFO] current_mAP = 60.69, highest_mAP = 62.04, best_epoch=15
314 |
315 | 2022-07-22 23:10:21,877 [INFO] Save text embeddings done
316 | 2022-07-22 23:10:28,288 [INFO] Epoch [21/60], Step [000/931], LR 3.0e-05, Loss: 438.4
317 | 2022-07-22 23:12:09,766 [INFO] Epoch [21/60], Step [100/931], LR 3.0e-05, Loss: 419.9
318 | 2022-07-22 23:13:51,275 [INFO] Epoch [21/60], Step [200/931], LR 3.0e-05, Loss: 397.4
319 | 2022-07-22 23:15:32,690 [INFO] Epoch [21/60], Step [300/931], LR 2.9e-05, Loss: 398.8
320 | 2022-07-22 23:17:13,952 [INFO] Epoch [21/60], Step [400/931], LR 2.9e-05, Loss: 398.1
321 | 2022-07-22 23:18:55,569 [INFO] Epoch [21/60], Step [500/931], LR 2.9e-05, Loss: 447.3
322 | 2022-07-22 23:20:37,013 [INFO] Epoch [21/60], Step [600/931], LR 2.9e-05, Loss: 422.9
323 | 2022-07-22 23:22:18,985 [INFO] Epoch [21/60], Step [700/931], LR 2.9e-05, Loss: 423.4
324 | 2022-07-22 23:24:00,437 [INFO] Epoch [21/60], Step [800/931], LR 2.9e-05, Loss: 412.2
325 | 2022-07-22 23:25:42,140 [INFO] Epoch [21/60], Step [900/931], LR 2.9e-05, Loss: 442.7
326 | 2022-07-22 23:26:12,301 [INFO] Start validation...
327 | 2022-07-22 23:28:00,761 [INFO] mAP score regular 54.85, mAP score EMA 60.46
328 | 2022-07-22 23:28:00,779 [INFO] current_mAP = 60.46, highest_mAP = 62.04, best_epoch=15
329 |
330 | 2022-07-22 23:28:00,780 [INFO] Save text embeddings done
331 | 2022-07-22 23:28:05,091 [INFO] Epoch [22/60], Step [000/931], LR 2.9e-05, Loss: 423.0
332 | 2022-07-22 23:29:48,657 [INFO] Epoch [22/60], Step [100/931], LR 2.9e-05, Loss: 397.1
333 | 2022-07-22 23:31:29,919 [INFO] Epoch [22/60], Step [200/931], LR 2.9e-05, Loss: 405.3
334 | 2022-07-22 23:33:11,463 [INFO] Epoch [22/60], Step [300/931], LR 2.9e-05, Loss: 358.0
335 | 2022-07-22 23:34:52,871 [INFO] Epoch [22/60], Step [400/931], LR 2.9e-05, Loss: 402.1
336 | 2022-07-22 23:36:34,599 [INFO] Epoch [22/60], Step [500/931], LR 2.9e-05, Loss: 381.8
337 | 2022-07-22 23:38:16,141 [INFO] Epoch [22/60], Step [600/931], LR 2.9e-05, Loss: 408.6
338 | 2022-07-22 23:39:57,610 [INFO] Epoch [22/60], Step [700/931], LR 2.9e-05, Loss: 395.9
339 | 2022-07-22 23:41:39,738 [INFO] Epoch [22/60], Step [800/931], LR 2.9e-05, Loss: 431.5
340 | 2022-07-22 23:43:21,677 [INFO] Epoch [22/60], Step [900/931], LR 2.9e-05, Loss: 426.9
341 | 2022-07-22 23:43:52,092 [INFO] Start validation...
342 | 2022-07-22 23:45:38,008 [INFO] mAP score regular 55.11, mAP score EMA 60.24
343 | 2022-07-22 23:45:38,023 [INFO] current_mAP = 60.24, highest_mAP = 62.04, best_epoch=15
344 |
345 | 2022-07-22 23:45:38,023 [INFO] Save text embeddings done
346 | 2022-07-22 23:45:44,623 [INFO] Epoch [23/60], Step [000/931], LR 2.9e-05, Loss: 429.3
347 | 2022-07-22 23:47:25,950 [INFO] Epoch [23/60], Step [100/931], LR 2.9e-05, Loss: 382.3
348 | 2022-07-22 23:49:07,550 [INFO] Epoch [23/60], Step [200/931], LR 2.9e-05, Loss: 414.0
349 | 2022-07-22 23:50:48,862 [INFO] Epoch [23/60], Step [300/931], LR 2.9e-05, Loss: 391.0
350 | 2022-07-22 23:52:30,438 [INFO] Epoch [23/60], Step [400/931], LR 2.9e-05, Loss: 392.5
351 | 2022-07-22 23:54:12,120 [INFO] Epoch [23/60], Step [500/931], LR 2.9e-05, Loss: 367.7
352 | 2022-07-22 23:55:53,637 [INFO] Epoch [23/60], Step [600/931], LR 2.9e-05, Loss: 403.6
353 | 2022-07-22 23:57:35,572 [INFO] Epoch [23/60], Step [700/931], LR 2.9e-05, Loss: 417.2
354 | 2022-07-22 23:59:18,266 [INFO] Epoch [23/60], Step [800/931], LR 2.9e-05, Loss: 392.9
355 | 2022-07-23 00:01:00,770 [INFO] Epoch [23/60], Step [900/931], LR 2.9e-05, Loss: 420.5
356 | 2022-07-23 00:01:30,815 [INFO] Start validation...
357 | 2022-07-23 00:03:16,200 [INFO] mAP score regular 54.67, mAP score EMA 59.98
358 | 2022-07-23 00:03:16,214 [INFO] current_mAP = 59.98, highest_mAP = 62.04, best_epoch=15
359 |
360 | 2022-07-23 00:03:16,215 [INFO] Save text embeddings done
361 | 2022-07-23 00:03:22,167 [INFO] Epoch [24/60], Step [000/931], LR 2.9e-05, Loss: 392.7
362 | 2022-07-23 00:05:03,694 [INFO] Epoch [24/60], Step [100/931], LR 2.9e-05, Loss: 382.0
363 | 2022-07-23 00:06:45,183 [INFO] Epoch [24/60], Step [200/931], LR 2.9e-05, Loss: 398.3
364 | 2022-07-23 00:08:26,419 [INFO] Epoch [24/60], Step [300/931], LR 2.9e-05, Loss: 405.1
365 | 2022-07-23 00:10:07,741 [INFO] Epoch [24/60], Step [400/931], LR 2.9e-05, Loss: 375.0
366 | 2022-07-23 00:11:49,190 [INFO] Epoch [24/60], Step [500/931], LR 2.9e-05, Loss: 422.6
367 | 2022-07-23 00:13:31,083 [INFO] Epoch [24/60], Step [600/931], LR 2.9e-05, Loss: 385.8
368 | 2022-07-23 00:15:12,802 [INFO] Epoch [24/60], Step [700/931], LR 2.9e-05, Loss: 405.0
369 | 2022-07-23 00:16:54,887 [INFO] Epoch [24/60], Step [800/931], LR 2.9e-05, Loss: 383.2
370 |
--------------------------------------------------------------------------------
/logs/scpnet+voc.txt:
--------------------------------------------------------------------------------
1 | 2022-07-23 11:14:06,053 [INFO] Epoch [0/120], Step [000/045], LR 1.6e-06, Loss: 3571.0
2 | 2022-07-23 11:14:49,796 [INFO] Start validation...
3 | 2022-07-23 11:15:03,834 [INFO] mAP score regular 35.93, mAP score EMA 73.83
4 | 2022-07-23 11:15:05,482 [INFO] current_mAP = 73.83, highest_mAP = 73.83, best_epoch=0
5 |
6 | 2022-07-23 11:15:05,482 [INFO] Save text embeddings done
7 | 2022-07-23 11:15:12,790 [INFO] Epoch [1/120], Step [000/045], LR 1.8e-06, Loss: 894.1
8 | 2022-07-23 11:15:57,073 [INFO] Start validation...
9 | 2022-07-23 11:16:10,915 [INFO] mAP score regular 43.35, mAP score EMA 71.88
10 | 2022-07-23 11:16:10,925 [INFO] current_mAP = 71.88, highest_mAP = 73.83, best_epoch=0
11 |
12 | 2022-07-23 11:16:10,925 [INFO] Save text embeddings done
13 | 2022-07-23 11:16:18,474 [INFO] Epoch [2/120], Step [000/045], LR 2.3e-06, Loss: 540.6
14 | 2022-07-23 11:17:01,149 [INFO] Start validation...
15 | 2022-07-23 11:17:14,754 [INFO] mAP score regular 57.53, mAP score EMA 69.09
16 | 2022-07-23 11:17:14,769 [INFO] current_mAP = 69.09, highest_mAP = 73.83, best_epoch=0
17 |
18 | 2022-07-23 11:17:14,769 [INFO] Save text embeddings done
19 | 2022-07-23 11:17:20,126 [INFO] Epoch [3/120], Step [000/045], LR 3.1e-06, Loss: 462.6
20 | 2022-07-23 11:18:04,651 [INFO] Start validation...
21 | 2022-07-23 11:18:18,585 [INFO] mAP score regular 72.38, mAP score EMA 68.07
22 | 2022-07-23 11:18:18,602 [INFO] current_mAP = 72.38, highest_mAP = 73.83, best_epoch=0
23 |
24 | 2022-07-23 11:18:18,602 [INFO] Save text embeddings done
25 | 2022-07-23 11:18:26,176 [INFO] Epoch [4/120], Step [000/045], LR 4.2e-06, Loss: 482.1
26 | 2022-07-23 11:19:09,089 [INFO] Start validation...
27 | 2022-07-23 11:19:22,748 [INFO] mAP score regular 79.14, mAP score EMA 69.34
28 | 2022-07-23 11:19:29,342 [INFO] current_mAP = 79.14, highest_mAP = 79.14, best_epoch=4
29 |
30 | 2022-07-23 11:19:29,343 [INFO] Save text embeddings done
31 | 2022-07-23 11:19:37,442 [INFO] Epoch [5/120], Step [000/045], LR 5.6e-06, Loss: 415.2
32 | 2022-07-23 11:20:20,641 [INFO] Start validation...
33 | 2022-07-23 11:20:34,293 [INFO] mAP score regular 82.83, mAP score EMA 72.48
34 | 2022-07-23 11:20:41,011 [INFO] current_mAP = 82.83, highest_mAP = 82.83, best_epoch=5
35 |
36 | 2022-07-23 11:20:41,012 [INFO] Save text embeddings done
37 | 2022-07-23 11:20:47,217 [INFO] Epoch [6/120], Step [000/045], LR 7.3e-06, Loss: 442.0
38 | 2022-07-23 11:21:30,075 [INFO] Start validation...
39 | 2022-07-23 11:21:43,656 [INFO] mAP score regular 85.28, mAP score EMA 76.40
40 | 2022-07-23 11:21:50,637 [INFO] current_mAP = 85.28, highest_mAP = 85.28, best_epoch=6
41 |
42 | 2022-07-23 11:21:50,638 [INFO] Save text embeddings done
43 | 2022-07-23 11:21:56,390 [INFO] Epoch [7/120], Step [000/045], LR 9.2e-06, Loss: 434.3
44 | 2022-07-23 11:22:40,778 [INFO] Start validation...
45 | 2022-07-23 11:22:54,413 [INFO] mAP score regular 86.72, mAP score EMA 80.10
46 | 2022-07-23 11:23:00,835 [INFO] current_mAP = 86.72, highest_mAP = 86.72, best_epoch=7
47 |
48 | 2022-07-23 11:23:00,835 [INFO] Save text embeddings done
49 | 2022-07-23 11:23:07,409 [INFO] Epoch [8/120], Step [000/045], LR 1.1e-05, Loss: 355.7
50 | 2022-07-23 11:23:50,891 [INFO] Start validation...
51 | 2022-07-23 11:24:04,527 [INFO] mAP score regular 88.37, mAP score EMA 83.00
52 | 2022-07-23 11:24:10,715 [INFO] current_mAP = 88.37, highest_mAP = 88.37, best_epoch=8
53 |
54 | 2022-07-23 11:24:10,716 [INFO] Save text embeddings done
55 | 2022-07-23 11:24:18,045 [INFO] Epoch [9/120], Step [000/045], LR 1.4e-05, Loss: 345.8
56 | 2022-07-23 11:25:01,379 [INFO] Start validation...
57 | 2022-07-23 11:25:15,006 [INFO] mAP score regular 88.61, mAP score EMA 85.20
58 | 2022-07-23 11:25:21,417 [INFO] current_mAP = 88.61, highest_mAP = 88.61, best_epoch=9
59 |
60 | 2022-07-23 11:25:21,418 [INFO] Save text embeddings done
61 | 2022-07-23 11:25:29,395 [INFO] Epoch [10/120], Step [000/045], LR 1.6e-05, Loss: 365.9
62 | 2022-07-23 11:26:12,363 [INFO] Start validation...
63 | 2022-07-23 11:26:25,895 [INFO] mAP score regular 89.47, mAP score EMA 86.74
64 | 2022-07-23 11:26:32,584 [INFO] current_mAP = 89.47, highest_mAP = 89.47, best_epoch=10
65 |
66 | 2022-07-23 11:26:32,584 [INFO] Save text embeddings done
67 | 2022-07-23 11:26:38,868 [INFO] Epoch [11/120], Step [000/045], LR 1.8e-05, Loss: 367.3
68 | 2022-07-23 11:27:22,388 [INFO] Start validation...
69 | 2022-07-23 11:27:36,294 [INFO] mAP score regular 89.78, mAP score EMA 87.93
70 | 2022-07-23 11:27:42,804 [INFO] current_mAP = 89.78, highest_mAP = 89.78, best_epoch=11
71 |
72 | 2022-07-23 11:27:42,805 [INFO] Save text embeddings done
73 | 2022-07-23 11:27:50,380 [INFO] Epoch [12/120], Step [000/045], LR 2.1e-05, Loss: 334.9
74 | 2022-07-23 11:28:33,685 [INFO] Start validation...
75 | 2022-07-23 11:28:47,273 [INFO] mAP score regular 89.98, mAP score EMA 88.76
76 | 2022-07-23 11:28:58,709 [INFO] current_mAP = 89.98, highest_mAP = 89.98, best_epoch=12
77 |
78 | 2022-07-23 11:28:58,709 [INFO] Save text embeddings done
79 | 2022-07-23 11:29:06,397 [INFO] Epoch [13/120], Step [000/045], LR 2.3e-05, Loss: 331.9
80 | 2022-07-23 11:29:49,412 [INFO] Start validation...
81 | 2022-07-23 11:30:03,142 [INFO] mAP score regular 90.17, mAP score EMA 89.42
82 | 2022-07-23 11:30:09,868 [INFO] current_mAP = 90.17, highest_mAP = 90.17, best_epoch=13
83 |
84 | 2022-07-23 11:30:09,869 [INFO] Save text embeddings done
85 | 2022-07-23 11:30:15,575 [INFO] Epoch [14/120], Step [000/045], LR 2.6e-05, Loss: 347.5
86 | 2022-07-23 11:30:58,531 [INFO] Start validation...
87 | 2022-07-23 11:31:12,207 [INFO] mAP score regular 90.07, mAP score EMA 89.97
88 | 2022-07-23 11:31:12,222 [INFO] current_mAP = 90.07, highest_mAP = 90.17, best_epoch=13
89 |
90 | 2022-07-23 11:31:12,222 [INFO] Save text embeddings done
91 | 2022-07-23 11:31:20,367 [INFO] Epoch [15/120], Step [000/045], LR 2.8e-05, Loss: 332.7
92 | 2022-07-23 11:32:03,296 [INFO] Start validation...
93 | 2022-07-23 11:32:16,901 [INFO] mAP score regular 90.44, mAP score EMA 90.36
94 | 2022-07-23 11:32:23,289 [INFO] current_mAP = 90.44, highest_mAP = 90.44, best_epoch=15
95 |
96 | 2022-07-23 11:32:23,290 [INFO] Save text embeddings done
97 | 2022-07-23 11:32:29,544 [INFO] Epoch [16/120], Step [000/045], LR 3.0e-05, Loss: 285.7
98 | 2022-07-23 11:33:12,549 [INFO] Start validation...
99 | 2022-07-23 11:33:26,250 [INFO] mAP score regular 89.82, mAP score EMA 90.65
100 | 2022-07-23 11:33:32,635 [INFO] current_mAP = 90.65, highest_mAP = 90.65, best_epoch=16
101 |
102 | 2022-07-23 11:33:32,636 [INFO] Save text embeddings done
103 | 2022-07-23 11:33:39,803 [INFO] Epoch [17/120], Step [000/045], LR 3.3e-05, Loss: 314.5
104 | 2022-07-23 11:34:23,025 [INFO] Start validation...
105 | 2022-07-23 11:34:36,961 [INFO] mAP score regular 90.27, mAP score EMA 90.83
106 | 2022-07-23 11:34:42,962 [INFO] current_mAP = 90.83, highest_mAP = 90.83, best_epoch=17
107 |
108 | 2022-07-23 11:34:43,351 [INFO] Save text embeddings done
109 | 2022-07-23 11:34:51,397 [INFO] Epoch [18/120], Step [000/045], LR 3.4e-05, Loss: 340.0
110 | 2022-07-23 11:35:34,678 [INFO] Start validation...
111 | 2022-07-23 11:35:48,436 [INFO] mAP score regular 89.55, mAP score EMA 90.96
112 | 2022-07-23 11:35:54,681 [INFO] current_mAP = 90.96, highest_mAP = 90.96, best_epoch=18
113 |
114 | 2022-07-23 11:35:54,681 [INFO] Save text embeddings done
115 | 2022-07-23 11:36:00,680 [INFO] Epoch [19/120], Step [000/045], LR 3.6e-05, Loss: 315.2
116 | 2022-07-23 11:36:44,151 [INFO] Start validation...
117 | 2022-07-23 11:36:58,147 [INFO] mAP score regular 89.19, mAP score EMA 91.04
118 | 2022-07-23 11:37:04,608 [INFO] current_mAP = 91.04, highest_mAP = 91.04, best_epoch=19
119 |
120 | 2022-07-23 11:37:04,609 [INFO] Save text embeddings done
121 | 2022-07-23 11:37:10,919 [INFO] Epoch [20/120], Step [000/045], LR 3.7e-05, Loss: 338.2
122 | 2022-07-23 11:37:53,949 [INFO] Start validation...
123 | 2022-07-23 11:38:07,863 [INFO] mAP score regular 89.12, mAP score EMA 91.13
124 | 2022-07-23 11:38:14,331 [INFO] current_mAP = 91.13, highest_mAP = 91.13, best_epoch=20
125 |
126 | 2022-07-23 11:38:14,332 [INFO] Save text embeddings done
127 | 2022-07-23 11:38:22,259 [INFO] Epoch [21/120], Step [000/045], LR 3.9e-05, Loss: 343.8
128 | 2022-07-23 11:39:05,336 [INFO] Start validation...
129 | 2022-07-23 11:39:19,372 [INFO] mAP score regular 88.68, mAP score EMA 91.15
130 | 2022-07-23 11:39:25,583 [INFO] current_mAP = 91.15, highest_mAP = 91.15, best_epoch=21
131 |
132 | 2022-07-23 11:39:25,584 [INFO] Save text embeddings done
133 | 2022-07-23 11:39:32,159 [INFO] Epoch [22/120], Step [000/045], LR 3.9e-05, Loss: 325.2
134 | 2022-07-23 11:40:15,595 [INFO] Start validation...
135 | 2022-07-23 11:40:29,531 [INFO] mAP score regular 88.36, mAP score EMA 91.16
136 | 2022-07-23 11:40:35,989 [INFO] current_mAP = 91.16, highest_mAP = 91.16, best_epoch=22
137 |
138 | 2022-07-23 11:40:35,990 [INFO] Save text embeddings done
139 | 2022-07-23 11:40:41,879 [INFO] Epoch [23/120], Step [000/045], LR 4.0e-05, Loss: 329.5
140 | 2022-07-23 11:41:25,363 [INFO] Start validation...
141 | 2022-07-23 11:41:39,352 [INFO] mAP score regular 88.45, mAP score EMA 91.09
142 | 2022-07-23 11:41:39,363 [INFO] current_mAP = 91.09, highest_mAP = 91.16, best_epoch=22
143 |
144 | 2022-07-23 11:41:39,363 [INFO] Save text embeddings done
145 | 2022-07-23 11:41:46,142 [INFO] Epoch [24/120], Step [000/045], LR 4.0e-05, Loss: 295.2
146 | 2022-07-23 11:42:29,380 [INFO] Start validation...
147 | 2022-07-23 11:42:43,251 [INFO] mAP score regular 87.28, mAP score EMA 90.97
148 | 2022-07-23 11:42:43,274 [INFO] current_mAP = 90.97, highest_mAP = 91.16, best_epoch=22
149 |
150 | 2022-07-23 11:42:43,274 [INFO] Save text embeddings done
151 | 2022-07-23 11:42:51,054 [INFO] Epoch [25/120], Step [000/045], LR 4.0e-05, Loss: 310.6
152 | 2022-07-23 11:43:34,602 [INFO] Start validation...
153 | 2022-07-23 11:43:48,753 [INFO] mAP score regular 88.02, mAP score EMA 90.86
154 | 2022-07-23 11:43:48,774 [INFO] current_mAP = 90.86, highest_mAP = 91.16, best_epoch=22
155 |
156 | 2022-07-23 11:43:48,774 [INFO] Save text embeddings done
157 | 2022-07-23 11:43:55,623 [INFO] Epoch [26/120], Step [000/045], LR 4.0e-05, Loss: 285.2
158 | 2022-07-23 11:44:39,114 [INFO] Start validation...
159 | 2022-07-23 11:44:52,902 [INFO] mAP score regular 87.06, mAP score EMA 90.75
160 | 2022-07-23 11:44:52,914 [INFO] current_mAP = 90.75, highest_mAP = 91.16, best_epoch=22
161 |
162 | 2022-07-23 11:44:52,914 [INFO] Save text embeddings done
163 | 2022-07-23 11:44:59,368 [INFO] Epoch [27/120], Step [000/045], LR 4.0e-05, Loss: 294.0
164 |
--------------------------------------------------------------------------------
/loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 |
6 | class SPLC(nn.Module):
7 |     r"""SPLC loss as described in the paper "Simple Loss Design for Multi-Label Learning with Missing Labels".
8 |
9 |     .. math::
10 |         &L_{SPLC}^+ = loss^+(p) \\
11 |         &L_{SPLC}^- = \mathbb{I}(p \leq \tau) loss^-(p) + (1 - \mathbb{I}(p \leq \tau)) loss^+(p)
12 |
13 |     where :math:`\tau` is the threshold used to identify missing labels,
14 |     :math:`\mathbb{I}(\cdot) \in \{0, 1\}` is the indicator function, and
15 |     :math:`loss^+(\cdot), loss^-(\cdot)` are the loss functions for positives and negatives, respectively.
16 |
17 |     .. note::
18 |         SPLC can be combined with various multi-label loss functions.
19 |         SPLC performs best combined with Focal margin loss in our paper; the code of SPLC with Focal margin loss is released here.
20 |         Since the first epoch already recalls a few missing labels with high precision, SPLC can be used after the first epoch.
21 |         The sigmoid is applied inside the loss.
22 |
23 | Args:
24 | tau (float): threshold value. Default: 0.6
25 |         change_epoch (int): epoch from which the SPLC label correction is applied. Default: 1
26 |         margin (float): Margin value. Default: 1
27 |         gamma (float): Hard mining value. Default: 2
28 |     .. note::
29 |         A ``reduction`` argument is not exposed; :meth:`forward` always
30 |         returns the summed loss (``loss.sum()``), matching the behaviour
31 |         of ``reduction='sum'``.
32 |
33 | """
34 |
35 | def __init__(
36 | self,
37 | tau: float = 0.6,
38 | change_epoch: int = 1,
39 | margin: float = 1.0,
40 | gamma: float = 2.0,
41 | ) -> None:
42 | super(SPLC, self).__init__()
43 | self.tau = tau
44 | self.change_epoch = change_epoch
45 | self.margin = margin
46 | self.gamma = gamma
47 |
48 | def forward(self, logits: torch.Tensor, targets: torch.Tensor,
49 |                 epoch: int):
50 |         """
51 |         Compute the SPLC loss and return the (possibly corrected) targets.
52 |
53 |         Args:
54 |             logits : The predicted logits before sigmoid with shape of :math:`(N, C)`
55 |             targets : Multi-label binarized vector with shape of :math:`(N, C)`
56 |             epoch : The current training epoch.
57 |
58 |         Returns:
59 |             (torch.Tensor, torch.Tensor): the summed loss and the corrected targets
60 | """
61 | # Subtract margin for positive logits
62 | logits = torch.where(targets == 1, logits - self.margin, logits)
63 |
64 | # SPLC missing label correction
65 | if epoch >= self.change_epoch:
66 | targets = torch.where(
67 | torch.sigmoid(logits) > self.tau,
68 | torch.tensor(1).cuda(), targets)
69 |
70 | pred = torch.sigmoid(logits)
71 |
72 |         # Focal margin for positive loss
73 | pt = (1 - pred) * targets + pred * (1 - targets)
74 | focal_weight = pt**self.gamma
75 |
76 | los_pos = targets * F.logsigmoid(logits)
77 | los_neg = (1 - targets) * F.logsigmoid(-logits)
78 |
79 | loss = -(los_pos + los_neg)
80 | loss *= focal_weight
81 |
82 | return loss.sum(), targets
83 |
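84 | if __name__ == "__main__":
85 |     # Minimal usage sketch (illustrative only; not part of the training
86 |     # flow). Shapes are made up: a batch of 4 samples over 10 classes.
87 |     # epoch=0 stays below change_epoch, so the example runs on CPU.
88 |     torch.manual_seed(0)
89 |     criterion = SPLC(tau=0.6, change_epoch=1, margin=1.0, gamma=2.0)
90 |     logits = torch.randn(4, 10)
91 |     targets = torch.randint(0, 2, (4, 10))
92 |     loss, corrected = criterion(logits, targets, epoch=0)
93 |     print(loss.item(), corrected.shape)  # scalar loss, torch.Size([4, 10])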
--------------------------------------------------------------------------------
/model.py:
--------------------------------------------------------------------------------
1 | from collections import OrderedDict
2 |
3 | import torch
4 | import torch.nn as nn
5 | from torch.nn import functional as F
6 |
7 | from clip import clip
8 | from clip.simple_tokenizer import SimpleTokenizer as _Tokenizer
9 | from config import cfg
10 | from log import logger
11 |
12 | _tokenizer = _Tokenizer()
13 |
14 | def load_clip_to_cpu():
15 | backbone_name = 'RN50'
16 | url = clip._MODELS[backbone_name]
17 | model_path = clip._download(url)
18 |
19 | try:
20 | # loading JIT archive
21 | model = torch.jit.load( # type: ignore
22 | model_path, map_location="cpu").eval()
23 | state_dict = None
24 |
25 | except RuntimeError:
26 | state_dict = torch.load(model_path, map_location="cpu")
27 |
28 | model = clip.build_model(state_dict or model.state_dict()) # type: ignore
29 |
30 | return model
31 |
32 |
33 | class TextEncoder(nn.Module):
34 |
35 | def __init__(self, clip_model):
36 | super().__init__()
37 | self.transformer = clip_model.transformer
38 | self.positional_embedding = clip_model.positional_embedding
39 | self.ln_final = clip_model.ln_final
40 | self.text_projection = clip_model.text_projection
41 | self.dtype = clip_model.dtype
42 |
43 | def forward(self, prompts, tokenized_prompts):
44 | x = prompts + self.positional_embedding.type(self.dtype)
45 | x = x.permute(1, 0, 2) # NLD -> LND
46 | x = self.transformer(x)
47 | x = x.permute(1, 0, 2) # LND -> NLD
48 | x = self.ln_final(x).type(self.dtype)
49 |
50 | # x.shape = [batch_size, n_ctx, transformer.width]
51 | # take features from the eot embedding (eot_token is the highest number in each sequence)
52 | x = x[torch.arange(x.shape[0]),
53 | tokenized_prompts.argmax(dim=-1)] @ self.text_projection
54 |
55 | return x
56 |
57 |
58 | class PromptLearner(nn.Module):
59 |
60 | def __init__(self, classnames, clip_model):
61 | super().__init__()
62 | n_cls = len(classnames)
63 | n_ctx = cfg.n_ctx
64 | dtype = clip_model.dtype
65 | clip_imsize = clip_model.visual.input_resolution
66 | cfg_imsize = 224
67 |         assert cfg_imsize == clip_imsize, f"cfg_imsize ({cfg_imsize}) must equal clip_imsize ({clip_imsize})"
68 |
69 | # use given words to initialize context vectors
70 | ctx_init = cfg.ctx_init.replace("_", " ")
71 | assert (n_ctx == len(ctx_init.split(" ")))
72 | prompt = clip.tokenize(ctx_init)
73 | with torch.no_grad():
74 | embedding = clip_model.token_embedding(prompt).type(dtype)
75 | ctx_vectors = embedding[0, 1:1 + n_ctx, :]
76 | prompt_prefix = ctx_init
77 |
78 | self.ctx = nn.Parameter(ctx_vectors) # type: ignore
79 | classnames = [name.replace("_", " ") for name in classnames]
80 | name_lens = [len(_tokenizer.encode(name)) for name in classnames]
81 | prompts = [prompt_prefix + " " + name + "." for name in classnames]
82 |
83 | tokenized_prompts = torch.cat([clip.tokenize(p) for p in prompts])
84 | with torch.no_grad():
85 | embedding = clip_model.token_embedding(tokenized_prompts).type(
86 | dtype)
87 |
88 |         # These token vectors will be saved in save_model(),
89 | # but they should be ignored in load_model() as we want to use
90 | # those computed using the current class names
91 | self.register_buffer("token_prefix", embedding[:, :1, :]) # SOS
92 | self.register_buffer("token_suffix",
93 | embedding[:, 1 + n_ctx:, :]) # CLS, EOS
94 | self.register_buffer("token_middle", embedding[:, 1:(1 + n_ctx), :])
95 | self.n_cls = n_cls
96 | self.n_ctx = n_ctx
97 | self.tokenized_prompts = tokenized_prompts # torch.Tensor
98 | self.name_lens = name_lens
99 |
100 | def forward(self):
101 | ctx = self.ctx
102 | if ctx.dim() == 2:
103 | ctx = ctx.unsqueeze(0).expand(self.n_cls, -1, -1)
104 | prefix = self.token_prefix
105 | suffix = self.token_suffix
106 |
107 | prompts = torch.cat(
108 | [
109 | prefix, # (n_cls, 1, dim)
110 | ctx, # (n_cls, n_ctx, dim)
111 | suffix, # (n_cls, *, dim)
112 | ], # type: ignore
113 | dim=1,
114 | )
115 | return prompts
116 |
117 | def load_clip_model():
118 | clip_model = load_clip_to_cpu()
119 |
120 | # CLIP's default precision is fp16
121 | clip_model.float()
122 | return clip_model, clip._transform(clip_model.visual.input_resolution)
123 |
124 | import math
125 | import numpy as np
126 | class GraphConvolution(nn.Module):
127 | """
128 | Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
129 | """
130 |
131 | def __init__(self, in_features, out_features, bias=False):
132 | super(GraphConvolution, self).__init__()
133 | self.in_features = in_features
134 | self.out_features = out_features
135 | self.weight = nn.parameter.Parameter(torch.Tensor(in_features, out_features))
136 | if bias:
137 | self.bias = nn.parameter.Parameter(torch.Tensor(1, 1, out_features))
138 | else:
139 | self.register_parameter('bias', None)
140 | self.reset_parameters()
141 |
142 | def reset_parameters(self):
143 | stdv = 1. / math.sqrt(self.weight.size(1))
144 | self.weight.data.uniform_(-stdv, stdv)
145 | if self.bias is not None:
146 | self.bias.data.uniform_(-stdv, stdv)
147 |
148 | def forward(self, input, adj):
149 | support = torch.matmul(input, self.weight)
150 | output = torch.matmul(adj, support)
151 | if self.bias is not None:
152 | return output + self.bias
153 | else:
154 | return output
155 |
156 | def __repr__(self):
157 | return self.__class__.__name__ + ' (' \
158 | + str(self.in_features) + ' -> ' \
159 | + str(self.out_features) + ')'
160 |
161 | from timm.models.vision_transformer import resize_pos_embed  # noqa: F401 (imported but unused in this file)
162 | class SCPNet(nn.Module):
163 |
164 | def __init__(self, classnames, clip_model):
165 | super().__init__()
166 | self.prompt_learner = PromptLearner(classnames, clip_model)
167 | self.tokenized_prompts = self.prompt_learner.tokenized_prompts
168 | self.image_encoder = clip_model.visual
169 | self.text_encoder = TextEncoder(clip_model)
170 | self.logit_scale = clip_model.logit_scale
171 | self.dtype = clip_model.dtype
172 |
173 | self.gc1 = GraphConvolution(1024, 2048)
174 | self.gc2 = GraphConvolution(2048, 2048)
175 | self.gc3 = GraphConvolution(2048, 1024)
176 | self.relu = nn.LeakyReLU(0.2)
177 | self.relu2 = nn.LeakyReLU(0.2)
178 |
179 | self.relation = torch.Tensor(np.load(cfg.relation_file))
180 |
181 |         _, max_idx = torch.topk(self.relation, cfg.sparse_topk)
182 |         mask = torch.ones_like(self.relation).type(torch.bool)
183 |         for i, idx in enumerate(max_idx):
184 |             mask[i][idx] = 0
185 |         self.relation[mask] = 0
186 |         sparse_mask = mask
187 |         diag = torch.eye(cfg.num_classes).type(torch.bool)  # self-relation (diagonal) mask
188 |         self.relation[diag] = 0
189 |         self.relation = self.relation / torch.sum(self.relation, dim=1).reshape(-1, 1) * cfg.reweight_p
190 |         self.relation[diag] = 1 - cfg.reweight_p
191 |
192 | self.gcn_relation = self.relation.clone()
193 |         assert not self.gcn_relation.requires_grad
194 | self.relation = torch.exp(self.relation/cfg.T) / torch.sum(torch.exp(self.relation/cfg.T), dim=1).reshape(-1,1)
195 | self.relation[sparse_mask] = 0
196 | self.relation = self.relation / torch.sum(self.relation, dim=1).reshape(-1, 1)
197 |
198 | def forward(self, image):
199 | tokenized_prompts = self.tokenized_prompts
200 | image_features = self.image_encoder(image.type(self.dtype))
201 | image_features = image_features / image_features.norm(dim=-1,
202 | keepdim=True)
203 | logit_scale = self.logit_scale.exp()
204 | if cfg.scale != 'clip':
205 |             assert isinstance(cfg.scale, int)
206 | logit_scale = cfg.scale
207 | prompts = self.prompt_learner()
208 | text_features = self.text_encoder(prompts, tokenized_prompts)
209 | identity = text_features
210 |
211 | text_features = self.gc1(text_features, self.gcn_relation.cuda())
212 | text_features = self.relu(text_features)
213 | text_features = self.gc2(text_features, self.gcn_relation.cuda())
214 | text_features = self.relu2(text_features)
215 | text_features = self.gc3(text_features, self.gcn_relation.cuda())
216 |
217 | text_features += identity
218 | text_features = text_features / text_features.norm(dim=-1,
219 | keepdim=True)
220 | logits = logit_scale * image_features @ text_features.t()
221 | return logits
222 |
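223 | if __name__ == "__main__":
224 |     # Minimal sketch of the GCN propagation used by SCPNet (illustrative
225 |     # only; assumes the repo's config/CLIP imports above succeed).
226 |     # One layer computes adj @ (input @ weight): text features are mixed
227 |     # along the label relation graph.
228 |     torch.manual_seed(0)
229 |     gc = GraphConvolution(in_features=8, out_features=4)
230 |     x = torch.randn(5, 8)  # 5 toy classes with 8-dim features
231 |     adj = torch.full((5, 5), 0.05) + torch.eye(5) * 0.8  # diagonal-dominant toy relation
232 |     print(gc(x, adj).shape)  # torch.Size([5, 4])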
--------------------------------------------------------------------------------
/nuswide_labels.txt:
--------------------------------------------------------------------------------
1 | airport
2 | animal
3 | beach
4 | bear
5 | birds
6 | boats
7 | book
8 | bridge
9 | buildings
10 | cars
11 | castle
12 | cat
13 | cityscape
14 | clouds
15 | computer
16 | coral
17 | cow
18 | dancing
19 | dog
20 | earthquake
21 | elk
22 | fire
23 | fish
24 | flags
25 | flowers
26 | food
27 | fox
28 | frost
29 | garden
30 | glacier
31 | grass
32 | harbor
33 | horses
34 | house
35 | lake
36 | leaf
37 | map
38 | military
39 | moon
40 | mountain
41 | nighttime
42 | ocean
43 | person
44 | plane
45 | plants
46 | police
47 | protest
48 | railroad
49 | rainbow
50 | reflection
51 | road
52 | rocks
53 | running
54 | sand
55 | sign
56 | sky
57 | snow
58 | soccer
59 | sports
60 | statue
61 | street
62 | sun
63 | sunset
64 | surf
65 | swimmers
66 | tattoo
67 | temple
68 | tiger
69 | tower
70 | town
71 | toy
72 | train
73 | tree
74 | valley
75 | vehicle
76 | water
77 | waterfall
78 | wedding
79 | whales
80 | window
81 | zebra
--------------------------------------------------------------------------------
/randaugment.py:
--------------------------------------------------------------------------------
1 | # copyright: https://github.com/ildoonet/pytorch-randaugment
2 | # code in this file is adapted from rpmcruz/autoaugment
3 | # https://github.com/rpmcruz/autoaugment/blob/master/transformations.py
4 | # This is a modified version of ildoonet's code, for the random augmentation used in FixMatch.
5 |
6 | import random
7 |
8 | import numpy as np
9 | import PIL
10 | import PIL.ImageDraw
11 | import PIL.ImageEnhance
12 | import PIL.ImageOps
13 | import torch
14 | import torch.nn.functional as F
15 | from PIL import Image
16 |
17 |
18 | def AutoContrast(img, _):
19 | return PIL.ImageOps.autocontrast(img)
20 |
21 |
22 | def Brightness(img, v):
23 | assert v >= 0.0
24 | return PIL.ImageEnhance.Brightness(img).enhance(v)
25 |
26 |
27 | def Color(img, v):
28 | assert v >= 0.0
29 | return PIL.ImageEnhance.Color(img).enhance(v)
30 |
31 |
32 | def Contrast(img, v):
33 | assert v >= 0.0
34 | return PIL.ImageEnhance.Contrast(img).enhance(v)
35 |
36 |
37 | def Equalize(img, _):
38 | return PIL.ImageOps.equalize(img)
39 |
40 |
41 | def Invert(img, _):
42 | return PIL.ImageOps.invert(img)
43 |
44 |
45 | def Identity(img, v):
46 | return img
47 |
48 |
49 | def Posterize(img, v): # [4, 8]
50 | v = int(v)
51 | v = max(1, v)
52 | return PIL.ImageOps.posterize(img, v)
53 |
54 |
55 | def Rotate(img, v): # [-30, 30]
56 | #assert -30 <= v <= 30
57 | #if random.random() > 0.5:
58 | # v = -v
59 | return img.rotate(v)
60 |
61 |
62 | def Sharpness(img, v): # [0.1,1.9]
63 | assert v >= 0.0
64 | return PIL.ImageEnhance.Sharpness(img).enhance(v)
65 |
66 |
67 | def ShearX(img, v): # [-0.3, 0.3]
68 | #assert -0.3 <= v <= 0.3
69 | #if random.random() > 0.5:
70 | # v = -v
71 | return img.transform(img.size, PIL.Image.AFFINE, (1, v, 0, 0, 1, 0))
72 |
73 |
74 | def ShearY(img, v): # [-0.3, 0.3]
75 | #assert -0.3 <= v <= 0.3
76 | #if random.random() > 0.5:
77 | # v = -v
78 | return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, v, 1, 0))
79 |
80 |
81 | def TranslateX(img, v): # [-150, 150] => percentage: [-0.45, 0.45]
82 | #assert -0.3 <= v <= 0.3
83 | #if random.random() > 0.5:
84 | # v = -v
85 | v = v * img.size[0]
86 | return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0))
87 |
88 |
89 | def TranslateXabs(img, v):  # v in absolute pixels, unlike TranslateX which takes a fraction of the width
90 | #assert v >= 0.0
91 | #if random.random() > 0.5:
92 | # v = -v
93 | return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0))
94 |
95 |
96 | def TranslateY(img, v): # [-150, 150] => percentage: [-0.45, 0.45]
97 | #assert -0.3 <= v <= 0.3
98 | #if random.random() > 0.5:
99 | # v = -v
100 | v = v * img.size[1]
101 | return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v))
102 |
103 |
104 | def TranslateYabs(img, v):  # v in absolute pixels, unlike TranslateY which takes a fraction of the height
105 | #assert 0 <= v
106 | #if random.random() > 0.5:
107 | # v = -v
108 | return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v))
109 |
110 |
111 | def Solarize(img, v): # [0, 256]
112 | assert 0 <= v <= 256
113 | return PIL.ImageOps.solarize(img, v)
114 |
115 |
116 | def Cutout(img, v): #[0, 60] => percentage: [0, 0.2] => change to [0, 0.5]
117 | assert 0.0 <= v <= 0.5
118 | if v <= 0.:
119 | return img
120 |
121 | v = v * img.size[0]
122 | return CutoutAbs(img, v)
123 |
124 |
125 | def CutoutAbs(img, v): # [0, 60] => percentage: [0, 0.2]
126 | # assert 0 <= v <= 20
127 | if v < 0:
128 | return img
129 | w, h = img.size
130 | x0 = np.random.uniform(w)
131 | y0 = np.random.uniform(h)
132 |
133 | x0 = int(max(0, x0 - v / 2.))
134 | y0 = int(max(0, y0 - v / 2.))
135 | x1 = min(w, x0 + v)
136 | y1 = min(h, y0 + v)
137 |
138 | xy = (x0, y0, x1, y1)
139 | color = (125, 123, 114)
140 | # color = (0, 0, 0)
141 | img = img.copy()
142 | PIL.ImageDraw.Draw(img).rectangle(xy, color)
143 | return img
144 |
145 |
146 | def augment_list():
147 | l = [(AutoContrast, 0, 1), (Brightness, 0.05, 0.95), (Color, 0.05, 0.95),
148 | (Contrast, 0.05, 0.95), (Equalize, 0, 1), (Identity, 0, 1),
149 | (Posterize, 4, 8), (Rotate, -30, 30), (Sharpness, 0.05, 0.95),
150 | (ShearX, -0.3, 0.3), (ShearY, -0.3, 0.3), (Solarize, 0, 256),
151 | (TranslateX, -0.3, 0.3), (TranslateY, -0.3, 0.3)]
152 | return l
153 |
154 |
155 | class RandAugment:
156 |
157 | def __init__(self, n, m):
158 | self.n = n
159 | self.m = m # [0, 30] in fixmatch, deprecated.
160 | self.augment_list = augment_list()
161 |
162 | def __call__(self, img):
163 | ops = random.choices(self.augment_list, k=self.n)
164 | for op, min_val, max_val in ops:
165 | val = min_val + float(max_val - min_val) * random.random()
166 | img = op(img, val)
167 | cutout_val = random.random() * 0.5
168 |         img = Cutout(img, cutout_val)  # FixMatch-style cutout
169 | return img
170 |
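171 | if __name__ == "__main__":
172 |     # Minimal usage sketch (illustrative): apply n=3 randomly chosen ops
173 |     # at random magnitudes, plus the FixMatch-style Cutout, to a dummy
174 |     # grey image.
175 |     img = Image.new("RGB", (224, 224), color=(128, 128, 128))
176 |     aug = RandAugment(n=3, m=5)
177 |     print(aug(img).size)  # (224, 224)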
--------------------------------------------------------------------------------
/relation+coco.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/relation+coco.npy
--------------------------------------------------------------------------------
/relation+cub.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/relation+cub.npy
--------------------------------------------------------------------------------
/relation+nuswide.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/relation+nuswide.npy
--------------------------------------------------------------------------------
/relation+voc.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jameslahm/SCPNet/d56341acffcf95ede22d16806e2c043ff767782b/relation+voc.npy
--------------------------------------------------------------------------------
/scpnet.py:
--------------------------------------------------------------------------------
1 | import copy
2 | import os
3 |
4 | import numpy as np
5 | import torch
6 | from PIL import Image
7 | from torch.cuda.amp import autocast # type: ignore
8 | from torchvision import transforms
9 |
10 | from config import cfg
11 | from log import logger
12 | from model import SCPNet, load_clip_model
13 | from utils import COCO_missing_val_dataset, CocoDetection, ModelEma, get_ema_co
14 |
15 | from randaugment import RandAugment
16 |
17 |
18 | class WeakStrongDataset(torch.utils.data.Dataset): # type: ignore
19 |
20 | def __init__(self,
21 | root,
22 | annFile,
23 | transform,
24 | target_transform=None,
25 | class_num: int = -1):
26 | self.root = root
27 | with open(annFile, 'r') as f:
28 | names = f.readlines()
29 | self.name = names
30 | self.transform = transform
31 | self.class_num = class_num
32 | self.target_transform = target_transform
33 | self.strong_transform: transforms.Compose = copy.deepcopy(
34 | transform) # type: ignore
35 | self.strong_transform.transforms.insert(0,
36 | RandAugment(3,
37 | 5)) # type: ignore
38 |
39 | def __getitem__(self, index):
40 | name = self.name[index]
41 | path = name.strip('\n').split(',')[0]
42 | num = name.strip('\n').split(',')[1]
43 | num = num.strip(' ').split(' ')
44 | num = np.array([int(i) for i in num])
45 | label = np.zeros([self.class_num])
46 | label[num] = 1
47 | label = torch.tensor(label, dtype=torch.long)
48 | img = Image.open(os.path.join(self.root, path)).convert('RGB')
49 |
50 | img_w = self.transform(img)
51 |         # target_transform is not supported by this dataset; the label is
52 |         # built directly from the annotation file above.
53 |         assert self.target_transform is None
54 | return [index, img_w,
55 | self.transform(img),
56 | self.strong_transform(img)], label
57 |
58 | def __len__(self):
59 | return len(self.name)
60 |
61 |
62 | def build_weak_strong_dataset(train_preprocess,
63 | val_preprocess,
64 | pin_memory=True):
65 | if "coco" in cfg.data:
66 | return build_coco_weak_strong_dataset(train_preprocess, val_preprocess)
67 | elif "nuswide" in cfg.data:
68 | return build_nuswide_weak_strong_dataset(train_preprocess,
69 | val_preprocess)
70 | elif "voc" in cfg.data:
71 | return build_voc_weak_strong_dataset(train_preprocess, val_preprocess)
72 | elif "cub" in cfg.data:
73 | return build_cub_weak_strong_dataset(train_preprocess, val_preprocess)
74 |     else:
75 |         raise ValueError(f"Unsupported dataset in cfg.data: {cfg.data}")
76 |
77 |
78 | def build_coco_weak_strong_dataset(train_preprocess, val_preprocess):
79 |
80 | # COCO Data loading
81 | instances_path_val = os.path.join(cfg.data,
82 | 'annotations/instances_val2014.json')
83 | # instances_path_train = os.path.join(args.data, 'annotations/instances_train2014.json')
84 | instances_path_train = cfg.dataset
85 |
86 | data_path_val = f'{cfg.data}/val2014' # args.data
87 | data_path_train = f'{cfg.data}/train2014' # args.data
88 | val_dataset = CocoDetection(data_path_val, instances_path_val,
89 | val_preprocess)
90 | train_dataset = WeakStrongDataset(data_path_train,
91 | instances_path_train,
92 | train_preprocess,
93 | class_num=cfg.num_classes)
94 |
95 | # Pytorch Data loader
96 | train_loader = torch.utils.data.DataLoader( # type: ignore
97 | train_dataset,
98 | batch_size=cfg.batch_size,
99 | shuffle=True,
100 | num_workers=cfg.workers,
101 | pin_memory=True)
102 |
103 | val_loader = torch.utils.data.DataLoader( # type: ignore
104 | val_dataset,
105 | batch_size=cfg.batch_size,
106 | shuffle=False,
107 | num_workers=cfg.workers,
108 | pin_memory=False)
109 |
110 | return [train_loader, val_loader]
111 |
112 |
113 | def build_nuswide_weak_strong_dataset(train_preprocess, val_preprocess):
114 | # Nus_wide Data loading
115 | instances_path_train = cfg.train_dataset
116 | instances_path_val = cfg.val_dataset
117 |
118 | data_path_val = f'{cfg.data}images' # args.data
119 | data_path_train = f'{cfg.data}images' # args.data
120 |
121 | val_dataset = COCO_missing_val_dataset(data_path_val,
122 | instances_path_val,
123 | val_preprocess,
124 | class_num=cfg.num_classes)
125 | train_dataset = WeakStrongDataset(data_path_train,
126 | instances_path_train,
127 | train_preprocess,
128 | class_num=cfg.num_classes)
129 | # Pytorch Data loader
130 | train_loader = torch.utils.data.DataLoader(train_dataset,
131 | batch_size=cfg.batch_size,
132 | shuffle=True,
133 | num_workers=cfg.workers,
134 | pin_memory=True)
135 |
136 | val_loader = torch.utils.data.DataLoader(val_dataset,
137 | batch_size=cfg.batch_size,
138 | shuffle=False,
139 | num_workers=cfg.workers,
140 | pin_memory=False)
141 | return [train_loader, val_loader]
142 |
143 |
144 | def build_voc_weak_strong_dataset(train_preprocess, val_preprocess):
145 | # VOC Data loading
146 | instances_path_train = cfg.train_dataset
147 | instances_path_val = cfg.val_dataset
148 |
149 | data_path_val = f'{cfg.data}VOC2012/JPEGImages' # args.data
150 | data_path_train = f'{cfg.data}VOC2012/JPEGImages' # args.data
151 |
152 | val_dataset = COCO_missing_val_dataset(data_path_val,
153 | instances_path_val,
154 | val_preprocess,
155 | class_num=cfg.num_classes)
156 | train_dataset = WeakStrongDataset(data_path_train,
157 | instances_path_train,
158 | train_preprocess,
159 | class_num=cfg.num_classes)
160 | # Pytorch Data loader
161 | train_loader = torch.utils.data.DataLoader(train_dataset,
162 | batch_size=cfg.batch_size,
163 | shuffle=True,
164 | num_workers=cfg.workers,
165 | pin_memory=True)
166 |
167 | val_loader = torch.utils.data.DataLoader(val_dataset,
168 | batch_size=cfg.batch_size,
169 | shuffle=False,
170 | num_workers=cfg.workers,
171 | pin_memory=False)
172 | return [train_loader, val_loader]
173 |
174 |
175 | def build_cub_weak_strong_dataset(train_preprocess, val_preprocess):
176 | # CUB Data loading
177 | instances_path_train = cfg.train_dataset
178 | instances_path_val = cfg.val_dataset
179 |
180 | data_path_val = f'{cfg.data}CUB_200_2011/images' # args.data
181 | data_path_train = f'{cfg.data}CUB_200_2011/images' # args.data
182 |
183 | val_dataset = COCO_missing_val_dataset(data_path_val,
184 | instances_path_val,
185 | val_preprocess,
186 | class_num=cfg.num_classes)
187 | train_dataset = WeakStrongDataset(data_path_train,
188 | instances_path_train,
189 | train_preprocess,
190 | class_num=cfg.num_classes)
191 | # Pytorch Data loader
192 | train_loader = torch.utils.data.DataLoader(train_dataset,
193 | batch_size=cfg.batch_size,
194 | shuffle=True,
195 | num_workers=cfg.workers,
196 | pin_memory=True)
197 |
198 | val_loader = torch.utils.data.DataLoader(val_dataset,
199 | batch_size=cfg.batch_size,
200 | shuffle=False,
201 | num_workers=cfg.workers,
202 | pin_memory=False)
203 | return [train_loader, val_loader]
204 |
205 | class SCPNetTrainer():
206 |
207 | def __init__(self) -> None:
208 | super().__init__()
209 |
210 | clip_model, _ = load_clip_model()
211 | # image_size = clip_model.visual.input_resolution
212 | image_size = cfg.image_size
213 |
214 | normalize = transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
215 | (0.26862954, 0.26130258, 0.27577711))
216 |
217 | train_preprocess = transforms.Compose([
218 | transforms.RandomHorizontalFlip(),
219 | transforms.RandomResizedCrop(image_size),
220 | transforms.ToTensor(), normalize
221 | ])
222 | val_preprocess = transforms.Compose([
223 | transforms.Resize(image_size),
224 | transforms.CenterCrop(image_size),
225 | transforms.ToTensor(), normalize
226 | ])
227 |
228 | train_loader, val_loader = build_weak_strong_dataset(
229 | train_preprocess, # type: ignore
230 | val_preprocess)
231 | self.train_loader = train_loader
232 | self.val_loader = val_loader
233 |
234 | classnames = val_loader.dataset.labels()
235 | assert (len(classnames) == cfg.num_classes)
236 |
237 | self.model = SCPNet(classnames, clip_model)
238 | self.relation = self.model.relation
239 | self.classnames = classnames
240 | for name, param in self.model.named_parameters():
241 | if "text_encoder" in name:
242 | param.requires_grad_(False)
243 |
244 | self.model.cuda()
245 | ema_co = get_ema_co()
246 | self.ema = ModelEma(self.model, ema_co) # 0.9997^641=0.82
247 |
248 | self.selected_label = torch.zeros(
249 | (len(self.train_loader.dataset), cfg.num_classes),
250 | dtype=torch.long,
251 | )
252 | self.selected_label = self.selected_label.cuda()
253 | self.classwise_acc = torch.zeros((cfg.num_classes, )).cuda()
254 | self.classwise_acc[:] = 1/cfg.num_classes
255 |
256 | def consistency_loss(self, logits_s, logits_w, y_lb):
257 | logits_w = logits_w.detach()
258 |
259 | pseudo_label = torch.sigmoid(logits_w)
260 | pseudo_label_s = torch.sigmoid(logits_s)
261 |
262 | relation_p = pseudo_label @ self.relation.cuda().t()
263 |
264 | max_probs, max_idx = torch.topk(pseudo_label, cfg.hard_k, dim=-1)
265 |         threshold = cfg.p_cutoff * (self.classwise_acc[max_idx] /
266 |                                     (2. - self.classwise_acc[max_idx]))
267 |         mask = max_probs.ge(threshold).float().sum(dim=1) >= 1  # convex per-class threshold; keep samples with at least one confident class
268 | labels = torch.zeros((len(logits_s), cfg.num_classes),
269 | dtype=torch.long)
270 | for i, idx in enumerate(max_idx):
271 | labels[i][idx] = 1
272 | labels_mask = pseudo_label < cfg.p_cutoff * (
273 | self.classwise_acc / (2. - self.classwise_acc))
274 | labels[labels_mask] = 0
275 | labels = torch.logical_or(labels, y_lb.cpu()).type(torch.long)
276 | labels = labels.cuda()
277 | xs_pos = pseudo_label_s
278 | xs_neg = 1 - pseudo_label_s
279 | los_pos = labels * torch.log(xs_pos.clamp(min=1e-8))
280 | los_neg = (1 - labels) * torch.log(xs_neg.clamp(min=1e-8))
281 | loss = (los_pos + los_neg) * mask.reshape(-1, 1)
282 | loss_kl = (relation_p * torch.log(xs_pos.clamp(min=1e-8)) + (1 - relation_p) * torch.log(xs_neg.clamp(min=1e-8))) * mask.reshape(-1, 1)
283 | return -loss.sum() - cfg.kl_lambda * loss_kl.sum(), labels
284 |
285 | def train(self, input, target, criterion, epoch, epoch_i) -> torch.Tensor:
286 |
287 | x_ulb_idx, x_lb, x_ulb_w, x_ulb_s = input
288 | y_lb = target
289 |
290 | num_lb = x_lb.shape[0]
291 | num_ulb = x_ulb_w.shape[0]
292 | assert num_ulb == x_ulb_s.shape[0]
293 |
294 | x_lb, x_ulb_w, x_ulb_s = x_lb.cuda(), x_ulb_w.cuda(), x_ulb_s.cuda()
295 | x_ulb_idx = x_ulb_idx.cuda()
296 |
297 | pseudo_counter = self.selected_label.sum(dim=0)
298 | max_v = pseudo_counter.max().item()
299 | sum_v = pseudo_counter.sum().item()
300 |         if max_v >= 1:  # at least one pseudo-label has been selected so far
301 | for i in range(cfg.num_classes):
302 | self.classwise_acc[i] = max(pseudo_counter[i] / max(
303 | max_v,
304 | cfg.hard_k * len(self.selected_label) - sum_v), 1/cfg.num_classes)
305 |
306 | inputs = torch.cat((x_lb, x_ulb_w, x_ulb_s))
307 |
308 | # inference and calculate sup/unsup losses
309 | with autocast():
310 | logits = self.model(inputs)
311 | logits_x_lb = logits[:num_lb]
312 | logits_x_ulb_w, logits_x_ulb_s = logits[num_lb:].chunk(2)
313 | logits_x_lb = logits_x_lb.float()
314 | logits_x_ulb_w, logits_x_ulb_s = logits_x_ulb_w.float(
315 | ), logits_x_ulb_s.float()
316 |
317 | sup_loss, _ = criterion(logits_x_lb, y_lb, epoch)
318 |
319 | unsup_loss, labels = self.consistency_loss(logits_x_ulb_s,
320 | logits_x_ulb_w, y_lb)
321 |
322 | assert (labels is not None)
323 | select_mask = labels.sum(dim=1) >= 1
324 | if x_ulb_idx[select_mask].nelement() != 0:
325 | self.selected_label[
326 | x_ulb_idx[select_mask]] = labels[select_mask]
327 |
328 | total_loss = sup_loss + cfg.lambda_u * unsup_loss
329 |
330 | return total_loss
331 |
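332 | if __name__ == "__main__":
333 |     # Minimal sketch of the adaptive threshold used in consistency_loss
334 |     # (illustrative only; assumes the repo's config imports succeed).
335 |     # p_cutoff is scaled per class by acc / (2 - acc), so classes that
336 |     # have been selected less often get a lower confidence threshold.
337 |     p_cutoff = 0.95  # assumed value, for illustration only
338 |     classwise_acc = torch.tensor([0.1, 0.5, 1.0])
339 |     print(p_cutoff * classwise_acc / (2. - classwise_acc))
340 |     # -> tensor([0.0500, 0.3167, 0.9500])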
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | from typing import Tuple
4 |
5 | import torch
6 | from torch.cuda.amp import GradScaler, autocast # type: ignore
7 | from torch.optim import lr_scheduler
8 |
9 | from log import logger
10 | from loss import SPLC
11 | from scpnet import SCPNetTrainer
12 | from utils import AverageMeter, add_weight_decay, mAP
13 |
14 | from config import cfg # isort:skip
15 |
16 | def save_best(trainer, if_ema_better: bool) -> None:
17 | if if_ema_better:
18 | torch.save(trainer.ema.module.state_dict(),
19 | os.path.join(cfg.checkpoint, 'model-highest.ckpt'))
20 | else:
21 | torch.save(trainer.model.state_dict(),
22 | os.path.join(cfg.checkpoint, 'model-highest.ckpt'))
23 | torch.save(trainer.model.state_dict(),
24 | os.path.join(cfg.checkpoint, 'model-highest-regular.ckpt'))
25 | torch.save(trainer.ema.module.state_dict(),
26 | os.path.join(cfg.checkpoint, 'model-highest-ema.ckpt'))
27 |
28 | def validate(trainer, epoch: int) -> Tuple[float, bool]:
29 |
30 | trainer.model.eval()
31 | logger.info("Start validation...")
32 | sigmoid = torch.nn.Sigmoid()
33 | preds_regular = []
34 | preds_ema = []
35 | targets = []
36 | for _, (input, target) in enumerate(trainer.val_loader):
37 |         # target stays on CPU for the mAP computation below
38 | # compute output
39 | with torch.no_grad():
40 | with autocast():
41 | if cfg.model_name != 'simsiam':
42 | output_regular = sigmoid(
43 | trainer.model(input.cuda())).cpu()
44 | output_ema = sigmoid(
45 | trainer.ema.module(input.cuda())).cpu()
46 | else:
47 | output_regular = sigmoid(
48 | trainer.model.module.clip(
49 | input.cuda())).cpu()
50 | output_ema = sigmoid(
51 | trainer.ema.module.module.clip(
52 | input.cuda())).cpu()
53 |
54 | # for mAP calculation
55 | preds_regular.append(output_regular.cpu().detach())
56 | preds_ema.append(output_ema.cpu().detach())
57 | targets.append(target.cpu().detach())
58 |
59 | mAP_score_regular = mAP(
60 | torch.cat(targets).numpy(),
61 | torch.cat(preds_regular).numpy())
62 | mAP_score_ema = mAP(
63 | torch.cat(targets).numpy(),
64 | torch.cat(preds_ema).numpy())
65 | logger.info("mAP score regular {:.2f}, mAP score EMA {:.2f}".format(
66 | mAP_score_regular, mAP_score_ema))
67 | mAP_max = max(mAP_score_regular, mAP_score_ema)
68 |     # prefer the EMA weights when their mAP is at least as high as the
69 |     # regular weights' mAP (ties go to EMA)
70 |     if_ema_better = mAP_score_ema >= mAP_score_regular
71 |
72 |
73 | trainer.model.train()
74 | return mAP_max, if_ema_better
75 |
76 | def train(trainer) -> None:
77 | # set optimizer
78 | criterion = SPLC()
79 | parameters = add_weight_decay(trainer.model, cfg.weight_decay)
80 |     max_lr = [cfg.lr, cfg.lr, cfg.gcn_lr, cfg.gcn_lr]  # one max LR per parameter group returned by add_weight_decay
81 |     optimizer = torch.optim.Adam(
82 |         params=parameters, lr=cfg.lr,
83 |         weight_decay=0)  # weight decay is applied per parameter group in add_weight_decay, with bias/BN filtered out
84 | steps_per_epoch = len(trainer.train_loader)
85 | scheduler = lr_scheduler.OneCycleLR( # type: ignore
86 | optimizer,
87 | max_lr=max_lr,
88 | steps_per_epoch=steps_per_epoch,
89 | epochs=cfg.total_epochs, # type: ignore
90 | pct_start=0.2)
91 |
92 | highest_mAP = 0
93 | scaler = GradScaler()
94 | best_epoch = 0
95 | for epoch in range(cfg.epochs):
96 | for i, (input, target) in enumerate(trainer.train_loader):
97 |             target = target.cuda()  # (batch, num_classes)
98 | # target = target.max(dim=1)[0]
99 | loss = trainer.train(input, target, criterion, epoch, i)
100 |
101 | trainer.model.zero_grad()
102 | scaler.scale(loss).backward() # type: ignore
103 | scaler.step(optimizer)
104 | scaler.update()
105 | scheduler.step()
106 | trainer.ema.update(trainer.model)
107 | if i % 100 == 0:
108 | logger.info('Epoch [{}/{}], Step [{}/{}], LR {:.1e}, Loss: {:.1f}'
109 | .format(epoch, cfg.epochs, str(i).zfill(3), str(steps_per_epoch).zfill(3), # noqa
110 |                             scheduler.get_last_lr()[0],
111 | loss.item()))
112 |
113 | mAP_score, if_ema_better = validate(trainer, epoch)
114 |
115 | if mAP_score > highest_mAP:
116 | highest_mAP = mAP_score
117 | best_epoch = epoch
118 | save_best(trainer, if_ema_better)
119 | logger.info(
120 | 'current_mAP = {:.2f}, highest_mAP = {:.2f}, best_epoch={}\n'.
121 | format(mAP_score, highest_mAP, best_epoch))
122 | logger.info("Save text embeddings done")
123 |
124 | def test(trainer) -> None:
125 | # get model-highest.ckpt
126 | trainer.model.load_state_dict(
127 | torch.load(f"{cfg.checkpoint}/model-highest.ckpt"), strict=True)
128 | trainer.model.eval()
129 |
130 | logger.info("Start test...")
131 | batch_time = AverageMeter()
132 | prec = AverageMeter()
133 | rec = AverageMeter()
134 | # mAP_meter = AverageMeter()
135 |
136 | sigmoid = torch.nn.Sigmoid()
137 |
138 | end = time.time()
139 | tp, fp, fn, tn, count = 0, 0, 0, 0, 0
140 | preds = []
141 | targets = []
142 | for i, (input, target) in enumerate(trainer.val_loader):
143 |         # target stays on CPU; predictions are compared against it below
144 | # compute output
145 | with torch.no_grad():
146 | output = sigmoid(trainer.model(input.cuda())).cpu()
147 |
148 | # for mAP calculation
149 | preds.append(output.cpu())
150 | targets.append(target.cpu())
151 |
152 | # measure accuracy and record loss
153 | pred = output.data.gt(cfg.thre).long()
154 |
155 | tp += (pred + target).eq(2).sum(dim=0)
156 | fp += (pred - target).eq(1).sum(dim=0)
157 | fn += (pred - target).eq(-1).sum(dim=0)
158 | tn += (pred + target).eq(0).sum(dim=0)
159 | count += input.size(0)
160 |
161 | this_tp = (pred + target).eq(2).sum()
162 | this_fp = (pred - target).eq(1).sum()
163 | this_fn = (pred - target).eq(-1).sum()
164 | # this_tn = (pred + target).eq(0).sum()
165 |
166 | this_prec = this_tp.float() / (this_tp + this_fp).float(
167 | ) * 100.0 if this_tp + this_fp != 0 else 0.0
168 | this_rec = this_tp.float() / (this_tp + this_fn).float(
169 | ) * 100.0 if this_tp + this_fn != 0 else 0.0
170 |
171 | prec.update(float(this_prec), input.size(0))
172 | rec.update(float(this_rec), input.size(0))
173 |
174 | # measure elapsed time
175 | batch_time.update(time.time() - end)
176 | end = time.time()
177 |
178 | p_c = [
179 | float(tp[i].float() / (tp[i] + fp[i]).float()) *
180 | 100.0 if tp[i] > 0 else 0.0 for i in range(len(tp))
181 | ]
182 | r_c = [
183 | float(tp[i].float() / (tp[i] + fn[i]).float()) *
184 | 100.0 if tp[i] > 0 else 0.0 for i in range(len(tp))
185 | ]
186 | f_c = [
187 | 2 * p_c[i] * r_c[i] / (p_c[i] + r_c[i]) if tp[i] > 0 else 0.0
188 | for i in range(len(tp))
189 | ]
190 |
191 | mean_p_c = sum(p_c) / len(p_c)
192 | mean_r_c = sum(r_c) / len(r_c)
193 | mean_f_c = sum(f_c) / len(f_c)
194 |
195 | p_o = tp.sum().float() / (tp + fp).sum().float() * 100.0
196 | r_o = tp.sum().float() / (tp + fn).sum().float() * 100.0
197 | f_o = 2 * p_o * r_o / (p_o + r_o)
198 |
199 | if i % 64 == 0:
200 | logger.info(
201 | 'Test: [{0}/{1}]\t'
202 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
203 | 'Precision {prec.val:.2f} ({prec.avg:.2f})\t'
204 | 'Recall {rec.val:.2f} ({rec.avg:.2f})'.format(
205 | i,
206 | len(trainer.val_loader),
207 | batch_time=batch_time,
208 | prec=prec,
209 | rec=rec))
210 | logger.info(
211 | 'P_C {:.2f} R_C {:.2f} F_C {:.2f} P_O {:.2f} R_O {:.2f} F_O {:.2f}'
212 | .format(mean_p_c, mean_r_c, mean_f_c, p_o, r_o, f_o))
213 |
214 | logger.info(
215 | '--------------------------------------------------------------------'
216 | )
217 | logger.info(
218 | ' * P_C {:.2f} R_C {:.2f} F_C {:.2f} P_O {:.2f} R_O {:.2f} F_O {:.2f}'
219 | .format(mean_p_c, mean_r_c, mean_f_c, p_o, r_o,
220 | f_o)) # type: ignore
221 |
222 | mAP_score = mAP(torch.cat(targets).numpy(), torch.cat(preds).numpy())
223 | logger.info(f"mAP score: {mAP_score}")
224 | return torch.cat(targets).numpy(), torch.cat(preds).numpy() # type: ignore
225 |
226 |
227 |
228 | def main():
229 | trainer = SCPNetTrainer()
230 | if cfg.test:
231 | test(trainer)
232 | else:
233 | train(trainer)
234 |
235 | if __name__ == '__main__':
236 | main()
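237 |
238 | def _onecycle_lr_demo():
239 |     # Illustrative sketch (hypothetical helper, not called anywhere in
240 |     # the repo): shows how OneCycleLR with pct_start=0.2 warms the LR up
241 |     # and then anneals it, mirroring the scheduler setup in train().
242 |     model = torch.nn.Linear(2, 2)
243 |     opt = torch.optim.Adam(model.parameters(), lr=1e-4)
244 |     sched = lr_scheduler.OneCycleLR(opt, max_lr=1e-3, steps_per_epoch=10,
245 |                                     epochs=1, pct_start=0.2)
246 |     lrs = []
247 |     for _ in range(10):
248 |         opt.step()
249 |         sched.step()
250 |         lrs.append(sched.get_last_lr()[0])
251 |     return lrs  # rises over the first ~2 steps, then anneals towards 0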
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import random
3 | from copy import deepcopy
4 |
5 | import numpy as np
6 | import torch
7 | from PIL import Image, ImageDraw
8 | from pycocotools.coco import COCO
9 | from torchvision import datasets as datasets
10 |
11 | from config import cfg
12 | from log import logger
13 |
14 |
15 | def average_precision(output, target):
16 | epsilon = 1e-8
17 |
18 | # sort examples by decreasing score
19 | indices = output.argsort()[::-1]
20 | # Computes prec@i
21 | total_count_ = np.cumsum(np.ones((len(output), 1)))  # ranks 1..N
22 |
23 | target_ = target[indices]
24 | ind = target_ == 1
25 | pos_count_ = np.cumsum(ind)  # positives seen up to each rank
26 | total = pos_count_[-1]  # total number of positives
27 | pos_count_[np.logical_not(ind)] = 0 # type: ignore
28 | pp = pos_count_ / total_count_  # precision@rank, kept only at positive ranks
29 | precision_at_i_ = np.sum(pp)
30 | precision_at_i = precision_at_i_ / (total + epsilon)
31 |
32 | return precision_at_i
33 |
34 |
35 | def mAP(targs, preds):
36 | """Returns the model's average precision for each class
37 | Return:
38 | ap (FloatTensor): 1xK tensor, with avg precision for each class k
39 | """
40 |
41 | if np.size(preds) == 0:
42 | return 0
43 | ap = np.zeros((preds.shape[1]))
44 | # compute average precision for each class
45 | for k in range(preds.shape[1]):
46 | # sort scores
47 | scores = preds[:, k]
48 | targets = targs[:, k]
49 | # compute average precision
50 | ap[k] = average_precision(scores, targets)
51 | return 100 * ap.mean()
52 |
53 |
54 | class AverageMeter(object):
55 |
56 | def __init__(self):
57 | self.val = None
58 | self.sum = None
59 | self.cnt = None
60 | self.avg = None
61 | self.ema = None
62 | self.initialized = False
63 |
64 | def update(self, val, n=1):
65 | if not self.initialized:
66 | self.initialize(val, n)
67 | else:
68 | self.add(val, n)
69 |
70 | def initialize(self, val, n):
71 | self.val = val
72 | self.sum = val * n
73 | self.cnt = n
74 | self.avg = val
75 | self.ema = val
76 | self.initialized = True
77 |
78 | def add(self, val, n):
79 | self.val = val
80 | self.sum += val * n
81 | self.cnt += n
82 | self.avg = self.sum / self.cnt
83 | self.ema = self.ema * 0.99 + self.val * 0.01 # type: ignore
84 |
85 |
86 | class CocoDetection(datasets.coco.CocoDetection):
87 |
88 | def __init__(self, root, annFile, transform=None, target_transform=None):
89 | self.root = root
90 | self.coco = COCO(annFile)
91 |
92 | self.ids = list(self.coco.imgToAnns.keys())
93 | self.transform = transform
94 | self.target_transform = target_transform
95 | self.cat2cat = dict()  # map sparse COCO category ids to contiguous indices
96 | for cat in self.coco.cats.keys():
97 | self.cat2cat[cat] = len(self.cat2cat)
98 |
99 | def labels(self):
100 | return [v["name"] for v in self.coco.cats.values()]
101 |
102 | def __getitem__(self, index):
103 | coco = self.coco
104 | img_id = self.ids[index]
105 | ann_ids = coco.getAnnIds(imgIds=img_id)
106 | target = coco.loadAnns(ann_ids)
107 |
108 | output = torch.zeros((3, 80), dtype=torch.long)  # rows: small/medium/large
109 | for obj in target: # type: ignore
110 | if obj['area'] < 32 * 32:  # small objects
111 | output[0][self.cat2cat[obj['category_id']]] = 1
112 | elif obj['area'] < 96 * 96:  # medium objects
113 | output[1][self.cat2cat[obj['category_id']]] = 1
114 | else:  # large objects
115 | output[2][self.cat2cat[obj['category_id']]] = 1
116 | target = output
117 |
118 | path = coco.loadImgs(img_id)[0]['file_name'] # type: ignore
119 | img = Image.open(os.path.join(self.root, path)).convert('RGB')
120 | if self.transform is not None:
121 | img = self.transform(img)
122 | if self.target_transform is not None:
123 | target = self.target_transform(target)
124 | target = target.max(dim=0)[0]  # collapse the scale rows into one multi-label vector
125 | return img, target
126 |
127 |
128 | class COCO_missing_dataset(torch.utils.data.Dataset): # type: ignore
129 |
130 | def __init__(self,
131 | root,
132 | annFile,
133 | transform=None,
134 | target_transform=None,
135 | class_num: int = -1):
136 | self.root = root
137 | with open(annFile, 'r') as f:
138 | names = f.readlines()
139 | # each line: "<image path>,<space-separated indices of positive labels>"
140 | self.name = names
141 | # labels are parsed on the fly in __getitem__
142 | self.transform = transform
143 | self.class_num = class_num
144 | self.target_transform = target_transform
145 |
146 | def __getitem__(self, index):
147 | name = self.name[index]
148 | path = name.strip('\n').split(',')[0]
149 | num = name.strip('\n').split(',')[1]
150 | num = num.strip(' ').split(' ')
151 | num = np.array([int(i) for i in num])
152 | label = np.zeros([self.class_num])
153 | label[num] = 1
154 | label = torch.tensor(label, dtype=torch.long)
155 | if not os.path.exists(os.path.join(self.root, path)):
156 | # a missing image means the dataset root is misconfigured; log the
157 | # offending path and abort rather than silently training on a
158 | # blank dummy sample
159 | logger.error(f"image not found: {os.path.join(self.root, path)}")
160 | exit(1)
161 | else:
162 | img = Image.open(os.path.join(self.root, path)).convert('RGB')
163 | if self.transform is not None:
164 | img = self.transform(img)
165 | # target_transform is intentionally unsupported for this dataset
166 | assert self.target_transform is None
167 |
168 | return [index, img], label  # index is returned so callers can identify the sample
169 |
170 | def __len__(self):
171 | return len(self.name)
172 |
173 | def labels(self):
174 | if "coco" in cfg.data:
175 | assert False, "COCO label names are provided by CocoDetection.labels()"
176 | elif "nuswide" in cfg.data:
177 | with open('nuswide_labels.txt', 'r') as f:
178 | text = f.read()
179 | return text.split('\n')
180 | elif "voc" in cfg.data:
181 | with open('voc_labels.txt', 'r') as f:
182 | text = f.read()
183 | return text.split('\n')
184 | elif "cub" in cfg.data:
185 | with open('cub_labels.txt', 'r') as f:
186 | text = f.read()
187 | return text.split('\n')
188 | else:
189 | assert False, f"unknown dataset: {cfg.data}"
190 |
191 |
192 | class COCO_missing_val_dataset(torch.utils.data.Dataset): # type: ignore
193 |
194 | def __init__(self,
195 | root,
196 | annFile,
197 | transform=None,
198 | target_transform=None,
199 | class_num: int = -1):
200 | self.root = root
201 | with open(annFile, 'r') as f:
202 | names = f.readlines()
203 | # same "<image path>,<label indices>" line format as COCO_missing_dataset
204 | self.name = names
205 | # labels are parsed on the fly in __getitem__
206 | self.transform = transform
207 | self.class_num = class_num
208 | self.target_transform = target_transform
209 |
210 | def __getitem__(self, index):
211 | name = self.name[index]
212 | path = name.strip('\n').split(',')[0]
213 | num = name.strip('\n').split(',')[1]
214 | num = num.strip(' ').split(' ')
215 | num = np.array([int(i) for i in num])
216 | label = np.zeros([self.class_num])
217 | label[num] = 1
218 | label = torch.tensor(label, dtype=torch.long)
219 | if not os.path.exists(os.path.join(self.root, path)):
220 | # a missing image means the dataset root is misconfigured; log the
221 | # offending path and abort rather than silently evaluating on a
222 | # blank dummy sample
223 | logger.error(f"image not found: {os.path.join(self.root, path)}")
224 | exit(1)
225 | else:
226 | img = Image.open(os.path.join(self.root, path)).convert('RGB')
227 | if self.transform is not None:
228 | img = self.transform(img)
229 | # target_transform is intentionally unsupported for this dataset
230 | assert self.target_transform is None
231 |
232 | return img, label  # unlike the train split, no sample index is returned
233 |
234 | def __len__(self):
235 | return len(self.name)
236 |
237 | def labels(self):
238 | if "coco" in cfg.data:
239 | assert False, "COCO label names are provided by CocoDetection.labels()"
240 | elif "nuswide" in cfg.data:
241 | with open('nuswide_labels.txt', 'r') as f:
242 | text = f.read()
243 | return text.split('\n')
244 | elif "voc" in cfg.data:
245 | with open('voc_labels.txt', 'r') as f:
246 | text = f.read()
247 | return text.split('\n')
248 | elif "cub" in cfg.data:
249 | with open('cub_labels.txt', 'r') as f:
250 | text = f.read()
251 | return text.split('\n')
252 | else:
253 | assert False, f"unknown dataset: {cfg.data}"
254 |
255 |
256 | class ModelEma(torch.nn.Module):
257 |
258 | def __init__(self, model, decay=0.9997, device=None):
259 | super(ModelEma, self).__init__()
260 | # make a copy of the model for accumulating moving average of weights
261 | self.module = deepcopy(model)
262 | self.module.eval()
263 | self.decay = decay
264 | self.device = device # perform ema on different device from model if set
265 | if self.device is not None:
266 | self.module.to(device=device)
267 |
268 | def _update(self, model, update_fn):
269 | with torch.no_grad():
270 | for ema_v, model_v in zip(self.module.state_dict().values(),
271 | model.state_dict().values()):
272 | if self.device is not None:
273 | model_v = model_v.to(device=self.device)
274 | ema_v.copy_(update_fn(ema_v, model_v))
275 |
276 | def update(self, model):
277 | self._update(model,
278 | update_fn=lambda e, m: self.decay * e +
279 | (1. - self.decay) * m)
280 |
281 | def set(self, model):
282 | self._update(model, update_fn=lambda e, m: m)
283 |
284 |
285 | class CutoutPIL(object):
286 |
287 | def __init__(self, cutout_factor=0.5):
288 | self.cutout_factor = cutout_factor
289 |
290 | def __call__(self, x):
291 | img_draw = ImageDraw.Draw(x)
292 | w, h = x.size  # PIL reports (width, height): y-coords use h, x-coords use w
293 | h_cutout = int(self.cutout_factor * h + 0.5)
294 | w_cutout = int(self.cutout_factor * w + 0.5)
295 | y_c = np.random.randint(h)
296 | x_c = np.random.randint(w)
297 |
298 | y1 = np.clip(y_c - h_cutout // 2, 0, h)
299 | y2 = np.clip(y_c + h_cutout // 2, 0, h)
300 | x1 = np.clip(x_c - w_cutout // 2, 0, w)
301 | x2 = np.clip(x_c + w_cutout // 2, 0, w)
302 | fill_color = (random.randint(0, 255), random.randint(0, 255),
303 | random.randint(0, 255))
304 | img_draw.rectangle([x1, y1, x2, y2], fill=fill_color) # type: ignore
305 |
306 | return x
307 |
308 |
309 | def add_weight_decay(model, weight_decay=1e-4, skip_list=()):
310 | decay = []
311 | no_decay = []
312 | gcn = []
313 | gcn_no_decay = []
314 | prefix = "module." if torch.cuda.device_count() > 1 else ""
315 | for name, param in model.named_parameters():
316 | if not param.requires_grad:
317 | continue # frozen weights
318 | if name.startswith(f"{prefix}gc"):
319 | if len(param.shape) == 1 or name.endswith(".bias") or name in skip_list:
320 | gcn_no_decay.append(param)
321 | else:
322 | gcn.append(param)
323 | assert("gcn" in cfg.model_name)
324 | elif len(param.shape) == 1 or name.endswith(
325 | ".bias") or name in skip_list:
326 | no_decay.append(param)
327 | else:
328 | decay.append(param)
329 | return [{
330 | 'params': no_decay,
331 | 'weight_decay': 0.
332 | }, {
333 | 'params': decay,
334 | 'weight_decay': weight_decay
335 | }, {
336 | 'params': gcn_no_decay,
337 | 'weight_decay': 0.
338 | }, {
339 | 'params': gcn,
340 | 'weight_decay': weight_decay
341 | }]
342 |
343 | def get_ema_co():
344 | if "coco" in cfg.data:
345 | ema_co = np.exp(np.log(0.82)/(641*cfg.ratio)) # type: ignore
346 | # ema_co = 0.9997
347 | elif "nus" in cfg.data:
348 | ema_co = np.exp(np.log(0.82)/(931*cfg.ratio)) # type: ignore
349 | # ema_co = 0.9998
350 | elif "voc" in cfg.data:
351 | ema_co = np.exp(np.log(0.82)/(45*cfg.ratio)) # type: ignore
352 | # ema_co = 0.9956
353 | elif "cub" in cfg.data:
354 | if cfg.batch_size == 96:
355 | ema_co = np.exp(np.log(0.82)/(63*cfg.ratio))
356 | else:
357 | ema_co = np.exp(np.log(0.82)/(47*cfg.ratio)) # type: ignore
358 | else:
359 | assert False, f"unsupported dataset: {cfg.data}"
360 | return ema_co
--------------------------------------------------------------------------------
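
A note on the constants in `get_ema_co` above: 641, 931, 45, and 47/63 correspond closely to the number of training iterations per epoch for COCO, NUS-WIDE, VOC, and CUB at the configured batch sizes, so each branch solves `decay ** (iters_per_epoch * cfg.ratio) = 0.82`. The decay is thereby normalized so that, one epoch later, the previous EMA state retains a weight of 0.82 regardless of dataset size. A quick sanity check under that assumption (the ~82k/128 figure is an assumption, not taken from this file):

```python
import numpy as np

iters_per_epoch = 641  # assumed: ~82k COCO training images / batch size 128
ratio = 1.0            # plays the role of cfg.ratio

decay = np.exp(np.log(0.82) / (iters_per_epoch * ratio))
print(round(decay, 4))                               # 0.9997, the commented-out value
print(round(decay ** (iters_per_epoch * ratio), 2))  # 0.82 by construction
```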
/voc_labels.txt:
--------------------------------------------------------------------------------
1 | aeroplane
2 | bicycle
3 | bird
4 | boat
5 | bottle
6 | bus
7 | car
8 | cat
9 | chair
10 | cow
11 | diningtable
12 | dog
13 | horse
14 | motorbike
15 | person
16 | pottedplant
17 | sheep
18 | sofa
19 | train
20 | tvmonitor
--------------------------------------------------------------------------------