├── slides └── .gitkeep ├── paper_reports ├── .gitignore ├── Lucid.md ├── fastvideoseg.md ├── images │ ├── cct.png │ ├── tcc.png │ ├── tcn.png │ ├── BDWSS.png │ ├── CANEt.png │ ├── osvos.png │ ├── video.png │ ├── WebcoSeg.png │ ├── WegSeg.png │ ├── tcc_dig.png │ ├── tcc_eq1.png │ ├── tcc_eq2.png │ ├── tcc_eq3.png │ ├── tcc_eq4.png │ ├── tcc_eq5.png │ ├── tcc_eq6.png │ ├── tcc_eq7.png │ ├── masktrack.png │ └── videoprediction.png ├── Cct.md ├── TransKg.md ├── BDWSS.md ├── WebcoSeg.md ├── uncertainty.md ├── CA_fewshot.md ├── Tcn.md ├── WebSeg.md ├── VOSTsv.md ├── Tcc.md ├── Seg_video_propa.md └── VideoSeg.md ├── awesome-marketplace.md ├── self-supervised.md ├── summary.md └── README.md /slides/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /paper_reports/.gitignore: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /paper_reports/Lucid.md: -------------------------------------------------------------------------------- 1 | ## Lucid 2 | 3 | -------------------------------------------------------------------------------- /paper_reports/fastvideoseg.md: -------------------------------------------------------------------------------- 1 | ## Fast object segmentation in unconstrained video 2 | 3 | -------------------------------------------------------------------------------- /paper_reports/images/cct.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/cct.png -------------------------------------------------------------------------------- /paper_reports/images/tcc.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc.png -------------------------------------------------------------------------------- /paper_reports/images/tcn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcn.png -------------------------------------------------------------------------------- /paper_reports/images/BDWSS.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/BDWSS.png -------------------------------------------------------------------------------- /paper_reports/images/CANEt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/CANEt.png -------------------------------------------------------------------------------- /paper_reports/images/osvos.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/osvos.png -------------------------------------------------------------------------------- /paper_reports/images/video.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/video.png -------------------------------------------------------------------------------- /paper_reports/images/WebcoSeg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/WebcoSeg.png -------------------------------------------------------------------------------- /paper_reports/images/WegSeg.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/WegSeg.png -------------------------------------------------------------------------------- /paper_reports/images/tcc_dig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_dig.png -------------------------------------------------------------------------------- /paper_reports/images/tcc_eq1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq1.png -------------------------------------------------------------------------------- /paper_reports/images/tcc_eq2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq2.png -------------------------------------------------------------------------------- /paper_reports/images/tcc_eq3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq3.png -------------------------------------------------------------------------------- /paper_reports/images/tcc_eq4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq4.png -------------------------------------------------------------------------------- /paper_reports/images/tcc_eq5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq5.png 
-------------------------------------------------------------------------------- /paper_reports/images/tcc_eq6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq6.png -------------------------------------------------------------------------------- /paper_reports/images/tcc_eq7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq7.png -------------------------------------------------------------------------------- /paper_reports/images/masktrack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/masktrack.png -------------------------------------------------------------------------------- /paper_reports/images/videoprediction.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/videoprediction.png -------------------------------------------------------------------------------- /paper_reports/Cct.md: -------------------------------------------------------------------------------- 1 | ## Learning Correspondence from the Cycle-consistency of Time 2 | 3 | ![cct](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/cct.png) -------------------------------------------------------------------------------- /paper_reports/TransKg.md: -------------------------------------------------------------------------------- 1 | ## Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ## Weakly Supervised Semantic Segmentation using Web-Crawled Videos 10 | 11 | 
![video](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/video.png) -------------------------------------------------------------------------------- /paper_reports/BDWSS.md: -------------------------------------------------------------------------------- 1 | ## Bootstrapping the Performance of Webly Supervised Semantic Segmentation 2 | 3 | Complexity Measure: target domain model 4 | 5 | Proxy Ground Truth: fuse **target** and web domain pseudo masks 6 | 7 | 8 | 9 | ![BDWSS](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/BDWSS.png) 10 | 11 | -------------------------------------------------------------------------------- /awesome-marketplace.md: -------------------------------------------------------------------------------- 1 | **Links to awesome resource lists on hot topics, including papers, code, slides, etc.** 2 | 3 | #### Knowledge-Distillation 4 | - [Link](https://github.com/dkozlov/awesome-knowledge-distillation) 5 | 6 | #### Self-Supervised Learning 7 | - [Link](https://github.com/jason718/awesome-self-supervised-learning) 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | #### Unknown 16 | 17 | Segmentation is All You Need -------------------------------------------------------------------------------- /paper_reports/WebcoSeg.md: -------------------------------------------------------------------------------- 1 | ## Weakly Supervised Semantic Segmentation Based on Web Image Co-segmentation 2 | 3 | Complexity Measure: none (x); co-segmentation tolerates noise well 4 | 5 | Proxy Ground Truth: pseudo masks from a weak model trained by co-segmentation 6 | 7 | Filtering: keep images whose predicted masks have a foreground ratio of 0.2-0.8 8 | 9 | ![WebcoSeg](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/WebcoSeg.png) -------------------------------------------------------------------------------- /paper_reports/uncertainty.md: -------------------------------------------------------------------------------- 1 | ## Uncertainty 
(Bayesian Deep Learning) 2 | 3 | ### What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? 4 | 5 | 6 | 7 | ### Bounding Box Regression with Uncertainty for Accurate Object Detection 8 | 9 | 10 | 11 | ### Uncertainty in Deep Learning (PhD Thesis) 12 | 13 | [Link]() 14 | 15 | 16 | 17 | Two kinds of uncertainty in deep learning 18 | 19 | 20 | 21 | 22 | 23 | -------------------------------------------------------------------------------- /paper_reports/CA_fewshot.md: -------------------------------------------------------------------------------- 1 | ## CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning 2 | ![CANEt](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/CANEt.png) 3 | 4 | 5 | 6 | #### Iterative Optimization module 7 | 8 | use the probability maps predicted in the previous iteration, concatenated with the input features, to predict the current masks in a residual form, 9 | 10 | the predicted map is reset to zero with probability $p$ (to resist over-fitting in iterative optimization) 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | -------------------------------------------------------------------------------- /paper_reports/Tcn.md: -------------------------------------------------------------------------------- 1 | ## Time-Contrastive Networks: Self-Supervised Learning from Video 2 | 3 | self-supervised learning within a single video 4 | 5 | **triplet loss**: frames near the anchor are treated as positive samples, and frames far from the anchor are treated as negative samples 6 | 7 | The model trains itself by trying to answer the following questions simultaneously: 8 | 9 | - What is common between the different-looking blue frames? 10 | - What is different between the similar-looking red and blue frames? 
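The triplet sampling described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the TCN authors' code: the function names, the window sizes (`pos_window`, `neg_margin`), the margin value, and the use of squared Euclidean distance are all hypothetical choices made for the example.

```python
import random


def sample_triplet(num_frames, pos_window, neg_margin, rng):
    """Pick (anchor, positive, negative) frame indices from a single video.

    Assumed convention: a positive lies within `pos_window` frames of the
    anchor, a negative lies at least `neg_margin` frames away from it.
    """
    anchor = rng.randrange(num_frames)
    positives = [t for t in range(num_frames)
                 if t != anchor and abs(t - anchor) <= pos_window]
    negatives = [t for t in range(num_frames)
                 if abs(t - anchor) >= neg_margin]
    if not positives or not negatives:
        raise ValueError("video too short for the chosen windows")
    return anchor, rng.choice(positives), rng.choice(negatives)


def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)
```

In practice the embeddings would come from a network applied to the sampled frames; plain Python lists stand in for them here so the sketch stays dependency-free.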
11 | 12 | ![tcn](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcn.png) -------------------------------------------------------------------------------- /paper_reports/WebSeg.md: -------------------------------------------------------------------------------- 1 | ## WebSeg: Learning Semantic Segmentation from Web Searches 2 | 3 | - use low-level cues as ground truth: regions, saliency 4 | 5 | - regions are obtained using MCG on **edge maps** 6 | 7 | - saliency uses DSS 8 | 9 | - filter GT with a region net 10 | 11 | ![WegSeg](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/WegSeg.png) 12 | 13 | #### image complexity measure 14 | 15 | drop complex web-crawled images 16 | 17 | - variance of the Laplacian 18 | - saturation / brightness 19 | 20 | #### proxy ground truth 21 | 22 | Region + Saliency 23 | 24 | #### noise filtering module 25 | 26 | labels: region probs 27 | 28 | network: spp pooling network -------------------------------------------------------------------------------- /paper_reports/VOSTsv.md: -------------------------------------------------------------------------------- 1 | ## Video Object Segmentation and Tracking: A Survey 2 | 3 | ### VOS 4 | 5 | bottom-up: 6 | 7 | - spatio-temporal motion 8 | - appearance similarity 9 | 10 | iteratively optimize energy functions / fine-tune a deep network 11 | 12 | read multiple frames at once -> take full advantage of the context of multiple frames -> suited to short-term image sequences 13 | 14 | 15 | 16 | ### VOT 17 | 18 | use a class-specific detector to robustly predict the motion state (location, size, orientation) -> suited to long-term sequences 19 | 20 | - generative / discriminative appearance models 21 | 22 | - part-based tracking 23 | - segmentation-based tracking 24 | 25 | 26 | 27 | 28 | 29 | -------------------------------------------------------------------------------- /self-supervised.md: -------------------------------------------------------------------------------- 1 | ## 
Self-Supervised Learning 2 | 3 | #### Image 4 | 5 | Rotation: predict the rotation angle 6 | 7 | Exemplar: each image corresponds to its own class; uses triplet loss 8 | 9 | Jigsaw: recover the relative spatial positions of 9 randomly sampled image patches after a random permutation 10 | 11 | Relative Patch Location: predict the relative location of two given patches of an image. 12 | 13 | 14 | 15 | #### Video 16 | 17 | cycle between tracked frame patches in the same video 18 | 19 | cycle between frames in similar videos (if the most similar frame to frame a of video A is frame b of video B, then the most similar frame to frame b in video A should be frame a) 20 | 21 | triplet of temporally near frames, far-away frames and the anchor frame within the same video 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | #### Papers 30 | 31 | Revisiting Self-Supervised Visual Representation Learning, CVPR2019 32 | 33 | SCOPS: Self-Supervised Co-Part Segmentation, CVPR2019 34 | 35 | Time-contrastive Networks: Self-Supervised Learning from Video 36 | 37 | Temporal Cycle-Consistency Learning, CVPR2019 38 | 39 | Learning Correspondence from the Cycle-consistency of Time, CVPR2019 40 | 41 | 42 | 43 | -------------------------------------------------------------------------------- /paper_reports/Tcc.md: -------------------------------------------------------------------------------- 1 | ## Temporal Cycle-Consistency Learning 2 | 3 | 4 | 5 | ![tcc](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc.png) 6 | 7 | matching in mid-level features 8 | 9 | cycle consistency in videos 10 | 11 | ![tcc_dig](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_dig.png) 12 | 13 | #### Cycle-back LOSS 14 | 15 | ###### classification 16 | 17 | Given the selected point $u_i$ 18 | 19 | cycle-forward: soft nearest neighbor: ![tcc_eq1](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq1.png) 20 | 21 | cycle-back: use distance as logits: 
![tcc_eq2](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq2.png) ![tcc_eq3](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq3.png) 22 | 23 | 24 | 25 | ###### regression 26 | 27 | penalize the model less if the cycle-back frame is near the anchor frame 28 | 29 | a similarity vector $\beta$ along $u_i$ 30 | 31 | ![tcc_eq4](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq4.png) 32 | 33 | Impose a Gaussian prior on $\beta$, centered at the position of anchor frame $i$ 34 | 35 | ![tcc_eq5](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq5.png) 36 | 37 | where ![tcc_eq6](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq6.png) 38 | 39 | or only minimize the mean 40 | 41 | ![tcc_eq7](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq7.png) -------------------------------------------------------------------------------- /paper_reports/Seg_video_propa.md: -------------------------------------------------------------------------------- 1 | ## Improving Semantic Segmentation via Video Propagation and Label Relaxation 2 | 3 | #### Introduction 4 | 5 | synthesizing new training samples: use a video prediction-based methodology 6 | 7 | video prediction: prone to producing unnatural distortions along object boundaries 8 | 9 | ![framework](images/videoprediction.png) 10 | 11 | 12 | 13 | #### Contribution 14 | 15 | label propagation: 16 | 17 | - patch matching: sensitive to patch size and threshold 18 | - optical flow: relies on accurate optical flow 19 | 20 | 21 | 22 | This paper: 23 | 24 | - motion vectors from video prediction (self-supervised training) 25 | 26 | - joint image-label propagation 27 | 28 | 29 | 30 | Boundary handling: 31 | 32 | - edge cues as constraints 33 | - error propagation from edge estimation 34 | - overfitting: fitting extremely hard boundary cases 35 | - structure modeling: affinity field [21], random walk [5], relaxation
labelling[37], boundary neural fields [4] 36 | - does not deal directly with boundary pixels 37 | 38 | This Paper: predict multiple classes at a boundary pixel 39 | 40 | 41 | 42 | #### Method 43 | 44 | ###### Video Prediction 45 | 46 | SDC-Net: Video Prediction using Spatially-Displaced Convolution 47 | 48 | ###### Boundary Label Relaxation 49 | 50 | difficult to classify the center pixel of a receptive field when potentially half or more of the input context could be from a different class 51 | 52 | For boundary pixels: 53 | 54 | - instead of maximizing the likelihood of the single target label (x) 55 | - maximize the likelihood of $P(A \cup B) = P(A) + P(B)$, where $A$ and $B$ are the neighboring classes 56 | 57 | - loss is $\mathcal{L}_{boundary} = -\log\sum_{C\in\mathcal{N}}{P(C)}$ 58 | 59 | -------------------------------------------------------------------------------- /paper_reports/VideoSeg.md: -------------------------------------------------------------------------------- 1 | ## Video Segmentation Overview 2 | #### Basic 3 | 4 | roadmap: 5 | 6 | - interleave box tracking with box-driven segmentation 7 | - propagate the first frame segmentation via graph labeling 8 | 9 | 10 | 11 | Lucid Data Dreaming augmentations, temporal component 12 | 13 | 14 | 15 | ### Approaches 16 | 17 | #### Semi-supervised 18 | 19 | ###### Matching-based 20 | 21 | - OSVOS 22 | - OnAVOS (Online adaptation of convolutional neural networks for video object segmentation, BMVC 2017) 23 | - updating the network online with additional high-confidence predictions 24 | - OSVOS-S (Video object segmentation without temporal information, PAMI 2018) 25 | - semantic information from an instance segmentation network 26 | - using the instance segments of the different objects in the scene as prior knowledge and blending them with the segmentation output 27 | 28 | - **Lucid** 29 | 30 | - FAVOS, PML, videomatch 31 | 32 | 33 | 34 | ###### Propagation-based 35 | 36 | - MaskTrack 37 | - LucidTracker (Lucid Data Dreaming for Video Object Segmentation, 
IJCV) 38 | - CRN (Motion guided cascaded refinement network for video object segmentation, CVPR2018) 39 | - applying active contour on optical flow to find motion cues 40 | - **CINM**: (Video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF, CVPR2018) 41 | - long-term temporal dependency 42 | - MoNet: (MoNet: Deep motion exploitation for video object segmentation, CVPR2018) 43 | - exploits optical flow motion cues by feature alignment and a distance transform layer 44 | - combines temporal information from nearby frames to track the target 45 | - LSE: (Video object segmentation by learning location-sensitive embeddings, ECCV2018) 46 | - Location-sensitive embeddings used to refine an initial foreground prediction 47 | - combines temporal information from nearby frames to track the target 48 | 49 | 50 | 51 | - OSMN, RGMP, FEELVOS, MHP-VOS, RVOS 52 | - meta-learning, Conditional Batch Normalization (CBN) to gather spatiotemporal features 53 | - applied instance detection 54 | 55 | - STCNN (Spatiotemporal CNN for Video Object Segmentation, CVPR2019) 56 | - the temporal coherence branch is pretrained in an adversarial fashion from unlabeled video data 57 | 58 | 59 | 60 | ###### Detection-based 61 | 62 | - MHP-VOS: (MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation, CVPR2019) 63 | - handles cases where objects are occluded or missing 64 | 65 | 66 | 67 | ###### Fast (without fine-tuning) 68 | 69 | - FAVOS (Fast and accurate online video object segmentation via tracking parts, CVPR2018) 70 | - PML (Blazingly fast video object segmentation with pixel-wise metric learning, CVPR2018) 71 | - Videomatch (Videomatch: Matching based video object segmentation, CVPR2018) 72 | 73 | 74 | 75 | - OSMN (Efficient video object segmentation via network modulation, CVPR2018) 76 | - meta-learning 77 | - RGMP (Fast video object segmentation by reference-guided mask propagation, CVPR2018) 78 | - FEELVOS (Fast End-to-End Embedding Learning for Video 
Object Segmentation, CVPR2019) 79 | 80 | 81 | 82 | #### Unsupervised 83 | 84 | - RVOS (RVOS: End-to-End Recurrent Network for Video Object Segmentation, CVPR2019) 85 | - IET (Instance Embedding Transfer to Unsupervised Video Object Segmentation, CVPR2018) 86 | - adapt the instance networks trained on static images 87 | - incorporate the embeddings with objectness and optical flow features 88 | - LMP (Learning motion patterns in videos, CVPR2017) 89 | - takes optical flow as an input to separate moving and non-moving regions 90 | - combines the results with objectness cues from SharpMask [35] to generate the moving object segmentation 91 | - LVO (Learning video object segmentation with visual memory, ICCV2017) 92 | - two-stream network, using RGB appearance features and optical flow motion features 93 | - **FSEG** (Fusionseg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos, CVPR2017) 94 | - two-stream network trained with mined supplemental data 95 | 96 | 97 | 98 | ##### Early works 99 | 100 | Bilateral space video segmentation, 2016 101 | 102 | Video segmentation via object flow, 2016 103 | 104 | Efficient video segmentation using parametric graph partitioning, 2015 105 | 106 | Streaming hierarchical video segmentation, 2012 107 | 108 | 109 | 110 | 111 | 112 | ### Details 113 | 114 | #### OSVOS (One Shot Video Object Segmentation) 115 | 116 | ![osvos](../paper_reports/images/osvos.png) 117 | 118 | 1. Take a net (say, VGG-16) pre-trained for classification, for example on ImageNet. 119 | 2. Convert it to a fully convolutional network, à la [FCN](https://arxiv.org/abs/1605.06211), thus preserving spatial information: 120 | \- Remove the FC layers at the end. 121 | \- Insert a new loss: pixel-wise sigmoid balanced cross entropy (previously used by [HED](https://arxiv.org/abs/1504.06375)). Now each pixel is separately classified into foreground or background. 122 | 3. 
Train the new fully convolutional network on the DAVIS-2016 training set. 123 | 4. **One-shot training:** At inference time, given a new input video for segmentation and a ground-truth annotation for the first frame (remember, this is a semi-supervised problem), create a new model, initialized with the weights trained in step 3 and fine-tuned on the first frame. 124 | 125 | #### MaskTrack (Learning Video Object Segmentation from Static Images) 126 | 127 | ![masktrack](../paper_reports/images/masktrack.png) 128 | 129 | ###### offline training 130 | 131 | conditional mask prediction 132 | 133 | hypothesis: mask estimates change smoothly between two nearby frames 134 | 135 | train: image dataset, use augmentation (deformation and affine transformation on the mask) to simulate the previous frame's prediction 136 | 137 | test: RGB + previous frame's mask estimate -> current frame's mask estimate 138 | 139 | ###### online training 140 | 141 | fine-tune on the test video, generating multiple training samples by augmentation (deformation and affine transformation on the mask) 142 | 143 | 144 | 145 | #### CRN 146 | 147 | Motion-guided cascaded refinement network for video object segmentation -------------------------------------------------------------------------------- /summary.md: -------------------------------------------------------------------------------- 1 | ## Summary 2 | 3 | #### Video 4 | 5 | FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation, Liang-Chieh Chen, CVPR2019 6 | 7 | **SCOPS: Self-Supervised Co-Part Segmentation**, 
CVPR2019 8 | 9 | MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation, (semi-supervised), CVPR2019 10 | 11 | RVOS: End-to-End Recurrent Network for Video Object Segmentation, (zero-shot) 12 | 13 | **Spatiotemporal CNN for Video Object Segmentation**, CVPR2019 14 | 15 | - [x] Fast Online Object Tracking and Segmentation: A Unifying Approach, SiamMask, 16 | 17 | **Improving Semantic Segmentation via Video Propagation and Label Relaxation**, CVPR2019 18 | 19 | Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation 20 | 21 | 22 | 23 | Foreground Clustering for Joint Segmentation and Localization in Videos and Images, NIPS2018 24 | 25 | Learning Video Object Segmentation from Static Images (masktrack), CVPR2017 26 | 27 | **Weakly Supervised Semantic Segmentation Using Web-Crawled Videos**, CVPR2017 28 | 29 | Learning semantic segmentation with weakly-annotated videos, ECCV2016 30 | 31 | Fast object segmentation in unconstrained video, ICCV2013 32 | 33 | 34 | 35 | ##### Motion based 36 | 37 | ###### optical flow 38 | 39 | MoNet: Deep Motion Exploitation for Video Object Segmentation, CVPR2018 40 | 41 | ###### mask refinement 42 | 43 | Efficient video object segmentation via network modulation, CVPR 2018 44 | 45 | Learning video object segmentation from static images, 2017 46 | 47 | Fast and Accurate online video object segmentation via tracking parts, 2018 48 | 49 | ##### Detection based 50 | 51 | One shot video object segmentation 52 | 53 | 54 | 55 | ##### Semantic 56 | 57 | Semantic Video CNNs Through Representation Warping, ICCV 2017 58 | 59 | Semantic Video Segmentation by Gated Recurrent Flow Propagation, CVPR2018 60 | 61 | Efficient Uncertainty Estimation for Semantic Segmentation in Videos, ECCV 2018 62 | 63 | 64 | 65 | ##### Others 66 | 67 | **Towards segmenting anything that moves**, 68 | 69 | **SCOPS: Self-Supervised Co-Part Segmentation** 70 | 71 | Video Object Segmentation and Tracking: A 
Survey, Guosheng Lin 72 | 73 | **Jifeng Dai** 74 | 75 | [Constructing cycle-consistency from different perspectives to reduce video annotation cost](https://mp.weixin.qq.com/s?__biz=MzU4MjQ3MDkwNA==&mid=2247489650&idx=1&sn=9bf3faf9e3f701c691c6d7c0230c812c&pass_ticket=kKH6zQhjNNZcUufO56qeszGgG9f0k9DjYmd9pbbUc4IN3KNpnJi%2Fle2KYoKpjvay) 76 | 77 | - [Learning Correspondence from the Cycle-consistency of Time](paper_reports/Cct), cycle between patches of different frames 78 | - [Temporal Cycle-Consistency Learning](paper_reports/Tcc), cycle between frames of different videos 79 | - [Time-Contrastive Networks](paper_reports/Tcn), triplet between frames within the same video 80 | 81 | Depth from videos in the wild: Unsupervised Monocular Depth Learning from Unknown Cameras 82 | 83 | 84 | 85 | #### Domain Adaptation 86 | 87 | Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR2018 88 | 89 | Knowledge Adaptation for Efficient Semantic Segmentation, CVPR2019 90 | 91 | Structure Knowledge Distillation for Semantic Segmentation, CVPR2019 92 | 93 | Self-ensembling for visual domain adaptation, 94 | 95 | Deep semi-supervised segmentation with weight-averaged consistency targets, MICCAI2018 96 | 97 | Semi-supervised Skin Lesion Segmentation via Transformation Consistent Self-ensembling Model, BMVC2018 98 | 99 | Data augmentation using learned transformations for one-shot medical image segmentation, CVPR2019 100 | 101 | 102 | 103 | #### Self-Supervised 104 | 105 | Revisiting Self-Supervised Visual Representation Learning, CVPR2019, reviews 4 common image self-supervised methods 106 | 107 | SCOPS: Self-Supervised Co-Part Segmentation, CVPR2019 108 | 109 | Time-contrastive Networks: Self-Supervised Learning from Video 110 | 111 | Temporal Cycle-Consistency Learning, CVPR2019 112 | 113 | Learning Correspondence from the Cycle-consistency of Time, CVPR2019 114 | 115 | 116 | 117 | #### Webly 118 | 119 | [WebSeg: Learning Semantic Segmentation from Web Searches](paper_reports/WebSeg), arxiv, edges+MCG+saliency 120 | 121 | [Weakly Supervised Semantic 
Segmentation Based on Web Image Co-segmentation](paper_reports/WebcoSeg), BMVC2017, co-segmentation 122 | 123 | [Bootstrapping the Performance of Webly Supervised Semantic Segmentation](), CVPR2018, two models, 124 | 125 | #### Few Shot 126 | 127 | [CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning](paper_reports/CA_fewshot), CVPR2019 128 | 129 | **Few-Shot Semantic Segmentation with Prototype Learning**, BMVC2018 130 | 131 | Data augmentation using learned transformations for one-shot medical image segmentation, CVPR2019, spatial transform and appearance transform 132 | 133 | 134 | 135 | #### Semi-Supervised 136 | 137 | Weakly- and Semi-Supervised Panoptic Segmentation, ECCV2018 138 | 139 | **Adversarial Learning for Semi-Supervised Semantic Segmentation**, BMVC2018, Ming-Hsuan Yang 140 | 141 | Transferable Semi-supervised Semantic Segmentation, AAAI 2018 142 | 143 | **Adversarial Dropout for Supervised and Semi-Supervised Learning**, AAAI 2018 144 | 145 | Pixel Level Data Augmentation for Semantic Image Segmentation Using Generative Adversarial Networks: interesting 146 | 147 | - balance the distribution by generating images with a GAN (manipulating the GT masks) 148 | 149 | 150 | 151 | #### Cosegmentation 152 | 153 | 154 | 155 | #### Weakly-Semantic 156 | 157 | Convolutional Simplex Projection Network for Weakly Supervised Semantic Segmentation, BMVC2018 158 | 159 | **Cyclic Guidance for Weakly supervised Joint Detection and Segmentation** 160 | 161 | 162 | 163 | #### Weakly-Localization 164 | 165 | what about using it for fine-grained tasks? 166 | 167 | #### Weakly-Instance 168 | 169 | 170 | 171 | 172 | #### Overhead Imagery 173 | Self-supervision-for-segmenting-overhead-imagery 174 | 175 | 176 | 177 | #### Fast Segmentation 178 | 179 | Improving Fast Segmentation With Teacher-student Learning, BMVC2018 180 | 181 | 182 | 183 | ## Useful Knowledge 184 | 185 | ### Uncertainty 186 | 187 | [doc](./paper_reports/uncertainty) 188 | 189 | 
### Augmentation 190 | 191 | Lucid 192 | 193 | #### Appearance model 194 | 195 | 196 | 197 | #### Transformation model 198 | 199 | deformable 200 | 201 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Weakly-Segmentation 2 | List of useful codes and papers for weakly supervised Semantic/Instance/Panoptic/Few Shot Segmentation 3 | 4 | - [Weakly-Segmentation](#weakly-segmentation) 5 | * [Top Work](#top-work) 6 | - [By Dataset](#by-dataset) 7 | + [PASCAL VOC2012](#pascal-voc2012) 8 | - [By Years](#by-years) 9 | + [ICCV2019](#iccv2019) 10 | * [Resources](#resources) 11 | - [Tutorial](#tutorial) 12 | * [Implementation](#implementation) 13 | - [Related Tasks](#related-tasks) 14 | * [Few-shot segmentation](#few-shot-segmentation) 15 | * [Weakly-supervised Instance Segmentation](#weakly-supervised-instance-segmentation) 16 | * [Weakly-supervised Panoptic Segmentation](#weakly-supervised-panoptic-segmentation) 17 | - [Reading List](#reading-list) 18 | * [Under Review](#under-review) 19 | * [Published](#published) 20 | - [context](#context) 21 | - [graph](#graph) 22 | - [bbox-level](#bbox-level) 23 | - [webly](#webly) 24 | - [Saliency](#saliency) 25 | - [localization](#localization) 26 | - [spp](#spp) 27 | - [affinity](#affinity) 28 | - [region](#region) 29 | - [network](#network) 30 | - [regularizer](#regularizer) 31 | - [evaluation measure](#evaluation-measure) 32 | - [architecture](#architecture) 33 | - [generative adversarial](#generative-adversarial) 34 | - [scene understanding](#scene-understanding) 35 | - [other useful](#other-useful) 36 | - [application](#application) 37 | * [Others](#others) 38 | - [priors](#priors) 39 | - [diffusion](#diffusion) 40 | - [analysis](#analysis) 41 | - [post processing](#post-processing) 42 | - [common methods](#common-methods) 43 | 44 | 45 | ## Top Work 46 | #### By Dataset 47 | ###### PASCAL VOC2012 48 | 
49 | | method | val | test | notes | 50 | | ------------ | ---------- | ---------- | ---------- | 51 | | [DSRG](https://github.com/speedinghzl/DSRG) CVPR2018 | 61.4 | 63.2 | deep seeded region growing, resnet-lfov\|vgg-aspp | 52 | | [psa](https://github.com/jiwoon-ahn/psa) CVPR2018 | 61.7 | 63.7 | pixel affinity network, resnet38 | 53 | | [MDC](https://arxiv.org/pdf/1805.04574.pdf) CVPR2018 | 60.4 | 60.8 | multi-dilated convolution, vgg-lfov | 54 | | [MCOF](http://3dimage.ee.tsinghua.edu.cn/wx/mcof) CVPR2018 | 60.3 | 61.2 | iterative, RegionNet(sppx), resnet-lfov | 55 | | [GAIN](https://arxiv.org/pdf/1802.10171.pdf) CVPR2018 | 55.3 | 56.8 | | 56 | | [DCSP](https://github.com/arslan-chaudhry/dcsp_segmentation) BMVC2017 | **58.6** | **59.2** | adversarial for saliency, and generate cues by cam+saliency(harmonic mean)| 57 | | [GuidedSeg](https://github.com/coallaoh/GuidedLabelling) CVPR2017 | 55.7 | 56.7 | saliency, TBD| 58 | | [BDWSS](https://github.com/ascust/BDWSS) CVPR2018 | 63.0 | 63.9 | webly, filter+enhance| 59 | | [WebSeg](https://arxiv.org/pdf/1803.09859.pdf) arxiv | 63.1 | 63.3 | webly(pure), Noise filter module| 60 | | [SeeNet](https://arxiv.org/abs/1810.09821) NIPS2018 | 63.1 | 62.8 | based on DCSP | 61 | | [Graph](http://mftp.mmcheng.net/Papers/18ECCVGraphPartition.pdf) ECCV2018 | 63.6 | 64.5 | graph partition| 62 | | [Graph](http://mftp.mmcheng.net/Papers/18ECCVGraphPartition.pdf) ECCV2018 | 64.5 | 65.6 | use simple ImageNet dataset additionally| 63 | | [CIAN](https://arxiv.org/abs/1811.10842) CVPR2019 | 64.1 | 64.7 | cross image affinity network| 64 | | [FickleNet](https://arxiv.org/abs/1902.10421) CVPR2019 | **64.9** | **65.3** | use dropout (a generalization of dilated convolution)| 65 | 66 | #### By Years 67 | ###### ICCV2019 68 | Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation 69 | Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation 70 | ###### CVPR2019 71 | FickleNet: Weakly and Semi-supervised 
Semantic Image Segmentation using Stochastic Inference 72 | 73 | 74 | ## Resources 75 | see [this](https://github.com/JackieZhangdx/WeakSupervisedSegmentationList) for more weakly supervised lists and resources. 76 | see [this](https://github.com/wutianyiRosun/Segmentation.X) for more semantic/instance/panoptic/video segmentation lists and resources. 77 | see [this](https://github.com/mrgloom/awesome-semantic-segmentation) for more implementations 78 | a good architecture summary paper: [Learning a Discriminative Feature Network for Semantic Segmentation](https://arxiv.org/pdf/1804.09337.pdf) 79 | #### Tutorial 80 | - Unsupervised Visual Learning Tutorial. *CVPR 2018* [[part 1]](https://www.youtube.com/watch?v=gSqmUOAMwcc) [[part 2]](https://www.youtube.com/watch?v=BijK_US6A0w) 81 | - Weakly Supervised Learning for Computer Vision. *CVPR 2018* [[web]](https://hbilen.github.io/wsl-cvpr18.github.io/) [[part 1]](https://www.youtube.com/watch?v=bXfZFmE8cjo) [[part 2]](https://www.youtube.com/watch?v=FetNp6f19IM) 82 | 83 | ## Implementation 84 | 85 | [pytorch-segmentation-detection](https://github.com/warmspringwinds/pytorch-segmentation-detection) a library for dense inference and training of Convolutional Neural Networks, 68.0% 86 | 87 | [drn](https://github.com/fyu/drn) Dilated Residual Networks, 75.6%, may be the best available semantic segmentation in PyTorch? 88 | 89 | [Detectron.pytorch](https://github.com/roytseng-tw/Detectron.pytorch) A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available. only for coco now 90 | 91 | [AdvSemiSeg](https://github.com/hfslyc/AdvSemiSeg) Adversarial Learning for Semi-supervised Semantic Segmentation. 
heavily borrowed from a **pytorch DeepLab** implementation ([Link](https://github.com/speedinghzl/Pytorch-Deeplab)) 92 | 93 | [PyTorch-ENet](https://github.com/davidtvs/PyTorch-ENet) PyTorch implementation of ENet 94 | 95 | [tensorflow-deeplab-resnet](https://github.com/DrSleep/tensorflow-deeplab-resnet) Tensorflow implementation of deeplab-resnet(deeplabv2, resnet101-based): complete and detailed 96 | 97 | [tensorflow-deeplab-lfov](https://github.com/DrSleep/tensorflow-deeplab-lfov) Tensorflow implementation of deeplab-LargeFOV(deeplabv2, vgg16-based): complete and detailed 98 | 99 | [resnet38](https://github.com/itijyou/ademxapp) Wider or Deeper: Revisiting the ResNet Model for Visual Recognition: implemented using MXNET 100 | 101 | [pytorch_deeplab_large_fov](https://github.com/BardOfCodes/pytorch_deeplab_large_fov): deeplab v1 102 | 103 | [pytorch-deeplab-resnet](https://github.com/isht7/pytorch-deeplab-resnet) DeepLab resnet v2 model in pytorch 104 | 105 | [DeepLab-ResNet-Pytorch](https://github.com/speedinghzl/Pytorch-Deeplab) Deeplab v3 model in pytorch 106 | 107 | [BDWSS](https://github.com/ascust/BDWSS) Bootstrapping the Performance of Webly Supervised Semantic Segmentation 108 | 109 | [psa](https://github.com/jiwoon-ahn/psa) Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation 110 | 111 | [DSRG](https://github.com/speedinghzl/DSRG): Caffe, CAM and DRFI provided 112 | 113 | SEC 114 | - [original](https://github.com/kolesman/SEC): Caffe 115 | - [BDWSS](https://github.com/ascust/BDWSS): MXNET 116 | - [SEC-tensorflow](https://github.com/xtudbxk/SEC-tensorflow): tensorflow 117 | 118 | # Related Tasks 119 | ## Few-shot segmentation 120 | - [ ] One-shot learning for semantic segmentation, BMVC2017 121 | - [ ] Conditional networks for few-shot semantic segmentation, ICLR2018 Workshop 122 | - [ ] Few-Shot Segmentation Propagation with Guided Networks, preprint 123 | - [ ] Few-Shot Semantic Segmentation 
with Prototype Learning, BMVC2018 124 | - [ ] Attention-based Multi-Context Guiding for Few-Shot Semantic Segmentation, AAAI2019 125 | - [ ] CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning, CVPR2019 126 | - [ ] One-Shot Segmentation in Clutter, ICML 2018 127 | 128 | ## Weakly-supervised Instance Segmentation 129 | - [x] Weakly Supervised Instance Segmentation using Class Peak Response, CVPR2018 130 | - [ ] Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR2019 131 | - [ ] Object Counting and Instance Segmentation with Image-level Supervision, CVPR2019 132 | - [x] Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation, CVPR2019 133 | - [x] Where are the Masks: Instance Segmentation with Image-level Supervision, BMVC2019 134 | - [ ] Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation, ICCV2019 135 | 136 | ## Weakly-supervised Panoptic Segmentation 137 | - [ ] Weakly- and Semi-Supervised Panoptic Segmentation, ECCV2018 138 | 139 | # Reading List 140 | 141 | ## Under Review 142 | - [ ] [Gated CRF Loss for Weakly Supervised Semantic Image Segmentation](https://arxiv.org/abs/1906.04651) 143 | - [ ] [Closed-Loop Adaptation for Weakly-Supervised Semantic Segmentation](https://arxiv.org/abs/1905.12190) 144 | - [ ] [Harvesting Information from Captions for Weakly Supervised Semantic Segmentation](https://arxiv.org/abs/1905.06784) 145 | - [ ] [Consistency regularization and CutMix for semi-supervised semantic segmentation](https://arxiv.org/abs/1906.01916) 146 | - [ ] [Zero-shot Semantic Segmentation](https://arxiv.org/abs/1906.00817) 147 | - [x] [Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation](https://arxiv.org/pdf/1909.03714.pdf), proposes a scale-equivariant regularization. 
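To make the idea of a scale-equivariant regularizer from the last entry above concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's formulation: `downscale` (average pooling as a stand-in for image resizing), `scale_equivariance_loss`, and the convention that `model` maps a 2D image to a same-sized 2D response map are all hypothetical names and choices introduced here.

```python
import numpy as np


def downscale(x, factor):
    """Average-pool a 2D array by an integer factor (stand-in for resizing)."""
    h, w = x.shape
    h2, w2 = h // factor, w // factor
    # Crop to a multiple of `factor`, then average each factor x factor block.
    blocks = x[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor)
    return blocks.mean(axis=(1, 3))


def scale_equivariance_loss(model, image, factor=2):
    """Penalize disagreement between 'rescale then predict' and 'predict then rescale'.

    `model` is assumed (hypothetically) to map a 2D image to a 2D response
    map of the same size, e.g. a class activation map for one class.
    """
    map_full = model(image)                      # predict at full resolution
    map_small = model(downscale(image, factor))  # predict on the rescaled input
    diff = downscale(map_full, factor) - map_small
    return float(np.mean(diff ** 2))
```

A pixel-wise linear `model` commutes with average pooling and gives a loss of zero, while a non-linear one such as `lambda x: x ** 2` generally does not; in training, a term like this would be added to the classification loss so that response maps stay consistent across input scales.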
148 | 149 | ## Published 150 | #### context 151 | - [x] Context Encoding for Semantic Segmentation: CVPR2018. use TEN 152 | - [ ] The Role of Context for Object Detection and Semantic Segmentation in the Wild: CVPR2014 153 | - [ ] Objects as Context for Detecting Their Semantic Parts: CVPR2018 154 | - [ ] Exploring context with deep structured models for semantic segmentation: TPAMI2017 155 | - [ ] dilated convolution 156 | - [ ] Deep TEN: Texture encoding network !!: CVPR2017. A global context vector, pooled from all spatial positions, can be concatenated to local features 157 | - [ ] Refinenet: Multi-path refinement networks for high-resolution semantic segmentation: CVPR2017. local features across different scales can be fused to encode global context 158 | - [x] Non-local neural networks: CVPR2018. a densely connected graph with pairwise edges between all pixels 159 | 160 | #### graph 161 | - [ ] Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation: ECCV2018 162 | 163 | #### bbox-level 164 | Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation, CVPR2019 165 | 166 | #### webly 167 | - [x] Weakly Supervised Semantic Segmentation Based on Web Image Cosegmentation: BMVC2017, trains the model using masks of web images generated by cosegmentation 168 | - [ ] Webly Supervised Semantic Segmentation: CVPR2017 169 | - [x] Weakly Supervised Semantic Segmentation using Web-Crawled Videos: CVPR2017, learns a class-agnostic decoder(attention map -> binary mask), pseudo masks are generated from video frames by solving a graph-based optimization problem. 170 | - [x] Bootstrapping the Performance of Webly Supervised Semantic Segmentation: target + web domain, the target model filters web images, refines masks by combining target and web masks. 
171 | - [ ] Learning from Weak and Noisy Labels for Semantic Segmentation: TPAMI2017 172 | - [x] WebSeg: Learning Semantic Segmentation from Web Searches: arxiv, directly learning from keyword-retrieved web images. using saliency and region(MCG with edge) 173 | - [x] STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation: TPAMI 2017, three DCNN models: Initial, Enhanced, Powerful. initial mask(generated by saliency and label using simple images) -> initial model -> enhanced mask(generated using simple images) -> Enhanced model -> powerful mask(generated using complex images) -> powerful model 174 | - saliency cannot handle complex images, so BMVC2017 uses coseg instead 175 | 176 | #### Saliency 177 | - [x] Exploiting Saliency for Object Segmentation from Image Level Labels: CVPR2017 178 | - [x] Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation: BMVC2017 179 | - combine saliency(off-the-shelf) and CAM to get cues, use harmonic mean function 180 | - adapt CAM from head of Segmentation Network 181 | - use erasing to get multiple objects' saliency 182 | 183 | #### localization 184 | - [x] Adversarial Complementary Learning for Weakly Supervised Object Localization, CVPR2018. two branches, remove high activations from feature map. [code](https://github.com/xiaomengyc/ACoL) 185 | - [x] [Tell me where to look: Guided Attention Inference Network](https://arxiv.org/pdf/1802.10171.pdf), CVPR2018. origin image soft erasing(CAM after sigmoid as attention) -> end2end training, forces erased images to have zero activation 186 | - [x] Self-Erasing Network for Integral Object Attention, NIPS2018: prohibit attentions from spreading to unexpected background regions. 
187 | - cam -> ternary mask(attention, background, potential) 188 | - self erasing only in attention + potential region(**sign flip in background region** instead of simply setting to 0) 189 | - self-produced pseudo label for background region(difference to SPG: 1. pseudo label for background and attention 2. supervise low layer) 190 | - [x] Self-produced Guidance for Weakly-supervised Object localization, ECCV2018: 191 | - self-supervised, uses a top-down framework, for single-label classification prob. **add pixel-wise supervision when only image-level labels are available** 192 | - B1, B2 sharing 193 | - bottom guide top inversely(B1+B2 -> C) 194 | 195 | #### spp 196 | - [ ] Superpixel convolutional networks using bilateral inceptions 197 | - [x] Learning Superpixels with Segmentation-Aware Affinity Loss: good intro for superpixel algs. 198 | 199 | #### affinity 200 | - [x] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation: image-level, semantic affinity, learn a **network** to predict affinity 201 | - [x] Adaptive Affinity Field for Semantic Segmentation: ECCV2018, semantic affinity. add a pairwise term in seg **loss**(similarity metric: KL divergence), use an adversarial method to determine optimal neighborhood size 202 | 203 | #### region 204 | - [ ] Region-Based Convolutional Networks for Accurate Object Detection and Segmentation 205 | - [ ] Simultaneous Detection and Segmentation, 2014 206 | - [ ] Feedforward semantic segmentation with zoom-out features: 2015 207 | 208 | #### network 209 | - [ ] Learned Shape-Tailored Descriptors for Segmentation 210 | - [ ] Normalized Cut Loss for Weakly-Supervised CNN Segmentation 211 | - [ ] Fully Convolutional Adaptation Networks for Semantic Segmentation 212 | - [ ] Learning to Adapt Structured Output Space for Semantic Segmentation 213 | - [x] Semantic Segmentation with Reverse Attention: BMVC2017, equal responses from multiple classes(confusion in boundary regions). 
add reverse branch, predict the probability that a pixel doesn't belong to the corresponding class. and use attention to combine origin and reverse branch 214 | - [x] Deep Clustering for Unsupervised Learning of Visual Features, ECCV2018. use assignments of knn as supervision to update weights of network 215 | - [x] DEL: Deep Embedding Learning for Efficient Image Segmentation, IJCAI 2018. use spp embedding as init probs to do image segmentation 216 | - [x] Learning a Discriminative Feature Network for Semantic Segmentation, CVPR2018, Smooth Network: multi-scale+global context(FPN with channel attention), Border Network: focal loss for boundary. [code?](https://github.com/YuhuiMa/DFN-tensorflow) 217 | - [ ] Convolutional Simplex Projection Network for Weakly Supervised Semantic Segmentation: BMVC 2018 218 | - [ ] Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation: CVPR2019 219 | 220 | #### regularizer 221 | - [ ] [Normalized Cut Loss for Weakly-Supervised CNN Segmentation](https://arxiv.org/pdf/1804.01346.pdf) 222 | - [ ] [Regularized Losses for Weakly-supervised CNN Segmentation](https://github.com/meng-tang/rloss) 223 | 224 | #### evaluation measure 225 | - [ ] [Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation](https://www.cs.umanitoba.ca/~ywang/papers/isvc16.pdf) 226 | - [ ] [The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks](https://arxiv.org/pdf/1705.08790.pdf) 227 | - [ ] [What is a good evaluation measure for semantic segmentation?](http://www.bmva.org/bmvc/2013/Papers/paper0032/paper0032.pdf) 228 | 229 | #### architecture 230 | - [ ] The Devil is in the Decoders, BMVC2017 231 | - [x] Dilated Residual Networks, CVPR2017. Dilated structure design for classification and localization. 232 | - [x] Understanding Convolution for Semantic Segmentation, WACV2018. 
hybrid dilated convolution(2-2-2 -> 1-2-3) 233 | - [x] Smoothed Dilated Convolutions for Improved Dense Prediction, KDD2018. separable and shared conv(for smoothing) + dilated conv 234 | - [x] Deeplab v1, v2, v3, v3+ 235 | - [ ] Learning Fully Dense Neural Networks for Image Semantic Segmentation, AAAI2019 236 | 237 | #### generative adversarial 238 | - [ ] **Deep dual learning for semantic image segmentation**: CVPR2017, image translation 239 | - [x] Semantic Segmentation using Adversarial Networks, NIPS2016 workshop 240 | - add gan loss branch, SegNet as generator, D: GT mask or predicted mask 241 | - [x] Adversarial Learning for Semi-Supervised Semantic Segmentation: BMVC2018 242 | - semi supervised: SegNet as G, FCN-type D(discriminate each location), use output of D as pseudo label for unlabeled data 243 | - [x] Semi and weakly Supervised Semantic Segmentation Using Generative Adversarial Network: ICCV2017, use SegNet as D, treat fake as new class 244 | - weakly, use conditionalGan, pixel-level, image-level, generated data are included in loss. the performance gain shrinks as the amount of fully labeled data increases 245 | - [ ] generative adversarial learning towards Fast weakly supervised detection: CVPR2018 246 | - [x] Adaptive Affinity Field for Semantic Segmentation: ECCV2018, semantic affinity. 
add a pairwise term in seg **loss**(similarity metric: KL divergence), use an adversarial method to determine optimal neighborhood size 247 | 248 | #### scene understanding 249 | - [ ] ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans 250 | - [ ] SeGAN: Segmenting and Generating the Invisible 251 | 252 | #### other useful 253 | - [ ] Learning to Segment Every Thing: semi-supervised, weight transfer function (from bbox parameters to mask parameters) 254 | - [ ] Simple Does It: Weakly Supervised Instance and Semantic Segmentation: bbox-level, many methods, using graphcut, HED, MCG 255 | - [ ] Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning: tricky, curriculum learning: image level -> instance level -> pixel level 256 | - [ ] Combining Bottom-Up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation: CVPR2017 257 | - [x] Improving Weakly-Supervised Object Localization By Micro-Annotation: BMVC2016, object classes always co-occur with the same background elements(boat, train). proposes a new annotation method. adds human annotations to improve localization results of CAM, annotating based on clusters of dense features. each class uses a spectral clustering.(CAM has problems) 258 | - [x] Co-attention CNNs for Unsupervised Object Co-segmentation: IJCAI 2018 259 | - [ ] Coarse-to-fine Image Co-segmentation with Intra and Inter Rank Constraints, IJCAI2018 260 | - [ ] Annotation-Free and One-Shot Learning for Instance Segmentation of Homogeneous Object Clusters, IJCAI2018 261 | - [x] Image-level to Pixel-wise Labeling: From Theory to Practice: fully, analyzes the effect of image labels on seg results. add a generator(recover original image). 
image label(binary, use a threshold smaller than 0.5, eg: 0.25), IJCAI2018 262 | 263 | #### application 264 | - [x] SeGAN: Segmenting and Generating the Invisible: CVPR2018, generate occluded parts 265 | - [x] Learning Hierarchical Semantic Image Manipulation through Structured Representations: NIPS2018, manipulates images at the object level by modifying bboxes 266 | 267 | 268 | ## Others 269 | #### priors 270 | - Superpixels: An Evaluation of the State-of-the-Art [link](https://github.com/davidstutz/superpixel-benchmark) 271 | - Learning Superpixels with Segmentation-Aware Affinity Loss [link](http://jankautz.com/publications/LearningSuperpixels_CVPR2018.pdf) 272 | - Superpixel based Continuous Conditional Random Field Neural Network for Semantic Segmentation [link](https://www.sciencedirect.com/science/article/pii/S0925231219300281) 273 | 274 | #### diffusion 275 | Learning random-walk label propagation for weakly-supervised semantic segmentation: scribble 276 | 277 | Convolutional Random Walk Networks for Semantic Image Segmentation: fully, affinity branch(low level) 278 | 279 | Soft Proposal Networks for Weakly Supervised Object Localization: attention, semantic affinity 280 | 281 | Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation: image-level, semantic affinity 282 | 283 | #### analysis 284 | image level to pixel wise labeling: from theory to practice: IJCAI 2018, analyzes the effectiveness of class-level labels for segmentation(GT, predicted) 285 | Attention based Deep Multiple Instance Learning: ICML 2018. 
CAM from an MIL perspective 286 | 287 | #### post processing 288 | listed in: [Co-attention CNNs for Unsupervised Object Co-segmentation](https://www.csie.ntu.edu.tw/~cyy/publications/papers/Hsu2018CAC.pdf) 289 | - Otsu’s method 290 | - GrabCut 291 | - CRF 292 | 293 | #### common methods 294 | - refine segmentation results using image-level labels 295 | - multi-label classification branch(BDWSS) 296 | - generative branch(to original image) 297 | - crf 298 | 299 | --------------------------------------------------------------------------------