├── slides
    └── .gitkeep
├── paper_reports
    ├── .gitignore
    ├── Lucid.md
    ├── fastvideoseg.md
    ├── images
    │   ├── cct.png
    │   ├── tcc.png
    │   ├── tcn.png
    │   ├── BDWSS.png
    │   ├── CANEt.png
    │   ├── osvos.png
    │   ├── video.png
    │   ├── WebcoSeg.png
    │   ├── WegSeg.png
    │   ├── tcc_dig.png
    │   ├── tcc_eq1.png
    │   ├── tcc_eq2.png
    │   ├── tcc_eq3.png
    │   ├── tcc_eq4.png
    │   ├── tcc_eq5.png
    │   ├── tcc_eq6.png
    │   ├── tcc_eq7.png
    │   ├── masktrack.png
    │   └── videoprediction.png
    ├── Cct.md
    ├── TransKg.md
    ├── BDWSS.md
    ├── WebcoSeg.md
    ├── uncertainty.md
    ├── CA_fewshot.md
    ├── Tcn.md
    ├── WebSeg.md
    ├── VOSTsv.md
    ├── Tcc.md
    ├── Seg_video_propa.md
    └── VideoSeg.md
├── awesome-marketplace.md
├── self-supervised.md
├── summary.md
└── README.md


/slides/.gitkeep:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/paper_reports/.gitignore:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/paper_reports/Lucid.md:
--------------------------------------------------------------------------------
1 | ## Lucid
2 | 
3 | 


--------------------------------------------------------------------------------
/paper_reports/fastvideoseg.md:
--------------------------------------------------------------------------------
1 | ## Fast object segmentation in unconstrained video
2 | 
3 | 


--------------------------------------------------------------------------------
/paper_reports/images/cct.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/cct.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc.png


--------------------------------------------------------------------------------
/paper_reports/images/tcn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcn.png


--------------------------------------------------------------------------------
/paper_reports/images/BDWSS.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/BDWSS.png


--------------------------------------------------------------------------------
/paper_reports/images/CANEt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/CANEt.png


--------------------------------------------------------------------------------
/paper_reports/images/osvos.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/osvos.png


--------------------------------------------------------------------------------
/paper_reports/images/video.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/video.png


--------------------------------------------------------------------------------
/paper_reports/images/WebcoSeg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/WebcoSeg.png


--------------------------------------------------------------------------------
/paper_reports/images/WegSeg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/WegSeg.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc_dig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_dig.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc_eq1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq1.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc_eq2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq2.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc_eq3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq3.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc_eq4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq4.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc_eq5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq5.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc_eq6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq6.png


--------------------------------------------------------------------------------
/paper_reports/images/tcc_eq7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/tcc_eq7.png


--------------------------------------------------------------------------------
/paper_reports/images/masktrack.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/masktrack.png


--------------------------------------------------------------------------------
/paper_reports/images/videoprediction.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kevinlee9/Semantic-Segmentation/HEAD/paper_reports/images/videoprediction.png


--------------------------------------------------------------------------------
/paper_reports/Cct.md:
--------------------------------------------------------------------------------
1 | ## Learning Correspondence from the Cycle-consistency of Time
2 | 
3 | ![cct](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/cct.png)


--------------------------------------------------------------------------------
/paper_reports/TransKg.md:
--------------------------------------------------------------------------------
 1 | ## Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network
 2 | 
 3 | 
 4 | 
 5 | 
 6 | 
 7 | 
 8 | 
 9 | ## Weakly Supervised Semantic Segmentation using Web-Crawled Videos
10 | 
11 | ![video](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/video.png)


--------------------------------------------------------------------------------
/paper_reports/BDWSS.md:
--------------------------------------------------------------------------------
 1 | ## Bootstrapping the Performance of Webly Supervised Semantic Segmentation
 2 | 
 3 | Complexity Measure: target domain model
 4 | 
 5 | Proxy Ground Truth:  fuse **target** and web domain pseudo masks 
 6 | 
 7 | 
 8 | 
 9 | ![BDWSS](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/BDWSS.png)
10 | 
11 | 


--------------------------------------------------------------------------------
/awesome-marketplace.md:
--------------------------------------------------------------------------------
 1 | **Links to awesome resources on hot topics, including papers, code, slides, etc.**
 2 | 
 3 | ####  Knowledge-Distillation
 4 | - [Link](https://github.com/dkozlov/awesome-knowledge-distillation)
 5 | 
 6 | #### Self-Supervised Learning
 7 | - [Link](https://github.com/jason718/awesome-self-supervised-learning)
 8 | 
 9 | 
10 | 
11 | 
12 | 
13 | 
14 | 
15 | #### Unknown
16 | 
17 | Segmentation is All You Need


--------------------------------------------------------------------------------
/paper_reports/WebcoSeg.md:
--------------------------------------------------------------------------------
1 | ## Weakly Supervised Semantic Segmentation Based on Web Image Co-segmentation
2 | 
3 | Complexity Measure: none (x); co-segmentation tolerates noise well
4 | 
5 | Proxy Ground Truth: pseudo masks from a weakly supervised model trained by co-segmentation
6 | 
7 | Filtering: keep images whose predicted masks have a foreground ratio of 0.2-0.8
8 | 
9 | ![WebcoSeg](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/WebcoSeg.png)
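
The filtering rule above can be sketched as follows. This is a minimal illustration, assuming "rate" means the fraction of pixels predicted as foreground; the helper names `foreground_ratio` and `keep_image` and the toy masks are hypothetical, not taken from the paper.

```python
def foreground_ratio(mask):
    """Fraction of non-zero (foreground) pixels in a 2D mask given as nested lists."""
    total = sum(len(row) for row in mask)
    foreground = sum(1 for row in mask for value in row if value != 0)
    return foreground / total if total else 0.0


def keep_image(mask, low=0.2, high=0.8):
    """Keep an image only if its predicted mask is neither too small nor too large."""
    return low <= foreground_ratio(mask) <= high


# Toy predicted masks (assumed data, for illustration only).
masks = {
    "tiny_object": [[0, 0, 0, 0], [0, 0, 0, 1]],  # ratio 0.125, dropped
    "balanced": [[0, 1, 1, 0], [0, 1, 1, 0]],     # ratio 0.5, kept
    "all_foreground": [[1, 1, 1, 1], [1, 1, 1, 1]],  # ratio 1.0, dropped
}
kept = [name for name, mask in masks.items() if keep_image(mask)]
```

Masks that cover almost nothing or almost everything are likely failures of the weak model, which is presumably why both tails are discarded.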


--------------------------------------------------------------------------------
/paper_reports/uncertainty.md:
--------------------------------------------------------------------------------
 1 | ## Uncertainty (Bayesian Deep Learning)
 2 | 
 3 | ### What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
 4 | 
 5 | 
 6 | 
 7 | ### Bounding Box Regression with Uncertainty for Accurate Object Detection
 8 | 
 9 | 
10 | 
11 | ### Uncertainty in Deep Learning (PhD Thesis)
12 | 
13 | [Link](<http://mlg.eng.cam.ac.uk/yarin/blog_2248.html>)
14 | 
15 | 
16 | 
17 | Two kinds of uncertainty in deep learning: <https://zhuanlan.zhihu.com/p/56986840>
18 | 
19 | 
20 | 
21 | 
22 | 
23 | 


--------------------------------------------------------------------------------
/paper_reports/CA_fewshot.md:
--------------------------------------------------------------------------------
 1 | ## CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning
 2 | ![CANEt](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/CANEt.png)
 3 | 
 4 | 
 5 | 
 6 | #### Iterative Optimization module
 7 | 
 8 | use the previous iteration's predicted probability maps concatenated with the input features to predict the current masks in a residual form
 9 | 
10 | during training, the predicted map is set to zero with probability $p$ (to resist over-fitting in the iterative optimization)
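
A rough sketch of this loop, under stated assumptions: features and maps are flat lists, `residual_fn` stands in for the learned block, and the residual is added to the previous map. How CANet actually wires the residual connection is not recorded in these notes, so treat the update rule as hypothetical.

```python
import random


def iterative_refine(features, residual_fn, n_iters=4, p=0.7, training=True, rng=None):
    """Refine a probability map over several passes, randomly dropping the previous map."""
    rng = rng or random.Random(0)
    prob_map = [0.0] * len(features)  # nothing has been predicted before the first pass
    for _ in range(n_iters):
        prev = prob_map
        if training and rng.random() < p:
            prev = [0.0] * len(features)  # with probability p, zero out the previous map
        joint = list(zip(features, prev))  # "concat" of features and previous prediction
        delta = residual_fn(joint)
        # Residual update, clamped so the result stays a valid probability.
        prob_map = [min(1.0, max(0.0, q + d)) for q, d in zip(prev, delta)]
    return prob_map


def toy_residual(joint):
    """Hypothetical stand-in for the network: move halfway toward the feature value."""
    return [0.5 * (f - q) for f, q in joint]


inference_map = iterative_refine([0.0, 1.0], toy_residual, n_iters=3, training=False)
always_dropped = iterative_refine([0.0, 1.0], toy_residual, n_iters=3, p=1.0, training=True)
```

At inference the map is never dropped, so each pass builds on the last; with `p=1.0` every pass starts from zero, which shows why the drop forces the block not to rely solely on its own previous output.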
11 | 
12 | 
13 | 
14 | 
15 | 
16 | 
17 | 
18 | 
19 | 
20 | 


--------------------------------------------------------------------------------
/paper_reports/Tcn.md:
--------------------------------------------------------------------------------
 1 | ## Time-Contrastive Networks: Self-Supervised Learning from Video
 2 | 
 3 | self-supervised in a single video
 4 | 
 5 | **triplet loss**: frames near anchor are treated as positive samples, and frames far from anchor are treated as negative samples
 6 | 
 7 | The model trains itself by trying to answer the following questions simultaneously:
 8 | 
 9 | - What is common between the different-looking blue frames?
10 | -  What is different between the similar-looking red and blue frames?
11 | 
12 | ![tcn](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcn.png)
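
The triplet construction described above can be sketched like this. It is a minimal single-video illustration; the window sizes, the margin, and the function names are assumptions chosen for the example, not values from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the positive should be closer to the anchor than the negative by a margin."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def sample_triplet(n_frames, anchor, pos_window=2, neg_gap=10, rng=None):
    """Pick frame indices: positive close in time to the anchor, negative far away."""
    rng = rng or random.Random(0)
    positives = [t for t in range(n_frames) if t != anchor and abs(t - anchor) <= pos_window]
    negatives = [t for t in range(n_frames) if abs(t - anchor) >= neg_gap]
    return anchor, rng.choice(positives), rng.choice(negatives)


a_idx, p_idx, n_idx = sample_triplet(n_frames=30, anchor=15)
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0])   # negative already far: 0.0
violated = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])    # equal distances: loss = margin
```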


--------------------------------------------------------------------------------
/paper_reports/WebSeg.md:
--------------------------------------------------------------------------------
 1 | ## WebSeg: Learning Semantic Segmentation from Web Searches
 2 | 
 3 | - use low level cues as ground truth: regions, saliency
 4 | 
 5 |   - regions are obtained using MCG on **edge maps**
 6 | 
 7 |   - saliency uses DSS
 8 | 
 9 | - filter GT by a region net
10 | 
11 | ![WegSeg](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/WegSeg.png)
12 | 
13 | #### complexity image measure
14 | 
15 | drop web crawled complex images
16 | 
17 | - variance of the Laplacian
18 | - saturation / brightness
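
As an illustration of the first cue, the variance of the Laplacian can be computed as below. This is a sketch using a plain 4-neighbour Laplacian on a nested-list grayscale image; the notes do not say which kernel or threshold the paper uses, so both are assumptions here.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over the interior pixels."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# Toy images (assumed data): a flat image and a busy checkerboard.
flat = [[5] * 5 for _ in range(5)]
checker = [[((x + y) % 2) * 255 for x in range(5)] for y in range(5)]

flat_score = laplacian_variance(flat)
checker_score = laplacian_variance(checker)
```

A textured or cluttered image produces many strong edge responses and therefore a high score, so a hypothetical rule would drop crawled images whose score exceeds some chosen threshold.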
19 | 
20 | #### proxy ground truth 
21 | 
22 | Region + Saliency
23 | 
24 | #### noise filtering module
25 | 
26 | labels: region probs
27 | 
28 | network: spp pooling network


--------------------------------------------------------------------------------
/paper_reports/VOSTsv.md:
--------------------------------------------------------------------------------
 1 | ## Video Object Segmentation and Tracking: A Survey
 2 | 
 3 | ### VOS
 4 | 
 5 | bottom-up:
 6 | 
 7 | - spatio-temporal motion
 8 | - appearance similarity 
 9 | 
10 | iteratively optimize energy functions / fine-tune a deep network
11 | 
12 | read multiple frames at once -> take full advantage of the context of multiple frames -> suited to short-term sequences
13 | 
14 | 
15 | 
16 | ### VOT
17 | 
18 | use a class-specific detector to robustly predict the motion state (location, size, orientation) -> suited to long-term sequences
19 | 
20 | - generative / discriminative appearance models
21 | 
22 | - part-based tracking
23 | - segmentation-based tracking
24 | 
25 | 
26 | 
27 | 
28 | 
29 | 


--------------------------------------------------------------------------------
/self-supervised.md:
--------------------------------------------------------------------------------
 1 | ## Self-Supervised Learning
 2 | 
 3 | #### Image
 4 | 
 5 | Rotation: predicting rotation degree
 6 | 
 7 | Exemplar: each image corresponds to one class; uses triplet loss
 8 | 
 9 | Jigsaw:  recover relative spatial position of 9 randomly sampled image patches after a random permutation
10 | 
11 | Relative Patch Location: predicting the relative location of two given patches of an image. 
12 | 
13 | 
14 | 
15 | #### Video
16 | 
17 | cycle between tracking frame patches in same video
18 | 
19 | cycle between frames in similar videos (if the most similar frame to frame a of video A is frame b of video B, then the most similar frame to frame b should in turn be frame a)
20 | 
21 | triplet between temporally near frames, faraway frames and the anchor frame within the same video
22 | 
23 | 
24 | 
25 | 
26 | 
27 | 
28 | 
29 | #### Papers
30 | 
31 | Revisiting Self-Supervised Visual Representation Learning, CVPR2019
32 | 
33 | SCOPS: Self-Supervised Co-Part Segmentation,  CVPR2019
34 | 
35 | Time-contrastive Networks: Self-Supervised Learning from Video
36 | 
37 | Temporal Cycle-Consistency Learning,  CVPR2019
38 | 
39 | Learning Correspondence from the Cycle-consistency of Time, CVPR2019
40 | 
41 | 
42 | 
43 | 


--------------------------------------------------------------------------------
/paper_reports/Tcc.md:
--------------------------------------------------------------------------------
 1 | ## Temporal Cycle-Consistency Learning
 2 | 
 3 | 
 4 | 
 5 | ![tcc](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc.png)
 6 | 
 7 | matching in mid-level feature
 8 | 
 9 | cycle consistency in videos
10 | 
11 | ![tcc_dig](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_dig.png)
12 | 
13 | #### Cycle-back LOSS
14 | 
15 | ###### classification
16 | 
17 | Given the selected point $u_i$
18 | 
19 | cycle-forward: soft nearest neighbor: ![tcc_eq1](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq1.png)
20 | 
21 | cycle-back: use distance as logits:  ![tcc_eq2](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq2.png)     ![tcc_eq3](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq3.png)
22 | 
23 | 
24 | 
25 | ###### regression
26 | 
27 | penalize the model less if cycle-back frame is near the anchor frame
28 | 
29 | a similarity vector $\beta$ along $u_i$
30 | 
31 | ![tcc_eq4](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq4.png)
32 | 
33 | Give $\beta$ a Gaussian prior centered at the position of the anchor frame $i$
34 | 
35 | ![tcc_eq5](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq5.png)
36 | 
37 | where ![tcc_eq6](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq6.png)
38 | 
39 | or only minimize mean
40 | 
41 | ![tcc_eq7](/home/zhikang/src/python/Semantic-Segmentation/paper_reports/images/tcc_eq7.png)
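
Since the equations above are only available as images, here is a small sketch of the cycle-back computation as it is commonly stated for TCC: a softmax over negative squared distances for the soft nearest neighbor, distances back to the first video as logits, and a mean/variance penalty for the regression variant. The exact form, including the weight `lam`, is an assumption and should be checked against the equation images.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cycle_back(U, V, i):
    """Cycle u_i -> soft nearest neighbor in V -> similarity distribution beta over U."""
    alpha = softmax([-sq_dist(U[i], v) for v in V])          # cycle-forward weights
    dim = len(U[i])
    v_soft = [sum(a * v[d] for a, v in zip(alpha, V)) for d in range(dim)]
    return softmax([-sq_dist(v_soft, u) for u in U])         # distances used as logits


def classification_loss(beta, i):
    """Cross-entropy with the anchor index i as the correct class."""
    return -math.log(beta[i])


def regression_loss(beta, i, lam=0.001):
    """Penalize the distance of the mean of beta from i, normalized by its variance."""
    mu = sum(b * k for k, b in enumerate(beta))
    var = max(sum(b * (k - mu) ** 2 for k, b in enumerate(beta)), 1e-8)
    return (i - mu) ** 2 / var + lam * math.log(math.sqrt(var))


# Toy 1-D embeddings of two roughly aligned videos (assumed data).
U = [[0.0], [1.0], [2.0]]
V = [[0.1], [1.1], [2.1]]
beta = cycle_back(U, V, i=1)
cls = classification_loss(beta, 1)
reg = regression_loss(beta, 1)
```

The regression form is softer than the classification form because landing one frame away from the anchor shifts the mean only slightly, whereas cross-entropy treats every wrong frame as equally wrong.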


--------------------------------------------------------------------------------
/paper_reports/Seg_video_propa.md:
--------------------------------------------------------------------------------
 1 | ## Improving Semantic Segmentation via Video Propagation and Label Relaxation
 2 | 
 3 | #### Introduction
 4 | 
 5 | synthesizing new training samples: use a  video prediction-based methodology
 6 | 
 7 | video prediction: prone to producing unnatural distortions along object boundaries
 8 | 
 9 | ![framework](images/videoprediction.png)
10 | 
11 | 
12 | 
13 | #### Contribution
14 | 
15 | label propagation:
16 | 
17 | - patch matching: sensitive to patch size and threshold
18 | - optical flow: rely on accurate optical flow
19 | 
20 | 
21 | 
22 | This paper:
23 | 
24 | - motion vectors from video prediction (self-supervised training)
25 | 
26 | - joint image-label propagation
27 | 
28 | 
29 | 
30 | Boundary handling:
31 | 
32 | - edge cues as constraints
33 |   - error propagation from edge estimation
34 |   - overfitting:  fitting extremely hard boundary cases 
35 | - structure modeling: affinity field [21], random walk [5], relaxation labelling[37], boundary neural fields [4]
36 |   - do not directly deal with boundary pixels
37 | 
38 | This Paper: predict multiple classes at a boundary pixel
39 | 
40 | 
41 | 
42 | #### Method
43 | 
44 | ###### Video Prediction
45 | 
46 | SDC-Net: Video Prediction using Spatially-Displaced Convolution
47 | 
48 | ###### Boundary Label Relaxation
49 | 
50 | difficult to classify the center pixel of a receptive field when potentially half or more of the input context could be from a different class
51 | 
52 | For boundary pixels:
53 | 
54 | - not (x): maximizing the likelihood of the single target label
55 | - maximize the likelihood of $P(A \cup B) = P(A) + P(B)$, where $A$ and $B$ are the neighboring classes
56 | 
57 | - loss is $\mathcal{L}_{boundary} = -\log\sum_{C\in\mathcal{N}}{P(C)}$
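
The boundary loss above can be written out directly. This is a minimal per-pixel sketch, assuming `probs` is the softmax output for one pixel and `neighbor_classes` is the set $\mathcal{N}$ of classes found in its neighborhood; the function name and the example numbers are hypothetical.

```python
import math


def relaxed_boundary_loss(probs, neighbor_classes):
    """-log of the total probability assigned to any class present around the pixel."""
    p_union = sum(probs[c] for c in set(neighbor_classes))
    return -math.log(max(p_union, 1e-12))  # clamp to avoid log(0)


# One boundary pixel between class 0 and class 1 (assumed softmax output).
probs = [0.5, 0.4, 0.1]
relaxed = relaxed_boundary_loss(probs, [0, 1])  # either neighboring class is acceptable
strict = relaxed_boundary_loss(probs, [0])      # reduces to ordinary cross-entropy
```

When $\mathcal{N}$ contains a single class the loss is the usual cross-entropy, so interior pixels are unaffected and only boundary pixels are relaxed.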
58 | 
59 | 


--------------------------------------------------------------------------------
/paper_reports/VideoSeg.md:
--------------------------------------------------------------------------------
  1 | ## Video Segmentation Overview
  2 | #### Basic
  3 | 
  4 | roadmap:
  5 | 
  6 | - interleave box tracking with box-driven segmentation
  7 | - propagate the first frame segmentation via graph labeling
  8 | 
  9 | 
 10 | 
 11 | Lucid Data Dreaming augmentations, temporal component
 12 | 
 13 | 
 14 | 
 15 | ### Approaches
 16 | 
 17 | #### Semi-supervised
 18 | 
 19 | ###### Matching-based
 20 | 
 21 | - OSVOS
 22 | - OnAVOS (Online adaptation of convolutional neural networks for video object segmentation, BMVC 2017)
 23 |   - updating the network online with additional high-confidence predictions
 24 | - OSVOS-S (Video object segmentation without temporal information, PAMI 2018)
 25 |      - semantic information from an instance segmentation network
 26 |      - using the instance segments of the different objects in the scene as prior knowledge and blend them with the segmentation output
 27 | 
 28 | - **Lucid**
 29 | 
 30 |    - FAVOS, PML, videomatch
 31 | 
 32 |      
 33 | 
 34 | ###### Propagation-based
 35 | 
 36 | - MaskTrack
 37 | - LucidTracker (Lucid Data Dreaming for Video Object Segmentation, IJCV)
 38 | - CRN (Motion guided cascaded refinement network for video object segmentation, CVPR2018)
 39 |   - applying active contour on optical flow to find motion cues
 40 | - **CINM**: (Video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF, CVPR2018)
 41 |   - long-term temporal dependency
 42 | - MoNet:  (MoNet: Deep motion exploitation for video object segmentation, CVPR2018)
 43 |   - exploits optical flow motion cues by feature alignment and a distance transform layer
 44 |   - combines temporal information from nearby frames to track the target
 45 | - LSE: (Video object segmentation by learning location-sensitive embeddings, ECCV2018)
 46 |   - Location-sensitive embeddings used to refine an initial foreground prediction
 47 |   - combines temporal information from nearby frames to track the target
 48 | 
 49 | 
 50 | 
 51 | - OSMN, RGMP, FEELVOS, MHP-VOS, RVOS
 52 |   - meta-learning , Conditional Batch Normalization (CBN) to gather spatiotemporal features
 53 |   - applied instance detection
 54 | 
 55 | - STCNN (Spatiotemporal CNN for Video Object Segmentation, CVPR2019)
 56 |   - the temporal coherence branch pretrained in an adversarial fashion from unlabeled video data
 57 | 
 58 | 
 59 | 
 60 | ###### Detection-based 
 61 | 
 62 | - MHP-VOS: (MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation, CVPR2019)
 63 |   - handles cases where objects are occluded or missing
 64 | 
 65 | 
 66 | 
 67 | ###### Fast (without fine-tuning)
 68 | 
 69 | - FAVOS (Fast and accurate online video object segmentation via tracking parts, CVPR2018)
 70 | - PML (Blazingly fast video object segmentation with pixel-wise metric learning, CVPR2018)
 71 | - Videomatch (Videomatch: Matching based video object segmentation, CVPR2018)
 72 | 
 73 | 
 74 | 
 75 | - OSMN (Efficient video object segmentation via network modulation, CVPR2018)
 76 |   - meta-learning 
 77 | - RGMP (Fast video object segmentation by reference-guided mask propagation, CVPR2018)
 78 | - FEELVOS (Fast End-to-End Embedding Learning for Video Object Segmentation, CVPR2019)
 79 | 
 80 | 
 81 | 
 82 | #### Unsupervised
 83 | 
 84 | - RVOS (RVOS: End-to-End Recurrent Network for Video Object Segmentation, CVPR2019)
 85 | - IET (Instance Embedding Transfer to Unsupervised Video Object Segmentation, CVPR2018)
 86 |   - adapt the instance networks trained on static images
 87 |   - incorporate the embeddings with objectness and optical flow features
 88 | - LMP (Learning motion patterns in videos, CVPR2017)
 89 |   - takes optical flow as an input to separate moving and non-moving regions
 90 |   - combines the results with objectness cues from SharpMask [35] to generate the moving object segmentation 
 91 | - LVO (Learning video object segmentation with visual memory, ICCV2017)
 92 |   - two-stream network, using RGB appearance features and optical flow motion features
 93 | - **FSEG** (FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos, CVPR2017)
 94 |   - two-stream network trained with mined supplemental data
 95 | 
 96 | 
 97 | 
 98 | ##### Early works
 99 | 
100 | Bilateral space video segmentation, 2016
101 | 
102 | Video segmentation via object flow, 2016
103 | 
104 | Efficient video segmentation using parametric graph partitioning, 2015
105 | 
106 | Streaming hierarchical video segmentation, 2012
107 | 
108 | 
109 | 
110 | 
111 | 
112 | ### Details
113 | 
114 | #### OSVOS (One Shot Video Object Segmentation)
115 | 
116 | ![osvos](../paper_reports/images/osvos.png)
117 | 
118 | 1. Take a network (say, VGG-16) pre-trained for classification, for example on ImageNet.
119 | 2. Convert it to a fully convolutional network, à la [FCN](https://arxiv.org/abs/1605.06211), thus preserving spatial information:
120 |    \- Remove the FC layers in the end.
121 |    \- Insert a new loss: pixel-wise sigmoid balanced cross entropy (previously used by [HED](https://arxiv.org/abs/1504.06375)). Now each pixel is separately classified into foreground or background.
122 | 3. Train the new fully convolutional network on the DAVIS-2016 training set.
123 | 4. **One-shot training:** At inference time, given a new input video for segmentation and a ground-truth annotation for the first frame (remember, this is a semi-supervised problem), create a new model, initialized with the weights trained in step 3 and fine-tuned on the first frame.
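
The loss mentioned in step 2 can be sketched as follows. This is the class-balanced binary cross-entropy in the form popularized by HED, written for a flat list of per-pixel foreground probabilities; whether OSVOS sums or averages the terms is not stated in these notes, so the summation is an assumption.

```python
import math


def balanced_bce(probs, labels, eps=1e-12):
    """Class-balanced binary cross-entropy over pixels (labels are 0 or 1)."""
    n = len(labels)
    n_pos = sum(labels)
    beta = (n - n_pos) / n  # share of background pixels, used to weight the foreground term
    loss = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            loss -= beta * math.log(max(p, eps))
        else:
            loss -= (1.0 - beta) * math.log(max(1.0 - p, eps))
    return loss


# One foreground pixel among four (assumed data), with an uninformative prediction.
labels = [1, 0, 0, 0]
probs = [0.5, 0.5, 0.5, 0.5]
loss = balanced_bce(probs, labels)
```

Here the single foreground pixel gets weight 0.75 and each of the three background pixels gets 0.25, so both classes contribute equally in total even though foreground is rare.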
124 | 
125 | #### MaskTrack (Learning Video Object Segmentation from Static Images)
126 | 
127 | ![masktrack](../paper_reports/images/masktrack.png)
128 | 
129 | ###### offline training
130 | 
131 | conditional mask prediction
132 | 
133 | hypothesis: the mask changes smoothly between two nearby frames
134 | 
135 | train: image dataset, use augmentation (deformation and affine transformation on mask) to simulate last frame prediction
136 | 
137 | test: RGB+last frame mask estimation -> current frame mask estimation
138 | 
139 | ###### online training
140 | 
141 | fine-tune on the test video, generating multiple training samples by augmentation (deformation and affine transformation on the mask)
142 | 
143 | 
144 | 
145 | #### CRN
146 | 
147 |  Motion-guided cascaded refinement network for video object segmentation 


--------------------------------------------------------------------------------
/summary.md:
--------------------------------------------------------------------------------
  1 | ## Summary
  2 | 
  3 | #### Video
  4 | 
  5 | FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation, Liang-Chieh Chen, CVPR2019
  6 | 
  7 | **SCOPS: Self-Supervised Co-Part Segmentation**. CVPR2019
  8 | 
  9 | MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation, (semi-supervised,) CVPR2019
 10 | 
 11 | RVOS: End-to-End Recurrent Network for Video Object Segmentation,(zero shot)
 12 | 
 13 | **Spatiotemporal CNN for Video Object Segmentation**, CVPR2019
 14 | 
 15 | - [x] Fast Online Object Tracking and Segmentation: A Unifying Approach, SiamMask, <https://mp.weixin.qq.com/s/tn3DBGQ-bfj8UCuupK-vHg>
 16 | 
 17 | **Improving Semantic Segmentation via Video Propagation and Label Relaxation**, CVPR2019
 18 | 
 19 | Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
 20 | 
 21 | 
 22 | 
 23 | Foreground Clustering for Joint Segmentation and Localization in Videos and Images, NIPS2018
 24 | 
 25 | Learning Video Object Segmentation from Static Images (masktrack), CVPR2017
 26 | 
 27 | **Weakly Supervised Semantic Segmentation Using Web-Crawled Videos**, CVPR2017
 28 | 
 29 | Learning semantic segmentation with weakly-annotated videos, ECCV2016
 30 | 
 31 | Fast object segmentation in unconstrained video, ICCV2013
 32 | 
 33 | 
 34 | 
 35 | ##### Motion based
 36 | 
 37 | ######  optical flow
 38 | 
 39 | MoNet: Deep Motion Exploitation for Video Object Segmentation, CVPR2018
 40 | 
 41 | ###### mask refinement
 42 | 
 43 | Efficient video object segmentation via network modulation, CVPR 2018
 44 | 
 45 | Learning video object segmentation from static images, 2017
 46 | 
 47 | Fast and Accurate online video segmentation via tracking parts, 2018  
 48 | 
 49 | ##### Detection based
 50 | 
 51 | One shot video object segmentation 
 52 | 
 53 | 
 54 | 
 55 | ##### Semantic
 56 | 
 57 | Semantic Video CNNs Through Representation Warping, ICCV 2017
 58 | 
 59 | Semantic Video Segmentation by Gated Recurrent Flow Propagation, CVPR2018
 60 | 
 61 | Efficient Uncertainty Estimation for Semantic Segmentation in Videos, ECCV 2018
 62 | 
 63 | 
 64 | 
 65 | ##### Others
 66 | 
 67 | **Towards segmenting anything that moves**,
 68 | 
 69 | **SCOPS: Self-Supervised Co-Part Segmentation**
 70 | 
 71 | Video Object Segmentation and Tracking: A Survey, LinGuosheng
 72 | 
 73 | **Jifeng Dai**
 74 | 
 75 | [Constructing cycle-consistency from different perspectives to reduce video annotation cost](https://mp.weixin.qq.com/s?__biz=MzU4MjQ3MDkwNA==&mid=2247489650&idx=1&sn=9bf3faf9e3f701c691c6d7c0230c812c&pass_ticket=kKH6zQhjNNZcUufO56qeszGgG9f0k9DjYmd9pbbUc4IN3KNpnJi%2Fle2KYoKpjvay)
 76 | 
 77 | - [Learning Correspondence from the Cycle-consistency of Time](paper_reports/Cct), cycle between different frame patch
 78 | - [Temporal Cycle-Consistency Learning](paper_reports/Tcc), cycle between different video frame
 79 | - [Time-Contrastive Networks](paper_reports/Tcn), triplet between frames among same video
 80 | 
 81 | Depth from videos in the wild: Unsupervised Monocular Depth Learning from Unknown Cameras
 82 | 
 83 | 
 84 | 
 85 | #### Domain Adaptation
 86 | 
 87 | Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR2018
 88 | 
 89 | Knowledge Adaptation for Efficient Semantic Segmentation, CVPR2019
 90 | 
 91 | Structure Knowledge Distillation for Semantic Segmentation, CVPR2019
 92 | 
 93 | Self-ensembling for visual domain adaptation, 
 94 | 
 95 | Deep semi-supervised segmentation with weight-averaged consistency targets, MICCAI2018
 96 | 
 97 | Semi-supervised Skin Lesion Segmentation via Transformation Consistent Self-ensembling Model, BMVC2018
 98 | 
 99 | Data augmentation using learned transformations for one-shot medical image segmentation, CVPR2019
100 | 
101 | 
102 | 
103 | #### Self-Supervised
104 | 
105 | Revisiting Self-Supervised Visual Representation Learning, CVPR2019, reviews 4 common image self-supervised methods
106 | 
107 | SCOPS: Self-Supervised Co-Part Segmentation,  CVPR2019
108 | 
109 | Time-contrastive Networks: Self-Supervised Learning from Video
110 | 
111 | Temporal Cycle-Consistency Learning,  CVPR2019
112 | 
113 | Learning Correspondence from the Cycle-consistency of Time, CVPR2019
114 | 
115 | 
116 | 
117 | #### Webly
118 | 
119 | [WebSeg: Learning Semantic Segmentation from Web Searches](paper_reports/WebSeg), arxiv, edges+MCG+saliency
120 | 
121 | [Weakly Supervised Semantic Segmentation Based on Web Image Co-segmentation](paper_reports/WebcoSeg), BMVC2017, co-segmentation
122 | 
123 | [Bootstrapping the Performance of Webly Supervised Semantic Segmentation](), CVPR2018, two models
124 | 
125 | #### Few Shot
126 | 
127 | [CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning](paper_reports/CA_fewshot), CVPR2019 
128 | 
129 | **Few-Shot Semantic Segmentation with Prototype Learning**, BMVC2018
130 | 
131 | Data augmentation using learned transformations for one-shot medical image segmentation, CVPR2019, spatial transform and appearance transform
132 | 
133 | 
134 | 
135 | #### Semi-Supervised
136 | 
137 | Weakly- and Semi-Supervised Panoptic Segmentation, ECCV2018
138 | 
139 | **Adversarial Learning for Semi-Supervised Semantic Segmentation**, BMVC2018, Ming-Hsuan Yang
140 | 
141 | Transferable Semi-supervised Semantic Segmentation, AAAI 2018
142 | 
143 | **Adversarial Dropout for Supervised and Semi-Supervised Learning**, AAAI 2018 
144 | 
145 | PIXEL LEVEL DATA AUGMENTATION FOR SEMANTIC IMAGE SEGMENTATION USING GENERATIVE ADVERSARIAL NETWORKS: Interesting
146 | 
147 | - balance the distribution by generating images with a GAN (manipulating the GT mask)
148 | 
149 | 
150 | 
151 | #### Cosegmentation
152 | 
153 | 
154 | 
155 | #### Weakly-Semantic
156 | 
157 | Convolutional Simplex Projection Network for Weakly Supervised Semantic Segmentation, BMVC2018
158 | 
159 | **Cyclic Guidance for Weakly supervised Joint Detection and Segmentation**
160 | 
161 | 
162 | 
163 | #### Weakly-Localization
164 | 
165 | what about using this for fine-grained tasks?
166 | 
167 | #### Weakly-Instance
168 | 
169 | 
170 | 
171 | 
172 | #### Overhead Imagery
173 | Self-supervision-for-segmenting-overhead-imagery
174 | 
175 | 
176 | 
177 | #### Fast Segmentation
178 | 
179 | Improving Fast Segmentation With Teacher-student Learning, BMVC2018
180 | 
181 | 
182 | 
183 | ## Useful Knowledge
184 | 
185 | ### Uncertainty
186 | 
187 | [doc](./paper_reports/uncertainty)
188 | 
189 | ### Augmentation
190 | 
191 | Lucid
192 | 
193 | #### Appearance model
194 | 
195 | 
196 | 
197 | #### Transformation model
198 | 
199 | deformable
200 | 
201 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Weakly-Segmentation
  2 | List of useful codes and papers for weakly supervised Semantic/Instance/Panoptic/Few Shot Segmentation
  3 | 
  4 | - [Weakly-Segmentation](#weakly-segmentation)
  5 |   * [Top Work](#top-work)
  6 |       - [By Dataset](#by-dataset)
  7 |           + [PASCAL VOC2012](#pascal-voc2012)
  8 |       - [By Years](#by-years)
  9 |           + [ICCV2019](#iccv2019)
 10 |   * [Resources](#resources)
 11 |       - [Tutorial](#tutorial)
 12 |   * [Implementation](#implementation)
 13 | - [Related Tasks](#related-tasks)
 14 |   * [Few-shot segmentation](#few-shot-segmentation)
 15 |   * [Weakly-supervised Instance Segmentation](#weakly-supervised-instance-segmentation)
 16 |   * [Weakly-supervised Panoptic Segmentation](#weakly-supervised-panoptic-segmentation)
 17 | - [Reading List](#reading-list)
 18 |   * [Under Review](#under-review)
 19 |   * [Published](#published)
 20 |       - [context](#context)
 21 |       - [graph](#graph)
 22 |       - [bbox-level](#bbox-level)
 23 |       - [webly](#webly)
 24 |       - [Saliency](#saliency)
 25 |       - [localization](#localization)
 26 |       - [spp](#spp)
 27 |       - [affinity](#affinity)
 28 |       - [region](#region)
 29 |       - [network](#network)
 30 |       - [regularizer](#regularizer)
 31 |       - [evaluation measure](#evaluation-measure)
 32 |       - [architecture](#architecture)
 33 |       - [generative adversarial](#generative-adversarial)
 34 |       - [scene understanding](#scene-understanding)
 35 |       - [other useful](#other-useful)
 36 |       - [application](#application)
 37 |   * [Others](#others)
 38 |       - [priors](#priors)
 39 |       - [diffusion](#diffusion)
 40 |       - [analysis](#analysis)
 41 |       - [post processing](#post-processing)
 42 |       - [common methods](#common-methods)
 43 | 
 44 |             
 45 | ## Top Work
 46 | #### By Dataset
 47 | ###### PASCAL VOC2012
 48 | 
 49 | | method | val | test       |  notes |
 50 | | ------------ | ---------- | ---------- | ---------- |
 51 | | [DSRG](https://github.com/speedinghzl/DSRG)<sub>CVPR2018</sub> | 61.4 | 63.2 | deep seeded region growing, resnet-lfov\|vgg-aspp  |
 52 | | [psa](https://github.com/jiwoon-ahn/psa)<sub>CVPR2018</sub> | 61.7 | 63.7 | pixel affinity network, resnet38 |
 53 | | [MDC](https://arxiv.org/pdf/1805.04574.pdf)<sub>CVPR2018</sub> | 60.4 | 60.8 | multi-dilated convolution, vgg-lfov |
 54 | | [MCOF](http://3dimage.ee.tsinghua.edu.cn/wx/mcof)<sub>CVPR2018</sub> | 60.3 | 61.2 | iterative, RegionNet(sppx), resnet-lfov |
 55 | | [GAIN](https://arxiv.org/abs/1802.10171.pdf)<sub>CVPR2018</sub> |  55.3 |  56.8 | |
 56 | | [DCSP](https://github.com/arslan-chaudhry/dcsp_segmentation)<sub>BMVC2017</sub> | **58.6** | **59.2** | adversarial for saliency, and generate cues by cam+saliency(harmonic mean)|
 57 | | [GuidedSeg](https://github.com/coallaoh/GuidedLabelling)<sub>CVPR2017</sub> | 55.7 | 56.7 | saliency, TBD|
 58 | | [BDWSS](https://github.com/ascust/BDWSS)<sub>CVPR2018</sub> | 63.0 | 63.9 | webly, filter+enhance|
 59 | | [WebSeg](https://arxiv.org/pdf/1803.09859.pdf)<sub>arxiv</sub> | 63.1 | 63.3 | webly(pure), Noise filter module|
 60 | | [SeeNet](https://arxiv.org/abs/1810.09821)<sub>NIPS2018</sub> | 63.1 | 62.8 | based on DCSP |
 61 | | [Graph](http://mftp.mmcheng.net/Papers/18ECCVGraphPartition.pdf)<sub>ECCV2018</sub> | 63.6 | 64.5 | graph partition|
 62 | | [Graph](http://mftp.mmcheng.net/Papers/18ECCVGraphPartition.pdf)<sub>ECCV2018</sub> | 64.5 | 65.6 | use simple ImageNet dataset additionally|
 63 | | [CIAN](https://arxiv.org/abs/1811.10842)<sub>CVPR2019</sub> | 64.1 | 64.7 | cross image affinity network|
 64 | | [FickleNet](https://arxiv.org/abs/1902.10421)<sub>CVPR2019</sub> | **64.9** | **65.3** | use dropout (a generalization of dilated convolution)|
 65 | 
 66 | #### By Years
 67 | ###### ICCV2019
 68 | Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation   
 69 | Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation
 70 | ###### CVPR2019
 71 | FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference  
 72 | 
 73 | 
 74 | ## Resources
 75 | - see [this](https://github.com/JackieZhangdx/WeakSupervisedSegmentationList) for more weakly supervised lists and resources.
 76 | - see [this](https://github.com/wutianyiRosun/Segmentation.X) for more semantic/instance/panoptic/video segmentation lists and resources.
 77 | - see [this](https://github.com/mrgloom/awesome-semantic-segmentation) for more implementations
 78 | - a good architecture summary paper: [Learning a Discriminative Feature Network for Semantic Segmentation](https://arxiv.org/pdf/1804.09337.pdf)
 79 | #### Tutorial
 80 | - Unsupervised Visual Learning Tutorial. *CVPR 2018* [[part 1]](https://www.youtube.com/watch?v=gSqmUOAMwcc) [[part 2]](https://www.youtube.com/watch?v=BijK_US6A0w)
 81 | - Weakly Supervised Learning for Computer Vision. *CVPR 2018* [[web]](https://hbilen.github.io/wsl-cvpr18.github.io/) [[part 1]](https://www.youtube.com/watch?v=bXfZFmE8cjo) [[part 2]](https://www.youtube.com/watch?v=FetNp6f19IM)
 82 | 
 83 | ## Implementation
 84 | 
 85 | [pytorch-segmentation-detection](https://github.com/warmspringwinds/pytorch-segmentation-detection) a library for dense inference and training of Convolutional Neural Networks, 68.0%
 86 | 
 87 | [drn](https://github.com/fyu/drn) Dilated Residual Networks, 75.6%, may be the best available semantic segmentation in PyTorch?
 88 | 
 89 | [Detectron.pytorch](https://github.com/roytseng-tw/Detectron.pytorch) A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available. COCO only for now
 90 | 
 91 | [AdvSemiSeg](https://github.com/hfslyc/AdvSemiSeg) Adversarial Learning for Semi-supervised Semantic Segmentation.  heavily borrowed from a **pytorch DeepLab** implementation ([Link](https://github.com/speedinghzl/Pytorch-Deeplab))
 92 | 
 93 | [PyTorch-ENet](https://github.com/davidtvs/PyTorch-ENet) PyTorch implementation of ENet
 94 | 
 95 | [tensorflow-deeplab-resnet](https://github.com/DrSleep/tensorflow-deeplab-resnet) Tensorflow implementation of deeplab-resnet(deeplabv2, resnet101-based): complete and detailed
 96 | 
 97 | [tensorflow-deeplab-lfov](https://github.com/DrSleep/tensorflow-deeplab-lfov) Tensorflow implementation of deeplab-LargeFOV(deeplabv2, vgg16-based): complete and detailed
 98 | 
 99 | [resnet38](https://github.com/itijyou/ademxapp)  Wider or Deeper: Revisiting the ResNet Model for Visual Recognition: implemented using MXNET
100 | 
101 | [pytorch_deeplab_large_fov](https://github.com/BardOfCodes/pytorch_deeplab_large_fov): deeplab v1
102 | 
103 | [pytorch-deeplab-resnet](https://github.com/isht7/pytorch-deeplab-resnet) DeepLab resnet v2 model in pytorch
104 | 
105 | [DeepLab-ResNet-Pytorch](https://github.com/speedinghzl/Pytorch-Deeplab) Deeplab v3 model in pytorch
106 | 
107 | [BDWSS](https://github.com/ascust/BDWSS) Bootstrapping the Performance of Webly Supervised Semantic Segmentation
108 | 
109 | [psa](https://github.com/jiwoon-ahn/psa) Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation
110 | 
111 | [DSRG](https://github.com/speedinghzl/DSRG): Caffe, CAM and DRFI provided 
112 | 
113 | SEC
114 |   - [original](https://github.com/kolesman/SEC): Caffe  
115 |   - [BDWSS](https://github.com/ascust/BDWSS): MXNET
116 |   - [SEC-tensorflow](https://github.com/xtudbxk/SEC-tensorflow): tensorflow  
117 | 
118 | # Related Tasks
119 | ## Few-shot segmentation
120 | - [ ] One-shot learning for semantic segmentation, BMVC2017
121 | - [ ] Conditional networks for few-shot semantic segmentation, ICLR2018 Workshop
122 | - [ ] Few-Shot Segmentation Propagation with Guided Networks, preprint
123 | - [ ] Few-Shot Semantic Segmentation with Prototype Learning, BMVC2018
124 | - [ ] Attention-based Multi-Context Guiding for Few-Shot Semantic Segmentation, AAAI2019
125 | - [ ] CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning, CVPR2019
126 | - [ ] One-Shot Segmentation in Clutter, ICML 2018
127 | 
128 | ## Weakly-supervised Instance Segmentation
129 | - [x] Weakly Supervised Instance Segmentation using Class Peak Response, CVPR2018
130 | - [ ] Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR2019
131 | - [ ] Object Counting and Instance Segmentation with Image-level Supervision, CVPR2019
132 | - [x] Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation, CVPR2019
133 | - [x] Where are the Masks: Instance Segmentation with Image-level Supervision, BMVC2019
134 | - [ ] Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation, ICCV2019
135 | 
136 | ## Weakly-supervised Panoptic Segmentation
137 | - [ ] Weakly- and Semi-Supervised Panoptic Segmentation, ECCV2018
138 | 
139 | # Reading List
140 | 
141 | ## Under Review
142 | - [ ] [Gated CRF Loss for Weakly Supervised Semantic Image Segmentation](https://arxiv.org/abs/1906.04651)
143 | - [ ] [Closed-Loop Adaptation for Weakly-Supervised Semantic Segmentation](https://arxiv.org/abs/1905.12190)
144 | - [ ] [Harvesting Information from Captions for Weakly Supervised Semantic Segmentation](https://arxiv.org/abs/1905.06784)
145 | - [ ] [Consistency regularization and CutMix for semi-supervised semantic segmentation](https://arxiv.org/abs/1906.01916)
146 | - [ ] [Zero-shot Semantic Segmentation](https://arxiv.org/abs/1906.00817)
147 | - [x] [Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation](https://arxiv.org/pdf/1909.03714.pdf), proposes a scale-equivariant regularization.
148 | 
149 | ## Published
150 | #### context 
151 | - [x] Context Encoding for Semantic Segmentation: CVPR2018. use TEN
152 | - [ ] The Role of Context for Object Detection and Semantic Segmentation in the Wild: CVPR2014
153 | - [ ] Objects as Context for Detecting Their Semantic Parts: CVPR2018
154 | - [ ] Exploring context with deep structured models for semantic segmentation: TPAMI2017
155 | - [ ] dilated convolution
156 | - [ ] Deep TEN: Texture encoding network !!: CVPR2017. A global context vector, pooled from all spatial positions, can be concatenated to local features
157 | - [ ]  Refinenet: Multi-path refinement networks for high-resolution semantic segmentation: CVPR2017. local features across different scales can be fused to encode global context
158 | - [x] Non-local neural networks: CVPR2018. a densely connected graph with pairwise edges between all pixels
159 | 
160 | #### graph
161 | - [ ] Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation: ECCV2018
162 | 
163 | #### bbox-level
164 | Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation, CVPR2019
165 | 
166 | #### webly
167 | - [x] Weakly Supervised Semantic Segmentation Based on Web Image Cosegmentation: BMVC2017, training model using masks of web images which are generated by cosegmentation 
168 | - [ ] Webly Supervised Semantic Segmentation: CVPR2017
169 | - [x] Weakly Supervised Semantic Segmentation using Web-Crawled Videos: CVPR2017, learns a class-agnostic decoder(attention map -> binary mask), pseudo masks are generated from video frames by solving a graph-based optimization problem. 
170 | - [x] Bootstrapping the Performance of Webly Supervised Semantic Segmentation: target + web domain, the target model filters web images, refines masks by combining target and web masks.
171 | - [ ] Learning from Weak and Noisy Labels for Semantic Segmentation: TPAMI2017
172 | - [x] WebSeg: Learning Semantic Segmentation from Web Searches: arxiv, learns directly from keyword-retrieved web images, using saliency and regions (MCG with edges)
173 | - [x] STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation: TPAMI 2017, three DCNN models (Initial, Enhanced, Powerful). initial mask (generated from saliency and labels on simple images) -> Initial model -> enhanced mask (generated on simple images) -> Enhanced model -> powerful mask (generated on complex images) -> Powerful model
174 |   - saliency can not handle complex images, so BMVC2017 uses coseg instead
175 | 
176 | #### Saliency
177 | - [x] Exploiting Saliency for Object Segmentation from Image Level Labels: CVPR2017
178 | - [x] Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation: BMVC2017
179 |   - combine saliency(off-shelf) and CAM to get cues, use harmonic mean function
180 |   - adapt CAM from head of Segmentation Network
181 |   - use erasing to get multiple objects' saliency
182 | 
183 | #### localization
184 | - [x] Adversarial Complementary Learning for Weakly Supervised Object Localization, CVPR2018. two branches, remove high activations from feature map. [code](https://github.com/xiaomengyc/ACoL)
185 | - [x] [Tell me where to look: Guided Attention Inference Network](https://arxiv.org/pdf/1802.10171.pdf), CVPR2018. origin image soft erasing(CAM after sigmoid as attention) -> end2end training, force erased images have zero activation
186 | - [x] Self-Erasing Network for Integral Object Attention, NIPS2018: prohibit attentions from spreading to unexpected background regions.
187 |   - cam -> ternary mask(attention, background, potential)
188 |   - self erasing only in attention + potential region(**sign flip in background region** instead of simply setting to 0)
189 |   - self-produced pseudo label for background region(difference to SPG: 1. pseudo label for background and attention 2. supervise low layer)
190 | - [x] Self-produced Guidance for Weakly-supervised Object localization, ECCV2018:
191 |   - self supervised use top down framework, for single label classification prob. **add pixel-wise supervision when only have image level label**  
192 |   - B1, B2 sharing
193 |   - bottom guide top inversely(B1+B2 -> C)
194 | 
195 | #### spp
196 | - [ ] Superpixel convolutional networks using bilateral inceptions
197 | - [x] Learning Superpixels with Segmentation-Aware Affinity Loss: good intro for superpixel algs.
198 | 
199 | #### affinity
200 | - [x] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation: image-level, semantic affinity, learn a **network** to predict affinity
201 | - [x] Adaptive Affinity Field for Semantic Segmentation: ECCV2018, semantic affinity. add a pairwise term in seg **loss**(similarity metric: KL divergence), use an adversarial method to determine optimal neighborhood size
202 | 
203 | #### region
204 | - [ ] Region-Based Convolutional Networks for Accurate Object Detection and Segmentation
205 | - [ ] Simultaneous Detection and Segmentation, 2014
206 | - [ ] Feedforward semantic segmentation with zoom-out features: 2015
207 | 
208 | #### network
209 | - [ ] Learned Shape-Tailored Descriptors for Segmentation
210 | - [ ] Normalized Cut Loss for Weakly-Supervised CNN Segmentation
211 | - [ ] Fully Convolutional Adaptation Networks for Semantic Segmentation
212 | - [ ] Learning to Adapt Structured Output Space for Semantic Segmentation
213 | - [x] Semantic Segmentation with Reverse Attention: BMVC2017, addresses equal responses across multiple classes (confusion in boundary regions). adds a reverse branch that predicts the probability that a pixel doesn't belong to the corresponding class, and uses attention to combine the original and reverse branches
214 | - [x] Deep Clustering for Unsupervised Learning of Visual Features, ECCV2018. use k-means cluster assignments as supervision to update weights of network
215 | - [x] DEL: Deep Embedding Learning for Efficient Image Segmentation, IJCAI 2018. use spp embedding as init probs to do image segmentation
216 | - [x] Learning a Discriminative Feature Network for Semantic Segmentation, CVPR2018, Smooth Network: multi-scale + global context (FPN with channel attention), Border Network: focal loss for boundary. [code?](https://github.com/YuhuiMa/DFN-tensorflow)
217 | - [ ] Convolutional Simplex Projection Network for Weakly Supervised Semantic Segmentation: BMVC 2018
218 | - [ ] Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation: CVPR2019
219 | 
220 | #### regularizer
221 | - [ ] [Normalized Cut Loss for Weakly-Supervised CNN Segmentation](https://arxiv.org/pdf/1804.01346.pdf)
222 | - [ ] [Regularized Losses for Weakly-supervised CNN Segmentation](https://github.com/meng-tang/rloss)
223 | 
224 | #### evaluation measure
225 | - [ ] [Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation](https://www.cs.umanitoba.ca/~ywang/papers/isvc16.pdf)
226 | - [ ] [The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks](https://arxiv.org/pdf/1705.08790.pdf)
227 | - [ ] [What is a good evaluation measure for semantic segmentation?](http://www.bmva.org/bmvc/2013/Papers/paper0032/paper0032.pdf)
228 | 
229 | #### architecture
230 | - [ ] The Devil is in the Decoders, BMVC2017
231 | - [x] Dilated Residual Networks, CVPR2017. Dilated structure design for classification and localization.
232 | - [x] Understanding Convolution for Semantic Segmentation, WACV2018. hybrid dilated convolution(2-2-2 -> 1-2-3)
233 | - [x] Smoothed Dilated Convolutions for Improved Dense Prediction, KDD2018. separable and share conv(for smoothing) + dilated conv
234 | - [x] Deeplab v1, v2, v3, v3+
235 | - [ ] Learning Fully Dense Neural Networks for Image Semantic Segmentation, AAAI2019 
236 | 
237 | #### generative adversarial 
238 | - [ ] **Deep dual learning for semantic image segmentation**:CVPR2017, image translation
239 | - [x] Semantic Segmentation using Adversarial Networks, NIPS2016 workshop
240 |   - add gan loss branch, Segnet as generator, D: GT mask or predicted mask
241 | - [x] Adversarial Learning for Semi-Supervised Semantic Segmentation: BMVC2018
242 |   - semi supervised: SegNet as G, FCN-type D(discriminate each location), use output of D as pseudo label for unlabeled data
243 | - [x] Semi and weakly Supervised Semantic Segmentation Using Generative Adversarial Network: ICCV2017, use SegNet as D, treat fake as new class
244 |   - weakly, use conditionalGan, pixel-level, image-level, generated data are included in loss. the performance gain shrinks as the amount of fully labeled data increases
245 | - [ ] generative adversarial learning towards Fast weakly supervised detection: CVPR2018
246 | - [x] Adaptive Affinity Field for Semantic Segmentation: ECCV2018, semantic affinity. add a pairwise term in seg **loss**(similarity metric: KL divergence), use an adversarial method to determine optimal neighborhood size
247 | 
248 | #### scene understanding
249 | - [ ] ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans
250 | - [ ] SeGAN: Segmenting and Generating the Invisible
251 | 
252 | #### other useful
253 | - [ ] Learning to Segment Every Thing: semi-supervised, weight transfer function (from bbox parameters to mask parameters)
254 | - [ ] Simple Does It: Weakly Supervised Instance and Semantic Segmentation: bbox-level, many methods, using graphcut, HED, MCG
255 | - [ ] Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning: tricky, curriculum learning: image level -> instance level -> pixel level
256 | - [ ] Combining Bottom-Up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation: CVPR2017
257 | - [x] Improving Weakly-Supervised Object Localization By Micro-Annotation: BMVC2016, object classes always co-occur with same background elements(boat, train). propose a new annotation method. add human annotations to improve localization results of CAM, annotating based on clusters of dense features. each class uses a spectral clustering.(CAM has problem)
258 | - [x] Co-attention CNNs for Unsupervised Object Co-segmentation: IJCAI 2018
259 | - [ ] Coarse-to-fine Image Co-segmentation with Intra and Inter Rank Constraints, IJCAI2018
260 | - [ ] Annotation-Free and One-Shot Learning for Instance Segmentation of Homogeneous Object Clusters, IJCAI2018
261 | - [x] Image-level to Pixel-wise Labeling: From Theory to Practice: IJCAI2018, fully, analyzes the effect of image labels on seg results. adds a generator (recovers the original image). image labels are binarized with a threshold smaller than 0.5 (e.g., 0.25)
262 | 
263 | #### application
264 | - [x] SeGAN: Segmenting and Generating the Invisible: CVPR2018, generate occluded parts
265 | - [x] Learning Hierarchical Semantic Image Manipulation through Structured Representations: NIPS2018, manipulate image on object-level by modify bbox
266 | 
267 | 
268 | ## Others
269 | #### priors
270 | - Superpixels: An Evaluation of the State-of-the-Art [link](https://github.com/davidstutz/superpixel-benchmark)
271 | - Learning Superpixels with Segmentation-Aware Affinity Loss [link](http://jankautz.com/publications/LearningSuperpixels_CVPR2018.pdf)
272 | - Superpixel based Continuous Conditional Random Field Neural Network for Semantic Segmentation [link](https://www.sciencedirect.com/science/article/pii/S0925231219300281)
273 | 
274 | #### diffusion
275 | Learning random-walk label propagation for weakly-supervised semantic segmentation: scribble
276 | 
277 | Convolutional Random Walk Networks for Semantic Image Segmentation: fully, affinity branch(low level)
278 | 
279 | Soft Proposal Networks for Weakly Supervised Object Localization: attention, semantic affinity
280 | 
281 | Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation: image-level, semantic affinity
282 | 
283 | #### analysis
284 | - Image-level to Pixel-wise Labeling: From Theory to Practice: IJCAI 2018, analyzes the effectiveness of class-level labels for segmentation(GT, predicted)
285 | - Attention-based Deep Multiple Instance Learning: ICML 2018. CAM from an MIL perspective
286 | 
287 | #### post processing
288 | listed in : [Co-attention CNNs for Unsupervised Object Co-segmentation](https://www.csie.ntu.edu.tw/~cyy/publications/papers/Hsu2018CAC.pdf)
289 | - Otsu’s method
290 | - GrabCut
291 | - CRF    
292 | 
293 | #### common methods
294 | - refine segmentation results using image-level labels
295 | - multi-label classification branch(BDWSS)
296 | - generative branch(to original image) 
297 | - crf
298 | 
299 | 
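Of the post-processing options listed above, Otsu's method is the simplest to sketch: it picks the threshold that maximizes the between-class variance of a score map and binarizes the map with it. The snippet below is a minimal NumPy illustration, not the implementation used by any paper in this list. The function name `otsu_threshold`, the 256-bin histogram, and the toy `heat` map are all assumptions made for the example.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(np.float64)
    centers = (edges[:-1] + edges[1:]) / 2.0

    w0 = np.cumsum(hist)             # number of samples in bins 0..k
    w1 = w0[-1] - w0                 # number of samples in bins k+1..end
    s0 = np.cumsum(hist * centers)   # running sum of intensities
    total = s0[-1]

    # Only splits that leave both classes non-empty are valid candidates.
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    m0 = s0[valid] / w0[valid]             # mean of the lower class
    m1 = (total - s0[valid]) / w1[valid]   # mean of the upper class
    between[valid] = w0[valid] * w1[valid] * (m0 - m1) ** 2

    # Use the upper edge of the best bin, so `values > t` is the upper class.
    return edges[1:][np.argmax(between)]


# Hypothetical toy score map: a 2x3 high-response block on a low background.
heat = np.full((6, 6), 0.2)
heat[2:4, 2:5] = 0.8

t = otsu_threshold(heat)
mask = heat > t   # binary foreground mask
```

In practice the input would be a CAM or co-attention map scaled to a fixed range. GrabCut or a CRF can then refine the resulting mask using image appearance, which a global threshold ignores.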


--------------------------------------------------------------------------------