├── dataset.png ├── overview.png ├── 4-Datasets.md ├── README.md ├── 3-VSS.md └── 2-VOS.md /dataset.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tfzhou/VS-Survey/HEAD/dataset.png -------------------------------------------------------------------------------- /overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tfzhou/VS-Survey/HEAD/overview.png -------------------------------------------------------------------------------- /4-Datasets.md: -------------------------------------------------------------------------------- 1 | ## 4. Datasets 2 | 3 | | Dataset | Year | Publication | #Video | Annotation | Link | 4 | | ----------------- | ---- | ----------- | ---------- | ------------------------------- | ------------------------------------------------------------ | 5 | | Youtube-Objects | 2012 | CVPR | 1,407(126) | Object-level AVOS, SVOS | [Paper](https://hal.inria.fr/hal-00695940v2/document), [Homepage](http://calvin-vision.net/datasets/youtube-objects-dataset/) | 6 | | FBMS | 2014 | PAMI | 59 | Object-level AVOS, SVOS | [Paper](https://lmb.informatik.uni-freiburg.de/Publications/2014/OB14b/pami_moseg.pdf), [Homepage](https://lmb.informatik.uni-freiburg.de/Publications/2014/OB14b/) | 7 | | DAVIS16 | 2016 | CVPR | 50 | Object-level AVOS, SVOS | [Paper](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Perazzi_A_Benchmark_Dataset_CVPR_2016_paper.pdf), [Homepage](https://davischallenge.org/davis2016/code.html) | 8 | | DAVIS17 | 2017 | --- | 150 | Instance-level AVOS, SVOS, IVOS | [Paper](https://arxiv.org/pdf/1704.00675.pdf), [Homepage](https://davischallenge.org/davis2017/code.html) | 9 | | Youtube-VOS | 2018 | --- | 4,519 | SVOS | [Paper](https://arxiv.org/pdf/1809.03327.pdf), [Homepage](https://youtube-vos.org/) | 10 | | A2D Sentence | 2018 | CVPR | 3782 | RVOS | 
[Paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Gavrilyuk_Actor_and_Action_CVPR_2018_paper.pdf), [Homepage](https://kgavrilyuk.github.io/publication/actor_action/) | 11 | | J-HMDB Sentence | 2018 | CVPR | 928 | RVOS | [Paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Gavrilyuk_Actor_and_Action_CVPR_2018_paper.pdf), [Homepage](https://kgavrilyuk.github.io/publication/actor_action/) | 12 | | DAVIS17-RVOS | 2018 | ACCV | 90 | RVOS | [Paper](https://arxiv.org/pdf/1803.08006.pdf), [Homepage](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/video-segmentation/video-object-segmentation-with-language-referring-expressions) | 13 | | Refer-Youtube-VOS | 2020 | ECCV | 3,975 | RVOS | [Paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf), [Homepage](https://github.com/skynbe/Refer-Youtube-VOS) | 14 | | CamVid | 2009 | PRL | 4 | VSS | [Paper](http://www0.cs.ucl.ac.uk/staff/G.Brostow/papers/Brostow_2009-PRL.pdf), [Homepage](http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/) | 15 | | Cityscapes | 2016 | CVPR | 5,000 | VSS | [Paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/Cordts_The_Cityscapes_Dataset_CVPR_2016_paper.pdf), [Homepage](https://www.cityscapes-dataset.com/) | 16 | | NYUDv2 | 2012 | ECCV | 518 | VSS | [Paper](https://cs.nyu.edu/~silberman/papers/indoor_seg_support.pdf), [Homepage](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html) | 17 | | VSPW | 2021 | CVPR | 3,536 | VSS | [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Miao_VSPW_A_Large-scale_Dataset_for_Video_Scene_Parsing_in_the_CVPR_2021_paper.pdf), [Homepage](https://www.vspwdataset.com/) | 18 | | Youtube-VIS | 2019 | ICCV | 3,859 | VIS | [Paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Yang_Video_Instance_Segmentation_ICCV_2019_paper.pdf), [Homepage](https://youtube-vos.org/dataset/vis/) | 19 | | KITTI MOTS | 2019 | CVPR | 21 | VIS | 
[Paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Voigtlaender_MOTS_Multi-Object_Tracking_and_Segmentation_CVPR_2019_paper.pdf), [Homepage](https://www.vision.rwth-aachen.de/page/mots) | 20 | | MOTSChallenge | 2019 | CVPR | 4 | VIS | [Paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Voigtlaender_MOTS_Multi-Object_Tracking_and_Segmentation_CVPR_2019_paper.pdf), [Homepage](https://www.vision.rwth-aachen.de/page/mots) | 21 | | BDD100K | 2020 | CVPR | 100,000 | VSS, VIS | [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Yu_BDD100K_A_Diverse_Driving_Dataset_for_Heterogeneous_Multitask_Learning_CVPR_2020_paper.pdf), [Homepage](https://bdd-data.berkeley.edu/) | 22 | | OVIS | 2021 | --- | 901 | VIS | [Paper](https://arxiv.org/pdf/2102.01558.pdf), [Homepage](http://songbai.site/ovis/) | 23 | | VIPER-VPS | 2020 | CVPR | 124 | VPS | [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Kim_Video_Panoptic_Segmentation_CVPR_2020_paper.pdf), [Homepage](https://github.com/mcahny/vps) | 24 | | Cityscapes-VPS | 2020 | CVPR | 500 | VPS | [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Kim_Video_Panoptic_Segmentation_CVPR_2020_paper.pdf), [Homepage](https://github.com/mcahny/vps) | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity) 2 | [![PR's Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com) 3 | ![visitors](https://visitor-badge.glitch.me/badge?style=flat-square&page_id=tfzhou/VS-Survey) 4 | 5 |

6 |

A Survey on Deep Learning Technique for Video Segmentation

7 | 8 |

9 | IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
10 | Tianfei Zhou 11 | , 12 | Fatih Porikli 13 | , 14 | David Crandall 15 | , 16 | Luc Van Gool 17 | , 18 | Wenguan Wang 19 | 20 |

21 | 22 |

23 | 24 | arXiv PDF 25 | 26 | 27 | Project Page 28 | 29 | 30 | TPAMI PDF 31 | 32 |

33 |

34 |
35 | 36 | This repo compiles a collection of resources on deep video segmentation, and will be continuously updated to track developments in the field. Please feel free to submit a pull request if you find any work missing. 37 | 38 | ## 1. Introduction 39 | Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing. In this survey, we comprehensively review two basic lines of research, **video object segmentation** and **video semantic segmentation**, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. In particular, we review **eight** sub-fields, as shown in the following figure: 40 | 41 |

42 | 43 |

44 | 45 | 46 | ## 2. Deep Learning-based Video Object Segmentation 47 | 48 | - [2.1 Automatic Video Object Segmentation (AVOS)](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#21-automatic-video-object-segmentation-avos) 49 | - [2.1.1 Deep Learning Module based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#211-deep-learning-module-based) 50 | - [2.1.2 Pixel Instance Embedding based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#212-pixel-instance-embedding-based) 51 | - [2.1.3 Short-term Information Encoding](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#213-short-term-information-encoding) 52 | - [2.1.4 Long-term Context Encoding](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#214-long-term-context-encoding) 53 | - [2.1.5 Un-/Weakly-supervised based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#215-un-weakly-supervised-based) 54 | - [2.1.6 Instance-level AVOS](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#216-instance-level-avos) 55 | - [2.2 Semi-automatic Video Object Segmentation (SVOS)](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#22-semi-automatic-video-object-segmentation-svos) 56 | - [2.2.1 Online Fine-tuning based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#221-online-fine-tuning-based) 57 | - [2.2.2 Propagation based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#222-propagation-based) 58 | - [2.2.3 Matching based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#223-matching-based) 59 | - [2.2.4 Box-initialization based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#224-box-initialization-based) 60 | - [2.2.5 Un-/Weakly-supervised based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#225-un-weakly-supervised-based) 61 | - [2.2.6 Other Specific Methods](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#226-other-specific-methods) 62 | - [2.3 Interactive Video Object Segmentation 
(IVOS)](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#23-interactive-video-object-segmentation-ivos) 63 | - [2.3.1 Interaction-propagation based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#231-interaction-propagation-based) 64 | - [2.3.2 Other Methods](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#232-other-methods) 65 | - [2.4 Language-guided Video Object Segmentation (LVOS)](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#24-language-guided-video-object-segmentation-lvos) 66 | - [2.4.1 Dynamic Convolution based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#241-dynamic-convolution-based) 67 | - [2.4.2 Capsule Routing based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#242-capsule-routing-based) 68 | - [2.4.3 Attention based](https://github.com/tfzhou/VS-Survey/blob/main/2-VOS.md#243-attention-based) 69 | ## 3. Deep Learning-based Video Semantic Segmentation 70 | - [3.1 (Instance-agnostic) Video Semantic Segmentation (VSS)](https://github.com/tfzhou/VS-Survey/blob/main/3-VSS.md#31-instance-agnostic-video-semantic-segmentation-vss) 71 | - [3.1.1 Efforts towards More Accurate Segmentation](https://github.com/tfzhou/VS-Survey/blob/main/3-VSS.md#311-efforts-towards-more-accurate-segmentation) 72 | - [3.1.2 Efforts towards Faster Segmentation](https://github.com/tfzhou/VS-Survey/blob/main/3-VSS.md#312-efforts-towards-faster-segmentation) 73 | - [3.1.3 Semi-/Weakly-supervised based](https://github.com/tfzhou/VS-Survey/blob/main/3-VSS.md#313-semi-weakly-supervised-based) 74 | - [3.2 Video Instance Segmentation (VIS)](https://github.com/tfzhou/VS-Survey/blob/main/3-VSS.md#32-video-instance-segmentation-vis) 75 | - [3.3 Video Panoptic Segmentation (VPS)](https://github.com/tfzhou/VS-Survey/blob/main/3-VSS.md#33-video-panoptic-segmentation-vps) 76 | 77 | ## 4. 
Datasets 78 | - [Popular Datasets in VOS and VSS](https://github.com/tfzhou/VS-Survey/blob/main/4-Datasets.md) 79 | 80 | ![](dataset.png) 81 | 82 | ## Citation 83 | 84 | If you find our survey and repository useful for your research, please consider citing our paper: 85 | ```bibtex 86 | @article{zhou2023survey, 87 | title={A Survey on Deep Learning Technique for Video Segmentation}, 88 | author={Zhou, Tianfei and Porikli, Fatih and Crandall, David J and Van Gool, Luc and Wang, Wenguan}, 89 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 90 | year={2023}, 91 | publisher={IEEE} 92 | } 93 | ``` 94 | -------------------------------------------------------------------------------- /3-VSS.md: -------------------------------------------------------------------------------- 1 | ## 3. Deep Learning-based Video Semantic Segmentation 2 | 3 | ### 3.1 (Instance-agnostic) Video Semantic Segmentation (VSS) 4 | 5 | #### 3.1.1 Efforts towards More Accurate Segmentation 6 | 7 | | Year | Publication | Paper Title | Project | 8 | | ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: | 9 | | 2016 | CVPR | [Feature space optimization for semantic video segmentation](https://ieeexplore.ieee.org/document/7780714) | -- | 10 | | 2016 | ECCV | [Joint optical flow and temporally consistent semantic segmentation](https://arxiv.org/pdf/1607.07716.pdf) | -- | 11 | | 2017 | ICCV | [Video scene parsing with predictive feature learning](https://ieeexplore.ieee.org/document/8237857) | -- | 12 | | 2017 | ICCV | [Semantic video cnns through representation warping](https://arxiv.org/pdf/1708.03088.pdf) | [Code](https://github.com/raghudeep/netwarp_public), [Project](http://segmentation.is.tuebingen.mpg.de/netwarp/) | 13 | | 2018 | CVPR | [Semantic video segmentation by gated recurrent flow propagation](https://arxiv.org/pdf/1612.08871.pdf) | [Code](https://github.com/D-Nilsson/GRFP) | 
14 | | 2018 | CVPR | [Deep spatio-temporal random fields for efficient video segmentation](https://openaccess.thecvf.com/content_cvpr_2018/papers/Chandra_Deep_Spatio-Temporal_Random_CVPR_2018_paper.pdf) | [Code](https://github.com/siddharthachandra/gcrf-v3.0) | 15 | | 2018 | ECCV | [Efficient uncertainty estimation for semantic segmentation in videos](https://arxiv.org/pdf/1807.11037.pdf) | [Code](https://github.com/andyhahaha/Efficient-Uncertainty-Video-Segmentation) | 16 | | 2020 | AAAI | [Every frame counts: joint learning of video segmentation and optical flow](https://arxiv.org/pdf/1911.12739.pdf) | -- | 17 | 18 | #### 3.1.2 Efforts towards Faster Segmentation 19 | 20 | | Year | Publication | Paper Title | Project | 21 | | ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: | 22 | | 2016 | ECCV | [Clockwork convnets for video semantic segmentation](https://arxiv.org/pdf/1608.03609.pdf) | -- | 23 | | 2017 | CVPR | [Deep feature flow for video recognition](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhu_Deep_Feature_Flow_CVPR_2017_paper.pdf) | [Code](https://github.com/msracver/Deep-Feature-Flow) | 24 | | 2017 | CVPR | [Budget-aware deep semantic video segmentation](https://openaccess.thecvf.com/content_cvpr_2017/papers/Mahasseni_Budget-Aware_Deep_Semantic_CVPR_2017_paper.pdf) | -- | 25 | | 2018 | CVPR | [Dynamic video segmentation network](https://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_Dynamic_Video_Segmentation_CVPR_2018_paper.pdf) | [Code](https://github.com/XUSean0118/DVSNet) | 26 | | 2018 | CVPR | [Low-latency video semantic segmentation](https://openaccess.thecvf.com/content_cvpr_2018/papers/Li_Low-Latency_Video_Semantic_CVPR_2018_paper.pdf) | -- | 27 | | 2019 | CVPR | [Accel: A corrective fusion network for efficient semantic segmentation on video](https://arxiv.org/pdf/1807.06667.pdf) | -- | 28 | | 2020 | CVPR | [Temporally distributed 
networks for fast video semantic segmentation](https://arxiv.org/pdf/2004.01800.pdf) | [Code](https://github.com/feinanshan/TDNet) | 29 | | 2020 | ECCV | [Efficient semantic video segmentation with per-frame inference](https://arxiv.org/pdf/2002.11433.pdf) | [Code](https://github.com/irfanICMLL/ETC-Real-time-Per-frame-Semantic-video-segmentation) | 30 | 31 | #### 3.1.3 Semi-/Weakly-supervised based 32 | 33 | | Year | Publication | Paper Title | Project | 34 | | ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: | 35 | | 2017 | ICCV | [Bringing background into the foreground: Making all classes equal in weakly-supervised video semantic segmentation](https://openaccess.thecvf.com/content_ICCV_2017/papers/Saleh_Bringing_Background_Into_ICCV_2017_paper.pdf) | -- | 36 | | 2019 | CVPR | [Improving semantic segmentation via video propagation and label relaxation](https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Improving_Semantic_Segmentation_via_Video_Propagation_and_Label_Relaxation_CVPR_2019_paper.pdf) | [Code](https://github.com/YeLyuUT/SSeg), [Project](https://nv-adlr.github.io/publication/2018-Segmentation) | 37 | | 2020 | ECCV | [Naive-student: Leveraging semi-supervised learning in video sequences for urban scene segmentation](https://arxiv.org/pdf/2005.10266.pdf) | -- | 38 | 39 | ### 3.2 Video Instance Segmentation (VIS) 40 | 41 | | Year | Publication | Paper Title | Project | 42 | | ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: | 43 | | 2019 | arXiv | [Learning a spatio-temporal embedding for video instance segmentation](https://arxiv.org/pdf/1912.08969.pdf) | -- | 44 | | 2019 | CVPR | [Mots: Multi-object tracking and segmentation](https://openaccess.thecvf.com/content_CVPR_2019/papers/Voigtlaender_MOTS_Multi-Object_Tracking_and_Segmentation_CVPR_2019_paper.pdf) | [Project](https://www.vision.rwth-aachen.de/page/mots) | 45 | | 2019 | ICCV | [Video instance 
segmentation](https://openaccess.thecvf.com/content_ICCV_2019/papers/Yang_Video_Instance_Segmentation_ICCV_2019_paper.pdf) | [Code](https://github.com/youtubevos/MaskTrackRCNN) | 46 | | 2020 | CVPR | [Learning multi-object tracking and segmentation from automatic annotations](https://openaccess.thecvf.com/content_CVPR_2020/papers/Porzi_Learning_Multi-Object_Tracking_and_Segmentation_From_Automatic_Annotations_CVPR_2020_paper.pdf) | -- | 47 | | 2020 | CVPR | [Video instance segmentation tracking with a modified vae architecture](https://openaccess.thecvf.com/content_CVPR_2020/papers/Lin_Video_Instance_Segmentation_Tracking_With_a_Modified_VAE_Architecture_CVPR_2020_paper.pdf) | -- | 48 | | 2020 | CVPR | [Classifying, segmenting, and tracking object instances in video with mask propagation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Bertasius_Classifying_Segmenting_and_Tracking_Object_Instances_in_Video_with_Mask_CVPR_2020_paper.pdf) | -- | 49 | | 2020 | ECCV | [Sipmask: Spatial information preservation for fast image and video instance segmentation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590001.pdf) | [Code](https://github.com/JialeCao001/SipMask) | 50 | | 2020 | ECCV | [Stem-seg: Spatio-temporal embeddings for instance segmentation in videos](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123560154.pdf) | [Code](https://github.com/sabarim/STEm-Seg) | 51 | | 2021 | AAAI | [Compfeat: Comprehensive feature aggregation for video instance segmentation](https://www.aaai.org/AAAI21Papers/AAAI-1143.FuY.pdf) | [Code](https://github.com/SHI-Labs/CompFeat-for-Video-Instance-Segmentation) | 52 | | 2021 | CVPR | [Track to detect and segment: An online multi-object tracker](https://openaccess.thecvf.com/content/CVPR2021/papers/Wu_Track_To_Detect_and_Segment_An_Online_Multi-Object_Tracker_CVPR_2021_paper.pdf) | [Code](https://github.com/JialianW/TraDeS),[Project](https://jialianwu.com/projects/TraDeS.html) | 53 | | 2021 | CVPR | [Sg-net: 
Spatial granularity network for one-stage video instance segmentation](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_SG-Net_Spatial_Granularity_Network_for_One-Stage_Video_Instance_Segmentation_CVPR_2021_paper.pdf) | [Code](https://github.com/goodproj13/SG-Net) | 54 | | 2021 | CVPR | [End-to-end video instance segmentation with transformers](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_End-to-End_Video_Instance_Segmentation_With_Transformers_CVPR_2021_paper.pdf) | [Code](https://github.com/Epiphqny/VisTR) | 55 | | 2021 | CVPR | [Weakly supervised instance segmentation for videos with temporal mask consistency](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Weakly_Supervised_Instance_Segmentation_for_Videos_With_Temporal_Mask_Consistency_CVPR_2021_paper.pdf) | -- | 56 | | 2021 | CVPR | [Learning to track instances without video annotations](https://openaccess.thecvf.com/content/CVPR2021/papers/Fu_Learning_to_Track_Instances_without_Video_Annotations_CVPR_2021_paper.pdf) | [Project](https://oasisyang.github.io/projects/semi-track/index.html) | 57 | | 2021 | ICCV | [Video instance segmentation with a propose-reduce paradigm](https://openaccess.thecvf.com/content/ICCV2021/papers/Lin_Video_Instance_Segmentation_With_a_Propose-Reduce_Paradigm_ICCV_2021_paper.pdf) | [Code](https://github.com/dvlab-research/ProposeReduce) | 58 | | 2021 | ICCV | [Crossover learning for fast online video instance segmentation](https://arxiv.org/pdf/2104.05970.pdf) | [Code](https://github.com/hustvl/CrossVIS) | 59 | | 2022 | PAMI | [Video Instance Segmentation using Dynamic Network](https://arxiv.org/abs/2107.13155) | [Code](https://github.com/lxtGH/TemporalPyramidRouting) | 60 | 61 | ### 3.3 Video Panoptic Segmentation (VPS) 62 | 63 | | Year | Publication | Paper Title | Project | 64 | | ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------: | 65 | | 2020 | CVPR | [Video 
panoptic segmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Kim_Video_Panoptic_Segmentation_CVPR_2020_paper.pdf) | -- | 66 | | 2021 | CVPR | [Learning to associate every segment for video panoptic segmentation](https://openaccess.thecvf.com/content/CVPR2021/papers/Woo_Learning_To_Associate_Every_Segment_for_Video_Panoptic_Segmentation_CVPR_2021_paper.pdf) | -- | 67 | | 2021 | CVPR | [Vipdeeplab: Learning visual perception with depth-aware video panoptic segmentation](https://openaccess.thecvf.com/content/CVPR2021/papers/Qiao_VIP-DeepLab_Learning_Visual_Perception_With_Depth-Aware_Video_Panoptic_Segmentation_CVPR_2021_paper.pdf) | [Code](https://github.com/joe-siyuan-qiao/ViP-DeepLab) | 68 | | 2022 | CVPR | [Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation](https://arxiv.org/abs/2204.04656) | [Code](https://github.com/lxtGH/Video-K-Net) | 69 | 70 | 71 | -------------------------------------------------------------------------------- /2-VOS.md: -------------------------------------------------------------------------------- 1 | ## 2. 
Deep Learning-based Video Object Segmentation 2 | 3 | ### 2.1 Automatic Video Object Segmentation (AVOS) 4 | 5 | #### 2.1.1 Deep Learning Module based 6 | 7 | | Year | Publication | Paper Title | Project | 8 | |------|:-----------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------:| 9 | | 2015 | CVPR | [Learning to segment moving objects in videos](https://www.cs.cmu.edu/~katef/papers/CVPR2015_LearnVideoSegment.pdf) | -- | 10 | | 2016 | CVPR | [Video segmentation via object flow](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Tsai_Video_Segmentation_via_CVPR_2016_paper.pdf) | [Code](https://github.com/wasidennis/ObjectFlow), [Project](https://sites.google.com/site/yihsuantsai/research/cvpr16-segmentation) | 11 | | 2017 | CVPR | [Learning motion patterns in videos](https://openaccess.thecvf.com/content_cvpr_2017/papers/Tokmakov_Learning_Motion_Patterns_CVPR_2017_paper.pdf) | [Project](http://thoth.inrialpes.fr/research/mpnet/) | 12 | | 2017 | ICCV | [Primary video object segmentation via complementary cnns and neighborhood reversible flow](https://openaccess.thecvf.com/content_ICCV_2017/papers/Li_Primary_Video_Object_ICCV_2017_paper.pdf) | -- | 13 | | 2018 | TIP | [Video salient object detection via fully convolutional networks](https://arxiv.org/pdf/1702.00871.pdf) | [Code](https://github.com/wenguanwang/ViSalientObject) | 14 | 15 | #### 2.1.2 Pixel Instance Embedding based 16 | 17 | | Year | Publication | Paper Title | Project | 18 | | ---- |:-----------:| ------------------------------------------------------------ |:-------:| 19 | | 2017 | arXiv | [Semantic instance segmentation via deep metric learning](https://arxiv.org/pdf/1703.10277.pdf) | -- | 20 | | 2018 | CVPR | 
[Instance embedding transfer to unsupervised video object segmentation](https://openaccess.thecvf.com/content_cvpr_2018/papers/Li_Instance_Embedding_Transfer_CVPR_2018_paper.pdf) | -- | 21 | | 2018 | ECCV | [Unsupervised video object segmentation with motion-based bilateral networks](https://openaccess.thecvf.com/content_ECCV_2018/papers/Siyang_Li_Unsupervised_Video_Object_ECCV_2018_paper.pdf) | -- | 22 | 23 | #### 2.1.3 Short-term Information Encoding 24 | 25 | | Year | Publication | Paper Title | Project | 26 | | ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: | 27 | | 2017 | CVPR | [Fusionseg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos](https://openaccess.thecvf.com/content_cvpr_2017/papers/Jain_FusionSeg_Learning_to_CVPR_2017_paper.pdf) | [Code](https://github.com/suyogduttjain/fusionseg), [Project](https://vision.cs.utexas.edu/projects/fusionseg/) | 28 | | 2017 | ICCV | [Segflow: Joint learning for video object segmentation and optical flow](https://openaccess.thecvf.com/content_ICCV_2017/papers/Cheng_SegFlow_Joint_Learning_ICCV_2017_paper.pdf) | [Code](https://github.com/JingchunCheng/SegFlow), [Project](https://sites.google.com/site/yihsuantsai/research/iccv17-segflow) | 29 | | 2017 | ICCV | [Learning video object segmentation with visual memory](http://openaccess.thecvf.com/content_ICCV_2017/papers/Tokmakov_Learning_Video_Object_ICCV_2017_paper.pdf) | -- | 30 | | 2018 | ECCV | [Pyramid dilated deeper convlstm for video salient object detection](https://openaccess.thecvf.com/content_ECCV_2018/papers/Hongmei_Song_Pseudo_Pyramid_Deeper_ECCV_2018_paper.pdf) | [Code](https://github.com/shenjianbing/PDB-ConvLSTM) | 31 | | 2018 | CVPR | [Flow guided recurrent neural encoder for video salient object detection](https://openaccess.thecvf.com/content_cvpr_2018/papers/Li_Flow_Guided_Recurrent_CVPR_2018_paper.pdf) | -- | 32 | | 
2019 | ICCV | [Zero-shot video object segmentation via attentive graph neural networks](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Zero-Shot_Video_Object_Segmentation_via_Attentive_Graph_Neural_Networks_ICCV_2019_paper.pdf) | [Code](https://github.com/carrierlxk/AGNN) | 33 | | 2019 | ICCV | [Motion guided attention for video salient object detection](https://openaccess.thecvf.com/content_ICCV_2019/papers/Li_Motion_Guided_Attention_for_Video_Salient_Object_Detection_ICCV_2019_paper.pdf) | [Code](https://github.com/lhaof/Motion-Guided-Attention) | 34 | | 2019 | IJCV | [Learning to segment moving objects](https://arxiv.org/pdf/1712.01127.pdf) | -- | 35 | | 2020 | AAAI | [Motion attentive transition for zero-shot video object segmentation](https://ojs.aaai.org/index.php/AAAI/article/download/7008/6862) | [Code](https://github.com/tfzhou/MATNet) | 36 | 37 | #### 2.1.4 Long-term Context Encoding 38 | 39 | | Year | Publication | Paper Title | Project | 40 | | ---- | :---------: | ------------------------------------------------------------ |:------------------------------------------------------------------------------------------------------------------------------:| 41 | | 2019 | CVPR | [See more, know more: Unsupervised video object segmentation with co-attention siamese networks](https://openaccess.thecvf.com/content_CVPR_2019/papers/Lu_See_More_Know_More_Unsupervised_Video_Object_Segmentation_With_Co-Attention_CVPR_2019_paper.pdf) | [Code](https://github.com/carrierlxk/COSNet) | 42 | | 2019 | ICCV | [Anchor diffusion for unsupervised video object segmentation](https://openaccess.thecvf.com/content_ICCV_2019/papers/Yang_Anchor_Diffusion_for_Unsupervised_Video_Object_Segmentation_ICCV_2019_paper.pdf) | [Code](https://github.com/yz93/anchor-diff-VOS)| 43 | | 2019 | ICCV | [Zero-shot video object segmentation via attentive graph neural 
networks](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Zero-Shot_Video_Object_Segmentation_via_Attentive_Graph_Neural_Networks_ICCV_2019_paper.pdf)| [Code](https://github.com/carrierlxk/AGNN)| 44 | | 2020 | AAAI | [Pyramid constrained network for fast video salient object detection](https://ojs.aaai.org/index.php/AAAI/article/view/6718)|[Code](https://github.com/guyuchao/PyramidCSA)| 45 | | 2020 | ECCV | [Unsupervised video object segmentation with joint hotspot tracking](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590477.pdf) | [Code](https://github.com/luzhangada/code-for-WCS-Net) | 46 | | 2020 | ECCV | [Learning discriminative feature with crf for unsupervised video object segmentation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720443.pdf)|--| 47 | | 2020 | TPAMI | [Zero-Shot Video Object Segmentation with Co-Attention Siamese Networks](https://ieeexplore.ieee.org/abstract/document/9268466/) | [Code](https://github.com/carrierlxk/COSNet)| 48 | | 2021 | AAAI | [F2net: Learning to focus on the foreground for unsupervised video object segmentation](https://ojs.aaai.org/index.php/AAAI/article/view/16308)| --| 49 | | 2021 | CVPR | [Reciprocal transformations for unsupervised video object segmentation](https://openaccess.thecvf.com/content/CVPR2021/papers/Ren_Reciprocal_Transformations_for_Unsupervised_Video_Object_Segmentation_CVPR_2021_paper.pdf)| [Code](https://github.com/OliverRensu/RTNet)| 50 | | 2021 | TPAMI | [Segmenting objects from relational visual data](https://ieeexplore.ieee.org/document/9551804/) | [Code](https://github.com/carrierlxk/AGNN) | 51 | 52 | #### 2.1.5 Un-/Weakly-supervised based 53 | | Year | Publication | Paper Title | Project | 54 | | ---- | :---------: | ------------------------------------------------------------ |:------------------------------------------------------------------------------------------------------------------------------:| 55 | | 2019 | CVPR | [Learning unsupervised video 
object segmentation through visual attention](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_Unsupervised_Video_Object_Segmentation_Through_Visual_Attention_CVPR_2019_paper.pdf) | [Code](https://github.com/wenguanwang/AGS) |
| 2019 | CVPR | [Unsupervised moving object detection via contextual information separation](https://openaccess.thecvf.com/content_CVPR_2019/papers/Yang_Unsupervised_Moving_Object_Detection_via_Contextual_Information_Separation_CVPR_2019_paper.pdf) | [Code](https://github.com/antonilo/unsupervised_detection), [Project](http://rpg.ifi.uzh.ch/unsupervised_detection.html) |
| 2020 | CVPR | [Learning video object segmentation from unlabeled videos](http://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Learning_Video_Object_Segmentation_From_Unlabeled_Videos_CVPR_2020_paper.pdf) | [Code](https://github.com/carrierlxk/MuG) |
| 2021 | CVPR | [Dystab: Unsupervised object segmentation via dynamic-static bootstrapping](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_DyStaB_Unsupervised_Object_Segmentation_via_Dynamic-Static_Bootstrapping_CVPR_2021_paper.pdf) | [Code](https://github.com/blai88/dystab) |

#### 2.1.6 Instance-level AVOS
| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2019 | CVPR | [See more, know more: Unsupervised video object segmentation with co-attention siamese networks](https://openaccess.thecvf.com/content_CVPR_2019/papers/Lu_See_More_Know_More_Unsupervised_Video_Object_Segmentation_With_Co-Attention_CVPR_2019_paper.pdf) | [Code](https://github.com/carrierlxk/COSNet) |
| 2019 | CVPR | [Rvos: End-to-end recurrent network for video object segmentation](https://openaccess.thecvf.com/content_CVPR_2019/papers/Ventura_RVOS_End-To-End_Recurrent_Network_for_Video_Object_Segmentation_CVPR_2019_paper.pdf) | [Code](https://github.com/imatge-upc/rvos), [Project](https://imatge-upc.github.io/rvos/) |
| 2019 | ICCV | [Zero-shot video object segmentation via attentive graph neural networks](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Zero-Shot_Video_Object_Segmentation_via_Attentive_Graph_Neural_Networks_ICCV_2019_paper.pdf) | [Code](https://github.com/carrierlxk/AGNN) |
| 2020 | TPAMI | [Zero-Shot Video Object Segmentation with Co-Attention Siamese Networks](https://ieeexplore.ieee.org/abstract/document/9268466/) | [Code](https://github.com/carrierlxk/COSNet) |
| 2020 | WACV | [Unovost: Unsupervised offline video object segmentation and tracking](https://openaccess.thecvf.com/content_WACV_2020/papers/Luiten_UnOVOST_Unsupervised_Offline_Video_Object_Segmentation_and_Tracking_WACV_2020_paper.pdf) | -- |
| 2021 | CVPR | [Target-aware object discovery and association for unsupervised video multi-object segmentation](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhou_Target-Aware_Object_Discovery_and_Association_for_Unsupervised_Video_Multi-Object_Segmentation_CVPR_2021_paper.pdf) | -- |
| 2021 | TPAMI | [Paying attention to video object pattern understanding](https://ieeexplore.ieee.org/document/8957473) | [Code](https://github.com/wenguanwang/AGS) |

### 2.2 Semi-automatic Video Object Segmentation (SVOS)

#### 2.2.1 Online Fine-tuning based
| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :---------------------------------------------------: |
| 2017 | CVPR | [One-shot video object segmentation](https://arxiv.org/pdf/1611.05198.pdf) | [Code](https://github.com/kmaninis/OSVOS-PyTorch) |
| 2017 | BMVC | [Online adaptation of convolutional neural networks for video object segmentation](https://arxiv.org/pdf/1706.09364.pdf) | [Code](https://github.com/Stocastico/OnAVOS) |
| 2018 | CVPR | [Monet: Deep motion exploitation for video object segmentation](https://openaccess.thecvf.com/content_cvpr_2018/papers/Xiao_MoNet_Deep_Motion_CVPR_2018_paper.pdf) | -- |
| 2018 | CVPR | [Efficient video object segmentation via network modulation](https://openaccess.thecvf.com/content_cvpr_2018/papers/Yang_Efficient_Video_Object_CVPR_2018_paper.pdf) | [Code](https://github.com/linjieyangsc/video_seg) |
| 2018 | TPAMI | [Video object segmentation without temporal information](https://arxiv.org/pdf/1709.06031.pdf) | [Project](https://cvlsegmentation.github.io/osvos-s/) |
| 2019 | TPAMI | [Online meta adaptation for fast video object segmentation](https://ieeexplore.ieee.org/document/8611188) | [Code](https://github.com/huaxinxiao/MVOS-OL) |
| 2020 | NeurIPS | [Make one-shot video object segmentation efficient again](https://proceedings.neurips.cc//paper/2020/file/781397bc0630d47ab531ea850bddcf63-Paper.pdf) | [Code](https://github.com/dvl-tum/e-osvos) |

#### 2.2.2 Propagation-based
| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2017 | CVPR | [Learning video object segmentation from static images](https://openaccess.thecvf.com/content_cvpr_2017/papers/Perazzi_Learning_Video_Object_CVPR_2017_paper.pdf) | -- |
| 2017 | CVPR | [Online video object segmentation via convolutional trident network](https://openaccess.thecvf.com/content_cvpr_2017/papers/Jang_Online_Video_Object_CVPR_2017_paper.pdf) | -- |
| 2017 | CVPR | [Video propagation networks](https://openaccess.thecvf.com/content_cvpr_2017/papers/Jampani_Video_Propagation_Networks_CVPR_2017_paper.pdf) | [Code](https://github.com/varunjampani/video_prop_networks) |
| 2018 | CVPR | [Motion-guided cascaded refinement network for video object segmentation](https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/0391.pdf) | -- |
| 2018 | CVPR | [Reinforcement cutting-agent learning for video object segmentation](https://openaccess.thecvf.com/content_cvpr_2018/papers/Han_Reinforcement_Cutting-Agent_Learning_CVPR_2018_paper.pdf) | -- |
| 2018 | CVPR | [CNN in MRF: Video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF](https://openaccess.thecvf.com/content_cvpr_2018/papers/Bao_CNN_in_MRF_CVPR_2018_paper.pdf) | -- |
| 2018 | CVPR | [Fast and accurate online video object segmentation via tracking parts](https://arxiv.org/pdf/1806.02323.pdf) | [Code](https://github.com/JingchunCheng/FAVOS) |
| 2018 | CVPR | [Fast video object segmentation by reference-guided mask propagation](https://openaccess.thecvf.com/content_cvpr_2018/papers/Oh_Fast_Video_Object_CVPR_2018_paper.pdf) | -- |
| 2018 | ECCV | [Video object segmentation with joint reidentification and attention-aware mask propagation](https://openaccess.thecvf.com/content_ECCV_2018/papers/Xiaoxiao_Li_Video_Object_Segmentation_ECCV_2018_paper.pdf) | [Project](http://mmlab.ie.cuhk.edu.hk/projects/DyeNet/) |
| 2018 | ECCV | [Video object segmentation by learning location-sensitive embeddings](https://openaccess.thecvf.com/content_ECCV_2018/papers/Hai_Ci_Video_Object_Segmentation_ECCV_2018_paper.pdf) | -- |
| 2018 | arXiv | [Youtube-vos: A large-scale video object segmentation benchmark](https://arxiv.org/pdf/1809.03327.pdf) | [Project](https://youtube-vos.org/) |
| 2019 | IJCV | [Lucid data dreaming for video object segmentation](https://arxiv.org/pdf/1703.09554.pdf) | [Code](https://github.com/ankhoreva/LucidDataDreaming), [Project](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/weakly-supervised-learning/lucid-data-dreaming-for-object-tracking) |
| 2019 | CVPR | [Mhp-vos: Multiple hypotheses propagation for video object segmentation](https://arxiv.org/pdf/1904.08141.pdf) | [Code](https://github.com/shuangjiexu/MHP-VOS) |
| 2019 | CVPR | [A generative appearance model for end-to-end video object segmentation](https://openaccess.thecvf.com/content_CVPR_2019/papers/Johnander_A_Generative_Appearance_Model_for_End-To-End_Video_Object_Segmentation_CVPR_2019_paper.pdf) | [Code](https://github.com/joakimjohnander/agame-vos) |
| 2019 | ICCV | [Fast video object segmentation via dynamic targeting network](https://openaccess.thecvf.com/content_ICCV_2019/papers/Zhang_Fast_Video_Object_Segmentation_via_Dynamic_Targeting_Network_ICCV_2019_paper.pdf) | -- |
| 2019 | ICCV | [Agss-vos: Attention guided single-shot video object segmentation](https://openaccess.thecvf.com/content_ICCV_2019/papers/Lin_AGSS-VOS_Attention_Guided_Single-Shot_Video_Object_Segmentation_ICCV_2019_paper.pdf) | [Code](https://github.com/dvlab-research/AGSS-VOS) |
| 2020 | CVPR | [State-aware tracker for real-time video object segmentation](https://arxiv.org/pdf/2003.00482.pdf) | [Code](https://github.com/MegviiDetection/video_analyst) |
| 2020 | CVPR | [Fast video object segmentation with temporal aggregation network and dynamic template matching](https://openaccess.thecvf.com/content_CVPR_2020/papers/Huang_Fast_Video_Object_Segmentation_With_Temporal_Aggregation_Network_and_Dynamic_CVPR_2020_paper.pdf) | [Project](https://xuhuaking.github.io/Fast-VOS-DTTM-TAN/) |
| 2022 | AAAI | [Reliable Propagation-Correction Modulation for Video Object Segmentation](https://arxiv.org/pdf/2112.02853.pdf) | [Code](https://github.com/JerryX1110/RPCMVOS) |

#### 2.2.3 Matching-based
| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2015 | ICCV | [Visual tracking with fully convolutional networks](https://wlouyang.github.io/Papers/Wang_Visual_Tracking_With_ICCV_2015_paper.pdf) | -- |
| 2017 | ICCV | [Pixel-level matching for video object segmentation using convolutional neural networks](https://openaccess.thecvf.com/content_ICCV_2017/papers/Yoon_Pixel-Level_Matching_for_ICCV_2017_paper.pdf) | -- |
| 2017 | CVPR | [Learning video object segmentation from static images](https://openaccess.thecvf.com/content_cvpr_2017/papers/Perazzi_Learning_Video_Object_CVPR_2017_paper.pdf) | -- |
| 2018 | CVPR | [Fast and accurate online video object segmentation via tracking parts](https://arxiv.org/pdf/1806.02323.pdf) | [Code](https://github.com/JingchunCheng/FAVOS) |
| 2018 | CVPR | [CNN in MRF: Video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF](https://openaccess.thecvf.com/content_cvpr_2018/papers/Bao_CNN_in_MRF_CVPR_2018_paper.pdf) | -- |
| 2018 | CVPR | [Motion-guided cascaded refinement network for video object segmentation](https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/0391.pdf) | -- |
| 2018 | ECCV | [Video object segmentation with joint reidentification and attention-aware mask propagation](https://openaccess.thecvf.com/content_ECCV_2018/papers/Xiaoxiao_Li_Video_Object_Segmentation_ECCV_2018_paper.pdf) | [Project](http://mmlab.ie.cuhk.edu.hk/projects/DyeNet/) |
| 2018 | ECCV | [Videomatch: Matching based video object segmentation](https://arxiv.org/pdf/1809.01123.pdf) | -- |
| 2018 | arXiv | [Youtube-vos: A large-scale video object segmentation benchmark](https://arxiv.org/pdf/1809.03327.pdf) | [Project](https://youtube-vos.org/) |
| 2019 | CVPR | [Feelvos: Fast end-to-end embedding learning for video object segmentation](https://arxiv.org/pdf/1902.09513.pdf) | -- |
| 2019 | CVPR | [Mhp-vos: Multiple hypotheses propagation for video object segmentation](https://arxiv.org/pdf/1904.08141.pdf) | [Code](https://github.com/shuangjiexu/MHP-VOS) |
| 2019 | ICCV | [Ranet: Ranking attention network for fast video object segmentation](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_RANet_Ranking_Attention_Network_for_Fast_Video_Object_Segmentation_ICCV_2019_paper.pdf) | [Code](https://github.com/Storife/RANet) |
| 2019 | ICCV | [Video object segmentation using space-time memory networks](https://openaccess.thecvf.com/content_ICCV_2019/papers/Oh_Video_Object_Segmentation_Using_Space-Time_Memory_Networks_ICCV_2019_paper.pdf) | -- |
| 2019 | ICCV | [Capsulevos: Semi-supervised video object segmentation using capsule routing](https://openaccess.thecvf.com/content_ICCV_2019/papers/Duarte_CapsuleVOS_Semi-Supervised_Video_Object_Segmentation_Using_Capsule_Routing_ICCV_2019_paper.pdf) | [Code](https://github.com/KevinDuarte/CapsuleVOS) |
| 2020 | CVPR | [A transductive approach for video object segmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_A_Transductive_Approach_for_Video_Object_Segmentation_CVPR_2020_paper.pdf) | [Code](https://github.com/microsoft/transductive-vos.pytorch) |
| 2020 | CVPR | [Learning fast and robust target models for video object segmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Robinson_Learning_Fast_and_Robust_Target_Models_for_Video_Object_Segmentation_CVPR_2020_paper.pdf) | [Code](https://github.com/andr345/frtm-vos) |
| 2020 | ECCV | [Collaborative video object segmentation by foreground-background integration](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123500324.pdf) | [Code](https://github.com/z-x-yang/CFBI) |
| 2020 | ECCV | [Video object segmentation with episodic graph memory networks](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123480664.pdf) | [Code](https://github.com/carrierlxk/GraphMemVOS) |
| 2020 | ECCV | [Learning what to learn for video object segmentation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123470766.pdf) | [Code](https://github.com/visionml/pytracking) |
| 2020 | ECCV | [Kernelized memory network for video object segmentation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123670630.pdf) | -- |
| 2020 | ECCV | [Fast video object segmentation using the global context module](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123550732.pdf) | -- |
| 2020 | ECCV | [Memory selection network for video propagation](https://jiaya.me/papers/video_prop_eccv20.pdf) | -- |
| 2020 | NeurIPS | [Video object segmentation with adaptive feature bank and uncertain-region refinement](https://proceedings.neurips.cc/paper/2020/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf) | [Code](https://github.com/xmlyqing00/AFB-URR) |
| 2021 | CVPR | [Learning position and target consistency for memory-based video object segmentation]() | -- |
| 2021 | CVPR | [Efficient regional memory network for video object segmentation](https://arxiv.org/pdf/2103.12934.pdf) | [Code](https://github.com/hzxie/RMNet), [Project](https://infinitescript.com/project/rmnet/) |
| 2021 | CVPR | [Video object segmentation using global and instance embedding learning](https://openaccess.thecvf.com/content/CVPR2021/papers/Ge_Video_Object_Segmentation_Using_Global_and_Instance_Embedding_Learning_CVPR_2021_paper.pdf) | -- |
| 2021 | CVPR | [Sstvos: Sparse spatiotemporal transformers for video object segmentation](https://openaccess.thecvf.com/content/CVPR2021/papers/Duke_SSTVOS_Sparse_Spatiotemporal_Transformers_for_Video_Object_Segmentation_CVPR_2021_paper.pdf) | [Code](https://github.com/dukebw/SSTVOS) |
| 2021 | CVPR | [Swiftnet: Real-time video object segmentation](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_SwiftNet_Real-Time_Video_Object_Segmentation_CVPR_2021_paper.pdf) | [Code](https://github.com/haochenheheda/SwiftNet) |

#### 2.2.4 Box-initialization based
| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2019 | CVPR | [Fast online object tracking and segmentation: A unifying approach](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Fast_Online_Object_Tracking_and_Segmentation_A_Unifying_Approach_CVPR_2019_paper.pdf) | [Code](https://github.com/foolwood/SiamMask), [Project](https://www.robots.ox.ac.uk/~qwang/SiamMask/) |
| 2020 | CVPR | [Fast template matching and update for video object tracking and segmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Sun_Fast_Template_Matching_and_Update_for_Video_Object_Tracking_and_CVPR_2020_paper.pdf) | [Code](https://github.com/insomnia94/FTMU) |
| 2021 | AAAI | [Query-memory reaggregation for weakly-supervised video object segmentation](https://www.aaai.org/AAAI21Papers/AAAI-3972.LinF.pdf) | -- |

#### 2.2.5 Un-/Weakly-supervised based
| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2018 | ECCV | [Tracking emerges by colorizing videos](https://openaccess.thecvf.com/content_ECCV_2018/papers/Carl_Vondrick_Self-supervised_Tracking_by_ECCV_2018_paper.pdf) | -- |
| 2019 | CVPR | [Learning correspondence from the cycle-consistency of time](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_Correspondence_From_the_Cycle-Consistency_of_Time_CVPR_2019_paper.pdf) | [Code](https://github.com/xiaolonw/TimeCycle), [Project](https://ajabri.github.io/timecycle/) |
| 2019 | NeurIPS | [Joint-task self-supervised learning for temporal correspondence](https://proceedings.neurips.cc/paper/2019/file/140f6969d5213fd0ece03148e62e461e-Paper.pdf) | [Code](https://github.com/Liusifei/UVC), [Project](https://sites.google.com/view/uvc2019/) |
| 2020 | CVPR | [Mast: A memory-augmented self-supervised tracker](https://openaccess.thecvf.com/content_CVPR_2020/papers/Lai_MAST_A_Memory-Augmented_Self-Supervised_Tracker_CVPR_2020_paper.pdf) | [Code](https://github.com/zlai0/MAST) |
| 2020 | CVPR | [Learning video object segmentation from unlabeled videos](http://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Learning_Video_Object_Segmentation_From_Unlabeled_Videos_CVPR_2020_paper.pdf) | [Code](https://github.com/carrierlxk/MuG) |
| 2020 | NeurIPS | [Space-time correspondence as a contrastive random walk](https://arxiv.org/pdf/2006.14613) | [Code](https://github.com/ajabri/videowalk), [Project](https://ajabri.github.io/videowalk/) |
| 2022 | CVPR | [Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning](https://arxiv.org/abs/2203.14333) | [Code](https://github.com/0liliulei/LIIR), [Project](https://github.com/0liliulei/LIIR) |

#### 2.2.6 Other Specific Methods
| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2019 | ICCV | [Dmm-net: Differentiable mask-matching network for video object segmentation](https://openaccess.thecvf.com/content_ICCV_2019/papers/Zeng_DMM-Net_Differentiable_Mask-Matching_Network_for_Video_Object_Segmentation_ICCV_2019_paper.pdf) | [Code](https://github.com/ZENGXH/DMM_Net) |
| 2019 | CVPR | [Bubblenets: Learning to select the guidance frame in video object segmentation by deep sorting frames](https://openaccess.thecvf.com/content_CVPR_2019/papers/Griffin_BubbleNets_Learning_to_Select_the_Guidance_Frame_in_Video_Object_CVPR_2019_paper.pdf) | [Code](https://github.com/griffbr/BubbleNets) |
| 2020 | NeurIPS | [Delving into the cyclic mechanism in semi-supervised video object segmentation](https://proceedings.neurips.cc/paper/2020/file/0d5bd023a3ee11c7abca5b42a93c4866-Paper.pdf) | -- |
| 2021 | CVPR | [Learning dynamic network using a reuse gate function in semi-supervised video object segmentation](https://openaccess.thecvf.com/content/CVPR2021/papers/Park_Learning_Dynamic_Network_Using_a_Reuse_Gate_Function_in_Semi-Supervised_CVPR_2021_paper.pdf) | [Code](https://github.com/HYOJINPARK/Reuse_VOS) |

### 2.3 Interactive Video Object Segmentation (IVOS)

#### 2.3.1 Interaction-propagation based

| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2016 | CVPR | [Deep interactive object selection](https://openaccess.thecvf.com/content_cvpr_2016/papers/Xu_Deep_Interactive_Object_CVPR_2016_paper.pdf) | -- |
| 2017 | CVPR | [One-shot video object segmentation](https://arxiv.org/pdf/1611.05198.pdf) | [Code](https://github.com/kmaninis/OSVOS-PyTorch) |
| 2017 | arXiv | [Interactive video object segmentation in the wild](https://arxiv.org/pdf/1801.00269.pdf) | -- |
| 2019 | CVPR | [Fast user-guided video object segmentation by interaction-and-propagation networks](https://arxiv.org/pdf/1904.09791.pdf) | -- |
| 2020 | CVPR | [Memory aggregation networks for efficient interactive video object segmentation](https://arxiv.org/pdf/2003.13246.pdf) | [Code](https://github.com/lightas/CVPR2020_MANet) |
| 2020 | ECCV | [Interactive video object segmentation using global and local transfer modules](https://arxiv.org/pdf/2007.08139.pdf) | [Code](https://github.com/yuk6heo/IVOS-ATNet), [Project](http://mcl.korea.ac.kr/yukheo_eccv2020/) |
| 2021 | CVPR | [Guided interactive video object segmentation using reliability-based attention maps](https://openaccess.thecvf.com/content/CVPR2021/papers/Heo_Guided_Interactive_Video_Object_Segmentation_Using_Reliability-Based_Attention_Maps_CVPR_2021_paper.pdf) | [Code](https://github.com/yuk6heo/GIS-RAmap) |
| 2021 | CVPR | [Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion](https://arxiv.org/pdf/2103.07941.pdf) | [Code](https://github.com/hkchengrex/MiVOS), [Project](https://hkchengrex.github.io/MiVOS/) |

#### 2.3.2 Other Methods

| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :---------------------------------------------------------: |
| 2018 | CVPR | [Blazingly fast video object segmentation with pixel-wise metric learning](https://arxiv.org/pdf/1804.03131.pdf) | [Code](https://github.com/yuhuayc/fast-vos) |
| 2018 | CVPR | [Fast and accurate online video object segmentation via tracking parts](https://arxiv.org/pdf/1806.02323.pdf) | [Code](https://github.com/JingchunCheng/FAVOS) |
| 2020 | ECCV | [Scribblebox: Interactive annotation framework for video object segmentation](https://arxiv.org/pdf/2008.09721.pdf) | [Project](http://www.cs.toronto.edu/~linghuan/scribblebox/) |
| 2021 | CVPR | [Learning to recommend frame for interactive video object segmentation in the wild](https://arxiv.org/pdf/2103.10391.pdf) | [Code](https://github.com/svip-lab/IVOS-W) |

### 2.4 Language-guided Video Object Segmentation (LVOS)

#### 2.4.1 Dynamic Convolution-based

| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2017 | CVPR | [Tracking by natural language specification](https://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Tracking_by_Natural_CVPR_2017_paper.pdf) | [Project](https://github.com/QUVA-Lab/lang-tracker) |
| 2018 | CVPR | [Actor and action video segmentation from a sentence](https://openaccess.thecvf.com/content_cvpr_2018/papers/Gavrilyuk_Actor_and_Action_CVPR_2018_paper.pdf) | [Project](https://kgavrilyuk.github.io/publication/actor_action/) |
| 2019 | ICCV | [Asymmetric cross-guided attention network for actor and action video segmentation from natural language query](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Asymmetric_Cross-Guided_Attention_Network_for_Actor_and_Action_Video_Segmentation_ICCV_2019_paper.pdf) | -- |
| 2020 | AAAI | [Context modulated dynamic networks for actor and action video segmentation with language queries](https://ojs.aaai.org/index.php/AAAI/article/view/6895/6749) | -- |
| 2020 | IJCAI | [Polar relative positional encoding for video-language segmentation](https://www.ijcai.org/proceedings/2020/0132.pdf) | -- |

#### 2.4.2 Capsule Routing-based

| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :----------------------------------------------------------: |
| 2018 | ICLR | [Matrix capsules with em routing](https://openreview.net/pdf?id=HJWLfGWRb) | [Code](https://github.com/IBM/matrix-capsules-with-em-routing) |
| 2020 | CVPR | [Visual-textual capsule routing for text-based video segmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/McIntosh_Visual-Textual_Capsule_Routing_for_Text-Based_Video_Segmentation_CVPR_2020_paper.pdf) | -- |

#### 2.4.3 Attention-based

| Year | Publication | Paper Title | Project |
| ---- | :---------: | ------------------------------------------------------------ | :-------------------------------------------------: |
| 2018 | ACCV | [Video object segmentation with language referring expressions](https://arxiv.org/pdf/1803.08006.pdf) | -- |
| 2019 | ICCV | [Asymmetric cross-guided attention network for actor and action video segmentation from natural language query](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Asymmetric_Cross-Guided_Attention_Network_for_Actor_and_Action_Video_Segmentation_ICCV_2019_paper.pdf) | -- |
| 2020 | ECCV | [Urvos: Unified referring video object segmentation network with a large-scale benchmark](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf) | [Code](https://github.com/skynbe/Refer-Youtube-VOS) |
| 2021 | CVPR | [Collaborative spatial-temporal modeling for language-queried video actor segmentation](https://arxiv.org/pdf/2105.06818.pdf) | -- |
| 2021 | TPAMI | [Referring segmentation in images and videos with cross-modal self-attention network](https://arxiv.org/pdf/2102.04762.pdf) | -- |
| 2021 | TPAMI | [Cross-modal progressive comprehension for referring segmentation](https://ieeexplore.ieee.org/document/9430750) | [Code](https://github.com/spyflying/CMPC-Refseg) |