├── LICENSE
└── README.md

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 Daqing Liu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Awesome Referring Expression Comprehension

> Inspired by [awesome-grounding](https://github.com/TheShadow29/awesome-grounding) and this [survey](https://arxiv.org/pdf/2007.09554.pdf).

A curated list of research papers on Referring Expression Comprehension (REC), with links to code and project websites where available.

## Table of Contents
- [Contributing](#contributing)
- [Paper List](#paper-list)
  - [Survey](#survey)
  - [Dataset](#dataset)
  - [arXiv](#arxiv)
  - [2020](#2020)
  - [2019](#2019)
  - [2018](#2018)
  - [2017](#2017)
  - [2016](#2016)
- [Acknowledgement](#acknowledgement)

## Paper List

### Survey

- **Referring Expression Comprehension: A Survey of Methods and Datasets**. *Yanyuan Qiao, Chaorui Deng, and Qi Wu*. arXiv, 2020. [[Paper]](https://arxiv.org/pdf/2007.09554.pdf)

### Dataset

- **[RefCOCOg] Generation and Comprehension of Unambiguous Object Descriptions**. *Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, and Kevin Murphy*. CVPR, 2016. [[Paper]](https://arxiv.org/pdf/1511.02283.pdf) [[Code]](https://github.com/mjhucla/Google_Refexp_toolbox)
- **[RefCOCO, RefCOCO+] Modeling Context in Referring Expressions**. *Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg*. ECCV, 2016. [[Paper]](https://arxiv.org/pdf/1608.00272.pdf) [[Code]](https://github.com/lichengunc/refer)
- **[CLEVR-Ref+] CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions**. *Runtao Liu, Chenxi Liu, Yutong Bai, and Alan Yuille*. CVPR, 2019. [[Paper]](https://arxiv.org/pdf/1901.00850.pdf) [[Code]](https://github.com/ccvl/clevr-refplus-dataset-gen) [[Website]](https://cs.jhu.edu/~cxliu/2019/clevr-ref+)
- **[Cops-Ref] Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension**. *Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, and Qi Wu*. CVPR, 2020. [[Paper]](https://arxiv.org/pdf/2003.00403.pdf) [~~[Code]~~](https://github.com/zfchenUnique/Cops-Ref)
- **[Ref-Reasoning] Graph-Structured Referring Expression Reasoning in The Wild**. *Sibei Yang, Guanbin Li, and Yizhou Yu*. CVPR, 2020. [[Paper]](https://arxiv.org/pdf/2004.08814.pdf) [[Code]](https://github.com/sibeiyang/sgmn) [[Website]](https://sibeiyang.github.io/dataset/ref-reasoning/)

### arXiv

- **(TransVG) TransVG: End-to-End Visual Grounding with Transformers**. *Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, and Houqiang Li*. arXiv, 2021. [[Paper]](https://arxiv.org/pdf/2104.08541.pdf)
- **(ECIFA) Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge**. *Peng Wang, Dongyang Liu, Hui Li, and Qi Wu*. arXiv, 2020. [[Paper]](https://arxiv.org/pdf/2006.01629.pdf)
- **(JVGN) Joint Visual Grounding with Language Scene Graphs**. *Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, and Qianru Sun*. arXiv, 2019. [[Paper]](https://arxiv.org/pdf/1906.03561.pdf) *(I am an author of the paper)*
- **A Real-time Global Inference Network for One-stage Referring Expression Comprehension**. *Yiyi Zhou et al.* arXiv, 2019. [[Paper]](https://arxiv.org/pdf/1912.03478.pdf) [[Code]](https://github.com/luogen1996/Real-time-Global-Inference-Network)
- **(SSG) Real-Time Referring Expression Comprehension by Single-Stage Grounding Network**. *Xinpeng Chen, Lin Ma, Jingyuan Chen, Zequn Jie, Wei Liu, and Jiebo Luo*. arXiv, 2018. [[Paper]](https://arxiv.org/pdf/1812.03426v1.pdf)

### 2020

- **Improving One-stage Visual Grounding by Recursive Sub-query Construction**. *Zhengyuan Yang, Tianlang Chen, Liwei Wang, and Jiebo Luo*. ECCV, 2020. [[Paper]](https://arxiv.org/pdf/2008.01059.pdf) [[Code]](https://github.com/zyang-ur/ReSC)
- **(LSCM) Linguistic Structure Guided Context Modeling for Referring Image Segmentation**. *Tianrui Hui et al.* ECCV, 2020. [[Paper]](http://colalab.org/media/paper/Linguistic_Structure_Guided_Context_Modeling_for_Referring_Image_Segmentation.pdf)
- **(BiLingUNet) BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions**. *Ozan Arkan Can, İlker Kesen, and Deniz Yuret*. ECCV, 2020. [[Paper]](https://arxiv.org/pdf/2003.12739.pdf)
- **(SGMN) Graph-Structured Referring Expression Reasoning in The Wild**. *Sibei Yang, Guanbin Li, and Yizhou Yu*. CVPR, 2020. [[Paper]](https://arxiv.org/pdf/2004.08814.pdf) [[Code]](https://github.com/sibeiyang/sgmn) [[Website]](https://sibeiyang.github.io/dataset/ref-reasoning/)
- **(MCN) Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation**. *Gen Luo et al.* CVPR, 2020. [[Paper]](https://arxiv.org/pdf/2003.08813.pdf) [[Code]](https://github.com/luogen1996/MCN)
- **(RCCF) A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension**. *Yue Liao et al.* CVPR, 2020. [[Paper]](https://arxiv.org/pdf/1909.07072.pdf)
- **(LCMCG) Learning Cross-modal Context Graph for Visual Grounding**. *Yongfei Liu, Bo Wan, Xiaodan Zhu, and Xuming He*. AAAI, 2020. [[Code]](https://github.com/youngfly11/LCMCG-PyTorch)

### 2019

- **(NMTree) Learning to Assemble Neural Module Tree Networks for Visual Grounding**. *Daqing Liu, Hanwang Zhang, Feng Wu, and Zheng-Jun Zha*. ICCV, 2019. [[Paper]](http://openaccess.thecvf.com/content_ICCV_2019/papers/Liu_Learning_to_Assemble_Neural_Module_Tree_Networks_for_Visual_Grounding_ICCV_2019_paper.pdf) [[Code]](https://github.com/daqingliu/NMTree) *(I am an author of the paper)*
- **(RvG-Tree) Learning to Compose and Reason with Language Tree Structures for Visual Grounding**. *Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, and Hanwang Zhang*. TPAMI, 2019. [[Paper]](https://arxiv.org/pdf/1906.01784.pdf) *(I am an author of the paper)*
- **(FAOA) A Fast and Accurate One-Stage Approach to Visual Grounding**. *Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, and Jiebo Luo*. ICCV, 2019. [[Paper]](https://arxiv.org/pdf/1908.06354.pdf) [[Code]](https://github.com/zyang-ur/onestage_grounding)
- **(DGA) Dynamic Graph Attention for Referring Expression Comprehension**. *Sibei Yang, Guanbin Li, and Yizhou Yu*. ICCV, 2019. [[Paper]](https://arxiv.org/pdf/1909.08164.pdf) [[Code]](https://github.com/sibeiyang/sgmn/tree/master/lib/dga_models)
- **(LCGN) Language-Conditioned Graph Networks for Relational Reasoning**. *Ronghang Hu, Anna Rohrbach, Trevor Darrell, and Kate Saenko*. ICCV, 2019. [[Paper]](https://arxiv.org/pdf/1905.04405.pdf) [[Code]](https://github.com/ronghanghu/lcgn)
- **See-Through-Text Grouping for Referring Image Segmentation**. *Ding-Jie Chen, Songhao Jia, Yi-Chen Lo, Hwann-Tzong Chen, and Tyng-Luh Liu*. ICCV, 2019. [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Chen_See-Through-Text_Grouping_for_Referring_Image_Segmentation_ICCV_2019_paper.pdf)
- **(CMRIN) Cross-Modal Relationship Inference for Grounding Referring Expressions**. *Sibei Yang, Guanbin Li, and Yizhou Yu*. CVPR, 2019. [[Paper]](https://arxiv.org/pdf/1906.04464.pdf)
- **(CM-Att-Erase) Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing**. *Xihui Liu, Zihao Wang, Jing Shao, Xiaogang Wang, and Hongsheng Li*. CVPR, 2019. [[Paper]](https://arxiv.org/pdf/1903.00839.pdf)
- **(CMSA) Cross-Modal Self-Attention Network for Referring Image Segmentation**. *Linwei Ye, Mrigank Rochan, Zhi Liu, and Yang Wang*. CVPR, 2019. [[Paper]](https://arxiv.org/pdf/1904.04745.pdf) [[Code]](https://github.com/lwye/CMSA-Net)
- **(LGRAN) Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks**. *Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, and Anton van den Hengel*. CVPR, 2019. [[Paper]](https://arxiv.org/pdf/1812.04794.pdf)

### 2018

- **(Multi-hop FiLM) Visual Reasoning with Multi-hop Feature Modulation**. *Florian Strub, Mathieu Seurin, Ethan Perez, and Harm de Vries*. ECCV, 2018. [[Paper]](https://arxiv.org/pdf/1808.04446.pdf)
- **(DDPN) Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding**. *Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, and Dacheng Tao*. IJCAI, 2018. [[Paper]](https://www.ijcai.org/proceedings/2018/0155.pdf) [[Code]](https://github.com/XiangChenchao/DDPN)
- **(MAttNet) MAttNet: Modular Attention Network for Referring Expression Comprehension**. *Licheng Yu et al.* CVPR, 2018. [[Paper]](http://openaccess.thecvf.com/content_cvpr_2018/papers/Yu_MAttNet_Modular_Attention_CVPR_2018_paper.pdf) [[Code]](https://github.com/lichengunc/MAttNet) [[Website]](http://vision2.cs.unc.edu/refer/comprehension)
- **(AccumAttn) Visual Grounding via Accumulated Attention**. *Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, and Mingkui Tan*. CVPR, 2018. [[Paper]](http://openaccess.thecvf.com/content_cvpr_2018/papers/Deng_Visual_Grounding_via_CVPR_2018_paper.pdf)
- **(ParalAttn) Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries**. *Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, and Anton van den Hengel*. CVPR, 2018. [[Paper]](https://arxiv.org/pdf/1711.06370.pdf) [[Code]](https://github.com/bohanzhuang/Parallel-Attention-A-Unified-Framework-for-Visual-Object-Discovery-through-Dialogs-and-Queries)
- **(VariContext) Grounding Referring Expressions in Images by Variational Context**. *Hanwang Zhang, Yulei Niu, and Shih-Fu Chang*. CVPR, 2018. [[Paper]](http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Grounding_Referring_Expressions_CVPR_2018_paper.pdf) [[Code]](https://github.com/yuleiniu/vc/)
- **(GroundNet) Using Syntax to Ground Referring Expressions in Natural Images**. *Volkan Cirik, Taylor Berg-Kirkpatrick, and Louis-Philippe Morency*. AAAI, 2018. [[Paper]](https://arxiv.org/pdf/1805.10547.pdf) [[Code]](https://github.com/volkancirik/groundnet)

### 2017

- **Recurrent Multimodal Interaction for Referring Image Segmentation**. *Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, and Alan Yuille*. ICCV, 2017. [[Paper]](https://arxiv.org/pdf/1703.07939.pdf) [[Code]](https://github.com/chenxi116/TF-phrasecut-public)
- **(Attribute) Referring Expression Generation and Comprehension via Attributes**. *Jingyu Liu, Liang Wang, and Ming-Hsuan Yang*. ICCV, 2017. [[Paper]](http://faculty.ucmerced.edu/mhyang/papers/iccv2017_referring_expression.pdf)
- **(CMN) Modeling Relationships in Referential Expressions with Compositional Modular Networks**. *Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, and Kate Saenko*. CVPR, 2017. [[Paper]](http://openaccess.thecvf.com/content_cvpr_2017/papers/Hu_Modeling_Relationships_in_CVPR_2017_paper.pdf) [[Code]](https://github.com/ronghanghu/cmn)
- **(Spe+Lis+RI) A Joint Speaker-Listener-Reinforcer Model for Referring Expressions**. *Licheng Yu, Hao Tan, Mohit Bansal, and Tamara L. Berg*. CVPR, 2017. [[Paper]](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_A_Joint_Speaker-Listener-Reinforcer_CVPR_2017_paper.pdf) [[Code]](https://github.com/lichengunc/speaker_listener_reinforcer) [[Website]](https://vision.cs.unc.edu/refer/)
- **Comprehension-Guided Referring Expressions**. *Ruotian Luo and Gregory Shakhnarovich*. CVPR, 2017. [[Paper]](http://openaccess.thecvf.com/content_cvpr_2017/papers/Luo_Comprehension-Guided_Referring_Expressions_CVPR_2017_paper.pdf) [[Code]](https://github.com/ruotianluo/refexp-comprehension)

### 2016

- **(MCB) Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding**. *Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach*. EMNLP, 2016. [[Paper]](https://arxiv.org/pdf/1606.01847.pdf) [[Code]](https://github.com/akirafukui/vqa-mcb)
- **(NegBag) Modeling Context Between Objects for Referring Expression Understanding**. *Varun K. Nagaraja, Vlad I. Morariu, and Larry S. Davis*. ECCV, 2016. [[Paper]](https://arxiv.org/pdf/1608.00525.pdf) [[Code]](https://github.com/varun-nagaraja/referring-expressions)
- **(VisDif) Modeling Context in Referring Expressions**. *Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg*. ECCV, 2016. [[Paper]](https://arxiv.org/pdf/1608.00272.pdf) [[Code]](https://github.com/lichengunc/refer)
- **(SCRC) Natural Language Object Retrieval**. *Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, and Trevor Darrell*. CVPR, 2016. [[Paper]](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hu_Natural_Language_Object_CVPR_2016_paper.pdf) [[Code]](https://github.com/ronghanghu/natural-language-object-retrieval) [[Website]](http://ronghanghu.com/text_obj_retrieval/)
- **(MMI) Generation and Comprehension of Unambiguous Object Descriptions**. *Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, and Kevin Murphy*. CVPR, 2016. [[Paper]](https://arxiv.org/pdf/1511.02283.pdf) [[Code]](https://github.com/mjhucla/Google_Refexp_toolbox)

## Contributing

Please feel free to contact me via email (liudq@mail.ustc.edu.cn), open an issue, or submit a pull request.

To add a new paper via pull request:

1. Fork the repo and edit `README.md`.
1. Add the new paper at the correct chronological position, using the following format (a completed example appears at the end of this README):
   ```
   - **Paper Title**. *Author(s)*. Conference, Year. [[Paper]](link) [[Code]](link) [[Website]](link)
   ```
1. Send a pull request. Ideally, I will review the request within a week.

## Acknowledgement

This repo is maintained by [Daqing Liu](http://home.ustc.edu.cn/~liudq/).

Other Awesome Vision-Language lists: [Awesome Vision-Language Navigation](https://github.com/daqingliu/awesome-vln), [Awesome-Video-Captioning](https://github.com/tgc1997/Awesome-Video-Captioning).
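For reference, here is a completed entry that follows the format described in the Contributing section, copied verbatim from the 2016 list above; it shows all three link fields filled in:

```
- **(SCRC) Natural Language Object Retrieval**. *Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, and Trevor Darrell*. CVPR, 2016. [[Paper]](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hu_Natural_Language_Object_CVPR_2016_paper.pdf) [[Code]](https://github.com/ronghanghu/natural-language-object-retrieval) [[Website]](http://ronghanghu.com/text_obj_retrieval/)
```

Simply omit the `[[Code]]` or `[[Website]]` fields if the paper does not provide them.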