├── .gitmodules
├── Images
│   ├── conv.jpg
│   ├── img.jpg
│   ├── lesa.jpg
│   ├── lesa_method_fig.png
│   └── sa.jpg
└── README.md

/.gitmodules:
--------------------------------------------------------------------------------
[submodule "LESA_detection"]
	path = LESA_detection
	url = https://github.com/Chenglin-Yang/LESA_detection
[submodule "LESA_classification"]
	path = LESA_classification
	url = https://github.com/Chenglin-Yang/LESA_classification

--------------------------------------------------------------------------------
/Images/conv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Chenglin-Yang/LESA/d63861dc13365294ad85fdb85ec180c2b29d4db4/Images/conv.jpg

--------------------------------------------------------------------------------
/Images/img.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Chenglin-Yang/LESA/d63861dc13365294ad85fdb85ec180c2b29d4db4/Images/img.jpg

--------------------------------------------------------------------------------
/Images/lesa.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Chenglin-Yang/LESA/d63861dc13365294ad85fdb85ec180c2b29d4db4/Images/lesa.jpg

--------------------------------------------------------------------------------
/Images/lesa_method_fig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Chenglin-Yang/LESA/d63861dc13365294ad85fdb85ec180c2b29d4db4/Images/lesa_method_fig.png

--------------------------------------------------------------------------------
/Images/sa.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Chenglin-Yang/LESA/d63861dc13365294ad85fdb85ec180c2b29d4db4/Images/sa.jpg

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# LESA

## Introduction

This repository contains the official implementation of [Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms](http://arxiv.org/abs/2107.05637).
The code for image classification and object detection is based on [axial-deeplab](https://github.com/csrhddlam/axial-deeplab) and [mmdetection](https://github.com/open-mmlab/mmdetection).

![LESA at one spatial location](Images/lesa_method_fig.png)

Visualizing Locally Enhanced Self-Attention (LESA) at one spatial location.

Self-Attention has become prevalent in computer vision models. Inspired by fully connected Conditional Random Fields (CRFs), we decompose self-attention into local and context terms. They correspond to the unary and binary terms in a CRF and are implemented by attention mechanisms with projection matrices. We observe that the unary terms make only small contributions to the outputs, while standard CNNs, which rely solely on the unary terms, achieve strong performance on a variety of tasks. We therefore propose Locally Enhanced Self-Attention (LESA), which enhances the unary term by incorporating convolutions and uses a fusion module to dynamically couple the unary and binary operations. In our experiments, we replace the self-attention modules with LESA. The results on ImageNet and COCO show the superiority of LESA over convolution and self-attention baselines for image recognition, object detection, and instance segmentation.

Image | Convolution | Self-Attention | LESA
:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:
![Image](Images/img.jpg) | ![Convolution](Images/conv.jpg) | ![Self-Attention](Images/sa.jpg) | ![LESA](Images/lesa.jpg)

Effectiveness of Locally Enhanced Self-Attention (LESA) on COCO object detection and instance segmentation.
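
To make the decomposition above concrete, the snippet below is a minimal, simplified PyTorch sketch of the idea. It is not the official implementation in the LESA_classification and LESA_detection submodules: a depthwise convolution stands in for the enhanced unary (local) term, a plain single-head self-attention stands in for the binary (context) term, and a small gating branch plays the role of the fusion module that dynamically couples the two. All names and design details here (`LESABlock2d`, the pooled gating branch, single-head attention without positional embeddings) are illustrative assumptions.

```python
# Illustrative sketch only -- not the official LESA implementation.
# It shows, in simplified single-head form, how a convolutional unary (local)
# term and an attention-based binary (context) term can be dynamically fused.
# All names (LESABlock2d, fusion, gate layout) are hypothetical.
import torch
import torch.nn as nn


class LESABlock2d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Unary (local) term, enhanced with a depthwise convolution.
        self.local_term = nn.Conv2d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        # Binary (context) term: single-head self-attention projections.
        self.to_qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        # Fusion: predicts two per-channel gates that couple the two terms.
        self.fusion = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * 2, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        unary = self.local_term(x)  # local term from convolution

        q, k, v = self.to_qkv(x).chunk(3, dim=1)
        q = q.flatten(2).transpose(1, 2)   # (b, h*w, c)
        k = k.flatten(2)                   # (b, c, h*w)
        v = v.flatten(2).transpose(1, 2)   # (b, h*w, c)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)
        binary = (attn @ v).transpose(1, 2).reshape(b, c, h, w)

        # Dynamically weight the unary and binary terms before summing them.
        gate_u, gate_b = torch.sigmoid(self.fusion(x)).chunk(2, dim=1)
        return gate_u * unary + gate_b * binary


if __name__ == "__main__":
    # Hypothetical usage on a (batch, channels, height, width) feature map.
    block = LESABlock2d(channels=64)
    print(block(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```

For the modules actually used in the paper, including relative position embeddings and the exact fusion design, please refer to the two submodules.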


## Citing LESA

If you find LESA helpful in your project, please consider citing our paper.

```BibTeX
@article{yang2021locally,
  title={Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms},
  author={Yang, Chenglin and Qiao, Siyuan and Kortylewski, Adam and Yuille, Alan},
  journal={arXiv preprint arXiv:2107.05637},
  year={2021}
}
```

## Main Results on ImageNet

Please refer to [LESA_classification](https://github.com/Chenglin-Yang/LESA_classification) for details.

| Method | Model | Top-1 Acc. | Top-5 Acc. |
|-----------|--------------|:------------:|:------------:|
| LESA_ResNet50 | [Download](https://livejohnshopkins-my.sharepoint.com/:f:/g/personal/cyang76_jh_edu/EnO2WCZcRwJJh1YsdHEZXOMBueC6q0baT4kWl_sI5SqjFQ?e=Wh9uWt) | 79.55 | 94.79 |
| LESA_WRN50 | [Download](https://livejohnshopkins-my.sharepoint.com/:f:/g/personal/cyang76_jh_edu/EgHuW1XQefNNkLCkRE1Ag4UBA96d7lBasZ4esEh3Re1mXA?e=jU72GG) | 80.18 | 95.07 |

## Main Results on COCO test-dev

Please refer to [LESA_detection](https://github.com/Chenglin-Yang/LESA_detection) for details.

| Method | Backbone | Pretrained | Model | Box AP | Mask AP |
|:-----------:|:-----------------:|--------------|--------------|:------------:|:------------:|
| Mask-RCNN | LESA_ResNet50 | [Download](https://livejohnshopkins-my.sharepoint.com/:f:/g/personal/cyang76_jh_edu/EsV_fGZY-uhEkciwVckp4c8BlInA1GFv7gett1_LOZ0vFg?e=g1If75) | [Download](https://livejohnshopkins-my.sharepoint.com/:f:/g/personal/cyang76_jh_edu/Egpo87VmMmlEg0jY_KHYAJsBS7EFDJ4YxJ2zhkTxcJCzWg?e=SGSKmx) | 44.2 | 39.6 |
| HTC | LESA_WRN50 | [Download](https://livejohnshopkins-my.sharepoint.com/:f:/g/personal/cyang76_jh_edu/ElagAKEgXttArEbtVR6NpmEBWAZN0pNE5Q6MMXEJZ27VHg?e=dIdwAI) | [Download](https://livejohnshopkins-my.sharepoint.com/:f:/g/personal/cyang76_jh_edu/EmSPz8ToSK5GuYyWELj3Y0QBwP3Q_Jd4FhK1WDvf2FuADw?e=xRsbl5) | 50.5 | 44.4 |

## Credits

This project is based on [axial-deeplab](https://github.com/csrhddlam/axial-deeplab) and [mmdetection](https://github.com/open-mmlab/mmdetection).

Relative position embedding is based on [bottleneck-transformer-pytorch](https://github.com/lucidrains/bottleneck-transformer-pytorch/blob/main/bottleneck_transformer_pytorch/bottleneck_transformer_pytorch.py).

ResNet is based on [pytorch/vision](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py). Classification helper functions are based on [pytorch-classification](https://github.com/bearpaw/pytorch-classification).
--------------------------------------------------------------------------------