# Awesome Transformer [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

A curated list of transformers of all kinds, along with some personal experiment results, applications, and thoughts from industry.

## Updates

- **2021.02.20**: Opened the GitHub Discussions panel; feel free to start discussing transformers there.

## Blogs

- [Attention is All You Need](https://arxiv.org/abs/1706.03762)
- [Chinese Blog] A 30,000-character article for an easy introduction to vision transformers [[Link](https://zhuanlan.zhihu.com/p/308301901)]
- Transformers in Vision: A Survey [[paper](https://arxiv.org/abs/2101.01169)] - 2021.01.04
- A Survey on Visual Transformer [[paper](https://arxiv.org/abs/2012.12556)] - 2020.12.24
- [Chinese Blog] A 10,000-character article on ViT, the hot model of 2021 [[Link](https://zhuanlan.zhihu.com/p/342512339)]
- [Chinese Blog] Exploring linear attention: must attention have a softmax? (线性Attention的探索:Attention必须有个Softmax吗?)
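The last blog post above asks whether the softmax in attention is actually necessary. As a minimal sketch of the idea (not code from any listed repo; the feature map `phi` here is an illustrative choice, one of several used in linear-attention papers), standard softmax attention costs O(n²) in sequence length, while a kernelized variant reorders the matrix products to cost O(n):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention: O(n^2) in sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) attention matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized attention: phi(Q) (phi(K)^T V), computed right-to-left.

    Associativity lets us form the (d, d_v) summary phi(K)^T V first,
    so the cost is O(n * d^2) instead of O(n^2 * d) -- no softmax needed.
    phi must be positive so the normalizer is nonzero.
    """
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                      # (d, d_v), independent of n
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T           # (n, 1) normalizer
    return (Qp @ KV) / Z

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out_soft = softmax_attention(Q, K, V)
out_lin = linear_attention(Q, K, V)
assert out_soft.shape == out_lin.shape == (n, d)
```

The two outputs differ numerically (linear attention is an approximation of a different kernel, not of softmax itself), but both produce a convex-combination-style mixing of `V`; the blog post explores which properties of softmax are actually load-bearing.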
## Standalone Github Repos

- [simpletransformers](https://github.com/ThilinaRajapakse/simpletransformers): Transformers for Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI

## arXiv papers

- Training Vision Transformers for Image Retrieval [[paper](https://arxiv.org/abs/2102.05644)]
- **[TransReID]** TransReID: Transformer-based Object Re-Identification [[paper](https://arxiv.org/abs/2102.04378)]
- **[VTN]** Video Transformer Network [[paper](https://arxiv.org/abs/2102.00719)]
- **[T2T-ViT]** Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [[paper](https://arxiv.org/abs/2101.11986)] [[code](https://github.com/yitu-opensource/T2T-ViT)]
- **[BoTNet]** Bottleneck Transformers for Visual Recognition [[paper](https://arxiv.org/abs/2101.11605)]
- **[CPTR]** CPTR: Full Transformer Network for Image Captioning [[paper](https://arxiv.org/abs/2101.10804)]
- Learn to Dance with AIST++: Music Conditioned 3D Dance Generation [[paper](https://arxiv.org/abs/2101.08779)] [[code](https://google.github.io/aichoreographer/)]
- **[Trans2Seg]** Segmenting Transparent Object in the Wild with Transformer [[paper](https://github.com/xieenze/Trans2Seg)] [[code](https://github.com/xieenze/Trans2Seg)]
- **[SMCA]** Fast Convergence of DETR with Spatially Modulated Co-Attention [[paper](https://arxiv.org/abs/2101.07448)]
- Investigating the Vision Transformer Model for Image Retrieval Tasks [[paper](https://arxiv.org/abs/2101.03771)]
- **[Trear]** Trear: Transformer-based RGB-D Egocentric Action Recognition [[paper](https://arxiv.org/abs/2101.03904)]
- **[VisTR]** End-to-End Video Instance Segmentation with Transformers [[paper](https://arxiv.org/abs/2011.14503)]
- **[VisualSparta]** VisualSparta: Sparse Transformer Fragment-level Matching for Large-scale Text-to-Image Search [[paper](https://arxiv.org/abs/2101.00265)]
- **[TrackFormer]** TrackFormer: Multi-Object Tracking with Transformers [[paper](https://arxiv.org/abs/2101.02702)]
- **[LETR]** Line Segment Detection Using Transformers without Edges [[paper](https://arxiv.org/abs/2101.01909)]
- **[TAPE]** Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry [[paper](https://arxiv.org/abs/2101.02143)]
- **[TRIQ]** Transformer for Image Quality Assessment [[paper](https://arxiv.org/abs/2101.01097)] [[code](https://github.com/junyongyou/triq)]
- **[TransTrack]** TransTrack: Multiple-Object Tracking with Transformer [[paper](https://arxiv.org/abs/2012.15460)] [[code](https://github.com/PeizeSun/TransTrack)]
- **[SETR]** Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [[paper](https://arxiv.org/abs/2012.15840)] [[code](https://fudan-zvg.github.io/SETR/)]
- **[TransPose]** TransPose: Towards Explainable Human Pose Estimation by Transformer [[paper](https://arxiv.org/abs/2012.14214)]
- **[DeiT]** Training data-efficient image transformers & distillation through attention [[paper](https://arxiv.org/abs/2012.12877)]
- **[Pointformer]** 3D Object Detection with Pointformer [[paper](https://arxiv.org/abs/2012.11409)]
- **[ViT-FRCNN]** Toward Transformer-Based Object Detection [[paper](https://arxiv.org/abs/2012.09958)]
- **[Taming-transformers]** Taming Transformers for High-Resolution Image Synthesis [[paper](https://arxiv.org/abs/2012.09841)] [[code](https://compvis.github.io/taming-transformers/)]
- **[SceneFormer]** SceneFormer: Indoor Scene Generation with Transformers [[paper](https://arxiv.org/abs/2012.09793)]
- **[PCT]** PCT: Point Cloud Transformer [[paper](https://arxiv.org/abs/2012.09688)]
- Transformer Interpretability Beyond Attention Visualization [[paper](https://arxiv.org/abs/2012.09838)] [[code](https://github.com/hila-chefer/Transformer-Explainability)]
- **[METRO]** End-to-End Human Pose and Mesh Reconstruction with Transformers [[paper](https://arxiv.org/abs/2012.09760)]
- **[PointTransformer]** Point Transformer [[paper](https://arxiv.org/abs/2012.09164)]
- **[PED]** DETR for Pedestrian Detection [[paper](https://arxiv.org/abs/2012.06785)]
- **[UP-DETR]** UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [[paper](https://arxiv.org/abs/2011.09094)]
- **[LambdaNetworks]** LambdaNetworks: Modeling Long-Range Interactions Without Attention [[paper](https://openreview.net/pdf?id=xTJEN-ggl1b)] [[code](https://github.com/lucidrains/lambda-networks)]
- **[C-Tran]** General Multi-label Image Classification with Transformers [[paper](https://arxiv.org/abs/2011.14027)]
- **[TSP-FCOS]** Rethinking Transformer-based Set Prediction for Object Detection [[paper](https://arxiv.org/abs/2011.10881)]
- **[IPT]** Pre-Trained Image Processing Transformer [[paper](https://arxiv.org/abs/2012.00364)]
- **[ACT]** End-to-End Object Detection with Adaptive Clustering Transformer [[paper](https://arxiv.org/abs/2011.09315)]
- **[VTs]** Visual Transformers: Token-based Image Representation and Processing for Computer Vision [[paper](https://arxiv.org/abs/2006.03677)]

### 2021

- **[Vision Transformer]** An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (**ICLR**) [[paper](https://arxiv.org/abs/2010.11929)] [[code](https://github.com/google-research/vision_transformer)]
- **[Deformable DETR]** Deformable DETR: Deformable Transformers for End-to-End Object Detection (**ICLR**) [[paper](https://arxiv.org/abs/2010.04159)] [[code](https://github.com/fundamentalvision/Deformable-DETR)]
- **[LSTR]** End-to-end Lane Shape Prediction with Transformers (**WACV**) [[paper](https://arxiv.org/abs/2011.04233)] [[code](https://github.com/liuruijin17/LSTR)]

### 2020

- **[DETR]** End-to-End Object Detection with Transformers (**ECCV**) [[paper](https://arxiv.org/abs/2005.12872)] [[code](https://github.com/facebookresearch/detr)]
- **[FPT]** Feature Pyramid Transformer (**CVPR**) [[paper](https://arxiv.org/abs/2007.09451)] [[code](https://github.com/ZHANGDONG-NJUST/FPT)]
- **[TTSR]** Learning Texture Transformer Network for Image Super-Resolution (**CVPR**) [[paper](https://arxiv.org/abs/2006.04139)] [[code](https://github.com/researchmm/TTSR)]

## Reference

1. [origin](https://github.com/dk-liang/Awesome-Visual-Transformer)

## Copyright

Collected by Lucas Jin, 2021.