# Awesome Transformer [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

A curated list of transformers of all kinds, along with some personal experiment results, applications, and thoughts from industry.

## Updates

- **2021.02.20**: Opened the GitHub Discussions panel; feel free to start discussing transformers there.

## Blogs

- [Attention is All You Need](https://arxiv.org/abs/1706.03762)
- [Chinese Blog] A 30,000-character article for an easy introduction to vision transformers [[Link](https://zhuanlan.zhihu.com/p/308301901)]
- Transformers in Vision: A Survey [[paper](https://arxiv.org/abs/2101.01169)] - 2021.01.04
- A Survey on Visual Transformer [[paper](https://arxiv.org/abs/2012.12556)] - 2020.12.24
- [Chinese Blog] A 10,000-character article on ViT, the hot model of 2021 [[Link](https://zhuanlan.zhihu.com/p/342512339)]
- [Chinese Blog] Exploring linear attention: must attention have a softmax? (线性Attention的探索:Attention必须有个Softmax吗?)
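The last blog post above asks whether the softmax in attention is actually necessary. As a minimal sketch of the idea (not code from any listed repo; the feature map `phi` here is an illustrative choice, one of several used in linear-attention papers), standard softmax attention costs O(n²) in sequence length, while a kernelized variant reorders the matrix products to cost O(n):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention: O(n^2) in sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) attention matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized attention: phi(Q) (phi(K)^T V), computed right-to-left.

    Associativity lets us form the (d, d_v) summary phi(K)^T V first,
    so the cost is O(n * d^2) instead of O(n^2 * d) -- no softmax needed.
    phi must be positive so the normalizer is nonzero.
    """
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                      # (d, d_v), independent of n
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T           # (n, 1) normalizer
    return (Qp @ KV) / Z

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out_soft = softmax_attention(Q, K, V)
out_lin = linear_attention(Q, K, V)
assert out_soft.shape == out_lin.shape == (n, d)
```

The two outputs differ numerically (linear attention is an approximation of a different kernel, not of softmax itself), but both produce a convex-combination-style mixing of `V`; the blog post explores which properties of softmax are actually load-bearing.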
## Standalone Github Repos

- [simpletransformers](https://github.com/ThilinaRajapakse/simpletransformers): Transformers for Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI

## arXiv papers

- Training Vision Transformers for Image Retrieval [[paper](https://arxiv.org/abs/2102.05644)]
- **[TransReID]** TransReID: Transformer-based Object Re-Identification [[paper](https://arxiv.org/abs/2102.04378)]
- **[VTN]** Video Transformer Network [[paper](https://arxiv.org/abs/2102.00719)]
- **[T2T-ViT]** Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [[paper](https://arxiv.org/abs/2101.11986)] [[code](https://github.com/yitu-opensource/T2T-ViT)]
- **[BoTNet]** Bottleneck Transformers for Visual Recognition [[paper](https://arxiv.org/abs/2101.11605)]
- **[CPTR]** CPTR: Full Transformer Network for Image Captioning [[paper](https://arxiv.org/abs/2101.10804)]
- Learn to Dance with AIST++: Music Conditioned 3D Dance Generation [[paper](https://arxiv.org/abs/2101.08779)] [[code](https://google.github.io/aichoreographer/)]
- **[Trans2Seg]** Segmenting Transparent Object in the Wild with Transformer [[paper](https://github.com/xieenze/Trans2Seg)] [[code](https://github.com/xieenze/Trans2Seg)]
- **[SMCA]** Fast Convergence of DETR with Spatially Modulated Co-Attention [[paper](https://arxiv.org/abs/2101.07448)]
- Investigating the Vision Transformer Model for Image Retrieval Tasks [[paper](https://arxiv.org/abs/2101.03771)]
- **[Trear]** Trear: Transformer-based RGB-D Egocentric Action Recognition [[paper](https://arxiv.org/abs/2101.03904)]
- **[VisTR]** End-to-End Video Instance Segmentation with Transformers [[paper](https://arxiv.org/abs/2011.14503)]
- **[VisualSparta]** VisualSparta: Sparse Transformer Fragment-level Matching for Large-scale Text-to-Image Search [[paper](https://arxiv.org/abs/2101.00265)]
- **[TrackFormer]** TrackFormer: Multi-Object Tracking with Transformers [[paper](https://arxiv.org/abs/2101.02702)]
- **[LETR]** Line Segment Detection Using Transformers without Edges [[paper](https://arxiv.org/abs/2101.01909)]
- **[TAPE]** Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry [[paper](https://arxiv.org/abs/2101.02143)]
- **[TRIQ]** Transformer for Image Quality Assessment [[paper](https://arxiv.org/abs/2101.01097)] [[code](https://github.com/junyongyou/triq)]
- **[TransTrack]** TransTrack: Multiple-Object Tracking with Transformer [[paper](https://arxiv.org/abs/2012.15460)] [[code](https://github.com/PeizeSun/TransTrack)]
- **[SETR]** Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [[paper](https://arxiv.org/abs/2012.15840)] [[code](https://fudan-zvg.github.io/SETR/)]
- **[TransPose]** TransPose: Towards Explainable Human Pose Estimation by Transformer [[paper](https://arxiv.org/abs/2012.14214)]
- **[DeiT]** Training data-efficient image transformers & distillation through attention [[paper](https://arxiv.org/abs/2012.12877)]
- **[Pointformer]** 3D Object Detection with Pointformer [[paper](https://arxiv.org/abs/2012.11409)]
- **[ViT-FRCNN]** Toward Transformer-Based Object Detection [[paper](https://arxiv.org/abs/2012.09958)]
- **[Taming-transformers]** Taming Transformers for High-Resolution Image Synthesis [[paper](https://arxiv.org/abs/2012.09841)] [[code](https://compvis.github.io/taming-transformers/)]
- **[SceneFormer]** SceneFormer: Indoor Scene Generation with Transformers [[paper](https://arxiv.org/abs/2012.09793)]
- **[PCT]** PCT: Point Cloud Transformer [[paper](https://arxiv.org/abs/2012.09688)]
- Transformer Interpretability Beyond Attention Visualization [[paper](https://arxiv.org/abs/2012.09838)] [[code](https://github.com/hila-chefer/Transformer-Explainability)]
- **[METRO]** End-to-End Human Pose and Mesh Reconstruction with Transformers [[paper](https://arxiv.org/abs/2012.09760)]
- **[PointTransformer]** Point Transformer [[paper](https://arxiv.org/abs/2012.09164)]
- **[PED]** DETR for Pedestrian Detection [[paper](https://arxiv.org/abs/2012.06785)]
- **[UP-DETR]** UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [[paper](https://arxiv.org/abs/2011.09094)]
- **[LambdaNetworks]** LambdaNetworks: Modeling Long-Range Interactions Without Attention [[paper](https://openreview.net/pdf?id=xTJEN-ggl1b)] [[code](https://github.com/lucidrains/lambda-networks)]
- **[C-Tran]** General Multi-label Image Classification with Transformers [[paper](https://arxiv.org/abs/2011.14027)]
- **[TSP-FCOS]** Rethinking Transformer-based Set Prediction for Object Detection [[paper](https://arxiv.org/abs/2011.10881)]
- **[IPT]** Pre-Trained Image Processing Transformer [[paper](https://arxiv.org/abs/2012.00364)]
- **[ACT]** End-to-End Object Detection with Adaptive Clustering Transformer [[paper](https://arxiv.org/abs/2011.09315)]
- **[VTs]** Visual Transformers: Token-based Image Representation and Processing for Computer Vision [[paper](https://arxiv.org/abs/2006.03677)]

### 2021

- **[Vision Transformer]** An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (**ICLR**) [[paper](https://arxiv.org/abs/2010.11929)] [[code](https://github.com/google-research/vision_transformer)]
- **[Deformable DETR]** Deformable DETR: Deformable Transformers for End-to-End Object Detection (**ICLR**) [[paper](https://arxiv.org/abs/2010.04159)] [[code](https://github.com/fundamentalvision/Deformable-DETR)]
- **[LSTR]** End-to-end Lane Shape Prediction with Transformers (**WACV**) [[paper](https://arxiv.org/abs/2011.04233)] [[code](https://github.com/liuruijin17/LSTR)]

### 2020

- **[DETR]** End-to-End Object Detection with Transformers (**ECCV**) [[paper](https://arxiv.org/abs/2005.12872)] [[code](https://github.com/facebookresearch/detr)]
- **[FPT]** Feature Pyramid Transformer (**CVPR**) [[paper](https://arxiv.org/abs/2007.09451)] [[code](https://github.com/ZHANGDONG-NJUST/FPT)]
- **[TTSR]** Learning Texture Transformer Network for Image Super-Resolution (**CVPR**) [[paper](https://arxiv.org/abs/2006.04139)] [[code](https://github.com/researchmm/TTSR)]

## Reference

1. [origin](https://github.com/dk-liang/Awesome-Visual-Transformer)

## Copyright

Collected by Lucas Jin, 2021.