# SLRPapers

Papers on sign language recognition and related fields.

## Table of Contents

- **[Papers](#papers)**
  - [HMM](#hmm)
  - [CNN+RNN](#cnnrnn)
  - [3D CNN](#3d-cnn)
  - [GCN](#gcn)
  - [Others](#others)
- **[Datasets](#datasets)**
- **[Related Fields](#related-fields)**
  - [Action Recognition](#action-recognition)
  - [Speech Recognition](#speech-recognition)
  - [Video Captioning](#video-captioning)

## Papers

### HMM

1. **Online Early-Late Fusion Based on Adaptive HMM for Sign Language Recognition** `TOMM2017` [*paper*](https://dl.acm.org/doi/pdf/10.1145/3152121?download=true) *code*
2. **Chinese sign language recognition with adaptive HMM** `ICME2016` [*paper*](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7552950) *code*
3. **Sign language recognition based on adaptive HMMS with data augmentation** `ICIP2016` [*paper*](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7532885) *code*
4. **Continuous sign language recognition using level building based on fast hidden Markov model** `Pattern Recognit. Lett. 2016` [*paper*](https://www.sciencedirect.com/science/article/pii/S0167865516300344) *code*
5. **Sign Transition Modeling and a Scalable Solution to Continuous Sign Language Recognition for Real-World Applications** `TACCESS2016` [*paper*](https://dl.acm.org/doi/pdf/10.1145/2850421?download=true) *code*
6. **A Threshold-based HMM-DTW Approach for Continuous Sign Language Recognition** `ICIMCS2014` [*paper*](https://dl.acm.org/doi/pdf/10.1145/2632856.2632931?download=true) *code*
7. **Improving Continuous Sign Language Recognition: Speech Recognition Techniques and System Design** `SLPAT2013` [*paper*](https://pdfs.semanticscholar.org/91e4/220449ea1d7ed2b49c916dd89af850c69b26.pdf) *code*
8. **Using Viseme Recognition to Improve a Sign Language Translation System** `IWSLT2013` [*paper*](https://pdfs.semanticscholar.org/e567/428f531e973a4544f1884d9d7e7aa59953a6.pdf) *code*
9. **Advances in phonetics-based sub-unit modeling for transcription alignment and sign language recognition** `CVPRW2011` [*paper*](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5981681&tag=1) *code*
10. **Speech Recognition Techniques for a Sign Language Recognition System** `INTERSPEECH2007` [*paper*](https://www-i6.informatik.rwth-aachen.de/publications/download/154/DreuwPhilippeRybachDavidDeselaersThomasZahediMortezaNeyHermann--SpeechRecognitionTechniquesforaSignLanguageRecognitionSystem--2007.pdf) *code*
11. **Large-Vocabulary Continuous Sign Language Recognition Based on Transition-Movement Models** `TSMC2007` [*paper*](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4032919) *code*
12. **Real-time American sign language recognition using desk and wearable computer based video** `TPAMI1998` [*paper*](http://luthuli.cs.uiuc.edu/~daf/courses/Signals%20AI/Papers/HMMs/00735811.pdf) *code*

### CNN+RNN

*Here CNN covers both 2D and 3D CNNs, and RNN covers LSTM and BLSTM variants.*

1. **SF-Net: Structured Feature Network for Continuous Sign Language Recognition** `ArXiv2019` [*paper*](https://arxiv.org/pdf/1908.01341.pdf) *code*
2. **Iterative Alignment Network for Continuous Sign Language Recognition** `CVPR2019` [*paper*](http://openaccess.thecvf.com/content_CVPR_2019/papers/Pu_Iterative_Alignment_Network_for_Continuous_Sign_Language_Recognition_CVPR_2019_paper.pdf) *code*
3. **Fingerspelling recognition in the wild with iterative visual attention** `ICCV2019` [*paper*](https://arxiv.org/pdf/1908.10546.pdf) *code*
4. **Zero-Shot Sign Language Recognition: Can Textual Data Uncover Sign Languages?** `BMVC2019` [*paper*](https://arxiv.org/pdf/1907.10292.pdf) *code*
5. **Sign Language Recognition Analysis using Multimodal Data** `DSAA2019` [*paper*](https://arxiv.org/pdf/1909.11232.pdf) *code*
6. **Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks** `ACCESS2019` [*paper*](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8667292) *code*
7. **Attention in Convolutional LSTM for Gesture Recognition** `NIPS2018` [*paper*](http://papers.nips.cc/paper/7465-attention-in-convolutional-lstm-for-gesture-recognition.pdf) [*code*](https://github.com/GuangmingZhu/AttentionConvLSTM)
8. **Hierarchical LSTM for Sign Language Translation** `AAAI2018` [*paper*](https://pdfs.semanticscholar.org/d44c/20c48e764a546d00b9155a56b171b0dc04bc.pdf) *code*
9. **Neural Sign Language Translation** `CVPR2018` [*paper*](http://openaccess.thecvf.com/content_cvpr_2018/papers/Camgoz_Neural_Sign_Language_CVPR_2018_paper.pdf) [*code*](https://github.com/neccam/nslt)
10. **SubUNets: End-to-End Hand Shape and Continuous Sign Language Recognition** `ICCV2017` [*paper*](http://openaccess.thecvf.com/content_ICCV_2017/papers/Camgoz_SubUNets_End-To-End_Hand_ICCV_2017_paper.pdf) [*code*](https://github.com/neccam/SubUNets)
11. **Recurrent Convolutional Neural Networks for Continuous Sign Language Recognition by Staged Optimization** `CVPR2017` [*paper*](http://openaccess.thecvf.com/content_cvpr_2017/papers/Cui_Recurrent_Convolutional_Neural_CVPR_2017_paper.pdf) *code*
12. **Learning Spatiotemporal Features Using 3DCNN and Convolutional LSTM for Gesture Recognition** `ICCVW2017` [*paper*](http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w44/Zhang_Learning_Spatiotemporal_Features_ICCV_2017_paper.pdf) [*code*](https://github.com/GuangmingZhu/Conv3D_BICLSTM)
13. **Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks** `CVPR2016` [*paper*](https://research.nvidia.com/sites/default/files/pubs/2016-06_Online-Detection-and/NVIDIA_R3DCNN_cvpr2016.pdf) *code*

### 3D CNN

1. **Dense Temporal Convolution Network for Sign Language Translation** `IJCAI2019` [*paper*](https://www.ijcai.org/Proceedings/2019/0105.pdf) *code*
2. **Thai Sign Language Recognition Using 3D Convolutional Neural Networks** `ICCCM2019` [*paper*](https://dl.acm.org/doi/pdf/10.1145/3348445.3348452?download=true) *code*
3. **Video-based sign language recognition without temporal segmentation** `AAAI2018` [*paper*](https://arxiv.org/pdf/1801.10111.pdf) *code*
4. **Using Convolutional 3D Neural Networks for User-independent continuous gesture recognition** `ICPR2016` [*paper*](http://personal.ee.surrey.ac.uk/Personal/S.Hadfield/papers/camgoz2016icprw.pdf) *code*
5. **Hand Gesture Recognition with 3D Convolutional Neural Networks** `CVPRW2015` [*paper*](https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W15/papers/Molchanov_Hand_Gesture_Recognition_2015_CVPR_paper.pdf) *code*
6. **Sign Language Recognition using 3D convolutional neural networks** `ICME2015` [*paper*](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7177428) *code*

### GCN

1. **Spatial-Temporal Graph Convolutional Networks for Sign Language Recognition** `ICANN2019` [*paper*](https://arxiv.org/pdf/1901.11164.pdf) [*code*](https://github.com/amorim-cleison/st-gcn-sl)

### Others

1. **Sign Language Translation with Transformers** `ArXiv2020` [*paper*](https://arxiv.org/pdf/2004.00588.pdf) [*code*](https://github.com/kayoyin/transformer-slt)
2. **Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation** `ArXiv2020` [*paper*](https://arxiv.org/pdf/2003.13830.pdf) [*code*](https://github.com/neccam/slt)
3. **Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison** `WACV2020` [*paper*](https://arxiv.org/pdf/1910.11006.pdf) [*code*](https://github.com/dxli94/WLASL)
4. **Human-like sign-language learning method using deep learning** `ETRI2018` [*paper*](https://onlinelibrary.wiley.com/doi/pdf/10.4218/etrij.2018-0066) *code*
5. **Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs** `CVPR2017` [*paper*](https://www-i6.informatik.rwth-aachen.de/publications/download/1031/KollerOscarZargaranSepehrNeyHermann--Re-SignRe-AlignedEnd-to-EndSequenceModellingwithDeepRecurrentCNN-HMMs--2017.pdf) [*code*](https://github.com/huerlima/Re-Sign-Re-Aligned-End-to-End-Sequence-Modelling-with-Deep-Recurrent-CNN-HMMs)
6. **Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data is Continuous and Weakly Labelled** `CVPR2016` [*paper*](https://www-i6.informatik.rwth-aachen.de/publications/download/1000/KollerOscarNeyHermannBowdenRichard--DeepHHowtoTrainaCNNon1MillionHImagesWhenYourDataIsContinuousWeaklyLabelled--2016.pdf) *code*
7. **Deep Sign: Hybrid CNN-HMM for Continuous Sign Language Recognition** `BMVC2016` [*paper*](https://pdfs.semanticscholar.org/7b2f/db4a2f79a638ad6c5328cd2860b63fdfc100.pdf) *code*
8. **Sign Language Recognition with Long Short-Term Memory** `ICIP2016` [*paper*](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7532884) *code*
9. **Iterative Reference Driven Metric Learning for Signer Independent Isolated Sign Language Recognition** `ECCV2016` [*paper*](http://vipl.ict.ac.cn/uploadfile/upload/2018112115134267.pdf) *code*
10. **Automatic Alignment of HamNoSys Subunits for Continuous Sign Language Recognition** `LREC2016` [*paper*](https://pdfs.semanticscholar.org/f6a3/c3ab709eebc91f9639fe6d26b7736c3115b2.pdf) *code*
11. **Curve Matching from the View of Manifold for Sign Language Recognition** `ACCV2014` [*paper*](http://whdeng.cn/FSLCV14/pdffiles/w12-o7.pdf) *code*
12. **Sign Language Recognition and Translation with Kinect** `AFGR2013` [*paper*](https://pdfs.semanticscholar.org/0450/ecef50fd1f532fe115c5d32c7c3ebed6fd80.pdf) *code*
13. **Large-scale Learning of Sign Language by Watching TV (Using Cooccurrences)** `BMVC2013` [*paper*](http://www.robots.ox.ac.uk:5000/~vgg/publications/2013/Pfister13/pfister13.pdf) *code*
14. **Sign Language Recognition using Sequential Pattern Trees** `CVPR2012` [*paper*](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.261.3830&rep=rep1&type=pdf) *code*
15. **Sign language recognition using sub-units** `JMLR2012` [*paper*](http://www.jmlr.org/papers/volume13/cooper12a/cooper12a.pdf) *code*
16. **American Sign Language word recognition with a sensory glove using artificial neural networks** `Eng. Appl. Artif. Intell. 2011` [*paper*](https://www.sciencedirect.com/science/article/pii/S0952197611001230) *code*
17. **Learning sign language by watching TV (using weakly aligned subtitles)** `CVPR2009` [*paper*](https://www.robots.ox.ac.uk/~vgg/publications/2009/Buehler09/buehler09.pdf) *code*

## Datasets

| Dataset | Language | Classes | Samples | Data Type | Language Level |
| ------- | -------- | ------- | ------- | --------- | -------------- |
| **[CSL Dataset I](http://home.ustc.edu.cn/~pjh/openresources/cslr-dataset-2015/index.html)** | Chinese | 500 | 125,000 | Videos & depth from Kinect | isolated |
| **[CSL Dataset II](http://home.ustc.edu.cn/~pjh/openresources/cslr-dataset-2015/index.html)** | Chinese | 100 | 25,000 | Videos & depth from Kinect | continuous |
| **[RWTH-PHOENIX-Weather 2014](https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX/)** | German | 1,081 | 6,841 | Videos | continuous |
| **[RWTH-PHOENIX-Weather 2014 T](https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX-2014-T/)** | German | 1,066 | 8,257 | Videos | continuous |
| **[ASLLVD](http://www.bu.edu/asllrp/av/dai-asllvd.html)** | American | 3,300 | 9,800 | Videos (multiple angles) | isolated |
| **[ASLLVD-Skeleton](https://www.cin.ufpe.br/~cca5/asllvd-skeleton/)** | American | 3,300 | 9,800 | Skeleton | isolated |
| **[SIGNUM](https://www.phonetik.uni-muenchen.de/forschung/Bas/SIGNUM/)** | German | 450 | 33,210 | Videos | continuous |
| **[DGS Kinect 40](http://personal.ee.surrey.ac.uk/Personal/H.Cooper/research/papers/Ong_Sign_2012.pdf)** | German | 40 | 3,000 | Videos (multiple angles) | isolated |
| **[DEVISIGN-G](http://vipl.ict.ac.cn/homepage/ksl/data.html)** | Chinese | 36 | 432 | Videos | isolated |
| **[DEVISIGN-D](http://vipl.ict.ac.cn/homepage/ksl/data.html)** | Chinese | 500 | 6,000 | Videos | isolated |
| **[DEVISIGN-L](http://vipl.ict.ac.cn/homepage/ksl/data.html)** | Chinese | 2,000 | 24,000 | Videos | isolated |
| **[LSA64](http://facundoq.github.io/unlp/lsa64/)** | Argentinian | 64 | 3,200 | Videos | isolated |
| **[GSL isol.](https://vcl.iti.gr/dataset/gsl/)** | Greek | 310 | 40,785 | Videos & depth from RealSense | isolated |
| **[GSL SD](https://vcl.iti.gr/dataset/gsl/)** | Greek | 310 | 10,290 | Videos & depth from RealSense | continuous |
| **[GSL SI](https://vcl.iti.gr/dataset/gsl/)** | Greek | 310 | 10,290 | Videos & depth from RealSense | continuous |
| **[IIITA-ROBITA](https://robita.iiita.ac.in/dataset.php)** | Indian | 23 | 605 | Videos | isolated |
| **[PSL Kinect](http://vision.kia.prz.edu.pl/dynamickinect.php)** | Polish | 30 | 300 | Videos & depth from Kinect | isolated |
| **[PSL ToF](http://vision.kia.prz.edu.pl/dynamictof.php)** | Polish | 84 | 1,680 | Videos & depth from ToF camera | isolated |
| **[BUHMAP-DB](https://www.cmpe.boun.edu.tr/pilab/pilabfiles/databases/buhmap/)** | Turkish | 8 | 440 | Videos | isolated |
| **[LSE-Sign](http://lse-sign.bcbl.eu/web-busqueda/)** | Spanish | 2,400 | 2,400 | Videos | isolated |
| **[Purdue RVL-SLLL](https://engineering.purdue.edu/RVL/Database/ASL/asl-database-front.htm)** | American | 39 | 546 | Videos | isolated |
| **[RWTH-BOSTON-50](http://www-i6.informatik.rwth-aachen.de/aslr/database-rwth-boston-50.php)** | American | 50 | 483 | Videos (multiple angles) | isolated |
| **[RWTH-BOSTON-104](http://www-i6.informatik.rwth-aachen.de/aslr/database-rwth-boston-104.php)** | American | 104 | 201 | Videos (multiple angles) | continuous |
| **[RWTH-BOSTON-400](http://www-i6.informatik.rwth-aachen.de/~dreuw/database.php)** | American | 400 | 843 | Videos | continuous |
| **[WLASL](https://dxli94.github.io/WLASL/)** | American | 2,000 | 21,083 | Videos | isolated |

## Related Fields

### Action Recognition

1. **DistInit: Learning Video Representations Without a Single Labeled Video** `ICCV2019` [*paper*](https://arxiv.org/pdf/1901.09244.pdf) *code*
2. **SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition** `ICCV2019` [*paper*](http://openaccess.thecvf.com/content_ICCV_2019/papers/Korbar_SCSampler_Sampling_Salient_Clips_From_Video_for_Efficient_Action_Recognition_ICCV_2019_paper.pdf) *code*
3. **Reasoning About Human-Object Interactions Through Dual Attention Networks** `ICCV2019` [*paper*](https://arxiv.org/pdf/1909.04743) *code*
4. **SlowFast Networks for Video Recognition** `ICCV2019` [*paper*](http://openaccess.thecvf.com/content_ICCV_2019/papers/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.pdf) [*code*](https://github.com/facebookresearch/SlowFast)
5. **Video Classification with Channel-Separated Convolutional Networks** `ICCV2019` [*paper*](https://arxiv.org/pdf/1904.02811) [*code*](https://github.com/facebookresearch/VMZ)
6. **BMN: Boundary-Matching Network for Temporal Action Proposal Generation** `ICCV2019` [*paper*](https://arxiv.org/pdf/1907.09702) [*code*](https://github.com/JJBOY/BMN-Boundary-Matching-Network)
7. **DynamoNet: Dynamic Action and Motion Network** `ICCV2019` [*paper*](https://arxiv.org/pdf/1904.11407.pdf) *code*
8. **Graph Convolutional Networks for Temporal Action Localization** `ICCV2019` [*paper*](https://arxiv.org/pdf/1909.03252) [*code*](https://github.com/Alvin-Zeng/PGCN)
9. **Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?** `CVPR2018` [*paper*](https://arxiv.org/pdf/1711.09577.pdf) [*code*](https://github.com/kenshohara/3D-ResNets-PyTorch)
10. **A Closer Look at Spatiotemporal Convolutions for Action Recognition** `CVPR2018` [*paper*](https://arxiv.org/pdf/1711.11248.pdf) [*code*](https://github.com/facebookresearch/VMZ)
11. **Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition** `AAAI2018` [*paper*](https://arxiv.org/pdf/1801.07455.pdf) [*code*](https://github.com/yysijie/st-gcn)
12. **Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation** `IJCAI2018` [*paper*](https://arxiv.org/pdf/1804.06055) [*code*](https://github.com/huguyuehuhu/HCN-pytorch)
13. **Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset** `CVPR2017` [*paper*](https://arxiv.org/pdf/1705.07750.pdf) [*code*](https://github.com/deepmind/kinetics-i3d)
14. **Action Recognition using Visual Attention** `ICLR2016` [*paper*](https://arxiv.org/pdf/1511.04119v2.pdf) [*code*](https://github.com/kracwarlock/action-recognition-visual-attention)
15. **Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors** `CVPR2015` [*paper*](https://arxiv.org/pdf/1505.04868) [*code*](https://github.com/wanglimin/TDD)
16. **Two-Stream Convolutional Networks for Action Recognition in Videos** `NIPS2014` [*paper*](https://arxiv.org/pdf/1406.2199.pdf) [*code*](https://github.com/jeffreyyihuang/two-stream-action-recognition)

### Speech Recognition

1. **State-of-the-art Speech Recognition With Sequence-to-Sequence Models** `ICASSP2018` [*paper*](https://arxiv.org/pdf/1712.01769.pdf) *code*
2. **Lip Reading Sentences in the Wild** `CVPR2017` [*paper*](https://arxiv.org/pdf/1611.05358.pdf) [*code*](https://github.com/ajinkyaT/Lip_Reading_in_the_Wild_AVSR)
3. **Listen, Attend and Spell** `ICASSP2016` [*paper*](https://arxiv.org/pdf/1508.01211v2.pdf) [*code*](https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch)
4. **Deep Speech 2: End-to-End Speech Recognition in English and Mandarin** `ICML2016` [*paper*](http://proceedings.mlr.press/v48/amodei16.pdf) [*code*](https://github.com/PaddlePaddle/DeepSpeech)
5. **Attention-Based Models for Speech Recognition** `NIPS2015` [*paper*](https://papers.nips.cc/paper/5847-attention-based-models-for-speech-recognition.pdf) *code*
6. **Convolutional Neural Networks for Speech Recognition** `TASLP2014` [*paper*](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CNN_ASLPTrans2-14.pdf) [*code*](https://github.com/cmaroti/speech_recognition)
7. **Hybrid speech recognition with Deep Bidirectional LSTM** `ASRU2013` [*paper*](https://www.cs.toronto.edu/~graves/asru_2013.pdf) *code*
8. **New types of deep neural network learning for speech recognition and related applications: an overview** `ICASSP2013` [*paper*](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639344) *code*
9. **Speech Recognition with Deep Recurrent Neural Networks** `ICASSP2013` [*paper*](https://arxiv.org/pdf/1303.5778.pdf) [*code*](https://github.com/lucko515/speech-recognition-neural-network)

### Video Captioning

1. **Video Description: A Survey of Methods, Datasets and Evaluation Metrics** `CSUR2019` [*paper*](https://arxiv.org/pdf/1806.00186.pdf) *code*
2. **Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning** `CVPR2019` [*paper*](https://arxiv.org/pdf/1902.10322.pdf) *code*
3. **Reconstruction Network for Video Captioning** `CVPR2018` [*paper*](https://arxiv.org/pdf/1803.11438.pdf) [*code*](https://github.com/hobincar/RecNet)
4. **Multimodal Memory Modelling for Video Captioning** `CVPR2018` [*paper*](http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_M3_Multimodal_Memory_CVPR_2018_paper.pdf) *code*
5. **Interpretable Video Captioning via Trajectory Structured Localization** `CVPR2018` [*paper*](https://zpascal.net/cvpr2018/Wu_Interpretable_Video_Captioning_CVPR_2018_paper.pdf) *code*
6. **Video Captioning with Transferred Semantic Attributes** `CVPR2017` [*paper*](https://arxiv.org/pdf/1611.07675.pdf) *code*
7. **Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks** `CVPR2016` [*paper*](https://arxiv.org/pdf/1510.07712.pdf) *code*
8. **Jointly Modeling Embedding and Translation to Bridge Video and Language** `CVPR2016` [*paper*](http://openaccess.thecvf.com/content_cvpr_2016/papers/Pan_Jointly_Modeling_Embedding_CVPR_2016_paper.pdf) *code*
9. **Describing Videos by Exploiting Temporal Structure** `ICCV2015` [*paper*](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yao_Describing_Videos_by_ICCV_2015_paper.pdf) [*code*](https://github.com/yaoli/arctic-capgen-vid)
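
## Appendix: the CNN+RNN pipeline shape

Most entries in the CNN+RNN section share one pipeline shape: a per-frame feature extractor followed by a recurrence that aggregates frame features over time. The sketch below is an illustrative toy of that shape only, with NumPy stand-ins for the CNN and the LSTM; the shapes, the plain tanh recurrence, and the random projections are this README's illustration, not code from any listed paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_features(frames):
    """Stand-in for a 2D CNN backbone: map each (H, W) frame to a 64-d feature."""
    W = rng.standard_normal((frames.shape[1] * frames.shape[2], 64)) * 0.01
    return frames.reshape(len(frames), -1) @ W          # (T, 64)

def rnn_aggregate(feats, hidden=32):
    """Stand-in for an (B)LSTM: a simple tanh recurrence over the T frames."""
    Wx = rng.standard_normal((feats.shape[1], hidden)) * 0.01
    Wh = rng.standard_normal((hidden, hidden)) * 0.01
    h = np.zeros(hidden)
    for x in feats:                                     # one step per frame
        h = np.tanh(x @ Wx + h @ Wh)
    return h                                            # final state summarizes the clip

video = rng.standard_normal((16, 32, 32))               # toy clip: 16 frames of 32x32
clip_embedding = rnn_aggregate(cnn_features(video))     # shape (32,)
```

In the listed systems the stand-ins are replaced by trained CNN backbones and (B)LSTM layers, and the final state feeds a classifier (isolated recognition) or a sequence decoder (continuous recognition or translation).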