# Deep Learning Temporal Action Detection

*Last updated: 2019/9/25*

## Performance Table

| Detector | THUMOS (mAP@IoU=0.5) | ANET (mAP@IoU=0.5) | Speed (FPS) | Published In |
|:-------------:|:--------------------:|:-------------------:|-------|-------------:|
| LAF | 4.4 | - | | ACMMM'15 |
| RNN & RL | 17.1 | - | | CVPR'16 |
| PSDF | 18.8 | - | | CVPR'16 |
| SSAD | 24.6 | - | | ACMMM'17 |
| SCNN | 19.0 | - | | CVPR'16 |
| SCNNv2 | 19.0 | - | | WACV'16 |
| CBR | 31.0 | - | | BMVC'17 |
| SMS | 14.8 | - | | CVPR'17 |
| UntrimmedNets | 13.7 | 7.2 | | CVPR'17 |
| TCN | 25.6 | 23.6 | | ICCV'17 |
| ETE SSTAD | 29.2 | - | | BMVC'17 |
| CDC | 23.3 | 22.9 | 500 | ICCV'17 |
| SMS | 17.8 | - | | CVPR'17 |
| SCC | 19.3 | - | | CVPR'17 |
| TAG | 28.3 | - | | CVPR'17 |
| SST | 19.3 | - | | CVPR'17 |
| SSTAD | 24.6 | - | | ACMMM'17 |
| TURN-TAP | 25.6 | - | 880 | ICCV'17 |
| R-C3D | 28.9 | 16.7 | | ICCV'17 |
| TAG+SSN | 29.1 | 28.3 | | ICCV'17 |
| ETP | 34.2 | - | | CVPR'18 |
| TAL-Net | 42.0 | 38.2 | | CVPR'18 |
| STPN | 16.9 | 29.3 | | CVPR'18 |
| TPN | 27.6 | - | | AAAI'18 |
| SAP | 27.7 | - | | AAAI'18 |
| WSTAD | 15.9 | 27.3 | | ACMMM'18 |
| BSN | 36.9 | - | | ECCV'18 |
| AutoLoc | 21.2 | 27.3 | | ECCV'18 |
| W-TALC | 22.8 | 37.0 | | ECCV'18 |
| CPMN | 16.1 | 39.3 | | ACCV'18 |
| STAR | 23.0 | 31.1 | | AAAI'19 |
| BMN | 38.8 | 39.4 | | ICCV'19 |
| MGG | 39.9 | - | | CVPR'19 |

## Dataset
- **[THUMOS]** THUMOS Challenge 2014 |[`[Homepage]`](http://crcv.ucf.edu/THUMOS14/)

- **[Activity-Net]** A Large-Scale Video Benchmark for Human Activity Understanding |[`[Homepage]`](http://activity-net.org/index.html)

- **[COIN]** A Large-scale Dataset for Comprehensive Instructional Video Analysis |[`[Homepage]`](https://coin-dataset.github.io/)

## 2015
- **[LAF]** Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images | **[ACMMM'15]** |[`[pdf]`](http://cn.arxiv.org/pdf/1504.00983.pdf)

## 2016
- **[RNN & RL]** End-to-end Learning of Action Detection from Frame Glimpses in Videos | **[CVPR'16]** |[`[pdf]`](https://arxiv.org/pdf/1511.06984.pdf)

- **[PSDF]** Temporal Action Localization with Pyramid of Score Distribution Features | **[CVPR'16]** |[`[pdf]`](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yuan_Temporal_Action_Localization_CVPR_2016_paper.pdf)

- **[SCNN]** Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs | **[CVPR'16]** |[`[pdf]`](https://arxiv.org/pdf/1601.02129.pdf)

- **[SCNNv2]** Efficient Action Detection in Untrimmed Videos via Multi-Task Learning | **[WACV'16]** |[`[pdf]`](https://arxiv.org/pdf/1612.07403.pdf)

## 2017
- **[SSAD]** Single Shot Temporal Action Detection | **[ACMMM'17]** |[`[pdf]`](https://arxiv.org/pdf/1710.06236.pdf)

- **[CBR]** Cascaded Boundary Regression for Temporal Action Detection | **[BMVC'17]** |[`[pdf]`](https://arxiv.org/pdf/1705.01180.pdf)

- **[SMS]** Temporal Action Localization by Structured Maximal Sums | **[CVPR'17]** |[`[pdf]`](http://cn.arxiv.org/pdf/1704.04671v1.pdf)

- **[UntrimmedNets]** UntrimmedNets for Weakly Supervised Action Recognition and Detection | **[CVPR'17]** |[`[pdf]`](http://cn.arxiv.org/pdf/1703.03329.pdf)

- **[TCN]** Temporal Context Network for Activity Localization in Videos | **[ICCV'17]** |[`[pdf]`](https://arxiv.org/pdf/1708.02349.pdf)

- **[SCC]** Semantic Context Cascade for Efficient Action Detection | **[CVPR'17]** |[`[pdf]`](http://openaccess.thecvf.com/content_cvpr_2017/papers/Heilbron_SCC_Semantic_Context_CVPR_2017_paper.pdf)

- **[SSTAD]** Single Shot Temporal Action Detection | **[ACMMM'17]** |[`[pdf]`](https://arxiv.org/pdf/1710.06236.pdf)

- **[ETE SSTAD]** End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos | **[BMVC'17]** |[`[pdf]`](http://vision.stanford.edu/pdf/buch2017bmvc.pdf)

- **[SST]** Single-Stream Temporal Action Proposals | **[CVPR'17]** |[`[pdf]`](http://vision.stanford.edu/pdf/buch2017cvpr.pdf)

- **[TURN-TAP]** Temporal Unit Regression Network for Temporal Action Proposals | **[ICCV'17]** |[`[pdf]`](https://arxiv.org/pdf/1703.06189.pdf)

- **[Hide-and-Seek]** Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization | **[ICCV'17]** |[`[pdf]`](http://cn.arxiv.org/pdf/1704.04232.pdf)

- **[TAG]** A Pursuit of Temporal Accuracy in General Activity Detection | **[CVPR'17]** |[`[pdf]`](https://arxiv.org/pdf/1703.02716.pdf)

## 2018
- **[ETP]** Precise Temporal Action Localization by Evolving Temporal Proposals | **[CVPR'18]** |[`[pdf]`](https://arxiv.org/pdf/1804.04803.pdf)

- **[TAL-Net]** Rethinking the Faster R-CNN Architecture for Temporal Action Localization | **[CVPR'18]** |[`[pdf]`](https://arxiv.org/pdf/1804.07667.pdf)

- **[STPN]** Weakly Supervised Action Localization by Sparse Temporal Pooling Network | **[CVPR'18]** |[`[pdf]`](http://cn.arxiv.org/pdf/1712.05080.pdf)

- **[TPN]** Exploring Temporal Preservation Networks for Precise Temporal Action Localization | **[AAAI'18]** |[`[pdf]`](https://arxiv.org/pdf/1708.03280.pdf)

- **[SAP]** A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning | **[AAAI'18]** |[`[pdf]`](https://arxiv.org/pdf/1706.07251.pdf)

- **[WSTAD]** Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector | **[ACMMM'18]** |[`[pdf]`](http://cn.arxiv.org/pdf/1807.02929.pdf)

- **[BSN]** Boundary Sensitive Network for Temporal Action Proposal Generation | **[ECCV'18]** |[`[pdf]`](https://arxiv.org/pdf/1806.02964.pdf)

- **[AutoLoc]** AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos | **[ECCV'18]** |[`[pdf]`](http://cn.arxiv.org/pdf/1807.08333.pdf)

- **[W-TALC]** W-TALC: Weakly-supervised Temporal Activity Localization and Classification | **[ECCV'18]** |[`[pdf]`](http://cn.arxiv.org/pdf/1807.10418)

- **[CPMN]** Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization | **[ACCV'18]** |[`[pdf]`](http://cn.arxiv.org/pdf/1810.11794.pdf)

## 2019
- **[STAR]** Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection | **[AAAI'19]** |[`[pdf]`](http://cn.arxiv.org/pdf/1811.07460.pdf)

- **[BMN]** BMN: Boundary-Matching Network for Temporal Action Proposal Generation | **[ICCV'19]** |[`[pdf]`](https://arxiv.org/pdf/1907.09702.pdf)

- **[MGG]** Multi-granularity Generator for Temporal Action Proposal | **[CVPR'19]** |[`[pdf]`](https://arxiv.org/pdf/1811.11524.pdf)
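
## Evaluation Metric

The mAP@IoU=0.5 figures in the Performance Table count a predicted segment as correct when its temporal IoU with a ground-truth segment is at least 0.5. The sketch below illustrates that overlap test only; it is not taken from any of the listed papers or from the official evaluation toolkits. The function names `temporal_iou` and `is_true_positive` and the `(start, end)` tuple format in seconds are assumptions made for illustration.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Overlap length is zero when the segments do not intersect.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(prediction, ground_truth, threshold=0.5):
    """Hypothetical helper: does the prediction match at the given IoU threshold?"""
    return temporal_iou(prediction, ground_truth) >= threshold


if __name__ == "__main__":
    ground_truth = (0.0, 10.0)
    print(temporal_iou((2.0, 10.0), ground_truth))      # overlap 8 s, union 10 s
    print(is_true_positive((5.0, 15.0), ground_truth))  # overlap 5 s, union 15 s
```

A full mAP computation additionally ranks detections by confidence, matches each ground-truth segment at most once, and averages the per-class average precision; those steps are omitted here.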