├── README.md
├── img
    ├── pipeline.png
    ├── quantititive results.png
    ├── review architecture.png
    └── visual.png
└── source code
    ├── Readme
    ├── STANet.zip
    ├── STAViS.zip
    └── ViNet.rar


/README.md:
--------------------------------------------------------------------------------
  1 | # SCDL
  2 | ## Description 
  3 | This is the source code for "A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!"
  4 | [paper link](https://ieeexplore.ieee.org/document/9874810 "paper link")
  5 |  
  6 | **Please contact songsook@163.com if you have any questions about the paper or the repository.**
  7 | 
  8 | ![alt](https://github.com/MengkeSong/SCDL/blob/main/img/review%20architecture.png)
  9 | <p align="center">Figure 1: The structure of our review.</p>
 10 | 
 11 | ![alt](https://github.com/MengkeSong/SCDL/blob/main/img/pipeline.png)
 12 | <p align="center">Figure 2: Demonstrations of the differences between the conventional audio-visual saliency detection model training/testing pipeline (a) and the newly modified training/testing pipeline (b).</p>
 13 | 
 14 | ![alt](https://github.com/MengkeSong/SCDL/blob/main/img/visual.png)
 15 | <p align="center">Figure 3: Demonstration of the differences in scene content among six widely used datasets: AVAD, Coutrot1, Coutrot2, DIEM, ETMD and SumMe.</p>
 16 | 
 17 | ## Getting Started
 18 | ### Requirements
 19 | * Python 3.7
 20 | * PyTorch 1.6.0
 21 | * CUDA v10.1, cuDNN v7.5.0
 22 | * torchvision
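
To compare an existing environment against the list above, a minimal sketch along the following lines may help. The function name `describe_environment` is hypothetical and not part of this repository; it only reports what it finds and does not enforce any version.

```python
import sys


def describe_environment():
    """Collect the versions named in the Requirements list, where available.

    Hypothetical helper, not part of this repository. A value of None means
    the component could not be found.
    """
    info = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import torch
    except ImportError:
        info["torch"] = None
        info["cuda"] = None
        info["cudnn"] = None
    else:
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built with; None for CPU-only builds.
        info["cuda"] = torch.version.cuda
        # cuDNN version as an integer; None if cuDNN is unavailable.
        info["cudnn"] = torch.backends.cudnn.version()
    try:
        import torchvision
    except ImportError:
        info["torchvision"] = None
    else:
        info["torchvision"] = torchvision.__version__
    return info


if __name__ == "__main__":
    for name, version in describe_environment().items():
        shown = version if version is not None else "not found"
        print("{:<12} {}".format(name, shown))
```

The imports are wrapped in `try`/`except` so the script still prints a report on a machine where PyTorch or torchvision has not been installed yet.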
 23 | 
 24 | ### Usage
 25 | 1. Clone the repository
 26 | `git clone https://github.com/MengkeSong/SCDL.git`
 27 | `cd SCDL/`
 28 | 
 29 | 2. Download the datasets
 30 | 
 31 | **Google Drive** [link1](https://drive.google.com/drive/folders/1Zgfe9pr4sryJrWGlIYw-sl4u2_pn3gre?usp=sharing "data link") (annotation, audio, fold list, and av gt label) and [link2](https://drive.google.com/drive/folders/1vWNoOF0hul1ASumKAG2EaF29bMIf0ohO?usp=sharing "data link") (video frame).
 32 | 
 33 | **Baidu Netdisk** [link](https://pan.baidu.com/s/1eqesDvK-7KUKL2VBshct6A "data link") (extraction code: r5ca).
 34 | 
 35 | Download the datasets and unzip them into the `your_data` folder. All datasets and the labeled Audio-visual Consistency Degree are available from either Baidu Netdisk or Google Drive.
 36 | 
 37 | Then arrange them in the following directory structure:
 38 | ```
 39 | -video_frame/
 40 |   -AVAD/
 41 |   -Coutrot1/
 42 |   -Coutrot2/
 43 |   -DIEM/
 44 |   -ETMD/
 45 |   -SumMe/
 46 | 
 47 | -annotation/
 48 |   -AVAD/
 49 |   -Coutrot1/
 50 |   -Coutrot2/
 51 |   -DIEM/
 52 |   -ETMD/
 53 |   -SumMe/
 54 | 
 55 | -audio/
 56 |   -AVAD/
 57 |   -Coutrot1/
 58 |   -Coutrot2/
 59 |   -DIEM/
 60 |   -ETMD/
 61 |   -SumMe/
 62 | 
 63 | -fold_list/
 64 |   -AVAD/
 65 |   -Coutrot1/
 66 |   -Coutrot2/
 67 |   -DIEM/
 68 |   -ETMD/
 69 |   -SumMe/
 70 | 
 71 | -av_gt_label/
 72 |   -AVAD/
 73 |   -Coutrot1/
 74 |   -Coutrot2/
 75 |   -DIEM/
 76 |   -ETMD/
 77 |   -SumMe/
 78 | ```
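
Before training, it can be useful to confirm that the unzipped data matches the layout above. The following is a minimal sketch under two assumptions: the leading hyphens in the listing are list markers rather than part of the folder names, and every top-level folder contains all six dataset subfolders. The names `TOP_LEVEL`, `DATASETS` and `missing_dirs` are hypothetical and not part of this repository.

```python
import os
import sys

# Folder names taken from the layout above (leading hyphens assumed to be list markers).
TOP_LEVEL = ["video_frame", "annotation", "audio", "fold_list", "av_gt_label"]
DATASETS = ["AVAD", "Coutrot1", "Coutrot2", "DIEM", "ETMD", "SumMe"]


def missing_dirs(data_root):
    """Return the expected <top-level>/<dataset> folders missing under data_root."""
    missing = []
    for top in TOP_LEVEL:
        for dataset in DATASETS:
            rel = os.path.join(top, dataset)
            if not os.path.isdir(os.path.join(data_root, rel)):
                missing.append(rel)
    return missing


if __name__ == "__main__":
    # Default to the placeholder folder name used in this README.
    root = sys.argv[1] if len(sys.argv) > 1 else "your_data"
    gaps = missing_dirs(root)
    if gaps:
        print("Missing under {}:".format(root))
        for rel in gaps:
            print("  " + rel)
    else:
        print("All expected folders found under {}.".format(root))
```

Run it with the data folder as the only argument, for example `python check_layout.py your_data` (the script name is likewise an assumption). It checks folder presence only, not the files inside.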
 79 | 
 80 | 3. **Training & Testing**
 81 | 
 82 | Our method is based on the source code of [STANet](https://github.com/guotaowang/STANet), [STAViS](https://github.com/atsiami/STAViS) and [AViNet](https://github.com/samyak0210/ViNet). The original code of this paper will be made publicly available as soon as it has been reorganized.
 83 | 
 84 | To train and test in the meantime, you can modify the STANet, STAViS and AViNet source code yourself, following the technical details described in our manuscript. The versions of STANet, STAViS and AViNet that we used are provided as archive files in `/source code/`.
 85 | 
 86 | 
 87 | ### Results
 88 | ![alt](https://github.com/MengkeSong/SCDL/blob/main/img/quantititive%20results.png)
 89 | <p align="center">Figure 4: Quantitative comparisons between our method and other fully-/weakly-/un-supervised methods on all 6 datasets.</p>
 90 | 
 91 | ## Citation
 92 | Please cite the following article when referring to this method.
 93 | ```
 94 | @ARTICLE{Chen2022SCDL,
 95 |       title={A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!}, 
 96 |       author={Chenglizhao Chen and Mengke Song and Wenfeng Song and Li Guo and Muwei Jian},
 97 |       year={2022},
 98 | }
 99 | ```
100 | 
101 | ## Acknowledgement 
102 | Thanks to [STANet](https://github.com/guotaowang/STANet), [STAViS](https://github.com/atsiami/STAViS) and [AViNet](https://github.com/samyak0210/ViNet).
103 | 


--------------------------------------------------------------------------------
/img/pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MengkeSong/SCDL/4a2cc0fce6fd4a4c36a3ed69b3a6d110e039be42/img/pipeline.png


--------------------------------------------------------------------------------
/img/quantititive results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MengkeSong/SCDL/4a2cc0fce6fd4a4c36a3ed69b3a6d110e039be42/img/quantititive results.png


--------------------------------------------------------------------------------
/img/review architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MengkeSong/SCDL/4a2cc0fce6fd4a4c36a3ed69b3a6d110e039be42/img/review architecture.png


--------------------------------------------------------------------------------
/img/visual.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MengkeSong/SCDL/4a2cc0fce6fd4a4c36a3ed69b3a6d110e039be42/img/visual.png


--------------------------------------------------------------------------------
/source code/Readme:
--------------------------------------------------------------------------------
1 | This folder contains the source code of STANet, STAViS and AViNet, not the original code of this paper. The original code of this paper will be made publicly available as soon as it has been reorganized.
2 | To train and test in the meantime, you can modify the STANet, STAViS and AViNet source code yourself, following the technical details described in our manuscript.
3 | The versions of STANet, STAViS and AViNet that we used are provided as archive files in /source code/.
4 | 


--------------------------------------------------------------------------------
/source code/STANet.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MengkeSong/SCDL/4a2cc0fce6fd4a4c36a3ed69b3a6d110e039be42/source code/STANet.zip


--------------------------------------------------------------------------------
/source code/STAViS.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MengkeSong/SCDL/4a2cc0fce6fd4a4c36a3ed69b3a6d110e039be42/source code/STAViS.zip


--------------------------------------------------------------------------------
/source code/ViNet.rar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MengkeSong/SCDL/4a2cc0fce6fd4a4c36a3ed69b3a6d110e039be42/source code/ViNet.rar


--------------------------------------------------------------------------------