├── fg
│   └── data.png
├── demo
│   ├── 2008_000885.jpg
│   ├── 2008_000885.npy
│   └── 2008_000885.wav
└── README.md

/fg/data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SenHe/Human-Attention-in-Image-Captioning/HEAD/fg/data.png
--------------------------------------------------------------------------------
/demo/2008_000885.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SenHe/Human-Attention-in-Image-Captioning/HEAD/demo/2008_000885.jpg
--------------------------------------------------------------------------------
/demo/2008_000885.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SenHe/Human-Attention-in-Image-Captioning/HEAD/demo/2008_000885.npy
--------------------------------------------------------------------------------
/demo/2008_000885.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SenHe/Human-Attention-in-Image-Captioning/HEAD/demo/2008_000885.wav
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Human Attention in Image Captioning: Dataset and Analysis (ICCV 2019)
## Introduction
This is the GitHub page for our ICCV 2019 paper ([link](https://openaccess.thecvf.com/content_ICCV_2019/papers/He_Human_Attention_in_Image_Captioning_Dataset_and_Analysis_ICCV_2019_paper.pdf)).
We provide links to the data collected for the paper:
![picture](/fg/data.png)
[capgaze1](https://drive.google.com/open?id=1qlOCr8TX6dmAxhlCob79X29riyQ_MRlq): contains 1000 images and the raw data (eye fixations, verbal descriptions, and the transcribed text descriptions) from 5 native English speakers. This part of the data was used for the analysis in the paper. For data privacy reasons, the voice in each verbal description was masked by pitch modulation; the spoken content was preserved.

[capgaze2](https://drive.google.com/drive/folders/1ghe3_7tdx2f3ejiKEnv6w_JJ39-9c9eB?usp=sharing): contains 3000 images and processed data (for each image, we combined the eye fixations from all participants into a single fixation map). This part of the data was used to develop a saliency prediction model for the image captioning task.

We also provide code for extracting the information from the collected data in the demo folder (see the example in demo.ipynb; a minimal loading sketch is also included below).

# Contact

--------------------------------------------------------------------------------
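
The demo files above can be inspected with a few lines of Python. The following is a minimal, illustrative sketch, not the repository's own extraction code (that lives in demo.ipynb). It assumes the files sit under `./demo/` and only prints basic metadata, since the exact layout inside the `.npy` file is documented in the notebook rather than here.

```python
# Minimal sketch for inspecting the demo files (illustrative only;
# see demo.ipynb for the authoritative extraction code).
import wave

import numpy as np
from PIL import Image

# Image shown to the annotator.
img = Image.open("demo/2008_000885.jpg")
print("image size:", img.size)

# Recorded gaze data; allow_pickle=True in case the array wraps a
# Python object (the exact layout is documented in demo.ipynb).
gaze = np.load("demo/2008_000885.npy", allow_pickle=True)
print("gaze payload type:", type(gaze), "shape:", getattr(gaze, "shape", None))

# Pitch-masked recording of the spoken caption.
with wave.open("demo/2008_000885.wav", "rb") as wav:
    print("audio: %d channel(s), %d Hz, %.1f s" % (
        wav.getnchannels(),
        wav.getframerate(),
        wav.getnframes() / wav.getframerate(),
    ))
```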
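For context on how the capgaze2 fixation maps could be produced, here is a hedged sketch of the standard approach: accumulate the fixation points from all viewers into a count map, then Gaussian-blur and normalize it. The repository does not document its exact procedure, so the `sigma` value and the max-normalization below are assumptions, not the authors' settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(fixations, height, width, sigma=20.0):
    """Combine (x, y) fixation points from all viewers into a blurred map.

    sigma is an assumed smoothing bandwidth in pixels, not the value
    used to build capgaze2.
    """
    fmap = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            fmap[yi, xi] += 1.0  # accumulate fixation counts per pixel
    fmap = gaussian_filter(fmap, sigma=sigma)  # spread each fixation
    if fmap.max() > 0:
        fmap /= fmap.max()  # normalize to [0, 1] for visualization
    return fmap
```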