├── .gitignore
├── MANIFEST.in
├── README.md
├── dataset
│   ├── README.md
│   ├── query_frame_annotations.csv
│   └── query_videoURLs.csv
├── qvsumm
│   ├── __init__.py
│   ├── config.ini
│   ├── model.py
│   ├── shells.py
│   └── utils_func.py
├── requirements.txt
├── setup.py
├── summarization_demo.ipynb
└── thumbnail_demo.ipynb


/.gitignore:
--------------------------------------------------------------------------------
1 | **.pyc
2 | .ipynb_checkpoints/
3 | .idea/
4 | **~
5 | data/
6 | 


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include qvsumm/config.ini


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## Query-adaptive Video Summarization via Quality-aware Relevance Estimation
 2 | 
 3 | This project creates query-specific thumbnails and summaries of videos:
 4 | the results are adapted to a user-specified text query
 5 | through the use of a textual-visual embedding.
 6 | 
 7 | For more information, see our paper:
 8 | 
 9 | "Query-adaptive Video Summarization via Quality-aware Relevance Estimation" - ACM Multimedia 2017
10 | Arun Balajee Vasudevan\*, Michael Gygli\*, Anna Volokitin, Luc Van Gool (\* denotes equal contribution)  
11 | CVLab, ETH Zurich
12 | 
13 | ### Installation
14 | 
15 | 1. Download this repository or clone with Git, and then `cd` into the root directory of the repository.
16 | 2. Install the requirements with `pip install -r requirements.txt`
17 | 3. Run `python setup.py install --user` to install the package __qvsumm__.
18 | 
19 | This will suffice to run the notebooks below. For the package to work from any 
20 | location, additionally run `export QVSUM_DATA_DIR=DATA_DIR`,
21 | where `DATA_DIR` is the absolute path of the directory `./query-video-summary/data`.
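22 | This is necessary so that the model files can be found; if the variable is unset, the package falls back to `./data` relative to the current working directory.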
23 | How to download the models is described in the notebooks.
24 | 
25 | Note: The implementation uses Lasagne (on top of Theano) and has been tested with cuDNN version 3.0.
26 | 
27 | ### Getting Started
28 | 
29 | 1. Thumbnail Extraction: this demo scores all frames of a video by their relevance to a text query and extracts the most query-relevant ones as thumbnails. Inputs: a text query and a video URL.  
30 | Run `thumbnail_demo.ipynb`.
31 | 
32 | 2. Summarization: this demo creates a query-relevant summary of a video as a set of keyframes. Inputs: a text query and a video URL.  
33 | Run `summarization_demo.ipynb`.
34 | 
35 | ### Example
36 | The figure below shows summarization results for different queries on the same [**video**](https://www.youtube.com/watch?v=oRdt9TndBVM).
37 | The scores shown in green are the similarity scores for the corresponding queries.
38 | ![Image](https://people.ee.ethz.ch/~arunv/images/summarize_result.png)
39 | 
40 | If you use the relevance prediction in this code, please cite:
41 |     
42 |     Arun Balajee Vasudevan*, Michael Gygli*, Anna Volokitin, Luc Van Gool
43 |     "Query-adaptive Video Summarization via Quality-aware Relevance Estimation"
44 |     ACM Multimedia 2017
45 |     (* denotes equal contribution)  
46 | 
47 | If you use the summarization code, please also cite the following paper, 
48 | which provides code for maximizing submodular mixtures:
49 | 
50 |     Michael Gygli, Helmut Grabner, Luc Van Gool
51 |     "Video Summarization by Learning Submodular Mixtures of Objectives,"
52 |     IEEE CVPR 2015
53 | 
54 | 
55 | 

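As a rough pre-flight check for the `QVSUM_DATA_DIR` setup described in the Installation section, the sketch below mirrors the fallback in `qvsumm/__init__.py` (use the environment variable, otherwise `./data`) and lists which files named in `qvsumm/config.ini` are missing. The helpers `resolve_data_dir` and `missing_model_files` are hypothetical and not part of the package; they only check that the files exist, not that they load.

```python
import os

# Files referenced by qvsumm/config.ini, relative to QVSUM_DATA_DIR.
REQUIRED_FILES = ["LSTMmodel.npz", "vgg19.pkl", "CNNmodel.npz"]
# get_word2vec_function() uses the full model if present, else the small one.
WORD2VEC_CANDIDATES = [
    os.path.join("word2vec", "word2vecnewsmodelfull300"),
    os.path.join("word2vec", "word2vecmodel.bin"),
]


def resolve_data_dir(environ=None):
    """Mirror qvsumm/__init__.py: QVSUM_DATA_DIR if set, otherwise ./data."""
    environ = os.environ if environ is None else environ
    return os.path.abspath(environ.get("QVSUM_DATA_DIR", "./data"))


def missing_model_files(data_dir):
    """Return the expected model files (relative paths) not found in data_dir."""
    missing = [name for name in REQUIRED_FILES
               if not os.path.isfile(os.path.join(data_dir, name))]
    if not any(os.path.isfile(os.path.join(data_dir, name))
               for name in WORD2VEC_CANDIDATES):
        missing.append(" or ".join(WORD2VEC_CANDIDATES))
    return missing


if __name__ == "__main__":
    data_dir = resolve_data_dir()
    print("QVSUM_DATA_DIR resolves to", data_dir)
    for name in missing_model_files(data_dir):
        print("missing:", name)
```

Because `qvsumm` reads its configuration at import time, the variable has to be set before `import qvsumm` runs.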

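The Getting Started section says the thumbnail demo scores every frame and then extracts query-relevant thumbnails. `get_rel_Q_scores` in `qvsumm/__init__.py` returns one relevance score and one quality score per frame, but how the notebook combines them is not shown in this listing. The sketch below is therefore a hypothetical selection rule, not the notebook's method: rank frames by relevance and optionally discard frames below an assumed quality threshold `min_quality`.

```python
def top_k_frames(frames, rel_scores, quality_scores, k=3, min_quality=None):
    """Hypothetical thumbnail picker: return the k most query-relevant frames,
    optionally dropping frames whose quality score is below min_quality."""
    candidates = [
        (rel, frame)
        for frame, rel, quality in zip(frames, rel_scores, quality_scores)
        if min_quality is None or quality >= min_quality
    ]
    # Highest relevance first; ties keep their original frame order.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [frame for _, frame in candidates[:k]]
```

A quality filter is one simple way to avoid picking blurry or low-contrast frames as thumbnails even when they match the query well.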
--------------------------------------------------------------------------------
/dataset/README.md:
--------------------------------------------------------------------------------
 1 | ### query_videoURLs.csv
 2 | This file contains the queries and the URLs of the corresponding videos extracted from YouTube.
 3 | 
 4 | ### query_frame_annotations.csv
 5 | This file contains the URLs of the extracted video frames and the relevance and diversity annotations for each frame from 5 different workers. Each worker's annotations are uniquely identified by the assignment ID in column 2.
 6 | 
 7 | ## Dataset Annotation
 8 | We first annotate the video frames with query relevance labels, and then partition the frames into clusters according to visual similarity.
 9 | Relevance annotations range from 0 to 4 for each frame (the answer options are “Trash”, “Not good”, “Good” and “Very Good”).
10 | Cluster annotations start at 0 and go up to an arbitrary number. Cluster 0 holds Trash frames, i.e. frames of low quality (e.g. blurred or with bad contrast), while clusters numbered 1 and higher denote different groups of visually similar frames. We obtain one clustering per worker, where each clustering partitions the video frames into mutually exclusive clusters.
11 | 
12 | 
13 |  
14 | 

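A minimal sketch of how the per-worker annotations described above might be combined per frame: the mean relevance over workers, and the fraction of workers who put the frame into the Trash cluster 0. The record layout `(frame_url, assignment_id, relevance, cluster)` is an assumption for illustration; apart from the assignment ID being in column 2, the column order of `query_frame_annotations.csv` is not documented here.

```python
from collections import defaultdict
from statistics import mean

TRASH_CLUSTER = 0  # cluster 0 marks low-quality (Trash) frames


def aggregate(records):
    """Map frame -> (mean relevance, fraction of workers voting Trash).

    records: iterable of hypothetical (frame_url, assignment_id, relevance, cluster) tuples.
    """
    relevance = defaultdict(list)
    trash_votes = defaultdict(list)
    for frame, _assignment_id, rel, cluster in records:
        relevance[frame].append(rel)
        trash_votes[frame].append(cluster == TRASH_CLUSTER)
    return {
        frame: (mean(rels), sum(trash_votes[frame]) / len(trash_votes[frame]))
        for frame, rels in relevance.items()
    }
```

Only cluster 0 has the same meaning for every worker. Since each worker produces an independent clustering, cluster IDs of 1 and above are presumably not comparable across workers, which is why the sketch aggregates only the Trash votes.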

--------------------------------------------------------------------------------
/dataset/query_videoURLs.csv:
--------------------------------------------------------------------------------
  1 | audi s4,https://www.youtube.com/watch?v=kQK0Sj9v3Ic
  2 | 1080p hd video,https://www.youtube.com/watch?v=YIwuCs1Yovw
  3 | basketball fights,https://www.youtube.com/watch?v=Y1G6Sx170kA
  4 | abs blogilates,https://www.youtube.com/watch?v=Y-pJ6q2fMEU
  5 | chris brown liquor,https://www.youtube.com/watch?v=GBtKV9GrHxg
  6 | ariana grande focus without music,https://www.youtube.com/watch?v=YjP9bbard4s
  7 | barbie girl song,https://www.youtube.com/watch?v=o_Duyxhf9vQ
  8 | 679 fetty wap 1 hour,https://www.youtube.com/watch?v=O2QD2j-tyI8
  9 | abc mouse,https://www.youtube.com/watch?v=beT_KcblaBI&list=PLU5T7sDbiGxQeFhKrqOaB_c4gj-cpaVSQ
 10 | adele hello,https://www.youtube.com/watch?v=-yL7VP4-kP4
 11 | banks waiting game,https://www.youtube.com/watch?v=cLs0oWq2ovo
 12 | animal farm,https://www.youtube.com/watch?v=LAeKX5n-5IE
 13 | candy candy,https://www.youtube.com/watch?v=LFjZjXHrLU0
 14 | chase rice whisper,https://www.youtube.com/watch?v=xH8R1Mt7IA8
 15 | civil war trailer 2 reaction,https://www.youtube.com/watch?v=8HBQeQ27Yrw
 16 | arsenal vs west brom 2016,https://www.youtube.com/watch?v=3rH92uypBkY
 17 | chapo guzman entrevista,https://www.youtube.com/watch?v=wh5JeDI8mds
 18 | birds chirping sound effect,https://www.youtube.com/watch?v=8WwNJxL-8_k
 19 | cannabis oil cancer cure,https://www.youtube.com/watch?v=teKzQJ_j0fA
 20 | car fails,https://www.youtube.com/watch?v=oYoKnemfpTc
 21 | air force ones nelly,https://www.youtube.com/watch?v=q6wbXf8jj9M
 22 | college football pump up 2015-16,https://www.youtube.com/watch?v=xUm2BW7VY6I
 23 | big bang theory,https://www.youtube.com/watch?v=A2VZ5QxCVNA
 24 | beyonce halo,https://www.youtube.com/watch?v=yW8qsL1nDPM
 25 | cooking channel,https://www.youtube.com/watch?v=3_UvSmTpKGY
 26 | abs workout for men,https://www.youtube.com/watch?v=jdmK1XmPgso
 27 | call of duty ghost,https://www.youtube.com/watch?v=Zxnx3W-HA18
 28 | cristiano araujo hoje eu to terrivel,https://www.youtube.com/watch?v=nf3igFRSDsA
 29 | abc jackson 5,https://www.youtube.com/watch?v=JxpmbEATBH0
 30 | barcelona vs deportivo cuenca 2016,https://www.youtube.com/watch?v=aORUMmo1L68
 31 | cancer story,https://www.youtube.com/watch?v=2E5kL-Hyx7Y
 32 | candy cameo,https://www.youtube.com/watch?v=5w2Anvecs6s
 33 | ariana grande let me love you,https://www.youtube.com/watch?v=eYtq6eA5m5k
 34 | bts run japanese version,https://www.youtube.com/watch?v=rZstHq8CQF0
 35 | bts live boy in luv,https://www.youtube.com/watch?v=sVxofSZc0eU
 36 | 1080p test,https://www.youtube.com/watch?v=1-UdWS4RAA4
 37 | britney spears criminal,https://www.youtube.com/watch?v=qqK0rN0BmGk
 38 | 679 fetty wap lyrics clean,https://www.youtube.com/watch?v=OhG1Yr_3Zg0
 39 | android wear,https://www.youtube.com/watch?v=3QKykbwzLpQ
 40 | christmas songs 2015,https://www.youtube.com/watch?v=SnA52s7qceM&list=PLvCAdHtM7JyNtk6SGfLCANxAGq4hStBsO
 41 | blackberry molasses,https://www.youtube.com/watch?v=imvqWzo-V8k
 42 | banana bus,https://www.youtube.com/watch?v=sHr4B6UAGOs
 43 | arsenal vs watford,https://www.youtube.com/watch?v=a3S1Hw-lRhg
 44 | adele carpool karaoke,https://www.youtube.com/watch?v=Ptx8sJp-kDY
 45 | chapo guzman interview,https://www.youtube.com/watch?v=1ZA1AWlirMw
 46 | barbie movies,https://www.youtube.com/watch?v=i2MCVKAFStk
 47 | badlapur songs,https://www.youtube.com/watch?v=ziTHTlmPdhQ
 48 | abs workout at home,https://www.youtube.com/watch?v=k9AT65aogVs
 49 | bts dance cover,https://www.youtube.com/watch?v=75NF4XhNrcw
 50 | cannabis culture,https://www.youtube.com/watch?v=VDef1xco-ho
 51 | buzzfeed ladylike,https://www.youtube.com/watch?v=KErUeTcsqp4
 52 | chase utley slide,https://www.youtube.com/watch?v=SiY2GtBrHug
 53 | 3d video,https://www.youtube.com/watch?v=FSGfN9rr78Q
 54 | cristiano araujo mente pra mim,https://www.youtube.com/watch?v=k_ivdZfcWP8
 55 | coc hack,https://www.youtube.com/watch?v=mxXCmGJYLTA
 56 | bodybuilding transformation,https://www.youtube.com/watch?v=VE_5T0dTGnk
 57 | birds of prey,https://www.youtube.com/watch?v=dkcDfDQHPLc
 58 | birds and the bees,https://www.youtube.com/watch?v=TGDV3iD4Uxo
 59 | ariana grande focus instrumental,https://www.youtube.com/watch?v=yA0h3BWmwZ8
 60 | amy winehouse back to black,https://www.youtube.com/watch?v=2NVvzRt8fTs
 61 | coc movie,https://www.youtube.com/watch?v=nQ_haNuzWEY
 62 | beyonce formation,https://www.youtube.com/watch?v=lMAISeUGcyY
 63 | 360 video horror,https://www.youtube.com/watch?v=et2Z-Jk5dds
 64 | cooking with kylie,https://www.youtube.com/watch?v=HuaAgNje5mk
 65 | boom boom,https://www.youtube.com/watch?v=3umBIbmcAeo
 66 | 3d movies,https://www.youtube.com/watch?v=Gcc0jh6DDwY&list=PL-qyP9X1OyeQDZIMXstZG48-XHMD-8W2_
 67 | 5s iphone,https://www.youtube.com/watch?v=wGCetsl-srk
 68 | cat fails,https://www.youtube.com/watch?v=ntza_9lbbc0
 69 | ariana grande focus lyrics,https://www.youtube.com/watch?v=-JSTvXeMHTw
 70 | boom clap charli xcx,https://www.youtube.com/watch?v=AOPMlIIg_38
 71 | call of duty infinite warfare,https://www.youtube.com/watch?v=G5tuqJFWVHU
 72 | banks live,https://www.youtube.com/watch?v=QTjnCLAyDw8
 73 | civil war trailer,https://www.youtube.com/watch?v=dKrVegVI0Us
 74 | android authority,https://www.youtube.com/watch?v=mo6nF-T58PA
 75 | call of duty infinity ward,https://www.youtube.com/watch?v=8lJDXZgb0ac
 76 | civil war spiderman,https://www.youtube.com/watch?v=KA-KxpHQcII
 77 | bmw m3,https://www.youtube.com/watch?v=VQsxbuY6yXw
 78 | chocolate salty balls,https://www.youtube.com/watch?v=W6uSS4wBO8E
 79 | civil war tv spot,https://www.youtube.com/watch?v=QciVC11E3Ao
 80 | desert eagle,https://www.youtube.com/watch?v=8LNBxxoVSn8
 81 | 679 fetty wap featuring remy boyz,https://www.youtube.com/watch?v=wxMZkhWum64
 82 | 3d printed gun,https://www.youtube.com/watch?v=IylGx-48TUI
 83 | big bang theory full episodes,https://www.youtube.com/watch?v=A9RcouViR9o&list=PLVpqyN56rDNSmDQci5huHHRSOuXM4PX9t
 84 | autocad 2016,https://www.youtube.com/watch?v=mdolFcEHU-c&list=PLXEyem_18syOyJAMqxwetICD6xJALuHpW
 85 | 360 no scope,https://www.youtube.com/watch?v=yZlxc6NJvZc
 86 | barcelona vs real madrid 2015,https://www.youtube.com/watch?v=PtjhwYdr5tk
 87 | baby alive,https://www.youtube.com/watch?v=mHOEARogPik
 88 | animal jam vines,https://www.youtube.com/watch?v=Lko-PLW8g70
 89 | chandelier sia,https://www.youtube.com/watch?v=vHR4oOIcVZo
 90 | angry birds go,https://www.youtube.com/watch?v=XreVL3Q0Gyk
 91 | cristiano araujo efeitos,https://www.youtube.com/watch?v=IYLhExE-5fQ&list=PLT0SpPxj0t3mAun7J3Nzf9SnbSNOcsxDa
 92 | dinosaur movie,https://www.youtube.com/watch?v=wnY13ftzi-c
 93 | bmw 7 series 2016,https://www.youtube.com/watch?v=sV9fhBwIGO8
 94 | hairstyles for men,https://www.youtube.com/watch?v=1t0U8oebod8
 95 | banana pancakes jack johnson,https://www.youtube.com/watch?v=JWtvXOpmO_A
 96 | audi a6,https://www.youtube.com/watch?v=lr7mPzjTgC0
 97 | bmw x6,https://www.youtube.com/watch?v=ENbdAAF1bmo
 98 | bodybuilding workout,https://www.youtube.com/watch?v=MVpvilPZkJQ
 99 | crossfit girls,https://www.youtube.com/watch?v=fFvcmsoJOl4
100 | barcelona vs valencia,https://www.youtube.com/watch?v=jUHNAFwdKLg
101 | anaconda vs lion,https://www.youtube.com/watch?v=EIyGo_MjqKs
102 | babymetal,https://www.youtube.com/watch?v=dbODbQMm5Wc
103 | barcelona vs sporting gijon 2016,https://www.youtube.com/watch?v=6vzjqo-772w
104 | chandelier acapella,https://www.youtube.com/watch?v=6PxEY_kUm-w
105 | blackberry z10,https://www.youtube.com/watch?v=SE9tJKS7MBQ
106 | chocolate rain,https://www.youtube.com/watch?v=2x2W12A8Qow
107 | 3d sound,https://www.youtube.com/watch?v=vgWmA2Qn8zc
108 | agario hack,https://www.youtube.com/watch?v=vdDRamfUpXc
109 | autocad 3d,https://www.youtube.com/watch?v=3lx_FWEQLag
110 | car karaoke james corden,https://www.youtube.com/watch?v=uQereoIxioI
111 | brock lesnar vs dean ambrose,https://www.youtube.com/watch?v=Q_fnzeDCiWo
112 | big bang loser,https://www.youtube.com/watch?v=9HpgGq3_ww0
113 | bts butterfly,https://www.youtube.com/watch?v=lz-_cfI1Vdc
114 | bugatti ace hood,https://www.youtube.com/watch?v=1Iw-hbXWzGM
115 | ark survival evolved dragon,https://www.youtube.com/watch?v=hrsywopY0u0
116 | bebe rexha,https://www.youtube.com/watch?v=NRyTUN0Cz4U
117 | truck tug of war,https://www.youtube.com/watch?v=r_-fUPY4mjc
118 | 5s lean,https://www.youtube.com/watch?v=tUtc3x3xDFc
119 | bernie sanders debate,https://www.youtube.com/watch?v=vzO-JYjEcHE
120 | cartel de santa suena mamalona,https://www.youtube.com/watch?v=jVjJ6LXSmG8
121 | christmas songs for children,https://www.youtube.com/watch?v=eQ34DSTjsLQ&list=PLA-vix1dc3M6VOf5MgK1KkWHtxMX1_58j
122 | buzzfeed india,https://www.youtube.com/watch?v=AlB2hJXHP_4
123 | black and yellow wiz khalifa,https://www.youtube.com/watch?v=jU7dzfFMo8w
124 | coc comedy,https://www.youtube.com/watch?v=aP64eHVILIE
125 | christmas lights,https://www.youtube.com/watch?v=HmMF5XSncG0
126 | android n,https://www.youtube.com/watch?v=v1IocBl_5UM
127 | christmas in hollywood,https://www.youtube.com/watch?v=h_FV01Diqn0
128 | 360 video,https://www.youtube.com/watch?v=-YYhc_bEYK4
129 | 360 video minecraft,https://www.youtube.com/watch?v=mxT_BVOIAtw
130 | baby kaely,https://www.youtube.com/watch?v=ncN-axCFF4A
131 | chocolate spongebob,https://www.youtube.com/watch?v=i-HFbEbBrhA
132 | chandelier trent harmon,https://www.youtube.com/watch?v=J0SUW_lmZxY
133 | bodybuilding motivation,https://www.youtube.com/watch?v=Apu3ArZNjJA
134 | candy girl new edition,https://www.youtube.com/watch?v=1PZHG4_bKxA
135 | aruan ortiz hidden voices,https://www.youtube.com/watch?v=3QLWJZlhg3Q
136 | crtani za bebe,https://www.youtube.com/watch?v=QkA8KIhzSY0
137 | aib alia bhatt,https://www.youtube.com/watch?v=uBwfDO52En0
138 | bus crash,https://www.youtube.com/watch?v=vB3gJRRHhz8
139 | boom boom boom boom vengaboys,https://www.youtube.com/watch?v=YKNkC6Sk7DU
140 | bts dance girl group,https://www.youtube.com/watch?v=DMqoTBN-uXk
141 | bloodborne game grumps,https://www.youtube.com/watch?v=TGofzK1DGds
142 | college move in day,https://www.youtube.com/watch?v=snzDwACffzs
143 | bernie sanders snl,https://www.youtube.com/watch?v=bzbF0CszTt8
144 | android marshmallow,https://www.youtube.com/watch?v=fH27N690SIU
145 | angry birds toon,https://www.youtube.com/watch?v=guFgKAE4S9I
146 | big bang bae bae,https://www.youtube.com/watch?v=TKD03uPVD-Q
147 | brock lesnar theme,https://www.youtube.com/watch?v=6K07yVtzioI
148 | anaconda movie,https://www.youtube.com/watch?v=JWqaTJbxbW0
149 | audi a7,https://www.youtube.com/watch?v=Q5zA86s5GZc
150 | crossfit behind the scenes 2015,https://www.youtube.com/watch?v=AL-jGGlhm_A
151 | bus song,https://www.youtube.com/watch?v=8vrQA-DTp0g&list=PLh9dqRgNUcGvew-T_8Q2LExxcZ2vdsDt9
152 | brock lesnar vs big show,https://www.youtube.com/watch?v=znpFsE6bwMU
153 | 360jeezy,https://www.youtube.com/watch?v=T2yZmV_73LE
154 | beyonce hold up,https://www.youtube.com/watch?v=i0gMh3i6qTk
155 | cid 1349,https://www.youtube.com/watch?v=1jav0e32L8o
156 | anitta zen,https://www.youtube.com/watch?v=uJcbhRzy_UM
157 | bus driver beats up girl,https://www.youtube.com/watch?v=zcKYeTlWbWg
158 | cannabis 2016,https://www.youtube.com/watch?v=yb0Prix20Tc
159 | chapo guzman documentary,https://www.youtube.com/watch?v=woWeIMjXY9M
160 | cooking with dog,https://www.youtube.com/watch?v=z9D9jpuplBo
161 | chapo guzman movie,https://www.youtube.com/watch?v=woWeIMjXY9M
162 | big bang fantastic baby,https://www.youtube.com/watch?v=epj3e8nNpVM
163 | cat stevens,https://www.youtube.com/watch?v=hr0rDW5j1KU
164 | 5s online offline,https://www.youtube.com/watch?v=EfsWBXcWUpg
165 | bmw m6,https://www.youtube.com/watch?v=WoiLRgKEHgE
166 | borro cassette maluma,https://www.youtube.com/watch?v=oRWgFGsQcUk
167 | college basketball,https://www.youtube.com/watch?v=p9lqZ4wPCIA
168 | big bang theory season 9,https://www.youtube.com/watch?v=SWZi_2vICLk
169 | adana merkez remix,https://www.youtube.com/watch?v=IFfK_EmHaRs
170 | blok ekipa 79,https://www.youtube.com/watch?v=eUQLwpRa5vc
171 | bmw m5,https://www.youtube.com/watch?v=gLX-LSNpfTY
172 | 360 degree,https://www.youtube.com/watch?v=K5Cj8Her_6s
173 | basketball drills,https://www.youtube.com/watch?v=mzOpFFBIlSo
174 | banana phone,https://www.youtube.com/watch?v=j5C6X9vOEkU
175 | cannabis rapper,https://www.youtube.com/watch?v=foQ7x5r61XQ
176 | beyonce lemonade album,https://www.youtube.com/watch?v=BB5zLq1zcdo&list=PLxKHVMqMZqUSPF11Ghs0KqDfOGhB9Vw5E
177 | chase pig commercial,https://www.youtube.com/watch?v=kDcfM9QQsWs
178 | bts run teaser,https://www.youtube.com/watch?v=GdQu9kqR4aY
179 | bts dance practice dope,https://www.youtube.com/watch?v=wnmulgX_l9E
180 | cooking videos,https://www.youtube.com/watch?v=8KX2V0vMEOY
181 | boom boom boom boom empire,https://www.youtube.com/watch?v=FLx1aEqq6HA
182 | childish gambino bonfire,https://www.youtube.com/watch?v=qL1B_r9nC9k
183 | amy winehouse love is a losing game,https://www.youtube.com/watch?v=nMO5Ko_77Hk
184 | bebe rexha live,https://www.youtube.com/watch?v=bTr_Dkk1_bM
185 | ariana grande dangerous woman,https://www.youtube.com/watch?v=naMeIpGcHtA
186 | cartel de santa doctor marihuana,https://www.youtube.com/watch?v=cXn9E9POjnE
187 | chase bryant,https://www.youtube.com/watch?v=YeLqgmVJk0Y
188 | aib knockout salman khan,https://www.youtube.com/watch?v=F3nqJMtfE2E
189 | bus stop,https://www.youtube.com/watch?v=It75wQ0JypA
190 | cat in the hat,https://www.youtube.com/watch?v=1OSStCz6EdY
191 | 

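Each row of `query_videoURLs.csv` has the form `query,url`, and a few URLs also carry a `list=` playlist parameter. A Python 3 sketch that splits each row into the query, the YouTube video ID (the `v` parameter) and the optional playlist ID, using only the standard library; `parse_query_urls` is a hypothetical helper, not part of `qvsumm`.

```python
import csv
import io
from urllib.parse import urlparse, parse_qs


def parse_query_urls(text):
    """Yield (query, video_id, playlist_id or None) for each CSV row."""
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines, e.g. at the end of the file
        query, url = row
        params = parse_qs(urlparse(url).query)
        video_id = params["v"][0]
        playlist_id = params.get("list", [None])[0]
        yield query, video_id, playlist_id
```

Grouping the rows by video ID is a quick way to spot queries that share a video; in this file, for example, `chapo guzman documentary` and `chapo guzman movie` point to the same URL.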

--------------------------------------------------------------------------------
/qvsumm/__init__.py:
--------------------------------------------------------------------------------
  1 | '''
  2 | This module contains functions to create the Relevance network and load the weights
  3 | as well as some helper functions, e.g. for running the demos.
  4 | For more information on the method, see
  5 |  Arun Balajee Vasudevan*, Michael Gygli*, Anna Volokitin, Luc Van Gool(* denotes equal contribution)
  6 |     "Query-adaptive Video Summarization via Quality-aware Relevance Estimation", ACM Multimedia 2017
  7 | '''
  8 | 
  9 | __author__ = 'Arun Balajee Vasudevan'
 10 | import ConfigParser
 11 | import numpy as np
 12 | import os
 13 | import model
 14 | import gensim
 15 | import theano.tensor as T
 16 | import cv2
 17 | 
 18 | # Load the configuration
 19 | if 'QVSUM_DATA_DIR' not in os.environ:
 20 |     os.environ['QVSUM_DATA_DIR']='./data'
 21 | config = ConfigParser.SafeConfigParser(os.environ)
 22 | print('Loaded config file from %s' % config.read('%s/config.ini' % os.path.dirname(__file__))[0])
 23 | 
 24 | import shells
 25 | 
 26 | try:
 27 |     import lasagne
 28 |     import theano
 29 | except (ImportError, AssertionError) as e:
 30 |     print(e)
 31 | 
 32 | def get_QAR_function():
 33 |     '''
 34 |     Get Relevance function (CNN-LSTM Relevance model)
 35 |     @return: theano function that scores video frames
 36 |     '''
 37 |     # Set LSTM model
 38 |     print('Load weights and compile Relevance model...')
 39 |     input_var_lstm = theano.tensor.tensor3('inputs_lstm')
 40 |     input_var_mask = theano.tensor.bmatrix('inputs_mask')
 41 |     l_out = model.LSTMmodel(input_var_lstm, input_var_mask)
 42 |     model.set_lstmweights(l_out, config.get('paths', 'LSTM_weight_file'))
 43 |     network_output = lasagne.layers.get_output(l_out, deterministic=True)
 44 | 
 45 |     # Set CNN model
 46 |     network = model.build_vggmodel(batch_size=1)
 47 |     model.set_vggweights(network['fc7'], config.get('paths', 'vgg_weight_file'), 36)
 48 |     inter_layer = lasagne.layers.get_output(network['fc7'], deterministic=True)
 49 |     netpool = model.build_custom_mlp(inter_layer, depth=0, width=4096, drop_input=0.5, drop_hidden=0.5)
 50 |     model.set_cnnweights(netpool, config.get('paths', 'cnn_weight_file'))
 51 |     mlp_output = lasagne.layers.get_output(netpool, deterministic=True)
 52 | 
 53 |     test_similarity = model.relevance_score(mlp_output, network_output)
 54 |     test_quality = model.interestingness_score(mlp_output)
 55 |     val_fn = theano.function([network['input_layer'].input_var, input_var_lstm, input_var_mask],
 56 |                              [test_similarity, test_quality], on_unused_input='warn')
 57 | 
 58 |     return val_fn
 59 | 
 60 | 
 61 | def get_word2vec_function():
 62 |     '''
 63 |     Load the word2vec model used to embed the query words
 64 |     (the full model if it exists, otherwise the small fallback model)
 65 |     @return: gensim Word2Vec model
 66 |     '''
 67 |     # Set word2vec model
 68 |     print('Load word2vec model...')
 69 |     if (os.path.isfile(config.get('paths', 'word2vec_file'))):
 70 |         w2vmodel = gensim.models.Word2Vec.load(config.get('paths', 'word2vec_file'))
 71 |     else:
 72 |         w2vmodel = gensim.models.Word2Vec.load(config.get('paths', 'word2vec_smallfile'))
 73 |     return w2vmodel
 74 | 
 75 | 
 76 | def get_rel_Q_scores(val_fn, w2vmodel, query, frames):
 77 |     '''
 78 |     Predict similarity and quality scores for frames
 79 |     @param val_fn: prediction function
 80 |     @param w2vmodel: word2vec model
 81 |     @param query: given text query as string
 82 |     @param frames: list of paths of video frames
 83 |     @return: list of scores
 84 |     '''
 85 | 
 86 |     def load_dataset(image_names):
 87 |         # Preprocess frames (paths or BGR arrays) into a VGG input batch (N, 3, 224, 224)
 88 |         image_size = 224
 89 |         data = np.zeros((len(image_names), 3, image_size, image_size), dtype=np.float32)
 90 |         MEAN_PIXEL = [103.939, 116.779, 123.68]  # per-channel BGR mean
 91 |         for i, image in enumerate(image_names):
 92 |             if not isinstance(image, np.ndarray):
 93 |                 im = cv2.imread(image)
 94 |                 if im is None:
 95 |                     raise IOError('Could not read frame %s' % image)
 96 |             else:
 97 |                 im = image
 98 |             im = cv2.resize(im, (image_size, image_size), interpolation=cv2.INTER_CUBIC)
 99 |             # Subtract the mean and reorder (H, W, C) -> (C, H, W)
100 |             data[i] = (im - MEAN_PIXEL).transpose(2, 0, 1)
101 |         return data
102 | 
103 |     def load_query(query):
104 |         query = query.lower()
105 |         query = ' '.join(word for word in query.split(' ') if word in w2vmodel.vocab)
106 |         words = query.split()
107 |         SEQ_LENGTH = 14
108 |         num_features = 300
109 |         BATCH_SIZE = 1
110 |         qdata = np.zeros((BATCH_SIZE, SEQ_LENGTH, num_features), dtype=np.float32)
111 |         mask = np.ones((BATCH_SIZE, SEQ_LENGTH), dtype=bool)
112 |         for j in range(SEQ_LENGTH):
113 |             if j < len(words):
114 |                 qdata[0, j, :] = np.array(w2vmodel[str(words[j])])
115 |             else:
116 |                 mask[0, j] = 0
117 |         return qdata, mask
118 | 
119 |     valid_array_sim = []
120 |     valid_array_q = []
121 |     qdata, mask = load_query(query)
122 |     print("Scoring frames...")
123 |     for m, p in enumerate(frames):
124 |         path = []
125 |         path.append(p)
126 |         X = load_dataset(path)
127 |         sim, quality = val_fn(X, qdata, mask)
128 |         valid_array_sim.append(sim[0])
129 |         valid_array_q.append(quality[0])
130 |     return valid_array_sim, valid_array_q
131 | 

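`load_query` inside `get_rel_Q_scores` lowercases the query, drops words missing from the word2vec vocabulary, and writes up to `SEQ_LENGTH = 14` word vectors into a zero-padded `(1, 14, 300)` array, with a mask marking which positions hold real words. The pure-Python sketch below reproduces that padding scheme, with a toy dictionary standing in for the gensim model (`encode_query` and the dictionary are hypothetical). One small difference: it splits on any whitespace, whereas the original splits on single spaces.

```python
SEQ_LENGTH = 14     # maximum number of query words, as in qvsumm
NUM_FEATURES = 300  # word2vec dimensionality, as in qvsumm


def encode_query(query, embeddings, seq_length=SEQ_LENGTH, dim=NUM_FEATURES):
    """Return (qdata, mask) shaped (1, seq_length, dim) and (1, seq_length).

    embeddings: any mapping from word to a vector of length dim.
    """
    # Keep only in-vocabulary words; extra words beyond seq_length are dropped.
    words = [w for w in query.lower().split() if w in embeddings][:seq_length]
    qdata = [[0.0] * dim for _ in range(seq_length)]  # zero padding
    mask = [0] * seq_length                           # 0 = padding position
    for j, word in enumerate(words):
        qdata[j] = list(embeddings[word])
        mask[j] = 1
    return [qdata], [mask]
```

The mask lets the LSTM ignore the padded positions, so queries of different lengths share one fixed-size input.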

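`load_dataset` resizes each frame to 224×224, subtracts the per-channel mean `MEAN_PIXEL` (in BGR order, since `cv2.imread` returns BGR images) and moves the channel axis first, which is what the two nested `np.swapaxes` calls do; with NumPy this is equivalent to `im.transpose(2, 0, 1)`. The dependency-free sketch below shows only the mean subtraction and axis reordering on a nested-list image; the resize is omitted, and `to_chw_minus_mean` is a hypothetical name.

```python
MEAN_PIXEL = (103.939, 116.779, 123.68)  # per-channel BGR means used in qvsumm


def to_chw_minus_mean(image_hwc, mean=MEAN_PIXEL):
    """Turn an H x W x 3 BGR image (nested lists) into 3 x H x W with the mean removed."""
    height, width = len(image_hwc), len(image_hwc[0])
    return [
        [[image_hwc[y][x][c] - mean[c] for x in range(width)] for y in range(height)]
        for c in range(3)
    ]
```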
--------------------------------------------------------------------------------
/qvsumm/config.ini:
--------------------------------------------------------------------------------
1 | [paths]
2 | LSTM_weight_file: %(QVSUM_DATA_DIR)s/LSTMmodel.npz
3 | vgg_weight_file: %(QVSUM_DATA_DIR)s/vgg19.pkl
4 | cnn_weight_file: %(QVSUM_DATA_DIR)s/CNNmodel.npz
5 | word2vec_file: %(QVSUM_DATA_DIR)s/word2vec/word2vecnewsmodelfull300
6 | word2vec_smallfile: %(QVSUM_DATA_DIR)s/word2vec/word2vecmodel.bin

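`qvsumm/__init__.py` builds its configuration with `ConfigParser.SafeConfigParser(os.environ)`, so the environment acts as the defaults and `%(QVSUM_DATA_DIR)s` in `config.ini` is replaced by the data directory. Below is a Python 3 sketch of the same interpolation (in Python 3, `SafeConfigParser` was renamed to `configparser.ConfigParser`), using two entries copied from `config.ini`; `resolve_path` is a hypothetical helper. As an assumption of this sketch, it passes only `QVSUM_DATA_DIR` as a default rather than all of `os.environ`, so unrelated environment variables cannot interfere with interpolation.

```python
import configparser

# Two entries copied from qvsumm/config.ini.
CONFIG_TEXT = """\
[paths]
LSTM_weight_file: %(QVSUM_DATA_DIR)s/LSTMmodel.npz
word2vec_smallfile: %(QVSUM_DATA_DIR)s/word2vec/word2vecmodel.bin
"""


def resolve_path(data_dir, option):
    """Return [paths] option with %(QVSUM_DATA_DIR)s substituted by data_dir."""
    parser = configparser.ConfigParser({"QVSUM_DATA_DIR": data_dir})
    parser.read_string(CONFIG_TEXT)
    # Option names are case-insensitive: ConfigParser lowercases them.
    return parser.get("paths", option)
```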

--------------------------------------------------------------------------------
/qvsumm/model.py:
--------------------------------------------------------------------------------
  1 | import lasagne
  2 | import theano.tensor as T
  3 | from lasagne.layers import InputLayer
  4 | from lasagne.layers import DenseLayer
  5 | from lasagne.layers import DropoutLayer
  6 | from lasagne.layers import Pool2DLayer as PoolLayer
  7 | from lasagne.layers import Conv2DLayer as ConvLayer
  8 | import numpy as np
  9 | import pickle
 10 | 
 11 | num_features = 300
 12 | 
 13 | # Sequence Length
 14 | SEQ_LENGTH = 14
 15 | 
 16 | # Number of units in the two hidden (LSTM) layers
 17 | N_HIDDEN = 300
 18 | 
 19 | # All gradients above this will be clipped
 20 | GRAD_CLIP = 5
 21 | 
 22 | 
 23 | def build_vggmodel(input_var=None, batch_size=1):
 24 |     net = {}
 25 |     net['input_layer'] = InputLayer((batch_size, 3, 224, 224), input_var=input_var)
 26 |     net['conv1_1'] = ConvLayer(net['input_layer'], 64, 3, pad=1, flip_filters=False)
 27 |     net['conv1_2'] = ConvLayer(net['conv1_1'], 64, 3, pad=1, flip_filters=False)
 28 |     net['pool1'] = PoolLayer(net['conv1_2'], 2)
 29 |     net['conv2_1'] = ConvLayer(net['pool1'], 128, 3, pad=1, flip_filters=False)
 30 |     net['conv2_2'] = ConvLayer(net['conv2_1'], 128, 3, pad=1, flip_filters=False)
 31 |     net['pool2'] = PoolLayer(net['conv2_2'], 2)
 32 |     net['conv3_1'] = ConvLayer(net['pool2'], 256, 3, pad=1, flip_filters=False)
 33 |     net['conv3_2'] = ConvLayer(net['conv3_1'], 256, 3, pad=1, flip_filters=False)
 34 |     net['conv3_3'] = ConvLayer(net['conv3_2'], 256, 3, pad=1, flip_filters=False)
 35 |     net['conv3_4'] = ConvLayer(net['conv3_3'], 256, 3, pad=1, flip_filters=False)
 36 |     net['pool3'] = PoolLayer(net['conv3_4'], 2)
 37 |     net['conv4_1'] = ConvLayer(net['pool3'], 512, 3, pad=1, flip_filters=False)
 38 |     net['conv4_2'] = ConvLayer(net['conv4_1'], 512, 3, pad=1, flip_filters=False)
 39 |     net['conv4_3'] = ConvLayer(net['conv4_2'], 512, 3, pad=1, flip_filters=False)
 40 |     net['conv4_4'] = ConvLayer(net['conv4_3'], 512, 3, pad=1, flip_filters=False)
 41 |     net['pool4'] = PoolLayer(net['conv4_4'], 2)
 42 |     net['conv5_1'] = ConvLayer(net['pool4'], 512, 3, pad=1, flip_filters=False)
 43 |     net['conv5_2'] = ConvLayer(net['conv5_1'], 512, 3, pad=1, flip_filters=False)
 44 |     net['conv5_3'] = ConvLayer(net['conv5_2'], 512, 3, pad=1, flip_filters=False)
 45 |     net['conv5_4'] = ConvLayer(net['conv5_3'], 512, 3, pad=1, flip_filters=False)
 46 |     net['pool5'] = PoolLayer(net['conv5_4'], 2)
 47 |     net['fc6'] = DenseLayer(net['pool5'], num_units=4096)
 48 |     net['fc6_dropout'] = DropoutLayer(net['fc6'], p=0.5)
 49 |     net['fc7'] = DenseLayer(net['fc6_dropout'], num_units=4096)
 50 |     net['fc7_dropout'] = DropoutLayer(net['fc7'], p=0.5)
 51 |     net['fc8'] = DenseLayer(net['fc7_dropout'], num_units=300, nonlinearity=None)
 52 |     net['prob'] = DenseLayer(net['fc7_dropout'], num_units=300, nonlinearity=None)
 53 |     return net
 54 | 
 55 | 
 56 | def build_vggpool5model(input_var=None):
 57 |     net = {}
 58 |     net['input_layer'] = InputLayer((None, 512, 7, 7), input_var=input_var)
 59 |     net['fc6'] = DenseLayer(net['input_layer'], num_units=4096)
 60 |     net['fc6_dropout'] = DropoutLayer(net['fc6'], p=0.5)
 61 |     net['fc7'] = DenseLayer(net['fc6_dropout'], num_units=4096)
 62 |     net['fc7_dropout'] = DropoutLayer(net['fc7'], p=0.5)
 63 |     net['prob'] = DenseLayer(net['fc7_dropout'], num_units=300, nonlinearity=None)
 64 |     return net
 65 | 
 66 | 
 67 | def LSTMmodel(input_var_lstm=None, input_var_mask=None, batch_size=1):
 68 |     # print "Building LSTM network"
 69 |     l_in = lasagne.layers.InputLayer(shape=(batch_size, SEQ_LENGTH, num_features), input_var=input_var_lstm)
 70 |     mask_input = lasagne.layers.InputLayer(shape=(batch_size, SEQ_LENGTH), input_var=input_var_mask)
 71 |     l_forward_1 = lasagne.layers.LSTMLayer(l_in, N_HIDDEN, grad_clipping=GRAD_CLIP, mask_input=mask_input,
 72 |                                            nonlinearity=lasagne.nonlinearities.rectify)
 73 |     l_forward_slice = lasagne.layers.SliceLayer(l_forward_1, -1, 1)
 74 |     l_out = lasagne.layers.DenseLayer(l_forward_slice, num_units=num_features, W=lasagne.init.Normal(),
 75 |                                       nonlinearity=None)
 76 |     return l_out
 77 | 
 78 | 
 79 | def build_custom_mlp(input_var=None, depth=1, width=300, drop_input=0, drop_hidden=0.5):
 80 |     # Input layer and dropout (with shortcut `dropout` for `DropoutLayer`):
 81 |     network = lasagne.layers.InputLayer(shape=(None, 4096), input_var=input_var)
 82 |     if drop_input:
 83 |         network = lasagne.layers.dropout(network, p=drop_input)
 84 |     # Hidden layers and dropout:
 85 |     nonlin = lasagne.nonlinearities.rectify
 86 |     for _ in range(depth):
 87 |         network = lasagne.layers.DenseLayer(
 88 |             network, width, nonlinearity=nonlin)
 89 |         if drop_hidden:
 90 |             network = lasagne.layers.dropout(network, p=drop_hidden)
 91 |     # Output layer:
 92 |     # 301 linear outputs: a 300-d visual embedding followed by one quality score
 93 |     network = lasagne.layers.DenseLayer(network, 301, nonlinearity=None)
 94 |     return network
 95 | 
 96 | 
 97 | def set_lstmweights(net,
 98 |                     LSTM_weight_file):
 99 |     '''
100 |     set the weights of the given model.
101 |     @param net: a lasagne network
102 |     @param LSTM_weight_file:
103 |     @return:
104 |     '''
105 |     # Get LSTM weights
106 |     with np.load(LSTM_weight_file) as f:
107 |         param_values = [f['arr_%d' % i] for i in range(len(f.files))]
108 |         lasagne.layers.set_all_param_values(net, param_values)
109 |     print('Set LSTM learned weights...')
110 | 
111 | 
112 | def set_vggweights(net,
113 |                    vgg_weight_file, k):
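114 |     '''
115 |     Set the pretrained VGG19 weights of the given model.
116 |     @param net: a lasagne network
117 |     @param vgg_weight_file: path to the pickled VGG19 parameters
118 |     @param k: number of parameter arrays to load (36 covers conv1_1 up to fc7)
119 |     '''
120 |     # Get VGG weights; open in binary mode and close the file afterwards
121 |     with open(vgg_weight_file, 'rb') as f:
122 |         vggmodel = pickle.load(f)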
123 |     lasagne.layers.set_all_param_values(net, vggmodel['param values'][0:k])
124 | 
125 | 
126 | def set_cnnweights(net,
127 |                    cnn_weight_file):
128 |     '''
129 |     set the weights of the given model.
130 |     @param net: a lasagne network
131 |     @param cnn_weight_file:
132 |     @return:
133 |     '''
134 |     # Get CNN weights
135 |     print('Set CNN learned weights...')
136 |     with np.load(cnn_weight_file) as f:
137 |         param_values = [f['arr_%d' % i] for i in range(len(f.files))]
138 |         lasagne.layers.set_all_param_values([net], param_values)
139 | 
140 | 
141 | def relevance_score(I_out, Q_out):
142 |     I_out = I_out[:, 0:300]
143 |     I_out = I_out / I_out.norm(L=2, axis=1).reshape((I_out.shape[0], 1))
144 |     Q_out = Q_out / Q_out.norm(L=2, axis=1).reshape((Q_out.shape[0], 1))
145 |     # Row-wise cosine similarity: entry n of diag(I Q^T) is <I_n, Q_n> after L2 normalization
146 |     return T.diagonal(T.dot(I_out, Q_out.T))
147 | 
148 | 
149 | def interestingness_score(I_out):
150 |     # Column 300 of the 301-d output is the interestingness score
151 |     return I_out[:, 300]
152 | 
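`relevance_score` L2-normalizes the first 300 outputs of the image branch and the query embedding, then takes the diagonal of their product matrix. The result is one cosine similarity per (frame, query) row pair. `interestingness_score` reads column 300. Below is a NumPy sketch of the same computation on random inputs. The shapes and the random data are illustrative.

```python
import numpy as np

def relevance_score(I_out, Q_out):
    # Cosine similarity between row n of the image embedding and row n of the query
    I = I_out[:, :300]
    I = I / np.linalg.norm(I, axis=1, keepdims=True)
    Q = Q_out / np.linalg.norm(Q_out, axis=1, keepdims=True)
    # Row-wise dot products, i.e. the diagonal of I @ Q.T without the N x N matrix
    return np.sum(I * Q, axis=1)

def interestingness_score(I_out):
    # The last (301st) output column
    return I_out[:, 300]

rng = np.random.default_rng(1)
I_out = rng.standard_normal((4, 301))   # hypothetical image-branch outputs
Q_out = rng.standard_normal((4, 300))   # hypothetical query embeddings, one per row

rel = relevance_score(I_out, Q_out)
I_n = I_out[:, :300] / np.linalg.norm(I_out[:, :300], axis=1, keepdims=True)
Q_n = Q_out / np.linalg.norm(Q_out, axis=1, keepdims=True)
rel_diag = np.diagonal(I_n @ Q_n.T)  # the diagonal formulation used in model.py
print(np.allclose(rel, rel_diag))  # True
```

Summing the element-wise product row by row gives the same values as the diagonal, without forming the full N x N matrix. In Theano the analogous expression would be `(I_out * Q_out).sum(axis=1)` on the normalized tensors. This is a possible simplification, not the repository's code.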


--------------------------------------------------------------------------------
/qvsumm/shells.py:
--------------------------------------------------------------------------------
  1 | '''
  2 |  Implementation of the objectives used in
  3 |  Arun Balajee Vasudevan*, Michael Gygli*, Anna Volokitin, Luc Van Gool - Query-adaptive Video Summarization via Quality-aware Relevance Estimation. ACM Multimedia 2017
  4 | '''
  5 | __author__ = "Arun Balajee Vasudevan"
  6 | __email__ = "arunv@vision.ee.ethz.ch"
  7 | 
  8 | import numpy as np
  9 | import gm_submodular
 10 | import gm_submodular.example_objectives as ex
 11 | from qvsumm import model
 12 | import theano
 13 | import lasagne
 14 | import scipy.spatial.distance as dist
 15 | import cv2
 16 | from qvsumm import config
 17 | 
 18 | class Summ(gm_submodular.DataElement):
 19 |     '''
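 20 |     A data element holding one text query, its candidate frames (imagenames) and their relevance and interestingness scores.
 21 |     For inference, this needs the functions get_querylen(), getDistances(), getCosts(), vggmodel(), load_dataset() and get_mfeatures().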
 22 |     '''
 23 |     budget = 5
 24 | 
 25 |     def __init__(self, query, imagenames, rel_scores, int_scores):
 26 |         self.query = query
 27 |         self.imagenames = imagenames
 28 |         self.int_scores = int_scores
 29 |         self.rel_scores = rel_scores
 30 |         self.querylen = self.get_querylen()
 31 |         self.Y = self.get_Y()
 32 |         self.dist_v = self.get_mfeatures()
 33 | 
 34 |     def get_querylen(self):
 35 |         valid_array = self.rel_scores
 36 |         return len(valid_array)
 37 | 
 38 |     def get_Y(self):
 39 |         return np.array(range(self.querylen))
 40 | 
 41 |     def getCosts(self):
 42 |         return np.ones((self.querylen))
 43 | 
 44 |     def getDistances(self):
 45 |         d = dist.squareform(self.dist_v)
 46 |         return np.multiply(d, d)
 47 | 
 48 |     def vggmodel(self):
 49 |         '''
 50 |         Load the VGG19 with pretrained weights
 51 |         :return: fc7 layer
 52 |         '''
 53 |         network = model.build_vggmodel(batch_size=1)
 54 |         model.set_vggweights(network['fc7'], config.get('paths', 'vgg_weight_file'), 36)
 55 |         prediction = lasagne.layers.get_output(network['fc7'])
 56 |         val_fn = theano.function([network['input_layer'].input_var], [prediction])
 57 |         return val_fn
 58 | 
 59 |     def load_dataset(self, frames):
 60 |         '''
 61 |         Load and preprocess images for VGG19: resize to 224x224,
 62 |         subtract the per-channel mean pixel (B, G, R order, matching
 63 |         cv2.imread) and reorder each image from (H, W, C) to (C, H, W).
 64 |         '''
 65 |         image_size = 224
 66 |         data = np.zeros((len(frames), 3, image_size, image_size), dtype=np.float32)
 67 |         MEAN_PIXEL = np.array([103.939, 116.779, 123.68])  # B, G, R
 68 |         # Each frame fills one slot of the (N, 3, 224, 224) batch
 69 |         for i, image in enumerate(frames):
 70 |             im = cv2.imread(image)  # (H, W, 3) array in BGR order
 71 |             if im is None:
 72 |                 raise IOError('Could not read image: %s' % image)
 73 |             im = cv2.resize(im, (image_size, image_size), interpolation=cv2.INTER_CUBIC)
 74 |             im = im - MEAN_PIXEL
 75 |             # (H, W, C) -> (C, H, W)
 76 |             data[i, :, :, :] = im.transpose(2, 0, 1)
 77 |         return data
 78 | 
 79 |     def get_mfeatures(self):
 80 |         '''
 81 |         Compute pairwise Euclidean distances between the frames' VGG19 fc7 features.
 82 |         :return: condensed distance vector, as returned by scipy.spatial.distance.pdist
 83 |         '''
 84 |         frames = self.imagenames
 85 |         score_fn = self.vggmodel()
 86 |         m_features = np.zeros((len(frames), 4096), dtype=np.float32)
 87 |         for m, p in enumerate(frames):
 88 |             # One frame per forward pass (the VGG model is built with batch_size=1)
 89 |             X = self.load_dataset([p])
 90 |             fc7 = score_fn(X)
 91 |             m_features[m, :] = fc7[0]
 92 |         return dist.pdist(m_features)
 93 | 
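`load_dataset` subtracts the mean pixel [103.939, 116.779, 123.68]. The values are in B, G, R order to match the BGR arrays returned by `cv2.imread`. Each image is then reordered from (H, W, C) to (C, H, W), the layout VGG19 expects. The sketch below skips file reading and resizing and uses a random 224 x 224 image. `to_vgg_input` is a hypothetical helper. The sketch checks that two `np.swapaxes` calls and a single `transpose(2, 0, 1)` produce the same array.

```python
import numpy as np

MEAN_PIXEL = np.array([103.939, 116.779, 123.68])  # B, G, R

def to_vgg_input(im_bgr):
    # im_bgr: (H, W, 3) BGR image, assumed to be already resized to 224 x 224
    im = im_bgr.astype(np.float64) - MEAN_PIXEL  # per-channel mean subtraction
    return im.transpose(2, 0, 1)                # (H, W, C) -> (C, H, W)

rng = np.random.default_rng(3)
im = rng.integers(0, 256, size=(224, 224, 3)).astype(np.uint8)  # random test image
x = to_vgg_input(im)
two_swaps = np.swapaxes(np.swapaxes(im - MEAN_PIXEL, 1, 2), 0, 1)
print(x.shape, np.allclose(x, two_swaps))  # (3, 224, 224) True
```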
 94 | 
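`getDistances` converts the condensed vector from `scipy.spatial.distance.pdist` into a square matrix with `squareform` and squares it element-wise. The condensed vector is computed in `get_mfeatures` on the raw fc7 features. Below is a NumPy-only sketch of the same squared-distance matrix, using the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y. `pairwise_sq_dists` and the random features are illustrative.

```python
import numpy as np

def pairwise_sq_dists(F):
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y for every pair of rows of F
    sq = np.sum(F ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (F @ F.T)
    np.fill_diagonal(D2, 0.0)
    return np.maximum(D2, 0.0)  # clip tiny negative values caused by rounding

rng = np.random.default_rng(2)
F = rng.standard_normal((5, 16))  # hypothetical stand-in for fc7 features
D2 = pairwise_sq_dists(F)
brute = np.array([[np.sum((F[i] - F[j]) ** 2) for j in range(len(F))]
                  for i in range(len(F))])
print(np.allclose(D2, brute))  # True
```

These distances use unnormalized features. In contrast, `diversity_shell` L2-normalizes the fc7 vectors before comparing them.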
 95 | def quality_shell(S):
 96 |     '''
 97 |     Quality scoring shell Eq.
 98 |     :param S: Summ with interestingness scores
 99 |     :return: quality objective
100 |     '''
101 |     valid_array = np.asarray(S.int_scores, dtype=float)
102 |     mn = valid_array.min()
103 |     stdv = valid_array.std()
104 |     a = (valid_array - mn) / stdv
105 |     return lambda X: np.sum([a[i] for i in X])
106 | 
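`quality_shell` and `similarity_shell` apply the same normalization to the interestingness and relevance scores. Each subtracts the minimum, divides by the standard deviation, and scores a selection `X` by summing the normalized values of its frames. Every normalized value is non-negative, so the resulting objective is modular and monotone: adding a frame never lowers it. Below is a standalone sketch with a hypothetical helper name and made-up scores.

```python
import numpy as np

def normalized_modular_objective(scores):
    # Shift scores so the minimum is 0 and divide by their standard deviation,
    # as quality_shell / similarity_shell do; X is a list of frame indices
    s = np.asarray(scores, dtype=float)
    a = (s - s.min()) / s.std()
    return lambda X: float(np.sum([a[i] for i in X]))

f = normalized_modular_objective([1.0, 2.0, 3.0])  # hypothetical per-frame scores
print(f([]), f([0]), round(f([1, 2]), 4))  # 0.0 0.0 3.6742
```

If all scores are equal, the standard deviation is zero and the division produces NaN values. This happens both in this sketch and in the original shells.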
107 | 
108 | def similarity_shell(S):
109 |     '''
110 |     Query similarity shell Eq.
111 |     :param S: Summ with relevance scores
112 |     :return: similarity objective
113 |     '''
114 |     valid_array = np.asarray(S.rel_scores, dtype=float)
115 |     mn = valid_array.min()
116 |     stdv = valid_array.std()
117 |     a = (valid_array - mn) / stdv
118 |     return lambda X: np.sum([a[i] for i in X])
119 | 
120 | 
121 | def diversity_shell(S):
122 |     '''
123 |     Diversity shell Eq.
124 |     :param S: Summ DataElement
125 |     :return: diversity objective
126 |     '''
127 |     frames = S.imagenames
128 |     score_fn = S.vggmodel()
129 |     features = np.zeros((len(frames), 4096), dtype=np.float64)
130 |     for m, p in enumerate(frames):
131 |         # One frame per forward pass (the VGG model is built with batch_size=1)
132 |         X = S.load_dataset([p])
133 |         features[m, :] = score_fn(X)[0]
134 | 
135 |     # L2-normalize the fc7 features so that distances only compare directions
136 |     normed = features / np.linalg.norm(features, axis=1, keepdims=True)
137 | 
138 |     def c(x, y):
139 |         # Euclidean distance between the normalized features of frames x and y
140 |         return np.linalg.norm(normed[x] - normed[y])
141 | 
142 |     # Reward of the i-th selected frame: a constant 5 for the first one, otherwise the
143 |     # distance to the closest previously selected frame (plus 1e-4)
144 |     b = lambda i, X: 5 if i == 0 else min([c(X[i], X[j]) + 1e-4 for j in range(i)])
145 |     return lambda X: np.sum([b(i, X) for i in range(len(X))])
146 | 
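`diversity_shell` L2-normalizes the fc7 features and rewards each element of a selection by its distance to the closest element selected before it. The first element gets a constant 5, and 1e-4 is added to each distance. The shells are presumably combined with learned weights and maximized under `Summ.budget` by `gm_submodular`. The sketch below reimplements the diversity term in NumPy and pairs it with a plain greedy loop. `greedy_select` is a hypothetical stand-in, not the `gm_submodular` API, and the toy features are made up.

```python
import numpy as np

def diversity_objective(features, first_reward=5.0, eps=1e-4):
    # L2-normalize each feature vector, as diversity_shell does
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)

    def f(X):
        # First element: constant reward; every later element: distance to the
        # closest element selected before it, plus eps
        total = 0.0
        for i in range(len(X)):
            if i == 0:
                total += first_reward
            else:
                total += min(np.linalg.norm(normed[X[i]] - normed[X[j]]) + eps
                             for j in range(i))
        return total
    return f

def greedy_select(objectives, weights, n_items, budget):
    # Plain greedy maximization of a weighted sum of objectives under a
    # cardinality budget (hypothetical helper, not the gm_submodular API)
    def total(X):
        return sum(w * obj(X) for w, obj in zip(weights, objectives))
    selected = []
    for _ in range(min(budget, n_items)):
        candidates = [y for y in range(n_items) if y not in selected]
        gains = [total(selected + [y]) - total(selected) for y in candidates]
        selected.append(candidates[int(np.argmax(gains))])
    return selected

# Toy features: frames 0 and 1 are near-duplicates, frame 3 points the opposite way
features = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [-1.0, 0.0]])
div = diversity_objective(features)
print(greedy_select([div], [1.0], len(features), budget=2))  # [0, 3]
```

With only the diversity term, greedy skips the near-duplicate frame 1 and picks the most dissimilar frame 3. Adding the modular quality and similarity terms with positive weights would trade that off against per-frame scores.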


--------------------------------------------------------------------------------
/qvsumm/utils_func.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import shutil
 3 | 
 4 | import imageio
 5 | import numpy as np
 6 | import pafy
 7 | from moviepy.editor import VideoFileClip
 8 | 
 9 | 
10 | def preprocess_video(query, videoURL):
11 |     '''
12 |     Download a YouTube video in its 640x360 stream and save one frame per second as PNG.
13 |     :param query: text query, also used as the file name of the downloaded video
14 |     :param videoURL: YouTube URL of the video
15 |     :return: list of paths to the extracted frames
16 |     '''
17 |     video = pafy.new(videoURL)
18 |     best = None
19 |     for s in video.streams:
20 |         if s.resolution == '640x360':
21 |             best = s
22 |     if best is None:
23 |         raise ValueError('No 640x360 stream found for %s' % videoURL)
24 |     direc = "videos"
25 |     if not os.path.exists(direc):
26 |         os.makedirs(direc)
27 |     f = os.path.join(direc, query + ".mp4")
28 |     best.download(quiet=True, filepath=f)
29 |     clip = VideoFileClip(f)
30 |     # Start from an empty frames directory
31 |     imagepath = os.path.join(direc, "frames")
32 |     if os.path.exists(imagepath):
33 |         shutil.rmtree(imagepath)
34 |     os.makedirs(imagepath)
35 |     imagenames = []
36 |     # Sample one frame in the middle of every second: t = 0.5, 1.5, 2.5, ...
37 |     for k, t in enumerate(np.arange(0.5, clip.duration, 1)):
38 |         im = clip.get_frame(t)  # RGB array of shape (H, W, 3)
39 |         framepath = os.path.join(imagepath, str(k) + '.png')
40 |         # scipy.misc.imsave was removed in SciPy 1.2; imageio is installed with moviepy
41 |         imageio.imwrite(framepath, im)
42 |         imagenames.append(framepath)
43 |     return imagenames
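`preprocess_video` samples one frame in the middle of every second of the clip (t = 0.5, 1.5, ... while t < duration) and saves frame k as `k.png`. Below is a small sketch of the sampling times. `frame_times` and its `step`/`offset` parameters are hypothetical generalizations of the hard-coded values.

```python
import numpy as np

def frame_times(duration, step=1.0, offset=0.5):
    # Timestamps (in seconds) at which frames are grabbed:
    # offset, offset + step, ... strictly below the clip duration
    return np.arange(offset, duration, step)

print(frame_times(3.2))  # [0.5 1.5 2.5]
```

A 10-second clip therefore yields 10 frames, and a clip shorter than 0.5 s yields none.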


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | theano==0.8.0
2 | lasagne==0.1
3 | numpy
4 | scipy
5 | gensim==0.12.3
6 | youtube-dl==2016.05.10
7 | pafy==0.5.0
8 | git+https://github.com/gyglim/gm_submodular.git
9 | opencv-python


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup
 2 | 
 3 | setup(name='qv_summary',
 4 |       version='0.1',
 5 |       description='Shows how to use our pretrained relevance model to score video frames for a given text query',
 6 |       author='Arun Balajee Vasudevan, ETH Zurich',
 7 |       author_email='arunv@vision.ee.ethz.ch',
 8 |       license='BSD',
 9 |       packages=['qvsumm'],
10 |       include_package_data=True,
11 |       install_requires=[
12 |           'numpy','moviepy','theano','lasagne','scikit-image','pafy','gensim==0.12.3','youtube-dl'],
13 |       zip_safe=False)
14 | 


--------------------------------------------------------------------------------