├── .gitignore ├── MANIFEST.in ├── README.md ├── dataset ├── README.md ├── query_frame_annotations.csv └── query_videoURLs.csv ├── qvsumm ├── __init__.py ├── config.ini ├── model.py ├── shells.py └── utils_func.py ├── requirements.txt ├── setup.py ├── summarization_demo.ipynb └── thumbnail_demo.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | **.pyc 2 | .ipynb_checkpoints/ 3 | .idea/ 4 | **~ 5 | data/ 6 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include qvsumm/config.ini -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Query-adaptive Video Summarization via Quality-aware Relevance Estimation 2 | 3 | This project creates query-specific thumbnails and summaries. 4 | That is, the results are adapted to a user-specified text query 5 | through the use of a textual-visual embedding. 6 | 7 | For more information, see our paper: 8 | 9 | "Query-adaptive Video Summarization via Quality-aware Relevance Estimation" - ACM Multimedia 2017 10 | Arun Balajee Vasudevan\*, Michael Gygli\*, Anna Volokitin, Luc Van Gool (\* denotes equal contribution) 11 | CVLab, ETH Zurich 12 | 13 | ### Installation 14 | 15 | 1. Download this repository or clone it with Git, and then `cd` into the root directory of the repository. 16 | 2. Install the requirements with `pip install -r requirements.txt`. 17 | 3. Run `python setup.py install --user` to install the package __qvsumm__. 18 | 19 | This will suffice to run the notebooks below. For the package to work from any 20 | location, additionally run `export QVSUM_DATA_DIR=DATA_DIR`, 21 | where `DATA_DIR` is the absolute path of the directory `./query-video-summary/data`. 
22 | This is necessary so that the model files can be found. 23 | How to download the models is described in the notebooks. 24 | 25 | Note: We use Lasagne for our implementation. The code is tested with cuDNN version 3.0. 26 | 27 | ### Getting Started 28 | 29 | 1. Thumbnail Extraction - This demo shows how to extract query-relevant thumbnails from a video after scoring all the video frames based on their relevance to the text query. It takes a text query and a video URL as inputs. 30 | Run `thumbnail_demo.ipynb`. 31 | 32 | 2. Summarization - This demo shows how to get the query-relevant summary of the video as a set of keyframes. It takes a text query and a video URL as inputs. 33 | Run `summarization_demo.ipynb`. 34 | 35 | ### Example 36 | We produce the summarization result for different queries for the [**video**](https://www.youtube.com/watch?v=oRdt9TndBVM). 37 | The scores shown in green are the similarity scores for the corresponding queries. 38 | ![Image](https://people.ee.ethz.ch/~arunv/images/summarize_result.png) 39 | 40 | If you use the relevance prediction of this code, please cite: 41 | 42 | Arun Balajee Vasudevan*, Michael Gygli*, Anna Volokitin, Luc Van Gool 43 | "Query-adaptive Video Summarization via Quality-aware Relevance Estimation" 44 | ACM Multimedia 2017 45 | (* denotes equal contribution) 46 | 47 | If you use the summarization code, please also cite the following paper, 48 | which provides code for maximizing submodular mixtures: 49 | 50 | Michael Gygli, Helmut Grabner, Luc Van Gool 51 | "Video Summarization by Learning Submodular Mixtures of Objectives," 52 | IEEE CVPR 2015 53 | 54 | 55 | -------------------------------------------------------------------------------- /dataset/README.md: -------------------------------------------------------------------------------- 1 | ### query_videoURLs.csv 2 | This file contains queries and the URLs of the corresponding videos extracted from YouTube. 
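As a rough illustration of that layout, the sketch below parses `query,url` rows with the standard `csv` module. It assumes the file has no header row and exactly two columns per row, which matches the rows listed in this repository; the helper name `read_query_video_pairs` is hypothetical and not part of `qvsumm`. It is written for Python 3, whereas the package itself targets Python 2.

```python
import csv
import io


def read_query_video_pairs(lines):
    """Parse rows of the form `query,url` (no header row) into (query, url) tuples.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    pairs = []
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        query, url = row
        pairs.append((query.strip(), url.strip()))
    return pairs


# Two rows copied from query_videoURLs.csv, used here instead of opening the file.
sample = io.StringIO(
    "audi s4,https://www.youtube.com/watch?v=kQK0Sj9v3Ic\n"
    "adele hello,https://www.youtube.com/watch?v=-yL7VP4-kP4\n"
)
pairs = read_query_video_pairs(sample)
for query, url in pairs:
    print(query, "->", url)
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open("dataset/query_videoURLs.csv", newline="")`.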
3 | 4 | ### query_frame_annotations.csv 5 | This file contains the URLs of the extracted video frames and the annotations of relevance and diversity for each frame from 5 different workers. Different annotations are uniquely identified by the assignment ID in column 2. 6 | 7 | ## Dataset Annotation 8 | We first annotate the video frames with query relevance labels, and then partition the frames into clusters according to visual similarity. 9 | For each frame, the relevance annotation ranges between 0 and 4 (the answer options are “Trash”, “Not good”, “Good” and “Very Good”). 10 | Cluster annotations start from 0 and range up to an arbitrary number. Cluster 0 contains Trash frames, which are of low quality (e.g. blurred, bad contrast, etc.), while cluster numbers >=1 indicate different groups. We obtain one clustering per worker, where each clustering consists of mutually exclusive subsets of video frames as clusters. 11 | 12 | 13 | 14 | -------------------------------------------------------------------------------- /dataset/query_videoURLs.csv: -------------------------------------------------------------------------------- 1 | audi s4,https://www.youtube.com/watch?v=kQK0Sj9v3Ic 2 | 1080p hd video,https://www.youtube.com/watch?v=YIwuCs1Yovw 3 | basketball fights,https://www.youtube.com/watch?v=Y1G6Sx170kA 4 | abs blogilates,https://www.youtube.com/watch?v=Y-pJ6q2fMEU 5 | chris brown liquor,https://www.youtube.com/watch?v=GBtKV9GrHxg 6 | ariana grande focus without music,https://www.youtube.com/watch?v=YjP9bbard4s 7 | barbie girl song,https://www.youtube.com/watch?v=o_Duyxhf9vQ 8 | 679 fetty wap 1 hour,https://www.youtube.com/watch?v=O2QD2j-tyI8 9 | abc mouse,https://www.youtube.com/watch?v=beT_KcblaBI&list=PLU5T7sDbiGxQeFhKrqOaB_c4gj-cpaVSQ 10 | adele hello,https://www.youtube.com/watch?v=-yL7VP4-kP4 11 | banks waiting game,https://www.youtube.com/watch?v=cLs0oWq2ovo 12 | animal farm,https://www.youtube.com/watch?v=LAeKX5n-5IE 13 | candy 
candy,https://www.youtube.com/watch?v=LFjZjXHrLU0 14 | chase rice whisper,https://www.youtube.com/watch?v=xH8R1Mt7IA8 15 | civil war trailer 2 reaction,https://www.youtube.com/watch?v=8HBQeQ27Yrw 16 | arsenal vs west brom 2016,https://www.youtube.com/watch?v=3rH92uypBkY 17 | chapo guzman entrevista,https://www.youtube.com/watch?v=wh5JeDI8mds 18 | birds chirping sound effect,https://www.youtube.com/watch?v=8WwNJxL-8_k 19 | cannabis oil cancer cure,https://www.youtube.com/watch?v=teKzQJ_j0fA 20 | car fails,https://www.youtube.com/watch?v=oYoKnemfpTc 21 | air force ones nelly,https://www.youtube.com/watch?v=q6wbXf8jj9M 22 | college football pump up 2015-16,https://www.youtube.com/watch?v=xUm2BW7VY6I 23 | big bang theory,https://www.youtube.com/watch?v=A2VZ5QxCVNA 24 | beyonce halo,https://www.youtube.com/watch?v=yW8qsL1nDPM 25 | cooking channel,https://www.youtube.com/watch?v=3_UvSmTpKGY 26 | abs workout for men,https://www.youtube.com/watch?v=jdmK1XmPgso 27 | call of duty ghost,https://www.youtube.com/watch?v=Zxnx3W-HA18 28 | cristiano araujo hoje eu to terrivel,https://www.youtube.com/watch?v=nf3igFRSDsA 29 | abc jackson 5,https://www.youtube.com/watch?v=JxpmbEATBH0 30 | barcelona vs deportivo cuenca 2016,https://www.youtube.com/watch?v=aORUMmo1L68 31 | cancer story,https://www.youtube.com/watch?v=2E5kL-Hyx7Y 32 | candy cameo,https://www.youtube.com/watch?v=5w2Anvecs6s 33 | ariana grande let me love you,https://www.youtube.com/watch?v=eYtq6eA5m5k 34 | bts run japanese version,https://www.youtube.com/watch?v=rZstHq8CQF0 35 | bts live boy in luv,https://www.youtube.com/watch?v=sVxofSZc0eU 36 | 1080p test,https://www.youtube.com/watch?v=1-UdWS4RAA4 37 | britney spears criminal,https://www.youtube.com/watch?v=qqK0rN0BmGk 38 | 679 fetty wap lyrics clean,https://www.youtube.com/watch?v=OhG1Yr_3Zg0 39 | android wear,https://www.youtube.com/watch?v=3QKykbwzLpQ 40 | christmas songs 2015,https://www.youtube.com/watch?v=SnA52s7qceM&list=PLvCAdHtM7JyNtk6SGfLCANxAGq4hStBsO 41 | 
blackberry molasses,https://www.youtube.com/watch?v=imvqWzo-V8k 42 | banana bus,https://www.youtube.com/watch?v=sHr4B6UAGOs 43 | arsenal vs watford,https://www.youtube.com/watch?v=a3S1Hw-lRhg 44 | adele carpool karaoke,https://www.youtube.com/watch?v=Ptx8sJp-kDY 45 | chapo guzman interview,https://www.youtube.com/watch?v=1ZA1AWlirMw 46 | barbie movies,https://www.youtube.com/watch?v=i2MCVKAFStk 47 | badlapur songs,https://www.youtube.com/watch?v=ziTHTlmPdhQ 48 | abs workout at home,https://www.youtube.com/watch?v=k9AT65aogVs 49 | bts dance cover,https://www.youtube.com/watch?v=75NF4XhNrcw 50 | cannabis culture,https://www.youtube.com/watch?v=VDef1xco-ho 51 | buzzfeed ladylike,https://www.youtube.com/watch?v=KErUeTcsqp4 52 | chase utley slide,https://www.youtube.com/watch?v=SiY2GtBrHug 53 | 3d video,https://www.youtube.com/watch?v=FSGfN9rr78Q 54 | cristiano araujo mente pra mim,https://www.youtube.com/watch?v=k_ivdZfcWP8 55 | coc hack,https://www.youtube.com/watch?v=mxXCmGJYLTA 56 | bodybuilding transformation,https://www.youtube.com/watch?v=VE_5T0dTGnk 57 | birds of prey,https://www.youtube.com/watch?v=dkcDfDQHPLc 58 | birds and the bees,https://www.youtube.com/watch?v=TGDV3iD4Uxo 59 | ariana grande focus instrumental,https://www.youtube.com/watch?v=yA0h3BWmwZ8 60 | amy winehouse back to black,https://www.youtube.com/watch?v=2NVvzRt8fTs 61 | coc movie,https://www.youtube.com/watch?v=nQ_haNuzWEY 62 | beyonce formation,https://www.youtube.com/watch?v=lMAISeUGcyY 63 | 360 video horror,https://www.youtube.com/watch?v=et2Z-Jk5dds 64 | cooking with kylie,https://www.youtube.com/watch?v=HuaAgNje5mk 65 | boom boom,https://www.youtube.com/watch?v=3umBIbmcAeo 66 | 3d movies,https://www.youtube.com/watch?v=Gcc0jh6DDwY&list=PL-qyP9X1OyeQDZIMXstZG48-XHMD-8W2_ 67 | 5s iphone,https://www.youtube.com/watch?v=wGCetsl-srk 68 | cat fails,https://www.youtube.com/watch?v=ntza_9lbbc0 69 | ariana grande focus lyrics,https://www.youtube.com/watch?v=-JSTvXeMHTw 70 | boom clap charli 
xcx,https://www.youtube.com/watch?v=AOPMlIIg_38 71 | call of duty infinite warfare,https://www.youtube.com/watch?v=G5tuqJFWVHU 72 | banks live,https://www.youtube.com/watch?v=QTjnCLAyDw8 73 | civil war trailer,https://www.youtube.com/watch?v=dKrVegVI0Us 74 | android authority,https://www.youtube.com/watch?v=mo6nF-T58PA 75 | call of duty infinity ward,https://www.youtube.com/watch?v=8lJDXZgb0ac 76 | civil war spiderman,https://www.youtube.com/watch?v=KA-KxpHQcII 77 | bmw m3,https://www.youtube.com/watch?v=VQsxbuY6yXw 78 | chocolate salty balls,https://www.youtube.com/watch?v=W6uSS4wBO8E 79 | civil war tv spot,https://www.youtube.com/watch?v=QciVC11E3Ao 80 | desert eagle,https://www.youtube.com/watch?v=8LNBxxoVSn8 81 | 679 fetty wap featuring remy boyz,https://www.youtube.com/watch?v=wxMZkhWum64 82 | 3d printed gun,https://www.youtube.com/watch?v=IylGx-48TUI 83 | big bang theory full episodes,https://www.youtube.com/watch?v=A9RcouViR9o&list=PLVpqyN56rDNSmDQci5huHHRSOuXM4PX9t 84 | autocad 2016,https://www.youtube.com/watch?v=mdolFcEHU-c&list=PLXEyem_18syOyJAMqxwetICD6xJALuHpW 85 | 360 no scope,https://www.youtube.com/watch?v=yZlxc6NJvZc 86 | barcelona vs real madrid 2015,https://www.youtube.com/watch?v=PtjhwYdr5tk 87 | baby alive,https://www.youtube.com/watch?v=mHOEARogPik 88 | animal jam vines,https://www.youtube.com/watch?v=Lko-PLW8g70 89 | chandelier sia,https://www.youtube.com/watch?v=vHR4oOIcVZo 90 | angry birds go,https://www.youtube.com/watch?v=XreVL3Q0Gyk 91 | cristiano araujo efeitos,https://www.youtube.com/watch?v=IYLhExE-5fQ&list=PLT0SpPxj0t3mAun7J3Nzf9SnbSNOcsxDa 92 | dinosaur movie,https://www.youtube.com/watch?v=wnY13ftzi-c 93 | bmw 7 series 2016,https://www.youtube.com/watch?v=sV9fhBwIGO8 94 | hairstyles for men,https://www.youtube.com/watch?v=1t0U8oebod8 95 | banana pancakes jack johnson,https://www.youtube.com/watch?v=JWtvXOpmO_A 96 | audi a6,https://www.youtube.com/watch?v=lr7mPzjTgC0 97 | bmw x6,https://www.youtube.com/watch?v=ENbdAAF1bmo 98 | 
bodybuilding workout,https://www.youtube.com/watch?v=MVpvilPZkJQ 99 | crossfit girls,https://www.youtube.com/watch?v=fFvcmsoJOl4 100 | barcelona vs valencia,https://www.youtube.com/watch?v=jUHNAFwdKLg 101 | anaconda vs lion,https://www.youtube.com/watch?v=EIyGo_MjqKs 102 | babymetal,https://www.youtube.com/watch?v=dbODbQMm5Wc 103 | barcelona vs sporting gijon 2016,https://www.youtube.com/watch?v=6vzjqo-772w 104 | chandelier acapella,https://www.youtube.com/watch?v=6PxEY_kUm-w 105 | blackberry z10,https://www.youtube.com/watch?v=SE9tJKS7MBQ 106 | chocolate rain,https://www.youtube.com/watch?v=2x2W12A8Qow 107 | 3d sound,https://www.youtube.com/watch?v=vgWmA2Qn8zc 108 | agario hack,https://www.youtube.com/watch?v=vdDRamfUpXc 109 | autocad 3d,https://www.youtube.com/watch?v=3lx_FWEQLag 110 | car karaoke james corden,https://www.youtube.com/watch?v=uQereoIxioI 111 | brock lesnar vs dean ambrose,https://www.youtube.com/watch?v=Q_fnzeDCiWo 112 | big bang loser,https://www.youtube.com/watch?v=9HpgGq3_ww0 113 | bts butterfly,https://www.youtube.com/watch?v=lz-_cfI1Vdc 114 | bugatti ace hood,https://www.youtube.com/watch?v=1Iw-hbXWzGM 115 | ark survival evolved dragon,https://www.youtube.com/watch?v=hrsywopY0u0 116 | bebe rexha,https://www.youtube.com/watch?v=NRyTUN0Cz4U 117 | truck tug of war,https://www.youtube.com/watch?v=r_-fUPY4mjc 118 | 5s lean,https://www.youtube.com/watch?v=tUtc3x3xDFc 119 | bernie sanders debate,https://www.youtube.com/watch?v=vzO-JYjEcHE 120 | cartel de santa suena mamalona,https://www.youtube.com/watch?v=jVjJ6LXSmG8 121 | christmas songs for children,https://www.youtube.com/watch?v=eQ34DSTjsLQ&list=PLA-vix1dc3M6VOf5MgK1KkWHtxMX1_58j 122 | buzzfeed india,https://www.youtube.com/watch?v=AlB2hJXHP_4 123 | black and yellow wiz khalifa,https://www.youtube.com/watch?v=jU7dzfFMo8w 124 | coc comedy,https://www.youtube.com/watch?v=aP64eHVILIE 125 | christmas lights,https://www.youtube.com/watch?v=HmMF5XSncG0 126 | android 
n,https://www.youtube.com/watch?v=v1IocBl_5UM 127 | christmas in hollywood,https://www.youtube.com/watch?v=h_FV01Diqn0 128 | 360 video,https://www.youtube.com/watch?v=-YYhc_bEYK4 129 | 360 video minecraft,https://www.youtube.com/watch?v=mxT_BVOIAtw 130 | baby kaely,https://www.youtube.com/watch?v=ncN-axCFF4A 131 | chocolate spongebob,https://www.youtube.com/watch?v=i-HFbEbBrhA 132 | chandelier trent harmon,https://www.youtube.com/watch?v=J0SUW_lmZxY 133 | bodybuilding motivation,https://www.youtube.com/watch?v=Apu3ArZNjJA 134 | candy girl new edition,https://www.youtube.com/watch?v=1PZHG4_bKxA 135 | aruan ortiz hidden voices,https://www.youtube.com/watch?v=3QLWJZlhg3Q 136 | crtani za bebe,https://www.youtube.com/watch?v=QkA8KIhzSY0 137 | aib alia bhatt,https://www.youtube.com/watch?v=uBwfDO52En0 138 | bus crash,https://www.youtube.com/watch?v=vB3gJRRHhz8 139 | boom boom boom boom vengaboys,https://www.youtube.com/watch?v=YKNkC6Sk7DU 140 | bts dance girl group,https://www.youtube.com/watch?v=DMqoTBN-uXk 141 | bloodborne game grumps,https://www.youtube.com/watch?v=TGofzK1DGds 142 | college move in day,https://www.youtube.com/watch?v=snzDwACffzs 143 | bernie sanders snl,https://www.youtube.com/watch?v=bzbF0CszTt8 144 | android marshmallow,https://www.youtube.com/watch?v=fH27N690SIU 145 | angry birds toon,https://www.youtube.com/watch?v=guFgKAE4S9I 146 | big bang bae bae,https://www.youtube.com/watch?v=TKD03uPVD-Q 147 | brock lesnar theme,https://www.youtube.com/watch?v=6K07yVtzioI 148 | anaconda movie,https://www.youtube.com/watch?v=JWqaTJbxbW0 149 | audi a7,https://www.youtube.com/watch?v=Q5zA86s5GZc 150 | crossfit behind the scenes 2015,https://www.youtube.com/watch?v=AL-jGGlhm_A 151 | bus song,https://www.youtube.com/watch?v=8vrQA-DTp0g&list=PLh9dqRgNUcGvew-T_8Q2LExxcZ2vdsDt9 152 | brock lesnar vs big show,https://www.youtube.com/watch?v=znpFsE6bwMU 153 | 360jeezy,https://www.youtube.com/watch?v=T2yZmV_73LE 154 | beyonce hold 
up,https://www.youtube.com/watch?v=i0gMh3i6qTk 155 | cid 1349,https://www.youtube.com/watch?v=1jav0e32L8o 156 | anitta zen,https://www.youtube.com/watch?v=uJcbhRzy_UM 157 | bus driver beats up girl,https://www.youtube.com/watch?v=zcKYeTlWbWg 158 | cannabis 2016,https://www.youtube.com/watch?v=yb0Prix20Tc 159 | chapo guzman documentary,https://www.youtube.com/watch?v=woWeIMjXY9M 160 | cooking with dog,https://www.youtube.com/watch?v=z9D9jpuplBo 161 | chapo guzman movie,https://www.youtube.com/watch?v=woWeIMjXY9M 162 | big bang fantastic baby,https://www.youtube.com/watch?v=epj3e8nNpVM 163 | cat stevens,https://www.youtube.com/watch?v=hr0rDW5j1KU 164 | 5s online offline,https://www.youtube.com/watch?v=EfsWBXcWUpg 165 | bmw m6,https://www.youtube.com/watch?v=WoiLRgKEHgE 166 | borro cassette maluma,https://www.youtube.com/watch?v=oRWgFGsQcUk 167 | college basketball,https://www.youtube.com/watch?v=p9lqZ4wPCIA 168 | big bang theory season 9,https://www.youtube.com/watch?v=SWZi_2vICLk 169 | adana merkez remix,https://www.youtube.com/watch?v=IFfK_EmHaRs 170 | blok ekipa 79,https://www.youtube.com/watch?v=eUQLwpRa5vc 171 | bmw m5,https://www.youtube.com/watch?v=gLX-LSNpfTY 172 | 360 degree,https://www.youtube.com/watch?v=K5Cj8Her_6s 173 | basketball drills,https://www.youtube.com/watch?v=mzOpFFBIlSo 174 | banana phone,https://www.youtube.com/watch?v=j5C6X9vOEkU 175 | cannabis rapper,https://www.youtube.com/watch?v=foQ7x5r61XQ 176 | beyonce lemonade album,https://www.youtube.com/watch?v=BB5zLq1zcdo&list=PLxKHVMqMZqUSPF11Ghs0KqDfOGhB9Vw5E 177 | chase pig commercial,https://www.youtube.com/watch?v=kDcfM9QQsWs 178 | bts run teaser,https://www.youtube.com/watch?v=GdQu9kqR4aY 179 | bts dance practice dope,https://www.youtube.com/watch?v=wnmulgX_l9E 180 | cooking videos,https://www.youtube.com/watch?v=8KX2V0vMEOY 181 | boom boom boom boom empire,https://www.youtube.com/watch?v=FLx1aEqq6HA 182 | childish gambino bonfire,https://www.youtube.com/watch?v=qL1B_r9nC9k 183 | amy 
winehouse love is a losing game,https://www.youtube.com/watch?v=nMO5Ko_77Hk 184 | bebe rexha live,https://www.youtube.com/watch?v=bTr_Dkk1_bM 185 | ariana grande dangerous woman,https://www.youtube.com/watch?v=naMeIpGcHtA 186 | cartel de santa doctor marihuana,https://www.youtube.com/watch?v=cXn9E9POjnE 187 | chase bryant,https://www.youtube.com/watch?v=YeLqgmVJk0Y 188 | aib knockout salman khan,https://www.youtube.com/watch?v=F3nqJMtfE2E 189 | bus stop,https://www.youtube.com/watch?v=It75wQ0JypA 190 | cat in the hat,https://www.youtube.com/watch?v=1OSStCz6EdY 191 | -------------------------------------------------------------------------------- /qvsumm/__init__.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This module contains functions to create the Relevance network and load the weights 3 | as well as some helper functions, e.g. for running the demos. 4 | For more information on the method, see 5 | Arun Balajee Vasudevan*, Michael Gygli*, Anna Volokitin, Luc Van Gool(* denotes equal contribution) 6 | "Query-adaptive Video Summarization via Quality-aware Relevance Estimation", ACM Multimedia 2017 7 | ''' 8 | 9 | __author__ = 'Arun Balajee Vasudevan' 10 | import ConfigParser 11 | import numpy as np 12 | import os 13 | import model 14 | import gensim 15 | import theano.tensor as T 16 | import cv2 17 | 18 | # Load the configuration 19 | if not 'QVSUM_DATA_DIR' in os.environ: 20 | os.environ['QVSUM_DATA_DIR']='./data' 21 | config = ConfigParser.SafeConfigParser(os.environ) 22 | print('Loaded config file from %s' % config.read('%s/config.ini' % os.path.dirname(__file__))[0]) 23 | 24 | import shells 25 | 26 | try: 27 | import lasagne 28 | import theano 29 | except (ImportError, AssertionError) as e: 30 | print(e.message) 31 | 32 | def get_QAR_function(): 33 | ''' 34 | Get Relevance function (CNN-LSTM Relevance model) 35 | @return: theano function that scores video frames 36 | ''' 37 | # Set LSTM model 38 | print('Load 
weights and compile Relevance model...') 39 | input_var_lstm = theano.tensor.tensor3('inputs_lstm') 40 | input_var_mask = theano.tensor.bmatrix('inputs_mask') 41 | l_out = model.LSTMmodel(input_var_lstm, input_var_mask) 42 | model.set_lstmweights(l_out, config.get('paths', 'LSTM_weight_file')) 43 | network_output = lasagne.layers.get_output(l_out, deterministic=True) 44 | 45 | # Set CNN model 46 | network = model.build_vggmodel(batch_size=1) 47 | model.set_vggweights(network['fc7'], config.get('paths', 'vgg_weight_file'), 36) 48 | inter_layer = lasagne.layers.get_output(network['fc7'], deterministic=True) 49 | netpool = model.build_custom_mlp(inter_layer, depth=0, width=4096, drop_input=0.5, drop_hidden=0.5) 50 | model.set_cnnweights(netpool, config.get('paths', 'cnn_weight_file')) 51 | mlp_output = lasagne.layers.get_output(netpool, deterministic=True) 52 | 53 | test_similarity = model.relevance_score(mlp_output, network_output) 54 | test_quality = model.interestingness_score(mlp_output) 55 | val_fn = theano.function([network['input_layer'].input_var, input_var_lstm, input_var_mask], 56 | [test_similarity, test_quality], on_unused_input='warn') 57 | 58 | return val_fn 59 | 60 | 61 | def get_word2vec_function(): 62 | ''' 63 | Get word2vec function 64 | @param feature_layer: a layer name (see model.py). 
If provided, pred_fn returns (score, and the activations at feature_layer) 65 | @return: theano function that scores video frames 66 | ''' 67 | # Set word2vec model 68 | print('Load word2vec model...') 69 | if (os.path.isfile(config.get('paths', 'word2vec_file'))): 70 | w2vmodel = gensim.models.Word2Vec.load(config.get('paths', 'word2vec_file')) 71 | else: 72 | w2vmodel = gensim.models.Word2Vec.load(config.get('paths', 'word2vec_smallfile')) 73 | return w2vmodel 74 | 75 | 76 | def get_rel_Q_scores(val_fn, w2vmodel, query, frames): 77 | ''' 78 | Predict similarity and quality scores for frames 79 | @param val_fn: prediction function 80 | @param w2vmodel: word2vec model 81 | @param query: given text query as string 82 | @param frames: list of paths of video frames 83 | @return: list of scores 84 | ''' 85 | 86 | def load_dataset(image_names): 87 | i = 0 88 | image_size = 224 89 | data = np.zeros((len(image_names), 3, image_size, image_size), dtype=np.float32) 90 | MEAN_PIXEL = [103.939, 116.779, 123.68] 91 | p = 0 92 | for i in range(len(image_names)): 93 | if not isinstance(image_names[i], np.ndarray): 94 | im = cv2.imread(image_names[i]) 95 | else: 96 | im = image_names[i] 97 | im = cv2.resize(im, (image_size, image_size), interpolation=cv2.INTER_CUBIC) 98 | im = im - MEAN_PIXEL 99 | data[i, :, :, :] = np.swapaxes(np.swapaxes(im, 1, 2), 0, 1) 100 | X_train = data 101 | return X_train 102 | 103 | def load_query(query): 104 | query = query.lower() 105 | query = ' '.join(word for word in query.split(' ') if word in w2vmodel.vocab) 106 | words = query.split() 107 | SEQ_LENGTH = 14 108 | num_features = 300 109 | BATCH_SIZE = 1 110 | qdata = np.zeros((BATCH_SIZE, SEQ_LENGTH, num_features), dtype=np.float32) 111 | mask = np.ones((BATCH_SIZE, SEQ_LENGTH), dtype=np.bool) 112 | for j in range(SEQ_LENGTH): 113 | if j < len(words): 114 | qdata[0, j, :] = np.array(w2vmodel[str(words[j])]) 115 | else: 116 | mask[0, j] = 0 117 | return qdata, mask 118 | 119 | valid_array_sim = []; 
120 | valid_array_q = [] 121 | qdata, mask = load_query(query) 122 | print "Scoring frames... " 123 | for m, p in enumerate(frames): 124 | path = [] 125 | path.append(p) 126 | X = load_dataset(path) 127 | sim, quality = val_fn(X, qdata, mask) 128 | valid_array_sim.append(sim[0]) 129 | valid_array_q.append(quality[0]) 130 | return valid_array_sim, valid_array_q 131 | -------------------------------------------------------------------------------- /qvsumm/config.ini: -------------------------------------------------------------------------------- 1 | [paths] 2 | LSTM_weight_file: %(QVSUM_DATA_DIR)s/LSTMmodel.npz 3 | vgg_weight_file: %(QVSUM_DATA_DIR)s/vgg19.pkl 4 | cnn_weight_file: %(QVSUM_DATA_DIR)s/CNNmodel.npz 5 | word2vec_file: %(QVSUM_DATA_DIR)s/word2vec/word2vecnewsmodelfull300 6 | word2vec_smallfile: %(QVSUM_DATA_DIR)s/word2vec/word2vecmodel.bin -------------------------------------------------------------------------------- /qvsumm/model.py: -------------------------------------------------------------------------------- 1 | import lasagne 2 | import theano.tensor as T 3 | from lasagne.layers import InputLayer 4 | from lasagne.layers import DenseLayer 5 | from lasagne.layers import DropoutLayer 6 | from lasagne.layers import Pool2DLayer as PoolLayer 7 | from lasagne.layers import Conv2DLayer as ConvLayer 8 | import numpy as np 9 | import pickle 10 | 11 | num_features = 300 12 | 13 | # Sequence Length 14 | SEQ_LENGTH = 14 15 | 16 | # Number of units in the two hidden (LSTM) layers 17 | N_HIDDEN = 300 18 | 19 | # All gradients above this will be clipped 20 | GRAD_CLIP = 5 21 | 22 | 23 | def build_vggmodel(input_var=None, batch_size=1): 24 | net = {} 25 | net['input_layer'] = InputLayer((batch_size, 3, 224, 224), input_var=input_var) 26 | net['conv1_1'] = ConvLayer(net['input_layer'], 64, 3, pad=1, flip_filters=False) 27 | net['conv1_2'] = ConvLayer(net['conv1_1'], 64, 3, pad=1, flip_filters=False) 28 | net['pool1'] = PoolLayer(net['conv1_2'], 2) 29 | 
net['conv2_1'] = ConvLayer(net['pool1'], 128, 3, pad=1, flip_filters=False) 30 | net['conv2_2'] = ConvLayer(net['conv2_1'], 128, 3, pad=1, flip_filters=False) 31 | net['pool2'] = PoolLayer(net['conv2_2'], 2) 32 | net['conv3_1'] = ConvLayer(net['pool2'], 256, 3, pad=1, flip_filters=False) 33 | net['conv3_2'] = ConvLayer(net['conv3_1'], 256, 3, pad=1, flip_filters=False) 34 | net['conv3_3'] = ConvLayer(net['conv3_2'], 256, 3, pad=1, flip_filters=False) 35 | net['conv3_4'] = ConvLayer(net['conv3_3'], 256, 3, pad=1, flip_filters=False) 36 | net['pool3'] = PoolLayer(net['conv3_4'], 2) 37 | net['conv4_1'] = ConvLayer(net['pool3'], 512, 3, pad=1, flip_filters=False) 38 | net['conv4_2'] = ConvLayer(net['conv4_1'], 512, 3, pad=1, flip_filters=False) 39 | net['conv4_3'] = ConvLayer(net['conv4_2'], 512, 3, pad=1, flip_filters=False) 40 | net['conv4_4'] = ConvLayer(net['conv4_3'], 512, 3, pad=1, flip_filters=False) 41 | net['pool4'] = PoolLayer(net['conv4_4'], 2) 42 | net['conv5_1'] = ConvLayer(net['pool4'], 512, 3, pad=1, flip_filters=False) 43 | net['conv5_2'] = ConvLayer(net['conv5_1'], 512, 3, pad=1, flip_filters=False) 44 | net['conv5_3'] = ConvLayer(net['conv5_2'], 512, 3, pad=1, flip_filters=False) 45 | net['conv5_4'] = ConvLayer(net['conv5_3'], 512, 3, pad=1, flip_filters=False) 46 | net['pool5'] = PoolLayer(net['conv5_4'], 2) 47 | net['fc6'] = DenseLayer(net['pool5'], num_units=4096) 48 | net['fc6_dropout'] = DropoutLayer(net['fc6'], p=0.5) 49 | net['fc7'] = DenseLayer(net['fc6_dropout'], num_units=4096) 50 | net['fc7_dropout'] = DropoutLayer(net['fc7'], p=0.5) 51 | net['fc8'] = DenseLayer(net['fc7_dropout'], num_units=300, nonlinearity=None) 52 | net['prob'] = DenseLayer(net['fc7_dropout'], num_units=300, nonlinearity=None) 53 | return net 54 | 55 | 56 | def build_vggpool5model(input_var=None): 57 | net = {} 58 | net['input_layer'] = InputLayer((None, 512, 7, 7), input_var=input_var) 59 | net['fc6'] = DenseLayer(net['input_layer'], num_units=4096) 60 | 
net['fc6_dropout'] = DropoutLayer(net['fc6'], p=0.5) 61 | net['fc7'] = DenseLayer(net['fc6_dropout'], num_units=4096) 62 | net['fc7_dropout'] = DropoutLayer(net['fc7'], p=0.5) 63 | net['prob'] = DenseLayer(net['fc7_dropout'], num_units=300, nonlinearity=None) 64 | return net 65 | 66 | 67 | def LSTMmodel(input_var_lstm=None, input_var_mask=None, batch_size=1): 68 | # print "Building LSTM network" 69 | l_in = lasagne.layers.InputLayer(shape=(batch_size, SEQ_LENGTH, num_features), input_var=input_var_lstm) 70 | mask_input = lasagne.layers.InputLayer(shape=(batch_size, SEQ_LENGTH), input_var=input_var_mask) 71 | l_forward_1 = lasagne.layers.LSTMLayer(l_in, N_HIDDEN, grad_clipping=GRAD_CLIP, mask_input=mask_input, 72 | nonlinearity=lasagne.nonlinearities.rectify) 73 | l_forward_slice = lasagne.layers.SliceLayer(l_forward_1, -1, 1) 74 | l_out = lasagne.layers.DenseLayer(l_forward_slice, num_units=num_features, W=lasagne.init.Normal(), 75 | nonlinearity=None) 76 | return l_out 77 | 78 | 79 | def build_custom_mlp(input_var=None, depth=1, width=300, drop_input=0, drop_hidden=0.5): 80 | # Input layer and dropout (with shortcut `dropout` for `DropoutLayer`): 81 | network = lasagne.layers.InputLayer(shape=(None, 4096), input_var=input_var) 82 | if drop_input: 83 | network = lasagne.layers.dropout(network, p=drop_input) 84 | # Hidden layers and dropout: 85 | nonlin = lasagne.nonlinearities.rectify 86 | for _ in range(depth): 87 | network = lasagne.layers.DenseLayer( 88 | network, width, nonlinearity=nonlin) 89 | if drop_hidden: 90 | network = lasagne.layers.dropout(network, p=drop_hidden) 91 | # Output layer: 92 | tanh = lasagne.nonlinearities.tanh 93 | network = lasagne.layers.DenseLayer(network, 301, nonlinearity=None) 94 | return network 95 | 96 | 97 | def set_lstmweights(net, 98 | LSTM_weight_file): 99 | ''' 100 | set the weights of the given model. 
101 | @param net: a lasagne network 102 | @param LSTM_weight_file: path to the .npz file holding the LSTM weights 103 | @return: None (the weights are set in place) 104 | ''' 105 | # Get LSTM weights 106 | with np.load(LSTM_weight_file) as f: 107 | param_values = [f['arr_%d' % i] for i in range(len(f.files))] 108 | lasagne.layers.set_all_param_values(net, param_values) 109 | print('Set LSTM learned weights...') 110 | 111 | 112 | def set_vggweights(net, 113 | vgg_weight_file, k): 114 | ''' 115 | set the weights of the given model, using the first k parameter arrays in the file. 116 | @param net: a lasagne network 117 | @param vgg_weight_file: path to the pickled VGG19 weights 118 | @return: None (the weights are set in place) 119 | ''' 120 | # Get VGG weights 121 | # print('Set vgg19 weights...') 122 | vggmodel = pickle.load(open(vgg_weight_file, 'rb')) 123 | lasagne.layers.set_all_param_values(net, vggmodel['param values'][0:k]) 124 | 125 | 126 | def set_cnnweights(net, 127 | cnn_weight_file): 128 | ''' 129 | set the weights of the given model. 130 | @param net: a lasagne network 131 | @param cnn_weight_file: path to the .npz file holding the CNN weights 132 | @return: None (the weights are set in place) 133 | ''' 134 | # Get CNN weights 135 | print('Set CNN learned weights...') 136 | with np.load(cnn_weight_file) as f: 137 | param_values = [f['arr_%d' % i] for i in range(len(f.files))] 138 | lasagne.layers.set_all_param_values([net], param_values) 139 | 140 | 141 | def relevance_score(I_out, Q_out): 142 | I_out = I_out[:, 0:300] 143 | I_out = I_out / I_out.norm(L=2, axis=1).reshape((I_out.shape[0], 1)) 144 | Q_out = Q_out / Q_out.norm(L=2, axis=1).reshape((Q_out.shape[0], 1)) 145 | value = T.diagonal(T.dot(I_out, Q_out.T)) 146 | return value 147 | 148 | 149 | def interestingness_score(I_out): 150 | I_out = I_out[:, 300] 151 | return I_out 152 | -------------------------------------------------------------------------------- /qvsumm/shells.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Implementation of the objectives used in 3 | Arun Balajee Vasudevan*, Michael Gygli*, Anna Volokitin, Luc Van Gool - Query-adaptive Video Summarization via Quality-aware Relevance Estimation. 
ACM Multimedia 2017 4 | ''' 5 | __author__ = "Arun Balajee Vasudevan" 6 | __email__ = "arunv@vision.ee.ethz.ch" 7 | 8 | import numpy as np 9 | import gm_submodular 10 | import gm_submodular.example_objectives as ex 11 | import model 12 | import theano 13 | import lasagne 14 | import scipy.spatial.distance as dist 15 | import cv2 16 | from qvsumm import config 17 | 18 | class Summ(gm_submodular.DataElement): 19 | ''' 20 | Defines a class Summ. 21 | For inference, this needs the function get_querylen(), getDistances(), getCosts(), vggmodel(), load_dataset() and get_mfeatures(). 22 | ''' 23 | budget = 5 24 | 25 | def __init__(self, query, imagenames, rel_scores, int_scores): 26 | self.query = query 27 | self.imagenames = imagenames 28 | self.int_scores = int_scores 29 | self.rel_scores = rel_scores 30 | self.querylen = self.get_querylen() 31 | self.Y = self.get_Y() 32 | self.dist_v = self.get_mfeatures() 33 | 34 | def get_querylen(self): 35 | valid_array = self.rel_scores 36 | return len(valid_array) 37 | 38 | def get_Y(self): 39 | return np.array(range(self.querylen)) 40 | 41 | def getCosts(self): 42 | return np.ones((self.querylen)) 43 | 44 | def getDistances(self): 45 | d = dist.squareform(self.dist_v) 46 | return np.multiply(d, d) 47 | 48 | def vggmodel(self): 49 | ''' 50 | Load the VGG19 with pretrained weights 51 | :return: fc7 layer 52 | ''' 53 | network = model.build_vggmodel(batch_size=1) 54 | model.set_vggweights(network['fc7'], config.get('paths', 'vgg_weight_file'), 36) 55 | prediction = lasagne.layers.get_output(network['fc7']) 56 | val_fn = theano.function([network['input_layer'].input_var], [prediction]) 57 | return val_fn 58 | 59 | def load_dataset(self, frames): 60 | ''' 61 | Preprocess the set of images 62 | ''' 63 | i = 0 64 | # frames=self.imagenames 65 | image_size = 224 66 | data = np.zeros((len(frames), 3, image_size, image_size), dtype=np.float32) 67 | MEAN_PIXEL = [103.939, 116.779, 123.68] 68 | p = 0 69 | for i in range(len(frames)): 70 | 
            image = frames[i]
            im = cv2.imread(image)
            # im = im[:,:,::-1]
            im = cv2.resize(im, (image_size, image_size), interpolation=cv2.INTER_CUBIC)
            im = im - MEAN_PIXEL
            # Reorder from (height, width, channel) to (channel, height, width).
            data[i, :, :, :] = np.transpose(im, (2, 0, 1))
        X_train = data
        return X_train

    def get_mfeatures(self):
        '''
        Compute the pairwise distances between frames in fc7 feature space
        '''
        frames = self.imagenames
        score_fn = self.vggmodel()
        m_features = np.zeros((len(frames), 4096), dtype=np.float32)
        for m, p in enumerate(frames):
            # The network is built with batch_size=1, so frames are fed one at a time.
            X = self.load_dataset([p])
            err = score_fn(X)
            m_features[m, :] = err[0]
        return dist.pdist(m_features)


def quality_shell(S):
    '''
    Quality scoring shell Eq.
    :param S: Summ with interestingness scores
    :return: quality objective
    '''
    valid_array = S.int_scores
    mn = min(valid_array)
    stdv = np.std(valid_array)
    # Shift the scores by their minimum and scale by their standard deviation.
    a = np.array([(item - mn) / stdv for item in valid_array])
    # Pass a list to np.sum; summing a generator with np.sum is deprecated.
    return (lambda X: (np.sum([a[i] for i in X])))


def similarity_shell(S):
    '''
    Query similarity shell Eq.
    :param S: Summ with relevance scores
    :return: similarity objective
    '''
    valid_array = S.rel_scores
    mn = min(valid_array)
    stdv = np.std(valid_array)
    # Shift the scores by their minimum and scale by their standard deviation.
    a = np.array([(item - mn) / stdv for item in valid_array])
    # Pass a list to np.sum; summing a generator with np.sum is deprecated.
    return (lambda X: (np.sum([a[i] for i in X])))


def diversity_shell(S):
    '''
    Diversity shell Eq.
    :param S: Summ DataElement
    :return: diversity objective
    '''
    frames = S.imagenames
    score_fn = S.vggmodel()
    features = np.zeros((len(frames), 4096), dtype=np.float32)
    for m, p in enumerate(frames):
        X = S.load_dataset([p])
        err = score_fn(X)
        features[m, :] = err[0]

    def unit_dist(x, y):
        # Euclidean distance between the L2-normalized feature vectors.
        # (Named unit_dist so that it does not shadow the scipy module imported as dist.)
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        return np.linalg.norm(x / np.linalg.norm(x) - y / np.linalg.norm(y))

    c = lambda x, y: unit_dist(features[x, :], features[y, :])
    # The first selected frame gets a fixed reward of 5; each later frame is
    # rewarded by its distance to the closest previously selected frame.
    b = lambda i, X: 5 if i == 0 else min([c(X[i], X[j]) + 1e-4 for j in range(i)])
    return (lambda X: (np.sum([b(i, X) for i in range(len(X))])))

--------------------------------------------------------------------------------
/qvsumm/utils_func.py:
--------------------------------------------------------------------------------
import os
import pafy
import numpy as np
import urllib
import urllib2
from moviepy.editor import *
from scipy import stats
import sqlite3
import csv
import h5py
import scipy.misc
import shutil
def preprocess_video(query, videoURL):
    Gquery = query
    video = pafy.new(videoURL)
    # Pick the 640x360 stream; fail early if the video does not offer one.
    best = None
    for s in video.streams:
        if s.resolution == '640x360':
            best = s
    if best is None:
        raise ValueError('No 640x360 stream available for %s' % videoURL)
    direc = "videos"
    if not os.path.exists(direc):
        os.makedirs(direc)
    f = direc + "/" + Gquery + ".mp4"
    filename = best.download(quiet=True, filepath=f)
    clip = VideoFileClip(f)
    time = clip.duration
    imagenames = []
    # Sample one frame per second, taken at the middle of each second.
    for k, c in enumerate(np.arange(0.5, time, 1)):
        im = clip.get_frame(c)
        imagepath = direc + "/frames/"
        # On the first frame, clear any frames left over from a previous run.
        if os.path.exists(imagepath) and k == 0:
            shutil.rmtree(imagepath)
        if not os.path.exists(imagepath):
            os.makedirs(imagepath)
        imagepath = imagepath + str(k) + '.png'
        # Note: scipy.misc.imsave was removed in SciPy 1.2, so this call needs an older SciPy.
        scipy.misc.imsave(imagepath, im)
        imagenames.append(imagepath)
    return imagenames
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
theano==0.8.0
lasagne==0.1
numpy
scipy
gensim==0.12.3
youtube-dl==2016.05.10
pafy==0.5.0
git+https://github.com/gyglim/gm_submodular.git

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup

setup(name='qv_summary',
      version='0.1',
      description='This shows how to use our pretrained relevance model to score frames based on a given text query',
      author='Arun Balajee Vasudevan, ETH Zurich',
      author_email='arunv@vision.ee.ethz.ch',
      license='BSD',
      packages=['qvsumm'],
      include_package_data=True,
      install_requires=[
          'numpy','moviepy','theano','lasagne','scikit-image','pafy','gensim==0.12.3','youtube-dl'],
      zip_safe=False)

--------------------------------------------------------------------------------
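The scoring functions `relevance_score` and `interestingness_score` in `qvsumm/model.py` operate on symbolic Theano tensors, which makes them hard to try out in isolation. The following is a rough NumPy sketch of the same arithmetic, not the package's own code. It assumes, as the slicing in those two functions suggests, that each row of the image network output holds a 300-dimensional embedding followed by one quality value. The `_np` function names, `EMBED_DIM`, and the random inputs are illustrative and do not exist in the package.

```python
import numpy as np

EMBED_DIM = 300  # assumed layout: outputs 0..299 are the embedding, output 300 the quality value


def relevance_score_np(image_out, query_out):
    """Row-wise cosine similarity between frame embeddings and query embeddings."""
    emb = image_out[:, :EMBED_DIM]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    query = query_out / np.linalg.norm(query_out, axis=1, keepdims=True)
    # Same values as the diagonal of emb.dot(query.T), without building the full matrix.
    return np.sum(emb * query, axis=1)


def interestingness_score_np(image_out):
    """The output that follows the embedding is read as the quality score."""
    return image_out[:, EMBED_DIM]


# Illustrative inputs: 4 frames, each scored against the same query embedding.
rng = np.random.RandomState(0)
image_out = rng.randn(4, EMBED_DIM + 1)
query_out = np.tile(rng.randn(1, EMBED_DIM), (4, 1))

rel = relevance_score_np(image_out, query_out)
quality = interestingness_score_np(image_out)
```

Because both embeddings are normalized to unit length, each relevance value lies in [-1, 1], and a frame whose embedding points in the same direction as the query scores 1.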