├── .gitignore
├── LICENSE
├── MANIFEST.in
├── README.md
├── bin
│   └── thingscoop
├── resources
│   ├── clockwork_orange.png
│   ├── filter.png
│   ├── header.gif
│   └── preview.jpg
├── setup.py
└── thingscoop
    ├── __init__.py
    ├── classifier.py
    ├── models.py
    ├── preview.py
    ├── query.py
    ├── search.py
    └── utils.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *pyc
2 | *#*
3 | .DS_Store
4 | dist
5 | *.egg-info
6 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | 
3 | Copyright (c) 2015 Anastasis Germanidis
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in
13 | all copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 | THE SOFTWARE.
22 | 
23 | 
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include README.md
2 | include LICENSE
3 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | ## Thingscoop: Utility for searching and filtering videos based on their content
4 | 
5 | ### Description
6 | 
7 | Thingscoop is a command-line utility for analyzing videos semantically - that means searching, filtering, and describing videos based on objects, places, and other things that appear in them.
8 | 
9 | When you first run thingscoop on a video file, it uses a [convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network) to create an "index" of what's contained in every second of the input by repeatedly performing image classification on a frame-by-frame basis. Once an index for a video file has been created, you can search (i.e. get the start and end times of the regions in the video matching the query) and filter (i.e. create a [supercut](https://en.wikipedia.org/wiki/Supercut) of the matching regions) the input using arbitrary queries. Thingscoop uses a very basic query language that lets you compose queries that test for the presence or absence of labels with the logical operators `!` (not), `||` (or) and `&&` (and). For example, to search a video for the presence of the sky *and* the absence of the ocean: `thingscoop search 'sky && !ocean' <file>`.
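A couple of further illustrative queries (the labels used here are only examples and must exist in the current model's label set, and `some_video.mp4` is a placeholder filename):

```
# match seconds that contain a road or a highway, but no car
$ thingscoop search '(road || highway) && !car' some_video.mp4

# build a supercut of the seconds that contain both the ocean and the sky
$ thingscoop filter 'ocean && sky' some_video.mp4
```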
10 | 
11 | Right now two models are supported by thingscoop: `vgg_imagenet` uses the architecture described in ["Very Deep Convolutional Networks for Large-Scale Image Recognition"](http://arxiv.org/abs/1409.1556) to recognize objects from the [ImageNet](http://www.image-net.org/) database, and `googlenet_places` uses the architecture described in ["Going Deeper with Convolutions"](http://arxiv.org/abs/1409.4842) to recognize settings and places from the [MIT Places](http://places.csail.mit.edu/) database. You can specify which model you'd like to use by running `thingscoop models use <model>`, where `<model>` is either `vgg_imagenet` or `googlenet_places`. More models will be added soon.
12 | 
13 | Thingscoop is based on [Caffe](http://caffe.berkeleyvision.org/), an open-source deep learning framework.
14 | 
15 | ### Installation
16 | 
17 | 1. Install ffmpeg, imagemagick, and ghostscript: `brew install ffmpeg imagemagick ghostscript` (Mac OS X) or `apt-get install ffmpeg imagemagick ghostscript` (Ubuntu).
18 | 2. Follow the installation instructions on the [Caffe Installation page](http://caffe.berkeleyvision.org/installation.html).
19 | 3. Make sure you build the Python bindings by running `make pycaffe` (in Caffe's directory).
20 | 4. Set the environment variable CAFFE_ROOT to point to Caffe's directory: `export CAFFE_ROOT=[Caffe's directory]`.
21 | 5. Install thingscoop: `easy_install thingscoop` or `pip install thingscoop`.
22 | 
23 | ### Usage
24 | 
25 | #### `thingscoop search <query> <file>`
26 | 
27 | Print the start and end times (in seconds) of the regions in `<file>` that match `<query>`. Creates an index for `<file>` using the current model if it does not exist.
28 | 
29 | Example output:
30 | 
31 | ```
32 | $ thingscoop search violin waking_life.mp4
33 | /Users/anastasis/Downloads/waking_life.mp4 148.000000 162.000000
34 | /Users/anastasis/Downloads/waking_life.mp4 176.000000 179.000000
35 | /Users/anastasis/Downloads/waking_life.mp4 180.000000 186.000000
36 | /Users/anastasis/Downloads/waking_life.mp4 189.000000 190.000000
37 | /Users/anastasis/Downloads/waking_life.mp4 192.000000 200.000000
38 | /Users/anastasis/Downloads/waking_life.mp4 211.000000 212.000000
39 | /Users/anastasis/Downloads/waking_life.mp4 222.000000 223.000000
40 | /Users/anastasis/Downloads/waking_life.mp4 235.000000 243.000000
41 | /Users/anastasis/Downloads/waking_life.mp4 247.000000 249.000000
42 | /Users/anastasis/Downloads/waking_life.mp4 251.000000 253.000000
43 | /Users/anastasis/Downloads/waking_life.mp4 254.000000 258.000000
44 | ```
45 | 
46 | #### `thingscoop filter <query> <file>`
47 | 
48 | Generate a video compilation of the regions in `<file>` that match `<query>`. Creates an index for `<file>` using the current model if it does not exist.
49 | 
50 | Example output:
51 | 
52 | 
53 | 
54 | #### `thingscoop sort <file>`
55 | 
56 | Create a compilation video showing examples for every label recognized in the video (in alphabetical order). Creates an index for `<file>` using the current model if it does not exist.
57 | 
58 | Example output:
59 | 
60 | 
61 | 
62 | #### `thingscoop describe <file>`
63 | 
64 | Print every label that appears in `<file>` along with the number of times it appears. Creates an index for `<file>` using the current model if it does not exist.
65 | 
66 | #### `thingscoop preview <file>`
67 | 
68 | Create a window that plays the input video `<file>` while also displaying the labels the model recognizes on every frame.
69 | 
70 | ```
71 | $ thingscoop describe koyaanisqatsi.mp4 -m googlenet_places
72 | sky 405
73 | skyscraper 363
74 | canyon 141
75 | office_building 130
76 | highway 78
77 | lighthouse 66
78 | hospital 64
79 | desert 59
80 | shower 49
81 | volcano 45
82 | underwater 44
83 | airport_terminal 43
84 | fountain 39
85 | runway 36
86 | assembly_line 35
87 | aquarium 34
88 | fire_escape 34
89 | music_studio 32
90 | bar 28
91 | amusement_park 28
92 | stage 26
93 | wheat_field 25
94 | butchers_shop 25
95 | engine_room 24
96 | slum 20
97 | butte 20
98 | igloo 20
99 | ...etc
100 | ```
101 | 
102 | #### `thingscoop index <file>`
103 | 
104 | Create an index for `<file>` using the current model if it does not exist.
105 | 
106 | #### `thingscoop models list`
107 | 
108 | List all models currently available in Thingscoop.
109 | 
110 | ```
111 | $ thingscoop models list
112 | googlenet_imagenet Model described in the paper "Going Deeper with Convolutions" trained on the ImageNet database
113 | googlenet_places Model described in the paper "Going Deeper with Convolutions" trained on the MIT Places database
114 | vgg_imagenet 16-layer model described in the paper "Return of the Devil in the Details: Delving Deep into Convolutional Nets" trained on the ImageNet database
115 | ```
116 | 
117 | #### `thingscoop models info <model>`
118 | 
119 | Print more detailed information about `<model>`.
120 | 
121 | ```
122 | $ thingscoop models info googlenet_places
123 | Name: googlenet_places
124 | Description: Model described in the paper "Going Deeper with Convolutions" trained on the MIT Places database
125 | Dataset: MIT Places
126 | ```
127 | 
128 | #### `thingscoop models freeze`
129 | 
130 | List all models that have already been downloaded.
131 | 
132 | ```
133 | $ thingscoop models freeze
134 | googlenet_places
135 | vgg_imagenet
136 | ```
137 | 
138 | #### `thingscoop models current`
139 | 
140 | Print the model that is currently in use.
141 | 
142 | ```
143 | $ thingscoop models current
144 | googlenet_places
145 | ```
146 | 
147 | #### `thingscoop models use <model>`
148 | 
149 | Set the current model to `<model>`. Downloads that model locally if it hasn't been downloaded already.
150 | 
151 | #### `thingscoop models download <model>`
152 | 
153 | Download the model `<model>` locally.
154 | 
155 | #### `thingscoop models remove <model>`
156 | 
157 | Remove the model `<model>` locally.
158 | 
159 | #### `thingscoop models clear`
160 | 
161 | Remove all models stored locally.
162 | 
163 | #### `thingscoop labels list`
164 | 
165 | Print all the labels used by the current model.
166 | 
167 | ```
168 | $ thingscoop labels list
169 | abacus
170 | abaya
171 | abstraction
172 | academic gown
173 | accessory
174 | accordion
175 | acorn
176 | acorn squash
177 | acoustic guitar
178 | act
179 | actinic radiation
180 | action
181 | activity
182 | adhesive bandage
183 | adjudicator
184 | administrative district
185 | admiral
186 | adornment
187 | adventurer
188 | advocate
189 | ...
190 | ```
191 | 
192 | #### `thingscoop labels search <regexp>`
193 | 
194 | Print all the labels supported by the current model that match the regular expression `<regexp>`.
195 | 
196 | ```
197 | $ thingscoop labels search instrument$
198 | beating-reed instrument
199 | bowed stringed instrument
200 | double-reed instrument
201 | free-reed instrument
202 | instrument
203 | keyboard instrument
204 | measuring instrument
205 | medical instrument
206 | musical instrument
207 | navigational instrument
208 | negotiable instrument
209 | optical instrument
210 | percussion instrument
211 | scientific instrument
212 | stringed instrument
213 | surveying instrument
214 | wind instrument
215 | ...
216 | 
217 | ```
218 | 
219 | ### Full usage options
220 | 
221 | ```
222 | thingscoop - Command-line utility for searching and filtering videos based on their content
223 | 
224 | Usage:
225 | thingscoop filter <query> <files>... [-o <output_path>] [-m <model>] [-s <sr>] [-c <mc>] [--recreate-index] [--gpu-mode] [--open]
226 | thingscoop search <query> <files>... [-o <output_path>] [-m <model>] [-s <sr>] [-c <mc>] [--recreate-index] [--gpu-mode]
227 | thingscoop describe <file> [-n <words>] [-m <model>] [--recreate-index] [--gpu-mode] [-c <mc>]
228 | thingscoop index <files> [-m <model>] [-s <sr>] [-c <mc>] [-r <ocr>] [--recreate-index] [--gpu-mode]
229 | thingscoop sort <file> [-m <model>] [--gpu-mode] [--min-confidence <ct>] [--max-section-length <ms>] [-i <ignore>] [--open]
230 | thingscoop preview <file> [-m <model>] [--gpu-mode] [--min-confidence <ct>]
231 | thingscoop labels list [-m <model>]
232 | thingscoop labels search <regexp> [-m <model>]
233 | thingscoop models list
234 | thingscoop models info <model>
235 | thingscoop models freeze
236 | thingscoop models current
237 | thingscoop models use <model>
238 | thingscoop models download <model>
239 | thingscoop models remove <model>
240 | thingscoop models clear
241 | 
242 | Options:
243 | --version Show version.
244 | -h --help Show this screen.
245 | -o --output <dst> Output file for supercut
246 | -s --sample-rate <sr> How many frames to classify per second (default = 1)
247 | -c --min-confidence <mc> Minimum prediction confidence required to consider a label (default depends on model)
248 | -m --model <model> Model to use (use 'thingscoop models list' to see all available models)
249 | -n --number-of-words <words> Number of words to describe the video with (default = 5)
250 | -t --max-section-length <ms> Max number of seconds to show examples of a label in the sorted video (default = 5)
251 | -r --min-occurrences <ocr> Minimum number of occurrences of a label in video required for it to be shown in the sorted video (default = 2)
252 | -i --ignore-labels <labels> Labels to ignore when creating the sorted video
253 | --title <title> Title to show at the beginning of the video (sort mode only)
254 | --gpu-mode Enable GPU mode
255 | --recreate-index Recreate object index for file if it already exists
256 | --open Open filtered video after creating it (OS X only)
257 | ```
258 | 
259 | ### CHANGELOG
260 | 
261 | #### 0.2 (8/16/2015)
262 | 
263 | * Added `sort` option for creating a video compilation of all labels appearing in a video
264 | * Now using JSON for the index files
265 | 
266 | #### 0.1 (8/5/2015)
267 | 
268 | * Conception
269 | 
270 | ### License
271 | 
272 | MIT
273 | 
--------------------------------------------------------------------------------
/bin/thingscoop:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | """
3 | thingscoop - Command-line utility for searching and filtering videos based on their content
4 | 
5 | Usage:
6 | thingscoop filter <query> <files>... [-o <output_path>] [-m <model>] [-s <sr>] [-c <mc>] [--recreate-index] [--gpu-mode] [--open]
7 | thingscoop search <query> <files>...
[-o <output_path>] [-m <model>] [-s <sr>] [-c <mc>] [--recreate-index] [--gpu-mode] 8 | thingscoop describe <file> [-n <words>] [-m <model>] [--recreate-index] [--gpu-mode] [-c <mc>] 9 | thingscoop index <files> [-m <model>] [-s <sr>] [-c <mc>] [-r <ocr>] [--recreate-index] [--gpu-mode] 10 | thingscoop sort <file> [-m <model>] [--gpu-mode] [--min-confidence <ct>] [--max-section-length <ms>] [-i <ignore>] [--open] 11 | thingscoop preview <file> [-m <model>] [--gpu-mode] [--min-confidence <ct>] 12 | thingscoop labels list [-m <model>] 13 | thingscoop labels search <regexp> [-m <model>] 14 | thingscoop models list 15 | thingscoop models info <model> 16 | thingscoop models freeze 17 | thingscoop models current 18 | thingscoop models use <model> 19 | thingscoop models download <model> 20 | thingscoop models remove <model> 21 | thingscoop models clear 22 | 23 | Options: 24 | --version Show version. 25 | -h --help Show this screen. 26 | -o --output <dst> Output file for supercut 27 | -s --sample-rate <sr> How many frames to classify per second (default = 1) 28 | -c --min-confidence <mc> Minimum prediction confidence required to consider a label (default depends on model) 29 | -m --model <model> Model to use (use 'thingscoop models list' to see all available models) 30 | -n --number-of-words <words> Number of words to describe the video with (default = 5) 31 | -t --max-section-length <ms> Max number of seconds to show examples of a label in the sorted video (default = 5) 32 | -r --min-occurrences <ocr> Minimum number of occurrences of a label in video required for it to be shown in the sorted video (default = 2) 33 | -i --ignore-labels <labels> Labels to ignore when creating the sorted video video 34 | --title <title> Title to show at the beginning of the video (sort mode only) 35 | --gpu-mode Enable GPU mode 36 | --recreate-index Recreate object index for file if it already exists 37 | --open Open filtered video after creating it (OS X only) 38 | """ 39 | 40 | import sys 41 | import os 42 | from docopt import docopt 43 | 44 | if __name__ == '__main__': 45 | if 'CAFFE_ROOT' not in os.environ: 46 | print "You need to set your CAFFE_ROOT environment variable to point to your Caffe directory" 47 | sys.exit(1) 48 | sys.path.append(os.path.join(os.environ['CAFFE_ROOT'], 'python')) 49 | os.environ['GLOG_minloglevel'] = '3' 50 | import thingscoop 51 | args = docopt(__doc__, version="Thingscoop 0.1") 52 | sys.exit(thingscoop.main(args)) 53 | 54 | -------------------------------------------------------------------------------- /resources/clockwork_orange.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/agermanidis/thingscoop/9c61079d4469a2011e232b950a5e17c4dfaf4ef1/resources/clockwork_orange.png -------------------------------------------------------------------------------- /resources/filter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/agermanidis/thingscoop/9c61079d4469a2011e232b950a5e17c4dfaf4ef1/resources/filter.png -------------------------------------------------------------------------------- /resources/header.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/agermanidis/thingscoop/9c61079d4469a2011e232b950a5e17c4dfaf4ef1/resources/header.gif -------------------------------------------------------------------------------- /resources/preview.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/agermanidis/thingscoop/9c61079d4469a2011e232b950a5e17c4dfaf4ef1/resources/preview.jpg -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | try: 4 | from setuptools import setup 5 | except ImportError: 6 | from distutils.core import setup 7 | 8 | setup( 9 | name='thingscoop', 10 | version='0.2', 11 | description='Command-line utility for searching and filtering videos based on objects that appear in them using convolutional neural networks', 12 | author='Anastasis Germanidis', 13 | author_email='agermanidis@gmail.com', 14 | url='https://github.com/agermanidis/thingscoop', 15 | packages=['thingscoop'], 16 | scripts=['bin/thingscoop'], 17 | install_requires=[ 18 | 'pyPEG2>=2.15.1', 19 | 'requests>=2.7.0', 20 | 'moviepy>=0.2.2.11', 21 | 'docopt>=0.6.2', 22 | 'progressbar>=2.3', 23 | 'numpy>=1.9.2', 24 | 'pattern>=2.6', 25 | 'termcolor>=1.1.0' 26 | ], 27 | license="MIT" 28 | ) 29 | -------------------------------------------------------------------------------- /thingscoop/__init__.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import glob 3 | import multiprocessing 4 | import os 5 | import pydoc 6 | import subprocess 7 | import tempfile 8 | from collections import defaultdict 9 | from operator import itemgetter 10 | 11 | from progressbar import ProgressBar, Percentage, Bar, ETA 12 | 13 | from .classifier import ImageClassifier 14 | from .models import clear_models 15 | from .models import download_model 16 | from .models import get_active_model 17 | from .models import get_all_models 18 | from .models import get_downloaded_models 19 | from .models import info_model 20 | from .models import read_model 21 | from .models import remove_model 22 | from .models import use_model 23 | from .preview import preview 24 | from .query import filename_for_query 25 | from .query import validate_query 26 | from .search import label_videos 27 | from .search import filter_out_labels 28 | from .search import reverse_index 29 | from .search import search_videos 30 | from .search import threshold_labels 31 | from .utils import create_compilation 32 | from .utils import create_supercut 33 | from .utils import merge_values 34 | from .utils import search_labels 35 | 36 | def main(args): 37 | model_name = args['--model'] or args['<model>'] or get_active_model() 38 | 39 | if args['models']: 40 | if args['list']: 41 | models = get_all_models() 42 | for model in models: 43 | print "{0}{1}".format(model['name'].ljust(30), model['description']) 44 | if args['freeze']: 45 | models = get_downloaded_models() 46 | for model_name in models: 47 | print model_name 48 | elif args['current']: 49 | print get_active_model() 50 | elif args['use']: 51 | use_model(model_name) 52 | elif args['download']: 53 | download_model(model_name) 54 | elif args['info']: 55 | try: 56 | model_info = info_model(model_name) 57 | for key in ['name', 'description', 'dataset']: 58 | print "{0}: {1}".format(key.capitalize(), model_info[key]) 59 | except: 60 | print "Model not found" 61 | elif args['remove']: 62 | remove_model(args['<model>']) 63 | elif args['clear']: 64 | clear_models() 65 | return 0 66 | 67 | if args['labels']: 68 | download_model(model_name) 69 | model = read_model(model_name) 70 | labels = 
sorted(set(model.labels(with_hypernyms=True)), key=lambda l: l.lower()) 71 | if args['list']: 72 | pydoc.pager('\n'.join(labels)) 73 | elif args['search']: 74 | search_labels(args['<regexp>'], labels) 75 | return 0 76 | 77 | sample_rate = float(args['--sample-rate'] or 1) 78 | min_confidence = float(args['--min-confidence'] or 0.25) 79 | number_of_words = int(args['--number-of-words'] or 5) 80 | max_section_length = float(args['--max-section-length'] or 5) 81 | min_occurrences = int(args['--min-occurrences'] or 2) 82 | ignore_labels = args['--ignore-labels'] 83 | recreate_index = args['--recreate-index'] or False 84 | gpu_mode = args['--gpu-mode'] or False 85 | if ignore_labels: 86 | ignore_list = ignore_labels.split(',') 87 | else: 88 | ignore_list = [] 89 | 90 | download_model(model_name) 91 | model = read_model(model_name) 92 | classifier = ImageClassifier(model, gpu_mode=gpu_mode) 93 | 94 | if args['preview']: 95 | preview(args['<file>'], classifier) 96 | return 0 97 | 98 | filenames = args['<files>'] or [args['<file>']] 99 | 100 | labels_by_filename = label_videos( 101 | filenames, 102 | classifier, 103 | sample_rate=sample_rate, 104 | recreate_index=recreate_index 105 | ) 106 | 107 | if args['describe']: 108 | freq_dist = defaultdict(lambda: 0) 109 | for (t, labels) in threshold_labels(merge_values(labels_by_filename), min_confidence): 110 | for label in map(itemgetter(0), labels): 111 | freq_dist[label] += 1 112 | sorted_labels = sorted(freq_dist.iteritems(), key=itemgetter(1), reverse=True) 113 | print '\n'.join(map(lambda (k, v): "{0} {1}".format(k, v), sorted_labels)) 114 | return 0 115 | 116 | if args['search'] or args['filter']: 117 | query = args['<query>'] 118 | validate_query(query, model.labels(with_hypernyms=True)) 119 | 120 | matching_time_regions = search_videos( 121 | labels_by_filename, 122 | args['<query>'], 123 | classifier, 124 | sample_rate=sample_rate, 125 | recreate_index=recreate_index, 126 | min_confidence=min_confidence 127 | ) 128 | 129 | if not matching_time_regions: 130 | return 0 131 | 132 | if args['search']: 133 | for filename, region in matching_time_regions: 134 | start, end = region 135 | print "%s %f %f" % (filename, start, end) 136 | return 0 137 | 138 | if args['filter']: 139 | supercut = create_supercut(matching_time_regions) 140 | 141 | dst = args.get('<dst>') 142 | if not dst: 143 | base, ext = os.path.splitext(args['<files>'][0]) 144 | dst = "{0}_filtered_{1}.mp4".format(base, filename_for_query(query)) 145 | 146 | supercut.to_videofile( 147 | dst, 148 | codec="libx264", 149 | temp_audiofile="temp.m4a", 150 | remove_temp=True, 151 | audio_codec="aac", 152 | ) 153 | 154 | if args['--open']: 155 | subprocess.check_output(['open', dst]) 156 | 157 | if args['sort']: 158 | timed_labels = labels_by_filename[args['<file>']] 159 | timed_labels = threshold_labels(timed_labels, min_confidence) 160 | timed_labels = filter_out_labels(timed_labels, ignore_list) 161 | 162 | idx = reverse_index( 163 | timed_labels, 164 | min_occurrences=min_occurrences, 165 | max_length_per_label=max_section_length 166 | ) 167 | compilation = create_compilation(args['<file>'], idx) 168 | 169 | dst = args.get('<dst>') 170 | if not dst: 171 | base, ext = os.path.splitext(args['<file>']) 172 | dst = "{0}_sorted.mp4".format(base) 173 | 174 | compilation.to_videofile( 175 | dst, 176 | codec="libx264", 177 | temp_audiofile="temp.m4a", 178 | remove_temp=True, 179 | audio_codec="aac", 180 | ) 181 | 182 | if args['--open']: 183 | subprocess.check_output(['open', dst]) 184 | 185 | 
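The `main` function in `__init__.py` above is driven entirely by docopt arguments from `bin/thingscoop`, but the same pipeline can be exercised directly from Python. A minimal, illustrative sketch (not part of the package): it assumes Caffe's Python bindings are importable, that the named model can be fetched from the configured model repo, and `example.mp4` is just a placeholder path.

```
# Illustrative only: drive thingscoop's indexing/search pipeline without the CLI.
import sys, os
sys.path.append(os.path.join(os.environ['CAFFE_ROOT'], 'python'))  # make caffe importable

from thingscoop.models import download_model, read_model
from thingscoop.classifier import ImageClassifier
from thingscoop.search import label_video, search_labels

download_model("googlenet_places")          # no-op if the model is already cached
model = read_model("googlenet_places")
classifier = ImageClassifier(model, gpu_mode=False)

# Build (or load from disk) the per-second label index for a video...
timed_labels = label_video("example.mp4", classifier, sample_rate=1)

# ...then evaluate a query against it; each region is a (start, end) pair in seconds.
for start, end in search_labels(timed_labels, "sky && !ocean", classifier):
    print "%f %f" % (start, end)
```

The heavy lifting (frame extraction, classification, caching the JSON index next to the video) happens inside `label_video`, which is why repeated queries against the same file are fast.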
-------------------------------------------------------------------------------- /thingscoop/classifier.py: -------------------------------------------------------------------------------- 1 | import cPickle 2 | import caffe 3 | import cv2 4 | import glob 5 | import logging 6 | import numpy 7 | import os 8 | 9 | class ImageClassifier(object): 10 | def __init__(self, model, gpu_mode=False): 11 | self.model = model 12 | 13 | kwargs = {} 14 | 15 | if self.model.get("image_dims"): 16 | kwargs['image_dims'] = tuple(self.model.get("image_dims")) 17 | 18 | if self.model.get("channel_swap"): 19 | kwargs['channel_swap'] = tuple(self.model.get("channel_swap")) 20 | 21 | if self.model.get("raw_scale"): 22 | kwargs['raw_scale'] = float(self.model.get("raw_scale")) 23 | 24 | if self.model.get("mean"): 25 | kwargs['mean'] = numpy.array(self.model.get("mean")) 26 | 27 | self.net = caffe.Classifier( 28 | model.deploy_path(), 29 | model.model_path(), 30 | **kwargs 31 | ) 32 | 33 | self.confidence_threshold = 0.1 34 | 35 | if gpu_mode: 36 | caffe.set_mode_gpu() 37 | else: 38 | caffe.set_mode_cpu() 39 | 40 | self.labels = numpy.array(model.labels()) 41 | 42 | if self.model.bet_path(): 43 | self.bet = cPickle.load(open(self.model.bet_path())) 44 | self.bet['words'] = map(lambda w: w.replace(' ', '_'), self.bet['words']) 45 | else: 46 | self.bet = None 47 | 48 | self.net.forward() 49 | 50 | def classify_image(self, filename): 51 | image = caffe.io.load_image(open(filename)) 52 | scores = self.net.predict([image], oversample=True).flatten() 53 | 54 | if self.bet: 55 | expected_infogain = numpy.dot(self.bet['probmat'], scores[self.bet['idmapping']]) 56 | expected_infogain *= self.bet['infogain'] 57 | infogain_sort = expected_infogain.argsort()[::-1] 58 | results = [ 59 | (self.bet['words'][v], float(expected_infogain[v])) 60 | for v in infogain_sort 61 | if expected_infogain[v] > self.confidence_threshold 62 | ] 63 | 64 | else: 65 | indices = (-scores).argsort() 66 | predictions = self.labels[indices] 67 | results = [ 68 | (p, float(scores[i])) 69 | for i, p in zip(indices, predictions) 70 | if scores[i] > self.confidence_threshold 71 | ] 72 | 73 | return results 74 | 75 | -------------------------------------------------------------------------------- /thingscoop/models.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | import requests 4 | import shutil 5 | import tempfile 6 | import urlparse 7 | import yaml 8 | import zipfile 9 | import urllib 10 | from pattern.en import wordnet 11 | from progressbar import ProgressBar, Percentage, Bar, ETA, FileTransferSpeed 12 | 13 | from .utils import get_hypernyms 14 | 15 | THINGSCOOP_DIR = os.path.join(os.path.expanduser("~"), ".thingscoop") 16 | CONFIG_PATH = os.path.join(THINGSCOOP_DIR, "config.yml") 17 | 18 | DEFAULT_CONFIG = { 19 | 'repo_url': 'https://s3.amazonaws.com/haystack-models/', 20 | 'active_model': 'googlenet_imagenet' 21 | } 22 | 23 | class CouldNotFindModel(Exception): 24 | pass 25 | 26 | class Model(object): 27 | def __init__(self, name, model_dir): 28 | self.name = name 29 | self.model_dir = model_dir 30 | self.info = yaml.load(open(os.path.join(model_dir, "info.yml"))) 31 | 32 | def get(self, k): 33 | return self.info.get(k) 34 | 35 | def model_path(self): 36 | return os.path.join(self.model_dir, self.info['pretrained_model_file']) 37 | 38 | def deploy_path(self): 39 | return os.path.join(self.model_dir, self.info['deploy_file']) 40 | 41 | def label_path(self): 42 | return 
os.path.join(self.model_dir, self.info['labels_file']) 43 | 44 | def bet_path(self): 45 | if 'bet_file' not in self.info: return None 46 | return os.path.join(self.model_dir, self.info['bet_file']) 47 | 48 | def labels(self, with_hypernyms=False): 49 | ret = map(str.strip, open(self.label_path()).readlines()) 50 | if with_hypernyms: 51 | for label in list(ret): 52 | ret.extend(get_hypernyms(label)) 53 | return ret 54 | 55 | def read_config(): 56 | if not os.path.exists(CONFIG_PATH): 57 | write_config(DEFAULT_CONFIG) 58 | return yaml.load(open(CONFIG_PATH)) 59 | 60 | def write_config(config): 61 | if not os.path.exists(THINGSCOOP_DIR): 62 | os.makedirs(THINGSCOOP_DIR) 63 | yaml.dump(config, open(CONFIG_PATH, 'wb')) 64 | 65 | def get_repo_url(): 66 | return read_config()['repo_url'] 67 | 68 | def get_active_model(): 69 | return read_config()['active_model'] 70 | 71 | def get_model_url(model): 72 | return get_repo_url() + model + ".zip" 73 | 74 | def get_model_local_path(model): 75 | return os.path.join(THINGSCOOP_DIR, "models", model) 76 | 77 | def set_config(k, v): 78 | config = read_config() 79 | config[k] = v 80 | write_config(config) 81 | 82 | def model_in_cache(model): 83 | return os.path.exists(get_model_local_path(model)) 84 | 85 | def get_models_path(): 86 | return os.path.join(THINGSCOOP_DIR, "models") 87 | 88 | def get_all_models(): 89 | return yaml.load(requests.get(urlparse.urljoin(get_repo_url(), "info.yml")).text) 90 | 91 | def info_model(model): 92 | models = get_all_models() 93 | for model_info in models: 94 | if model_info['name'] == model: 95 | return model_info 96 | 97 | def use_model(model): 98 | download_model(model) 99 | set_config("active_model", model) 100 | 101 | def remove_model(model): 102 | path = get_model_local_path(model) 103 | shutil.rmtree(path) 104 | 105 | def clear_models(): 106 | for model in get_downloaded_models(): 107 | remove_model(model) 108 | 109 | def read_model(model_name): 110 | if not model_in_cache(model_name): 111 | raise CouldNotFindModel, "Could not find model {}".format(model_name) 112 | return Model(model_name, get_model_local_path(model_name)) 113 | 114 | def get_downloaded_models(): 115 | return map(os.path.basename, glob.glob(os.path.join(get_models_path(), "*"))) 116 | 117 | progress_bar = None 118 | def download_model(model): 119 | if model_in_cache(model): return 120 | model_url = get_model_url(model) 121 | tmp_zip = tempfile.NamedTemporaryFile(suffix=".zip") 122 | prompt = "Downloading model {}".format(model) 123 | def cb(count, block_size, total_size): 124 | global progress_bar 125 | if not progress_bar: 126 | widgets = [prompt, Percentage(), ' ', Bar(), ' ', FileTransferSpeed(), ' ', ETA()] 127 | progress_bar = ProgressBar(widgets=widgets, maxval=int(total_size)).start() 128 | progress_bar.update(min(total_size, count * block_size)) 129 | urllib.urlretrieve(model_url, tmp_zip.name, cb) 130 | z = zipfile.ZipFile(tmp_zip) 131 | out_path = get_model_local_path(model) 132 | try: 133 | os.mkdir(out_path) 134 | except: 135 | pass 136 | for name in z.namelist(): 137 | if name.startswith("_"): continue 138 | z.extract(name, out_path) 139 | 140 | -------------------------------------------------------------------------------- /thingscoop/preview.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import datetime 3 | import math 4 | import numpy 5 | import random 6 | import re 7 | import subprocess 8 | import sys 9 | import caffe 10 | import tempfile 11 | 12 | def 
duration_string_to_timedelta(s): 13 | [hours, minutes, seconds] = map(int, s.split(':')) 14 | seconds = seconds + minutes * 60 + hours * 3600 15 | return datetime.timedelta(seconds=seconds) 16 | 17 | def get_video_duration(path): 18 | result = subprocess.Popen(["ffprobe", path], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) 19 | matches = [x for x in result.stdout.readlines() if "Duration" in x] 20 | duration_string = re.findall(r'Duration: ([0-9:]*)', matches[0])[0] 21 | return math.ceil(duration_string_to_timedelta(duration_string).seconds) 22 | 23 | def get_current_position(c): 24 | return int(c.get(cv2.cv.CV_CAP_PROP_POS_MSEC)/1000) 25 | 26 | def add_text_to_frame(frame, text): 27 | ret, _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_PLAIN, 1, 1) 28 | ret = (ret[0] + 20, ret[1] + 20) 29 | cv2.rectangle(frame, (0,0), ret, (0, 0, 0), cv2.cv.CV_FILLED) 30 | cv2.putText(frame, text, (5, 20), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255)) 31 | 32 | def format_classification(result): 33 | str_list = [] 34 | for label, confidence in result: 35 | str_list.append("{0} ({1})".format(label, confidence)) 36 | return ', '.join(str_list) 37 | 38 | def preview(filename, classifier): 39 | cv2.namedWindow('video') 40 | 41 | duration = int(get_video_duration(filename)) 42 | 43 | def trackbar_change(t): 44 | cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, t*1000) 45 | 46 | trackbar_prompt = 'Current position:' 47 | cv2.createTrackbar(trackbar_prompt, 'video', 0, duration, trackbar_change) 48 | 49 | cap = cv2.VideoCapture(filename) 50 | 51 | classification_result = None 52 | previous_time_in_seconds = None 53 | current_pos = 0 54 | 55 | tmp = tempfile.NamedTemporaryFile(suffix=".png") 56 | 57 | while cap.isOpened(): 58 | ret, frame = cap.read() 59 | 60 | cv2.imwrite(tmp.name, frame) 61 | 62 | if ret: 63 | current_pos = get_current_position(cap) 64 | 65 | if current_pos != previous_time_in_seconds: 66 | previous_time_in_seconds = current_pos 67 | classification_result = classifier.classify_image(tmp.name) 68 | 69 | if classification_result: 70 | add_text_to_frame(frame, format_classification(classification_result)) 71 | 72 | cv2.imshow('video', frame) 73 | 74 | cv2.setTrackbarPos(trackbar_prompt, 'video', current_pos) 75 | 76 | k = cv2.waitKey(1) & 0xFF 77 | if k == 27: 78 | break 79 | 80 | cap.release() 81 | cv2.destroyAllWindows() 82 | 83 | -------------------------------------------------------------------------------- /thingscoop/query.py: -------------------------------------------------------------------------------- 1 | import re 2 | from pypeg2 import * 3 | from pattern.en import wordnet 4 | 5 | from .utils import get_hypernyms 6 | 7 | class NoSuchLabelError(Exception): 8 | pass 9 | 10 | class Q(object): 11 | def evaluate(self, labels): 12 | return self.content.evaluate(labels) 13 | 14 | class Label(object): 15 | grammar = attr("name", re.compile(r"[A-z][A-z\- ]+[A-z]")) 16 | 17 | def evaluate(self, labels): 18 | for label in labels: 19 | if label == self.name or self.name in get_hypernyms(label): 20 | return True 21 | return False 22 | 23 | class UnaryOp(object): 24 | grammar = (attr("operator", re.compile("!")), 25 | attr("content", Label)) 26 | 27 | def evaluate(self, labels): 28 | return not self.content.evaluate(labels) 29 | 30 | class ParentheticalQ(List): 31 | grammar = (re.compile("\("), 32 | attr("content", Q), 33 | re.compile("\)")) 34 | 35 | def evaluate(self, labels): 36 | return self.content.evaluate(labels) 37 | 38 | class BinaryOp(object): 39 | grammar = (attr("lh", [ParentheticalQ, UnaryOp, 
Label]), 40 | attr("op", re.compile("(&&|\|\|)")), 41 | attr("rh", Q)) 42 | 43 | def evaluate(self, labels): 44 | lv = self.lh.evaluate(labels) 45 | rv = self.rh.evaluate(labels) 46 | 47 | if self.op == '||': return lv or rv 48 | else: return lv and rv 49 | 50 | Q.grammar = ( 51 | maybe_some(whitespace), 52 | attr("content", [BinaryOp, ParentheticalQ, UnaryOp, Label]), 53 | maybe_some(whitespace) 54 | ) 55 | 56 | def get_labels(parsed_q): 57 | if type(parsed_q) == Label: 58 | return [parsed_q.name] 59 | if type(parsed_q) == BinaryOp: 60 | return get_labels(parsed_q.lh) + get_labels(parsed_q.rh) 61 | return get_labels(parsed_q.content) 62 | 63 | def validate_query(q, all_labels): 64 | parsed_q = parse(q, Q) 65 | for label in get_labels(parsed_q): 66 | if label not in all_labels: 67 | raise NoSuchLabelError, "Label \"{}\" does not exist.".format(label) 68 | 69 | def eval_query_with_labels(q, labels): 70 | parsed_q = parse(q, Q) 71 | return parsed_q.evaluate(labels) 72 | 73 | def filename_for_query(q): 74 | parsed_q = parse(q, Q) 75 | labels = get_labels(parsed_q) 76 | return '_'.join(labels).replace(' ', '_') 77 | 78 | -------------------------------------------------------------------------------- /thingscoop/search.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | import shutil 4 | import subprocess 5 | import sys 6 | import tempfile 7 | from operator import itemgetter 8 | 9 | from progressbar import ProgressBar, Percentage, Bar, ETA 10 | 11 | from .query import eval_query_with_labels 12 | from .utils import extract_frames 13 | from .utils import generate_index_path 14 | from .utils import read_index_from_path 15 | from .utils import save_index_to_path 16 | 17 | def times_to_regions(times, max_total_length=None): 18 | if not times: 19 | return [] 20 | regions = [] 21 | current_start = times[0] 22 | for t1, t2 in zip(times, times[1:]): 23 | if t2 - t1 > 1: 24 | regions.append((current_start, t1 + 1)) 25 | current_start = t2 26 | regions.append((current_start, times[-1] + 1)) 27 | if not max_total_length: 28 | return regions 29 | accum = 0 30 | ret = [] 31 | for index, (t1, t2) in enumerate(regions): 32 | if accum + (t2-t1) > max_total_length: 33 | ret.append((t1, t1 + max_total_length-accum)) 34 | return ret 35 | else: 36 | ret.append((t1, t2)) 37 | accum += t2 - t1 38 | return ret 39 | 40 | def unique_labels(timed_labels): 41 | ret = set() 42 | for t, labels_list in timed_labels: 43 | ret.update(map(itemgetter(0), labels_list)) 44 | return ret 45 | 46 | def reverse_index(timed_labels, min_occurrences=2, max_length_per_label=8): 47 | ret = {} 48 | for label in unique_labels(timed_labels): 49 | times = [] 50 | for t, labels_list in timed_labels: 51 | raw_labels = map(lambda (l, c): l, labels_list) 52 | if eval_query_with_labels(label, raw_labels): 53 | times.append(t) 54 | if len(times) >= min_occurrences: 55 | ret[label] = times_to_regions(times, max_total_length=max_length_per_label) 56 | return ret 57 | 58 | def threshold_labels(timed_labels, min_confidence): 59 | ret = [] 60 | for t, label_list in timed_labels: 61 | filtered = filter(lambda (l, c): c > min_confidence, label_list) 62 | if filtered: 63 | ret.append((t, filtered)) 64 | return ret 65 | 66 | def filter_out_labels(timed_labels, ignore_list): 67 | ret = [] 68 | for t, label_list in timed_labels: 69 | filtered = filter(lambda (l, c): l not in ignore_list, label_list) 70 | if filtered: 71 | ret.append((t, filtered)) 72 | return ret 73 | 74 | def 
label_video(filename, classifier, sample_rate=1, recreate_index=False): 75 | index_filename = generate_index_path(filename, classifier.model) 76 | 77 | if os.path.exists(index_filename) and not recreate_index: 78 | return read_index_from_path(index_filename) 79 | 80 | temp_frame_dir, frames = extract_frames(filename, sample_rate=sample_rate) 81 | 82 | timed_labels = [] 83 | 84 | widgets=["Labeling {}: ".format(filename), Percentage(), ' ', Bar(), ' ', ETA()] 85 | pbar = ProgressBar(widgets=widgets, maxval=len(frames)).start() 86 | 87 | for index, frame in enumerate(frames): 88 | pbar.update(index) 89 | labels = classifier.classify_image(frame) 90 | if not len(labels): 91 | continue 92 | t = (1./sample_rate) * index 93 | timed_labels.append((t, labels)) 94 | 95 | shutil.rmtree(temp_frame_dir) 96 | save_index_to_path(index_filename, timed_labels) 97 | 98 | return timed_labels 99 | 100 | def label_videos(filenames, classifier, sample_rate=1, recreate_index=False): 101 | ret = {} 102 | for filename in filenames: 103 | ret[filename] = label_video( 104 | filename, 105 | classifier, 106 | sample_rate=sample_rate, 107 | recreate_index=recreate_index 108 | ) 109 | return ret 110 | 111 | def search_labels(timed_labels, query, classifier, sample_rate=1, recreate_index=False, min_confidence=0.3): 112 | timed_labels = threshold_labels(timed_labels, min_confidence) 113 | 114 | times = [] 115 | for t, labels_list in timed_labels: 116 | raw_labels = map(lambda (l, c): l, labels_list) 117 | if eval_query_with_labels(query, raw_labels): 118 | times.append(t) 119 | 120 | return times_to_regions(times) 121 | 122 | def search_videos(labels_by_filename, query, classifier, sample_rate=1, recreate_index=False, min_confidence=0.3): 123 | ret = [] 124 | for filename, timed_labels in labels_by_filename.items(): 125 | ret += map(lambda r: (filename, r), search_labels( 126 | timed_labels, 127 | query, 128 | classifier, 129 | sample_rate=sample_rate, 130 | recreate_index=recreate_index, 131 | min_confidence=min_confidence 132 | )) 133 | return ret 134 | 135 | -------------------------------------------------------------------------------- /thingscoop/utils.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import json 3 | import os 4 | import re 5 | import subprocess 6 | import tempfile 7 | import textwrap 8 | from moviepy.editor import VideoFileClip, TextClip, ImageClip, concatenate_videoclips 9 | from pattern.en import wordnet 10 | from termcolor import colored 11 | from PIL import Image, ImageDraw, ImageFont 12 | 13 | def create_title_frame(title, dimensions, fontsize=60): 14 | para = textwrap.wrap(title, width=30) 15 | im = Image.new('RGB', dimensions, (0, 0, 0, 0)) 16 | draw = ImageDraw.Draw(im) 17 | font = ImageFont.truetype('resources/Helvetica.ttc', fontsize) 18 | total_height = sum(map(lambda l: draw.textsize(l, font=font)[1], para)) 19 | current_h, pad = (dimensions[1]/2-total_height/2), 10 20 | for line in para: 21 | w, h = draw.textsize(line, font=font) 22 | draw.text(((dimensions[0] - w) / 2, current_h), line, font=font) 23 | current_h += h + pad 24 | f = tempfile.NamedTemporaryFile(suffix=".png", delete=False) 25 | im.save(f.name) 26 | return f.name 27 | 28 | def get_video_dimensions(filename): 29 | p = subprocess.Popen(['ffprobe', filename], stdout=subprocess.PIPE, stderr=subprocess.PIPE) 30 | _, out = p.communicate() 31 | for line in out.split('\n'): 32 | if re.search('Video: ', line): 33 | match = re.findall('[1-9][0-9]*x[1-9][0-9]*', line)[0] 34 | 
return tuple(map(int, match.split('x'))) 35 | 36 | def extract_frames(filename, sample_rate=1): 37 | dest_dir = tempfile.mkdtemp() 38 | dest = os.path.join(dest_dir, "%10d.png") 39 | subprocess.check_output(["ffmpeg", "-i", filename, "-vf", "fps="+str(sample_rate), dest]) 40 | glob_pattern = os.path.join(dest_dir, "*.png") 41 | return dest_dir, glob.glob(glob_pattern) 42 | 43 | def generate_index_path(filename, model): 44 | name, ext = os.path.splitext(filename) 45 | return "{name}_{model_name}.json".format(name=name, model_name=model.name) 46 | 47 | def read_index_from_path(filename): 48 | return json.load(open(filename)) 49 | 50 | def save_index_to_path(filename, timed_labels): 51 | json.dump(timed_labels, open(filename, 'w'), indent=4) 52 | 53 | def create_supercut(regions): 54 | subclips = [] 55 | filenames = set(map(lambda (filename, _): filename, regions)) 56 | video_files = {filename: VideoFileClip(filename) for filename in filenames} 57 | for filename, region in regions: 58 | subclip = video_files[filename].subclip(*region) 59 | subclips.append(subclip) 60 | if not subclips: return None 61 | return concatenate_videoclips(subclips) 62 | 63 | def label_as_title(label): 64 | return label.replace('_', ' ').upper() 65 | 66 | def create_compilation(filename, index): 67 | dims = get_video_dimensions(filename) 68 | subclips = [] 69 | video_file = VideoFileClip(filename) 70 | for label in sorted(index.keys()): 71 | label_img_filename = create_title_frame(label_as_title(label), dims) 72 | label_clip = ImageClip(label_img_filename, duration=2) 73 | os.remove(label_img_filename) 74 | subclips.append(label_clip) 75 | for region in index[label]: 76 | subclip = video_file.subclip(*region) 77 | subclips.append(subclip) 78 | if not subclips: return None 79 | return concatenate_videoclips(subclips) 80 | 81 | def search_labels(r, labels): 82 | r = re.compile(r) 83 | for label in labels: 84 | if not r.search(label): 85 | continue 86 | current_i = 0 87 | ret = '' 88 | for m in r.finditer(label): 89 | ret += label[current_i:m.start()] 90 | ret += colored(label[m.start():m.end()], 'red', attrs=['bold']) 91 | current_i = m.end() 92 | ret += label[m.end():] 93 | print ret 94 | 95 | def get_hypernyms(label): 96 | synsets = wordnet.synsets(label) 97 | if not synsets: return [] 98 | return map(lambda s: s.synonyms[0], synsets[0].hypernyms(True)) 99 | 100 | def merge_values(d): 101 | ret = [] 102 | for lst in d.values(): 103 | ret += lst 104 | return ret 105 | 106 | --------------------------------------------------------------------------------
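One detail worth calling out in `utils.py` and `query.py`: `get_hypernyms` is what makes queries hierarchical, because `Label.evaluate` also accepts a frame label whose WordNet hypernyms include the query term. A small illustrative sketch; it assumes `pattern`'s WordNet data is available, that Caffe's Python bindings are importable (importing the `thingscoop` package pulls in the Caffe-backed classifier), and the exact hypernym strings depend on WordNet, so treat the first result as indicative.

```
# Illustrative sketch: hypernym expansion lets broad query terms match specific labels.
import sys, os
sys.path.append(os.path.join(os.environ['CAFFE_ROOT'], 'python'))  # thingscoop imports caffe at package import time

from thingscoop.utils import get_hypernyms
from thingscoop.query import eval_query_with_labels

# Walks up WordNet from "violin"; the list typically includes terms such as
# "stringed instrument" and "musical instrument" (exact strings depend on WordNet data).
print get_hypernyms("violin")

# A frame labeled "violin" can therefore satisfy a query for the broader term...
print eval_query_with_labels("musical instrument", ["violin"])
# ...and the boolean operators combine such per-label tests for each frame.
print eval_query_with_labels("violin && !piano", ["violin"])   # True
print eval_query_with_labels("piano || cello", ["violin"])     # False
```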