├── .gitignore ├── LICENSE ├── README.md ├── app.py ├── config.py ├── iffse ├── __init__.py ├── database.py ├── trees.py └── utils │ ├── cv │ ├── __init__.py │ └── faces.py │ ├── helpers.py │ └── ml │ ├── __init__.py │ ├── example.py │ ├── open_face.py │ └── spatial_lrn.py ├── requirements.txt ├── scrapper.py ├── static ├── favicon.ico ├── man.png └── sadbaby.jpg └── templates └── index.html /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | *.db 6 | *.log 7 | *.hdf5 8 | .vscode/ 9 | data/ 10 | 11 | # C extensions 12 | *.so 13 | 14 | # Distribution / packaging 15 | .Python 16 | env/ 17 | build/ 18 | develop-eggs/ 19 | dist/ 20 | downloads/ 21 | eggs/ 22 | .eggs/ 23 | lib/ 24 | lib64/ 25 | parts/ 26 | sdist/ 27 | var/ 28 | wheels/ 29 | *.egg-info/ 30 | .installed.cfg 31 | *.egg 32 | 33 | # PyInstaller 34 | # Usually these files are written by a python script from a template 35 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
36 | *.manifest 37 | *.spec 38 | 39 | # Installer logs 40 | pip-log.txt 41 | pip-delete-this-directory.txt 42 | 43 | # Unit test / coverage reports 44 | htmlcov/ 45 | .tox/ 46 | .coverage 47 | .coverage.* 48 | .cache 49 | nosetests.xml 50 | coverage.xml 51 | *.cover 52 | .hypothesis/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | 62 | # Flask stuff: 63 | instance/ 64 | .webassets-cache 65 | 66 | # Scrapy stuff: 67 | .scrapy 68 | 69 | # Sphinx documentation 70 | docs/_build/ 71 | 72 | # PyBuilder 73 | target/ 74 | 75 | # Jupyter Notebook 76 | .ipynb_checkpoints 77 | 78 | # pyenv 79 | .python-version 80 | 81 | # celery beat schedule file 82 | celerybeat-schedule 83 | 84 | # SageMath parsed files 85 | *.sage.py 86 | 87 | # dotenv 88 | .env 89 | 90 | # virtualenv 91 | .venv 92 | venv/ 93 | ENV/ 94 | 95 | # Spyder project settings 96 | .spyderproject 97 | .spyproject 98 | 99 | # Rope project settings 100 | .ropeproject 101 | 102 | # mkdocs documentation 103 | /site 104 | 105 | # mypy 106 | .mypy_cache/ 107 | 108 | # Database 109 | faces_of_instagram/ 110 | pretrained_weights/ 111 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Kendrick Tan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # IFFSE 2 | ## Facial Feature Search Engine for some Photo Sharing Website 3 | 4 | # Setup 5 | The recommended way of installing a local copy of facemaps is to use a python 3.6 conda environment: 6 | 7 | ```bash 8 | wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O conda.sh 9 | chmod +x conda.sh 10 | bash conda.sh 11 | source ~/.bashrc # or `source ~/.zshrc` if you're using zsh 12 | 13 | # Create new conda env and use it 14 | conda create -n facemaps python=3.6 anaconda 15 | source activate facemaps 16 | 17 | # Annoy Issue: 18 | # Annoy uses libstdc++, Anaconda provides its own libstdc++, 19 | # to use annoy in Anaconda, run: 20 | cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 $CONDA_PATH/envs/facemaps/lib 21 | ``` 22 | 23 | ### Dependencies: 24 | ```bash 25 | conda install -c conda-forge dlib=19.4 26 | conda install pytorch torchvision -c soumith 27 | pip install -r requirements.txt 28 | ``` 29 | 30 | # Deploying a local copy 31 | 1. 
Before anything, you'll need to download a copy of the pretrained weights:
```bash
git clone https://github.com/kendricktan/iffse.git
cd iffse
mkdir pretrained_weights
wget https://github.com/ageitgey/face_recognition_models/raw/master/face_recognition_models/models/shape_predictor_68_face_landmarks.dat -O ./pretrained_weights/shape_predictor_68_face_landmarks.dat
# URL is quoted so the shell does not treat `?` as a glob character
wget "https://www.dropbox.com/s/lhus56cn1xikzeb/openface_cpu.pth?dl=1" -O ./pretrained_weights/openface_cpu.pth
```

2. Time to scrape some data! I've written an async, multi-threaded scraper that __doesn't__ require any Instagram credentials :-). The following tags are used in `scrapper.py`; change them to whatever tags you want to scrape (e.g. `#NYC`, `#gymlife`):
```python
# What kind of tags do we want to scrape
tags_to_be_scraped = [
    'selfie', 'selfportait', 'dailylook', 'selfiesunday',
    'selfietime', 'instaselfie', 'shamelessselefie',
    'faceoftheday', 'me', 'selfieoftheday', 'instame',
    'selfiestick', 'selfies'
]
```

3. Run `python scrapper.py` and wait for a few hours / days. Ideally you do this step on a `C4 instance` on AWS as it'll max out your cores.

4. Run the server

```bash
# If you're running it for the first time
# or want to update search indexes
python app.py --rebuild-tree

# Otherwise
python app.py
```

# Special thanks:
[OpenFacePytorch](https://github.com/thnkim/OpenFacePytorch) - OpenFace's `nn4.small2.v1.t7` model in PyTorch

--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import base64
import os
import jinja2
import requests
import argparse

from io import BytesIO
from PIL import ImageDraw

from tqdm import tqdm
from sanic import Sanic, response

from annoy import AnnoyIndex
from config import CONFIG

from iffse.database import FacialEmbeddings, SelfiePost
from iffse.utils.helpers import string_to_np, np_to_string
from scrapper import (
    get_instagram_shared_data,
    img_url_to_latent_space,
    img_url_to_pillow,
)

# Global vars
app = Sanic(__name__)
app.static('/favicon.ico', './static/favicon.ico')
app.static('/sadbaby', './static/sadbaby.jpg')
app.static('/man', './static/man.png')

annoy_settings = CONFIG['annoy_tree_settings']
annoy_tree = AnnoyIndex(128, metric=annoy_settings['metric'])

# Helper functions


def get_shortcode_from_facialembeddings_id(fe_id):
    """
    Returns the shortcode of the post that a facial
    embedding belongs to, given the embedding's index
    in the annoy tree.
    Indexes start from 0, ids from db start from 1,
    hence + 1
    """
    try:
        return FacialEmbeddings.get(id=(fe_id + 1)).op.shortcode
    except Exception:
        # No matching row (or a db error): treat as "not found"
        return None


def get_unique_shortcodes_from_fe_ids(fe):
    """
    Args:
        fe: facial embedding vector

    Returns:
        Shortcodes for the corresponding indexes
    """
    global annoy_tree

    # 20 nearest neighbours of the query embedding
    idxs = annoy_tree.get_nns_by_vector(fe, 20)

    shortcodes_unique = []
    for i in idxs:
        s_ = get_shortcode_from_facialembeddings_id(i)
        if s_ not in shortcodes_unique and s_ is not None:
            shortcodes_unique.append(s_)

    return shortcodes_unique


def pillow_to_base64(pil_img):
    """
    Converts pillow image to base64
    so it can be sent back without refreshing
    """
    img_io = BytesIO()
    pil_img.save(img_io, 'PNG')
    return base64.b64encode(img_io.getvalue())


def render_jinja2(tpl_path, context):
    """
    Render jinja2 html template
    """
    path, filename = os.path.split(tpl_path)
    return jinja2.Environment(
        loader=jinja2.FileSystemLoader(path or './')
    ).get_template(filename).render(context)


# Application logic
@app.route('/')
async def iffse_index(request):
    insta_post = request.args.get('p', '')
    html_ = render_jinja2('./templates/index.html', {'INSTA_POS_ID': insta_post})
    return response.html(html_)


@app.route('/search', methods=["POST", ])
async def iffse_search(request):
    try:
        url = request.json.get('url', None)

        if url is None:
            return response.json({'url': 'need to specify instagram url'}, status=400)

        # Get instagram json data
        # (in order to get img url and stuff)
        url = 'https://www.instagram.com/p/{}/'.format(url)

        r = requests.get(url)
        r_js = get_instagram_shared_data(r.text)

        # Get display url and shortcode
        media_json = r_js['entry_data']['PostPage'][0]['graphql']['shortcode_media']
        display_src = media_json['display_url']
        shortcode = media_json['shortcode']

        # Pass image into the NN and
        # get the 128 dim embeddings
        # np_features: 128 dim embedding
        # img: original image
        # bb: N x 4, bounding box for each face
        # fls: N x 68, facial landmarks for each face
124 | np_features, img, bb, fls = img_url_to_latent_space(display_src) 125 | 126 | # See if post has been indexed before 127 | # Fails on post that has multiple images 128 | # hacky fix 129 | try: 130 | s, created = SelfiePost.get_or_create( 131 | shortcode=shortcode, img_url=display_src) 132 | except: 133 | created = False 134 | 135 | # If it hasn't been indexed before, then 136 | # add the latent embeddings into it 137 | if created: 138 | for np_feature in np_features: 139 | # Convert to string and store in db 140 | np_str = np_to_string(np_feature) 141 | 142 | fe = FacialEmbeddings(op=s, latent_space=np_str) 143 | fe.save() 144 | 145 | # This copy of the image is to 146 | # display the facial landmarks 147 | img_landmarks = img.copy() 148 | draw_img = ImageDraw.Draw(img_landmarks) 149 | for fl in fls: 150 | for (x, y) in fl: 151 | draw_img.ellipse((x - 3, y - 3, x + 3, y + 3), fill='red', outline='red') 152 | 153 | # Now we can query it 154 | # For each face too 155 | shortcodes = {} 156 | for idx, feature in enumerate(np_features): 157 | b = bb[idx] 158 | fl = fls[idx] 159 | 160 | # Crop to specific face 161 | lrtp = (b.left(), b.top(), b.right(), b.bottom()) 162 | img_cropped = img.crop(lrtp) 163 | img_cropped_base64 = pillow_to_base64(img_cropped) 164 | 165 | # Draw facial landmarks 166 | img_landmarks_cropped = img_landmarks.crop(lrtp) 167 | img_landmarks_base64 = pillow_to_base64(img_landmarks_cropped) 168 | 169 | # Get unique shortcode based on the features provided 170 | shortcodes_unique = get_unique_shortcodes_from_fe_ids(feature) 171 | shortcodes[idx] = { 172 | 'face': img_cropped_base64, 173 | 'face_landmarks': img_landmarks_base64, 174 | 'shortcodes': shortcodes_unique 175 | } 176 | 177 | return response.json( 178 | {'data': shortcodes} 179 | ) 180 | 181 | except Exception as e: 182 | print(e) 183 | return response.json( 184 | {'error': 'something fucked up lol'}, 185 | status=400 186 | ) 187 | 188 | 189 | if __name__ == '__main__': 190 | parser = 
argparse.ArgumentParser(description='IFFSE')
    parser.add_argument('--rebuild-tree', action='store_true')
    args, unknown = parser.parse_known_args()

    # If specified to rebuild tree then rebuild it
    if args.rebuild_tree:
        print('Rebuilding tree...')
        for idx, f in enumerate(tqdm(FacialEmbeddings.select())):
            try:
                cur_np = string_to_np(f.latent_space)
                annoy_tree.add_item(idx, cur_np)

            except Exception as e:
                tqdm.write(str(e))

        annoy_tree.build(annoy_settings['forest_trees_no'])
        annoy_tree.save(CONFIG['annoy_tree'])

    # Otherwise just load the previously saved tree
    else:
        annoy_tree.load(CONFIG['annoy_tree'])

    app.run(host='127.0.0.1', port=8000, debug=False)

--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
import os

origin = os.path.abspath(os.path.dirname(__file__))
data_folder = os.path.join(origin, 'data')


CONFIG = {
    'annoy_tree': os.path.join(data_folder, 'annoy_selfiers.ann'),
    'annoy_tree_settings': {
        'metric': 'euclidean',  # euclidean / angular
        'forest_trees_no': 512,  # Number of trees in the annoy forest
    },
    # If you change sqlite_db, please reflect it in
    # ./iffse/database.py
    'sqlite_db': os.path.join(data_folder, 'selfiers.db'),
}

--------------------------------------------------------------------------------
/iffse/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kendricktan/iffse/c15d1413eb633d5c75c336d20b0c459903d89506/iffse/__init__.py

--------------------------------------------------------------------------------
/iffse/database.py:
--------------------------------------------------------------------------------
import os

from peewee import (
    CharField,
    TextField,
    SqliteDatabase,
    OperationalError,
    ForeignKeyField,
    Model
)

if not os.path.exists('./data'):
    os.makedirs('./data')

db = SqliteDatabase('./data/selfiers.db')


class SelfiePost(Model):
    # Instagram shortcode
    # e.g. instagram.com/p/SHORTCODE
    shortcode = CharField(unique=True)
    img_url = TextField()

    class Meta:
        database = db


class FacialEmbeddings(Model):
    # Original Post
    op = ForeignKeyField(SelfiePost, related_name='op')
    latent_space = CharField()

    class Meta:
        database = db


if __name__ == '__main__':
    db.connect()

    try:
        # Create both tables; the original only created SelfiePost,
        # leaving FacialEmbeddings to be created elsewhere
        db.create_tables([SelfiePost, FacialEmbeddings])
    except OperationalError:
        # Tables already exist
        pass

--------------------------------------------------------------------------------
/iffse/trees.py:
--------------------------------------------------------------------------------
import numpy as np

# helpers lives in iffse/utils, same import path as used in app.py
from iffse.utils.helpers import string_to_np

from annoy import AnnoyIndex
from tqdm import tqdm


def build_annoy_tree(facial_embeddings, tree_path,
                     annoy_metric='euclidean', annoy_trees_no=256):
    """
    Builds an annoy tree

    Args:
        facial_embeddings: List of facial embeddings to be indexed in tree
        tree_path: where the annoy tree will be saved
        annoy_metric: euclidean / angular
        annoy_trees_no: how many trees in the annoy forest? Larger = more accurate
    """

    # Annoy tree
    tree = AnnoyIndex(128, metric=annoy_metric)

    # Don't wanna store entire db into memory
    for idx, f in enumerate(tqdm(facial_embeddings)):
        # SQLite errors sometimes?
27 | try: 28 | cur_np = string_to_np(f.latent_space) 29 | 30 | tree.add_item(idx, cur_np) 31 | 32 | except Exception as e: 33 | tqdm.write(str(e)) 34 | 35 | tree.build(annoy_trees_no) 36 | tree.save(tree_path) 37 | -------------------------------------------------------------------------------- /iffse/utils/cv/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kendricktan/iffse/c15d1413eb633d5c75c336d20b0c459903d89506/iffse/utils/cv/__init__.py -------------------------------------------------------------------------------- /iffse/utils/cv/faces.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import dlib 3 | import numpy as np 4 | 5 | 6 | FACIAL_LANDMARKS_TEMPLATE = np.float32([ 7 | (0.0792396913815, 0.339223741112), (0.0829219487236, 0.456955367943), 8 | (0.0967927109165, 0.575648016728), (0.122141515615, 0.691921601066), 9 | (0.168687863544, 0.800341263616), (0.239789390707, 0.895732504778), 10 | (0.325662452515, 0.977068762493), (0.422318282013, 1.04329000149), 11 | (0.531777802068, 1.06080371126), (0.641296298053, 1.03981924107), 12 | (0.738105872266, 0.972268833998), (0.824444363295, 0.889624082279), 13 | (0.894792677532, 0.792494155836), (0.939395486253, 0.681546643421), 14 | (0.96111933829, 0.562238253072), (0.970579841181, 0.441758925744), 15 | (0.971193274221, 0.322118743967), (0.163846223133, 0.249151738053), 16 | (0.21780354657, 0.204255863861), (0.291299351124, 0.192367318323), 17 | (0.367460241458, 0.203582210627), (0.4392945113, 0.233135599851), 18 | (0.586445962425, 0.228141644834), (0.660152671635, 0.195923841854), 19 | (0.737466449096, 0.182360984545), (0.813236546239, 0.192828009114), 20 | (0.8707571886, 0.235293377042), (0.51534533827, 0.31863546193), 21 | (0.516221448289, 0.396200446263), (0.517118861835, 0.473797687758), 22 | (0.51816430343, 0.553157797772), (0.433701156035, 0.604054457668), 23 | (0.475501237769, 
0.62076344024), (0.520712933176, 0.634268222208), 24 | (0.565874114041, 0.618796581487), (0.607054002672, 0.60157671656), 25 | (0.252418718401, 0.331052263829), (0.298663015648, 0.302646354002), 26 | (0.355749724218, 0.303020650651), (0.403718978315, 0.33867711083), 27 | (0.352507175597, 0.349987615384), (0.296791759886, 0.350478978225), 28 | (0.631326076346, 0.334136672344), (0.679073381078, 0.29645404267), 29 | (0.73597236153, 0.294721285802), (0.782865376271, 0.321305281656), 30 | (0.740312274764, 0.341849376713), (0.68499850091, 0.343734332172), 31 | (0.353167761422, 0.746189164237), (0.414587777921, 0.719053835073), 32 | (0.477677654595, 0.706835892494), (0.522732900812, 0.717092275768), 33 | (0.569832064287, 0.705414478982), (0.635195811927, 0.71565572516), 34 | (0.69951672331, 0.739419187253), (0.639447159575, 0.805236879972), 35 | (0.576410514055, 0.835436670169), (0.525398405766, 0.841706377792), 36 | (0.47641545769, 0.837505914975), (0.41379548902, 0.810045601727), 37 | (0.380084785646, 0.749979603086), (0.477955996282, 0.74513234612), 38 | (0.523389793327, 0.748924302636), (0.571057789237, 0.74332894691), 39 | (0.672409137852, 0.744177032192), (0.572539621444, 0.776609286626), 40 | (0.5240106503, 0.783370783245), (0.477561227414, 0.778476346951)]) 41 | 42 | FL_MIN = np.min(FACIAL_LANDMARKS_TEMPLATE, axis=0) 43 | FL_MAX = np.max(FACIAL_LANDMARKS_TEMPLATE, axis=0) 44 | SCALED_LANDMARKS = (FACIAL_LANDMARKS_TEMPLATE - FL_MIN) / (FL_MAX - FL_MIN) 45 | 46 | #: Landmark indices. 47 | INNER_EYES_AND_BOTTOM_LIP = [39, 42, 57] 48 | OUTER_EYES_AND_NOSE = [36, 45, 33] 49 | 50 | 51 | def maybe_face_bounding_box(detector, img): 52 | """ 53 | Returns a bounding box if it finds 54 | ONE face. 
    Any other case, it returns
    None. Note that the check below is
    `len(dets) >= 1`, so when several faces are
    detected all of their boxes are returned.

    Args:
        detector: dlib.get_frontal_face_detector()
        img: image
    """
    dets = detector(img, 1)
    if len(dets) >= 1:
        return dets
    return None


def get_68_facial_landmarks(predictor, img, bb):
    """
    Returns a list of 68 facial landmarks

    Args:
        predictor: dlib.shape_predictor(<landmarks model file>)
        img: input image
        bb: bounding box containing the face
    """
    points = predictor(img, bb)
    return [(p.x, p.y) for p in points.parts()]


def align_face_to_template(img, facial_landmarks, output_dim, landmarkIndices=INNER_EYES_AND_BOTTOM_LIP):
    """
    Aligns image by warping it to fit the landmarks on
    the image (src) to the landmarks on the template (dst)

    Args:
        img: src image to be aligned
        facial_landmarks: list of 68 landmarks (obtained from dlib)
        output_dim: image output dimension
        landmarkIndices: the three landmark indices used to
            compute the affine transform
    """
    np_landmarks = np.float32(facial_landmarks)
    np_landmarks_idx = np.array(landmarkIndices)

    # Affine transform mapping the 3 source landmarks onto the
    # template landmarks scaled to the output size
    H = cv2.getAffineTransform(np_landmarks[np_landmarks_idx],
                               output_dim * SCALED_LANDMARKS[np_landmarks_idx])
    warped = cv2.warpAffine(img, H, (output_dim, output_dim))

    return warped

--------------------------------------------------------------------------------
/iffse/utils/helpers.py:
--------------------------------------------------------------------------------
import numpy as np


def np_to_string(n):
    """
    Converts a one dimensional numpy array
    into a string format to be stored in db
    """

    # Squeeze out dims
    n = np.squeeze(n)

    # [1:-1] strips the surrounding square brackets
    return np.array_str(n)[1:-1]


def string_to_np(s):
    """
    Converts string to numpy array.
19 | """ 20 | return np.fromstring(s, sep=' ') 21 | -------------------------------------------------------------------------------- /iffse/utils/ml/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kendricktan/iffse/c15d1413eb633d5c75c336d20b0c459903d89506/iffse/utils/ml/__init__.py -------------------------------------------------------------------------------- /iffse/utils/ml/example.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | from torch.autograd import Variable 4 | from loadOpenFace import prepareOpenFace 5 | 6 | net = prepareOpenFace(useCuda=False).cpu() 7 | feature = net(Variable(torch.randn(1, 3, 96, 96))) 8 | print(feature[0]) 9 | -------------------------------------------------------------------------------- /iffse/utils/ml/open_face.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | import torch.backends.cudnn as cudnn 8 | from collections import OrderedDict 9 | try: 10 | from .spatial_lrn import SpatialCrossMapLRN_temp 11 | except: 12 | from spatial_lrn import SpatialCrossMapLRN_temp 13 | import os 14 | import time 15 | 16 | import pathlib 17 | containing_dir = str(pathlib.Path(__file__).resolve().parent) 18 | 19 | 20 | class LambdaBase(nn.Sequential): 21 | def __init__(self, fn, *args): 22 | super(LambdaBase, self).__init__(*args) 23 | self.lambda_func = fn 24 | 25 | def forward_prepare(self, input): 26 | output = [] 27 | for module in self._modules.values(): 28 | output.append(module(input)) 29 | return output if output else input 30 | 31 | 32 | class Lambda(LambdaBase): 33 | def forward(self, input): 34 | return self.lambda_func(self.forward_prepare(input)) 35 | 36 | 37 | # 38 | def Conv2d(in_dim, out_dim, kernel, stride, 
padding): 39 | l = torch.nn.Conv2d(in_dim, out_dim, kernel, 40 | stride=stride, padding=padding) 41 | return l 42 | 43 | 44 | def BatchNorm(dim): 45 | l = torch.nn.BatchNorm2d(dim) 46 | return l 47 | 48 | 49 | def CrossMapLRN(size, alpha, beta, k=1.0, gpuDevice=0): 50 | lrn = SpatialCrossMapLRN_temp(size, alpha, beta, k, gpuDevice=gpuDevice) 51 | n = Lambda(lambda x, lrn=lrn: Variable(lrn.forward(x.data).cuda( 52 | gpuDevice)) if x.data.is_cuda else Variable(lrn.forward(x.data))) 53 | return n 54 | 55 | 56 | def Linear(in_dim, out_dim): 57 | l = torch.nn.Linear(in_dim, out_dim) 58 | return l 59 | 60 | 61 | class Inception(nn.Module): 62 | def __init__(self, inputSize, kernelSize, kernelStride, outputSize, reduceSize, pool, useBatchNorm, reduceStride=None, padding=True): 63 | super(Inception, self).__init__() 64 | # 65 | self.seq_list = [] 66 | self.outputSize = outputSize 67 | 68 | # 69 | # 1x1 conv (reduce) -> 3x3 conv 70 | # 1x1 conv (reduce) -> 5x5 conv 71 | # ... 72 | for i in range(len(kernelSize)): 73 | od = OrderedDict() 74 | # 1x1 conv 75 | od['1_conv'] = Conv2d(inputSize, reduceSize[i], (1, 1), 76 | reduceStride[i] if reduceStride is not None else 1, (0, 0)) 77 | if useBatchNorm: 78 | od['2_bn'] = BatchNorm(reduceSize[i]) 79 | od['3_relu'] = nn.ReLU() 80 | # nxn conv 81 | pad = int(numpy.floor(kernelSize[i] / 2)) if padding else 0 82 | od['4_conv'] = Conv2d( 83 | reduceSize[i], outputSize[i], kernelSize[i], kernelStride[i], pad) 84 | if useBatchNorm: 85 | od['5_bn'] = BatchNorm(outputSize[i]) 86 | od['6_relu'] = nn.ReLU() 87 | # 88 | self.seq_list.append(nn.Sequential(od)) 89 | 90 | ii = len(kernelSize) 91 | # pool -> 1x1 conv 92 | od = OrderedDict() 93 | od['1_pool'] = pool 94 | if ii < len(reduceSize) and reduceSize[ii] is not None: 95 | i = ii 96 | od['2_conv'] = Conv2d(inputSize, reduceSize[i], (1, 1), 97 | reduceStride[i] if reduceStride is not None else 1, (0, 0)) 98 | if useBatchNorm: 99 | od['3_bn'] = BatchNorm(reduceSize[i]) 100 | od['4_relu'] = 
nn.ReLU() 101 | # 102 | self.seq_list.append(nn.Sequential(od)) 103 | ii += 1 104 | 105 | # reduce: 1x1 conv (channel-wise pooling) 106 | if ii < len(reduceSize) and reduceSize[ii] is not None: 107 | i = ii 108 | od = OrderedDict() 109 | od['1_conv'] = Conv2d(inputSize, reduceSize[i], (1, 1), 110 | reduceStride[i] if reduceStride is not None else 1, (0, 0)) 111 | if useBatchNorm: 112 | od['2_bn'] = BatchNorm(reduceSize[i]) 113 | od['3_relu'] = nn.ReLU() 114 | self.seq_list.append(nn.Sequential(od)) 115 | 116 | self.seq_list = nn.ModuleList(self.seq_list) 117 | 118 | def forward(self, input): 119 | x = input 120 | 121 | ys = [] 122 | target_size = None 123 | depth_dim = 0 124 | for seq in self.seq_list: 125 | # print(seq) 126 | # print(self.outputSize) 127 | # print('x_size:', x.size()) 128 | y = seq(x) 129 | y_size = y.size() 130 | # print('y_size:', y_size) 131 | ys.append(y) 132 | # 133 | if target_size is None: 134 | target_size = [0] * len(y_size) 135 | # 136 | for i in range(len(target_size)): 137 | target_size[i] = max(target_size[i], y_size[i]) 138 | depth_dim += y_size[1] 139 | 140 | target_size[1] = depth_dim 141 | # print('target_size:', target_size) 142 | 143 | for i in range(len(ys)): 144 | y_size = ys[i].size() 145 | pad_l = int((target_size[3] - y_size[3]) // 2) 146 | pad_t = int((target_size[2] - y_size[2]) // 2) 147 | pad_r = target_size[3] - y_size[3] - pad_l 148 | pad_b = target_size[2] - y_size[2] - pad_t 149 | ys[i] = F.pad(ys[i], (pad_l, pad_r, pad_t, pad_b)) 150 | 151 | output = torch.cat(ys, 1) 152 | 153 | return output 154 | 155 | 156 | class netOpenFace(nn.Module): 157 | def __init__(self, useCuda, gpuDevice=0): 158 | super(netOpenFace, self).__init__() 159 | 160 | self.gpuDevice = gpuDevice 161 | 162 | self.layer1 = Conv2d(3, 64, (7, 7), (2, 2), (3, 3)) 163 | self.layer2 = BatchNorm(64) 164 | self.layer3 = nn.ReLU() 165 | self.layer4 = nn.MaxPool2d((3, 3), stride=(2, 2), padding=(1, 1)) 166 | self.layer5 = CrossMapLRN(5, 0.0001, 0.75, 
gpuDevice=gpuDevice) 167 | self.layer6 = Conv2d(64, 64, (1, 1), (1, 1), (0, 0)) 168 | self.layer7 = BatchNorm(64) 169 | self.layer8 = nn.ReLU() 170 | self.layer9 = Conv2d(64, 192, (3, 3), (1, 1), (1, 1)) 171 | self.layer10 = BatchNorm(192) 172 | self.layer11 = nn.ReLU() 173 | self.layer12 = CrossMapLRN(5, 0.0001, 0.75, gpuDevice=gpuDevice) 174 | self.layer13 = nn.MaxPool2d((3, 3), stride=(2, 2), padding=(1, 1)) 175 | self.layer14 = Inception(192, (3, 5), (1, 1), (128, 32), (96, 16, 32, 64), nn.MaxPool2d( 176 | (3, 3), stride=(2, 2), padding=(0, 0)), True) 177 | self.layer15 = Inception(256, (3, 5), (1, 1), (128, 64), (96, 32, 64, 64), nn.LPPool2d( 178 | 2, (3, 3), stride=(3, 3)), True) 179 | self.layer16 = Inception(320, (3, 5), (2, 2), (256, 64), (128, 32, None, None), nn.MaxPool2d( 180 | (3, 3), stride=(2, 2), padding=(0, 0)), True) 181 | self.layer17 = Inception(640, (3, 5), (1, 1), (192, 64), (96, 32, 128, 256), nn.LPPool2d( 182 | 2, (3, 3), stride=(3, 3)), True) 183 | self.layer18 = Inception(640, (3, 5), (2, 2), (256, 128), (160, 64, None, None), nn.MaxPool2d( 184 | (3, 3), stride=(2, 2), padding=(0, 0)), True) 185 | self.layer19 = Inception(1024, (3,), (1,), (384,), (96, 96, 256), nn.LPPool2d( 186 | 2, (3, 3), stride=(3, 3)), True) 187 | self.layer21 = Inception(736, (3,), (1,), (384,), (96, 96, 256), nn.MaxPool2d( 188 | (3, 3), stride=(2, 2), padding=(0, 0)), True) 189 | self.layer22 = nn.AvgPool2d((3, 3), stride=(1, 1), padding=(0, 0)) 190 | self.layer25 = Linear(736, 128) 191 | 192 | # 193 | self.resize1 = nn.UpsamplingNearest2d(scale_factor=3) 194 | self.resize2 = nn.AvgPool2d(4) 195 | 196 | # 197 | # self.eval() 198 | 199 | if useCuda: 200 | self.cuda(gpuDevice) 201 | 202 | def forward(self, input): 203 | x = input 204 | 205 | if x.data.is_cuda and self.gpuDevice != 0: 206 | x = x.cuda(self.gpuDevice) 207 | 208 | # 209 | if x.size()[-1] == 128: 210 | x = self.resize2(self.resize1(x)) 211 | 212 | x = self.layer8(self.layer7(self.layer6(self.layer5( 213 | 
            self.layer4(self.layer3(self.layer2(self.layer1(x))))))))
        x = self.layer13(self.layer12(
            self.layer11(self.layer10(self.layer9(x)))))
        x = self.layer14(x)
        x = self.layer15(x)
        x = self.layer16(x)
        x = self.layer17(x)
        x = self.layer18(x)
        x = self.layer19(x)
        x = self.layer21(x)
        x = self.layer22(x)
        x = x.view((-1, 736))

        x_736 = x

        x = self.layer25(x)
        x_norm = torch.sqrt(torch.sum(x**2, 1) + 1e-6)
        x = torch.div(x, x_norm.view(-1, 1).expand_as(x))

        return (x, x_736)


def load_openface_net(checkpoint_pth, cuda=True, gpu_id=0, multi_gpu=False):
    """
    Creates an OpenFace Network and loads the
    checkpoint file (openface.pth)
    """
    model = netOpenFace(cuda, gpu_id)
    model.load_state_dict(torch.load(checkpoint_pth))

    if multi_gpu:
        model = nn.DataParallel(model)

    return model

--------------------------------------------------------------------------------
/iffse/utils/ml/spatial_lrn.py:
--------------------------------------------------------------------------------
# This is a simple modification of https://github.com/pytorch/pytorch/blob/master/torch/legacy/nn/SpatialCrossMapLRN.py.

import torch
from torch.legacy.nn.Module import Module
from torch.legacy.nn.utils import clear


class SpatialCrossMapLRN_temp(Module):

    def __init__(self, size, alpha=1e-4, beta=0.75, k=1, gpuDevice=0):
        super(SpatialCrossMapLRN_temp, self).__init__()

        self.size = size
        self.alpha = alpha
        self.beta = beta
        self.k = k
        self.scale = None
        self.paddedRatio = None
        self.accumRatio = None
        self.gpuDevice = gpuDevice

    def updateOutput(self, input):
        assert input.dim() == 4

        if self.scale is None:
            self.scale = input.new()

        if self.output is None:
            self.output = input.new()

        batchSize = input.size(0)
        channels = input.size(1)
        inputHeight = input.size(2)
        inputWidth = input.size(3)

        if input.is_cuda:
            self.output = self.output.cuda(self.gpuDevice)
            self.scale = self.scale.cuda(self.gpuDevice)

        self.output.resize_as_(input)
        self.scale.resize_as_(input)

        # use output storage as temporary buffer
        inputSquare = self.output
        torch.pow(input, 2, out=inputSquare)

        prePad = int((self.size - 1) / 2 + 1)
        prePadCrop = channels if prePad > channels else prePad

        scaleFirst = self.scale.select(1, 0)
        scaleFirst.zero_()
        # compute first feature map normalization
        for c in range(prePadCrop):
            scaleFirst.add_(inputSquare.select(1, c))

        # reuse computations for next feature maps normalization
        # by adding the next feature map and removing the previous
        for c in range(1, channels):
            scalePrevious = self.scale.select(1, c - 1)
            scaleCurrent = self.scale.select(1, c)
            scaleCurrent.copy_(scalePrevious)
            if c < channels - prePad + 1:
                squareNext = inputSquare.select(1, c + prePad - 1)
                scaleCurrent.add_(1, squareNext)

            if c > prePad:
                squarePrevious = inputSquare.select(1, c - prePad)
                scaleCurrent.add_(-1, squarePrevious)

        self.scale.mul_(self.alpha / self.size).add_(self.k)

        torch.pow(self.scale, -self.beta, out=self.output)
        self.output.mul_(input)

        return self.output

    def updateGradInput(self, input, gradOutput):
        assert input.dim() == 4

        batchSize = input.size(0)
        channels = input.size(1)
        inputHeight = input.size(2)
        inputWidth = input.size(3)

        if self.paddedRatio is None:
            self.paddedRatio = input.new()
        if self.accumRatio is None:
            self.accumRatio = input.new()
        self.paddedRatio.resize_(
            channels + self.size - 1, inputHeight, inputWidth)
        self.accumRatio.resize_(inputHeight, inputWidth)

        cacheRatioValue = 2 * self.alpha * self.beta / self.size
        inversePrePad = int(self.size - (self.size - 1) / 2)

        self.gradInput.resize_as_(input)
        torch.pow(self.scale, -self.beta, out=self.gradInput).mul_(gradOutput)

        self.paddedRatio.zero_()
        paddedRatioCenter = self.paddedRatio.narrow(0, inversePrePad, channels)
        for n in range(batchSize):
            torch.mul(gradOutput[n], self.output[n], out=paddedRatioCenter)
            paddedRatioCenter.div_(self.scale[n])
            torch.sum(self.paddedRatio.narrow(
                0, 0, self.size - 1), 0, out=self.accumRatio)
            for c in range(channels):
                self.accumRatio.add_(self.paddedRatio[c + self.size - 1])
                self.gradInput[n][c].addcmul_(-cacheRatioValue,
                                              input[n][c], self.accumRatio)
                self.accumRatio.add_(-1, self.paddedRatio[c])

        return self.gradInput

    def clearState(self):
        clear(self, 'scale', 'paddedRatio', 'accumRatio')
        return super(SpatialCrossMapLRN_temp, self).clearState()

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
annoy==1.8.3
Jinja2==2.9.4
opencv-python==3.2.0.7
peewee==2.10.1
Pillow==4.1.1
requests==2.14.2
sanic==0.5.4
Sanic-Cors==0.5.4.2
scikit-image==0.13.0
tqdm==4.14.0
--------------------------------------------------------------------------------
/scrapper.py:
--------------------------------------------------------------------------------
"""
Instagram selfie scraper, date: 2017/06/14
Author: Kendrick Tan
"""
import threading
import time
import json
import re
import requests
import random

import sys
import dlib

import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision.transforms as transforms

import skimage.io as skio
import skimage.draw as skdr
import numpy as np

from peewee import OperationalError

from torch.autograd import Variable

from iffse.database import db, SelfiePost, FacialEmbeddings
from iffse.utils.helpers import string_to_np, np_to_string
from iffse.utils.ml.open_face import load_openface_net
from iffse.utils.cv.faces import (
    align_face_to_template,
    maybe_face_bounding_box,
    get_68_facial_landmarks
)

from io import BytesIO
from PIL import Image
from multiprocessing import Pool, Queue

# Global vars
# Headers to mimic mozilla
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# Network to get embeddings
pyopenface = load_openface_net(
    './pretrained_weights/openface_cpu.pth', cuda=False
)

# Dlib to preprocess images
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(
    './pretrained_weights/shape_predictor_68_face_landmarks.dat'
)

transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]
)


def get_instagram_feed_page_query_id(en_commons_url):
    """
    Given the en_US_Commons.js url, find the query
    id from within for a feed page

    Args:
        en_commons_url: URL for the en_US_Commons.js file
                        (can be found by viewing instagram.com source)

    Returns:
        query_id needed to query graphql
    """
    r = requests.get(en_commons_url, headers=HEADERS)

    # Has multiple ways of passing query id
    # (They're using a nightly build...)
    query_id = re.findall(r'c="(\d+)",l="TAG_MEDIA_UPDATED"', r.text)
    if len(query_id) == 0:
        query_id = re.findall(
            r'byTagName.get\(t\).pagination},queryId:"(\d+)",queryParams', r.text)
    query_id = query_id[0]

    return query_id


def get_instagram_us_common_js(text):
    """
    Given an Instagram HTML page, return the
    en_US_Commons.js thingo URL (contains the query_id)

    Args:
        text: Raw html source for instagram.com

    Returns:
        url to obtain us_commons_js
    """
    js_file = re.findall(r"en_US_Commons.js/(\w+).js", text)[0]
    return "https://www.instagram.com/static/bundles/en_US_Commons.js/{}.js".format(str(js_file))


def get_instagram_shared_data(text):
    """
    Given an Instagram HTML page, return the
    'shared_data' json object

    Args:
        text: Raw html source for instagram

    Returns:
        dict containing the json blob that's in
        instagram.com
    """
    json_blob = re.findall(r"window._sharedData\s=\s(.+);", text)[0]
    return json.loads(json_blob)


def get_instagram_hashtag_feed(query_id, end_cursor, tag_name='selfie'):
    """
    Traverses through instagram's hashtag feed, using the
    graphql endpoint
    """
    feed_url = 'https://www.instagram.com/graphql/query/?query_id={}&' \
        'tag_name={}&first=6&after={}'.format(
            query_id, tag_name, end_cursor)

    r = requests.get(feed_url, headers=HEADERS)
    r_js = json.loads(r.text)

    # Has next page or nah
    page_info = r_js['data']['hashtag']['edge_hashtag_to_media']['page_info']
    end_cursor = page_info['end_cursor']

    edges = r_js['data']['hashtag']['edge_hashtag_to_media']['edges']

    display_srcs = []
    shortcodes = []

    for e in edges:
        shortcodes.append(e['node']['shortcode'])
        display_srcs.append(e['node']['display_url'])

    return list(zip(shortcodes, display_srcs)), end_cursor


def instagram_hashtag_seed(tag_name='selfie'):
    """
    Seed function that calls instagram's hashtag page
    in order to obtain the end_cursor thingo
    """
    r = requests.get(
        'https://www.instagram.com/explore/tags/{}/'.format(tag_name),
        headers=HEADERS)
    r_js = get_instagram_shared_data(r.text)

    # To get the query id
    en_common_js_url = get_instagram_us_common_js(r.text)
    query_id = get_instagram_feed_page_query_id(en_common_js_url)

    # Collect the seed page's shortcodes and display URLs here
    shortcodes = []
    display_srcs = []

    # The site first serves the initial page
    # as HTML and all that jazz, so
    # you need to parse that bit during the 1st call.
    # The subsequent images can be obtained by
    # calling the graphql api endpoint with a
    # specified end_cursor
    media_json = r_js['entry_data']['TagPage'][0]['tag']['media']
    for m in media_json['nodes']:
        shortcodes.append(m['code'])
        display_srcs.append(m['display_src'])

    page_info = media_json['page_info']
    end_cursor = page_info['end_cursor']

    print('[{}] Got seed page for instagram tag: {}'.format(
        time.ctime(), tag_name))

    return list(zip(shortcodes, display_srcs)), query_id, end_cursor


def img_url_to_pillow(display_url):
    """
    Returns a Pillow Image given a url
    """
    r = requests.get(display_url, headers=HEADERS)
    img = Image.open(BytesIO(r.content)).convert("RGB")
    return img


def img_url_to_latent_space(display_url):
    """
    Given a display url, download the image,
    find the faces (if any), crop them, feed
    through the NN to get the embeddings.
    If no face is found, return None for each value.

    Args:
        display_url: URL containing image to be id'ed

    Returns:
        None, None, None, None
        or
        N x 128 numpy array, Img, Bounding Box coordinates,
        facial landmarks
        (N is the number of faces it found on img)
    """
    global pyopenface

    # Download copy of image
    # Convert RGB and then to numpy
    img_pil = img_url_to_pillow(display_url)
    img = np.array(img_pil)

    # Get bounding box
    bb = maybe_face_bounding_box(detector, img)

    if bb is None:
        return None, None, None, None

    # Iterate through each possible bounding box
    # And chuck in their respective facial landmarks
    facial_landmarks = []
    img_tensor = None
    for idx, b in enumerate(bb):
        # Get 68 landmarks
        points = get_68_facial_landmarks(predictor, img, b)
        facial_landmarks.append(points)

        # Realign image and resize
        # to 96 x 96 (network input)
        img_aligned = align_face_to_template(img, points, 96)

        # Convert to temporary tensor
        img_tensor_temp = transform(img_aligned)
        img_tensor_temp = img_tensor_temp.view(1, 3, 96, 96)

        # Essentially makes a 'batch' size
        if img_tensor is None:
            img_tensor = img_tensor_temp

        else:
            img_tensor = torch.cat((img_tensor, img_tensor_temp), 0)

    # Pass through network
    # get NUM_FACES x 128 latent space
    np_features = pyopenface(Variable(img_tensor))[0].data.numpy()

    return np_features, img_pil, bb, facial_landmarks


def mp_instagram_hashtag_feed_to_queue(args):
    """
    Multiprocessing function for scraping instagram hashtag feed

    Args:
        args: (shortcode, display_url, tag) tuple

    Returns:
        None (embeddings are written to the database)
    """
    shortcode, display_url, tag = args

    try:
        # Facial recognition logic here:
        np_features, _, _, _ = img_url_to_latent_space(display_url)

        if np_features is None:
            print("[{}] No faces: {} <{}>".format(
                time.ctime(), shortcode, tag))
            return

        # Create a selfie post
        # attach all latent space to this foreign key
        s, created = SelfiePost.get_or_create(
            shortcode=shortcode, img_url=display_url)

        # Return early if the post already existed
        if not created:
            print("[{}] !! Already indexed: {} <{}>".format(
                time.ctime(), shortcode, tag))
            return

        for np_feature in np_features:
            # Convert to string and store in db
            np_str = np_to_string(np_feature)
            fe = FacialEmbeddings(op=s, latent_space=np_str)
            fe.save()

        print("[{}] Success: {} <{}>".format(time.ctime(), shortcode, tag))

    except Exception as e:
        print("[{}] ====> Failed: {}, {}".format(time.ctime(), shortcode, e))


def maybe_get_next_instagram_hashtag_feed(qid, ec, tag):
    """
    Tries to get the instagram hashtag feed; if it can't, it
    changes the query id and calls itself again
    """
    try:
        sds, ec = get_instagram_hashtag_feed(qid, ec, tag)

    except Exception as e:
        print('[{}] !!!!!!!!'.format(time.ctime()))
        print('!!!! Error: {} !!!!'.format(e))
        print('!!!! Instagram probably rate limited us... whoops !!!!')
        print('!!!! Pausing for 60-90 seconds !!!!')
        time.sleep(random.randint(60, 90))

        # Get new query id
        # Swap out lines below if you
        # want to scrape new posts instead
        # of trying to go back to the beginning
        # of time on instagram (not possible)
        # as the ec (endcursor) gives different
        # results after a set time interval
        # (reseed with the tag being scraped, not the default one)
        _, new_qid, ec = instagram_hashtag_seed(tag)
        # _, new_qid, _ = instagram_hashtag_seed(tag)

        # Calls itself recursively until it succeeds
        # (untested)
        return maybe_get_next_instagram_hashtag_feed(new_qid, ec, tag)

    return sds, qid, ec


if __name__ == '__main__':
    try:
        db.connect()
        db.create_tables([SelfiePost, FacialEmbeddings])

    except OperationalError:
        pass

    # What kind of tags do we want to scrape
    tags_to_be_scraped = [
        'selfie', 'selfportait', 'dailylook', 'selfiesunday',
        'selfietime', 'instaselfie', 'shamelessselefie',
        'faceoftheday', 'me', 'selfieoftheday', 'instame',
        'selfiestick', 'selfies'
    ]
    tags_to_be_scraped_dict = {
        k: instagram_hashtag_seed(k) for k in tags_to_be_scraped
    }

    # Multiprocessing pool here
    p = Pool()

    # sds: Shortcodes, display_srcs
    # qid: query_id
    # ec : end cursor
    # sds, qid, ec = instagram_hashtag_seed()

    while True:
        for tag in tags_to_be_scraped_dict:
            sds, qid, ec = tags_to_be_scraped_dict[tag]
            mp_args = list(map(lambda x: (x[0], x[1], tag), sds))

            # Async map through all given shortcodes
            p.map_async(mp_instagram_hashtag_feed_to_queue, mp_args)

            # Get next batch
            sds, qid, ec = maybe_get_next_instagram_hashtag_feed(qid, ec, tag)
            tags_to_be_scraped_dict[tag] = (sds, qid, ec)

    # Wait for pool to close
    p.close()
    p.join()

--------------------------------------------------------------------------------
/static/favicon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kendricktan/iffse/c15d1413eb633d5c75c336d20b0c459903d89506/static/favicon.ico
--------------------------------------------------------------------------------
/static/man.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kendricktan/iffse/c15d1413eb633d5c75c336d20b0c459903d89506/static/man.png
--------------------------------------------------------------------------------
/static/sadbaby.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kendricktan/iffse/c15d1413eb633d5c75c336d20b0c459903d89506/static/sadbaby.jpg
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
(markup, styles, and scripts not preserved; visible page text follows)
IFFSE
Fork me on GitHub
Instagram Facial Feature Search Engine
Find Instagram look alikes
*only photos with specified tags (e.g. #selfies) are indexed*
read the project page to build an IFFSE with custom tags
Search
--------------------------------------------------------------------------------
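Note on /iffse/utils/ml/spatial_lrn.py: `SpatialCrossMapLRN_temp.updateOutput` builds `scale` from a running sum of squared activations over neighbouring channels, then returns `input * scale ** -beta`. The textbook form of that cross-channel local response normalization, for the channel values at a single spatial position, can be sketched in plain Python as below. This is only a minimal illustration and not a port of the module: the function name `cross_channel_lrn` is hypothetical, it assumes an odd `size`, and it has not been checked against the module's output at the window edges.

```python
def cross_channel_lrn(values, size=5, alpha=1e-4, beta=0.75, k=1.0):
    """Normalize the channel activations of one spatial position.

    values: list of floats, one per channel (hypothetical input layout).
    size:   number of neighbouring channels in the window (assumed odd).
    """
    half = (size - 1) // 2
    squares = [v * v for v in values]
    normalized = []
    for c, v in enumerate(values):
        # Window of channels centred on c, clipped to the valid range
        lo = max(0, c - half)
        hi = min(len(values), c + half + 1)
        # scale = k + (alpha / size) * sum of squares inside the window
        scale = k + (alpha / size) * sum(squares[lo:hi])
        normalized.append(v * scale ** (-beta))
    return normalized


if __name__ == "__main__":
    # Three channels, window of 3, exaggerated alpha to make the effect visible
    print(cross_channel_lrn([1.0, 2.0, 3.0], size=3, alpha=3.0, beta=1.0))
```

The default `alpha`, `beta`, and `k` match the defaults of `SpatialCrossMapLRN_temp.__init__`; the module applies the same idea to every pixel of a 4-D batch at once and reuses the previous channel's sum instead of recomputing each window.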