├── untitled.txt ├── LICENSE ├── tile_splitter ├── README.md ├── pytorch-cnn-problem ├── the-car-connection-image-scraper ├── Keras CNN Benchmark.py ├── generating-male-faces-with-aae ├── generating-male-faces-with-dcgan ├── the-car-connection-image-scraper.py ├── generating-male-faces-with-vae ├── Rapport Final.ipynb ├── Deep Convolutional GAN.ipynb ├── Pytorch CNN to Test on the Generated Samples.ipynb ├── Rapport_Final (1).ipynb └── Data Cleaning.ipynb /untitled.txt: -------------------------------------------------------------------------------- 1 | Given our non-significant results, we aimed to determine why the generated samples did not improve the classifier. Evidently, the numerous additional samples did not provide any new information; the information in the generated samples was already contained in the real samples. 2 | 3 | To test this hypothesis, we trained a classifier on 8,000 _real_ men and 8,000 _real_ women. As test data, we used 10,000 generated men and 10,000 generated women. The pictures of women were reused from the previous analyses, and 10,000 men were generated with the 5 adversarial networks mentioned previously, without altering hyperparameters. 4 | 5 | As expected, when trained on real samples, the classifier had an outstanding performance on the generated samples. Within two epochs, the accuracy was 100%. We regret to say that the VAEs and GANs did not magically yield more information than what had been fed to them. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Nicolas Gervais 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /tile_splitter: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from glob import glob 3 | import matplotlib.pyplot as plt 4 | import os 5 | os.chdir('c:/users/nicolas/documents/data/faces') 6 | from PIL import Image 7 | 8 | 9 | def split_pics(source_dir): 10 | """ 11 | Splits an array containing a 5x5 grid of 60x60 pictures. 12 | Additionally, it creates a subdirectory called 'split' inside the provided directory. 
13 | """ 14 | 15 | for photo in glob('%s/*.png' % source_dir): 16 | a = plt.imread(photo) 17 | b = np.array(a) 18 | c = np.vsplit(b, np.arange(1, b.shape[0], 62)) 19 | d = c[1:-1] 20 | 21 | pictures = [] 22 | 23 | for i in d: 24 | imgs = np.hsplit(i, np.arange(1, 312, 62)) 25 | imgs = imgs[1:-1] 26 | for i in imgs: 27 | pictures.append(i[1:-1, 1:-1]) 28 | 29 | letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 30 | 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 31 | 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 32 | 'y', 'z'] 33 | 34 | os.mkdir('%s/split' % sourcedir) 35 | 36 | for pic in pictures: 37 | filename = '{}/split/{}.png'.format(source_dir, ''.join(np.random.choice(letters, 15))) 38 | pic *= 255 39 | im = Image.fromarray(pic.astype(np.uint8)) 40 | im.save(filename) 41 | 42 | 43 | split_pics('aae') 44 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # data-augmentation-with-gan-and-vae :100: 2 | 3 | [Vincent Fortin](https://github.com/vincentfortin) and I are using the [UTK Faces dataset](http://aicip.eecs.utk.edu/wiki/UTKFace) to for the project in the [_Machine Learning I_](https://www.hec.ca/en/courses/detail/?cours=MATH80629A) project. 4 | 5 | Unbalanced classes is one of the most frequent struggle when dealing with real data. Is it better to down/upsample, or do nothing at all? Another approach is to generate samples resembling the smallest class. In this project, we are using Variational AutoEncoders (VAEs) and Generative Adversarial Networks (GANs) to generate samples of the smallest class. Using human faces, we will determine if a convolutional neural network (CNN) will be trained better with generated samples, or without. 6 | 7 | ## PROGRESS 8 | 1. [First we trained a VAE](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Variational%20Auto%20Encoder.ipynb) to generate human faces 9 | 2. [Then we trained a ConvNet with Pytorch](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Pytorch%20ConvNet%20Distinguishing%20Men%20and%20Women.ipynb) but it didn't work. 10 | 3. So we tried with Keras to see if our architecture was the problem. It's not. [We reached 90% accuracy](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Keras%20CNN%20Benchmark.ipynb). 11 | 4. Here is the [Adversarial Auto Encoder](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Adversarial%20Auto%20Encoder.ipynb). The results are very clear. 12 | 5. Here is the [Wasserstein GAN](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Wasserstein%20GAN.ipynb). 13 | 6. The [Softmax GAN](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Softmax%20GAN.ipynb) worked out pretty well. 14 | 7. The [Deep Convolutional GAN](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Deep%20Convolutional%20GAN.ipynb) has worked but its performance is quite low. 15 | 8. Finally fixed the [Pytorch CNN](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Pytorch%20ConvNet%20Distinguishing%20Men%20and%20Women.ipynb), with 92% accuracy! 16 | 9. 
The CNN was able to classify generated samples, when trained on the original samples, with [100% accuracy](https://github.com/nicolas-gervais/data-augmentation-with-gan-and-vae/blob/master/Pytorch%20CNN%20to%20Test%20on%20the%20Generated%20Samples.ipynb). 17 | ## TO DO 18 | - [x] Train a Tensorflow convolutional neural network as classifier 19 | - [x] Create a GAN to generate human faces 20 | - [x] Explore other generative methods 21 | - [ ] Train CNNs to see if the accuracy is better with the generative methods 22 | - [x] Fix the Pytorch CNN 23 | - [ ] Use Keras and Pydot to plot the chosen architecture 24 | - [x] Use generated samples as test set to see if there is untapped information 25 | ## PROJECT PLAN 26 | 1. Create various sample generators 27 | 2. Establish a benchmark CNN classifier, trained with 10% of the female samples (smaller class) 28 | 3. Train classifiers on 10% of the female samples, and add generated samples. Finally, compare performance. 29 | - VAE 30 | - GAN 31 | - other 32 | 4. Compare performance, plot 33 | 5. Determine if the generated samples have information that is not contained in the original pictures 34 | ## Example of the Adversarial Auto Encoder Learning 35 | ![Alt Text](https://media.discordapp.net/attachments/552684049588682752/632967292946350080/sickgif.gif) 36 | 37 | This is the output (generated faces) of the adversarial autoencoder. 38 | -------------------------------------------------------------------------------- /pytorch-cnn-problem: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from PIL import Image 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.optim as optim 7 | from torch.utils.data import DataLoader 8 | from torch.autograd import Variable 9 | from keras.datasets import mnist 10 | 11 | (x_train, y_train), (x_test, y_test) = mnist.load_data() 12 | 13 | 14 | def resize(pics): 15 | pictures = [] 16 | for image in pics: 17 | image = Image.fromarray(image).resize((dim, dim)) 18 | image = np.array(image) 19 | pictures.append(image) 20 | return np.array(pictures) 21 | 22 | 23 | dim = 60 24 | 25 | x_train, x_test = resize(x_train), resize(x_test) 26 | 27 | x_train = x_train.reshape(-1, 1, dim, dim).astype('float32') / 255 28 | x_test = x_test.reshape(-1, 1, dim, dim).astype('float32') / 255 29 | y_train, y_test = y_train.astype('float32'), y_test.astype('float32') 30 | 31 | if torch.cuda.is_available(): 32 | x_train = torch.from_numpy(x_train)[:10_000] 33 | x_test = torch.from_numpy(x_test)[:4_000] 34 | y_train = torch.from_numpy(y_train)[:10_000] 35 | y_test = torch.from_numpy(y_test)[:4_000] 36 | 37 | 38 | class ConvNet(nn.Module): 39 | 40 | def __init__(self): 41 | super().__init__() 42 | self.conv1 = nn.Conv2d(1, 32, 3) 43 | self.conv2 = nn.Conv2d(32, 64, 3) 44 | self.conv3 = nn.Conv2d(64, 128, 3) 45 | 46 | self.fc1 = nn.Linear(5*5*128, 1024) 47 | self.fc2 = nn.Linear(1024, 2048) 48 | self.fc3 = nn.Linear(2048, 1) 49 | 50 | def forward(self, x): 51 | x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) 52 | x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2)) 53 | x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2)) 54 | 55 | x = x.view(x.size(0), -1) 56 | x = F.relu(self.fc1(x)) 57 | x = F.relu(self.fc2(x)) 58 | x = F.dropout(x, 0.5) 59 | x = torch.sigmoid(self.fc3(x)) 60 | return x 61 | 62 | 63 | net = ConvNet() 64 | 65 | optimizer = optim.Adam(net.parameters(), lr=0.03) 66 | 67 | loss_function = nn.BCELoss() 68 | 69 | 70 | class FaceTrain: 71 | 72 | 
def __init__(self): 73 | self.len = x_train.shape[0] 74 | self.x_train = x_train 75 | self.y_train = y_train 76 | 77 | def __getitem__(self, index): 78 | return x_train[index], y_train[index].unsqueeze(0) 79 | 80 | def __len__(self): 81 | return self.len 82 | 83 | 84 | class FaceTest: 85 | 86 | def __init__(self): 87 | self.len = x_test.shape[0] 88 | self.x_test = x_test 89 | self.y_test = y_test 90 | 91 | def __getitem__(self, index): 92 | return x_test[index], y_test[index].unsqueeze(0) 93 | 94 | def __len__(self): 95 | return self.len 96 | 97 | 98 | train = FaceTrain() 99 | test = FaceTest() 100 | 101 | train_loader = DataLoader(dataset=train, batch_size=64, shuffle=True) 102 | test_loader = DataLoader(dataset=test, batch_size=64, shuffle=True) 103 | 104 | epochs = 10 105 | steps = 0 106 | train_losses, test_losses = [], [] 107 | for e in range(epochs): 108 | running_loss = 0 109 | for images, labels in train_loader: 110 | optimizer.zero_grad() 111 | log_ps = net(images) 112 | loss = loss_function(log_ps, labels) 113 | loss.backward() 114 | optimizer.step() 115 | running_loss += loss.item() 116 | else: 117 | test_loss = 0 118 | accuracy = 0 119 | 120 | with torch.no_grad(): 121 | for images, labels in test_loader: 122 | log_ps = net(images) 123 | test_loss += loss_function(log_ps, labels) 124 | ps = torch.exp(log_ps) 125 | top_p, top_class = ps.topk(1, dim=1) 126 | equals = top_class.type('torch.LongTensor') == labels.type('torch.LongTensor').view(*top_class.shape) 127 | accuracy += torch.mean(equals.type('torch.FloatTensor')) 128 | train_losses.append(running_loss/len(train_loader)) 129 | test_losses.append(test_loss/len(test_loader)) 130 | print("[Epoch: {}/{}] ".format(e+1, epochs), 131 | "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)), 132 | "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)), 133 | "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader))) 134 | 135 | -------------------------------------------------------------------------------- /the-car-connection-image-scraper: -------------------------------------------------------------------------------- 1 | from selenium import webdriver 2 | import bs4 as bs 3 | from urllib.request import Request, urlopen 4 | import pandas as pd 5 | import time 6 | import os 7 | 8 | # os.chdir('/data') 9 | 10 | website = 'https://www.thecarconnection.com' 11 | 12 | 13 | def fetch(page, addition=''): 14 | return bs.BeautifulSoup(urlopen(Request(page + addition, 15 | headers={'User-Agent': 'Opera/9.80 (X11; Linux i686; Ub'\ 16 | 'untu/14.10) Presto/2.12.388 Version/12.16'})).read(), 'lxml') 17 | 18 | 19 | def all_makes(): 20 | # Fetches all makes (acura, cadilac, etc) 21 | all_makes_list = [] 22 | for a in fetch(website, "/new-cars").find_all("a", {"class": "add-zip"}): 23 | all_makes_list.append(a['href']) 24 | print(all_makes_list[:10]) 25 | print("All makes fetched") 26 | return all_makes_list 27 | 28 | 29 | def make_menu(listed): 30 | # Fetches all makes + model ? 
(acura_mdx, audi_q3, etc) 31 | make_menu_list = [] 32 | for make in listed: # REMOVE REMOVE REMOVE REMOVE REMOVE REMOVE # 33 | for div in fetch(website, make).find_all("div", {"class": "name"}): 34 | make_menu_list.append(div.find_all("a")[0]['href']) 35 | print(make_menu_list[:10]) 36 | print("Make menu list fetched") 37 | return make_menu_list 38 | 39 | 40 | def model_menu(listed): 41 | # Add year to previous step 42 | model_menu_list = [] 43 | for make in listed: 44 | soup = fetch(website, make) 45 | for div in soup.find_all("a", {"class": "btn avail-now first-item"}): 46 | model_menu_list.append(div['href']) 47 | for div in soup.find_all("a", {"class": "btn 1"})[:8]: 48 | model_menu_list.append(div['href']) 49 | print(model_menu_list[:10]) 50 | print("Model menu list fetched") 51 | return model_menu_list 52 | 53 | 54 | def year_model_overview(listed): 55 | year_model_overview_list = [] 56 | for make in listed: 57 | for id in fetch(website, make).find_all("a", {"id": "ymm-nav-specs-btn"}): 58 | year_model_overview_list.append(id['href']) 59 | try: 60 | year_model_overview_list.remove("/specifications/buick_enclave_2019_fwd-4dr-preferred") 61 | except: 62 | pass 63 | print(year_model_overview_list[:10]) 64 | print("Year model overview list fetched") 65 | return year_model_overview_list 66 | 67 | 68 | def trims(listed): 69 | trim_list = [] 70 | for row in listed: 71 | div = fetch(website, row).find_all("div", {"class": "block-inner"})[-1] 72 | div_a = div.find_all("a") 73 | for i in range(len(div_a)): 74 | trim_list.append(div_a[-i]['href']) 75 | print(trim_list[:10]) 76 | print("Trims list fetched") 77 | return trim_list 78 | 79 | 80 | def timer(start, end, iters, iters_left): 81 | hours, rem = divmod(end-start, 3600) 82 | minutes, seconds = divmod(rem, 60) 83 | 84 | hours_per_iter, rem_per_iter = divmod((end-start)/(iters+1),3600) 85 | minutes_per_iter, seconds_per_iter = divmod(rem_per_iter,60) 86 | 87 | hours_left , rem_left = divmod(((end-start)/(iters+1))*iters_left,3600) 88 | minutes_left, seconds_left = divmod(rem_left,60) 89 | print(" Total elapsed: {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds)) 90 | print(" Time per page: {:0>2}:{:0>2}:{:05.2f}".format(int(hours_per_iter),int(minutes_per_iter),seconds_per_iter)) 91 | print(" Time left : {:0>2}:{:0>2}:{:05.2f}".format(int(hours_left),int(minutes_left),seconds_left)) 92 | 93 | 94 | def specifications(website, trims, keep_all_images=True): 95 | ''' keep_all_images: True means we create 2 files, one for main (front/read) 96 | And one for all of the pictures. 
97 | ''' 98 | options = webdriver.FirefoxOptions() 99 | options.add_argument('-headless') 100 | driver = webdriver.Firefox(options=options) 101 | # driver = webdriver.Firefox() 102 | 103 | # Timer start 104 | start = time.time() 105 | 106 | if not os.path.isfile('data/pictures_all.csv'): 107 | # Table for all images 108 | specifications_table_all = pd.DataFrame() 109 | # Table for only front and rear images 110 | specifications_table_front_rear = pd.DataFrame() 111 | else: 112 | specifications_table_all = pd.read_csv('data/pictures_all.csv',header=None) 113 | specifications_table_front_rear = pd.read_csv('data/pictures_rear_front.csv',header=None) 114 | 115 | trims_left = len(trims.index) 116 | 117 | for inx, webpage in enumerate(trims.iloc[len(specifications_table_all.columns):, 0]): 118 | soup = fetch(website, webpage.replace('overview', 'specifications')) 119 | # Same splitting as above 120 | specifications_df_all = pd.DataFrame(columns=[soup.find_all("title")[0].text[:-15]]) 121 | specifications_df_front_rear = pd.DataFrame(columns=[soup.find_all("title")[0].text[:-15]]) 122 | for div in soup.find_all("div", {"class": "specs-set-item"})[:9]: 123 | row_name = div.find_all("span")[0].text 124 | row_value = div.find_all("span")[1].text 125 | specifications_df_all.loc[row_name] = row_value 126 | specifications_df_front_rear.loc[row_name] = row_value 127 | try: 128 | driver.get(website + webpage.replace('specifications', 'overview')) 129 | class_img = driver.find_elements_by_class_name('image') 130 | except: 131 | print(f'Problem with {website + webpage}') 132 | list_urls = [] 133 | for ii in class_img: 134 | list_urls.append(ii.get_attribute('data-image-huge')) 135 | 136 | # Keep a count of rear and front images to put them at start of index 137 | rear_front_img_count = 0 138 | for ix, img_url in enumerate(list_urls): 139 | specifications_df_all.loc['Picture_%i' % ix, :] = img_url 140 | if keep_all_images and 'pkg-rear-exterior-view' in img_url: 141 | specifications_df_front_rear.loc['Picture_%i' % rear_front_img_count, :] = img_url 142 | rear_front_img_count += 1 143 | 144 | # If no images, we don't add to the main df 145 | if len(class_img) > 0: 146 | specifications_table_all = pd.concat([specifications_table_all, specifications_df_all], axis=1, sort=False) 147 | specifications_table_front_rear = pd.concat([specifications_table_front_rear, specifications_df_front_rear], axis=1, sort=False) 148 | # Save content every 10 images 149 | if inx % 10 == 0: 150 | print("%d/%d completed."%(inx, trims_left)) 151 | specifications_table_all.to_csv('data/pictures_all.csv',header=None) 152 | specifications_table_front_rear.to_csv('data/pictures_rear_front.csv',header=None) 153 | timer(start,time.time(), inx, trims_left-inx) 154 | 155 | # At the end of loop 156 | specifications_table_all.to_csv('data/pictures_all.csv',header=None) 157 | specifications_table_front_rear.to_csv('data/pictures_rear_front.csv',header=None) 158 | 159 | 160 | if __name__ == '__main__': 161 | # If list of trims has not been fetched 162 | if not os.path.isfile('data/trims_octobre_2019.csv'): 163 | a = all_makes() 164 | b = make_menu(a) 165 | c = model_menu(b) 166 | d = year_model_overview(c) 167 | e = trims(d) 168 | f = pd.DataFrame(e).to_csv('data/trims_octobre_2019.csv', index=False, header=None) 169 | 170 | # Read list of trims 171 | g = pd.read_csv('data/trims_octobre_2019.csv',header=None) 172 | g.drop_duplicates(inplace=True) 173 | h = specifications(website, g) 174 | h.to_csv('data/pictures.csv') 175 | 176 | 
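# This version of the scraper only records the image URLs in data/pictures_all.csv
# and data/pictures_rear_front.csv; the download step itself only appears in the
# second scraper further down (saveImage in the-car-connection-image-scraper.py).
# The sketch below is a minimal, assumed bridge between the two: the CSV layout
# (one column per trim, with 'Picture_i' rows holding URLs), the output directory
# and the function name are illustrative assumptions, not part of the original code.
import os

import pandas as pd
import requests


def download_scraped_pictures(csv_path='data/pictures_all.csv', out_dir='data/all_images'):
    """Download every URL-looking cell of the scraped table into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    table = pd.read_csv(csv_path, index_col=0)
    for trim in table.columns:
        # Keep only the cells that hold an image URL, regardless of the row labels.
        urls = [v for v in table[trim].dropna() if str(v).startswith('http')]
        safe_name = str(trim).strip().replace(' ', '_').replace('/', '.')
        for i, url in enumerate(urls):
            target = os.path.join(out_dir, '%s_%d.jpg' % (safe_name, i))
            if os.path.isfile(target):
                continue  # already saved; lets an interrupted run resume
            try:
                response = requests.get(url, timeout=30)
                response.raise_for_status()
            except requests.RequestException as err:
                print('Problem with %s: %s' % (url, err))
                continue
            with open(target, 'wb') as handler:
                handler.write(response.content)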
-------------------------------------------------------------------------------- /Keras CNN Benchmark.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Keras Benchmark 5 | 6 | import numpy as np 7 | import matplotlib.pyplot as plt 8 | from glob import glob 9 | from PIL import Image 10 | from time import time 11 | from sklearn.model_selection import train_test_split 12 | from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D, Dropout, Flatten, Dense 13 | from keras.callbacks import EarlyStopping 14 | from keras.models import Sequential 15 | from keras.optimizers import Adam 16 | from keras.utils import to_categorical 17 | from keras.backend import epsilon 18 | from keras import backend 19 | from keras.metrics import AUC 20 | import os 21 | import csv 22 | import pandas as pd 23 | 24 | 25 | def unison_shuffled_copies(a, b): 26 | # Shuffles two lists keeping orders 27 | assert len(a) == len(b) 28 | p = np.random.permutation(len(a)) 29 | return a[p], b[p] 30 | 31 | def crop(img): 32 | if img.shape[0]'.format(sex[int(labels_train[rand])].capitalize())) 181 | yticks = plt.xticks([]) 182 | yticks = plt.yticks([]) 183 | 184 | plt.show() 185 | 186 | trainsize, testsize = x_train.shape[0], x_test.shape[0] 187 | print(f'The size of the training set is {trainsize:,} and the ' f'size of the test set is {testsize:,}.') 188 | 189 | # ##### Scaling, casting the arrays 190 | print('Scaling...', end='') 191 | image_size = x_train.shape[1] * x_train.shape[1] 192 | x_train = x_train.astype('float32') / 255 193 | x_test = x_test.astype('float32') / 255 194 | print('\rDone. ') 195 | 196 | model = Sequential([ 197 | Conv2D(16*4, (3, 3), input_shape=(60, 60, 1), activation='relu'), 198 | MaxPooling2D(), 199 | 200 | Conv2D(32*4, (3, 3), activation='relu'), 201 | MaxPooling2D(), 202 | 203 | Conv2D(64*4, (3, 3), activation='relu'), 204 | MaxPooling2D(), 205 | 206 | Conv2D(128*4, (3, 3), activation='relu'), 207 | MaxPooling2D(), 208 | 209 | Flatten(), 210 | 211 | Dense(1024, activation='relu'), 212 | Dense(2048, activation='relu'), 213 | Dense(2, activation='sigmoid') 214 | ]) 215 | 216 | # model.summary() 217 | 218 | model.compile(optimizer=Adam(lr=0.001), 219 | loss='binary_crossentropy', 220 | metrics=['accuracy', AUC(),f1_m]) 221 | 222 | e_s = EarlyStopping(monitor='val_loss', patience=10) 223 | 224 | hist = model.fit(x_train, y_train, 225 | epochs=nb_epochs, 226 | validation_data=[x_test, y_test], 227 | batch_size=32, 228 | callbacks=[e_s]) 229 | 230 | 231 | pd.DataFrame(hist.history).to_csv(model_name+'_history.csv') 232 | 233 | test_loss, test_acc, test_AUC, test_f1 = model.evaluate(x_test, y_test) 234 | 235 | print("-------------------") 236 | print(model_name) 237 | print(f'Test loss: {np.round(test_loss, 4)} — Test accuracy: {np.round(test_acc*100,2)}%') 238 | print(f'Test AUC: {np.round(test_AUC, 4)} — Test F1: {np.round(test_f1,4)}%') 239 | 240 | -------------------------------------------------------------------------------- /generating-male-faces-with-aae: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | from glob import glob 4 | from PIL import Image 5 | from time import time 6 | import re 7 | import pandas as pd 8 | import os 9 | import argparse 10 | import math 11 | import itertools 12 | import torchvision.transforms as transforms 13 | from torchvision.utils import save_image 14 | from torch.utils.data import 
DataLoader 15 | from torchvision import datasets 16 | from torch.autograd import Variable 17 | import torch.nn as nn 18 | import torch.nn.functional as F 19 | import torch 20 | 21 | os.chdir('C:/Users/Nicolas/Documents/Data/Faces') 22 | 23 | files = glob('combined/*.jpg') 24 | 25 | faces = [i for i in files if (i[-34] == '0') and len(i[-37:-35].strip('\\').strip('d')) == 2] 26 | y = [i[-34] for i in files if (i[-34] == '0') and len(i[-37:-35].strip('\\').strip('d')) > 1] 27 | 28 | dim = 60 29 | 30 | start = time() 31 | x = list() 32 | num_to_load = len(faces) 33 | for ix, file in enumerate(faces[:num_to_load]): 34 | image = plt.imread(file, 'jpg') 35 | if image.shape[0] != image.shape[1]: 36 | prob += 1 37 | image = Image.fromarray(image).resize((dim, dim)).convert('L') 38 | image = np.array(image) 39 | x.append(image) 40 | 41 | x = np.array(x, dtype=np.float32).reshape(-1, 1, 60, 60) 42 | 43 | assert x.ndim == 4, 'The input is the wrong shape!' 44 | 45 | files, faces = None, None 46 | 47 | x = x.astype(np.float32) / 127.5 - 1 48 | y = np.array(y, dtype=np.float32) 49 | 50 | if torch.cuda.is_available(): 51 | x = torch.from_numpy(x) 52 | y = torch.from_numpy(y) 53 | print('Tensors successfully flushed to CUDA.') 54 | else: 55 | print('CUDA not available!') 56 | 57 | 58 | class Face: 59 | 60 | def __init__(self): 61 | self.len = x.shape[0] 62 | self.x = x 63 | self.y = y 64 | 65 | def __getitem__(self, index): 66 | return x[index], y[index].unsqueeze(0) 67 | 68 | def __len__(self): 69 | return self.len 70 | 71 | 72 | train = Face() 73 | 74 | parser = argparse.ArgumentParser() 75 | 76 | parser.add_argument("--n_epochs", type=int, default=100, help="number of epochs of training") 77 | parser.add_argument("--batch_size", type=int, default=32, help="size of the batches") 78 | parser.add_argument("--lr", type=float, default=0.005, help="adam: learning rate") 79 | parser.add_argument("--b1", type=float, default=0.3, help="adam: decay of first order momentum of gradient") 80 | parser.add_argument("--b2", type=float, default=0.999, help="adam: decay of first order momentum of gradient") 81 | parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation") 82 | parser.add_argument("--latent_dim", type=int, default=3, help="dimensionality of the latent code") 83 | parser.add_argument("--img_size", type=int, default=60, help="size of each image dimension") 84 | parser.add_argument("--channels", type=int, default=1, help="number of image channels") 85 | parser.add_argument("--sample_interval", type=int, default=50, help="interval between image sampling") 86 | opt, unknown = parser.parse_known_args() 87 | 88 | img_shape = (opt.channels, opt.img_size, opt.img_size) 89 | 90 | cuda = True if torch.cuda.is_available() else False 91 | 92 | 93 | def reparameterization(mu, logvar): 94 | std = torch.exp(logvar / 2) 95 | sampled_z = Variable(Tensor(np.random.normal(0, 1, (mu.size(0), opt.latent_dim)))) 96 | z = sampled_z * std + mu 97 | return z 98 | 99 | 100 | class Encoder(nn.Module): 101 | def __init__(self): 102 | super(Encoder, self).__init__() 103 | 104 | self.model = nn.Sequential( 105 | nn.Linear(int(np.prod(img_shape)), 512), 106 | nn.LeakyReLU(0.2, inplace=True), 107 | nn.Linear(512, 512), 108 | nn.BatchNorm1d(512), 109 | nn.LeakyReLU(0.2, inplace=True), 110 | ) 111 | 112 | self.mu = nn.Linear(512, opt.latent_dim) 113 | self.logvar = nn.Linear(512, opt.latent_dim) 114 | 115 | def forward(self, img): 116 | img_flat = img.view(img.shape[0], -1) 117 | x = 
self.model(img_flat) 118 | mu = self.mu(x) 119 | logvar = self.logvar(x) 120 | z = reparameterization(mu, logvar) 121 | return z 122 | 123 | 124 | class Decoder(nn.Module): 125 | def __init__(self): 126 | super(Decoder, self).__init__() 127 | 128 | self.model = nn.Sequential( 129 | nn.Linear(opt.latent_dim, 512), 130 | nn.LeakyReLU(0.2, inplace=True), 131 | nn.Linear(512, 512), 132 | nn.BatchNorm1d(512), 133 | nn.LeakyReLU(0.2, inplace=True), 134 | nn.Linear(512, int(np.prod(img_shape))), 135 | nn.Tanh(), 136 | ) 137 | 138 | def forward(self, z): 139 | img_flat = self.model(z) 140 | img = img_flat.view(img_flat.shape[0], *img_shape) 141 | return img 142 | 143 | 144 | class Discriminator(nn.Module): 145 | def __init__(self): 146 | super(Discriminator, self).__init__() 147 | 148 | self.model = nn.Sequential( 149 | nn.Linear(opt.latent_dim, 512), 150 | nn.LeakyReLU(0.2, inplace=True), 151 | nn.Linear(512, 256), 152 | nn.LeakyReLU(0.2, inplace=True), 153 | nn.Linear(256, 1), 154 | nn.Sigmoid(), 155 | ) 156 | 157 | def forward(self, z): 158 | validity = self.model(z) 159 | return validity 160 | 161 | 162 | adversarial_loss = torch.nn.BCELoss() 163 | pixelwise_loss = torch.nn.L1Loss() 164 | 165 | encoder = Encoder() 166 | decoder = Decoder() 167 | discriminator = Discriminator() 168 | 169 | decoder.load_state_dict(torch.load('aae_decoder_men')) 170 | encoder.load_state_dict(torch.load('aae_encoder_men')) 171 | discriminator.load_state_dict(torch.load('aae_discriminator_men')) 172 | 173 | if cuda: 174 | encoder.cuda() 175 | decoder.cuda() 176 | discriminator.cuda() 177 | adversarial_loss.cuda() 178 | pixelwise_loss.cuda() 179 | 180 | dataloader = torch.utils.data.DataLoader(train, batch_size=opt.batch_size, shuffle=True) 181 | 182 | optimizer_G = torch.optim.Adam( 183 | itertools.chain(encoder.parameters(), 184 | decoder.parameters()), lr=opt.lr, betas=(opt.b1, opt.b2)) 185 | 186 | optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2)) 187 | 188 | Tensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor 189 | 190 | 191 | def sample_image(n_row, batches_done, directory): 192 | """Saves a grid of generated digits""" 193 | # Sample noise 194 | z = Variable(Tensor(np.random.normal(0, 1, (n_row ** 2, opt.latent_dim)))) 195 | gen_imgs = decoder(z) 196 | save_image(gen_imgs.data, "%s/%d.png" % (directory, batches_done), nrow=n_row, normalize=True) 197 | 198 | 199 | if not os.path.isdir('generated_men_aae'): 200 | os.mkdir('generated_men_aae') 201 | 202 | for epoch in range(1, opt.n_epochs + 1): 203 | 204 | break # model already trained 205 | for i, (imgs, _) in enumerate(dataloader): 206 | 207 | # Adversarial ground truths 208 | valid = Variable(Tensor(imgs.shape[0], 1).fill_(1.0), requires_grad=False) 209 | fake = Variable(Tensor(imgs.shape[0], 1).fill_(0.0), requires_grad=False) 210 | 211 | # Configure input 212 | real_imgs = Variable(imgs.type(Tensor)) 213 | 214 | # ----------------- 215 | # Train Generator 216 | # ----------------- 217 | 218 | optimizer_G.zero_grad() 219 | 220 | encoded_imgs = encoder(real_imgs) 221 | decoded_imgs = decoder(encoded_imgs) 222 | 223 | # Loss measures generator's ability to fool the discriminator 224 | g_loss = 0.001 * adversarial_loss(discriminator(encoded_imgs), valid) + 0.999 * pixelwise_loss( 225 | decoded_imgs, real_imgs 226 | ) 227 | 228 | g_loss.backward() 229 | optimizer_G.step() 230 | 231 | # --------------------- 232 | # Train Discriminator 233 | # --------------------- 234 | 235 | optimizer_D.zero_grad() 236 | 237 
| # Sample noise as discriminator ground truth 238 | z = Variable(Tensor(np.random.normal(0, 1, (imgs.shape[0], opt.latent_dim)))) 239 | 240 | # Measure discriminator's ability to classify real from generated samples 241 | real_loss = adversarial_loss(discriminator(z), valid) 242 | fake_loss = adversarial_loss(discriminator(encoded_imgs.detach()), fake) 243 | d_loss = 0.5 * (real_loss + fake_loss) 244 | 245 | d_loss.backward() 246 | optimizer_D.step() 247 | 248 | batches_done = epoch * len(dataloader) + i 249 | 250 | if epoch >= 25 and epoch % 10 == 0: 251 | val = input("\nContinue training? [y/n]: ") 252 | print() 253 | if val in ('y', 'yes'): 254 | val = True 255 | pass 256 | elif val in ('n', 'no'): 257 | break 258 | else: 259 | pass 260 | 261 | if epoch > 10: 262 | if batches_done % opt.sample_interval == 0: 263 | sample_image(n_row=5, batches_done=batches_done, directory='generated_men_aae') 264 | 265 | if epoch % 5 == 0: 266 | print( 267 | "[Epoch %d/%d] [D loss: %f] [G loss: %f]" 268 | % (epoch, opt.n_epochs, d_loss.item(), g_loss.item()) 269 | ) 270 | 271 | torch.save(decoder.state_dict(), 'aae_decoder_men') 272 | torch.save(encoder.state_dict(), 'aae_encoder_men') 273 | torch.save(discriminator.state_dict(), 'aae_discriminator_men') 274 | 275 | images = 0 276 | stop = False 277 | for epoch in range(1, 4 + 1): 278 | for i, (imgs, _) in enumerate(dataloader): 279 | 280 | with torch.no_grad(): 281 | 282 | # Adversarial ground truths 283 | valid = Variable(Tensor(imgs.shape[0], 1).fill_(1.0), requires_grad=False) 284 | fake = Variable(Tensor(imgs.shape[0], 1).fill_(0.0), requires_grad=False) 285 | 286 | # Configure input 287 | real_imgs = Variable(imgs.type(Tensor)) 288 | 289 | batches_done = epoch * len(dataloader) + i 290 | sample_image(directory='generated_men_aae', n_row=5, batches_done=batches_done) 291 | images += 25 292 | 293 | if len(os.listdir(os.path.join(os.getcwd(), 'generated_men_aae'))) >= 1000: 294 | stop = True 295 | break 296 | 297 | if stop: 298 | break 299 | 300 | -------------------------------------------------------------------------------- /generating-male-faces-with-dcgan: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | from glob import glob 4 | from PIL import Image 5 | from time import time 6 | import os 7 | import pandas as pd 8 | import argparse 9 | import math 10 | import re 11 | import itertools 12 | import torchvision.transforms as transforms 13 | from torchvision.utils import save_image 14 | from torch.utils.data import DataLoader 15 | from torchvision import datasets 16 | from torch.autograd import Variable 17 | import torch.nn as nn 18 | import torch.nn.functional as F 19 | import torch 20 | os.chdir('C:/Users/Nicolas/Documents/Data/Faces') 21 | 22 | 23 | def sorted_alphanumeric(data): 24 | convert = lambda text: int(text) if text.isdigit() else text.lower() 25 | alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 26 | return sorted(data, key=alphanum_key) 27 | 28 | files = sorted_alphanumeric(glob('combined/*.jpg')) 29 | 30 | 31 | faces = [i for i in files if (i[-34] == '0') and len(i[-37:-35].strip('\\').strip('d')) == 2] 32 | y = [i[-34] for i in files if (i[-34] == '0') and len(i[-37:-35].strip('\\').strip('d')) > 1] 33 | 34 | dim = 60 35 | 36 | 37 | def crop(img): 38 | if img.shape[0]= 10 and epoch % 5 == 0: 259 | val = input("\nContinue training? 
[y/n]: ") 260 | print() 261 | if val in ('y', 'yes'): 262 | val = True 263 | pass 264 | elif val in ('n', 'no'): 265 | break 266 | else: 267 | pass 268 | 269 | if batches_done % opt.sample_interval == 0: 270 | save_image(gen_imgs.data[:25], "generated_men_dcgan/%d.png" % batches_done, nrow=5, normalize=True) 271 | 272 | if epoch % 5 == 0: 273 | print( 274 | "[Epoch %d/%d] [D loss: %f] [G loss: %f]" 275 | % (epoch, opt.n_epochs, d_loss.item(), g_loss.item()) 276 | ) 277 | 278 | torch.save(generator.state_dict(), 'dcgan_generator_men') 279 | torch.save(discriminator.state_dict(), 'dcgan_discriminator_men') 280 | 281 | 282 | def sample_image(n_row, batches_done): 283 | z = Variable(Tensor(np.random.normal(0, 1, (n_row ** 2, opt.latent_dim)))) 284 | gen_imgs = generator(z) 285 | save_image(gen_imgs.data, "generated_men_dcgan/%d.png" % batches_done, nrow=n_row, normalize=True) 286 | 287 | 288 | images = 0 289 | stop =False 290 | for epoch in range(1, 2_50 + 1): 291 | for i, (imgs, _) in enumerate(dataloader, 1): 292 | 293 | with torch.no_grad(): 294 | 295 | # Adversarial ground truths 296 | valid = Variable(Tensor(imgs.shape[0], 1).fill_(1.0), requires_grad=False) 297 | fake = Variable(Tensor(imgs.shape[0], 1).fill_(0.0), requires_grad=False) 298 | 299 | # Configure input 300 | real_imgs = Variable(imgs.type(Tensor)) 301 | 302 | batches_done = epoch * len(dataloader) + i 303 | sample_image(n_row=5, batches_done=batches_done) 304 | images += 25 305 | 306 | if len(os.listdir(os.path.join(os.getcwd(), 'generated_men_dcgan'))) >= 1_000: 307 | print('\n25,000 images successfully generated.') 308 | stop = True 309 | break 310 | if stop: 311 | break 312 | 313 | if images % 5_000 == 0: 314 | print(f'Pictures created: {images:,}') 315 | 316 | -------------------------------------------------------------------------------- /the-car-connection-image-scraper.py: -------------------------------------------------------------------------------- 1 | from selenium import webdriver 2 | import bs4 as bs 3 | from urllib.request import Request, urlopen 4 | import pandas as pd 5 | import time 6 | import os 7 | import requests 8 | from IPython import embed 9 | 10 | # os.chdir('/data') 11 | 12 | website = 'https://www.thecarconnection.com' 13 | 14 | 15 | def fetch(page, addition=''): 16 | return bs.BeautifulSoup(urlopen(Request(page + addition, 17 | headers={'User-Agent': 'Opera/9.80 (X11; Linux i686; Ub'\ 18 | 'untu/14.10) Presto/2.12.388 Version/12.16'})).read(), 'lxml') 19 | 20 | def all_makes(): 21 | # Fetches all makes (acura, cadilac, etc) 22 | all_makes_list = [] 23 | for a in fetch(website, "/new-cars").find_all("a", {"class": "add-zip"}): 24 | all_makes_list.append(a['href']) 25 | print(all_makes_list[:10]) 26 | print("All makes fetched") 27 | return all_makes_list 28 | 29 | 30 | def make_menu(listed): 31 | # Fetches all makes + model ? 
(acura_mdx, audi_q3, etc) 32 | make_menu_list = [] 33 | for make in listed: # REMOVE REMOVE REMOVE REMOVE REMOVE REMOVE # 34 | for div in fetch(website, make).find_all("div", {"class": "name"}): 35 | make_menu_list.append(div.find_all("a")[0]['href']) 36 | print(make_menu_list[:10]) 37 | print("Make menu list fetched") 38 | return make_menu_list 39 | 40 | 41 | def model_menu(listed): 42 | # Add year to previous step 43 | model_menu_list = [] 44 | for make in listed: 45 | soup = fetch(website, make) 46 | for div in soup.find_all("a", {"class": "btn avail-now first-item"}): 47 | model_menu_list.append(div['href']) 48 | for div in soup.find_all("a", {"class": "btn 1"})[:8]: 49 | model_menu_list.append(div['href']) 50 | print(model_menu_list[:10]) 51 | print("Model menu list fetched") 52 | return model_menu_list 53 | 54 | 55 | def year_model_overview(listed): 56 | year_model_overview_list = [] 57 | for make in listed: # REMOVE REMOVE REMOVE REMOVE REMOVE REMOVE REMOVE REMOVE 58 | for id in fetch(website, make).find_all("a", {"id": "ymm-nav-specs-btn"}): 59 | year_model_overview_list.append(id['href']) 60 | try: 61 | year_model_overview_list.remove("/specifications/buick_enclave_2019_fwd-4dr-preferred") 62 | except: 63 | pass 64 | print(year_model_overview_list[:10]) 65 | print("Year model overview list fetched") 66 | return year_model_overview_list 67 | 68 | 69 | def trims(listed): 70 | trim_list = [] 71 | for row in listed: 72 | div = fetch(website, row).find_all("div", {"class": "block-inner"})[-1] 73 | div_a = div.find_all("a") 74 | for i in range(len(div_a)): 75 | trim_list.append(div_a[-i]['href']) 76 | print(trim_list[:10]) 77 | print("Trims list fetched") 78 | return trim_list 79 | 80 | 81 | def timer(start, end, iters, iters_left): 82 | hours, rem = divmod(end-start, 3600) 83 | minutes, seconds = divmod(rem, 60) 84 | 85 | hours_per_iter, rem_per_iter = divmod((end-start)/(iters+1),3600) 86 | minutes_per_iter, seconds_per_iter = divmod(rem_per_iter,60) 87 | 88 | hours_left , rem_left = divmod(((end-start)/(iters+1))*iters_left,3600) 89 | minutes_left, seconds_left = divmod(rem_left,60) 90 | print(" Total elapsed: {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds)) 91 | print(" Time per page: {:0>2}:{:0>2}:{:05.2f}".format(int(hours_per_iter),int(minutes_per_iter),seconds_per_iter)) 92 | print(" Time left : {:0>2}:{:0>2}:{:05.2f}".format(int(hours_left),int(minutes_left),seconds_left)) 93 | 94 | 95 | def saveImage(imgUrl, imgName, group): 96 | imgData = requests.get(imgUrl).content 97 | with open('data/pictures/'+group+'/'+imgName+'.jpg','wb') as handler: 98 | handler.write(imgData) 99 | 100 | 101 | def create_file_name(row): 102 | ''' 103 | Takes all columns not named pictures and delimits it with -- 104 | Also replaces spaces in individual columns with _ 105 | ''' 106 | return row[0].strip().replace(' ','_').replace('/','.') 107 | 108 | 109 | def specifications(website, trims, keep_all_images=True): 110 | ''' keep_all_images: True means we create 2 files, one for main (front/read) 111 | And one for all of the pictures. 
112 | ''' 113 | options = webdriver.FirefoxOptions() 114 | options.add_argument('-headless') 115 | driver = webdriver.Firefox(options=options) 116 | # driver = webdriver.Firefox() 117 | 118 | # Timer start 119 | start = time.time() 120 | 121 | if not os.path.isfile('data/pictures_all.csv'): 122 | # Table for all images 123 | specifications_table_all = pd.DataFrame() 124 | # Table for only front and rear images 125 | specifications_table_front_rear = pd.DataFrame() 126 | else: 127 | specifications_table_all = pd.read_csv('data/pictures_all.csv',index_col=0) 128 | specifications_table_front_rear = pd.read_csv('data/pictures_rear_front.csv',index_col=0) 129 | 130 | trims_left = len(trims.index) 131 | if trims_left == 0: 132 | return 0 133 | for inx, webpage in enumerate(trims.iloc[:, 0]): 134 | soup = fetch(website, webpage.replace('overview', 'specifications')) 135 | # Same splitting as above 136 | specifications_df_all = pd.DataFrame(columns=[soup.find_all("title")[0].text[:-15]]) 137 | specifications_df_front_rear = pd.DataFrame(columns=[soup.find_all("title")[0].text[:-15]]) 138 | for div in soup.find_all("div", {"class": "specs-set-item"})[:9]: 139 | row_name = div.find_all("span")[0].text 140 | row_value = div.find_all("span")[1].text 141 | specifications_df_all.loc[row_name] = row_value 142 | specifications_df_front_rear.loc[row_name] = row_value 143 | 144 | try: 145 | driver.get(website + webpage.replace('overview', 'photos')) 146 | time.sleep(0.5) 147 | ext_btn = driver.find_element_by_class_name('view-mode.show-ext') 148 | if ext_btn.text == 'Exterior': 149 | ext_btn.click() 150 | time.sleep(0.5) 151 | class_img_ext = driver.find_elements_by_xpath("//div[@class='thumbs-wrapper']/div[starts-with(@class, 'thumbs-slide') and not(contains(@class, 'video'))]/img") 152 | list_urls = [x.get_attribute("src").replace('/tmb/','/sml/').replace('_t.gif','_s.jpg') for x in class_img_ext] 153 | except: 154 | list_urls = [] 155 | print(f'Problem with {website + webpage}') 156 | 157 | # Different layout for older images 158 | # if len(class_img) == 0: 159 | # try: 160 | # driver.get(website + webpage.replace('overview', 'photos')) 161 | # class_img = driver.find_elements_by_class_name('image') 162 | # list_urls = [] 163 | # for ii in class_img: 164 | # list_urls.append(ii.get_attribute('data-image-small')) 165 | # except: 166 | # print(f'Problem with {website + webpage}') 167 | 168 | 169 | 170 | # Keep a count of rear and front images to put them at start of index 171 | rear_front_img_count = 0 172 | for ix, img_url in enumerate(list_urls): # REMOVE REMOVE REMOVE 173 | specifications_df_all.loc['Picture_%i' % ix, :] = img_url 174 | if keep_all_images and 'pkg-rear-exterior-view' in img_url: 175 | specifications_df_front_rear.loc['Picture_%i' % rear_front_img_count, :] = img_url 176 | rear_front_img_count += 1 177 | 178 | # If no images, we don't add to the main df 179 | if len(list_urls) > 0: 180 | specifications_table_all = pd.concat([specifications_table_all, specifications_df_all], axis=1, sort=False) 181 | if rear_front_img_count > 0: 182 | specifications_table_front_rear = pd.concat([specifications_table_front_rear, specifications_df_front_rear], axis=1, sort=False) 183 | else: 184 | print(website + webpage.replace('specifications', 'overview')) 185 | 186 | # Save content every 10 images 187 | if inx % 10 == 0: 188 | print("%d/%d completed."%(inx, trims_left)) 189 | specifications_table_all.to_csv('data/pictures_all.csv') 190 | 
specifications_table_front_rear.to_csv('data/pictures_rear_front.csv') 191 | trims.iloc[inx:].to_csv('data/trims_octobre_2019.csv', header=None) 192 | timer(start,time.time(), inx, trims_left-inx) 193 | 194 | 195 | # At the end of loop 196 | specifications_table_all.to_csv('data/pictures_all.csv') 197 | specifications_table_front_rear.to_csv('data/pictures_rear_front.csv') 198 | specifications_table_all.to_csv('data/img_left_octobre_2019.csv') 199 | specifications_table_front_rear.to_csv('data/img_left_frontrear_octobre_2019.csv') 200 | 201 | fetch_urls = False 202 | download_images = True 203 | 204 | if __name__ == '__main__': 205 | if fetch_urls: 206 | # If list of trims has not been fetched 207 | if not os.path.isfile('data/trims_octobre_2019.csv'): 208 | a = all_makes() 209 | b = make_menu(a) 210 | c = model_menu(b) 211 | d = year_model_overview(c) 212 | e = trims(d) 213 | f = pd.DataFrame(e).to_csv('data/trims_octobre_2019.csv', header=None) 214 | # Previous one will be modified 215 | f = pd.DataFrame(e).to_csv('data/trims_octobre_2019_keep.csv', header=None) 216 | 217 | # Read list of trims 218 | g = pd.read_csv('data/trims_octobre_2019.csv',index_col=0, header=None) 219 | g.drop_duplicates(inplace=True) 220 | h = specifications(website, g) 221 | 222 | if download_images: 223 | i_all = pd.read_csv('data/img_left_octobre_2019.csv',index_col=0) 224 | i_front_rear = pd.read_csv('data/img_left_frontrear_octobre_2019.csv',index_col=0) 225 | 226 | if 'imgName' not in i_all.columns: 227 | i_all = i_all.transpose().reset_index() 228 | i_all['imgName'] = i_all.apply(create_file_name, axis=1) 229 | if 'imgName' not in i_front_rear.columns: 230 | i_front_rear = i_front_rear.transpose().reset_index() 231 | i_front_rear['imgName'] = i_front_rear.apply(create_file_name, axis=1) 232 | 233 | start = time.time() 234 | 235 | for ind, row in i_all.iterrows(): 236 | if ind % 10 == 0: 237 | timer(start, time.time(), ind, len(i_all.index)) 238 | print('%i/%i image pages for all angles completed.' %(ind,len(i_all.index))) 239 | i_all.iloc[ind:].to_csv('data/img_left_octobre_2019.csv') 240 | img_urls = [x for inx, x in row.iteritems() if 'Picture_' in inx and str(x) != 'nan'] 241 | pic_name = row['imgName'] 242 | for ix, url in enumerate(img_urls): 243 | saveImage(url, pic_name+'_'+str(ix), 'all_images') 244 | 245 | start = time.time() 246 | for ind, row_front in i_front_rear.iterrows(): 247 | if ind % 10 == 0: 248 | timer(start, time.time(), ind, len(i_front_rear.index)) 249 | print('%i/%i image pages for front/rear completed.' 
%(ind,len(i_front_rear.index))) 250 | i_front_rear.iloc[ind:].to_csv('data/img_left_frontrear_octobre_2019.csv') 251 | img_urls = [x for inx, x in row_front.iteritems() if 'Picture_' in inx and str(x) != 'nan'] 252 | pic_name = row_front['imgName'] 253 | for ix, url in enumerate(img_urls): 254 | saveImage(url, pic_name+'_'+str(ix), 'front_rear') 255 | 256 | 257 | -------------------------------------------------------------------------------- /generating-male-faces-with-vae: -------------------------------------------------------------------------------- 1 | 2 | 3 | import tensorflow as tf 4 | from tensorflow import keras 5 | import numpy as np 6 | import pandas as pd 7 | import re 8 | import matplotlib.pyplot as plt 9 | # fashion_mnist = keras.datasets.fashion_mnist 10 | from matplotlib.markers import MarkerStyle 11 | from keras import backend as K 12 | from keras.optimizers import Adam 13 | from keras.datasets import mnist 14 | from keras.layers import Lambda, Input, Dense 15 | from keras.losses import binary_crossentropy 16 | from keras.models import Model 17 | from keras.callbacks import EarlyStopping, ModelCheckpoint 18 | from glob import glob 19 | from PIL import Image 20 | from time import time 21 | from sklearn.model_selection import train_test_split 22 | import os 23 | import imageio 24 | from IPython.display import Image as Img 25 | os.chdir('c:/users/nicolas/documents/data/faces') 26 | 27 | 28 | # ##### Function to sort images 29 | 30 | # In[2]: 31 | 32 | 33 | def sorted_alphanumeric(data): 34 | convert = lambda text: int(text) if text.isdigit() else text.lower() 35 | alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 36 | return sorted(data, key=alphanum_key) 37 | 38 | 39 | # ##### Loading all file names 40 | 41 | # In[3]: 42 | 43 | 44 | files = sorted_alphanumeric(glob(r'C:\Users\Nicolas\Documents\Data\faces\combined/*.jpg')) 45 | 46 | 47 | 48 | np.unique([i[-34] for i in files], return_counts=True) 49 | 50 | 51 | # ##### Keeping only men/women (not both) 52 | 53 | # In[6]: 54 | 55 | 56 | faces = [i for i in files if (i[-34] == '0') and len(i[-37:-35].strip('\\').strip('d')) == 2 ] # or in ('0', ''1'') 57 | 58 | 59 | # In[7]: 60 | 61 | 62 | y = [i[-34] for i in files if (i[-34] == '0') and len(i[-37:-35].strip('\\').strip('d')) > 1 ] 63 | 64 | 65 | assert len(y) == len(faces), 'The X and Y are not of the same length!' 66 | 67 | 68 | dim = 60 69 | 70 | 71 | def crop(img): 72 | if img.shape[0]
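# The body of crop() is cut off at this point in the listing, and the same helper
# is truncated in the Keras benchmark and DCGAN files above. Judging from its name
# and from the 60x60 resize used in those loading loops, it presumably trims the
# image to a centre square before resizing; the reconstruction below is an
# assumption, not the original code.
def crop(img):
    height, width = img.shape[0], img.shape[1]
    side = min(height, width)  # side length of the largest centred square
    top = (height - side) // 2
    left = (width - side) // 2
    return img[top:top + side, left:left + side]
# Example: a 60x80 array becomes 60x60, so the subsequent resize((dim, dim)) call
# no longer distorts the aspect ratio.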