├── AI workshop.pptx ├── India Police Hackathon-Deep_Learners-Face_Recognition.pptx ├── TFUG Mysuru-GDB Bangalore-Co learning lounge-Applied Singularity Deep Learning in Videos.pdf ├── README.md ├── Tensorflow World Extended Bangalore.md ├── Applied Singularity - Deep dive into Cyclegan.md ├── Mantissa Data Science Meetups -Introduction to Deep Learning and Computer Vision.md ├── Bangalore Deep Learning Club - An Introduction to Generative Deep Learning.md └── Tf_lite.ipynb /AI workshop.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Anil-matcha/Speaking-Engagements/master/AI workshop.pptx -------------------------------------------------------------------------------- /India Police Hackathon-Deep_Learners-Face_Recognition.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Anil-matcha/Speaking-Engagements/master/India Police Hackathon-Deep_Learners-Face_Recognition.pptx -------------------------------------------------------------------------------- /TFUG Mysuru-GDB Bangalore-Co learning lounge-Applied Singularity Deep Learning in Videos.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Anil-matcha/Speaking-Engagements/master/TFUG Mysuru-GDB Bangalore-Co learning lounge-Applied Singularity Deep Learning in Videos.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Speaking-Engagements 2 | 3 | Content for various speaking engagements I have been part of will be posted here 4 | 5 | - [Bangalore Deep Learning Club](https://www.meetup.com/Bangalore-Deep-Learning-Club/events/265446520/) 6 | 7 | - [India Police Hackathon](https://ksp.gov.in/hackathon/) 8 | 9 | - [Tensorflow World Extended Bangalore](https://www.eventbrite.com/e/tensorflow-world-extended-bangalore-tickets-82022901707) 10 | 11 | - [Applied Singularity](https://www.meetup.com/AppliedSingularity/events/266996091/) 12 | 13 | - [ODSC Bangalore](https://www.meetup.com/Bengaluru-Data-Science-ODSC/events/266940892/) 14 | 15 | - [Mantissa Data Science Meetups](http://mantissadatascience.mystrikingly.com/) 16 | 17 | - [TFUG Mysuru-GDB Bangalore-Co learning lounge-Applied Singularity](https://www.meetup.com/en-AU/TFUG-Mysuru/events/270433421/) ([Video link](https://www.youtube.com/watch?v=A7KlDeAn8Iw)) 18 | 19 | - [Intro to NLP](https://www.boardinfinity.com/webinars/introduction-to-natural-language-processing) 20 | 21 | - [Intro to Deep Learning](https://www.boardinfinity.com/webinars/introduction-to-deep-learning) 22 | 23 | - [A comprehensive review of Super Resolution - Kaggle Days Meetup Surat](https://www.linkedin.com/company/e-meetups/) ([Video Link](https://www.youtube.com/watch?v=GrffeA85fcc)) 24 | 25 | - [A career in AI and ML, talks with coach by Board Infinity](https://www.instagram.com/p/CGVW6BrFutw/) 26 | -------------------------------------------------------------------------------- /Tensorflow World Extended Bangalore.md: -------------------------------------------------------------------------------- 1 | # TFLITE 2 | 3 | Code available here [TFLite code](https://github.com/Anil-matcha/Speaking-Engagements/blob/master/Tf_lite.ipynb) 4 | 5 | ![](https://miro.medium.com/max/8122/1*jM9g4p6g8k3FYquSG5l1gw.jpeg) 6 | 7 | Set of tools to optimize models for mobile 8 | 9 | # About me 10 | 11 | 
Senior computer vision engineer at Samsung R&D, building optimized models that bring intelligence to mobile devices 12 | 13 | ![](https://www.analyticsinsight.net/wp-content/uploads/2019/07/Computer-Vision-Future-1024x682.jpg) 14 | 15 | Reach me on LinkedIn: https://in.linkedin.com/in/anilmatcha 16 | 17 | Apart from this, I write blog posts 18 | 19 | I run a Telegram group for computer vision professionals: http://t.me/ComputerVisiongroup 20 | 21 | TFLite has 2 components 22 | 23 | 1. TFLite Interpreter :- Capable of running optimized models on different hardware such as mobile devices, Raspberry Pi etc. 24 | 2. TFLite Converter :- Converts a trained TensorFlow model into an optimized format that runs faster 25 | 26 | 27 | 28 | # Need for TFLite :- 29 | 30 | 1. Speed :- smaller, optimized models run faster on-device 31 | 2. Privacy :- data never has to leave the device 32 | 3. Internet :- inference works offline, with no network round trip 33 | 4. Battery :- less compute and no network transmission saves power 34 | 35 | ![](https://miro.medium.com/max/1548/1*MtXrCASxGrQtX2PPmhJcAw.png) 36 | 37 | # Steps to run a model on mobile 38 | 39 | 1. Train a model in TensorFlow 40 | 2. Convert the model to the TFLite format using the converter 41 | 3. Add the TFLite model to the device and run inference 42 | 43 | ![](https://miro.medium.com/max/2516/0*Bt9qwKDjd1xi5RDd.) 44 | 45 | # Utilities :- 46 | 47 | TFLite supports a subset of TensorFlow operations, optimized for embedded devices. 48 | 49 | More ops can be added using TensorFlow Select 50 | 51 | You can select the operations you need and build a TFLite binary that includes them, at the cost of a heavier binary. You can also write a custom operation in C++ if needed and use it. 52 | 53 | 54 | 55 | # Training a classifier 56 | 57 | ![](https://miro.medium.com/max/2560/1*2oSWoC8Y3s25F87kfmEIcQ.jpeg) 58 | 59 | 1. Create a dataset 60 | 2. Train the final layers 61 | 3. Fine-tune the last layers 62 | 63 | ![](https://miro.medium.com/max/1920/1*Ww3AMxZeoiB84GVSRBr4Bw.png) 64 | 65 | 4. 
Fine-tune more layers 66 | 67 | # Steps for android app 68 | 69 | The models and the labels file needs to be added in assets folder 70 | 71 | ``` 72 | android { 73 | aaptOptions { 74 | noCompress "tflite" 75 | } 76 | } 77 | ``` 78 | 79 | In build.gradle this needs to be added to say not to compress tflite files to android 80 | 81 | ![](https://miro.medium.com/max/13340/1*32DnkPuY-yP1hgDQR8Ke5g.png) 82 | 83 | ``` 84 | tflite = new Interpreter(tfliteModel, tfliteOptions); 85 | ``` 86 | 87 | This line instantiates the interpreter for android 88 | 89 | ``` 90 | imgData = ByteBuffer.allocateDirect( 91 | DIM_BATCH_SIZE * 92 | getImageSizeX() * 93 | getImageSizeY() * 94 | DIM_PIXEL_SIZE * 95 | getNumBytesPerChannel() 96 | ); 97 | ``` 98 | 99 | The image is provided like this 100 | 101 | ``` 102 | imgData.load(bitmap); 103 | ImageProcessor imageProcessor = 104 | new ImageProcessor.Builder() 105 | .add(new ResizeWithCropOrPadOp(cropSize, cropSize)) 106 | .add(new ResizeOp(imageSizeX, imageSizeY, ResizeMethod.BILINEAR)) 107 | .add(new Rot90Op(numRoration)) 108 | .add(getPreprocessNormalizeOp()) 109 | .build(); 110 | imageProcessor.process(inputImageBuffer) 111 | ``` 112 | 113 | To run tflite inference of an image 114 | 115 | ``` 116 | tflite.run(inputImageBuffer.getBuffer(), outputProbabilityBuffer.getBuffer()); 117 | ``` 118 | 119 | # Quantization 120 | 121 | ![](https://miro.medium.com/max/1350/0*Eqk0bsuRgzVf0Fyu) 122 | 123 | # Pruning 124 | 125 | ![](https://miro.medium.com/max/1532/0*iNI8Oc80Eunm8NgI) 126 | 127 | https://www.tensorflow.org/model_optimization 128 | 129 | #### Quantization aware training -------------------------------------------------------------------------------- /Applied Singularity - Deep dive into Cyclegan.md: -------------------------------------------------------------------------------- 1 | # Deep dive into Cyclegan 2 | 3 | Computer vision engineer at Samsung R&D buliding optimized models for providing intelligence to mobile devices 4 | 5 | **Usecases** :- Gallery object search, human object interaction, live segmentation, scene recognition 6 | 7 | ![img](https://camo.githubusercontent.com/9389472d239c0d04dca439ab87294942f4165f3e/68747470733a2f2f7777772e616e616c7974696373696e73696768742e6e65742f77702d636f6e74656e742f75706c6f6164732f323031392f30372f436f6d70757465722d566973696f6e2d4675747572652d31303234783638322e6a7067) 8 | 9 | Reach me out at Linkedin https://in.linkedin.com/in/anilmatcha 10 | 11 | I run a telegram group for computer vision professionals http://t.me/ComputerVisiongroup 12 | 13 | # What is GAN 14 | 15 | # Generative Algorithms 16 | 17 | - We all are familiar with task such as classification 18 | - Neural networks are pretty good at doing calculations, making comparisons 19 | - Now they are able to imagine things which is considered only capable by humans 20 | - Humans are pretty good at imagining just by closing eyes 21 | - A generative algorithm trained on learning how an object say horse looks like can generate new horse images which look like a horse in real 22 | 23 | GAN is one popular generative algorithm 24 | 25 | ![]() 26 | 27 | # Applications 28 | 29 | Few applications 30 | 31 | Generating art for cartoons or video games 32 | 33 | Used for augmenting input data 34 | 35 | Image-to-image translation 36 | 37 | ![]() 38 | 39 | Text to Image conversion 40 | 41 | ![]() 42 | 43 | Super resolution 44 | 45 | ![]() 46 | 47 | Faceapp like image editing 48 | 49 | ![]() 50 | 51 | Music generation 52 | 53 | Deepfakes 54 | 55 | More at 
https://github.com/nashory/gans-awesome-applications 56 | 57 | # GAN working 58 | 59 | - Consists of 2 networks instead of one network 60 | - Generator task is to generate images 61 | - Discriminator looks at an image and says if it's fake or real 62 | - Both the networks learn simultaneously by playing a game of overpowering each other 63 | - Training stops when discriminator gets confused completely 64 | - Many popular architectures like DCGAN, StyleGAN, CGAN, BigGAN etc. 65 | 66 | Let's discuss **CycleGAN** 67 | 68 | - CycleGAN is used in domain transfer usecases like zebra-horse, day-night, photo-painting 69 | 70 | ![]() 71 | 72 | - Basically an Image-to-Image translation problem 73 | - Pix2Pix does this by having paired images from domain 1 to domain 2 74 | - Cyclegan works without paired images with a clever trick 75 | 76 | ![]() 77 | 78 | - Instead of a single generator-discriminator we now have a pair of generator discriminator 79 | - First generator translates the image to a new domain 80 | - Second generator brings the image to the previous domain 81 | - The task of both the discriminators is to validate the generated images 82 | 83 | 84 | 85 | # Loss Function 86 | 87 | **Discriminator Loss** 88 | 89 | - General discriminator works like a classification network 90 | - Classifies the input as real or fake class basically a 2 class network 91 | - Cross-entropy loss can be used directly 92 | - CycleGAN uses a PatchGAN discriminator taken from pix2pix 93 | - Instead of a single output it produces a grid NxN of outputs 94 | - Each output in the grid corresponds to a patch of certain size in the input image 95 | - Now instead of classifying entire image we are classifying patches of images 96 | - Produced better and sharper features while doing research in pix2pix 97 | - Take loss for each cell and sum up or use mse 98 | 99 | **Generator Loss** 100 | 101 | - Adversial loss, Cyclic Loss, Identity loss 102 | 103 | - **Adverisal Loss** :- 104 | 105 | Opposite of Discriminator loss 106 | 107 | - **Cyclic Loss** :- 108 | 109 | Image translated to a new domain and translated back should be similar. 110 | 111 | L1 loss is applied. 112 | 113 | What this does is it not only translates the input image to a new domain 114 | 115 | But also keeps few relevant features of original domain 116 | 117 | So a horse converted to zebra looks like original horse 118 | 119 | - **Identity Loss** :- 120 | 121 | Image of opposite domain should not be altered 122 | 123 | Mainly used for Photo generation from paintings to preserve color composition between the input and output 124 | 125 | ​ ![]() 126 | 127 | 128 | 129 | # Face Changing 130 | 131 | **UTKFace Dataset** 132 | 133 | ![]() 134 | 135 | age_gender_race_date&time.jpg 136 | 137 | Collect all 20-30 age images into folderA 138 | 139 | Collect all 50-60 age images into folderB 140 | 141 | Train CycleGAN and do domain transfer 142 | 143 | Discriminator produces a patch of input_size/4 144 | 145 | Renset-style generator 146 | 147 | **Code Walkthrough** 148 | 149 | https://tinyurl.com/cyclegan 150 | 151 | # Tips to stabilize training 152 | 153 | 1. Use progressive resizing introduced by Jeremy Howard. Training with 256x256 would constrain you to smaller batch size like 1. Start with say 64x64(allows batch size 32) and progressively increase using checkpoints. Same like training a classification network 154 | 2. Look at loss values closely. See which one is lacking by looking at the images and change based on that. Reconstruction is bad -> Boost cyclic loss. 
Generated images are bad -> Boost adversarial loss. Color lost in generated images -> Boost identity loss 155 | 156 | 157 | 158 | # Debugging 159 | 160 | In a classification network you check progress with a metric such as accuracy 161 | 162 | In a GAN the losses keep going up and down, so look at the generated images instead 163 | 164 | Generate images every few iterations and inspect the results 165 | 166 | If you feel something is missing, tweak the losses and repeat 167 | 168 | 169 | 170 | # Extension 171 | 172 | Make the generator do multiple tasks by using conditions 173 | 174 | This can be done in multiple ways 175 | 176 | Add an extra channel to the RGB input, e.g. a channel of all 0's for age conversion and all 1's for race conversion 177 | 178 | Have multiple discriminators, one to discriminate age and one to discriminate race 179 | 180 | Train both tasks together 181 | 182 | Alternatively, use a single discriminator by passing the conditional input to the discriminator as well, just as it is passed to the generator 183 | 184 | Instead of passing the condition as a channel, it can also be passed through an embedding layer and concatenated to the dense layer 185 | 186 | 187 | 188 | # Applying on real-world images 189 | 190 | Took a face recognition model to crop out the face, sent the crop to the generator, and placed the generated face back into the original image 191 | 192 | ![]() 193 | 194 | ![]() 195 | 196 | 197 | 198 | More resources for different kinds of GAN :- https://github.com/hindupuravinash/the-gan-zoo 199 | 200 | # Q & A 201 | 202 | 1. Can parameters be changed so as to affect only a certain type of object in the image? 203 | 204 | We saw that above, using a conditional GAN 205 | 206 | 2. How many iterations to run? 207 | 208 | No hard figure; monitor the generated images 209 | 210 | 3. Using GANs in our image recognition platform? 211 | 212 | Some startups are using them and seem to have decent results 213 | 214 | 4. Production usage of GANs? 215 | 216 | FaceApp and similar applications use GANs in production 217 | 218 | 5. Mode collapse and GAN metrics such as FID score? 219 | 220 | To deal with mode collapse, use WGAN 221 | 222 | Inception score and FID score are used to evaluate GAN performance 223 | 224 | Inception score uses an Inception network to score the quality and diversity of generated images. 225 | 226 | ![]() 227 | 228 | ![]() 229 | 230 | FID improves on IS by comparing the statistics of generated images with those of real data, both computed from Inception network features 231 | 232 | 6. GAN transfer learning? 233 | 234 | Not yet 235 | 236 | 7. Learning path? 
237 | 238 | Start with simple DCGAN MNIST dataset, then celebA dataset -------------------------------------------------------------------------------- /Mantissa Data Science Meetups -Introduction to Deep Learning and Computer Vision.md: -------------------------------------------------------------------------------- 1 | # Introduction to Deep Learning and Computer Vision 2 | 3 | Computer vision engineer at Samsung R&D buliding optimized models for providing intelligence to mobile devices 4 | 5 | **Usecases** :- Gallery object search, human object interaction, live segmentation, scene recognition 6 | 7 | [![img](https://camo.githubusercontent.com/9389472d239c0d04dca439ab87294942f4165f3e/68747470733a2f2f7777772e616e616c7974696373696e73696768742e6e65742f77702d636f6e74656e742f75706c6f6164732f323031392f30372f436f6d70757465722d566973696f6e2d4675747572652d31303234783638322e6a7067)](https://camo.githubusercontent.com/9389472d239c0d04dca439ab87294942f4165f3e/68747470733a2f2f7777772e616e616c7974696373696e73696768742e6e65742f77702d636f6e74656e742f75706c6f6164732f323031392f30372f436f6d70757465722d566973696f6e2d4675747572652d31303234783638322e6a7067) 8 | 9 | Reach me out at Linkedin https://in.linkedin.com/in/anilmatcha 10 | 11 | I run a telegram group for computer vision professionals http://t.me/ComputerVisiongroup 12 | 13 | # 14 | 15 | 16 | 17 | Learning paths today 18 | 19 | 1. Classical Machine Learning 20 | 2. Computer Vision 21 | 3. NLP 22 | 4. Reinforcement Learning 23 | 24 | 25 | 26 | ![](https://tkwsibf.edu.in/wp-content/uploads/2016/05/choice-confuse-1-800x285.png) 27 | 28 | 29 | 30 | - There is no prerequisite between Deep Learning and Machine Learning 31 | 32 | - You don't need to be good at maths to get started 33 | 34 | - Pick up a broad field and break into topics which you need to conquer 35 | 36 | - Be consistent and see improvements 37 | 38 | 39 | 40 | # Goal 41 | 42 | Set a goal for the new year and conquer it with little milestones 43 | 44 | Project your work outside and let people recognize you 45 | 46 | Collaborate with people and fast-track your progress 47 | 48 | 49 | 50 | I had a goal to be good at deep learning at the beginning of the year and was able to complete few of the things I had in mind 51 | 52 | 53 | 54 | # Deep Learning 55 | 56 | Deep Learning doesn't need machine learning 57 | 58 | You can start straight off 59 | 60 | It is not very math heavy, mostly based on intuition. Can learn math along the way 61 | 62 | Can choose to work on either NLP or CV 63 | 64 | I chose CV since it interested me more 65 | 66 | 67 | 68 | Deep Learning is the study of big neural networks and their properties 69 | 70 | Interesting thing about neural networks which beats other algorithms - They can capture higher order relations - No need to do feature engineering 71 | 72 | 73 | 74 | You might think neural network is just a single algorithm, but there is an ocean of research going in it 75 | 76 | We will start with all the concepts you need to know to get started in this field 77 | 78 | 79 | 80 | There are multiple kinds of neural networks designed for specific use-cases. Main ones being DNN for tabular data, CNN for visual or speech information, RNN LSTM and Transformers for NLP 81 | 82 | 83 | 84 | Today our main lead for the talk is CNN 85 | 86 | 87 | 88 | The basic concept of a CNN is to identify key features from an image and build on top of that layer-by-layer 89 | 90 | This is done with something called filters in computer vision. 
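As a concrete preview of what a single filter does (the mechanics of this operation, called convolution, are explained below), here is a minimal NumPy sketch. It is an illustration written for this post, not code from the talk, and the filter values are just one common example of an edge detector.

```
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; at every position take the
    # element-wise product with the underlying patch and sum it up
    # (no padding, stride 1).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A simple vertical-edge filter: it responds strongly wherever pixel
# values change sharply from left to right.
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

image = np.random.rand(28, 28)       # stand-in for a grayscale image
feature_map = convolve2d(image, edge_filter)
print(feature_map.shape)             # (26, 26)
```

The output, usually called a feature map, is large wherever the filter's pattern is present in the image. A CNN learns the values inside many such filters instead of using hand-crafted ones like the one above.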
91 | 92 | 93 | 94 | Image is nothing but a set of pixel values i.e a 3d matrix. We can perform matrix operations on it 95 | 96 | 97 | 98 | A filter is one which can identify a particular feature from a region of image. Like consider a 9. It consists of a straight line and a 0. 99 | 100 | 101 | 102 | Layer is a bunch of filters. Neural network is a bunch of filters. 103 | 104 | ![]() 105 | 106 | The filter works by operating on a region of image by an operation called convolution similar to element-wise production of matrices. Each filter goes across entire region of image and tries to find out a particular feature 107 | 108 | 109 | 110 | By performing the operation we divided the information into multiple sub-parts. We can have all conv layers but it's too much in computation. So we do pooling. 111 | 112 | 113 | 114 | Pooling is nothing but identify most important features from an image and passing it across. This is max pooling. We have a bunch of these kind of layers. 115 | 116 | 117 | 118 | After some time you have observed all the features, you need to make decisions out of it. That is done by using dense layers similar to a general DNN. Flatten the output and keep dense layers 119 | 120 | 121 | 122 | Finally you constrain the outputs to a certain number of classes using something called softmax layer. 123 | 124 | 125 | 126 | **Loss function** :-The way to tell a neural network whether it is going in right direction or wrong direction 127 | 128 | 129 | 130 | **Backpropagation and Gradient descent**:- The magic which stitches all these together. The math is complex but you can take it for granted. 131 | 132 | 133 | 134 | **Learning rate** :- Speed of learning information, needs to be tuned, a hyperparameter 135 | 136 | 137 | 138 | **Batch size** :- Number of images in an iteration 139 | 140 | 141 | 142 | **Epoch** :- When network has looked at all the images 143 | 144 | 145 | 146 | **Activation Functions** :- Add non-linearity to your network. Relu, sigmod, tanh. 147 | 148 | 149 | 150 | **Optimizers** :- Fast track the convergence of neural networks. Few of them being SGD, RMSProp, Adam, SGD with Momentum 151 | 152 | 153 | 154 | **Convolution** :- Like we have seen before. 3x3 and 1x1. There are multiple types of convolution. Depthwise convolution, Separable Convolution, Group Convolution, Transpose convolution, Dilated Convolution 155 | 156 | 157 | 158 | # Regularization in Deep Learning 159 | 160 | 161 | 162 | **Batch Normalization** :- Normalize input at every layer. Variants of it Layer Normalization, Weight Normalization, Instance Normalization 163 | 164 | **Dropout** :- Make network do more with less resources 165 | 166 | **Label smoothing** :- Real world data may be noisy. 
Don't be too confident 167 | 168 | **Weight regularization** :- Simple weights means better generalization 169 | 170 | **Data Augmentation** :- Increase your dataset with augmentation 171 | 172 | 173 | 174 | # Network architectures 175 | 176 | ![]() 177 | 178 | 179 | 180 | This is how deep learning exploded 181 | 182 | Network architectures are optimized neural networks built to handle image recognition tasks 183 | 184 | Important ones :- VGG, **Resnet**, Inception, Densenet 185 | 186 | 187 | 188 | Try using any network architecture than designing one 189 | 190 | 191 | 192 | # Datasets for classification 193 | 194 | MNIST 195 | 196 | ![](https://camo.githubusercontent.com/d440ac2eee1cb3ea33340a2c5f6f15a0878e9275/687474703a2f2f692e7974696d672e636f6d2f76692f3051493378675875422d512f687164656661756c742e6a7067) 197 | 198 | CIFAR10 199 | 200 | ![]() 201 | 202 | Tinyimagenet/CIFAR100 203 | 204 | ![]() 205 | 206 | 207 | 208 | Once done with classification try out a task like Style Transfer 209 | 210 | ![]() 211 | 212 | 213 | 214 | # Object Detection 215 | 216 | ![]() 217 | 218 | Multiple architectures like all variants RCNN, YOLO, SSD, Retinanet etc. 219 | 220 | RCNN variants come under two stage detection 221 | 222 | YOLO, SSD come under single stage detection 223 | 224 | Datasets :- COCO, Pascal VOC 225 | 226 | # Image segmentation 227 | 228 | Classify each and every pixel as belonging to a class or not 229 | 230 | FCN, Unet, Deep Lab V3, Mask RCNN 231 | 232 | ![]() 233 | 234 | Datasets :- Cityscape, Camvid, KITTI 235 | 236 | 237 | 238 | # Transfer Learning 239 | 240 | Transfer knowledge from one domain to another domain 241 | 242 | Use a network trained on tons of data as feature extractor 243 | 244 | Finetune for your usecase 245 | 246 | # Image captioning 247 | 248 | ![]() 249 | 250 | Datasets :- MS COCO, Flickr 8k 251 | 252 | Architectures :- CNN + LSTM, Attention models 253 | 254 | 255 | 256 | # Generative Adversial Network 257 | 258 | Build a basic GAN like celebrity face generation with DCGAN 259 | 260 | Try a complex GAN like Pix2Pix or CycleGAN then 261 | 262 | Architectures :- DCGAN, CycleGAN, Pix2Pix, CGAN, WGAN etc. 263 | 264 | - thispersondoesnotexist.com 265 | 266 | Datasets :- MNIST, CelebA, UTKFace 267 | 268 | ![]() 269 | 270 | 271 | 272 | # Face Detection and Recognition 273 | 274 | Architectures :- Siamese Networks Facenet 275 | 276 | Dataset :- Aligned Face Dataset, UMDFaces 277 | 278 | ![]() 279 | 280 | 281 | 282 | # Object Tracking 283 | 284 | ![]() 285 | 286 | Architectures :- Siamese networks, GOTURN 287 | 288 | Dataset :- MOT, Tracknet 289 | 290 | 291 | 292 | # Mobile optimized networks 293 | 294 | Architectures :- Mobilenet, Squeezenet, EfficientNet 295 | 296 | Tools :- TFLite 297 | 298 | # Action Recognition 299 | 300 | ![]() 301 | 302 | Architectures :- CNN+LSTM, Two stream neural networks, 3D CNN 303 | 304 | Datasets :- UCF101, Kinetics-600 305 | 306 | 307 | 308 | # Pose Estimation 309 | 310 | ![]() 311 | 312 | Architectures :- Deep Pose, Convolutional Pose Machines 313 | 314 | Datasets :- LSP, FLIC 315 | 316 | 317 | 318 | # Human Object Interaction 319 | 320 | ![]() 321 | 322 | Architectures :- TIN, ICAN 323 | 324 | Datasets :- HICO 325 | 326 | 327 | 328 | # 3D computer vision 329 | 330 | Until now we have been looking at 2d data. But as humans we understand from 3d images. Computers do this using point clouds generated by using multiple cameras. 331 | 332 | Architectures :- Pointnet, Shapenet 333 | 334 | ![]() 335 | 336 | 337 | 338 | # Projecting Work 339 | 340 | 1. 
Add all the projects to github 341 | 2. Blog your work on various platform. Collaborate with publications. Post on linkedin 342 | 3. Take internships whenever possible 343 | 4. Speak about your work at events, showcase your projects 344 | 345 | -------------------------------------------------------------------------------- /Bangalore Deep Learning Club - An Introduction to Generative Deep Learning.md: -------------------------------------------------------------------------------- 1 | ### An Introduction to Generative Deep Learning 2 | 3 | #### A getting started guide to Generative Deep Learning Algorithms 4 | 5 | **Generative** and **Discriminative** models are two different approaches that are widely studied in task of classification. They follow a different route from each other to achieve the final result. **Discriminative** models are widely popular and are used more comparatively to perform the task since they give better results when provided with a good amount of data. All the popular algorithms such as **SVM, KNN** etc. and popular network architectures such as **Resnet, Inception** etc. come under this. 6 | 7 | The task of a **discriminative** model is simple, if it is shown data from different classes it should be able to discriminate between them. For example if I show the model a set of **dog and cat images** it should be able to say what is a dog and what is a cat by using discriminative features such as **eyes shape, ears size** etc. 8 | 9 | ![img](https://cdn-images-1.medium.com/max/800/0*Lqqh5KtEA3ZKaGy5.gif) 10 | 11 | **Generative** model on the other hand has a much more complex task to perform. It has to understand the distribution from which the data is obtained and then needs to use this understanding to perform the task of classification. Also generative models have the **capability of creating data** similar to the training data it received since it has learnt the distribution from which the data is provided. For example if I show a generative model a set of dog and cat images now the model should understand completely what are the features that belongs to a certain class and how they can be used to generate similar images. Using this information it can do multiple things. It can compare the attributes to classify the image similar to how discriminative algorithm can classify an image. It can **generate** a new image which looks like one of the class images it has been provided for training. 12 | 13 | > **Humans don’t act like pure discriminators, we possess enormous generative capabilities** 14 | 15 | Progress in generative algorithms is important because humans don’t act just like pure discriminators, we have enormous generative or imaginative capabilities. If we give certain attributes such as **blue car on road** we can instantly generate a picture of that in our mind and we are looking at providing this kind of intelligence to machines. 16 | 17 | In the below content we will discuss about two famous generative algorithms **Variational Autoencoders** and **Generative Adversial Networks** 18 | 19 | ### Autoencoder 20 | 21 | ![img](https://cdn-images-1.medium.com/max/800/0*ZSa-g5wH5a6KGrIs.jpeg) 22 | 23 | An **autoencoder** is a type of ANN used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for **dimensionality reduction.** 24 | 25 | We as humans are pretty good at **visualizing** using few attributes. 
For example if we describe a human as tall, fair, bulky, no mustache, punjabi you can create a visualization based on these attributes. An autoencoder tries to achieve same thing. If I show an image of a person it learns all the attributes(known as **latent attributes**) such as the above needed to identify the person and then can use them to visualize/reconstruct the person. 26 | 27 | ![img](https://cdn-images-1.medium.com/max/800/0*ARH-HFfnLSJQV51q.png) 28 | 29 | Autoencoder consists of 3 components 30 | 31 | 1. **Encoder** 32 | 2. **Bottleneck** 33 | 3. **Decoder** 34 | 35 | ![img](https://cdn-images-1.medium.com/max/800/0*fmfbfhHHJbkmhP0l.png) 36 | 37 | **Encoder** is similar to any classification neural network such as **Resnet** etc. sans the prediction softmax layer. If we see the below figure of VGG network if we remove the final softmax layer the final 1000 values that we get can be thought of as 1000 latent attributes of an image. 38 | 39 | ![img](https://cdn-images-1.medium.com/max/800/0*rYUj6xEmlQDfVOXq.png) 40 | 41 | **Decoder** is the opposite of encoder, it takes the latent attributes from the output of encoder and tries to reconstruct the image. This is done using deconv layers which can unsample the input 42 | 43 | ![img](https://cdn-images-1.medium.com/max/800/0*DyrAHiGm_bCBxOe-.png) 44 | 45 | **Bottleneck** is the latent vector that is output by the encoder and is upsampled by the decoder. It contains the **latent attributes** that are produced by the decoder such as the height, weight etc. described above. 46 | 47 | The network of encoder and decoder is trained together using **backpropagation** to reduce the loss of reconstruction such as mean square error between the pixels. 48 | 49 | **Applications of autoencoder :-** 50 | 51 | 1. **Denoising images** :- Autoencoders can be used to remove noise from images. Since autoencoder learns the latent representation and not the noise it can remove the noise and give the clear image. It is trained by providing noisy images at input and we try to minimize reconstruction error with proper images at output 52 | 2. **Recommender systems** :- Netflix movie recommendation challenge winner used deep autoencoders 53 | 3. **Compression** :- As we have seen autoencoder converts an input to its latent space attributes and converts it back, it can be used for compression by making latent space much smaller in comparison to input. 54 | 4. **Dimensionality Reduction** :- Autoencoders can be used similar to **PCA** to reduce the feature space by mapping input to latent attributes and using them for modelling. 55 | 5. **Generation of data** :- A variant of autoencoders called **variational autoencoders** can be used to generate data similar to the distribution it is trained on which we will discuss below. 56 | 57 | ### Variational Autoencoder 58 | 59 | Above we have seen how can we use an autoencoder to compress the input to its latent variables. One problem which we haven’t yet solved with the above approach is the network could learn a representation which works but doesn’t generalize well. This is a classic problem in deep neural networks and is called **overfitting**. 
If a neural network has enough capacity it can just memorize the input data and map it to latent attributes without creating a general understanding in which case the **latent attributes don’t possess good properties** like the one we have discusses above and they could be meaningless , the task is still performed i.e reconstruction but the latent attributes don’t carry any meaning. You can compare this to a student who has **mugged up all the answers** in the textbook and can solve the problem if given directly from textbook but completely falters even if there is a slight change in the problem. We try to solve this problem using **Variational Autoencoders** which generalize much better in comparison to Vanilla Autoencoder. 60 | 61 | ![img](https://cdn-images-1.medium.com/max/800/0*s9dNyBEtiUBBaXjL.png) 62 | 63 | Above is an image of MNIST data trained using Vanilla Autoencoder. As we can see there are different distinct clusters formed which is what we asked the autoencoder to do. Now if we see into the figure we can clearly see that the clusters are **not continuous** and there are gaps in between. So if we take a point from the gap and pass it to the decoder it might give an output which doesn’t resemble any of the classes. We don’t want this to happen, we want the space to be continuous and the outputs to make sense. We achieve this by using **VAE**. 64 | 65 | We want **VAE** to have the below 2 properties 66 | 67 | 1. **Continuity** :- Two close points in latent space space should give identical outputs, if not the case it means there is high variance and hence overfitting and no generalization 68 | 2. **Completeness** :- A point from latent space should map to a meaningful output and shouldn’t give an unknown image as output 69 | 70 | ![img](https://cdn-images-1.medium.com/max/800/0*tHKyxWh3om6Nokk8.png) 71 | 72 | To achieve this VAE **encoder** part outputs along with a set of latent attributes a set of **mean and variance** corresponding to each attribute in latent space. Vanilla autoencoder encodes the input to a single set of latent attributes but **VAE** encodes each latent attribute to a distribution having a mean and variance. Each time an image is passed, a set of latent attributes are **sampled** according to their mean and variance and is passed on to the decoder. The decoder works similar to that of decoder present Vanilla autoencoder i.e it **upsamples** the latent attributes to recover the input image. 73 | 74 | The advantage of following the above approach is since an input is mapped to a **distribution** of latent attributes, **points which are close in the latent space get mapped to similar output** by default. We enforce few **constraints** on the distribution of each latent space attribute such that it regularizes as per our expectations 75 | 76 | i) Distribution follows a normal distribution with **variance** of each attribute **close to 1**. This prevents the clusters from becoming very **tight** and hence helps in making the latent space continuous, if not the VAE can push the cluster to a very tight group, think of like a single point, which would fail our expectation of **continuity**. 77 | 78 | ii) We try to keep the **mean** all the clusters to be **close to 0** such that we can ensure a smooth transition from one cluster to another and there are no gaps in between since this will bring all the clusters closer to each other. This way any point in latent space maps to a meaningful output. 
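To make these two constraints concrete, below is a minimal sketch of the regularization term that pushes every latent attribute's distribution towards a standard normal. It assumes the encoder outputs a mean `mu` and a log-variance `log_var` per latent attribute (the names are chosen here for illustration); this is the KL divergence term introduced formally in the Loss Function section below.

```
import tensorflow as tf

def kl_regularization(mu, log_var):
    # Illustrative sketch: closed-form KL divergence between N(mu, sigma^2)
    # and the standard normal N(0, 1), summed over latent attributes and
    # averaged over the batch. It is zero only when mu = 0 and sigma^2 = 1,
    # which is exactly what pulls the cluster means towards 0 and the
    # variances towards 1.
    kl_per_sample = -0.5 * tf.reduce_sum(
        1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1)
    return tf.reduce_mean(kl_per_sample)

# Example: with zero means and unit variances the penalty vanishes.
mu = tf.zeros([8, 32])        # batch of 8 samples, 32 latent attributes
log_var = tf.zeros([8, 32])   # log(1) = 0, i.e. variance 1
print(kl_regularization(mu, log_var).numpy())   # 0.0
```

Minimizing this term together with the reconstruction loss is what keeps the latent space continuous and complete rather than collapsing into tight, isolated clusters.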
79 | 80 | ![img](https://cdn-images-1.medium.com/max/800/0*U16PT6BL2WlzXdy4.png) 81 | 82 | These are the steps followed by a VAE 83 | 84 | > Encoder receives the input and outputs a set of means and variances corresponding to each latent attribute 85 | 86 | > A latent attribute is sampled randomly from each mean and variance and is passed to decoder 87 | 88 | > Decoder takes the randomly sampled latent attributes, upsamples it to try and reconstruct the output by minimizing the loss of reconstruction 89 | 90 | ### Loss Function 91 | 92 | Loss function of VAE consists of 2 parts 93 | 94 | 1) **Reconstruction Loss** :- Similar to that of Vanilla AE we use MSE or cross entropy 95 | 96 | 2) **Regularization Loss** :- We try to model the output probability distributions of each latent attribute close to a standard normal. We do this by reducing **KL Divergence** between the output probability distribution and standard normal distribution. 97 | 98 | **Loss = Reconstruction_loss + c \* KL_loss** 99 | 100 | **KL Divergence** is used to measure the divergence between two probability distributions. Lower the value better is the match between two distributions. **c** is a **hyperparameter** which needs to be tuned and is used to balance the importance of reconstruction loss and regularization loss. 101 | 102 | Now we come to the interesting part of learning about using VAE to **generate new data** similar to the training data and **Applications of VAE**. Once we have trained a VAE to a good extent we should have developed a continuous and complete latent space. Now we can pick any point from the latent space and pass it to the decoder, it will **generate a new image** completely unseen till now but still looks like it belongs to the distribution of data the VAE is trained on i.e it looks like one of the classes of training data which is awesome since now the network can generate data on its own. 103 | 104 | **Applications of VAE** :- 105 | 106 | 1. Generating new data similar to the distribution of data the VAE is trained on 107 | 2. Adding artifacts to an existing image. For example if we know the images without sunglasses on and if we know the images with sunglasses on we can take a difference between their means and use it to add a sunglasses to any new image 108 | 109 | ![img](https://cdn-images-1.medium.com/max/800/0*m33eTklemM0VePJ3.png) 110 | 111 | ### Generative Adversial Networks 112 | 113 | GAN’s are another set of generative algorithms and are one of the primary reasons for producing so much hype in deep learning. Several applications have been made using GANs and a multitude of architectures have been researched upon which led to rapid development in the field of GAN’s which can generate cool results which can make one wonder if it is real image or an image generated by GAN. For example below are the faces of person who have never existed in the real world. Looks pretty **cool** right. You can go [**https://thispersondoesnotexist.com**](https://thispersondoesnotexist.com) . This website gives a realistic fake person on every refresh. 114 | 115 | ![img](https://cdn-images-1.medium.com/max/800/0*OtfXkDOfog4uNE3Z.jpg) 116 | 117 | Recently Samsung published a paper in which a neural network takes just a picture and can produce a small video gif out of it. Through this they have got **Monalisa** alive. Think about what is in the future possibility. You can make you dead ancestors speak to you. 
Holy **awesome** 118 | 119 | ![img](https://cdn-images-1.medium.com/max/800/0*wReOpDWFSyLwbnCp.gif) 120 | 121 | Until now we used to believe we can have faith of whatever we see or listen since they happened in real i.e **videos news must be true. Not anymore**. Now realistic fake videos or audio of the person can be generated like below. Now you can’t even trust video news. 122 | 123 | ![img](https://cdn-images-1.medium.com/max/800/0*dB95f4pXZT6pKNAL.gif) 124 | 125 | You can do domain transfer using a GAN as well. If you have an image of a horse you can **reimagine** as how it would look like if it is a zebra by using a GAN. 126 | 127 | ![img](https://cdn-images-1.medium.com/max/800/0*vuVen2RN5ioMVNME.gif) 128 | 129 | You can reimagine yourself playing out a protagonist character in a movie just like the below **guy transformed him into Leonardo Decaprio**. This is the next level of **Dubsmash**. 130 | 131 | ![img](https://cdn-images-1.medium.com/max/800/0*hGCOMPgw_mJg994h.gif) 132 | 133 | The possibilities are endless. You can create an entire movie without any real cast. Take a scene from real world and convert it to anime. Create fake persons for acting as models for donning the dresses in ecommerce website. 134 | 135 | #### Working of GAN 136 | 137 | Now we will look into the theory behind how all this magic works. GAN consists of 2 neural networks(VAE from above has only a single neural network) which work with each other namely **Generator** and **Discriminator** . They act like **teacher-student, thug-cop**. The task of a Generator is simple as it name says it generates data for example an image which has to look like the real-world data. The task of Discriminator is to look at the data from Generator and discriminate it from real-world data i.e it should look at data generated from Generator and say it’s **fake**. 138 | 139 | ![img](https://cdn-images-1.medium.com/max/800/0*0tfrMJqOpUSoapOA.jpg) 140 | 141 | Now the **cat and mouse game** starts. Since discriminator says that the data is fake generator tries to better itself so that it can produce more realistic data which the discriminator can’t judge as real or fake. Once this happens the discriminator knows that it is failing to properly discriminate so it will try to improve itself and next time it judges better. Now the ball is in the court of generator and this game of trying to overpower each other continues until a stage comes where the discriminator is completely confused whether the data from generator is real or fake. Now generator wins and we have been rooting for generator all along. But remember the **hero is as good as the villain**. **Avengers Endgame** was such a success because **Thanos** was such a menacing villain. So we want the discriminator to be the best and the generator needs to beat discriminator at its peak since then the victory is more sweeter. 142 | 143 | There are multiple architectures and quite a lot of complex loss functions to make the GANs work and we will be looking at one of the most successful architecure **DCGAN** (Deep Convolutional GAN) which first introduced the usage of convolutional layers for GAN. 144 | 145 | As said before GAN consists of two neural networks **Discriminator** and **Generator** . 
The architectures of these neural networks are similar to that of **Encoder** and **Decoder** in **VAE**.Discriminator is the neural network we are fairly familiar with, image as input which is sampled down with convolutional layers and finally we apply a softmax to get the output class, in this case we have only 2 classes **Fake or Not-Fake** . So common architectures like Resnet, Inception can used to model a Discriminator of DCGAN. Discriminator is trained by providing the real images as real class category and fake images given by generator as fake class category. So it is a 2 class classification problem. 146 | 147 | ![img](https://cdn-images-1.medium.com/max/800/0*OCGg7QKjYapcn8va.png) 148 | 149 | **Generator** architecture looks opposite to that of Discriminator. It takes a linear vector and upsamples it similar to **Decoder in VAE**. The linear vector is generator by random sampling and just like in VAE we can think of it as attributes of latent space. Different random samplings generate different outputs. 150 | 151 | ![img](https://cdn-images-1.medium.com/max/800/0*YsQmo0POSVoIjzJZ.png) 152 | 153 | Both the discriminator and generator are trained simultaneously in a **minimax** game. Discriminator tries to reduce the discriminative loss such as cross-entropy loss and Generator tries to oppose it by trying to increase the error. Unlike the general tasks like classification, detection etc. the **loss doesn’t constantly decrease** since there are two opposing parties involved. Also we don’t want either discriminator to overpower generator from the start or vice-versa since in that case it will be a one-sided game and the two networks on a whole wouldn’t be learning as there is no competition, so we start from a stage where both are equally dumb i.e think of generator as generating random images and discriminator as randomly classifying images as fake or real. The networks slowly improve by competing with each other until it reaches a stage where generator completely fools the discriminator. 154 | 155 | Training GANs is an art and there are a lot of hacks involved in stabilizing the process of training two networks simultaneously but that is the content for another article. 156 | 157 | **About me :- I work as a Senior Research Engineer in Samsung R&D and my area of work is computer vision, mobile object detection and recognition. You can reach out to me at** [**https://www.linkedin.com/in/anilmatcha/**](https://www.linkedin.com/in/anilmatcha/) -------------------------------------------------------------------------------- /Tf_lite.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "accelerator": "GPU", 6 | "colab": { 7 | "name": "Tf_lite.ipynb", 8 | "provenance": [], 9 | "private_outputs": true, 10 | "collapsed_sections": [], 11 | "toc_visible": true, 12 | "include_colab_link": true 13 | }, 14 | "kernelspec": { 15 | "name": "python3", 16 | "display_name": "Python 3" 17 | } 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "view-in-github", 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "\"Open" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text", 34 | "id": "n6ecAvsmQp1I" 35 | }, 36 | "source": [ 37 | "## To run this colab, press the \"Runtime\" button in the menu tab and then press the \"Run all\" button." 
38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": { 43 | "colab_type": "text", 44 | "id": "77gENRVX40S7" 45 | }, 46 | "source": [ 47 | "##### Copyright 2019 The TensorFlow Authors." 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "metadata": { 53 | "cellView": "both", 54 | "colab_type": "code", 55 | "id": "d8jyt37T42Vf", 56 | "colab": {} 57 | }, 58 | "source": [ 59 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 60 | "# you may not use this file except in compliance with the License.\n", 61 | "# You may obtain a copy of the License at\n", 62 | "#\n", 63 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 64 | "#\n", 65 | "# Unless required by applicable law or agreed to in writing, software\n", 66 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 67 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 68 | "# See the License for the specific language governing permissions and\n", 69 | "# limitations under the License." 70 | ], 71 | "execution_count": 0, 72 | "outputs": [] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "colab_type": "text", 78 | "id": "hRTa3Ee15WsJ" 79 | }, 80 | "source": [ 81 | "# Recognize Images using Transfer Learning" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": { 87 | "colab_type": "text", 88 | "id": "dQHMcypT3vDT" 89 | }, 90 | "source": [ 91 | "\n", 92 | " \n", 95 | " \n", 98 | "
\n", 93 | " Run in Google Colab\n", 94 | " \n", 96 | " View source on GitHub\n", 97 | "
" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "metadata": { 104 | "colab_type": "code", 105 | "id": "iBMcobPHdD8O", 106 | "colab": {} 107 | }, 108 | "source": [ 109 | "from __future__ import absolute_import, division, print_function, unicode_literals\n", 110 | "\n", 111 | "try:\n", 112 | " # The %tensorflow_version magic only works in colab.\n", 113 | " %tensorflow_version 2.x\n", 114 | "except Exception:\n", 115 | " pass\n", 116 | "import tensorflow as tf\n", 117 | "\n", 118 | "import os\n", 119 | "import numpy as np\n", 120 | "import matplotlib.pyplot as plt" 121 | ], 122 | "execution_count": 0, 123 | "outputs": [] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "metadata": { 128 | "colab_type": "code", 129 | "id": "NOG3l_MsBO1A", 130 | "colab": {} 131 | }, 132 | "source": [ 133 | "tf.__version__" 134 | ], 135 | "execution_count": 0, 136 | "outputs": [] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": { 141 | "colab_type": "text", 142 | "id": "v77rlkCKW0IJ" 143 | }, 144 | "source": [ 145 | "## Setup Input Pipeline" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": { 151 | "colab_type": "text", 152 | "id": "j4QOy2uA3P_p" 153 | }, 154 | "source": [ 155 | "Download the dataset from online." 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "metadata": { 161 | "id": "dM51yVFz51DU", 162 | "colab_type": "code", 163 | "colab": {} 164 | }, 165 | "source": [ 166 | "!pip install google_images_download" 167 | ], 168 | "execution_count": 0, 169 | "outputs": [] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "metadata": { 174 | "colab_type": "code", 175 | "id": "xxL2mjVVGIrV", 176 | "colab": {} 177 | }, 178 | "source": [ 179 | "# importing google_images_download module \n", 180 | "from google_images_download import google_images_download \n", 181 | " \n", 182 | "# creating object \n", 183 | "response = google_images_download.googleimagesdownload() \n", 184 | " \n", 185 | "search_queries = [ 'human'] \n", 186 | " \n", 187 | " \n", 188 | "def downloadimages(query): \n", 189 | " # keywords is the search query \n", 190 | " # format is the image file format \n", 191 | " # limit is the number of images to be downloaded \n", 192 | " # print urs is to print the image file url \n", 193 | " # size is the image size which can \n", 194 | " # be specified manually (\"large, medium, icon\") \n", 195 | " # aspect ratio denotes the height width ratio \n", 196 | " # of images to download. 
(\"tall, square, wide, panoramic\") \n", 197 | " arguments = {\"keywords\": query, \n", 198 | " \"format\": \"jpg\", \n", 199 | " \"limit\":100, \n", 200 | " \"print_urls\":True, \n", 201 | " \"size\": \"medium\", \n", 202 | " \"aspect_ratio\": \"panoramic\"} \n", 203 | " try: \n", 204 | " response.download(arguments) \n", 205 | " \n", 206 | " # Handling File NotFound Error \n", 207 | " except FileNotFoundError: \n", 208 | " arguments = {\"keywords\": query, \n", 209 | " \"format\": \"jpg\", \n", 210 | " \"limit\":4, \n", 211 | " \"print_urls\":True, \n", 212 | " \"size\": \"medium\"} \n", 213 | " \n", 214 | " # Providing arguments for the searched query \n", 215 | " try: \n", 216 | " # Downloading the photos based \n", 217 | " # on the given arguments \n", 218 | " response.download(arguments) \n", 219 | " except: \n", 220 | " pass\n", 221 | " \n", 222 | "# Driver Code \n", 223 | "for query in search_queries: \n", 224 | " downloadimages(query) \n", 225 | " print()" 226 | ], 227 | "execution_count": 0, 228 | "outputs": [] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "metadata": { 233 | "id": "31FSTdd98Wgf", 234 | "colab_type": "code", 235 | "colab": {} 236 | }, 237 | "source": [ 238 | "import os\n", 239 | "from PIL import Image as pil_image\n", 240 | "folders = os.listdir(\"downloads\")\n", 241 | "for folder in folders:\n", 242 | " images = os.listdir(\"downloads/\"+folder)\n", 243 | " for image in images:\n", 244 | " image_name = \"downloads/\"+folder+\"/\"+image \n", 245 | " if \"%20\" in image:\n", 246 | " new_image_name = image_name.replace(\"%20\", \"\")\n", 247 | " os.rename(image_name, new_image_name)\n", 248 | " image_name = new_image_name\n", 249 | " try:\n", 250 | " pil_image.open(image_name)\n", 251 | " except:\n", 252 | " os.remove(image_name) " 253 | ], 254 | "execution_count": 0, 255 | "outputs": [] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": { 260 | "colab_type": "text", 261 | "id": "z4gTv7ig2vMh" 262 | }, 263 | "source": [ 264 | "Use `ImageDataGenerator` to rescale the images.\n", 265 | "\n", 266 | "Create the train generator and specify where the train dataset directory, image size, batch size.\n", 267 | "\n", 268 | "Create the validation generator with similar approach as the train generator with the flow_from_directory() method." 
269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "metadata": { 274 | "colab_type": "code", 275 | "id": "aCLb_yV5JfF3", 276 | "colab": {} 277 | }, 278 | "source": [ 279 | "IMAGE_SIZE = 224\n", 280 | "BATCH_SIZE = 32\n", 281 | "base_dir = \"downloads\"\n", 282 | "datagen = tf.keras.preprocessing.image.ImageDataGenerator(\n", 283 | " rescale=1./255, \n", 284 | " validation_split=0.2,\n", 285 | "\t\thorizontal_flip=True,\n", 286 | "\t\tfill_mode=\"nearest\")\n", 287 | "\n", 288 | "train_generator = datagen.flow_from_directory(\n", 289 | " base_dir,\n", 290 | " target_size=(IMAGE_SIZE, IMAGE_SIZE),\n", 291 | " batch_size=BATCH_SIZE, \n", 292 | " subset='training')\n", 293 | "\n", 294 | "val_generator = datagen.flow_from_directory(\n", 295 | " base_dir,\n", 296 | " target_size=(IMAGE_SIZE, IMAGE_SIZE),\n", 297 | " batch_size=BATCH_SIZE, \n", 298 | " subset='validation')" 299 | ], 300 | "execution_count": 0, 301 | "outputs": [] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "metadata": { 306 | "colab_type": "code", 307 | "id": "tx1L7fxxWA_G", 308 | "colab": {} 309 | }, 310 | "source": [ 311 | "for image_batch, label_batch in train_generator:\n", 312 | " break\n", 313 | "image_batch.shape, label_batch.shape" 314 | ], 315 | "execution_count": 0, 316 | "outputs": [] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": { 321 | "colab_type": "text", 322 | "id": "ZrFFcwUb3iK9" 323 | }, 324 | "source": [ 325 | "Save the labels in a file which will be downloaded later." 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "metadata": { 331 | "colab_type": "code", 332 | "id": "-QFZIhWs4dsq", 333 | "colab": {} 334 | }, 335 | "source": [ 336 | "print (train_generator.class_indices)\n", 337 | "\n", 338 | "labels = '\\n'.join(sorted(train_generator.class_indices.keys()))\n", 339 | "\n", 340 | "with open('labels.txt', 'w') as f:\n", 341 | " f.write(labels)" 342 | ], 343 | "execution_count": 0, 344 | "outputs": [] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "metadata": { 349 | "colab_type": "code", 350 | "id": "duxD_UDSOmng", 351 | "colab": {} 352 | }, 353 | "source": [ 354 | "!cat labels.txt" 355 | ], 356 | "execution_count": 0, 357 | "outputs": [] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "metadata": { 362 | "colab_type": "text", 363 | "id": "OkH-kazQecHB" 364 | }, 365 | "source": [ 366 | "## Create the base model from the pre-trained convnets\n", 367 | "\n", 368 | "Create the base model from the **MobileNet V2** model developed at Google, and pre-trained on the ImageNet dataset, a large dataset of 1.4M images and 1000 classes of web images.\n", 369 | "\n", 370 | "First, pick which intermediate layer of MobileNet V2 will be used for feature extraction. A common practice is to use the output of the very last layer before the flatten operation, the so-called \"bottleneck layer\". The reasoning here is that the following fully-connected layers will be too specialized to the task the network was trained on, and thus the features learned by these layers won't be very useful for a new task. The bottleneck features, however, retain much generality.\n", 371 | "\n", 372 | "Let's instantiate an MobileNet V2 model pre-loaded with weights trained on ImageNet. By specifying the `include_top=False` argument, we load a network that doesn't include the classification layers at the top, which is ideal for feature extraction." 
373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "metadata": { 378 | "colab_type": "code", 379 | "id": "19IQ2gqneqmS", 380 | "colab": {} 381 | }, 382 | "source": [ 383 | "IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)\n", 384 | "\n", 385 | "# Create the base model from the pre-trained model MobileNet V2\n", 386 | "base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,\n", 387 | " include_top=False, \n", 388 | " weights='imagenet')" 389 | ], 390 | "execution_count": 0, 391 | "outputs": [] 392 | }, 393 | { 394 | "cell_type": "markdown", 395 | "metadata": { 396 | "colab_type": "text", 397 | "id": "rlx56nQtfe8Y" 398 | }, 399 | "source": [ 400 | "## Feature extraction\n", 401 | "You will freeze the convolutional base created from the previous step and use that as a feature extractor, add a classifier on top of it and train the top-level classifier." 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "metadata": { 407 | "colab_type": "code", 408 | "id": "Tts8BbAtRGvk", 409 | "colab": {} 410 | }, 411 | "source": [ 412 | "base_model.trainable = False" 413 | ], 414 | "execution_count": 0, 415 | "outputs": [] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": { 420 | "colab_type": "text", 421 | "id": "wdMRM8YModbk" 422 | }, 423 | "source": [ 424 | "### Add a classification head" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "metadata": { 430 | "colab_type": "code", 431 | "id": "eApvroIyn1K0", 432 | "colab": {} 433 | }, 434 | "source": [ 435 | "model = tf.keras.Sequential([\n", 436 | " base_model,\n", 437 | " tf.keras.layers.Conv2D(32, 3, activation='relu'),\n", 438 | " tf.keras.layers.Dropout(0.2),\n", 439 | " tf.keras.layers.GlobalAveragePooling2D(),\n", 440 | " tf.keras.layers.Dense(4, activation='softmax')\n", 441 | "])" 442 | ], 443 | "execution_count": 0, 444 | "outputs": [] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": { 449 | "colab_type": "text", 450 | "id": "g0ylJXE_kRLi" 451 | }, 452 | "source": [ 453 | "### Compile the model\n", 454 | "\n", 455 | "You must compile the model before training it. Since there are two classes, use a binary cross-entropy loss." 
456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "metadata": { 461 | "colab_type": "code", 462 | "id": "RpR8HdyMhukJ", 463 | "colab": {} 464 | }, 465 | "source": [ 466 | "model.compile(optimizer=tf.keras.optimizers.Adam(), \n", 467 | " loss='categorical_crossentropy', \n", 468 | " metrics=['accuracy'])" 469 | ], 470 | "execution_count": 0, 471 | "outputs": [] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "metadata": { 476 | "colab_type": "code", 477 | "id": "I8ARiyMFsgbH", 478 | "colab": {} 479 | }, 480 | "source": [ 481 | "model.summary()" 482 | ], 483 | "execution_count": 0, 484 | "outputs": [] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "metadata": { 489 | "colab_type": "code", 490 | "id": "krvBumovycVA", 491 | "colab": {} 492 | }, 493 | "source": [ 494 | "print('Number of trainable variables = {}'.format(len(model.trainable_variables)))" 495 | ], 496 | "execution_count": 0, 497 | "outputs": [] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": { 502 | "colab_type": "text", 503 | "id": "RxvgOYTDSWTx" 504 | }, 505 | "source": [ 506 | "### Train the model\n", 507 | "\n", 508 | "" 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "metadata": { 514 | "colab_type": "code", 515 | "id": "JsaRFlZ9B6WK", 516 | "colab": {} 517 | }, 518 | "source": [ 519 | "epochs = 10\n", 520 | "!mkdir training\n", 521 | "checkpoint_path = \"training/cp-{epoch:04d}.ckpt\"\n", 522 | "checkpoint_dir = os.path.dirname(checkpoint_path)\n", 523 | "\n", 524 | "# Create a callback that saves the model's weights\n", 525 | "cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,\n", 526 | " save_weights_only=True,\n", 527 | " verbose=1)\n", 528 | "\n", 529 | "history = model.fit_generator(train_generator, \n", 530 | " epochs=epochs, \n", 531 | " validation_data=val_generator,\n", 532 | " callbacks=[cp_callback])" 533 | ], 534 | "execution_count": 0, 535 | "outputs": [] 536 | }, 537 | { 538 | "cell_type": "markdown", 539 | "metadata": { 540 | "colab_type": "text", 541 | "id": "CqwV-CRdS6Nv" 542 | }, 543 | "source": [ 544 | "## Fine tuning\n", 545 | "In the feature extraction experiment, you were only training a few layers on top of a MobileNet V2 base model. The weights of the pre-trained network were **not** updated during training.\n", 546 | "\n", 547 | "One way to increase performance even further is to train (or \"fine-tune\") the weights of the top layers of the pre-trained model alongside the training of the classifier you added. The training process will force the weights to be tuned from generic feature maps to features associated specifically with our dataset." 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": { 553 | "colab_type": "text", 554 | "id": "CPXnzUK0QonF" 555 | }, 556 | "source": [ 557 | "### Un-freeze the top layers of the model\n" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "metadata": { 563 | "colab_type": "text", 564 | "id": "rfxv_ifotQak" 565 | }, 566 | "source": [ 567 | "All you need to do is unfreeze the `base_model` and set the bottom layers to be un-trainable. Then, recompile the model (necessary for these changes to take effect), and resume training."
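, "\n", "As an optional, purely illustrative check (not in the original notebook), you could confirm the freeze boundary after running the cells below, which set `fine_tune_at = 100`:\n", "```python\n", "# Layers below index 100 stay frozen; layers from index 100 onwards are fine-tuned\n", "print(base_model.layers[99].trainable)   # expected: False\n", "print(base_model.layers[100].trainable)  # expected: True\n", "print(sum(layer.trainable for layer in base_model.layers))  # count of trainable layers\n", "```"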
568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "metadata": { 573 | "colab_type": "code", 574 | "id": "4nzcagVitLQm", 575 | "colab": {} 576 | }, 577 | "source": [ 578 | "base_model.trainable = True" 579 | ], 580 | "execution_count": 0, 581 | "outputs": [] 582 | }, 583 | { 584 | "cell_type": "code", 585 | "metadata": { 586 | "colab_type": "code", 587 | "id": "-4HgVAacRs5v", 588 | "colab": {} 589 | }, 590 | "source": [ 591 | "# Let's take a look to see how many layers are in the base model\n", 592 | "print(\"Number of layers in the base model: \", len(base_model.layers))\n", 593 | "\n", 594 | "# Fine tune from this layer onwards\n", 595 | "fine_tune_at = 100\n", 596 | "\n", 597 | "# Freeze all the layers before the `fine_tune_at` layer\n", 598 | "for layer in base_model.layers[:fine_tune_at]:\n", 599 | " layer.trainable = False" 600 | ], 601 | "execution_count": 0, 602 | "outputs": [] 603 | }, 604 | { 605 | "cell_type": "markdown", 606 | "metadata": { 607 | "colab_type": "text", 608 | "id": "4Uk1dgsxT0IS" 609 | }, 610 | "source": [ 611 | "### Compile the model\n", 612 | "\n", 613 | "Compile the model using a much lower learning rate." 614 | ] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "metadata": { 619 | "colab_type": "code", 620 | "id": "NtUnaz0WUDva", 621 | "colab": {} 622 | }, 623 | "source": [ 624 | "model.compile(loss='categorical_crossentropy',\n", 625 | " optimizer=tf.keras.optimizers.Adam(1e-5),\n", 626 | " metrics=['accuracy'])" 627 | ], 628 | "execution_count": 0, 629 | "outputs": [] 630 | }, 631 | { 632 | "cell_type": "code", 633 | "metadata": { 634 | "colab_type": "code", 635 | "id": "WwBWy7J2kZvA", 636 | "colab": {} 637 | }, 638 | "source": [ 639 | "model.summary()" 640 | ], 641 | "execution_count": 0, 642 | "outputs": [] 643 | }, 644 | { 645 | "cell_type": "code", 646 | "metadata": { 647 | "colab_type": "code", 648 | "id": "bNXelbMQtonr", 649 | "colab": {} 650 | }, 651 | "source": [ 652 | "print('Number of trainable variables = {}'.format(len(model.trainable_variables)))" 653 | ], 654 | "execution_count": 0, 655 | "outputs": [] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "metadata": { 660 | "colab_type": "text", 661 | "id": "4G5O4jd6TuAG" 662 | }, 663 | "source": [ 664 | "### Continue training the model" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "metadata": { 670 | "colab_type": "code", 671 | "id": "ECQLkAsFTlun", 672 | "colab": {} 673 | }, 674 | "source": [ 675 | "history_fine = model.fit_generator(train_generator, \n", 676 | " epochs=5,\n", 677 | " validation_data=val_generator)" 678 | ], 679 | "execution_count": 0, 680 | "outputs": [] 681 | }, 682 | { 683 | "cell_type": "markdown", 684 | "metadata": { 685 | "colab_type": "text", 686 | "id": "kRDabW_u1wnv" 687 | }, 688 | "source": [ 689 | "## Convert to TFLite" 690 | ] 691 | }, 692 | { 693 | "cell_type": "markdown", 694 | "metadata": { 695 | "colab_type": "text", 696 | "id": "hNvMl6CM6lG4" 697 | }, 698 | "source": [ 699 | "Save the model using `tf.saved_model.save`, then convert the saved model to a TensorFlow Lite compatible format."
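, "\n", "Optionally (a hedged sketch that is not part of this notebook's conversion cell, reusing `saved_model_dir` from the cell below), post-training quantization can shrink the converted model further, which helps for on-device deployment:\n", "```python\n", "converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)\n", "converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable default post-training quantization\n", "quantized_tflite_model = converter.convert()\n", "with open('model_quant.tflite', 'wb') as f:\n", "    f.write(quantized_tflite_model)\n", "```"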
700 | ] 701 | }, 702 | { 703 | "cell_type": "code", 704 | "metadata": { 705 | "colab_type": "code", 706 | "id": "_LZiKVInWNGy", 707 | "colab": {} 708 | }, 709 | "source": [ 710 | "saved_model_dir = 'save/fine_tuning'\n", 711 | "tf.saved_model.save(model, saved_model_dir)\n", 712 | "\n", 713 | "converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)\n", 714 | "tflite_model = converter.convert()\n", 715 | "\n", 716 | "with open('model.tflite', 'wb') as f:\n", 717 | " f.write(tflite_model)" 718 | ], 719 | "execution_count": 0, 720 | "outputs": [] 721 | }, 722 | { 723 | "cell_type": "markdown", 724 | "metadata": { 725 | "colab_type": "text", 726 | "id": "GE4w-9S410Dk" 727 | }, 728 | "source": [ 729 | "Download the converted model and labels" 730 | ] 731 | }, 732 | { 733 | "cell_type": "code", 734 | "metadata": { 735 | "colab_type": "code", 736 | "id": "x47uW_lI1DoV", 737 | "colab": {} 738 | }, 739 | "source": [ 740 | "from google.colab import drive\n", 741 | "drive.mount(\"/content/drive\")" 742 | ], 743 | "execution_count": 0, 744 | "outputs": [] 745 | }, 746 | { 747 | "cell_type": "code", 748 | "metadata": { 749 | "id": "YZVH3WtPHbnM", 750 | "colab_type": "code", 751 | "colab": {} 752 | }, 753 | "source": [ 754 | "!cp model.tflite /content/drive/'My Drive'/\n", 755 | "!cp labels.txt /content/drive/'My Drive'/" 756 | ], 757 | "execution_count": 0, 758 | "outputs": [] 759 | }, 760 | { 761 | "cell_type": "code", 762 | "metadata": { 763 | "id": "kEaJs7OM6BIy", 764 | "colab_type": "code", 765 | "colab": {} 766 | }, 767 | "source": [ 768 | "!cp /content/drive/'My Drive'/model.tflite .\n", 769 | "!cp /content/drive/'My Drive'/labels.txt ." 770 | ], 771 | "execution_count": 0, 772 | "outputs": [] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "metadata": { 777 | "id": "Apav7Z4y8V_J", 778 | "colab_type": "code", 779 | "colab": {} 780 | }, 781 | "source": [ 782 | "from google.colab import files\n", 783 | "files.download(\"model.tflite\")" 784 | ], 785 | "execution_count": 0, 786 | "outputs": [] 787 | }, 788 | { 789 | "cell_type": "markdown", 790 | "metadata": { 791 | "colab_type": "text", 792 | "id": "_TZTwG7nhm0C" 793 | }, 794 | "source": [ 795 | "## Summary:\n", 796 | "\n", 797 | "* **Using a pre-trained model for feature extraction**: When working with a small dataset, it is common to take advantage of features learned by a model trained on a larger dataset in the same domain. This is done by instantiating the pre-trained model and adding a fully-connected classifier on top. The pre-trained model is \"frozen\" and only the weights of the classifier get updated during training.\n", 798 | "In this case, the convolutional base extracted all the features associated with each image and you just trained a classifier that determines the image class given that set of extracted features.\n", 799 | "\n", 800 | "* **Fine-tuning a pre-trained model**: To further improve performance, one might want to repurpose the top-level layers of the pre-trained models to the new dataset via fine-tuning.\n", 801 | "In this case, you tuned your weights such that your model learned high-level features specific to the dataset. 
This technique is usually recommended when the training dataset is large and very similar to the original dataset that the pre-trained model was trained on.\n" 802 | ] 803 | }, 804 | { 805 | "cell_type": "code", 806 | "metadata": { 807 | "id": "XxX89NRREoIe", 808 | "colab_type": "code", 809 | "colab": {} 810 | }, 811 | "source": [ 812 | "!cp TFLITE.md /content/drive/'My Drive'/" 813 | ], 814 | "execution_count": 0, 815 | "outputs": [] 816 | }, 817 | { 818 | "cell_type": "code", 819 | "metadata": { 820 | "id": "wlprQQq3EsMa", 821 | "colab_type": "code", 822 | "colab": {} 823 | }, 824 | "source": [ 825 | "!cp app-debug.apk /content/drive/'My Drive'/" 826 | ], 827 | "execution_count": 0, 828 | "outputs": [] 829 | }, 830 | { 831 | "cell_type": "code", 832 | "metadata": { 833 | "id": "dUGmAT40FKJW", 834 | "colab_type": "code", 835 | "colab": {} 836 | }, 837 | "source": [ 838 | "import requests\n", 839 | "\n", 840 | "def download_file_from_google_drive(id, destination):\n", 841 | " URL = \"https://docs.google.com/uc?export=download\"\n", 842 | "\n", 843 | " session = requests.Session()\n", 844 | "\n", 845 | " response = session.get(URL, params = { 'id' : id }, stream = True)\n", 846 | " token = get_confirm_token(response)\n", 847 | "\n", 848 | " if token:\n", 849 | " params = { 'id' : id, 'confirm' : token }\n", 850 | " response = session.get(URL, params = params, stream = True)\n", 851 | "\n", 852 | " save_response_content(response, destination) \n", 853 | "\n", 854 | "def get_confirm_token(response):\n", 855 | " for key, value in response.cookies.items():\n", 856 | " if key.startswith('download_warning'):\n", 857 | " return value\n", 858 | "\n", 859 | " return None\n", 860 | "\n", 861 | "def save_response_content(response, destination):\n", 862 | " CHUNK_SIZE = 32768\n", 863 | "\n", 864 | " with open(destination, \"wb\") as f:\n", 865 | " for chunk in response.iter_content(CHUNK_SIZE):\n", 866 | " if chunk: # filter out keep-alive new chunks\n", 867 | " f.write(chunk)\n", 868 | "\n", 869 | "if __name__ == \"__main__\":\n", 870 | " file_id = '1-6-5l3i8mfBl5C5xg4-mqXwdGzNvfVn5' # id taken from the shareable link\n", 871 | " destination = 'obj_detect.apk' # destination file on disk\n", 872 | " download_file_from_google_drive(file_id, destination)" 873 | ], 874 | "execution_count": 0, 875 | "outputs": [] 876 | }, 877 | { 878 | "cell_type": "code", 879 | "metadata": { 880 | "id": "67qjyYG-MeHq", 881 | "colab_type": "code", 882 | "colab": {} 883 | }, 884 | "source": [ 885 | "" 886 | ], 887 | "execution_count": 0, 888 | "outputs": [] 889 | } 890 | ] 891 | } --------------------------------------------------------------------------------