├── LICENSE
├── R.mat
├── Running.mov
├── VideoClassificationExample.mlx
├── VideoClassificationExample_images
    ├── figure_0.png
    ├── figure_1.png
    ├── figure_2.png
    ├── image_0.png
    ├── image_1.png
    └── image_2.png
├── W.mat
├── Walking.mov
├── plainCode
    └── VideoClassificationExample.m
└── readme.md


/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 giants19

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/R.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/R.mat

--------------------------------------------------------------------------------
/Running.mov:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/Running.mov

--------------------------------------------------------------------------------
/VideoClassificationExample.mlx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample.mlx

--------------------------------------------------------------------------------
/VideoClassificationExample_images/figure_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/figure_0.png

--------------------------------------------------------------------------------
/VideoClassificationExample_images/figure_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/figure_1.png

--------------------------------------------------------------------------------
/VideoClassificationExample_images/figure_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/figure_2.png

--------------------------------------------------------------------------------
/VideoClassificationExample_images/image_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/image_0.png

--------------------------------------------------------------------------------
/VideoClassificationExample_images/image_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/image_1.png

--------------------------------------------------------------------------------
/VideoClassificationExample_images/image_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/image_2.png

--------------------------------------------------------------------------------
/W.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/W.mat

--------------------------------------------------------------------------------
/Walking.mov:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/Walking.mov

--------------------------------------------------------------------------------
/plainCode/VideoClassificationExample.m:
--------------------------------------------------------------------------------
%% Running/Walking Classification with Video Clips using LSTM
%
%
% I am very grateful for the free images from
%
% Video classification, or recognition from video, is becoming more interesting
% as more video data accumulates and videos can easily be recorded with, for
% example, smartphones. In this example, running/walking classification is
% performed on video clips taken while running and walking, using a deep
% learning technique called LSTM (Long Short-Term Memory), which can classify
% time-series data.
%
% This example is based on the official MathWorks documentation located
% at
%
%
%
% While the official example requires downloading a dataset of about 2 GB, this
% example works with a small amount of data, which may make it easier to try.
% Note that this is only an example of LSTM with images; please refer to the
% official example for further study.
%% Load Pretrained Convolutional Network
%%
% * A pretrained network, ResNet-18 (resnet18), is used for feature extraction
% in this example.
% * The features extracted from the pretrained network are fed into an LSTM layer
% as shown below.
% * Other networks such as googlenet, resnet50, and mobilenetv2 are also available.
% * You may choose another network if the final accuracy is not high enough.
%%
%

clear;clc;close all
% If you have not downloaded the pretrained network resnet18, please get it
% from "Add-Ons".
netCNN = resnet18;
%% Load Data
%%
% * The video clips for classification were taken from videos recorded
% while running and walking, which lasted about 5 min and 10 min, respectively.
% * The two videos were taken along the same path to exclude differences
% in the scene being captured.
% * In a future example, scenes with various places/movement conditions should
% be prepared.

RunVideo=VideoReader('Running.mov'); % load the video taken while running
WalkVideo=VideoReader('Walking.mov');% load the video taken while walking
f=figure;
title('Video while Running/Walking');hold on
set(gcf,'Visible','on')
numFrames = 5/(1/RunVideo.FrameRate); % "5" is the duration (in seconds) to show
for i = 1:numFrames
    RunFrame=readFrame(RunVideo);
    WalkFrame=readFrame(WalkVideo);
    imshow([RunFrame,WalkFrame]);
    drawnow
    pause(1/RunVideo.FrameRate)
    hold on
end
hold off
% reset the state of the walking/running videos; otherwise the frames already
% read cannot be retrieved again
WalkVideo.CurrentTime=0;
RunVideo.CurrentTime=0;
%% Read all frames and extract features to save into variables R and W
%%
% * Image features are extracted with the pretrained network and converted to
% single precision before being fed to the LSTM network.
% * As feature extraction takes a long time, please load the precalculated
% variables R and W.
% * The function "activations" returns the extracted feature vectors.

if (exist('W.mat','file')==2)&&(exist('R.mat','file')==2)
    load W.mat
    load R.mat
else
    RFrames=zeros(224,224,3,RunVideo.NumFrames,'uint8');
    WFrames=zeros(224,224,3,WalkVideo.NumFrames,'uint8');
    for i=1:RunVideo.NumFrames
        RFrames(:,:,:,i)=imresize(readFrame(RunVideo),[224 224]);
        % the video frames must be resized to 224 by 224 since
        % resnet18 only accepts that size.
    end

    for i=1:WalkVideo.NumFrames
        WFrames(:,:,:,i)=imresize(readFrame(WalkVideo),[224 224]);
    end
    R=single(activations(netCNN,RFrames,'pool5','OutputAs','columns'));
    W=single(activations(netCNN,WFrames,'pool5','OutputAs','columns'));
end
%% Prepare the set of image features of a video clip lasting a few seconds
%
%%
% * Video clips whose duration ranges from minDuration to maxDuration (in seconds),
% as defined below, are extracted.
% * You can specify the number of clips to obtain from each video.

minDuration=2;
maxDuration=4;
numData=100;
FrameRate=RunVideo.FrameRate;
RData=cell(numData,1);
WData=cell(numData,1);
for i=1:numData
    ClipDuration=randi((maxDuration-minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
    StartingFrameNumRun=randi(RunVideo.NumFrames-(maxDuration+minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
    StartingFrameNumWalk=randi(WalkVideo.NumFrames-(maxDuration+minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
    RData{i}=R(:,StartingFrameNumRun:StartingFrameNumRun+ClipDuration);
    WData{i}=W(:,StartingFrameNumWalk:StartingFrameNumWalk+ClipDuration);
end
%% Prepare Training Data
% Prepare the data for training by partitioning the data into training and validation
% partitions.
%
% *Create Training and Validation Partitions*
%
% Partition the data. Assign 70% of the data to the training partition and 30%
% to the validation partition.

idx = randperm(numData);
N = floor(0.7 * numData);
sequencesTrainRun = {RData{idx(1:N)}};
sequencesTrainWalk = {WData{idx(1:N)}};
sequencesTrain=cat(2,sequencesTrainRun,sequencesTrainWalk);
labelsTrain=categorical([zeros(N,1);ones(N,1)],[0 1],{'Run','Walk'});

sequencesValidRun = {RData{idx(N+1:end)}};
sequencesValidWalk = {WData{idx(N+1:end)}};
labelsValidation=categorical([zeros(numel(sequencesValidRun),1);ones(numel(sequencesValidWalk),1)],[0 1],{'Run','Walk'});
sequencesValidation=cat(2,sequencesValidRun,sequencesValidWalk);
%% Create LSTM Network
% Create an LSTM network that can classify the sequences of feature vectors
% representing the videos.
%
% Define the LSTM network architecture. Specify the following network layers.
%%
% * A sequence input layer with an input size corresponding to the feature dimension
% of the feature vectors.
% * Here, the dimension of the features extracted with resnet18 is 512, so
% numFeatures is 512.
% * An LSTM layer with 1500 hidden units, followed by a dropout layer. To output
% only one label for each sequence, set the |'OutputMode'| option of the
% LSTM layer to |'last'|.
% * You may instead use a BiLSTM layer, with which the image sequences are learned
% in both the forward and backward time directions.
% * Two LSTM layers can be stacked in "layers", which might learn more detailed
% information from the time-series data.
% * A fully connected layer with an output size corresponding to the number
% of classes (here, 2), a softmax layer, and a classification layer.
% * If you would like to estimate a numerical value from the time-series data,
% you can use "regressionLayer" with numerical labels (data) instead of the
% categorical |labelsTrain|.
% * The dropout layer helps prevent the network from overfitting
% to the training data.

numFeatures = size(R,1);
numClasses = 2;

layers = [
    sequenceInputLayer(numFeatures,'Name','sequence')
    lstmLayer(1500,'OutputMode','last','Name','lstm')
    dropoutLayer(0.5,'Name','drop')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];
%% Specify Training Options
% Specify the training options using the |trainingOptions| function.
%%
% * Set a mini-batch size of 16, an initial learning rate of 0.001, and a gradient
% threshold of 2 (to prevent the gradients from exploding).
% * Shuffle the data every epoch.
% * Validate the network once per about three epochs.
% * Display the training progress in a plot and suppress verbose output.
% * Max epochs: 25
% * Optimizer: Adam

miniBatchSize = 16;
numData = numel(sequencesTrainRun);
numIterationsPerEpoch = floor(numData / miniBatchSize)*3;

options = trainingOptions('adam', ...
    'MiniBatchSize',miniBatchSize, ...
    'MaxEpochs',25, ...
    'InitialLearnRate',1e-3, ...
    'GradientThreshold',2, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{sequencesValidation,labelsValidation}, ...
    'ValidationFrequency',numIterationsPerEpoch, ...
    'Plots','training-progress', ...
    'Verbose',false);
%% Train LSTM Network with the extracted image features
% Train the network using the |trainNetwork| function.
%%
% * If you would like to plot training data such as the accuracy
% and loss, check the values saved in the variable |info|.

[netLSTM,info] = trainNetwork(sequencesTrain,labelsTrain,layers,options);
%%
% Calculate the classification accuracy of the network on the validation set.
% If the accuracy is satisfactory, prepare test video clips to
% explore the feasibility of this LSTM network.

YPred = classify(netLSTM,sequencesValidation,'MiniBatchSize',miniBatchSize);
accuracy = mean(YPred == labelsValidation)
% check how balanced the classification is across the two classes.
confusionchart(labelsValidation,YPred)
%% Things to consider
%%
% # Some clips are difficult to classify from a short stretch of video =>
% the accuracy is likely to increase if each clip is made longer.
% # What if two LSTM layers are used instead of the one LSTM layer in this example?
% # What if a BiLSTM layer is used?

--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
[![View Video classification using LSTM(LSTMによる動画の分類) on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://jp.mathworks.com/matlabcentral/fileexchange/74402-video-classification-using-lstm-lstm)

# Running/Walking Classification with Video Clips using LSTM
This is a simple example of video classification using LSTM with MATLAB.

[English]
Please run the code named VideoClassificationExample.
This example is based on the official MathWorks documentation located [here](https://jp.mathworks.com/help/deeplearning/examples/classify-videos-using-deep-learning.html). While the official example requires downloading a dataset of about 2 GB, this example works
with a small amount of data, which may make it easier to try.
Note that this is only an example of LSTM with images; please refer to the official example for further study.
I appreciate the free pictures used in the thumbnail and live editor, which were obtained from this [page](https://www.irasutoya.com/).


[Japanese (translated)]
This example classifies videos using deep learning. It predicts whether a person is walking or running from video recorded by a camera mounted on that person's head. Video frames are taken as input, and features are extracted from them with a pretrained network. Those features are then classified with an LSTM. Classification of still images is widely covered, but the MATLAB documentation had few examples that take a video as input and classify its content from a few seconds of footage. The official documentation does include an example, but it requires downloading a 2 GB dataset, and the download and computation take a long time, which makes it somewhat unsuitable for a quick trial. I hope this is helpful.

[References]
[1] MATLAB official documentation: [Classify Videos Using Deep Learning](https://jp.mathworks.com/help/deeplearning/ug/classify-videos-using-deep-learning.html)
[2] [Irasutoya](https://www.irasutoya.com): images in the script were obtained from this website

![image_0.png](VideoClassificationExample_images/image_0.png)


# Load Pretrained Convolutional Network

- A pretrained network, ResNet-18 (resnet18), is used for feature extraction in this example.
- The features extracted from the pretrained network are fed into an LSTM layer as shown below.
- Other networks such as googlenet, resnet50, and mobilenetv2 are also available.
- You may choose another network if the final accuracy is not high enough.


![image_1.png](VideoClassificationExample_images/image_1.png)


```matlab
clear;clc;close all
% If you have not downloaded the pretrained network resnet18, please get it
% from "Add-Ons".
netCNN = resnet18;
```
# Load Data

- The video clips for classification were taken from videos recorded while running and walking, which lasted about 5 min and 10 min, respectively.
- The two videos were taken along the same path to exclude differences in the scene being captured.
- In a future example, scenes with various places/movement conditions should be prepared.

```matlab
RunVideo=VideoReader('Running.mov'); % load the video taken while running
WalkVideo=VideoReader('Walking.mov');% load the video taken while walking
f=figure;
title('Video while Running/Walking');hold on
set(gcf,'Visible','on')
numFrames = 5/(1/RunVideo.FrameRate); % "5" is the duration (in seconds) to show
for i = 1:numFrames
    RunFrame=readFrame(RunVideo);
    WalkFrame=readFrame(WalkVideo);
    imshow([RunFrame,WalkFrame]);
    drawnow
    pause(1/RunVideo.FrameRate)
    hold on
end
hold off
```

![figure_0.png](VideoClassificationExample_images/figure_0.png)

```matlab
% reset the state of the walking/running videos; otherwise the frames already
% read cannot be retrieved again
WalkVideo.CurrentTime=0;
RunVideo.CurrentTime=0;
```
# Read all frames and extract features to save into variables R and W

- Image features are extracted with the pretrained network and converted to single precision before being fed to the LSTM network.
- As feature extraction takes a long time, please load the precalculated variables R and W.
- The function "activations" returns the extracted feature vectors.

```matlab
if (exist('W.mat','file')==2)&&(exist('R.mat','file')==2)
    load W.mat
    load R.mat
else
    RFrames=zeros(224,224,3,RunVideo.NumFrames,'uint8');
    WFrames=zeros(224,224,3,WalkVideo.NumFrames,'uint8');
    for i=1:RunVideo.NumFrames
        RFrames(:,:,:,i)=imresize(readFrame(RunVideo),[224 224]);
        % the video frames must be resized to 224 by 224 since
        % resnet18 only accepts that size.
    end

    for i=1:WalkVideo.NumFrames
        WFrames(:,:,:,i)=imresize(readFrame(WalkVideo),[224 224]);
    end
    R=single(activations(netCNN,RFrames,'pool5','OutputAs','columns'));
    W=single(activations(netCNN,WFrames,'pool5','OutputAs','columns'));
end
```
# Prepare the set of image features of a video clip lasting a few seconds

![image_2.png](VideoClassificationExample_images/image_2.png)


- Video clips whose duration ranges from minDuration to maxDuration (in seconds), as defined below, are extracted.
- You can specify the number of clips to obtain from each video.

```matlab
minDuration=2;
maxDuration=4;
numData=100;
FrameRate=RunVideo.FrameRate;
RData=cell(numData,1);
WData=cell(numData,1);
for i=1:numData
    ClipDuration=randi((maxDuration-minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
    StartingFrameNumRun=randi(RunVideo.NumFrames-(maxDuration+minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
    StartingFrameNumWalk=randi(WalkVideo.NumFrames-(maxDuration+minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
    RData{i}=R(:,StartingFrameNumRun:StartingFrameNumRun+ClipDuration);
    WData{i}=W(:,StartingFrameNumWalk:StartingFrameNumWalk+ClipDuration);
end
```
# Prepare Training Data

Prepare the data for training by partitioning the data into training and validation partitions.

**Create Training and Validation Partitions**

Partition the data. Assign 70% of the data to the training partition and 30% to the validation partition.

```matlab
idx = randperm(numData);
N = floor(0.7 * numData);
sequencesTrainRun = {RData{idx(1:N)}};
sequencesTrainWalk = {WData{idx(1:N)}};
sequencesTrain=cat(2,sequencesTrainRun,sequencesTrainWalk);
labelsTrain=categorical([zeros(N,1);ones(N,1)],[0 1],{'Run','Walk'});

sequencesValidRun = {RData{idx(N+1:end)}};
sequencesValidWalk = {WData{idx(N+1:end)}};
labelsValidation=categorical([zeros(numel(sequencesValidRun),1);ones(numel(sequencesValidWalk),1)],[0 1],{'Run','Walk'});
sequencesValidation=cat(2,sequencesValidRun,sequencesValidWalk);
```
# Create LSTM Network

Create an LSTM network that can classify the sequences of feature vectors representing the videos.

Define the LSTM network architecture. Specify the following network layers.

- A sequence input layer with an input size corresponding to the feature dimension of the feature vectors.
- Here, the dimension of the features extracted with resnet18 is 512, so numFeatures is 512.
- An LSTM layer with 1500 hidden units, followed by a dropout layer. To output only one label for each sequence, set the `'OutputMode'` option of the LSTM layer to `'last'`.
- You may instead use a BiLSTM layer, with which the image sequences are learned in both the forward and backward time directions.
- Two LSTM layers can be stacked in "layers", which might learn more detailed information from the time-series data.
- A fully connected layer with an output size corresponding to the number of classes (here, 2), a softmax layer, and a classification layer.
- If you would like to estimate a numerical value from the time-series data, you can use "regressionLayer" with numerical labels (data) instead of the categorical `labelsTrain`.
- The dropout layer helps prevent the network from overfitting to the training data.
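
The two-layer and BiLSTM alternatives mentioned in the list above could be sketched as follows. This is only a sketch under assumptions: the variable names `layersStacked` and `layersBi` are hypothetical, the hidden-unit counts simply reuse the 1500 from this example, and `numFeatures` is hard-coded to 512 (the resnet18 feature dimension stated above) so that the snippet runs on its own. In the stacked version, the first LSTM layer uses `'OutputMode','sequence'` so that the second layer still receives a sequence.

```matlab
numFeatures = 512;   % assumption: feature dimension of resnet18 'pool5', as stated above
numClasses  = 2;     % Run / Walk

% Variant 1 (hypothetical): two stacked LSTM layers.
% The first layer outputs the full sequence; only the second outputs the last step.
layersStacked = [
    sequenceInputLayer(numFeatures,'Name','sequence')
    lstmLayer(1500,'OutputMode','sequence','Name','lstm1')
    dropoutLayer(0.5,'Name','drop1')
    lstmLayer(1500,'OutputMode','last','Name','lstm2')
    dropoutLayer(0.5,'Name','drop2')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];

% Variant 2 (hypothetical): one bidirectional LSTM layer in place of the LSTM layer.
layersBi = [
    sequenceInputLayer(numFeatures,'Name','sequence')
    bilstmLayer(1500,'OutputMode','last','Name','bilstm')
    dropoutLayer(0.5,'Name','drop')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];
```

Either array can be passed to `trainNetwork` in place of `layers`. Whether it improves the accuracy on this data has not been checked here.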

```matlab
numFeatures = size(R,1);
numClasses = 2;

layers = [
    sequenceInputLayer(numFeatures,'Name','sequence')
    lstmLayer(1500,'OutputMode','last','Name','lstm')
    dropoutLayer(0.5,'Name','drop')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];
```
# Specify Training Options

Specify the training options using the `trainingOptions` function.

- Set a mini-batch size of 16, an initial learning rate of 0.001, and a gradient threshold of 2 (to prevent the gradients from exploding).
- Shuffle the data every epoch.
- Validate the network once per about three epochs.
- Display the training progress in a plot and suppress verbose output.
- Max epochs: 25
- Optimizer: Adam

```matlab
miniBatchSize = 16;
numData = numel(sequencesTrainRun);
numIterationsPerEpoch = floor(numData / miniBatchSize)*3;

options = trainingOptions('adam', ...
    'MiniBatchSize',miniBatchSize, ...
    'MaxEpochs',25, ...
    'InitialLearnRate',1e-3, ...
    'GradientThreshold',2, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{sequencesValidation,labelsValidation}, ...
    'ValidationFrequency',numIterationsPerEpoch, ...
    'Plots','training-progress', ...
    'Verbose',false);
```
# Train LSTM Network with the extracted image features

Train the network using the `trainNetwork` function.

- If you would like to plot training data such as the accuracy and loss, check the values saved in the variable `info`.
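
Once the `trainNetwork` call in this section has returned `info`, the curves can be redrawn from it. The following is a minimal sketch, assuming `info` carries the fields `TrainingAccuracy`, `TrainingLoss`, `ValidationAccuracy`, and `ValidationLoss` that `trainNetwork` reports for classification networks. The function name `plotTrainingInfo` is hypothetical and not part of the original example.

```matlab
function plotTrainingInfo(info)
% plotTrainingInfo  Plot accuracy and loss curves from a trainNetwork info struct.
% Hypothetical helper; save it as plotTrainingInfo.m.
% The validation fields hold NaN at iterations where no validation was run,
% so only their non-NaN entries are plotted.

iter   = 1:numel(info.TrainingLoss);
valIdx = find(~isnan(info.ValidationAccuracy));

figure
subplot(2,1,1)
plot(iter, info.TrainingAccuracy, '-'); hold on
plot(valIdx, info.ValidationAccuracy(valIdx), 'o-'); hold off
xlabel('Iteration'); ylabel('Accuracy')
legend('Training','Validation','Location','best')

subplot(2,1,2)
plot(iter, info.TrainingLoss, '-'); hold on
plot(valIdx, info.ValidationLoss(valIdx), 'o-'); hold off
xlabel('Iteration'); ylabel('Loss')
legend('Training','Validation','Location','best')
end
```

After training, it would be called as `plotTrainingInfo(info)`.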

```matlab
[netLSTM,info] = trainNetwork(sequencesTrain,labelsTrain,layers,options);
```

![figure_1.png](VideoClassificationExample_images/figure_1.png)

Calculate the classification accuracy of the network on the validation set. If the accuracy is satisfactory, prepare test video clips to explore the feasibility of this LSTM network.

```matlab
YPred = classify(netLSTM,sequencesValidation,'MiniBatchSize',miniBatchSize);
accuracy = mean(YPred == labelsValidation)
```
```
accuracy = 0.8667
```
```matlab
% check how balanced the classification is across the two classes.
confusionchart(labelsValidation,YPred)
```

![figure_2.png](VideoClassificationExample_images/figure_2.png)

```
ans = 
  ConfusionMatrixChart with properties:

    NormalizedValues: [2x2 double]
         ClassLabels: [2x1 categorical]

  Show all properties

```
# Things to consider

1. Some clips are difficult to classify from a short stretch of video => the accuracy is likely to increase if each clip is made longer.
1. What if two LSTM layers are used instead of the one LSTM layer in this example?
1. What if a BiLSTM layer is used?

--------------------------------------------------------------------------------