├── LICENSE
├── R.mat
├── Running.mov
├── VideoClassificationExample.mlx
├── VideoClassificationExample_images
    ├── figure_0.png
    ├── figure_1.png
    ├── figure_2.png
    ├── image_0.png
    ├── image_1.png
    └── image_2.png
├── W.mat
├── Walking.mov
├── plainCode
    └── VideoClassificationExample.m
└── readme.md


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 giants19
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/R.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/R.mat


--------------------------------------------------------------------------------
/Running.mov:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/Running.mov


--------------------------------------------------------------------------------
/VideoClassificationExample.mlx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample.mlx


--------------------------------------------------------------------------------
/VideoClassificationExample_images/figure_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/figure_0.png


--------------------------------------------------------------------------------
/VideoClassificationExample_images/figure_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/figure_1.png


--------------------------------------------------------------------------------
/VideoClassificationExample_images/figure_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/figure_2.png


--------------------------------------------------------------------------------
/VideoClassificationExample_images/image_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/image_0.png


--------------------------------------------------------------------------------
/VideoClassificationExample_images/image_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/image_1.png


--------------------------------------------------------------------------------
/VideoClassificationExample_images/image_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/VideoClassificationExample_images/image_2.png


--------------------------------------------------------------------------------
/W.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/W.mat


--------------------------------------------------------------------------------
/Walking.mov:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KentaItakura/video_classification_LSTM_matlab/6160c42a2884a9c2ae14e14c61e7a205d81f561d/Walking.mov


--------------------------------------------------------------------------------
/plainCode/VideoClassificationExample.m:
--------------------------------------------------------------------------------
  1 | %% Running/Walking Classification with Video Clips using LSTM 
  2 | % 
  3 | % 
  4 | % I am very grateful for the free images from <https://www.irasutoya.com/ https://www.irasutoya.com/>
  5 | % 
  6 | % Video classification and recognition are becoming more relevant as more video 
  7 | % data accumulates and videos can easily be recorded with, for example, smartphones. 
  8 | % In this example, video clips taken while running and walking are classified 
  9 | % as running or walking using a deep 
 10 | % learning-based technique called LSTM (Long Short-Term Memory), which can classify 
 11 | % time-series data. 
 12 | % 
 13 | % This example is based on the official MathWorks example located 
 14 | % at 
 15 | % 
 16 | % <https://jp.mathworks.com/help/deeplearning/examples/classify-videos-using-deep-learning.html 
 17 | % https://jp.mathworks.com/help/deeplearning/examples/classify-videos-using-deep-learning.html>
 18 | % 
 19 | % While the official example requires downloading a dataset of about 2 GB, this 
 20 | % example uses a small amount of data, which makes it easy to try. 
 21 | % Note that this is just an example of LSTM with images; please 
 22 | % refer to the official example for further study.  
 23 | %% Load Pretrained Convolutional Network
 24 | %% 
 25 | % * A pretrained network, ResNet-18, is used for feature extraction in 
 26 | % this example.
 27 | % * The features extracted by the pretrained network are fed into the LSTM layer 
 28 | % as shown below. 
 29 | % * Other networks such as googlenet, resnet50, and mobilenetv2 are available. 
 30 | % * You may try another network if the final accuracy is not high enough.
 31 | %% 
 32 | % 
 33 | 
 34 | clear;clc;close all
 35 | % If you have not downloaded the pretrained network resnet18, please install it
 36 | % from the Add-On Explorer. 
 37 | netCNN = resnet18;
 38 | %% Load Data
 39 | %% 
 40 | % * The video clips for classification were taken from videos recorded 
 41 | % while running and walking, which lasted about 5 min and 10 min, respectively. 
 42 | % * Both videos were taken along the same path so that the scene does not 
 43 | % differ between the two classes.
 44 | % * A future example should include scenes from various places and moving 
 45 | % conditions.    
 46 | 
 47 | RunVideo=VideoReader('Running.mov'); % load the video taken while running
 48 | WalkVideo=VideoReader('Walking.mov');% load the video taken while walking
 49 | f=figure;
 50 | title('Video while Running/Walking');hold on
 51 | set(gcf,'Visible','on')
 52 | numFrames = round(5*RunVideo.FrameRate); % "5" is the duration (in seconds) to show 
 53 | for i = 1:numFrames
 54 |     RunFrame=readFrame(RunVideo);
 55 |     WalkFrame=readFrame(WalkVideo);
 56 |     imshow([RunFrame,WalkFrame]);
 57 |     drawnow
 58 |     pause(1/RunVideo.FrameRate)
 59 |     hold on
 60 | end
 61 | hold off
 62 | % Reset the read position of the walking/running videos; otherwise the frames
 63 | % already read would be skipped below.
 64 | WalkVideo.CurrentTime=0;
 65 | RunVideo.CurrentTime=0;
 66 | %% Read all frames and extract features to save into variables R and W
 67 | %% 
 68 | % * Image features are extracted with the pretrained network and converted to 
 69 | % type single before being fed to the LSTM network.
 70 | % * Because feature extraction takes a long time, please load the precomputed 
 71 | % variables R and W. 
 72 | % * The function "activations" returns the extracted feature vectors. 
 73 | 
 74 | if (exist('W.mat','file')==2)&&(exist('R.mat','file')==2)
 75 |     load W.mat
 76 |     load R.mat
 77 | else
 78 |     RFrames=zeros(224,224,3,RunVideo.NumFrames,'uint8');
 79 |     WFrames=zeros(224,224,3,WalkVideo.NumFrames,'uint8');
 80 |     for i=1:RunVideo.NumFrames
 81 |         RFrames(:,:,:,i)=imresize(readFrame(RunVideo),[224 224]);
 82 |         % the video frames are resized to 224 by 224 because that is the
 83 |         % input size of resnet18. 
 84 |     end
 85 |     
 86 |     for i=1:WalkVideo.NumFrames
 87 |         WFrames(:,:,:,i)=imresize(readFrame(WalkVideo),[224 224]);
 88 |     end
 89 |     R=single(activations(netCNN,RFrames,'pool5','OutputAs','columns'));
 90 |     W=single(activations(netCNN,WFrames,'pool5','OutputAs','columns'));
 91 | end
 92 | %% Prepare the set of image features of a video clip lasting a few seconds
 93 | % 
 94 | %% 
 95 | % * Video clips with a duration between minDuration and maxDuration (in seconds), as defined 
 96 | % below, are extracted. 
 97 | % * You can specify the number of clips to extract from each video. 
 98 | 
 99 | minDuration=2;
100 | maxDuration=4;
101 | numData=100;
102 | FrameRate=RunVideo.FrameRate;
103 | RData=cell(numData,1);
104 | WData=cell(numData,1);
105 | for i=1:numData
106 |     ClipDuration=randi((maxDuration-minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
107 |     StartingFrameNumRun=randi(RunVideo.NumFrames-(maxDuration+minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
108 |     StartingFrameNumWalk=randi(WalkVideo.NumFrames-(maxDuration+minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
109 |     RData{i}=R(:,StartingFrameNumRun:StartingFrameNumRun+ClipDuration);
110 |     WData{i}=W(:,StartingFrameNumWalk:StartingFrameNumWalk+ClipDuration);
111 | end
112 | %% Prepare Training Data
113 | % Prepare the data for training by partitioning the data into training and validation 
114 | % partitions.
115 | % 
116 | % *Create Training and Validation Partitions*
117 | % 
118 | % Partition the data. Assign 70% of the data to the training partition and 30% 
119 | % to the validation partition.
120 | 
121 | idx = randperm(numData);
122 | N = floor(0.7 * numData);
123 | sequencesTrainRun = {RData{idx(1:N)}};
124 | sequencesTrainWalk = {WData{idx(1:N)}};
125 | sequencesTrain=cat(2,sequencesTrainRun,sequencesTrainWalk);
126 | labelsTrain=categorical([zeros(N,1);ones(N,1)],[0 1],{'Run','Walk'});
127 | 
128 | sequencesValidRun = {RData{idx(N+1:end)}};
129 | sequencesValidWalk = {WData{idx(N+1:end)}};
130 | labelsValidation=categorical([zeros(numel(sequencesValidRun),1);ones(numel(sequencesValidWalk),1)],[0 1],{'Run','Walk'});
131 | sequencesValidation=cat(2,sequencesValidRun,sequencesValidWalk);
132 | %% Create LSTM Network
133 | % Create an LSTM network that can classify the sequences of feature vectors 
134 | % representing the videos.
135 | % 
136 | % Define the LSTM network architecture. Specify the following network layers.
137 | %% 
138 | % * A sequence input layer with an input size corresponding to the feature dimension 
139 | % of the feature vectors.
140 | % * Here, the dimension of the features extracted with resnet18 is 512, so 
141 | % numFeatures is 512. 
142 | % * An LSTM layer with 1500 hidden units, followed by a dropout layer. To output 
143 | % only one label for each sequence, set the |'OutputMode'| option of the 
144 | % LSTM layer to |'last'|.
145 | % * You may use a BiLSTM layer instead, which learns the image sequences 
146 | % in both the forward and backward time directions. 
147 | % * Two LSTM layers can be stacked in "layers", which might capture more detailed 
148 | % information in the time-series data. 
149 | % * A fully connected layer with an output size corresponding to the number 
150 | % of classes (here, 2), a softmax layer, and a classification layer.
151 | % * If you would like to estimate a numeric value from the time-series data, 
152 | % you can use "regressionLayer" with numeric labels instead of the 
153 | % categorical "|labelsTrain|". 
154 | % * The dropout layer helps prevent the network from overfitting 
155 | % the training data. 
156 | 
157 | numFeatures = size(R,1);
158 | numClasses = 2;
159 | 
160 | layers = [
161 |     sequenceInputLayer(numFeatures,'Name','sequence')
162 |     lstmLayer(1500,'OutputMode','last','Name','lstm')
163 |     dropoutLayer(0.5,'Name','drop')
164 |     fullyConnectedLayer(numClasses,'Name','fc')
165 |     softmaxLayer('Name','softmax')
166 |     classificationLayer('Name','classification')];
167 | %% Specify Training Options
168 | % Specify the training options using the |trainingOptions| function.
169 | %% 
170 | % * Set a mini-batch size of 16, an initial learning rate of 0.001, and a gradient 
171 | % threshold of 2 (to prevent the gradients from exploding).
172 | % * Shuffle the data every epoch.
173 | % * Validate the network about once every three epochs.
174 | % * Display the training progress in a plot and suppress verbose output.
175 | % * Max epochs: 25
176 | % * Optimizer: adam
177 | 
178 | miniBatchSize = 16;
179 | numTrain = numel(sequencesTrain); % running and walking sequences together
180 | validationFrequency = floor(numTrain / miniBatchSize)*3; % iterations in three epochs
181 | 
182 | options = trainingOptions('adam', ...
183 |     'MiniBatchSize',miniBatchSize, ...
184 |     'MaxEpochs',25, ...
185 |     'InitialLearnRate',1e-3, ...
186 |     'GradientThreshold',2, ...
187 |     'Shuffle','every-epoch', ...
188 |     'ValidationData',{sequencesValidation,labelsValidation}, ...
189 |     'ValidationFrequency',validationFrequency, ...
190 |     'Plots','training-progress', ...
191 |     'Verbose',false);
192 | %% Train LSTM Network with the extracted image features
193 | % Train the network using the |trainNetwork| function. 
194 | %% 
195 | % * If you would like to plot training metrics such as the accuracy 
196 | % and loss, check the values saved in the variable |info|. 
197 | 
198 | [netLSTM,info] = trainNetwork(sequencesTrain,labelsTrain,layers,options);
199 | %% 
200 | % Calculate the classification accuracy of the network on the validation set.  
201 | % If the accuracy is satisfactory, prepare separate test video clips to 
202 | % evaluate how well this LSTM network generalizes. 
203 | 
204 | YPred = classify(netLSTM,sequencesValidation,'MiniBatchSize',miniBatchSize);
205 | accuracy = mean(YPred == labelsValidation)
206 | % Check how the errors are distributed between the two classes. 
207 | confusionchart(labelsValidation,YPred)
208 | %% Things to consider 
209 | %% 
210 | % # Some clips are difficult to classify because they are short => 
211 | % the accuracy is likely to increase if each clip is longer.
212 | % # What if two LSTM layers are used instead of the single LSTM layer in this example?
213 | % # What if a BiLSTM layer is used?


--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
  1 | [![View Video classification using LSTM（LSTMによる動画の分類） on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://jp.mathworks.com/matlabcentral/fileexchange/74402-video-classification-using-lstm-lstm)
  2 | 
  3 | # Running/Walking Classification with Video Clips using LSTM 
  4 | This is a simple example of video classification using LSTM with MATLAB.
  5 | 
  6 | [English]  
  8 | Please run the code named VideoClassificationExample.
  9 | This example is based on the official MathWorks example located [here](https://jp.mathworks.com/help/deeplearning/examples/classify-videos-using-deep-learning.html). While the official example requires downloading a dataset of about 2 GB, this example uses
 10 | a small amount of data, which makes it easy to try.
 11 | Note that this is just an example of LSTM with images; please refer to the official example for further study.
 12 | I am grateful for the free pictures used in the thumbnail and the live script, obtained from this [page](https://www.irasutoya.com/).  
 13 | <br>  
 14 | 
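 15 | [Translated from Japanese]  
 16 | This example classifies video using deep learning. It predicts whether a person is walking or running from video recorded by a camera mounted on that person's head. Video frames are the input, and a pretrained network extracts features from them. An LSTM then classifies the sequence of features. Many examples of still-image classification are available, but the MATLAB documentation contained few examples that take a video as input and classify what a few seconds of that video shows. The official documentation does include an example, but it requires downloading a 2 GB dataset, so the download and computation take a long time and it is not well suited to a quick trial. I hope this is helpful.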
 17 | 
 18 | [References]
 19 | [1] Matlab Official Documentation: [Classify Videos Using Deep Learning](https://jp.mathworks.com/help/deeplearning/ug/classify-videos-using-deep-learning.html)    
 20 | [2] [Irasutoya](https://www.irasutoya.com) : images in the script were obtained from this website  
 21 | 
 24 | 
 25 | 
 26 | 
 27 | 
 28 | 
 29 | 
 30 | ![image_0.png](VideoClassificationExample_images/image_0.png)
 31 | 
 32 | 
 33 | # Load Pretrained Convolutional Network
 34 | 
 35 |    -  A pretrained network, ResNet-18, is used for feature extraction in this example. 
 36 |    -  The features extracted by the pretrained network are fed into the LSTM layer as shown below.  
 37 |    -  Other networks such as googlenet, resnet50, and mobilenetv2 are available.  
 38 |    -  You may try another network if the final accuracy is not high enough. 
 39 | 
 40 | 
 41 | ![image_1.png](VideoClassificationExample_images/image_1.png)
 42 | 
 43 | 
 44 | ```matlab
 45 | clear;clc;close all
 46 | % If you have not downloaded the pretrained network resnet18, please install it
 47 | % from the Add-On Explorer. 
 48 | netCNN = resnet18;
 49 | ```
 50 | # Load Data
 51 | 
 52 |    -  The video clips for classification were taken from videos recorded while running and walking, which lasted about 5 min and 10 min, respectively.  
 53 |    -  Both videos were taken along the same path so that the scene does not differ between the two classes. 
 54 |    -  A future example should include scenes from various places and moving conditions.     
 55 | 
 56 | ```matlab
 57 | RunVideo=VideoReader('Running.mov'); % load the video taken while running
 58 | WalkVideo=VideoReader('Walking.mov');% load the video taken while walking
 59 | f=figure;
 60 | title('Video while Running/Walking');hold on
 61 | set(gcf,'Visible','on')
 62 | numFrames = round(5*RunVideo.FrameRate); % "5" is the duration (in seconds) to show 
 63 | for i = 1:numFrames
 64 |     RunFrame=readFrame(RunVideo);
 65 |     WalkFrame=readFrame(WalkVideo);
 66 |     imshow([RunFrame,WalkFrame]);
 67 |     drawnow
 68 |     pause(1/RunVideo.FrameRate)
 69 |     hold on
 70 | end
 71 | hold off
 72 | ```
 73 | 
 74 | ![figure_0.png](VideoClassificationExample_images/figure_0.png)
 75 | 
 76 | ```matlab
 77 | % Reset the read position of the walking/running videos; otherwise the frames
 78 | % already read would be skipped below.
 79 | WalkVideo.CurrentTime=0;
 80 | RunVideo.CurrentTime=0;
 81 | ```
 82 | # Read all frames and extract features to save into variables R and W
 83 | 
 84 |    -  Image features are extracted with the pretrained network and converted to type single before being fed to the LSTM network. 
 85 |    -  Because feature extraction takes a long time, please load the precomputed variables R and W.  
 86 |    -  The function "activations" returns the extracted feature vectors.  
 87 | 
 88 | ```matlab
 89 | if (exist('W.mat','file')==2)&&(exist('R.mat','file')==2)
 90 |     load W.mat
 91 |     load R.mat
 92 | else
 93 |     RFrames=zeros(224,224,3,RunVideo.NumFrames,'uint8');
 94 |     WFrames=zeros(224,224,3,WalkVideo.NumFrames,'uint8');
 95 |     for i=1:RunVideo.NumFrames
 96 |         RFrames(:,:,:,i)=imresize(readFrame(RunVideo),[224 224]);
 97 |         % the video frames are resized to 224 by 224 because that is the
 98 |         % input size of resnet18. 
 99 |     end
100 |     
101 |     for i=1:WalkVideo.NumFrames
102 |         WFrames(:,:,:,i)=imresize(readFrame(WalkVideo),[224 224]);
103 |     end
104 |     R=single(activations(netCNN,RFrames,'pool5','OutputAs','columns'));
105 |     W=single(activations(netCNN,WFrames,'pool5','OutputAs','columns'));
106 | end
107 | ```
108 | # Prepare the set of image features of a video clip lasting a few seconds
109 | 
110 | ![image_2.png](VideoClassificationExample_images/image_2.png)
111 | 
112 | 
113 |    -  Video clips with a duration between minDuration and maxDuration (in seconds), as defined below, are extracted.  
114 |    -  You can specify the number of clips to extract from each video.  
115 | 
116 | ```matlab
117 | minDuration=2;
118 | maxDuration=4;
119 | numData=100;
120 | FrameRate=RunVideo.FrameRate;
121 | RData=cell(numData,1);
122 | WData=cell(numData,1);
123 | for i=1:numData
124 |     ClipDuration=randi((maxDuration-minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
125 |     StartingFrameNumRun=randi(RunVideo.NumFrames-(maxDuration+minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
126 |     StartingFrameNumWalk=randi(WalkVideo.NumFrames-(maxDuration+minDuration)*FrameRate,[1 1])+minDuration*FrameRate;
127 |     RData{i}=R(:,StartingFrameNumRun:StartingFrameNumRun+ClipDuration);
128 |     WData{i}=W(:,StartingFrameNumWalk:StartingFrameNumWalk+ClipDuration);
129 | end
130 | ```
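
`randi` only accepts integer bounds, so the clip sampling depends on the frame rate being a whole number. The following is a minimal sketch, not the author's method, that rounds the durations to whole frames before sampling. `frameRate` and `numFramesR` are hypothetical values chosen for illustration, and unlike the code above the sketch allows a clip to start at the first frame.

```matlab
% Hypothetical values for illustration; in the example these would come
% from RunVideo.FrameRate and size(R,2).
frameRate   = 29.97;
numFramesR  = 9000;
minDuration = 2;   % seconds
maxDuration = 4;   % seconds

minLen = round(minDuration*frameRate);   % shortest clip, in frames
maxLen = round(maxDuration*frameRate);   % longest clip, in frames

numClips  = 100;
clipRange = zeros(numClips,2);           % [firstFrame lastFrame] per clip
for k = 1:numClips
    clipLen    = randi([minLen maxLen]);            % clip length in frames
    firstFrame = randi(numFramesR - clipLen + 1);   % any start where the clip still fits
    clipRange(k,:) = [firstFrame, firstFrame + clipLen - 1];
end

% Every clip lies inside the video and has a valid length.
clipLens = clipRange(:,2) - clipRange(:,1) + 1;
assert(all(clipRange(:,1) >= 1) && all(clipRange(:,2) <= numFramesR))
assert(all(clipLens >= minLen & clipLens <= maxLen))
```

With the real data, `R(:,clipRange(k,1):clipRange(k,2))` would give the feature sequence for clip `k`.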
131 | # Prepare Training Data
132 | 
133 | 
134 | Prepare the data for training by partitioning the data into training and validation partitions.
135 | 
136 | 
137 | 
138 | 
139 | **Create Training and Validation Partitions**
140 | 
141 | 
142 | 
143 | 
144 | Partition the data. Assign 70% of the data to the training partition and 30% to the validation partition.
145 | 
146 | 
147 | ```matlab
148 | idx = randperm(numData);
149 | N = floor(0.7 * numData);
150 | sequencesTrainRun = {RData{idx(1:N)}};
151 | sequencesTrainWalk = {WData{idx(1:N)}};
152 | sequencesTrain=cat(2,sequencesTrainRun,sequencesTrainWalk);
153 | labelsTrain=categorical([zeros(N,1);ones(N,1)],[0 1],{'Run','Walk'});
154 | 
155 | sequencesValidRun = {RData{idx(N+1:end)}};
156 | sequencesValidWalk = {WData{idx(N+1:end)}};
157 | labelsValidation=categorical([zeros(numel(sequencesValidRun),1);ones(numel(sequencesValidWalk),1)],[0 1],{'Run','Walk'});
158 | sequencesValidation=cat(2,sequencesValidRun,sequencesValidWalk);
159 | ```
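
The split can be checked without the real features. The sketch below uses dummy stand-ins for `RData` and `WData` (random-length matrices of zeros and ones, an assumption for illustration only) and verifies that the two partitions share no clip index and that each sequence has exactly one label.

```matlab
numData = 100;   % clips per class, as in the example

% Dummy stand-ins for RData/WData: each cell holds a 512-by-T feature matrix.
clipLens = randi([60 120],numData,1);
RData = arrayfun(@(t) zeros(512,t,'single'), clipLens, 'UniformOutput', false);
WData = arrayfun(@(t) ones(512,t,'single'),  clipLens, 'UniformOutput', false);

idx      = randperm(numData);
N        = floor(0.7*numData);
trainIdx = idx(1:N);
validIdx = idx(N+1:end);
numValid = numel(validIdx);

sequencesTrain      = [RData(trainIdx); WData(trainIdx)];   % 2N-by-1 cell
sequencesValidation = [RData(validIdx); WData(validIdx)];

labelsTrain      = categorical([zeros(N,1); ones(N,1)], [0 1], {'Run','Walk'});
labelsValidation = categorical([zeros(numValid,1); ones(numValid,1)], [0 1], {'Run','Walk'});

% No clip index is shared between the two partitions.
assert(isempty(intersect(trainIdx, validIdx)))
% One label per sequence in each partition.
assert(numel(sequencesTrain) == numel(labelsTrain))
assert(numel(sequencesValidation) == numel(labelsValidation))
```

Because the clips are random windows cut from the same two videos, a training clip and a validation clip can still overlap in time even though their indices differ, so the validation accuracy may be somewhat optimistic.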
160 | # Create LSTM Network
161 | 
162 | 
163 | Create an LSTM network that can classify the sequences of feature vectors representing the videos.
164 | 
165 | 
166 | 
167 | 
168 | Define the LSTM network architecture. Specify the following network layers.
169 | 
170 | 
171 | 
172 |    -  A sequence input layer with an input size corresponding to the feature dimension of the feature vectors. 
173 |    -  Here, the dimension of the features extracted with resnet18 is 512, so numFeatures is 512.  
174 |    -  An LSTM layer with 1500 hidden units, followed by a dropout layer. To output only one label for each sequence, set the `'OutputMode'` option of the LSTM layer to `'last'`. 
175 |    -  You may use a BiLSTM layer instead, which learns the image sequences in both the forward and backward time directions.  
176 |    -  Two LSTM layers can be stacked in "layers", which might capture more detailed information in the time-series data.  
177 |    -  A fully connected layer with an output size corresponding to the number of classes (here, 2), a softmax layer, and a classification layer. 
178 |    -  If you would like to estimate a numeric value from the time-series data, you can use "regressionLayer" with numeric labels instead of the categorical "`labelsTrain`".  
179 |    -  The dropout layer helps prevent the network from overfitting the training data.  
180 | 
181 | ```matlab
182 | numFeatures = size(R,1);
183 | numClasses = 2;
184 | 
185 | layers = [
186 |     sequenceInputLayer(numFeatures,'Name','sequence')
187 |     lstmLayer(1500,'OutputMode','last','Name','lstm')
188 |     dropoutLayer(0.5,'Name','drop')
189 |     fullyConnectedLayer(numClasses,'Name','fc')
190 |     softmaxLayer('Name','softmax')
191 |     classificationLayer('Name','classification')];
192 | ```
193 | # Specify Training Options
194 | 
195 | 
196 | Specify the training options using the `trainingOptions` function.
197 | 
198 | 
199 | 
200 |    -  Set a mini-batch size of 16, an initial learning rate of 0.001, and a gradient threshold of 2 (to prevent the gradients from exploding). 
201 |    -  Shuffle the data every epoch. 
202 |    -  Validate the network about once every three epochs. 
203 |    -  Display the training progress in a plot and suppress verbose output. 
204 |    -  Max epochs: 25 
205 |    -  Optimizer: adam 
206 | 
207 | ```matlab
208 | miniBatchSize = 16;
209 | numTrain = numel(sequencesTrain); % running and walking sequences together
210 | validationFrequency = floor(numTrain / miniBatchSize)*3; % iterations in three epochs
211 | 
212 | options = trainingOptions('adam', ...
213 |     'MiniBatchSize',miniBatchSize, ...
214 |     'MaxEpochs',25, ...
215 |     'InitialLearnRate',1e-3, ...
216 |     'GradientThreshold',2, ...
217 |     'Shuffle','every-epoch', ...
218 |     'ValidationData',{sequencesValidation,labelsValidation}, ...
219 |     'ValidationFrequency',validationFrequency, ...
220 |     'Plots','training-progress', ...
221 |     'Verbose',false);
222 | ```
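
`'ValidationFrequency'` is counted in iterations, and one epoch covers the training sequences of both classes. The arithmetic below is a small sketch of how "once every three epochs" translates into iterations for this data size; the variable names are illustrative and the counts assume the 70/30 split of 100 clips per class used in this example.

```matlab
numTrainPerClass = 70;                    % floor(0.7*100) clips per class
numTrainTotal    = 2*numTrainPerClass;    % running + walking sequences
miniBatchSize    = 16;

% The incomplete final mini-batch of an epoch is dropped, hence floor.
iterationsPerEpoch  = floor(numTrainTotal/miniBatchSize);   % 8
validationFrequency = 3*iterationsPerEpoch;                 % 24 iterations = 3 epochs

% Counting only one class would give floor(70/16)*3 = 12 iterations,
% which corresponds to 1.5 epochs rather than 3.
perClassFrequency = floor(numTrainPerClass/miniBatchSize)*3;

fprintf('%d iterations per epoch, validate every %d iterations\n', ...
    iterationsPerEpoch, validationFrequency);
```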
223 | # Train LSTM Network with the extracted image features
224 | 
225 | 
226 | Train the network using the `trainNetwork` function. 
227 | 
228 | 
229 | 
230 |    -  If you would like to plot training metrics such as the accuracy and loss, check the values saved in the variable `info`.  
231 | 
232 | ```matlab
233 | [netLSTM,info] = trainNetwork(sequencesTrain,labelsTrain,layers,options);
234 | ```
235 | 
236 | ![figure_1.png](VideoClassificationExample_images/figure_1.png)
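
The values in `info` can also be plotted by hand. The sketch below assumes `info` has `TrainingAccuracy` and `ValidationAccuracy` fields with one entry per iteration, and that the validation entries are `NaN` at iterations where no validation ran. The struct built here is a hypothetical stand-in with made-up numbers, not the output of this example.

```matlab
% Hypothetical stand-in for the info struct returned by trainNetwork.
numIter = 40;
info = struct();
info.TrainingAccuracy   = linspace(50,95,numIter);
info.ValidationAccuracy = nan(1,numIter);
info.ValidationAccuracy(8:8:end) = linspace(60,88,5);   % validated every 8 iterations

iter      = 1:numel(info.TrainingAccuracy);
validated = ~isnan(info.ValidationAccuracy);   % iterations where validation ran

figure
plot(iter, info.TrainingAccuracy, '-'); hold on
plot(iter(validated), info.ValidationAccuracy(validated), 'o--');
hold off
xlabel('Iteration'); ylabel('Accuracy (%)');
legend('Training','Validation','Location','southeast');
```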
237 | 
238 | 
239 | 
240 | Calculate the classification accuracy of the network on the validation set.  If the accuracy is satisfactory, prepare separate test video clips to evaluate how well this LSTM network generalizes. 
241 | 
242 | 
243 | ```matlab
244 | YPred = classify(netLSTM,sequencesValidation,'MiniBatchSize',miniBatchSize);
245 | accuracy = mean(YPred == labelsValidation)
246 | ```
247 | ```
248 | accuracy = 0.8667
249 | ```
250 | ```matlab
251 | % Check how the errors are distributed between the two classes. 
252 | confusionchart(labelsValidation,YPred)
253 | ```
254 | 
255 | ![figure_2.png](VideoClassificationExample_images/figure_2.png)
256 | 
257 | ```
258 | ans = 
259 |   ConfusionMatrixChart with properties:
260 | 
261 |     NormalizedValues: [2x2 double]
262 |          ClassLabels: [2x1 categorical]
263 | 
264 |   Show all properties
265 | 
266 | ```
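
The balance between the classes can also be read off numerically. The sketch below computes the overall accuracy and the per-class recall from two categorical label vectors. The labels are hypothetical and chosen only to make the arithmetic easy to follow; they are not results of this example.

```matlab
% Hypothetical labels for illustration (not the results of this example).
labelsValidation = categorical({'Run';'Run';'Run';'Walk';'Walk';'Walk'});
YPred            = categorical({'Run';'Walk';'Run';'Walk';'Walk';'Run'});

classes = categories(labelsValidation);
recall  = zeros(numel(classes),1);
for c = 1:numel(classes)
    isClass   = labelsValidation == classes{c};      % clips whose true label is this class
    recall(c) = mean(YPred(isClass) == classes{c});  % fraction of those predicted correctly
end
accuracy = mean(YPred == labelsValidation);

for c = 1:numel(classes)
    fprintf('%s recall: %.3f\n', classes{c}, recall(c));
end
fprintf('accuracy: %.3f\n', accuracy);
```

A large gap between the two recall values would indicate that the network favors one class even when the overall accuracy looks acceptable.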
267 | # Things to consider 
268 | 
269 |    1.  Some clips are difficult to classify because they are short => the accuracy is likely to increase if each clip is longer. 
270 |    1.  What if two LSTM layers are used instead of the single LSTM layer in this example? 
271 |    1.  What if a BiLSTM layer is used?  
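
The last two questions can be explored by swapping the layer array. The following is a sketch of two possible variants, not a tested recommendation. It requires Deep Learning Toolbox, and the 750 hidden units of the second LSTM layer are an arbitrary assumption.

```matlab
numFeatures = 512;   % pool5 feature size of resnet18, as in the example
numClasses  = 2;

% Variant A: one bidirectional LSTM layer.
layersBiLSTM = [
    sequenceInputLayer(numFeatures,'Name','sequence')
    bilstmLayer(1500,'OutputMode','last','Name','bilstm')
    dropoutLayer(0.5,'Name','drop')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];

% Variant B: two stacked LSTM layers. The first layer outputs the full
% sequence so that the second still receives one vector per frame.
layersStacked = [
    sequenceInputLayer(numFeatures,'Name','sequence')
    lstmLayer(1500,'OutputMode','sequence','Name','lstm1')
    dropoutLayer(0.5,'Name','drop1')
    lstmLayer(750,'OutputMode','last','Name','lstm2')   % 750 is an assumption
    dropoutLayer(0.5,'Name','drop2')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];
```

Either array can be passed to `trainNetwork` in place of `layers`, with the same training options.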
272 | 
273 | 


--------------------------------------------------------------------------------