├── LICENSE
├── README.md
├── char_rnns
    ├── README.md
    ├── dino_names.txt
    ├── notebooks
    │   ├── Dino_Names_GRU.ipynb
    │   ├── Dino_Names_LSTM.ipynb
    │   ├── Dino_Names_RNN.ipynb
    │   └── Dino_Names_Scratch.ipynb
    └── plots
    │   ├── Dino_Names_GRU.jpeg
    │   ├── Dino_Names_LSTM.jpeg
    │   ├── Dino_Names_RNN.jpeg
    │   └── Dino_Names_Scratch.jpeg
├── neural_machine_translation
    ├── README.md
    ├── notebooks
    │   ├── Attention_Is_All_You_Need.ipynb
    │   ├── Conv_Seq2Seq.ipynb
    │   ├── Seq2Seq.ipynb
    │   └── Seq2Seq_with_Attention.ipynb
    └── plots
    │   ├── Conv_Seq2Seq.jpeg
    │   ├── Seq2Seq.jpeg
    │   ├── Seq2Seq_with_Attention.jpeg
    │   └── Transformer.jpeg
├── text_classification
    ├── README.md
    ├── notebooks
    │   ├── BERT.ipynb
    │   ├── CNN.ipynb
    │   ├── FastText.ipynb
    │   └── LSTM.ipynb
    └── plots
    │   ├── BERT.png
    │   ├── CNN.png
    │   ├── FastText.png
    │   └── LSTM.png
└── word_rnn
    ├── Book 1 - The Philosopher's Stone.txt
    ├── Main_Paragraph_generation_on_larger_dataset_lstm (1).ipynb
    ├── README.md
    ├── ezgif.com-gif-maker.gif
    ├── paragraph generation loss gru.png
    ├── paragraph generation loss lstm.png
    ├── paragraph generation loss rnn.png
    ├── wordRNN_paragraph_gru2.pth
    ├── wordRNN_paragraph_lstm2.pth
    └── wordRNN_paragraph_rnn2.pth


/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2020 IvLabs
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Natural Language Processing
2 | This is the Natural Language Processing repository of IvLabs. It contains implementations of various architectures, starting from character-level RNNs built from scratch, up to and including the Transformer architecture.
3 | Further, we have also included a rough roadmap for enthusiasts with basic knowledge of Machine/Deep Learning.
4 | 
5 | We have implemented and compared the following architectures:
6 | - [x] [Character Level RNN (Char RNN)](char_rnns)
7 |   - [x] From scratch
8 |   - [x] Vanilla RNN
9 |   - [x] LSTM
10 |   - [x] GRU
11 | 
12 | - [x] [Language Models (Word RNN)](word_rnn)
13 |   - [x] Vanilla RNN
14 |   - [x] LSTM
15 |   - [x] GRU
16 | 
17 | - [x] [Neural Machine Translation](neural_machine_translation)\
18 | For Neural Machine Translation, we have implemented the following papers.
19 |   - [x] [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215)
20 |   - [x] [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473)
21 |   - [x] [Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122)
22 |   - [x] [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
23 | 
24 | 
25 |
26 | 
27 | ### Contributors:
28 | * [Rishika Bhagwatkar](https://github.com/rishika2110)
29 | * [Khurshed P. Fitter](https://github.com/GlazeDonuts)
30 | * [Aneesh A. Shetye](https://github.com/aneesh-shetye)
31 | * [Diksha Bagade](https://github.com/Diksha942)
32 | * [Kshitij Ambilduke](https://github.com/Kshitij-Ambilduke)
33 | * [Ajinkya Deshpande](https://github.com/AjinkyaDeshpande39)
34 | * [Prayash Swain](https://github.com/SprayashB)
35 | * [Thanmay Jayakumar](https://github.com/ThanmayJ)
36 | * [Fauzan Farooqui](https://github.com/FauzanFarooqui)
37 | 
38 | 
--------------------------------------------------------------------------------
/char_rnns/README.md:
--------------------------------------------------------------------------------
1 | # Character Level RNN (Char RNN)
2 | The fundamental basis of any language is its alphabet. Hence, character-level RNNs are the most basic form of recurrent neural networks, and we have implemented them from scratch. We have also used the RNN, LSTM and GRU layers available in PyTorch for comparison.
3 | 
4 | All the models are trained on the [Dino Names](dino_names.txt) dataset, which contains 1536 dinosaur names. The data split and training parameters are listed below.
5 | 
6 | 
7 | | Parameter | Value |
8 | | ----------------- |:------------------:|
9 | | Training Set | 95% (1460/1536) |
10 | | Testing Set | 3% (46/1536) |
11 | | Validation Set | 2% (30/1536) |
12 | | Number of Epochs | 40 |
13 | | Learning Rate | 4×10⁻⁴ |
14 | | Hidden Dimensions | 64 |
15 | | Loss Function | Cross Entropy Loss |
16 | | Optimizer | AdamW |
17 | 
18 | 
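The split sizes in the table can be reproduced with a simple shuffle-and-slice. The sketch below is only an illustration: the function name `split_names`, the seed, and the placeholder data are assumptions, and the notebooks may split the data differently.

```python
import random


def split_names(names, n_val=30, n_test=46, seed=0):
    """Shuffle the names and slice them into train/validation/test subsets."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# Placeholder data standing in for the 1536 names of the dataset.
names = [f"name{i}" for i in range(1536)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))  # 1460 30 46
```

Fixing the seed keeps the three subsets identical across runs, which makes the losses of the different models comparable.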
19 | 
20 | ## General Pipeline
21 | 1. Every character in the vocabulary (including the delimiter ".") is embedded using a one-hot vector representation.
22 | 2. A sequence of embeddings is fed to the model. This typically contains the embeddings of all the characters of a word, except the delimiter.
23 | 3. The output of the model is compared against the ground truth, i.e. a sequence of embeddings of all the characters of the word (including the delimiter) except the first.
24 | 4. The training and validation losses are calculated and stochastic optimization is applied.
25 | 5. This is repeated for every word in the corpus.
26 | 6. The cumulative training and validation losses for each epoch are recorded and plotted.
27 | 
28 | 
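Steps 1 to 3 of the pipeline can be sketched in plain Python as follows. The helper names (`build_vocab`, `one_hot`, `make_example`) are hypothetical, lowercasing the names is an assumption, and the targets are returned as class indices (the form cross-entropy losses usually expect) rather than as one-hot vectors.

```python
def build_vocab(names, delimiter="."):
    """Map every character (plus the delimiter) to an integer index."""
    chars = sorted({c for name in names for c in name.lower()} | {delimiter})
    return {c: i for i, c in enumerate(chars)}


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def make_example(name, vocab, delimiter="."):
    """Build one (inputs, targets) pair for a single name."""
    chars = list(name.lower()) + [delimiter]
    # Inputs: every character except the delimiter, one-hot encoded.
    inputs = [one_hot(vocab[c], len(vocab)) for c in chars[:-1]]
    # Targets: every character except the first, i.e. inputs shifted by one.
    targets = [vocab[c] for c in chars[1:]]
    return inputs, targets


vocab = build_vocab(["Anzu", "Mei"])
inputs, targets = make_example("Anzu", vocab)
print(targets)  # [5, 7, 6, 0] -> "n", "z", "u", "."
```

At every position the model sees the current character and is asked to predict the next one, so the delimiter only ever appears as a target and marks the end of a name.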
29 | 
30 | ## Architectures:
31 | ### 1. [Char RNN from Scratch](https://github.com/IvLabs/Natural-Language-Processing/blob/master/char_rnns/notebooks/Dino_Names_Scratch.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1N01IvqI0yxK1CAKi0cfwRTcgvR-_YukL?authuser=1#forceEdit=true&sandboxMode=true)
32 | This notebook implements a vanilla RNN from scratch, using only basic linear layers and activation functions. It aims to deepen the understanding of the general implementation paradigm for recurrent neural networks. It covers every step, from data preprocessing to sampling from the trained model, using only basic Python and PyTorch functionality.
33 | 
34 | ### 2. [Char RNN using Vanilla RNN Layer](https://github.com/IvLabs/Natural-Language-Processing/blob/master/char_rnns/notebooks/Dino_Names_RNN.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1POL4Hjr-jATbmJLEhfGqcUhKNB6XYHHp?authuser=1#forceEdit=true&sandboxMode=true)
35 | This architecture employs the Vanilla RNN layer available in PyTorch instead of writing the entire RNN model from scratch. This makes the model at least 4 times faster, owing to the optimized data-flow design of PyTorch's inbuilt layers. The improvement in performance is quite nominal due to the general drawbacks of Vanilla RNN models, such as vanishing gradients and short temporal memory spans.
36 | 
37 | ### 3. [Char RNN using LSTM Layer](https://github.com/IvLabs/Natural-Language-Processing/blob/master/char_rnns/notebooks/Dino_Names_LSTM.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1lj7S2NaPa55rS-3X4yWlMj3-dy1EPEne?authuser=1#forceEdit=true&sandboxMode=true)
38 | In this notebook, we implement a single-layer Long Short-Term Memory (LSTM) network using PyTorch's inbuilt LSTM layer. This network has more than 3 times as many parameters as the Vanilla RNN. The final convergent loss does not change much, but the sampled names appear closer to natural language and sound more sensible.
39 | 
40 | ### 4. [Char RNN using GRU Layer](https://github.com/IvLabs/Natural-Language-Processing/blob/master/char_rnns/notebooks/Dino_Names_GRU.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1KHngbDPUXEpSyl1HfbIFeNwkun5ssZsK?authuser=1#forceEdit=true&sandboxMode=true)
41 | The final notebook in the comparison uses Gated Recurrent Units (GRU), which are a more computationally efficient version of the LSTM units. The number of trainable parameters is roughly 2.5 times that of the equivalent Vanilla RNN, but the performance is often on par with the LSTM network. This makes them a friendlier choice for large-scale deployment.
42 | 
43 | 
44 | 
45 | ## Summary
46 | The table below summarises the number of parameters and the convergent loss achieved by each model.
47 | 
48 | | Architecture | No. of Learnable Parameters | Training Loss | Validation Loss | Test Loss |
49 | | ---------------- |:---------------------------:|:-------------:|:---------------:| :-------: |
50 | | RNN from scratch | 10856 | 2.0110 | 1.8492 | 1.7026 |
51 | | Vanilla RNN | 7707 | 1.8576 | 1.6835 | 1.5294 |
52 | | LSTM | 25563 | 1.7426 | 1.6065 | 1.4923 |
53 | | GRU | 19611 | 1.7202 | 1.5082 | 1.3935 |
54 | 
55 | 
56 | 
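The parameter counts of the three PyTorch-layer models can be checked by hand. The sketch below assumes a vocabulary of 27 symbols (26 letters plus the delimiter), a single recurrent layer with 64 hidden units, and a linear output layer back to the vocabulary. These sizes are inferred from the table rather than stated in it, and the helper names are illustrative.

```python
def recurrent_params(input_size, hidden_size, gates):
    """Parameters of one recurrent layer with `gates` weight blocks.

    Each block has an input-to-hidden matrix, a hidden-to-hidden matrix,
    and two bias vectors (PyTorch stores b_ih and b_hh separately).
    """
    per_gate = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
    return gates * per_gate


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


VOCAB, HIDDEN = 27, 64  # assumed sizes, see the lead-in above
totals = {}
for name, gates in [("Vanilla RNN", 1), ("LSTM", 4), ("GRU", 3)]:
    totals[name] = recurrent_params(VOCAB, HIDDEN, gates) + linear_params(HIDDEN, VOCAB)

print(totals)  # {'Vanilla RNN': 7707, 'LSTM': 25563, 'GRU': 19611}
```

Under these assumptions the LSTM has 4 weight blocks and the GRU has 3 against the Vanilla RNN's single block, which accounts for the roughly 3.3x and 2.5x parameter ratios.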
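57 | 
58 | ## Plots
59 | The loss plots for each model are stored in the [plots](plots) directory: `Dino_Names_Scratch.jpeg`, `Dino_Names_RNN.jpeg`, `Dino_Names_LSTM.jpeg` and `Dino_Names_GRU.jpeg`.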

60 | 61 | 62 | 63 | 64 |

65 | 66 |
67 | 
68 | ### Note:
69 | In the notebooks (all except RNN from scratch), a parameter named "NUM_LAYERS" can be changed to set the number of stacked layers in the model. However, given the minuscule amount of data available and the modest objective at hand, using multi-layer architectures seems futile, as the trade-off between performance and required computation tilts towards the computationally heavy side.
70 | 
--------------------------------------------------------------------------------
/char_rnns/dino_names.txt:
--------------------------------------------------------------------------------
1 | Aachenosaurus 2 | Aardonyx 3 | Abdallahsaurus 4 | Abelisaurus 5 | Abrictosaurus 6 | Abrosaurus 7 | Abydosaurus 8 | Acanthopholis 9 | Achelousaurus 10 | Acheroraptor 11 | Achillesaurus 12 | Achillobator 13 | Acristavus 14 | Acrocanthosaurus 15 | Acrotholus 16 | Actiosaurus 17 | Adamantisaurus 18 | Adasaurus 19 | Adelolophus 20 | Adeopapposaurus 21 | Aegyptosaurus 22 | Aeolosaurus 23 | Aepisaurus 24 | Aepyornithomimus 25 | Aerosteon 26 | AetonyxAfromimus 27 | Afrovenator 28 | Agathaumas 29 | Aggiosaurus 30 | Agilisaurus 31 | Agnosphitys 32 | Agrosaurus 33 | Agujaceratops 34 | Agustinia 35 | Ahshislepelta 36 | Airakoraptor 37 | Ajancingenia 38 | Ajkaceratops 39 | Alamosaurus 40 | Alaskacephale 41 | Albalophosaurus 42 | Albertaceratops 43 | Albertadromeus 44 | Albertavenator 45 | Albertonykus 46 | Albertosaurus 47 | Albinykus 48 | Albisaurus 49 | Alcovasaurus 50 | Alectrosaurus 51 | Aletopelta 52 | Algoasaurus 53 | Alioramus 54 | Aliwalia 55 | Allosaurus 56 | Almas 57 | Alnashetri 58 | Alocodon 59 | Altirhinus 60 | Altispinax 61 | Alvarezsaurus 62 | Alwalkeria 63 | Alxasaurus 64 | Amargasaurus 65 | Amargastegos 66 | Amargatitanis 67 | Amazonsaurus 68 | Ammosaurus 69 | Ampelosaurus 70 | Amphicoelias 71 | Amphicoelicaudia 72 | Amphisaurus 73 | Amtocephale 74 | Amtosaurus 75 | Amurosaurus 76 | Amygdalodon 77 | Anabisetia 78 | Anasazisaurus 79 | 
Anatosaurus 80 | Anatotitan 81 | Anchiceratops 82 | Anchiornis 83 | Anchisaurus 84 | Andesaurus 85 | Andhrasaurus 86 | Angaturama 87 | Angloposeidon 88 | Angolatitan 89 | Angulomastacator 90 | Aniksosaurus 91 | Animantarx 92 | Ankistrodon 93 | Ankylosaurus 94 | Anodontosaurus 95 | Anoplosaurus 96 | Anserimimus 97 | Antarctopelta 98 | Antarctosaurus 99 | Antetonitrus 100 | Anthodon 101 | Antrodemus 102 | Anzu 103 | Aoniraptor 104 | Aorun 105 | Apatodon 106 | Apatoraptor 107 | Apatosaurus 108 | Appalachiosaurus 109 | Aquilops 110 | Aragosaurus 111 | Aralosaurus 112 | Araucanoraptor 113 | Archaeoceratops 114 | Archaeodontosaurus 115 | Archaeopteryx 116 | Archaeoraptor 117 | Archaeornis 118 | Archaeornithoides 119 | Archaeornithomimus 120 | Arcovenator 121 | Arctosaurus 122 | Arcusaurus 123 | Arenysaurus 124 | Argentinosaurus 125 | Argyrosaurus 126 | Aristosaurus 127 | Aristosuchus 128 | Arizonasaurus 129 | Arkansaurus 130 | Arkharavia 131 | Arrhinoceratops 132 | Arstanosaurus 133 | Asiaceratops 134 | Asiamericana 135 | Asiatosaurus 136 | Astrodon 137 | Astrodonius 138 | Astrodontaurus 139 | Astrophocaudia 140 | Asylosaurus 141 | Atacamatitan 142 | Atlantosaurus 143 | Atlasaurus 144 | Atlascopcosaurus 145 | Atrociraptor 146 | Atsinganosaurus 147 | Aublysodon 148 | Aucasaurus 149 | Augustia 150 | Augustynolophus 151 | Auroraceratops 152 | Aurornis 153 | Australodocus 154 | Australovenator 155 | Austrocheirus 156 | Austroposeidon 157 | Austroraptor 158 | Austrosaurus 159 | Avaceratops 160 | Avalonia 161 | Avalonianus 162 | Aviatyrannis 163 | Avimimus 164 | Avisaurus 165 | Avipes 166 | Azendohsaurus 167 | Bactrosaurus 168 | Bagaceratops 169 | Bagaraatan 170 | Bahariasaurus 171 | Bainoceratops 172 | Bakesaurus 173 | Balaur 174 | Balochisaurus 175 | Bambiraptor 176 | Banji 177 | Baotianmansaurus 178 | Barapasaurus 179 | Barilium 180 | Barosaurus 181 | Barrosasaurus 182 | Barsboldia 183 | Baryonyx 184 | Bashunosaurus 185 | Basutodon 186 | Bathygnathus 187 | Batyrosaurus 188 
| Baurutitan 189 | Bayosaurus 190 | Becklespinax 191 | Beelemodon 192 | Beibeilong 193 | Beipiaognathus 194 | Beipiaosaurus 195 | Beishanlong 196 | Bellusaurus 197 | Belodon 198 | Berberosaurus 199 | Betasuchus 200 | Bicentenaria 201 | Bienosaurus 202 | Bihariosaurus 203 | Bilbeyhallorum 204 | Bissektipelta 205 | Bistahieversor 206 | Blancocerosaurus 207 | Blasisaurus 208 | Blikanasaurus 209 | Bolong 210 | Bonapartenykus 211 | Bonapartesaurus 212 | Bonatitan 213 | Bonitasaura 214 | Borealopelta 215 | Borealosaurus 216 | Boreonykus 217 | Borogovia 218 | Bothriospondylus 219 | Brachiosaurus 220 | Brachyceratops 221 | Brachylophosaurus 222 | Brachypodosaurus 223 | Brachyrophus 224 | Brachytaenius 225 | Brachytrachelopan 226 | Bradycneme 227 | Brasileosaurus 228 | Brasilotitan 229 | Bravoceratops 230 | Breviceratops 231 | Brohisaurus 232 | Brontomerus 233 | Brontoraptor 234 | Brontosaurus 235 | Bruhathkayosaurus 236 | Bugenasaura 237 | Buitreraptor 238 | Burianosaurus 239 | Buriolestes 240 | Byranjaffia 241 | Byronosaurus 242 | Caenagnathasia 243 | Caenagnathus 244 | Calamosaurus 245 | Calamospondylus 246 | Calamospondylus 247 | Callovosaurus 248 | Camarasaurus 249 | Camarillasaurus 250 | Camelotia 251 | Camposaurus 252 | Camptonotus 253 | Camptosaurus 254 | Campylodon 255 | Campylodoniscus 256 | Canardia 257 | Capitalsaurus 258 | Carcharodontosaurus 259 | Cardiodon 260 | Carnotaurus 261 | Caseosaurus 262 | Cathartesaura 263 | Cathetosaurus 264 | Caudipteryx 265 | Caudocoelus 266 | Caulodon 267 | Cedarosaurus 268 | Cedarpelta 269 | Cedrorestes 270 | Centemodon 271 | Centrosaurus 272 | Cerasinops 273 | Ceratonykus 274 | Ceratops 275 | Ceratosaurus 276 | Cetiosauriscus 277 | Cetiosaurus 278 | Changchunsaurus 279 | Changdusaurus 280 | Changyuraptor 281 | Chaoyangsaurus 282 | Charonosaurus 283 | Chasmosaurus 284 | Chassternbergia 285 | Chebsaurus 286 | Chenanisaurus 287 | Cheneosaurus 288 | Chialingosaurus 289 | Chiayusaurus 290 | Chienkosaurus 291 | Chihuahuasaurus 292 | 
Chilantaisaurus 293 | Chilesaurus 294 | Chindesaurus 295 | Chingkankousaurus 296 | Chinshakiangosaurus 297 | Chirostenotes 298 | Choconsaurus 299 | Chondrosteosaurus 300 | Chromogisaurus 301 | Chuandongocoelurus 302 | Chuanjiesaurus 303 | Chuanqilong 304 | Chubutisaurus 305 | Chungkingosaurus 306 | Chuxiongosaurus 307 | Cinizasaurus 308 | Cionodon 309 | Citipati 310 | Cladeiodon 311 | Claorhynchus 312 | Claosaurus 313 | Clarencea 314 | Clasmodosaurus 315 | Clepsysaurus 316 | Coahuilaceratops 317 | Coelophysis 318 | Coelosaurus 319 | Coeluroides 320 | Coelurosauravus 321 | Coelurus 322 | Colepiocephale 323 | Coloradia 324 | Coloradisaurus 325 | Colossosaurus 326 | Comahuesaurus 327 | Comanchesaurus 328 | Compsognathus 329 | Compsosuchus 330 | Concavenator 331 | Conchoraptor 332 | Condorraptor 333 | Coronosaurus 334 | Corythoraptor 335 | Corythosaurus 336 | Craspedodon 337 | Crataeomus 338 | Craterosaurus 339 | Creosaurus 340 | Crichtonpelta 341 | Crichtonsaurus 342 | Cristatusaurus 343 | Crosbysaurus 344 | Cruxicheiros 345 | Cryolophosaurus 346 | Cryptodraco 347 | Cryptoraptor 348 | Cryptosaurus 349 | Cryptovolans 350 | Cumnoria 351 | Daanosaurus 352 | Dacentrurus 353 | Dachongosaurus 354 | Daemonosaurus 355 | Dahalokely 356 | Dakosaurus 357 | Dakotadon 358 | Dakotaraptor 359 | Daliansaurus 360 | Damalasaurus 361 | Dandakosaurus 362 | Danubiosaurus 363 | Daptosaurus 364 | Darwinsaurus 365 | Dashanpusaurus 366 | Daspletosaurus 367 | Dasygnathoides 368 | Dasygnathus 369 | Datanglong 370 | Datonglong 371 | Datousaurus 372 | Daurosaurus 373 | Daxiatitan 374 | Deinocheirus 375 | Deinodon 376 | Deinonychus 377 | Delapparentia 378 | Deltadromeus 379 | Demandasaurus 380 | Denversaurus 381 | Deuterosaurus 382 | Diabloceratops 383 | Diamantinasaurus 384 | Dianchungosaurus 385 | Diceratops 386 | DiceratusDiclonius 387 | Dicraeosaurus 388 | DidanodonDilong 389 | Dilophosaurus 390 | Diluvicursor 391 | Dimodosaurus 392 | Dinheirosaurus 393 | Dinodocus 394 | Dinotyrannus 395 | 
Diplodocus 396 | Diplotomodon 397 | Diracodon 398 | Dolichosuchus 399 | Dollodon 400 | Domeykosaurus 401 | Dongbeititan 402 | Dongyangopelta 403 | Dongyangosaurus 404 | Doratodon 405 | Doryphorosaurus 406 | Draconyx 407 | Dracopelta 408 | Dracoraptor 409 | Dracorex 410 | Dracovenator 411 | Dravidosaurus 412 | Dreadnoughtus 413 | Drinker 414 | Dromaeosauroides 415 | Dromaeosaurus 416 | Dromiceiomimus 417 | Dromicosaurus 418 | Drusilasaura 419 | Dryosaurus 420 | Dryptosauroides 421 | Dryptosaurus 422 | Dubreuillosaurus 423 | Duriatitan 424 | Duriavenator 425 | Dynamosaurus 426 | Dyoplosaurus 427 | Dysalotosaurus 428 | Dysganus 429 | Dyslocosaurus 430 | Dystrophaeus 431 | Dystylosaurus 432 | Echinodon 433 | Edmarka 434 | Edmontonia 435 | Edmontosaurus 436 | Efraasia 437 | Einiosaurus 438 | Ekrixinatosaurus 439 | Elachistosuchus 440 | Elaltitan 441 | Elaphrosaurus 442 | Elmisaurus 443 | Elopteryx 444 | Elosaurus 445 | Elrhazosaurus 446 | Elvisaurus 447 | Emausaurus 448 | Embasaurus 449 | Enigmosaurus 450 | Eoabelisaurus 451 | Eobrontosaurus 452 | Eocarcharia 453 | Eoceratops 454 | Eocursor 455 | Eodromaeus 456 | Eohadrosaurus 457 | Eolambia 458 | Eomamenchisaurus 459 | Eoplophysis 460 | Eoraptor 461 | Eosinopteryx 462 | Eotrachodon 463 | Eotriceratops 464 | Eotyrannus 465 | Eousdryosaurus 466 | Epachthosaurus 467 | Epanterias 468 | Ephoenosaurus 469 | Epicampodon 470 | Epichirostenotes 471 | Epidendrosaurus 472 | Epidexipteryx 473 | Equijubus 474 | Erectopus 475 | Erketu 476 | Erliansaurus 477 | Erlikosaurus 478 | Eshanosaurus 479 | Euacanthus 480 | Eucamerotus 481 | Eucentrosaurus 482 | Eucercosaurus 483 | Eucnemesaurus 484 | Eucoelophysis 485 | Eugongbusaurus 486 | Euhelopus 487 | Euoplocephalus 488 | Eupodosaurus 489 | Eureodon 490 | Eurolimnornis 491 | Euronychodon 492 | Europasaurus 493 | Europatitan 494 | Europelta 495 | Euskelosaurus 496 | Eustreptospondylus 497 | Fabrosaurus 498 | Falcarius 499 | Fendusaurus 500 | Fenestrosaurus 501 | Ferganasaurus 502 | 
Ferganastegos 503 | Ferganocephale 504 | Foraminacephale 505 | Fosterovenator 506 | Frenguellisaurus 507 | Fruitadens 508 | Fukuiraptor 509 | Fukuisaurus 510 | Fukuititan 511 | Fukuivenator 512 | Fulengia 513 | Fulgurotherium 514 | Fusinasus 515 | Fusuisaurus 516 | Futabasaurus 517 | Futalognkosaurus 518 | Gadolosaurus 519 | Galeamopus 520 | Galesaurus 521 | Gallimimus 522 | Galtonia 523 | Galveosaurus 524 | Galvesaurus 525 | Gannansaurus 526 | Gansutitan 527 | Ganzhousaurus 528 | Gargoyleosaurus 529 | Garudimimus 530 | Gasosaurus 531 | Gasparinisaura 532 | Gastonia 533 | Gavinosaurus 534 | Geminiraptor 535 | Genusaurus 536 | Genyodectes 537 | Geranosaurus 538 | Gideonmantellia 539 | Giganotosaurus 540 | Gigantoraptor 541 | Gigantosaurus 542 | Gigantosaurus 543 | Gigantoscelus 544 | Gigantspinosaurus 545 | Gilmoreosaurus 546 | Ginnareemimus 547 | Giraffatitan 548 | Glacialisaurus 549 | Glishades 550 | Glyptodontopelta 551 | Skeleton 552 | Gobiceratops 553 | Gobisaurus 554 | Gobititan 555 | Gobivenator 556 | Godzillasaurus 557 | Gojirasaurus 558 | Gondwanatitan 559 | Gongbusaurus 560 | Gongpoquansaurus 561 | Gongxianosaurus 562 | Gorgosaurus 563 | Goyocephale 564 | Graciliceratops 565 | Graciliraptor 566 | Gracilisuchus 567 | Gravitholus 568 | Gresslyosaurus 569 | Griphornis 570 | Griphosaurus 571 | Gryphoceratops 572 | Gryponyx 573 | Gryposaurus 574 | Gspsaurus 575 | Guaibasaurus 576 | Gualicho 577 | Guanlong 578 | Gwyneddosaurus 579 | Gyposaurus 580 | Hadrosauravus 581 | Hadrosaurus 582 | Haestasaurus 583 | Hagryphus 584 | Hallopus 585 | Halszkaraptor 586 | Halticosaurus 587 | Hanssuesia 588 | Hanwulosaurus 589 | Haplocanthosaurus 590 | Haplocanthus 591 | Haplocheirus 592 | Harpymimus 593 | Haya 594 | Hecatasaurus 595 | Heilongjiangosaurus 596 | Heishansaurus 597 | Helioceratops 598 | Helopus 599 | Heptasteornis 600 | Herbstosaurus 601 | Herrerasaurus 602 | Hesperonychus 603 | Hesperosaurus 604 | Heterodontosaurus 605 | Heterosaurus 606 | Hexing 607 | 
Hexinlusaurus 608 | Heyuannia 609 | Hierosaurus 610 | Hippodraco 611 | Hironosaurus 612 | Hisanohamasaurus 613 | Histriasaurus 614 | Homalocephale 615 | Honghesaurus 616 | Hongshanosaurus 617 | Hoplitosaurus 618 | Hoplosaurus 619 | Horshamosaurus 620 | Hortalotarsus 621 | Huabeisaurus 622 | Hualianceratops 623 | Huanansaurus 624 | Huanghetitan 625 | Huangshanlong 626 | Huaxiagnathus 627 | Huaxiaosaurus 628 | Huaxiasaurus 629 | Huayangosaurus 630 | Hudiesaurus 631 | Huehuecanauhtlus 632 | Hulsanpes 633 | Hungarosaurus 634 | Huxleysaurus 635 | Hylaeosaurus 636 | HylosaurusHypacrosaurus 637 | Hypselorhachis 638 | Hypselosaurus 639 | Hypselospinus 640 | Hypsibema 641 | Hypsilophodon 642 | Hypsirhophus 643 | habodcraniosaurus 644 | Ichthyovenator 645 | Ignavusaurus 646 | Iguanacolossus 647 | Iguanodon 648 | Iguanoides 649 | Skeleton 650 | Iguanosaurus 651 | Iliosuchus 652 | Ilokelesia 653 | Incisivosaurus 654 | Indosaurus 655 | Indosuchus 656 | Ingenia 657 | Inosaurus 658 | Irritator 659 | Isaberrysaura 660 | Isanosaurus 661 | Ischioceratops 662 | Ischisaurus 663 | Ischyrosaurus 664 | Isisaurus 665 | Issasaurus 666 | Itemirus 667 | Iuticosaurus 668 | Jainosaurus 669 | Jaklapallisaurus 670 | Janenschia 671 | Jaxartosaurus 672 | Jeholosaurus 673 | Jenghizkhan 674 | Jensenosaurus 675 | Jeyawati 676 | Jianchangosaurus 677 | Jiangjunmiaosaurus 678 | Jiangjunosaurus 679 | Jiangshanosaurus 680 | Jiangxisaurus 681 | Jianianhualong 682 | Jinfengopteryx 683 | Jingshanosaurus 684 | Jintasaurus 685 | Jinzhousaurus 686 | Jiutaisaurus 687 | Jobaria 688 | Jubbulpuria 689 | Judiceratops 690 | Jurapteryx 691 | Jurassosaurus 692 | Juratyrant 693 | Juravenator 694 | Kagasaurus 695 | Kaijiangosaurus 696 | Kakuru 697 | Kangnasaurus 698 | Karongasaurus 699 | Katepensaurus 700 | Katsuyamasaurus 701 | Kayentavenator 702 | Kazaklambia 703 | Kelmayisaurus 704 | KemkemiaKentrosaurus 705 | Kentrurosaurus 706 | Kerberosaurus 707 | Kentrosaurus 708 | Khaan 709 | Khetranisaurus 710 | Kileskus 711 | 
Kinnareemimus 712 | Kitadanisaurus 713 | Kittysaurus 714 | KlamelisaurusKol 715 | Koparion 716 | Koreaceratops 717 | Koreanosaurus 718 | Koreanosaurus 719 | Koshisaurus 720 | Kosmoceratops 721 | Kotasaurus 722 | Koutalisaurus 723 | Kritosaurus 724 | Kryptops 725 | Krzyzanowskisaurus 726 | Kukufeldia 727 | Kulceratops 728 | Kulindadromeus 729 | Kulindapteryx 730 | Kunbarrasaurus 731 | Kundurosaurus 732 | Kunmingosaurus 733 | Kuszholia 734 | Labocania 735 | Labrosaurus 736 | Laelaps 737 | Laevisuchus 738 | Lagerpeton 739 | Lagosuchus 740 | Laiyangosaurus 741 | Lamaceratops 742 | Lambeosaurus 743 | Lametasaurus 744 | Lamplughsaura 745 | Lanasaurus 746 | Lancangosaurus 747 | Lancanjiangosaurus 748 | Lanzhousaurus 749 | Laosaurus 750 | Lapampasaurus 751 | Laplatasaurus 752 | Lapparentosaurus 753 | Laquintasaura 754 | Latenivenatrix 755 | Latirhinus 756 | Leaellynasaura 757 | Leinkupal 758 | Leipsanosaurus 759 | Lengosaurus 760 | Leonerasaurus 761 | Lepidocheirosaurus 762 | Lepidus 763 | Leptoceratops 764 | Leptorhynchos 765 | Leptospondylus 766 | Leshansaurus 767 | Lesothosaurus 768 | Lessemsaurus 769 | Levnesovia 770 | Lewisuchus 771 | Lexovisaurus 772 | Leyesaurus 773 | Liaoceratops 774 | Liaoningosaurus 775 | Liaoningtitan 776 | Liaoningvenator 777 | Liassaurus 778 | Libycosaurus 779 | Ligabueino 780 | Ligabuesaurus 781 | Ligomasaurus 782 | Likhoelesaurus 783 | Liliensternus 784 | Limaysaurus 785 | Limnornis 786 | Limnosaurus 787 | Limusaurus 788 | Linhenykus 789 | Linheraptor 790 | Linhevenator 791 | Lirainosaurus 792 | LisboasaurusLiubangosaurus 793 | Lohuecotitan 794 | Loncosaurus 795 | Longisquama 796 | Longosaurus 797 | Lophorhothon 798 | Lophostropheus 799 | Loricatosaurus 800 | Loricosaurus 801 | Losillasaurus 802 | Lourinhanosaurus 803 | Lourinhasaurus 804 | Luanchuanraptor 805 | Luanpingosaurus 806 | Lucianosaurus 807 | Lucianovenator 808 | Lufengosaurus 809 | Lukousaurus 810 | Luoyanggia 811 | Lurdusaurus 812 | Lusitanosaurus 813 | Lusotitan 814 | 
Lycorhinus 815 | Lythronax 816 | Macelognathus 817 | Machairasaurus 818 | Machairoceratops 819 | Macrodontophion 820 | Macrogryphosaurus 821 | Macrophalangia 822 | Macroscelosaurus 823 | Macrurosaurus 824 | Madsenius 825 | Magnapaulia 826 | Magnamanus 827 | Magnirostris 828 | Magnosaurus 829 | Magulodon 830 | Magyarosaurus 831 | Mahakala 832 | Maiasaura 833 | Majungasaurus 834 | Majungatholus 835 | Malarguesaurus 836 | Malawisaurus 837 | Maleevosaurus 838 | Maleevus 839 | Mamenchisaurus 840 | Manidens 841 | Mandschurosaurus 842 | Manospondylus 843 | Mantellisaurus 844 | Mantellodon 845 | Mapusaurus 846 | Marasuchus 847 | Marisaurus 848 | Marmarospondylus 849 | Marshosaurus 850 | Martharaptor 851 | Masiakasaurus 852 | Massospondylus 853 | Matheronodon 854 | Maxakalisaurus 855 | Medusaceratops 856 | Megacervixosaurus 857 | Megadactylus 858 | Megadontosaurus 859 | Megalosaurus 860 | Megapnosaurus 861 | Megaraptor 862 | Mei 863 | Melanorosaurus 864 | Mendozasaurus 865 | Mercuriceratops 866 | Meroktenos 867 | Metriacanthosaurus 868 | Microcephale 869 | Microceratops 870 | Microceratus 871 | Microcoelus 872 | Microdontosaurus 873 | Microhadrosaurus 874 | Micropachycephalosaurus 875 | Microraptor 876 | Microvenator 877 | Mierasaurus 878 | Mifunesaurus 879 | Minmi 880 | Minotaurasaurus 881 | Miragaia 882 | Mirischia 883 | Moabosaurus 884 | Mochlodon 885 | Mohammadisaurus 886 | Mojoceratops 887 | Mongolosaurus 888 | Monkonosaurus 889 | Monoclonius 890 | Monolophosaurus 891 | Mononychus 892 | Mononykus 893 | Montanoceratops 894 | Morelladon 895 | Morinosaurus 896 | Morosaurus 897 | Morrosaurus 898 | Mosaiceratops 899 | Moshisaurus 900 | Mtapaiasaurus 901 | Mtotosaurus 902 | Murusraptor 903 | Mussaurus 904 | Muttaburrasaurus 905 | Muyelensaurus 906 | Mymoorapelta 907 | Naashoibitosaurus 908 | Nambalia 909 | Nankangia 910 | Nanningosaurus 911 | Nanosaurus 912 | Nanotyrannus 913 | Nanshiungosaurus 914 | Nanuqsaurus 915 | Nanyangosaurus 916 | Narambuenatitan 917 | Nasutoceratops 
918 | Natronasaurus 919 | Nebulasaurus 920 | Nectosaurus 921 | Nedcolbertia 922 | Nedoceratops 923 | Neimongosaurus 924 | Nemegtia 925 | Nemegtomaia 926 | Nemegtosaurus 927 | Neosaurus 928 | Neosodon 929 | Neovenator 930 | Neuquenraptor 931 | Neuquensaurus 932 | Newtonsaurus 933 | Ngexisaurus 934 | Nicksaurus 935 | Nigersaurus 936 | Ningyuansaurus 937 | Niobrarasaurus 938 | Nipponosaurus 939 | Noasaurus 940 | Nodocephalosaurus 941 | Nodosaurus 942 | Nomingia 943 | Nopcsaspondylus 944 | Normanniasaurus 945 | Nothronychus 946 | Notoceratops 947 | Notocolossus 948 | Notohypsilophodon 949 | Nqwebasaurus 950 | Nteregosaurus 951 | Nurosaurus 952 | Nuthetes 953 | Nyasasaurus 954 | Nyororosaurus 955 | Ohmdenosaurus 956 | Ojoceratops 957 | Ojoraptorsaurus 958 | Oligosaurus 959 | Olorotitan 960 | Omeisaurus 961 | Omosaurus 962 | Onychosaurus 963 | Oohkotokia 964 | Opisthocoelicaudia 965 | Oplosaurus 966 | Orcomimus 967 | OrinosaurusOrkoraptor 968 | OrnatotholusOrnithodesmus 969 | Ornithoides 970 | Ornitholestes 971 | Ornithomerus 972 | Ornithomimoides 973 | Ornithomimus 974 | Ornithopsis 975 | Ornithosuchus 976 | Ornithotarsus 977 | Orodromeus 978 | Orosaurus 979 | Orthogoniosaurus 980 | Orthomerus 981 | Oryctodromeus 982 | Oshanosaurus 983 | Osmakasaurus 984 | Ostafrikasaurus 985 | Ostromia 986 | Othnielia 987 | Othnielosaurus 988 | Otogosaurus 989 | Ouranosaurus 990 | Overosaurus 991 | Oviraptor 992 | Ovoraptor 993 | Owenodon 994 | Oxalaia 995 | Ozraptor 996 | Pachycephalosaurus 997 | Pachyrhinosaurus 998 | Pachysauriscus 999 | Pachysaurops 1000 | Pachysaurus 1001 | Pachyspondylus 1002 | Pachysuchus 1003 | Padillasaurus 1004 | Pakisaurus 1005 | Palaeoctonus 1006 | Palaeocursornis 1007 | Palaeolimnornis 1008 | Palaeopteryx 1009 | Palaeosauriscus 1010 | Palaeosaurus 1011 | Palaeosaurus 1012 | Palaeoscincus 1013 | Paleosaurus 1014 | Paludititan 1015 | Paluxysaurus 1016 | Pampadromaeus 1017 | Pamparaptor 1018 | Panamericansaurus 1019 | Pandoravenator 1020 | Panguraptor 1021 | 
Panoplosaurus 1022 | Panphagia 1023 | Pantydraco 1024 | Paraiguanodon 1025 | Paralititan 1026 | Paranthodon 1027 | Pararhabdodon 1028 | Parasaurolophus 1029 | Pareiasaurus 1030 | Parksosaurus 1031 | Paronychodon 1032 | Parrosaurus 1033 | Parvicursor 1034 | Patagonykus 1035 | Patagosaurus 1036 | Patagotitan 1037 | Pawpawsaurus 1038 | Pectinodon 1039 | Pedopenna 1040 | Pegomastax 1041 | Peishansaurus 1042 | Pekinosaurus 1043 | Pelecanimimus 1044 | Pellegrinisaurus 1045 | Peloroplites 1046 | Pelorosaurus 1047 | Peltosaurus 1048 | Penelopognathus 1049 | Pentaceratops 1050 | Petrobrasaurus 1051 | Phaedrolosaurus 1052 | Philovenator 1053 | Phuwiangosaurus 1054 | Phyllodon 1055 | Piatnitzkysaurus 1056 | Picrodon 1057 | Pinacosaurus 1058 | Pisanosaurus 1059 | Pitekunsaurus 1060 | Piveteausaurus 1061 | Planicoxa 1062 | Plateosauravus 1063 | Plateosaurus 1064 | Platyceratops 1065 | Plesiohadros 1066 | Pleurocoelus 1067 | Pleuropeltus 1068 | Pneumatoarthrus 1069 | Pneumatoraptor 1070 | Podokesaurus 1071 | Poekilopleuron 1072 | Polacanthoides 1073 | Polacanthus 1074 | Polyodontosaurus 1075 | Polyonax 1076 | Ponerosteus 1077 | Poposaurus 1078 | Parasaurolophus 1079 | Postosuchus 1080 | Powellvenator 1081 | Pradhania 1082 | Prenocephale 1083 | Prenoceratops 1084 | Priconodon 1085 | Priodontognathus 1086 | Proa 1087 | Probactrosaurus 1088 | Probrachylophosaurus 1089 | Proceratops 1090 | Proceratosaurus 1091 | Procerosaurus 1092 | Procerosaurus 1093 | Procheneosaurus 1094 | Procompsognathus 1095 | Prodeinodon 1096 | Proiguanodon 1097 | Propanoplosaurus 1098 | Proplanicoxa 1099 | Prosaurolophus 1100 | Protarchaeopteryx 1101 | Protecovasaurus 1102 | Protiguanodon 1103 | Protoavis 1104 | Protoceratops 1105 | Protognathosaurus 1106 | Protognathus 1107 | Protohadros 1108 | Protorosaurus 1109 | Protorosaurus 1110 | Protrachodon 1111 | Proyandusaurus 1112 | Pseudolagosuchus 1113 | Psittacosaurus 1114 | Pteropelyx 1115 | Pterospondylus 1116 | Puertasaurus 1117 | Pukyongosaurus 1118 | 
Pulanesaura 1119 | Pycnonemosaurus 1120 | Pyroraptor 1121 | Qantassaurus 1122 | Qianzhousaurus 1123 | Qiaowanlong 1124 | Qijianglong 1125 | Qinlingosaurus 1126 | Qingxiusaurus 1127 | Qiupalong 1128 | Quaesitosaurus 1129 | Quetecsaurus 1130 | Quilmesaurus 1131 | Rachitrema 1132 | Rahiolisaurus 1133 | Rahona 1134 | Rahonavis 1135 | Rajasaurus 1136 | Rapator 1137 | Rapetosaurus 1138 | Raptorex 1139 | Ratchasimasaurus 1140 | Rativates 1141 | Rayososaurus 1142 | Razanandrongobe 1143 | Rebbachisaurus 1144 | Regaliceratops 1145 | Regnosaurus 1146 | Revueltosaurus 1147 | Rhabdodon 1148 | Rhadinosaurus 1149 | Rhinorex 1150 | Rhodanosaurus 1151 | Rhoetosaurus 1152 | Rhopalodon 1153 | Riabininohadros 1154 | Richardoestesia 1155 | Rileya 1156 | Rileyasuchus 1157 | Rinchenia 1158 | Rinconsaurus 1159 | Rioarribasaurus 1160 | Riodevasaurus 1161 | Riojasaurus 1162 | Riojasuchus 1163 | Rocasaurus 1164 | Roccosaurus 1165 | Rubeosaurus 1166 | Ruehleia 1167 | Rugocaudia 1168 | Rugops 1169 | Rukwatitan 1170 | Ruyangosaurus 1171 | Sacisaurus 1172 | Sahaliyania 1173 | Saichania 1174 | Saldamosaurus 1175 | Salimosaurus 1176 | Saltasaurus 1177 | Saltopus 1178 | Saltriosaurus 1179 | Sanchusaurus 1180 | Sangonghesaurus 1181 | Sanjuansaurus 1182 | Sanpasaurus 1183 | Santanaraptor 1184 | Saraikimasoom 1185 | Sarahsaurus 1186 | Sarcolestes 1187 | Sarcosaurus 1188 | Sarmientosaurus 1189 | Saturnalia 1190 | Sauraechinodon 1191 | Saurolophus 1192 | Sauroniops 1193 | Sauropelta 1194 | Saurophaganax 1195 | Saurophagus 1196 | Sauroplites 1197 | Sauroposeidon 1198 | Saurornithoides 1199 | Saurornitholestes 1200 | Savannasaurus 1201 | Scansoriopteryx 1202 | Scaphonyx 1203 | Scelidosaurus 1204 | Scipionyx 1205 | Sciurumimus 1206 | Scleromochlus 1207 | Scolosaurus 1208 | Scutellosaurus 1209 | Secernosaurus 1210 | Sefapanosaurus 1211 | Segisaurus 1212 | Segnosaurus 1213 | Seismosaurus 1214 | Seitaad 1215 | Selimanosaurus 1216 | Sellacoxa 1217 | Sellosaurus 1218 | Serendipaceratops 1219 | Serikornis 1220 | 
Shamosaurus 1221 | Shanag 1222 | Shanshanosaurus 1223 | Shantungosaurus 1224 | Shanxia 1225 | Shanyangosaurus 1226 | Shaochilong 1227 | Shenzhousaurus 1228 | Shidaisaurus 1229 | Shingopana 1230 | Shixinggia 1231 | Shuangbaisaurus 1232 | Shuangmiaosaurus 1233 | Shunosaurus 1234 | Shuvosaurus 1235 | Shuvuuia 1236 | Siamodon 1237 | Siamodracon 1238 | Siamosaurus 1239 | Siamotyrannus 1240 | Siats 1241 | Sibirosaurus 1242 | Sibirotitan 1243 | Sidormimus 1244 | Sigilmassasaurus 1245 | Silesaurus 1246 | Siluosaurus 1247 | Silvisaurus 1248 | Similicaudipteryx 1249 | Sinocalliopteryx 1250 | Sinoceratops 1251 | Sinocoelurus 1252 | Sinopelta 1253 | Sinopeltosaurus 1254 | Sinornithoides 1255 | Sinornithomimus 1256 | Sinornithosaurus 1257 | Sinosauropteryx 1258 | Sinosaurus 1259 | Sinotyrannus 1260 | Sinovenator 1261 | Sinraptor 1262 | Sinusonasus 1263 | Sirindhorna 1264 | Skorpiovenator 1265 | Smilodon 1266 | Sonidosaurus 1267 | Sonorasaurus 1268 | Soriatitan 1269 | Sphaerotholus 1270 | Sphenosaurus 1271 | Sphenospondylus 1272 | Spiclypeus 1273 | Spinophorosaurus 1274 | Spinops 1275 | Spinosaurus 1276 | Spinostropheus 1277 | Spinosuchus 1278 | Spondylosoma 1279 | Squalodon 1280 | Staurikosaurus 1281 | Stegoceras 1282 | Stegopelta 1283 | Stegosaurides 1284 | Stegosaurus 1285 | Stenonychosaurus 1286 | Stenopelix 1287 | Stenotholus 1288 | Stephanosaurus 1289 | Stereocephalus 1290 | Sterrholophus 1291 | Stokesosaurus 1292 | Stormbergia 1293 | Strenusaurus 1294 | Streptospondylus 1295 | Struthiomimus 1296 | Struthiosaurus 1297 | Stygimoloch 1298 | Stygivenator 1299 | Styracosaurus 1300 | Succinodon 1301 | Suchomimus 1302 | Suchosaurus 1303 | Suchoprion 1304 | Sugiyamasaurus 1305 | Skeleton 1306 | Sulaimanisaurus 1307 | Supersaurus 1308 | Suuwassea 1309 | Suzhousaurus 1310 | Symphyrophus 1311 | Syngonosaurus 1312 | Syntarsus 1313 | Syrmosaurus 1314 | Szechuanosaurus 1315 | Tachiraptor 1316 | Talarurus 1317 | Talenkauen 1318 | Talos 1319 | Tambatitanis 1320 | Tangvayosaurus 1321 | 
Tanius 1322 | Tanycolagreus 1323 | Tanystropheus 1324 | Tanystrosuchus 1325 | Taohelong 1326 | Tapinocephalus 1327 | Tapuiasaurus 1328 | Tarascosaurus 1329 | Tarbosaurus 1330 | Tarchia 1331 | Tastavinsaurus 1332 | Tatankacephalus 1333 | Tatankaceratops 1334 | Tataouinea 1335 | Tatisaurus 1336 | Taurovenator 1337 | Taveirosaurus 1338 | Tawa 1339 | Tawasaurus 1340 | Tazoudasaurus 1341 | Technosaurus 1342 | Tecovasaurus 1343 | Tehuelchesaurus 1344 | Teihivenator 1345 | Teinurosaurus 1346 | Teleocrater 1347 | Telmatosaurus 1348 | Tenantosaurus 1349 | Tenchisaurus 1350 | Tendaguria 1351 | Tengrisaurus 1352 | Tenontosaurus 1353 | Teratophoneus 1354 | Teratosaurus 1355 | Termatosaurus 1356 | Tethyshadros 1357 | Tetragonosaurus 1358 | Texacephale 1359 | Texasetes 1360 | Teyuwasu 1361 | Thecocoelurus 1362 | Thecodontosaurus 1363 | Thecospondylus 1364 | Theiophytalia 1365 | Therizinosaurus 1366 | Therosaurus 1367 | Thescelosaurus 1368 | Thespesius 1369 | Thotobolosaurus 1370 | Tianchisaurus 1371 | Tianchungosaurus 1372 | Tianyulong 1373 | Tianyuraptor 1374 | Tianzhenosaurus 1375 | Tichosteus 1376 | Tienshanosaurus 1377 | Timimus 1378 | Timurlengia 1379 | Titanoceratops 1380 | Titanosaurus 1381 | Titanosaurus 1382 | Tochisaurus 1383 | Tomodon 1384 | Tonganosaurus 1385 | Tongtianlong 1386 | Tonouchisaurus 1387 | Torilion 1388 | Tornieria 1389 | Torosaurus 1390 | Torvosaurus 1391 | Tototlmimus 1392 | Trachodon 1393 | Traukutitan 1394 | Trialestes 1395 | Triassolestes 1396 | Tribelesodon 1397 | Triceratops 1398 | Trigonosaurus 1399 | Trimucrodon 1400 | Trinisaura 1401 | Triunfosaurus 1402 | Troodon 1403 | Tsaagan 1404 | Tsagantegia 1405 | Tsintaosaurus 1406 | Tugulusaurus 1407 | Tuojiangosaurus 1408 | Turanoceratops 1409 | Turiasaurus 1410 | Tylocephale 1411 | Tylosteus 1412 | Tyrannosaurus 1413 | Tyrannotitan 1414 | Illustration 1415 | Uberabatitan 1416 | Udanoceratops 1417 | Ugrosaurus 1418 | Ugrunaaluk 1419 | Uintasaurus 1420 | Ultrasauros 1421 | Ultrasaurus 1422 | 
Ultrasaurus 1423 | Umarsaurus 1424 | Unaysaurus 1425 | Unenlagia 1426 | Unescoceratops 1427 | Unicerosaurus 1428 | Unquillosaurus 1429 | Urbacodon 1430 | Utahceratops 1431 | Utahraptor 1432 | Uteodon 1433 | Vagaceratops 1434 | Vahiny 1435 | Valdoraptor 1436 | Valdosaurus 1437 | Variraptor 1438 | Velociraptor 1439 | Vectensia 1440 | Vectisaurus 1441 | Velafrons 1442 | Velocipes 1443 | Velociraptor 1444 | Velocisaurus 1445 | Venaticosuchus 1446 | Venenosaurus 1447 | Veterupristisaurus 1448 | Viavenator 1449 | Vitakridrinda 1450 | Vitakrisaurus 1451 | Volkheimeria 1452 | Vouivria 1453 | Vulcanodon 1454 | Wadhurstia 1455 | Wakinosaurus 1456 | Walgettosuchus 1457 | Walkeria 1458 | Walkersaurus 1459 | Wangonisaurus 1460 | Wannanosaurus 1461 | Wellnhoferia 1462 | Wendiceratops 1463 | Wiehenvenator 1464 | Willinakaqe 1465 | Wintonotitan 1466 | Wuerhosaurus 1467 | Wulagasaurus 1468 | Wulatelong 1469 | Wyleyia 1470 | Wyomingraptor 1471 | Xenoceratops 1472 | Xenoposeidon 1473 | Xenotarsosaurus 1474 | Xianshanosaurus 1475 | Xiaosaurus 1476 | Xingxiulong 1477 | Xinjiangovenator 1478 | Xinjiangtitan 1479 | Xiongguanlong 1480 | Xixianykus 1481 | Xixiasaurus 1482 | Xixiposaurus 1483 | Xuanhanosaurus 1484 | Xuanhuaceratops 1485 | Xuanhuasaurus 1486 | Xuwulong 1487 | Yaleosaurus 1488 | Yamaceratops 1489 | Yandusaurus 1490 | Yangchuanosaurus 1491 | Yaverlandia 1492 | Yehuecauhceratops 1493 | Yezosaurus 1494 | Yibinosaurus 1495 | Yimenosaurus 1496 | Yingshanosaurus 1497 | Yinlong 1498 | Yixianosaurus 1499 | Yizhousaurus 1500 | Yongjinglong 1501 | Yuanmouraptor 1502 | Yuanmousaurus 1503 | Yueosaurus 1504 | Yulong 1505 | Yunganglong 1506 | Yunmenglong 1507 | Yunnanosaurus 1508 | Yunxianosaurus 1509 | Yurgovuchia 1510 | Yutyrannus 1511 | Zanabazar 1512 | Zanclodon 1513 | Zapalasaurus 1514 | Zapsalis 1515 | Zaraapelta 1516 | ZatomusZby 1517 | Zephyrosaurus 1518 | Zhanghenglong 1519 | Zhejiangosaurus 1520 | Zhenyuanlong 1521 | Zhongornis 1522 | Zhongjianosaurus 1523 | Zhongyuansaurus 1524 
| Zhuchengceratops 1525 | Zhuchengosaurus 1526 | Zhuchengtitan 1527 | Zhuchengtyrannus 1528 | Ziapelta 1529 | Zigongosaurus 1530 | Zizhongosaurus 1531 | Zuniceratops 1532 | Zunityrannus 1533 | Zuolong 1534 | Zuoyunlong 1535 | Zupaysaurus 1536 | Zuul -------------------------------------------------------------------------------- /char_rnns/plots/Dino_Names_GRU.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/char_rnns/plots/Dino_Names_GRU.jpeg -------------------------------------------------------------------------------- /char_rnns/plots/Dino_Names_LSTM.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/char_rnns/plots/Dino_Names_LSTM.jpeg -------------------------------------------------------------------------------- /char_rnns/plots/Dino_Names_RNN.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/char_rnns/plots/Dino_Names_RNN.jpeg -------------------------------------------------------------------------------- /char_rnns/plots/Dino_Names_Scratch.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/char_rnns/plots/Dino_Names_Scratch.jpeg -------------------------------------------------------------------------------- /neural_machine_translation/README.md: -------------------------------------------------------------------------------- 1 | # Neural Machine Translation 2 | 3 | The machine translation task was traditionally performed using statistical methods. 
However, like most statistical paradigms, Statistical Machine Translation (SMT) had large computational and memory overheads. 4 | 5 | With the popularization of neural networks and deep learning, Neural Machine Translation (NMT) was heavily researched and, once successfully deployed, largely replaced SMT. This section contains implementations of papers that introduced some of the ground-breaking architectures in NMT. 6 | 7 | All the models were trained on the [Multi30k](https://arxiv.org/abs/1605.00459) dataset, which contains roughly 30 thousand English, German and French sentences, each sentence being 10-20 words long. We have trained and evaluated our models for translation from German to English. 8 | 9 | The table below lists the data splits and optimization settings shared by all the models. 10 | 11 | | Parameter | Value | 12 | | -------------- |:------------------:| 13 | | Training Set | 29000/31014 | 14 | | Testing Set | 1000/31014 | 15 | | Validation Set | 1014/31014 | 16 | | Loss Function | Cross Entropy Loss | 17 | | Optimizer | AdamW | 18 | 19 | ## Architectures 20 | 21 | ### 1. [Sequence to Sequence Learning with Neural Networks](https://github.com/IvLabs/Natural-Language-Processing/blob/master/neural_machine_translation/notebooks/Seq2Seq.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QaoSKUbLy4ViHnJsl3m3H6xEemDdowkL?usp=sharing) 22 | This paper proposes the pioneering paradigm for neural machine translation using a simple yet commendable encoder-decoder RNN pair. Although it is the poorest performer in this list, it earns a spot due to its novelty.\ 23 | **Note:** A few changes have been made in order to improve performance. 24 | 1. Unlike in the paper, reversing the input sequences resulted in a lower BLEU score here. Hence, the input sequences have not been reversed. 25 | 2.
Further, an additional parameter called ```teacher_forcing_ratio``` has been introduced. It is the probability of feeding the ground-truth token, rather than the decoder's own previous prediction, as the next decoder input. It is usually set to 1 while training and 0 while sampling. However, setting it to 0.5 while training resulted in a better BLEU score than setting it to 1. 26 | 27 | ### 2. [Neural Machine Translation by Jointly Learning to Align and Translate](https://github.com/IvLabs/Natural-Language-Processing/blob/master/neural_machine_translation/notebooks/Seq2Seq_with_Attention.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hOd2JFafWgOvdbeXWoSm1gIiMEQ64KbM?usp=sharing) 28 | This paper presents a remarkable improvement to the Sequence to Sequence architecture by introducing a (soft) alignment mechanism called "attention". Attention scores how relevant each source token is to the token being decoded, which raises the BLEU score by roughly 1.65 times (from 18.94 to 31.24) and makes the model much more robust with regard to the length of source and target sentences.\ 29 | **Note:** In this implementation, setting the ```teacher_forcing_ratio``` to 1 resulted in a better BLEU score (even with higher validation and test perplexities) than setting it to 0.5. 30 | 31 | ### 3. [Convolutional Sequence to Sequence Learning](https://github.com/IvLabs/Natural-Language-Processing/blob/master/neural_machine_translation/notebooks/Conv_Seq2Seq.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/18uxa0ZMlck4f5fzTUdkze_cuKz-Hgx_o?usp=sharing) 32 | Unlike the previous papers, this is the first paper in this list to tackle sequence-to-sequence learning without using Recurrent Neural Networks. It still uses an attention mechanism, and it outperforms the recurrent attention-based architecture while having similar training times.
The validation and test perplexities are quite low and the BLEU score is much better than that of the previous paper. 33 | 34 | ### 4. [Attention Is All You Need](https://github.com/IvLabs/Natural-Language-Processing/blob/master/neural_machine_translation/notebooks/Attention_Is_All_You_Need.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1RlDhclIlJWzcFC0iPqwbiFEuiCbYIJBG?usp=sharing) 35 | This paper can be said to have revolutionized almost all of Natural Language Processing, owing to its simplicity and efficiency, which make it the backbone of almost all present state-of-the-art architectures. The proposed *Transformer* architecture introduces a novel mechanism called *Self-Attention*, which induces a sense of (soft) alignment amongst the source sentence tokens too. Additionally, the architecture uses simple feed-forward layers, which makes it roughly 4 times lighter than the Convolutional Sequence to Sequence architecture (9,038,853 versus 37,351,685 trainable parameters), while attaining the highest BLEU score amongst all the above-mentioned architectures. 36 | 37 | ## Summary 38 | The table below summarises the number of trainable parameters and the BLEU score achieved by each architecture. 39 | 40 | | Architecture | No. of Trainable Parameters | BLEU Score | 41 | | ----------------------------------- |:---------------------------:|:----------:| 42 | | Sequence to Sequence | 13,899,013 | 18.94 | 43 | | Sequence to Sequence with Attention | 20,518,917 | 31.24 | 44 | | Convolutional Sequence to Sequence | 37,351,685 | 36.53 | 45 | | Attention Is All You Need | 9,038,853 | 37.50 | 46 | 47 | **Note:** 48 | 1. The above BLEU scores may vary slightly upon retraining the models (even with a fixed SEED). 49 | 2. The research paper notes for the above-mentioned papers can be found [here](https://github.com/IvLabs/ResearchPaperNotes/tree/master/natural_language_processing).
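The ```teacher_forcing_ratio``` notes under architectures 1 and 2 can be illustrated with a minimal, framework-free sketch. The names `decode_with_teacher_forcing` and `step_fn` are hypothetical and are not taken from the notebooks; `step_fn` is assumed to stand in for one decoder step that maps the previous input token to a predicted token.

```python
import random


def decode_with_teacher_forcing(step_fn, target_tokens, teacher_forcing_ratio, rng=None):
    """Decode along `target_tokens`, picking each next decoder input at random.

    step_fn(prev_token) -> predicted_token is a hypothetical stand-in for one
    decoder step. With probability `teacher_forcing_ratio` the next input is the
    ground-truth token; otherwise it is the model's own prediction.
    """
    rng = rng or random.Random()
    predictions = []
    decoder_input = target_tokens[0]  # start-of-sequence token
    for t in range(1, len(target_tokens)):
        predicted = step_fn(decoder_input)
        predictions.append(predicted)
        # rng.random() lies in [0, 1): a ratio of 1 always picks the ground
        # truth, a ratio of 0 always picks the model's prediction.
        use_ground_truth = rng.random() < teacher_forcing_ratio
        decoder_input = target_tokens[t] if use_ground_truth else predicted
    return predictions
```

With a ratio of 1 the decoder only ever sees ground-truth inputs, while with a ratio of 0 it consumes its own outputs, which is the situation at sampling time. Intermediate values such as 0.5 mix the two during training.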
50 | 51 | 52 | ### Reference(s): 53 | * [PyTorch Seq2Seq by Ben Trevett](https://github.com/bentrevett/pytorch-seq2seq) 54 | -------------------------------------------------------------------------------- /neural_machine_translation/plots/Conv_Seq2Seq.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/neural_machine_translation/plots/Conv_Seq2Seq.jpeg -------------------------------------------------------------------------------- /neural_machine_translation/plots/Seq2Seq.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/neural_machine_translation/plots/Seq2Seq.jpeg -------------------------------------------------------------------------------- /neural_machine_translation/plots/Seq2Seq_with_Attention.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/neural_machine_translation/plots/Seq2Seq_with_Attention.jpeg -------------------------------------------------------------------------------- /neural_machine_translation/plots/Transformer.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/neural_machine_translation/plots/Transformer.jpeg -------------------------------------------------------------------------------- /text_classification/README.md: -------------------------------------------------------------------------------- 1 | # Sentiment Analysis and Text Classification 2 | 3 | With the rise of digitization, the need to automate the analysis of the predominant method of
interacting with the digital world, i.e. text, has risen exponentially for a variety of reasons and purposes. 4 | 5 | Text classification is heavily used in tasks such as spam detection, news categorization, customer query tagging, and of course, sentiment analysis. 6 | 7 | Sentiment analysis is the automated process of determining the polarity of the opinion expressed in an input sentence. 8 | It is used in tasks such as analyzing a product/company's public sentiment/word of mouth, customer feedback analysis, political analysis, etc. 9 | 10 | We experiment with 4 architectures on two datasets. The IMDb dataset was used to train the model for predicting the polarity of movie reviews. For text classification, the TREC dataset was used to train the model to predict one of six classes - `['ENTY', 'HUM', 'DESC', 'NUM', 'LOC', 'ABBR']`. 11 | 12 | 13 | Below is a table addressing some common data and optimization related parameters (the split sizes correspond to the IMDb dataset). 14 | 15 | | Parameter | Value | 16 | | -------------- |:------------------:| 17 | | Training Set | 17500 | 18 | | Testing Set | 25000 | 19 | | Validation Set | 7500 | 20 | | Loss Function | Cross Entropy Loss | 21 | | Optimizer | Adam | 22 | 23 | ## Architectures 24 | 25 | ### [1. Long Short-Term Memory Networks](https://github.com/IvLabs/Natural-Language-Processing/blob/master/text_classification/notebooks/LSTM.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IvLabs/Natural-Language-Processing/blob/master/text_classification/notebooks/LSTM.ipynb) 26 | RNNs were one of the first architectures for capturing sequential data on various tasks such as predicting parts of speech, sentiment analysis, text classification, etc. However, RNNs suffer from the vanishing gradient problem, which makes them unable to propagate useful gradient information over long sequences, thereby preventing the model from learning from larger contexts.
To overcome this problem, LSTMs, a gated variant of the RNN, were introduced. 27 | 28 | ### [2. Bag of Tricks for Efficient Text Classification](https://github.com/IvLabs/Natural-Language-Processing/blob/master/text_classification/notebooks/FastText.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IvLabs/Natural-Language-Processing/blob/master/text_classification/notebooks/FastText.ipynb) 29 | This paper shows that tweaking baseline linear classifiers enables us to learn an efficient and accurate text classifier for a very large corpus with a large output space (whereas deep networks would be very slow). Sentences represented as a bag of words (BoW) and classified by improved linear models (like logistic regression or SVM) with a rank constraint and a fast loss approximation are shown to achieve performance on par with the state-of-the-art. 30 | 31 | ### [3. Convolutional Neural Networks for Sentence Classification](https://github.com/IvLabs/Natural-Language-Processing/blob/master/text_classification/notebooks/CNN.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IvLabs/Natural-Language-Processing/blob/master/text_classification/notebooks/CNN.ipynb) 32 | CNNs trained on top of pre-trained word vectors for sentence-level classification tasks are shown to achieve excellent results on multiple benchmarks. 33 | The underlying idea strengthened here is that feature extractors obtained from a pre-trained 34 | deep learning model perform well on a variety of tasks, including tasks that are very different from the original task for which the feature extractors were trained. 35 | 36 | ### [4.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://github.com/IvLabs/Natural-Language-Processing/blob/master/text_classification/notebooks/BERT.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IvLabs/Natural-Language-Processing/blob/master/text_classification/notebooks/BERT.ipynb) 37 | BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, without substantial task-specific architecture modifications. 38 | 39 | ## Summary 40 | Below is a table, summarising the number of parameters and the train, test, validation accuracies. 41 | 42 | | Architecture | No. of Learnable Parameters | Train Acc. (%) | Valid. Acc. (%) | Test Acc. (%) | 43 | | ------------ | --------------------------- | -------------- | --------------- | ------------- | 44 | | LSTM | 4,810,857 | 89.10 | 88.37 | 87.62 | 45 | | FastText | 2,500,301 | 88.35 | 86.67 | 86.28 | 46 | | CNN | 843,906 | 94.95 | 79.89 | 84.90 | 47 | | BERT | 110,468,609 | 91.04 | 91.04 | 91.44 | 48 | 49 | ## Plots 50 |

51 | 52 | 53 | 54 | 55 |

56 | 57 | ## Reference(s) 58 | * [PyTorch Sentiment Analysis by Ben Trevett](https://github.com/bentrevett/) 59 | 60 | -------------------------------------------------------------------------------- /text_classification/notebooks/CNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "-B_MqscXWGI4" 7 | }, 8 | "source": [ 9 | "# Installing Packages" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "!pip install -U torchtext==0.6.0" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "bbMNpP192mb2" 25 | }, 26 | "source": [ 27 | "# Importing Required Libraries" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 1, 33 | "metadata": { 34 | "id": "slw2Y5t3taWQ" 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "import torch\n", 39 | "from torchtext import data, datasets\n", 40 | "import torch.nn as nn\n", 41 | "import torch.optim as optim\n", 42 | "from torchtext.data import Field, LabelField, BucketIterator\n", 43 | "import torch.nn.functional as F\n", 44 | "\n", 45 | "import random\n", 46 | "\n", 47 | "import matplotlib.pyplot as plt" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 2, 53 | "metadata": { 54 | "colab": { 55 | "base_uri": "https://localhost:8080/" 56 | }, 57 | "id": "7kAMYpKr2tJV", 58 | "outputId": "3ecc3577-55f7-4711-b775-7f5471892d0d" 59 | }, 60 | "outputs": [ 61 | { 62 | "name": "stdout", 63 | "output_type": "stream", 64 | "text": [ 65 | "Notebook is running on cuda\n" 66 | ] 67 | } 68 | ], 69 | "source": [ 70 | "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", 71 | "print(\"Notebook is running on\", device)" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "id": "8ZOcRV2w2vpH" 78 | }, 79 | "source": [ 80 | 
"Fixing SEED for reproducibility of results" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 3, 86 | "metadata": { 87 | "id": "N99mW-8XtnWr" 88 | }, 89 | "outputs": [], 90 | "source": [ 91 | "SEED = 4444\n", 92 | "\n", 93 | "random.seed(SEED)\n", 94 | "torch.manual_seed(SEED)\n", 95 | "torch.cuda.manual_seed(SEED)\n", 96 | "torch.backends.cudnn.deterministic = True" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 4, 102 | "metadata": { 103 | "id": "O5NaQke7touV" 104 | }, 105 | "outputs": [], 106 | "source": [ 107 | "FIELD = Field(tokenize = 'spacy', tokenizer_language = 'en_core_web_sm')\n", 108 | "\n", 109 | "LABEL = LabelField()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": { 115 | "id": "RWXCd0V73YK7" 116 | }, 117 | "source": [ 118 | "# Splitting the data" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 5, 124 | "metadata": { 125 | "colab": { 126 | "base_uri": "https://localhost:8080/" 127 | }, 128 | "id": "79ih2w2sttNq", 129 | "outputId": "c4358b46-cc35-4382-a74c-bdc0118e067b" 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "train_data, test_data = datasets.TREC.splits(FIELD, LABEL, fine_grained=False)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 6, 139 | "metadata": { 140 | "id": "nxpdowpRtwNQ" 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "train_data, valid_data = train_data.split(random_state = random.seed(SEED))" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 7, 150 | "metadata": { 151 | "colab": { 152 | "base_uri": "https://localhost:8080/" 153 | }, 154 | "id": "BSWt_wDwtzLt", 155 | "outputId": "548f8358-857e-454d-c80a-f8e320606108" 156 | }, 157 | "outputs": [ 158 | { 159 | "name": "stdout", 160 | "output_type": "stream", 161 | "text": [ 162 | "Number of training examples: 3816\n", 163 | "Number of validation examples: 1636\n", 164 | "Number of testing examples: 500\n" 165 | 
] 166 | } 167 | ], 168 | "source": [ 169 | "print(f'Number of training examples: {len(train_data)}')\n", 170 | "print(f'Number of validation examples: {len(valid_data)}')\n", 171 | "print(f'Number of testing examples: {len(test_data)}')" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 8, 177 | "metadata": { 178 | "colab": { 179 | "base_uri": "https://localhost:8080/" 180 | }, 181 | "id": "trFz-SmNt0fO", 182 | "outputId": "4d2dcf36-d394-40f5-c1b1-42bb09e00c6b" 183 | }, 184 | "outputs": [], 185 | "source": [ 186 | "MAX_VOCAB_SIZE = 25000 # excluding the <unk> and <pad> tokens\n", 187 | "FIELD.build_vocab(train_data, max_size = MAX_VOCAB_SIZE, vectors=\"glove.6B.100d\", unk_init = torch.Tensor.normal_)\n", 188 | "LABEL.build_vocab(train_data)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 9, 194 | "metadata": { 195 | "colab": { 196 | "base_uri": "https://localhost:8080/" 197 | }, 198 | "id": "PQUUTTgat5hg", 199 | "outputId": "3703b174-a45b-4bb2-b5a7-65c7d123899e" 200 | }, 201 | "outputs": [ 202 | { 203 | "name": "stdout", 204 | "output_type": "stream", 205 | "text": [ 206 | "Unique tokens in FIELD vocabulary: 7518\n", 207 | "Unique tokens in LABEL vocabulary: 6\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "print(f\"Unique tokens in FIELD vocabulary: {len(FIELD.vocab)}\")\n", 213 | "print(f\"Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}\")" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": { 219 | "id": "BIZMVoIA4S2K" 220 | }, 221 | "source": [ 222 | "# Model Definition" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 10, 228 | "metadata": { 229 | "id": "egyx2GtzuEdP" 230 | }, 231 | "outputs": [], 232 | "source": [ 233 | "class CNN(nn.Module):\n", 234 | " def __init__(self, vocab_size, emb_dim, num_filters, filter_sizes, output_dim, pad_idx):\n", 235 | " super().__init__()\n", 236 | " self.embedding = nn.Embedding(vocab_size, emb_dim)\n", 237 | " self.convs = 
nn.ModuleList([nn.Conv2d(in_channels = 1, \n", 238 | " out_channels = num_filters, \n", 239 | " kernel_size = (fs, emb_dim)) \n", 240 | " for fs in filter_sizes])\n", 241 | " self.fc = nn.Linear(len(filter_sizes) * num_filters, output_dim)\n", 242 | " self.dropout = nn.Dropout(0.5)\n", 243 | " \n", 244 | " def forward(self, text): # [input] = [seq_len, batch_size]\n", 245 | " text = text.permute(1, 0) # [input] = [batch_size, seq_len]\n", 246 | " embedded = self.embedding(text) # [embedded] = [batch_size, seq_len, emb_dim]\n", 247 | " embedded = embedded.unsqueeze(1) # [embedded] = [batch_size, 1, seq_len, emb_dim]\n", 248 | " conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs] # [conved[n]] = [batch_size, num_filters, seq_len - filter_sizes[n] + 1]\n", 249 | " pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved] # [pooled[n]] = [batch_size, num_filters]\n", 250 | " output = self.dropout(torch.cat(pooled, dim = 1)) # [output] = [batch_size, num_filters * len(filter_sizes)]\n", 251 | " return self.fc(output)" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 11, 257 | "metadata": { 258 | "id": "2PDdgoXiufHr" 259 | }, 260 | "outputs": [], 261 | "source": [ 262 | "def batch_accuracy(preds, y):\n", 263 | " top_pred = preds.argmax(1, keepdim = True)\n", 264 | " correct = top_pred.eq(y.view_as(top_pred)).sum()\n", 265 | " acc = correct.float() / y.shape[0]\n", 266 | " return acc" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": { 272 | "id": "OpnqXXA1uvuD" 273 | }, 274 | "source": [ 275 | "# Training and Evaluation Functions" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 12, 281 | "metadata": { 282 | "id": "gKSFO3hRutOc" 283 | }, 284 | "outputs": [], 285 | "source": [ 286 | "def Train(model, iterator, optimizer, criterion): \n", 287 | " epoch_loss = 0\n", 288 | " epoch_acc = 0\n", 289 | " model.train()\n", 290 | " for batch in iterator:\n", 291 | " 
optimizer.zero_grad()\n",
292 | "        inp = batch.text\n",
293 | "        label = batch.label\n",
294 | "        predictions = model(inp)\n",
295 | "        loss = criterion(predictions, label)\n",
296 | "        acc = batch_accuracy(predictions, label)\n",
297 | "        loss.backward()\n",
298 | "        optimizer.step()\n",
299 | "        epoch_loss += loss.item()\n",
300 | "        epoch_acc += acc.item()\n",
301 | "    return epoch_loss / len(iterator), epoch_acc / len(iterator)"
302 | ]
303 | },
304 | {
305 | "cell_type": "code",
306 | "execution_count": 13,
307 | "metadata": {
308 | "id": "YqTp93Aruwoe"
309 | },
310 | "outputs": [],
311 | "source": [
312 | "def Evaluate(model, iterator, criterion):\n",
313 | "    epoch_loss = 0\n",
314 | "    epoch_acc = 0\n",
315 | "    model.eval()\n",
316 | "    with torch.no_grad():\n",
317 | "        for batch in iterator:\n",
318 | "            inp = batch.text\n",
319 | "            label = batch.label\n",
320 | "            predictions = model(inp).squeeze(1)\n",
321 | "            loss = criterion(predictions, label)\n",
322 | "            acc = batch_accuracy(predictions, label)\n",
323 | "            epoch_loss += loss.item()\n",
324 | "            epoch_acc += acc.item()\n",
325 | "    return epoch_loss / len(iterator), epoch_acc / len(iterator)"
326 | ]
327 | },
328 | {
329 | "cell_type": "markdown",
330 | "metadata": {
331 | "id": "v1_3yX3-42vO"
332 | },
333 | "source": [
334 | "# Data Iterators, Hyperparameters and Model Initialization"
335 | ]
336 | },
337 | {
338 | "cell_type": "code",
339 | "execution_count": 14,
340 | "metadata": {
341 | "id": "yIHX-jbs5ClS"
342 | },
343 | "outputs": [],
344 | "source": [
345 | "BATCH_SIZE = 64\n",
346 | "\n",
347 | "train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits((train_data, valid_data, test_data), batch_size = BATCH_SIZE, device = device)"
348 | ]
349 | },
350 | {
351 | "cell_type": "code",
352 | "execution_count": 15,
353 | "metadata": {
354 | "id": "o6c7ld9wuYh4"
355 | },
356 | "outputs": [],
357 | "source": [
358 | "VOCAB_SIZE = len(FIELD.vocab) # dimension of one-hot vector / vocabulary\n",
359 | "EMB_DIM = 100 # dimensions of word embeddings\n",
360 | "NUM_FILTERS = 100 # number of filters in CNN\n",
361 | "FILTER_SIZES = [2,3,4] # filter sizes of CNN\n",
362 | "OUTPUT_DIM = len(LABEL.vocab) # dimensions of output\n",
363 | "DROPOUT = 0.5\n",
364 | "\n",
365 | "NUM_EPOCHS = 10\n",
366 | "LR = 0.001"
367 | ]
368 | },
369 | {
370 | "cell_type": "code",
371 | "execution_count": 16,
372 | "metadata": {
373 | "id": "j5U3vbZcwEgr"
374 | },
375 | "outputs": [],
376 | "source": [
377 | "model = CNN(VOCAB_SIZE, EMB_DIM, NUM_FILTERS, FILTER_SIZES, OUTPUT_DIM, FIELD.vocab.stoi[FIELD.pad_token])"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": 17,
383 | "metadata": {
384 | "id": "ziglTF1D5voa"
385 | },
386 | "outputs": [],
387 | "source": [
388 | "optimizer = optim.Adam(model.parameters(), lr=LR)\n",
389 | "\n",
390 | "criterion = nn.CrossEntropyLoss()\n",
391 | "\n",
392 | "model = model.to(device)\n",
393 | "criterion = criterion.to(device)"
394 | ]
395 | },
396 | {
397 | "cell_type": "code",
398 | "execution_count": 18,
399 | "metadata": {
400 | "colab": {
401 | "base_uri": "https://localhost:8080/"
402 | },
403 | "id": "X7Mm3BnuuZ2F",
404 | "outputId": "e195ceb6-9a27-4359-8aae-c8777a5f7b1a"
405 | },
406 | "outputs": [
407 | {
408 | "name": "stdout",
409 | "output_type": "stream",
410 | "text": [
411 | "The model has 843,906 trainable parameters\n"
412 | ]
413 | }
414 | ],
415 | "source": [
416 | "def count_parameters(model):\n",
417 | "    return sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
418 | "\n",
419 | "print(f'The model has {count_parameters(model):,} trainable parameters')"
420 | ]
421 | },
422 | {
423 | "cell_type": "markdown",
424 | "metadata": {
425 | "id": "99qvK1qw5aAV"
426 | },
427 | "source": [
428 | "# Training"
429 | ]
430 | },
431 | {
432 | "cell_type": "code",
433 | "execution_count": 19,
434 | "metadata": {
435 | "id": "WT1J71IjuySn"
436 | },
437 | "outputs": [],
438 | "source": [
439 | "import time\n",
440 | "\n",
441 | "def Epoch_time(start_time, end_time):\n",
442 | "    elapsed_time = end_time - start_time\n",
443 | "    elapsed_mins = int(elapsed_time / 60)\n",
444 | "    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))\n",
445 | "    return elapsed_mins, elapsed_secs"
446 | ]
447 | },
448 | {
449 | "cell_type": "code",
450 | "execution_count": 21,
451 | "metadata": {
452 | "colab": {
453 | "base_uri": "https://localhost:8080/"
454 | },
455 | "id": "TKDOAVH5uz2s",
456 | "outputId": "846b4443-f6d6-4519-fe6d-a70d4911f35d"
457 | },
458 | "outputs": [
459 | {
460 | "name": "stdout",
461 | "output_type": "stream",
462 | "text": [
463 | "Learning Rate: 0.001, Number of Filters: 100, Filter Sizes: [2, 3, 4]\n",
464 | "Time taken for epoch 1: 0m 17s\n",
465 | "Training Loss: 1.4261 | Validation Loss: 1.0630\n",
466 | "Training Accuracy: 43.17 %| Validation Accuracy: 62.21 %\n",
467 | "Time taken for epoch 2: 0m 14s\n",
468 | "Training Loss: 0.9428 | Validation Loss: 0.8337\n",
469 | "Training Accuracy: 64.54 %| Validation Accuracy: 70.10 %\n",
470 | "Time taken for epoch 3: 0m 14s\n",
471 | "Training Loss: 0.7243 | Validation Loss: 0.7207\n",
472 | "Training Accuracy: 73.87 %| Validation Accuracy: 74.49 %\n",
473 | "Time taken for epoch 4: 0m 14s\n",
474 | "Training Loss: 0.5866 | Validation Loss: 0.6712\n",
475 | "Training Accuracy: 79.45 %| Validation Accuracy: 74.23 %\n",
476 | "Time taken for epoch 5: 0m 15s\n",
477 | "Training Loss: 0.4986 | Validation Loss: 0.6235\n",
478 | "Training Accuracy: 83.14 %| Validation Accuracy: 76.77 %\n",
479 | "Time taken for epoch 6: 0m 14s\n",
480 | "Training Loss: 0.3997 | Validation Loss: 0.5953\n",
481 | "Training Accuracy: 87.04 %| Validation Accuracy: 77.98 %\n",
482 | "Time taken for epoch 7: 0m 14s\n",
483 | "Training Loss: 0.3328 | Validation Loss: 0.5735\n",
484 | "Training Accuracy: 89.69 %| Validation Accuracy: 78.20 %\n",
485 | "Time taken for epoch 8: 0m 14s\n",
486 | "Training Loss: 0.2581 | Validation Loss: 0.5676\n",
487 | "Training Accuracy: 92.07 %| Validation Accuracy: 79.17 %\n",
488 | "Time taken for epoch 9: 0m 15s\n",
489 | "Training Loss: 0.2212 | Validation Loss: 0.5526\n",
490 | "Training Accuracy: 93.41 %| Validation Accuracy: 79.92 %\n",
491 | "Time taken for epoch 10: 0m 14s\n",
492 | "Training Loss: 0.1761 | Validation Loss: 0.5503\n",
493 | "Training Accuracy: 94.95 %| Validation Accuracy: 79.89 %\n",
494 | "Model with Train Loss 0.1761, Validation Loss: 0.5503 was saved.\n"
495 | ]
496 | }
497 | ],
498 | "source": [
499 | "print(f\"Learning Rate: {LR}, Number of Filters: {NUM_FILTERS}, Filter Sizes: {FILTER_SIZES}\")\n",
500 | "train_losses = []\n",
501 | "valid_losses = []\n",
502 | "min_losses = [float('inf'), float('inf')]\n",
503 | "\n",
504 | "start_time = time.time()\n",
505 | "for epoch in range(1, NUM_EPOCHS+1):\n",
506 | "\n",
507 | "    train_loss, train_acc = Train(model, train_iterator, optimizer, criterion)\n",
508 | "    train_losses.append(train_loss)\n",
509 | "    valid_loss, valid_acc = Evaluate(model, valid_iterator, criterion)\n",
510 | "    valid_losses.append(valid_loss)\n",
511 | "\n",
512 | "    if valid_loss < min_losses[0]:\n",
513 | "        min_losses[0] = valid_loss\n",
514 | "        min_losses[1] = train_loss\n",
515 | "        torch.save(model.state_dict(), 'CNN.pt')\n",
516 | "\n",
517 | "    elapsed_time = Epoch_time(start_time, time.time())\n",
518 | "    print(f\"Time taken for epoch {epoch}: {elapsed_time[0]}m {elapsed_time[1]}s\")\n",
519 | "    start_time = time.time()\n",
520 | "    print(f\"Training Loss: {train_loss:.4f} | Validation Loss: {valid_loss:.4f}\")\n",
521 | "    print(f\"Training Accuracy: {train_acc*100:.2f} %| Validation Accuracy: {valid_acc*100:.2f} %\")\n",
522 | "\n",
523 | "print(f\"Model with Train Loss {min_losses[1]:.4f}, Validation Loss: {min_losses[0]:.4f} was saved.\")"
524 | ]
525 | },
526 | {
527 | "cell_type": "code",
528 | "execution_count": 27,
529 | "metadata": {
530 | "id": "OkjBpvPt8BmJ"
531 | },
532 | "outputs": [
533 | {
534
| "data": {
535 | "image/png": "[base64-encoded PNG omitted: learning-curve plot of training and validation cross-entropy loss versus number of epochs]",
536 | "text/plain": [
537 | ""
538 | ]
539 | },
540 | "metadata": {
541 | "needs_background": "light"
542 | },
543 | "output_type": "display_data"
544 | }
545 | ],
546 | "source": [
547 | "plt.title(\"Text Classification (CNN): Learning Curves\")\n",
548 | "plt.xlabel(\"Number of Epochs\")\n",
549 | "plt.ylabel(\"Cross Entropy Loss\")\n",
550 | "plt.plot(train_losses, label = \"Training Loss\")\n",
551 | "plt.plot(valid_losses, label= \"Validation Loss\")\n",
552 | "plt.legend()\n",
553 | "plt.show()"
554 | ]
555 | },
556 | {
557 | "cell_type": "markdown",
558 | "metadata": {},
559 | "source": [
560 | "# Testing"
561 | ]
562 | },
563 | {
564 | "cell_type": "code",
565 | "execution_count": 23,
566 | "metadata": {
567 | "id": "YLpKGbjyu2OK"
568 | },
569 | "outputs": [
570 | {
571 | "name": "stdout",
572 | "output_type": "stream",
573 | "text": [
574 | "Test Loss: 0.3947\n",
575 | "Test Accuracy: 84.90%\n"
576 | ]
577 | }
578 | ],
579 | "source": [
580 | "model.load_state_dict(torch.load('CNN.pt'))\n",
581 | "\n",
582 | "test_loss, test_acc = Evaluate(model, test_iterator, criterion)\n",
583 | "\n",
584 | "print(f'Test Loss: {test_loss:.4f}')\n",
585 | "print(f'Test Accuracy: {test_acc*100:.2f}%')"
586 | ]
587 | },
588 | {
589 | "cell_type": "markdown",
590 | "metadata": {
591 | "id": "o3vZ2NNn8sM6"
592 | },
593 | "source": [
594 | "# Sampling"
595 | ]
596 | },
597 | {
598 | "cell_type": "code",
599 | "execution_count": 24,
600 | "metadata": {
601 | "id": "e0hc5XtDsqIH"
602 | },
603 | "outputs": [],
604 | "source": [
605 | "import spacy\n",
606 | "nlp = spacy.load('en_core_web_sm')\n",
607 | "\n",
608 | "def predict_class(model, text, min_len = 4):\n",
609 | "    model.eval()\n",
610 | "    tokenized = [tok.text for tok in nlp.tokenizer(text)]\n",
611 | "    if len(tokenized) < min_len:\n",
612 | "        tokenized += ['<pad>'] * (min_len - len(tokenized))\n",
613 | "    indexed = [FIELD.vocab.stoi[t] for t in tokenized]\n",
614 | "    tensor = torch.LongTensor(indexed).to(device)\n",
615 | "    tensor =
tensor.unsqueeze(1)\n", 616 | " preds = model(tensor)\n", 617 | " max_preds = preds.argmax(dim = 1)\n", 618 | " return max_preds.item()" 619 | ] 620 | }, 621 | { 622 | "cell_type": "code", 623 | "execution_count": 42, 624 | "metadata": {}, 625 | "outputs": [ 626 | { 627 | "name": "stdout", 628 | "output_type": "stream", 629 | "text": [ 630 | "['ENTY', 'HUM', 'DESC', 'NUM', 'LOC', 'ABBR']\n" 631 | ] 632 | } 633 | ], 634 | "source": [ 635 | "print(LABEL.vocab.itos)" 636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "execution_count": 52, 641 | "metadata": { 642 | "id": "vc0zO3gTws-d" 643 | }, 644 | "outputs": [ 645 | { 646 | "name": "stdout", 647 | "output_type": "stream", 648 | "text": [ 649 | "Predicted class is: ABBR\n" 650 | ] 651 | } 652 | ], 653 | "source": [ 654 | "pred_class = predict_class(model, \"What does CNN stand for?\")\n", 655 | "print(f'Predicted class is: {LABEL.vocab.itos[pred_class]}')" 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": 53, 661 | "metadata": { 662 | "id": "Umdey8pIw8yp" 663 | }, 664 | "outputs": [ 665 | { 666 | "name": "stdout", 667 | "output_type": "stream", 668 | "text": [ 669 | "Predicted class is: NUM\n" 670 | ] 671 | } 672 | ], 673 | "source": [ 674 | "pred_class = predict_class(model, \"How long did it take to train an epoch?\")\n", 675 | "print(f'Predicted class is: {LABEL.vocab.itos[pred_class]}')" 676 | ] 677 | }, 678 | { 679 | "cell_type": "code", 680 | "execution_count": 57, 681 | "metadata": { 682 | "id": "8rbQ8FYRw-TM" 683 | }, 684 | "outputs": [ 685 | { 686 | "name": "stdout", 687 | "output_type": "stream", 688 | "text": [ 689 | "Predicted class is: DESC\n" 690 | ] 691 | } 692 | ], 693 | "source": [ 694 | "pred_class = predict_class(model, \"What is the model of GPU used?\")\n", 695 | "print(f'Predicted class is: {LABEL.vocab.itos[pred_class]}')" 696 | ] 697 | } 698 | ], 699 | "metadata": { 700 | "accelerator": "GPU", 701 | "colab": { 702 | "collapsed_sections": [], 703 | "name": 
"CNN.ipynb", 704 | "provenance": [] 705 | }, 706 | "kernelspec": { 707 | "display_name": "Python 3", 708 | "name": "python3" 709 | }, 710 | "language_info": { 711 | "codemirror_mode": { 712 | "name": "ipython", 713 | "version": 3 714 | }, 715 | "file_extension": ".py", 716 | "mimetype": "text/x-python", 717 | "name": "python", 718 | "nbconvert_exporter": "python", 719 | "pygments_lexer": "ipython3", 720 | "version": "3.9.7" 721 | } 722 | }, 723 | "nbformat": 4, 724 | "nbformat_minor": 0 725 | } 726 | -------------------------------------------------------------------------------- /text_classification/notebooks/FastText.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Installing Packages" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "!pip install -U torchtext==0.6.0" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "bbMNpP192mb2" 23 | }, 24 | "source": [ 25 | "# Importing Required Libraries" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": { 32 | "id": "slw2Y5t3taWQ" 33 | }, 34 | "outputs": [], 35 | "source": [ 36 | "import torch\n", 37 | "from torchtext import data, datasets\n", 38 | "import torch.nn as nn\n", 39 | "import torch.optim as optim\n", 40 | "import torch.nn.functional as F\n", 41 | "from torchtext.data import Field, LabelField, BucketIterator\n", 42 | "\n", 43 | "import random\n", 44 | "\n", 45 | "import matplotlib.pyplot as plt" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": { 52 | "colab": { 53 | "base_uri": "https://localhost:8080/" 54 | }, 55 | "id": "7kAMYpKr2tJV", 56 | "outputId": "5d7e7127-e114-496d-c1f0-08a57498694c" 57 | }, 58 | "outputs": [ 59 | { 60 | "name": "stdout", 61 | "output_type": 
"stream", 62 | "text": [ 63 | "Notebook is running on cuda\n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", 69 | "print(\"Notebook is running on\", device)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": { 75 | "id": "8ZOcRV2w2vpH" 76 | }, 77 | "source": [ 78 | "Fixing SEED for reproducibility of results" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 4, 84 | "metadata": { 85 | "id": "N99mW-8XtnWr" 86 | }, 87 | "outputs": [], 88 | "source": [ 89 | "SEED = 4444\n", 90 | "\n", 91 | "random.seed(SEED)\n", 92 | "torch.manual_seed(SEED)\n", 93 | "torch.cuda.manual_seed(SEED)\n", 94 | "torch.backends.cudnn.deterministic = True" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": { 100 | "id": "sMxBvhY8y0wq" 101 | }, 102 | "source": [ 103 | "Generate NGRAMS for FastText" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 5, 109 | "metadata": { 110 | "id": "jk054NHay0fN" 111 | }, 112 | "outputs": [], 113 | "source": [ 114 | "def generate_n_grams(x, n=2):\n", 115 | " n_grams = set(zip(*[x[i:] for i in range(n)]))\n", 116 | " for n_gram in n_grams:\n", 117 | " x.append(' '.join(n_gram))\n", 118 | " return list(set(x))" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 6, 124 | "metadata": { 125 | "id": "O5NaQke7touV" 126 | }, 127 | "outputs": [], 128 | "source": [ 129 | "FIELD = Field(tokenize = 'spacy',tokenizer_language = 'en_core_web_sm', preprocessing = generate_n_grams)\n", 130 | "\n", 131 | "LABEL = LabelField(dtype = torch.float)" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": { 137 | "id": "RWXCd0V73YK7" 138 | }, 139 | "source": [ 140 | "# Splitting the data" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 7, 146 | "metadata": { 147 | "colab": { 148 | "base_uri": "https://localhost:8080/" 149 | }, 150 | "id": 
"79ih2w2sttNq", 151 | "outputId": "b33d07a7-3983-4b49-a81d-db6f2ca0b908" 152 | }, 153 | "outputs": [], 154 | "source": [ 155 | "train_data, test_data = datasets.IMDB.splits(FIELD, LABEL)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 8, 161 | "metadata": { 162 | "id": "nxpdowpRtwNQ" 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "train_data, valid_data = train_data.split(random_state = random.seed(SEED))" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 9, 172 | "metadata": { 173 | "colab": { 174 | "base_uri": "https://localhost:8080/" 175 | }, 176 | "id": "BSWt_wDwtzLt", 177 | "outputId": "59e4e52f-b3d7-4300-ef8f-876519cd0092" 178 | }, 179 | "outputs": [ 180 | { 181 | "name": "stdout", 182 | "output_type": "stream", 183 | "text": [ 184 | "Number of training examples: 17500\n", 185 | "Number of validation examples: 7500\n", 186 | "Number of testing examples: 25000\n" 187 | ] 188 | } 189 | ], 190 | "source": [ 191 | "print(f'Number of training examples: {len(train_data)}')\n", 192 | "print(f'Number of validation examples: {len(valid_data)}')\n", 193 | "print(f'Number of testing examples: {len(test_data)}')" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 10, 199 | "metadata": { 200 | "colab": { 201 | "base_uri": "https://localhost:8080/" 202 | }, 203 | "id": "trFz-SmNt0fO", 204 | "outputId": "6535fa21-210f-49a5-c8ae-01e80e80b092" 205 | }, 206 | "outputs": [], 207 | "source": [ 208 | "MAX_VOCAB_SIZE = 25000 # excluding the <pad> and <unk> tokens\n", 209 | "\n", 210 | "FIELD.build_vocab(train_data, max_size = MAX_VOCAB_SIZE, vectors=\"glove.6B.100d\", unk_init = torch.Tensor.normal_)\n", 211 | "LABEL.build_vocab(train_data)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 11, 217 | "metadata": { 218 | "colab": { 219 | "base_uri": "https://localhost:8080/" 220 | }, 221 | "id": "PQUUTTgat5hg", 222 | "outputId": "965f4522-3b08-4866-bb7d-f0a32c2c5649" 223 | }, 224 | 
"outputs": [ 225 | { 226 | "name": "stdout", 227 | "output_type": "stream", 228 | "text": [ 229 | "Unique tokens in FIELD vocabulary: 25002\n", 230 | "Unique tokens in LABEL vocabulary: 2\n" 231 | ] 232 | } 233 | ], 234 | "source": [ 235 | "print(f\"Unique tokens in FIELD vocabulary: {len(FIELD.vocab)}\")\n", 236 | "print(f\"Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}\")" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": { 242 | "id": "BIZMVoIA4S2K" 243 | }, 244 | "source": [ 245 | "# Model Definition" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 12, 251 | "metadata": { 252 | "id": "egyx2GtzuEdP" 253 | }, 254 | "outputs": [], 255 | "source": [ 256 | "class FastText(nn.Module):\n", 257 | " def __init__(self, vocab_size, emb_dim, output_dim, pad_idx):\n", 258 | " super().__init__()\n", 259 | " self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)\n", 260 | " self.fc = nn.Linear(emb_dim, output_dim)\n", 261 | " \n", 262 | " def forward(self, input): # [input] = [input_length, batch_size]\n", 263 | " embedded = self.embedding(input) # [embedded] = [seq_len, batch_size, emb_dim]\n", 264 | " embedded = embedded.permute(1, 0, 2) # [embedded] = [batch_size, seq_len, emb_dim]\n", 265 | " pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)) # [pooled] = [batch_size, 1, emb_dim] \n", 266 | " pooled = pooled.squeeze(1) # [pooled] = [batch size, embedding_dim]\n", 267 | " return self.fc(pooled)" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 13, 273 | "metadata": { 274 | "id": "2PDdgoXiufHr" 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "def batch_accuracy(preds, y):\n", 279 | " #round predictions to the closest integer\n", 280 | " rounded_preds = torch.round(torch.sigmoid(preds))\n", 281 | " correct = (rounded_preds == y).float()\n", 282 | " acc = correct.sum() / len(correct)\n", 283 | " return acc" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 
288 | "metadata": { 289 | "id": "OpnqXXA1uvuD" 290 | }, 291 | "source": [ 292 | "# Training and Evaluation Functions" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 14, 298 | "metadata": { 299 | "id": "gKSFO3hRutOc" 300 | }, 301 | "outputs": [], 302 | "source": [ 303 | "def Train(model, iterator, optimizer, criterion):\n", 304 | " epoch_loss = 0\n", 305 | " epoch_acc = 0\n", 306 | " model.train()\n", 307 | " for batch in iterator:\n", 308 | " optimizer.zero_grad()\n", 309 | " inp = batch.text\n", 310 | " label = batch.label \n", 311 | " predictions = model(inp).squeeze(1)\n", 312 | " loss = criterion(predictions, label)\n", 313 | " acc = batch_accuracy(predictions, label)\n", 314 | " loss.backward()\n", 315 | " optimizer.step()\n", 316 | " epoch_loss += loss.item()\n", 317 | " epoch_acc += acc.item() \n", 318 | " return epoch_loss / len(iterator), epoch_acc / len(iterator)" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 15, 324 | "metadata": { 325 | "id": "YqTp93Aruwoe" 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "def Evaluate(model, iterator, criterion):\n", 330 | " epoch_loss = 0\n", 331 | " epoch_acc = 0\n", 332 | " model.eval()\n", 333 | " with torch.no_grad():\n", 334 | " for batch in iterator: \n", 335 | " inp = batch.text\n", 336 | " label = batch.label \n", 337 | " predictions = model(inp).squeeze(1)\n", 338 | " loss = criterion(predictions, label)\n", 339 | " acc = batch_accuracy(predictions, label)\n", 340 | " epoch_loss += loss.item()\n", 341 | " epoch_acc += acc.item() \n", 342 | " return epoch_loss / len(iterator), epoch_acc / len(iterator)" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": { 348 | "id": "v1_3yX3-42vO" 349 | }, 350 | "source": [ 351 | "# Data Iterators, Hyperparameters and Model Initialization" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 16, 357 | "metadata": { 358 | "id": "yIHX-jbs5ClS" 359 | }, 360 | 
"outputs": [], 361 | "source": [ 362 | "BATCH_SIZE = 64\n", 363 | "\n", 364 | "train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits((train_data, valid_data, test_data), batch_size = BATCH_SIZE, device = device)" 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 17, 370 | "metadata": { 371 | "id": "o6c7ld9wuYh4" 372 | }, 373 | "outputs": [], 374 | "source": [ 375 | "VOCAB_SIZE = len(FIELD.vocab) # dimension of one-hot vector / vocabulary\n", 376 | "EMB_DIM = 100 # dimension of word embeddings\n", 377 | "OUTPUT_DIM = 1 # dimension of output layer\n", 378 | "\n", 379 | "NUM_EPOCHS = 10\n", 380 | "LR = 0.001" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 18, 386 | "metadata": { 387 | "id": "j5U3vbZcwEgr" 388 | }, 389 | "outputs": [], 390 | "source": [ 391 | "model = FastText(VOCAB_SIZE, EMB_DIM, OUTPUT_DIM, FIELD.vocab.stoi[FIELD.pad_token])" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 19, 397 | "metadata": { 398 | "id": "ziglTF1D5voa" 399 | }, 400 | "outputs": [], 401 | "source": [ 402 | "optimizer = optim.Adam(model.parameters(), lr=LR)\n", 403 | "\n", 404 | "criterion = nn.BCEWithLogitsLoss()\n", 405 | "\n", 406 | "model = model.to(device)\n", 407 | "criterion = criterion.to(device)" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 20, 413 | "metadata": { 414 | "colab": { 415 | "base_uri": "https://localhost:8080/" 416 | }, 417 | "id": "X7Mm3BnuuZ2F", 418 | "outputId": "1c65ec04-2ac8-4f1d-fb03-08ef3b8f4114" 419 | }, 420 | "outputs": [ 421 | { 422 | "name": "stdout", 423 | "output_type": "stream", 424 | "text": [ 425 | "The model has 2,500,301 trainable parameters\n" 426 | ] 427 | } 428 | ], 429 | "source": [ 430 | "def count_parameters(model):\n", 431 | " return sum(p.numel() for p in model.parameters() if p.requires_grad)\n", 432 | "\n", 433 | "print(f'The model has {count_parameters(model):,} trainable parameters')" 434 | ] 
435 | }, 436 | { 437 | "cell_type": "markdown", 438 | "metadata": { 439 | "id": "99qvK1qw5aAV" 440 | }, 441 | "source": [ 442 | "# Training" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": 21, 448 | "metadata": { 449 | "id": "WT1J71IjuySn" 450 | }, 451 | "outputs": [], 452 | "source": [ 453 | "import time\n", 454 | "\n", 455 | "def Epoch_time(start_time, end_time):\n", 456 | " elapsed_time = end_time - start_time\n", 457 | " elapsed_mins = int(elapsed_time / 60)\n", 458 | " elapsed_secs = int(elapsed_time - (elapsed_mins * 60))\n", 459 | " return elapsed_mins, elapsed_secs" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": 22, 465 | "metadata": { 466 | "colab": { 467 | "base_uri": "https://localhost:8080/" 468 | }, 469 | "id": "TKDOAVH5uz2s", 470 | "outputId": "aafc2306-93a3-4ee0-8678-4e8a276ea1e1" 471 | }, 472 | "outputs": [ 473 | { 474 | "name": "stdout", 475 | "output_type": "stream", 476 | "text": [ 477 | "Learning Rate: 0.001\n", 478 | "Time taken for epoch 1: 0m 9s\n", 479 | "Training Loss: 0.6900 | Validation Loss: 0.6643\n", 480 | "Training Accuracy: 53.88 %| Validation Accuracy: 63.70 %\n", 481 | "Time taken for epoch 2: 0m 8s\n", 482 | "Training Loss: 0.6689 | Validation Loss: 0.5680\n", 483 | "Training Accuracy: 70.73 %| Validation Accuracy: 75.88 %\n", 484 | "Time taken for epoch 3: 0m 8s\n", 485 | "Training Loss: 0.6161 | Validation Loss: 0.4792\n", 486 | "Training Accuracy: 78.73 %| Validation Accuracy: 78.10 %\n", 487 | "Time taken for epoch 4: 0m 8s\n", 488 | "Training Loss: 0.5431 | Validation Loss: 0.3829\n", 489 | "Training Accuracy: 83.51 %| Validation Accuracy: 83.00 %\n", 490 | "Time taken for epoch 5: 0m 8s\n", 491 | "Training Loss: 0.4712 | Validation Loss: 0.3528\n", 492 | "Training Accuracy: 86.64 %| Validation Accuracy: 85.13 %\n", 493 | "Time taken for epoch 6: 0m 8s\n", 494 | "Training Loss: 0.4132 | Validation Loss: 0.3434\n", 495 | "Training Accuracy: 88.35 %| Validation 
Accuracy: 86.67 %\n", 496 | "Time taken for epoch 7: 0m 8s\n", 497 | "Training Loss: 0.3627 | Validation Loss: 0.3479\n", 498 | "Training Accuracy: 89.37 %| Validation Accuracy: 87.30 %\n", 499 | "Time taken for epoch 8: 0m 8s\n", 500 | "Training Loss: 0.3263 | Validation Loss: 0.3536\n", 501 | "Training Accuracy: 90.39 %| Validation Accuracy: 87.87 %\n", 502 | "Time taken for epoch 9: 0m 8s\n", 503 | "Training Loss: 0.2976 | Validation Loss: 0.3649\n", 504 | "Training Accuracy: 91.05 %| Validation Accuracy: 88.28 %\n", 505 | "Time taken for epoch 10: 0m 8s\n", 506 | "Training Loss: 0.2725 | Validation Loss: 0.3771\n", 507 | "Training Accuracy: 91.55 %| Validation Accuracy: 88.56 %\n", 508 | "Model with Train Loss 0.4132, Validation Loss: 0.3434 was saved.\n" 509 | ] 510 | } 511 | ], 512 | "source": [ 513 | "print(f\"Learning Rate: {LR}\")\n", 514 | "train_losses = []\n", 515 | "valid_losses = []\n", 516 | "min_losses = [float('inf'), float('inf')]\n", 517 | "\n", 518 | "start_time = time.time()\n", 519 | "for epoch in range(1, NUM_EPOCHS+1):\n", 520 | " \n", 521 | " train_loss, train_acc = Train(model, train_iterator, optimizer, criterion)\n", 522 | " train_losses.append(train_loss)\n", 523 | " valid_loss, valid_acc = Evaluate(model, valid_iterator, criterion)\n", 524 | " valid_losses.append(valid_loss)\n", 525 | "\n", 526 | " if valid_loss < min_losses[0]:\n", 527 | " min_losses[0] = valid_loss\n", 528 | " min_losses[1] = train_loss\n", 529 | " torch.save(model.state_dict(), 'FastText.pt')\n", 530 | "\n", 531 | " elapsed_time = Epoch_time(start_time, time.time())\n", 532 | " print(f\"Time taken for epoch {epoch}: {elapsed_time[0]}m {elapsed_time[1]}s\")\n", 533 | " start_time = time.time()\n", 534 | " print(f\"Training Loss: {train_loss:.4f} | Validation Loss: {valid_loss:.4f}\")\n", 535 | " print(f\"Training Accuracy: {train_acc*100:.2f} %| Validation Accuracy: {valid_acc*100:.2f} %\")\n", 536 | "\n", 537 | "print(f\"Model with Train Loss {min_losses[1]:.4f}, 
Validation Loss: {min_losses[0]:.4f} was saved.\")" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": 23, 543 | "metadata": { 544 | "colab": { 545 | "base_uri": "https://localhost:8080/", 546 | "height": 295 547 | }, 548 | "id": "OkjBpvPt8BmJ", 549 | "outputId": "04320b26-1bc8-44d2-c904-17650335ee40" 550 | }, 551 | "outputs": [ 552 | { 553 | "data": { 554 | "image/png": "", 555 | "text/plain": [ 556 | "
" 557 | ] 558 | }, 559 | "metadata": { 560 | "needs_background": "light" 561 | }, 562 | "output_type": "display_data" 563 | } 564 | ], 565 | "source": [ 566 | "plt.title(\"Sentiment Analysis (FastText): Learning Curves\")\n", 567 | "plt.xlabel(\"Number of Epochs\")\n", 568 | "plt.ylabel(\"Binary Cross Entropy Loss\")\n", 569 | "plt.plot(train_losses, label = \"Training Loss\")\n", 570 | "plt.plot(valid_losses, label= \"Validation Loss\")\n", 571 | "plt.legend()\n", 572 | "plt.show()" 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "# Testing" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 24, 585 | "metadata": { 586 | "colab": { 587 | "base_uri": "https://localhost:8080/" 588 | }, 589 | "id": "YLpKGbjyu2OK", 590 | "outputId": "fbad5f78-802f-441e-8f09-a461c781732e" 591 | }, 592 | "outputs": [ 593 | { 594 | "name": "stdout", 595 | "output_type": "stream", 596 | "text": [ 597 | "Test Loss: 0.3499\n", 598 | "Test Accuracy: 86.28%\n" 599 | ] 600 | } 601 | ], 602 | "source": [ 603 | "model.load_state_dict(torch.load('FastText.pt'))\n", 604 | "\n", 605 | "test_loss, test_acc = Evaluate(model, test_iterator, criterion)\n", 606 | "\n", 607 | "print(f'Test Loss: {test_loss:.4f}')\n", 608 | "print(f'Test Accuracy: {test_acc*100:.2f}%')" 609 | ] 610 | }, 611 | { 612 | "cell_type": "markdown", 613 | "metadata": { 614 | "id": "o3vZ2NNn8sM6" 615 | }, 616 | "source": [ 617 | "# Sampling" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 25, 623 | "metadata": { 624 | "id": "e0hc5XtDsqIH" 625 | }, 626 | "outputs": [], 627 | "source": [ 628 | "import spacy\n", 629 | "nlp = spacy.load('en_core_web_sm')\n", 630 | "\n", 631 | "def predict_sentiment(model, text):\n", 632 | " model.eval()\n", 633 | " tokenized = generate_n_grams([tok.text for tok in nlp.tokenizer(text)], n=2)\n", 634 | " indexed = [FIELD.vocab.stoi[t] for t in tokenized]\n", 635 | " tensor = 
torch.LongTensor(indexed).to(device)\n", 636 | " tensor = tensor.unsqueeze(1)\n", 637 | " prediction = torch.sigmoid(model(tensor))\n", 638 | " return prediction.item()" 639 | ] 640 | }, 641 | { 642 | "cell_type": "code", 643 | "execution_count": 26, 644 | "metadata": { 645 | "colab": { 646 | "base_uri": "https://localhost:8080/" 647 | }, 648 | "id": "vc0zO3gTws-d", 649 | "outputId": "1ab04c54-b9e4-4323-943d-31cf2600127a" 650 | }, 651 | "outputs": [ 652 | { 653 | "data": { 654 | "text/plain": [ 655 | "4.2760206042657956e-07" 656 | ] 657 | }, 658 | "execution_count": 26, 659 | "metadata": {}, 660 | "output_type": "execute_result" 661 | } 662 | ], 663 | "source": [ 664 | "predict_sentiment(model, \"This film is not bad\")" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": 27, 670 | "metadata": { 671 | "colab": { 672 | "base_uri": "https://localhost:8080/" 673 | }, 674 | "id": "Umdey8pIw8yp", 675 | "outputId": "de24d02b-3c28-4dfa-f0b8-c4f0a4650340" 676 | }, 677 | "outputs": [ 678 | { 679 | "data": { 680 | "text/plain": [ 681 | "1.0" 682 | ] 683 | }, 684 | "execution_count": 27, 685 | "metadata": {}, 686 | "output_type": "execute_result" 687 | } 688 | ], 689 | "source": [ 690 | "predict_sentiment(model, \"This film is excellent\")" 691 | ] 692 | }, 693 | { 694 | "cell_type": "code", 695 | "execution_count": 28, 696 | "metadata": { 697 | "colab": { 698 | "base_uri": "https://localhost:8080/" 699 | }, 700 | "id": "8rbQ8FYRw-TM", 701 | "outputId": "37bb1381-ac04-42ac-de39-e48de16ae9db" 702 | }, 703 | "outputs": [ 704 | { 705 | "data": { 706 | "text/plain": [ 707 | "2.538240104210665e-11" 708 | ] 709 | }, 710 | "execution_count": 28, 711 | "metadata": {}, 712 | "output_type": "execute_result" 713 | } 714 | ], 715 | "source": [ 716 | "predict_sentiment(model, \"This film is bad\")" 717 | ] 718 | } 719 | ], 720 | "metadata": { 721 | "accelerator": "GPU", 722 | "colab": { 723 | "name": "FastText.ipynb", 724 | "provenance": [] 725 | }, 726 | 
"kernelspec": { 727 | "display_name": "Python 3", 728 | "name": "python3" 729 | }, 730 | "language_info": { 731 | "codemirror_mode": { 732 | "name": "ipython", 733 | "version": 3 734 | }, 735 | "file_extension": ".py", 736 | "mimetype": "text/x-python", 737 | "name": "python", 738 | "nbconvert_exporter": "python", 739 | "pygments_lexer": "ipython3", 740 | "version": "3.9.7" 741 | } 742 | }, 743 | "nbformat": 4, 744 | "nbformat_minor": 0 745 | } 746 | -------------------------------------------------------------------------------- /text_classification/notebooks/LSTM.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Installing Packages" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "id": "bbMNpP192mb2" 14 | }, 15 | "source": [ 16 | "# Importing Required Libraries" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": { 23 | "id": "slw2Y5t3taWQ" 24 | }, 25 | "outputs": [], 26 | "source": [ 27 | "import torch\n", 28 | "from torchtext import data, datasets\n", 29 | "import torch.nn as nn\n", 30 | "import torch.optim as optim\n", 31 | "from torch.nn.utils.rnn import pack_padded_sequence\n", 32 | "from torchtext.data import Field, LabelField, BucketIterator\n", 33 | "\n", 34 | "import random\n", 35 | "\n", 36 | "import matplotlib.pyplot as plt" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 2, 42 | "metadata": { 43 | "colab": { 44 | "base_uri": "https://localhost:8080/" 45 | }, 46 | "id": "7kAMYpKr2tJV", 47 | "outputId": "1b33d632-6261-4881-8703-5a572fa8c210" 48 | }, 49 | "outputs": [ 50 | { 51 | "name": "stdout", 52 | "output_type": "stream", 53 | "text": [ 54 | "Notebook is running on cuda\n" 55 | ] 56 | } 57 | ], 58 | "source": [ 59 | "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", 60 | "print(\"Notebook is running 
on\", device)" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": { 66 | "id": "8ZOcRV2w2vpH" 67 | }, 68 | "source": [ 69 | "Fixing SEED for reproducibility of results" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 3, 75 | "metadata": { 76 | "id": "N99mW-8XtnWr" 77 | }, 78 | "outputs": [], 79 | "source": [ 80 | "SEED = 4444\n", 81 | "\n", 82 | "random.seed(SEED)\n", 83 | "torch.manual_seed(SEED)\n", 84 | "torch.cuda.manual_seed(SEED)\n", 85 | "torch.backends.cudnn.deterministic = True" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 4, 91 | "metadata": { 92 | "id": "O5NaQke7touV" 93 | }, 94 | "outputs": [], 95 | "source": [ 96 | "FIELD = Field(tokenize = 'spacy', tokenizer_language = 'en_core_web_sm', include_lengths=True)\n", 97 | "\n", 98 | "LABEL = LabelField(dtype = torch.float)" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": { 104 | "id": "RWXCd0V73YK7" 105 | }, 106 | "source": [ 107 | "# Splitting the data" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 5, 113 | "metadata": { 114 | "id": "79ih2w2sttNq" 115 | }, 116 | "outputs": [], 117 | "source": [ 118 | "train_data, test_data = datasets.IMDB.splits(FIELD, LABEL)" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 6, 124 | "metadata": { 125 | "id": "nxpdowpRtwNQ" 126 | }, 127 | "outputs": [], 128 | "source": [ 129 | "train_data, valid_data = train_data.split(random_state = random.seed(SEED))" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 7, 135 | "metadata": { 136 | "colab": { 137 | "base_uri": "https://localhost:8080/" 138 | }, 139 | "id": "BSWt_wDwtzLt", 140 | "outputId": "a5c136ea-d381-4ee1-8e91-a908e5f8ece1" 141 | }, 142 | "outputs": [ 143 | { 144 | "name": "stdout", 145 | "output_type": "stream", 146 | "text": [ 147 | "Number of training examples: 17500\n", 148 | "Number of validation examples: 7500\n", 149 | "Number of 
testing examples: 25000\n" 150 | ] 151 | } 152 | ], 153 | "source": [ 154 | "print(f'Number of training examples: {len(train_data)}')\n", 155 | "print(f'Number of validation examples: {len(valid_data)}')\n", 156 | "print(f'Number of testing examples: {len(test_data)}')" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 8, 162 | "metadata": { 163 | "id": "trFz-SmNt0fO" 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "MAX_VOCAB_SIZE = 25000 # excluding the <pad> and <unk> tokens\n", 168 | "\n", 169 | "FIELD.build_vocab(train_data, max_size = MAX_VOCAB_SIZE, vectors=\"glove.6B.100d\", unk_init = torch.Tensor.normal_)\n", 170 | "LABEL.build_vocab(train_data)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 9, 176 | "metadata": { 177 | "colab": { 178 | "base_uri": "https://localhost:8080/" 179 | }, 180 | "id": "PQUUTTgat5hg", 181 | "outputId": "6fde1f1d-4150-4e21-aba1-f66137439cee" 182 | }, 183 | "outputs": [ 184 | { 185 | "name": "stdout", 186 | "output_type": "stream", 187 | "text": [ 188 | "Unique tokens in FIELD vocabulary: 25002\n", 189 | "Unique tokens in LABEL vocabulary: 2\n" 190 | ] 191 | } 192 | ], 193 | "source": [ 194 | "print(f\"Unique tokens in FIELD vocabulary: {len(FIELD.vocab)}\")\n", 195 | "print(f\"Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}\")" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": { 201 | "id": "BIZMVoIA4S2K" 202 | }, 203 | "source": [ 204 | "# Model Definition" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 10, 210 | "metadata": { 211 | "id": "egyx2GtzuEdP" 212 | }, 213 | "outputs": [], 214 | "source": [ 215 | "class LSTM(nn.Module):\n", 216 | " def __init__(self, vocab_size, emb_size, hidden_size, output_size, num_layers, pad_idx):\n", 217 | " super().__init__()\n", 218 | " self.embedding = nn.Embedding(vocab_size, emb_size, padding_idx = pad_idx)\n", 219 | " self.lstm = nn.LSTM(emb_size, hidden_size, num_layers = num_layers, 
bidirectional=True, dropout=0.5)\n", 220 | " self.fc = nn.Linear(hidden_size*2, output_size)\n", 221 | " self.dropout = nn.Dropout(0.5)\n", 222 | " \n", 223 | " def forward(self, input, input_lengths): # [input] = [seq_len, batch_size]\n", 224 | " embedded = self.dropout(self.embedding(input)) # [embedded] = [seq_len, batch_size, emb_size]\n", 225 | " packed_embedded = pack_padded_sequence(embedded, input_lengths.to('cpu')) # input lengths must be on the CPU\n", 226 | " packed_output, (hidden, cell) = self.lstm(packed_embedded)\n", 227 | " output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)\n", 228 | " # [output] = [seq_len, batch_size, hidden_size*2]\n", 229 | " # [hidden] = [num_layers * 2, batch_size, hidden_size]\n", 230 | " # [cell] = [num_layers * 2, batch_size, hidden_size]\n", 231 | " hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)) # [hidden] = [batch_size, hidden_size * 2]\n", 232 | " return self.fc(hidden)" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 11, 238 | "metadata": { 239 | "id": "2PDdgoXiufHr" 240 | }, 241 | "outputs": [], 242 | "source": [ 243 | "def batch_accuracy(preds, y):\n", 244 | "\n", 245 | " # round the sigmoid outputs to the nearest integer (0 or 1)\n", 246 | " rounded_preds = torch.round(torch.sigmoid(preds))\n", 247 | " correct = (rounded_preds == y).float()\n", 248 | " acc = correct.sum() / len(correct)\n", 249 | " return acc" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": { 255 | "id": "OpnqXXA1uvuD" 256 | }, 257 | "source": [ 258 | "# Training and Evaluation Functions" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 12, 264 | "metadata": { 265 | "id": "gKSFO3hRutOc" 266 | }, 267 | "outputs": [], 268 | "source": [ 269 | "def Train(model, iterator, optimizer, criterion):\n", 270 | " epoch_loss = 0\n", 271 | " epoch_acc = 0\n", 272 | " model.train()\n", 273 | " for batch in iterator:\n", 274 | " 
optimizer.zero_grad()\n", 275 | " inp, inp_lengths = batch.text\n", 276 | " label = batch.label \n", 277 | " predictions = model(inp, inp_lengths).squeeze(1)\n", 278 | " loss = criterion(predictions, label)\n", 279 | " acc = batch_accuracy(predictions, label)\n", 280 | " loss.backward()\n", 281 | " optimizer.step()\n", 282 | " epoch_loss += loss.item()\n", 283 | " epoch_acc += acc.item()\n", 284 | " return epoch_loss / len(iterator), epoch_acc / len(iterator)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 13, 290 | "metadata": { 291 | "id": "YqTp93Aruwoe" 292 | }, 293 | "outputs": [], 294 | "source": [ 295 | "def Evaluate(model, iterator, criterion):\n", 296 | " epoch_loss = 0\n", 297 | " epoch_acc = 0\n", 298 | " model.eval()\n", 299 | " with torch.no_grad():\n", 300 | " for batch in iterator:\n", 301 | " inp, inp_lengths = batch.text\n", 302 | " label = batch.label \n", 303 | " predictions = model(inp, inp_lengths).squeeze(1)\n", 304 | " loss = criterion(predictions, label)\n", 305 | " acc = batch_accuracy(predictions, label)\n", 306 | " epoch_loss += loss.item()\n", 307 | " epoch_acc += acc.item()\n", 308 | " return epoch_loss / len(iterator), epoch_acc / len(iterator)" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": { 314 | "id": "v1_3yX3-42vO" 315 | }, 316 | "source": [ 317 | "# Data Iterators, Hyperparameters and Model Initialization" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 14, 323 | "metadata": { 324 | "id": "yIHX-jbs5ClS" 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "BATCH_SIZE = 64\n", 329 | "\n", 330 | "train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits((train_data, valid_data, test_data), batch_size = BATCH_SIZE, sort_within_batch = True, device = device)" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 15, 336 | "metadata": { 337 | "id": "o6c7ld9wuYh4" 338 | }, 339 | "outputs": [], 340 | "source": [ 
341 | "VOCAB_SIZE = len(FIELD.vocab) # dimension of one-hot vector / vocabulary\n", 342 | "EMB_DIM = 100 # dimension of word embeddings\n", 343 | "HIDDEN_DIM = 256 # dimension of hidden layer\n", 344 | "OUTPUT_DIM = 1 # dimension of output layer\n", 345 | "NUM_LAYERS = 2\n", 346 | "\n", 347 | "NUM_EPOCHS = 10\n", 348 | "LR = 0.001" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 16, 354 | "metadata": { 355 | "id": "j5U3vbZcwEgr" 356 | }, 357 | "outputs": [], 358 | "source": [ 359 | "model = LSTM(VOCAB_SIZE, EMB_DIM, HIDDEN_DIM, OUTPUT_DIM, NUM_LAYERS, FIELD.vocab.stoi[FIELD.pad_token])" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 17, 365 | "metadata": { 366 | "id": "ziglTF1D5voa" 367 | }, 368 | "outputs": [], 369 | "source": [ 370 | "optimizer = optim.Adam(model.parameters(), lr=LR)\n", 371 | "\n", 372 | "criterion = nn.BCEWithLogitsLoss()\n", 373 | "\n", 374 | "model = model.to(device)\n", 375 | "criterion = criterion.to(device)" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 18, 381 | "metadata": { 382 | "colab": { 383 | "base_uri": "https://localhost:8080/" 384 | }, 385 | "id": "X7Mm3BnuuZ2F", 386 | "outputId": "6645fdd0-8293-41df-a4ff-cd29eea63919" 387 | }, 388 | "outputs": [ 389 | { 390 | "name": "stdout", 391 | "output_type": "stream", 392 | "text": [ 393 | "The model has 4,810,857 trainable parameters\n" 394 | ] 395 | } 396 | ], 397 | "source": [ 398 | "def count_parameters(model):\n", 399 | " return sum(p.numel() for p in model.parameters() if p.requires_grad)\n", 400 | "\n", 401 | "print(f'The model has {count_parameters(model):,} trainable parameters')" 402 | ] 403 | }, 404 | { 405 | "cell_type": "markdown", 406 | "metadata": { 407 | "id": "99qvK1qw5aAV" 408 | }, 409 | "source": [ 410 | "# Training" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 19, 416 | "metadata": { 417 | "id": "WT1J71IjuySn" 418 | }, 419 | "outputs": [], 420 | 
"source": [ 421 | "import time\n", 422 | "\n", 423 | "def Epoch_time(start_time, end_time):\n", 424 | " elapsed_time = end_time - start_time\n", 425 | " elapsed_mins = int(elapsed_time / 60)\n", 426 | " elapsed_secs = int(elapsed_time - (elapsed_mins * 60))\n", 427 | " return elapsed_mins, elapsed_secs" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 20, 433 | "metadata": { 434 | "colab": { 435 | "base_uri": "https://localhost:8080/" 436 | }, 437 | "id": "TKDOAVH5uz2s", 438 | "outputId": "57357c85-df7f-4e9a-a609-dd367866adea" 439 | }, 440 | "outputs": [ 441 | { 442 | "name": "stdout", 443 | "output_type": "stream", 444 | "text": [ 445 | "Learning Rate: 0.001, Hidden Dimensions: 256\n", 446 | "Time taken for epoch 1: 0m 35s\n", 447 | "Training Loss: 0.6791 | Validation Loss: 0.7054\n", 448 | "Training Accuracy: 56.48 %| Validation Accuracy: 52.33 %\n", 449 | "Time taken for epoch 2: 0m 34s\n", 450 | "Training Loss: 0.6756 | Validation Loss: 0.6665\n", 451 | "Training Accuracy: 56.92 %| Validation Accuracy: 57.87 %\n", 452 | "Time taken for epoch 3: 0m 35s\n", 453 | "Training Loss: 0.6262 | Validation Loss: 0.5267\n", 454 | "Training Accuracy: 64.63 %| Validation Accuracy: 74.09 %\n", 455 | "Time taken for epoch 4: 0m 35s\n", 456 | "Training Loss: 0.5106 | Validation Loss: 0.4333\n", 457 | "Training Accuracy: 75.00 %| Validation Accuracy: 79.46 %\n", 458 | "Time taken for epoch 5: 0m 35s\n", 459 | "Training Loss: 0.4344 | Validation Loss: 0.3877\n", 460 | "Training Accuracy: 80.39 %| Validation Accuracy: 82.77 %\n", 461 | "Time taken for epoch 6: 0m 35s\n", 462 | "Training Loss: 0.3672 | Validation Loss: 0.3312\n", 463 | "Training Accuracy: 84.22 %| Validation Accuracy: 86.01 %\n", 464 | "Time taken for epoch 7: 0m 35s\n", 465 | "Training Loss: 0.3312 | Validation Loss: 0.3588\n", 466 | "Training Accuracy: 86.08 %| Validation Accuracy: 84.35 %\n", 467 | "Time taken for epoch 8: 0m 35s\n", 468 | "Training Loss: 0.2889 | Validation Loss: 
0.3066\n", 469 | "Training Accuracy: 87.88 %| Validation Accuracy: 87.55 %\n", 470 | "Time taken for epoch 9: 0m 35s\n", 471 | "Training Loss: 0.2686 | Validation Loss: 0.2816\n", 472 | "Training Accuracy: 89.10 %| Validation Accuracy: 88.37 %\n", 473 | "Time taken for epoch 10: 0m 35s\n", 474 | "Training Loss: 0.2380 | Validation Loss: 0.3224\n", 475 | "Training Accuracy: 90.45 %| Validation Accuracy: 87.09 %\n", 476 | "Model with Train Loss 0.2686, Validation Loss: 0.2816 was saved.\n" 477 | ] 478 | } 479 | ], 480 | "source": [ 481 | "print(f\"Learning Rate: {LR}, Hidden Dimensions: {HIDDEN_DIM}\")\n", 482 | "train_losses = []\n", 483 | "valid_losses = []\n", 484 | "min_losses = [float('inf'), float('inf')]\n", 485 | "\n", 486 | "start_time = time.time()\n", 487 | "for epoch in range(1, NUM_EPOCHS+1):\n", 488 | " \n", 489 | " train_loss, train_acc = Train(model, train_iterator, optimizer, criterion)\n", 490 | " train_losses.append(train_loss)\n", 491 | " valid_loss, valid_acc = Evaluate(model, valid_iterator, criterion)\n", 492 | " valid_losses.append(valid_loss)\n", 493 | "\n", 494 | " if valid_loss < min_losses[0]:\n", 495 | " min_losses[0] = valid_loss\n", 496 | " min_losses[1] = train_loss\n", 497 | " torch.save(model.state_dict(), 'LSTM.pt')\n", 498 | "\n", 499 | " elapsed_time = Epoch_time(start_time, time.time())\n", 500 | " print(f\"Time taken for epoch {epoch}: {elapsed_time[0]}m {elapsed_time[1]}s\")\n", 501 | " start_time = time.time()\n", 502 | " print(f\"Training Loss: {train_loss:.4f} | Validation Loss: {valid_loss:.4f}\")\n", 503 | " print(f\"Training Accuracy: {train_acc*100:.2f} %| Validation Accuracy: {valid_acc*100:.2f} %\")\n", 504 | "\n", 505 | "print(f\"Model with Train Loss {min_losses[1]:.4f}, Validation Loss: {min_losses[0]:.4f} was saved.\")" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": 21, 511 | "metadata": { 512 | "colab": { 513 | "base_uri": "https://localhost:8080/", 514 | "height": 295 515 | }, 516 | 
"id": "OkjBpvPt8BmJ", 517 | "outputId": "e7d60c8f-a210-43a1-e465-afc3ca5153f8" 518 | }, 519 | "outputs": [ 520 | { 521 | "data": { 522 | "image/png": "", 523 | "text/plain": [ 524 | "
" 525 | ] 526 | }, 527 | "metadata": { 528 | "needs_background": "light" 529 | }, 530 | "output_type": "display_data" 531 | } 532 | ], 533 | "source": [ 534 | "plt.title(\"Sentiment Analysis (LSTM): Learning Curves\")\n", 535 | "plt.xlabel(\"Number of Epochs\")\n", 536 | "plt.ylabel(\"Binary Cross Entropy Loss\")\n", 537 | "plt.plot(train_losses, label = \"Training Loss\")\n", 538 | "plt.plot(valid_losses, label= \"Validation Loss\")\n", 539 | "plt.legend()\n", 540 | "plt.show()" 541 | ] 542 | }, 543 | { 544 | "cell_type": "markdown", 545 | "metadata": {}, 546 | "source": [ 547 | "# Testing" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 22, 553 | "metadata": { 554 | "colab": { 555 | "base_uri": "https://localhost:8080/" 556 | }, 557 | "id": "YLpKGbjyu2OK", 558 | "outputId": "2ba60055-1f76-4b99-a5b9-4b4315e8e5df" 559 | }, 560 | "outputs": [ 561 | { 562 | "name": "stdout", 563 | "output_type": "stream", 564 | "text": [ 565 | "Test Loss: 0.2986\n", 566 | "Test Accuracy: 87.62%\n" 567 | ] 568 | } 569 | ], 570 | "source": [ 571 | "model.load_state_dict(torch.load('LSTM.pt'))\n", 572 | "\n", 573 | "test_loss, test_acc = Evaluate(model, test_iterator, criterion)\n", 574 | "\n", 575 | "print(f'Test Loss: {test_loss:.4f}')\n", 576 | "print(f'Test Accuracy: {test_acc*100:.2f}%')" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": { 582 | "id": "o3vZ2NNn8sM6" 583 | }, 584 | "source": [ 585 | "# Sampling" 586 | ] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": 23, 591 | "metadata": { 592 | "id": "e0hc5XtDsqIH" 593 | }, 594 | "outputs": [], 595 | "source": [ 596 | "import spacy\n", 597 | "nlp = spacy.load('en_core_web_sm')\n", 598 | "\n", 599 | "def predict_sentiment(model, text):\n", 600 | " model.eval()\n", 601 | " tokenized = [tok.text for tok in nlp.tokenizer(text)]\n", 602 | " indexed = [FIELD.vocab.stoi[t] for t in tokenized]\n", 603 | " length = [len(indexed)]\n", 604 | " tensor = 
torch.LongTensor(indexed).to(device)\n", 605 | " tensor = tensor.unsqueeze(1)\n", 606 | " length_tensor = torch.LongTensor(length)\n", 607 | " prediction = torch.sigmoid(model(tensor, length_tensor))\n", 608 | " return prediction.item()" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": 24, 614 | "metadata": { 615 | "colab": { 616 | "base_uri": "https://localhost:8080/", 617 | "height": 311 618 | }, 619 | "id": "vc0zO3gTws-d", 620 | "outputId": "fde91c09-6b91-4f2b-e49d-124d528c52aa" 621 | }, 622 | "outputs": [ 623 | { 624 | "data": { 625 | "text/plain": [ 626 | "0.011234746314585209" 627 | ] 628 | }, 629 | "execution_count": 24, 630 | "metadata": {}, 631 | "output_type": "execute_result" 632 | } 633 | ], 634 | "source": [ 635 | "predict_sentiment(model, \"This film is not bad\")" 636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "execution_count": 25, 641 | "metadata": { 642 | "id": "Umdey8pIw8yp" 643 | }, 644 | "outputs": [ 645 | { 646 | "data": { 647 | "text/plain": [ 648 | "0.939202070236206" 649 | ] 650 | }, 651 | "execution_count": 25, 652 | "metadata": {}, 653 | "output_type": "execute_result" 654 | } 655 | ], 656 | "source": [ 657 | "predict_sentiment(model, \"This film is excellent\")" 658 | ] 659 | }, 660 | { 661 | "cell_type": "code", 662 | "execution_count": 26, 663 | "metadata": { 664 | "id": "8rbQ8FYRw-TM" 665 | }, 666 | "outputs": [ 667 | { 668 | "data": { 669 | "text/plain": [ 670 | "0.0072547681629657745" 671 | ] 672 | }, 673 | "execution_count": 26, 674 | "metadata": {}, 675 | "output_type": "execute_result" 676 | } 677 | ], 678 | "source": [ 679 | "predict_sentiment(model, \"This film is bad\")" 680 | ] 681 | } 682 | ], 683 | "metadata": { 684 | "accelerator": "GPU", 685 | "colab": { 686 | "name": "LSTM.ipynb", 687 | "provenance": [] 688 | }, 689 | "kernelspec": { 690 | "display_name": "Python 3", 691 | "name": "python3" 692 | }, 693 | "language_info": { 694 | "codemirror_mode": { 695 | "name": "ipython", 696 | 
"version": 3 697 | }, 698 | "file_extension": ".py", 699 | "mimetype": "text/x-python", 700 | "name": "python", 701 | "nbconvert_exporter": "python", 702 | "pygments_lexer": "ipython3", 703 | "version": "3.9.7" 704 | } 705 | }, 706 | "nbformat": 4, 707 | "nbformat_minor": 0 708 | } 709 | -------------------------------------------------------------------------------- /text_classification/plots/BERT.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/text_classification/plots/BERT.png -------------------------------------------------------------------------------- /text_classification/plots/CNN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/text_classification/plots/CNN.png -------------------------------------------------------------------------------- /text_classification/plots/FastText.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/text_classification/plots/FastText.png -------------------------------------------------------------------------------- /text_classification/plots/LSTM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/text_classification/plots/LSTM.png -------------------------------------------------------------------------------- /word_rnn/README.md: -------------------------------------------------------------------------------- 1 | # Paragraph generation (Language modeling) 2 | 3 | Model was trained over Book 1 - Philosopher's stone.txt which is 1st edition of Harry 
Potter franchise. 4 | 5 | Text consists of 6 | 5978 distinct words (vocabulary size) 7 | 3275 usable paragraphs 8 | 9 | 10 | > It is observed that the model trained with nn.RNN is unable to form long-term dependencies. The paragraphs it formed were very short. 11 | The meaning of a sentence is lost within just 3-4 words, although the model is able to recognise the end of a paragraph. It also chooses punctuation marks sensibly; for example, double quotes are closed in almost every trial. 12 | 13 | > In contrast, the model trained with nn.LSTM is able to form long-term dependencies. Even the sentence that follows another one shows some shared context. LSTM shows better results than RNN and GRU: its sentences and paragraphs end at more appropriate positions and carry more meaning than those of the other two. 14 | 15 | > GRU gives results somewhat similar to LSTM. Sentences show some meaning and shared context, but the next sentence formed loses the previous context. Sentence and paragraph endings are handled properly, and punctuation is also maintained. You can also observe that the length of the generated sequence decreases. 
16 | 17 | 18 | GRU Loss function 19 | 20 | 21 | ![](https://github.com/AjinkyaDeshpande39/Natural-Language-Processing/blob/master/word_rnn/paragraph%20generation%20loss%20gru.png) 22 | 23 | #embedding = 128 24 | #hidden_size = 256 25 | #num_layers = 4 26 | #learning_rate = 0.0006 27 | #epochs = 25 28 | #3,782,234 trainable parameters 29 | 30 | LSTM Loss function 31 | 32 | 33 | ![](https://github.com/AjinkyaDeshpande39/Natural-Language-Processing/blob/master/word_rnn/paragraph%20generation%20loss%20lstm.png) 34 | 35 | #embedding = 128 36 | #hidden_size = 256 37 | #num_layers = 4 38 | #learning_rate = 0.0006 39 | #epochs = 25 40 | #4,275,802 trainable parameters 41 | 42 | RNN Loss function 43 | 44 | ![](https://github.com/AjinkyaDeshpande39/Natural-Language-Processing/blob/master/word_rnn/paragraph%20generation%20loss%20rnn.png) 45 | 46 | #embedding = 150 47 | #hidden_size = 400 48 | #num_layers = 3 49 | #learning_rate = 0.0006 50 | #epochs = 25 51 | #4,156,278 trainable parameters 52 | > Since a vanilla RNN has no cell state or gates to help it remember details from previous inputs, we increased the size of its hidden and embedding layers. Increasing the number of parameters gave better results. 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | ## Results 61 | 62 | ![Alt Text](https://github.com/AjinkyaDeshpande39/Natural-Language-Processing/blob/master/word_rnn/ezgif.com-gif-maker.gif) 63 | 64 | ## LSTM 65 | 66 | 1. testing2('“ i tell you , we’re going to win that') 67 | - “ i tell you , we’re going to win that in the summer - year waffle , something that he is to die — dumbledore says nothing about it , i’ll therefore nicolas ‘ rude with all with a fifty - points - points points that year , you were steamrollered ! ” 68 | 69 | 2. testing2('there was a very good chance they were going to get caught') 70 | - 'there was a very good chance they were going to get caught some point \\ ”' 71 | 72 | 73 | 3. 
testing2('they pulled on their bathrobes , picked up their wands ,') 74 | - 'they pulled on their bathrobes , picked up their wands , insisting themselves , and snape into the slytherin table . “ aren’t you ? ” screeched dumbledore , but she passed , but mr . dursley was , mysterious a long bite with broad soon .' 75 | 76 | 4. testing2('“ i am scared ”') 77 | - '“ i am scared ” . . . for the soul . “ i can knew it , ” said dumbledore , grinning , grinning down the mom , “ mom , mom . ” ' 78 | 79 | 5. testing2('hermione took out a list of subjects and titles she had') 80 | - 'hermione took out a list of subjects and titles she had found talking to the ceiling . harry found a quiver again at the barrier , but the whisperers bear the platform as dressed there , too .' 81 | 82 | ## GRU 83 | 84 | 1. testing2('“ i tell you , we’re going to win that') 85 | - '“ i tell you , we’re going to win that , ” said harry .' 86 | 87 | 2. testing2('there was a very good chance they were going to get caught') 88 | - 'there was a very good chance they were going to get caught the fact . some and its common families and fifty trolls in the air ; it turned on the note .' 89 | 90 | 3. testing2('they pulled on their bathrobes , picked up their wands ,') 91 | - 'they pulled on their bathrobes , picked up their wands , stone - green - muggle words . the same thing was still around at quirrell’s mouth , a purple hand , and a still screamed of books into the ceiling , and a same same through the same one ; the last last time in harry’s hand , reached into the same wall . the rest of the first thing they were talking and down , and waving into the table and winked , it was a terrible nostrils in' 92 | 93 | 4. testing2('“ i am scared ”') 94 | - '“ i am scared ” . . . weasley face at harry . “ i said it , however , i thought . . . ”' 95 | 96 | 5. 
testing2('hermione took out a list of subjects and titles she had') 97 | - 'hermione took out a list of subjects and titles she had already passed at the table , and she looked forward to herself to be allowed to be looking at them . she looked very high , she looked at them .' 98 | 99 | 100 | ## RNN 101 | 102 | 1. testing2('“ i tell you , we’re going to win that') 103 | - '“ i tell you , we’re going to win that hole . ”' 104 | 105 | 2. testing2('there was a very good chance they were going to get caught') 106 | - 'there was a very good chance they were going to get caught the whole day .' 107 | 108 | 3. testing2('they pulled on their bathrobes , picked up their wands ,') 109 | - 'they pulled on their bathrobes , picked up their wands , and fell into blackness .' 110 | 111 | 4. testing2('“ i am scared ”') 112 | - '“ i am scared ” gasped . . . . ”' 113 | 114 | 5. testing2('hermione took out a list of subjects and titles she had') 115 | - 'hermione took out a list of subjects and titles she had to say .' 116 | 117 | 118 | ## Resources 119 | https://youtu.be/iWea12EAu6U CS224N course on NLP. 
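
The trainable-parameter counts quoted above for the three models can be checked by hand. The sketch below is not taken from the training notebook: it assumes each model is an embedding layer, a stacked unidirectional recurrent layer with biases (as `nn.RNN`, `nn.GRU` and `nn.LSTM` are parameterised by default), and a linear layer projecting back to the 5978-word vocabulary. The helper name `estimate_parameters` is hypothetical. Under these assumptions the arithmetic reproduces the reported numbers.

```python
# Number of stacked weight blocks per recurrent cell type
# (vanilla RNN: 1, GRU: 3 gates, LSTM: 4 gates).
GATES = {"rnn": 1, "gru": 3, "lstm": 4}


def estimate_parameters(cell, vocab_size, emb_dim, hidden_dim, num_layers):
    """Count parameters of an ASSUMED Embedding -> recurrent stack -> Linear model."""
    gates = GATES[cell]
    embedding = vocab_size * emb_dim
    recurrent = 0
    for layer in range(num_layers):
        # The first layer reads embeddings; deeper layers read hidden states.
        in_dim = emb_dim if layer == 0 else hidden_dim
        # Input-to-hidden weights, hidden-to-hidden weights, and two bias vectors.
        recurrent += gates * (in_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)
    # Output projection onto the vocabulary: weights plus bias.
    output = hidden_dim * vocab_size + vocab_size
    return embedding + recurrent + output


VOCAB_SIZE = 5978  # distinct words reported for the training text

# (cell type, embedding size, hidden size, number of layers) as listed above
CONFIGS = [("gru", 128, 256, 4), ("lstm", 128, 256, 4), ("rnn", 150, 400, 3)]

for cell, emb, hid, layers in CONFIGS:
    total = estimate_parameters(cell, VOCAB_SIZE, emb, hid, layers)
    print(f"{cell.upper()}: {total:,} trainable parameters")
```

In all three configurations the embedding and output layers account for a large share of the total, so the vocabulary size drives the budget at least as much as the choice of recurrent cell.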
120 | -------------------------------------------------------------------------------- /word_rnn/ezgif.com-gif-maker.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/word_rnn/ezgif.com-gif-maker.gif -------------------------------------------------------------------------------- /word_rnn/paragraph generation loss gru.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/word_rnn/paragraph generation loss gru.png -------------------------------------------------------------------------------- /word_rnn/paragraph generation loss lstm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/word_rnn/paragraph generation loss lstm.png -------------------------------------------------------------------------------- /word_rnn/paragraph generation loss rnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/word_rnn/paragraph generation loss rnn.png -------------------------------------------------------------------------------- /word_rnn/wordRNN_paragraph_gru2.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/word_rnn/wordRNN_paragraph_gru2.pth -------------------------------------------------------------------------------- /word_rnn/wordRNN_paragraph_lstm2.pth: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/word_rnn/wordRNN_paragraph_lstm2.pth -------------------------------------------------------------------------------- /word_rnn/wordRNN_paragraph_rnn2.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IvLabs/Natural-Language-Processing/dcfd38d00a8d5137b122c778a67445d7e95cf53e/word_rnn/wordRNN_paragraph_rnn2.pth --------------------------------------------------------------------------------