├── LICENSE
├── README.md
├── Section5
│   ├── Reviews.rar
│   ├── plot.html
│   └── section5_video3_video4_training_visualizing_wordembedding.ipynb
├── section1
│   └── video3
│       └── section1_video3_install_corpora.ipynb
├── section2
│   ├── video 2
│   │   └── section_2_video_2_cleaning.ipynb
│   ├── video 3
│   │   └── section_2_video_3_tokenizing.ipynb
│   └── video 4
│       └── section_2_video_4_ngrams.ipynb
├── section3
│   ├── ner_dataset.rar
│   ├── section3_video3_pretrained_models.ipynb
│   └── section3_video4_training_ner.ipynb
└── section4
    └── section4_video3_basic_classifier.ipynb

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Packt

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Text Mining with Machine Learning and Python [Video]
This is the code repository for [Text Mining with Machine Learning and Python [Video]](https://www.packtpub.com/application-development/text-mining-machine-learning-and-python-video?utm_source=github&utm_medium=repository&utm_campaign=9781789137361), published by [Packt](https://www.packtpub.com/?utm_source=github). It contains all the supporting project files necessary to work through the video course from start to finish.
## About the Video Course
Text is one of the most actively researched and widespread types of data in the Data Science field today. New advances in machine learning and deep learning techniques now make it possible to build fantastic data products on text sources. New exciting text data sources pop up all the time. You'll build your own toolbox of know-how, packages, and working code snippets so you can perform your own text mining analyses.

You'll start by understanding the fundamentals of modern text mining and move on to some of the processes involved in it. You'll learn how machine learning is used to extract meaningful information from text, and how to read and process text features. Then you'll learn how to extract information from text and work with pre-trained models, while also delving into text classification and entity extraction and classification. You will explore word embeddings by working with Skip-grams, CBOW, and X2Vec, along with some additional and important text mining processes. By the end of the course, you will have learned and understood the various aspects of text mining with ML and the important processes involved in it, and will have begun your journey as an effective text miner.

Starting from the basics of preprocessing text features, we’ll take a look at how we can extract relevant features from text and classify documents through Machine Learning. Since Word Embeddings have become indispensable in today’s NLP world, we’ll dive deeper into their inner workings and have a go at training our own embedding models.

By the end of the course, you will have a high-level understanding of the various components involved in a current-day NLP pipeline, and a set of working code to build further upon.

| | Id | ProductId | UserId | ProfileName | HelpfulnessNumerator | HelpfulnessDenominator | Score | Time | Summary | Text | tokens |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | B001E4KFG0 | A3SGXH7AUHU8GW | delmartian | 1 | 1 | 5 | 1303862400 | Good Quality Dog Food | I have bought several of the Vitality canned d... | [I, have, bought, several, of, the, Vitality, ... |
| 1 | 2 | B00813GRG4 | A1D87F6ZCVE5NK | dll pa | 0 | 0 | 1 | 1346976000 | Not as Advertised | Product arrived labeled as Jumbo Salted Peanut... | [Product, arrived, labeled, as, Jumbo, Salted,... |
| 2 | 3 | B000LQOCH0 | ABXLMWJIXXAIN | Natalia Corres "Natalia Corres" | 1 | 1 | 4 | 1219017600 | "Delight" says it all | This is a confection that has been around a fe... | [This, is, a, confection, that, has, been, aro... |
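
The `tokens` column in the preview above holds one list of word tokens per review. The notebook cell that produced it is not part of this excerpt, so the following is only a minimal sketch of how such a column could be derived. It assumes a plain regular-expression tokenizer as a stand-in (the notebook may well use a library tokenizer instead); the `tokenize` helper and `TOKEN_PATTERN` are hypothetical names, and the sample rows reuse `Summary` values visible in the preview.

```python
import re

# Hypothetical stand-in tokenizer: words (with an optional apostrophe
# suffix such as "don't") or single punctuation characters.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Return word and punctuation tokens in order of appearance."""
    return TOKEN_PATTERN.findall(text)


# Two rows built from Summary values shown in the preview above.
rows = [
    {"Id": 1, "Summary": "Good Quality Dog Food"},
    {"Id": 2, "Summary": "Not as Advertised"},
]

# Add a "tokens" field to each row, mirroring the extra column.
for row in rows:
    row["tokens"] = tokenize(row["Summary"])

for row in rows:
    print(row["Id"], row["tokens"])
```

A regex keeps the sketch dependency-free; a dedicated tokenizer would typically handle contractions, URLs, and similar edge cases more carefully.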
| | Sentence # | Word | POS | Tag |
|---|---|---|---|---|
| 0 | Sentence: 1 | Thousands | NNS | O |
| 1 | NaN | of | IN | O |
| 2 | NaN | demonstrators | NNS | O |
| 3 | NaN | have | VBP | O |
| 4 | NaN | marched | VBN | O |
| 5 | NaN | through | IN | O |
| 6 | NaN | London | NNP | B-geo |
| 7 | NaN | to | TO | O |
| 8 | NaN | protest | VB | O |
| 9 | NaN | the | DT | O |
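
In the preview above, only the first token of a sentence carries a `Sentence #` label; the remaining rows show `NaN`. A common way to make every row self-describing is to carry the last seen label forward. The sketch below does this in plain Python on some of the rows shown above, using `None` in place of `NaN`. The `forward_fill` helper is a hypothetical name, and this is not necessarily how the notebook handles it; with pandas, a forward fill such as `ffill()` on the column usually achieves the same effect.

```python
# Rows copied from the preview: (Sentence #, Word, POS, Tag).
# None stands in for the NaN values shown in the table.
rows = [
    ("Sentence: 1", "Thousands", "NNS", "O"),
    (None, "of", "IN", "O"),
    (None, "demonstrators", "NNS", "O"),
    (None, "have", "VBP", "O"),
    (None, "marched", "VBN", "O"),
    (None, "through", "IN", "O"),
    (None, "London", "NNP", "B-geo"),
]


def forward_fill(records):
    """Replace a missing sentence label with the most recent one seen."""
    filled = []
    current = None
    for label, word, pos, tag in records:
        if label is not None:
            current = label
        filled.append((current, word, pos, tag))
    return filled


filled_rows = forward_fill(rows)
for row in filled_rows:
    print(row)
```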
| | Sentence # | Word | Tag remapped |
|---|---|---|---|
| 0 | 121 | [Officials, say, the, 27-year, old, man, from,... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... |
| 1 | 206 | [Humans, are, usually, infected, with, bird, f... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... |
| 2 | 227 | [One, of, the, 2008, Olympic, mascots, is, mod... | [O, O, O, O, O, O, O, O, O, O, O, O, B-NAT, I-... |
| 3 | 229 | [Sam, Beattie, reports, from, Jing, Jing, 's, ... | [O, O, O, O, B-NAT, I-NAT, O, O, O, O, O, O] |
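
The table above has one row per sentence, with the words and their tags collected into parallel lists, and entity types written in upper case (`B-NAT`) rather than the lower-case form seen in the raw data (`B-geo`). The notebook's own transformation is not shown in this excerpt. The sketch below is one way to arrive at a similar shape, assuming that sentence ids are already filled in on every row and that the remapping simply upper-cases the entity type; the real remapping may do more. `remap_tag` and `group_sentences` are hypothetical names, the sample uses only the first seven words of sentence 229 visible in the preview, and the lower-case raw tags are an assumption.

```python
# (sentence id, word, raw tag) for the first seven tokens of sentence 229.
# The lower-case raw tags ("B-nat", "I-nat") are assumed, not taken
# from the notebook.
rows = [
    (229, "Sam", "O"),
    (229, "Beattie", "O"),
    (229, "reports", "O"),
    (229, "from", "O"),
    (229, "Jing", "B-nat"),
    (229, "Jing", "I-nat"),
    (229, "'s", "O"),
]


def remap_tag(tag):
    """Upper-case the entity type; keep the B-/I- prefix and 'O' as is."""
    if tag == "O":
        return tag
    prefix, _, entity = tag.partition("-")
    return f"{prefix}-{entity.upper()}"


def group_sentences(records):
    """Collect words and remapped tags into one entry per sentence id."""
    grouped = {}
    for sent_id, word, tag in records:
        words, tags = grouped.setdefault(sent_id, ([], []))
        words.append(word)
        tags.append(remap_tag(tag))
    return grouped


sentences = group_sentences(rows)
for sent_id, (words, tags) in sentences.items():
    print(sent_id, words, tags)
```

Keeping words and tags as parallel lists preserves token order, which sequence-labelling models for NER generally rely on.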