├── Notebook
    ├── Multiclass_Text_Classification.ipynb
    └── info.md
├── README.md
├── Web-Application
    ├── Text_LR.pkl
    ├── app.py
    ├── count_vect.pkl
    ├── requirements.txt
    ├── static
    │   ├── info.md
    │   ├── script.js
    │   └── style.css
    ├── templates
    │   ├── about.html
    │   ├── contact.html
    │   ├── index.html
    │   ├── info.md
    │   └── thank-you.html
    └── transformer.pkl
└── ui
    ├── info.md
    └── ui.PNG

/Notebook/info.md:
--------------------------------------------------------------------------------
1 | notebook
2 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Multiclass Text Classification Project
 2 | 
 3 | 
 4 | ## Project Overview
 5 | The goal of this project is to classify text data into predefined categories using a combination of traditional machine learning models and deep learning architectures. The project includes:
 6 | - A **Flask-based web application** for interactive text classification.
 7 | - **Preprocessing** of text data, including cleaning, tokenization, and lemmatization.
 8 | - Training and evaluation of multiple models, including:
 9 |   - Traditional ML models: Logistic Regression, SVM, Naive Bayes, Random Forest, Gradient Boosting, AdaBoost, and an Ensemble model.
10 |   - Deep learning models: LSTM, GRU, CNN, and a hybrid LSTM+CNN model.
11 | - Fine-tuning of transformer-based models: BERT and XLNet using **ktrain**.
12 | - Visualization of results, including confusion matrices, accuracy plots, and word clouds.
13 | 
14 | ---
15 | 
16 | # Requirements:
17 | 
18 | * Python
19 | 
20 | * Scikit-learn
21 | 
22 | * TensorFlow
23 | 
24 | * Keras
25 | 
26 | # Dataset:
27 | 
28 | The dataset used in this project is the bbc-text dataset, which consists of approximately 2,225 text documents.
29 | 
30 | # Results:
31 | The results of each model on the bbc-text dataset are as follows:
32 | 
33 | | Model | Accuracy |
34 | |----------|----------|
35 | | Logistic Regression | 96.58% |
36 | | Support Vector Machine | 96.94% |
37 | | Multinomial Naive Bayes | 94.97% |
38 | | Random Forest | 95.15% |
39 | | Gradient Boosting Classifier | 94.25% |
40 | | Ensemble Classifier | 97.12% |
41 | | AdaBoost | 94.43% |
42 | | LSTM 1-Layer | 99.22% |
43 | | LSTM 2-Layers | 97.78% |
44 | | GRU | 91.74% |
45 | | CNN+LSTM | 98.73% |
46 | | BERT | 99.60% |
47 | | XLNet | 99.46% |
48 | 
49 | 
50 | 
51 | # Application Interface
52 | 
53 | Original Image

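The web application exposes a `/text-classification` route that accepts JSON with a `text` field and returns a `category` list (see `app.py` and `script.js`). The following is a minimal Python sketch of that exchange, assuming the app is running locally on port 5000. The helper names `build_request`, `parse_category`, and `classify` are illustrative and are not part of the repository.

```python
import json
import urllib.request

# Assumed local address; app.py listens on port 5000.
ENDPOINT = "http://127.0.0.1:5000/text-classification"


def build_request(text, url=ENDPOINT):
    """Build the same JSON POST request that script.js sends with fetch()."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_category(body):
    """Extract the first predicted label from the JSON response body."""
    return json.loads(body)["category"][0]


def classify(text, url=ENDPOINT):
    """Send text to a running instance of the app and return its label."""
    with urllib.request.urlopen(build_request(text, url)) as response:
        return parse_category(response.read())
```

With the server started via `python app.py`, calling `classify("some news article text")` should return the predicted label as a string.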
54 | 
--------------------------------------------------------------------------------
/Web-Application/Text_LR.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Snigdho8869/Multiclass-Text-Classification/22cd8fd21e34650b31e4b6a815bb02f680691afe/Web-Application/Text_LR.pkl
--------------------------------------------------------------------------------
/Web-Application/app.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import re
 3 | import smtplib
 4 | from email.message import EmailMessage
 5 | 
 6 | import joblib
 7 | import pandas as pd
 8 | from flask import Flask, jsonify, render_template, request
 9 | from nltk.corpus import stopwords
10 | from nltk.tokenize import RegexpTokenizer
11 | from textblob import Word
12 | 
13 | app = Flask(__name__, template_folder='templates', static_folder='static')
14 | 
15 | # Load the pickled classifier and the two fitted feature transformers once at startup.
16 | model = joblib.load('Text_LR.pkl')
17 | count_vec = joblib.load('count_vect.pkl')
18 | transformer = joblib.load('transformer.pkl')
19 | 
20 | 
21 | @app.route('/')
22 | def home():
23 |     return render_template('index.html')
24 | 
25 | 
26 | @app.route('/index.html')
27 | def index():
28 |     return render_template('index.html')
29 | 
30 | 
31 | @app.route('/contact.html')
32 | def contact():
33 |     return render_template('contact.html')
34 | 
35 | 
36 | @app.route('/about.html')
37 | def about():
38 |     return render_template('about.html')
39 | 
40 | 
41 | # NOTE: resources.html is missing from templates/ in the tree above (TemplateNotFound).
42 | @app.route('/resources.html')
43 | def resources():
44 |     return render_template('resources.html')
45 | 
46 | 
47 | @app.route('/text-classification', methods=['POST'])
48 | def predict():
49 |     data = request.get_json(force=True)
50 |     text = data['text']
51 | 
52 |     # Clean the text step by step, keeping each intermediate result in its own column.
53 |     text_df = pd.DataFrame({'text': [text]})
54 |     text_df['lower_case'] = text_df['text'].apply(lambda x: x.lower().strip().replace('\n', ' ').replace('\r', ' '))
55 |     text_df['alphabetic'] = text_df['lower_case'].apply(lambda x: re.sub(r'[^a-zA-Z\']', ' ', x)).apply(lambda x: re.sub(r'[^\x00-\x7F]+', '', x))
56 |     tokenizer = RegexpTokenizer(r'\w+')
57 |     text_df['special_word'] = text_df.apply(lambda row: tokenizer.tokenize(row['alphabetic']), axis=1)
58 |     stop = [word for word in stopwords.words('english') if word not in ["my", "haven't"]]
59 |     text_df['stop_words'] = text_df['special_word'].apply(lambda x: [item for item in x if item not in stop])
60 |     text_df['stop_words'] = text_df['stop_words'].astype('str')
61 |     text_df['short_word'] = text_df['stop_words'].str.findall(r'\w{2,}')
62 |     text_df['text'] = text_df['short_word'].str.join(' ')
63 |     text_df['text'] = text_df['text'].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))
64 | 
65 |     # Vectorize, apply the fitted transformer, and predict the category.
66 |     text = count_vec.transform([text_df['text'][0]])
67 |     text_vec = transformer.transform(text)
68 |     prediction = model.predict(text_vec)
69 |     return jsonify({'category': prediction.tolist()})
70 | 
71 | 
72 | @app.route('/send-email', methods=['POST'])
73 | def send_email():
74 |     name = request.form['name']
75 |     email = request.form['email']
76 |     message = request.form['message']
77 | 
78 |     # Mail account is read from the environment (illustrative variable names, not from the repo).
79 |     sender = os.environ['MAIL_USERNAME']
80 |     password = os.environ['MAIL_PASSWORD']
81 |     recipient = os.environ['MAIL_RECIPIENT']
82 | 
83 |     msg = EmailMessage()
84 |     msg['Subject'] = 'Contact Form Submission from ' + name
85 |     msg['From'] = sender
86 |     msg['To'] = recipient
87 |     msg.set_content('Name: ' + name + '\nEmail: ' + email + '\nMessage: ' + message)
88 |     with smtplib.SMTP('smtp.gmail.com', 587) as server:
89 |         server.starttls()
90 |         server.login(sender, password)
91 |         server.send_message(msg)
92 | 
93 |     return render_template('thank-you.html')
94 | 
95 | 
96 | if __name__ == '__main__':
97 |     app.run(host='0.0.0.0', port=5000, debug=False)  # keep the debugger off when listening on all interfaces
--------------------------------------------------------------------------------
/Web-Application/count_vect.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Snigdho8869/Multiclass-Text-Classification/22cd8fd21e34650b31e4b6a815bb02f680691afe/Web-Application/count_vect.pkl
--------------------------------------------------------------------------------
/Web-Application/requirements.txt:
--------------------------------------------------------------------------------
1 | Flask==2.3.2
2 | Flask-Mail==0.9.1
3 | tensorflow==2.12.0
4 | nltk==3.8.1
5 | numpy==1.24.3
6 | pandas==2.0.3
7 | joblib==1.3.2
8 | scikit-learn==1.3.0
9 | textblob==0.17.1
--------------------------------------------------------------------------------
/Web-Application/static/info.md:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/Web-Application/static/script.js:
--------------------------------------------------------------------------------
 1 | function predict() {
 2 |     var text = document.getElementById('text').value.trim();
 3 |     if (text === '') {
 4 |         document.getElementById('prediction').innerHTML = 'Please Enter Some Text';
 5 |         return;
 6 |     }
 7 | 
 8 |     fetch('/text-classification', {
 9 |         method: 'POST',
10 |         headers: {
11 |             'Content-Type': 'application/json'
12 |         },
13 |         body: JSON.stringify({
14 |             text: text
15 |         })
16 |     })
17 |     .then(function(response) {
18 |         return response.json();
19 |     })
20 |     .then(function(data) {
21 |         var category = data.category;
22 |         document.getElementById('prediction').innerHTML = category[0];
23 | 
24 |         const predictionDiv = document.getElementById('prediction');
25 |         const animation = anime({
26 |             targets: predictionDiv,
27 |             translateY: ["-100%", 0],
28 |             opacity: [0, 1],
29 |             scale: [0.5, 1],
30 |             duration: 1000,
31 |             easing: "spring(1, 80, 10, 0)",
32 |             delay: 500
33 |         });
34 |     })
35 |     .catch(function(error) {
36 |         document.getElementById('prediction').innerHTML = 'Error: ' + error.message;
37 |     });
38 | }
39 | 
--------------------------------------------------------------------------------
/Web-Application/static/style.css:
--------------------------------------------------------------------------------
  1 | :root {
  2 |     --primary-color: #4CAF50;
  3 |     --secondary-color: #333;
  4 |     --background-color: #1e1e1e;
  5 |     --text-color: #ffffff;
  6 |     --card-background: #2c2c2c;
  7 |     --hover-color: #666;
  8 |     --font-family: 'Poppins', sans-serif;
  9 | }
 10 | 
 11 | body {
 12 |     background-color: var(--background-color);
 13 |     color: var(--text-color);
 14 |     font-family: var(--font-family);
 15 |     margin: 0;
 16 |     padding: 0;
 17 | }
 18 | 
 19 | a {
 20 |     color: var(--primary-color);
 21 |     text-decoration: none;
 22 |     transition: color 0.3s ease;
 23 | }
 24 | 
 25 | a:hover {
 26 |     color: var(--hover-color);
 27 | }
 28 | 
 29 | nav {
 30 |     background-color: var(--secondary-color);
 31 |     padding: 15px;
 32 |     text-align: center;
 33 |     margin-bottom: 10px;
 34 |     box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
 35 | }
 36 | 
 37 | nav ul {
 38 |     list-style: none;
 39 |     margin: 0;
 40 |     padding: 0;
 41 |     display: flex;
 42 |     justify-content: center;
 43 | }
 44 | 
 45 | nav li {
 46 |     margin: 0 15px;
 47 | }
 48 | 
 49 | nav ul li a {
 50 |     color: var(--text-color);
 51 |     font-size: 18px;
 52 |     padding: 10px 15px;
 53 |     border-radius: 5px;
 54 |     transition: background-color 0.3s ease, color 0.3s ease;
 55 | }
 56 | 
 57 | nav ul li a:hover {
 58 |     background-color: var(--primary-color);
 59 |     color: var(--text-color);
 60 | }
 61 | 
 62 | .index-h1 {
 63 |     font-size: 3vw;
 64 |     text-align: center;
 65 |     margin: 1vw 0;
 66 |     color: var(--primary-color);
 67 | }
 68 | 
 69 | .index-form {
 70 |     display: flex;
 71 |     flex-direction: column;
 72 |     align-items: center;
 73 |     margin: 20px auto;
 74 |     max-width: 800px;
 75 |     padding: 20px;
 76 |     background-color: var(--card-background);
 77 |     border-radius: 10px;
 78 |     box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
 79 | }
 80 | 
 81 | textarea {
 82 |     width: 90%;
 83 |     height: 200px;
 84 |     margin: 20px 0;
 85 |     padding: 15px;
 86 |     font-size: 16px;
 87 |     border: 1px solid var(--secondary-color);
 88 |     border-radius: 8px;
 89 |     background-color: var(--background-color);
 90 |     color: var(--text-color);
 91 |     resize: none;
 92 | }
 93 | 
 94 | button {
 95 |     padding: 12px 24px;
 96 |     font-size: 16px;
 97 |     border: none;
 98 |     border-radius: 8px;
 99 |     background-color: var(--primary-color);
100 |     color: var(--text-color);
101 |     cursor: pointer;
102 |     transition: background-color 0.3s ease;
103 | }
104 | 
105 | button:hover {
106 |     background-color: var(--hover-color);
107 | }
108 | 
109 | #prediction {
110 |     margin: 20px auto;
111 |     font-size: 24px;
112 |     text-align: center;
113 |     padding: 20px;
114 |     background-color: var(--card-background);
115 |     border-radius: 10px;
116 |     box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
117 |     max-width: 800px;
118 | }
119 | 
120 | 
121 | .about-container {
122 |     max-width: 800px;
123 |     margin: 40px auto;
124 |     padding: 20px;
125 |     background-color: var(--card-background);
126 |     border-radius: 10px;
127 |     box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
128 | }
129 | 
130 | .about-h1, .about-h2 {
131 |     color: var(--primary-color);
132 |     text-align: center;
133 |     margin-bottom: 20px;
134 | }
135 | 
136 | .about-p {
137 |     font-size: 16px;
138 |     line-height: 1.6;
139 |     margin-bottom: 20px;
140 | }
141 | 
142 | .about-img {
143 |     max-width: 200px;
144 |     border-radius: 50%;
145 |     display: block;
146 |     margin: 20px auto;
147 |     box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
148 | }
149 | 
150 | 
151 | #contact {
152 |     max-width: 800px;
153 |     margin: 40px auto;
154 |     padding: 20px;
155 |     background-color: var(--card-background);
156 |     border-radius: 10px;
157 |     box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
158 | }
159 | 
160 | .contact-h2 {
161 |     color: var(--primary-color);
162 |     text-align: center;
163 |     margin-top: 0px;
164 |     margin-bottom: 20px;
165 | }
166 | 
167 | .contact-form {
168 |     display: flex;
169 |     flex-direction: column;
170 |     align-items: center;
171 | }
172 | 
173 | .contact-form label {
174 |     margin-bottom: 10px;
175 | }
176 | 
177 | .contact-input, .contact-textarea {
178 |     width: 90%;
179 |     padding: 12px;
180 |     margin-bottom: 20px;
181 |     border: 1px solid var(--secondary-color);
182 |     border-radius: 8px;
183 |     background-color: var(--background-color);
184 |     color: var(--text-color);
185 |     font-size: 16px;
186 | }
187 | 
188 | .contact-button {
189 |     background-color: var(--primary-color);
190 |     color: var(--text-color);
191 |     padding: 12px 24px;
192 |     border: none;
193 |     border-radius: 8px;
194 |     cursor: pointer;
195 |     transition: background-color 0.3s ease;
196 | }
197 | 
198 | .contact-button:hover {
199 |     background-color: var(--hover-color);
200 | }
201 | 
202 | 
203 | footer {
204 |     background-color: var(--secondary-color);
205 |     color: var(--text-color);
206 |     padding: 10px;
207 |     text-align: center;
208 |     margin-top: 40px;
209 | }
210 | 
211 | .footer-container {
212 |     display: flex;
213 |     justify-content: center;
214 |     align-items: center;
215 | }
216 | 
217 | .social-icons img {
218 |     width: 40px;
219 |     height: 40px;
220 |     margin: 0 10px;
221 |     transition: transform 0.3s ease;
222 | }
223 | 
224 | .social-icons img:hover {
225 |     transform: scale(1.1);
226 | }
227 | 
228 | .rights-reserved {
229 |     margin-top: 10px;
230 |     font-size: 14px;
231 | }
--------------------------------------------------------------------------------
/Web-Application/templates/about.html:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | About 6 | 7 | 8 | 9 | 10 | 17 | 18 |

19 |

About Text Classification

20 |

Text classification is the task of assigning predefined categories or labels to text documents. It is a fundamental problem in natural language processing, with applications such as sentiment analysis, spam detection, and topic classification.

21 | 22 | To perform text classification with 5 classes, I used both traditional machine learning (ML) and deep learning (DL) models.

23 | 24 | For ML models, I used algorithms such as Naive Bayes, Support Vector Machines (SVM), Random Forest, Logistic Regression, Ensemble Learning, AdaBoost, and Gradient Boosting. These models required feature extraction and selection techniques such as CountVectorizer and TfidfVectorizer.

25 | 26 | For DL models, I used architectures such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and their variants like LSTM and GRU. These models could learn to extract features and classify text documents end-to-end. I also used pre-trained models like BERT and XLNet to achieve better performance.

27 |

28 | 29 |

I am the developer of this project, and I have a strong background in Information Technology and a keen interest in Machine Learning, Deep Learning, and Federated Learning. I believe that technology has the power to transform lives and make a positive impact on society. I am passionate about using my skills to contribute to projects that make a difference, and I am excited to be working on this important issue.

30 |

You can find me on GitHub, where I share my projects and contributions to the open-source community.

31 |

32 | 33 | 49 | 50 | 51 | 52 |
--------------------------------------------------------------------------------
/Web-Application/templates/contact.html:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | Contact Us 6 | 7 | 8 | 9 | 10 | 17 | 18 |

19 |

Contact Us

20 | 32 |

33 | 34 | 50 | 51 | 52 | 53 |
--------------------------------------------------------------------------------
/Web-Application/templates/index.html:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | Text Classification 6 | 7 | 8 | 9 | 10 | 11 | 12 | 19 | 20 |

Text Classification

21 | 25 |

26 | 27 | 42 | 43 |
--------------------------------------------------------------------------------
/Web-Application/templates/info.md:
--------------------------------------------------------------------------------
1 | HTML files directory.
2 | 
--------------------------------------------------------------------------------
/Web-Application/templates/thank-you.html:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | Thank You 6 | 91 | 92 | 93 |

94 | 95 |

Thank You!

96 |

Your message has been sent.

97 |

We will get back to you as soon as possible.

98 | Back to Home 99 |

100 | 101 | 102 |
--------------------------------------------------------------------------------
/Web-Application/transformer.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Snigdho8869/Multiclass-Text-Classification/22cd8fd21e34650b31e4b6a815bb02f680691afe/Web-Application/transformer.pkl
--------------------------------------------------------------------------------
/ui/info.md:
--------------------------------------------------------------------------------
1 | UI images.
2 | 
--------------------------------------------------------------------------------
/ui/ui.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Snigdho8869/Multiclass-Text-Classification/22cd8fd21e34650b31e4b6a815bb02f680691afe/ui/ui.PNG
--------------------------------------------------------------------------------
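
The cleaning steps in the `predict` route of `app.py` (lowercasing, replacing non-letters, tokenizing on `\w+`, removing stop words, dropping one-character tokens) can be approximated with the standard library alone. The sketch below is an approximation under stated assumptions: `STOP_WORDS` is a tiny hypothetical stand-in for NLTK's English stop-word list, the function name `clean_text` is illustrative, and the TextBlob lemmatization step is omitted, so the output will not always match what the app feeds to its vectorizer.

```python
import re

# Hypothetical stand-in for nltk.corpus.stopwords.words('english').
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def clean_text(text, stop_words=STOP_WORDS):
    """Approximate the cleaning done in app.py, without lemmatization."""
    # Lowercase and flatten line breaks, as the 'lower_case' column does.
    lowered = text.lower().strip().replace("\n", " ").replace("\r", " ")
    # Replace everything except letters and apostrophes with spaces.
    letters_only = re.sub(r"[^a-zA-Z']", " ", lowered)
    # Tokenize on runs of word characters; apostrophes split tokens here.
    tokens = re.findall(r"\w+", letters_only)
    # Drop stop words, then drop tokens shorter than two characters.
    kept = [t for t in tokens if t not in stop_words]
    kept = [t for t in kept if len(t) >= 2]
    return " ".join(kept)


print(clean_text("The Prime Minister's speech\nis a hit!"))
```

Because the tokenizer splits on apostrophes, a possessive such as `minister's` becomes `minister` and `s`, and the one-character `s` is then discarded by the length filter.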