├── README.md
└── E_mail_Spam_Clasifier.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Spam-E-mail-detector
2 |
3 |
4 | ## Overview
5 | This is a machine learning project that classifies messages as spam or not spam (ham). The notebook converts each message into TF-IDF features with scikit-learn's `TfidfVectorizer` (lowercased, with English stop words removed) and trains a `LogisticRegression` classifier on them. The training data, `mail_data.csv`, contains 5,572 labeled messages in two columns: `Category` (`spam` or `ham`) and `Message`. The trained model can serve as a starting point for email filtering.
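The notebook turns each message into a row of TF-IDF weights before any model sees it. The sketch below assumes only that scikit-learn and NumPy are installed. It applies the notebook's `TfidfVectorizer` settings to two invented messages (not taken from `mail_data.csv`) to show what the feature matrix looks like: stop words such as "your" are dropped, and each row is scaled to unit length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Two invented messages for illustration only (not from mail_data.csv)
messages = [
    "WINNER! Claim your FREE prize now",
    "Are we still meeting for lunch today?",
]

# Same settings as the notebook: lowercase the text and drop English stop words
vectorizer = TfidfVectorizer(min_df=1, stop_words="english", lowercase=True)
features = vectorizer.fit_transform(messages)  # sparse matrix: one row per message

print(features.shape)                  # (number of messages, vocabulary size)
print(sorted(vectorizer.vocabulary_))  # learned terms, stop words removed

# Rows are L2-normalized by default, so each row has unit length
row_norms = np.linalg.norm(features.toarray(), axis=1)
print(row_norms)
```

In the notebook, `fit_transform` is called only on the training split. Test messages and new input go through `transform`, so they are scored against the training vocabulary.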
6 |
7 | ## Features
8 | - **Spam Detection**: Classifies messages as spam or ham, with about 96.6% accuracy on a held-out test split.
9 | - **Text Analysis**: Converts message text into TF-IDF features, lowercasing it and removing English stop words.
10 | - **Customizable**: Can be retrained on any CSV with the same `Category` and `Message` columns, for example to adapt it to your own mail.
11 | - **User-Friendly**: To classify a new message, edit the `input_mail` string in the notebook's final cell.
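To retrain on a custom dataset, first apply the notebook's preprocessing to the new CSV. The sketch below assumes the CSV uses the notebook's `Category` and `Message` columns and wraps this step in a hypothetical helper, `load_labeled_messages`. It swaps in `fillna` and `map` for the notebook's `df.where(...)` and `.loc` assignments. The result is the same spam -> 0, ham -> 1 encoding, but the labels come out as integers directly and any unexpected label raises an error.

```python
import io

import pandas as pd


def load_labeled_messages(csv_source):
    """Hypothetical helper: load a CSV with `Category` and `Message` columns.

    Follows the notebook's conventions: missing values become empty strings
    and labels are encoded as spam -> 0, ham -> 1.
    """
    data = pd.read_csv(csv_source).fillna("")  # like df.where(pd.notnull(df), '')
    labels = data["Category"].map({"spam": 0, "ham": 1})
    if labels.isna().any():  # any label other than 'spam'/'ham' maps to NaN
        raise ValueError("Category must contain only 'spam' or 'ham'")
    return data["Message"], labels.astype(int)


# Usage with a small in-memory CSV (invented rows)
sample_csv = io.StringIO(
    "Category,Message\n"
    "ham,See you at lunch\n"
    "spam,WINNER! Claim your free prize\n"
)
X, y = load_labeled_messages(sample_csv)
print(y.tolist())  # [1, 0]
```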
12 |
13 | ## Technologies Used
14 |
15 | - Python: Core programming language.
16 | - Scikit-learn: TF-IDF feature extraction (`TfidfVectorizer`), model training (`LogisticRegression`), train/test splitting, and evaluation (`accuracy_score`).
17 | - NLTK/spaCy (optional): For additional text preprocessing; the notebook itself relies on `TfidfVectorizer` for tokenization and stop-word removal.
18 | - Pandas: For loading and cleaning the dataset.
19 | - NumPy: For numerical computations.
20 | - Matplotlib/Seaborn (optional): For visualization.
21 | - Jupyter Notebook or Google Colab: For running the notebook.
23 |
24 |
25 |
26 | ## Usage
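27 | - Input Data: Place `mail_data.csv` in the working directory and run `E_mail_Spam_Clasifier.ipynb` from top to bottom. To classify a new message, replace the text in `input_mail` in the final cell.
28 | - Prediction: The model outputs `0` for spam and `1` for ham; the final cell prints this as "Spam Mail" or "Ham Mail".
29 | - Accuracy: About 96.6% on a held-out test set (20% of the data, 1,115 messages) and 96.7% on the 4,457 training messages.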
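The notebook runs four steps in sequence: vectorize, fit, predict, and map `0`/`1` to a label. The sketch below bundles them into one object. It assumes scikit-learn's `make_pipeline` (the notebook calls the vectorizer and model separately) and adds a hypothetical `classify` helper. The training and held-out messages are invented toy data, so the accuracy it prints says nothing about the model's real performance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Invented toy data; labels follow the notebook's encoding: 0 = spam, 1 = ham
train_messages = [
    "WINNER! You have won a free prize, call now",
    "Free entry to win cash, text WIN to claim",
    "URGENT! Claim your free mobile upgrade today",
    "Are we still meeting for lunch tomorrow?",
    "I'll be home soon, see you tonight",
    "Can you send me the notes from class?",
]
train_labels = [0, 0, 0, 1, 1, 1]

# TF-IDF features followed by logistic regression, as in the notebook
model = make_pipeline(
    TfidfVectorizer(min_df=1, stop_words="english", lowercase=True),
    LogisticRegression(),
)
model.fit(train_messages, train_labels)


def classify(message):
    """Hypothetical helper: return 'Spam Mail' or 'Ham Mail' for one message."""
    prediction = model.predict([message])[0]  # predict expects a list of texts
    return "Ham Mail" if prediction == 1 else "Spam Mail"


print(classify("Claim your free prize now"))  # Spam Mail
print(classify("See you at lunch tomorrow"))  # Ham Mail

# Accuracy on messages the model did not see during fitting
held_out = ["Claim your free prize now", "See you at lunch tomorrow"]
held_out_labels = [0, 1]
accuracy = accuracy_score(held_out_labels, model.predict(held_out))
print(accuracy)  # 1.0
```

Bundling the vectorizer and classifier keeps the fitted vocabulary attached to the model. New messages are therefore always transformed with the training vocabulary and never re-fitted by mistake.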
30 |
31 | ## Folder Structure
32 | ```
33 | spam-email-detector/
34 | ├── data/ # Dataset files (e.g., spam_ham_dataset.csv)
35 | ├── src/ # Source code (e.g., detector.py)
36 | ├── notebooks/ # Jupyter notebooks (e.g., spam_detector.ipynb)
37 | ├── requirements.txt # Python dependencies
38 | └── README.md # Project documentation
39 | ```
40 |
41 |
42 | ## License
43 | This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
44 |
45 |
46 |
47 |
48 |
--------------------------------------------------------------------------------
/E_mail_Spam_Clasifier.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": []
7 | },
8 | "kernelspec": {
9 | "name": "python3",
10 | "display_name": "Python 3"
11 | },
12 | "language_info": {
13 | "name": "python"
14 | }
15 | },
16 | "cells": [
17 | {
18 | "cell_type": "code",
19 | "execution_count": 1,
20 | "metadata": {
21 | "id": "8EOKja88UzSy"
22 | },
23 | "outputs": [],
24 | "source": [
25 | "import numpy as np\n",
26 | "import pandas as pd\n",
27 | "from sklearn.model_selection import train_test_split\n",
28 | "from sklearn.feature_extraction.text import TfidfVectorizer\n",
29 | "from sklearn.linear_model import LogisticRegression\n",
30 | "from sklearn.metrics import accuracy_score"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "source": [
36 | "df = pd.read_csv('mail_data.csv')"
37 | ],
38 | "metadata": {
39 | "id": "eSDmkIuCVP3s"
40 | },
41 | "execution_count": 18,
42 | "outputs": []
43 | },
44 | {
45 | "cell_type": "code",
46 | "source": [
47 | "print(df)"
48 | ],
49 | "metadata": {
50 | "colab": {
51 | "base_uri": "https://localhost:8080/"
52 | },
53 | "id": "8J9UzNMLVnBk",
54 | "outputId": "f1b00088-f71f-4f95-b1cc-0ceb036097ac"
55 | },
56 | "execution_count": 19,
57 | "outputs": [
58 | {
59 | "output_type": "stream",
60 | "name": "stdout",
61 | "text": [
62 | " Category Message\n",
63 | "0 ham Go until jurong point, crazy.. Available only ...\n",
64 | "1 ham Ok lar... Joking wif u oni...\n",
65 | "2 spam Free entry in 2 a wkly comp to win FA Cup fina...\n",
66 | "3 ham U dun say so early hor... U c already then say...\n",
67 | "4 ham Nah I don't think he goes to usf, he lives aro...\n",
68 | "... ... ...\n",
69 | "5567 spam This is the 2nd time we have tried 2 contact u...\n",
70 | "5568 ham Will ü b going to esplanade fr home?\n",
71 | "5569 ham Pity, * was in mood for that. So...any other s...\n",
72 | "5570 ham The guy did some bitching but I acted like i'd...\n",
73 | "5571 ham Rofl. Its true to its name\n",
74 | "\n",
75 | "[5572 rows x 2 columns]\n"
76 | ]
77 | }
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "source": [
83 | "data = df.where(pd.notnull(df), '')  # replace missing values with empty strings"
84 | ],
85 | "metadata": {
86 | "id": "pX6xnDr5cYMj"
87 | },
88 | "execution_count": 21,
89 | "outputs": []
90 | },
91 | {
92 | "cell_type": "code",
93 | "source": [
94 | "data.head(15)"
95 | ],
96 | "metadata": {
97 | "colab": {
98 | "base_uri": "https://localhost:8080/",
99 | "height": 1303
100 | },
101 | "id": "j8GHWZq0cmrh",
102 | "outputId": "435b8a9d-e738-47e1-99ba-c6267cf23a4f"
103 | },
104 | "execution_count": 22,
105 | "outputs": [
106 | {
107 | "output_type": "execute_result",
108 | "data": {
109 | "text/plain": [
110 | " Category Message\n",
111 | "0 ham Go until jurong point, crazy.. Available only ...\n",
112 | "1 ham Ok lar... Joking wif u oni...\n",
113 | "2 spam Free entry in 2 a wkly comp to win FA Cup fina...\n",
114 | "3 ham U dun say so early hor... U c already then say...\n",
115 | "4 ham Nah I don't think he goes to usf, he lives aro...\n",
116 | "5 spam FreeMsg Hey there darling it's been 3 week's n...\n",
117 | "6 ham Even my brother is not like to speak with me. ...\n",
118 | "7 ham As per your request 'Melle Melle (Oru Minnamin...\n",
119 | "8 spam WINNER!! As a valued network customer you have...\n",
120 | "9 spam Had your mobile 11 months or more? U R entitle...\n",
121 | "10 ham I'm gonna be home soon and i don't want to tal...\n",
122 | "11 spam SIX chances to win CASH! From 100 to 20,000 po...\n",
123 | "12 spam URGENT! You have won a 1 week FREE membership ...\n",
124 | "13 ham I've been searching for the right words to tha...\n",
125 | "14 ham I HAVE A DATE ON SUNDAY WITH WILL!!"
126 | ]
441 | },
442 | "metadata": {},
443 | "execution_count": 22
444 | }
1722 | ]
1723 | },
1724 | {
1725 | "cell_type": "code",
1726 | "source": [
1727 | "data.info()"
1728 | ],
1729 | "metadata": {
1730 | "colab": {
1731 | "base_uri": "https://localhost:8080/"
1732 | },
1733 | "id": "HNMBxZJgfY8i",
1734 | "outputId": "33d4fd43-19f5-49b7-a174-7ea37c16ca18"
1735 | },
1736 | "execution_count": 23,
1737 | "outputs": [
1738 | {
1739 | "output_type": "stream",
1740 | "name": "stdout",
1741 | "text": [
1742 | "\n",
1743 | "RangeIndex: 5572 entries, 0 to 5571\n",
1744 | "Data columns (total 2 columns):\n",
1745 | " # Column Non-Null Count Dtype \n",
1746 | "--- ------ -------------- ----- \n",
1747 | " 0 Category 5572 non-null object\n",
1748 | " 1 Message 5572 non-null object\n",
1749 | "dtypes: object(2)\n",
1750 | "memory usage: 87.2+ KB\n"
1751 | ]
1752 | }
1753 | ]
1754 | },
1755 | {
1756 | "cell_type": "code",
1757 | "source": [
1758 | "data.shape"
1759 | ],
1760 | "metadata": {
1761 | "colab": {
1762 | "base_uri": "https://localhost:8080/"
1763 | },
1764 | "id": "J-alKBCtfnzQ",
1765 | "outputId": "cf4b6e87-166b-497c-bcce-7b7129e033de"
1766 | },
1767 | "execution_count": 24,
1768 | "outputs": [
1769 | {
1770 | "output_type": "execute_result",
1771 | "data": {
1772 | "text/plain": [
1773 | "(5572, 2)"
1774 | ]
1775 | },
1776 | "metadata": {},
1777 | "execution_count": 24
1778 | }
1779 | ]
1780 | },
1781 | {
1782 | "cell_type": "code",
1783 | "source": [
1784 | "data.loc[data['Category'] == 'spam', 'Category'] = 0  # spam -> 0\n",
1785 | "data.loc[data['Category'] == 'ham', 'Category'] = 1  # ham -> 1"
1786 | ],
1787 | "metadata": {
1788 | "id": "mYElEEbegCVX"
1789 | },
1790 | "execution_count": 25,
1791 | "outputs": []
1792 | },
1793 | {
1794 | "cell_type": "code",
1795 | "source": [
1796 | "X = data['Message']\n",
1797 | "\n",
1798 | "Y = data['Category']"
1799 | ],
1800 | "metadata": {
1801 | "id": "t30D9nQFgaUo"
1802 | },
1803 | "execution_count": 26,
1804 | "outputs": []
1805 | },
1806 | {
1807 | "cell_type": "code",
1808 | "source": [
1809 | "print(X)"
1810 | ],
1811 | "metadata": {
1812 | "colab": {
1813 | "base_uri": "https://localhost:8080/"
1814 | },
1815 | "id": "WvlJSiUogxt4",
1816 | "outputId": "2f02a8f7-162d-451d-c364-1c92595483e5"
1817 | },
1818 | "execution_count": 27,
1819 | "outputs": [
1820 | {
1821 | "output_type": "stream",
1822 | "name": "stdout",
1823 | "text": [
1824 | "0 Go until jurong point, crazy.. Available only ...\n",
1825 | "1 Ok lar... Joking wif u oni...\n",
1826 | "2 Free entry in 2 a wkly comp to win FA Cup fina...\n",
1827 | "3 U dun say so early hor... U c already then say...\n",
1828 | "4 Nah I don't think he goes to usf, he lives aro...\n",
1829 | " ... \n",
1830 | "5567 This is the 2nd time we have tried 2 contact u...\n",
1831 | "5568 Will ü b going to esplanade fr home?\n",
1832 | "5569 Pity, * was in mood for that. So...any other s...\n",
1833 | "5570 The guy did some bitching but I acted like i'd...\n",
1834 | "5571 Rofl. Its true to its name\n",
1835 | "Name: Message, Length: 5572, dtype: object\n"
1836 | ]
1837 | }
1838 | ]
1839 | },
1840 | {
1841 | "cell_type": "code",
1842 | "source": [
1843 | "print(Y)"
1844 | ],
1845 | "metadata": {
1846 | "colab": {
1847 | "base_uri": "https://localhost:8080/"
1848 | },
1849 | "id": "FQLdsGZy20Ml",
1850 | "outputId": "68496bfe-ca75-4445-991b-01903df87203"
1851 | },
1852 | "execution_count": 28,
1853 | "outputs": [
1854 | {
1855 | "output_type": "stream",
1856 | "name": "stdout",
1857 | "text": [
1858 | "0 1\n",
1859 | "1 1\n",
1860 | "2 0\n",
1861 | "3 1\n",
1862 | "4 1\n",
1863 | " ..\n",
1864 | "5567 0\n",
1865 | "5568 1\n",
1866 | "5569 1\n",
1867 | "5570 1\n",
1868 | "5571 1\n",
1869 | "Name: Category, Length: 5572, dtype: object\n"
1870 | ]
1871 | }
1872 | ]
1873 | },
1874 | {
1875 | "cell_type": "code",
1876 | "source": [
1877 | "X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=3)"
1878 | ],
1879 | "metadata": {
1880 | "id": "2lmBU2PR264l"
1881 | },
1882 | "execution_count": 62,
1883 | "outputs": []
1884 | },
1885 | {
1886 | "cell_type": "code",
1887 | "source": [
1888 | "print(X.shape)\n",
1889 | "print(X_train.shape)\n",
1890 | "print(X_test.shape)"
1891 | ],
1892 | "metadata": {
1893 | "colab": {
1894 | "base_uri": "https://localhost:8080/"
1895 | },
1896 | "id": "jxU7Ds7c4Aq5",
1897 | "outputId": "6f1960e8-21bc-4c67-fa8f-059577e9184e"
1898 | },
1899 | "execution_count": 63,
1900 | "outputs": [
1901 | {
1902 | "output_type": "stream",
1903 | "name": "stdout",
1904 | "text": [
1905 | "(5572,)\n",
1906 | "(4457,)\n",
1907 | "(1115,)\n"
1908 | ]
1909 | }
1910 | ]
1911 | },
1912 | {
1913 | "cell_type": "code",
1914 | "source": [
1915 | "print(Y.shape)\n",
1916 | "print(Y_train.shape)\n",
1917 | "print(Y_test.shape)"
1918 | ],
1919 | "metadata": {
1920 | "colab": {
1921 | "base_uri": "https://localhost:8080/"
1922 | },
1923 | "id": "Eiu6bkkE4pRK",
1924 | "outputId": "0ee21223-2a13-4080-ba28-a80e9c440a22"
1925 | },
1926 | "execution_count": 64,
1927 | "outputs": [
1928 | {
1929 | "output_type": "stream",
1930 | "name": "stdout",
1931 | "text": [
1932 | "(5572,)\n",
1933 | "(4457,)\n",
1934 | "(1115,)\n"
1935 | ]
1936 | }
1937 | ]
1938 | },
1939 | {
1940 | "cell_type": "code",
1941 | "source": [
1942 | "# TF-IDF features: lowercase the text and drop English stop words\n",
1943 | "feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)\n",
1944 | "\n",
1945 | "X_train_features = feature_extraction.fit_transform(X_train)  # fit vocabulary/IDF on training data only\n",
1946 | "X_test_features = feature_extraction.transform(X_test)\n",
1947 | "\n",
1948 | "Y_train = Y_train.astype('int')  # labels are object dtype after encoding\n",
1949 | "Y_test = Y_test.astype('int')\n"
1950 | ],
1951 | "metadata": {
1952 | "id": "ouN-SWcp2RST"
1953 | },
1954 | "execution_count": 65,
1955 | "outputs": []
1956 | },
1957 | {
1958 | "cell_type": "code",
1959 | "source": [
1960 | "print(X_train)"
1961 | ],
1962 | "metadata": {
1963 | "colab": {
1964 | "base_uri": "https://localhost:8080/"
1965 | },
1966 | "id": "ZezxctswwA46",
1967 | "outputId": "8aa89089-130a-4b19-891a-f2846566a498"
1968 | },
1969 | "execution_count": 66,
1970 | "outputs": [
1971 | {
1972 | "output_type": "stream",
1973 | "name": "stdout",
1974 | "text": [
1975 | "3075 Don know. I did't msg him recently.\n",
1976 | "1787 Do you know why god created gap between your f...\n",
1977 | "1614 Thnx dude. u guys out 2nite?\n",
1978 | "4304 Yup i'm free...\n",
1979 | "3266 44 7732584351, Do you want a New Nokia 3510i c...\n",
1980 | " ... \n",
1981 | "789 5 Free Top Polyphonic Tones call 087018728737,...\n",
1982 | "968 What do u want when i come back?.a beautiful n...\n",
1983 | "1667 Guess who spent all last night phasing in and ...\n",
1984 | "3321 Eh sorry leh... I din c ur msg. Not sad alread...\n",
1985 | "1688 Free Top ringtone -sub to weekly ringtone-get ...\n",
1986 | "Name: Message, Length: 4457, dtype: object\n"
1987 | ]
1988 | }
1989 | ]
1990 | },
1991 | {
1992 | "cell_type": "code",
1993 | "source": [
1994 | "print(X_train_features)"
1995 | ],
1996 | "metadata": {
1997 | "colab": {
1998 | "base_uri": "https://localhost:8080/"
1999 | },
2000 | "id": "rlA8CQbswOMl",
2001 | "outputId": "b19c8254-7638-4491-e48c-b251a2d6ae16"
2002 | },
2003 | "execution_count": 67,
2004 | "outputs": [
2005 | {
2006 | "output_type": "stream",
2007 | "name": "stdout",
2008 | "text": [
2009 | " (0, 5413)\t0.6198254967574347\n",
2010 | " (0, 4456)\t0.4168658090846482\n",
2011 | " (0, 2224)\t0.413103377943378\n",
2012 | " (0, 3811)\t0.34780165336891333\n",
2013 | " (0, 2329)\t0.38783870336935383\n",
2014 | " (1, 4080)\t0.18880584110891163\n",
2015 | " (1, 3185)\t0.29694482957694585\n",
2016 | " (1, 3325)\t0.31610586766078863\n",
2017 | " (1, 2957)\t0.3398297002864083\n",
2018 | " (1, 2746)\t0.3398297002864083\n",
2019 | " (1, 918)\t0.22871581159877646\n",
2020 | " (1, 1839)\t0.2784903590561455\n",
2021 | " (1, 2758)\t0.3226407885943799\n",
2022 | " (1, 2956)\t0.33036995955537024\n",
2023 | " (1, 1991)\t0.33036995955537024\n",
2024 | " (1, 3046)\t0.2503712792613518\n",
2025 | " (1, 3811)\t0.17419952275504033\n",
2026 | " (2, 407)\t0.509272536051008\n",
2027 | " (2, 3156)\t0.4107239318312698\n",
2028 | " (2, 2404)\t0.45287711070606745\n",
2029 | " (2, 6601)\t0.6056811524587518\n",
2030 | " (3, 2870)\t0.5864269879324768\n",
2031 | " (3, 7414)\t0.8100020912469564\n",
2032 | " (4, 50)\t0.23633754072626942\n",
2033 | " (4, 5497)\t0.15743785051118356\n",
2034 | " :\t:\n",
2035 | " (4454, 4602)\t0.2669765732445391\n",
2036 | " (4454, 3142)\t0.32014451677763156\n",
2037 | " (4455, 2247)\t0.37052851863170466\n",
2038 | " (4455, 2469)\t0.35441545511837946\n",
2039 | " (4455, 5646)\t0.33545678464631296\n",
2040 | " (4455, 6810)\t0.29731757715898277\n",
2041 | " (4455, 6091)\t0.23103841516927642\n",
2042 | " (4455, 7113)\t0.30536590342067704\n",
2043 | " (4455, 3872)\t0.3108911491788658\n",
2044 | " (4455, 4715)\t0.30714144758811196\n",
2045 | " (4455, 6916)\t0.19636985317119715\n",
2046 | " (4455, 3922)\t0.31287563163368587\n",
2047 | " (4455, 4456)\t0.24920025316220423\n",
2048 | " (4456, 141)\t0.292943737785358\n",
2049 | " (4456, 647)\t0.30133182431707617\n",
2050 | " (4456, 6311)\t0.30133182431707617\n",
2051 | " (4456, 5569)\t0.4619395404299172\n",
2052 | " (4456, 6028)\t0.21034888000987115\n",
2053 | " (4456, 7154)\t0.24083218452280053\n",
2054 | " (4456, 7150)\t0.3677554681447669\n",
2055 | " (4456, 6249)\t0.17573831794959716\n",
2056 | " (4456, 6307)\t0.2752760476857975\n",
2057 | " (4456, 334)\t0.2220077711654938\n",
2058 | " (4456, 5778)\t0.16243064490100795\n",
2059 | " (4456, 2870)\t0.31523196273113385\n"
2060 | ]
2061 | }
2062 | ]
2063 | },
2064 | {
2065 | "cell_type": "code",
2066 | "source": [
2067 | "model = LogisticRegression()"
2068 | ],
2069 | "metadata": {
2070 | "id": "KxY32fZlwX_9"
2071 | },
2072 | "execution_count": 68,
2073 | "outputs": []
2074 | },
2075 | {
2076 | "cell_type": "code",
2077 | "source": [
2078 | "model.fit(X_train_features, Y_train)"
2079 | ],
2080 | "metadata": {
2081 | "colab": {
2082 | "base_uri": "https://localhost:8080/",
2083 | "height": 52
2084 | },
2085 | "id": "3lWPU-uMwm-Y",
2086 | "outputId": "86351b57-0c22-4be5-b6cd-ac6361abf9b1"
2087 | },
2088 | "execution_count": 69,
2089 | "outputs": [
2090 | {
2091 | "output_type": "execute_result",
2092 | "data": {
2093 | "text/plain": [
2094 | "LogisticRegression()"
2095 | ]
2099 | },
2100 | "metadata": {},
2101 | "execution_count": 69
2102 | }
2103 | ]
2104 | },
2105 | {
2106 | "cell_type": "code",
2107 | "source": [
2108 | "prediction_on_training_data = model.predict(X_train_features)\n",
2109 | "accuracy_on_training_data = accuracy_score(Y_train, prediction_on_training_data)"
2110 | ],
2111 | "metadata": {
2112 | "id": "Ng-7kYRjxLBA"
2113 | },
2114 | "execution_count": 70,
2115 | "outputs": []
2116 | },
2117 | {
2118 | "cell_type": "code",
2119 | "source": [
2120 | "print('Accuracy on training data : ',accuracy_on_training_data)"
2121 | ],
2122 | "metadata": {
2123 | "colab": {
2124 | "base_uri": "https://localhost:8080/"
2125 | },
2126 | "id": "8gWcjBvcyDZ6",
2127 | "outputId": "7fc92db4-bcab-474d-ee0c-0c6bfb091c0a"
2128 | },
2129 | "execution_count": 71,
2130 | "outputs": [
2131 | {
2132 | "output_type": "stream",
2133 | "name": "stdout",
2134 | "text": [
2135 | "Accuracy on training data : 0.9670181736594121\n"
2136 | ]
2137 | }
2138 | ]
2139 | },
2140 | {
2141 | "cell_type": "code",
2142 | "source": [
2143 | "prediction_on_testing_data = model.predict(X_test_features)\n",
2144 | "accuracy_on_testing_data = accuracy_score(Y_test, prediction_on_testing_data)"
2145 | ],
2146 | "metadata": {
2147 | "id": "vbsG32t0yb_R"
2148 | },
2149 | "execution_count": 72,
2150 | "outputs": []
2151 | },
2152 | {
2153 | "cell_type": "code",
2154 | "source": [
2155 | "print('Accuracy on testing data : ',accuracy_on_testing_data)"
2156 | ],
2157 | "metadata": {
2158 | "colab": {
2159 | "base_uri": "https://localhost:8080/"
2160 | },
2161 | "id": "0e7TYC5vzAYI",
2162 | "outputId": "3e9ed356-4f78-45c0-832f-a87d63e0b7ab"
2163 | },
2164 | "execution_count": 73,
2165 | "outputs": [
2166 | {
2167 | "output_type": "stream",
2168 | "name": "stdout",
2169 | "text": [
2170 | "Accuracy on testing data : 0.9659192825112107\n"
2171 | ]
2172 | }
2173 | ]
2174 | },
2175 | {
2176 | "cell_type": "code",
2177 | "source": [
2178 | "input_mail = [\"Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030\"]\n",
2179 | "\n",
2180 | "input_data_features = feature_extraction.transform(input_mail)  # reuse the fitted vectorizer\n",
2181 | "\n",
2182 | "prediction = model.predict(input_data_features)\n",
2183 | "\n",
2184 | "print(prediction)\n",
2185 | "\n",
2186 | "if prediction[0] == 1:  # 1 = ham, 0 = spam\n",
2187 | "    print(\"Ham Mail\")\n",
2188 | "else:\n",
2189 | "    print(\"Spam Mail\")\n"
2190 | ],
2191 | "metadata": {
2192 | "colab": {
2193 | "base_uri": "https://localhost:8080/"
2194 | },
2195 | "id": "5c_X82pbzmbC",
2196 | "outputId": "57ee2a8b-5b10-474b-f6f9-9c369d492747"
2197 | },
2198 | "execution_count": 76,
2199 | "outputs": [
2200 | {
2201 | "output_type": "stream",
2202 | "name": "stdout",
2203 | "text": [
2204 | "[0]\n",
2205 | "Spam Mail\n"
2206 | ]
2207 | }
2208 | ]
2209 | }
2219 | ]
2220 | }
2221 |
--------------------------------------------------------------------------------