├── requirements.txt
├── README.md
└── Poetry NLP Notebook.ipynb
/requirements.txt:
--------------------------------------------------------------------------------
1 | absl-py==0.9.0
2 | astunparse==1.6.3
3 | cachetools==4.1.0
4 | certifi==2020.4.5.1
5 | chardet==3.0.4
6 | click==7.1.2
7 | cycler==0.10.0
8 | gast==0.3.3
9 | google-auth==1.15.0
10 | google-auth-oauthlib==0.4.1
11 | google-pasta==0.2.0
12 | grpcio==1.29.0
13 | h5py==2.10.0
14 | idna==2.9
15 | importlib-metadata==1.6.0
16 | joblib==0.15.1
17 | Keras-Preprocessing==1.1.2
18 | kiwisolver==1.2.0
19 | Markdown==3.2.2
20 | matplotlib==3.2.1
21 | nltk==3.5
22 | numpy==1.18.4
23 | oauthlib==3.1.0
24 | opt-einsum==3.2.1
25 | pandas==1.0.3
26 | Pillow==7.1.2
27 | protobuf==3.12.2
28 | pyasn1==0.4.8
29 | pyasn1-modules==0.2.8
30 | pyparsing==2.4.7
31 | python-dateutil==2.8.1
32 | pytz==2020.1
33 | regex==2020.5.14
34 | requests==2.23.0
35 | requests-oauthlib==1.3.0
36 | rsa==4.0
37 | scikit-learn==0.23.1
38 | scipy==1.4.1
39 | seaborn==0.10.1
40 | six==1.15.0
41 | sklearn==0.0
42 | tensorboard==2.2.1
43 | tensorboard-plugin-wit==1.6.0.post3
44 | tensorflow==2.2.0
45 | tensorflow-estimator==2.2.0
46 | termcolor==1.1.0
47 | threadpoolctl==2.0.0
48 | tqdm==4.46.0
49 | urllib3==1.25.9
50 | Werkzeug==1.0.1
51 | wordcloud==1.7.0
52 | wrapt==1.12.1
53 | zipp==3.1.0
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | # Poetry Classification Notebook
4 | 
5 | This notebook classifies poems by age and by author, using both RNNs and decision trees (the latter because the dataset is small).
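
Before classification, the notebook turns each poem into a TF-IDF vector with scikit-learn's `TfidfVectorizer`. To give a feel for what that weighting does, here is a rough pure-Python sketch. It uses the textbook `tf * log(N / df)` formula with whitespace tokenization, which is an assumption made for illustration and not scikit-learn's exact smoothed and normalized variant; the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Illustrative only: scikit-learn's TfidfVectorizer applies smoothing and
    L2 normalization by default, which this sketch does not reproduce.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: (count / len(tokens)) * math.log(n_docs / df[term])
             for term, count in tf.items()}
        )
    return vectors

poems = ["love is love", "death is near"]
for vector in tfidf(poems):
    print(vector)
```

A word that appears in every poem (here `is`) gets weight zero, while words specific to one poem get a positive weight, which is what lets a classifier tell poems apart.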
6 |
7 |
8 | ## Models and Data Used
9 |
10 | - Data: Poems by various poets, such as William Shakespeare, covering different genres and ages.
11 | - Classification Methods: Decision Trees (sklearn) and RNNs (tf.keras)
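
The notebook's decision trees are trained with scikit-learn's `DecisionTreeClassifier`, once with `criterion="entropy"` and once with `criterion="gini"`. As a rough illustration of what those two criteria measure, here is a minimal pure-Python sketch. The helper names `entropy` and `gini` are made up for this example and are not part of scikit-learn.

```python
import math
from collections import Counter

def class_probabilities(labels):
    """Relative frequency of each class label in the list."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, higher when classes are mixed."""
    return sum(-p * math.log2(p) for p in class_probabilities(labels))

def gini(labels):
    """Gini impurity: 0 for a pure node, higher when classes are mixed."""
    return 1.0 - sum(p ** 2 for p in class_probabilities(labels))

mixed = ["Renaissance", "Modern", "Renaissance", "Modern"]
pure = ["Modern", "Modern", "Modern"]

print(entropy(mixed), gini(mixed))  # 1.0 0.5
print(entropy(pure), gini(pure))    # 0.0 0.0
```

A decision tree picks the split that reduces the chosen impurity the most, so both criteria reward splits that separate the classes cleanly.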
12 |
13 | 
14 |
15 | ## Files
16 |
17 | - *all.csv*: the data, taken from [Kaggle](https://www.kaggle.com/ishnoor/poetry-analysis-with-machine-learning?select=all.csv)
18 | - *Poetry NLP Notebook.ipynb*: the interactive Python notebook that contains the code
19 |
20 | ## Libraries Used
21 |
22 | - nltk
23 | - re
24 | - keras
25 | - seaborn
26 | - matplotlib
27 | - scikit-learn
28 | - pandas
29 | - tensorflow
30 | - numpy
31 | - wordcloud
32 |
33 | P.S. All the third-party libraries can be installed with `pip install -r requirements.txt`.
34 |
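
Of the libraries above, `re` is used in the notebook to strip special characters from the poems before they are vectorized. The sketch below mirrors that cleaning step with the same regular expression; the sample line is taken from the dataset preview, and the unused `remove_digits` parameter from the notebook is left out here as a simplification.

```python
import re

def remove_special_chars(text):
    # Keep ASCII letters, digits, periods and whitespace (including \r\n);
    # every other character, such as commas or quotes, is dropped.
    return re.sub(r"[^a-zA-Z.\d\s]", "", text)

line = "Sir Charles into my chamber coming in,\r\nWhen"
print(repr(remove_special_chars(line)))
```

Line breaks are kept on purpose, so the cleaned text can still be split on whitespace for tokenization.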
35 | ## Author
36 |
37 | - **Merve Noyan** - [merveenoyan](https://github.com/merveenoyan)
38 |
39 | ## Further Notes
40 | I plan to migrate this project to TensorFlow and generate poetry. Stay tuned and watch this repo if you don't want to miss it 🤓
41 |
42 | > Written with [StackEdit](https://stackedit.io/).
43 |
44 |
--------------------------------------------------------------------------------
/Poetry NLP Notebook.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"metadata":{},"cell_type":"markdown","source":"**Poetry Classification Notebook**"},{"metadata":{},"cell_type":"markdown","source":"I came across this dataset while looking for Renaissance paintings to use in a GAN, and seeing that there were no kernels on it, I thought I might just dive in. \nThere are five columns: the author, the content of the poem, its name, its age and its type. First I'll do exploratory data analysis and preprocessing, then I'll classify the author and the age of the poems using decision trees."},{"metadata":{"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","trusted":true},"cell_type":"code","source":"import pandas as pd\nimport numpy as np\nfrom sklearn.model_selection import train_test_split\nfrom sklearn import preprocessing\nfrom sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\nfrom sklearn.metrics import classification_report, confusion_matrix\nfrom sklearn.metrics import accuracy_score\nfrom nltk.corpus import stopwords\nfrom wordcloud import WordCloud, STOPWORDS\nfrom sklearn.tree import DecisionTreeClassifier\nimport seaborn as sns\nimport gc\nimport re","execution_count":2,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"import tensorflow as tf\nfrom tensorflow.keras.layers import GRU, LSTM, Embedding\nfrom tensorflow.keras.callbacks import EarlyStopping\nfrom tensorflow.keras import optimizers\nfrom tensorflow.keras.layers import Activation, Dense, Bidirectional","execution_count":3,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"**Importing the dataset**"},{"metadata":{"trusted":true},"cell_type":"code","source":"df_poetry=pd.read_csv(\"../input/poetry-analysis-with-machine-learning/all.csv\", sep=\",\")\ndf_poetry.head()","execution_count":4,"outputs":[{"output_type":"execute_result","execution_count":4,"data":{"text/plain":" author \\\n0 WILLIAM SHAKESPEARE \n1 DUCHESS OF NEWCASTLE 
MARGARET CAVENDISH \n2 THOMAS BASTARD \n3 EDMUND SPENSER \n4 RICHARD BARNFIELD \n\n content \\\n0 Let the bird of loudest lay\\r\\nOn the sole Ara... \n1 Sir Charles into my chamber coming in,\\r\\nWhen... \n2 Our vice runs beyond all that old men saw,\\r\\n... \n3 Lo I the man, whose Muse whilome did maske,\\r\\... \n4 Long have I longd to see my love againe,\\r\\nSt... \n\n poem name age type \n0 The Phoenix and the Turtle Renaissance Mythology & Folklore \n1 An Epilogue to the Above Renaissance Mythology & Folklore \n2 Book 7, Epigram 42 Renaissance Mythology & Folklore \n3 from The Faerie Queene: Book I, Canto I Renaissance Mythology & Folklore \n4 Sonnet 16 Renaissance Mythology & Folklore ","text/html":"
"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"First I'll do exploratory data analysis, then classify the poetries in age, type and author. Let's see the list of authors, types and ages."},{"metadata":{"trusted":true},"cell_type":"code","source":"df_poetry.rename(columns={\"poem name\":\"poem_name\"}, inplace=True)","execution_count":5,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"df_poetry.age.unique()","execution_count":6,"outputs":[{"output_type":"execute_result","execution_count":6,"data":{"text/plain":"array(['Renaissance', 'Modern'], dtype=object)"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"There are three types of poetry."},{"metadata":{"trusted":true},"cell_type":"code","source":"df_poetry.type.unique()","execution_count":7,"outputs":[{"output_type":"execute_result","execution_count":7,"data":{"text/plain":"array(['Mythology & Folklore', 'Nature', 'Love'], dtype=object)"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"Let's see the list of authors."},{"metadata":{"trusted":true},"cell_type":"code","source":"df_poetry.author.unique()","execution_count":8,"outputs":[{"output_type":"execute_result","execution_count":8,"data":{"text/plain":"array(['WILLIAM SHAKESPEARE', 'DUCHESS OF NEWCASTLE MARGARET CAVENDISH',\n 'THOMAS BASTARD', 'EDMUND SPENSER', 'RICHARD BARNFIELD',\n 'SIR WALTER RALEGH', 'QUEEN ELIZABETH I', 'JOHN DONNE',\n 'JOHN SKELTON', 'CHRISTOPHER MARLOWE', 'LADY MARY WROTH',\n 'ROBERT SOUTHWELL, SJ', 'WILLIAM BYRD', 'GEORGE GASCOIGNE',\n 'HENRY VIII, KING OF ENGLAND', 'SIR THOMAS WYATT', 'EN JONSON',\n 'ORLANDO GIBBONS', 'THOMAS NASHE', 'SIR PHILIP SIDNEY',\n 'SECOND BARON VAUX OF HARROWDEN THOMAS, LORD VAUX',\n 'HENRY HOWARD, EARL OF SURREY', 'GEORGE CHAPMAN', 'THOMAS CAMPION',\n 'ISABELLA WHITNEY', 'SAMUEL DANIEL', 'THOMAS HEYWOOD',\n 'GIOVANNI BATTISTA GUARINI', 'SIR EDWARD DYER', 'THOMAS LODGE',\n 'JOHN FLETCHER', 'EDGAR LEE MASTERS', 'WILLIAM BUTLER YEATS',\n 
'FORD MADOX FORD', 'IVOR GURNEY', 'CARL SANDBURG', 'EZRA POUND',\n 'ELINOR WYLIE', 'GEORGE SANTAYANA', 'LOUISE BOGAN',\n 'KENNETH SLESSOR', 'HART CRANE', 'D. H. LAWRENCE',\n 'HUGH MACDIARMID', 'E. E. CUMMINGS', 'LOUIS UNTERMEYER',\n 'WALLACE STEVENS', 'MARJORIE PICKTHALL', 'RICHARD ALDINGTON',\n 'GUILLAUME APOLLINAIRE', 'SAMUEL GREENBERG', 'STEPHEN SPENDER',\n 'EDITH SITWELL', 'PAUL LAURENCE DUNBAR', 'SARA TEASDALE',\n 'MINA LOY', 'MARIANNE MOORE', 'ASIL BUNTING', 'MICHAEL ANANIA',\n 'ARCHIBALD MACLEISH', 'CONRAD AIKEN', 'MALCOLM COWLEY',\n 'KATHERINE MANSFIELD', 'T. S. ELIOT', 'GERTRUDE STEIN',\n 'JAMES JOYCE', 'KENNETH FEARING'], dtype=object)"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"**Removing special characters from the content column, leaving the spaces for tokenization**"},{"metadata":{"trusted":true},"cell_type":"code","source":"def remove_special_chars(text, remove_digits=True):\n text=re.sub('[^a-zA-Z.\\d\\s]', '',text)\n return text\ndf_poetry.content=df_poetry.content.apply(remove_special_chars)","execution_count":9,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"Importing the list of stopwords, I have gathered the below gist def remove_stopwords from another notebook."},{"metadata":{"trusted":true},"cell_type":"code","source":"from sklearn.preprocessing import LabelEncoder\nle=LabelEncoder()\ndf_poetry.age=le.fit_transform(df_poetry.age)\ndf_poetry","execution_count":10,"outputs":[{"output_type":"execute_result","execution_count":10,"data":{"text/plain":" author \\\n0 WILLIAM SHAKESPEARE \n1 DUCHESS OF NEWCASTLE MARGARET CAVENDISH \n2 THOMAS BASTARD \n3 EDMUND SPENSER \n4 RICHARD BARNFIELD \n.. ... \n568 SARA TEASDALE \n569 HART CRANE \n570 WILLIAM BUTLER YEATS \n571 CARL SANDBURG \n572 RICHARD ALDINGTON \n\n content \\\n0 Let the bird of loudest lay\\r\\nOn the sole Ara... \n1 Sir Charles into my chamber coming in\\r\\nWhen ... \n2 Our vice runs beyond all that old men saw\\r\\nA... 
\n3 Lo I the man whose Muse whilome did maske\\r\\nA... \n4 Long have I longd to see my love againe\\r\\nSti... \n.. ... \n568 With the man I love who loves me not\\r\\nI walk... \n569 Hart Crane Voyages I II III IV V VI from The C... \n570 When you are old and grey and full of sleep\\r\\... \n571 Give me hunger\\r\\nO you gods that sit and give... \n572 Potuia potuia\\r\\nWhite grave goddess\\r\\nPity m... \n\n poem_name age type \n0 The Phoenix and the Turtle 1 Mythology & Folklore \n1 An Epilogue to the Above 1 Mythology & Folklore \n2 Book 7, Epigram 42 1 Mythology & Folklore \n3 from The Faerie Queene: Book I, Canto I 1 Mythology & Folklore \n4 Sonnet 16 1 Mythology & Folklore \n.. ... ... ... \n568 Union Square 0 Love \n569 Voyages 0 Love \n570 When You Are Old 0 Love \n571 At a Window 0 Love \n572 To a Greek Marble 0 Love \n\n[573 rows x 5 columns]","text/html":"
"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"df_poetry.drop(columns=[\"author\", \"poem_name\",\"type\"])","execution_count":11,"outputs":[{"output_type":"execute_result","execution_count":11,"data":{"text/plain":" content age\n0 Let the bird of loudest lay\\r\\nOn the sole Ara... 1\n1 Sir Charles into my chamber coming in\\r\\nWhen ... 1\n2 Our vice runs beyond all that old men saw\\r\\nA... 1\n3 Lo I the man whose Muse whilome did maske\\r\\nA... 1\n4 Long have I longd to see my love againe\\r\\nSti... 1\n.. ... ...\n568 With the man I love who loves me not\\r\\nI walk... 0\n569 Hart Crane Voyages I II III IV V VI from The C... 0\n570 When you are old and grey and full of sleep\\r\\... 0\n571 Give me hunger\\r\\nO you gods that sit and give... 0\n572 Potuia potuia\\r\\nWhite grave goddess\\r\\nPity m... 0\n\n[573 rows x 2 columns]","text/html":"
"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"Heat map between label encoded features."},{"metadata":{"trusted":true},"cell_type":"code","source":"sns.heatmap(corr, \n xticklabels=corr.columns,\n yticklabels=corr.columns)","execution_count":31,"outputs":[{"output_type":"execute_result","execution_count":31,"data":{"text/plain":""},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"","image/png":"iVBORw0KGgoAAAANSUhEUgAAAWwAAAD8CAYAAABTjp5OAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFmVJREFUeJzt3X2UJXV95/H3hxEYH4FRk0WeBDI+4HEXdETU+BjAMScLYjA7BCMTYXsxEFx31w2ceICDhyzR45LFgGsnDKAxQBCF0QziACKJ0WSGOMvAwOg4IrTDijLsLAQZ7e7P/nGr17Lt2123b93ururPi/M7XQ+/uvW9d5rv/OZbv6or20RExMK3x3wHEBER1SRhR0Q0RBJ2RERDJGFHRDREEnZEREMkYUdENEQSdkREF5LWSHpU0r1d9kvSZZK2SbpH0qtK+06T9J2inVZHPEnYERHdXQ2snGb/O4DlRRsCPgkgaRlwAfBa4GjgAkn79RtMEnZERBe27wJ2TtPlRODT7vgmsK+k/YG3A+tt77T9OLCe6RN/Jc/o9wVm8rMfb8+tlAO2+agPzncIrXeBxuY7hEXhiw99Sf2+Ri85Z68XHv4f6IyMJwzbHu7hdAcAD5fWR4pt3bb3ZeAJOyJioSqScy8JerKp/oLxNNv7kpJIRLTL+Fj11r8R4KDS+oHAjmm29yUJOyLaZWy0euvfWuC9xWyRY4Bdth8BbgWOl7RfcbHx+GJbX1ISiYhWscdrey1J1wJvAV4gaYTOzI89O+fx/wTWAb8JbAOeAn6/2LdT0keADcVLXWR7uouXlSRhR0S7jNeXsG2fMsN+A2d12bcGWFNbMCRhR0Tb1DjCXmiSsCOiXeq5mLggJWFHRLtkhB0R0QyuZ/bHgpSEHRHtUuNFx4UmCTsi2iUlkYiIhshFx4iIhsgIOyKiIXLRMSKiIXLRMSKiGezUsCMimiE17IiIhkhJJCKiITLCjohoiLGfzXcEA5OEHRHtkpJIRERDpCQSEdEQGWFHRDREEnZERDM4Fx0jIhoiNeyIiIZISSQioiEywo6IaIiMsCMiGiIj7IiIhhjNFxhERDRDRtgREQ2RGnZEREO0eIS9x0wdJC2RdNtcBBMR0bfx8eptBpJWStoqaZukc6fYf6mkTUX7tqT/U9o3Vtq3to63NuMI2/aYpKck7WN7Vx0njYgYmJpG2JKWAJcDxwEjwAZJa21v+f+nsj9Y6v+HwFGll/iJ7SNrCaYw4wi78DSwWdKVki6baN06SxqStFHSxr/89LX1RBoRUcXoaPU2vaOBbba32/4pcB1w4jT9TwEGmvCq1rD/tmiV2B4GhgF+9uPtnkVcERGz4+opR9IQMFTaNFzkL4ADgIdL+0aA13Z5nUOAQ4E7SpuXStoIjAKX2L6pcmBdVErYtq+RtBfwkmLTVtvtfSRWRDRXD7NEyoPLKWiqQ7r0XQV8zvZYadvBtndIOgy4Q9Jm29+tHNwUKiVsSW8BrgEepPMmDpJ0mu27+jl5RETt6pvWNwIcVFo/ENjRpe8q4KzyBts
7ip/bJd1Jp749+IQNfBw43vZWAEkvoVOreXU/J4+IqF190/o2AMslHQr8gE5S/t3JnSS9FNgP+EZp237AU7Z3S3oB8Abgo/0GVDVh7zmRrAFsf1vSnv2ePCKidmNjM/epwPaopLOBW4ElwBrb90m6CNhoe2Kq3inAdfYvFM9fDnxK0jidyR2XlGeXzFbVhL1R0pXAZ4r1U4G7+z15RETtarzT0fY6YN2kbedPWr9wiuP+AXhlbYEUqibs99Opz5xDp4Z9F3BF3cFERPRtsd+abns38N+LFhGxcLX41vSqs0TeAFwIHFI+xvZhgwkrImJ2PN7eWz+qlkSuBD5Ip25dT0U/ImIQFntJBNhl+5aBRhIRUYeaZoksRNMmbEmvKha/KuljwOeB3RP7bf/zAGOLiOjdIh5hf3zS+orSsoG31RtORESfFmvCtv1WAEmH2d5e3lfcHx8RsbD08PCnpqn6eNXPTbHthjoDiYioRY1fYLDQzFTDfhnwCmAfSe8q7XoesHSQgUVEzMointb3UuC3gH2Bf1va/gTw7wcVVETErC3WWSK2bwZulvQ629+Yrm9ExELgBpY6qqo6D3tI0i+NqG2/r+Z4IiL6s4hLIhO+VFpeCpxE9wd5R0TMn8X+LBHbN5bXJV0L3DaQiCIi+pER9i9ZDhxcZyAREbUYXaQXHSdIeoKff/mkgR8C/3VQQUVEzFpKIn6upGV0RtYT86/b+++OiGiuxV4SkXQG8AE63xq8CTiGzhdO5lkiEbGgtHlaX9Vb0z8AvAb4fvF8kaOAHw0sqoiI2Rp39dYwVS86Pm37aUlI2tv2A8VXu0dELCwNTMRVVU3YI5L2BW4C1kt6nMzDjoiFaLHemj7B9knF4oWSvgrsA3x5YFFFRMxSvtOxxPbXBhFIREQtkrAjIhqixbNEkrAjol0ywo6IaIgk7IiIZvBYSiKztvmoDw76FIveK7916XyH0Hq3vOiN8x1CVNXiEXbVOx0jIhrB467cZiJppaStkrZJOneK/asl/UjSpqKdUdp3mqTvFO20Ot5bSiIR0S41jbAlLQEuB44DRoANktba3jKp6/W2z5507DLgAmAFnQfl3V0c+3g/MWWEHRHtMt5Dm97RwDbb223/FLgOOLFiFG8H1tveWSTp9cDK3t7IL0vCjohW8eh45SZpSNLGUhsqvdQBwMOl9ZFi22S/LekeSZ+TdFCPx/YkJZGIaJceJonYHgaGu+zWVIdMWv8icK3t3ZLOBK6h89jpKsf2LCPsiGiVGi86jgAHldYPZNJD72w/Znt3sfoXwKurHjsbSdgR0S711bA3AMslHSppL2AVsLbcQdL+pdUTgPuL5VuB4yXtJ2k/4PhiW19SEomIVqnraX22RyWdTSfRLgHW2L5P0kXARttrgXMknQCMAjuB1cWxOyV9hE7SB7jI9s5+Y0rCjoh2qfFGR9vrgHWTtp1fWj4POK/LsWuANfVFk4QdES3j0fmOYHCSsCOiVdzeR4kkYUdEyyRhR0Q0Q0bYERENkYQdEdEQHpvqJsN2SMKOiFbJCDsioiE8nhF2REQjZIQdEdEQdkbYERGNkBF2RERDjGeWSEREM+SiY0REQyRhR0Q0hOt5HPaClIQdEa2SEXZERENkWl9EREOMZZZIREQzZIQdEdEQqWFHRDREZolERDRERtgREQ0xNr7HfIcwMEnYEdEqKYlERDTEeGaJREQ0Q6b1FSQ92/a/DCqYiIh+tbkkUqk6L+n1krYA9xfr/0bSFdP0H5K0UdLGzz/5YD2RRkRUMG5Vbk1T9XLqpcDbgccAbP8v4E3dOtsetr3C9op3PefFfQcZEVHV2PgelVvTVI7Y9sOTNo3VHEtERN/cQ5uJpJWStkraJuncKfb/J0lbJN0j6XZJh5T2jUnaVLS1Nby1yjXshyW9HrCkvYBzKMojERELSV2lDklLgMuB44ARYIOktba3lLp9C1hh+ylJ7wc+Cvy7Yt9PbB9ZSzCFqiPsM4GzgAP
oBH5ksR4RsaDYqtxmcDSwzfZ22z8FrgNO/MVz+au2nypWvwkcWPsbKqk0wrb9Y+DUQQYSEVGHXr40XdIQMFTaNGx7uFg+ACiXgkeA107zcqcDt5TWl0raCIwCl9i+qYfQplQpYUu6bIrNu4CNtm/uN4iIiLqY6iWRIjkPd9k91QtNWfqW9B5gBfDm0uaDbe+QdBhwh6TNtr9bObgpVC2JLKVTBvlO0f41sAw4XdKf9RNARESdRq3KbQYjwEGl9QOBHZM7SToW+GPgBNu7J7bb3lH83A7cCRzV3zurftHx14C32R4tAvwk8BU6xfjN/QYREVGXXkbYM9gALJd0KPADYBXwu+UOko4CPgWstP1oaft+wFO2d0t6AfAGOhck+1I1YR8APJtOGYRi+UW2xyTt7n5YRMTc6qWGPR3bo5LOBm4FlgBrbN8n6SI65eC1wMeA5wA3SAJ4yPYJwMuBT0kap1PJuGTS7JJZqZqwPwpsknQnnbrOm4A/kfRs4LZ+g4iIqEuNI2xsrwPWTdp2fmn52C7H/QPwytoCKVSdJXKlpFuA3wMeoFMOGSmeK/KhuoOKiJitukbYC1HVWSJnAB+gU3TfBBwDfAN42+BCi4jo3ViNI+yFpuoskQ8ArwG+b/utdK52/mhgUUVEzNK4qremqVrDftr205KQtLftByS9dKCRRUTMwniLR9hVE/aIpH2Bm4D1kh5nivmIERHzrcWPw6580fGkYvFCSV8F9gG+PLCoIiJmadFfdCyz/bVBBBIRUYdxpSQSEdEIbX5QfxJ2RLRKE2d/VJWEHRGtklkiERENsehniURENEVKIhERDZFpfRERDTGWEXZERDNkhB0R0RBJ2BERDTHzVzU2VxJ2RLRKRtgREQ2RW9MjIhoi87AjIhoiJZGIiIZIwo6IaIg8SyQioiFSw46IaIjMEunDBWrzx7cw3PKiN853CK33kx1/N98hREXjLS6KZIQdEa2Si44REQ3R3vF1EnZEtEybR9h7zHcAERF1GpUrt5lIWilpq6Rtks6dYv/ekq4v9v+jpBeX9p1XbN8q6e11vLck7IhoFffQpiNpCXA58A7gCOAUSUdM6nY68LjtXwMuBf60OPYIYBXwCmAlcEXxen1Jwo6IVhnvoc3gaGCb7e22fwpcB5w4qc+JwDXF8ueA35CkYvt1tnfb/h6wrXi9viRhR0SrjOPKTdKQpI2lNlR6qQOAh0vrI8U2pupjexTYBTy/4rE9y0XHiGiVXmaJ2B4Ghrvsnuqeyckv361PlWN7lhF2RLRKjSWREeCg0vqBwI5ufSQ9A9gH2Fnx2J4lYUdEq4zhym0GG4Dlkg6VtBedi4hrJ/VZC5xWLJ8M3GHbxfZVxSySQ4HlwD/1+95SEomIVqlrHrbtUUlnA7cCS4A1tu+TdBGw0fZa4ErgM5K20RlZryqOvU/S3wBbgFHgLNt9P6cjCTsiWsU13utoex2wbtK280vLTwPv7nLsxcDFtQVDEnZEtEyb73RMwo6IVsnT+iIiGqK96ToJOyJaZrTFKTsJOyJapc6LjgtNEnZEtEouOkZENERG2BERDZERdkREQ4w5I+yIiEbIPOyIiIZIDTsioiFSw46IaIiURCIiGiIlkYiIhsgskYiIhkhJJCKiIXLRMSKiIVLDjohoiJREIiIawrnoGBHRDGMZYUdENENKIhERDdHmksgeM3VQx3sknV+sHyzp6MGHFhHRu3FcuTXNjAkbuAJ4HXBKsf4EcPl0B0gakrRR0sbvP/lQnyFGRFTnHv5rmioJ+7W2zwKeBrD9OLDXdAfYHra9wvaKQ55zcA1hRkRUM2ZXbk1TpYb9M0lLoPPXkaQX0u6biSKiwZpY6qiqSsK+DPgC8KuSLgZOBj480KgiImZpUSds25+VdDfwG8Wmd9q+f7BhRUTMzlzNEpG0DLgeeDHwIPA7Rcm43OdI4JPA84Ax4GLb1xf7rgbeDOwquq+2vWm6c1apYQM8C1hS9H9mxWMiIub
cHM4SORe43fZy4PZifbKngPfafgWwEvgzSfuW9n/I9pFFmzZZQ7VpfecD1wDLgBcAV0lKSSQiFqQ5nCVyIp3cSPHznb8Ui/1t298plncAjwIvnO0Jq4ywTwFeY/tC2xcAxwCnzvaEERGDNObxyq08BbloQz2c6ldtPwJQ/PyV6ToX96/sBXy3tPliSfdIulTS3jOdsMpFxweBpRTT+oC9J50wImLB6KWGbXsYGO62X9JtwL+aYtcf9xKTpP2BzwCn2Z6YZXce8L/pJPFh4I+Ai6Z7nSoJezdwn6T1dKb2HQf8vaTLAGyf00vgERGDVOcsEdvHdtsn6YeS9rf9SJGQH+3S73nA3wIftv3N0ms/UizulnQV8F9miqdKwv5C0SbcWeGYiIh5MYd3MK4FTgMuKX7ePLmDpL3o5M9P275h0r6JZC869e97ZzphlYT9GLCuNIyPiFiwxufuDsZLgL+RdDrwEPBuAEkrgDNtnwH8DvAm4PmSVhfHTUzf+2xxI6KATcCZM52wSsJeBfwPSTcCV2UOdkQsZHM1wrb9GD+/P6W8fSNwRrH8V8BfdTn+bb2es8qNM+8pajCn0JnSZ+Aq4FrbT/R6woiIQRprcTGg0o0ztv8vcCNwHbA/cBLwz5L+cICxRUT0bNyu3JpmxhG2pBOA3wcOpzMt5Wjbj0p6FnA/8InBhhgRUV0TH5taVZUa9qnApbbvmtgg6U9t/5Gk9w0utIiI3jVx5FxVlZLI8nKyLrwDwPbt9YcUETF7bf4Cg64jbEnvB/4AOEzSPaVdzwW+PujAIiJmY8xj8x3CwExXEvlr4Bbgv/GLT6F6wvbOgUYVETFLbf4S3q4J2/YuOs9pPaVbn4iIhWZRf4FBRESTLMoRdkREE7V5lkgSdkS0ShNnf1SVhB0RrdLmW9OTsCOiVVLDjohoiNSwIyIaIiPsiIiGyDzsiIiGyAg7IqIhMkskIqIhctExIqIhUhKJiGiI3OkYEdEQGWFHRDREm2vYavPfRrMlacj28HzH0Wb5jAcvn3H7VPlOx8VoaL4DWATyGQ9ePuOWScKOiGiIJOyIiIZIwp5a6n6Dl8948PIZt0wuOkZENERG2BERDZGEHRHREEnYgKR3SjqitH6npBXzGVNEN5L2lfQH8x1HzL0k7I53AkfM2KsCSbl7NAZtXyAJexFqbcKWdJOkuyXdJ2mo2PZkaf/Jkq6W9HrgBOBjkjZJOrzo8m5J/yTp25LeWByzVNJVkjZL+paktxbbV0u6QdIXga/M7TtdmLp8/qcXn+edkv5C0p8X218o6UZJG4r2hvmNfsG7BDi8+H29QdKJEzskfVbSCcXv5M2Svixpq6QLSn3eU/xub5L0KUlL5uVdRO9st7IBy4qfzwTuBZ4PPFnafzJwdbF8NXByad+dwMeL5d8EbiuW/zNwVbH8MuAhYCmwGhiZOGfalJ//AcCDwDJgT+DvgD8v+vw18OvF8sHA/fMd/0JuwIuBe4vlNwM3Fcv7AN+j84yg1cAjxe/9xJ/BCuDlwBeBPYtjrgDeO9/vKa1aa/M/38+RdFKxfBCwvMfjP1/8vJvO/yAAvw58AsD2A5K+D7yk2Lfe9s7Zh9s6kz//3wO+NvEZSbqBn392xwJHSJo49nmSnmv7ibkMuIlsf03S5ZJ+BXgXcKPt0eKzXG/7MQBJn6fz+zsKvBrYUPR5JvDovAQfPWtlwpb0FjpJ4HW2n5J0J52RcHnS+dIZXmZ38XOMn39O6tIX4F96j7Sdunz+W+mM7qayR9H3J3MTYet8BjgVWAW8r7R98k0WpvM7fI3t8+YotqhRW2vY+wCPF8niZcAxxfYfSnq5pD2Ak0r9nwCeW+F176LzPwaSXkLnn+9b6wu7Nab6/J8FvFnSfsWF2d8u9f8KcPbEiqQj5zTa5pn8+3o18B8BbN9X2n6cpGWSnknnwvrXgduBk4sROcX+Q+Yk6uhbWxP2l4FnSLoH+AjwzWL7ucCXgDvo1PcmXAd8qLi
QeDjdXQEskbQZuB5YbXv3NP0Xq6k+/x8AfwL8I3AbsAXYVfQ/B1gh6R5JW4Az5z7k5ijKHF+XdK+kj9n+IXA/cNWkrn9PZ/S9iU6pZKPtLcCHga8Ufz7rgf3nMPzoQ25Njzkj6Tm2nyxG2F8A1tj+wnzH1XSSngVsBl5le1exbTWwwvbZ0x0bzdLWEXYsTBdK2kRnxsL3gJvmOZ7Gk3Qs8ADwiYlkHe2VEXZERENkhB0R0RBJ2BERDZGEHRHREEnYERENkYQdEdEQ/w9bZFNAfeOGtgAAAABJRU5ErkJggg==\n"},"metadata":{"needs_background":"light"}}]},{"metadata":{},"cell_type":"markdown","source":"Categorical plot to explain distribution of type and authors of poetry through the ages. It'd be better if the ages were given in years instead of two categories."},{"metadata":{"trusted":true},"cell_type":"code","source":"sns.catplot(x=\"age\", y=\"author\",hue=\"type\", data=df_poetry);","execution_count":32,"outputs":[{"output_type":"display_data","data":{"text/plain":"","image/png":"\n"},"metadata":{"needs_background":"light"}}]},{"metadata":{},"cell_type":"markdown","source":"First I'll separate the dataset for training and test, then I'll vectorize both sets with TFIDF and Count Vectorizer, and then apply decision tree for classification."},{"metadata":{"trusted":true},"cell_type":"code","source":"y=df_poetry['author']\nx=df_poetry[\"content\"]\nX_train, X_test, y_train, y_test =train_test_split(x,y,test_size=0.33, random_state=50)\nprint(X_train)","execution_count":33,"outputs":[{"output_type":"stream","text":"526 [i carry your heart with me(i carry it in] Cop...\n63 Full fathom five thy father lies;\\r\\nOf his bo...\n158 Love is a sickness full of woes,\\r\\nAll remedi...\n248 No spring nor summer beauty hath such grace\\r\\...\n175 Come away, come away, death,\\r\\n And in sad...\n ... 
\n70 How like a winter hath my absence been\\r\\nFrom...\n132 Stella, think not that I by verse seek fame,\\r...\n289 If thou survive my well-contented day,\\r\\nWhen...\n109 Ye tradefull Merchants that with weary toyle,\\...\n480 [Version 1: 1921]\\r\\nThe quick sparks on the g...\nName: content, Length: 383, dtype: object\n","name":"stdout"}]},{"metadata":{},"cell_type":"raw","source":"Trying to predict the author of the poem from the content. Used a TF-IDF vectorizer and Decision Tree Classifier with entropy."},{"metadata":{"trusted":true},"cell_type":"code","source":"from sklearn.feature_extraction.text import TfidfVectorizer\nvectorizer = TfidfVectorizer()\nvectrain = vectorizer.fit_transform(X_train)\nvectest = vectorizer.transform(X_test)","execution_count":34,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"vectest.shape","execution_count":35,"outputs":[{"output_type":"execute_result","execution_count":35,"data":{"text/plain":"(190, 9936)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"y_train.shape","execution_count":36,"outputs":[{"output_type":"execute_result","execution_count":36,"data":{"text/plain":"(383,)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"dtclassifier=DecisionTreeClassifier(criterion=\"entropy\", max_depth=None)\ndtclassifier.fit(vectrain,y_train)\npreddt = dtclassifier.predict(vectest)","execution_count":37,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"accuracy= accuracy_score(y_test,preddt)\nprint(accuracy)","execution_count":38,"outputs":[{"output_type":"stream","text":"0.35789473684210527\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"Trying to predict the age of the poem from the content. 
Used a TF-IDF vectorizer and Decision Tree Classifier with entropy."},{"metadata":{"trusted":true},"cell_type":"code","source":"y=df_poetry['age']\nx=df_poetry[\"content\"]\nX_train, X_test, y_train, y_test =train_test_split(x,y,test_size=0.33, random_state=50)","execution_count":39,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"vectorizer = TfidfVectorizer()\nvectrain = vectorizer.fit_transform(X_train)\nvectest = vectorizer.transform(X_test)","execution_count":40,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"dtclassifier=DecisionTreeClassifier(criterion=\"entropy\", max_depth=None)\ndtclassifier.fit(vectrain,y_train)\npreddt = dtclassifier.predict(vectest)","execution_count":41,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"accuracy= accuracy_score(y_test,preddt)\nprint(accuracy)","execution_count":42,"outputs":[{"output_type":"stream","text":"0.868421052631579\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"Trying to predict authors from rest of the features this time, I don't expect too much of an improvement. 
Used Tfidf vectorizer and decision tree with gini index as split criterion."},{"metadata":{"trusted":true},"cell_type":"code","source":"y=df_poetry['author']\nX=df_poetry.loc[:, df_poetry.columns!=\"author\"]\nX_train, X_test, y_train, y_test =train_test_split(x,y,test_size=0.33, random_state=50)","execution_count":43,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"vectorizer = TfidfVectorizer()\nvectrain = vectorizer.fit_transform(X_train)\nvectest = vectorizer.transform(X_test)","execution_count":44,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"dtclassifier=DecisionTreeClassifier(criterion=\"gini\", max_depth=None)\ndtclassifier.fit(vectrain,y_train)\npreddt = dtclassifier.predict(vectest)","execution_count":45,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"accuracy= accuracy_score(preddt,y_test)\nprint(accuracy)","execution_count":46,"outputs":[{"output_type":"stream","text":"0.4\n","name":"stdout"}]}],"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.6.4","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python"}},"nbformat":4,"nbformat_minor":4}
--------------------------------------------------------------------------------