├── LICENSE.md ├── Models └── .gitkeep ├── README.md ├── Results ├── Waseem_Hovy_auth.txt ├── Waseem_Hovy_hidden-auth.txt ├── Waseem_Hovy_hidden-baseline.txt ├── Waseem_Hovy_lr-auth.txt ├── Waseem_Hovy_lr-baseline.txt ├── Waseem_Hovy_sum-auth.txt └── Waseem_Hovy_sum-baseline.txt ├── TwitterData ├── README.md └── twitter_data_waseem_hovy.csv ├── __init__.py ├── cross_validate.py ├── featureExtractor ├── __init__.py ├── dnn_features.py ├── feature_extractor.py ├── graph_features.py └── ngram_features.py ├── grid_search.py ├── main_classifier.py ├── requirements.txt ├── resources ├── __init__.py ├── authors.txt ├── node2vec │ ├── .gitignore │ ├── LICENSE.md │ ├── README.md │ ├── requirements.txt │ └── src │ │ ├── main.py │ │ └── node2vec.py ├── stopwords.txt ├── structural.py └── textual.py ├── test.py ├── twitter_access.py └── twitter_model.py /LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Pushkar Mishra 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Models/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pushkarmishra/AuthorProfilingAbuseDetection/6322467b26f53aca7d231c0ab92182879b9375ff/Models/.gitkeep -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Author Profiling for Abuse Detection 2 | 3 | Code for paper "Author Profiling for Abuse Detection", in Proceedings of the 27th International Conference on Computational Linguistics (COLING) 2018 4 | 5 | If you use this code, please cite our paper: 6 | ``` 7 | @inproceedings{mishra-etal-2018-author, 8 | title = "Author Profiling for Abuse Detection", 9 | author = "Mishra, Pushkar and 10 | Del Tredici, Marco and 11 | Yannakoudakis, Helen and 12 | Shutova, Ekaterina", 13 | booktitle = "Proceedings of the 27th International Conference on Computational Linguistics", 14 | month = aug, 15 | year = "2018", 16 | address = "Santa Fe, New Mexico, USA", 17 | publisher = "Association for Computational Linguistics", 18 | url = "https://www.aclweb.org/anthology/C18-1093", 19 | pages = "1088--1098", 20 | } 21 | ``` 22 | 23 | Python3.5+ required to run the code. 
Dependencies can be installed with `pip install -r requirements.txt` followed by `python -m nltk.downloader punkt` 24 | 25 | The dataset for the code is provided in the _TwitterData/twitter_data_waseem_hovy.csv_ file as a list of _\[tweet ID, annotation\]_ pairs. 26 | To run the code, please use a Twitter API (_twitter_access.py_ employs Tweepy) to retrieve the tweets for the given tweet IDs. Replace the dataset file with a 27 | file of the same name that has a list of _\[tweet ID, tweet, annotation\]_ triples. 28 | Additionally, _twitter_access.py_ contains functions to retrieve follower-following relationships amongst the authors of the tweets (specified in _resources/authors.txt_). Once the relationships have been retrieved, please use _Node2vec_ (see _resources/node2vec_) to produce embeddings for each of the authors and store them in a file named _authors.emb_ in the _resources_ directory. 29 | 30 | To run the best method (LR + AUTH): 31 | `python twitter_model.py -c 16202 -m lna` 32 | 33 | 34 |
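The run commands above and below assume that this hydration step has already been carried out. As a rough, hedged sketch of what it involves (using Tweepy 3.x with placeholder credentials; this is only an illustration, not the repository's actual _twitter_access.py_):

```python
import csv
import tweepy

# Placeholder Twitter API credentials; replace with your own keys.
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')
api = tweepy.API(auth, wait_on_rate_limit=True)

path = 'TwitterData/twitter_data_waseem_hovy.csv'
with open(path) as f:
    pairs = list(csv.reader(f))        # [tweet ID, annotation] pairs (assumes no header row)

triples = []
for tweet_id, annotation in pairs:
    try:
        status = api.get_status(tweet_id, tweet_mode='extended')
    except tweepy.TweepError:          # deleted, protected or suspended tweets are skipped
        continue
    triples.append([tweet_id, status.full_text, annotation])

with open(path, 'w', newline='') as f:  # same file name, now with [tweet ID, tweet, annotation] triples
    csv.writer(f).writerows(triples)
```

Tweets that can no longer be retrieved are skipped, so the class distribution may differ slightly from the one used in the paper.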
To run the other methods:
35 | * AUTH: `python twitter_model.py -c 16202 -m a`
36 | * LR: `python twitter_model.py -c 16202 -m ln`
37 | * WS: `python twitter_model.py -c 16202 -m ws`
38 | * HS: `python twitter_model.py -c 16202 -m hs`
39 | * WS + AUTH: `python twitter_model.py -c 16202 -m wsa`
40 | * HS + AUTH: `python twitter_model.py -c 16202 -m hsa`
41 |
42 | For the HS- and WS-based methods, adding the `-ft` flag to the command ensures that the pre-trained deep neural models from the _Models_ directory
43 | are not used and all training instead happens from scratch. This requires that the file of pre-trained GloVe embeddings is downloaded from the GloVe project page,
44 | unzipped and placed in the _resources_ directory prior to execution.
45 |
46 |
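The numbered overview below describes the complete training-testing flow. As a compact, hedged illustration of how steps 3-7 and the final 10-fold averaging fit together, here is a simplified scikit-learn sketch that uses plain n-gram counts only (the author embeddings and neural models of the actual pipeline are omitted):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold

def cross_validate(tweets, labels, n_splits=10):
    tweets, labels = np.asarray(tweets), np.asarray(labels)
    fold_metrics = []
    for train_idx, test_idx in StratifiedKFold(n_splits, shuffle=True, random_state=0).split(tweets, labels):
        # step 4: fit an n-gram count model on the training tweets only
        vectorizer = CountVectorizer(ngram_range=(1, 2)).fit(tweets[train_idx])
        # steps 5-6: featurise the tweets and train the classifier (author embeddings omitted here)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(vectorizer.transform(tweets[train_idx]), labels[train_idx])
        # step 7: predict on the held-out fold and compute precision, recall and F1
        preds = clf.predict(vectorizer.transform(tweets[test_idx]))
        fold_metrics.append(precision_recall_fscore_support(labels[test_idx], preds, average='weighted')[:3])
    return np.mean(fold_metrics, axis=0)   # precision, recall, F1 averaged over the folds
```

In the repository, the feature extractor (see _featureExtractor/_) plays the role of the vectoriser above and additionally appends each author's node2vec embedding, as described in step 5 of the overview.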
An overview of the complete training-testing flow is as follows:
47 | 1. For each tweet in the dataset, its author's identity is obtained using functions available in the _twitter_access.py_ file. For each author,
48 | information about which other authors from the dataset follow them on Twitter is also obtained in order to create a community graph where nodes
49 | are authors and edges denote follow relationships.
50 | 2. Node2vec is applied to the community graph to generate embeddings for the nodes, i.e., the authors. These author embeddings are saved to the
51 | _authors.emb_ file in the _resources_ directory.
52 | 3. The dataset is randomly split into a train set and a test set.
53 | 4. Tweets in the train set are used to produce an n-gram count-based model or a deep neural model, depending on the method being used.
54 | 5. A feature extractor is instantiated that uses the models from step 4 along with the author embeddings from step 2 to convert tweets to feature vectors.
55 | 6. An LR/GBDT classifier is trained on the feature vectors extracted for the tweets in the train set. A part of the train set is held out as
56 | validation data to prevent over-fitting.
57 | 7. The trained classifier predicts classes for the tweets in the test set, and precision, recall and F1 are calculated.
58 |
59 | In the 10-fold CV, steps 3-7 are run 10 times (each time with a different set of tweets as the test set) and the final precision, recall and
60 | F1 are calculated by averaging results across the 10 runs.
61 |
--------------------------------------------------------------------------------
/Results/Waseem_Hovy_auth.txt:
--------------------------------------------------------------------------------
1 | /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/pushkarmishra/Desktop/AuthorProfileAbuseDetection/twitter_model.py -c 30000
2 | Using Theano backend.
3 | 2018-03-10 01:05:38 - CVLog - INFO - 10-fold cross validation procedure has begun 4 | 2018-03-10 01:05:38 - CVLog - INFO - Validation round 1 of 10 starting 5 | 2018-03-10 01:05:38 - TrainingLog - INFO - Initiating training of main classifier 6 | 2018-03-10 01:05:56 - TrainingLog - INFO - Feature extractor ready 7 | 2018-03-10 01:05:56 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 8 | 2018-03-10 01:05:56 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 9 | 2018-03-10 01:05:56 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 10 | 2018-03-10 01:05:56 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 11 | 2018-03-10 01:05:56 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 12 | 2018-03-10 01:05:56 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 13 | 2018-03-10 01:05:56 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 14 | 2018-03-10 01:05:56 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 15 | 2018-03-10 01:05:56 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 16 | 2018-03-10 01:05:56 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 17 | 2018-03-10 01:05:56 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 18 | 2018-03-10 01:05:56 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 19 | 2018-03-10 01:05:56 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 20 | 2018-03-10 01:05:56 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 21 | 2018-03-10 01:06:03 - TrainingLog - INFO - Main classifier training finished 22 | 2018-03-10 01:06:04 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8217821782178217 23 | 2018-03-10 01:06:04 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8159203980099502 24 | 2018-03-10 01:06:04 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8039867109634552 25 | 2018-03-10 01:06:04 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7930174563591023 26 | 2018-03-10 01:06:04 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.7964071856287425 27 | 2018-03-10 01:06:04 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.7853577371048253 28 | 2018-03-10 01:06:04 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.776034236804565 29 | 2018-03-10 01:06:04 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7777777777777778 30 | 2018-03-10 01:06:04 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.7746947835738068 31 | 2018-03-10 01:06:04 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7612387612387612 32 | 2018-03-10 01:06:04 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.7638510445049954 33 | 2018-03-10 01:06:04 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.765195670274771 34 | 2018-03-10 01:06:04 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7732513451191392 35 | 2018-03-10 01:06:04 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7723054960742327 36 | 2018-03-10 01:06:04 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7714856762158561 37 | 2018-03-10 01:06:04 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.7676452217364147 38 | 2018-03-10 01:06:04 - TestLog - INFO - Total 1621 samples classified with accuracy 0.7686613201727329 39 | 2018-03-10 01:06:04 - TestLog - INFO - AUROC is 0.8904875087081418 40 | 2018-03-10 01:06:04 - TestLog - INFO - Classification report: 41 | precision recall f1-score support 42 | 43 | 0 1.00000 0.00515 0.01026 194 44 | 1 0.69275 0.75873 0.72424 315 45 | 2 0.78902 0.90468 0.84290 1112 46 | 47 | avg / total 0.79556 0.76866 0.72019 1621 48 | 49 | 2018-03-10 01:06:04 - TestLog - INFO - Confusion matrix: 50 | [[ 1 0 193] 51 | [ 0 239 76] 52 | [ 0 106 1006]] 53 | 2018-03-10 01:06:04 - CVLog - INFO - Validation round 2 of 10 starting 54 | 2018-03-10 01:06:04 - TrainingLog - INFO - Initiating training of main classifier 55 | 2018-03-10 01:06:20 - TrainingLog - INFO - Feature extractor ready 56 | 2018-03-10 01:06:20 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 57 | 2018-03-10 01:06:20 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 58 | 2018-03-10 01:06:20 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 59 | 2018-03-10 01:06:20 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 60 | 2018-03-10 01:06:20 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 61 | 2018-03-10 01:06:20 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 62 | 2018-03-10 01:06:20 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 63 | 2018-03-10 01:06:20 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 64 | 2018-03-10 01:06:20 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 65 | 2018-03-10 01:06:20 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 66 | 2018-03-10 01:06:20 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 67 | 2018-03-10 01:06:20 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 68 | 2018-03-10 01:06:20 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 69 | 2018-03-10 01:06:20 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 70 | 2018-03-10 01:06:28 - TrainingLog - INFO - Main classifier training finished 71 | 2018-03-10 01:06:28 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.6732673267326733 72 | 2018-03-10 01:06:28 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.7213930348258707 73 | 2018-03-10 01:06:28 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.7242524916943521 74 | 2018-03-10 01:06:28 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7082294264339152 75 | 2018-03-10 01:06:28 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.7125748502994012 76 | 2018-03-10 01:06:28 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.7221297836938436 77 | 2018-03-10 01:06:28 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7232524964336662 78 | 2018-03-10 01:06:28 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7365792759051186 79 | 2018-03-10 01:06:28 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.7380688124306326 80 | 2018-03-10 01:06:28 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7482517482517482 81 | 2018-03-10 01:06:28 - TestLog - INFO - 1101 samples classified. 
Accuracy up till now is 0.7493188010899182 82 | 2018-03-10 01:06:28 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.7577019150707743 83 | 2018-03-10 01:06:28 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7617217524980784 84 | 2018-03-10 01:06:28 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7673090649536045 85 | 2018-03-10 01:06:28 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7694870086608927 86 | 2018-03-10 01:06:28 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.7657713928794504 87 | 2018-03-10 01:06:28 - TestLog - INFO - Total 1621 samples classified with accuracy 0.7655768044417026 88 | 2018-03-10 01:06:28 - TestLog - INFO - AUROC is 0.882812427956678 89 | 2018-03-10 01:06:28 - TestLog - INFO - Classification report: 90 | precision recall f1-score support 91 | 92 | 0 0.33333 0.00515 0.01015 194 93 | 1 0.68067 0.77143 0.72321 315 94 | 2 0.79064 0.89658 0.84029 1112 95 | 96 | avg / total 0.71454 0.76558 0.71819 1621 97 | 98 | 2018-03-10 01:06:28 - TestLog - INFO - Confusion matrix: 99 | [[ 1 1 192] 100 | [ 0 243 72] 101 | [ 2 113 997]] 102 | 2018-03-10 01:06:28 - CVLog - INFO - Validation round 3 of 10 starting 103 | 2018-03-10 01:06:28 - TrainingLog - INFO - Initiating training of main classifier 104 | 2018-03-10 01:06:44 - TrainingLog - INFO - Feature extractor ready 105 | 2018-03-10 01:06:44 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 106 | 2018-03-10 01:06:44 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 107 | 2018-03-10 01:06:44 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 108 | 2018-03-10 01:06:44 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 109 | 2018-03-10 01:06:44 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 110 | 2018-03-10 01:06:44 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 111 | 2018-03-10 01:06:44 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 112 | 2018-03-10 01:06:44 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 113 | 2018-03-10 01:06:44 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 114 | 2018-03-10 01:06:44 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 115 | 2018-03-10 01:06:44 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 116 | 2018-03-10 01:06:44 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 117 | 2018-03-10 01:06:44 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 118 | 2018-03-10 01:06:44 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 119 | 2018-03-10 01:06:52 - TrainingLog - INFO - Main classifier training finished 120 | 2018-03-10 01:06:52 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.7524752475247525 121 | 2018-03-10 01:06:52 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.7661691542288557 122 | 2018-03-10 01:06:52 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.7740863787375415 123 | 2018-03-10 01:06:52 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7630922693266833 124 | 2018-03-10 01:06:52 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.7624750499001997 125 | 2018-03-10 01:06:52 - TestLog - INFO - 601 samples classified. 
Accuracy up till now is 0.7487520798668885 126 | 2018-03-10 01:06:52 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7560627674750356 127 | 2018-03-10 01:06:52 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7590511860174781 128 | 2018-03-10 01:06:52 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.7635960044395117 129 | 2018-03-10 01:06:52 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7572427572427572 130 | 2018-03-10 01:06:52 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.7620345140781108 131 | 2018-03-10 01:06:52 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.761865112406328 132 | 2018-03-10 01:06:52 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7632590315142198 133 | 2018-03-10 01:06:52 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7673090649536045 134 | 2018-03-10 01:06:53 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7634910059960026 135 | 2018-03-10 01:06:53 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.7645221736414741 136 | 2018-03-10 01:06:53 - TestLog - INFO - Total 1621 samples classified with accuracy 0.7643429981492905 137 | 2018-03-10 01:06:53 - TestLog - INFO - AUROC is 0.8903265999142658 138 | 2018-03-10 01:06:53 - TestLog - INFO - Classification report: 139 | precision recall f1-score support 140 | 141 | 0 1.00000 0.00515 0.01026 194 142 | 1 0.67898 0.75873 0.71664 315 143 | 2 0.78785 0.89838 0.83950 1112 144 | 145 | avg / total 0.79209 0.76434 0.71638 1621 146 | 147 | 2018-03-10 01:06:53 - TestLog - INFO - Confusion matrix: 148 | [[ 1 0 193] 149 | [ 0 239 76] 150 | [ 0 113 999]] 151 | 2018-03-10 01:06:53 - CVLog - INFO - Validation round 4 of 10 starting 152 | 2018-03-10 01:06:53 - TrainingLog - INFO - Initiating training of main classifier 153 | 2018-03-10 01:07:09 - TrainingLog - INFO - Feature extractor ready 154 | 2018-03-10 01:07:09 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 155 | 2018-03-10 01:07:09 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 156 | 2018-03-10 01:07:09 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 157 | 2018-03-10 01:07:09 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 158 | 2018-03-10 01:07:09 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 159 | 2018-03-10 01:07:09 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 160 | 2018-03-10 01:07:09 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 161 | 2018-03-10 01:07:09 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 162 | 2018-03-10 01:07:09 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 163 | 2018-03-10 01:07:09 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 164 | 2018-03-10 01:07:09 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 165 | 2018-03-10 01:07:09 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 166 | 2018-03-10 01:07:09 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 167 | 2018-03-10 01:07:09 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 168 | 2018-03-10 01:07:17 - TrainingLog - INFO - Main classifier training finished 169 | 2018-03-10 01:07:17 - TestLog - INFO - 101 samples 
classified. Accuracy up till now is 0.7029702970297029 170 | 2018-03-10 01:07:17 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.736318407960199 171 | 2018-03-10 01:07:17 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.7475083056478405 172 | 2018-03-10 01:07:17 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7655860349127181 173 | 2018-03-10 01:07:17 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.7624750499001997 174 | 2018-03-10 01:07:17 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.762063227953411 175 | 2018-03-10 01:07:17 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7631954350927247 176 | 2018-03-10 01:07:17 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7615480649188514 177 | 2018-03-10 01:07:17 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.7547169811320755 178 | 2018-03-10 01:07:17 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7572427572427572 179 | 2018-03-10 01:07:17 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.7529518619436876 180 | 2018-03-10 01:07:17 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.7543713572023314 181 | 2018-03-10 01:07:18 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7524980784012298 182 | 2018-03-10 01:07:18 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7508922198429693 183 | 2018-03-10 01:07:18 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.754163890739507 184 | 2018-03-10 01:07:18 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.7526545908806995 185 | 2018-03-10 01:07:18 - TestLog - INFO - Total 1621 samples classified with accuracy 0.7526218383713756 186 | 2018-03-10 01:07:18 - TestLog - INFO - AUROC is 0.8857194782076196 187 | 2018-03-10 01:07:18 - TestLog - INFO - Classification report: 188 | precision recall f1-score support 189 | 190 | 0 1.00000 0.00515 0.01026 194 191 | 1 0.64171 0.76190 0.69666 315 192 | 2 0.78571 0.88040 0.83036 1112 193 | 194 | avg / total 0.78338 0.75262 0.70623 1621 195 | 196 | 2018-03-10 01:07:18 - TestLog - INFO - Confusion matrix: 197 | [[ 1 1 192] 198 | [ 0 240 75] 199 | [ 0 133 979]] 200 | 2018-03-10 01:07:18 - CVLog - INFO - Validation round 5 of 10 starting 201 | 2018-03-10 01:07:18 - TrainingLog - INFO - Initiating training of main classifier 202 | 2018-03-10 01:07:37 - TrainingLog - INFO - Feature extractor ready 203 | 2018-03-10 01:07:37 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 204 | 2018-03-10 01:07:37 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 205 | 2018-03-10 01:07:37 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 206 | 2018-03-10 01:07:37 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 207 | 2018-03-10 01:07:37 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 208 | 2018-03-10 01:07:37 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 209 | 2018-03-10 01:07:37 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 210 | 2018-03-10 01:07:37 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 211 | 2018-03-10 01:07:37 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 212 | 2018-03-10 01:07:37 - TrainingLog - INFO - 10001 of 14581 feature vectors 
prepared for training 213 | 2018-03-10 01:07:37 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 214 | 2018-03-10 01:07:37 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 215 | 2018-03-10 01:07:37 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 216 | 2018-03-10 01:07:37 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 217 | 2018-03-10 01:07:46 - TrainingLog - INFO - Main classifier training finished 218 | 2018-03-10 01:07:46 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.7623762376237624 219 | 2018-03-10 01:07:46 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.7562189054726368 220 | 2018-03-10 01:07:46 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.7873754152823921 221 | 2018-03-10 01:07:46 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7930174563591023 222 | 2018-03-10 01:07:46 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.7724550898203593 223 | 2018-03-10 01:07:46 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.7603993344425957 224 | 2018-03-10 01:07:46 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7703281027104137 225 | 2018-03-10 01:07:46 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7765293383270911 226 | 2018-03-10 01:07:46 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.7758046614872364 227 | 2018-03-10 01:07:46 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7682317682317682 228 | 2018-03-10 01:07:46 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.7665758401453224 229 | 2018-03-10 01:07:46 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.768526228143214 230 | 2018-03-10 01:07:46 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7701767870868562 231 | 2018-03-10 01:07:46 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7651677373304783 232 | 2018-03-10 01:07:46 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7641572285143238 233 | 2018-03-10 01:07:46 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.7582760774515928 234 | 2018-03-10 01:07:46 - TestLog - INFO - Total 1621 samples classified with accuracy 0.757557063541024 235 | 2018-03-10 01:07:46 - TestLog - INFO - AUROC is 0.8808544966232935 236 | 2018-03-10 01:07:46 - TestLog - INFO - Classification report: 237 | precision recall f1-score support 238 | 239 | 0 0.00000 0.00000 0.00000 194 240 | 1 0.65833 0.75238 0.70222 315 241 | 2 0.78651 0.89119 0.83558 1112 242 | 243 | avg / total 0.66747 0.75756 0.70966 1621 244 | 245 | 2018-03-10 01:07:46 - TestLog - INFO - Confusion matrix: 246 | [[ 0 3 191] 247 | [ 0 237 78] 248 | [ 1 120 991]] 249 | 2018-03-10 01:07:46 - CVLog - INFO - Validation round 6 of 10 starting 250 | 2018-03-10 01:07:46 - TrainingLog - INFO - Initiating training of main classifier 251 | 2018-03-10 01:08:03 - TrainingLog - INFO - Feature extractor ready 252 | 2018-03-10 01:08:03 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 253 | 2018-03-10 01:08:03 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 254 | 2018-03-10 01:08:03 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 255 | 2018-03-10 01:08:03 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 256 | 2018-03-10 01:08:03 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 257 | 2018-03-10 01:08:03 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 258 | 2018-03-10 01:08:03 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 259 | 2018-03-10 01:08:03 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 260 | 2018-03-10 01:08:03 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 261 | 2018-03-10 01:08:03 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 262 | 2018-03-10 01:08:03 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 263 | 2018-03-10 01:08:03 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 264 | 2018-03-10 01:08:03 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 265 | 2018-03-10 01:08:03 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 266 | 2018-03-10 01:08:10 - TrainingLog - INFO - Main classifier training finished 267 | 2018-03-10 01:08:11 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8118811881188119 268 | 2018-03-10 01:08:11 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.7562189054726368 269 | 2018-03-10 01:08:11 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.7740863787375415 270 | 2018-03-10 01:08:11 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7880299251870324 271 | 2018-03-10 01:08:11 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.7924151696606786 272 | 2018-03-10 01:08:11 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.7853577371048253 273 | 2018-03-10 01:08:11 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7902995720399429 274 | 2018-03-10 01:08:11 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7852684144818977 275 | 2018-03-10 01:08:11 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.7857935627081021 276 | 2018-03-10 01:08:11 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7792207792207793 277 | 2018-03-10 01:08:11 - TestLog - INFO - 1101 samples classified. 
Accuracy up till now is 0.7756584922797457 278 | 2018-03-10 01:08:11 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.771856786011657 279 | 2018-03-10 01:08:11 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7778631821675634 280 | 2018-03-10 01:08:11 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7744468236973591 281 | 2018-03-10 01:08:11 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7634910059960026 282 | 2018-03-10 01:08:11 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.7645221736414741 283 | 2018-03-10 01:08:11 - TestLog - INFO - Total 1620 samples classified with accuracy 0.7635802469135803 284 | 2018-03-10 01:08:11 - TestLog - INFO - AUROC is 0.8866142691231639 285 | 2018-03-10 01:08:11 - TestLog - INFO - Classification report: 286 | precision recall f1-score support 287 | 288 | 0 1.00000 0.01031 0.02041 194 289 | 1 0.67131 0.76508 0.71513 315 290 | 2 0.78952 0.89469 0.83882 1111 291 | 292 | avg / total 0.79174 0.76358 0.71676 1620 293 | 294 | 2018-03-10 01:08:11 - TestLog - INFO - Confusion matrix: 295 | [[ 2 1 191] 296 | [ 0 241 74] 297 | [ 0 117 994]] 298 | 2018-03-10 01:08:11 - CVLog - INFO - Validation round 7 of 10 starting 299 | 2018-03-10 01:08:11 - TrainingLog - INFO - Initiating training of main classifier 300 | 2018-03-10 01:08:27 - TrainingLog - INFO - Feature extractor ready 301 | 2018-03-10 01:08:27 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 302 | 2018-03-10 01:08:27 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 303 | 2018-03-10 01:08:27 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 304 | 2018-03-10 01:08:27 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 305 | 2018-03-10 01:08:27 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 306 | 2018-03-10 01:08:27 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 307 | 2018-03-10 01:08:27 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 308 | 2018-03-10 01:08:27 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 309 | 2018-03-10 01:08:27 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 310 | 2018-03-10 01:08:27 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 311 | 2018-03-10 01:08:27 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 312 | 2018-03-10 01:08:27 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 313 | 2018-03-10 01:08:27 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 314 | 2018-03-10 01:08:27 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 315 | 2018-03-10 01:08:35 - TrainingLog - INFO - Main classifier training finished 316 | 2018-03-10 01:08:35 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.7821782178217822 317 | 2018-03-10 01:08:35 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8159203980099502 318 | 2018-03-10 01:08:35 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8106312292358804 319 | 2018-03-10 01:08:35 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8154613466334164 320 | 2018-03-10 01:08:35 - TestLog - INFO - 501 samples classified. 
Accuracy up till now is 0.8003992015968064 321 | 2018-03-10 01:08:35 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.7953410981697171 322 | 2018-03-10 01:08:35 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7902995720399429 323 | 2018-03-10 01:08:35 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7852684144818977 324 | 2018-03-10 01:08:35 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.7869034406215316 325 | 2018-03-10 01:08:35 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7812187812187812 326 | 2018-03-10 01:08:36 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.773841961852861 327 | 2018-03-10 01:08:36 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.768526228143214 328 | 2018-03-10 01:08:36 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7701767870868562 329 | 2018-03-10 01:08:36 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.76802284082798 330 | 2018-03-10 01:08:36 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7641572285143238 331 | 2018-03-10 01:08:36 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.7632729544034978 332 | 2018-03-10 01:08:36 - TestLog - INFO - Total 1620 samples classified with accuracy 0.7623456790123457 333 | 2018-03-10 01:08:36 - TestLog - INFO - AUROC is 0.8935140467577484 334 | 2018-03-10 01:08:36 - TestLog - INFO - Classification report: 335 | precision recall f1-score support 336 | 337 | 0 0.00000 0.00000 0.00000 194 338 | 1 0.67318 0.76508 0.71620 315 339 | 2 0.78764 0.89469 0.83776 1111 340 | 341 | avg / total 0.67106 0.76235 0.71380 1620 342 | 343 | 2018-03-10 01:08:36 - TestLog - INFO - Confusion matrix: 344 | [[ 0 0 194] 345 | [ 0 241 74] 346 | [ 0 117 994]] 347 | 2018-03-10 01:08:36 - CVLog - INFO - Validation round 8 of 10 starting 348 | 2018-03-10 01:08:36 - TrainingLog - INFO - Initiating training of main classifier 349 | 2018-03-10 01:08:52 - TrainingLog - INFO - Feature extractor ready 350 | 2018-03-10 01:08:52 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 351 | 2018-03-10 01:08:52 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 352 | 2018-03-10 01:08:52 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 353 | 2018-03-10 01:08:52 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 354 | 2018-03-10 01:08:52 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 355 | 2018-03-10 01:08:52 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 356 | 2018-03-10 01:08:52 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 357 | 2018-03-10 01:08:52 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 358 | 2018-03-10 01:08:52 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 359 | 2018-03-10 01:08:52 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 360 | 2018-03-10 01:08:52 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 361 | 2018-03-10 01:08:52 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 362 | 2018-03-10 01:08:52 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 363 | 2018-03-10 01:08:52 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 364 | 2018-03-10 01:09:00 - 
TrainingLog - INFO - Main classifier training finished 365 | 2018-03-10 01:09:00 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.7524752475247525 366 | 2018-03-10 01:09:00 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.7562189054726368 367 | 2018-03-10 01:09:00 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.7674418604651163 368 | 2018-03-10 01:09:00 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7630922693266833 369 | 2018-03-10 01:09:00 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.7744510978043913 370 | 2018-03-10 01:09:00 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.7703826955074875 371 | 2018-03-10 01:09:00 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7631954350927247 372 | 2018-03-10 01:09:00 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7590511860174781 373 | 2018-03-10 01:09:00 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.7591564927857936 374 | 2018-03-10 01:09:00 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7552447552447552 375 | 2018-03-10 01:09:00 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.7484105358764759 376 | 2018-03-10 01:09:00 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.7485428809325562 377 | 2018-03-10 01:09:00 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7486548808608763 378 | 2018-03-10 01:09:00 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7516059957173448 379 | 2018-03-10 01:09:00 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7534976682211859 380 | 2018-03-10 01:09:01 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.7507807620237351 381 | 2018-03-10 01:09:01 - TestLog - INFO - Total 1620 samples classified with accuracy 0.7506172839506173 382 | 2018-03-10 01:09:01 - TestLog - INFO - AUROC is 0.8805339837381058 383 | 2018-03-10 01:09:01 - TestLog - INFO - Classification report: 384 | precision recall f1-score support 385 | 386 | 0 0.00000 0.00000 0.00000 194 387 | 1 0.64607 0.73016 0.68554 315 388 | 2 0.78006 0.88749 0.83032 1111 389 | 390 | avg / total 0.66059 0.75062 0.70273 1620 391 | 392 | 2018-03-10 01:09:01 - TestLog - INFO - Confusion matrix: 393 | [[ 0 1 193] 394 | [ 0 230 85] 395 | [ 0 125 986]] 396 | 2018-03-10 01:09:01 - CVLog - INFO - Validation round 9 of 10 starting 397 | 2018-03-10 01:09:01 - TrainingLog - INFO - Initiating training of main classifier 398 | 2018-03-10 01:09:17 - TrainingLog - INFO - Feature extractor ready 399 | 2018-03-10 01:09:17 - TrainingLog - INFO - 1001 of 14583 feature vectors prepared for training 400 | 2018-03-10 01:09:17 - TrainingLog - INFO - 2001 of 14583 feature vectors prepared for training 401 | 2018-03-10 01:09:17 - TrainingLog - INFO - 3001 of 14583 feature vectors prepared for training 402 | 2018-03-10 01:09:17 - TrainingLog - INFO - 4001 of 14583 feature vectors prepared for training 403 | 2018-03-10 01:09:17 - TrainingLog - INFO - 5001 of 14583 feature vectors prepared for training 404 | 2018-03-10 01:09:17 - TrainingLog - INFO - 6001 of 14583 feature vectors prepared for training 405 | 2018-03-10 01:09:17 - TrainingLog - INFO - 7001 of 14583 feature vectors prepared for training 406 | 2018-03-10 01:09:17 - TrainingLog - INFO - 8001 of 14583 feature vectors prepared for training 407 | 2018-03-10 01:09:17 - TrainingLog - INFO - 9001 of 14583 feature vectors prepared for training 408 | 2018-03-10 01:09:17 - TrainingLog - INFO - 10001 of 14583 feature vectors prepared for training 409 | 2018-03-10 01:09:17 - TrainingLog - INFO - 11001 of 14583 feature vectors prepared for training 410 | 2018-03-10 01:09:17 - TrainingLog - INFO - 12001 of 14583 feature vectors prepared for training 411 | 2018-03-10 01:09:17 - TrainingLog - INFO - 13001 of 14583 feature vectors prepared for training 412 | 2018-03-10 01:09:17 - TrainingLog - INFO - 14001 of 14583 feature vectors prepared for training 413 | 2018-03-10 01:09:25 - TrainingLog - INFO - Main classifier training finished 414 | 2018-03-10 01:09:25 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.693069306930693 415 | 2018-03-10 01:09:25 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.7164179104477612 416 | 2018-03-10 01:09:25 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.7342192691029901 417 | 2018-03-10 01:09:25 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7306733167082294 418 | 2018-03-10 01:09:25 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.7445109780439122 419 | 2018-03-10 01:09:25 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.7554076539101497 420 | 2018-03-10 01:09:25 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7574893009985735 421 | 2018-03-10 01:09:25 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.7565543071161048 422 | 2018-03-10 01:09:26 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.755826859045505 423 | 2018-03-10 01:09:26 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7602397602397603 424 | 2018-03-10 01:09:26 - TestLog - INFO - 1101 samples classified. 
Accuracy up till now is 0.7593097184377838 425 | 2018-03-10 01:09:26 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.7552039966694422 426 | 2018-03-10 01:09:26 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.754803996925442 427 | 2018-03-10 01:09:26 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7508922198429693 428 | 2018-03-10 01:09:26 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7481678880746169 429 | 2018-03-10 01:09:26 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.7489069331667708 430 | 2018-03-10 01:09:26 - TestLog - INFO - Total 1619 samples classified with accuracy 0.7492279184681903 431 | 2018-03-10 01:09:26 - TestLog - INFO - AUROC is 0.8715033293006965 432 | 2018-03-10 01:09:26 - TestLog - INFO - Classification report: 433 | precision recall f1-score support 434 | 435 | 0 0.00000 0.00000 0.00000 194 436 | 1 0.64655 0.71656 0.67976 314 437 | 2 0.77734 0.88929 0.82955 1111 438 | 439 | avg / total 0.65883 0.74923 0.70110 1619 440 | 441 | 2018-03-10 01:09:26 - TestLog - INFO - Confusion matrix: 442 | [[ 0 0 194] 443 | [ 0 225 89] 444 | [ 0 123 988]] 445 | 2018-03-10 01:09:26 - CVLog - INFO - Validation round 10 of 10 starting 446 | 2018-03-10 01:09:26 - TrainingLog - INFO - Initiating training of main classifier 447 | 2018-03-10 01:09:44 - TrainingLog - INFO - Feature extractor ready 448 | 2018-03-10 01:09:44 - TrainingLog - INFO - 1001 of 14584 feature vectors prepared for training 449 | 2018-03-10 01:09:44 - TrainingLog - INFO - 2001 of 14584 feature vectors prepared for training 450 | 2018-03-10 01:09:44 - TrainingLog - INFO - 3001 of 14584 feature vectors prepared for training 451 | 2018-03-10 01:09:44 - TrainingLog - INFO - 4001 of 14584 feature vectors prepared for training 452 | 2018-03-10 01:09:44 - TrainingLog - INFO - 5001 of 14584 feature vectors prepared for training 453 | 2018-03-10 01:09:44 - TrainingLog - INFO - 6001 of 14584 feature vectors prepared for training 454 | 2018-03-10 01:09:44 - TrainingLog - INFO - 7001 of 14584 feature vectors prepared for training 455 | 2018-03-10 01:09:44 - TrainingLog - INFO - 8001 of 14584 feature vectors prepared for training 456 | 2018-03-10 01:09:44 - TrainingLog - INFO - 9001 of 14584 feature vectors prepared for training 457 | 2018-03-10 01:09:44 - TrainingLog - INFO - 10001 of 14584 feature vectors prepared for training 458 | 2018-03-10 01:09:44 - TrainingLog - INFO - 11001 of 14584 feature vectors prepared for training 459 | 2018-03-10 01:09:44 - TrainingLog - INFO - 12001 of 14584 feature vectors prepared for training 460 | 2018-03-10 01:09:44 - TrainingLog - INFO - 13001 of 14584 feature vectors prepared for training 461 | 2018-03-10 01:09:44 - TrainingLog - INFO - 14001 of 14584 feature vectors prepared for training 462 | 2018-03-10 01:09:54 - TrainingLog - INFO - Main classifier training finished 463 | 2018-03-10 01:09:54 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.801980198019802 464 | 2018-03-10 01:09:54 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.7910447761194029 465 | 2018-03-10 01:09:54 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.7973421926910299 466 | 2018-03-10 01:09:54 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.7880299251870324 467 | 2018-03-10 01:09:54 - TestLog - INFO - 501 samples classified. 
Accuracy up till now is 0.7904191616766467 468 | 2018-03-10 01:09:54 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.7820299500831946 469 | 2018-03-10 01:09:54 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.7803138373751783 470 | 2018-03-10 01:09:54 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.787765293383271 471 | 2018-03-10 01:09:54 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.779134295227525 472 | 2018-03-10 01:09:54 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.7752247752247752 473 | 2018-03-10 01:09:54 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.7683923705722071 474 | 2018-03-10 01:09:54 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.7735220649458784 475 | 2018-03-10 01:09:54 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.7732513451191392 476 | 2018-03-10 01:09:54 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.7715917201998572 477 | 2018-03-10 01:09:54 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.7721518987341772 478 | 2018-03-10 01:09:54 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.7695190505933791 479 | 2018-03-10 01:09:54 - TestLog - INFO - Total 1618 samples classified with accuracy 0.7707045735475896 480 | 2018-03-10 01:09:54 - TestLog - INFO - AUROC is 0.8862904796467865 481 | 2018-03-10 01:09:54 - TestLog - INFO - Classification report: 482 | precision recall f1-score support 483 | 484 | 0 0.00000 0.00000 0.00000 193 485 | 1 0.69565 0.76433 0.72838 314 486 | 2 0.79104 0.90639 0.84480 1111 487 | 488 | avg / total 0.67817 0.77070 0.72143 1618 489 | 490 | 2018-03-10 01:09:54 - TestLog - INFO - Confusion matrix: 491 | [[ 0 1 192] 492 | [ 0 240 74] 493 | [ 0 104 1007]] 494 | 2018-03-10 01:09:54 - CVLog - INFO - 495 | 496 | 2018-03-10 01:09:54 - CVLog - INFO - Summary (precision, recall, F1, accuracy): 497 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 1: [0.7955627361048222, 0.76866132017273292, 0.72019113865246276, 0.76866132017273292] 498 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 2: [0.71454208613102133, 0.76557680444170262, 0.71818673327773275, 0.76557680444170262] 499 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 3: [0.79208666140226136, 0.76434299814929052, 0.71637951897370034, 0.76434299814929052] 500 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 4: [0.78337651026985855, 0.75262183837137564, 0.70623305594035346, 0.75262183837137564] 501 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 5: [0.66747182319359988, 0.75755706354102403, 0.70966498934203581, 0.75755706354102403] 502 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 6: [0.79173710077018711, 0.76358024691358029, 0.7167615260174981, 0.76358024691358029] 503 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 7: [0.67106150224688976, 0.76234567901234573, 0.71379694172703356, 0.76234567901234573] 504 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 8: [0.66059355087083482, 0.75061728395061733, 0.70273283385092422, 0.75061728395061733] 505 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 9: [0.65882812422365578, 0.74922791846819026, 0.70109926442074211, 0.74922791846819026] 506 | 2018-03-10 01:09:54 - CVLog - INFO - Metrics for round 10: [0.67817399807005707, 0.77070457354758959, 0.72143478099159908, 0.77070457354758959] 507 | 2018-03-10 01:09:54 - CVLog - INFO - 508 | 509 | 2018-03-10 
01:09:54 - CVLog - INFO - Final average metrics: 0.7213434093283188, 0.7605235726568448, 0.7126480783194082, 0.7605235726568448 510 | 511 | Process finished with exit code 0 512 | -------------------------------------------------------------------------------- /Results/Waseem_Hovy_hidden-auth.txt: -------------------------------------------------------------------------------- 1 | /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/pushkarmishra/Desktop/AuthorProfileAbuseDetection/twitter_model.py -c 30000 2 | Using Theano backend. 3 | 2018-03-10 00:48:08 - CVLog - INFO - 10-fold cross validation procedure has begun 4 | 2018-03-10 00:48:08 - CVLog - INFO - Validation round 1 of 10 starting 5 | 2018-03-10 00:48:08 - TrainingLog - INFO - Initiating training of main classifier 6 | 2018-03-10 00:48:28 - TrainingLog - INFO - Feature extractor ready 7 | 2018-03-10 00:48:35 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 8 | 2018-03-10 00:48:41 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 9 | 2018-03-10 00:48:46 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 10 | 2018-03-10 00:48:52 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 11 | 2018-03-10 00:48:58 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 12 | 2018-03-10 00:49:02 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 13 | 2018-03-10 00:49:06 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 14 | 2018-03-10 00:49:10 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 15 | 2018-03-10 00:49:14 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 16 | 2018-03-10 00:49:18 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 17 | 2018-03-10 00:49:23 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 18 | 2018-03-10 00:49:27 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 19 | 2018-03-10 00:49:31 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 20 | 2018-03-10 00:49:35 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 21 | 2018-03-10 00:49:52 - TrainingLog - INFO - Main classifier training finished 22 | 2018-03-10 00:49:53 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8415841584158416 23 | 2018-03-10 00:49:54 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8557213930348259 24 | 2018-03-10 00:49:54 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8372093023255814 25 | 2018-03-10 00:49:55 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8403990024937655 26 | 2018-03-10 00:49:56 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8562874251497006 27 | 2018-03-10 00:49:57 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8635607321131448 28 | 2018-03-10 00:49:57 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8587731811697575 29 | 2018-03-10 00:49:58 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8614232209737828 30 | 2018-03-10 00:49:59 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8612652608213096 31 | 2018-03-10 00:49:59 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8591408591408591 32 | 2018-03-10 00:50:00 - TestLog - INFO - 1101 samples classified. 
Accuracy up till now is 0.8619436875567665 33 | 2018-03-10 00:50:01 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8659450457951707 34 | 2018-03-10 00:50:01 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8662567255956957 35 | 2018-03-10 00:50:02 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8679514632405425 36 | 2018-03-10 00:50:03 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8674217188540972 37 | 2018-03-10 00:50:04 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8688319800124922 38 | 2018-03-10 00:50:04 - TestLog - INFO - Total 1621 samples classified with accuracy 0.8704503392967304 39 | 2018-03-10 00:50:04 - TestLog - INFO - AUROC is 0.9603672569920351 40 | 2018-03-10 00:50:04 - TestLog - INFO - Classification report: 41 | precision recall f1-score support 42 | 43 | 0 0.73000 0.75258 0.74112 194 44 | 1 0.83553 0.80635 0.82068 315 45 | 2 0.90510 0.90917 0.90713 1112 46 | 47 | avg / total 0.87063 0.87045 0.87046 1621 48 | 49 | 2018-03-10 00:50:04 - TestLog - INFO - Confusion matrix: 50 | [[ 146 1 47] 51 | [ 2 254 59] 52 | [ 52 49 1011]] 53 | 2018-03-10 00:50:04 - CVLog - INFO - Validation round 2 of 10 starting 54 | 2018-03-10 00:50:04 - TrainingLog - INFO - Initiating training of main classifier 55 | 2018-03-10 00:50:25 - TrainingLog - INFO - Feature extractor ready 56 | 2018-03-10 00:50:35 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 57 | 2018-03-10 00:50:41 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 58 | 2018-03-10 00:50:47 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 59 | 2018-03-10 00:50:53 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 60 | 2018-03-10 00:51:00 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 61 | 2018-03-10 00:51:07 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 62 | 2018-03-10 00:51:14 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 63 | 2018-03-10 00:51:21 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 64 | 2018-03-10 00:51:28 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 65 | 2018-03-10 00:51:36 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 66 | 2018-03-10 00:51:43 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 67 | 2018-03-10 00:51:50 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 68 | 2018-03-10 00:51:59 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 69 | 2018-03-10 00:52:07 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 70 | 2018-03-10 00:52:24 - TrainingLog - INFO - Main classifier training finished 71 | 2018-03-10 00:52:25 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8910891089108911 72 | 2018-03-10 00:52:25 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8905472636815921 73 | 2018-03-10 00:52:26 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8803986710963455 74 | 2018-03-10 00:52:26 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8678304239401496 75 | 2018-03-10 00:52:26 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.872255489021956 76 | 2018-03-10 00:52:27 - TestLog - INFO - 601 samples classified. 
Accuracy up till now is 0.870216306156406 77 | 2018-03-10 00:52:27 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8744650499286734 78 | 2018-03-10 00:52:28 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8726591760299626 79 | 2018-03-10 00:52:28 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8745837957824639 80 | 2018-03-10 00:52:29 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8791208791208791 81 | 2018-03-10 00:52:29 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8737511353315168 82 | 2018-03-10 00:52:29 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8726061615320566 83 | 2018-03-10 00:52:30 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8693312836279785 84 | 2018-03-10 00:52:30 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8665239114917915 85 | 2018-03-10 00:52:31 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.866089273817455 86 | 2018-03-10 00:52:31 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8657089319175515 87 | 2018-03-10 00:52:31 - TestLog - INFO - Total 1621 samples classified with accuracy 0.864898210980876 88 | 2018-03-10 00:52:31 - TestLog - INFO - AUROC is 0.9606214246607155 89 | 2018-03-10 00:52:31 - TestLog - INFO - Classification report: 90 | precision recall f1-score support 91 | 92 | 0 0.72222 0.73711 0.72959 194 93 | 1 0.82524 0.80952 0.81731 315 94 | 2 0.90126 0.90288 0.90207 1112 95 | 96 | avg / total 0.86506 0.86490 0.86495 1621 97 | 98 | 2018-03-10 00:52:31 - TestLog - INFO - Confusion matrix: 99 | [[ 143 1 50] 100 | [ 0 255 60] 101 | [ 55 53 1004]] 102 | 2018-03-10 00:52:31 - CVLog - INFO - Validation round 3 of 10 starting 103 | 2018-03-10 00:52:31 - TrainingLog - INFO - Initiating training of main classifier 104 | 2018-03-10 00:52:48 - TrainingLog - INFO - Feature extractor ready 105 | 2018-03-10 00:52:56 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 106 | 2018-03-10 00:53:03 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 107 | 2018-03-10 00:53:11 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 108 | 2018-03-10 00:53:18 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 109 | 2018-03-10 00:53:25 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 110 | 2018-03-10 00:53:33 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 111 | 2018-03-10 00:53:41 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 112 | 2018-03-10 00:53:47 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 113 | 2018-03-10 00:53:53 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 114 | 2018-03-10 00:53:59 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 115 | 2018-03-10 00:54:05 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 116 | 2018-03-10 00:54:11 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 117 | 2018-03-10 00:54:18 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 118 | 2018-03-10 00:54:24 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 119 | 2018-03-10 00:54:43 - TrainingLog - INFO - Main classifier training finished 120 | 2018-03-10 00:54:43 - TestLog - INFO - 101 samples classified. 
Accuracy up till now is 0.8811881188118812 121 | 2018-03-10 00:54:44 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8955223880597015 122 | 2018-03-10 00:54:45 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.893687707641196 123 | 2018-03-10 00:54:45 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8902743142144638 124 | 2018-03-10 00:54:46 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8882235528942116 125 | 2018-03-10 00:54:47 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8801996672212978 126 | 2018-03-10 00:54:48 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8730385164051355 127 | 2018-03-10 00:54:48 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8639200998751561 128 | 2018-03-10 00:54:49 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8634850166481687 129 | 2018-03-10 00:54:50 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8621378621378621 130 | 2018-03-10 00:54:50 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8646684831970936 131 | 2018-03-10 00:54:51 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8684429641965029 132 | 2018-03-10 00:54:52 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8700999231360492 133 | 2018-03-10 00:54:52 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8715203426124197 134 | 2018-03-10 00:54:53 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8720852764823451 135 | 2018-03-10 00:54:54 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8707058088694566 136 | 2018-03-10 00:54:54 - TestLog - INFO - Total 1621 samples classified with accuracy 0.8698334361505243 137 | 2018-03-10 00:54:54 - TestLog - INFO - AUROC is 0.9605434641915648 138 | 2018-03-10 00:54:54 - TestLog - INFO - Classification report: 139 | precision recall f1-score support 140 | 141 | 0 0.73737 0.75258 0.74490 194 142 | 1 0.82166 0.81905 0.82035 315 143 | 2 0.90712 0.90468 0.90590 1112 144 | 145 | avg / total 0.87020 0.86983 0.87001 1621 146 | 147 | 2018-03-10 00:54:54 - TestLog - INFO - Confusion matrix: 148 | [[ 146 0 48] 149 | [ 2 258 55] 150 | [ 50 56 1006]] 151 | 2018-03-10 00:54:54 - CVLog - INFO - Validation round 4 of 10 starting 152 | 2018-03-10 00:54:54 - TrainingLog - INFO - Initiating training of main classifier 153 | 2018-03-10 00:55:16 - TrainingLog - INFO - Feature extractor ready 154 | 2018-03-10 00:55:24 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 155 | 2018-03-10 00:55:30 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 156 | 2018-03-10 00:55:36 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 157 | 2018-03-10 00:55:43 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 158 | 2018-03-10 00:55:49 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 159 | 2018-03-10 00:55:56 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 160 | 2018-03-10 00:56:03 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 161 | 2018-03-10 00:56:10 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 162 | 2018-03-10 00:56:17 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 163 | 2018-03-10 00:56:23 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for 
training 164 | 2018-03-10 00:56:29 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 165 | 2018-03-10 00:56:36 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 166 | 2018-03-10 00:56:43 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 167 | 2018-03-10 00:56:50 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 168 | 2018-03-10 00:57:06 - TrainingLog - INFO - Main classifier training finished 169 | 2018-03-10 00:57:07 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8514851485148515 170 | 2018-03-10 00:57:07 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8905472636815921 171 | 2018-03-10 00:57:08 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8770764119601329 172 | 2018-03-10 00:57:08 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8927680798004988 173 | 2018-03-10 00:57:08 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8982035928143712 174 | 2018-03-10 00:57:09 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8935108153078203 175 | 2018-03-10 00:57:09 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8873038516405135 176 | 2018-03-10 00:57:10 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8826466916354557 177 | 2018-03-10 00:57:10 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.881243063263041 178 | 2018-03-10 00:57:11 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8821178821178821 179 | 2018-03-10 00:57:11 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8801089918256131 180 | 2018-03-10 00:57:12 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8776019983347211 181 | 2018-03-10 00:57:12 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8724058416602614 182 | 2018-03-10 00:57:13 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8715203426124197 183 | 2018-03-10 00:57:13 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.871419053964024 184 | 2018-03-10 00:57:14 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.8744534665833854 185 | 2018-03-10 00:57:14 - TestLog - INFO - Total 1621 samples classified with accuracy 0.8753855644663788 186 | 2018-03-10 00:57:14 - TestLog - INFO - AUROC is 0.9598398472423768 187 | 2018-03-10 00:57:14 - TestLog - INFO - Classification report: 188 | precision recall f1-score support 189 | 190 | 0 0.79213 0.72680 0.75806 194 191 | 1 0.82315 0.81270 0.81789 315 192 | 2 0.90283 0.91906 0.91087 1112 193 | 194 | avg / total 0.87410 0.87539 0.87452 1621 195 | 196 | 2018-03-10 00:57:14 - TestLog - INFO - Confusion matrix: 197 | [[ 141 2 51] 198 | [ 0 256 59] 199 | [ 37 53 1022]] 200 | 2018-03-10 00:57:14 - CVLog - INFO - Validation round 5 of 10 starting 201 | 2018-03-10 00:57:14 - TrainingLog - INFO - Initiating training of main classifier 202 | 2018-03-10 00:57:30 - TrainingLog - INFO - Feature extractor ready 203 | 2018-03-10 00:57:40 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 204 | 2018-03-10 00:57:47 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 205 | 2018-03-10 00:57:53 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 206 | 2018-03-10 00:58:00 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 207 | 2018-03-10 00:58:06 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 208 | 2018-03-10 00:58:13 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 209 | 2018-03-10 00:58:22 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 210 | 2018-03-10 00:58:29 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 211 | 2018-03-10 00:58:35 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 212 | 2018-03-10 00:58:41 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 213 | 2018-03-10 00:58:47 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 214 | 2018-03-10 00:58:55 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 215 | 2018-03-10 00:59:03 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 216 | 2018-03-10 00:59:10 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 217 | 2018-03-10 00:59:31 - TrainingLog - INFO - Main classifier training finished 218 | 2018-03-10 00:59:32 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.900990099009901 219 | 2018-03-10 00:59:33 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8805970149253731 220 | 2018-03-10 00:59:33 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8770764119601329 221 | 2018-03-10 00:59:34 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8827930174563591 222 | 2018-03-10 00:59:34 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8762475049900199 223 | 2018-03-10 00:59:35 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.870216306156406 224 | 2018-03-10 00:59:35 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8744650499286734 225 | 2018-03-10 00:59:36 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8789013732833958 226 | 2018-03-10 00:59:36 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8790233074361821 227 | 2018-03-10 00:59:37 - TestLog - INFO - 1001 samples classified. 
Accuracy up till now is 0.8791208791208791 228 | 2018-03-10 00:59:38 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8801089918256131 229 | 2018-03-10 00:59:38 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8825978351373855 230 | 2018-03-10 00:59:39 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8831667947732513 231 | 2018-03-10 00:59:39 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8807994289793005 232 | 2018-03-10 00:59:40 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8800799467021986 233 | 2018-03-10 00:59:40 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.876951905059338 234 | 2018-03-10 00:59:40 - TestLog - INFO - Total 1621 samples classified with accuracy 0.8778531770512029 235 | 2018-03-10 00:59:40 - TestLog - INFO - AUROC is 0.9594299612390799 236 | 2018-03-10 00:59:40 - TestLog - INFO - Classification report: 237 | precision recall f1-score support 238 | 239 | 0 0.76404 0.70103 0.73118 194 240 | 1 0.85574 0.82857 0.84194 315 241 | 2 0.90158 0.92266 0.91200 1112 242 | 243 | avg / total 0.87621 0.87785 0.87674 1621 244 | 245 | 2018-03-10 00:59:40 - TestLog - INFO - Confusion matrix: 246 | [[ 136 0 58] 247 | [ 0 261 54] 248 | [ 42 44 1026]] 249 | 2018-03-10 00:59:40 - CVLog - INFO - Validation round 6 of 10 starting 250 | 2018-03-10 00:59:40 - TrainingLog - INFO - Initiating training of main classifier 251 | 2018-03-10 00:59:55 - TrainingLog - INFO - Feature extractor ready 252 | 2018-03-10 01:00:03 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 253 | 2018-03-10 01:00:09 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 254 | 2018-03-10 01:00:15 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 255 | 2018-03-10 01:00:21 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 256 | 2018-03-10 01:00:27 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 257 | 2018-03-10 01:00:32 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 258 | 2018-03-10 01:00:38 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 259 | 2018-03-10 01:00:44 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 260 | 2018-03-10 01:00:50 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 261 | 2018-03-10 01:00:57 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 262 | 2018-03-10 01:01:04 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 263 | 2018-03-10 01:01:09 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 264 | 2018-03-10 01:01:15 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 265 | 2018-03-10 01:01:21 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 266 | 2018-03-10 01:01:36 - TrainingLog - INFO - Main classifier training finished 267 | 2018-03-10 01:01:36 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8910891089108911 268 | 2018-03-10 01:01:37 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8756218905472637 269 | 2018-03-10 01:01:37 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8704318936877077 270 | 2018-03-10 01:01:38 - TestLog - INFO - 401 samples classified. 
Accuracy up till now is 0.8902743142144638 271 | 2018-03-10 01:01:38 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8982035928143712 272 | 2018-03-10 01:01:39 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8968386023294509 273 | 2018-03-10 01:01:39 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8958630527817404 274 | 2018-03-10 01:01:40 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8963795255930087 275 | 2018-03-10 01:01:40 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8912319644839067 276 | 2018-03-10 01:01:41 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8921078921078921 277 | 2018-03-10 01:01:41 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8873751135331517 278 | 2018-03-10 01:01:41 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.884263114071607 279 | 2018-03-10 01:01:42 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8854727132974635 280 | 2018-03-10 01:01:42 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.880085653104925 281 | 2018-03-10 01:01:43 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8760826115922719 282 | 2018-03-10 01:01:43 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8800749531542785 283 | 2018-03-10 01:01:43 - TestLog - INFO - Total 1620 samples classified with accuracy 0.8802469135802469 284 | 2018-03-10 01:01:43 - TestLog - INFO - AUROC is 0.9605301476112963 285 | 2018-03-10 01:01:43 - TestLog - INFO - Classification report: 286 | precision recall f1-score support 287 | 288 | 0 0.74611 0.74227 0.74419 194 289 | 1 0.86424 0.82857 0.84603 315 290 | 2 0.90756 0.91899 0.91324 1111 291 | 292 | avg / total 0.87980 0.88025 0.87993 1620 293 | 294 | 2018-03-10 01:01:43 - TestLog - INFO - Confusion matrix: 295 | [[ 144 0 50] 296 | [ 0 261 54] 297 | [ 49 41 1021]] 298 | 2018-03-10 01:01:43 - CVLog - INFO - Validation round 7 of 10 starting 299 | 2018-03-10 01:01:43 - TrainingLog - INFO - Initiating training of main classifier 300 | 2018-03-10 01:01:59 - TrainingLog - INFO - Feature extractor ready 301 | 2018-03-10 01:02:07 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 302 | 2018-03-10 01:02:13 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 303 | 2018-03-10 01:02:21 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 304 | 2018-03-10 01:02:29 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 305 | 2018-03-10 01:02:35 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 306 | 2018-03-10 01:02:40 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 307 | 2018-03-10 01:02:46 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 308 | 2018-03-10 01:02:52 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 309 | 2018-03-10 01:02:57 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 310 | 2018-03-10 01:03:03 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 311 | 2018-03-10 01:03:08 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 312 | 2018-03-10 01:03:14 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 313 | 2018-03-10 01:03:19 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 314 | 2018-03-10 
01:03:25 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 315 | 2018-03-10 01:03:37 - TrainingLog - INFO - Main classifier training finished 316 | 2018-03-10 01:03:38 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8514851485148515 317 | 2018-03-10 01:03:39 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8855721393034826 318 | 2018-03-10 01:03:39 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.893687707641196 319 | 2018-03-10 01:03:40 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.9027431421446384 320 | 2018-03-10 01:03:41 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8922155688622755 321 | 2018-03-10 01:03:41 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.891846921797005 322 | 2018-03-10 01:03:42 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8972895863052782 323 | 2018-03-10 01:03:43 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.898876404494382 324 | 2018-03-10 01:03:43 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8990011098779135 325 | 2018-03-10 01:03:44 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8981018981018981 326 | 2018-03-10 01:03:44 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8946412352406903 327 | 2018-03-10 01:03:45 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8900915903413822 328 | 2018-03-10 01:03:46 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8862413528055342 329 | 2018-03-10 01:03:46 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8872234118486795 330 | 2018-03-10 01:03:47 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8834110592938041 331 | 2018-03-10 01:03:47 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.8838226108682073 332 | 2018-03-10 01:03:47 - TestLog - INFO - Total 1620 samples classified with accuracy 0.8833333333333333 333 | 2018-03-10 01:03:47 - TestLog - INFO - AUROC is 0.9644468378794926 334 | 2018-03-10 01:03:47 - TestLog - INFO - Classification report: 335 | precision recall f1-score support 336 | 337 | 0 0.74742 0.74742 0.74742 194 338 | 1 0.85342 0.83175 0.84244 315 339 | 2 0.91510 0.92169 0.91839 1111 340 | 341 | avg / total 0.88303 0.88333 0.88315 1620 342 | 343 | 2018-03-10 01:03:47 - TestLog - INFO - Confusion matrix: 344 | [[ 145 3 46] 345 | [ 4 262 49] 346 | [ 45 42 1024]] 347 | 2018-03-10 01:03:47 - CVLog - INFO - Validation round 8 of 10 starting 348 | 2018-03-10 01:03:48 - TrainingLog - INFO - Initiating training of main classifier 349 | 2018-03-10 01:04:04 - TrainingLog - INFO - Feature extractor ready 350 | 2018-03-10 01:04:11 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 351 | 2018-03-10 01:04:17 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 352 | 2018-03-10 01:04:22 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 353 | 2018-03-10 01:04:28 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 354 | 2018-03-10 01:04:34 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 355 | 2018-03-10 01:04:39 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 356 | 2018-03-10 01:04:45 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 357 | 2018-03-10 01:04:50 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 358 | 2018-03-10 01:04:56 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 359 | 2018-03-10 01:05:02 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 360 | 2018-03-10 01:05:07 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 361 | 2018-03-10 01:05:13 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 362 | 2018-03-10 01:05:18 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 363 | 2018-03-10 01:05:24 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 364 | 2018-03-10 01:05:36 - TrainingLog - INFO - Main classifier training finished 365 | 2018-03-10 01:05:37 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8910891089108911 366 | 2018-03-10 01:05:37 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8805970149253731 367 | 2018-03-10 01:05:38 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8903654485049833 368 | 2018-03-10 01:05:38 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8877805486284289 369 | 2018-03-10 01:05:39 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8862275449101796 370 | 2018-03-10 01:05:39 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8868552412645591 371 | 2018-03-10 01:05:39 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.884450784593438 372 | 2018-03-10 01:05:40 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8888888888888888 373 | 2018-03-10 01:05:40 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8912319644839067 374 | 2018-03-10 01:05:41 - TestLog - INFO - 1001 samples classified. 
Accuracy up till now is 0.8881118881118881 375 | 2018-03-10 01:05:41 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.888283378746594 376 | 2018-03-10 01:05:42 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8892589508742714 377 | 2018-03-10 01:05:42 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8847040737893928 378 | 2018-03-10 01:05:42 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8843683083511777 379 | 2018-03-10 01:05:43 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8807461692205196 380 | 2018-03-10 01:05:43 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8782011242973142 381 | 2018-03-10 01:05:43 - TestLog - INFO - Total 1620 samples classified with accuracy 0.8783950617283951 382 | 2018-03-10 01:05:43 - TestLog - INFO - AUROC is 0.9646955779919653 383 | 2018-03-10 01:05:43 - TestLog - INFO - Classification report: 384 | precision recall f1-score support 385 | 386 | 0 0.78613 0.70103 0.74114 194 387 | 1 0.87108 0.79365 0.83056 315 388 | 2 0.89397 0.93339 0.91325 1111 389 | 390 | avg / total 0.87660 0.87840 0.87656 1620 391 | 392 | 2018-03-10 01:05:43 - TestLog - INFO - Confusion matrix: 393 | [[ 136 0 58] 394 | [ 0 250 65] 395 | [ 37 37 1037]] 396 | 2018-03-10 01:05:43 - CVLog - INFO - Validation round 9 of 10 starting 397 | 2018-03-10 01:05:44 - TrainingLog - INFO - Initiating training of main classifier 398 | 2018-03-10 01:06:00 - TrainingLog - INFO - Feature extractor ready 399 | 2018-03-10 01:06:08 - TrainingLog - INFO - 1001 of 14583 feature vectors prepared for training 400 | 2018-03-10 01:06:14 - TrainingLog - INFO - 2001 of 14583 feature vectors prepared for training 401 | 2018-03-10 01:06:19 - TrainingLog - INFO - 3001 of 14583 feature vectors prepared for training 402 | 2018-03-10 01:06:26 - TrainingLog - INFO - 4001 of 14583 feature vectors prepared for training 403 | 2018-03-10 01:06:31 - TrainingLog - INFO - 5001 of 14583 feature vectors prepared for training 404 | 2018-03-10 01:06:37 - TrainingLog - INFO - 6001 of 14583 feature vectors prepared for training 405 | 2018-03-10 01:06:43 - TrainingLog - INFO - 7001 of 14583 feature vectors prepared for training 406 | 2018-03-10 01:06:49 - TrainingLog - INFO - 8001 of 14583 feature vectors prepared for training 407 | 2018-03-10 01:06:55 - TrainingLog - INFO - 9001 of 14583 feature vectors prepared for training 408 | 2018-03-10 01:07:01 - TrainingLog - INFO - 10001 of 14583 feature vectors prepared for training 409 | 2018-03-10 01:07:06 - TrainingLog - INFO - 11001 of 14583 feature vectors prepared for training 410 | 2018-03-10 01:07:13 - TrainingLog - INFO - 12001 of 14583 feature vectors prepared for training 411 | 2018-03-10 01:07:19 - TrainingLog - INFO - 13001 of 14583 feature vectors prepared for training 412 | 2018-03-10 01:07:25 - TrainingLog - INFO - 14001 of 14583 feature vectors prepared for training 413 | 2018-03-10 01:07:41 - TrainingLog - INFO - Main classifier training finished 414 | 2018-03-10 01:07:42 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8712871287128713 415 | 2018-03-10 01:07:43 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8606965174129353 416 | 2018-03-10 01:07:43 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8903654485049833 417 | 2018-03-10 01:07:44 - TestLog - INFO - 401 samples classified. 
Accuracy up till now is 0.8778054862842892 418 | 2018-03-10 01:07:44 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8662674650698603 419 | 2018-03-10 01:07:45 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8635607321131448 420 | 2018-03-10 01:07:46 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8630527817403709 421 | 2018-03-10 01:07:46 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8664169787765293 422 | 2018-03-10 01:07:47 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8723640399556049 423 | 2018-03-10 01:07:47 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8681318681318682 424 | 2018-03-10 01:07:47 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8683015440508629 425 | 2018-03-10 01:07:48 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8734388009991674 426 | 2018-03-10 01:07:48 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8754803996925442 427 | 2018-03-10 01:07:49 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8736616702355461 428 | 2018-03-10 01:07:49 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8747501665556295 429 | 2018-03-10 01:07:50 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8744534665833854 430 | 2018-03-10 01:07:50 - TestLog - INFO - Total 1619 samples classified with accuracy 0.8746139592340951 431 | 2018-03-10 01:07:50 - TestLog - INFO - AUROC is 0.9607994436010016 432 | 2018-03-10 01:07:50 - TestLog - INFO - Classification report: 433 | precision recall f1-score support 434 | 435 | 0 0.72464 0.77320 0.74813 194 436 | 1 0.84950 0.80892 0.82871 314 437 | 2 0.90925 0.91089 0.91007 1111 438 | 439 | avg / total 0.87554 0.87461 0.87489 1619 440 | 441 | 2018-03-10 01:07:50 - TestLog - INFO - Confusion matrix: 442 | [[ 150 2 42] 443 | [ 1 254 59] 444 | [ 56 43 1012]] 445 | 2018-03-10 01:07:50 - CVLog - INFO - Validation round 10 of 10 starting 446 | 2018-03-10 01:07:50 - TrainingLog - INFO - Initiating training of main classifier 447 | 2018-03-10 01:08:07 - TrainingLog - INFO - Feature extractor ready 448 | 2018-03-10 01:08:16 - TrainingLog - INFO - 1001 of 14584 feature vectors prepared for training 449 | 2018-03-10 01:08:22 - TrainingLog - INFO - 2001 of 14584 feature vectors prepared for training 450 | 2018-03-10 01:08:27 - TrainingLog - INFO - 3001 of 14584 feature vectors prepared for training 451 | 2018-03-10 01:08:34 - TrainingLog - INFO - 4001 of 14584 feature vectors prepared for training 452 | 2018-03-10 01:08:40 - TrainingLog - INFO - 5001 of 14584 feature vectors prepared for training 453 | 2018-03-10 01:08:45 - TrainingLog - INFO - 6001 of 14584 feature vectors prepared for training 454 | 2018-03-10 01:08:51 - TrainingLog - INFO - 7001 of 14584 feature vectors prepared for training 455 | 2018-03-10 01:08:58 - TrainingLog - INFO - 8001 of 14584 feature vectors prepared for training 456 | 2018-03-10 01:09:04 - TrainingLog - INFO - 9001 of 14584 feature vectors prepared for training 457 | 2018-03-10 01:09:09 - TrainingLog - INFO - 10001 of 14584 feature vectors prepared for training 458 | 2018-03-10 01:09:15 - TrainingLog - INFO - 11001 of 14584 feature vectors prepared for training 459 | 2018-03-10 01:09:22 - TrainingLog - INFO - 12001 of 14584 feature vectors prepared for training 460 | 2018-03-10 01:09:29 - TrainingLog - INFO - 13001 of 14584 feature vectors prepared for training 461 | 2018-03-10 
01:09:34 - TrainingLog - INFO - 14001 of 14584 feature vectors prepared for training 462 | 2018-03-10 01:09:51 - TrainingLog - INFO - Main classifier training finished 463 | 2018-03-10 01:09:51 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.900990099009901 464 | 2018-03-10 01:09:52 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8955223880597015 465 | 2018-03-10 01:09:53 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.867109634551495 466 | 2018-03-10 01:09:54 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8553615960099751 467 | 2018-03-10 01:09:55 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8602794411177644 468 | 2018-03-10 01:09:55 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8635607321131448 469 | 2018-03-10 01:09:56 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8616262482168331 470 | 2018-03-10 01:09:56 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8626716604244694 471 | 2018-03-10 01:09:57 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8645948945615982 472 | 2018-03-10 01:09:58 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8611388611388612 473 | 2018-03-10 01:09:58 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8637602179836512 474 | 2018-03-10 01:09:59 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8659450457951707 475 | 2018-03-10 01:09:59 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8639508070714835 476 | 2018-03-10 01:10:00 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.860813704496788 477 | 2018-03-10 01:10:01 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8580946035976016 478 | 2018-03-10 01:10:01 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.858213616489694 479 | 2018-03-10 01:10:01 - TestLog - INFO - Total 1618 samples classified with accuracy 0.8572311495673671 480 | 2018-03-10 01:10:01 - TestLog - INFO - AUROC is 0.9508771093530785 481 | 2018-03-10 01:10:01 - TestLog - INFO - Classification report: 482 | precision recall f1-score support 483 | 484 | 0 0.69154 0.72021 0.70558 193 485 | 1 0.81553 0.80255 0.80899 314 486 | 2 0.89892 0.89649 0.89770 1111 487 | 488 | avg / total 0.85800 0.85723 0.85757 1618 489 | 490 | 2018-03-10 01:10:01 - TestLog - INFO - Confusion matrix: 491 | [[139 0 54] 492 | [ 4 252 58] 493 | [ 58 57 996]] 494 | 2018-03-10 01:10:01 - CVLog - INFO - 495 | 496 | 2018-03-10 01:10:01 - CVLog - INFO - Summary (precision, recall, F1, accuracy): 497 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 1: [0.87062632615791136, 0.87045033929673044, 0.87046424951431234, 0.87045033929673044] 498 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 2: [0.86505863908500913, 0.86489821098087605, 0.86495414738832921, 0.86489821098087605] 499 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 3: [0.87019958772708716, 0.86983343615052433, 0.87000569175104703, 0.86983343615052433] 500 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 4: [0.87409637551049046, 0.87538556446637883, 0.87451669615902761, 0.87538556446637883] 501 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 5: [0.87621281392349637, 0.87785317705120292, 0.87674468833124375, 0.87785317705120292] 502 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 6: [0.879799651573832, 0.88024691358024687, 0.8799250723311256, 0.88024691358024687] 503 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 7: [0.88302872802523003, 0.8833333333333333, 0.88314582241896122, 0.8833333333333333] 504 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 8: [0.87660160745545845, 0.8783950617283951, 0.87656493736708452, 0.8783950617283951] 505 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 9: [0.8755427279344199, 0.87461395923409513, 0.8748872264272185, 0.87461395923409513] 506 | 2018-03-10 01:10:01 - CVLog - INFO - Metrics for round 10: [0.85799881489306673, 0.85723114956736712, 0.85756902928668621, 0.85723114956736712] 507 | 2018-03-10 01:10:01 - CVLog - INFO - 508 | 509 | 2018-03-10 01:10:01 - CVLog - INFO - Final average metrics: 0.8729165272286001, 0.873224114538915, 0.8728777560975036, 0.873224114538915 510 | 511 | Process finished with exit code 0 512 | -------------------------------------------------------------------------------- /Results/Waseem_Hovy_hidden-baseline.txt: -------------------------------------------------------------------------------- 1 | /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/pushkarmishra/Desktop/AuthorProfileAbuseDetection/twitter_model.py -c 30000 2 | Using Theano backend. 
3 | 2018-03-06 17:25:33 - CVLog - INFO - 10-fold cross validation procedure has begun 4 | 2018-03-06 17:25:33 - CVLog - INFO - Validation round 1 of 10 starting 5 | 2018-03-06 17:25:33 - TrainingLog - INFO - Initiating training of main classifier 6 | 2018-03-06 17:25:53 - TrainingLog - INFO - Feature extractor ready 7 | 2018-03-06 17:26:06 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 8 | 2018-03-06 17:26:13 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 9 | 2018-03-06 17:26:20 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 10 | 2018-03-06 17:26:27 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 11 | 2018-03-06 17:26:34 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 12 | 2018-03-06 17:26:40 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 13 | 2018-03-06 17:26:46 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 14 | 2018-03-06 17:26:53 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 15 | 2018-03-06 17:27:00 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 16 | 2018-03-06 17:27:07 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 17 | 2018-03-06 17:27:13 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 18 | 2018-03-06 17:27:19 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 19 | 2018-03-06 17:27:26 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 20 | 2018-03-06 17:27:32 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 21 | 2018-03-06 17:27:47 - TrainingLog - INFO - Main classifier training finished 22 | 2018-03-06 17:27:48 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8316831683168316 23 | 2018-03-06 17:27:49 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8159203980099502 24 | 2018-03-06 17:27:51 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8205980066445183 25 | 2018-03-06 17:27:52 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8129675810473815 26 | 2018-03-06 17:27:54 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8243512974051896 27 | 2018-03-06 17:27:55 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8386023294509152 28 | 2018-03-06 17:27:57 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8388017118402282 29 | 2018-03-06 17:27:58 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8389513108614233 30 | 2018-03-06 17:28:00 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8379578246392897 31 | 2018-03-06 17:28:01 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8371628371628371 32 | 2018-03-06 17:28:02 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8365122615803815 33 | 2018-03-06 17:28:04 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8401332223147377 34 | 2018-03-06 17:28:04 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8378170637970792 35 | 2018-03-06 17:28:05 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8379728765167738 36 | 2018-03-06 17:28:06 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8374417055296469 37 | 2018-03-06 17:28:07 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.8376014990630856 38 | 2018-03-06 17:28:07 - TestLog - INFO - Total 1621 samples classified with accuracy 0.838371375694016 39 | 2018-03-06 17:28:07 - TestLog - INFO - AUROC is 0.9214737176884372 40 | 2018-03-06 17:28:07 - TestLog - INFO - Classification report: 41 | precision recall f1-score support 42 | 43 | 0 0.76316 0.74742 0.75521 194 44 | 1 0.74823 0.66984 0.70687 315 45 | 2 0.87293 0.90198 0.88722 1112 46 | 47 | avg / total 0.83556 0.83837 0.83637 1621 48 | 49 | 2018-03-06 17:28:07 - TestLog - INFO - Confusion matrix: 50 | [[ 145 4 45] 51 | [ 3 211 101] 52 | [ 42 67 1003]] 53 | 2018-03-06 17:28:07 - CVLog - INFO - Validation round 2 of 10 starting 54 | 2018-03-06 17:28:07 - TrainingLog - INFO - Initiating training of main classifier 55 | 2018-03-06 17:28:27 - TrainingLog - INFO - Feature extractor ready 56 | 2018-03-06 17:28:35 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 57 | 2018-03-06 17:28:41 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 58 | 2018-03-06 17:28:48 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 59 | 2018-03-06 17:28:55 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 60 | 2018-03-06 17:29:01 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 61 | 2018-03-06 17:29:09 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 62 | 2018-03-06 17:29:15 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 63 | 2018-03-06 17:29:22 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 64 | 2018-03-06 17:29:31 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 65 | 2018-03-06 17:29:39 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 66 | 2018-03-06 17:29:46 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 67 | 2018-03-06 17:29:53 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 68 | 2018-03-06 17:30:01 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 69 | 2018-03-06 17:30:09 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 70 | 2018-03-06 17:30:26 - TrainingLog - INFO - Main classifier training finished 71 | 2018-03-06 17:30:27 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8415841584158416 72 | 2018-03-06 17:30:28 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8706467661691543 73 | 2018-03-06 17:30:29 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8504983388704319 74 | 2018-03-06 17:30:30 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8403990024937655 75 | 2018-03-06 17:30:31 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8363273453093812 76 | 2018-03-06 17:30:33 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8402662229617305 77 | 2018-03-06 17:30:34 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8459343794579173 78 | 2018-03-06 17:30:36 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8401997503121099 79 | 2018-03-06 17:30:37 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8357380688124306 80 | 2018-03-06 17:30:38 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8391608391608392 81 | 2018-03-06 17:30:39 - TestLog - INFO - 1101 samples classified. 
Accuracy up till now is 0.8310626702997275 82 | 2018-03-06 17:30:41 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.832639467110741 83 | 2018-03-06 17:30:43 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8285933897002306 84 | 2018-03-06 17:30:44 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8251249107780158 85 | 2018-03-06 17:30:45 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8221185876082612 86 | 2018-03-06 17:30:46 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8219862585883823 87 | 2018-03-06 17:30:47 - TestLog - INFO - Total 1621 samples classified with accuracy 0.8217149907464528 88 | 2018-03-06 17:30:47 - TestLog - INFO - AUROC is 0.9248618781665271 89 | 2018-03-06 17:30:47 - TestLog - INFO - Classification report: 90 | precision recall f1-score support 91 | 92 | 0 0.72021 0.71649 0.71835 194 93 | 1 0.72982 0.66032 0.69333 315 94 | 2 0.86177 0.88579 0.87361 1112 95 | 96 | avg / total 0.81919 0.82171 0.82000 1621 97 | 98 | 2018-03-06 17:30:47 - TestLog - INFO - Confusion matrix: 99 | [[139 2 53] 100 | [ 2 208 105] 101 | [ 52 75 985]] 102 | 2018-03-06 17:30:47 - CVLog - INFO - Validation round 3 of 10 starting 103 | 2018-03-06 17:30:47 - TrainingLog - INFO - Initiating training of main classifier 104 | 2018-03-06 17:31:10 - TrainingLog - INFO - Feature extractor ready 105 | 2018-03-06 17:31:18 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 106 | 2018-03-06 17:31:25 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 107 | 2018-03-06 17:31:33 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 108 | 2018-03-06 17:31:40 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 109 | 2018-03-06 17:31:48 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 110 | 2018-03-06 17:31:56 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 111 | 2018-03-06 17:32:03 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 112 | 2018-03-06 17:32:09 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 113 | 2018-03-06 17:32:17 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 114 | 2018-03-06 17:32:28 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 115 | 2018-03-06 17:32:37 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 116 | 2018-03-06 17:32:45 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 117 | 2018-03-06 17:32:53 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 118 | 2018-03-06 17:33:03 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 119 | 2018-03-06 17:33:21 - TrainingLog - INFO - Main classifier training finished 120 | 2018-03-06 17:33:22 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8514851485148515 121 | 2018-03-06 17:33:23 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8656716417910447 122 | 2018-03-06 17:33:25 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8571428571428571 123 | 2018-03-06 17:33:26 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8478802992518704 124 | 2018-03-06 17:33:28 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.846307385229541 125 | 2018-03-06 17:33:30 - TestLog - INFO - 601 samples classified. 
Accuracy up till now is 0.848585690515807 126 | 2018-03-06 17:33:31 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8416547788873039 127 | 2018-03-06 17:33:32 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8314606741573034 128 | 2018-03-06 17:33:34 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8312985571587126 129 | 2018-03-06 17:33:35 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8271728271728271 130 | 2018-03-06 17:33:36 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8265213442325159 131 | 2018-03-06 17:33:38 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8309741881765196 132 | 2018-03-06 17:33:39 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8339738662567256 133 | 2018-03-06 17:33:41 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8358315488936474 134 | 2018-03-06 17:33:42 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8367754830113258 135 | 2018-03-06 17:33:44 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.835103060587133 136 | 2018-03-06 17:33:44 - TestLog - INFO - Total 1621 samples classified with accuracy 0.8340530536705737 137 | 2018-03-06 17:33:44 - TestLog - INFO - AUROC is 0.9239567880714993 138 | 2018-03-06 17:33:44 - TestLog - INFO - Classification report: 139 | precision recall f1-score support 140 | 141 | 0 0.71287 0.74227 0.72727 194 142 | 1 0.75779 0.69524 0.72517 315 143 | 2 0.87522 0.88939 0.88225 1112 144 | 145 | avg / total 0.83297 0.83405 0.83318 1621 146 | 147 | 2018-03-06 17:33:44 - TestLog - INFO - Confusion matrix: 148 | [[144 1 49] 149 | [ 4 219 92] 150 | [ 54 69 989]] 151 | 2018-03-06 17:33:44 - CVLog - INFO - Validation round 4 of 10 starting 152 | 2018-03-06 17:33:44 - TrainingLog - INFO - Initiating training of main classifier 153 | 2018-03-06 17:34:06 - TrainingLog - INFO - Feature extractor ready 154 | 2018-03-06 17:34:15 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 155 | 2018-03-06 17:34:22 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 156 | 2018-03-06 17:34:30 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 157 | 2018-03-06 17:34:38 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 158 | 2018-03-06 17:34:49 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 159 | 2018-03-06 17:34:58 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 160 | 2018-03-06 17:35:08 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 161 | 2018-03-06 17:35:18 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 162 | 2018-03-06 17:35:27 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 163 | 2018-03-06 17:35:36 - TrainingLog - INFO - 10001 of 14581 feature vectors prepared for training 164 | 2018-03-06 17:35:46 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 165 | 2018-03-06 17:35:56 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 166 | 2018-03-06 17:36:07 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 167 | 2018-03-06 17:36:16 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 168 | 2018-03-06 17:36:34 - TrainingLog - INFO - Main classifier training finished 169 | 2018-03-06 17:36:35 - TestLog - INFO - 101 samples 
classified. Accuracy up till now is 0.8514851485148515 170 | 2018-03-06 17:36:36 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8855721393034826 171 | 2018-03-06 17:36:37 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.867109634551495 172 | 2018-03-06 17:36:38 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8778054862842892 173 | 2018-03-06 17:36:39 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8842315369261478 174 | 2018-03-06 17:36:40 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8768718801996672 175 | 2018-03-06 17:36:41 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8730385164051355 176 | 2018-03-06 17:36:42 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8589263420724095 177 | 2018-03-06 17:36:44 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8546059933407325 178 | 2018-03-06 17:36:46 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8541458541458542 179 | 2018-03-06 17:36:47 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.846503178928247 180 | 2018-03-06 17:36:49 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8442964196502915 181 | 2018-03-06 17:36:50 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8393543428132206 182 | 2018-03-06 17:36:51 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8386866523911491 183 | 2018-03-06 17:36:53 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8367754830113258 184 | 2018-03-06 17:36:54 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8425983760149907 185 | 2018-03-06 17:36:54 - TestLog - INFO - Total 1621 samples classified with accuracy 0.8433066008636644 186 | 2018-03-06 17:36:54 - TestLog - INFO - AUROC is 0.9252341574439802 187 | 2018-03-06 17:36:54 - TestLog - INFO - Classification report: 188 | precision recall f1-score support 189 | 190 | 0 0.78453 0.73196 0.75733 194 191 | 1 0.76157 0.67937 0.71812 315 192 | 2 0.87230 0.90917 0.89036 1112 193 | 194 | avg / total 0.84028 0.84331 0.84097 1621 195 | 196 | 2018-03-06 17:36:54 - TestLog - INFO - Confusion matrix: 197 | [[ 142 3 49] 198 | [ 2 214 99] 199 | [ 37 64 1011]] 200 | 2018-03-06 17:36:54 - CVLog - INFO - Validation round 5 of 10 starting 201 | 2018-03-06 17:36:54 - TrainingLog - INFO - Initiating training of main classifier 202 | 2018-03-06 17:37:24 - TrainingLog - INFO - Feature extractor ready 203 | 2018-03-06 17:37:37 - TrainingLog - INFO - 1001 of 14581 feature vectors prepared for training 204 | 2018-03-06 17:37:46 - TrainingLog - INFO - 2001 of 14581 feature vectors prepared for training 205 | 2018-03-06 17:37:56 - TrainingLog - INFO - 3001 of 14581 feature vectors prepared for training 206 | 2018-03-06 17:38:06 - TrainingLog - INFO - 4001 of 14581 feature vectors prepared for training 207 | 2018-03-06 17:38:16 - TrainingLog - INFO - 5001 of 14581 feature vectors prepared for training 208 | 2018-03-06 17:38:26 - TrainingLog - INFO - 6001 of 14581 feature vectors prepared for training 209 | 2018-03-06 17:38:38 - TrainingLog - INFO - 7001 of 14581 feature vectors prepared for training 210 | 2018-03-06 17:38:49 - TrainingLog - INFO - 8001 of 14581 feature vectors prepared for training 211 | 2018-03-06 17:38:59 - TrainingLog - INFO - 9001 of 14581 feature vectors prepared for training 212 | 2018-03-06 17:39:09 - TrainingLog - INFO - 10001 of 14581 feature vectors 
prepared for training 213 | 2018-03-06 17:39:18 - TrainingLog - INFO - 11001 of 14581 feature vectors prepared for training 214 | 2018-03-06 17:39:27 - TrainingLog - INFO - 12001 of 14581 feature vectors prepared for training 215 | 2018-03-06 17:39:37 - TrainingLog - INFO - 13001 of 14581 feature vectors prepared for training 216 | 2018-03-06 17:39:48 - TrainingLog - INFO - 14001 of 14581 feature vectors prepared for training 217 | 2018-03-06 17:40:08 - TrainingLog - INFO - Main classifier training finished 218 | 2018-03-06 17:40:09 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8613861386138614 219 | 2018-03-06 17:40:12 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8208955223880597 220 | 2018-03-06 17:40:13 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8172757475083057 221 | 2018-03-06 17:40:14 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8329177057356608 222 | 2018-03-06 17:40:15 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8323353293413174 223 | 2018-03-06 17:40:16 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8336106489184693 224 | 2018-03-06 17:40:17 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8402282453637661 225 | 2018-03-06 17:40:18 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8439450686641697 226 | 2018-03-06 17:40:20 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8423973362930077 227 | 2018-03-06 17:40:21 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8471528471528471 228 | 2018-03-06 17:40:23 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.849227974568574 229 | 2018-03-06 17:40:24 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8492922564529559 230 | 2018-03-06 17:40:26 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.851652574942352 231 | 2018-03-06 17:40:27 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8479657387580299 232 | 2018-03-06 17:40:29 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8481012658227848 233 | 2018-03-06 17:40:31 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.8450968144909432 234 | 2018-03-06 17:40:32 - TestLog - INFO - Total 1621 samples classified with accuracy 0.8457742134484886 235 | 2018-03-06 17:40:32 - TestLog - INFO - AUROC is 0.9265803964921612 236 | 2018-03-06 17:40:32 - TestLog - INFO - Classification report: 237 | precision recall f1-score support 238 | 239 | 0 0.76744 0.68041 0.72131 194 240 | 1 0.77627 0.72698 0.75082 315 241 | 2 0.87522 0.90827 0.89144 1112 242 | 243 | avg / total 0.84309 0.84577 0.84375 1621 244 | 245 | 2018-03-06 17:40:32 - TestLog - INFO - Confusion matrix: 246 | [[ 132 3 59] 247 | [ 1 229 85] 248 | [ 39 63 1010]] 249 | 2018-03-06 17:40:32 - CVLog - INFO - Validation round 6 of 10 starting 250 | 2018-03-06 17:40:32 - TrainingLog - INFO - Initiating training of main classifier 251 | 2018-03-06 17:41:00 - TrainingLog - INFO - Feature extractor ready 252 | 2018-03-06 17:41:09 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 253 | 2018-03-06 17:41:19 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 254 | 2018-03-06 17:41:29 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 255 | 2018-03-06 17:41:39 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 256 | 2018-03-06 17:41:50 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 257 | 2018-03-06 17:42:02 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 258 | 2018-03-06 17:42:13 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 259 | 2018-03-06 17:42:24 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 260 | 2018-03-06 17:42:35 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 261 | 2018-03-06 17:42:44 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 262 | 2018-03-06 17:42:55 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 263 | 2018-03-06 17:43:05 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 264 | 2018-03-06 17:43:15 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 265 | 2018-03-06 17:43:26 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 266 | 2018-03-06 17:43:45 - TrainingLog - INFO - Main classifier training finished 267 | 2018-03-06 17:43:46 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8910891089108911 268 | 2018-03-06 17:43:47 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.835820895522388 269 | 2018-03-06 17:43:48 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8504983388704319 270 | 2018-03-06 17:43:49 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.85785536159601 271 | 2018-03-06 17:43:50 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8642714570858283 272 | 2018-03-06 17:43:51 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8569051580698835 273 | 2018-03-06 17:43:53 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8573466476462197 274 | 2018-03-06 17:43:56 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8576779026217228 275 | 2018-03-06 17:43:57 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8523862375138734 276 | 2018-03-06 17:43:58 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8521478521478522 277 | 2018-03-06 17:44:00 - TestLog - INFO - 1101 samples classified. 
Accuracy up till now is 0.849227974568574 278 | 2018-03-06 17:44:01 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8484596169858452 279 | 2018-03-06 17:44:02 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8485780169100692 280 | 2018-03-06 17:44:04 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8465381870092791 281 | 2018-03-06 17:44:05 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8421052631578947 282 | 2018-03-06 17:44:06 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8450968144909432 283 | 2018-03-06 17:44:07 - TestLog - INFO - Total 1620 samples classified with accuracy 0.845679012345679 284 | 2018-03-06 17:44:07 - TestLog - INFO - AUROC is 0.9191414585478324 285 | 2018-03-06 17:44:07 - TestLog - INFO - Classification report: 286 | precision recall f1-score support 287 | 288 | 0 0.75691 0.70619 0.73067 194 289 | 1 0.78322 0.71111 0.74542 315 290 | 2 0.87511 0.90819 0.89134 1111 291 | 292 | avg / total 0.84309 0.84568 0.84373 1620 293 | 294 | 2018-03-06 17:44:07 - TestLog - INFO - Confusion matrix: 295 | [[ 137 4 53] 296 | [ 0 224 91] 297 | [ 44 58 1009]] 298 | 2018-03-06 17:44:07 - CVLog - INFO - Validation round 7 of 10 starting 299 | 2018-03-06 17:44:07 - TrainingLog - INFO - Initiating training of main classifier 300 | 2018-03-06 17:44:38 - TrainingLog - INFO - Feature extractor ready 301 | 2018-03-06 17:44:51 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 302 | 2018-03-06 17:45:00 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 303 | 2018-03-06 17:45:10 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 304 | 2018-03-06 17:45:18 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 305 | 2018-03-06 17:45:27 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 306 | 2018-03-06 17:45:36 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 307 | 2018-03-06 17:45:48 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 308 | 2018-03-06 17:45:58 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 309 | 2018-03-06 17:46:07 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 310 | 2018-03-06 17:46:18 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 311 | 2018-03-06 17:46:28 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 312 | 2018-03-06 17:46:40 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 313 | 2018-03-06 17:46:50 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 314 | 2018-03-06 17:47:00 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 315 | 2018-03-06 17:47:18 - TrainingLog - INFO - Main classifier training finished 316 | 2018-03-06 17:47:20 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8415841584158416 317 | 2018-03-06 17:47:22 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8756218905472637 318 | 2018-03-06 17:47:23 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8704318936877077 319 | 2018-03-06 17:47:24 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8728179551122195 320 | 2018-03-06 17:47:25 - TestLog - INFO - 501 samples classified. 
Accuracy up till now is 0.8522954091816367 321 | 2018-03-06 17:47:26 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8535773710482529 322 | 2018-03-06 17:47:27 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.854493580599144 323 | 2018-03-06 17:47:29 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8514357053682896 324 | 2018-03-06 17:47:31 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8512763596004439 325 | 2018-03-06 17:47:33 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8511488511488512 326 | 2018-03-06 17:47:35 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8483197093551317 327 | 2018-03-06 17:47:37 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8401332223147377 328 | 2018-03-06 17:47:38 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8355111452728671 329 | 2018-03-06 17:47:39 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8336902212705211 330 | 2018-03-06 17:47:41 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.832111925383078 331 | 2018-03-06 17:47:43 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.835103060587133 332 | 2018-03-06 17:47:43 - TestLog - INFO - Total 1620 samples classified with accuracy 0.8351851851851851 333 | 2018-03-06 17:47:43 - TestLog - INFO - AUROC is 0.9241269014770084 334 | 2018-03-06 17:47:43 - TestLog - INFO - Classification report: 335 | precision recall f1-score support 336 | 337 | 0 0.73232 0.74742 0.73980 194 338 | 1 0.74138 0.68254 0.71074 315 339 | 2 0.87721 0.89379 0.88542 1111 340 | 341 | avg / total 0.83345 0.83519 0.83402 1620 342 | 343 | 2018-03-06 17:47:43 - TestLog - INFO - Confusion matrix: 344 | [[145 4 45] 345 | [ 6 215 94] 346 | [ 47 71 993]] 347 | 2018-03-06 17:47:43 - CVLog - INFO - Validation round 8 of 10 starting 348 | 2018-03-06 17:47:44 - TrainingLog - INFO - Initiating training of main classifier 349 | 2018-03-06 17:48:16 - TrainingLog - INFO - Feature extractor ready 350 | 2018-03-06 17:48:28 - TrainingLog - INFO - 1001 of 14582 feature vectors prepared for training 351 | 2018-03-06 17:48:38 - TrainingLog - INFO - 2001 of 14582 feature vectors prepared for training 352 | 2018-03-06 17:48:49 - TrainingLog - INFO - 3001 of 14582 feature vectors prepared for training 353 | 2018-03-06 17:48:59 - TrainingLog - INFO - 4001 of 14582 feature vectors prepared for training 354 | 2018-03-06 17:49:08 - TrainingLog - INFO - 5001 of 14582 feature vectors prepared for training 355 | 2018-03-06 17:49:17 - TrainingLog - INFO - 6001 of 14582 feature vectors prepared for training 356 | 2018-03-06 17:49:26 - TrainingLog - INFO - 7001 of 14582 feature vectors prepared for training 357 | 2018-03-06 17:49:34 - TrainingLog - INFO - 8001 of 14582 feature vectors prepared for training 358 | 2018-03-06 17:49:41 - TrainingLog - INFO - 9001 of 14582 feature vectors prepared for training 359 | 2018-03-06 17:49:49 - TrainingLog - INFO - 10001 of 14582 feature vectors prepared for training 360 | 2018-03-06 17:49:57 - TrainingLog - INFO - 11001 of 14582 feature vectors prepared for training 361 | 2018-03-06 17:50:04 - TrainingLog - INFO - 12001 of 14582 feature vectors prepared for training 362 | 2018-03-06 17:50:11 - TrainingLog - INFO - 13001 of 14582 feature vectors prepared for training 363 | 2018-03-06 17:50:21 - TrainingLog - INFO - 14001 of 14582 feature vectors prepared for training 364 | 2018-03-06 17:50:39 - 
TrainingLog - INFO - Main classifier training finished 365 | 2018-03-06 17:50:40 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.900990099009901 366 | 2018-03-06 17:50:41 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8805970149253731 367 | 2018-03-06 17:50:42 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8737541528239202 368 | 2018-03-06 17:50:43 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8603491271820449 369 | 2018-03-06 17:50:44 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8522954091816367 370 | 2018-03-06 17:50:45 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8452579034941764 371 | 2018-03-06 17:50:46 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8459343794579173 372 | 2018-03-06 17:50:47 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.846441947565543 373 | 2018-03-06 17:50:48 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8523862375138734 374 | 2018-03-06 17:50:49 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8511488511488512 375 | 2018-03-06 17:50:50 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8555858310626703 376 | 2018-03-06 17:50:51 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8567860116569526 377 | 2018-03-06 17:50:53 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8524212144504227 378 | 2018-03-06 17:50:54 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8501070663811563 379 | 2018-03-06 17:50:55 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8421052631578947 380 | 2018-03-06 17:50:57 - TestLog - INFO - 1601 samples classified. 
Accuracy up till now is 0.8413491567770144 381 | 2018-03-06 17:50:57 - TestLog - INFO - Total 1620 samples classified with accuracy 0.841358024691358 382 | 2018-03-06 17:50:57 - TestLog - INFO - AUROC is 0.9299742699303101 383 | 2018-03-06 17:50:57 - TestLog - INFO - Classification report: 384 | precision recall f1-score support 385 | 386 | 0 0.75843 0.69588 0.72581 194 387 | 1 0.79468 0.66349 0.72318 315 388 | 2 0.86429 0.91719 0.88996 1111 389 | 390 | avg / total 0.83808 0.84136 0.83787 1620 391 | 392 | 2018-03-06 17:50:57 - TestLog - INFO - Confusion matrix: 393 | [[ 135 1 58] 394 | [ 4 209 102] 395 | [ 39 53 1019]] 396 | 2018-03-06 17:50:57 - CVLog - INFO - Validation round 9 of 10 starting 397 | 2018-03-06 17:50:57 - TrainingLog - INFO - Initiating training of main classifier 398 | 2018-03-06 17:51:25 - TrainingLog - INFO - Feature extractor ready 399 | 2018-03-06 17:51:38 - TrainingLog - INFO - 1001 of 14583 feature vectors prepared for training 400 | 2018-03-06 17:51:45 - TrainingLog - INFO - 2001 of 14583 feature vectors prepared for training 401 | 2018-03-06 17:51:51 - TrainingLog - INFO - 3001 of 14583 feature vectors prepared for training 402 | 2018-03-06 17:51:57 - TrainingLog - INFO - 4001 of 14583 feature vectors prepared for training 403 | 2018-03-06 17:52:04 - TrainingLog - INFO - 5001 of 14583 feature vectors prepared for training 404 | 2018-03-06 17:52:11 - TrainingLog - INFO - 6001 of 14583 feature vectors prepared for training 405 | 2018-03-06 17:52:17 - TrainingLog - INFO - 7001 of 14583 feature vectors prepared for training 406 | 2018-03-06 17:52:24 - TrainingLog - INFO - 8001 of 14583 feature vectors prepared for training 407 | 2018-03-06 17:52:31 - TrainingLog - INFO - 9001 of 14583 feature vectors prepared for training 408 | 2018-03-06 17:52:38 - TrainingLog - INFO - 10001 of 14583 feature vectors prepared for training 409 | 2018-03-06 17:52:45 - TrainingLog - INFO - 11001 of 14583 feature vectors prepared for training 410 | 2018-03-06 17:52:53 - TrainingLog - INFO - 12001 of 14583 feature vectors prepared for training 411 | 2018-03-06 17:53:02 - TrainingLog - INFO - 13001 of 14583 feature vectors prepared for training 412 | 2018-03-06 17:53:09 - TrainingLog - INFO - 14001 of 14583 feature vectors prepared for training 413 | 2018-03-06 17:53:26 - TrainingLog - INFO - Main classifier training finished 414 | 2018-03-06 17:53:27 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8217821782178217 415 | 2018-03-06 17:53:28 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8308457711442786 416 | 2018-03-06 17:53:28 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8571428571428571 417 | 2018-03-06 17:53:29 - TestLog - INFO - 401 samples classified. Accuracy up till now is 0.8478802992518704 418 | 2018-03-06 17:53:30 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.846307385229541 419 | 2018-03-06 17:53:31 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8419301164725458 420 | 2018-03-06 17:53:32 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8416547788873039 421 | 2018-03-06 17:53:33 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8414481897627965 422 | 2018-03-06 17:53:33 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8446170921198668 423 | 2018-03-06 17:53:34 - TestLog - INFO - 1001 samples classified. 
Accuracy up till now is 0.8411588411588412 424 | 2018-03-06 17:53:35 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.8392370572207084 425 | 2018-03-06 17:53:36 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8451290591174022 426 | 2018-03-06 17:53:38 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8447348193697156 427 | 2018-03-06 17:53:40 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8422555317630264 428 | 2018-03-06 17:53:41 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.843437708194537 429 | 2018-03-06 17:53:43 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8457214241099313 430 | 2018-03-06 17:53:43 - TestLog - INFO - Total 1619 samples classified with accuracy 0.8449660284126004 431 | 2018-03-06 17:53:43 - TestLog - INFO - AUROC is 0.9303725809457754 432 | 2018-03-06 17:53:43 - TestLog - INFO - Classification report: 433 | precision recall f1-score support 434 | 435 | 0 0.71569 0.75258 0.73367 194 436 | 1 0.78947 0.71656 0.75125 314 437 | 2 0.88230 0.89739 0.88978 1111 438 | 439 | avg / total 0.84433 0.84497 0.84421 1619 440 | 441 | 2018-03-06 17:53:43 - TestLog - INFO - Confusion matrix: 442 | [[146 2 46] 443 | [ 2 225 87] 444 | [ 56 58 997]] 445 | 2018-03-06 17:53:43 - CVLog - INFO - Validation round 10 of 10 starting 446 | 2018-03-06 17:53:43 - TrainingLog - INFO - Initiating training of main classifier 447 | 2018-03-06 17:54:13 - TrainingLog - INFO - Feature extractor ready 448 | 2018-03-06 17:54:26 - TrainingLog - INFO - 1001 of 14584 feature vectors prepared for training 449 | 2018-03-06 17:54:35 - TrainingLog - INFO - 2001 of 14584 feature vectors prepared for training 450 | 2018-03-06 17:54:43 - TrainingLog - INFO - 3001 of 14584 feature vectors prepared for training 451 | 2018-03-06 17:54:52 - TrainingLog - INFO - 4001 of 14584 feature vectors prepared for training 452 | 2018-03-06 17:55:02 - TrainingLog - INFO - 5001 of 14584 feature vectors prepared for training 453 | 2018-03-06 17:55:12 - TrainingLog - INFO - 6001 of 14584 feature vectors prepared for training 454 | 2018-03-06 17:55:20 - TrainingLog - INFO - 7001 of 14584 feature vectors prepared for training 455 | 2018-03-06 17:55:27 - TrainingLog - INFO - 8001 of 14584 feature vectors prepared for training 456 | 2018-03-06 17:55:35 - TrainingLog - INFO - 9001 of 14584 feature vectors prepared for training 457 | 2018-03-06 17:55:43 - TrainingLog - INFO - 10001 of 14584 feature vectors prepared for training 458 | 2018-03-06 17:55:50 - TrainingLog - INFO - 11001 of 14584 feature vectors prepared for training 459 | 2018-03-06 17:55:59 - TrainingLog - INFO - 12001 of 14584 feature vectors prepared for training 460 | 2018-03-06 17:56:07 - TrainingLog - INFO - 13001 of 14584 feature vectors prepared for training 461 | 2018-03-06 17:56:16 - TrainingLog - INFO - 14001 of 14584 feature vectors prepared for training 462 | 2018-03-06 17:56:34 - TrainingLog - INFO - Main classifier training finished 463 | 2018-03-06 17:56:35 - TestLog - INFO - 101 samples classified. Accuracy up till now is 0.8316831683168316 464 | 2018-03-06 17:56:37 - TestLog - INFO - 201 samples classified. Accuracy up till now is 0.8407960199004975 465 | 2018-03-06 17:56:38 - TestLog - INFO - 301 samples classified. Accuracy up till now is 0.8239202657807309 466 | 2018-03-06 17:56:39 - TestLog - INFO - 401 samples classified. 
Accuracy up till now is 0.8154613466334164 467 | 2018-03-06 17:56:40 - TestLog - INFO - 501 samples classified. Accuracy up till now is 0.8223552894211577 468 | 2018-03-06 17:56:41 - TestLog - INFO - 601 samples classified. Accuracy up till now is 0.8186356073211315 469 | 2018-03-06 17:56:42 - TestLog - INFO - 701 samples classified. Accuracy up till now is 0.8231098430813124 470 | 2018-03-06 17:56:43 - TestLog - INFO - 801 samples classified. Accuracy up till now is 0.8277153558052435 471 | 2018-03-06 17:56:44 - TestLog - INFO - 901 samples classified. Accuracy up till now is 0.8246392896781354 472 | 2018-03-06 17:56:46 - TestLog - INFO - 1001 samples classified. Accuracy up till now is 0.8191808191808192 473 | 2018-03-06 17:56:47 - TestLog - INFO - 1101 samples classified. Accuracy up till now is 0.821071752951862 474 | 2018-03-06 17:56:48 - TestLog - INFO - 1201 samples classified. Accuracy up till now is 0.8259783513738551 475 | 2018-03-06 17:56:50 - TestLog - INFO - 1301 samples classified. Accuracy up till now is 0.8270561106840891 476 | 2018-03-06 17:56:52 - TestLog - INFO - 1401 samples classified. Accuracy up till now is 0.8251249107780158 477 | 2018-03-06 17:56:53 - TestLog - INFO - 1501 samples classified. Accuracy up till now is 0.8241172551632245 478 | 2018-03-06 17:56:54 - TestLog - INFO - 1601 samples classified. Accuracy up till now is 0.8219862585883823 479 | 2018-03-06 17:56:54 - TestLog - INFO - Total 1618 samples classified with accuracy 0.8207663782447466 480 | 2018-03-06 17:56:54 - TestLog - INFO - AUROC is 0.9136706536351435 481 | 2018-03-06 17:56:54 - TestLog - INFO - Classification report: 482 | precision recall f1-score support 483 | 484 | 0 0.70352 0.72539 0.71429 193 485 | 1 0.72203 0.67834 0.69951 314 486 | 2 0.86744 0.87759 0.87248 1111 487 | 488 | avg / total 0.81967 0.82077 0.82004 1618 489 | 490 | 2018-03-06 17:56:54 - TestLog - INFO - Confusion matrix: 491 | [[140 0 53] 492 | [ 5 213 96] 493 | [ 54 82 975]] 494 | 2018-03-06 17:56:54 - CVLog - INFO - 495 | 496 | 2018-03-06 17:56:54 - CVLog - INFO - Summary (precision, recall, F1, accuracy): 497 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 1: [0.83556175230603469, 0.83837137569401599, 0.83637273252004629, 0.83837137569401599] 498 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 2: [0.81918578559173216, 0.82171499074645282, 0.81999886068535532, 0.82171499074645282] 499 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 3: [0.83297067831425919, 0.8340530536705737, 0.83317571219803721, 0.8340530536705737] 500 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 4: [0.84027998709890628, 0.84330660086366438, 0.84096689610598185, 0.84330660086366438] 501 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 5: [0.84309071306120764, 0.84577421344848858, 0.84375225855879799, 0.84577421344848858] 502 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 6: [0.84308550146894823, 0.84567901234567899, 0.8437282640121796, 0.84567901234567899] 503 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 7: [0.83344679736879934, 0.83518518518518514, 0.83401714933401772, 0.83518518518518514] 504 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 8: [0.83807789187295734, 0.84135802469135801, 0.83787080524391999, 0.84135802469135801] 505 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 9: [0.84433240103943663, 0.84496602841260038, 0.84420746803452851, 0.84496602841260038] 506 | 2018-03-06 17:56:54 - CVLog - INFO - Metrics for round 10: [0.81966677883108718, 
0.82076637824474663, 0.82004408041692833, 0.82076637824474663] 507 | 2018-03-06 17:56:54 - CVLog - INFO - 508 | 509 | 2018-03-06 17:56:54 - CVLog - INFO - Final average metrics: 0.8349698286953368, 0.8371174863302764, 0.8354134227109793, 0.8371174863302764 510 | 511 | Process finished with exit code 0 512 | -------------------------------------------------------------------------------- /TwitterData/README.md: -------------------------------------------------------------------------------- 1 | # Twitter Data 2 | 3 | 16,202 tweets annotated as 0 (racism), 1 (sexism), or 2 (none). 4 | This is a subset of the dataset made available by Waseem and Hovy 5 | in the Proceedings of the NAACL 2016 Student Research Workshop. 6 | 7 | The original dataset can be found here: . 8 | It contains 16,907 tweet IDs along with corresponding annotations. We could only 9 | retrieve 16,202 of the tweets since some of them have been deleted or had their visibility 10 | limited. 11 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pushkarmishra/AuthorProfilingAbuseDetection/6322467b26f53aca7d231c0ab92182879b9375ff/__init__.py -------------------------------------------------------------------------------- /cross_validate.py: -------------------------------------------------------------------------------- 1 | from main_classifier import MainClassifier 2 | from sklearn.model_selection import StratifiedKFold 3 | from test import test 4 | 5 | import coloredlogs 6 | import logging 7 | 8 | 9 | logger = logging.getLogger('CVLog') 10 | coloredlogs.install(logger=logger, level='DEBUG', 11 | fmt='%(asctime)s - %(name)s - %(levelname)s' 12 | ' - %(message)s') 13 | 14 | EMB_MODEL = [ 15 | 'Emb_2018-03-04_12-22-03.453692.h5', 16 | 'Emb_2018-03-04_12-29-57.342629.h5', 17 | 'Emb_2018-03-04_12-38-56.418197.h5', 18 | 'Emb_2018-03-04_12-46-41.840651.h5', 19 | 'Emb_2018-03-04_12-54-29.838667.h5', 20 | 'Emb_2018-03-04_13-02-14.060916.h5', 21 | 'Emb_2018-03-04_13-09-58.910309.h5', 22 | 'Emb_2018-03-04_13-17-44.565754.h5', 23 | 'Emb_2018-03-04_13-25-30.865847.h5', 24 | 'Emb_2018-03-04_13-33-38.104125.h5', 25 | ] 26 | 27 | def run_cv(text_ids, all_texts, categories, CONFIG, folds=10): 28 | logger.info('{}-fold cross validation procedure has begun'.format(folds)) 29 | 30 | k_fold = StratifiedKFold(n_splits=folds, shuffle=True, random_state=7) 31 | metrics = [] 32 | count = 0 33 | for train_idx, test_idx in k_fold.split(all_texts, categories): 34 | count += 1 35 | logger.info('Validation round {} of {} starting' 36 | .format(count, folds)) 37 | 38 | ids_train, X_train, y_train = [], [], [] 39 | for idx in train_idx: 40 | ids_train.append(text_ids[idx]) 41 | X_train.append(all_texts[idx]) 42 | y_train.append(categories[idx]) 43 | 44 | ids_test, X_test, y_test = [], [], [] 45 | for idx in test_idx: 46 | ids_test.append(text_ids[idx]) 47 | X_test.append(all_texts[idx]) 48 | y_test.append(categories[idx]) 49 | 50 | if CONFIG['EMB_MODEL'] is None: 51 | CONFIG['EMB_MODEL'] = EMB_MODEL[count - 1] 52 | else: 53 | CONFIG['EMB_MODEL'] = None 54 | 55 | classifier = MainClassifier(CONFIG) 56 | classifier.train(ids_train, X_train, y_train) 57 | 58 | metrics.append(test(ids_test, X_test, y_test, classifier)) 59 | 60 | # Average metrics 61 | logger.info('\n') 62 | logger.info('Summary (precision, recall, F1, accuracy):') 63 | 64 | prec = rec = f1 = acc = 0.0 65 | for (i, metric) in
enumerate(metrics): 66 | logger.info('Metrics for round {}: {}'.format(i + 1, metric)) 67 | prec += metric[0] 68 | rec += metric[1] 69 | f1 += metric[2] 70 | acc += metric[3] 71 | 72 | logger.info('\n') 73 | logger.info('Final average metrics: {}, {}, {}, {}'.format(prec/folds, 74 | rec/folds, 75 | f1/folds, 76 | acc/folds)) 77 | -------------------------------------------------------------------------------- /featureExtractor/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pushkarmishra/AuthorProfilingAbuseDetection/6322467b26f53aca7d231c0ab92182879b9375ff/featureExtractor/__init__.py -------------------------------------------------------------------------------- /featureExtractor/dnn_features.py: -------------------------------------------------------------------------------- 1 | from keras.callbacks import EarlyStopping 2 | from keras.callbacks import ModelCheckpoint 3 | from keras.layers import Dense 4 | from keras.layers import Dropout 5 | from keras.layers import Embedding 6 | from keras.layers import GRU 7 | from keras.layers import Input 8 | from keras.models import load_model 9 | from keras.models import Model 10 | from keras.preprocessing.sequence import pad_sequences 11 | from keras.utils import np_utils 12 | from resources.textual import process_words 13 | from resources.structural import word_tokenizer 14 | from string import punctuation 15 | 16 | import datetime 17 | import io 18 | import logging 19 | import numpy 20 | import os 21 | import pickle 22 | import re 23 | import time 24 | 25 | 26 | class DNNFeatures: 27 | 28 | def __init__(self, CONFIG): 29 | self.EMBED_DIM = CONFIG['EMB_DIM'] 30 | self.EMB_FILE = CONFIG['EMB_FILE'] 31 | self.MIN_DF = CONFIG['EMB_MIN_DF'] 32 | self.MAX_DF = CONFIG['EMB_MAX_DF'] 33 | self.MAX_VOCAB = CONFIG['EMB_MAX_VCB'] 34 | self.WORD_MIN_FREQ = CONFIG['WORD_MIN_FREQ'] 35 | self.EPOCH = CONFIG['DNN_EPOCH'] 36 | self.BATCH_SIZE = CONFIG['DNN_BATCH'] 37 | self.VAL_SPLIT = CONFIG['DNN_VAL_SPLIT'] 38 | self.HIDDEN_UNITS = CONFIG['DNN_HIDDEN_UNITS'] 39 | self.BASE = CONFIG['BASE'] 40 | 41 | self.model = None 42 | self.prediction_model = None 43 | self.vocab = None 44 | self.word_freq = None 45 | 46 | if CONFIG['EMB_MODEL'] is not None: 47 | saved_vocab = CONFIG['EMB_MODEL'].split('.h5')[0] + '.pkl' 48 | self.model = load_model(os.path.join(self.BASE, 'Models', 49 | CONFIG['EMB_MODEL'])) 50 | 51 | with open(os.path.join(self.BASE, 'Models', saved_vocab), 'rb') as vocab_file: 52 | self.vocab, self.word_freq = pickle.load(vocab_file) 53 | 54 | 55 | def tokenize_text(self, texts): 56 | text_tokens = [] 57 | for (i, text) in enumerate(texts): 58 | text = re.sub('[' + punctuation + ']', ' ', text) 59 | text = re.sub('\\b[0-9]+\\b', '', text) 60 | text = process_words(text) 61 | 62 | tokens = word_tokenizer(text) 63 | text_tokens.append(tokens) 64 | 65 | return text_tokens 66 | 67 | 68 | def build_vocab(self, text_tokens): 69 | self.word_freq = {} 70 | for text_token in text_tokens: 71 | for token in set(text_token): 72 | self.word_freq[token] = self.word_freq.get(token, 0) + 1 73 | self.word_freq = [(f, w) for (w, f) in self.word_freq.items()] 74 | self.word_freq.sort(reverse=True) 75 | 76 | token_counts = [] 77 | for (count, token) in self.word_freq: 78 | if self.MAX_DF != -1 and count > self.MAX_DF: 79 | continue 80 | if count < self.MIN_DF: 81 | continue 82 | token_counts.append((count, token)) 83 | 84 | token_counts.sort(reverse=True) 85 | if self.MAX_VOCAB != -1: 86 | 
token_counts = token_counts[:self.MAX_VOCAB] 87 | # NIV: not in vocab token, i.e., out of vocab 88 | token_counts.append((0, 'NIV')) 89 | 90 | self.vocab = {} 91 | for (i, (count, token)) in enumerate(token_counts): 92 | self.vocab[token] = i + 1 93 | 94 | 95 | def transform_texts(self, text_tokens): 96 | transformed = [] 97 | for text_token in text_tokens: 98 | entry = [] 99 | for token in text_token: 100 | entry.append(self.vocab.get(token, self.vocab['NIV'])) 101 | transformed.append(entry) 102 | 103 | return transformed 104 | 105 | 106 | def prepare_model(self, emb_dimension, seq_length, num_categories): 107 | input = Input(shape=(seq_length,), dtype='int32') 108 | embed = Embedding(input_dim=len(self.vocab) + 1, 109 | output_dim=emb_dimension, 110 | input_length=seq_length, 111 | mask_zero=True, trainable=True)(input) 112 | dropout_1 = Dropout(0.25)(embed) 113 | gru_1 = GRU(self.HIDDEN_UNITS, return_sequences=True)(dropout_1) 114 | dropout_2 = Dropout(0.25)(gru_1) 115 | gru_2 = GRU(self.HIDDEN_UNITS)(dropout_2) 116 | dropout_3 = Dropout(0.50)(gru_2) 117 | softmax = Dense(num_categories, activation='softmax')(dropout_3) 118 | 119 | self.model = Model(inputs=input, outputs=softmax) 120 | self.model.compile(optimizer='adam', 121 | loss='categorical_crossentropy', 122 | metrics=['accuracy']) 123 | 124 | 125 | def train(self, texts, classes): 126 | logger = logging.getLogger('TrainingLog') 127 | 128 | tokens = self.tokenize_text(texts) 129 | self.build_vocab(tokens) 130 | logger.info('Vocabulary of size {} built for embeddings' 131 | .format(len(self.vocab))) 132 | 133 | X = self.transform_texts(tokens) 134 | X = pad_sequences(X) 135 | 136 | seq_length = X.shape[1] 137 | class_weights = {} 138 | for clazz in classes: 139 | class_weights[clazz] = class_weights.get(clazz, 0) + 1 140 | for clazz in class_weights: 141 | class_weights[clazz] /= (1.0 * len(classes)) 142 | 143 | y = numpy.array(classes) 144 | y = np_utils.to_categorical(y, len(class_weights)) 145 | 146 | self.prepare_model(self.EMBED_DIM, seq_length, len(class_weights)) 147 | if self.EMB_FILE is not None: 148 | trained_vectors = self.initialise_embeddings( 149 | os.path.join(self.BASE, 'resources', self.EMB_FILE)) 150 | self.model.layers[1].set_weights([trained_vectors]) 151 | 152 | # Train DNN 153 | best_model = 'Emb_best_' + str(time.time()) + '.h5' 154 | checkpoint = ModelCheckpoint(os.path.join(self.BASE, 'Models', best_model), 155 | monitor='val_loss', verbose=1, 156 | save_best_only=True, mode='auto') 157 | earlyStopping = EarlyStopping(monitor='val_loss', 158 | patience=3, verbose=0, 159 | mode='auto') 160 | callbacks = [checkpoint, earlyStopping] 161 | 162 | self.model.fit(X, y, epochs=self.EPOCH, 163 | class_weight=class_weights, 164 | batch_size=self.BATCH_SIZE, 165 | validation_split=self.VAL_SPLIT, 166 | callbacks=callbacks, verbose=2) 167 | self.model.load_weights(os.path.join(self.BASE, 'Models', best_model)) 168 | 169 | # Save model 170 | logger.info('DNN training finished') 171 | cur_time = str(datetime.datetime.now()).replace(':', '-') \ 172 | .replace(' ', '_') 173 | model_name = 'Emb_' + cur_time + '.h5' 174 | self.model.save(os.path.join(self.BASE, 'Models', model_name)) 175 | vocab_name = 'Emb_' + cur_time + '.pkl' 176 | with open(os.path.join(self.BASE, 'Models', vocab_name), 'wb') as vocab_file: 177 | pickle.dump([self.vocab, self.word_freq], vocab_file) 178 | 179 | return model_name 180 | 181 | 182 | def sum_word_embeddings(self, text): 183 | tokens = self.tokenize_text([text]) 184 | X = 
self.transform_texts(tokens)[0] 185 | 186 | embed = numpy.zeros(self.EMBED_DIM) 187 | embeddings = self.model.layers[1].get_weights()[0] 188 | 189 | for (i, word) in enumerate(X): 190 | embed += embeddings[word] 191 | embed = np_utils.normalize(embed)[0] 192 | 193 | return embed 194 | 195 | 196 | def last_hidden_state(self, text): 197 | if self.prediction_model is None: 198 | self.prediction_model = Model(inputs=self.model.input, 199 | outputs=self.model.layers[-3].output) 200 | 201 | tokens = self.tokenize_text([text]) 202 | indexes = self.transform_texts(tokens)[0] 203 | seq_length = self.model.layers[1].input_length 204 | 205 | while len(indexes) < seq_length: 206 | indexes.append(0) 207 | indexes = indexes[:seq_length] 208 | 209 | X = numpy.array([indexes]) 210 | return self.prediction_model.predict(X)[0] 211 | 212 | 213 | def predict(self, text): 214 | if self.prediction_model is None: 215 | self.prediction_model = Model(inputs=self.model.input, 216 | outputs=self.model.output) 217 | 218 | tokens = self.tokenize_text([text]) 219 | indexes = self.transform_texts(tokens)[0] 220 | seq_length = self.model.layers[1].input_length 221 | 222 | while len(indexes) < seq_length: 223 | indexes.append(0) 224 | indexes = indexes[:seq_length] 225 | 226 | X = numpy.array([indexes]) 227 | return self.prediction_model.predict(X) 228 | 229 | 230 | def initialise_embeddings(self, filename): 231 | logger = logging.getLogger('TrainingLog') 232 | weights = numpy.random.uniform(size=(len(self.vocab) + 1, 233 | self.EMBED_DIM), 234 | low=-0.05, high=0.05) 235 | 236 | with io.open(filename, 'r', encoding='utf-8') as vectors: 237 | for vector in vectors: 238 | tokens = vector.split(' ') 239 | word = tokens[0] 240 | embed = [float(val) for val in tokens[1:]] 241 | 242 | if word not in self.vocab: 243 | continue 244 | weights[self.vocab[word]] = numpy.array(embed) 245 | logger.info('{} vectors initialised'.format(len(self.vocab))) 246 | 247 | return weights 248 | -------------------------------------------------------------------------------- /featureExtractor/feature_extractor.py: -------------------------------------------------------------------------------- 1 | from featureExtractor.dnn_features import DNNFeatures 2 | from featureExtractor.graph_features import GraphFeatures 3 | from featureExtractor.ngram_features import NGramFeatures 4 | 5 | 6 | class FeatureExtractor: 7 | 8 | def __init__(self, CONFIG): 9 | self.METHOD = CONFIG['METHOD'] 10 | 11 | if 'hs' in self.METHOD or 'ws' in self.METHOD: 12 | self.dnn = DNNFeatures(CONFIG) 13 | if 'n' in self.METHOD: 14 | self.ngram = NGramFeatures(CONFIG) 15 | if 'a' in self.METHOD: 16 | self.graph = GraphFeatures(CONFIG) 17 | 18 | 19 | def extract_features(self, text, text_id=None): 20 | features = [] 21 | 22 | if 'hs' in self.METHOD or 'ws' in self.METHOD: 23 | self.get_dnn_features(features, text) 24 | if 'n' in self.METHOD: 25 | self.get_ngram_features(features, text) 26 | if 'a' in self.METHOD: 27 | self.get_graph_features(features, text_id) 28 | 29 | return features 30 | 31 | 32 | def get_dnn_features(self, features, text): 33 | if 'ws' in self.METHOD: 34 | features += self.dnn.sum_word_embeddings(text).tolist() 35 | else: 36 | features += self.dnn.last_hidden_state(text).tolist() 37 | 38 | 39 | def get_ngram_features(self, features, text): 40 | features += self.ngram.extract(text).tolist() 41 | 42 | 43 | def get_graph_features(self, features, text_id): 44 | features += self.graph.extract(text_id).tolist() 45 | 
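For orientation, here is a minimal usage sketch of the `FeatureExtractor` class above; it is not part of the repository. It assumes a previously saved n-gram model in _Models/_ and the author files _resources/authors.txt_ and _resources/authors.emb_ already exist; the method string `'na'`, the model filename, and the tweet ID are illustrative placeholders.

```python
# Minimal usage sketch of FeatureExtractor (illustrative only; not part of the repo).
# Assumes Models/ holds a previously saved TfidfVectorizer and resources/ holds
# authors.txt and authors.emb; the file name and tweet ID below are placeholders.
from featureExtractor.feature_extractor import FeatureExtractor

CONFIG = {
    'METHOD': 'na',        # 'n' -> char n-gram features, 'a' -> author (graph) embeddings
    'BASE': '.',           # repository root containing Models/ and resources/
    'NGRAM_MODEL': 'NGramModel_2018-03-04_12-00-00.000000.pkl',  # placeholder file name
    'TF_USE_IDF': True,
    'TF_NRANGE': (1, 4),
    'TF_SUBLIN': True,
    'TF_MAX_FEAT': 5000,
}

extractor = FeatureExtractor(CONFIG)
# One flat feature vector per tweet: tf-idf n-gram values followed by the author's
# node2vec embedding (all zeros if the tweet's author is unknown).
features = extractor.extract_features('example tweet text', text_id='572342978255048705')
print(len(features))
```

Note that the extractor only inspects the method string for 'n', 'a', 'ws' and 'hs'; the 'l' flag is consumed later by MainClassifier, which switches from LightGBM to logistic regression when it is present.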
-------------------------------------------------------------------------------- /featureExtractor/graph_features.py: -------------------------------------------------------------------------------- 1 | import numpy 2 | import os 3 | 4 | 5 | class GraphFeatures: 6 | 7 | def __init__(self, CONFIG): 8 | self.BASE = CONFIG['BASE'] 9 | self.EMBED_DIM = 200 10 | 11 | self.authors = {} 12 | with open(os.path.join(self.BASE, 'resources', 'authors.txt')) as authors: 13 | for line in authors.readlines(): 14 | text_id, author_id = line.strip().split() 15 | self.authors[text_id] = author_id 16 | 17 | self.embeddings = {} 18 | with open(os.path.join(self.BASE, 'resources', 'authors.emb')) as embeds: 19 | for line in embeds.readlines(): 20 | tokens = line.strip().split() 21 | author_id = tokens[0] 22 | embed = [float(x) for x in tokens[1:]] 23 | self.embeddings[author_id] = numpy.array(embed) 24 | 25 | 26 | def extract(self, text_id): 27 | author_id = self.authors.get(text_id, None) 28 | if author_id is None: 29 | return numpy.zeros(self.EMBED_DIM) 30 | 31 | return self.embeddings.get(author_id, numpy.zeros(self.EMBED_DIM)) 32 | -------------------------------------------------------------------------------- /featureExtractor/ngram_features.py: -------------------------------------------------------------------------------- 1 | from sklearn.externals import joblib 2 | from sklearn.feature_extraction.text import TfidfVectorizer 3 | 4 | import datetime 5 | import logging 6 | import os 7 | 8 | 9 | class NGramFeatures: 10 | 11 | def __init__(self, CONFIG): 12 | self.USE_IDF = CONFIG['TF_USE_IDF'] 13 | self.NRANGE = CONFIG['TF_NRANGE'] 14 | self.SUBLIN = CONFIG['TF_SUBLIN'] 15 | self.MAX_FEAT = CONFIG['TF_MAX_FEAT'] 16 | self.BASE = CONFIG['BASE'] 17 | 18 | self.model = None 19 | if CONFIG['NGRAM_MODEL'] is not None: 20 | self.model = joblib.load(os.path.join(self.BASE, 'Models', 21 | CONFIG['NGRAM_MODEL'])) 22 | 23 | 24 | def extract(self, text): 25 | return self.model.transform([text]).toarray()[0] 26 | 27 | 28 | def train(self, all_texts): 29 | self.model = TfidfVectorizer(analyzer='char', 30 | ngram_range=self.NRANGE, 31 | max_features=self.MAX_FEAT, 32 | use_idf=self.USE_IDF, 33 | sublinear_tf=self.SUBLIN) 34 | self.model.fit(all_texts) 35 | 36 | # Save N-gram vocabulary 37 | cur_time = str(datetime.datetime.now()).replace(':', '-') \ 38 | .replace(' ', '_') 39 | model_name = 'NGramModel_' + cur_time + '.pkl' 40 | joblib.dump(self.model, os.path.join(self.BASE, 'Models', model_name)) 41 | 42 | logger = logging.getLogger('TrainingLog') 43 | logger.info('N-gram vectorization finished with vocabulary' 44 | ' size {}'.format(len(self.model.vocabulary_))) 45 | 46 | return model_name 47 | -------------------------------------------------------------------------------- /grid_search.py: -------------------------------------------------------------------------------- 1 | from featureExtractor.feature_extractor import FeatureExtractor 2 | from sklearn.model_selection import GridSearchCV 3 | from sklearn.linear_model import LogisticRegression 4 | from lightgbm import LGBMClassifier 5 | 6 | import coloredlogs 7 | import logging 8 | import numpy 9 | 10 | 11 | logger = logging.getLogger('GridSearchLog') 12 | coloredlogs.install(logger=logger, level='DEBUG', 13 | fmt='%(asctime)s - %(name)s - %(levelname)s' 14 | ' - %(message)s') 15 | 16 | 17 | def gbc_details(): 18 | classifier = LGBMClassifier(silent=False) 19 | parameters = {'num_leaves': [15, 31, 63, 127], 20 | 'min_child_weight': [1, 5, 7, 10, 20], 21 | 
'min_child_samples': [1, 5, 10, 15, 20], 22 | 'learning_rate': [0.01, 0.05, 0.08, 0.1, 0.25], 23 | 'n_estimators': [80, 100, 125, 150, 200]} 24 | return (classifier, parameters) 25 | 26 | 27 | def lr_details(): 28 | classifier = LogisticRegression(verbose=True, max_iter=1000) 29 | parameters = {'C': [0.01, 0.1, 0.25, 0.5, 0.75, 30 | 1.0, 10.0, 25.0, 50.0, 100.0]} 31 | return (classifier, parameters) 32 | 33 | 34 | def perform_grid_search(texts_ids, all_texts, classes, args, CONFIG): 35 | estimator = args[0] 36 | size = CONFIG['GRID_SEARCH_SIZE'] 37 | CONFIG['EMB_MODEL'] = args[1] 38 | CONFIG['NGRAM_MODEL'] = args[2] 39 | 40 | feature_extractor = FeatureExtractor(CONFIG) 41 | (classifier, parameters) = eval(estimator + '_details' + '()') 42 | 43 | data = [] 44 | for (i, text) in enumerate(all_texts[:size]): 45 | features = feature_extractor.extract_features(text, texts_ids[i]) 46 | data.append(features) 47 | 48 | if i % 1000 == 0 and i > 0: 49 | logger.info('{} of {} feature vectors prepared ' 50 | 'for grid search'.format(i + 1, size)) 51 | data = numpy.array(data) 52 | categories = numpy.array(classes[:size]) 53 | 54 | clf = GridSearchCV(classifier, parameters, cv=5) 55 | clf.fit(data, categories) 56 | 57 | logger.info('Grid search results:\n{}'.format(clf.cv_results_)) 58 | logger.info('Best param set: {}'.format(clf.best_params_)) 59 | -------------------------------------------------------------------------------- /main_classifier.py: -------------------------------------------------------------------------------- 1 | from featureExtractor.dnn_features import DNNFeatures 2 | from featureExtractor.feature_extractor import FeatureExtractor 3 | from featureExtractor.ngram_features import NGramFeatures 4 | from sklearn.linear_model import LogisticRegression 5 | from sklearn.externals import joblib 6 | 7 | import coloredlogs 8 | import copy 9 | import datetime 10 | import lightgbm 11 | import logging 12 | import numpy 13 | import os 14 | 15 | 16 | logger = logging.getLogger('TrainingLog') 17 | coloredlogs.install(logger=logger, level='DEBUG', 18 | fmt='%(asctime)s - %(name)s - %(levelname)s' 19 | ' - %(message)s') 20 | 21 | 22 | class MainClassifier: 23 | 24 | def __init__(self, CONFIG): 25 | self.CONFIG = copy.deepcopy(CONFIG) 26 | self.BASE = CONFIG['BASE'] 27 | 28 | self. 
featureExtract = None 29 | self.classifier = None 30 | if CONFIG['CLASSIFIER'] is not None: 31 | self.classifier = joblib.load(os.path.join(self.BASE, 'Models', 32 | CONFIG['CLASSIFIER'])) 33 | 34 | 35 | def train(self, text_ids, all_texts, classes): 36 | logger = logging.getLogger('TrainingLog') 37 | logger.info('Initiating training of main classifier') 38 | 39 | # Prepare feature extractor 40 | if self.CONFIG['EMB_MODEL'] is None and \ 41 | ('ws' in self.CONFIG['METHOD'] or 'hs' in self.CONFIG['METHOD']): 42 | self.CONFIG['EMB_MODEL'] = \ 43 | DNNFeatures(self.CONFIG).train(all_texts, classes) 44 | 45 | if self.CONFIG['NGRAM_MODEL'] is None and 'n' in self.CONFIG['METHOD']: 46 | self.CONFIG['NGRAM_MODEL'] = \ 47 | NGramFeatures(self.CONFIG).train(all_texts) 48 | 49 | self.featureExtract = FeatureExtractor(self.CONFIG) 50 | logger.info('Feature extractor ready') 51 | 52 | # Prepare data 53 | data = [] 54 | for (i, text) in enumerate(all_texts): 55 | features = self.featureExtract.extract_features(text, text_ids[i]) 56 | data.append(features) 57 | 58 | if i % 1000 == 0 and i > 0: 59 | logger.info('{} of {} feature vectors prepared ' 60 | 'for training'.format(i + 1, len(all_texts))) 61 | train_X, train_Y = numpy.array(data), numpy.array(classes) 62 | 63 | # Train classifier 64 | train_data = lightgbm.Dataset(train_X, train_Y) 65 | params = { 66 | 'learning_rate': self.CONFIG['GB_LEARN_RATE'], 67 | 'num_leaves': self.CONFIG['GB_LEAVES'], 68 | 'min_child_weight': self.CONFIG['GB_LEAF_WEIGHT'], 69 | 'min_child_samples': self.CONFIG['GB_LEAF_SAMPLES'], 70 | 'objective': 'multiclass', 71 | 'num_class': len(set(classes)), 72 | 'metric': {'multi_logloss'}, 73 | } 74 | if 'l' not in self.CONFIG['METHOD']: 75 | self.classifier = lightgbm.train(params, train_data, 76 | self.CONFIG['GB_ITERATIONS']) 77 | else: 78 | self.classifier = LogisticRegression(C=self.CONFIG['LR_C']) 79 | self.classifier.fit(train_X, train_Y) 80 | 81 | # Save classifier 82 | cur_time = str(datetime.datetime.now()).replace(':', '-') \ 83 | .replace(' ', '_') 84 | self.CONFIG['CLASSIFIER'] = 'Classifier_' + cur_time + '.pkl' 85 | joblib.dump(self.classifier, os.path.join(self.BASE, 'Models', 86 | self.CONFIG['CLASSIFIER'])) 87 | 88 | logger = logging.getLogger('TrainingLog') 89 | logger.info('Main classifier training finished') 90 | 91 | return self.CONFIG['CLASSIFIER'] 92 | 93 | 94 | def classify(self, text_id, text, prob=False): 95 | # Prepare classifier 96 | if self.classifier is None: 97 | logger = logging.getLogger('TrainingLog') 98 | models = os.listdir(os.path.join(self.BASE, 'Models')) 99 | models.sort(reverse=True) 100 | 101 | for model in models: 102 | if model.startswith('Classifier') and model.endswith('.pkl'): 103 | self.CONFIG['CLASSIFIER'] = model 104 | break 105 | 106 | logger.info('Using Classifier Model {}' 107 | .format(self.CONFIG['CLASSIFIER'])) 108 | self.classifier = joblib.load(os.path.join(self.BASE, 'Models', 109 | self.CONFIG['CLASSIFIER'])) 110 | 111 | # Prepare feature extractor 112 | if self.featureExtract is None: 113 | logger = logging.getLogger('TrainingLog') 114 | models = os.listdir(os.path.join(self.BASE, 'Models')) 115 | models.sort(reverse=True) 116 | 117 | if self.CONFIG['EMB_MODEL'] is None: 118 | for model in models: 119 | if model.startswith('Emb_') and model.endswith('.h5'): 120 | self.CONFIG['EMB_MODEL'] = model 121 | break 122 | 123 | if self.CONFIG['NGRAM_MODEL'] is None: 124 | for model in models: 125 | if model.startswith('NGram') and model.endswith('.pkl'): 126 | 
self.CONFIG['NGRAM_MODEL'] = model 127 | break 128 | 129 | logger.info('Using Embedding Model {} and N-gram Model {}' 130 | .format(self.CONFIG['EMB_MODEL'], 131 | self.CONFIG['NGRAM_MODEL'])) 132 | 133 | self.featureExtract = FeatureExtractor(self.CONFIG) 134 | logger.info('Feature extractor ready') 135 | 136 | # Classify 137 | features = self.featureExtract.extract_features(text, text_id) 138 | features = numpy.array([features]) 139 | if isinstance(self.classifier, LogisticRegression): 140 | prediction = self.classifier.predict_proba(features)[0].tolist() 141 | else: prediction = self.classifier.predict(features)[0].tolist() 142 | 143 | if prob: 144 | return (prediction.index(max(prediction)), prediction) 145 | return prediction.index(max(prediction)) 146 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | boto==2.48.0 2 | bz2file==0.98 3 | certifi==2017.7.27.1 4 | chardet==3.0.4 5 | coloredlogs==7.3 6 | h5py==2.7.1 7 | humanfriendly==4.4.1 8 | idna==2.6 9 | Keras==2.0.8 10 | lightgbm==2.0.10 11 | nltk==3.2.5 12 | numpy==1.13.3 13 | requests==2.18.4 14 | scikit-learn==0.19.1 15 | scipy==1.0.0 16 | six==1.11.0 17 | sklearn==0.0 18 | smart-open==1.5.3 19 | Theano==0.9.0 20 | urllib3==1.22 21 | -------------------------------------------------------------------------------- /resources/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pushkarmishra/AuthorProfilingAbuseDetection/6322467b26f53aca7d231c0ab92182879b9375ff/resources/__init__.py -------------------------------------------------------------------------------- /resources/node2vec/.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | .DS_Store 3 | target 4 | bin 5 | build 6 | .gradle 7 | *.iml 8 | *.ipr 9 | *.iws 10 | *.log 11 | .classpath 12 | .project 13 | .settings 14 | .idea -------------------------------------------------------------------------------- /resources/node2vec/LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Aditya Grover 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /resources/node2vec/README.md: -------------------------------------------------------------------------------- 1 | # node2vec 2 | 3 | This repository provides a reference implementation of *node2vec* as described in the paper:
4 | > node2vec: Scalable Feature Learning for Networks.
5 | > Aditya Grover and Jure Leskovec.
6 | > Knowledge Discovery and Data Mining, 2016.
7 | > 8 | 9 | The *node2vec* algorithm learns continuous representations for nodes in any (un)directed, (un)weighted graph. Please check the [project page](https://snap.stanford.edu/node2vec/) for more details. 10 | 11 | ### Basic Usage 12 | 13 | #### Example 14 | To run *node2vec* on Zachary's karate club network, execute the following command from the project home directory:
15 | ``python src/main.py --input graph/karate.edgelist --output emb/karate.emd`` 16 | 17 | #### Options 18 | You can check out the other options available to use with *node2vec* using:
19 | ``python src/main.py --help`` 20 | 21 | #### Input 22 | The supported input format is an edgelist: 23 | 24 | node1_id_int node2_id_int 25 | 26 | The graph is assumed to be undirected and unweighted by default. These options can be changed by setting the appropriate flags. 27 | 28 | #### Output 29 | The output file has *n+1* lines for a graph with *n* vertices. 30 | The first line has the following format: 31 | 32 | num_of_nodes dim_of_representation 33 | 34 | The next *n* lines are as follows: 35 | 36 | node_id dim1 dim2 ... dimd 37 | 38 | where dim1, ... , dimd is the *d*-dimensional representation learned by *node2vec*. 39 | 40 | ### Citing 41 | If you find *node2vec* useful for your research, please consider citing the following paper: 42 | 43 | @inproceedings{node2vec-kdd2016, 44 | author = {Grover, Aditya and Leskovec, Jure}, 45 | title = {node2vec: Scalable Feature Learning for Networks}, 46 | booktitle = {Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, 47 | year = {2016} 48 | } 49 | 50 | 51 | ### Miscellaneous 52 | 53 | Please send any questions you might have about the code and/or the algorithm to . 54 | 55 | *Note:* This is only a reference implementation of the *node2vec* algorithm and could benefit from several performance enhancement schemes, some of which are discussed in the paper. 56 | -------------------------------------------------------------------------------- /resources/node2vec/requirements.txt: -------------------------------------------------------------------------------- 1 | networkx==1.11 2 | numpy==1.11.2 3 | gensim==0.13.3 4 | -------------------------------------------------------------------------------- /resources/node2vec/src/main.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Reference implementation of node2vec. 3 | 4 | Author: Aditya Grover 5 | 6 | For more details, refer to the paper: 7 | node2vec: Scalable Feature Learning for Networks 8 | Aditya Grover and Jure Leskovec 9 | Knowledge Discovery and Data Mining (KDD), 2016 10 | ''' 11 | 12 | import argparse 13 | import numpy as np 14 | import networkx as nx 15 | import node2vec 16 | from gensim.models import Word2Vec 17 | 18 | def parse_args(): 19 | ''' 20 | Parses the node2vec arguments. 21 | ''' 22 | parser = argparse.ArgumentParser(description="Run node2vec.") 23 | 24 | parser.add_argument('--input', nargs='?', default='graph/karate.edgelist', 25 | help='Input graph path') 26 | 27 | parser.add_argument('--output', nargs='?', default='emb/karate.emb', 28 | help='Embeddings path') 29 | 30 | parser.add_argument('--dimensions', type=int, default=128, 31 | help='Number of dimensions. Default is 128.') 32 | 33 | parser.add_argument('--walk-length', type=int, default=80, 34 | help='Length of walk per source. Default is 80.') 35 | 36 | parser.add_argument('--num-walks', type=int, default=10, 37 | help='Number of walks per source. Default is 10.') 38 | 39 | parser.add_argument('--window-size', type=int, default=10, 40 | help='Context size for optimization. Default is 10.') 41 | 42 | parser.add_argument('--iter', default=1, type=int, 43 | help='Number of epochs in SGD') 44 | 45 | parser.add_argument('--workers', type=int, default=8, 46 | help='Number of parallel workers. Default is 8.') 47 | 48 | parser.add_argument('--p', type=float, default=1, 49 | help='Return hyperparameter. Default is 1.') 50 | 51 | parser.add_argument('--q', type=float, default=1, 52 | help='Inout hyperparameter. 
Default is 1.') 53 | 54 | parser.add_argument('--weighted', dest='weighted', action='store_true', 55 | help='Boolean specifying (un)weighted. Default is unweighted.') 56 | parser.add_argument('--unweighted', dest='unweighted', action='store_false') 57 | parser.set_defaults(weighted=False) 58 | 59 | parser.add_argument('--directed', dest='directed', action='store_true', 60 | help='Graph is (un)directed. Default is undirected.') 61 | parser.add_argument('--undirected', dest='undirected', action='store_false') 62 | parser.set_defaults(directed=False) 63 | 64 | return parser.parse_args() 65 | 66 | def read_graph(): 67 | ''' 68 | Reads the input network in networkx. 69 | ''' 70 | if args.weighted: 71 | G = nx.read_edgelist(args.input, nodetype=int, data=(('weight',float),), create_using=nx.DiGraph()) 72 | else: 73 | G = nx.read_edgelist(args.input, nodetype=int, create_using=nx.DiGraph()) 74 | for edge in G.edges(): 75 | G[edge[0]][edge[1]]['weight'] = 1 76 | 77 | if not args.directed: 78 | G = G.to_undirected() 79 | 80 | return G 81 | 82 | def learn_embeddings(walks): 83 | ''' 84 | Learn embeddings by optimizing the Skipgram objective using SGD. 85 | ''' 86 | walks = [list(map(str, walk)) for walk in walks] 87 | model = Word2Vec(walks, size=args.dimensions, window=args.window_size, min_count=0, sg=1, workers=args.workers, iter=args.iter) 88 | model.wv.save_word2vec_format(args.output) 89 | 90 | return 91 | 92 | def main(args): 93 | ''' 94 | Pipeline for representational learning for all nodes in a graph. 95 | ''' 96 | nx_G = read_graph() 97 | G = node2vec.Graph(nx_G, args.directed, args.p, args.q) 98 | G.preprocess_transition_probs() 99 | walks = G.simulate_walks(args.num_walks, args.walk_length) 100 | learn_embeddings(walks) 101 | 102 | if __name__ == "__main__": 103 | args = parse_args() 104 | main(args) 105 | -------------------------------------------------------------------------------- /resources/node2vec/src/node2vec.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import networkx as nx 3 | import random 4 | 5 | 6 | class Graph(): 7 | def __init__(self, nx_G, is_directed, p, q): 8 | self.G = nx_G 9 | self.is_directed = is_directed 10 | self.p = p 11 | self.q = q 12 | 13 | def node2vec_walk(self, walk_length, start_node): 14 | ''' 15 | Simulate a random walk starting from start node. 16 | ''' 17 | G = self.G 18 | alias_nodes = self.alias_nodes 19 | alias_edges = self.alias_edges 20 | 21 | walk = [start_node] 22 | 23 | while len(walk) < walk_length: 24 | cur = walk[-1] 25 | cur_nbrs = sorted(G.neighbors(cur)) 26 | if len(cur_nbrs) > 0: 27 | if len(walk) == 1: 28 | walk.append(cur_nbrs[alias_draw(alias_nodes[cur][0], alias_nodes[cur][1])]) 29 | else: 30 | prev = walk[-2] 31 | next = cur_nbrs[alias_draw(alias_edges[(prev, cur)][0], 32 | alias_edges[(prev, cur)][1])] 33 | walk.append(next) 34 | else: 35 | break 36 | 37 | return walk 38 | 39 | def simulate_walks(self, num_walks, walk_length): 40 | ''' 41 | Repeatedly simulate random walks from each node. 42 | ''' 43 | G = self.G 44 | walks = [] 45 | nodes = list(G.nodes()) 46 | print('Walk iteration:') 47 | for walk_iter in range(num_walks): 48 | print(str(walk_iter+1), '/', str(num_walks)) 49 | random.shuffle(nodes) 50 | for node in nodes: 51 | walks.append(self.node2vec_walk(walk_length=walk_length, start_node=node)) 52 | 53 | return walks 54 | 55 | def get_alias_edge(self, src, dst): 56 | ''' 57 | Get the alias edge setup lists for a given edge. 
58 | ''' 59 | G = self.G 60 | p = self.p 61 | q = self.q 62 | 63 | unnormalized_probs = [] 64 | for dst_nbr in sorted(G.neighbors(dst)): 65 | if dst_nbr == src: 66 | unnormalized_probs.append(G[dst][dst_nbr]['weight']/p) 67 | elif G.has_edge(dst_nbr, src): 68 | unnormalized_probs.append(G[dst][dst_nbr]['weight']) 69 | else: 70 | unnormalized_probs.append(G[dst][dst_nbr]['weight']/q) 71 | norm_const = sum(unnormalized_probs) 72 | normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs] 73 | 74 | return alias_setup(normalized_probs) 75 | 76 | def preprocess_transition_probs(self): 77 | ''' 78 | Preprocessing of transition probabilities for guiding the random walks. 79 | ''' 80 | G = self.G 81 | is_directed = self.is_directed 82 | 83 | alias_nodes = {} 84 | for node in G.nodes(): 85 | unnormalized_probs = [G[node][nbr]['weight'] for nbr in sorted(G.neighbors(node))] 86 | norm_const = sum(unnormalized_probs) 87 | normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs] 88 | alias_nodes[node] = alias_setup(normalized_probs) 89 | 90 | alias_edges = {} 91 | triads = {} 92 | 93 | if is_directed: 94 | for edge in G.edges(): 95 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1]) 96 | else: 97 | for edge in G.edges(): 98 | alias_edges[edge] = self.get_alias_edge(edge[0], edge[1]) 99 | alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0]) 100 | 101 | self.alias_nodes = alias_nodes 102 | self.alias_edges = alias_edges 103 | 104 | return 105 | 106 | 107 | def alias_setup(probs): 108 | ''' 109 | Compute utility lists for non-uniform sampling from discrete distributions. 110 | Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/ 111 | for details 112 | ''' 113 | K = len(probs) 114 | q = np.zeros(K) 115 | J = np.zeros(K, dtype=np.int) 116 | 117 | smaller = [] 118 | larger = [] 119 | for kk, prob in enumerate(probs): 120 | q[kk] = K*prob 121 | if q[kk] < 1.0: 122 | smaller.append(kk) 123 | else: 124 | larger.append(kk) 125 | 126 | while len(smaller) > 0 and len(larger) > 0: 127 | small = smaller.pop() 128 | large = larger.pop() 129 | 130 | J[small] = large 131 | q[large] = q[large] + q[small] - 1.0 132 | if q[large] < 1.0: 133 | smaller.append(large) 134 | else: 135 | larger.append(large) 136 | 137 | return J, q 138 | 139 | def alias_draw(J, q): 140 | ''' 141 | Draw sample from a non-uniform discrete distribution using alias sampling. 
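        A bin kk is chosen uniformly at random from the K bins; with
        probability q[kk] the sample is kk itself, otherwise its alias J[kk]
        is returned. Combined with the O(K) setup in alias_setup, each draw
        takes O(1) time.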
142 | ''' 143 | K = len(J) 144 | 145 | kk = int(np.floor(np.random.rand()*K)) 146 | if np.random.rand() < q[kk]: 147 | return kk 148 | else: 149 | return J[kk] -------------------------------------------------------------------------------- /resources/stopwords.txt: -------------------------------------------------------------------------------- 1 | a 2 | about 3 | above 4 | accordingly 5 | across 6 | after 7 | again 8 | all 9 | almost 10 | alone 11 | along 12 | already 13 | also 14 | although 15 | altogether 16 | always 17 | am 18 | among 19 | amongst 20 | an 21 | and 22 | any 23 | another 24 | anybody 25 | anyone 26 | anything 27 | anyway 28 | anyways 29 | anywhere 30 | are 31 | around 32 | as 33 | ask 34 | at 35 | away 36 | b 37 | back 38 | be 39 | because 40 | become 41 | been 42 | before 43 | began 44 | begin 45 | begun 46 | behind 47 | below 48 | between 49 | both 50 | but 51 | by 52 | c 53 | can 54 | cannot 55 | cant 56 | can't 57 | certain 58 | certainly 59 | clear 60 | clearly 61 | could 62 | couldnt 63 | couldn't 64 | d 65 | despite 66 | did 67 | didnt 68 | didn't 69 | do 70 | does 71 | doesnt 72 | doesn't 73 | done 74 | dont 75 | don't 76 | down 77 | due 78 | e 79 | each 80 | earlier 81 | either 82 | end 83 | enough 84 | especially 85 | even 86 | evenly 87 | ever 88 | every 89 | everybody 90 | everyone 91 | everything 92 | everywhere 93 | example 94 | except 95 | f 96 | final 97 | find 98 | first 99 | for 100 | from 101 | full 102 | fully 103 | further 104 | furthermore 105 | g 106 | gave 107 | generate 108 | get 109 | given 110 | go 111 | got 112 | h 113 | ha 114 | had 115 | hadnt 116 | hadn't 117 | hasnt 118 | hasn't 119 | have 120 | havent 121 | haven't 122 | he 123 | hence 124 | her 125 | here 126 | herself 127 | hi 128 | him 129 | himself 130 | how 131 | however 132 | i 133 | if 134 | import 135 | in 136 | into 137 | is 138 | isnt 139 | isn't 140 | it 141 | its 142 | itself 143 | j 144 | just 145 | k 146 | keep 147 | l 148 | last 149 | later 150 | least 151 | less 152 | let 153 | ll 154 | m 155 | many 156 | may 157 | me 158 | might 159 | more 160 | most 161 | mostly 162 | mr 163 | much 164 | must 165 | mustnt 166 | mustn't 167 | my 168 | myself 169 | n 170 | necessary 171 | neither 172 | next 173 | no 174 | nobody 175 | nothing 176 | now 177 | nowhere 178 | number 179 | o 180 | of 181 | off 182 | often 183 | on 184 | once 185 | only 186 | onto 187 | open 188 | or 189 | other 190 | otherwise 191 | ought 192 | our 193 | ourself 194 | ourselves 195 | out 196 | over 197 | own 198 | p 199 | per 200 | perhaps 201 | possible 202 | possibly 203 | q 204 | r 205 | rather 206 | re 207 | really 208 | right 209 | rt 210 | s 211 | said 212 | same 213 | seem 214 | shall 215 | shant 216 | shan't 217 | she 218 | should 219 | shouldnt 220 | shouldn't 221 | since 222 | so 223 | some 224 | somebody 225 | someone 226 | something 227 | somethings 228 | somewhere 229 | still 230 | such 231 | sure 232 | t 233 | taken 234 | than 235 | that 236 | the 237 | their 238 | them 239 | themselve 240 | themselves 241 | then 242 | there 243 | therefore 244 | these 245 | they 246 | thing 247 | think 248 | this 249 | those 250 | though 251 | through 252 | thus 253 | to 254 | today 255 | together 256 | too 257 | took 258 | toward 259 | towards 260 | turn 261 | u 262 | under 263 | until 264 | up 265 | upon 266 | us 267 | v 268 | ve 269 | very 270 | w 271 | want 272 | was 273 | wasnt 274 | wasn't 275 | way 276 | we 277 | well 278 | went 279 | were 280 | what 281 | whatsoever 282 | when 283 | where 284 | whereas 
285 | wherever 286 | whether 287 | why 288 | which 289 | while 290 | who 291 | whole 292 | whom 293 | whose 294 | will 295 | with 296 | within 297 | without 298 | would 299 | wouldnt 300 | wouldn't 301 | wont 302 | won't 303 | x 304 | y 305 | ya 306 | year 307 | yes 308 | yet 309 | you 310 | your 311 | yours 312 | yourself 313 | yourselves 314 | z 315 | -------------------------------------------------------------------------------- /resources/structural.py: -------------------------------------------------------------------------------- 1 | from nltk import word_tokenize 2 | from nltk import sent_tokenize 3 | from nltk import PorterStemmer 4 | 5 | import re 6 | 7 | 8 | def word_stem(token): 9 | stem = PorterStemmer() 10 | return stem.stem(token) 11 | 12 | 13 | def word_tokenizer(text): 14 | return word_tokenize(text) 15 | 16 | 17 | def remove_non_words(all_words): 18 | only_words = [] 19 | pattern = re.compile('[a-zA-Z]+') 20 | 21 | for word in all_words: 22 | if pattern.match(word) != None: 23 | only_words.append(word) 24 | return only_words 25 | 26 | 27 | def sentence_tokenizer(text): 28 | return sent_tokenize(text) 29 | -------------------------------------------------------------------------------- /resources/textual.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | 4 | 5 | BASE_DIR = os.path.dirname(os.path.abspath(__file__)) 6 | stop_words = set(open(os.path.join(BASE_DIR, 'stopwords.txt'), 'r').read().split()) 7 | 8 | 9 | def clean_tweet(text): 10 | space_pattern = '\\s+' 11 | giant_url_regex = ('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|' 12 | '[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+') 13 | mention_regex = '@[\w\-]+' 14 | rt_regex = '\\b[Rr][Tt]\\b' 15 | 16 | cleaned_tweet = re.sub(giant_url_regex, '_URL_', text) 17 | cleaned_tweet = re.sub(mention_regex, '_MTN_', cleaned_tweet) 18 | cleaned_tweet = re.sub(rt_regex, '', cleaned_tweet) 19 | cleaned_tweet = re.sub(space_pattern, ' ', cleaned_tweet) 20 | 21 | return cleaned_tweet 22 | 23 | 24 | def clean_detox(text): 25 | space_pattern = '\\s+' 26 | giant_url_regex = ('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|' 27 | '[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+') 28 | line_token_pattern = 'NEWLINE_TOKEN' 29 | 30 | cleaned_text = re.sub(giant_url_regex, '_URL_', text) 31 | cleaned_text = re.sub(line_token_pattern, ' ', cleaned_text) 32 | cleaned_text = re.sub(space_pattern, ' ', cleaned_text) 33 | 34 | return cleaned_text 35 | 36 | 37 | def process_words(text): 38 | space_pattern = '\\s+' 39 | text = re.sub(space_pattern, ' ', text) 40 | 41 | words = text.split(' ') 42 | text = [] 43 | for word in words: 44 | word = word.lower() 45 | if word not in stop_words: 46 | text.append(word) 47 | 48 | return ' '.join(text) 49 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | from main_classifier import MainClassifier 2 | from sklearn.metrics import accuracy_score 3 | from sklearn.metrics import classification_report 4 | from sklearn.metrics import confusion_matrix 5 | from sklearn.metrics import precision_recall_fscore_support 6 | from sklearn.metrics import roc_auc_score 7 | 8 | import coloredlogs 9 | import logging 10 | import numpy 11 | 12 | 13 | logger = logging.getLogger('TestLog') 14 | coloredlogs.install(logger=logger, level='DEBUG', 15 | fmt='%(asctime)s - %(name)s - %(levelname)s' 16 | ' - %(message)s') 17 | 18 | 19 | def one_hot(y): 20 | m = y.shape[0] 
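    # y is either a 1-D array of integer class labels or a 2-D matrix of
    # scores/indicators; both cases are converted to an (m, n) one-hot matrix,
    # e.g. one_hot(numpy.array([0, 2, 1])) -> [[1,0,0], [0,0,1], [0,1,0]].
    # compute_roc_auc below uses this form of the true labels for roc_auc_score.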
21 | 22 | if len(y.shape) == 1: 23 | n = len(set(y.ravel())) 24 | idxs = y.astype(int) 25 | else: 26 | idxs = y.argmax(axis=1) 27 | n = y.shape[1] 28 | 29 | y_oh = numpy.zeros((m, n)) 30 | y_oh[list(range(m)), idxs] = 1 31 | 32 | return y_oh 33 | 34 | 35 | def compute_roc_auc(classes, probs): 36 | classes_arr = one_hot(numpy.array(classes)) 37 | prob_arr = numpy.array(probs) 38 | 39 | return roc_auc_score(classes_arr, prob_arr, average='macro') 40 | 41 | 42 | def test(text_ids, texts, classes, classifier): 43 | classes_pred = [] 44 | probs = [] 45 | count_match = 0 46 | for (i, text) in enumerate(texts): 47 | (clazz, prob_score) = classifier.classify(text_ids[i], text, prob=True) 48 | probs.append(prob_score) 49 | classes_pred.append(clazz) 50 | if clazz == classes[i]: 51 | count_match += 1 52 | 53 | if i > 0 and i % 100 == 0: 54 | accuracy = (1.0 * count_match) / (i + 1) 55 | logger.info('{} samples classified. Accuracy up till ' 56 | 'now is {}'.format(i + 1, accuracy)) 57 | 58 | # Calculate metrics 59 | accuracy = (1.0 * count_match) / len(classes) 60 | report = classification_report(classes, classes_pred, digits=5) 61 | conf_matrix = confusion_matrix(classes, classes_pred) 62 | roc_auc = compute_roc_auc(classes, probs) 63 | 64 | # Log results 65 | logger.info('Total {} samples classified with accuracy ' 66 | '{}'.format(len(classes), accuracy)) 67 | logger.info('AUROC is {}'.format(roc_auc)) 68 | logger.info('Classification report:\n{}'.format(report)) 69 | logger.info('Confusion matrix:\n{}'.format(conf_matrix)) 70 | 71 | metrics = precision_recall_fscore_support(classes, classes_pred, 72 | average='weighted') 73 | metrics = [metrics[0], metrics[1], metrics[2], 74 | accuracy_score(classes, classes_pred)] 75 | 76 | return metrics 77 | -------------------------------------------------------------------------------- /twitter_access.py: -------------------------------------------------------------------------------- 1 | # from matplotlib import pyplot 2 | from networkx.drawing.nx_agraph import write_dot 3 | from tweepy import OAuthHandler 4 | 5 | import coloredlogs 6 | import csv 7 | import json 8 | import logging 9 | import networkx 10 | import os 11 | import time 12 | import tweepy 13 | 14 | 15 | logger = logging.getLogger('TwitterAccess') 16 | coloredlogs.install(logger=logger, level='DEBUG', 17 | fmt='%(asctime)s - %(name)s - %(levelname)s' 18 | ' - %(message)s') 19 | BASE_DIR = os.path.dirname(os.path.abspath(__file__)) 20 | SLEEP_TIME = 1000 21 | 22 | 23 | class TwitterAccess: 24 | 25 | def __init__(self): 26 | self.api = self.load_api() 27 | 28 | 29 | def load_api(self): 30 | consumer_key = '' 31 | consumer_secret = '' 32 | access_token = '' 33 | access_secret = '' 34 | auth = OAuthHandler(consumer_key, consumer_secret) 35 | auth.set_access_token(access_token, access_secret) 36 | 37 | # Load the twitter API via Tweepy 38 | return tweepy.API(auth) 39 | 40 | 41 | # Status Methods 42 | def tweet_text_from_tweet_id(self, idx): 43 | tweet = self.api.get_status(idx) 44 | return tweet.text 45 | 46 | 47 | # User Methods 48 | def get_followers(self, screen_name): 49 | user_ids = [] 50 | for page in tweepy.Cursor(self.api.followers_ids, 51 | screen_name=screen_name).pages(): 52 | user_ids.extend(page) 53 | time.sleep(60) 54 | 55 | return user_ids 56 | 57 | 58 | def user_from_tweet_id(self, idx): 59 | status = self.api.get_status(idx) 60 | return (status.user.id_str, status.user.screen_name) 61 | 62 | 63 | def get_follow_info(self, x, y): 64 | return self.api.show_friendship(source_id=x, 
target_id=y) 65 | 66 | 67 | def username_from_user_id(self, idx): 68 | user = self.api.get_user(user_id=idx) 69 | return user.screen_name 70 | 71 | 72 | def timeline_from_username(self, screen_name): 73 | timeline = self.api.user_timeline(screen_name=screen_name) 74 | return timeline 75 | 76 | 77 | class Graph: 78 | 79 | def __init__(self, tweet_ids=None, nodes_file=None, edges_file=None): 80 | self.TWEET_IDS = tweet_ids 81 | self.ACCESSOR = TwitterAccess() 82 | self.GRAPH = networkx.Graph() 83 | self.NODES = {} 84 | self.EDGES = set() 85 | 86 | if nodes_file is not None: 87 | with open(os.path.join(BASE_DIR, 'resources', nodes_file)) as nodes: 88 | self.NODES = json.load(nodes) 89 | else: 90 | self.prepare_nodes() 91 | 92 | if edges_file is not None: 93 | with open(os.path.join(BASE_DIR, 'resources', edges_file)) as edges: 94 | for line in edges.readlines(): 95 | x, y = line.strip().split(',') 96 | self.EDGES.add((x, y)) 97 | 98 | 99 | def prepare_nodes(self): 100 | if self.TWEET_IDS is None: 101 | return 102 | 103 | def fill_node_data(idx): 104 | user = self.ACCESSOR.user_from_tweet_id(idx) 105 | self.NODES[user[0]] = user[1] 106 | 107 | for idx in self.TWEET_IDS: 108 | try: 109 | fill_node_data(idx) 110 | except tweepy.error.RateLimitError: 111 | try: 112 | logger.info('Hit rate limit; waiting and retrying') 113 | time.sleep(SLEEP_TIME) 114 | fill_node_data(idx) 115 | except: 116 | break 117 | except Exception as e: 118 | logger.error('Problem with tweet id {}: {}'.format(idx, e)) 119 | 120 | with open(os.path.join(BASE_DIR, 'resources', 121 | 'authors.json'), 'w') as nodes_file: 122 | json.dump(self.NODES, nodes_file) 123 | 124 | 125 | def add_follower_edges(self): 126 | edges_file = open(os.path.join(BASE_DIR, 'resources', 127 | 'author_edges.txt'), 'a') 128 | def fill_edge_data(x): 129 | followers = self.ACCESSOR.get_followers(self.NODES[x]) 130 | followers = set([str(f) for f in followers]) 131 | 132 | for y in self.NODES: 133 | if (x, y) in self.EDGES: 134 | continue 135 | 136 | if y in followers: 137 | self.EDGES.add((y, x)) 138 | print('{},{}'.format(y, x), file=edges_file) 139 | edges_file.flush() 140 | logger.info('Followers of user {} added'.format(self.NODES[x])) 141 | 142 | for x in self.NODES.keys(): 143 | try: 144 | fill_edge_data(x) 145 | except tweepy.error.RateLimitError: 146 | try: 147 | logger.info('Hit rate limit; waiting and retrying') 148 | time.sleep(SLEEP_TIME) 149 | fill_edge_data(x) 150 | except: 151 | break 152 | except Exception as e: 153 | logger.error('Problem with user {}: {}'.format(self.NODES[x], e)) 154 | edges_file.close() 155 | 156 | 157 | def form_graph(self): 158 | for node_id in self.NODES.keys(): 159 | self.GRAPH.add_node(self.NODES[node_id]) 160 | for edge in self.EDGES: 161 | self.GRAPH.add_edge(self.NODES[edge[0]], self.NODES[edge[1]]) 162 | 163 | 164 | def print_graph(self): 165 | write_dot(self.GRAPH, 'graph.dot') 166 | # networkx.draw(self.GRAPH) 167 | # pyplot.savefig('graph.png') 168 | 169 | 170 | def main(): 171 | f = open(os.path.join(BASE_DIR, 'TwitterData', 'twitter_data_waseem_hovy.csv'), 172 | 'r', encoding='utf-8') 173 | csv_read = csv.reader(f) 174 | 175 | count = 0 176 | tweet_ids = [] 177 | for line in csv_read: 178 | count += 1 179 | if count == 1: 180 | continue 181 | 182 | idx, text, cat = line 183 | tweet_ids.append(idx) 184 | 185 | graph = Graph(tweet_ids=None, nodes_file='authors.json', 186 | edges_file='author_edges.txt') 187 | graph.add_follower_edges() 188 | graph.form_graph() 189 | # graph.print_graph() 190 | 191 | 192 
| if __name__ == "__main__": 193 | main() 194 | -------------------------------------------------------------------------------- /twitter_model.py: -------------------------------------------------------------------------------- 1 | import numpy 2 | import os 3 | import random 4 | os.environ['PYTHONHASHSEED'] = '0' 5 | numpy.random.seed(57) 6 | random.seed(75) 7 | os.environ['KERAS_BACKEND'] = 'theano' 8 | 9 | if os.environ['KERAS_BACKEND'] == 'tensorflow': 10 | import tensorflow 11 | tensorflow.set_random_seed(35) 12 | 13 | from cross_validate import run_cv 14 | from grid_search import perform_grid_search 15 | from main_classifier import MainClassifier 16 | from resources.textual import clean_tweet 17 | from test import test 18 | 19 | import argparse 20 | import csv 21 | 22 | 23 | BASE_DIR = os.path.dirname(os.path.abspath(__file__)) 24 | 25 | CONFIG = { 26 | 'EMB_FILE': 'glove.twitter.27B.200d.txt', 27 | 'EMB_MODEL': None, 28 | 'EMB_DIM': 200, 29 | 'EMB_MIN_DF': 1, 30 | 'EMB_MAX_DF': -1, 31 | 'EMB_MAX_VCB': 50000, 32 | 'WORD_MIN_FREQ': 2, 33 | 'DNN_EPOCH': 50, 34 | 'DNN_BATCH': 64, 35 | 'DNN_VAL_SPLIT': 0.04, 36 | 'DNN_HIDDEN_UNITS': 128, 37 | 'GB_LEAVES': 31, 38 | 'GB_LEAF_WEIGHT': 7, 39 | 'GB_LEAF_SAMPLES': 10, 40 | 'GB_ITERATIONS': 125, 41 | 'GB_LEARN_RATE': 0.08, 42 | 'LR_C': 25, 43 | 'NGRAM_MODEL': None, 44 | 'TF_NRANGE': (1, 4), 45 | 'TF_SUBLIN': False, 46 | 'TF_MAX_FEAT': 10000, 47 | 'TF_USE_IDF': False, 48 | 'CLASSIFIER': None, 49 | 'METHOD': None, 50 | 'GRID_SEARCH_SIZE': 25000, 51 | 'BASE': BASE_DIR, 52 | } 53 | 54 | 55 | def read_data(data_file): 56 | read_f = open(data_file, 'r', encoding='utf-8') 57 | csv_read = csv.reader(read_f) 58 | 59 | texts = [] 60 | classes = [] 61 | ids = [] 62 | count = 0 63 | 64 | for line in csv_read: 65 | count += 1 66 | if count == 1: 67 | continue 68 | 69 | id, text, clazz = line 70 | classes.append(int(clazz)) 71 | texts.append(text) 72 | ids.append(id) 73 | 74 | return (ids, texts, classes) 75 | 76 | 77 | def check_classifier(): 78 | classifier = MainClassifier(CONFIG) 79 | classifier.classify(None, '') 80 | while(True): 81 | text = input() 82 | category = classifier.classify(None, text) 83 | print(category) 84 | 85 | 86 | def parse_arguments(): 87 | parser = argparse.ArgumentParser(description='Experimentation with' 88 | ' Twitter datasets') 89 | 90 | parser.add_argument('-c', '--cross_val', action='store', type=int, 91 | dest='cross_val_size', 92 | help='Part of dataset to be used for cross validation') 93 | 94 | parser.add_argument('-g', '--grid_search', action='store', type=str, 95 | nargs=3, dest='grid_params', 96 | metavar=('ESTIMATOR: gbc/svm', 'FEATURES', 'FEATURES'), 97 | help='Model and features to be used for grid search') 98 | 99 | parser.add_argument('-t', '--train_test', action='store', type=int, 100 | dest='train_test_split', default=10000, 101 | help='Split point of data for training and testing') 102 | 103 | parser.add_argument('-m', '--method', action='store', type=str, 104 | dest='method', default='lna', 105 | help='Method to run') 106 | 107 | parser.add_argument('-ft', '--full-train', action='store_true', 108 | dest='full_train', 109 | help='Presence of flag will ensure pre-trained ' 110 | 'models are not used') 111 | 112 | return parser.parse_args() 113 | 114 | 115 | if __name__ == "__main__": 116 | args = parse_arguments() 117 | 118 | data_file = os.path.join(BASE_DIR, 'TwitterData', 'twitter_data_waseem_hovy.csv') 119 | (ids, texts, classes) = read_data(data_file) 120 | texts = [clean_tweet(t) for t in texts] 121 | 122 
| CONFIG['METHOD'] = args.method 123 | CONFIG['EMB_MODEL'] = '' if args.full_train else None 124 | if args.cross_val_size is not None: 125 | run_cv(ids[:args.cross_val_size], 126 | texts[:args.cross_val_size], 127 | classes[:args.cross_val_size], 128 | CONFIG) 129 | 130 | elif args.grid_params is not None: 131 | perform_grid_search(ids, texts, classes, args.grid_params, CONFIG) 132 | 133 | else: 134 | classifier = MainClassifier(CONFIG) 135 | 136 | split = args.train_test_split 137 | classifier.train(ids[:split], texts[:split], classes[:split]) 138 | test(ids[split:], texts[split:], classes[split:], classifier) 139 | --------------------------------------------------------------------------------
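For convenience, a minimal end-to-end invocation of the node2vec reference implementation above. The file names `toy.edgelist`/`toy.emb`, the example edges, and the hyperparameter values are illustrative only, and the run assumes the pinned versions in `resources/node2vec/requirements.txt`:

```
# toy.edgelist (undirected, unweighted; one "node1_id_int node2_id_int" edge per line):
#   1 2
#   1 3
#   2 3
#   3 4
python resources/node2vec/src/main.py --input toy.edgelist --output toy.emb \
    --dimensions 64 --walk-length 40 --num-walks 5 --p 1 --q 0.5
# toy.emb begins with the "num_of_nodes dim_of_representation" header (here "4 64"),
# followed by one "node_id dim1 ... dim64" line per node.
```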
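A small sanity check of the alias-sampling utilities, sketched under the assumptions that `resources/node2vec/src` is on the Python path and that the pinned NumPy is used (newer NumPy releases removed `np.int`, which `alias_setup` relies on): repeated calls to `alias_draw` should reproduce the distribution passed to `alias_setup`.

```
from collections import Counter

from node2vec import alias_setup, alias_draw  # resources/node2vec/src/node2vec.py

probs = [0.5, 0.3, 0.2]
J, q = alias_setup(probs)                                  # O(K) preprocessing
counts = Counter(alias_draw(J, q) for _ in range(100000))  # O(1) per draw
print({k: round(v / 100000.0, 3) for k, v in sorted(counts.items())})
# expected to be close to {0: 0.5, 1: 0.3, 2: 0.2}
```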