├── .gitignore
├── README.md
├── corpus
    ├── DaleChallEasyWordList.txt
    ├── common.txt
    ├── common_phrases.txt
    ├── contractions.txt
    ├── hyperbolic.txt
    └── terrier-stopword.txt
├── datasets
    ├── clickbait_data
    ├── non_clickbait_data
    ├── test.csv
    ├── train.csv
    └── unlabelled.csv
├── notebooks
    ├── .ipynb_checkpoints
    │   ├── Embeddings-checkpoint.ipynb
    │   ├── Feature_engineering-checkpoint.ipynb
    │   ├── Splitting_data_EDA-checkpoint.ipynb
    │   ├── feature_selection_decomposition-checkpoint.ipynb
    │   └── models_ensembles_tuning-checkpoint.ipynb
    ├── Embeddings.ipynb
    ├── Feature_engineering.ipynb
    ├── Splitting_data_EDA.ipynb
    ├── __pycache__
    │   ├── featurization.cpython-37.pyc
    │   └── utility.cpython-37.pyc
    ├── decision_boundary.ipynb
    ├── feature_selection_decomposition.ipynb
    ├── featurization.py
    ├── models_ensembles_tuning.ipynb
    ├── saved_models
    │   ├── saved_model.pb
    │   └── variables
    │   │   ├── variables.data-00000-of-00002
    │   │   ├── variables.data-00001-of-00002
    │   │   └── variables.index
    └── utility.py
└── web_crawled
    └── breitbart.csv


/.gitignore:
--------------------------------------------------------------------------------
vectors/*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/README.md
--------------------------------------------------------------------------------
/corpus/DaleChallEasyWordList.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/corpus/DaleChallEasyWordList.txt
--------------------------------------------------------------------------------
/corpus/common.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/corpus/common.txt
--------------------------------------------------------------------------------
/corpus/common_phrases.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/corpus/common_phrases.txt
--------------------------------------------------------------------------------
/corpus/contractions.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/corpus/contractions.txt
--------------------------------------------------------------------------------
/corpus/hyperbolic.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/corpus/hyperbolic.txt
--------------------------------------------------------------------------------
/corpus/terrier-stopword.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/corpus/terrier-stopword.txt
--------------------------------------------------------------------------------
/datasets/clickbait_data:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/datasets/clickbait_data
--------------------------------------------------------------------------------
/datasets/non_clickbait_data:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/datasets/non_clickbait_data
--------------------------------------------------------------------------------
/datasets/test.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/datasets/test.csv
--------------------------------------------------------------------------------
/datasets/train.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/datasets/train.csv
--------------------------------------------------------------------------------
/datasets/unlabelled.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/datasets/unlabelled.csv
--------------------------------------------------------------------------------
/notebooks/.ipynb_checkpoints/Embeddings-checkpoint.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/.ipynb_checkpoints/Embeddings-checkpoint.ipynb
--------------------------------------------------------------------------------
/notebooks/.ipynb_checkpoints/Feature_engineering-checkpoint.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/.ipynb_checkpoints/Feature_engineering-checkpoint.ipynb
--------------------------------------------------------------------------------
/notebooks/.ipynb_checkpoints/Splitting_data_EDA-checkpoint.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/.ipynb_checkpoints/Splitting_data_EDA-checkpoint.ipynb
--------------------------------------------------------------------------------
/notebooks/.ipynb_checkpoints/feature_selection_decomposition-checkpoint.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/.ipynb_checkpoints/feature_selection_decomposition-checkpoint.ipynb
--------------------------------------------------------------------------------
/notebooks/.ipynb_checkpoints/models_ensembles_tuning-checkpoint.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/.ipynb_checkpoints/models_ensembles_tuning-checkpoint.ipynb
--------------------------------------------------------------------------------
/notebooks/Embeddings.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/Embeddings.ipynb
--------------------------------------------------------------------------------
/notebooks/Feature_engineering.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/Feature_engineering.ipynb
--------------------------------------------------------------------------------
/notebooks/Splitting_data_EDA.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/Splitting_data_EDA.ipynb
--------------------------------------------------------------------------------
/notebooks/__pycache__/featurization.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/__pycache__/featurization.cpython-37.pyc
--------------------------------------------------------------------------------
/notebooks/__pycache__/utility.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/__pycache__/utility.cpython-37.pyc
--------------------------------------------------------------------------------
/notebooks/decision_boundary.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/decision_boundary.ipynb
--------------------------------------------------------------------------------
/notebooks/feature_selection_decomposition.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/feature_selection_decomposition.ipynb
--------------------------------------------------------------------------------
/notebooks/featurization.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/featurization.py
--------------------------------------------------------------------------------
/notebooks/models_ensembles_tuning.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/models_ensembles_tuning.ipynb
--------------------------------------------------------------------------------
/notebooks/saved_models/saved_model.pb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/saved_models/saved_model.pb
--------------------------------------------------------------------------------
/notebooks/saved_models/variables/variables.data-00000-of-00002:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/saved_models/variables/variables.data-00000-of-00002
--------------------------------------------------------------------------------
/notebooks/saved_models/variables/variables.data-00001-of-00002:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/saved_models/variables/variables.data-00001-of-00002
--------------------------------------------------------------------------------
/notebooks/saved_models/variables/variables.index:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/saved_models/variables/variables.index
--------------------------------------------------------------------------------
/notebooks/utility.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/notebooks/utility.py
--------------------------------------------------------------------------------
/web_crawled/breitbart.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anirudhshenoy/text-classification-small-datasets/HEAD/web_crawled/breitbart.csv
--------------------------------------------------------------------------------