├── .gitignore
├── Annotations.csv
├── LICENSE
├── MANIFEST.in
├── README.md
├── requirements.txt
├── setup.py
└── synthesis_paragraph_classifier
    ├── __init__.py
    ├── classifier.py
    ├── data
        ├── Model-2018-08-02-12_02_36-lightlda_r0_paragraph_topic_100-lightlda_r0_sentence_topic_200-RandomForest-all.pypickle
        ├── SynthesisClassification.yml
        ├── topic_100_p
        │   ├── LightLDA.0.1720000.log
        │   ├── LightLDA.1.1820000.log
        │   ├── corpus.dict
        │   ├── server_0_table_0.model
        │   ├── server_0_table_1.model
        │   ├── server_1_table_0.model
        │   └── server_1_table_1.model
        └── topic_200_s
        │   ├── LightLDA.0.1670000.log
        │   ├── LightLDA.1.1690000.log
        │   ├── corpus.dict
        │   ├── server_0_table_0.model
        │   ├── server_0_table_1.model
        │   ├── server_1_table_0.model
        │   └── server_1_table_1.model
    ├── featurizers
        ├── ParagraphHighFreqTopics.py
        ├── SentenceHighFreqTopics.py
        ├── TopicNGram.py
        ├── __init__.py
        └── base.py
    ├── nlp
        ├── __init__.py
        ├── preprocessing.py
        ├── test
        │   └── test_token_filter.py
        ├── token_filter.py
        ├── token_storage.py
        └── vocabulary.py
    └── topics
        ├── __init__.py
        └── lightlda.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/.gitignore
--------------------------------------------------------------------------------
/Annotations.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/Annotations.csv
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/LICENSE
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/MANIFEST.in
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/README.md
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
chemdataextractor
numpy
pyyaml
scikit-learn
spacy
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/setup.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/classifier.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/classifier.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/Model-2018-08-02-12_02_36-lightlda_r0_paragraph_topic_100-lightlda_r0_sentence_topic_200-RandomForest-all.pypickle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/Model-2018-08-02-12_02_36-lightlda_r0_paragraph_topic_100-lightlda_r0_sentence_topic_200-RandomForest-all.pypickle
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/SynthesisClassification.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/SynthesisClassification.yml
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_100_p/LightLDA.0.1720000.log:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_100_p/LightLDA.0.1720000.log
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_100_p/LightLDA.1.1820000.log:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_100_p/LightLDA.1.1820000.log
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_100_p/corpus.dict:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_100_p/corpus.dict
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_100_p/server_0_table_0.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_100_p/server_0_table_0.model
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_100_p/server_0_table_1.model:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_100_p/server_1_table_0.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_100_p/server_1_table_0.model
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_100_p/server_1_table_1.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_100_p/server_1_table_1.model
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_200_s/LightLDA.0.1670000.log:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_200_s/LightLDA.0.1670000.log
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_200_s/LightLDA.1.1690000.log:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_200_s/LightLDA.1.1690000.log
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_200_s/corpus.dict:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_200_s/corpus.dict
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_200_s/server_0_table_0.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_200_s/server_0_table_0.model
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_200_s/server_0_table_1.model:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_200_s/server_1_table_0.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_200_s/server_1_table_0.model
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/data/topic_200_s/server_1_table_1.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/data/topic_200_s/server_1_table_1.model
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/featurizers/ParagraphHighFreqTopics.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/featurizers/ParagraphHighFreqTopics.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/featurizers/SentenceHighFreqTopics.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/featurizers/SentenceHighFreqTopics.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/featurizers/TopicNGram.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/featurizers/TopicNGram.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/featurizers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/featurizers/__init__.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/featurizers/base.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/featurizers/base.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/nlp/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/nlp/__init__.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/nlp/preprocessing.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/nlp/preprocessing.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/nlp/test/test_token_filter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/nlp/test/test_token_filter.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/nlp/token_filter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/nlp/token_filter.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/nlp/token_storage.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/nlp/token_storage.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/nlp/vocabulary.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/nlp/vocabulary.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/topics/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/topics/__init__.py
--------------------------------------------------------------------------------
/synthesis_paragraph_classifier/topics/lightlda.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CederGroupHub/synthesis-paragraph-classifier/HEAD/synthesis_paragraph_classifier/topics/lightlda.py
--------------------------------------------------------------------------------