├── README.md ├── Topic_Modeling_BERT+LDA.ipynb ├── dev-deployment.png └── model-explanation.png /README.md: -------------------------------------------------------------------------------- 1 | # Topic-Modeling-BERT-LDA 2 | Topic modeling with BERT, LDA, and clustering: Latent Dirichlet Allocation (LDA) for probabilistic topic assignment, combined with pre-trained sentence embeddings from BERT/RoBERTa. 3 | 4 | 5 | ## Model explanation 6 | - LDA produces a probabilistic topic assignment vector for each document. 7 | - BERT produces a sentence embedding vector. 8 | - The LDA and BERT vectors are concatenated, with a weight hyperparameter balancing the relative importance of information from each source. 9 | - An autoencoder learns a lower-dimensional latent-space representation of the concatenated vector. (Why autoencoders and not PCA? PCA is a linear transformation of the vectors, while autoencoders are non-linear; the assumption is that the concatenated vectors lie on a manifold in the higher-dimensional space.) 10 | - Clustering on the latent-space representations yields the topics. 11 | ![](model-explanation.png) 12 | 13 | ## Data pipeline (from development to deployment) 14 | ![](dev-deployment.png) 15 | -------------------------------------------------------------------------------- /dev-deployment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AravindR7/Topic-Modeling-BERT-LDA/cbb7e4777d677330c64a84e4ab9be547424c5a0b/dev-deployment.png -------------------------------------------------------------------------------- /model-explanation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AravindR7/Topic-Modeling-BERT-LDA/cbb7e4777d677330c64a84e4ab9be547424c5a0b/model-explanation.png --------------------------------------------------------------------------------
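The pipeline the README describes (weighted concatenation of LDA and BERT vectors, an autoencoder for the latent space, then clustering) can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: synthetic vectors stand in for real LDA topic distributions and BERT sentence embeddings, the weight `gamma`, the latent size of 32, and the use of scikit-learn's `MLPRegressor` as a tied reconstruction network are all assumptions made here for illustration.

```python
# Sketch of the LDA + BERT topic pipeline described above (assumed details,
# not the repository's implementation).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_docs, n_topics, emb_dim = 200, 10, 768

# Stand-ins for the two sources of information per document:
lda_vecs = rng.dirichlet(np.ones(n_topics), size=n_docs)  # LDA topic probabilities
bert_vecs = rng.normal(size=(n_docs, emb_dim))            # BERT sentence embeddings

# Weight hyperparameter balancing LDA vs. BERT information (assumed value).
gamma = 15
concat = np.hstack([gamma * lda_vecs, bert_vecs])

# Autoencoder: train an MLP to reconstruct its own input, then take the
# hidden layer as the lower-dimensional latent representation.
ae = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
ae.fit(concat, concat)
latent = np.maximum(0, concat @ ae.coefs_[0] + ae.intercepts_[0])  # ReLU hidden units

# Cluster in the latent space to obtain the final topic assignments.
labels = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(latent)
```

In practice the LDA vectors would come from a fitted topic model (e.g. gensim's `LdaModel`) and the embeddings from a sentence-transformer; a dedicated autoencoder (e.g. in Keras) gives more control over the bottleneck than the `MLPRegressor` stand-in used here.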