├── .gitignore
├── LDAdata
│   ├── model-final.others
│   ├── model-final.phi
│   ├── model-final.tassign
│   ├── model-final.theta
│   ├── model-final.twords
│   ├── newdocs.dat
│   └── wordmap.txt
├── README.md
├── StopWordTable.txt
├── bin
│   ├── application
│   │   ├── Controller1.class
│   │   ├── Main.class
│   │   ├── Scene1.fxml
│   │   └── application.css
│   └── com
│       └── sxu
│           ├── Crawler
│           │   ├── GetUrl.class
│           │   ├── InfoSpider.class
│           │   ├── Url.class
│           │   └── spider.class
│           ├── Similarity
│           │   └── Similarity.class
│           ├── UserData
│           │   └── User.class
│           └── Vector
│               ├── LDA
│               │   └── LDA.class
│               ├── Segmentation
│               │   ├── Filepro.class
│               │   ├── JieBa.class
│               │   └── StopWords.class
│               ├── Word2Vec
│               │   └── Word2Vec.class
│               └── main
│                   └── Process.class
├── build.fxbuild
├── data
│   └── data.txt
├── lib
│   ├── LDA.jar
│   ├── Word2Vec.jar
│   ├── jieba.jar
│   └── jsoup-1.8.2.jar
├── result
│   ├── vector.txt
│   ├── vector1000.txt
│   └── 分词结果.txt
├── src
│   ├── application
│   │   ├── Controller1.java
│   │   ├── Main.java
│   │   ├── Scene1.fxml
│   │   └── application.css
│   └── com
│       └── sxu
│           ├── Similarity
│           │   └── Similarity.java
│           ├── UserData
│           │   └── User.java
│           └── Vector
│               ├── LDA
│               │   └── LDA.java
│               ├── Segmentation
│               │   ├── Filepro.java
│               │   ├── JieBa.java
│               │   └── StopWords.java
│               ├── Word2Vec
│               │   └── Word2Vec.java
│               └── main
│                   └── Process.java
├── temp
│   ├── tempCorpus303848893241311564.txt
│   ├── tempCorpus4113479871682855437.txt
│   └── tempCorpus460574311849048332.txt
└── 网络水军识别系统使用手册.docx


/.gitignore:
--------------------------------------------------------------------------------
# Compiled class file
*.class

# Log file
*.log

# BlueJ files
*.ctxt

# Mobile Tools for Java (J2ME)
.mtj.tmp/

# Package Files #
*.jar
*.war
*.nar
*.ear
*.zip
*.tar.gz
*.rar

# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*

--------------------------------------------------------------------------------
/LDAdata/model-final.others:
--------------------------------------------------------------------------------
alpha=0.5
beta=0.1
ntopics=100
ndocs=3065
nwords=9735
liters=49

--------------------------------------------------------------------------------
/LDAdata/newdocs.dat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Wiskia/Spamer-Detect-System/dd79aa5bf90ff35c7a2f5124a1de3c1ad2a575ce/LDAdata/newdocs.dat

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Spamer-Detect-System
Processes and analyzes comment data from the Autohome (汽车之家) forum. User behavior features are derived from latent user behavior data, topic features of user comments are derived with an LDA topic model, and text content features of user comments are derived with a Word2Vec word-vector model. K-Means clustering then yields the spammer text category, which is combined with the user behavior features to identify paid posters (网络水军, the "Internet water army").

--------------------------------------------------------------------------------
/StopWordTable.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Wiskia/Spamer-Detect-System/dd79aa5bf90ff35c7a2f5124a1de3c1ad2a575ce/StopWordTable.txt

--------------------------------------------------------------------------------
/bin/application/Controller1.class:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Wiskia/Spamer-Detect-System/dd79aa5bf90ff35c7a2f5124a1de3c1ad2a575ce/bin/application/Controller1.class

--------------------------------------------------------------------------------
/bin/application/Main.class:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Wiskia/Spamer-Detect-System/dd79aa5bf90ff35c7a2f5124a1de3c1ad2a575ce/bin/application/Main.class

--------------------------------------------------------------------------------
/bin/application/Scene1.fxml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 |
8 | 9 | 10 | 11 | 12 | 13 | 14 |