├── .gitignore
├── README.md
├── analysis_results
    ├── comment_wordcloud.png
    ├── content_tags.html
    ├── hot_topics.html
    ├── hot_words.html
    ├── interaction_network.html
    ├── location_analysis.html
    ├── sentiment_analysis.html
    ├── time_analysis.html
    ├── user_influence.html
    └── user_portraits.html
├── crawled_comments
    ├── douyin_comments_H_O7wyWOUQQ_20250312_181221.csv
    ├── douyin_comments__20250311_212326.csv
    ├── douyin_comments__20250311_223146.csv
    ├── douyin_comments__20250311_231655.csv
    ├── douyin_comments__20250312_001546.csv
    ├── douyin_comments__20250312_002242.csv
    ├── douyin_comments_fEJejTD6CQ8_20250524_000829.csv
    ├── douyin_comments_fEJejTD6CQ8_20250524_002249.csv
    ├── douyin_comments_i5g6kb83_20250312_001130.csv
    ├── douyin_comments_i5g6kb83_20250312_001710.csv
    ├── douyin_comments_i5g6kb83_20250312_003421.csv
    ├── douyin_comments_i5g6kb83_20250313_153107.csv
    └── douyin_comments_i5ph454C_20250312_233129.csv
├── douyin_analysis_results
    ├── README.md
    ├── comment_wordcloud.png
    ├── crawled_comments
    │   ├── douyin_comments_7505063238057889081_20250524_000454.csv
    │   ├── douyin_comments_fEJejTD6CQ8_20250524_001812.csv
    │   ├── douyin_comments_fEJejTD6CQ8_20250524_104505.csv
    │   ├── douyin_comments_fEJejTD6CQ8_20250524_105423.csv
    │   └── douyin_comments_fEJejTD6CQ8_20250524_113220.csv
    ├── hot_words.html
    ├── location_analysis.html
    ├── requirements.txt
    ├── time_analysis.html
    ├── 抖音工具集.py
    ├── 抖音数据分析器.py
    ├── 抖音视频搜索.py
    ├── 抖音评论分析器_旧版.py
    └── 抖音评论爬虫.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | # Python temporary files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | *.so
6 | .Python
7 | env/
8 | build/
9 | develop-eggs/
10 | dist/
11 | downloads/
12 | eggs/
13 | .eggs/
14 | lib/
15 | lib64/
16 | parts/
17 | sdist/
18 | var/
19 | *.egg-info/
20 | .installed.cfg
21 | *.egg
22 |
23 | # Virtual environments
24 | venv/
25 | ENV/
26 | .env
27 |
28 | # IDE files
29 | .idea/
30 | .vscode/
31 | *.swp
32 | *.swo
33 |
34 | # Log files
35 | logs/
36 | *.log
37 |
38 | # System files
39 | .DS_Store
40 | Thumbs.db
41 |
42 | # Config files that may contain sensitive information
43 | config.ini
44 | secrets.json
45 |
46 | # Cache directories
47 | .pytest_cache/
48 | .coverage
49 | htmlcov/
50 |
51 | # Potentially large data files
52 | # *.csv # commented out so that CSV data files are kept
53 | *.xlsx
54 | *.db
55 | *.sqlite3
56 |
57 | # But keep the project's required sample data files and the crawled comment data
58 | !example_data/*.csv
59 | !crawled_comments/*.csv
60 | !analysis_results/*.csv

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Douyin Comment Crawling and Analysis Tool
2 |
3 | ## Features
4 |
5 | This tool crawls comments from Douyin videos and analyzes them. It provides the following features:
6 |
7 | 1. **Comment crawling**: automatically crawls the comment data of a specified Douyin video
8 |    - Supports Douyin links in various formats (short links and standard links)
9 |    - Can crawl all comments, with no page limit
10 |    - Handles scrolling and loading automatically to improve the success rate of comment retrieval
11 |    - Can extract the replies to a comment
12 |    - Extracts detailed comment metadata (user information, time, location, etc.)
13 |
14 | 2. **Basic analysis**:
15 |    - Comment word cloud generation
16 |    - Regional distribution of comments
17 |    - Time distribution of comments
18 |    - Hot word statistics
19 |    - Sentiment analysis
20 |
21 | 3. **Advanced data analysis**:
22 |    - **User persona analysis**: analyzes user characteristics, active hours, geographic distribution, and language style
23 |    - **Comment interaction graph**: visualizes replies and interactions between users
24 |    - **Content tag distribution analysis**: automatically extracts and analyzes comment topics and tags
25 |    - **User activity and influence analysis**: identifies high-influence users and highly active users
26 |    - **Hot topic detection and trend tracking**: analyzes hot topics in the comments and how they change over time
27 |
28 | ## Installing Dependencies
29 |
30 | ```bash
31 | pip install -r requirements.txt
32 | ```
33 |
34 | ## Usage
35 |
36 | ### Running the Main Program
37 |
38 | ```bash
39 | python douyin_analysis_results/抖音工具集.py
40 | ```
41 |
42 | ### Crawling Comments
43 |
44 | 1. Choose "爬取新的评论并分析" (crawl new comments and analyze them) from the menu
45 | 2. Enter the Douyin video URL (several formats are supported)
46 | 3. Set the maximum number of pages to crawl (press Enter without a value to crawl all comments)
47 | 4. Choose whether to use normal browser mode (which lets you log in to an account to view comments)
48 | 5. Wait for the crawl to finish
49 |
50 | ### Analyzing Comments
51 |
52 | 1. Choose "分析已有的评论数据" (analyze existing comment data) from the menu
53 | 2. Select the CSV file to analyze
54 | 3. Select a shape image for the word cloud (optional)
55 | 4. Wait for the analysis to finish; the results are saved in the `analysis_results` directory
56 |
57 | ## Analysis Features in Detail
58 |
59 | ### 1. User Persona Analysis
60 |
61 | - **Language style analysis**: identifies the internet slang, student vocabulary, workplace vocabulary, and similar styles that users write in
62 | - **Active period analysis**: analyzes how user activity is distributed across the hours of the day
63 | - **Interaction frequency analysis**: separates users into high-, medium-, and low-frequency interaction groups
64 | - **Geographic heat map**: shows the geographic distribution of users on a map
65 |
66 | ### 2. Comment Interaction Graph Analysis
67 |
68 | - **User interaction network**: visualizes the reply and interaction relationships between users
69 | - **Central user identification**: automatically identifies central users and opinion leaders in the network
70 | - **Interaction pattern analysis**: analyzes interaction patterns and social structure among users
71 |
72 | ### 3. Content Tag Distribution Analysis
73 |
74 | - **Topic classification**: automatically classifies comment content into predefined topics
75 | - **Keyword extraction**: extracts the main keywords and hashtags from the content
76 | - **Hot tag statistics**: analyzes the popular tags and topics that appear in the comments
77 |
78 | ### 4. User Activity and Influence Analysis
79 |
80 | - **Active user ranking**: identifies the most active users by comment frequency
81 | - **Influence score**: computes user influence from comment count, like count, and comment quality
82 | - **Content contribution analysis**: identifies users who contribute high-quality content
83 |
84 | ### 5. Hot Topic Detection and Tracking
85 |
86 | - **Topic detection**: automatically identifies the main topics in the comments
87 | - **Time trend analysis**: tracks how topic popularity changes over time
88 | - **Keyword importance**: uses TF-IDF to measure the importance of keywords
89 | - **Burst detection**: detects hot topics that emerge suddenly in the comments
90 |
91 | ## Notes
92 |
93 | - Make sure all dependencies are installed before use
94 | - The first run may need to download a browser driver
95 | - Crawling a large number of comments may take a long time
96 | - Some analysis features may require a more powerful machine

--------------------------------------------------------------------------------
/analysis_results/comment_wordcloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/analysis_results/comment_wordcloud.png

--------------------------------------------------------------------------------
/analysis_results/content_tags.html:
--------------------------------------------------------------------------------
[HTML chart page titled "Awesome-pyecharts"; 493 numbered lines, markup not preserved in this dump]

--------------------------------------------------------------------------------
/analysis_results/hot_words.html:
--------------------------------------------------------------------------------
[HTML chart page titled "Awesome-pyecharts"; 527 numbered lines, markup not preserved in this dump]

--------------------------------------------------------------------------------
/analysis_results/location_analysis.html:
--------------------------------------------------------------------------------
[HTML chart page titled "Awesome-pyecharts"; 293 numbered lines, markup not preserved in this dump]

--------------------------------------------------------------------------------
/analysis_results/sentiment_analysis.html:
--------------------------------------------------------------------------------
[HTML chart page titled "Awesome-pyecharts"; 232 numbered lines, markup not preserved in this dump]

--------------------------------------------------------------------------------
/analysis_results/time_analysis.html:
--------------------------------------------------------------------------------
[HTML chart page titled "Awesome-pyecharts"; 298 numbered lines, markup not preserved in this dump]

--------------------------------------------------------------------------------
/analysis_results/user_portraits.html:
--------------------------------------------------------------------------------
[HTML chart page titled "Awesome-pyecharts"; 777 numbered lines, markup not preserved in this dump]

--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250311_212326.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 | 7480167895684023097,a 嘉楠 a,内蒙古,2025-03-10 21:11:48,客厅飘窗下面的抽屉里还有几十盒面膜,二十多个洗面奶,口红二十多个,唇膏十来个,精华液,精油洗发水护发素各十几瓶,两个卫生间柜子里抽屉里都是护肤品[泪奔]卫生巾几十包[泪奔],还有过期的精华放茶几上当护手霜[泪奔]没穿过的新鞋四十多双,衣服更数不清[尬笑],6
3 | 7480146101266957091,麻薯小丸子,广东,2025-03-10 19:47:17,寄给我吧,我爱吃,0
4 | 7480142284349440828,璟恩子,四川,2025-03-10 19:32:26,不爱吃零食,3
5 | 7480419867767866170,啊~biu,四川,2025-03-11 13:29:38,我也是!但是我觉得这就是在宴请小时候的自己[大笑],11
6 | 7480129278760944393,诶嘿嘿💎,陕西,2025-03-10 18:42:00,买了辣的回去又想吃甜的,买了甜的回去又想吃咸的,一进零食店又不想买了,回去后又后悔没买[尬笑]跟有病似的,114
7 |

--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250311_223146.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |

--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250311_231655.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |

--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250312_001546.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |

--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250312_002242.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |

--------------------------------------------------------------------------------
/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_000829.csv:
--------------------------------------------------------------------------------
1 | 
评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户 2 | 7507653481228075788,LUCK(解压食记),3798186297661322,MS4wLjABAAAAhmOuD6-uWnvcn6oRbHa8NKRUJF__NdPDginIxYKwW7efMtdAlg0IkBeVpKsA3dfO,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_19ae47614ad1ea84205b4f9446a250d3.jpeg?from=2064092626,广东,2025-05-23 22:50:00,雷总你把头像把成这个,我买10辆,240,49,,,否,否,, 3 | 7507664919627662121,温柔,96357929612,MS4wLjABAAAAq03QBbhYNNBQfIRYLPy3m-CxlJexMOE_k3EE1h_VubE,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oE5meaBHAAAOA9rpgI8oSoCnHbXofIdDAACCTQ.jpeg?from=2064092626,福建,2025-05-24 00:05:21,要想让这款u热火,搭载这款cpu的手机开放root权限就是最好的路,0,0,,,否,否,, 4 | 7506483467308991290,小罗罗,1041991924452430,MS4wLjABAAAAYT4SFMQnzUiCy_-0N8Yl58HRLTfHUfFA88g2wq95HYd4gF9hsw9EMfjd8rlFs_1R,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_e2774d94b419443dbd92a31b11232b0f.jpeg?from=2064092626,重庆,2025-05-20 19:09:40,雷总,我女儿和你一天生日,目标大学武汉大学!我希望我女儿像你一样幸运[比心],29492,775,,,否,否,, 5 | 7506489038375912250,胖胖的瘦子A,62657988556,MS4wLjABAAAA1VLsgI4LPOjg_fYDHlF5DfDxPxMAzHgNdjb_5O3YYb8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_46f3d3b12c9e74996fac759285ed4810.jpeg?from=2064092626,山东,2025-05-20 19:31:16,以后不准跟我们冷暴力了啊,51405,565,,,否,否,, 6 | 7506482138801226522,迷路的旺仔,81924511976,MS4wLjABAAAAJmDSj5YI7AEl3hepgkNBwaCMhDzDm-9M9yodQefHbfE,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_00d105df8e1545078b0450c99f6f5d99.jpeg?from=2064092626,湖北,2025-05-20 19:21:36,雷总你终于回来了,你知道我有多想你吗 [流泪] [流泪],1734,14,,,否,否,, 7 | 7507670325846393639,凉夜听风/自律,4032664197929355,MS4wLjABAAAAhJE1pcmIqOiCWkfVIY0igBh4iwHMqn3_q_ocA5OVHahwc8uAIcObox0jT4RIobeq,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ogC79ADVvF1fVARWfjVoCAQEAs5FIAmlAQgAVT.jpeg?from=2064092626,甘肃,2025-05-23 23:55:28,"全网最成功的几位博主 8 | 1、湖远行(2534万粉丝) 9 | 2、雷军(4562万粉丝) 10 | 3、董宇辉(2731万粉丝) 11 | 4、我 (59 .79 万粉丝)",2,0,,,否,否,, 12 | 
7506480179830293275,赶路人,83246963821,MS4wLjABAAAA4SBS2nwK5tgy3D1UdDCm8BCFLSd-XAymBVVcNcQig2w,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o01AcLr7Ar5BVlGQ7n3EBef1zIIsEALUAAAKeG.jpeg?from=2064092626,广东,2025-05-20 18:56:57,雷总超级期待新产品 必须买15sPro,1716,145,,,否,否,, 13 | 7507653011641238331,椰果冻,2955956893538686,MS4wLjABAAAAcUlfUsoimmFwlQ6M0ZRa5YA_k7K11iTX411rfWWKtkkM-kfKD4i_RrMkzaErK5u_,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oYARAfclDEDnFAeAmUELCYEi1EeIXjpwsZ9u4A.jpeg?from=2064092626,安徽,2025-05-23 22:48:05,爸,不许和我们冷战了[酷拽]一个月没理我们了[流泪],15,2,,,否,否,, 14 | 7506483045878727461,在下方何,3369368544881071,MS4wLjABAAAAoaFVNHMv2W9k6GlhkGEr1jBlVkfOZEUbazU6fjL5EqINaMM3SVYokjdyvEzc8yiy,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o4AEEkmfADxVZD4XGgFBgRIE9AzYAqGfAACnA8.jpeg?from=2064092626,广东,2025-05-20 19:35:51,爸:吃了吗,1065,118,,,否,否,, 15 | 7506541873923769142,冬雨,98699399279,MS4wLjABAAAAFowAYqjTcYdDE8ILI3G4EptJlR9ZkUEe2-DEUU00z-U,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_b687e66f31a44bb7b4c1e2840959ce74.jpeg?from=2064092626,云南,2025-05-20 22:56:23,兄弟们,今天刚领证[嘿哈],想要雷总祝福,2072,175,,,否,否,, 16 | 7507625909693039423,棋棋1102,87761611655,MS4wLjABAAAAir3SzAGiUX1ThGKIT_TdEO1-h4IuqORWDM7EG8PEbMc,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_edbb3d586277ee65aafe7577f95e3227.jpeg?from=2064092626,湖南,2025-05-23 21:02:57,雷总,我刚提车十天,被网暴了十天,从来不说粗口的我,与黑粉对骂了几天[流泪],120,18,,,否,否,, 17 | 7506881578614735675,鑫河车机,3318751792735328,MS4wLjABAAAANNWgCK0wglqJMiBk8k69F4R1KEQTeKTQ2xpJ2cc_9Uj-RU6D5vCd8bbz_St-9fR1,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_17e2eb811ff3b2760527e5266eb1d654.jpeg?from=2064092626,江苏,2025-05-21 20:54:31,雷总你冷战期间,我偷偷去买了一部 512G+16G的小米15,下次别再冷战了,我已经没钱了,719,51,,,否,否,, 18 | 
7506603418384368444,炸丸子,99907808631,MS4wLjABAAAAQpwz4rYpjqy5QLKAkv0tyv8YLppLSZK2Kgr-vmyoAcE,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_02ab4fe0fc7d42f38256713ce5ae1c06.jpeg?from=2064092626,云南,2025-05-21 02:55:20,雷爸 以后不准和我们冷战了[流泪],152,6,,,否,否,, 19 | 7506484813844185890,@爱你一万年,1797054772550670,MS4wLjABAAAAa_0ivBpm4YeKmz_Q33G1G_0A9sV4Ybrpbxf50DhitHVj4OMEuAQcf_jRl0PzBkvr,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oQdBeFXcDIM3PiAAxiAA8AQeNwQc7FKBGAiPfE.jpeg?from=2064092626,山西,2025-05-20 19:35:55,军儿,你终于回来了 [流泪], 不要再断更了,你不在时每隔几天我都要来看看你发视频了没,5665,78,,,否,否,, 20 | 7507672331880776463,诛戮陷绝,71716676720,MS4wLjABAAAAXNipfx-WnaoQj1nHNdNa_cC9fzh_51fnJKTTP6Jfugg,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ogbC7cHAAACDAcgwJJnfCkIIuEASgt9j7ekACo.jpeg?from=2064092626,新疆,2025-05-24 00:03:12,[微笑]为什么不早说,我上个月刚买的15,我想用用玄界。,0,0,,,否,否,, 21 | 7506486944446300979,卢伟冰,2544737774480633,MS4wLjABAAAAeeGpeTBNIRe66uQLgsFZmiRXR4GEQnh6FCtORpNOjXMPVjYqkmDeDtYhzBirNj_k,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oIz1AhKAFIA5LfdA1CctDn9p3bAaCegjPyE4AA.jpeg?from=2064092626,北京,2025-05-20 19:23:13,5月22日,发布会见![爱心],2279,143,,,否,否,, 22 | 7506488223708889867,🌈带努比的二明,85271783837,MS4wLjABAAAABCaMkM4yYIo56epa-TYx3MRfbHUmvx_jhnrROzH9gDs,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_a0c35b5fe71acd688f89782beadd1e86.jpeg?from=2064092626,广东,2025-05-20 19:55:42,爸吃饭了吗,55,23,,,否,否,, 23 | 7506537868958892800,小汪哥精致男装,1084535024136804,MS4wLjABAAAA_rkgRekkXc9BqXuVcgu3dLgzqqzkLBJt7qpkFVELHDTchJ0AUKBkr9rocCZrJl4K,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_f16a5d1a8b1d44635794d759eca66d40.jpeg?from=2064092626,重庆,2025-05-20 23:01:18,支持雷军的有多少,657,31,,,否,否,, 24 | 
7506494061470663465,做个向日葵,97352596218,MS4wLjABAAAAOC1_YnR8s2eI8819pC0xBpwdUNWzt1V1Y1UQIuf0G1M,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_3a65ef707b78443bb201bf8bcfae01ff.jpeg?from=2064092626,湖南,2025-05-20 20:07:28,雷总,终于回来了,想死你了,1151,10,,,否,否,, 25 | 7506543779400254223,真由美🎖,7411158697970304059,MS4wLjABAAAAnWoUhTX74tn3ekpwq8hYuEcaozWIqL5ufhyVrEdGzBwc0sG5xZny3l1Ocg59kBo1,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_bdd4a061a7958126e3580b7b94852a4d.jpeg?from=2064092626,重庆,2025-05-20 23:03:47,帅不帅你们说[舔屏],5306,472,,,否,否,, 26 | 7506578153490547492,蜜雪冰城(福州总店),3496062407160124,MS4wLjABAAAA7R_qUDVMbBZDIroUyQx9ptCrxpNVlsMRKBSKvd0u1z4641KVK8vKDaH0P0Yb9AbT,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o4DGCA9AIAAbaQBeAmAgQ0ifCC8WYVPnl5rhh9.jpeg?from=2064092626,福建,2025-05-21 01:17:21,爸,443,58,,,否,否,, 27 | 7506848922740982566,卡通,2546112058894104,MS4wLjABAAAAk96fCkeU6KcVzYwIjBwiannGRfY1MoLgWOYvMjm-_0kUquq_otRmSXqHCqkDQpJp,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/mosaic-legacy_3797_2889309425.jpeg?from=2064092626,广西,2025-05-21 18:47:50,爸,吃了吗?,1141,252,,,否,否,, 28 | 7506492129842217768,青禾,64301672255,MS4wLjABAAAAgEDqvTfEbI8noWC1J2SkJvWiwb_iKl_RBUCErHhyiiw,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oEcAyiAsigXDzPy9BIhpZj1iEA7A4Oz3PgIAP.jpeg?from=2064092626,河南,2025-05-20 19:58:04,冷战结束,雷总回来啦,999,8,,,否,否,, 29 | 7506536991661245242,南重小肖,1279473564788093,MS4wLjABAAAAxbhMheOU0lQ5krDTRGD1R6g9WC-BcQNy4YaiLT9QFmTMESLdzorHt0PpCfydBneV,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_3137c6d29dabe13b8bc7790f03ec95db.jpeg?from=2064092626,广东,2025-05-20 22:37:21,米粉今天领证啦[抱抱你],509,53,,,否,否,, 30 | 7506550905915982650,王六六,69232476424,MS4wLjABAAAA77RVJV-6zevOEmadycWZZQjQ8lZwdCMBdmkaYDVN2z0,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_c52ecdfe87854914948071dc24442d71.jpeg?from=2064092626,江苏,2025-05-20 23:31:21,雷总 我的小米6为啥总是发烫呀[看],513,82,,,否,否,, 31 
| 7506755771851391803,昏君不吃鱼,3019677097539415,MS4wLjABAAAASAtgMxIM7KeG8dKvLaRANhrIjwez7Vdo2L-hdAOgkNzNiL7p1X5KEp91Jx756fsK,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/douyin-user-image-file_85ab3a39770bf1d23b032e7abb2d0682.jpeg?from=2064092626,北京,2025-05-21 12:46:21,您好,雷总,我们是从西安到北京度蜜月的小米 su7 车主,我本人也是红米 turbo 4 pro 手机用户, 我和我的妻子很早之前就已经确定今天到北京了,没想到刚好遇到了明天是发布会,能不能让我们近距离的参加一下发布会🎉('ω')🎉感谢🙏,403,42,,,否,否,, 32 | 7506548565968814875,腾飞,954029966884967,MS4wLjABAAAAX29QOfYJnwJKi3bStCXfymChXaY8HhEmEppdw96rJ3g,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_9ee4a3a2a2ba6077d7fced2858c75bd3.jpeg?from=2064092626,广东,2025-05-20 23:22:18,爹,吃饭了吗?[偷笑],24,6,,,否,否,, 33 | 7506583171318170428,By,81801015985242,MS4wLjABAAAAYaFXru45hASQuh9AdF9VpO3HsTfMCWl1Q4lbZ-50zxE,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ocf5cmACfACMA2DxAnouJRIg2F9HnrBAGDEEA6.jpeg?from=2064092626,广东,2025-05-21 01:36:43,我老总圈唯一人脉终于回来了,747,7,,,否,否,, 34 | 7507664223754060603,用户7823815321920,910018155381435,MS4wLjABAAAA1QYmS_Wju2LEHigo5yxs5rdvA201sDGydNS1SvJwFAo,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_42550a394ef049da86d194c1a78ccff6.jpeg?from=2064092626,福建,2025-05-23 23:31:36,,23,1,,,否,否,, 35 | 7506488070751437625,A.R,4305321844552584,MS4wLjABAAAAsAEfNyY5D-kUae2B6Up1OX5hJF6uurc8Q4cYbS1VyInMZisMro6VvqfMrFsb91tk,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_owK9fkECXABqXQeRAUEV84wEDfAGFEAUzEAnPI.jpeg?from=2064092626,云南,2025-05-20 19:27:32,爸你终于回来了,228,14,,,否,否,, 36 | 7506481351215579942,🌈Y h,59237659425,MS4wLjABAAAAo9spT8FJ4Zdwm7gq0vsXbu6lpauaMK3YYpc9JzvnuJo,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ooxAfC8XVInxkwAAHAAgbTDAl6QS9WaADAeVoC.jpeg?from=2064092626,江苏,2025-05-20 19:01:30,雷总,我的小米14聊天刷视频都烫手[尬笑][尬笑][尬笑],573,215,,,否,否,, 37 | 
7506847770889274131,Moonlight,262129025558627,MS4wLjABAAAAsWAwmD2f53hsOt1J4LWp3TtJcEZo6JaOJGrdk8f2S0g,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_40dcb9b98afe1aacf814e5fcac697eb0.jpeg?from=2064092626,江苏,2025-05-21 18:43:24,雷总,下次不要和我们冷战了[流泪],1589,20,,,否,否,, 38 | 7506486988432016180,一枕清风明月,1516964907065831,MS4wLjABAAAAGfvWblt_lzcqneYBZKw_YU309MWbISyCUGK4y7W6S_zsjC8LXNc1UMk8l9yhv_j6,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_oAHICO1HAD3F6Nmo0OFAg4QE9AsfAvXfAACnAG.jpeg?from=2064092626,陕西,2025-05-20 19:57:19,别人不断的黑小米,小米还能像个巨人一样屹立不倒,这才是真正用心的企业。希望小米越来越好,164,0,,,否,否,, 39 | 7507117301013218060,段小林儿,94921424644,MS4wLjABAAAAt9OH_Q5CUpi-6Di19lh2SJf9up1p0E1Bc4aKeX7QxRI,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_a35920346420a0b987cfea605d0df79f.jpeg?from=2064092626,云南,2025-05-22 12:09:23,爸,吃饭了吗[比心],464,92,,,否,否,, 40 | 7507258796034999049,您保重,1357254247323852,MS4wLjABAAAAP7Aowepq-AGwKjhzadPqrhyn0EsecEThdcLn-Qbrtipk-sOGsRuRbdbofNMLCOi7,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_owh6YAATnDAT9YXyCziigTBCAeA9lVAIEKfQDX.jpeg?from=2064092626,山东,2025-05-22 21:18:24,盲猜开始了!来!大家出个价,26,15,,,否,否,, 41 | 7506533584396714812,卡皮丘,100702910418,MS4wLjABAAAAEkuLBz4Ykj64bc-dDtgim4hbLXkQARFbkdP87Drr3vs,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_a03d4f85ea88a5e9adb8fabdd8c004e8.jpeg?from=2064092626,重庆,2025-05-20 22:24:07,下次再冷暴力试试,997,13,,,否,否,, 42 | 7506493511123043091,橘子苹果大菠萝,95583713531,MS4wLjABAAAAsFlxJPhfZOiNM1-n_0_T7zruEotyvibeTw1grNmUzrM,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_2be01d94027c4da2b470a0a39d3a6d31.jpeg?from=2064092626,广东,2025-05-20 20:05:20,爸爸你为什么这么久了才回来 [流泪],499,19,,,否,否,, 43 | 
7506484883049923387,梗姐姐,3474112572840599,MS4wLjABAAAARhMLBRHP-d570w9GcmOi_I6luP9gMqQco9HE-O94jnJA2ORTfzMFKt24ncmfKTYp,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_30a0e659e2bd5dff0ddbf7dd553dbdf0.jpeg?from=2064092626,北京,2025-05-20 19:15:09,爸,我盲订15s pro了![得意],207,17,,,否,否,, 44 | 7506484189350183719,杨林灿,3056230252949075,MS4wLjABAAAALNXq-4C-8hWZcJSwdpfN6hywSdVua0Om0rghJFWCH1UQHxNqlz3IXoyMXkkdt4CO,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/mosaic-legacy_2e3320001b947f2879603.jpeg?from=2064092626,上海,2025-05-20 19:12:32,小米为国人为国家做出重大贡献,为雷总点赞[赞][赞][赞][鼓掌][鼓掌][鼓掌][平安果][平安果][平安果],77,2,,,否,否,, 45 | 7506564456395080463,怜悯,65423123801,MS4wLjABAAAAPCZnpMKZhG_C12PfbP3i1jRnsBkfUPbDLkQpGIGPGfI,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oMnACra0IC6EjAF9fAbIpAgAAWAJg6fBhgODEb.jpeg?from=2064092626,湖南,2025-05-21 00:24:03,雷总,能不能解决一下小米十五发烫问题,51,55,,,否,否,, 46 | 7506531888698524431,ZhanG,86955928420,MS4wLjABAAAAc_kvIRbOCm26kwKQ8BiopdI2qiWiQWPg52oSV_tRKzA,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_9c9933d4e9b817ff1d13d80b6d632d5e.jpeg?from=2064092626,浙江,2025-05-20 22:17:35,那我的小米15 Pro算什么[泪奔][泪奔][泪奔],27,14,,,否,否,, 47 | 7507124521351807796,Doki兜兜,3426539489728047,MS4wLjABAAAAJ8IxviJWeiEjCX9sDmMXf-1ZDIsP-6oS5l7F-yLc1ATKhYvb-uQE03n9hEsDg0Fw,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_105c8ede9583aca43d63dab0ee6e06d6.jpeg?from=2064092626,北京,2025-05-22 12:37:21,雷总你好,我5.20那天刚提了Ultra ,超级无敌帅,而且我们车主群里从来没有一个人提过想不要这辆车,非常支持您!顺便提一句,我做汽车主播几年了,前几天听说yu7在招主播,我第一时间报名但是我并没有收到面试信息,非常遗憾,所以在这里自荐一下,以后如果有需要,希望您的团队可以考虑我,以下是我的照片。,290,17,,,否,否,, 48 | 7506904505452675900,瑶瑶²⁰²⁵~,100323943278,MS4wLjABAAAAxDeDUYDMIdx-XKOPNVa5mj3nwEDAFA4tsMoR0ujcKeA,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_86f6c561441abeec4cf2a83421d25cdf.jpeg?from=2064092626,江西,2025-05-21 22:23:30,爸,你吃饭了吗,377,88,,,否,否,, 49 | 7507667954403312422,— 
—,98126385233,MS4wLjABAAAATSI9bEfcSOY3qPMvpUkYJ6KrWKLpXG8SYpWbEGpRYkw,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_fd8dbd2a5eca4d1aa52dee64ffb00fdf.jpeg?from=2064092626,新疆,2025-05-23 23:46:10,雷总出啥都无脑支持,2,1,,,否,否,, 50 | 7506489714263147290,鬼畜眼镜,1614548231792604,MS4wLjABAAAAqeBDFrUNu6SdHluquTGprkHKSm48rthMBOCYY9UA6WH13ejk42hIY5ACvRRYuqRd,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_69c10b7fd32948c7809b93d9ad67b536.jpeg?from=2064092626,天津,2025-05-20 19:33:56,有人想让小米死[抠鼻],17,0,,,否,否,, 51 | 7506568967345275706,独秀青春,1833615464871400,MS4wLjABAAAA8IPy93oRYYhkqxjwLDv2xKBIkRhoa4ZqT-DIUh0YEBppKDdfZ-dFBJeZcS7EVLAf,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oMfKAFgAeOPAEgEp7rlFQB7DEjAIYI7x4f8AAJ.jpeg?from=2064092626,广西,2025-05-21 00:41:32,卖1999价格吧对得起所有米粉[大笑],27,4,,,否,否,, 52 | -------------------------------------------------------------------------------- /crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_002249.csv: -------------------------------------------------------------------------------- 1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户 2 | 7506489038375912250,胖胖的瘦子A,62657988556,MS4wLjABAAAA1VLsgI4LPOjg_fYDHlF5DfDxPxMAzHgNdjb_5O3YYb8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_46f3d3b12c9e74996fac759285ed4810.jpeg?from=2064092626,山东,2025-05-20 19:31:16,以后不准跟我们冷暴力了啊,51643,568,,,否,否,, 3 | 7507676972824658728,四季开农业种子店,4218184767116749,MS4wLjABAAAAFBZ32pjgpvvyrXiSmwt9w11cuV0-TTQCh4x_s1hSd8PtEpUpGN5O3lCdEAM81l4C,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_o0DECIA2gAGkAX86eifj4zBCPAYrWiApwALBAH.jpeg?from=2064092626,江苏,2025-05-24 00:21:19,啥时候出一个主机放车里,或者带身上,屏幕分离,加大屏幕待电量,就像路由器那样把主机也可以放家里,放车里,放包里,放口袋里[泣不成声],出门就拿着屏幕,游戏就数据线连主机[泪奔],0,0,,,否,否,, 4 | 
7506483467308991290,小罗罗,1041991924452430,MS4wLjABAAAAYT4SFMQnzUiCy_-0N8Yl58HRLTfHUfFA88g2wq95HYd4gF9hsw9EMfjd8rlFs_1R,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_e2774d94b419443dbd92a31b11232b0f.jpeg?from=2064092626,重庆,2025-05-20 19:09:40,雷总,我女儿和你一天生日,目标大学武汉大学!我希望我女儿像你一样幸运[比心],29628,776,,,否,否,, 5 | 7506494061470663465,做个向日葵,97352596218,MS4wLjABAAAAOC1_YnR8s2eI8819pC0xBpwdUNWzt1V1Y1UQIuf0G1M,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_3a65ef707b78443bb201bf8bcfae01ff.jpeg?from=2064092626,湖南,2025-05-20 20:07:28,雷总,终于回来了,想死你了,1152,10,,,否,否,, 6 | 7507653481228075788,LUCK(解压食记),3798186297661322,MS4wLjABAAAAhmOuD6-uWnvcn6oRbHa8NKRUJF__NdPDginIxYKwW7efMtdAlg0IkBeVpKsA3dfO,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_19ae47614ad1ea84205b4f9446a250d3.jpeg?from=2064092626,广东,2025-05-23 22:50:00,雷总你把头像把成这个,我买10辆,267,56,,,否,否,, 7 | -------------------------------------------------------------------------------- /crawled_comments/douyin_comments_i5g6kb83_20250312_001130.csv: -------------------------------------------------------------------------------- 1 | 评论ID,昵称,地区,时间,评论,点赞数 2 | -------------------------------------------------------------------------------- /crawled_comments/douyin_comments_i5g6kb83_20250312_001710.csv: -------------------------------------------------------------------------------- 1 | 评论ID,昵称,地区,时间,评论,点赞数 2 | 7480322720539231014,长矛沾屎戳谁谁死 水瓶装尿呲谁谁叫,福建,2025-03-11 07:12:44,"谁懂啊 3 | 毫无欲望的一堵墙",11 4 | 7480373749487125283,不换,海南,2025-03-11 10:30:47,过期之前喊我去你家可以吗,0 5 | 7480167895684023097,a 嘉楠 a,内蒙古,2025-03-10 21:11:48,客厅飘窗下面的抽屉里还有几十盒面膜,二十多个洗面奶,口红二十多个,唇膏十来个,精华液,精油洗发水护发素各十几瓶,两个卫生间柜子里抽屉里都是护肤品[泪奔]卫生巾几十包[泪奔],还有过期的精华放茶几上当护手霜[泪奔]没穿过的新鞋四十多双,衣服更数不清[尬笑],6 6 | 7480358874639270683,饼干小姐🍪,海南,2025-03-11 09:33:02,要不你给我邮点 邮费我自付 不然太可惜了[捂脸],0 7 | 7480160776440693555,90岁冷艳美蟑螂,四川,2025-03-10 20:44:13,我不一样,就跟饕餮一样,不管买了多少一会儿就吃完了,本来打算攒着吃的[憨笑][憨笑],4 8 | 
-------------------------------------------------------------------------------- /crawled_comments/douyin_comments_i5g6kb83_20250313_153107.csv: -------------------------------------------------------------------------------- 1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户 2 | 7480158671126479676,是想引起我的注意吗,60594754198,MS4wLjABAAAAOsqQ7da7h6mZ0HJwd7sYoXFPZ85TAWXZlsvqfUofV4Y,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oUL7AAScGeAtmEInDwFfEEwdDaxAuuEf9ADgA7.jpeg?from=2956013662,广东,2025-03-10 20:36:01,你们是怎么忍住不吃的?中午买的恨不得一晚上炫完[微笑],2743,1072,,,否,否,, 3 | 7481170032313484069,李瑞泽(已老实版),61721752502,MS4wLjABAAAAVrsCLgVGnMdtoSLve6jTbqV-tjDYy-jvCM6YAPXMFXc,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oAgszAIABAzjgqAoEABACoNFxCfNeI1hyAZ9NA.jpeg?from=2956013662,湖北,2025-03-13 14:00:41,我有个坏习惯不吃完不会再去买,只有吃的干干净净了才会再去买[捂脸],1,1,,,否,否,, 4 | 7480109441121256232,·,61306370428,MS4wLjABAAAAYrNuWH0HUR873JZO6er8QJcVqNe8JBa0tnaP47EJ5bk,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/mosaic-legacy_20b4d000515fe287a65c7.jpeg?from=2956013662,湖南,2025-03-10 17:25:05,爱买化妆品。不用。放到过期[泪奔][泪奔][泪奔],264,54,,,否,否,, 5 | 7481178287105114895,🌾 JFF、,60600708724,MS4wLjABAAAAzyK88r686Ncv3AdrJC_CW2Ra-x4EpRGvKYexzZ_DGzs,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oYJ6eAfZCuIAxA9CAVgcEkFEoVARAn7lDbAwCn.jpeg?from=2956013662,四川,2025-03-13 14:32:41,每个月我老公帮我清理一堆,每个月都在说我,说我浪费[捂脸],0,0,,,否,否,, 6 | 7481193537113260860,粽粽👩‍🚀,106072335272,MS4wLjABAAAA62prbZiWNcKLVQjzhz192FOO3wvf8UsWOmnk1kqMUlU,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_850be733592ee535c29723867b84d946.jpeg?from=2956013662,江苏,2025-03-13 15:31:51,我会放到过期,救命啊,然后想吃的时候吧外包装还扔了有些,就总在吃过期的零食感觉,0,0,,,否,否,, 7 | 
7480129278760944393,诶嘿嘿💎,3091441175501164,MS4wLjABAAAAQuARjFtB-BvM1gy5VMHmsmvML8hKPjY-maezMsKfAuU3DXnt9ek-s5mDbMNNKo0D,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_oMfDieUgNAu2GYAQAfWIA3I0pCwLISIwc3f5AB.jpeg?from=2956013662,陕西,2025-03-10 18:42:00,买了辣的回去又想吃甜的,买了甜的回去又想吃咸的,一进零食店又不想买了,回去后又后悔没买[尬笑]跟有病似的,163,21,,,否,否,, 8 | -------------------------------------------------------------------------------- /douyin_analysis_results/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/douyin_analysis_results/README.md -------------------------------------------------------------------------------- /douyin_analysis_results/comment_wordcloud.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/douyin_analysis_results/comment_wordcloud.png -------------------------------------------------------------------------------- /douyin_analysis_results/crawled_comments/douyin_comments_7505063238057889081_20250524_000454.csv: -------------------------------------------------------------------------------- 1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户 2 | 7505554471775208229,玫瑰郁金香~,4195333778452264,MS4wLjABAAAA_9K-j3RLTwTYcwMzzVbuWpWiQaVGVqIQgJCBZq-TlzYGjUUzzorPthVauIvJ2aVC,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_f5bf7e5d516141538f2f894a4a8b3bb0.jpeg?from=2064092626,河北,2025-05-18 07:04:43,[比心][比心][比心],0,0,,,否,否,, 3 | -------------------------------------------------------------------------------- /douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_001812.csv: -------------------------------------------------------------------------------- 1 | 
评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户 2 | 7506489038375912250,胖胖的瘦子A,62657988556,MS4wLjABAAAA1VLsgI4LPOjg_fYDHlF5DfDxPxMAzHgNdjb_5O3YYb8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_46f3d3b12c9e74996fac759285ed4810.jpeg?from=2064092626,山东,2025-05-20 19:31:16,以后不准跟我们冷暴力了啊,51533,568,,,否,否,, 3 | 7506483467308991290,小罗罗,1041991924452430,MS4wLjABAAAAYT4SFMQnzUiCy_-0N8Yl58HRLTfHUfFA88g2wq95HYd4gF9hsw9EMfjd8rlFs_1R,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_e2774d94b419443dbd92a31b11232b0f.jpeg?from=2064092626,重庆,2025-05-20 19:09:40,雷总,我女儿和你一天生日,目标大学武汉大学!我希望我女儿像你一样幸运[比心],29590,775,,,否,否,, 4 | 7506755771851391803,昏君不吃鱼,3019677097539415,MS4wLjABAAAASAtgMxIM7KeG8dKvLaRANhrIjwez7Vdo2L-hdAOgkNzNiL7p1X5KEp91Jx756fsK,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/douyin-user-image-file_85ab3a39770bf1d23b032e7abb2d0682.jpeg?from=2064092626,北京,2025-05-21 12:46:21,您好,雷总,我们是从西安到北京度蜜月的小米 su7 车主,我本人也是红米 turbo 4 pro 手机用户, 我和我的妻子很早之前就已经确定今天到北京了,没想到刚好遇到了明天是发布会,能不能让我们近距离的参加一下发布会🎉('ω')🎉感谢🙏,403,42,,,否,否,, 5 | 7506480179830293275,赶路人,83246963821,MS4wLjABAAAA4SBS2nwK5tgy3D1UdDCm8BCFLSd-XAymBVVcNcQig2w,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o01AcLr7Ar5BVlGQ7n3EBef1zIIsEALUAAAKeG.jpeg?from=2064092626,广东,2025-05-20 18:56:57,雷总超级期待新产品 必须买15sPro,1735,145,,,否,否,, 6 | 7506492129842217768,青禾,64301672255,MS4wLjABAAAAgEDqvTfEbI8noWC1J2SkJvWiwb_iKl_RBUCErHhyiiw,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oEcAyiAsigXDzPy9BIhpZj1iEA7A4Oz3PgIAP.jpeg?from=2064092626,河南,2025-05-20 19:58:04,冷战结束,雷总回来啦,1001,8,,,否,否,, 7 | 7507656153338692391,伟生:岑[cen],96002475445,MS4wLjABAAAASzg3Wj2ivLtKqgQ6CgKYkUDM8-Q6svmJOVvC8_VtTD8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_3c7e31f803801e7ba6daf8ee13ed8d71.jpeg?from=2064092626,广东,2025-05-23 23:00:20,小米芯片牛逼[捂脸],9,0,,,否,否,, 8 | 
7506486944446300979,卢伟冰,2544737774480633,MS4wLjABAAAAeeGpeTBNIRe66uQLgsFZmiRXR4GEQnh6FCtORpNOjXMPVjYqkmDeDtYhzBirNj_k,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oIz1AhKAFIA5LfdA1CctDn9p3bAaCegjPyE4AA.jpeg?from=2064092626,北京,2025-05-20 19:23:13,5月22日,发布会见![爱心],2282,143,,,否,否,, 9 | 7506890136362369849,帽子掉了囖,2634020313116317,MS4wLjABAAAA83Davw1de5aRD3LiEOpk31MyVDWLrgpB6L5Ow979EklND6pUVdVGppOiWMspKBHc,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_osPoDAPI2ICepfGIBAAgbQAHIAwbRfGcCNeCLI.jpeg?from=2064092626,福建,2025-05-21 21:27:44,"雷总,你好,我一直是小米粉丝,16左右就一直用小米,今年3月20号呢我买了小米15pro,直到上周天早上8.40开车出车库的时候,我用手机打开carlife后连接,结果小米15pro页面卡死,而后我去网上查结果发现这个bug目前仅我这台出现,我就去找客服,客服协调后让我去小米之家检查,检查如果有硬件问题就换机, 10 | 11 | 但是都已经恢复了,能测出啥来?日志日志不看,视频视频不看,什么工作态度?我已经录制了视频也把日志导出反馈了,但测试也只是测一测我的手机硬件问题,最后报告出来啥事没有, 12 | 13 | 那会出地库卡死的50分钟,我也有很重要的事情啊,本来9给学生上课,硬生生被推迟,手机使用不了,车库缴费出不去,打车用不了,很难受啊,和客服协商后还好些,但现在啥也不解决,我就很无语了,我需要的是解决方案,而不是一味的不好意思 14 | 一直卡在os启动界面[流泪][流泪][流泪][流泪][流泪][流泪][流泪][流泪],没解决,没下文,好伤心[流泪][流泪][流泪][流泪][流泪][流泪][流泪]@雷军 @雷军小米记事录",92,27,,,否,否,,"104815668206,3094492413967114" 15 | 7507536897355399948,心想事成。,75989719933,MS4wLjABAAAAsQTLpo_8onSZljE6vyzCZVWyC5bhv_JrW5xWCJ-gW0A,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_okDwAEkAVQiKpFGArDMfBuYtfnA4C9gDToIcAA.jpeg?from=2064092626,广东,2025-05-23 15:17:34,湖北的是谁,15,7,,,否,否,, 16 | 7507653481228075788,LUCK(解压食记),3798186297661322,MS4wLjABAAAAhmOuD6-uWnvcn6oRbHa8NKRUJF__NdPDginIxYKwW7efMtdAlg0IkBeVpKsA3dfO,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_19ae47614ad1ea84205b4f9446a250d3.jpeg?from=2064092626,广东,2025-05-23 22:50:00,雷总你把头像把成这个,我买10辆,261,55,,,否,否,, 17 | 7506481351215579942,🌈Y h,59237659425,MS4wLjABAAAAo9spT8FJ4Zdwm7gq0vsXbu6lpauaMK3YYpc9JzvnuJo,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ooxAfC8XVInxkwAAHAAgbTDAl6QS9WaADAeVoC.jpeg?from=2064092626,江苏,2025-05-20 19:01:30,雷总,我的小米14聊天刷视频都烫手[尬笑][尬笑][尬笑],573,215,,,否,否,, 18 | 
7506508566116696890,时间的猫,72933156278,MS4wLjABAAAAaMfSDEj8o6e__89fhI8dqYxefOZl1D4RhFEJROkRwN8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oUB6bzACPAEoArBeAXBaaHIwPiehTzZAUeiEEV.jpeg?from=2064092626,贵州,2025-05-20 20:47:03,准备买苹果的,可是我能等!,44,7,,,否,否,, 19 | 7507670325846393639,凉夜听风/自律,4032664197929355,MS4wLjABAAAAhJE1pcmIqOiCWkfVIY0igBh4iwHMqn3_q_ocA5OVHahwc8uAIcObox0jT4RIobeq,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ogC79ADVvF1fVARWfjVoCAQEAs5FIAmlAQgAVT.jpeg?from=2064092626,甘肃,2025-05-23 23:55:28,"全网最成功的几位博主 20 | 1、湖远行(2534万粉丝) 21 | 2、雷军(4562万粉丝) 22 | 3、董宇辉(2731万粉丝) 23 | 4、我 (59 .79 万粉丝)",3,0,,,否,否,, 24 | 7506483045878727461,在下方何,3369368544881071,MS4wLjABAAAAoaFVNHMv2W9k6GlhkGEr1jBlVkfOZEUbazU6fjL5EqINaMM3SVYokjdyvEzc8yiy,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o4AEEkmfADxVZD4XGgFBgRIE9AzYAqGfAACnA8.jpeg?from=2064092626,广东,2025-05-20 19:35:51,爸:吃了吗,1071,119,,,否,否,, 25 | 7507675421903766330,你好,我是一个喜欢女人的中国男人🇨🇳,66454192133,MS4wLjABAAAAPPa3FpsZScnFuxp4zRe2lw-vVqdBPgmVmYKNsPzUjcM,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oAEbAjHKOFBAlAMAIoEDuRfyQAeYwaapEeRPG9.jpeg?from=2064092626,广东,2025-05-24 00:15:04,只能说,这个芯片更加适合小米。正常来说骁龙芯片,需要各个手机厂商去适配芯片,但是小米的这个芯片,可以更加适配于小米,只能说这次芯片对比骁龙一般般。以后小米更新几代一定会全面超过骁龙的,但是只在小米手机上全面超越,就这个意思。,1,0,,,否,否,, 26 | 7507653011641238331,椰果冻,2955956893538686,MS4wLjABAAAAcUlfUsoimmFwlQ6M0ZRa5YA_k7K11iTX411rfWWKtkkM-kfKD4i_RrMkzaErK5u_,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oYARAfclDEDnFAeAmUELCYEi1EeIXjpwsZ9u4A.jpeg?from=2064092626,安徽,2025-05-23 22:48:05,爸,不许和我们冷战了[酷拽]一个月没理我们了[流泪],15,2,,,否,否,, 27 | 7506486988432016180,一枕清风明月,1516964907065831,MS4wLjABAAAAGfvWblt_lzcqneYBZKw_YU309MWbISyCUGK4y7W6S_zsjC8LXNc1UMk8l9yhv_j6,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_oAHICO1HAD3F6Nmo0OFAg4QE9AsfAvXfAACnAG.jpeg?from=2064092626,陕西,2025-05-20 
19:57:19,别人不断的黑小米,小米还能像个巨人一样屹立不倒,这才是真正用心的企业。希望小米越来越好,164,0,,,否,否,, 28 | 7506881578614735675,鑫河车机,3318751792735328,MS4wLjABAAAANNWgCK0wglqJMiBk8k69F4R1KEQTeKTQ2xpJ2cc_9Uj-RU6D5vCd8bbz_St-9fR1,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_17e2eb811ff3b2760527e5266eb1d654.jpeg?from=2064092626,江苏,2025-05-21 20:54:31,雷总你冷战期间,我偷偷去买了一部 512G+16G的小米15,下次别再冷战了,我已经没钱了,724,52,,,否,否,, 29 | -------------------------------------------------------------------------------- /douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_104505.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_104505.csv -------------------------------------------------------------------------------- /douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_105423.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_105423.csv -------------------------------------------------------------------------------- /douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_113220.csv: -------------------------------------------------------------------------------- 1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户 2 | -------------------------------------------------------------------------------- /douyin_analysis_results/hot_words.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Awesome-pyecharts 6 | 7 | 8 | 9 | 10 | 11 | 12 |
[hot_words.html: pyecharts-generated chart page; HTML markup not preserved in this dump] -------------------------------------------------------------------------------- /douyin_analysis_results/location_analysis.html: -------------------------------------------------------------------------------- [pyecharts-generated chart page, title "Awesome-pyecharts"; HTML markup not preserved in this dump]
-------------------------------------------------------------------------------- /douyin_analysis_results/requirements.txt: -------------------------------------------------------------------------------- 1 | DrissionPage>=3.0.0 2 | pandas>=1.3.0 3 | jieba>=0.42.1 4 | wordcloud>=1.8.2 5 | matplotlib>=3.5.0 6 | pyecharts>=1.9.0 7 | numpy>=1.20.0 8 | pillow>=9.0.0 -------------------------------------------------------------------------------- /douyin_analysis_results/time_analysis.html: -------------------------------------------------------------------------------- [pyecharts-generated chart page, title "Awesome-pyecharts"; HTML markup not preserved in this dump]
12 | 258 | 259 | 260 | -------------------------------------------------------------------------------- /douyin_analysis_results/抖音工具集.py: -------------------------------------------------------------------------------- 1 | """ 2 | 抖音评论爬取与分析工具 - 统一入口 3 | 4 | 功能: 5 | 1. 爬取抖音视频评论 6 | 2. 分析评论数据并生成可视化图表 7 | 3. 支持爬取全部评论,不受限于页数 8 | 4. 支持通过关键词搜索视频并爬取评论 9 | 5. 按日期时间保存数据,便于历史追踪 10 | 11 | 日期: 2025年3月12日 12 | 版本: 3.1 13 | 作者: TO:梁 14 | """ 15 | 16 | import os 17 | import sys 18 | import time 19 | 20 | # 导入爬虫和分析器 21 | try: 22 | from 抖音评论爬虫 import DouyinCommentCrawler 23 | from 抖音数据分析器 import CommentAnalyzer 24 | from 抖音视频搜索 import DouyinVideoSearcher 25 | except ImportError: 26 | print("正在导入模块...") 27 | # 尝试相对导入 28 | try: 29 | # 当前文件目录 30 | current_dir = os.path.dirname(os.path.abspath(__file__)) 31 | if current_dir not in sys.path: 32 | sys.path.append(current_dir) 33 | 34 | from 抖音评论爬虫 import DouyinCommentCrawler 35 | from 抖音数据分析器 import CommentAnalyzer 36 | from 抖音视频搜索 import DouyinVideoSearcher 37 | except ImportError: 38 | print("无法导入必要模块。请确保相关模块文件位于同一目录下。") 39 | sys.exit(1) 40 | 41 | 42 | def show_banner(): 43 | """显示欢迎横幅""" 44 | print("\n" + "=" * 80) 45 | print("抖音评论爬取与分析工具 V3.1".center(78)) 46 | print("=" * 80) 47 | print(" 功能:爬取抖音视频评论数据并生成多维度分析图表") 48 | print(" 特点:支持爬取全部评论 | 自动保存历史数据 | 多维度数据可视化 | 关键词搜索视频") 49 | print(" 作者:TO:梁") 50 | print("=" * 80 + "\n") 51 | 52 | 53 | def print_section(title): 54 | """打印带有分隔符的小节标题""" 55 | print("\n" + "-" * 50) 56 | print(f" {title} ".center(48, "-")) 57 | print("-" * 50) 58 | 59 | 60 | def show_menu(): 61 | """显示主菜单""" 62 | print_section("主菜单") 63 | print("1. 爬取新的评论并分析") 64 | print("2. 分析已有的评论数据") 65 | print("3. 同时执行爬取和分析") 66 | print("4. 通过关键词搜索视频并爬取评论") 67 | print("0. 
退出程序") 68 | return input("\n请选择操作 [0-4]: ") 69 | 70 | 71 | def crawl_comments(): 72 | """爬取评论功能""" 73 | print_section("评论爬取") 74 | 75 | # 获取视频URL 76 | video_url = input("请输入抖音视频URL (例如: https://www.douyin.com/video/7353500880198536457): ") 77 | if not video_url: 78 | print("错误: URL不能为空!") 79 | return None 80 | 81 | # 设置最大爬取页数 82 | try: 83 | pages_input = input("请输入最大爬取页数 (直接回车表示爬取全部评论): ") 84 | max_pages = int(pages_input) if pages_input.strip() else None 85 | except ValueError: 86 | max_pages = None 87 | 88 | if max_pages is None: 89 | print("将爬取全部评论,直到没有更多评论为止") 90 | else: 91 | print(f"将爬取最多 {max_pages} 页评论") 92 | 93 | # 询问是否使用正常模式 94 | use_normal_mode = input("是否使用正常浏览器模式 (可以登录账号) [Y/n]: ").lower() != 'n' 95 | 96 | # 如果使用正常模式,询问是否需要先登录 97 | login_first = False 98 | if use_normal_mode: 99 | login_first = input("是否需要在爬取前先登录抖音账号 [y/N]: ").lower() == 'y' 100 | 101 | # 创建爬虫实例 102 | print("\n正在初始化爬虫...") 103 | crawler = DouyinCommentCrawler( 104 | video_url=video_url, 105 | max_pages=max_pages, 106 | use_normal_mode=use_normal_mode, 107 | login_first=login_first 108 | ) 109 | 110 | # 执行爬取 111 | print("\n开始爬取评论,请稍候...\n") 112 | start_time = time.time() 113 | comments = crawler.start_crawler() 114 | end_time = time.time() 115 | 116 | # 打印爬取结果 117 | if comments: 118 | print(f"\n成功爬取 {len(comments)} 条评论,耗时 {end_time - start_time:.2f} 秒") 119 | print(f"评论已保存到文件: {crawler.get_output_file()}") 120 | return crawler.get_output_file() 121 | else: 122 | print("\n爬取失败或未获取到评论") 123 | return None 124 | 125 | 126 | def analyze_comments(csv_file=None): 127 | """分析评论功能""" 128 | print_section("评论分析") 129 | 130 | if not csv_file: 131 | print("\n请选择操作:") 132 | print("1. 分析最新爬取的评论数据") 133 | print("2. 
指定CSV文件进行分析") 134 | choice = input("请输入选项 (1/2): ") 135 | 136 | if choice == "2": 137 | csv_file = input("请输入CSV文件路径: ") 138 | if not os.path.exists(csv_file): 139 | print(f"错误: 文件 {csv_file} 不存在!") 140 | return 141 | 142 | # 设置词云形状图片 143 | shape_img = input("请输入词云形状图片路径 (留空使用默认形状): ") 144 | if shape_img and not os.path.exists(shape_img): 145 | print(f"警告: 图片 {shape_img} 不存在,将使用默认形状") 146 | shape_img = None 147 | 148 | # 创建分析器 149 | print("\n正在初始化分析器...") 150 | analyzer = CommentAnalyzer(csv_file=csv_file) 151 | 152 | # 执行所有分析 153 | print("\n开始分析评论数据,请稍候...\n") 154 | start_time = time.time() 155 | try: 156 | outputs = analyzer.run_all_analysis(shape_img=shape_img) 157 | end_time = time.time() 158 | 159 | print(f"\n分析完成! 耗时 {end_time - start_time:.2f} 秒") 160 | print("生成的文件:") 161 | for output in outputs: 162 | print(f" - {output}") 163 | except Exception as e: 164 | print(f"分析过程中出错: {str(e)}") 165 | 166 | 167 | def search_and_crawl_comments(): 168 | """通过关键词搜索视频并爬取评论功能""" 169 | print_section("视频搜索与评论爬取") 170 | 171 | # 获取搜索关键词 172 | keyword = input("请输入要搜索的关键词: ") 173 | if not keyword: 174 | print("错误: 关键词不能为空!") 175 | return None 176 | 177 | # 设置搜索结果数量 178 | try: 179 | result_count_input = input("请输入要显示的最大搜索结果数量 (直接回车默认为10): ") 180 | max_results = int(result_count_input) if result_count_input.strip() else 10 181 | except ValueError: 182 | max_results = 10 183 | 184 | # 询问是否使用正常模式 185 | use_normal_mode = input("是否使用正常浏览器模式 (可以登录账号) [Y/n]: ").lower() != 'n' 186 | 187 | # 如果使用正常模式,询问是否需要先登录 188 | login_first = False 189 | if use_normal_mode: 190 | login_first = input("是否需要在搜索前先登录抖音账号 [y/N]: ").lower() == 'y' 191 | 192 | # 创建搜索器实例 193 | print("\n正在初始化搜索器...") 194 | searcher = DouyinVideoSearcher( 195 | use_normal_mode=use_normal_mode, 196 | login_first=login_first 197 | ) 198 | 199 | # 执行搜索 200 | print(f"\n开始搜索关键词: \"{keyword}\",请稍候...\n") 201 | start_time = time.time() 202 | search_results = searcher.search_videos(keyword, max_results) 203 | end_time = time.time() 204 
| 205 | # 处理搜索结果 206 | if not search_results: 207 | print("\n未找到相关视频或搜索失败") 208 | if searcher.driver: 209 | searcher.close() 210 | return None 211 | 212 | print(f"\n成功找到 {len(search_results)} 个相关视频,耗时 {end_time - start_time:.2f} 秒") 213 | 214 | # 显示搜索结果 215 | searcher.display_search_results() 216 | 217 | # 用户选择视频 218 | selected_video = searcher.select_video() 219 | if not selected_video: 220 | print("\n未选择任何视频,操作取消") 221 | if searcher.driver: 222 | searcher.close() 223 | return None 224 | 225 | # 获取选定视频的URL 226 | video_url = selected_video['url'] 227 | 228 | # 设置最大爬取页数 229 | try: 230 | pages_input = input("\n请输入最大爬取页数 (直接回车表示爬取全部评论): ") 231 | max_pages = int(pages_input) if pages_input.strip() else None 232 | except ValueError: 233 | max_pages = None 234 | 235 | if max_pages is None: 236 | print("将爬取全部评论,直到没有更多评论为止") 237 | else: 238 | print(f"将爬取最多 {max_pages} 页评论") 239 | 240 | # 关闭搜索浏览器 241 | if searcher.driver: 242 | searcher.close() 243 | 244 | # 创建爬虫实例 245 | print("\n正在初始化爬虫...") 246 | crawler = DouyinCommentCrawler( 247 | video_url=video_url, 248 | max_pages=max_pages, 249 | use_normal_mode=use_normal_mode, 250 | login_first=login_first 251 | ) 252 | 253 | # 执行爬取 254 | print("\n开始爬取评论,请稍候...\n") 255 | start_time = time.time() 256 | comments = crawler.start_crawler() 257 | end_time = time.time() 258 | 259 | # 打印爬取结果 260 | if comments: 261 | print(f"\n成功爬取 {len(comments)} 条评论,耗时 {end_time - start_time:.2f} 秒") 262 | print(f"评论已保存到文件: {crawler.get_output_file()}") 263 | return crawler.get_output_file() 264 | else: 265 | print("\n爬取失败或未获取到评论") 266 | return None 267 | 268 | 269 | def main(): 270 | """主函数""" 271 | show_banner() 272 | 273 | while True: 274 | choice = show_menu() 275 | 276 | if choice == "1": 277 | # 爬取新评论 278 | csv_file = crawl_comments() 279 | 280 | # 询问是否要分析 281 | if csv_file: 282 | if input("\n是否要分析刚爬取的评论数据? 
(y/n): ").lower() == 'y': 283 | analyze_comments(csv_file) 284 | 285 | elif choice == "2": 286 | # 分析已有评论 287 | analyze_comments() 288 | 289 | elif choice == "3": 290 | # 爬取并分析 291 | csv_file = crawl_comments() 292 | if csv_file: 293 | print("\n自动开始分析评论数据...") 294 | analyze_comments(csv_file) 295 | 296 | elif choice == "4": 297 | # 搜索并爬取评论 298 | csv_file = search_and_crawl_comments() 299 | 300 | # 询问是否要分析 301 | if csv_file: 302 | if input("\n是否要分析刚爬取的评论数据? (y/n): ").lower() == 'y': 303 | analyze_comments(csv_file) 304 | 305 | elif choice == "0": 306 | print("\n感谢使用,再见!") 307 | break 308 | 309 | else: 310 | print("\n无效的选择,请重新输入") 311 | 312 | input("\n按Enter键继续...") 313 | 314 | 315 | if __name__ == "__main__": 316 | main() -------------------------------------------------------------------------------- /douyin_analysis_results/抖音视频搜索.py: -------------------------------------------------------------------------------- 1 | """ 2 | 抖音视频搜索模块 - URL版本 3 | 4 | 功能: 5 | 1. 根据关键词搜索抖音视频 6 | 2. 返回搜索结果中的视频列表 7 | 3. 
支持通过网络接口获取搜索结果,无需浏览器 8 | 9 | 日期: 2024年 10 | """ 11 | 12 | import time 13 | import random 14 | import re 15 | import json 16 | import requests 17 | from urllib.parse import quote, urlencode 18 | import hashlib 19 | 20 | 21 | class DouyinVideoSearcher: 22 | """抖音视频搜索类 - URL版本""" 23 | 24 | def __init__(self, use_normal_mode=True, login_first=False): 25 | """Initialize the searcher. use_normal_mode and login_first are accepted only so that the call in 抖音工具集.py works; this URL-based version ignores them.""" 26 | self.session = requests.Session() 27 | self.headers = { 28 | 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36', 29 | 'Referer': 'https://www.douyin.com/', 30 | 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7', 31 | 'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8', 32 | 'Sec-Fetch-Site': 'same-origin', 33 | 'Sec-Fetch-Mode': 'navigate', 34 | } 35 | self.search_results = [] 36 | self.driver = None  # no browser in the URL version; 抖音工具集.py checks this attribute before calling close() 37 | def search_direct_url(self, keyword, max_videos=10): 38 | """ 39 | 使用网络搜索API直接获取视频列表 40 | 41 | :param keyword: 搜索关键词 42 | :param max_videos: 最大返回视频数量 43 | :return: 视频列表 44 | """ 45 | print(f"\n正在初始化搜索器...") 46 | 47 | if not keyword: 48 | print("错误: 搜索关键词不能为空") 49 | return [] 50 | 51 | print(f"\n开始搜索关键词: \"{keyword}\",请稍候...") 52 | 53 | try: 54 | # 方法1:搜索页URL格式 55 | encoded_keyword = quote(keyword) 56 | search_url = f"https://www.douyin.com/search/{encoded_keyword}" 57 | 58 | # 访问搜索页面 59 | results = self._fetch_search_results(search_url, max_videos) 60 | if results: 61 | print(f"搜索成功,找到 {len(results)} 个视频") 62 | self.search_results = results 63 | return results 64 | 65 | # 方法2:使用热门分享URL 66 | print("尝试使用备用方法搜索...") 67 | try: 68 | backup_results = self._search_by_keywords(keyword, max_videos) 69 | if backup_results: 70 | print(f"备用搜索成功,找到 {len(backup_results)} 个视频") 71 | self.search_results = backup_results 72 | return backup_results 73 | except Exception as e: 74 | print(f"备用搜索方法失败: {e}") 75 | 76 | print("未找到相关视频或搜索失败") 77 | return [] 78 | 79 | except Exception as e: 80 | print(f"搜索出错: 
{str(e)}") 81 | return [] 82 | 83 | def _fetch_search_results(self, url, max_count=10): 84 | """从URL获取搜索结果""" 85 | try: 86 | response = self.session.get(url, headers=self.headers, timeout=10) 87 | if response.status_code != 200: 88 | return [] 89 | 90 | # 尝试从HTML中提取视频信息 91 | html = response.text 92 | video_ids = re.findall(r'/video/(\d+)', html) 93 | 94 | if not video_ids: 95 | return [] 96 | 97 | # 去重 (dict.fromkeys keeps the order in which IDs appear on the page; set() would shuffle them) 98 | video_ids = list(dict.fromkeys(video_ids))[:max_count] 99 | 100 | results = [] 101 | for vid in video_ids: 102 | video_url = f"https://www.douyin.com/video/{vid}" 103 | # 获取视频详情 104 | try: 105 | video_info = self._fetch_video_info(video_url) 106 | if video_info: 107 | results.append(video_info) 108 | except Exception as e: 109 | print(f"获取视频 {vid} 信息失败: {str(e)}") 110 | 111 | return results 112 | 113 | except Exception as e: 114 | print(f"获取搜索结果失败: {str(e)}") 115 | return [] 116 | 117 | def _fetch_video_info(self, video_url): 118 | """获取视频详细信息""" 119 | try: 120 | response = self.session.get(video_url, headers=self.headers, timeout=10) 121 | if response.status_code != 200: 122 | return None 123 | 124 | html = response.text 125 | 126 | # 提取标题 127 | title_match = re.search(r'<title[^>]*>(.*?)</title>', html) 128 | title = title_match.group(1) if title_match else "未知标题" 129 | # 清理标题 130 | title = title.replace(" - 抖音", "").strip() 131 | 132 | # 提取作者 133 | author_match = re.search(r'name="author" content="([^"]+)"', html) 134 | author = author_match.group(1) if author_match else "未知作者" 135 | 136 | # 提取视频ID 137 | video_id = video_url.split("/")[-1].split("?")[0] 138 | 139 | return { 140 | 'title': title, 141 | 'author': author, 142 | 'url': video_url, 143 | 'video_id': video_id, 144 | 'likes': "未知", 145 | 'comments': "未知" 146 | } 147 | 148 | except Exception as e: 149 | print(f"获取视频信息失败: {str(e)}") 150 | return None 151 | 152 | def _search_by_keywords(self, keyword, max_count=10): 153 | """使用关键词搜索抖音视频的URL方法""" 154 | # 构建搜索URL 155 | keyword_for_api = quote(keyword) 156 | search_api_url = 
f"https://www.douyin.com/aweme/v1/web/general/search/single/" 157 | 158 | # 生成时间戳和设备ID 159 | timestamp = str(int(time.time())) 160 | device_id = hashlib.md5(timestamp.encode()).hexdigest()[:16] # 简单模拟设备ID 161 | 162 | # 搜索参数 163 | params = { 164 | 'keyword': keyword, 165 | 'device_platform': 'webapp', 166 | 'source': 'normal_search', 167 | 'search_channel': 'aweme_general', 168 | 'type': 1, # 视频类型 169 | 'device_id': device_id, 170 | 'count': max_count, 171 | 'version_name': '23.5.0', 172 | 'aid': 6383 173 | } 174 | 175 | headers = self.headers.copy() 176 | headers['Content-Type'] = 'application/json' 177 | 178 | try: 179 | response = self.session.get( 180 | search_api_url, 181 | params=params, 182 | headers=headers, 183 | timeout=10 184 | ) 185 | 186 | # 尝试解析JSON响应 187 | if response.status_code == 200: 188 | try: 189 | data = response.json() 190 | if 'status_code' in data and data['status_code'] == 0: 191 | # 提取视频信息 192 | videos = [] 193 | for item in data.get('data', []): 194 | if 'aweme_info' in item: 195 | video = item['aweme_info'] 196 | video_id = video.get('aweme_id') 197 | title = video.get('desc', '未知标题') 198 | author = video.get('author', {}).get('nickname', '未知作者') 199 | video_url = f"https://www.douyin.com/video/{video_id}" 200 | 201 | videos.append({ 202 | 'title': title, 203 | 'author': author, 204 | 'url': video_url, 205 | 'video_id': video_id, 206 | 'likes': "未知", 207 | 'comments': "未知" 208 | }) 209 | 210 | return videos 211 | except json.JSONDecodeError: 212 | pass 213 | 214 | # 备用方法:使用web搜索页面 215 | backup_url = f"https://www.douyin.com/search/{keyword_for_api}?source=normal_search&type=video" 216 | return self._fetch_search_results(backup_url, max_count) 217 | 218 | except Exception as e: 219 | print(f"通过关键词API搜索失败: {str(e)}") 220 | # 尝试备用方法 221 | try: 222 | backup_url = f"https://www.douyin.com/search/{keyword_for_api}?aid=0&source=normal_search&type=video" 223 | return self._fetch_search_results(backup_url, max_count) 224 | except: 225 | return [] 
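The JSON handling in `_search_by_keywords` above can be pulled out into a pure function, which makes it testable without touching the network. The following is a minimal sketch, not the repository's code: `parse_search_response` is a hypothetical helper name, and the payload shape (`status_code`, `data`, `aweme_info`, `aweme_id`, `desc`, `author.nickname`) is assumed only from the fields the method above reads. Unlike the original loop, the sketch skips items that have no `aweme_id`, since those would otherwise yield a `.../video/None` link.

```python
def parse_search_response(data, max_count=10):
    """Hypothetical helper: convert a search-API JSON payload into video dicts.

    The payload layout is an assumption taken from the fields read in
    _search_by_keywords; it is not a documented Douyin API contract.
    """
    # Anything that is not a successful payload yields no results.
    if not isinstance(data, dict) or data.get('status_code') != 0:
        return []

    videos = []
    for item in data.get('data') or []:
        info = item.get('aweme_info') if isinstance(item, dict) else None
        if not info:
            continue
        video_id = info.get('aweme_id')
        if not video_id:
            # Skip entries without an ID instead of building ".../video/None".
            continue
        videos.append({
            'title': info.get('desc', '未知标题'),
            'author': (info.get('author') or {}).get('nickname', '未知作者'),
            'url': f"https://www.douyin.com/video/{video_id}",
            'video_id': video_id,
            'likes': '未知',
            'comments': '未知',
        })
        if len(videos) >= max_count:
            break
    return videos


if __name__ == "__main__":
    # Made-up sample payload, for illustration only.
    sample = {
        'status_code': 0,
        'data': [
            {'aweme_info': {'aweme_id': '123', 'desc': 'demo',
                            'author': {'nickname': 'someone'}}},
            {'aweme_info': {'desc': 'entry without an id'}},
        ],
    }
    for video in parse_search_response(sample):
        print(video['url'], video['title'], video['author'])
```

With a helper like this, the method would only need to perform the request and pass `response.json()` through; falling back to the search page stays the caller's decision when the returned list is empty.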
226 | 227 | def search_videos(self, keyword, max_videos=10): 228 | """保留旧的接口名称兼容已有代码调用""" 229 | return self.search_direct_url(keyword, max_videos) 230 | 231 | def display_search_results(self): 232 | """显示搜索结果""" 233 | if not self.search_results: 234 | print("没有找到视频结果") 235 | return 236 | 237 | print("\n" + "=" * 80) 238 | print(" 搜索结果 ".center(78, "=")) 239 | print("=" * 80) 240 | 241 | for i, video in enumerate(self.search_results): 242 | # 安全获取视频信息 243 | title = video.get('title', '未知标题') 244 | author = video.get('author', '未知作者') 245 | likes = video.get('likes', '未知') 246 | comments = video.get('comments', '未知') 247 | video_url = video.get('url', '') 248 | 249 | print(f"\n[{i+1}] {title}") 250 | print(f" 作者: {author}") 251 | print(f" 点赞: {likes} | 评论: {comments}") 252 | print(f" 链接: {video_url}") 253 | print("-" * 80) 254 | 255 | def select_video(self): 256 | """让用户选择一个视频""" 257 | if not self.search_results: 258 | print("没有可选择的视频") 259 | return None 260 | 261 | # 安全处理,确保至少有一个有效结果 262 | valid_results = [v for v in self.search_results if 'url' in v and v['url']] 263 | if not valid_results: 264 | print("没有找到有效的视频链接") 265 | return None 266 | 267 | # 将有效结果更新回搜索结果 268 | self.search_results = valid_results 269 | 270 | while True: 271 | try: 272 | choice = input("\n请选择要爬取评论的视频编号 [1-{}]: ".format(len(self.search_results))) 273 | 274 | if not choice.strip(): 275 | return None 276 | 277 | index = int(choice) - 1 278 | if 0 <= index < len(self.search_results): 279 | selected_video = self.search_results[index] 280 | print(f"\n已选择: {selected_video.get('title', '未知视频')}") 281 | return selected_video 282 | else: 283 | print(f"无效的选择,请输入 1-{len(self.search_results)} 之间的数字") 284 | except ValueError: 285 | print("请输入有效的数字") 286 | except Exception as e: 287 | print(f"选择视频时发生错误: {str(e)}") 288 | return None 289 | 290 | 291 | def main(): 292 | """主函数""" 293 | print("=" * 60) 294 | print("抖音视频搜索工具 - URL版本") 295 | print("=" * 60) 296 | 297 | # 创建搜索器实例 298 | searcher = 
DouyinVideoSearcher() 299 | 300 | while True: 301 | # 获取搜索关键词 302 | keyword = input("\n请输入搜索关键词 (直接回车退出): ") 303 | 304 | if not keyword.strip(): 305 | print("退出搜索") 306 | break 307 | 308 | # 设置最大返回结果数 309 | try: 310 | max_count_input = input("请输入最大返回结果数 (直接回车使用默认值10): ") 311 | max_count = int(max_count_input) if max_count_input.strip() else 10 312 | except ValueError: 313 | max_count = 10 314 | print("输入格式错误,使用默认值10") 315 | 316 | # 执行搜索 317 | videos = searcher.search_direct_url(keyword, max_count) 318 | 319 | # 显示搜索结果 320 | searcher.display_search_results() 321 | 322 | # 选择视频(如果有结果) 323 | if videos: 324 | selected = searcher.select_video() 325 | if selected: 326 | # 询问是否爬取评论 327 | crawl_choice = input("\n是否立即爬取该视频的评论? (y/n): ") 328 | if crawl_choice.lower() == 'y': 329 | try: 330 | # 导入爬虫模块并爬取评论 331 | from 抖音评论爬虫 import DouyinCommentCrawler 332 | 333 | # 询问是否使用正常浏览器模式 334 | use_normal_mode = input("是否使用正常浏览器模式 (可以登录账号) [Y/n]: ").lower() != 'n' 335 | 336 | # 创建爬虫实例 337 | crawler = DouyinCommentCrawler( 338 | video_url=selected['url'], 339 | use_normal_mode=use_normal_mode, 340 | login_first=False if not use_normal_mode else input("是否需要在爬取前先登录抖音账号 [y/N]: ").lower() == 'y' 341 | ) 342 | 343 | # 执行爬取 344 | crawler.start_crawler() 345 | 346 | except ImportError: 347 | print("未找到评论爬虫模块,请确保 抖音评论爬虫.py 在正确的位置") 348 | except Exception as e: 349 | print(f"爬取评论时出错: {str(e)}") 350 | 351 | # 询问是否继续搜索 352 | continue_choice = input("\n是否继续搜索? (y/n): ") 353 | if continue_choice.lower() != 'y': 354 | break 355 | 356 | print("\n感谢使用抖音视频搜索工具!") 357 | 358 | 359 | if __name__ == "__main__": 360 | main() -------------------------------------------------------------------------------- /douyin_analysis_results/抖音评论分析器_旧版.py: -------------------------------------------------------------------------------- 1 | """ 2 | 抖音视频评论爬取与数据可视化分析工具 3 | 4 | 功能: 5 | 1. 自动爬取指定抖音视频的评论数据 6 | 2. 将评论数据保存为CSV格式 7 | 3. 生成评论词云图 8 | 4. 
评论情感分析与地区分布可视化 9 | 10 | 日期: 2024年 11 | """ 12 | 13 | import time 14 | import json 15 | import datetime 16 | import csv 17 | import os 18 | import random 19 | import jieba 20 | import pandas as pd 21 | import numpy as np 22 | from PIL import Image 23 | import wordcloud 24 | import matplotlib.pyplot as plt 25 | from pyecharts import options as opts 26 | from pyecharts.charts import Pie, Bar, Map, WordCloud as PyechartsWordCloud 27 | from pyecharts.globals import ThemeType 28 | from collections import Counter 29 | from DrissionPage import ChromiumPage 30 | 31 | 32 | class DouyinCommentCrawler: 33 | """抖音评论爬虫类""" 34 | 35 | def __init__(self, video_url=None, video_id=None, max_pages=None): 36 | """ 37 | 初始化爬虫 38 | :param video_url: 视频URL,例如 https://www.douyin.com/video/7353500880198536457 39 | :param video_id: 视频ID,如果提供了video_url则可不提供 40 | :param max_pages: 最大爬取页数,默认为None表示爬取全部评论 41 | """ 42 | self.video_url = video_url 43 | self.video_id = video_id if video_id else self._extract_video_id(video_url) 44 | self.max_pages = max_pages 45 | self.comments = [] 46 | self.driver = None 47 | self.comment_ids = set() # 用于去重的评论ID集合 48 | 49 | # 使用当前日期和时间创建唯一的文件名 50 | current_time = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") 51 | self.output_file = f"douyin_comments_{self.video_id}_{current_time}.csv" 52 | 53 | def _extract_video_id(self, url): 54 | """从URL中提取视频ID""" 55 | if not url: 56 | raise ValueError("需要提供视频URL或视频ID") 57 | return url.split("/")[-1].split("?")[0] 58 | 59 | def start_crawler(self): 60 | """启动爬虫""" 61 | print(f"开始爬取视频 {self.video_id} 的评论...") 62 | 63 | # 创建CSV文件 64 | f = open(self.output_file, mode='w', encoding='utf-8-sig', newline='') 65 | fieldnames = ['评论ID', '昵称', '地区', '时间', '评论', '点赞数'] 66 | csv_writer = csv.DictWriter(f, fieldnames=fieldnames) 67 | csv_writer.writeheader() 68 | 69 | try: 70 | # 初始化浏览器 71 | self.driver = ChromiumPage() 72 | # 监听评论数据API 73 | self.driver.listen.start('aweme/v1/web/comment/list/') 74 | 75 | # 访问视频页面 76 | if 
self.video_url: 77 | self.driver.get(self.video_url) 78 | else: 79 | self.driver.get(f'https://www.douyin.com/video/{self.video_id}') 80 | 81 | # 等待页面加载 82 | time.sleep(5) 83 | 84 | # 尝试点击"查看更多评论"按钮(如果存在) 85 | try: 86 | more_comment_btn = self.driver.find_element('xpath://div[contains(text(), "查看更多评论")]') 87 | if more_comment_btn: 88 | more_comment_btn.click() 89 | time.sleep(2) 90 | except Exception: 91 | print('没有找到"查看更多评论"按钮,继续使用滚动加载评论') 92 | 93 | # 爬取评论 94 | page = 0 95 | no_new_comments_count = 0 # 连续没有新评论的次数 96 | last_comment_count = 0 # 上一次的评论总数 97 | retry_count = 0 # 当前页面重试次数 98 | max_retry = 5 # 最大重试次数 99 | 100 | # 如果设置了最大页数,则限制页数;否则一直爬取直到没有更多评论 101 | while self.max_pages is None or page < self.max_pages: 102 | try: 103 | page += 1 104 | print(f'正在爬取第 {page} 页评论...') 105 | 106 | # 使用不同的滚动策略 107 | if page % 3 == 0: 108 | # 精确滚动到评论区 109 | try: 110 | comment_area = self.driver.find_element('xpath://div[contains(@class, "comment-mainContent")]') 111 | if comment_area: 112 | self.driver.scroll.to_element(comment_area, center=True) 113 | time.sleep(1) 114 | except Exception: 115 | pass 116 | 117 | # 平滑滚动到底部 118 | self.driver.scroll.to_bottom(smooth=True) 119 | else: 120 | # 先快速滚动一段距离,再滚动到底部 121 | self.driver.scroll.down(300) 122 | time.sleep(0.5) 123 | self.driver.scroll.to_bottom() 124 | 125 | # 随机等待时间,模拟人工浏览 126 | wait_time = 1 + random.random() * 2 127 | time.sleep(wait_time) 128 | 129 | # 等待数据包 130 | resp = self.driver.listen.wait(timeout=5) 131 | 132 | if not resp: 133 | print(f"未检测到新的评论数据,尝试继续... 
(重试 {retry_count+1}/{max_retry})") 134 | retry_count += 1 135 | 136 | if retry_count >= max_retry: 137 | no_new_comments_count += 1 138 | retry_count = 0 139 | 140 | if no_new_comments_count >= 3: 141 | print("连续多次未检测到新评论,尝试使用其他方法加载评论") 142 | 143 | # 尝试其他方法触发评论加载 144 | try: 145 | # 尝试点击"展开更多"按钮 146 | expand_btns = self.driver.find_elements('xpath://span[contains(text(), "展开") or contains(text(), "更多")]') 147 | if expand_btns: 148 | for btn in expand_btns[:3]: # 最多点击前3个 149 | try: 150 | btn.click() 151 | time.sleep(1) 152 | except: 153 | pass 154 | except: 155 | pass 156 | 157 | # 再尝试一次,如果还是失败则认为已到达末页 158 | if no_new_comments_count >= 5: 159 | print("已尝试多种方法但无法加载更多评论,可能已到达末页") 160 | break 161 | 162 | # 再次尝试不同的滚动方式 163 | self.driver.scroll.up(200) 164 | time.sleep(1) 165 | self.driver.scroll.to_bottom() 166 | continue 167 | 168 | # 重置重试计数器 169 | retry_count = 0 170 | 171 | # 解析JSON数据 172 | json_data = resp.response.body 173 | 174 | if not json_data or 'comments' not in json_data: 175 | print(f"未获取到有效评论数据,尝试继续... 
(尝试 {no_new_comments_count+1}/3)") 176 | no_new_comments_count += 1 177 | if no_new_comments_count >= 3: 178 | print("连续多次未获取到有效评论数据,可能已到达末页") 179 | break 180 | continue 181 | 182 | # 提取评论 183 | comments = json_data['comments'] 184 | if not comments: 185 | print("本页无评论数据,可能已到达末页") 186 | no_new_comments_count += 1 187 | if no_new_comments_count >= 3: 188 | break 189 | continue 190 | 191 | # 重置无新评论计数器(如果找到了评论) 192 | no_new_comments_count = 0 193 | 194 | # 记录爬取前的评论数和评论ID数 195 | comment_count_before = len(self.comments) 196 | comment_id_count_before = len(self.comment_ids) 197 | 198 | # 处理评论数据 199 | for comment in comments: 200 | try: 201 | comment_id = comment.get('cid', '') or str(comment.get('id', '')) 202 | 203 | # 如果已经处理过这个评论,则跳过 204 | if comment_id in self.comment_ids: 205 | continue 206 | 207 | # 添加到已处理集合 208 | self.comment_ids.add(comment_id) 209 | 210 | nickname = comment['user']['nickname'] 211 | create_time = comment['create_time'] 212 | date = str(datetime.datetime.fromtimestamp(create_time)) 213 | ip_label = comment.get('ip_label', '未知') 214 | text = comment['text'] 215 | digg_count = comment.get('digg_count', 0) # 点赞数 216 | 217 | # 创建评论数据字典 218 | comment_data = { 219 | '评论ID': comment_id, 220 | '昵称': nickname, 221 | '地区': ip_label, 222 | '时间': date, 223 | '评论': text, 224 | '点赞数': digg_count 225 | } 226 | 227 | # 保存到列表和文件 228 | self.comments.append(comment_data) 229 | csv_writer.writerow(comment_data) 230 | print(f"[{len(self.comments)}] 评论: {text[:30]}... - 来自: {nickname} - {ip_label}") 231 | 232 | except Exception as e: 233 | print(f"处理评论时出错: {str(e)}") 234 | 235 | # 检查是否有新的评论被添加 236 | comment_count_added = len(self.comments) - comment_count_before 237 | comment_id_added = len(self.comment_ids) - comment_id_count_before 238 | 239 | print(f"本次获取了 {comment_count_added} 条新评论,累计 {len(self.comments)} 条") 240 | 241 | # 如果没有新的评论ID被添加,说明可能需要尝试其他方法或已到达末页 242 | if comment_id_added == 0: 243 | no_new_comments_count += 1 244 | print(f"未获取到新评论ID,尝试继续... 
(尝试 {no_new_comments_count}/3)") 245 | 246 | # 尝试点击页面上的"查看更多回复"按钮 247 | try: 248 | more_reply_btns = self.driver.find_elements('xpath://span[contains(text(), "查看") and contains(text(), "回复")]') 249 | if more_reply_btns: 250 | for btn in more_reply_btns[:5]: # 最多点击前5个 251 | try: 252 | btn.click() 253 | time.sleep(1) 254 | except: 255 | pass 256 | # 点击了按钮后重置计数器,再次尝试 257 | no_new_comments_count = 0 258 | except: 259 | pass 260 | 261 | if no_new_comments_count >= 3: 262 | print("连续多次未获取到新评论,可能已到达末页") 263 | 264 | # 最后再尝试一次刷新页面的方法 265 | if no_new_comments_count == 3: 266 | print("尝试刷新页面后继续爬取...") 267 | self.driver.refresh() 268 | time.sleep(5) 269 | no_new_comments_count = 2 # 给最后一次机会 270 | continue 271 | break 272 | else: 273 | # 有新评论,重置计数器 274 | no_new_comments_count = 0 275 | 276 | except Exception as e: 277 | print(f"爬取第 {page} 页时出错: {str(e)}") 278 | no_new_comments_count += 1 279 | if no_new_comments_count >= 3: 280 | print("连续多次爬取出错,停止爬取") 281 | break 282 | 283 | print(f"评论爬取完成,共获取 {len(self.comments)} 条评论") 284 | return self.comments 285 | 286 | except Exception as e: 287 | print(f"爬虫运行出错: {str(e)}") 288 | return [] 289 | 290 | finally: 291 | # 关闭文件和浏览器 292 | f.close() 293 | if self.driver: 294 | self.driver.quit() 295 | 296 | def get_output_file(self): 297 | """获取输出文件路径""" 298 | return self.output_file 299 | 300 | 301 | class CommentAnalyzer: 302 | """评论分析与可视化类""" 303 | 304 | def __init__(self, csv_file=None, comments=None): 305 | """ 306 | 初始化分析器 307 | :param csv_file: CSV文件路径 308 | :param comments: 评论数据列表,如果没有提供CSV文件则使用此数据 309 | """ 310 | self.csv_file = csv_file 311 | self.comments = comments 312 | self.df = None 313 | self.output_dir = "douyin_analysis_results" 314 | 315 | # 创建输出目录 316 | if not os.path.exists(self.output_dir): 317 | os.makedirs(self.output_dir) 318 | 319 | def find_latest_csv(self, video_id=None): 320 | """ 321 | 查找最新的评论CSV文件 322 | :param video_id: 可选的视频ID过滤条件 323 | :return: 最新CSV文件的路径,如果未找到则返回None 324 | """ 325 | all_csv_files = [] 326 | 327 
| # 搜索当前目录下的所有CSV文件 328 | for file in os.listdir('.'): 329 | if file.startswith('douyin_comments_') and file.endswith('.csv'): 330 | # 如果指定了视频ID,则只查找该视频的CSV文件 331 | if video_id and video_id not in file: 332 | continue 333 | all_csv_files.append(file) 334 | 335 | if not all_csv_files: 336 | return None 337 | 338 | # 按文件修改时间排序,返回最新的文件 339 | latest_file = max(all_csv_files, key=lambda x: os.path.getmtime(x)) 340 | print(f"找到最新的CSV文件: {latest_file}") 341 | return latest_file 342 | 343 | def load_data(self): 344 | """加载数据""" 345 | if self.csv_file and os.path.exists(self.csv_file): 346 | self.df = pd.read_csv(self.csv_file) 347 | print(f"从 {self.csv_file} 加载了 {len(self.df)} 条评论") 348 | elif self.comments: 349 | self.df = pd.DataFrame(self.comments) 350 | print(f"从内存加载了 {len(self.df)} 条评论") 351 | else: 352 | # 尝试查找最新的CSV文件 353 | latest_csv = self.find_latest_csv() 354 | if latest_csv: 355 | self.csv_file = latest_csv 356 | self.df = pd.read_csv(self.csv_file) 357 | print(f"自动从最新文件 {self.csv_file} 加载了 {len(self.df)} 条评论") 358 | else: 359 | raise ValueError("需要提供CSV文件或评论数据") 360 | 361 | return self.df 362 | 363 | def generate_wordcloud(self, shape_img=None, output_file=None): 364 | """ 365 | 生成词云图 366 | :param shape_img: 形状图片路径 367 | :param output_file: 输出文件名 368 | :return: 输出文件路径 369 | """ 370 | if self.df is None: 371 | self.load_data() 372 | 373 | # 默认输出文件名 374 | if not output_file: 375 | output_file = os.path.join(self.output_dir, "comment_wordcloud.png") 376 | 377 | print("正在生成词云图...") 378 | 379 | # 合并所有评论 380 | content = ' '.join([str(i).replace('\n', '') for i in self.df['评论']]) 381 | 382 | # 结巴分词 383 | jieba.setLogLevel(20) # 设置日志级别,避免输出过多日志 384 | words = jieba.lcut(content) 385 | string = ' '.join(words) 386 | 387 | # 加载形状图片 388 | mask = None 389 | if shape_img and os.path.exists(shape_img): 390 | mask = np.array(Image.open(shape_img)) 391 | 392 | # 设置停用词 393 | stopwords = {'了', '的', '我', '你', '是', '都', '把', '能', '就', '这', '还', 394 | '和', '啊', '在', '吧', '有', '也', 
                     '不', '呢', '吗', '啥', '怎么',
                     '一个', '什么', '一下', '一样', '一直', '为了', '可以', '那么'}

        # Configure the word cloud
        wc = wordcloud.WordCloud(
            font_path='simhei.ttf' if os.path.exists('simhei.ttf') else None,  # Font file
            width=1000,  # Width
            height=700,  # Height
            mask=mask,  # Word cloud shape
            background_color='white',  # Background color
            max_words=200,  # Maximum number of words
            stopwords=stopwords,  # Stop words
            contour_width=1,  # Contour width
            contour_color='steelblue'  # Contour color
        )

        # Generate the word cloud
        wc.generate(string)

        # Save the word cloud image
        wc.to_file(output_file)
        print(f"Word cloud saved to: {output_file}")

        return output_file

    def analyze_location(self, output_file=None):
        """
        Analyze the regional distribution of comments.
        :param output_file: output file name
        :return: output file path
        """
        if self.df is None:
            self.load_data()

        if not output_file:
            output_file = os.path.join(self.output_dir, "location_analysis.html")

        print("Analyzing comment location distribution...")

        # Count comments per region ('地区' is the region column in the CSV)
        location_count = self.df['地区'].value_counts()

        # Keep the top 15 regions
        top_locations = location_count.head(15)

        # Build the pie chart; .tolist() converts NumPy scalars to native Python
        # types, which pyecharts expects when it serializes chart data
        pie = (
            Pie(init_opts=opts.InitOpts(theme=ThemeType.LIGHT, width="900px", height="500px"))
            .add(
                "",
                [list(z) for z in zip(top_locations.index.tolist(), top_locations.values.tolist())],
                radius=["30%", "75%"],
                center=["50%", "50%"],
                rosetype="radius",
            )
            .set_global_opts(
                title_opts=opts.TitleOpts(title="Comment Location Distribution"),
                legend_opts=opts.LegendOpts(orient="vertical", pos_left="5%", pos_top="15%"),
                toolbox_opts=opts.ToolboxOpts()
            )
            .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c} ({d}%)"))
        )

        # Save the chart
        pie.render(output_file)
        print(f"Location analysis saved to: {output_file}")

        return output_file

    def analyze_time_distribution(self, output_file=None):
        """
        Analyze the time distribution of comments.
        :param output_file: output file name
        :return: output file path
        """
        if self.df is None:
            self.load_data()

        if not output_file:
            output_file = os.path.join(self.output_dir, "time_analysis.html")

        print("Analyzing comment time distribution...")

        # Convert the time strings to datetime objects ('时间' is the timestamp column)
        self.df['时间'] = pd.to_datetime(self.df['时间'])

        # Extract the hour of day
        self.df['小时'] = self.df['时间'].dt.hour

        # Count comments per hour
        hour_count = self.df['小时'].value_counts().sort_index()

        # Build the bar chart
        bar = (
            Bar(init_opts=opts.InitOpts(theme=ThemeType.LIGHT, width="900px", height="500px"))
            .add_xaxis(hour_count.index.tolist())
            .add_yaxis("Comments", hour_count.values.tolist())
            .set_global_opts(
                title_opts=opts.TitleOpts(title="Comment Time Distribution (Hour)"),
                xaxis_opts=opts.AxisOpts(name="Hour"),
                yaxis_opts=opts.AxisOpts(name="Comments"),
                toolbox_opts=opts.ToolboxOpts()
            )
        )

        # Save the chart
        bar.render(output_file)
        print(f"Time analysis saved to: {output_file}")

        return output_file

    def analyze_hot_words(self, top_n=50, output_file=None):
        """
        Analyze the most frequent words.
        :param top_n: number of hot words to keep
        :param output_file: output file name
        :return: output file path
        """
        if self.df is None:
            self.load_data()

        if not output_file:
            output_file = os.path.join(self.output_dir, "hot_words.html")

        print(f"Analyzing hot words (Top {top_n})...")

        # Join all comments ('评论' is the comment column in the CSV)
        content = ' '.join([str(i).replace('\n', '') for i in self.df['评论']])

        # Segment the text with jieba
        jieba.setLogLevel(20)
        words = jieba.lcut(content)

        # Drop stop words and single-character tokens
        stopwords = {'了', '的', '我', '你', '是', '都', '把', '能', '就', '这', '还',
                     '和', '啊', '在', '吧', '有', '也', '不', '呢', '吗', '啥', '怎么',
                     '一个', '什么', '一下', '一样', '一直', '为了', '可以', '那么'}
        filtered_words = [word for word in words if len(word) > 1 and word not in stopwords]

        # Count word frequencies
        word_count = Counter(filtered_words)

        # Keep the top N most frequent words
        top_words = word_count.most_common(top_n)

        # Build the word cloud chart
        wordcloud_chart = (
            PyechartsWordCloud(init_opts=opts.InitOpts(
                theme=ThemeType.LIGHT, width="900px", height="500px")
            )
            .add(
                "",
                top_words,
                word_size_range=[20, 100],
                shape="circle"
            )
            .set_global_opts(
                title_opts=opts.TitleOpts(title=f"Hot Words Top {top_n}"),
                toolbox_opts=opts.ToolboxOpts()
            )
        )

        # Save the chart
        wordcloud_chart.render(output_file)
        print(f"Hot word analysis saved to: {output_file}")

        return output_file

    def run_all_analysis(self, shape_img=None):
        """
        Run every analysis.
        :param shape_img: shape image for the word cloud
        :return: list of all output files
        """
        if self.df is None:
            self.load_data()

        outputs = []

        # Word cloud
        outputs.append(self.generate_wordcloud(shape_img))

        # Location analysis
        outputs.append(self.analyze_location())

        # Time analysis
        outputs.append(self.analyze_time_distribution())

        # Hot word analysis
        outputs.append(self.analyze_hot_words())

        print(f"All analyses finished; results saved in the {self.output_dir} directory")
        return outputs


def main():
    """Entry point."""
    print("=" * 60)
    print("Douyin video comment crawler and data visualization tool")
    print("=" * 60)

    # Ask the user which operation to run
    mode = input("Select an operation:\n1. Crawl new comments and analyze them\n2. Analyze an existing CSV file\nEnter an option number (1/2): ").strip()

    if mode == "2":
        # Analyze an existing CSV file
        print("\n== Analyze existing comment data ==")

        # Initialize the analyzer; it looks up the latest CSV file automatically
        analyzer = CommentAnalyzer()

        # Word cloud shape image
        shape_img = input("Enter the path to the word cloud shape image (leave empty for the default shape): ")
        if shape_img and not os.path.exists(shape_img):
            print(f"Warning: image {shape_img} does not exist; using the default shape")
            shape_img = None

        # Run every analysis
        analyzer.run_all_analysis(shape_img=shape_img)

    else:
        # Crawl new comment data
        print("\n== Crawl new comment data ==")

        # Get the video URL
        video_url = input("Enter the Douyin video URL (e.g. https://www.douyin.com/video/7353500880198536457): ")

        # Maximum number of pages to crawl; empty or invalid input means no limit
        try:
            pages_input = input("Enter the maximum number of pages to crawl (press Enter to crawl all comments): ")
            max_pages = int(pages_input) if pages_input.strip() else None
        except ValueError:
            max_pages = None

        if max_pages is None:
            print("Crawling all comments until none remain")
        else:
            print(f"Crawling at most {max_pages} pages of comments")

        # Word cloud shape image
        shape_img = input("Enter the path to the word cloud shape image (leave empty for the default shape): ")
        if shape_img and not os.path.exists(shape_img):
            print(f"Warning: image {shape_img} does not exist; using the default shape")
            shape_img = None

        # Create the crawler instance
        crawler = DouyinCommentCrawler(video_url=video_url, max_pages=max_pages)

        # Run the crawl
        comments = crawler.start_crawler()

        if comments:
            # Get the output file
            csv_file = crawler.get_output_file()

            # Create the analyzer
            analyzer = CommentAnalyzer(csv_file=csv_file)

            # Run every analysis
            analyzer.run_all_analysis(shape_img=shape_img)

    print("Program finished!")


if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
DrissionPage>=4.0.0
pandas>=1.0.0
numpy>=1.18.0
jieba>=0.42.1
Pillow>=8.0.0
wordcloud>=1.8.0
matplotlib>=3.3.0
pyecharts>=1.9.0
scikit-learn>=0.24.0
networkx>=2.5.0
--------------------------------------------------------------------------------