├── .gitignore
├── README.md
├── analysis_results
│   ├── comment_wordcloud.png
│   ├── content_tags.html
│   ├── hot_topics.html
│   ├── hot_words.html
│   ├── interaction_network.html
│   ├── location_analysis.html
│   ├── sentiment_analysis.html
│   ├── time_analysis.html
│   ├── user_influence.html
│   └── user_portraits.html
├── crawled_comments
│   ├── douyin_comments_H_O7wyWOUQQ_20250312_181221.csv
│   ├── douyin_comments__20250311_212326.csv
│   ├── douyin_comments__20250311_223146.csv
│   ├── douyin_comments__20250311_231655.csv
│   ├── douyin_comments__20250312_001546.csv
│   ├── douyin_comments__20250312_002242.csv
│   ├── douyin_comments_fEJejTD6CQ8_20250524_000829.csv
│   ├── douyin_comments_fEJejTD6CQ8_20250524_002249.csv
│   ├── douyin_comments_i5g6kb83_20250312_001130.csv
│   ├── douyin_comments_i5g6kb83_20250312_001710.csv
│   ├── douyin_comments_i5g6kb83_20250312_003421.csv
│   ├── douyin_comments_i5g6kb83_20250313_153107.csv
│   └── douyin_comments_i5ph454C_20250312_233129.csv
├── douyin_analysis_results
│   ├── README.md
│   ├── comment_wordcloud.png
│   ├── crawled_comments
│   │   ├── douyin_comments_7505063238057889081_20250524_000454.csv
│   │   ├── douyin_comments_fEJejTD6CQ8_20250524_001812.csv
│   │   ├── douyin_comments_fEJejTD6CQ8_20250524_104505.csv
│   │   ├── douyin_comments_fEJejTD6CQ8_20250524_105423.csv
│   │   └── douyin_comments_fEJejTD6CQ8_20250524_113220.csv
│   ├── hot_words.html
│   ├── location_analysis.html
│   ├── requirements.txt
│   ├── time_analysis.html
│   ├── 抖音工具集.py
│   ├── 抖音数据分析器.py
│   ├── 抖音视频搜索.py
│   ├── 抖音评论分析器_旧版.py
│   └── 抖音评论爬虫.py
└── requirements.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | # Python temporary files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | *.so
6 | .Python
7 | env/
8 | build/
9 | develop-eggs/
10 | dist/
11 | downloads/
12 | eggs/
13 | .eggs/
14 | lib/
15 | lib64/
16 | parts/
17 | sdist/
18 | var/
19 | *.egg-info/
20 | .installed.cfg
21 | *.egg
22 |
23 | # Virtual environments
24 | venv/
25 | ENV/
26 | .env
27 |
28 | # IDE files
29 | .idea/
30 | .vscode/
31 | *.swp
32 | *.swo
33 |
34 | # Log files
35 | logs/
36 | *.log
37 |
38 | # System files
39 | .DS_Store
40 | Thumbs.db
41 |
42 | # Config files that may contain sensitive information
43 | config.ini
44 | secrets.json
45 |
46 | # Cache directories
47 | .pytest_cache/
48 | .coverage
49 | htmlcov/
50 |
51 | # Potentially large data files
52 | # *.csv # Commented out so that CSV data files are kept
53 | *.xlsx
54 | *.db
55 | *.sqlite3
56 |
57 | # Still keep the sample data files the project needs and the crawled comment data
58 | !example_data/*.csv
59 | !crawled_comments/*.csv
60 | !analysis_results/*.csv
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
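1 | # Douyin Comment Crawler and Analysis Tool
2 |
3 | ## Features
4 |
5 | This tool crawls and analyzes Douyin comments. It provides the following features:
6 |
7 | 1. **Comment crawling**: automatically crawls the comments of a given Douyin video
8 | - Supports Douyin links in multiple formats (short links and standard links)
9 | - Can crawl all comments, with no page limit
10 | - Handles scrolling and loading automatically to improve the success rate of comment retrieval
11 | - Can extract the replies to each comment
12 | - Extracts detailed comment metadata (user information, time, location, and so on)
13 |
14 | 2. **Basic analysis**:
15 | - Comment word cloud generation
16 | - Regional distribution of comments
17 | - Time distribution of comments
18 | - Hot word statistics
19 | - Sentiment analysis
20 |
21 | 3. **Advanced data analysis**:
22 | - **User portrait analysis**: analyzes user characteristics, active hours, regional distribution, and language style
23 | - **Comment interaction graph**: visualizes reply and interaction relationships between users
24 | - **Content tag distribution analysis**: automatically extracts and analyzes comment topics and tags
25 | - **User activity and influence analysis**: identifies high-influence users and active users
26 | - **Hot topic detection and trend tracking**: analyzes hot topics in the comments and how they change over time
27 |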
28 | ## Installing Dependencies
29 |
30 | ```bash
31 | pip install -r requirements.txt
32 | ```
33 |
34 | ## Usage
35 |
36 | ### Running the main program
37 |
38 | ```bash
39 | python douyin_analysis_results/douyin_tool.py
40 | ```
41 |
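42 | ### Crawling comments
43 |
44 | 1. Select "Crawl new comments and analyze" (`爬取新的评论并分析`) from the menu
45 | 2. Enter the Douyin video URL (multiple formats are supported)
46 | 3. Set the maximum number of pages to crawl (press Enter to crawl all comments)
47 | 4. Choose whether to use normal browser mode (which lets you log in to an account to view comments)
48 | 5. Wait for the crawl to finish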
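
Step 2 accepts both short and standard links. As a rough illustration of how the two forms could be told apart, here is a minimal sketch. The function name `parse_douyin_url` and both URL shapes (`https://v.douyin.com/<code>/` and `https://www.douyin.com/video/<numeric id>`) are assumptions made for this example; the README does not show the crawler's actual parsing logic. The identifiers in the sample calls are taken from CSV filenames in this repository, and pairing them with these URL shapes is likewise an assumption.

```python
import re
from urllib.parse import urlparse


def parse_douyin_url(url: str) -> tuple[str, str]:
    """Classify a Douyin link and return (kind, identifier).

    Assumed URL shapes (not confirmed by this README):
      short:    https://v.douyin.com/<code>/
      standard: https://www.douyin.com/video/<numeric id>
    """
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    path = parsed.path
    if host == "v.douyin.com":
        # Short links carry an opaque code as the only path segment.
        code = path.strip("/")
        if code:
            return ("short", code)
    elif host.endswith("douyin.com"):
        # Standard links carry the numeric video id after /video/.
        match = re.search(r"/video/(\d+)", path)
        if match:
            return ("standard", match.group(1))
    raise ValueError(f"Unrecognized Douyin URL: {url!r}")


short_link = parse_douyin_url("https://v.douyin.com/fEJejTD6CQ8/")
standard_link = parse_douyin_url("https://www.douyin.com/video/7505063238057889081")
```

A short link would normally still need to be resolved (for example by following its redirect) before the numeric video id is known; that step is left out here.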
49 |
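50 | ### Analyzing comments
51 |
52 | 1. Select "Analyze existing comment data" (`分析已有的评论数据`) from the menu
53 | 2. Select the CSV file to analyze
54 | 3. Select a shape image for the word cloud (optional)
55 | 4. Wait for the analysis to finish; the results are saved in the `analysis_results` directory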
56 |
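57 | ## Analysis Features in Detail
58 |
59 | ### 1. User portrait analysis
60 |
61 | - **Language style analysis**: identifies internet slang, student vocabulary, workplace vocabulary, and similar styles in users' comments
62 | - **Active hours analysis**: analyzes how user activity is distributed over the hours of the day
63 | - **Interaction frequency analysis**: separates high-, medium-, and low-frequency users
64 | - **Regional heat map**: shows the geographic distribution of users on a map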
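
The active hours analysis can be pictured as a 24-bin histogram over the `时间` column. The following is a minimal sketch, not the project's implementation: the helper name `hourly_activity` is hypothetical, and the timestamp format `YYYY-MM-DD HH:MM:SS` is taken from the sample CSV files in this repository.

```python
from collections import Counter
from datetime import datetime


def hourly_activity(timestamps):
    """Count comments per hour of day (0-23) from 'YYYY-MM-DD HH:MM:SS' strings."""
    counts = Counter()
    for ts in timestamps:
        counts[datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour] += 1
    # Return a fixed-length list so hours without comments show up as 0.
    return [counts.get(hour, 0) for hour in range(24)]


# Timestamps copied from the sample comment data.
sample = ["2025-05-20 19:31:16", "2025-05-20 19:09:40", "2025-05-21 12:46:21"]
hist = hourly_activity(sample)
```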
65 |
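66 | ### 2. Comment interaction graph analysis
67 |
68 | - **User interaction network**: visualizes reply and interaction relationships between users
69 | - **Central user identification**: automatically identifies central users and opinion leaders in the network
70 | - **Interaction pattern analysis**: analyzes interaction patterns and social structure among users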
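
One way to derive such a network from the crawled data is to treat each reply as a directed edge from the commenter (`昵称`) to the user being replied to (`回复给用户`), then rank users by incoming replies. This is a sketch under that assumption; the function names and the sample rows are hypothetical, and the project may use a different centrality measure.

```python
from collections import Counter


def build_reply_graph(rows):
    """Return a Counter mapping (source, target) to the number of replies."""
    edges = Counter()
    for row in rows:
        source = (row.get("昵称") or "").strip()
        target = (row.get("回复给用户") or "").strip()
        # Skip top-level comments (no target) and self-replies.
        if source and target and source != target:
            edges[(source, target)] += 1
    return edges


def most_replied_to(edges, top_n=3):
    """Rank users by weighted in-degree, a simple proxy for centrality."""
    in_weight = Counter()
    for (_, target), weight in edges.items():
        in_weight[target] += weight
    return in_weight.most_common(top_n)


# Hypothetical rows for illustration only.
rows = [
    {"昵称": "user_b", "回复给用户": "user_a"},
    {"昵称": "user_b", "回复给用户": "user_a"},
    {"昵称": "user_c", "回复给用户": "user_a"},
    {"昵称": "user_a", "回复给用户": ""},
]
edges = build_reply_graph(rows)
top = most_replied_to(edges, top_n=1)
```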
71 |
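72 | ### 3. Content tag distribution analysis
73 |
74 | - **Topic classification**: automatically assigns comments to predefined topics
75 | - **Keyword extraction**: extracts the main keywords and topic tags from the comment text
76 | - **Popular tag statistics**: analyzes the popular tags and topics that appear in the comments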
77 |
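78 | ### 4. User activity and influence analysis
79 |
80 | - **Active user ranking**: identifies the most active users by comment frequency
81 | - **Influence score**: computes each user's influence from comment count, like count, and comment quality
82 | - **Content contribution analysis**: identifies users who contribute high-quality content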
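
The README does not give the influence formula, so the following is only one plausible shape for it: a weighted sum of log-scaled comment count, like count, and reply count, with the reply count standing in for "comment quality". The weights, the use of `log1p`, and the function name `influence_scores` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def influence_scores(rows, w_comments=1.0, w_likes=1.0, w_replies=0.5):
    """Rank users by a hypothetical influence score (weights are assumptions)."""
    totals = defaultdict(lambda: {"comments": 0, "likes": 0, "replies": 0})
    for row in rows:
        user = row["昵称"]
        totals[user]["comments"] += 1
        totals[user]["likes"] += int(row.get("点赞数") or 0)
        totals[user]["replies"] += int(row.get("回复数") or 0)
    # log1p dampens the effect of a single viral comment.
    scores = {
        user: w_comments * math.log1p(t["comments"])
        + w_likes * math.log1p(t["likes"])
        + w_replies * math.log1p(t["replies"])
        for user, t in totals.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Like and reply counts copied from the sample comment data.
rows = [
    {"昵称": "小罗罗", "点赞数": "29492", "回复数": "775"},
    {"昵称": "胖胖的瘦子A", "点赞数": "51405", "回复数": "565"},
    {"昵称": "温柔", "点赞数": "0", "回复数": "0"},
]
ranking = influence_scores(rows)
```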
83 |
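84 | ### 5. Hot topic detection and tracking
85 |
86 | - **Topic detection**: automatically identifies the main topics in the comments
87 | - **Time trend analysis**: tracks how topic popularity changes over time
88 | - **Keyword importance**: uses TF-IDF to measure the importance of keywords
89 | - **Emerging topic detection**: detects topics that suddenly spike in the comments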
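
To make the TF-IDF bullet concrete, here is a small dependency-free sketch using the textbook definition, where the weight of a term in a comment is (term count / comment length) × log(N / number of comments containing the term). It assumes the comments have already been split into tokens (for Chinese text a word segmenter would be needed first), and the function name `tf_idf` is hypothetical. Library implementations often use smoothed variants of the IDF term, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document.
        doc_freq.update(set(tokens))
    weights = []
    for tokens in docs:
        term_counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


# Hypothetical pre-tokenized comments.
docs = [["雷总", "回来"], ["雷总", "冷战"], ["雷总", "芯片", "芯片"]]
weights = tf_idf(docs)
```

A term that appears in every comment (here `雷总`) gets weight zero, which is why TF-IDF surfaces distinctive keywords rather than merely frequent ones.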
90 |
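91 | ## Notes
92 |
93 | - Make sure all dependencies are installed before use
94 | - The first run may need to download a browser driver
95 | - Crawling a large number of comments can take a long time
96 | - Some analysis features may require a fairly powerful machine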
--------------------------------------------------------------------------------
/analysis_results/comment_wordcloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/analysis_results/comment_wordcloud.png
--------------------------------------------------------------------------------
/analysis_results/content_tags.html:
--------------------------------------------------------------------------------
(pyecharts-generated HTML chart page titled "Awesome-pyecharts"; the markup is not preserved in this extract)
--------------------------------------------------------------------------------
/analysis_results/hot_words.html:
--------------------------------------------------------------------------------
(pyecharts-generated HTML chart page titled "Awesome-pyecharts"; the markup is not preserved in this extract)
--------------------------------------------------------------------------------
/analysis_results/location_analysis.html:
--------------------------------------------------------------------------------
(pyecharts-generated HTML chart page titled "Awesome-pyecharts"; the markup is not preserved in this extract)
--------------------------------------------------------------------------------
/analysis_results/sentiment_analysis.html:
--------------------------------------------------------------------------------
(pyecharts-generated HTML chart page titled "Awesome-pyecharts"; the markup is not preserved in this extract)
--------------------------------------------------------------------------------
/analysis_results/time_analysis.html:
--------------------------------------------------------------------------------
(pyecharts-generated HTML chart page titled "Awesome-pyecharts"; the markup is not preserved in this extract)
--------------------------------------------------------------------------------
/analysis_results/user_portraits.html:
--------------------------------------------------------------------------------
(pyecharts-generated HTML chart page titled "Awesome-pyecharts"; the markup is not preserved in this extract)
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250311_212326.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 | 7480167895684023097,a 嘉楠 a,内蒙古,2025-03-10 21:11:48,客厅飘窗下面的抽屉里还有几十盒面膜,二十多个洗面奶,口红二十多个,唇膏十来个,精华液,精油洗发水护发素各十几瓶,两个卫生间柜子里抽屉里都是护肤品[泪奔]卫生巾几十包[泪奔],还有过期的精华放茶几上当护手霜[泪奔]没穿过的新鞋四十多双,衣服更数不清[尬笑],6
3 | 7480146101266957091,麻薯小丸子,广东,2025-03-10 19:47:17,寄给我吧,我爱吃,0
4 | 7480142284349440828,璟恩子,四川,2025-03-10 19:32:26,不爱吃零食,3
5 | 7480419867767866170,啊~biu,四川,2025-03-11 13:29:38,我也是!但是我觉得这就是在宴请小时候的自己[大笑],11
6 | 7480129278760944393,诶嘿嘿💎,陕西,2025-03-10 18:42:00,买了辣的回去又想吃甜的,买了甜的回去又想吃咸的,一进零食店又不想买了,回去后又后悔没买[尬笑]跟有病似的,114
7 |
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250311_223146.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250311_231655.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250312_001546.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments__20250312_002242.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_000829.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户
2 | 7507653481228075788,LUCK(解压食记),3798186297661322,MS4wLjABAAAAhmOuD6-uWnvcn6oRbHa8NKRUJF__NdPDginIxYKwW7efMtdAlg0IkBeVpKsA3dfO,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_19ae47614ad1ea84205b4f9446a250d3.jpeg?from=2064092626,广东,2025-05-23 22:50:00,雷总你把头像把成这个,我买10辆,240,49,,,否,否,,
3 | 7507664919627662121,温柔,96357929612,MS4wLjABAAAAq03QBbhYNNBQfIRYLPy3m-CxlJexMOE_k3EE1h_VubE,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oE5meaBHAAAOA9rpgI8oSoCnHbXofIdDAACCTQ.jpeg?from=2064092626,福建,2025-05-24 00:05:21,要想让这款u热火,搭载这款cpu的手机开放root权限就是最好的路,0,0,,,否,否,,
4 | 7506483467308991290,小罗罗,1041991924452430,MS4wLjABAAAAYT4SFMQnzUiCy_-0N8Yl58HRLTfHUfFA88g2wq95HYd4gF9hsw9EMfjd8rlFs_1R,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_e2774d94b419443dbd92a31b11232b0f.jpeg?from=2064092626,重庆,2025-05-20 19:09:40,雷总,我女儿和你一天生日,目标大学武汉大学!我希望我女儿像你一样幸运[比心],29492,775,,,否,否,,
5 | 7506489038375912250,胖胖的瘦子A,62657988556,MS4wLjABAAAA1VLsgI4LPOjg_fYDHlF5DfDxPxMAzHgNdjb_5O3YYb8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_46f3d3b12c9e74996fac759285ed4810.jpeg?from=2064092626,山东,2025-05-20 19:31:16,以后不准跟我们冷暴力了啊,51405,565,,,否,否,,
6 | 7506482138801226522,迷路的旺仔,81924511976,MS4wLjABAAAAJmDSj5YI7AEl3hepgkNBwaCMhDzDm-9M9yodQefHbfE,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_00d105df8e1545078b0450c99f6f5d99.jpeg?from=2064092626,湖北,2025-05-20 19:21:36,雷总你终于回来了,你知道我有多想你吗 [流泪] [流泪],1734,14,,,否,否,,
7 | 7507670325846393639,凉夜听风/自律,4032664197929355,MS4wLjABAAAAhJE1pcmIqOiCWkfVIY0igBh4iwHMqn3_q_ocA5OVHahwc8uAIcObox0jT4RIobeq,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ogC79ADVvF1fVARWfjVoCAQEAs5FIAmlAQgAVT.jpeg?from=2064092626,甘肃,2025-05-23 23:55:28,"全网最成功的几位博主
8 | 1、湖远行(2534万粉丝)
9 | 2、雷军(4562万粉丝)
10 | 3、董宇辉(2731万粉丝)
11 | 4、我 (59 .79 万粉丝)",2,0,,,否,否,,
12 | 7506480179830293275,赶路人,83246963821,MS4wLjABAAAA4SBS2nwK5tgy3D1UdDCm8BCFLSd-XAymBVVcNcQig2w,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o01AcLr7Ar5BVlGQ7n3EBef1zIIsEALUAAAKeG.jpeg?from=2064092626,广东,2025-05-20 18:56:57,雷总超级期待新产品 必须买15sPro,1716,145,,,否,否,,
13 | 7507653011641238331,椰果冻,2955956893538686,MS4wLjABAAAAcUlfUsoimmFwlQ6M0ZRa5YA_k7K11iTX411rfWWKtkkM-kfKD4i_RrMkzaErK5u_,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oYARAfclDEDnFAeAmUELCYEi1EeIXjpwsZ9u4A.jpeg?from=2064092626,安徽,2025-05-23 22:48:05,爸,不许和我们冷战了[酷拽]一个月没理我们了[流泪],15,2,,,否,否,,
14 | 7506483045878727461,在下方何,3369368544881071,MS4wLjABAAAAoaFVNHMv2W9k6GlhkGEr1jBlVkfOZEUbazU6fjL5EqINaMM3SVYokjdyvEzc8yiy,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o4AEEkmfADxVZD4XGgFBgRIE9AzYAqGfAACnA8.jpeg?from=2064092626,广东,2025-05-20 19:35:51,爸:吃了吗,1065,118,,,否,否,,
15 | 7506541873923769142,冬雨,98699399279,MS4wLjABAAAAFowAYqjTcYdDE8ILI3G4EptJlR9ZkUEe2-DEUU00z-U,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_b687e66f31a44bb7b4c1e2840959ce74.jpeg?from=2064092626,云南,2025-05-20 22:56:23,兄弟们,今天刚领证[嘿哈],想要雷总祝福,2072,175,,,否,否,,
16 | 7507625909693039423,棋棋1102,87761611655,MS4wLjABAAAAir3SzAGiUX1ThGKIT_TdEO1-h4IuqORWDM7EG8PEbMc,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_edbb3d586277ee65aafe7577f95e3227.jpeg?from=2064092626,湖南,2025-05-23 21:02:57,雷总,我刚提车十天,被网暴了十天,从来不说粗口的我,与黑粉对骂了几天[流泪],120,18,,,否,否,,
17 | 7506881578614735675,鑫河车机,3318751792735328,MS4wLjABAAAANNWgCK0wglqJMiBk8k69F4R1KEQTeKTQ2xpJ2cc_9Uj-RU6D5vCd8bbz_St-9fR1,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_17e2eb811ff3b2760527e5266eb1d654.jpeg?from=2064092626,江苏,2025-05-21 20:54:31,雷总你冷战期间,我偷偷去买了一部 512G+16G的小米15,下次别再冷战了,我已经没钱了,719,51,,,否,否,,
18 | 7506603418384368444,炸丸子,99907808631,MS4wLjABAAAAQpwz4rYpjqy5QLKAkv0tyv8YLppLSZK2Kgr-vmyoAcE,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_02ab4fe0fc7d42f38256713ce5ae1c06.jpeg?from=2064092626,云南,2025-05-21 02:55:20,雷爸 以后不准和我们冷战了[流泪],152,6,,,否,否,,
19 | 7506484813844185890,@爱你一万年,1797054772550670,MS4wLjABAAAAa_0ivBpm4YeKmz_Q33G1G_0A9sV4Ybrpbxf50DhitHVj4OMEuAQcf_jRl0PzBkvr,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oQdBeFXcDIM3PiAAxiAA8AQeNwQc7FKBGAiPfE.jpeg?from=2064092626,山西,2025-05-20 19:35:55,军儿,你终于回来了 [流泪], 不要再断更了,你不在时每隔几天我都要来看看你发视频了没,5665,78,,,否,否,,
20 | 7507672331880776463,诛戮陷绝,71716676720,MS4wLjABAAAAXNipfx-WnaoQj1nHNdNa_cC9fzh_51fnJKTTP6Jfugg,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ogbC7cHAAACDAcgwJJnfCkIIuEASgt9j7ekACo.jpeg?from=2064092626,新疆,2025-05-24 00:03:12,[微笑]为什么不早说,我上个月刚买的15,我想用用玄界。,0,0,,,否,否,,
21 | 7506486944446300979,卢伟冰,2544737774480633,MS4wLjABAAAAeeGpeTBNIRe66uQLgsFZmiRXR4GEQnh6FCtORpNOjXMPVjYqkmDeDtYhzBirNj_k,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oIz1AhKAFIA5LfdA1CctDn9p3bAaCegjPyE4AA.jpeg?from=2064092626,北京,2025-05-20 19:23:13,5月22日,发布会见![爱心],2279,143,,,否,否,,
22 | 7506488223708889867,🌈带努比的二明,85271783837,MS4wLjABAAAABCaMkM4yYIo56epa-TYx3MRfbHUmvx_jhnrROzH9gDs,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_a0c35b5fe71acd688f89782beadd1e86.jpeg?from=2064092626,广东,2025-05-20 19:55:42,爸吃饭了吗,55,23,,,否,否,,
23 | 7506537868958892800,小汪哥精致男装,1084535024136804,MS4wLjABAAAA_rkgRekkXc9BqXuVcgu3dLgzqqzkLBJt7qpkFVELHDTchJ0AUKBkr9rocCZrJl4K,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_f16a5d1a8b1d44635794d759eca66d40.jpeg?from=2064092626,重庆,2025-05-20 23:01:18,支持雷军的有多少,657,31,,,否,否,,
24 | 7506494061470663465,做个向日葵,97352596218,MS4wLjABAAAAOC1_YnR8s2eI8819pC0xBpwdUNWzt1V1Y1UQIuf0G1M,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_3a65ef707b78443bb201bf8bcfae01ff.jpeg?from=2064092626,湖南,2025-05-20 20:07:28,雷总,终于回来了,想死你了,1151,10,,,否,否,,
25 | 7506543779400254223,真由美🎖,7411158697970304059,MS4wLjABAAAAnWoUhTX74tn3ekpwq8hYuEcaozWIqL5ufhyVrEdGzBwc0sG5xZny3l1Ocg59kBo1,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_bdd4a061a7958126e3580b7b94852a4d.jpeg?from=2064092626,重庆,2025-05-20 23:03:47,帅不帅你们说[舔屏],5306,472,,,否,否,,
26 | 7506578153490547492,蜜雪冰城(福州总店),3496062407160124,MS4wLjABAAAA7R_qUDVMbBZDIroUyQx9ptCrxpNVlsMRKBSKvd0u1z4641KVK8vKDaH0P0Yb9AbT,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o4DGCA9AIAAbaQBeAmAgQ0ifCC8WYVPnl5rhh9.jpeg?from=2064092626,福建,2025-05-21 01:17:21,爸,443,58,,,否,否,,
27 | 7506848922740982566,卡通,2546112058894104,MS4wLjABAAAAk96fCkeU6KcVzYwIjBwiannGRfY1MoLgWOYvMjm-_0kUquq_otRmSXqHCqkDQpJp,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/mosaic-legacy_3797_2889309425.jpeg?from=2064092626,广西,2025-05-21 18:47:50,爸,吃了吗?,1141,252,,,否,否,,
28 | 7506492129842217768,青禾,64301672255,MS4wLjABAAAAgEDqvTfEbI8noWC1J2SkJvWiwb_iKl_RBUCErHhyiiw,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oEcAyiAsigXDzPy9BIhpZj1iEA7A4Oz3PgIAP.jpeg?from=2064092626,河南,2025-05-20 19:58:04,冷战结束,雷总回来啦,999,8,,,否,否,,
29 | 7506536991661245242,南重小肖,1279473564788093,MS4wLjABAAAAxbhMheOU0lQ5krDTRGD1R6g9WC-BcQNy4YaiLT9QFmTMESLdzorHt0PpCfydBneV,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_3137c6d29dabe13b8bc7790f03ec95db.jpeg?from=2064092626,广东,2025-05-20 22:37:21,米粉今天领证啦[抱抱你],509,53,,,否,否,,
30 | 7506550905915982650,王六六,69232476424,MS4wLjABAAAA77RVJV-6zevOEmadycWZZQjQ8lZwdCMBdmkaYDVN2z0,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_c52ecdfe87854914948071dc24442d71.jpeg?from=2064092626,江苏,2025-05-20 23:31:21,雷总 我的小米6为啥总是发烫呀[看],513,82,,,否,否,,
31 | 7506755771851391803,昏君不吃鱼,3019677097539415,MS4wLjABAAAASAtgMxIM7KeG8dKvLaRANhrIjwez7Vdo2L-hdAOgkNzNiL7p1X5KEp91Jx756fsK,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/douyin-user-image-file_85ab3a39770bf1d23b032e7abb2d0682.jpeg?from=2064092626,北京,2025-05-21 12:46:21,您好,雷总,我们是从西安到北京度蜜月的小米 su7 车主,我本人也是红米 turbo 4 pro 手机用户, 我和我的妻子很早之前就已经确定今天到北京了,没想到刚好遇到了明天是发布会,能不能让我们近距离的参加一下发布会🎉('ω')🎉感谢🙏,403,42,,,否,否,,
32 | 7506548565968814875,腾飞,954029966884967,MS4wLjABAAAAX29QOfYJnwJKi3bStCXfymChXaY8HhEmEppdw96rJ3g,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_9ee4a3a2a2ba6077d7fced2858c75bd3.jpeg?from=2064092626,广东,2025-05-20 23:22:18,爹,吃饭了吗?[偷笑],24,6,,,否,否,,
33 | 7506583171318170428,By,81801015985242,MS4wLjABAAAAYaFXru45hASQuh9AdF9VpO3HsTfMCWl1Q4lbZ-50zxE,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ocf5cmACfACMA2DxAnouJRIg2F9HnrBAGDEEA6.jpeg?from=2064092626,广东,2025-05-21 01:36:43,我老总圈唯一人脉终于回来了,747,7,,,否,否,,
34 | 7507664223754060603,用户7823815321920,910018155381435,MS4wLjABAAAA1QYmS_Wju2LEHigo5yxs5rdvA201sDGydNS1SvJwFAo,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_42550a394ef049da86d194c1a78ccff6.jpeg?from=2064092626,福建,2025-05-23 23:31:36,,23,1,,,否,否,,
35 | 7506488070751437625,A.R,4305321844552584,MS4wLjABAAAAsAEfNyY5D-kUae2B6Up1OX5hJF6uurc8Q4cYbS1VyInMZisMro6VvqfMrFsb91tk,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_owK9fkECXABqXQeRAUEV84wEDfAGFEAUzEAnPI.jpeg?from=2064092626,云南,2025-05-20 19:27:32,爸你终于回来了,228,14,,,否,否,,
36 | 7506481351215579942,🌈Y h,59237659425,MS4wLjABAAAAo9spT8FJ4Zdwm7gq0vsXbu6lpauaMK3YYpc9JzvnuJo,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ooxAfC8XVInxkwAAHAAgbTDAl6QS9WaADAeVoC.jpeg?from=2064092626,江苏,2025-05-20 19:01:30,雷总,我的小米14聊天刷视频都烫手[尬笑][尬笑][尬笑],573,215,,,否,否,,
37 | 7506847770889274131,Moonlight,262129025558627,MS4wLjABAAAAsWAwmD2f53hsOt1J4LWp3TtJcEZo6JaOJGrdk8f2S0g,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_40dcb9b98afe1aacf814e5fcac697eb0.jpeg?from=2064092626,江苏,2025-05-21 18:43:24,雷总,下次不要和我们冷战了[流泪],1589,20,,,否,否,,
38 | 7506486988432016180,一枕清风明月,1516964907065831,MS4wLjABAAAAGfvWblt_lzcqneYBZKw_YU309MWbISyCUGK4y7W6S_zsjC8LXNc1UMk8l9yhv_j6,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_oAHICO1HAD3F6Nmo0OFAg4QE9AsfAvXfAACnAG.jpeg?from=2064092626,陕西,2025-05-20 19:57:19,别人不断的黑小米,小米还能像个巨人一样屹立不倒,这才是真正用心的企业。希望小米越来越好,164,0,,,否,否,,
39 | 7507117301013218060,段小林儿,94921424644,MS4wLjABAAAAt9OH_Q5CUpi-6Di19lh2SJf9up1p0E1Bc4aKeX7QxRI,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_a35920346420a0b987cfea605d0df79f.jpeg?from=2064092626,云南,2025-05-22 12:09:23,爸,吃饭了吗[比心],464,92,,,否,否,,
40 | 7507258796034999049,您保重,1357254247323852,MS4wLjABAAAAP7Aowepq-AGwKjhzadPqrhyn0EsecEThdcLn-Qbrtipk-sOGsRuRbdbofNMLCOi7,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_owh6YAATnDAT9YXyCziigTBCAeA9lVAIEKfQDX.jpeg?from=2064092626,山东,2025-05-22 21:18:24,盲猜开始了!来!大家出个价,26,15,,,否,否,,
41 | 7506533584396714812,卡皮丘,100702910418,MS4wLjABAAAAEkuLBz4Ykj64bc-dDtgim4hbLXkQARFbkdP87Drr3vs,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_a03d4f85ea88a5e9adb8fabdd8c004e8.jpeg?from=2064092626,重庆,2025-05-20 22:24:07,下次再冷暴力试试,997,13,,,否,否,,
42 | 7506493511123043091,橘子苹果大菠萝,95583713531,MS4wLjABAAAAsFlxJPhfZOiNM1-n_0_T7zruEotyvibeTw1grNmUzrM,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_2be01d94027c4da2b470a0a39d3a6d31.jpeg?from=2064092626,广东,2025-05-20 20:05:20,爸爸你为什么这么久了才回来 [流泪],499,19,,,否,否,,
43 | 7506484883049923387,梗姐姐,3474112572840599,MS4wLjABAAAARhMLBRHP-d570w9GcmOi_I6luP9gMqQco9HE-O94jnJA2ORTfzMFKt24ncmfKTYp,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_30a0e659e2bd5dff0ddbf7dd553dbdf0.jpeg?from=2064092626,北京,2025-05-20 19:15:09,爸,我盲订15s pro了![得意],207,17,,,否,否,,
44 | 7506484189350183719,杨林灿,3056230252949075,MS4wLjABAAAALNXq-4C-8hWZcJSwdpfN6hywSdVua0Om0rghJFWCH1UQHxNqlz3IXoyMXkkdt4CO,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/mosaic-legacy_2e3320001b947f2879603.jpeg?from=2064092626,上海,2025-05-20 19:12:32,小米为国人为国家做出重大贡献,为雷总点赞[赞][赞][赞][鼓掌][鼓掌][鼓掌][平安果][平安果][平安果],77,2,,,否,否,,
45 | 7506564456395080463,怜悯,65423123801,MS4wLjABAAAAPCZnpMKZhG_C12PfbP3i1jRnsBkfUPbDLkQpGIGPGfI,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oMnACra0IC6EjAF9fAbIpAgAAWAJg6fBhgODEb.jpeg?from=2064092626,湖南,2025-05-21 00:24:03,雷总,能不能解决一下小米十五发烫问题,51,55,,,否,否,,
46 | 7506531888698524431,ZhanG,86955928420,MS4wLjABAAAAc_kvIRbOCm26kwKQ8BiopdI2qiWiQWPg52oSV_tRKzA,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_9c9933d4e9b817ff1d13d80b6d632d5e.jpeg?from=2064092626,浙江,2025-05-20 22:17:35,那我的小米15 Pro算什么[泪奔][泪奔][泪奔],27,14,,,否,否,,
47 | 7507124521351807796,Doki兜兜,3426539489728047,MS4wLjABAAAAJ8IxviJWeiEjCX9sDmMXf-1ZDIsP-6oS5l7F-yLc1ATKhYvb-uQE03n9hEsDg0Fw,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_105c8ede9583aca43d63dab0ee6e06d6.jpeg?from=2064092626,北京,2025-05-22 12:37:21,雷总你好,我5.20那天刚提了Ultra ,超级无敌帅,而且我们车主群里从来没有一个人提过想不要这辆车,非常支持您!顺便提一句,我做汽车主播几年了,前几天听说yu7在招主播,我第一时间报名但是我并没有收到面试信息,非常遗憾,所以在这里自荐一下,以后如果有需要,希望您的团队可以考虑我,以下是我的照片。,290,17,,,否,否,,
48 | 7506904505452675900,瑶瑶²⁰²⁵~,100323943278,MS4wLjABAAAAxDeDUYDMIdx-XKOPNVa5mj3nwEDAFA4tsMoR0ujcKeA,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_86f6c561441abeec4cf2a83421d25cdf.jpeg?from=2064092626,江西,2025-05-21 22:23:30,爸,你吃饭了吗,377,88,,,否,否,,
49 | 7507667954403312422,— —,98126385233,MS4wLjABAAAATSI9bEfcSOY3qPMvpUkYJ6KrWKLpXG8SYpWbEGpRYkw,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_fd8dbd2a5eca4d1aa52dee64ffb00fdf.jpeg?from=2064092626,新疆,2025-05-23 23:46:10,雷总出啥都无脑支持,2,1,,,否,否,,
50 | 7506489714263147290,鬼畜眼镜,1614548231792604,MS4wLjABAAAAqeBDFrUNu6SdHluquTGprkHKSm48rthMBOCYY9UA6WH13ejk42hIY5ACvRRYuqRd,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_69c10b7fd32948c7809b93d9ad67b536.jpeg?from=2064092626,天津,2025-05-20 19:33:56,有人想让小米死[抠鼻],17,0,,,否,否,,
51 | 7506568967345275706,独秀青春,1833615464871400,MS4wLjABAAAA8IPy93oRYYhkqxjwLDv2xKBIkRhoa4ZqT-DIUh0YEBppKDdfZ-dFBJeZcS7EVLAf,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oMfKAFgAeOPAEgEp7rlFQB7DEjAIYI7x4f8AAJ.jpeg?from=2064092626,广西,2025-05-21 00:41:32,卖1999价格吧对得起所有米粉[大笑],27,4,,,否,否,,
52 |
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_002249.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户
2 | 7506489038375912250,胖胖的瘦子A,62657988556,MS4wLjABAAAA1VLsgI4LPOjg_fYDHlF5DfDxPxMAzHgNdjb_5O3YYb8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_46f3d3b12c9e74996fac759285ed4810.jpeg?from=2064092626,山东,2025-05-20 19:31:16,以后不准跟我们冷暴力了啊,51643,568,,,否,否,,
3 | 7507676972824658728,四季开农业种子店,4218184767116749,MS4wLjABAAAAFBZ32pjgpvvyrXiSmwt9w11cuV0-TTQCh4x_s1hSd8PtEpUpGN5O3lCdEAM81l4C,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_o0DECIA2gAGkAX86eifj4zBCPAYrWiApwALBAH.jpeg?from=2064092626,江苏,2025-05-24 00:21:19,啥时候出一个主机放车里,或者带身上,屏幕分离,加大屏幕待电量,就像路由器那样把主机也可以放家里,放车里,放包里,放口袋里[泣不成声],出门就拿着屏幕,游戏就数据线连主机[泪奔],0,0,,,否,否,,
4 | 7506483467308991290,小罗罗,1041991924452430,MS4wLjABAAAAYT4SFMQnzUiCy_-0N8Yl58HRLTfHUfFA88g2wq95HYd4gF9hsw9EMfjd8rlFs_1R,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_e2774d94b419443dbd92a31b11232b0f.jpeg?from=2064092626,重庆,2025-05-20 19:09:40,雷总,我女儿和你一天生日,目标大学武汉大学!我希望我女儿像你一样幸运[比心],29628,776,,,否,否,,
5 | 7506494061470663465,做个向日葵,97352596218,MS4wLjABAAAAOC1_YnR8s2eI8819pC0xBpwdUNWzt1V1Y1UQIuf0G1M,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_3a65ef707b78443bb201bf8bcfae01ff.jpeg?from=2064092626,湖南,2025-05-20 20:07:28,雷总,终于回来了,想死你了,1152,10,,,否,否,,
6 | 7507653481228075788,LUCK(解压食记),3798186297661322,MS4wLjABAAAAhmOuD6-uWnvcn6oRbHa8NKRUJF__NdPDginIxYKwW7efMtdAlg0IkBeVpKsA3dfO,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_19ae47614ad1ea84205b4f9446a250d3.jpeg?from=2064092626,广东,2025-05-23 22:50:00,雷总你把头像把成这个,我买10辆,267,56,,,否,否,,
7 |
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments_i5g6kb83_20250312_001130.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 |
--------------------------------------------------------------------------------
/crawled_comments/douyin_comments_i5g6kb83_20250312_001710.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,地区,时间,评论,点赞数
2 | 7480322720539231014,长矛沾屎戳谁谁死 水瓶装尿呲谁谁叫,福建,2025-03-11 07:12:44,"谁懂啊
3 | 毫无欲望的一堵墙",11
4 | 7480373749487125283,不换,海南,2025-03-11 10:30:47,过期之前喊我去你家可以吗,0
5 | 7480167895684023097,a 嘉楠 a,内蒙古,2025-03-10 21:11:48,客厅飘窗下面的抽屉里还有几十盒面膜,二十多个洗面奶,口红二十多个,唇膏十来个,精华液,精油洗发水护发素各十几瓶,两个卫生间柜子里抽屉里都是护肤品[泪奔]卫生巾几十包[泪奔],还有过期的精华放茶几上当护手霜[泪奔]没穿过的新鞋四十多双,衣服更数不清[尬笑],6
6 | 7480358874639270683,饼干小姐🍪,海南,2025-03-11 09:33:02,要不你给我邮点 邮费我自付 不然太可惜了[捂脸],0
7 | 7480160776440693555,90岁冷艳美蟑螂,四川,2025-03-10 20:44:13,我不一样,就跟饕餮一样,不管买了多少一会儿就吃完了,本来打算攒着吃的[憨笑][憨笑],4
8 |
--------------------------------------------------------------------------------
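
The crawled files use two header layouts (an older 6-column one and a newer 16-column one), and some comments span several lines inside a quoted field, as in the file above. The sketch below shows one way to read both layouts with the standard `csv` module. The function name `load_comments` and the English output keys are hypothetical, and the nickname in the sample is replaced with a placeholder.

```python
import csv
import io


def load_comments(handle):
    """Read comment rows; fields absent from the older 6-column layout default to 0."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "comment_id": row["评论ID"],
            "nickname": row["昵称"],
            "region": row["地区"],
            "time": row["时间"],
            "text": row["评论"],
            "likes": int(row["点赞数"] or 0),
            # 回复数 exists only in the newer layout.
            "replies": int(row.get("回复数") or 0),
        })
    return rows


# Older layout with a quoted comment that contains a line break.
old_layout = io.StringIO(
    "评论ID,昵称,地区,时间,评论,点赞数\n"
    '7480322720539231014,user_a,福建,2025-03-11 07:12:44,"谁懂啊\n毫无欲望的一堵墙",11\n'
)
rows = load_comments(old_layout)
```

When reading from disk, open the file with `newline=""` so the `csv` module can handle the embedded line breaks itself; the file encoding is not stated in this repository and would need to be checked.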
/crawled_comments/douyin_comments_i5g6kb83_20250313_153107.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户
2 | 7480158671126479676,是想引起我的注意吗,60594754198,MS4wLjABAAAAOsqQ7da7h6mZ0HJwd7sYoXFPZ85TAWXZlsvqfUofV4Y,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oUL7AAScGeAtmEInDwFfEEwdDaxAuuEf9ADgA7.jpeg?from=2956013662,广东,2025-03-10 20:36:01,你们是怎么忍住不吃的?中午买的恨不得一晚上炫完[微笑],2743,1072,,,否,否,,
3 | 7481170032313484069,李瑞泽(已老实版),61721752502,MS4wLjABAAAAVrsCLgVGnMdtoSLve6jTbqV-tjDYy-jvCM6YAPXMFXc,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oAgszAIABAzjgqAoEABACoNFxCfNeI1hyAZ9NA.jpeg?from=2956013662,湖北,2025-03-13 14:00:41,我有个坏习惯不吃完不会再去买,只有吃的干干净净了才会再去买[捂脸],1,1,,,否,否,,
4 | 7480109441121256232,·,61306370428,MS4wLjABAAAAYrNuWH0HUR873JZO6er8QJcVqNe8JBa0tnaP47EJ5bk,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/mosaic-legacy_20b4d000515fe287a65c7.jpeg?from=2956013662,湖南,2025-03-10 17:25:05,爱买化妆品。不用。放到过期[泪奔][泪奔][泪奔],264,54,,,否,否,,
5 | 7481178287105114895,🌾 JFF、,60600708724,MS4wLjABAAAAzyK88r686Ncv3AdrJC_CW2Ra-x4EpRGvKYexzZ_DGzs,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oYJ6eAfZCuIAxA9CAVgcEkFEoVARAn7lDbAwCn.jpeg?from=2956013662,四川,2025-03-13 14:32:41,每个月我老公帮我清理一堆,每个月都在说我,说我浪费[捂脸],0,0,,,否,否,,
6 | 7481193537113260860,粽粽👩🚀,106072335272,MS4wLjABAAAA62prbZiWNcKLVQjzhz192FOO3wvf8UsWOmnk1kqMUlU,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_850be733592ee535c29723867b84d946.jpeg?from=2956013662,江苏,2025-03-13 15:31:51,我会放到过期,救命啊,然后想吃的时候吧外包装还扔了有些,就总在吃过期的零食感觉,0,0,,,否,否,,
7 | 7480129278760944393,诶嘿嘿💎,3091441175501164,MS4wLjABAAAAQuARjFtB-BvM1gy5VMHmsmvML8hKPjY-maezMsKfAuU3DXnt9ek-s5mDbMNNKo0D,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_oMfDieUgNAu2GYAQAfWIA3I0pCwLISIwc3f5AB.jpeg?from=2956013662,陕西,2025-03-10 18:42:00,买了辣的回去又想吃甜的,买了甜的回去又想吃咸的,一进零食店又不想买了,回去后又后悔没买[尬笑]跟有病似的,163,21,,,否,否,,
8 |
--------------------------------------------------------------------------------
/douyin_analysis_results/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/douyin_analysis_results/README.md
--------------------------------------------------------------------------------
/douyin_analysis_results/comment_wordcloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/douyin_analysis_results/comment_wordcloud.png
--------------------------------------------------------------------------------
/douyin_analysis_results/crawled_comments/douyin_comments_7505063238057889081_20250524_000454.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户
2 | 7505554471775208229,玫瑰郁金香~,4195333778452264,MS4wLjABAAAA_9K-j3RLTwTYcwMzzVbuWpWiQaVGVqIQgJCBZq-TlzYGjUUzzorPthVauIvJ2aVC,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_f5bf7e5d516141538f2f894a4a8b3bb0.jpeg?from=2064092626,河北,2025-05-18 07:04:43,[比心][比心][比心],0,0,,,否,否,,
3 |
--------------------------------------------------------------------------------
/douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_001812.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户
2 | 7506489038375912250,胖胖的瘦子A,62657988556,MS4wLjABAAAA1VLsgI4LPOjg_fYDHlF5DfDxPxMAzHgNdjb_5O3YYb8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_46f3d3b12c9e74996fac759285ed4810.jpeg?from=2064092626,山东,2025-05-20 19:31:16,以后不准跟我们冷暴力了啊,51533,568,,,否,否,,
3 | 7506483467308991290,小罗罗,1041991924452430,MS4wLjABAAAAYT4SFMQnzUiCy_-0N8Yl58HRLTfHUfFA88g2wq95HYd4gF9hsw9EMfjd8rlFs_1R,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_e2774d94b419443dbd92a31b11232b0f.jpeg?from=2064092626,重庆,2025-05-20 19:09:40,雷总,我女儿和你一天生日,目标大学武汉大学!我希望我女儿像你一样幸运[比心],29590,775,,,否,否,,
4 | 7506755771851391803,昏君不吃鱼,3019677097539415,MS4wLjABAAAASAtgMxIM7KeG8dKvLaRANhrIjwez7Vdo2L-hdAOgkNzNiL7p1X5KEp91Jx756fsK,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/douyin-user-image-file_85ab3a39770bf1d23b032e7abb2d0682.jpeg?from=2064092626,北京,2025-05-21 12:46:21,您好,雷总,我们是从西安到北京度蜜月的小米 su7 车主,我本人也是红米 turbo 4 pro 手机用户, 我和我的妻子很早之前就已经确定今天到北京了,没想到刚好遇到了明天是发布会,能不能让我们近距离的参加一下发布会🎉('ω')🎉感谢🙏,403,42,,,否,否,,
5 | 7506480179830293275,赶路人,83246963821,MS4wLjABAAAA4SBS2nwK5tgy3D1UdDCm8BCFLSd-XAymBVVcNcQig2w,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o01AcLr7Ar5BVlGQ7n3EBef1zIIsEALUAAAKeG.jpeg?from=2064092626,广东,2025-05-20 18:56:57,雷总超级期待新产品 必须买15sPro,1735,145,,,否,否,,
6 | 7506492129842217768,青禾,64301672255,MS4wLjABAAAAgEDqvTfEbI8noWC1J2SkJvWiwb_iKl_RBUCErHhyiiw,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oEcAyiAsigXDzPy9BIhpZj1iEA7A4Oz3PgIAP.jpeg?from=2064092626,河南,2025-05-20 19:58:04,冷战结束,雷总回来啦,1001,8,,,否,否,,
7 | 7507656153338692391,伟生:岑[cen],96002475445,MS4wLjABAAAASzg3Wj2ivLtKqgQ6CgKYkUDM8-Q6svmJOVvC8_VtTD8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_3c7e31f803801e7ba6daf8ee13ed8d71.jpeg?from=2064092626,广东,2025-05-23 23:00:20,小米芯片牛逼[捂脸],9,0,,,否,否,,
8 | 7506486944446300979,卢伟冰,2544737774480633,MS4wLjABAAAAeeGpeTBNIRe66uQLgsFZmiRXR4GEQnh6FCtORpNOjXMPVjYqkmDeDtYhzBirNj_k,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_oIz1AhKAFIA5LfdA1CctDn9p3bAaCegjPyE4AA.jpeg?from=2064092626,北京,2025-05-20 19:23:13,5月22日,发布会见![爱心],2282,143,,,否,否,,
9 | 7506890136362369849,帽子掉了囖,2634020313116317,MS4wLjABAAAA83Davw1de5aRD3LiEOpk31MyVDWLrgpB6L5Ow979EklND6pUVdVGppOiWMspKBHc,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_osPoDAPI2ICepfGIBAAgbQAHIAwbRfGcCNeCLI.jpeg?from=2064092626,福建,2025-05-21 21:27:44,"雷总,你好,我一直是小米粉丝,16左右就一直用小米,今年3月20号呢我买了小米15pro,直到上周天早上8.40开车出车库的时候,我用手机打开carlife后连接,结果小米15pro页面卡死,而后我去网上查结果发现这个bug目前仅我这台出现,我就去找客服,客服协调后让我去小米之家检查,检查如果有硬件问题就换机,
10 |
11 | 但是都已经恢复了,能测出啥来?日志日志不看,视频视频不看,什么工作态度?我已经录制了视频也把日志导出反馈了,但测试也只是测一测我的手机硬件问题,最后报告出来啥事没有,
12 |
13 | 那会出地库卡死的50分钟,我也有很重要的事情啊,本来9给学生上课,硬生生被推迟,手机使用不了,车库缴费出不去,打车用不了,很难受啊,和客服协商后还好些,但现在啥也不解决,我就很无语了,我需要的是解决方案,而不是一味的不好意思
14 | 一直卡在os启动界面[流泪][流泪][流泪][流泪][流泪][流泪][流泪][流泪],没解决,没下文,好伤心[流泪][流泪][流泪][流泪][流泪][流泪][流泪]@雷军 @雷军小米记事录",92,27,,,否,否,,"104815668206,3094492413967114"
15 | 7507536897355399948,心想事成。,75989719933,MS4wLjABAAAAsQTLpo_8onSZljE6vyzCZVWyC5bhv_JrW5xWCJ-gW0A,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_okDwAEkAVQiKpFGArDMfBuYtfnA4C9gDToIcAA.jpeg?from=2064092626,广东,2025-05-23 15:17:34,湖北的是谁,15,7,,,否,否,,
16 | 7507653481228075788,LUCK(解压食记),3798186297661322,MS4wLjABAAAAhmOuD6-uWnvcn6oRbHa8NKRUJF__NdPDginIxYKwW7efMtdAlg0IkBeVpKsA3dfO,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_19ae47614ad1ea84205b4f9446a250d3.jpeg?from=2064092626,广东,2025-05-23 22:50:00,雷总你把头像把成这个,我买10辆,261,55,,,否,否,,
17 | 7506481351215579942,🌈Y h,59237659425,MS4wLjABAAAAo9spT8FJ4Zdwm7gq0vsXbu6lpauaMK3YYpc9JzvnuJo,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ooxAfC8XVInxkwAAHAAgbTDAl6QS9WaADAeVoC.jpeg?from=2064092626,江苏,2025-05-20 19:01:30,雷总,我的小米14聊天刷视频都烫手[尬笑][尬笑][尬笑],573,215,,,否,否,,
18 | 7506508566116696890,时间的猫,72933156278,MS4wLjABAAAAaMfSDEj8o6e__89fhI8dqYxefOZl1D4RhFEJROkRwN8,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oUB6bzACPAEoArBeAXBaaHIwPiehTzZAUeiEEV.jpeg?from=2064092626,贵州,2025-05-20 20:47:03,准备买苹果的,可是我能等!,44,7,,,否,否,,
19 | 7507670325846393639,凉夜听风/自律,4032664197929355,MS4wLjABAAAAhJE1pcmIqOiCWkfVIY0igBh4iwHMqn3_q_ocA5OVHahwc8uAIcObox0jT4RIobeq,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_ogC79ADVvF1fVARWfjVoCAQEAs5FIAmlAQgAVT.jpeg?from=2064092626,甘肃,2025-05-23 23:55:28,"全网最成功的几位博主
20 | 1、湖远行(2534万粉丝)
21 | 2、雷军(4562万粉丝)
22 | 3、董宇辉(2731万粉丝)
23 | 4、我 (59 .79 万粉丝)",3,0,,,否,否,,
24 | 7506483045878727461,在下方何,3369368544881071,MS4wLjABAAAAoaFVNHMv2W9k6GlhkGEr1jBlVkfOZEUbazU6fjL5EqINaMM3SVYokjdyvEzc8yiy,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813_o4AEEkmfADxVZD4XGgFBgRIE9AzYAqGfAACnA8.jpeg?from=2064092626,广东,2025-05-20 19:35:51,爸:吃了吗,1071,119,,,否,否,,
25 | 7507675421903766330,你好,我是一个喜欢女人的中国男人🇨🇳,66454192133,MS4wLjABAAAAPPa3FpsZScnFuxp4zRe2lw-vVqdBPgmVmYKNsPzUjcM,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oAEbAjHKOFBAlAMAIoEDuRfyQAeYwaapEeRPG9.jpeg?from=2064092626,广东,2025-05-24 00:15:04,只能说,这个芯片更加适合小米。正常来说骁龙芯片,需要各个手机厂商去适配芯片,但是小米的这个芯片,可以更加适配于小米,只能说这次芯片对比骁龙一般般。以后小米更新几代一定会全面超过骁龙的,但是只在小米手机上全面超越,就这个意思。,1,0,,,否,否,,
26 | 7507653011641238331,椰果冻,2955956893538686,MS4wLjABAAAAcUlfUsoimmFwlQ6M0ZRa5YA_k7K11iTX411rfWWKtkkM-kfKD4i_RrMkzaErK5u_,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c000-ce_oYARAfclDEDnFAeAmUELCYEi1EeIXjpwsZ9u4A.jpeg?from=2064092626,安徽,2025-05-23 22:48:05,爸,不许和我们冷战了[酷拽]一个月没理我们了[流泪],15,2,,,否,否,,
27 | 7506486988432016180,一枕清风明月,1516964907065831,MS4wLjABAAAAGfvWblt_lzcqneYBZKw_YU309MWbISyCUGK4y7W6S_zsjC8LXNc1UMk8l9yhv_j6,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-i-0813c001_oAHICO1HAD3F6Nmo0OFAg4QE9AsfAvXfAACnAG.jpeg?from=2064092626,陕西,2025-05-20 19:57:19,别人不断的黑小米,小米还能像个巨人一样屹立不倒,这才是真正用心的企业。希望小米越来越好,164,0,,,否,否,,
28 | 7506881578614735675,鑫河车机,3318751792735328,MS4wLjABAAAANNWgCK0wglqJMiBk8k69F4R1KEQTeKTQ2xpJ2cc_9Uj-RU6D5vCd8bbz_St-9fR1,https://p3-pc.douyinpic.com/aweme/100x100/aweme-avatar/tos-cn-avt-0015_17e2eb811ff3b2760527e5266eb1d654.jpeg?from=2064092626,江苏,2025-05-21 20:54:31,雷总你冷战期间,我偷偷去买了一部 512G+16G的小米15,下次别再冷战了,我已经没钱了,724,52,,,否,否,,
29 |
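Several records in the file above carry line breaks inside the quoted 评论 field, so the file cannot be split on newlines. The following is a minimal sketch of reading such a file with the standard `csv` module. The sample record and the `read_comments` helper are hypothetical and exist only for illustration; only the header row is copied from the crawler output.

```python
import csv
import io

# Column names copied from the header row of the crawler's CSV output.
HEADER = ("评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,"
          "回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户")

# Hypothetical sample record whose comment text contains a line break.
SAMPLE = (HEADER + "\n"
          + '1,user_a,10,sec_a,,甘肃,2025-05-23 23:55:28,'
          + '"line one\nline two",3,0,,,否,否,,\n')


def read_comments(text):
    """Parse crawler CSV text into a list of dicts, one per comment."""
    # newline="" leaves line endings untouched so the csv module can
    # recognise line breaks that belong to a quoted field.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = read_comments(SAMPLE)
print(len(rows), repr(rows[0]["评论"]))
```

When reading from disk, the same idea applies: open the file with `newline=""` and `encoding="utf-8-sig"`, which matches the encoding the legacy crawler uses when writing.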
--------------------------------------------------------------------------------
/douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_104505.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_104505.csv
--------------------------------------------------------------------------------
/douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_105423.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/crownoranges/douyin-comment/24c6ff6324449a4294b6a713cab58412cd420819/douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_105423.csv
--------------------------------------------------------------------------------
/douyin_analysis_results/crawled_comments/douyin_comments_fEJejTD6CQ8_20250524_113220.csv:
--------------------------------------------------------------------------------
1 | 评论ID,昵称,用户ID,用户sec_id,头像,地区,时间,评论,点赞数,回复数,回复给用户,回复给用户ID,是否置顶,是否热评,包含话题,提及用户
2 |
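The crawled files follow the naming pattern `douyin_comments_<video_id>_<YYYYMMDD_HHMMSS>.csv`. As a rough sketch, the timestamp in the name can be used to pick the newest crawl of one video, rather than relying on file modification time. The helpers `parse_name` and `latest_for` are hypothetical names, and the regular expression assumes the video ID may be empty or contain underscores, as some file names in this repository do.

```python
import re
from datetime import datetime

# Pattern assumed from the crawler's naming scheme:
# douyin_comments_<video_id>_<YYYYMMDD_HHMMSS>.csv
NAME_RE = re.compile(r"^douyin_comments_(.*)_(\d{8}_\d{6})\.csv$")


def parse_name(filename):
    """Return (video_id, crawl_time), or None if the name does not match."""
    m = NAME_RE.match(filename)
    if not m:
        return None
    return m.group(1), datetime.strptime(m.group(2), "%Y%m%d_%H%M%S")


def latest_for(filenames, video_id):
    """Pick the newest file for one video ID, judged by the name's timestamp."""
    candidates = []
    for name in filenames:
        parsed = parse_name(name)
        if parsed and parsed[0] == video_id:
            candidates.append((parsed[1], name))
    return max(candidates)[1] if candidates else None


names = [
    "douyin_comments_fEJejTD6CQ8_20250524_104505.csv",
    "douyin_comments_fEJejTD6CQ8_20250524_113220.csv",
    "douyin_comments_fEJejTD6CQ8_20250524_105423.csv",
    "douyin_comments_H_O7wyWOUQQ_20250312_181221.csv",
]
print(latest_for(names, "fEJejTD6CQ8"))
```

Matching on the exact video ID also avoids the substring test (`video_id not in file`), which would accept any file whose name merely contains the ID.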
--------------------------------------------------------------------------------
/douyin_analysis_results/hot_words.html:
--------------------------------------------------------------------------------
[pyecharts-rendered chart page; page title: Awesome-pyecharts. The HTML markup and chart data were not preserved.]
--------------------------------------------------------------------------------
/douyin_analysis_results/location_analysis.html:
--------------------------------------------------------------------------------
[pyecharts-rendered chart page; page title: Awesome-pyecharts. The HTML markup and chart data were not preserved.]
--------------------------------------------------------------------------------
/douyin_analysis_results/requirements.txt:
--------------------------------------------------------------------------------
1 | DrissionPage>=3.0.0
2 | pandas>=1.3.0
3 | jieba>=0.42.1
4 | wordcloud>=1.8.2
5 | matplotlib>=3.5.0
6 | pyecharts>=1.9.0
7 | numpy>=1.20.0
8 | pillow>=9.0.0
9 | requests  # imported by 抖音视频搜索.py
--------------------------------------------------------------------------------
/douyin_analysis_results/time_analysis.html:
--------------------------------------------------------------------------------
[pyecharts-rendered chart page; page title: Awesome-pyecharts. The HTML markup and chart data were not preserved.]
--------------------------------------------------------------------------------
/douyin_analysis_results/抖音工具集.py:
--------------------------------------------------------------------------------
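1 | """
2 | Douyin comment crawling and analysis tool - unified entry point
3 |
4 | Features:
5 | 1. Crawl the comments of a Douyin video
6 | 2. Analyze comment data and generate visualization charts
7 | 3. Crawl all comments, with no page limit
8 | 4. Search for videos by keyword and crawl their comments
9 | 5. Save data under a date-time stamp for history tracking
10 |
11 | Date: 2025-03-12
12 | Version: 3.1
13 | Author: TO:梁
14 | """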
15 |
16 | import os
17 | import sys
18 | import time
19 |
20 | # 导入爬虫和分析器
21 | try:
22 | from 抖音评论爬虫 import DouyinCommentCrawler
23 | from 抖音数据分析器 import CommentAnalyzer
24 | from 抖音视频搜索 import DouyinVideoSearcher
25 | except ImportError:
26 | print("正在导入模块...")
27 | # Retry with this file's directory added to sys.path
28 | try:
29 | # 当前文件目录
30 | current_dir = os.path.dirname(os.path.abspath(__file__))
31 | if current_dir not in sys.path:
32 | sys.path.append(current_dir)
33 |
34 | from 抖音评论爬虫 import DouyinCommentCrawler
35 | from 抖音数据分析器 import CommentAnalyzer
36 | from 抖音视频搜索 import DouyinVideoSearcher
37 | except ImportError:
38 | print("无法导入必要模块。请确保相关模块文件位于同一目录下。")
39 | sys.exit(1)
40 |
41 |
42 | def show_banner():
43 | """显示欢迎横幅"""
44 | print("\n" + "=" * 80)
45 | print("抖音评论爬取与分析工具 V3.1".center(78))
46 | print("=" * 80)
47 | print(" 功能:爬取抖音视频评论数据并生成多维度分析图表")
48 | print(" 特点:支持爬取全部评论 | 自动保存历史数据 | 多维度数据可视化 | 关键词搜索视频")
49 | print(" 作者:TO:梁")
50 | print("=" * 80 + "\n")
51 |
52 |
53 | def print_section(title):
54 | """打印带有分隔符的小节标题"""
55 | print("\n" + "-" * 50)
56 | print(f" {title} ".center(48, "-"))
57 | print("-" * 50)
58 |
59 |
60 | def show_menu():
61 | """显示主菜单"""
62 | print_section("主菜单")
63 | print("1. 爬取新的评论并分析")
64 | print("2. 分析已有的评论数据")
65 | print("3. 同时执行爬取和分析")
66 | print("4. 通过关键词搜索视频并爬取评论")
67 | print("0. 退出程序")
68 | return input("\n请选择操作 [0-4]: ")
69 |
70 |
71 | def crawl_comments():
72 | """爬取评论功能"""
73 | print_section("评论爬取")
74 |
75 | # 获取视频URL
76 | video_url = input("请输入抖音视频URL (例如: https://www.douyin.com/video/7353500880198536457): ")
77 | if not video_url:
78 | print("错误: URL不能为空!")
79 | return None
80 |
81 | # 设置最大爬取页数
82 | try:
83 | pages_input = input("请输入最大爬取页数 (直接回车表示爬取全部评论): ")
84 | max_pages = int(pages_input) if pages_input.strip() else None
85 | except ValueError:
86 | max_pages = None
87 |
88 | if max_pages is None:
89 | print("将爬取全部评论,直到没有更多评论为止")
90 | else:
91 | print(f"将爬取最多 {max_pages} 页评论")
92 |
93 | # 询问是否使用正常模式
94 | use_normal_mode = input("是否使用正常浏览器模式 (可以登录账号) [Y/n]: ").lower() != 'n'
95 |
96 | # 如果使用正常模式,询问是否需要先登录
97 | login_first = False
98 | if use_normal_mode:
99 | login_first = input("是否需要在爬取前先登录抖音账号 [y/N]: ").lower() == 'y'
100 |
101 | # 创建爬虫实例
102 | print("\n正在初始化爬虫...")
103 | crawler = DouyinCommentCrawler(
104 | video_url=video_url,
105 | max_pages=max_pages,
106 | use_normal_mode=use_normal_mode,
107 | login_first=login_first
108 | )
109 |
110 | # 执行爬取
111 | print("\n开始爬取评论,请稍候...\n")
112 | start_time = time.time()
113 | comments = crawler.start_crawler()
114 | end_time = time.time()
115 |
116 | # 打印爬取结果
117 | if comments:
118 | print(f"\n成功爬取 {len(comments)} 条评论,耗时 {end_time - start_time:.2f} 秒")
119 | print(f"评论已保存到文件: {crawler.get_output_file()}")
120 | return crawler.get_output_file()
121 | else:
122 | print("\n爬取失败或未获取到评论")
123 | return None
124 |
125 |
126 | def analyze_comments(csv_file=None):
127 | """分析评论功能"""
128 | print_section("评论分析")
129 |
130 | if not csv_file:
131 | print("\n请选择操作:")
132 | print("1. 分析最新爬取的评论数据")
133 | print("2. 指定CSV文件进行分析")
134 | choice = input("请输入选项 (1/2): ")
135 |
136 | if choice == "2":
137 | csv_file = input("请输入CSV文件路径: ")
138 | if not os.path.exists(csv_file):
139 | print(f"错误: 文件 {csv_file} 不存在!")
140 | return
141 |
142 | # 设置词云形状图片
143 | shape_img = input("请输入词云形状图片路径 (留空使用默认形状): ")
144 | if shape_img and not os.path.exists(shape_img):
145 | print(f"警告: 图片 {shape_img} 不存在,将使用默认形状")
146 | shape_img = None
147 |
148 | # 创建分析器
149 | print("\n正在初始化分析器...")
150 | analyzer = CommentAnalyzer(csv_file=csv_file)
151 |
152 | # 执行所有分析
153 | print("\n开始分析评论数据,请稍候...\n")
154 | start_time = time.time()
155 | try:
156 | outputs = analyzer.run_all_analysis(shape_img=shape_img)
157 | end_time = time.time()
158 |
159 | print(f"\n分析完成! 耗时 {end_time - start_time:.2f} 秒")
160 | print("生成的文件:")
161 | for output in outputs:
162 | print(f" - {output}")
163 | except Exception as e:
164 | print(f"分析过程中出错: {str(e)}")
165 |
166 |
167 | def search_and_crawl_comments():
168 | """通过关键词搜索视频并爬取评论功能"""
169 | print_section("视频搜索与评论爬取")
170 |
171 | # 获取搜索关键词
172 | keyword = input("请输入要搜索的关键词: ")
173 | if not keyword:
174 | print("错误: 关键词不能为空!")
175 | return None
176 |
177 | # 设置搜索结果数量
178 | try:
179 | result_count_input = input("请输入要显示的最大搜索结果数量 (直接回车默认为10): ")
180 | max_results = int(result_count_input) if result_count_input.strip() else 10
181 | except ValueError:
182 | max_results = 10
183 |
184 | # 询问是否使用正常模式
185 | use_normal_mode = input("是否使用正常浏览器模式 (可以登录账号) [Y/n]: ").lower() != 'n'
186 |
187 | # 如果使用正常模式,询问是否需要先登录
188 | login_first = False
189 | if use_normal_mode:
190 | login_first = input("是否需要在搜索前先登录抖音账号 [y/N]: ").lower() == 'y'
191 |
192 | # 创建搜索器实例
193 | print("\n正在初始化搜索器...")
194 | searcher = DouyinVideoSearcher(
195 | use_normal_mode=use_normal_mode,
196 | login_first=login_first
197 | )
198 |
199 | # 执行搜索
200 | print(f"\n开始搜索关键词: \"{keyword}\",请稍候...\n")
201 | start_time = time.time()
202 | search_results = searcher.search_videos(keyword, max_results)
203 | end_time = time.time()
204 |
205 | # 处理搜索结果
206 | if not search_results:
207 | print("\n未找到相关视频或搜索失败")
208 | if searcher.driver:
209 | searcher.close()
210 | return None
211 |
212 | print(f"\n成功找到 {len(search_results)} 个相关视频,耗时 {end_time - start_time:.2f} 秒")
213 |
214 | # 显示搜索结果
215 | searcher.display_search_results()
216 |
217 | # 用户选择视频
218 | selected_video = searcher.select_video()
219 | if not selected_video:
220 | print("\n未选择任何视频,操作取消")
221 | if searcher.driver:
222 | searcher.close()
223 | return None
224 |
225 | # 获取选定视频的URL
226 | video_url = selected_video['url']
227 |
228 | # 设置最大爬取页数
229 | try:
230 | pages_input = input("\n请输入最大爬取页数 (直接回车表示爬取全部评论): ")
231 | max_pages = int(pages_input) if pages_input.strip() else None
232 | except ValueError:
233 | max_pages = None
234 |
235 | if max_pages is None:
236 | print("将爬取全部评论,直到没有更多评论为止")
237 | else:
238 | print(f"将爬取最多 {max_pages} 页评论")
239 |
240 | # 关闭搜索浏览器
241 | if searcher.driver:
242 | searcher.close()
243 |
244 | # 创建爬虫实例
245 | print("\n正在初始化爬虫...")
246 | crawler = DouyinCommentCrawler(
247 | video_url=video_url,
248 | max_pages=max_pages,
249 | use_normal_mode=use_normal_mode,
250 | login_first=login_first
251 | )
252 |
253 | # 执行爬取
254 | print("\n开始爬取评论,请稍候...\n")
255 | start_time = time.time()
256 | comments = crawler.start_crawler()
257 | end_time = time.time()
258 |
259 | # 打印爬取结果
260 | if comments:
261 | print(f"\n成功爬取 {len(comments)} 条评论,耗时 {end_time - start_time:.2f} 秒")
262 | print(f"评论已保存到文件: {crawler.get_output_file()}")
263 | return crawler.get_output_file()
264 | else:
265 | print("\n爬取失败或未获取到评论")
266 | return None
267 |
268 |
269 | def main():
270 | """主函数"""
271 | show_banner()
272 |
273 | while True:
274 | choice = show_menu()
275 |
276 | if choice == "1":
277 | # 爬取新评论
278 | csv_file = crawl_comments()
279 |
280 | # 询问是否要分析
281 | if csv_file:
282 | if input("\n是否要分析刚爬取的评论数据? (y/n): ").lower() == 'y':
283 | analyze_comments(csv_file)
284 |
285 | elif choice == "2":
286 | # 分析已有评论
287 | analyze_comments()
288 |
289 | elif choice == "3":
290 | # 爬取并分析
291 | csv_file = crawl_comments()
292 | if csv_file:
293 | print("\n自动开始分析评论数据...")
294 | analyze_comments(csv_file)
295 |
296 | elif choice == "4":
297 | # 搜索并爬取评论
298 | csv_file = search_and_crawl_comments()
299 |
300 | # 询问是否要分析
301 | if csv_file:
302 | if input("\n是否要分析刚爬取的评论数据? (y/n): ").lower() == 'y':
303 | analyze_comments(csv_file)
304 |
305 | elif choice == "0":
306 | print("\n感谢使用,再见!")
307 | break
308 |
309 | else:
310 | print("\n无效的选择,请重新输入")
311 |
312 | input("\n按Enter键继续...")
313 |
314 |
315 | if __name__ == "__main__":
316 | main()
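Both `crawl_comments()` and `search_and_crawl_comments()` repeat the same parsing of the page-limit prompt. A small helper could hold that logic in one place. This is only a sketch: `parse_max_pages` is a hypothetical name, and treating zero or negative numbers as "no limit" is an assumption added here (the original accepts any integer).

```python
def parse_max_pages(raw):
    """Turn the page-limit prompt answer into an int, or None for 'crawl everything'."""
    text = raw.strip()
    if not text:
        # Empty answer: the prompt documents this as "crawl all comments".
        return None
    try:
        pages = int(text)
    except ValueError:
        # Same fallback as the original code: invalid input means no limit.
        return None
    # Assumption: a non-positive limit is meaningless, so fall back to no limit.
    return pages if pages > 0 else None


print(parse_max_pages(" 5 "), parse_max_pages(""), parse_max_pages("abc"))
```

With such a helper, each call site would reduce to `max_pages = parse_max_pages(input(...))`.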
--------------------------------------------------------------------------------
/douyin_analysis_results/抖音视频搜索.py:
--------------------------------------------------------------------------------
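1 | """
2 | Douyin video search module - URL version
3 |
4 | Features:
5 | 1. Search Douyin videos by keyword
6 | 2. Return the list of videos found in the search results
7 | 3. Fetch search results over HTTP, with no browser required
8 |
9 | Date: 2024
10 | """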
11 |
12 | import time
13 | import random
14 | import re
15 | import json
16 | import requests
17 | from urllib.parse import quote, urlencode
18 | import hashlib
19 |
20 |
21 | class DouyinVideoSearcher:
22 | """抖音视频搜索类 - URL版本"""
23 |
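24 | def __init__(self, use_normal_mode=True, login_first=False):
25 | """Initialize the searcher. Both arguments are accepted only for compatibility with 抖音工具集.py; this URL version opens no browser and ignores them."""
26 | self.session, self.driver = requests.Session(), None  # driver stays None so callers' `if searcher.driver:` checks skip close()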
27 | self.headers = {
28 | 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36',
29 | 'Referer': 'https://www.douyin.com/',
30 | 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
31 | 'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
32 | 'Sec-Fetch-Site': 'same-origin',
33 | 'Sec-Fetch-Mode': 'navigate',
34 | }
35 | self.search_results = []
36 |
37 | def search_direct_url(self, keyword, max_videos=10):
38 | """
39 | 使用网络搜索API直接获取视频列表
40 |
41 | :param keyword: 搜索关键词
42 | :param max_videos: 最大返回视频数量
43 | :return: 视频列表
44 | """
45 | print(f"\n正在初始化搜索器...")
46 |
47 | if not keyword:
48 | print("错误: 搜索关键词不能为空")
49 | return []
50 |
51 | print(f"\n开始搜索关键词: \"{keyword}\",请稍候...")
52 |
53 | try:
54 | # 方法1:搜索页URL格式
55 | encoded_keyword = quote(keyword)
56 | search_url = f"https://www.douyin.com/search/{encoded_keyword}"
57 |
58 | # 访问搜索页面
59 | results = self._fetch_search_results(search_url, max_videos)
60 | if results:
61 | print(f"搜索成功,找到 {len(results)} 个视频")
62 | self.search_results = results
63 | return results
64 |
65 | # 方法2:使用热门分享URL
66 | print("尝试使用备用方法搜索...")
67 | try:
68 | backup_results = self._search_by_keywords(keyword, max_videos)
69 | if backup_results:
70 | print(f"备用搜索成功,找到 {len(backup_results)} 个视频")
71 | self.search_results = backup_results
72 | return backup_results
73 | except Exception as e:
74 | print(f"备用搜索方法失败: {e}")
75 |
76 | print("未找到相关视频或搜索失败")
77 | return []
78 |
79 | except Exception as e:
80 | print(f"搜索出错: {str(e)}")
81 | return []
82 |
83 | def _fetch_search_results(self, url, max_count=10):
84 | """从URL获取搜索结果"""
85 | try:
86 | response = self.session.get(url, headers=self.headers, timeout=10)
87 | if response.status_code != 200:
88 | return []
89 |
90 | # 尝试从HTML中提取视频信息
91 | html = response.text
92 | video_ids = re.findall(r'/video/(\d+)', html)
93 |
94 | if not video_ids:
95 | return []
96 |
97 | # De-duplicate while keeping first-seen order (a set would shuffle the results)
98 | video_ids = list(dict.fromkeys(video_ids))[:max_count]
99 |
100 | results = []
101 | for vid in video_ids:
102 | video_url = f"https://www.douyin.com/video/{vid}"
103 | # 获取视频详情
104 | try:
105 | video_info = self._fetch_video_info(video_url)
106 | if video_info:
107 | results.append(video_info)
108 | except Exception as e:
109 | print(f"获取视频 {vid} 信息失败: {str(e)}")
110 |
111 | return results
112 |
113 | except Exception as e:
114 | print(f"获取搜索结果失败: {str(e)}")
115 | return []
116 |
117 | def _fetch_video_info(self, video_url):
118 | """获取视频详细信息"""
119 | try:
120 | response = self.session.get(video_url, headers=self.headers, timeout=10)
121 | if response.status_code != 200:
122 | return None
123 |
124 | html = response.text
125 |
126 | # 提取标题
127 | title_match = re.search(r'<title[^>]*>(.*?)</title>', html)
128 | title = title_match.group(1) if title_match else "未知标题"
129 | # 清理标题
130 | title = title.replace(" - 抖音", "").strip()
131 |
132 | # 提取作者
133 | author_match = re.search(r'name="author" content="([^"]+)"', html)
134 | author = author_match.group(1) if author_match else "未知作者"
135 |
136 | # 提取视频ID
137 | video_id = video_url.split("/")[-1].split("?")[0]
138 |
139 | return {
140 | 'title': title,
141 | 'author': author,
142 | 'url': video_url,
143 | 'video_id': video_id,
144 | 'likes': "未知",
145 | 'comments': "未知"
146 | }
147 |
148 | except Exception as e:
149 | print(f"获取视频信息失败: {str(e)}")
150 | return None
151 |
152 | def _search_by_keywords(self, keyword, max_count=10):
153 | """使用关键词搜索抖音视频的URL方法"""
154 | # 构建搜索URL
155 | keyword_for_api = quote(keyword)
156 | search_api_url = f"https://www.douyin.com/aweme/v1/web/general/search/single/"
157 |
158 | # 生成时间戳和设备ID
159 | timestamp = str(int(time.time()))
160 | device_id = hashlib.md5(timestamp.encode()).hexdigest()[:16] # 简单模拟设备ID
161 |
162 | # 搜索参数
163 | params = {
164 | 'keyword': keyword,
165 | 'device_platform': 'webapp',
166 | 'source': 'normal_search',
167 | 'search_channel': 'aweme_general',
168 | 'type': 1, # 视频类型
169 | 'device_id': device_id,
170 | 'count': max_count,
171 | 'version_name': '23.5.0',
172 | 'aid': 6383
173 | }
174 |
175 | headers = self.headers.copy()
176 | headers['Content-Type'] = 'application/json'
177 |
178 | try:
179 | response = self.session.get(
180 | search_api_url,
181 | params=params,
182 | headers=headers,
183 | timeout=10
184 | )
185 |
186 | # 尝试解析JSON响应
187 | if response.status_code == 200:
188 | try:
189 | data = response.json()
190 | if 'status_code' in data and data['status_code'] == 0:
191 | # 提取视频信息
192 | videos = []
193 | for item in data.get('data', []):
194 | if 'aweme_info' in item:
195 | video = item['aweme_info']
196 | video_id = video.get('aweme_id')
197 | title = video.get('desc', '未知标题')
198 | author = video.get('author', {}).get('nickname', '未知作者')
199 | video_url = f"https://www.douyin.com/video/{video_id}"
200 |
201 | videos.append({
202 | 'title': title,
203 | 'author': author,
204 | 'url': video_url,
205 | 'video_id': video_id,
206 | 'likes': "未知",
207 | 'comments': "未知"
208 | })
209 |
210 | return videos
211 | except json.JSONDecodeError:
212 | pass
213 |
214 | # 备用方法:使用web搜索页面
215 | backup_url = f"https://www.douyin.com/search/{keyword_for_api}?source=normal_search&type=video"
216 | return self._fetch_search_results(backup_url, max_count)
217 |
218 | except Exception as e:
219 | print(f"通过关键词API搜索失败: {str(e)}")
220 | # 尝试备用方法
221 | try:
222 | backup_url = f"https://www.douyin.com/search/{keyword_for_api}?aid=0&source=normal_search&type=video"
223 | return self._fetch_search_results(backup_url, max_count)
224 | except Exception:
225 | return []
226 |
227 | def search_videos(self, keyword, max_videos=10):
228 | """保留旧的接口名称兼容已有代码调用"""
229 | return self.search_direct_url(keyword, max_videos)
230 |
231 | def display_search_results(self):
232 | """显示搜索结果"""
233 | if not self.search_results:
234 | print("没有找到视频结果")
235 | return
236 |
237 | print("\n" + "=" * 80)
238 | print(" 搜索结果 ".center(78, "="))
239 | print("=" * 80)
240 |
241 | for i, video in enumerate(self.search_results):
242 | # 安全获取视频信息
243 | title = video.get('title', '未知标题')
244 | author = video.get('author', '未知作者')
245 | likes = video.get('likes', '未知')
246 | comments = video.get('comments', '未知')
247 | video_url = video.get('url', '')
248 |
249 | print(f"\n[{i+1}] {title}")
250 | print(f" 作者: {author}")
251 | print(f" 点赞: {likes} | 评论: {comments}")
252 | print(f" 链接: {video_url}")
253 | print("-" * 80)
254 |
255 | def select_video(self):
256 | """让用户选择一个视频"""
257 | if not self.search_results:
258 | print("没有可选择的视频")
259 | return None
260 |
261 | # 安全处理,确保至少有一个有效结果
262 | valid_results = [v for v in self.search_results if 'url' in v and v['url']]
263 | if not valid_results:
264 | print("没有找到有效的视频链接")
265 | return None
266 |
267 | # 将有效结果更新回搜索结果
268 | self.search_results = valid_results
269 |
270 | while True:
271 | try:
272 | choice = input("\n请选择要爬取评论的视频编号 [1-{}]: ".format(len(self.search_results)))
273 |
274 | if not choice.strip():
275 | return None
276 |
277 | index = int(choice) - 1
278 | if 0 <= index < len(self.search_results):
279 | selected_video = self.search_results[index]
280 | print(f"\n已选择: {selected_video.get('title', '未知视频')}")
281 | return selected_video
282 | else:
283 | print(f"无效的选择,请输入 1-{len(self.search_results)} 之间的数字")
284 | except ValueError:
285 | print("请输入有效的数字")
286 | except Exception as e:
287 | print(f"选择视频时发生错误: {str(e)}")
288 | return None
289 |
290 |
291 | def main():
292 | """主函数"""
293 | print("=" * 60)
294 | print("抖音视频搜索工具 - URL版本")
295 | print("=" * 60)
296 |
297 | # 创建搜索器实例
298 | searcher = DouyinVideoSearcher()
299 |
300 | while True:
301 | # 获取搜索关键词
302 | keyword = input("\n请输入搜索关键词 (直接回车退出): ")
303 |
304 | if not keyword.strip():
305 | print("退出搜索")
306 | break
307 |
308 | # 设置最大返回结果数
309 | try:
310 | max_count_input = input("请输入最大返回结果数 (直接回车使用默认值10): ")
311 | max_count = int(max_count_input) if max_count_input.strip() else 10
312 | except ValueError:
313 | max_count = 10
314 | print("输入格式错误,使用默认值10")
315 |
316 | # 执行搜索
317 | videos = searcher.search_direct_url(keyword, max_count)
318 |
319 | # 显示搜索结果
320 | searcher.display_search_results()
321 |
322 | # 选择视频(如果有结果)
323 | if videos:
324 | selected = searcher.select_video()
325 | if selected:
326 | # 询问是否爬取评论
327 | crawl_choice = input("\n是否立即爬取该视频的评论? (y/n): ")
328 | if crawl_choice.lower() == 'y':
329 | try:
330 | # 导入爬虫模块并爬取评论
331 | from 抖音评论爬虫 import DouyinCommentCrawler
332 |
333 | # 询问是否使用正常浏览器模式
334 | use_normal_mode = input("是否使用正常浏览器模式 (可以登录账号) [Y/n]: ").lower() != 'n'
335 |
336 | # 创建爬虫实例
337 | crawler = DouyinCommentCrawler(
338 | video_url=selected['url'],
339 | use_normal_mode=use_normal_mode,
340 | login_first=False if not use_normal_mode else input("是否需要在爬取前先登录抖音账号 [y/N]: ").lower() == 'y'
341 | )
342 |
343 | # 执行爬取
344 | crawler.start_crawler()
345 |
346 | except ImportError:
347 | print("未找到评论爬虫模块,请确保 抖音评论爬虫.py 在正确的位置")
348 | except Exception as e:
349 | print(f"爬取评论时出错: {str(e)}")
350 |
351 | # 询问是否继续搜索
352 | continue_choice = input("\n是否继续搜索? (y/n): ")
353 | if continue_choice.lower() != 'y':
354 | break
355 |
356 | print("\n感谢使用抖音视频搜索工具!")
357 |
358 |
359 | if __name__ == "__main__":
360 | main()
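The HTML-scraping path in `_fetch_search_results()` boils down to two string operations: collecting `/video/<digits>` links and reading the ID back out of a video URL. The sketch below isolates them so they can be checked without network access. The function names are hypothetical, and `dict.fromkeys` is used for de-duplication because a plain `set()` does not keep first-seen order.

```python
import re


def extract_video_ids(html, max_count=10):
    """Collect unique video IDs from page HTML, in order of first appearance."""
    ids = re.findall(r"/video/(\d+)", html)
    # dict.fromkeys drops repeats while preserving insertion order.
    return list(dict.fromkeys(ids))[:max_count]


def video_id_from_url(url):
    """Take the last path segment of a video URL and drop any query string."""
    return url.split("/")[-1].split("?")[0]


# Hypothetical page fragment; real search pages are far larger.
sample_html = (
    '<a href="/video/111"></a>'
    '<a href="/video/222"></a>'
    '<a href="/video/111"></a>'
)
print(extract_video_ids(sample_html))
print(video_id_from_url("https://www.douyin.com/video/7353500880198536457?x=1"))
```

Keeping the order matters here because the numbered list shown to the user should follow the ranking of the search page.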
--------------------------------------------------------------------------------
/douyin_analysis_results/抖音评论分析器_旧版.py:
--------------------------------------------------------------------------------
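1 | """
2 | Douyin video comment crawling and data visualization tool
3 |
4 | Features:
5 | 1. Automatically crawl the comments of a given Douyin video
6 | 2. Save the comment data as CSV
7 | 3. Generate a comment word cloud
8 | 4. Visualize comment sentiment and regional distribution
9 |
10 | Date: 2024
11 | """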
12 |
13 | import time
14 | import json
15 | import datetime
16 | import csv
17 | import os
18 | import random
19 | import jieba
20 | import pandas as pd
21 | import numpy as np
22 | from PIL import Image
23 | import wordcloud
24 | import matplotlib.pyplot as plt
25 | from pyecharts import options as opts
26 | from pyecharts.charts import Pie, Bar, Map, WordCloud as PyechartsWordCloud
27 | from pyecharts.globals import ThemeType
28 | from collections import Counter
29 | from DrissionPage import ChromiumPage
30 |
31 |
32 | class DouyinCommentCrawler:
33 | """抖音评论爬虫类"""
34 |
35 | def __init__(self, video_url=None, video_id=None, max_pages=None):
36 | """
37 | 初始化爬虫
38 | :param video_url: 视频URL,例如 https://www.douyin.com/video/7353500880198536457
39 | :param video_id: 视频ID,如果提供了video_url则可不提供
40 | :param max_pages: 最大爬取页数,默认为None表示爬取全部评论
41 | """
42 | self.video_url = video_url
43 | self.video_id = video_id if video_id else self._extract_video_id(video_url)
44 | self.max_pages = max_pages
45 | self.comments = []
46 | self.driver = None
47 | self.comment_ids = set() # 用于去重的评论ID集合
48 |
49 | # 使用当前日期和时间创建唯一的文件名
50 | current_time = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
51 | self.output_file = f"douyin_comments_{self.video_id}_{current_time}.csv"
52 |
53 | def _extract_video_id(self, url):
54 | """从URL中提取视频ID"""
55 | if not url:
56 | raise ValueError("需要提供视频URL或视频ID")
57 | return url.split("/")[-1].split("?")[0]
58 |
59 | def start_crawler(self):
60 | """启动爬虫"""
61 | print(f"开始爬取视频 {self.video_id} 的评论...")
62 |
63 | # 创建CSV文件
64 | f = open(self.output_file, mode='w', encoding='utf-8-sig', newline='')
65 | fieldnames = ['评论ID', '昵称', '地区', '时间', '评论', '点赞数']
66 | csv_writer = csv.DictWriter(f, fieldnames=fieldnames)
67 | csv_writer.writeheader()
68 |
69 | try:
70 | # 初始化浏览器
71 | self.driver = ChromiumPage()
72 | # 监听评论数据API
73 | self.driver.listen.start('aweme/v1/web/comment/list/')
74 |
75 | # 访问视频页面
76 | if self.video_url:
77 | self.driver.get(self.video_url)
78 | else:
79 | self.driver.get(f'https://www.douyin.com/video/{self.video_id}')
80 |
81 | # 等待页面加载
82 | time.sleep(5)
83 |
84 | # 尝试点击"查看更多评论"按钮(如果存在)
85 | try:
86 | more_comment_btn = self.driver.find_element('xpath://div[contains(text(), "查看更多评论")]')
87 | if more_comment_btn:
88 | more_comment_btn.click()
89 | time.sleep(2)
90 | except Exception:
91 | print('没有找到"查看更多评论"按钮,继续使用滚动加载评论')
92 |
93 | # 爬取评论
94 | page = 0
95 | no_new_comments_count = 0 # 连续没有新评论的次数
96 | last_comment_count = 0 # 上一次的评论总数
97 | retry_count = 0 # 当前页面重试次数
98 | max_retry = 5 # 最大重试次数
99 |
100 | # 如果设置了最大页数,则限制页数;否则一直爬取直到没有更多评论
101 | while self.max_pages is None or page < self.max_pages:
102 | try:
103 | page += 1
104 | print(f'正在爬取第 {page} 页评论...')
105 |
106 | # 使用不同的滚动策略
107 | if page % 3 == 0:
108 | # 精确滚动到评论区
109 | try:
110 | comment_area = self.driver.find_element('xpath://div[contains(@class, "comment-mainContent")]')
111 | if comment_area:
112 | self.driver.scroll.to_element(comment_area, center=True)
113 | time.sleep(1)
114 | except Exception:
115 | pass
116 |
117 | # 平滑滚动到底部
118 | self.driver.scroll.to_bottom(smooth=True)
119 | else:
120 | # 先快速滚动一段距离,再滚动到底部
121 | self.driver.scroll.down(300)
122 | time.sleep(0.5)
123 | self.driver.scroll.to_bottom()
124 |
125 | # 随机等待时间,模拟人工浏览
126 | wait_time = 1 + random.random() * 2
127 | time.sleep(wait_time)
128 |
129 | # 等待数据包
130 | resp = self.driver.listen.wait(timeout=5)
131 |
132 | if not resp:
133 | print(f"未检测到新的评论数据,尝试继续... (重试 {retry_count+1}/{max_retry})")
134 | retry_count += 1
135 |
136 | if retry_count >= max_retry:
137 | no_new_comments_count += 1
138 | retry_count = 0
139 |
140 | if no_new_comments_count >= 3:
141 | print("连续多次未检测到新评论,尝试使用其他方法加载评论")
142 |
143 | # 尝试其他方法触发评论加载
144 | try:
145 | # 尝试点击"展开更多"按钮
146 | expand_btns = self.driver.find_elements('xpath://span[contains(text(), "展开") or contains(text(), "更多")]')
147 | if expand_btns:
148 | for btn in expand_btns[:3]: # 最多点击前3个
149 | try:
150 | btn.click()
151 | time.sleep(1)
152 | except Exception:
153 | pass
154 | except Exception:
155 | pass
156 |
157 | # 再尝试一次,如果还是失败则认为已到达末页
158 | if no_new_comments_count >= 5:
159 | print("已尝试多种方法但无法加载更多评论,可能已到达末页")
160 | break
161 |
162 | # 再次尝试不同的滚动方式
163 | self.driver.scroll.up(200)
164 | time.sleep(1)
165 | self.driver.scroll.to_bottom()
166 | continue
167 |
168 | # 重置重试计数器
169 | retry_count = 0
170 |
171 | # 解析JSON数据
172 | json_data = resp.response.body
173 |
174 | if not json_data or 'comments' not in json_data:
175 | print(f"未获取到有效评论数据,尝试继续... (尝试 {no_new_comments_count+1}/3)")
176 | no_new_comments_count += 1
177 | if no_new_comments_count >= 3:
178 | print("连续多次未获取到有效评论数据,可能已到达末页")
179 | break
180 | continue
181 |
182 | # 提取评论
183 | comments = json_data['comments']
184 | if not comments:
185 | print("本页无评论数据,可能已到达末页")
186 | no_new_comments_count += 1
187 | if no_new_comments_count >= 3:
188 | break
189 | continue
190 |
191 | # 重置无新评论计数器(如果找到了评论)
192 | no_new_comments_count = 0
193 |
194 | # 记录爬取前的评论数和评论ID数
195 | comment_count_before = len(self.comments)
196 | comment_id_count_before = len(self.comment_ids)
197 |
198 | # 处理评论数据
199 | for comment in comments:
200 | try:
201 | comment_id = comment.get('cid', '') or str(comment.get('id', ''))
202 |
203 | # 如果已经处理过这个评论,则跳过
204 | if comment_id in self.comment_ids:
205 | continue
206 |
207 | # 添加到已处理集合
208 | self.comment_ids.add(comment_id)
209 |
210 | nickname = comment['user']['nickname']
211 | create_time = comment['create_time']
212 | date = str(datetime.datetime.fromtimestamp(create_time))
213 | ip_label = comment.get('ip_label', '未知')
214 | text = comment['text']
215 | digg_count = comment.get('digg_count', 0) # 点赞数
216 |
217 | # 创建评论数据字典
218 | comment_data = {
219 | '评论ID': comment_id,
220 | '昵称': nickname,
221 | '地区': ip_label,
222 | '时间': date,
223 | '评论': text,
224 | '点赞数': digg_count
225 | }
226 |
227 | # 保存到列表和文件
228 | self.comments.append(comment_data)
229 | csv_writer.writerow(comment_data)
230 | print(f"[{len(self.comments)}] 评论: {text[:30]}... - 来自: {nickname} - {ip_label}")
231 |
232 | except Exception as e:
233 | print(f"处理评论时出错: {str(e)}")
234 |
235 | # 检查是否有新的评论被添加
236 | comment_count_added = len(self.comments) - comment_count_before
237 | comment_id_added = len(self.comment_ids) - comment_id_count_before
238 |
239 | print(f"本次获取了 {comment_count_added} 条新评论,累计 {len(self.comments)} 条")
240 |
241 | # 如果没有新的评论ID被添加,说明可能需要尝试其他方法或已到达末页
242 | if comment_id_added == 0:
243 | no_new_comments_count += 1
244 | print(f"未获取到新评论ID,尝试继续... (尝试 {no_new_comments_count}/3)")
245 |
246 | # 尝试点击页面上的"查看更多回复"按钮
247 | try:
248 | more_reply_btns = self.driver.find_elements('xpath://span[contains(text(), "查看") and contains(text(), "回复")]')
249 | if more_reply_btns:
250 | for btn in more_reply_btns[:5]: # 最多点击前5个
251 | try:
252 | btn.click()
253 | time.sleep(1)
254 | except Exception:
255 | pass
256 | # 点击了按钮后重置计数器,再次尝试
257 | no_new_comments_count = 0
258 | except Exception:
259 | pass
260 |
261 | if no_new_comments_count >= 3:
262 | print("连续多次未获取到新评论,可能已到达末页")
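263 | # Refresh the page once as a last resort; a second stall ends the crawl
264 | if not getattr(self, '_refreshed_once', False):
265 | self._refreshed_once = True  # guards against refreshing forever
266 | print("尝试刷新页面后继续爬取...")
267 | self.driver.refresh()
268 | time.sleep(5)
269 | no_new_comments_count = 2  # allow one more attempt
270 | continue
271 | break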
272 | else:
273 | # 有新评论,重置计数器
274 | no_new_comments_count = 0
275 |
276 | except Exception as e:
277 | print(f"爬取第 {page} 页时出错: {str(e)}")
278 | no_new_comments_count += 1
279 | if no_new_comments_count >= 3:
280 | print("连续多次爬取出错,停止爬取")
281 | break
282 |
283 | print(f"评论爬取完成,共获取 {len(self.comments)} 条评论")
284 | return self.comments
285 |
286 | except Exception as e:
287 | print(f"爬虫运行出错: {str(e)}")
288 | return []
289 |
290 | finally:
291 | # 关闭文件和浏览器
292 | f.close()
293 | if self.driver:
294 | self.driver.quit()
295 |
296 | def get_output_file(self):
297 | """获取输出文件路径"""
298 | return self.output_file
299 |
300 |
301 | class CommentAnalyzer:
302 | """Comment analysis and visualization."""
303 |
304 | def __init__(self, csv_file=None, comments=None):
305 | """
306 | Initialize the analyzer.
307 | :param csv_file: path to a CSV file
308 | :param comments: list of comment records, used when no CSV file is given
309 | """
310 | self.csv_file = csv_file
311 | self.comments = comments
312 | self.df = None
313 | self.output_dir = "douyin_analysis_results"
314 |
315 | # Create the output directory
316 | if not os.path.exists(self.output_dir):
317 | os.makedirs(self.output_dir)
318 |
319 | def find_latest_csv(self, video_id=None):
320 | """
321 | Find the most recent comment CSV file.
322 | :param video_id: optional video ID used as a filter
323 | :return: path of the latest CSV file, or None if none was found
324 | """
325 | all_csv_files = []
326 |
327 | # Search the current directory for comment CSV files
328 | for file in os.listdir('.'):
329 | if file.startswith('douyin_comments_') and file.endswith('.csv'):
330 | # If a video ID is given, only consider that video's CSV files
331 | if video_id and video_id not in file:
332 | continue
333 | all_csv_files.append(file)
334 |
335 | if not all_csv_files:
336 | return None
337 |
338 | # Pick the file with the most recent modification time
339 | latest_file = max(all_csv_files, key=lambda x: os.path.getmtime(x))
340 | print(f"Found latest CSV file: {latest_file}")
341 | return latest_file
342 |
343 | def load_data(self):
344 | """Load the comment data."""
345 | if self.csv_file and os.path.exists(self.csv_file):
346 | self.df = pd.read_csv(self.csv_file)
347 | print(f"Loaded {len(self.df)} comments from {self.csv_file}")
348 | elif self.comments:
349 | self.df = pd.DataFrame(self.comments)
350 | print(f"Loaded {len(self.df)} comments from memory")
351 | else:
352 | # Fall back to the latest CSV file
353 | latest_csv = self.find_latest_csv()
354 | if latest_csv:
355 | self.csv_file = latest_csv
356 | self.df = pd.read_csv(self.csv_file)
357 | print(f"Automatically loaded {len(self.df)} comments from latest file {self.csv_file}")
358 | else:
359 | raise ValueError("A CSV file or comment data is required")
360 |
361 | return self.df
362 |
363 | def generate_wordcloud(self, shape_img=None, output_file=None):
364 | """
365 | Generate a word cloud image.
366 | :param shape_img: path to the shape (mask) image
367 | :param output_file: output file name
368 | :return: output file path
369 | """
370 | if self.df is None:
371 | self.load_data()
372 |
373 | # Default output file name
374 | if not output_file:
375 | output_file = os.path.join(self.output_dir, "comment_wordcloud.png")
376 |
377 | print("Generating word cloud...")
378 |
379 | # Join all comments into one string
380 | content = ' '.join([str(i).replace('\n', '') for i in self.df['评论']])
381 |
382 | # Segment the text with jieba
383 | jieba.setLogLevel(20)  # raise the log level to suppress jieba's verbose output
384 | words = jieba.lcut(content)
385 | string = ' '.join(words)
386 |
387 | # Load the shape image
388 | mask = None
389 | if shape_img and os.path.exists(shape_img):
390 | mask = np.array(Image.open(shape_img))
391 |
392 | # Stop words
393 | stopwords = {'了', '的', '我', '你', '是', '都', '把', '能', '就', '这', '还',
394 | '和', '啊', '在', '吧', '有', '也', '不', '呢', '吗', '啥', '怎么',
395 | '一个', '什么', '一下', '一样', '一直', '为了', '可以', '那么'}
396 |
397 | # Configure the word cloud
398 | wc = wordcloud.WordCloud(
399 | font_path='simhei.ttf' if os.path.exists('simhei.ttf') else None,  # font file (a CJK font is needed to render Chinese text)
400 | width=1000,  # width
401 | height=700,  # height
402 | mask=mask,  # word cloud shape
403 | background_color='white',  # background color
404 | max_words=200,  # maximum number of words
405 | stopwords=stopwords,  # stop words
406 | contour_width=1,  # contour width
407 | contour_color='steelblue'  # contour color
408 | )
409 |
410 | # Generate the word cloud
411 | wc.generate(string)
412 |
413 | # Save the word cloud image
414 | wc.to_file(output_file)
415 | print(f"Word cloud saved to: {output_file}")
416 |
417 | return output_file
418 |
419 | def analyze_location(self, output_file=None):
420 | """
421 | Analyze the regional distribution of comments.
422 | :param output_file: output file name
423 | :return: output file path
424 | """
425 | if self.df is None:
426 | self.load_data()
427 |
428 | if not output_file:
429 | output_file = os.path.join(self.output_dir, "location_analysis.html")
430 |
431 | print("Analyzing comment locations...")
432 |
433 | # Count comments per region
434 | location_count = self.df['地区'].value_counts()
435 |
436 | # Keep the top 15 regions
437 | top_locations = location_count.head(15)
438 |
439 | # Build the pie chart
440 | pie = (
441 | Pie(init_opts=opts.InitOpts(theme=ThemeType.LIGHT, width="900px", height="500px"))
442 | .add(
443 | "",
444 | [list(z) for z in zip(top_locations.index.tolist(), top_locations.values.tolist())],  # tolist() yields native Python types, which pyecharts expects
445 | radius=["30%", "75%"],
446 | center=["50%", "50%"],
447 | rosetype="radius",
448 | )
449 | .set_global_opts(
450 | title_opts=opts.TitleOpts(title="Comment Location Distribution"),
451 | legend_opts=opts.LegendOpts(orient="vertical", pos_left="5%", pos_top="15%"),
452 | toolbox_opts=opts.ToolboxOpts()
453 | )
454 | .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c} ({d}%)"))
455 | )
456 |
457 | # Save the chart
458 | pie.render(output_file)
459 | print(f"Location analysis saved to: {output_file}")
460 |
461 | return output_file
462 |
463 | def analyze_time_distribution(self, output_file=None):
464 | """
465 | Analyze the time distribution of comments.
466 | :param output_file: output file name
467 | :return: output file path
468 | """
469 | if self.df is None:
470 | self.load_data()
471 |
472 | if not output_file:
473 | output_file = os.path.join(self.output_dir, "time_analysis.html")
474 |
475 | print("Analyzing comment time distribution...")
476 |
477 | # Convert time strings to datetime objects
478 | self.df['时间'] = pd.to_datetime(self.df['时间'])
479 |
480 | # Extract the hour
481 | self.df['小时'] = self.df['时间'].dt.hour
482 |
483 | # Count comments per hour
484 | hour_count = self.df['小时'].value_counts().sort_index()
485 |
486 | # Build the bar chart
487 | bar = (
488 | Bar(init_opts=opts.InitOpts(theme=ThemeType.LIGHT, width="900px", height="500px"))
489 | .add_xaxis(hour_count.index.tolist())
490 | .add_yaxis("Comments", hour_count.values.tolist())
491 | .set_global_opts(
492 | title_opts=opts.TitleOpts(title="Comment Time Distribution (by hour)"),
493 | xaxis_opts=opts.AxisOpts(name="Hour"),
494 | yaxis_opts=opts.AxisOpts(name="Comments"),
495 | toolbox_opts=opts.ToolboxOpts()
496 | )
497 | )
498 |
499 | # Save the chart
500 | bar.render(output_file)
501 | print(f"Time analysis saved to: {output_file}")
502 |
503 | return output_file
504 |
505 | def analyze_hot_words(self, top_n=50, output_file=None):
506 | """
507 | Analyze the most frequent words.
508 | :param top_n: number of top words to keep
509 | :param output_file: output file name
510 | :return: output file path
511 | """
512 | if self.df is None:
513 | self.load_data()
514 |
515 | if not output_file:
516 | output_file = os.path.join(self.output_dir, "hot_words.html")
517 |
518 | print(f"Analyzing hot words (Top {top_n})...")
519 |
520 | # Join all comments into one string
521 | content = ' '.join([str(i).replace('\n', '') for i in self.df['评论']])
522 |
523 | # Segment the text with jieba
524 | jieba.setLogLevel(20)
525 | words = jieba.lcut(content)
526 |
527 | # Drop stop words and single-character tokens
528 | stopwords = {'了', '的', '我', '你', '是', '都', '把', '能', '就', '这', '还',
529 | '和', '啊', '在', '吧', '有', '也', '不', '呢', '吗', '啥', '怎么',
530 | '一个', '什么', '一下', '一样', '一直', '为了', '可以', '那么'}
531 | filtered_words = [word for word in words if len(word) > 1 and word not in stopwords]
532 |
533 | # Count word frequencies
534 | word_count = Counter(filtered_words)
535 |
536 | # Keep the top N most frequent words
537 | top_words = word_count.most_common(top_n)
538 |
539 | # Build the word cloud chart
540 | wordcloud_chart = (
541 | PyechartsWordCloud(init_opts=opts.InitOpts(
542 | theme=ThemeType.LIGHT, width="900px", height="500px")
543 | )
544 | .add(
545 | "",
546 | top_words,
547 | word_size_range=[20, 100],
548 | shape="circle"
549 | )
550 | .set_global_opts(
551 | title_opts=opts.TitleOpts(title=f"Hot Words Top {top_n}"),
552 | toolbox_opts=opts.ToolboxOpts()
553 | )
554 | )
555 |
556 | # Save the chart
557 | wordcloud_chart.render(output_file)
558 | print(f"Hot word analysis saved to: {output_file}")
559 |
560 | return output_file
561 |
562 | def run_all_analysis(self, shape_img=None):
563 | """
564 | Run all analyses.
565 | :param shape_img: word cloud shape image
566 | :return: list of all output files
567 | """
568 | if self.df is None:
569 | self.load_data()
570 |
571 | outputs = []
572 |
573 | # Word cloud
574 | outputs.append(self.generate_wordcloud(shape_img))
575 |
576 | # Location analysis
577 | outputs.append(self.analyze_location())
578 |
579 | # Time analysis
580 | outputs.append(self.analyze_time_distribution())
581 |
582 | # Hot word analysis
583 | outputs.append(self.analyze_hot_words())
584 |
585 | print(f"All analyses finished, results saved in the {self.output_dir} directory")
586 | return outputs
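
The word-frequency step inside `analyze_hot_words` (filter tokens, count them, keep the top N) does not depend on jieba or pyecharts, so it can be exercised on its own. The sketch below assumes the tokens were already produced by a segmenter such as `jieba.lcut`; the function name `top_words` and the sample data are illustrative and not part of the original file.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_words(tokens: Iterable[str], stopwords: Set[str], top_n: int = 50) -> List[Tuple[str, int]]:
    """Return the top_n (word, count) pairs after filtering.

    Mirrors the filter used in analyze_hot_words: single-character tokens
    and stop words are dropped before counting.
    """
    kept = (t for t in tokens if len(t) > 1 and t not in stopwords)
    return Counter(kept).most_common(top_n)


if __name__ == "__main__":
    # Hypothetical pre-segmented tokens
    sample = ["视频", "的", "视频", "好看", "一个", "好看", "视频", "啊"]
    print(top_words(sample, {"的", "一个", "啊"}, top_n=2))
```

`Counter.most_common` returns plain Python `int` counts, so the resulting pairs can be handed to a chart library without further conversion.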
587 |
588 |
589 | def main():
590 | """Entry point."""
591 | print("=" * 60)
592 | print("Douyin video comment crawler and data visualization tool")
593 | print("=" * 60)
594 |
595 | # Ask the user which operation to run
596 | mode = input("Choose an operation:\n1. Crawl new comments and analyze them\n2. Analyze an existing CSV file\nEnter the option number (1/2): ")
597 |
598 | if mode == "2":
599 | # Analyze an existing CSV file
600 | print("\n== Analyzing existing comment data ==")
601 |
602 | # Initialize the analyzer; it looks up the latest CSV file automatically
603 | analyzer = CommentAnalyzer()
604 |
605 | # Word cloud shape image
606 | shape_img = input("Enter the path to a word cloud shape image (leave blank for the default shape): ")
607 | if shape_img and not os.path.exists(shape_img):
608 | print(f"Warning: image {shape_img} does not exist, using the default shape")
609 | shape_img = None
610 |
611 | # Run all analyses
612 | analyzer.run_all_analysis(shape_img=shape_img)
613 |
614 | else:
615 | # Crawl new comment data
616 | print("\n== Crawling new comment data ==")
617 |
618 | # Get the video URL
619 | video_url = input("Enter the Douyin video URL (e.g. https://www.douyin.com/video/7353500880198536457): ")
620 |
621 | # Maximum number of pages to crawl
622 | try:
623 | pages_input = input("Enter the maximum number of pages to crawl (press Enter to crawl all comments): ")
624 | max_pages = int(pages_input) if pages_input.strip() else None
625 | except ValueError:
626 | max_pages = None
627 |
628 | if max_pages is None:
629 | print("Crawling all comments until no more are available")
630 | else:
631 | print(f"Crawling at most {max_pages} pages of comments")
632 |
633 | # Word cloud shape image
634 | shape_img = input("Enter the path to a word cloud shape image (leave blank for the default shape): ")
635 | if shape_img and not os.path.exists(shape_img):
636 | print(f"Warning: image {shape_img} does not exist, using the default shape")
637 | shape_img = None
638 |
639 | # Create the crawler
640 | crawler = DouyinCommentCrawler(video_url=video_url, max_pages=max_pages)
641 |
642 | # Run the crawl
643 | comments = crawler.start_crawler()
644 |
645 | if comments:
646 | # Get the output file
647 | csv_file = crawler.get_output_file()
648 |
649 | # Create the analyzer
650 | analyzer = CommentAnalyzer(csv_file=csv_file)
651 |
652 | # Run all analyses
653 | analyzer.run_all_analysis(shape_img=shape_img)
654 |
655 | print("Done!")
656 |
657 |
658 | if __name__ == "__main__":
659 | main()
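
The page-limit prompt in `main()` treats a blank or non-numeric answer as "crawl everything". A small helper makes that rule explicit and testable. This is a sketch under one added assumption that the original does not make: zero and negative values are also treated as "no limit". The name `parse_max_pages` is hypothetical.

```python
from typing import Optional


def parse_max_pages(raw: str) -> Optional[int]:
    """Turn the user's answer into a page limit, or None for "no limit".

    Blank and non-numeric input map to None, as in main().
    Assumption not in the original: values <= 0 also map to None.
    """
    raw = raw.strip()
    if not raw:
        return None
    try:
        pages = int(raw)
    except ValueError:
        return None
    return pages if pages > 0 else None


if __name__ == "__main__":
    for answer in ["", " 5 ", "abc", "0"]:
        print(repr(answer), "->", parse_max_pages(answer))
```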
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | DrissionPage>=4.0.0
2 | pandas>=1.0.0
3 | numpy>=1.18.0
4 | jieba>=0.42.1
5 | Pillow>=8.0.0
6 | wordcloud>=1.8.0
7 | matplotlib>=3.3.0
8 | pyecharts>=1.9.0
9 | scikit-learn>=0.24.0
10 | networkx>=2.5.0
--------------------------------------------------------------------------------