├── README.md
├── douyin_parse.exe
├── douyin_parse.py
└── ui.py


/README.md:
--------------------------------------------------------------------------------
# douyin_parse
Douyin short-video parsing

I have read quite a few forum posts that claim to be tutorials on getting watermark-free Douyin videos. They are called tutorials, but most of them just offer an API, either to drive traffic or to sell access to it. I wanted to see how the watermark removal actually works, so this post records the whole process.

## 1. Browser analysis
Share a video from the Douyin app. You get something like:

> \#On Douyin, record the good life# Goodbye, Wuhan! The heroes of the fight against the epidemic are going home. Safe travels~ https://v.douyin.com/WuRMPV/ Copy this link and open [Douyin Short Video] to watch the video directly!

I copied the link out of this text, opened it in the browser, and inspected the page with the developer tools.

![Opening the initial link in the browser](https://upload-images.jianshu.io/upload_images/13604849-07dfb8fb61b824bd.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

The `video` tag contains a link:
```
https://aweme.snssdk.com/aweme/v1/playwm/?video_id=v0200fba0000bpo4s1b82vu9dp4ehlog&line=0
```
Copy that link and open it in the browser:
![Opening the src link directly](https://upload-images.jianshu.io/upload_images/13604849-f36ff52b59fbd569.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

The video still carries the watermark, and the page has been redirected to a new address:
```
http://v6-dy-y.ixigua.com
/8d090338ca04948b648bb7e4ba0b215f/5e72da81/video/tos/hxsy/tos-hxsy-ve-0015/832e6e52408d4c1e931b763b152e5d21
/?a=1128&br=0&bt=2405&cr=0&cs=0&dr=0&ds=3&er=&l=202003190935350101940982142734B1FC&lr=aweme&qs=0&rc=am9oc
zx5OzQ3czMzZGkzM0ApODVpNzk8OWRmNzVnM2g1N2dsZTFhci9fcGxfLS1fLS9zczM0Yl8vMzVfYGBhNmItYTE6Yw%3D%3D&vl=&vr=
```

Now look at the previous address again:

### **https**://aweme**.snssdk.com**/aweme/v1/**playwm**/?video_id=v0200fba0000bpo4s1b82vu9dp4ehlog&line=0

It contains **playwm**. What does the trailing wm stand for? Presumably "watermark". Change **playwm** to **play**, set the request's User-Agent to a mobile one, and you get the watermark-free version of the video. That is the end of the manual part!

![Watermark-free video](https://upload-images.jianshu.io/upload_images/13604849-b7a9a1bd21f8c49c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


## 2. Code implementation

First, test downloading the video stream.
```python
def download(video_url, file_name):
    # get_resp_video (defined in douyin_parse.py) sends a streaming GET
    # request with a mobile User-Agent
    r = get_resp_video(video_url)
    with open(file_name, 'wb') as mp4:
        # Write the video to disk in 1 MiB chunks
        for chunk in r.iter_content(1024 * 1024):
            if chunk:
                mp4.write(chunk)
```
Calling it downloads the video without problems, so it is safe to go ahead and write the crawler that finds the real address. The rest just follows the manual steps from section 1.

I then ran into a big problem: the initial page does not contain the video address. It only appears after you click a button, so a plain XPath query cannot find the link. What to do? My first thought was to simulate the click, but that would require **selenium** (maybe there is a better way that I cannot think of), and it would make the program a lot heavier. Not what I want.
Looking at the page more closely, I found this piece of js near the bottom:
```javascript
$(function(){
    require('web:component/reflow_video/index').create({
        hasData: 1,
        videoWidth: 720,
        videoHeight: 1280,
        playAddr: "https://aweme.snssdk.com/aweme/v1/playwm/?s_vid=93f1b41336a8b7a442dbf1c29c6bbc561699c13ffb2ce3cacb960e9bcb7c0b8f9f0ec410108d165bd0bfd2b83c1070676ccafc940fd5dc933ea73704a90e4faf&line=0",
        cover: "https://p3.pstatp.com/large/tos-cn-p-0015/584d6a06932940998a1decc057ab2978_1584418313.jpg"

    });
});
```

That hands me both the video address and the cover image directly. It really feels like finding it without even trying!
Write a function to parse the js:
```python
# Get the real video address from the script
def findUrlInScript(script):
    # Keep the text between 'playAddr: "' and the closing '",'
    test = script.split('playAddr: "', 1)
    test = test[1].split('",', 1)
    like_link = test[0]
    # Swap playwm for play to get the watermark-free address
    link = like_link.replace('playwm', 'play').strip()
    return link
```
Name the file:
```python
# Author id: the text after '@' in the author line
id = et.xpath("//*[@id='pageletReflowVideo']/div/div[2]/div[2]/div/div[2]/p/text()")[0].split('@')[1]
content = et.xpath("//*[@id='pageletReflowVideo']/div/div[2]/div[2]/p/text()")[0]
# Keep only the description text before the first hashtag or punctuation mark
content = content.split('#')[0].split(',')[0].split('。')[0].split('?')[0].split(',')[0].split('!')[0]
name = id + ':' + content + '.mp4'
```

A quick test shows the video already downloads to the root directory. To keep things tidy, create a folder to save into~
```python
# Create the download folder on first use, then work inside it
if not os.path.exists(path):
    os.mkdir(path)
os.chdir(path)
```
When calling download, just add a path parameter. Test succeeded!
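
The parsing and folder steps above can also be sketched a bit more defensively. This is only a minimal sketch, not the code used in this repository: the helper names `extract_play_url` and `target_path` are hypothetical, and it assumes the inline script keeps the `playAddr: "..."` form shown earlier.

```python
import os
import re

# Assumption: the address always appears as  playAddr: "<url>"  in the script.
PLAY_ADDR_RE = re.compile(r'playAddr:\s*"([^"]+)"')


def extract_play_url(script):
    """Return the watermark-free address from the inline script, or None."""
    match = PLAY_ADDR_RE.search(script)
    if match is None:
        return None
    # Replace only the path segment, so a 'playwm' elsewhere in the URL is untouched
    return match.group(1).replace('/playwm/', '/play/', 1).strip()


def target_path(folder, file_name):
    """Create the download folder if it is missing and return the full file path."""
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, file_name)
```

Returning `None` lets the caller log a bad link instead of failing on an index error, and opening `target_path(...)` directly avoids having to change the working directory.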
![Download succeeded](https://upload-images.jianshu.io/upload_images/13604849-f3c86610d7ceeb6e.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


![No watermark](https://upload-images.jianshu.io/upload_images/13604849-2ff6e764d30e6b1e.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

The test confirms that the watermark is gone. At this point the core functionality is complete, so on to the full program~~~

## 3. Final step: packaging
I have never built a user interface in Python, but since I want to post this on the forum, I made a simple one to make the tool easier to use.
PyQt seems to have the best reputation, so let's give it a try~

The toolkit is quite good, but this is my first time using it, so the interface is rather ugly and there are a few small bugs. For example, an invalid link makes the program crash~ I will fix that in the next version.

![Packaged result](https://upload-images.jianshu.io/upload_images/13604849-5d6e99b4997b55f7.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

The final code is on github. If you see this, please give it a star~
[Source code and results](https://github.com/DLWangSan/douyin_parse)


--------------------------------------------------------------------------------
/douyin_parse.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DLWangSan/douyin_parse/c691aec0ffad87123db9fcb24f2e84d97b1230ea/douyin_parse.exe


--------------------------------------------------------------------------------
/douyin_parse.py:
--------------------------------------------------------------------------------
import os
import sys
import time

import requests
from lxml import etree
from PyQt5.QtWidgets import QApplication, QMainWindow

# ui.py sits next to this file in the repository
from ui import Ui_MainWindow

ua_phone = 'Mozilla/5.0 (Linux; Android 6.0; ' \
           'Nexus 5 Build/MRA58N) AppleWebKit/537.36 (' \
           'KHTML, like Gecko) Chrome/80.0.3987.116 Mobile Safari/537.36'
ua_win = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' \
         'AppleWebKit/537.36 (KHTML, like Gecko) ' \
         'Chrome/80.0.3987.116 Safari/537.36'


# Send a GET request with the given User-Agent
def get_resp(url, ua):
    headers = {
        'User-Agent': ua
    }
    resp = requests.get(url, headers=headers)
    # A Response is truthy only when its status code is below 400
    if resp:
        return resp
    else:
        # Log the failure; the function then returns None
        log_tab.insertPlainText(
            time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + ':\n' + 'Invalid link!' + '\n\n')
        link_text.clear()


# Used for downloading the video
def get_resp_video(url):
    headers = {
        'User-Agent': ua_phone
    }
    # stream=True so the body can be read chunk by chunk
    resp = requests.get(url, headers=headers, stream=True)
    return resp


# Get the real video address from the script
def findUrlInScript(script):
    test = script.split('playAddr: "', 1)
    test = test[1].split('",', 1)
    like_link = test[0]
    # Swap playwm for play to get the watermark-free address
    link = like_link.replace('playwm', 'play').strip()
    return link


# Process the share link, including the redirect
def parse_shareLink(link):
    resp = get_resp(link, ua_win)
    # Address after the redirect
    re_link = resp.url
    re_resp = get_resp(re_link, ua_win)
    et = etree.HTML(re_resp.text)
    # Inline script that holds the video address
    script = et.xpath("/html/body/div/script[3]/text()")[0]
    script = str(script)
    # Build the file name from the author id and the description
    id = et.xpath("//*[@id='pageletReflowVideo']/div/div[2]/div[2]/div/div[2]/p/text()")[0].split('@')[1]
    content = et.xpath("//*[@id='pageletReflowVideo']/div/div[2]/div[2]/p/text()")[0]
    # Keep only the text before the first hashtag or punctuation mark
    content = content.split('#')[0].split(',')[0].split('。')[0].split('?')[0].split(',')[0].split('!')[0]
    name = id + ':' + content + '.mp4'
    return name, findUrlInScript(script)


# Download
def download(path, video_url, file_name):
    if not os.path.exists(path):
        # Create the folder that was actually checked for
        os.mkdir(path)
        log_tab.insertPlainText(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + ':\n' + 'Creating download folder: douyin_download' + '\n\n')
    os.chdir(path)
    log_tab.insertPlainText(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + ':\n' + file_name + ' download started...' + '\n\n')
    r = get_resp_video(video_url)
    link_text.clear()
    with open(file_name, 'wb') as mp4:
        print(file_name)
        # Write the video to disk in 1 MiB chunks
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            if chunk:
                mp4.write(chunk)
    # Open the download folder in Windows Explorer
    os.system('explorer.exe /n, %s' % os.getcwd())
    log_tab.insertPlainText(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + ':\n' + file_name + ' download complete!' + '\n\n')


def download_click():
    log_tab.insertPlainText(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + ':\n' +
                            link_text.toPlainText() + '\n')
    share_link = link_text.toPlainText()
    name, _url = parse_shareLink(share_link)
    resp = get_resp(_url, ua_phone)
    # Final download address after the redirect
    last_url = resp.url
    download('./douyin_download', last_url, name)
    # download() changed into the folder; go back up
    os.chdir('..')


if __name__ == '__main__':
    app = QApplication(sys.argv)
    MainWindow = QMainWindow()
    MainWindow.setFixedSize(438, 303)
    ui = Ui_MainWindow()
    ui.setupUi(MainWindow)
    MainWindow.show()
    log_tab = ui.plainTextEdit
    link_text = ui.textEdit
    download_button = ui.pushButton_2
    download_button.clicked.connect(lambda: download_click())

    # print(MainWindow.width(),MainWindow.height())
    sys.exit(app.exec_())


--------------------------------------------------------------------------------
/ui.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-

# Form implementation generated from reading ui file 'douyintool.ui'
#
# Created by: PyQt5 UI code generator 5.13.2
#
# WARNING! All changes made in this file will be lost!


from PyQt5 import QtCore, QtGui, QtWidgets


class Ui_MainWindow(object):
    def onclick_clear(self):
        self.textEdit.clear()

    def setupUi(self, MainWindow):
        MainWindow.setObjectName("MainWindow")
        MainWindow.resize(438, 303)
        MainWindow.setAnimated(True)
        MainWindow.setDocumentMode(False)
        MainWindow.setUnifiedTitleAndToolBarOnMac(False)
        self.centralwidget = QtWidgets.QWidget(MainWindow)
        self.centralwidget.setObjectName("centralwidget")
        # self.lcdNumber = QtWidgets.QLCDNumber(self.centralwidget)
        # self.lcdNumber.setGeometry(QtCore.QRect(340, 0, 64, 23))
        # self.lcdNumber.setObjectName("lcdNumber")
        # self.lcdNumber.setWindowIconText('123')
        self.textEdit = QtWidgets.QTextEdit(self.centralwidget)
        self.textEdit.setGeometry(QtCore.QRect(100, 40, 251, 41))
        self.textEdit.setObjectName("textEdit")
        # self.textEdit.insertPlainText('123')
        self.textEdit.setAcceptRichText(False)
        self.label = QtWidgets.QLabel(self.centralwidget)
        self.label.setGeometry(QtCore.QRect(40, 40, 81, 41))
        self.label.setObjectName("label")
        self.pushButton = QtWidgets.QPushButton(self.centralwidget)
        self.pushButton.setGeometry(QtCore.QRect(120, 100, 75, 23))
        self.pushButton.setObjectName("pushButton")
        self.pushButton.clicked.connect(lambda: self.onclick_clear())
        self.pushButton_2 = QtWidgets.QPushButton(self.centralwidget)
        self.pushButton_2.setGeometry(QtCore.QRect(250, 100, 75, 23))
        self.pushButton_2.setObjectName("pushButton_2")
        # self.pushButton_2.clicked.connect(lambda : onclick1)
        self.plainTextEdit = QtWidgets.QPlainTextEdit(self.centralwidget)
        self.plainTextEdit.setGeometry(QtCore.QRect(100, 150, 251, 101))
        self.plainTextEdit.setDocumentTitle("")
        self.plainTextEdit.setObjectName("plainTextEdit")
        self.plainTextEdit.setReadOnly(True)
        # self.plainTextEdit.insertPlainText("123\n")
        # self.plainTextEdit.insertPlainText("456")
        self.label_2 = QtWidgets.QLabel(self.centralwidget)
        self.label_2.setGeometry(QtCore.QRect(60, 190, 54, 12))
        self.label_2.setObjectName("label_2")
        MainWindow.setCentralWidget(self.centralwidget)
        self.menubar = QtWidgets.QMenuBar(MainWindow)
        self.menubar.setGeometry(QtCore.QRect(0, 0, 438, 23))
        self.menubar.setObjectName("menubar")
        self.menu = QtWidgets.QMenu(self.menubar)
        self.menu.setObjectName("menu")
        MainWindow.setMenuBar(self.menubar)
        self.statusbar = QtWidgets.QStatusBar(MainWindow)
        self.statusbar.setObjectName("statusbar")
        MainWindow.setStatusBar(self.statusbar)
        self.actionv1_0_by_DLWangSan = QtWidgets.QAction(MainWindow)
        self.actionv1_0_by_DLWangSan.setObjectName("actionv1_0_by_DLWangSan")
        self.menu.addAction(self.actionv1_0_by_DLWangSan)
        self.menubar.addAction(self.menu.menuAction())

        self.retranslateUi(MainWindow)
        QtCore.QMetaObject.connectSlotsByName(MainWindow)

    def retranslateUi(self, MainWindow):
        _translate = QtCore.QCoreApplication.translate
        MainWindow.setWindowTitle(_translate("MainWindow", "Open-source Douyin Tool V1"))
        self.label.setText(_translate("MainWindow", "Share link"))
        self.pushButton.setText(_translate("MainWindow", "Clear input"))
        self.pushButton_2.setText(_translate("MainWindow", "Download"))
        self.label_2.setText(_translate("MainWindow", "Log"))
        self.menu.setTitle(_translate("MainWindow", "Version"))
        self.actionv1_0_by_DLWangSan.setText(_translate("MainWindow", "v1.0 by DLWangSan"))


--------------------------------------------------------------------------------